TOP ACTIVATIONS
MAX = 0.263

happy scene . <|endoftext|> K aren was busy in her garden
it . K aren was so excited ; she
! K aren had fun playing with the
On the way home , Kay la started
the station . On the way , she stopped to help
. On the way to the park Mary saw
. On the way to the park , they
fish . K aren had so much fun that
with it . <|endoftext|> K aren and her mom were at
. On the way to the doctor , Lily
Little Girl : " K aren ." K aren
One day , Martin and Ellen were playing together in the
aren ." K aren and Ann had so much
. On the way to school , Ann went
to them and asked for directions . The stranger had a
sleep . K aren laughed quietly and turned off
wall . He liked the way it looked and decided to
. On the way home , she met a
, Jenny was on her way outside when she discovered a
On the way to deliver the pin ew

MIN ACTIVATIONS
MIN = -0.249

said . " Thanks ," Sp ide y replied . "
learned an important lesson . Sp oil it and you will
fly got close enough , Sp ide y jumped and caught
was a little spider named Sp ide y . Sp ide
From that day on , Sp ide y always remembered to
bought us a treat . Sp aghetti !" he said .
swoop ed down to catch Sp ide y . Sp ide
to go back home , Sp ide y and Annie said
big fly came by and Sp ide y caught it in
From that day on , Sp ide y knew that even
back to their homes . Sp ide y was so happy
try new things sometimes . Sp icy food is good for
No , sweet ie . Sp iders can be dangerous if
are hard ." So , Sp ide y tried again and
to leave it alone . Sp iders can be a bit
catch Sp ide y . Sp ide y wasn 't ready
" That 's okay . Sp iders won 't hurt you
there was a spider named Sp ide y . Sp ide
was a little spider named Sp ide y . Sp ide
won 't hurt you . Sp iders don 't bite ."

INTERVAL 0.135 - 0.263
CONTAINS 0.763%

The comb is tight in the drawer . They
! <|endoftext|> Once upon a time , there was a woman
rock ! It has many arms and spots !"
! <|endoftext|> Once upon a time there was a little boy
box of old clothes and buttons and rib bons in the

INTERVAL 0.007 - 0.135
CONTAINS 38.533%

, he took good care of himself . <|endoftext|> Once upon
he cried and screamed , but no one could hear him
. They learned a hard lesson . <|endoftext|> L ily and
the swings and slide , ran around and smiled . At
cat , Wh isk ers . They liked to play together

INTERVAL -0.121 - 0.007
CONTAINS 59.767%

The owner gave her a big hug and praised her for
did today ." The moral of the story
eager to number more things and have more fun . <|endoftext|>
had won ! Jack held up the prize in
cup board with many shelves and it was perfect for all

INTERVAL -0.249 - -0.121
CONTAINS 0.938%

, " How do you do that , Sarah ?"
fell asleep with a happy dream . <|endoftext|> Tom and Lily
Wow , look at that bug !" Tim said . "
the magician , " How do you make things vanish ?"
very happy - the new cartoon had been even more exciting
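
NOTE ON THE INTERVALS
The four INTERVAL ranges above are consistent with splitting the span from MIN (-0.249) to MAX (0.263) into four equal-width bins. The sketch below reproduces those edges and checks that the CONTAINS percentages sum to roughly 100%. It is an illustration only: the function name interval_edges and the equal-width assumption are inferred from the printed numbers, not taken from the tool that produced this dump.

```python
# Minimal sketch (assumption: the dashboard uses equal-width bins between
# the printed MIN and MAX activations; interval_edges is a hypothetical helper).

def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi, rounded to 3 decimals."""
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX as printed in the dump above.
edges = interval_edges(-0.249, 0.263)

# CONTAINS percentages as printed, listed from the highest interval to the lowest.
contains = [0.763, 38.533, 59.767, 0.938]
total = sum(contains)

if __name__ == "__main__":
    print(edges)
    print(round(total, 3))
```

Under that assumption each bin is 0.128 wide, which matches the printed boundaries -0.121, 0.007 and 0.135. Nearly all token positions (about 98%) fall in the two middle intervals.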
Token happy
Token ID3772
Feature activation-6.42e-02

Pos logit contributions
aren+0.008
oys+0.008
inger+0.008
lynn+0.008
riet+0.007
Neg logit contributions
 fro-0.007
 Sp-0.007
 nib-0.007
 perform-0.006
 slur-0.006
Token scene
Token ID3715
Feature activation-4.75e-03

Pos logit contributions
 fro+0.103
 Sp+0.101
 nib+0.095
 perform+0.090
 slur+0.088
Neg logit contributions
aren-0.105
oys-0.103
inger-0.101
lynn-0.096
 policy-0.096
Token.
Token ID13
Feature activation-5.45e-03

Pos logit contributions
 fro+0.007
 Sp+0.006
 nib+0.006
 slur+0.005
 perform+0.005
Neg logit contributions
oys-0.009
aren-0.009
inger-0.008
lynn-0.008
 policy-0.008
Token<|endoftext|>
Token ID50256
Feature activation-8.92e-02

Pos logit contributions
 fro+0.007
 Sp+0.007
 nib+0.006
 perform+0.006
 slur+0.006
Neg logit contributions
inger-0.011
aren-0.011
oys-0.011
lynn-0.010
 policy-0.010
TokenK
Token ID42
Feature activation+7.72e-02

Pos logit contributions
 fro+0.102
 Sp+0.093
 nib+0.093
 slur+0.090
 perform+0.087
Neg logit contributions
aren-0.192
inger-0.183
oys-0.183
lynn-0.177
athy-0.175
Tokenaren
Token ID5757
Feature activation+2.63e-01

Pos logit contributions
oys+0.119
 policy+0.112
inger+0.109
issors+0.108
riet+0.107
Neg logit contributions
 fro-0.132
 Sp-0.123
 nib-0.118
 o-0.111
 slur-0.105
Token was
Token ID373
Feature activation+1.33e-02

Pos logit contributions
aren+0.522
inger+0.504
oys+0.503
lynn+0.483
athy+0.482
Neg logit contributions
 fro-0.347
 Sp-0.326
 nib-0.310
 perform-0.298
 slur-0.290
Token busy
Token ID8179
Feature activation-2.67e-04

Pos logit contributions
aren+0.027
inger+0.027
oys+0.026
lynn+0.026
athy+0.025
Neg logit contributions
 fro-0.016
 Sp-0.016
 nib-0.015
 perform-0.014
 slur-0.014
Token in
Token ID287
Feature activation+3.26e-03

Pos logit contributions
 fro+0.000
 Sp+0.000
 nib+0.000
 slur+0.000
 perform+0.000
Neg logit contributions
aren-0.001
oys-0.000
inger-0.000
lynn-0.000
 policy-0.000
Token her
Token ID607
Feature activation-4.48e-04

Pos logit contributions
aren+0.008
inger+0.008
oys+0.008
lynn+0.007
riet+0.007
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 perform-0.003
 slur-0.002
Token garden
Token ID11376
Feature activation-1.22e-02

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.001
oys-0.001
inger-0.001
lynn-0.001
riet-0.001
Token it
Token ID340
Feature activation+2.37e-02

Pos logit contributions
 fro+0.005
 Sp+0.004
 nib+0.004
 perform+0.004
 slur+0.004
Neg logit contributions
aren-0.009
oys-0.009
inger-0.009
lynn-0.009
uster-0.009
Token.
Token ID13
Feature activation-6.41e-03

Pos logit contributions
aren+0.043
oys+0.042
inger+0.041
lynn+0.040
 policy+0.040
Neg logit contributions
 fro-0.033
 Sp-0.031
 nib-0.030
 slur-0.030
 gag-0.027
Token
Token ID198
Feature activation-1.26e-02

Pos logit contributions
 fro+0.009
 Sp+0.008
 nib+0.008
 perform+0.008
 slur+0.007
Neg logit contributions
aren-0.012
oys-0.012
inger-0.012
lynn-0.012
riet-0.011
Token
Token ID198
Feature activation-8.41e-03

Pos logit contributions
 fro+0.016
 Sp+0.014
 nib+0.014
 perform+0.013
 slur+0.013
Neg logit contributions
aren-0.026
inger-0.025
oys-0.025
lynn-0.025
athy-0.024
TokenK
Token ID42
Feature activation+7.43e-02

Pos logit contributions
 fro+0.014
 Sp+0.013
 nib+0.013
 slur+0.012
 perform+0.012
Neg logit contributions
aren-0.014
oys-0.014
inger-0.014
lynn-0.013
athy-0.013
Tokenaren
Token ID5757
Feature activation+2.63e-01

Pos logit contributions
oys+0.080
inger+0.074
 policy+0.074
issors+0.073
uster+0.069
Neg logit contributions
 fro-0.165
 Sp-0.156
 nib-0.148
 o-0.139
 perform-0.139
Token was
Token ID373
Feature activation+1.78e-02

Pos logit contributions
aren+0.504
inger+0.474
oys+0.469
athy+0.461
riet+0.456
Neg logit contributions
 fro-0.366
 Sp-0.346
 nib-0.324
 perform-0.315
 slur-0.310
Token so
Token ID523
Feature activation-6.20e-03

Pos logit contributions
aren+0.036
oys+0.035
inger+0.035
lynn+0.034
riet+0.033
Neg logit contributions
 fro-0.023
 Sp-0.021
 nib-0.020
 perform-0.019
 slur-0.019
Token excited
Token ID6568
Feature activation-6.30e-02

Pos logit contributions
 fro+0.008
 Sp+0.008
 nib+0.007
 perform+0.007
 slur+0.007
Neg logit contributions
aren-0.012
inger-0.012
oys-0.012
lynn-0.011
riet-0.011
Token;
Token ID26
Feature activation+1.45e-02

Pos logit contributions
 fro+0.075
 Sp+0.071
 nib+0.068
 perform+0.064
 slur+0.062
Neg logit contributions
aren-0.132
inger-0.130
oys-0.129
lynn-0.125
riet-0.122
Token she
Token ID673
Feature activation-4.98e-02

Pos logit contributions
aren+0.035
oys+0.035
inger+0.035
lynn+0.034
riet+0.033
Neg logit contributions
 fro-0.012
 Sp-0.011
 nib-0.011
 perform-0.010
 slur-0.009
Token!
Token ID0
Feature activation+3.34e-02

Pos logit contributions
 fro+0.083
 Sp+0.079
 nib+0.075
 perform+0.070
 slur+0.070
Neg logit contributions
aren-0.126
oys-0.125
inger-0.122
lynn-0.117
riet-0.116
Token
Token ID220
Feature activation+5.61e-02

Pos logit contributions
aren+0.069
inger+0.068
oys+0.068
lynn+0.065
riet+0.064
Neg logit contributions
 fro-0.041
 Sp-0.039
 nib-0.037
 perform-0.035
 slur-0.034
Token
Token ID198
Feature activation+1.43e-04

Pos logit contributions
aren+0.131
oys+0.128
inger+0.128
lynn+0.125
riet+0.123
Neg logit contributions
 fro-0.051
 Sp-0.050
 nib-0.043
 perform-0.041
 slur-0.038
Token
Token ID198
Feature activation+1.75e-03

Pos logit contributions
aren+0.000
oys+0.000
inger+0.000
lynn+0.000
riet+0.000
Neg logit contributions
 fro-0.000
 Sp-0.000
 nib-0.000
 perform-0.000
 slur-0.000
TokenK
Token ID42
Feature activation+7.74e-02

Pos logit contributions
aren+0.003
oys+0.003
inger+0.003
lynn+0.003
athy+0.003
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.002
 perform-0.002
 slur-0.002
Tokenaren
Token ID5757
Feature activation+2.61e-01

Pos logit contributions
oys+0.118
inger+0.113
 policy+0.112
issors+0.112
lynn+0.109
Neg logit contributions
 fro-0.133
 Sp-0.124
 nib-0.116
 o-0.108
 perform-0.105
Token had
Token ID550
Feature activation+1.04e-01

Pos logit contributions
aren+0.498
inger+0.475
oys+0.474
athy+0.461
riet+0.454
Neg logit contributions
 fro-0.360
 Sp-0.341
 nib-0.320
 perform-0.313
 slur-0.303
Token fun
Token ID1257
Feature activation+1.21e-02

Pos logit contributions
aren+0.208
oys+0.206
inger+0.204
lynn+0.196
riet+0.191
Neg logit contributions
 fro-0.136
 Sp-0.128
 nib-0.124
 perform-0.117
 slur-0.114
Token playing
Token ID2712
Feature activation-5.68e-02

Pos logit contributions
aren+0.021
oys+0.020
lynn+0.020
inger+0.019
 policy+0.019
Neg logit contributions
 fro-0.018
 Sp-0.017
 nib-0.016
 slur-0.015
 gag-0.015
Token with
Token ID351
Feature activation+1.25e-02

Pos logit contributions
 gag+0.080
 fro+0.075
 dream+0.070
atmeal+0.069
 Sp+0.068
Neg logit contributions
oys-0.086
lynn-0.084
han-0.084
aren-0.083
inger-0.082
Token the
Token ID262
Feature activation+1.80e-02

Pos logit contributions
aren+0.027
oys+0.027
inger+0.027
lynn+0.026
riet+0.025
Neg logit contributions
 fro-0.014
 Sp-0.012
 nib-0.012
 perform-0.011
 slur-0.011
Token
Token ID220
Feature activation+5.59e-02

Pos logit contributions
aren+0.067
inger+0.066
oys+0.066
lynn+0.063
riet+0.061
Neg logit contributions
 fro-0.043
 Sp-0.041
 nib-0.039
 perform-0.036
 slur-0.035
Token
Token ID198
Feature activation-1.28e-03

Pos logit contributions
aren+0.133
oys+0.133
inger+0.132
lynn+0.129
 policy+0.127
Neg logit contributions
 Sp-0.045
 fro-0.045
 nib-0.036
 perform-0.035
 slur-0.031
Token
Token ID198
Feature activation+1.72e-04

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.003
oys-0.003
inger-0.003
lynn-0.003
 policy-0.003
TokenOn
Token ID2202
Feature activation+1.28e-02

Pos logit contributions
aren+0.000
oys+0.000
inger+0.000
lynn+0.000
riet+0.000
Neg logit contributions
 fro-0.000
 Sp-0.000
 nib-0.000
 perform-0.000
 slur-0.000
Token the
Token ID262
Feature activation+2.52e-02

Pos logit contributions
aren+0.028
oys+0.028
inger+0.028
lynn+0.027
riet+0.027
Neg logit contributions
 fro-0.014
 Sp-0.013
 nib-0.012
 perform-0.011
 slur-0.011
Token way
Token ID835
Feature activation+2.60e-01

Pos logit contributions
aren+0.044
inger+0.044
oys+0.044
lynn+0.041
uster+0.041
Neg logit contributions
 fro-0.040
 Sp-0.036
 nib-0.036
 perform-0.035
 dream-0.034
Token home
Token ID1363
Feature activation+3.00e-02

Pos logit contributions
inger+0.615
oys+0.613
aren+0.610
lynn+0.593
riet+0.589
Neg logit contributions
 fro-0.235
 Sp-0.209
 nib-0.177
 perform-0.175
 dance-0.168
Token,
Token ID11
Feature activation+5.53e-03

Pos logit contributions
aren+0.067
inger+0.066
oys+0.066
lynn+0.065
riet+0.063
Neg logit contributions
 fro-0.031
 Sp-0.029
 nib-0.027
 perform-0.026
 slur-0.023
Token Kay
Token ID17356
Feature activation+1.54e-01

Pos logit contributions
aren+0.011
oys+0.011
inger+0.010
lynn+0.010
riet+0.010
Neg logit contributions
 fro-0.007
 Sp-0.007
 nib-0.007
 slur-0.006
 perform-0.006
Tokenla
Token ID5031
Feature activation+9.77e-02

Pos logit contributions
oys+0.224
 policy+0.215
issors+0.205
uster+0.200
inger+0.200
Neg logit contributions
 fro-0.279
 Sp-0.256
 nib-0.254
 sne-0.248
 perform-0.246
Token started
Token ID2067
Feature activation+1.33e-02

Pos logit contributions
aren+0.188
inger+0.183
oys+0.178
athy+0.173
lynn+0.172
Neg logit contributions
 fro-0.132
 Sp-0.127
 nib-0.117
 perform-0.112
 slur-0.110
Token the
Token ID262
Feature activation+2.01e-02

Pos logit contributions
 fro+0.027
 Sp+0.025
 nib+0.024
 slur+0.023
 perform+0.022
Neg logit contributions
aren-0.048
oys-0.046
inger-0.046
lynn-0.045
riet-0.044
Token station
Token ID4429
Feature activation+1.05e-01

Pos logit contributions
aren+0.034
inger+0.033
oys+0.033
lynn+0.032
athy+0.032
Neg logit contributions
 fro-0.031
 Sp-0.030
 nib-0.029
 slur-0.028
 perform-0.027
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
aren+0.247
inger+0.247
oys+0.241
lynn+0.238
athy+0.236
Neg logit contributions
 fro-0.098
 Sp-0.088
 nib-0.086
 perform-0.083
 slur-0.073
Token On
Token ID1550
Feature activation+2.39e-02

Pos logit contributions
inger+0.033
 policy+0.032
oys+0.031
 supervision+0.030
lynn+0.030
Neg logit contributions
 Sp-0.014
 fro-0.014
 perform-0.013
 grow-0.012
 movies-0.012
Token the
Token ID262
Feature activation+1.81e-02

Pos logit contributions
inger+0.062
oys+0.062
uster+0.059
aren+0.059
 policy+0.058
Neg logit contributions
 fro-0.017
 Sp-0.016
 perform-0.013
 dream-0.013
 movies-0.012
Token way
Token ID835
Feature activation+2.60e-01

Pos logit contributions
inger+0.036
riet+0.036
aren+0.036
 policy+0.036
 vain+0.034
Neg logit contributions
 movies-0.023
 perform-0.022
 o-0.022
 fro-0.022
 dream-0.022
Token,
Token ID11
Feature activation+9.38e-03

Pos logit contributions
inger+0.680
ady+0.656
issors+0.652
riet+0.651
oys+0.647
Neg logit contributions
 fro-0.151
 Sp-0.145
 he-0.105
 dance-0.100
 perform-0.099
Token she
Token ID673
Feature activation-5.10e-02

Pos logit contributions
inger+0.021
oys+0.020
aren+0.020
uster+0.020
ady+0.019
Neg logit contributions
 fro-0.010
 Sp-0.010
 perform-0.009
 nib-0.008
 dance-0.008
Token stopped
Token ID5025
Feature activation+4.41e-02

Pos logit contributions
 fro+0.060
 Sp+0.057
 nib+0.054
 slur+0.051
 perform+0.051
Neg logit contributions
aren-0.107
inger-0.103
oys-0.102
lynn-0.099
athy-0.098
Token to
Token ID284
Feature activation-2.58e-02

Pos logit contributions
aren+0.079
oys+0.078
inger+0.077
lynn+0.075
athy+0.072
Neg logit contributions
 fro-0.062
 Sp-0.059
 nib-0.057
 slur-0.056
 perform-0.054
Token help
Token ID1037
Feature activation+4.68e-02

Pos logit contributions
 fro+0.030
 Sp+0.029
 nib+0.027
 slur+0.026
 perform+0.024
Neg logit contributions
aren-0.053
oys-0.051
inger-0.050
lynn-0.050
riet-0.048
Token.
Token ID13
Feature activation+1.01e-02

Pos logit contributions
aren+0.022
oys+0.021
inger+0.021
lynn+0.020
riet+0.020
Neg logit contributions
 fro-0.013
 Sp-0.012
 nib-0.011
 slur-0.011
 perform-0.011
Token
Token ID198
Feature activation+2.91e-03

Pos logit contributions
inger+0.022
oys+0.022
aren+0.022
lynn+0.021
 policy+0.020
Neg logit contributions
 fro-0.011
 Sp-0.011
 nib-0.010
 perform-0.010
 slur-0.009
Token
Token ID198
Feature activation+5.28e-03

Pos logit contributions
aren+0.006
inger+0.006
oys+0.006
lynn+0.006
athy+0.006
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 perform-0.003
 slur-0.003
TokenOn
Token ID2202
Feature activation+5.52e-03

Pos logit contributions
aren+0.009
oys+0.009
inger+0.009
lynn+0.008
riet+0.008
Neg logit contributions
 fro-0.009
 Sp-0.008
 nib-0.008
 perform-0.008
 slur-0.007
Token the
Token ID262
Feature activation+2.24e-02

Pos logit contributions
aren+0.012
inger+0.011
oys+0.011
lynn+0.011
athy+0.011
Neg logit contributions
 fro-0.007
 Sp-0.006
 nib-0.006
 slur-0.005
 perform-0.005
Token way
Token ID835
Feature activation+2.60e-01

Pos logit contributions
uster+0.044
inger+0.042
riet+0.042
 vain+0.041
 policy+0.041
Neg logit contributions
 fro-0.033
 dance-0.031
 swim-0.029
 perform-0.029
 play-0.029
Token to
Token ID284
Feature activation-2.33e-02

Pos logit contributions
inger+0.600
aren+0.598
oys+0.592
riet+0.582
lynn+0.580
Neg logit contributions
 fro-0.238
 Sp-0.219
 perform-0.180
 nib-0.176
 dance-0.176
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
 fro+0.026
 Sp+0.024
 nib+0.023
 perform+0.021
 slur+0.021
Neg logit contributions
aren-0.051
inger-0.050
oys-0.050
lynn-0.048
riet-0.047
Token park
Token ID3952
Feature activation+4.57e-02

Pos logit contributions
inger+0.042
aren+0.042
oys+0.042
riet+0.041
uster+0.041
Neg logit contributions
 fro-0.027
 perform-0.023
 Sp-0.023
 nib-0.022
 dance-0.022
Token Mary
Token ID5335
Feature activation+1.05e-01

Pos logit contributions
inger+0.104
aren+0.102
oys+0.102
lynn+0.101
uster+0.101
Neg logit contributions
 fro-0.047
 Sp-0.043
 nib-0.039
 perform-0.037
 dream-0.033
Token saw
Token ID2497
Feature activation-3.44e-03

Pos logit contributions
aren+0.214
inger+0.206
oys+0.202
athy+0.197
lynn+0.196
Neg logit contributions
 fro-0.129
 Sp-0.123
 nib-0.114
 slur-0.110
 perform-0.110
Token.
Token ID13
Feature activation+3.85e-03

Pos logit contributions
aren+0.142
inger+0.139
oys+0.139
lynn+0.134
riet+0.132
Neg logit contributions
 fro-0.076
 Sp-0.071
 nib-0.068
 perform-0.065
 slur-0.062
Token
Token ID198
Feature activation-5.40e-03

Pos logit contributions
aren+0.007
oys+0.007
inger+0.007
lynn+0.007
riet+0.006
Neg logit contributions
 fro-0.006
 Sp-0.005
 nib-0.005
 perform-0.005
 slur-0.005
Token
Token ID198
Feature activation-1.20e-03

Pos logit contributions
 fro+0.007
 Sp+0.006
 nib+0.006
 perform+0.006
 slur+0.006
Neg logit contributions
aren-0.011
inger-0.011
oys-0.011
lynn-0.011
athy-0.010
TokenOn
Token ID2202
Feature activation+6.34e-03

Pos logit contributions
 fro+0.002
 Sp+0.002
 nib+0.002
 slur+0.002
 perform+0.002
Neg logit contributions
aren-0.002
oys-0.002
inger-0.002
lynn-0.002
athy-0.002
Token the
Token ID262
Feature activation+2.18e-02

Pos logit contributions
aren+0.014
oys+0.014
inger+0.014
lynn+0.013
riet+0.013
Neg logit contributions
 fro-0.007
 Sp-0.006
 nib-0.006
 perform-0.006
 slur-0.005
Token way
Token ID835
Feature activation+2.59e-01

Pos logit contributions
inger+0.036
aren+0.036
oys+0.035
riet+0.034
 policy+0.033
Neg logit contributions
 fro-0.036
 dream-0.036
 perform-0.034
 dance-0.034
 sne-0.034
Token to
Token ID284
Feature activation-2.06e-02

Pos logit contributions
aren+0.603
inger+0.601
oys+0.591
riet+0.584
athy+0.577
Neg logit contributions
 fro-0.248
 Sp-0.232
 nib-0.194
 perform-0.183
 dance-0.173
Token the
Token ID262
Feature activation+1.71e-02

Pos logit contributions
 fro+0.021
 Sp+0.020
 nib+0.019
 perform+0.018
 slur+0.016
Neg logit contributions
aren-0.047
oys-0.046
inger-0.046
lynn-0.044
riet-0.044
Token park
Token ID3952
Feature activation+4.51e-02

Pos logit contributions
aren+0.031
inger+0.030
oys+0.030
lynn+0.029
riet+0.028
Neg logit contributions
 fro-0.026
 Sp-0.025
 nib-0.024
 perform-0.023
 slur-0.022
Token,
Token ID11
Feature activation+6.44e-03

Pos logit contributions
aren+0.096
inger+0.095
oys+0.095
lynn+0.091
riet+0.090
Neg logit contributions
 fro-0.052
 Sp-0.049
 nib-0.046
 perform-0.044
 slur-0.042
Token they
Token ID484
Feature activation-5.82e-03

Pos logit contributions
aren+0.013
oys+0.013
inger+0.012
lynn+0.012
riet+0.012
Neg logit contributions
 fro-0.008
 nib-0.007
 Sp-0.007
 slur-0.007
 perform-0.007
Token fish
Token ID5916
Feature activation-1.05e-01

Pos logit contributions
aren+0.046
oys+0.045
inger+0.045
lynn+0.043
riet+0.042
Neg logit contributions
 fro-0.035
 Sp-0.034
 nib-0.032
 perform-0.031
 slur-0.030
Token.
Token ID13
Feature activation-2.01e-03

Pos logit contributions
 fro+0.136
 Sp+0.133
 nib+0.121
 slur+0.120
 perform+0.119
Neg logit contributions
oys-0.200
aren-0.198
inger-0.198
lynn-0.184
 policy-0.184
Token
Token ID198
Feature activation-1.34e-02

Pos logit contributions
 fro+0.003
 nib+0.003
 Sp+0.003
 perform+0.003
 slur+0.003
Neg logit contributions
aren-0.004
oys-0.004
inger-0.004
lynn-0.003
athy-0.003
Token
Token ID198
Feature activation-9.69e-03

Pos logit contributions
 fro+0.017
 Sp+0.016
 nib+0.016
 perform+0.015
 slur+0.015
Neg logit contributions
aren-0.027
inger-0.026
oys-0.026
lynn-0.025
athy-0.025
TokenK
Token ID42
Feature activation+7.34e-02

Pos logit contributions
 fro+0.016
 Sp+0.015
 nib+0.015
 slur+0.014
 perform+0.014
Neg logit contributions
aren-0.016
inger-0.016
oys-0.015
athy-0.015
lynn-0.015
Tokenaren
Token ID5757
Feature activation+2.59e-01

Pos logit contributions
oys+0.111
inger+0.105
 policy+0.103
lynn+0.102
aren+0.102
Neg logit contributions
 fro-0.130
 Sp-0.121
 nib-0.117
 perform-0.109
 slur-0.107
Token had
Token ID550
Feature activation+9.73e-02

Pos logit contributions
aren+0.482
inger+0.460
oys+0.449
athy+0.447
riet+0.437
Neg logit contributions
 fro-0.362
 Sp-0.345
 nib-0.319
 perform-0.319
 gag-0.308
Token so
Token ID523
Feature activation-6.34e-03

Pos logit contributions
aren+0.205
oys+0.203
inger+0.201
lynn+0.193
riet+0.189
Neg logit contributions
 fro-0.117
 Sp-0.110
 nib-0.107
 perform-0.099
 slur-0.097
Token much
Token ID881
Feature activation+2.14e-03

Pos logit contributions
 fro+0.010
 Sp+0.010
 perform+0.009
 nib+0.009
 slur+0.009
Neg logit contributions
aren-0.011
inger-0.010
oys-0.010
riet-0.009
lynn-0.009
Token fun
Token ID1257
Feature activation+4.37e-03

Pos logit contributions
aren+0.004
oys+0.004
inger+0.004
lynn+0.004
riet+0.004
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 perform-0.003
 slur-0.003
Token that
Token ID326
Feature activation+2.36e-02

Pos logit contributions
aren+0.008
oys+0.008
inger+0.008
lynn+0.007
 policy+0.007
Neg logit contributions
 fro-0.006
 Sp-0.006
 nib-0.005
 perform-0.005
 slur-0.005
Token with
Token ID351
Feature activation-4.42e-03

Pos logit contributions
 fro+0.088
 Sp+0.079
 nib+0.077
 slur+0.074
 perform+0.072
Neg logit contributions
oys-0.135
aren-0.133
 policy-0.129
inger-0.127
lynn-0.125
Token it
Token ID340
Feature activation+1.05e-02

Pos logit contributions
 Sp+0.003
 fro+0.003
 nib+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
inger-0.012
oys-0.012
aren-0.012
athy-0.011
lynn-0.011
Token.
Token ID13
Feature activation-8.03e-03

Pos logit contributions
aren+0.024
inger+0.023
oys+0.023
lynn+0.022
athy+0.022
Neg logit contributions
 fro-0.011
 Sp-0.010
 nib-0.010
 perform-0.009
 slur-0.009
Token<|endoftext|>
Token ID50256
Feature activation-9.31e-02

Pos logit contributions
 fro+0.010
 Sp+0.009
 nib+0.009
 perform+0.008
 slur+0.008
Neg logit contributions
aren-0.017
inger-0.017
oys-0.017
lynn-0.016
riet-0.016
TokenK
Token ID42
Feature activation+7.34e-02

Pos logit contributions
 fro+0.109
 Sp+0.100
 nib+0.098
 slur+0.095
 perform+0.093
Neg logit contributions
aren-0.197
inger-0.190
oys-0.190
lynn-0.183
athy-0.180
Tokenaren
Token ID5757
Feature activation+2.59e-01

Pos logit contributions
oys+0.126
inger+0.117
 policy+0.116
issors+0.113
riet+0.113
Neg logit contributions
 fro-0.117
 Sp-0.109
 nib-0.104
 o-0.097
 dream-0.093
Token and
Token ID290
Feature activation+7.01e-03

Pos logit contributions
aren+0.519
inger+0.502
oys+0.501
lynn+0.482
athy+0.479
Neg logit contributions
 fro-0.337
 Sp-0.315
 nib-0.300
 perform-0.288
 slur-0.280
Token her
Token ID607
Feature activation-2.84e-03

Pos logit contributions
inger+0.014
aren+0.014
oys+0.014
lynn+0.013
uster+0.013
Neg logit contributions
 fro-0.009
 Sp-0.009
 nib-0.008
 perform-0.008
 slur-0.008
Token mom
Token ID1995
Feature activation+4.97e-03

Pos logit contributions
 fro+0.003
 Sp+0.003
 nib+0.003
 dream+0.003
 perform+0.003
Neg logit contributions
aren-0.006
oys-0.006
inger-0.006
 policy-0.006
lynn-0.006
Token were
Token ID547
Feature activation+4.60e-03

Pos logit contributions
aren+0.009
oys+0.009
inger+0.009
lynn+0.008
riet+0.008
Neg logit contributions
 fro-0.007
 Sp-0.007
 nib-0.007
 perform-0.007
 slur-0.006
Token at
Token ID379
Feature activation-8.55e-02

Pos logit contributions
aren+0.010
oys+0.010
inger+0.010
lynn+0.009
riet+0.009
Neg logit contributions
 fro-0.005
 Sp-0.005
 nib-0.005
 perform-0.004
 slur-0.004
Token.
Token ID13
Feature activation+1.55e-04

Pos logit contributions
aren+0.044
inger+0.043
oys+0.043
lynn+0.041
riet+0.041
Neg logit contributions
 fro-0.027
 Sp-0.026
 nib-0.024
 perform-0.023
 slur-0.023
Token
Token ID198
Feature activation-9.95e-03

Pos logit contributions
aren+0.000
oys+0.000
inger+0.000
lynn+0.000
riet+0.000
Neg logit contributions
 fro-0.000
 nib-0.000
 Sp-0.000
 perform-0.000
 slur-0.000
Token
Token ID198
Feature activation-6.07e-03

Pos logit contributions
 fro+0.013
 Sp+0.011
 nib+0.011
 perform+0.011
 slur+0.011
Neg logit contributions
aren-0.020
inger-0.020
oys-0.020
lynn-0.019
athy-0.019
TokenOn
Token ID2202
Feature activation+5.38e-03

Pos logit contributions
 fro+0.010
 Sp+0.010
 nib+0.009
 slur+0.009
 perform+0.009
Neg logit contributions
aren-0.010
oys-0.010
inger-0.010
lynn-0.009
athy-0.009
Token the
Token ID262
Feature activation+2.55e-02

Pos logit contributions
aren+0.012
inger+0.011
oys+0.011
lynn+0.011
athy+0.011
Neg logit contributions
 fro-0.006
 Sp-0.006
 nib-0.006
 perform-0.005
 slur-0.005
Token way
Token ID835
Feature activation+2.59e-01

Pos logit contributions
aren+0.033
inger+0.032
oys+0.032
riet+0.029
lynn+0.029
Neg logit contributions
 fro-0.052
 nib-0.049
 Sp-0.049
 perform-0.048
 dream-0.047
Token to
Token ID284
Feature activation-2.03e-02

Pos logit contributions
aren+0.568
inger+0.560
oys+0.555
riet+0.542
lynn+0.536
Neg logit contributions
 fro-0.286
 Sp-0.268
 nib-0.239
 perform-0.224
 sne-0.209
Token the
Token ID262
Feature activation+2.08e-02

Pos logit contributions
 fro+0.022
 Sp+0.021
 nib+0.020
 perform+0.018
 slur+0.018
Neg logit contributions
aren-0.045
inger-0.044
oys-0.044
lynn-0.042
riet-0.041
Token doctor
Token ID6253
Feature activation-9.80e-02

Pos logit contributions
aren+0.036
oys+0.035
inger+0.034
lynn+0.033
athy+0.033
Neg logit contributions
 fro-0.031
 Sp-0.031
 nib-0.029
 slur-0.028
 perform-0.028
Token,
Token ID11
Feature activation+5.29e-03

Pos logit contributions
 fro+0.124
 Sp+0.117
 nib+0.113
 slur+0.106
 perform+0.106
Neg logit contributions
aren-0.196
oys-0.190
inger-0.189
lynn-0.182
riet-0.179
Token Lily
Token ID20037
Feature activation-3.18e-02

Pos logit contributions
aren+0.011
oys+0.010
inger+0.010
lynn+0.010
riet+0.010
Neg logit contributions
 fro-0.006
 nib-0.006
 Sp-0.006
 slur-0.006
 perform-0.006
TokenLittle
Token ID22253
Feature activation+1.01e-02

Pos logit contributions
 fro+0.001
 nib+0.001
 Sp+0.001
 slur+0.001
cast+0.001
Neg logit contributions
aren-0.001
oys-0.001
inger-0.001
lynn-0.001
athy-0.001
Token Girl
Token ID7430
Feature activation-1.67e-02

Pos logit contributions
aren+0.018
oys+0.017
inger+0.017
lynn+0.016
athy+0.016
Neg logit contributions
 fro-0.015
 nib-0.015
 Sp-0.014
 slur-0.014
 perform-0.014
Token:
Token ID25
Feature activation+5.62e-02

Pos logit contributions
 fro+0.029
 Sp+0.028
 nib+0.027
 slur+0.026
 perform+0.026
Neg logit contributions
aren-0.026
inger-0.024
oys-0.024
lynn-0.023
riet-0.023
Token "
Token ID366
Feature activation+5.17e-02

Pos logit contributions
aren+0.087
oys+0.078
inger+0.078
lynn+0.077
riet+0.075
Neg logit contributions
 fro-0.096
 Sp-0.089
 nib-0.089
 slur-0.089
 perform-0.086
TokenK
Token ID42
Feature activation+7.38e-02

Pos logit contributions
aren+0.091
oys+0.087
inger+0.084
lynn+0.084
athy+0.083
Neg logit contributions
 fro-0.080
 nib-0.076
 Sp-0.076
 slur-0.074
 perform-0.070
Tokenaren
Token ID5757
Feature activation+2.58e-01

Pos logit contributions
oys+0.122
inger+0.122
aren+0.120
lynn+0.117
 policy+0.116
Neg logit contributions
 fro-0.117
 Sp-0.113
 nib-0.102
 perform-0.101
 slur-0.095
Token."
Token ID526
Feature activation-2.83e-02

Pos logit contributions
aren+0.511
inger+0.493
oys+0.491
lynn+0.470
athy+0.469
Neg logit contributions
 fro-0.339
 Sp-0.317
 nib-0.309
 slur-0.293
 perform-0.291
Token
Token ID198
Feature activation-3.75e-03

Pos logit contributions
 fro+0.032
 Sp+0.032
 nib+0.028
 perform+0.025
 grow+0.024
Neg logit contributions
inger-0.062
oys-0.061
aren-0.060
lynn-0.059
 policy-0.058
Token
Token ID198
Feature activation-2.34e-03

Pos logit contributions
 fro+0.004
 Sp+0.004
 nib+0.004
 perform+0.003
 slur+0.003
Neg logit contributions
aren-0.008
oys-0.008
inger-0.008
lynn-0.008
riet-0.008
TokenK
Token ID42
Feature activation+7.42e-02

Pos logit contributions
 fro+0.005
 Sp+0.004
 nib+0.004
 perform+0.004
 slur+0.004
Neg logit contributions
aren-0.003
oys-0.003
inger-0.003
lynn-0.003
athy-0.003
Tokenaren
Token ID5757
Feature activation+2.58e-01

Pos logit contributions
oys+0.099
inger+0.096
 policy+0.094
uster+0.092
lynn+0.090
Neg logit contributions
 fro-0.139
 Sp-0.138
 nib-0.126
 perform-0.123
 slur-0.118
TokenOne
Token ID3198
Feature activation+1.23e-02

Pos logit contributions
aren+0.019
inger+0.019
oys+0.019
 policy+0.018
uster+0.018
Neg logit contributions
 fro-0.010
 Sp-0.010
 perform-0.010
 nib-0.010
 dream-0.009
Token day
Token ID1110
Feature activation-4.00e-02

Pos logit contributions
oys+0.019
aren+0.018
inger+0.017
athy+0.017
 policy+0.017
Neg logit contributions
 fro-0.021
 Sp-0.020
 nib-0.020
cast-0.020
 slur-0.019
Token,
Token ID11
Feature activation+8.82e-03

Pos logit contributions
 Sp+0.033
 fro+0.033
 perform+0.030
 nib+0.030
 dance+0.025
Neg logit contributions
inger-0.101
aren-0.098
oys-0.095
lynn-0.094
issors-0.092
Token Martin
Token ID5780
Feature activation+1.03e-01

Pos logit contributions
inger+0.017
aren+0.016
oys+0.015
lynn+0.015
 supervision+0.014
Neg logit contributions
 Sp-0.013
 perform-0.013
 fro-0.012
 nib-0.012
 Will-0.012
Token and
Token ID290
Feature activation+1.04e-02

Pos logit contributions
aren+0.228
inger+0.226
oys+0.225
lynn+0.222
issors+0.220
Neg logit contributions
 fro-0.108
 nib-0.102
 Sp-0.099
 perform-0.092
 sne-0.089
Token Ellen
Token ID23811
Feature activation+2.58e-01

Pos logit contributions
inger+0.015
oys+0.015
ady+0.014
 supervision+0.014
aren+0.014
Neg logit contributions
 Sp-0.019
 perform-0.018
 Will-0.017
 dance-0.017
 fro-0.017
Token were
Token ID547
Feature activation+7.17e-03

Pos logit contributions
aren+0.558
inger+0.551
oys+0.548
lynn+0.531
riet+0.517
Neg logit contributions
 fro-0.294
 Sp-0.274
 nib-0.264
 perform-0.247
 slur-0.239
Token playing
Token ID2712
Feature activation-7.12e-02

Pos logit contributions
aren+0.015
oys+0.015
inger+0.015
lynn+0.014
riet+0.014
Neg logit contributions
 fro-0.008
 Sp-0.008
 nib-0.007
 perform-0.007
 slur-0.007
Token together
Token ID1978
Feature activation-3.06e-02

Pos logit contributions
 gag+0.116
 fro+0.104
 dream+0.097
 nib+0.097
 slur+0.096
Neg logit contributions
oys-0.105
han-0.093
 courage-0.090
inger-0.089
lynn-0.089
Token in
Token ID287
Feature activation-3.96e-03

Pos logit contributions
 fro+0.038
 Sp+0.037
 slur+0.032
 gag+0.032
 nib+0.032
Neg logit contributions
oys-0.058
aren-0.057
inger-0.057
 policy-0.054
lynn-0.054
Token the
Token ID262
Feature activation+1.49e-02

Pos logit contributions
 fro+0.004
 Sp+0.004
 nib+0.004
 perform+0.004
 slur+0.003
Neg logit contributions
aren-0.009
oys-0.009
inger-0.009
lynn-0.008
riet-0.008
Tokenaren
Token ID5757
Feature activation+2.58e-01

Pos logit contributions
oys+0.122
inger+0.122
aren+0.120
lynn+0.117
 policy+0.116
Neg logit contributions
 fro-0.117
 Sp-0.113
 nib-0.102
 perform-0.101
 slur-0.095
Token."
Token ID526
Feature activation-2.83e-02

Pos logit contributions
aren+0.511
inger+0.493
oys+0.491
lynn+0.470
athy+0.469
Neg logit contributions
 fro-0.339
 Sp-0.317
 nib-0.309
 slur-0.293
 perform-0.291
Token
Token ID198
Feature activation-3.75e-03

Pos logit contributions
 fro+0.032
 Sp+0.032
 nib+0.028
 perform+0.025
 grow+0.024
Neg logit contributions
inger-0.062
oys-0.061
aren-0.060
lynn-0.059
 policy-0.058
Token
Token ID198
Feature activation-2.34e-03

Pos logit contributions
 fro+0.004
 Sp+0.004
 nib+0.004
 perform+0.003
 slur+0.003
Neg logit contributions
aren-0.008
oys-0.008
inger-0.008
lynn-0.008
riet-0.008
TokenK
Token ID42
Feature activation+7.42e-02

Pos logit contributions
 fro+0.005
 Sp+0.004
 nib+0.004
 perform+0.004
 slur+0.004
Neg logit contributions
aren-0.003
oys-0.003
inger-0.003
lynn-0.003
athy-0.003
Tokenaren
Token ID5757
Feature activation+2.58e-01

Pos logit contributions
oys+0.099
inger+0.096
 policy+0.094
uster+0.092
lynn+0.090
Neg logit contributions
 fro-0.139
 Sp-0.138
 nib-0.126
 perform-0.123
 slur-0.118
Token and
Token ID290
Feature activation+1.85e-02

Pos logit contributions
aren+0.457
inger+0.428
oys+0.419
athy+0.417
riet+0.413
Neg logit contributions
 fro-0.387
 Sp-0.369
 nib-0.346
 perform-0.339
 slur-0.338
Token Ann
Token ID5506
Feature activation+2.12e-02

Pos logit contributions
aren+0.033
oys+0.031
inger+0.030
lynn+0.030
riet+0.029
Neg logit contributions
 fro-0.028
 Sp-0.027
 nib-0.027
 perform-0.026
 slur-0.025
Token had
Token ID550
Feature activation+1.08e-01

Pos logit contributions
aren+0.041
oys+0.037
riet+0.037
athy+0.037
inger+0.037
Neg logit contributions
 fro-0.028
 Sp-0.027
 nib-0.025
 slur-0.024
 soup-0.024
Token so
Token ID523
Feature activation-3.93e-03

Pos logit contributions
aren+0.229
oys+0.224
inger+0.224
lynn+0.216
riet+0.212
Neg logit contributions
 fro-0.126
 Sp-0.118
 nib-0.112
 perform-0.106
 slur-0.103
Token much
Token ID881
Feature activation+5.27e-03

Pos logit contributions
 fro+0.007
 Sp+0.006
 perform+0.006
 nib+0.006
 slur+0.006
Neg logit contributions
aren-0.006
oys-0.006
inger-0.006
athy-0.005
riet-0.005
Token.
Token ID13
Feature activation-4.19e-03

Pos logit contributions
aren+0.042
oys+0.041
inger+0.040
lynn+0.039
riet+0.039
Neg logit contributions
 fro-0.025
 Sp-0.023
 nib-0.022
 slur-0.021
 perform-0.021
Token\n
Token ID198
Feature activation-1.49e-02

Pos logit contributions
 fro+0.005
 Sp+0.005
 nib+0.005
 perform+0.004
 slur+0.004
Neg logit contributions
aren-0.009
oys-0.008
inger-0.008
lynn-0.008
riet-0.008
Token\n
Token ID198
Feature activation-9.99e-03

Pos logit contributions
 fro+0.019
 Sp+0.017
 nib+0.017
 perform+0.016
 slur+0.016
Neg logit contributions
aren-0.031
inger-0.030
oys-0.030
lynn-0.029
athy-0.028
TokenOn
Token ID2202
Feature activation+2.90e-03

Pos logit contributions
 fro+0.019
 nib+0.018
 Sp+0.018
 slur+0.017
 perform+0.017
Neg logit contributions
aren-0.015
inger-0.014
oys-0.014
lynn-0.013
athy-0.013
Token the
Token ID262
Feature activation+1.64e-02

Pos logit contributions
aren+0.006
oys+0.006
inger+0.006
lynn+0.006
riet+0.006
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 perform-0.003
 slur-0.003
Token way
Token ID835
Feature activation+2.58e-01

Pos logit contributions
inger+0.029
aren+0.028
oys+0.027
uster+0.027
lynn+0.026
Neg logit contributions
 fro-0.026
 perform-0.026
 dance-0.026
 dream-0.025
 Sp-0.023
Token to
Token ID284
Feature activation-2.39e-02

Pos logit contributions
inger+0.591
oys+0.582
aren+0.580
lynn+0.567
riet+0.564
Neg logit contributions
 fro-0.248
 Sp-0.233
 nib-0.192
 perform-0.191
 dance-0.184
Token school
Token ID1524
Feature activation+2.59e-02

Pos logit contributions
 fro+0.026
 Sp+0.024
 nib+0.023
 perform+0.022
 slur+0.020
Neg logit contributions
aren-0.053
oys-0.052
inger-0.052
lynn-0.050
uster-0.049
Token,
Token ID11
Feature activation+2.08e-03

Pos logit contributions
inger+0.059
aren+0.058
oys+0.057
lynn+0.056
uster+0.055
Neg logit contributions
 fro-0.027
 Sp-0.025
 perform-0.023
 nib-0.023
 dream-0.020
Token Ann
Token ID5506
Feature activation+1.85e-02

Pos logit contributions
aren+0.004
oys+0.004
inger+0.003
lynn+0.003
riet+0.003
Neg logit contributions
 fro-0.003
 nib-0.003
 slur-0.003
 Sp-0.003
 perform-0.003
Token went
Token ID1816
Feature activation-5.34e-02

Pos logit contributions
aren+0.038
inger+0.037
oys+0.037
lynn+0.035
riet+0.035
Neg logit contributions
 fro-0.023
 Sp-0.022
 nib-0.021
 perform-0.020
 slur-0.019
Token to
Token ID284
Feature activation-2.86e-02

Pos logit contributions
aren+0.032
oys+0.031
inger+0.031
lynn+0.029
athy+0.029
Neg logit contributions
 fro-0.020
 Sp-0.019
 nib-0.017
 perform-0.017
 slur-0.017
Token them
Token ID606
Feature activation+1.13e-01

Pos logit contributions
 fro+0.035
 Sp+0.033
 nib+0.031
 perform+0.029
 slur+0.029
Neg logit contributions
aren-0.059
oys-0.057
inger-0.056
lynn-0.054
riet-0.054
Token and
Token ID290
Feature activation+1.73e-02

Pos logit contributions
aren+0.215
oys+0.209
inger+0.206
lynn+0.199
riet+0.196
Neg logit contributions
 fro-0.155
 Sp-0.146
 nib-0.135
 perform-0.131
 slur-0.129
Token asked
Token ID1965
Feature activation+3.52e-02

Pos logit contributions
aren+0.033
oys+0.031
inger+0.031
riet+0.030
lynn+0.030
Neg logit contributions
 fro-0.024
 Sp-0.023
 nib-0.021
 perform-0.021
 slur-0.020
Token for
Token ID329
Feature activation-4.02e-02

Pos logit contributions
aren+0.079
inger+0.078
oys+0.077
lynn+0.075
athy+0.073
Neg logit contributions
 fro-0.038
 Sp-0.036
 nib-0.033
 perform-0.031
 slur-0.030
Token directions
Token ID11678
Feature activation+2.58e-01

Pos logit contributions
 fro+0.044
 Sp+0.042
 nib+0.038
 perform+0.036
 slur+0.034
Neg logit contributions
inger-0.095
aren-0.092
oys-0.089
lynn-0.087
athy-0.086
Token.
Token ID13
Feature activation+1.77e-02

Pos logit contributions
aren+0.587
inger+0.586
oys+0.569
lynn+0.560
athy+0.551
Neg logit contributions
 fro-0.255
 Sp-0.243
 nib-0.221
 perform-0.211
 slur-0.196
Token The
Token ID383
Feature activation+1.38e-02

Pos logit contributions
aren+0.035
oys+0.035
inger+0.034
lynn+0.033
riet+0.033
Neg logit contributions
 fro-0.023
 Sp-0.021
 nib-0.021
 perform-0.020
 slur-0.019
Token stranger
Token ID16195
Feature activation+3.40e-02

Pos logit contributions
inger+0.027
oys+0.025
aren+0.025
lynn+0.025
issors+0.025
Neg logit contributions
 fro-0.020
 dream-0.018
 o-0.017
 perform-0.017
 dance-0.017
Token had
Token ID550
Feature activation+1.01e-01

Pos logit contributions
aren+0.064
oys+0.061
inger+0.060
lynn+0.059
riet+0.058
Neg logit contributions
 fro-0.047
 Sp-0.045
 nib-0.042
 perform-0.041
 slur-0.040
Token a
Token ID257
Feature activation-2.54e-02

Pos logit contributions
oys+0.248
aren+0.247
inger+0.242
uster+0.235
lynn+0.234
Neg logit contributions
 fro-0.093
 nib-0.089
 sne-0.085
 Sp-0.082
 grow-0.076
Token sleep
Token ID3993
Feature activation-1.46e-01

Pos logit contributions
 fro+0.052
 Sp+0.050
 slur+0.044
bel+0.044
 nib+0.044
Neg logit contributions
aren-0.070
inger-0.069
lynn-0.067
oys-0.067
riet-0.065
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
 fro+0.205
 Sp+0.189
 nib+0.174
 perform+0.169
 slur+0.166
Neg logit contributions
inger-0.268
aren-0.265
oys-0.261
lynn-0.249
riet-0.248
Token\n
Token ID198
Feature activation-1.50e-02

Pos logit contributions
 fro+0.017
 Sp+0.016
 nib+0.016
 perform+0.015
 slur+0.015
Neg logit contributions
aren-0.020
oys-0.019
inger-0.019
lynn-0.018
riet-0.018
Token\n
Token ID198
Feature activation-9.11e-03

Pos logit contributions
 fro+0.020
 nib+0.017
 Sp+0.017
 perform+0.017
 slur+0.017
Neg logit contributions
aren-0.030
inger-0.029
oys-0.028
lynn-0.028
athy-0.028
TokenK
Token ID42
Feature activation+7.10e-02

Pos logit contributions
 fro+0.015
 Sp+0.015
 nib+0.014
 slur+0.014
 perform+0.014
Neg logit contributions
aren-0.015
oys-0.014
inger-0.014
athy-0.014
lynn-0.013
Tokenaren
Token ID5757
Feature activation+2.58e-01

Pos logit contributions
oys+0.090
 policy+0.084
issors+0.082
inger+0.082
riet+0.080
Neg logit contributions
 fro-0.145
 Sp-0.135
 nib-0.127
 o-0.120
 perform-0.118
Token laughed
Token ID13818
Feature activation+1.08e-02

Pos logit contributions
aren+0.477
inger+0.444
athy+0.442
oys+0.431
riet+0.429
Neg logit contributions
 fro-0.363
 Sp-0.347
 perform-0.322
 nib-0.318
 gag-0.314
Token quietly
Token ID12703
Feature activation-1.57e-02

Pos logit contributions
oys+0.015
aren+0.015
uster+0.014
inger+0.014
riet+0.014
Neg logit contributions
 fro-0.018
 Sp-0.017
 perform-0.017
 gag-0.016
atmeal-0.016
Token and
Token ID290
Feature activation+7.39e-03

Pos logit contributions
 fro+0.022
 Sp+0.021
 nib+0.019
 perform+0.019
 slur+0.018
Neg logit contributions
aren-0.029
oys-0.028
inger-0.028
riet-0.027
lynn-0.027
Token turned
Token ID2900
Feature activation+2.72e-02

Pos logit contributions
aren+0.013
inger+0.012
riet+0.012
oys+0.012
lynn+0.012
Neg logit contributions
 fro-0.010
 Sp-0.010
 perform-0.009
 nib-0.009
 slur-0.009
Token off
Token ID572
Feature activation+7.62e-02

Pos logit contributions
aren+0.052
inger+0.051
oys+0.050
lynn+0.049
uster+0.049
Neg logit contributions
 fro-0.037
 Sp-0.034
 nib-0.033
 perform-0.032
 slur-0.030
Token wall
Token ID3355
Feature activation-9.41e-03

Pos logit contributions
aren+0.043
inger+0.042
oys+0.041
lynn+0.039
uster+0.039
Neg logit contributions
 fro-0.041
 Sp-0.038
 nib-0.037
 perform-0.036
 slur-0.035
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
 fro+0.011
 Sp+0.010
 nib+0.010
 perform+0.009
 slur+0.009
Neg logit contributions
aren-0.020
oys-0.020
inger-0.020
lynn-0.019
riet-0.019
Token He
Token ID679
Feature activation+2.13e-04

Pos logit contributions
aren+0.032
inger+0.032
oys+0.031
lynn+0.030
riet+0.029
Neg logit contributions
 fro-0.024
 Sp-0.023
 nib-0.022
 perform-0.021
 slur-0.020
Token liked
Token ID8288
Feature activation-5.90e-03

Pos logit contributions
aren+0.000
oys+0.000
inger+0.000
lynn+0.000
athy+0.000
Neg logit contributions
 fro-0.000
 Sp-0.000
 perform-0.000
 nib-0.000
 slur-0.000
Token the
Token ID262
Feature activation+2.13e-02

Pos logit contributions
 fro+0.008
 Sp+0.008
 nib+0.008
 perform+0.007
 gag+0.007
Neg logit contributions
aren-0.010
oys-0.010
lynn-0.009
riet-0.009
inger-0.009
Token way
Token ID835
Feature activation+2.58e-01

Pos logit contributions
aren+0.038
inger+0.037
oys+0.036
lynn+0.035
riet+0.034
Neg logit contributions
 fro-0.032
 Sp-0.031
 nib-0.029
 perform-0.028
 slur-0.027
Token it
Token ID340
Feature activation+2.47e-02

Pos logit contributions
inger+0.682
aren+0.674
oys+0.667
riet+0.644
athy+0.640
Neg logit contributions
 fro-0.177
 Sp-0.171
 perform-0.126
 nib-0.125
 dance-0.121
Token looked
Token ID3114
Feature activation+9.94e-03

Pos logit contributions
aren+0.048
inger+0.046
oys+0.046
lynn+0.045
athy+0.043
Neg logit contributions
 fro-0.032
 Sp-0.030
 nib-0.029
 slur-0.027
 perform-0.027
Token and
Token ID290
Feature activation+1.67e-02

Pos logit contributions
aren+0.023
inger+0.023
oys+0.023
lynn+0.022
athy+0.022
Neg logit contributions
 fro-0.010
 Sp-0.009
 nib-0.009
 perform-0.008
 slur-0.008
Token decided
Token ID3066
Feature activation-1.48e-02

Pos logit contributions
aren+0.032
oys+0.030
lynn+0.029
inger+0.029
riet+0.029
Neg logit contributions
 fro-0.021
 Sp-0.020
 nib-0.019
 perform-0.019
 slur-0.018
Token to
Token ID284
Feature activation-2.73e-02

Pos logit contributions
 fro+0.016
 Sp+0.015
 nib+0.014
 perform+0.013
 slur+0.013
Neg logit contributions
aren-0.033
inger-0.033
oys-0.033
lynn-0.031
riet-0.031
Token.
Token ID13
Feature activation-2.72e-03

Pos logit contributions
aren+0.051
oys+0.050
inger+0.050
lynn+0.048
riet+0.047
Neg logit contributions
 fro-0.025
 Sp-0.023
 nib-0.022
 perform-0.021
 slur-0.020
Token\n
Token ID198
Feature activation-1.01e-02

Pos logit contributions
 fro+0.004
 Sp+0.004
 nib+0.004
 perform+0.003
 slur+0.003
Neg logit contributions
aren-0.005
oys-0.005
inger-0.005
lynn-0.005
riet-0.005
Token\n
Token ID198
Feature activation-5.44e-03

Pos logit contributions
 fro+0.013
 Sp+0.012
 nib+0.012
 perform+0.011
 slur+0.011
Neg logit contributions
aren-0.021
inger-0.020
oys-0.020
lynn-0.019
athy-0.019
TokenOn
Token ID2202
Feature activation+6.00e-03

Pos logit contributions
 fro+0.009
 Sp+0.009
 nib+0.009
 slur+0.008
 sne+0.008
Neg logit contributions
aren-0.009
oys-0.009
inger-0.008
lynn-0.008
athy-0.008
Token the
Token ID262
Feature activation+2.16e-02

Pos logit contributions
aren+0.013
oys+0.013
inger+0.013
lynn+0.012
riet+0.012
Neg logit contributions
 fro-0.007
 Sp-0.006
 nib-0.006
 perform-0.006
 slur-0.005
Token way
Token ID835
Feature activation+2.58e-01

Pos logit contributions
aren+0.030
inger+0.030
oys+0.029
riet+0.027
lynn+0.027
Neg logit contributions
 fro-0.042
 nib-0.039
 Sp-0.039
 perform-0.038
 dream-0.038
Token home
Token ID1363
Feature activation+3.07e-02

Pos logit contributions
aren+0.587
inger+0.578
oys+0.576
riet+0.562
lynn+0.554
Neg logit contributions
 fro-0.261
 Sp-0.242
 nib-0.210
 perform-0.199
 dance-0.188
Token,
Token ID11
Feature activation+3.90e-03

Pos logit contributions
aren+0.068
inger+0.066
oys+0.066
lynn+0.065
riet+0.064
Neg logit contributions
 fro-0.034
 Sp-0.031
 nib-0.029
 perform-0.028
 slur-0.026
Token she
Token ID673
Feature activation-5.03e-02

Pos logit contributions
aren+0.007
oys+0.007
inger+0.007
lynn+0.007
riet+0.007
Neg logit contributions
 fro-0.005
 nib-0.005
 Sp-0.005
 slur-0.005
 perform-0.005
Token met
Token ID1138
Feature activation+5.38e-03

Pos logit contributions
 Sp+0.070
 fro+0.070
 perform+0.060
 nib+0.060
 slur+0.060
Neg logit contributions
aren-0.089
inger-0.087
athy-0.083
oys-0.082
riet-0.081
Token a
Token ID257
Feature activation-2.29e-02

Pos logit contributions
aren+0.012
oys+0.012
inger+0.012
lynn+0.012
riet+0.011
Neg logit contributions
 fro-0.006
 Sp-0.005
 nib-0.005
 perform-0.005
 slur-0.004
Token,
Token ID11
Feature activation+2.06e-02

Pos logit contributions
 fro+0.028
 Sp+0.026
 nib+0.025
 perform+0.024
 slur+0.024
Neg logit contributions
aren-0.044
oys-0.042
inger-0.042
lynn-0.040
riet-0.040
Token Jenny
Token ID28437
Feature activation+5.05e-02

Pos logit contributions
aren+0.037
oys+0.036
inger+0.035
lynn+0.034
riet+0.034
Neg logit contributions
 fro-0.030
 nib-0.028
 Sp-0.028
 slur-0.027
 perform-0.027
Token was
Token ID373
Feature activation+2.93e-02

Pos logit contributions
aren+0.097
inger+0.093
oys+0.092
athy+0.090
riet+0.090
Neg logit contributions
 fro-0.069
 Sp-0.066
 nib-0.061
 slur-0.058
 perform-0.058
Token on
Token ID319
Feature activation+5.07e-03

Pos logit contributions
aren+0.051
inger+0.050
oys+0.047
ady+0.047
athy+0.046
Neg logit contributions
 fro-0.043
 Sp-0.042
 nib-0.038
 slur-0.037
 perform-0.037
Token her
Token ID607
Feature activation+9.44e-03

Pos logit contributions
aren+0.010
inger+0.010
lynn+0.009
oys+0.009
athy+0.009
Neg logit contributions
 fro-0.007
 Sp-0.006
 nib-0.006
 slur-0.006
 perform-0.005
Token way
Token ID835
Feature activation+2.58e-01

Pos logit contributions
aren+0.015
oys+0.015
inger+0.015
lynn+0.014
riet+0.014
Neg logit contributions
 fro-0.016
 Sp-0.015
 nib-0.015
 perform-0.014
 slur-0.014
Token outside
Token ID2354
Feature activation-2.51e-02

Pos logit contributions
aren+0.609
inger+0.595
oys+0.593
athy+0.580
riet+0.579
Neg logit contributions
 fro-0.236
 Sp-0.215
 nib-0.201
 perform-0.194
 dance-0.188
Token when
Token ID618
Feature activation-3.94e-02

Pos logit contributions
 fro+0.037
 Sp+0.036
 nib+0.035
 slur+0.034
 gag+0.033
Neg logit contributions
aren-0.039
oys-0.039
 policy-0.037
inger-0.036
riet-0.035
Token she
Token ID673
Feature activation-4.65e-02

Pos logit contributions
 fro+0.043
 nib+0.040
 slur+0.039
 Sp+0.039
 perform+0.036
Neg logit contributions
aren-0.085
oys-0.081
lynn-0.079
inger-0.078
riet-0.078
Token discovered
Token ID5071
Feature activation-1.26e-02

Pos logit contributions
 Sp+0.071
 fro+0.067
 slur+0.065
 gag+0.065
 perform+0.062
Neg logit contributions
aren-0.076
athy-0.067
inger-0.067
riet-0.065
ady-0.065
Token a
Token ID257
Feature activation-2.14e-02

Pos logit contributions
 fro+0.014
 Sp+0.013
 nib+0.012
 perform+0.012
 slur+0.012
Neg logit contributions
aren-0.027
inger-0.027
oys-0.027
lynn-0.026
riet-0.025
Token
Token ID220
Feature activation+5.79e-02

Pos logit contributions
aren+0.123
inger+0.121
oys+0.121
lynn+0.117
riet+0.116
Neg logit contributions
 fro-0.048
 Sp-0.046
 nib-0.041
 perform-0.038
 slur-0.036
Token\n
Token ID198
Feature activation-1.45e-03

Pos logit contributions
aren+0.134
oys+0.132
inger+0.132
lynn+0.128
riet+0.126
Neg logit contributions
 fro-0.054
 Sp-0.052
 nib-0.047
 perform-0.043
 slur-0.041
Token\n
Token ID198
Feature activation+9.70e-04

Pos logit contributions
 fro+0.002
 nib+0.002
 Sp+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
aren-0.003
oys-0.003
inger-0.003
lynn-0.003
athy-0.003
TokenOn
Token ID2202
Feature activation+1.02e-02

Pos logit contributions
aren+0.002
oys+0.002
inger+0.002
lynn+0.001
athy+0.001
Neg logit contributions
 fro-0.002
 Sp-0.001
 nib-0.001
 perform-0.001
 slur-0.001
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
aren+0.022
oys+0.022
inger+0.022
lynn+0.021
athy+0.020
Neg logit contributions
 fro-0.012
 Sp-0.011
 nib-0.010
 perform-0.010
 slur-0.009
Token way
Token ID835
Feature activation+2.58e-01

Pos logit contributions
inger+0.037
aren+0.037
oys+0.036
riet+0.034
uster+0.034
Neg logit contributions
 fro-0.032
 Sp-0.030
 nib-0.029
 perform-0.029
 dream-0.029
Token to
Token ID284
Feature activation-2.01e-02

Pos logit contributions
aren+0.603
inger+0.601
oys+0.590
riet+0.581
athy+0.570
Neg logit contributions
 fro-0.245
 Sp-0.229
 nib-0.194
 perform-0.182
 dream-0.170
Token deliver
Token ID5203
Feature activation+6.38e-02

Pos logit contributions
 fro+0.021
 Sp+0.020
 nib+0.019
 perform+0.018
 slur+0.017
Neg logit contributions
aren-0.046
oys-0.045
inger-0.045
lynn-0.043
riet-0.043
Token the
Token ID262
Feature activation+1.30e-02

Pos logit contributions
inger+0.152
aren+0.148
oys+0.144
riet+0.143
uster+0.141
Neg logit contributions
 fro-0.067
 Sp-0.065
 nib-0.054
 soup-0.051
 o-0.050
Token pin
Token ID6757
Feature activation-4.93e-02

Pos logit contributions
inger+0.022
aren+0.022
oys+0.021
riet+0.020
lynn+0.020
Neg logit contributions
 fro-0.022
 Sp-0.020
 nib-0.020
 perform-0.019
 dream-0.019
Tokenew
Token ID413
Feature activation-2.58e-03

Pos logit contributions
 fro+0.052
 Sp+0.048
 perform+0.045
 nib+0.045
 dance+0.040
Neg logit contributions
aren-0.111
inger-0.107
oys-0.107
lynn-0.104
athy-0.104
Token said
Token ID531
Feature activation-2.26e-02

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.002
oys-0.002
inger-0.002
lynn-0.001
riet-0.001
Token.
Token ID13
Feature activation-1.82e-03

Pos logit contributions
 fro+0.024
 Sp+0.023
 nib+0.021
 perform+0.020
 slur+0.020
Neg logit contributions
aren-0.051
inger-0.050
oys-0.050
lynn-0.048
athy-0.048
Token "
Token ID366
Feature activation+3.61e-02

Pos logit contributions
 Sp+0.002
 fro+0.002
 nib+0.002
 slur+0.002
 perform+0.002
Neg logit contributions
inger-0.004
aren-0.004
oys-0.004
lynn-0.003
 policy-0.003
TokenThanks
Token ID9690
Feature activation+1.35e-01

Pos logit contributions
aren+0.061
oys+0.060
inger+0.060
lynn+0.057
riet+0.056
Neg logit contributions
 fro-0.058
 Sp-0.055
 nib-0.053
 perform-0.051
 slur-0.050
Token,"
Token ID553
Feature activation-5.12e-02

Pos logit contributions
oys+0.271
aren+0.270
inger+0.263
lynn+0.255
riet+0.251
Neg logit contributions
 fro-0.172
 Sp-0.158
 nib-0.156
 perform-0.148
 slur-0.146
Token Sp
Token ID1338
Feature activation-2.49e-01

Pos logit contributions
 fro+0.055
 Sp+0.049
 nib+0.049
 perform+0.047
 dream+0.043
Neg logit contributions
aren-0.111
oys-0.108
riet-0.106
inger-0.105
lynn-0.104
Tokenide
Token ID485
Feature activation-5.38e-02

Pos logit contributions
 fro+0.434
 nib+0.409
 perform+0.383
 Sp+0.374
 dream+0.373
Neg logit contributions
oys-0.427
inger-0.417
aren-0.398
lynn-0.373
ady-0.371
Tokeny
Token ID88
Feature activation-5.50e-02

Pos logit contributions
 fro+0.074
 Sp+0.068
 nib+0.067
 perform+0.066
 movies+0.063
Neg logit contributions
inger-0.099
oys-0.098
aren-0.097
riet-0.093
lynn-0.090
Token replied
Token ID8712
Feature activation-2.52e-02

Pos logit contributions
 fro+0.083
 Sp+0.077
 nib+0.073
 perform+0.072
 gag+0.067
Neg logit contributions
aren-0.095
oys-0.093
inger-0.093
riet-0.088
lynn-0.087
Token.
Token ID13
Feature activation-9.20e-03

Pos logit contributions
 fro+0.030
 Sp+0.026
 nib+0.025
 perform+0.024
 slur+0.023
Neg logit contributions
oys-0.052
aren-0.052
inger-0.051
lynn-0.049
riet-0.049
Token "
Token ID366
Feature activation+3.25e-02

Pos logit contributions
 fro+0.013
 Sp+0.013
 nib+0.012
 perform+0.011
 slur+0.011
Neg logit contributions
aren-0.017
inger-0.017
oys-0.017
lynn-0.016
riet-0.016
Token learned
Token ID4499
Feature activation+2.63e-02

Pos logit contributions
oys+0.160
aren+0.160
inger+0.155
lynn+0.150
uster+0.149
Neg logit contributions
 fro-0.131
 nib-0.128
 Sp-0.126
 perform-0.118
 sne-0.118
Token an
Token ID281
Feature activation+2.56e-03

Pos logit contributions
aren+0.065
oys+0.063
inger+0.063
athy+0.060
riet+0.059
Neg logit contributions
 fro-0.022
 Sp-0.022
 nib-0.019
 perform-0.019
 dream-0.017
Token important
Token ID1593
Feature activation+7.64e-02

Pos logit contributions
aren+0.004
oys+0.004
inger+0.004
athy+0.004
riet+0.004
Neg logit contributions
 Sp-0.004
 fro-0.004
 o-0.004
 nib-0.004
 dream-0.004
Token lesson
Token ID11483
Feature activation+7.22e-02

Pos logit contributions
aren+0.106
inger+0.100
oys+0.097
riet+0.094
athy+0.094
Neg logit contributions
 fro-0.144
 Sp-0.143
 nib-0.139
 perform-0.137
 dream-0.133
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
inger+0.180
aren+0.177
uster+0.171
athy+0.170
riet+0.166
Neg logit contributions
 Sp-0.058
 he-0.046
 fro-0.046
 perform-0.044
 nib-0.041
Token Sp
Token ID1338
Feature activation-2.49e-01

Pos logit contributions
 Sp+0.016
 fro+0.013
 nib+0.013
 perform+0.012
 slur+0.011
Neg logit contributions
inger-0.027
aren-0.027
oys-0.027
lynn-0.025
uster-0.025
Tokenoil
Token ID9437
Feature activation-6.56e-02

Pos logit contributions
 fro+0.398
 Sp+0.388
 nib+0.358
 perform+0.354
 slur+0.349
Neg logit contributions
aren-0.399
inger-0.380
lynn-0.372
oys-0.372
riet-0.370
Token it
Token ID340
Feature activation+9.34e-03

Pos logit contributions
 fro+0.093
 Sp+0.088
 perform+0.084
 nib+0.082
 slur+0.076
Neg logit contributions
aren-0.125
oys-0.118
inger-0.117
lynn-0.116
riet-0.112
Token and
Token ID290
Feature activation-3.52e-03

Pos logit contributions
aren+0.021
inger+0.020
oys+0.020
lynn+0.019
uster+0.019
Neg logit contributions
 fro-0.010
 Sp-0.010
 nib-0.009
 perform-0.009
 dream-0.008
Token you
Token ID345
Feature activation+1.64e-02

Pos logit contributions
 fro+0.004
 Sp+0.004
 nib+0.004
 perform+0.004
 slur+0.003
Neg logit contributions
aren-0.007
oys-0.007
inger-0.007
lynn-0.007
riet-0.007
Token will
Token ID481
Feature activation-6.91e-03

Pos logit contributions
aren+0.029
inger+0.028
oys+0.027
athy+0.026
lynn+0.026
Neg logit contributions
 fro-0.025
 Sp-0.023
 nib-0.023
 slur-0.021
 perform-0.021
Token fly
Token ID6129
Feature activation-6.94e-02

Pos logit contributions
aren+0.021
oys+0.020
inger+0.020
lynn+0.019
athy+0.019
Neg logit contributions
 Sp-0.017
 fro-0.017
 nib-0.016
 slur-0.015
 perform-0.015
Token got
Token ID1392
Feature activation-1.59e-02

Pos logit contributions
 Sp+0.089
 fro+0.087
 nib+0.083
 slur+0.081
 gag+0.077
Neg logit contributions
aren-0.132
inger-0.129
oys-0.124
lynn-0.120
athy-0.119
Token close
Token ID1969
Feature activation+9.28e-02

Pos logit contributions
 fro+0.021
 Sp+0.021
 perform+0.020
 slur+0.019
 nib+0.019
Neg logit contributions
aren-0.030
inger-0.029
oys-0.028
lynn-0.027
athy-0.027
Token enough
Token ID1576
Feature activation+5.20e-02

Pos logit contributions
oys+0.219
aren+0.217
inger+0.216
lynn+0.211
issors+0.209
Neg logit contributions
 fro-0.088
 nib-0.075
 Sp-0.075
 perform-0.064
 o-0.064
Token,
Token ID11
Feature activation-1.32e-02

Pos logit contributions
oys+0.129
aren+0.127
inger+0.127
lynn+0.124
riet+0.122
Neg logit contributions
 fro-0.048
 nib-0.042
 Sp-0.039
 grow-0.035
 o-0.034
Token Sp
Token ID1338
Feature activation-2.47e-01

Pos logit contributions
 fro+0.016
 nib+0.016
 Sp+0.015
 slur+0.015
 perform+0.015
Neg logit contributions
aren-0.025
oys-0.023
inger-0.023
riet-0.023
lynn-0.023
Tokenide
Token ID485
Feature activation-5.18e-02

Pos logit contributions
 fro+0.447
 nib+0.426
 perform+0.395
 slur+0.393
 Sp+0.393
Neg logit contributions
inger-0.396
oys-0.388
aren-0.371
ady-0.341
uster-0.340
Tokeny
Token ID88
Feature activation-4.99e-02

Pos logit contributions
 fro+0.074
 Sp+0.074
 nib+0.067
 perform+0.066
 slur+0.065
Neg logit contributions
inger-0.091
aren-0.087
oys-0.086
athy-0.080
riet-0.080
Token jumped
Token ID11687
Feature activation-3.13e-02

Pos logit contributions
 fro+0.068
 Sp+0.067
 nib+0.060
 perform+0.059
 gag+0.059
Neg logit contributions
aren-0.089
inger-0.089
oys-0.085
athy-0.083
riet-0.082
Token and
Token ID290
Feature activation+4.51e-03

Pos logit contributions
 fro+0.048
 Sp+0.047
 perform+0.044
 nib+0.042
 slur+0.042
Neg logit contributions
inger-0.051
oys-0.050
aren-0.049
uster-0.048
lynn-0.047
Token caught
Token ID4978
Feature activation-1.12e-02

Pos logit contributions
aren+0.007
inger+0.007
 courage+0.007
oys+0.007
riet+0.007
Neg logit contributions
 Sp-0.006
 movies-0.006
 gag-0.006
 fro-0.006
 perform-0.006
Token was
Token ID373
Feature activation+2.40e-03

Pos logit contributions
aren+0.041
oys+0.040
inger+0.039
lynn+0.037
athy+0.037
Neg logit contributions
 fro-0.024
 Sp-0.023
 nib-0.022
 slur-0.021
 perform-0.021
Token a
Token ID257
Feature activation-2.66e-02

Pos logit contributions
aren+0.005
oys+0.005
inger+0.005
lynn+0.005
athy+0.005
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 perform-0.002
 slur-0.002
Token little
Token ID1310
Feature activation+5.13e-02

Pos logit contributions
 fro+0.048
 Sp+0.047
 nib+0.045
 slur+0.044
 perform+0.044
Neg logit contributions
aren-0.036
oys-0.034
inger-0.033
athy-0.032
lynn-0.032
Token spider
Token ID19230
Feature activation-1.59e-01

Pos logit contributions
aren+0.080
oys+0.077
inger+0.075
athy+0.072
riet+0.072
Neg logit contributions
 fro-0.084
 Sp-0.083
 nib-0.079
 perform-0.078
 slur-0.077
Token named
Token ID3706
Feature activation+3.05e-02

Pos logit contributions
 fro+0.264
 Sp+0.246
 slur+0.230
 perform+0.224
 nib+0.223
Neg logit contributions
aren-0.249
inger-0.241
oys-0.234
athy-0.233
 policy-0.221
Token Sp
Token ID1338
Feature activation-2.46e-01

Pos logit contributions
aren+0.063
oys+0.062
inger+0.062
lynn+0.060
riet+0.059
Neg logit contributions
 fro-0.037
 Sp-0.035
 nib-0.034
 perform-0.032
 slur-0.031
Tokenide
Token ID485
Feature activation-5.22e-02

Pos logit contributions
 fro+0.412
 nib+0.383
 slur+0.353
 Sp+0.343
 dream+0.342
Neg logit contributions
inger-0.444
oys-0.427
aren-0.417
ady-0.394
athy-0.393
Tokeny
Token ID88
Feature activation-4.58e-02

Pos logit contributions
 fro+0.099
 slur+0.093
 perform+0.091
 nib+0.091
 Sp+0.089
Neg logit contributions
oys-0.066
inger-0.065
ace-0.061
aren-0.061
ady-0.055
Token.
Token ID13
Feature activation-6.20e-03

Pos logit contributions
 fro+0.065
 slur+0.058
 nib+0.058
 Sp+0.057
 perform+0.057
Neg logit contributions
aren-0.083
oys-0.082
inger-0.081
athy-0.077
riet-0.076
Token Sp
Token ID1338
Feature activation-2.39e-01

Pos logit contributions
bel+0.005
 fro+0.005
 perform+0.005
 sne+0.005
 gag+0.004
Neg logit contributions
oys-0.013
aren-0.013
uster-0.013
lynn-0.012
ruff-0.012
Tokenide
Token ID485
Feature activation-3.65e-02

Pos logit contributions
 fro+0.337
 nib+0.306
 Sp+0.292
 perform+0.286
 slur+0.281
Neg logit contributions
inger-0.477
oys-0.469
aren-0.468
athy-0.436
lynn-0.431
Token From
Token ID3574
Feature activation+8.36e-02

Pos logit contributions
 fro+0.001
 nib+0.001
 perform+0.001
 sne+0.001
 slur+0.001
Neg logit contributions
aren-0.001
oys-0.001
inger-0.001
lynn-0.001
uster-0.001
Token that
Token ID326
Feature activation+1.09e-02

Pos logit contributions
aren+0.178
inger+0.174
oys+0.173
lynn+0.168
riet+0.164
Neg logit contributions
 fro-0.098
 Sp-0.092
 nib-0.088
 perform-0.083
 slur-0.082
Token day
Token ID1110
Feature activation-4.97e-02

Pos logit contributions
aren+0.019
inger+0.018
oys+0.018
lynn+0.018
athy+0.017
Neg logit contributions
 fro-0.017
 Sp-0.016
 nib-0.015
 slur-0.015
 perform-0.014
Token on
Token ID319
Feature activation-1.32e-02

Pos logit contributions
 fro+0.044
 Sp+0.043
 nib+0.038
 perform+0.035
 slur+0.032
Neg logit contributions
aren-0.122
inger-0.118
oys-0.118
lynn-0.113
riet-0.113
Token,
Token ID11
Feature activation-1.10e-02

Pos logit contributions
 fro+0.016
 Sp+0.015
 nib+0.014
 perform+0.014
 slur+0.013
Neg logit contributions
aren-0.028
oys-0.027
inger-0.027
lynn-0.026
riet-0.026
Token Sp
Token ID1338
Feature activation-2.45e-01

Pos logit contributions
 gag+0.010
 fro+0.008
 simmer+0.008
 nib+0.007
 bloom+0.007
Neg logit contributions
oys-0.021
 courage-0.020
 policy-0.020
aren-0.020
lynn-0.020
Tokenide
Token ID485
Feature activation-4.22e-02

Pos logit contributions
 fro+0.351
 nib+0.323
 perform+0.305
 Sp+0.304
 slur+0.291
Neg logit contributions
inger-0.482
oys-0.477
aren-0.469
athy-0.438
ady-0.434
Tokeny
Token ID88
Feature activation-4.74e-02

Pos logit contributions
 fro+0.072
 Sp+0.069
 nib+0.069
 slur+0.066
 perform+0.065
Neg logit contributions
inger-0.065
oys-0.061
aren-0.058
ace-0.056
athy-0.055
Token always
Token ID1464
Feature activation+8.07e-03

Pos logit contributions
 fro+0.068
 Sp+0.063
 nib+0.061
 slur+0.057
 gag+0.057
Neg logit contributions
inger-0.085
aren-0.083
oys-0.082
athy-0.079
lynn-0.078
Token remembered
Token ID12086
Feature activation+1.32e-02

Pos logit contributions
aren+0.017
oys+0.016
inger+0.016
uster+0.016
lynn+0.016
Neg logit contributions
 fro-0.011
 nib-0.010
 Sp-0.010
 perform-0.010
 sne-0.009
Token to
Token ID284
Feature activation-4.02e-02

Pos logit contributions
aren+0.028
oys+0.027
inger+0.027
lynn+0.026
riet+0.026
Neg logit contributions
 fro-0.015
 Sp-0.014
 nib-0.014
 perform-0.013
 slur-0.013
Token bought
Token ID5839
Feature activation+9.56e-02

Pos logit contributions
aren+0.060
inger+0.055
oys+0.055
lynn+0.053
athy+0.051
Neg logit contributions
 fro-0.055
 Sp-0.051
 nib-0.047
 slur-0.046
 soup-0.046
Token us
Token ID514
Feature activation+9.00e-02

Pos logit contributions
aren+0.194
oys+0.189
inger+0.187
lynn+0.182
riet+0.180
Neg logit contributions
 fro-0.119
 Sp-0.110
 nib-0.108
 perform-0.102
 slur-0.100
Token a
Token ID257
Feature activation-2.40e-02

Pos logit contributions
aren+0.167
oys+0.165
inger+0.161
lynn+0.156
riet+0.152
Neg logit contributions
 fro-0.120
 nib-0.115
 Sp-0.112
 slur-0.107
 gag-0.104
Token treat
Token ID2190
Feature activation+2.26e-03

Pos logit contributions
 fro+0.040
 Sp+0.038
 nib+0.037
 perform+0.035
 slur+0.035
Neg logit contributions
aren-0.039
oys-0.038
inger-0.038
lynn-0.036
riet-0.035
Token.
Token ID13
Feature activation-1.02e-02

Pos logit contributions
aren+0.004
oys+0.004
inger+0.004
lynn+0.004
 policy+0.004
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 slur-0.003
 perform-0.002
Token Sp
Token ID1338
Feature activation-2.44e-01

Pos logit contributions
 fro+0.016
 Sp+0.015
 nib+0.014
 perform+0.014
 slur+0.013
Neg logit contributions
inger-0.018
aren-0.018
oys-0.017
lynn-0.017
athy-0.016
Tokenaghetti
Token ID35812
Feature activation-1.68e-01

Pos logit contributions
 fro+0.398
 Sp+0.375
 nib+0.371
 perform+0.349
 slur+0.345
Neg logit contributions
aren-0.418
oys-0.412
inger-0.410
lynn-0.384
athy-0.376
Token!"
Token ID2474
Feature activation-1.35e-02

Pos logit contributions
 fro+0.260
 Sp+0.245
 nib+0.235
 slur+0.226
 perform+0.221
Neg logit contributions
oys-0.288
aren-0.286
inger-0.279
lynn-0.264
 policy-0.262
Token he
Token ID339
Feature activation-7.03e-02

Pos logit contributions
 fro+0.020
 nib+0.019
cast+0.019
 perform+0.019
 gag+0.018
Neg logit contributions
aren-0.023
oys-0.021
riet-0.021
 Stephanie-0.020
athy-0.019
Token said
Token ID531
Feature activation-1.58e-02

Pos logit contributions
 fro+0.115
 Sp+0.107
atmeal+0.104
cast+0.101
 perform+0.101
Neg logit contributions
aren-0.100
oys-0.095
riet-0.093
inger-0.090
 fore-0.088
Token.
Token ID13
Feature activation+8.59e-04

Pos logit contributions
 fro+0.021
 Sp+0.019
 nib+0.018
 perform+0.017
 slur+0.017
Neg logit contributions
oys-0.030
aren-0.030
inger-0.030
lynn-0.028
riet-0.028
Token swoop
Token ID38527
Feature activation+5.05e-03

Pos logit contributions
aren+0.011
inger+0.010
riet+0.010
oys+0.010
athy+0.010
Neg logit contributions
 fro-0.011
 perform-0.011
 movies-0.011
 breakfast-0.010
 gag-0.010
Tokened
Token ID276
Feature activation-2.86e-02

Pos logit contributions
aren+0.008
oys+0.008
inger+0.008
lynn+0.007
riet+0.007
Neg logit contributions
 fro-0.009
 Sp-0.008
 nib-0.008
 perform-0.008
 slur-0.008
Token down
Token ID866
Feature activation-3.55e-02

Pos logit contributions
 fro+0.040
 Sp+0.038
 perform+0.037
 nib+0.036
 slur+0.034
Neg logit contributions
aren-0.051
oys-0.050
inger-0.050
lynn-0.048
riet-0.047
Token to
Token ID284
Feature activation-3.91e-02

Pos logit contributions
 fro+0.045
 Sp+0.041
 nib+0.040
 perform+0.039
 slur+0.037
Neg logit contributions
aren-0.070
oys-0.069
inger-0.068
lynn-0.065
 policy-0.064
Token catch
Token ID4929
Feature activation-6.61e-02

Pos logit contributions
 fro+0.049
 gag+0.047
 Sp+0.046
bel+0.044
cast+0.044
Neg logit contributions
aren-0.068
oys-0.065
lynn-0.064
inger-0.064
riet-0.061
Token Sp
Token ID1338
Feature activation-2.44e-01

Pos logit contributions
 gag+0.089
atmeal+0.088
 fro+0.086
 dream+0.084
 perform+0.083
Neg logit contributions
oys-0.099
 courage-0.097
aren-0.096
uster-0.095
inger-0.095
Tokenide
Token ID485
Feature activation-4.28e-02

Pos logit contributions
 fro+0.335
 nib+0.306
 Sp+0.304
 slur+0.281
 perform+0.280
Neg logit contributions
inger-0.493
aren-0.479
oys-0.476
athy-0.450
lynn-0.446
Tokeny
Token ID88
Feature activation-4.97e-02

Pos logit contributions
 fro+0.068
 slur+0.064
 Sp+0.064
 nib+0.064
 perform+0.063
Neg logit contributions
oys-0.067
inger-0.067
aren-0.065
lynn-0.060
uster-0.060
Token.
Token ID13
Feature activation+4.33e-04

Pos logit contributions
 fro+0.069
 Sp+0.066
 nib+0.060
 perform+0.060
 slur+0.060
Neg logit contributions
oys-0.085
aren-0.084
inger-0.083
 policy-0.079
riet-0.078
Token Sp
Token ID1338
Feature activation-2.40e-01

Pos logit contributions
aren+0.001
oys+0.001
inger+0.001
uster+0.001
lynn+0.001
Neg logit contributions
 fro-0.001
 perform-0.001
 nib-0.001
 gag-0.001
 dream-0.001
Tokenide
Token ID485
Feature activation-3.74e-02

Pos logit contributions
 fro+0.309
 nib+0.279
 Sp+0.277
 perform+0.264
 slur+0.251
Neg logit contributions
aren-0.497
inger-0.496
oys-0.493
athy-0.460
lynn-0.460
Token to
Token ID284
Feature activation-2.75e-02

Pos logit contributions
aren+0.338
inger+0.334
athy+0.327
oys+0.325
lynn+0.322
Neg logit contributions
 fro-0.099
 nib-0.093
 Sp-0.092
 perform-0.081
 sne-0.078
Token go
Token ID467
Feature activation-5.02e-02

Pos logit contributions
 fro+0.028
 Sp+0.026
 nib+0.024
 perform+0.023
 slur+0.022
Neg logit contributions
aren-0.063
oys-0.062
inger-0.062
lynn-0.060
riet-0.059
Token back
Token ID736
Feature activation+4.62e-02

Pos logit contributions
 fro+0.074
 Sp+0.069
 nib+0.067
 perform+0.065
 dream+0.060
Neg logit contributions
aren-0.092
inger-0.090
oys-0.090
lynn-0.086
riet-0.086
Token home
Token ID1363
Feature activation+2.00e-02

Pos logit contributions
aren+0.088
oys+0.087
inger+0.087
lynn+0.083
riet+0.082
Neg logit contributions
 fro-0.064
 Sp-0.060
 nib-0.058
 perform-0.056
 slur-0.052
Token,
Token ID11
Feature activation-3.83e-03

Pos logit contributions
aren+0.045
inger+0.045
oys+0.044
lynn+0.044
athy+0.043
Neg logit contributions
 fro-0.021
 Sp-0.019
 nib-0.018
 perform-0.017
 slur-0.016
Token Sp
Token ID1338
Feature activation-2.44e-01

Pos logit contributions
 fro+0.005
 Sp+0.005
 nib+0.005
 perform+0.004
 slur+0.004
Neg logit contributions
aren-0.008
oys-0.007
inger-0.007
lynn-0.007
riet-0.007
Tokenide
Token ID485
Feature activation-4.37e-02

Pos logit contributions
 fro+0.391
 nib+0.367
 Sp+0.334
 perform+0.333
 slur+0.325
Neg logit contributions
oys-0.456
inger-0.449
aren-0.420
ady-0.404
lynn-0.392
Tokeny
Token ID88
Feature activation-4.47e-02

Pos logit contributions
 fro+0.065
 Sp+0.063
 nib+0.059
 perform+0.058
 slur+0.057
Neg logit contributions
inger-0.076
aren-0.075
oys-0.075
riet-0.069
lynn-0.068
Token and
Token ID290
Feature activation+9.46e-03

Pos logit contributions
 fro+0.061
 Sp+0.059
 nib+0.055
 perform+0.053
 slur+0.052
Neg logit contributions
aren-0.085
inger-0.082
oys-0.082
lynn-0.078
riet-0.078
Token Annie
Token ID25860
Feature activation-1.97e-03

Pos logit contributions
aren+0.017
oys+0.017
riet+0.016
inger+0.016
lynn+0.015
Neg logit contributions
 fro-0.011
 nib-0.011
 Sp-0.010
 perform-0.010
 slur-0.010
Token said
Token ID531
Feature activation-1.16e-02

Pos logit contributions
 fro+0.003
 Sp+0.003
 nib+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
aren-0.004
inger-0.003
oys-0.003
riet-0.003
lynn-0.003
Token big
Token ID1263
Feature activation+1.69e-02

Pos logit contributions
 fro+0.045
 Sp+0.044
 nib+0.043
 slur+0.041
 perform+0.041
Neg logit contributions
aren-0.040
oys-0.038
inger-0.038
lynn-0.036
athy-0.035
Token fly
Token ID6129
Feature activation-6.24e-02

Pos logit contributions
aren+0.030
oys+0.029
inger+0.029
lynn+0.028
athy+0.027
Neg logit contributions
 fro-0.024
 Sp-0.024
 nib-0.023
 perform-0.022
 slur-0.022
Token came
Token ID1625
Feature activation-9.50e-03

Pos logit contributions
 fro+0.074
 Sp+0.070
 nib+0.067
 perform+0.062
 slur+0.061
Neg logit contributions
aren-0.131
inger-0.128
oys-0.128
lynn-0.123
riet-0.120
Token by
Token ID416
Feature activation-1.43e-03

Pos logit contributions
 fro+0.011
 Sp+0.010
 nib+0.010
 perform+0.009
 slur+0.008
Neg logit contributions
aren-0.021
oys-0.021
inger-0.020
riet-0.020
lynn-0.020
Token and
Token ID290
Feature activation+1.49e-02

Pos logit contributions
 fro+0.002
 Sp+0.002
 nib+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
oys-0.003
aren-0.003
inger-0.003
lynn-0.003
 policy-0.002
Token Sp
Token ID1338
Feature activation-2.43e-01

Pos logit contributions
aren+0.029
oys+0.027
inger+0.027
riet+0.026
lynn+0.026
Neg logit contributions
 fro-0.018
 Sp-0.018
 nib-0.017
 slur-0.017
 gag-0.016
Tokenide
Token ID485
Feature activation-4.65e-02

Pos logit contributions
 fro+0.407
 nib+0.387
 Sp+0.369
 slur+0.354
 perform+0.349
Neg logit contributions
inger-0.429
oys-0.412
aren-0.408
lynn-0.381
athy-0.374
Tokeny
Token ID88
Feature activation-4.53e-02

Pos logit contributions
 fro+0.071
 Sp+0.070
 nib+0.065
 slur+0.064
 perform+0.064
Neg logit contributions
inger-0.076
aren-0.074
oys-0.072
riet-0.066
athy-0.066
Token caught
Token ID4978
Feature activation-1.31e-03

Pos logit contributions
 fro+0.060
 Sp+0.058
 nib+0.053
 slur+0.053
 gag+0.051
Neg logit contributions
aren-0.084
inger-0.081
oys-0.080
riet-0.076
athy-0.076
Token it
Token ID340
Feature activation+1.87e-02

Pos logit contributions
 fro+0.002
 perform+0.002
 slur+0.002
 nib+0.002
 dream+0.001
Neg logit contributions
inger-0.002
oys-0.002
aren-0.002
lynn-0.002
uster-0.002
Token in
Token ID287
Feature activation-3.71e-03

Pos logit contributions
aren+0.038
inger+0.037
oys+0.037
lynn+0.036
riet+0.035
Neg logit contributions
 fro-0.023
 Sp-0.021
 nib-0.020
 perform-0.019
 slur-0.019
Token From
Token ID3574
Feature activation+7.71e-02

Pos logit contributions
 fro+0.020
 perform+0.018
 nib+0.018
 sne+0.017
 slur+0.017
Neg logit contributions
aren-0.022
oys-0.021
inger-0.021
lynn-0.020
riet-0.020
Token that
Token ID326
Feature activation+7.06e-03

Pos logit contributions
aren+0.175
inger+0.173
oys+0.173
lynn+0.165
uster+0.164
Neg logit contributions
 fro-0.081
 Sp-0.075
 nib-0.070
 perform-0.065
 slur-0.061
Token day
Token ID1110
Feature activation-4.95e-02

Pos logit contributions
aren+0.014
oys+0.013
inger+0.013
lynn+0.013
riet+0.013
Neg logit contributions
 fro-0.010
 Sp-0.009
 nib-0.009
 perform-0.008
 slur-0.008
Token on
Token ID319
Feature activation-1.54e-02

Pos logit contributions
 Sp+0.040
 fro+0.040
 nib+0.035
 perform+0.033
 dream+0.029
Neg logit contributions
aren-0.124
inger-0.119
oys-0.118
riet-0.115
lynn-0.114
Token,
Token ID11
Feature activation-1.16e-02

Pos logit contributions
 fro+0.019
 Sp+0.017
 nib+0.017
 perform+0.016
 slur+0.016
Neg logit contributions
aren-0.032
oys-0.032
inger-0.032
lynn-0.030
riet-0.030
Token Sp
Token ID1338
Feature activation-2.43e-01

Pos logit contributions
 gag+0.010
 fro+0.009
 nib+0.008
 simmer+0.007
 slur+0.007
Neg logit contributions
 policy-0.023
 courage-0.023
oys-0.023
 vain-0.022
lynn-0.022
Tokenide
Token ID485
Feature activation-4.36e-02

Pos logit contributions
 fro+0.290
 Sp+0.260
 nib+0.259
 perform+0.244
 slur+0.235
Neg logit contributions
aren-0.520
inger-0.518
oys-0.518
lynn-0.488
athy-0.484
Tokeny
Token ID88
Feature activation-5.26e-02

Pos logit contributions
 fro+0.069
 Sp+0.065
 nib+0.065
 slur+0.062
 perform+0.062
Neg logit contributions
inger-0.073
oys-0.072
aren-0.071
riet-0.066
lynn-0.066
Token knew
Token ID2993
Feature activation-7.91e-03

Pos logit contributions
 fro+0.079
 Sp+0.073
 nib+0.071
 perform+0.067
 slur+0.067
Neg logit contributions
aren-0.093
inger-0.092
oys-0.091
lynn-0.087
athy-0.086
Token that
Token ID326
Feature activation+6.76e-03

Pos logit contributions
 fro+0.008
 Sp+0.008
 nib+0.007
 perform+0.006
 slur+0.006
Neg logit contributions
aren-0.018
inger-0.018
oys-0.018
riet-0.017
lynn-0.017
Token even
Token ID772
Feature activation-1.18e-01

Pos logit contributions
oys+0.013
 policy+0.013
aren+0.013
lynn+0.012
inger+0.012
Neg logit contributions
 fro-0.009
 nib-0.007
 Sp-0.007
 slur-0.007
 perform-0.007
Token back
Token ID736
Feature activation+4.15e-02

Pos logit contributions
 fro+0.062
 Sp+0.058
 nib+0.057
 perform+0.054
 slur+0.053
Neg logit contributions
aren-0.076
oys-0.074
inger-0.074
lynn-0.071
athy-0.069
Token to
Token ID284
Feature activation-3.67e-02

Pos logit contributions
aren+0.075
oys+0.073
inger+0.073
lynn+0.070
riet+0.068
Neg logit contributions
 fro-0.062
 Sp-0.059
 nib-0.057
 perform-0.054
 slur-0.053
Token their
Token ID511
Feature activation+4.66e-02

Pos logit contributions
 fro+0.048
 Sp+0.044
 nib+0.042
bel+0.041
 gag+0.040
Neg logit contributions
aren-0.067
lynn-0.063
riet-0.063
oys-0.062
athy-0.061
Token homes
Token ID5682
Feature activation-1.62e-02

Pos logit contributions
aren+0.079
oys+0.077
inger+0.076
lynn+0.073
athy+0.072
Neg logit contributions
 fro-0.072
 Sp-0.069
 nib-0.066
 perform-0.064
 slur-0.063
Token.
Token ID13
Feature activation+2.20e-03

Pos logit contributions
 fro+0.018
 Sp+0.017
 nib+0.016
 perform+0.015
 slur+0.015
Neg logit contributions
aren-0.035
oys-0.034
inger-0.034
lynn-0.033
riet-0.032
Token Sp
Token ID1338
Feature activation-2.43e-01

Pos logit contributions
aren+0.004
oys+0.004
riet+0.003
inger+0.003
athy+0.003
Neg logit contributions
 fro-0.003
 nib-0.003
 slur-0.003
 perform-0.003
 Sp-0.003
Tokenide
Token ID485
Feature activation-4.02e-02

Pos logit contributions
 fro+0.322
 nib+0.289
 Sp+0.286
 perform+0.269
 slur+0.258
Neg logit contributions
aren-0.497
oys-0.496
inger-0.494
athy-0.462
lynn-0.458
Tokeny
Token ID88
Feature activation-4.56e-02

Pos logit contributions
 fro+0.064
 Sp+0.062
 nib+0.060
 perform+0.058
 slur+0.057
Neg logit contributions
inger-0.064
aren-0.064
oys-0.063
riet-0.059
athy-0.058
Token was
Token ID373
Feature activation+4.53e-03

Pos logit contributions
 fro+0.061
 Sp+0.058
 nib+0.055
 perform+0.052
 slur+0.051
Neg logit contributions
aren-0.088
inger-0.085
oys-0.085
lynn-0.081
riet-0.081
Token so
Token ID523
Feature activation-1.73e-02

Pos logit contributions
aren+0.009
inger+0.009
oys+0.009
lynn+0.008
riet+0.008
Neg logit contributions
 fro-0.006
 Sp-0.006
 nib-0.005
 perform-0.005
 slur-0.005
Token happy
Token ID3772
Feature activation-6.45e-02

Pos logit contributions
 fro+0.024
 Sp+0.023
 nib+0.022
 perform+0.021
 slur+0.020
Neg logit contributions
aren-0.033
inger-0.032
oys-0.031
riet-0.030
lynn-0.030
Token try
Token ID1949
Feature activation+5.13e-02

Pos logit contributions
 fro+0.044
 nib+0.041
 Sp+0.040
 perform+0.040
 slur+0.038
Neg logit contributions
aren-0.074
inger-0.071
oys-0.071
lynn-0.069
riet-0.068
Token new
Token ID649
Feature activation+1.11e-01

Pos logit contributions
oys+0.099
inger+0.099
aren+0.098
lynn+0.094
 policy+0.092
Neg logit contributions
 fro-0.066
 Sp-0.060
 nib-0.058
 perform-0.056
 slur-0.054
Token things
Token ID1243
Feature activation+3.31e-02

Pos logit contributions
oys+0.163
inger+0.154
 policy+0.151
 courage+0.149
 attention+0.144
Neg logit contributions
 fro-0.184
 Sp-0.175
 nib-0.166
 perform-0.165
 slur-0.153
Token sometimes
Token ID3360
Feature activation-7.11e-02

Pos logit contributions
aren+0.070
oys+0.069
inger+0.069
lynn+0.066
riet+0.065
Neg logit contributions
 fro-0.039
 Sp-0.036
 nib-0.035
 perform-0.033
 slur-0.032
Token.
Token ID13
Feature activation+2.63e-03

Pos logit contributions
 fro+0.085
 Sp+0.080
 nib+0.076
 perform+0.071
 slur+0.070
Neg logit contributions
aren-0.149
oys-0.147
inger-0.146
lynn-0.140
riet-0.138
Token Sp
Token ID1338
Feature activation-2.42e-01

Pos logit contributions
aren+0.004
inger+0.004
oys+0.004
lynn+0.004
riet+0.004
Neg logit contributions
 fro-0.004
 Sp-0.004
 nib-0.004
 perform-0.004
 slur-0.004
Tokenicy
Token ID4611
Feature activation-1.29e-01

Pos logit contributions
 fro+0.403
 nib+0.377
 Sp+0.374
 perform+0.345
 dream+0.340
Neg logit contributions
oys-0.426
inger-0.414
aren-0.403
lynn-0.380
 policy-0.371
Token food
Token ID2057
Feature activation-2.77e-02

Pos logit contributions
 fro+0.188
 Sp+0.176
 nib+0.174
 perform+0.162
 slur+0.162
Neg logit contributions
oys-0.233
aren-0.231
inger-0.227
lynn-0.215
riet-0.211
Token is
Token ID318
Feature activation+2.25e-02

Pos logit contributions
 fro+0.039
 Sp+0.037
 nib+0.035
 perform+0.034
 slur+0.033
Neg logit contributions
aren-0.052
inger-0.051
oys-0.051
lynn-0.049
riet-0.048
Token good
Token ID922
Feature activation+3.95e-02

Pos logit contributions
aren+0.042
oys+0.040
inger+0.040
lynn+0.039
riet+0.038
Neg logit contributions
 fro-0.033
 Sp-0.032
 nib-0.031
 perform-0.029
 slur-0.028
Token for
Token ID329
Feature activation-4.30e-02

Pos logit contributions
aren+0.095
inger+0.092
lynn+0.090
athy+0.088
oys+0.087
Neg logit contributions
 fro-0.037
 Sp-0.036
 nib-0.034
 perform-0.032
 slur-0.030
TokenNo
Token ID2949
Feature activation-5.38e-03

Pos logit contributions
aren+0.061
oys+0.060
inger+0.060
lynn+0.057
riet+0.056
Neg logit contributions
 fro-0.059
 Sp-0.056
 nib-0.055
 perform-0.052
 slur-0.051
Token,
Token ID11
Feature activation-4.02e-03

Pos logit contributions
 Sp+0.005
 fro+0.004
 sne+0.004
 tea+0.004
 nib+0.004
Neg logit contributions
inger-0.014
aren-0.013
lynn-0.013
riet-0.013
athy-0.013
Token sweet
Token ID6029
Feature activation+2.00e-02

Pos logit contributions
 fro+0.004
 Sp+0.004
 perform+0.003
 sne+0.003
 o+0.003
Neg logit contributions
inger-0.010
aren-0.010
riet-0.009
ady-0.009
athy-0.009
Tokenie
Token ID494
Feature activation-1.01e-01

Pos logit contributions
aren+0.042
oys+0.041
inger+0.041
lynn+0.039
riet+0.039
Neg logit contributions
 fro-0.024
 Sp-0.023
 nib-0.022
 perform-0.020
 slur-0.020
Token.
Token ID13
Feature activation-1.19e-02

Pos logit contributions
 fro+0.091
 Sp+0.091
 nib+0.078
 perform+0.075
 sne+0.069
Neg logit contributions
aren-0.239
inger-0.234
oys-0.230
lynn-0.230
uster-0.226
Token Sp
Token ID1338
Feature activation-2.42e-01

Pos logit contributions
 Sp+0.019
 fro+0.018
 nib+0.016
 perform+0.016
 Bee+0.015
Neg logit contributions
inger-0.021
aren-0.021
lynn-0.020
oys-0.020
riet-0.020
Tokeniders
Token ID4157
Feature activation-9.89e-02

Pos logit contributions
 fro+0.302
 Sp+0.295
 nib+0.266
 perform+0.262
 slur+0.255
Neg logit contributions
aren-0.476
inger-0.463
oys-0.453
lynn-0.447
riet-0.446
Token can
Token ID460
Feature activation-2.92e-02

Pos logit contributions
 fro+0.115
 Sp+0.107
 nib+0.104
 perform+0.101
 slur+0.095
Neg logit contributions
aren-0.211
inger-0.207
oys-0.207
lynn-0.200
riet-0.195
Token be
Token ID307
Feature activation-5.05e-03

Pos logit contributions
 fro+0.043
 Sp+0.041
 nib+0.039
 slur+0.037
 perform+0.036
Neg logit contributions
inger-0.051
aren-0.051
oys-0.051
lynn-0.048
athy-0.047
Token dangerous
Token ID4923
Feature activation+3.65e-02

Pos logit contributions
 fro+0.007
 Sp+0.007
 nib+0.007
 perform+0.006
 sne+0.006
Neg logit contributions
aren-0.010
oys-0.009
inger-0.009
lynn-0.009
riet-0.009
Token if
Token ID611
Feature activation-3.27e-03

Pos logit contributions
aren+0.079
inger+0.077
oys+0.076
lynn+0.074
riet+0.072
Neg logit contributions
 fro-0.042
 Sp-0.040
 nib-0.038
 perform-0.036
 slur-0.034
Token are
Token ID389
Feature activation-1.97e-02

Pos logit contributions
aren+0.057
oys+0.056
inger+0.056
lynn+0.053
riet+0.053
Neg logit contributions
 fro-0.035
 Sp-0.033
 nib-0.032
 perform-0.030
 slur-0.029
Token hard
Token ID1327
Feature activation+3.67e-02

Pos logit contributions
 fro+0.028
 nib+0.026
 Sp+0.026
 perform+0.024
 grow+0.024
Neg logit contributions
aren-0.039
oys-0.037
inger-0.036
lynn-0.036
athy-0.035
Token."
Token ID526
Feature activation-3.35e-02

Pos logit contributions
aren+0.084
inger+0.080
lynn+0.079
oys+0.078
athy+0.076
Neg logit contributions
 fro-0.038
 Sp-0.037
 nib-0.035
 perform-0.034
 sne-0.032
Token So
Token ID1406
Feature activation+3.09e-03

Pos logit contributions
 fro+0.058
 perform+0.054
 nib+0.054
 sne+0.053
 slur+0.052
Neg logit contributions
aren-0.047
oys-0.046
inger-0.041
lynn-0.041
riet-0.041
Token,
Token ID11
Feature activation-4.93e-03

Pos logit contributions
oys+0.005
aren+0.005
 policy+0.005
 Stephanie+0.005
lynn+0.005
Neg logit contributions
 fro-0.004
 gag-0.004
 nib-0.004
atmeal-0.004
 slur-0.004
Token Sp
Token ID1338
Feature activation-2.42e-01

Pos logit contributions
atmeal+0.006
 gag+0.006
cast+0.006
 fro+0.006
aghetti+0.005
Neg logit contributions
 Stephanie-0.007
 courage-0.007
oys-0.007
aren-0.007
 Alison-0.006
Tokenide
Token ID485
Feature activation-4.05e-02

Pos logit contributions
 fro+0.321
 nib+0.285
 Sp+0.280
 perform+0.272
 slur+0.260
Neg logit contributions
inger-0.496
oys-0.495
aren-0.486
lynn-0.458
athy-0.455
Tokeny
Token ID88
Feature activation-4.99e-02

Pos logit contributions
 fro+0.068
 Sp+0.066
 nib+0.063
 perform+0.061
 slur+0.061
Neg logit contributions
inger-0.064
oys-0.063
aren-0.061
riet-0.057
lynn-0.057
Token tried
Token ID3088
Feature activation+5.63e-02

Pos logit contributions
 fro+0.074
 Sp+0.071
 nib+0.065
 perform+0.063
 slur+0.062
Neg logit contributions
aren-0.087
inger-0.086
oys-0.085
lynn-0.080
riet-0.080
Token again
Token ID757
Feature activation-1.23e-01

Pos logit contributions
aren+0.128
inger+0.126
oys+0.125
lynn+0.120
riet+0.119
Neg logit contributions
 fro-0.059
 Sp-0.055
 nib-0.053
 perform-0.048
 slur-0.046
Token and
Token ID290
Feature activation+1.48e-03

Pos logit contributions
 fro+0.154
 Sp+0.145
 nib+0.139
 perform+0.131
 slur+0.128
Neg logit contributions
aren-0.252
oys-0.247
inger-0.246
lynn-0.237
riet-0.232
Token to
Token ID284
Feature activation-2.60e-02

Pos logit contributions
aren+0.103
oys+0.103
inger+0.101
lynn+0.097
 policy+0.096
Neg logit contributions
 fro-0.081
 Sp-0.077
 nib-0.073
 perform-0.070
 slur-0.068
Token leave
Token ID2666
Feature activation+5.71e-02

Pos logit contributions
 fro+0.032
 Sp+0.031
 nib+0.028
 perform+0.027
 slur+0.026
Neg logit contributions
aren-0.053
inger-0.052
oys-0.052
lynn-0.050
athy-0.049
Token it
Token ID340
Feature activation+2.30e-02

Pos logit contributions
aren+0.138
inger+0.136
oys+0.135
lynn+0.131
uster+0.129
Neg logit contributions
 fro-0.053
 Sp-0.052
 nib-0.046
 perform-0.041
 slur-0.039
Token alone
Token ID3436
Feature activation-4.33e-02

Pos logit contributions
inger+0.041
aren+0.040
oys+0.040
lynn+0.038
 policy+0.037
Neg logit contributions
 fro-0.033
 Sp-0.032
 nib-0.030
 slur-0.030
 perform-0.029
Token.
Token ID13
Feature activation-8.44e-03

Pos logit contributions
 fro+0.054
 Sp+0.050
 nib+0.048
 perform+0.046
 slur+0.045
Neg logit contributions
aren-0.089
oys-0.087
inger-0.087
lynn-0.083
riet-0.082
Token Sp
Token ID1338
Feature activation-2.40e-01

Pos logit contributions
 Sp+0.013
 fro+0.012
 nib+0.011
 perform+0.011
 slur+0.010
Neg logit contributions
inger-0.016
aren-0.016
oys-0.015
lynn-0.015
riet-0.014
Tokeniders
Token ID4157
Feature activation-9.55e-02

Pos logit contributions
 fro+0.399
 nib+0.380
 Sp+0.365
 perform+0.349
 dream+0.342
Neg logit contributions
oys-0.426
inger-0.422
aren-0.411
lynn-0.380
athy-0.374
Token can
Token ID460
Feature activation-2.64e-02

Pos logit contributions
 fro+0.127
 Sp+0.120
 nib+0.114
 perform+0.106
 slur+0.106
Neg logit contributions
aren-0.186
inger-0.181
oys-0.181
lynn-0.173
riet-0.171
Token be
Token ID307
Feature activation-8.21e-04

Pos logit contributions
 fro+0.038
 Sp+0.037
 nib+0.033
 slur+0.032
cast+0.031
Neg logit contributions
inger-0.046
aren-0.044
oys-0.044
lynn-0.043
athy-0.042
Token a
Token ID257
Feature activation-2.47e-02

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.001
oys-0.001
inger-0.001
lynn-0.001
riet-0.001
Token bit
Token ID1643
Feature activation+4.44e-02

Pos logit contributions
 fro+0.040
 Sp+0.039
 nib+0.037
 perform+0.036
 slur+0.036
Neg logit contributions
aren-0.039
oys-0.039
inger-0.039
lynn-0.036
 policy-0.036
Token catch
Token ID4929
Feature activation-6.61e-02

Pos logit contributions
 fro+0.049
 gag+0.047
 Sp+0.046
bel+0.044
cast+0.044
Neg logit contributions
aren-0.068
oys-0.065
lynn-0.064
inger-0.064
riet-0.061
Token Sp
Token ID1338
Feature activation-2.44e-01

Pos logit contributions
 gag+0.089
atmeal+0.088
 fro+0.086
 dream+0.084
 perform+0.083
Neg logit contributions
oys-0.099
 courage-0.097
aren-0.096
uster-0.095
inger-0.095
Tokenide
Token ID485
Feature activation-4.28e-02

Pos logit contributions
 fro+0.335
 nib+0.306
 Sp+0.304
 slur+0.281
 perform+0.280
Neg logit contributions
inger-0.493
aren-0.479
oys-0.476
athy-0.450
lynn-0.446
Tokeny
Token ID88
Feature activation-4.97e-02

Pos logit contributions
 fro+0.068
 slur+0.064
 Sp+0.064
 nib+0.064
 perform+0.063
Neg logit contributions
oys-0.067
inger-0.067
aren-0.065
lynn-0.060
uster-0.060
Token.
Token ID13
Feature activation+4.33e-04

Pos logit contributions
 fro+0.069
 Sp+0.066
 nib+0.060
 perform+0.060
 slur+0.060
Neg logit contributions
oys-0.085
aren-0.084
inger-0.083
 policy-0.079
riet-0.078
Token Sp
Token ID1338
Feature activation-2.40e-01

Pos logit contributions
aren+0.001
oys+0.001
inger+0.001
uster+0.001
lynn+0.001
Neg logit contributions
 fro-0.001
 perform-0.001
 nib-0.001
 gag-0.001
 dream-0.001
Tokenide
Token ID485
Feature activation-3.74e-02

Pos logit contributions
 fro+0.309
 nib+0.279
 Sp+0.277
 perform+0.264
 slur+0.251
Neg logit contributions
aren-0.497
inger-0.496
oys-0.493
athy-0.460
lynn-0.460
Tokeny
Token ID88
Feature activation-4.65e-02

Pos logit contributions
 fro+0.063
 Sp+0.062
 perform+0.061
 nib+0.060
 slur+0.059
Neg logit contributions
inger-0.055
oys-0.053
aren-0.053
ace-0.047
riet-0.047
Token wasn
Token ID2492
Feature activation+3.84e-03

Pos logit contributions
 fro+0.060
 Sp+0.056
 perform+0.053
 nib+0.052
 gag+0.050
Neg logit contributions
aren-0.087
inger-0.085
oys-0.083
athy-0.079
riet-0.079
Token't
Token ID470
Feature activation+7.97e-02

Pos logit contributions
aren+0.007
inger+0.006
oys+0.006
lynn+0.006
riet+0.006
Neg logit contributions
 fro-0.006
 Sp-0.006
 nib-0.006
 perform-0.005
 slur-0.005
Token ready
Token ID3492
Feature activation-7.15e-02

Pos logit contributions
aren+0.152
inger+0.150
oys+0.148
lynn+0.142
riet+0.140
Neg logit contributions
 fro-0.109
 Sp-0.104
 nib-0.099
 perform-0.095
 slur-0.093
Token "
Token ID366
Feature activation+3.46e-02

Pos logit contributions
 fro+0.005
 Sp+0.004
 nib+0.004
 perform+0.004
 slur+0.004
Neg logit contributions
aren-0.007
oys-0.007
inger-0.007
lynn-0.007
riet-0.007
TokenThat
Token ID2504
Feature activation+8.89e-03

Pos logit contributions
aren+0.058
oys+0.055
inger+0.054
lynn+0.053
athy+0.052
Neg logit contributions
 fro-0.057
 Sp-0.054
 nib-0.053
 perform-0.050
 slur-0.049
Token's
Token ID338
Feature activation-4.52e-03

Pos logit contributions
aren+0.016
oys+0.016
inger+0.015
lynn+0.015
athy+0.014
Neg logit contributions
 fro-0.013
 Sp-0.013
 nib-0.012
 perform-0.011
 slur-0.011
Token okay
Token ID8788
Feature activation+2.15e-02

Pos logit contributions
 fro+0.007
 Sp+0.007
 nib+0.006
 perform+0.006
 slur+0.006
Neg logit contributions
aren-0.008
inger-0.008
oys-0.008
lynn-0.008
riet-0.008
Token.
Token ID13
Feature activation-5.93e-03

Pos logit contributions
aren+0.045
oys+0.044
inger+0.044
lynn+0.042
riet+0.041
Neg logit contributions
 fro-0.026
 Sp-0.025
 nib-0.024
 perform-0.022
 slur-0.022
Token Sp
Token ID1338
Feature activation-2.40e-01

Pos logit contributions
 Sp+0.009
 fro+0.009
 nib+0.008
 perform+0.008
 slur+0.008
Neg logit contributions
inger-0.011
aren-0.011
oys-0.010
lynn-0.010
riet-0.010
Tokeniders
Token ID4157
Feature activation-1.04e-01

Pos logit contributions
 fro+0.424
 nib+0.409
 perform+0.368
 slur+0.367
 Sp+0.367
Neg logit contributions
oys-0.403
inger-0.382
aren-0.370
 policy-0.348
lynn-0.341
Token won
Token ID1839
Feature activation+3.61e-02

Pos logit contributions
 fro+0.136
 Sp+0.128
 nib+0.123
 perform+0.116
 slur+0.114
Neg logit contributions
aren-0.204
oys-0.199
inger-0.199
lynn-0.191
riet-0.187
Token't
Token ID470
Feature activation+7.81e-02

Pos logit contributions
inger+0.057
oys+0.057
aren+0.057
lynn+0.054
athy+0.052
Neg logit contributions
 fro-0.060
 nib-0.056
 Sp-0.055
 slur-0.052
 perform-0.052
Token hurt
Token ID5938
Feature activation-1.56e-02

Pos logit contributions
oys+0.146
aren+0.146
inger+0.145
lynn+0.137
 policy+0.135
Neg logit contributions
 fro-0.108
 Sp-0.103
 nib-0.096
 perform-0.090
 slur-0.090
Token you
Token ID345
Feature activation+1.52e-02

Pos logit contributions
 fro+0.018
 Sp+0.017
 nib+0.016
 perform+0.015
 slur+0.014
Neg logit contributions
aren-0.035
inger-0.034
oys-0.033
lynn-0.033
athy-0.032
Token there
Token ID612
Feature activation+4.25e-02

Pos logit contributions
 fro+0.000
 slur+0.000
 perform+0.000
 nib+0.000
 Sp+0.000
Neg logit contributions
aren-0.000
oys-0.000
inger-0.000
lynn-0.000
 policy-0.000
Token was
Token ID373
Feature activation+7.23e-03

Pos logit contributions
aren+0.082
oys+0.077
inger+0.075
lynn+0.075
 policy+0.074
Neg logit contributions
 fro-0.056
 Sp-0.051
 nib-0.051
 slur-0.049
 perform-0.049
Token a
Token ID257
Feature activation-2.54e-02

Pos logit contributions
aren+0.011
oys+0.011
athy+0.011
lynn+0.011
ady+0.011
Neg logit contributions
 fro-0.011
cast-0.010
 perform-0.010
 Sp-0.010
 slur-0.010
Token spider
Token ID19230
Feature activation-1.55e-01

Pos logit contributions
 fro+0.051
 Sp+0.050
 nib+0.048
 perform+0.047
cast+0.047
Neg logit contributions
aren-0.027
oys-0.027
inger-0.025
athy-0.024
lynn-0.024
Token named
Token ID3706
Feature activation+5.03e-02

Pos logit contributions
 fro+0.299
 Sp+0.268
 gag+0.265
 slur+0.260
 perform+0.255
Neg logit contributions
inger-0.181
oys-0.180
aren-0.178
athy-0.174
 policy-0.162
Token Sp
Token ID1338
Feature activation-2.40e-01

Pos logit contributions
inger+0.093
aren+0.093
oys+0.092
issors+0.091
lynn+0.090
Neg logit contributions
 fro-0.060
cast-0.057
 perform-0.057
bel-0.056
 nib-0.055
Tokenide
Token ID485
Feature activation-4.68e-02

Pos logit contributions
 fro+0.376
 nib+0.346
 Sp+0.325
 slur+0.317
 perform+0.316
Neg logit contributions
inger-0.447
oys-0.436
aren-0.422
athy-0.403
ady-0.400
Tokeny
Token ID88
Feature activation-4.07e-02

Pos logit contributions
 fro+0.096
 slur+0.090
 perform+0.088
 nib+0.086
 Sp+0.084
Neg logit contributions
oys-0.050
inger-0.047
ace-0.045
ubby-0.043
aren-0.038
Token.
Token ID13
Feature activation+6.32e-03

Pos logit contributions
 fro+0.065
 perform+0.056
cast+0.055
 slur+0.054
 nib+0.054
Neg logit contributions
oys-0.066
inger-0.062
aren-0.062
athy-0.060
lynn-0.058
Token Sp
Token ID1338
Feature activation-2.34e-01

Pos logit contributions
oys+0.013
issors+0.012
ace+0.012
 vain+0.011
 Step+0.011
Neg logit contributions
fish-0.006
bel-0.006
 fro-0.005
atmeal-0.005
 perform-0.005
Tokenide
Token ID485
Feature activation-3.29e-02

Pos logit contributions
 fro+0.348
 nib+0.318
 Sp+0.302
 perform+0.296
 slur+0.291
Neg logit contributions
inger-0.451
oys-0.448
aren-0.436
athy-0.413
ady-0.409
Token was
Token ID373
Feature activation+1.52e-02

Pos logit contributions
aren+0.066
oys+0.064
inger+0.063
 policy+0.061
athy+0.061
Neg logit contributions
 fro-0.047
 Sp-0.045
 nib-0.044
 slur-0.043
 perform-0.041
Token a
Token ID257
Feature activation-1.90e-02

Pos logit contributions
aren+0.029
oys+0.028
inger+0.028
athy+0.027
lynn+0.027
Neg logit contributions
 fro-0.020
 Sp-0.018
 nib-0.018
 perform-0.017
 slur-0.017
Token little
Token ID1310
Feature activation+5.86e-02

Pos logit contributions
 fro+0.035
 Sp+0.034
 nib+0.033
 slur+0.032
 perform+0.032
Neg logit contributions
aren-0.025
oys-0.023
inger-0.022
athy-0.022
lynn-0.022
Token spider
Token ID19230
Feature activation-1.52e-01

Pos logit contributions
aren+0.091
oys+0.087
inger+0.084
riet+0.081
athy+0.081
Neg logit contributions
 fro-0.096
 Sp-0.095
 nib-0.092
 perform-0.090
 slur-0.089
Token named
Token ID3706
Feature activation+4.04e-02

Pos logit contributions
 fro+0.253
 Sp+0.234
 slur+0.222
 perform+0.215
 gag+0.213
Neg logit contributions
aren-0.236
inger-0.227
athy-0.222
oys-0.221
ady-0.209
Token Sp
Token ID1338
Feature activation-2.39e-01

Pos logit contributions
aren+0.085
inger+0.083
oys+0.082
lynn+0.080
riet+0.079
Neg logit contributions
 fro-0.047
 nib-0.043
 Sp-0.042
 perform-0.041
 slur-0.040
Tokenide
Token ID485
Feature activation-4.46e-02

Pos logit contributions
 fro+0.398
 nib+0.368
 slur+0.337
 dream+0.330
 Sp+0.328
Neg logit contributions
inger-0.434
oys-0.422
aren-0.404
ady-0.388
athy-0.385
Tokeny
Token ID88
Feature activation-4.02e-02

Pos logit contributions
 fro+0.087
 slur+0.081
 nib+0.079
 perform+0.079
 Sp+0.078
Neg logit contributions
oys-0.054
inger-0.052
ace-0.051
aren-0.048
ady-0.045
Token.
Token ID13
Feature activation+1.95e-03

Pos logit contributions
 fro+0.060
 slur+0.054
 nib+0.053
 perform+0.052
 Sp+0.052
Neg logit contributions
aren-0.069
oys-0.069
inger-0.067
athy-0.064
riet-0.063
Token Sp
Token ID1338
Feature activation-2.35e-01

Pos logit contributions
oys+0.004
uster+0.004
aren+0.004
ace+0.004
issors+0.004
Neg logit contributions
bel-0.001
 fro-0.001
 sne-0.001
 perform-0.001
 gag-0.001
Tokenide
Token ID485
Feature activation-3.27e-02

Pos logit contributions
 fro+0.343
 nib+0.313
 Sp+0.294
 perform+0.290
 slur+0.286
Neg logit contributions
inger-0.460
oys-0.454
aren-0.446
athy-0.418
ady-0.416
Token won
Token ID1839
Feature activation+3.54e-02

Pos logit contributions
 fro+0.027
 Sp+0.025
 nib+0.025
 perform+0.024
 slur+0.024
Neg logit contributions
inger-0.031
aren-0.031
oys-0.031
athy-0.028
lynn-0.028
Token't
Token ID470
Feature activation+8.20e-02

Pos logit contributions
oys+0.054
inger+0.054
aren+0.053
lynn+0.050
athy+0.048
Neg logit contributions
 fro-0.061
 nib-0.058
 Sp-0.055
 slur-0.055
 perform-0.053
Token hurt
Token ID5938
Feature activation-1.22e-02

Pos logit contributions
inger+0.138
oys+0.138
aren+0.136
lynn+0.128
 policy+0.128
Neg logit contributions
 fro-0.123
 Sp-0.118
 nib-0.111
 slur-0.106
 perform-0.103
Token you
Token ID345
Feature activation+2.15e-02

Pos logit contributions
 fro+0.012
 Sp+0.012
 nib+0.010
 perform+0.009
 sne+0.009
Neg logit contributions
aren-0.029
inger-0.028
lynn-0.028
oys-0.027
uster-0.027
Token.
Token ID13
Feature activation-1.50e-03

Pos logit contributions
oys+0.035
inger+0.034
aren+0.034
lynn+0.031
 policy+0.031
Neg logit contributions
 fro-0.035
 Sp-0.034
 nib-0.033
 slur-0.032
 perform-0.031
Token Sp
Token ID1338
Feature activation-2.39e-01

Pos logit contributions
 Sp+0.002
 fro+0.002
 Bee+0.002
 perform+0.002
 nib+0.002
Neg logit contributions
inger-0.003
aren-0.003
lynn-0.003
oys-0.003
 policy-0.003
Tokeniders
Token ID4157
Feature activation-9.76e-02

Pos logit contributions
 fro+0.371
 Sp+0.349
 nib+0.344
 perform+0.324
 slur+0.319
Neg logit contributions
aren-0.425
oys-0.421
inger-0.419
lynn-0.395
athy-0.385
Token don
Token ID836
Feature activation-4.08e-02

Pos logit contributions
 fro+0.126
 Sp+0.118
 nib+0.113
 perform+0.107
 slur+0.105
Neg logit contributions
aren-0.195
oys-0.190
inger-0.190
lynn-0.182
riet-0.179
Token't
Token ID470
Feature activation+8.14e-02

Pos logit contributions
 fro+0.054
 Sp+0.052
 nib+0.049
 perform+0.048
 slur+0.046
Neg logit contributions
aren-0.080
oys-0.077
inger-0.077
lynn-0.074
riet-0.074
Token bite
Token ID13197
Feature activation-4.01e-02

Pos logit contributions
aren+0.155
oys+0.151
inger+0.151
lynn+0.145
riet+0.142
Neg logit contributions
 fro-0.114
 Sp-0.108
 nib-0.104
 perform-0.099
 slur-0.097
Token."
Token ID526
Feature activation-3.14e-02

Pos logit contributions
 fro+0.054
 Sp+0.051
 nib+0.050
 perform+0.047
 slur+0.046
Neg logit contributions
aren-0.077
oys-0.076
inger-0.075
lynn-0.072
riet-0.071
Token\n
Token ID198
Feature activation-1.49e-02

Pos logit contributions
aren+0.005
oys+0.005
inger+0.005
lynn+0.004
riet+0.004
Neg logit contributions
 fro-0.004
 nib-0.004
 Sp-0.004
 perform-0.004
 slur-0.004
Token\n
Token ID198
Feature activation-1.30e-02

Pos logit contributions
 fro+0.018
 Sp+0.016
 nib+0.016
 perform+0.015
 slur+0.014
Neg logit contributions
aren-0.032
oys-0.031
inger-0.031
lynn-0.030
athy-0.029
TokenThe
Token ID464
Feature activation+3.49e-02

Pos logit contributions
 Sp+0.017
 fro+0.017
 nib+0.015
 perform+0.014
 slur+0.014
Neg logit contributions
aren-0.025
inger-0.024
uster-0.024
riet-0.024
lynn-0.024
Token comb
Token ID1974
Feature activation+2.49e-02

Pos logit contributions
aren+0.071
oys+0.070
inger+0.069
lynn+0.067
uster+0.066
Neg logit contributions
 fro-0.045
 nib-0.041
 Sp-0.040
 perform-0.040
 sne-0.039
Token is
Token ID318
Feature activation+2.38e-02

Pos logit contributions
inger+0.032
aren+0.028
issors+0.028
oys+0.027
 courage+0.027
Neg logit contributions
 fro-0.046
 Sp-0.046
 nib-0.044
 nood-0.041
 perform-0.040
Token tight
Token ID5381
Feature activation+1.42e-01

Pos logit contributions
aren+0.040
inger+0.040
oys+0.039
lynn+0.038
riet+0.036
Neg logit contributions
 fro-0.038
 Sp-0.036
 nib-0.035
 perform-0.034
 slur-0.033
Token in
Token ID287
Feature activation-3.06e-03

Pos logit contributions
aren+0.370
inger+0.354
athy+0.350
riet+0.348
oys+0.345
Neg logit contributions
 nib-0.102
 fro-0.096
 Sp-0.096
 sne-0.090
 perform-0.081
Token the
Token ID262
Feature activation+5.39e-03

Pos logit contributions
 Sp+0.004
 fro+0.004
 nib+0.004
 sne+0.003
 grow+0.003
Neg logit contributions
aren-0.007
riet-0.006
inger-0.006
ruff-0.006
athy-0.006
Token drawer
Token ID33451
Feature activation-2.09e-02

Pos logit contributions
aren+0.009
inger+0.009
oys+0.009
lynn+0.008
riet+0.008
Neg logit contributions
 fro-0.009
 Sp-0.008
 nib-0.008
 perform-0.008
 slur-0.008
Token.
Token ID13
Feature activation+6.83e-03

Pos logit contributions
 fro+0.023
 Sp+0.021
 nib+0.021
 perform+0.019
 slur+0.019
Neg logit contributions
aren-0.046
oys-0.045
inger-0.045
lynn-0.043
riet-0.043
Token They
Token ID1119
Feature activation+6.62e-03

Pos logit contributions
aren+0.011
oys+0.011
inger+0.011
lynn+0.010
riet+0.010
Neg logit contributions
 fro-0.011
 nib-0.010
 Sp-0.010
 perform-0.010
 slur-0.010
Token!
Token ID0
Feature activation+2.73e-02

Pos logit contributions
aren+0.070
oys+0.069
inger+0.068
lynn+0.066
riet+0.065
Neg logit contributions
 fro-0.039
 Sp-0.036
 nib-0.035
 perform-0.033
 slur-0.032
Token<|endoftext|>
Token ID50256
Feature activation-9.42e-02

Pos logit contributions
inger+0.065
aren+0.064
oys+0.064
lynn+0.062
riet+0.061
Neg logit contributions
 Sp-0.025
 fro-0.024
 nib-0.021
 perform-0.019
 slur-0.018
TokenOnce
Token ID7454
Feature activation-4.61e-02

Pos logit contributions
 fro+0.106
 Sp+0.096
 nib+0.095
 slur+0.092
 perform+0.089
Neg logit contributions
aren-0.204
oys-0.197
inger-0.197
lynn-0.189
athy-0.187
Token upon
Token ID2402
Feature activation-4.75e-02

Pos logit contributions
 fro+0.049
 Sp+0.044
 nib+0.043
 perform+0.040
 slur+0.039
Neg logit contributions
aren-0.104
inger-0.101
oys-0.101
lynn-0.098
riet-0.096
Token a
Token ID257
Feature activation-1.87e-02

Pos logit contributions
 fro+0.086
 Sp+0.083
cast+0.082
 slur+0.076
 mo+0.076
Neg logit contributions
oys-0.052
ady-0.051
aren-0.050
 courage-0.050
lynn-0.049
Token time
Token ID640
Feature activation+1.42e-01

Pos logit contributions
 fro+0.037
 Sp+0.036
 nib+0.035
 perform+0.034
 slur+0.033
Neg logit contributions
aren-0.024
oys-0.023
inger-0.023
lynn-0.022
riet-0.021
Token,
Token ID11
Feature activation+1.10e-02

Pos logit contributions
aren+0.280
oys+0.274
inger+0.266
lynn+0.259
 policy+0.252
Neg logit contributions
 fro-0.186
 nib-0.173
 Sp-0.172
 slur-0.167
 perform-0.166
Token there
Token ID612
Feature activation+5.11e-02

Pos logit contributions
aren+0.019
oys+0.019
inger+0.018
lynn+0.018
athy+0.017
Neg logit contributions
 fro-0.016
 slur-0.015
 nib-0.015
 perform-0.015
 Sp-0.014
Token was
Token ID373
Feature activation+1.56e-02

Pos logit contributions
aren+0.094
oys+0.089
 policy+0.085
lynn+0.085
inger+0.084
Neg logit contributions
 fro-0.070
 nib-0.065
 Sp-0.064
 slur-0.064
 perform-0.063
Token a
Token ID257
Feature activation-1.92e-02

Pos logit contributions
aren+0.026
oys+0.026
athy+0.025
ady+0.024
inger+0.024
Neg logit contributions
 fro-0.022
 perform-0.021
cast-0.021
 Sp-0.020
 slur-0.020
Token woman
Token ID2415
Feature activation+9.08e-02

Pos logit contributions
 fro+0.037
 Sp+0.036
 nib+0.035
 perform+0.034
cast+0.034
Neg logit contributions
aren-0.022
oys-0.022
athy-0.020
inger-0.020
lynn-0.019
Token rock
Token ID3881
Feature activation-3.25e-02

Pos logit contributions
aren+0.022
oys+0.021
inger+0.021
lynn+0.020
riet+0.020
Neg logit contributions
 fro-0.016
 Sp-0.015
 nib-0.015
 perform-0.014
 slur-0.014
Token!
Token ID0
Feature activation+3.77e-02

Pos logit contributions
 fro+0.051
 Sp+0.050
 nib+0.047
 slur+0.046
 perform+0.045
Neg logit contributions
inger-0.054
oys-0.053
aren-0.053
lynn-0.049
 policy-0.048
Token It
Token ID632
Feature activation+5.03e-03

Pos logit contributions
oys+0.056
aren+0.052
inger+0.050
lynn+0.049
uster+0.048
Neg logit contributions
 fro-0.065
 nib-0.062
 Sp-0.060
 slur-0.059
 perform-0.059
Token has
Token ID468
Feature activation+5.82e-02

Pos logit contributions
inger+0.007
oys+0.007
aren+0.007
lynn+0.006
athy+0.006
Neg logit contributions
 fro-0.009
 Sp-0.008
 nib-0.008
 slur-0.008
 perform-0.008
Token many
Token ID867
Feature activation+2.25e-02

Pos logit contributions
aren+0.120
inger+0.116
oys+0.116
lynn+0.112
athy+0.110
Neg logit contributions
 fro-0.075
 Sp-0.070
 nib-0.069
 perform-0.064
 slur-0.063
Token arms
Token ID5101
Feature activation+1.70e-01

Pos logit contributions
aren+0.042
oys+0.039
inger+0.039
lynn+0.038
athy+0.038
Neg logit contributions
 fro-0.036
 nib-0.033
 Sp-0.033
 slur-0.032
 sne-0.030
Token and
Token ID290
Feature activation+1.46e-02

Pos logit contributions
aren+0.395
lynn+0.373
athy+0.370
riet+0.367
inger+0.365
Neg logit contributions
 fro-0.168
 nib-0.156
 Sp-0.150
 perform-0.143
 grow-0.136
Token spots
Token ID10222
Feature activation-1.89e-02

Pos logit contributions
aren+0.023
inger+0.022
oys+0.022
lynn+0.021
riet+0.021
Neg logit contributions
 fro-0.025
 Sp-0.024
 nib-0.023
 perform-0.022
 slur-0.022
Token!"
Token ID2474
Feature activation-9.76e-03

Pos logit contributions
 fro+0.027
 Sp+0.025
 nib+0.024
 perform+0.023
 slur+0.022
Neg logit contributions
aren-0.035
inger-0.035
oys-0.035
lynn-0.033
riet-0.032
Token\n
Token ID198
Feature activation-1.42e-03

Pos logit contributions
 fro+0.011
 Sp+0.010
 nib+0.010
 perform+0.009
 slur+0.009
Neg logit contributions
aren-0.021
oys-0.021
inger-0.021
lynn-0.020
riet-0.020
Token\n
Token ID198
Feature activation-3.57e-04

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.003
inger-0.003
oys-0.003
riet-0.003
lynn-0.003
Token!
Token ID0
Feature activation+2.76e-02

Pos logit contributions
aren+0.284
oys+0.278
inger+0.278
lynn+0.268
riet+0.263
Neg logit contributions
 fro-0.147
 Sp-0.137
 nib-0.131
 perform-0.122
 slur-0.119
Token<|endoftext|>
Token ID50256
Feature activation-9.48e-02

Pos logit contributions
aren+0.058
oys+0.056
inger+0.056
riet+0.053
lynn+0.053
Neg logit contributions
 fro-0.034
 Sp-0.030
 nib-0.030
 perform-0.028
 slur-0.028
TokenOnce
Token ID7454
Feature activation-4.75e-02

Pos logit contributions
 fro+0.103
 Sp+0.095
 nib+0.092
 slur+0.086
 perform+0.086
Neg logit contributions
aren-0.209
oys-0.203
inger-0.203
lynn-0.195
athy-0.192
Token upon
Token ID2402
Feature activation-4.87e-02

Pos logit contributions
 fro+0.050
 Sp+0.046
 nib+0.044
 perform+0.041
 slur+0.040
Neg logit contributions
aren-0.107
oys-0.104
inger-0.104
lynn-0.101
riet-0.099
Token a
Token ID257
Feature activation-1.89e-02

Pos logit contributions
 fro+0.091
cast+0.088
 Sp+0.087
 mo+0.081
 slur+0.081
Neg logit contributions
oys-0.052
ady-0.050
aren-0.048
 courage-0.048
ace-0.047
Token time
Token ID640
Feature activation+1.41e-01

Pos logit contributions
 fro+0.037
 Sp+0.036
 nib+0.035
 perform+0.034
 slur+0.033
Neg logit contributions
aren-0.025
oys-0.024
inger-0.024
lynn-0.023
riet-0.022
Token there
Token ID612
Feature activation+4.87e-02

Pos logit contributions
aren+0.275
oys+0.268
inger+0.260
lynn+0.253
 policy+0.247
Neg logit contributions
 fro-0.189
 nib-0.176
 Sp-0.174
 slur-0.171
 perform-0.170
Token was
Token ID373
Feature activation+1.41e-02

Pos logit contributions
aren+0.087
oys+0.079
lynn+0.077
 policy+0.077
athy+0.076
Neg logit contributions
 fro-0.069
 nib-0.065
 slur-0.065
 perform-0.062
 gag-0.062
Token a
Token ID257
Feature activation-1.91e-02

Pos logit contributions
aren+0.023
oys+0.022
athy+0.022
ady+0.022
inger+0.022
Neg logit contributions
 fro-0.021
 perform-0.019
cast-0.019
 Sp-0.019
 slur-0.019
Token little
Token ID1310
Feature activation+6.80e-02

Pos logit contributions
 fro+0.037
 Sp+0.036
 nib+0.035
 perform+0.034
cast+0.034
Neg logit contributions
aren-0.022
oys-0.021
inger-0.020
athy-0.020
lynn-0.019
Token boy
Token ID2933
Feature activation-6.89e-02

Pos logit contributions
aren+0.111
oys+0.108
inger+0.104
athy+0.101
lynn+0.099
Neg logit contributions
 fro-0.109
 Sp-0.105
 nib-0.103
 perform-0.100
 slur-0.098
Token box
Token ID3091
Feature activation+8.41e-02

Pos logit contributions
aren+0.044
inger+0.044
oys+0.044
lynn+0.042
riet+0.041
Neg logit contributions
 fro-0.035
 Sp-0.033
 nib-0.032
 perform-0.030
 slur-0.030
Token of
Token ID286
Feature activation+3.50e-02

Pos logit contributions
aren+0.207
inger+0.205
riet+0.199
lynn+0.198
athy+0.198
Neg logit contributions
 fro-0.065
 nib-0.064
 Sp-0.058
 perform-0.057
 sne-0.052
Token old
Token ID1468
Feature activation+1.53e-01

Pos logit contributions
aren+0.054
oys+0.053
inger+0.052
lynn+0.050
riet+0.049
Neg logit contributions
 fro-0.060
 Sp-0.057
 nib-0.056
 perform-0.054
 slur-0.052
Token clothes
Token ID8242
Feature activation+8.82e-02

Pos logit contributions
aren+0.243
inger+0.239
oys+0.235
lynn+0.225
riet+0.219
Neg logit contributions
 fro-0.263
 Sp-0.251
 nib-0.245
 perform-0.235
 slur-0.229
Token and
Token ID290
Feature activation+1.69e-02

Pos logit contributions
inger+0.218
aren+0.217
athy+0.210
lynn+0.210
oys+0.207
Neg logit contributions
 fro-0.065
 nib-0.062
 perform-0.058
 Sp-0.058
 sne-0.054
Token buttons
Token ID12163
Feature activation+1.37e-01

Pos logit contributions
aren+0.035
inger+0.035
oys+0.034
lynn+0.034
athy+0.033
Neg logit contributions
 fro-0.021
 Sp-0.020
 nib-0.019
 perform-0.018
 slur-0.017
Token and
Token ID290
Feature activation+1.70e-02

Pos logit contributions
aren+0.341
inger+0.339
athy+0.335
lynn+0.324
riet+0.323
Neg logit contributions
 fro-0.100
 Sp-0.096
 nib-0.093
 perform-0.086
 sne-0.085
Token rib
Token ID12183
Feature activation+5.21e-02

Pos logit contributions
aren+0.033
inger+0.033
oys+0.031
lynn+0.031
athy+0.030
Neg logit contributions
 fro-0.024
 Sp-0.023
 nib-0.022
 perform-0.021
 slur-0.020
Tokenbons
Token ID23461
Feature activation+1.69e-01

Pos logit contributions
 policy+0.069
aren+0.066
oys+0.062
riet+0.061
 vain+0.057
Neg logit contributions
 Sp-0.103
 nib-0.098
 fro-0.098
 slur-0.095
 grow-0.089
Token in
Token ID287
Feature activation+5.48e-04

Pos logit contributions
aren+0.433
athy+0.426
inger+0.418
oys+0.413
riet+0.411
Neg logit contributions
 fro-0.109
 nib-0.104
 Sp-0.100
 perform-0.092
cast-0.091
Token the
Token ID262
Feature activation+1.46e-02

Pos logit contributions
inger+0.002
aren+0.002
riet+0.001
athy+0.001
oys+0.001
Neg logit contributions
 Sp-0.000
 pretend-0.000
 o-0.000
 fro-0.000
 perform-0.000
Token,
Token ID11
Feature activation-2.01e-02

Pos logit contributions
 fro+0.008
 Sp+0.008
 nib+0.008
 perform+0.007
 slur+0.007
Neg logit contributions
aren-0.018
inger-0.017
oys-0.017
lynn-0.017
riet-0.017
Token he
Token ID339
Feature activation-7.11e-02

Pos logit contributions
 fro+0.026
 Sp+0.025
 nib+0.024
 slur+0.024
 perform+0.023
Neg logit contributions
aren-0.038
oys-0.037
inger-0.035
lynn-0.035
riet-0.035
Token took
Token ID1718
Feature activation+1.12e-02

Pos logit contributions
 fro+0.105
 Sp+0.103
 nib+0.091
 perform+0.091
 slur+0.090
Neg logit contributions
aren-0.119
inger-0.115
oys-0.110
riet-0.109
athy-0.108
Token good
Token ID922
Feature activation+2.89e-02

Pos logit contributions
aren+0.026
oys+0.026
inger+0.026
lynn+0.025
athy+0.024
Neg logit contributions
 fro-0.011
 Sp-0.010
 nib-0.010
 perform-0.009
 slur-0.009
Token care
Token ID1337
Feature activation+4.51e-02

Pos logit contributions
aren+0.046
inger+0.044
oys+0.044
lynn+0.042
riet+0.041
Neg logit contributions
 fro-0.050
 Sp-0.048
 nib-0.046
 perform-0.045
 slur-0.044
Token of
Token ID286
Feature activation+1.33e-02

Pos logit contributions
aren+0.094
inger+0.090
oys+0.088
lynn+0.087
athy+0.087
Neg logit contributions
 fro-0.055
 Sp-0.054
 nib-0.049
 perform-0.049
 slur-0.046
Token himself
Token ID2241
Feature activation+5.46e-02

Pos logit contributions
aren+0.032
inger+0.032
oys+0.031
uster+0.030
lynn+0.030
Neg logit contributions
 fro-0.013
 Sp-0.012
 nib-0.011
 perform-0.010
 slur-0.010
Token.
Token ID13
Feature activation-9.98e-03

Pos logit contributions
inger+0.101
aren+0.101
oys+0.101
lynn+0.096
riet+0.095
Neg logit contributions
 fro-0.075
 Sp-0.071
 nib-0.066
 slur-0.064
 perform-0.062
Token<|endoftext|>
Token ID50256
Feature activation-9.34e-02

Pos logit contributions
 fro+0.014
 Sp+0.013
 nib+0.013
 perform+0.012
 slur+0.012
Neg logit contributions
aren-0.019
inger-0.019
oys-0.019
lynn-0.018
riet-0.017
TokenOnce
Token ID7454
Feature activation-5.11e-02

Pos logit contributions
 fro+0.108
 Sp+0.098
 nib+0.097
 slur+0.093
 perform+0.092
Neg logit contributions
aren-0.199
inger-0.193
oys-0.192
lynn-0.186
athy-0.182
Token upon
Token ID2402
Feature activation-5.19e-02

Pos logit contributions
 fro+0.058
 Sp+0.053
 nib+0.053
 perform+0.050
 slur+0.049
Neg logit contributions
aren-0.111
inger-0.108
oys-0.107
lynn-0.104
riet-0.101
Token he
Token ID339
Feature activation-7.33e-02

Pos logit contributions
 fro+0.067
 Sp+0.064
 nib+0.060
 perform+0.053
 sne+0.051
Neg logit contributions
aren-0.169
inger-0.169
oys-0.166
lynn-0.160
athy-0.158
Token cried
Token ID16896
Feature activation-7.60e-03

Pos logit contributions
 fro+0.098
 Sp+0.093
 nib+0.089
 perform+0.085
 slur+0.083
Neg logit contributions
aren-0.143
oys-0.140
inger-0.140
lynn-0.134
riet-0.132
Token and
Token ID290
Feature activation+9.42e-04

Pos logit contributions
 fro+0.010
 Sp+0.009
 nib+0.009
 perform+0.009
 slur+0.008
Neg logit contributions
oys-0.015
aren-0.015
inger-0.014
uster-0.014
lynn-0.014
Token screamed
Token ID25421
Feature activation+4.23e-02

Pos logit contributions
aren+0.002
inger+0.002
oys+0.002
lynn+0.002
riet+0.002
Neg logit contributions
 fro-0.001
 Sp-0.001
 nib-0.001
 perform-0.001
 slur-0.001
Token,
Token ID11
Feature activation-6.81e-03

Pos logit contributions
oys+0.073
inger+0.070
uster+0.070
aren+0.069
lynn+0.068
Neg logit contributions
 fro-0.061
 Sp-0.053
 perform-0.053
 nib-0.052
 slur-0.050
Token but
Token ID475
Feature activation+4.00e-02

Pos logit contributions
 fro+0.009
 Sp+0.009
 nib+0.009
 slur+0.008
 perform+0.008
Neg logit contributions
aren-0.013
oys-0.013
inger-0.013
athy-0.013
lynn-0.012
Token no
Token ID645
Feature activation+2.87e-02

Pos logit contributions
aren+0.082
oys+0.080
inger+0.079
lynn+0.078
riet+0.076
Neg logit contributions
 fro-0.048
 Sp-0.045
 nib-0.043
 perform-0.042
 slur-0.040
Token one
Token ID530
Feature activation+2.08e-02

Pos logit contributions
aren+0.059
inger+0.058
oys+0.058
lynn+0.054
riet+0.054
Neg logit contributions
 fro-0.039
 Sp-0.036
 nib-0.035
 perform-0.033
 slur-0.032
Token could
Token ID714
Feature activation-2.85e-02

Pos logit contributions
aren+0.039
inger+0.037
oys+0.037
lynn+0.036
riet+0.035
Neg logit contributions
 fro-0.030
 Sp-0.028
 nib-0.026
 perform-0.026
 slur-0.025
Token hear
Token ID3285
Feature activation-8.93e-02

Pos logit contributions
 fro+0.039
 Sp+0.038
 nib+0.034
 slur+0.032
 perform+0.032
Neg logit contributions
inger-0.054
aren-0.053
oys-0.052
lynn-0.051
 policy-0.049
Token him
Token ID683
Feature activation-9.36e-03

Pos logit contributions
 fro+0.080
 Sp+0.072
 nib+0.069
 perform+0.063
 slur+0.061
Neg logit contributions
aren-0.213
oys-0.209
inger-0.209
lynn-0.203
riet-0.199
Token.
Token ID13
Feature activation-3.32e-03

Pos logit contributions
 fro+0.053
 Sp+0.049
 nib+0.048
 perform+0.045
 slur+0.044
Neg logit contributions
aren-0.090
oys-0.088
inger-0.087
lynn-0.084
riet-0.083
Token They
Token ID1119
Feature activation-6.60e-04

Pos logit contributions
 Sp+0.003
 fro+0.002
 Bee+0.002
 nib+0.002
 Will+0.002
Neg logit contributions
inger-0.008
aren-0.008
uster-0.008
lynn-0.008
riet-0.008
Token learned
Token ID4499
Feature activation+3.15e-02

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.001
oys-0.001
inger-0.001
lynn-0.001
riet-0.001
Token a
Token ID257
Feature activation-3.68e-02

Pos logit contributions
aren+0.076
inger+0.074
oys+0.072
lynn+0.071
riet+0.071
Neg logit contributions
 Sp-0.027
 fro-0.027
 nib-0.024
 perform-0.023
 slur-0.021
Token hard
Token ID1327
Feature activation+3.54e-02

Pos logit contributions
 fro+0.068
 Sp+0.068
 nib+0.065
 perform+0.064
 slur+0.062
Neg logit contributions
aren-0.053
inger-0.049
oys-0.048
riet-0.047
athy-0.046
Token lesson
Token ID11483
Feature activation+7.36e-02

Pos logit contributions
aren+0.041
oys+0.040
inger+0.039
lynn+0.037
riet+0.035
Neg logit contributions
 fro-0.076
 Sp-0.073
 nib-0.072
 perform-0.069
 slur-0.068
Token.
Token ID13
Feature activation-8.45e-03

Pos logit contributions
inger+0.194
aren+0.192
uster+0.190
athy+0.185
riet+0.181
Neg logit contributions
 Sp-0.047
 fro-0.035
 perform-0.035
 nib-0.034
 breathe-0.030
Token<|endoftext|>
Token ID50256
Feature activation-9.93e-02

Pos logit contributions
 Sp+0.008
 fro+0.005
 Bee+0.005
 Will+0.005
 perform+0.004
Neg logit contributions
aren-0.021
inger-0.021
uster-0.021
lynn-0.020
riet-0.020
TokenL
Token ID43
Feature activation-4.46e-02

Pos logit contributions
 fro+0.120
 slur+0.113
 nib+0.111
 Sp+0.109
 perform+0.104
Neg logit contributions
aren-0.199
oys-0.194
inger-0.192
lynn-0.184
athy-0.184
Tokenily
Token ID813
Feature activation-1.84e-02

Pos logit contributions
 fro+0.057
 Sp+0.055
 nib+0.050
 perform+0.050
 o+0.049
Neg logit contributions
aren-0.086
inger-0.084
oys-0.082
lynn-0.082
uster-0.079
Token and
Token ID290
Feature activation+7.27e-04

Pos logit contributions
 fro+0.021
 Sp+0.020
 nib+0.019
 perform+0.018
 slur+0.017
Neg logit contributions
aren-0.039
inger-0.039
oys-0.039
lynn-0.037
uster-0.037
Token the
Token ID262
Feature activation+1.19e-02

Pos logit contributions
 fro+0.009
 Sp+0.008
 nib+0.008
 perform+0.008
 slur+0.008
Neg logit contributions
aren-0.015
inger-0.014
lynn-0.014
oys-0.014
athy-0.014
Token swings
Token ID26728
Feature activation+1.65e-02

Pos logit contributions
aren+0.026
oys+0.025
inger+0.025
riet+0.024
uster+0.024
Neg logit contributions
 fro-0.015
 nib-0.013
 Sp-0.013
 perform-0.012
 slur-0.012
Token and
Token ID290
Feature activation+7.75e-03

Pos logit contributions
aren+0.032
inger+0.031
oys+0.031
lynn+0.030
riet+0.029
Neg logit contributions
 fro-0.022
 Sp-0.021
 nib-0.020
 perform-0.019
 slur-0.018
Token slide
Token ID10649
Feature activation-4.34e-02

Pos logit contributions
aren+0.015
oys+0.014
inger+0.014
lynn+0.013
riet+0.013
Neg logit contributions
 fro-0.010
 Sp-0.010
 nib-0.009
 perform-0.009
 slur-0.009
Token,
Token ID11
Feature activation-1.62e-03

Pos logit contributions
 fro+0.049
 Sp+0.046
 nib+0.044
 perform+0.041
 slur+0.040
Neg logit contributions
aren-0.094
oys-0.093
inger-0.093
lynn-0.089
riet-0.088
Token ran
Token ID4966
Feature activation+4.43e-02

Pos logit contributions
 fro+0.002
 Sp+0.002
 nib+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
aren-0.003
oys-0.003
inger-0.003
lynn-0.003
riet-0.003
Token around
Token ID1088
Feature activation-3.65e-02

Pos logit contributions
aren+0.102
inger+0.101
oys+0.101
athy+0.097
 policy+0.097
Neg logit contributions
 fro-0.045
 Sp-0.041
 nib-0.041
 perform-0.036
 dance-0.035
Token and
Token ID290
Feature activation+7.98e-03

Pos logit contributions
 fro+0.049
 gag+0.046
 Sp+0.045
 perform+0.043
 slur+0.043
Neg logit contributions
aren-0.065
oys-0.062
inger-0.060
lynn-0.059
riet-0.057
Token smiled
Token ID13541
Feature activation+5.29e-02

Pos logit contributions
aren+0.014
oys+0.013
inger+0.013
riet+0.012
lynn+0.012
Neg logit contributions
 fro-0.012
 Sp-0.011
 nib-0.011
 perform-0.011
 slur-0.010
Token.
Token ID13
Feature activation+5.47e-04

Pos logit contributions
oys+0.082
inger+0.078
uster+0.077
ace+0.075
aren+0.074
Neg logit contributions
 fro-0.083
 nib-0.073
 Sp-0.073
 slur-0.073
 gag-0.069
Token At
Token ID1629
Feature activation+2.44e-02

Pos logit contributions
aren+0.001
oys+0.001
inger+0.001
lynn+0.001
riet+0.001
Neg logit contributions
 fro-0.001
 Sp-0.001
 nib-0.001
 slur-0.001
 perform-0.001
Token cat
Token ID3797
Feature activation-8.48e-02

Pos logit contributions
 fro+0.010
 Sp+0.009
 nib+0.009
 perform+0.009
 slur+0.009
Neg logit contributions
aren-0.010
oys-0.010
inger-0.010
lynn-0.010
riet-0.009
Token,
Token ID11
Feature activation-2.78e-04

Pos logit contributions
 fro+0.116
 nib+0.104
 Sp+0.104
 slur+0.101
 perform+0.100
Neg logit contributions
aren-0.163
oys-0.160
inger-0.154
athy-0.148
lynn-0.148
Token Wh
Token ID854
Feature activation-1.53e-02

Pos logit contributions
 fro+0.000
 Sp+0.000
 nib+0.000
 slur+0.000
 perform+0.000
Neg logit contributions
aren-0.001
oys-0.000
inger-0.000
lynn-0.000
athy-0.000
Tokenisk
Token ID1984
Feature activation-1.75e-02

Pos logit contributions
 Sp+0.019
 fro+0.018
 nib+0.016
 perform+0.015
 sne+0.015
Neg logit contributions
aren-0.030
uster-0.029
riet-0.029
lynn-0.028
 policy-0.028
Tokeners
Token ID364
Feature activation+1.03e-01

Pos logit contributions
 Sp+0.034
 fro+0.033
 perform+0.031
 nib+0.030
 slur+0.029
Neg logit contributions
aren-0.022
uster-0.021
oys-0.021
inger-0.021
lynn-0.021
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
aren+0.230
inger+0.227
oys+0.225
lynn+0.219
riet+0.216
Neg logit contributions
 fro-0.108
 Sp-0.103
 nib-0.097
 perform-0.091
 slur-0.087
Token They
Token ID1119
Feature activation+8.70e-03

Pos logit contributions
aren+0.023
inger+0.023
oys+0.023
lynn+0.022
riet+0.021
Neg logit contributions
 fro-0.017
 Sp-0.016
 nib-0.016
 perform-0.015
 slur-0.014
Token liked
Token ID8288
Feature activation-1.50e-02

Pos logit contributions
aren+0.013
oys+0.013
athy+0.012
riet+0.012
inger+0.012
Neg logit contributions
 fro-0.014
 Sp-0.013
 nib-0.013
 slur-0.012
cast-0.012
Token to
Token ID284
Feature activation-3.66e-02

Pos logit contributions
 fro+0.027
 gag+0.023
 nib+0.023
fish+0.022
 perform+0.022
Neg logit contributions
oys-0.021
aren-0.019
 policy-0.019
athy-0.018
 supposed-0.018
Token play
Token ID711
Feature activation-1.11e-01

Pos logit contributions
 fro+0.041
 Sp+0.039
cast+0.035
 nib+0.034
fish+0.031
Neg logit contributions
oys-0.072
inger-0.071
aren-0.070
lynn-0.069
ace-0.068
Token together
Token ID1978
Feature activation-3.34e-02

Pos logit contributions
 fro+0.160
 gag+0.153
 nib+0.144
 slur+0.131
 Sp+0.131
Neg logit contributions
oys-0.195
inger-0.174
aren-0.174
lynn-0.171
issors-0.169
Token The
Token ID383
Feature activation-9.56e-04

Pos logit contributions
 fro+0.006
 Sp+0.006
 nib+0.006
 perform+0.005
 slur+0.005
Neg logit contributions
aren-0.010
inger-0.010
oys-0.009
lynn-0.009
athy-0.009
Token owner
Token ID4870
Feature activation+6.29e-02

Pos logit contributions
 fro+0.001
 dream+0.001
 nib+0.001
 sne+0.001
 dance+0.001
Neg logit contributions
inger-0.002
aren-0.002
uster-0.002
riet-0.002
oys-0.002
Token gave
Token ID2921
Feature activation+4.55e-02

Pos logit contributions
aren+0.110
oys+0.102
inger+0.099
athy+0.099
riet+0.095
Neg logit contributions
 fro-0.096
 Sp-0.089
 gag-0.087
 nib-0.085
 perform-0.081
Token her
Token ID607
Feature activation-1.59e-02

Pos logit contributions
oys+0.109
aren+0.109
inger+0.107
lynn+0.104
 policy+0.102
Neg logit contributions
 fro-0.043
 Sp-0.040
 nib-0.037
 perform-0.033
 slur-0.032
Token a
Token ID257
Feature activation-3.92e-02

Pos logit contributions
 fro+0.014
 Sp+0.013
 nib+0.012
 perform+0.011
 slur+0.011
Neg logit contributions
aren-0.039
oys-0.038
inger-0.038
lynn-0.036
riet-0.036
Token big
Token ID1263
Feature activation+6.63e-03

Pos logit contributions
 fro+0.059
 Sp+0.056
 nib+0.054
 perform+0.051
 slur+0.050
Neg logit contributions
aren-0.070
oys-0.068
inger-0.068
lynn-0.065
riet-0.063
Token hug
Token ID16225
Feature activation+1.00e-01

Pos logit contributions
aren+0.011
inger+0.011
oys+0.011
lynn+0.010
riet+0.010
Neg logit contributions
 fro-0.011
 nib-0.010
 Sp-0.010
 slur-0.010
 perform-0.010
Token and
Token ID290
Feature activation-1.55e-03

Pos logit contributions
aren+0.222
inger+0.221
oys+0.216
lynn+0.212
athy+0.208
Neg logit contributions
 fro-0.106
 Sp-0.100
 nib-0.095
 perform-0.090
 slur-0.082
Token praised
Token ID15342
Feature activation+3.73e-02

Pos logit contributions
 fro+0.002
 Sp+0.002
 nib+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
aren-0.003
oys-0.003
inger-0.003
riet-0.003
lynn-0.003
Token her
Token ID607
Feature activation-1.79e-02

Pos logit contributions
aren+0.075
oys+0.074
inger+0.072
riet+0.072
lynn+0.071
Neg logit contributions
 fro-0.045
 nib-0.041
 Sp-0.039
 slur-0.038
 perform-0.038
Token for
Token ID329
Feature activation-5.43e-02

Pos logit contributions
 fro+0.022
 Sp+0.020
 nib+0.020
 perform+0.018
 slur+0.018
Neg logit contributions
aren-0.037
oys-0.037
inger-0.037
lynn-0.035
riet-0.034
Token did
Token ID750
Feature activation-7.84e-02

Pos logit contributions
aren+0.044
inger+0.044
oys+0.044
lynn+0.042
athy+0.041
Neg logit contributions
 fro-0.034
 Sp-0.032
 nib-0.030
 slur-0.028
 perform-0.028
Token today
Token ID1909
Feature activation-1.05e-01

Pos logit contributions
 fro+0.075
 Sp+0.073
 perform+0.070
 nib+0.070
 slur+0.063
Neg logit contributions
aren-0.184
inger-0.178
oys-0.176
lynn-0.170
riet-0.169
Token."
Token ID526
Feature activation-3.11e-02

Pos logit contributions
 fro+0.138
 Sp+0.130
 nib+0.125
 perform+0.117
 slur+0.116
Neg logit contributions
aren-0.208
oys-0.204
inger-0.203
lynn-0.194
riet-0.191
Token
Token ID220
Feature activation+3.96e-02

Pos logit contributions
 fro+0.044
 nib+0.040
 perform+0.039
 Sp+0.039
 slur+0.037
Neg logit contributions
aren-0.059
oys-0.055
athy-0.054
inger-0.053
riet-0.053
Token\n
Token ID198
Feature activation-1.75e-02

Pos logit contributions
aren+0.091
oys+0.090
inger+0.090
lynn+0.087
riet+0.086
Neg logit contributions
 fro-0.038
 Sp-0.036
 nib-0.033
 perform-0.030
 slur-0.029
Token\n
Token ID198
Feature activation-1.39e-02

Pos logit contributions
 fro+0.021
 Sp+0.019
 nib+0.019
 perform+0.018
 slur+0.018
Neg logit contributions
aren-0.037
inger-0.035
oys-0.035
lynn-0.034
athy-0.034
TokenThe
Token ID464
Feature activation+3.61e-02

Pos logit contributions
 fro+0.024
 nib+0.023
 Sp+0.023
 perform+0.022
 slur+0.021
Neg logit contributions
aren-0.022
oys-0.021
inger-0.020
athy-0.020
lynn-0.020
Token moral
Token ID6573
Feature activation+1.51e-02

Pos logit contributions
aren+0.062
inger+0.061
oys+0.059
lynn+0.057
riet+0.057
Neg logit contributions
 fro-0.059
 Sp-0.054
 nib-0.053
 perform-0.050
 slur-0.050
Token of
Token ID286
Feature activation+2.22e-02

Pos logit contributions
inger+0.037
aren+0.035
 supervision+0.035
riet+0.035
lynn+0.035
Neg logit contributions
 Sp-0.012
 fro-0.010
 perform-0.010
 dream-0.010
 soup-0.009
Token the
Token ID262
Feature activation+3.04e-05

Pos logit contributions
aren+0.046
oys+0.045
inger+0.045
lynn+0.044
athy+0.043
Neg logit contributions
 fro-0.027
 Sp-0.024
 nib-0.024
 perform-0.022
 slur-0.022
Token story
Token ID1621
Feature activation-6.58e-02

Pos logit contributions
aren+0.000
inger+0.000
oys+0.000
lynn+0.000
riet+0.000
Neg logit contributions
 fro-0.000
 Sp-0.000
 nib-0.000
 perform-0.000
 slur-0.000
Token eager
Token ID11069
Feature activation+3.41e-02

Pos logit contributions
aren+0.021
oys+0.021
inger+0.021
lynn+0.020
riet+0.020
Neg logit contributions
 fro-0.011
 Sp-0.010
 nib-0.010
 perform-0.009
 slur-0.009
Token to
Token ID284
Feature activation-2.97e-02

Pos logit contributions
aren+0.074
oys+0.073
inger+0.073
lynn+0.070
riet+0.069
Neg logit contributions
 fro-0.038
 Sp-0.036
 nib-0.034
 perform-0.032
 slur-0.031
Token number
Token ID1271
Feature activation+6.04e-02

Pos logit contributions
 fro+0.028
 Sp+0.026
 nib+0.025
 perform+0.023
 slur+0.022
Neg logit contributions
aren-0.070
oys-0.069
inger-0.069
lynn-0.066
riet-0.066
Token more
Token ID517
Feature activation-3.92e-03

Pos logit contributions
inger+0.139
aren+0.139
oys+0.137
lynn+0.132
athy+0.131
Neg logit contributions
 fro-0.061
 Sp-0.056
 nib-0.053
 perform-0.050
 slur-0.047
Token things
Token ID1243
Feature activation+3.61e-02

Pos logit contributions
 fro+0.006
 Sp+0.005
 nib+0.005
 perform+0.005
 slur+0.005
Neg logit contributions
aren-0.007
inger-0.007
oys-0.007
lynn-0.007
riet-0.007
Token and
Token ID290
Feature activation+6.53e-03

Pos logit contributions
aren+0.088
inger+0.088
oys+0.086
athy+0.085
lynn+0.083
Neg logit contributions
 fro-0.029
 Sp-0.027
 nib-0.026
 perform-0.026
 dance-0.022
Token have
Token ID423
Feature activation-7.74e-04

Pos logit contributions
aren+0.013
oys+0.013
inger+0.013
lynn+0.012
riet+0.012
Neg logit contributions
 fro-0.008
 Sp-0.008
 nib-0.007
 perform-0.007
 slur-0.007
Token more
Token ID517
Feature activation-6.04e-03

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 sne+0.001
Neg logit contributions
inger-0.002
oys-0.002
aren-0.002
uster-0.002
riet-0.002
Token fun
Token ID1257
Feature activation+1.66e-03

Pos logit contributions
 fro+0.008
 nib+0.007
 Sp+0.007
 perform+0.007
 slur+0.007
Neg logit contributions
aren-0.012
inger-0.012
oys-0.012
lynn-0.011
riet-0.011
Token.
Token ID13
Feature activation+2.79e-03

Pos logit contributions
aren+0.003
oys+0.003
inger+0.003
lynn+0.003
 policy+0.003
Neg logit contributions
 fro-0.002
 Sp-0.002
 nib-0.002
 perform-0.002
 slur-0.002
Token<|endoftext|>
Token ID50256
Feature activation-8.74e-02

Pos logit contributions
inger+0.006
aren+0.006
oys+0.006
lynn+0.006
riet+0.006
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.003
 perform-0.002
 slur-0.002
Token had
Token ID550
Feature activation+8.76e-02

Pos logit contributions
 fro+0.095
 Sp+0.088
 nib+0.086
 perform+0.081
 slur+0.078
Neg logit contributions
aren-0.160
oys-0.159
inger-0.157
lynn-0.152
riet-0.149
Token won
Token ID1839
Feature activation+2.85e-02

Pos logit contributions
oys+0.173
inger+0.170
aren+0.170
athy+0.166
riet+0.165
Neg logit contributions
 fro-0.114
 nib-0.110
 Sp-0.109
 sne-0.107
 perform-0.105
Token!
Token ID0
Feature activation+2.57e-02

Pos logit contributions
aren+0.064
oys+0.063
inger+0.063
athy+0.061
riet+0.061
Neg logit contributions
 fro-0.029
 Sp-0.029
 perform-0.026
 nib-0.026
 sne-0.023
Token\n
Token ID198
Feature activation-1.90e-02

Pos logit contributions
aren+0.059
inger+0.059
oys+0.058
lynn+0.056
athy+0.055
Neg logit contributions
 fro-0.025
 Sp-0.025
 nib-0.022
 perform-0.020
 slur-0.019
Token\n
Token ID198
Feature activation-1.70e-02

Pos logit contributions
 fro+0.025
 nib+0.023
 Sp+0.022
 perform+0.022
 slur+0.020
Neg logit contributions
aren-0.038
oys-0.037
inger-0.037
lynn-0.036
athy-0.034
TokenJack
Token ID14295
Feature activation-7.38e-03

Pos logit contributions
 fro+0.028
 nib+0.026
 Sp+0.026
 perform+0.024
 slur+0.024
Neg logit contributions
aren-0.029
oys-0.029
inger-0.028
lynn-0.027
riet-0.026
Token held
Token ID2714
Feature activation+8.02e-02

Pos logit contributions
 fro+0.009
 Sp+0.008
 nib+0.008
 perform+0.007
 slur+0.007
Neg logit contributions
aren-0.016
oys-0.016
inger-0.016
lynn-0.015
 policy-0.015
Token up
Token ID510
Feature activation+2.36e-03

Pos logit contributions
oys+0.186
aren+0.185
inger+0.185
athy+0.175
 policy+0.173
Neg logit contributions
 fro-0.082
 Sp-0.075
 nib-0.067
 perform-0.064
 slur-0.063
Token the
Token ID262
Feature activation-4.72e-03

Pos logit contributions
aren+0.005
inger+0.005
oys+0.005
lynn+0.005
athy+0.005
Neg logit contributions
 fro-0.002
 Sp-0.002
 nib-0.002
 perform-0.002
 slur-0.002
Token prize
Token ID11596
Feature activation+6.79e-02

Pos logit contributions
 fro+0.006
 perform+0.005
 dance+0.005
 nib+0.005
 swim+0.005
Neg logit contributions
oys-0.010
inger-0.010
aren-0.010
athy-0.010
uster-0.009
Token in
Token ID287
Feature activation-1.29e-02

Pos logit contributions
aren+0.153
inger+0.150
oys+0.150
athy+0.144
riet+0.143
Neg logit contributions
 fro-0.069
 nib-0.062
 Sp-0.062
 perform-0.057
 slur-0.053
Token cup
Token ID6508
Feature activation-8.99e-02

Pos logit contributions
aren+0.021
inger+0.021
oys+0.020
lynn+0.020
riet+0.020
Neg logit contributions
 fro-0.015
 Sp-0.015
 nib-0.014
 perform-0.013
 slur-0.013
Tokenboard
Token ID3526
Feature activation-7.85e-02

Pos logit contributions
 fro+0.112
 Sp+0.108
 nib+0.102
 perform+0.098
 slur+0.093
Neg logit contributions
aren-0.183
inger-0.181
oys-0.175
lynn-0.173
riet-0.170
Token with
Token ID351
Feature activation-2.16e-03

Pos logit contributions
 Sp+0.056
 fro+0.055
 nib+0.053
 perform+0.049
 slur+0.042
Neg logit contributions
inger-0.199
aren-0.198
oys-0.190
lynn-0.190
riet-0.189
Token many
Token ID867
Feature activation+1.43e-02

Pos logit contributions
 fro+0.002
 Sp+0.002
 nib+0.002
 slur+0.002
 perform+0.002
Neg logit contributions
aren-0.005
inger-0.005
oys-0.005
lynn-0.005
athy-0.004
Token shelves
Token ID20726
Feature activation+9.34e-02

Pos logit contributions
inger+0.026
aren+0.025
oys+0.025
lynn+0.025
riet+0.024
Neg logit contributions
 fro-0.022
 Sp-0.022
 nib-0.021
 slur-0.020
 perform-0.019
Token and
Token ID290
Feature activation+2.67e-03

Pos logit contributions
inger+0.245
aren+0.241
athy+0.237
lynn+0.235
riet+0.232
Neg logit contributions
 Sp-0.057
 nib-0.054
 fro-0.052
 perform-0.049
 grow-0.047
Token it
Token ID340
Feature activation+1.24e-02

Pos logit contributions
aren+0.005
oys+0.005
inger+0.005
lynn+0.005
riet+0.005
Neg logit contributions
 fro-0.004
 Sp-0.003
 nib-0.003
 perform-0.003
 slur-0.003
Token was
Token ID373
Feature activation+4.27e-03

Pos logit contributions
aren+0.024
oys+0.023
inger+0.023
lynn+0.022
riet+0.021
Neg logit contributions
 fro-0.017
 nib-0.015
 Sp-0.015
 slur-0.014
 perform-0.014
Token perfect
Token ID2818
Feature activation-6.04e-03

Pos logit contributions
aren+0.009
oys+0.008
inger+0.008
lynn+0.008
riet+0.008
Neg logit contributions
 fro-0.005
 Sp-0.005
 nib-0.005
 perform-0.005
 slur-0.004
Token for
Token ID329
Feature activation-5.23e-02

Pos logit contributions
 fro+0.006
 Sp+0.006
 nib+0.005
 perform+0.005
 slur+0.005
Neg logit contributions
aren-0.014
inger-0.014
oys-0.013
lynn-0.013
athy-0.013
Token all
Token ID477
Feature activation+6.75e-03

Pos logit contributions
 fro+0.059
 Sp+0.055
 nib+0.052
 perform+0.049
 slur+0.048
Neg logit contributions
aren-0.114
oys-0.111
inger-0.111
lynn-0.107
riet-0.105
Token,
Token ID11
Feature activation+2.01e-04

Pos logit contributions
aren+0.069
inger+0.068
oys+0.066
lynn+0.064
athy+0.063
Neg logit contributions
 fro-0.054
 Sp-0.053
 nib-0.049
 perform-0.047
 slur-0.045
Token "
Token ID366
Feature activation+4.37e-02

Pos logit contributions
aren+0.000
oys+0.000
inger+0.000
lynn+0.000
riet+0.000
Neg logit contributions
 fro-0.000
 Sp-0.000
 nib-0.000
 perform-0.000
 slur-0.000
TokenHow
Token ID2437
Feature activation-1.14e-01

Pos logit contributions
aren+0.088
inger+0.087
oys+0.087
lynn+0.083
riet+0.081
Neg logit contributions
 fro-0.056
 Sp-0.053
 nib-0.050
 perform-0.048
 slur-0.046
Token do
Token ID466
Feature activation-1.27e-01

Pos logit contributions
 fro+0.162
 nib+0.152
 Sp+0.150
 slur+0.141
 perform+0.131
Neg logit contributions
oys-0.208
aren-0.207
inger-0.198
lynn-0.196
 policy-0.193
Token you
Token ID345
Feature activation+2.18e-02

Pos logit contributions
 fro+0.177
 Sp+0.167
 nib+0.161
 perform+0.153
 slur+0.149
Neg logit contributions
aren-0.242
oys-0.237
inger-0.237
lynn-0.227
riet-0.222
Token do
Token ID466
Feature activation-1.27e-01

Pos logit contributions
aren+0.043
inger+0.043
oys+0.043
lynn+0.041
riet+0.040
Neg logit contributions
 fro-0.027
 Sp-0.026
 nib-0.024
 slur-0.022
 perform-0.022
Token that
Token ID326
Feature activation+1.42e-02

Pos logit contributions
 fro+0.185
 nib+0.168
 slur+0.168
 Sp+0.167
 gag+0.162
Neg logit contributions
lynn-0.210
oys-0.209
aren-0.209
inger-0.203
 policy-0.199
Token,
Token ID11
Feature activation-8.41e-04

Pos logit contributions
inger+0.024
oys+0.024
lynn+0.024
 policy+0.024
aren+0.023
Neg logit contributions
 Sp-0.020
 fro-0.019
 slur-0.019
 nib-0.019
 gag-0.017
Token Sarah
Token ID10490
Feature activation+1.20e-01

Pos logit contributions
 fro+0.001
 Sp+0.001
 nib+0.001
 perform+0.001
 slur+0.001
Neg logit contributions
aren-0.001
oys-0.001
inger-0.001
lynn-0.001
riet-0.001
Token?"
Token ID1701
Feature activation-5.80e-02

Pos logit contributions
aren+0.237
athy+0.229
oys+0.229
inger+0.227
lynn+0.224
Neg logit contributions
 fro-0.150
 nib-0.138
 slur-0.137
 Sp-0.133
 gag-0.127
Token\n
Token ID198
Feature activation-3.94e-03

Pos logit contributions
 fro+0.113
 nib+0.107
 Sp+0.106
 perform+0.102
 slur+0.102
Neg logit contributions
aren-0.079
oys-0.076
inger-0.073
lynn-0.072
riet-0.068
Token fell
Token ID3214
Feature activation-2.74e-02

Pos logit contributions
 fro+0.032
 Sp+0.030
 nib+0.028
 perform+0.027
 gag+0.027
Neg logit contributions
aren-0.033
inger-0.031
athy-0.031
oys-0.031
riet-0.030
Token asleep
Token ID16039
Feature activation-1.19e-01

Pos logit contributions
 fro+0.037
 Sp+0.034
 nib+0.034
 slur+0.033
 perform+0.033
Neg logit contributions
inger-0.051
aren-0.050
oys-0.050
uster-0.049
lynn-0.047
Token with
Token ID351
Feature activation-2.03e-03

Pos logit contributions
 fro+0.169
 Sp+0.153
 nib+0.142
 slur+0.140
 perform+0.137
Neg logit contributions
aren-0.216
inger-0.214
oys-0.213
lynn-0.202
riet-0.200
Token a
Token ID257
Feature activation-3.53e-02

Pos logit contributions
 fro+0.002
 Sp+0.002
 nib+0.002
 perform+0.002
 slur+0.002
Neg logit contributions
aren-0.004
oys-0.004
inger-0.004
lynn-0.004
riet-0.004
Token happy
Token ID3772
Feature activation-6.54e-02

Pos logit contributions
 fro+0.050
 Sp+0.048
 nib+0.046
 perform+0.044
 slur+0.043
Neg logit contributions
aren-0.066
inger-0.064
oys-0.064
lynn-0.061
riet-0.060
Token dream
Token ID4320
Feature activation-1.69e-01

Pos logit contributions
 Sp+0.090
 fro+0.090
 nib+0.079
 perform+0.077
cast+0.074
Neg logit contributions
aren-0.120
oys-0.118
inger-0.115
 policy-0.111
lynn-0.111
Token.
Token ID13
Feature activation-6.67e-03

Pos logit contributions
 fro+0.182
 Sp+0.169
 nib+0.161
 perform+0.159
 slur+0.144
Neg logit contributions
aren-0.373
oys-0.363
inger-0.362
lynn-0.351
athy-0.348
Token<|endoftext|>
Token ID50256
Feature activation-9.46e-02

Pos logit contributions
 fro+0.008
 Sp+0.007
 nib+0.007
 perform+0.006
 slur+0.006
Neg logit contributions
aren-0.014
oys-0.014
inger-0.014
lynn-0.013
riet-0.013
TokenTom
Token ID13787
Feature activation+8.97e-02

Pos logit contributions
 fro+0.129
 slur+0.119
 nib+0.117
 Sp+0.116
 gag+0.113
Neg logit contributions
aren-0.179
inger-0.172
oys-0.169
athy-0.163
lynn-0.163
Token and
Token ID290
Feature activation+5.68e-03

Pos logit contributions
aren+0.142
oys+0.139
inger+0.138
lynn+0.131
riet+0.128
Neg logit contributions
 fro-0.153
 Sp-0.146
 nib-0.142
 perform-0.137
 slur-0.134
Token Lily
Token ID20037
Feature activation-3.26e-02

Pos logit contributions
aren+0.013
oys+0.012
inger+0.012
lynn+0.012
riet+0.012
Neg logit contributions
 fro-0.006
 Sp-0.006
 nib-0.006
 perform-0.005
 slur-0.005
TokenWow
Token ID22017
Feature activation-2.77e-03

Pos logit contributions
aren+0.136
oys+0.132
inger+0.128
lynn+0.122
athy+0.120
Neg logit contributions
 fro-0.138
 Sp-0.132
 nib-0.129
 perform-0.126
 slur-0.124
Token,
Token ID11
Feature activation+1.05e-02

Pos logit contributions
 fro+0.004
 nib+0.004
 Sp+0.004
 perform+0.004
 slur+0.004
Neg logit contributions
oys-0.004
aren-0.004
inger-0.004
lynn-0.004
riet-0.004
Token look
Token ID804
Feature activation-1.05e-02

Pos logit contributions
oys+0.017
aren+0.016
 ours+0.016
 Stephanie+0.015
 policy+0.014
Neg logit contributions
cast-0.015
 fro-0.015
 nib-0.015
 perform-0.014
bel-0.014
Token at
Token ID379
Feature activation-8.26e-02

Pos logit contributions
 perform+0.015
 fro+0.015
 Sp+0.015
 gag+0.014
Ro+0.014
Neg logit contributions
 ours-0.016
aren-0.015
oys-0.015
ige-0.014
riet-0.014
Token that
Token ID326
Feature activation+1.58e-02

Pos logit contributions
 fro+0.113
 nib+0.109
 Sp+0.104
 slur+0.103
 perform+0.103
Neg logit contributions
aren-0.148
oys-0.146
inger-0.144
athy-0.138
lynn-0.138
Token bug
Token ID5434
Feature activation-1.58e-01

Pos logit contributions
inger+0.027
oys+0.025
 policy+0.024
 way+0.024
 courage+0.024
Neg logit contributions
 slur-0.021
bel-0.021
 Sp-0.020
 nib-0.020
 perform-0.019
Token!"
Token ID2474
Feature activation-4.33e-02

Pos logit contributions
 Sp+0.294
 fro+0.286
 slur+0.283
 gag+0.282
 perform+0.270
Neg logit contributions
inger-0.179
aren-0.173
 ours-0.161
oys-0.158
riet-0.152
Token Tim
Token ID5045
Feature activation+3.65e-04

Pos logit contributions
 fro+0.066
 perform+0.063
 nib+0.062
cast+0.061
atmeal+0.061
Neg logit contributions
aren-0.071
oys-0.067
 Stephanie-0.061
riet-0.061
inger-0.059
Token said
Token ID531
Feature activation-2.00e-02

Pos logit contributions
aren+0.001
oys+0.001
inger+0.001
lynn+0.001
riet+0.001
Neg logit contributions
 fro-0.000
 Sp-0.000
 nib-0.000
 perform-0.000
 slur-0.000
Token.
Token ID13
Feature activation+6.86e-03

Pos logit contributions
 fro+0.027
 Sp+0.024
 nib+0.023
 perform+0.022
 slur+0.022
Neg logit contributions
oys-0.038
aren-0.038
inger-0.037
lynn-0.036
riet-0.036
Token "
Token ID366
Feature activation+4.64e-02

Pos logit contributions
aren+0.011
oys+0.011
inger+0.010
lynn+0.010
riet+0.009
Neg logit contributions
 fro-0.011
 perform-0.010
 nib-0.010
 Sp-0.010
 dream-0.010
Token the
Token ID262
Feature activation+9.04e-03

Pos logit contributions
aren+0.056
oys+0.055
inger+0.054
lynn+0.053
riet+0.053
Neg logit contributions
 fro-0.030
 nib-0.027
 Sp-0.027
 perform-0.025
 slur-0.025
Token magician
Token ID42055
Feature activation-5.24e-02

Pos logit contributions
aren+0.015
oys+0.014
inger+0.014
lynn+0.014
riet+0.013
Neg logit contributions
 fro-0.015
 Sp-0.014
 nib-0.014
 slur-0.013
 perform-0.013
Token,
Token ID11
Feature activation-4.37e-03

Pos logit contributions
 fro+0.067
 Sp+0.063
 nib+0.061
 slur+0.058
 perform+0.055
Neg logit contributions
aren-0.103
oys-0.103
inger-0.100
lynn-0.097
riet-0.096
Token "
Token ID366
Feature activation+3.31e-02

Pos logit contributions
 fro+0.007
 Sp+0.007
 nib+0.007
 perform+0.006
 slur+0.006
Neg logit contributions
aren-0.007
oys-0.007
inger-0.007
lynn-0.006
riet-0.006
TokenHow
Token ID2437
Feature activation-1.16e-01

Pos logit contributions
aren+0.064
oys+0.062
inger+0.062
lynn+0.060
athy+0.059
Neg logit contributions
 fro-0.045
 Sp-0.043
 nib-0.041
 slur-0.039
 perform-0.039
Token do
Token ID466
Feature activation-1.34e-01

Pos logit contributions
 fro+0.166
 nib+0.152
 Sp+0.148
 slur+0.135
 perform+0.126
Neg logit contributions
oys-0.214
aren-0.209
 policy-0.205
lynn-0.203
inger-0.202
Token you
Token ID345
Feature activation+1.96e-02

Pos logit contributions
 fro+0.177
 Sp+0.166
 nib+0.162
 perform+0.150
 slur+0.150
Neg logit contributions
aren-0.264
oys-0.259
inger-0.257
lynn-0.248
riet-0.243
Token make
Token ID787
Feature activation-3.68e-02

Pos logit contributions
inger+0.035
aren+0.034
oys+0.034
lynn+0.033
riet+0.032
Neg logit contributions
 fro-0.027
 Sp-0.027
 nib-0.024
 slur-0.023
 gag-0.022
Token things
Token ID1243
Feature activation+3.03e-02

Pos logit contributions
 fro+0.051
 nib+0.049
 Sp+0.047
 slur+0.046
 gag+0.044
Neg logit contributions
inger-0.063
oys-0.062
aren-0.062
lynn-0.061
athy-0.060
Token vanish
Token ID39572
Feature activation-4.40e-02

Pos logit contributions
oys+0.051
inger+0.050
 ours+0.048
lynn+0.048
aren+0.047
Neg logit contributions
 fro-0.044
 Sp-0.042
 nib-0.041
 slur-0.039
 gag-0.037
Token?"
Token ID1701
Feature activation-7.21e-02

Pos logit contributions
 fro+0.056
 Sp+0.053
 nib+0.052
 slur+0.051
 gag+0.046
Neg logit contributions
inger-0.084
oys-0.084
aren-0.082
lynn-0.079
 policy-0.078
Token very
Token ID845
Feature activation-5.60e-02

Pos logit contributions
aren+0.004
oys+0.004
inger+0.004
lynn+0.004
riet+0.004
Neg logit contributions
 fro-0.003
 Sp-0.003
 nib-0.002
 perform-0.002
 slur-0.002
Token happy
Token ID3772
Feature activation-6.80e-02

Pos logit contributions
 fro+0.067
 Sp+0.062
 nib+0.060
 perform+0.058
 slur+0.056
Neg logit contributions
aren-0.119
oys-0.118
inger-0.118
lynn-0.115
athy-0.112
Token -
Token ID532
Feature activation+1.12e-03

Pos logit contributions
 fro+0.098
 Sp+0.088
 nib+0.085
 perform+0.081
 slur+0.080
Neg logit contributions
oys-0.126
aren-0.125
inger-0.117
 policy-0.117
lynn-0.116
Token the
Token ID262
Feature activation-3.07e-03

Pos logit contributions
aren+0.003
oys+0.003
inger+0.003
lynn+0.003
riet+0.003
Neg logit contributions
 fro-0.001
 Sp-0.001
 nib-0.001
 perform-0.001
 slur-0.001
Token new
Token ID649
Feature activation+1.03e-01

Pos logit contributions
 fro+0.003
 Sp+0.003
 nib+0.003
 perform+0.003
 slur+0.002
Neg logit contributions
aren-0.007
inger-0.007
oys-0.007
lynn-0.007
riet-0.007
Token cartoon
Token ID16251
Feature activation-1.52e-01

Pos logit contributions
aren+0.244
oys+0.241
inger+0.235
lynn+0.230
 policy+0.227
Neg logit contributions
 fro-0.094
 Sp-0.083
 nib-0.081
 perform-0.071
 slur-0.071
Token had
Token ID550
Feature activation+8.80e-02

Pos logit contributions
 fro+0.197
 Sp+0.181
 nib+0.177
 slur+0.162
 perform+0.162
Neg logit contributions
aren-0.303
oys-0.298
inger-0.289
lynn-0.281
 policy-0.277
Token been
Token ID587
Feature activation+8.00e-02

Pos logit contributions
aren+0.173
inger+0.172
oys+0.171
uster+0.168
issors+0.166
Neg logit contributions
 fro-0.119
 nib-0.110
 perform-0.109
 Sp-0.109
 sne-0.106
Token even
Token ID772
Feature activation-1.16e-01

Pos logit contributions
inger+0.167
oys+0.167
aren+0.166
lynn+0.162
riet+0.160
Neg logit contributions
 fro-0.098
 Sp-0.092
 nib-0.088
 perform-0.085
 sne-0.084
Token more
Token ID517
Feature activation-1.34e-02

Pos logit contributions
 fro+0.144
 Sp+0.136
 nib+0.130
 perform+0.122
 slur+0.120
Neg logit contributions
aren-0.238
oys-0.233
inger-0.232
lynn-0.223
riet-0.219
Token exciting
Token ID7895
Feature activation+2.14e-02

Pos logit contributions
 fro+0.017
 nib+0.017
 Sp+0.016
 perform+0.016
 dream+0.016
Neg logit contributions
aren-0.027
oys-0.027
inger-0.026
issors-0.025
lynn-0.025