TOP ACTIVATIONS
MAX = 8.605

shoot hay and eat cup cakes . It was a good
and his friends ate cup cakes and played with the fixed
mom to make some cup cakes . But it
sandwiches and some little cup cakes . Lily was happy because
help her make some cup cakes and put up some decorations
her mom had made cup cakes ! Bella was so excited
Grand ma was making cup cakes for the family . Dave
Can I put the cup cakes in the oven ?" June
! Finally , the cup cakes were ready and they took
When the cup cakes were done baking , they
playing games and eating cup cakes . At the
Mom my was baking cup cakes . She got out the
hard so that the cup cakes would be extra fluffy .
Finally , the cup cakes were ready and they all
She loved to bake cup cakes with her mom my .
When the cup cakes were done , Mill ie
, they put the cup cakes in the oven to bake
and enjoyed the delicious cup cakes . From that day on
my and eat delicious cup cakes ! <|endoftext|> Once upon a
, Dave y 's cup cakes were done , and they

MIN ACTIVATIONS
MIN = -7.849

like taking strange things from strangers . Live and learn !
instinct can often save us unnecessary danger . <|endoftext|> Once upon
important to be careful around strangers , especially ones who act
them to stay away from strangers . But they did not
your parents and not take unnecessary risks . <|endoftext|> One day
careful and not talk to strangers . Sara and Ben agreed
for the help of kind strangers . <|endoftext|> Once upon a
to be too scared of strangers . <|endoftext|> Once upon a
The animals did not like strangers , so they began to
animals safe and chased away strangers who came too close to
call the park her " use ful - park ". <|endoftext|>
valuable lesson - never trust strangers . <|endoftext|> Once upon a
knew how to deal with strangers . She said to the
dad and not talk to strangers . They said they would
her and never talked to strangers again . <|endoftext|> Once upon
her to be careful of strangers . A stranger is someone
, even if they are strangers . From that day on
to always be careful around strangers . <|endoftext|> Once upon a
't scared of strangers anymore - he was very
friends and not talking to strangers . Lily thought that was
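The TOP and MIN ACTIVATIONS lists appear to show, for each of the highest- and lowest-activating token positions, a window of surrounding tokens. Judging from the per-token records below (e.g. "shoot hay and eat cup cakes . It was a good" centred on "cakes"), the window looks like five tokens on each side, with newline tokens dropped from the display. The sketch below reproduces that kind of listing under stated assumptions: activations live in a hypothetical `[n_seqs, seq_len]` NumPy array, and `extreme_contexts` and its `window` default are illustrative names, not the tool's actual API.

```python
import numpy as np

def extreme_contexts(acts, tokens, k=3, window=(5, 5), largest=True):
    """Return (activation, context) pairs for the k most extreme token positions.

    acts:   float array [n_seqs, seq_len], one feature activation per token (assumed layout)
    tokens: nested list of token strings with the same shape as acts
    window: tokens shown before/after the selected position (assumed, ~5 each side)
    """
    flat = acts.ravel()
    order = np.argsort(flat)  # ascending by activation
    picked = order[::-1][:k] if largest else order[:k]
    seq_len = acts.shape[1]
    out = []
    for idx in picked:
        s, p = divmod(int(idx), seq_len)  # recover (sequence, position)
        lo = max(0, p - window[0])
        hi = min(seq_len, p + window[1] + 1)
        # Strip BPE leading spaces and drop whitespace-only tokens such as "\n",
        # mirroring how the dump prints contexts as space-separated tokens.
        ctx = " ".join(t.strip() for t in tokens[s][lo:hi] if t.strip())
        out.append((float(flat[idx]), ctx))
    return out
```

Calling it once with `largest=True` and once with `largest=False` gives the two lists; a real dashboard may additionally deduplicate overlapping windows, which this sketch does not attempt.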

INTERVAL 4.491 - 8.605
CONTAINS 0.161%

said it was time to un pack their bags . Tom
up the blocks . They work together to make a new
Artist ". She made the art . She sees Tom and
flags and towers with cr ay ons . They put the
arrived , all of them o oh - ed and a

INTERVAL 0.378 - 4.491
CONTAINS 32.760%

she smiled as she skipped home , counting every step along
, there was a loud noise at the open house .
, but green . You can wear it when you take
Once upon a time , there was a tree . It
Once upon a time , there was a little girl named

INTERVAL -3.736 - 0.378
CONTAINS 66.530%

y and started to complain . The little girl
on the swings and the slide , but then she saw
was nothing there except a few cob we bs on the
're welcome , Lily . Let 's play again soon !"
," Ben says . He takes the doll . He tries

INTERVAL -7.849 - -3.736
CONTAINS 0.550%

land liked to help each other . They played together near
too . They hugged each other and couldn 't stop smiling
and Sam looked at each other . They felt sorry for
, " To Anna and Spot , from Grand ma and
One day , Bark y saw Wh isk ers
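The four INTERVAL bounds appear to be equal-width bins between MIN and MAX: each spans about 4.114, i.e. (8.605 - (-7.849)) / 4, giving edges -7.849, -3.736, 0.378, 4.491 and 8.605 after rounding, and the CONTAINS percentages sum to roughly 100%. A minimal sketch of that summary, assuming the activations are available as a flat array (`interval_summary` is a hypothetical helper):

```python
import numpy as np

def interval_summary(acts, n_bins=4):
    """Split [min, max] into equal-width intervals and report the share of activations in each.

    Returns (low, high, percent) tuples, highest interval first, matching the
    order of the INTERVAL sections in the dump.
    """
    acts = np.asarray(acts, dtype=float).ravel()
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    # np.histogram bins are half-open [a, b), except the last, which includes the max.
    counts, _ = np.histogram(acts, bins=edges)
    percents = 100.0 * counts / acts.size
    rows = [(edges[i], edges[i + 1], percents[i]) for i in range(n_bins)]
    return rows[::-1]

for low, high, pct in interval_summary([-7.849, -1.0, 0.0, 0.1, 2.0, 8.605]):
    print(f"INTERVAL {low:.3f} - {high:.3f}  CONTAINS {pct:.3f}%")
```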
Token shoot
Token ID2686
Feature activation-1.61e+00

Pos logit contributions
 strangers+0.011
 goats+0.010
emy+0.010
 necks+0.010
 pad+0.010
Neg logit contributions
bel-0.010
cakes-0.009
organ-0.008
 Jos-0.008
spe-0.008
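Each per-token record glues its labels directly to their values ("Token ID2686", "Feature activation-1.61e+00", " strangers+0.011"). The text after "Token" keeps the token's own leading space (" cup" versus "cakes"), which looks like GPT-2-style byte-level BPE; IDs such as 262 for " the", 13 for "." and 198 for a newline appear consistent with the GPT-2 vocabulary. A small parser for one record, assuming exactly this line layout (`parse_token_block` is a hypothetical name):

```python
import re

ID_RE = re.compile(r"^Token ID(\d+)$")
# Token text followed by a signed decimal, e.g. " strangers+0.011" or "bel-0.010".
VALUE_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")

def parse_token_block(lines):
    """Parse one 'Token ... / Token ID... / Feature activation... / Pos/Neg' record."""
    record = {"pos": [], "neg": []}
    section = None
    for line in lines:
        m = ID_RE.match(line)
        if m:
            record["id"] = int(m.group(1))
        elif line.startswith("Token"):
            # Keep the leading space: it is part of the BPE token string.
            record["token"] = line[len("Token"):]
        elif line.startswith("Feature activation"):
            record["activation"] = float(line[len("Feature activation"):])
        elif line.startswith("Pos logit"):
            section = "pos"
        elif line.startswith("Neg logit"):
            section = "neg"
        elif section and line.strip():
            v = VALUE_RE.match(line)
            if v:
                record[section].append((v.group(1), float(v.group(2))))
    return record
```

Records whose token is a bare newline print as an empty token string, so they parse with `token == ""`.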
Token hay
Token ID27678
Feature activation-1.66e+00

Pos logit contributions
 strangers+0.105
 goats+0.100
emy+0.091
 necks+0.089
 pad+0.087
Neg logit contributions
bel-0.117
cakes-0.109
organ-0.103
 Jos-0.100
spe-0.099
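Every record lists the five largest positive and five most negative logit contributions. The dump does not say how the underlying contribution vector is computed, and the magnitudes are not simply proportional to the feature activation (" hay" at -1.66 lists +0.105 for " strangers", while " shoot" at -1.61 lists +0.011), so the sketch below takes a per-vocabulary contribution vector as given and only shows the selection and formatting step. `top_bottom_k` and `format_entry` are illustrative names.

```python
import numpy as np

def top_bottom_k(contrib, vocab, k=5):
    """Return the k largest and k smallest (token, value) pairs from a contribution vector."""
    contrib = np.asarray(contrib, dtype=float)
    order = np.argsort(contrib)  # ascending
    pos = [(vocab[i], float(contrib[i])) for i in order[::-1][:k]]
    neg = [(vocab[i], float(contrib[i])) for i in order[:k]]
    return pos, neg

def format_entry(token, value):
    """Render an entry the way the dump does: token text immediately followed by a signed value."""
    return f"{token}{value:+.3f}"
```

With ties (the dump often shows equal values at three decimals), `np.argsort`'s default sort does not guarantee an order, so pass `kind="stable"` if a deterministic listing matters.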
Token and
Token ID290
Feature activation-4.87e-02

Pos logit contributions
 strangers+0.099
 goats+0.094
 pad+0.089
 necks+0.085
 goat+0.083
Neg logit contributions
bel-0.119
organ-0.117
cakes-0.114
 Jos-0.110
ushes-0.105
Token eat
Token ID4483
Feature activation-1.86e+00

Pos logit contributions
 strangers+0.003
 goats+0.003
 necks+0.003
emy+0.003
 pad+0.003
Neg logit contributions
bel-0.003
cakes-0.003
organ-0.003
he-0.003
 Jos-0.003
Token cup
Token ID6508
Feature activation+3.67e+00

Pos logit contributions
 strangers+0.114
 goats+0.107
emy+0.098
 necks+0.097
 pad+0.094
Neg logit contributions
bel-0.140
cakes-0.127
organ-0.125
 Jos-0.121
spe-0.120
Tokencakes
Token ID37263
Feature activation+8.60e+00

Pos logit contributions
bel+0.099
spe+0.093
 Jos+0.092
organ+0.080
urd+0.057
Neg logit contributions
 strangers-0.363
 goats-0.359
agging-0.341
 dogs-0.340
emy-0.335
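Reading down the per-token records, the activation sits near +3.6 on " cup" and peaks near +8.5 on the "cakes" token that follows it, while most other tokens stay between roughly -2 and +2. A minimal way to check this from parsed records is to group activations by token; the pairs below are a handful copied from the records in this dump, and `mean_activation_by_token` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

def mean_activation_by_token(records):
    """Average the feature activation for each distinct token string."""
    groups = defaultdict(list)
    for tok, act in records:
        groups[tok].append(act)
    return {tok: mean(vals) for tok, vals in groups.items()}

# (token, activation) pairs copied from records above; leading spaces are part of the token.
records = [(" eat", -1.86), (" cup", 3.67), ("cakes", 8.60), (".", 0.385),
           (" ate", -0.109), (" cup", 3.69), ("cakes", 8.60), (" and", -0.0445)]
means = mean_activation_by_token(records)
for tok, m in sorted(means.items(), key=lambda kv: -kv[1]):
    print(repr(tok), round(m, 3))
```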
Token.
Token ID13
Feature activation+3.85e-01

Pos logit contributions
bel+0.750
 Jos+0.703
organ+0.684
spe+0.671
 Suz+0.652
Neg logit contributions
 strangers-0.375
 goats-0.342
 pad-0.325
agging-0.323
 necks-0.303
Token It
Token ID632
Feature activation+5.89e-01

Pos logit contributions
bel+0.025
cakes+0.022
organ+0.021
spe+0.019
castle+0.019
Neg logit contributions
 strangers-0.026
 pad-0.025
emy-0.025
 goats-0.025
uf-0.024
Token was
Token ID373
Feature activation+3.36e-01

Pos logit contributions
bel+0.049
ushes+0.045
cakes+0.045
 Jos+0.043
he+0.042
Neg logit contributions
 strangers-0.026
 goats-0.025
 necks-0.025
emy-0.024
 fox-0.024
Token a
Token ID257
Feature activation+2.14e-01

Pos logit contributions
bel+0.026
cakes+0.025
spe+0.022
organ+0.022
 Jos+0.021
Neg logit contributions
 strangers-0.020
 goats-0.019
emy-0.018
 hesitation-0.017
 unnecessary-0.017
Token good
Token ID922
Feature activation-5.19e-01

Pos logit contributions
bel+0.014
cakes+0.014
organ+0.013
 Jos+0.012
he+0.011
Neg logit contributions
 strangers-0.014
 goats-0.013
emy-0.013
 pad-0.013
 necks-0.013
Token and
Token ID290
Feature activation-4.42e-02

Pos logit contributions
bel+0.145
cakes+0.140
 Jos+0.123
organ+0.123
ushes+0.115
Neg logit contributions
 strangers-0.134
 necks-0.128
 goats-0.127
emy-0.125
 pad-0.125
Token his
Token ID465
Feature activation-4.10e-01

Pos logit contributions
 strangers+0.003
 goats+0.003
emy+0.003
uf+0.003
 hesitation+0.003
Neg logit contributions
bel-0.003
cakes-0.003
organ-0.002
 Jos-0.002
spe-0.002
Token friends
Token ID2460
Feature activation+8.49e-01

Pos logit contributions
 strangers+0.028
emy+0.027
 goats+0.027
 necks+0.026
 pad+0.026
Neg logit contributions
bel-0.026
cakes-0.025
organ-0.023
 Jos-0.022
he-0.020
Token ate
Token ID15063
Feature activation-1.09e-01

Pos logit contributions
bel+0.057
cakes+0.055
organ+0.050
 Jos+0.048
ushes+0.046
Neg logit contributions
 strangers-0.054
 goats-0.051
 pad-0.051
 necks-0.050
 hesitation-0.049
Token cup
Token ID6508
Feature activation+3.69e+00

Pos logit contributions
 strangers+0.006
 goats+0.006
 necks+0.006
 tails+0.005
 hesitation+0.005
Neg logit contributions
bel-0.008
cakes-0.007
spe-0.007
organ-0.007
 Jos-0.007
Tokencakes
Token ID37263
Feature activation+8.60e+00

Pos logit contributions
spe+0.117
bel+0.108
 Jos+0.107
organ+0.090
urd+0.089
Neg logit contributions
 strangers-0.294
 wo-0.291
 years-0.291
agging-0.288
 dogs-0.286
Token and
Token ID290
Feature activation-4.45e-02

Pos logit contributions
bel+0.764
 Jos+0.736
spe+0.723
organ+0.668
 Suz+0.667
Neg logit contributions
 strangers-0.273
agging-0.261
 pad-0.258
 years-0.255
 necks-0.245
Token played
Token ID2826
Feature activation+1.15e+00

Pos logit contributions
 strangers+0.002
 goats+0.002
 necks+0.002
 tails+0.002
 pad+0.002
Neg logit contributions
bel-0.003
cakes-0.003
organ-0.003
he-0.003
 Jos-0.003
Token with
Token ID351
Feature activation+8.03e-01

Pos logit contributions
bel+0.065
cakes+0.064
spe+0.063
 Jos+0.061
organ+0.059
Neg logit contributions
 strangers-0.077
agging-0.077
 necks-0.075
 tails-0.072
 goats-0.071
Token the
Token ID262
Feature activation+5.11e-01

Pos logit contributions
bel+0.057
cakes+0.054
organ+0.053
spe+0.047
 Jos+0.047
Neg logit contributions
 strangers-0.052
 goats-0.048
 necks-0.045
 tails-0.043
 years-0.042
Token fixed
Token ID5969
Feature activation+3.86e+00

Pos logit contributions
bel+0.036
organ+0.034
cakes+0.033
 Jos+0.033
ill+0.030
Neg logit contributions
 strangers-0.031
 goats-0.030
 goat-0.028
 pad-0.027
 necks-0.027
Token mom
Token ID1995
Feature activation+1.48e+00

Pos logit contributions
 strangers+0.026
 goats+0.025
emy+0.024
 hesitation+0.023
uf+0.023
Neg logit contributions
bel-0.026
cakes-0.025
organ-0.023
 Jos-0.022
he-0.021
Token to
Token ID284
Feature activation-1.91e-01

Pos logit contributions
bel+0.101
cakes+0.097
organ+0.091
 Jos+0.085
spe+0.081
Neg logit contributions
 strangers-0.095
 goats-0.089
 pad-0.086
emy-0.085
 hesitation-0.084
Token make
Token ID787
Feature activation+4.11e-01

Pos logit contributions
 strangers+0.012
 goats+0.010
 hesitation+0.010
uf+0.010
 goat+0.009
Neg logit contributions
cakes-0.014
bel-0.014
organ-0.013
ushes-0.012
 Jos-0.012
Token some
Token ID617
Feature activation-9.81e-01

Pos logit contributions
bel+0.034
spe+0.032
cakes+0.032
organ+0.029
ill+0.028
Neg logit contributions
 strangers-0.020
 goats-0.018
 walks-0.018
 unnecessary-0.017
emy-0.017
Token cup
Token ID6508
Feature activation+3.69e+00

Pos logit contributions
 strangers+0.053
 goats+0.051
 dogs+0.045
agging+0.044
 hesitation+0.044
Neg logit contributions
bel-0.076
organ-0.071
cakes-0.069
spe-0.068
 Jos-0.067
Tokencakes
Token ID37263
Feature activation+8.60e+00

Pos logit contributions
spe+0.221
organ+0.178
 Jos+0.174
urd+0.162
Inc+0.156
Neg logit contributions
agging-0.207
ly-0.198
 dogs-0.193
is-0.191
 strangers-0.191
Token.
Token ID13
Feature activation+3.31e-01

Pos logit contributions
spe+0.825
bel+0.780
 Jos+0.746
 Manny+0.738
 Suz+0.731
Neg logit contributions
 tale-0.183
agging-0.182
 strangers-0.179
 pad-0.155
 wo-0.153
Token\n
Token ID198
Feature activation+3.39e-01

Pos logit contributions
bel+0.028
vell+0.027
organ+0.026
cakes+0.025
spe+0.025
Neg logit contributions
emy-0.010
uf-0.010
 goat-0.009
 goats-0.009
 Running-0.009
Token\n
Token ID198
Feature activation+3.31e-01

Pos logit contributions
bel+0.024
cakes+0.024
organ+0.022
 Jos+0.021
spe+0.020
Neg logit contributions
 strangers-0.020
emy-0.019
 goats-0.018
uf-0.018
 pad-0.017
TokenBut
Token ID1537
Feature activation-1.37e+00

Pos logit contributions
 Jos+0.026
cakes+0.026
bel+0.025
 Sus+0.024
ushes+0.024
Neg logit contributions
emy-0.013
 strangers-0.013
 goats-0.012
 necks-0.012
 goat-0.012
Token it
Token ID340
Feature activation-2.81e-01

Pos logit contributions
 strangers+0.078
 goats+0.073
 hesitation+0.070
 necks+0.070
 pad+0.069
Neg logit contributions
cakes-0.102
bel-0.102
organ-0.091
ushes-0.088
he-0.086
Token sandwiches
Token ID29777
Feature activation+5.97e-02

Pos logit contributions
 strangers+0.143
emy+0.136
 goats+0.134
leness+0.132
iley+0.129
Neg logit contributions
bel-0.104
cakes-0.101
organ-0.091
 Jos-0.086
he-0.077
Token and
Token ID290
Feature activation-5.76e-02

Pos logit contributions
bel+0.005
organ+0.005
cakes+0.005
 Jos+0.005
spe+0.005
Neg logit contributions
 strangers-0.003
 pad-0.003
 goats-0.003
 necks-0.002
emy-0.002
Token some
Token ID617
Feature activation-1.03e+00

Pos logit contributions
 strangers+0.004
emy+0.004
agging+0.004
 goats+0.004
leness+0.004
Neg logit contributions
bel-0.004
cakes-0.004
organ-0.003
 Jos-0.003
spe-0.003
Token little
Token ID1310
Feature activation-1.23e+00

Pos logit contributions
 strangers+0.067
emy+0.064
 goats+0.063
uf+0.061
iley+0.061
Neg logit contributions
bel-0.068
cakes-0.067
organ-0.059
 Jos-0.056
he-0.053
Token cup
Token ID6508
Feature activation+3.66e+00

Pos logit contributions
 strangers+0.071
emy+0.066
 goats+0.066
 necks+0.062
 pad+0.062
Neg logit contributions
bel-0.092
cakes-0.089
organ-0.082
 Jos-0.080
spe-0.075
Tokencakes
Token ID37263
Feature activation+8.58e+00

Pos logit contributions
bel+0.108
organ+0.107
 Jos+0.101
spe+0.092
urd+0.074
Neg logit contributions
 strangers-0.348
 goats-0.348
 pad-0.330
 wo-0.330
agging-0.328
Token.
Token ID13
Feature activation+3.78e-01

Pos logit contributions
bel+0.683
 Jos+0.669
organ+0.642
spe+0.627
 Manny+0.591
Neg logit contributions
 strangers-0.381
 pad-0.376
 goats-0.372
 necks-0.352
agging-0.350
Token Lily
Token ID20037
Feature activation+3.35e-01

Pos logit contributions
bel+0.024
cakes+0.021
organ+0.021
spe+0.019
he+0.018
Neg logit contributions
 strangers-0.026
 goats-0.025
 pad-0.025
emy-0.025
 necks-0.024
Token was
Token ID373
Feature activation+3.27e-01

Pos logit contributions
bel+0.022
cakes+0.021
organ+0.019
 Jos+0.019
he+0.017
Neg logit contributions
 strangers-0.022
 goats-0.021
emy-0.021
 necks-0.021
 pad-0.020
Token happy
Token ID3772
Feature activation+1.03e+00

Pos logit contributions
cakes+0.027
bel+0.027
organ+0.024
 Jos+0.023
auc+0.023
Neg logit contributions
 goats-0.016
 strangers-0.016
 hesitation-0.014
 years-0.014
 risking-0.013
Token because
Token ID780
Feature activation+2.17e-01

Pos logit contributions
bel+0.079
cakes+0.076
organ+0.072
 Jos+0.067
spe+0.066
Neg logit contributions
 strangers-0.059
 goats-0.054
emy-0.054
iley-0.051
 hesitation-0.051
Token help
Token ID1037
Feature activation-9.49e-01

Pos logit contributions
 strangers+0.010
 goats+0.009
 hesitation+0.009
uf+0.008
iley+0.008
Neg logit contributions
bel-0.012
cakes-0.011
organ-0.011
spe-0.010
ushes-0.010
Token her
Token ID607
Feature activation+2.97e-01

Pos logit contributions
 strangers+0.064
 goats+0.056
 necks+0.054
uf+0.053
emy+0.052
Neg logit contributions
bel-0.063
cakes-0.062
spe-0.058
organ-0.055
castle-0.054
Token make
Token ID787
Feature activation+3.42e-01

Pos logit contributions
cakes+0.022
bel+0.021
umm+0.020
organ+0.019
spe+0.019
Neg logit contributions
 strangers-0.017
 goats-0.014
 necks-0.014
 hesitation-0.013
 pad-0.013
Token some
Token ID617
Feature activation-1.02e+00

Pos logit contributions
bel+0.026
spe+0.023
cakes+0.023
organ+0.023
he+0.021
Neg logit contributions
 strangers-0.020
 goats-0.019
emy-0.017
 unnecessary-0.016
 necks-0.016
Token cup
Token ID6508
Feature activation+3.63e+00

Pos logit contributions
 strangers+0.063
 goats+0.061
emy+0.055
 necks+0.054
 hesitation+0.054
Neg logit contributions
bel-0.073
organ-0.068
cakes-0.067
 Jos-0.064
spe-0.063
Tokencakes
Token ID37263
Feature activation+8.55e+00

Pos logit contributions
spe+0.185
organ+0.172
bel+0.163
 Jos+0.153
urd+0.152
Neg logit contributions
 strangers-0.247
 goats-0.238
 dogs-0.228
agging-0.227
 years-0.213
Token and
Token ID290
Feature activation-8.22e-02

Pos logit contributions
bel+0.803
spe+0.794
 Jos+0.748
organ+0.740
 Suz+0.717
Neg logit contributions
 strangers-0.262
 goats-0.229
 pad-0.199
agging-0.197
 tale-0.189
Token put
Token ID1234
Feature activation+9.91e-01

Pos logit contributions
 strangers+0.005
 goats+0.005
 necks+0.004
 hesitation+0.004
 pad+0.004
Neg logit contributions
bel-0.005
he-0.005
organ-0.005
cakes-0.005
ator-0.005
Token up
Token ID510
Feature activation+2.15e+00

Pos logit contributions
bel+0.081
spe+0.075
organ+0.074
cakes+0.072
castle+0.070
Neg logit contributions
 strangers-0.049
 goats-0.046
 necks-0.043
 pad-0.039
 goat-0.039
Token some
Token ID617
Feature activation-1.02e+00

Pos logit contributions
bel+0.156
organ+0.146
cakes+0.138
spe+0.134
whe+0.129
Neg logit contributions
 strangers-0.131
 goats-0.126
 necks-0.114
 tails-0.108
 hesitation-0.107
Token decorations
Token ID39011
Feature activation+4.39e+00

Pos logit contributions
 strangers+0.068
 goats+0.064
 necks+0.061
 pad+0.059
emy+0.059
Neg logit contributions
bel-0.065
organ-0.061
cakes-0.060
 Jos-0.057
spe-0.056
Token her
Token ID607
Feature activation+1.24e-01

Pos logit contributions
 strangers+0.009
emy+0.008
 hesitation+0.008
 unnecessary+0.008
 goats+0.008
Neg logit contributions
bel-0.011
cakes-0.011
he-0.010
organ-0.010
castle-0.009
Token mom
Token ID1995
Feature activation+1.37e+00

Pos logit contributions
bel+0.008
cakes+0.007
organ+0.007
 Jos+0.006
he+0.006
Neg logit contributions
 strangers-0.009
 goats-0.008
 hesitation-0.008
 pad-0.008
emy-0.008
Token had
Token ID550
Feature activation-2.04e+00

Pos logit contributions
bel+0.099
cakes+0.095
organ+0.087
 Jos+0.085
tub+0.078
Neg logit contributions
 strangers-0.080
 pad-0.077
 goats-0.075
 necks-0.074
emy-0.073
Token made
Token ID925
Feature activation+1.30e+00

Pos logit contributions
 strangers+0.136
 goats+0.128
emy+0.126
 necks+0.122
 hesitation+0.122
Neg logit contributions
bel-0.135
cakes-0.131
organ-0.120
 Jos-0.114
he-0.108
Token cup
Token ID6508
Feature activation+3.64e+00

Pos logit contributions
bel+0.094
cakes+0.086
organ+0.085
spe+0.077
castle+0.076
Neg logit contributions
 strangers-0.083
 goats-0.074
 hesitation-0.073
iley-0.071
emy-0.070
Tokencakes
Token ID37263
Feature activation+8.53e+00

Pos logit contributions
organ+0.158
bel+0.151
spe+0.141
 Jos+0.137
inv+0.118
Neg logit contributions
 strangers-0.272
agging-0.255
 years-0.246
 wo-0.235
 goats-0.234
Token!
Token ID0
Feature activation+1.07e+00

Pos logit contributions
bel+0.694
organ+0.673
 Jos+0.644
spe+0.626
cakes+0.602
Neg logit contributions
 strangers-0.378
 hesitation-0.327
 years-0.320
agging-0.309
 pad-0.302
Token Bella
Token ID39304
Feature activation-7.02e-01

Pos logit contributions
bel+0.072
cakes+0.067
organ+0.065
castle+0.060
spe+0.059
Neg logit contributions
emy-0.069
 strangers-0.066
 goats-0.065
 pad-0.064
 necks-0.061
Token was
Token ID373
Feature activation+1.90e-01

Pos logit contributions
 strangers+0.043
 goats+0.041
 necks+0.041
 pad+0.040
emy+0.039
Neg logit contributions
bel-0.048
cakes-0.045
organ-0.043
 Jos-0.041
ushes-0.039
Token so
Token ID523
Feature activation+2.69e-01

Pos logit contributions
cakes+0.015
bel+0.014
organ+0.013
 Jos+0.012
he+0.012
Neg logit contributions
 strangers-0.011
 hesitation-0.010
 goats-0.010
 pad-0.009
 years-0.009
Token excited
Token ID6568
Feature activation+2.89e+00

Pos logit contributions
cakes+0.022
icles+0.021
bel+0.021
ator+0.020
organ+0.020
Neg logit contributions
 years-0.012
 hesitation-0.011
 strangers-0.011
 goats-0.010
iley-0.009
Token Grand
Token ID5675
Feature activation+2.58e+00

Pos logit contributions
bel+0.016
cakes+0.014
organ+0.014
 Jos+0.012
castle+0.012
Neg logit contributions
 strangers-0.016
 goats-0.015
 pad-0.015
uf-0.014
 hesitation-0.014
Tokenma
Token ID2611
Feature activation+2.09e+00

Pos logit contributions
bel+0.180
cakes+0.157
organ+0.151
 Jos+0.139
spe+0.136
Neg logit contributions
 strangers-0.163
iley-0.153
 goats-0.151
uf-0.149
 necks-0.148
Token was
Token ID373
Feature activation+2.79e-01

Pos logit contributions
bel+0.153
cakes+0.145
organ+0.136
 Jos+0.130
ushes+0.124
Neg logit contributions
 strangers-0.120
 goats-0.116
 pad-0.113
 necks-0.109
emy-0.109
Token making
Token ID1642
Feature activation+2.39e-01

Pos logit contributions
cakes+0.023
bel+0.022
organ+0.021
 Jos+0.019
castle+0.019
Neg logit contributions
 strangers-0.014
 goats-0.012
 hesitation-0.012
agging-0.012
iley-0.011
Token cup
Token ID6508
Feature activation+3.64e+00

Pos logit contributions
bel+0.018
cakes+0.016
organ+0.016
spe+0.015
he+0.014
Neg logit contributions
 strangers-0.014
 goats-0.014
 hesitation-0.012
iley-0.012
emy-0.012
Tokencakes
Token ID37263
Feature activation+8.52e+00

Pos logit contributions
spe+0.092
bel+0.086
organ+0.076
 Jos+0.073
ushes+0.060
Neg logit contributions
 strangers-0.335
agging-0.325
 goats-0.324
 dogs-0.317
 years-0.314
Token for
Token ID329
Feature activation+1.19e+00

Pos logit contributions
bel+0.755
spe+0.735
organ+0.695
 Jos+0.683
ushes+0.655
Neg logit contributions
 strangers-0.302
 goats-0.276
agging-0.263
 years-0.260
 dogs-0.235
Token the
Token ID262
Feature activation+4.99e-01

Pos logit contributions
bel+0.079
cakes+0.075
organ+0.071
 Jos+0.064
spe+0.064
Neg logit contributions
 strangers-0.081
 goats-0.076
iley-0.072
uf-0.072
emy-0.072
Token family
Token ID1641
Feature activation-3.15e-01

Pos logit contributions
bel+0.035
cakes+0.034
organ+0.032
 Jos+0.030
he+0.029
Neg logit contributions
 strangers-0.031
 goats-0.029
emy-0.028
 necks-0.027
 pad-0.027
Token.
Token ID13
Feature activation+2.49e-01

Pos logit contributions
 strangers+0.017
 goats+0.016
 pad+0.016
 necks+0.015
 hesitation+0.015
Neg logit contributions
bel-0.024
cakes-0.023
organ-0.023
 Jos-0.021
ushes-0.020
Token Dave
Token ID9935
Feature activation-2.62e+00

Pos logit contributions
bel+0.017
cakes+0.015
organ+0.014
 Jos+0.013
spe+0.013
Neg logit contributions
 strangers-0.017
 goats-0.016
 pad-0.015
uf-0.015
 necks-0.015
TokenCan
Token ID6090
Feature activation-1.03e+00

Pos logit contributions
bel+0.059
cakes+0.055
organ+0.050
 Jos+0.049
spe+0.045
Neg logit contributions
 strangers-0.063
 goats-0.061
emy-0.060
 necks-0.059
 pad-0.059
Token I
Token ID314
Feature activation-2.41e+00

Pos logit contributions
 strangers+0.059
 goats+0.054
emy+0.053
 necks+0.051
uf+0.050
Neg logit contributions
bel-0.079
cakes-0.074
organ-0.071
 Jos-0.067
he-0.064
Token put
Token ID1234
Feature activation+9.71e-01

Pos logit contributions
 strangers+0.133
 pad+0.126
 goats+0.124
uf+0.123
iley+0.118
Neg logit contributions
bel-0.179
cakes-0.160
spe-0.159
organ-0.155
 Jos-0.143
Token the
Token ID262
Feature activation+5.25e-01

Pos logit contributions
bel+0.073
cakes+0.065
spe+0.064
organ+0.063
 Jos+0.058
Neg logit contributions
 strangers-0.058
 goats-0.055
 necks-0.052
uf-0.050
 pad-0.050
Token cup
Token ID6508
Feature activation+3.64e+00

Pos logit contributions
bel+0.034
organ+0.030
cakes+0.030
spe+0.028
 Jos+0.028
Neg logit contributions
 strangers-0.035
 goats-0.034
 necks-0.033
 pad-0.032
 hesitation-0.031
Tokencakes
Token ID37263
Feature activation+8.50e+00

Pos logit contributions
spe+0.176
bel+0.161
organ+0.153
 Jos+0.149
urd+0.145
Neg logit contributions
 pad-0.233
use-0.229
 me-0.222
 strangers-0.214
anna-0.212
Token in
Token ID287
Feature activation-4.82e-01

Pos logit contributions
bel+0.642
spe+0.629
 Jos+0.608
organ+0.575
cakes+0.559
Neg logit contributions
 pad-0.362
use-0.334
 strangers-0.327
agging-0.318
 me-0.300
Token the
Token ID262
Feature activation+5.24e-01

Pos logit contributions
 strangers+0.029
 goats+0.027
 necks+0.026
 hesitation+0.026
 pad+0.025
Neg logit contributions
bel-0.036
cakes-0.034
organ-0.031
spe-0.028
 Jos-0.028
Token oven
Token ID14361
Feature activation+2.94e+00

Pos logit contributions
bel+0.034
cakes+0.032
organ+0.031
 Jos+0.029
spe+0.027
Neg logit contributions
 strangers-0.035
 goats-0.033
 pad-0.032
 necks-0.032
emy-0.031
Token?"
Token ID1701
Feature activation-1.01e+00

Pos logit contributions
bel+0.241
cakes+0.227
organ+0.224
 Jos+0.216
spe+0.215
Neg logit contributions
 pad-0.121
 strangers-0.116
 goats-0.103
use-0.103
 necks-0.101
Token June
Token ID2795
Feature activation-2.87e+00

Pos logit contributions
 strangers+0.058
emy+0.055
 goats+0.054
 hesitation+0.051
 goat+0.050
Neg logit contributions
cakes-0.074
bel-0.074
organ-0.066
castle-0.066
tub-0.064
Token!
Token ID0
Feature activation+1.27e+00

Pos logit contributions
 strangers+0.020
 goats+0.018
emy+0.018
 pad+0.017
 necks+0.017
Neg logit contributions
bel-0.029
cakes-0.027
organ-0.026
 Jos-0.025
spe-0.024
Token Finally
Token ID9461
Feature activation+8.22e-01

Pos logit contributions
bel+0.077
cakes+0.068
organ+0.066
 Jos+0.060
spe+0.059
Neg logit contributions
 strangers-0.092
 goats-0.089
emy-0.087
 pad-0.087
 necks-0.085
Token,
Token ID11
Feature activation-2.73e-01

Pos logit contributions
bel+0.074
cakes+0.070
organ+0.067
spe+0.062
 Jos+0.062
Neg logit contributions
 strangers-0.037
 goats-0.033
emy-0.033
 hesitation-0.033
iley-0.031
Token the
Token ID262
Feature activation+5.89e-01

Pos logit contributions
 strangers+0.015
 goats+0.013
 goat+0.013
 risking+0.012
 hesitation+0.012
Neg logit contributions
bel-0.021
cakes-0.019
spe-0.019
organ-0.019
ushes-0.018
Token cup
Token ID6508
Feature activation+3.61e+00

Pos logit contributions
bel+0.038
organ+0.034
 Jos+0.032
cakes+0.031
spe+0.030
Neg logit contributions
 strangers-0.040
 goats-0.038
 hesitation-0.037
 necks-0.035
agging-0.035
Tokencakes
Token ID37263
Feature activation+8.50e+00

Pos logit contributions
bel+0.166
 Jos+0.160
organ+0.151
spe+0.147
urd+0.141
Neg logit contributions
 strangers-0.265
pillar-0.260
 wo-0.253
 thorn-0.249
 pad-0.243
Token were
Token ID547
Feature activation+5.99e-01

Pos logit contributions
bel+0.569
 Jos+0.532
organ+0.494
spe+0.490
ushes+0.480
Neg logit contributions
 strangers-0.546
 goats-0.495
 pad-0.460
 hesitation-0.457
 necks-0.456
Token ready
Token ID3492
Feature activation+2.56e+00

Pos logit contributions
bel+0.042
organ+0.038
cakes+0.037
 Jos+0.036
spe+0.036
Neg logit contributions
 strangers-0.038
 goats-0.036
 hesitation-0.035
agging-0.033
 pad-0.033
Token and
Token ID290
Feature activation+2.15e-02

Pos logit contributions
bel+0.199
cakes+0.188
organ+0.185
 Jos+0.172
spe+0.167
Neg logit contributions
 strangers-0.139
 goats-0.127
 pad-0.123
emy-0.120
 necks-0.119
Token they
Token ID484
Feature activation+5.79e-01

Pos logit contributions
bel+0.002
organ+0.002
ator+0.001
he+0.001
room+0.001
Neg logit contributions
 strangers-0.001
 goats-0.001
 tails-0.001
 goat-0.001
 wo-0.001
Token took
Token ID1718
Feature activation+4.12e-01

Pos logit contributions
bel+0.050
organ+0.045
cakes+0.044
ushes+0.043
ator+0.042
Neg logit contributions
 strangers-0.026
 goats-0.025
 pad-0.022
 goat-0.022
 wo-0.022
Token\n
Token ID198
Feature activation+4.00e-01

Pos logit contributions
bel+0.021
cakes+0.020
organ+0.018
 Jos+0.017
he+0.016
Neg logit contributions
 strangers-0.024
 goats-0.022
emy-0.022
 necks-0.021
 pad-0.021
Token\n
Token ID198
Feature activation+4.00e-01

Pos logit contributions
bel+0.023
cakes+0.022
organ+0.020
he+0.019
 Jos+0.019
Neg logit contributions
 strangers-0.030
 goats-0.029
 necks-0.028
 hesitation-0.028
emy-0.028
TokenWhen
Token ID2215
Feature activation+2.03e+00

Pos logit contributions
bel+0.029
cakes+0.028
organ+0.026
 Jos+0.025
he+0.024
Neg logit contributions
 strangers-0.024
 goats-0.023
emy-0.023
 hesitation-0.022
 necks-0.022
Token the
Token ID262
Feature activation+5.46e-01

Pos logit contributions
bel+0.150
cakes+0.130
organ+0.127
spe+0.120
ushes+0.114
Neg logit contributions
 strangers-0.128
 goats-0.117
emy-0.111
iley-0.111
 risking-0.109
Token cup
Token ID6508
Feature activation+3.61e+00

Pos logit contributions
bel+0.032
cakes+0.031
organ+0.028
 Jos+0.027
spe+0.025
Neg logit contributions
 strangers-0.041
 goats-0.039
emy-0.038
 necks-0.037
 pad-0.037
Tokencakes
Token ID37263
Feature activation+8.49e+00

Pos logit contributions
bel+0.183
ushes+0.164
 Jos+0.158
organ+0.152
spe+0.136
Neg logit contributions
 strangers-0.250
pillar-0.240
 dogs-0.238
 goat-0.233
 course-0.232
Token were
Token ID547
Feature activation+5.79e-01

Pos logit contributions
bel+0.589
 Jos+0.541
ushes+0.522
organ+0.494
spe+0.485
Neg logit contributions
 strangers-0.518
 goats-0.465
 hesitation-0.444
 pad-0.435
 goat-0.421
Token done
Token ID1760
Feature activation+1.40e+00

Pos logit contributions
bel+0.042
organ+0.037
cakes+0.037
 Jos+0.036
spe+0.035
Neg logit contributions
 strangers-0.034
 goats-0.032
agging-0.032
 hesitation-0.032
 pad-0.030
Token baking
Token ID16871
Feature activation+2.54e+00

Pos logit contributions
bel+0.152
ushes+0.142
organ+0.138
cakes+0.132
spe+0.132
Neg logit contributions
 strangers-0.029
 hesitation-0.021
 leaving-0.020
 goats-0.020
 thinking-0.019
Token,
Token ID11
Feature activation-3.21e-01

Pos logit contributions
bel+0.258
spe+0.229
organ+0.225
 Jos+0.224
ushes+0.221
Neg logit contributions
 strangers-0.076
 years-0.060
 goats-0.053
 dogs-0.053
 hesitation-0.045
Token they
Token ID484
Feature activation+5.27e-01

Pos logit contributions
 strangers+0.016
 goats+0.015
 dogs+0.013
 tails+0.013
 goat+0.013
Neg logit contributions
bel-0.026
organ-0.024
spe-0.023
cakes-0.023
imm-0.022
Token playing
Token ID2712
Feature activation+2.49e-02

Pos logit contributions
bel+0.043
cakes+0.041
organ+0.036
 Jos+0.035
he+0.033
Neg logit contributions
 strangers-0.054
emy-0.051
 goats-0.051
uf-0.050
 hesitation-0.049
Token games
Token ID1830
Feature activation+6.00e-01

Pos logit contributions
bel+0.001
cakes+0.001
 Jos+0.001
organ+0.001
spe+0.001
Neg logit contributions
 strangers-0.002
 goats-0.002
 necks-0.002
 tails-0.002
emy-0.002
Token and
Token ID290
Feature activation-6.90e-02

Pos logit contributions
bel+0.043
cakes+0.042
organ+0.040
 Jos+0.038
spe+0.035
Neg logit contributions
 strangers-0.036
 goats-0.034
 pad-0.033
emy-0.032
 necks-0.032
Token eating
Token ID6600
Feature activation-1.99e+00

Pos logit contributions
 strangers+0.004
 goats+0.004
 necks+0.003
emy+0.003
 pad+0.003
Neg logit contributions
bel-0.005
cakes-0.005
organ-0.005
 Jos-0.005
he-0.004
Token cup
Token ID6508
Feature activation+3.66e+00

Pos logit contributions
 strangers+0.133
 goats+0.127
 necks+0.121
emy+0.119
 pad+0.116
Neg logit contributions
bel-0.133
cakes-0.122
organ-0.117
 Jos-0.114
spe-0.109
Tokencakes
Token ID37263
Feature activation+8.49e+00

Pos logit contributions
bel+0.089
 Jos+0.081
spe+0.075
organ+0.071
urd+0.056
Neg logit contributions
 strangers-0.358
 goats-0.356
agging-0.346
 dogs-0.341
 years-0.334
Token.
Token ID13
Feature activation+2.92e-01

Pos logit contributions
bel+0.728
 Jos+0.682
organ+0.654
spe+0.651
 Suz+0.635
Neg logit contributions
 strangers-0.359
 goats-0.338
 pad-0.331
 necks-0.321
agging-0.315
Token\n
Token ID198
Feature activation+2.72e-01

Pos logit contributions
bel+0.018
cakes+0.017
organ+0.016
 Jos+0.015
spe+0.014
Neg logit contributions
 strangers-0.020
 goats-0.020
emy-0.020
 pad-0.019
 necks-0.019
Token\n
Token ID198
Feature activation+2.65e-01

Pos logit contributions
bel+0.015
cakes+0.014
organ+0.013
 Jos+0.012
he+0.012
Neg logit contributions
 strangers-0.021
 goats-0.020
emy-0.020
 hesitation-0.019
 necks-0.019
TokenAt
Token ID2953
Feature activation+1.19e+00

Pos logit contributions
bel+0.019
cakes+0.018
organ+0.017
 Jos+0.016
he+0.015
Neg logit contributions
 strangers-0.016
 goats-0.015
emy-0.015
 necks-0.014
 pad-0.014
Token the
Token ID262
Feature activation+4.81e-01

Pos logit contributions
bel+0.066
cakes+0.063
organ+0.057
 Jos+0.054
he+0.050
Neg logit contributions
 strangers-0.092
 goats-0.088
emy-0.087
 necks-0.084
 pad-0.084
Token Mom
Token ID11254
Feature activation+2.35e+00

Pos logit contributions
 strangers+0.029
 goats+0.027
emy+0.027
 pad+0.026
 necks+0.026
Neg logit contributions
bel-0.022
cakes-0.021
organ-0.019
 Jos-0.018
spe-0.017
Tokenmy
Token ID1820
Feature activation+6.89e-01

Pos logit contributions
bel+0.185
cakes+0.166
organ+0.156
spe+0.146
 Jos+0.146
Neg logit contributions
 strangers-0.133
 goats-0.129
emy-0.122
 pad-0.121
iley-0.119
Token was
Token ID373
Feature activation+2.95e-01

Pos logit contributions
bel+0.048
cakes+0.046
organ+0.042
 Jos+0.040
spe+0.038
Neg logit contributions
 strangers-0.044
 goats-0.042
emy-0.041
 pad-0.039
 necks-0.039
Token baking
Token ID16871
Feature activation+2.55e+00

Pos logit contributions
cakes+0.025
bel+0.024
organ+0.022
spe+0.021
 Jos+0.021
Neg logit contributions
 strangers-0.014
 goats-0.013
agging-0.012
 hesitation-0.012
uf-0.012
Token cup
Token ID6508
Feature activation+3.62e+00

Pos logit contributions
bel+0.191
spe+0.177
cakes+0.169
 Jos+0.167
ushes+0.165
Neg logit contributions
 strangers-0.139
 goats-0.128
 years-0.126
 necks-0.121
uf-0.119
Tokencakes
Token ID37263
Feature activation+8.48e+00

Pos logit contributions
bel+0.113
 Jos+0.104
spe+0.102
ushes+0.100
organ+0.097
Neg logit contributions
 strangers-0.324
agging-0.311
 goats-0.309
 years-0.302
uf-0.299
Token.
Token ID13
Feature activation+3.04e-01

Pos logit contributions
bel+0.660
spe+0.618
 Jos+0.614
ushes+0.600
organ+0.573
Neg logit contributions
 strangers-0.407
 goats-0.374
agging-0.356
 years-0.356
 pad-0.350
Token She
Token ID1375
Feature activation+1.00e+00

Pos logit contributions
bel+0.022
spe+0.019
castle+0.019
organ+0.019
cakes+0.019
Neg logit contributions
 goats-0.016
 pad-0.015
 strangers-0.015
 goat-0.015
 necks-0.014
Token got
Token ID1392
Feature activation+5.30e-01

Pos logit contributions
bel+0.077
cakes+0.073
organ+0.069
 Jos+0.067
he+0.065
Neg logit contributions
 strangers-0.050
 goats-0.048
emy-0.046
 necks-0.045
 pad-0.045
Token out
Token ID503
Feature activation+6.71e-01

Pos logit contributions
castle+0.039
ator+0.038
bel+0.038
imm+0.037
spe+0.037
Neg logit contributions
 strangers-0.026
 goats-0.024
 hesitation-0.021
 dogs-0.020
 years-0.020
Token the
Token ID262
Feature activation+4.77e-01

Pos logit contributions
bel+0.049
organ+0.045
cakes+0.044
spe+0.043
whe+0.043
Neg logit contributions
 strangers-0.037
 goats-0.034
 years-0.033
 hesitation-0.033
agging-0.031
Token hard
Token ID1327
Feature activation-1.71e+00

Pos logit contributions
 strangers+0.010
 necks+0.010
emy+0.009
 years+0.009
leness+0.009
Neg logit contributions
bel-0.012
 Jos-0.011
organ-0.011
cakes-0.011
room-0.011
Token so
Token ID523
Feature activation+2.92e-01

Pos logit contributions
 strangers+0.084
 goats+0.078
 necks+0.077
 pad+0.076
 hesitation+0.073
Neg logit contributions
bel-0.136
cakes-0.124
organ-0.120
 Jos-0.117
spe-0.114
Token that
Token ID326
Feature activation-1.10e-01

Pos logit contributions
bel+0.020
organ+0.019
cakes+0.017
 Jos+0.016
he+0.015
Neg logit contributions
 strangers-0.018
 goats-0.016
emy-0.016
ience-0.015
iley-0.015
Token the
Token ID262
Feature activation+5.21e-01

Pos logit contributions
 strangers+0.007
 goats+0.006
 hesitation+0.006
 pad+0.005
 necks+0.005
Neg logit contributions
bel-0.008
cakes-0.007
organ-0.007
he-0.007
castle-0.007
Token cup
Token ID6508
Feature activation+3.64e+00

Pos logit contributions
bel+0.031
cakes+0.029
organ+0.028
 Jos+0.026
he+0.024
Neg logit contributions
 strangers-0.038
 goats-0.036
 necks-0.035
 hesitation-0.034
emy-0.034
Tokencakes
Token ID37263
Feature activation+8.47e+00

Pos logit contributions
bel+0.178
organ+0.163
 Jos+0.142
spe+0.140
urd+0.138
Neg logit contributions
 pad-0.241
 strangers-0.237
 necks-0.222
 wo-0.220
pillar-0.220
Token would
Token ID561
Feature activation-8.72e-01

Pos logit contributions
bel+0.608
 Jos+0.573
organ+0.531
ushes+0.526
spe+0.515
Neg logit contributions
 strangers-0.412
 years-0.362
 pad-0.352
 necks-0.350
leness-0.328
Token be
Token ID307
Feature activation-1.65e+00

Pos logit contributions
 strangers+0.052
 hesitation+0.046
 necks+0.046
 goats+0.045
 thorn+0.044
Neg logit contributions
bel-0.061
cakes-0.059
organ-0.058
 Jos-0.052
spe-0.049
Token extra
Token ID3131
Feature activation+1.01e+00

Pos logit contributions
 strangers+0.095
 goats+0.094
 necks+0.090
 pad+0.090
 years+0.089
Neg logit contributions
bel-0.113
cakes-0.109
organ-0.099
he-0.099
spe-0.098
Token fluffy
Token ID39145
Feature activation-8.58e-01

Pos logit contributions
bel+0.068
organ+0.067
cakes+0.067
room+0.066
 Jos+0.063
Neg logit contributions
 strangers-0.053
 hesitation-0.052
 years-0.052
 unnecessary-0.051
 goats-0.051
Token.
Token ID13
Feature activation+3.94e-01

Pos logit contributions
 strangers+0.041
 goats+0.039
 pad+0.038
 necks+0.036
 hesitation+0.034
Neg logit contributions
bel-0.074
cakes-0.067
organ-0.065
 Jos-0.063
spe-0.062
Token\n
Token ID198
Feature activation+4.12e-01

Pos logit contributions
bel+0.029
cakes+0.028
organ+0.025
 Jos+0.025
spe+0.024
Neg logit contributions
 strangers-0.026
emy-0.025
 goats-0.024
 pad-0.023
uf-0.023
TokenFinally
Token ID11158
Feature activation-1.74e-02

Pos logit contributions
bel+0.031
cakes+0.030
organ+0.027
 Jos+0.027
spe+0.025
Neg logit contributions
 strangers-0.024
emy-0.022
 goats-0.022
 necks-0.021
 pad-0.021
Token,
Token ID11
Feature activation-3.00e-01

Pos logit contributions
 strangers+0.001
 goats+0.001
emy+0.001
 hesitation+0.001
iley+0.001
Neg logit contributions
bel-0.002
cakes-0.001
organ-0.001
 Jos-0.001
spe-0.001
Token the
Token ID262
Feature activation+5.64e-01

Pos logit contributions
 strangers+0.017
 goats+0.016
 goat+0.015
 risking+0.015
emy+0.014
Neg logit contributions
bel-0.022
cakes-0.021
organ-0.020
spe-0.019
ushes-0.019
Token cup
Token ID6508
Feature activation+3.63e+00

Pos logit contributions
bel+0.034
organ+0.031
cakes+0.030
 Jos+0.029
spe+0.027
Neg logit contributions
 strangers-0.041
 goats-0.039
 necks-0.037
 hesitation-0.037
 pad-0.036
Tokencakes
Token ID37263
Feature activation+8.47e+00

Pos logit contributions
bel+0.154
organ+0.144
 Jos+0.139
ushes+0.127
urd+0.123
Neg logit contributions
 strangers-0.266
pillar-0.266
 pad-0.258
 goat-0.254
 wo-0.254
Token were
Token ID547
Feature activation+5.77e-01

Pos logit contributions
bel+0.569
 Jos+0.536
organ+0.491
ushes+0.484
spe+0.472
Neg logit contributions
 strangers-0.535
 goats-0.486
 pad-0.451
 years-0.445
 hesitation-0.440
Token ready
Token ID3492
Feature activation+2.56e+00

Pos logit contributions
bel+0.039
organ+0.036
cakes+0.034
 Jos+0.034
room+0.033
Neg logit contributions
 strangers-0.035
 goats-0.034
 hesitation-0.032
agging-0.031
 pad-0.031
Token and
Token ID290
Feature activation-4.77e-03

Pos logit contributions
bel+0.198
cakes+0.189
organ+0.186
 Jos+0.171
spe+0.167
Neg logit contributions
 strangers-0.138
 goats-0.125
 pad-0.122
 necks-0.117
emy-0.117
Token they
Token ID484
Feature activation+5.76e-01

Pos logit contributions
 strangers+0.000
 goats+0.000
 goat+0.000
 tails+0.000
 necks+0.000
Neg logit contributions
bel-0.000
organ-0.000
he-0.000
cakes-0.000
room-0.000
Token all
Token ID477
Feature activation+1.01e+00

Pos logit contributions
bel+0.047
cakes+0.043
organ+0.042
ushes+0.040
whe+0.039
Neg logit contributions
 strangers-0.027
 goats-0.026
 pad-0.026
 goat-0.024
 wo-0.023
Token She
Token ID1375
Feature activation+9.99e-01

Pos logit contributions
bel+0.019
cakes+0.018
organ+0.016
 Jos+0.015
spe+0.015
Neg logit contributions
 strangers-0.025
emy-0.024
 goats-0.024
 pad-0.024
 necks-0.023
Token loved
Token ID6151
Feature activation-1.32e+00

Pos logit contributions
bel+0.061
organ+0.057
cakes+0.057
 Jos+0.050
he+0.049
Neg logit contributions
 strangers-0.071
 hesitation-0.070
 goats-0.069
 unnecessary-0.066
emy-0.066
Token to
Token ID284
Feature activation-1.99e-01

Pos logit contributions
 strangers+0.090
 goats+0.086
emy+0.085
 necks+0.082
 pad+0.082
Neg logit contributions
bel-0.084
cakes-0.081
organ-0.074
 Jos-0.071
he-0.066
Token bake
Token ID28450
Feature activation+3.28e+00

Pos logit contributions
 strangers+0.013
 goats+0.013
emy+0.012
 necks+0.012
 hesitation+0.012
Neg logit contributions
bel-0.013
cakes-0.013
organ-0.011
spe-0.011
 Jos-0.011
Token cup
Token ID6508
Feature activation+3.64e+00

Pos logit contributions
bel+0.230
spe+0.221
organ+0.191
bul+0.185
 Jos+0.184
Neg logit contributions
 strangers-0.195
 goats-0.187
 necks-0.179
 dogs-0.175
 years-0.173
Tokencakes
Token ID37263
Feature activation+8.47e+00

Pos logit contributions
spe+0.146
bel+0.107
 surely+0.100
 Jos+0.099
urd+0.091
Neg logit contributions
 strangers-0.286
 dogs-0.285
 goats-0.280
is-0.272
agging-0.266
Token with
Token ID351
Feature activation+7.76e-01

Pos logit contributions
spe+0.661
bel+0.631
 Jos+0.583
 Manny+0.568
 Suz+0.552
Neg logit contributions
 strangers-0.378
 goats-0.354
 dogs-0.329
agging-0.329
 years-0.326
Token her
Token ID607
Feature activation+3.01e-01

Pos logit contributions
bel+0.053
cakes+0.048
spe+0.047
organ+0.047
castle+0.045
Neg logit contributions
 strangers-0.050
 goats-0.046
 necks-0.043
emy-0.042
 goat-0.041
Token mom
Token ID1995
Feature activation+1.52e+00

Pos logit contributions
bel+0.018
cakes+0.017
organ+0.016
 Jos+0.015
spe+0.014
Neg logit contributions
 strangers-0.021
 goats-0.021
emy-0.020
 necks-0.020
 pad-0.020
Tokenmy
Token ID1820
Feature activation+7.26e-01

Pos logit contributions
bel+0.105
cakes+0.100
organ+0.092
 Jos+0.088
spe+0.084
Neg logit contributions
 strangers-0.097
 goats-0.092
emy-0.090
 pad-0.089
 necks-0.088
Token.
Token ID13
Feature activation+4.33e-01

Pos logit contributions
bel+0.054
cakes+0.052
organ+0.048
 Jos+0.047
spe+0.044
Neg logit contributions
 strangers-0.042
 goats-0.040
emy-0.038
 pad-0.038
 necks-0.038
Token\n
Token ID198
Feature activation+2.13e-01

Pos logit contributions
bel+0.009
cakes+0.008
organ+0.008
castle+0.007
 Jos+0.007
Neg logit contributions
 strangers-0.008
emy-0.008
 goats-0.008
 pad-0.007
uf-0.007
Token\n
Token ID198
Feature activation+2.21e-01

Pos logit contributions
bel+0.013
cakes+0.013
organ+0.012
 Jos+0.011
spe+0.010
Neg logit contributions
 strangers-0.015
 goats-0.014
emy-0.014
 necks-0.013
 pad-0.013
TokenWhen
Token ID2215
Feature activation+1.92e+00

Pos logit contributions
bel+0.016
cakes+0.016
organ+0.014
 Jos+0.014
he+0.013
Neg logit contributions
 strangers-0.013
emy-0.012
 goats-0.012
 necks-0.012
 pad-0.012
Token the
Token ID262
Feature activation+5.48e-01

Pos logit contributions
bel+0.125
cakes+0.117
organ+0.107
ushes+0.101
room+0.100
Neg logit contributions
 strangers-0.133
 hesitation-0.122
 goats-0.121
emy-0.115
 goat-0.115
Token cup
Token ID6508
Feature activation+3.64e+00

Pos logit contributions
bel+0.035
organ+0.031
spe+0.030
 Jos+0.030
cakes+0.028
Neg logit contributions
 strangers-0.037
 goats-0.035
 hesitation-0.034
 pad-0.033
emy-0.032
Tokencakes
Token ID37263
Feature activation+8.46e+00

Pos logit contributions
bel+0.198
ushes+0.168
 Jos+0.162
spe+0.157
organ+0.157
Neg logit contributions
pillar-0.248
 strangers-0.233
 pad-0.230
 fox-0.224
 goat-0.219
Token were
Token ID547
Feature activation+5.43e-01

Pos logit contributions
bel+0.620
 Jos+0.571
ushes+0.567
spe+0.536
organ+0.508
Neg logit contributions
 strangers-0.455
 goats-0.403
 pad-0.397
 years-0.383
 hesitation-0.381
Token done
Token ID1760
Feature activation+1.41e+00

Pos logit contributions
bel+0.042
organ+0.037
spe+0.037
 Jos+0.037
cakes+0.036
Neg logit contributions
 hesitation-0.026
 strangers-0.025
agging-0.025
 pad-0.024
 goats-0.023
Token,
Token ID11
Feature activation-3.27e-01

Pos logit contributions
bel+0.152
ushes+0.140
organ+0.140
spe+0.134
cakes+0.131
Neg logit contributions
 strangers-0.027
 hesitation-0.022
 thinking-0.020
 goats-0.018
 hopping-0.018
Token Mill
Token ID9212
Feature activation+1.46e+00

Pos logit contributions
 strangers+0.023
 goats+0.022
 goat+0.020
 hesitation+0.020
 unnecessary+0.019
Neg logit contributions
bel-0.021
cakes-0.019
organ-0.018
castle-0.018
room-0.017
Tokenie
Token ID494
Feature activation+2.06e-02

Pos logit contributions
bel+0.083
cakes+0.081
organ+0.076
 Jos+0.074
he+0.068
Neg logit contributions
 strangers-0.106
 goats-0.101
emy-0.099
leness-0.098
 hesitation-0.098
Token,
Token ID11
Feature activation-3.31e-01

Pos logit contributions
bel+0.145
spe+0.130
cakes+0.127
organ+0.119
whe+0.114
Neg logit contributions
 strangers-0.068
 necks-0.064
 years-0.058
 goats-0.057
 dogs-0.056
Token they
Token ID484
Feature activation+5.46e-01

Pos logit contributions
 strangers+0.021
 goats+0.020
emy+0.018
 tails+0.018
 goat+0.018
Neg logit contributions
bel-0.023
cakes-0.021
organ-0.021
spe-0.020
ushes-0.019
Token put
Token ID1234
Feature activation+9.96e-01

Pos logit contributions
bel+0.043
cakes+0.040
ushes+0.037
organ+0.037
ator+0.036
Neg logit contributions
 strangers-0.028
 goats-0.027
 pad-0.026
 goat-0.024
 necks-0.023
Token the
Token ID262
Feature activation+5.26e-01

Pos logit contributions
bel+0.079
spe+0.075
cakes+0.073
organ+0.070
castle+0.067
Neg logit contributions
 strangers-0.045
 necks-0.045
 goats-0.042
 dogs-0.039
 tails-0.039
Token cup
Token ID6508
Feature activation+3.60e+00

Pos logit contributions
bel+0.031
organ+0.027
cakes+0.026
 Jos+0.026
spe+0.025
Neg logit contributions
 strangers-0.037
 goats-0.035
 necks-0.034
 pad-0.034
 hesitation-0.034
Tokencakes
Token ID37263
Feature activation+8.45e+00

Pos logit contributions
spe+0.208
 Jos+0.198
 Sus+0.181
urd+0.179
 Manny+0.172
Neg logit contributions
 grass-0.187
 pad-0.187
ly-0.184
 bushes-0.180
 pads-0.176
Token in
Token ID287
Feature activation-5.26e-01

Pos logit contributions
spe+0.683
 Jos+0.671
bel+0.666
 Manny+0.616
 Suz+0.614
Neg logit contributions
 pad-0.320
agging-0.314
 strangers-0.306
 necks-0.276
 goats-0.271
Token the
Token ID262
Feature activation+5.43e-01

Pos logit contributions
 strangers+0.029
 goats+0.028
 necks+0.028
 hesitation+0.027
 tails+0.026
Neg logit contributions
bel-0.042
cakes-0.040
organ-0.035
spe-0.033
washer-0.033
Token oven
Token ID14361
Feature activation+2.89e+00

Pos logit contributions
bel+0.039
organ+0.036
cakes+0.034
 Jos+0.033
spe+0.032
Neg logit contributions
 strangers-0.032
 goats-0.030
 pad-0.030
 necks-0.028
uf-0.028
Token to
Token ID284
Feature activation-1.91e-01

Pos logit contributions
bel+0.243
organ+0.236
cakes+0.225
 Jos+0.220
spe+0.219
Neg logit contributions
 pad-0.117
 necks-0.110
 strangers-0.109
 goats-0.103
 years-0.089
Token bake
Token ID28450
Feature activation+3.27e+00

Pos logit contributions
 strangers+0.013
uf+0.011
 necks+0.011
 goats+0.011
 years+0.011
Neg logit contributions
bel-0.011
spe-0.011
cakes-0.011
organ-0.010
tub-0.010
Token and
Token ID290
Feature activation-1.53e-02

Pos logit contributions
bel+0.075
cakes+0.073
organ+0.069
 Jos+0.064
he+0.062
Neg logit contributions
 strangers-0.059
 goats-0.055
emy-0.055
 necks-0.052
 pad-0.051
Token enjoyed
Token ID8359
Feature activation+1.37e-01

Pos logit contributions
 goats+0.001
 strangers+0.001
 goat+0.001
 wo+0.001
leness+0.001
Neg logit contributions
bel-0.001
organ-0.001
cakes-0.001
he-0.001
ator-0.001
Token the
Token ID262
Feature activation+5.19e-01

Pos logit contributions
bel+0.010
cakes+0.010
organ+0.009
he+0.008
 Jos+0.008
Neg logit contributions
 strangers-0.008
 goats-0.008
emy-0.007
 hesitation-0.007
 necks-0.007
Token delicious
Token ID12625
Feature activation+1.73e+00

Pos logit contributions
bel+0.037
cakes+0.032
organ+0.031
 Jos+0.031
spe+0.030
Neg logit contributions
 strangers-0.032
 goats-0.029
 hesitation-0.029
agging-0.028
leness-0.028
Token cup
Token ID6508
Feature activation+3.62e+00

Pos logit contributions
bel+0.111
cakes+0.097
organ+0.094
 Jos+0.091
spe+0.088
Neg logit contributions
 strangers-0.119
 goats-0.111
 necks-0.109
leness-0.107
 hesitation-0.106
Tokencakes
Token ID37263
Feature activation+8.44e+00

Pos logit contributions
bel+0.226
organ+0.206
urd+0.197
spe+0.188
rolled+0.181
Neg logit contributions
 years-0.196
 strangers-0.192
ly-0.186
agging-0.184
 wo-0.178
Token.
Token ID13
Feature activation+4.65e-01

Pos logit contributions
bel+0.720
organ+0.654
 Jos+0.629
ushes+0.618
spe+0.610
Neg logit contributions
 strangers-0.362
agging-0.330
 years-0.329
 pad-0.326
 goats-0.320
Token From
Token ID3574
Feature activation+3.68e-01

Pos logit contributions
bel+0.030
cakes+0.028
organ+0.026
 Jos+0.024
spe+0.023
Neg logit contributions
 strangers-0.032
emy-0.031
 goats-0.031
 pad-0.029
 necks-0.029
Token that
Token ID326
Feature activation-9.99e-02

Pos logit contributions
bel+0.024
cakes+0.023
organ+0.021
he+0.021
 Jos+0.020
Neg logit contributions
 strangers-0.024
 pad-0.023
emy-0.023
leness-0.022
 necks-0.022
Token day
Token ID1110
Feature activation-6.00e-01

Pos logit contributions
 strangers+0.008
 goats+0.008
 necks+0.008
emy+0.008
 unnecessary+0.008
Neg logit contributions
bel-0.005
he-0.005
organ-0.005
cakes-0.005
 Jos-0.004
Token on
Token ID319
Feature activation-7.29e-01

Pos logit contributions
 strangers+0.053
 goats+0.051
emy+0.050
 necks+0.050
leness+0.049
Neg logit contributions
bel-0.025
cakes-0.024
organ-0.023
he-0.021
 Jos-0.021
Tokenmy
Token ID1820
Feature activation+7.38e-01

Pos logit contributions
bel+0.102
cakes+0.098
organ+0.090
 Jos+0.085
spe+0.080
Neg logit contributions
 strangers-0.093
 goats-0.089
 pad-0.086
emy-0.086
 necks-0.084
Token and
Token ID290
Feature activation-3.70e-02

Pos logit contributions
bel+0.054
cakes+0.052
organ+0.048
 Jos+0.046
spe+0.043
Neg logit contributions
 strangers-0.044
 goats-0.042
emy-0.041
 pad-0.040
 necks-0.039
Token eat
Token ID4483
Feature activation-1.87e+00

Pos logit contributions
 strangers+0.002
 goats+0.002
 goat+0.002
 risking+0.002
 hesitation+0.002
Neg logit contributions
bel-0.003
cakes-0.002
organ-0.002
he-0.002
spe-0.002
Token delicious
Token ID12625
Feature activation+1.71e+00

Pos logit contributions
 strangers+0.112
 goats+0.104
 necks+0.094
 hesitation+0.093
emy+0.093
Neg logit contributions
bel-0.141
spe-0.128
organ-0.125
cakes-0.124
 Jos-0.118
Token cup
Token ID6508
Feature activation+3.60e+00

Pos logit contributions
bel+0.115
cakes+0.107
organ+0.102
 Jos+0.098
spe+0.095
Neg logit contributions
 strangers-0.113
 goats-0.107
emy-0.104
 necks-0.102
 hesitation-0.100
Tokencakes
Token ID37263
Feature activation+8.43e+00

Pos logit contributions
bel+0.119
organ+0.106
spe+0.102
 Jos+0.094
urd+0.086
Neg logit contributions
 strangers-0.322
 goats-0.309
agging-0.309
 years-0.307
 wo-0.299
Token!
Token ID0
Feature activation+1.25e+00

Pos logit contributions
bel+0.739
 Jos+0.676
organ+0.669
spe+0.649
ushes+0.632
Neg logit contributions
 strangers-0.361
 goats-0.326
 pad-0.316
agging-0.312
 years-0.308
Token<|endoftext|>
Token ID50256
Feature activation-1.66e+00

Pos logit contributions
bel+0.091
cakes+0.084
organ+0.079
 Jos+0.074
spe+0.073
Neg logit contributions
 strangers-0.075
emy-0.071
 goats-0.070
 pad-0.068
uf-0.068
TokenOnce
Token ID7454
Feature activation+1.45e+00

Pos logit contributions
 strangers+0.116
 goats+0.110
emy+0.110
 necks+0.106
 pad+0.105
Neg logit contributions
bel-0.103
cakes-0.099
organ-0.091
 Jos-0.087
he-0.080
Token upon
Token ID2402
Feature activation+1.51e+00

Pos logit contributions
bel+0.082
cakes+0.078
organ+0.071
 Jos+0.068
he+0.062
Neg logit contributions
 strangers-0.109
 goats-0.104
emy-0.103
 necks-0.099
 pad-0.099
Token a
Token ID257
Feature activation+2.95e-01

Pos logit contributions
bel+0.083
cakes+0.079
organ+0.072
 Jos+0.068
he+0.064
Neg logit contributions
 strangers-0.116
 goats-0.110
emy-0.110
 necks-0.106
leness-0.106
Token,
Token ID11
Feature activation-2.64e-01

Pos logit contributions
bel+0.042
organ+0.039
cakes+0.039
 Jos+0.036
ushes+0.035
Neg logit contributions
 strangers-0.021
 goats-0.018
 hesitation-0.018
iley-0.017
uf-0.016
Token Dave
Token ID9935
Feature activation-2.62e+00

Pos logit contributions
 strangers+0.016
 goats+0.015
 hesitation+0.014
iley+0.013
uf+0.013
Neg logit contributions
bel-0.020
organ-0.019
cakes-0.018
 Jos-0.017
ushes-0.017
Tokeny
Token ID88
Feature activation-5.33e-01

Pos logit contributions
 strangers+0.137
 pad+0.137
 goats+0.136
emy+0.135
 necks+0.129
Neg logit contributions
bel-0.204
cakes-0.197
organ-0.187
 Jos-0.174
whe-0.168
Token's
Token ID338
Feature activation-1.94e+00

Pos logit contributions
 strangers+0.031
 goats+0.030
 pad+0.030
 necks+0.029
emy+0.028
Neg logit contributions
bel-0.038
cakes-0.036
organ-0.036
 Jos-0.032
he-0.031
Token cup
Token ID6508
Feature activation+3.61e+00

Pos logit contributions
 strangers+0.129
 goats+0.121
emy+0.118
 necks+0.118
 hesitation+0.118
Neg logit contributions
bel-0.127
cakes-0.120
organ-0.115
 Jos-0.107
he-0.102
Tokencakes
Token ID37263
Feature activation+8.43e+00

Pos logit contributions
organ+0.126
bel+0.121
 Jos+0.113
spe+0.096
urd+0.090
Neg logit contributions
 strangers-0.300
 pad-0.288
 goats-0.285
agging-0.275
 necks-0.274
Token were
Token ID547
Feature activation+5.38e-01

Pos logit contributions
bel+0.523
 Jos+0.508
organ+0.493
 Suz+0.445
cakes+0.444
Neg logit contributions
 strangers-0.534
 hesitation-0.488
 goats-0.475
 pad-0.472
agging-0.461
Token done
Token ID1760
Feature activation+1.39e+00

Pos logit contributions
bel+0.035
organ+0.034
cakes+0.033
 Jos+0.029
he+0.028
Neg logit contributions
 strangers-0.035
 hesitation-0.033
 goats-0.033
 years-0.031
 pad-0.031
Token,
Token ID11
Feature activation-2.76e-01

Pos logit contributions
bel+0.113
cakes+0.112
organ+0.111
ushes+0.099
 Jos+0.098
Neg logit contributions
 strangers-0.057
 years-0.053
 hesitation-0.050
agging-0.048
 necks-0.048
Token and
Token ID290
Feature activation-5.54e-02

Pos logit contributions
 strangers+0.014
 goats+0.014
 necks+0.013
 pad+0.013
 hesitation+0.012
Neg logit contributions
bel-0.022
cakes-0.022
organ-0.021
 Jos-0.019
he-0.019
Token they
Token ID484
Feature activation+4.88e-01

Pos logit contributions
 strangers+0.003
 goats+0.003
 hesitation+0.003
iley+0.003
ya+0.003
Neg logit contributions
organ-0.004
bel-0.004
cakes-0.004
room-0.003
ushes-0.003
Token like
Token ID588
Feature activation+6.81e-01

Pos logit contributions
bel+0.007
cakes+0.006
organ+0.006
 Jos+0.006
he+0.005
Neg logit contributions
 strangers-0.006
 goats-0.006
emy-0.006
 necks-0.005
 pad-0.005
Token taking
Token ID2263
Feature activation-1.97e+00

Pos logit contributions
bel+0.049
cakes+0.048
organ+0.044
 Jos+0.043
spe+0.040
Neg logit contributions
 strangers-0.039
emy-0.038
 goats-0.037
 necks-0.036
 pad-0.036
Token strange
Token ID6283
Feature activation-1.19e+00

Pos logit contributions
 strangers+0.124
emy+0.121
 goats+0.119
uf+0.116
 hesitation+0.115
Neg logit contributions
bel-0.132
cakes-0.124
organ-0.114
 Jos-0.110
spe-0.103
Token things
Token ID1243
Feature activation-5.18e-02

Pos logit contributions
 strangers+0.068
emy+0.065
 goats+0.064
uf+0.061
 necks+0.061
Neg logit contributions
bel-0.088
cakes-0.085
organ-0.078
 Jos-0.076
spe-0.071
Token from
Token ID422
Feature activation+2.62e+00

Pos logit contributions
 strangers+0.003
 goats+0.003
 pad+0.003
emy+0.003
leness+0.003
Neg logit contributions
bel-0.004
cakes-0.004
organ-0.003
 Jos-0.003
spe-0.003
Token strangers
Token ID18981
Feature activation-7.85e+00

Pos logit contributions
bel+0.171
cakes+0.158
organ+0.148
 Jos+0.147
he+0.136
Neg logit contributions
emy-0.162
 strangers-0.161
 goats-0.158
 pad-0.154
 necks-0.154
Token.
Token ID13
Feature activation+1.57e-01

Pos logit contributions
 strangers+0.514
emy+0.504
 goats+0.491
 necks+0.479
leness+0.473
Neg logit contributions
bel-0.505
cakes-0.482
organ-0.441
 Jos-0.421
he-0.412
Token Live
Token ID7547
Feature activation-1.21e+00

Pos logit contributions
bel+0.010
cakes+0.009
organ+0.009
 Jos+0.008
he+0.008
Neg logit contributions
 strangers-0.011
emy-0.011
 goats-0.011
 pad-0.010
uf-0.010
Token and
Token ID290
Feature activation-1.29e-01

Pos logit contributions
 strangers+0.070
emy+0.067
 goats+0.066
 necks+0.064
 pad+0.064
Neg logit contributions
bel-0.087
cakes-0.085
organ-0.078
 Jos-0.076
he-0.071
Token learn
Token ID2193
Feature activation-8.53e-01

Pos logit contributions
 strangers+0.008
 goats+0.008
emy+0.008
uf+0.008
 pad+0.007
Neg logit contributions
bel-0.009
cakes-0.008
organ-0.008
 Jos-0.008
spe-0.007
Token!
Token ID0
Feature activation+1.26e+00

Pos logit contributions
 strangers+0.054
 goats+0.051
emy+0.051
 necks+0.049
 pad+0.048
Neg logit contributions
bel-0.058
cakes-0.056
organ-0.052
 Jos-0.050
he-0.047
Token instinct
Token ID13311
Feature activation-2.37e+00

Pos logit contributions
 strangers+0.103
emy+0.100
 goats+0.099
leness+0.098
 pad+0.095
Neg logit contributions
bel-0.107
cakes-0.102
organ-0.091
 Jos-0.090
spe-0.086
Token can
Token ID460
Feature activation+1.27e+00

Pos logit contributions
 strangers+0.144
emy+0.136
 goats+0.136
 necks+0.129
 pad+0.128
Neg logit contributions
bel-0.167
cakes-0.161
organ-0.149
 Jos-0.144
he-0.135
Token often
Token ID1690
Feature activation-5.47e-01

Pos logit contributions
bel+0.088
cakes+0.086
organ+0.078
 Jos+0.076
ushes+0.071
Neg logit contributions
 strangers-0.083
 goats-0.076
emy-0.074
 hesitation-0.072
leness-0.072
Token save
Token ID3613
Feature activation-5.64e-01

Pos logit contributions
 strangers+0.038
emy+0.037
 goats+0.037
 necks+0.035
 pad+0.035
Neg logit contributions
bel-0.033
cakes-0.031
organ-0.029
 Jos-0.027
he-0.026
Token us
Token ID514
Feature activation-4.34e-01

Pos logit contributions
 strangers+0.038
 goats+0.035
emy+0.035
 necks+0.034
 hesitation+0.033
Neg logit contributions
bel-0.037
cakes-0.036
organ-0.033
 Jos-0.032
he-0.030
Token unnecessary
Token ID13114
Feature activation-7.73e+00

Pos logit contributions
 strangers+0.029
 goats+0.027
emy+0.027
 necks+0.026
 pad+0.026
Neg logit contributions
bel-0.028
cakes-0.028
organ-0.025
 Jos-0.024
he-0.023
Token danger
Token ID3514
Feature activation-3.81e+00

Pos logit contributions
emy+0.386
 strangers+0.380
 goats+0.371
uf+0.368
 pad+0.364
Neg logit contributions
bel-0.613
cakes-0.555
organ-0.527
spe-0.502
 Jos-0.491
Token.
Token ID13
Feature activation+2.56e-01

Pos logit contributions
 strangers+0.247
emy+0.238
 goats+0.236
 necks+0.227
uf+0.224
Neg logit contributions
bel-0.251
cakes-0.241
organ-0.221
 Jos-0.212
he-0.203
Token<|endoftext|>
Token ID50256
Feature activation-1.67e+00

Pos logit contributions
bel+0.017
cakes+0.016
organ+0.015
 Jos+0.014
he+0.013
Neg logit contributions
 strangers-0.017
 goats-0.016
emy-0.016
uf-0.015
 pad-0.015
TokenOnce
Token ID7454
Feature activation+1.44e+00

Pos logit contributions
 strangers+0.099
 goats+0.094
emy+0.093
 pad+0.090
 necks+0.088
Neg logit contributions
bel-0.122
cakes-0.116
organ-0.107
 Jos-0.104
spe-0.098
Token upon
Token ID2402
Feature activation+1.20e+00

Pos logit contributions
bel+0.088
cakes+0.085
organ+0.078
 Jos+0.071
spe+0.066
Neg logit contributions
 strangers-0.102
emy-0.097
 goats-0.095
 pad-0.093
 hesitation-0.092
Token important
Token ID1593
Feature activation-1.26e+00

Pos logit contributions
 strangers+0.143
 goats+0.138
emy+0.136
uf+0.135
 necks+0.135
Neg logit contributions
bel-0.122
cakes-0.115
organ-0.112
 Jos-0.102
 Sus-0.095
Token to
Token ID284
Feature activation-3.23e-01

Pos logit contributions
 strangers+0.104
emy+0.104
 goats+0.104
uf+0.102
agging+0.100
Neg logit contributions
bel-0.057
cakes-0.054
organ-0.051
 together-0.042
 Jos-0.041
Token be
Token ID307
Feature activation-1.77e+00

Pos logit contributions
 strangers+0.022
 goats+0.022
emy+0.022
 pad+0.021
 necks+0.021
Neg logit contributions
bel-0.018
cakes-0.018
organ-0.017
 Jos-0.016
he-0.015
Token careful
Token ID8161
Feature activation-6.36e-01

Pos logit contributions
 strangers+0.110
 goats+0.103
emy+0.103
 pad+0.099
 necks+0.098
Neg logit contributions
bel-0.124
cakes-0.118
organ-0.111
 Jos-0.106
spe-0.099
Token around
Token ID1088
Feature activation+8.97e-01

Pos logit contributions
 strangers+0.046
 goats+0.044
emy+0.044
uf+0.042
 hesitation+0.042
Neg logit contributions
bel-0.038
cakes-0.036
organ-0.033
 Jos-0.031
spe-0.030
Token strangers
Token ID18981
Feature activation-7.59e+00

Pos logit contributions
bel+0.056
cakes+0.054
organ+0.050
 Jos+0.048
he+0.045
Neg logit contributions
 strangers-0.060
 goats-0.058
emy-0.058
 necks-0.056
 pad-0.056
Token,
Token ID11
Feature activation-4.49e-01

Pos logit contributions
emy+0.512
 strangers+0.508
 goats+0.492
 necks+0.477
 hesitation+0.476
Neg logit contributions
bel-0.458
cakes-0.433
organ-0.403
he-0.388
 Jos-0.382
Token especially
Token ID2592
Feature activation+5.24e-01

Pos logit contributions
 strangers+0.029
 goats+0.027
emy+0.026
 necks+0.025
 pad+0.025
Neg logit contributions
bel-0.032
cakes-0.031
organ-0.029
 Jos-0.027
he-0.026
Token ones
Token ID3392
Feature activation-1.31e+00

Pos logit contributions
bel+0.030
cakes+0.025
organ+0.025
 Jos+0.023
spe+0.022
Neg logit contributions
uf-0.037
agging-0.036
emy-0.036
 strangers-0.036
 pad-0.035
Token who
Token ID508
Feature activation-3.79e+00

Pos logit contributions
 strangers+0.080
emy+0.079
 goats+0.077
 hesitation+0.077
uf+0.076
Neg logit contributions
bel-0.088
cakes-0.082
organ-0.075
he-0.073
 Jos-0.071
Token act
Token ID719
Feature activation-1.96e-01

Pos logit contributions
 strangers+0.235
emy+0.226
 goats+0.223
 necks+0.215
 pad+0.215
Neg logit contributions
bel-0.259
cakes-0.247
organ-0.228
 Jos-0.220
he-0.208
Token them
Token ID606
Feature activation-1.93e-01

Pos logit contributions
 strangers+0.031
 goats+0.029
emy+0.028
 hesitation+0.027
 necks+0.027
Neg logit contributions
bel-0.028
cakes-0.026
organ-0.025
 Jos-0.022
spe-0.022
Token to
Token ID284
Feature activation-3.51e-01

Pos logit contributions
 strangers+0.013
 goats+0.012
emy+0.011
 necks+0.011
 hesitation+0.011
Neg logit contributions
bel-0.013
cakes-0.012
organ-0.012
 Jos-0.011
he-0.011
Token stay
Token ID2652
Feature activation-1.08e+00

Pos logit contributions
 strangers+0.022
 goats+0.020
emy+0.020
 necks+0.020
 hesitation+0.019
Neg logit contributions
bel-0.025
cakes-0.025
organ-0.023
 Jos-0.021
spe-0.020
Token away
Token ID1497
Feature activation-4.97e-01

Pos logit contributions
 strangers+0.080
 goats+0.076
emy+0.076
 necks+0.074
 pad+0.073
Neg logit contributions
bel-0.063
cakes-0.062
organ-0.055
 Jos-0.053
spe-0.049
Token from
Token ID422
Feature activation+2.69e+00

Pos logit contributions
 strangers+0.036
 goats+0.032
emy+0.032
 hesitation+0.031
 pad+0.031
Neg logit contributions
cakes-0.033
bel-0.030
organ-0.028
 Jos-0.025
whe-0.025
Token strangers
Token ID18981
Feature activation-7.53e+00

Pos logit contributions
bel+0.124
 Jos+0.111
cakes+0.106
organ+0.099
 Sus+0.097
Neg logit contributions
emy-0.217
leness-0.211
 pad-0.210
uf-0.209
 goats-0.207
Token.
Token ID13
Feature activation-1.33e-01

Pos logit contributions
emy+0.534
 strangers+0.514
leness+0.510
 goats+0.505
uf+0.493
Neg logit contributions
bel-0.433
cakes-0.410
organ-0.370
he-0.366
 Jos-0.357
Token But
Token ID887
Feature activation-2.00e+00

Pos logit contributions
 strangers+0.011
 goats+0.010
emy+0.010
 necks+0.010
 pad+0.010
Neg logit contributions
bel-0.007
cakes-0.007
organ-0.006
 Jos-0.006
he-0.005
Token they
Token ID484
Feature activation+2.16e-01

Pos logit contributions
 strangers+0.138
 goats+0.125
 hesitation+0.120
 necks+0.118
emy+0.118
Neg logit contributions
bel-0.136
cakes-0.133
organ-0.124
he-0.110
ushes-0.107
Token did
Token ID750
Feature activation+1.05e+00

Pos logit contributions
bel+0.016
cakes+0.015
organ+0.014
 Jos+0.014
spe+0.013
Neg logit contributions
 strangers-0.014
 goats-0.012
emy-0.012
 hesitation-0.012
 necks-0.012
Token not
Token ID407
Feature activation+5.11e-01

Pos logit contributions
bel+0.063
cakes+0.060
organ+0.055
 Jos+0.052
he+0.049
Neg logit contributions
 strangers-0.076
 goats-0.073
emy-0.073
 pad-0.070
 necks-0.070
Token your
Token ID534
Feature activation-2.93e+00

Pos logit contributions
 pad+0.030
emy+0.029
 necks+0.029
 goats+0.028
agging+0.028
Neg logit contributions
bel-0.019
 Jos-0.018
cakes-0.017
organ-0.016
 Sus-0.016
Token parents
Token ID3397
Feature activation+5.93e-01

Pos logit contributions
 strangers+0.211
 goats+0.207
emy+0.203
leness+0.200
 pad+0.199
Neg logit contributions
bel-0.170
cakes-0.158
organ-0.144
 Jos-0.140
spe-0.131
Token and
Token ID290
Feature activation-8.68e-02

Pos logit contributions
bel+0.034
cakes+0.033
he+0.030
organ+0.030
 Jos+0.029
Neg logit contributions
emy-0.042
 strangers-0.039
 goats-0.038
 unnecessary-0.037
 necks-0.037
Token not
Token ID407
Feature activation+5.83e-01

Pos logit contributions
 strangers+0.006
 goats+0.005
emy+0.005
 necks+0.005
 hesitation+0.005
Neg logit contributions
bel-0.006
cakes-0.006
organ-0.005
 Jos-0.005
he-0.005
Token take
Token ID1011
Feature activation-1.99e+00

Pos logit contributions
bel+0.035
cakes+0.033
organ+0.031
 Jos+0.029
spe+0.028
Neg logit contributions
 strangers-0.040
emy-0.039
 goats-0.039
 necks-0.038
leness-0.037
Token unnecessary
Token ID13114
Feature activation-7.52e+00

Pos logit contributions
emy+0.122
 pad+0.121
 strangers+0.118
 goats+0.117
 necks+0.116
Neg logit contributions
bel-0.126
cakes-0.120
organ-0.114
 Jos-0.108
 Sus-0.102
Token risks
Token ID7476
Feature activation-4.24e+00

Pos logit contributions
emy+0.298
uf+0.290
iley+0.284
 pad+0.281
 goats+0.279
Neg logit contributions
bel-0.676
organ-0.585
cakes-0.580
 together-0.560
spe-0.528
Token.
Token ID13
Feature activation+3.00e-02

Pos logit contributions
 strangers+0.269
 goats+0.255
emy+0.255
 necks+0.244
 pad+0.241
Neg logit contributions
bel-0.289
cakes-0.278
organ-0.256
 Jos-0.246
he-0.232
Token<|endoftext|>
Token ID50256
Feature activation-1.64e+00

Pos logit contributions
bel+0.002
cakes+0.002
organ+0.002
 Jos+0.002
spe+0.001
Neg logit contributions
 strangers-0.002
 goats-0.002
emy-0.002
 necks-0.002
 pad-0.002
TokenOne
Token ID3198
Feature activation-2.74e+00

Pos logit contributions
 strangers+0.100
 goats+0.094
emy+0.094
 pad+0.090
uf+0.089
Neg logit contributions
bel-0.117
cakes-0.111
organ-0.102
 Jos-0.099
spe-0.095
Token day
Token ID1110
Feature activation-7.41e-01

Pos logit contributions
 strangers+0.178
 goats+0.173
emy+0.170
 necks+0.168
leness+0.165
Neg logit contributions
bel-0.181
cakes-0.175
organ-0.159
 Jos-0.155
he-0.152
Token careful
Token ID8161
Feature activation-7.55e-01

Pos logit contributions
 strangers+0.110
 goats+0.103
emy+0.103
 pad+0.098
 necks+0.098
Neg logit contributions
bel-0.149
cakes-0.143
organ-0.133
 Jos-0.129
spe-0.121
Token and
Token ID290
Feature activation-1.35e-01

Pos logit contributions
 strangers+0.051
 goats+0.048
emy+0.048
 necks+0.046
 pad+0.046
Neg logit contributions
bel-0.049
cakes-0.047
organ-0.044
 Jos-0.042
he-0.039
Token not
Token ID407
Feature activation+3.29e-01

Pos logit contributions
 strangers+0.008
 goats+0.008
emy+0.008
 necks+0.007
 pad+0.007
Neg logit contributions
bel-0.009
cakes-0.009
organ-0.009
 Jos-0.008
he-0.008
Token talk
Token ID1561
Feature activation-5.25e-01

Pos logit contributions
bel+0.021
cakes+0.020
organ+0.019
 Jos+0.018
spe+0.017
Neg logit contributions
 strangers-0.022
 goats-0.021
emy-0.021
 necks-0.020
 pad-0.020
Token to
Token ID284
Feature activation-4.73e-01

Pos logit contributions
 strangers+0.034
 goats+0.032
emy+0.031
 necks+0.030
 pad+0.030
Neg logit contributions
bel-0.035
cakes-0.034
organ-0.032
 Jos-0.031
he-0.029
Token strangers
Token ID18981
Feature activation-7.49e+00

Pos logit contributions
emy+0.028
 strangers+0.028
 goats+0.027
 pad+0.027
leness+0.027
Neg logit contributions
bel-0.033
cakes-0.031
organ-0.028
 Jos-0.028
spe-0.026
Token.
Token ID13
Feature activation-6.70e-02

Pos logit contributions
 strangers+0.489
emy+0.488
 goats+0.470
 necks+0.461
leness+0.459
Neg logit contributions
bel-0.475
cakes-0.454
organ-0.414
 Jos-0.396
he-0.386
Token Sara
Token ID24799
Feature activation-1.63e+00

Pos logit contributions
 strangers+0.005
 goats+0.005
emy+0.005
 necks+0.005
 pad+0.005
Neg logit contributions
bel-0.004
cakes-0.004
organ-0.003
 Jos-0.003
he-0.003
Token and
Token ID290
Feature activation-1.64e-01

Pos logit contributions
 strangers+0.106
 goats+0.100
emy+0.100
 necks+0.096
 pad+0.095
Neg logit contributions
bel-0.109
cakes-0.105
organ-0.097
 Jos-0.093
he-0.087
Token Ben
Token ID3932
Feature activation-1.95e+00

Pos logit contributions
 strangers+0.010
emy+0.010
 goats+0.010
 necks+0.009
leness+0.009
Neg logit contributions
bel-0.011
cakes-0.011
organ-0.010
 Jos-0.010
he-0.009
Token agreed
Token ID4987
Feature activation+1.30e+00

Pos logit contributions
 strangers+0.141
emy+0.134
 goats+0.134
 necks+0.128
 pad+0.128
Neg logit contributions
bel-0.116
cakes-0.112
organ-0.102
 Jos-0.097
he-0.091
Token for
Token ID329
Feature activation+1.15e+00

Pos logit contributions
bel+0.013
cakes+0.013
organ+0.012
 Jos+0.012
he+0.011
Neg logit contributions
 strangers-0.012
 goats-0.011
emy-0.011
 necks-0.010
 pad-0.010
Token the
Token ID262
Feature activation+2.57e-01

Pos logit contributions
bel+0.064
cakes+0.062
organ+0.055
 Jos+0.055
he+0.049
Neg logit contributions
 strangers-0.085
emy-0.082
 goats-0.082
 pad-0.080
 necks-0.080
Token help
Token ID1037
Feature activation-1.28e+00

Pos logit contributions
bel+0.019
cakes+0.018
organ+0.017
 Jos+0.016
he+0.015
Neg logit contributions
 strangers-0.015
 goats-0.014
emy-0.014
 necks-0.014
 pad-0.014
Token of
Token ID286
Feature activation+2.67e+00

Pos logit contributions
 strangers+0.080
 goats+0.075
emy+0.071
 pad+0.071
 hesitation+0.070
Neg logit contributions
bel-0.091
cakes-0.087
organ-0.084
 Jos-0.077
ushes-0.073
Token kind
Token ID1611
Feature activation-2.46e+00

Pos logit contributions
bel+0.180
cakes+0.174
organ+0.164
 Jos+0.150
he+0.144
Neg logit contributions
 strangers-0.183
 goats-0.169
emy-0.165
 necks-0.159
 hesitation-0.158
Token strangers
Token ID18981
Feature activation-7.45e+00

Pos logit contributions
 strangers+0.150
 goats+0.141
emy+0.139
 necks+0.132
 pad+0.132
Neg logit contributions
bel-0.177
cakes-0.171
organ-0.161
 Jos-0.152
he-0.145
Token.
Token ID13
Feature activation+3.53e-01

Pos logit contributions
 strangers+0.458
 goats+0.430
emy+0.418
 pad+0.406
 necks+0.401
Neg logit contributions
bel-0.531
cakes-0.513
organ-0.480
 Jos-0.459
spe-0.426
Token<|endoftext|>
Token ID50256
Feature activation-1.49e+00

Pos logit contributions
bel+0.024
cakes+0.022
organ+0.021
 Jos+0.019
he+0.018
Neg logit contributions
 strangers-0.023
emy-0.023
 goats-0.023
 pad-0.022
uf-0.021
TokenOnce
Token ID7454
Feature activation+1.43e+00

Pos logit contributions
 strangers+0.084
emy+0.079
 goats+0.079
 pad+0.077
uf+0.074
Neg logit contributions
bel-0.113
cakes-0.108
organ-0.098
 Jos-0.096
spe-0.092
Token upon
Token ID2402
Feature activation+1.22e+00

Pos logit contributions
bel+0.088
cakes+0.087
organ+0.080
 Jos+0.073
whe+0.067
Neg logit contributions
 strangers-0.100
emy-0.096
 goats-0.093
 pad-0.092
 hesitation-0.090
Token a
Token ID257
Feature activation+2.02e-01

Pos logit contributions
bel+0.067
cakes+0.064
organ+0.058
 Jos+0.055
he+0.051
Neg logit contributions
 strangers-0.094
 goats-0.090
emy-0.089
 necks-0.087
leness-0.086
Token to
Token ID284
Feature activation-3.32e-01

Pos logit contributions
bel+0.038
cakes+0.036
organ+0.033
 Jos+0.031
spe+0.030
Neg logit contributions
 strangers-0.045
emy-0.043
 goats-0.043
 necks-0.042
uf-0.042
Token be
Token ID307
Feature activation-1.79e+00

Pos logit contributions
emy+0.023
 pad+0.023
 strangers+0.023
 goats+0.023
 necks+0.022
Neg logit contributions
bel-0.018
organ-0.017
cakes-0.017
spe-0.015
 Jos-0.015
Token too
Token ID1165
Feature activation+1.49e+00

Pos logit contributions
 strangers+0.103
 goats+0.097
emy+0.097
 necks+0.093
 pad+0.092
Neg logit contributions
bel-0.132
cakes-0.127
organ-0.119
 Jos-0.114
spe-0.108
Token scared
Token ID12008
Feature activation+6.91e-01

Pos logit contributions
bel+0.099
cakes+0.088
organ+0.085
spe+0.081
 Jos+0.077
Neg logit contributions
 necks-0.092
 pad-0.092
 strangers-0.090
emy-0.089
 goats-0.088
Token of
Token ID286
Feature activation+2.78e+00

Pos logit contributions
bel+0.036
cakes+0.035
organ+0.031
 Jos+0.030
spe+0.028
Neg logit contributions
 strangers-0.052
emy-0.052
 goats-0.051
 necks-0.051
uf-0.050
Token strangers
Token ID18981
Feature activation-7.44e+00

Pos logit contributions
bel+0.159
cakes+0.149
 Jos+0.142
organ+0.137
 Sus+0.133
Neg logit contributions
emy-0.185
 pad-0.182
 goats-0.180
 necks-0.178
leness-0.177
Token.
Token ID13
Feature activation+4.45e-01

Pos logit contributions
 strangers+0.490
emy+0.489
 goats+0.471
 necks+0.463
leness+0.462
Neg logit contributions
bel-0.461
cakes-0.444
organ-0.406
 Jos-0.387
he-0.377
Token<|endoftext|>
Token ID50256
Feature activation-1.60e+00

Pos logit contributions
bel+0.028
cakes+0.026
organ+0.024
 Jos+0.022
he+0.021
Neg logit contributions
 strangers-0.031
emy-0.030
 goats-0.030
 pad-0.029
uf-0.029
TokenOnce
Token ID7454
Feature activation+1.36e+00

Pos logit contributions
 strangers+0.094
 goats+0.089
emy+0.088
 pad+0.086
iley+0.083
Neg logit contributions
bel-0.117
cakes-0.111
organ-0.102
 Jos-0.099
spe-0.095
Token upon
Token ID2402
Feature activation+1.24e+00

Pos logit contributions
bel+0.082
cakes+0.080
organ+0.074
 Jos+0.067
spe+0.063
Neg logit contributions
 strangers-0.098
emy-0.092
 goats-0.091
 pad-0.088
 hesitation-0.088
Token a
Token ID257
Feature activation+1.70e-01

Pos logit contributions
bel+0.069
cakes+0.065
organ+0.060
 Jos+0.056
he+0.052
Neg logit contributions
 strangers-0.095
 goats-0.091
emy-0.090
 necks-0.088
leness-0.087
Token The
Token ID383
Feature activation-1.03e+00

Pos logit contributions
bel+0.015
cakes+0.014
organ+0.013
 Jos+0.012
he+0.012
Neg logit contributions
 strangers-0.016
 goats-0.016
emy-0.016
 pad-0.015
uf-0.015
Token animals
Token ID4695
Feature activation-1.35e+00

Pos logit contributions
 strangers+0.054
emy+0.053
iley+0.051
leness+0.050
 necks+0.050
Neg logit contributions
bel-0.077
cakes-0.075
organ-0.068
 Jos-0.065
spe-0.065
Token did
Token ID750
Feature activation+1.06e+00

Pos logit contributions
 strangers+0.102
 goats+0.097
emy+0.093
 hesitation+0.092
 pad+0.091
Neg logit contributions
bel-0.078
cakes-0.076
organ-0.070
 Jos-0.066
ushes-0.060
Token not
Token ID407
Feature activation+5.74e-01

Pos logit contributions
bel+0.082
cakes+0.082
organ+0.076
 Jos+0.073
he+0.070
Neg logit contributions
 strangers-0.062
 hesitation-0.054
 goats-0.052
 risking-0.052
leness-0.051
Token like
Token ID588
Feature activation+8.23e-01

Pos logit contributions
cakes+0.041
bel+0.040
organ+0.037
 Jos+0.035
iced+0.035
Neg logit contributions
 strangers-0.037
 goats-0.033
 hesitation-0.033
leness-0.031
emy-0.031
Token strangers
Token ID18981
Feature activation-7.40e+00

Pos logit contributions
bel+0.051
cakes+0.049
organ+0.045
 Jos+0.043
he+0.040
Neg logit contributions
 strangers-0.058
 goats-0.055
emy-0.054
 necks-0.052
 pad-0.052
Token,
Token ID11
Feature activation-5.94e-01

Pos logit contributions
 strangers+0.478
emy+0.477
 goats+0.458
 necks+0.451
leness+0.442
Neg logit contributions
bel-0.473
cakes-0.459
organ-0.419
 Jos-0.399
he-0.395
Token so
Token ID523
Feature activation+5.30e-02

Pos logit contributions
 strangers+0.035
 goats+0.033
emy+0.032
 necks+0.031
 pad+0.031
Neg logit contributions
bel-0.044
cakes-0.042
organ-0.039
 Jos-0.038
he-0.036
Token they
Token ID484
Feature activation+2.92e-01

Pos logit contributions
bel+0.003
cakes+0.003
organ+0.003
 Jos+0.003
he+0.002
Neg logit contributions
 strangers-0.004
 goats-0.004
emy-0.004
 hesitation-0.004
 necks-0.004
Token began
Token ID2540
Feature activation+1.38e+00

Pos logit contributions
bel+0.021
cakes+0.021
organ+0.020
 Jos+0.019
ushes+0.018
Neg logit contributions
 strangers-0.018
 goats-0.017
 hesitation-0.015
 pad-0.015
emy-0.015
Token to
Token ID284
Feature activation-3.26e-01

Pos logit contributions
bel+0.086
cakes+0.082
organ+0.075
 Jos+0.072
spe+0.067
Neg logit contributions
 strangers-0.096
emy-0.091
 goats-0.091
 necks-0.087
 pad-0.087
Token animals
Token ID4695
Feature activation-1.24e+00

Pos logit contributions
bel+0.009
cakes+0.009
organ+0.008
he+0.008
 Jos+0.007
Neg logit contributions
 strangers-0.007
emy-0.007
leness-0.006
uf-0.006
iley-0.006
Token safe
Token ID3338
Feature activation-2.62e+00

Pos logit contributions
 strangers+0.079
 goats+0.075
emy+0.073
 pad+0.071
 necks+0.071
Neg logit contributions
bel-0.085
cakes-0.083
organ-0.078
 Jos-0.074
spe-0.069
Token and
Token ID290
Feature activation-1.70e-02

Pos logit contributions
 strangers+0.161
 goats+0.151
emy+0.148
 necks+0.144
 pad+0.143
Neg logit contributions
bel-0.186
cakes-0.182
organ-0.169
 Jos-0.162
he-0.150
Token chased
Token ID26172
Feature activation-3.23e+00

Pos logit contributions
 strangers+0.001
 goats+0.001
emy+0.001
 necks+0.001
leness+0.001
Neg logit contributions
bel-0.001
cakes-0.001
organ-0.001
 Jos-0.001
he-0.001
Token away
Token ID1497
Feature activation-4.38e-01

Pos logit contributions
 strangers+0.238
emy+0.232
 goats+0.229
 hesitation+0.226
 pad+0.225
Neg logit contributions
bel-0.176
cakes-0.166
organ-0.151
 Jos-0.146
he-0.137
Token strangers
Token ID18981
Feature activation-7.39e+00

Pos logit contributions
 strangers+0.028
 goats+0.025
emy+0.024
 necks+0.023
 tails+0.022
Neg logit contributions
cakes-0.032
bel-0.031
organ-0.030
 Jos-0.028
he-0.027
Token who
Token ID508
Feature activation-3.72e+00

Pos logit contributions
 strangers+0.482
emy+0.460
 goats+0.456
 necks+0.438
leness+0.434
Neg logit contributions
bel-0.487
cakes-0.468
organ-0.429
 Jos-0.414
he-0.394
Token came
Token ID1625
Feature activation+1.38e+00

Pos logit contributions
 strangers+0.247
 goats+0.227
emy+0.209
leness+0.206
 risking+0.205
Neg logit contributions
bel-0.253
cakes-0.252
organ-0.238
 Jos-0.225
ushes-0.212
Token too
Token ID1165
Feature activation+1.61e+00

Pos logit contributions
bel+0.085
cakes+0.081
organ+0.074
 Jos+0.071
he+0.066
Neg logit contributions
 strangers-0.098
 goats-0.093
emy-0.093
 necks-0.089
 pad-0.089
Token close
Token ID1969
Feature activation-2.54e+00

Pos logit contributions
bel+0.099
cakes+0.090
organ+0.084
 Jos+0.079
spe+0.079
Neg logit contributions
 strangers-0.112
emy-0.108
 goats-0.108
iley-0.105
 necks-0.104
Token to
Token ID284
Feature activation-3.36e-01

Pos logit contributions
 strangers+0.179
emy+0.175
 goats+0.172
uf+0.167
leness+0.167
Neg logit contributions
bel-0.153
cakes-0.142
organ-0.128
 Jos-0.126
spe-0.120
Token call
Token ID869
Feature activation-1.43e+00

Pos logit contributions
 strangers+0.075
 goats+0.070
emy+0.068
 hesitation+0.067
 pad+0.066
Neg logit contributions
bel-0.081
cakes-0.079
organ-0.072
 Jos-0.068
spe-0.064
Token the
Token ID262
Feature activation+4.83e-01

Pos logit contributions
 strangers+0.100
 goats+0.088
 hesitation+0.085
iley+0.081
 pad+0.080
Neg logit contributions
bel-0.093
cakes-0.091
organ-0.088
castle-0.078
spe-0.077
Token park
Token ID3952
Feature activation-1.40e+00

Pos logit contributions
bel+0.034
cakes+0.033
organ+0.030
 Jos+0.029
spe+0.027
Neg logit contributions
 strangers-0.029
emy-0.028
 goats-0.028
 necks-0.027
leness-0.027
Token her
Token ID607
Feature activation+2.49e-01

Pos logit contributions
 strangers+0.086
 goats+0.082
 pad+0.074
 hesitation+0.073
 goat+0.072
Neg logit contributions
bel-0.096
organ-0.094
cakes-0.091
 Jos-0.084
he-0.083
Token "
Token ID366
Feature activation+5.23e-01

Pos logit contributions
bel+0.017
cakes+0.017
organ+0.016
 Jos+0.015
he+0.014
Neg logit contributions
 strangers-0.016
 goats-0.015
 pad-0.014
emy-0.014
 hesitation-0.014
Tokenuse
Token ID1904
Feature activation-7.38e+00

Pos logit contributions
cakes+0.032
bel+0.032
organ+0.028
 Jos+0.028
castle+0.027
Neg logit contributions
 strangers-0.032
 goats-0.032
 pad-0.032
emy-0.031
iley-0.030
Tokenful
Token ID913
Feature activation+4.37e-01

Pos logit contributions
 strangers+0.242
 goats+0.215
emy+0.211
 necks+0.192
 pad+0.190
Neg logit contributions
bel-0.734
cakes-0.716
organ-0.679
 Jos-0.661
he-0.634
Token-
Token ID12
Feature activation+3.22e+00

Pos logit contributions
bel+0.032
cakes+0.031
organ+0.029
 Jos+0.027
he+0.026
Neg logit contributions
 strangers-0.026
 goats-0.024
emy-0.024
 pad-0.023
 necks-0.023
Tokenpark
Token ID20928
Feature activation+2.03e+00

Pos logit contributions
bel+0.211
cakes+0.204
organ+0.189
 Jos+0.180
he+0.168
Neg logit contributions
 strangers-0.212
 goats-0.201
emy-0.199
 necks-0.191
 pad-0.190
Token".
Token ID1911
Feature activation+5.39e-01

Pos logit contributions
bel+0.145
cakes+0.141
organ+0.129
 Jos+0.125
he+0.118
Neg logit contributions
 strangers-0.122
emy-0.115
 goats-0.115
 necks-0.110
 pad-0.108
Token<|endoftext|>
Token ID50256
Feature activation-1.54e+00

Pos logit contributions
bel+0.039
cakes+0.037
organ+0.036
spe+0.032
castle+0.032
Neg logit contributions
 strangers-0.031
uf-0.030
 goats-0.030
emy-0.029
 pad-0.029
Token valuable
Token ID8119
Feature activation-6.39e-01

Pos logit contributions
bel+0.014
cakes+0.013
organ+0.012
 Jos+0.011
spe+0.011
Neg logit contributions
 strangers-0.015
 goats-0.014
emy-0.014
 necks-0.014
 pad-0.014
Token lesson
Token ID11483
Feature activation+1.08e-01

Pos logit contributions
 necks+0.053
 goats+0.053
 strangers+0.052
emy+0.052
 pad+0.050
Neg logit contributions
bel-0.031
cakes-0.029
organ-0.026
 Jos-0.023
spe-0.022
Token -
Token ID532
Feature activation+2.21e+00

Pos logit contributions
cakes+0.006
bel+0.006
 Jos+0.005
organ+0.005
he+0.005
Neg logit contributions
emy-0.008
 goats-0.008
 necks-0.007
 strangers-0.007
uf-0.007
Token never
Token ID1239
Feature activation-1.39e+00

Pos logit contributions
bel+0.156
cakes+0.151
organ+0.142
 Jos+0.134
he+0.128
Neg logit contributions
 strangers-0.138
 goats-0.128
emy-0.125
 pad-0.121
 necks-0.120
Token trust
Token ID3774
Feature activation-1.77e+00

Pos logit contributions
 strangers+0.093
emy+0.090
 goats+0.089
 necks+0.087
uf+0.085
Neg logit contributions
bel-0.090
cakes-0.084
organ-0.079
 Jos-0.075
spe-0.071
Token strangers
Token ID18981
Feature activation-7.38e+00

Pos logit contributions
emy+0.122
 goats+0.118
uf+0.117
 necks+0.117
 pad+0.116
Neg logit contributions
bel-0.101
cakes-0.093
 Jos-0.086
organ-0.085
 Sus-0.081
Token.
Token ID13
Feature activation+2.73e-01

Pos logit contributions
 strangers+0.473
emy+0.463
 goats+0.454
 necks+0.443
leness+0.433
Neg logit contributions
bel-0.484
cakes-0.467
organ-0.426
 Jos-0.413
he-0.396
Token<|endoftext|>
Token ID50256
Feature activation-1.65e+00

Pos logit contributions
bel+0.018
cakes+0.017
organ+0.016
 Jos+0.015
spe+0.014
Neg logit contributions
 strangers-0.018
emy-0.017
 goats-0.017
 pad-0.016
uf-0.016
TokenOnce
Token ID7454
Feature activation+1.33e+00

Pos logit contributions
 strangers+0.085
 goats+0.081
emy+0.080
 pad+0.079
iley+0.076
Neg logit contributions
bel-0.133
cakes-0.125
organ-0.117
 Jos-0.113
spe-0.110
Token upon
Token ID2402
Feature activation+1.25e+00

Pos logit contributions
bel+0.082
cakes+0.080
organ+0.075
 Jos+0.067
spe+0.063
Neg logit contributions
 strangers-0.093
emy-0.087
 goats-0.085
 pad-0.084
 hesitation-0.083
Token a
Token ID257
Feature activation+2.16e-01

Pos logit contributions
bel+0.068
cakes+0.065
organ+0.059
 Jos+0.056
he+0.052
Neg logit contributions
 strangers-0.096
 goats-0.092
emy-0.091
 necks-0.088
leness-0.088
Token knew
Token ID2993
Feature activation+5.61e-01

Pos logit contributions
bel+0.033
cakes+0.032
organ+0.030
 Jos+0.028
he+0.027
Neg logit contributions
 strangers-0.030
 goats-0.029
emy-0.028
 necks-0.027
 pad-0.027
Token how
Token ID703
Feature activation+1.58e+00

Pos logit contributions
bel+0.034
cakes+0.033
organ+0.030
 Jos+0.030
spe+0.027
Neg logit contributions
 strangers-0.039
 goats-0.037
emy-0.037
 necks-0.036
 pad-0.036
Token to
Token ID284
Feature activation-3.77e-01

Pos logit contributions
bel+0.092
cakes+0.088
organ+0.078
 Jos+0.077
spe+0.070
Neg logit contributions
 strangers-0.115
 goats-0.111
emy-0.110
 necks-0.109
 pad-0.108
Token deal
Token ID1730
Feature activation+1.58e+00

Pos logit contributions
 strangers+0.027
 goats+0.025
emy+0.025
 necks+0.024
 hesitation+0.024
Neg logit contributions
bel-0.023
cakes-0.023
organ-0.020
 Jos-0.020
spe-0.018
Token with
Token ID351
Feature activation+7.82e-01

Pos logit contributions
bel+0.091
cakes+0.087
organ+0.080
 Jos+0.076
spe+0.072
Neg logit contributions
 strangers-0.120
 goats-0.112
emy-0.109
 necks-0.106
leness-0.106
Token strangers
Token ID18981
Feature activation-7.37e+00

Pos logit contributions
bel+0.036
cakes+0.035
 Jos+0.031
organ+0.030
 Sus+0.029
Neg logit contributions
 strangers-0.060
emy-0.060
 goats-0.059
 pad-0.059
agging-0.058
Token.
Token ID13
Feature activation-4.56e-02

Pos logit contributions
 strangers+0.483
emy+0.477
 goats+0.460
 necks+0.455
leness+0.451
Neg logit contributions
bel-0.463
cakes-0.449
organ-0.406
 Jos-0.395
he-0.379
Token She
Token ID1375
Feature activation+5.32e-01

Pos logit contributions
 strangers+0.004
 goats+0.004
emy+0.004
 pad+0.003
 necks+0.003
Neg logit contributions
bel-0.002
cakes-0.002
organ-0.002
 Jos-0.002
he-0.002
Token said
Token ID531
Feature activation+5.00e-01

Pos logit contributions
bel+0.039
cakes+0.037
organ+0.034
 Jos+0.033
he+0.032
Neg logit contributions
 strangers-0.032
 goats-0.030
emy-0.030
 necks-0.029
 pad-0.028
Token to
Token ID284
Feature activation-3.55e-01

Pos logit contributions
bel+0.037
cakes+0.036
organ+0.034
 Jos+0.033
he+0.031
Neg logit contributions
 strangers-0.029
 goats-0.027
emy-0.027
 necks-0.026
 pad-0.025
Token the
Token ID262
Feature activation+1.97e-01

Pos logit contributions
 strangers+0.025
emy+0.024
 goats+0.024
 pad+0.024
 necks+0.023
Neg logit contributions
bel-0.021
cakes-0.020
 Jos-0.018
organ-0.018
he-0.016
Token dad
Token ID9955
Feature activation+1.49e-01

Pos logit contributions
 strangers+0.006
 goats+0.006
 hesitation+0.005
 necks+0.005
 risking+0.005
Neg logit contributions
cakes-0.007
bel-0.007
organ-0.007
 Jos-0.007
iced-0.006
Token and
Token ID290
Feature activation-8.05e-02

Pos logit contributions
bel+0.010
cakes+0.010
organ+0.009
 Jos+0.009
he+0.008
Neg logit contributions
 strangers-0.009
emy-0.009
 goats-0.009
 necks-0.008
 pad-0.008
Token not
Token ID407
Feature activation+4.80e-01

Pos logit contributions
 strangers+0.005
 goats+0.005
 hesitation+0.004
 necks+0.004
emy+0.004
Neg logit contributions
cakes-0.006
bel-0.005
organ-0.005
 Jos-0.005
he-0.005
Token talk
Token ID1561
Feature activation-4.39e-01

Pos logit contributions
bel+0.033
cakes+0.032
organ+0.029
 Jos+0.028
he+0.026
Neg logit contributions
 strangers-0.032
 goats-0.029
emy-0.029
 pad-0.028
 necks-0.028
Token to
Token ID284
Feature activation-4.19e-01

Pos logit contributions
 strangers+0.028
 goats+0.025
emy+0.024
 pad+0.024
 necks+0.023
Neg logit contributions
bel-0.031
cakes-0.030
organ-0.028
 Jos-0.028
he-0.025
Token strangers
Token ID18981
Feature activation-7.37e+00

Pos logit contributions
emy+0.028
 strangers+0.028
 goats+0.028
 pad+0.027
 necks+0.027
Neg logit contributions
bel-0.025
cakes-0.024
organ-0.021
 Jos-0.021
spe-0.020
Token.
Token ID13
Feature activation+3.61e-03

Pos logit contributions
 strangers+0.480
emy+0.463
 goats+0.456
 necks+0.440
leness+0.437
Neg logit contributions
bel-0.482
cakes-0.465
organ-0.425
 Jos-0.408
he-0.389
Token They
Token ID1119
Feature activation-3.90e-01

Pos logit contributions
bel+0.000
cakes+0.000
organ+0.000
 Jos+0.000
he+0.000
Neg logit contributions
 strangers-0.000
 goats-0.000
emy-0.000
 necks-0.000
 pad-0.000
Token said
Token ID531
Feature activation+6.19e-01

Pos logit contributions
 strangers+0.022
 goats+0.020
emy+0.019
 necks+0.019
 hesitation+0.019
Neg logit contributions
bel-0.031
cakes-0.030
organ-0.028
 Jos-0.027
ushes-0.025
Token they
Token ID484
Feature activation+2.92e-01

Pos logit contributions
bel+0.032
cakes+0.031
organ+0.028
 Jos+0.026
he+0.024
Neg logit contributions
 strangers-0.050
 goats-0.047
emy-0.047
 necks-0.045
 pad-0.045
Token would
Token ID561
Feature activation-1.25e+00

Pos logit contributions
bel+0.020
cakes+0.019
organ+0.017
 Jos+0.017
spe+0.016
Neg logit contributions
 strangers-0.019
 goats-0.018
emy-0.017
 necks-0.017
 pad-0.017
Token her
Token ID607
Feature activation+6.64e-02

Pos logit contributions
 strangers+0.032
 goats+0.030
emy+0.030
 necks+0.029
 pad+0.029
Neg logit contributions
bel-0.024
cakes-0.023
organ-0.021
 Jos-0.020
spe-0.019
Token and
Token ID290
Feature activation+3.96e-02

Pos logit contributions
bel+0.004
cakes+0.004
organ+0.004
 Jos+0.004
he+0.004
Neg logit contributions
 strangers-0.004
 goats-0.004
emy-0.004
 necks-0.004
 pad-0.004
Token never
Token ID1239
Feature activation-1.48e+00

Pos logit contributions
cakes+0.003
bel+0.003
organ+0.003
 Jos+0.002
he+0.002
Neg logit contributions
 strangers-0.002
 pad-0.002
 goats-0.002
 necks-0.002
 hesitation-0.002
Token talked
Token ID6619
Feature activation-7.93e-01

Pos logit contributions
 strangers+0.092
 goats+0.086
emy+0.085
 pad+0.082
 necks+0.082
Neg logit contributions
bel-0.104
cakes-0.101
organ-0.093
 Jos-0.089
he-0.084
Token to
Token ID284
Feature activation-3.68e-01

Pos logit contributions
 strangers+0.049
 goats+0.043
 pad+0.042
 hesitation+0.041
emy+0.041
Neg logit contributions
bel-0.056
cakes-0.054
organ-0.052
 Jos-0.050
spe-0.046
Token strangers
Token ID18981
Feature activation-7.36e+00

Pos logit contributions
emy+0.028
 strangers+0.027
 goats+0.027
 necks+0.026
 pad+0.026
Neg logit contributions
bel-0.020
cakes-0.018
 Jos-0.017
organ-0.016
 Sus-0.015
Token again
Token ID757
Feature activation+1.02e+00

Pos logit contributions
 strangers+0.529
 goats+0.502
emy+0.498
 necks+0.480
 pad+0.478
Neg logit contributions
bel-0.444
cakes-0.427
organ-0.389
 Jos-0.372
he-0.344
Token.
Token ID13
Feature activation+3.12e-01

Pos logit contributions
bel+0.066
cakes+0.062
organ+0.057
 Jos+0.056
he+0.052
Neg logit contributions
 strangers-0.068
 goats-0.066
emy-0.065
 necks-0.063
uf-0.062
Token<|endoftext|>
Token ID50256
Feature activation-1.62e+00

Pos logit contributions
bel+0.022
cakes+0.021
organ+0.019
 Jos+0.018
spe+0.017
Neg logit contributions
 strangers-0.019
emy-0.018
 goats-0.018
 pad-0.018
uf-0.018
TokenOnce
Token ID7454
Feature activation+1.36e+00

Pos logit contributions
 strangers+0.100
emy+0.094
 goats+0.093
 pad+0.089
uf+0.088
Neg logit contributions
bel-0.114
cakes-0.109
organ-0.099
 Jos-0.097
spe-0.093
Token upon
Token ID2402
Feature activation+1.32e+00

Pos logit contributions
bel+0.078
cakes+0.075
organ+0.068
 Jos+0.064
spe+0.059
Neg logit contributions
 strangers-0.101
emy-0.096
 goats-0.096
 pad-0.092
 necks-0.092
Token her
Token ID607
Feature activation+1.09e-01

Pos logit contributions
 strangers+0.027
 goats+0.026
emy+0.026
 necks+0.025
 pad+0.025
Neg logit contributions
bel-0.018
cakes-0.017
organ-0.016
 Jos-0.015
he-0.013
Token to
Token ID284
Feature activation-3.15e-01

Pos logit contributions
bel+0.008
cakes+0.007
organ+0.007
 Jos+0.007
he+0.007
Neg logit contributions
 strangers-0.007
 goats-0.006
emy-0.006
 hesitation-0.006
 pad-0.006
Token be
Token ID307
Feature activation-1.69e+00

Pos logit contributions
 strangers+0.021
 goats+0.018
uf+0.018
 hesitation+0.018
emy+0.017
Neg logit contributions
bel-0.022
cakes-0.021
organ-0.020
 Jos-0.018
ushes-0.018
Token careful
Token ID8161
Feature activation-5.04e-01

Pos logit contributions
 strangers+0.087
 goats+0.080
emy+0.077
 hesitation+0.073
 necks+0.073
Neg logit contributions
cakes-0.139
bel-0.138
 Jos-0.122
organ-0.122
spe-0.122
Token of
Token ID286
Feature activation+2.73e+00

Pos logit contributions
 strangers+0.036
 goats+0.035
emy+0.034
 necks+0.033
 pad+0.033
Neg logit contributions
bel-0.030
cakes-0.029
organ-0.027
 Jos-0.025
he-0.024
Token strangers
Token ID18981
Feature activation-7.36e+00

Pos logit contributions
bel+0.162
cakes+0.156
organ+0.149
 Jos+0.144
 Sus+0.134
Neg logit contributions
emy-0.178
leness-0.177
 pad-0.173
uf-0.172
 goats-0.171
Token.
Token ID13
Feature activation+1.32e-01

Pos logit contributions
emy+0.508
 strangers+0.492
leness+0.486
 necks+0.481
 goats+0.476
Neg logit contributions
bel-0.430
cakes-0.425
organ-0.391
he-0.365
 Jos-0.363
Token A
Token ID317
Feature activation-1.28e+00

Pos logit contributions
bel+0.008
cakes+0.007
organ+0.007
 Jos+0.006
spe+0.006
Neg logit contributions
 strangers-0.009
 goats-0.009
emy-0.009
 pad-0.009
 necks-0.009
Token stranger
Token ID16195
Feature activation-1.80e+00

Pos logit contributions
 necks+0.085
agging+0.080
leness+0.079
emy+0.078
 strangers+0.078
Neg logit contributions
bel-0.085
cakes-0.079
organ-0.068
spe-0.066
 Jos-0.063
Token is
Token ID318
Feature activation-6.21e-01

Pos logit contributions
 strangers+0.115
 goats+0.108
emy+0.106
 pad+0.102
 hesitation+0.101
Neg logit contributions
bel-0.124
cakes-0.120
organ-0.111
 Jos-0.107
he-0.100
Token someone
Token ID2130
Feature activation-3.64e+00

Pos logit contributions
 strangers+0.037
emy+0.036
 goats+0.036
 necks+0.035
leness+0.035
Neg logit contributions
bel-0.044
cakes-0.041
organ-0.039
 Jos-0.037
spe-0.035
Token,
Token ID11
Feature activation-3.27e-01

Pos logit contributions
 strangers+0.030
emy+0.029
 goats+0.028
uf+0.028
 necks+0.027
Neg logit contributions
bel-0.027
cakes-0.026
organ-0.023
 Jos-0.022
spe-0.021
Token even
Token ID772
Feature activation+2.23e+00

Pos logit contributions
 strangers+0.020
 goats+0.019
emy+0.018
 necks+0.017
 pad+0.017
Neg logit contributions
bel-0.024
cakes-0.023
organ-0.021
 Jos-0.020
he-0.020
Token if
Token ID611
Feature activation+8.03e-01

Pos logit contributions
bel+0.120
cakes+0.110
organ+0.102
 Jos+0.096
spe+0.088
Neg logit contributions
 strangers-0.173
emy-0.167
 goats-0.166
 necks-0.164
uf-0.162
Token they
Token ID484
Feature activation+3.72e-01

Pos logit contributions
bel+0.040
cakes+0.037
organ+0.033
 Jos+0.032
 Sus+0.030
Neg logit contributions
 pad-0.064
emy-0.063
 strangers-0.062
uf-0.062
agging-0.062
Token are
Token ID389
Feature activation-5.56e-01

Pos logit contributions
bel+0.025
cakes+0.024
organ+0.022
 Jos+0.022
spe+0.020
Neg logit contributions
 strangers-0.025
 goats-0.023
emy-0.022
 hesitation-0.022
leness-0.022
Token strangers
Token ID18981
Feature activation-7.35e+00

Pos logit contributions
 strangers+0.036
 goats+0.034
emy+0.033
 necks+0.032
 pad+0.032
Neg logit contributions
bel-0.038
cakes-0.037
organ-0.034
 Jos-0.033
he-0.031
Token.
Token ID13
Feature activation+2.66e-01

Pos logit contributions
 strangers+0.467
emy+0.447
 goats+0.443
 necks+0.426
 pad+0.421
Neg logit contributions
bel-0.496
cakes-0.476
organ-0.440
 Jos-0.421
he-0.400
Token From
Token ID3574
Feature activation+1.47e-01

Pos logit contributions
bel+0.018
cakes+0.017
organ+0.016
 Jos+0.014
spe+0.014
Neg logit contributions
 strangers-0.017
emy-0.017
 goats-0.017
 pad-0.016
uf-0.016
Token that
Token ID326
Feature activation-1.91e-01

Pos logit contributions
bel+0.009
cakes+0.008
organ+0.008
he+0.007
 together+0.007
Neg logit contributions
 pad-0.010
 strangers-0.010
emy-0.010
leness-0.009
 unnecessary-0.009
Token day
Token ID1110
Feature activation-7.23e-01

Pos logit contributions
emy+0.014
 strangers+0.014
 goats+0.014
 necks+0.014
 unnecessary+0.014
Neg logit contributions
bel-0.010
cakes-0.009
he-0.009
organ-0.009
 Jos-0.009
Token on
Token ID319
Feature activation-9.10e-01

Pos logit contributions
 strangers+0.060
emy+0.059
 goats+0.059
 necks+0.059
 pad+0.056
Neg logit contributions
bel-0.034
cakes-0.032
organ-0.029
 Jos-0.028
he-0.027
Token to
Token ID284
Feature activation-3.53e-01

Pos logit contributions
bel+0.058
cakes+0.056
organ+0.052
 Jos+0.050
he+0.048
Neg logit contributions
 strangers-0.040
 goats-0.038
emy-0.037
 necks-0.036
 pad-0.035
Token always
Token ID1464
Feature activation+4.51e-01

Pos logit contributions
 strangers+0.023
 goats+0.022
emy+0.021
 necks+0.021
 pad+0.021
Neg logit contributions
bel-0.024
cakes-0.022
organ-0.021
 Jos-0.020
he-0.019
Token be
Token ID307
Feature activation-1.77e+00

Pos logit contributions
bel+0.026
organ+0.024
cakes+0.023
 Jos+0.021
whe+0.020
Neg logit contributions
 goats-0.031
 strangers-0.031
emy-0.031
uf-0.030
 pad-0.029
Token careful
Token ID8161
Feature activation-5.37e-01

Pos logit contributions
 strangers+0.089
 goats+0.083
emy+0.082
 necks+0.077
 hesitation+0.076
Neg logit contributions
bel-0.146
cakes-0.145
organ-0.132
 Jos-0.130
he-0.125
Token around
Token ID1088
Feature activation+7.79e-01

Pos logit contributions
 strangers+0.042
 goats+0.040
emy+0.040
uf+0.039
 hesitation+0.039
Neg logit contributions
bel-0.029
cakes-0.027
organ-0.025
 Jos-0.023
spe-0.022
Token strangers
Token ID18981
Feature activation-7.35e+00

Pos logit contributions
bel+0.050
cakes+0.049
organ+0.044
 Jos+0.041
spe+0.041
Neg logit contributions
 strangers-0.057
 goats-0.051
emy-0.051
iley-0.048
 necks-0.048
Token.
Token ID13
Feature activation+3.11e-01

Pos logit contributions
 strangers+0.469
emy+0.455
 goats+0.448
 necks+0.431
leness+0.430
Neg logit contributions
bel-0.487
cakes-0.468
organ-0.434
 Jos-0.413
he-0.398
Token<|endoftext|>
Token ID50256
Feature activation-1.65e+00

Pos logit contributions
bel+0.020
cakes+0.019
organ+0.018
 Jos+0.017
spe+0.016
Neg logit contributions
 strangers-0.021
emy-0.020
 goats-0.020
 pad-0.019
uf-0.019
TokenOnce
Token ID7454
Feature activation+1.33e+00

Pos logit contributions
 strangers+0.102
emy+0.096
 goats+0.095
 pad+0.090
 necks+0.090
Neg logit contributions
bel-0.116
cakes-0.112
organ-0.101
 Jos-0.100
spe-0.096
Token upon
Token ID2402
Feature activation+1.31e+00

Pos logit contributions
bel+0.073
cakes+0.069
organ+0.063
 Jos+0.060
he+0.055
Neg logit contributions
 strangers-0.102
 goats-0.098
emy-0.097
 necks-0.094
 pad-0.093
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
bel+0.077
cakes+0.075
organ+0.067
 Jos+0.065
spe+0.060
Neg logit contributions
 strangers-0.096
 goats-0.091
emy-0.091
 necks-0.087
 pad-0.087
Token€
Token ID26391
Feature activation-8.45e-01

Pos logit contributions
bel+0.048
cakes+0.046
organ+0.043
 Jos+0.041
he+0.038
Neg logit contributions
 strangers-0.043
 goats-0.040
emy-0.040
 necks-0.038
 pad-0.038
Tokenâ„¢
Token ID8151
Feature activation-8.52e-02

Pos logit contributions
 strangers+0.049
emy+0.047
 goats+0.047
 necks+0.045
 pad+0.045
Neg logit contributions
bel-0.059
cakes-0.059
organ-0.056
 Jos-0.053
he-0.049
Tokent
Token ID83
Feature activation+1.16e+00

Pos logit contributions
 strangers+0.006
 goats+0.006
emy+0.006
 pad+0.006
uf+0.006
Neg logit contributions
bel-0.005
cakes-0.004
organ-0.004
 Jos-0.004
ushes-0.003
Token scared
Token ID12008
Feature activation+7.49e-01

Pos logit contributions
bel+0.078
cakes+0.076
organ+0.070
 Jos+0.067
he+0.062
Neg logit contributions
 strangers-0.076
 goats-0.071
emy-0.070
 hesitation-0.068
 pad-0.067
Token of
Token ID286
Feature activation+2.90e+00

Pos logit contributions
bel+0.045
cakes+0.044
 Jos+0.040
organ+0.039
 Sus+0.037
Neg logit contributions
 necks-0.050
emy-0.050
 strangers-0.049
 goats-0.049
 pad-0.048
Token strangers
Token ID18981
Feature activation-7.34e+00

Pos logit contributions
bel+0.153
cakes+0.149
 Jos+0.137
organ+0.131
 Sus+0.128
Neg logit contributions
emy-0.210
 pad-0.206
 necks-0.204
 strangers-0.203
 goats-0.203
Token anymore
Token ID7471
Feature activation-1.07e+00

Pos logit contributions
 strangers+0.467
emy+0.457
 goats+0.445
 necks+0.436
leness+0.430
Neg logit contributions
bel-0.479
cakes-0.467
organ-0.424
 Jos-0.411
he-0.399
Token -
Token ID532
Feature activation+2.11e+00

Pos logit contributions
 strangers+0.071
emy+0.068
 goats+0.068
 necks+0.066
leness+0.065
Neg logit contributions
bel-0.068
cakes-0.065
organ-0.059
 Jos-0.058
he-0.054
Token he
Token ID339
Feature activation+5.84e-01

Pos logit contributions
bel+0.130
cakes+0.125
organ+0.115
 Jos+0.109
he+0.102
Neg logit contributions
 strangers-0.149
 goats-0.142
emy-0.139
 necks-0.134
 pad-0.134
Token was
Token ID373
Feature activation+1.46e-01

Pos logit contributions
bel+0.036
cakes+0.035
organ+0.032
 Jos+0.031
spe+0.029
Neg logit contributions
 strangers-0.040
emy-0.038
 goats-0.038
 necks-0.037
leness-0.036
Token very
Token ID845
Feature activation-1.74e-01

Pos logit contributions
bel+0.009
cakes+0.009
organ+0.008
 Jos+0.008
he+0.008
Neg logit contributions
 strangers-0.009
emy-0.009
 goats-0.009
 necks-0.009
leness-0.009
Token friends
Token ID2460
Feature activation+7.33e-01

Pos logit contributions
bel+0.005
cakes+0.005
organ+0.005
 Jos+0.004
he+0.004
Neg logit contributions
 strangers-0.005
 goats-0.004
emy-0.004
 necks-0.004
 pad-0.004
Token and
Token ID290
Feature activation-7.23e-02

Pos logit contributions
bel+0.050
cakes+0.049
organ+0.045
 Jos+0.043
he+0.041
Neg logit contributions
 strangers-0.046
emy-0.044
 goats-0.043
 necks-0.041
leness-0.041
Token not
Token ID407
Feature activation+7.13e-01

Pos logit contributions
 strangers+0.005
 goats+0.004
emy+0.004
 hesitation+0.004
 necks+0.004
Neg logit contributions
cakes-0.005
bel-0.005
organ-0.005
 Jos-0.004
he-0.004
Token talking
Token ID3375
Feature activation-6.38e-01

Pos logit contributions
bel+0.049
cakes+0.048
organ+0.044
 Jos+0.043
he+0.040
Neg logit contributions
 strangers-0.046
 goats-0.043
emy-0.042
 hesitation-0.041
 pad-0.041
Token to
Token ID284
Feature activation-2.86e-01

Pos logit contributions
 strangers+0.041
 goats+0.037
 pad+0.034
emy+0.034
 unnecessary+0.034
Neg logit contributions
bel-0.044
cakes-0.043
organ-0.041
 Jos-0.040
he-0.037
Token strangers
Token ID18981
Feature activation-7.34e+00

Pos logit contributions
emy+0.018
 strangers+0.018
 goats+0.018
 pad+0.018
 necks+0.017
Neg logit contributions
bel-0.018
cakes-0.017
organ-0.016
 Jos-0.016
spe-0.015
Token.
Token ID13
Feature activation+2.48e-01

Pos logit contributions
 strangers+0.472
emy+0.453
 goats+0.449
 necks+0.432
leness+0.426
Neg logit contributions
bel-0.487
cakes-0.471
organ-0.433
 Jos-0.415
he-0.394
Token Lily
Token ID20037
Feature activation+2.30e-01

Pos logit contributions
bel+0.017
cakes+0.016
organ+0.015
 Jos+0.014
he+0.014
Neg logit contributions
 strangers-0.015
 goats-0.015
 pad-0.014
emy-0.014
 necks-0.014
Token thought
Token ID1807
Feature activation+1.72e+00

Pos logit contributions
bel+0.014
cakes+0.013
organ+0.013
 Jos+0.012
spe+0.011
Neg logit contributions
 strangers-0.016
 goats-0.015
emy-0.015
 hesitation-0.015
leness-0.015
Token that
Token ID326
Feature activation-2.05e-01

Pos logit contributions
bel+0.108
cakes+0.107
organ+0.096
 Jos+0.093
he+0.087
Neg logit contributions
 strangers-0.115
emy-0.112
 goats-0.110
 necks-0.108
 pad-0.107
Token was
Token ID373
Feature activation+1.71e-01

Pos logit contributions
 strangers+0.015
 goats+0.014
emy+0.013
 hesitation+0.013
leness+0.013
Neg logit contributions
bel-0.013
cakes-0.012
organ-0.011
 Jos-0.011
he-0.010
Token said
Token ID531
Feature activation+7.04e-01

Pos logit contributions
 strangers+0.007
emy+0.006
 goats+0.006
 unnecessary+0.006
leness+0.006
Neg logit contributions
bel-0.007
cakes-0.007
organ-0.006
 Jos-0.006
spe-0.005
Token it
Token ID340
Feature activation-3.60e-01

Pos logit contributions
bel+0.052
cakes+0.050
organ+0.047
 Jos+0.044
he+0.043
Neg logit contributions
 strangers-0.042
 goats-0.039
emy-0.038
 hesitation-0.037
 necks-0.036
Token was
Token ID373
Feature activation+2.52e-01

Pos logit contributions
 strangers+0.021
 goats+0.019
 pad+0.019
 necks+0.018
 goat+0.017
Neg logit contributions
bel-0.026
cakes-0.024
 Jos-0.023
organ-0.022
ushes-0.022
Token time
Token ID640
Feature activation-2.16e+00

Pos logit contributions
bel+0.022
cakes+0.021
organ+0.019
he+0.018
spe+0.018
Neg logit contributions
 strangers-0.013
 goats-0.011
 hesitation-0.011
 pad-0.011
emy-0.011
Token to
Token ID284
Feature activation-2.05e-01

Pos logit contributions
emy+0.194
 strangers+0.189
 goats+0.187
leness+0.185
uf+0.182
Neg logit contributions
bel-0.096
cakes-0.085
 Jos-0.080
 Sus-0.072
 together-0.069
Token un
Token ID555
Feature activation+4.89e+00

Pos logit contributions
 strangers+0.015
 goats+0.014
 hesitation+0.013
uf+0.013
emy+0.013
Neg logit contributions
bel-0.013
cakes-0.013
organ-0.012
spe-0.011
 Jos-0.011
Tokenpack
Token ID8002
Feature activation+2.02e+00

Pos logit contributions
cakes+0.218
bel+0.210
 Jos+0.194
ushes+0.178
 Suz+0.174
Neg logit contributions
 strangers-0.358
 necks-0.355
 Street-0.353
agging-0.348
 years-0.347
Token their
Token ID511
Feature activation-1.15e-01

Pos logit contributions
bel+0.140
cakes+0.133
organ+0.128
 Jos+0.114
spe+0.113
Neg logit contributions
 strangers-0.130
 goats-0.118
 necks-0.115
iley-0.113
 hesitation-0.113
Token bags
Token ID11668
Feature activation+5.46e-01

Pos logit contributions
 strangers+0.006
 goats+0.006
emy+0.006
 necks+0.006
 pad+0.005
Neg logit contributions
bel-0.009
cakes-0.009
organ-0.008
 Jos-0.008
he-0.008
Token.
Token ID13
Feature activation+2.20e-01

Pos logit contributions
bel+0.045
cakes+0.043
organ+0.043
 Jos+0.040
spe+0.039
Neg logit contributions
 strangers-0.024
 pad-0.023
 goats-0.021
 hesitation-0.020
 necks-0.020
Token Tom
Token ID4186
Feature activation-3.16e-01

Pos logit contributions
bel+0.018
organ+0.016
spe+0.016
cakes+0.015
ator+0.015
Neg logit contributions
 goats-0.010
uf-0.010
 strangers-0.010
 pad-0.010
 necks-0.010
Token up
Token ID510
Feature activation+2.07e+00

Pos logit contributions
 strangers+0.086
 goats+0.084
 dogs+0.077
 goat+0.075
 necks+0.074
Neg logit contributions
bel-0.090
spe-0.078
cakes-0.073
organ-0.071
castle-0.068
Token the
Token ID262
Feature activation+2.18e-01

Pos logit contributions
bel+0.142
cakes+0.130
organ+0.128
spe+0.115
 Jos+0.112
Neg logit contributions
 strangers-0.141
 goats-0.135
 necks-0.127
emy-0.122
 tails-0.122
Token blocks
Token ID7021
Feature activation+3.40e+00

Pos logit contributions
bel+0.015
cakes+0.015
organ+0.014
 Jos+0.013
he+0.013
Neg logit contributions
 strangers-0.014
 goats-0.013
emy-0.012
 necks-0.012
 pad-0.012
Token.
Token ID13
Feature activation-1.80e-01

Pos logit contributions
bel+0.319
organ+0.313
cakes+0.296
 Jos+0.289
spe+0.274
Neg logit contributions
 strangers-0.114
 pad-0.106
 necks-0.093
 goats-0.093
 hesitation-0.091
Token They
Token ID1119
Feature activation-5.96e-01

Pos logit contributions
 strangers+0.014
 goats+0.013
emy+0.013
 necks+0.013
uf+0.012
Neg logit contributions
bel-0.010
cakes-0.010
organ-0.009
 Jos-0.008
he-0.008
Token work
Token ID670
Feature activation+4.56e+00

Pos logit contributions
 strangers+0.031
 necks+0.027
 walks+0.027
 goats+0.026
 hesitation+0.026
Neg logit contributions
bel-0.048
cakes-0.046
organ-0.043
spe-0.041
 Jos-0.040
Token together
Token ID1978
Feature activation+3.90e+00

Pos logit contributions
bel+0.288
cakes+0.280
 Jos+0.277
organ+0.267
 Suz+0.256
Neg logit contributions
 strangers-0.278
agging-0.267
 necks-0.256
 hesitation-0.252
 risking-0.247
Token to
Token ID284
Feature activation-1.83e-01

Pos logit contributions
bel+0.291
cakes+0.284
organ+0.263
 Jos+0.256
spe+0.240
Neg logit contributions
 strangers-0.225
 goats-0.205
 pad-0.201
emy-0.200
 necks-0.199
Token make
Token ID787
Feature activation+2.60e-01

Pos logit contributions
 strangers+0.014
 necks+0.012
 hesitation+0.012
 goats+0.012
 appearance+0.012
Neg logit contributions
cakes-0.011
bel-0.011
organ-0.009
spe-0.009
 Jos-0.009
Token a
Token ID257
Feature activation+5.96e-02

Pos logit contributions
bel+0.020
cakes+0.019
organ+0.018
spe+0.018
he+0.017
Neg logit contributions
 strangers-0.015
 goats-0.013
 necks-0.013
 unnecessary-0.013
 tails-0.012
Token new
Token ID649
Feature activation-1.78e+00

Pos logit contributions
organ+0.005
cakes+0.005
bel+0.005
 Jos+0.004
he+0.004
Neg logit contributions
 strangers-0.003
emy-0.003
 pad-0.003
 goats-0.003
 thorn-0.003
TokenArtist
Token ID43020
Feature activation-1.35e+00

Pos logit contributions
bel+0.018
cakes+0.018
organ+0.016
 Jos+0.016
castle+0.015
Neg logit contributions
 strangers-0.012
 goats-0.012
emy-0.011
 necks-0.011
leness-0.011
Token".
Token ID1911
Feature activation+2.80e-01

Pos logit contributions
 strangers+0.078
 goats+0.074
emy+0.072
 pad+0.070
iley+0.070
Neg logit contributions
bel-0.098
cakes-0.096
organ-0.091
 Jos-0.084
he-0.081
Token She
Token ID1375
Feature activation+4.69e-01

Pos logit contributions
bel+0.018
cakes+0.016
organ+0.016
spe+0.014
 Jos+0.014
Neg logit contributions
 strangers-0.020
 goats-0.019
emy-0.018
uf-0.018
 hesitation-0.018
Token made
Token ID925
Feature activation+1.28e+00

Pos logit contributions
bel+0.038
cakes+0.038
organ+0.035
 Jos+0.034
he+0.032
Neg logit contributions
 strangers-0.024
 goats-0.023
 necks-0.022
emy-0.021
 pad-0.021
Token the
Token ID262
Feature activation+1.57e-01

Pos logit contributions
bel+0.092
organ+0.092
cakes+0.088
spe+0.081
castle+0.081
Neg logit contributions
 strangers-0.078
 goats-0.071
 hesitation-0.062
emy-0.062
 necks-0.062
Token art
Token ID1242
Feature activation+4.56e+00

Pos logit contributions
bel+0.010
cakes+0.010
organ+0.010
 Jos+0.009
spe+0.008
Neg logit contributions
 strangers-0.010
 goats-0.010
 pad-0.009
 necks-0.009
 hesitation-0.009
Token.
Token ID13
Feature activation-2.07e-01

Pos logit contributions
organ+0.350
 Suz+0.309
 Sus+0.302
cakes+0.301
izz+0.297
Neg logit contributions
 strangers-0.213
 tale-0.204
 pad-0.186
leness-0.183
agging-0.183
Token She
Token ID1375
Feature activation+4.89e-01

Pos logit contributions
 strangers+0.013
 goats+0.013
emy+0.012
uf+0.012
 pad+0.012
Neg logit contributions
bel-0.014
cakes-0.013
organ-0.013
spe-0.012
castle-0.012
Token sees
Token ID7224
Feature activation-9.59e-01

Pos logit contributions
cakes+0.037
bel+0.037
organ+0.034
 Jos+0.032
he+0.031
Neg logit contributions
 strangers-0.028
 goats-0.026
 necks-0.025
emy-0.024
 pad-0.024
Token Tom
Token ID4186
Feature activation-7.17e-01

Pos logit contributions
 strangers+0.064
 goats+0.060
emy+0.056
 necks+0.055
 hesitation+0.055
Neg logit contributions
bel-0.064
cakes-0.062
organ-0.060
 Jos-0.054
spe-0.052
Token and
Token ID290
Feature activation-3.49e-01

Pos logit contributions
 strangers+0.042
emy+0.039
 goats+0.039
 necks+0.037
leness+0.037
Neg logit contributions
bel-0.053
cakes-0.051
organ-0.047
 Jos-0.046
he-0.043
Token flags
Token ID9701
Feature activation+1.21e+00

Pos logit contributions
organ+0.141
bel+0.138
cakes+0.134
spe+0.124
he+0.123
Neg logit contributions
 strangers-0.097
 necks-0.097
 goats-0.096
 tails-0.091
 pad-0.086
Token and
Token ID290
Feature activation-3.02e-01

Pos logit contributions
organ+0.110
bel+0.105
cakes+0.101
 Jos+0.094
inv+0.093
Neg logit contributions
 pad-0.047
 goats-0.045
 strangers-0.043
 necks-0.042
 tails-0.040
Token towers
Token ID18028
Feature activation+3.70e+00

Pos logit contributions
 strangers+0.019
 goats+0.018
 necks+0.017
emy+0.017
 pad+0.017
Neg logit contributions
bel-0.021
cakes-0.020
organ-0.019
he-0.018
 Jos-0.018
Token with
Token ID351
Feature activation+7.93e-01

Pos logit contributions
organ+0.341
cakes+0.327
bel+0.326
 Jos+0.314
inv+0.300
Neg logit contributions
 necks-0.112
 pad-0.112
 strangers-0.109
agging-0.100
 tails-0.099
Token cr
Token ID1067
Feature activation+3.52e+00

Pos logit contributions
cakes+0.055
bel+0.055
organ+0.051
he+0.045
whe+0.044
Neg logit contributions
 strangers-0.052
 goats-0.048
 necks-0.048
 tails-0.045
 hesitation-0.045
Tokenay
Token ID323
Feature activation+4.54e+00

Pos logit contributions
bel+0.184
cakes+0.162
organ+0.158
spe+0.144
 Jos+0.138
Neg logit contributions
emy-0.268
 strangers-0.266
agging-0.265
 pad-0.263
 necks-0.262
Tokenons
Token ID684
Feature activation+2.06e-01

Pos logit contributions
bel+0.275
cakes+0.273
 Jos+0.239
organ+0.239
ushes+0.214
Neg logit contributions
 strangers-0.325
 goats-0.307
emy-0.301
 hesitation-0.294
uf-0.294
Token.
Token ID13
Feature activation-2.41e-01

Pos logit contributions
bel+0.015
cakes+0.015
organ+0.014
 Jos+0.013
spe+0.012
Neg logit contributions
 strangers-0.012
 goats-0.011
 pad-0.011
emy-0.011
 necks-0.011
Token They
Token ID1119
Feature activation-6.74e-01

Pos logit contributions
 strangers+0.017
 goats+0.017
emy+0.017
 necks+0.016
 pad+0.016
Neg logit contributions
bel-0.015
cakes-0.014
organ-0.013
he-0.011
spe-0.011
Token put
Token ID1234
Feature activation+8.80e-01

Pos logit contributions
 necks+0.033
 pad+0.031
 strangers+0.030
 goats+0.029
 tails+0.027
Neg logit contributions
cakes-0.054
bel-0.049
umm-0.046
organ-0.044
ores-0.044
Token the
Token ID262
Feature activation+1.18e-01

Pos logit contributions
bel+0.066
cakes+0.063
organ+0.062
spe+0.058
he+0.055
Neg logit contributions
 necks-0.048
 strangers-0.048
 goats-0.046
 tails-0.044
 pad-0.040
Token arrived
Token ID5284
Feature activation-1.34e+00

Pos logit contributions
bel+0.053
cakes+0.050
organ+0.047
ushes+0.044
 Jos+0.043
Neg logit contributions
 strangers-0.049
 goats-0.046
 pad-0.045
 hesitation-0.044
iley-0.043
Token,
Token ID11
Feature activation-3.35e-01

Pos logit contributions
 strangers+0.066
 goats+0.059
 hesitation+0.058
iley+0.055
agging+0.054
Neg logit contributions
cakes-0.110
bel-0.110
organ-0.109
 Jos-0.097
whe-0.095
Token all
Token ID477
Feature activation+1.05e+00

Pos logit contributions
 strangers+0.022
 goats+0.020
 hesitation+0.018
 goat+0.018
iley+0.018
Neg logit contributions
cakes-0.023
organ-0.022
bel-0.021
castle-0.020
iced-0.019
Token of
Token ID286
Feature activation+2.68e+00

Pos logit contributions
organ+0.066
bel+0.064
cakes+0.064
ushes+0.052
iced+0.051
Neg logit contributions
 strangers-0.074
 goats-0.070
 hesitation-0.066
leness-0.066
iley-0.064
Token them
Token ID606
Feature activation+3.17e-02

Pos logit contributions
bel+0.155
cakes+0.149
organ+0.135
 Jos+0.127
he+0.118
Neg logit contributions
 strangers-0.202
 goats-0.192
emy-0.190
 necks-0.183
 hesitation-0.182
Token o
Token ID267
Feature activation+4.80e+00

Pos logit contributions
bel+0.002
cakes+0.002
organ+0.002
 Jos+0.002
ushes+0.002
Neg logit contributions
 strangers-0.002
 goats-0.002
 hesitation-0.002
 pad-0.002
 necks-0.002
Tokenoh
Token ID1219
Feature activation+1.78e+00

Pos logit contributions
bel+0.322
cakes+0.303
organ+0.297
 Jos+0.278
spe+0.259
Neg logit contributions
 strangers-0.295
emy-0.285
 goats-0.284
agging-0.276
leness-0.275
Token-
Token ID12
Feature activation+3.30e+00

Pos logit contributions
bel+0.124
cakes+0.119
organ+0.108
 Jos+0.106
spe+0.099
Neg logit contributions
 strangers-0.113
 goats-0.106
emy-0.106
 necks-0.101
leness-0.100
Tokened
Token ID276
Feature activation+3.26e+00

Pos logit contributions
bel+0.204
cakes+0.199
organ+0.193
 Jos+0.176
he+0.169
Neg logit contributions
 strangers-0.211
 goats-0.201
emy-0.199
iley-0.194
 pad-0.194
Token and
Token ID290
Feature activation-8.32e-02

Pos logit contributions
cakes+0.237
bel+0.237
organ+0.225
 Jos+0.205
he+0.198
Neg logit contributions
 strangers-0.192
 goats-0.180
 pad-0.175
 hesitation-0.174
 necks-0.169
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 strangers+0.005
 goats+0.005
 hesitation+0.005
 necks+0.005
 pad+0.005
Neg logit contributions
cakes-0.006
organ-0.005
bel-0.005
he-0.005
ushes-0.005
Token she
Token ID673
Feature activation+7.94e-01

Pos logit contributions
 strangers+0.132
 hesitation+0.126
iley+0.121
 years+0.117
 pad+0.115
Neg logit contributions
bel-0.164
cakes-0.155
organ-0.154
ushes-0.132
room-0.131
Token smiled
Token ID13541
Feature activation+6.18e-01

Pos logit contributions
bel+0.067
cakes+0.064
organ+0.060
 Jos+0.057
ushes+0.055
Neg logit contributions
 strangers-0.038
 pad-0.035
iley-0.034
emy-0.034
 goats-0.033
Token as
Token ID355
Feature activation+5.59e-01

Pos logit contributions
cakes+0.053
bel+0.050
organ+0.047
 Jos+0.045
he+0.043
Neg logit contributions
 pad-0.028
 strangers-0.027
 hesitation-0.025
 goats-0.025
agging-0.025
Token she
Token ID673
Feature activation+7.92e-01

Pos logit contributions
bel+0.042
organ+0.041
cakes+0.041
iced+0.036
tub+0.036
Neg logit contributions
 strangers-0.031
 hesitation-0.027
iley-0.026
 goats-0.025
leness-0.024
Token skipped
Token ID26684
Feature activation-2.45e+00

Pos logit contributions
bel+0.057
cakes+0.055
organ+0.051
 Jos+0.049
spe+0.046
Neg logit contributions
 strangers-0.046
emy-0.044
 goats-0.044
 pad-0.043
 necks-0.043
Token home
Token ID1363
Feature activation+2.09e+00

Pos logit contributions
 strangers+0.150
 pad+0.144
agging+0.140
 goats+0.138
 hesitation+0.137
Neg logit contributions
cakes-0.162
bel-0.162
organ-0.157
 Jos-0.144
spe-0.137
Token,
Token ID11
Feature activation-3.23e-01

Pos logit contributions
cakes+0.163
organ+0.157
bel+0.154
 Jos+0.142
ushes+0.137
Neg logit contributions
 strangers-0.102
 hesitation-0.094
 years-0.092
 pad-0.091
 necks-0.090
Token counting
Token ID14143
Feature activation+1.50e+00

Pos logit contributions
 strangers+0.019
 goats+0.017
 pad+0.017
emy+0.017
 necks+0.017
Neg logit contributions
bel-0.024
cakes-0.023
organ-0.022
 Jos-0.021
he-0.019
Token every
Token ID790
Feature activation-8.96e-01

Pos logit contributions
cakes+0.098
bel+0.097
organ+0.095
 Jos+0.086
he+0.081
Neg logit contributions
 strangers-0.097
 goats-0.087
 necks-0.085
 hesitation-0.085
 pad-0.084
Token step
Token ID2239
Feature activation-4.05e+00

Pos logit contributions
 strangers+0.054
 goats+0.050
emy+0.049
 pad+0.048
 hesitation+0.047
Neg logit contributions
bel-0.065
cakes-0.063
organ-0.059
 Jos-0.056
he-0.053
Token along
Token ID1863
Feature activation-3.29e-01

Pos logit contributions
 strangers+0.258
 pad+0.236
 goats+0.235
 hesitation+0.225
agging+0.223
Neg logit contributions
bel-0.265
organ-0.261
cakes-0.252
 Jos-0.227
ushes-0.213
Token,
Token ID11
Feature activation-5.55e-01

Pos logit contributions
 strangers+0.130
emy+0.121
 goats+0.121
 necks+0.114
 pad+0.114
Neg logit contributions
bel-0.190
cakes-0.184
organ-0.172
 Jos-0.166
he-0.156
Token there
Token ID612
Feature activation+9.98e-01

Pos logit contributions
 strangers+0.037
emy+0.036
 goats+0.034
 pad+0.034
 necks+0.033
Neg logit contributions
cakes-0.037
bel-0.036
organ-0.031
 Jos-0.030
spe-0.028
Token was
Token ID373
Feature activation+6.77e-02

Pos logit contributions
bel+0.071
cakes+0.070
organ+0.060
 Jos+0.058
ushes+0.056
Neg logit contributions
 strangers-0.062
emy-0.057
 goats-0.056
 pad-0.055
iley-0.053
Token a
Token ID257
Feature activation+1.31e-01

Pos logit contributions
bel+0.004
cakes+0.004
organ+0.004
 Jos+0.004
spe+0.003
Neg logit contributions
 strangers-0.005
 goats-0.005
emy-0.005
 pad-0.004
 necks-0.004
Token loud
Token ID7812
Feature activation+2.53e-01

Pos logit contributions
bel+0.010
cakes+0.009
organ+0.009
 Jos+0.008
he+0.008
Neg logit contributions
 strangers-0.008
 goats-0.007
emy-0.007
 necks-0.007
leness-0.007
Token noise
Token ID7838
Feature activation+6.17e-01

Pos logit contributions
bel+0.017
cakes+0.017
 Jos+0.015
organ+0.015
he+0.015
Neg logit contributions
 strangers-0.015
 unnecessary-0.015
emy-0.015
 hesitation-0.014
 necks-0.014
Token at
Token ID379
Feature activation+2.86e+00

Pos logit contributions
bel+0.042
cakes+0.041
organ+0.038
 Jos+0.036
he+0.034
Neg logit contributions
 strangers-0.039
emy-0.037
 goats-0.037
 necks-0.035
 pad-0.035
Token the
Token ID262
Feature activation+3.92e-01

Pos logit contributions
bel+0.198
cakes+0.193
organ+0.172
 Jos+0.159
spe+0.159
Neg logit contributions
 strangers-0.194
 goats-0.176
emy-0.166
 necks-0.165
 hesitation-0.163
Token open
Token ID1280
Feature activation+2.88e-02

Pos logit contributions
bel+0.029
cakes+0.028
organ+0.026
 Jos+0.025
he+0.024
Neg logit contributions
 strangers-0.023
emy-0.022
 goats-0.021
leness-0.021
 necks-0.021
Token house
Token ID2156
Feature activation+2.75e+00

Pos logit contributions
bel+0.002
cakes+0.002
organ+0.002
 Jos+0.002
he+0.002
Neg logit contributions
 strangers-0.002
emy-0.002
 goats-0.002
leness-0.002
 necks-0.002
Token.
Token ID13
Feature activation+1.03e-01

Pos logit contributions
bel+0.199
cakes+0.192
organ+0.181
 Jos+0.173
spe+0.160
Neg logit contributions
 strangers-0.167
 goats-0.155
emy-0.149
 necks-0.146
 pad-0.146
Token,
Token ID11
Feature activation-5.54e-01

Pos logit contributions
 strangers+0.194
emy+0.186
 goats+0.184
leness+0.175
 necks+0.175
Neg logit contributions
bel-0.193
cakes-0.186
organ-0.171
 Jos-0.164
he-0.155
Token but
Token ID475
Feature activation-2.39e+00

Pos logit contributions
 strangers+0.031
 goats+0.029
emy+0.028
 necks+0.027
 pad+0.027
Neg logit contributions
bel-0.043
cakes-0.042
organ-0.040
 Jos-0.037
he-0.036
Token green
Token ID4077
Feature activation-1.95e-01

Pos logit contributions
 strangers+0.145
 goats+0.134
emy+0.132
 necks+0.126
 pad+0.126
Neg logit contributions
bel-0.175
cakes-0.169
organ-0.158
 Jos-0.149
he-0.143
Token.
Token ID13
Feature activation-2.27e-01

Pos logit contributions
 strangers+0.012
emy+0.012
 goats+0.012
uf+0.012
 hesitation+0.011
Neg logit contributions
bel-0.013
cakes-0.013
organ-0.011
 Jos-0.011
he-0.010
Token You
Token ID921
Feature activation+7.47e-02

Pos logit contributions
 strangers+0.016
 goats+0.015
emy+0.015
 necks+0.014
 hesitation+0.014
Neg logit contributions
bel-0.014
cakes-0.014
organ-0.013
 Jos-0.012
he-0.011
Token can
Token ID460
Feature activation+1.28e+00

Pos logit contributions
cakes+0.006
bel+0.006
 Jos+0.005
organ+0.005
castle+0.005
Neg logit contributions
 strangers-0.004
 goats-0.004
 pad-0.004
uf-0.004
 risking-0.004
Token wear
Token ID5806
Feature activation-2.02e+00

Pos logit contributions
bel+0.087
cakes+0.084
organ+0.074
 Jos+0.072
spe+0.071
Neg logit contributions
 strangers-0.086
 goats-0.078
uf-0.078
emy-0.077
iley-0.076
Token it
Token ID340
Feature activation-5.13e-01

Pos logit contributions
 strangers+0.149
emy+0.148
 goats+0.144
uf+0.140
 hesitation+0.140
Neg logit contributions
bel-0.110
cakes-0.107
 Jos-0.097
organ-0.094
 Sus-0.089
Token when
Token ID618
Feature activation+2.39e-01

Pos logit contributions
 strangers+0.032
 goats+0.030
 pad+0.029
emy+0.029
 necks+0.029
Neg logit contributions
bel-0.036
cakes-0.034
organ-0.032
 Jos-0.031
spe-0.029
Token you
Token ID345
Feature activation-1.48e+00

Pos logit contributions
bel+0.017
cakes+0.017
 Jos+0.015
organ+0.015
 Sus+0.014
Neg logit contributions
 strangers-0.014
emy-0.013
 goats-0.013
leness-0.013
 pad-0.013
Token take
Token ID1011
Feature activation-1.90e+00

Pos logit contributions
 strangers+0.099
 goats+0.094
 pad+0.090
emy+0.090
uf+0.089
Neg logit contributions
bel-0.097
cakes-0.095
organ-0.085
 Jos-0.084
spe-0.080
TokenOnce
Token ID7454
Feature activation+1.47e+00

Pos logit contributions
 strangers+0.088
 goats+0.084
emy+0.083
 pad+0.080
iley+0.078
Neg logit contributions
bel-0.124
cakes-0.118
organ-0.109
 Jos-0.106
spe-0.101
Token upon
Token ID2402
Feature activation+1.45e+00

Pos logit contributions
bel+0.089
cakes+0.086
organ+0.079
 Jos+0.073
spe+0.067
Neg logit contributions
 strangers-0.105
emy-0.100
 goats-0.098
 pad-0.095
 hesitation-0.095
Token a
Token ID257
Feature activation+3.38e-01

Pos logit contributions
bel+0.088
cakes+0.086
organ+0.077
 Jos+0.074
spe+0.070
Neg logit contributions
 strangers-0.103
 goats-0.097
emy-0.097
 necks-0.093
 pad-0.092
Token time
Token ID640
Feature activation-2.15e+00

Pos logit contributions
he+0.019
bel+0.018
organ+0.018
cakes+0.016
iced+0.016
Neg logit contributions
 goats-0.024
 strangers-0.023
leness-0.023
 risking-0.023
 hesitation-0.022
Token,
Token ID11
Feature activation-2.97e-01

Pos logit contributions
 strangers+0.122
 goats+0.115
emy+0.114
 necks+0.109
leness+0.108
Neg logit contributions
bel-0.162
cakes-0.156
organ-0.146
 Jos-0.142
he-0.133
Token there
Token ID612
Feature activation+1.18e+00

Pos logit contributions
 strangers+0.020
emy+0.019
 goats+0.018
 pad+0.018
 hesitation+0.017
Neg logit contributions
cakes-0.020
bel-0.020
organ-0.017
 Jos-0.016
spe-0.015
Token was
Token ID373
Feature activation+2.43e-01

Pos logit contributions
bel+0.080
cakes+0.076
organ+0.066
 Jos+0.064
ushes+0.061
Neg logit contributions
 strangers-0.079
emy-0.073
 goats-0.072
 necks-0.070
 pad-0.070
Token a
Token ID257
Feature activation+3.34e-01

Pos logit contributions
bel+0.015
cakes+0.014
organ+0.013
 Jos+0.012
spe+0.012
Neg logit contributions
 strangers-0.017
 goats-0.017
emy-0.016
 pad-0.016
 necks-0.016
Token tree
Token ID5509
Feature activation-1.75e+00

Pos logit contributions
bel+0.024
cakes+0.023
organ+0.021
 Jos+0.020
he+0.019
Neg logit contributions
 strangers-0.020
 goats-0.019
 necks-0.019
emy-0.018
leness-0.018
Token.
Token ID13
Feature activation+3.98e-01

Pos logit contributions
 unnecessary+0.144
 strangers+0.142
 hesitation+0.140
leness+0.139
emy+0.139
Neg logit contributions
bel-0.089
cakes-0.079
he-0.071
organ-0.069
spe-0.066
Token It
Token ID632
Feature activation+5.39e-01

Pos logit contributions
bel+0.027
cakes+0.025
organ+0.023
 Jos+0.022
spe+0.021
Neg logit contributions
 strangers-0.026
 goats-0.025
emy-0.024
uf-0.024
 pad-0.023
TokenOnce
Token ID7454
Feature activation+1.39e+00

Pos logit contributions
 strangers+0.113
 goats+0.106
emy+0.106
 pad+0.101
 necks+0.101
Neg logit contributions
bel-0.117
cakes-0.112
organ-0.102
 Jos-0.099
spe-0.093
Token upon
Token ID2402
Feature activation+1.43e+00

Pos logit contributions
bel+0.080
cakes+0.077
organ+0.070
 Jos+0.066
spe+0.061
Neg logit contributions
 strangers-0.103
 goats-0.098
emy-0.098
 necks-0.094
 pad-0.093
Token a
Token ID257
Feature activation+2.19e-01

Pos logit contributions
bel+0.084
cakes+0.081
organ+0.073
 Jos+0.071
spe+0.065
Neg logit contributions
 strangers-0.104
 goats-0.099
emy-0.098
 necks-0.095
 pad-0.094
Token time
Token ID640
Feature activation-2.21e+00

Pos logit contributions
he+0.013
bel+0.012
organ+0.012
cakes+0.011
iced+0.011
Neg logit contributions
 goats-0.015
leness-0.015
 strangers-0.015
 hesitation-0.014
 risking-0.014
Token,
Token ID11
Feature activation-5.00e-01

Pos logit contributions
 strangers+0.135
 goats+0.129
emy+0.125
leness+0.122
 necks+0.122
Neg logit contributions
bel-0.157
cakes-0.150
organ-0.141
 Jos-0.137
he-0.129
Token there
Token ID612
Feature activation+1.11e+00

Pos logit contributions
 strangers+0.035
emy+0.034
 goats+0.033
 pad+0.032
 necks+0.032
Neg logit contributions
cakes-0.031
bel-0.030
organ-0.026
 Jos-0.026
spe-0.024
Token was
Token ID373
Feature activation+2.39e-01

Pos logit contributions
bel+0.070
cakes+0.068
organ+0.059
 Jos+0.058
spe+0.053
Neg logit contributions
 strangers-0.078
emy-0.072
 goats-0.071
 necks-0.070
 pad-0.069
Token a
Token ID257
Feature activation+2.19e-01

Pos logit contributions
bel+0.015
cakes+0.014
organ+0.013
 Jos+0.012
spe+0.012
Neg logit contributions
 strangers-0.017
 goats-0.016
emy-0.016
 pad-0.015
 necks-0.015
Token little
Token ID1310
Feature activation-1.25e+00

Pos logit contributions
bel+0.015
cakes+0.015
organ+0.014
 Jos+0.013
he+0.012
Neg logit contributions
 strangers-0.014
 goats-0.013
emy-0.013
 necks-0.012
leness-0.012
Token girl
Token ID2576
Feature activation+4.77e-01

Pos logit contributions
 strangers+0.085
 goats+0.080
emy+0.080
 necks+0.078
 hesitation+0.077
Neg logit contributions
bel-0.079
cakes-0.076
organ-0.070
 Jos-0.067
he-0.063
Token named
Token ID3706
Feature activation-2.65e+00

Pos logit contributions
bel+0.030
 Jos+0.027
 decor+0.026
 Sus+0.025
organ+0.025
Neg logit contributions
 unnecessary-0.036
 hesitation-0.034
leness-0.033
 risks-0.031
 strangers-0.031
Tokeny
Token ID88
Feature activation-9.80e-01

Pos logit contributions
bel+0.154
cakes+0.151
organ+0.137
 Jos+0.132
spe+0.125
Neg logit contributions
 strangers-0.116
 goats-0.106
emy-0.106
 pad-0.105
agging-0.102
Token and
Token ID290
Feature activation-1.99e-01

Pos logit contributions
 strangers+0.062
 goats+0.058
emy+0.058
 necks+0.055
 pad+0.055
Neg logit contributions
bel-0.068
cakes-0.065
organ-0.060
 Jos-0.058
he-0.054
Token started
Token ID2067
Feature activation+8.38e-01

Pos logit contributions
 strangers+0.013
 goats+0.012
emy+0.012
 hesitation+0.011
 necks+0.011
Neg logit contributions
bel-0.014
cakes-0.013
organ-0.012
 Jos-0.012
he-0.011
Token to
Token ID284
Feature activation-3.34e-01

Pos logit contributions
bel+0.051
cakes+0.046
organ+0.043
 Jos+0.042
spe+0.039
Neg logit contributions
 strangers-0.059
emy-0.058
 goats-0.058
uf-0.056
leness-0.055
Token complain
Token ID13121
Feature activation-8.61e-01

Pos logit contributions
 strangers+0.021
 goats+0.020
emy+0.019
 hesitation+0.019
 pad+0.019
Neg logit contributions
bel-0.023
cakes-0.023
organ-0.021
 Jos-0.020
he-0.019
Token.
Token ID13
Feature activation+6.10e-02

Pos logit contributions
 strangers+0.050
 goats+0.045
emy+0.043
 hesitation+0.042
 pad+0.042
Neg logit contributions
bel-0.065
cakes-0.062
organ-0.059
 Jos-0.056
he-0.054
Token
Token ID198
Feature activation+3.10e-02

Pos logit contributions
bel+0.004
cakes+0.004
organ+0.004
 Jos+0.003
spe+0.003
Neg logit contributions
 strangers-0.004
emy-0.004
 goats-0.004
 pad-0.004
uf-0.004
Token
Token ID198
Feature activation+2.45e-02

Pos logit contributions
bel+0.002
cakes+0.002
organ+0.001
 Jos+0.001
he+0.001
Neg logit contributions
 strangers-0.002
 goats-0.002
 necks-0.002
emy-0.002
leness-0.002
TokenThe
Token ID464
Feature activation-9.49e-01

Pos logit contributions
bel+0.002
cakes+0.002
organ+0.002
 Jos+0.002
he+0.001
Neg logit contributions
 strangers-0.001
 goats-0.001
emy-0.001
 necks-0.001
 hesitation-0.001
Token little
Token ID1310
Feature activation-1.36e+00

Pos logit contributions
uf+0.059
leness+0.059
 strangers+0.059
 risking+0.058
emy+0.058
Neg logit contributions
bel-0.059
cakes-0.055
organ-0.050
spe-0.050
he-0.048
Token girl
Token ID2576
Feature activation+3.19e-01

Pos logit contributions
 strangers+0.101
emy+0.097
 goats+0.096
uf+0.094
leness+0.094
Neg logit contributions
bel-0.077
cakes-0.073
organ-0.067
 Jos-0.065
spe-0.059
Token on
Token ID319
Feature activation-8.06e-01

Pos logit contributions
bel+0.067
cakes+0.064
organ+0.058
 Jos+0.056
spe+0.054
Neg logit contributions
 strangers-0.087
 goats-0.082
emy-0.082
 necks-0.080
 pad-0.079
Token the
Token ID262
Feature activation+4.27e-01

Pos logit contributions
emy+0.068
 strangers+0.068
 goats+0.066
leness+0.066
 pad+0.065
Neg logit contributions
bel-0.036
cakes-0.035
organ-0.030
 Jos-0.029
he-0.027
Token swings
Token ID26728
Feature activation-9.02e-01

Pos logit contributions
cakes+0.026
bel+0.024
he+0.022
whe+0.021
organ+0.020
Neg logit contributions
emy-0.031
 hesitation-0.031
iley-0.030
leness-0.030
uf-0.030
Token and
Token ID290
Feature activation-1.48e-02

Pos logit contributions
 strangers+0.055
emy+0.052
 goats+0.051
leness+0.049
uf+0.049
Neg logit contributions
bel-0.064
cakes-0.062
organ-0.058
 Jos-0.055
he-0.053
Token the
Token ID262
Feature activation+4.35e-01

Pos logit contributions
 strangers+0.001
emy+0.001
 goats+0.001
uf+0.001
leness+0.001
Neg logit contributions
bel-0.001
cakes-0.001
organ-0.001
 Jos-0.001
he-0.001
Token slide
Token ID10649
Feature activation-5.51e-01

Pos logit contributions
cakes+0.030
bel+0.029
organ+0.025
he+0.025
 Jos+0.024
Neg logit contributions
 strangers-0.028
emy-0.027
uf-0.026
 hesitation-0.026
leness-0.026
Token,
Token ID11
Feature activation-3.83e-01

Pos logit contributions
 strangers+0.034
 goats+0.031
emy+0.031
 necks+0.030
 pad+0.030
Neg logit contributions
bel-0.039
cakes-0.038
organ-0.035
 Jos-0.034
spe-0.032
Token but
Token ID475
Feature activation-2.26e+00

Pos logit contributions
 strangers+0.022
emy+0.021
 goats+0.021
 necks+0.020
leness+0.020
Neg logit contributions
bel-0.028
cakes-0.027
organ-0.025
 Jos-0.024
he-0.023
Token then
Token ID788
Feature activation-6.49e-01

Pos logit contributions
 strangers+0.163
 goats+0.157
emy+0.156
 necks+0.150
uf+0.149
Neg logit contributions
bel-0.134
cakes-0.129
organ-0.118
 Jos-0.114
he-0.104
Token she
Token ID673
Feature activation+7.66e-01

Pos logit contributions
 strangers+0.049
emy+0.048
 goats+0.048
agging+0.047
leness+0.047
Neg logit contributions
bel-0.034
cakes-0.033
organ-0.031
 Jos-0.030
 Sus-0.027
Token saw
Token ID2497
Feature activation-1.03e+00

Pos logit contributions
bel+0.047
cakes+0.044
organ+0.043
 Jos+0.041
he+0.038
Neg logit contributions
 strangers-0.052
leness-0.051
 goats-0.050
 hesitation-0.049
 unnecessary-0.049
Token was
Token ID373
Feature activation-5.42e-02

Pos logit contributions
bel+0.077
cakes+0.068
organ+0.064
ushes+0.062
 Jos+0.059
Neg logit contributions
 strangers-0.066
 goats-0.055
 pad-0.054
 hesitation-0.054
 necks-0.052
Token nothing
Token ID2147
Feature activation-1.32e+00

Pos logit contributions
 strangers+0.003
 goats+0.003
emy+0.003
 hesitation+0.003
 necks+0.003
Neg logit contributions
bel-0.004
cakes-0.004
organ-0.004
he-0.003
 Jos-0.003
Token there
Token ID612
Feature activation+1.01e+00

Pos logit contributions
 strangers+0.090
emy+0.089
 goats+0.088
 necks+0.086
uf+0.084
Neg logit contributions
bel-0.082
cakes-0.080
organ-0.072
 Jos-0.070
he-0.064
Token except
Token ID2845
Feature activation-1.75e+00

Pos logit contributions
bel+0.068
cakes+0.066
organ+0.061
 Jos+0.058
spe+0.054
Neg logit contributions
 strangers-0.066
 goats-0.061
emy-0.060
 pad-0.059
 necks-0.058
Token a
Token ID257
Feature activation-4.99e-03

Pos logit contributions
 strangers+0.113
 hesitation+0.100
 goats+0.097
 pad+0.096
 appearance+0.093
Neg logit contributions
bel-0.129
cakes-0.127
organ-0.118
spe-0.106
 Jos-0.103
Token few
Token ID1178
Feature activation-2.48e+00

Pos logit contributions
 strangers+0.000
 goats+0.000
emy+0.000
 pad+0.000
 necks+0.000
Neg logit contributions
bel-0.000
cakes-0.000
organ-0.000
 Jos-0.000
he-0.000
Token cob
Token ID22843
Feature activation+2.13e+00

Pos logit contributions
 strangers+0.149
emy+0.143
 goats+0.142
 necks+0.135
 pad+0.135
Neg logit contributions
bel-0.177
cakes-0.172
organ-0.156
 Jos-0.153
spe-0.143
Tokenwe
Token ID732
Feature activation+1.37e+00

Pos logit contributions
bel+0.135
cakes+0.128
organ+0.122
 Jos+0.115
spe+0.110
Neg logit contributions
 strangers-0.147
 goats-0.134
emy-0.133
 necks-0.131
iley-0.128
Tokenbs
Token ID1443
Feature activation-1.67e+00

Pos logit contributions
bel+0.105
cakes+0.101
organ+0.094
 Jos+0.092
he+0.088
Neg logit contributions
 strangers-0.076
emy-0.072
 goats-0.071
 necks-0.067
 hesitation-0.067
Token on
Token ID319
Feature activation-1.02e+00

Pos logit contributions
 strangers+0.107
 goats+0.099
 pad+0.096
emy+0.095
 hesitation+0.094
Neg logit contributions
bel-0.115
cakes-0.110
organ-0.104
 Jos-0.097
spe-0.092
Token the
Token ID262
Feature activation+2.98e-01

Pos logit contributions
 strangers+0.071
 goats+0.063
 necks+0.061
emy+0.059
 pad+0.058
Neg logit contributions
bel-0.069
cakes-0.067
organ-0.061
 Jos-0.057
spe-0.054
Token're
Token ID821
Feature activation-4.84e-01

Pos logit contributions
bel+0.066
cakes+0.062
organ+0.055
spe+0.055
ushes+0.052
Neg logit contributions
 strangers-0.039
 goats-0.038
 pad-0.037
 hesitation-0.036
iley-0.034
Token welcome
Token ID7062
Feature activation-1.80e+00

Pos logit contributions
 strangers+0.021
iley+0.019
 goats+0.019
agging+0.019
uf+0.019
Neg logit contributions
bel-0.047
cakes-0.044
spe-0.040
organ-0.037
castle-0.035
Token,
Token ID11
Feature activation-2.85e-01

Pos logit contributions
 strangers+0.081
 goats+0.075
emy+0.074
uf+0.070
 hesitation+0.069
Neg logit contributions
bel-0.160
cakes-0.152
organ-0.148
spe-0.137
 Jos-0.134
Token Lily
Token ID20037
Feature activation+2.08e-01

Pos logit contributions
 strangers+0.020
emy+0.020
 goats+0.019
 pad+0.018
 goat+0.018
Neg logit contributions
bel-0.020
cakes-0.017
organ-0.017
iced-0.015
he-0.014
Token.
Token ID13
Feature activation+3.68e-01

Pos logit contributions
bel+0.017
cakes+0.016
organ+0.015
 Jos+0.014
spe+0.013
Neg logit contributions
 strangers-0.011
 goats-0.011
emy-0.010
 pad-0.010
 necks-0.010
Token Let
Token ID3914
Feature activation-1.12e+00

Pos logit contributions
bel+0.030
cakes+0.028
organ+0.027
 Jos+0.025
he+0.025
Neg logit contributions
 strangers-0.018
emy-0.018
 goats-0.017
 pad-0.017
iley-0.017
Token's
Token ID338
Feature activation-1.94e+00

Pos logit contributions
 strangers+0.064
emy+0.061
 goats+0.059
 necks+0.059
 hesitation+0.059
Neg logit contributions
cakes-0.081
bel-0.079
organ-0.075
 Jos-0.072
he-0.070
Token play
Token ID711
Feature activation+2.17e-02

Pos logit contributions
 strangers+0.117
 goats+0.109
 pad+0.107
 necks+0.105
emy+0.103
Neg logit contributions
cakes-0.143
bel-0.143
organ-0.121
 Jos-0.120
spe-0.117
Token again
Token ID757
Feature activation+1.16e+00

Pos logit contributions
bel+0.001
cakes+0.001
organ+0.001
 Jos+0.001
spe+0.001
Neg logit contributions
 strangers-0.002
 goats-0.002
emy-0.002
 necks-0.002
 pad-0.001
Token soon
Token ID2582
Feature activation+1.51e+00

Pos logit contributions
bel+0.056
cakes+0.052
organ+0.049
 Jos+0.046
he+0.041
Neg logit contributions
 strangers-0.096
 goats-0.091
leness-0.091
 hesitation-0.091
emy-0.090
Token!"
Token ID2474
Feature activation+1.97e-01

Pos logit contributions
bel+0.097
cakes+0.095
organ+0.086
 Jos+0.083
he+0.079
Neg logit contributions
 strangers-0.101
emy-0.097
 goats-0.095
 hesitation-0.092
 necks-0.091
Token,"
Token ID553
Feature activation-7.84e-01

Pos logit contributions
 strangers+0.098
 goats+0.092
emy+0.091
 necks+0.088
 pad+0.087
Neg logit contributions
bel-0.108
cakes-0.105
organ-0.096
 Jos-0.093
he-0.088
Token Ben
Token ID3932
Feature activation-1.91e+00

Pos logit contributions
 strangers+0.043
 goats+0.041
emy+0.040
 necks+0.038
 hesitation+0.038
Neg logit contributions
bel-0.061
cakes-0.059
organ-0.055
 Jos-0.053
spe-0.050
Token says
Token ID1139
Feature activation+5.97e-01

Pos logit contributions
 strangers+0.110
emy+0.103
 goats+0.103
 necks+0.097
 pad+0.097
Neg logit contributions
bel-0.142
cakes-0.138
organ-0.128
 Jos-0.124
he-0.116
Token.
Token ID13
Feature activation-2.09e-01

Pos logit contributions
bel+0.044
cakes+0.043
organ+0.040
 Jos+0.038
spe+0.036
Neg logit contributions
 strangers-0.035
 goats-0.032
emy-0.032
 hesitation-0.031
 necks-0.031
Token He
Token ID679
Feature activation+7.02e-01

Pos logit contributions
 strangers+0.014
emy+0.013
 goats+0.013
 pad+0.013
 necks+0.013
Neg logit contributions
bel-0.014
cakes-0.013
organ-0.012
 Jos-0.012
spe-0.011
Token takes
Token ID2753
Feature activation+1.50e-01

Pos logit contributions
bel+0.057
cakes+0.055
organ+0.051
 Jos+0.050
he+0.047
Neg logit contributions
 strangers-0.036
 goats-0.034
emy-0.034
 necks-0.032
 pad-0.032
Token the
Token ID262
Feature activation+2.23e-01

Pos logit contributions
bel+0.012
cakes+0.012
spe+0.011
organ+0.011
castle+0.011
Neg logit contributions
 strangers-0.008
iley-0.007
 goats-0.007
 Rover-0.007
 walks-0.007
Token doll
Token ID3654
Feature activation+7.48e-01

Pos logit contributions
bel+0.013
cakes+0.013
organ+0.012
 Jos+0.011
spe+0.010
Neg logit contributions
 strangers-0.016
 goats-0.015
 pad-0.015
 hesitation-0.015
 necks-0.015
Token.
Token ID13
Feature activation-2.11e-01

Pos logit contributions
bel+0.058
cakes+0.057
organ+0.053
 Jos+0.052
spe+0.049
Neg logit contributions
 strangers-0.041
 pad-0.039
 goats-0.037
 necks-0.035
 goat-0.034
Token He
Token ID679
Feature activation+7.13e-01

Pos logit contributions
 strangers+0.014
emy+0.014
 goats+0.013
uf+0.013
 pad+0.013
Neg logit contributions
bel-0.014
cakes-0.013
organ-0.012
spe-0.011
 Jos-0.011
Token tries
Token ID8404
Feature activation-1.10e-01

Pos logit contributions
bel+0.058
cakes+0.056
organ+0.053
 Jos+0.052
he+0.049
Neg logit contributions
 strangers-0.037
 goats-0.034
emy-0.033
 necks-0.033
 pad-0.031
Token land
Token ID1956
Feature activation-1.14e+00

Pos logit contributions
 hesitation+0.174
leness+0.171
uf+0.171
iley+0.168
agging+0.167
Neg logit contributions
bel-0.208
spe-0.175
 Jos-0.172
cakes-0.169
organ-0.169
Token liked
Token ID8288
Feature activation-1.33e+00

Pos logit contributions
 strangers+0.073
 goats+0.069
 necks+0.066
emy+0.066
 pad+0.065
Neg logit contributions
bel-0.075
cakes-0.073
organ-0.072
 Jos-0.066
whe-0.060
Token to
Token ID284
Feature activation-3.74e-01

Pos logit contributions
 strangers+0.097
 goats+0.092
emy+0.091
 necks+0.088
 hesitation+0.087
Neg logit contributions
bel-0.080
cakes-0.077
organ-0.071
 Jos-0.067
he-0.063
Token help
Token ID1037
Feature activation-1.19e+00

Pos logit contributions
 strangers+0.025
 goats+0.023
emy+0.023
 necks+0.023
 hesitation+0.022
Neg logit contributions
cakes-0.024
bel-0.024
organ-0.022
 Jos-0.021
spe-0.021
Token each
Token ID1123
Feature activation-2.32e+00

Pos logit contributions
 strangers+0.084
 goats+0.079
emy+0.076
 necks+0.073
 hesitation+0.072
Neg logit contributions
bel-0.077
cakes-0.075
organ-0.069
 Jos-0.065
he-0.062
Token other
Token ID584
Feature activation-4.58e+00

Pos logit contributions
emy+0.114
 necks+0.113
 strangers+0.112
iley+0.111
uf+0.111
Neg logit contributions
bel-0.179
cakes-0.169
organ-0.159
 Jos-0.159
he-0.153
Token.
Token ID13
Feature activation+5.31e-02

Pos logit contributions
 strangers+0.271
 goats+0.254
 necks+0.239
 pad+0.235
emy+0.235
Neg logit contributions
bel-0.337
cakes-0.336
organ-0.310
 Jos-0.299
he-0.277
Token They
Token ID1119
Feature activation-6.18e-01

Pos logit contributions
bel+0.003
cakes+0.003
organ+0.003
 Jos+0.002
he+0.002
Neg logit contributions
 strangers-0.004
 goats-0.004
emy-0.004
 necks-0.004
 pad-0.004
Token played
Token ID2826
Feature activation+1.00e+00

Pos logit contributions
 strangers+0.037
 goats+0.036
 pad+0.033
 necks+0.032
 goat+0.032
Neg logit contributions
cakes-0.045
bel-0.045
organ-0.041
 Jos-0.039
iced-0.038
Token together
Token ID1978
Feature activation+3.66e+00

Pos logit contributions
cakes+0.060
bel+0.059
 Jos+0.055
spe+0.054
organ+0.053
Neg logit contributions
 strangers-0.070
 goats-0.068
emy-0.066
 necks-0.066
agging-0.066
Token near
Token ID1474
Feature activation-1.42e+00

Pos logit contributions
bel+0.245
cakes+0.240
organ+0.219
 Jos+0.212
spe+0.195
Neg logit contributions
 strangers-0.241
 goats-0.226
emy-0.221
 pad-0.219
 necks-0.218
Token too
Token ID1165
Feature activation+1.50e+00

Pos logit contributions
 strangers+0.018
 goats+0.018
emy+0.016
 hesitation+0.016
 pad+0.016
Neg logit contributions
bel-0.026
cakes-0.025
organ-0.025
 Jos-0.022
he-0.022
Token.
Token ID13
Feature activation+4.30e-01

Pos logit contributions
bel+0.113
cakes+0.110
organ+0.102
 Jos+0.096
spe+0.090
Neg logit contributions
 strangers-0.087
 goats-0.082
emy-0.081
uf-0.077
leness-0.076
Token They
Token ID1119
Feature activation-2.38e-01

Pos logit contributions
bel+0.032
organ+0.029
cakes+0.029
castle+0.025
spe+0.024
Neg logit contributions
emy-0.024
uf-0.024
 strangers-0.024
 goats-0.024
 pad-0.023
Token hugged
Token ID41130
Feature activation-1.16e+00

Pos logit contributions
 strangers+0.013
 goats+0.013
 pad+0.013
 hesitation+0.012
 goat+0.012
Neg logit contributions
bel-0.018
cakes-0.017
organ-0.016
iced-0.015
ushes-0.015
Token each
Token ID1123
Feature activation-2.30e+00

Pos logit contributions
 strangers+0.070
 goats+0.063
 necks+0.060
iley+0.060
 years+0.059
Neg logit contributions
bel-0.087
cakes-0.084
organ-0.081
castle-0.072
whe-0.070
Token other
Token ID584
Feature activation-4.41e+00

Pos logit contributions
emy+0.140
 necks+0.134
 strangers+0.131
 appearance+0.130
 risking+0.129
Neg logit contributions
cakes-0.159
bel-0.149
he-0.146
 Jos-0.141
 Kate-0.130
Token and
Token ID290
Feature activation+1.30e-03

Pos logit contributions
 strangers+0.254
 goats+0.239
emy+0.235
 necks+0.228
 pad+0.226
Neg logit contributions
bel-0.329
cakes-0.317
organ-0.297
 Jos-0.285
he-0.266
Token couldn
Token ID3521
Feature activation+1.59e+00

Pos logit contributions
bel+0.000
organ+0.000
cakes+0.000
iced+0.000
ushes+0.000
Neg logit contributions
 strangers-0.000
 goats-0.000
 hesitation-0.000
 necks-0.000
leness-0.000
Token't
Token ID470
Feature activation+1.16e+00

Pos logit contributions
bel+0.131
organ+0.110
 Jos+0.106
ushes+0.106
cakes+0.105
Neg logit contributions
 strangers-0.087
uf-0.074
 goats-0.073
 Rover-0.073
 walks-0.069
Token stop
Token ID2245
Feature activation+4.34e-01

Pos logit contributions
bel+0.094
cakes+0.093
organ+0.086
ushes+0.082
 Jos+0.081
Neg logit contributions
 strangers-0.061
 hesitation-0.052
 goats-0.052
uf-0.051
leness-0.050
Token smiling
Token ID16755
Feature activation+1.62e-01

Pos logit contributions
bel+0.034
cakes+0.033
organ+0.031
 Jos+0.030
spe+0.028
Neg logit contributions
 strangers-0.023
emy-0.022
 goats-0.022
 necks-0.020
 pad-0.020
Token and
Token ID290
Feature activation-1.33e-01

Pos logit contributions
 strangers+0.070
 goats+0.065
emy+0.065
 necks+0.062
 pad+0.062
Neg logit contributions
bel-0.085
cakes-0.082
organ-0.076
 Jos-0.073
he-0.069
Token Sam
Token ID3409
Feature activation-8.31e-01

Pos logit contributions
 strangers+0.010
emy+0.009
 goats+0.009
 hesitation+0.009
 necks+0.009
Neg logit contributions
cakes-0.008
bel-0.008
organ-0.007
iced-0.007
he-0.006
Token looked
Token ID3114
Feature activation-2.08e-01

Pos logit contributions
 strangers+0.057
 goats+0.053
emy+0.052
 necks+0.051
 hesitation+0.051
Neg logit contributions
bel-0.054
cakes-0.052
organ-0.048
 Jos-0.046
spe-0.042
Token at
Token ID379
Feature activation+2.96e+00

Pos logit contributions
 strangers+0.014
 goats+0.014
emy+0.013
 pad+0.013
 necks+0.013
Neg logit contributions
bel-0.013
cakes-0.013
organ-0.012
 Jos-0.011
spe-0.010
Token each
Token ID1123
Feature activation-2.37e+00

Pos logit contributions
bel+0.164
cakes+0.157
organ+0.142
 Jos+0.135
he+0.124
Neg logit contributions
 strangers-0.229
 goats-0.219
emy-0.218
 necks-0.210
 pad-0.209
Token other
Token ID584
Feature activation-4.67e+00

Pos logit contributions
emy+0.141
 necks+0.133
uf+0.132
 strangers+0.131
 unnecessary+0.131
Neg logit contributions
cakes-0.159
bel-0.155
he-0.152
organ-0.142
room-0.140
Token.
Token ID13
Feature activation-6.75e-02

Pos logit contributions
 strangers+0.291
emy+0.280
 goats+0.275
uf+0.260
 necks+0.260
Neg logit contributions
bel-0.323
cakes-0.311
organ-0.285
 Jos-0.276
he-0.263
Token They
Token ID1119
Feature activation-6.38e-01

Pos logit contributions
 strangers+0.005
 goats+0.005
emy+0.004
 necks+0.004
 pad+0.004
Neg logit contributions
bel-0.004
cakes-0.004
organ-0.004
he-0.004
 Jos-0.003
Token felt
Token ID2936
Feature activation-3.29e-01

Pos logit contributions
 strangers+0.040
 goats+0.038
emy+0.037
 necks+0.036
 pad+0.036
Neg logit contributions
bel-0.044
cakes-0.043
organ-0.040
 Jos-0.038
spe-0.035
Token sorry
Token ID7926
Feature activation-8.71e-01

Pos logit contributions
 strangers+0.020
 goats+0.018
emy+0.018
 hesitation+0.017
leness+0.017
Neg logit contributions
bel-0.025
cakes-0.024
organ-0.022
 Jos-0.021
spe-0.020
Token for
Token ID329
Feature activation+1.10e+00

Pos logit contributions
 strangers+0.057
 goats+0.055
emy+0.054
 necks+0.052
 hesitation+0.052
Neg logit contributions
bel-0.058
cakes-0.056
organ-0.051
 Jos-0.050
he-0.046
Token,
Token ID11
Feature activation-7.19e-01

Pos logit contributions
bel+0.050
cakes+0.046
organ+0.046
he+0.043
spe+0.042
Neg logit contributions
 strangers-0.021
 goats-0.020
emy-0.018
 hesitation-0.018
uf-0.017
Token "
Token ID366
Feature activation+2.99e-01

Pos logit contributions
 strangers+0.054
emy+0.051
 goats+0.051
 unnecessary+0.049
leness+0.049
Neg logit contributions
bel-0.041
cakes-0.040
organ-0.035
 Jos-0.035
spe-0.031
TokenTo
Token ID2514
Feature activation+8.09e-01

Pos logit contributions
bel+0.021
cakes+0.020
organ+0.019
 Jos+0.018
he+0.017
Neg logit contributions
 strangers-0.018
 goats-0.017
emy-0.017
 necks-0.016
 pad-0.016
Token Anna
Token ID11735
Feature activation-1.60e+00

Pos logit contributions
bel+0.059
cakes+0.057
organ+0.053
 Jos+0.050
spe+0.048
Neg logit contributions
 strangers-0.050
 goats-0.046
emy-0.045
uf-0.044
 necks-0.043
Token and
Token ID290
Feature activation-3.52e-01

Pos logit contributions
 strangers+0.099
emy+0.094
 goats+0.093
leness+0.091
 necks+0.091
Neg logit contributions
bel-0.109
cakes-0.105
organ-0.097
 Jos-0.095
he-0.091
Token Spot
Token ID15899
Feature activation-4.65e+00

Pos logit contributions
 strangers+0.022
emy+0.022
 goats+0.021
leness+0.021
 necks+0.021
Neg logit contributions
bel-0.023
cakes-0.023
organ-0.020
 Jos-0.020
he-0.019
Token,
Token ID11
Feature activation-7.28e-01

Pos logit contributions
 strangers+0.278
 goats+0.261
emy+0.255
 pad+0.246
uf+0.243
Neg logit contributions
bel-0.342
cakes-0.331
organ-0.306
 Jos-0.294
spe-0.279
Token from
Token ID422
Feature activation+2.78e+00

Pos logit contributions
 strangers+0.041
 goats+0.037
emy+0.036
 pad+0.035
uf+0.034
Neg logit contributions
bel-0.057
cakes-0.056
organ-0.053
 Jos-0.050
spe-0.047
Token Grand
Token ID5675
Feature activation+2.43e+00

Pos logit contributions
bel+0.177
cakes+0.169
organ+0.154
 Jos+0.150
he+0.139
Neg logit contributions
 strangers-0.189
emy-0.182
 goats-0.182
 necks-0.174
 pad-0.174
Tokenma
Token ID2611
Feature activation+2.04e+00

Pos logit contributions
cakes+0.108
bel+0.095
 Jos+0.085
organ+0.083
he+0.083
Neg logit contributions
 pad-0.208
emy-0.206
 unnecessary-0.204
 strangers-0.204
 goats-0.203
Token and
Token ID290
Feature activation-3.51e-01

Pos logit contributions
bel+0.145
cakes+0.140
organ+0.130
 Jos+0.125
he+0.117
Neg logit contributions
 strangers-0.125
 goats-0.117
emy-0.116
 pad-0.111
 necks-0.111
Token
Token ID198
Feature activation+1.51e-01

Pos logit contributions
bel+0.004
cakes+0.003
organ+0.003
 Jos+0.003
whe+0.003
Neg logit contributions
 strangers-0.006
iley-0.005
 hesitation-0.005
 goats-0.005
leness-0.005
Token
Token ID198
Feature activation+1.60e-01

Pos logit contributions
bel+0.008
cakes+0.007
organ+0.006
he+0.006
 Jos+0.006
Neg logit contributions
 strangers-0.013
 goats-0.012
 hesitation-0.012
 necks-0.012
leness-0.012
TokenOne
Token ID3198
Feature activation-2.47e+00

Pos logit contributions
bel+0.012
organ+0.011
cakes+0.010
whe+0.010
spe+0.009
Neg logit contributions
 hesitation-0.010
iley-0.010
 strangers-0.009
uf-0.009
osa-0.009
Token day
Token ID1110
Feature activation-6.54e-01

Pos logit contributions
 necks+0.180
leness+0.180
 strangers+0.180
 goats+0.176
 risking+0.175
Neg logit contributions
bel-0.139
cakes-0.129
organ-0.123
he-0.120
 Jos-0.113
Token,
Token ID11
Feature activation-4.75e-01

Pos logit contributions
 strangers+0.048
 pad+0.046
iley+0.045
 necks+0.045
 goats+0.045
Neg logit contributions
bel-0.038
organ-0.035
cakes-0.034
 together-0.030
 Sus-0.029
Token Bark
Token ID20511
Feature activation-3.75e+00

Pos logit contributions
 strangers+0.033
 goats+0.031
emy+0.031
 necks+0.030
 pad+0.030
Neg logit contributions
bel-0.030
cakes-0.029
organ-0.026
 Jos-0.025
he-0.023
Tokeny
Token ID88
Feature activation-5.53e-01

Pos logit contributions
 strangers+0.273
 hesitation+0.266
leness+0.259
 goats+0.258
emy+0.254
Neg logit contributions
bel-0.224
cakes-0.202
organ-0.193
 Jos-0.180
he-0.172
Token saw
Token ID2497
Feature activation-1.09e+00

Pos logit contributions
 strangers+0.035
leness+0.034
 hesitation+0.034
emy+0.033
 goats+0.033
Neg logit contributions
bel-0.036
cakes-0.034
organ-0.033
 Jos-0.031
he-0.030
Token Wh
Token ID854
Feature activation+5.68e-01

Pos logit contributions
leness+0.103
 pad+0.102
 necks+0.100
iley+0.099
anna+0.099
Neg logit contributions
bel-0.037
organ-0.035
cakes-0.034
 Jos-0.029
 Suz-0.028
Tokenisk
Token ID1984
Feature activation+2.25e+00

Pos logit contributions
cakes+0.024
organ+0.023
bel+0.023
room+0.021
he+0.020
Neg logit contributions
 hesitation-0.048
leness-0.047
 strangers-0.047
 goats-0.046
 risking-0.044
Tokeners
Token ID364
Feature activation+1.69e+00

Pos logit contributions
bel+0.153
cakes+0.144
 Jos+0.136
organ+0.134
spe+0.126
Neg logit contributions
 strangers-0.146
 goats-0.134
emy-0.134
agging-0.126
 necks-0.126