TOP ACTIVATIONS
MAX = 0.144

saw many choices . Which way should he go ?
seed asked , " Which way should I grow at the
. " Which way , Tom ?" Lily asked
“ Which way should I go ? ”
be happy . Even though situations can be serious , it
said , " But ter flies are meant to fly and
and asked , " Bird y , do you know where
selfish . They asked D ucky if they could join in
, " Well , one way you can learn news is
a swim . He asked all his friends to come swimming
green paint . He asked little Toby , who was three
eat . They say " y um " and " del
very worried . He asked all the other animals what was
started to cry and asked all of the animals to help
coming from . He asked all the other animals , but
ground . She asked John y not to touch it but
tall mountain . He asked all his friends to join him
was hidden . He asked all his servants , but no
said . " But ter flies are free . They need
was light . She asked little Jake to come and join

MIN ACTIVATIONS
MIN = -0.192

saw the great big Fer ris wheel and all the other
Lily loved the big Fer ris wheel the most . She
they went on the fer ris wheel together , and Lily
He went on the fer ris wheel , the roller coaster
and down on the roller co aster , Lily shouted ,
's go on the fer ris wheel and see the whole
went on a scary roller co aster ride . Tim my
Lily loved to ride roller co asters and always remembered how
day ever ! <|endoftext|> Mor ris was very excited . Today
you start with the roller co aster ? ” his
Did you enjoy the roller co aster and your ice cream
was a happy 3 - year - old boy who loved
he saw a different roller co aster - like ride .
and put on their rain co ats and boots . They
have to wear your rain co ats and boots ," Mom
took Sara the 3 - year - old to the park
Lily saw a big roller co aster and asked her mom
Jake was a three - year - old boy . He
Ben was a 3 - year - old boy who owned
Charlie was a three - year - old girl who loved

INTERVAL 0.060 - 0.144
CONTAINS 0.732%

Benny tried to swat it away but the bird wouldn 't
loved to use his long tail to swing from tree to
on the counter and ran away . They did not want
to the store and saw all the perfume bottles lined up
to Rabb ie . Fox y said , " I 'm

INTERVAL -0.024 - 0.060
CONTAINS 86.070%

and Kar la . One day they were playing in the
the help we receive . <|endoftext|> Once upon a time ,
and show everyone her voice . She told her mom and
snacks together and then continued on their adventure . They walked
mom 's opinion , even if it didn 't turn out

INTERVAL -0.108 - -0.024
CONTAINS 13.189%

and became very worried . But then ,
right !" She said as she began to load the shoes
girl named Kate . She loved to pick pretty things .
with so many new stories to tell and jokes to share
k itty was happy . She was able to lie in

INTERVAL -0.192 - -0.108
CONTAINS 0.010%

to put on their seat belts . Mom started
a cheerful little 3 - year - old boy . He
they go to the merry - go - round and hop
Tim was a three - year - old boy . He
there would be more beautiful works of art to come .
Token saw
Token ID2497
Feature activation-1.15e-02

Pos logit contributions
ris+0.002
innie+0.002
abel+0.002
icer+0.002
ma+0.002
Neg logit contributions
 echoed-0.001
 arrow-0.001
 arrows-0.001
adoes-0.001
 hoop-0.001
Token many
Token ID867
Feature activation+1.96e-02

Pos logit contributions
 anx+0.016
ris+0.016
innie+0.014
ma+0.014
abel+0.014
Neg logit contributions
adoes-0.020
 scares-0.018
 arrows-0.018
 echoed-0.018
 arrow-0.018
Token choices
Token ID7747
Feature activation+2.15e-02

Pos logit contributions
adoes+0.028
 scares+0.025
 echoed+0.025
 bully+0.024
 net+0.024
Neg logit contributions
 anx-0.034
ris-0.032
abel-0.030
 Er-0.030
ma-0.030
Token.
Token ID13
Feature activation-4.73e-03

Pos logit contributions
adoes+0.044
 scares+0.041
 echoed+0.040
 bully+0.040
 arrows+0.039
Neg logit contributions
 anx-0.024
ris-0.022
 Er-0.020
abel-0.020
ma-0.020
Token Which
Token ID9022
Feature activation-1.50e-02

Pos logit contributions
 anx+0.009
ris+0.008
abel+0.008
ma+0.008
innie+0.008
Neg logit contributions
adoes-0.006
 pitch-0.005
 scares-0.005
 echoed-0.005
 hoop-0.005
Token way
Token ID835
Feature activation+1.44e-01

Pos logit contributions
ris+0.009
ma+0.009
 anx+0.009
 Er+0.007
 first+0.007
Neg logit contributions
adoes-0.037
 echoed-0.035
ickets-0.035
 affect-0.034
 racket-0.034
Token should
Token ID815
Feature activation+2.38e-02

Pos logit contributions
adoes+0.201
 echoed+0.190
 scares+0.189
AAAA+0.182
 web+0.181
Neg logit contributions
 anx-0.243
ma-0.230
 first-0.227
 Er-0.226
 next-0.218
Token he
Token ID339
Feature activation-1.23e-02

Pos logit contributions
adoes+0.033
 echoed+0.029
 scares+0.029
 web+0.027
 arrows+0.027
Neg logit contributions
 anx-0.042
 Er-0.039
ma-0.038
ris-0.038
abel-0.038
Token go
Token ID467
Feature activation+2.08e-02

Pos logit contributions
 anx+0.019
ris+0.017
 first+0.017
ma+0.016
 patiently+0.016
Neg logit contributions
adoes-0.021
 echoed-0.018
ickets-0.017
 scares-0.017
 bullet-0.017
Token?
Token ID30
Feature activation-1.68e-02

Pos logit contributions
adoes+0.029
 scares+0.027
 pitch+0.027
 net+0.027
 arrows+0.026
Neg logit contributions
 anx-0.036
ris-0.034
abel-0.034
 Er-0.032
innie-0.032
Token
Token ID198
Feature activation+2.57e-03

Pos logit contributions
 anx+0.029
ris+0.027
abel+0.026
ma+0.026
 Er+0.025
Neg logit contributions
adoes-0.024
 scares-0.022
 echoed-0.022
 bully-0.021
 arrows-0.021
Token seed
Token ID9403
Feature activation+4.46e-02

Pos logit contributions
adoes+0.002
 echoed+0.002
 scares+0.002
ickets+0.002
 racket+0.002
Neg logit contributions
 anx-0.004
 next-0.003
ma-0.003
 first-0.003
 patiently-0.003
Token asked
Token ID1965
Feature activation+1.18e-02

Pos logit contributions
adoes+0.076
 scares+0.069
 bully+0.067
 racket+0.066
 echoed+0.065
Neg logit contributions
 anx-0.068
ris-0.059
ma-0.059
abel-0.058
 patiently-0.058
Token,
Token ID11
Feature activation+5.02e-02

Pos logit contributions
adoes+0.020
 scares+0.018
 echoed+0.018
 bully+0.017
 arrows+0.017
Neg logit contributions
 anx-0.018
ris-0.017
 Er-0.016
abel-0.015
ma-0.015
Token "
Token ID366
Feature activation+5.66e-03

Pos logit contributions
adoes+0.046
 echoed+0.042
 scares+0.042
 tail+0.040
 snail+0.040
Neg logit contributions
ris-0.111
 anx-0.105
abel-0.103
ma-0.101
son-0.100
TokenWhich
Token ID13828
Feature activation-5.05e-03

Pos logit contributions
adoes+0.007
 snail+0.007
 scares+0.007
 bully+0.007
AAAA+0.007
Neg logit contributions
 anx-0.010
ris-0.010
abel-0.009
 Er-0.009
innie-0.009
Token way
Token ID835
Feature activation+1.42e-01

Pos logit contributions
 first+0.007
 next+0.007
ma+0.007
 anx+0.006
 once+0.006
Neg logit contributions
adoes-0.009
 echoed-0.008
ickets-0.008
 trigger-0.008
 bullet-0.008
Token should
Token ID815
Feature activation+2.64e-02

Pos logit contributions
adoes+0.209
 echoed+0.203
AAAA+0.192
 scares+0.191
 bullet+0.190
Neg logit contributions
 anx-0.227
 first-0.225
ma-0.221
 next-0.214
 Er-0.205
Token I
Token ID314
Feature activation+7.34e-03

Pos logit contributions
adoes+0.036
 echoed+0.032
 scares+0.032
 arrows+0.030
 racket+0.030
Neg logit contributions
 anx-0.048
 Er-0.043
abel-0.043
ma-0.043
ris-0.043
Token grow
Token ID1663
Feature activation+3.33e-02

Pos logit contributions
adoes+0.012
 scares+0.011
 echoed+0.011
 arrows+0.010
 bully+0.010
Neg logit contributions
 anx-0.012
ris-0.011
abel-0.010
ma-0.010
 Er-0.010
Token at
Token ID379
Feature activation+2.42e-03

Pos logit contributions
adoes+0.049
 scares+0.046
 echoed+0.045
 bully+0.044
 arrows+0.043
Neg logit contributions
 anx-0.055
ris-0.051
abel-0.049
ma-0.049
 Er-0.049
Token the
Token ID262
Feature activation+1.06e-02

Pos logit contributions
adoes+0.004
 scares+0.004
 echoed+0.004
 bully+0.004
 arrows+0.004
Neg logit contributions
 anx-0.004
ris-0.003
 Er-0.003
ma-0.003
abel-0.003
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
adoes+0.007
 scares+0.007
 pitch+0.007
 net+0.006
 echoed+0.006
Neg logit contributions
 anx-0.007
innie-0.007
abel-0.007
ris-0.007
son-0.007
Token
Token ID198
Feature activation+5.09e-03

Pos logit contributions
adoes+0.028
 scares+0.025
 echoed+0.024
 bully+0.024
 net+0.023
Neg logit contributions
 anx-0.040
ris-0.038
abel-0.036
ma-0.036
 Er-0.035
Token
Token ID198
Feature activation+1.28e-02

Pos logit contributions
 scares+0.007
adoes+0.007
 echoed+0.007
 pitch+0.007
 arrow+0.007
Neg logit contributions
 anx-0.008
ris-0.007
 Er-0.007
abel-0.007
son-0.007
Token"
Token ID1
Feature activation+2.85e-03

Pos logit contributions
adoes+0.017
 scares+0.016
 echoed+0.016
 bully+0.015
 arrows+0.015
Neg logit contributions
 anx-0.023
ris-0.021
abel-0.021
 Er-0.021
ma-0.020
TokenWhich
Token ID13828
Feature activation-4.38e-03

Pos logit contributions
adoes+0.004
 scares+0.004
 echoed+0.004
 bully+0.004
 arrows+0.004
Neg logit contributions
 anx-0.005
ris-0.004
abel-0.004
 Er-0.004
ma-0.004
Token way
Token ID835
Feature activation+1.41e-01

Pos logit contributions
 first+0.005
ma+0.005
 next+0.005
ris+0.005
 anx+0.004
Neg logit contributions
adoes-0.008
ickets-0.008
 echoed-0.008
 trigger-0.008
ashes-0.007
Token,
Token ID11
Feature activation+5.97e-02

Pos logit contributions
adoes+0.246
 echoed+0.233
 scares+0.229
 bullet+0.225
ickets+0.224
Neg logit contributions
 anx-0.186
 first-0.183
ma-0.183
 next-0.168
 Er-0.160
Token Tom
Token ID4186
Feature activation+5.01e-03

Pos logit contributions
adoes+0.103
 affect+0.097
ickets+0.097
 scares+0.096
 echoed+0.096
Neg logit contributions
 anx-0.081
ris-0.072
ma-0.070
 first-0.069
abel-0.069
Token?"
Token ID1701
Feature activation+7.11e-03

Pos logit contributions
adoes+0.008
 echoed+0.007
 scares+0.007
ickets+0.007
 arrows+0.007
Neg logit contributions
 anx-0.008
ma-0.008
ris-0.008
 patiently-0.007
abel-0.007
Token Lily
Token ID20037
Feature activation+2.00e-02

Pos logit contributions
adoes+0.008
 scares+0.008
 echoed+0.008
 bully+0.007
 arrows+0.007
Neg logit contributions
 anx-0.014
ris-0.013
abel-0.013
 Er-0.013
ma-0.013
Token asked
Token ID1965
Feature activation+8.41e-03

Pos logit contributions
adoes+0.030
 scares+0.028
 echoed+0.028
 bully+0.027
 arrows+0.026
Neg logit contributions
 anx-0.033
ris-0.031
 Er-0.029
abel-0.029
ma-0.029
Tokenâ
Token ID22940
Feature activation-1.97e-02

Pos logit contributions
adoes+0.004
 echoed+0.004
 scares+0.003
 bully+0.003
 pitch+0.003
Neg logit contributions
 anx-0.004
ris-0.004
 Er-0.003
ma-0.003
abel-0.003
Token
Token ID26391
Feature activation-2.75e-02

Pos logit contributions
 anx+0.030
ris+0.027
 Er+0.027
abel+0.027
ma+0.026
Neg logit contributions
adoes-0.031
 echoed-0.029
 scares-0.029
 bully-0.028
 pitch-0.028
Token
Token ID129
Feature activation-3.11e-02

Pos logit contributions
 anx+0.043
ris+0.039
 Er+0.038
 patiently+0.038
abel+0.037
Neg logit contributions
adoes-0.040
 echoed-0.039
 scares-0.038
AAAA-0.037
 hoop-0.037
Token
Token ID241
Feature activation-2.15e-02

Pos logit contributions
 anx+0.050
ris+0.047
abel+0.044
 Er+0.044
ma+0.044
Neg logit contributions
adoes-0.048
 scares-0.045
 echoed-0.043
 pitch-0.043
 bully-0.042
TokenWhich
Token ID13828
Feature activation-1.01e-02

Pos logit contributions
 anx+0.039
ma+0.037
abel+0.037
ris+0.035
 Er+0.035
Neg logit contributions
 echoed-0.026
adoes-0.026
 pitch-0.025
 scares-0.025
AAAA-0.025
Token way
Token ID835
Feature activation+1.40e-01

Pos logit contributions
 anx+0.013
ris+0.013
ma+0.012
abel+0.012
 first+0.011
Neg logit contributions
adoes-0.018
ickets-0.016
 echoed-0.016
 scares-0.016
 bully-0.016
Token should
Token ID815
Feature activation+2.28e-02

Pos logit contributions
adoes+0.189
 scares+0.176
 echoed+0.173
 bully+0.171
 bullet+0.167
Neg logit contributions
 anx-0.244
ma-0.231
 Er-0.224
abel-0.221
ris-0.220
Token I
Token ID314
Feature activation+8.40e-03

Pos logit contributions
adoes+0.026
 scares+0.023
 echoed+0.022
 bully+0.021
 arrows+0.021
Neg logit contributions
 anx-0.046
ris-0.044
 Er-0.042
abel-0.042
ma-0.042
Token go
Token ID467
Feature activation+2.39e-02

Pos logit contributions
adoes+0.010
 scares+0.010
 echoed+0.009
 bully+0.009
 net+0.009
Neg logit contributions
 anx-0.016
ris-0.015
abel-0.014
 Er-0.014
ma-0.014
Token?
Token ID30
Feature activation-1.43e-03

Pos logit contributions
adoes+0.028
 scares+0.027
 pitch+0.026
 net+0.026
 bully+0.025
Neg logit contributions
 anx-0.046
ris-0.044
abel-0.043
innie-0.041
 Er-0.041
Tokenâ
Token ID22940
Feature activation+1.97e-03

Pos logit contributions
 anx+0.003
ris+0.002
 Er+0.002
abel+0.002
ma+0.002
Neg logit contributions
adoes-0.002
 scares-0.002
 echoed-0.002
 bully-0.002
 net-0.002
Token be
Token ID307
Feature activation+1.16e-02

Pos logit contributions
 anx+0.011
 patiently+0.010
ris+0.010
ma+0.009
 Er+0.009
Neg logit contributions
adoes-0.013
 scares-0.012
 echoed-0.012
AAAA-0.011
ickets-0.011
Token happy
Token ID3772
Feature activation-1.84e-02

Pos logit contributions
adoes+0.018
 scares+0.016
 echoed+0.016
 bully+0.016
 arrows+0.015
Neg logit contributions
 anx-0.019
ris-0.018
abel-0.017
 Er-0.017
innie-0.016
Token.
Token ID13
Feature activation-1.11e-02

Pos logit contributions
 anx+0.022
ris+0.020
abel+0.018
 Er+0.018
ma+0.018
Neg logit contributions
adoes-0.036
 scares-0.034
 echoed-0.034
 bully-0.033
 arrows-0.033
Token Even
Token ID3412
Feature activation+2.23e-02

Pos logit contributions
 anx+0.019
ris+0.018
abel+0.017
 Er+0.017
ma+0.017
Neg logit contributions
adoes-0.016
 scares-0.015
 echoed-0.014
 bully-0.014
 arrows-0.014
Token though
Token ID996
Feature activation+2.98e-02

Pos logit contributions
adoes+0.029
 scares+0.027
 echoed+0.026
 bully+0.025
 arrows+0.025
Neg logit contributions
 anx-0.041
ris-0.039
abel-0.037
 Er-0.037
ma-0.037
Token situations
Token ID7445
Feature activation+1.40e-01

Pos logit contributions
adoes+0.048
 scares+0.045
 echoed+0.044
 bully+0.043
 arrows+0.042
Neg logit contributions
 anx-0.046
ris-0.043
 Er-0.040
abel-0.040
ma-0.040
Token can
Token ID460
Feature activation-5.50e-02

Pos logit contributions
adoes+0.209
 echoed+0.183
 scares+0.183
 net+0.181
 bully+0.179
Neg logit contributions
 anx-0.226
ris-0.208
 Er-0.205
ma-0.203
abel-0.201
Token be
Token ID307
Feature activation+2.80e-02

Pos logit contributions
 anx+0.102
ris+0.099
son+0.091
innie+0.091
 Er+0.091
Neg logit contributions
adoes-0.071
 scares-0.068
 echoed-0.065
 bully-0.063
 affect-0.062
Token serious
Token ID2726
Feature activation+1.31e-02

Pos logit contributions
adoes+0.046
ickets+0.043
 racket+0.042
 bullet+0.042
 trigger+0.042
Neg logit contributions
 anx-0.039
ris-0.036
abel-0.035
ma-0.034
 Er-0.034
Token,
Token ID11
Feature activation+1.57e-02

Pos logit contributions
adoes+0.023
 scares+0.022
 net+0.021
 bully+0.021
AAAA+0.021
Neg logit contributions
 anx-0.018
 Er-0.015
abel-0.015
ris-0.015
 patiently-0.015
Token it
Token ID340
Feature activation+4.02e-02

Pos logit contributions
adoes+0.025
 scares+0.023
 echoed+0.023
 bully+0.022
 arrows+0.022
Neg logit contributions
 anx-0.024
ris-0.023
abel-0.021
 Er-0.021
ma-0.021
Token said
Token ID531
Feature activation+7.12e-03

Pos logit contributions
adoes+0.004
 scares+0.004
 echoed+0.004
 bully+0.004
 arrow+0.004
Neg logit contributions
 anx-0.004
ris-0.004
 Er-0.004
abel-0.004
ma-0.004
Token,
Token ID11
Feature activation+3.15e-02

Pos logit contributions
adoes+0.011
 scares+0.010
 echoed+0.010
 bully+0.010
 arrows+0.010
Neg logit contributions
 anx-0.011
ris-0.011
 Er-0.010
abel-0.010
ma-0.010
Token "
Token ID366
Feature activation-4.30e-03

Pos logit contributions
adoes+0.035
 echoed+0.031
 scares+0.031
 arrow+0.028
 arrows+0.028
Neg logit contributions
ris-0.065
 anx-0.063
abel-0.059
ma-0.059
 Er-0.057
TokenBut
Token ID1537
Feature activation-6.99e-03

Pos logit contributions
 anx+0.008
ris+0.007
 Er+0.007
abel+0.007
ma+0.007
Neg logit contributions
adoes-0.006
 scares-0.005
 echoed-0.005
 bully-0.005
 snail-0.005
Tokenter
Token ID353
Feature activation+1.01e-01

Pos logit contributions
 anx+0.012
ris+0.012
abel+0.011
 Er+0.011
ma+0.011
Neg logit contributions
adoes-0.010
 scares-0.009
 echoed-0.009
 bully-0.009
 arrows-0.009
Tokenflies
Token ID27959
Feature activation+1.21e-01

Pos logit contributions
adoes+0.166
 echoed+0.154
 scares+0.153
 bully+0.147
 arrows+0.146
Neg logit contributions
 anx-0.147
ris-0.138
ma-0.131
abel-0.130
 Er-0.130
Token are
Token ID389
Feature activation+1.70e-02

Pos logit contributions
adoes+0.195
 bullet+0.182
Found+0.176
 Indians+0.176
 arrow+0.176
Neg logit contributions
 first-0.163
ll-0.162
 anx-0.162
 patiently-0.150
ma-0.149
Token meant
Token ID4001
Feature activation-1.30e-02

Pos logit contributions
adoes+0.034
 echoed+0.033
ickets+0.032
 Sorry+0.031
 racket+0.031
Neg logit contributions
 anx-0.017
 first-0.017
 next-0.016
 patiently-0.016
 Er-0.015
Token to
Token ID284
Feature activation+2.02e-03

Pos logit contributions
 anx+0.012
 patiently+0.010
 next+0.010
 first+0.010
 Er+0.010
Neg logit contributions
adoes-0.027
 echoed-0.026
ickets-0.026
 scares-0.025
AAAA-0.025
Token fly
Token ID6129
Feature activation+1.92e-02

Pos logit contributions
adoes+0.003
 echoed+0.003
 scares+0.003
ickets+0.003
 racket+0.003
Neg logit contributions
 anx-0.003
 first-0.003
ris-0.003
 next-0.003
 Er-0.003
Token and
Token ID290
Feature activation+6.83e-03

Pos logit contributions
adoes+0.035
 echoed+0.033
 scares+0.033
 bully+0.032
ickets+0.031
Neg logit contributions
 anx-0.024
ris-0.022
ma-0.020
 patiently-0.020
abel-0.020
Token and
Token ID290
Feature activation+2.21e-02

Pos logit contributions
adoes+0.049
 scares+0.047
 echoed+0.046
 net+0.044
 bully+0.044
Neg logit contributions
 anx-0.042
ris-0.041
 Er-0.037
abel-0.037
ma-0.037
Token asked
Token ID1965
Feature activation+8.37e-03

Pos logit contributions
adoes+0.035
 scares+0.032
 racket+0.031
 bully+0.031
 affect+0.031
Neg logit contributions
 anx-0.035
 Er-0.032
ris-0.031
 patiently-0.031
ma-0.031
Token,
Token ID11
Feature activation+4.55e-02

Pos logit contributions
adoes+0.012
 scares+0.011
 echoed+0.011
 arrows+0.010
 mushroom+0.010
Neg logit contributions
 anx-0.014
ris-0.014
abel-0.013
 Er-0.013
ma-0.013
Token "
Token ID366
Feature activation+3.98e-03

Pos logit contributions
adoes+0.040
 scares+0.038
 echoed+0.037
 snail+0.036
 mushroom+0.036
Neg logit contributions
ris-0.100
 anx-0.096
abel-0.095
son-0.092
ma-0.091
TokenBird
Token ID42562
Feature activation+5.99e-02

Pos logit contributions
adoes+0.005
 mushroom+0.005
 scares+0.005
 arrows+0.005
 bully+0.005
Neg logit contributions
 anx-0.007
ris-0.007
abel-0.007
 Er-0.006
abella-0.006
Tokeny
Token ID88
Feature activation+1.20e-01

Pos logit contributions
adoes+0.096
 scares+0.089
 echoed+0.087
 bully+0.085
 mushroom+0.083
Neg logit contributions
 anx-0.093
ris-0.087
 Er-0.082
abel-0.081
ma-0.079
Token,
Token ID11
Feature activation+6.23e-02

Pos logit contributions
adoes+0.216
 echoed+0.200
 scares+0.200
 arrows+0.190
 bully+0.189
Neg logit contributions
 anx-0.159
ris-0.148
ma-0.144
abel-0.141
 Er-0.136
Token do
Token ID466
Feature activation+3.82e-02

Pos logit contributions
adoes+0.098
 scares+0.089
 echoed+0.089
 arrows+0.084
 bully+0.084
Neg logit contributions
 anx-0.099
ris-0.091
abel-0.088
ma-0.088
 Er-0.088
Token you
Token ID345
Feature activation+2.64e-02

Pos logit contributions
adoes+0.058
 scares+0.054
 echoed+0.052
 bully+0.051
 arrows+0.050
Neg logit contributions
 anx-0.063
ris-0.059
abel-0.055
 Er-0.055
ma-0.055
Token know
Token ID760
Feature activation+1.39e-02

Pos logit contributions
adoes+0.027
 scares+0.025
 echoed+0.024
 bully+0.024
 net+0.023
Neg logit contributions
 anx-0.055
ris-0.052
 Er-0.051
abel-0.050
ma-0.050
Token where
Token ID810
Feature activation-1.27e-02

Pos logit contributions
adoes+0.023
 scares+0.022
 echoed+0.022
 bully+0.021
 affect+0.021
Neg logit contributions
 anx-0.020
ma-0.018
abel-0.018
ris-0.018
 patiently-0.017
Token selfish
Token ID20363
Feature activation+2.08e-02

Pos logit contributions
adoes+0.020
 scares+0.018
 echoed+0.018
 mushroom+0.017
ickets+0.017
Neg logit contributions
 anx-0.020
ris-0.018
ma-0.017
 patiently-0.017
abel-0.017
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
adoes+0.045
 scares+0.043
 bully+0.042
 echoed+0.042
AAAA+0.041
Neg logit contributions
 anx-0.020
ris-0.018
ma-0.016
abel-0.016
 Er-0.015
Token They
Token ID1119
Feature activation+3.79e-03

Pos logit contributions
 anx+0.016
ris+0.015
 Er+0.014
abel+0.014
ma+0.014
Neg logit contributions
adoes-0.018
 scares-0.016
 echoed-0.016
 bully-0.016
 arrows-0.015
Token asked
Token ID1965
Feature activation+1.42e-02

Pos logit contributions
adoes+0.006
 scares+0.006
 echoed+0.006
 bully+0.006
 net+0.005
Neg logit contributions
 anx-0.006
ris-0.005
abel-0.005
 Er-0.005
ma-0.005
Token D
Token ID360
Feature activation+7.60e-02

Pos logit contributions
adoes+0.023
 arrows+0.021
 mushroom+0.021
 snail+0.021
 scares+0.021
Neg logit contributions
ris-0.023
 anx-0.021
ma-0.019
innie-0.019
abel-0.019
Tokenucky
Token ID5309
Feature activation+1.19e-01

Pos logit contributions
adoes+0.070
 scares+0.063
 bully+0.060
 echoed+0.059
 net+0.058
Neg logit contributions
 anx-0.168
ris-0.160
 Er-0.155
ma-0.153
abel-0.152
Token if
Token ID611
Feature activation+7.97e-03

Pos logit contributions
adoes+0.202
 scares+0.189
 bully+0.183
 mushroom+0.181
 echoed+0.181
Neg logit contributions
 anx-0.170
ris-0.164
abel-0.148
 Er-0.147
innie-0.145
Token they
Token ID484
Feature activation+5.09e-03

Pos logit contributions
adoes+0.010
 snail+0.009
 scares+0.009
 net+0.009
 arrows+0.009
Neg logit contributions
ris-0.016
 anx-0.015
abel-0.014
innie-0.014
ma-0.014
Token could
Token ID714
Feature activation-6.38e-03

Pos logit contributions
adoes+0.007
 scares+0.007
 echoed+0.007
 bully+0.007
 arrows+0.006
Neg logit contributions
 anx-0.009
ris-0.008
 Er-0.008
abel-0.008
ma-0.008
Token join
Token ID4654
Feature activation+5.89e-03

Pos logit contributions
 anx+0.012
ris+0.011
abel+0.010
 Er+0.010
ma+0.010
Neg logit contributions
adoes-0.008
 scares-0.008
 echoed-0.008
 bully-0.007
 arrows-0.007
Token in
Token ID287
Feature activation+3.55e-02

Pos logit contributions
adoes+0.011
 scares+0.010
 echoed+0.010
 bully+0.010
 arrows+0.010
Neg logit contributions
 anx-0.008
ris-0.007
abel-0.006
 Er-0.006
ma-0.006
Token,
Token ID11
Feature activation+2.52e-02

Pos logit contributions
adoes+0.061
 echoed+0.055
 scares+0.055
 mushroom+0.054
 arrow+0.053
Neg logit contributions
ris-0.073
 anx-0.071
icer-0.065
rom-0.063
 Er-0.063
Token "
Token ID366
Feature activation-1.01e-02

Pos logit contributions
adoes+0.027
 echoed+0.025
 scares+0.024
 bully+0.022
 tail+0.022
Neg logit contributions
ris-0.052
 anx-0.050
abel-0.047
ma-0.046
innie-0.046
TokenWell
Token ID5779
Feature activation-1.00e-02

Pos logit contributions
 anx+0.020
ris+0.017
 patiently+0.017
abel+0.016
ma+0.016
Neg logit contributions
adoes-0.012
ickets-0.011
 echoed-0.011
 scares-0.011
 arrows-0.011
Token,
Token ID11
Feature activation+4.70e-02

Pos logit contributions
 anx+0.017
ris+0.017
 Er+0.016
rom+0.015
abel+0.015
Neg logit contributions
adoes-0.014
 scares-0.013
 echoed-0.013
 bully-0.012
 net-0.012
Token one
Token ID530
Feature activation+3.41e-03

Pos logit contributions
adoes+0.071
 scares+0.066
 echoed+0.064
 bully+0.063
 arrows+0.061
Neg logit contributions
 anx-0.077
ris-0.072
abel-0.068
 Er-0.068
ma-0.068
Token way
Token ID835
Feature activation+1.19e-01

Pos logit contributions
adoes+0.006
 scares+0.005
 echoed+0.005
 bully+0.005
 arrows+0.005
Neg logit contributions
 anx-0.005
ris-0.004
ma-0.004
abel-0.004
 Er-0.004
Token you
Token ID345
Feature activation+1.70e-02

Pos logit contributions
adoes+0.174
 scares+0.158
 echoed+0.153
 bully+0.151
 net+0.149
Neg logit contributions
 anx-0.202
ris-0.195
abel-0.181
 Er-0.179
innie-0.176
Token can
Token ID460
Feature activation-5.01e-02

Pos logit contributions
adoes+0.024
 scares+0.022
 echoed+0.022
 bully+0.021
 arrows+0.021
Neg logit contributions
 anx-0.030
ris-0.028
abel-0.026
 Er-0.026
ma-0.026
Token learn
Token ID2193
Feature activation-1.85e-03

Pos logit contributions
 anx+0.087
ris+0.082
abel+0.077
 Er+0.077
ma+0.076
Neg logit contributions
adoes-0.071
 scares-0.066
 echoed-0.064
 bully-0.062
 net-0.061
Token news
Token ID1705
Feature activation-1.23e-03

Pos logit contributions
 anx+0.002
ris+0.002
ma+0.002
abel+0.002
 Er+0.002
Neg logit contributions
adoes-0.003
 bully-0.003
 echoed-0.003
 scares-0.003
 bullet-0.003
Token is
Token ID318
Feature activation-3.56e-02

Pos logit contributions
 anx+0.002
ris+0.002
abel+0.002
ma+0.002
 Er+0.002
Neg logit contributions
adoes-0.002
 scares-0.002
 echoed-0.002
 bully-0.002
 net-0.002
Token a
Token ID257
Feature activation-5.53e-04

Pos logit contributions
 anx+0.031
rom+0.031
ris+0.031
ll+0.029
abel+0.029
Neg logit contributions
 arrow-0.021
 arrows-0.021
 net-0.020
 scares-0.020
 tail-0.020
Token swim
Token ID9422
Feature activation-1.14e-02

Pos logit contributions
 anx+0.001
rom+0.001
ris+0.001
 Er+0.001
abel+0.001
Neg logit contributions
 net-0.001
 pitch-0.001
 bully-0.001
 snail-0.001
 arrow-0.001
Token.
Token ID13
Feature activation-1.31e-02

Pos logit contributions
 anx+0.017
ris+0.016
 Er+0.016
innie+0.016
abel+0.016
Neg logit contributions
adoes-0.018
 scares-0.017
 net-0.017
 echoed-0.017
 pitch-0.017
Token He
Token ID679
Feature activation-1.36e-02

Pos logit contributions
 anx+0.019
ris+0.017
 Er+0.017
abel+0.016
ma+0.016
Neg logit contributions
adoes-0.023
 scares-0.021
 bully-0.021
 echoed-0.021
 affect-0.020
Token asked
Token ID1965
Feature activation+1.95e-02

Pos logit contributions
 anx+0.024
ris+0.023
 Er+0.023
innie+0.022
ma+0.022
Neg logit contributions
 echoed-0.017
 scares-0.016
adoes-0.016
 net-0.016
 arrow-0.015
Token all
Token ID477
Feature activation+1.19e-01

Pos logit contributions
adoes+0.031
 echoed+0.029
 scares+0.028
 arrows+0.028
 arrow+0.028
Neg logit contributions
ris-0.030
 anx-0.029
rom-0.027
abel-0.027
 Er-0.027
Token his
Token ID465
Feature activation+3.25e-02

Pos logit contributions
adoes+0.233
 scares+0.216
 bully+0.211
 echoed+0.211
 affect+0.209
Neg logit contributions
 anx-0.146
ma-0.131
ris-0.130
abel-0.127
 Er-0.126
Token friends
Token ID2460
Feature activation+8.59e-02

Pos logit contributions
 net+0.028
adoes+0.028
 scares+0.027
 arrows+0.027
 arrow+0.026
Neg logit contributions
 anx-0.070
ris-0.069
rom-0.066
 Er-0.065
abel-0.064
Token to
Token ID284
Feature activation+2.30e-02

Pos logit contributions
adoes+0.121
 echoed+0.118
 net+0.116
 scares+0.113
 snail+0.113
Neg logit contributions
ris-0.133
 anx-0.132
innie-0.124
icer-0.121
 Er-0.119
Token come
Token ID1282
Feature activation-3.10e-02

Pos logit contributions
adoes+0.030
 scares+0.028
 echoed+0.027
 bully+0.027
 net+0.027
Neg logit contributions
 anx-0.042
ris-0.040
abel-0.038
 Er-0.037
ma-0.037
Token swimming
Token ID14899
Feature activation+9.38e-03

Pos logit contributions
abel+0.052
 anx+0.051
innie+0.051
ris+0.051
son+0.050
Neg logit contributions
adoes-0.041
 pitch-0.041
 net-0.039
 tail-0.039
 arrow-0.037
Token green
Token ID4077
Feature activation+3.08e-04

Pos logit contributions
 anx+0.007
ris+0.007
abel+0.006
 Er+0.006
ma+0.006
Neg logit contributions
adoes-0.006
 scares-0.006
 echoed-0.006
 bully-0.006
 arrows-0.006
Token paint
Token ID7521
Feature activation-4.87e-02

Pos logit contributions
adoes+0.000
 scares+0.000
 echoed+0.000
 bully+0.000
 arrows+0.000
Neg logit contributions
 anx-0.001
ris-0.000
abel-0.000
ma-0.000
 Er-0.000
Token.
Token ID13
Feature activation-8.78e-03

Pos logit contributions
 anx+0.075
ris+0.069
 Er+0.065
innie+0.064
abel+0.064
Neg logit contributions
adoes-0.079
 echoed-0.074
 scares-0.071
 pitch-0.071
 net-0.070
Token He
Token ID679
Feature activation-1.29e-02

Pos logit contributions
 anx+0.015
ris+0.014
abel+0.013
ma+0.013
innie+0.013
Neg logit contributions
adoes-0.012
 scares-0.011
 echoed-0.011
 pitch-0.011
 net-0.011
Token asked
Token ID1965
Feature activation+1.94e-02

Pos logit contributions
 anx+0.021
ris+0.021
 Er+0.020
innie+0.019
ma+0.019
Neg logit contributions
adoes-0.017
 echoed-0.017
 scares-0.017
 arrow-0.016
 net-0.015
Token little
Token ID1310
Feature activation+1.18e-01

Pos logit contributions
adoes+0.021
 echoed+0.021
 scares+0.019
 squash+0.019
 bully+0.019
Neg logit contributions
ris-0.035
 anx-0.032
innie-0.031
ll-0.031
icer-0.031
Token Toby
Token ID37578
Feature activation+4.59e-02

Pos logit contributions
 bug+0.134
adoes+0.133
 snail+0.131
 frogs+0.130
 scares+0.127
Neg logit contributions
innie-0.216
ris-0.212
 anx-0.210
ve-0.193
Acc-0.192
Token,
Token ID11
Feature activation+4.10e-02

Pos logit contributions
adoes+0.071
 scares+0.067
 echoed+0.066
 bully+0.065
 mushroom+0.065
Neg logit contributions
 anx-0.069
ris-0.067
abel-0.061
ma-0.060
innie-0.060
Token who
Token ID508
Feature activation-1.26e-03

Pos logit contributions
 echoed+0.031
 snail+0.030
 tail+0.029
 pitch+0.029
adoes+0.029
Neg logit contributions
ris-0.092
 anx-0.088
abel-0.085
icer-0.084
ma-0.083
Token was
Token ID373
Feature activation+1.79e-02

Pos logit contributions
 anx+0.002
ris+0.002
rom+0.002
innie+0.002
 Er+0.002
Neg logit contributions
 scares-0.002
 echoed-0.002
 mushroom-0.002
 bully-0.002
adoes-0.002
Token three
Token ID1115
Feature activation-5.46e-03

Pos logit contributions
adoes+0.024
 scares+0.024
 bully+0.024
 pitch+0.023
 echoed+0.023
Neg logit contributions
ris-0.030
 anx-0.028
innie-0.026
ll-0.025
abel-0.025
Token eat
Token ID4483
Feature activation+2.49e-02

Pos logit contributions
ris+0.012
 anx+0.012
innie+0.011
abel+0.011
rom+0.011
Neg logit contributions
adoes-0.007
 scares-0.007
 net-0.006
 bully-0.006
 echoed-0.006
Token.
Token ID13
Feature activation+2.15e-02

Pos logit contributions
adoes+0.031
 scares+0.029
 net+0.029
 pitch+0.028
 mushroom+0.028
Neg logit contributions
ris-0.046
 anx-0.045
innie-0.042
abel-0.042
rom-0.042
Token They
Token ID1119
Feature activation+1.98e-02

Pos logit contributions
adoes+0.029
 scares+0.027
 echoed+0.026
 bully+0.026
 arrows+0.025
Neg logit contributions
 anx-0.039
ris-0.036
 Er-0.035
abel-0.035
ma-0.034
Token say
Token ID910
Feature activation+1.51e-02

Pos logit contributions
 scares+0.015
 bully+0.015
 net+0.014
 mushroom+0.014
 scatter+0.014
Neg logit contributions
 anx-0.044
ris-0.043
innie-0.042
rom-0.042
 Er-0.040
Token "
Token ID366
Feature activation-9.83e-03

Pos logit contributions
adoes+0.024
 scares+0.022
 echoed+0.021
 bully+0.021
 arrows+0.021
Neg logit contributions
 anx-0.024
ris-0.022
abel-0.021
 Er-0.021
ma-0.021
Tokeny
Token ID88
Feature activation+1.18e-01

Pos logit contributions
 anx+0.018
ris+0.016
abel+0.016
 patiently+0.016
 Er+0.015
Neg logit contributions
 echoed-0.013
ickets-0.012
adoes-0.012
 scares-0.012
 bully-0.012
Tokenum
Token ID388
Feature activation+5.02e-02

Pos logit contributions
adoes+0.179
 scares+0.166
 echoed+0.161
 bully+0.158
 net+0.155
Neg logit contributions
 anx-0.193
ris-0.182
abel-0.171
 Er-0.170
ma-0.169
Token"
Token ID1
Feature activation-2.89e-02

Pos logit contributions
adoes+0.061
 scares+0.055
 mushroom+0.053
 net+0.053
 bully+0.053
Neg logit contributions
 anx-0.099
ris-0.091
rom-0.085
abel-0.085
 Er-0.084
Token and
Token ID290
Feature activation+5.52e-02

Pos logit contributions
 anx+0.044
ris+0.042
 Er+0.040
abel+0.039
innie+0.038
Neg logit contributions
adoes-0.047
 echoed-0.044
 scares-0.043
 bully-0.041
 arrows-0.041
Token "
Token ID366
Feature activation-1.03e-02

Pos logit contributions
adoes+0.084
 scares+0.078
 echoed+0.076
 bully+0.074
 arrows+0.073
Neg logit contributions
 anx-0.090
ris-0.084
 Er-0.080
abel-0.080
ma-0.079
Tokendel
Token ID12381
Feature activation-5.35e-02

Pos logit contributions
 anx+0.019
abel+0.017
ris+0.016
ma+0.016
innie+0.016
Neg logit contributions
ickets-0.013
 affect-0.012
 echoed-0.012
adoes-0.012
 bully-0.012
Token very
Token ID845
Feature activation-2.28e-02

Pos logit contributions
adoes+0.022
 scares+0.021
 echoed+0.020
 bully+0.020
 arrows+0.020
Neg logit contributions
 anx-0.020
ris-0.019
abel-0.018
 Er-0.018
ma-0.017
Token worried
Token ID7960
Feature activation-2.00e-02

Pos logit contributions
 anx+0.043
ris+0.040
abel+0.038
 Er+0.038
ma+0.038
Neg logit contributions
adoes-0.029
 scares-0.027
 echoed-0.026
 bully-0.025
 arrows-0.025
Token.
Token ID13
Feature activation-7.55e-03

Pos logit contributions
ris+0.027
 anx+0.027
innie+0.024
ma+0.023
 Er+0.023
Neg logit contributions
adoes-0.036
 echoed-0.034
 scares-0.033
 arrows-0.031
 bully-0.031
Token He
Token ID679
Feature activation-1.72e-02

Pos logit contributions
 anx+0.012
ris+0.012
ma+0.011
abel+0.011
 Er+0.011
Neg logit contributions
adoes-0.012
 scares-0.011
 echoed-0.011
 bully-0.010
 arrows-0.010
Token asked
Token ID1965
Feature activation+1.81e-02

Pos logit contributions
ris+0.030
 anx+0.030
ma+0.029
 Er+0.028
innie+0.028
Neg logit contributions
 echoed-0.022
adoes-0.022
 scares-0.022
 arrow-0.020
 arrows-0.020
Token all
Token ID477
Feature activation+1.18e-01

Pos logit contributions
adoes+0.028
 echoed+0.025
 scares+0.025
 mushroom+0.025
 bully+0.024
Neg logit contributions
ris-0.030
 anx-0.027
rom-0.027
innie-0.025
icer-0.025
Token the
Token ID262
Feature activation+3.53e-02

Pos logit contributions
adoes+0.206
 scares+0.191
 echoed+0.188
 bully+0.186
AAAA+0.182
Neg logit contributions
 anx-0.167
ris-0.150
ma-0.148
abel-0.147
 Er-0.146
Token other
Token ID584
Feature activation+5.33e-02

Pos logit contributions
adoes+0.038
 scares+0.037
 bully+0.034
 net+0.034
 arrows+0.033
Neg logit contributions
ris-0.072
 anx-0.072
innie-0.067
 Er-0.066
abel-0.066
Token animals
Token ID4695
Feature activation+5.44e-02

Pos logit contributions
adoes+0.058
 scares+0.053
 echoed+0.050
 bully+0.048
 arrows+0.048
Neg logit contributions
 anx-0.111
ris-0.107
 Er-0.100
abel-0.100
innie-0.099
Token what
Token ID644
Feature activation-3.98e-02

Pos logit contributions
adoes+0.093
 scares+0.088
 echoed+0.086
 bully+0.084
 net+0.083
Neg logit contributions
 anx-0.077
ris-0.072
 Er-0.067
abel-0.067
ma-0.066
Token was
Token ID373
Feature activation-9.56e-03

Pos logit contributions
 anx+0.052
ris+0.048
 Er+0.045
abel+0.045
ma+0.045
Neg logit contributions
adoes-0.073
 scares-0.067
 echoed-0.065
 bully-0.064
 arrows-0.064
Token started
Token ID2067
Feature activation+2.69e-02

Pos logit contributions
 anx+0.002
ris+0.002
 Er+0.002
ma+0.002
rom+0.002
Neg logit contributions
adoes-0.002
 echoed-0.002
 arrow-0.001
 scares-0.001
 arrows-0.001
Token to
Token ID284
Feature activation+4.14e-04

Pos logit contributions
adoes+0.050
 scares+0.048
 echoed+0.045
 net+0.045
 bully+0.045
Neg logit contributions
ris-0.035
 anx-0.033
abel-0.030
ma-0.030
innie-0.029
Token cry
Token ID3960
Feature activation+7.41e-02

Pos logit contributions
adoes+0.000
 scares+0.000
 echoed+0.000
 arrows+0.000
ickets+0.000
Neg logit contributions
 anx-0.001
ris-0.001
 Er-0.001
abel-0.001
ma-0.001
Token and
Token ID290
Feature activation-4.39e-03

Pos logit contributions
adoes+0.113
 echoed+0.107
 scares+0.099
 tail+0.098
 arrow+0.097
Neg logit contributions
ris-0.121
 anx-0.116
ma-0.111
 Er-0.106
son-0.106
Token asked
Token ID1965
Feature activation+1.24e-02

Pos logit contributions
 anx+0.007
 patiently+0.006
abel+0.006
ma+0.006
 Er+0.006
Neg logit contributions
adoes-0.007
ickets-0.007
 affect-0.006
 scares-0.006
 arrows-0.006
Token all
Token ID477
Feature activation+1.18e-01

Pos logit contributions
adoes+0.017
 echoed+0.016
 mushroom+0.015
 scares+0.015
 arrows+0.015
Neg logit contributions
ris-0.022
 anx-0.020
icer-0.019
 Er-0.019
rom-0.019
Token of
Token ID286
Feature activation+5.08e-02

Pos logit contributions
adoes+0.228
 scares+0.212
 echoed+0.209
 bully+0.207
 affect+0.206
Neg logit contributions
 anx-0.149
ma-0.133
abel-0.131
ris-0.129
 Er-0.126
Token the
Token ID262
Feature activation+2.74e-02

Pos logit contributions
adoes+0.063
 mushroom+0.059
 scares+0.056
 web+0.055
 echoed+0.055
Neg logit contributions
ris-0.095
rom-0.088
 anx-0.087
son-0.086
 Er-0.085
Token animals
Token ID4695
Feature activation+5.45e-02

Pos logit contributions
adoes+0.037
 scares+0.035
 echoed+0.033
 bully+0.032
 net+0.031
Neg logit contributions
 anx-0.050
ris-0.048
abel-0.045
 Er-0.045
ma-0.044
Token to
Token ID284
Feature activation+1.10e-02

Pos logit contributions
adoes+0.094
 scares+0.089
 echoed+0.088
 bully+0.086
 net+0.084
Neg logit contributions
 anx-0.076
ris-0.072
 Er-0.066
abel-0.066
ma-0.065
Token help
Token ID1037
Feature activation+1.84e-02

Pos logit contributions
adoes+0.017
 scares+0.015
 echoed+0.015
 bully+0.015
 arrows+0.014
Neg logit contributions
 anx-0.018
ris-0.017
ma-0.016
 Er-0.016
abel-0.016
Token coming
Token ID2406
Feature activation-2.21e-02

Pos logit contributions
 anx+0.042
ris+0.038
 Er+0.035
ma+0.035
abel+0.035
Neg logit contributions
adoes-0.070
 scares-0.066
 echoed-0.064
 bully-0.064
 arrows-0.063
Token from
Token ID422
Feature activation+7.06e-03

Pos logit contributions
 anx+0.041
ris+0.040
abel+0.039
innie+0.038
abella+0.037
Neg logit contributions
adoes-0.029
 scares-0.025
 echoed-0.025
 arrow-0.024
 pitch-0.024
Token.
Token ID13
Feature activation+6.66e-04

Pos logit contributions
adoes+0.012
 scares+0.011
 net+0.011
 arrow+0.011
 pitch+0.011
Neg logit contributions
 anx-0.010
ris-0.010
abel-0.009
innie-0.009
 Er-0.009
Token He
Token ID679
Feature activation-1.53e-02

Pos logit contributions
adoes+0.001
 scares+0.001
 echoed+0.001
 bully+0.001
 net+0.001
Neg logit contributions
 anx-0.001
ris-0.001
ma-0.001
abel-0.001
 Er-0.001
Token asked
Token ID1965
Feature activation+2.12e-02

Pos logit contributions
 anx+0.027
ris+0.026
 Er+0.025
ma+0.025
innie+0.024
Neg logit contributions
 echoed-0.019
adoes-0.019
 scares-0.018
 arrow-0.018
 arrows-0.017
Token all
Token ID477
Feature activation+1.18e-01

Pos logit contributions
adoes+0.031
 arrows+0.028
 arrow+0.028
 mushroom+0.027
 echoed+0.027
Neg logit contributions
ris-0.036
 anx-0.034
icer-0.032
abel-0.032
innie-0.032
Token the
Token ID262
Feature activation+3.53e-02

Pos logit contributions
adoes+0.217
 scares+0.201
 bully+0.197
 echoed+0.197
 affect+0.194
Neg logit contributions
 anx-0.158
ma-0.140
ris-0.139
abel-0.137
 Er-0.136
Token other
Token ID584
Feature activation+5.34e-02

Pos logit contributions
adoes+0.047
 scares+0.043
 echoed+0.042
 bully+0.041
 arrows+0.040
Neg logit contributions
 anx-0.065
ris-0.061
abel-0.058
 Er-0.058
ma-0.057
Token animals
Token ID4695
Feature activation+5.41e-02

Pos logit contributions
adoes+0.061
 scares+0.056
 echoed+0.054
 bully+0.052
 arrows+0.051
Neg logit contributions
 anx-0.107
ris-0.102
abel-0.097
 Er-0.097
ma-0.096
Token,
Token ID11
Feature activation+3.75e-02

Pos logit contributions
adoes+0.094
 scares+0.089
 echoed+0.088
 bully+0.085
 net+0.084
Neg logit contributions
 anx-0.075
ris-0.070
abel-0.065
 Er-0.065
ma-0.064
Token but
Token ID475
Feature activation+1.81e-02

Pos logit contributions
adoes+0.052
 scares+0.049
 echoed+0.049
 net+0.047
 arrow+0.047
Neg logit contributions
ris-0.063
 anx-0.063
abel-0.057
ma-0.057
innie-0.057
Token ground
Token ID2323
Feature activation+5.28e-02

Pos logit contributions
 anx+0.001
ris+0.001
abel+0.001
 Er+0.001
ma+0.001
Neg logit contributions
adoes-0.001
 scares-0.001
 echoed-0.001
 bully-0.000
 arrows-0.000
Token.
Token ID13
Feature activation+3.09e-03

Pos logit contributions
adoes+0.097
 echoed+0.091
 scares+0.091
 pitch+0.087
 net+0.086
Neg logit contributions
 anx-0.069
ris-0.064
 Er-0.060
abel-0.059
innie-0.058
Token She
Token ID1375
Feature activation+5.43e-03

Pos logit contributions
adoes+0.004
 scares+0.004
 echoed+0.004
 bully+0.003
 pitch+0.003
Neg logit contributions
 anx-0.006
ris-0.005
abel-0.005
ma-0.005
 Er-0.005
Token asked
Token ID1965
Feature activation+2.29e-02

Pos logit contributions
adoes+0.007
 echoed+0.007
 scares+0.007
 arrow+0.007
 bully+0.007
Neg logit contributions
 anx-0.009
ris-0.009
 Er-0.008
abel-0.008
ma-0.008
Token John
Token ID1757
Feature activation+1.76e-02

Pos logit contributions
adoes+0.033
 echoed+0.029
 scares+0.026
 arrow+0.026
 bully+0.026
Neg logit contributions
ris-0.038
icer-0.034
lan-0.033
ll-0.033
rou-0.033
Tokeny
Token ID88
Feature activation+1.18e-01

Pos logit contributions
adoes+0.024
 mushroom+0.024
 scares+0.022
igator+0.022
 arrow+0.022
Neg logit contributions
 anx-0.028
ris-0.026
rom-0.024
icer-0.024
abel-0.023
Token not
Token ID407
Feature activation+1.20e-02

Pos logit contributions
adoes+0.167
 scares+0.151
 mushroom+0.146
 echoed+0.144
 bully+0.142
Neg logit contributions
ris-0.201
 anx-0.194
icer-0.176
innie-0.173
abel-0.172
Token to
Token ID284
Feature activation+2.94e-02

Pos logit contributions
adoes+0.027
 scares+0.026
 echoed+0.025
 bully+0.025
 arrows+0.024
Neg logit contributions
 anx-0.011
ris-0.010
ma-0.009
 Er-0.009
abel-0.009
Token touch
Token ID3638
Feature activation+2.49e-02

Pos logit contributions
adoes+0.042
 scares+0.039
 echoed+0.037
 arrows+0.036
 net+0.036
Neg logit contributions
 anx-0.051
ris-0.048
ma-0.046
abel-0.045
 Er-0.045
Token it
Token ID340
Feature activation+1.49e-02

Pos logit contributions
adoes+0.035
 scares+0.031
 echoed+0.030
 arrows+0.029
 bully+0.029
Neg logit contributions
 anx-0.045
ris-0.043
abel-0.041
innie-0.040
 Er-0.040
Token but
Token ID475
Feature activation+1.28e-02

Pos logit contributions
adoes+0.028
 scares+0.026
 echoed+0.025
 bully+0.025
 arrows+0.024
Neg logit contributions
 anx-0.019
ris-0.018
abel-0.017
 Er-0.016
ma-0.016
Token tall
Token ID7331
Feature activation-4.30e-02

Pos logit contributions
 anx+0.071
 Er+0.066
ris+0.066
ma+0.065
son+0.065
Neg logit contributions
 mushroom-0.051
 net-0.050
 bully-0.049
 pitch-0.048
 web-0.048
Token mountain
Token ID8598
Feature activation-5.39e-02

Pos logit contributions
 anx+0.077
ris+0.073
 Er+0.070
abel+0.069
ma+0.069
Neg logit contributions
adoes-0.058
 scares-0.053
 echoed-0.051
 bully-0.051
 net-0.050
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
 anx+0.073
innie+0.066
ris+0.065
 Er+0.065
abel+0.064
Neg logit contributions
adoes-0.095
 scares-0.091
 echoed-0.089
 bully-0.087
 net-0.087
Token He
Token ID679
Feature activation-1.92e-02

Pos logit contributions
 anx+0.018
ris+0.016
abel+0.015
ma+0.015
 patiently+0.015
Neg logit contributions
adoes-0.018
 scares-0.017
 echoed-0.017
 bully-0.016
 mushroom-0.016
Token asked
Token ID1965
Feature activation+1.66e-02

Pos logit contributions
 anx+0.034
 Er+0.033
ris+0.033
innie+0.032
rom+0.032
Neg logit contributions
 scares-0.023
 echoed-0.023
adoes-0.022
 arrow-0.022
 arrows-0.022
Token all
Token ID477
Feature activation+1.18e-01

Pos logit contributions
adoes+0.029
 scares+0.027
 echoed+0.026
 arrows+0.026
 bully+0.026
Neg logit contributions
 anx-0.023
ris-0.023
abel-0.021
 Er-0.021
ma-0.020
Token his
Token ID465
Feature activation+2.25e-02

Pos logit contributions
adoes+0.240
 scares+0.223
 echoed+0.219
 bully+0.218
 affect+0.216
Neg logit contributions
 anx-0.133
ris-0.118
ma-0.115
abel-0.113
 Er-0.112
Token friends
Token ID2460
Feature activation+8.54e-02

Pos logit contributions
adoes+0.023
 scares+0.022
 arrows+0.022
 arrow+0.021
 tail+0.021
Neg logit contributions
 anx-0.047
ris-0.045
 Er-0.043
rom-0.043
abel-0.043
Token to
Token ID284
Feature activation+1.91e-02

Pos logit contributions
adoes+0.155
 scares+0.146
 echoed+0.143
 bully+0.140
 net+0.139
Neg logit contributions
 anx-0.112
ris-0.105
abel-0.097
 Er-0.096
ma-0.095
Token join
Token ID4654
Feature activation+4.88e-03

Pos logit contributions
adoes+0.028
 scares+0.026
 echoed+0.025
 bully+0.024
 arrows+0.024
Neg logit contributions
 anx-0.032
ris-0.031
ma-0.029
 Er-0.029
abel-0.028
Token him
Token ID683
Feature activation-5.85e-03

Pos logit contributions
adoes+0.009
 arrows+0.009
 scares+0.009
 echoed+0.009
 net+0.009
Neg logit contributions
 anx-0.006
ris-0.006
abel-0.005
 Er-0.005
innie-0.005
Token was
Token ID373
Feature activation-1.03e-02

Pos logit contributions
 anx+0.009
ris+0.008
ma+0.008
abel+0.008
 Er+0.008
Neg logit contributions
adoes-0.011
 scares-0.010
 echoed-0.010
 bully-0.010
 net-0.010
Token hidden
Token ID7104
Feature activation+7.20e-03

Pos logit contributions
 anx+0.014
ris+0.013
abel+0.012
 Er+0.012
ma+0.012
Neg logit contributions
adoes-0.018
 scares-0.017
 echoed-0.017
 bully-0.017
 net-0.016
Token.
Token ID13
Feature activation+2.25e-03

Pos logit contributions
adoes+0.011
 scares+0.010
 arrows+0.010
 echoed+0.010
 mushroom+0.010
Neg logit contributions
 anx-0.011
innie-0.011
ris-0.011
abella-0.010
abel-0.010
Token He
Token ID679
Feature activation-1.11e-02

Pos logit contributions
adoes+0.003
 scares+0.003
 echoed+0.003
 bully+0.003
 arrows+0.003
Neg logit contributions
 anx-0.004
ris-0.004
ma-0.004
abel-0.003
 Er-0.003
Token asked
Token ID1965
Feature activation+1.98e-02

Pos logit contributions
 anx+0.020
ris+0.019
 Er+0.019
ma+0.019
innie+0.018
Neg logit contributions
 echoed-0.014
 scares-0.013
adoes-0.013
 arrow-0.013
 arrows-0.013
Token all
Token ID477
Feature activation+1.18e-01

Pos logit contributions
adoes+0.029
 mushroom+0.026
 scares+0.026
 arrow+0.026
 arrows+0.026
Neg logit contributions
ris-0.034
 anx-0.031
rom-0.030
innie-0.030
abel-0.029
Token his
Token ID465
Feature activation+2.76e-02

Pos logit contributions
adoes+0.214
 scares+0.201
 echoed+0.197
 bully+0.194
 affect+0.189
Neg logit contributions
 anx-0.158
ris-0.144
ma-0.138
abel-0.137
 Er-0.136
Token servants
Token ID17523
Feature activation+2.66e-02

Pos logit contributions
adoes+0.029
 scares+0.027
 arrows+0.027
 arrow+0.026
 net+0.026
Neg logit contributions
 anx-0.057
ris-0.054
rom-0.053
 Er-0.052
abel-0.051
Token,
Token ID11
Feature activation+4.45e-02

Pos logit contributions
adoes+0.043
 echoed+0.041
 scares+0.041
 mushroom+0.040
 net+0.040
Neg logit contributions
 anx-0.037
ris-0.035
innie-0.034
icer-0.033
abel-0.032
Token but
Token ID475
Feature activation+1.53e-02

Pos logit contributions
 echoed+0.037
adoes+0.037
 scares+0.036
 snail+0.036
 arrow+0.036
Neg logit contributions
ris-0.101
 anx-0.092
icer-0.091
abel-0.089
son-0.088
Token no
Token ID645
Feature activation+2.56e-02

Pos logit contributions
adoes+0.023
 echoed+0.022
 arrow+0.021
 net+0.021
 arrows+0.021
Neg logit contributions
ris-0.025
 anx-0.024
innie-0.023
abel-0.022
son-0.021
Token said
Token ID531
Feature activation-6.51e-03

Pos logit contributions
adoes+0.008
 scares+0.008
 echoed+0.007
 bully+0.007
 net+0.007
Neg logit contributions
 anx-0.007
ris-0.007
abel-0.007
ma-0.006
 Er-0.006
Token.
Token ID13
Feature activation+7.79e-03

Pos logit contributions
ris+0.010
 anx+0.009
innie+0.008
icer+0.008
abel+0.008
Neg logit contributions
adoes-0.011
 echoed-0.011
 scares-0.010
 bully-0.010
 tail-0.010
Token "
Token ID366
Feature activation-1.06e-02

Pos logit contributions
adoes+0.014
 scares+0.012
 affect+0.012
 arrows+0.012
 arrow+0.012
Neg logit contributions
 anx-0.011
 Er-0.010
ris-0.010
 patiently-0.010
abel-0.009
TokenBut
Token ID1537
Feature activation-6.35e-03

Pos logit contributions
 anx+0.017
ris+0.016
abel+0.015
ma+0.015
 Er+0.014
Neg logit contributions
adoes-0.017
 scares-0.016
 echoed-0.015
 bully-0.015
 arrows-0.015
Tokenter
Token ID353
Feature activation+9.23e-02

Pos logit contributions
 anx+0.011
ma+0.010
 Er+0.010
ris+0.010
abel+0.010
Neg logit contributions
adoes-0.008
 scares-0.008
 echoed-0.008
 bully-0.007
ickets-0.007
Tokenflies
Token ID27959
Feature activation+1.17e-01

Pos logit contributions
adoes+0.143
 scares+0.132
 echoed+0.130
 bully+0.126
 arrows+0.125
Neg logit contributions
 anx-0.146
ris-0.137
abel-0.130
ma-0.129
 Er-0.129
Token are
Token ID389
Feature activation+2.31e-02

Pos logit contributions
adoes+0.187
 bullet+0.169
 Indians+0.168
 target+0.160
 arrow+0.159
Neg logit contributions
 anx-0.170
 first-0.166
ma-0.161
ll-0.161
abel-0.158
Token free
Token ID1479
Feature activation+5.00e-02

Pos logit contributions
adoes+0.045
ickets+0.043
 echoed+0.042
AAAA+0.040
 affect+0.040
Neg logit contributions
 anx-0.025
 first-0.024
 next-0.023
 patiently-0.022
 Er-0.021
Token.
Token ID13
Feature activation+2.51e-02

Pos logit contributions
adoes+0.098
 scares+0.092
 echoed+0.091
 bully+0.090
 arrows+0.087
Neg logit contributions
 anx-0.059
ris-0.054
ma-0.050
abel-0.050
 Er-0.048
Token They
Token ID1119
Feature activation+2.74e-02

Pos logit contributions
adoes+0.036
 echoed+0.034
 bully+0.033
 racket+0.033
 affect+0.033
Neg logit contributions
 anx-0.038
 Er-0.037
abel-0.036
ris-0.035
rom-0.035
Token need
Token ID761
Feature activation+4.37e-02

Pos logit contributions
adoes+0.043
 echoed+0.037
AAAA+0.036
ickets+0.036
 bullet+0.036
Neg logit contributions
 anx-0.040
 first-0.039
ris-0.038
 Er-0.037
abel-0.037
Token was
Token ID373
Feature activation-7.67e-03

Pos logit contributions
adoes+0.065
 scares+0.062
 echoed+0.061
 bully+0.058
 mushroom+0.057
Neg logit contributions
 anx-0.058
ris-0.055
 Er-0.051
abel-0.049
ma-0.049
Token light
Token ID1657
Feature activation-6.86e-03

Pos logit contributions
ris+0.012
 anx+0.012
 Er+0.010
abel+0.010
innie+0.010
Neg logit contributions
adoes-0.012
 scares-0.011
 echoed-0.011
 pitch-0.011
 tail-0.011
Token.
Token ID13
Feature activation+2.94e-03

Pos logit contributions
 anx+0.010
ris+0.010
innie+0.009
 Er+0.009
son+0.009
Neg logit contributions
adoes-0.012
 echoed-0.011
 scares-0.010
 pitch-0.010
 mushroom-0.010
Token She
Token ID1375
Feature activation+5.21e-03

Pos logit contributions
adoes+0.004
 scares+0.004
 echoed+0.004
 bully+0.004
 mushroom+0.004
Neg logit contributions
 anx-0.005
ris-0.005
ma-0.004
abel-0.004
 Er-0.004
Token asked
Token ID1965
Feature activation+2.31e-02

Pos logit contributions
adoes+0.007
 echoed+0.007
 scares+0.007
 bully+0.006
 arrow+0.006
Neg logit contributions
 anx-0.009
ris-0.009
 Er-0.008
innie-0.008
ma-0.008
Token little
Token ID1310
Feature activation+1.17e-01

Pos logit contributions
adoes+0.028
 echoed+0.024
 mushroom+0.023
 squash+0.022
 arrow+0.022
Neg logit contributions
ris-0.048
vy-0.040
ll-0.039
 anx-0.039
icer-0.039
Token Jake
Token ID14757
Feature activation+1.22e-02

Pos logit contributions
adoes+0.148
 bug+0.131
 hoop+0.127
 mushroom+0.127
 arrow+0.126
Neg logit contributions
ris-0.230
 anx-0.217
innie-0.202
vy-0.200
rom-0.196
Token to
Token ID284
Feature activation+3.62e-02

Pos logit contributions
adoes+0.015
 scares+0.015
 echoed+0.014
 bully+0.014
 mushroom+0.014
Neg logit contributions
 anx-0.021
ris-0.021
icer-0.020
rom-0.019
innie-0.019
Token come
Token ID1282
Feature activation-2.39e-02

Pos logit contributions
adoes+0.057
 scares+0.053
 echoed+0.052
 bully+0.050
 arrows+0.049
Neg logit contributions
 anx-0.057
ris-0.054
abel-0.051
 Er-0.051
ma-0.050
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
ris+0.041
abel+0.041
abella+0.041
 anx+0.041
son+0.039
Neg logit contributions
adoes-0.031
 pitch-0.029
 mushroom-0.028
 hoop-0.027
 tail-0.027
Token join
Token ID4654
Feature activation+4.62e-03

Pos logit contributions
adoes+0.017
 scares+0.016
 echoed+0.015
 arrows+0.015
 racket+0.015
Neg logit contributions
 anx-0.017
ris-0.016
ma-0.015
 Er-0.015
abel-0.015
Token saw
Token ID2497
Feature activation-2.03e-02

Pos logit contributions
 scares+0.005
 bully+0.005
 echoed+0.005
adoes+0.005
 net+0.004
Neg logit contributions
 anx-0.007
ris-0.007
 Er-0.006
innie-0.006
rom-0.006
Token the
Token ID262
Feature activation+1.35e-02

Pos logit contributions
 anx+0.023
ris+0.020
abel+0.020
ma+0.019
 Er+0.019
Neg logit contributions
adoes-0.041
 scares-0.039
 echoed-0.038
 bully-0.038
 affect-0.037
Token great
Token ID1049
Feature activation-4.97e-04

Pos logit contributions
adoes+0.022
 echoed+0.020
 scares+0.020
 web+0.019
 arrows+0.018
Neg logit contributions
 anx-0.020
ris-0.019
 Er-0.018
ma-0.018
abel-0.018
Token big
Token ID1263
Feature activation+9.51e-05

Pos logit contributions
 anx+0.001
ma+0.001
ris+0.001
abel+0.001
 Er+0.001
Neg logit contributions
adoes-0.001
 echoed-0.001
 scares-0.001
ickets-0.001
 affect-0.001
Token Fer
Token ID12880
Feature activation-1.28e-01

Pos logit contributions
adoes+0.000
 echoed+0.000
ickets+0.000
icks+0.000
 affect+0.000
Neg logit contributions
 anx-0.000
ma-0.000
 first-0.000
 next-0.000
 Er-0.000
Tokenris
Token ID2442
Feature activation-1.92e-01

Pos logit contributions
 anx+0.064
 Er+0.040
abel+0.030
 patiently+0.027
innie+0.020
Neg logit contributions
adoes-0.326
 bully-0.321
 pitch-0.315
ickets-0.312
 net-0.312
Token wheel
Token ID7825
Feature activation-6.40e-02

Pos logit contributions
 anx+0.339
abel+0.325
innie+0.315
 patiently+0.300
years+0.294
Neg logit contributions
adoes-0.243
 arrow-0.233
 bully-0.226
 arrows-0.225
 net-0.222
Token and
Token ID290
Feature activation+4.98e-03

Pos logit contributions
 anx+0.092
ris+0.084
innie+0.083
 Er+0.081
abel+0.081
Neg logit contributions
adoes-0.106
 echoed-0.103
 scares-0.100
 arrow-0.098
 pitch-0.097
Token all
Token ID477
Feature activation+6.14e-02

Pos logit contributions
adoes+0.008
 affect+0.008
ickets+0.008
 scares+0.008
 bully+0.008
Neg logit contributions
 anx-0.008
 patiently-0.006
abel-0.006
 Er-0.006
ma-0.006
Token the
Token ID262
Feature activation-1.65e-03

Pos logit contributions
adoes+0.119
 scares+0.107
 arrows+0.107
 bullet+0.106
 net+0.105
Neg logit contributions
 anx-0.079
ris-0.073
ma-0.072
abel-0.069
 Er-0.066
Token other
Token ID584
Feature activation+1.60e-02

Pos logit contributions
 anx+0.003
innie+0.003
ris+0.003
 patiently+0.003
abel+0.003
Neg logit contributions
 scares-0.002
 bully-0.002
adoes-0.002
 arrows-0.002
 pitch-0.002
Token Lily
Token ID20037
Feature activation+2.07e-02

Pos logit contributions
adoes+0.001
 scares+0.001
 echoed+0.001
 bully+0.001
 arrows+0.001
Neg logit contributions
 anx-0.001
ris-0.001
abel-0.001
ma-0.001
 Er-0.001
Token loved
Token ID6151
Feature activation-1.89e-02

Pos logit contributions
adoes+0.032
 scares+0.030
 echoed+0.030
 bully+0.029
 arrows+0.028
Neg logit contributions
 anx-0.032
ris-0.031
 Er-0.029
abel-0.028
ma-0.028
Token the
Token ID262
Feature activation-3.39e-03

Pos logit contributions
 anx+0.024
ris+0.023
abel+0.021
 Er+0.021
ma+0.020
Neg logit contributions
adoes-0.035
 scares-0.033
 echoed-0.032
 bully-0.032
 arrows-0.032
Token big
Token ID1263
Feature activation-3.17e-03

Pos logit contributions
 anx+0.006
ris+0.006
 Er+0.006
abel+0.006
innie+0.006
Neg logit contributions
adoes-0.004
 scares-0.004
 net-0.004
 echoed-0.004
 bully-0.004
Token Fer
Token ID12880
Feature activation-1.19e-01

Pos logit contributions
 anx+0.005
ma+0.005
ris+0.005
abel+0.005
 Er+0.005
Neg logit contributions
adoes-0.005
 echoed-0.004
 scares-0.004
 affect-0.004
ickets-0.004
Tokenris
Token ID2442
Feature activation-1.90e-01

Pos logit contributions
 anx+0.115
 Er+0.095
abel+0.085
 patiently+0.084
innie+0.076
Neg logit contributions
adoes-0.256
 bully-0.245
 scares-0.240
 echoed-0.240
ickets-0.239
Token wheel
Token ID7825
Feature activation-6.19e-02

Pos logit contributions
 anx+0.332
innie+0.306
abel+0.305
 patiently+0.298
 Er+0.293
Neg logit contributions
adoes-0.248
 arrow-0.230
 arrows-0.226
 scares-0.226
 bully-0.222
Token the
Token ID262
Feature activation-1.84e-02

Pos logit contributions
 anx+0.089
ris+0.082
 Er+0.077
abel+0.076
innie+0.074
Neg logit contributions
adoes-0.106
 scares-0.099
 echoed-0.099
 bully-0.095
 arrows-0.094
Token most
Token ID749
Feature activation-2.97e-02

Pos logit contributions
 anx+0.031
abel+0.031
innie+0.031
ris+0.030
 patiently+0.028
Neg logit contributions
 arrows-0.024
 tail-0.024
 net-0.024
 bully-0.024
adoes-0.024
Token.
Token ID13
Feature activation-2.33e-02

Pos logit contributions
 anx+0.050
 Er+0.046
abel+0.044
rom+0.044
abella+0.044
Neg logit contributions
adoes-0.041
 echoed-0.040
 arrow-0.039
 arrows-0.038
 pitch-0.038
Token She
Token ID1375
Feature activation-1.09e-02

Pos logit contributions
 anx+0.039
ris+0.036
abel+0.034
ma+0.034
 Er+0.034
Neg logit contributions
adoes-0.034
 scares-0.031
 echoed-0.031
 bully-0.030
 hoop-0.030
Token they
Token ID484
Feature activation-6.05e-03

Pos logit contributions
 anx+0.008
ris+0.008
abel+0.008
 Er+0.008
ma+0.007
Neg logit contributions
adoes-0.007
 scares-0.007
 echoed-0.007
 arrows-0.006
 bully-0.006
Token went
Token ID1816
Feature activation+6.43e-03

Pos logit contributions
 anx+0.010
ris+0.010
 Er+0.010
abel+0.009
ma+0.009
Neg logit contributions
adoes-0.008
 echoed-0.008
 scares-0.007
 arrow-0.007
 arrows-0.007
Token on
Token ID319
Feature activation+2.23e-02

Pos logit contributions
 pitch+0.009
 net+0.009
adoes+0.009
 hoop+0.009
 arrows+0.009
Neg logit contributions
abel-0.010
 anx-0.010
innie-0.010
 Er-0.009
icer-0.009
Token the
Token ID262
Feature activation-1.13e-02

Pos logit contributions
 pitch+0.029
 net+0.029
 arrows+0.027
 hoop+0.026
adoes+0.026
Neg logit contributions
abel-0.040
innie-0.040
son-0.040
 anx-0.040
iah-0.038
Token fer
Token ID11354
Feature activation-8.77e-02

Pos logit contributions
abel+0.022
innie+0.022
 anx+0.022
ris+0.021
ait+0.021
Neg logit contributions
 arrow-0.014
 arrows-0.013
 pitch-0.012
 net-0.012
 hoop-0.012
Tokenris
Token ID2442
Feature activation-1.87e-01

Pos logit contributions
 anx+0.062
 Er+0.055
abel+0.043
 patiently+0.038
innie+0.036
Neg logit contributions
 arrow-0.205
adoes-0.205
 arrows-0.204
 bully-0.198
 net-0.195
Token wheel
Token ID7825
Feature activation-7.40e-02

Pos logit contributions
 anx+0.344
abel+0.340
innie+0.330
 Er+0.313
abella+0.311
Neg logit contributions
adoes-0.230
 arrow-0.229
 arrows-0.219
 bullet-0.200
 net-0.198
Token together
Token ID1978
Feature activation-2.64e-02

Pos logit contributions
 anx+0.125
innie+0.122
abel+0.117
 Er+0.116
abella+0.114
Neg logit contributions
adoes-0.099
 echoed-0.097
 arrow-0.094
 arrows-0.090
 pitch-0.090
Token,
Token ID11
Feature activation+1.01e-02

Pos logit contributions
 anx+0.042
innie+0.041
ris+0.041
 Er+0.040
abel+0.040
Neg logit contributions
 echoed-0.036
adoes-0.036
 pitch-0.036
 net-0.036
 tail-0.035
Token and
Token ID290
Feature activation+1.32e-02

Pos logit contributions
adoes+0.013
 pitch+0.013
 tail+0.013
 net+0.013
 echoed+0.013
Neg logit contributions
innie-0.017
abel-0.017
ris-0.016
 anx-0.016
 Er-0.016
Token Lily
Token ID20037
Feature activation-2.01e-03

Pos logit contributions
adoes+0.019
 echoed+0.018
 arrow+0.017
 scares+0.017
 arrows+0.017
Neg logit contributions
 anx-0.022
ris-0.021
innie-0.020
abel-0.020
ma-0.020
Token He
Token ID679
Feature activation-2.45e-02

Pos logit contributions
 anx+0.010
ris+0.009
 Er+0.009
abel+0.009
 patiently+0.008
Neg logit contributions
adoes-0.011
 scares-0.010
 echoed-0.010
 bully-0.010
 arrows-0.010
Token went
Token ID1816
Feature activation+2.29e-02

Pos logit contributions
 anx+0.043
innie+0.042
 Er+0.041
abel+0.041
ma+0.040
Neg logit contributions
 arrow-0.029
 echoed-0.029
 scares-0.028
 arrows-0.028
adoes-0.027
Token on
Token ID319
Feature activation+3.69e-02

Pos logit contributions
 pitch+0.027
 arrows+0.025
 net+0.025
 arrow+0.025
 snail+0.025
Neg logit contributions
abel-0.043
innie-0.040
icer-0.040
 anx-0.039
son-0.037
Token the
Token ID262
Feature activation-2.75e-03

Pos logit contributions
 arrows+0.047
 arrow+0.046
 pitch+0.045
 tail+0.044
 net+0.043
Neg logit contributions
abel-0.067
rom-0.065
iah-0.065
innie-0.064
ris-0.063
Token fer
Token ID11354
Feature activation-8.72e-02

Pos logit contributions
 anx+0.005
abel+0.005
ris+0.005
innie+0.005
bie+0.005
Neg logit contributions
 arrow-0.003
 pitch-0.003
 arrows-0.003
 tail-0.003
adoes-0.003
Tokenris
Token ID2442
Feature activation-1.84e-01

Pos logit contributions
 anx+0.065
 Er+0.056
abel+0.046
 patiently+0.042
innie+0.039
Neg logit contributions
adoes-0.206
 arrows-0.199
 arrow-0.198
 bully-0.193
 net-0.191
Token wheel
Token ID7825
Feature activation-6.61e-02

Pos logit contributions
 anx+0.322
abel+0.308
innie+0.304
 Er+0.292
vy+0.290
Neg logit contributions
adoes-0.250
 arrow-0.239
 arrows-0.231
 bullet-0.220
 scares-0.219
Token,
Token ID11
Feature activation+1.46e-02

Pos logit contributions
 anx+0.116
innie+0.112
abel+0.109
 Er+0.108
ris+0.105
Neg logit contributions
adoes-0.080
 echoed-0.079
 arrow-0.077
 arrows-0.074
 tail-0.074
Token the
Token ID262
Feature activation+2.26e-02

Pos logit contributions
 arrow+0.019
adoes+0.019
 arrows+0.019
 tail+0.019
 echoed+0.019
Neg logit contributions
abel-0.024
ris-0.024
innie-0.024
 anx-0.023
ma-0.022
Token roller
Token ID24471
Feature activation-6.68e-02

Pos logit contributions
 bully+0.027
 arrows+0.027
 arrow+0.027
 net+0.026
 tail+0.026
Neg logit contributions
 anx-0.042
abel-0.042
ris-0.042
innie-0.042
rom-0.039
Token coaster
Token ID42450
Feature activation-5.16e-02

Pos logit contributions
 anx+0.112
abel+0.104
ris+0.101
 Er+0.100
innie+0.098
Neg logit contributions
adoes-0.100
 scares-0.092
 echoed-0.092
 bully-0.090
 arrow-0.090
Token and
Token ID290
Feature activation-2.09e-02

Pos logit contributions
 anx+0.068
ris+0.067
innie+0.066
son+0.066
abel+0.065
Neg logit contributions
adoes-0.047
 pitch-0.045
 scares-0.043
 hoop-0.042
 net-0.042
Token down
Token ID866
Feature activation+2.32e-02

Pos logit contributions
 anx+0.045
ris+0.044
abel+0.042
innie+0.042
 Er+0.041
Neg logit contributions
adoes-0.020
 echoed-0.018
 scares-0.018
 bully-0.017
 hoop-0.016
Token on
Token ID319
Feature activation+1.70e-02

Pos logit contributions
adoes+0.030
 scares+0.028
 echoed+0.027
 pitch+0.026
 bully+0.026
Neg logit contributions
 anx-0.041
ris-0.040
innie-0.040
son-0.038
abel-0.038
Token the
Token ID262
Feature activation-2.36e-02

Pos logit contributions
adoes+0.022
 pitch+0.021
 net+0.019
 hoop+0.019
 hoops+0.019
Neg logit contributions
ll-0.032
 anx-0.031
ris-0.031
son-0.030
years-0.029
Token roller
Token ID24471
Feature activation-1.07e-01

Pos logit contributions
 anx+0.046
 Er+0.046
ris+0.046
innie+0.044
years+0.043
Neg logit contributions
 pitch-0.026
 arrow-0.025
 hoop-0.024
 tail-0.024
 net-0.023
Tokenco
Token ID1073
Feature activation-1.76e-01

Pos logit contributions
 anx+0.168
 Er+0.155
ris+0.150
years+0.145
abel+0.143
Neg logit contributions
adoes-0.167
 echoed-0.160
 scares-0.157
 pitch-0.153
 bully-0.152
Tokenaster
Token ID1603
Feature activation-8.26e-02

Pos logit contributions
 anx+0.325
 Er+0.313
 patiently+0.285
abel+0.279
joy+0.277
Neg logit contributions
 echoed-0.187
 pitch-0.182
 scares-0.182
ickets-0.181
adoes-0.180
Token,
Token ID11
Feature activation+1.14e-02

Pos logit contributions
 anx+0.131
ris+0.127
 Er+0.122
innie+0.119
abel+0.117
Neg logit contributions
adoes-0.121
 echoed-0.117
 scares-0.115
 bully-0.108
 pitch-0.108
Token Lily
Token ID20037
Feature activation+9.58e-03

Pos logit contributions
adoes+0.015
 scares+0.013
 echoed+0.013
 bully+0.012
 net+0.012
Neg logit contributions
 anx-0.021
ris-0.021
abel-0.019
ma-0.019
 Er-0.019
Token shouted
Token ID17293
Feature activation+2.27e-02

Pos logit contributions
 echoed+0.012
adoes+0.012
 scares+0.012
 bully+0.011
 arrow+0.011
Neg logit contributions
 anx-0.016
ris-0.016
 Er-0.016
innie-0.015
abel-0.015
Token,
Token ID11
Feature activation+1.14e-02

Pos logit contributions
adoes+0.035
 echoed+0.032
 scares+0.032
 bully+0.030
 pitch+0.030
Neg logit contributions
ris-0.036
 anx-0.035
ma-0.033
abel-0.033
 Er-0.032
Token's
Token ID338
Feature activation+4.69e-02

Pos logit contributions
adoes+0.025
 scares+0.024
 echoed+0.023
 bully+0.022
 arrows+0.022
Neg logit contributions
 anx-0.025
ris-0.024
ma-0.023
abel-0.023
 Er-0.022
Token go
Token ID467
Feature activation+3.30e-02

Pos logit contributions
adoes+0.081
 scares+0.074
AAAA+0.073
 echoed+0.073
 CA+0.070
Neg logit contributions
 anx-0.068
ris-0.063
 first-0.061
ma-0.061
 patiently-0.059
Token on
Token ID319
Feature activation+4.27e-02

Pos logit contributions
 pitch+0.040
 scares+0.040
 net+0.040
adoes+0.039
 bully+0.039
Neg logit contributions
 anx-0.061
abel-0.056
 Er-0.055
innie-0.053
son-0.053
Token the
Token ID262
Feature activation-1.34e-03

Pos logit contributions
adoes+0.055
 pitch+0.053
 net+0.052
 hoops+0.051
 arrows+0.051
Neg logit contributions
 anx-0.079
abel-0.075
son-0.075
ris-0.075
 Er-0.073
Token fer
Token ID11354
Feature activation-7.86e-02

Pos logit contributions
 anx+0.002
ris+0.002
abel+0.002
 Er+0.002
innie+0.002
Neg logit contributions
adoes-0.002
 scares-0.002
 pitch-0.002
 arrow-0.002
 net-0.002
Tokenris
Token ID2442
Feature activation-1.75e-01

Pos logit contributions
 anx+0.066
 Er+0.059
abel+0.048
 patiently+0.043
innie+0.041
Neg logit contributions
adoes-0.175
 arrow-0.172
 arrows-0.172
 bully-0.171
 net-0.167
Token wheel
Token ID7825
Feature activation-6.15e-02

Pos logit contributions
 anx+0.313
abel+0.297
innie+0.287
 Er+0.281
 patiently+0.278
Neg logit contributions
adoes-0.232
 arrow-0.224
 arrows-0.216
 scares-0.207
 bully-0.202
Token and
Token ID290
Feature activation+1.58e-02

Pos logit contributions
 anx+0.106
ris+0.096
 Er+0.095
abel+0.094
innie+0.093
Neg logit contributions
adoes-0.088
 scares-0.080
 echoed-0.080
 bully-0.077
 arrow-0.076
Token see
Token ID766
Feature activation-1.67e-02

Pos logit contributions
adoes+0.025
ickets+0.023
 echoed+0.023
 scares+0.023
AAAA+0.021
Neg logit contributions
 anx-0.025
ris-0.023
 first-0.022
ma-0.022
 Er-0.022
Token the
Token ID262
Feature activation+2.29e-02

Pos logit contributions
 anx+0.027
ris+0.025
 Er+0.024
abel+0.024
ma+0.023
Neg logit contributions
adoes-0.026
 scares-0.024
 echoed-0.023
 arrows-0.023
 bully-0.022
Token whole
Token ID2187
Feature activation+2.78e-02

Pos logit contributions
adoes+0.032
 bully+0.031
 arrows+0.031
 arrow+0.031
 scares+0.030
Neg logit contributions
 anx-0.039
innie-0.037
abel-0.037
ris-0.036
 Er-0.034
Token went
Token ID1816
Feature activation+9.51e-03

Pos logit contributions
 anx+0.020
ris+0.019
abel+0.018
 Er+0.018
ma+0.018
Neg logit contributions
adoes-0.017
 scares-0.016
 echoed-0.016
 bully-0.015
 arrows-0.015
Token on
Token ID319
Feature activation+3.03e-02

Pos logit contributions
adoes+0.014
 pitch+0.013
 net+0.013
 scares+0.013
 arrows+0.013
Neg logit contributions
 anx-0.015
abel-0.015
ris-0.015
innie-0.014
icer-0.014
Token a
Token ID257
Feature activation-2.30e-02

Pos logit contributions
 pitch+0.039
 arrows+0.039
adoes+0.038
 net+0.038
 scares+0.036
Neg logit contributions
ll-0.053
abel-0.053
innie-0.052
ma-0.052
icer-0.052
Token scary
Token ID14343
Feature activation+1.39e-02

Pos logit contributions
 anx+0.037
ris+0.035
 Er+0.034
abel+0.033
ma+0.033
Neg logit contributions
 pitch-0.033
 net-0.032
 scares-0.032
adoes-0.031
 bully-0.031
Token roller
Token ID24471
Feature activation-1.02e-01

Pos logit contributions
adoes+0.024
 scares+0.023
 bully+0.022
 echoed+0.022
 net+0.022
Neg logit contributions
 anx-0.020
ris-0.019
 Er-0.018
abel-0.017
ma-0.017
Tokenco
Token ID1073
Feature activation-1.73e-01

Pos logit contributions
 anx+0.141
 Er+0.135
ris+0.134
innie+0.130
abel+0.127
Neg logit contributions
adoes-0.179
 scares-0.171
 bully-0.170
 echoed-0.168
 pitch-0.166
Tokenaster
Token ID1603
Feature activation-8.21e-02

Pos logit contributions
 anx+0.297
 Er+0.287
abel+0.260
 patiently+0.258
joy+0.255
Neg logit contributions
uts-0.200
 scares-0.198
 echoed-0.197
ickets-0.197
 pitch-0.193
Token ride
Token ID6594
Feature activation-6.16e-02

Pos logit contributions
 anx+0.114
innie+0.112
ris+0.110
 Er+0.108
abel+0.100
Neg logit contributions
adoes-0.134
 echoed-0.133
 scares-0.129
 net-0.123
 pitch-0.123
Token.
Token ID13
Feature activation-1.17e-02

Pos logit contributions
innie+0.100
 anx+0.098
abel+0.094
ris+0.093
 Er+0.092
Neg logit contributions
adoes-0.093
 echoed-0.088
 arrows-0.083
 pitch-0.082
 arrow-0.079
Token Tim
Token ID5045
Feature activation-5.40e-03

Pos logit contributions
 anx+0.018
ris+0.017
 Er+0.016
abel+0.016
ma+0.016
Neg logit contributions
adoes-0.019
 scares-0.018
 echoed-0.017
 bully-0.017
 arrows-0.017
Tokenmy
Token ID1820
Feature activation-7.09e-03

Pos logit contributions
 anx+0.007
 Er+0.007
lan+0.007
ris+0.006
 patiently+0.006
Neg logit contributions
 echoed-0.009
adoes-0.009
 bully-0.009
 tail-0.009
 arrow-0.008
Token Lily
Token ID20037
Feature activation-1.91e-02

Pos logit contributions
 anx+0.013
ris+0.012
innie+0.011
rom+0.011
abel+0.011
Neg logit contributions
adoes-0.009
 scares-0.008
 echoed-0.008
 bully-0.008
 pitch-0.008
Token loved
Token ID6151
Feature activation-3.05e-02

Pos logit contributions
 anx+0.030
ris+0.028
 Er+0.027
innie+0.026
rom+0.026
Neg logit contributions
 echoed-0.028
adoes-0.028
 scares-0.027
 bully-0.026
 arrow-0.026
Token to
Token ID284
Feature activation-1.99e-02

Pos logit contributions
ris+0.052
 anx+0.050
icer+0.046
abel+0.046
vy+0.045
Neg logit contributions
adoes-0.045
 scares-0.042
 pitch-0.041
 snail-0.040
 echoed-0.040
Token ride
Token ID6594
Feature activation-4.35e-02

Pos logit contributions
 anx+0.036
ris+0.034
abel+0.032
 Er+0.032
ma+0.032
Neg logit contributions
adoes-0.027
 scares-0.025
 echoed-0.024
 bully-0.024
 arrows-0.023
Token roller
Token ID24471
Feature activation-9.37e-02

Pos logit contributions
ris+0.078
 anx+0.076
 Er+0.070
innie+0.069
abel+0.068
Neg logit contributions
adoes-0.064
 arrows-0.059
 echoed-0.056
 hoop-0.055
 arrow-0.054
Tokenco
Token ID1073
Feature activation-1.60e-01

Pos logit contributions
 anx+0.150
ris+0.147
abel+0.136
 Er+0.134
bie+0.131
Neg logit contributions
adoes-0.145
 scares-0.135
 echoed-0.133
 arrows-0.131
 pitch-0.126
Tokenasters
Token ID7060
Feature activation-4.80e-02

Pos logit contributions
 anx+0.301
 Er+0.275
 patiently+0.261
abel+0.254
joy+0.247
Neg logit contributions
uts-0.173
ickets-0.172
adoes-0.169
 scares-0.168
 echoed-0.167
Token and
Token ID290
Feature activation-3.87e-04

Pos logit contributions
 anx+0.073
ris+0.071
innie+0.065
abel+0.065
 Er+0.064
Neg logit contributions
adoes-0.077
 echoed-0.072
 scares-0.072
 arrows-0.069
 bully-0.068
Token always
Token ID1464
Feature activation-2.52e-02

Pos logit contributions
 anx+0.001
ris+0.001
abel+0.001
innie+0.001
ma+0.001
Neg logit contributions
adoes-0.000
 echoed-0.000
 scares-0.000
 arrow-0.000
 arrows-0.000
Token remembered
Token ID12086
Feature activation+7.01e-03

Pos logit contributions
 anx+0.044
ris+0.042
abel+0.041
innie+0.040
 Er+0.040
Neg logit contributions
adoes-0.033
 echoed-0.032
 scares-0.032
 bully-0.031
 arrow-0.030
Token how
Token ID703
Feature activation-1.96e-02

Pos logit contributions
adoes+0.012
 scares+0.011
 echoed+0.011
 arrows+0.010
 net+0.010
Neg logit contributions
 anx-0.010
ris-0.010
abel-0.009
 Er-0.009
innie-0.009
Token day
Token ID1110
Feature activation-2.32e-02

Pos logit contributions
 anx+0.039
ris+0.036
 Er+0.035
abel+0.034
rom+0.034
Neg logit contributions
adoes-0.038
 scares-0.035
 echoed-0.035
 net-0.034
 bully-0.034
Token ever
Token ID1683
Feature activation-1.77e-02

Pos logit contributions
 anx+0.029
ris+0.028
innie+0.027
son+0.026
icer+0.026
Neg logit contributions
 echoed-0.040
 net-0.040
adoes-0.038
 scares-0.038
 snail-0.038
Token!
Token ID0
Feature activation-1.43e-02

Pos logit contributions
innie+0.032
 anx+0.032
 Er+0.030
son+0.029
rom+0.029
Neg logit contributions
 echoed-0.021
 net-0.020
 mushroom-0.020
 arrow-0.019
 scares-0.019
Token<|endoftext|>
Token ID50256
Feature activation+2.20e-02

Pos logit contributions
 anx+0.016
ris+0.015
 Er+0.014
abel+0.014
ma+0.014
Neg logit contributions
adoes-0.029
 scares-0.027
 echoed-0.026
 bully-0.026
 arrows-0.026
TokenMor
Token ID20044
Feature activation-6.47e-02

Pos logit contributions
 echoed+0.028
 scares+0.028
 net+0.028
 mushroom+0.027
ickets+0.027
Neg logit contributions
 anx-0.039
 patiently-0.035
ris-0.034
abel-0.033
 Er-0.032
Tokenris
Token ID2442
Feature activation-1.58e-01

Pos logit contributions
 anx+0.085
innie+0.072
 patiently+0.072
 Er+0.072
rom+0.070
Neg logit contributions
 arrows-0.110
 bounces-0.107
 net-0.106
 arrow-0.106
alk-0.104
Token was
Token ID373
Feature activation-7.63e-03

Pos logit contributions
 anx+0.288
iam+0.262
rom+0.262
year+0.261
innie+0.261
Neg logit contributions
adoes-0.171
 arrow-0.167
 scares-0.161
 net-0.157
 scatter-0.156
Token very
Token ID845
Feature activation+1.58e-03

Pos logit contributions
 anx+0.010
ris+0.010
 Er+0.009
abel+0.009
ma+0.009
Neg logit contributions
adoes-0.013
 scares-0.012
 echoed-0.012
 net-0.012
 bully-0.012
Token excited
Token ID6568
Feature activation-2.93e-02

Pos logit contributions
 bully+0.002
 snail+0.002
 scares+0.002
 net+0.002
adoes+0.002
Neg logit contributions
ris-0.003
 anx-0.002
abel-0.002
rom-0.002
son-0.002
Token.
Token ID13
Feature activation-2.39e-02

Pos logit contributions
 anx+0.036
ris+0.034
abel+0.031
 Er+0.030
ma+0.030
Neg logit contributions
adoes-0.056
 echoed-0.052
 scares-0.052
 bully-0.051
 arrow-0.050
Token Today
Token ID6288
Feature activation-1.91e-02

Pos logit contributions
 anx+0.034
ris+0.034
 Er+0.033
abel+0.030
son+0.030
Neg logit contributions
adoes-0.041
 scares-0.039
 bully-0.036
 web-0.036
 net-0.036
Token you
Token ID345
Feature activation+1.90e-02

Pos logit contributions
adoes+0.066
 scares+0.060
 web+0.057
 mushroom+0.057
 bullet+0.057
Neg logit contributions
 anx-0.046
abel-0.044
 Er-0.044
ma-0.044
ris-0.042
Token start
Token ID923
Feature activation-1.25e-02

Pos logit contributions
adoes+0.030
 scares+0.026
 echoed+0.025
AAAA+0.025
 mushroom+0.025
Neg logit contributions
 anx-0.032
ris-0.030
abel-0.028
ma-0.027
 Er-0.027
Token with
Token ID351
Feature activation+1.83e-03

Pos logit contributions
 anx+0.017
ris+0.015
 Er+0.014
abel+0.014
ma+0.014
Neg logit contributions
adoes-0.023
 scares-0.021
 echoed-0.021
 bully-0.021
 arrows-0.020
Token the
Token ID262
Feature activation+7.45e-03

Pos logit contributions
adoes+0.003
 scares+0.003
 echoed+0.003
 arrows+0.003
 net+0.002
Neg logit contributions
 anx-0.003
ris-0.003
abel-0.003
 Er-0.003
ma-0.002
Token roller
Token ID24471
Feature activation-9.42e-02

Pos logit contributions
adoes+0.011
 scares+0.009
 echoed+0.009
 bully+0.009
AAAA+0.009
Neg logit contributions
 anx-0.013
ris-0.012
ma-0.012
 Er-0.012
abel-0.012
Tokenco
Token ID1073
Feature activation-1.58e-01

Pos logit contributions
 anx+0.144
ris+0.135
 Er+0.127
abel+0.126
ma+0.121
Neg logit contributions
adoes-0.154
 scares-0.145
 echoed-0.140
 bully-0.138
 pitch-0.134
Tokenaster
Token ID1603
Feature activation-7.25e-02

Pos logit contributions
 anx+0.271
 Er+0.244
ris+0.236
abel+0.232
 patiently+0.229
Neg logit contributions
adoes-0.218
 scares-0.215
 echoed-0.205
 pitch-0.201
 bully-0.201
Token?
Token ID30
Feature activation-7.68e-03

Pos logit contributions
 anx+0.102
ris+0.094
ma+0.091
abel+0.088
 Er+0.085
Neg logit contributions
adoes-0.126
 scares-0.116
 echoed-0.113
 bully-0.112
 arrows-0.111
Tokenâ
Token ID22940
Feature activation-6.16e-03

Pos logit contributions
 anx+0.014
ris+0.013
ma+0.013
abel+0.013
 patiently+0.012
Neg logit contributions
adoes-0.010
 scares-0.009
 echoed-0.009
 bully-0.008
 arrows-0.008
Token
Token ID26391
Feature activation-1.66e-02

Pos logit contributions
 anx+0.011
abel+0.010
 Er+0.010
ris+0.009
 patiently+0.009
Neg logit contributions
adoes-0.008
 scares-0.008
 echoed-0.008
 bully-0.008
 pitch-0.008
Token his
Token ID465
Feature activation+1.71e-02

Pos logit contributions
 anx+0.027
ris+0.025
abel+0.024
 patiently+0.024
 Er+0.024
Neg logit contributions
adoes-0.025
 scares-0.024
 echoed-0.023
 bully-0.022
 racket-0.022
TokenDid
Token ID11633
Feature activation-9.60e-03

Pos logit contributions
adoes+0.006
 bully+0.006
 echoed+0.005
 scares+0.005
 snail+0.005
Neg logit contributions
 anx-0.009
ris-0.008
innie-0.008
abel-0.008
 patiently-0.007
Token you
Token ID345
Feature activation+4.56e-02

Pos logit contributions
 anx+0.015
ris+0.014
abel+0.014
 Er+0.014
ma+0.013
Neg logit contributions
adoes-0.015
 scares-0.014
 echoed-0.013
 bully-0.013
 net-0.013
Token enjoy
Token ID2883
Feature activation+2.77e-03

Pos logit contributions
adoes+0.066
 scares+0.059
 echoed+0.057
 arrows+0.054
 net+0.054
Neg logit contributions
 anx-0.080
ris-0.076
abel-0.072
ma-0.071
 Er-0.071
Token the
Token ID262
Feature activation-6.84e-03

Pos logit contributions
adoes+0.004
 scares+0.003
 arrows+0.003
 pitch+0.003
 tail+0.003
Neg logit contributions
 anx-0.005
ris-0.005
abel-0.005
 Er-0.005
innie-0.005
Token roller
Token ID24471
Feature activation-9.50e-02

Pos logit contributions
 anx+0.012
ris+0.011
abel+0.011
 Er+0.011
innie+0.011
Neg logit contributions
adoes-0.009
 scares-0.009
 bully-0.009
 pitch-0.009
 net-0.009
Tokenco
Token ID1073
Feature activation-1.54e-01

Pos logit contributions
 anx+0.146
ris+0.138
 Er+0.134
abel+0.129
innie+0.123
Neg logit contributions
adoes-0.153
 scares-0.147
 bully-0.141
 echoed-0.140
 pitch-0.137
Tokenaster
Token ID1603
Feature activation-7.46e-02

Pos logit contributions
 anx+0.255
 Er+0.232
abel+0.216
 patiently+0.214
ris+0.207
Neg logit contributions
 scares-0.209
adoes-0.204
 bully-0.202
igator-0.200
 echoed-0.198
Token and
Token ID290
Feature activation+1.37e-02

Pos logit contributions
 anx+0.112
ris+0.105
 Er+0.099
abel+0.098
ma+0.096
Neg logit contributions
adoes-0.122
 scares-0.114
 echoed-0.112
 bully-0.109
 net-0.107
Token your
Token ID534
Feature activation+3.36e-03

Pos logit contributions
adoes+0.022
 scares+0.019
 echoed+0.019
 net+0.019
 arrows+0.019
Neg logit contributions
 anx-0.021
ris-0.019
 Er-0.019
ma-0.019
abel-0.018
Token ice
Token ID4771
Feature activation-4.96e-02

Pos logit contributions
 tail+0.005
 net+0.004
adoes+0.004
 scares+0.004
 bully+0.004
Neg logit contributions
 anx-0.006
ris-0.006
 Er-0.006
rom-0.005
innie-0.005
Token cream
Token ID8566
Feature activation-5.65e-02

Pos logit contributions
 anx+0.091
 Er+0.085
abel+0.082
ris+0.082
innie+0.078
Neg logit contributions
adoes-0.065
 scares-0.059
 bully-0.057
 pitch-0.057
 echoed-0.057
Token was
Token ID373
Feature activation-8.16e-03

Pos logit contributions
 anx+0.101
 Er+0.095
ris+0.093
rom+0.091
abella+0.089
Neg logit contributions
adoes-0.066
 scares-0.065
 echoed-0.063
 arrow-0.063
 bully-0.062
Token a
Token ID257
Feature activation-2.00e-02

Pos logit contributions
ris+0.012
 anx+0.012
 Er+0.011
abel+0.011
rom+0.011
Neg logit contributions
adoes-0.012
 tail-0.012
 echoed-0.012
 scares-0.012
 pitch-0.012
Token happy
Token ID3772
Feature activation-3.16e-02

Pos logit contributions
ris+0.038
 anx+0.036
abella+0.036
ll+0.036
rom+0.036
Neg logit contributions
 bully-0.024
 snail-0.022
 tail-0.022
 pitch-0.022
 mushroom-0.021
Token 3
Token ID513
Feature activation-8.62e-02

Pos logit contributions
 anx+0.055
ris+0.053
 Er+0.050
abel+0.050
ma+0.049
Neg logit contributions
adoes-0.042
 bully-0.039
 echoed-0.039
 scares-0.039
 pitch-0.038
Token-
Token ID12
Feature activation-7.56e-02

Pos logit contributions
 anx+0.089
 Er+0.080
ris+0.078
 patiently+0.072
innie+0.071
Neg logit contributions
adoes-0.176
 echoed-0.171
 scares-0.169
 bully-0.168
 pitch-0.161
Tokenyear
Token ID1941
Feature activation-1.53e-01

Pos logit contributions
icer+0.103
 anx+0.103
 Er+0.088
 camps+0.087
joy+0.086
Neg logit contributions
igator-0.130
 bully-0.111
ickets-0.110
 arrow-0.104
uts-0.104
Token-
Token ID12
Feature activation-1.02e-01

Pos logit contributions
 anx+0.190
ris+0.176
 Er+0.175
abella+0.168
ma+0.166
Neg logit contributions
adoes-0.274
 bully-0.257
 arrow-0.257
 scares-0.252
 net-0.251
Tokenold
Token ID727
Feature activation-9.52e-02

Pos logit contributions
 anx+0.217
icer+0.210
 Er+0.196
 patiently+0.193
 symp+0.192
Neg logit contributions
igator-0.091
uts-0.079
ickets-0.078
 bully-0.077
 arrow-0.075
Token boy
Token ID2933
Feature activation-4.91e-02

Pos logit contributions
 anx+0.165
ris+0.162
innie+0.160
icer+0.155
abella+0.153
Neg logit contributions
 bully-0.118
adoes-0.115
 arrow-0.111
 echoed-0.110
 tail-0.110
Token who
Token ID508
Feature activation-7.45e-02

Pos logit contributions
 anx+0.078
innie+0.070
ris+0.069
icer+0.068
 Er+0.067
Neg logit contributions
 echoed-0.068
 bully-0.066
 scares-0.065
adoes-0.065
 pitch-0.064
Token loved
Token ID6151
Feature activation-3.60e-02

Pos logit contributions
 anx+0.130
ris+0.123
rom+0.115
 Er+0.115
innie+0.115
Neg logit contributions
 echoed-0.096
 scares-0.094
adoes-0.094
 bully-0.092
 arrow-0.086
Token he
Token ID339
Feature activation-1.87e-02

Pos logit contributions
 anx+0.037
ris+0.035
ma+0.032
 Er+0.031
abel+0.031
Neg logit contributions
adoes-0.045
 echoed-0.043
 scares-0.042
 arrows-0.041
 bully-0.040
Token saw
Token ID2497
Feature activation-1.89e-02

Pos logit contributions
ris+0.033
 anx+0.033
 Er+0.033
innie+0.031
rom+0.031
Neg logit contributions
 echoed-0.023
 arrow-0.022
 arrows-0.022
 bully-0.021
 net-0.021
Token a
Token ID257
Feature activation+6.37e-03

Pos logit contributions
 anx+0.027
ris+0.026
ma+0.023
innie+0.023
 Er+0.023
Neg logit contributions
adoes-0.032
 scares-0.031
 echoed-0.031
 arrows-0.030
 arrow-0.030
Token different
Token ID1180
Feature activation-9.76e-03

Pos logit contributions
adoes+0.010
 scares+0.009
 echoed+0.009
ickets+0.009
 arrows+0.008
Neg logit contributions
 anx-0.010
ris-0.009
ma-0.009
 Er-0.009
abel-0.009
Token roller
Token ID24471
Feature activation-8.70e-02

Pos logit contributions
 anx+0.019
ris+0.018
 Er+0.017
abel+0.017
ma+0.017
Neg logit contributions
adoes-0.012
 scares-0.011
 echoed-0.011
 bully-0.010
 arrows-0.010
Tokenco
Token ID1073
Feature activation-1.51e-01

Pos logit contributions
 anx+0.144
ris+0.133
 Er+0.130
abel+0.127
years+0.123
Neg logit contributions
adoes-0.133
 scares-0.129
 bully-0.126
 echoed-0.124
 hoop-0.122
Tokenaster
Token ID1603
Feature activation-6.64e-02

Pos logit contributions
 anx+0.299
 Er+0.276
abel+0.268
 patiently+0.263
joy+0.251
Neg logit contributions
 scares-0.148
 arrow-0.147
 echoed-0.143
uts-0.141
 pitch-0.141
Token-
Token ID12
Feature activation-5.74e-02

Pos logit contributions
 anx+0.090
ris+0.086
 Er+0.083
innie+0.082
abel+0.079
Neg logit contributions
adoes-0.115
 echoed-0.110
 scares-0.110
 net-0.105
 bully-0.105
Tokenlike
Token ID2339
Feature activation-1.73e-02

Pos logit contributions
 anx+0.115
 patiently+0.102
joy+0.099
abel+0.096
 Er+0.095
Neg logit contributions
igator-0.065
 arrow-0.061
ickets-0.059
 bully-0.058
adoes-0.058
Token ride
Token ID6594
Feature activation-4.91e-02

Pos logit contributions
 anx+0.036
ris+0.033
 Er+0.033
abel+0.032
 patiently+0.032
Neg logit contributions
adoes-0.018
 scares-0.017
 tail-0.017
 bully-0.017
 pitch-0.017
Token.
Token ID13
Feature activation+6.29e-03

Pos logit contributions
 anx+0.084
innie+0.080
ris+0.078
abel+0.078
 Er+0.077
Neg logit contributions
adoes-0.071
 echoed-0.067
 arrows-0.064
 arrow-0.063
 pitch-0.061
Token and
Token ID290
Feature activation-8.87e-03

Pos logit contributions
ris+0.064
innie+0.059
icer+0.059
 Er+0.058
abel+0.058
Neg logit contributions
adoes-0.058
 echoed-0.058
 tail-0.052
 scares-0.051
 pitch-0.051
Token put
Token ID1234
Feature activation-2.31e-02

Pos logit contributions
 anx+0.015
ris+0.014
 Er+0.014
abel+0.014
ma+0.013
Neg logit contributions
adoes-0.013
 scares-0.012
 echoed-0.012
 bully-0.011
 net-0.011
Token on
Token ID319
Feature activation+4.18e-02

Pos logit contributions
 anx+0.041
ris+0.039
abel+0.038
 Er+0.036
innie+0.036
Neg logit contributions
adoes-0.034
 scares-0.030
 echoed-0.030
 net-0.029
 arrows-0.029
Token their
Token ID511
Feature activation-1.38e-02

Pos logit contributions
adoes+0.046
 pitch+0.040
 hoop+0.039
 net+0.039
 arrows+0.038
Neg logit contributions
ris-0.086
 anx-0.085
abel-0.082
innie-0.080
ll-0.079
Token rain
Token ID6290
Feature activation-2.72e-03

Pos logit contributions
 anx+0.026
ris+0.025
abel+0.024
 Er+0.024
ma+0.023
Neg logit contributions
adoes-0.017
 scares-0.016
 echoed-0.015
 bully-0.015
 net-0.015
Tokenco
Token ID1073
Feature activation-1.50e-01

Pos logit contributions
 anx+0.005
ris+0.004
 Er+0.004
abel+0.004
ma+0.004
Neg logit contributions
adoes-0.004
 scares-0.004
 echoed-0.003
 bully-0.003
 pitch-0.003
Tokenats
Token ID1381
Feature activation-4.59e-03

Pos logit contributions
 anx+0.281
 Er+0.263
 patiently+0.240
abel+0.240
ris+0.233
Neg logit contributions
adoes-0.170
 pitch-0.159
 scares-0.157
 hoop-0.156
igators-0.155
Token and
Token ID290
Feature activation+1.86e-02

Pos logit contributions
 anx+0.007
ris+0.007
innie+0.007
abel+0.007
 Er+0.006
Neg logit contributions
adoes-0.007
 echoed-0.007
 pitch-0.006
 net-0.006
 arrows-0.006
Token boots
Token ID14412
Feature activation-9.45e-03

Pos logit contributions
adoes+0.027
 scares+0.025
 bully+0.024
 echoed+0.024
ickets+0.023
Neg logit contributions
 anx-0.032
ris-0.029
ma-0.028
abel-0.028
 Er-0.028
Token.
Token ID13
Feature activation+1.10e-02

Pos logit contributions
innie+0.015
 anx+0.015
abel+0.014
ris+0.014
icer+0.014
Neg logit contributions
adoes-0.014
 echoed-0.014
 arrow-0.012
 tail-0.012
 pitch-0.012
Token They
Token ID1119
Feature activation+1.02e-02

Pos logit contributions
adoes+0.013
 scares+0.013
 echoed+0.012
 bully+0.012
 arrows+0.012
Neg logit contributions
 anx-0.021
ris-0.020
 Er-0.019
abel-0.019
ma-0.019
Token have
Token ID423
Feature activation+5.76e-03

Pos logit contributions
adoes+0.029
 scares+0.028
 bully+0.026
 echoed+0.026
 net+0.026
Neg logit contributions
 anx-0.053
ris-0.049
 Er-0.048
abel-0.047
ma-0.047
Token to
Token ID284
Feature activation-5.83e-03

Pos logit contributions
adoes+0.011
 scares+0.010
 net+0.010
 bully+0.010
 arrows+0.010
Neg logit contributions
 anx-0.008
ris-0.007
abel-0.006
 Er-0.006
innie-0.006
Token wear
Token ID5806
Feature activation+1.85e-02

Pos logit contributions
 anx+0.011
ris+0.010
 Er+0.010
abel+0.010
ma+0.010
Neg logit contributions
adoes-0.007
 scares-0.007
 echoed-0.007
 bully-0.006
 pitch-0.006
Token your
Token ID534
Feature activation-6.92e-03

Pos logit contributions
adoes+0.027
 scares+0.025
 echoed+0.024
 arrows+0.024
 bully+0.024
Neg logit contributions
 anx-0.031
ris-0.030
abel-0.028
 Er-0.028
ma-0.027
Token rain
Token ID6290
Feature activation-7.44e-03

Pos logit contributions
 anx+0.012
ris+0.012
abel+0.012
 Er+0.011
rom+0.011
Neg logit contributions
adoes-0.009
 scares-0.009
 net-0.009
 arrows-0.008
 arrow-0.008
Tokenco
Token ID1073
Feature activation-1.49e-01

Pos logit contributions
 anx+0.012
ris+0.011
 Er+0.011
abel+0.010
innie+0.010
Neg logit contributions
adoes-0.012
 scares-0.011
 echoed-0.011
 pitch-0.010
 arrows-0.010
Tokenats
Token ID1381
Feature activation-9.72e-03

Pos logit contributions
 anx+0.291
 Er+0.257
 patiently+0.244
abel+0.243
joy+0.234
Neg logit contributions
adoes-0.168
 scares-0.162
 arrow-0.157
 pitch-0.156
uts-0.155
Token and
Token ID290
Feature activation+2.36e-02

Pos logit contributions
 anx+0.015
ris+0.014
 Er+0.013
abel+0.013
innie+0.013
Neg logit contributions
adoes-0.016
 scares-0.014
 echoed-0.014
 bully-0.014
 arrows-0.014
Token boots
Token ID14412
Feature activation-8.12e-03

Pos logit contributions
adoes+0.035
ickets+0.034
 echoed+0.033
 scares+0.033
 bully+0.033
Neg logit contributions
 anx-0.038
ma-0.033
ris-0.033
 patiently-0.032
 Er-0.032
Token,"
Token ID553
Feature activation-1.89e-02

Pos logit contributions
 anx+0.014
ris+0.013
 Er+0.013
innie+0.013
abel+0.013
Neg logit contributions
adoes-0.012
 echoed-0.010
 scares-0.010
 arrow-0.010
 arrows-0.010
Token Mom
Token ID11254
Feature activation-2.69e-02

Pos logit contributions
 anx+0.034
ris+0.034
abel+0.031
rom+0.030
 Er+0.030
Neg logit contributions
adoes-0.024
 scares-0.023
 echoed-0.023
 bully-0.022
 arrow-0.022
Token took
Token ID1718
Feature activation-2.20e-02

Pos logit contributions
 anx+0.077
ris+0.072
abel+0.069
 Er+0.068
rom+0.067
Neg logit contributions
adoes-0.046
 scares-0.044
 echoed-0.044
 bully-0.042
 pitch-0.041
Token Sara
Token ID24799
Feature activation-3.89e-02

Pos logit contributions
 anx+0.036
ris+0.034
 patiently+0.031
ma+0.031
ll+0.031
Neg logit contributions
adoes-0.033
 scares-0.031
 echoed-0.030
 arrows-0.029
 pitch-0.029
Token the
Token ID262
Feature activation-1.51e-02

Pos logit contributions
 anx+0.071
ris+0.070
Mike+0.068
icer+0.067
Daniel+0.066
Neg logit contributions
 pitch-0.038
adoes-0.038
 net-0.036
 arrow-0.036
 tail-0.036
Token 3
Token ID513
Feature activation-7.44e-02

Pos logit contributions
 anx+0.028
ris+0.028
abel+0.027
years+0.026
abella+0.026
Neg logit contributions
 hoop-0.018
 net-0.018
 pitch-0.018
adoes-0.017
 scares-0.017
Token-
Token ID12
Feature activation-5.63e-02

Pos logit contributions
 anx+0.094
 Er+0.089
 patiently+0.086
ris+0.085
rom+0.081
Neg logit contributions
 scares-0.124
adoes-0.122
 bully-0.122
 echoed-0.121
igator-0.118
Tokenyear
Token ID1941
Feature activation-1.48e-01

Pos logit contributions
 anx+0.070
icer+0.064
joy+0.061
 symp+0.061
 Er+0.060
Neg logit contributions
igator-0.091
igators-0.087
icks-0.086
ickets-0.084
adoes-0.083
Token-
Token ID12
Feature activation-8.30e-02

Pos logit contributions
 anx+0.191
ris+0.177
 Er+0.172
abel+0.166
icer+0.166
Neg logit contributions
adoes-0.258
 arrow-0.237
 scares-0.236
 arrows-0.233
 bully-0.232
Tokenold
Token ID727
Feature activation-8.84e-02

Pos logit contributions
 anx+0.172
icer+0.165
 symp+0.162
joy+0.155
 Er+0.153
Neg logit contributions
igator-0.061
igators-0.059
ickets-0.059
uts-0.059
icks-0.058
Token to
Token ID284
Feature activation-3.13e-02

Pos logit contributions
ris+0.151
 anx+0.145
innie+0.141
icer+0.139
abella+0.136
Neg logit contributions
adoes-0.109
 bully-0.108
 scares-0.102
 echoed-0.100
 arrows-0.098
Token the
Token ID262
Feature activation-2.22e-03

Pos logit contributions
 anx+0.052
ris+0.051
abel+0.049
 Er+0.047
ma+0.047
Neg logit contributions
adoes-0.044
 scares-0.041
 pitch-0.040
 echoed-0.040
 bully-0.040
Token park
Token ID3952
Feature activation-9.78e-03

Pos logit contributions
ris+0.004
 anx+0.004
abel+0.004
vy+0.004
abella+0.004
Neg logit contributions
 pitch-0.003
 bully-0.003
 hoop-0.003
 squash-0.003
 net-0.003
Token Lily
Token ID20037
Feature activation-2.70e-03

Pos logit contributions
 anx+0.003
ris+0.003
abel+0.003
ma+0.003
innie+0.003
Neg logit contributions
adoes-0.002
 echoed-0.002
 scares-0.002
 bully-0.002
 arrows-0.002
Token saw
Token ID2497
Feature activation-2.37e-02

Pos logit contributions
 anx+0.005
ris+0.005
 Er+0.004
innie+0.004
ma+0.004
Neg logit contributions
 echoed-0.003
adoes-0.003
 scares-0.003
 bully-0.003
 arrow-0.003
Token a
Token ID257
Feature activation+1.25e-03

Pos logit contributions
 anx+0.033
ris+0.032
 Er+0.029
abel+0.028
ma+0.028
Neg logit contributions
adoes-0.042
 scares-0.039
 echoed-0.039
 arrows-0.038
 bully-0.038
Token big
Token ID1263
Feature activation+2.62e-02

Pos logit contributions
adoes+0.002
 scares+0.002
 bully+0.002
 echoed+0.002
 net+0.002
Neg logit contributions
 anx-0.002
ris-0.002
 Er-0.002
abel-0.002
ma-0.002
Token roller
Token ID24471
Feature activation-8.65e-02

Pos logit contributions
adoes+0.039
 scares+0.036
 echoed+0.035
 bully+0.033
 arrows+0.033
Neg logit contributions
 anx-0.044
ris-0.041
ma-0.039
abel-0.039
 Er-0.039
Tokenco
Token ID1073
Feature activation-1.47e-01

Pos logit contributions
 anx+0.154
 Er+0.144
ris+0.140
years+0.136
bie+0.135
Neg logit contributions
adoes-0.118
 scares-0.114
 bully-0.112
 pitch-0.111
 echoed-0.109
Tokenaster
Token ID1603
Feature activation-6.17e-02

Pos logit contributions
 anx+0.274
 Er+0.263
 patiently+0.244
abel+0.241
joy+0.237
Neg logit contributions
 scares-0.149
 echoed-0.146
 pitch-0.146
 arrow-0.144
adoes-0.143
Token and
Token ID290
Feature activation+1.86e-03

Pos logit contributions
 anx+0.085
ris+0.081
 Er+0.077
innie+0.074
abel+0.074
Neg logit contributions
adoes-0.106
 echoed-0.101
 scares-0.101
 bully-0.096
 pitch-0.095
Token asked
Token ID1965
Feature activation+1.27e-02

Pos logit contributions
adoes+0.003
 scares+0.003
 echoed+0.003
 bully+0.003
 net+0.002
Neg logit contributions
 anx-0.003
ris-0.003
 Er-0.003
abel-0.003
ma-0.003
Token her
Token ID607
Feature activation+7.02e-02

Pos logit contributions
adoes+0.017
 arrows+0.015
 echoed+0.015
 mushroom+0.015
 arrow+0.015
Neg logit contributions
ris-0.023
 anx-0.021
icer-0.020
abel-0.020
 Er-0.019
Token mom
Token ID1995
Feature activation+1.01e-02

Pos logit contributions
 hoop+0.087
adoes+0.087
 mushroom+0.086
 arrow+0.086
 echoed+0.085
Neg logit contributions
 anx-0.116
icer-0.116
ris-0.114
rom-0.107
son-0.105
TokenJake
Token ID43930
Feature activation-6.85e-02

Pos logit contributions
 scares+0.038
adoes+0.037
 mushroom+0.036
 echoed+0.036
 net+0.036
Neg logit contributions
 anx-0.046
 patiently-0.040
ris-0.040
abel-0.039
 Er-0.038
Token was
Token ID373
Feature activation-1.88e-02

Pos logit contributions
 anx+0.109
ris+0.103
 Er+0.099
rom+0.096
innie+0.095
Neg logit contributions
adoes-0.099
 scares-0.095
 echoed-0.094
 bully-0.091
 arrow-0.090
Token a
Token ID257
Feature activation-2.05e-02

Pos logit contributions
ris+0.031
 Er+0.029
rom+0.028
abel+0.028
 anx+0.027
Neg logit contributions
 tail-0.026
adoes-0.025
 pitch-0.024
 net-0.024
 echoed-0.024
Token three
Token ID1115
Feature activation-5.68e-02

Pos logit contributions
ris+0.042
 anx+0.041
abella+0.040
rom+0.040
 Er+0.039
Neg logit contributions
 bully-0.022
 tail-0.020
 snail-0.020
 pitch-0.020
 net-0.019
Token-
Token ID12
Feature activation-5.32e-02

Pos logit contributions
 anx+0.059
ris+0.054
 Er+0.053
rom+0.048
innie+0.048
Neg logit contributions
adoes-0.114
 scares-0.112
 bully-0.111
 echoed-0.110
 mushroom-0.106
Tokenyear
Token ID1941
Feature activation-1.47e-01

Pos logit contributions
 anx+0.066
 Er+0.063
icer+0.059
 patiently+0.056
innie+0.055
Neg logit contributions
igator-0.099
igators-0.084
 bully-0.081
 arrow-0.080
ickets-0.080
Token-
Token ID12
Feature activation-9.05e-02

Pos logit contributions
 anx+0.177
ris+0.163
 Er+0.162
abella+0.155
abel+0.153
Neg logit contributions
adoes-0.273
 arrow-0.257
 bully-0.255
 scares-0.255
 net-0.252
Tokenold
Token ID727
Feature activation-8.94e-02

Pos logit contributions
 anx+0.197
 Er+0.186
icer+0.184
 patiently+0.176
 symp+0.175
Neg logit contributions
igator-0.080
 arrow-0.066
igators-0.065
ickets-0.065
 bully-0.061
Token boy
Token ID2933
Feature activation-3.81e-02

Pos logit contributions
ris+0.172
 anx+0.172
innie+0.166
abella+0.163
rom+0.156
Neg logit contributions
 bully-0.095
adoes-0.094
 tail-0.089
 arrow-0.088
 snail-0.088
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
 anx+0.064
ris+0.058
innie+0.057
abella+0.056
 Er+0.056
Neg logit contributions
 echoed-0.048
 scares-0.048
 bully-0.048
adoes-0.047
 tail-0.046
Token He
Token ID679
Feature activation-2.74e-02

Pos logit contributions
 anx+0.023
ris+0.022
ve+0.021
ma+0.021
 patiently+0.020
Neg logit contributions
adoes-0.016
 echoed-0.015
 scares-0.014
 pitch-0.014
 hoop-0.014
TokenBen
Token ID11696
Feature activation-2.90e-02

Pos logit contributions
 echoed+0.030
 scares+0.030
adoes+0.029
 mushroom+0.029
 net+0.028
Neg logit contributions
 anx-0.042
 patiently-0.038
ris-0.037
abel-0.036
 Er-0.036
Token was
Token ID373
Feature activation-9.02e-03

Pos logit contributions
 anx+0.046
ris+0.044
 Er+0.042
abel+0.041
rom+0.041
Neg logit contributions
adoes-0.042
 scares-0.040
 echoed-0.040
 bully-0.038
 arrow-0.038
Token a
Token ID257
Feature activation-2.05e-02

Pos logit contributions
ris+0.014
 anx+0.013
 Er+0.013
rom+0.013
innie+0.013
Neg logit contributions
adoes-0.013
 echoed-0.013
 tail-0.013
 scares-0.012
 pitch-0.012
Token 3
Token ID513
Feature activation-5.94e-02

Pos logit contributions
ris+0.040
rom+0.039
 anx+0.039
abella+0.038
ll+0.038
Neg logit contributions
 bully-0.023
 pitch-0.021
 snail-0.021
 tail-0.021
 net-0.020
Token-
Token ID12
Feature activation-5.70e-02

Pos logit contributions
 anx+0.065
 Er+0.064
ris+0.060
abella+0.058
innie+0.058
Neg logit contributions
adoes-0.112
 bully-0.110
 echoed-0.109
 scares-0.108
 arrow-0.103
Tokenyear
Token ID1941
Feature activation-1.46e-01

Pos logit contributions
icer+0.074
 Er+0.071
 anx+0.069
innie+0.064
ris+0.062
Neg logit contributions
igator-0.099
 bully-0.085
igators-0.080
 arrow-0.080
ickets-0.078
Token-
Token ID12
Feature activation-8.98e-02

Pos logit contributions
 anx+0.180
 Er+0.172
ris+0.171
abella+0.167
innie+0.165
Neg logit contributions
adoes-0.255
 arrow-0.248
 bully-0.242
 net-0.240
 arrows-0.239
Tokenold
Token ID727
Feature activation-8.12e-02

Pos logit contributions
icer+0.196
 anx+0.192
 Er+0.191
rom+0.177
 symp+0.175
Neg logit contributions
igator-0.072
 arrow-0.058
 bully-0.057
igators-0.053
ickets-0.053
Token boy
Token ID2933
Feature activation-2.90e-02

Pos logit contributions
ris+0.161
 anx+0.160
innie+0.159
abella+0.154
icer+0.152
Neg logit contributions
 bully-0.083
adoes-0.076
 snail-0.074
 arrow-0.074
 tail-0.074
Token who
Token ID508
Feature activation-6.59e-02

Pos logit contributions
 anx+0.052
innie+0.047
abella+0.047
icer+0.047
ris+0.046
Neg logit contributions
 echoed-0.032
 bully-0.032
 tail-0.031
 pitch-0.031
 arrow-0.031
Token owned
Token ID6898
Feature activation-2.99e-02

Pos logit contributions
 anx+0.119
ris+0.112
rom+0.107
 Er+0.106
innie+0.106
Neg logit contributions
 echoed-0.081
 scares-0.080
 bully-0.078
adoes-0.076
 arrow-0.072
TokenCharlie
Token ID37136
Feature activation-4.01e-02

Pos logit contributions
 scares+0.031
adoes+0.031
 echoed+0.030
 net+0.029
 mushroom+0.029
Neg logit contributions
 anx-0.034
 patiently-0.030
ris-0.030
abel-0.029
 Er-0.029
Token was
Token ID373
Feature activation-2.49e-02

Pos logit contributions
 anx+0.065
ris+0.061
 Er+0.059
rom+0.058
innie+0.057
Neg logit contributions
adoes-0.055
 scares-0.055
 echoed-0.054
 arrow-0.052
 bully-0.052
Token a
Token ID257
Feature activation-2.25e-02

Pos logit contributions
ris+0.040
 Er+0.038
 anx+0.037
abel+0.036
rom+0.036
Neg logit contributions
 tail-0.034
adoes-0.034
 net-0.033
 scares-0.033
 pitch-0.033
Token three
Token ID1115
Feature activation-5.79e-02

Pos logit contributions
ris+0.045
 anx+0.044
abella+0.043
rom+0.043
 Er+0.042
Neg logit contributions
 bully-0.025
 snail-0.024
 tail-0.022
 pitch-0.022
 net-0.021
Token-
Token ID12
Feature activation-5.25e-02

Pos logit contributions
 anx+0.065
ris+0.060
 Er+0.060
abella+0.055
innie+0.055
Neg logit contributions
adoes-0.111
 bully-0.109
 scares-0.109
 echoed-0.106
 mushroom-0.103
Tokenyear
Token ID1941
Feature activation-1.46e-01

Pos logit contributions
 anx+0.063
 Er+0.062
icer+0.058
innie+0.054
 patiently+0.054
Neg logit contributions
igator-0.097
 bully-0.082
igators-0.081
 arrow-0.079
ickets-0.078
Token-
Token ID12
Feature activation-8.82e-02

Pos logit contributions
 anx+0.179
ris+0.167
 Er+0.167
abella+0.160
innie+0.157
Neg logit contributions
adoes-0.264
 arrow-0.249
 bully-0.248
 scares-0.246
 wing-0.244
Tokenold
Token ID727
Feature activation-8.79e-02

Pos logit contributions
 anx+0.189
 Er+0.183
icer+0.182
 symp+0.174
 patiently+0.171
Neg logit contributions
igator-0.074
 arrow-0.059
 bully-0.058
igators-0.057
ickets-0.057
Token girl
Token ID2576
Feature activation-1.92e-02

Pos logit contributions
ris+0.168
 anx+0.166
innie+0.162
abella+0.160
 Er+0.153
Neg logit contributions
 bully-0.095
adoes-0.095
 tail-0.089
 arrow-0.088
 snail-0.088
Token who
Token ID508
Feature activation-7.37e-02

Pos logit contributions
 anx+0.032
ris+0.030
 Er+0.030
innie+0.029
abella+0.029
Neg logit contributions
 echoed-0.024
adoes-0.024
 tail-0.024
 web-0.024
 net-0.023
Token loved
Token ID6151
Feature activation-3.76e-02

Pos logit contributions
 anx+0.138
ris+0.131
 Er+0.125
innie+0.124
rom+0.123
Neg logit contributions
 echoed-0.085
 scares-0.084
adoes-0.081
 bully-0.080
 arrow-0.073
Token Benny
Token ID44275
Feature activation+3.97e-02

Pos logit contributions
 anx+0.008
ris+0.007
 Er+0.007
 patiently+0.007
abel+0.007
Neg logit contributions
adoes-0.007
 scares-0.006
 echoed-0.006
 bully-0.006
 affect-0.006
Token tried
Token ID3088
Feature activation+4.81e-02

Pos logit contributions
adoes+0.074
 scares+0.067
 bully+0.065
 echoed+0.064
 net+0.064
Neg logit contributions
 anx-0.054
ris-0.048
abel-0.047
ma-0.045
 Er-0.045
Token to
Token ID284
Feature activation+1.67e-02

Pos logit contributions
adoes+0.090
 scares+0.085
 echoed+0.083
 bully+0.082
 arrows+0.082
Neg logit contributions
 anx-0.059
ris-0.057
abel-0.054
ma-0.052
 Er-0.051
Token swat
Token ID44284
Feature activation+5.92e-02

Pos logit contributions
adoes+0.026
 scares+0.023
 echoed+0.022
 CA+0.022
 Sorry+0.022
Neg logit contributions
 anx-0.029
 first-0.025
 patiently-0.025
ris-0.024
 next-0.024
Token it
Token ID340
Feature activation+2.09e-02

Pos logit contributions
adoes+0.092
 scares+0.086
 echoed+0.084
 bully+0.082
 arrows+0.082
Neg logit contributions
 anx-0.094
ris-0.089
abel-0.084
 Er-0.083
ma-0.082
Token away
Token ID1497
Feature activation+7.03e-02

Pos logit contributions
adoes+0.035
 scares+0.032
 echoed+0.031
 bully+0.031
 net+0.030
Neg logit contributions
 anx-0.032
ris-0.028
 Er-0.027
 patiently-0.027
ma-0.027
Token but
Token ID475
Feature activation-5.90e-03

Pos logit contributions
adoes+0.113
 scares+0.105
 echoed+0.102
 bully+0.100
 arrows+0.098
Neg logit contributions
 anx-0.109
ris-0.102
 Er-0.096
abel-0.095
ma-0.094
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
 anx+0.008
ris+0.007
 Er+0.007
 patiently+0.006
ma+0.006
Neg logit contributions
adoes-0.011
 scares-0.010
 echoed-0.010
 bully-0.010
ickets-0.010
Token bird
Token ID6512
Feature activation+3.52e-02

Pos logit contributions
adoes+0.023
 scares+0.021
 echoed+0.020
 bully+0.019
 arrows+0.019
Neg logit contributions
 anx-0.041
ris-0.038
abel-0.037
ma-0.037
 Er-0.036
Token wouldn
Token ID3636
Feature activation-3.28e-02

Pos logit contributions
adoes+0.057
 scares+0.053
 echoed+0.051
 bully+0.050
 net+0.050
Neg logit contributions
 anx-0.054
ris-0.050
abel-0.047
ma-0.047
 Er-0.047
Token't
Token ID470
Feature activation+8.53e-03

Pos logit contributions
 anx+0.051
ris+0.049
abel+0.046
ma+0.045
 Er+0.045
Neg logit contributions
adoes-0.050
 scares-0.049
 echoed-0.047
 bully-0.046
 affect-0.045
Token loved
Token ID6151
Feature activation-1.85e-02

Pos logit contributions
 anx+0.033
ris+0.032
 Er+0.030
ma+0.030
abel+0.030
Neg logit contributions
adoes-0.024
 scares-0.024
 echoed-0.023
 bully-0.022
 arrows-0.022
Token to
Token ID284
Feature activation-1.24e-02

Pos logit contributions
 anx+0.019
ris+0.017
 Er+0.015
abel+0.015
ma+0.015
Neg logit contributions
adoes-0.040
 scares-0.037
 echoed-0.037
 bully-0.036
 arrows-0.036
Token use
Token ID779
Feature activation-2.02e-02

Pos logit contributions
 anx+0.021
ris+0.019
ma+0.018
abel+0.018
 patiently+0.018
Neg logit contributions
adoes-0.019
 echoed-0.017
 scares-0.016
 racket-0.016
AAAA-0.016
Token his
Token ID465
Feature activation+1.47e-03

Pos logit contributions
 anx+0.024
 Er+0.020
ris+0.020
ma+0.019
abel+0.018
Neg logit contributions
adoes-0.039
 echoed-0.038
 bully-0.037
 scares-0.037
 affect-0.037
Token long
Token ID890
Feature activation-8.91e-04

Pos logit contributions
adoes+0.002
 scares+0.002
 echoed+0.002
 bully+0.002
 arrows+0.002
Neg logit contributions
 anx-0.003
ris-0.002
 Er-0.002
ma-0.002
abel-0.002
Token tail
Token ID7894
Feature activation+6.24e-02

Pos logit contributions
 anx+0.002
ris+0.002
ma+0.002
abel+0.001
son+0.001
Neg logit contributions
adoes-0.001
 scares-0.001
 echoed-0.001
 bully-0.001
ickets-0.001
Token to
Token ID284
Feature activation-2.73e-02

Pos logit contributions
 scares+0.151
AAAA+0.151
adoes+0.150
 Sorry+0.147
 bully+0.145
Neg logit contributions
 anx-0.041
 next-0.033
 first-0.032
 patiently-0.031
ris-0.031
Token swing
Token ID9628
Feature activation-4.00e-03

Pos logit contributions
 anx+0.052
ris+0.048
ma+0.046
 Er+0.046
abel+0.046
Neg logit contributions
adoes-0.035
 scares-0.031
 echoed-0.031
 bully-0.029
 arrows-0.028
Token from
Token ID422
Feature activation+4.72e-03

Pos logit contributions
 anx+0.005
ris+0.005
abel+0.005
 Er+0.005
ma+0.005
Neg logit contributions
adoes-0.007
 scares-0.007
 echoed-0.007
 bully-0.006
 arrows-0.006
Token tree
Token ID5509
Feature activation-3.74e-04

Pos logit contributions
adoes+0.007
 scares+0.007
 echoed+0.006
 bully+0.006
 arrows+0.006
Neg logit contributions
 anx-0.008
ris-0.007
abel-0.007
 Er-0.007
ma-0.007
Token to
Token ID284
Feature activation-1.57e-02

Pos logit contributions
 anx+0.000
ris+0.000
abel+0.000
 Er+0.000
ma+0.000
Neg logit contributions
adoes-0.001
 scares-0.001
 echoed-0.001
 arrows-0.001
 net-0.001
Token on
Token ID319
Feature activation+2.06e-02

Pos logit contributions
adoes+0.046
 scares+0.042
 echoed+0.041
 net+0.040
 arrows+0.040
Neg logit contributions
 anx-0.040
ris-0.038
 Er-0.036
abel-0.035
innie-0.035
Token the
Token ID262
Feature activation-1.16e-02

Pos logit contributions
adoes+0.030
 pitch+0.028
 net+0.027
 tail+0.026
 squash+0.026
Neg logit contributions
 anx-0.036
ris-0.035
abel-0.034
ll-0.034
innie-0.033
Token counter
Token ID3753
Feature activation-7.51e-02

Pos logit contributions
 anx+0.020
ma+0.018
ris+0.018
 Er+0.018
 next+0.018
Neg logit contributions
adoes-0.016
 echoed-0.015
 scares-0.015
 arrows-0.014
 bully-0.014
Token and
Token ID290
Feature activation+1.41e-02

Pos logit contributions
innie+0.133
ris+0.122
 Er+0.122
abella+0.120
 anx+0.120
Neg logit contributions
adoes-0.099
 net-0.093
 scares-0.090
 echoed-0.089
 web-0.089
Token ran
Token ID4966
Feature activation+1.72e-02

Pos logit contributions
adoes+0.021
 scares+0.020
 echoed+0.019
 bully+0.019
 arrows+0.019
Neg logit contributions
 anx-0.023
ris-0.022
abel-0.020
 Er-0.020
ma-0.020
Token away
Token ID1497
Feature activation+7.02e-02

Pos logit contributions
adoes+0.028
 scares+0.026
 pitch+0.025
 bully+0.025
 net+0.025
Neg logit contributions
 anx-0.026
ris-0.024
abel-0.024
innie-0.024
 Er-0.024
Token.
Token ID13
Feature activation+2.00e-02

Pos logit contributions
adoes+0.151
 scares+0.141
 echoed+0.139
 arrows+0.137
 bully+0.137
Neg logit contributions
 anx-0.073
ris-0.064
 Er-0.059
 patiently-0.057
abel-0.057
Token They
Token ID1119
Feature activation+1.92e-02

Pos logit contributions
adoes+0.030
 echoed+0.027
 scares+0.027
 bully+0.026
 arrows+0.026
Neg logit contributions
 anx-0.033
ris-0.031
 Er-0.030
abel-0.029
 patiently-0.028
Token did
Token ID750
Feature activation-2.03e-04

Pos logit contributions
adoes+0.025
 scares+0.024
 echoed+0.024
 bully+0.023
 net+0.022
Neg logit contributions
 anx-0.034
ris-0.032
 Er-0.031
innie-0.031
ma-0.031
Token not
Token ID407
Feature activation-7.20e-03

Pos logit contributions
 anx+0.000
ris+0.000
 Er+0.000
abel+0.000
ma+0.000
Neg logit contributions
adoes-0.000
 scares-0.000
 echoed-0.000
 bully-0.000
 net-0.000
Token want
Token ID765
Feature activation+1.88e-02

Pos logit contributions
 anx+0.010
ris+0.010
vy+0.009
abel+0.009
ma+0.009
Neg logit contributions
adoes-0.013
 scares-0.010
 arrows-0.010
 CA-0.010
 echoed-0.010
Token to
Token ID284
Feature activation-2.18e-02

Pos logit contributions
adoes+0.016
 scares+0.015
 echoed+0.015
 bully+0.015
 net+0.015
Neg logit contributions
 anx-0.009
ris-0.009
abel-0.008
 Er-0.008
ma-0.008
Token the
Token ID262
Feature activation+6.67e-03

Pos logit contributions
 anx+0.028
ris+0.026
 Er+0.024
abel+0.024
ma+0.024
Neg logit contributions
adoes-0.041
 scares-0.038
 echoed-0.037
 bully-0.036
 arrows-0.036
Token store
Token ID3650
Feature activation-3.03e-02

Pos logit contributions
adoes+0.009
 pitch+0.009
 bully+0.009
 mushroom+0.009
 net+0.009
Neg logit contributions
 anx-0.011
ris-0.011
abel-0.010
ma-0.010
 Er-0.010
Token and
Token ID290
Feature activation+8.26e-03

Pos logit contributions
 anx+0.058
abel+0.056
ris+0.055
innie+0.053
 Er+0.052
Neg logit contributions
adoes-0.034
 pitch-0.033
 arrow-0.030
 snail-0.030
 net-0.030
Token saw
Token ID2497
Feature activation-2.29e-02

Pos logit contributions
adoes+0.012
 scares+0.011
 bully+0.011
 echoed+0.011
 pitch+0.011
Neg logit contributions
 anx-0.013
ris-0.013
abel-0.012
innie-0.012
 Er-0.012
Token all
Token ID477
Feature activation+6.26e-02

Pos logit contributions
 anx+0.035
ris+0.035
abel+0.030
ma+0.029
 Er+0.029
Neg logit contributions
adoes-0.037
 arrow-0.034
 arrows-0.034
 scares-0.034
 mushroom-0.033
Token the
Token ID262
Feature activation-6.40e-03

Pos logit contributions
adoes+0.122
 scares+0.114
 echoed+0.112
 bullet+0.112
 arrows+0.111
Neg logit contributions
 anx-0.073
ma-0.070
ris-0.068
 Er-0.066
abel-0.066
Token perfume
Token ID40094
Feature activation-1.66e-02

Pos logit contributions
 anx+0.013
ris+0.012
abel+0.011
 patiently+0.011
ma+0.011
Neg logit contributions
adoes-0.008
 scares-0.007
 bully-0.007
 mushroom-0.007
 pitch-0.006
Token bottles
Token ID14096
Feature activation-7.10e-03

Pos logit contributions
 anx+0.021
ris+0.019
abel+0.018
ma+0.018
 Er+0.018
Neg logit contributions
adoes-0.031
 scares-0.029
 echoed-0.028
 bully-0.028
 arrows-0.028
Token lined
Token ID16566
Feature activation-8.13e-02

Pos logit contributions
 anx+0.008
ris+0.007
ma+0.007
 Er+0.006
 patiently+0.006
Neg logit contributions
adoes-0.015
 scares-0.014
 bully-0.014
AAAA-0.013
 echoed-0.013
Token up
Token ID510
Feature activation-1.31e-02

Pos logit contributions
ris+0.157
 anx+0.151
abel+0.145
innie+0.143
 Er+0.134
Neg logit contributions
adoes-0.109
 pitch-0.092
 arrows-0.086
 squash-0.084
 arrow-0.080
Token to
Token ID284
Feature activation-1.16e-02

Pos logit contributions
adoes+0.018
 scares+0.017
 echoed+0.017
 bully+0.017
 arrows+0.016
Neg logit contributions
 anx-0.010
ris-0.010
abel-0.009
ma-0.009
 Er-0.009
Token Rabb
Token ID22502
Feature activation-2.39e-02

Pos logit contributions
 anx+0.021
ris+0.021
abel+0.019
ma+0.019
rom+0.018
Neg logit contributions
adoes-0.015
 scares-0.014
 bully-0.014
 echoed-0.014
 mushroom-0.014
Tokenie
Token ID494
Feature activation+4.41e-02

Pos logit contributions
 anx+0.033
ris+0.031
abel+0.029
ma+0.029
 Er+0.028
Neg logit contributions
adoes-0.042
 scares-0.040
 echoed-0.039
 bully-0.038
 arrows-0.037
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
adoes+0.070
 mushroom+0.068
 snail+0.066
 bully+0.065
 arrows+0.065
Neg logit contributions
 anx-0.065
ris-0.065
innie-0.059
rom-0.058
abella-0.058
Token Fox
Token ID5426
Feature activation+3.88e-02

Pos logit contributions
 anx+0.016
 Er+0.016
ris+0.014
abel+0.014
 patiently+0.014
Neg logit contributions
adoes-0.026
 scares-0.024
 echoed-0.024
 bully-0.024
 affect-0.024
Tokeny
Token ID88
Feature activation+7.47e-02

Pos logit contributions
adoes+0.058
 scares+0.053
 echoed+0.051
 bully+0.050
 pitch+0.050
Neg logit contributions
 anx-0.065
ris-0.061
ma-0.059
abel-0.059
 Er-0.058
Token said
Token ID531
Feature activation+8.59e-06

Pos logit contributions
adoes+0.119
 scares+0.112
 echoed+0.110
 bully+0.108
 arrows+0.105
Neg logit contributions
 anx-0.114
ris-0.109
 Er-0.101
abel-0.100
ma-0.099
Token,
Token ID11
Feature activation+2.59e-02

Feature not active on prev token; no predictions were affected.
Token "
Token ID366
Feature activation-6.80e-03

Pos logit contributions
adoes+0.022
 echoed+0.021
 scares+0.020
 snail+0.019
 bully+0.019
Neg logit contributions
ris-0.060
 anx-0.056
abel-0.055
ma-0.054
icer-0.053
TokenI
Token ID40
Feature activation+1.35e-02

Pos logit contributions
 anx+0.012
ris+0.012
abel+0.011
ma+0.010
 Er+0.010
Neg logit contributions
adoes-0.010
 scares-0.009
 bully-0.009
 echoed-0.009
 snail-0.008
Token'm
Token ID1101
Feature activation+1.22e-02

Pos logit contributions
 scares+0.016
 bully+0.016
 echoed+0.015
 affect+0.015
adoes+0.015
Neg logit contributions
 anx-0.026
 Er-0.024
ris-0.023
rom-0.023
son-0.023
Token and
Token ID290
Feature activation-1.63e-02

Pos logit contributions
adoes+0.029
 tail+0.028
 arrows+0.028
 net+0.028
 arrow+0.027
Neg logit contributions
 anx-0.056
ris-0.053
abel-0.051
son-0.051
 Er-0.051
Token Kar
Token ID9375
Feature activation-5.24e-02

Pos logit contributions
 anx+0.029
ris+0.028
abel+0.026
ma+0.026
 Er+0.025
Neg logit contributions
adoes-0.022
 scares-0.020
 echoed-0.020
 bully-0.020
 arrows-0.019
Tokenla
Token ID5031
Feature activation-5.25e-02

Pos logit contributions
 anx+0.075
ris+0.062
abel+0.062
 tenth+0.061
innie+0.061
Neg logit contributions
adoes-0.074
 arrow-0.073
 arrows-0.073
 wing-0.072
uders-0.071
Token.
Token ID13
Feature activation-1.84e-02

Pos logit contributions
 anx+0.079
 Er+0.070
ris+0.069
innie+0.069
son+0.068
Neg logit contributions
 echoed-0.077
adoes-0.075
 arrow-0.075
 tail-0.074
 arrows-0.074
Token One
Token ID1881
Feature activation-4.06e-02

Pos logit contributions
 anx+0.036
ris+0.035
abel+0.033
ma+0.033
 Er+0.032
Neg logit contributions
adoes-0.021
 scares-0.019
 echoed-0.019
 bully-0.018
 arrows-0.018
Token day
Token ID1110
Feature activation+3.82e-02

Pos logit contributions
 anx+0.047
ris+0.043
abel+0.040
ma+0.039
 Er+0.039
Neg logit contributions
adoes-0.082
 scares-0.077
 echoed-0.075
 bully-0.074
 arrows-0.073
Token they
Token ID484
Feature activation-1.78e-02

Pos logit contributions
adoes+0.075
 scares+0.069
 web+0.067
 bully+0.066
 echoed+0.066
Neg logit contributions
 anx-0.047
 Er-0.044
ris-0.042
 patiently-0.039
ma-0.038
Token were
Token ID547
Feature activation+1.65e-03

Pos logit contributions
 anx+0.024
ris+0.021
ma+0.020
abel+0.020
 patiently+0.019
Neg logit contributions
adoes-0.035
 scares-0.032
 web-0.031
 thorn-0.030
 affect-0.030
Token playing
Token ID2712
Feature activation+2.19e-03

Pos logit contributions
adoes+0.002
 scares+0.002
 bully+0.002
 echoed+0.002
 net+0.002
Neg logit contributions
 anx-0.003
ris-0.003
abel-0.002
 Er-0.002
ma-0.002
Token in
Token ID287
Feature activation+1.63e-03

Pos logit contributions
adoes+0.004
 scares+0.004
 echoed+0.004
 bully+0.004
 arrows+0.004
Neg logit contributions
 anx-0.003
ris-0.003
 Er-0.002
ma-0.002
abel-0.002
Token the
Token ID262
Feature activation-2.75e-02

Pos logit contributions
 pitch+0.002
adoes+0.002
 net+0.002
 scares+0.002
 hoop+0.002
Neg logit contributions
 anx-0.003
ris-0.003
abel-0.003
abella-0.003
innie-0.003
Token the
Token ID262
Feature activation+1.54e-02

Pos logit contributions
 anx+0.043
ris+0.041
abel+0.038
 Er+0.038
ma+0.037
Neg logit contributions
adoes-0.044
 scares-0.041
 echoed-0.040
 bully-0.039
 arrows-0.039
Token help
Token ID1037
Feature activation-2.28e-02

Pos logit contributions
adoes+0.019
 scares+0.018
 net+0.018
 wing+0.018
 bully+0.018
Neg logit contributions
 anx-0.030
ris-0.029
abel-0.027
abella-0.027
 Er-0.027
Token we
Token ID356
Feature activation-7.01e-03

Pos logit contributions
 anx+0.038
ris+0.036
abel+0.034
 Er+0.034
ma+0.034
Neg logit contributions
adoes-0.034
 scares-0.031
 echoed-0.030
 bully-0.029
 arrows-0.029
Token receive
Token ID3328
Feature activation-3.01e-02

Pos logit contributions
 anx+0.011
ris+0.010
 Er+0.010
ma+0.010
abel+0.010
Neg logit contributions
adoes-0.011
 scares-0.010
 echoed-0.010
 bully-0.010
 net-0.009
Token.
Token ID13
Feature activation-5.14e-03

Pos logit contributions
 anx+0.045
ris+0.042
abel+0.040
 Er+0.040
ma+0.039
Neg logit contributions
adoes-0.050
 scares-0.046
 echoed-0.045
 bully-0.044
 arrows-0.044
Token<|endoftext|>
Token ID50256
Feature activation+2.64e-02

Pos logit contributions
 anx+0.007
ris+0.006
abel+0.006
ma+0.006
 Er+0.006
Neg logit contributions
adoes-0.009
 scares-0.009
 echoed-0.008
 bully-0.008
 net-0.008
TokenOnce
Token ID7454
Feature activation-4.67e-03

Pos logit contributions
 scares+0.031
 echoed+0.031
 net+0.030
 mushroom+0.030
adoes+0.030
Neg logit contributions
 anx-0.049
 patiently-0.044
abel-0.043
ris-0.043
 Er-0.042
Token upon
Token ID2402
Feature activation+1.68e-02

Pos logit contributions
 anx+0.006
ris+0.005
 Er+0.005
abel+0.005
rom+0.005
Neg logit contributions
adoes-0.009
 scares-0.008
 echoed-0.008
 net-0.008
 mushroom-0.008
Token a
Token ID257
Feature activation-1.93e-02

Pos logit contributions
adoes+0.025
 echoed+0.025
 tail+0.024
 scares+0.024
 pitch+0.024
Neg logit contributions
ris-0.026
 anx-0.026
 Er-0.024
rom-0.023
ma-0.023
Token time
Token ID640
Feature activation+1.03e-02

Pos logit contributions
 Er+0.033
rom+0.031
 anx+0.031
years+0.029
son+0.027
Neg logit contributions
 pitch-0.026
 net-0.025
 scares-0.025
 tail-0.025
 echoed-0.025
Token,
Token ID11
Feature activation-6.05e-03

Pos logit contributions
adoes+0.017
 scares+0.016
 echoed+0.016
 bully+0.016
 arrows+0.015
Neg logit contributions
 anx-0.015
ris-0.014
ma-0.013
 Er-0.013
abel-0.013
Token and
Token ID290
Feature activation+1.92e-02

Pos logit contributions
adoes+0.010
 scares+0.009
 echoed+0.009
 arrows+0.009
 bully+0.009
Neg logit contributions
 anx-0.009
ris-0.008
abel-0.008
 Er-0.008
innie-0.007
Token show
Token ID905
Feature activation+1.84e-02

Pos logit contributions
adoes+0.031
AAAA+0.027
 scares+0.027
 echoed+0.026
 racket+0.026
Neg logit contributions
 anx-0.031
ris-0.027
 first-0.027
 patiently-0.026
 Er-0.026
Token everyone
Token ID2506
Feature activation+5.86e-02

Pos logit contributions
adoes+0.029
 scares+0.028
 echoed+0.027
 bully+0.026
 net+0.026
Neg logit contributions
 anx-0.029
ris-0.026
abel-0.025
 Er-0.025
ma-0.025
Token her
Token ID607
Feature activation+2.06e-02

Pos logit contributions
adoes+0.107
 scares+0.100
 bullet+0.098
AAAA+0.097
 bully+0.097
Neg logit contributions
 anx-0.076
ris-0.070
ma-0.069
abel-0.069
 Er-0.068
Token voice
Token ID3809
Feature activation+3.16e-02

Pos logit contributions
adoes+0.029
 scares+0.026
 echoed+0.026
 bully+0.025
 net+0.024
Neg logit contributions
 anx-0.036
ris-0.034
abel-0.032
ma-0.032
 Er-0.032
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
adoes+0.063
 scares+0.059
 echoed+0.058
 bully+0.057
 arrows+0.056
Neg logit contributions
 anx-0.036
ris-0.033
abel-0.030
 Er-0.030
ma-0.030
Token She
Token ID1375
Feature activation+1.25e-02

Pos logit contributions
adoes+0.024
 scares+0.023
 echoed+0.023
 affect+0.022
 bully+0.022
Neg logit contributions
 anx-0.025
ris-0.022
 Er-0.022
abel-0.021
 patiently-0.021
Token told
Token ID1297
Feature activation+3.36e-02

Pos logit contributions
adoes+0.019
 scares+0.018
 echoed+0.017
 bully+0.017
 arrows+0.016
Neg logit contributions
 anx-0.020
ris-0.019
 Er-0.018
ma-0.018
abel-0.018
Token her
Token ID607
Feature activation+6.28e-02

Pos logit contributions
adoes+0.055
 scares+0.051
 bully+0.050
 arrows+0.049
 echoed+0.049
Neg logit contributions
ris-0.050
 anx-0.050
ma-0.044
abel-0.044
 Er-0.044
Token mom
Token ID1995
Feature activation-4.92e-04

Pos logit contributions
adoes+0.110
 scares+0.103
 echoed+0.100
 bully+0.098
 affect+0.097
Neg logit contributions
 anx-0.088
ris-0.081
abel-0.079
ma-0.077
 Er-0.076
Token and
Token ID290
Feature activation+4.84e-02

Pos logit contributions
 anx+0.001
ris+0.001
 Er+0.001
innie+0.001
icer+0.001
Neg logit contributions
adoes-0.001
 echoed-0.001
 mushroom-0.001
 scares-0.001
 arrow-0.001
Token snacks
Token ID28487
Feature activation-3.52e-02

Pos logit contributions
 anx+0.028
ris+0.028
rom+0.027
 Er+0.026
abella+0.026
Neg logit contributions
 mushroom-0.020
 pitch-0.020
 net-0.019
 scares-0.019
 tail-0.019
Token together
Token ID1978
Feature activation-2.80e-02

Pos logit contributions
 anx+0.057
ris+0.054
 Er+0.053
innie+0.053
abel+0.052
Neg logit contributions
adoes-0.049
 net-0.048
 echoed-0.047
 pitch-0.047
 mushroom-0.045
Token and
Token ID290
Feature activation-5.83e-03

Pos logit contributions
 anx+0.045
ris+0.043
 Er+0.042
innie+0.042
rom+0.042
Neg logit contributions
adoes-0.038
 net-0.037
 echoed-0.037
 pitch-0.037
 tail-0.037
Token then
Token ID788
Feature activation-4.54e-02

Pos logit contributions
 anx+0.009
ris+0.009
 Er+0.008
innie+0.008
abel+0.008
Neg logit contributions
adoes-0.009
 echoed-0.008
 scares-0.008
 bully-0.008
 net-0.008
Token continued
Token ID3767
Feature activation+8.44e-03

Pos logit contributions
 anx+0.074
ris+0.070
abel+0.066
 Er+0.066
ma+0.065
Neg logit contributions
adoes-0.068
 scares-0.064
 echoed-0.062
 bully-0.061
 net-0.060
Token on
Token ID319
Feature activation+2.65e-02

Pos logit contributions
adoes+0.013
 net+0.013
 pitch+0.012
 scares+0.012
 tail+0.012
Neg logit contributions
ris-0.013
 anx-0.013
abel-0.012
innie-0.012
rom-0.011
Token their
Token ID511
Feature activation-2.96e-02

Pos logit contributions
 net+0.030
adoes+0.029
 pitch+0.029
 tail+0.029
 arrows+0.028
Neg logit contributions
 anx-0.050
ris-0.050
rom-0.049
abel-0.048
iam-0.048
Token adventure
Token ID8855
Feature activation-3.13e-02

Pos logit contributions
ris+0.050
rom+0.050
 anx+0.050
 Er+0.049
abel+0.047
Neg logit contributions
 pitch-0.039
 net-0.039
adoes-0.037
 scares-0.037
 tail-0.037
Token.
Token ID13
Feature activation-9.05e-03

Pos logit contributions
 anx+0.047
innie+0.045
ris+0.044
 Er+0.044
abel+0.043
Neg logit contributions
adoes-0.048
 echoed-0.046
 pitch-0.046
 arrows-0.045
 net-0.044
Token They
Token ID1119
Feature activation-3.81e-03

Pos logit contributions
 anx+0.016
ris+0.015
 Er+0.014
abel+0.014
ma+0.014
Neg logit contributions
adoes-0.013
 scares-0.012
 echoed-0.012
 bully-0.011
 arrows-0.011
Token walked
Token ID6807
Feature activation-1.34e-02

Pos logit contributions
 anx+0.007
innie+0.007
ris+0.006
 Er+0.006
rom+0.006
Neg logit contributions
 echoed-0.004
 net-0.004
 arrow-0.004
 scatter-0.004
 mushroom-0.004
Token mom
Token ID1995
Feature activation-5.51e-02

Pos logit contributions
adoes+0.002
 scares+0.001
 echoed+0.001
 bully+0.001
 net+0.001
Neg logit contributions
 anx-0.002
ris-0.001
 Er-0.001
abel-0.001
ma-0.001
Token's
Token ID338
Feature activation-3.07e-03

Pos logit contributions
 anx+0.091
ris+0.085
innie+0.082
 Er+0.081
abella+0.080
Neg logit contributions
adoes-0.078
 echoed-0.076
 scares-0.072
 pitch-0.072
 arrow-0.072
Token opinion
Token ID4459
Feature activation-1.46e-02

Pos logit contributions
ris+0.005
 anx+0.005
 Er+0.005
abella+0.005
abel+0.005
Neg logit contributions
adoes-0.004
 scares-0.004
 net-0.004
 pitch-0.004
 arrow-0.004
Token,
Token ID11
Feature activation+2.49e-02

Pos logit contributions
 anx+0.019
ris+0.017
abel+0.016
ma+0.016
 Er+0.016
Neg logit contributions
adoes-0.027
 scares-0.026
 echoed-0.025
 bully-0.025
 arrows-0.024
Token even
Token ID772
Feature activation+2.26e-02

Pos logit contributions
adoes+0.039
 echoed+0.036
 scares+0.036
 arrows+0.034
 arrow+0.034
Neg logit contributions
 anx-0.039
ris-0.038
 Er-0.035
ma-0.034
abel-0.034
Token if
Token ID611
Feature activation-4.37e-03

Pos logit contributions
adoes+0.035
 scares+0.032
 echoed+0.031
 bully+0.031
 arrows+0.030
Neg logit contributions
 anx-0.037
ris-0.034
abel-0.032
 Er-0.032
ma-0.032
Token it
Token ID340
Feature activation+3.32e-02

Pos logit contributions
ris+0.008
 anx+0.008
son+0.008
icer+0.007
innie+0.007
Neg logit contributions
adoes-0.006
 scares-0.005
 echoed-0.005
 arrows-0.005
 arrow-0.005
Token didn
Token ID1422
Feature activation-1.19e-02

Pos logit contributions
adoes+0.058
 scares+0.054
 echoed+0.053
 bully+0.052
 arrows+0.051
Neg logit contributions
 anx-0.047
ris-0.043
abel-0.041
 Er-0.040
ma-0.040
Token't
Token ID470
Feature activation-5.19e-03

Pos logit contributions
 anx+0.017
ris+0.016
abel+0.015
ma+0.015
 Er+0.015
Neg logit contributions
adoes-0.020
 scares-0.019
 echoed-0.018
 bully-0.018
 pitch-0.017
Token turn
Token ID1210
Feature activation-3.66e-02

Pos logit contributions
 anx+0.008
ris+0.007
abel+0.007
ma+0.007
 first+0.007
Neg logit contributions
adoes-0.008
 net-0.007
 arrows-0.007
 scares-0.007
AAAA-0.007
Token out
Token ID503
Feature activation+9.81e-03

Pos logit contributions
 anx+0.061
ris+0.058
abel+0.055
 Er+0.055
ma+0.054
Neg logit contributions
adoes-0.054
 scares-0.050
 echoed-0.049
 bully-0.047
 arrows-0.047
Token and
Token ID290
Feature activation+9.07e-03

Pos logit contributions
 anx+0.044
ris+0.041
innie+0.040
abel+0.039
 Er+0.038
Neg logit contributions
adoes-0.044
 echoed-0.042
 scares-0.042
 bully-0.040
 arrow-0.040
Token became
Token ID2627
Feature activation+2.61e-02

Pos logit contributions
adoes+0.016
 scares+0.015
ickets+0.015
 echoed+0.015
 bully+0.014
Neg logit contributions
 anx-0.012
ris-0.011
 Er-0.011
ma-0.010
abel-0.010
Token very
Token ID845
Feature activation-1.81e-02

Pos logit contributions
adoes+0.047
 scares+0.044
 echoed+0.043
 bully+0.042
 arrows+0.042
Neg logit contributions
 anx-0.036
ris-0.034
abel-0.031
ma-0.031
 Er-0.030
Token worried
Token ID7960
Feature activation-1.81e-02

Pos logit contributions
 anx+0.036
ris+0.035
innie+0.034
abel+0.034
ma+0.033
Neg logit contributions
adoes-0.020
 scares-0.020
 bully-0.019
 echoed-0.018
 arrows-0.017
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
 anx+0.023
ris+0.022
 Er+0.020
abel+0.020
ma+0.020
Neg logit contributions
adoes-0.034
 scares-0.032
 echoed-0.032
 bully-0.030
 arrows-0.030
Token
Token ID220
Feature activation-2.38e-02

Pos logit contributions
 anx+0.017
ris+0.016
ma+0.015
abel+0.015
 patiently+0.015
Neg logit contributions
adoes-0.016
 scares-0.015
 echoed-0.015
 bully-0.014
 pitch-0.014
Token
Token ID198
Feature activation-1.18e-02

Pos logit contributions
 anx+0.034
ris+0.031
abel+0.030
 Er+0.028
ma+0.028
Neg logit contributions
adoes-0.041
 scares-0.038
 echoed-0.038
 bully-0.036
ickets-0.036
Token
Token ID198
Feature activation-4.65e-03

Pos logit contributions
 anx+0.020
abel+0.018
ris+0.017
 Er+0.017
ma+0.017
Neg logit contributions
 scares-0.016
 echoed-0.015
adoes-0.015
AAAA-0.015
ickets-0.015
TokenBut
Token ID1537
Feature activation-2.43e-02

Pos logit contributions
 anx+0.007
ris+0.006
abel+0.006
 Er+0.006
ma+0.006
Neg logit contributions
adoes-0.008
 scares-0.007
 echoed-0.007
 bully-0.007
 pitch-0.007
Token then
Token ID788
Feature activation-2.18e-02

Pos logit contributions
ris+0.035
 anx+0.035
innie+0.032
abel+0.031
son+0.031
Neg logit contributions
adoes-0.040
 echoed-0.038
 arrow-0.037
 net-0.036
 scares-0.036
Token,
Token ID11
Feature activation+3.02e-02

Pos logit contributions
 anx+0.034
ris+0.033
ma+0.030
abel+0.030
rom+0.030
Neg logit contributions
adoes-0.034
 scares-0.032
 echoed-0.031
 arrows-0.031
 bully-0.030
Token right
Token ID826
Feature activation-9.81e-03

Pos logit contributions
adoes+0.034
 scares+0.032
 echoed+0.031
 pitch+0.030
 arrows+0.030
Neg logit contributions
 anx-0.028
ris-0.026
abel-0.024
 Er-0.023
ma-0.023
Token!"
Token ID2474
Feature activation+3.02e-04

Pos logit contributions
 anx+0.020
ris+0.020
 Er+0.019
innie+0.019
years+0.019
Neg logit contributions
adoes-0.010
 scares-0.008
 echoed-0.008
 mushroom-0.008
 arrow-0.008
Token She
Token ID1375
Feature activation-1.47e-03

Pos logit contributions
adoes+0.001
 scares+0.001
 web+0.001
 affect+0.001
ickets+0.001
Neg logit contributions
 anx-0.000
 Er-0.000
ris-0.000
ma-0.000
 patiently-0.000
Token said
Token ID531
Feature activation+3.20e-03

Pos logit contributions
 anx+0.002
ris+0.002
abel+0.002
 Er+0.002
ma+0.002
Neg logit contributions
adoes-0.002
 scares-0.002
 echoed-0.002
 bully-0.002
 net-0.002
Token as
Token ID355
Feature activation+2.95e-02

Pos logit contributions
adoes+0.005
 echoed+0.005
 scares+0.004
 bully+0.004
 tail+0.004
Neg logit contributions
ris-0.005
icer-0.005
 anx-0.004
innie-0.004
abel-0.004
Token she
Token ID673
Feature activation-3.12e-02

Pos logit contributions
 echoed+0.037
adoes+0.034
 arrow+0.032
 scares+0.032
 bully+0.031
Neg logit contributions
ris-0.057
icer-0.054
innie-0.053
rom-0.051
ll-0.050
Token began
Token ID2540
Feature activation-1.46e-02

Pos logit contributions
 anx+0.054
ris+0.051
 Er+0.050
ma+0.049
innie+0.048
Neg logit contributions
 echoed-0.041
adoes-0.040
 scares-0.039
 arrows-0.037
 arrow-0.037
Token to
Token ID284
Feature activation-2.41e-02

Pos logit contributions
ris+0.019
 anx+0.019
icer+0.017
innie+0.017
abel+0.017
Neg logit contributions
adoes-0.025
 scares-0.024
 echoed-0.024
 net-0.023
 arrows-0.023
Token load
Token ID3440
Feature activation-3.65e-02

Pos logit contributions
 anx+0.046
ris+0.042
abel+0.041
 Er+0.041
ma+0.040
Neg logit contributions
adoes-0.031
 scares-0.028
 echoed-0.027
 bully-0.026
 net-0.025
Token the
Token ID262
Feature activation-3.56e-03

Pos logit contributions
ris+0.062
 anx+0.061
ll+0.057
innie+0.057
abel+0.057
Neg logit contributions
adoes-0.054
 arrows-0.050
 scares-0.047
 echoed-0.045
 arrow-0.045
Token shoes
Token ID10012
Feature activation-2.45e-02

Pos logit contributions
ris+0.007
 anx+0.007
abel+0.007
innie+0.006
abella+0.006
Neg logit contributions
 net-0.004
 hoop-0.004
adoes-0.004
 snail-0.004
 arrows-0.004
Token girl
Token ID2576
Feature activation+1.11e-02

Pos logit contributions
 anx+0.020
ris+0.019
 Er+0.018
abel+0.018
ma+0.018
Neg logit contributions
 bully-0.010
 snail-0.010
 pitch-0.009
 net-0.009
 mushroom-0.009
Token named
Token ID3706
Feature activation-3.84e-02

Pos logit contributions
adoes+0.012
 scares+0.011
 bully+0.011
 web+0.011
 arrow+0.011
Neg logit contributions
 anx-0.022
ris-0.020
 Er-0.020
son-0.019
abel-0.019
Token Kate
Token ID16693
Feature activation-3.32e-02

Pos logit contributions
 anx+0.076
ris+0.072
 Er+0.069
abel+0.068
ma+0.068
Neg logit contributions
adoes-0.046
 scares-0.042
 echoed-0.040
 bully-0.039
 net-0.038
Token.
Token ID13
Feature activation-1.51e-02

Pos logit contributions
 Er+0.054
 anx+0.053
innie+0.051
ris+0.050
abella+0.050
Neg logit contributions
 echoed-0.043
 arrow-0.040
 bully-0.040
adoes-0.040
 tail-0.040
Token She
Token ID1375
Feature activation-1.53e-02

Pos logit contributions
 anx+0.028
ris+0.027
 Er+0.025
ma+0.025
abel+0.025
Neg logit contributions
adoes-0.019
 echoed-0.018
 scares-0.017
 mushroom-0.016
 hoop-0.016
Token loved
Token ID6151
Feature activation-2.44e-02

Pos logit contributions
 anx+0.024
ris+0.022
 Er+0.021
abel+0.021
ma+0.021
Neg logit contributions
adoes-0.024
 scares-0.023
 echoed-0.023
 bully-0.022
 arrows-0.021
Token to
Token ID284
Feature activation-1.46e-02

Pos logit contributions
 anx+0.028
ris+0.026
abel+0.023
 Er+0.023
ma+0.023
Neg logit contributions
adoes-0.049
 scares-0.046
 echoed-0.045
 bully-0.045
 arrows-0.044
Token pick
Token ID2298
Feature activation-1.31e-02

Pos logit contributions
 anx+0.024
ris+0.021
ma+0.020
 Er+0.020
abel+0.020
Neg logit contributions
adoes-0.023
 scares-0.021
 echoed-0.021
 racket-0.020
 bully-0.020
Token pretty
Token ID2495
Feature activation+1.13e-02

Pos logit contributions
 anx+0.024
ris+0.023
abel+0.022
 Er+0.022
ma+0.022
Neg logit contributions
adoes-0.017
 scares-0.016
 echoed-0.015
 bully-0.015
 arrows-0.014
Token things
Token ID1243
Feature activation-4.01e-03

Pos logit contributions
adoes+0.013
 echoed+0.013
 scares+0.013
 bully+0.013
 affect+0.012
Neg logit contributions
 anx-0.021
ris-0.020
 Er-0.020
abel-0.019
ma-0.019
Token.
Token ID13
Feature activation-1.74e-02

Pos logit contributions
 anx+0.005
ris+0.005
 Er+0.005
abel+0.005
ma+0.005
Neg logit contributions
adoes-0.007
 scares-0.007
 echoed-0.007
 bully-0.007
 arrows-0.006
Token with
Token ID351
Feature activation+5.39e-03

Pos logit contributions
adoes+0.070
 scares+0.066
 echoed+0.065
 bully+0.064
 arrows+0.062
Neg logit contributions
 anx-0.045
ris-0.041
abel-0.038
ma-0.037
 Er-0.037
Token so
Token ID523
Feature activation+2.16e-02

Pos logit contributions
adoes+0.010
 scares+0.009
 echoed+0.009
 bully+0.009
 net+0.008
Neg logit contributions
 anx-0.008
ris-0.007
 Er-0.007
abel-0.006
ma-0.006
Token many
Token ID867
Feature activation+1.10e-02

Pos logit contributions
adoes+0.030
 echoed+0.029
 scares+0.029
 arrows+0.028
 arrow+0.028
Neg logit contributions
 anx-0.036
ris-0.036
ma-0.034
innie-0.034
icer-0.034
Token new
Token ID649
Feature activation-2.14e-02

Pos logit contributions
adoes+0.017
 scares+0.016
 echoed+0.015
 bully+0.015
 net+0.014
Neg logit contributions
 anx-0.018
ris-0.017
abel-0.016
 Er-0.016
ma-0.016
Token stories
Token ID3923
Feature activation-2.24e-02

Pos logit contributions
 anx+0.037
ris+0.035
abel+0.032
 Er+0.032
ma+0.032
Neg logit contributions
adoes-0.030
 scares-0.028
 bully-0.027
 echoed-0.027
 snail-0.026
Token to
Token ID284
Feature activation-3.21e-02

Pos logit contributions
 anx+0.026
ris+0.024
abel+0.022
 Er+0.022
ma+0.022
Neg logit contributions
adoes-0.044
 scares-0.042
 echoed-0.041
 bully-0.040
 arrows-0.039
Token tell
Token ID1560
Feature activation+6.51e-04

Pos logit contributions
 anx+0.059
ris+0.057
 Er+0.053
abel+0.053
ma+0.053
Neg logit contributions
adoes-0.041
 scares-0.038
 echoed-0.037
 bully-0.036
 arrows-0.036
Token and
Token ID290
Feature activation+1.66e-02

Pos logit contributions
adoes+0.001
 scares+0.001
 echoed+0.001
 arrows+0.001
 bully+0.001
Neg logit contributions
 anx-0.001
ris-0.001
 Er-0.001
abel-0.001
ma-0.001
Token jokes
Token ID14532
Feature activation+3.73e-02

Pos logit contributions
adoes+0.026
 scares+0.024
AAAA+0.023
ickets+0.023
 CA+0.023
Neg logit contributions
 anx-0.027
ris-0.023
 patiently-0.023
 Er-0.023
ma-0.022
Token to
Token ID284
Feature activation-2.96e-03

Pos logit contributions
adoes+0.075
 scares+0.071
 echoed+0.069
 bully+0.068
 arrows+0.067
Neg logit contributions
 anx-0.042
ris-0.039
abel-0.035
 Er-0.035
ma-0.035
Token share
Token ID2648
Feature activation+3.02e-02

Pos logit contributions
 anx+0.004
ris+0.004
abel+0.004
 Er+0.004
ma+0.004
Neg logit contributions
adoes-0.005
 scares-0.005
 echoed-0.005
 bully-0.005
 arrows-0.005
Token k
Token ID479
Feature activation+1.68e-02

Pos logit contributions
ris+0.019
 anx+0.018
rom+0.017
innie+0.017
 Er+0.017
Neg logit contributions
 bully-0.015
 snail-0.015
 net-0.015
 arrow-0.015
 mushroom-0.014
Tokenitty
Token ID9760
Feature activation-9.43e-03

Pos logit contributions
adoes+0.020
 scares+0.018
 bully+0.018
 echoed+0.017
 arrows+0.017
Neg logit contributions
 anx-0.033
ris-0.030
 Er-0.029
abel-0.029
 patiently-0.028
Token was
Token ID373
Feature activation-5.59e-03

Pos logit contributions
 anx+0.016
ris+0.015
 Er+0.015
innie+0.015
rom+0.014
Neg logit contributions
 echoed-0.012
adoes-0.012
 scares-0.012
 arrow-0.012
 bully-0.011
Token happy
Token ID3772
Feature activation-1.02e-02

Pos logit contributions
 anx+0.009
ris+0.008
 Er+0.008
innie+0.008
rom+0.008
Neg logit contributions
adoes-0.008
 echoed-0.008
 scares-0.008
 tail-0.008
 bully-0.008
Token.
Token ID13
Feature activation-1.83e-02

Pos logit contributions
 anx+0.015
ris+0.015
 Er+0.013
innie+0.013
rom+0.013
Neg logit contributions
adoes-0.017
 echoed-0.016
 scares-0.016
 bully-0.015
 net-0.015
Token She
Token ID1375
Feature activation-2.78e-02

Pos logit contributions
 anx+0.033
ris+0.031
abel+0.029
 Er+0.029
ma+0.029
Neg logit contributions
adoes-0.025
 scares-0.023
 echoed-0.022
 bully-0.022
 arrows-0.021
Token was
Token ID373
Feature activation-4.90e-03

Pos logit contributions
 anx+0.046
ris+0.044
 Er+0.042
ma+0.041
innie+0.041
Neg logit contributions
adoes-0.039
 echoed-0.037
 scares-0.037
 arrow-0.035
 bully-0.035
Token able
Token ID1498
Feature activation+1.27e-02

Pos logit contributions
ris+0.008
 anx+0.008
innie+0.007
 Er+0.007
icer+0.007
Neg logit contributions
 echoed-0.007
adoes-0.006
 tail-0.006
 scares-0.006
 wing-0.006
Token to
Token ID284
Feature activation-7.51e-03

Pos logit contributions
adoes+0.025
 scares+0.024
 echoed+0.023
 net+0.023
 bully+0.023
Neg logit contributions
 anx-0.015
ris-0.014
 Er-0.013
abel-0.013
ma-0.012
Token lie
Token ID6486
Feature activation-1.08e-02

Pos logit contributions
 anx+0.014
ris+0.013
 Er+0.012
abel+0.012
ma+0.012
Neg logit contributions
adoes-0.010
 scares-0.009
 echoed-0.009
 bully-0.008
 net-0.008
Token in
Token ID287
Feature activation-6.27e-03

Pos logit contributions
ris+0.021
 anx+0.020
innie+0.020
abel+0.019
son+0.019
Neg logit contributions
adoes-0.014
 echoed-0.012
 net-0.011
 arrows-0.011
 snail-0.011
Token to
Token ID284
Feature activation-1.32e-02

Pos logit contributions
adoes+0.079
 scares+0.075
 bully+0.073
 echoed+0.073
 arrows+0.071
Neg logit contributions
 anx-0.053
ris-0.047
abel-0.045
ma-0.044
 Er-0.044
Token put
Token ID1234
Feature activation-3.04e-02

Pos logit contributions
 anx+0.025
ris+0.023
 Er+0.023
abel+0.023
innie+0.022
Neg logit contributions
adoes-0.016
 scares-0.015
 echoed-0.015
 bully-0.015
 pitch-0.014
Token on
Token ID319
Feature activation+3.64e-02

Pos logit contributions
 anx+0.052
ris+0.048
abel+0.048
innie+0.045
 Er+0.045
Neg logit contributions
adoes-0.047
 scares-0.043
 arrows-0.042
 net-0.041
 pitch-0.041
Token their
Token ID511
Feature activation-9.73e-03

Pos logit contributions
adoes+0.045
 pitch+0.040
 arrows+0.040
 net+0.037
 arrow+0.037
Neg logit contributions
ll-0.076
ris-0.074
 anx-0.072
innie-0.071
abel-0.068
Token seat
Token ID5852
Feature activation-4.73e-02

Pos logit contributions
 anx+0.018
ris+0.018
abel+0.017
innie+0.017
 Er+0.017
Neg logit contributions
adoes-0.012
 scares-0.011
 net-0.011
 bully-0.011
 arrows-0.011
Token belts
Token ID28617
Feature activation-1.08e-01

Pos logit contributions
 anx+0.073
innie+0.069
 Er+0.068
ris+0.066
abel+0.066
Neg logit contributions
adoes-0.079
 echoed-0.068
 pitch-0.066
 hoop-0.066
 scares-0.066
Token.
Token ID13
Feature activation-6.80e-03

Pos logit contributions
 anx+0.156
innie+0.148
 Er+0.144
ris+0.143
abel+0.143
Neg logit contributions
adoes-0.175
 echoed-0.167
 pitch-0.160
 scares-0.157
 racket-0.155
Token\n
Token ID198
Feature activation-6.43e-03

Pos logit contributions
 anx+0.008
ris+0.007
 Er+0.007
abel+0.007
ma+0.007
Neg logit contributions
adoes-0.013
 scares-0.013
 echoed-0.012
 bully-0.012
 arrows-0.012
Token\n
Token ID198
Feature activation+2.70e-03

Pos logit contributions
 anx+0.009
ris+0.008
abel+0.008
 Er+0.008
ma+0.008
Neg logit contributions
adoes-0.010
 scares-0.010
 echoed-0.010
 pitch-0.010
 bully-0.009
TokenMom
Token ID29252
Feature activation-1.45e-02

Pos logit contributions
adoes+0.005
 scares+0.005
 echoed+0.005
 mushroom+0.005
 bully+0.005
Neg logit contributions
 anx-0.003
ris-0.003
ma-0.003
 Er-0.003
abel-0.003
Token started
Token ID2067
Feature activation+3.17e-02

Pos logit contributions
 anx+0.027
 Er+0.026
lan+0.025
innie+0.025
ris+0.025
Neg logit contributions
 echoed-0.016
 scares-0.014
 arrow-0.014
adoes-0.013
 bully-0.013
Token a
Token ID257
Feature activation+7.34e-03

Pos logit contributions
 anx+0.020
ris+0.018
abel+0.017
 Er+0.017
ma+0.017
Neg logit contributions
adoes-0.033
 scares-0.031
 echoed-0.031
 bully-0.030
 net-0.030
Token cheerful
Token ID37999
Feature activation+2.80e-02

Pos logit contributions
adoes+0.012
 scares+0.011
 echoed+0.011
 arrows+0.010
ickets+0.010
Neg logit contributions
 anx-0.011
ris-0.010
 Er-0.010
ma-0.010
abel-0.010
Token little
Token ID1310
Feature activation+5.06e-02

Pos logit contributions
adoes+0.039
 scares+0.036
 echoed+0.035
 affect+0.033
 arrows+0.033
Neg logit contributions
 anx-0.049
ris-0.045
abel-0.044
ma-0.044
 Er-0.044
Token 3
Token ID513
Feature activation-4.29e-02

Pos logit contributions
adoes+0.073
 scares+0.066
 echoed+0.066
 affect+0.062
ickets+0.062
Neg logit contributions
 anx-0.086
ris-0.079
abel-0.078
 Er-0.078
ma-0.077
Token-
Token ID12
Feature activation-4.79e-02

Pos logit contributions
 anx+0.044
ris+0.039
 Er+0.039
 patiently+0.036
innie+0.035
Neg logit contributions
adoes-0.088
 scares-0.086
 echoed-0.085
 bully-0.085
 arrow-0.081
Tokenyear
Token ID1941
Feature activation-1.31e-01

Pos logit contributions
 anx+0.064
icer+0.061
 Er+0.055
innie+0.053
 patiently+0.053
Neg logit contributions
igator-0.085
 bully-0.076
ickets-0.075
igators-0.074
 echoed-0.072
Token-
Token ID12
Feature activation-7.56e-02

Pos logit contributions
 anx+0.149
ris+0.137
 Er+0.131
abel+0.126
innie+0.126
Neg logit contributions
adoes-0.258
 scares-0.243
 bully-0.240
 echoed-0.237
 arrow-0.237
Tokenold
Token ID727
Feature activation-5.84e-02

Pos logit contributions
 anx+0.153
icer+0.144
 Er+0.136
 patiently+0.134
innie+0.131
Neg logit contributions
igator-0.076
ickets-0.072
uts-0.071
 bully-0.068
 arrow-0.065
Token boy
Token ID2933
Feature activation-8.87e-03

Pos logit contributions
 anx+0.108
ris+0.107
innie+0.103
abella+0.099
icer+0.099
Neg logit contributions
 bully-0.068
adoes-0.068
 scares-0.064
 snail-0.063
 arrow-0.062
Token.
Token ID13
Feature activation-1.09e-03

Pos logit contributions
 anx+0.013
ris+0.012
innie+0.012
abella+0.012
 Er+0.012
Neg logit contributions
 scares-0.013
 bully-0.013
 echoed-0.013
adoes-0.013
 net-0.012
Token He
Token ID679
Feature activation-1.87e-02

Pos logit contributions
 anx+0.002
ris+0.002
 Er+0.001
abel+0.001
ma+0.001
Neg logit contributions
adoes-0.002
 scares-0.002
 echoed-0.002
 bully-0.002
 arrows-0.002
Token they
Token ID484
Feature activation-2.39e-02

Pos logit contributions
adoes+0.014
 scares+0.013
 net+0.012
 echoed+0.012
 pitch+0.012
Neg logit contributions
 anx-0.018
ris-0.017
abel-0.016
innie-0.016
ma-0.016
Token go
Token ID467
Feature activation+1.02e-03

Pos logit contributions
 anx+0.037
ris+0.035
abel+0.032
ma+0.032
 Er+0.031
Neg logit contributions
adoes-0.042
 scares-0.036
 echoed-0.035
 racket-0.034
 thorn-0.033
Token to
Token ID284
Feature activation-2.48e-02

Pos logit contributions
 net+0.002
adoes+0.002
 pitch+0.002
 scares+0.002
 snail+0.002
Neg logit contributions
 anx-0.002
abel-0.001
innie-0.001
 Er-0.001
son-0.001
Token the
Token ID262
Feature activation-1.90e-03

Pos logit contributions
 anx+0.039
ris+0.038
abel+0.036
innie+0.035
rom+0.035
Neg logit contributions
adoes-0.036
 pitch-0.036
 scares-0.036
 bully-0.035
 net-0.035
Token merry
Token ID44570
Feature activation-8.32e-02

Pos logit contributions
 anx+0.003
ris+0.003
 Er+0.003
abel+0.003
innie+0.003
Neg logit contributions
adoes-0.003
 bully-0.003
 net-0.003
 mushroom-0.003
 pitch-0.003
Token-
Token ID12
Feature activation-1.11e-01

Pos logit contributions
 anx+0.115
ris+0.106
 Er+0.101
abel+0.100
ma+0.097
Neg logit contributions
adoes-0.147
 scares-0.137
 echoed-0.133
 bully-0.133
 net-0.130
Tokengo
Token ID2188
Feature activation-9.77e-02

Pos logit contributions
 anx+0.198
 Er+0.184
abel+0.181
innie+0.180
icer+0.171
Neg logit contributions
igator-0.140
adoes-0.134
 bully-0.133
 arrow-0.131
ickets-0.131
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
 anx+0.146
innie+0.134
 Er+0.132
 patiently+0.122
abella+0.121
Neg logit contributions
adoes-0.150
 arrows-0.144
 scares-0.144
 bully-0.141
 echoed-0.141
Tokenround
Token ID744
Feature activation-4.38e-02

Pos logit contributions
 anx+0.306
abel+0.284
 Er+0.283
innie+0.282
 patiently+0.275
Neg logit contributions
adoes-0.038
igator-0.034
uts-0.030
 arrow-0.030
 pitch-0.030
Token and
Token ID290
Feature activation-6.00e-03

Pos logit contributions
 anx+0.065
innie+0.062
ris+0.060
 Er+0.059
son+0.056
Neg logit contributions
adoes-0.067
 echoed-0.064
 pitch-0.063
 tail-0.061
 scares-0.061
Token hop
Token ID1725
Feature activation+5.04e-02

Pos logit contributions
 anx+0.010
ris+0.009
 Er+0.009
abel+0.009
ma+0.009
Neg logit contributions
adoes-0.009
 scares-0.008
 echoed-0.008
 bully-0.008
ickets-0.008
TokenTim
Token ID14967
Feature activation-3.93e-02

Pos logit contributions
adoes+0.051
 scares+0.051
 echoed+0.049
 bully+0.048
 mushroom+0.048
Neg logit contributions
 anx-0.050
ris-0.043
 patiently-0.043
abel-0.043
 Er-0.042
Token was
Token ID373
Feature activation-1.53e-02

Pos logit contributions
 anx+0.045
ris+0.041
 Er+0.040
abel+0.037
rom+0.037
Neg logit contributions
adoes-0.075
 scares-0.071
 bully-0.071
 echoed-0.071
 arrow-0.070
Token a
Token ID257
Feature activation-1.31e-02

Pos logit contributions
ris+0.025
 anx+0.024
 Er+0.023
abel+0.023
ma+0.022
Neg logit contributions
adoes-0.021
 tail-0.021
 scares-0.020
 net-0.020
 echoed-0.020
Token three
Token ID1115
Feature activation-4.49e-02

Pos logit contributions
ris+0.027
 anx+0.026
abella+0.025
 Er+0.025
rom+0.025
Neg logit contributions
 bully-0.015
 snail-0.014
 pitch-0.013
 tail-0.013
 net-0.013
Token-
Token ID12
Feature activation-4.08e-02

Pos logit contributions
 anx+0.055
ris+0.052
 Er+0.051
abella+0.047
innie+0.047
Neg logit contributions
adoes-0.083
 scares-0.080
 bully-0.080
 echoed-0.078
 mushroom-0.075
Tokenyear
Token ID1941
Feature activation-1.36e-01

Pos logit contributions
 anx+0.044
 Er+0.044
ris+0.040
abella+0.039
innie+0.039
Neg logit contributions
igator-0.081
 bully-0.072
igators-0.070
 arrow-0.069
adoes-0.068
Token-
Token ID12
Feature activation-7.76e-02

Pos logit contributions
 anx+0.160
ris+0.149
 Er+0.145
abel+0.137
abella+0.135
Neg logit contributions
adoes-0.261
 scares-0.243
 bully-0.243
 arrow-0.239
 echoed-0.236
Tokenold
Token ID727
Feature activation-7.90e-02

Pos logit contributions
 anx+0.153
 Er+0.151
icer+0.147
 symp+0.140
abella+0.137
Neg logit contributions
igator-0.080
 bully-0.068
 arrow-0.067
igators-0.064
 arrows-0.061
Token boy
Token ID2933
Feature activation-3.29e-02

Pos logit contributions
ris+0.153
 anx+0.151
innie+0.147
abella+0.146
 Er+0.139
Neg logit contributions
 bully-0.087
adoes-0.086
 arrow-0.079
 snail-0.078
 tail-0.077
Token.
Token ID13
Feature activation-1.91e-02

Pos logit contributions
 anx+0.060
ris+0.057
abella+0.056
innie+0.056
 Er+0.055
Neg logit contributions
 bully-0.037
 echoed-0.036
 tail-0.035
 scares-0.035
adoes-0.035
Token He
Token ID679
Feature activation-3.01e-02

Pos logit contributions
 anx+0.033
ris+0.031
abel+0.030
ma+0.030
 patiently+0.029
Neg logit contributions
adoes-0.027
 scares-0.024
 echoed-0.024
 bully-0.023
 mushroom-0.023
Token there
Token ID612
Feature activation+9.83e-03

Pos logit contributions
 anx+0.040
ris+0.039
icer+0.038
son+0.037
innie+0.036
Neg logit contributions
adoes-0.032
 echoed-0.030
 scares-0.029
 arrows-0.029
 arrow-0.029
Token would
Token ID561
Feature activation-4.89e-02

Pos logit contributions
adoes+0.017
 scares+0.016
 echoed+0.016
 bully+0.015
 arrows+0.015
Neg logit contributions
 anx-0.014
ris-0.013
 Er-0.012
abel-0.012
innie-0.012
Token be
Token ID307
Feature activation+1.82e-02

Pos logit contributions
 anx+0.087
ris+0.083
abel+0.078
 Er+0.078
ma+0.077
Neg logit contributions
adoes-0.066
 scares-0.061
 echoed-0.060
 bully-0.058
 arrows-0.057
Token more
Token ID517
Feature activation-3.99e-03

Pos logit contributions
adoes+0.031
 scares+0.029
 echoed+0.029
 arrows+0.028
 bully+0.028
Neg logit contributions
 anx-0.026
ris-0.025
innie-0.023
 Er-0.023
son-0.023
Token beautiful
Token ID4950
Feature activation-2.00e-02

Pos logit contributions
 anx+0.006
ris+0.006
abel+0.005
 Er+0.005
innie+0.005
Neg logit contributions
adoes-0.007
 scares-0.006
 echoed-0.006
 arrows-0.006
 bully-0.006
Token works
Token ID2499
Feature activation-1.26e-01

Pos logit contributions
 anx+0.032
ris+0.030
 Er+0.029
abel+0.028
innie+0.028
Neg logit contributions
adoes-0.032
 scares-0.029
 echoed-0.028
 arrows-0.028
 bully-0.027
Token of
Token ID286
Feature activation-2.98e-02

Pos logit contributions
 anx+0.191
innie+0.185
ris+0.175
son+0.175
 Er+0.174
Neg logit contributions
adoes-0.200
 echoed-0.189
 scares-0.183
 arrows-0.177
 bully-0.171
Token art
Token ID1242
Feature activation-5.77e-02

Pos logit contributions
 anx+0.045
ris+0.044
abel+0.041
 Er+0.040
innie+0.040
Neg logit contributions
adoes-0.048
 scares-0.045
 echoed-0.043
 arrows-0.043
 bully-0.043
Token to
Token ID284
Feature activation-1.14e-02

Pos logit contributions
 anx+0.083
ris+0.075
 Er+0.074
innie+0.073
abel+0.072
Neg logit contributions
adoes-0.097
 echoed-0.092
 scares-0.090
 arrows-0.088
 bully-0.086
Token come
Token ID1282
Feature activation-4.49e-02

Pos logit contributions
 anx+0.021
ris+0.020
abel+0.019
 Er+0.019
innie+0.019
Neg logit contributions
adoes-0.015
 scares-0.014
 echoed-0.013
 bully-0.013
 arrows-0.013
Token.
Token ID13
Feature activation-9.77e-03

Pos logit contributions
 anx+0.067
ris+0.065
abel+0.062
innie+0.061
son+0.060
Neg logit contributions
adoes-0.076
 scares-0.067
 echoed-0.067
 arrows-0.066
 arrow-0.065