TOP ACTIVATIONS
MAX = 0.170

the N aughty Fox Stack Trick !" He then
It said , " Magic Trick Wand . For ages 10
always remembered to release the Wi - Fi when she was
mom my called a piano tun er who came and fixed
But remember to release the Wi - Fi when you 're
always thanked Clim ber the Lad der for its help .
he used Clim ber the Lad der . It helped him
tied their rope around the pu , held tight to the
her how to use the Wi - Fi network to watch
. I just need to recal ibr ate and try again
squirrel . They were the laz iest , most content couple
and she only hopped around laz ily in the forest .
, forgetting all about his laz iness . From
careful and not let his laz iness get in the way
way through the countryside and laz ily flowed around the rocks
speaks English and Lily speaks Spanish . " Look
' l ion ' in Spanish . It is ' leon
man to teach her some Spanish . The old man was
the name " B AN ANA " and her gir affe
she had learned lots of Spanish ! At the

MIN ACTIVATIONS
MIN = -0.224

. She grabbed the fire extingu isher and put out the
quickly checked for a fire extingu isher , but there wasn
he found a magical o int ment . He touched it
the apple ! He squ int ed his eyes and saw
when he noticed a gl int in the ground . He
, shiny wheel that gl int ed in the sun .
After they made the spl int , they put it on
the distance . He squ int ed and said to Sarah
. M int no longer wanted to go
J ill squ int ed up at the sky
stick to make a spl int for the bird 's wing
for something . He squ int ed his eyes and saw
day he noticed something gl int ing in the sun .
pile of dust and l int in the corner . He
the paint . She squ int ed at the canvas and
smell ?" The squirrel squ int ed , " Oh ,
and saw the sun gl int ing off the water .
forth , the sunlight gl int ing off her long ,
They all had to squ int their eyes .
used to make a spl int for the bird 's wing
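The dump does not say how the TOP and MIN examples were chosen. The windows above appear to show about five tokens on each side of the selected token (for example, `Trick` is the sixth token in `the N aughty Fox Stack Trick !" He then`, followed by five more). The sketch below shows one way such contexts could be produced. It assumes a single flat token stream with one activation per token. The function name `extreme_contexts` and its parameters are hypothetical and are not taken from the dashboard's code.

```python
import numpy as np

def extreme_contexts(tokens, acts, k=3, window=(5, 5), largest=True):
    """Return up to k (activation, context) pairs for the most extreme activations.

    Hypothetical helper. Assumes `tokens` is one flat list of token strings and
    `acts` holds one feature activation per token (same length).
    window = (tokens kept before, tokens kept after) the selected position.
    """
    acts = np.asarray(acts, dtype=float)
    order = np.argsort(acts)          # ascending: most negative first
    if largest:
        order = order[::-1]           # descending: most positive first
    before, after = window
    out = []
    for idx in order[:k]:
        lo = max(0, idx - before)
        hi = min(len(tokens), idx + after + 1)
        # Join with spaces so token boundaries stay visible, as in the dump.
        out.append((float(acts[idx]), " ".join(tokens[lo:hi])))
    return out
```

Calling it once with `largest=True` and once with `largest=False` yields the two kinds of list shown above. A real dashboard would also need to deduplicate overlapping windows and scan many documents, which this sketch leaves out.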

INTERVAL 0.072 to 0.170
CONTAINS 0.284%

was a little girl called T essa . T essa was
They felt sorry . They realized they were wrong .
Mrs . P arson smiled and said , "
That is something we won 't find out today ."
owl told the rabbit kind words , and soon the rabbit

INTERVAL -0.027 to 0.072
CONTAINS 89.397%

under the tree . She sees the apples . They are
man on it was being ignorant and rev ved the engine
feel sick . Her tum my hurt and she didn 't
said . She took the gum away from them and threw
rabbit called Bob . Bob was a gentle creature and he

INTERVAL -0.125 to -0.027
CONTAINS 10.315%

got lost . His family looked for him everywhere , but
. She wanted to get closer to see it better ,
The mouse was sad , but he listened to his mum
face . The purple k ite with its long strings was
on , they were always together . Oscar would help the

INTERVAL -0.224 to -0.125
CONTAINS 0.004%

, he had to squ int his eyes . He turned
is angry . It his ses . It scratches . It
ksh akes . The mil ksh akes were blue and pink
angry snake . It his ses and snaps at Ben .
maze . Exc ited ly , she asked her
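The four INTERVAL ranges above are evenly spaced: each is about 0.098 to 0.099 wide, together they span MIN = -0.224 to MAX = 0.170, and their CONTAINS percentages add up to 100%. That pattern is consistent with splitting the activation range into four equal-width bins and reporting the share of all tokens in each. The sketch below shows that computation under this assumption. The name `activation_intervals` is hypothetical and is not the dashboard's own code.

```python
import numpy as np

def activation_intervals(acts, n_bins=4):
    """Split [min, max] of `acts` into equal-width bins and return
    (low, high, percent_of_tokens) rows, highest interval first.

    Hypothetical helper: assumes `acts` holds one activation per token.
    """
    acts = np.asarray(acts, dtype=float)
    counts, edges = np.histogram(acts, bins=n_bins, range=(acts.min(), acts.max()))
    pct = 100.0 * counts / acts.size
    rows = [(float(edges[i]), float(edges[i + 1]), float(pct[i])) for i in range(n_bins)]
    return rows[::-1]  # list the highest interval first, as in the dump
```

`np.histogram` treats every bin as half-open except the last, which is closed, so the token with the maximum activation still falls in the top interval.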
Token the
Token ID262
Feature activation-4.39e-04

Pos logit contributions
Her+0.084
uts+0.082
ves+0.081
 Everyone+0.073
ggies+0.070
Neg logit contributions
oured-0.092
ishing-0.084
 least-0.084
rying-0.081
ksh-0.078
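Each per-token block lists the five most positive and five most negative logit contributions, printed as the token text followed directly by a signed value with three decimals (for example `Her+0.084` or ` Everyone+0.073`, where the leading space belongs to the token). The dump does not say how the contribution vector itself is computed, so the sketch below takes it as a given input and only shows the ranking and formatting step. `top_logit_contributions` and `format_block` are hypothetical names.

```python
import numpy as np

def top_logit_contributions(contrib, vocab, k=5):
    """Return the k largest and k smallest (token, value) pairs.

    Assumes `contrib` is one contribution value per vocabulary entry and
    `vocab[i]` is the decoded string for token id i (hypothetical inputs).
    """
    contrib = np.asarray(contrib, dtype=float)
    order = np.argsort(contrib)  # ascending
    pos = [(vocab[i], float(contrib[i])) for i in order[::-1][:k]]
    neg = [(vocab[i], float(contrib[i])) for i in order[:k]]
    return pos, neg

def format_block(pos, neg):
    """Render the pairs the way the dump does: token text, then a signed value."""
    lines = ["Pos logit contributions"]
    lines += [f"{tok}{val:+.3f}" for tok, val in pos]
    lines.append("Neg logit contributions")
    lines += [f"{tok}{val:+.3f}" for tok, val in neg]
    return "\n".join(lines)
```

Because values are rounded to three decimals, very small negative contributions print as `-0.000`, which matches how some of the blocks in this dump look.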
Token N
Token ID399
Feature activation+5.96e-02

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
 least+0.001
ishing+0.001
Neg logit contributions
ves-0.001
Her-0.001
uts-0.001
 Everyone-0.001
 herself-0.001
Tokenaughty
Token ID28496
Feature activation-1.70e-02

Pos logit contributions
ves+0.091
 many+0.089
 Everyone+0.087
 much+0.085
Her+0.084
Neg logit contributions
oured-0.121
ited-0.115
rying-0.110
ishing-0.106
 least-0.105
Token Fox
Token ID5426
Feature activation-6.11e-02

Pos logit contributions
oured+0.028
ited+0.027
rying+0.025
 least+0.025
ishing+0.024
Neg logit contributions
ves-0.032
 many-0.031
Her-0.031
 Everyone-0.031
 much-0.030
Token Stack
Token ID23881
Feature activation+7.66e-02

Pos logit contributions
oured+0.129
rying+0.123
ited+0.123
 least+0.120
ishing+0.118
Neg logit contributions
ves-0.090
Her-0.086
 Everyone-0.083
ggies-0.074
 herself-0.074
Token Trick
Token ID30028
Feature activation+1.70e-01

Pos logit contributions
Her+0.120
ves+0.119
uts+0.114
 Everyone+0.112
 many+0.111
Neg logit contributions
oured-0.149
rying-0.146
ited-0.145
ishing-0.141
 least-0.138
Token!"
Token ID2474
Feature activation+2.59e-02

Pos logit contributions
Her+0.262
 Everyone+0.250
ves+0.242
 many+0.239
 herself+0.230
Neg logit contributions
rying-0.335
oured-0.329
ishing-0.310
ksh-0.300
cy-0.291
Token
Token ID198
Feature activation+1.13e-02

Pos logit contributions
ves+0.032
uts+0.027
ggies+0.026
Her+0.025
 much+0.025
Neg logit contributions
oured-0.063
 least-0.056
ksh-0.055
ited-0.053
cy-0.053
Token
Token ID198
Feature activation+1.12e-02

Pos logit contributions
uts+0.015
 herself+0.014
ves+0.014
 laws+0.013
 others+0.013
Neg logit contributions
oured-0.018
cy-0.016
 retrieved-0.015
 least-0.015
ited-0.015
TokenHe
Token ID1544
Feature activation+1.95e-02

Pos logit contributions
uts+0.012
ves+0.012
 herself+0.011
 many+0.011
 much+0.011
Neg logit contributions
ited-0.025
oured-0.022
ksh-0.021
rying-0.021
 least-0.020
Token then
Token ID788
Feature activation+3.05e-03

Pos logit contributions
ves+0.036
 many+0.035
 Everyone+0.034
Her+0.033
 herself+0.033
Neg logit contributions
oured-0.034
ited-0.032
rying-0.030
ishing-0.029
 least-0.029
Token It
Token ID632
Feature activation-2.50e-03

Pos logit contributions
ves+0.010
ggies+0.008
fs+0.008
 lesson+0.008
ggy+0.007
Neg logit contributions
 extingu-0.006
 Sure-0.006
oured-0.006
cy-0.006
elt-0.006
Token said
Token ID531
Feature activation-8.72e-03

Pos logit contributions
oured+0.005
ited+0.005
 least+0.004
rying+0.004
ishing+0.004
Neg logit contributions
 Everyone-0.004
ves-0.004
 many-0.004
uts-0.004
Her-0.004
Token,
Token ID11
Feature activation+1.67e-02

Pos logit contributions
oured+0.015
 extingu+0.013
 least+0.013
work+0.012
tern+0.012
Neg logit contributions
Every-0.013
First-0.012
 Everyone-0.011
ves-0.011
 herself-0.011
Token "
Token ID366
Feature activation-1.38e-02

Pos logit contributions
ves+0.023
 Everyone+0.021
 many+0.020
ggies+0.020
uts+0.019
Neg logit contributions
oured-0.033
rying-0.031
ksh-0.030
 extingu-0.029
ited-0.029
TokenMagic
Token ID22975
Feature activation-1.01e-02

Pos logit contributions
ited+0.028
oured+0.027
cy+0.026
 least+0.026
rying+0.026
Neg logit contributions
ves-0.020
 much-0.019
 many-0.018
 Everyone-0.018
uts-0.017
Token Trick
Token ID30028
Feature activation+1.69e-01

Pos logit contributions
oured+0.019
ited+0.017
rying+0.017
ishing+0.016
 least+0.016
Neg logit contributions
Her-0.017
 many-0.016
ves-0.016
 herself-0.016
 much-0.016
Token Wand
Token ID22420
Feature activation+3.54e-02

Pos logit contributions
ves+0.272
Her+0.272
 Everyone+0.269
 many+0.269
 herself+0.263
Neg logit contributions
oured-0.319
rying-0.290
ited-0.289
ishing-0.282
 least-0.275
Token.
Token ID13
Feature activation+4.50e-03

Pos logit contributions
Her+0.053
 Everyone+0.052
 many+0.051
ves+0.049
 much+0.048
Neg logit contributions
oured-0.072
rying-0.068
ited-0.068
ishing-0.067
 least-0.062
Token For
Token ID1114
Feature activation-1.04e-02

Pos logit contributions
ves+0.007
 many+0.006
ggies+0.006
 much+0.006
 lesson+0.006
Neg logit contributions
oured-0.009
 least-0.008
 extingu-0.008
ksh-0.008
cy-0.007
Token ages
Token ID9337
Feature activation+9.26e-03

Pos logit contributions
 least+0.022
oured+0.021
rying+0.019
ishing+0.019
ited+0.019
Neg logit contributions
ves-0.016
uts-0.015
Her-0.015
 Everyone-0.014
l-0.012
Token 10
Token ID838
Feature activation+5.41e-02

Pos logit contributions
Her+0.012
ves+0.012
 Everyone+0.011
His+0.011
 many+0.011
Neg logit contributions
oured-0.020
rying-0.019
 least-0.018
ited-0.018
ishing-0.017
Token always
Token ID1464
Feature activation-3.20e-02

Pos logit contributions
oured+0.009
ited+0.008
rying+0.008
ksh+0.008
ishing+0.008
Neg logit contributions
ves-0.006
 many-0.006
 Everyone-0.006
 herself-0.005
Her-0.005
Token remembered
Token ID12086
Feature activation-1.73e-02

Pos logit contributions
oured+0.067
ited+0.061
ksh+0.060
arse+0.057
rying+0.056
Neg logit contributions
 many-0.052
 herself-0.047
 much-0.046
 people-0.045
 himself-0.044
Token to
Token ID284
Feature activation-8.97e-03

Pos logit contributions
oured+0.038
 least+0.036
rying+0.035
ited+0.035
ishing+0.035
Neg logit contributions
ves-0.025
 Everyone-0.022
uts-0.021
 People-0.020
Her-0.020
Token release
Token ID2650
Feature activation+2.90e-03

Pos logit contributions
oured+0.016
ited+0.015
rying+0.014
ishing+0.013
 least+0.013
Neg logit contributions
ves-0.016
 Everyone-0.016
 many-0.016
Her-0.015
 herself-0.015
Token the
Token ID262
Feature activation-3.27e-03

Pos logit contributions
uts+0.004
ves+0.004
 Everyone+0.004
His+0.004
Her+0.004
Neg logit contributions
oured-0.006
ishing-0.005
 least-0.005
ited-0.005
rying-0.005
Token Wi
Token ID11759
Feature activation+1.41e-01

Pos logit contributions
oured+0.007
ited+0.007
rying+0.007
ishing+0.006
 least+0.006
Neg logit contributions
ves-0.005
 Everyone-0.004
Her-0.004
 many-0.004
uts-0.004
Token-
Token ID12
Feature activation+2.12e-02

Pos logit contributions
Her+0.253
 many+0.252
 Everyone+0.245
 much+0.242
 herself+0.242
Neg logit contributions
oured-0.237
ishing-0.224
ited-0.222
rying-0.221
ksh-0.210
TokenFi
Token ID10547
Feature activation+1.71e-02

Pos logit contributions
ves+0.040
 herself+0.039
 Everyone+0.038
uts+0.038
 himself+0.038
Neg logit contributions
oured-0.033
ited-0.031
ksh-0.028
rying-0.028
cy-0.028
Token when
Token ID618
Feature activation-1.01e-02

Pos logit contributions
ves+0.026
Her+0.026
 Everyone+0.024
 many+0.024
 herself+0.024
Neg logit contributions
oured-0.034
rying-0.031
ited-0.031
ksh-0.030
ishing-0.030
Token she
Token ID673
Feature activation+7.76e-03

Pos logit contributions
oured+0.021
 least+0.021
rying+0.020
ishing+0.019
cy+0.019
Neg logit contributions
ves-0.010
 Everyone-0.010
Her-0.010
uts-0.010
His-0.010
Token was
Token ID373
Feature activation-6.24e-03

Pos logit contributions
 many+0.013
ves+0.013
 Everyone+0.012
 much+0.012
 herself+0.012
Neg logit contributions
oured-0.014
arse-0.013
rying-0.013
ksh-0.013
ited-0.013
Token mom
Token ID1995
Feature activation+1.32e-02

Pos logit contributions
oured+0.032
ited+0.030
ishing+0.029
 least+0.029
rying+0.028
Neg logit contributions
ves-0.017
 Everyone-0.016
 many-0.015
uts-0.015
 himself-0.015
Tokenmy
Token ID1820
Feature activation-3.96e-03

Pos logit contributions
 many+0.024
ves+0.024
Her+0.023
 herself+0.023
 much+0.023
Neg logit contributions
oured-0.022
ited-0.021
ksh-0.020
rying-0.020
 least-0.019
Token called
Token ID1444
Feature activation+1.29e-02

Pos logit contributions
oured+0.008
ited+0.007
rying+0.007
ksh+0.007
ishing+0.007
Neg logit contributions
ves-0.006
 many-0.006
Her-0.006
 herself-0.006
 Everyone-0.006
Token a
Token ID257
Feature activation-7.98e-03

Pos logit contributions
ves+0.016
uts+0.015
 Everyone+0.014
Her+0.013
ggies+0.012
Neg logit contributions
oured-0.030
rying-0.029
ited-0.028
 least-0.027
ishing-0.027
Token piano
Token ID19132
Feature activation+7.10e-03

Pos logit contributions
oured+0.017
ited+0.016
rying+0.015
 least+0.015
ishing+0.015
Neg logit contributions
ves-0.011
 Everyone-0.011
 herself-0.011
uts-0.011
 many-0.010
Token tun
Token ID6278
Feature activation+1.41e-01

Pos logit contributions
ves+0.009
 Everyone+0.009
 many+0.008
Her+0.008
uts+0.008
Neg logit contributions
oured-0.016
ited-0.015
rying-0.015
 least-0.014
ishing-0.014
Tokener
Token ID263
Feature activation+3.63e-02

Pos logit contributions
ves+0.239
 Everyone+0.226
 much+0.219
 many+0.215
Her+0.207
Neg logit contributions
oured-0.257
ishing-0.255
ited-0.252
rying-0.247
cy-0.234
Token who
Token ID508
Feature activation-3.91e-02

Pos logit contributions
ves+0.051
 Everyone+0.050
 many+0.049
 much+0.049
Her+0.047
Neg logit contributions
oured-0.079
rying-0.071
ited-0.069
ishing-0.069
 least-0.067
Token came
Token ID1625
Feature activation-1.70e-02

Pos logit contributions
oured+0.071
ited+0.067
rying+0.063
 least+0.061
ishing+0.061
Neg logit contributions
ves-0.068
 Everyone-0.065
Her-0.065
 many-0.064
uts-0.063
Token and
Token ID290
Feature activation-4.94e-03

Pos logit contributions
rying+0.034
oured+0.033
ishing+0.031
arse+0.030
 least+0.030
Neg logit contributions
ves-0.028
 Everyone-0.027
 People-0.024
uts-0.023
 many-0.023
Token fixed
Token ID5969
Feature activation-4.25e-02

Pos logit contributions
oured+0.009
ited+0.009
rying+0.008
ishing+0.008
 least+0.008
Neg logit contributions
ves-0.009
 Everyone-0.008
 many-0.008
Her-0.008
 herself-0.008
Token But
Token ID887
Feature activation-2.57e-02

Pos logit contributions
oured+0.009
ksh+0.008
 least+0.008
rying+0.008
ited+0.008
Neg logit contributions
ves-0.005
 many-0.005
uts-0.004
Her-0.004
 much-0.004
Token remember
Token ID3505
Feature activation-3.40e-02

Pos logit contributions
oured+0.053
ited+0.051
rying+0.049
 least+0.048
ishing+0.048
Neg logit contributions
ves-0.038
 Everyone-0.037
uts-0.035
 many-0.035
Her-0.035
Token to
Token ID284
Feature activation-6.18e-03

Pos logit contributions
oured+0.076
rying+0.072
 least+0.071
ishing+0.069
ited+0.068
Neg logit contributions
ves-0.044
 Everyone-0.043
Her-0.042
uts-0.040
 much-0.037
Token release
Token ID2650
Feature activation+6.50e-03

Pos logit contributions
oured+0.009
ited+0.009
 least+0.009
rying+0.009
ishing+0.008
Neg logit contributions
ves-0.012
uts-0.012
 Everyone-0.012
 many-0.011
Her-0.011
Token the
Token ID262
Feature activation-1.67e-03

Pos logit contributions
uts+0.011
ves+0.010
Her+0.010
His+0.009
 Everyone+0.009
Neg logit contributions
oured-0.011
 least-0.011
ishing-0.011
ish-0.010
rying-0.010
Token Wi
Token ID11759
Feature activation+1.41e-01

Pos logit contributions
oured+0.004
ited+0.003
rying+0.003
ishing+0.003
 least+0.003
Neg logit contributions
ves-0.002
Her-0.002
 Everyone-0.002
uts-0.002
 herself-0.002
Token-
Token ID12
Feature activation+2.24e-02

Pos logit contributions
Her+0.269
 many+0.261
 herself+0.254
 Everyone+0.252
 much+0.250
Neg logit contributions
oured-0.222
ishing-0.213
rying-0.208
ited-0.208
ksh-0.201
TokenFi
Token ID10547
Feature activation+1.65e-02

Pos logit contributions
ves+0.043
 herself+0.043
uts+0.042
 himself+0.042
 Everyone+0.041
Neg logit contributions
oured-0.032
ited-0.031
ksh-0.028
cy-0.028
rying-0.028
Token when
Token ID618
Feature activation-8.18e-03

Pos logit contributions
Her+0.027
ves+0.026
 many+0.025
 herself+0.025
 Everyone+0.025
Neg logit contributions
oured-0.031
rying-0.029
ited-0.028
ksh-0.027
ishing-0.027
Token you
Token ID345
Feature activation+2.12e-02

Pos logit contributions
 least+0.015
rying+0.014
oured+0.014
ishing+0.013
cy+0.013
Neg logit contributions
Her-0.012
uts-0.011
ves-0.011
His-0.010
 Everyone-0.010
Token're
Token ID821
Feature activation-9.03e-02

Pos logit contributions
 many+0.034
 Everyone+0.034
ves+0.033
 much+0.032
 herself+0.032
Neg logit contributions
oured-0.040
ited-0.038
ishing-0.037
rying-0.037
ksh-0.036
Token always
Token ID1464
Feature activation-2.85e-02

Pos logit contributions
ves+0.014
 Everyone+0.013
 many+0.013
Her+0.013
uts+0.013
Neg logit contributions
oured-0.016
ited-0.016
rying-0.014
 least-0.014
ishing-0.014
Token thanked
Token ID26280
Feature activation-2.22e-03

Pos logit contributions
oured+0.048
ited+0.045
rying+0.042
ksh+0.042
ishing+0.041
Neg logit contributions
ves-0.053
 many-0.053
 Everyone-0.051
 much-0.051
 herself-0.050
Token Clim
Token ID28573
Feature activation-4.24e-02

Pos logit contributions
 least+0.004
oured+0.004
rying+0.004
ishing+0.004
ited+0.004
Neg logit contributions
uts-0.004
ves-0.003
Her-0.003
 much-0.003
case-0.003
Tokenber
Token ID527
Feature activation+2.74e-03

Pos logit contributions
oured+0.069
ited+0.067
rying+0.066
ishing+0.065
 least+0.059
Neg logit contributions
 Everyone-0.082
ves-0.082
 many-0.082
 much-0.080
Her-0.076
Token the
Token ID262
Feature activation-2.33e-03

Pos logit contributions
ves+0.004
Her+0.004
uts+0.004
 Everyone+0.004
 many+0.004
Neg logit contributions
oured-0.006
ited-0.005
rying-0.005
 least-0.005
ishing-0.005
Token Lad
Token ID12862
Feature activation+1.39e-01

Pos logit contributions
ited+0.005
oured+0.005
 least+0.004
rying+0.004
ishing+0.004
Neg logit contributions
ves-0.004
uts-0.004
Her-0.003
 Everyone-0.003
 herself-0.003
Tokender
Token ID1082
Feature activation-2.25e-02

Pos logit contributions
Her+0.190
 many+0.183
 Everyone+0.180
ves+0.180
uts+0.176
Neg logit contributions
oured-0.292
ited-0.280
rying-0.268
ishing-0.262
int-0.258
Token for
Token ID329
Feature activation-2.11e-03

Pos logit contributions
ited+0.046
rying+0.044
ishing+0.042
 least+0.042
oured+0.040
Neg logit contributions
Her-0.035
 Everyone-0.034
ves-0.034
uts-0.032
Everyone-0.031
Token its
Token ID663
Feature activation-2.97e-03

Pos logit contributions
oured+0.004
 least+0.004
rying+0.004
ited+0.003
ishing+0.003
Neg logit contributions
uts-0.004
ves-0.004
 Everyone-0.003
Her-0.003
 People-0.003
Token help
Token ID1037
Feature activation-1.03e-02

Pos logit contributions
oured+0.005
ited+0.005
 least+0.005
rying+0.005
ishing+0.004
Neg logit contributions
ves-0.005
 Everyone-0.005
Her-0.005
uts-0.005
 many-0.005
Token.
Token ID13
Feature activation-5.22e-03

Pos logit contributions
rying+0.020
oured+0.020
ited+0.020
 least+0.019
ishing+0.018
Neg logit contributions
ves-0.015
Her-0.014
 Everyone-0.014
uts-0.014
 many-0.013
Token he
Token ID339
Feature activation+9.06e-03

Pos logit contributions
Her+0.013
ves+0.013
uts+0.012
 Everyone+0.010
 herself+0.010
Neg logit contributions
oured-0.022
 least-0.022
ited-0.021
rying-0.020
ishing-0.019
Token used
Token ID973
Feature activation-1.52e-02

Pos logit contributions
ves+0.018
 Everyone+0.017
Her+0.017
uts+0.017
 many+0.017
Neg logit contributions
oured-0.014
ited-0.014
rying-0.013
 least-0.012
ishing-0.012
Token Clim
Token ID28573
Feature activation-4.17e-02

Pos logit contributions
oured+0.036
ited+0.034
ishing+0.032
rying+0.032
 least+0.032
Neg logit contributions
ves-0.018
 Everyone-0.017
Her-0.017
uts-0.017
 People-0.015
Tokenber
Token ID527
Feature activation+2.83e-03

Pos logit contributions
oured+0.079
ited+0.076
rying+0.071
ishing+0.069
 least+0.068
Neg logit contributions
ves-0.070
 Everyone-0.067
 many-0.067
Her-0.065
 herself-0.065
Token the
Token ID262
Feature activation-2.04e-03

Pos logit contributions
ves+0.004
 Everyone+0.004
Her+0.004
uts+0.004
 much+0.003
Neg logit contributions
oured-0.006
rying-0.006
ited-0.006
ishing-0.006
 least-0.005
Token Lad
Token ID12862
Feature activation+1.38e-01

Pos logit contributions
oured+0.004
ited+0.004
rying+0.004
 least+0.004
ishing+0.003
Neg logit contributions
ves-0.003
 Everyone-0.003
Her-0.003
uts-0.003
 many-0.003
Tokender
Token ID1082
Feature activation-2.20e-02

Pos logit contributions
Her+0.185
 many+0.179
 Everyone+0.177
ves+0.173
 much+0.171
Neg logit contributions
oured-0.288
ited-0.274
rying-0.264
ishing-0.258
int-0.256
Token.
Token ID13
Feature activation-2.97e-03

Pos logit contributions
ited+0.046
rying+0.045
ishing+0.044
oured+0.043
 least+0.042
Neg logit contributions
 Everyone-0.032
Her-0.031
ves-0.030
 People-0.029
Everyone-0.028
Token It
Token ID632
Feature activation-4.87e-03

Pos logit contributions
oured+0.007
 least+0.006
ksh+0.006
 extingu+0.006
int+0.006
Neg logit contributions
ves-0.004
Her-0.003
ggy-0.003
ggies-0.003
 herself-0.003
Token helped
Token ID4193
Feature activation-2.58e-02

Pos logit contributions
oured+0.010
ited+0.009
rying+0.009
 least+0.009
ishing+0.008
Neg logit contributions
ves-0.007
 Everyone-0.007
Her-0.007
 many-0.007
uts-0.007
Token him
Token ID683
Feature activation+4.05e-02

Pos logit contributions
oured+0.046
 least+0.045
rying+0.044
ishing+0.042
ited+0.041
Neg logit contributions
Her-0.040
uts-0.040
ves-0.039
 much-0.035
 Everyone-0.035
Token tied
Token ID8165
Feature activation+5.82e-03

Pos logit contributions
 many+0.013
ves+0.012
 much+0.012
 Everyone+0.012
 herself+0.012
Neg logit contributions
oured-0.013
ited-0.012
rying-0.011
ishing-0.011
ksh-0.011
Token their
Token ID511
Feature activation-3.90e-02

Pos logit contributions
uts+0.008
ves+0.008
Her+0.008
 Everyone+0.007
ggy+0.007
Neg logit contributions
oured-0.012
ishing-0.011
 least-0.011
 extingu-0.011
cy-0.011
Token rope
Token ID17182
Feature activation-2.43e-02

Pos logit contributions
oured+0.089
ited+0.086
rying+0.082
ishing+0.078
ksh+0.078
Neg logit contributions
ves-0.052
 many-0.051
 Everyone-0.047
 herself-0.047
 much-0.046
Token around
Token ID1088
Feature activation+2.64e-03

Pos logit contributions
oured+0.049
rying+0.045
ishing+0.045
ited+0.041
cy+0.041
Neg logit contributions
 Everyone-0.038
ves-0.036
Everyone-0.036
Her-0.035
 People-0.033
Token the
Token ID262
Feature activation-3.78e-03

Pos logit contributions
ves+0.003
uts+0.002
 Everyone+0.002
Her+0.002
 many+0.002
Neg logit contributions
oured-0.007
ited-0.007
rying-0.006
ishing-0.006
 least-0.006
Token pu
Token ID47574
Feature activation+1.38e-01

Pos logit contributions
oured+0.009
ited+0.009
rying+0.008
ishing+0.008
ksh+0.008
Neg logit contributions
 many-0.005
ves-0.005
 Everyone-0.004
 much-0.004
 herself-0.004
Token,
Token ID11
Feature activation+4.89e-03

Pos logit contributions
 Everyone+0.202
Her+0.196
 many+0.194
 People+0.180
 much+0.180
Neg logit contributions
oured-0.286
rying-0.260
ited-0.249
arse-0.239
ishing-0.239
Token held
Token ID2714
Feature activation-1.78e-03

Pos logit contributions
ves+0.007
 many+0.006
 Everyone+0.006
 herself+0.006
Her+0.006
Neg logit contributions
oured-0.011
ited-0.010
rying-0.010
ishing-0.010
 least-0.009
Token tight
Token ID5381
Feature activation+2.85e-02

Pos logit contributions
oured+0.004
rying+0.004
ishing+0.004
ited+0.004
 least+0.004
Neg logit contributions
uts-0.002
ves-0.002
Her-0.002
 Everyone-0.002
 People-0.002
Token to
Token ID284
Feature activation-1.28e-02

Pos logit contributions
ves+0.039
 Everyone+0.038
Her+0.037
 People+0.033
Everyone+0.033
Neg logit contributions
oured-0.064
ited-0.060
rying-0.059
ishing-0.058
 extingu-0.055
Token the
Token ID262
Feature activation-3.86e-03

Pos logit contributions
oured+0.028
rying+0.027
 least+0.026
ited+0.026
ishing+0.026
Neg logit contributions
uts-0.017
ves-0.015
Her-0.015
 Everyone-0.014
 People-0.012
Token her
Token ID607
Feature activation+7.08e-03

Pos logit contributions
ves+0.013
uts+0.011
ggies+0.010
 Everyone+0.010
 himself+0.009
Neg logit contributions
oured-0.027
ited-0.025
ishing-0.025
cy-0.024
 least-0.024
Token how
Token ID703
Feature activation-1.47e-02

Pos logit contributions
ves+0.010
uts+0.009
 Everyone+0.009
 himself+0.008
Her+0.008
Neg logit contributions
oured-0.015
ited-0.013
 least-0.013
 extingu-0.013
ishing-0.013
Token to
Token ID284
Feature activation-5.00e-03

Pos logit contributions
oured+0.027
ited+0.026
rying+0.026
ishing+0.025
cy+0.024
Neg logit contributions
ves-0.022
 himself-0.020
 Everyone-0.019
 herself-0.019
uts-0.019
Token use
Token ID779
Feature activation+9.34e-03

Pos logit contributions
oured+0.008
ited+0.007
rying+0.007
ishing+0.006
 least+0.006
Neg logit contributions
ves-0.010
 many-0.010
 Everyone-0.010
uts-0.010
 much-0.009
Token the
Token ID262
Feature activation-7.02e-04

Pos logit contributions
uts+0.014
ves+0.013
 Everyone+0.013
Her+0.012
ggies+0.012
Neg logit contributions
oured-0.018
 least-0.017
 extingu-0.017
ishing-0.016
ited-0.015
Token Wi
Token ID11759
Feature activation+1.38e-01

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
ishing+0.001
 least+0.001
Neg logit contributions
ves-0.001
uts-0.001
 Everyone-0.001
Her-0.001
 herself-0.001
Token-
Token ID12
Feature activation+2.17e-02

Pos logit contributions
Her+0.258
 many+0.247
 herself+0.243
 much+0.240
 Everyone+0.238
Neg logit contributions
oured-0.220
ishing-0.213
ited-0.199
ksh-0.198
rying-0.196
TokenFi
Token ID10547
Feature activation+1.30e-02

Pos logit contributions
 himself+0.039
uts+0.039
ves+0.039
 herself+0.038
 Everyone+0.037
Neg logit contributions
oured-0.032
ited-0.029
cy-0.028
ksh-0.028
ishing-0.027
Token network
Token ID3127
Feature activation+1.82e-02

Pos logit contributions
Her+0.021
 herself+0.020
ves+0.020
 himself+0.020
 many+0.020
Neg logit contributions
oured-0.024
ksh-0.021
 extingu-0.021
rying-0.021
ited-0.021
Token to
Token ID284
Feature activation-4.85e-03

Pos logit contributions
ves+0.026
Her+0.024
ggies+0.023
ggy+0.023
 Everyone+0.022
Neg logit contributions
oured-0.036
 least-0.034
rying-0.033
ishing-0.032
ited-0.031
Token watch
Token ID2342
Feature activation-5.79e-03

Pos logit contributions
oured+0.007
ited+0.006
rying+0.006
 least+0.006
ishing+0.006
Neg logit contributions
uts-0.010
ves-0.010
 Everyone-0.009
 many-0.009
 much-0.009
Token.
Token ID13
Feature activation-8.48e-03

Pos logit contributions
rying+0.059
oured+0.057
 least+0.054
ishing+0.053
ited+0.052
Neg logit contributions
ves-0.049
 Everyone-0.042
Her-0.042
 many-0.040
uts-0.039
Token I
Token ID314
Feature activation-1.94e-02

Pos logit contributions
oured+0.019
ited+0.017
 least+0.017
rying+0.017
ksh+0.016
Neg logit contributions
ves-0.012
 many-0.011
Her-0.011
 herself-0.010
uts-0.010
Token just
Token ID655
Feature activation-5.22e-02

Pos logit contributions
oured+0.036
ited+0.035
ishing+0.033
ksh+0.033
rying+0.033
Neg logit contributions
ves-0.032
 Everyone-0.032
 much-0.031
 many-0.031
-0.030
Token need
Token ID761
Feature activation-9.33e-03

Pos logit contributions
oured+0.105
ited+0.102
ksh+0.095
rying+0.094
ishing+0.093
Neg logit contributions
 many-0.080
 Everyone-0.079
 much-0.078
ves-0.075
 herself-0.075
Token to
Token ID284
Feature activation-1.06e-02

Pos logit contributions
oured+0.021
 least+0.020
ited+0.019
rying+0.019
ishing+0.019
Neg logit contributions
ves-0.012
uts-0.011
Her-0.011
 Everyone-0.011
ggies-0.010
Token recal
Token ID42653
Feature activation+1.35e-01

Pos logit contributions
oured+0.018
ited+0.017
rying+0.016
 least+0.016
ishing+0.016
Neg logit contributions
ves-0.020
 Everyone-0.019
 many-0.019
Her-0.018
uts-0.018
Tokenibr
Token ID2889
Feature activation+3.00e-02

Pos logit contributions
ves+0.208
 many+0.207
 herself+0.197
Her+0.196
 much+0.196
Neg logit contributions
oured-0.272
ited-0.245
rying-0.239
ishing-0.228
 least-0.227
Tokenate
Token ID378
Feature activation-9.07e-03

Pos logit contributions
ves+0.057
 Everyone+0.054
uts+0.054
 many+0.051
 herself+0.050
Neg logit contributions
ited-0.054
oured-0.052
rying-0.046
ksh-0.044
cy-0.043
Token and
Token ID290
Feature activation-9.82e-03

Pos logit contributions
oured+0.016
rying+0.015
ishing+0.015
ish+0.014
 least+0.014
Neg logit contributions
Her-0.016
ves-0.014
 Everyone-0.013
His-0.013
Every-0.013
Token try
Token ID1949
Feature activation+3.80e-02

Pos logit contributions
oured+0.019
ited+0.018
 least+0.018
rying+0.018
ishing+0.017
Neg logit contributions
ves-0.016
Her-0.015
uts-0.014
 Everyone-0.014
 many-0.014
Token again
Token ID757
Feature activation+2.98e-02

Pos logit contributions
Her+0.052
ves+0.051
 Everyone+0.048
uts+0.047
ily+0.045
Neg logit contributions
oured-0.080
rying-0.076
ishing-0.072
 least-0.071
ksh-0.068
Token squirrel
Token ID33039
Feature activation-2.32e-02

Pos logit contributions
oured+0.039
ited+0.037
rying+0.035
ksh+0.034
 least+0.034
Neg logit contributions
ves-0.026
 many-0.025
 Everyone-0.025
 much-0.024
 herself-0.023
Token.
Token ID13
Feature activation-5.45e-03

Pos logit contributions
oured+0.051
ited+0.048
rying+0.047
 least+0.046
cy+0.045
Neg logit contributions
ves-0.031
 Everyone-0.030
Her-0.029
 many-0.028
 People-0.026
Token They
Token ID1119
Feature activation-1.20e-02

Pos logit contributions
oured+0.013
 least+0.013
cy+0.012
ksh+0.012
ited+0.012
Neg logit contributions
ves-0.007
ggies-0.005
uts-0.005
ggy-0.005
Her-0.005
Token were
Token ID547
Feature activation-9.13e-03

Pos logit contributions
oured+0.025
ited+0.024
rying+0.023
ishing+0.023
ksh+0.022
Neg logit contributions
 many-0.017
ves-0.017
 Everyone-0.017
 herself-0.016
 much-0.016
Token the
Token ID262
Feature activation-1.56e-03

Pos logit contributions
oured+0.021
ited+0.020
 least+0.020
rying+0.019
ishing+0.019
Neg logit contributions
ves-0.012
uts-0.011
Her-0.010
 Everyone-0.010
 herself-0.009
Token laz
Token ID37296
Feature activation+1.32e-01

Pos logit contributions
oured+0.004
ited+0.004
rying+0.004
ishing+0.003
 least+0.003
Neg logit contributions
ves-0.002
 many-0.002
 Everyone-0.002
Her-0.002
 herself-0.002
Tokeniest
Token ID6386
Feature activation+3.08e-02

Pos logit contributions
ves+0.189
 many+0.188
Her+0.183
 Everyone+0.183
uts+0.180
Neg logit contributions
oured-0.275
ited-0.272
rying-0.251
ishing-0.249
 least-0.241
Token,
Token ID11
Feature activation+8.64e-03

Pos logit contributions
ves+0.043
 many+0.042
 Everyone+0.041
 much+0.040
 herself+0.039
Neg logit contributions
oured-0.067
ited-0.064
rying-0.061
ishing-0.059
ksh-0.059
Token most
Token ID749
Feature activation+4.09e-02

Pos logit contributions
ves+0.012
 Everyone+0.011
Her+0.011
 many+0.011
uts+0.011
Neg logit contributions
oured-0.019
ited-0.018
rying-0.017
 least-0.017
ishing-0.017
Token content
Token ID2695
Feature activation-2.46e-02

Pos logit contributions
ves+0.050
 Everyone+0.046
Her+0.045
 many+0.045
 herself+0.044
Neg logit contributions
oured-0.097
ited-0.094
rying-0.089
ishing-0.088
 least-0.088
Token couple
Token ID3155
Feature activation-3.16e-03

Pos logit contributions
oured+0.052
ited+0.050
rying+0.049
ishing+0.048
 least+0.046
Neg logit contributions
ves-0.034
Her-0.032
 Everyone-0.031
 many-0.031
uts-0.030
Token and
Token ID290
Feature activation-6.19e-03

Pos logit contributions
oured+0.001
rying+0.001
ishing+0.001
ited+0.001
 least+0.001
Neg logit contributions
ves-0.000
Her-0.000
 Everyone-0.000
uts-0.000
 herself-0.000
Token she
Token ID673
Feature activation+9.75e-03

Pos logit contributions
ited+0.009
oured+0.009
 least+0.009
rying+0.009
ishing+0.008
Neg logit contributions
ves-0.012
uts-0.011
Her-0.011
 Everyone-0.011
 People-0.010
Token only
Token ID691
Feature activation-3.19e-02

Pos logit contributions
ves+0.017
 Everyone+0.016
 many+0.016
Her+0.016
uts+0.016
Neg logit contributions
oured-0.017
ited-0.017
rying-0.016
 least-0.015
ishing-0.015
Token hopped
Token ID45230
Feature activation+1.97e-02

Pos logit contributions
oured+0.053
ited+0.052
rying+0.046
 least+0.045
ishing+0.044
Neg logit contributions
ves-0.062
 Everyone-0.059
Her-0.058
uts-0.057
 many-0.057
Token around
Token ID1088
Feature activation+7.36e-03

Pos logit contributions
ves+0.040
 Everyone+0.036
uts+0.036
 People+0.035
Her+0.035
Neg logit contributions
rying-0.031
oured-0.031
 least-0.029
ited-0.028
ishing-0.027
Token laz
Token ID37296
Feature activation+1.32e-01

Pos logit contributions
ves+0.011
uts+0.010
 Everyone+0.009
Her+0.009
Everyone+0.008
Neg logit contributions
oured-0.015
rying-0.015
 least-0.015
ited-0.014
ishing-0.014
Tokenily
Token ID813
Feature activation+3.01e-02

Pos logit contributions
 Everyone+0.135
uts+0.132
ves+0.128
 People+0.126
The+0.123
Neg logit contributions
ited-0.283
ishing-0.282
int-0.278
rying-0.273
oured-0.273
Token in
Token ID287
Feature activation+6.76e-03

Pos logit contributions
ves+0.051
 Everyone+0.050
Her+0.047
Everyone+0.045
 People+0.045
Neg logit contributions
oured-0.057
ited-0.053
rying-0.053
 least-0.051
ishing-0.051
Token the
Token ID262
Feature activation-2.51e-03

Pos logit contributions
ves+0.012
uts+0.011
ggies+0.009
 Everyone+0.009
oles+0.009
Neg logit contributions
 least-0.013
ited-0.013
oured-0.011
cy-0.011
rying-0.011
Token forest
Token ID8222
Feature activation-1.35e-02

Pos logit contributions
oured+0.004
ited+0.004
 least+0.004
rying+0.004
ksh+0.004
Neg logit contributions
ves-0.005
uts-0.005
Her-0.004
 Everyone-0.004
 herself-0.004
Token.
Token ID13
Feature activation-4.27e-03

Pos logit contributions
rying+0.031
oured+0.030
ited+0.030
 least+0.028
ishing+0.028
Neg logit contributions
ves-0.016
 Everyone-0.016
Her-0.014
uts-0.014
Everyone-0.014
Token,
Token ID11
Feature activation+8.56e-03

Pos logit contributions
oured+0.022
ited+0.021
rying+0.020
ishing+0.019
 least+0.019
Neg logit contributions
 Everyone-0.015
Her-0.014
ves-0.014
 People-0.013
Everyone-0.013
Token forgetting
Token ID32012
Feature activation-1.28e-02

Pos logit contributions
ves+0.013
 Everyone+0.012
Her+0.012
uts+0.012
 many+0.011
Neg logit contributions
oured-0.018
ited-0.016
 least-0.016
rying-0.016
ishing-0.015
Token all
Token ID477
Feature activation-5.90e-02

Pos logit contributions
oured+0.026
ited+0.024
 least+0.024
rying+0.024
ishing+0.022
Neg logit contributions
ves-0.019
Her-0.017
uts-0.016
 Everyone-0.015
 herself-0.015
Token about
Token ID546
Feature activation-1.15e-02

Pos logit contributions
oured+0.144
ited+0.141
rying+0.135
ishing+0.132
 least+0.130
Neg logit contributions
ves-0.067
Her-0.060
 Everyone-0.060
uts-0.059
 many-0.057
Token his
Token ID465
Feature activation-1.20e-02

Pos logit contributions
 least+0.023
oured+0.023
rying+0.023
ishing+0.022
ited+0.022
Neg logit contributions
ves-0.017
Her-0.016
 Everyone-0.015
uts-0.015
 People-0.014
Token laz
Token ID37296
Feature activation+1.31e-01

Pos logit contributions
oured+0.024
ited+0.023
 least+0.021
ishing+0.021
rying+0.021
Neg logit contributions
ves-0.019
Her-0.018
 Everyone-0.018
uts-0.018
 herself-0.017
Tokeniness
Token ID1272
Feature activation-7.97e-03

Pos logit contributions
Her+0.257
 many+0.255
 Everyone+0.254
ves+0.247
uts+0.241
Neg logit contributions
ited-0.203
oured-0.192
ishing-0.184
rying-0.178
int-0.167
Token.
Token ID13
Feature activation-4.59e-03

Pos logit contributions
oured+0.017
ited+0.016
 least+0.016
rying+0.016
ishing+0.015
Neg logit contributions
ves-0.011
Her-0.010
 Everyone-0.010
 many-0.009
ggies-0.009
Token
Token ID198
Feature activation+1.35e-02

Pos logit contributions
oured+0.009
ksh+0.008
 least+0.008
isk+0.008
 retrieved+0.007
Neg logit contributions
ves-0.007
Her-0.006
 people-0.005
 herself-0.005
 many-0.005
Token
Token ID198
Feature activation+1.32e-02

Pos logit contributions
uts+0.017
 people+0.015
 others+0.015
 herself+0.015
 words+0.015
Neg logit contributions
oured-0.023
 retrieved-0.022
 least-0.021
isk-0.021
ited-0.020
TokenFrom
Token ID4863
Feature activation+5.07e-02

Pos logit contributions
uts+0.016
 herself+0.015
 many+0.015
 Everyone+0.015
 words+0.015
Neg logit contributions
ited-0.025
oured-0.023
 least-0.023
 retrieved-0.022
arse-0.021
Token careful
Token ID8161
Feature activation-9.40e-03

Pos logit contributions
oured+0.101
ited+0.097
rying+0.095
ishing+0.093
 least+0.092
Neg logit contributions
ves-0.071
 Everyone-0.065
uts-0.064
Her-0.060
 many-0.060
Token and
Token ID290
Feature activation-8.27e-03

Pos logit contributions
rying+0.018
ishing+0.017
oured+0.016
 least+0.015
arse+0.015
Neg logit contributions
ves-0.014
Her-0.014
 Everyone-0.013
 many-0.012
 People-0.012
Token not
Token ID407
Feature activation+8.67e-03

Pos logit contributions
oured+0.018
ited+0.017
rying+0.017
 least+0.016
ishing+0.016
Neg logit contributions
ves-0.012
 Everyone-0.011
 many-0.010
Her-0.010
uts-0.010
Token let
Token ID1309
Feature activation-2.45e-03

Pos logit contributions
ves+0.015
 Everyone+0.014
Her+0.013
 many+0.013
 herself+0.013
Neg logit contributions
oured-0.016
ited-0.016
rying-0.015
 least-0.015
ishing-0.014
Token his
Token ID465
Feature activation-1.55e-02

Pos logit contributions
 least+0.005
oured+0.005
rying+0.005
arse+0.005
ishing+0.005
Neg logit contributions
Her-0.003
ves-0.003
uts-0.003
 many-0.003
 Everyone-0.003
Token laz
Token ID37296
Feature activation+1.27e-01

Pos logit contributions
 least+0.028
oured+0.026
rying+0.024
ited+0.024
ishing+0.024
Neg logit contributions
Her-0.027
ves-0.027
uts-0.024
 herself-0.024
 Everyone-0.024
Tokeniness
Token ID1272
Feature activation-1.24e-02

Pos logit contributions
 many+0.269
 Everyone+0.268
ves+0.265
Her+0.264
uts+0.261
Neg logit contributions
ited-0.164
ishing-0.155
rying-0.154
oured-0.146
 least-0.134
Token get
Token ID651
Feature activation+1.30e-02

Pos logit contributions
oured+0.020
rying+0.020
ited+0.020
 least+0.020
arse+0.018
Neg logit contributions
ves-0.022
 Everyone-0.021
Her-0.021
 many-0.021
 People-0.019
Token in
Token ID287
Feature activation+6.57e-03

Pos logit contributions
ves+0.021
Her+0.019
 Everyone+0.019
uts+0.019
 many+0.018
Neg logit contributions
oured-0.026
rying-0.025
ited-0.024
ishing-0.024
 least-0.024
Token the
Token ID262
Feature activation-6.95e-03

Pos logit contributions
ves+0.011
uts+0.011
Her+0.011
 Everyone+0.010
 People+0.010
Neg logit contributions
 least-0.013
 extingu-0.011
ited-0.011
rying-0.011
oured-0.011
Token way
Token ID835
Feature activation-2.73e-02

Pos logit contributions
oured+0.014
ited+0.014
rying+0.013
 least+0.013
ishing+0.013
Neg logit contributions
ves-0.011
uts-0.010
Her-0.010
 Everyone-0.010
 herself-0.010
Token way
Token ID835
Feature activation-2.98e-02

Pos logit contributions
oured+0.021
ited+0.020
rying+0.019
ishing+0.018
 least+0.018
Neg logit contributions
ves-0.016
 Everyone-0.015
 many-0.015
Her-0.015
 herself-0.015
Token through
Token ID832
Feature activation+9.46e-03

Pos logit contributions
oured+0.059
rying+0.056
ited+0.054
ishing+0.053
 least+0.052
Neg logit contributions
ves-0.049
 Everyone-0.046
Her-0.044
uts-0.043
ggies-0.042
Token the
Token ID262
Feature activation-6.76e-03

Pos logit contributions
ves+0.013
uts+0.013
 Everyone+0.012
Her+0.012
ggies+0.011
Neg logit contributions
rying-0.019
 least-0.019
oured-0.018
ishing-0.018
ited-0.017
Token countryside
Token ID25708
Feature activation+2.48e-03

Pos logit contributions
oured+0.011
ited+0.011
rying+0.010
 least+0.010
ishing+0.010
Neg logit contributions
ves-0.013
Her-0.013
uts-0.012
 Everyone-0.012
 herself-0.012
Token and
Token ID290
Feature activation-7.13e-03

Pos logit contributions
ves+0.003
Her+0.003
 Everyone+0.003
uts+0.002
 much+0.002
Neg logit contributions
oured-0.006
rying-0.006
ited-0.006
 least-0.005
ishing-0.005
Token laz
Token ID37296
Feature activation+1.26e-01

Pos logit contributions
oured+0.015
ited+0.014
rying+0.014
 least+0.014
ishing+0.013
Neg logit contributions
ves-0.010
 Everyone-0.010
uts-0.010
Her-0.010
 herself-0.009
Tokenily
Token ID813
Feature activation+2.53e-02

Pos logit contributions
uts+0.151
Her+0.147
 Everyone+0.146
ves+0.141
 many+0.135
Neg logit contributions
ited-0.268
oured-0.259
rying-0.255
ishing-0.255
int-0.243
Token flowed
Token ID35456
Feature activation-2.10e-02

Pos logit contributions
 Everyone+0.045
ves+0.044
Her+0.042
uts+0.041
 People+0.040
Neg logit contributions
oured-0.045
ited-0.043
rying-0.042
ishing-0.039
 least-0.038
Token around
Token ID1088
Feature activation+6.14e-03

Pos logit contributions
rying+0.036
oured+0.033
ishing+0.030
ited+0.029
ksh+0.029
Neg logit contributions
ves-0.042
uts-0.038
Her-0.037
 Everyone-0.036
ggy-0.036
Token the
Token ID262
Feature activation-5.57e-03

Pos logit contributions
ves+0.009
uts+0.008
 Everyone+0.007
Her+0.007
ggies+0.007
Neg logit contributions
rying-0.013
oured-0.012
 least-0.012
ishing-0.012
ited-0.012
Token rocks
Token ID12586
Feature activation+2.19e-02

Pos logit contributions
oured+0.010
ited+0.009
rying+0.009
ishing+0.008
 least+0.008
Neg logit contributions
ves-0.010
 Everyone-0.010
 many-0.010
Her-0.010
uts-0.010
Token speaks
Token ID9209
Feature activation+1.53e-02

Pos logit contributions
ves+0.058
 Everyone+0.056
 many+0.055
Her+0.054
 herself+0.053
Neg logit contributions
oured-0.070
ited-0.067
rying-0.063
ishing-0.061
 least-0.061
Token English
Token ID3594
Feature activation+8.60e-02

Pos logit contributions
ves+0.023
uts+0.021
Her+0.020
 Everyone+0.020
ggies+0.018
Neg logit contributions
oured-0.027
rying-0.024
ishing-0.023
work-0.023
 least-0.023
Token and
Token ID290
Feature activation-5.19e-03

Pos logit contributions
ves+0.142
Her+0.139
 herself+0.131
 Everyone+0.128
uts+0.127
Neg logit contributions
oured-0.147
ited-0.129
 least-0.126
ishing-0.125
rying-0.117
Token Lily
Token ID20037
Feature activation+6.92e-04

Pos logit contributions
oured+0.011
ited+0.011
 least+0.010
cy+0.010
ishing+0.010
Neg logit contributions
ves-0.007
uts-0.007
Her-0.006
 Everyone-0.006
ggies-0.005
Token speaks
Token ID9209
Feature activation+1.52e-02

Pos logit contributions
ves+0.001
 Everyone+0.001
Her+0.001
 many+0.001
uts+0.001
Neg logit contributions
oured-0.001
ited-0.001
rying-0.001
 least-0.001
ishing-0.001
Token Spanish
Token ID7897
Feature activation+1.26e-01

Pos logit contributions
ves+0.023
uts+0.020
Her+0.019
ggies+0.019
 Everyone+0.018
Neg logit contributions
oured-0.026
rying-0.024
work-0.023
king-0.023
ishing-0.023
Token.
Token ID13
Feature activation-3.93e-03

Pos logit contributions
Her+0.178
ves+0.176
 herself+0.169
 himself+0.161
 Everyone+0.157
Neg logit contributions
oured-0.220
ishing-0.193
ksh-0.190
ited-0.189
work-0.188
Token
Token ID198
Feature activation+1.61e-02

Pos logit contributions
oured+0.008
cy+0.007
ksh+0.007
 least+0.007
ited+0.006
Neg logit contributions
ves-0.006
ggy-0.005
uts-0.005
ggies-0.005
gh-0.005
Token
Token ID198
Feature activation+1.58e-02

Pos logit contributions
uts+0.024
 herself+0.021
ves+0.021
aces+0.020
 people+0.020
Neg logit contributions
cy-0.027
oured-0.024
ited-0.022
bie-0.022
 least-0.022
Token"
Token ID1
Feature activation-3.25e-02

Pos logit contributions
uts+0.021
ves+0.019
 herself+0.019
 himself+0.019
 Everyone+0.018
Neg logit contributions
ited-0.033
cy-0.031
oured-0.030
ksh-0.029
rying-0.029
TokenLook
Token ID8567
Feature activation+9.98e-03

Pos logit contributions
ited+0.053
oured+0.052
ksh+0.048
cy+0.047
 least+0.045
Neg logit contributions
ves-0.061
 many-0.057
 much-0.057
uts-0.057
 Everyone-0.054
Token '
Token ID705
Feature activation+4.40e-03

Pos logit contributions
oured+0.073
 least+0.073
 extingu+0.067
rying+0.067
ksh+0.067
Neg logit contributions
ves-0.044
Her-0.042
uts-0.040
 Everyone-0.039
 herself-0.037
Tokenl
Token ID75
Feature activation+1.07e-01

Pos logit contributions
ves+0.008
 herself+0.008
 many+0.007
 himself+0.007
 Everyone+0.007
Neg logit contributions
ited-0.008
oured-0.008
cy-0.007
rying-0.007
ksh-0.007
Tokenion
Token ID295
Feature activation+3.35e-02

Pos logit contributions
ves+0.183
Her+0.181
 himself+0.180
 many+0.180
 herself+0.176
Neg logit contributions
oured-0.185
ited-0.172
rying-0.165
 least-0.156
ishing-0.155
Token'
Token ID6
Feature activation+1.38e-02

Pos logit contributions
Her+0.052
ves+0.051
 himself+0.051
 many+0.050
 herself+0.049
Neg logit contributions
oured-0.062
ited-0.058
rying-0.056
 least-0.056
cy-0.054
Token in
Token ID287
Feature activation+9.67e-03

Pos logit contributions
ves+0.022
Her+0.021
 himself+0.020
 herself+0.020
uts+0.020
Neg logit contributions
oured-0.024
ited-0.023
 least-0.023
rying-0.022
ksh-0.021
Token Spanish
Token ID7897
Feature activation+1.26e-01

Pos logit contributions
uts+0.015
ves+0.014
Her+0.013
 Everyone+0.013
 herself+0.013
Neg logit contributions
ited-0.017
 least-0.017
oured-0.016
 extingu-0.016
ishing-0.015
Token.
Token ID13
Feature activation+5.30e-03

Pos logit contributions
 himself+0.195
Her+0.191
 herself+0.190
ves+0.183
uts+0.174
Neg logit contributions
oured-0.226
ited-0.211
ishing-0.190
int-0.189
 least-0.189
Token It
Token ID632
Feature activation-1.99e-03

Pos logit contributions
ves+0.008
 lesson+0.007
ggies+0.007
 many+0.007
ggy+0.006
Neg logit contributions
oured-0.010
ksh-0.010
 extingu-0.009
 least-0.009
int-0.009
Token is
Token ID318
Feature activation-3.55e-02

Pos logit contributions
oured+0.004
ited+0.003
 least+0.003
rying+0.003
ksh+0.003
Neg logit contributions
ves-0.003
Her-0.003
 many-0.003
 herself-0.003
uts-0.003
Token '
Token ID705
Feature activation+4.49e-03

Pos logit contributions
oured+0.066
 least+0.066
ksh+0.060
rying+0.060
ited+0.059
Neg logit contributions
ves-0.051
Her-0.051
 Everyone-0.049
uts-0.049
Everyone-0.045
Tokenleon
Token ID38970
Feature activation-1.91e-02

Pos logit contributions
ves+0.008
 himself+0.008
 herself+0.007
 many+0.007
 Everyone+0.007
Neg logit contributions
ited-0.008
oured-0.008
cy-0.007
ksh-0.007
rying-0.007
Token man
Token ID582
Feature activation+6.23e-03

Pos logit contributions
ves+0.056
Her+0.052
uts+0.051
 much+0.051
 Everyone+0.051
Neg logit contributions
oured-0.089
ited-0.086
rying-0.082
ishing-0.079
 least-0.078
Token to
Token ID284
Feature activation-6.60e-03

Pos logit contributions
ves+0.011
uts+0.009
 Everyone+0.009
ggies+0.009
Her+0.009
Neg logit contributions
oured-0.012
rying-0.012
 least-0.011
ited-0.010
cy-0.010
Token teach
Token ID4545
Feature activation+4.09e-02

Pos logit contributions
oured+0.009
 least+0.009
ited+0.008
rying+0.008
arse+0.007
Neg logit contributions
ves-0.015
uts-0.014
 much-0.013
 many-0.013
 Everyone-0.013
Token her
Token ID607
Feature activation+8.03e-03

Pos logit contributions
uts+0.074
ves+0.072
ggies+0.061
ggy+0.059
Her+0.059
Neg logit contributions
oured-0.066
 least-0.064
ited-0.062
ishing-0.056
rying-0.055
Token some
Token ID617
Feature activation-1.34e-02

Pos logit contributions
ves+0.012
Her+0.011
uts+0.011
ggies+0.010
ily+0.010
Neg logit contributions
oured-0.014
 least-0.013
rying-0.012
work-0.011
ishing-0.011
Token Spanish
Token ID7897
Feature activation+1.25e-01

Pos logit contributions
oured+0.026
ited+0.025
ishing+0.024
rying+0.024
 least+0.024
Neg logit contributions
Her-0.017
uts-0.017
ves-0.016
 herself-0.015
 himself-0.014
Token.
Token ID13
Feature activation-5.60e-03

Pos logit contributions
ves+0.168
Her+0.164
 herself+0.145
ggies+0.144
ily+0.142
Neg logit contributions
oured-0.237
 least-0.202
rying-0.199
work-0.199
ited-0.199
Token The
Token ID383
Feature activation-1.61e-02

Pos logit contributions
oured+0.014
ited+0.013
 least+0.012
cy+0.012
ksh+0.012
Neg logit contributions
ves-0.008
ggies-0.006
ggy-0.006
uts-0.006
 much-0.005
Token old
Token ID1468
Feature activation+4.11e-02

Pos logit contributions
oured+0.037
ited+0.036
 least+0.033
rying+0.032
ishing+0.032
Neg logit contributions
ves-0.022
uts-0.021
Her-0.020
 herself-0.018
 Everyone-0.018
Token man
Token ID582
Feature activation+6.65e-03

Pos logit contributions
ves+0.054
uts+0.051
Her+0.050
 Everyone+0.050
 much+0.050
Neg logit contributions
oured-0.091
ited-0.089
rying-0.082
ishing-0.081
 least-0.079
Token was
Token ID373
Feature activation-4.97e-03

Pos logit contributions
ves+0.012
 Everyone+0.012
 many+0.012
Her+0.012
 herself+0.012
Neg logit contributions
oured-0.011
ited-0.011
rying-0.010
ishing-0.010
 least-0.010
Token the
Token ID262
Feature activation-4.91e-04

Pos logit contributions
ves+0.029
uts+0.027
 Everyone+0.027
Her+0.025
ggies+0.024
Neg logit contributions
oured-0.052
ited-0.047
ishing-0.045
cy-0.045
 least-0.045
Token name
Token ID1438
Feature activation+7.61e-03

Pos logit contributions
oured+0.001
ited+0.001
 least+0.001
rying+0.001
ishing+0.001
Neg logit contributions
ves-0.001
Her-0.001
uts-0.001
 Everyone-0.001
 many-0.001
Token "
Token ID366
Feature activation-1.81e-02

Pos logit contributions
ves+0.010
uts+0.010
 Everyone+0.010
ggies+0.010
Her+0.009
Neg logit contributions
oured-0.015
ited-0.013
cy-0.012
 least-0.012
 returned-0.012
TokenB
Token ID33
Feature activation+5.15e-03

Pos logit contributions
oured+0.031
ited+0.030
cy+0.029
ksh+0.027
ishing+0.026
Neg logit contributions
ves-0.034
 many-0.032
 much-0.031
ggies-0.031
 Everyone-0.030
TokenAN
Token ID1565
Feature activation+3.24e-02

Pos logit contributions
 many+0.011
ves+0.011
Her+0.011
 Everyone+0.011
 herself+0.011
Neg logit contributions
oured-0.007
ited-0.007
rying-0.006
ishing-0.006
ksh-0.006
TokenANA
Token ID31574
Feature activation+1.25e-01

Pos logit contributions
ves+0.057
 many+0.056
 herself+0.055
Her+0.055
 Everyone+0.054
Neg logit contributions
oured-0.059
ited-0.055
rying-0.051
ishing-0.050
ksh-0.049
Token"
Token ID1
Feature activation-3.24e-02

Pos logit contributions
 herself+0.149
 many+0.148
 much+0.144
Her+0.134
 himself+0.131
Neg logit contributions
oured-0.264
ish-0.234
rying-0.230
ishing-0.228
ksh-0.222
Token and
Token ID290
Feature activation-3.00e-03

Pos logit contributions
oured+0.052
rying+0.048
cy+0.047
ksh+0.046
int+0.046
Neg logit contributions
ves-0.054
ggies-0.049
 Everyone-0.048
ggy-0.044
 People-0.042
Token her
Token ID607
Feature activation+9.01e-03

Pos logit contributions
oured+0.006
 least+0.005
ited+0.005
rying+0.005
 extingu+0.005
Neg logit contributions
ves-0.004
Her-0.004
 much-0.004
 Everyone-0.004
uts-0.004
Token gir
Token ID37370
Feature activation+3.76e-02

Pos logit contributions
ves+0.014
Her+0.014
 Everyone+0.013
uts+0.013
 much+0.013
Neg logit contributions
oured-0.017
 least-0.015
ishing-0.015
rying-0.015
ited-0.015
Tokenaffe
Token ID21223
Feature activation+1.69e-03

Pos logit contributions
Her+0.068
ves+0.067
 many+0.066
 herself+0.066
 much+0.063
Neg logit contributions
oured-0.066
ited-0.059
ishing-0.053
 least-0.053
ksh-0.052
Token she
Token ID673
Feature activation+7.61e-03

Pos logit contributions
ves+0.003
Her+0.003
uts+0.003
 Everyone+0.003
ggies+0.002
Neg logit contributions
oured-0.005
ited-0.005
 least-0.005
rying-0.004
ishing-0.004
Token had
Token ID550
Feature activation+1.93e-03

Pos logit contributions
ves+0.013
 Everyone+0.012
 many+0.012
Her+0.012
 herself+0.012
Neg logit contributions
oured-0.015
ited-0.014
rying-0.013
ishing-0.013
 least-0.013
Token learned
Token ID4499
Feature activation+5.74e-02

Pos logit contributions
ves+0.003
Her+0.003
uts+0.003
 Everyone+0.003
 herself+0.003
Neg logit contributions
oured-0.003
ited-0.003
 least-0.003
rying-0.003
ishing-0.003
Token lots
Token ID6041
Feature activation+1.05e-02

Pos logit contributions
ves+0.101
uts+0.089
oles+0.085
ggies+0.084
Her+0.082
Neg logit contributions
 least-0.097
oured-0.093
ited-0.076
rying-0.076
ishing-0.073
Token of
Token ID286
Feature activation+3.72e-02

Pos logit contributions
ves+0.012
Her+0.012
uts+0.010
 herself+0.010
 Everyone+0.010
Neg logit contributions
oured-0.024
ited-0.023
ishing-0.021
rying-0.021
 least-0.020
Token Spanish
Token ID7897
Feature activation+1.25e-01

Pos logit contributions
uts+0.047
ves+0.044
Her+0.043
ily+0.039
 Everyone+0.038
Neg logit contributions
oured-0.074
ited-0.072
 least-0.070
ishing-0.070
rying-0.068
Token!
Token ID0
Feature activation-9.44e-03

Pos logit contributions
Her+0.177
ves+0.175
ily+0.151
His+0.149
Their+0.148
Neg logit contributions
oured-0.240
 least-0.203
ited-0.203
ishing-0.202
rying-0.197
Token
Token ID198
Feature activation+1.04e-02

Pos logit contributions
oured+0.023
 least+0.022
cy+0.022
ited+0.022
ksh+0.020
Neg logit contributions
ves-0.011
ggies-0.008
ggy-0.008
uts-0.008
l-0.007
Token
Token ID198
Feature activation+1.01e-02

Pos logit contributions
uts+0.013
ves+0.012
ggies+0.010
 purpose+0.010
 badly+0.010
Neg logit contributions
oured-0.020
ited-0.019
 least-0.019
cy-0.017
int-0.017
TokenAt
Token ID2953
Feature activation-7.37e-03

Pos logit contributions
uts+0.013
ves+0.012
 himself+0.011
ggies+0.011
 herself+0.011
Neg logit contributions
ited-0.023
oured-0.020
 least-0.019
rying-0.018
cy-0.018
Token the
Token ID262
Feature activation-4.04e-03

Pos logit contributions
oured+0.018
ited+0.017
 least+0.017
rying+0.016
ishing+0.016
Neg logit contributions
ves-0.009
 Everyone-0.008
uts-0.008
Her-0.008
 many-0.007
Token.
Token ID13
Feature activation-7.49e-03

Pos logit contributions
ves+0.055
 Everyone+0.049
uts+0.048
 People+0.046
Her+0.044
Neg logit contributions
ited-0.056
oured-0.054
 least-0.053
rying-0.053
ishing-0.050
Token She
Token ID1375
Feature activation+6.89e-05

Pos logit contributions
oured+0.019
 least+0.017
ksh+0.017
cy+0.016
ited+0.016
Neg logit contributions
ves-0.009
ggies-0.007
uts-0.007
ggy-0.006
l-0.006
Token grabbed
Token ID13176
Feature activation-1.42e-02

Pos logit contributions
ves+0.000
 Everyone+0.000
 many+0.000
Her+0.000
 herself+0.000
Neg logit contributions
oured-0.000
ited-0.000
rying-0.000
ishing-0.000
 least-0.000
Token the
Token ID262
Feature activation-5.79e-03

Pos logit contributions
oured+0.037
ited+0.035
 least+0.033
ishing+0.033
rying+0.033
Neg logit contributions
ves-0.015
uts-0.014
 Everyone-0.012
Her-0.012
ggies-0.011
Token fire
Token ID2046
Feature activation-4.48e-02

Pos logit contributions
oured+0.013
ited+0.012
rying+0.012
ishing+0.011
ksh+0.011
Neg logit contributions
 many-0.008
ves-0.008
 Everyone-0.008
 much-0.008
 people-0.008
Token extingu
Token ID24995
Feature activation-2.24e-01

Pos logit contributions
oured+0.102
ited+0.099
rying+0.093
ishing+0.090
 least+0.089
Neg logit contributions
ves-0.057
 many-0.055
 Everyone-0.055
 herself-0.052
uts-0.052
Tokenisher
Token ID4828
Feature activation-5.19e-02

Pos logit contributions
oured+0.238
phony+0.219
OO+0.207
rying+0.203
 least+0.199
Neg logit contributions
 herself-0.438
 people-0.432
Her-0.426
 himself-0.426
 Everyone-0.424
Token and
Token ID290
Feature activation-9.54e-03

Pos logit contributions
oured+0.116
ited+0.109
rying+0.109
 least+0.104
ishing+0.103
Neg logit contributions
ves-0.070
 Everyone-0.068
Her-0.063
 many-0.063
 People-0.061
Token put
Token ID1234
Feature activation+7.11e-04

Pos logit contributions
oured+0.017
ited+0.016
rying+0.015
ishing+0.015
 least+0.014
Neg logit contributions
ves-0.017
 many-0.017
 Everyone-0.017
 herself-0.016
Her-0.016
Token out
Token ID503
Feature activation-5.04e-02

Pos logit contributions
uts+0.001
ves+0.001
Her+0.001
 Everyone+0.001
ggies+0.000
Neg logit contributions
oured-0.002
ited-0.002
ishing-0.002
 least-0.002
rying-0.002
Token the
Token ID262
Feature activation-5.86e-03

Pos logit contributions
oured+0.121
ited+0.113
rying+0.110
ishing+0.107
 least+0.106
Neg logit contributions
ves-0.060
uts-0.058
 Everyone-0.050
ggies-0.049
Her-0.046
Token quickly
Token ID2952
Feature activation+3.04e-02

Pos logit contributions
ves+0.011
 many+0.011
 much+0.010
 Everyone+0.010
Her+0.010
Neg logit contributions
oured-0.014
ited-0.014
rying-0.013
ishing-0.013
ksh-0.012
Token checked
Token ID10667
Feature activation-6.93e-02

Pos logit contributions
ves+0.051
 many+0.050
 much+0.048
Her+0.045
 himself+0.045
Neg logit contributions
oured-0.059
ksh-0.055
ited-0.055
 least-0.054
ishing-0.053
Token for
Token ID329
Feature activation-6.19e-03

Pos logit contributions
oured+0.158
rying+0.156
ishing+0.151
ited+0.150
 least+0.148
Neg logit contributions
ves-0.088
uts-0.084
 Everyone-0.081
Her-0.078
 People-0.070
Token a
Token ID257
Feature activation-1.12e-02

Pos logit contributions
oured+0.016
ited+0.016
rying+0.015
 least+0.015
ishing+0.015
Neg logit contributions
ves-0.006
 Everyone-0.005
uts-0.005
Her-0.005
 many-0.005
Token fire
Token ID2046
Feature activation-4.55e-02

Pos logit contributions
oured+0.028
ited+0.027
rying+0.026
ishing+0.026
 least+0.025
Neg logit contributions
 many-0.013
 much-0.012
ves-0.012
 Everyone-0.011
 long-0.011
Token extingu
Token ID24995
Feature activation-2.23e-01

Pos logit contributions
oured+0.104
ited+0.103
rying+0.094
ishing+0.092
ksh+0.089
Neg logit contributions
ves-0.057
 many-0.056
 Everyone-0.055
uts-0.054
 much-0.052
Tokenisher
Token ID4828
Feature activation-5.36e-02

Pos logit contributions
oured+0.265
ksh+0.251
phony+0.248
ited+0.225
igor+0.215
Neg logit contributions
 of-0.427
 people-0.413
 himself-0.413
 many-0.408
 much-0.396
Token,
Token ID11
Feature activation+5.90e-03

Pos logit contributions
oured+0.139
ited+0.132
rying+0.130
ishing+0.126
 least+0.125
Neg logit contributions
ves-0.052
 Everyone-0.051
 many-0.048
Her-0.048
 much-0.045
Token but
Token ID475
Feature activation-4.11e-02

Pos logit contributions
ves+0.008
 Everyone+0.007
 many+0.007
Her+0.007
uts+0.007
Neg logit contributions
oured-0.013
ited-0.013
rying-0.012
 least-0.012
ishing-0.012
Token there
Token ID612
Feature activation-1.29e-02

Pos logit contributions
oured+0.109
ited+0.105
rying+0.102
 least+0.101
ishing+0.098
Neg logit contributions
ves-0.037
 Everyone-0.034
uts-0.033
Her-0.032
 many-0.030
Token wasn
Token ID2492
Feature activation+2.59e-03

Pos logit contributions
oured+0.020
 least+0.019
ited+0.018
cy+0.017
rying+0.016
Neg logit contributions
Her-0.023
uts-0.022
ves-0.021
 Everyone-0.021
 herself-0.021
Token he
Token ID339
Feature activation+6.97e-03

Pos logit contributions
oured+0.009
 least+0.008
ited+0.008
rying+0.008
cy+0.008
Neg logit contributions
uts-0.005
ves-0.005
Her-0.004
 Everyone-0.004
 lesson-0.003
Token found
Token ID1043
Feature activation-7.69e-03

Pos logit contributions
ves+0.016
 Everyone+0.016
 many+0.016
uts+0.015
Her+0.015
Neg logit contributions
oured-0.009
ited-0.008
rying-0.007
 least-0.007
ishing-0.007
Token a
Token ID257
Feature activation-1.17e-02

Pos logit contributions
oured+0.019
rying+0.018
 least+0.018
ishing+0.017
ited+0.017
Neg logit contributions
uts-0.008
ves-0.008
Her-0.008
 Everyone-0.007
ggies-0.006
Token magical
Token ID10883
Feature activation-1.56e-02

Pos logit contributions
oured+0.024
ited+0.023
rying+0.021
 least+0.021
ishing+0.021
Neg logit contributions
ves-0.018
 Everyone-0.017
uts-0.017
 herself-0.016
Her-0.016
Token o
Token ID267
Feature activation-2.09e-02

Pos logit contributions
oured+0.027
ited+0.026
rying+0.024
ishing+0.024
 least+0.023
Neg logit contributions
ves-0.028
 Everyone-0.028
 many-0.027
Her-0.027
uts-0.027
Tokenint
Token ID600
Feature activation-1.64e-01

Pos logit contributions
oured+0.034
ited+0.031
rying+0.030
ksh+0.029
ishing+0.029
Neg logit contributions
ves-0.042
 Everyone-0.038
Her-0.037
 herself-0.037
 much-0.037
Tokenment
Token ID434
Feature activation-1.82e-02

Pos logit contributions
oured+0.314
rying+0.290
ited+0.288
ishing+0.279
int+0.261
Neg logit contributions
 Everyone-0.281
ves-0.273
 many-0.267
Her-0.265
uts-0.260
Token.
Token ID13
Feature activation-3.00e-03

Pos logit contributions
oured+0.043
ited+0.042
rying+0.038
ishing+0.038
 least+0.037
Neg logit contributions
 Everyone-0.023
ves-0.023
Her-0.021
uts-0.021
 People-0.020
Token He
Token ID679
Feature activation+3.28e-03

Pos logit contributions
oured+0.007
ksh+0.006
 least+0.005
cy+0.005
int+0.005
Neg logit contributions
ves-0.005
 lesson-0.004
 many-0.003
l-0.003
uts-0.003
Token touched
Token ID12615
Feature activation-6.10e-02

Pos logit contributions
ves+0.006
 Everyone+0.006
 many+0.005
uts+0.005
Her+0.005
Neg logit contributions
oured-0.006
ited-0.006
rying-0.005
ishing-0.005
 least-0.005
Token it
Token ID340
Feature activation+2.41e-02

Pos logit contributions
oured+0.141
ited+0.128
ishing+0.128
cy+0.124
rying+0.122
Neg logit contributions
uts-0.086
ves-0.078
Her-0.073
ggies-0.072
 Everyone-0.066
Token the
Token ID262
Feature activation-9.18e-03

Pos logit contributions
ves+0.062
uts+0.059
 Everyone+0.053
Her+0.052
ggies+0.051
Neg logit contributions
 least-0.076
oured-0.072
 extingu-0.071
rying-0.069
ited-0.068
Token apple
Token ID17180
Feature activation-1.44e-02

Pos logit contributions
oured+0.014
 least+0.014
ited+0.013
rying+0.013
ishing+0.013
Neg logit contributions
ves-0.018
Her-0.017
uts-0.017
 Everyone-0.017
 herself-0.016
Token!
Token ID0
Feature activation-1.72e-03

Pos logit contributions
oured+0.028
rying+0.027
 least+0.027
ited+0.026
cy+0.025
Neg logit contributions
 Everyone-0.021
Her-0.020
ves-0.020
 People-0.019
ggies-0.018
Token He
Token ID679
Feature activation+2.77e-03

Pos logit contributions
oured+0.004
ksh+0.003
cy+0.003
 least+0.003
ited+0.003
Neg logit contributions
ves-0.003
 lesson-0.002
 many-0.002
uts-0.002
ggies-0.002
Token squ
Token ID2809
Feature activation-4.66e-03

Pos logit contributions
ves+0.005
 Everyone+0.004
Her+0.004
uts+0.004
 many+0.004
Neg logit contributions
oured-0.005
ited-0.005
rying-0.005
 least-0.005
ishing-0.005
Tokenint
Token ID600
Feature activation-1.63e-01

Pos logit contributions
oured+0.005
ited+0.005
rying+0.005
 least+0.005
arse+0.004
Neg logit contributions
 Everyone-0.011
 many-0.011
 herself-0.011
ves-0.011
 much-0.011
Tokened
Token ID276
Feature activation-2.19e-02

Pos logit contributions
oured+0.257
ited+0.227
ishing+0.222
rying+0.221
cy+0.204
Neg logit contributions
 Everyone-0.325
ves-0.324
uts-0.312
ggies-0.310
 People-0.307
Token his
Token ID465
Feature activation-1.63e-02

Pos logit contributions
oured+0.050
ited+0.045
rying+0.044
ishing+0.044
 extingu+0.042
Neg logit contributions
ves-0.031
uts-0.028
 Everyone-0.027
Her-0.026
ggies-0.026
Token eyes
Token ID2951
Feature activation+1.61e-02

Pos logit contributions
oured+0.029
ited+0.028
rying+0.026
ishing+0.025
 least+0.025
Neg logit contributions
ves-0.029
 many-0.028
 Everyone-0.028
 herself-0.027
Her-0.027
Token and
Token ID290
Feature activation-4.52e-03

Pos logit contributions
ves+0.019
 Everyone+0.018
 many+0.017
Her+0.017
uts+0.016
Neg logit contributions
oured-0.039
ited-0.037
rying-0.035
ishing-0.034
ksh-0.034
Token saw
Token ID2497
Feature activation-1.14e-02

Pos logit contributions
oured+0.007
ited+0.007
rying+0.006
ishing+0.006
 least+0.006
Neg logit contributions
ves-0.009
 Everyone-0.009
 many-0.009
uts-0.009
Her-0.008
Token when
Token ID618
Feature activation-1.27e-02

Pos logit contributions
ves+0.010
 Everyone+0.009
uts+0.009
Her+0.008
 many+0.008
Neg logit contributions
oured-0.015
rying-0.014
ited-0.014
 least-0.013
ksh-0.013
Token he
Token ID339
Feature activation+6.86e-03

Pos logit contributions
 least+0.028
oured+0.028
ited+0.027
rying+0.026
cy+0.025
Neg logit contributions
uts-0.016
ves-0.014
Her-0.013
 Everyone-0.012
 herself-0.011
Token noticed
Token ID6810
Feature activation+4.22e-02

Pos logit contributions
 many+0.013
ves+0.012
 much+0.012
+0.012
Her+0.011
Neg logit contributions
oured-0.012
ksh-0.011
rying-0.011
ishing-0.011
ited-0.011
Token a
Token ID257
Feature activation-1.03e-02

Pos logit contributions
ves+0.060
uts+0.060
Her+0.056
 Everyone+0.053
ggies+0.050
Neg logit contributions
oured-0.088
 least-0.084
ited-0.079
rying-0.079
ishing-0.078
Token gl
Token ID1278
Feature activation-1.17e-02

Pos logit contributions
oured+0.024
ited+0.024
rying+0.023
ishing+0.022
 least+0.022
Neg logit contributions
ves-0.012
 Everyone-0.012
 many-0.011
 herself-0.011
Her-0.011
Tokenint
Token ID600
Feature activation-1.63e-01

Pos logit contributions
oured+0.018
 least+0.018
ited+0.018
ksh+0.018
cy+0.017
Neg logit contributions
ves-0.024
Her-0.020
ggy-0.020
l-0.020
 himself-0.020
Token in
Token ID287
Feature activation+2.74e-03

Pos logit contributions
oured+0.397
ited+0.383
rying+0.368
ishing+0.358
 least+0.352
Neg logit contributions
ves-0.185
 Everyone-0.177
 many-0.175
Her-0.168
 herself-0.165
Token the
Token ID262
Feature activation-3.11e-03

Pos logit contributions
ves+0.003
uts+0.003
 Everyone+0.003
Her+0.003
ggies+0.002
Neg logit contributions
ited-0.007
oured-0.007
rying-0.006
 least-0.006
ishing-0.006
Token ground
Token ID2323
Feature activation-5.29e-03

Pos logit contributions
oured+0.007
ited+0.007
rying+0.006
ishing+0.006
 least+0.006
Neg logit contributions
 many-0.004
ves-0.004
 Everyone-0.004
 herself-0.004
Her-0.004
Token.
Token ID13
Feature activation-9.83e-03

Pos logit contributions
oured+0.012
ited+0.012
rying+0.012
ishing+0.011
 least+0.011
Neg logit contributions
 Everyone-0.006
ves-0.006
Her-0.006
 People-0.005
uts-0.005
Token He
Token ID679
Feature activation+1.88e-03

Pos logit contributions
oured+0.028
ited+0.026
rying+0.025
ksh+0.025
ishing+0.024
Neg logit contributions
ves-0.008
Her-0.007
 herself-0.006
 many-0.006
uts-0.006
Token,
Token ID11
Feature activation+5.04e-03

Pos logit contributions
 many+0.003
 much+0.003
 Everyone+0.003
ves+0.003
 people+0.003
Neg logit contributions
oured-0.006
ited-0.006
rying-0.006
ksh-0.006
ishing-0.006
Token shiny
Token ID22441
Feature activation-3.08e-03

Pos logit contributions
ves+0.007
 Everyone+0.007
uts+0.007
Her+0.007
 herself+0.006
Neg logit contributions
oured-0.011
ited-0.010
rying-0.010
ishing-0.010
ksh-0.009
Token wheel
Token ID7825
Feature activation-3.85e-03

Pos logit contributions
oured+0.006
ited+0.006
rying+0.006
ksh+0.005
ishing+0.005
Neg logit contributions
ves-0.005
 many-0.005
 Everyone-0.005
 much-0.004
 herself-0.004
Token that
Token ID326
Feature activation-1.05e-03

Pos logit contributions
oured+0.009
rying+0.008
ited+0.008
ishing+0.008
 least+0.008
Neg logit contributions
ves-0.005
Her-0.004
 Everyone-0.004
 many-0.004
 herself-0.004
Token gl
Token ID1278
Feature activation-1.12e-02

Pos logit contributions
oured+0.002
ited+0.002
 least+0.002
cy+0.002
rying+0.002
Neg logit contributions
Her-0.002
ves-0.002
 Everyone-0.002
 herself-0.002
 many-0.002
Tokenint
Token ID600
Feature activation-1.63e-01

Pos logit contributions
ited+0.016
oured+0.015
 least+0.015
ksh+0.013
cy+0.013
Neg logit contributions
ves-0.024
uts-0.023
 Everyone-0.022
Her-0.022
 himself-0.022
Tokened
Token ID276
Feature activation-1.90e-02

Pos logit contributions
oured+0.286
rying+0.261
ishing+0.247
ited+0.233
int+0.224
Neg logit contributions
Her-0.298
 Everyone-0.294
ves-0.291
Everyone-0.271
 many-0.271
Token in
Token ID287
Feature activation+2.38e-03

Pos logit contributions
oured+0.039
ishing+0.035
rying+0.035
ited+0.034
ksh+0.034
Neg logit contributions
ves-0.031
Her-0.029
uts-0.027
 Everyone-0.027
ggy-0.024
Token the
Token ID262
Feature activation-5.75e-03

Pos logit contributions
ves+0.003
uts+0.003
 Everyone+0.003
Her+0.003
ggies+0.002
Neg logit contributions
ited-0.005
oured-0.005
 least-0.005
rying-0.005
 extingu-0.005
Token sun
Token ID4252
Feature activation-5.26e-02

Pos logit contributions
oured+0.012
ited+0.011
rying+0.011
ishing+0.010
ksh+0.010
Neg logit contributions
ves-0.009
 Everyone-0.009
 many-0.009
 much-0.008
Her-0.008
Token.
Token ID13
Feature activation-1.21e-02

Pos logit contributions
oured+0.137
ited+0.130
rying+0.127
ishing+0.124
 least+0.123
Neg logit contributions
ves-0.052
 Everyone-0.048
 many-0.047
Her-0.046
 much-0.044
TokenAfter
Token ID3260
Feature activation+1.29e-02

Pos logit contributions
ves+0.015
 many+0.015
uts+0.015
 Everyone+0.015
 herself+0.015
Neg logit contributions
ited-0.028
oured-0.026
rying-0.025
ksh-0.024
ishing-0.023
Token they
Token ID484
Feature activation-1.75e-02

Pos logit contributions
ves+0.014
uts+0.013
 Everyone+0.013
Her+0.012
 herself+0.011
Neg logit contributions
oured-0.031
 least-0.029
ited-0.028
rying-0.028
ishing-0.027
Token made
Token ID925
Feature activation+1.83e-02

Pos logit contributions
rying+0.030
oured+0.030
ksh+0.029
ishing+0.029
ited+0.027
Neg logit contributions
 many-0.034
 much-0.032
ves-0.031
Her-0.030
 other-0.030
Token the
Token ID262
Feature activation-1.89e-03

Pos logit contributions
ves+0.018
uts+0.018
Her+0.018
 Everyone+0.016
 herself+0.014
Neg logit contributions
oured-0.045
 least-0.042
rying-0.041
ishing-0.041
ited-0.041
Token spl
Token ID4328
Feature activation-4.53e-02

Pos logit contributions
oured+0.004
ited+0.004
rying+0.004
ksh+0.004
ishing+0.004
Neg logit contributions
ves-0.003
 many-0.003
 Everyone-0.002
 people-0.002
 herself-0.002
Tokenint
Token ID600
Feature activation-1.63e-01

Pos logit contributions
oured+0.078
ited+0.076
 least+0.070
rying+0.069
arse+0.068
Neg logit contributions
ves-0.083
 Everyone-0.080
uts-0.076
 many-0.074
 much-0.074
Token,
Token ID11
Feature activation+8.76e-03

Pos logit contributions
oured+0.344
rying+0.322
ishing+0.313
ish+0.305
cy+0.304
Neg logit contributions
ves-0.226
Her-0.224
 Everyone-0.217
 many-0.206
uts-0.204
Token they
Token ID484
Feature activation-1.73e-02

Pos logit contributions
ves+0.010
uts+0.009
Her+0.009
 Everyone+0.008
 herself+0.008
Neg logit contributions
oured-0.021
ited-0.020
 least-0.020
rying-0.019
ishing-0.019
Token put
Token ID1234
Feature activation-5.22e-04

Pos logit contributions
oured+0.035
ited+0.034
rying+0.033
ishing+0.033
ksh+0.032
Neg logit contributions
 many-0.028
ves-0.026
 much-0.025
 Everyone-0.024
Her-0.024
Token it
Token ID340
Feature activation+3.52e-02

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
ishing+0.001
 least+0.001
Neg logit contributions
ves-0.000
 Everyone-0.000
Her-0.000
uts-0.000
 many-0.000
Token on
Token ID319
Feature activation+4.82e-02

Pos logit contributions
ves+0.039
 many+0.039
 herself+0.037
 much+0.036
 Everyone+0.036
Neg logit contributions
oured-0.086
ited-0.085
rying-0.079
 least-0.077
ksh-0.077
Token the
Token ID262
Feature activation-1.73e-02

Pos logit contributions
ited+0.017
oured+0.016
 least+0.016
cy+0.015
rying+0.015
Neg logit contributions
ves-0.009
uts-0.009
 Everyone-0.008
Her-0.008
 People-0.007
Token distance
Token ID5253
Feature activation-7.45e-02

Pos logit contributions
oured+0.028
ited+0.027
 least+0.026
rying+0.025
ishing+0.024
Neg logit contributions
ves-0.034
uts-0.033
Her-0.031
 Everyone-0.031
 herself-0.030
Token.
Token ID13
Feature activation-1.37e-02

Pos logit contributions
oured+0.153
ited+0.151
rying+0.146
ishing+0.137
 least+0.136
Neg logit contributions
ves-0.104
Her-0.096
uts-0.094
 Everyone-0.092
ggies-0.089
Token He
Token ID679
Feature activation-1.79e-03

Pos logit contributions
oured+0.028
ksh+0.025
cy+0.024
 least+0.023
ited+0.023
Neg logit contributions
ves-0.021
 lesson-0.017
uts-0.016
l-0.015
ggy-0.015
Token squ
Token ID2809
Feature activation-6.82e-03

Pos logit contributions
oured+0.003
ited+0.003
rying+0.003
 least+0.003
ishing+0.003
Neg logit contributions
ves-0.003
 Everyone-0.003
uts-0.003
Her-0.003
 many-0.003
Tokenint
Token ID600
Feature activation-1.62e-01

Pos logit contributions
oured+0.008
ited+0.007
rying+0.007
 least+0.007
arse+0.006
Neg logit contributions
 Everyone-0.016
 many-0.015
ves-0.015
 herself-0.015
Her-0.015
Tokened
Token ID276
Feature activation-2.26e-02

Pos logit contributions
oured+0.239
ited+0.210
ishing+0.204
rying+0.203
int+0.188
Neg logit contributions
 Everyone-0.335
ves-0.333
uts-0.324
 truth-0.320
 People-0.319
Token and
Token ID290
Feature activation-7.77e-03

Pos logit contributions
oured+0.048
ited+0.043
rying+0.042
ishing+0.042
 extingu+0.041
Neg logit contributions
ves-0.036
uts-0.033
 Everyone-0.031
Her-0.030
ggies-0.030
Token said
Token ID531
Feature activation-1.71e-02

Pos logit contributions
oured+0.012
ited+0.011
rying+0.011
ishing+0.010
 least+0.010
Neg logit contributions
ves-0.016
 Everyone-0.015
 many-0.015
Her-0.015
uts-0.015
Token to
Token ID284
Feature activation-1.20e-02

Pos logit contributions
oured+0.029
rying+0.023
 extingu+0.022
int+0.022
ked+0.022
Neg logit contributions
Every-0.025
ves-0.025
Her-0.024
Everyone-0.023
uts-0.023
Token Sarah
Token ID10490
Feature activation-1.24e-02

Pos logit contributions
 least+0.025
oured+0.024
rying+0.022
ited+0.022
 extingu+0.021
Neg logit contributions
uts-0.017
ves-0.017
Her-0.014
 much-0.014
 many-0.013
Token.
Token ID13
Feature activation-4.84e-03

Pos logit contributions
oured+0.020
rying+0.020
ishing+0.018
ited+0.018
 least+0.017
Neg logit contributions
 Everyone-0.014
Her-0.013
ves-0.013
Everyone-0.013
 People-0.012
Token (space)
Token ID220
Feature activation-4.64e-03

Pos logit contributions
oured+0.011
ksh+0.010
 least+0.009
cy+0.009
ited+0.009
Neg logit contributions
ves-0.007
Her-0.006
l-0.005
uts-0.005
 many-0.005
Token (newline)
Token ID198
Feature activation+4.38e-03

Pos logit contributions
oured+0.011
rying+0.010
cy+0.010
ksh+0.009
int+0.009
Neg logit contributions
ves-0.005
Her-0.004
 lesson-0.004
 people-0.004
 many-0.004
Token (newline)
Token ID198
Feature activation+4.43e-03

Pos logit contributions
ves+0.005
 words+0.005
 people+0.005
uts+0.005
 lesson+0.005
Neg logit contributions
oured-0.009
ited-0.008
 retrieved-0.007
cy-0.007
ksh-0.007
TokenM
Token ID44
Feature activation-2.46e-02

Pos logit contributions
ves+0.006
 many+0.005
 words+0.005
uts+0.005
 herself+0.005
Neg logit contributions
ited-0.009
oured-0.009
rying-0.008
ksh-0.008
cy-0.008
Tokenint
Token ID600
Feature activation-1.62e-01

Pos logit contributions
oured+0.051
ited+0.050
rying+0.046
ishing+0.045
 least+0.045
Neg logit contributions
ves-0.037
 many-0.037
 Everyone-0.036
 herself-0.035
Her-0.035
Token no
Token ID645
Feature activation+3.26e-02

Pos logit contributions
oured+0.250
ited+0.234
ishing+0.218
rying+0.209
cy+0.198
Neg logit contributions
 Everyone-0.326
uts-0.325
 much-0.323
 many-0.320
ves-0.317
Token longer
Token ID2392
Feature activation-5.25e-04

Pos logit contributions
 many+0.055
ves+0.055
 Everyone+0.053
 much+0.053
 herself+0.052
Neg logit contributions
oured-0.060
ited-0.056
rying-0.054
ksh-0.053
 least-0.051
Token wanted
Token ID2227
Feature activation+1.54e-02

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
ishing+0.001
 least+0.001
Neg logit contributions
ves-0.001
 Everyone-0.001
Her-0.001
 many-0.001
uts-0.001
Token to
Token ID284
Feature activation-9.36e-03

Pos logit contributions
ves+0.026
Her+0.026
 Everyone+0.025
uts+0.025
Everyone+0.024
Neg logit contributions
oured-0.027
 least-0.026
rying-0.025
ishing-0.024
ited-0.022
Token go
Token ID467
Feature activation+2.47e-02

Pos logit contributions
oured+0.012
 least+0.012
ited+0.011
rying+0.010
 extingu+0.009
Neg logit contributions
ves-0.021
 many-0.020
uts-0.020
Her-0.019
 Everyone-0.019
Token (newline)
Token ID198
Feature activation+1.18e-02

Pos logit contributions
ves+0.039
uts+0.033
ggies+0.031
ggy+0.031
Her+0.029
Neg logit contributions
oured-0.057
ited-0.051
 least-0.049
cy-0.049
rying-0.048
Token (newline)
Token ID198
Feature activation+1.15e-02

Pos logit contributions
ves+0.013
uts+0.013
 herself+0.011
ggies+0.010
ggy+0.010
Neg logit contributions
oured-0.025
cy-0.024
ited-0.023
 least-0.023
rying-0.023
TokenJ
Token ID41
Feature activation+4.89e-02

Pos logit contributions
ves+0.016
uts+0.015
 many+0.013
 Everyone+0.013
 herself+0.013
Neg logit contributions
ited-0.024
oured-0.024
rying-0.022
cy-0.022
ishing-0.022
Tokenill
Token ID359
Feature activation-3.08e-02

Pos logit contributions
ves+0.099
 many+0.096
 Everyone+0.095
Her+0.093
 herself+0.093
Neg logit contributions
oured-0.076
ited-0.071
rying-0.067
ishing-0.064
 least-0.062
Token squ
Token ID2809
Feature activation-1.81e-03

Pos logit contributions
oured+0.058
ited+0.055
rying+0.052
ishing+0.050
 least+0.049
Neg logit contributions
ves-0.052
 Everyone-0.050
 many-0.050
Her-0.048
 herself-0.048
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
 least+0.002
 extingu+0.002
rying+0.002
oured+0.002
arse+0.002
Neg logit contributions
 Everyone-0.004
Her-0.004
 many-0.004
 much-0.004
 herself-0.004
Tokened
Token ID276
Feature activation-1.63e-02

Pos logit contributions
oured+0.269
ited+0.249
ishing+0.237
rying+0.232
cy+0.220
Neg logit contributions
ves-0.308
 Everyone-0.303
 many-0.295
 much-0.293
uts-0.291
Token up
Token ID510
Feature activation+3.08e-02

Pos logit contributions
oured+0.034
ited+0.032
ishing+0.031
rying+0.031
ksh+0.029
Neg logit contributions
ves-0.025
uts-0.023
 Everyone-0.023
 many-0.023
 much-0.022
Token at
Token ID379
Feature activation-1.58e-02

Pos logit contributions
ves+0.040
 Everyone+0.037
 many+0.035
uts+0.035
 much+0.034
Neg logit contributions
oured-0.071
ited-0.068
rying-0.065
ishing-0.064
cy-0.061
Token the
Token ID262
Feature activation-2.48e-03

Pos logit contributions
oured+0.035
ited+0.033
 least+0.032
rying+0.032
ishing+0.031
Neg logit contributions
ves-0.023
uts-0.021
 Everyone-0.020
 many-0.020
 much-0.020
Token sky
Token ID6766
Feature activation+3.17e-03

Pos logit contributions
oured+0.004
ited+0.004
rying+0.004
 least+0.004
ishing+0.003
Neg logit contributions
ves-0.005
Her-0.004
 Everyone-0.004
uts-0.004
 herself-0.004
Tokenstick
Token ID13915
Feature activation+1.55e-02

Pos logit contributions
 Everyone+0.001
ves+0.001
Her+0.001
 many+0.001
 much+0.001
Neg logit contributions
oured-0.001
rying-0.001
 least-0.001
ishing-0.001
 extingu-0.001
Token to
Token ID284
Feature activation-6.95e-03

Pos logit contributions
ves+0.027
Her+0.026
 Everyone+0.026
uts+0.024
Everyone+0.024
Neg logit contributions
oured-0.027
 least-0.026
rying-0.024
ishing-0.023
ited-0.023
Token make
Token ID787
Feature activation+3.96e-02

Pos logit contributions
oured+0.011
ited+0.011
 least+0.010
rying+0.010
ishing+0.010
Neg logit contributions
uts-0.013
ves-0.013
 Everyone-0.012
Her-0.012
 many-0.012
Token a
Token ID257
Feature activation-7.87e-03

Pos logit contributions
uts+0.048
Her+0.048
ves+0.046
His+0.043
Every+0.042
Neg logit contributions
oured-0.084
 least-0.078
ishing-0.078
 extingu-0.077
rying-0.077
Token spl
Token ID4328
Feature activation-4.44e-02

Pos logit contributions
oured+0.016
ited+0.016
rying+0.015
ishing+0.014
 least+0.014
Neg logit contributions
ves-0.012
 many-0.011
 Everyone-0.011
 much-0.011
Her-0.011
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
oured+0.072
ited+0.069
rying+0.063
 least+0.062
ishing+0.060
Neg logit contributions
ves-0.087
 Everyone-0.084
 many-0.082
 much-0.080
uts-0.080
Token for
Token ID329
Feature activation-3.02e-03

Pos logit contributions
oured+0.276
rying+0.256
 extingu+0.246
ishing+0.243
ited+0.240
Neg logit contributions
ves-0.284
Her-0.283
 Everyone-0.278
 many-0.271
 herself-0.264
Token the
Token ID262
Feature activation-5.07e-04

Pos logit contributions
oured+0.007
 least+0.007
rying+0.006
ishing+0.006
 extingu+0.006
Neg logit contributions
ves-0.003
uts-0.003
Her-0.003
 Everyone-0.003
Every-0.003
Token bird
Token ID6512
Feature activation-1.91e-02

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
ishing+0.001
 least+0.001
Neg logit contributions
ves-0.001
 Everyone-0.001
 many-0.001
Her-0.001
uts-0.001
Token's
Token ID338
Feature activation-2.09e-02

Pos logit contributions
oured+0.043
rying+0.041
ited+0.041
ishing+0.039
 least+0.039
Neg logit contributions
 Everyone-0.024
ves-0.023
 many-0.022
Her-0.022
 herself-0.021
Token wing
Token ID8539
Feature activation-1.52e-02

Pos logit contributions
oured+0.038
rying+0.035
ishing+0.035
ited+0.034
 least+0.034
Neg logit contributions
Her-0.034
ves-0.034
 Everyone-0.034
 many-0.032
uts-0.032
Token for
Token ID329
Feature activation-7.89e-03

Pos logit contributions
oured+0.071
ited+0.066
rying+0.066
ishing+0.062
 least+0.057
Neg logit contributions
ves-0.100
uts-0.094
Her-0.094
 Everyone-0.093
 herself-0.088
Token something
Token ID1223
Feature activation-8.53e-03

Pos logit contributions
oured+0.017
ited+0.016
rying+0.016
ishing+0.015
 least+0.015
Neg logit contributions
uts-0.011
ves-0.011
Her-0.010
 Everyone-0.010
 People-0.008
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
rying+0.017
oured+0.017
ited+0.016
 least+0.016
ishing+0.016
Neg logit contributions
 Everyone-0.012
ves-0.012
Her-0.011
 People-0.011
uts-0.010
Token He
Token ID679
Feature activation+2.26e-03

Pos logit contributions
oured+0.026
ksh+0.022
 least+0.021
rying+0.021
ited+0.021
Neg logit contributions
ves-0.018
Her-0.014
uts-0.014
ggy-0.013
l-0.013
Token squ
Token ID2809
Feature activation-3.03e-03

Pos logit contributions
ves+0.005
 Everyone+0.005
 many+0.005
uts+0.005
Her+0.005
Neg logit contributions
oured-0.003
ited-0.002
rying-0.002
ishing-0.002
 least-0.002
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
oured+0.004
 least+0.003
 extingu+0.003
rying+0.003
arse+0.003
Neg logit contributions
 Everyone-0.007
 many-0.006
Her-0.006
 herself-0.006
 much-0.006
Tokened
Token ID276
Feature activation-1.83e-02

Pos logit contributions
oured+0.245
ishing+0.222
ited+0.219
rying+0.217
cy+0.201
Neg logit contributions
 Everyone-0.327
ves-0.325
uts-0.315
 much-0.310
Her-0.309
Token his
Token ID465
Feature activation-1.35e-02

Pos logit contributions
oured+0.039
ited+0.035
rying+0.035
ishing+0.035
cy+0.032
Neg logit contributions
ves-0.029
uts-0.027
 Everyone-0.026
Her-0.025
 many-0.024
Token eyes
Token ID2951
Feature activation+1.90e-02

Pos logit contributions
oured+0.024
ited+0.023
rying+0.022
ishing+0.021
 least+0.020
Neg logit contributions
ves-0.024
 Everyone-0.023
 many-0.023
Her-0.023
uts-0.022
Token and
Token ID290
Feature activation-1.31e-02

Pos logit contributions
ves+0.025
 Everyone+0.025
uts+0.023
 many+0.023
Her+0.022
Neg logit contributions
oured-0.043
ited-0.041
rying-0.038
ishing-0.037
 least-0.035
Token saw
Token ID2497
Feature activation-1.75e-02

Pos logit contributions
oured+0.023
ited+0.022
rying+0.020
ishing+0.020
 least+0.019
Neg logit contributions
ves-0.024
 Everyone-0.023
uts-0.022
 many-0.022
 much-0.022
Token day
Token ID1110
Feature activation-1.06e-02

Pos logit contributions
ves+0.006
 many+0.006
 Everyone+0.006
 much+0.005
Her+0.005
Neg logit contributions
oured-0.016
ited-0.015
rying-0.015
ishing-0.015
 least-0.014
Token he
Token ID339
Feature activation+7.96e-03

Pos logit contributions
oured+0.028
ited+0.027
rying+0.026
ishing+0.026
 least+0.025
Neg logit contributions
ves-0.010
 Everyone-0.009
 many-0.009
Her-0.008
 herself-0.008
Token noticed
Token ID6810
Feature activation+4.32e-02

Pos logit contributions
ves+0.013
 many+0.013
 Everyone+0.013
Her+0.012
 much+0.012
Neg logit contributions
oured-0.015
ited-0.014
rying-0.014
ishing-0.013
ksh-0.013
Token something
Token ID1223
Feature activation-6.45e-03

Pos logit contributions
ves+0.055
uts+0.053
Her+0.050
 Everyone+0.049
 People+0.044
Neg logit contributions
oured-0.098
 least-0.092
ited-0.089
ishing-0.088
rying-0.087
Token gl
Token ID1278
Feature activation-1.05e-02

Pos logit contributions
oured+0.013
ited+0.013
rying+0.013
ishing+0.012
 least+0.012
Neg logit contributions
 Everyone-0.009
ves-0.008
uts-0.008
Her-0.008
 herself-0.008
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
ited+0.015
oured+0.014
 least+0.014
ksh+0.013
cy+0.013
Neg logit contributions
ves-0.022
Her-0.020
 himself-0.020
uts-0.020
 herself-0.020
Tokening
Token ID278
Feature activation+5.30e-02

Pos logit contributions
oured+0.310
rying+0.293
ited+0.290
ishing+0.280
int+0.259
Neg logit contributions
ves-0.259
 Everyone-0.258
Her-0.253
 many-0.245
uts-0.243
Token in
Token ID287
Feature activation+4.25e-03

Pos logit contributions
 Everyone+0.085
ves+0.084
uts+0.083
Her+0.078
 People+0.077
Neg logit contributions
ited-0.102
oured-0.101
rying-0.099
ishing-0.096
 least-0.090
Token the
Token ID262
Feature activation-3.75e-03

Pos logit contributions
ves+0.006
uts+0.005
 Everyone+0.005
ggies+0.005
Her+0.004
Neg logit contributions
ited-0.010
oured-0.010
 least-0.010
 extingu-0.009
rying-0.009
Token sun
Token ID4252
Feature activation-5.05e-02

Pos logit contributions
oured+0.008
ited+0.008
rying+0.007
ishing+0.007
 least+0.007
Neg logit contributions
ves-0.005
 Everyone-0.005
 many-0.005
uts-0.005
Her-0.005
Token.
Token ID13
Feature activation-8.75e-03

Pos logit contributions
oured+0.135
ited+0.130
rying+0.125
ishing+0.122
 least+0.121
Neg logit contributions
ves-0.046
 many-0.043
 Everyone-0.043
Her-0.040
 herself-0.039
Token pile
Token ID14540
Feature activation+2.03e-03

Pos logit contributions
 many+0.008
ves+0.008
 much+0.008
 Everyone+0.008
 herself+0.008
Neg logit contributions
oured-0.011
ited-0.010
rying-0.010
ksh-0.009
ishing-0.009
Token of
Token ID286
Feature activation+3.61e-02

Pos logit contributions
ves+0.002
 Everyone+0.002
Her+0.002
 many+0.001
uts+0.001
Neg logit contributions
oured-0.005
rying-0.005
ited-0.005
ishing-0.005
 least-0.005
Token dust
Token ID8977
Feature activation-2.32e-02

Pos logit contributions
ves+0.064
uts+0.063
 Everyone+0.059
Her+0.058
ggies+0.054
Neg logit contributions
oured-0.065
ited-0.062
ishing-0.060
rying-0.059
 least-0.056
Token and
Token ID290
Feature activation-1.07e-02

Pos logit contributions
oured+0.051
rying+0.048
ited+0.048
ishing+0.047
 least+0.045
Neg logit contributions
 Everyone-0.031
Her-0.031
ves-0.031
 many-0.029
 much-0.028
Token l
Token ID300
Feature activation+5.16e-03

Pos logit contributions
oured+0.019
ited+0.017
rying+0.016
 least+0.016
ishing+0.016
Neg logit contributions
ves-0.019
uts-0.019
 Everyone-0.018
Her-0.018
ggies-0.017
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
ves+0.006
 Everyone+0.006
 many+0.006
Her+0.006
 herself+0.006
Neg logit contributions
oured-0.012
ited-0.012
rying-0.011
ishing-0.011
 least-0.011
Token in
Token ID287
Feature activation+4.42e-03

Pos logit contributions
oured+0.267
rying+0.238
ishing+0.236
ited+0.229
cy+0.206
Neg logit contributions
Her-0.306
 Everyone-0.304
ves-0.301
 many-0.285
uts-0.285
Token the
Token ID262
Feature activation-2.95e-03

Pos logit contributions
ves+0.005
uts+0.005
 Everyone+0.005
Her+0.004
ggies+0.004
Neg logit contributions
oured-0.010
ited-0.010
ishing-0.010
 extingu-0.010
rying-0.010
Token corner
Token ID5228
Feature activation+5.36e-03

Pos logit contributions
oured+0.005
ited+0.005
rying+0.005
ishing+0.005
 least+0.004
Neg logit contributions
ves-0.005
 Everyone-0.005
uts-0.005
 many-0.005
Her-0.005
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
ves+0.007
Her+0.007
 Everyone+0.007
uts+0.006
ggies+0.006
Neg logit contributions
oured-0.011
ited-0.011
rying-0.011
ishing-0.010
 extingu-0.010
Token He
Token ID679
Feature activation+2.97e-03

Pos logit contributions
oured+0.024
ksh+0.020
icol+0.019
 least+0.019
ited+0.019
Neg logit contributions
ves-0.015
Her-0.012
ggy-0.012
ggies-0.011
 herself-0.010
Token the
Token ID262
Feature activation-2.93e-03

Pos logit contributions
ves+0.006
uts+0.006
ggies+0.005
 Everyone+0.005
 People+0.005
Neg logit contributions
 least-0.007
oured-0.007
 extingu-0.007
ited-0.007
ishing-0.007
Token paint
Token ID7521
Feature activation-2.18e-02

Pos logit contributions
oured+0.005
ited+0.005
rying+0.005
 least+0.005
ishing+0.005
Neg logit contributions
ves-0.005
uts-0.005
 Everyone-0.005
 himself-0.005
 many-0.005
Token.
Token ID13
Feature activation-8.87e-03

Pos logit contributions
oured+0.048
ited+0.043
ishing+0.043
 least+0.043
rying+0.042
Neg logit contributions
ves-0.029
ggies-0.027
 Everyone-0.027
Her-0.026
uts-0.026
Token She
Token ID1375
Feature activation-6.42e-04

Pos logit contributions
oured+0.022
ksh+0.019
icol+0.018
ited+0.018
 least+0.018
Neg logit contributions
ves-0.013
ggy-0.010
ggies-0.010
l-0.009
uts-0.009
Token squ
Token ID2809
Feature activation-2.91e-03

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
ishing+0.001
 least+0.001
Neg logit contributions
ves-0.001
 many-0.001
 Everyone-0.001
Her-0.001
 herself-0.001
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
 least+0.003
rying+0.003
arse+0.003
oured+0.003
 extingu+0.003
Neg logit contributions
 Everyone-0.007
 many-0.006
 much-0.006
 herself-0.006
Her-0.006
Tokened
Token ID276
Feature activation-1.73e-02

Pos logit contributions
oured+0.256
ishing+0.226
ited+0.225
rying+0.212
cy+0.203
Neg logit contributions
 Everyone-0.320
ves-0.319
ggies-0.308
uts-0.306
 People-0.297
Token at
Token ID379
Feature activation-1.70e-02

Pos logit contributions
oured+0.040
ited+0.036
ishing+0.036
ksh+0.034
cy+0.034
Neg logit contributions
ves-0.025
uts-0.023
ggies-0.022
 Everyone-0.020
ggy-0.020
Token the
Token ID262
Feature activation-3.59e-03

Pos logit contributions
oured+0.036
 least+0.034
ited+0.034
ishing+0.032
cy+0.031
Neg logit contributions
ves-0.026
uts-0.025
ggies-0.021
ggy-0.021
Her-0.021
Token canvas
Token ID21978
Feature activation-1.44e-02

Pos logit contributions
oured+0.006
ited+0.006
rying+0.005
ishing+0.005
 least+0.005
Neg logit contributions
ves-0.007
 Everyone-0.006
uts-0.006
 many-0.006
Her-0.006
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
oured+0.032
ited+0.029
rying+0.029
ishing+0.028
 least+0.028
Neg logit contributions
ves-0.021
 Everyone-0.021
 People-0.018
ggies-0.018
Her-0.018
Token smell
Token ID8508
Feature activation-3.12e-02

Pos logit contributions
ves+0.002
Her+0.002
 Everyone+0.002
 many+0.002
uts+0.002
Neg logit contributions
oured-0.002
ited-0.002
rying-0.002
 least-0.002
ishing-0.002
Token?"
Token ID1701
Feature activation-1.72e-02

Pos logit contributions
ited+0.060
oured+0.057
 least+0.056
rying+0.056
ishing+0.053
Neg logit contributions
Her-0.046
ves-0.045
uts-0.044
 herself-0.043
 Everyone-0.043
Token The
Token ID383
Feature activation-1.58e-02

Pos logit contributions
oured+0.040
ited+0.038
 least+0.036
rying+0.034
cy+0.034
Neg logit contributions
ves-0.023
uts-0.021
 many-0.019
ggies-0.018
 much-0.018
Token squirrel
Token ID33039
Feature activation-2.40e-02

Pos logit contributions
ited+0.027
oured+0.026
 least+0.025
rying+0.024
ish+0.023
Neg logit contributions
ves-0.029
uts-0.026
 Everyone-0.025
ggies-0.024
Her-0.024
Token squ
Token ID2809
Feature activation-9.53e-04

Pos logit contributions
oured+0.043
ited+0.042
rying+0.039
ishing+0.038
 least+0.037
Neg logit contributions
ves-0.042
 Everyone-0.041
 many-0.040
Her-0.039
 herself-0.039
Tokenint
Token ID600
Feature activation-1.61e-01

Pos logit contributions
 least+0.001
rying+0.001
oured+0.001
 extingu+0.001
arse+0.001
Neg logit contributions
Her-0.002
 Everyone-0.002
 many-0.002
 herself-0.002
 She-0.002
Tokened
Token ID276
Feature activation-1.72e-02

Pos logit contributions
oured+0.264
ited+0.243
ishing+0.229
rying+0.229
cy+0.213
Neg logit contributions
 Everyone-0.314
ves-0.313
 many-0.300
 much-0.295
uts-0.293
Token,
Token ID11
Feature activation+7.20e-03

Pos logit contributions
oured+0.042
ited+0.040
rying+0.039
ishing+0.037
 least+0.037
Neg logit contributions
ves-0.020
 many-0.019
 Everyone-0.019
Her-0.018
 herself-0.018
Token "
Token ID366
Feature activation-1.72e-02

Pos logit contributions
ves+0.007
 many+0.007
Her+0.007
 herself+0.007
 Everyone+0.007
Neg logit contributions
oured-0.019
ited-0.018
rying-0.017
ishing-0.017
 least-0.017
TokenOh
Token ID5812
Feature activation-1.83e-02

Pos logit contributions
ited+0.036
oured+0.033
ksh+0.032
cy+0.032
 least+0.031
Neg logit contributions
ves-0.026
 Everyone-0.025
 many-0.024
ggies-0.023
 herself-0.022
Token,
Token ID11
Feature activation+7.11e-03

Pos logit contributions
oured+0.046
ited+0.044
rying+0.043
ishing+0.042
 least+0.041
Neg logit contributions
ves-0.019
 Everyone-0.018
 many-0.018
Her-0.017
 herself-0.017
Token and
Token ID290
Feature activation-8.52e-03

Pos logit contributions
oured+0.061
ited+0.059
rying+0.056
ishing+0.055
 least+0.054
Neg logit contributions
ves-0.042
 Everyone-0.040
 many-0.040
Her-0.039
 herself-0.038
Token saw
Token ID2497
Feature activation-1.44e-02

Pos logit contributions
oured+0.020
ited+0.019
rying+0.018
ishing+0.018
 least+0.018
Neg logit contributions
ves-0.011
 Everyone-0.010
Her-0.010
uts-0.010
 many-0.010
Token the
Token ID262
Feature activation-1.04e-03

Pos logit contributions
oured+0.029
 least+0.028
ishing+0.025
ited+0.025
rying+0.024
Neg logit contributions
ves-0.021
uts-0.021
Her-0.020
ggies-0.017
 Everyone-0.015
Token sun
Token ID4252
Feature activation-4.92e-02

Pos logit contributions
oured+0.002
ited+0.002
rying+0.002
 least+0.002
ishing+0.002
Neg logit contributions
ves-0.002
Her-0.002
 Everyone-0.001
 many-0.001
uts-0.001
Token gl
Token ID1278
Feature activation-9.52e-03

Pos logit contributions
oured+0.121
ited+0.117
rying+0.112
ishing+0.109
 least+0.108
Neg logit contributions
ves-0.055
 Everyone-0.052
 many-0.051
Her-0.049
 herself-0.049
Tokenint
Token ID600
Feature activation-1.60e-01

Pos logit contributions
 least+0.014
ited+0.014
oured+0.013
 extingu+0.011
cy+0.011
Neg logit contributions
ves-0.020
uts-0.018
Her-0.018
 himself-0.017
 Everyone-0.017
Tokening
Token ID278
Feature activation+5.36e-02

Pos logit contributions
oured+0.311
ited+0.286
rying+0.282
ishing+0.273
ksh+0.262
Neg logit contributions
ves-0.266
 Everyone-0.257
Her-0.255
 many-0.250
 herself-0.242
Token off
Token ID572
Feature activation+6.05e-02

Pos logit contributions
ves+0.074
Her+0.073
uts+0.069
 Everyone+0.068
 many+0.064
Neg logit contributions
oured-0.116
ited-0.113
rying-0.109
ishing-0.108
ksh-0.104
Token the
Token ID262
Feature activation-1.51e-03

Pos logit contributions
ves+0.084
Her+0.079
uts+0.077
ggies+0.070
 herself+0.070
Neg logit contributions
oured-0.129
ited-0.124
ishing-0.118
rying-0.118
ksh-0.113
Token water
Token ID1660
Feature activation-2.93e-02

Pos logit contributions
oured+0.003
ited+0.003
rying+0.003
ishing+0.003
 least+0.002
Neg logit contributions
ves-0.002
 Everyone-0.002
 many-0.002
 much-0.002
 herself-0.002
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
oured+0.070
ited+0.066
rying+0.066
ishing+0.063
 least+0.063
Neg logit contributions
ves-0.034
Her-0.033
 Everyone-0.031
 many-0.030
 herself-0.029
Token forth
Token ID6071
Feature activation-5.15e-02

Pos logit contributions
oured+0.007
ited+0.006
rying+0.005
ishing+0.005
 least+0.005
Neg logit contributions
ves-0.022
 Everyone-0.022
 many-0.022
Her-0.021
uts-0.021
Token,
Token ID11
Feature activation+9.09e-03

Pos logit contributions
oured+0.106
rying+0.101
ishing+0.098
ited+0.095
 least+0.090
Neg logit contributions
 Everyone-0.071
ves-0.068
 People-0.066
uts-0.062
ggies-0.061
Token the
Token ID262
Feature activation+3.75e-04

Pos logit contributions
ves+0.014
 Everyone+0.013
 many+0.013
Her+0.012
uts+0.012
Neg logit contributions
oured-0.019
ited-0.018
rying-0.017
ishing-0.017
 least-0.017
Token sunlight
Token ID19606
Feature activation-4.03e-02

Pos logit contributions
ves+0.001
uts+0.001
Her+0.001
 Everyone+0.001
 himself+0.001
Neg logit contributions
oured-0.001
ited-0.001
rying-0.001
 least-0.001
ksh-0.001
Token gl
Token ID1278
Feature activation-1.02e-02

Pos logit contributions
oured+0.079
ited+0.076
rying+0.072
ishing+0.070
 least+0.069
Neg logit contributions
ves-0.063
 Everyone-0.062
 many-0.061
Her-0.059
uts-0.058
Tokenint
Token ID600
Feature activation-1.60e-01

Pos logit contributions
ited+0.014
 least+0.014
oured+0.013
cy+0.011
 extingu+0.011
Neg logit contributions
ves-0.022
Her-0.021
uts-0.021
 herself-0.020
 Everyone-0.020
Tokening
Token ID278
Feature activation+5.44e-02

Pos logit contributions
oured+0.293
rying+0.266
ited+0.261
ishing+0.258
ksh+0.243
Neg logit contributions
ves-0.277
 Everyone-0.268
Her-0.260
 many-0.254
ggies-0.253
Token off
Token ID572
Feature activation+6.06e-02

Pos logit contributions
ves+0.089
uts+0.085
 Everyone+0.084
Her+0.082
 People+0.078
Neg logit contributions
oured-0.102
ited-0.099
ishing-0.098
rying-0.096
ksh-0.090
Token her
Token ID607
Feature activation+6.55e-03

Pos logit contributions
ves+0.093
ggies+0.084
uts+0.084
ggy+0.081
 himself+0.077
Neg logit contributions
oured-0.119
ited-0.113
ishing-0.109
rying-0.108
cy-0.104
Token long
Token ID890
Feature activation+3.05e-02

Pos logit contributions
ves+0.011
 Everyone+0.010
 many+0.010
Her+0.010
uts+0.010
Neg logit contributions
oured-0.012
ited-0.011
rying-0.011
ishing-0.011
 least-0.011
Token,
Token ID11
Feature activation+8.55e-03

Pos logit contributions
ves+0.049
uts+0.048
 Everyone+0.048
 many+0.047
Her+0.046
Neg logit contributions
oured-0.056
ited-0.056
ishing-0.050
rying-0.048
ksh-0.047
Token They
Token ID1119
Feature activation-9.17e-03

Pos logit contributions
oured+0.013
ksh+0.011
 least+0.010
cy+0.010
 extingu+0.010
Neg logit contributions
ves-0.008
Her-0.006
ggy-0.006
uts-0.006
 herself-0.006
Token all
Token ID477
Feature activation-5.88e-02

Pos logit contributions
oured+0.020
rying+0.019
ishing+0.019
ksh+0.019
ited+0.019
Neg logit contributions
 many-0.014
ves-0.013
 much-0.012
 herself-0.012
 Everyone-0.012
Token had
Token ID550
Feature activation+2.17e-03

Pos logit contributions
oured+0.124
rying+0.115
ksh+0.115
ishing+0.113
 extingu+0.111
Neg logit contributions
 many-0.089
 much-0.087
 Everyone-0.082
 it-0.080
ves-0.079
Token to
Token ID284
Feature activation-7.99e-03

Pos logit contributions
 many+0.002
ves+0.002
 Everyone+0.002
 much+0.002
 herself+0.002
Neg logit contributions
oured-0.006
ited-0.006
rying-0.006
ishing-0.005
 least-0.005
Token squ
Token ID2809
Feature activation+2.41e-05

Pos logit contributions
oured+0.015
ited+0.014
rying+0.013
ishing+0.013
ksh+0.013
Neg logit contributions
ves-0.014
 herself-0.014
 many-0.014
 Everyone-0.013
 himself-0.013
Tokenint
Token ID600
Feature activation-1.60e-01

Pos logit contributions
 many+0.000
 much+0.000
 Everyone+0.000
 herself+0.000
ves+0.000
Neg logit contributions
oured-0.000
rying-0.000
 least-0.000
 extingu-0.000
ited-0.000
Token their
Token ID511
Feature activation-3.68e-02

Pos logit contributions
oured+0.314
ishing+0.286
rying+0.285
ited+0.281
cy+0.268
Neg logit contributions
ves-0.253
Her-0.245
 Everyone-0.243
uts-0.232
 herself-0.224
Token eyes
Token ID2951
Feature activation+2.36e-02

Pos logit contributions
oured+0.077
ited+0.075
rying+0.070
ishing+0.066
 least+0.066
Neg logit contributions
ves-0.056
 many-0.056
 Everyone-0.054
 herself-0.052
 much-0.051
Token.
Token ID13
Feature activation-5.69e-03

Pos logit contributions
ves+0.029
 Everyone+0.027
 many+0.025
Her+0.025
 herself+0.025
Neg logit contributions
oured-0.057
ited-0.054
rying-0.051
ishing-0.050
 least-0.049
Token
Token ID220
Feature activation+6.11e-03

Pos logit contributions
oured+0.014
ksh+0.013
 least+0.012
cy+0.012
ited+0.012
Neg logit contributions
ves-0.007
Her-0.006
uts-0.006
 herself-0.006
ggy-0.005
Token\n
Token ID198
Feature activation+1.34e-02

Pos logit contributions
ves+0.006
Her+0.005
 lesson+0.005
 people+0.005
 herself+0.005
Neg logit contributions
oured-0.014
 least-0.013
rying-0.013
ksh-0.013
cy-0.013
Token used
Token ID973
Feature activation-1.73e-02

Pos logit contributions
oured+0.082
ited+0.078
rying+0.075
ishing+0.073
 least+0.072
Neg logit contributions
ves-0.083
 Everyone-0.081
Her-0.077
uts-0.076
 many-0.076
Token to
Token ID284
Feature activation-7.64e-03

Pos logit contributions
oured+0.035
rying+0.033
ishing+0.033
ited+0.033
 least+0.031
Neg logit contributions
ves-0.029
 Everyone-0.028
Her-0.025
 People-0.024
Everyone-0.024
Token make
Token ID787
Feature activation+4.01e-02

Pos logit contributions
oured+0.012
ited+0.011
 least+0.011
rying+0.011
ishing+0.010
Neg logit contributions
ves-0.015
 Everyone-0.014
uts-0.014
Her-0.013
 many-0.013
Token a
Token ID257
Feature activation-8.40e-03

Pos logit contributions
ves+0.051
uts+0.049
Her+0.046
His+0.044
Every+0.044
Neg logit contributions
oured-0.082
ishing-0.079
 least-0.078
rying-0.076
ited-0.072
Token spl
Token ID4328
Feature activation-4.37e-02

Pos logit contributions
oured+0.018
ited+0.017
rying+0.017
ishing+0.016
 least+0.016
Neg logit contributions
ves-0.012
 many-0.012
 Everyone-0.011
Her-0.011
 much-0.011
Tokenint
Token ID600
Feature activation-1.60e-01

Pos logit contributions
oured+0.075
ited+0.071
rying+0.067
ishing+0.064
 least+0.064
Neg logit contributions
ves-0.081
 Everyone-0.079
 many-0.078
Her-0.076
 much-0.075
Token for
Token ID329
Feature activation-2.54e-03

Pos logit contributions
oured+0.308
rying+0.296
ish+0.281
ited+0.280
ishing+0.279
Neg logit contributions
ves-0.250
 Everyone-0.242
Her-0.231
ggies-0.229
 many-0.221
Token the
Token ID262
Feature activation-1.67e-03

Pos logit contributions
oured+0.005
 least+0.005
rying+0.005
ishing+0.005
ited+0.005
Neg logit contributions
uts-0.003
ves-0.003
 Everyone-0.003
Her-0.003
Every-0.003
Token bird
Token ID6512
Feature activation-2.03e-02

Pos logit contributions
oured+0.003
ited+0.003
rying+0.003
 least+0.003
ishing+0.003
Neg logit contributions
ves-0.003
 Everyone-0.003
uts-0.003
Her-0.003
 many-0.003
Token's
Token ID338
Feature activation-2.19e-02

Pos logit contributions
oured+0.047
ited+0.045
rying+0.045
 least+0.043
ishing+0.043
Neg logit contributions
ves-0.025
 Everyone-0.025
Her-0.022
 many-0.022
 People-0.021
Token wing
Token ID8539
Feature activation-1.53e-02

Pos logit contributions
oured+0.039
ited+0.038
ishing+0.037
rying+0.037
 least+0.037
Neg logit contributions
 Everyone-0.036
ves-0.035
Her-0.033
 People-0.032
uts-0.031
Token was
Token ID373
Feature activation-7.09e-03

Pos logit contributions
oured+0.024
cy+0.024
ited+0.023
 least+0.022
ishing+0.021
Neg logit contributions
uts-0.023
ves-0.023
Her-0.022
 Everyone-0.021
 many-0.020
Token a
Token ID257
Feature activation-1.17e-02

Pos logit contributions
oured+0.015
ited+0.015
 least+0.014
cy+0.014
ksh+0.014
Neg logit contributions
ves-0.010
uts-0.009
 Everyone-0.008
Her-0.007
l-0.007
Token little
Token ID1310
Feature activation-6.58e-03

Pos logit contributions
oured+0.032
ited+0.031
rying+0.030
ishing+0.029
 least+0.028
Neg logit contributions
 many-0.011
 much-0.011
ves-0.010
 Everyone-0.010
 people-0.010
Token girl
Token ID2576
Feature activation-4.64e-03

Pos logit contributions
oured+0.014
ited+0.013
rying+0.013
ishing+0.012
 least+0.012
Neg logit contributions
 many-0.010
 Everyone-0.010
 much-0.009
ves-0.009
 people-0.009
Token called
Token ID1444
Feature activation+5.88e-03

Pos logit contributions
oured+0.009
ited+0.009
rying+0.008
ishing+0.008
 least+0.008
Neg logit contributions
ves-0.007
 Everyone-0.007
 many-0.007
Her-0.007
 herself-0.007
Token T
Token ID309
Feature activation+9.39e-02

Pos logit contributions
ves+0.009
uts+0.007
 himself+0.007
 Everyone+0.007
 many+0.007
Neg logit contributions
oured-0.012
cy-0.011
rying-0.011
ited-0.011
 extingu-0.011
Tokenessa
Token ID21411
Feature activation+1.65e-02

Pos logit contributions
ves+0.164
 Everyone+0.157
 many+0.157
Her+0.153
 herself+0.152
Neg logit contributions
oured-0.172
ited-0.166
rying-0.155
ishing-0.149
 least-0.148
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
ves+0.013
 many+0.012
 Everyone+0.012
Her+0.011
 herself+0.011
Neg logit contributions
oured-0.046
ited-0.045
rying-0.043
ishing-0.042
 least-0.042
Token T
Token ID309
Feature activation+9.36e-02

Pos logit contributions
oured+0.030
ited+0.029
 least+0.028
rying+0.028
cy+0.027
Neg logit contributions
ves-0.011
uts-0.008
ggies-0.008
ggy-0.008
 Everyone-0.007
Tokenessa
Token ID21411
Feature activation+1.65e-02

Pos logit contributions
ves+0.157
 Everyone+0.152
 many+0.151
 herself+0.149
Her+0.148
Neg logit contributions
oured-0.177
ited-0.172
rying-0.161
ishing-0.154
 least-0.154
Token was
Token ID373
Feature activation-7.30e-03

Pos logit contributions
ves+0.029
uts+0.027
 Everyone+0.027
 herself+0.026
Her+0.026
Neg logit contributions
oured-0.029
ited-0.027
cy-0.026
 least-0.025
rying-0.025
Token They
Token ID1119
Feature activation-6.03e-03

Pos logit contributions
ves+0.004
ggy+0.003
l+0.003
 himself+0.003
 herself+0.003
Neg logit contributions
oured-0.004
ksh-0.004
cy-0.004
 extingu-0.004
 least-0.004
Token felt
Token ID2936
Feature activation+2.30e-02

Pos logit contributions
oured+0.013
ited+0.012
rying+0.012
ishing+0.012
ksh+0.012
Neg logit contributions
 many-0.009
 Everyone-0.008
 much-0.008
Her-0.008
ves-0.008
Token sorry
Token ID7926
Feature activation-2.46e-02

Pos logit contributions
ves+0.041
uts+0.036
Her+0.034
 Everyone+0.033
ggies+0.033
Neg logit contributions
oured-0.041
 least-0.040
ited-0.038
ishing-0.035
rying-0.034
Token.
Token ID13
Feature activation+2.31e-03

Pos logit contributions
 least+0.044
rying+0.044
oured+0.043
ksh+0.041
ishing+0.041
Neg logit contributions
ves-0.039
Her-0.035
 Everyone-0.035
uts-0.034
ggies-0.033
Token They
Token ID1119
Feature activation-6.28e-03

Pos logit contributions
ves+0.004
hell+0.003
fs+0.003
ggy+0.003
aches+0.003
Neg logit contributions
cy-0.004
ksh-0.004
 extingu-0.004
oured-0.004
 least-0.004
Token realized
Token ID6939
Feature activation+8.14e-02

Pos logit contributions
oured+0.013
rying+0.013
ishing+0.013
ited+0.013
ksh+0.012
Neg logit contributions
 many-0.010
 much-0.009
 Everyone-0.009
Her-0.009
 herself-0.009
Token they
Token ID484
Feature activation-1.47e-02

Pos logit contributions
ves+0.106
 Everyone+0.103
Her+0.102
uts+0.101
 People+0.098
Neg logit contributions
 least-0.159
oured-0.149
rying-0.146
 extingu-0.145
ishing-0.133
Token were
Token ID547
Feature activation-7.81e-03

Pos logit contributions
oured+0.037
ited+0.035
rying+0.034
ishing+0.034
ksh+0.034
Neg logit contributions
 many-0.017
 much-0.017
ves-0.016
 Everyone-0.016
 herself-0.015
Token wrong
Token ID2642
Feature activation+2.65e-02

Pos logit contributions
oured+0.020
ited+0.020
rying+0.018
ksh+0.018
ishing+0.018
Neg logit contributions
 many-0.008
 herself-0.008
ves-0.008
 much-0.008
 Everyone-0.007
Token.
Token ID13
Feature activation+1.99e-03

Pos logit contributions
 Everyone+0.040
ves+0.037
 People+0.037
Her+0.036
Everyone+0.033
Neg logit contributions
rying-0.051
ishing-0.049
oured-0.047
 least-0.047
ited-0.046
Token\n
Token ID198
Feature activation+1.42e-02

Pos logit contributions
ves+0.003
ggy+0.002
 herself+0.002
ggies+0.002
 himself+0.002
Neg logit contributions
cy-0.004
ksh-0.004
 extingu-0.004
 least-0.004
oured-0.004
Token\n
Token ID198
Feature activation+9.32e-03

Pos logit contributions
oured+0.033
 least+0.029
ited+0.029
ksh+0.028
cy+0.028
Neg logit contributions
ves-0.013
ggies-0.010
ggy-0.010
 much-0.010
uts-0.009
Token\n
Token ID198
Feature activation+9.14e-03

Pos logit contributions
 herself+0.009
ves+0.009
 much+0.008
 himself+0.008
uts+0.008
Neg logit contributions
oured-0.021
 least-0.019
 extingu-0.017
arse-0.017
ited-0.017
TokenMrs
Token ID27034
Feature activation+3.78e-02

Pos logit contributions
 Everyone+0.010
 herself+0.010
ves+0.010
 much+0.009
 himself+0.009
Neg logit contributions
oured-0.020
ited-0.020
 least-0.019
arse-0.018
rying-0.018
Token.
Token ID13
Feature activation-8.81e-03

Pos logit contributions
ves+0.035
 Everyone+0.032
 many+0.031
Her+0.029
 much+0.029
Neg logit contributions
oured-0.100
ited-0.095
rying-0.093
ishing-0.091
 least-0.090
Token P
Token ID350
Feature activation+3.22e-02

Pos logit contributions
oured+0.017
ited+0.016
rying+0.015
ishing+0.014
 least+0.014
Neg logit contributions
ves-0.015
 Everyone-0.014
 many-0.014
Her-0.014
 herself-0.014
Tokenarson
Token ID12613
Feature activation+1.02e-01

Pos logit contributions
ves+0.061
uts+0.059
 Everyone+0.059
 herself+0.057
Her+0.057
Neg logit contributions
ited-0.051
oured-0.051
rying-0.047
 least-0.044
ksh-0.043
Token smiled
Token ID13541
Feature activation-1.36e-02

Pos logit contributions
ves+0.195
 Everyone+0.188
 many+0.188
Her+0.183
 herself+0.182
Neg logit contributions
oured-0.169
ited-0.161
rying-0.149
ishing-0.144
 least-0.142
Token and
Token ID290
Feature activation-1.21e-02

Pos logit contributions
oured+0.033
ited+0.032
rying+0.031
ishing+0.030
 least+0.030
Neg logit contributions
ves-0.016
 Everyone-0.014
 many-0.014
Her-0.014
uts-0.014
Token said
Token ID531
Feature activation-1.94e-02

Pos logit contributions
oured+0.024
ited+0.024
rying+0.022
ishing+0.022
 least+0.021
Neg logit contributions
ves-0.019
 many-0.018
 Everyone-0.018
 herself-0.017
Her-0.017
Token,
Token ID11
Feature activation+5.18e-03

Pos logit contributions
oured+0.035
rying+0.027
ishing+0.026
ish+0.026
ited+0.025
Neg logit contributions
ves-0.031
uts-0.030
ggies-0.028
Her-0.027
Everyone-0.027
Token "
Token ID366
Feature activation-2.04e-02

Pos logit contributions
ves+0.007
ggies+0.006
 many+0.006
 Everyone+0.006
uts+0.006
Neg logit contributions
oured-0.012
rying-0.010
ksh-0.010
ited-0.010
 extingu-0.009
TokenThat
Token ID2504
Feature activation+2.96e-02

Pos logit contributions
ited+0.032
ksh+0.030
oured+0.029
cy+0.028
 extingu+0.026
Neg logit contributions
ves-0.029
 many-0.028
 Everyone-0.027
 himself-0.027
ggies-0.026
Token is
Token ID318
Feature activation-3.69e-02

Pos logit contributions
ves+0.048
 Everyone+0.046
 many+0.046
Her+0.045
 herself+0.044
Neg logit contributions
oured-0.058
ited-0.055
rying-0.052
ishing-0.050
 least-0.050
Token something
Token ID1223
Feature activation-8.93e-03

Pos logit contributions
oured+0.089
 least+0.085
rying+0.084
ited+0.083
ksh+0.080
Neg logit contributions
ves-0.041
 Everyone-0.037
uts-0.037
Her-0.037
 many-0.033
Token we
Token ID356
Feature activation-3.79e-02

Pos logit contributions
oured+0.017
 least+0.016
rying+0.016
ited+0.016
 extingu+0.015
Neg logit contributions
ves-0.013
Her-0.012
 Everyone-0.012
 herself-0.011
uts-0.011
Token won
Token ID1839
Feature activation+1.63e-02

Pos logit contributions
oured+0.080
ited+0.076
ksh+0.074
ishing+0.072
rying+0.071
Neg logit contributions
 many-0.056
 Everyone-0.054
 herself-0.053
 much-0.052
 people-0.051
Token't
Token ID470
Feature activation+7.20e-02

Pos logit contributions
ves+0.016
Her+0.016
 many+0.014
 Everyone+0.014
His+0.014
Neg logit contributions
 least-0.034
rying-0.034
oured-0.034
 extingu-0.032
arse-0.032
Token find
Token ID1064
Feature activation+1.47e-03

Pos logit contributions
ves+0.115
 Everyone+0.110
 many+0.110
Her+0.106
 herself+0.105
Neg logit contributions
oured-0.142
ited-0.137
rying-0.130
ishing-0.125
 least-0.124
Token out
Token ID503
Feature activation-4.86e-02

Pos logit contributions
ves+0.002
Her+0.002
 Everyone+0.002
uts+0.002
His+0.002
Neg logit contributions
oured-0.003
rying-0.003
ishing-0.003
 least-0.003
ited-0.003
Token today
Token ID1909
Feature activation+1.40e-02

Pos logit contributions
oured+0.098
rying+0.098
ited+0.093
 least+0.092
ishing+0.090
Neg logit contributions
Her-0.067
ves-0.066
uts-0.060
 Everyone-0.059
His-0.059
Token."
Token ID526
Feature activation-1.13e-02

Pos logit contributions
Her+0.022
His+0.021
ves+0.021
 Everyone+0.019
 himself+0.018
Neg logit contributions
rying-0.025
oured-0.025
 least-0.025
ited-0.023
cy-0.023
Token\n\n
Token ID628
Feature activation-5.75e-02

Pos logit contributions
oured+0.029
ited+0.026
ksh+0.025
 least+0.025
cy+0.025
Neg logit contributions
ves-0.013
uts-0.011
 many-0.010
 himself-0.010
 much-0.009
Token owl
Token ID39610
Feature activation-3.85e-02

Pos logit contributions
oured+0.030
ited+0.029
rying+0.027
ishing+0.026
ksh+0.026
Neg logit contributions
 many-0.023
ves-0.023
 Everyone-0.023
 herself-0.022
 much-0.022
Token told
Token ID1297
Feature activation+1.02e-02

Pos logit contributions
oured+0.072
ited+0.069
rying+0.065
ishing+0.063
 least+0.062
Neg logit contributions
ves-0.065
 Everyone-0.063
 many-0.062
Her-0.060
 herself-0.060
Token the
Token ID262
Feature activation-1.10e-03

Pos logit contributions
ves+0.016
uts+0.016
Her+0.014
 much+0.012
 Everyone+0.012
Neg logit contributions
oured-0.021
 least-0.020
ited-0.020
ishing-0.018
rying-0.018
Token rabbit
Token ID22746
Feature activation-1.63e-03

Pos logit contributions
oured+0.002
ited+0.002
rying+0.002
 least+0.002
ishing+0.001
Neg logit contributions
ves-0.002
Her-0.002
uts-0.002
 Everyone-0.002
 many-0.002
Token kind
Token ID1611
Feature activation+3.87e-02

Pos logit contributions
oured+0.003
 least+0.003
rying+0.003
ited+0.003
ish+0.003
Neg logit contributions
Her-0.002
ves-0.002
uts-0.002
 Everyone-0.002
Everyone-0.002
Token words
Token ID2456
Feature activation+7.31e-02

Pos logit contributions
ves+0.066
Her+0.060
uts+0.059
ggies+0.054
 much+0.053
Neg logit contributions
oured-0.072
ishing-0.068
rying-0.068
ited-0.068
ish-0.067
Token,
Token ID11
Feature activation+8.82e-03

Pos logit contributions
ves+0.095
 Everyone+0.089
Her+0.085
ggies+0.083
uts+0.079
Neg logit contributions
oured-0.163
ited-0.151
 least-0.147
rying-0.147
ishing-0.141
Token and
Token ID290
Feature activation-9.00e-03

Pos logit contributions
ves+0.012
 Everyone+0.011
uts+0.011
Her+0.011
ggies+0.010
Neg logit contributions
oured-0.019
ited-0.018
rying-0.018
 least-0.018
ishing-0.017
Token soon
Token ID2582
Feature activation+3.04e-03

Pos logit contributions
oured+0.022
ited+0.021
 least+0.021
rying+0.021
ishing+0.020
Neg logit contributions
ves-0.010
Her-0.009
uts-0.009
 Everyone-0.009
 herself-0.008
Token the
Token ID262
Feature activation-1.19e-03

Pos logit contributions
ves+0.003
Her+0.003
 Everyone+0.003
uts+0.003
 herself+0.002
Neg logit contributions
oured-0.008
ited-0.007
 least-0.007
rying-0.007
ishing-0.007
Token rabbit
Token ID22746
Feature activation-1.73e-03

Pos logit contributions
oured+0.002
ited+0.002
rying+0.002
 least+0.002
ishing+0.002
Neg logit contributions
ves-0.002
 Everyone-0.002
uts-0.002
Her-0.002
 many-0.002
Token under
Token ID739
Feature activation+6.66e-03

Pos logit contributions
oured+0.089
ited+0.087
 least+0.082
rying+0.082
ksh+0.082
Neg logit contributions
 many-0.053
 much-0.047
 Everyone-0.046
 herself-0.044
ves-0.044
Token the
Token ID262
Feature activation+4.03e-03

Pos logit contributions
ves+0.006
 Everyone+0.006
uts+0.005
 many+0.005
 much+0.005
Neg logit contributions
oured-0.018
ited-0.017
rying-0.016
ishing-0.016
 least-0.016
Token tree
Token ID5509
Feature activation-4.13e-03

Pos logit contributions
ves+0.005
 Everyone+0.005
 many+0.004
Her+0.004
uts+0.004
Neg logit contributions
oured-0.010
ited-0.009
rying-0.009
ishing-0.009
 least-0.009
Token.
Token ID13
Feature activation+6.01e-03

Pos logit contributions
oured+0.010
ited+0.010
rying+0.010
 least+0.010
ishing+0.010
Neg logit contributions
ves-0.004
 Everyone-0.004
Her-0.003
 many-0.003
uts-0.003
Token She
Token ID1375
Feature activation+7.79e-03

Pos logit contributions
ves+0.007
ggies+0.005
ggy+0.005
 much+0.004
uts+0.004
Neg logit contributions
oured-0.015
cy-0.014
ited-0.014
rying-0.014
ksh-0.013
Token sees
Token ID7224
Feature activation-2.19e-02

Pos logit contributions
 many+0.015
 herself+0.014
 She+0.014
+0.014
 people+0.014
Neg logit contributions
oured-0.013
ited-0.012
arse-0.012
rying-0.012
ksh-0.012
Token the
Token ID262
Feature activation+3.88e-03

Pos logit contributions
oured+0.044
 least+0.041
 extingu+0.039
ited+0.038
cy+0.036
Neg logit contributions
ves-0.034
ggies-0.030
uts-0.030
 Everyone-0.028
gh-0.028
Token apples
Token ID22514
Feature activation-5.58e-04

Pos logit contributions
ves+0.006
 Everyone+0.005
Her+0.005
uts+0.005
 many+0.005
Neg logit contributions
oured-0.008
ited-0.008
rying-0.008
 least-0.007
ishing-0.007
Token.
Token ID13
Feature activation+5.82e-03

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
 least+0.001
ishing+0.001
Neg logit contributions
 Everyone-0.001
ves-0.001
Her-0.001
 People-0.001
 many-0.001
Token They
Token ID1119
Feature activation+6.90e-04

Pos logit contributions
ves+0.009
ggies+0.007
ggy+0.007
gh+0.007
 lesson+0.006
Neg logit contributions
oured-0.012
cy-0.011
ksh-0.011
 extingu-0.011
int-0.010
Token are
Token ID389
Feature activation-4.31e-02

Pos logit contributions
 many+0.001
 herself+0.001
 Everyone+0.001
ves+0.001
 much+0.001
Neg logit contributions
oured-0.001
ited-0.001
ishing-0.001
rying-0.001
ksh-0.001
Token man
Token ID582
Feature activation+7.78e-03

Pos logit contributions
oured+0.025
ited+0.024
rying+0.022
 least+0.022
ishing+0.021
Neg logit contributions
ves-0.024
Her-0.023
uts-0.023
 Everyone-0.022
 herself-0.022
Token on
Token ID319
Feature activation+5.10e-02

Pos logit contributions
ves+0.013
 many+0.013
 Everyone+0.013
 much+0.012
Her+0.012
Neg logit contributions
oured-0.015
ited-0.014
rying-0.013
ishing-0.013
 least-0.013
Token it
Token ID340
Feature activation+3.41e-02

Pos logit contributions
uts+0.076
Her+0.072
ves+0.068
 herself+0.062
ily+0.062
Neg logit contributions
oured-0.099
ited-0.096
 least-0.089
ishing-0.087
ishment-0.085
Token was
Token ID373
Feature activation-2.12e-03

Pos logit contributions
ves+0.057
 Everyone+0.056
Her+0.055
 many+0.055
uts+0.054
Neg logit contributions
oured-0.063
ited-0.061
rying-0.056
 least-0.054
ishing-0.054
Token being
Token ID852
Feature activation-2.70e-02

Pos logit contributions
oured+0.004
 least+0.004
ited+0.004
rying+0.004
ishing+0.003
Neg logit contributions
ves-0.004
Her-0.003
 Everyone-0.003
uts-0.003
 People-0.003
Token ignorant
Token ID20109
Feature activation+3.54e-02

Pos logit contributions
oured+0.059
ited+0.055
rying+0.055
ishing+0.054
 least+0.053
Neg logit contributions
ves-0.039
 Everyone-0.036
Her-0.035
uts-0.034
 herself-0.033
Token and
Token ID290
Feature activation-6.98e-03

Pos logit contributions
Her+0.054
ves+0.053
 Everyone+0.045
Everyone+0.043
 herself+0.043
Neg logit contributions
oured-0.072
rying-0.068
ishing-0.067
ited-0.067
 least-0.063
Token rev
Token ID2710
Feature activation-1.85e-03

Pos logit contributions
ited+0.012
oured+0.012
rying+0.012
 least+0.012
ishing+0.011
Neg logit contributions
ves-0.012
Her-0.012
uts-0.011
 Everyone-0.011
 herself-0.011
Tokenved
Token ID1079
Feature activation-2.82e-02

Pos logit contributions
oured+0.003
ited+0.003
rying+0.002
 least+0.002
ishing+0.002
Neg logit contributions
ves-0.004
 Everyone-0.004
 many-0.004
Her-0.004
 herself-0.004
Token the
Token ID262
Feature activation-3.45e-04

Pos logit contributions
oured+0.058
rying+0.055
 least+0.053
ishing+0.053
ited+0.053
Neg logit contributions
ves-0.040
Her-0.039
uts-0.038
ily-0.033
 Everyone-0.033
Token engine
Token ID3113
Feature activation-1.32e-03

Pos logit contributions
oured+0.001
ited+0.001
rying+0.001
 least+0.001
ksh+0.001
Neg logit contributions
ves-0.001
Her-0.001
uts-0.001
 Everyone-0.001
 herself-0.001
Token feel
Token ID1254
Feature activation-8.76e-03

Pos logit contributions
oured+0.012
ited+0.011
rying+0.011
ishing+0.010
 least+0.010
Neg logit contributions
ves-0.012
 many-0.012
 Everyone-0.012
Her-0.011
 herself-0.011
Token sick
Token ID6639
Feature activation+3.79e-02

Pos logit contributions
oured+0.017
ited+0.016
 least+0.016
rying+0.015
ishing+0.014
Neg logit contributions
ves-0.015
uts-0.014
 Everyone-0.013
ggies-0.013
Her-0.013
Token.
Token ID13
Feature activation-2.79e-03

Pos logit contributions
ves+0.061
 Everyone+0.053
Her+0.051
ggies+0.050
 People+0.050
Neg logit contributions
oured-0.072
 least-0.071
rying-0.070
 extingu-0.069
ited-0.068
Token Her
Token ID2332
Feature activation-1.87e-02

Pos logit contributions
oured+0.007
ksh+0.006
ited+0.006
 least+0.006
icol+0.006
Neg logit contributions
ves-0.004
ggies-0.002
uts-0.002
l-0.002
ggy-0.002
Token tum
Token ID11814
Feature activation-1.79e-02

Pos logit contributions
oured+0.041
ited+0.039
ishing+0.037
 least+0.037
rying+0.036
Neg logit contributions
ves-0.026
 Everyone-0.025
Her-0.023
uts-0.023
 many-0.023
Tokenmy
Token ID1820
Feature activation-1.17e-02

Pos logit contributions
oured+0.014
ited+0.013
ishing+0.011
cy+0.010
rying+0.010
Neg logit contributions
ves-0.050
 Everyone-0.049
 many-0.047
uts-0.046
 People-0.046
Token hurt
Token ID5938
Feature activation+4.03e-03

Pos logit contributions
oured+0.022
ited+0.022
rying+0.020
ishing+0.020
 least+0.019
Neg logit contributions
ves-0.019
 Everyone-0.019
 many-0.019
Her-0.018
uts-0.018
Token and
Token ID290
Feature activation-6.49e-03

Pos logit contributions
ves+0.008
uts+0.007
 Everyone+0.007
 People+0.007
ggies+0.007
Neg logit contributions
 least-0.006
oured-0.006
cy-0.006
ishing-0.006
ited-0.006
Token she
Token ID673
Feature activation+4.79e-03

Pos logit contributions
ited+0.014
oured+0.014
 least+0.013
rying+0.013
ishing+0.012
Neg logit contributions
ves-0.009
uts-0.008
 Everyone-0.008
Her-0.007
ggies-0.007
Token didn
Token ID1422
Feature activation+5.06e-02

Pos logit contributions
ves+0.008
 Everyone+0.008
 many+0.008
Her+0.008
 herself+0.008
Neg logit contributions
oured-0.009
ited-0.009
rying-0.008
ishing-0.008
 least-0.008
Token't
Token ID470
Feature activation+6.80e-02

Pos logit contributions
ves+0.062
 many+0.059
 people+0.058
 languages+0.055
 others+0.053
Neg logit contributions
 least-0.097
oured-0.096
arse-0.093
ksh-0.085
 extingu-0.083
Token said
Token ID531
Feature activation-1.50e-02

Pos logit contributions
ves+0.009
 Everyone+0.009
uts+0.009
 many+0.009
 much+0.008
Neg logit contributions
oured-0.007
ited-0.007
rying-0.006
 least-0.006
ksh-0.006
Token.
Token ID13
Feature activation+1.97e-04

Pos logit contributions
oured+0.033
rying+0.031
ishing+0.029
arse+0.029
ited+0.028
Neg logit contributions
ves-0.020
 Everyone-0.019
 People-0.018
Everyone-0.017
 many-0.017
Token She
Token ID1375
Feature activation+4.38e-03

Pos logit contributions
ves+0.000
ggies+0.000
 lesson+0.000
ggy+0.000
aches+0.000
Neg logit contributions
oured-0.000
 extingu-0.000
joy-0.000
 least-0.000
ksh-0.000
Token took
Token ID1718
Feature activation+4.40e-03

Pos logit contributions
ves+0.008
 Everyone+0.007
 many+0.007
Her+0.007
 herself+0.007
Neg logit contributions
oured-0.008
ited-0.008
rying-0.007
ishing-0.007
 least-0.007
Token the
Token ID262
Feature activation-8.43e-03

Pos logit contributions
ves+0.007
uts+0.007
Her+0.006
ggies+0.006
Every+0.006
Neg logit contributions
oured-0.009
 least-0.008
cy-0.008
ited-0.007
 extingu-0.007
Token gum
Token ID27142
Feature activation+4.10e-03

Pos logit contributions
oured+0.015
ited+0.015
 least+0.013
ishing+0.013
 extingu+0.013
Neg logit contributions
ves-0.015
 Everyone-0.014
uts-0.014
Her-0.014
 many-0.013
Token away
Token ID1497
Feature activation+1.90e-02

Pos logit contributions
ves+0.007
 Everyone+0.007
 People+0.007
ggies+0.006
 many+0.006
Neg logit contributions
oured-0.007
ited-0.007
rying-0.007
 extingu-0.007
 least-0.007
Token from
Token ID422
Feature activation+1.64e-02

Pos logit contributions
ves+0.032
 Everyone+0.032
 People+0.031
ggies+0.029
uts+0.028
Neg logit contributions
oured-0.035
 extingu-0.034
ited-0.033
rying-0.032
 least-0.031
Token them
Token ID606
Feature activation+3.94e-02

Pos logit contributions
ves+0.023
uts+0.023
 Everyone+0.020
Her+0.019
 People+0.018
Neg logit contributions
 least-0.037
oured-0.035
rying-0.034
ited-0.033
ishing-0.033
Token and
Token ID290
Feature activation-1.55e-03

Pos logit contributions
ves+0.058
 Everyone+0.054
Her+0.051
 People+0.049
Everyone+0.049
Neg logit contributions
oured-0.084
ited-0.080
rying-0.079
 least-0.076
ishing-0.074
Token threw
Token ID9617
Feature activation-9.59e-03

Pos logit contributions
oured+0.002
ited+0.002
rying+0.002
 least+0.002
ishing+0.002
Neg logit contributions
ves-0.003
 Everyone-0.003
 many-0.003
Her-0.003
uts-0.003
Token rabbit
Token ID22746
Feature activation-3.41e-03

Pos logit contributions
ves+0.034
 Everyone+0.033
 many+0.032
Her+0.031
 herself+0.031
Neg logit contributions
oured-0.044
ited-0.043
rying-0.040
ishing-0.039
 least-0.038
Token called
Token ID1444
Feature activation+8.42e-03

Pos logit contributions
oured+0.006
ited+0.006
rying+0.006
ishing+0.006
 least+0.006
Neg logit contributions
ves-0.006
 Everyone-0.005
 many-0.005
uts-0.005
 herself-0.005
Token Bob
Token ID5811
Feature activation-4.39e-02

Pos logit contributions
ves+0.015
Her+0.013
uts+0.013
 lesson+0.013
 many+0.012
Neg logit contributions
oured-0.016
rying-0.013
 extingu-0.013
ksh-0.013
cy-0.013
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
oured+0.089
ited+0.086
rying+0.082
ishing+0.080
 least+0.078
Neg logit contributions
ves-0.067
 Everyone-0.065
 many-0.063
Her-0.062
 herself-0.061
Token Bob
Token ID5811
Feature activation-4.39e-02

Pos logit contributions
oured+0.022
ited+0.021
ksh+0.019
rying+0.019
cy+0.019
Neg logit contributions
ves-0.015
Her-0.014
uts-0.013
 herself-0.013
 Everyone-0.013
Token was
Token ID373
Feature activation-6.12e-03

Pos logit contributions
oured+0.083
ited+0.081
rying+0.076
ishing+0.075
 least+0.074
Neg logit contributions
ves-0.072
 Everyone-0.069
Her-0.067
uts-0.066
 many-0.066
Token a
Token ID257
Feature activation-1.13e-02

Pos logit contributions
oured+0.015
ited+0.014
rying+0.014
 least+0.014
ishing+0.013
Neg logit contributions
ves-0.007
uts-0.006
 Everyone-0.006
Her-0.006
 herself-0.006
Token gentle
Token ID10296
Feature activation+5.44e-03

Pos logit contributions
oured+0.032
ited+0.031
rying+0.031
ishing+0.030
arse+0.029
Neg logit contributions
 much-0.010
 many-0.009
 long-0.008
 Everyone-0.008
Her-0.008
Token creature
Token ID7185
Feature activation-1.37e-02

Pos logit contributions
ves+0.010
 Everyone+0.009
 many+0.009
Her+0.009
uts+0.009
Neg logit contributions
oured-0.010
ited-0.009
rying-0.009
ishing-0.008
 least-0.008
Token and
Token ID290
Feature activation-1.17e-02

Pos logit contributions
oured+0.027
ited+0.025
rying+0.024
 least+0.024
ishing+0.023
Neg logit contributions
ves-0.022
 Everyone-0.021
Her-0.020
 herself-0.020
 many-0.020
Token he
Token ID339
Feature activation+6.65e-03

Pos logit contributions
oured+0.019
ited+0.017
 least+0.016
rying+0.016
cy+0.015
Neg logit contributions
uts-0.020
ves-0.020
Her-0.019
 Everyone-0.019
 herself-0.018
Token got
Token ID1392
Feature activation+3.25e-04

Pos logit contributions
oured+0.015
ited+0.014
rying+0.013
ishing+0.013
ksh+0.013
Neg logit contributions
 many-0.015
ves-0.015
 Everyone-0.014
 herself-0.014
 much-0.014
Token lost
Token ID2626
Feature activation-8.09e-03

Pos logit contributions
ves+0.000
 Everyone+0.000
uts+0.000
Her+0.000
 many+0.000
Neg logit contributions
oured-0.001
ited-0.001
rying-0.001
 least-0.001
ishing-0.001
Token.
Token ID13
Feature activation-4.94e-03

Pos logit contributions
oured+0.019
rying+0.019
ited+0.019
ishing+0.018
 least+0.018
Neg logit contributions
ves-0.009
 Everyone-0.009
Her-0.008
uts-0.008
 People-0.008
Token His
Token ID2399
Feature activation-2.64e-02

Pos logit contributions
oured+0.011
ksh+0.010
ited+0.010
cy+0.010
 least+0.010
Neg logit contributions
ves-0.006
uts-0.005
 herself-0.005
Her-0.005
ily-0.004
Token family
Token ID1641
Feature activation-1.10e-02

Pos logit contributions
oured+0.059
ited+0.056
rying+0.054
ishing+0.054
 least+0.054
Neg logit contributions
ves-0.033
 Everyone-0.032
Her-0.032
 herself-0.030
uts-0.030
Token looked
Token ID3114
Feature activation-4.49e-02

Pos logit contributions
oured+0.021
ited+0.021
rying+0.019
ishing+0.019
ksh+0.019
Neg logit contributions
ves-0.018
 many-0.017
 Everyone-0.017
Her-0.016
 much-0.016
Token for
Token ID329
Feature activation-4.87e-03

Pos logit contributions
oured+0.102
ited+0.098
rying+0.092
ksh+0.090
ishing+0.089
Neg logit contributions
 many-0.062
 much-0.059
 Everyone-0.055
ves-0.054
 herself-0.053
Token him
Token ID683
Feature activation+3.97e-02

Pos logit contributions
oured+0.012
ited+0.011
rying+0.011
 least+0.010
ishing+0.010
Neg logit contributions
ves-0.006
 Everyone-0.005
uts-0.005
Her-0.005
 many-0.005
Token everywhere
Token ID8347
Feature activation+2.89e-02

Pos logit contributions
ves+0.053
 Everyone+0.052
Her+0.049
uts+0.046
 many+0.046
Neg logit contributions
oured-0.090
ited-0.085
rying-0.084
 least-0.081
ishing-0.080
Token,
Token ID11
Feature activation+9.34e-03

Pos logit contributions
 Everyone+0.040
ves+0.039
Her+0.037
 People+0.034
uts+0.034
Neg logit contributions
oured-0.058
rying-0.057
 least-0.055
ited-0.053
ishing-0.052
Token but
Token ID475
Feature activation-3.95e-02

Pos logit contributions
ves+0.011
 Everyone+0.011
 many+0.010
Her+0.010
 herself+0.010
Neg logit contributions
oured-0.022
ited-0.022
rying-0.021
ishing-0.020
 least-0.020
Token.
Token ID13
Feature activation-2.60e-03

Pos logit contributions
oured+0.002
ited+0.002
rying+0.002
ishing+0.002
 least+0.002
Neg logit contributions
ves-0.001
 Everyone-0.001
Her-0.001
 People-0.001
uts-0.001
Token She
Token ID1375
Feature activation+3.39e-03

Pos logit contributions
oured+0.006
cy+0.005
ited+0.005
ksh+0.005
 least+0.005
Neg logit contributions
ves-0.004
ggies-0.003
uts-0.003
ggy-0.003
 lesson-0.003
Token wanted
Token ID2227
Feature activation+2.13e-02

Pos logit contributions
ves+0.006
 Everyone+0.006
 many+0.006
uts+0.006
 herself+0.005
Neg logit contributions
oured-0.006
ited-0.006
rying-0.005
ishing-0.005
 least-0.005
Token to
Token ID284
Feature activation-7.01e-03

Pos logit contributions
ves+0.037
 Everyone+0.035
uts+0.034
ggies+0.032
Her+0.032
Neg logit contributions
oured-0.036
 least-0.035
ited-0.031
rying-0.030
cy-0.029
Token get
Token ID651
Feature activation+1.46e-02

Pos logit contributions
oured+0.012
ited+0.011
rying+0.011
ishing+0.010
 least+0.010
Neg logit contributions
ves-0.013
 many-0.013
 Everyone-0.013
Her-0.012
uts-0.012
Token closer
Token ID5699
Feature activation-2.94e-02

Pos logit contributions
ves+0.013
 Everyone+0.011
uts+0.011
Her+0.010
 many+0.010
Neg logit contributions
oured-0.039
ited-0.038
rying-0.037
 least-0.036
ishing-0.036
Token to
Token ID284
Feature activation-6.85e-03

Pos logit contributions
oured+0.047
rying+0.046
 least+0.044
ishing+0.042
cy+0.042
Neg logit contributions
 Everyone-0.056
ves-0.055
Everyone-0.052
 People-0.051
ggies-0.050
Token see
Token ID766
Feature activation-1.58e-02

Pos logit contributions
oured+0.019
ited+0.019
rying+0.018
 least+0.018
ishing+0.017
Neg logit contributions
ves-0.005
uts-0.005
 many-0.005
 Everyone-0.005
Her-0.004
Token it
Token ID340
Feature activation+3.54e-02

Pos logit contributions
oured+0.037
 least+0.035
rying+0.035
ited+0.035
ishing+0.034
Neg logit contributions
ves-0.019
uts-0.017
 Everyone-0.017
ggies-0.016
 People-0.014
Token better
Token ID1365
Feature activation+5.39e-02

Pos logit contributions
ves+0.041
 Everyone+0.039
 many+0.038
Her+0.036
uts+0.036
Neg logit contributions
oured-0.086
ited-0.083
rying-0.080
ishing-0.078
 least-0.077
Token,
Token ID11
Feature activation+1.25e-02

Pos logit contributions
 Everyone+0.085
ves+0.081
 People+0.079
Everyone+0.074
Her+0.070
Neg logit contributions
rying-0.112
 least-0.105
ishing-0.103
oured-0.102
cy-0.099
Token The
Token ID383
Feature activation-1.46e-02

Pos logit contributions
oured+0.008
ksh+0.007
 least+0.007
rying+0.007
ited+0.007
Neg logit contributions
ves-0.006
 lesson-0.004
ggy-0.004
ggies-0.004
uts-0.004
Token mouse
Token ID10211
Feature activation-1.47e-02

Pos logit contributions
oured+0.029
ited+0.028
rying+0.026
ishing+0.025
 least+0.025
Neg logit contributions
ves-0.024
 Everyone-0.023
 many-0.023
Her-0.022
 herself-0.022
Token was
Token ID373
Feature activation-2.21e-03

Pos logit contributions
oured+0.026
ited+0.025
rying+0.023
 least+0.022
ishing+0.022
Neg logit contributions
ves-0.026
 Everyone-0.026
uts-0.025
 many-0.025
Her-0.025
Token sad
Token ID6507
Feature activation+3.03e-02

Pos logit contributions
oured+0.004
ited+0.004
rying+0.004
 least+0.004
ishing+0.004
Neg logit contributions
ves-0.003
 Everyone-0.003
uts-0.003
Her-0.003
 People-0.003
Token,
Token ID11
Feature activation+1.00e-02

Pos logit contributions
ves+0.048
 Everyone+0.042
Her+0.042
 People+0.039
uts+0.038
Neg logit contributions
oured-0.062
rying-0.060
 least-0.059
ited-0.058
ishing-0.058
Token but
Token ID475
Feature activation-3.71e-02

Pos logit contributions
ves+0.020
 Everyone+0.018
uts+0.017
Her+0.016
Everyone+0.016
Neg logit contributions
ited-0.016
 least-0.015
oured-0.015
rying-0.015
ksh-0.013
Token he
Token ID339
Feature activation+9.11e-03

Pos logit contributions
oured+0.088
ited+0.087
 least+0.084
rying+0.084
cy+0.078
Neg logit contributions
ves-0.045
uts-0.041
 Everyone-0.038
 People-0.034
ggies-0.034
Token listened
Token ID16399
Feature activation-2.35e-02

Pos logit contributions
ves+0.016
 Everyone+0.016
 many+0.016
Her+0.015
 herself+0.015
Neg logit contributions
oured-0.016
ited-0.016
rying-0.015
ishing-0.014
 least-0.014
Token to
Token ID284
Feature activation-4.95e-03

Pos logit contributions
oured+0.046
rying+0.043
ited+0.043
ishing+0.039
ksh+0.038
Neg logit contributions
ves-0.037
 Everyone-0.036
 People-0.034
uts-0.034
ggies-0.032
Token his
Token ID465
Feature activation-9.84e-03

Pos logit contributions
 least+0.011
rying+0.011
oured+0.011
ited+0.010
ishing+0.010
Neg logit contributions
uts-0.007
ves-0.007
Her-0.005
ggies-0.005
 much-0.005
Token mum
Token ID25682
Feature activation+1.51e-02

Pos logit contributions
oured+0.018
ited+0.017
rying+0.016
ishing+0.016
 least+0.016
Neg logit contributions
ves-0.017
 Everyone-0.017
uts-0.016
 many-0.016
Her-0.016
Token face
Token ID1986
Feature activation-3.42e-03

Pos logit contributions
oured+0.020
ited+0.019
rying+0.018
ishing+0.018
 least+0.018
Neg logit contributions
ves-0.016
 Everyone-0.015
 many-0.015
uts-0.015
 much-0.015
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
oured+0.008
ited+0.007
rying+0.007
ishing+0.006
int+0.006
Neg logit contributions
ves-0.004
 Everyone-0.004
 many-0.004
 People-0.004
Her-0.004
Token The
Token ID383
Feature activation-1.36e-02

Pos logit contributions
oured+0.005
ksh+0.004
 least+0.004
icol+0.004
cy+0.004
Neg logit contributions
ves-0.004
ggy-0.003
ggies-0.003
l-0.003
Her-0.002
Token purple
Token ID14032
Feature activation+2.13e-02

Pos logit contributions
oured+0.026
ited+0.024
 least+0.023
rying+0.022
ishing+0.022
Neg logit contributions
ves-0.023
uts-0.022
 Everyone-0.021
Her-0.021
 many-0.021
Token k
Token ID479
Feature activation-4.81e-02

Pos logit contributions
ves+0.033
 Everyone+0.032
 many+0.032
Her+0.031
 herself+0.031
Neg logit contributions
oured-0.043
ited-0.041
rying-0.039
ishing-0.038
 least-0.037
Tokenite
Token ID578
Feature activation-5.78e-02

Pos logit contributions
oured+0.082
ited+0.080
rying+0.075
 least+0.071
ishing+0.070
Neg logit contributions
ves-0.090
 Everyone-0.087
 many-0.086
Her-0.084
uts-0.084
Token with
Token ID351
Feature activation+1.52e-02

Pos logit contributions
oured+0.113
ited+0.108
rying+0.102
ishing+0.099
 least+0.098
Neg logit contributions
ves-0.093
 Everyone-0.089
 many-0.089
Her-0.086
 herself-0.086
Token its
Token ID663
Feature activation-2.44e-03

Pos logit contributions
ves+0.021
uts+0.018
 Everyone+0.017
Her+0.017
ggies+0.017
Neg logit contributions
oured-0.034
ited-0.033
 least-0.030
ishing-0.029
 extingu-0.029
Token long
Token ID890
Feature activation+3.23e-02

Pos logit contributions
oured+0.005
ited+0.004
 least+0.004
ishing+0.004
arse+0.004
Neg logit contributions
ves-0.004
 Everyone-0.004
Her-0.004
uts-0.004
 many-0.004
Token strings
Token ID13042
Feature activation+6.76e-02

Pos logit contributions
ves+0.052
 Everyone+0.051
 many+0.050
uts+0.049
Her+0.048
Neg logit contributions
oured-0.061
ited-0.057
ishing-0.052
rying-0.052
arse-0.050
Token was
Token ID373
Feature activation-1.49e-03

Pos logit contributions
ves+0.090
 Everyone+0.087
 many+0.085
Her+0.083
 herself+0.081
Neg logit contributions
oured-0.151
ited-0.144
rying-0.137
ishing-0.133
 least-0.131
Token on
Token ID319
Feature activation+5.06e-02

Pos logit contributions
oured+0.020
ited+0.019
rying+0.018
 least+0.018
ishing+0.018
Neg logit contributions
ves-0.008
 Everyone-0.008
uts-0.007
Her-0.007
 herself-0.007
Token,
Token ID11
Feature activation+6.04e-03

Pos logit contributions
Her+0.083
uts+0.080
ves+0.076
 herself+0.074
 Everyone+0.073
Neg logit contributions
oured-0.090
 least-0.087
ishing-0.081
cy-0.080
ited-0.078
Token they
Token ID484
Feature activation-1.93e-02

Pos logit contributions
ves+0.010
Her+0.010
uts+0.009
 herself+0.009
 Everyone+0.009
Neg logit contributions
oured-0.011
ited-0.011
 least-0.010
rying-0.010
cy-0.010
Token were
Token ID547
Feature activation-1.00e-02

Pos logit contributions
oured+0.043
ited+0.040
rying+0.039
ksh+0.039
ishing+0.038
Neg logit contributions
 many-0.027
ves-0.026
 Everyone-0.025
 herself-0.025
 much-0.025
Token always
Token ID1464
Feature activation-2.97e-02

Pos logit contributions
oured+0.021
ited+0.020
 least+0.020
rying+0.019
ishing+0.019
Neg logit contributions
ves-0.014
uts-0.014
 Everyone-0.013
Her-0.013
 People-0.012
Token together
Token ID1978
Feature activation-4.06e-02

Pos logit contributions
oured+0.068
ited+0.065
rying+0.060
ksh+0.059
ishing+0.059
Neg logit contributions
 many-0.039
ves-0.038
 much-0.037
 herself-0.036
 Everyone-0.036
Token.
Token ID13
Feature activation-7.10e-03

Pos logit contributions
oured+0.100
ited+0.097
rying+0.092
ishing+0.090
 least+0.089
Neg logit contributions
ves-0.045
 many-0.043
 Everyone-0.043
 herself-0.040
Her-0.040
Token Oscar
Token ID15694
Feature activation+1.89e-02

Pos logit contributions
oured+0.016
 least+0.015
ksh+0.015
cy+0.014
ited+0.014
Neg logit contributions
ves-0.010
Her-0.009
 herself-0.008
 people-0.008
 many-0.007
Token would
Token ID561
Feature activation+7.87e-03

Pos logit contributions
ves+0.030
Her+0.028
 Everyone+0.028
 many+0.028
uts+0.027
Neg logit contributions
oured-0.038
ited-0.036
 least-0.034
rying-0.033
ishing-0.033
Token help
Token ID1037
Feature activation-1.09e-02

Pos logit contributions
ves+0.013
 Everyone+0.012
uts+0.012
 many+0.012
Her+0.011
Neg logit contributions
oured-0.015
ited-0.014
 least-0.014
rying-0.014
ishing-0.013
Token the
Token ID262
Feature activation-1.33e-03

Pos logit contributions
oured+0.021
 least+0.020
rying+0.020
ishing+0.020
ited+0.018
Neg logit contributions
uts-0.016
ves-0.016
Her-0.016
 much-0.015
 Everyone-0.015
Token,
Token ID11
Feature activation+1.04e-02

Pos logit contributions
oured+0.072
ited+0.069
rying+0.067
 least+0.064
ishing+0.064
Neg logit contributions
ves-0.027
 many-0.026
 Everyone-0.025
 much-0.024
 herself-0.024
Token he
Token ID339
Feature activation+9.63e-03

Pos logit contributions
ves+0.013
Her+0.012
 Everyone+0.012
uts+0.012
ggies+0.010
Neg logit contributions
ited-0.023
oured-0.023
 least-0.021
rying-0.021
ishing-0.021
Token had
Token ID550
Feature activation+3.97e-03

Pos logit contributions
ves+0.018
 Everyone+0.018
 many+0.018
Her+0.017
 herself+0.017
Neg logit contributions
oured-0.016
ited-0.015
rying-0.014
ishing-0.014
 least-0.014
Token to
Token ID284
Feature activation-5.66e-03

Pos logit contributions
ves+0.006
 Everyone+0.006
Her+0.006
uts+0.005
 herself+0.005
Neg logit contributions
oured-0.008
ited-0.008
rying-0.007
 least-0.007
ishing-0.007
Token squ
Token ID2809
Feature activation+7.61e-04

Pos logit contributions
oured+0.008
ited+0.008
 least+0.007
rying+0.007
ishing+0.007
Neg logit contributions
ves-0.012
uts-0.011
 Everyone-0.011
 many-0.011
Her-0.011
Tokenint
Token ID600
Feature activation-1.56e-01

Pos logit contributions
 Everyone+0.002
 many+0.002
 much+0.002
 herself+0.002
Her+0.002
Neg logit contributions
oured-0.001
 least-0.001
 extingu-0.001
ited-0.001
rying-0.001
Token his
Token ID465
Feature activation-1.05e-02

Pos logit contributions
oured+0.292
rying+0.272
ishing+0.267
ited+0.260
ish+0.240
Neg logit contributions
ves-0.263
 Everyone-0.252
uts-0.251
Her-0.248
ggies-0.238
Token eyes
Token ID2951
Feature activation+2.43e-02

Pos logit contributions
oured+0.019
ited+0.018
rying+0.017
ishing+0.016
 least+0.016
Neg logit contributions
ves-0.019
 Everyone-0.018
 many-0.018
Her-0.018
uts-0.017
Token.
Token ID13
Feature activation-3.01e-03

Pos logit contributions
ves+0.033
 Everyone+0.031
ggies+0.029
 People+0.028
Her+0.028
Neg logit contributions
oured-0.056
ited-0.053
rying-0.050
 least-0.048
ishing-0.047
Token He
Token ID679
Feature activation+7.62e-03

Pos logit contributions
oured+0.007
ited+0.006
 least+0.006
int+0.006
rying+0.006
Neg logit contributions
ves-0.004
Her-0.004
uts-0.004
 herself-0.003
l-0.003
Token turned
Token ID2900
Feature activation-3.02e-02

Pos logit contributions
ves+0.014
 many+0.014
 Everyone+0.014
Her+0.013
 herself+0.013
Neg logit contributions
oured-0.013
ited-0.012
rying-0.012
ishing-0.011
 least-0.011
Token is
Token ID318
Feature activation-3.82e-02

Pos logit contributions
oured+0.025
ited+0.024
rying+0.022
ishing+0.022
ksh+0.021
Neg logit contributions
ves-0.015
 many-0.015
 Everyone-0.014
 much-0.014
 herself-0.014
Token angry
Token ID7954
Feature activation+3.86e-02

Pos logit contributions
oured+0.079
rying+0.075
 least+0.073
ited+0.073
ishing+0.072
Neg logit contributions
 Everyone-0.055
ves-0.055
 People-0.049
Her-0.048
uts-0.047
Token.
Token ID13
Feature activation+9.32e-03

Pos logit contributions
ves+0.055
Her+0.049
 Everyone+0.049
 many+0.045
 herself+0.043
Neg logit contributions
rying-0.079
oured-0.077
ishing-0.075
ited-0.074
 least-0.073
Token It
Token ID632
Feature activation-9.32e-03

Pos logit contributions
ves+0.015
 lesson+0.014
ggies+0.013
gh+0.013
 chores+0.013
Neg logit contributions
 extingu-0.013
cy-0.013
oured-0.013
ksh-0.012
 retrieved-0.011
Token his
Token ID465
Feature activation-1.45e-02

Pos logit contributions
oured+0.020
ited+0.019
rying+0.018
ishing+0.017
arse+0.017
Neg logit contributions
 many-0.014
ves-0.014
 Everyone-0.013
 much-0.013
 herself-0.013
Tokenses
Token ID8448
Feature activation-1.41e-01

Pos logit contributions
oured+0.014
 least+0.012
ited+0.012
rying+0.011
ishing+0.011
Neg logit contributions
ves-0.037
 Everyone-0.036
Her-0.036
 herself-0.035
uts-0.035
Token.
Token ID13
Feature activation+9.37e-03

Pos logit contributions
oured+0.333
ited+0.321
rying+0.309
ishing+0.303
 least+0.297
Neg logit contributions
ves-0.167
 Everyone-0.160
 many-0.150
Her-0.149
uts-0.146
Token It
Token ID632
Feature activation-8.55e-03

Pos logit contributions
ves+0.016
ggies+0.014
 lesson+0.013
 laws+0.012
 others+0.012
Neg logit contributions
oured-0.014
 extingu-0.013
cy-0.012
ksh-0.012
 least-0.012
Token scratches
Token ID37627
Feature activation-2.68e-02

Pos logit contributions
oured+0.019
ited+0.018
rying+0.017
ishing+0.016
arse+0.016
Neg logit contributions
 many-0.013
ves-0.013
 much-0.012
 Everyone-0.012
 herself-0.011
Token.
Token ID13
Feature activation+8.89e-03

Pos logit contributions
cy+0.042
oured+0.040
ked+0.039
 extingu+0.039
ishing+0.038
Neg logit contributions
ves-0.042
uts-0.040
ggies-0.040
ggy-0.039
Every-0.037
Token It
Token ID632
Feature activation-8.57e-03

Pos logit contributions
ves+0.014
ggies+0.013
 lesson+0.012
 laws+0.012
ups+0.012
Neg logit contributions
oured-0.013
 extingu-0.012
 least-0.012
ksh-0.011
cy-0.011
Tokenksh
Token ID50133
Feature activation-1.58e-01

Pos logit contributions
 least+0.051
ited+0.049
oured+0.047
rying+0.042
 believer+0.037
Neg logit contributions
ves-0.147
uts-0.143
 Everyone-0.141
 many-0.140
Her-0.134
Tokenakes
Token ID1124
Feature activation-8.19e-02

Pos logit contributions
oured+0.295
ited+0.287
rying+0.273
 least+0.264
ishing+0.251
Neg logit contributions
 Everyone-0.256
ves-0.255
uts-0.246
 many-0.237
 himself-0.227
Token.
Token ID13
Feature activation-4.21e-03

Pos logit contributions
oured+0.192
ited+0.183
rying+0.181
 least+0.173
ishing+0.173
Neg logit contributions
ves-0.099
 Everyone-0.095
Her-0.094
 many-0.090
 herself-0.085
Token The
Token ID383
Feature activation-2.65e-02

Pos logit contributions
oured+0.009
ksh+0.009
 least+0.008
cy+0.008
 extingu+0.008
Neg logit contributions
ves-0.006
 lesson-0.005
 many-0.005
ggy-0.005
uts-0.005
Token mil
Token ID1465
Feature activation-6.19e-02

Pos logit contributions
oured+0.045
ited+0.043
 least+0.042
rying+0.039
ksh+0.038
Neg logit contributions
uts-0.050
ves-0.049
Her-0.047
 Everyone-0.044
 herself-0.044
Tokenksh
Token ID50133
Feature activation-1.52e-01

Pos logit contributions
oured+0.034
ited+0.033
 least+0.028
rying+0.026
ishing+0.019
Neg logit contributions
ves-0.178
 Everyone-0.174
uts-0.171
 many-0.169
Her-0.167
Tokenakes
Token ID1124
Feature activation-8.09e-02

Pos logit contributions
oured+0.276
ited+0.262
rying+0.253
 least+0.239
ishing+0.230
Neg logit contributions
ves-0.264
 Everyone-0.263
uts-0.247
 many-0.244
 much-0.235
Token were
Token ID547
Feature activation-5.14e-03

Pos logit contributions
oured+0.138
ited+0.133
rying+0.123
 least+0.118
ishing+0.117
Neg logit contributions
ves-0.145
 Everyone-0.144
Her-0.141
 many-0.141
uts-0.137
Token blue
Token ID4171
Feature activation+6.42e-03

Pos logit contributions
oured+0.010
ited+0.010
 least+0.009
ksh+0.009
rying+0.009
Neg logit contributions
ves-0.008
 Everyone-0.008
uts-0.008
Her-0.007
 People-0.007
Token and
Token ID290
Feature activation-7.60e-03

Pos logit contributions
ves+0.009
Her+0.008
 Everyone+0.008
 many+0.008
 People+0.007
Neg logit contributions
oured-0.014
ited-0.013
ksh-0.013
ishing-0.013
rying-0.013
Token pink
Token ID11398
Feature activation+1.58e-02

Pos logit contributions
oured+0.014
ited+0.014
ksh+0.013
 least+0.012
cy+0.012
Neg logit contributions
ves-0.012
uts-0.012
 Everyone-0.012
Her-0.012
 herself-0.011
Token angry
Token ID7954
Feature activation+4.52e-02

Pos logit contributions
ves+0.023
 Everyone+0.023
uts+0.022
Her+0.021
 herself+0.020
Neg logit contributions
ited-0.029
oured-0.029
rying-0.027
ksh-0.027
 least-0.026
Token snake
Token ID17522
Feature activation-2.46e-02

Pos logit contributions
ves+0.072
Her+0.068
 many+0.068
 Everyone+0.067
 much+0.066
Neg logit contributions
oured-0.086
ited-0.083
rying-0.081
ishing-0.080
 least-0.076
Token.
Token ID13
Feature activation-1.66e-03

Pos logit contributions
oured+0.062
ited+0.060
rying+0.057
ishing+0.056
 least+0.055
Neg logit contributions
ves-0.026
 many-0.024
 Everyone-0.024
Her-0.022
 herself-0.022
Token It
Token ID632
Feature activation-2.87e-03

Pos logit contributions
oured+0.004
ksh+0.003
 least+0.003
 extingu+0.003
cy+0.003
Neg logit contributions
ves-0.002
ggies-0.002
 lesson-0.002
 herself-0.002
Her-0.002
Token his
Token ID465
Feature activation-8.11e-03

Pos logit contributions
oured+0.006
ited+0.006
rying+0.006
ishing+0.006
arse+0.005
Neg logit contributions
 many-0.004
ves-0.004
 much-0.004
 Everyone-0.004
 himself-0.004
Tokenses
Token ID8448
Feature activation-1.30e-01

Pos logit contributions
oured+0.012
ited+0.011
rying+0.010
ishing+0.010
 least+0.010
Neg logit contributions
ves-0.017
 many-0.017
 Everyone-0.017
 much-0.016
 herself-0.016
Token and
Token ID290
Feature activation-4.38e-03

Pos logit contributions
oured+0.328
ited+0.318
rying+0.303
ishing+0.295
 least+0.294
Neg logit contributions
ves-0.139
 many-0.136
 Everyone-0.131
 much-0.126
 herself-0.124
Token snaps
Token ID23429
Feature activation-3.90e-03

Pos logit contributions
oured+0.009
ited+0.008
rying+0.008
ishing+0.008
arse+0.007
Neg logit contributions
 many-0.007
ves-0.007
 herself-0.007
 Everyone-0.007
 much-0.007
Token at
Token ID379
Feature activation-1.29e-02

Pos logit contributions
oured+0.007
rying+0.007
ited+0.007
ishing+0.007
ksh+0.006
Neg logit contributions
ves-0.007
 Everyone-0.006
Her-0.006
 People-0.006
uts-0.006
Token Ben
Token ID3932
Feature activation+3.84e-02

Pos logit contributions
oured+0.028
 least+0.027
ited+0.026
ishing+0.025
rying+0.025
Neg logit contributions
ves-0.019
uts-0.018
Her-0.017
 Everyone-0.017
 much-0.016
Token.
Token ID13
Feature activation-1.77e-03

Pos logit contributions
 Everyone+0.053
ves+0.052
 People+0.049
Everyone+0.048
Her+0.047
Neg logit contributions
oured-0.082
rying-0.081
ited-0.078
ishing-0.076
 least-0.074
Token maze
Token ID31237
Feature activation-2.06e-03

Pos logit contributions
ves+0.005
 many+0.005
 Everyone+0.005
 much+0.005
 herself+0.005
Neg logit contributions
oured-0.007
ited-0.007
rying-0.006
ishing-0.006
 least-0.006
Token.
Token ID13
Feature activation+1.57e-03

Pos logit contributions
oured+0.005
ited+0.004
rying+0.004
 least+0.004
ishing+0.004
Neg logit contributions
Her-0.003
ves-0.003
 Everyone-0.003
 many-0.002
 herself-0.002
Token
Token ID198
Feature activation+1.79e-02

Pos logit contributions
ves+0.002
uts+0.002
ggy+0.002
 himself+0.002
ggies+0.002
Neg logit contributions
oured-0.003
cy-0.003
ksh-0.003
 least-0.003
ited-0.003
Token
Token ID198
Feature activation+1.78e-02

Pos logit contributions
uts+0.023
ves+0.021
 words+0.021
 laws+0.020
 people+0.020
Neg logit contributions
oured-0.034
cy-0.031
ited-0.030
 least-0.029
int-0.028
TokenExc
Token ID40127
Feature activation+2.02e-02

Pos logit contributions
uts+0.022
ves+0.022
 himself+0.022
 Everyone+0.020
 many+0.020
Neg logit contributions
ited-0.040
oured-0.037
cy-0.034
rying-0.034
ishing-0.033
Tokenited
Token ID863
Feature activation-1.39e-01

Pos logit contributions
Her+0.058
ves+0.058
 many+0.056
The+0.056
 herself+0.055
Neg logit contributions
oured-0.009
rying-0.007
sing-0.006
 extingu-0.006
arse-0.006
Tokenly
Token ID306
Feature activation+3.55e-02

Pos logit contributions
oured+0.303
ited+0.291
rying+0.272
ishing+0.270
 least+0.260
Neg logit contributions
ves-0.201
 many-0.182
Her-0.181
 Everyone-0.181
uts-0.177
Token,
Token ID11
Feature activation+1.18e-02

Pos logit contributions
ves+0.041
 Everyone+0.039
 many+0.039
Her+0.037
 much+0.037
Neg logit contributions
oured-0.086
ited-0.081
rying-0.078
ishing-0.076
 least-0.074
Token she
Token ID673
Feature activation+7.39e-03

Pos logit contributions
ves+0.022
uts+0.021
ggies+0.019
ggy+0.018
 himself+0.018
Neg logit contributions
oured-0.020
ited-0.019
cy-0.018
 least-0.018
rying-0.016
Token asked
Token ID1965
Feature activation-1.59e-02

Pos logit contributions
ves+0.013
 many+0.012
 Everyone+0.012
Her+0.012
 herself+0.012
Neg logit contributions
oured-0.013
ited-0.013
rying-0.012
ishing-0.012
 least-0.012
Token her
Token ID607
Feature activation+6.72e-03

Pos logit contributions
oured+0.031
rying+0.026
ishing+0.026
cy+0.025
 least+0.025
Neg logit contributions
uts-0.026
ves-0.025
ggies-0.021
l-0.020
He-0.020