TOP ACTIVATIONS
MAX = 0.158

- nine , forty , forty - one , forty -
, forty - one , forty - two , forty -
a swimming pool and a restaurant . Lily was so excited
gear - a bright - red sc ooter ! All of
Sam and Sue , a married couple . Sam and Sue
, thirty - nine , forty , forty - one ,
but he was a messy eater . His mom always gave
tidy room and a sweet dessert . <|endoftext|> L ily likes
games , and a giant birthday cake . The kids were
, forty - three , forty - four , forty -
Lily went to a big restaurant with Mom and Dad .
gorilla , and a baby gorilla . The baby gorilla was
has a dragon and a knight and a castle . Lily
Very well . I have a
her mom went to a restaurant for dinner . Lily liked
a nurse , and a writer . She had helped many
ball , and a t eddy bear . She liked her
. The End . <|endoftext|> Once upon a
. The End . <|endoftext|> Once there was
a car and a t eddy bear . They like to

MIN ACTIVATIONS
MIN = -0.185

. He watched as Per c ival cut out two shapes
sand and watched as it sc ur ried away .
knew how to play hop sc ot ch .
. Rabbit watched how squirrel sc ur ried up it quickly
day , he watched as other pages ran around and played
had to know all about microsc opes .
She watched as the mouse sc ur ried over to the
was the same color as Mitt ens ! Lily was so
replied with kindness instead of sc olding him . He ran
fun . Tim watched as other kids played with the drum
forest . Who knows what other new adventures they would find
exploring , never knowing what other secrets she would find .
watched the squirrel as it sc ur ried around .
He stopped to wonder what other surprises he would find .
should take turns pulling each other ’s hair
She grabbed oven m itt s and carefully pulled out
can always rely on each other ". Tina hugged her back
. Barry enjoyed playing hop sc ot ch with them so
dance with them instead of sc aring them .
and thanked the parent before sc ur rying away .
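The two lists above are short token windows around the positions where the feature fires most strongly (TOP) and most negatively (MIN). The sketch below shows one way such lists can be produced. It assumes you already have the token strings and per-token activations for a text; the function name `extreme_contexts` and the window sizes are hypothetical, not the dashboard's actual settings. This naive version does not de-duplicate overlapping windows, which is one way overlapping examples (such as the several "forty - one" lines above) can appear.

```python
from typing import List, Tuple


def extreme_contexts(
    tokens: List[str],
    acts: List[float],
    k: int = 3,
    before: int = 7,   # hypothetical: tokens shown before the peak
    after: int = 3,    # hypothetical: tokens shown after the peak
    largest: bool = True,
) -> List[Tuple[float, str]]:
    """Return the k highest (largest=True) or lowest (largest=False)
    activating positions, each with a space-joined window of tokens."""
    # Rank positions by activation value.
    order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=largest)
    results = []
    for i in order[:k]:
        lo = max(0, i - before)
        hi = min(len(tokens), i + after + 1)
        results.append((acts[i], " ".join(tokens[lo:hi])))
    return results


tokens = ["a", "b", "c", "d", "e"]
acts = [0.1, 0.5, -0.2, 0.3, 0.0]
print(extreme_contexts(tokens, acts, k=1, before=1, after=1))
print(extreme_contexts(tokens, acts, k=1, before=1, after=1, largest=False))
```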

INTERVAL 0.072 - 0.158
CONTAINS 1.180%

there was a clumsy bar ber . He liked to zip
we need it and to always be safe . <|endoftext|> Once
, but the waves were too strong . His boat caps
her and her mom to share . The sign
witch . Lily learned that being brave and helping others can

INTERVAL -0.014 - 0.072
CONTAINS 67.134%

and cars and cats . They mixed the colors and made
The cat me ows and jumps out of the box
One morning he woke up and couldn 't see anything because
better and realized that sometimes it 's good to listen to
the park to play with his friends . Max 's friends

INTERVAL -0.099 - -0.014
CONTAINS 31.470%

both hands . Then he said the same magic word .
I have the strongest branches ! I can handle anything ."
it on a leaf . L ily cared about
on his work clothes , picked up his work bag and
Lily . It 's so much fun !" Lily smiled and

INTERVAL -0.185 - -0.099
CONTAINS 0.216%
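The four intervals above are consistent with splitting the range from MIN (-0.185) to MAX (0.158) into four equal-width bins of about 0.0858 each (edges near -0.185, -0.099, -0.014, 0.072, 0.158), with the percentages summing to 100%. A minimal sketch of that binning, assuming equal-width bins and that the maximum value belongs to the top bin; the function name `interval_fractions` is hypothetical. The dashboard lists the bins from highest to lowest, while this sketch returns them from lowest to highest.

```python
from typing import List, Tuple


def interval_fractions(
    acts: List[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Split [min(acts), max(acts)] into n_bins equal-width intervals and
    return (lower_edge, upper_edge, percent_of_values) for each bin."""
    lo, hi = min(acts), max(acts)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all activations are identical; bins are undefined")
    counts = [0] * n_bins
    for a in acts:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(acts)
    return [
        (lo + b * width, lo + (b + 1) * width, 100.0 * counts[b] / total)
        for b in range(n_bins)
    ]


for lower, upper, pct in interval_fractions([0.0, 1.0, 2.0, 3.0, 4.0]):
    print(f"INTERVAL {lower:.3f} - {upper:.3f}  CONTAINS {pct:.3f}%")
```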

stay in her room while she went to the store .
. But no matter how hard he tried , he couldn
cleaning . They argued while they worked and they were both
did not know that the box was not empty . It
replied . As they continued walking , they met
Token-
Token ID12
Feature activation-4.58e-03

Pos logit contributions
What+0.215
Really+0.188
Who+0.170
It+0.169
 boy+0.166
Neg logit contributions
 least-0.188
onel-0.185
ird-0.183
 er-0.178
ig-0.176
Tokennine
Token ID30888
Feature activation+5.46e-02

Pos logit contributions
 least+0.009
onel+0.008
ird+0.008
col+0.008
issy+0.007
Neg logit contributions
What-0.007
Really-0.006
 boy-0.006
Who-0.005
 man-0.005
Token,
Token ID11
Feature activation+8.22e-03

Pos logit contributions
What+0.122
Really+0.111
 boy+0.103
Who+0.102
It+0.100
Neg logit contributions
 least-0.067
onel-0.062
ird-0.059
 er-0.055
issy-0.052
Token forty
Token ID16571
Feature activation+1.51e-01

Pos logit contributions
What+0.015
 boy+0.014
Really+0.014
Spot+0.013
 man+0.013
Neg logit contributions
 least-0.012
 er-0.010
onel-0.010
bar-0.010
ird-0.009
Token,
Token ID11
Feature activation+1.52e-02

Pos logit contributions
What+0.289
Really+0.253
Who+0.236
It+0.228
Can+0.224
Neg logit contributions
 least-0.226
onel-0.219
ird-0.217
ig-0.216
 er-0.214
Token forty
Token ID16571
Feature activation+1.58e-01

Pos logit contributions
What+0.027
Really+0.025
 boy+0.025
 man+0.023
Spot+0.023
Neg logit contributions
 least-0.023
onel-0.020
 er-0.020
bar-0.020
ird-0.018
Token-
Token ID12
Feature activation+1.44e-02

Pos logit contributions
What+0.309
Really+0.272
Who+0.255
It+0.246
Can+0.242
Neg logit contributions
 least-0.226
ig-0.219
ird-0.217
onel-0.216
 er-0.214
Tokenone
Token ID505
Feature activation+7.78e-02

Pos logit contributions
What+0.022
Really+0.020
 boy+0.020
Who+0.017
 man+0.017
Neg logit contributions
 least-0.026
onel-0.026
ird-0.024
col-0.024
issy-0.022
Token,
Token ID11
Feature activation+1.13e-02

Pos logit contributions
What+0.158
Really+0.146
 boy+0.136
Who+0.131
It+0.128
Neg logit contributions
 least-0.108
onel-0.103
ird-0.095
 er-0.089
col-0.087
Token forty
Token ID16571
Feature activation+1.55e-01

Pos logit contributions
What+0.020
Really+0.018
 boy+0.018
 man+0.016
+0.016
Neg logit contributions
 least-0.018
onel-0.016
bar-0.015
 er-0.015
ird-0.015
Token-
Token ID12
Feature activation+1.26e-02

Pos logit contributions
What+0.308
Really+0.272
Who+0.254
It+0.246
Can+0.242
Neg logit contributions
 least-0.219
ig-0.210
ird-0.209
onel-0.206
 er-0.205
Token,
Token ID11
Feature activation+1.52e-02

Pos logit contributions
What+0.289
Really+0.253
Who+0.236
It+0.228
Can+0.224
Neg logit contributions
 least-0.226
onel-0.219
ird-0.217
ig-0.216
 er-0.214
Token forty
Token ID16571
Feature activation+1.58e-01

Pos logit contributions
What+0.027
Really+0.025
 boy+0.025
 man+0.023
Spot+0.023
Neg logit contributions
 least-0.023
onel-0.020
 er-0.020
bar-0.020
ird-0.018
Token-
Token ID12
Feature activation+1.44e-02

Pos logit contributions
What+0.309
Really+0.272
Who+0.255
It+0.246
Can+0.242
Neg logit contributions
 least-0.226
ig-0.219
ird-0.217
onel-0.216
 er-0.214
Tokenone
Token ID505
Feature activation+7.78e-02

Pos logit contributions
What+0.022
Really+0.020
 boy+0.020
Who+0.017
 man+0.017
Neg logit contributions
 least-0.026
onel-0.026
ird-0.024
col-0.024
issy-0.022
Token,
Token ID11
Feature activation+1.13e-02

Pos logit contributions
What+0.158
Really+0.146
 boy+0.136
Who+0.131
It+0.128
Neg logit contributions
 least-0.108
onel-0.103
ird-0.095
 er-0.089
col-0.087
Token forty
Token ID16571
Feature activation+1.55e-01

Pos logit contributions
What+0.020
Really+0.018
 boy+0.018
 man+0.016
+0.016
Neg logit contributions
 least-0.018
onel-0.016
bar-0.015
 er-0.015
ird-0.015
Token-
Token ID12
Feature activation+1.26e-02

Pos logit contributions
What+0.308
Really+0.272
Who+0.254
It+0.246
Can+0.242
Neg logit contributions
 least-0.219
ig-0.210
ird-0.209
onel-0.206
 er-0.205
Tokentwo
Token ID11545
Feature activation+9.57e-02

Pos logit contributions
What+0.020
Really+0.018
 boy+0.018
Who+0.015
 man+0.015
Neg logit contributions
 least-0.022
onel-0.022
ird-0.020
col-0.020
bar-0.019
Token,
Token ID11
Feature activation-8.95e-03

Pos logit contributions
What+0.203
Really+0.188
 boy+0.176
Who+0.170
It+0.169
Neg logit contributions
 least-0.127
onel-0.119
ird-0.110
 er-0.102
col-0.098
Token forty
Token ID16571
Feature activation+1.37e-01

Pos logit contributions
 least+0.014
onel+0.013
bar+0.012
 er+0.012
ird+0.012
Neg logit contributions
What-0.016
Really-0.015
 boy-0.014
 man-0.013
Spot-0.013
Token-
Token ID12
Feature activation-1.09e-02

Pos logit contributions
What+0.236
Really+0.207
Who+0.188
It+0.184
 boy+0.183
Neg logit contributions
 least-0.235
onel-0.226
ird-0.219
ig-0.215
 er-0.213
Token a
Token ID257
Feature activation-2.59e-03

Pos logit contributions
What+0.069
Really+0.063
 boy+0.059
Who+0.053
It+0.053
Neg logit contributions
 least-0.082
onel-0.081
ird-0.073
col-0.072
issy-0.070
Token swimming
Token ID14899
Feature activation+5.79e-02

Pos logit contributions
 least+0.004
onel+0.004
ird+0.004
col+0.004
auc+0.004
Neg logit contributions
What-0.004
Really-0.004
 boy-0.004
It-0.004
Who-0.004
Token pool
Token ID5933
Feature activation+5.71e-03

Pos logit contributions
What+0.111
Really+0.104
 boy+0.097
Who+0.092
It+0.091
Neg logit contributions
 least-0.086
onel-0.083
ird-0.074
col-0.073
auc-0.071
Token and
Token ID290
Feature activation+1.60e-02

Pos logit contributions
What+0.009
Really+0.008
 boy+0.008
 man+0.007
It+0.007
Neg logit contributions
 least-0.011
onel-0.010
ird-0.009
col-0.009
issy-0.009
Token a
Token ID257
Feature activation+4.09e-02

Pos logit contributions
What+0.027
Really+0.025
 boy+0.024
Who+0.021
It+0.021
Neg logit contributions
 least-0.027
onel-0.027
ird-0.024
col-0.024
aved-0.023
Token restaurant
Token ID7072
Feature activation+1.55e-01

Pos logit contributions
What+0.068
Really+0.062
 boy+0.058
Who+0.054
It+0.053
Neg logit contributions
 least-0.071
onel-0.068
ird-0.063
col-0.061
auc-0.060
Token.
Token ID13
Feature activation-4.65e-03

Pos logit contributions
What+0.278
Really+0.260
 boy+0.235
Who+0.223
It+0.220
Neg logit contributions
 least-0.249
onel-0.240
ird-0.224
col-0.219
auc-0.212
Token Lily
Token ID20037
Feature activation-2.20e-02

Pos logit contributions
 least+0.009
onel+0.009
ird+0.008
col+0.008
aved+0.008
Neg logit contributions
What-0.007
Really-0.006
 boy-0.006
Who-0.005
It-0.005
Token was
Token ID373
Feature activation-9.16e-03

Pos logit contributions
 least+0.040
onel+0.037
ird+0.036
col+0.035
inda+0.035
Neg logit contributions
What-0.036
Really-0.033
Who-0.027
It-0.026
 boy-0.026
Token so
Token ID523
Feature activation+2.40e-02

Pos logit contributions
 least+0.013
onel+0.013
ird+0.011
col+0.011
 er+0.011
Neg logit contributions
What-0.018
Really-0.017
 boy-0.015
-0.015
Who-0.015
Token excited
Token ID6568
Feature activation+2.28e-03

Pos logit contributions
What+0.041
Really+0.038
Who+0.033
 boy+0.032
How+0.032
Neg logit contributions
onel-0.042
 least-0.039
ird-0.037
col-0.036
aved-0.036
Token gear
Token ID7733
Feature activation-2.31e-02

Pos logit contributions
What+0.012
Really+0.011
 boy+0.010
It+0.010
+0.010
Neg logit contributions
 least-0.013
onel-0.012
col-0.012
ird-0.011
 er-0.011
Token -
Token ID532
Feature activation+2.90e-02

Pos logit contributions
 least+0.047
onel+0.047
oken+0.042
ird+0.042
issy+0.042
Neg logit contributions
What-0.029
 boy-0.028
Really-0.026
 man-0.025
It-0.023
Token a
Token ID257
Feature activation+4.83e-02

Pos logit contributions
What+0.038
Really+0.033
 boy+0.032
It+0.027
Who+0.027
Neg logit contributions
 least-0.061
onel-0.060
ird-0.055
col-0.054
issy-0.054
Token bright
Token ID6016
Feature activation+1.72e-02

Pos logit contributions
What+0.074
Really+0.067
 boy+0.061
Who+0.057
It+0.056
Neg logit contributions
 least-0.090
onel-0.086
ird-0.080
col-0.077
 er-0.076
Token-
Token ID12
Feature activation+3.71e-02

Pos logit contributions
What+0.027
Really+0.025
 boy+0.022
Who+0.021
It+0.020
Neg logit contributions
 least-0.032
onel-0.030
ird-0.028
col-0.028
 er-0.027
Tokenred
Token ID445
Feature activation+1.55e-01

Pos logit contributions
What+0.065
 boy+0.059
Really+0.057
It+0.057
 man+0.055
Neg logit contributions
 least-0.066
onel-0.063
oken-0.055
ird-0.052
 ease-0.052
Token sc
Token ID629
Feature activation-8.41e-02

Pos logit contributions
What+0.264
Really+0.251
 boy+0.215
Who+0.213
It+0.204
Neg logit contributions
 least-0.262
onel-0.250
ird-0.232
col-0.229
auc-0.225
Tokenooter
Token ID25141
Feature activation+1.05e-02

Pos logit contributions
 least+0.119
onel+0.112
 ease+0.097
col+0.096
inda+0.095
Neg logit contributions
 boy-0.147
 man-0.144
What-0.142
Really-0.136
He-0.134
Token!
Token ID0
Feature activation-3.87e-03

Pos logit contributions
What+0.014
 boy+0.014
Really+0.013
 man+0.012
It+0.012
Neg logit contributions
onel-0.021
 least-0.020
oken-0.018
inda-0.018
issy-0.018
Token All
Token ID1439
Feature activation-3.40e-02

Pos logit contributions
 least+0.006
onel+0.006
ird+0.006
issy+0.006
col+0.006
Neg logit contributions
What-0.006
Really-0.006
 boy-0.005
Who-0.005
-0.005
Token of
Token ID286
Feature activation-4.02e-02

Pos logit contributions
 least+0.068
onel+0.068
ird+0.063
col+0.059
oken+0.059
Neg logit contributions
What-0.048
 boy-0.044
Really-0.040
It-0.037
 man-0.037
Token Sam
Token ID3409
Feature activation-9.90e-03

Pos logit contributions
What+0.075
Really+0.074
 boy+0.065
+0.062
Who+0.059
Neg logit contributions
 least-0.087
onel-0.081
ird-0.076
 ease-0.073
col-0.072
Token and
Token ID290
Feature activation+2.40e-02

Pos logit contributions
 least+0.019
onel+0.019
ird+0.017
col+0.017
aved+0.017
Neg logit contributions
What-0.015
Really-0.013
 boy-0.013
 man-0.011
It-0.011
Token Sue
Token ID26089
Feature activation+3.23e-02

Pos logit contributions
What+0.040
Really+0.038
+0.031
Who+0.031
 boy+0.030
Neg logit contributions
 least-0.041
onel-0.037
ird-0.036
 er-0.034
col-0.034
Token,
Token ID11
Feature activation+2.27e-03

Pos logit contributions
What+0.054
Really+0.050
 boy+0.047
Who+0.043
It+0.043
Neg logit contributions
 least-0.056
onel-0.054
ird-0.049
col-0.048
issy-0.047
Token a
Token ID257
Feature activation+5.31e-02

Pos logit contributions
What+0.004
Really+0.003
Who+0.003
+0.003
 boy+0.003
Neg logit contributions
 least-0.004
onel-0.004
ird-0.004
col-0.003
issy-0.003
Token married
Token ID6405
Feature activation+1.52e-01

Pos logit contributions
What+0.074
Really+0.067
Who+0.055
How+0.054
It+0.052
Neg logit contributions
 least-0.099
onel-0.093
ird-0.093
 er-0.087
col-0.087
Token couple
Token ID3155
Feature activation+8.66e-02

Pos logit contributions
What+0.224
Really+0.204
 boy+0.187
Who+0.172
It+0.171
Neg logit contributions
 least-0.294
onel-0.285
ird-0.263
col-0.257
issy-0.253
Token.
Token ID13
Feature activation-7.66e-03

Pos logit contributions
What+0.120
 boy+0.116
 man+0.102
Really+0.097
It+0.091
Neg logit contributions
 least-0.171
onel-0.170
aved-0.152
auc-0.149
issy-0.149
Token Sam
Token ID3409
Feature activation-1.42e-02

Pos logit contributions
 least+0.014
onel+0.013
ird+0.013
col+0.013
issy+0.012
Neg logit contributions
What-0.011
Really-0.011
 boy-0.010
Who-0.009
-0.009
Token and
Token ID290
Feature activation+1.86e-02

Pos logit contributions
 least+0.025
onel+0.022
ird+0.021
col+0.021
issy+0.021
Neg logit contributions
What-0.024
Really-0.023
Who-0.019
It-0.018
 boy-0.018
Token Sue
Token ID26089
Feature activation+2.67e-02

Pos logit contributions
What+0.033
Really+0.031
+0.029
ller+0.027
It+0.027
Neg logit contributions
 least-0.027
onel-0.026
col-0.026
 er-0.025
ird-0.024
Token,
Token ID11
Feature activation-1.99e-03

Pos logit contributions
What+0.165
Really+0.151
 boy+0.143
Who+0.140
It+0.139
Neg logit contributions
 least-0.085
onel-0.077
ird-0.073
 er-0.070
ig-0.064
Token thirty
Token ID12277
Feature activation+1.18e-01

Pos logit contributions
 least+0.003
 er+0.002
onel+0.002
bar+0.002
ird+0.002
Neg logit contributions
What-0.004
 boy-0.003
Really-0.003
Spot-0.003
 man-0.003
Token-
Token ID12
Feature activation-4.58e-03

Pos logit contributions
What+0.215
Really+0.188
Who+0.170
It+0.169
 boy+0.166
Neg logit contributions
 least-0.188
onel-0.185
ird-0.183
 er-0.178
ig-0.176
Tokennine
Token ID30888
Feature activation+5.46e-02

Pos logit contributions
 least+0.009
onel+0.008
ird+0.008
col+0.008
issy+0.007
Neg logit contributions
What-0.007
Really-0.006
 boy-0.006
Who-0.005
 man-0.005
Token,
Token ID11
Feature activation+8.22e-03

Pos logit contributions
What+0.122
Really+0.111
 boy+0.103
Who+0.102
It+0.100
Neg logit contributions
 least-0.067
onel-0.062
ird-0.059
 er-0.055
issy-0.052
Token forty
Token ID16571
Feature activation+1.51e-01

Pos logit contributions
What+0.015
 boy+0.014
Really+0.014
Spot+0.013
 man+0.013
Neg logit contributions
 least-0.012
 er-0.010
onel-0.010
bar-0.010
ird-0.009
Token,
Token ID11
Feature activation+1.52e-02

Pos logit contributions
What+0.289
Really+0.253
Who+0.236
It+0.228
Can+0.224
Neg logit contributions
 least-0.226
onel-0.219
ird-0.217
ig-0.216
 er-0.214
Token forty
Token ID16571
Feature activation+1.58e-01

Pos logit contributions
What+0.027
Really+0.025
 boy+0.025
 man+0.023
Spot+0.023
Neg logit contributions
 least-0.023
onel-0.020
 er-0.020
bar-0.020
ird-0.018
Token-
Token ID12
Feature activation+1.44e-02

Pos logit contributions
What+0.309
Really+0.272
Who+0.255
It+0.246
Can+0.242
Neg logit contributions
 least-0.226
ig-0.219
ird-0.217
onel-0.216
 er-0.214
Tokenone
Token ID505
Feature activation+7.78e-02

Pos logit contributions
What+0.022
Really+0.020
 boy+0.020
Who+0.017
 man+0.017
Neg logit contributions
 least-0.026
onel-0.026
ird-0.024
col-0.024
issy-0.022
Token,
Token ID11
Feature activation+1.13e-02

Pos logit contributions
What+0.158
Really+0.146
 boy+0.136
Who+0.131
It+0.128
Neg logit contributions
 least-0.108
onel-0.103
ird-0.095
 er-0.089
col-0.087
Token but
Token ID475
Feature activation+9.46e-04

Pos logit contributions
 least+0.006
onel+0.005
 er+0.005
ird+0.005
col+0.005
Neg logit contributions
What-0.006
Really-0.006
 boy-0.005
-0.005
It-0.005
Token he
Token ID339
Feature activation+2.24e-02

Pos logit contributions
What+0.001
Really+0.001
 boy+0.001
+0.001
Who+0.001
Neg logit contributions
 least-0.002
onel-0.002
ird-0.002
 er-0.002
bar-0.002
Token was
Token ID373
Feature activation-5.13e-03

Pos logit contributions
What+0.036
Really+0.032
 boy+0.030
Who+0.028
It+0.027
Neg logit contributions
 least-0.042
onel-0.040
ird-0.037
col-0.035
issy-0.035
Token a
Token ID257
Feature activation+3.73e-02

Pos logit contributions
 least+0.009
onel+0.008
ird+0.007
 er+0.007
inda+0.007
Neg logit contributions
What-0.009
Really-0.008
 boy-0.008
Who-0.007
-0.007
Token messy
Token ID24097
Feature activation+1.99e-02

Pos logit contributions
What+0.050
Really+0.045
 boy+0.040
It+0.037
Who+0.037
Neg logit contributions
 least-0.076
onel-0.074
ird-0.068
col-0.067
issy-0.065
Token eater
Token ID38338
Feature activation+1.50e-01

Pos logit contributions
What+0.029
Really+0.027
It+0.023
Who+0.022
How+0.021
Neg logit contributions
 least-0.039
onel-0.036
 er-0.035
ird-0.034
auc-0.034
Token.
Token ID13
Feature activation-3.46e-03

Pos logit contributions
What+0.200
 boy+0.188
Really+0.175
 man+0.170
It+0.154
Neg logit contributions
 least-0.300
onel-0.296
col-0.270
aved-0.270
oken-0.268
Token His
Token ID2399
Feature activation+1.66e-02

Pos logit contributions
 least+0.006
onel+0.006
ird+0.006
col+0.005
issy+0.005
Neg logit contributions
What-0.005
Really-0.005
 boy-0.005
Who-0.004
 man-0.004
Token mom
Token ID1995
Feature activation+2.67e-02

Pos logit contributions
What+0.023
Really+0.020
Who+0.018
It+0.017
 boy+0.017
Neg logit contributions
 least-0.035
onel-0.033
 er-0.030
ird-0.030
issy-0.029
Token always
Token ID1464
Feature activation+9.16e-02

Pos logit contributions
What+0.047
Really+0.043
 boy+0.041
Who+0.038
It+0.037
Neg logit contributions
 least-0.045
onel-0.043
ird-0.039
col-0.038
issy-0.037
Token gave
Token ID2921
Feature activation+3.75e-03

Pos logit contributions
What+0.183
Really+0.173
 boy+0.157
Who+0.153
It+0.147
Neg logit contributions
 least-0.134
onel-0.120
ird-0.116
issy-0.107
 er-0.104
Token tidy
Token ID43044
Feature activation+3.66e-02

Pos logit contributions
What+0.047
Really+0.043
 boy+0.042
 man+0.036
It+0.036
Neg logit contributions
onel-0.066
 least-0.066
ird-0.060
col-0.059
issy-0.058
Token room
Token ID2119
Feature activation+3.89e-02

Pos logit contributions
What+0.064
Really+0.059
Who+0.052
It+0.051
+0.049
Neg logit contributions
 least-0.063
onel-0.057
ird-0.054
col-0.053
 er-0.053
Token and
Token ID290
Feature activation+3.43e-02

Pos logit contributions
What+0.060
Really+0.054
 boy+0.051
Who+0.046
It+0.046
Neg logit contributions
 least-0.073
onel-0.071
ird-0.065
col-0.064
aved-0.062
Token a
Token ID257
Feature activation+6.43e-02

Pos logit contributions
What+0.052
Really+0.046
 boy+0.045
Who+0.040
It+0.040
Neg logit contributions
 least-0.065
onel-0.064
ird-0.058
col-0.057
aved-0.056
Token sweet
Token ID6029
Feature activation+7.00e-02

Pos logit contributions
What+0.093
Really+0.084
 boy+0.082
 man+0.072
It+0.071
Neg logit contributions
 least-0.126
onel-0.126
ird-0.113
col-0.112
aved-0.111
Token dessert
Token ID23084
Feature activation+1.50e-01

Pos logit contributions
What+0.137
Really+0.129
How+0.114
It+0.114
+0.113
Neg logit contributions
 least-0.100
onel-0.089
ird-0.087
 er-0.084
col-0.083
Token.
Token ID13
Feature activation+3.75e-02

Pos logit contributions
What+0.254
Really+0.233
 boy+0.215
Who+0.201
It+0.200
Neg logit contributions
 least-0.260
onel-0.250
ird-0.229
col-0.221
auc-0.216
Token<|endoftext|>
Token ID50256
Feature activation+4.14e-02

Pos logit contributions
What+0.049
Really+0.044
 boy+0.041
 man+0.037
Who+0.037
Neg logit contributions
 least-0.077
onel-0.074
ird-0.069
col-0.068
issy-0.067
TokenL
Token ID43
Feature activation-6.54e-02

Pos logit contributions
What+0.060
Really+0.059
 boy+0.051
Who+0.048
ller+0.047
Neg logit contributions
 least-0.076
onel-0.073
ird-0.068
col-0.067
aved-0.066
Tokenily
Token ID813
Feature activation+1.74e-03

Pos logit contributions
onel+0.126
 least+0.119
aved+0.113
col+0.110
bar+0.107
Neg logit contributions
What-0.094
 boy-0.091
Tom-0.079
 man-0.079
It-0.076
Token likes
Token ID7832
Feature activation+2.33e-03

Pos logit contributions
What+0.002
Really+0.002
 boy+0.002
Who+0.002
It+0.002
Neg logit contributions
 least-0.004
onel-0.003
ird-0.003
col-0.003
issy-0.003
Token games
Token ID1830
Feature activation+3.42e-02

Pos logit contributions
What+0.001
Really+0.001
 boy+0.001
+0.000
It+0.000
Neg logit contributions
 least-0.001
onel-0.001
ird-0.001
col-0.001
auc-0.001
Token,
Token ID11
Feature activation-1.14e-02

Pos logit contributions
What+0.050
 boy+0.044
Really+0.044
 man+0.040
It+0.039
Neg logit contributions
 least-0.066
onel-0.065
ird-0.058
issy-0.057
col-0.056
Token and
Token ID290
Feature activation+1.87e-02

Pos logit contributions
 least+0.022
onel+0.021
col+0.019
ird+0.019
auc+0.018
Neg logit contributions
What-0.017
Really-0.016
 boy-0.014
-0.013
Who-0.013
Token a
Token ID257
Feature activation+5.52e-02

Pos logit contributions
What+0.032
Really+0.030
 boy+0.027
+0.026
It+0.025
Neg logit contributions
 least-0.032
onel-0.031
ird-0.028
col-0.027
inda-0.026
Token giant
Token ID6175
Feature activation+9.23e-02

Pos logit contributions
What+0.093
Really+0.085
 boy+0.079
Who+0.074
It+0.074
Neg logit contributions
 least-0.094
onel-0.091
ird-0.082
col-0.081
auc-0.078
Token birthday
Token ID10955
Feature activation+1.50e-01

Pos logit contributions
What+0.164
Really+0.151
 boy+0.146
Who+0.133
It+0.132
Neg logit contributions
 least-0.149
onel-0.145
ird-0.130
col-0.127
issy-0.124
Token cake
Token ID12187
Feature activation+6.89e-02

Pos logit contributions
What+0.279
Really+0.258
 boy+0.250
Who+0.228
It+0.227
Neg logit contributions
 least-0.229
onel-0.223
ird-0.197
col-0.193
issy-0.189
Token.
Token ID13
Feature activation-3.23e-03

Pos logit contributions
What+0.118
Really+0.110
 boy+0.101
Who+0.094
It+0.093
Neg logit contributions
 least-0.117
onel-0.113
ird-0.103
col-0.102
auc-0.098
Token The
Token ID383
Feature activation-8.71e-03

Pos logit contributions
 least+0.005
onel+0.005
ird+0.005
col+0.005
issy+0.005
Neg logit contributions
What-0.006
Really-0.005
 boy-0.005
Who-0.004
It-0.004
Token kids
Token ID3988
Feature activation+9.24e-03

Pos logit contributions
 least+0.019
onel+0.018
ird+0.016
col+0.016
 er+0.016
Neg logit contributions
What-0.011
Really-0.010
-0.008
It-0.008
 Please-0.008
Token were
Token ID547
Feature activation-2.66e-02

Pos logit contributions
What+0.015
 boy+0.014
Really+0.014
 man+0.012
It+0.012
Neg logit contributions
 least-0.016
onel-0.016
ird-0.014
col-0.014
issy-0.013
Token,
Token ID11
Feature activation-8.95e-03

Pos logit contributions
What+0.203
Really+0.188
 boy+0.176
Who+0.170
It+0.169
Neg logit contributions
 least-0.127
onel-0.119
ird-0.110
 er-0.102
col-0.098
Token forty
Token ID16571
Feature activation+1.37e-01

Pos logit contributions
 least+0.014
onel+0.013
bar+0.012
 er+0.012
ird+0.012
Neg logit contributions
What-0.016
Really-0.015
 boy-0.014
 man-0.013
Spot-0.013
Token-
Token ID12
Feature activation-1.09e-02

Pos logit contributions
What+0.236
Really+0.207
Who+0.188
It+0.184
 boy+0.183
Neg logit contributions
 least-0.235
onel-0.226
ird-0.219
ig-0.215
 er-0.213
Tokenthree
Token ID15542
Feature activation+7.93e-02

Pos logit contributions
 least+0.020
onel+0.019
col+0.018
ird+0.018
issy+0.017
Neg logit contributions
What-0.017
Really-0.015
 boy-0.015
Who-0.013
 man-0.013
Token,
Token ID11
Feature activation+4.40e-04

Pos logit contributions
What+0.162
Really+0.150
 boy+0.143
Who+0.134
It+0.132
Neg logit contributions
 least-0.110
onel-0.105
ird-0.096
col-0.089
issy-0.087
Token forty
Token ID16571
Feature activation+1.49e-01

Pos logit contributions
What+0.001
Really+0.001
 boy+0.001
 man+0.001
Spot+0.001
Neg logit contributions
 least-0.001
onel-0.001
bar-0.001
 er-0.001
ird-0.001
Token-
Token ID12
Feature activation-2.52e-03

Pos logit contributions
What+0.272
Really+0.238
Who+0.220
It+0.214
 boy+0.207
Neg logit contributions
 least-0.239
onel-0.227
ig-0.225
ird-0.225
 er-0.219
Tokenfour
Token ID14337
Feature activation+8.68e-02

Pos logit contributions
 least+0.005
onel+0.004
ird+0.004
col+0.004
issy+0.004
Neg logit contributions
What-0.004
Really-0.003
 boy-0.003
Who-0.003
 man-0.003
Token,
Token ID11
Feature activation-2.71e-02

Pos logit contributions
What+0.172
Really+0.160
 boy+0.153
Who+0.142
It+0.141
Neg logit contributions
 least-0.124
onel-0.120
ird-0.107
col-0.102
aved-0.099
Token forty
Token ID16571
Feature activation+1.16e-01

Pos logit contributions
 least+0.045
onel+0.042
ird+0.039
 er+0.037
col+0.037
Neg logit contributions
What-0.047
Really-0.042
 boy-0.041
-0.037
Who-0.037
Token-
Token ID12
Feature activation-2.64e-02

Pos logit contributions
What+0.207
Really+0.182
Who+0.165
It+0.159
 boy+0.155
Neg logit contributions
 least-0.188
onel-0.186
ird-0.183
ig-0.175
 er-0.174
Token Lily
Token ID20037
Feature activation-1.33e-02

Pos logit contributions
What+0.027
Really+0.024
 boy+0.023
Who+0.020
It+0.020
Neg logit contributions
 least-0.034
onel-0.033
ird-0.030
col-0.030
issy-0.029
Token went
Token ID1816
Feature activation+5.01e-03

Pos logit contributions
 least+0.025
onel+0.024
ird+0.022
col+0.021
issy+0.021
Neg logit contributions
What-0.021
Really-0.019
 boy-0.018
Who-0.016
It-0.016
Token to
Token ID284
Feature activation+3.18e-02

Pos logit contributions
What+0.006
 boy+0.005
Really+0.005
 man+0.005
It+0.005
Neg logit contributions
onel-0.011
 least-0.011
ird-0.010
col-0.009
issy-0.009
Token a
Token ID257
Feature activation+3.68e-02

Pos logit contributions
What+0.049
Really+0.047
 boy+0.042
+0.039
It+0.039
Neg logit contributions
 least-0.060
onel-0.053
ird-0.050
 ease-0.049
col-0.049
Token big
Token ID1263
Feature activation+8.25e-03

Pos logit contributions
What+0.060
Really+0.055
 boy+0.051
Who+0.047
It+0.047
Neg logit contributions
 least-0.065
onel-0.061
ird-0.056
col-0.055
issy-0.054
Token restaurant
Token ID7072
Feature activation+1.48e-01

Pos logit contributions
What+0.014
Really+0.013
 boy+0.012
Who+0.011
It+0.011
Neg logit contributions
 least-0.014
onel-0.013
ird-0.012
col-0.012
issy-0.011
Token with
Token ID351
Feature activation+4.23e-02

Pos logit contributions
What+0.267
Really+0.258
Who+0.221
 boy+0.218
It+0.214
Neg logit contributions
 least-0.227
onel-0.205
ird-0.204
col-0.198
auc-0.195
Token Mom
Token ID11254
Feature activation+2.46e-03

Pos logit contributions
What+0.069
Really+0.065
 boy+0.060
+0.056
It+0.055
Neg logit contributions
 least-0.073
onel-0.071
ird-0.065
col-0.064
auc-0.062
Token and
Token ID290
Feature activation+1.64e-02

Pos logit contributions
What+0.004
Really+0.004
 boy+0.003
Who+0.003
It+0.003
Neg logit contributions
 least-0.004
onel-0.004
ird-0.004
col-0.004
issy-0.004
Token Dad
Token ID17415
Feature activation+5.84e-02

Pos logit contributions
What+0.023
Really+0.022
 boy+0.019
It+0.018
ller+0.017
Neg logit contributions
 least-0.032
onel-0.029
ird-0.028
col-0.027
auc-0.026
Token.
Token ID13
Feature activation+7.47e-03

Pos logit contributions
What+0.102
Really+0.096
 boy+0.088
Who+0.082
It+0.082
Neg logit contributions
 least-0.097
onel-0.092
ird-0.086
col-0.083
issy-0.080
Token gorilla
Token ID45314
Feature activation+1.39e-01

Pos logit contributions
What+0.073
Really+0.068
 boy+0.060
Who+0.056
It+0.055
Neg logit contributions
 least-0.091
onel-0.087
ird-0.082
col-0.081
issy-0.079
Token,
Token ID11
Feature activation+2.29e-02

Pos logit contributions
What+0.259
Really+0.242
 boy+0.221
Who+0.208
It+0.208
Neg logit contributions
 least-0.218
onel-0.211
ird-0.193
col-0.187
issy-0.180
Token and
Token ID290
Feature activation+4.73e-02

Pos logit contributions
What+0.033
Really+0.032
 boy+0.028
Who+0.026
It+0.026
Neg logit contributions
 least-0.045
onel-0.042
ird-0.039
col-0.038
 er-0.037
Token a
Token ID257
Feature activation+7.47e-02

Pos logit contributions
What+0.076
Really+0.072
+0.063
 boy+0.062
It+0.060
Neg logit contributions
 least-0.081
onel-0.076
ird-0.071
col-0.069
 er-0.069
Token baby
Token ID5156
Feature activation+8.36e-02

Pos logit contributions
What+0.112
Really+0.098
How+0.086
Who+0.082
It+0.081
Neg logit contributions
 least-0.134
onel-0.129
ird-0.126
 er-0.123
col-0.121
Token gorilla
Token ID45314
Feature activation+1.46e-01

Pos logit contributions
What+0.142
Really+0.131
It+0.112
How+0.111
Who+0.107
Neg logit contributions
 least-0.146
onel-0.140
ird-0.133
col-0.125
 er-0.125
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
What+0.255
Really+0.236
 boy+0.217
Who+0.202
It+0.202
Neg logit contributions
 least-0.246
onel-0.241
ird-0.219
col-0.212
aved-0.207
Token The
Token ID383
Feature activation+7.10e-03

Pos logit contributions
What+0.034
Really+0.030
 boy+0.028
+0.026
Who+0.026
Neg logit contributions
 least-0.042
onel-0.040
ird-0.039
col-0.037
issy-0.037
Token baby
Token ID5156
Feature activation+3.80e-02

Pos logit contributions
What+0.011
Really+0.010
It+0.009
+0.009
How+0.008
Neg logit contributions
 least-0.013
onel-0.012
 er-0.011
ovable-0.010
ird-0.010
Token gorilla
Token ID45314
Feature activation+1.09e-01

Pos logit contributions
What+0.067
Really+0.059
How+0.052
It+0.050
Who+0.050
Neg logit contributions
 least-0.061
onel-0.057
ird-0.053
 er-0.051
col-0.051
Token was
Token ID373
Feature activation-8.95e-04

Pos logit contributions
What+0.187
Really+0.169
Who+0.142
How+0.140
 boy+0.139
Neg logit contributions
 least-0.191
onel-0.185
ird-0.168
col-0.165
 er-0.161
Token has
Token ID468
Feature activation+1.10e-02

Pos logit contributions
What+0.012
Really+0.011
 boy+0.010
Who+0.009
It+0.009
Neg logit contributions
 least-0.017
onel-0.017
ird-0.015
col-0.015
aved-0.015
Token a
Token ID257
Feature activation+2.71e-02

Pos logit contributions
What+0.016
Really+0.014
 boy+0.014
It+0.012
Who+0.012
Neg logit contributions
 least-0.022
onel-0.021
ird-0.019
col-0.019
issy-0.019
Token dragon
Token ID10441
Feature activation+8.64e-02

Pos logit contributions
What+0.045
Really+0.041
 boy+0.038
Who+0.035
It+0.034
Neg logit contributions
 least-0.047
onel-0.045
ird-0.041
col-0.041
 er-0.040
Token and
Token ID290
Feature activation+4.74e-02

Pos logit contributions
What+0.142
Really+0.133
 boy+0.119
Who+0.111
It+0.108
Neg logit contributions
 least-0.155
onel-0.151
ird-0.136
col-0.135
 er-0.130
Token a
Token ID257
Feature activation+6.30e-02

Pos logit contributions
What+0.070
Really+0.063
 boy+0.059
+0.053
It+0.053
Neg logit contributions
 least-0.091
onel-0.089
ird-0.081
col-0.080
issy-0.078
Token knight
Token ID22062
Feature activation+1.46e-01

Pos logit contributions
What+0.102
Really+0.092
 boy+0.084
Who+0.079
How+0.078
Neg logit contributions
 least-0.111
onel-0.109
ird-0.098
col-0.097
 er-0.095
Token and
Token ID290
Feature activation+5.26e-02

Pos logit contributions
What+0.248
Really+0.231
 boy+0.210
Who+0.193
It+0.192
Neg logit contributions
 least-0.252
onel-0.247
ird-0.224
col-0.219
aved-0.212
Token a
Token ID257
Feature activation+7.42e-02

Pos logit contributions
What+0.072
Really+0.064
 boy+0.062
Who+0.055
 man+0.054
Neg logit contributions
 least-0.106
onel-0.104
ird-0.096
col-0.094
aved-0.093
Token castle
Token ID16669
Feature activation+1.14e-01

Pos logit contributions
What+0.119
Really+0.107
 boy+0.098
Who+0.092
It+0.090
Neg logit contributions
 least-0.133
onel-0.130
ird-0.117
col-0.116
 er-0.112
Token.
Token ID13
Feature activation+3.01e-02

Pos logit contributions
What+0.176
Really+0.160
 boy+0.152
Who+0.137
It+0.135
Neg logit contributions
 least-0.212
onel-0.208
ird-0.188
col-0.186
aved-0.181
Token Lily
Token ID20037
Feature activation-2.85e-04

Pos logit contributions
What+0.039
Really+0.035
 boy+0.033
Who+0.029
 man+0.029
Neg logit contributions
 least-0.062
onel-0.061
ird-0.056
col-0.055
issy-0.054
Token
Token ID6184
Feature activation+3.66e-02

Pos logit contributions
 least+0.013
onel+0.013
ird+0.012
inda+0.012
issy+0.012
Neg logit contributions
What-0.008
Really-0.007
 boy-0.006
Spot-0.006
Who-0.006
Token
Token ID95
Feature activation+5.65e-02

Pos logit contributions
What+0.067
Really+0.061
 boy+0.060
Who+0.054
It+0.054
Neg logit contributions
 least-0.058
onel-0.056
ird-0.051
col-0.049
aved-0.048
Token
Token ID26391
Feature activation+4.39e-02

Pos logit contributions
What+0.102
Really+0.094
 boy+0.091
Who+0.082
It+0.082
Neg logit contributions
 least-0.091
onel-0.087
ird-0.079
col-0.076
issy-0.075
Token
Token ID129
Feature activation+1.13e-02

Pos logit contributions
What+0.075
Really+0.068
 boy+0.062
Who+0.058
How+0.054
Neg logit contributions
 least-0.076
onel-0.072
ird-0.072
issy-0.068
aved-0.065
Token
Token ID241
Feature activation+6.19e-02

Pos logit contributions
What+0.019
Really+0.017
 boy+0.017
Who+0.015
It+0.015
Neg logit contributions
 least-0.020
onel-0.019
ird-0.017
col-0.017
issy-0.017
TokenVery
Token ID16371
Feature activation+1.45e-01

Pos logit contributions
What+0.064
Really+0.063
 boy+0.059
+0.057
ller+0.056
Neg logit contributions
 least-0.122
ird-0.116
col-0.114
onel-0.109
inda-0.108
Token well
Token ID880
Feature activation+7.09e-02

Pos logit contributions
What+0.245
Really+0.220
 boy+0.204
Who+0.194
It+0.190
Neg logit contributions
 least-0.251
onel-0.241
ird-0.218
inda-0.207
col-0.204
Token.
Token ID13
Feature activation-6.16e-03

Pos logit contributions
What+0.139
Really+0.129
 boy+0.117
Who+0.115
It+0.110
Neg logit contributions
 least-0.106
onel-0.101
ird-0.091
col-0.086
inda-0.085
Token I
Token ID314
Feature activation-1.78e-02

Pos logit contributions
 least+0.012
onel+0.012
ird+0.011
col+0.011
aved+0.010
Neg logit contributions
What-0.009
Really-0.008
 boy-0.008
Who-0.007
It-0.007
Token have
Token ID423
Feature activation+2.83e-02

Pos logit contributions
onel+0.032
 least+0.032
ird+0.028
col+0.028
auc+0.028
Neg logit contributions
What-0.028
 boy-0.024
Really-0.024
It-0.022
 man-0.022
Token a
Token ID257
Feature activation+1.34e-03

Pos logit contributions
What+0.043
Really+0.039
 boy+0.037
Who+0.033
It+0.033
Neg logit contributions
 least-0.053
onel-0.052
ird-0.047
col-0.046
issy-0.045
Token her
Token ID607
Feature activation+4.61e-02

Pos logit contributions
What+0.014
Really+0.013
 boy+0.012
Who+0.011
It+0.011
Neg logit contributions
 least-0.017
onel-0.017
ird-0.015
col-0.015
issy-0.015
Token mom
Token ID1995
Feature activation+3.01e-02

Pos logit contributions
What+0.060
Really+0.053
Who+0.043
It+0.042
How+0.042
Neg logit contributions
 least-0.099
onel-0.093
ird-0.088
 er-0.086
issy-0.085
Token went
Token ID1816
Feature activation-6.42e-04

Pos logit contributions
What+0.050
Really+0.045
 boy+0.045
Who+0.040
 man+0.040
Neg logit contributions
 least-0.052
onel-0.052
ird-0.046
col-0.045
aved-0.044
Token to
Token ID284
Feature activation+2.50e-02

Pos logit contributions
 least+0.001
onel+0.001
ird+0.001
col+0.001
issy+0.001
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
Who-0.001
It-0.001
Token a
Token ID257
Feature activation+3.37e-02

Pos logit contributions
What+0.038
Really+0.036
 boy+0.033
It+0.030
+0.030
Neg logit contributions
 least-0.048
onel-0.042
ird-0.040
 ease-0.039
col-0.039
Token restaurant
Token ID7072
Feature activation+1.44e-01

Pos logit contributions
What+0.055
Really+0.051
 boy+0.047
Who+0.044
It+0.043
Neg logit contributions
 least-0.059
onel-0.055
ird-0.051
col-0.050
auc-0.049
Token for
Token ID329
Feature activation+6.96e-02

Pos logit contributions
What+0.260
Really+0.253
Who+0.215
 boy+0.212
It+0.209
Neg logit contributions
 least-0.219
onel-0.197
ird-0.196
col-0.186
auc-0.186
Token dinner
Token ID8073
Feature activation+7.34e-02

Pos logit contributions
Really+0.122
What+0.121
 boy+0.110
+0.105
It+0.102
Neg logit contributions
 least-0.112
ird-0.096
onel-0.095
oken-0.087
 er-0.086
Token.
Token ID13
Feature activation-9.64e-04

Pos logit contributions
What+0.142
Really+0.136
 boy+0.121
Who+0.118
It+0.115
Neg logit contributions
 least-0.107
onel-0.097
ird-0.097
col-0.087
auc-0.086
Token Lily
Token ID20037
Feature activation-2.15e-02

Pos logit contributions
 least+0.002
onel+0.002
ird+0.002
col+0.002
issy+0.002
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
Who-0.001
It-0.001
Token liked
Token ID8288
Feature activation-1.04e-02

Pos logit contributions
 least+0.041
onel+0.038
ird+0.037
issy+0.035
col+0.035
Neg logit contributions
What-0.033
Really-0.030
Who-0.024
 boy-0.024
It-0.024
Token a
Token ID257
Feature activation+6.03e-02

Pos logit contributions
What+0.015
Really+0.014
 boy+0.013
+0.012
Who+0.012
Neg logit contributions
 least-0.019
onel-0.018
ird-0.017
col-0.016
issy-0.016
Token nurse
Token ID15849
Feature activation+8.54e-02

Pos logit contributions
What+0.088
Really+0.080
 boy+0.073
Who+0.067
It+0.067
Neg logit contributions
 least-0.116
onel-0.112
ird-0.104
col-0.101
issy-0.099
Token,
Token ID11
Feature activation+2.28e-02

Pos logit contributions
What+0.147
Really+0.134
 boy+0.131
 man+0.118
Who+0.117
Neg logit contributions
 least-0.142
onel-0.140
ird-0.124
col-0.122
aved-0.121
Token and
Token ID290
Feature activation+4.46e-02

Pos logit contributions
What+0.030
Really+0.027
 boy+0.025
Who+0.022
It+0.022
Neg logit contributions
 least-0.048
onel-0.046
ird-0.043
col-0.042
issy-0.041
Token a
Token ID257
Feature activation+6.95e-02

Pos logit contributions
What+0.064
Really+0.057
 boy+0.055
Who+0.048
It+0.048
Neg logit contributions
 least-0.088
onel-0.086
ird-0.078
col-0.077
aved-0.076
Token writer
Token ID6260
Feature activation+1.43e-01

Pos logit contributions
What+0.095
Really+0.084
 boy+0.078
Who+0.070
It+0.070
Neg logit contributions
 least-0.142
onel-0.138
ird-0.128
col-0.125
issy-0.122
Token.
Token ID13
Feature activation+2.37e-02

Pos logit contributions
What+0.247
Really+0.228
 boy+0.214
Who+0.197
It+0.196
Neg logit contributions
 least-0.241
onel-0.233
ird-0.214
col-0.205
issy-0.201
Token She
Token ID1375
Feature activation+1.16e-02

Pos logit contributions
What+0.032
Really+0.028
 boy+0.026
It+0.023
Who+0.023
Neg logit contributions
onel-0.051
 least-0.050
aved-0.045
ird-0.045
col-0.045
Token had
Token ID550
Feature activation+2.51e-02

Pos logit contributions
What+0.019
Really+0.017
 boy+0.016
Who+0.014
It+0.014
Neg logit contributions
 least-0.021
onel-0.020
ird-0.019
col-0.018
issy-0.018
Token helped
Token ID4193
Feature activation-2.00e-02

Pos logit contributions
What+0.040
Really+0.037
 boy+0.035
Who+0.031
+0.031
Neg logit contributions
 least-0.045
onel-0.044
ird-0.040
col-0.039
issy-0.038
Token many
Token ID867
Feature activation-4.76e-02

Pos logit contributions
 least+0.037
onel+0.036
ird+0.032
col+0.032
aved+0.032
Neg logit contributions
What-0.031
Really-0.028
 boy-0.027
 man-0.024
Who-0.024
Token ball
Token ID2613
Feature activation+3.22e-02

Pos logit contributions
What+0.113
Really+0.102
 boy+0.096
Who+0.088
It+0.087
Neg logit contributions
 least-0.135
onel-0.131
ird-0.121
col-0.117
issy-0.115
Token,
Token ID11
Feature activation+2.98e-02

Pos logit contributions
 boy+0.041
What+0.038
 man+0.037
 lady+0.033
Really+0.032
Neg logit contributions
onel-0.068
 least-0.066
oken-0.063
 ease-0.061
aved-0.060
Token and
Token ID290
Feature activation+6.35e-02

Pos logit contributions
What+0.042
Really+0.039
 boy+0.035
Who+0.032
+0.031
Neg logit contributions
 least-0.060
onel-0.057
ird-0.053
col-0.052
issy-0.051
Token a
Token ID257
Feature activation+7.27e-02

Pos logit contributions
What+0.076
Really+0.064
 boy+0.064
It+0.055
Who+0.054
Neg logit contributions
onel-0.142
 least-0.142
aved-0.128
col-0.127
ird-0.127
Token t
Token ID256
Feature activation+7.56e-02

Pos logit contributions
What+0.113
Really+0.102
 boy+0.098
Who+0.088
It+0.087
Neg logit contributions
 least-0.134
onel-0.131
ird-0.119
col-0.117
issy-0.114
Tokeneddy
Token ID21874
Feature activation+1.43e-01

Pos logit contributions
What+0.086
Really+0.075
 boy+0.066
Who+0.060
It+0.057
Neg logit contributions
 least-0.173
onel-0.171
ird-0.160
col-0.157
issy-0.155
Token bear
Token ID6842
Feature activation+7.87e-02

Pos logit contributions
What+0.249
Really+0.236
It+0.202
Who+0.202
How+0.192
Neg logit contributions
 least-0.242
onel-0.219
ird-0.208
 er-0.206
ig-0.196
Token.
Token ID13
Feature activation+3.53e-02

Pos logit contributions
What+0.112
 boy+0.102
Really+0.099
 man+0.092
+0.087
Neg logit contributions
 least-0.151
onel-0.150
col-0.133
oken-0.132
aved-0.131
Token She
Token ID1375
Feature activation+1.66e-02

Pos logit contributions
What+0.059
 boy+0.055
Really+0.055
 man+0.051
 fisherman+0.049
Neg logit contributions
 least-0.052
onel-0.051
ird-0.049
ibr-0.045
issy-0.045
Token liked
Token ID8288
Feature activation+9.94e-04

Pos logit contributions
What+0.025
Really+0.023
 boy+0.020
+0.019
Who+0.019
Neg logit contributions
 least-0.032
onel-0.031
ird-0.029
col-0.028
issy-0.028
Token her
Token ID607
Feature activation-4.97e-03

Pos logit contributions
What+0.002
Really+0.002
 boy+0.001
+0.001
It+0.001
Neg logit contributions
 least-0.002
onel-0.002
col-0.001
 er-0.001
ird-0.001
Token.
Token ID13
Feature activation+2.02e-03

Pos logit contributions
 least+0.049
onel+0.049
ird+0.044
auc+0.044
issy+0.044
Neg logit contributions
What-0.039
Really-0.034
 boy-0.034
 man-0.031
It-0.030
Token
Token ID220
Feature activation-2.43e-02

Pos logit contributions
What+0.002
Really+0.002
 boy+0.002
It+0.001
Who+0.001
Neg logit contributions
onel-0.005
 least-0.005
ird-0.004
col-0.004
auc-0.004
Token
Token ID198
Feature activation-4.40e-03

Pos logit contributions
 least+0.052
onel+0.050
ird+0.047
issy+0.045
col+0.045
Neg logit contributions
What-0.031
Really-0.028
 boy-0.026
Who-0.023
 man-0.022
Token
Token ID198
Feature activation-2.37e-04

Pos logit contributions
 least+0.008
onel+0.008
ird+0.007
col+0.007
 ease+0.007
Neg logit contributions
What-0.006
Really-0.006
 boy-0.006
Who-0.005
 Please-0.005
TokenThe
Token ID464
Feature activation+1.21e-02

Pos logit contributions
 least+0.000
onel+0.000
ird+0.000
col+0.000
issy+0.000
Neg logit contributions
What-0.000
Really-0.000
 boy-0.000
 man-0.000
Who-0.000
Token End
Token ID5268
Feature activation+1.42e-01

Pos logit contributions
What+0.020
Really+0.019
ller+0.017
 boy+0.017
+0.016
Neg logit contributions
 least-0.020
onel-0.018
oken-0.016
 er-0.016
ird-0.016
Token.
Token ID13
Feature activation+8.49e-04

Pos logit contributions
What+0.259
Really+0.255
 boy+0.236
Who+0.218
+0.214
Neg logit contributions
 least-0.200
ird-0.192
onel-0.183
col-0.176
inda-0.165
Token<|endoftext|>
Token ID50256
Feature activation+3.42e-02

Pos logit contributions
What+0.001
Really+0.001
 boy+0.001
Who+0.001
It+0.001
Neg logit contributions
 least-0.002
onel-0.002
ird-0.002
col-0.002
issy-0.002
TokenOnce
Token ID7454
Feature activation+4.15e-02

Pos logit contributions
Really+0.049
What+0.049
 boy+0.043
Who+0.039
 man+0.039
Neg logit contributions
 least-0.062
onel-0.059
ird-0.056
col-0.055
aved-0.055
Token upon
Token ID2402
Feature activation+4.32e-02

Pos logit contributions
What+0.041
 boy+0.035
Really+0.034
 man+0.028
It+0.028
Neg logit contributions
 least-0.099
onel-0.098
ird-0.091
col-0.090
aved-0.088
Token a
Token ID257
Feature activation-1.95e-02

Pos logit contributions
Really+0.085
+0.078
What+0.078
 boy+0.073
ller+0.072
Neg logit contributions
 least-0.055
ird-0.050
onel-0.049
 ease-0.049
col-0.042
Token.
Token ID13
Feature activation-2.27e-03

Pos logit contributions
onel+0.042
 least+0.042
ird+0.039
issy+0.038
col+0.038
Neg logit contributions
What-0.024
 boy-0.022
Really-0.020
It-0.019
-0.018
Token
Token ID220
Feature activation-2.75e-02

Pos logit contributions
 least+0.004
ird+0.003
 ease+0.003
onel+0.003
col+0.003
Neg logit contributions
What-0.004
Really-0.003
 boy-0.003
 raven-0.003
Who-0.003
Token
Token ID198
Feature activation-1.12e-02

Pos logit contributions
 least+0.056
onel+0.054
ird+0.051
issy+0.049
col+0.049
Neg logit contributions
What-0.037
Really-0.034
 boy-0.032
Who-0.029
 man-0.028
Token
Token ID198
Feature activation-7.11e-03

Pos logit contributions
 least+0.017
onel+0.017
ird+0.017
 ease+0.017
col+0.016
Neg logit contributions
Really-0.017
What-0.017
 boy-0.017
Who-0.015
 Please-0.014
TokenThe
Token ID464
Feature activation+9.32e-03

Pos logit contributions
ird+0.012
col+0.012
 least+0.012
onel+0.012
 ease+0.011
Neg logit contributions
 boy-0.009
 MacDonald-0.008
Really-0.008
What-0.007
 man-0.007
Token End
Token ID5268
Feature activation+1.42e-01

Pos logit contributions
What+0.014
Really+0.013
 boy+0.012
It+0.011
+0.011
Neg logit contributions
 least-0.018
onel-0.017
ird-0.015
 er-0.015
issy-0.015
Token.
Token ID13
Feature activation-2.75e-03

Pos logit contributions
What+0.284
Really+0.283
 boy+0.257
Who+0.244
+0.239
Neg logit contributions
ird-0.165
 least-0.155
col-0.152
onel-0.139
auc-0.131
Token<|endoftext|>
Token ID50256
Feature activation+3.26e-02

Pos logit contributions
 least+0.005
onel+0.005
ird+0.005
col+0.005
issy+0.004
Neg logit contributions
What-0.004
Really-0.004
 boy-0.003
 man-0.003
Who-0.003
TokenOnce
Token ID7454
Feature activation+3.95e-02

Pos logit contributions
Really+0.047
What+0.047
 boy+0.040
Who+0.038
ller+0.037
Neg logit contributions
 least-0.059
onel-0.056
ird-0.054
aved-0.053
col-0.052
Token there
Token ID612
Feature activation-1.90e-02

Pos logit contributions
What+0.040
 boy+0.034
Really+0.034
 man+0.027
It+0.027
Neg logit contributions
 least-0.094
onel-0.093
ird-0.086
col-0.085
aved-0.083
Token was
Token ID373
Feature activation-3.94e-02

Pos logit contributions
 least+0.042
onel+0.041
ird+0.038
col+0.037
issy+0.036
Neg logit contributions
What-0.023
Really-0.020
 boy-0.019
Who-0.016
It-0.016
Token a
Token ID257
Feature activation+6.69e-02

Pos logit contributions
What+0.032
Really+0.029
 boy+0.027
Who+0.024
It+0.024
Neg logit contributions
 least-0.050
onel-0.048
ird-0.045
col-0.044
issy-0.043
Token car
Token ID1097
Feature activation+4.49e-02

Pos logit contributions
What+0.108
Really+0.098
 boy+0.087
Who+0.085
It+0.082
Neg logit contributions
 least-0.115
onel-0.110
ird-0.104
 er-0.101
col-0.101
Token and
Token ID290
Feature activation+5.74e-02

Pos logit contributions
What+0.077
Really+0.070
 boy+0.070
 man+0.063
+0.062
Neg logit contributions
 least-0.075
onel-0.074
ird-0.065
col-0.064
aved-0.063
Token a
Token ID257
Feature activation+6.79e-02

Pos logit contributions
What+0.075
Really+0.066
 boy+0.063
Who+0.055
It+0.055
Neg logit contributions
 least-0.121
onel-0.120
ird-0.109
col-0.108
aved-0.106
Token t
Token ID256
Feature activation+7.60e-02

Pos logit contributions
What+0.106
Really+0.096
 boy+0.090
Who+0.083
It+0.082
Neg logit contributions
 least-0.123
onel-0.120
ird-0.110
col-0.107
aved-0.104
Tokeneddy
Token ID21874
Feature activation+1.42e-01

Pos logit contributions
What+0.099
Really+0.088
Who+0.075
 boy+0.072
How+0.069
Neg logit contributions
 least-0.161
onel-0.159
ird-0.149
auc-0.148
col-0.147
Token bear
Token ID6842
Feature activation+7.34e-02

Pos logit contributions
What+0.250
Really+0.235
It+0.204
Who+0.203
How+0.192
Neg logit contributions
 least-0.242
onel-0.214
 er-0.205
ird-0.204
ig-0.197
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
What+0.104
 boy+0.095
Really+0.092
 man+0.085
+0.081
Neg logit contributions
 least-0.141
onel-0.141
oken-0.124
ird-0.124
col-0.123
Token They
Token ID1119
Feature activation+3.45e-03

Pos logit contributions
What+0.016
Really+0.015
 boy+0.014
 man+0.013
Who+0.013
Neg logit contributions
 least-0.023
onel-0.021
ird-0.020
col-0.020
auc-0.020
Token like
Token ID588
Feature activation+5.65e-02

Pos logit contributions
What+0.006
Really+0.005
 boy+0.005
Who+0.005
It+0.005
Neg logit contributions
 least-0.006
onel-0.006
ird-0.005
col-0.005
issy-0.005
Token to
Token ID284
Feature activation-1.55e-03

Pos logit contributions
What+0.097
Really+0.095
+0.082
 boy+0.080
Who+0.077
Neg logit contributions
 least-0.089
 er-0.080
col-0.079
onel-0.079
 ease-0.076
Token.
Token ID13
Feature activation-8.45e-03

Pos logit contributions
 least+0.014
onel+0.013
col+0.012
inda+0.011
issy+0.011
Neg logit contributions
What-0.017
Really-0.016
Who-0.014
 boy-0.014
Spot-0.014
Token He
Token ID679
Feature activation+2.61e-02

Pos logit contributions
 least+0.019
onel+0.018
ird+0.017
col+0.017
issy+0.017
Neg logit contributions
What-0.010
Really-0.008
 boy-0.008
Who-0.007
It-0.007
Token watched
Token ID7342
Feature activation-1.54e-02

Pos logit contributions
What+0.044
Really+0.040
 boy+0.037
Who+0.035
It+0.034
Neg logit contributions
 least-0.045
onel-0.043
ird-0.039
col-0.039
issy-0.038
Token as
Token ID355
Feature activation+4.69e-02

Pos logit contributions
 least+0.029
onel+0.029
ird+0.026
col+0.026
issy+0.025
Neg logit contributions
What-0.023
Really-0.021
 boy-0.020
Who-0.018
It-0.018
Token Per
Token ID2448
Feature activation-1.18e-01

Pos logit contributions
What+0.068
Really+0.063
 boy+0.056
+0.051
Who+0.051
Neg logit contributions
 least-0.090
onel-0.088
ird-0.080
col-0.079
issy-0.079
Tokenc
Token ID66
Feature activation-1.85e-01

Pos logit contributions
 least+0.184
onel+0.183
ird+0.156
auc+0.155
col+0.154
Neg logit contributions
What-0.217
Really-0.199
 boy-0.192
Who-0.177
-0.176
Tokenival
Token ID2473
Feature activation-3.74e-02

Pos logit contributions
 least+0.289
onel+0.273
ird+0.247
col+0.245
auc+0.242
Neg logit contributions
What-0.329
 boy-0.303
Really-0.298
 man-0.282
It-0.280
Token cut
Token ID2005
Feature activation-5.83e-02

Pos logit contributions
 least+0.055
onel+0.051
ird+0.049
 er+0.046
bar+0.046
Neg logit contributions
What-0.070
Really-0.067
 boy-0.057
It-0.056
Who-0.056
Token out
Token ID503
Feature activation-6.36e-02

Pos logit contributions
 least+0.112
onel+0.109
ird+0.099
col+0.096
aved+0.094
Neg logit contributions
What-0.087
 boy-0.077
Really-0.076
 man-0.069
It-0.068
Token two
Token ID734
Feature activation-5.46e-02

Pos logit contributions
 least+0.099
onel+0.097
ird+0.087
col+0.085
issy+0.083
Neg logit contributions
What-0.116
Really-0.108
 boy-0.103
Who-0.094
It-0.094
Token shapes
Token ID15268
Feature activation-1.70e-02

Pos logit contributions
 least+0.103
onel+0.100
ird+0.090
col+0.090
issy+0.087
Neg logit contributions
What-0.083
Really-0.075
 boy-0.074
 man-0.065
Who-0.065
Token sand
Token ID6450
Feature activation-3.20e-02

Pos logit contributions
onel+0.015
 least+0.015
ird+0.014
issy+0.013
aved+0.013
Neg logit contributions
What-0.020
 boy-0.018
Really-0.018
 man-0.017
Who-0.016
Token and
Token ID290
Feature activation-1.39e-03

Pos logit contributions
 least+0.080
onel+0.080
ird+0.077
issy+0.076
oken+0.075
Neg logit contributions
 boy-0.024
What-0.023
 man-0.022
Really-0.017
-0.016
Token watched
Token ID7342
Feature activation-1.18e-02

Pos logit contributions
 least+0.002
onel+0.002
ird+0.002
col+0.002
issy+0.002
Neg logit contributions
What-0.002
Really-0.002
 boy-0.002
Who-0.002
It-0.002
Token as
Token ID355
Feature activation+5.12e-02

Pos logit contributions
 least+0.030
onel+0.029
ird+0.027
col+0.027
issy+0.027
Neg logit contributions
What-0.011
Really-0.009
 boy-0.008
Who-0.007
It-0.007
Token it
Token ID340
Feature activation-5.50e-02

Pos logit contributions
What+0.060
Really+0.052
 boy+0.049
Who+0.042
It+0.042
Neg logit contributions
 least-0.115
onel-0.113
ird-0.104
col-0.103
issy-0.101
Token sc
Token ID629
Feature activation-1.82e-01

Pos logit contributions
 least+0.097
onel+0.092
ird+0.087
col+0.082
issy+0.082
Neg logit contributions
What-0.091
 boy-0.081
Really-0.079
It-0.075
 man-0.074
Tokenur
Token ID333
Feature activation-1.30e-02

Pos logit contributions
 least+0.234
 tele+0.214
onel+0.210
issy+0.204
inda+0.195
Neg logit contributions
Really-0.303
What-0.285
He-0.277
"-0.273
It-0.269
Tokenried
Token ID2228
Feature activation+1.99e-02

Pos logit contributions
 least+0.022
onel+0.022
ird+0.020
issy+0.019
col+0.019
Neg logit contributions
What-0.022
Really-0.020
 boy-0.019
Who-0.018
-0.017
Token away
Token ID1497
Feature activation+6.61e-03

Pos logit contributions
What+0.036
 boy+0.032
Really+0.031
It+0.030
 man+0.030
Neg logit contributions
 least-0.033
onel-0.032
ird-0.030
issy-0.028
auc-0.027
Token.
Token ID13
Feature activation-1.64e-02

Pos logit contributions
What+0.010
 boy+0.008
Really+0.008
 man+0.008
Who+0.007
Neg logit contributions
 least-0.013
onel-0.013
ird-0.012
issy-0.012
auc-0.011
Token
Token ID198
Feature activation-1.75e-02

Pos logit contributions
 least+0.038
onel+0.037
ird+0.034
col+0.034
aved+0.033
Neg logit contributions
What-0.019
Really-0.016
 boy-0.015
It-0.013
Who-0.013
Token knew
Token ID2993
Feature activation-6.45e-02

Pos logit contributions
 least+0.080
onel+0.078
ird+0.072
aved+0.069
col+0.068
Neg logit contributions
What-0.074
Really-0.068
 boy-0.059
-0.056
It-0.056
Token how
Token ID703
Feature activation+5.63e-02

Pos logit contributions
onel+0.135
 least+0.134
aved+0.125
col+0.122
ird+0.121
Neg logit contributions
What-0.083
 boy-0.070
Really-0.069
 man-0.063
Who-0.061
Token to
Token ID284
Feature activation-5.01e-02

Pos logit contributions
What+0.089
Really+0.081
 boy+0.077
Who+0.069
+0.069
Neg logit contributions
 least-0.103
onel-0.100
ird-0.091
col-0.090
aved-0.087
Token play
Token ID711
Feature activation-2.85e-02

Pos logit contributions
 least+0.084
onel+0.080
ird+0.073
col+0.071
aved+0.070
Neg logit contributions
What-0.088
Really-0.082
 boy-0.077
Who-0.070
It-0.069
Token hop
Token ID1725
Feature activation-4.01e-02

Pos logit contributions
 least+0.053
onel+0.051
ird+0.048
col+0.046
 er+0.045
Neg logit contributions
What-0.042
Really-0.041
 boy-0.036
-0.034
Who-0.033
Tokensc
Token ID1416
Feature activation-1.77e-01

Pos logit contributions
 least+0.078
onel+0.077
ird+0.071
issy+0.068
oken+0.067
Neg logit contributions
What-0.059
 boy-0.052
Really-0.050
 man-0.046
It-0.046
Tokenot
Token ID313
Feature activation-7.15e-02

Pos logit contributions
 least+0.307
onel+0.286
ird+0.250
col+0.245
issy+0.242
Neg logit contributions
What-0.296
 boy-0.266
Really-0.255
It-0.247
"-0.247
Tokench
Token ID354
Feature activation-2.55e-02

Pos logit contributions
 least+0.137
onel+0.133
ird+0.117
issy+0.113
oken+0.112
Neg logit contributions
What-0.109
 boy-0.097
Really-0.089
 man-0.088
It-0.087
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
 least+0.049
onel+0.047
ird+0.043
col+0.042
issy+0.042
Neg logit contributions
What-0.038
 boy-0.033
Really-0.033
 man-0.030
It-0.030
Token
Token ID198
Feature activation-1.66e-02

Pos logit contributions
 least+0.019
onel+0.019
ird+0.017
col+0.017
issy+0.017
Neg logit contributions
What-0.016
Really-0.014
 boy-0.013
Who-0.012
It-0.012
Token
Token ID198
Feature activation-1.09e-02

Pos logit contributions
 least+0.032
onel+0.031
ird+0.029
col+0.029
issy+0.029
Neg logit contributions
What-0.022
Really-0.022
 boy-0.020
Who-0.018
-0.017
Token.
Token ID13
Feature activation-9.95e-03

Pos logit contributions
 least+0.035
onel+0.035
ird+0.032
issy+0.031
oken+0.031
Neg logit contributions
What-0.025
 boy-0.022
Really-0.022
 man-0.020
It-0.020
Token Rabbit
Token ID25498
Feature activation+1.83e-03

Pos logit contributions
 least+0.019
onel+0.018
ird+0.017
col+0.016
issy+0.016
Neg logit contributions
What-0.015
Really-0.014
 boy-0.013
It-0.012
Who-0.012
Token watched
Token ID7342
Feature activation-1.66e-02

Pos logit contributions
What+0.003
Really+0.003
 boy+0.003
Who+0.003
How+0.002
Neg logit contributions
 least-0.003
onel-0.003
col-0.003
ird-0.003
aved-0.003
Token how
Token ID703
Feature activation+5.19e-02

Pos logit contributions
 least+0.034
onel+0.034
ird+0.031
col+0.030
issy+0.030
Neg logit contributions
What-0.023
Really-0.020
 boy-0.019
Who-0.017
It-0.017
Token squirrel
Token ID33039
Feature activation-5.02e-02

Pos logit contributions
What+0.091
Really+0.084
Who+0.075
 MacDonald+0.072
How+0.071
Neg logit contributions
onel-0.081
 least-0.076
aved-0.074
col-0.073
 ease-0.071
Token sc
Token ID629
Feature activation-1.76e-01

Pos logit contributions
 least+0.089
onel+0.087
ird+0.078
col+0.076
issy+0.075
Neg logit contributions
What-0.083
Really-0.075
 boy-0.073
It-0.066
Who-0.065
Tokenur
Token ID333
Feature activation-2.52e-03

Pos logit contributions
 least+0.198
onel+0.171
auc+0.166
inda+0.152
 er+0.148
Neg logit contributions
Really-0.342
What-0.337
Spot-0.314
It-0.310
-0.308
Tokenried
Token ID2228
Feature activation+1.42e-02

Pos logit contributions
 least+0.004
onel+0.004
ird+0.004
col+0.004
aved+0.004
Neg logit contributions
What-0.004
Really-0.004
 boy-0.004
Who-0.004
It-0.004
Token up
Token ID510
Feature activation-1.45e-02

Pos logit contributions
What+0.026
Really+0.024
 boy+0.023
It+0.021
 man+0.021
Neg logit contributions
 least-0.022
onel-0.022
ird-0.020
col-0.019
issy-0.019
Token it
Token ID340
Feature activation-1.76e-02

Pos logit contributions
onel+0.031
 least+0.031
ird+0.028
issy+0.027
auc+0.027
Neg logit contributions
What-0.019
Really-0.016
 boy-0.016
 man-0.014
It-0.014
Token quickly
Token ID2952
Feature activation-5.78e-02

Pos logit contributions
 least+0.038
onel+0.037
ird+0.034
auc+0.033
col+0.033
Neg logit contributions
What-0.021
 boy-0.019
Really-0.019
 man-0.017
It-0.016
Token day
Token ID1110
Feature activation-6.52e-03

Pos logit contributions
What+0.032
 boy+0.030
Really+0.029
 man+0.027
It+0.026
Neg logit contributions
 least-0.031
onel-0.030
ird-0.027
col-0.027
issy-0.026
Token,
Token ID11
Feature activation-3.82e-02

Pos logit contributions
 least+0.010
onel+0.010
ird+0.009
col+0.009
issy+0.009
Neg logit contributions
What-0.012
Really-0.011
 boy-0.011
 man-0.010
Who-0.010
Token he
Token ID339
Feature activation-9.00e-04

Pos logit contributions
 least+0.063
onel+0.058
 ease+0.054
 er+0.053
ird+0.053
Neg logit contributions
What-0.062
Really-0.059
-0.053
 MacDonald-0.051
Spot-0.050
Token watched
Token ID7342
Feature activation-1.57e-02

Pos logit contributions
 least+0.002
onel+0.002
ird+0.001
col+0.001
issy+0.001
Neg logit contributions
What-0.002
Really-0.001
 boy-0.001
Who-0.001
It-0.001
Token as
Token ID355
Feature activation+5.19e-02

Pos logit contributions
 least+0.030
onel+0.029
col+0.026
ird+0.026
 ease+0.025
Neg logit contributions
What-0.023
Really-0.022
 boy-0.020
-0.018
Who-0.018
Token other
Token ID584
Feature activation-1.75e-01

Pos logit contributions
What+0.078
Really+0.073
 boy+0.064
+0.063
Who+0.059
Neg logit contributions
 least-0.096
onel-0.094
 ease-0.084
col-0.083
aved-0.081
Token pages
Token ID5468
Feature activation-8.05e-02

Pos logit contributions
 least+0.326
onel+0.322
inda+0.293
ird+0.290
 er+0.286
Neg logit contributions
What-0.252
Really-0.239
 boy-0.205
It-0.193
-0.191
Token ran
Token ID4966
Feature activation-6.71e-02

Pos logit contributions
 least+0.166
onel+0.156
ird+0.149
col+0.145
issy+0.145
Neg logit contributions
What-0.103
 boy-0.091
 man-0.087
It-0.085
Really-0.083
Token around
Token ID1088
Feature activation-1.67e-02

Pos logit contributions
 least+0.131
onel+0.128
ird+0.120
issy+0.112
auc+0.112
Neg logit contributions
What-0.101
 boy-0.093
Really-0.086
 man-0.085
It-0.078
Token and
Token ID290
Feature activation-1.28e-02

Pos logit contributions
 least+0.033
onel+0.032
ird+0.029
col+0.028
aved+0.028
Neg logit contributions
What-0.024
Really-0.022
 boy-0.021
It-0.019
Who-0.019
Token played
Token ID2826
Feature activation+2.19e-02

Pos logit contributions
 least+0.022
onel+0.021
col+0.019
ird+0.019
aved+0.018
Neg logit contributions
What-0.022
Really-0.020
 boy-0.019
It-0.017
Who-0.017
Token had
Token ID550
Feature activation-2.96e-03

Pos logit contributions
onel+0.081
 least+0.080
col+0.072
ird+0.072
auc+0.071
Neg logit contributions
What-0.054
 boy-0.048
Really-0.048
 man-0.042
Who-0.042
Token to
Token ID284
Feature activation-6.32e-04

Pos logit contributions
 least+0.005
onel+0.005
ird+0.005
col+0.005
issy+0.005
Neg logit contributions
What-0.005
Really-0.004
 boy-0.004
It-0.004
Who-0.004
Token know
Token ID760
Feature activation-5.13e-04

Pos logit contributions
 least+0.001
onel+0.001
ird+0.001
col+0.001
aved+0.001
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
Who-0.001
It-0.001
Token all
Token ID477
Feature activation-5.66e-02

Pos logit contributions
 least+0.001
onel+0.001
ird+0.001
col+0.001
aved+0.001
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
It-0.001
 man-0.001
Token about
Token ID546
Feature activation+3.75e-02

Pos logit contributions
onel+0.110
 least+0.109
ird+0.097
aved+0.095
auc+0.093
Neg logit contributions
What-0.085
 boy-0.075
Really-0.068
 man-0.067
It-0.066
Token microsc
Token ID21145
Feature activation-1.74e-01

Pos logit contributions
What+0.055
Really+0.051
 boy+0.048
Who+0.042
It+0.042
Neg logit contributions
 least-0.072
onel-0.070
ird-0.064
col-0.063
issy-0.061
Tokenopes
Token ID13920
Feature activation-7.50e-02

Pos logit contributions
 least+0.327
onel+0.317
ird+0.296
col+0.277
issy+0.275
Neg logit contributions
What-0.259
Really-0.235
 boy-0.219
-0.201
It-0.201
Token.
Token ID13
Feature activation-1.58e-02

Pos logit contributions
 least+0.147
onel+0.144
ird+0.131
col+0.129
issy+0.127
Neg logit contributions
What-0.108
Really-0.095
 boy-0.092
It-0.082
 man-0.082
Token
Token ID220
Feature activation-3.61e-02

Pos logit contributions
 least+0.024
onel+0.022
ird+0.021
 ease+0.020
col+0.019
Neg logit contributions
What-0.028
 boy-0.025
Really-0.025
 man-0.024
ller-0.023
Token
Token ID198
Feature activation-2.17e-02

Pos logit contributions
onel+0.069
 least+0.068
ird+0.065
issy+0.063
col+0.061
Neg logit contributions
What-0.050
Really-0.048
 boy-0.044
Who-0.040
 man-0.038
Token
Token ID198
Feature activation-1.70e-02

Pos logit contributions
 least+0.034
onel+0.033
ird+0.032
 ease+0.031
col+0.030
Neg logit contributions
What-0.031
 boy-0.031
Really-0.030
 MacDonald-0.030
ller-0.029
TokenShe
Token ID3347
Feature activation-8.89e-03

Pos logit contributions
ird+0.015
 least+0.015
onel+0.015
col+0.015
inda+0.014
Neg logit contributions
What-0.010
 boy-0.010
Really-0.009
ller-0.008
 man-0.008
Token watched
Token ID7342
Feature activation-1.14e-02

Pos logit contributions
 least+0.016
onel+0.016
ird+0.014
col+0.014
aved+0.014
Neg logit contributions
What-0.014
Really-0.013
 boy-0.013
It-0.011
Who-0.011
Token as
Token ID355
Feature activation+4.93e-02

Pos logit contributions
 least+0.024
onel+0.023
ird+0.021
col+0.021
issy+0.021
Neg logit contributions
What-0.015
Really-0.013
 boy-0.013
 man-0.011
Who-0.011
Token the
Token ID262
Feature activation-6.22e-02

Pos logit contributions
What+0.070
Really+0.063
 boy+0.060
Who+0.053
It+0.053
Neg logit contributions
 least-0.098
onel-0.097
ird-0.088
col-0.087
aved-0.085
Token mouse
Token ID10211
Feature activation-9.93e-02

Pos logit contributions
 least+0.118
onel+0.116
ird+0.105
col+0.103
issy+0.102
Neg logit contributions
What-0.095
Really-0.086
 boy-0.081
Who-0.073
It-0.072
Token sc
Token ID629
Feature activation-1.73e-01

Pos logit contributions
 least+0.179
onel+0.175
ird+0.157
upid+0.156
auc+0.155
Neg logit contributions
What-0.152
 boy-0.141
Really-0.133
 man-0.131
It-0.124
Tokenur
Token ID333
Feature activation-4.48e-03

Pos logit contributions
 least+0.217
 rect+0.169
onel+0.168
 tele+0.159
col+0.159
Neg logit contributions
Really-0.300
What-0.298
-0.296
"-0.283
It-0.282
Tokenried
Token ID2228
Feature activation+2.15e-02

Pos logit contributions
 least+0.008
onel+0.007
ird+0.007
col+0.007
issy+0.006
Neg logit contributions
What-0.008
Really-0.007
 boy-0.007
Who-0.006
It-0.006
Token over
Token ID625
Feature activation+1.05e-02

Pos logit contributions
What+0.040
 boy+0.035
Really+0.034
It+0.033
 man+0.033
Neg logit contributions
 least-0.035
onel-0.034
ird-0.032
issy-0.029
auc-0.028
Token to
Token ID284
Feature activation-4.07e-03

Pos logit contributions
What+0.014
 boy+0.011
Really+0.011
It+0.010
 man+0.010
Neg logit contributions
onel-0.023
 least-0.023
ird-0.022
issy-0.021
auc-0.020
Token the
Token ID262
Feature activation+4.40e-03

Pos logit contributions
onel+0.009
 least+0.008
ird+0.008
issy+0.008
auc+0.008
Neg logit contributions
What-0.006
 boy-0.005
Really-0.004
Who-0.004
It-0.004
Token was
Token ID373
Feature activation-1.91e-02

Pos logit contributions
 boy+0.033
What+0.032
 man+0.027
Really+0.027
It+0.027
Neg logit contributions
 least-0.058
onel-0.057
upid-0.056
opl-0.054
oken-0.053
Token the
Token ID262
Feature activation+2.27e-02

Pos logit contributions
onel+0.036
 least+0.036
ird+0.032
col+0.032
auc+0.032
Neg logit contributions
What-0.029
Really-0.026
 boy-0.025
It-0.023
 man-0.022
Token same
Token ID976
Feature activation+3.00e-02

Pos logit contributions
What+0.037
 boy+0.035
Really+0.033
 man+0.031
It+0.030
Neg logit contributions
onel-0.041
aved-0.037
 least-0.037
auc-0.036
ird-0.036
Token color
Token ID3124
Feature activation+7.43e-03

Pos logit contributions
What+0.050
 boy+0.046
Really+0.045
 man+0.041
It+0.040
Neg logit contributions
 least-0.051
onel-0.051
col-0.045
ird-0.045
aved-0.045
Token as
Token ID355
Feature activation+6.48e-02

Pos logit contributions
What+0.011
 boy+0.011
Really+0.010
It+0.009
 man+0.009
Neg logit contributions
 least-0.014
onel-0.013
ird-0.012
col-0.012
oken-0.011
Token Mitt
Token ID16627
Feature activation-1.72e-01

Pos logit contributions
What+0.089
Really+0.079
 boy+0.077
It+0.067
 man+0.066
Neg logit contributions
 least-0.133
onel-0.130
ird-0.119
col-0.117
aved-0.116
Tokenens
Token ID641
Feature activation-9.77e-02

Pos logit contributions
 least+0.250
onel+0.243
ird+0.213
col+0.209
aved+0.202
Neg logit contributions
What-0.336
Really-0.311
 boy-0.301
Who-0.276
It-0.275
Token!
Token ID0
Feature activation-1.88e-02

Pos logit contributions
 least+0.173
onel+0.170
col+0.151
ird+0.149
auc+0.148
Neg logit contributions
What-0.157
Really-0.141
 boy-0.139
 man-0.124
Who-0.124
Token Lily
Token ID20037
Feature activation-3.64e-02

Pos logit contributions
 least+0.039
onel+0.038
ird+0.035
col+0.034
aved+0.034
Neg logit contributions
What-0.026
Really-0.023
 boy-0.022
It-0.019
Who-0.019
Token was
Token ID373
Feature activation-2.17e-02

Pos logit contributions
 least+0.073
onel+0.071
ird+0.065
col+0.064
issy+0.063
Neg logit contributions
What-0.052
Really-0.047
 boy-0.043
Who-0.039
It-0.038
Token so
Token ID523
Feature activation+1.28e-02

Pos logit contributions
 least+0.033
onel+0.032
ird+0.028
col+0.027
aved+0.026
Neg logit contributions
What-0.041
Really-0.039
 boy-0.037
Who-0.034
 man-0.034
Token replied
Token ID8712
Feature activation-2.70e-02

Pos logit contributions
 least+0.005
onel+0.005
ird+0.005
col+0.005
issy+0.005
Neg logit contributions
What-0.006
Really-0.005
 boy-0.005
It-0.005
Who-0.005
Token with
Token ID351
Feature activation+5.15e-03

Pos logit contributions
 least+0.041
onel+0.040
ird+0.036
col+0.035
issy+0.034
Neg logit contributions
What-0.051
Really-0.047
 boy-0.045
Who-0.042
It-0.041
Token kindness
Token ID23887
Feature activation-4.25e-03

Pos logit contributions
What+0.007
Really+0.006
 boy+0.006
It+0.005
 man+0.005
Neg logit contributions
 least-0.010
onel-0.010
ird-0.010
col-0.009
auc-0.009
Token instead
Token ID2427
Feature activation+2.94e-02

Pos logit contributions
 least+0.008
onel+0.008
ird+0.007
col+0.007
issy+0.007
Neg logit contributions
What-0.006
Really-0.006
 boy-0.006
 man-0.005
It-0.005
Token of
Token ID286
Feature activation-6.24e-02

Pos logit contributions
What+0.051
Really+0.046
 boy+0.046
It+0.041
 man+0.041
Neg logit contributions
 least-0.049
onel-0.047
ird-0.042
col-0.041
aved-0.041
Token sc
Token ID629
Feature activation-1.72e-01

Pos logit contributions
 least+0.106
onel+0.103
ird+0.092
col+0.090
aved+0.088
Neg logit contributions
What-0.108
Really-0.099
 boy-0.095
Who-0.086
It-0.086
Tokenolding
Token ID33266
Feature activation-4.88e-02

Pos logit contributions
 least+0.286
onel+0.253
col+0.238
 ease+0.234
inea+0.227
Neg logit contributions
What-0.245
 boy-0.240
 man-0.240
"-0.237
-0.222
Token him
Token ID683
Feature activation+7.18e-03

Pos logit contributions
 least+0.077
onel+0.072
 ease+0.065
col+0.064
ird+0.063
Neg logit contributions
What-0.087
Really-0.086
 boy-0.075
-0.074
Who-0.072
Token.
Token ID13
Feature activation-7.49e-03

Pos logit contributions
What+0.011
Really+0.010
 boy+0.010
 man+0.009
Who+0.009
Neg logit contributions
 least-0.013
onel-0.013
ird-0.012
col-0.011
aved-0.011
Token He
Token ID679
Feature activation+3.05e-02

Pos logit contributions
 least+0.013
onel+0.012
ird+0.011
col+0.011
 ease+0.011
Neg logit contributions
What-0.012
Really-0.011
 boy-0.011
Who-0.010
 man-0.009
Token ran
Token ID4966
Feature activation-4.42e-02

Pos logit contributions
What+0.048
Really+0.044
 boy+0.042
Who+0.037
It+0.037
Neg logit contributions
 least-0.056
onel-0.055
ird-0.050
col-0.049
issy-0.048
Token fun
Token ID1257
Feature activation-1.79e-02

Pos logit contributions
What+0.074
Really+0.066
 boy+0.064
It+0.057
Who+0.057
Neg logit contributions
 least-0.087
onel-0.086
ird-0.077
col-0.076
issy-0.075
Token.
Token ID13
Feature activation-8.93e-03

Pos logit contributions
 least+0.036
onel+0.036
ird+0.033
issy+0.032
col+0.032
Neg logit contributions
What-0.024
 boy-0.022
Really-0.020
 man-0.019
It-0.018
Token Tim
Token ID5045
Feature activation-2.20e-02

Pos logit contributions
 least+0.019
onel+0.018
ird+0.017
col+0.016
issy+0.016
Neg logit contributions
What-0.012
Really-0.011
 boy-0.010
Who-0.009
It-0.009
Token watched
Token ID7342
Feature activation-5.29e-03

Pos logit contributions
 least+0.040
onel+0.037
ird+0.034
issy+0.033
inda+0.033
Neg logit contributions
What-0.036
Really-0.035
 boy-0.029
Who-0.029
It-0.028
Token as
Token ID355
Feature activation+5.48e-02

Pos logit contributions
 least+0.010
onel+0.010
ird+0.009
col+0.009
aved+0.009
Neg logit contributions
What-0.008
Really-0.007
 boy-0.006
Who-0.006
-0.006
Token other
Token ID584
Feature activation-1.72e-01

Pos logit contributions
What+0.077
Really+0.071
 boy+0.064
+0.058
Who+0.058
Neg logit contributions
 least-0.109
onel-0.106
ird-0.096
col-0.095
issy-0.094
Token kids
Token ID3988
Feature activation-8.39e-02

Pos logit contributions
 least+0.361
onel+0.355
ird+0.325
col+0.320
aved+0.314
Neg logit contributions
What-0.226
Really-0.199
 boy-0.192
Who-0.167
It-0.165
Token played
Token ID2826
Feature activation-1.07e-02

Pos logit contributions
 least+0.158
onel+0.158
col+0.144
issy+0.142
auc+0.142
Neg logit contributions
What-0.119
 boy-0.113
 man-0.100
Really-0.097
Who-0.094
Token with
Token ID351
Feature activation-4.46e-03

Pos logit contributions
 least+0.020
onel+0.020
ird+0.018
col+0.018
aved+0.018
Neg logit contributions
What-0.016
Really-0.015
 boy-0.013
Who-0.012
-0.012
Token the
Token ID262
Feature activation-5.87e-03

Pos logit contributions
 least+0.010
onel+0.010
ird+0.009
col+0.009
issy+0.009
Neg logit contributions
What-0.005
 boy-0.004
Really-0.004
 man-0.004
It-0.004
Token drum
Token ID13026
Feature activation-6.82e-02

Pos logit contributions
 least+0.009
onel+0.008
ird+0.007
col+0.007
aved+0.007
Neg logit contributions
What-0.012
Really-0.011
 boy-0.010
It-0.010
 man-0.010
Token forest
Token ID8222
Feature activation-4.30e-02

Pos logit contributions
 least+0.032
onel+0.030
col+0.027
ird+0.027
aved+0.026
Neg logit contributions
What-0.037
Really-0.034
 boy-0.032
Who-0.030
It-0.030
Token.
Token ID13
Feature activation-1.14e-02

Pos logit contributions
 least+0.089
onel+0.088
ird+0.082
issy+0.080
upid+0.078
Neg logit contributions
What-0.052
 boy-0.049
 man-0.045
Really-0.042
-0.040
Token Who
Token ID5338
Feature activation+2.82e-03

Pos logit contributions
 least+0.019
onel+0.019
ird+0.017
col+0.017
aved+0.016
Neg logit contributions
What-0.019
Really-0.017
 boy-0.017
 man-0.015
Who-0.015
Token knows
Token ID4206
Feature activation-3.50e-02

Pos logit contributions
What+0.004
 boy+0.004
Really+0.004
It+0.003
 man+0.003
Neg logit contributions
 least-0.005
onel-0.005
col-0.005
ird-0.005
auc-0.005
Token what
Token ID644
Feature activation+1.77e-02

Pos logit contributions
onel+0.072
 least+0.071
ird+0.066
aved+0.065
issy+0.064
Neg logit contributions
What-0.046
Really-0.040
 boy-0.039
It-0.035
Who-0.034
Token other
Token ID584
Feature activation-1.71e-01

Pos logit contributions
What+0.024
Really+0.022
 boy+0.020
Who+0.018
It+0.017
Neg logit contributions
 least-0.036
onel-0.036
ird-0.033
col-0.032
issy-0.032
Token new
Token ID649
Feature activation-1.13e-01

Pos logit contributions
 least+0.378
onel+0.371
ird+0.341
col+0.340
issy+0.328
Neg logit contributions
What-0.202
Really-0.180
 boy-0.162
-0.144
It-0.143
Token adventures
Token ID17545
Feature activation-2.49e-02

Pos logit contributions
 least+0.221
onel+0.212
col+0.202
ird+0.193
 er+0.192
Neg logit contributions
What-0.160
Really-0.145
 boy-0.124
-0.121
Who-0.120
Token they
Token ID484
Feature activation-7.13e-02

Pos logit contributions
 least+0.043
onel+0.041
ird+0.037
col+0.037
issy+0.036
Neg logit contributions
What-0.042
Really-0.040
 boy-0.036
Who-0.034
It-0.032
Token would
Token ID561
Feature activation-4.05e-03

Pos logit contributions
 least+0.123
onel+0.113
ird+0.099
aved+0.099
issy+0.097
Neg logit contributions
What-0.123
Really-0.118
 boy-0.109
Who-0.099
-0.098
Token find
Token ID1064
Feature activation-8.47e-03

Pos logit contributions
 least+0.007
onel+0.007
ird+0.006
col+0.006
aved+0.006
Neg logit contributions
What-0.007
Really-0.006
 boy-0.006
Who-0.005
It-0.005
Token exploring
Token ID13504
Feature activation+1.50e-02

Pos logit contributions
What+0.014
Really+0.013
 boy+0.013
Who+0.011
It+0.011
Neg logit contributions
 least-0.015
onel-0.014
ird-0.013
col-0.013
aved-0.012
Token,
Token ID11
Feature activation-3.04e-02

Pos logit contributions
What+0.021
Really+0.018
 boy+0.018
 man+0.016
It+0.015
Neg logit contributions
 least-0.030
onel-0.030
ird-0.028
issy-0.027
auc-0.026
Token never
Token ID1239
Feature activation+1.97e-02

Pos logit contributions
 least+0.052
onel+0.051
ird+0.047
auc+0.045
issy+0.045
Neg logit contributions
What-0.051
Really-0.046
 boy-0.045
It-0.041
 man-0.041
Token knowing
Token ID6970
Feature activation-5.44e-03

Pos logit contributions
What+0.029
 boy+0.026
Really+0.026
Who+0.023
 man+0.022
Neg logit contributions
onel-0.038
 least-0.037
ird-0.035
auc-0.034
col-0.034
Token what
Token ID644
Feature activation+2.32e-02

Pos logit contributions
 least+0.011
onel+0.011
ird+0.010
auc+0.009
issy+0.009
Neg logit contributions
What-0.008
Really-0.007
 boy-0.007
It-0.006
Who-0.006
Token other
Token ID584
Feature activation-1.71e-01

Pos logit contributions
What+0.035
Really+0.031
 boy+0.030
Who+0.027
It+0.026
Neg logit contributions
 least-0.045
onel-0.044
ird-0.040
col-0.039
aved-0.039
Token secrets
Token ID13141
Feature activation-8.24e-02

Pos logit contributions
 least+0.389
onel+0.382
ird+0.353
col+0.348
aved+0.343
Neg logit contributions
What-0.192
Really-0.167
 boy-0.157
Who-0.133
It-0.132
Token she
Token ID673
Feature activation-7.52e-02

Pos logit contributions
 least+0.145
onel+0.142
ird+0.128
col+0.125
aved+0.123
Neg logit contributions
What-0.135
Really-0.121
 boy-0.118
It-0.107
Who-0.106
Token would
Token ID561
Feature activation-9.23e-03

Pos logit contributions
 least+0.136
onel+0.127
ird+0.115
col+0.111
issy+0.110
Neg logit contributions
What-0.122
Really-0.113
 boy-0.107
It-0.095
-0.095
Token find
Token ID1064
Feature activation-8.29e-03

Pos logit contributions
 least+0.015
onel+0.015
ird+0.013
col+0.013
issy+0.013
Neg logit contributions
What-0.016
Really-0.015
 boy-0.014
Who-0.013
It-0.013
Token.
Token ID13
Feature activation-1.61e-02

Pos logit contributions
onel+0.016
 least+0.016
ird+0.015
aved+0.015
auc+0.015
Neg logit contributions
What-0.011
 boy-0.010
Really-0.009
 man-0.009
It-0.009
Token watched
Token ID7342
Feature activation-1.49e-02

Pos logit contributions
 least+0.034
onel+0.032
col+0.030
ird+0.029
aved+0.029
Neg logit contributions
What-0.030
Really-0.028
 boy-0.026
Who-0.023
It-0.023
Token the
Token ID262
Feature activation-3.76e-02

Pos logit contributions
 least+0.033
onel+0.033
ird+0.031
issy+0.030
col+0.030
Neg logit contributions
What-0.018
Really-0.015
 boy-0.015
 man-0.013
It-0.013
Token squirrel
Token ID33039
Feature activation-1.09e-02

Pos logit contributions
onel+0.077
 least+0.077
col+0.070
 ease+0.068
aved+0.068
Neg logit contributions
What-0.043
Really-0.040
 boy-0.033
How-0.033
-0.032
Token as
Token ID355
Feature activation+5.88e-02

Pos logit contributions
 least+0.024
onel+0.023
ird+0.022
issy+0.022
auc+0.022
Neg logit contributions
What-0.013
 boy-0.012
 man-0.010
He-0.009
It-0.009
Token it
Token ID340
Feature activation-4.86e-02

Pos logit contributions
What+0.083
Really+0.077
 boy+0.065
Who+0.063
+0.062
Neg logit contributions
 least-0.110
onel-0.109
 ease-0.100
aved-0.097
col-0.096
Token sc
Token ID629
Feature activation-1.70e-01

Pos logit contributions
 least+0.083
onel+0.081
ird+0.073
col+0.072
aved+0.070
Neg logit contributions
What-0.082
Really-0.076
 boy-0.072
Who-0.066
It-0.065
Tokenur
Token ID333
Feature activation-1.87e-03

Pos logit contributions
 least+0.223
inda+0.192
onel+0.191
 tele+0.189
issy+0.183
Neg logit contributions
Really-0.284
-0.266
What-0.266
Spot-0.259
He-0.255
Tokenried
Token ID2228
Feature activation+2.82e-02

Pos logit contributions
 least+0.003
onel+0.003
ird+0.003
issy+0.003
 er+0.003
Neg logit contributions
What-0.003
Really-0.002
 boy-0.002
Who-0.002
-0.002
Token around
Token ID1088
Feature activation-9.55e-03

Pos logit contributions
What+0.050
 boy+0.044
Really+0.042
It+0.040
 man+0.040
Neg logit contributions
 least-0.049
onel-0.048
ird-0.047
issy-0.043
auc-0.041
Token.
Token ID13
Feature activation-2.30e-02

Pos logit contributions
onel+0.020
 least+0.020
ird+0.019
issy+0.018
auc+0.017
Neg logit contributions
What-0.013
 boy-0.011
Really-0.011
Who-0.010
 man-0.010
Token
Token ID220
Feature activation-3.55e-02

Pos logit contributions
 least+0.038
onel+0.036
col+0.035
ird+0.034
aved+0.034
Neg logit contributions
What-0.035
Really-0.032
 boy-0.030
Who-0.029
 MacDonald-0.027
Token He
Token ID679
Feature activation+3.12e-02

Pos logit contributions
 least+0.062
onel+0.060
ird+0.055
col+0.054
issy+0.053
Neg logit contributions
What-0.051
Really-0.047
 boy-0.045
Who-0.040
It-0.040
Token stopped
Token ID5025
Feature activation+1.97e-03

Pos logit contributions
What+0.048
Really+0.043
 boy+0.041
Who+0.037
It+0.037
Neg logit contributions
 least-0.059
onel-0.057
ird-0.052
col-0.051
aved-0.050
Token to
Token ID284
Feature activation-3.15e-02

Pos logit contributions
What+0.003
Really+0.003
 boy+0.002
Who+0.002
It+0.002
Neg logit contributions
 least-0.004
onel-0.004
col-0.003
ird-0.003
aved-0.003
Token wonder
Token ID4240
Feature activation-6.63e-02

Pos logit contributions
onel+0.056
 least+0.055
ird+0.050
issy+0.048
col+0.048
Neg logit contributions
What-0.052
Really-0.046
 boy-0.045
Who-0.041
 man-0.041
Token what
Token ID644
Feature activation+1.50e-02

Pos logit contributions
 least+0.128
onel+0.125
ird+0.114
col+0.112
issy+0.110
Neg logit contributions
What-0.098
Really-0.088
 boy-0.084
Who-0.075
It-0.074
Token other
Token ID584
Feature activation-1.69e-01

Pos logit contributions
What+0.021
Really+0.019
 boy+0.017
+0.016
Who+0.015
Neg logit contributions
 least-0.030
onel-0.029
ird-0.027
col-0.027
issy-0.026
Token surprises
Token ID24072
Feature activation-2.12e-02

Pos logit contributions
 least+0.375
onel+0.365
col+0.338
ird+0.337
issy+0.328
Neg logit contributions
What-0.193
Really-0.176
 boy-0.146
-0.144
It-0.135
Token he
Token ID339
Feature activation-2.82e-02

Pos logit contributions
 least+0.038
onel+0.037
ird+0.034
col+0.033
issy+0.032
Neg logit contributions
What-0.034
Really-0.031
 boy-0.029
Who-0.027
It-0.026
Token would
Token ID561
Feature activation-1.28e-04

Pos logit contributions
 least+0.049
onel+0.046
ird+0.041
aved+0.040
inda+0.040
Neg logit contributions
What-0.048
Really-0.045
 boy-0.041
It-0.037
Who-0.037
Token find
Token ID1064
Feature activation-8.37e-03

Pos logit contributions
 least+0.000
onel+0.000
ird+0.000
col+0.000
aved+0.000
Neg logit contributions
What-0.000
Really-0.000
 boy-0.000
Who-0.000
It-0.000
Token.
Token ID13
Feature activation-2.56e-02

Pos logit contributions
 least+0.016
onel+0.016
ird+0.014
issy+0.014
aved+0.014
Neg logit contributions
What-0.012
 boy-0.011
Really-0.011
 man-0.010
It-0.009
Token should
Token ID815
Feature activation-3.33e-02

Pos logit contributions
 least+0.073
onel+0.071
ird+0.063
aved+0.062
col+0.060
Neg logit contributions
What-0.087
Really-0.082
 boy-0.075
Who-0.071
 man-0.070
Token take
Token ID1011
Feature activation-4.45e-03

Pos logit contributions
 least+0.058
onel+0.058
ird+0.052
col+0.051
issy+0.050
Neg logit contributions
What-0.054
Really-0.049
 boy-0.048
It-0.043
 man-0.043
Token turns
Token ID4962
Feature activation-1.26e-02

Pos logit contributions
onel+0.010
 least+0.010
ird+0.009
auc+0.009
issy+0.009
Neg logit contributions
What-0.005
 boy-0.004
Really-0.004
 man-0.004
It-0.003
Token pulling
Token ID10427
Feature activation-5.46e-02

Pos logit contributions
 least+0.022
onel+0.022
ird+0.020
col+0.019
aved+0.019
Neg logit contributions
What-0.021
Really-0.019
 boy-0.018
Who-0.016
It-0.016
Token each
Token ID1123
Feature activation-1.56e-02

Pos logit contributions
 least+0.113
onel+0.112
ird+0.102
col+0.100
issy+0.099
Neg logit contributions
What-0.074
Really-0.065
 boy-0.063
Who-0.054
It-0.054
Token other
Token ID584
Feature activation-1.68e-01

Pos logit contributions
onel+0.026
 least+0.026
ird+0.023
col+0.023
aved+0.023
Neg logit contributions
What-0.027
 boy-0.026
 man-0.023
Really-0.022
It-0.021
Tokenâ
Token ID22940
Feature activation-6.66e-02

Pos logit contributions
 least+0.246
onel+0.240
ird+0.211
col+0.206
issy+0.201
Neg logit contributions
What-0.325
Really-0.300
 boy-0.292
Who-0.267
It-0.266
Token
Token ID26391
Feature activation-3.86e-02

Pos logit contributions
 least+0.098
onel+0.095
ird+0.084
col+0.082
issy+0.080
Neg logit contributions
What-0.129
Really-0.120
 boy-0.116
Who-0.106
It-0.106
Token
Token ID8151
Feature activation-5.52e-02

Pos logit contributions
ird+0.053
onel+0.049
 least+0.048
ibr+0.045
issy+0.043
Neg logit contributions
What-0.081
Really-0.074
 boy-0.068
Who-0.066
Wow-0.061
Tokens
Token ID82
Feature activation-1.07e-01

Pos logit contributions
onel+0.117
 least+0.116
oken+0.108
aved+0.104
col+0.102
Neg logit contributions
What-0.063
 boy-0.061
-0.055
 man-0.054
"-0.053
Token hair
Token ID4190
Feature activation-2.94e-02

Pos logit contributions
 least+0.197
onel+0.193
ird+0.175
col+0.172
issy+0.168
Neg logit contributions
What-0.167
Really-0.152
 boy-0.145
Who-0.130
It-0.129
Token
Token ID220
Feature activation-3.00e-02

Pos logit contributions
 least+0.015
onel+0.014
ird+0.013
oken+0.013
aved+0.013
Neg logit contributions
What-0.010
Really-0.009
 boy-0.009
It-0.008
-0.008
Token She
Token ID1375
Feature activation-1.62e-02

Pos logit contributions
 least+0.068
onel+0.067
ird+0.062
col+0.061
aved+0.060
Neg logit contributions
What-0.034
Really-0.030
 boy-0.028
It-0.024
Who-0.024
Token grabbed
Token ID13176
Feature activation-2.08e-02

Pos logit contributions
 least+0.030
onel+0.029
ird+0.026
col+0.026
issy+0.025
Neg logit contributions
What-0.026
Really-0.023
 boy-0.022
Who-0.020
It-0.020
Token oven
Token ID14361
Feature activation-5.34e-02

Pos logit contributions
onel+0.052
 least+0.051
ird+0.049
auc+0.047
opl+0.047
Neg logit contributions
What-0.021
 boy-0.016
It-0.013
 man-0.013
Who-0.012
Token m
Token ID285
Feature activation-1.04e-01

Pos logit contributions
 least+0.116
onel+0.114
ird+0.105
col+0.102
aved+0.101
Neg logit contributions
What-0.066
Really-0.058
 boy-0.055
 man-0.048
It-0.047
Tokenitt
Token ID715
Feature activation-1.68e-01

Pos logit contributions
 least+0.116
onel+0.107
inda+0.085
ird+0.085
col+0.084
Neg logit contributions
What-0.238
Really-0.225
 boy-0.213
It-0.206
-0.204
Tokens
Token ID82
Feature activation-9.23e-02

Pos logit contributions
 least+0.334
onel+0.329
ird+0.302
aved+0.284
auc+0.283
Neg logit contributions
What-0.231
 boy-0.202
Really-0.198
 man-0.183
-0.180
Token and
Token ID290
Feature activation-2.64e-03

Pos logit contributions
 least+0.196
onel+0.192
ird+0.177
col+0.175
aved+0.171
Neg logit contributions
What-0.119
Really-0.107
 boy-0.099
Who-0.088
It-0.085
Token carefully
Token ID7773
Feature activation-2.12e-02

Pos logit contributions
onel+0.005
 least+0.005
oken+0.004
ird+0.004
aved+0.004
Neg logit contributions
What-0.004
 boy-0.004
Really-0.004
 man-0.003
It-0.003
Token pulled
Token ID5954
Feature activation-1.95e-02

Pos logit contributions
onel+0.040
 least+0.040
ovable+0.034
issy+0.034
upid+0.034
Neg logit contributions
What-0.032
 boy-0.027
 man-0.026
Really-0.026
It-0.026
Token out
Token ID503
Feature activation-6.26e-02

Pos logit contributions
 least+0.036
onel+0.036
ird+0.032
auc+0.031
aved+0.030
Neg logit contributions
What-0.031
 boy-0.027
Really-0.026
It-0.025
 man-0.024
Token can
Token ID460
Feature activation-7.89e-02

Pos logit contributions
 least+0.086
onel+0.084
issy+0.079
ird+0.079
auc+0.077
Neg logit contributions
What-0.049
 boy-0.043
Really-0.041
 man-0.039
It-0.038
Token always
Token ID1464
Feature activation+2.30e-02

Pos logit contributions
 least+0.141
onel+0.134
ird+0.124
col+0.118
issy+0.117
Neg logit contributions
What-0.130
Really-0.120
 boy-0.113
Who-0.102
-0.101
Token rely
Token ID8814
Feature activation-4.24e-04

Pos logit contributions
What+0.038
Really+0.036
 boy+0.033
+0.030
Who+0.030
Neg logit contributions
 least-0.042
onel-0.038
ird-0.036
 ease-0.034
 er-0.033
Token on
Token ID319
Feature activation-7.18e-02

Pos logit contributions
 least+0.001
onel+0.001
ird+0.001
 er+0.001
aved+0.001
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
Who-0.001
It-0.001
Token each
Token ID1123
Feature activation-2.35e-02

Pos logit contributions
 least+0.132
onel+0.127
ird+0.116
col+0.114
issy+0.112
Neg logit contributions
What-0.113
Really-0.104
 boy-0.098
Who-0.089
It-0.087
Token other
Token ID584
Feature activation-1.67e-01

Pos logit contributions
 least+0.034
onel+0.033
ird+0.029
col+0.029
aved+0.028
Neg logit contributions
What-0.046
Really-0.042
 boy-0.041
Who-0.038
It-0.038
Token".
Token ID1911
Feature activation+1.66e-02

Pos logit contributions
 least+0.300
onel+0.283
ird+0.261
col+0.252
issy+0.246
Neg logit contributions
What-0.276
Really-0.258
 boy-0.226
It-0.219
Who-0.219
Token Tina
Token ID35903
Feature activation-4.24e-02

Pos logit contributions
What+0.023
Really+0.021
 boy+0.020
Who+0.018
 man+0.017
Neg logit contributions
 least-0.033
onel-0.031
ird-0.030
col-0.029
issy-0.028
Token hugged
Token ID41130
Feature activation-4.23e-02

Pos logit contributions
 least+0.081
onel+0.077
ird+0.071
col+0.070
issy+0.068
Neg logit contributions
What-0.065
Really-0.059
 boy-0.054
Who-0.050
It-0.048
Token her
Token ID607
Feature activation-1.25e-02

Pos logit contributions
 least+0.068
onel+0.064
col+0.058
ird+0.057
issy+0.056
Neg logit contributions
What-0.076
Really-0.071
 boy-0.067
Who-0.062
It-0.061
Token back
Token ID736
Feature activation+4.18e-02

Pos logit contributions
 least+0.024
onel+0.023
ird+0.021
col+0.021
 er+0.021
Neg logit contributions
What-0.019
Really-0.017
 boy-0.015
Who-0.015
It-0.015
Token.
Token ID13
Feature activation-1.58e-02

Pos logit contributions
 least+0.068
onel+0.067
ird+0.061
col+0.060
issy+0.059
Neg logit contributions
What-0.053
Really-0.047
 boy-0.047
 man-0.041
It-0.041
Token Barry
Token ID14488
Feature activation-8.35e-03

Pos logit contributions
onel+0.033
 least+0.032
ird+0.029
auc+0.029
oken+0.028
Neg logit contributions
What-0.023
Really-0.020
 boy-0.019
-0.018
It-0.017
Token enjoyed
Token ID8359
Feature activation+4.62e-02

Pos logit contributions
 least+0.016
onel+0.015
ird+0.014
col+0.014
inda+0.014
Neg logit contributions
What-0.013
Really-0.012
 boy-0.010
Who-0.010
It-0.009
Token playing
Token ID2712
Feature activation+7.30e-03

Pos logit contributions
What+0.078
Really+0.076
 boy+0.072
Who+0.064
 Please+0.063
Neg logit contributions
 least-0.077
onel-0.071
col-0.065
 ease-0.065
ird-0.062
Token hop
Token ID1725
Feature activation-2.76e-02

Pos logit contributions
What+0.010
Really+0.010
 boy+0.009
Who+0.008
It+0.008
Neg logit contributions
 least-0.014
onel-0.014
col-0.012
ird-0.012
inda-0.012
Tokensc
Token ID1416
Feature activation-1.67e-01

Pos logit contributions
 least+0.048
onel+0.047
ird+0.042
issy+0.040
col+0.040
Neg logit contributions
What-0.046
 boy-0.042
Really-0.041
 man-0.038
It-0.037
Tokenot
Token ID313
Feature activation-6.06e-02

Pos logit contributions
 least+0.261
onel+0.244
ird+0.213
col+0.205
aved+0.203
Neg logit contributions
What-0.307
Really-0.276
 boy-0.275
It-0.258
 man-0.253
Tokench
Token ID354
Feature activation-1.53e-02

Pos logit contributions
 least+0.104
onel+0.100
ird+0.089
col+0.085
auc+0.085
Neg logit contributions
What-0.103
 boy-0.092
Really-0.092
 man-0.083
It-0.083
Token with
Token ID351
Feature activation-8.11e-03

Pos logit contributions
 least+0.027
onel+0.026
ird+0.023
col+0.023
issy+0.022
Neg logit contributions
What-0.026
Really-0.023
 boy-0.023
 man-0.020
It-0.020
Token them
Token ID606
Feature activation+2.03e-02

Pos logit contributions
 least+0.017
onel+0.017
ird+0.017
issy+0.016
auc+0.015
Neg logit contributions
What-0.010
 boy-0.009
Really-0.008
 man-0.008
It-0.008
Token so
Token ID523
Feature activation-1.25e-02

Pos logit contributions
What+0.027
 boy+0.025
Really+0.023
 man+0.022
It+0.021
Neg logit contributions
 least-0.040
onel-0.039
ird-0.037
issy-0.036
auc-0.036
Token dance
Token ID9280
Feature activation+2.09e-02

Pos logit contributions
What+0.024
Really+0.022
 boy+0.021
 man+0.020
Who+0.019
Neg logit contributions
onel-0.023
 least-0.023
ird-0.020
col-0.020
aved-0.020
Token with
Token ID351
Feature activation-6.18e-03

Pos logit contributions
What+0.034
Really+0.031
 boy+0.029
Who+0.027
+0.026
Neg logit contributions
 least-0.037
onel-0.036
ird-0.033
col-0.033
aved-0.032
Token them
Token ID606
Feature activation+2.24e-02

Pos logit contributions
 least+0.014
onel+0.014
ird+0.013
issy+0.012
auc+0.012
Neg logit contributions
What-0.007
 boy-0.006
Really-0.006
 man-0.005
It-0.005
Token instead
Token ID2427
Feature activation+1.99e-02

Pos logit contributions
What+0.032
 boy+0.029
Really+0.028
 man+0.025
It+0.025
Neg logit contributions
 least-0.043
onel-0.042
ird-0.039
col-0.038
issy-0.038
Token of
Token ID286
Feature activation-6.80e-02

Pos logit contributions
What+0.032
 boy+0.028
Really+0.028
 man+0.026
It+0.025
Neg logit contributions
 least-0.036
onel-0.034
ird-0.031
col-0.031
aved-0.030
Token sc
Token ID629
Feature activation-1.66e-01

Pos logit contributions
 least+0.124
onel+0.121
ird+0.109
col+0.108
aved+0.105
Neg logit contributions
What-0.108
Really-0.099
 boy-0.094
Who-0.084
It-0.084
Tokenaring
Token ID1723
Feature activation+1.81e-02

Pos logit contributions
 least+0.246
onel+0.232
col+0.222
 ease+0.214
 tele+0.210
Neg logit contributions
 man-0.234
 boy-0.230
"-0.227
What-0.227
 Please-0.225
Token them
Token ID606
Feature activation-5.06e-03

Pos logit contributions
What+0.026
Really+0.025
 boy+0.022
Who+0.020
+0.020
Neg logit contributions
 least-0.035
onel-0.034
col-0.031
ird-0.031
auc-0.030
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 least+0.009
onel+0.009
ird+0.008
col+0.008
issy+0.008
Neg logit contributions
What-0.008
Really-0.007
 boy-0.007
 man-0.006
Who-0.006
Token
Token ID198
Feature activation-1.68e-02

Pos logit contributions
onel+0.028
 least+0.028
ird+0.026
col+0.025
aved+0.025
Neg logit contributions
What-0.023
Really-0.021
 boy-0.020
-0.018
It-0.018
Token
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 least+0.036
onel+0.035
ird+0.033
col+0.032
issy+0.032
Neg logit contributions
What-0.021
Really-0.019
 boy-0.018
Who-0.015
 man-0.015
Token and
Token ID290
Feature activation-1.90e-02

Pos logit contributions
What+0.010
Really+0.010
 boy+0.009
Who+0.009
Spot+0.009
Neg logit contributions
 least-0.008
onel-0.007
col-0.006
inda-0.006
 er-0.006
Token thanked
Token ID26280
Feature activation+1.38e-02

Pos logit contributions
 least+0.034
onel+0.033
ird+0.030
col+0.030
issy+0.029
Neg logit contributions
What-0.031
Really-0.028
 boy-0.027
Who-0.024
It-0.024
Token the
Token ID262
Feature activation-1.31e-02

Pos logit contributions
What+0.023
Really+0.022
 boy+0.021
+0.019
It+0.019
Neg logit contributions
 least-0.023
onel-0.021
col-0.019
 ease-0.019
issy-0.019
Token parent
Token ID2560
Feature activation-5.72e-02

Pos logit contributions
 least+0.027
onel+0.027
ird+0.025
col+0.025
aved+0.025
Neg logit contributions
What-0.018
 boy-0.016
Really-0.015
 man-0.014
Who-0.013
Token before
Token ID878
Feature activation+3.90e-02

Pos logit contributions
 least+0.099
onel+0.097
ird+0.087
col+0.086
aved+0.084
Neg logit contributions
What-0.095
Really-0.086
 boy-0.085
 man-0.076
It-0.075
Token sc
Token ID629
Feature activation-1.66e-01

Pos logit contributions
What+0.070
Really+0.067
 boy+0.063
How+0.057
+0.057
Neg logit contributions
 least-0.059
onel-0.054
 ease-0.051
col-0.048
ird-0.048
Tokenur
Token ID333
Feature activation+3.98e-03

Pos logit contributions
 least+0.245
 rect+0.221
col+0.206
onel+0.206
 ease+0.202
Neg logit contributions
 lady-0.259
 man-0.255
"-0.250
 boy-0.241
What-0.235
Tokenrying
Token ID14992
Feature activation-1.57e-02

Pos logit contributions
What+0.007
Really+0.007
 boy+0.007
Who+0.006
It+0.006
Neg logit contributions
 least-0.006
onel-0.006
ird-0.005
col-0.005
issy-0.005
Token away
Token ID1497
Feature activation+1.47e-02

Pos logit contributions
 least+0.021
onel+0.020
ird+0.017
col+0.017
aved+0.016
Neg logit contributions
What-0.033
Really-0.031
 boy-0.030
Who-0.028
It-0.028
Token.
Token ID13
Feature activation-1.47e-02

Pos logit contributions
What+0.023
Really+0.021
 boy+0.021
Who+0.018
 man+0.018
Neg logit contributions
 least-0.027
onel-0.026
ird-0.024
col-0.023
issy-0.023
Token
Token ID220
Feature activation-3.30e-02

Pos logit contributions
 least+0.026
onel+0.026
ird+0.023
col+0.023
issy+0.023
Neg logit contributions
What-0.024
Really-0.021
 boy-0.020
Who-0.018
It-0.018
Token there
Token ID612
Feature activation+1.53e-02

Pos logit contributions
 least+0.047
onel+0.047
aved+0.042
ird+0.042
inda+0.041
Neg logit contributions
Really-0.033
What-0.033
 MacDonald-0.029
Who-0.028
 man-0.027
Token was
Token ID373
Feature activation-1.78e-02

Pos logit contributions
What+0.021
Really+0.020
 boy+0.017
Who+0.016
 man+0.015
Neg logit contributions
 least-0.031
onel-0.030
ird-0.027
col-0.027
aved-0.026
Token a
Token ID257
Feature activation+3.53e-02

Pos logit contributions
 least+0.040
onel+0.039
ird+0.036
col+0.035
auc+0.035
Neg logit contributions
What-0.022
Really-0.018
 boy-0.018
It-0.016
Who-0.015
Token clumsy
Token ID39120
Feature activation+3.61e-02

Pos logit contributions
What+0.051
Really+0.045
 MacDonald+0.040
Who+0.039
How+0.038
Neg logit contributions
 least-0.061
onel-0.057
 er-0.056
ird-0.054
issy-0.052
Token bar
Token ID2318
Feature activation+7.76e-02

Pos logit contributions
What+0.047
Really+0.041
Who+0.033
It+0.033
How+0.032
Neg logit contributions
 least-0.076
onel-0.073
 er-0.069
ird-0.069
issy-0.067
Tokenber
Token ID527
Feature activation+1.07e-01

Pos logit contributions
What+0.083
Really+0.072
 boy+0.062
It+0.055
Who+0.055
Neg logit contributions
 least-0.181
onel-0.179
ird-0.167
col-0.163
auc-0.161
Token.
Token ID13
Feature activation+1.99e-03

Pos logit contributions
What+0.209
Really+0.196
 boy+0.179
Who+0.171
It+0.170
Neg logit contributions
 least-0.158
onel-0.149
ird-0.138
col-0.131
issy-0.128
Token He
Token ID679
Feature activation+3.64e-02

Pos logit contributions
What+0.003
Really+0.003
 boy+0.003
+0.002
Who+0.002
Neg logit contributions
 least-0.004
onel-0.003
ird-0.003
col-0.003
aved-0.003
Token liked
Token ID8288
Feature activation-7.20e-03

Pos logit contributions
What+0.064
Really+0.060
Who+0.050
 boy+0.048
+0.047
Neg logit contributions
 least-0.058
onel-0.054
 er-0.053
ird-0.052
issy-0.051
Token to
Token ID284
Feature activation+2.02e-03

Pos logit contributions
 least+0.010
 er+0.009
onel+0.009
col+0.009
bar+0.008
Neg logit contributions
What-0.014
Really-0.013
-0.012
 boy-0.012
It-0.011
Token zip
Token ID19974
Feature activation-3.84e-02

Pos logit contributions
What+0.004
Really+0.003
 boy+0.003
+0.003
It+0.003
Neg logit contributions
 least-0.003
onel-0.003
ird-0.003
col-0.003
aved-0.003
Token we
Token ID356
Feature activation-1.04e-01

Pos logit contributions
 least+0.014
onel+0.013
ird+0.012
col+0.011
oken+0.011
Neg logit contributions
What-0.011
Really-0.010
 boy-0.009
It-0.008
 man-0.008
Token need
Token ID761
Feature activation-5.35e-02

Pos logit contributions
onel+0.177
 least+0.169
col+0.163
auc+0.161
inda+0.159
Neg logit contributions
What-0.168
 boy-0.150
Really-0.147
It-0.134
 man-0.134
Token it
Token ID340
Feature activation-2.32e-02

Pos logit contributions
onel+0.136
 least+0.129
issy+0.125
ird+0.124
auc+0.123
Neg logit contributions
What-0.052
Really-0.038
 boy-0.036
It-0.035
Who-0.032
Token and
Token ID290
Feature activation+1.32e-03

Pos logit contributions
 least+0.044
onel+0.044
ird+0.041
col+0.040
aved+0.040
Neg logit contributions
What-0.032
 boy-0.029
Really-0.028
 man-0.025
-0.025
Token to
Token ID284
Feature activation+2.02e-02

Pos logit contributions
What+0.002
Really+0.002
 boy+0.002
Who+0.002
It+0.002
Neg logit contributions
 least-0.002
onel-0.002
ird-0.002
col-0.002
issy-0.002
Token always
Token ID1464
Feature activation+7.54e-02

Pos logit contributions
What+0.029
Really+0.025
 boy+0.025
It+0.023
 man+0.022
Neg logit contributions
onel-0.040
 least-0.038
ird-0.036
auc-0.036
aved-0.036
Token be
Token ID307
Feature activation+4.77e-02

Pos logit contributions
What+0.130
Really+0.120
 boy+0.114
Who+0.103
It+0.103
Neg logit contributions
 least-0.130
onel-0.122
ird-0.113
col-0.107
aved-0.104
Token safe
Token ID3338
Feature activation-2.00e-02

Pos logit contributions
What+0.081
Really+0.079
 boy+0.070
Who+0.067
 man+0.064
Neg logit contributions
 least-0.080
onel-0.077
ird-0.070
col-0.067
aved-0.065
Token.
Token ID13
Feature activation-4.23e-03

Pos logit contributions
 least+0.032
onel+0.030
col+0.026
ird+0.025
aved+0.025
Neg logit contributions
What-0.037
Really-0.036
Who-0.031
 boy-0.031
It-0.030
Token<|endoftext|>
Token ID50256
Feature activation+3.06e-02

Pos logit contributions
onel+0.010
 least+0.009
ird+0.009
aved+0.009
auc+0.008
Neg logit contributions
What-0.005
Really-0.005
 boy-0.004
It-0.004
-0.004
TokenOnce
Token ID7454
Feature activation+3.98e-02

Pos logit contributions
Really+0.047
What+0.046
 boy+0.039
Who+0.038
+0.037
Neg logit contributions
 least-0.053
onel-0.049
ird-0.048
col-0.047
 ease-0.047
Token,
Token ID11
Feature activation-3.73e-03

Pos logit contributions
What+0.013
 boy+0.013
 man+0.011
Really+0.011
It+0.010
Neg logit contributions
onel-0.020
 least-0.020
ird-0.018
oken-0.017
inda-0.017
Token but
Token ID475
Feature activation+2.20e-03

Pos logit contributions
 least+0.007
onel+0.007
ird+0.007
col+0.007
issy+0.007
Neg logit contributions
What-0.005
Really-0.005
 boy-0.005
It-0.004
Who-0.004
Token the
Token ID262
Feature activation+3.12e-02

Pos logit contributions
What+0.003
Really+0.003
 boy+0.002
Who+0.002
+0.002
Neg logit contributions
 least-0.004
onel-0.004
ird-0.004
bar-0.004
 ease-0.004
Token waves
Token ID9813
Feature activation+4.29e-03

Pos logit contributions
What+0.059
Really+0.055
 boy+0.049
Who+0.048
It+0.047
Neg logit contributions
 least-0.048
onel-0.046
col-0.039
ird-0.039
aved-0.038
Token were
Token ID547
Feature activation-3.11e-02

Pos logit contributions
What+0.007
 boy+0.006
Really+0.006
 man+0.006
It+0.006
Neg logit contributions
 least-0.008
onel-0.007
ird-0.007
col-0.007
issy-0.007
Token too
Token ID1165
Feature activation+8.62e-02

Pos logit contributions
 least+0.061
onel+0.058
aved+0.051
bar+0.051
 ease+0.051
Neg logit contributions
What-0.046
Really-0.045
 boy-0.039
Who-0.037
-0.036
Token strong
Token ID1913
Feature activation-3.39e-02

Pos logit contributions
What+0.135
Really+0.130
 boy+0.109
Who+0.108
How+0.100
Neg logit contributions
 least-0.156
onel-0.149
aved-0.139
col-0.138
ird-0.135
Token.
Token ID13
Feature activation-6.36e-04

Pos logit contributions
 least+0.068
onel+0.068
ird+0.062
col+0.060
issy+0.060
Neg logit contributions
What-0.045
 boy-0.041
Really-0.039
 man-0.037
It-0.035
Token His
Token ID2399
Feature activation+1.49e-02

Pos logit contributions
 least+0.001
onel+0.001
ird+0.001
col+0.001
issy+0.001
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
Who-0.001
-0.001
Token boat
Token ID8848
Feature activation+4.97e-03

Pos logit contributions
What+0.027
Really+0.024
Who+0.022
How+0.021
It+0.021
Neg logit contributions
 least-0.025
onel-0.024
ird-0.021
oken-0.021
col-0.021
Token caps
Token ID11022
Feature activation-1.54e-02

Pos logit contributions
What+0.008
Really+0.007
 boy+0.006
Who+0.006
It+0.006
Neg logit contributions
 least-0.009
onel-0.009
ird-0.008
col-0.008
aved-0.008
Token her
Token ID607
Feature activation-7.72e-03

Pos logit contributions
What+0.035
Really+0.031
 boy+0.029
Who+0.027
It+0.027
Neg logit contributions
onel-0.040
 least-0.040
ird-0.035
aved-0.035
auc-0.035
Token and
Token ID290
Feature activation+3.00e-03

Pos logit contributions
onel+0.014
 least+0.014
ird+0.013
auc+0.012
aved+0.012
Neg logit contributions
What-0.012
 boy-0.011
Really-0.011
 man-0.010
Who-0.009
Token her
Token ID607
Feature activation+3.08e-02

Pos logit contributions
What+0.005
Really+0.005
 boy+0.004
Who+0.004
It+0.004
Neg logit contributions
 least-0.005
onel-0.005
ird-0.005
col-0.004
issy-0.004
Token mom
Token ID1995
Feature activation+2.14e-02

Pos logit contributions
What+0.041
Really+0.037
 boy+0.037
 man+0.033
Who+0.031
Neg logit contributions
 least-0.062
onel-0.062
ird-0.057
auc-0.056
aved-0.055
Token to
Token ID284
Feature activation+2.76e-02

Pos logit contributions
What+0.035
Really+0.032
 boy+0.031
Who+0.028
It+0.028
Neg logit contributions
 least-0.037
onel-0.037
ird-0.033
col-0.032
issy-0.032
Token share
Token ID2648
Feature activation+9.06e-02

Pos logit contributions
What+0.044
Really+0.036
 boy+0.035
It+0.034
Who+0.034
Neg logit contributions
onel-0.053
 least-0.048
auc-0.048
ird-0.046
aved-0.046
Token.
Token ID13
Feature activation-8.45e-03

Pos logit contributions
What+0.145
Really+0.143
 boy+0.132
+0.122
Who+0.120
Neg logit contributions
 least-0.154
onel-0.143
col-0.140
ird-0.135
 ease-0.133
Token
Token ID198
Feature activation-1.33e-02

Pos logit contributions
onel+0.018
 least+0.018
ird+0.016
aved+0.016
issy+0.016
Neg logit contributions
What-0.011
Really-0.010
 boy-0.009
It-0.009
Who-0.008
Token
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 least+0.029
onel+0.029
ird+0.027
col+0.026
aved+0.026
Neg logit contributions
What-0.016
Really-0.014
 boy-0.013
Who-0.011
 man-0.011
TokenThe
Token ID464
Feature activation+4.84e-03

Pos logit contributions
 least+0.021
onel+0.021
ird+0.019
col+0.019
aved+0.018
Neg logit contributions
What-0.013
Really-0.012
 boy-0.012
 man-0.010
Who-0.010
Token sign
Token ID1051
Feature activation+8.32e-03

Pos logit contributions
What+0.007
Really+0.006
 boy+0.006
It+0.005
Who+0.005
Neg logit contributions
 least-0.010
onel-0.009
ird-0.008
col-0.008
 er-0.008
Token witch
Token ID16365
Feature activation+6.56e-02

Pos logit contributions
What+0.060
Really+0.054
 boy+0.049
It+0.046
Who+0.045
Neg logit contributions
 least-0.082
onel-0.081
ird-0.073
col-0.073
issy-0.071
Token.
Token ID13
Feature activation+9.42e-03

Pos logit contributions
What+0.099
Really+0.089
 boy+0.088
 man+0.079
Who+0.077
Neg logit contributions
 least-0.122
onel-0.119
ird-0.108
col-0.106
aved-0.106
Token Lily
Token ID20037
Feature activation-1.68e-02

Pos logit contributions
What+0.014
Really+0.013
 boy+0.012
It+0.011
Who+0.011
Neg logit contributions
 least-0.018
onel-0.018
ird-0.017
aved-0.016
col-0.016
Token learned
Token ID4499
Feature activation-1.21e-02

Pos logit contributions
 least+0.032
onel+0.032
ird+0.029
col+0.028
auc+0.028
Neg logit contributions
What-0.024
Really-0.022
 boy-0.022
 man-0.019
It-0.019
Token that
Token ID326
Feature activation+5.44e-03

Pos logit contributions
onel+0.025
 least+0.024
ird+0.024
aved+0.023
auc+0.022
Neg logit contributions
What-0.017
Really-0.014
 boy-0.014
It-0.013
 man-0.013
Token being
Token ID852
Feature activation+8.24e-02

Pos logit contributions
What+0.007
Really+0.006
 boy+0.006
It+0.005
Who+0.005
Neg logit contributions
 least-0.011
onel-0.011
ird-0.011
aved-0.010
issy-0.010
Token brave
Token ID14802
Feature activation-1.69e-02

Pos logit contributions
What+0.111
Really+0.098
 boy+0.094
It+0.083
Who+0.081
Neg logit contributions
 least-0.171
onel-0.168
ird-0.155
col-0.151
issy-0.150
Token and
Token ID290
Feature activation+3.84e-03

Pos logit contributions
 least+0.033
onel+0.032
ird+0.030
col+0.029
issy+0.029
Neg logit contributions
What-0.024
Really-0.022
 boy-0.021
Who-0.018
It-0.018
Token helping
Token ID5742
Feature activation-6.56e-03

Pos logit contributions
What+0.006
Really+0.005
 boy+0.005
It+0.004
Who+0.004
Neg logit contributions
onel-0.007
 least-0.007
ird-0.007
issy-0.007
auc-0.007
Token others
Token ID1854
Feature activation-2.41e-02

Pos logit contributions
onel+0.013
 least+0.013
ird+0.012
aved+0.012
col+0.011
Neg logit contributions
What-0.009
 boy-0.008
Really-0.008
It-0.007
 man-0.007
Token can
Token ID460
Feature activation-4.78e-02

Pos logit contributions
 least+0.041
onel+0.040
ird+0.035
col+0.034
aved+0.034
Neg logit contributions
What-0.041
Really-0.039
 boy-0.036
Who-0.033
 man-0.032
Token and
Token ID290
Feature activation+1.46e-02

Pos logit contributions
What+0.020
Really+0.017
 boy+0.017
 man+0.015
It+0.015
Neg logit contributions
 least-0.031
onel-0.030
ird-0.027
col-0.027
issy-0.026
Token cars
Token ID5006
Feature activation-2.51e-03

Pos logit contributions
What+0.023
Really+0.021
 boy+0.020
It+0.018
Who+0.018
Neg logit contributions
 least-0.027
onel-0.026
ird-0.024
col-0.023
issy-0.023
Token and
Token ID290
Feature activation+2.32e-02

Pos logit contributions
 least+0.006
onel+0.005
oken+0.005
ird+0.005
inda+0.005
Neg logit contributions
What-0.003
 boy-0.003
 man-0.002
Really-0.002
It-0.002
Token cats
Token ID11875
Feature activation+2.76e-02

Pos logit contributions
What+0.036
Really+0.033
 boy+0.031
It+0.028
+0.028
Neg logit contributions
 least-0.042
onel-0.041
ird-0.037
col-0.037
issy-0.036
Token.
Token ID13
Feature activation+2.76e-02

Pos logit contributions
What+0.035
 boy+0.030
Really+0.027
 man+0.027
It+0.026
Neg logit contributions
 least-0.059
onel-0.058
oken-0.052
issy-0.052
auc-0.051
Token They
Token ID1119
Feature activation+1.12e-02

Pos logit contributions
What+0.032
Really+0.027
 boy+0.026
Who+0.022
It+0.021
Neg logit contributions
onel-0.065
 least-0.065
ird-0.059
col-0.058
aved-0.057
Token mixed
Token ID7668
Feature activation+2.06e-02

Pos logit contributions
What+0.018
 boy+0.016
Really+0.016
It+0.015
Who+0.015
Neg logit contributions
onel-0.020
 least-0.020
ird-0.017
oken-0.017
col-0.017
Token the
Token ID262
Feature activation-4.24e-03

Pos logit contributions
What+0.030
Really+0.027
 boy+0.025
Who+0.023
It+0.022
Neg logit contributions
 least-0.041
onel-0.039
ird-0.036
col-0.036
aved-0.035
Token colors
Token ID7577
Feature activation-3.22e-02

Pos logit contributions
onel+0.007
 least+0.007
ird+0.006
ovable+0.006
bar+0.006
Neg logit contributions
What-0.008
 boy-0.007
Really-0.007
 man-0.007
It-0.006
Token and
Token ID290
Feature activation+1.55e-03

Pos logit contributions
 least+0.074
onel+0.073
ird+0.069
issy+0.065
oken+0.065
Neg logit contributions
What-0.036
 boy-0.031
Really-0.029
It-0.027
 man-0.026
Token made
Token ID925
Feature activation+3.97e-02

Pos logit contributions
What+0.002
Really+0.002
 boy+0.002
It+0.002
Who+0.002
Neg logit contributions
 least-0.003
onel-0.003
ird-0.003
col-0.003
issy-0.003
Token
Token ID198
Feature activation+1.12e-02

Pos logit contributions
What+0.009
Really+0.008
 boy+0.007
Who+0.007
 Please+0.006
Neg logit contributions
 least-0.010
onel-0.010
ird-0.009
col-0.009
 ease-0.009
TokenThe
Token ID464
Feature activation+1.44e-02

Pos logit contributions
What+0.012
Really+0.012
 boy+0.011
Who+0.010
 characters+0.010
Neg logit contributions
 least-0.021
ird-0.020
bar-0.019
col-0.019
onel-0.019
Token cat
Token ID3797
Feature activation-1.34e-02

Pos logit contributions
What+0.021
Really+0.019
 boy+0.016
Who+0.016
+0.016
Neg logit contributions
 least-0.027
onel-0.025
 er-0.025
ird-0.024
issy-0.023
Token me
Token ID502
Feature activation-1.61e-02

Pos logit contributions
 least+0.024
onel+0.023
 er+0.021
col+0.021
ird+0.020
Neg logit contributions
What-0.021
Really-0.019
 boy-0.017
It-0.016
Who-0.016
Tokenows
Token ID1666
Feature activation-2.45e-02

Pos logit contributions
 least+0.017
onel+0.016
ird+0.014
col+0.013
issy+0.012
Neg logit contributions
What-0.038
Really-0.036
 boy-0.035
Who-0.033
-0.032
Token and
Token ID290
Feature activation+2.15e-02

Pos logit contributions
 least+0.049
onel+0.048
col+0.044
ird+0.044
aved+0.043
Neg logit contributions
What-0.034
Really-0.031
 boy-0.028
Who-0.026
-0.025
Token jumps
Token ID18045
Feature activation+4.96e-03

Pos logit contributions
What+0.035
Really+0.031
It+0.028
 boy+0.028
+0.028
Neg logit contributions
 least-0.036
 er-0.033
onel-0.031
col-0.029
 ease-0.029
Token out
Token ID503
Feature activation-4.37e-02

Pos logit contributions
What+0.009
Really+0.008
 boy+0.008
Who+0.007
It+0.007
Neg logit contributions
 least-0.008
onel-0.008
ird-0.007
col-0.007
issy-0.007
Token of
Token ID286
Feature activation-2.94e-02

Pos logit contributions
 least+0.069
onel+0.066
ird+0.060
issy+0.056
col+0.055
Neg logit contributions
What-0.082
 boy-0.075
Really-0.074
 man-0.068
It-0.068
Token the
Token ID262
Feature activation-1.54e-02

Pos logit contributions
 least+0.057
onel+0.055
ird+0.050
col+0.049
issy+0.048
Neg logit contributions
What-0.043
Really-0.040
 boy-0.037
Who-0.033
-0.033
Token box
Token ID3091
Feature activation-8.62e-02

Pos logit contributions
 least+0.025
onel+0.024
ird+0.021
col+0.021
auc+0.020
Neg logit contributions
What-0.027
Really-0.026
 boy-0.024
Who-0.022
It-0.022
TokenOne
Token ID3198
Feature activation+5.56e-02

Pos logit contributions
ird+0.017
col+0.017
issy+0.016
inda+0.016
bar+0.015
Neg logit contributions
Really-0.010
What-0.010
 boy-0.009
ller-0.008
 MacDonald-0.008
Token morning
Token ID3329
Feature activation+3.79e-02

Pos logit contributions
What+0.103
Really+0.094
 boy+0.092
Who+0.083
It+0.083
Neg logit contributions
 least-0.088
onel-0.085
ird-0.076
col-0.074
aved-0.072
Token he
Token ID339
Feature activation+3.69e-03

Pos logit contributions
What+0.082
Really+0.078
 boy+0.071
Who+0.069
How+0.068
Neg logit contributions
 least-0.044
onel-0.041
ird-0.039
col-0.036
issy-0.036
Token woke
Token ID19092
Feature activation-2.16e-03

Pos logit contributions
What+0.006
Really+0.006
 boy+0.005
Who+0.005
It+0.005
Neg logit contributions
 least-0.007
onel-0.006
ird-0.006
col-0.006
issy-0.005
Token up
Token ID510
Feature activation+4.15e-03

Pos logit contributions
 least+0.003
onel+0.003
 er+0.002
col+0.002
oken+0.002
Neg logit contributions
What-0.004
Really-0.004
 boy-0.004
Who-0.004
It-0.004
Token and
Token ID290
Feature activation-4.64e-03

Pos logit contributions
What+0.006
Really+0.005
 boy+0.005
Who+0.004
It+0.004
Neg logit contributions
 least-0.009
onel-0.008
ird-0.008
col-0.008
issy-0.007
Token couldn
Token ID3521
Feature activation-1.73e-02

Pos logit contributions
 least+0.008
onel+0.007
col+0.006
ird+0.006
aved+0.006
Neg logit contributions
What-0.008
Really-0.007
Spot-0.006
 boy-0.006
 MacDonald-0.006
Token't
Token ID470
Feature activation-8.26e-04

Pos logit contributions
 least+0.033
onel+0.032
ird+0.029
col+0.029
issy+0.028
Neg logit contributions
What-0.026
Really-0.024
 boy-0.023
Who-0.020
It-0.020
Token see
Token ID766
Feature activation+4.50e-03

Pos logit contributions
onel+0.001
 least+0.001
col+0.001
ird+0.001
auc+0.001
Neg logit contributions
What-0.001
Really-0.001
 boy-0.001
Who-0.001
It-0.001
Token anything
Token ID1997
Feature activation+3.98e-02

Pos logit contributions
What+0.007
Really+0.006
 boy+0.006
Who+0.006
It+0.006
Neg logit contributions
 least-0.008
onel-0.008
ird-0.007
col-0.007
issy-0.007
Token because
Token ID780
Feature activation-8.27e-03

Pos logit contributions
What+0.062
Really+0.054
 boy+0.053
 man+0.048
Who+0.048
Neg logit contributions
 least-0.075
onel-0.073
ird-0.067
col-0.065
issy-0.065
Token better
Token ID1365
Feature activation+2.99e-02

Pos logit contributions
 least+0.038
onel+0.038
ird+0.034
col+0.034
issy+0.033
Neg logit contributions
What-0.050
Really-0.045
 boy-0.045
It-0.041
 man-0.041
Token and
Token ID290
Feature activation-1.73e-02

Pos logit contributions
What+0.050
Really+0.046
 boy+0.043
Who+0.039
It+0.039
Neg logit contributions
 least-0.054
onel-0.052
ird-0.047
col-0.044
aved-0.044
Token realized
Token ID6939
Feature activation-2.73e-02

Pos logit contributions
 least+0.033
onel+0.032
ird+0.030
col+0.029
aved+0.029
Neg logit contributions
What-0.026
Really-0.024
 boy-0.023
It-0.020
 man-0.020
Token that
Token ID326
Feature activation+1.25e-02

Pos logit contributions
 least+0.051
onel+0.050
ird+0.045
col+0.044
aved+0.044
Neg logit contributions
What-0.043
Really-0.038
 boy-0.037
It-0.033
Who-0.033
Token sometimes
Token ID3360
Feature activation+5.83e-02

Pos logit contributions
What+0.020
Really+0.018
 boy+0.017
+0.016
Who+0.016
Neg logit contributions
 least-0.022
onel-0.022
ird-0.020
col-0.019
issy-0.019
Token it
Token ID340
Feature activation-6.83e-03

Pos logit contributions
What+0.095
Really+0.088
 boy+0.078
Who+0.074
It+0.073
Neg logit contributions
 least-0.106
onel-0.100
ird-0.092
 er-0.086
 ease-0.086
Token's
Token ID338
Feature activation-5.24e-02

Pos logit contributions
 least+0.009
onel+0.009
ird+0.009
col+0.009
aved+0.008
Neg logit contributions
What-0.013
 boy-0.012
Really-0.012
It-0.011
 man-0.011
Token good
Token ID922
Feature activation-6.41e-03

Pos logit contributions
 least+0.084
onel+0.075
ird+0.065
oken+0.063
 ease+0.063
Neg logit contributions
What-0.097
Really-0.094
 boy-0.088
-0.085
Who-0.084
Token to
Token ID284
Feature activation-5.66e-03

Pos logit contributions
 least+0.012
onel+0.011
ird+0.010
col+0.010
aved+0.010
Neg logit contributions
What-0.011
Really-0.010
 boy-0.009
Who-0.009
It-0.008
Token listen
Token ID6004
Feature activation+3.42e-02

Pos logit contributions
 least+0.010
onel+0.009
ird+0.008
 ease+0.008
oken+0.008
Neg logit contributions
What-0.010
Really-0.009
 boy-0.009
-0.008
 man-0.008
Token to
Token ID284
Feature activation-1.56e-02

Pos logit contributions
What+0.058
Really+0.054
 boy+0.050
Who+0.047
It+0.044
Neg logit contributions
 least-0.058
onel-0.057
col-0.051
ird-0.050
inda-0.049
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
What+0.018
Really+0.016
 boy+0.015
Who+0.013
It+0.013
Neg logit contributions
 least-0.025
onel-0.024
ird-0.022
col-0.022
issy-0.021
Token park
Token ID3952
Feature activation-4.70e-03

Pos logit contributions
What+0.032
Really+0.027
 boy+0.027
 man+0.024
It+0.024
Neg logit contributions
onel-0.040
 least-0.037
ird-0.037
issy-0.035
aved-0.035
Token to
Token ID284
Feature activation+1.33e-02

Pos logit contributions
onel+0.011
 least+0.011
ird+0.010
issy+0.010
col+0.010
Neg logit contributions
What-0.005
 boy-0.005
Really-0.004
 man-0.004
It-0.004
Token play
Token ID711
Feature activation+4.82e-03

Pos logit contributions
What+0.019
 boy+0.016
Who+0.014
Really+0.014
It+0.013
Neg logit contributions
onel-0.030
issy-0.026
 least-0.026
ird-0.026
ovable-0.026
Token with
Token ID351
Feature activation+2.69e-03

Pos logit contributions
What+0.006
Really+0.005
 boy+0.005
Who+0.004
It+0.004
Neg logit contributions
 least-0.010
onel-0.010
ird-0.009
col-0.009
issy-0.009
Token his
Token ID465
Feature activation+1.21e-02

Pos logit contributions
What+0.003
 boy+0.002
Really+0.002
 man+0.002
It+0.002
Neg logit contributions
onel-0.007
 least-0.007
ird-0.006
col-0.006
issy-0.006
Token friends
Token ID2460
Feature activation-1.90e-02

Pos logit contributions
What+0.016
Really+0.014
 boy+0.014
 man+0.012
It+0.012
Neg logit contributions
 least-0.025
onel-0.025
ird-0.023
issy-0.022
col-0.022
Token.
Token ID13
Feature activation-6.89e-03

Pos logit contributions
onel+0.047
 least+0.044
issy+0.043
oken+0.043
col+0.043
Neg logit contributions
What-0.018
 boy-0.017
 man-0.013
Can-0.012
-0.012
Token Max
Token ID5436
Feature activation-1.56e-03

Pos logit contributions
 least+0.014
onel+0.014
ird+0.013
col+0.013
issy+0.013
Neg logit contributions
What-0.009
Really-0.008
 boy-0.007
Who-0.007
It-0.007
Token's
Token ID338
Feature activation-1.54e-02

Pos logit contributions
 least+0.003
onel+0.003
ird+0.003
col+0.002
issy+0.002
Neg logit contributions
What-0.002
Really-0.002
 boy-0.002
Who-0.002
It-0.002
Token friends
Token ID2460
Feature activation+8.36e-03

Pos logit contributions
 least+0.030
onel+0.030
ird+0.027
col+0.027
aved+0.026
Neg logit contributions
What-0.022
Really-0.020
 boy-0.018
Who-0.017
It-0.017
Token both
Token ID1111
Feature activation+9.73e-03

Pos logit contributions
What+0.009
Really+0.007
 boy+0.007
It+0.006
 man+0.006
Neg logit contributions
onel-0.015
 least-0.015
ird-0.014
issy-0.013
auc-0.013
Token hands
Token ID2832
Feature activation-6.56e-02

Pos logit contributions
What+0.015
 boy+0.014
Really+0.013
 man+0.012
It+0.012
Neg logit contributions
onel-0.018
 least-0.018
ird-0.017
issy-0.016
auc-0.016
Token.
Token ID13
Feature activation-1.92e-02

Pos logit contributions
onel+0.150
 least+0.147
issy+0.136
ird+0.135
auc+0.134
Neg logit contributions
What-0.070
 boy-0.063
 man-0.057
Really-0.055
It-0.054
Token Then
Token ID3244
Feature activation-4.88e-02

Pos logit contributions
 least+0.039
onel+0.038
ird+0.035
col+0.034
aved+0.034
Neg logit contributions
What-0.027
Really-0.024
 boy-0.023
Who-0.020
It-0.020
Token he
Token ID339
Feature activation-2.08e-02

Pos logit contributions
 least+0.075
onel+0.069
ird+0.063
col+0.060
aved+0.059
Neg logit contributions
What-0.092
Really-0.085
 boy-0.077
Who-0.074
It-0.073
Token said
Token ID531
Feature activation-3.44e-02

Pos logit contributions
 least+0.035
onel+0.033
ird+0.030
col+0.030
issy+0.029
Neg logit contributions
What-0.037
Really-0.034
 boy-0.031
Who-0.029
It-0.029
Token the
Token ID262
Feature activation+1.51e-02

Pos logit contributions
onel+0.049
 least+0.048
ird+0.042
col+0.042
aved+0.039
Neg logit contributions
What-0.068
Really-0.062
 boy-0.059
Who-0.056
How-0.054
Token same
Token ID976
Feature activation+1.25e-02

Pos logit contributions
What+0.024
Really+0.022
+0.020
Who+0.019
 boy+0.019
Neg logit contributions
 least-0.027
onel-0.024
 er-0.024
ird-0.022
 ease-0.022
Token magic
Token ID5536
Feature activation-3.84e-02

Pos logit contributions
What+0.018
Really+0.017
 boy+0.015
Who+0.014
It+0.014
Neg logit contributions
 least-0.024
onel-0.024
ird-0.022
col-0.021
 er-0.021
Token word
Token ID1573
Feature activation-1.82e-02

Pos logit contributions
 least+0.066
onel+0.065
 er+0.062
col+0.061
ird+0.059
Neg logit contributions
What-0.063
Really-0.057
Who-0.049
How-0.048
-0.048
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
 least+0.033
onel+0.033
ird+0.030
col+0.029
issy+0.028
Neg logit contributions
What-0.028
Really-0.026
 boy-0.025
Who-0.022
It-0.022
TokenI
Token ID40
Feature activation-3.83e-02

Pos logit contributions
What+0.013
Really+0.013
 boy+0.012
 man+0.011
+0.011
Neg logit contributions
 least-0.030
onel-0.028
col-0.028
aved-0.026
issy-0.026
Token have
Token ID423
Feature activation+1.80e-02

Pos logit contributions
 least+0.071
onel+0.071
col+0.063
ird+0.063
issy+0.061
Neg logit contributions
What-0.058
 boy-0.052
Really-0.051
It-0.046
 man-0.046
Token the
Token ID262
Feature activation-1.93e-02

Pos logit contributions
What+0.026
Really+0.024
 boy+0.023
Who+0.020
It+0.020
Neg logit contributions
 least-0.035
onel-0.034
ird-0.031
col-0.031
issy-0.030
Token strongest
Token ID12841
Feature activation+2.47e-03

Pos logit contributions
onel+0.031
 least+0.030
ird+0.029
auc+0.028
issy+0.028
Neg logit contributions
What-0.033
 boy-0.031
Really-0.030
 man-0.028
It-0.027
Token branches
Token ID13737
Feature activation-7.66e-02

Pos logit contributions
What+0.004
 boy+0.004
Really+0.004
 man+0.004
Who+0.004
Neg logit contributions
 least-0.004
onel-0.004
issy-0.003
ird-0.003
col-0.003
Token!
Token ID0
Feature activation-1.55e-02

Pos logit contributions
 least+0.142
onel+0.139
ird+0.127
issy+0.126
col+0.125
Neg logit contributions
What-0.112
 boy-0.100
Really-0.098
It-0.090
 man-0.089
Token I
Token ID314
Feature activation-2.02e-02

Pos logit contributions
 least+0.031
onel+0.029
ird+0.028
col+0.027
issy+0.026
Neg logit contributions
What-0.022
Really-0.019
 boy-0.019
 man-0.017
Who-0.016
Token can
Token ID460
Feature activation-2.91e-02

Pos logit contributions
 least+0.036
onel+0.036
ird+0.032
col+0.032
issy+0.032
Neg logit contributions
What-0.031
 boy-0.028
Really-0.027
It-0.025
 man-0.024
Token handle
Token ID5412
Feature activation-4.51e-02

Pos logit contributions
 least+0.051
onel+0.050
ird+0.045
col+0.045
issy+0.044
Neg logit contributions
What-0.047
Really-0.043
 boy-0.041
It-0.038
Who-0.037
Token anything
Token ID1997
Feature activation+4.19e-02

Pos logit contributions
 least+0.085
onel+0.080
col+0.073
ird+0.072
bar+0.071
Neg logit contributions
What-0.069
Really-0.066
 boy-0.060
Who-0.055
-0.054
Token."
Token ID526
Feature activation+1.34e-02

Pos logit contributions
What+0.057
Really+0.051
 boy+0.049
It+0.044
 man+0.043
Neg logit contributions
 least-0.084
onel-0.082
ird-0.076
issy-0.075
col-0.074
Token it
Token ID340
Feature activation-1.13e-02

Pos logit contributions
What+0.028
Really+0.022
 boy+0.021
It+0.017
Who+0.016
Neg logit contributions
 least-0.081
onel-0.080
ird-0.075
col-0.073
issy-0.073
Token on
Token ID319
Feature activation-2.85e-02

Pos logit contributions
 least+0.027
onel+0.025
ird+0.025
issy+0.025
opl+0.024
Neg logit contributions
What-0.012
 boy-0.009
Really-0.009
It-0.008
 man-0.007
Token a
Token ID257
Feature activation+5.51e-03

Pos logit contributions
onel+0.068
 least+0.062
aved+0.060
auc+0.059
ovable+0.058
Neg logit contributions
What-0.038
 boy-0.029
It-0.028
 man-0.027
Really-0.027
Token leaf
Token ID12835
Feature activation-2.03e-02

Pos logit contributions
What+0.007
 boy+0.007
 man+0.006
Really+0.006
It+0.006
Neg logit contributions
onel-0.012
 least-0.012
oken-0.011
aved-0.011
auc-0.010
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
onel+0.052
oken+0.051
 least+0.050
ibr+0.050
 persever+0.049
Neg logit contributions
 boy-0.014
 man-0.013
 lady-0.010
What-0.010
 with-0.009
Token
Token ID198
Feature activation-2.05e-02

Pos logit contributions
 least+0.032
onel+0.031
ird+0.028
col+0.028
issy+0.027
Neg logit contributions
What-0.023
Really-0.021
 boy-0.020
Who-0.017
It-0.017
Token
Token ID198
Feature activation-1.60e-02

Pos logit contributions
 least+0.040
onel+0.039
ird+0.037
col+0.036
issy+0.036
Neg logit contributions
What-0.027
Really-0.026
 boy-0.024
Who-0.021
-0.020
TokenL
Token ID43
Feature activation-6.50e-02

Pos logit contributions
onel+0.028
col+0.028
ird+0.027
 least+0.027
inda+0.027
Neg logit contributions
What-0.020
 boy-0.020
Really-0.020
 MacDonald-0.018
 man-0.018
Tokenily
Token ID813
Feature activation+6.30e-03

Pos logit contributions
 least+0.102
onel+0.097
col+0.087
aved+0.086
ird+0.086
Neg logit contributions
What-0.120
 boy-0.108
Really-0.106
It-0.098
Who-0.096
Token cared
Token ID19951
Feature activation-1.27e-02

Pos logit contributions
What+0.009
 boy+0.009
Really+0.008
 man+0.008
+0.007
Neg logit contributions
onel-0.012
 least-0.012
ird-0.011
auc-0.010
col-0.010
Token about
Token ID546
Feature activation+5.98e-02

Pos logit contributions
 least+0.031
ird+0.029
onel+0.029
aved+0.028
auc+0.028
Neg logit contributions
What-0.012
 boy-0.010
It-0.009
 man-0.009
Really-0.008
Token on
Token ID319
Feature activation-3.60e-02

Pos logit contributions
What+0.047
Really+0.045
 boy+0.040
Who+0.038
+0.037
Neg logit contributions
 least-0.053
onel-0.050
ird-0.046
col-0.046
aved-0.044
Token his
Token ID465
Feature activation-3.89e-03

Pos logit contributions
 least+0.073
onel+0.071
ird+0.066
col+0.065
aved+0.063
Neg logit contributions
What-0.049
Really-0.045
 boy-0.042
Who-0.037
It-0.037
Token work
Token ID670
Feature activation+4.93e-02

Pos logit contributions
 least+0.007
onel+0.007
ird+0.006
col+0.006
 er+0.006
Neg logit contributions
What-0.006
Really-0.006
 boy-0.005
It-0.005
Who-0.005
Token clothes
Token ID8242
Feature activation-6.59e-02

Pos logit contributions
What+0.051
 boy+0.049
 man+0.047
Really+0.040
It+0.038
Neg logit contributions
onel-0.113
 least-0.112
aved-0.099
ovable-0.099
issy-0.098
Token,
Token ID11
Feature activation-3.40e-02

Pos logit contributions
onel+0.167
 least+0.163
inda+0.151
oken+0.149
col+0.148
Neg logit contributions
What-0.054
 boy-0.048
 man-0.043
Really-0.038
-0.038
Token picked
Token ID6497
Feature activation-1.83e-02

Pos logit contributions
onel+0.071
 least+0.070
ird+0.063
aved+0.062
col+0.062
Neg logit contributions
What-0.046
 boy-0.040
Really-0.039
 man-0.035
Who-0.035
Token up
Token ID510
Feature activation-1.27e-02

Pos logit contributions
 least+0.030
onel+0.030
col+0.026
ird+0.026
auc+0.025
Neg logit contributions
What-0.031
Really-0.030
 boy-0.027
Who-0.025
-0.025
Token his
Token ID465
Feature activation-1.18e-02

Pos logit contributions
 least+0.025
onel+0.024
ird+0.022
col+0.022
aved+0.021
Neg logit contributions
What-0.018
Really-0.017
 boy-0.016
Who-0.014
It-0.014
Token work
Token ID670
Feature activation+4.64e-02

Pos logit contributions
 least+0.021
onel+0.020
ird+0.019
col+0.018
 er+0.018
Neg logit contributions
What-0.019
Really-0.018
 boy-0.016
It-0.015
Who-0.015
Tokenbag
Token ID21454
Feature activation-3.81e-03

Pos logit contributions
What+0.056
 boy+0.053
 man+0.049
Really+0.047
It+0.043
Neg logit contributions
 least-0.099
onel-0.098
aved-0.087
ird-0.085
issy-0.085
Token and
Token ID290
Feature activation-4.73e-03

Pos logit contributions
 least+0.008
onel+0.008
ird+0.007
col+0.007
issy+0.007
Neg logit contributions
What-0.005
 boy-0.005
Really-0.004
 man-0.004
It-0.004
Token Lily
Token ID20037
Feature activation-8.58e-03

Pos logit contributions
 least+0.011
onel+0.010
ird+0.010
 er+0.009
issy+0.009
Neg logit contributions
What-0.010
Really-0.010
Who-0.008
 boy-0.008
-0.008
Token.
Token ID13
Feature activation+5.83e-04

Pos logit contributions
 least+0.014
onel+0.013
ird+0.012
col+0.011
issy+0.011
Neg logit contributions
What-0.016
Really-0.015
 boy-0.013
Who-0.013
It-0.012
Token It
Token ID632
Feature activation+1.28e-02

Pos logit contributions
What+0.001
Really+0.001
 boy+0.001
 man+0.001
Who+0.001
Neg logit contributions
 least-0.001
onel-0.001
ird-0.001
col-0.001
issy-0.001
Token's
Token ID338
Feature activation-2.90e-02

Pos logit contributions
What+0.025
 boy+0.023
Really+0.023
It+0.021
 man+0.021
Neg logit contributions
 least-0.018
onel-0.018
ird-0.015
col-0.015
aved-0.015
Token so
Token ID523
Feature activation+2.19e-02

Pos logit contributions
 least+0.051
onel+0.049
ird+0.044
col+0.043
issy+0.042
Neg logit contributions
What-0.049
Really-0.045
 boy-0.043
Who-0.039
It-0.038
Token much
Token ID881
Feature activation-2.07e-02

Pos logit contributions
What+0.028
Really+0.025
 boy+0.025
It+0.021
 man+0.021
Neg logit contributions
 least-0.047
onel-0.046
ird-0.042
col-0.041
aved-0.041
Token fun
Token ID1257
Feature activation-1.69e-02

Pos logit contributions
onel+0.041
 least+0.041
ird+0.037
aved+0.037
col+0.036
Neg logit contributions
What-0.030
Really-0.026
 boy-0.026
 man-0.023
It-0.022
Token!"
Token ID2474
Feature activation+2.47e-02

Pos logit contributions
 least+0.029
onel+0.026
col+0.025
ird+0.024
inda+0.024
Neg logit contributions
What-0.030
Really-0.029
Who-0.024
 boy-0.024
It-0.023
Token Lily
Token ID20037
Feature activation-4.39e-02

Pos logit contributions
What+0.044
Really+0.040
 boy+0.040
It+0.036
+0.036
Neg logit contributions
onel-0.042
 least-0.041
aved-0.036
ird-0.035
oken-0.035
Token smiled
Token ID13541
Feature activation+1.25e-02

Pos logit contributions
 least+0.078
onel+0.074
ird+0.068
col+0.068
inda+0.065
Neg logit contributions
What-0.073
Really-0.067
 boy-0.060
Who-0.057
It-0.055
Token and
Token ID290
Feature activation+3.52e-03

Pos logit contributions
What+0.018
Really+0.017
 boy+0.014
Who+0.014
How+0.013
Neg logit contributions
 least-0.024
onel-0.023
col-0.021
inda-0.020
 er-0.020
Token stay
Token ID2652
Feature activation+7.12e-02

Pos logit contributions
What+0.035
 boy+0.031
Really+0.029
It+0.028
 man+0.028
Neg logit contributions
onel-0.034
 least-0.031
aved-0.028
ovable-0.028
col-0.027
Token in
Token ID287
Feature activation+1.13e-02

Pos logit contributions
What+0.123
Really+0.122
 boy+0.109
Who+0.103
It+0.100
Neg logit contributions
 least-0.120
onel-0.109
ird-0.102
col-0.098
oken-0.096
Token her
Token ID607
Feature activation+1.31e-02

Pos logit contributions
What+0.017
Really+0.015
 boy+0.014
Who+0.013
It+0.013
Neg logit contributions
onel-0.022
 least-0.022
ird-0.020
col-0.019
aved-0.019
Token room
Token ID2119
Feature activation-1.27e-03

Pos logit contributions
 boy+0.012
What+0.010
 man+0.008
Really+0.007
+0.006
Neg logit contributions
onel-0.035
 least-0.033
aved-0.033
col-0.032
upid-0.032
Token while
Token ID981
Feature activation-1.88e-02

Pos logit contributions
onel+0.003
 least+0.003
aved+0.003
col+0.002
 ease+0.002
Neg logit contributions
What-0.001
 boy-0.001
Really-0.001
 man-0.001
-0.001
Token she
Token ID673
Feature activation-1.04e-01

Pos logit contributions
 least+0.031
onel+0.030
ird+0.027
col+0.026
issy+0.026
Neg logit contributions
What-0.033
Really-0.031
 boy-0.029
It-0.026
Who-0.026
Token went
Token ID1816
Feature activation-4.12e-02

Pos logit contributions
 least+0.196
onel+0.192
ird+0.172
col+0.171
issy+0.166
Neg logit contributions
What-0.159
Really-0.144
 boy-0.141
Who-0.124
It-0.123
Token to
Token ID284
Feature activation-1.92e-03

Pos logit contributions
onel+0.091
 least+0.088
ird+0.081
col+0.078
issy+0.078
Neg logit contributions
What-0.053
 boy-0.046
 man-0.040
Really-0.040
Who-0.038
Token the
Token ID262
Feature activation+9.57e-03

Pos logit contributions
onel+0.004
 least+0.004
col+0.003
ird+0.003
aved+0.003
Neg logit contributions
What-0.003
 boy-0.002
Really-0.002
Who-0.002
 man-0.002
Token store
Token ID3650
Feature activation-4.32e-02

Pos logit contributions
What+0.017
 boy+0.015
Really+0.014
 man+0.014
Who+0.013
Neg logit contributions
onel-0.017
 least-0.016
aved-0.015
ird-0.015
inda-0.015
Token.
Token ID13
Feature activation-1.53e-02

Pos logit contributions
 least+0.075
onel+0.070
ird+0.067
col+0.064
auc+0.063
Neg logit contributions
What-0.072
Really-0.069
 boy-0.060
Who-0.058
It-0.057
Token.
Token ID13
Feature activation-1.36e-02

Pos logit contributions
 least+0.040
onel+0.040
ird+0.036
col+0.036
aved+0.036
Neg logit contributions
What-0.024
 boy-0.023
 man-0.021
Really-0.020
It-0.018
Token But
Token ID887
Feature activation-6.13e-02

Pos logit contributions
 least+0.029
onel+0.028
ird+0.026
col+0.026
issy+0.025
Neg logit contributions
What-0.017
Really-0.015
 boy-0.014
Who-0.012
It-0.012
Token no
Token ID645
Feature activation+3.66e-02

Pos logit contributions
 least+0.120
onel+0.116
ird+0.105
col+0.103
issy+0.101
Neg logit contributions
What-0.091
Really-0.082
 boy-0.077
Who-0.070
It-0.069
Token matter
Token ID2300
Feature activation+2.12e-02

Pos logit contributions
What+0.056
Really+0.051
 boy+0.048
It+0.043
Who+0.043
Neg logit contributions
 least-0.070
onel-0.067
ird-0.062
col-0.060
issy-0.059
Token how
Token ID703
Feature activation+6.23e-02

Pos logit contributions
What+0.026
Really+0.022
 boy+0.022
Who+0.019
It+0.018
Neg logit contributions
onel-0.046
 least-0.045
col-0.043
aved-0.042
ird-0.041
Token hard
Token ID1327
Feature activation-1.04e-01

Pos logit contributions
What+0.110
Really+0.101
 boy+0.095
Who+0.088
It+0.086
Neg logit contributions
 least-0.104
onel-0.102
ird-0.091
col-0.088
aved-0.087
Token he
Token ID339
Feature activation-4.80e-02

Pos logit contributions
 least+0.184
onel+0.165
ird+0.152
 ease+0.151
 er+0.149
Neg logit contributions
What-0.172
Really-0.161
 boy-0.133
Who-0.131
It-0.130
Token tried
Token ID3088
Feature activation-4.17e-02

Pos logit contributions
 least+0.087
onel+0.087
col+0.078
ird+0.078
auc+0.077
Neg logit contributions
What-0.074
Really-0.068
 boy-0.066
Who-0.059
It-0.058
Token,
Token ID11
Feature activation-4.31e-02

Pos logit contributions
onel+0.074
 least+0.073
ird+0.066
col+0.065
issy+0.064
Neg logit contributions
What-0.069
 boy-0.061
Really-0.060
It-0.055
 man-0.055
Token he
Token ID339
Feature activation+2.52e-03

Pos logit contributions
 least+0.084
onel+0.079
ird+0.071
 ease+0.070
col+0.069
Neg logit contributions
What-0.063
Really-0.058
 boy-0.052
-0.048
Who-0.048
Token couldn
Token ID3521
Feature activation-2.59e-02

Pos logit contributions
What+0.004
Really+0.004
 boy+0.004
 man+0.004
It+0.003
Neg logit contributions
 least-0.004
onel-0.004
ird-0.004
col-0.004
auc-0.004
Token cleaning
Token ID12724
Feature activation-2.15e-02

Pos logit contributions
What+0.036
Really+0.032
 boy+0.031
It+0.028
Who+0.028
Neg logit contributions
 least-0.037
onel-0.036
ird-0.032
col-0.032
issy-0.031
Token.
Token ID13
Feature activation-1.09e-02

Pos logit contributions
onel+0.040
 least+0.040
ird+0.036
col+0.035
issy+0.035
Neg logit contributions
What-0.034
Really-0.030
 boy-0.029
 man-0.026
It-0.026
Token They
Token ID1119
Feature activation-1.73e-02

Pos logit contributions
onel+0.024
 least+0.024
ird+0.022
col+0.021
aved+0.021
Neg logit contributions
What-0.013
Really-0.012
 boy-0.012
It-0.010
-0.010
Token argued
Token ID7189
Feature activation+2.56e-02

Pos logit contributions
onel+0.034
 least+0.033
opl+0.031
oken+0.030
upid+0.030
Neg logit contributions
What-0.024
 boy-0.023
It-0.021
 man-0.020
Really-0.019
Token while
Token ID981
Feature activation-3.42e-02

Pos logit contributions
What+0.038
Really+0.035
 boy+0.033
Who+0.030
It+0.029
Neg logit contributions
 least-0.049
onel-0.048
ird-0.043
col-0.042
issy-0.042
Token they
Token ID484
Feature activation-1.06e-01

Pos logit contributions
 least+0.055
onel+0.053
ird+0.048
col+0.046
issy+0.045
Neg logit contributions
What-0.060
Really-0.057
 boy-0.053
It-0.049
Who-0.048
Token worked
Token ID3111
Feature activation+8.96e-03

Pos logit contributions
 least+0.188
onel+0.188
auc+0.167
col+0.163
upid+0.161
Neg logit contributions
What-0.171
 boy-0.151
Really-0.149
It-0.140
 man-0.134
Token and
Token ID290
Feature activation-2.91e-02

Pos logit contributions
What+0.016
Really+0.014
 boy+0.014
It+0.013
 man+0.012
Neg logit contributions
onel-0.015
 least-0.015
ird-0.013
col-0.013
issy-0.013
Token they
Token ID484
Feature activation-4.02e-02

Pos logit contributions
onel+0.054
 least+0.051
auc+0.048
col+0.047
ovable+0.047
Neg logit contributions
What-0.047
 boy-0.042
Really-0.041
It-0.038
 man-0.037
Token were
Token ID547
Feature activation-4.54e-02

Pos logit contributions
 least+0.075
onel+0.075
opl+0.070
upid+0.069
auc+0.069
Neg logit contributions
What-0.058
 boy-0.055
It-0.051
 man-0.048
Really-0.048
Token both
Token ID1111
Feature activation+2.20e-02

Pos logit contributions
 least+0.065
onel+0.063
ird+0.056
col+0.054
aved+0.053
Neg logit contributions
What-0.090
Really-0.084
 boy-0.081
Who-0.074
It-0.074
Token did
Token ID750
Feature activation+3.67e-02

Pos logit contributions
 least+0.052
onel+0.052
ird+0.047
col+0.047
auc+0.046
Neg logit contributions
What-0.034
 boy-0.030
Really-0.030
 man-0.026
It-0.026
Token not
Token ID407
Feature activation+5.67e-02

Pos logit contributions
What+0.046
Really+0.044
 boy+0.041
+0.034
Who+0.034
Neg logit contributions
 least-0.080
onel-0.076
ird-0.070
col-0.068
 ease-0.067
Token know
Token ID760
Feature activation-4.28e-03

Pos logit contributions
What+0.096
Really+0.087
 boy+0.084
Who+0.076
It+0.076
Neg logit contributions
 least-0.098
onel-0.096
ird-0.086
col-0.085
issy-0.083
Token that
Token ID326
Feature activation-1.17e-02

Pos logit contributions
 least+0.010
onel+0.010
aved+0.009
ird+0.009
auc+0.009
Neg logit contributions
What-0.005
 boy-0.004
Really-0.004
 man-0.003
Who-0.003
Token the
Token ID262
Feature activation-5.99e-03

Pos logit contributions
onel+0.025
 least+0.024
aved+0.023
ird+0.023
bar+0.022
Neg logit contributions
What-0.015
 boy-0.014
Really-0.013
 man-0.012
Who-0.011
Token box
Token ID3091
Feature activation-1.02e-01

Pos logit contributions
 least+0.010
onel+0.010
ird+0.009
col+0.009
issy+0.008
Neg logit contributions
What-0.010
Really-0.009
 boy-0.009
Who-0.008
It-0.008
Token was
Token ID373
Feature activation-3.44e-02

Pos logit contributions
 least+0.234
onel+0.230
ird+0.212
col+0.210
aved+0.207
Neg logit contributions
What-0.106
 boy-0.097
Really-0.088
 man-0.082
It-0.077
Token not
Token ID407
Feature activation+7.05e-02

Pos logit contributions
 least+0.062
onel+0.060
ird+0.055
col+0.054
issy+0.052
Neg logit contributions
What-0.056
Really-0.050
 boy-0.049
Who-0.044
It-0.044
Token empty
Token ID6565
Feature activation-9.70e-03

Pos logit contributions
What+0.113
Really+0.103
 boy+0.099
Who+0.089
It+0.088
Neg logit contributions
 least-0.127
onel-0.124
ird-0.113
col-0.111
issy-0.108
Token.
Token ID13
Feature activation+1.07e-02

Pos logit contributions
onel+0.019
 least+0.019
ird+0.018
issy+0.018
col+0.017
Neg logit contributions
What-0.013
 boy-0.012
Really-0.011
 man-0.010
Who-0.010
Token It
Token ID632
Feature activation+8.04e-03

Pos logit contributions
What+0.014
Really+0.013
 boy+0.012
It+0.011
Who+0.011
Neg logit contributions
 least-0.023
onel-0.022
ird-0.020
col-0.020
aved-0.020
Token replied
Token ID8712
Feature activation-2.34e-02

Pos logit contributions
 least+0.075
onel+0.071
ird+0.067
col+0.066
 er+0.064
Neg logit contributions
What-0.069
Really-0.063
 boy-0.058
Who-0.054
It-0.052
Token.
Token ID13
Feature activation-1.26e-02

Pos logit contributions
 least+0.052
onel+0.049
issy+0.045
ird+0.045
aved+0.045
Neg logit contributions
What-0.029
 boy-0.025
Really-0.025
 man-0.022
It-0.022
Token\n
Token ID198
Feature activation-1.50e-02

Pos logit contributions
 least+0.026
onel+0.025
ird+0.023
col+0.023
issy+0.022
Neg logit contributions
What-0.017
Really-0.015
 boy-0.014
Who-0.013
 man-0.012
Token\n
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 least+0.029
onel+0.028
ird+0.027
col+0.027
issy+0.026
Neg logit contributions
What-0.020
Really-0.019
 boy-0.017
Who-0.016
 Please-0.015
TokenAs
Token ID1722
Feature activation-1.50e-02

Pos logit contributions
 least+0.019
onel+0.019
ird+0.019
col+0.019
issy+0.017
Neg logit contributions
What-0.013
Really-0.012
 boy-0.012
 man-0.010
 Please-0.010
Token they
Token ID484
Feature activation-1.03e-01

Pos logit contributions
 least+0.028
onel+0.026
ird+0.024
col+0.024
issy+0.023
Neg logit contributions
What-0.024
Really-0.022
 boy-0.020
Who-0.018
It-0.018
Token continued
Token ID3767
Feature activation-2.84e-02

Pos logit contributions
onel+0.192
 least+0.188
auc+0.170
ird+0.165
issy+0.164
Neg logit contributions
What-0.168
Really-0.149
 boy-0.143
It-0.131
Who-0.131
Token walking
Token ID6155
Feature activation-2.68e-02

Pos logit contributions
 least+0.061
onel+0.061
ird+0.057
issy+0.053
upid+0.052
Neg logit contributions
What-0.037
 boy-0.032
Really-0.030
 man-0.029
-0.028
Token,
Token ID11
Feature activation-3.65e-02

Pos logit contributions
 least+0.044
onel+0.043
ird+0.038
col+0.037
issy+0.036
Neg logit contributions
What-0.048
Really-0.044
 boy-0.042
Who-0.039
It-0.038
Token they
Token ID484
Feature activation-2.47e-02

Pos logit contributions
 least+0.065
onel+0.061
ird+0.057
col+0.055
issy+0.054
Neg logit contributions
What-0.058
Really-0.054
 boy-0.050
It-0.045
Who-0.045
Token met
Token ID1138
Feature activation+1.39e-02

Pos logit contributions
onel+0.056
 least+0.056
ird+0.051
col+0.051
auc+0.051
Neg logit contributions
What-0.028
 boy-0.024
Really-0.023
 man-0.020
Who-0.020
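The records above share a fixed text layout. Each one has a "Token" line with the raw token text appended directly, so a leading space is part of the token. Next come a "Token ID" line and a "Feature activation" line. Two lists of five `name±value` entries follow, one under "Pos logit contributions" and one under "Neg logit contributions". The sketch below loads records of this shape into Python. It is a hypothetical helper, not part of any dashboard library. It assumes exactly the layout seen here: the contribution lists after a "Token" line belong to that token, and a contribution entry may have empty token text. It makes no claim about how the contribution values were computed.

```python
import re
from dataclasses import dataclass, field

# One contribution entry such as " least+0.030" or "What-0.031".
# The token text may be empty or start with a space, so it is captured lazily.
_ENTRY = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


@dataclass
class TokenRecord:
    token: str                 # raw token text, leading space preserved
    token_id: int
    activation: float
    pos: list[tuple[str, float]] = field(default_factory=list)
    neg: list[tuple[str, float]] = field(default_factory=list)


def parse_dump(text: str) -> list[TokenRecord]:
    """Hypothetical parser for the dump layout shown above (an assumption)."""
    records: list[TokenRecord] = []
    current = None
    section = None
    for line in text.splitlines():
        # "Token ID" must be tested before the bare "Token" prefix.
        if line.startswith("Token ID"):
            current.token_id = int(line[len("Token ID"):])
        elif line.startswith("Token"):
            current = TokenRecord(token=line[len("Token"):], token_id=-1, activation=0.0)
            records.append(current)
            section = None
        elif line.startswith("Feature activation"):
            current.activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            section = "pos"
        elif line == "Neg logit contributions":
            section = "neg"
        elif line and current is not None and section is not None:
            # Lines before the first "Token" record are skipped (current is None).
            m = _ENTRY.match(line)
            if m:
                getattr(current, section).append((m.group(1), float(m.group(2))))
    return records
```

The parser keeps the token text exactly as exported, so `" bag"` and `"bag"` stay distinct, matching their different token IDs.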