TOP ACTIVATIONS
MAX = 16.837

said seriously . Pe ppa rushed back to her
my calling her . " Pe ppa , it 's time
can ." Pe ep flew out of the
mom replied . Pe ppa organized the meat in
give them . Pe ggy wanted to introduce her
kindly the owl said " Pe anut , if you work
skills . Pe ggy and Sam had so
their loving friendship . <|endoftext|> Pe ggy was very excited .
ear . " Pe ppers are red , cheese
other pigs . Pe ppa was relieved and happy
be brave . Pe pper took a deep breath
door . Pe ggy felt so sorry for
She was so excited ! Pe eking through the leaves ,
One day , Pe ppa stumbled upon a strange
vegetables from their farm . Pe ppa was excited to go
was a humble mouse named Pe anut . Pe anut lived
mouse named Pe anut . Pe anut lived in an airport
George . One day , Pe ppa was playing with her
an airport . Every day Pe anut would wake up and
I organized the meat !" Pe ppa said . Her mom

MIN ACTIVATIONS
MIN = -17.788

When Daisy put the magn ifying glass down , the cater
examine it with the magn ifying glass ?" "
cried , dropping the magn ifying glass . "
borrowed her mom 's magn ifying glass to take a closer
flashlight , and a magn ifying glass . As
Don 't yield to the temptation to mist reat and break
. Lily had a magn ifying glass that she found in
She took out a magn ifying glass and looked very carefully
, handing him the magn ifying glass . She looked around
. They had a magn ifying glass and a small box
, drawing , and doing experiments . Lily and Tom want
and handed him the magn ifying glass . He
She takes out a magn ifying glass . She looks at
can help me with my chores instead . If you help
He pointed his magn ifying glass at different classmates and
the necklace and the magn ifying glass and told him that
the watch with the magn ifying glass . He saw the
ax saw shelves full of experiments and monitors be eping away
said , holding the magn ifying glass over a black ant
a flashlight and a magn ifying glass to examine their finds

INTERVAL 8.181 - 16.837
CONTAINS 5.350%

luggage from his bedroom . Jon ny 's
cl apped his hands . But then ,
frightened and alone . <|endoftext|> L ily and Tom were twins
back safely . ” Max thought about his
was so smart . <|endoftext|> Once upon a time , there

INTERVAL -0.476 - 8.181
CONTAINS 47.030%

Amy smiled and waved back at the elephant . She was
and Tom , what have you done ? You broke the
to Lily , " You are very kind and smart .
. The cop saw that Tim my was upset
The new boy smiled and said , " My

INTERVAL -9.132 - -0.476
CONTAINS 46.086%

cutting the corn and filling the truck . Tom
I wanted to have some too . But I did not
day . She promised to wear it every time she felt
that . They had to pick up all the blocks and
it was ," agreed Daisy . And they went back to

INTERVAL -17.788 - -9.132
CONTAINS 1.534%

fragile ." L ila said . She gave the crystal to
Lily . I was foolish . I won 't do that
had not lost their surprise . They hugged each other and
play outside with her friends . One day , when she
was shaking it all around . Jo an
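
The four INTERVAL blocks above look like equal-width bins spanning MIN to MAX. That is an inference from the listed numbers, not something the dashboard states. The sketch below checks it against the values copied from this page and confirms that the CONTAINS percentages sum to 100. The function names `equal_width_edges` and `edges_match` are hypothetical helpers introduced only for this illustration.

```python
# Values copied from the dashboard text above.
MIN_ACT = -17.788
MAX_ACT = 16.837
LISTED_EDGES = [-17.788, -9.132, -0.476, 8.181, 16.837]
LISTED_SHARES = [1.534, 46.086, 47.030, 5.350]  # percent, lowest interval first


def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges splitting [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def edges_match(computed, listed, tol=1e-3):
    """Check that every computed edge lies within tol of the listed edge."""
    return len(computed) == len(listed) and all(
        abs(c - l) < tol for c, l in zip(computed, listed)
    )


# Assumption: four equal-width bins, matching the four INTERVAL blocks.
edges = equal_width_edges(MIN_ACT, MAX_ACT, n_bins=4)
print(edges_match(edges, LISTED_EDGES))
print(abs(sum(LISTED_SHARES) - 100.0) < 1e-6)
```

The comparison uses a tolerance because the page shows three decimal places, so the listed edges are rounded. Read this way, over 93% of sampled activations fall in the two middle intervals, and only 5.350% reach the top interval where the `Pe` examples sit.
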
Token said
Token ID531
Feature activation+6.65e+00

Pos logit contributions
eking+0.111
ir+0.098
aman+0.096
irs+0.096
Mary+0.094
Neg logit contributions
.-0.080
 He-0.079
."-0.077
 They-0.073
 toy-0.073
Token seriously
Token ID6411
Feature activation+5.61e-01

Pos logit contributions
eking+0.255
aman+0.225
ir+0.225
agg+0.224
Mary+0.222
Neg logit contributions
 proper-0.065
powers-0.061
 He-0.060
 toy-0.054
 They-0.050
Token.
Token ID13
Feature activation-7.98e+00

Pos logit contributions
eking+0.023
ir+0.020
aman+0.020
agg+0.020
Mary+0.020
Neg logit contributions
 proper-0.005
powers-0.004
 He-0.004
 toy-0.004
attered-0.003
Token
Token ID198
Feature activation+1.29e+01

Pos logit contributions
 He+0.146
 They+0.145
."+0.131
 But+0.131
.+0.126
Neg logit contributions
eking-0.283
ir-0.246
eks-0.242
iter-0.239
Jud-0.238
Token
Token ID198
Feature activation+1.28e+01

Pos logit contributions
eking+0.538
eks+0.454
ir+0.440
eling+0.438
Jud+0.438
Neg logit contributions
.-0.169
."-0.168
 He-0.166
 They-0.163
 But-0.141
TokenPe
Token ID6435
Feature activation+1.68e+01

Pos logit contributions
eking+0.302
ir+0.252
Jud+0.249
agg+0.245
ulu+0.242
Neg logit contributions
."-0.318
 He-0.318
 They-0.312
 proper-0.308
.-0.308
Tokenppa
Token ID44989
Feature activation+5.81e+00

Pos logit contributions
whel+0.294
irs+0.284
ulu+0.281
 July+0.271
Jim+0.265
Neg logit contributions
."-0.478
.-0.472
 toy-0.432
ometer-0.413
cream-0.410
Token rushed
Token ID13999
Feature activation-6.23e-01

Pos logit contributions
eking+0.184
ir+0.161
aman+0.161
agg+0.159
Mary+0.158
Neg logit contributions
 He-0.096
 proper-0.093
."-0.091
.-0.090
 toy-0.090
Token back
Token ID736
Feature activation-3.71e-01

Pos logit contributions
 He+0.007
 proper+0.006
."+0.006
 toy+0.006
.+0.006
Neg logit contributions
eking-0.024
ir-0.021
aman-0.021
agg-0.021
Jud-0.021
Token to
Token ID284
Feature activation+2.69e+00

Pos logit contributions
 He+0.003
 proper+0.003
."+0.003
 toy+0.003
.+0.003
Neg logit contributions
eking-0.015
ir-0.013
aman-0.013
agg-0.013
Jud-0.013
Token her
Token ID607
Feature activation+1.28e+00

Pos logit contributions
eking+0.089
agg+0.077
ir+0.077
iter+0.074
aman+0.074
Neg logit contributions
.-0.051
 He-0.047
 toy-0.047
."-0.047
 proper-0.046
Tokenmy
Token ID1820
Feature activation+3.78e+00

Pos logit contributions
eking+0.120
agg+0.112
aman+0.107
Mary+0.106
Jud+0.105
Neg logit contributions
.-0.119
 He-0.118
."-0.114
 They-0.113
 toy-0.110
Token calling
Token ID4585
Feature activation+9.94e-01

Pos logit contributions
eking+0.092
aman+0.077
ir+0.077
agg+0.076
Jud+0.076
Neg logit contributions
 proper-0.089
 He-0.089
."-0.086
powers-0.085
 toy-0.085
Token her
Token ID607
Feature activation+1.21e+00

Pos logit contributions
eking+0.024
aman+0.020
ir+0.020
agg+0.020
Jud+0.020
Neg logit contributions
 proper-0.023
 He-0.023
."-0.023
powers-0.023
 toy-0.022
Token.
Token ID13
Feature activation-7.89e+00

Pos logit contributions
eking+0.039
aman+0.036
Ja+0.035
Jud+0.035
ics+0.035
Neg logit contributions
attered-0.016
powers-0.016
 proper-0.016
 He-0.013
ly-0.012
Token "
Token ID366
Feature activation+3.37e+00

Pos logit contributions
 He+0.134
 They+0.127
."+0.114
.+0.113
 But+0.112
Neg logit contributions
eking-0.278
ir-0.244
Jud-0.234
ulu-0.234
eks-0.234
TokenPe
Token ID6435
Feature activation+1.67e+01

Pos logit contributions
eking+0.106
ir+0.091
irs+0.089
agg+0.088
ics+0.086
Neg logit contributions
 He-0.061
 They-0.058
."-0.057
ny-0.057
 toy-0.057
Tokenppa
Token ID44989
Feature activation+5.60e+00

Pos logit contributions
whel+0.276
ulu+0.260
 July+0.248
irs+0.248
Uh+0.243
Neg logit contributions
."-0.483
.-0.474
 toy-0.437
ometer-0.424
".-0.416
Token,
Token ID11
Feature activation+5.35e+00

Pos logit contributions
eking+0.153
ir+0.132
ma+0.128
Jud+0.127
aman+0.125
Neg logit contributions
 proper-0.123
powers-0.114
 He-0.111
attered-0.105
 They-0.105
Token it
Token ID340
Feature activation-1.01e+00

Pos logit contributions
eking+0.170
ir+0.150
agg+0.149
Jud+0.147
aman+0.146
Neg logit contributions
 He-0.090
 proper-0.088
 toy-0.088
."-0.087
.-0.084
Token's
Token ID338
Feature activation+8.72e-01

Pos logit contributions
 proper+0.015
powers+0.015
 He+0.015
 They+0.013
."+0.013
Neg logit contributions
eking-0.033
aman-0.029
Jud-0.028
ir-0.028
eks-0.028
Token time
Token ID640
Feature activation+3.88e+00

Pos logit contributions
eking+0.023
agg+0.020
Mary+0.020
Jud+0.020
ir+0.020
Neg logit contributions
."-0.020
.-0.020
 toy-0.019
 He-0.019
 proper-0.019
Token can
Token ID460
Feature activation-8.51e-01

Pos logit contributions
eking+0.037
ir+0.034
eks+0.033
ulu+0.033
agg+0.033
Neg logit contributions
 proper-0.017
 He-0.016
 toy-0.016
powers-0.015
attered-0.015
Token."
Token ID526
Feature activation-2.94e+00

Pos logit contributions
 proper+0.017
powers+0.015
 toy+0.014
 He+0.014
attered+0.014
Neg logit contributions
eking-0.025
ir-0.023
agg-0.022
ulu-0.022
Jud-0.022
Token
Token ID220
Feature activation+1.01e+01

Pos logit contributions
 He+0.063
 They+0.060
."+0.059
.+0.058
 But+0.056
Neg logit contributions
eking-0.088
Jud-0.074
ir-0.074
agg-0.073
Mary-0.072
Token
Token ID198
Feature activation+1.24e+01

Pos logit contributions
eking+0.342
ir+0.292
Jud+0.289
aman+0.287
Mary+0.287
Neg logit contributions
 He-0.178
."-0.170
.-0.166
 They-0.165
powers-0.145
Token
Token ID198
Feature activation+1.24e+01

Pos logit contributions
eking+0.486
eks+0.420
ir+0.407
Jud+0.407
andra+0.406
Neg logit contributions
 He-0.184
."-0.177
 They-0.168
.-0.162
 But-0.144
TokenPe
Token ID6435
Feature activation+1.63e+01

Pos logit contributions
eking+0.320
Jud+0.265
ir+0.262
agg+0.260
eks+0.259
Neg logit contributions
 He-0.295
."-0.288
 They-0.280
 proper-0.274
.-0.273
Tokenep
Token ID538
Feature activation+6.16e+00

Pos logit contributions
irs+0.330
Bet+0.307
Jud+0.305
Rachel+0.305
whel+0.302
Neg logit contributions
.-0.429
."-0.422
 toy-0.396
ometer-0.380
cream-0.376
Token flew
Token ID13112
Feature activation+1.26e+00

Pos logit contributions
eking+0.174
aman+0.151
ir+0.150
Jud+0.149
agg+0.149
Neg logit contributions
 He-0.120
 proper-0.118
."-0.116
 toy-0.114
.-0.113
Token out
Token ID503
Feature activation+1.27e-01

Pos logit contributions
eking+0.044
aman+0.039
ir+0.039
agg+0.039
Jud+0.039
Neg logit contributions
 proper-0.016
 He-0.016
."-0.015
powers-0.015
 toy-0.015
Token of
Token ID286
Feature activation-2.17e-01

Pos logit contributions
eking+0.005
aman+0.004
Jud+0.004
agg+0.004
Mary+0.004
Neg logit contributions
 He-0.002
 proper-0.002
.-0.002
."-0.002
 toy-0.002
Token the
Token ID262
Feature activation-4.50e-01

Pos logit contributions
 toy+0.003
 He+0.003
 proper+0.003
.+0.003
."+0.003
Neg logit contributions
eking-0.008
agg-0.007
Ja-0.007
icians-0.007
iter-0.007
Token mom
Token ID1995
Feature activation+4.26e+00

Pos logit contributions
eking+0.018
ir+0.015
aman+0.015
Jud+0.015
agg+0.015
Neg logit contributions
 He-0.013
 proper-0.013
."-0.012
 toy-0.012
powers-0.012
Token replied
Token ID8712
Feature activation+7.64e+00

Pos logit contributions
eking+0.112
ir+0.095
Jud+0.094
aman+0.094
agg+0.093
Neg logit contributions
 He-0.094
.-0.092
."-0.091
 proper-0.090
 toy-0.088
Token.
Token ID13
Feature activation-8.21e+00

Pos logit contributions
eking+0.315
ir+0.283
agg+0.280
aman+0.279
Mary+0.276
Neg logit contributions
 proper-0.054
powers-0.050
 He-0.043
 toy-0.039
 They-0.035
Token
Token ID198
Feature activation+1.22e+01

Pos logit contributions
 He+0.136
 They+0.129
."+0.116
 But+0.113
.+0.110
Neg logit contributions
eking-0.306
eks-0.263
ir-0.260
iter-0.255
irs-0.254
Token
Token ID198
Feature activation+1.22e+01

Pos logit contributions
eking+0.546
eks+0.464
eling+0.448
oken+0.436
ths+0.432
Neg logit contributions
 He-0.138
.-0.133
."-0.133
 They-0.131
 But-0.103
TokenPe
Token ID6435
Feature activation+1.63e+01

Pos logit contributions
eking+0.358
eks+0.303
ir+0.286
 nowhere+0.286
andra+0.284
Neg logit contributions
 He-0.249
 They-0.248
."-0.247
.-0.232
 toy-0.224
Tokenppa
Token ID44989
Feature activation+5.46e+00

Pos logit contributions
irs+0.245
whel+0.241
ulu+0.235
 July+0.215
Bet+0.214
Neg logit contributions
.-0.494
."-0.486
 toy-0.458
ly-0.435
ometer-0.434
Token organized
Token ID8389
Feature activation-4.05e+00

Pos logit contributions
eking+0.165
aman+0.144
ir+0.143
agg+0.142
Jud+0.142
Neg logit contributions
 proper-0.096
 He-0.095
."-0.091
powers-0.090
 toy-0.090
Token the
Token ID262
Feature activation-7.45e-01

Pos logit contributions
 toy+0.068
.+0.068
 proper+0.064
 He+0.063
ly+0.057
Neg logit contributions
eking-0.134
Jud-0.120
ir-0.119
irs-0.118
aman-0.115
Token meat
Token ID6174
Feature activation-1.58e-01

Pos logit contributions
powers+0.012
 They+0.012
."+0.011
 He+0.011
 proper+0.011
Neg logit contributions
eking-0.023
aman-0.021
ma-0.020
agg-0.019
Mary-0.019
Token in
Token ID287
Feature activation+1.14e+00

Pos logit contributions
 proper+0.003
powers+0.003
 He+0.003
."+0.003
 They+0.003
Neg logit contributions
eking-0.005
agg-0.004
aman-0.004
ulu-0.004
Mary-0.004
Token give
Token ID1577
Feature activation+1.83e+00

Pos logit contributions
eking+0.083
aman+0.073
ir+0.073
Jud+0.072
agg+0.072
Neg logit contributions
 proper-0.035
 He-0.035
."-0.034
 toy-0.033
.-0.032
Token them
Token ID606
Feature activation+3.39e+00

Pos logit contributions
eking+0.053
aman+0.046
Jud+0.046
ir+0.046
agg+0.045
Neg logit contributions
 He-0.034
 proper-0.034
powers-0.033
."-0.032
 toy-0.032
Token.
Token ID13
Feature activation-7.99e+00

Pos logit contributions
eking+0.088
ma+0.083
ulu+0.080
agg+0.080
Mary+0.079
Neg logit contributions
 He-0.065
powers-0.062
attered-0.060
 Let-0.059
 But-0.057
Token
Token ID198
Feature activation+1.27e+01

Pos logit contributions
 He+0.103
 They+0.095
 proper+0.094
."+0.092
.+0.090
Neg logit contributions
eking-0.294
ir-0.258
Jud-0.256
agg-0.254
aman-0.251
Token
Token ID198
Feature activation+1.27e+01

Pos logit contributions
eking+0.514
ir+0.425
eks+0.424
eling+0.424
ths+0.409
Neg logit contributions
 He-0.186
.-0.170
 They-0.170
."-0.167
 But-0.146
TokenPe
Token ID6435
Feature activation+1.63e+01

Pos logit contributions
eking+0.286
ir+0.231
Jud+0.225
agg+0.223
irs+0.221
Neg logit contributions
 He-0.336
."-0.326
 They-0.323
.-0.323
 proper-0.316
Tokenggy
Token ID19970
Feature activation+1.75e-01

Pos logit contributions
irs+0.319
 July+0.314
whel+0.301
Jud+0.296
Rachel+0.291
Neg logit contributions
.-0.419
."-0.411
 toy-0.382
 likes-0.382
ny-0.379
Token wanted
Token ID2227
Feature activation-2.67e+00

Pos logit contributions
eking+0.005
aman+0.005
agg+0.005
ir+0.005
Jud+0.005
Neg logit contributions
 proper-0.003
 He-0.003
."-0.003
powers-0.003
 toy-0.003
Token to
Token ID284
Feature activation+2.46e+00

Pos logit contributions
 He+0.024
 proper+0.024
powers+0.022
."+0.022
 They+0.020
Neg logit contributions
eking-0.102
Mary-0.092
aman-0.092
Jud-0.092
agg-0.092
Token introduce
Token ID10400
Feature activation+1.86e+00

Pos logit contributions
eking+0.070
ir+0.066
agg+0.065
ulu+0.065
Mary+0.065
Neg logit contributions
 proper-0.041
attered-0.041
powers-0.040
 He-0.040
ometer-0.038
Token her
Token ID607
Feature activation+8.93e-01

Pos logit contributions
eking+0.045
aman+0.039
ir+0.038
agg+0.038
Jud+0.038
Neg logit contributions
 He-0.044
 proper-0.044
."-0.043
.-0.042
 toy-0.042
Token kindly
Token ID26820
Feature activation-1.86e+00

Pos logit contributions
eking+0.079
aman+0.074
irs+0.072
agg+0.071
OV+0.071
Neg logit contributions
.-0.048
 He-0.045
."-0.044
 toy-0.043
ly-0.041
Token the
Token ID262
Feature activation-6.02e-01

Pos logit contributions
 He+0.028
.+0.026
."+0.025
 toy+0.025
 They+0.024
Neg logit contributions
eking-0.064
ir-0.056
agg-0.056
ulu-0.055
aman-0.055
Token owl
Token ID39610
Feature activation+3.60e+00

Pos logit contributions
 He+0.009
 proper+0.009
."+0.008
 toy+0.008
powers+0.008
Neg logit contributions
eking-0.020
aman-0.018
ir-0.018
agg-0.018
Jud-0.017
Token said
Token ID531
Feature activation+6.22e+00

Pos logit contributions
eking+0.105
aman+0.093
agg+0.092
Jud+0.091
ir+0.091
Neg logit contributions
 He-0.068
.-0.066
."-0.065
 toy-0.065
 proper-0.063
Token "
Token ID366
Feature activation+3.07e+00

Pos logit contributions
eking+0.197
ir+0.172
aman+0.172
agg+0.171
Mary+0.170
Neg logit contributions
 proper-0.102
 He-0.096
powers-0.095
."-0.093
 toy-0.092
TokenPe
Token ID6435
Feature activation+1.62e+01

Pos logit contributions
eking+0.086
agg+0.080
irs+0.074
aman+0.073
lf+0.072
Neg logit contributions
 He-0.064
ny-0.060
."-0.058
.-0.057
 Let-0.055
Tokenanut
Token ID20651
Feature activation+4.96e+00

Pos logit contributions
eking+0.281
ulu+0.274
aman+0.272
agg+0.270
irs+0.267
Neg logit contributions
.-0.427
."-0.427
 toy-0.414
powers-0.412
 He-0.410
Token,
Token ID11
Feature activation+4.87e+00

Pos logit contributions
eking+0.149
ir+0.134
Jud+0.129
agg+0.129
aman+0.129
Neg logit contributions
 proper-0.088
powers-0.081
 He-0.081
attered-0.075
 They-0.074
Token if
Token ID611
Feature activation-1.88e+00

Pos logit contributions
eking+0.150
aman+0.130
ir+0.130
agg+0.129
Jud+0.129
Neg logit contributions
 proper-0.083
 He-0.082
."-0.079
powers-0.079
 toy-0.078
Token you
Token ID345
Feature activation+7.44e-01

Pos logit contributions
 proper+0.038
 He+0.038
."+0.037
powers+0.036
 toy+0.036
Neg logit contributions
eking-0.052
aman-0.045
ir-0.045
agg-0.044
Jud-0.044
Token work
Token ID670
Feature activation-4.11e+00

Pos logit contributions
eking+0.023
ir+0.021
agg+0.020
aman+0.020
Jud+0.020
Neg logit contributions
 He-0.012
 proper-0.012
 toy-0.011
 They-0.011
powers-0.011
Token skills
Token ID4678
Feature activation-1.15e+00

Pos logit contributions
 proper+0.032
powers+0.030
 He+0.030
 They+0.028
."+0.027
Neg logit contributions
eking-0.082
Mary-0.072
ma-0.072
aman-0.071
ir-0.071
Token.
Token ID13
Feature activation-7.84e+00

Pos logit contributions
 proper+0.011
powers+0.010
 He+0.010
 toy+0.009
 They+0.008
Neg logit contributions
eking-0.043
aman-0.039
Mary-0.039
ir-0.038
Jud-0.038
Token
Token ID220
Feature activation+9.84e+00

Pos logit contributions
 He+0.148
 They+0.143
."+0.129
 But+0.127
.+0.127
Neg logit contributions
eking-0.261
ir-0.217
agg-0.215
iter-0.212
eks-0.211
Token
Token ID198
Feature activation+1.25e+01

Pos logit contributions
eking+0.335
ir+0.286
aman+0.284
agg+0.281
Jud+0.276
Neg logit contributions
 He-0.173
 They-0.168
."-0.162
.-0.159
powers-0.143
Token
Token ID198
Feature activation+1.24e+01

Pos logit contributions
eking+0.519
aman+0.437
eks+0.430
ir+0.428
ths+0.424
Neg logit contributions
 He-0.170
 They-0.157
.-0.143
."-0.142
 But-0.135
TokenPe
Token ID6435
Feature activation+1.61e+01

Pos logit contributions
eking+0.319
ir+0.257
agg+0.254
eks+0.249
irs+0.248
Neg logit contributions
 He-0.296
 They-0.283
."-0.281
.-0.279
powers-0.272
Tokenggy
Token ID19970
Feature activation+1.90e-02

Pos logit contributions
irs+0.420
 July+0.417
whel+0.389
Jud+0.388
Rachel+0.383
Neg logit contributions
.-0.313
."-0.298
 likes-0.276
 toy-0.274
ny-0.268
Token and
Token ID290
Feature activation+3.33e+00

Pos logit contributions
eking+0.001
Mary+0.000
aman+0.000
agg+0.000
Jud+0.000
Neg logit contributions
 proper-0.000
powers-0.000
 He-0.000
."-0.000
 toy-0.000
Token Sam
Token ID3409
Feature activation+2.72e+00

Pos logit contributions
eking+0.097
agg+0.081
ir+0.081
aman+0.079
irs+0.079
Neg logit contributions
 He-0.070
 toy-0.068
.-0.066
 They-0.065
 proper-0.065
Token had
Token ID550
Feature activation-1.34e+00

Pos logit contributions
eking+0.064
ir+0.054
Mary+0.054
aman+0.054
ma+0.052
Neg logit contributions
 proper-0.065
powers-0.064
 He-0.061
 toy-0.059
."-0.059
Token so
Token ID523
Feature activation-9.52e-01

Pos logit contributions
 proper+0.025
 He+0.025
powers+0.024
."+0.024
 toy+0.024
Neg logit contributions
eking-0.037
Mary-0.033
Jud-0.033
aman-0.032
agg-0.032
Token their
Token ID511
Feature activation+1.61e+00

Pos logit contributions
eking+0.110
aman+0.096
agg+0.095
Jud+0.093
Mary+0.092
Neg logit contributions
.-0.091
."-0.090
 toy-0.089
 He-0.089
 proper-0.088
Token loving
Token ID14442
Feature activation-4.83e+00

Pos logit contributions
eking+0.048
aman+0.041
ir+0.041
agg+0.041
Jud+0.040
Neg logit contributions
 He-0.029
."-0.027
 proper-0.026
 They-0.026
powers-0.026
Token friendship
Token ID14738
Feature activation-2.24e-01

Pos logit contributions
 He+0.091
powers+0.089
 proper+0.089
attered+0.087
cream+0.084
Neg logit contributions
eking-0.125
aman-0.115
ir-0.113
Mary-0.113
agg-0.113
Token.
Token ID13
Feature activation-7.77e+00

Pos logit contributions
 proper+0.002
 He+0.002
attered+0.002
powers+0.002
 Let+0.002
Neg logit contributions
eking-0.008
ir-0.007
ics-0.007
aman-0.007
ma-0.007
Token<|endoftext|>
Token ID50256
Feature activation+1.03e+01

Pos logit contributions
 He+0.147
 They+0.140
 proper+0.139
."+0.138
.+0.134
Neg logit contributions
eking-0.242
ir-0.201
aman-0.201
agg-0.201
Jud-0.197
TokenPe
Token ID6435
Feature activation+1.61e+01

Pos logit contributions
eking+0.255
aman+0.212
ir+0.208
agg+0.208
Jud+0.207
Neg logit contributions
 He-0.246
."-0.238
 proper-0.238
.-0.235
 They-0.234
Tokenggy
Token ID19970
Feature activation+2.53e-01

Pos logit contributions
Uh+0.352
whel+0.335
irs+0.317
 July+0.308
Da+0.302
Neg logit contributions
.-0.410
."-0.376
 toy-0.336
ny-0.326
 name-0.322
Token was
Token ID373
Feature activation+5.21e+00

Pos logit contributions
eking+0.008
aman+0.007
ir+0.007
agg+0.007
Jud+0.007
Neg logit contributions
 He-0.004
 proper-0.004
."-0.004
 toy-0.004
powers-0.004
Token very
Token ID845
Feature activation-9.84e-01

Pos logit contributions
eking+0.178
irs+0.159
aman+0.159
Jud+0.159
Mary+0.157
Neg logit contributions
 He-0.078
.-0.076
 toy-0.075
 like-0.074
 proper-0.072
Token excited
Token ID6568
Feature activation+6.17e-01

Pos logit contributions
 He+0.019
 proper+0.018
."+0.018
.+0.018
 toy+0.018
Neg logit contributions
eking-0.029
aman-0.026
Jud-0.025
Mary-0.025
ir-0.025
Token.
Token ID13
Feature activation-8.06e+00

Pos logit contributions
eking+0.021
aman+0.020
ics+0.019
ma+0.019
agg+0.019
Neg logit contributions
 proper-0.008
powers-0.008
 He-0.006
 floppy-0.006
attered-0.006
Token ear
Token ID1027
Feature activation-2.29e+00

Pos logit contributions
eking+0.032
ir+0.028
agg+0.027
aman+0.027
Jud+0.027
Neg logit contributions
 He-0.020
 proper-0.020
."-0.020
powers-0.019
 They-0.019
Token.
Token ID13
Feature activation-8.67e+00

Pos logit contributions
 proper+0.021
powers+0.015
 He+0.013
attered+0.013
 toy+0.012
Neg logit contributions
eking-0.091
ma-0.082
ir-0.082
aman-0.081
Mary-0.081
Token
Token ID198
Feature activation+1.21e+01

Pos logit contributions
 He+0.134
 They+0.121
.+0.110
 But+0.106
."+0.102
Neg logit contributions
eking-0.335
ir-0.286
aman-0.283
agg-0.282
eks-0.278
Token
Token ID198
Feature activation+1.21e+01

Pos logit contributions
eking+0.483
aman+0.415
eling+0.407
agg+0.407
eks+0.400
Neg logit contributions
 He-0.178
.-0.172
."-0.160
 They-0.157
 But-0.128
Token"
Token ID1
Feature activation+1.94e+00

Pos logit contributions
eking+0.393
agg+0.338
aman+0.335
ir+0.332
ulu+0.328
Neg logit contributions
 He-0.222
.-0.204
 They-0.203
."-0.200
powers-0.181
TokenPe
Token ID6435
Feature activation+1.59e+01

Pos logit contributions
eking+0.056
aman+0.049
ir+0.049
agg+0.048
Jud+0.048
Neg logit contributions
 He-0.038
 proper-0.037
."-0.036
powers-0.035
 toy-0.035
Tokenppers
Token ID11799
Feature activation-3.48e+00

Pos logit contributions
eking+0.291
aman+0.277
irs+0.271
ir+0.261
agg+0.257
Neg logit contributions
.-0.426
."-0.425
 He-0.408
 toy-0.406
powers-0.401
Token are
Token ID389
Feature activation+5.78e+00

Pos logit contributions
 proper+0.057
 He+0.056
powers+0.054
."+0.053
 toy+0.053
Neg logit contributions
eking-0.110
ir-0.096
aman-0.096
agg-0.095
Jud-0.094
Token red
Token ID2266
Feature activation-2.34e+00

Pos logit contributions
eking+0.174
ir+0.147
agg+0.146
aman+0.146
Jud+0.145
Neg logit contributions
 proper-0.103
 He-0.102
powers-0.100
."-0.094
 They-0.094
Token,
Token ID11
Feature activation+4.67e+00

Pos logit contributions
 proper+0.059
powers+0.050
attered+0.048
 He+0.047
 But+0.045
Neg logit contributions
eking-0.054
ir-0.048
ma-0.048
aman-0.044
ics-0.041
Token cheese
Token ID9891
Feature activation-3.75e-01

Pos logit contributions
eking+0.152
Jud+0.134
aman+0.134
agg+0.133
ir+0.132
Neg logit contributions
 He-0.074
 proper-0.072
."-0.071
 toy-0.069
 They-0.068
Token other
Token ID584
Feature activation-2.96e+00

Pos logit contributions
 proper+0.015
 He+0.015
powers+0.015
 player+0.014
."+0.014
Neg logit contributions
eking-0.027
agg-0.024
aman-0.024
Mary-0.023
ir-0.023
Token pigs
Token ID22333
Feature activation+3.98e+00

Pos logit contributions
 He+0.042
 proper+0.041
."+0.041
 toy+0.039
.+0.039
Neg logit contributions
eking-0.101
aman-0.089
ir-0.089
Jud-0.088
agg-0.088
Token.
Token ID13
Feature activation-8.28e+00

Pos logit contributions
eking+0.135
ir+0.114
Jud+0.113
aman+0.113
ics+0.112
Neg logit contributions
 proper-0.066
 He-0.059
powers-0.059
 Let-0.051
."-0.051
Token
Token ID198
Feature activation+1.24e+01

Pos logit contributions
 proper+0.155
powers+0.148
 He+0.145
."+0.144
 toy+0.144
Neg logit contributions
eking-0.234
aman-0.212
ir-0.204
agg-0.204
Mary-0.202
Token
Token ID198
Feature activation+1.23e+01

Pos logit contributions
eking+0.467
ir+0.388
aman+0.386
eks+0.383
agg+0.379
Neg logit contributions
 He-0.194
 They-0.185
."-0.184
.-0.183
 But-0.158
TokenPe
Token ID6435
Feature activation+1.59e+01

Pos logit contributions
eking+0.267
ir+0.216
agg+0.216
Jud+0.216
aman+0.215
Neg logit contributions
 He-0.326
 proper-0.323
."-0.318
 They-0.313
 toy-0.312
Tokenppa
Token ID44989
Feature activation+5.02e+00

Pos logit contributions
whel+0.296
 July+0.295
irs+0.289
ulu+0.270
Uh+0.262
Neg logit contributions
."-0.432
.-0.431
 toy-0.387
ly-0.371
 should-0.366
Token was
Token ID373
Feature activation+4.58e+00

Pos logit contributions
eking+0.171
irs+0.154
ir+0.151
aman+0.150
ulu+0.146
Neg logit contributions
 He-0.076
.-0.074
 They-0.069
."-0.069
 toy-0.069
Token relieved
Token ID22598
Feature activation-1.21e-01

Pos logit contributions
eking+0.165
aman+0.158
irs+0.156
ir+0.154
agg+0.146
Neg logit contributions
 like-0.063
 He-0.063
.-0.059
."-0.057
 toy-0.056
Token and
Token ID290
Feature activation+2.83e+00

Pos logit contributions
 He+0.001
 proper+0.001
."+0.001
powers+0.001
 toy+0.001
Neg logit contributions
eking-0.005
aman-0.004
ir-0.004
agg-0.004
Jud-0.004
Token happy
Token ID3772
Feature activation+2.06e-01

Pos logit contributions
eking+0.085
ir+0.077
aman+0.075
irs+0.074
Mary+0.072
Neg logit contributions
 He-0.054
 proper-0.052
 They-0.050
 toy-0.050
.-0.050
Token be
Token ID307
Feature activation+6.39e+00

Pos logit contributions
eking+0.063
ir+0.055
aman+0.055
agg+0.055
Jud+0.054
Neg logit contributions
 He-0.033
 proper-0.033
."-0.033
.-0.032
 toy-0.031
Token brave
Token ID14802
Feature activation-3.08e+00

Pos logit contributions
eking+0.175
aman+0.149
ir+0.149
agg+0.148
Jud+0.148
Neg logit contributions
 proper-0.132
 He-0.132
."-0.127
powers-0.126
 toy-0.125
Token.
Token ID13
Feature activation-8.22e+00

Pos logit contributions
attered+0.050
cream+0.049
 proper+0.048
powers+0.046
ometer+0.045
Neg logit contributions
eking-0.087
ma-0.084
 nowhere-0.076
Mary-0.075
ir-0.073
Token
Token ID198
Feature activation+1.22e+01

Pos logit contributions
 He+0.139
 They+0.133
 But+0.120
."+0.117
.+0.116
Neg logit contributions
eking-0.295
iter-0.252
ir-0.249
lf-0.248
irs-0.246
Token
Token ID198
Feature activation+1.22e+01

Pos logit contributions
eking+0.505
ths+0.414
eks+0.412
eling+0.410
aman+0.405
Neg logit contributions
 He-0.164
.-0.151
 They-0.148
."-0.139
 But-0.125
TokenPe
Token ID6435
Feature activation+1.59e+01

Pos logit contributions
eking+0.314
eks+0.247
irs+0.247
agg+0.243
ir+0.242
Neg logit contributions
 He-0.291
 They-0.284
."-0.277
.-0.276
ny-0.271
Tokenpper
Token ID2848
Feature activation+2.61e+00

Pos logit contributions
irs+0.326
 July+0.309
whel+0.306
ulu+0.281
aman+0.280
Neg logit contributions
.-0.405
."-0.392
 toy-0.365
ny-0.363
 He-0.356
Token took
Token ID1718
Feature activation+2.94e+00

Pos logit contributions
eking+0.078
irs+0.069
ir+0.068
aman+0.068
agg+0.066
Neg logit contributions
 He-0.049
.-0.046
 toy-0.046
."-0.046
 proper-0.045
Token a
Token ID257
Feature activation-2.87e+00

Pos logit contributions
eking+0.092
aman+0.080
ir+0.079
Jud+0.079
agg+0.079
Neg logit contributions
 He-0.051
 proper-0.050
."-0.048
 toy-0.048
.-0.047
Token deep
Token ID2769
Feature activation-2.02e+00

Pos logit contributions
 proper+0.051
 He+0.050
 toy+0.048
powers+0.048
."+0.048
Neg logit contributions
eking-0.088
aman-0.076
ir-0.076
agg-0.076
Jud-0.075
Token breath
Token ID8033
Feature activation+1.12e+00

Pos logit contributions
 He+0.054
 proper+0.054
 toy+0.052
."+0.052
powers+0.052
Neg logit contributions
eking-0.043
aman-0.036
Jud-0.035
Mary-0.035
agg-0.035
Token door
Token ID3420
Feature activation-1.03e+00

Pos logit contributions
powers+0.029
 He+0.029
 proper+0.029
 They+0.028
."+0.027
Neg logit contributions
eking-0.032
aman-0.027
agg-0.026
ir-0.026
Mary-0.026
Token.
Token ID13
Feature activation-8.25e+00

Pos logit contributions
powers+0.011
 proper+0.010
 bone+0.009
 He+0.009
eer+0.008
Neg logit contributions
eking-0.038
aman-0.035
Mary-0.034
ma-0.034
ir-0.034
Token
Token ID220
Feature activation+9.47e+00

Pos logit contributions
 He+0.181
 They+0.177
 proper+0.166
."+0.165
.+0.164
Neg logit contributions
eking-0.239
ir-0.202
agg-0.197
ulu-0.193
Jud-0.192
Token
Token ID198
Feature activation+1.20e+01

Pos logit contributions
eking+0.343
ir+0.295
lf+0.287
aman+0.286
eks+0.283
Neg logit contributions
 They-0.155
 He-0.150
."-0.146
.-0.143
 But-0.125
Token
Token ID198
Feature activation+1.20e+01

Pos logit contributions
eking+0.472
ir+0.400
agg+0.396
eks+0.394
aman+0.392
Neg logit contributions
 They-0.168
 He-0.168
.-0.158
."-0.157
 But-0.140
TokenPe
Token ID6435
Feature activation+1.57e+01

Pos logit contributions
eking+0.268
agg+0.222
ir+0.217
Jud+0.209
ulu+0.207
Neg logit contributions
 He-0.314
 They-0.310
."-0.308
.-0.307
 proper-0.303
Tokenggy
Token ID19970
Feature activation-6.52e-01

Pos logit contributions
irs+0.349
 July+0.339
Rachel+0.329
Jud+0.320
whel+0.314
Neg logit contributions
.-0.377
."-0.352
 toy-0.336
 likes-0.327
 name-0.320
Token felt
Token ID2936
Feature activation-1.55e-01

Pos logit contributions
 proper+0.013
 He+0.013
."+0.012
powers+0.012
 toy+0.012
Neg logit contributions
eking-0.018
aman-0.016
agg-0.016
ir-0.015
Jud-0.015
Token so
Token ID523
Feature activation-1.52e+00

Pos logit contributions
powers+0.003
 He+0.003
 proper+0.003
 toy+0.003
 They+0.003
Neg logit contributions
eking-0.004
agg-0.004
ulu-0.004
Mary-0.003
aman-0.003
Token sorry
Token ID7926
Feature activation-1.71e+00

Pos logit contributions
 proper+0.032
 He+0.031
."+0.030
powers+0.030
 toy+0.030
Neg logit contributions
eking-0.042
aman-0.036
ir-0.036
Jud-0.035
Mary-0.035
Token for
Token ID329
Feature activation+9.83e-01

Pos logit contributions
powers+0.033
 proper+0.032
attered+0.029
 floppy+0.028
cream+0.027
Neg logit contributions
eking-0.046
aman-0.045
agg-0.040
ulu-0.040
Mary-0.039
Token She
Token ID1375
Feature activation-2.89e+00

Pos logit contributions
 proper+0.182
."+0.178
attered+0.176
 toy+0.175
powers+0.175
Neg logit contributions
eking-0.152
Ja-0.138
Kelly-0.136
aman-0.135
Mary-0.134
Token was
Token ID373
Feature activation+5.32e+00

Pos logit contributions
 He+0.043
 toy+0.042
.+0.041
 proper+0.041
 should+0.039
Neg logit contributions
eking-0.099
ir-0.089
aman-0.088
irs-0.088
Jud-0.085
Token so
Token ID523
Feature activation-5.85e-01

Pos logit contributions
eking+0.187
aman+0.177
irs+0.177
ir+0.176
Jud+0.171
Neg logit contributions
 He-0.068
 like-0.067
.-0.066
 toy-0.065
."-0.063
Token excited
Token ID6568
Feature activation+5.73e-01

Pos logit contributions
 He+0.009
.+0.008
."+0.008
 They+0.008
 like+0.008
Neg logit contributions
eking-0.020
aman-0.019
irs-0.018
ir-0.018
andra-0.018
Token!
Token ID0
Feature activation-6.98e+00

Pos logit contributions
eking+0.019
aman+0.017
Mary+0.017
agg+0.017
Jud+0.017
Neg logit contributions
 proper-0.009
powers-0.008
 He-0.007
 toy-0.007
attered-0.007
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
 proper+0.099
 toy+0.092
."+0.092
powers+0.089
 He+0.086
Neg logit contributions
eking-0.218
aman-0.204
Mary-0.200
Jud-0.200
Kelly-0.196
Tokeneking
Token ID18754
Feature activation+2.86e+00

Pos logit contributions
Bet+0.249
irs+0.246
ulu+0.245
Jim+0.237
agg+0.234
Neg logit contributions
ray-0.416
.-0.412
 toy-0.411
 likes-0.409
."-0.408
Token through
Token ID832
Feature activation-1.43e-01

Pos logit contributions
eking+0.094
ir+0.086
agg+0.085
Jud+0.083
Mary+0.083
Neg logit contributions
 proper-0.042
powers-0.038
 He-0.038
."-0.037
 They-0.035
Token the
Token ID262
Feature activation-4.21e-01

Pos logit contributions
 proper+0.003
 He+0.003
."+0.003
powers+0.003
 toy+0.003
Neg logit contributions
eking-0.004
aman-0.003
ir-0.003
agg-0.003
Mary-0.003
Token leaves
Token ID5667
Feature activation-5.78e-01

Pos logit contributions
 They+0.008
 He+0.008
 proper+0.008
."+0.008
powers+0.008
Neg logit contributions
eking-0.012
aman-0.010
Mary-0.010
Jud-0.010
ir-0.010
Token,
Token ID11
Feature activation+5.33e+00

Pos logit contributions
.+0.007
 He+0.006
."+0.006
 toy+0.006
 like+0.006
Neg logit contributions
eking-0.021
agg-0.019
Mary-0.019
aman-0.019
Jud-0.019
Token
Token ID198
Feature activation+1.28e+01

Pos logit contributions
eking+0.394
aman+0.343
ir+0.335
eling+0.332
agg+0.328
Neg logit contributions
 He-0.161
 They-0.154
.-0.153
."-0.146
 It-0.124
Token
Token ID198
Feature activation+1.28e+01

Pos logit contributions
eking+0.543
ths+0.454
eks+0.453
eling+0.451
aman+0.451
Neg logit contributions
.-0.169
 He-0.165
."-0.156
 They-0.152
 But-0.134
TokenOne
Token ID3198
Feature activation+1.36e+00

Pos logit contributions
eking+0.309
Jud+0.253
agg+0.250
ir+0.248
aman+0.244
Neg logit contributions
 He-0.323
."-0.313
.-0.308
 proper-0.308
 They-0.307
Token day
Token ID1110
Feature activation+9.42e-01

Pos logit contributions
eking+0.032
agg+0.031
ulu+0.031
Mary+0.030
Jud+0.030
Neg logit contributions
powers-0.029
 He-0.028
 proper-0.028
."-0.027
attered-0.026
Token,
Token ID11
Feature activation+5.21e+00

Pos logit contributions
eking+0.033
Jud+0.030
Mary+0.029
aman+0.029
ir+0.029
Neg logit contributions
 He-0.013
.-0.013
."-0.012
 They-0.012
 toy-0.012
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
eking+0.143
Jud+0.120
ir+0.120
agg+0.118
Mary+0.118
Neg logit contributions
 He-0.114
."-0.109
.-0.109
 They-0.108
 toy-0.108
Tokenppa
Token ID44989
Feature activation+5.50e+00

Pos logit contributions
irs+0.297
Mary+0.286
Bet+0.284
Jim+0.276
Jane+0.272
Neg logit contributions
.-0.400
."-0.399
 toy-0.386
 proper-0.382
 favorite-0.369
Token stumbled
Token ID24241
Feature activation+2.58e+00

Pos logit contributions
eking+0.167
aman+0.145
ir+0.144
agg+0.143
Mary+0.142
Neg logit contributions
 He-0.097
 proper-0.096
."-0.094
 toy-0.093
.-0.092
Token upon
Token ID2402
Feature activation+7.33e+00

Pos logit contributions
eking+0.085
aman+0.076
ir+0.075
agg+0.074
irs+0.074
Neg logit contributions
 He-0.041
.-0.040
."-0.039
 toy-0.038
 proper-0.037
Token a
Token ID257
Feature activation-2.38e+00

Pos logit contributions
eking+0.236
aman+0.207
ir+0.207
agg+0.206
Jud+0.206
Neg logit contributions
 proper-0.112
 He-0.109
."-0.105
powers-0.104
 toy-0.101
Token strange
Token ID6283
Feature activation-5.31e+00

Pos logit contributions
 He+0.040
."+0.039
 They+0.039
powers+0.038
 proper+0.037
Neg logit contributions
eking-0.072
aman-0.066
ir-0.063
Jud-0.063
ulu-0.062
Token vegetables
Token ID13701
Feature activation-2.24e+00

Pos logit contributions
 He+0.018
attered+0.017
 proper+0.017
ny+0.017
 They+0.017
Neg logit contributions
eking-0.028
agg-0.024
aman-0.024
ma-0.024
ir-0.024
Token from
Token ID422
Feature activation+6.02e+00

Pos logit contributions
 proper+0.028
powers+0.027
 He+0.026
attered+0.024
 floppy+0.022
Neg logit contributions
eking-0.077
aman-0.069
ma-0.068
ulu-0.068
ir-0.067
Token their
Token ID511
Feature activation+1.88e+00

Pos logit contributions
eking+0.187
agg+0.164
Jud+0.164
ir+0.162
Mary+0.161
Neg logit contributions
."-0.107
 proper-0.107
.-0.107
 toy-0.103
 He-0.102
Token farm
Token ID5318
Feature activation+1.84e+00

Pos logit contributions
eking+0.050
aman+0.046
ma+0.043
agg+0.042
Mary+0.041
Neg logit contributions
 He-0.037
powers-0.036
."-0.034
 They-0.034
 proper-0.033
Token.
Token ID13
Feature activation-7.95e+00

Pos logit contributions
eking+0.071
aman+0.065
ma+0.065
Mary+0.063
Jud+0.063
Neg logit contributions
 He-0.014
 proper-0.014
powers-0.013
attered-0.011
 They-0.010
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
 He+0.143
 They+0.138
 proper+0.136
."+0.136
.+0.131
Neg logit contributions
eking-0.252
ir-0.216
Jud-0.216
agg-0.212
Mary-0.211
Tokenppa
Token ID44989
Feature activation+5.63e+00

Pos logit contributions
Mary+0.285
Jud+0.283
Jim+0.282
irs+0.271
Bet+0.270
Neg logit contributions
."-0.419
 proper-0.411
.-0.410
ly-0.409
 toy-0.406
Token was
Token ID373
Feature activation+5.11e+00

Pos logit contributions
eking+0.182
ir+0.160
irs+0.158
aman+0.156
Jud+0.155
Neg logit contributions
 He-0.094
.-0.089
."-0.088
 toy-0.088
 proper-0.088
Token excited
Token ID6568
Feature activation+4.50e-01

Pos logit contributions
eking+0.168
ir+0.157
aman+0.157
irs+0.155
Jud+0.154
Neg logit contributions
 He-0.081
.-0.078
."-0.076
 toy-0.075
 proper-0.075
Token to
Token ID284
Feature activation+2.41e+00

Pos logit contributions
eking+0.016
aman+0.015
Mary+0.014
agg+0.014
ir+0.014
Neg logit contributions
 proper-0.006
powers-0.005
 He-0.005
 They-0.005
 toy-0.005
Token go
Token ID467
Feature activation+2.86e+00

Pos logit contributions
eking+0.062
Mary+0.056
agg+0.055
Jud+0.055
ir+0.055
Neg logit contributions
attered-0.049
powers-0.049
 proper-0.048
ometer-0.044
 He-0.043
Token was
Token ID373
Feature activation+5.23e+00

Pos logit contributions
 He+0.018
.+0.017
 toy+0.016
 They+0.015
."+0.014
Neg logit contributions
eking-0.063
Jud-0.053
ir-0.053
aman-0.053
irs-0.053
Token a
Token ID257
Feature activation-2.39e+00

Pos logit contributions
eking+0.188
aman+0.166
ir+0.164
Jud+0.163
agg+0.163
Neg logit contributions
 He-0.067
 proper-0.066
 toy-0.063
."-0.062
 They-0.061
Token humble
Token ID19695
Feature activation-3.55e+00

Pos logit contributions
 proper+0.042
powers+0.041
 He+0.041
."+0.039
attered+0.038
Neg logit contributions
eking-0.068
agg-0.062
ulu-0.061
Jud-0.061
aman-0.061
Token mouse
Token ID10211
Feature activation+2.26e+00

Pos logit contributions
 proper+0.057
powers+0.055
attered+0.052
 He+0.052
."+0.049
Neg logit contributions
eking-0.103
aman-0.100
agg-0.099
ir-0.098
ulu-0.098
Token named
Token ID3706
Feature activation+3.41e+00

Pos logit contributions
lf+0.024
eking+0.024
ir+0.023
ulu+0.023
ma+0.022
Neg logit contributions
 proper-0.082
powers-0.080
attered-0.074
 textures-0.072
ashes-0.071
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
eking+0.094
aman+0.083
Jud+0.082
ir+0.082
Mary+0.081
Neg logit contributions
 proper-0.067
 He-0.065
powers-0.065
."-0.063
 toy-0.062
Tokenanut
Token ID20651
Feature activation+5.24e+00

Pos logit contributions
Bet+0.295
whel+0.289
ivated+0.279
Mary+0.277
Uh+0.277
Neg logit contributions
.-0.394
 toy-0.378
."-0.374
ny-0.352
 proper-0.352
Token.
Token ID13
Feature activation-8.07e+00

Pos logit contributions
eking+0.196
ir+0.189
ma+0.187
Jud+0.184
ulu+0.184
Neg logit contributions
 proper-0.049
powers-0.045
attered-0.038
 He-0.037
 Let-0.032
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
 proper+0.188
powers+0.181
 toy+0.173
."+0.173
 He+0.170
Neg logit contributions
eking-0.195
ir-0.170
aman-0.170
agg-0.166
Mary-0.164
Tokenanut
Token ID20651
Feature activation+5.18e+00

Pos logit contributions
Bet+0.308
Mary+0.304
aman+0.302
irs+0.295
whel+0.293
Neg logit contributions
 toy-0.381
.-0.376
."-0.373
 likes-0.368
 proper-0.359
Token lived
Token ID5615
Feature activation+7.43e+00

Pos logit contributions
eking+0.143
agg+0.128
Jud+0.127
ma+0.127
ir+0.126
Neg logit contributions
 proper-0.098
powers-0.094
 He-0.091
."-0.086
attered-0.085
Token mouse
Token ID10211
Feature activation+2.26e+00

Pos logit contributions
 proper+0.057
powers+0.055
attered+0.052
 He+0.052
."+0.049
Neg logit contributions
eking-0.103
aman-0.100
agg-0.099
ir-0.098
ulu-0.098
Token named
Token ID3706
Feature activation+3.41e+00

Pos logit contributions
lf+0.024
eking+0.024
ir+0.023
ulu+0.023
ma+0.022
Neg logit contributions
 proper-0.082
powers-0.080
attered-0.074
 textures-0.072
ashes-0.071
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
eking+0.094
aman+0.083
Jud+0.082
ir+0.082
Mary+0.081
Neg logit contributions
 proper-0.067
 He-0.065
powers-0.065
."-0.063
 toy-0.062
Tokenanut
Token ID20651
Feature activation+5.24e+00

Pos logit contributions
Bet+0.295
whel+0.289
ivated+0.279
Mary+0.277
Uh+0.277
Neg logit contributions
.-0.394
 toy-0.378
."-0.374
ny-0.352
 proper-0.352
Token.
Token ID13
Feature activation-8.07e+00

Pos logit contributions
eking+0.196
ir+0.189
ma+0.187
Jud+0.184
ulu+0.184
Neg logit contributions
 proper-0.049
powers-0.045
attered-0.038
 He-0.037
 Let-0.032
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
 proper+0.188
powers+0.181
 toy+0.173
."+0.173
 He+0.170
Neg logit contributions
eking-0.195
ir-0.170
aman-0.170
agg-0.166
Mary-0.164
Tokenanut
Token ID20651
Feature activation+5.18e+00

Pos logit contributions
Bet+0.308
Mary+0.304
aman+0.302
irs+0.295
whel+0.293
Neg logit contributions
 toy-0.381
.-0.376
."-0.373
 likes-0.368
 proper-0.359
Token lived
Token ID5615
Feature activation+7.43e+00

Pos logit contributions
eking+0.143
agg+0.128
Jud+0.127
ma+0.127
ir+0.126
Neg logit contributions
 proper-0.098
powers-0.094
 He-0.091
."-0.086
attered-0.085
Token in
Token ID287
Feature activation+1.15e+00

Pos logit contributions
eking+0.275
aman+0.268
irs+0.255
andra+0.254
agg+0.254
Neg logit contributions
.-0.088
 He-0.083
 toy-0.083
."-0.082
 proper-0.077
Token an
Token ID281
Feature activation-5.36e-02

Pos logit contributions
eking+0.041
aman+0.036
ir+0.036
agg+0.036
Jud+0.035
Neg logit contributions
 He-0.015
 proper-0.015
 toy-0.014
."-0.014
 They-0.014
Token airport
Token ID9003
Feature activation+1.50e+00

Pos logit contributions
 proper+0.001
 He+0.001
."+0.001
powers+0.001
 toy+0.001
Neg logit contributions
eking-0.002
aman-0.002
ir-0.002
agg-0.002
Jud-0.002
Token George
Token ID4502
Feature activation+4.38e+00

Pos logit contributions
powers+0.018
 proper+0.016
attered+0.014
 He+0.013
ometer+0.013
Neg logit contributions
ma-0.034
eking-0.034
Jud-0.032
Mary-0.032
aman-0.031
Token.
Token ID13
Feature activation-8.12e+00

Pos logit contributions
eking+0.140
ir+0.134
ma+0.134
ulu+0.127
ics+0.126
Neg logit contributions
powers-0.064
 proper-0.059
 He-0.046
 floppy-0.045
 bone-0.044
Token One
Token ID1881
Feature activation+4.41e-01

Pos logit contributions
 proper+0.069
 He+0.065
powers+0.062
."+0.061
 toy+0.060
Neg logit contributions
eking-0.318
aman-0.288
ir-0.287
agg-0.286
Mary-0.284
Token day
Token ID1110
Feature activation+9.75e-01

Pos logit contributions
eking+0.013
agg+0.012
aman+0.012
ir+0.012
Jud+0.012
Neg logit contributions
 proper-0.007
 He-0.007
powers-0.007
."-0.007
 toy-0.007
Token,
Token ID11
Feature activation+5.23e+00

Pos logit contributions
eking+0.033
Jud+0.029
aman+0.029
ir+0.029
Mary+0.029
Neg logit contributions
 He-0.014
."-0.014
 proper-0.014
.-0.014
 They-0.013
Token Pe
Token ID2631
Feature activation+1.56e+01

Pos logit contributions
eking+0.152
ir+0.129
Jud+0.128
aman+0.127
agg+0.127
Neg logit contributions
 He-0.104
 proper-0.101
."-0.100
 They-0.099
 toy-0.099
Tokenppa
Token ID44989
Feature activation+5.55e+00

Pos logit contributions
Mary+0.292
Jane+0.273
Bet+0.271
Jud+0.268
Jim+0.267
Neg logit contributions
."-0.414
.-0.412
 toy-0.400
 proper-0.386
ray-0.374
Token was
Token ID373
Feature activation+5.39e+00

Pos logit contributions
eking+0.155
Jud+0.136
aman+0.135
agg+0.135
ir+0.135
Neg logit contributions
 proper-0.108
 He-0.104
powers-0.102
."-0.097
 toy-0.097
Token playing
Token ID2712
Feature activation-5.14e+00

Pos logit contributions
aman+0.155
eking+0.153
Mary+0.149
Jud+0.148
andra+0.147
Neg logit contributions
.-0.108
 toy-0.106
."-0.102
 like-0.101
 He-0.099
Token with
Token ID351
Feature activation+2.67e+00

Pos logit contributions
 He+0.070
 proper+0.070
."+0.067
 toy+0.065
powers+0.064
Neg logit contributions
eking-0.177
aman-0.157
ir-0.156
agg-0.156
Jud-0.155
Token her
Token ID607
Feature activation+1.18e+00

Pos logit contributions
eking+0.086
aman+0.075
ir+0.074
agg+0.074
Jud+0.073
Neg logit contributions
 He-0.046
 toy-0.046
 proper-0.045
 They-0.043
."-0.042
Token an
Token ID281
Feature activation-5.36e-02

Pos logit contributions
eking+0.041
aman+0.036
ir+0.036
agg+0.036
Jud+0.035
Neg logit contributions
 He-0.015
 proper-0.015
 toy-0.014
."-0.014
 They-0.014
Token airport
Token ID9003
Feature activation+1.50e+00

Pos logit contributions
 proper+0.001
 He+0.001
."+0.001
powers+0.001
 toy+0.001
Neg logit contributions
eking-0.002
aman-0.002
ir-0.002
agg-0.002
Jud-0.002
Token.
Token ID13
Feature activation-8.11e+00

Pos logit contributions
eking+0.050
ir+0.043
Mary+0.042
aman+0.042
Jud+0.042
Neg logit contributions
 proper-0.023
powers-0.022
 He-0.022
 They-0.020
 toy-0.020
Token Every
Token ID3887
Feature activation-3.51e+00

Pos logit contributions
 proper+0.087
powers+0.079
 He+0.079
."+0.077
 toy+0.077
Neg logit contributions
eking-0.298
aman-0.270
ir-0.269
agg-0.268
Jud-0.266
Token day
Token ID1110
Feature activation+8.86e-01

Pos logit contributions
 proper+0.055
 He+0.055
."+0.053
powers+0.052
 toy+0.052
Neg logit contributions
eking-0.113
aman-0.099
ir-0.099
agg-0.098
Jud-0.098
Token Pe
Token ID2631
Feature activation+1.55e+01

Pos logit contributions
eking+0.027
aman+0.024
ir+0.024
agg+0.024
Mary+0.023
Neg logit contributions
 proper-0.015
 He-0.015
powers-0.014
."-0.014
 toy-0.014
Tokenanut
Token ID20651
Feature activation+5.12e+00

Pos logit contributions
Mary+0.319
loe+0.314
aman+0.304
Bet+0.301
irs+0.299
Neg logit contributions
."-0.367
.-0.362
 toy-0.359
ny-0.352
 likes-0.349
Token would
Token ID561
Feature activation-1.15e+00

Pos logit contributions
eking+0.189
aman+0.169
ir+0.168
Jud+0.167
agg+0.167
Neg logit contributions
 He-0.057
 proper-0.057
."-0.055
 toy-0.054
.-0.053
Token wake
Token ID7765
Feature activation+3.41e+00

Pos logit contributions
 proper+0.017
 He+0.017
."+0.016
powers+0.016
 toy+0.016
Neg logit contributions
eking-0.038
aman-0.034
ir-0.034
agg-0.033
Jud-0.033
Token up
Token ID510
Feature activation-1.16e+00

Pos logit contributions
eking+0.124
aman+0.112
Jud+0.111
ir+0.111
agg+0.111
Neg logit contributions
 He-0.040
 proper-0.039
."-0.038
 toy-0.037
.-0.036
Token and
Token ID290
Feature activation+3.40e+00

Pos logit contributions
 He+0.020
.+0.019
."+0.019
 proper+0.018
 toy+0.018
Neg logit contributions
eking-0.037
Jud-0.034
ulu-0.033
aman-0.033
Mary-0.033
Token I
Token ID314
Feature activation-3.29e+00

Pos logit contributions
 proper+0.113
 toy+0.102
powers+0.100
.+0.096
ly+0.093
Neg logit contributions
eking-0.251
Mary-0.239
agg-0.237
ulu-0.237
aman-0.236
Token organized
Token ID8389
Feature activation-4.01e+00

Pos logit contributions
 proper+0.067
 He+0.067
powers+0.065
 They+0.063
 toy+0.062
Neg logit contributions
eking-0.086
agg-0.076
ulu-0.075
ir-0.075
Jud-0.073
Token the
Token ID262
Feature activation-7.02e-01

Pos logit contributions
 He+0.084
 proper+0.083
powers+0.081
 They+0.079
."+0.079
Neg logit contributions
eking-0.107
aman-0.092
ir-0.091
agg-0.090
Mary-0.089
Token meat
Token ID6174
Feature activation-1.72e-01

Pos logit contributions
powers+0.012
 They+0.012
 He+0.011
attered+0.011
."+0.010
Neg logit contributions
eking-0.021
aman-0.018
ma-0.018
Ja-0.017
Mary-0.016
Token!"
Token ID2474
Feature activation-1.80e+00

Pos logit contributions
 proper+0.003
 He+0.002
powers+0.002
 They+0.002
attered+0.002
Neg logit contributions
eking-0.006
ics-0.005
ma-0.005
lf-0.005
agg-0.005
Token Pe
Token ID2631
Feature activation+1.55e+01

Pos logit contributions
 He+0.039
 They+0.036
.+0.036
."+0.036
 toy+0.035
Neg logit contributions
eking-0.052
ir-0.044
agg-0.043
Jud-0.042
ulu-0.042
Tokenppa
Token ID44989
Feature activation+5.55e+00

Pos logit contributions
irs+0.236
Bet+0.235
Mary+0.234
Jud+0.230
ulu+0.230
Neg logit contributions
.-0.452
."-0.451
 toy-0.435
 proper-0.430
ray-0.419
Token said
Token ID531
Feature activation+6.11e+00

Pos logit contributions
eking+0.142
aman+0.120
ir+0.120
agg+0.118
Jud+0.118
Neg logit contributions
 He-0.124
 proper-0.123
."-0.121
 toy-0.118
.-0.118
Token.
Token ID13
Feature activation-8.15e+00

Pos logit contributions
eking+0.207
agg+0.183
ir+0.183
Mary+0.180
Jud+0.179
Neg logit contributions
 proper-0.084
powers-0.076
 He-0.073
 bone-0.070
 toy-0.069
Token Her
Token ID2332
Feature activation+2.04e+00

Pos logit contributions
 He+0.134
 They+0.123
."+0.118
.+0.114
 proper+0.112
Neg logit contributions
eking-0.286
ir-0.245
Jud-0.243
agg-0.240
eks-0.240
Token mom
Token ID1995
Feature activation+4.59e+00

Pos logit contributions
eking+0.049
agg+0.042
aman+0.042
Mary+0.041
Jud+0.041
Neg logit contributions
 He-0.046
 proper-0.046
."-0.045
powers-0.045
 They-0.044
Token When
Token ID1649
Feature activation-3.06e+00

Pos logit contributions
 proper+0.156
powers+0.141
 toy+0.140
."+0.136
 He+0.135
Neg logit contributions
eking-0.413
aman-0.376
Mary-0.376
Jud-0.373
ir-0.373
Token Daisy
Token ID40355
Feature activation+4.60e-02

Pos logit contributions
 proper+0.058
 He+0.057
."+0.055
powers+0.055
 toy+0.054
Neg logit contributions
eking-0.088
aman-0.076
ir-0.076
agg-0.076
Jud-0.076
Token put
Token ID1234
Feature activation-1.24e+00

Pos logit contributions
eking+0.001
aman+0.001
ir+0.001
agg+0.001
Jud+0.001
Neg logit contributions
 proper-0.001
 He-0.001
."-0.001
powers-0.001
 toy-0.001
Token the
Token ID262
Feature activation-2.98e+00

Pos logit contributions
.+0.020
 toy+0.019
 He+0.019
."+0.019
 proper+0.019
Neg logit contributions
eking-0.041
Jud-0.037
aman-0.037
agg-0.037
Mary-0.036
Token magn
Token ID7842
Feature activation+2.16e-01

Pos logit contributions
 He+0.057
 They+0.055
 proper+0.052
powers+0.051
."+0.050
Neg logit contributions
eking-0.087
aman-0.075
ir-0.074
ma-0.073
ulu-0.073
Tokenifying
Token ID4035
Feature activation-1.37e+01

Pos logit contributions
eking+0.007
ir+0.006
aman+0.006
agg+0.006
irs+0.006
Neg logit contributions
 proper-0.004
 He-0.003
."-0.003
powers-0.003
 toy-0.003
Token glass
Token ID5405
Feature activation-2.10e+00

Pos logit contributions
 proper+0.212
 He+0.204
powers+0.200
."+0.194
 toy+0.190
Neg logit contributions
eking-0.431
ir-0.391
aman-0.391
agg-0.386
Jud-0.383
Token down
Token ID866
Feature activation-5.21e+00

Pos logit contributions
 proper+0.036
 He+0.034
powers+0.033
."+0.031
 They+0.031
Neg logit contributions
eking-0.063
ir-0.056
ulu-0.055
aman-0.055
agg-0.055
Token,
Token ID11
Feature activation+3.54e+00

Pos logit contributions
 He+0.085
.+0.083
."+0.081
 proper+0.079
 toy+0.078
Neg logit contributions
eking-0.171
aman-0.149
Jud-0.148
agg-0.146
ir-0.145
Token the
Token ID262
Feature activation-2.12e+00

Pos logit contributions
eking+0.093
aman+0.079
ir+0.079
agg+0.078
Jud+0.077
Neg logit contributions
 He-0.078
 proper-0.077
."-0.075
 toy-0.074
.-0.074
Token cater
Token ID20825
Feature activation+2.25e+00

Pos logit contributions
 He+0.036
 proper+0.036
 They+0.033
powers+0.033
."+0.033
Neg logit contributions
eking-0.065
aman-0.056
ulu-0.055
agg-0.055
ir-0.055
Token examine
Token ID10716
Feature activation-2.05e-01

Pos logit contributions
 proper+0.099
attered+0.097
powers+0.096
 He+0.091
 textures+0.089
Neg logit contributions
eking-0.138
ir-0.132
ulu-0.129
Jud-0.126
agg-0.124
Token it
Token ID340
Feature activation-2.87e+00

Pos logit contributions
 proper+0.004
 He+0.004
powers+0.004
 They+0.004
 toy+0.004
Neg logit contributions
eking-0.006
ir-0.005
aman-0.005
agg-0.005
Mary-0.005
Token with
Token ID351
Feature activation+5.56e-01

Pos logit contributions
."+0.047
 He+0.047
 proper+0.046
.+0.046
 toy+0.044
Neg logit contributions
eking-0.091
Jud-0.081
aman-0.081
agg-0.080
Mary-0.080
Token the
Token ID262
Feature activation-2.24e+00

Pos logit contributions
eking+0.015
aman+0.013
Jud+0.013
Mary+0.013
agg+0.013
Neg logit contributions
."-0.012
 proper-0.011
.-0.011
 He-0.011
 toy-0.011
Token magn
Token ID7842
Feature activation+8.30e-01

Pos logit contributions
 He+0.034
 proper+0.034
attered+0.034
powers+0.033
 They+0.032
Neg logit contributions
eking-0.072
ir-0.062
aman-0.062
ulu-0.062
Jud-0.061
Tokenifying
Token ID4035
Feature activation-1.35e+01

Pos logit contributions
eking+0.026
ir+0.024
aman+0.022
irs+0.022
ulu+0.022
Neg logit contributions
 proper-0.015
 He-0.014
 They-0.013
 toy-0.013
powers-0.013
Token glass
Token ID5405
Feature activation-1.40e+00

Pos logit contributions
 proper+0.208
powers+0.200
 He+0.194
attered+0.187
ly+0.185
Neg logit contributions
eking-0.398
ir-0.384
aman-0.374
agg-0.368
ulu-0.367
Token?"
Token ID1701
Feature activation-6.12e-01

Pos logit contributions
 proper+0.029
powers+0.027
 He+0.026
attered+0.025
 They+0.025
Neg logit contributions
eking-0.037
ulu-0.034
ir-0.034
ma-0.033
aman-0.032
Token
Token ID198
Feature activation+1.12e+01

Pos logit contributions
 He+0.012
 proper+0.011
."+0.011
 They+0.011
.+0.011
Neg logit contributions
eking-0.019
aman-0.016
ir-0.016
agg-0.016
Jud-0.016
Token
Token ID198
Feature activation+1.11e+01

Pos logit contributions
eking+0.383
aman+0.328
ir+0.318
Jud+0.313
agg+0.311
Neg logit contributions
 He-0.201
."-0.190
.-0.189
 They-0.185
 proper-0.173
Token"
Token ID1
Feature activation+5.08e-01

Pos logit contributions
eking+0.215
Mary+0.184
Jud+0.180
ulu+0.179
agg+0.179
Neg logit contributions
 proper-0.311
powers-0.300
 He-0.297
."-0.294
 toy-0.293
Token cried
Token ID16896
Feature activation+1.33e+00

Pos logit contributions
eking+0.037
ir+0.033
aman+0.033
irs+0.033
agg+0.033
Neg logit contributions
.-0.023
 He-0.023
."-0.022
 They-0.021
 toy-0.021
Token,
Token ID11
Feature activation+3.49e+00

Pos logit contributions
eking+0.049
ir+0.046
Jud+0.044
agg+0.044
ulu+0.043
Neg logit contributions
 proper-0.015
powers-0.013
 likes-0.011
 toy-0.010
attered-0.010
Token dropping
Token ID12047
Feature activation-5.91e+00

Pos logit contributions
eking+0.085
agg+0.076
ir+0.075
aman+0.075
Jud+0.074
Neg logit contributions
 He-0.082
.-0.081
."-0.078
 toy-0.077
 They-0.077
Token the
Token ID262
Feature activation-2.38e+00

Pos logit contributions
powers+0.148
 proper+0.147
 He+0.139
."+0.138
attered+0.136
Neg logit contributions
eking-0.125
ir-0.114
Jud-0.111
Mary-0.108
ulu-0.106
Token magn
Token ID7842
Feature activation+9.43e-01

Pos logit contributions
 He+0.044
 proper+0.043
 They+0.042
."+0.040
powers+0.040
Neg logit contributions
eking-0.071
ir-0.062
Jud-0.061
aman-0.061
ulu-0.061
Tokenifying
Token ID4035
Feature activation-1.34e+01

Pos logit contributions
eking+0.030
ir+0.028
aman+0.026
ulu+0.025
Jud+0.025
Neg logit contributions
 proper-0.017
 He-0.016
powers-0.015
 They-0.015
."-0.015
Token glass
Token ID5405
Feature activation-1.25e+00

Pos logit contributions
 proper+0.211
powers+0.197
 He+0.189
ly+0.185
attered+0.182
Neg logit contributions
eking-0.391
ir-0.380
aman-0.372
Jud-0.370
ulu-0.367
Token.
Token ID13
Feature activation-9.72e+00

Pos logit contributions
 proper+0.020
powers+0.018
 He+0.016
 But+0.016
 likes+0.015
Neg logit contributions
eking-0.039
ir-0.038
ulu-0.036
ma-0.035
 nowhere-0.035
Token
Token ID198
Feature activation+1.10e+01

Pos logit contributions
 He+0.136
 They+0.132
 But+0.123
 It+0.105
.+0.095
Neg logit contributions
eking-0.426
eling-0.377
ir-0.365
eks-0.365
iter-0.362
Token
Token ID198
Feature activation+1.09e+01

Pos logit contributions
eking+0.440
eling+0.368
ir+0.367
aman+0.366
eks+0.363
Neg logit contributions
 He-0.169
 They-0.157
.-0.153
."-0.142
 But-0.132
Token"
Token ID1
Feature activation+6.06e-01

Pos logit contributions
eking+0.289
ir+0.245
aman+0.241
agg+0.240
ulu+0.235
Neg logit contributions
 He-0.246
 proper-0.235
."-0.235
 They-0.234
.-0.233
Token borrowed
Token ID22546
Feature activation-3.35e+00

Pos logit contributions
 He+0.073
 toy+0.067
.+0.066
 should+0.064
 proper+0.063
Neg logit contributions
eking-0.167
aman-0.151
irs-0.148
ir-0.147
iter-0.140
Token her
Token ID607
Feature activation-5.85e-01

Pos logit contributions
 He+0.058
 proper+0.057
 toy+0.055
."+0.055
.+0.054
Neg logit contributions
eking-0.105
aman-0.092
agg-0.091
Mary-0.090
ir-0.089
Token mom
Token ID1995
Feature activation+2.94e+00

Pos logit contributions
 He+0.010
 They+0.010
 proper+0.009
powers+0.009
attered+0.009
Neg logit contributions
eking-0.018
ir-0.016
aman-0.015
Jud-0.015
agg-0.015
Token's
Token ID338
Feature activation-7.78e-01

Pos logit contributions
eking+0.085
aman+0.074
ir+0.074
agg+0.073
Jud+0.073
Neg logit contributions
 proper-0.055
 He-0.055
."-0.053
 toy-0.052
powers-0.052
Token magn
Token ID7842
Feature activation+9.94e-01

Pos logit contributions
 He+0.014
 They+0.013
attered+0.012
."+0.012
ny+0.012
Neg logit contributions
eking-0.023
ir-0.020
aman-0.019
Jud-0.019
agg-0.019
Tokenifying
Token ID4035
Feature activation-1.33e+01

Pos logit contributions
eking+0.034
ir+0.031
aman+0.030
agg+0.030
Jud+0.030
Neg logit contributions
 proper-0.014
 He-0.013
."-0.013
powers-0.012
 toy-0.012
Token glass
Token ID5405
Feature activation-1.28e+00

Pos logit contributions
 proper+0.189
 He+0.182
powers+0.177
."+0.174
 They+0.170
Neg logit contributions
eking-0.422
ir-0.389
aman-0.386
agg-0.382
Jud-0.381
Token to
Token ID284
Feature activation+8.71e-01

Pos logit contributions
 proper+0.023
 He+0.020
powers+0.020
 They+0.019
 But+0.019
Neg logit contributions
eking-0.036
ir-0.035
ulu-0.034
ma-0.032
 nowhere-0.032
Token take
Token ID1011
Feature activation-5.60e-01

Pos logit contributions
eking+0.031
agg+0.028
aman+0.028
Mary+0.028
Ja+0.027
Neg logit contributions
.-0.012
."-0.011
 like-0.011
 proper-0.011
 He-0.011
Token a
Token ID257
Feature activation-4.13e+00

Pos logit contributions
 He+0.012
 proper+0.012
."+0.011
powers+0.011
 toy+0.011
Neg logit contributions
eking-0.015
ir-0.013
aman-0.013
Jud-0.013
agg-0.013
Token closer
Token ID5699
Feature activation-4.31e+00

Pos logit contributions
 He+0.078
 proper+0.074
 They+0.074
."+0.073
attered+0.071
Neg logit contributions
eking-0.117
ir-0.103
Jud-0.103
aman-0.102
ulu-0.101
Token flashlight
Token ID38371
Feature activation-5.52e+00

Pos logit contributions
 He+0.066
."+0.064
attered+0.064
 They+0.063
 proper+0.063
Neg logit contributions
eking-0.115
aman-0.106
Mary-0.103
ulu-0.103
Jud-0.102
Token,
Token ID11
Feature activation+3.70e+00

Pos logit contributions
 proper+0.088
powers+0.078
 He+0.070
 They+0.065
attered+0.064
Neg logit contributions
eking-0.183
aman-0.159
ma-0.157
lf-0.157
ir-0.157
Token and
Token ID290
Feature activation+2.02e+00

Pos logit contributions
eking+0.129
aman+0.114
ir+0.114
agg+0.113
Jud+0.113
Neg logit contributions
 proper-0.048
 He-0.048
."-0.045
powers-0.044
 toy-0.044
Token a
Token ID257
Feature activation-3.70e+00

Pos logit contributions
eking+0.059
Mary+0.051
agg+0.051
Jud+0.050
aman+0.050
Neg logit contributions
 proper-0.036
 He-0.036
powers-0.035
."-0.035
 They-0.033
Token magn
Token ID7842
Feature activation+1.28e+00

Pos logit contributions
attered+0.062
 He+0.061
."+0.060
 They+0.059
powers+0.059
Neg logit contributions
eking-0.102
aman-0.094
Mary-0.092
Ja-0.092
Jud-0.092
Tokenifying
Token ID4035
Feature activation-1.31e+01

Pos logit contributions
eking+0.040
ir+0.036
aman+0.034
irs+0.034
agg+0.033
Neg logit contributions
 proper-0.024
 He-0.021
powers-0.021
 toy-0.021
 They-0.021
Token glass
Token ID5405
Feature activation-8.67e-01

Pos logit contributions
 proper+0.204
powers+0.195
 He+0.191
."+0.185
 They+0.179
Neg logit contributions
eking-0.402
aman-0.373
ir-0.372
agg-0.364
Jud-0.359
Token.
Token ID13
Feature activation-9.03e+00

Pos logit contributions
 proper+0.011
powers+0.010
 He+0.009
 But+0.009
 They+0.008
Neg logit contributions
eking-0.030
ir-0.028
ulu-0.027
ma-0.027
lf-0.026
Token
Token ID198
Feature activation+1.13e+01

Pos logit contributions
 He+0.120
 proper+0.114
."+0.110
 They+0.110
powers+0.105
Neg logit contributions
eking-0.323
ir-0.283
aman-0.283
agg-0.280
Jud-0.280
Token
Token ID198
Feature activation+1.14e+01

Pos logit contributions
eking+0.475
eling+0.383
aman+0.376
ths+0.374
tha+0.372
Neg logit contributions
 He-0.166
 They-0.156
.-0.146
."-0.141
 It-0.128
TokenAs
Token ID1722
Feature activation+1.79e+00

Pos logit contributions
eking+0.346
aman+0.270
ir+0.270
eks+0.267
agg+0.264
Neg logit contributions
 He-0.236
 They-0.229
."-0.218
.-0.215
powers-0.215
Token Don
Token ID2094
Feature activation-2.62e+00

Pos logit contributions
 proper+0.193
powers+0.190
 toy+0.183
."+0.179
attered+0.175
Neg logit contributions
eking-0.246
aman-0.225
ir-0.221
Mary-0.220
agg-0.219
Token't
Token ID470
Feature activation+2.78e-02

Pos logit contributions
."+0.073
 He+0.072
.+0.072
 They+0.068
 toy+0.068
Neg logit contributions
eking-0.057
aman-0.048
agg-0.047
Jud-0.047
Mary-0.045
Token yield
Token ID7800
Feature activation-4.61e+00

Pos logit contributions
eking+0.001
ir+0.001
Jud+0.001
ulu+0.001
Mary+0.001
Neg logit contributions
 proper-0.001
 He-0.001
powers-0.000
attered-0.000
 They-0.000
Token to
Token ID284
Feature activation-9.38e-01

Pos logit contributions
 proper+0.058
 He+0.055
powers+0.053
 They+0.049
 toy+0.048
Neg logit contributions
eking-0.162
ir-0.143
aman-0.141
Mary-0.139
Jud-0.139
Token the
Token ID262
Feature activation-3.23e+00

Pos logit contributions
 proper+0.019
 He+0.019
powers+0.018
."+0.018
 toy+0.018
Neg logit contributions
eking-0.026
aman-0.022
ir-0.022
Jud-0.022
agg-0.022
Token temptation
Token ID29062
Feature activation-1.30e+01

Pos logit contributions
 He+0.055
 proper+0.054
powers+0.052
 They+0.052
."+0.050
Neg logit contributions
eking-0.098
aman-0.086
ir-0.084
Jud-0.084
Mary-0.083
Token to
Token ID284
Feature activation-1.10e-01

Pos logit contributions
 proper+0.144
 He+0.127
powers+0.125
 toy+0.116
attered+0.111
Neg logit contributions
eking-0.470
Mary-0.425
ir-0.424
aman-0.423
ulu-0.418
Token mist
Token ID4020
Feature activation-1.95e+00

Pos logit contributions
."+0.001
.+0.001
 He+0.001
 proper+0.001
 toy+0.001
Neg logit contributions
eking-0.004
agg-0.003
aman-0.003
Mary-0.003
ir-0.003
Tokenreat
Token ID630
Feature activation+9.62e-01

Pos logit contributions
 proper+0.031
powers+0.029
 He+0.028
 toy+0.026
 bone+0.025
Neg logit contributions
eking-0.064
ir-0.056
aman-0.055
ulu-0.055
ics-0.054
Token and
Token ID290
Feature activation+1.48e+00

Pos logit contributions
eking+0.031
aman+0.027
ir+0.027
agg+0.027
Jud+0.027
Neg logit contributions
 proper-0.015
 He-0.015
."-0.014
powers-0.014
 toy-0.014
Token break
Token ID2270
Feature activation-2.55e+00

Pos logit contributions
eking+0.046
Jud+0.040
aman+0.040
agg+0.039
Mary+0.039
Neg logit contributions
 proper-0.025
 He-0.024
powers-0.023
."-0.023
 toy-0.023
Token.
Token ID13
Feature activation-8.83e+00

Pos logit contributions
 proper+0.006
powers+0.005
 Let+0.005
attered+0.005
 He+0.004
Neg logit contributions
eking-0.010
ir-0.009
ma-0.009
ics-0.009
aman-0.009
Token Lily
Token ID20037
Feature activation+1.63e+00

Pos logit contributions
 proper+0.106
powers+0.102
 toy+0.096
."+0.093
attered+0.088
Neg logit contributions
eking-0.294
ulu-0.282
agg-0.278
Mary-0.272
Jud-0.271
Token had
Token ID550
Feature activation-3.05e+00

Pos logit contributions
eking+0.047
agg+0.041
Jud+0.041
aman+0.041
Mary+0.041
Neg logit contributions
 proper-0.030
 He-0.029
."-0.029
powers-0.028
 toy-0.027
Token a
Token ID257
Feature activation-3.77e+00

Pos logit contributions
 proper+0.050
 He+0.049
powers+0.047
."+0.047
 toy+0.045
Neg logit contributions
eking-0.096
Mary-0.084
Jud-0.083
agg-0.083
aman-0.082
Token magn
Token ID7842
Feature activation+1.35e+00

Pos logit contributions
 proper+0.060
 He+0.059
."+0.059
attered+0.058
powers+0.057
Neg logit contributions
eking-0.111
Jud-0.100
aman-0.099
ir-0.099
agg-0.099
Tokenifying
Token ID4035
Feature activation-1.30e+01

Pos logit contributions
eking+0.044
ir+0.041
aman+0.039
agg+0.038
irs+0.038
Neg logit contributions
 proper-0.021
 He-0.020
powers-0.019
."-0.019
 toy-0.019
Token glass
Token ID5405
Feature activation-1.00e+00

Pos logit contributions
 He+0.162
 proper+0.161
."+0.152
 toy+0.148
powers+0.147
Neg logit contributions
eking-0.464
aman-0.411
ir-0.408
agg-0.408
Jud-0.407
Token that
Token ID326
Feature activation-6.51e+00

Pos logit contributions
 proper+0.022
powers+0.019
attered+0.017
 He+0.017
 Let+0.017
Neg logit contributions
ulu-0.024
eking-0.024
ir-0.023
ma-0.023
lf-0.023
Token she
Token ID673
Feature activation-4.98e+00

Pos logit contributions
 He+0.120
 proper+0.114
.+0.112
."+0.111
 toy+0.111
Neg logit contributions
eking-0.200
aman-0.176
ir-0.176
Mary-0.171
Jud-0.170
Token found
Token ID1043
Feature activation+1.00e+00

Pos logit contributions
 proper+0.088
 He+0.088
."+0.085
 They+0.083
powers+0.083
Neg logit contributions
eking-0.145
agg-0.127
Jud-0.125
Mary-0.124
ulu-0.123
Token in
Token ID287
Feature activation-2.58e-01

Pos logit contributions
eking+0.036
ir+0.031
agg+0.031
Mary+0.031
Jud+0.031
Neg logit contributions
 proper-0.013
 He-0.012
powers-0.012
 They-0.011
."-0.011
Token She
Token ID1375
Feature activation-4.15e+00

Pos logit contributions
 proper+0.146
 toy+0.142
."+0.136
 thing+0.129
 bone+0.129
Neg logit contributions
Jud-0.237
eking-0.236
agg-0.229
aman-0.228
Bet-0.227
Token took
Token ID1718
Feature activation+1.97e+00

Pos logit contributions
 He+0.082
 proper+0.081
 toy+0.079
."+0.078
powers+0.078
Neg logit contributions
eking-0.118
aman-0.102
ir-0.101
Mary-0.100
agg-0.099
Token out
Token ID503
Feature activation-1.07e+00

Pos logit contributions
eking+0.065
aman+0.058
ir+0.055
Jud+0.055
agg+0.055
Neg logit contributions
 proper-0.031
 He-0.031
 toy-0.030
.-0.029
 They-0.029
Token a
Token ID257
Feature activation-3.58e+00

Pos logit contributions
 toy+0.016
 proper+0.016
.+0.016
 He+0.016
 They+0.015
Neg logit contributions
eking-0.037
aman-0.033
agg-0.031
Jud-0.031
Mary-0.031
Token magn
Token ID7842
Feature activation+1.57e+00

Pos logit contributions
 He+0.060
 proper+0.058
."+0.058
 They+0.056
powers+0.056
Neg logit contributions
eking-0.110
aman-0.097
ir-0.096
Jud-0.096
agg-0.095
Tokenifying
Token ID4035
Feature activation-1.28e+01

Pos logit contributions
eking+0.054
ir+0.048
aman+0.047
agg+0.047
Jud+0.047
Neg logit contributions
 proper-0.022
 He-0.022
."-0.021
powers-0.020
 toy-0.020
Token glass
Token ID5405
Feature activation-7.01e-01

Pos logit contributions
 proper+0.168
 He+0.164
."+0.156
powers+0.155
 toy+0.151
Neg logit contributions
eking-0.437
ir-0.392
aman-0.392
agg-0.389
Jud-0.388
Token and
Token ID290
Feature activation+2.32e+00

Pos logit contributions
 proper+0.013
powers+0.012
 He+0.012
 They+0.011
."+0.011
Neg logit contributions
eking-0.019
ir-0.018
ulu-0.018
aman-0.017
ma-0.017
Token looked
Token ID3114
Feature activation-2.37e+00

Pos logit contributions
eking+0.064
aman+0.055
agg+0.055
Jud+0.055
ir+0.055
Neg logit contributions
 proper-0.047
 He-0.046
."-0.045
powers-0.044
 toy-0.044
Token very
Token ID845
Feature activation-1.95e+00

Pos logit contributions
powers+0.044
 He+0.041
 bone+0.040
attered+0.039
 Let+0.039
Neg logit contributions
eking-0.069
eks-0.059
agg-0.059
ir-0.058
whel-0.056
Token carefully
Token ID7773
Feature activation-2.72e+00

Pos logit contributions
 He+0.038
 proper+0.038
powers+0.037
 bone+0.037
 toy+0.037
Neg logit contributions
eking-0.054
agg-0.047
ir-0.045
ulu-0.043
Jud-0.043
Token,
Token ID11
Feature activation+4.42e+00

Pos logit contributions
eking+0.182
ir+0.156
agg+0.155
Mary+0.153
Jud+0.152
Neg logit contributions
 proper-0.089
powers-0.083
 He-0.078
attered-0.078
 toy-0.075
Token handing
Token ID22786
Feature activation-7.32e-01

Pos logit contributions
eking+0.164
agg+0.150
irs+0.150
aman+0.148
ir+0.148
Neg logit contributions
 He-0.057
.-0.055
."-0.053
 They-0.050
 like-0.046
Token him
Token ID683
Feature activation+1.95e+00

Pos logit contributions
 He+0.015
 proper+0.015
 toy+0.014
."+0.014
.+0.014
Neg logit contributions
eking-0.021
agg-0.018
ir-0.018
aman-0.018
Jud-0.017
Token the
Token ID262
Feature activation-1.28e+00

Pos logit contributions
eking+0.041
ma+0.039
Jud+0.037
Mary+0.037
ulu+0.036
Neg logit contributions
powers-0.047
attered-0.046
 He-0.045
 proper-0.045
 They-0.043
Token magn
Token ID7842
Feature activation+1.60e+00

Pos logit contributions
 He+0.021
 proper+0.020
 They+0.020
attered+0.019
powers+0.019
Neg logit contributions
eking-0.039
aman-0.034
ir-0.034
Jud-0.033
Mary-0.033
Tokenifying
Token ID4035
Feature activation-1.27e+01

Pos logit contributions
eking+0.048
ir+0.046
aman+0.041
irs+0.041
ulu+0.041
Neg logit contributions
 proper-0.031
 He-0.027
 They-0.027
powers-0.026
 toy-0.026
Token glass
Token ID5405
Feature activation-5.17e-01

Pos logit contributions
 proper+0.181
 He+0.173
powers+0.170
."+0.162
 toy+0.158
Neg logit contributions
eking-0.406
ir-0.377
aman-0.375
agg-0.368
Jud-0.367
Token.
Token ID13
Feature activation-8.36e+00

Pos logit contributions
 proper+0.007
powers+0.006
 He+0.005
attered+0.005
 But+0.005
Neg logit contributions
eking-0.017
ir-0.016
ulu-0.016
ma-0.016
 nowhere-0.015
Token She
Token ID1375
Feature activation-4.19e+00

Pos logit contributions
 He+0.133
 proper+0.125
 They+0.124
."+0.123
.+0.119
Neg logit contributions
eking-0.281
agg-0.243
aman-0.243
ir-0.243
Jud-0.239
Token looked
Token ID3114
Feature activation-1.75e+00

Pos logit contributions
 He+0.075
 toy+0.074
.+0.073
."+0.071
 proper+0.071
Neg logit contributions
eking-0.131
ir-0.117
irs-0.113
aman-0.113
Jud-0.110
Token around
Token ID1088
Feature activation-4.73e-02

Pos logit contributions
powers+0.030
 He+0.029
 proper+0.029
 They+0.027
."+0.027
Neg logit contributions
eking-0.054
ir-0.046
aman-0.046
agg-0.045
Mary-0.044
Token.
Token ID13
Feature activation-8.61e+00

Pos logit contributions
 proper+0.038
 He+0.030
powers+0.029
attered+0.027
 Let+0.024
Neg logit contributions
eking-0.145
ir-0.135
ulu-0.130
ics-0.128
irs-0.125
Token They
Token ID1119
Feature activation-4.33e+00

Pos logit contributions
 proper+0.117
 He+0.114
powers+0.110
."+0.110
 toy+0.107
Neg logit contributions
eking-0.294
aman-0.262
ir-0.261
agg-0.260
Jud-0.259
Token had
Token ID550
Feature activation-2.57e+00

Pos logit contributions
 He+0.076
.+0.074
 proper+0.073
 toy+0.072
 They+0.072
Neg logit contributions
eking-0.136
aman-0.119
ir-0.117
agg-0.114
Mary-0.114
Token a
Token ID257
Feature activation-3.38e+00

Pos logit contributions
 He+0.041
 proper+0.041
powers+0.040
."+0.039
 toy+0.037
Neg logit contributions
eking-0.080
Mary-0.071
Jud-0.071
agg-0.070
aman-0.069
Token magn
Token ID7842
Feature activation+1.57e+00

Pos logit contributions
 He+0.054
 proper+0.053
."+0.052
powers+0.050
 They+0.049
Neg logit contributions
eking-0.106
aman-0.095
Jud-0.094
agg-0.094
ir-0.093
Tokenifying
Token ID4035
Feature activation-1.27e+01

Pos logit contributions
eking+0.054
ir+0.049
aman+0.048
agg+0.047
ulu+0.047
Neg logit contributions
 proper-0.022
 He-0.021
."-0.020
 toy-0.020
powers-0.020
Token glass
Token ID5405
Feature activation-6.36e-01

Pos logit contributions
 He+0.166
 proper+0.165
."+0.156
 toy+0.153
powers+0.151
Neg logit contributions
eking-0.448
aman-0.394
Jud-0.392
ir-0.391
agg-0.391
Token and
Token ID290
Feature activation+2.49e+00

Pos logit contributions
 proper+0.013
powers+0.012
 He+0.011
 But+0.011
 Let+0.010
Neg logit contributions
ulu-0.017
eking-0.016
ir-0.015
ma-0.015
agg-0.015
Token a
Token ID257
Feature activation-3.43e+00

Pos logit contributions
eking+0.068
agg+0.062
Jud+0.062
Mary+0.060
ulu+0.060
Neg logit contributions
 proper-0.046
 He-0.045
."-0.044
ny-0.043
powers-0.043
Token small
Token ID1402
Feature activation-3.44e+00

Pos logit contributions
 He+0.064
."+0.063
attered+0.061
powers+0.060
 proper+0.060
Neg logit contributions
eking-0.091
Jud-0.084
ulu-0.083
aman-0.083
agg-0.081
Token box
Token ID3091
Feature activation-8.50e-01

Pos logit contributions
cream+0.072
attered+0.070
 He+0.069
 proper+0.067
powers+0.066
Neg logit contributions
eking-0.073
ir-0.072
aman-0.072
ulu-0.069
whel-0.068
Token,
Token ID11
Feature activation+3.88e+00

Pos logit contributions
 proper+0.058
 He+0.057
powers+0.055
."+0.052
 toy+0.051
Neg logit contributions
eking-0.122
ir-0.106
agg-0.106
Mary-0.105
aman-0.105
Token drawing
Token ID8263
Feature activation-4.55e+00

Pos logit contributions
eking+0.136
aman+0.125
ir+0.123
agg+0.123
Jud+0.122
Neg logit contributions
 He-0.052
 They-0.049
.-0.049
."-0.049
 toy-0.048
Token,
Token ID11
Feature activation+3.82e+00

Pos logit contributions
 proper+0.062
 He+0.058
powers+0.056
."+0.054
 toy+0.053
Neg logit contributions
eking-0.155
ir-0.137
Mary-0.136
agg-0.136
aman-0.135
Token and
Token ID290
Feature activation+1.91e+00

Pos logit contributions
eking+0.144
aman+0.133
agg+0.132
ir+0.132
Jud+0.131
Neg logit contributions
 He-0.043
 They-0.040
.-0.040
."-0.039
 toy-0.039
Token doing
Token ID1804
Feature activation-1.31e+00

Pos logit contributions
eking+0.053
Mary+0.045
Jud+0.045
ir+0.044
aman+0.044
Neg logit contributions
 proper-0.038
 He-0.037
powers-0.036
."-0.035
attered-0.034
Token experiments
Token ID10256
Feature activation-1.26e+01

Pos logit contributions
 He+0.023
 proper+0.022
powers+0.022
."+0.021
attered+0.021
Neg logit contributions
eking-0.040
Mary-0.034
agg-0.034
Jud-0.034
ir-0.034
Token.
Token ID13
Feature activation-9.06e+00

Pos logit contributions
 proper+0.125
powers+0.116
attered+0.113
 floppy+0.096
 He+0.096
Neg logit contributions
eking-0.471
ma-0.426
ics-0.415
lf-0.414
irs-0.410
Token Lily
Token ID20037
Feature activation+1.14e+00

Pos logit contributions
 proper+0.117
 He+0.111
powers+0.109
."+0.108
 toy+0.107
Neg logit contributions
eking-0.312
aman-0.280
agg-0.279
ir-0.278
Jud-0.278
Token and
Token ID290
Feature activation+1.81e+00

Pos logit contributions
eking+0.036
Mary+0.032
agg+0.032
aman+0.032
Jud+0.032
Neg logit contributions
 proper-0.018
powers-0.016
 He-0.016
attered-0.015
 toy-0.015
Token Tom
Token ID4186
Feature activation+3.89e+00

Pos logit contributions
eking+0.050
ir+0.043
agg+0.042
ulu+0.042
Jud+0.042
Neg logit contributions
 He-0.038
 proper-0.037
.-0.037
 toy-0.036
 They-0.036
Token want
Token ID765
Feature activation-3.83e+00

Pos logit contributions
eking+0.105
eks+0.093
ir+0.089
ma+0.088
agg+0.088
Neg logit contributions
powers-0.078
 proper-0.077
attered-0.074
ometer-0.072
 bone-0.071
Token and
Token ID290
Feature activation+2.36e+00

Pos logit contributions
eking+0.059
aman+0.054
ma+0.051
Mary+0.051
ir+0.050
Neg logit contributions
 proper-0.039
powers-0.038
attered-0.036
 He-0.034
 They-0.033
Token handed
Token ID10158
Feature activation+1.08e+00

Pos logit contributions
eking+0.072
ir+0.062
aman+0.062
irs+0.059
agg+0.059
Neg logit contributions
 He-0.046
 toy-0.043
.-0.043
 proper-0.043
."-0.043
Token him
Token ID683
Feature activation+1.96e+00

Pos logit contributions
eking+0.031
aman+0.027
agg+0.026
ir+0.026
Jud+0.026
Neg logit contributions
 He-0.022
.-0.022
."-0.021
 proper-0.021
 toy-0.021
Token the
Token ID262
Feature activation-1.61e+00

Pos logit contributions
eking+0.045
ma+0.040
ir+0.039
Jud+0.039
ulu+0.038
Neg logit contributions
powers-0.044
 He-0.044
attered-0.043
 proper-0.043
 They-0.042
Token magn
Token ID7842
Feature activation+1.68e+00

Pos logit contributions
 toy+0.025
 proper+0.025
."+0.024
.+0.024
 He+0.023
Neg logit contributions
eking-0.053
aman-0.048
agg-0.048
Mary-0.048
Jud-0.048
Tokenifying
Token ID4035
Feature activation-1.25e+01

Pos logit contributions
eking+0.054
ir+0.050
aman+0.046
irs+0.046
agg+0.046
Neg logit contributions
 proper-0.028
 He-0.027
 They-0.025
."-0.025
powers-0.025
Token glass
Token ID5405
Feature activation-5.71e-01

Pos logit contributions
 proper+0.179
 He+0.172
powers+0.167
."+0.158
 They+0.156
Neg logit contributions
eking-0.394
ir-0.371
aman-0.365
Jud-0.360
agg-0.359
Token.
Token ID13
Feature activation-9.00e+00

Pos logit contributions
 proper+0.009
powers+0.007
attered+0.007
 He+0.007
 But+0.007
Neg logit contributions
ulu-0.017
ir-0.017
eking-0.017
ma-0.016
 nowhere-0.015
Token\n
Token ID198
Feature activation+1.17e+01

Pos logit contributions
 He+0.167
 proper+0.156
 They+0.155
."+0.154
.+0.150
Neg logit contributions
eking-0.282
aman-0.242
ir-0.241
agg-0.238
Jud-0.236
Token\n
Token ID198
Feature activation+1.16e+01

Pos logit contributions
eking+0.499
ths+0.416
aman+0.413
eling+0.403
eks+0.403
Neg logit contributions
 He-0.168
.-0.152
 They-0.151
."-0.138
 But-0.124
TokenHe
Token ID1544
Feature activation-4.30e+00

Pos logit contributions
eking+0.314
ir+0.255
aman+0.252
agg+0.245
tha+0.243
Neg logit contributions
 He-0.284
 They-0.266
.-0.260
."-0.256
 proper-0.251
Token She
Token ID1375
Feature activation-4.85e+00

Pos logit contributions
 He+0.139
 They+0.133
.+0.125
."+0.123
 toy+0.112
Neg logit contributions
eking-0.350
agg-0.302
aman-0.299
ir-0.298
ulu-0.293
Token takes
Token ID2753
Feature activation+1.76e+00

Pos logit contributions
 proper+0.080
 He+0.077
."+0.074
powers+0.073
 toy+0.072
Neg logit contributions
eking-0.152
agg-0.133
aman-0.133
ir-0.133
Jud-0.132
Token out
Token ID503
Feature activation-8.43e-01

Pos logit contributions
eking+0.057
aman+0.050
agg+0.050
ir+0.050
Jud+0.049
Neg logit contributions
 proper-0.028
 He-0.027
."-0.026
 toy-0.026
.-0.026
Token a
Token ID257
Feature activation-3.59e+00

Pos logit contributions
 He+0.013
 proper+0.013
 toy+0.013
.+0.013
."+0.013
Neg logit contributions
eking-0.028
aman-0.024
agg-0.024
Jud-0.024
ir-0.024
Token magn
Token ID7842
Feature activation+1.78e+00

Pos logit contributions
 He+0.060
 proper+0.059
."+0.058
 They+0.057
powers+0.057
Neg logit contributions
eking-0.110
aman-0.096
ir-0.096
Jud-0.095
agg-0.095
Tokenifying
Token ID4035
Feature activation-1.25e+01

Pos logit contributions
eking+0.058
ir+0.053
aman+0.051
agg+0.050
Jud+0.050
Neg logit contributions
 proper-0.029
 He-0.027
."-0.026
 toy-0.026
powers-0.026
Token glass
Token ID5405
Feature activation-8.58e-01

Pos logit contributions
 proper+0.167
 He+0.164
."+0.156
powers+0.154
 toy+0.151
Neg logit contributions
eking-0.428
aman-0.382
ir-0.381
agg-0.379
Jud-0.378
Token.
Token ID13
Feature activation-8.82e+00

Pos logit contributions
 proper+0.017
powers+0.015
 He+0.014
attered+0.014
 But+0.014
Neg logit contributions
eking-0.024
ir-0.022
ulu-0.021
ma-0.021
agg-0.020
Token She
Token ID1375
Feature activation-4.22e+00

Pos logit contributions
 He+0.103
 They+0.094
 proper+0.093
."+0.092
.+0.089
Neg logit contributions
eking-0.333
aman-0.293
ir-0.291
agg-0.291
Jud-0.289
Token looks
Token ID3073
Feature activation-3.14e+00

Pos logit contributions
 proper+0.066
 He+0.065
."+0.062
powers+0.060
 toy+0.060
Neg logit contributions
eking-0.136
agg-0.119
aman-0.119
ir-0.118
Jud-0.118
Token at
Token ID379
Feature activation+2.39e+00

Pos logit contributions
powers+0.050
 He+0.049
 proper+0.048
 They+0.045
."+0.045
Neg logit contributions
eking-0.099
agg-0.086
ir-0.085
eks-0.083
aman-0.083
Token can
Token ID460
Feature activation-6.00e+00

Pos logit contributions
 proper+0.092
 He+0.089
powers+0.085
 toy+0.084
."+0.082
Neg logit contributions
eking-0.183
ir-0.162
aman-0.160
Jud-0.159
agg-0.159
Token help
Token ID1037
Feature activation-7.11e+00

Pos logit contributions
 proper+0.123
powers+0.118
attered+0.116
 He+0.107
ly+0.106
Neg logit contributions
eking-0.162
ir-0.145
ma-0.145
ulu-0.136
irl-0.136
Token me
Token ID502
Feature activation+2.31e+00

Pos logit contributions
 proper+0.178
 He+0.177
."+0.172
powers+0.171
 toy+0.170
Neg logit contributions
eking-0.164
aman-0.136
ir-0.135
agg-0.134
Jud-0.133
Token with
Token ID351
Feature activation-6.86e-01

Pos logit contributions
eking+0.071
ir+0.060
ulu+0.059
agg+0.057
ma+0.057
Neg logit contributions
 proper-0.042
 He-0.040
powers-0.039
attered-0.039
 They-0.037
Token my
Token ID616
Feature activation-1.38e+00

Pos logit contributions
 proper+0.013
."+0.013
 He+0.013
 toy+0.013
.+0.013
Neg logit contributions
eking-0.020
agg-0.017
Jud-0.017
aman-0.017
Mary-0.017
Token chores
Token ID46173
Feature activation-1.23e+01

Pos logit contributions
 He+0.023
 They+0.021
 proper+0.021
attered+0.021
powers+0.020
Neg logit contributions
eking-0.043
aman-0.037
ir-0.037
ma-0.036
agg-0.036
Token instead
Token ID2427
Feature activation-5.26e+00

Pos logit contributions
 proper+0.110
powers+0.104
 He+0.101
 They+0.088
 toy+0.087
Neg logit contributions
eking-0.459
aman-0.425
ma-0.418
ir-0.414
Mary-0.411
Token.
Token ID13
Feature activation-1.08e+01

Pos logit contributions
 proper+0.046
 He+0.040
powers+0.040
 toy+0.036
 They+0.034
Neg logit contributions
eking-0.204
ir-0.183
aman-0.182
Mary-0.181
agg-0.181
Token If
Token ID1002
Feature activation-3.15e+00

Pos logit contributions
 proper+0.152
 toy+0.147
powers+0.142
attered+0.139
ly+0.135
Neg logit contributions
eking-0.328
Mary-0.316
aman-0.307
Jud-0.307
Ma-0.307
Token you
Token ID345
Feature activation-1.01e+00

Pos logit contributions
."+0.062
 He+0.061
.+0.060
 proper+0.058
 toy+0.057
Neg logit contributions
eking-0.091
ir-0.080
aman-0.080
Jud-0.080
agg-0.079
Token help
Token ID1037
Feature activation-5.02e+00

Pos logit contributions
 proper+0.017
 He+0.016
powers+0.016
 toy+0.016
attered+0.015
Neg logit contributions
eking-0.031
ir-0.027
aman-0.026
agg-0.026
ulu-0.026
Token\n
Token ID198
Feature activation+1.22e+01

Pos logit contributions
eking+0.479
aman+0.408
ir+0.384
eling+0.380
agg+0.378
Neg logit contributions
 He-0.199
.-0.182
 They-0.171
."-0.171
 But-0.155
TokenHe
Token ID1544
Feature activation-3.30e+00

Pos logit contributions
eking+0.254
aman+0.198
ir+0.196
agg+0.190
ulu+0.184
Neg logit contributions
 He-0.359
 proper-0.340
 They-0.337
."-0.337
powers-0.336
Token pointed
Token ID6235
Feature activation-8.90e-01

Pos logit contributions
 He+0.061
.+0.057
 toy+0.056
 They+0.051
 should+0.051
Neg logit contributions
eking-0.107
irs-0.097
ir-0.097
aman-0.095
ics-0.092
Token his
Token ID465
Feature activation+7.52e-01

Pos logit contributions
 proper+0.016
powers+0.016
attered+0.015
."+0.014
 They+0.013
Neg logit contributions
eking-0.025
ir-0.023
aman-0.022
Mary-0.022
agg-0.022
Token magn
Token ID7842
Feature activation+2.16e+00

Pos logit contributions
eking+0.024
aman+0.022
Mary+0.022
agg+0.022
Jud+0.022
Neg logit contributions
 proper-0.012
 toy-0.012
 He-0.012
.-0.011
."-0.011
Tokenifying
Token ID4035
Feature activation-1.23e+01

Pos logit contributions
eking+0.073
ir+0.066
aman+0.064
agg+0.062
Jud+0.062
Neg logit contributions
 proper-0.033
 He-0.031
."-0.030
 They-0.029
powers-0.029
Token glass
Token ID5405
Feature activation-1.48e-01

Pos logit contributions
 proper+0.180
 He+0.172
powers+0.169
."+0.166
 They+0.160
Neg logit contributions
eking-0.385
ir-0.355
aman-0.353
agg-0.348
Jud-0.348
Token at
Token ID379
Feature activation+2.37e+00

Pos logit contributions
 proper+0.003
powers+0.002
 He+0.002
 They+0.002
."+0.002
Neg logit contributions
eking-0.004
ir-0.004
ma-0.004
ulu-0.004
aman-0.004
Token different
Token ID1180
Feature activation-3.87e+00

Pos logit contributions
eking+0.079
agg+0.068
aman+0.068
Jud+0.068
Ja+0.066
Neg logit contributions
 He-0.041
.-0.040
."-0.038
 toy-0.038
 They-0.036
Token classmates
Token ID28999
Feature activation-1.09e+00

Pos logit contributions
powers+0.065
 proper+0.064
 He+0.063
attered+0.059
 They+0.057
Neg logit contributions
eking-0.118
ir-0.103
aman-0.103
ma-0.100
Mary-0.100
Token and
Token ID290
Feature activation+2.64e+00

Pos logit contributions
 proper+0.016
powers+0.015
 He+0.014
attered+0.013
."+0.013
Neg logit contributions
eking-0.038
aman-0.033
ir-0.033
Mary-0.032
ma-0.032
Token the
Token ID262
Feature activation-1.08e+00

Pos logit contributions
eking+0.051
ulu+0.044
ir+0.044
ma+0.044
Jud+0.043
Neg logit contributions
 proper-0.059
powers-0.059
 He-0.056
attered-0.056
."-0.055
Token necklace
Token ID38987
Feature activation+5.07e-01

Pos logit contributions
 toy+0.017
 proper+0.016
."+0.016
.+0.016
 He+0.016
Neg logit contributions
eking-0.036
agg-0.033
aman-0.032
Mary-0.032
Jud-0.032
Token and
Token ID290
Feature activation+2.62e+00

Pos logit contributions
eking+0.015
ma+0.014
ir+0.013
ulu+0.013
ics+0.013
Neg logit contributions
 proper-0.009
powers-0.008
attered-0.007
 He-0.007
."-0.007
Token the
Token ID262
Feature activation-1.04e+00

Pos logit contributions
eking+0.074
ir+0.065
aman+0.064
agg+0.063
Jud+0.061
Neg logit contributions
 He-0.055
 toy-0.053
 proper-0.052
.-0.052
."-0.051
Token magn
Token ID7842
Feature activation+2.06e+00

Pos logit contributions
 proper+0.017
 He+0.017
."+0.017
 toy+0.016
powers+0.016
Neg logit contributions
eking-0.032
aman-0.028
ir-0.028
agg-0.028
Jud-0.028
Tokenifying
Token ID4035
Feature activation-1.22e+01

Pos logit contributions
eking+0.068
ir+0.061
aman+0.060
agg+0.059
Jud+0.059
Neg logit contributions
 proper-0.032
 He-0.031
."-0.029
powers-0.029
 toy-0.029
Token glass
Token ID5405
Feature activation-1.61e-01

Pos logit contributions
 proper+0.169
 He+0.164
powers+0.159
."+0.152
 They+0.148
Neg logit contributions
eking-0.396
ir-0.366
aman-0.361
Jud-0.357
agg-0.357
Token and
Token ID290
Feature activation+2.60e+00

Pos logit contributions
 proper+0.002
powers+0.002
 He+0.002
 But+0.002
 Let+0.001
Neg logit contributions
eking-0.006
ir-0.005
ulu-0.005
ma-0.005
agg-0.005
Token told
Token ID1297
Feature activation+2.42e+00

Pos logit contributions
eking+0.071
ir+0.061
aman+0.061
agg+0.061
Jud+0.060
Neg logit contributions
 He-0.054
 proper-0.053
."-0.052
 toy-0.052
.-0.051
Token him
Token ID683
Feature activation+2.33e+00

Pos logit contributions
eking+0.055
ir+0.046
agg+0.045
aman+0.045
Jud+0.045
Neg logit contributions
 proper-0.061
 He-0.059
powers-0.059
."-0.058
 toy-0.057
Token that
Token ID326
Feature activation-5.86e+00

Pos logit contributions
eking+0.055
ics+0.049
Mary+0.049
ulu+0.049
agg+0.048
Neg logit contributions
attered-0.051
 proper-0.048
 dot-0.047
powers-0.047
ny-0.047
Token the
Token ID262
Feature activation-1.08e+00

Pos logit contributions
eking+0.078
aman+0.067
ir+0.067
agg+0.066
Jud+0.066
Neg logit contributions
 proper-0.053
 He-0.053
."-0.051
 toy-0.050
powers-0.050
Token watch
Token ID2342
Feature activation-9.24e-01

Pos logit contributions
 proper+0.019
 He+0.019
powers+0.018
 They+0.018
attered+0.017
Neg logit contributions
eking-0.032
aman-0.028
ir-0.028
Mary-0.028
Jud-0.028
Token with
Token ID351
Feature activation+2.16e+00

Pos logit contributions
 proper+0.011
 He+0.011
powers+0.011
."+0.010
 toy+0.010
Neg logit contributions
eking-0.033
aman-0.029
ir-0.029
Mary-0.029
agg-0.029
Token the
Token ID262
Feature activation-1.03e+00

Pos logit contributions
eking+0.068
aman+0.060
ir+0.060
agg+0.059
Jud+0.059
Neg logit contributions
 proper-0.035
 He-0.035
."-0.034
powers-0.033
 toy-0.033
Token magn
Token ID7842
Feature activation+2.24e+00

Pos logit contributions
 proper+0.016
 He+0.016
."+0.015
powers+0.015
 They+0.015
Neg logit contributions
eking-0.033
ir-0.029
aman-0.029
Jud-0.029
agg-0.029
Tokenifying
Token ID4035
Feature activation-1.22e+01

Pos logit contributions
eking+0.071
ir+0.066
aman+0.061
agg+0.060
ulu+0.060
Neg logit contributions
 proper-0.041
 He-0.036
."-0.035
powers-0.035
 toy-0.035
Token glass
Token ID5405
Feature activation+4.41e-02

Pos logit contributions
 proper+0.175
 He+0.163
powers+0.163
."+0.154
 toy+0.152
Neg logit contributions
eking-0.385
ir-0.361
aman-0.357
agg-0.352
Jud-0.351
Token.
Token ID13
Feature activation-8.10e+00

Pos logit contributions
eking+0.002
ir+0.001
ulu+0.001
ma+0.001
agg+0.001
Neg logit contributions
 proper-0.001
powers-0.000
 He-0.000
 But-0.000
attered-0.000
Token He
Token ID679
Feature activation-3.53e+00

Pos logit contributions
 proper+0.072
."+0.065
 toy+0.060
ly+0.055
cest+0.054
Neg logit contributions
eking-0.269
agg-0.263
Jane-0.262
Jud-0.258
ulu-0.257
Token saw
Token ID2497
Feature activation+2.82e+00

Pos logit contributions
 He+0.052
.+0.049
 should+0.049
 likes+0.047
 toy+0.045
Neg logit contributions
eking-0.125
irs-0.123
ir-0.120
aman-0.119
ics-0.116
Token the
Token ID262
Feature activation-9.65e-01

Pos logit contributions
eking+0.089
ir+0.078
Jud+0.077
aman+0.077
agg+0.077
Neg logit contributions
 proper-0.046
 He-0.044
."-0.043
powers-0.043
 They-0.042
Tokenax
Token ID897
Feature activation+6.12e+00

Pos logit contributions
eking+0.115
agg+0.090
aman+0.089
ir+0.088
Jud+0.085
Neg logit contributions
 proper-0.229
 toy-0.219
powers-0.218
 He-0.218
."-0.217
Token saw
Token ID2497
Feature activation+2.19e+00

Pos logit contributions
eking+0.189
aman+0.166
ir+0.165
Jud+0.163
agg+0.162
Neg logit contributions
 He-0.106
 proper-0.103
."-0.101
 toy-0.100
.-0.099
Token shelves
Token ID20726
Feature activation-4.40e+00

Pos logit contributions
eking+0.067
aman+0.058
ir+0.058
agg+0.058
Jud+0.058
Neg logit contributions
 proper-0.038
 He-0.038
."-0.036
powers-0.036
 toy-0.036
Token full
Token ID1336
Feature activation-6.80e+00

Pos logit contributions
 He+0.060
 proper+0.057
.+0.056
."+0.055
 toy+0.054
Neg logit contributions
eking-0.151
aman-0.135
Jud-0.135
ir-0.135
agg-0.134
Token of
Token ID286
Feature activation-1.37e+00

Pos logit contributions
 proper+0.129
powers+0.127
 He+0.117
."+0.117
attered+0.113
Neg logit contributions
eking-0.185
agg-0.167
aman-0.167
ulu-0.166
ir-0.165
Token experiments
Token ID10256
Feature activation-1.21e+01

Pos logit contributions
 He+0.020
 proper+0.020
."+0.020
powers+0.020
 They+0.019
Neg logit contributions
eking-0.044
ir-0.038
aman-0.038
agg-0.038
Jud-0.038
Token and
Token ID290
Feature activation+2.38e+00

Pos logit contributions
 proper+0.194
powers+0.179
 He+0.163
."+0.158
attered+0.156
Neg logit contributions
eking-0.388
Mary-0.334
aman-0.334
ma-0.333
agg-0.327
Token monitors
Token ID19374
Feature activation-8.91e+00

Pos logit contributions
eking+0.068
Mary+0.058
agg+0.058
Jud+0.057
Jane+0.056
Neg logit contributions
."-0.043
 proper-0.042
 He-0.041
powers-0.041
ny-0.039
Token be
Token ID307
Feature activation+5.61e+00

Pos logit contributions
 proper+0.123
powers+0.116
 He+0.102
attered+0.100
 toy+0.097
Neg logit contributions
eking-0.310
aman-0.270
Jud-0.269
Mary-0.267
ir-0.267
Tokeneping
Token ID7213
Feature activation-2.57e+00

Pos logit contributions
aman+0.126
eking+0.121
Mary+0.115
Jud+0.113
agg+0.112
Neg logit contributions
.-0.143
 toy-0.135
."-0.131
 like-0.131
 proper-0.128
Token away
Token ID1497
Feature activation-1.51e+00

Pos logit contributions
 proper+0.044
powers+0.043
 Let+0.039
ometer+0.038
."+0.038
Neg logit contributions
eking-0.078
ir-0.064
Mary-0.062
eks-0.061
ics-0.061
Token said
Token ID531
Feature activation+5.80e+00

Pos logit contributions
eking+0.058
Jud+0.050
Mary+0.049
agg+0.049
aman+0.049
Neg logit contributions
 proper-0.062
 He-0.059
powers-0.058
."-0.058
 toy-0.057
Token,
Token ID11
Feature activation+4.86e+00

Pos logit contributions
eking+0.189
ir+0.165
Jud+0.165
agg+0.164
Mary+0.162
Neg logit contributions
 proper-0.092
powers-0.081
attered-0.074
 He-0.074
 likes-0.073
Token holding
Token ID4769
Feature activation+1.37e+00

Pos logit contributions
eking+0.111
aman+0.100
irs+0.100
agg+0.099
ir+0.097
Neg logit contributions
.-0.122
 He-0.122
 toy-0.116
."-0.116
 They-0.113
Token the
Token ID262
Feature activation-7.89e-01

Pos logit contributions
eking+0.039
ir+0.034
Jud+0.034
Mary+0.034
agg+0.034
Neg logit contributions
powers-0.026
 proper-0.025
 He-0.025
."-0.024
 They-0.023
Token magn
Token ID7842
Feature activation+2.29e+00

Pos logit contributions
 He+0.015
 They+0.015
 proper+0.015
powers+0.014
."+0.014
Neg logit contributions
eking-0.022
ir-0.020
aman-0.020
Jud-0.020
ulu-0.019
Tokenifying
Token ID4035
Feature activation-1.21e+01

Pos logit contributions
eking+0.071
ir+0.068
ulu+0.060
aman+0.060
Jud+0.060
Neg logit contributions
 proper-0.043
 He-0.039
 They-0.038
powers-0.037
."-0.037
Token glass
Token ID5405
Feature activation+6.23e-02

Pos logit contributions
 proper+0.185
powers+0.173
 He+0.169
ly+0.165
attered+0.163
Neg logit contributions
eking-0.358
ir-0.347
aman-0.337
Jud-0.336
agg-0.334
Token over
Token ID625
Feature activation+1.15e+00

Pos logit contributions
eking+0.002
ir+0.002
ulu+0.002
 nowhere+0.002
ma+0.002
Neg logit contributions
 proper-0.001
powers-0.001
 But-0.001
cream-0.001
 He-0.001
Token a
Token ID257
Feature activation-2.77e+00

Pos logit contributions
eking+0.032
aman+0.027
agg+0.027
ir+0.027
Jud+0.027
Neg logit contributions
 He-0.024
 proper-0.023
.-0.023
."-0.023
 toy-0.023
Token black
Token ID2042
Feature activation-3.74e+00

Pos logit contributions
attered+0.050
 proper+0.050
powers+0.050
 He+0.050
 They+0.049
Neg logit contributions
eking-0.074
aman-0.068
ir-0.068
Jud-0.067
Mary-0.067
Token ant
Token ID1885
Feature activation+4.90e-01

Pos logit contributions
 proper+0.077
powers+0.070
attered+0.062
 They+0.059
cream+0.058
Neg logit contributions
ir-0.104
eking-0.097
aman-0.094
lf-0.094
 nowhere-0.091
Token a
Token ID257
Feature activation-2.72e+00

Pos logit contributions
 proper+0.026
 He+0.026
."+0.024
 toy+0.024
powers+0.024
Neg logit contributions
eking-0.060
aman-0.054
ir-0.053
agg-0.053
Jud-0.052
Token flashlight
Token ID38371
Feature activation-4.22e+00

Pos logit contributions
 He+0.043
 proper+0.041
."+0.041
powers+0.040
 They+0.038
Neg logit contributions
eking-0.085
aman-0.076
Jud-0.076
agg-0.076
ir-0.075
Token and
Token ID290
Feature activation+3.14e+00

Pos logit contributions
 proper+0.086
powers+0.077
 He+0.068
 Let+0.066
."+0.065
Neg logit contributions
eking-0.123
aman-0.104
ir-0.104
ics-0.103
lf-0.103
Token a
Token ID257
Feature activation-2.67e+00

Pos logit contributions
eking+0.086
Jud+0.078
agg+0.077
Mary+0.077
ulu+0.075
Neg logit contributions
 proper-0.057
 He-0.056
."-0.055
powers-0.054
ny-0.052
Token magn
Token ID7842
Feature activation+2.32e+00

Pos logit contributions
."+0.046
 He+0.045
attered+0.045
powers+0.043
 proper+0.043
Neg logit contributions
eking-0.075
Jud-0.067
aman-0.066
ulu-0.066
Ja-0.066
Tokenifying
Token ID4035
Feature activation-1.21e+01

Pos logit contributions
eking+0.074
ir+0.069
aman+0.064
agg+0.064
ulu+0.064
Neg logit contributions
 proper-0.041
 He-0.037
powers-0.036
 toy-0.035
."-0.035
Token glass
Token ID5405
Feature activation+6.84e-02

Pos logit contributions
 proper+0.171
 He+0.166
powers+0.161
."+0.158
 They+0.153
Neg logit contributions
eking-0.393
ir-0.357
aman-0.355
agg-0.352
Jud-0.349
Token to
Token ID284
Feature activation+2.32e+00

Pos logit contributions
eking+0.002
ulu+0.002
ir+0.002
lf+0.002
ma+0.002
Neg logit contributions
 proper-0.001
powers-0.001
 He-0.001
 But-0.001
cream-0.001
Token examine
Token ID10716
Feature activation+1.51e+00

Pos logit contributions
eking+0.077
ir+0.069
Jud+0.068
agg+0.068
Mary+0.067
Neg logit contributions
 proper-0.033
 He-0.032
powers-0.032
."-0.030
 toy-0.029
Token their
Token ID511
Feature activation+1.53e+00

Pos logit contributions
eking+0.045
ir+0.041
irs+0.040
ma+0.039
ulu+0.039
Neg logit contributions
powers-0.027
 proper-0.024
attered-0.023
 He-0.023
 Let-0.021
Token finds
Token ID7228
Feature activation+1.82e+00

Pos logit contributions
eking+0.041
ma+0.036
ir+0.036
aman+0.036
agg+0.035
Neg logit contributions
 He-0.029
 Let-0.026
powers-0.026
ny-0.026
."-0.026
Token luggage
Token ID31488
Feature activation-2.25e-01

Pos logit contributions
 He+0.031
 They+0.029
 proper+0.027
attered+0.027
powers+0.026
Neg logit contributions
eking-0.045
aman-0.040
ir-0.038
ulu-0.038
ma-0.038
Token from
Token ID422
Feature activation+4.85e+00

Pos logit contributions
 proper+0.004
powers+0.004
 They+0.004
 He+0.004
 likes+0.004
Neg logit contributions
eking-0.007
ics-0.006
ma-0.006
aman-0.005
lf-0.005
Token his
Token ID465
Feature activation+3.77e-01

Pos logit contributions
eking+0.152
agg+0.137
Jud+0.136
aman+0.136
Mary+0.134
Neg logit contributions
 proper-0.082
."-0.081
 He-0.081
.-0.079
 toy-0.079
Token bedroom
Token ID14043
Feature activation-5.11e+00

Pos logit contributions
eking+0.009
aman+0.008
ma+0.008
ir+0.008
agg+0.008
Neg logit contributions
 He-0.008
 Let-0.007
 They-0.007
attered-0.007
ly-0.007
Token.
Token ID13
Feature activation-8.97e+00

Pos logit contributions
 proper+0.077
 He+0.076
powers+0.076
 They+0.074
 should+0.070
Neg logit contributions
eking-0.157
aman-0.141
lf-0.137
ma-0.137
 nowhere-0.132
Token
Token ID220
Feature activation+9.16e+00

Pos logit contributions
 proper+0.137
powers+0.134
."+0.126
 toy+0.125
 He+0.120
Neg logit contributions
eking-0.279
aman-0.261
Mary-0.252
agg-0.251
Jud-0.246
Token\n
Token ID198
Feature activation+1.18e+01

Pos logit contributions
eking+0.342
aman+0.299
ir+0.292
agg+0.289
Jud+0.283
Neg logit contributions
 He-0.157
.-0.137
 They-0.137
."-0.134
 But-0.114
Token
Token ID198
Feature activation+1.17e+01

Pos logit contributions
eking+0.399
ir+0.339
agg+0.334
Jud+0.334
aman+0.332
Neg logit contributions
 He-0.210
.-0.198
."-0.196
 They-0.190
 proper-0.179
TokenJon
Token ID18219
Feature activation+4.22e+00

Pos logit contributions
eking+0.203
aman+0.174
Mary+0.167
agg+0.161
ir+0.159
Neg logit contributions
 proper-0.349
powers-0.341
."-0.333
 He-0.333
 toy-0.328
Tokenny
Token ID3281
Feature activation-4.99e-01

Pos logit contributions
eking+0.157
aman+0.147
Mary+0.146
ma+0.145
andra+0.145
Neg logit contributions
powers-0.039
 proper-0.038
."-0.032
 toy-0.030
 They-0.030
Token's
Token ID338
Feature activation-3.98e-01

Pos logit contributions
 proper+0.008
 He+0.008
."+0.008
powers+0.008
 toy+0.007
Neg logit contributions
eking-0.016
aman-0.014
agg-0.014
Jud-0.014
Mary-0.014
Token cl
Token ID537
Feature activation+4.16e+00

Pos logit contributions
eking+0.061
Mary+0.051
Jud+0.050
agg+0.049
ir+0.049
Neg logit contributions
 proper-0.062
 He-0.059
powers-0.059
."-0.059
 toy-0.057
Tokenapped
Token ID6320
Feature activation+1.40e+00

Pos logit contributions
eking+0.057
ir+0.053
agg+0.049
irling+0.042
J+0.041
Neg logit contributions
 proper-0.142
 bone-0.141
 toy-0.135
powers-0.132
."-0.132
Token his
Token ID465
Feature activation+6.93e-01

Pos logit contributions
eking+0.041
ir+0.036
Mary+0.035
J+0.035
Ma+0.035
Neg logit contributions
powers-0.025
cream-0.023
attered-0.023
 proper-0.023
ometer-0.022
Token hands
Token ID2832
Feature activation+2.03e-01

Pos logit contributions
eking+0.020
ir+0.017
agg+0.017
eks+0.016
Ja+0.016
Neg logit contributions
 He-0.013
."-0.013
 proper-0.012
 They-0.012
 like-0.012
Token.
Token ID13
Feature activation-8.25e+00

Pos logit contributions
eking+0.007
ir+0.007
aman+0.006
Mary+0.006
ics+0.006
Neg logit contributions
 proper-0.002
powers-0.002
 He-0.002
 likes-0.002
 Let-0.002
Token
Token ID220
Feature activation+9.39e+00

Pos logit contributions
 proper+0.121
powers+0.112
 toy+0.112
."+0.111
.+0.103
Neg logit contributions
eking-0.262
Jud-0.240
agg-0.239
aman-0.238
ir-0.238
Token
Token ID198
Feature activation+1.19e+01

Pos logit contributions
eking+0.310
aman+0.268
ir+0.264
agg+0.261
Jud+0.256
Neg logit contributions
 He-0.173
 They-0.163
."-0.160
.-0.159
 proper-0.147
Token
Token ID198
Feature activation+1.19e+01

Pos logit contributions
eking+0.457
aman+0.393
eks+0.383
tha+0.377
ir+0.374
Neg logit contributions
 He-0.196
 They-0.173
."-0.164
.-0.162
 But-0.148
TokenBut
Token ID1537
Feature activation+3.60e+00

Pos logit contributions
eking+0.309
aman+0.254
ir+0.246
eks+0.244
agg+0.240
Neg logit contributions
 He-0.291
 They-0.273
 proper-0.272
."-0.267
powers-0.266
Token then
Token ID788
Feature activation-6.08e-01

Pos logit contributions
eking+0.108
ir+0.094
Jud+0.094
agg+0.094
aman+0.094
Neg logit contributions
 He-0.066
 proper-0.064
 toy-0.063
."-0.062
.-0.062
Token,
Token ID11
Feature activation+4.77e+00

Pos logit contributions
 proper+0.011
powers+0.011
 He+0.011
."+0.011
 toy+0.010
Neg logit contributions
eking-0.018
agg-0.015
Jud-0.015
aman-0.015
ir-0.015
Token frightened
Token ID24776
Feature activation-4.49e+00

Pos logit contributions
 proper+0.040
 He+0.039
."+0.038
powers+0.037
 Let+0.037
Neg logit contributions
eking-0.043
ir-0.036
agg-0.035
ulu-0.034
Jud-0.034
Token and
Token ID290
Feature activation+2.50e+00

Pos logit contributions
 proper+0.073
powers+0.066
ometer+0.059
 name+0.055
attered+0.053
Neg logit contributions
eking-0.151
ir-0.135
ma-0.129
 nowhere-0.129
Mary-0.128
Token alone
Token ID3436
Feature activation-1.15e+00

Pos logit contributions
eking+0.060
eks+0.045
ir+0.045
ulu+0.045
 nowhere+0.043
Neg logit contributions
 proper-0.062
powers-0.061
 He-0.059
 toy-0.058
."-0.057
Token.
Token ID13
Feature activation-8.74e+00

Pos logit contributions
 proper+0.014
powers+0.012
 He+0.010
cream+0.010
attered+0.010
Neg logit contributions
eking-0.043
ir-0.038
 nowhere-0.037
ma-0.036
ulu-0.036
Token<|endoftext|>
Token ID50256
Feature activation+9.87e+00

Pos logit contributions
 He+0.136
 They+0.131
."+0.123
 proper+0.122
.+0.119
Neg logit contributions
eking-0.305
aman-0.267
ir-0.262
agg-0.260
Jud-0.255
TokenL
Token ID43
Feature activation+1.00e+01

Pos logit contributions
eking+0.227
aman+0.188
ir+0.184
agg+0.183
Jud+0.182
Neg logit contributions
 He-0.252
 proper-0.246
."-0.245
.-0.241
 They-0.240
Tokenily
Token ID813
Feature activation+1.11e+00

Pos logit contributions
eking+0.193
aman+0.157
Bet+0.156
Jud+0.152
Ja+0.152
Neg logit contributions
.-0.313
."-0.305
 toy-0.296
 He-0.289
 They-0.287
Token and
Token ID290
Feature activation+2.53e+00

Pos logit contributions
eking+0.032
aman+0.028
agg+0.028
Jud+0.028
Mary+0.028
Neg logit contributions
 proper-0.020
 He-0.019
powers-0.019
."-0.019
 toy-0.018
Token Tom
Token ID4186
Feature activation+5.18e+00

Pos logit contributions
eking+0.072
ir+0.060
aman+0.060
agg+0.059
Mary+0.058
Neg logit contributions
 He-0.054
 They-0.052
 proper-0.051
."-0.051
.-0.051
Token were
Token ID547
Feature activation+6.24e+00

Pos logit contributions
eking+0.126
Jud+0.109
aman+0.108
agg+0.108
Mary+0.108
Neg logit contributions
 proper-0.117
powers-0.116
 He-0.110
."-0.109
 toy-0.107
Token twins
Token ID20345
Feature activation-3.67e+00

Pos logit contributions
eking+0.194
Jud+0.187
aman+0.185
Mary+0.184
ir+0.178
Neg logit contributions
 toy-0.108
.-0.108
 proper-0.102
 They-0.100
 favorite-0.100
Token back
Token ID736
Feature activation-8.51e-01

Pos logit contributions
 proper+0.003
 He+0.003
powers+0.003
 They+0.003
 toy+0.003
Neg logit contributions
eking-0.004
ir-0.004
ulu-0.004
irs-0.003
Jud-0.003
Token safely
Token ID11512
Feature activation-1.78e-01

Pos logit contributions
 proper+0.013
powers+0.013
 He+0.012
 player+0.012
attered+0.012
Neg logit contributions
eking-0.027
ir-0.024
aman-0.024
ulu-0.023
Mary-0.023
Token.
Token ID13
Feature activation-8.20e+00

Pos logit contributions
 proper+0.002
powers+0.002
 He+0.002
 They+0.002
attered+0.002
Neg logit contributions
eking-0.006
ir-0.006
aman-0.005
ulu-0.005
ics-0.005
Tokenâ
Token ID22940
Feature activation+4.81e+00

Pos logit contributions
 proper+0.134
 He+0.130
."+0.126
 toy+0.125
powers+0.125
Neg logit contributions
eking-0.257
aman-0.227
ir-0.224
Jud-0.224
agg-0.224
Token€
Token ID26391
Feature activation+4.86e+00

Pos logit contributions
eking+0.110
aman+0.091
ir+0.090
Mary+0.089
Jud+0.089
Neg logit contributions
 proper-0.122
 He-0.119
powers-0.118
 toy-0.118
."-0.117
Token
Token ID198
Feature activation+1.25e+01

Pos logit contributions
eking+0.142
ir+0.125
agg+0.123
aman+0.123
ulu+0.122
Neg logit contributions
 He-0.097
."-0.094
.-0.092
 They-0.092
 proper-0.089
Token
Token ID198
Feature activation+1.24e+01

Pos logit contributions
eking+0.476
aman+0.418
tha+0.399
ir+0.398
andra+0.398
Neg logit contributions
 He-0.197
.-0.171
 They-0.169
."-0.164
 It-0.147
TokenMax
Token ID11518
Feature activation+4.01e+00

Pos logit contributions
eking+0.357
ir+0.302
agg+0.301
aman+0.299
Jud+0.286
Neg logit contributions
 He-0.262
 proper-0.245
 They-0.238
."-0.238
powers-0.237
Token thought
Token ID1807
Feature activation+1.22e-01

Pos logit contributions
eking+0.107
agg+0.092
Mary+0.091
aman+0.091
Jud+0.090
Neg logit contributions
 proper-0.086
powers-0.080
 He-0.076
 toy-0.076
attered-0.075
Token about
Token ID546
Feature activation+1.62e+00

Pos logit contributions
eking+0.004
aman+0.003
agg+0.003
ir+0.003
Mary+0.003
Neg logit contributions
 proper-0.002
 He-0.002
powers-0.002
."-0.002
 They-0.002
Token his
Token ID465
Feature activation+1.04e+00

Pos logit contributions
eking+0.047
aman+0.040
ir+0.040
Jud+0.040
agg+0.040
Neg logit contributions
 He-0.031
 proper-0.031
."-0.030
.-0.030
 toy-0.030
Token was
Token ID373
Feature activation+4.80e+00

Pos logit contributions
 proper+0.070
 He+0.070
."+0.066
powers+0.066
 toy+0.065
Neg logit contributions
eking-0.119
aman-0.103
ir-0.103
agg-0.102
Jud-0.102
Token so
Token ID523
Feature activation-1.05e+00

Pos logit contributions
eking+0.135
aman+0.119
ir+0.118
Jud+0.117
agg+0.117
Neg logit contributions
."-0.095
 proper-0.094
.-0.093
 He-0.092
 toy-0.091
Token smart
Token ID4451
Feature activation-5.64e+00

Pos logit contributions
 He+0.024
 proper+0.023
 bone+0.023
powers+0.023
 They+0.022
Neg logit contributions
eking-0.026
agg-0.023
ulu-0.022
Mary-0.021
ir-0.021
Token.
Token ID13
Feature activation-8.02e+00

Pos logit contributions
 proper+0.096
powers+0.090
attered+0.087
cream+0.082
 texture+0.082
Neg logit contributions
ma-0.154
eking-0.150
aman-0.141
Mary-0.138
 nowhere-0.135
Token<|endoftext|>
Token ID50256
Feature activation+1.07e+01

Pos logit contributions
 proper+0.162
powers+0.157
 He+0.155
 toy+0.154
."+0.152
Neg logit contributions
eking-0.212
ir-0.188
aman-0.188
Mary-0.187
agg-0.187
TokenOnce
Token ID7454
Feature activation+1.04e+01

Pos logit contributions
eking+0.298
aman+0.251
ir+0.248
agg+0.247
Jud+0.246
Neg logit contributions
 He-0.231
."-0.222
.-0.220
 proper-0.218
 They-0.217
Token upon
Token ID2402
Feature activation+6.94e+00

Pos logit contributions
eking+0.241
icians+0.190
Jud+0.184
Bet+0.179
aman+0.178
Neg logit contributions
.-0.318
 He-0.302
."-0.297
 toy-0.288
 They-0.287
Token a
Token ID257
Feature activation-2.86e+00

Pos logit contributions
eking+0.212
Mary+0.189
agg+0.186
Jud+0.185
ir+0.184
Neg logit contributions
 proper-0.115
 He-0.108
powers-0.108
 toy-0.103
."-0.103
Token time
Token ID640
Feature activation+3.18e+00

Pos logit contributions
.+0.061
 proper+0.059
."+0.057
 like+0.056
 toy+0.055
Neg logit contributions
eking-0.085
irs-0.072
icians-0.070
eks-0.069
ir-0.069
Token,
Token ID11
Feature activation+4.45e+00

Pos logit contributions
eking+0.155
andra+0.140
ppa+0.140
eling+0.140
icians+0.138
Neg logit contributions
.-0.024
 He-0.018
 They-0.013
."-0.012
 like-0.010
Token there
Token ID612
Feature activation-1.97e+00

Pos logit contributions
eking+0.112
aman+0.097
Jud+0.092
ir+0.091
agg+0.091
Neg logit contributions
 He-0.109
.-0.106
 toy-0.104
 They-0.104
."-0.104
TokenAmy
Token ID40797
Feature activation+3.09e+00

Pos logit contributions
eking+0.325
eks+0.260
ir+0.252
agg+0.247
aman+0.247
Neg logit contributions
 He-0.300
."-0.281
 They-0.281
.-0.274
 proper-0.269
Token smiled
Token ID13541
Feature activation+4.12e+00

Pos logit contributions
eking+0.089
aman+0.078
agg+0.077
ir+0.077
Mary+0.077
Neg logit contributions
 proper-0.058
 He-0.055
powers-0.055
."-0.053
 toy-0.053
Token and
Token ID290
Feature activation+3.41e+00

Pos logit contributions
eking+0.113
agg+0.104
aman+0.102
Ma+0.101
ir+0.100
Neg logit contributions
 proper-0.081
powers-0.077
cream-0.068
attered-0.067
 bone-0.066
Token waved
Token ID26834
Feature activation+1.56e+00

Pos logit contributions
eking+0.084
aman+0.071
agg+0.071
Jud+0.070
Mary+0.070
Neg logit contributions
 proper-0.079
 He-0.077
powers-0.076
."-0.076
 toy-0.074
Token back
Token ID736
Feature activation-8.16e-01

Pos logit contributions
eking+0.041
ir+0.038
agg+0.038
aman+0.037
Mary+0.037
Neg logit contributions
 proper-0.030
powers-0.029
 bone-0.028
attered-0.028
cream-0.027
Token at
Token ID379
Feature activation+2.90e+00

Pos logit contributions
 proper+0.007
powers+0.005
 He+0.005
attered+0.005
 toy+0.004
Neg logit contributions
eking-0.033
ir-0.030
aman-0.030
Mary-0.029
agg-0.029
Token the
Token ID262
Feature activation-6.43e-01

Pos logit contributions
eking+0.089
Jud+0.078
agg+0.076
Mary+0.074
aman+0.074
Neg logit contributions
.-0.055
."-0.052
 toy-0.052
 He-0.051
 proper-0.051
Token elephant
Token ID20950
Feature activation+3.10e+00

Pos logit contributions
 proper+0.011
powers+0.010
 He+0.010
 They+0.010
."+0.010
Neg logit contributions
eking-0.020
aman-0.017
ir-0.017
Mary-0.017
Jud-0.017
Token.
Token ID13
Feature activation-7.89e+00

Pos logit contributions
eking+0.116
ir+0.104
aman+0.103
ma+0.101
Mary+0.100
Neg logit contributions
 proper-0.035
powers-0.027
attered-0.025
 He-0.025
 They-0.021
Token She
Token ID1375
Feature activation-3.11e+00

Pos logit contributions
 proper+0.114
 toy+0.106
powers+0.104
."+0.099
attered+0.094
Neg logit contributions
eking-0.251
aman-0.238
Mary-0.235
Jud-0.232
agg-0.232
Token was
Token ID373
Feature activation+5.41e+00

Pos logit contributions
 He+0.063
.+0.061
 proper+0.060
."+0.060
 toy+0.060
Neg logit contributions
eking-0.088
ir-0.076
aman-0.075
Jud-0.074
irs-0.074
Token and
Token ID290
Feature activation+2.50e+00

Pos logit contributions
eking+0.192
aman+0.170
ir+0.169
Jud+0.168
Mary+0.168
Neg logit contributions
 proper-0.092
powers-0.085
 He-0.085
 toy-0.078
 They-0.077
Token Tom
Token ID4186
Feature activation+4.64e+00

Pos logit contributions
eking+0.071
Jud+0.063
agg+0.062
ir+0.062
Mary+0.060
Neg logit contributions
 He-0.051
."-0.050
.-0.049
 toy-0.048
 proper-0.048
Token,
Token ID11
Feature activation+4.12e+00

Pos logit contributions
eking+0.150
ma+0.131
aman+0.129
Jud+0.128
ir+0.128
Neg logit contributions
 proper-0.079
powers-0.076
 He-0.067
attered-0.066
 toy-0.063
Token what
Token ID644
Feature activation-7.06e+00

Pos logit contributions
eking+0.118
ir+0.104
agg+0.103
aman+0.103
Jud+0.102
Neg logit contributions
 He-0.080
 proper-0.079
."-0.079
 toy-0.077
.-0.077
Token have
Token ID423
Feature activation-1.56e+00

Pos logit contributions
 He+0.119
."+0.118
.+0.116
 proper+0.113
 toy+0.113
Neg logit contributions
eking-0.227
aman-0.205
ir-0.196
agg-0.194
Mary-0.191
Token you
Token ID345
Feature activation+2.65e-01

Pos logit contributions
."+0.032
 He+0.032
.+0.032
 proper+0.032
 toy+0.031
Neg logit contributions
eking-0.043
aman-0.037
ir-0.037
Jud-0.037
agg-0.036
Token done
Token ID1760
Feature activation+1.73e+00

Pos logit contributions
eking+0.005
aman+0.005
agg+0.005
Mary+0.005
Jud+0.005
Neg logit contributions
."-0.007
.-0.007
 He-0.007
 toy-0.007
 proper-0.007
Token?
Token ID30
Feature activation-6.21e+00

Pos logit contributions
eking+0.048
aman+0.041
ir+0.041
Mary+0.040
agg+0.040
Neg logit contributions
 proper-0.036
powers-0.035
 He-0.034
 toy-0.032
 They-0.031
Token You
Token ID921
Feature activation-7.37e-01

Pos logit contributions
 proper+0.120
powers+0.110
 textures+0.107
ly+0.107
 toy+0.104
Neg logit contributions
eking-0.166
ma-0.147
Jud-0.146
izz-0.146
ir-0.145
Token broke
Token ID6265
Feature activation-1.49e+00

Pos logit contributions
 proper+0.013
 He+0.013
powers+0.013
 toy+0.012
."+0.012
Neg logit contributions
eking-0.022
ir-0.019
Jud-0.019
ulu-0.018
eks-0.018
Token the
Token ID262
Feature activation-1.25e+00

Pos logit contributions
 proper+0.031
 He+0.030
powers+0.029
."+0.029
 They+0.028
Neg logit contributions
eking-0.041
ir-0.035
aman-0.035
Mary-0.034
Jud-0.034
Token to
Token ID284
Feature activation+2.25e+00

Pos logit contributions
eking+0.186
aman+0.167
Jud+0.163
Mary+0.161
ir+0.160
Neg logit contributions
 He-0.090
."-0.089
 proper-0.087
.-0.086
 toy-0.083
Token Lily
Token ID20037
Feature activation+2.48e+00

Pos logit contributions
eking+0.081
aman+0.073
agg+0.073
Jud+0.072
Mary+0.071
Neg logit contributions
."-0.032
.-0.031
 They-0.029
 proper-0.029
 He-0.029
Token,
Token ID11
Feature activation+4.86e+00

Pos logit contributions
eking+0.071
ma+0.065
ir+0.062
ulu+0.061
aman+0.061
Neg logit contributions
powers-0.045
 proper-0.044
attered-0.043
 He-0.039
 toy-0.036
Token "
Token ID366
Feature activation+2.82e+00

Pos logit contributions
eking+0.189
agg+0.173
aman+0.171
Jud+0.170
irs+0.168
Neg logit contributions
 He-0.059
.-0.057
."-0.055
 They-0.052
 But-0.048
TokenYou
Token ID1639
Feature activation-6.02e-01

Pos logit contributions
eking+0.092
agg+0.080
aman+0.079
ir+0.077
irs+0.077
Neg logit contributions
 He-0.048
."-0.047
 They-0.045
 Let-0.045
 proper-0.045
Token are
Token ID389
Feature activation+5.70e+00

Pos logit contributions
 proper+0.010
 He+0.010
powers+0.010
."+0.009
 toy+0.009
Neg logit contributions
eking-0.019
ir-0.016
agg-0.016
Jud-0.016
aman-0.016
Token very
Token ID845
Feature activation-1.32e+00

Pos logit contributions
eking+0.189
aman+0.174
ir+0.169
agg+0.168
Mary+0.166
Neg logit contributions
 proper-0.086
."-0.085
 He-0.085
.-0.082
 toy-0.082
Token kind
Token ID1611
Feature activation-2.46e+00

Pos logit contributions
 proper+0.020
 He+0.020
."+0.019
powers+0.019
 toy+0.019
Neg logit contributions
eking-0.043
aman-0.038
ir-0.038
agg-0.038
Jud-0.038
Token and
Token ID290
Feature activation+2.74e+00

Pos logit contributions
 proper+0.058
attered+0.049
cream+0.048
 wagon+0.045
powers+0.044
Neg logit contributions
ma-0.055
ir-0.052
eking-0.050
Mary-0.050
Jane-0.049
Token smart
Token ID4451
Feature activation-5.73e+00

Pos logit contributions
eking+0.074
Jud+0.061
Ja+0.061
ulu+0.060
agg+0.060
Neg logit contributions
 He-0.055
powers-0.053
attered-0.053
 proper-0.052
ny-0.051
Token.
Token ID13
Feature activation-9.20e+00

Pos logit contributions
 proper+0.109
cream+0.097
attered+0.096
powers+0.083
controlled+0.080
Neg logit contributions
ma-0.149
eking-0.146
ir-0.139
ulu-0.132
 nowhere-0.128
Token.
Token ID13
Feature activation-8.49e+00

Pos logit contributions
powers+0.037
 proper+0.034
 Let+0.030
 name+0.029
attered+0.028
Neg logit contributions
eking-0.064
 nowhere-0.054
ir-0.053
ering-0.052
J-0.050
Token
Token ID198
Feature activation+1.21e+01

Pos logit contributions
 He+0.136
 proper+0.125
 They+0.123
."+0.122
.+0.118
Neg logit contributions
eking-0.289
ir-0.254
aman-0.251
agg-0.251
Jud-0.248
Token
Token ID198
Feature activation+1.20e+01

Pos logit contributions
eking+0.486
aman+0.431
agg+0.407
ir+0.405
tha+0.402
Neg logit contributions
 He-0.180
.-0.159
 They-0.152
."-0.150
 But-0.135
TokenThe
Token ID464
Feature activation+1.06e-02

Pos logit contributions
eking+0.300
aman+0.255
agg+0.250
ir+0.248
Jud+0.241
Neg logit contributions
 He-0.295
 proper-0.280
."-0.275
 They-0.274
 toy-0.272
Token cop
Token ID2243
Feature activation-1.85e+00

Pos logit contributions
eking+0.000
Mary+0.000
aman+0.000
Jud+0.000
Ja+0.000
Neg logit contributions
 proper-0.000
powers-0.000
."-0.000
 He-0.000
attered-0.000
Token saw
Token ID2497
Feature activation+2.10e+00

Pos logit contributions
 He+0.037
 proper+0.037
 toy+0.036
.+0.035
."+0.035
Neg logit contributions
eking-0.052
ir-0.045
aman-0.045
agg-0.044
Jud-0.044
Token that
Token ID326
Feature activation-6.01e+00

Pos logit contributions
eking+0.079
agg+0.072
Jud+0.070
Mary+0.069
ir+0.069
Neg logit contributions
 He-0.026
 toy-0.024
.-0.024
."-0.023
 They-0.023
Token Tim
Token ID5045
Feature activation+5.69e+00

Pos logit contributions
 He+0.095
 toy+0.089
.+0.088
 proper+0.087
."+0.087
Neg logit contributions
eking-0.206
agg-0.182
ir-0.181
aman-0.179
Jud-0.177
Tokenmy
Token ID1820
Feature activation+1.14e+00

Pos logit contributions
eking+0.129
aman+0.104
ir+0.104
Jud+0.102
Mary+0.102
Neg logit contributions
 proper-0.146
 He-0.143
powers-0.142
."-0.140
 They-0.138
Token was
Token ID373
Feature activation+4.28e+00

Pos logit contributions
eking+0.034
aman+0.029
ir+0.029
Mary+0.028
Jud+0.028
Neg logit contributions
 proper-0.021
powers-0.020
 He-0.020
."-0.019
 They-0.019
Token upset
Token ID9247
Feature activation-5.17e+00

Pos logit contributions
eking+0.128
aman+0.124
agg+0.120
Mary+0.119
Jud+0.118
Neg logit contributions
.-0.078
 He-0.078
 toy-0.077
."-0.074
 like-0.072
Token
Token ID198
Feature activation+1.21e+01

Pos logit contributions
eking+0.015
agg+0.013
aman+0.013
ir+0.013
irs+0.012
Neg logit contributions
 He-0.009
 They-0.008
.-0.008
."-0.008
 proper-0.008
Token
Token ID198
Feature activation+1.21e+01

Pos logit contributions
eking+0.453
aman+0.382
agg+0.373
ir+0.370
eling+0.368
Neg logit contributions
 He-0.202
."-0.189
.-0.187
 They-0.180
 toy-0.162
TokenThe
Token ID464
Feature activation+3.31e-01

Pos logit contributions
eking+0.244
Mary+0.218
Jud+0.212
ir+0.206
Ja+0.206
Neg logit contributions
 proper-0.323
powers-0.308
."-0.302
 He-0.302
 toy-0.301
Token new
Token ID649
Feature activation-3.39e+00

Pos logit contributions
eking+0.011
Jud+0.009
agg+0.009
Mary+0.009
aman+0.009
Neg logit contributions
 proper-0.005
powers-0.005
."-0.005
 He-0.005
 They-0.004
Token boy
Token ID2933
Feature activation+3.37e+00

Pos logit contributions
 proper+0.048
powers+0.045
attered+0.038
."+0.038
fy+0.036
Neg logit contributions
eking-0.104
Jud-0.101
J-0.099
ma-0.099
Jane-0.098
Token smiled
Token ID13541
Feature activation+3.35e+00

Pos logit contributions
eking+0.090
ir+0.077
aman+0.077
agg+0.076
Jud+0.076
Neg logit contributions
 He-0.074
.-0.071
 proper-0.070
 toy-0.070
."-0.069
Token and
Token ID290
Feature activation+2.77e+00

Pos logit contributions
eking+0.085
 nowhere+0.076
ir+0.074
agg+0.074
J+0.073
Neg logit contributions
 proper-0.073
powers-0.071
cream-0.060
 needs-0.058
 Let-0.058
Token said
Token ID531
Feature activation+5.63e+00

Pos logit contributions
eking+0.069
aman+0.059
agg+0.058
Jud+0.058
ir+0.058
Neg logit contributions
 proper-0.063
 He-0.062
."-0.061
powers-0.060
 toy-0.059
Token,
Token ID11
Feature activation+4.44e+00

Pos logit contributions
eking+0.150
agg+0.132
ir+0.130
Mary+0.128
aman+0.127
Neg logit contributions
 proper-0.116
powers-0.109
attered-0.105
 He-0.103
 toy-0.101
Token "
Token ID366
Feature activation+2.52e+00

Pos logit contributions
eking+0.172
agg+0.157
ir+0.156
aman+0.156
irs+0.156
Neg logit contributions
 He-0.054
.-0.049
."-0.048
 They-0.046
 But-0.042
TokenMy
Token ID3666
Feature activation+7.55e-01

Pos logit contributions
eking+0.059
Mary+0.054
Jud+0.053
aman+0.052
Ma+0.051
Neg logit contributions
 proper-0.061
."-0.056
powers-0.056
 toy-0.055
.-0.054
Token cutting
Token ID7720
Feature activation-2.31e+00

Pos logit contributions
agg+0.005
eking+0.005
ulu+0.005
Jud+0.005
aman+0.005
Neg logit contributions
 toy-0.005
.-0.005
 He-0.005
."-0.004
 proper-0.004
Token the
Token ID262
Feature activation-1.50e+00

Pos logit contributions
 proper+0.040
 He+0.038
powers+0.038
."+0.037
 toy+0.036
Neg logit contributions
eking-0.071
ir-0.062
aman-0.061
agg-0.060
Mary-0.060
Token corn
Token ID11676
Feature activation-8.70e-01

Pos logit contributions
powers+0.030
 He+0.030
 proper+0.029
 They+0.029
attered+0.028
Neg logit contributions
eking-0.041
aman-0.035
eling-0.033
fters-0.032
eks-0.032
Token and
Token ID290
Feature activation+2.25e+00

Pos logit contributions
 proper+0.014
powers+0.014
 He+0.012
attered+0.011
 Let+0.011
Neg logit contributions
eking-0.028
aman-0.024
ir-0.024
ma-0.023
ics-0.023
Token filling
Token ID12591
Feature activation-1.55e+00

Pos logit contributions
eking+0.055
agg+0.054
ir+0.053
irs+0.051
ulu+0.050
Neg logit contributions
.-0.053
 He-0.052
 toy-0.051
 like-0.049
."-0.049
Token the
Token ID262
Feature activation-1.47e+00

Pos logit contributions
 proper+0.032
powers+0.032
 He+0.030
."+0.029
 They+0.029
Neg logit contributions
eking-0.042
ir-0.036
Mary-0.035
aman-0.035
agg-0.035
Token truck
Token ID7779
Feature activation+1.10e+00

Pos logit contributions
powers+0.033
 He+0.029
 proper+0.029
 floppy+0.028
 They+0.028
Neg logit contributions
eking-0.035
aman-0.031
ma-0.029
eks-0.028
agg-0.027
Token.
Token ID13
Feature activation-8.84e+00

Pos logit contributions
eking+0.035
ma+0.029
ics+0.028
ir+0.028
aman+0.027
Neg logit contributions
 proper-0.019
powers-0.018
 Can-0.016
 He-0.016
 floppy-0.016
Token
Token ID198
Feature activation+1.16e+01

Pos logit contributions
 He+0.113
 proper+0.100
 They+0.099
."+0.098
.+0.094
Neg logit contributions
eking-0.330
ir-0.293
aman-0.292
agg-0.290
Jud-0.287
Token
Token ID198
Feature activation+1.15e+01

Pos logit contributions
eking+0.464
aman+0.407
ir+0.391
agg+0.384
tha+0.379
Neg logit contributions
 He-0.180
 They-0.159
.-0.158
."-0.152
 But-0.135
TokenTom
Token ID13787
Feature activation+5.73e+00

Pos logit contributions
eking+0.306
ir+0.262
aman+0.260
agg+0.260
irs+0.247
Neg logit contributions
 He-0.283
 They-0.258
."-0.257
.-0.256
 proper-0.248
Token I
Token ID314
Feature activation-4.71e+00

Pos logit contributions
 proper+0.165
powers+0.153
ly+0.145
 toy+0.145
attered+0.145
Neg logit contributions
Mary-0.243
Ma-0.243
ulu-0.243
Ja-0.241
eking-0.240
Token wanted
Token ID2227
Feature activation-3.76e+00

Pos logit contributions
 proper+0.077
 He+0.074
powers+0.072
 toy+0.070
 bone+0.066
Neg logit contributions
eking-0.140
ir-0.131
ulu-0.129
Jud-0.125
agg-0.124
Token to
Token ID284
Feature activation+1.33e+00

Pos logit contributions
 proper+0.043
 He+0.043
."+0.041
 toy+0.040
powers+0.039
Neg logit contributions
eking-0.137
aman-0.122
ir-0.121
agg-0.121
Jud-0.121
Token have
Token ID423
Feature activation-1.56e+00

Pos logit contributions
eking+0.042
ir+0.038
Jud+0.037
agg+0.036
aman+0.036
Neg logit contributions
 proper-0.021
 He-0.021
powers-0.020
 toy-0.019
attered-0.019
Token some
Token ID617
Feature activation-2.10e+00

Pos logit contributions
."+0.024
 proper+0.024
 He+0.023
.+0.023
 toy+0.023
Neg logit contributions
eking-0.052
aman-0.046
agg-0.045
ir-0.045
Jud-0.045
Token too
Token ID1165
Feature activation-3.51e+00

Pos logit contributions
 He+0.029
 proper+0.028
powers+0.026
 They+0.026
 toy+0.025
Neg logit contributions
eking-0.072
ir-0.064
agg-0.063
aman-0.063
ulu-0.062
Token.
Token ID13
Feature activation-8.78e+00

Pos logit contributions
 proper+0.038
attered+0.034
 He+0.030
 player+0.030
powers+0.029
Neg logit contributions
eking-0.120
ir-0.116
Mary-0.111
Jane-0.109
agg-0.109
Token But
Token ID887
Feature activation+3.01e+00

Pos logit contributions
 proper+0.158
attered+0.148
powers+0.148
ly+0.144
 toy+0.144
Neg logit contributions
Mary-0.224
Ma-0.221
Ja-0.219
Jim-0.214
ulu-0.213
Token I
Token ID314
Feature activation-4.26e+00

Pos logit contributions
eking+0.109
Jud+0.099
ir+0.098
aman+0.098
agg+0.098
Neg logit contributions
 He-0.037
."-0.037
.-0.034
 proper-0.034
 toy-0.034
Token did
Token ID750
Feature activation+8.36e-01

Pos logit contributions
 proper+0.071
 He+0.069
powers+0.067
 toy+0.066
 They+0.062
Neg logit contributions
eking-0.126
ir-0.115
ulu-0.114
Jud-0.113
aman-0.112
Token not
Token ID407
Feature activation+2.50e+00

Pos logit contributions
eking+0.014
ir+0.011
Mary+0.010
Jud+0.010
aman+0.010
Neg logit contributions
 proper-0.026
 He-0.026
powers-0.025
attered-0.025
 They-0.024
Token day
Token ID1110
Feature activation-2.77e-01

Pos logit contributions
 He+0.106
 proper+0.105
powers+0.100
."+0.100
 They+0.097
Neg logit contributions
eking-0.198
aman-0.173
ir-0.172
Jud-0.171
agg-0.171
Token.
Token ID13
Feature activation-8.90e+00

Pos logit contributions
 proper+0.002
 He+0.002
powers+0.002
 toy+0.002
."+0.001
Neg logit contributions
eking-0.011
ir-0.010
aman-0.010
Mary-0.010
agg-0.010
Token She
Token ID1375
Feature activation-4.43e+00

Pos logit contributions
 He+0.134
 They+0.126
 proper+0.126
."+0.121
.+0.118
Neg logit contributions
eking-0.313
ir-0.272
agg-0.268
ulu-0.266
aman-0.265
Token promised
Token ID8072
Feature activation+6.70e-02

Pos logit contributions
 He+0.071
.+0.071
 proper+0.071
 toy+0.070
."+0.070
Neg logit contributions
eking-0.144
ir-0.125
aman-0.124
irs-0.123
Jud-0.121
Token to
Token ID284
Feature activation+1.14e+00

Pos logit contributions
eking+0.002
ir+0.002
ulu+0.002
agg+0.002
aman+0.002
Neg logit contributions
 He-0.001
.-0.001
."-0.001
 proper-0.001
 They-0.001
Token wear
Token ID5806
Feature activation-2.56e+00

Pos logit contributions
eking+0.033
agg+0.028
aman+0.028
Jud+0.028
ir+0.028
Neg logit contributions
 He-0.022
."-0.021
.-0.021
 proper-0.021
 toy-0.020
Token it
Token ID340
Feature activation-2.29e+00

Pos logit contributions
 proper+0.048
 He+0.048
."+0.046
 toy+0.046
powers+0.045
Neg logit contributions
eking-0.075
aman-0.065
ir-0.064
agg-0.064
Jud-0.064
Token every
Token ID790
Feature activation-4.81e+00

Pos logit contributions
 proper+0.036
 He+0.035
."+0.034
.+0.033
 toy+0.033
Neg logit contributions
eking-0.074
aman-0.066
ir-0.065
Jud-0.065
agg-0.065
Token time
Token ID640
Feature activation+2.24e+00

Pos logit contributions
 He+0.078
 proper+0.078
."+0.074
powers+0.073
 toy+0.072
Neg logit contributions
eking-0.152
aman-0.133
ir-0.133
agg-0.132
Jud-0.132
Token she
Token ID673
Feature activation-4.87e+00

Pos logit contributions
eking+0.061
aman+0.052
ir+0.052
agg+0.052
Mary+0.052
Neg logit contributions
 proper-0.045
 He-0.045
powers-0.043
."-0.043
 toy-0.043
Token felt
Token ID2936
Feature activation-7.46e-01

Pos logit contributions
 proper+0.087
 He+0.087
."+0.083
powers+0.082
 toy+0.081
Neg logit contributions
eking-0.146
aman-0.127
ir-0.126
agg-0.126
Jud-0.125
Token that
Token ID326
Feature activation-6.74e+00

Pos logit contributions
 proper+0.070
 He+0.070
."+0.066
powers+0.066
 toy+0.065
Neg logit contributions
eking-0.144
aman-0.126
ir-0.125
agg-0.125
Jud-0.124
Token.
Token ID13
Feature activation-9.54e+00

Pos logit contributions
powers+0.100
 proper+0.089
attered+0.088
cream+0.080
 floppy+0.071
Neg logit contributions
eking-0.213
ma-0.198
ir-0.194
aman-0.193
lf-0.188
Token They
Token ID1119
Feature activation-5.50e+00

Pos logit contributions
 proper+0.059
powers+0.051
 toy+0.048
ly+0.047
."+0.040
Neg logit contributions
eking-0.369
Mary-0.358
Jud-0.357
aman-0.350
Jim-0.348
Token had
Token ID550
Feature activation-3.46e+00

Pos logit contributions
powers+0.133
 proper+0.132
 He+0.128
."+0.128
 toy+0.122
Neg logit contributions
eking-0.125
Jud-0.104
ir-0.104
Mary-0.103
aman-0.102
Token to
Token ID284
Feature activation+7.29e-01

Pos logit contributions
 He+0.054
 proper+0.051
powers+0.050
."+0.050
 toy+0.047
Neg logit contributions
eking-0.108
Jud-0.094
aman-0.094
ir-0.094
Mary-0.093
Token pick
Token ID2298
Feature activation-1.66e+00

Pos logit contributions
eking+0.023
ir+0.020
aman+0.020
Jud+0.019
Mary+0.019
Neg logit contributions
powers-0.013
 He-0.012
 proper-0.012
attered-0.011
 toy-0.011
Token up
Token ID510
Feature activation-2.95e+00

Pos logit contributions
powers+0.032
 He+0.031
 proper+0.030
."+0.027
 Let+0.027
Neg logit contributions
eking-0.044
ma-0.039
ir-0.039
eks-0.039
aman-0.038
Token all
Token ID477
Feature activation-5.39e+00

Pos logit contributions
 proper+0.059
 He+0.058
powers+0.056
."+0.056
 toy+0.054
Neg logit contributions
eking-0.082
ir-0.070
aman-0.070
agg-0.070
Mary-0.070
Token the
Token ID262
Feature activation-2.39e+00

Pos logit contributions
powers+0.136
 He+0.125
 proper+0.124
 bone+0.119
ometer+0.118
Neg logit contributions
eking-0.116
ma-0.113
ulu-0.107
ir-0.105
Mary-0.105
Token blocks
Token ID7021
Feature activation-2.18e+00

Pos logit contributions
 proper+0.041
 He+0.041
 toy+0.040
."+0.040
.+0.039
Neg logit contributions
eking-0.074
aman-0.067
agg-0.065
Jud-0.065
ir-0.065
Token and
Token ID290
Feature activation+1.38e+00

Pos logit contributions
 proper+0.028
 He+0.026
powers+0.025
."+0.024
 toy+0.023
Neg logit contributions
eking-0.077
ir-0.069
aman-0.068
Jud-0.067
Mary-0.067
Token it
Token ID340
Feature activation-1.56e+00

Pos logit contributions
eking+0.139
Mary+0.122
Jud+0.122
agg+0.122
aman+0.121
Neg logit contributions
 proper-0.084
powers-0.080
 He-0.080
attered-0.076
 toy-0.076
Token was
Token ID373
Feature activation+4.56e+00

Pos logit contributions
 proper+0.032
powers+0.032
 He+0.031
attered+0.031
 They+0.030
Neg logit contributions
eking-0.041
ma-0.034
agg-0.034
Jud-0.034
Mary-0.033
Token,"
Token ID553
Feature activation+2.27e+00

Pos logit contributions
eking+0.139
agg+0.119
ma+0.117
Jud+0.116
ulu+0.116
Neg logit contributions
powers-0.074
 He-0.073
attered-0.072
 They-0.068
 proper-0.067
Token agreed
Token ID4987
Feature activation-7.10e-01

Pos logit contributions
eking+0.055
Jud+0.047
agg+0.047
ir+0.047
aman+0.047
Neg logit contributions
 proper-0.053
 He-0.051
powers-0.051
."-0.050
 toy-0.050
Token Daisy
Token ID40355
Feature activation+2.19e+00

Pos logit contributions
 proper+0.009
powers+0.008
attered+0.007
 likes+0.007
 toy+0.006
Neg logit contributions
eking-0.025
Jud-0.024
Jane-0.023
agg-0.023
Mary-0.023
Token.
Token ID13
Feature activation-8.54e+00

Pos logit contributions
eking+0.066
ma+0.065
Jud+0.060
ics+0.060
Mary+0.060
Neg logit contributions
 proper-0.033
attered-0.029
powers-0.026
 Can-0.026
 thing-0.025
Token And
Token ID843
Feature activation+5.07e+00

Pos logit contributions
 proper+0.105
 He+0.101
."+0.097
powers+0.096
 toy+0.095
Neg logit contributions
eking-0.302
aman-0.270
ir-0.269
Jud-0.269
agg-0.269
Token they
Token ID484
Feature activation-3.09e+00

Pos logit contributions
eking+0.152
aman+0.132
ir+0.132
agg+0.131
Jud+0.130
Neg logit contributions
 He-0.091
 proper-0.091
."-0.087
 toy-0.086
powers-0.086
Token went
Token ID1816
Feature activation+1.53e+00

Pos logit contributions
 He+0.058
 proper+0.057
."+0.055
.+0.055
 toy+0.054
Neg logit contributions
eking-0.092
aman-0.079
Mary-0.077
Jud-0.077
ir-0.077
Token back
Token ID736
Feature activation-9.25e-01

Pos logit contributions
eking+0.043
ir+0.037
ulu+0.037
Mary+0.037
agg+0.037
Neg logit contributions
 proper-0.028
 textures-0.027
powers-0.027
attered-0.026
 likes-0.026
Token to
Token ID284
Feature activation+1.96e+00

Pos logit contributions
 He+0.010
 proper+0.010
powers+0.010
."+0.009
 They+0.009
Neg logit contributions
eking-0.033
agg-0.029
Jud-0.029
aman-0.029
Mary-0.029
Token fragile
Token ID21049
Feature activation-4.47e+00

Pos logit contributions
 He+0.032
 proper+0.031
powers+0.030
."+0.030
 They+0.030
Neg logit contributions
eking-0.069
ir-0.060
agg-0.060
ulu-0.059
aman-0.059
Token."
Token ID526
Feature activation-2.69e+00

Pos logit contributions
 proper+0.091
attered+0.086
powers+0.081
 name+0.076
 He+0.074
Neg logit contributions
eking-0.111
ma-0.101
ir-0.091
 nowhere-0.090
lf-0.089
Token L
Token ID406
Feature activation+9.33e+00

Pos logit contributions
 He+0.048
 proper+0.047
."+0.046
 They+0.045
 toy+0.045
Neg logit contributions
eking-0.083
aman-0.071
ir-0.071
agg-0.070
Jud-0.070
Tokenila
Token ID10102
Feature activation+3.02e+00

Pos logit contributions
eking+0.172
aman+0.140
Bet+0.136
ths+0.130
Ja+0.129
Neg logit contributions
.-0.291
."-0.286
 He-0.275
 They-0.259
ny-0.251
Token said
Token ID531
Feature activation+5.00e+00

Pos logit contributions
eking+0.096
irs+0.085
ir+0.083
aman+0.081
agg+0.081
Neg logit contributions
.-0.055
 He-0.054
."-0.051
 They-0.050
 toy-0.050
Token.
Token ID13
Feature activation-9.24e+00

Pos logit contributions
eking+0.201
ir+0.178
agg+0.176
aman+0.172
Jud+0.171
Neg logit contributions
 proper-0.044
powers-0.035
 He-0.031
attered-0.030
 toy-0.030
Token She
Token ID1375
Feature activation-4.45e+00

Pos logit contributions
 He+0.146
 They+0.141
.+0.132
."+0.130
 But+0.120
Neg logit contributions
eking-0.338
aman-0.284
agg-0.284
ir-0.281
eks-0.276
Token gave
Token ID2921
Feature activation+3.00e+00

Pos logit contributions
 He+0.079
 toy+0.075
.+0.075
 proper+0.073
 They+0.073
Neg logit contributions
eking-0.140
ir-0.123
aman-0.122
irs-0.121
Jud-0.120
Token the
Token ID262
Feature activation-1.18e+00

Pos logit contributions
eking+0.099
agg+0.087
aman+0.086
Mary+0.084
Jud+0.084
Neg logit contributions
 toy-0.052
.-0.051
."-0.048
 He-0.047
 proper-0.045
Token crystal
Token ID15121
Feature activation-4.35e-01

Pos logit contributions
 He+0.020
 proper+0.019
 They+0.019
attered+0.019
powers+0.019
Neg logit contributions
eking-0.035
ir-0.030
aman-0.030
ulu-0.030
ma-0.030
Token to
Token ID284
Feature activation+1.90e+00

Pos logit contributions
 proper+0.007
attered+0.007
 Can+0.007
 He+0.007
 Let+0.006
Neg logit contributions
eking-0.011
ma-0.011
ulu-0.010
ir-0.010
aches-0.010
Token Lily
Token ID20037
Feature activation+2.84e-01

Pos logit contributions
eking+0.089
agg+0.077
aman+0.077
Jud+0.077
ir+0.077
Neg logit contributions
 He-0.045
 proper-0.044
."-0.044
 toy-0.043
.-0.042
Token.
Token ID13
Feature activation-1.08e+01

Pos logit contributions
ma+0.009
eking+0.008
aman+0.008
tha+0.007
ir+0.007
Neg logit contributions
powers-0.005
 proper-0.005
attered-0.004
 textures-0.004
cream-0.004
Token I
Token ID314
Feature activation-5.65e+00

Pos logit contributions
 proper+0.178
powers+0.177
 toy+0.167
ly+0.163
ny+0.163
Neg logit contributions
eking-0.297
ulu-0.276
ir-0.275
Jud-0.272
aman-0.272
Token was
Token ID373
Feature activation+3.25e+00

Pos logit contributions
 He+0.115
powers+0.110
 proper+0.110
ly+0.103
 toy+0.103
Neg logit contributions
eking-0.145
ir-0.130
ulu-0.129
Jud-0.129
agg-0.124
Token foolish
Token ID19538
Feature activation-6.34e+00

Pos logit contributions
eking+0.095
aman+0.086
agg+0.084
ir+0.084
Mary+0.083
Neg logit contributions
 proper-0.059
."-0.059
 He-0.059
 toy-0.058
.-0.057
Token.
Token ID13
Feature activation-1.04e+01

Pos logit contributions
 proper+0.115
cream+0.099
ometer+0.093
powers+0.088
 mineral+0.087
Neg logit contributions
eking-0.190
ma-0.167
h-0.162
 nowhere-0.160
ir-0.159
Token I
Token ID314
Feature activation-5.25e+00

Pos logit contributions
 proper+0.170
powers+0.170
 toy+0.168
ly+0.162
ny+0.160
Neg logit contributions
eking-0.273
Ma-0.259
Mary-0.259
Jud-0.256
ulu-0.256
Token won
Token ID1839
Feature activation+1.62e+00

Pos logit contributions
 He+0.103
 proper+0.100
powers+0.097
ly+0.094
 toy+0.093
Neg logit contributions
eking-0.133
ir-0.124
ulu-0.123
Jud-0.120
eks-0.115
Token't
Token ID470
Feature activation+2.76e+00

Pos logit contributions
eking+0.028
aman+0.023
agg+0.022
Jud+0.021
ir+0.021
Neg logit contributions
 He-0.050
."-0.050
 proper-0.049
.-0.049
 toy-0.048
Token do
Token ID466
Feature activation-2.17e+00

Pos logit contributions
eking+0.072
ir+0.064
Jud+0.062
ulu+0.060
Jane+0.057
Neg logit contributions
 proper-0.063
ometer-0.058
attered-0.058
powers-0.057
 He-0.056
Token that
Token ID326
Feature activation-6.72e+00

Pos logit contributions
 proper+0.052
 He+0.051
ny+0.051
attered+0.050
powers+0.048
Neg logit contributions
eking-0.052
ir-0.040
Mary-0.040
irs-0.039
ics-0.039
Token had
Token ID550
Feature activation-3.81e+00

Pos logit contributions
 proper+0.102
 He+0.101
powers+0.099
."+0.098
 toy+0.096
Neg logit contributions
eking-0.161
Mary-0.138
Jud-0.138
agg-0.137
aman-0.137
Token not
Token ID407
Feature activation+5.41e-01

Pos logit contributions
 proper+0.086
."+0.084
.+0.083
 toy+0.083
 He+0.082
Neg logit contributions
eking-0.097
ir-0.089
aman-0.087
agg-0.085
ulu-0.083
Token lost
Token ID2626
Feature activation-6.67e+00

Pos logit contributions
eking+0.012
Jud+0.010
Mary+0.009
ir+0.009
agg+0.009
Neg logit contributions
 He-0.014
 proper-0.013
powers-0.013
."-0.013
 They-0.013
Token their
Token ID511
Feature activation-4.00e-01

Pos logit contributions
 He+0.174
 proper+0.173
powers+0.171
."+0.164
 They+0.163
Neg logit contributions
eking-0.142
ir-0.117
Mary-0.115
aman-0.115
Jud-0.115
Token surprise
Token ID5975
Feature activation+2.01e-01

Pos logit contributions
 He+0.006
 They+0.006
."+0.005
 proper+0.005
powers+0.005
Neg logit contributions
eking-0.013
aman-0.011
ir-0.011
ma-0.011
agg-0.011
Token.
Token ID13
Feature activation-1.00e+01

Pos logit contributions
eking+0.008
ma+0.007
aman+0.007
ir+0.007
ulu+0.007
Neg logit contributions
powers-0.002
 He-0.002
 proper-0.002
attered-0.001
 They-0.001
Token They
Token ID1119
Feature activation-5.44e+00

Pos logit contributions
 proper+0.082
powers+0.072
 He+0.071
 toy+0.071
."+0.069
Neg logit contributions
eking-0.392
aman-0.359
Mary-0.359
Jud-0.359
ir-0.356
Token hugged
Token ID41130
Feature activation+9.47e-01

Pos logit contributions
 He+0.095
 proper+0.093
.+0.090
."+0.089
 toy+0.089
Neg logit contributions
eking-0.170
aman-0.149
ir-0.148
agg-0.147
Mary-0.145
Token each
Token ID1123
Feature activation-4.21e+00

Pos logit contributions
eking+0.028
ir+0.024
aman+0.024
Mary+0.024
Jud+0.024
Neg logit contributions
 proper-0.017
 He-0.017
powers-0.017
."-0.016
 toy-0.016
Token other
Token ID584
Feature activation-4.08e+00

Pos logit contributions
powers+0.082
attered+0.080
 proper+0.080
 He+0.080
ometer+0.073
Neg logit contributions
eking-0.112
Mary-0.102
ir-0.101
Jud-0.099
aman-0.098
Token and
Token ID290
Feature activation+1.89e+00

Pos logit contributions
 proper+0.058
powers+0.057
 He+0.057
attered+0.053
 They+0.050
Neg logit contributions
eking-0.134
ma-0.118
ir-0.117
aman-0.116
Mary-0.116
Token play
Token ID711
Feature activation-5.87e+00

Pos logit contributions
attered+0.019
powers+0.019
 proper+0.018
 He+0.017
ometer+0.017
Neg logit contributions
eking-0.034
ir-0.031
agg-0.031
Jud-0.030
Mary-0.030
Token outside
Token ID2354
Feature activation-5.87e+00

Pos logit contributions
 proper+0.095
 He+0.091
powers+0.090
 They+0.083
attered+0.083
Neg logit contributions
eking-0.186
ir-0.165
ulu-0.159
Mary-0.159
agg-0.158
Token with
Token ID351
Feature activation+2.73e-01

Pos logit contributions
 proper+0.121
powers+0.118
attered+0.104
ometer+0.103
cream+0.098
Neg logit contributions
eking-0.160
ir-0.147
ma-0.144
ulu-0.140
lf-0.139
Token her
Token ID607
Feature activation-9.72e-01

Pos logit contributions
eking+0.006
aman+0.005
agg+0.005
Jud+0.005
ir+0.005
Neg logit contributions
 He-0.007
 proper-0.007
."-0.007
 toy-0.007
.-0.007
Token friends
Token ID2460
Feature activation-2.14e+00

Pos logit contributions
 He+0.018
 proper+0.018
powers+0.017
."+0.017
 They+0.017
Neg logit contributions
eking-0.028
ir-0.024
aman-0.024
agg-0.024
Jud-0.024
Token.
Token ID13
Feature activation-9.97e+00

Pos logit contributions
powers+0.027
 proper+0.026
 He+0.023
attered+0.022
 They+0.020
Neg logit contributions
eking-0.075
ma-0.068
aman-0.067
ir-0.066
 nowhere-0.065
Token One
Token ID1881
Feature activation-1.24e+00

Pos logit contributions
 proper+0.130
powers+0.123
 He+0.118
 toy+0.118
."+0.116
Neg logit contributions
eking-0.337
aman-0.310
ir-0.307
Jud-0.307
Mary-0.306
Token day
Token ID1110
Feature activation-9.30e-01

Pos logit contributions
 proper+0.022
powers+0.022
 He+0.022
."+0.021
 They+0.020
Neg logit contributions
eking-0.036
ulu-0.032
agg-0.032
ir-0.032
aman-0.032
Token,
Token ID11
Feature activation+3.70e+00

Pos logit contributions
 He+0.015
 proper+0.015
."+0.014
 toy+0.014
powers+0.014
Neg logit contributions
eking-0.030
aman-0.026
ir-0.026
agg-0.026
Jud-0.026
Token when
Token ID618
Feature activation-4.46e+00

Pos logit contributions
eking+0.109
ir+0.093
aman+0.092
agg+0.091
Jud+0.090
Neg logit contributions
 He-0.072
 proper-0.071
."-0.070
.-0.069
 They-0.069
Token she
Token ID673
Feature activation-4.76e+00

Pos logit contributions
 He+0.090
.+0.087
."+0.086
 toy+0.086
 proper+0.086
Neg logit contributions
eking-0.132
ir-0.111
agg-0.109
aman-0.109
Mary-0.107
Token was
Token ID373
Feature activation+3.82e+00

Pos logit contributions
 proper+0.043
 He+0.041
powers+0.040
."+0.038
 toy+0.036
Neg logit contributions
eking-0.132
ir-0.117
aman-0.117
Jud-0.117
Mary-0.116
Token shaking
Token ID17275
Feature activation-4.59e+00

Pos logit contributions
eking+0.111
aman+0.094
ir+0.094
Jud+0.093
agg+0.093
Neg logit contributions
 He-0.072
 proper-0.072
powers-0.070
."-0.069
 They-0.068
Token it
Token ID340
Feature activation-2.35e+00

Pos logit contributions
powers+0.067
 proper+0.060
 He+0.049
 Let+0.049
ashes+0.049
Neg logit contributions
eking-0.154
ir-0.134
Mary-0.132
aman-0.132
 nowhere-0.130
Token all
Token ID477
Feature activation-4.75e+00

Pos logit contributions
powers+0.027
 proper+0.027
 He+0.023
 likes+0.022
 They+0.021
Neg logit contributions
eking-0.086
aman-0.071
ir-0.070
ics-0.070
 nowhere-0.068
Token around
Token ID1088
Feature activation-6.11e-01

Pos logit contributions
powers+0.095
 proper+0.083
 He+0.080
 bone+0.079
ometer+0.076
Neg logit contributions
eking-0.144
ma-0.122
ir-0.120
aman-0.119
Mary-0.117
Token.
Token ID13
Feature activation-9.21e+00

Pos logit contributions
 proper+0.006
 He+0.006
powers+0.006
."+0.005
 toy+0.005
Neg logit contributions
eking-0.023
ir-0.021
aman-0.021
Jud-0.021
Mary-0.021
Token
Token ID220
Feature activation+8.88e+00

Pos logit contributions
 He+0.205
 They+0.198
 proper+0.197
."+0.194
.+0.191
Neg logit contributions
eking-0.247
ir-0.207
aman-0.207
Jud-0.202
agg-0.202
Token
Token ID198
Feature activation+1.15e+01

Pos logit contributions
eking+0.317
aman+0.272
ir+0.270
lf+0.264
eks+0.262
Neg logit contributions
 He-0.148
 They-0.147
.-0.140
."-0.138
 But-0.120
Token
Token ID198
Feature activation+1.14e+01

Pos logit contributions
eking+0.448
aman+0.381
andra+0.369
eling+0.369
ir+0.368
Neg logit contributions
 They-0.171
 He-0.169
."-0.157
.-0.156
 But-0.141
TokenJo
Token ID9908
Feature activation+2.99e+00

Pos logit contributions
eking+0.270
aman+0.220
ir+0.219
agg+0.212
Jud+0.210
Neg logit contributions
 He-0.281
."-0.273
 They-0.273
 proper-0.271
.-0.269
Tokenan
Token ID272
Feature activation+2.29e+00

Pos logit contributions
eking+0.078
irs+0.069
aman+0.068
Ja+0.066
Jud+0.066
Neg logit contributions
.-0.064
 He-0.063
."-0.063
 They-0.060
ly-0.059