TOP ACTIVATIONS
MAX = 0.186

smile . S aman tha laughed , happy that
before dinner . <|endoftext|> S aman tha was a lazy girl
fun time outside . <|endoftext|> Once upon a time , there
discover new things . <|endoftext|> Once upon a time , there
and her dad . <|endoftext|> Once upon a time , there
his delicious lunch . <|endoftext|> Once upon a time there was
the victory parade . <|endoftext|> Once upon a time , a
world around her . <|endoftext|> Once upon a time , there
all day long . <|endoftext|> Once upon a time , there
on his face . <|endoftext|> Once upon a time , there
help him out . <|endoftext|> Once upon a time , there
left all alone . <|endoftext|> Once upon a time , there
the door . Once she was inside , D
through it . Once the bunny was inside the
on his face . <|endoftext|> Once there was a little girl
best of friends . <|endoftext|> Once upon a time , there
new spot . Once Robert reached the top of
all by herself . <|endoftext|> Once upon a time , there
a happy heart . <|endoftext|> Once upon a time , there
happy and warm . <|endoftext|> Once upon a time , there

MIN ACTIVATIONS
MIN = -0.155

You should not take things that are not yours ! You
You should not take things that are not yours . And
her things away so nothing bad would happen again . <|endoftext|>
We will not take things that are not ours . Please
You should not take things that are not yours . You
I won 't let anything bad happen ."
. Don 't take things that are not yours . And
was bad to take things that weren 't yours ." Jill
. They never take things that are not theirs again .
not nice to take things that are not ours . Mrs
You should not take things that are not yours ," Sara
checks if we have anything bad in our pockets or our
Tim didn 't want anything bad to happen to his club
We don 't want anything bad to happen ."
not to spray paint anything that shouldn 't be painted ."
be cautious so that nothing bad would happen again . <|endoftext|>
You should not take things that are not yours !"
You should not eat things that are rot . They are
are very lucky that nothing bad happened . You should be
You can 't take things that are not yours !"

INTERVAL 0.100 - 0.186
CONTAINS 5.976%

rest in it . M ummy was talking to
anxious and more happy . After a while ,
?" Tom asks . " Sure , you can
ran towards her . When he got closer ,
a deep hole . The animals worked together to

INTERVAL 0.015 - 0.100
CONTAINS 28.260%

She gasped , What are you
for us ." Tom smiled and said , "
When her mom found her , they laughed and laughed .
to play , but the cloud said it was tired and
, he knew that he could conquer anything that scared him

INTERVAL -0.070 - 0.015
CONTAINS 57.115%

Once there was a girl named Lucy . She was three
B enny , can you mix the flour and sugar together
water . The boat is white and red . It has
, a little girl was helping her mom put dishes in
Tom thought about what Lily said . He said ,

INTERVAL -0.155 - -0.070
CONTAINS 8.649%

loud thunder and bright lightning . Tim 's dad
try to say the words . One day ,
better , and she learned that sometimes even angry things can
cap and a red label . They could not read the
Lily drank it all up . She felt much better and
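
NOTE ON THE INTERVALS
The four INTERVAL ranges above look like equal-width bins between MIN (-0.155) and MAX (0.186): the span is 0.341, so each bin is about 0.085 wide, which matches the listed edges after rounding. That is an inference from the printed numbers, not something the dashboard states. Under that assumption, a minimal sketch of how the CONTAINS percentages could be computed follows. The function name `interval_shares` and the toy activations are hypothetical and are not taken from the tool that produced this dump.

```python
from typing import List, Sequence, Tuple


def interval_shares(
    activations: Sequence[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Split [min, max] into n_bins equal-width intervals and return
    (low_edge, high_edge, percent_of_tokens) per interval, highest first.

    Assumption: equal-width binning, inferred from the listed edges.
    """
    lo, hi = min(activations), max(activations)
    if hi == lo:
        raise ValueError("all activations are identical; cannot form intervals")
    width = (hi - lo) / n_bins

    counts = [0] * n_bins
    for a in activations:
        # Clamp so the maximum value falls in the top interval.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1

    total = len(activations)
    shares = [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in range(n_bins)
    ]
    # The dashboard lists the highest interval first, so reverse the order.
    return shares[::-1]


# Toy data only: eight made-up activations spanning the listed MIN and MAX.
toy = [-0.155, -0.10, -0.05, 0.0, 0.01, 0.05, 0.12, 0.186]
for low, high, pct in interval_shares(toy):
    print(f"INTERVAL {low:.3f} - {high:.3f}  CONTAINS {pct:.3f}%")
```

By construction the percentages sum to 100, as the four CONTAINS values above do (5.976 + 28.260 + 57.115 + 8.649 = 100.000).
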
Token smile
Token ID8212
Feature activation+3.20e-02

Pos logit contributions
ona+0.041
+0.040
ink+0.040
een+0.039
loe+0.039
Neg logit contributions
 for-0.026
 back-0.025
 worse-0.025
 hospital-0.024
der-0.023
Token.
Token ID13
Feature activation-5.53e-02

Pos logit contributions
+0.073
ona+0.072
 total+0.071
loe+0.071
ink+0.070
Neg logit contributions
 for-0.024
 worse-0.023
der-0.023
 back-0.022
 inside-0.020
Token
Token ID198
Feature activation+1.23e-01

Pos logit contributions
der+0.036
 for+0.034
 worse+0.032
 back+0.032
 he+0.030
Neg logit contributions
ink-0.127
ona-0.126
 total-0.123
-0.122
ere-0.122
Token
Token ID198
Feature activation+1.25e-01

Pos logit contributions
+0.245
ona+0.238
ink+0.236
 total+0.235
 blonde+0.234
Neg logit contributions
 worse-0.118
 for-0.118
 back-0.113
der-0.108
 he-0.105
TokenS
Token ID50
Feature activation+6.88e-02

Pos logit contributions
+0.203
ink+0.195
ona+0.191
 blonde+0.188
 total+0.186
Neg logit contributions
 worse-0.162
 he-0.154
 smoothly-0.154
hot-0.152
 for-0.151
Tokenaman
Token ID10546
Feature activation+1.81e-01

Pos logit contributions
loe+0.097
 blonde+0.096
 total+0.095
ates+0.093
Will+0.088
Neg logit contributions
 for-0.104
 back-0.102
ly-0.098
 worse-0.097
der-0.097
Tokentha
Token ID12898
Feature activation+6.32e-02

Pos logit contributions
 total+0.218
+0.218
 blonde+0.207
ona+0.199
loe+0.197
Neg logit contributions
 back-0.337
 for-0.322
 worse-0.316
 inside-0.308
 he-0.307
Token laughed
Token ID13818
Feature activation+5.39e-02

Pos logit contributions
+0.124
 total+0.119
ona+0.118
loe+0.116
 blonde+0.114
Neg logit contributions
 back-0.067
 for-0.067
 worse-0.067
der-0.064
 inside-0.061
Token,
Token ID11
Feature activation+3.33e-02

Pos logit contributions
ona+0.093
ink+0.092
 blonde+0.092
+0.090
ere+0.087
Neg logit contributions
 worse-0.065
controlled-0.060
 hospital-0.059
 rubble-0.056
der-0.055
Token happy
Token ID3772
Feature activation+2.67e-02

Pos logit contributions
+0.072
ona+0.072
ink+0.072
loe+0.071
 total+0.070
Neg logit contributions
 for-0.030
 back-0.029
 worse-0.028
der-0.027
 even-0.026
Token that
Token ID326
Feature activation-7.42e-02

Pos logit contributions
ona+0.055
+0.054
ink+0.053
 total+0.052
 blonde+0.052
Neg logit contributions
 worse-0.025
controlled-0.021
der-0.021
 hospital-0.019
 altogether-0.019
Token before
Token ID878
Feature activation-1.20e-02

Pos logit contributions
ona+0.119
+0.118
ink+0.116
loe+0.115
 total+0.114
Neg logit contributions
 for-0.033
 worse-0.032
der-0.030
 back-0.030
 altogether-0.024
Token dinner
Token ID8073
Feature activation+1.12e-02

Pos logit contributions
 for+0.011
 worse+0.010
 back+0.009
der+0.009
 he+0.009
Neg logit contributions
ona-0.026
ink-0.025
 total-0.025
ere-0.025
-0.025
Token.
Token ID13
Feature activation-5.58e-02

Pos logit contributions
ona+0.024
 total+0.024
+0.024
ink+0.023
loe+0.023
Neg logit contributions
 for-0.010
 worse-0.010
der-0.009
 back-0.009
 he-0.008
Token<|endoftext|>
Token ID50256
Feature activation+1.22e-01

Pos logit contributions
der+0.041
 for+0.040
 worse+0.039
 back+0.038
 he+0.036
Neg logit contributions
ink-0.123
ona-0.121
-0.119
 total-0.119
een-0.118
TokenS
Token ID50
Feature activation+7.48e-02

Pos logit contributions
ona+0.263
 blonde+0.255
 total+0.250
elve+0.249
itching+0.243
Neg logit contributions
 inside-0.111
 back-0.110
.-0.108
 He-0.107
 around-0.105
Tokenaman
Token ID10546
Feature activation+1.81e-01

Pos logit contributions
 blonde+0.098
+0.095
ates+0.091
icians+0.088
loe+0.088
Neg logit contributions
 for-0.129
 back-0.129
 inside-0.112
 he-0.111
 that-0.111
Tokentha
Token ID12898
Feature activation+7.31e-02

Pos logit contributions
 total+0.254
 blonde+0.236
+0.235
elve+0.226
loe+0.218
Neg logit contributions
 back-0.302
 for-0.290
 inside-0.280
 he-0.280
 that-0.277
Token was
Token ID373
Feature activation+1.14e-01

Pos logit contributions
+0.174
 total+0.170
ona+0.170
loe+0.165
 blonde+0.165
Neg logit contributions
 for-0.049
 back-0.046
 inside-0.043
 worse-0.041
der-0.038
Token a
Token ID257
Feature activation+5.04e-02

Pos logit contributions
ona+0.237
+0.233
 total+0.230
een+0.224
ink+0.223
Neg logit contributions
 for-0.120
 back-0.112
 worse-0.110
 inside-0.104
 around-0.100
Token lazy
Token ID16931
Feature activation-2.22e-02

Pos logit contributions
+0.082
ink+0.081
ona+0.081
loe+0.081
een+0.080
Neg logit contributions
 back-0.068
 worse-0.067
 for-0.066
 hospital-0.062
 even-0.062
Token girl
Token ID2576
Feature activation+3.73e-02

Pos logit contributions
 worse+0.022
der+0.020
 back+0.019
 for+0.019
controlled+0.019
Neg logit contributions
ona-0.045
-0.044
ink-0.043
 total-0.042
 blonde-0.042
Token fun
Token ID1257
Feature activation+6.56e-03

Pos logit contributions
+0.020
ona+0.020
ink+0.019
 total+0.019
loe+0.019
Neg logit contributions
 worse-0.012
der-0.012
 for-0.011
 back-0.011
 altogether-0.010
Token time
Token ID640
Feature activation+6.64e-02

Pos logit contributions
+0.012
ona+0.012
ink+0.012
 total+0.012
loe+0.011
Neg logit contributions
 worse-0.008
der-0.007
 for-0.007
 back-0.007
 altogether-0.006
Token outside
Token ID2354
Feature activation+8.81e-03

Pos logit contributions
ona+0.130
+0.130
ink+0.127
 total+0.126
loe+0.125
Neg logit contributions
 worse-0.069
 for-0.066
der-0.066
 back-0.063
 altogether-0.059
Token.
Token ID13
Feature activation-6.78e-02

Pos logit contributions
ona+0.021
+0.021
ink+0.020
 total+0.020
loe+0.020
Neg logit contributions
 worse-0.006
 for-0.005
der-0.005
 back-0.005
 altogether-0.004
Token<|endoftext|>
Token ID50256
Feature activation+1.27e-01

Pos logit contributions
 worse+0.076
der+0.067
 for+0.065
 back+0.065
 altogether+0.060
Neg logit contributions
-0.133
ona-0.130
ink-0.126
 total-0.126
loe-0.125
TokenOnce
Token ID7454
Feature activation+1.80e-01

Pos logit contributions
ona+0.273
 blonde+0.262
ink+0.258
 total+0.258
+0.253
Neg logit contributions
 back-0.116
 for-0.113
 inside-0.108
 around-0.105
 worse-0.100
Token upon
Token ID2402
Feature activation+1.57e-01

Pos logit contributions
+0.126
ona+0.125
ink+0.119
 total+0.112
loe+0.109
Neg logit contributions
 worse-0.415
der-0.403
 for-0.403
 back-0.398
 altogether-0.385
Token a
Token ID257
Feature activation+6.73e-02

Pos logit contributions
+0.278
ona+0.278
ink+0.273
 total+0.267
loe+0.264
Neg logit contributions
 worse-0.192
der-0.183
 for-0.179
 back-0.176
 altogether-0.166
Token time
Token ID640
Feature activation+1.13e-01

Pos logit contributions
ona+0.085
icians+0.083
ink+0.080
+0.080
used+0.079
Neg logit contributions
 for-0.122
 back-0.116
 even-0.111
 worse-0.111
 bad-0.109
Token,
Token ID11
Feature activation+5.28e-02

Pos logit contributions
ona+0.276
icians+0.269
 blonde+0.264
 total+0.262
ates+0.262
Neg logit contributions
 for-0.095
 back-0.078
 he-0.074
 inside-0.068
 they-0.064
Token there
Token ID612
Feature activation+7.11e-02

Pos logit contributions
+0.089
ona+0.086
ink+0.083
loe+0.082
een+0.081
Neg logit contributions
 for-0.079
 back-0.077
 worse-0.075
 inside-0.075
 around-0.070
Token discover
Token ID7073
Feature activation+5.92e-03

Pos logit contributions
 worse+0.050
der+0.049
 altogether+0.044
controlled+0.044
 for+0.043
Neg logit contributions
 total-0.061
ona-0.061
-0.061
 blonde-0.059
ink-0.058
Token new
Token ID649
Feature activation-3.68e-02

Pos logit contributions
ona+0.011
ere+0.011
ink+0.011
 total+0.010
 blonde+0.010
Neg logit contributions
 worse-0.006
der-0.006
ger-0.006
controlled-0.006
 back-0.005
Token things
Token ID1243
Feature activation-3.90e-02

Pos logit contributions
 worse+0.044
der+0.041
 back+0.038
 for+0.038
 altogether+0.037
Neg logit contributions
ona-0.066
 total-0.064
ink-0.064
ere-0.064
-0.064
Token.
Token ID13
Feature activation-6.51e-02

Pos logit contributions
der+0.034
 worse+0.033
 hospital+0.030
 trigger+0.030
 grave+0.029
Neg logit contributions
ink-0.078
ona-0.077
-0.076
 total-0.075
ere-0.074
Token<|endoftext|>
Token ID50256
Feature activation+1.30e-01

Pos logit contributions
 worse+0.107
der+0.097
 back+0.094
 for+0.094
controlled+0.091
Neg logit contributions
-0.097
ona-0.092
loe-0.089
 total-0.088
ink-0.088
TokenOnce
Token ID7454
Feature activation+1.80e-01

Pos logit contributions
ona+0.285
 blonde+0.269
ink+0.269
 total+0.265
elve+0.261
Neg logit contributions
 back-0.113
 for-0.113
 inside-0.106
 around-0.103
 worse-0.103
Token upon
Token ID2402
Feature activation+1.53e-01

Pos logit contributions
+0.123
ona+0.122
ink+0.117
 total+0.110
loe+0.107
Neg logit contributions
 worse-0.417
der-0.406
 for-0.403
 back-0.399
 altogether-0.387
Token a
Token ID257
Feature activation+6.48e-02

Pos logit contributions
+0.279
ona+0.279
ink+0.275
 total+0.267
loe+0.266
Neg logit contributions
 worse-0.182
der-0.172
 for-0.170
 back-0.167
 altogether-0.156
Token time
Token ID640
Feature activation+1.12e-01

Pos logit contributions
icians+0.083
ona+0.082
used+0.078
ink+0.078
ates+0.077
Neg logit contributions
 for-0.116
 back-0.108
 worse-0.105
 even-0.105
 bad-0.103
Token,
Token ID11
Feature activation+5.46e-02

Pos logit contributions
ona+0.272
icians+0.261
ates+0.257
 total+0.255
 blonde+0.255
Neg logit contributions
 for-0.096
 back-0.079
 he-0.079
 inside-0.069
.-0.068
Token there
Token ID612
Feature activation+6.98e-02

Pos logit contributions
+0.089
ona+0.088
ink+0.084
loe+0.083
een+0.082
Neg logit contributions
 for-0.082
 worse-0.080
 back-0.079
 inside-0.078
 around-0.073
Token and
Token ID290
Feature activation-1.89e-02

Pos logit contributions
+0.014
ona+0.014
ink+0.014
 total+0.014
loe+0.014
Neg logit contributions
 worse-0.004
der-0.003
 for-0.003
 back-0.003
 altogether-0.003
Token her
Token ID607
Feature activation+9.92e-03

Pos logit contributions
 worse+0.019
der+0.018
controlled+0.017
 altogether+0.016
ger+0.016
Neg logit contributions
ona-0.037
-0.036
ink-0.035
 total-0.035
loe-0.035
Token dad
Token ID9955
Feature activation+3.69e-03

Pos logit contributions
+0.018
ona+0.018
ink+0.017
 total+0.017
loe+0.017
Neg logit contributions
 worse-0.012
der-0.011
 for-0.011
 back-0.011
 altogether-0.010
Token.
Token ID13
Feature activation-6.06e-02

Pos logit contributions
ona+0.008
+0.008
ink+0.008
 total+0.008
loe+0.008
Neg logit contributions
 for-0.003
 back-0.003
 worse-0.003
der-0.003
 inside-0.003
Token<|endoftext|>
Token ID50256
Feature activation+1.28e-01

Pos logit contributions
 worse+0.052
der+0.049
 for+0.048
 back+0.046
 altogether+0.042
Neg logit contributions
ona-0.129
-0.128
ink-0.127
 total-0.125
loe-0.123
TokenOnce
Token ID7454
Feature activation+1.80e-01

Pos logit contributions
ona+0.283
 blonde+0.275
 total+0.269
ink+0.267
elve+0.266
Neg logit contributions
 back-0.108
 for-0.105
 inside-0.101
.-0.097
 around-0.097
Token upon
Token ID2402
Feature activation+1.56e-01

Pos logit contributions
+0.133
ona+0.131
ink+0.126
 total+0.119
een+0.116
Neg logit contributions
 worse-0.406
 for-0.399
der-0.395
 back-0.393
 altogether-0.376
Token a
Token ID257
Feature activation+6.50e-02

Pos logit contributions
+0.280
ona+0.280
ink+0.276
 total+0.269
loe+0.267
Neg logit contributions
 worse-0.187
der-0.177
 for-0.174
 back-0.171
 altogether-0.160
Token time
Token ID640
Feature activation+1.11e-01

Pos logit contributions
ona+0.082
icians+0.081
used+0.079
+0.078
ink+0.077
Neg logit contributions
 for-0.116
 back-0.110
 even-0.106
 worse-0.104
 bad-0.104
Token,
Token ID11
Feature activation+5.17e-02

Pos logit contributions
ona+0.276
icians+0.271
 total+0.264
 blonde+0.264
ates+0.263
Neg logit contributions
 for-0.086
 back-0.071
 he-0.066
 inside-0.064
 they-0.059
Token there
Token ID612
Feature activation+6.93e-02

Pos logit contributions
+0.087
ona+0.085
loe+0.081
ink+0.080
een+0.079
Neg logit contributions
 for-0.077
 back-0.074
 inside-0.073
 worse-0.073
 around-0.069
Token his
Token ID465
Feature activation+1.08e-02

Pos logit contributions
ona+0.063
+0.063
ink+0.062
 total+0.061
loe+0.060
Neg logit contributions
 worse-0.024
der-0.022
 for-0.022
 back-0.021
 altogether-0.019
Token delicious
Token ID12625
Feature activation+1.83e-02

Pos logit contributions
ona+0.021
+0.021
ink+0.021
loe+0.020
een+0.020
Neg logit contributions
 worse-0.011
 for-0.011
der-0.011
 back-0.011
 hospital-0.010
Token lunch
Token ID9965
Feature activation+4.10e-02

Pos logit contributions
ona+0.035
+0.034
loe+0.034
ink+0.034
een+0.033
Neg logit contributions
 for-0.022
 back-0.021
 worse-0.020
der-0.018
 hospital-0.018
Token.
Token ID13
Feature activation-5.61e-02

Pos logit contributions
ona+0.099
+0.098
ink+0.098
 total+0.096
loe+0.096
Neg logit contributions
 worse-0.024
 for-0.022
der-0.022
 back-0.021
 altogether-0.017
Token<|endoftext|>
Token ID50256
Feature activation+1.24e-01

Pos logit contributions
 worse+0.044
der+0.040
 for+0.038
 back+0.037
 altogether+0.034
Neg logit contributions
-0.126
ona-0.125
ink-0.123
 total-0.121
loe-0.121
TokenOnce
Token ID7454
Feature activation+1.80e-01

Pos logit contributions
ona+0.238
 total+0.225
 blonde+0.223
ink+0.221
+0.219
Neg logit contributions
 back-0.139
 for-0.138
 inside-0.133
 worse-0.131
 around-0.131
Token upon
Token ID2402
Feature activation+1.55e-01

Pos logit contributions
ona+0.103
+0.101
ink+0.098
 total+0.091
loe+0.089
Neg logit contributions
 worse-0.433
der-0.423
 back-0.410
 for-0.409
 altogether-0.403
Token a
Token ID257
Feature activation+6.84e-02

Pos logit contributions
+0.262
ona+0.262
ink+0.257
 total+0.254
 blonde+0.248
Neg logit contributions
 worse-0.197
der-0.191
 for-0.185
 back-0.181
 altogether-0.173
Token time
Token ID640
Feature activation+1.12e-01

Pos logit contributions
ona+0.087
ink+0.082
icians+0.082
+0.081
ates+0.079
Neg logit contributions
 for-0.124
 back-0.118
 worse-0.115
 even-0.114
 bad-0.112
Token there
Token ID612
Feature activation+6.72e-02

Pos logit contributions
ona+0.258
 total+0.247
 blonde+0.242
ates+0.241
ink+0.241
Neg logit contributions
 for-0.107
 back-0.091
 he-0.087
 inside-0.085
 they-0.075
Token was
Token ID373
Feature activation+1.17e-01

Pos logit contributions
ona+0.151
+0.147
 total+0.143
 blonde+0.143
elve+0.143
Neg logit contributions
 for-0.053
 worse-0.053
 he-0.052
 back-0.052
 inside-0.050
Token the
Token ID262
Feature activation-3.86e-02

Pos logit contributions
ona+0.020
+0.019
ink+0.019
 total+0.019
loe+0.019
Neg logit contributions
 for-0.010
 back-0.009
 worse-0.009
der-0.009
 even-0.008
Token victory
Token ID5373
Feature activation-2.31e-02

Pos logit contributions
 worse+0.049
der+0.047
 for+0.046
 back+0.046
 altogether+0.043
Neg logit contributions
ona-0.066
-0.066
ink-0.065
 total-0.064
loe-0.063
Token parade
Token ID16134
Feature activation+1.60e-02

Pos logit contributions
 worse+0.015
der+0.014
 for+0.013
 back+0.013
 altogether+0.011
Neg logit contributions
ona-0.054
-0.053
ink-0.053
 total-0.052
loe-0.051
Token.
Token ID13
Feature activation-5.90e-02

Pos logit contributions
ona+0.037
+0.037
ink+0.037
 total+0.036
loe+0.036
Neg logit contributions
 worse-0.010
der-0.009
 for-0.008
 back-0.008
 altogether-0.007
Token<|endoftext|>
Token ID50256
Feature activation+1.29e-01

Pos logit contributions
 worse+0.100
der+0.090
 back+0.088
 for+0.086
controlled+0.084
Neg logit contributions
-0.086
ona-0.082
loe-0.079
ink-0.079
 total-0.077
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.284
 blonde+0.271
ink+0.267
 total+0.267
elve+0.263
Neg logit contributions
 back-0.111
 for-0.110
 inside-0.104
 around-0.102
.-0.101
Token upon
Token ID2402
Feature activation+1.51e-01

Pos logit contributions
+0.124
ona+0.123
ink+0.118
 total+0.111
loe+0.108
Neg logit contributions
 worse-0.413
der-0.402
 for-0.401
 back-0.396
 altogether-0.383
Token a
Token ID257
Feature activation+6.66e-02

Pos logit contributions
+0.281
ona+0.280
ink+0.277
loe+0.269
 total+0.268
Neg logit contributions
 worse-0.174
der-0.162
 for-0.162
 back-0.159
 altogether-0.147
Token time
Token ID640
Feature activation+1.10e-01

Pos logit contributions
icians+0.089
ona+0.087
ink+0.084
used+0.084
ates+0.083
Neg logit contributions
 for-0.117
 back-0.109
 even-0.106
 worse-0.104
 bad-0.103
Token,
Token ID11
Feature activation+5.62e-02

Pos logit contributions
ona+0.278
icians+0.271
ates+0.264
 total+0.263
 blonde+0.261
Neg logit contributions
 for-0.087
 back-0.069
 he-0.067
.-0.060
 they-0.059
Token a
Token ID257
Feature activation+6.64e-02

Pos logit contributions
+0.097
ona+0.096
ink+0.090
loe+0.090
een+0.088
Neg logit contributions
 for-0.082
 back-0.080
 inside-0.079
 worse-0.078
 around-0.074
Token world
Token ID995
Feature activation+2.47e-02

Pos logit contributions
 worse+0.051
der+0.049
 for+0.048
 back+0.047
 altogether+0.045
Neg logit contributions
-0.070
ona-0.070
ink-0.069
 total-0.068
loe-0.067
Token around
Token ID1088
Feature activation+1.92e-02

Pos logit contributions
ona+0.055
+0.055
ink+0.054
 total+0.053
ere+0.052
Neg logit contributions
 worse-0.018
der-0.017
 for-0.015
 back-0.015
 hospital-0.014
Token her
Token ID607
Feature activation-6.18e-03

Pos logit contributions
ona+0.039
+0.039
ink+0.038
 total+0.037
loe+0.037
Neg logit contributions
 worse-0.019
 for-0.018
 back-0.018
der-0.018
 altogether-0.016
Token.
Token ID13
Feature activation-6.51e-02

Pos logit contributions
 worse+0.004
der+0.004
 for+0.003
 back+0.003
controlled+0.003
Neg logit contributions
-0.014
ona-0.014
ink-0.014
 total-0.014
 blonde-0.014
Token<|endoftext|>
Token ID50256
Feature activation+1.30e-01

Pos logit contributions
 worse+0.080
der+0.077
 for+0.076
 back+0.074
 altogether+0.070
Neg logit contributions
ona-0.114
ink-0.113
-0.113
 total-0.110
een-0.108
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.292
 blonde+0.280
 total+0.277
ink+0.276
elve+0.274
Neg logit contributions
 for-0.105
 back-0.105
 inside-0.099
.-0.097
 around-0.096
Token upon
Token ID2402
Feature activation+1.53e-01

Pos logit contributions
ona+0.112
+0.110
ink+0.106
 total+0.099
loe+0.096
Neg logit contributions
 worse-0.424
der-0.413
 for-0.405
 back-0.403
 altogether-0.393
Token a
Token ID257
Feature activation+6.57e-02

Pos logit contributions
+0.286
ona+0.286
ink+0.283
loe+0.274
 total+0.273
Neg logit contributions
 worse-0.175
der-0.163
 for-0.163
 back-0.160
 altogether-0.148
Token time
Token ID640
Feature activation+1.10e-01

Pos logit contributions
icians+0.088
ona+0.087
ink+0.084
used+0.083
ates+0.082
Neg logit contributions
 for-0.114
 back-0.106
 even-0.104
 worse-0.103
 bad-0.102
Token,
Token ID11
Feature activation+5.42e-02

Pos logit contributions
ona+0.269
icians+0.261
ates+0.259
 total+0.255
ere+0.254
Neg logit contributions
 for-0.092
 back-0.075
 he-0.074
 inside-0.067
 they-0.063
Token there
Token ID612
Feature activation+6.89e-02

Pos logit contributions
+0.090
ona+0.089
ink+0.085
loe+0.083
een+0.082
Neg logit contributions
 for-0.081
 back-0.079
 inside-0.079
 worse-0.078
 around-0.073
Token all
Token ID477
Feature activation-3.15e-02

Pos logit contributions
ona+0.037
 blonde+0.037
 total+0.037
ink+0.037
+0.036
Neg logit contributions
 worse-0.023
controlled-0.021
der-0.021
 hospital-0.021
 grave-0.020
Token day
Token ID1110
Feature activation+4.90e-02

Pos logit contributions
der+0.036
 worse+0.036
 grave+0.031
 altogether+0.030
 hospital+0.030
Neg logit contributions
ink-0.058
ona-0.058
loe-0.057
 total-0.057
ere-0.057
Token long
Token ID890
Feature activation-5.33e-03

Pos logit contributions
ona+0.087
+0.087
ink+0.085
 total+0.084
loe+0.084
Neg logit contributions
 worse-0.059
der-0.057
 for-0.057
 back-0.055
 altogether-0.052
Token.
Token ID13
Feature activation-6.92e-02

Pos logit contributions
 worse+0.002
 for+0.002
der+0.002
 back+0.002
 altogether+0.001
Neg logit contributions
ona-0.014
-0.014
ink-0.014
 total-0.014
loe-0.014
Token<|endoftext|>
Token ID50256
Feature activation+1.30e-01

Pos logit contributions
 worse+0.076
der+0.069
 for+0.068
 back+0.067
 altogether+0.061
Neg logit contributions
-0.136
ona-0.133
ink-0.130
loe-0.129
 total-0.128
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.276
 blonde+0.262
 total+0.262
ink+0.261
elve+0.257
Neg logit contributions
 back-0.119
 for-0.117
 inside-0.112
 worse-0.110
 around-0.109
Token upon
Token ID2402
Feature activation+1.55e-01

Pos logit contributions
ona+0.112
+0.111
ink+0.107
 total+0.100
loe+0.097
Neg logit contributions
 worse-0.423
der-0.412
 for-0.404
 back-0.402
 altogether-0.393
Token a
Token ID257
Feature activation+6.62e-02

Pos logit contributions
ona+0.262
+0.262
ink+0.257
 total+0.254
 blonde+0.248
Neg logit contributions
 worse-0.198
der-0.192
 for-0.186
 back-0.182
 altogether-0.174
Token time
Token ID640
Feature activation+1.12e-01

Pos logit contributions
ona+0.088
icians+0.086
used+0.083
lor+0.081
ink+0.081
Neg logit contributions
 for-0.114
 back-0.108
 worse-0.105
 even-0.105
 bad-0.105
Token,
Token ID11
Feature activation+5.41e-02

Pos logit contributions
ona+0.280
icians+0.273
ates+0.263
 total+0.263
lor+0.261
Neg logit contributions
 for-0.089
 back-0.072
 he-0.070
 inside-0.064
.-0.061
Token there
Token ID612
Feature activation+6.93e-02

Pos logit contributions
+0.093
ona+0.093
ink+0.088
loe+0.087
een+0.086
Neg logit contributions
 for-0.076
 worse-0.075
 back-0.075
 inside-0.073
 around-0.068
Token on
Token ID319
Feature activation+4.65e-02

Pos logit contributions
ona+0.058
+0.058
ink+0.057
 total+0.056
loe+0.056
Neg logit contributions
 worse-0.021
der-0.020
 for-0.020
 back-0.019
 altogether-0.017
Token his
Token ID465
Feature activation-3.87e-04

Pos logit contributions
ona+0.114
+0.113
ink+0.112
een+0.110
loe+0.108
Neg logit contributions
 for-0.027
 back-0.027
 worse-0.026
der-0.024
 he-0.021
Token face
Token ID1986
Feature activation-8.35e-03

Pos logit contributions
 worse+0.000
der+0.000
hot+0.000
 for+0.000
 altogether+0.000
Neg logit contributions
-0.001
ona-0.001
ink-0.001
 blonde-0.001
 total-0.001
Token.
Token ID13
Feature activation-5.68e-02

Pos logit contributions
 worse+0.003
der+0.003
 for+0.003
 back+0.003
 altogether+0.002
Neg logit contributions
ona-0.022
-0.021
ink-0.021
 total-0.021
loe-0.021
Token<|endoftext|>
Token ID50256
Feature activation+1.29e-01

Pos logit contributions
der+0.044
 for+0.041
 worse+0.041
 back+0.040
 he+0.037
Neg logit contributions
ink-0.124
ona-0.124
 total-0.121
-0.120
een-0.119
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.240
 blonde+0.225
 total+0.223
elve+0.218
ink+0.217
Neg logit contributions
 back-0.155
 inside-0.153
 for-0.149
 around-0.148
 He-0.145
Token upon
Token ID2402
Feature activation+1.56e-01

Pos logit contributions
+0.132
ona+0.130
ink+0.125
 total+0.117
loe+0.114
Neg logit contributions
 worse-0.405
 for-0.397
der-0.393
 back-0.391
 altogether-0.375
Token a
Token ID257
Feature activation+6.56e-02

Pos logit contributions
+0.277
ona+0.277
ink+0.273
 total+0.267
loe+0.264
Neg logit contributions
 worse-0.189
der-0.179
 for-0.176
 back-0.173
 altogether-0.162
Token time
Token ID640
Feature activation+1.13e-01

Pos logit contributions
ona+0.085
ink+0.083
icians+0.082
+0.081
lor+0.080
Neg logit contributions
 for-0.114
 back-0.110
 even-0.107
 bad-0.103
 worse-0.103
Token,
Token ID11
Feature activation+5.49e-02

Pos logit contributions
ona+0.275
icians+0.268
ates+0.265
 total+0.261
ere+0.259
Neg logit contributions
 for-0.091
 he-0.078
 back-0.077
 inside-0.071
 Tim-0.066
Token there
Token ID612
Feature activation+6.99e-02

Pos logit contributions
+0.088
ona+0.088
ink+0.083
loe+0.082
een+0.081
Neg logit contributions
 for-0.081
 back-0.081
 worse-0.080
 inside-0.079
 around-0.075
Token help
Token ID1037
Feature activation-3.21e-02

Pos logit contributions
ink+0.037
+0.036
ona+0.036
loe+0.034
ere+0.034
Neg logit contributions
 for-0.029
 back-0.028
 worse-0.028
 even-0.026
 fix-0.025
Token him
Token ID683
Feature activation-1.48e-02

Pos logit contributions
 worse+0.020
 for+0.017
der+0.017
 back+0.017
 altogether+0.014
Neg logit contributions
-0.077
ona-0.077
ink-0.076
 total-0.075
loe-0.074
Token out
Token ID503
Feature activation+1.92e-02

Pos logit contributions
 worse+0.008
 for+0.007
 back+0.007
der+0.006
 inside+0.005
Neg logit contributions
-0.037
ona-0.037
ink-0.037
loe-0.036
ere-0.036
Token.
Token ID13
Feature activation-6.77e-02

Pos logit contributions
+0.044
ona+0.044
ink+0.043
 total+0.042
loe+0.042
Neg logit contributions
 worse-0.014
 for-0.013
der-0.013
 back-0.013
 altogether-0.011
Token<|endoftext|>
Token ID50256
Feature activation+1.27e-01

Pos logit contributions
 worse+0.064
der+0.061
 for+0.059
 back+0.058
 altogether+0.053
Neg logit contributions
ona-0.138
-0.138
ink-0.136
 total-0.134
loe-0.132
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.286
 blonde+0.277
ink+0.272
 total+0.270
elve+0.268
Neg logit contributions
 for-0.103
 back-0.102
 inside-0.096
.-0.095
 around-0.093
Token upon
Token ID2402
Feature activation+1.52e-01

Pos logit contributions
+0.137
ona+0.133
ink+0.129
 total+0.122
een+0.119
Neg logit contributions
 worse-0.400
 for-0.395
der-0.389
 back-0.388
 altogether-0.371
Token a
Token ID257
Feature activation+6.54e-02

Pos logit contributions
ona+0.290
+0.290
ink+0.287
loe+0.279
 total+0.276
Neg logit contributions
 worse-0.171
 for-0.159
der-0.158
 back-0.157
 altogether-0.143
Token time
Token ID640
Feature activation+1.12e-01

Pos logit contributions
icians+0.084
ona+0.083
ink+0.080
used+0.080
+0.079
Neg logit contributions
 for-0.119
 back-0.108
 even-0.105
 worse-0.104
 bad-0.104
Token,
Token ID11
Feature activation+5.33e-02

Pos logit contributions
ona+0.279
icians+0.272
ates+0.267
 blonde+0.267
ere+0.266
Neg logit contributions
 for-0.091
 he-0.071
 back-0.067
 they-0.062
 inside-0.062
Token there
Token ID612
Feature activation+6.92e-02

Pos logit contributions
+0.092
ona+0.092
ink+0.088
loe+0.086
een+0.085
Neg logit contributions
 for-0.078
 inside-0.076
 back-0.075
 worse-0.073
 around-0.069
Token left
Token ID1364
Feature activation-8.03e-03

Pos logit contributions
+0.185
ona+0.180
ink+0.176
een+0.174
loe+0.173
Neg logit contributions
 back-0.086
 worse-0.085
 for-0.085
 even-0.080
 around-0.072
Token all
Token ID477
Feature activation-2.68e-02

Pos logit contributions
 worse+0.007
der+0.006
 for+0.006
 back+0.006
 altogether+0.005
Neg logit contributions
ona-0.017
-0.017
ink-0.017
 total-0.017
loe-0.017
Token alone
Token ID3436
Feature activation+2.84e-03

Pos logit contributions
 worse+0.029
der+0.029
 for+0.025
 back+0.025
 altogether+0.024
Neg logit contributions
ona-0.051
-0.051
ink-0.050
 total-0.050
ere-0.049
Token.
Token ID13
Feature activation-7.10e-02

Pos logit contributions
+0.007
ona+0.007
ink+0.007
 total+0.007
loe+0.006
Neg logit contributions
 worse-0.002
der-0.002
 for-0.001
 back-0.001
 hospital-0.001
Token<|endoftext|>
Token ID50256
Feature activation+1.28e-01

Pos logit contributions
 worse+0.087
der+0.080
 for+0.079
 back+0.078
 altogether+0.073
Neg logit contributions
-0.130
ona-0.127
ink-0.125
 total-0.123
loe-0.122
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.265
ink+0.252
 blonde+0.251
 total+0.249
+0.245
Neg logit contributions
 back-0.126
 for-0.124
 worse-0.119
 inside-0.116
 around-0.113
Token upon
Token ID2402
Feature activation+1.53e-01

Pos logit contributions
ona+0.112
+0.111
ink+0.107
 total+0.100
loe+0.097
Neg logit contributions
 worse-0.422
der-0.411
 for-0.403
 back-0.401
 altogether-0.392
Token a
Token ID257
Feature activation+6.46e-02

Pos logit contributions
+0.265
ona+0.265
ink+0.260
 total+0.256
loe+0.251
Neg logit contributions
 worse-0.190
der-0.182
 for-0.178
 back-0.174
 altogether-0.165
Token time
Token ID640
Feature activation+1.12e-01

Pos logit contributions
ona+0.087
icians+0.085
ink+0.081
used+0.080
ates+0.080
Neg logit contributions
 for-0.113
 back-0.107
 worse-0.103
 even-0.103
 bad-0.102
Token,
Token ID11
Feature activation+5.34e-02

Pos logit contributions
ona+0.273
icians+0.263
ates+0.259
 blonde+0.255
ere+0.255
Neg logit contributions
 for-0.095
 back-0.080
 he-0.076
 inside-0.068
.-0.065
Token there
Token ID612
Feature activation+6.89e-02

Pos logit contributions
+0.088
ona+0.088
ink+0.084
loe+0.082
een+0.081
Neg logit contributions
 for-0.079
 back-0.077
 worse-0.077
 inside-0.075
 around-0.071
Token the
Token ID262
Feature activation-1.63e-02

Pos logit contributions
ona+0.127
+0.126
ink+0.125
 total+0.122
loe+0.121
Neg logit contributions
 worse-0.068
der-0.063
 for-0.063
 back-0.061
 altogether-0.057
Token door
Token ID3420
Feature activation+1.62e-02

Pos logit contributions
der+0.023
 worse+0.023
 for+0.021
 theirs+0.021
controlled+0.021
Neg logit contributions
 total-0.025
 blonde-0.024
-0.023
ona-0.023
ere-0.023
Token.
Token ID13
Feature activation-6.09e-02

Pos logit contributions
+0.033
ink+0.033
 total+0.032
ona+0.032
 blonde+0.032
Neg logit contributions
 worse-0.013
der-0.012
controlled-0.012
 hospital-0.011
ger-0.011
Token\n
Token ID198
Feature activation+1.24e-01

Pos logit contributions
 He+0.049
der+0.048
 back+0.048
 for+0.048
 he+0.048
Neg logit contributions
ink-0.132
ona-0.128
 total-0.125
elve-0.124
ere-0.122
Token\n
Token ID198
Feature activation+1.26e-01

Pos logit contributions
ink+0.264
+0.264
ona+0.262
 total+0.261
ael+0.257
Neg logit contributions
 he-0.094
 back-0.092
 He-0.091
 worse-0.090
 for-0.086
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
+0.210
ink+0.209
ates+0.201
ona+0.199
 total+0.195
Neg logit contributions
hot-0.154
 worse-0.154
 he-0.151
 even-0.151
 smoothly-0.150
Token she
Token ID673
Feature activation+2.96e-03

Pos logit contributions
ona+0.267
+0.262
ink+0.261
loe+0.254
 total+0.251
Neg logit contributions
 worse-0.267
der-0.259
controlled-0.242
 for-0.240
 back-0.239
Token was
Token ID373
Feature activation+9.70e-02

Pos logit contributions
ates+0.007
icer+0.006
unks+0.006
loe+0.006
ink+0.006
Neg logit contributions
 inside-0.003
 even-0.003
 he-0.003
 back-0.003
 from-0.003
Token inside
Token ID2641
Feature activation+1.46e-02

Pos logit contributions
+0.248
ona+0.235
 total+0.230
icer+0.229
if+0.228
Neg logit contributions
 back-0.083
 inside-0.080
 for-0.080
 around-0.077
 worse-0.073
Token,
Token ID11
Feature activation+3.76e-02

Pos logit contributions
ona+0.030
ink+0.030
+0.030
 total+0.029
ates+0.029
Neg logit contributions
 for-0.016
 back-0.015
 worse-0.014
 he-0.014
 inside-0.014
Token D
Token ID360
Feature activation+3.66e-02

Pos logit contributions
ink+0.087
ona+0.087
+0.086
ere+0.083
een+0.083
Neg logit contributions
 worse-0.028
 for-0.026
 back-0.025
der-0.023
 he-0.023
Token through
Token ID832
Feature activation+5.79e-02

Pos logit contributions
 worse+0.023
der+0.021
 for+0.020
 back+0.019
 altogether+0.018
Neg logit contributions
-0.074
ona-0.074
ink-0.073
 total-0.072
loe-0.071
Token it
Token ID340
Feature activation-3.75e-02

Pos logit contributions
ona+0.122
+0.121
ink+0.120
loe+0.116
 total+0.116
Neg logit contributions
 worse-0.053
 for-0.050
der-0.050
 back-0.049
 altogether-0.044
Token.
Token ID13
Feature activation-5.70e-02

Pos logit contributions
 worse+0.023
der+0.020
 for+0.018
 back+0.018
 hospital+0.016
Neg logit contributions
-0.089
ona-0.088
ink-0.087
 total-0.086
loe-0.085
Token\n
Token ID198
Feature activation+1.28e-01

Pos logit contributions
 for+0.040
der+0.040
 back+0.039
 he+0.039
 worse+0.039
Neg logit contributions
ink-0.128
ona-0.126
 total-0.123
-0.121
ere-0.121
Token\n
Token ID198
Feature activation+1.29e-01

Pos logit contributions
+0.283
ink+0.280
ona+0.272
 blonde+0.272
 total+0.269
Neg logit contributions
 he-0.096
 He-0.095
 back-0.091
 even-0.087
 for-0.087
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
+0.229
ink+0.224
ona+0.213
 blonde+0.209
een+0.206
Neg logit contributions
 he-0.153
 worse-0.150
 even-0.149
hot-0.147
 back-0.147
Token the
Token ID262
Feature activation-2.95e-04

Pos logit contributions
ona+0.303
ink+0.300
+0.296
loe+0.289
 total+0.289
Neg logit contributions
 worse-0.230
der-0.221
controlled-0.202
 for-0.200
 back-0.200
Token bunny
Token ID44915
Feature activation+3.88e-02

Pos logit contributions
 worse+0.000
 for+0.000
 back+0.000
der+0.000
 hospital+0.000
Neg logit contributions
ink-0.001
ona-0.001
-0.001
een-0.001
loe-0.001
Token was
Token ID373
Feature activation+9.44e-02

Pos logit contributions
+0.077
 total+0.075
loe+0.075
ona+0.073
ates+0.072
Neg logit contributions
 for-0.041
 back-0.039
 worse-0.038
 even-0.038
 he-0.038
Token inside
Token ID2641
Feature activation+9.53e-03

Pos logit contributions
+0.218
ona+0.205
icer+0.201
lor+0.198
een+0.198
Neg logit contributions
 back-0.106
 for-0.104
 around-0.102
 inside-0.098
 even-0.094
Token the
Token ID262
Feature activation-2.60e-02

Pos logit contributions
ona+0.021
+0.020
ink+0.020
loe+0.020
 total+0.020
Neg logit contributions
 for-0.010
 he-0.009
 back-0.008
 worse-0.008
 inside-0.008
Token on
Token ID319
Feature activation+4.80e-02

Pos logit contributions
+0.056
ink+0.056
ona+0.056
 total+0.054
loe+0.053
Neg logit contributions
 worse-0.026
der-0.024
 for-0.022
 back-0.022
 altogether-0.021
Token his
Token ID465
Feature activation+1.59e-03

Pos logit contributions
ona+0.111
+0.110
ink+0.109
 total+0.107
een+0.106
Neg logit contributions
 worse-0.035
der-0.032
 for-0.031
 back-0.031
 altogether-0.027
Token face
Token ID1986
Feature activation-7.89e-03

Pos logit contributions
ink+0.003
 blonde+0.003
+0.003
ona+0.003
ates+0.003
Neg logit contributions
 worse-0.002
der-0.002
hot-0.002
ger-0.002
 smoothly-0.002
Token.
Token ID13
Feature activation-6.09e-02

Pos logit contributions
 worse+0.004
der+0.004
 for+0.003
 back+0.003
 altogether+0.003
Neg logit contributions
-0.020
ona-0.020
ink-0.019
 total-0.019
loe-0.019
Token<|endoftext|>
Token ID50256
Feature activation+1.30e-01

Pos logit contributions
 worse+0.058
der+0.057
 for+0.056
 back+0.054
 altogether+0.050
Neg logit contributions
ona-0.123
ink-0.121
-0.121
 total-0.118
loe-0.117
TokenOnce
Token ID7454
Feature activation+1.79e-01

Pos logit contributions
ona+0.283
 blonde+0.267
 total+0.264
ink+0.263
elve+0.259
Neg logit contributions
 back-0.118
 for-0.116
 inside-0.112
 around-0.109
 worse-0.105
Token there
Token ID612
Feature activation+6.84e-02

Pos logit contributions
+0.123
ona+0.122
ink+0.117
 total+0.110
loe+0.107
Neg logit contributions
 worse-0.412
der-0.401
 for-0.399
 back-0.395
 altogether-0.382
Token was
Token ID373
Feature activation+1.16e-01

Pos logit contributions
ona+0.153
ink+0.148
+0.146
icians+0.146
 blonde+0.144
Neg logit contributions
 for-0.061
 back-0.061
 he-0.057
 inside-0.053
 worse-0.052
Token a
Token ID257
Feature activation+5.26e-02

Pos logit contributions
ona+0.204
+0.203
ink+0.200
 total+0.196
loe+0.193
Neg logit contributions
 worse-0.146
der-0.138
 for-0.137
 back-0.134
 altogether-0.126
Token little
Token ID1310
Feature activation+5.87e-02

Pos logit contributions
+0.072
ona+0.072
ink+0.071
 total+0.069
loe+0.068
Neg logit contributions
 worse-0.085
der-0.082
 for-0.081
 back-0.080
 altogether-0.076
Token girl
Token ID2576
Feature activation+4.43e-02

Pos logit contributions
ona+0.112
+0.112
ink+0.110
 total+0.107
ere+0.107
Neg logit contributions
 worse-0.065
der-0.063
 altogether-0.058
controlled-0.054
 back-0.053
Token best
Token ID1266
Feature activation+3.22e-02

Pos logit contributions
 worse+0.026
 back+0.024
 for+0.024
 even+0.023
 hospital+0.023
Neg logit contributions
ink-0.031
-0.030
ona-0.030
een-0.030
loe-0.030
Token of
Token ID286
Feature activation+1.40e-02

Pos logit contributions
+0.045
ona+0.044
loe+0.042
ink+0.041
ere+0.041
Neg logit contributions
 worse-0.052
 altogether-0.048
der-0.047
 grave-0.047
 smoothly-0.047
Token friends
Token ID2460
Feature activation+7.05e-03

Pos logit contributions
ona+0.021
+0.021
ink+0.020
 total+0.020
een+0.020
Neg logit contributions
 worse-0.022
 for-0.022
 back-0.021
der-0.021
 even-0.020
Token.
Token ID13
Feature activation-6.24e-02

Pos logit contributions
+0.015
ona+0.014
ink+0.014
loe+0.014
 total+0.014
Neg logit contributions
 worse-0.006
der-0.005
 grave-0.005
 trigger-0.005
controlled-0.005
Token<|endoftext|>
Token ID50256
Feature activation+1.26e-01

Pos logit contributions
 worse+0.057
der+0.053
 for+0.051
 back+0.050
 altogether+0.046
Neg logit contributions
-0.131
ona-0.130
ink-0.128
 total-0.126
loe-0.125
TokenOnce
Token ID7454
Feature activation+1.78e-01

Pos logit contributions
ona+0.277
 blonde+0.264
 total+0.262
ink+0.259
elve+0.258
Neg logit contributions
 back-0.105
 for-0.104
 inside-0.101
.-0.097
 around-0.097
Token upon
Token ID2402
Feature activation+1.53e-01

Pos logit contributions
+0.122
ona+0.121
ink+0.116
 total+0.109
loe+0.106
Neg logit contributions
 worse-0.412
der-0.401
 for-0.399
 back-0.395
 altogether-0.383
Token a
Token ID257
Feature activation+6.48e-02

Pos logit contributions
+0.278
ona+0.278
ink+0.273
 total+0.266
loe+0.265
Neg logit contributions
 worse-0.182
der-0.171
 for-0.169
 back-0.167
 altogether-0.155
Token time
Token ID640
Feature activation+1.11e-01

Pos logit contributions
ona+0.083
icians+0.082
used+0.079
+0.078
ink+0.078
Neg logit contributions
 for-0.115
 back-0.108
 even-0.105
 worse-0.102
 bad-0.102
Token,
Token ID11
Feature activation+5.33e-02

Pos logit contributions
ona+0.278
icians+0.269
ates+0.264
 total+0.264
 blonde+0.262
Neg logit contributions
 for-0.086
 back-0.070
 he-0.063
 inside-0.063
.-0.060
Token there
Token ID612
Feature activation+6.95e-02

Pos logit contributions
+0.086
ona+0.084
ink+0.081
loe+0.079
een+0.078
Neg logit contributions
 for-0.081
 worse-0.079
 back-0.079
 inside-0.077
 around-0.073
Token new
Token ID649
Feature activation-2.34e-02

Pos logit contributions
+0.036
ona+0.036
ink+0.035
 total+0.034
loe+0.034
Neg logit contributions
 worse-0.031
der-0.030
 for-0.030
 back-0.029
 altogether-0.028
Token spot
Token ID4136
Feature activation+6.43e-02

Pos logit contributions
 worse+0.028
 for+0.027
der+0.026
 back+0.026
 altogether+0.024
Neg logit contributions
-0.042
ona-0.042
ink-0.042
loe-0.041
 total-0.040
Token.
Token ID13
Feature activation-5.32e-02

Pos logit contributions
+0.153
ona+0.153
ink+0.149
 total+0.148
loe+0.147
Neg logit contributions
 worse-0.040
 for-0.039
der-0.037
 back-0.037
 altogether-0.030
Token\n
Token ID198
Feature activation+1.24e-01

Pos logit contributions
 for+0.038
der+0.038
 worse+0.037
 back+0.037
 he+0.035
Neg logit contributions
ink-0.116
ona-0.116
-0.114
 total-0.113
ere-0.111
Token\n
Token ID198
Feature activation+1.25e-01

Pos logit contributions
+0.253
ink+0.246
ona+0.243
 blonde+0.239
 total+0.238
Neg logit contributions
 for-0.115
 worse-0.115
 back-0.112
 he-0.104
 even-0.103
TokenOnce
Token ID7454
Feature activation+1.78e-01

Pos logit contributions
+0.205
ink+0.199
ona+0.192
 blonde+0.188
 total+0.184
Neg logit contributions
 worse-0.165
 smoothly-0.157
 he-0.156
 even-0.155
 back-0.155
Token Robert
Token ID5199
Feature activation+6.91e-02

Pos logit contributions
ona+0.277
ink+0.272
+0.267
loe+0.264
 total+0.263
Neg logit contributions
 worse-0.260
der-0.250
controlled-0.229
 grave-0.228
 for-0.227
Token reached
Token ID4251
Feature activation+8.56e-02

Pos logit contributions
+0.156
loe+0.147
 blonde+0.143
 total+0.142
ates+0.141
Neg logit contributions
 for-0.062
 back-0.060
 he-0.059
 inside-0.057
 even-0.056
Token the
Token ID262
Feature activation-1.55e-02

Pos logit contributions
+0.191
ona+0.187
ink+0.187
loe+0.181
 total+0.180
Neg logit contributions
 for-0.074
 back-0.071
 worse-0.070
der-0.062
 inside-0.060
Token top
Token ID1353
Feature activation+5.29e-03

Pos logit contributions
 worse+0.021
 for+0.020
 back+0.020
der+0.020
 hospital+0.019
Neg logit contributions
-0.026
ona-0.026
ink-0.025
loe-0.025
ere-0.024
Token of
Token ID286
Feature activation+2.15e-02

Pos logit contributions
ona+0.010
+0.010
loe+0.010
ink+0.010
ere+0.010
Neg logit contributions
 for-0.006
 back-0.005
 worse-0.005
 he-0.005
der-0.005
Token all
Token ID477
Feature activation-5.07e-02

Pos logit contributions
 worse+0.011
der+0.010
 for+0.009
 back+0.009
 altogether+0.009
Neg logit contributions
-0.027
ona-0.027
ink-0.026
 total-0.026
loe-0.026
Token by
Token ID416
Feature activation+2.94e-02

Pos logit contributions
 worse+0.053
der+0.051
 back+0.046
 for+0.046
 altogether+0.044
Neg logit contributions
ona-0.097
-0.097
ink-0.096
 total-0.095
ere-0.094
Token herself
Token ID5223
Feature activation-4.78e-02

Pos logit contributions
+0.072
ona+0.071
ink+0.070
 total+0.068
 blonde+0.067
Neg logit contributions
 for-0.021
der-0.019
 worse-0.019
 even-0.017
 back-0.016
Token.
Token ID13
Feature activation-7.15e-02

Pos logit contributions
 worse+0.026
der+0.022
 back+0.020
 for+0.020
 altogether+0.017
Neg logit contributions
-0.116
ona-0.116
ink-0.115
 total-0.113
ere-0.112
Token<|endoftext|>
Token ID50256
Feature activation+1.29e-01

Pos logit contributions
 worse+0.126
der+0.112
controlled+0.110
 back+0.109
ger+0.108
Neg logit contributions
-0.102
Once-0.092
ona-0.091
loe-0.091
 total-0.086
TokenOnce
Token ID7454
Feature activation+1.78e-01

Pos logit contributions
ona+0.274
ink+0.260
 blonde+0.259
 total+0.256
elve+0.253
Neg logit contributions
 back-0.117
 for-0.115
 worse-0.109
 inside-0.109
 around-0.106
Token upon
Token ID2402
Feature activation+1.50e-01

Pos logit contributions
ona+0.098
+0.095
ink+0.092
 total+0.086
loe+0.084
Neg logit contributions
 worse-0.435
der-0.424
 back-0.408
 for-0.406
 altogether-0.403
Token a
Token ID257
Feature activation+6.62e-02

Pos logit contributions
+0.260
ona+0.260
ink+0.255
 total+0.251
loe+0.246
Neg logit contributions
 worse-0.186
der-0.178
 for-0.174
 back-0.171
 altogether-0.162
Token time
Token ID640
Feature activation+1.10e-01

Pos logit contributions
ona+0.082
icians+0.081
ink+0.079
ates+0.076
+0.076
Neg logit contributions
 for-0.121
 back-0.113
 worse-0.112
 even-0.110
 bad-0.108
Token,
Token ID11
Feature activation+5.48e-02

Pos logit contributions
ona+0.259
ates+0.247
icians+0.247
ere+0.245
ink+0.244
Neg logit contributions
 for-0.100
 back-0.085
 he-0.083
 they-0.074
 inside-0.072
Token there
Token ID612
Feature activation+6.89e-02

Pos logit contributions
ona+0.091
+0.091
ink+0.086
loe+0.084
een+0.084
Neg logit contributions
 for-0.081
 back-0.079
 worse-0.078
 inside-0.078
 around-0.073
Token a
Token ID257
Feature activation+2.28e-02

Pos logit contributions
 worse+0.030
der+0.028
 altogether+0.027
controlled+0.026
ger+0.026
Neg logit contributions
ona-0.050
-0.048
 total-0.048
 blonde-0.048
ink-0.047
Token happy
Token ID3772
Feature activation+2.65e-02

Pos logit contributions
ona+0.041
ink+0.041
+0.041
loe+0.039
een+0.039
Neg logit contributions
 worse-0.028
 for-0.026
 back-0.026
der-0.026
 hospital-0.024
Token heart
Token ID2612
Feature activation-4.61e-04

Pos logit contributions
+0.045
ona+0.045
 total+0.044
 blonde+0.043
ink+0.043
Neg logit contributions
 worse-0.034
der-0.032
controlled-0.030
 for-0.029
 back-0.029
Token.
Token ID13
Feature activation-6.73e-02

Pos logit contributions
 worse+0.000
der+0.000
controlled+0.000
 hospital+0.000
 altogether+0.000
Neg logit contributions
ink-0.001
-0.001
ona-0.001
 total-0.001
loe-0.001
Token<|endoftext|>
Token ID50256
Feature activation+1.26e-01

Pos logit contributions
 worse+0.059
der+0.055
 for+0.053
 back+0.052
 altogether+0.047
Neg logit contributions
ona-0.142
-0.142
ink-0.140
 total-0.138
loe-0.137
TokenOnce
Token ID7454
Feature activation+1.78e-01

Pos logit contributions
ona+0.271
 blonde+0.263
 total+0.258
ink+0.255
+0.255
Neg logit contributions
 back-0.113
 for-0.110
 inside-0.105
 around-0.102
.-0.098
Token upon
Token ID2402
Feature activation+1.57e-01

Pos logit contributions
+0.156
ona+0.148
ink+0.143
 blonde+0.138
een+0.138
Neg logit contributions
 for-0.387
 worse-0.382
 back-0.378
der-0.369
 inside-0.356
Token a
Token ID257
Feature activation+6.43e-02

Pos logit contributions
+0.276
ona+0.276
ink+0.272
 total+0.266
loe+0.263
Neg logit contributions
 worse-0.193
der-0.184
 for-0.180
 back-0.177
 altogether-0.167
Token time
Token ID640
Feature activation+1.13e-01

Pos logit contributions
ona+0.082
icians+0.079
used+0.078
+0.077
ink+0.076
Neg logit contributions
 for-0.116
 back-0.110
 even-0.105
 bad-0.104
 worse-0.104
Token,
Token ID11
Feature activation+5.19e-02

Pos logit contributions
ona+0.289
icians+0.282
 total+0.277
 blonde+0.277
ited+0.273
Neg logit contributions
 for-0.084
 back-0.067
 he-0.062
 inside-0.059
 from-0.053
Token there
Token ID612
Feature activation+6.81e-02

Pos logit contributions
+0.090
ona+0.088
loe+0.083
ink+0.082
een+0.081
Neg logit contributions
 for-0.077
 back-0.074
 inside-0.073
 worse-0.073
 around-0.068
Token happy
Token ID3772
Feature activation+2.00e-02

Pos logit contributions
+0.233
ona+0.221
 total+0.220
elve+0.216
een+0.214
Neg logit contributions
 back-0.091
 worse-0.088
 for-0.087
 even-0.084
 inside-0.077
Token and
Token ID290
Feature activation-3.56e-02

Pos logit contributions
+0.042
ona+0.041
ink+0.040
 total+0.039
Once+0.039
Neg logit contributions
 worse-0.018
controlled-0.017
der-0.015
 trigger-0.015
ger-0.015
Token warm
Token ID5814
Feature activation+3.82e-03

Pos logit contributions
 worse+0.046
der+0.045
 altogether+0.041
 for+0.039
controlled+0.039
Neg logit contributions
ink-0.061
ona-0.060
-0.060
 blonde-0.057
een-0.056
Token.
Token ID13
Feature activation-7.30e-02

Pos logit contributions
+0.008
ink+0.008
ona+0.008
 blonde+0.008
een+0.007
Neg logit contributions
 worse-0.004
der-0.003
controlled-0.003
 hospital-0.003
 arrow-0.003
Token<|endoftext|>
Token ID50256
Feature activation+1.26e-01

Pos logit contributions
 worse+0.092
der+0.079
 for+0.077
 back+0.077
controlled+0.074
Neg logit contributions
-0.137
ona-0.130
ink-0.128
loe-0.127
 blonde-0.127
TokenOnce
Token ID7454
Feature activation+1.78e-01

Pos logit contributions
ona+0.262
 blonde+0.250
 total+0.248
ink+0.248
+0.243
Neg logit contributions
 back-0.121
 for-0.118
 worse-0.115
 inside-0.112
 around-0.109
Token upon
Token ID2402
Feature activation+1.53e-01

Pos logit contributions
ona+0.111
+0.110
ink+0.106
 total+0.100
loe+0.096
Neg logit contributions
 worse-0.421
der-0.411
 for-0.401
 back-0.399
 altogether-0.391
Token a
Token ID257
Feature activation+6.22e-02

Pos logit contributions
+0.250
ona+0.250
ink+0.245
 total+0.244
 blonde+0.237
Neg logit contributions
 worse-0.200
der-0.195
 for-0.187
 back-0.182
 altogether-0.177
Token time
Token ID640
Feature activation+1.11e-01

Pos logit contributions
ona+0.080
icians+0.079
ink+0.075
used+0.075
ates+0.074
Neg logit contributions
 for-0.111
 back-0.106
 worse-0.102
 bad-0.100
 even-0.100
Token,
Token ID11
Feature activation+5.04e-02

Pos logit contributions
ona+0.264
icians+0.253
 blonde+0.251
 total+0.250
ates+0.246
Neg logit contributions
 for-0.098
 back-0.085
 he-0.079
 inside-0.073
.-0.070
Token there
Token ID612
Feature activation+6.64e-02

Pos logit contributions
+0.080
ona+0.078
ink+0.075
loe+0.075
een+0.073
Neg logit contributions
 for-0.078
 worse-0.077
 back-0.075
 inside-0.073
der-0.069
Token You
Token ID921
Feature activation-3.90e-02

Pos logit contributions
 worse+0.125
 for+0.112
der+0.110
 back+0.109
 grave+0.102
Neg logit contributions
-0.229
ona-0.222
ink-0.220
loe-0.219
 total-0.213
Token should
Token ID815
Feature activation-2.09e-02

Pos logit contributions
 worse+0.042
 for+0.040
 back+0.039
der+0.036
 fix+0.036
Neg logit contributions
ona-0.078
-0.077
loe-0.075
ink-0.074
 blonde-0.073
Token not
Token ID407
Feature activation-4.65e-02

Pos logit contributions
der+0.028
 worse+0.027
 altogether+0.025
 for+0.024
taker+0.024
Neg logit contributions
-0.035
ona-0.034
ink-0.034
 total-0.033
loe-0.033
Token take
Token ID1011
Feature activation-3.25e-02

Pos logit contributions
der+0.056
 worse+0.056
 for+0.052
 back+0.051
 altogether+0.050
Neg logit contributions
-0.082
ona-0.081
 total-0.080
ink-0.080
een-0.078
Token things
Token ID1243
Feature activation-7.97e-02

Pos logit contributions
 worse+0.033
 for+0.030
der+0.030
 back+0.030
 altogether+0.027
Neg logit contributions
-0.066
ona-0.066
ink-0.065
 total-0.063
loe-0.063
Token that
Token ID326
Feature activation-1.55e-01

Pos logit contributions
 pavement+0.082
 grave+0.080
der+0.075
attered+0.075
Acc+0.074
Neg logit contributions
-0.127
ink-0.122
if-0.121
Once-0.118
ere-0.114
Token are
Token ID389
Feature activation+1.50e-02

Pos logit contributions
 worse+0.186
der+0.175
 for+0.174
 back+0.170
 altogether+0.158
Neg logit contributions
ona-0.280
-0.280
ink-0.275
 total-0.269
loe-0.267
Token not
Token ID407
Feature activation-6.30e-02

Pos logit contributions
 total+0.024
ona+0.023
+0.023
ink+0.022
ere+0.022
Neg logit contributions
der-0.019
 worse-0.019
 altogether-0.018
controlled-0.018
 hospital-0.017
Token yours
Token ID12431
Feature activation-7.29e-02

Pos logit contributions
der+0.073
 worse+0.073
 altogether+0.067
 hospital+0.066
 uncontroll+0.064
Neg logit contributions
 total-0.108
ona-0.105
ere-0.104
een-0.101
ink-0.100
Token!
Token ID0
Feature activation-1.15e-01

Pos logit contributions
 worse+0.068
der+0.063
 hospital+0.057
 altogether+0.056
 for+0.055
Neg logit contributions
-0.148
ink-0.146
ona-0.146
 total-0.143
een-0.141
Token You
Token ID921
Feature activation-4.14e-02

Pos logit contributions
 worse+0.134
 for+0.122
 back+0.119
der+0.119
 grave+0.114
Neg logit contributions
-0.218
loe-0.211
ink-0.209
ona-0.208
een-0.200
Token You
Token ID921
Feature activation-3.61e-02

Pos logit contributions
 worse+0.147
 for+0.136
 back+0.135
der+0.126
 altogether+0.126
Neg logit contributions
-0.202
loe-0.196
ink-0.191
ona-0.190
 blonde-0.186
Token should
Token ID815
Feature activation-2.08e-02

Pos logit contributions
 worse+0.045
der+0.043
 for+0.042
 back+0.042
 altogether+0.040
Neg logit contributions
-0.063
ona-0.062
ink-0.062
 total-0.061
loe-0.059
Token not
Token ID407
Feature activation-4.66e-02

Pos logit contributions
der+0.029
 worse+0.029
 altogether+0.027
 back+0.026
 for+0.026
Neg logit contributions
ona-0.033
-0.033
ink-0.032
loe-0.032
 total-0.031
Token take
Token ID1011
Feature activation-3.09e-02

Pos logit contributions
der+0.057
 worse+0.057
 for+0.053
 back+0.053
 altogether+0.051
Neg logit contributions
-0.080
ona-0.079
 total-0.079
ink-0.078
ere-0.076
Token things
Token ID1243
Feature activation-7.84e-02

Pos logit contributions
 worse+0.033
der+0.031
 for+0.030
 back+0.029
 altogether+0.028
Neg logit contributions
ona-0.059
-0.059
ink-0.058
 total-0.058
loe-0.057
Token that
Token ID326
Feature activation-1.53e-01

Pos logit contributions
 grave+0.081
 pavement+0.080
 villains+0.079
dies+0.079
ger+0.077
Neg logit contributions
-0.128
Once-0.116
ink-0.115
if-0.115
ere-0.112
Token are
Token ID389
Feature activation+1.62e-02

Pos logit contributions
 worse+0.193
der+0.186
 for+0.179
 back+0.177
 altogether+0.169
Neg logit contributions
-0.262
ona-0.261
ink-0.257
 total-0.252
loe-0.248
Token not
Token ID407
Feature activation-6.36e-02

Pos logit contributions
 total+0.024
ona+0.023
+0.022
ink+0.022
ere+0.022
Neg logit contributions
der-0.022
 worse-0.022
 altogether-0.021
controlled-0.021
ger-0.021
Token yours
Token ID12431
Feature activation-7.31e-02

Pos logit contributions
der+0.079
 worse+0.078
 altogether+0.075
 hospital+0.074
chair+0.073
Neg logit contributions
 total-0.101
ona-0.096
ere-0.096
 blonde-0.093
ates-0.093
Token.
Token ID13
Feature activation-1.08e-01

Pos logit contributions
 worse+0.057
der+0.053
 arrow+0.049
taker+0.048
 grave+0.048
Neg logit contributions
-0.154
ink-0.150
ona-0.149
 total-0.148
een-0.146
Token And
Token ID843
Feature activation-1.13e-02

Pos logit contributions
 worse+0.131
 for+0.121
 back+0.121
der+0.117
 altogether+0.113
Neg logit contributions
-0.198
loe-0.193
ona-0.192
ink-0.192
 blonde-0.185
Token her
Token ID607
Feature activation-3.49e-02

Pos logit contributions
 worse+0.020
der+0.019
 for+0.019
 back+0.018
 altogether+0.017
Neg logit contributions
-0.036
ona-0.036
ink-0.035
 total-0.035
loe-0.034
Token things
Token ID1243
Feature activation-5.03e-02

Pos logit contributions
 back+0.047
 worse+0.046
 for+0.046
der+0.044
 inside+0.041
Neg logit contributions
ona-0.060
ink-0.058
-0.058
loe-0.057
een-0.055
Token away
Token ID1497
Feature activation-2.18e-02

Pos logit contributions
der+0.054
 hospital+0.043
controlled+0.042
ger+0.042
 worse+0.041
Neg logit contributions
-0.097
 total-0.091
ink-0.090
ere-0.089
Once-0.089
Token so
Token ID523
Feature activation-2.01e-02

Pos logit contributions
der+0.022
controlled+0.021
 worse+0.021
 hospital+0.021
 officer+0.020
Neg logit contributions
 total-0.038
-0.038
ink-0.037
ona-0.037
 blonde-0.036
Token nothing
Token ID2147
Feature activation-1.06e-01

Pos logit contributions
 officer+0.023
 grave+0.023
ashes+0.022
ger+0.022
controlled+0.022
Neg logit contributions
ink-0.034
-0.032
ere-0.031
 blonde-0.030
Will-0.030
Token bad
Token ID2089
Feature activation-1.52e-01

Pos logit contributions
bike+0.104
 officer+0.099
controlled+0.098
 automobile+0.098
 trigger+0.095
Neg logit contributions
if-0.182
-0.181
ere-0.177
 total-0.176
een-0.174
Token would
Token ID561
Feature activation-1.86e-02

Pos logit contributions
 worse+0.138
der+0.127
 grave+0.126
controlled+0.122
 hospital+0.121
Neg logit contributions
-0.292
ink-0.284
loe-0.284
ona-0.282
een-0.280
Token happen
Token ID1645
Feature activation-5.73e-02

Pos logit contributions
der+0.017
 worse+0.016
controlled+0.015
 back+0.014
ger+0.014
Neg logit contributions
-0.037
 total-0.037
ona-0.036
ink-0.036
een-0.035
Token again
Token ID757
Feature activation-9.06e-02

Pos logit contributions
 worse+0.040
der+0.037
controlled+0.036
 officer+0.036
 hospital+0.035
Neg logit contributions
-0.125
ink-0.124
ona-0.123
 total-0.122
ere-0.119
Token.
Token ID13
Feature activation-8.16e-02

Pos logit contributions
 worse+0.049
der+0.045
 for+0.039
 back+0.038
 hospital+0.035
Neg logit contributions
-0.219
ona-0.218
ink-0.218
 total-0.214
ere-0.210
Token<|endoftext|>
Token ID50256
Feature activation+1.21e-01

Pos logit contributions
 worse+0.158
der+0.152
ly+0.150
 trigger+0.150
 arrow+0.149
Neg logit contributions
-0.097
Once-0.095
loe-0.083
 total-0.080
ona-0.079
Token We
Token ID775
Feature activation-4.75e-02

Pos logit contributions
 worse+0.136
 altogether+0.124
der+0.123
 uncontroll+0.117
 hospital+0.115
Neg logit contributions
-0.168
Will-0.164
 total-0.161
loe-0.159
 blonde-0.158
Token will
Token ID481
Feature activation+4.95e-03

Pos logit contributions
der+0.072
 worse+0.070
 altogether+0.067
 uncontroll+0.065
 smoothly+0.064
Neg logit contributions
-0.069
ink-0.067
 total-0.066
ona-0.066
ere-0.064
Token not
Token ID407
Feature activation-4.23e-02

Pos logit contributions
ona+0.009
+0.009
ink+0.009
loe+0.009
ere+0.009
Neg logit contributions
 worse-0.006
 for-0.005
 back-0.005
der-0.005
 fix-0.005
Token take
Token ID1011
Feature activation-2.65e-02

Pos logit contributions
 worse+0.049
der+0.047
 for+0.046
 back+0.045
 altogether+0.043
Neg logit contributions
-0.077
ona-0.077
ink-0.076
 total-0.075
loe-0.074
Token things
Token ID1243
Feature activation-7.47e-02

Pos logit contributions
 worse+0.029
der+0.029
 for+0.027
 back+0.026
 altogether+0.025
Neg logit contributions
ona-0.049
-0.049
ink-0.048
 total-0.048
ere-0.047
Token that
Token ID326
Feature activation-1.52e-01

Pos logit contributions
 pavement+0.072
 uncontroll+0.069
 arrow+0.068
attered+0.067
ger+0.067
Neg logit contributions
-0.130
Once-0.122
ink-0.121
if-0.119
 total-0.119
Token are
Token ID389
Feature activation+1.61e-02

Pos logit contributions
 worse+0.189
der+0.182
 for+0.175
 back+0.173
 altogether+0.165
Neg logit contributions
-0.264
ona-0.263
ink-0.260
 total-0.254
loe-0.250
Token not
Token ID407
Feature activation-6.39e-02

Pos logit contributions
ona+0.027
 total+0.027
+0.027
ink+0.026
ere+0.026
Neg logit contributions
 worse-0.019
der-0.019
 back-0.018
 altogether-0.018
 for-0.018
Token ours
Token ID16903
Feature activation-7.16e-02

Pos logit contributions
der+0.066
 worse+0.065
 altogether+0.059
ger+0.058
controlled+0.057
Neg logit contributions
 total-0.118
ona-0.116
ere-0.112
ink-0.112
 blonde-0.111
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
 worse+0.053
der+0.047
controlled+0.043
 hospital+0.042
 arrow+0.042
Neg logit contributions
-0.155
ona-0.153
 total-0.152
ink-0.151
een-0.148
Token Please
Token ID4222
Feature activation+1.02e-02

Pos logit contributions
 worse+0.131
 altogether+0.115
der+0.111
 uncontroll+0.111
 back+0.111
Neg logit contributions
-0.180
 total-0.169
Will-0.168
loe-0.168
 blonde-0.168
Token You
Token ID921
Feature activation-3.23e-02

Pos logit contributions
 worse+0.131
 back+0.123
 for+0.122
der+0.121
 altogether+0.118
Neg logit contributions
-0.170
loe-0.164
 blonde-0.160
ink-0.160
ona-0.157
Token should
Token ID815
Feature activation-1.64e-02

Pos logit contributions
 worse+0.036
der+0.035
 for+0.033
 back+0.033
 altogether+0.031
Neg logit contributions
-0.060
ona-0.060
ink-0.059
 total-0.058
loe-0.057
Token not
Token ID407
Feature activation-4.61e-02

Pos logit contributions
 worse+0.023
der+0.023
 altogether+0.021
 for+0.021
 back+0.021
Neg logit contributions
-0.026
ona-0.025
ink-0.025
loe-0.024
 total-0.024
Token take
Token ID1011
Feature activation-3.40e-02

Pos logit contributions
der+0.058
 worse+0.057
 for+0.053
 back+0.052
 altogether+0.052
Neg logit contributions
-0.077
 total-0.077
ona-0.076
ink-0.076
ere-0.074
Token things
Token ID1243
Feature activation-7.54e-02

Pos logit contributions
 worse+0.039
der+0.039
 for+0.036
 altogether+0.035
 back+0.035
Neg logit contributions
ona-0.059
 total-0.059
-0.059
ink-0.058
ere-0.058
Token that
Token ID326
Feature activation-1.52e-01

Pos logit contributions
 pavement+0.087
 grave+0.083
dies+0.081
attered+0.080
 sidewalk+0.078
Neg logit contributions
-0.112
if-0.108
Once-0.105
ink-0.101
Will-0.098
Token are
Token ID389
Feature activation+1.73e-02

Pos logit contributions
 worse+0.197
der+0.194
 for+0.182
 back+0.181
 altogether+0.176
Neg logit contributions
-0.251
ona-0.248
ink-0.246
 total-0.241
een-0.238
Token not
Token ID407
Feature activation-6.57e-02

Pos logit contributions
 total+0.025
ona+0.024
ink+0.023
ere+0.023
+0.023
Neg logit contributions
der-0.024
 altogether-0.023
controlled-0.023
 worse-0.023
ger-0.023
Token yours
Token ID12431
Feature activation-7.47e-02

Pos logit contributions
der+0.087
 worse+0.085
 altogether+0.083
 hospital+0.081
chair+0.081
Neg logit contributions
 total-0.098
ere-0.094
ona-0.092
ates-0.091
 blonde-0.090
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
 worse+0.058
der+0.054
taker+0.048
 altogether+0.047
 arrow+0.047
Neg logit contributions
-0.160
ink-0.157
ona-0.155
 total-0.153
een-0.152
Token You
Token ID921
Feature activation-3.69e-02

Pos logit contributions
 worse+0.129
 for+0.121
 back+0.121
der+0.116
 altogether+0.113
Neg logit contributions
-0.182
loe-0.177
ink-0.176
ona-0.174
 blonde-0.170
Token I
Token ID314
Feature activation-5.05e-02

Pos logit contributions
 worse+0.066
der+0.064
 for+0.060
 back+0.059
 altogether+0.058
Neg logit contributions
-0.098
ona-0.097
ink-0.096
 total-0.094
loe-0.094
Token won
Token ID1839
Feature activation+1.35e-02

Pos logit contributions
 worse+0.066
 for+0.065
 back+0.063
der+0.060
 even+0.059
Neg logit contributions
ona-0.087
-0.086
ink-0.084
 blonde-0.084
 total-0.083
Token't
Token ID470
Feature activation-2.40e-02

Pos logit contributions
ona+0.014
+0.013
 total+0.013
ink+0.012
 blonde+0.012
Neg logit contributions
 worse-0.028
 back-0.027
 for-0.027
der-0.027
 hospital-0.025
Token let
Token ID1309
Feature activation-4.19e-02

Pos logit contributions
der+0.027
 worse+0.026
controlled+0.025
 for+0.025
 back+0.024
Neg logit contributions
-0.043
ona-0.042
ink-0.041
 total-0.041
een-0.040
Token anything
Token ID1997
Feature activation-1.18e-01

Pos logit contributions
der+0.053
 worse+0.053
 for+0.049
 altogether+0.048
 back+0.048
Neg logit contributions
-0.069
ona-0.068
ink-0.068
loe-0.067
 total-0.066
Token bad
Token ID2089
Feature activation-1.51e-01

Pos logit contributions
der+0.105
ger+0.098
controlled+0.098
 worse+0.098
 officer+0.096
Neg logit contributions
-0.217
ona-0.217
ink-0.217
 total-0.214
een-0.208
Token happen
Token ID1645
Feature activation-6.32e-02

Pos logit contributions
 worse+0.177
der+0.169
 altogether+0.158
controlled+0.157
 back+0.155
Neg logit contributions
ona-0.262
-0.259
ink-0.257
 total-0.251
loe-0.250
Token."
Token ID526
Feature activation-3.21e-02

Pos logit contributions
 worse+0.050
der+0.046
 for+0.045
 back+0.044
 altogether+0.039
Neg logit contributions
-0.140
ona-0.140
ink-0.138
 total-0.136
loe-0.135
Token
Token ID220
Feature activation+1.31e-01

Pos logit contributions
 worse+0.043
der+0.042
 for+0.042
 back+0.041
 altogether+0.039
Neg logit contributions
ona-0.053
ink-0.052
-0.051
 total-0.050
ere-0.049
Token\n
Token ID198
Feature activation+1.15e-01

Pos logit contributions
ona+0.240
 total+0.233
ink+0.229
+0.228
 blonde+0.228
Neg logit contributions
 worse-0.155
 for-0.148
der-0.148
 back-0.147
 altogether-0.135
Token
Token ID198
Feature activation+1.15e-01

Pos logit contributions
ona+0.223
ink+0.218
 total+0.218
+0.217
 blonde+0.216
Neg logit contributions
 worse-0.118
 back-0.115
 for-0.113
der-0.108
 he-0.106
Token.
Token ID13
Feature activation-1.11e-01

Pos logit contributions
 worse+0.016
der+0.015
 back+0.015
 for+0.014
ger+0.014
Neg logit contributions
ona-0.024
-0.023
ink-0.023
 total-0.023
loe-0.023
Token Don
Token ID2094
Feature activation-3.35e-02

Pos logit contributions
der+0.094
 worse+0.093
 for+0.089
 back+0.086
 fix+0.079
Neg logit contributions
ona-0.232
ink-0.229
 total-0.227
-0.226
ere-0.223
Token't
Token ID470
Feature activation-1.37e-02

Pos logit contributions
 worse+0.061
 for+0.057
 back+0.056
 hospital+0.055
 he+0.054
Neg logit contributions
 blonde-0.049
ona-0.047
iley-0.044
agg-0.043
ates-0.042
Token take
Token ID1011
Feature activation-2.25e-02

Pos logit contributions
der+0.019
 worse+0.018
 for+0.017
 back+0.017
ger+0.017
Neg logit contributions
-0.021
 total-0.021
ona-0.021
ink-0.020
 blonde-0.020
Token things
Token ID1243
Feature activation-7.34e-02

Pos logit contributions
 worse+0.025
 for+0.023
 back+0.023
der+0.022
 hospital+0.020
Neg logit contributions
-0.045
ona-0.045
ink-0.044
loe-0.043
 total-0.042
Token that
Token ID326
Feature activation-1.51e-01

Pos logit contributions
der+0.064
 pavement+0.060
 lesson+0.059
 uncontroll+0.059
ger+0.058
Neg logit contributions
-0.140
ink-0.136
if-0.130
ona-0.129
 total-0.128
Token are
Token ID389
Feature activation+1.41e-02

Pos logit contributions
 worse+0.190
der+0.186
 for+0.176
 back+0.174
 altogether+0.169
Neg logit contributions
-0.259
ona-0.258
ink-0.256
 total-0.249
een-0.246
Token not
Token ID407
Feature activation-6.46e-02

Pos logit contributions
 total+0.023
ona+0.023
ink+0.022
+0.022
ere+0.021
Neg logit contributions
der-0.018
 worse-0.017
 altogether-0.017
controlled-0.016
ger-0.016
Token yours
Token ID12431
Feature activation-7.16e-02

Pos logit contributions
der+0.078
 worse+0.075
 altogether+0.072
chair+0.071
 hospital+0.071
Neg logit contributions
 total-0.107
ona-0.102
ere-0.101
 blonde-0.100
ink-0.098
Token.
Token ID13
Feature activation-1.14e-01

Pos logit contributions
 worse+0.056
der+0.051
 altogether+0.044
 hospital+0.044
taker+0.044
Neg logit contributions
-0.155
ona-0.154
ink-0.154
 total-0.151
een-0.149
Token And
Token ID843
Feature activation-1.58e-02

Pos logit contributions
 worse+0.105
der+0.096
 for+0.095
 back+0.093
 altogether+0.085
Neg logit contributions
-0.240
ona-0.238
ink-0.234
loe-0.230
 total-0.229
Token was
Token ID373
Feature activation+2.44e-02

Pos logit contributions
 worse+0.067
 for+0.064
 back+0.060
 hospital+0.054
 around+0.053
Neg logit contributions
ona-0.209
-0.206
loe-0.201
 blonde-0.201
 total-0.199
Token bad
Token ID2089
Feature activation-1.19e-01

Pos logit contributions
+0.054
ona+0.052
loe+0.052
ink+0.051
een+0.051
Neg logit contributions
 for-0.028
 worse-0.026
 back-0.025
 inside-0.023
 even-0.023
Token to
Token ID284
Feature activation-2.18e-02

Pos logit contributions
 worse+0.103
 for+0.102
der+0.099
 back+0.093
 inside+0.082
Neg logit contributions
-0.259
ona-0.256
ink-0.253
 total-0.250
loe-0.246
Token take
Token ID1011
Feature activation-3.13e-02

Pos logit contributions
 worse+0.022
 back+0.021
 for+0.021
 fix+0.020
 hospital+0.020
Neg logit contributions
-0.044
ona-0.043
 total-0.043
ink-0.043
loe-0.043
Token things
Token ID1243
Feature activation-7.14e-02

Pos logit contributions
 worse+0.028
 for+0.027
 back+0.026
der+0.024
 hospital+0.023
Neg logit contributions
-0.068
ona-0.068
ink-0.066
loe-0.065
 total-0.064
Token that
Token ID326
Feature activation-1.51e-01

Pos logit contributions
der+0.053
 worse+0.047
ger+0.045
 uncontroll+0.042
 grave+0.042
Neg logit contributions
-0.152
ink-0.151
ere-0.143
ona-0.143
 total-0.143
Token weren
Token ID6304
Feature activation+2.23e-02

Pos logit contributions
 worse+0.168
 for+0.158
 back+0.153
der+0.151
 hospital+0.140
Neg logit contributions
ona-0.292
-0.289
ink-0.284
loe-0.279
 total-0.279
Token't
Token ID470
Feature activation-4.69e-02

Pos logit contributions
ona+0.022
 blonde+0.021
+0.020
een+0.020
ink+0.020
Neg logit contributions
 worse-0.046
 for-0.045
 back-0.044
der-0.044
 even-0.042
Token yours
Token ID12431
Feature activation-7.51e-02

Pos logit contributions
 worse+0.051
der+0.049
 back+0.045
 for+0.045
 altogether+0.044
Neg logit contributions
ona-0.087
ink-0.085
 total-0.085
-0.084
ere-0.083
Token."
Token ID526
Feature activation-4.02e-02

Pos logit contributions
 worse+0.062
 for+0.058
der+0.058
 back+0.056
 altogether+0.050
Neg logit contributions
ona-0.163
-0.163
ink-0.161
 total-0.158
loe-0.157
Token Jill
Token ID22772
Feature activation+2.99e-02

Pos logit contributions
 worse+0.057
der+0.055
 for+0.053
 back+0.053
 altogether+0.050
Neg logit contributions
-0.064
ona-0.064
ink-0.062
 total-0.061
loe-0.060
Token.
Token ID13
Feature activation-8.86e-02

Pos logit contributions
attered+0.075
 trigger+0.075
 grave+0.073
taker+0.073
 rubble+0.073
Neg logit contributions
if-0.106
 total-0.103
ona-0.102
Once-0.102
een-0.102
Token They
Token ID1119
Feature activation-3.44e-02

Pos logit contributions
 worse+0.105
 back+0.087
der+0.087
 for+0.086
ly+0.085
Neg logit contributions
-0.173
ona-0.167
ink-0.163
 total-0.163
 blonde-0.162
Token never
Token ID1239
Feature activation-1.87e-03

Pos logit contributions
 worse+0.050
der+0.049
controlled+0.045
attered+0.044
 altogether+0.044
Neg logit contributions
-0.054
ona-0.053
 total-0.052
ink-0.052
Once-0.050
Token take
Token ID1011
Feature activation-2.73e-02

Pos logit contributions
der+0.003
 worse+0.003
taker+0.003
 for+0.003
ger+0.002
Neg logit contributions
 total-0.003
ona-0.003
-0.003
ere-0.003
ink-0.003
Token things
Token ID1243
Feature activation-7.67e-02

Pos logit contributions
 worse+0.031
der+0.030
 for+0.029
 back+0.028
 altogether+0.027
Neg logit contributions
-0.050
ona-0.049
 total-0.049
ink-0.049
ere-0.048
Token that
Token ID326
Feature activation-1.50e-01

Pos logit contributions
der+0.071
taker+0.071
ometer+0.071
attered+0.070
 pavement+0.070
Neg logit contributions
-0.144
Once-0.128
 total-0.121
Will-0.119
if-0.119
Token are
Token ID389
Feature activation+1.22e-02

Pos logit contributions
der+0.214
 worse+0.205
 back+0.189
 for+0.185
 altogether+0.185
Neg logit contributions
-0.240
 total-0.223
ink-0.221
ona-0.221
een-0.216
Token not
Token ID407
Feature activation-6.88e-02

Pos logit contributions
 total+0.021
ona+0.020
+0.020
ink+0.019
ere+0.019
Neg logit contributions
der-0.014
 worse-0.014
controlled-0.013
 grave-0.013
ger-0.013
Token theirs
Token ID22021
Feature activation-1.07e-01

Pos logit contributions
der+0.052
 worse+0.048
 grave+0.045
ger+0.045
 hospital+0.044
Neg logit contributions
 total-0.145
ere-0.139
ona-0.138
 blonde-0.138
ates-0.135
Token again
Token ID757
Feature activation-1.18e-01

Pos logit contributions
 worse+0.080
controlled+0.073
der+0.070
taker+0.068
 arrow+0.067
Neg logit contributions
-0.234
ona-0.226
 total-0.223
ink-0.221
een-0.218
Token.
Token ID13
Feature activation-9.32e-02

Pos logit contributions
 worse+0.052
der+0.044
 for+0.042
 back+0.040
 altogether+0.032
Neg logit contributions
ona-0.305
-0.304
ink-0.301
 total-0.296
loe-0.295
Token not
Token ID407
Feature activation-3.54e-02

Pos logit contributions
 total+0.025
ona+0.025
+0.024
ink+0.024
ere+0.023
Neg logit contributions
der-0.017
 worse-0.017
controlled-0.016
 altogether-0.016
 back-0.015
Token nice
Token ID3621
Feature activation-6.22e-02

Pos logit contributions
der+0.039
 worse+0.038
 altogether+0.035
controlled+0.035
 back+0.035
Neg logit contributions
 total-0.063
ona-0.063
ere-0.061
ink-0.061
-0.061
Token to
Token ID284
Feature activation-2.15e-02

Pos logit contributions
 worse+0.054
bike+0.051
controlled+0.051
 smoothly+0.050
 hospital+0.049
Neg logit contributions
ona-0.119
 total-0.117
ink-0.115
ere-0.114
-0.114
Token take
Token ID1011
Feature activation-2.83e-02

Pos logit contributions
 worse+0.024
der+0.023
 for+0.022
 back+0.022
 altogether+0.021
Neg logit contributions
-0.041
ona-0.041
ink-0.040
 total-0.039
loe-0.039
Token things
Token ID1243
Feature activation-7.19e-02

Pos logit contributions
 worse+0.031
der+0.030
 for+0.028
 back+0.027
 altogether+0.027
Neg logit contributions
ona-0.052
-0.052
ink-0.052
 total-0.051
ere-0.050
Token that
Token ID326
Feature activation-1.50e-01

Pos logit contributions
 pavement+0.074
Acc+0.069
 uncontroll+0.069
dies+0.069
der+0.068
Neg logit contributions
-0.111
ink-0.110
if-0.105
Once-0.105
 total-0.101
Token are
Token ID389
Feature activation+1.88e-02

Pos logit contributions
 worse+0.179
der+0.169
 for+0.167
 back+0.164
 altogether+0.153
Neg logit contributions
ona-0.272
-0.272
ink-0.267
 total-0.261
loe-0.260
Token not
Token ID407
Feature activation-6.23e-02

Pos logit contributions
ona+0.032
 total+0.032
+0.032
ink+0.031
ere+0.030
Neg logit contributions
 worse-0.022
der-0.022
 back-0.020
 altogether-0.020
 for-0.020
Token ours
Token ID16903
Feature activation-7.10e-02

Pos logit contributions
 worse+0.069
der+0.069
 altogether+0.064
 hospital+0.063
controlled+0.062
Neg logit contributions
 total-0.109
ona-0.108
ere-0.104
ink-0.103
 blonde-0.103
Token.
Token ID13
Feature activation-9.96e-02

Pos logit contributions
 worse+0.060
der+0.057
 altogether+0.051
controlled+0.051
 hospital+0.050
Neg logit contributions
-0.145
ona-0.144
ink-0.143
 total-0.142
een-0.139
Token Mrs
Token ID9074
Feature activation+3.61e-03

Pos logit contributions
 worse+0.118
 back+0.106
 altogether+0.106
 for+0.103
der+0.097
Neg logit contributions
-0.185
ink-0.181
loe-0.179
ona-0.176
Will-0.173
Token You
Token ID921
Feature activation-2.91e-02

Pos logit contributions
 worse+0.127
 for+0.115
 back+0.112
 even+0.112
der+0.111
Neg logit contributions
-0.182
ink-0.169
loe-0.168
ona-0.162
 blonde-0.160
Token should
Token ID815
Feature activation-1.65e-02

Pos logit contributions
 worse+0.032
 for+0.029
 back+0.028
der+0.028
 fix+0.026
Neg logit contributions
ona-0.058
-0.057
loe-0.056
ink-0.056
een-0.055
Token not
Token ID407
Feature activation-4.48e-02

Pos logit contributions
der+0.024
 worse+0.024
 altogether+0.023
taker+0.022
 for+0.022
Neg logit contributions
-0.025
ona-0.024
ink-0.024
 total-0.024
loe-0.023
Token take
Token ID1011
Feature activation-3.19e-02

Pos logit contributions
der+0.053
 worse+0.053
 for+0.049
 back+0.047
 altogether+0.047
Neg logit contributions
-0.081
ona-0.079
 total-0.079
ink-0.079
ere-0.076
Token things
Token ID1243
Feature activation-7.28e-02

Pos logit contributions
 worse+0.032
der+0.030
 for+0.029
 back+0.028
 altogether+0.026
Neg logit contributions
-0.064
ona-0.064
ink-0.063
 total-0.062
loe-0.061
Token that
Token ID326
Feature activation-1.50e-01

Pos logit contributions
 pavement+0.087
dies+0.079
 sidewalk+0.078
 uncontroll+0.076
attered+0.076
Neg logit contributions
ink-0.106
-0.106
if-0.106
Once-0.102
Will-0.098
Token are
Token ID389
Feature activation+1.75e-02

Pos logit contributions
 worse+0.190
der+0.186
 for+0.176
 back+0.174
 altogether+0.169
Neg logit contributions
-0.254
ona-0.252
ink-0.249
 total-0.243
een-0.240
Token not
Token ID407
Feature activation-6.13e-02

Pos logit contributions
 total+0.029
ona+0.028
ink+0.028
+0.027
ere+0.027
Neg logit contributions
der-0.021
 worse-0.020
 altogether-0.020
controlled-0.019
ger-0.019
Token yours
Token ID12431
Feature activation-7.11e-02

Pos logit contributions
der+0.080
 worse+0.080
 altogether+0.076
 hospital+0.075
 uncontroll+0.074
Neg logit contributions
 total-0.093
ona-0.089
ere-0.089
ates-0.087
ink-0.087
Token,"
Token ID553
Feature activation+4.14e-03

Pos logit contributions
 worse+0.064
der+0.064
 altogether+0.058
 hospital+0.057
 uncontroll+0.056
Neg logit contributions
ink-0.140
-0.139
ona-0.138
 total-0.136
een-0.133
Token Sara
Token ID24799
Feature activation+1.70e-02

Pos logit contributions
ona+0.007
+0.006
 blonde+0.006
iley+0.006
 total+0.006
Neg logit contributions
controlled-0.006
attered-0.005
der-0.005
 worse-0.005
 altogether-0.005
Token checks
Token ID8794
Feature activation-2.30e-02

Pos logit contributions
 worse+0.026
der+0.025
 for+0.024
 back+0.024
 altogether+0.022
Neg logit contributions
ona-0.045
-0.045
ink-0.045
 total-0.044
loe-0.043
Token if
Token ID611
Feature activation-6.77e-02

Pos logit contributions
 worse+0.024
der+0.023
 back+0.021
 for+0.020
 altogether+0.020
Neg logit contributions
ona-0.044
ink-0.044
-0.044
 total-0.043
loe-0.043
Token we
Token ID356
Feature activation-8.06e-02

Pos logit contributions
 worse+0.071
 for+0.066
der+0.065
 back+0.063
 hospital+0.058
Neg logit contributions
-0.135
ona-0.134
ink-0.132
 total-0.129
loe-0.128
Token have
Token ID423
Feature activation-4.33e-02

Pos logit contributions
 worse+0.091
 for+0.085
der+0.084
 back+0.083
 hospital+0.076
Neg logit contributions
ona-0.152
-0.151
ink-0.149
loe-0.145
 total-0.145
Token anything
Token ID1997
Feature activation-1.13e-01

Pos logit contributions
 worse+0.045
 for+0.043
 back+0.042
der+0.039
 inside+0.037
Neg logit contributions
-0.094
ona-0.090
ink-0.089
loe-0.088
ere-0.084
Token bad
Token ID2089
Feature activation-1.50e-01

Pos logit contributions
 worse+0.098
der+0.093
 back+0.086
 for+0.085
 altogether+0.083
Neg logit contributions
ona-0.234
-0.234
ink-0.232
 total-0.227
loe-0.223
Token in
Token ID287
Feature activation-1.01e-02

Pos logit contributions
 worse+0.123
der+0.111
 back+0.104
 for+0.102
 altogether+0.099
Neg logit contributions
ona-0.321
-0.318
ink-0.316
 total-0.310
een-0.308
Token our
Token ID674
Feature activation-4.85e-02

Pos logit contributions
 for+0.009
 worse+0.008
 back+0.008
 inside+0.007
 fix+0.007
Neg logit contributions
-0.024
ona-0.024
ink-0.023
agg-0.023
 blonde-0.022
Token pockets
Token ID16511
Feature activation+4.12e-03

Pos logit contributions
 worse+0.059
 back+0.059
 for+0.059
der+0.055
 hospital+0.055
Neg logit contributions
ona-0.087
-0.086
ink-0.084
loe-0.082
 total-0.082
Token or
Token ID393
Feature activation-8.40e-02

Pos logit contributions
+0.010
ona+0.010
ink+0.009
 total+0.009
loe+0.009
Neg logit contributions
 worse-0.003
der-0.002
 for-0.002
 back-0.002
 altogether-0.002
Token our
Token ID674
Feature activation-5.29e-02

Pos logit contributions
 worse+0.092
der+0.090
 back+0.083
 for+0.083
 altogether+0.082
Neg logit contributions
ona-0.152
 total-0.150
-0.150
ink-0.148
 blonde-0.146
Token Tim
Token ID5045
Feature activation-7.79e-03

Pos logit contributions
 grave+0.077
der+0.076
attered+0.074
controlled+0.073
ashes+0.072
Neg logit contributions
Will-0.163
Once-0.156
 total-0.155
een-0.151
ona-0.151
Token didn
Token ID1422
Feature activation+9.15e-03

Pos logit contributions
der+0.011
 worse+0.010
 altogether+0.010
 grave+0.010
controlled+0.010
Neg logit contributions
ink-0.012
-0.011
ona-0.011
 total-0.011
ere-0.011
Token't
Token ID470
Feature activation-2.26e-02

Pos logit contributions
+0.006
ink+0.006
ona+0.006
loe+0.005
 total+0.005
Neg logit contributions
 worse-0.021
der-0.021
 for-0.021
 back-0.021
 grave-0.020
Token want
Token ID765
Feature activation-4.27e-02

Pos logit contributions
der+0.035
ometer+0.031
controlled+0.031
 grave+0.030
attered+0.029
Neg logit contributions
 total-0.032
Once-0.032
ona-0.029
 blonde-0.028
re-0.028
Token anything
Token ID1997
Feature activation-1.22e-01

Pos logit contributions
der+0.044
 grave+0.041
taker+0.039
 worse+0.039
ger+0.038
Neg logit contributions
 total-0.078
een-0.074
 blonde-0.074
Will-0.074
ere-0.074
Token bad
Token ID2089
Feature activation-1.49e-01

Pos logit contributions
 trigger+0.139
 pavement+0.134
 grave+0.133
arp+0.133
attered+0.130
Neg logit contributions
re-0.183
Once-0.181
if-0.176
 blonde-0.173
 time-0.170
Token to
Token ID284
Feature activation-3.86e-02

Pos logit contributions
 grave+0.143
 rubble+0.139
der+0.138
 worse+0.135
 smoothly+0.133
Neg logit contributions
re-0.256
een-0.254
ael-0.253
ona-0.251
Once-0.250
Token happen
Token ID1645
Feature activation-5.45e-02

Pos logit contributions
der+0.046
 worse+0.046
 for+0.042
 back+0.042
 altogether+0.040
Neg logit contributions
-0.068
ona-0.067
ink-0.066
 total-0.066
een-0.065
Token to
Token ID284
Feature activation-1.51e-02

Pos logit contributions
controlled+0.050
 hospital+0.049
 officer+0.049
 grave+0.048
 arrow+0.048
Neg logit contributions
ink-0.099
ona-0.099
-0.098
if-0.097
 total-0.097
Token his
Token ID465
Feature activation-2.76e-02

Pos logit contributions
der+0.015
 worse+0.014
 grave+0.014
controlled+0.014
taker+0.013
Neg logit contributions
een-0.028
loe-0.028
 total-0.027
ona-0.027
ere-0.027
Token club
Token ID3430
Feature activation-3.07e-02

Pos logit contributions
der+0.051
 worse+0.050
chair+0.049
taker+0.049
hot+0.049
Neg logit contributions
 total-0.028
-0.027
ona-0.026
ink-0.025
ere-0.024
Token We
Token ID775
Feature activation-4.27e-02

Pos logit contributions
 worse+0.093
der+0.088
 for+0.085
 back+0.083
 hospital+0.076
Neg logit contributions
ona-0.229
ink-0.223
-0.223
 total-0.219
ere-0.217
Token don
Token ID836
Feature activation+4.38e-03

Pos logit contributions
 worse+0.051
 for+0.048
der+0.048
 back+0.047
 hospital+0.044
Neg logit contributions
ona-0.077
-0.077
ink-0.076
 total-0.074
loe-0.073
Token't
Token ID470
Feature activation-2.11e-02

Pos logit contributions
ona+0.004
 blonde+0.004
 total+0.004
een+0.004
ere+0.004
Neg logit contributions
 worse-0.009
 for-0.009
der-0.009
 back-0.008
 hospital-0.008
Token want
Token ID765
Feature activation-4.34e-02

Pos logit contributions
 worse+0.022
der+0.022
 back+0.019
 altogether+0.019
 for+0.019
Neg logit contributions
-0.041
ona-0.040
ink-0.040
 total-0.040
ere-0.039
Token anything
Token ID1997
Feature activation-1.22e-01

Pos logit contributions
 worse+0.031
der+0.031
 for+0.027
 altogether+0.026
ger+0.026
Neg logit contributions
-0.096
ona-0.096
 total-0.095
ink-0.094
 blonde-0.092
Token bad
Token ID2089
Feature activation-1.49e-01

Pos logit contributions
der+0.122
 worse+0.115
 altogether+0.115
ger+0.112
bike+0.109
Neg logit contributions
ona-0.214
-0.213
 total-0.212
ink-0.211
 blonde-0.206
Token to
Token ID284
Feature activation-4.44e-02

Pos logit contributions
 worse+0.145
der+0.132
 altogether+0.125
 back+0.122
 hospital+0.120
Neg logit contributions
ona-0.290
-0.286
ink-0.284
 total-0.279
een-0.279
Token happen
Token ID1645
Feature activation-5.68e-02

Pos logit contributions
 worse+0.049
 for+0.047
 back+0.045
der+0.043
 hospital+0.041
Neg logit contributions
ona-0.088
-0.086
ink-0.085
loe-0.083
 total-0.083
Token."
Token ID526
Feature activation-3.76e-02

Pos logit contributions
 worse+0.047
der+0.043
 for+0.041
 back+0.040
 hospital+0.038
Neg logit contributions
ona-0.122
-0.122
ink-0.121
 total-0.118
loe-0.117
Token
Token ID198
Feature activation+1.17e-01

Pos logit contributions
 worse+0.059
der+0.056
 for+0.054
 back+0.054
 altogether+0.052
Neg logit contributions
-0.056
ona-0.054
ink-0.053
loe-0.051
een-0.051
Token
Token ID198
Feature activation+1.16e-01

Pos logit contributions
ona+0.200
+0.199
ink+0.196
 total+0.192
 blonde+0.188
Neg logit contributions
 worse-0.150
 for-0.144
der-0.141
 back-0.140
 altogether-0.131
Token not
Token ID407
Feature activation-3.33e-02

Pos logit contributions
 worse+0.036
der+0.034
 for+0.032
 back+0.032
 altogether+0.030
Neg logit contributions
ona-0.062
-0.062
ink-0.061
 total-0.060
loe-0.059
Token to
Token ID284
Feature activation-1.91e-02

Pos logit contributions
 worse+0.023
der+0.022
 for+0.020
 back+0.020
 altogether+0.018
Neg logit contributions
-0.075
ona-0.075
ink-0.074
 total-0.074
loe-0.072
Token spray
Token ID11662
Feature activation-7.34e-02

Pos logit contributions
 worse+0.021
 for+0.020
 back+0.019
der+0.019
 fix+0.018
Neg logit contributions
ona-0.037
-0.037
ink-0.037
loe-0.036
 total-0.035
Token paint
Token ID7521
Feature activation-4.45e-02

Pos logit contributions
 worse+0.080
der+0.077
 for+0.072
 back+0.071
 altogether+0.068
Neg logit contributions
-0.137
ona-0.136
ink-0.135
 total-0.133
ere-0.131
Token anything
Token ID1997
Feature activation-1.19e-01

Pos logit contributions
 worse+0.043
der+0.041
 for+0.039
 back+0.038
 altogether+0.036
Neg logit contributions
-0.089
ona-0.089
ink-0.089
 total-0.086
ere-0.085
Token that
Token ID326
Feature activation-1.49e-01

Pos logit contributions
 worse+0.108
der+0.103
 back+0.098
 for+0.097
 altogether+0.095
Neg logit contributions
-0.235
ona-0.233
ink-0.233
 total-0.227
ere-0.222
Token shouldn
Token ID6584
Feature activation-8.14e-02

Pos logit contributions
 worse+0.164
 for+0.152
 back+0.149
der+0.146
 hospital+0.134
Neg logit contributions
ona-0.294
-0.293
ink-0.288
loe-0.282
 total-0.279
Token't
Token ID470
Feature activation-5.38e-02

Pos logit contributions
 worse+0.176
 for+0.171
der+0.170
 back+0.169
 hospital+0.162
Neg logit contributions
ona-0.069
-0.067
ink-0.065
 total-0.063
 blonde-0.062
Token be
Token ID307
Feature activation-2.13e-02

Pos logit contributions
 worse+0.064
der+0.062
controlled+0.059
 uncontroll+0.059
 altogether+0.058
Neg logit contributions
-0.091
ona-0.089
 total-0.088
ink-0.087
een-0.086
Token painted
Token ID13055
Feature activation-3.94e-02

Pos logit contributions
 worse+0.023
 for+0.021
 back+0.019
der+0.019
 inside+0.018
Neg logit contributions
-0.047
loe-0.045
ona-0.045
ink-0.044
een-0.043
Token."
Token ID526
Feature activation-2.92e-02

Pos logit contributions
 worse+0.032
der+0.029
 for+0.028
 back+0.027
 altogether+0.025
Neg logit contributions
-0.086
ona-0.086
ink-0.085
 total-0.084
loe-0.083
Token be
Token ID307
Feature activation+1.45e-02

Pos logit contributions
 worse+0.003
 for+0.002
der+0.002
 back+0.002
 hospital+0.002
Neg logit contributions
ona-0.004
-0.004
ink-0.004
 total-0.004
loe-0.004
Token cautious
Token ID21205
Feature activation-8.30e-02

Pos logit contributions
+0.024
ona+0.024
ink+0.023
 total+0.022
loe+0.022
Neg logit contributions
 worse-0.020
 for-0.019
der-0.019
 back-0.019
 altogether-0.018
Token so
Token ID523
Feature activation-1.54e-02

Pos logit contributions
 worse+0.071
der+0.067
 hospital+0.062
 grave+0.062
controlled+0.061
Neg logit contributions
ink-0.163
-0.162
loe-0.161
ona-0.161
 total-0.161
Token that
Token ID326
Feature activation-1.06e-01

Pos logit contributions
 officer+0.018
 altogether+0.017
ger+0.016
der+0.016
 pavement+0.016
Neg logit contributions
ink-0.027
 blonde-0.025
iley-0.025
ere-0.025
if-0.025
Token nothing
Token ID2147
Feature activation-1.04e-01

Pos logit contributions
der+0.106
 worse+0.099
controlled+0.096
taker+0.093
 altogether+0.092
Neg logit contributions
-0.198
ona-0.197
 total-0.196
ink-0.195
Will-0.192
Token bad
Token ID2089
Feature activation-1.49e-01

Pos logit contributions
der+0.113
 worse+0.110
 altogether+0.108
 grave+0.107
bike+0.107
Neg logit contributions
-0.172
ona-0.169
 total-0.169
ink-0.168
ere-0.167
Token would
Token ID561
Feature activation-2.39e-02

Pos logit contributions
 worse+0.164
der+0.157
 back+0.145
 grave+0.145
 hospital+0.144
Neg logit contributions
-0.263
ona-0.262
ink-0.259
loe-0.256
 total-0.256
Token happen
Token ID1645
Feature activation-5.76e-02

Pos logit contributions
 worse+0.018
der+0.017
 for+0.016
 back+0.016
 altogether+0.015
Neg logit contributions
-0.052
ona-0.052
 total-0.051
ink-0.051
loe-0.050
Token again
Token ID757
Feature activation-9.30e-02

Pos logit contributions
 worse+0.038
der+0.035
controlled+0.033
 hospital+0.032
 officer+0.032
Neg logit contributions
-0.128
ona-0.128
ink-0.128
 total-0.126
loe-0.123
Token.
Token ID13
Feature activation-7.98e-02

Pos logit contributions
 worse+0.042
der+0.036
 for+0.035
 back+0.032
 altogether+0.026
Neg logit contributions
ona-0.239
-0.238
ink-0.235
 total-0.232
loe-0.231
Token<|endoftext|>
Token ID50256
Feature activation+1.23e-01

Pos logit contributions
 worse+0.169
 trigger+0.164
controlled+0.157
ly+0.157
ger+0.156
Neg logit contributions
Once-0.084
-0.082
loe-0.067
ona-0.065
ink-0.065
Token You
Token ID921
Feature activation-3.31e-02

Pos logit contributions
 worse+0.115
der+0.109
 for+0.106
 back+0.104
 altogether+0.096
Neg logit contributions
ona-0.220
-0.220
ink-0.217
 total-0.213
loe-0.210
Token should
Token ID815
Feature activation-1.66e-02

Pos logit contributions
 worse+0.038
 for+0.036
der+0.035
 back+0.035
 hospital+0.032
Neg logit contributions
ona-0.062
-0.062
ink-0.060
loe-0.059
 total-0.059
Token not
Token ID407
Feature activation-4.31e-02

Pos logit contributions
 worse+0.022
der+0.021
 for+0.020
 back+0.020
 altogether+0.019
Neg logit contributions
-0.028
ona-0.028
ink-0.027
 total-0.027
loe-0.027
Token take
Token ID1011
Feature activation-2.77e-02

Pos logit contributions
 worse+0.049
der+0.046
 for+0.045
 back+0.044
 altogether+0.042
Neg logit contributions
-0.081
ona-0.081
ink-0.079
 total-0.078
loe-0.077
Token things
Token ID1243
Feature activation-7.31e-02

Pos logit contributions
 worse+0.028
 for+0.027
 back+0.027
der+0.024
 hospital+0.023
Neg logit contributions
-0.059
ona-0.058
ink-0.058
loe-0.055
 total-0.054
Token that
Token ID326
Feature activation-1.49e-01

Pos logit contributions
der+0.070
 pavement+0.068
 grave+0.066
 uncontroll+0.065
 Maybe+0.064
Neg logit contributions
-0.121
ink-0.120
if-0.118
Once-0.114
ere-0.111
Token are
Token ID389
Feature activation+1.76e-02

Pos logit contributions
 worse+0.169
 for+0.162
 back+0.156
der+0.155
 hospital+0.144
Neg logit contributions
ona-0.283
-0.282
ink-0.275
 total-0.271
loe-0.270
Token not
Token ID407
Feature activation-6.10e-02

Pos logit contributions
 total+0.029
ona+0.028
+0.027
ink+0.027
ere+0.027
Neg logit contributions
der-0.022
 worse-0.022
 altogether-0.021
controlled-0.020
ger-0.020
Token yours
Token ID12431
Feature activation-7.05e-02

Pos logit contributions
der+0.074
 worse+0.074
 altogether+0.069
 hospital+0.067
 uncontroll+0.066
Neg logit contributions
 total-0.100
ona-0.099
ere-0.097
ink-0.095
ates-0.094
Token!"
Token ID2474
Feature activation-2.96e-02

Pos logit contributions
 worse+0.077
der+0.073
 hospital+0.067
 altogether+0.067
taker+0.065
Neg logit contributions
-0.130
ink-0.129
ona-0.128
 total-0.126
een-0.123
Token
Token ID198
Feature activation+1.18e-01

Pos logit contributions
 worse+0.048
der+0.047
controlled+0.046
 grave+0.044
 online+0.044
Neg logit contributions
-0.042
ona-0.038
Once-0.036
ink-0.035
loe-0.034
Token You
Token ID921
Feature activation-3.14e-02

Pos logit contributions
 worse+0.106
 back+0.095
 for+0.095
der+0.094
 altogether+0.089
Neg logit contributions
-0.192
ona-0.190
loe-0.187
ink-0.186
 blonde-0.185
Token should
Token ID815
Feature activation-1.89e-02

Pos logit contributions
 worse+0.032
der+0.030
 for+0.029
 back+0.028
 altogether+0.026
Neg logit contributions
ona-0.063
-0.063
ink-0.062
 total-0.061
loe-0.060
Token not
Token ID407
Feature activation-4.70e-02

Pos logit contributions
 worse+0.024
der+0.023
 for+0.022
 back+0.022
 altogether+0.021
Neg logit contributions
-0.033
ona-0.033
ink-0.032
 total-0.032
loe-0.031
Token eat
Token ID4483
Feature activation-4.40e-02

Pos logit contributions
 worse+0.053
der+0.050
 for+0.049
 back+0.048
 altogether+0.045
Neg logit contributions
-0.088
ona-0.088
ink-0.087
 total-0.085
loe-0.084
Token things
Token ID1243
Feature activation-7.74e-02

Pos logit contributions
 worse+0.044
der+0.044
 for+0.040
 back+0.040
controlled+0.040
Neg logit contributions
-0.083
ona-0.082
ink-0.082
 total-0.081
ere-0.079
Token that
Token ID326
Feature activation-1.49e-01

Pos logit contributions
 arrow+0.068
 grave+0.068
 pavement+0.065
ger+0.065
attered+0.065
Neg logit contributions
-0.147
ink-0.136
if-0.134
ona-0.133
Once-0.133
Token are
Token ID389
Feature activation+2.11e-02

Pos logit contributions
 worse+0.186
der+0.179
 for+0.173
 back+0.171
 altogether+0.163
Neg logit contributions
-0.257
ona-0.256
ink-0.252
 total-0.247
loe-0.243
Token rot
Token ID5724
Feature activation-9.12e-02

Pos logit contributions
ona+0.036
+0.036
 total+0.035
ink+0.035
ere+0.034
Neg logit contributions
 worse-0.026
der-0.025
 back-0.024
 for-0.024
 altogether-0.023
Token.
Token ID13
Feature activation-1.06e-01

Pos logit contributions
 worse+0.065
 hospital+0.065
controlled+0.064
 arrow+0.063
der+0.062
Neg logit contributions
ink-0.182
-0.180
 total-0.179
ere-0.174
ona-0.174
Token They
Token ID1119
Feature activation-4.63e-02

Pos logit contributions
 worse+0.112
 back+0.103
 for+0.103
der+0.101
 altogether+0.096
Neg logit contributions
-0.207
ona-0.205
ink-0.202
loe-0.201
 blonde-0.197
Token are
Token ID389
Feature activation+6.83e-02

Pos logit contributions
der+0.061
 worse+0.061
 back+0.057
 altogether+0.057
 for+0.057
Neg logit contributions
-0.075
ink-0.074
ona-0.073
 total-0.072
ere-0.069
Token are
Token ID389
Feature activation+6.74e-02

Pos logit contributions
 worse+0.033
 for+0.031
 back+0.030
der+0.030
 fix+0.028
Neg logit contributions
ona-0.059
-0.059
ink-0.057
een-0.056
loe-0.056
Token very
Token ID845
Feature activation+3.50e-03

Pos logit contributions
+0.127
ona+0.126
ink+0.124
 total+0.121
loe+0.121
Neg logit contributions
 worse-0.079
 for-0.074
 back-0.073
der-0.072
 hospital-0.066
Token lucky
Token ID9670
Feature activation-5.27e-02

Pos logit contributions
+0.006
ona+0.006
ink+0.006
 total+0.006
loe+0.006
Neg logit contributions
 worse-0.005
 for-0.004
 back-0.004
der-0.004
 hospital-0.004
Token that
Token ID326
Feature activation-1.14e-01

Pos logit contributions
 worse+0.038
der+0.034
 back+0.031
 altogether+0.030
 for+0.030
Neg logit contributions
ona-0.118
-0.117
ink-0.116
 total-0.115
loe-0.114
Token nothing
Token ID2147
Feature activation-1.03e-01

Pos logit contributions
 worse+0.124
 for+0.123
 back+0.115
der+0.111
 even+0.105
Neg logit contributions
-0.231
ona-0.226
ink-0.222
 blonde-0.216
ere-0.216
Token bad
Token ID2089
Feature activation-1.49e-01

Pos logit contributions
der+0.106
 worse+0.103
 altogether+0.101
 uncontroll+0.098
 arrow+0.097
Neg logit contributions
ona-0.180
ink-0.177
 total-0.175
-0.175
een-0.172
Token happened
Token ID3022
Feature activation-1.91e-02

Pos logit contributions
 worse+0.157
der+0.145
 back+0.138
 for+0.134
 altogether+0.134
Neg logit contributions
ona-0.281
ink-0.276
-0.276
 total-0.271
een-0.269
Token.
Token ID13
Feature activation-9.89e-02

Pos logit contributions
 worse+0.012
der+0.011
 hospital+0.010
 altogether+0.009
 for+0.009
Neg logit contributions
ona-0.044
ink-0.043
-0.043
 total-0.042
een-0.042
Token You
Token ID921
Feature activation-3.32e-02

Pos logit contributions
 worse+0.103
der+0.097
 for+0.095
 back+0.093
 altogether+0.087
Neg logit contributions
-0.194
ona-0.193
ink-0.190
 total-0.186
loe-0.186
Token should
Token ID815
Feature activation-1.86e-02

Pos logit contributions
 worse+0.035
 for+0.032
 back+0.032
der+0.031
 fix+0.030
Neg logit contributions
ona-0.066
-0.065
ink-0.064
een-0.064
 blonde-0.063
Token be
Token ID307
Feature activation-4.54e-03

Pos logit contributions
 worse+0.023
 for+0.022
der+0.021
 back+0.021
 hospital+0.020
Neg logit contributions
-0.033
ona-0.033
ink-0.033
 total-0.032
loe-0.031
Token You
Token ID921
Feature activation-2.97e-02

Pos logit contributions
 worse+0.120
der+0.113
 for+0.111
 back+0.109
 altogether+0.101
Neg logit contributions
-0.209
ona-0.209
ink-0.206
 total-0.202
loe-0.200
Token can
Token ID460
Feature activation+2.01e-02

Pos logit contributions
 worse+0.032
 for+0.029
 back+0.029
der+0.028
 fix+0.027
Neg logit contributions
ona-0.060
-0.059
 blonde-0.057
ink-0.057
een-0.057
Token't
Token ID470
Feature activation-1.78e-02

Pos logit contributions
+0.017
ink+0.016
ona+0.016
 total+0.015
loe+0.015
Neg logit contributions
der-0.043
 worse-0.043
 for-0.042
 altogether-0.041
 back-0.041
Token take
Token ID1011
Feature activation-2.53e-02

Pos logit contributions
der+0.022
 worse+0.021
 for+0.020
taker+0.019
 altogether+0.019
Neg logit contributions
-0.032
ink-0.031
 total-0.030
ona-0.030
een-0.030
Token things
Token ID1243
Feature activation-7.11e-02

Pos logit contributions
 worse+0.032
 back+0.031
 for+0.031
der+0.029
 hospital+0.028
Neg logit contributions
-0.046
ona-0.046
ink-0.045
loe-0.043
een-0.043
Token that
Token ID326
Feature activation-1.49e-01

Pos logit contributions
der+0.066
 lesson+0.064
 valuable+0.063
ometer+0.059
 grave+0.059
Neg logit contributions
-0.131
ink-0.124
Once-0.121
 total-0.120
ere-0.115
Token are
Token ID389
Feature activation+1.82e-02

Pos logit contributions
 worse+0.178
 for+0.166
der+0.166
 back+0.163
 altogether+0.151
Neg logit contributions
ona-0.271
-0.270
ink-0.265
 total-0.260
loe-0.258
Token not
Token ID407
Feature activation-6.22e-02

Pos logit contributions
 total+0.030
ona+0.029
+0.029
ink+0.029
ere+0.028
Neg logit contributions
der-0.023
 worse-0.022
 altogether-0.021
 back-0.020
ger-0.020
Token yours
Token ID12431
Feature activation-6.94e-02

Pos logit contributions
der+0.072
 worse+0.072
 altogether+0.065
 hospital+0.064
 back+0.063
Neg logit contributions
 total-0.107
ona-0.105
ere-0.104
ink-0.104
-0.101
Token!"
Token ID2474
Feature activation-3.44e-02

Pos logit contributions
 worse+0.077
der+0.073
 for+0.068
 back+0.067
 altogether+0.066
Neg logit contributions
-0.129
ona-0.127
ink-0.127
 total-0.124
een-0.122
Token
Token ID198
Feature activation+1.15e-01

Pos logit contributions
 worse+0.056
der+0.054
 for+0.051
controlled+0.051
 back+0.051
Neg logit contributions
-0.048
ona-0.045
ink-0.043
loe-0.043
 total-0.043
Token rest
Token ID1334
Feature activation+4.82e-03

Pos logit contributions
ona+0.030
ink+0.030
+0.029
loe+0.028
 total+0.028
Neg logit contributions
 worse-0.013
 for-0.013
 back-0.013
 fix-0.013
 even-0.012
Token in
Token ID287
Feature activation+3.25e-02

Pos logit contributions
ona+0.010
+0.010
ink+0.010
 total+0.010
loe+0.010
Neg logit contributions
 worse-0.004
der-0.004
 for-0.004
 back-0.004
 altogether-0.003
Token it
Token ID340
Feature activation-5.74e-02

Pos logit contributions
ona+0.080
ink+0.078
agg+0.075
ere+0.074
Fre+0.074
Neg logit contributions
 for-0.026
 back-0.023
 worse-0.021
 he-0.020
 inside-0.019
Token.
Token ID13
Feature activation-8.01e-02

Pos logit contributions
 worse+0.044
der+0.040
 for+0.040
 back+0.038
 altogether+0.034
Neg logit contributions
ona-0.129
-0.128
ink-0.127
 total-0.124
loe-0.124
Token
Token ID198
Feature activation+1.10e-01

Pos logit contributions
 worse+0.082
 for+0.079
der+0.079
 back+0.077
 altogether+0.071
Neg logit contributions
ona-0.158
ink-0.156
-0.154
 total-0.152
loe-0.150
Token
Token ID198
Feature activation+1.11e-01

Pos logit contributions
 blonde+0.240
ink+0.232
ona+0.231
 total+0.228
+0.227
Neg logit contributions
 for-0.090
 he-0.086
 back-0.084
 worse-0.081
 even-0.081
TokenM
Token ID44
Feature activation+6.22e-02

Pos logit contributions
ona+0.136
+0.134
ink+0.134
loe+0.131
een+0.129
Neg logit contributions
 worse-0.197
 for-0.192
 back-0.189
 even-0.189
der-0.186
Tokenummy
Token ID13513
Feature activation-1.42e-04

Pos logit contributions
ona+0.078
+0.078
 total+0.077
ink+0.076
een+0.075
Neg logit contributions
 worse-0.106
der-0.105
 for-0.103
 back-0.100
 altogether-0.096
Token was
Token ID373
Feature activation+8.99e-02

Pos logit contributions
 for+0.000
 back+0.000
 worse+0.000
der+0.000
 inside+0.000
Neg logit contributions
-0.000
 total-0.000
ink-0.000
ona-0.000
loe-0.000
Token talking
Token ID3375
Feature activation-3.20e-02

Pos logit contributions
+0.225
lor+0.215
ink+0.211
ona+0.211
een+0.209
Neg logit contributions
 for-0.083
 worse-0.077
 back-0.077
 even-0.070
 inside-0.067
Token to
Token ID284
Feature activation+1.71e-02

Pos logit contributions
 worse+0.023
der+0.021
 for+0.019
 back+0.019
 hospital+0.017
Neg logit contributions
ona-0.073
-0.072
ink-0.071
 total-0.071
loe-0.070
Token anxious
Token ID18116
Feature activation-8.66e-02

Pos logit contributions
 worse+0.089
der+0.084
 for+0.080
 back+0.078
 altogether+0.071
Neg logit contributions
-0.199
ona-0.196
loe-0.194
ink-0.192
een-0.192
Token and
Token ID290
Feature activation-6.33e-02

Pos logit contributions
 worse+0.084
controlled+0.078
 arrow+0.077
taker+0.075
ger+0.075
Neg logit contributions
-0.156
ink-0.156
ona-0.153
een-0.152
 total-0.151
Token more
Token ID517
Feature activation-7.25e-02

Pos logit contributions
der+0.073
 worse+0.065
ger+0.065
controlled+0.065
 altogether+0.064
Neg logit contributions
 total-0.108
ona-0.108
 blonde-0.106
ink-0.104
-0.103
Token happy
Token ID3772
Feature activation-3.24e-02

Pos logit contributions
 worse+0.056
 back+0.049
der+0.047
controlled+0.047
ger+0.047
Neg logit contributions
ink-0.153
ona-0.152
 blonde-0.151
-0.149
ere-0.145
Token.
Token ID13
Feature activation-8.19e-02

Pos logit contributions
 worse+0.027
controlled+0.024
hot+0.023
taker+0.022
 officer+0.021
Neg logit contributions
ink-0.066
-0.066
ona-0.065
 total-0.064
 blonde-0.064
Token
Token ID198
Feature activation+1.06e-01

Pos logit contributions
 worse+0.118
der+0.110
 for+0.108
 back+0.106
 altogether+0.102
Neg logit contributions
-0.133
ona-0.130
ink-0.126
 total-0.125
loe-0.124
Token
Token ID198
Feature activation+1.08e-01

Pos logit contributions
ona+0.149
ink+0.148
+0.147
loe+0.141
ere+0.140
Neg logit contributions
 worse-0.168
der-0.163
 back-0.152
 for-0.151
 altogether-0.148
TokenAfter
Token ID3260
Feature activation+5.68e-02

Pos logit contributions
ona+0.119
ink+0.117
+0.116
 total+0.111
ere+0.109
Neg logit contributions
 worse-0.206
der-0.201
 for-0.193
 back-0.193
 altogether-0.189
Token a
Token ID257
Feature activation+1.23e-02

Pos logit contributions
ona+0.121
ink+0.119
+0.118
ere+0.117
 total+0.117
Neg logit contributions
 worse-0.051
 for-0.049
der-0.047
 back-0.046
 hospital-0.042
Token while
Token ID981
Feature activation+2.93e-02

Pos logit contributions
ink+0.022
ona+0.021
loe+0.021
+0.021
ere+0.021
Neg logit contributions
 worse-0.015
 for-0.015
 back-0.015
der-0.014
 even-0.013
Token,
Token ID11
Feature activation+9.92e-03

Pos logit contributions
ona+0.057
een+0.056
ink+0.055
ere+0.055
 total+0.055
Neg logit contributions
 for-0.035
 worse-0.032
 back-0.032
 he-0.029
 inside-0.029
Token?"
Token ID1701
Feature activation+8.08e-03

Pos logit contributions
 worse+0.036
 for+0.036
 back+0.035
der+0.034
 hospital+0.032
Neg logit contributions
ona-0.044
-0.044
ink-0.043
loe-0.043
 blonde-0.043
Token Tom
Token ID4186
Feature activation+2.44e-02

Pos logit contributions
ona+0.016
+0.015
 total+0.015
ink+0.015
loe+0.015
Neg logit contributions
 worse-0.009
der-0.008
 for-0.008
 back-0.008
 altogether-0.008
Token asks
Token ID7893
Feature activation+6.65e-03

Pos logit contributions
+0.041
ona+0.040
ink+0.040
 total+0.039
loe+0.038
Neg logit contributions
 worse-0.033
der-0.032
 for-0.031
 back-0.031
 altogether-0.029
Token.
Token ID13
Feature activation-7.53e-02

Pos logit contributions
ona+0.014
+0.014
 total+0.014
loe+0.014
een+0.014
Neg logit contributions
der-0.005
 worse-0.005
controlled-0.004
 grave-0.004
 hospital-0.004
Token
Token ID198
Feature activation+1.02e-01

Pos logit contributions
 worse+0.090
 for+0.085
 back+0.084
der+0.084
 he+0.077
Neg logit contributions
ink-0.139
ona-0.139
-0.134
een-0.132
loe-0.132
Token
Token ID198
Feature activation+1.03e-01

Pos logit contributions
ona+0.175
+0.175
ink+0.172
 total+0.169
 blonde+0.166
Neg logit contributions
 worse-0.132
 for-0.125
 back-0.123
der-0.123
 altogether-0.113
Token"
Token ID1
Feature activation-1.86e-02

Pos logit contributions
ona+0.111
+0.111
ink+0.108
 total+0.105
loe+0.102
Neg logit contributions
 worse-0.200
der-0.194
 for-0.192
 back-0.189
 altogether-0.183
TokenSure
Token ID19457
Feature activation+3.03e-02

Pos logit contributions
 worse+0.022
der+0.021
 for+0.020
 back+0.020
 altogether+0.018
Neg logit contributions
ona-0.034
ink-0.034
-0.034
loe-0.033
 total-0.033
Token,
Token ID11
Feature activation-3.91e-03

Pos logit contributions
+0.061
ona+0.060
ink+0.059
loe+0.058
ere+0.057
Neg logit contributions
 for-0.034
 worse-0.033
 back-0.032
 he-0.029
der-0.028
Token you
Token ID345
Feature activation-3.24e-02

Pos logit contributions
 worse+0.004
 for+0.004
 back+0.004
der+0.004
 even+0.004
Neg logit contributions
-0.008
ink-0.008
ona-0.008
 total-0.007
loe-0.007
Token can
Token ID460
Feature activation+2.46e-02

Pos logit contributions
 worse+0.031
 for+0.029
 back+0.028
der+0.028
 hospital+0.025
Neg logit contributions
ona-0.068
-0.068
ink-0.067
loe-0.066
een-0.065
Token ran
Token ID4966
Feature activation-4.33e-03

Pos logit contributions
 worse+0.016
 back+0.016
 for+0.015
 even+0.015
der+0.014
Neg logit contributions
-0.030
loe-0.029
ink-0.028
 total-0.028
ona-0.028
Token towards
Token ID3371
Feature activation+6.28e-03

Pos logit contributions
der+0.006
 worse+0.005
 wasteful+0.005
 privacy+0.004
grand+0.004
Neg logit contributions
Once-0.008
een-0.008
 total-0.007
-0.007
ona-0.007
Token her
Token ID607
Feature activation-4.80e-03

Pos logit contributions
ona+0.011
+0.011
een+0.011
 total+0.011
ere+0.011
Neg logit contributions
der-0.007
 worse-0.007
 altogether-0.006
controlled-0.006
 for-0.006
Token.
Token ID13
Feature activation-6.88e-02

Pos logit contributions
 worse+0.003
der+0.003
controlled+0.003
 smoothly+0.003
ger+0.002
Neg logit contributions
-0.011
ona-0.011
ink-0.011
 total-0.010
een-0.010
Token
Token ID198
Feature activation+1.22e-01

Pos logit contributions
 worse+0.072
 back+0.071
 for+0.071
der+0.069
 he+0.065
Neg logit contributions
ink-0.133
ona-0.132
 total-0.128
-0.127
loe-0.125
Token
Token ID198
Feature activation+1.22e-01

Pos logit contributions
 blonde+0.267
ink+0.266
+0.265
ona+0.263
 total+0.263
Neg logit contributions
 he-0.095
 back-0.091
 He-0.088
 for-0.088
 worse-0.087
TokenWhen
Token ID2215
Feature activation+7.23e-02

Pos logit contributions
+0.231
ink+0.224
ona+0.220
loe+0.212
ates+0.212
Neg logit contributions
 worse-0.142
 back-0.131
 smoothly-0.129
 he-0.127
 even-0.127
Token he
Token ID339
Feature activation-1.24e-02

Pos logit contributions
+0.181
ona+0.178
 total+0.175
ink+0.175
loe+0.172
Neg logit contributions
der-0.038
 worse-0.038
 for-0.029
controlled-0.028
 back-0.027
Token got
Token ID1392
Feature activation+4.42e-02

Pos logit contributions
 back+0.013
 he+0.013
 inside+0.012
 even+0.011
 for+0.011
Neg logit contributions
ates-0.030
loe-0.027
 blonde-0.027
ink-0.026
elve-0.026
Token closer
Token ID5699
Feature activation-3.33e-02

Pos logit contributions
+0.082
ona+0.082
ink+0.081
 total+0.080
loe+0.078
Neg logit contributions
 worse-0.050
der-0.050
 for-0.046
 back-0.044
 altogether-0.043
Token,
Token ID11
Feature activation+2.18e-02

Pos logit contributions
 worse+0.040
der+0.038
 for+0.036
 back+0.036
 hospital+0.035
Neg logit contributions
ona-0.059
-0.059
ink-0.058
 total-0.057
een-0.056
Token a
Token ID257
Feature activation+1.91e-03

Pos logit contributions
 for+0.004
 back+0.003
 worse+0.003
 even+0.003
 inside+0.003
Neg logit contributions
ink-0.016
ona-0.016
-0.015
loe-0.015
lor-0.015
Token deep
Token ID2769
Feature activation+6.18e-04

Pos logit contributions
loe+0.003
ink+0.003
ona+0.003
+0.003
ates+0.003
Neg logit contributions
 worse-0.002
 back-0.002
 for-0.002
 hospital-0.002
 even-0.002
Token hole
Token ID7604
Feature activation+2.54e-02

Pos logit contributions
+0.001
ona+0.001
ink+0.001
loe+0.001
een+0.001
Neg logit contributions
 worse-0.001
 for-0.001
 back-0.001
der-0.001
 inside-0.001
Token.
Token ID13
Feature activation-7.22e-02

Pos logit contributions
ona+0.059
+0.059
loe+0.057
ink+0.057
 total+0.057
Neg logit contributions
 for-0.018
 worse-0.018
der-0.017
 back-0.016
 inside-0.014
Token
Token ID198
Feature activation+1.15e-01

Pos logit contributions
 for+0.056
der+0.056
 back+0.054
 worse+0.051
 They+0.050
Neg logit contributions
ink-0.158
ona-0.156
 total-0.151
ere-0.149
 blonde-0.149
Token
Token ID198
Feature activation+1.16e-01

Pos logit contributions
 blonde+0.236
 total+0.233
ona+0.232
+0.231
ink+0.228
Neg logit contributions
 back-0.102
 for-0.101
 worse-0.097
 he-0.093
 even-0.092
TokenThe
Token ID464
Feature activation+6.03e-03

Pos logit contributions
+0.196
 blonde+0.193
ink+0.190
ona+0.188
 total+0.183
Neg logit contributions
 worse-0.145
 back-0.141
 smoothly-0.140
hot-0.139
 even-0.137
Token animals
Token ID4695
Feature activation+1.97e-02

Pos logit contributions
+0.012
loe+0.012
ona+0.012
ink+0.012
een+0.011
Neg logit contributions
 worse-0.007
 for-0.006
 back-0.006
der-0.006
 arrow-0.006
Token worked
Token ID3111
Feature activation+9.73e-03

Pos logit contributions
+0.036
ona+0.035
ink+0.034
 total+0.034
loe+0.034
Neg logit contributions
 worse-0.024
 for-0.023
 back-0.023
der-0.022
 inside-0.021
Token together
Token ID1978
Feature activation+1.55e-02

Pos logit contributions
ona+0.019
+0.019
ink+0.019
 total+0.019
loe+0.019
Neg logit contributions
 worse-0.010
der-0.009
 for-0.009
 back-0.009
 altogether-0.008
Token to
Token ID284
Feature activation+3.56e-03

Pos logit contributions
ona+0.038
+0.037
ink+0.037
loe+0.037
 total+0.036
Neg logit contributions
 for-0.011
 back-0.010
 worse-0.009
der-0.008
 inside-0.008
Token She
Token ID1375
Feature activation-5.16e-03

Pos logit contributions
 worse+0.083
der+0.071
 for+0.068
 back+0.068
controlled+0.067
Neg logit contributions
-0.148
ona-0.144
ink-0.139
 blonde-0.138
ere-0.137
Token gasped
Token ID45236
Feature activation+8.81e-02

Pos logit contributions
 even+0.005
 back+0.005
 for+0.005
 worse+0.005
 he+0.005
Neg logit contributions
-0.011
loe-0.010
ona-0.010
ates-0.010
ink-0.010
Token,
Token ID11
Feature activation+2.10e-02

Pos logit contributions
ink+0.160
ona+0.159
+0.158
ere+0.153
 blonde+0.152
Neg logit contributions
 worse-0.098
controlled-0.090
der-0.090
 hospital-0.087
 lesson-0.085
Token
Token ID6184
Feature activation+1.51e-02

Pos logit contributions
ona+0.044
+0.044
ink+0.043
loe+0.043
 total+0.042
Neg logit contributions
 worse-0.019
 for-0.019
 back-0.019
der-0.019
 altogether-0.016
Token
Token ID95
Feature activation-1.56e-02

Pos logit contributions
+0.021
ink+0.020
ona+0.020
een+0.019
loe+0.019
Neg logit contributions
 worse-0.026
der-0.025
 for-0.024
 back-0.024
 even-0.023
Token
Token ID26391
Feature activation+2.61e-02

Pos logit contributions
 worse+0.024
der+0.023
 for+0.022
 back+0.022
 altogether+0.021
Neg logit contributions
-0.024
ona-0.023
ink-0.023
een-0.022
loe-0.022
Token
Token ID129
Feature activation+2.14e-02

Pos logit contributions
ona+0.039
 total+0.036
loe+0.036
een+0.035
ael+0.035
Neg logit contributions
 worse-0.038
 back-0.038
 for-0.038
der-0.036
 altogether-0.035
Token
Token ID241
Feature activation-4.38e-02

Pos logit contributions
ink+0.032
ona+0.032
+0.031
een+0.030
loe+0.030
Neg logit contributions
 worse-0.033
der-0.031
 for-0.030
 back-0.030
 altogether-0.029
TokenWhat
Token ID2061
Feature activation-3.55e-02

Pos logit contributions
 worse+0.054
der+0.052
 for+0.051
 back+0.050
 altogether+0.047
Neg logit contributions
ona-0.076
-0.076
ink-0.075
 total-0.073
loe-0.073
Token are
Token ID389
Feature activation+6.52e-02

Pos logit contributions
 worse+0.035
 even+0.031
 for+0.031
 he+0.030
 back+0.028
Neg logit contributions
-0.079
ona-0.074
ink-0.073
loe-0.071
ere-0.070
Token you
Token ID345
Feature activation-5.22e-02

Pos logit contributions
+0.111
ona+0.110
ink+0.108
 total+0.106
loe+0.105
Neg logit contributions
 worse-0.085
der-0.081
 for-0.080
 back-0.078
 altogether-0.074
Token for
Token ID329
Feature activation-5.20e-02

Pos logit contributions
 worse+0.034
der+0.031
 back+0.029
 for+0.028
 altogether+0.027
Neg logit contributions
-0.091
ona-0.091
ink-0.090
 total-0.089
ere-0.088
Token us
Token ID514
Feature activation-3.66e-02

Pos logit contributions
 for+0.057
 back+0.047
 worse+0.047
 that+0.045
 even+0.044
Neg logit contributions
ona-0.112
-0.110
ink-0.108
 blonde-0.105
lor-0.105
Token."
Token ID526
Feature activation-3.46e-02

Pos logit contributions
 for+0.024
 back+0.022
 worse+0.022
 inside+0.018
der+0.018
Neg logit contributions
-0.090
ona-0.090
ink-0.087
loe-0.086
 total-0.085
Token
Token ID198
Feature activation+1.17e-01

Pos logit contributions
 for+0.031
 worse+0.029
der+0.029
 back+0.029
 he+0.028
Neg logit contributions
ink-0.073
ona-0.072
 total-0.071
elve-0.069
ere-0.068
Token
Token ID198
Feature activation+1.16e-01

Pos logit contributions
 blonde+0.274
 total+0.271
ink+0.269
elve+0.266
ona+0.263
Neg logit contributions
 he-0.074
 for-0.071
 He-0.070
 back-0.068
 They-0.066
TokenTom
Token ID13787
Feature activation+3.87e-02

Pos logit contributions
ink+0.234
ona+0.226
+0.224
ates+0.222
een+0.220
Neg logit contributions
 worse-0.111
 back-0.101
 he-0.100
 for-0.099
 even-0.099
Token smiled
Token ID13541
Feature activation+3.66e-02

Pos logit contributions
+0.086
ona+0.080
 total+0.080
Fre+0.080
loe+0.079
Neg logit contributions
 back-0.036
 for-0.034
 even-0.033
 around-0.031
der-0.030
Token and
Token ID290
Feature activation-3.74e-02

Pos logit contributions
ona+0.065
 blonde+0.064
ink+0.064
+0.063
 total+0.061
Neg logit contributions
 worse-0.042
controlled-0.041
 hospital-0.038
 rubble-0.037
ger-0.036
Token said
Token ID531
Feature activation+5.58e-02

Pos logit contributions
 worse+0.045
der+0.043
 for+0.042
 back+0.041
 altogether+0.039
Neg logit contributions
ona-0.067
-0.067
ink-0.066
 total-0.065
loe-0.064
Token,
Token ID11
Feature activation+2.45e-02

Pos logit contributions
ona+0.090
+0.090
ink+0.088
 total+0.086
loe+0.085
Neg logit contributions
 worse-0.078
der-0.074
 for-0.073
 back-0.072
 altogether-0.068
Token "
Token ID366
Feature activation-6.21e-03

Pos logit contributions
ink+0.055
ona+0.055
ere+0.055
loe+0.055
 total+0.055
Neg logit contributions
 for-0.019
der-0.018
 back-0.018
 worse-0.016
 he-0.016
Token When
Token ID1649
Feature activation+4.28e-02

Pos logit contributions
 back+0.055
 They+0.054
 for+0.054
der+0.052
 He+0.049
Neg logit contributions
ink-0.177
ona-0.176
elve-0.168
icians-0.167
ere-0.165
Token her
Token ID607
Feature activation+1.33e-03

Pos logit contributions
ona+0.108
ink+0.107
+0.107
ere+0.105
 total+0.104
Neg logit contributions
 worse-0.022
 for-0.019
 back-0.018
der-0.017
 hospital-0.014
Token mom
Token ID1995
Feature activation-2.34e-02

Pos logit contributions
ona+0.003
+0.003
ink+0.003
 total+0.003
loe+0.003
Neg logit contributions
 worse-0.001
der-0.001
 for-0.001
 back-0.001
 altogether-0.001
Token found
Token ID1043
Feature activation+2.87e-02

Pos logit contributions
 for+0.028
 back+0.028
 inside+0.028
 worse+0.025
 he+0.025
Neg logit contributions
-0.045
ona-0.044
if-0.043
ink-0.042
 total-0.042
Token her
Token ID607
Feature activation-1.15e-02

Pos logit contributions
ink+0.063
ona+0.063
+0.063
 blonde+0.060
ere+0.059
Neg logit contributions
 for-0.026
 back-0.026
 worse-0.026
 inside-0.023
 hospital-0.021
Token,
Token ID11
Feature activation+1.80e-02

Pos logit contributions
 for+0.012
 back+0.011
 worse+0.011
 inside+0.010
der+0.010
Neg logit contributions
ona-0.024
ink-0.024
-0.024
loe-0.023
ere-0.023
Token they
Token ID484
Feature activation-2.31e-02

Pos logit contributions
ink+0.039
ona+0.039
+0.038
ere+0.037
een+0.037
Neg logit contributions
 worse-0.017
 for-0.015
 back-0.015
der-0.015
 inside-0.014
Token laughed
Token ID13818
Feature activation+2.63e-02

Pos logit contributions
 worse+0.030
der+0.029
 for+0.028
 back+0.027
 altogether+0.026
Neg logit contributions
-0.040
ona-0.040
ink-0.039
 total-0.038
loe-0.038
Token and
Token ID290
Feature activation-4.49e-02

Pos logit contributions
Once+0.037
 blonde+0.037
ink+0.036
+0.036
 total+0.035
Neg logit contributions
grand-0.039
controlled-0.036
 hospital-0.034
 wasteful-0.034
 worse-0.034
Token laughed
Token ID13818
Feature activation+2.62e-02

Pos logit contributions
 worse+0.066
der+0.064
controlled+0.059
 for+0.058
 altogether+0.058
Neg logit contributions
-0.069
ona-0.068
ink-0.067
 total-0.065
 blonde-0.065
Token.
Token ID13
Feature activation-7.66e-02

Pos logit contributions
 blonde+0.039
+0.039
Once+0.039
ink+0.038
 total+0.038
Neg logit contributions
grand-0.037
controlled-0.033
 hospital-0.033
 wasteful-0.031
 officer-0.031
Token to
Token ID284
Feature activation-5.57e-03

Pos logit contributions
der+0.015
 worse+0.014
controlled+0.013
taker+0.013
 altogether+0.013
Neg logit contributions
ona-0.031
 total-0.031
 blonde-0.030
een-0.030
-0.030
Token play
Token ID711
Feature activation-7.98e-03

Pos logit contributions
 worse+0.006
 for+0.005
 back+0.005
der+0.005
 hospital+0.005
Neg logit contributions
ona-0.011
-0.011
ink-0.011
loe-0.011
 total-0.011
Token,
Token ID11
Feature activation+1.03e-02

Pos logit contributions
 worse+0.009
der+0.009
controlled+0.008
taker+0.008
 officer+0.008
Neg logit contributions
-0.015
ink-0.014
ona-0.014
een-0.014
 total-0.014
Token but
Token ID475
Feature activation+6.45e-03

Pos logit contributions
+0.017
 blonde+0.017
 total+0.016
ona+0.016
ink+0.016
Neg logit contributions
 worse-0.013
der-0.013
controlled-0.012
ger-0.012
 back-0.012
Token the
Token ID262
Feature activation-3.30e-02

Pos logit contributions
 total+0.011
+0.011
 blonde+0.011
ona+0.010
loe+0.010
Neg logit contributions
der-0.008
ashes-0.007
controlled-0.007
 worse-0.007
 altogether-0.007
Token cloud
Token ID6279
Feature activation+3.75e-02

Pos logit contributions
 worse+0.047
 for+0.045
 back+0.045
der+0.044
 hospital+0.042
Neg logit contributions
ona-0.055
-0.053
ink-0.052
ere-0.051
een-0.051
Token said
Token ID531
Feature activation+4.75e-02

Pos logit contributions
+0.063
ink+0.063
ona+0.062
 total+0.061
ere+0.059
Neg logit contributions
 worse-0.046
der-0.046
 for-0.043
 back-0.043
 altogether-0.042
Token it
Token ID340
Feature activation-5.24e-02

Pos logit contributions
+0.076
ona+0.076
ink+0.076
 total+0.073
loe+0.072
Neg logit contributions
 worse-0.062
der-0.059
controlled-0.058
 for-0.057
 back-0.057
Token was
Token ID373
Feature activation+5.72e-02

Pos logit contributions
 worse+0.067
der+0.065
 for+0.063
 back+0.062
 altogether+0.060
Neg logit contributions
-0.088
ink-0.087
ona-0.087
 total-0.085
loe-0.083
Token tired
Token ID10032
Feature activation-5.35e-02

Pos logit contributions
+0.113
ona+0.113
ink+0.111
loe+0.109
een+0.108
Neg logit contributions
 worse-0.063
 for-0.060
 back-0.057
der-0.057
 altogether-0.051
Token and
Token ID290
Feature activation-5.60e-02

Pos logit contributions
controlled+0.075
cream+0.073
bike+0.072
ometer+0.070
 disl+0.069
Neg logit contributions
 time-0.075
 year-0.075
Once-0.074
re-0.065
 total-0.064
Token,
Token ID11
Feature activation+3.31e-02

Pos logit contributions
ona+0.109
+0.107
ink+0.104
 total+0.104
een+0.103
Neg logit contributions
 for-0.071
 worse-0.070
 back-0.066
der-0.065
 he-0.063
Token he
Token ID339
Feature activation-7.02e-04

Pos logit contributions
ona+0.082
+0.082
ink+0.081
 total+0.079
loe+0.079
Neg logit contributions
 worse-0.018
 for-0.016
der-0.015
 back-0.014
 hospital-0.012
Token knew
Token ID2993
Feature activation+4.39e-03

Pos logit contributions
 worse+0.001
 for+0.001
der+0.001
 back+0.001
 even+0.001
Neg logit contributions
-0.001
ona-0.001
 total-0.001
 blonde-0.001
loe-0.001
Token that
Token ID326
Feature activation-7.42e-02

Pos logit contributions
ona+0.010
+0.010
ink+0.010
loe+0.010
ere+0.010
Neg logit contributions
 worse-0.002
der-0.002
 back-0.002
 grave-0.002
 altogether-0.002
Token he
Token ID339
Feature activation-2.84e-02

Pos logit contributions
 worse+0.063
der+0.061
 back+0.057
 for+0.056
 altogether+0.053
Neg logit contributions
ona-0.153
-0.153
ink-0.151
 total-0.150
loe-0.147
Token could
Token ID714
Feature activation+5.79e-02

Pos logit contributions
 worse+0.032
der+0.031
 for+0.029
 back+0.029
 altogether+0.028
Neg logit contributions
ink-0.052
ona-0.052
-0.051
 total-0.050
ere-0.049
Token conquer
Token ID23875
Feature activation-2.14e-02

Pos logit contributions
ona+0.109
+0.105
ink+0.103
 total+0.103
 blonde+0.103
Neg logit contributions
 worse-0.070
 for-0.066
 back-0.065
der-0.063
 fix-0.062
Token anything
Token ID1997
Feature activation-9.15e-02

Pos logit contributions
 worse+0.023
der+0.022
 for+0.021
 back+0.020
 altogether+0.020
Neg logit contributions
ona-0.039
ink-0.039
 total-0.039
-0.039
een-0.038
Token that
Token ID326
Feature activation-1.15e-01

Pos logit contributions
 worse+0.077
der+0.074
 back+0.069
 for+0.069
 hospital+0.065
Neg logit contributions
ona-0.187
-0.187
ink-0.186
 total-0.182
ere-0.180
Token scared
Token ID12008
Feature activation-8.41e-02

Pos logit contributions
 worse+0.120
der+0.114
 for+0.110
 back+0.108
 altogether+0.101
Neg logit contributions
-0.224
ona-0.223
ink-0.220
 total-0.217
loe-0.214
Token him
Token ID683
Feature activation-4.29e-02

Pos logit contributions
 worse+0.083
der+0.080
 for+0.074
 back+0.073
 altogether+0.071
Neg logit contributions
ona-0.161
-0.161
ink-0.160
 total-0.158
loe-0.157
TokenOnce
Token ID7454
Feature activation+1.61e-01

Pos logit contributions
ona+0.234
 blonde+0.227
elve+0.220
 total+0.220
ink+0.216
Neg logit contributions
 back-0.141
 inside-0.139
 for-0.136
 around-0.135
.-0.132
Token there
Token ID612
Feature activation+5.99e-02

Pos logit contributions
+0.126
ona+0.125
ink+0.121
 total+0.114
een+0.110
Neg logit contributions
 worse-0.355
 for-0.349
der-0.344
 back-0.343
 altogether-0.329
Token was
Token ID373
Feature activation+1.04e-01

Pos logit contributions
ona+0.126
ink+0.123
+0.122
 total+0.119
ates+0.118
Neg logit contributions
 for-0.056
 back-0.055
 worse-0.054
 he-0.052
 even-0.049
Token a
Token ID257
Feature activation+4.30e-02

Pos logit contributions
+0.179
ona+0.178
ink+0.175
 total+0.172
loe+0.170
Neg logit contributions
 worse-0.132
der-0.127
 for-0.122
 back-0.121
 altogether-0.115
Token girl
Token ID2576
Feature activation+3.47e-02

Pos logit contributions
ona+0.057
+0.057
ink+0.056
 total+0.054
loe+0.053
Neg logit contributions
 worse-0.072
der-0.069
 for-0.068
 back-0.067
 altogether-0.064
Token named
Token ID3706
Feature activation+1.33e-02

Pos logit contributions
ona+0.047
ink+0.047
+0.047
ere+0.045
 total+0.045
Neg logit contributions
 worse-0.057
der-0.055
 altogether-0.052
 back-0.051
 for-0.051
Token Lucy
Token ID22162
Feature activation+4.71e-02

Pos logit contributions
ona+0.033
+0.033
ink+0.033
 total+0.032
loe+0.032
Neg logit contributions
 worse-0.007
 for-0.007
 back-0.006
der-0.006
 inside-0.005
Token.
Token ID13
Feature activation-6.32e-02

Pos logit contributions
ink+0.094
 blonde+0.093
ona+0.093
 total+0.092
ere+0.091
Neg logit contributions
der-0.046
 worse-0.046
 altogether-0.038
 trigger-0.038
taker-0.038
Token She
Token ID1375
Feature activation+1.28e-02

Pos logit contributions
 for+0.043
 worse+0.041
 back+0.040
der+0.040
 inside+0.035
Neg logit contributions
ona-0.148
ink-0.146
-0.141
ere-0.140
icians-0.139
Token was
Token ID373
Feature activation+8.90e-02

Pos logit contributions
ink+0.023
ona+0.023
+0.023
 blonde+0.022
ere+0.022
Neg logit contributions
 worse-0.015
der-0.015
 for-0.014
 altogether-0.014
 back-0.014
Token three
Token ID1115
Feature activation+5.87e-02

Pos logit contributions
ona+0.194
+0.191
ink+0.190
een+0.188
lor+0.186
Neg logit contributions
 for-0.083
 worse-0.082
 back-0.076
 even-0.073
 hospital-0.069
TokenB
Token ID33
Feature activation+1.64e-02

Pos logit contributions
ona+0.010
+0.010
ink+0.010
 total+0.009
loe+0.009
Neg logit contributions
 worse-0.008
der-0.007
 for-0.007
 back-0.007
 altogether-0.007
Tokenenny
Token ID11870
Feature activation+6.06e-03

Pos logit contributions
ona+0.023
+0.023
ink+0.023
 total+0.022
loe+0.022
Neg logit contributions
 worse-0.026
der-0.025
 for-0.025
 back-0.024
 altogether-0.023
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
ona+0.010
+0.010
ink+0.010
 total+0.009
loe+0.009
Neg logit contributions
 worse-0.008
der-0.008
 for-0.008
 back-0.008
 altogether-0.007
Token can
Token ID460
Feature activation+4.35e-02

Pos logit contributions
+0.027
ona+0.027
ink+0.026
 total+0.026
loe+0.026
Neg logit contributions
 worse-0.014
der-0.013
 for-0.013
 back-0.012
 altogether-0.011
Token you
Token ID345
Feature activation-3.71e-02

Pos logit contributions
ona+0.086
+0.084
ink+0.084
 total+0.084
ere+0.082
Neg logit contributions
 worse-0.046
 for-0.042
 back-0.041
der-0.040
 fix-0.040
Token mix
Token ID5022
Feature activation-1.34e-02

Pos logit contributions
 for+0.036
 worse+0.036
 fix+0.034
 back+0.033
 even+0.033
Neg logit contributions
ona-0.076
loe-0.076
ink-0.076
een-0.075
-0.075
Token the
Token ID262
Feature activation-5.95e-02

Pos logit contributions
 for+0.013
 back+0.012
 worse+0.012
 inside+0.010
 even+0.010
Neg logit contributions
-0.030
ona-0.029
loe-0.029
 blonde-0.028
ink-0.028
Token flour
Token ID10601
Feature activation+8.42e-03

Pos logit contributions
 for+0.083
 back+0.082
 worse+0.078
 inside+0.075
 even+0.071
Neg logit contributions
loe-0.105
-0.103
ona-0.103
ink-0.101
iley-0.101
Token and
Token ID290
Feature activation-6.44e-02

Pos logit contributions
ona+0.017
+0.017
ink+0.016
loe+0.016
 total+0.016
Neg logit contributions
 worse-0.009
 for-0.008
 back-0.008
der-0.008
 altogether-0.007
Token sugar
Token ID7543
Feature activation+4.70e-03

Pos logit contributions
 for+0.076
 worse+0.075
 back+0.070
der+0.068
 even+0.067
Neg logit contributions
ona-0.122
loe-0.121
-0.121
ink-0.120
een-0.116
Token together
Token ID1978
Feature activation+1.01e-02

Pos logit contributions
ona+0.009
 blonde+0.009
+0.009
loe+0.009
 total+0.009
Neg logit contributions
 for-0.005
 worse-0.005
 back-0.005
der-0.004
 inside-0.004
Token water
Token ID1660
Feature activation-1.70e-02

Pos logit contributions
 worse+0.089
der+0.088
 altogether+0.082
 uncontroll+0.080
taker+0.079
Neg logit contributions
ink-0.073
 total-0.073
een-0.073
ona-0.072
-0.071
Token.
Token ID13
Feature activation-6.78e-02

Pos logit contributions
 worse+0.007
 for+0.006
der+0.006
 back+0.006
 altogether+0.004
Neg logit contributions
ona-0.044
-0.044
ink-0.043
loe-0.043
 total-0.043
Token The
Token ID383
Feature activation-1.33e-02

Pos logit contributions
 for+0.038
 back+0.037
 worse+0.036
der+0.036
 They+0.030
Neg logit contributions
ink-0.163
ona-0.162
ere-0.159
-0.158
 total-0.157
Token boat
Token ID8848
Feature activation+2.22e-02

Pos logit contributions
 worse+0.010
der+0.009
 for+0.009
 back+0.009
 altogether+0.008
Neg logit contributions
-0.030
ona-0.030
ink-0.030
 total-0.029
loe-0.029
Token is
Token ID318
Feature activation+4.11e-02

Pos logit contributions
ink+0.040
ona+0.039
+0.039
 total+0.038
ere+0.037
Neg logit contributions
 worse-0.027
der-0.025
 back-0.024
 for-0.024
 altogether-0.023
Token white
Token ID2330
Feature activation-9.26e-03

Pos logit contributions
+0.083
ona+0.081
loe+0.080
ink+0.079
een+0.078
Neg logit contributions
 worse-0.045
 for-0.044
 back-0.042
der-0.039
 inside-0.038
Token and
Token ID290
Feature activation-4.31e-02

Pos logit contributions
 worse+0.010
der+0.009
controlled+0.008
 altogether+0.008
 back+0.008
Neg logit contributions
-0.018
ink-0.018
ona-0.017
 total-0.017
 blonde-0.017
Token red
Token ID2266
Feature activation+2.82e-03

Pos logit contributions
 worse+0.054
der+0.053
 back+0.049
 altogether+0.048
 for+0.048
Neg logit contributions
-0.074
ink-0.074
 total-0.073
ona-0.073
 blonde-0.070
Token.
Token ID13
Feature activation-7.04e-02

Pos logit contributions
+0.006
ink+0.006
ona+0.005
 total+0.005
 blonde+0.005
Neg logit contributions
 worse-0.003
controlled-0.002
taker-0.002
 hospital-0.002
 smoothly-0.002
Token It
Token ID632
Feature activation-1.82e-02

Pos logit contributions
der+0.055
 for+0.054
 worse+0.053
 back+0.053
 They+0.052
Neg logit contributions
ink-0.156
ere-0.154
ona-0.153
-0.151
 total-0.149
Token has
Token ID468
Feature activation+6.22e-03

Pos logit contributions
 worse+0.025
der+0.024
 altogether+0.023
 knowing+0.022
 for+0.021
Neg logit contributions
ink-0.030
ona-0.029
 total-0.029
-0.028
een-0.028
Token,
Token ID11
Feature activation+2.98e-02

Pos logit contributions
ona+0.145
ink+0.138
+0.137
loe+0.135
een+0.135
Neg logit contributions
 for-0.074
 back-0.067
 worse-0.066
 inside-0.065
 he-0.062
Token a
Token ID257
Feature activation+3.42e-02

Pos logit contributions
ona+0.060
ink+0.059
+0.059
loe+0.058
een+0.057
Neg logit contributions
 worse-0.030
 for-0.029
der-0.028
 back-0.028
 hospital-0.025
Token little
Token ID1310
Feature activation+4.04e-02

Pos logit contributions
ona+0.051
+0.051
ink+0.050
 total+0.048
loe+0.048
Neg logit contributions
 worse-0.052
der-0.049
 for-0.049
 back-0.048
 altogether-0.046
Token girl
Token ID2576
Feature activation+3.27e-02

Pos logit contributions
 total+0.068
+0.067
ona+0.067
ere+0.065
 blonde+0.064
Neg logit contributions
der-0.050
 altogether-0.045
 worse-0.045
controlled-0.044
edy-0.042
Token was
Token ID373
Feature activation+8.45e-02

Pos logit contributions
+0.056
ona+0.056
ink+0.054
 total+0.053
loe+0.053
Neg logit contributions
 worse-0.042
 for-0.041
 back-0.040
der-0.039
 inside-0.038
Token helping
Token ID5742
Feature activation-3.05e-02

Pos logit contributions
+0.153
ona+0.152
 total+0.149
ink+0.147
loe+0.143
Neg logit contributions
 worse-0.116
 for-0.116
 back-0.109
 hospital-0.105
 inside-0.105
Token her
Token ID607
Feature activation-1.80e-02

Pos logit contributions
 for+0.027
 worse+0.026
 back+0.024
 even+0.021
 hospital+0.021
Neg logit contributions
ink-0.070
-0.070
ona-0.068
loe-0.066
 total-0.065
Token mom
Token ID1995
Feature activation-3.24e-02

Pos logit contributions
 worse+0.018
 for+0.017
 back+0.017
 hospital+0.016
der+0.015
Neg logit contributions
ona-0.037
-0.037
ink-0.037
 total-0.036
 blonde-0.036
Token put
Token ID1234
Feature activation-1.20e-02

Pos logit contributions
 for+0.035
 worse+0.033
 back+0.032
 fix+0.032
 inside+0.031
Neg logit contributions
ona-0.070
-0.068
loe-0.068
ink-0.067
Fre-0.066
Token dishes
Token ID16759
Feature activation+2.23e-04

Pos logit contributions
 worse+0.012
 for+0.011
 back+0.011
der+0.010
 inside+0.010
Neg logit contributions
ona-0.025
-0.025
ink-0.024
loe-0.024
 total-0.024
Token in
Token ID287
Feature activation+2.55e-02

Pos logit contributions
+0.000
ink+0.000
 total+0.000
ona+0.000
ere+0.000
Neg logit contributions
der-0.000
 worse-0.000
 hospital-0.000
 grave-0.000
controlled-0.000
Token\n
Token ID198
Feature activation+1.15e-01

Pos logit contributions
+0.134
ink+0.127
Once+0.121
ona+0.120
loe+0.117
Neg logit contributions
 worse-0.212
der-0.209
ger-0.201
taker-0.197
controlled-0.196
TokenTom
Token ID13787
Feature activation+2.99e-02

Pos logit contributions
Once+0.134
+0.133
ona+0.132
ink+0.130
 total+0.128
Neg logit contributions
der-0.208
 worse-0.204
 for-0.201
taker-0.198
 back-0.194
Token thought
Token ID1807
Feature activation+6.86e-03

Pos logit contributions
+0.057
ona+0.057
ink+0.056
 total+0.056
loe+0.055
Neg logit contributions
 worse-0.034
der-0.032
 back-0.032
 for-0.031
 hospital-0.028
Token about
Token ID546
Feature activation-1.84e-03

Pos logit contributions
loe+0.012
irl+0.011
ere+0.011
ona+0.011
 time+0.011
Neg logit contributions
 grave-0.007
attered-0.007
bike-0.007
ger-0.007
 trigger-0.007
Token what
Token ID644
Feature activation-9.48e-02

Pos logit contributions
 worse+0.002
der+0.002
attered+0.002
 back+0.002
 smoothly+0.002
Neg logit contributions
-0.004
ona-0.003
 total-0.003
een-0.003
ink-0.003
Token Lily
Token ID20037
Feature activation-4.71e-02

Pos logit contributions
der+0.095
 grave+0.093
 worse+0.088
controlled+0.086
 smoothly+0.084
Neg logit contributions
ona-0.180
 total-0.180
-0.175
loe-0.173
 blonde-0.171
Token said
Token ID531
Feature activation+1.20e-02

Pos logit contributions
 worse+0.062
der+0.059
 for+0.057
 back+0.057
 altogether+0.054
Neg logit contributions
ona-0.078
-0.078
ink-0.077
 total-0.074
ere-0.074
Token.
Token ID13
Feature activation-8.88e-02

Pos logit contributions
+0.022
ink+0.022
Once+0.021
ates+0.021
 blonde+0.021
Neg logit contributions
 worse-0.011
controlled-0.011
 smoothly-0.011
 guy-0.010
 hospital-0.010
Token He
Token ID679
Feature activation-1.58e-02

Pos logit contributions
 worse+0.092
controlled+0.089
 grave+0.078
chy+0.075
 heal+0.074
Neg logit contributions
-0.171
 total-0.163
ona-0.160
 blonde-0.160
Once-0.160
Token said
Token ID531
Feature activation+3.85e-02

Pos logit contributions
 worse+0.023
 altogether+0.022
 for+0.021
controlled+0.021
der+0.021
Neg logit contributions
ona-0.024
ink-0.023
-0.022
 total-0.022
ere-0.022
Token,
Token ID11
Feature activation+1.71e-02

Pos logit contributions
ona+0.045
ink+0.044
+0.043
loe+0.041
 blonde+0.040
Neg logit contributions
 pavement-0.064
controlled-0.063
 grave-0.063
 worse-0.062
attered-0.062
Token loud
Token ID7812
Feature activation-5.04e-02

Pos logit contributions
der+0.060
controlled+0.060
hot+0.058
ger+0.057
 for+0.055
Neg logit contributions
-0.074
 total-0.073
ink-0.070
ere-0.068
 year-0.068
Token thunder
Token ID18355
Feature activation-4.33e-02

Pos logit contributions
 worse+0.055
der+0.054
controlled+0.054
hot+0.051
 online+0.051
Neg logit contributions
ink-0.086
-0.085
 total-0.084
ere-0.083
een-0.082
Token and
Token ID290
Feature activation-4.52e-02

Pos logit contributions
 worse+0.042
controlled+0.040
 valuable+0.038
der+0.038
 decide+0.038
Neg logit contributions
-0.082
ink-0.080
 total-0.076
een-0.074
ona-0.074
Token bright
Token ID6016
Feature activation-2.24e-03

Pos logit contributions
der+0.052
 worse+0.048
 altogether+0.046
controlled+0.046
 for+0.045
Neg logit contributions
 total-0.080
ink-0.077
-0.077
ona-0.075
ere-0.075
Token lightning
Token ID14357
Feature activation-1.24e-02

Pos logit contributions
 worse+0.002
der+0.002
 for+0.002
controlled+0.002
 valuable+0.002
Neg logit contributions
ink-0.004
-0.004
ere-0.004
 blonde-0.004
 total-0.004
Token.
Token ID13
Feature activation-7.50e-02

Pos logit contributions
 worse+0.009
der+0.008
 back+0.008
controlled+0.008
 for+0.008
Neg logit contributions
ink-0.026
-0.026
 total-0.025
ere-0.024
een-0.024
Token\n
Token ID198
Feature activation+1.16e-01

Pos logit contributions
 worse+0.086
 for+0.082
der+0.081
 back+0.081
 hospital+0.074
Neg logit contributions
ona-0.141
ink-0.138
-0.137
 total-0.134
loe-0.132
Token\n
Token ID198
Feature activation+1.17e-01

Pos logit contributions
ona+0.208
+0.207
ink+0.204
 total+0.199
 blonde+0.198
Neg logit contributions
 worse-0.139
 for-0.135
 back-0.131
der-0.126
 inside-0.119
TokenTim
Token ID14967
Feature activation+4.00e-02

Pos logit contributions
+0.211
ona+0.209
ink+0.206
 total+0.204
loe+0.200
Neg logit contributions
 worse-0.143
der-0.138
 for-0.133
 back-0.130
 altogether-0.124
Token's
Token ID338
Feature activation+2.91e-02

Pos logit contributions
+0.080
ona+0.079
ink+0.077
 total+0.076
loe+0.076
Neg logit contributions
 worse-0.042
 for-0.039
 back-0.039
der-0.038
 hospital-0.035
Token dad
Token ID9955
Feature activation-3.49e-04

Pos logit contributions
+0.049
 total+0.046
ona+0.046
 blonde+0.046
ink+0.045
Neg logit contributions
der-0.036
 worse-0.036
 altogether-0.034
controlled-0.034
ger-0.034
Token try
Token ID1949
Feature activation-2.77e-02

Pos logit contributions
 worse+0.051
der+0.049
ger+0.047
 altogether+0.046
taker+0.045
Neg logit contributions
 total-0.071
 blonde-0.070
ona-0.069
-0.067
ink-0.066
Token to
Token ID284
Feature activation+6.35e-03

Pos logit contributions
 worse+0.023
der+0.021
 for+0.021
 back+0.020
 altogether+0.019
Neg logit contributions
-0.060
ona-0.060
ink-0.059
 total-0.058
loe-0.057
Token say
Token ID910
Feature activation+1.96e-02

Pos logit contributions
+0.013
ink+0.012
ona+0.012
loe+0.012
ere+0.012
Neg logit contributions
 worse-0.007
 for-0.007
 back-0.007
 fix-0.007
 even-0.007
Token the
Token ID262
Feature activation-4.84e-02

Pos logit contributions
+0.039
ona+0.039
ink+0.038
 total+0.037
loe+0.037
Neg logit contributions
 worse-0.020
der-0.019
 for-0.019
 back-0.018
 altogether-0.017
Token words
Token ID2456
Feature activation-2.69e-02

Pos logit contributions
 worse+0.052
 for+0.050
der+0.049
 back+0.048
 hospital+0.044
Neg logit contributions
-0.095
ona-0.094
ink-0.094
loe-0.091
een-0.090
Token.
Token ID13
Feature activation-7.16e-02

Pos logit contributions
 worse+0.019
der+0.017
 hospital+0.016
 grave+0.016
ger+0.016
Neg logit contributions
ona-0.060
 total-0.058
ink-0.058
-0.057
loe-0.056
Token\n
Token ID198
Feature activation+1.11e-01

Pos logit contributions
 worse+0.071
der+0.068
 for+0.067
 back+0.065
 altogether+0.061
Neg logit contributions
ona-0.143
ink-0.142
-0.141
 total-0.137
ere-0.136
Token\n
Token ID198
Feature activation+1.13e-01

Pos logit contributions
+0.209
ona+0.206
ink+0.204
 blonde+0.203
 total+0.199
Neg logit contributions
 worse-0.120
 for-0.116
der-0.111
 back-0.110
 altogether-0.103
TokenOne
Token ID3198
Feature activation+6.20e-02

Pos logit contributions
+0.174
ink+0.163
ona+0.162
 blonde+0.161
loe+0.155
Neg logit contributions
 worse-0.166
 for-0.156
der-0.153
 smoothly-0.152
 back-0.151
Token day
Token ID1110
Feature activation+7.70e-02

Pos logit contributions
ona+0.119
+0.118
ink+0.117
 total+0.115
loe+0.113
Neg logit contributions
 worse-0.068
der-0.064
 back-0.061
 for-0.060
 altogether-0.058
Token,
Token ID11
Feature activation+3.20e-02

Pos logit contributions
+0.158
ink+0.156
ona+0.156
if+0.153
ere+0.152
Neg logit contributions
 for-0.081
 he-0.072
 worse-0.072
 back-0.072
 inside-0.070
Token better
Token ID1365
Feature activation-4.70e-02

Pos logit contributions
+0.029
ink+0.028
ona+0.028
 total+0.028
 blonde+0.028
Neg logit contributions
der-0.015
 worse-0.014
 grave-0.013
ometer-0.013
 altogether-0.013
Token,
Token ID11
Feature activation+1.46e-02

Pos logit contributions
controlled+0.042
 worse+0.042
 hospital+0.040
 rubble+0.039
 pavement+0.038
Neg logit contributions
-0.097
ink-0.095
ona-0.092
if-0.089
Once-0.086
Token and
Token ID290
Feature activation-3.01e-02

Pos logit contributions
+0.029
ona+0.028
ink+0.028
 blonde+0.027
 total+0.027
Neg logit contributions
 worse-0.014
der-0.013
ger-0.013
controlled-0.012
 back-0.012
Token she
Token ID673
Feature activation-2.01e-02

Pos logit contributions
der+0.031
 worse+0.031
controlled+0.030
 altogether+0.030
 trigger+0.030
Neg logit contributions
ona-0.055
-0.052
ink-0.052
 blonde-0.051
Will-0.051
Token learned
Token ID4499
Feature activation-2.27e-02

Pos logit contributions
 worse+0.025
der+0.023
 for+0.022
 back+0.022
 altogether+0.022
Neg logit contributions
ona-0.036
ink-0.036
-0.035
 total-0.034
ere-0.033
Token that
Token ID326
Feature activation-8.78e-02

Pos logit contributions
 trigger+0.029
ger+0.028
 guy+0.028
ashes+0.028
attered+0.028
Neg logit contributions
ona-0.033
if-0.032
Once-0.032
ere-0.032
Will-0.032
Token sometimes
Token ID3360
Feature activation-4.43e-02

Pos logit contributions
der+0.095
 worse+0.094
ger+0.092
 back+0.091
controlled+0.090
Neg logit contributions
-0.155
Will-0.152
ona-0.151
ink-0.150
 total-0.148
Token even
Token ID772
Feature activation-5.71e-02

Pos logit contributions
der+0.042
 worse+0.042
 back+0.039
 for+0.038
 altogether+0.037
Neg logit contributions
-0.087
ink-0.086
ona-0.085
 total-0.084
loe-0.083
Token angry
Token ID7954
Feature activation-7.78e-02

Pos logit contributions
 worse+0.053
der+0.051
 for+0.049
 back+0.048
 altogether+0.046
Neg logit contributions
ona-0.115
-0.115
ink-0.113
 total-0.112
ere-0.110
Token things
Token ID1243
Feature activation-4.99e-02

Pos logit contributions
 worse+0.092
 altogether+0.088
der+0.088
 grave+0.085
controlled+0.085
Neg logit contributions
ere-0.124
ink-0.123
 blonde-0.123
een-0.119
ona-0.119
Token can
Token ID460
Feature activation-5.83e-03

Pos logit contributions
 worse+0.056
der+0.055
 for+0.052
 back+0.051
 altogether+0.050
Neg logit contributions
-0.089
ink-0.088
ona-0.087
 total-0.086
ere-0.084
Token cap
Token ID1451
Feature activation+9.52e-03

Pos logit contributions
 worse+0.000
der+0.000
 for+0.000
 back+0.000
 altogether+0.000
Neg logit contributions
-0.000
ona-0.000
ink-0.000
 total-0.000
loe-0.000
Token and
Token ID290
Feature activation-4.65e-02

Pos logit contributions
ink+0.019
+0.019
ona+0.019
 total+0.018
 blonde+0.018
Neg logit contributions
 worse-0.009
der-0.008
 hospital-0.008
 altogether-0.008
 back-0.008
Token a
Token ID257
Feature activation+6.76e-03

Pos logit contributions
 worse+0.054
der+0.052
 for+0.051
 back+0.050
 hospital+0.046
Neg logit contributions
-0.086
ona-0.086
ink-0.084
loe-0.082
 total-0.082
Token red
Token ID2266
Feature activation+1.46e-02

Pos logit contributions
ona+0.012
+0.012
ink+0.012
loe+0.011
een+0.011
Neg logit contributions
 worse-0.008
 for-0.008
der-0.008
 back-0.008
 hospital-0.007
Token label
Token ID6167
Feature activation+8.71e-03

Pos logit contributions
+0.024
ink+0.024
ona+0.023
 total+0.023
 blonde+0.023
Neg logit contributions
 worse-0.020
der-0.018
 altogether-0.017
controlled-0.017
 for-0.017
Token.
Token ID13
Feature activation-7.27e-02

Pos logit contributions
ink+0.019
ona+0.019
+0.019
 total+0.018
 blonde+0.018
Neg logit contributions
 worse-0.007
der-0.006
 hospital-0.005
 back-0.005
 altogether-0.005
Token They
Token ID1119
Feature activation-1.24e-02

Pos logit contributions
 worse+0.050
der+0.050
 for+0.049
 back+0.045
 he+0.042
Neg logit contributions
ink-0.163
ona-0.163
-0.162
 total-0.160
een-0.158
Token could
Token ID714
Feature activation+6.68e-02

Pos logit contributions
 worse+0.016
der+0.016
 for+0.015
 back+0.015
 altogether+0.014
Neg logit contributions
-0.021
ona-0.021
ink-0.021
 total-0.020
loe-0.020
Token not
Token ID407
Feature activation-2.40e-02

Pos logit contributions
+0.118
ona+0.117
ink+0.116
 total+0.114
loe+0.111
Neg logit contributions
 worse-0.081
der-0.079
 for-0.075
 back-0.074
 altogether-0.072
Token read
Token ID1100
Feature activation-1.94e-02

Pos logit contributions
der+0.033
 worse+0.032
 for+0.029
 altogether+0.029
 back+0.029
Neg logit contributions
ona-0.038
-0.038
 total-0.037
ink-0.037
 blonde-0.036
Token the
Token ID262
Feature activation-6.66e-02

Pos logit contributions
 worse+0.018
der+0.017
 back+0.015
 for+0.015
 hospital+0.015
Neg logit contributions
 total-0.039
ona-0.038
ink-0.038
-0.038
 blonde-0.037
Token Lily
Token ID20037
Feature activation-1.83e-03

Pos logit contributions
 worse+0.051
der+0.051
 altogether+0.049
 back+0.048
 for+0.047
Neg logit contributions
ona-0.062
-0.061
ink-0.060
 total-0.060
 blonde-0.059
Token drank
Token ID24070
Feature activation-1.51e-02

Pos logit contributions
 for+0.002
 worse+0.002
 back+0.002
der+0.002
 inside+0.002
Neg logit contributions
ona-0.003
 total-0.003
-0.003
loe-0.003
ink-0.003
Token it
Token ID340
Feature activation-7.51e-02

Pos logit contributions
 officer+0.019
bike+0.019
controlled+0.019
 villains+0.019
 guy+0.018
Neg logit contributions
-0.025
 total-0.023
ink-0.022
ona-0.022
 blonde-0.022
Token all
Token ID477
Feature activation-6.57e-02

Pos logit contributions
controlled+0.070
 officer+0.069
 guy+0.068
der+0.066
 villains+0.066
Neg logit contributions
-0.147
ink-0.143
 total-0.137
ona-0.137
 blonde-0.133
Token up
Token ID510
Feature activation-3.59e-02

Pos logit contributions
 villains+0.066
der+0.064
 officer+0.062
 arrow+0.060
dies+0.059
Neg logit contributions
ere-0.126
-0.125
ink-0.124
ona-0.122
 total-0.122
Token.
Token ID13
Feature activation-8.40e-02

Pos logit contributions
 worse+0.021
der+0.019
 for+0.017
controlled+0.016
 back+0.016
Neg logit contributions
-0.086
ona-0.085
ink-0.084
 total-0.083
 blonde-0.081
Token She
Token ID1375
Feature activation-1.03e-02

Pos logit contributions
 worse+0.081
der+0.072
 for+0.071
 back+0.069
ger+0.065
Neg logit contributions
-0.177
ona-0.172
ink-0.169
 total-0.166
loe-0.166
Token felt
Token ID2936
Feature activation+1.92e-02

Pos logit contributions
 worse+0.013
der+0.013
 for+0.012
 back+0.012
 altogether+0.012
Neg logit contributions
ona-0.018
-0.018
ink-0.017
 total-0.017
loe-0.017
Token much
Token ID881
Feature activation-1.73e-02

Pos logit contributions
 blonde+0.029
ink+0.029
Once+0.029
ona+0.029
+0.029
Neg logit contributions
der-0.025
bike-0.025
ometer-0.024
 guy-0.024
 officer-0.024
Token better
Token ID1365
Feature activation-4.90e-02

Pos logit contributions
bike+0.013
ometer+0.012
controlled+0.011
 automobile+0.011
der+0.011
Neg logit contributions
-0.038
ona-0.036
ink-0.036
Once-0.036
 blonde-0.035
Token and
Token ID290
Feature activation-5.00e-02

Pos logit contributions
 worse+0.036
der+0.033
 for+0.031
 back+0.031
 altogether+0.028
Neg logit contributions
-0.111
ona-0.111
ink-0.110
 total-0.107
loe-0.106