TOP ACTIVATIONS
MAX = 0.147

this is a sculpture of a cat . It is white
you can see a little face in the bark
I have two cards with a dog !" Ben says .
I am baking a delicious cake . â
flag , a map , a landmark , or a symbol
, like a flag , a map , a landmark ,
a gir affe , and a snake !" He points to
and a monkey , and a gir affe , and a
" I was born as a cater pillar . A cater
a tiny box shaped like a heart .
? It looks more like a mess . And some of
, there was an old wooden box that no one could
has a long nose called a trunk . It can use
, a landmark , or a symbol . That way ,
Wow ! You are a purple z ebra ! Can I
" Look , there 's a stand !" said Lucy .
, a nose , and a collar . Then your dog
draw me a picture of a banana ?" L
is a lion , and a monkey , and a gir
a ball , and even a spoon . He likes to

MIN ACTIVATIONS
MIN = -0.121

fingers . She inserted the key into the key
went on a walk and soon they saw a lab .
followed the ringing sound and soon found the source
as they could , and soon they were far away from
through the forest , but soon it was getting dark .
but she kept walking and soon found a beautiful flower .
scared at first , but soon the fear was replaced by
police officer called Sarah and soon she arrived to pick up
little girl was disappointed but soon spotted a small elephant standing
baby was disappointed , but soon she saw a little bird
. She started walking but soon realized it was farther away
for a bit , but soon he was back home and
over the hills , but soon she found herself lost in
walked and walked , but soon he realized he didn 't
They searched and searched , but it was gone . They
," said Lily . She organized all the toys in the
under the umbrella , but soon they noticed that their feet
she screamed for help . Soon a friendly polar bear appeared
She followed the singing and soon she came upon a pond
and flew away , but soon returned with a friend .

INTERVAL 0.080 - 0.147
CONTAINS 1.389%

a time , there was a little boy named Tim my
and plants . It was a magical place !
in the curtain , and a sneaky thief was reaching in
The cat was sitting under a tree . The bird said
fun . <|endoftext|> Zero is a green cat . He likes

INTERVAL 0.013 - 0.080
CONTAINS 33.666%

he had written . His mom was very proud
mom said . L ily felt much better inside
was the best day of his life !
Now I have something pretty to wear ."
py did it and the other animals liked it . They

INTERVAL -0.054 - 0.013
CONTAINS 58.517%

there was an enormous elephant . He liked to wander around
cover . L ily said to her mom
. The infant was very happy and played with the ball
him . She keeps looking for her own pieces .
tired ." Ben and Lily nodded . They were

INTERVAL -0.121 - -0.054
CONTAINS 6.429%

all the numbers , but when he got to the end
started to count . But then , Lily 's little brother
ained all night . She put on her rain coat and
to laugh . They pointed at her and made fun of
down his head . He saw his scarf falling off his

TOKEN DETAILS

Token this
Token ID428
Feature activation+5.88e-02

Pos logit contributions
 her+0.008
 his+0.007
 their+0.006
 the+0.006
 its+0.006
Neg logit contributions
 promise-0.016
 rule-0.016
-0.016
opus-0.016
oc-0.016
Token is
Token ID318
Feature activation+1.61e-04

Pos logit contributions
ode+0.106
ney+0.105
uer+0.103
opus+0.103
umpy+0.103
Neg logit contributions
 her-0.118
 the-0.118
 their-0.110
 it-0.107
 his-0.103
Token a
Token ID257
Feature activation+1.16e-01

Pos logit contributions
 promise+0.000
 rule+0.000
opus+0.000
oc+0.000
+0.000
Neg logit contributions
 her-0.000
 the-0.000
 his-0.000
 their-0.000
 its-0.000
Token sculpture
Token ID26924
Feature activation+9.28e-02

Pos logit contributions
ode+0.238
ari+0.237
ney+0.233
iver+0.225
asis+0.225
Neg logit contributions
 the-0.200
 her-0.200
 their-0.198
 it-0.193
 his-0.186
Token of
Token ID286
Feature activation+1.01e-02

Pos logit contributions
iver+0.225
umpy+0.220
 promise+0.219
ini+0.219
 Driver+0.217
Neg logit contributions
 her-0.131
 the-0.126
 his-0.115
 their-0.112
 it-0.110
Token a
Token ID257
Feature activation+1.47e-01

Pos logit contributions
 promise+0.029
umpy+0.028
opus+0.028
oc+0.028
+0.028
Neg logit contributions
 her-0.009
 the-0.008
 their-0.007
 his-0.007
 its-0.006
Token cat
Token ID3797
Feature activation+8.38e-02

Pos logit contributions
iver+0.298
ari+0.264
asis+0.260
umpy+0.258
rig+0.252
Neg logit contributions
 it-0.297
 the-0.296
 their-0.276
 her-0.264
 its-0.263
Token.
Token ID13
Feature activation+6.64e-02

Pos logit contributions
oral+0.205
 promise+0.200
Two+0.198
ini+0.197
umpy+0.196
Neg logit contributions
 her-0.127
 the-0.119
 it-0.113
 their-0.109
 his-0.098
Token It
Token ID632
Feature activation+3.42e-02

Pos logit contributions
 promise+0.164
asis+0.162
ater+0.158
opus+0.154
ute+0.153
Neg logit contributions
 her-0.098
 the-0.092
 their-0.086
 his-0.084
 its-0.083
Token is
Token ID318
Feature activation-2.84e-03

Pos logit contributions
ute+0.059
opus+0.059
umpy+0.058
 promise+0.058
ater+0.057
Neg logit contributions
 her-0.070
 the-0.067
 their-0.064
 his-0.062
 it-0.060
Token white
Token ID2330
Feature activation+4.35e-02

Pos logit contributions
 her+0.004
 his+0.003
 the+0.003
 their+0.003
 its+0.003
Neg logit contributions
 promise-0.006
 rule-0.006
umpy-0.006
-0.006
opus-0.006
Token
Token ID129
Feature activation+5.80e-02

Pos logit contributions
 rule+0.148
 promise+0.147
 favor+0.145
reath+0.143
ari+0.141
Neg logit contributions
 her-0.128
 the-0.119
 his-0.114
 it-0.110
 their-0.107
Token
Token ID241
Feature activation+7.28e-02

Pos logit contributions
 promise+0.098
 rule+0.090
 favor+0.087
ari+0.087
opus+0.086
Neg logit contributions
 her-0.129
 the-0.119
 his-0.117
 its-0.107
 their-0.106
Tokenyou
Token ID5832
Feature activation+6.65e-02

Pos logit contributions
 promise+0.160
ari+0.158
opus+0.150
affe+0.150
asis+0.149
Neg logit contributions
 her-0.129
 the-0.119
 his-0.115
 it-0.112
 their-0.109
Token can
Token ID460
Feature activation+8.00e-02

Pos logit contributions
opus+0.128
oc+0.126
asis+0.120
ater+0.120
actus+0.117
Neg logit contributions
 her-0.129
 the-0.128
 it-0.119
 his-0.111
 their-0.109
Token see
Token ID766
Feature activation-2.16e-03

Pos logit contributions
oc+0.168
affe+0.159
ari+0.157
opus+0.151
asis+0.151
Neg logit contributions
 her-0.166
 the-0.153
 it-0.146
 his-0.140
 their-0.133
Token a
Token ID257
Feature activation+1.44e-01

Pos logit contributions
 her+0.002
 his+0.002
 their+0.002
 the+0.002
 its+0.001
Neg logit contributions
 promise-0.006
 rule-0.006
opus-0.006
-0.006
oc-0.006
Token little
Token ID1310
Feature activation+9.89e-02

Pos logit contributions
iver+0.267
umpy+0.267
rr+0.254
age+0.253
ari+0.252
Neg logit contributions
 the-0.278
 it-0.267
 her-0.254
 his-0.244
 their-0.240
Token face
Token ID1986
Feature activation+5.94e-02

Pos logit contributions
ari+0.181
arl+0.174
age+0.171
ute+0.171
oral+0.170
Neg logit contributions
 the-0.203
 her-0.194
 it-0.189
 his-0.176
 their-0.170
Token in
Token ID287
Feature activation+3.73e-03

Pos logit contributions
 promise+0.129
oc+0.128
opus+0.127
 rule+0.125
affe+0.125
Neg logit contributions
 her-0.101
 the-0.090
 his-0.084
 their-0.082
 it-0.080
Token the
Token ID262
Feature activation+7.41e-02

Pos logit contributions
 promise+0.011
opus+0.011
umpy+0.011
oc+0.011
 rule+0.011
Neg logit contributions
 her-0.003
 the-0.002
 his-0.002
 their-0.002
 its-0.001
Token bark
Token ID21405
Feature activation+5.82e-02

Pos logit contributions
uer+0.126
oc+0.126
umpy+0.123
opus+0.122
affe+0.122
Neg logit contributions
 her-0.166
 the-0.159
 his-0.151
 their-0.140
 its-0.131
Token I
Token ID314
Feature activation+3.02e-02

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.002
 rule-0.002
opus-0.002
-0.002
umpy-0.002
Token have
Token ID423
Feature activation-1.74e-02

Pos logit contributions
affe+0.067
umpy+0.064
ute+0.063
opus+0.063
ney+0.063
Neg logit contributions
 her-0.049
 the-0.047
 his-0.044
 their-0.042
 it-0.041
Token two
Token ID734
Feature activation+5.58e-02

Pos logit contributions
 her+0.016
 their+0.013
 his+0.012
 the+0.012
led+0.011
Neg logit contributions
-0.046
 promise-0.046
 rule-0.045
opus-0.045
ater-0.045
Token cards
Token ID4116
Feature activation+6.92e-02

Pos logit contributions
ute+0.108
ney+0.106
oc+0.105
opus+0.104
ari+0.103
Neg logit contributions
 her-0.108
 the-0.100
 his-0.094
 their-0.093
 it-0.090
Token with
Token ID351
Feature activation+1.14e-02

Pos logit contributions
oc+0.117
umpy+0.116
opus+0.116
asis+0.113
ute+0.112
Neg logit contributions
 her-0.148
 the-0.138
 their-0.132
 his-0.129
 it-0.127
Token a
Token ID257
Feature activation+1.42e-01

Pos logit contributions
 promise+0.026
umpy+0.026
opus+0.025
 rule+0.025
oc+0.025
Neg logit contributions
 her-0.017
 the-0.015
 their-0.015
 his-0.015
 its-0.013
Token dog
Token ID3290
Feature activation+9.40e-02

Pos logit contributions
iver+0.277
umpy+0.258
asis+0.248
ini+0.244
ney+0.243
Neg logit contributions
 her-0.290
 the-0.284
 their-0.282
 it-0.274
 his-0.267
Token!"
Token ID2474
Feature activation-3.66e-02

Pos logit contributions
oral+0.158
 promise+0.155
umpy+0.151
ini+0.149
reath+0.148
Neg logit contributions
 her-0.216
 the-0.207
 it-0.191
 their-0.191
 his-0.188
Token Ben
Token ID3932
Feature activation+3.42e-03

Pos logit contributions
 her+0.039
bled+0.038
lled+0.038
led+0.038
ied+0.033
Neg logit contributions
-0.085
 promise-0.081
 rule-0.080
 request-0.079
 judge-0.078
Token says
Token ID1139
Feature activation-3.99e-02

Pos logit contributions
 promise+0.006
opus+0.006
 rule+0.006
umpy+0.006
oc+0.006
Neg logit contributions
 her-0.007
 his-0.006
 the-0.006
 their-0.006
 its-0.005
Token.
Token ID13
Feature activation-1.02e-02

Pos logit contributions
 her+0.087
lled+0.080
 their+0.078
 his+0.077
 the+0.077
Neg logit contributions
 promise-0.050
 request-0.046
 rule-0.045
-0.044
 team-0.043
Token
Token ID129
Feature activation+6.24e-02

Pos logit contributions
 rule+0.178
 promise+0.178
 favor+0.174
reath+0.169
ode+0.168
Neg logit contributions
 her-0.118
 the-0.113
 it-0.103
 his-0.103
 their-0.094
Token
Token ID241
Feature activation+7.41e-02

Pos logit contributions
 promise+0.101
ode+0.093
 rule+0.090
 difference+0.089
opus+0.089
Neg logit contributions
 her-0.137
 the-0.129
 his-0.126
 their-0.116
 its-0.115
TokenI
Token ID40
Feature activation+5.99e-02

Pos logit contributions
 promise+0.163
ari+0.159
opus+0.155
ode+0.154
affe+0.153
Neg logit contributions
 her-0.129
 the-0.120
 it-0.115
 his-0.115
 their-0.112
Token am
Token ID716
Feature activation+1.16e-02

Pos logit contributions
opus+0.101
affe+0.096
ari+0.090
asis+0.090
ode+0.089
Neg logit contributions
 her-0.134
 the-0.125
 his-0.124
 it-0.120
 their-0.117
Token baking
Token ID16871
Feature activation+1.51e-02

Pos logit contributions
oc+0.028
 promise+0.028
opus+0.027
affe+0.027
+0.027
Neg logit contributions
 her-0.016
 the-0.014
 his-0.013
 their-0.013
 it-0.012
Token a
Token ID257
Feature activation+1.41e-01

Pos logit contributions
 promise+0.038
umpy+0.037
opus+0.037
 rule+0.036
oc+0.036
Neg logit contributions
 her-0.020
 the-0.017
 his-0.016
 their-0.016
 it-0.015
Token delicious
Token ID12625
Feature activation+1.07e-01

Pos logit contributions
umpy+0.314
rr+0.300
iver+0.298
arl+0.296
 Driver+0.289
Neg logit contributions
 the-0.236
 her-0.233
 it-0.220
 their-0.203
 his-0.189
Token cake
Token ID12187
Feature activation+7.72e-02

Pos logit contributions
umpy+0.232
arl+0.216
ari+0.212
rr+0.211
 Driver+0.204
Neg logit contributions
 the-0.204
 her-0.203
 it-0.193
 his-0.171
 their-0.157
Token.
Token ID13
Feature activation+9.08e-02

Pos logit contributions
umpy+0.144
 promise+0.136
Two+0.133
arl+0.130
opus+0.130
Neg logit contributions
 her-0.161
 the-0.147
 his-0.140
 it-0.137
 their-0.137
Tokenâ
Token ID22940
Feature activation+5.10e-02

Pos logit contributions
 promise+0.209
ater+0.201
asis+0.198
 rule+0.195
an+0.195
Neg logit contributions
 her-0.152
 the-0.144
 his-0.135
 it-0.134
 its-0.119
Token
Token ID26391
Feature activation+6.22e-02

Pos logit contributions
 team+0.079
oc+0.079
opus+0.079
 rule+0.078
affe+0.078
Neg logit contributions
 her-0.105
 the-0.098
 his-0.093
 their-0.085
 him-0.084
Token flag
Token ID6056
Feature activation+7.84e-02

Pos logit contributions
iver+0.241
umpy+0.238
rig+0.228
asis+0.221
ater+0.217
Neg logit contributions
 it-0.273
 her-0.273
 the-0.270
 their-0.261
 his-0.248
Token,
Token ID11
Feature activation+1.36e-02

Pos logit contributions
opus+0.191
 promise+0.188
umpy+0.187
oral+0.185
Two+0.184
Neg logit contributions
 her-0.114
 the-0.109
 his-0.097
 their-0.096
 it-0.091
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.031
umpy+0.031
opus+0.031
oc+0.030
+0.030
Neg logit contributions
 her-0.020
 the-0.018
 their-0.018
 his-0.017
 its-0.016
Token map
Token ID3975
Feature activation+8.95e-02

Pos logit contributions
umpy+0.254
iver+0.250
ater+0.228
rig+0.225
ini+0.221
Neg logit contributions
 her-0.295
 the-0.291
 it-0.283
 their-0.281
 his-0.273
Token,
Token ID11
Feature activation+1.10e-02

Pos logit contributions
 promise+0.218
opus+0.217
iver+0.217
ute+0.216
ini+0.216
Neg logit contributions
 the-0.121
 her-0.121
 it-0.101
 his-0.100
 their-0.100
Token a
Token ID257
Feature activation+1.40e-01

Pos logit contributions
 promise+0.024
opus+0.024
umpy+0.024
oc+0.024
+0.024
Neg logit contributions
 her-0.017
 the-0.015
 their-0.015
 his-0.015
 its-0.013
Token landmark
Token ID20533
Feature activation+4.37e-02

Pos logit contributions
umpy+0.256
iver+0.251
ater+0.226
ivering+0.223
ini+0.219
Neg logit contributions
 her-0.301
 the-0.293
 their-0.284
 it-0.281
 his-0.278
Token,
Token ID11
Feature activation+1.68e-02

Pos logit contributions
opus+0.105
oc+0.104
umpy+0.103
 promise+0.102
+0.100
Neg logit contributions
 her-0.060
 the-0.057
 their-0.053
 his-0.050
 it-0.047
Token or
Token ID393
Feature activation+8.04e-03

Pos logit contributions
opus+0.037
umpy+0.037
 promise+0.037
+0.036
oc+0.036
Neg logit contributions
 her-0.026
 the-0.023
 their-0.023
 his-0.023
 its-0.020
Token a
Token ID257
Feature activation+1.37e-01

Pos logit contributions
 promise+0.023
opus+0.023
umpy+0.023
oc+0.023
+0.023
Neg logit contributions
 her-0.007
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token symbol
Token ID6194
Feature activation+7.05e-02

Pos logit contributions
iver+0.237
umpy+0.236
ivering+0.209
ini+0.208
ater+0.208
Neg logit contributions
 her-0.301
 the-0.295
 their-0.285
 it-0.283
 his-0.278
Token,
Token ID11
Feature activation+2.76e-02

Pos logit contributions
opus+0.064
oc+0.063
 promise+0.062
affe+0.061
umpy+0.061
Neg logit contributions
 her-0.044
 the-0.039
 their-0.038
 his-0.038
 it-0.034
Token like
Token ID588
Feature activation+2.86e-02

Pos logit contributions
opus+0.060
umpy+0.059
+0.058
oc+0.058
asis+0.058
Neg logit contributions
 her-0.046
 the-0.041
 their-0.040
 his-0.040
 it-0.037
Token a
Token ID257
Feature activation+1.34e-01

Pos logit contributions
umpy+0.070
 promise+0.069
opus+0.068
oc+0.068
asis+0.068
Neg logit contributions
 her-0.043
 the-0.038
 their-0.036
 his-0.036
 it-0.033
Token flag
Token ID6056
Feature activation+7.84e-02

Pos logit contributions
iver+0.241
umpy+0.238
rig+0.228
asis+0.221
ater+0.217
Neg logit contributions
 it-0.273
 her-0.273
 the-0.270
 their-0.261
 his-0.248
Token,
Token ID11
Feature activation+1.36e-02

Pos logit contributions
opus+0.191
 promise+0.188
umpy+0.187
oral+0.185
Two+0.184
Neg logit contributions
 her-0.114
 the-0.109
 his-0.097
 their-0.096
 it-0.091
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.031
umpy+0.031
opus+0.031
oc+0.030
+0.030
Neg logit contributions
 her-0.020
 the-0.018
 their-0.018
 his-0.017
 its-0.016
Token map
Token ID3975
Feature activation+8.95e-02

Pos logit contributions
umpy+0.254
iver+0.250
ater+0.228
rig+0.225
ini+0.221
Neg logit contributions
 her-0.295
 the-0.291
 it-0.283
 their-0.281
 his-0.273
Token,
Token ID11
Feature activation+1.10e-02

Pos logit contributions
 promise+0.218
opus+0.217
iver+0.217
ute+0.216
ini+0.216
Neg logit contributions
 the-0.121
 her-0.121
 it-0.101
 his-0.100
 their-0.100
Token a
Token ID257
Feature activation+1.40e-01

Pos logit contributions
 promise+0.024
opus+0.024
umpy+0.024
oc+0.024
+0.024
Neg logit contributions
 her-0.017
 the-0.015
 their-0.015
 his-0.015
 its-0.013
Token landmark
Token ID20533
Feature activation+4.37e-02

Pos logit contributions
umpy+0.256
iver+0.251
ater+0.226
ivering+0.223
ini+0.219
Neg logit contributions
 her-0.301
 the-0.293
 their-0.284
 it-0.281
 his-0.278
Token,
Token ID11
Feature activation+1.68e-02

Pos logit contributions
opus+0.105
oc+0.104
umpy+0.103
 promise+0.102
+0.100
Neg logit contributions
 her-0.060
 the-0.057
 their-0.053
 his-0.050
 it-0.047
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.018
umpy+0.018
opus+0.018
 rule+0.018
oc+0.018
Neg logit contributions
 her-0.006
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token gir
Token ID37370
Feature activation+7.58e-02

Pos logit contributions
iver+0.265
umpy+0.248
ater+0.241
ivering+0.229
inning+0.227
Neg logit contributions
 the-0.295
 his-0.277
 her-0.275
 their-0.270
 it-0.262
Tokenaffe
Token ID21223
Feature activation+9.14e-02

Pos logit contributions
 promise+0.080
ograph+0.075
iver+0.072
+0.071
ute+0.071
Neg logit contributions
 her-0.189
 the-0.188
 his-0.187
 its-0.176
 him-0.175
Token,
Token ID11
Feature activation+4.91e-02

Pos logit contributions
oral+0.162
icious+0.157
arl+0.156
iver+0.154
 promise+0.154
Neg logit contributions
 her-0.198
 the-0.198
 his-0.192
 it-0.182
 their-0.177
Token and
Token ID290
Feature activation+1.25e-02

Pos logit contributions
umpy+0.125
 promise+0.122
opus+0.122
ater+0.121
asis+0.120
Neg logit contributions
 her-0.068
 the-0.063
 his-0.060
 their-0.056
 it-0.052
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.037
umpy+0.037
opus+0.036
 rule+0.036
ute+0.036
Neg logit contributions
 her-0.010
 the-0.008
 his-0.008
 their-0.008
 its-0.006
Token snake
Token ID17522
Feature activation+1.10e-01

Pos logit contributions
iver+0.265
umpy+0.248
ater+0.240
ivering+0.229
inning+0.227
Neg logit contributions
 the-0.294
 her-0.279
 his-0.276
 their-0.269
 it-0.258
Token!"
Token ID2474
Feature activation-8.95e-03

Pos logit contributions
oral+0.207
iver+0.206
icious+0.206
ini+0.203
umpy+0.203
Neg logit contributions
 her-0.211
 the-0.208
 his-0.200
 their-0.195
 it-0.188
Token He
Token ID679
Feature activation+7.28e-03

Pos logit contributions
 her+0.013
 their+0.011
 the+0.011
 his+0.011
led+0.010
Neg logit contributions
 promise-0.019
 rule-0.019
-0.019
oc-0.019
umpy-0.019
Token points
Token ID2173
Feature activation-6.07e-02

Pos logit contributions
 promise+0.013
umpy+0.012
 rule+0.012
opus+0.012
oc+0.012
Neg logit contributions
 her-0.015
 the-0.013
 his-0.013
 their-0.013
 its-0.012
Token to
Token ID284
Feature activation-2.83e-02

Pos logit contributions
bled+0.088
led+0.080
lled+0.076
 her+0.072
ered+0.065
Neg logit contributions
 goal-0.117
 name-0.116
 request-0.112
 team-0.111
 promise-0.111
Token and
Token ID290
Feature activation+6.19e-03

Pos logit contributions
umpy+0.060
 promise+0.058
opus+0.058
ater+0.058
oc+0.058
Neg logit contributions
 her-0.040
 the-0.037
 his-0.036
 their-0.035
 it-0.032
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
 promise+0.016
umpy+0.016
opus+0.016
 rule+0.016
+0.016
Neg logit contributions
 her-0.007
 the-0.006
 his-0.005
 their-0.005
 its-0.005
Token monkey
Token ID21657
Feature activation+9.32e-02

Pos logit contributions
iver+0.262
umpy+0.247
ater+0.237
ivering+0.228
inning+0.225
Neg logit contributions
 the-0.288
 his-0.268
 their-0.264
 her-0.263
 it-0.260
Token,
Token ID11
Feature activation+4.53e-02

Pos logit contributions
oral+0.159
arl+0.152
ini+0.152
icious+0.152
 promise+0.150
Neg logit contributions
 the-0.201
 her-0.199
 his-0.195
 it-0.187
 their-0.182
Token and
Token ID290
Feature activation+6.35e-03

Pos logit contributions
umpy+0.116
 promise+0.113
ater+0.112
opus+0.112
oc+0.111
Neg logit contributions
 her-0.064
 the-0.059
 his-0.057
 their-0.052
 it-0.049
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.018
umpy+0.018
opus+0.018
 rule+0.018
oc+0.018
Neg logit contributions
 her-0.006
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token gir
Token ID37370
Feature activation+7.58e-02

Pos logit contributions
iver+0.265
umpy+0.248
ater+0.241
ivering+0.229
inning+0.227
Neg logit contributions
 the-0.295
 his-0.277
 her-0.275
 their-0.270
 it-0.262
Tokenaffe
Token ID21223
Feature activation+9.14e-02

Pos logit contributions
 promise+0.080
ograph+0.075
iver+0.072
+0.071
ute+0.071
Neg logit contributions
 her-0.189
 the-0.188
 his-0.187
 its-0.176
 him-0.175
Token,
Token ID11
Feature activation+4.91e-02

Pos logit contributions
oral+0.162
icious+0.157
arl+0.156
iver+0.154
 promise+0.154
Neg logit contributions
 her-0.198
 the-0.198
 his-0.192
 it-0.182
 their-0.177
Token and
Token ID290
Feature activation+1.25e-02

Pos logit contributions
umpy+0.125
 promise+0.122
opus+0.122
ater+0.121
asis+0.120
Neg logit contributions
 her-0.068
 the-0.063
 his-0.060
 their-0.056
 it-0.052
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.037
umpy+0.037
opus+0.036
 rule+0.036
ute+0.036
Neg logit contributions
 her-0.010
 the-0.008
 his-0.008
 their-0.008
 its-0.006
Token "
Token ID366
Feature activation+5.84e-02

Pos logit contributions
oc+0.079
 Driver+0.078
ney+0.075
age+0.074
umpy+0.073
Neg logit contributions
 her-0.087
 the-0.082
 it-0.074
 its-0.068
 their-0.066
TokenI
Token ID40
Feature activation+5.62e-02

Pos logit contributions
ari+0.117
opus+0.115
 promise+0.114
asis+0.114
ute+0.113
Neg logit contributions
 her-0.106
 his-0.094
 the-0.092
 their-0.088
 it-0.086
Token was
Token ID373
Feature activation+1.52e-02

Pos logit contributions
opus+0.098
affe+0.093
ari+0.092
ute+0.089
asis+0.088
Neg logit contributions
 her-0.124
 the-0.113
 his-0.112
 their-0.110
 it-0.108
Token born
Token ID4642
Feature activation+3.66e-02

Pos logit contributions
 promise+0.036
oc+0.036
opus+0.036
affe+0.035
umpy+0.035
Neg logit contributions
 her-0.021
 the-0.019
 his-0.018
 their-0.018
 it-0.016
Token as
Token ID355
Feature activation+1.27e-02

Pos logit contributions
 promise+0.086
opus+0.084
umpy+0.084
 rule+0.083
oc+0.083
Neg logit contributions
 her-0.054
 the-0.050
 his-0.046
 their-0.044
 it-0.042
Token a
Token ID257
Feature activation+1.38e-01

Pos logit contributions
 promise+0.028
 rule+0.027
oc+0.027
opus+0.027
umpy+0.027
Neg logit contributions
 her-0.020
 the-0.018
 his-0.017
 their-0.017
 it-0.016
Token cater
Token ID20825
Feature activation+6.47e-02

Pos logit contributions
iver+0.297
umpy+0.279
ney+0.264
ater+0.263
ulture+0.260
Neg logit contributions
 the-0.240
 her-0.236
 it-0.231
 his-0.216
 their-0.209
Tokenpillar
Token ID41643
Feature activation+7.57e-02

Pos logit contributions
 promise+0.081
 rule+0.072
 Driver+0.071
ney+0.068
 request+0.067
Neg logit contributions
 her-0.169
 the-0.156
led-0.151
ted-0.149
 it-0.147
Token.
Token ID13
Feature activation+8.01e-02

Pos logit contributions
 promise+0.162
 rule+0.160
umpy+0.159
ute+0.155
ney+0.155
Neg logit contributions
 her-0.129
 the-0.117
 it-0.106
 his-0.104
 its-0.100
Token A
Token ID317
Feature activation+9.25e-02

Pos logit contributions
 promise+0.173
asis+0.171
ater+0.170
 rule+0.163
actus+0.161
Neg logit contributions
 her-0.139
 the-0.133
 his-0.125
 it-0.122
 its-0.118
Token cater
Token ID20825
Feature activation+5.69e-02

Pos logit contributions
ari+0.196
iver+0.196
ney+0.191
umpy+0.190
 Driver+0.189
Neg logit contributions
 her-0.162
 the-0.157
 it-0.144
 its-0.131
 his-0.127
Token a
Token ID257
Feature activation+1.12e-01

Pos logit contributions
 her+0.007
 his+0.005
 their+0.005
 the+0.005
 its+0.005
Neg logit contributions
 promise-0.012
 rule-0.012
umpy-0.012
-0.012
opus-0.012
Token tiny
Token ID7009
Feature activation+8.93e-02

Pos logit contributions
umpy+0.258
iver+0.257
ides+0.254
rr+0.251
'."+0.251
Neg logit contributions
 the-0.178
 her-0.175
 their-0.163
 his-0.161
 it-0.159
Token box
Token ID3091
Feature activation+5.21e-02

Pos logit contributions
icious+0.161
ari+0.156
asis+0.156
age+0.153
iver+0.152
Neg logit contributions
 her-0.184
 the-0.179
 his-0.169
 it-0.163
 their-0.162
Token shaped
Token ID14292
Feature activation+7.53e-03

Pos logit contributions
 promise+0.130
opus+0.128
oc+0.128
umpy+0.127
+0.127
Neg logit contributions
 her-0.069
 the-0.062
 his-0.058
 their-0.055
 it-0.052
Token like
Token ID588
Feature activation-1.76e-02

Pos logit contributions
 promise+0.017
opus+0.017
+0.017
umpy+0.017
 rule+0.017
Neg logit contributions
 her-0.011
 the-0.010
 his-0.010
 their-0.009
 its-0.009
Token a
Token ID257
Feature activation+1.38e-01

Pos logit contributions
 her+0.019
 their+0.016
 his+0.016
 the+0.016
led+0.014
Neg logit contributions
 promise-0.043
 rule-0.043
opus-0.042
-0.042
oc-0.042
Token heart
Token ID2612
Feature activation+4.98e-02

Pos logit contributions
umpy+0.280
iver+0.271
ater+0.248
rr+0.246
ari+0.245
Neg logit contributions
 her-0.278
 it-0.272
 the-0.268
 his-0.255
 their-0.254
Token.
Token ID13
Feature activation+4.30e-02

Pos logit contributions
 promise+0.111
opus+0.111
oc+0.110
+0.107
 rule+0.107
Neg logit contributions
 her-0.080
 the-0.073
 his-0.067
 their-0.066
 it-0.063
Token
Token ID220
Feature activation+1.67e-02

Pos logit contributions
 promise+0.105
ater+0.101
 rule+0.100
asis+0.100
opus+0.098
Neg logit contributions
 her-0.068
 the-0.061
 his-0.060
 their-0.056
 it-0.053
Token
Token ID198
Feature activation+4.58e-02

Pos logit contributions
 promise+0.032
opus+0.031
 rule+0.031
ater+0.030
oc+0.030
Neg logit contributions
 her-0.032
 the-0.028
 his-0.028
 their-0.028
 it-0.026
Token
Token ID198
Feature activation+5.72e-02

Pos logit contributions
 promise+0.107
asis+0.103
opus+0.103
 rule+0.102
ater+0.102
Neg logit contributions
 her-0.076
 the-0.068
 his-0.067
 their-0.062
 it-0.060
Token?
Token ID30
Feature activation+3.85e-02

Pos logit contributions
iver+0.129
umpy+0.127
ute+0.126
ater+0.123
age+0.122
Neg logit contributions
 the-0.094
 her-0.091
 their-0.088
 his-0.080
 it-0.080
Token It
Token ID632
Feature activation+1.33e-02

Pos logit contributions
 promise+0.077
umpy+0.072
 rule+0.072
ater+0.072
asis+0.072
Neg logit contributions
 her-0.073
 the-0.068
 their-0.066
 his-0.064
 it-0.064
Token looks
Token ID3073
Feature activation-1.31e-02

Pos logit contributions
 promise+0.020
opus+0.020
ater+0.020
ute+0.020
umpy+0.020
Neg logit contributions
 her-0.030
 the-0.028
 their-0.027
 his-0.027
 it-0.025
Token more
Token ID517
Feature activation+2.93e-02

Pos logit contributions
 her+0.024
 his+0.021
 their+0.020
 the+0.020
 its+0.019
Neg logit contributions
 promise-0.024
 rule-0.024
-0.023
umpy-0.023
oc-0.023
Token like
Token ID588
Feature activation-3.45e-03

Pos logit contributions
opus+0.064
 promise+0.062
affe+0.062
oc+0.061
age+0.061
Neg logit contributions
 her-0.047
 the-0.044
 their-0.043
 his-0.043
 it-0.039
Token a
Token ID257
Feature activation+1.38e-01

Pos logit contributions
 her+0.003
 his+0.003
 their+0.003
 the+0.003
 its+0.002
Neg logit contributions
 promise-0.009
 rule-0.009
-0.009
opus-0.009
umpy-0.009
Token mess
Token ID2085
Feature activation+2.84e-02

Pos logit contributions
iver+0.271
ater+0.255
umpy+0.248
ute+0.236
ooter+0.235
Neg logit contributions
 it-0.267
 the-0.265
 their-0.261
 her-0.243
 they-0.243
Token.
Token ID13
Feature activation+3.70e-02

Pos logit contributions
 promise+0.054
oc+0.052
opus+0.052
ute+0.052
 rule+0.051
Neg logit contributions
 her-0.055
 the-0.050
 their-0.049
 his-0.048
 it-0.046
Token And
Token ID843
Feature activation+4.50e-04

Pos logit contributions
 promise+0.073
opus+0.072
asis+0.070
 rule+0.068
ater+0.068
Neg logit contributions
 her-0.068
 the-0.063
 their-0.062
 his-0.059
 it-0.059
Token some
Token ID617
Feature activation+4.03e-02

Pos logit contributions
 promise+0.001
 rule+0.001
opus+0.001
umpy+0.001
oc+0.001
Neg logit contributions
 her-0.001
 the-0.001
 his-0.001
 their-0.001
 its-0.001
Token of
Token ID286
Feature activation-1.32e-02

Pos logit contributions
opus+0.083
ute+0.082
umpy+0.081
+0.080
ater+0.080
Neg logit contributions
 her-0.072
 the-0.067
 their-0.065
 his-0.064
 it-0.059
Token,
Token ID11
Feature activation-2.46e-02

Pos logit contributions
 her+0.053
 his+0.048
 their+0.047
led+0.046
 the+0.046
Neg logit contributions
opus-0.050
-0.049
 promise-0.049
rr-0.048
umpy-0.048
Token there
Token ID612
Feature activation-5.22e-02

Pos logit contributions
 her+0.027
 his+0.022
 their+0.021
 the+0.020
 him+0.020
Neg logit contributions
-0.061
ute-0.060
umpy-0.059
 promise-0.059
 rule-0.058
Token was
Token ID373
Feature activation-2.98e-02

Pos logit contributions
 her+0.097
led+0.095
bled+0.088
 able+0.086
 their+0.086
Neg logit contributions
 world-0.079
oc-0.079
 story-0.077
-0.075
 promise-0.074
Token an
Token ID281
Feature activation+3.47e-02

Pos logit contributions
 her+0.026
led+0.022
 his+0.022
 their+0.021
 the+0.020
Neg logit contributions
-0.074
umpy-0.073
ute-0.072
opus-0.072
 world-0.071
Token old
Token ID1468
Feature activation+7.23e-02

Pos logit contributions
+0.070
 promise+0.066
!'"+0.066
oc+0.066
age+0.066
Neg logit contributions
 her-0.064
 the-0.057
 his-0.056
 their-0.054
 its-0.050
Token wooden
Token ID13510
Feature activation+1.38e-01

Pos logit contributions
icious+0.131
 Deal+0.125
umpy+0.123
rr+0.123
!'"+0.122
Neg logit contributions
 her-0.164
 the-0.152
 his-0.149
 their-0.137
 it-0.136
Token box
Token ID3091
Feature activation+3.39e-02

Pos logit contributions
iver+0.254
icious+0.244
umpy+0.243
!'"+0.224
rig+0.223
Neg logit contributions
 her-0.280
 the-0.274
 his-0.273
 their-0.245
 all-0.242
Token that
Token ID326
Feature activation-5.15e-02

Pos logit contributions
 promise+0.077
opus+0.075
+0.075
oc+0.075
umpy+0.074
Neg logit contributions
 her-0.053
 the-0.048
 his-0.045
 their-0.044
 it-0.040
Token no
Token ID645
Feature activation+1.90e-02

Pos logit contributions
led+0.090
 her+0.089
lled+0.087
ged+0.087
 their+0.084
Neg logit contributions
 promise-0.082
 world-0.079
 rule-0.079
ney-0.078
 hill-0.076
Token one
Token ID530
Feature activation-2.23e-04

Pos logit contributions
+0.037
umpy+0.037
opus+0.037
 promise+0.037
asis+0.036
Neg logit contributions
 her-0.035
 his-0.032
 the-0.031
 their-0.030
 it-0.028
Token could
Token ID714
Feature activation+1.78e-02

Pos logit contributions
 her+0.000
 his+0.000
 the+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.000
 rule-0.000
opus-0.000
-0.000
umpy-0.000
Token has
Token ID468
Feature activation-1.07e-02

Pos logit contributions
 promise+0.048
ute+0.048
ater+0.048
umpy+0.048
opus+0.047
Neg logit contributions
 her-0.063
 the-0.058
 his-0.057
 their-0.057
 it-0.053
Token a
Token ID257
Feature activation+1.17e-01

Pos logit contributions
 her+0.014
 their+0.011
 his+0.011
 the+0.011
 its+0.010
Neg logit contributions
 promise-0.025
 rule-0.025
oc-0.025
umpy-0.025
opus-0.024
Token long
Token ID890
Feature activation+6.75e-02

Pos logit contributions
iver+0.229
umpy+0.220
ater+0.219
asis+0.211
inning+0.210
Neg logit contributions
 her-0.226
 his-0.216
 it-0.211
 the-0.210
 its-0.203
Token nose
Token ID9686
Feature activation+5.33e-02

Pos logit contributions
 promise+0.120
opus+0.118
umpy+0.116
rig+0.112
icious+0.111
Neg logit contributions
 her-0.141
 his-0.133
 the-0.129
 its-0.124
 it-0.123
Token called
Token ID1444
Feature activation+5.36e-02

Pos logit contributions
 promise+0.137
 rule+0.128
 favor+0.127
+0.127
oral+0.126
Neg logit contributions
 her-0.075
 the-0.067
 his-0.063
 it-0.061
 their-0.059
Token a
Token ID257
Feature activation+1.37e-01

Pos logit contributions
 promise+0.132
umpy+0.124
asis+0.124
 rule+0.123
opus+0.122
Neg logit contributions
 her-0.081
 the-0.075
 his-0.069
 their-0.068
 its-0.063
Token trunk
Token ID21427
Feature activation+8.01e-02

Pos logit contributions
iver+0.251
umpy+0.233
ater+0.228
rig+0.226
 promise+0.226
Neg logit contributions
 the-0.295
 her-0.294
 his-0.281
 it-0.272
 its-0.265
Token.
Token ID13
Feature activation+4.62e-02

Pos logit contributions
 promise+0.207
iver+0.203
icious+0.200
opus+0.199
 Driver+0.198
Neg logit contributions
 the-0.103
 her-0.099
 his-0.087
 it-0.085
 their-0.081
Token It
Token ID632
Feature activation+2.87e-02

Pos logit contributions
 promise+0.100
asis+0.094
opus+0.093
 rule+0.092
ater+0.092
Neg logit contributions
 her-0.082
 the-0.076
 his-0.075
 their-0.073
 its-0.071
Token can
Token ID460
Feature activation+4.80e-02

Pos logit contributions
 promise+0.047
umpy+0.046
ute+0.046
opus+0.046
ater+0.046
Neg logit contributions
 her-0.062
 the-0.058
 his-0.057
 their-0.056
 it-0.053
Token use
Token ID779
Feature activation+1.28e-02

Pos logit contributions
umpy+0.082
ari+0.080
opus+0.080
ater+0.080
ute+0.079
Neg logit contributions
 her-0.109
 the-0.100
 his-0.098
 their-0.093
 it-0.091
Token,
Token ID11
Feature activation+1.10e-02

Pos logit contributions
 promise+0.218
opus+0.217
iver+0.217
ute+0.216
ini+0.216
Neg logit contributions
 the-0.121
 her-0.121
 it-0.101
 his-0.100
 their-0.100
Token a
Token ID257
Feature activation+1.40e-01

Pos logit contributions
 promise+0.024
opus+0.024
umpy+0.024
oc+0.024
+0.024
Neg logit contributions
 her-0.017
 the-0.015
 their-0.015
 his-0.015
 its-0.013
Token landmark
Token ID20533
Feature activation+4.37e-02

Pos logit contributions
umpy+0.256
iver+0.251
ater+0.226
ivering+0.223
ini+0.219
Neg logit contributions
 her-0.301
 the-0.293
 their-0.284
 it-0.281
 his-0.278
Token,
Token ID11
Feature activation+1.68e-02

Pos logit contributions
opus+0.105
oc+0.104
umpy+0.103
 promise+0.102
+0.100
Neg logit contributions
 her-0.060
 the-0.057
 their-0.053
 his-0.050
 it-0.047
Token or
Token ID393
Feature activation+8.04e-03

Pos logit contributions
opus+0.037
umpy+0.037
 promise+0.037
+0.036
oc+0.036
Neg logit contributions
 her-0.026
 the-0.023
 their-0.023
 his-0.023
 its-0.020
Token a
Token ID257
Feature activation+1.37e-01

Pos logit contributions
 promise+0.023
opus+0.023
umpy+0.023
oc+0.023
+0.023
Neg logit contributions
 her-0.007
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token symbol
Token ID6194
Feature activation+7.05e-02

Pos logit contributions
iver+0.237
umpy+0.236
ivering+0.209
ini+0.208
ater+0.208
Neg logit contributions
 her-0.301
 the-0.295
 their-0.285
 it-0.283
 his-0.278
Token.
Token ID13
Feature activation+5.29e-02

Pos logit contributions
 promise+0.137
opus+0.132
oc+0.131
iver+0.131
umpy+0.129
Neg logit contributions
 her-0.134
 the-0.130
 his-0.120
 their-0.118
 it-0.114
Token That
Token ID1320
Feature activation+2.60e-02

Pos logit contributions
opus+0.101
 promise+0.099
asis+0.098
ater+0.095
ute+0.094
Neg logit contributions
 her-0.105
 the-0.095
 his-0.093
 their-0.092
 its-0.088
Token way
Token ID835
Feature activation+2.30e-02

Pos logit contributions
umpy+0.045
opus+0.044
arl+0.044
ater+0.044
affe+0.042
Neg logit contributions
 her-0.054
 the-0.050
 their-0.048
 his-0.047
 it-0.046
Token,
Token ID11
Feature activation+7.90e-05

Pos logit contributions
oc+0.054
umpy+0.053
opus+0.053
affe+0.052
ater+0.052
Neg logit contributions
 her-0.035
 the-0.031
 their-0.030
 his-0.029
 it-0.027
TokenWow
Token ID22017
Feature activation+4.92e-03

Pos logit contributions
umpy+0.026
oc+0.026
opus+0.025
 promise+0.025
ater+0.024
Neg logit contributions
 her-0.049
 the-0.047
 his-0.045
 their-0.045
 it-0.043
Token!
Token ID0
Feature activation+6.14e-02

Pos logit contributions
 promise+0.010
 rule+0.010
oc+0.010
umpy+0.010
opus+0.010
Neg logit contributions
 her-0.008
 the-0.007
 his-0.007
 their-0.007
 it-0.007
Token You
Token ID921
Feature activation+2.91e-02

Pos logit contributions
ater+0.119
 promise+0.118
 rule+0.117
asis+0.116
ute+0.113
Neg logit contributions
 her-0.122
 the-0.112
 his-0.108
 their-0.105
 it-0.103
Token are
Token ID389
Feature activation-1.03e-02

Pos logit contributions
oc+0.057
opus+0.057
ute+0.056
 rule+0.055
affe+0.055
Neg logit contributions
 her-0.053
 the-0.049
 his-0.045
 their-0.045
 it-0.044
Token a
Token ID257
Feature activation+9.84e-02

Pos logit contributions
 her+0.014
 his+0.012
 their+0.012
 the+0.011
 its+0.010
Neg logit contributions
 promise-0.024
 rule-0.023
umpy-0.023
opus-0.023
-0.023
Token purple
Token ID14032
Feature activation+1.37e-01

Pos logit contributions
iver+0.233
age+0.230
ney+0.227
asis+0.227
ari+0.224
Neg logit contributions
 her-0.150
 their-0.139
 the-0.138
 his-0.130
 it-0.128
Token z
Token ID1976
Feature activation+6.90e-02

Pos logit contributions
umpy+0.272
iver+0.265
rig+0.260
asis+0.247
ney+0.247
Neg logit contributions
 her-0.290
 the-0.260
 his-0.253
 it-0.247
 their-0.235
Tokenebra
Token ID37052
Feature activation+1.10e-01

Pos logit contributions
umpy+0.094
 difference+0.090
 rule+0.086
 promise+0.085
ney+0.085
Neg logit contributions
 her-0.153
 his-0.147
 the-0.142
bled-0.139
 their-0.136
Token!
Token ID0
Feature activation+7.17e-02

Pos logit contributions
oral+0.224
arl+0.223
umpy+0.216
reath+0.214
iver+0.211
Neg logit contributions
 her-0.201
 the-0.190
 his-0.178
 it-0.176
 their-0.170
Token Can
Token ID1680
Feature activation+2.01e-02

Pos logit contributions
asis+0.149
 promise+0.148
ater+0.142
opus+0.140
uddle+0.138
Neg logit contributions
 her-0.139
 the-0.127
 his-0.124
 its-0.122
 their-0.121
Token I
Token ID314
Feature activation+9.26e-02

Pos logit contributions
umpy+0.033
ater+0.032
 promise+0.031
oc+0.031
asis+0.031
Neg logit contributions
 her-0.046
 the-0.042
 his-0.041
 their-0.040
 it-0.040
Token"
Token ID1
Feature activation+4.81e-02

Pos logit contributions
 promise+0.076
 rule+0.072
ater+0.071
affe+0.069
opus+0.069
Neg logit contributions
 her-0.066
 the-0.062
 his-0.059
 their-0.057
 it-0.055
TokenLook
Token ID8567
Feature activation-7.13e-03

Pos logit contributions
 promise+0.089
 rule+0.082
oc+0.081
opus+0.080
asis+0.079
Neg logit contributions
 her-0.101
 the-0.092
 his-0.089
 their-0.087
 it-0.083
Token,
Token ID11
Feature activation+1.86e-02

Pos logit contributions
 her+0.009
 his+0.008
 their+0.008
 the+0.008
 its+0.007
Neg logit contributions
 promise-0.017
-0.017
opus-0.017
 rule-0.017
oc-0.017
Token there
Token ID612
Feature activation+3.91e-02

Pos logit contributions
 promise+0.053
umpy+0.053
oc+0.053
opus+0.052
 rule+0.052
Neg logit contributions
 her-0.018
 the-0.015
 his-0.014
 their-0.014
 it-0.012
Token's
Token ID338
Feature activation+1.67e-02

Pos logit contributions
umpy+0.070
ater+0.070
 promise+0.067
oc+0.067
opus+0.067
Neg logit contributions
 her-0.079
 the-0.074
 their-0.071
 his-0.070
 it-0.068
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
opus+0.051
oc+0.051
umpy+0.050
 promise+0.050
affe+0.050
Neg logit contributions
 her-0.013
 his-0.010
 their-0.010
 the-0.010
 its-0.007
Token stand
Token ID1302
Feature activation+5.95e-02

Pos logit contributions
age+0.323
iver+0.315
umpy+0.315
ater+0.313
ides+0.310
Neg logit contributions
 the-0.224
 her-0.200
 it-0.200
 his-0.186
 their-0.183
Token!"
Token ID2474
Feature activation-4.18e-02

Pos logit contributions
 promise+0.140
 rule+0.134
icious+0.133
reath+0.133
affe+0.132
Neg logit contributions
 her-0.082
 the-0.081
 his-0.074
 it-0.074
 their-0.072
Token said
Token ID531
Feature activation-4.68e-02

Pos logit contributions
 her+0.066
bled+0.063
 their+0.059
lled+0.057
 the+0.056
Neg logit contributions
-0.080
oc-0.073
 request-0.073
 rule-0.073
affe-0.072
Token Lucy
Token ID22162
Feature activation+5.59e-03

Pos logit contributions
 her+0.041
led+0.036
lled+0.033
bled+0.032
 their+0.030
Neg logit contributions
-0.117
 promise-0.116
 team-0.115
 request-0.115
 rule-0.113
Token.
Token ID13
Feature activation+2.80e-02

Pos logit contributions
 promise+0.011
 rule+0.011
umpy+0.010
opus+0.010
+0.010
Neg logit contributions
 her-0.010
 his-0.009
 the-0.009
 their-0.009
 its-0.008
Token,
Token ID11
Feature activation+3.38e-02

Pos logit contributions
opus+0.085
 promise+0.084
oc+0.083
asis+0.083
umpy+0.082
Neg logit contributions
 her-0.049
 the-0.045
 their-0.043
 his-0.042
 it-0.039
Token a
Token ID257
Feature activation+1.33e-01

Pos logit contributions
opus+0.073
umpy+0.072
asis+0.071
oc+0.071
+0.070
Neg logit contributions
 her-0.060
 the-0.053
 his-0.052
 their-0.051
 it-0.046
Token nose
Token ID9686
Feature activation+5.37e-02

Pos logit contributions
umpy+0.237
asis+0.223
ari+0.219
ini+0.217
ater+0.217
Neg logit contributions
 her-0.283
 the-0.277
 his-0.268
 their-0.260
 its-0.258
Token,
Token ID11
Feature activation+2.48e-02

Pos logit contributions
opus+0.139
 promise+0.138
asis+0.137
+0.136
umpy+0.134
Neg logit contributions
 her-0.067
 the-0.058
 his-0.055
 their-0.053
 it-0.048
Token and
Token ID290
Feature activation+2.41e-02

Pos logit contributions
umpy+0.064
opus+0.064
 promise+0.063
oc+0.063
asis+0.063
Neg logit contributions
 her-0.031
 the-0.027
 his-0.026
 their-0.025
 its-0.022
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
umpy+0.071
opus+0.070
 promise+0.070
asis+0.069
ute+0.069
Neg logit contributions
 her-0.022
 the-0.018
 his-0.017
 their-0.017
 its-0.013
Token collar
Token ID19908
Feature activation+5.17e-02

Pos logit contributions
umpy+0.248
ater+0.234
ari+0.232
asis+0.231
iver+0.228
Neg logit contributions
 her-0.290
 the-0.279
 his-0.275
 their-0.266
 its-0.260
Token.
Token ID13
Feature activation+5.17e-02

Pos logit contributions
opus+0.117
asis+0.114
 promise+0.113
umpy+0.112
oc+0.112
Neg logit contributions
 her-0.086
 the-0.079
 his-0.073
 their-0.072
 it-0.069
Token Then
Token ID3244
Feature activation+2.46e-03

Pos logit contributions
asis+0.097
opus+0.095
 promise+0.094
ater+0.092
ute+0.092
Neg logit contributions
 her-0.104
 the-0.095
 his-0.092
 their-0.090
 its-0.087
Token your
Token ID534
Feature activation+6.29e-02

Pos logit contributions
 promise+0.004
umpy+0.004
opus+0.004
 rule+0.004
oc+0.004
Neg logit contributions
 her-0.005
 the-0.005
 his-0.005
 their-0.005
 it-0.004
Token dog
Token ID3290
Feature activation+5.84e-02

Pos logit contributions
oc+0.106
asis+0.104
umpy+0.102
opus+0.100
rr+0.098
Neg logit contributions
 her-0.132
 the-0.125
 their-0.120
 his-0.120
 it-0.112
Token draw
Token ID3197
Feature activation-1.45e-02

Pos logit contributions
umpy+0.143
ney+0.138
actus+0.135
opus+0.135
opard+0.132
Neg logit contributions
 her-0.126
 the-0.119
 his-0.108
 it-0.107
 him-0.099
Token me
Token ID502
Feature activation+2.50e-02

Pos logit contributions
 her+0.010
 their+0.008
 his+0.007
 the+0.007
 its+0.005
Neg logit contributions
-0.042
oc-0.041
 promise-0.041
opus-0.041
 rule-0.041
Token a
Token ID257
Feature activation+1.24e-01

Pos logit contributions
umpy+0.037
 promise+0.036
 rule+0.036
opus+0.035
affe+0.035
Neg logit contributions
 her-0.058
 the-0.055
 his-0.053
 their-0.050
 it-0.049
Token picture
Token ID4286
Feature activation+5.11e-02

Pos logit contributions
umpy+0.244
rr+0.241
ari+0.238
agons+0.236
iver+0.236
Neg logit contributions
 it-0.229
 the-0.225
 her-0.218
 his-0.206
 its-0.194
Token of
Token ID286
Feature activation+1.53e-02

Pos logit contributions
 promise+0.089
oc+0.088
ater+0.086
umpy+0.084
opus+0.083
Neg logit contributions
 her-0.114
 the-0.104
 his-0.101
 it-0.095
 their-0.095
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
umpy+0.038
 promise+0.038
opus+0.037
ute+0.037
oc+0.037
Neg logit contributions
 her-0.021
 the-0.018
 his-0.018
 their-0.017
 it-0.016
Token banana
Token ID25996
Feature activation+9.61e-02

Pos logit contributions
iver+0.270
umpy+0.258
arl+0.253
rr+0.253
agons+0.246
Neg logit contributions
 it-0.272
 the-0.265
 her-0.255
 his-0.233
 its-0.220
Token?"
Token ID1701
Feature activation+4.66e-03

Pos logit contributions
umpy+0.156
ini+0.153
rr+0.148
 promise+0.147
rig+0.145
Neg logit contributions
 her-0.213
 the-0.209
 his-0.194
 it-0.190
 how-0.174
Token\n
Token ID198
Feature activation+2.31e-02

Pos logit contributions
 promise+0.008
 rule+0.008
opus+0.008
umpy+0.008
oc+0.008
Neg logit contributions
 her-0.009
 the-0.008
 his-0.008
 their-0.008
 it-0.008
Token\n
Token ID198
Feature activation+4.20e-02

Pos logit contributions
asis+0.047
 promise+0.047
opus+0.047
 rule+0.047
ater+0.047
Neg logit contributions
 her-0.042
 the-0.038
 his-0.036
 their-0.036
 it-0.034
TokenL
Token ID43
Feature activation+1.63e-02

Pos logit contributions
asis+0.098
ode+0.088
affe+0.088
opus+0.087
ari+0.087
Neg logit contributions
 her-0.080
 the-0.068
 their-0.064
 his-0.062
 it-0.060
Token is
Token ID318
Feature activation+1.82e-03

Pos logit contributions
umpy+0.039
ute+0.037
 promise+0.037
ater+0.037
oc+0.037
Neg logit contributions
 her-0.031
 the-0.028
 his-0.027
 their-0.026
 it-0.025
Token a
Token ID257
Feature activation+1.12e-01

Pos logit contributions
 promise+0.005
 rule+0.005
opus+0.005
umpy+0.005
oc+0.005
Neg logit contributions
 her-0.002
 the-0.001
 his-0.001
 their-0.001
 its-0.001
Token lion
Token ID18744
Feature activation+9.68e-02

Pos logit contributions
iver+0.215
ater+0.205
umpy+0.204
ivering+0.201
pillar+0.184
Neg logit contributions
 the-0.238
 their-0.227
 it-0.215
 his-0.213
 her-0.213
Token,
Token ID11
Feature activation+2.57e-02

Pos logit contributions
oral+0.175
ini+0.167
 promise+0.166
umpy+0.165
arl+0.165
Neg logit contributions
 the-0.202
 her-0.195
 his-0.191
 it-0.183
 their-0.176
Token and
Token ID290
Feature activation+6.19e-03

Pos logit contributions
umpy+0.060
 promise+0.058
opus+0.058
ater+0.058
oc+0.058
Neg logit contributions
 her-0.040
 the-0.037
 his-0.036
 their-0.035
 it-0.032
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
 promise+0.016
umpy+0.016
opus+0.016
 rule+0.016
+0.016
Neg logit contributions
 her-0.007
 the-0.006
 his-0.005
 their-0.005
 its-0.005
Token monkey
Token ID21657
Feature activation+9.32e-02

Pos logit contributions
iver+0.262
umpy+0.247
ater+0.237
ivering+0.228
inning+0.225
Neg logit contributions
 the-0.288
 his-0.268
 their-0.264
 her-0.263
 it-0.260
Token,
Token ID11
Feature activation+4.53e-02

Pos logit contributions
oral+0.159
arl+0.152
ini+0.152
icious+0.152
 promise+0.150
Neg logit contributions
 the-0.201
 her-0.199
 his-0.195
 it-0.187
 their-0.182
Token and
Token ID290
Feature activation+6.35e-03

Pos logit contributions
umpy+0.116
 promise+0.113
ater+0.112
opus+0.112
oc+0.111
Neg logit contributions
 her-0.064
 the-0.059
 his-0.057
 their-0.052
 it-0.049
Token a
Token ID257
Feature activation+1.39e-01

Pos logit contributions
 promise+0.018
umpy+0.018
opus+0.018
 rule+0.018
oc+0.018
Neg logit contributions
 her-0.006
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token gir
Token ID37370
Feature activation+7.58e-02

Pos logit contributions
iver+0.265
umpy+0.248
ater+0.241
ivering+0.229
inning+0.227
Neg logit contributions
 the-0.295
 his-0.277
 her-0.275
 their-0.270
 it-0.262
Token a
Token ID257
Feature activation+1.18e-01

Pos logit contributions
 her+0.016
 their+0.013
 his+0.013
 the+0.013
 its+0.012
Neg logit contributions
 promise-0.026
 rule-0.025
-0.025
opus-0.025
oc-0.025
Token ball
Token ID2613
Feature activation+5.86e-02

Pos logit contributions
umpy+0.231
iver+0.221
ari+0.213
oc+0.210
ney+0.210
Neg logit contributions
 her-0.239
 the-0.225
 his-0.219
 it-0.202
 their-0.197
Token,
Token ID11
Feature activation-9.24e-03

Pos logit contributions
 promise+0.153
umpy+0.152
iver+0.151
 Amen+0.150
asis+0.150
Neg logit contributions
 her-0.074
 the-0.067
 his-0.064
 their-0.059
 it-0.054
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
 her+0.016
 their+0.014
 his+0.014
 the+0.013
 its+0.012
Neg logit contributions
 promise-0.018
-0.018
 rule-0.018
oc-0.017
opus-0.017
Token even
Token ID772
Feature activation-1.49e-02

Pos logit contributions
 her+0.012
 their+0.009
 his+0.009
 the+0.009
 its+0.007
Neg logit contributions
 promise-0.038
 rule-0.037
-0.037
oc-0.036
opus-0.036
Token a
Token ID257
Feature activation+1.36e-01

Pos logit contributions
 her+0.015
 their+0.012
 his+0.012
 the+0.012
 its+0.010
Neg logit contributions
 promise-0.039
 rule-0.038
-0.038
oc-0.037
umpy-0.037
Token spoon
Token ID24556
Feature activation+4.17e-02

Pos logit contributions
iver+0.257
umpy+0.243
asis+0.228
ari+0.227
rr+0.223
Neg logit contributions
 her-0.278
 the-0.269
 his-0.257
 it-0.249
 their-0.246
Token.
Token ID13
Feature activation+7.72e-03

Pos logit contributions
 promise+0.088
+0.088
oc+0.087
opus+0.086
umpy+0.085
Neg logit contributions
 her-0.070
 his-0.062
 the-0.062
 their-0.060
 it-0.055
Token He
Token ID679
Feature activation+2.05e-02

Pos logit contributions
 promise+0.018
opus+0.017
 rule+0.017
oc+0.017
umpy+0.017
Neg logit contributions
 her-0.011
 his-0.010
 the-0.010
 their-0.010
 it-0.009
Token likes
Token ID7832
Feature activation-9.94e-03

Pos logit contributions
umpy+0.033
opus+0.032
 promise+0.032
 rule+0.032
ute+0.032
Neg logit contributions
 her-0.044
 the-0.040
 his-0.040
 their-0.039
 it-0.037
Token to
Token ID284
Feature activation+4.99e-02

Pos logit contributions
 her+0.011
 their+0.009
 his+0.009
 the+0.009
 its+0.008
Neg logit contributions
 promise-0.025
 rule-0.024
opus-0.024
oc-0.024
-0.024
Token fingers
Token ID9353
Feature activation-1.41e-02

Pos logit contributions
opus+0.090
umpy+0.086
asis+0.084
ute+0.084
+0.080
Neg logit contributions
 her-0.099
 his-0.091
 the-0.090
 their-0.085
 its-0.083
Token.
Token ID13
Feature activation+7.27e-03

Pos logit contributions
 her+0.028
 their+0.025
 his+0.025
 the+0.025
 its+0.023
Neg logit contributions
 promise-0.023
 rule-0.022
umpy-0.022
ute-0.022
oc-0.022
Token\n
Token ID198
Feature activation+2.44e-02

Pos logit contributions
 promise+0.015
 rule+0.015
opus+0.014
oc+0.014
umpy+0.014
Neg logit contributions
 her-0.012
 the-0.011
 his-0.011
 their-0.011
 it-0.010
Token\n
Token ID198
Feature activation+3.87e-02

Pos logit contributions
 promise+0.051
 rule+0.050
ater+0.049
opus+0.049
asis+0.048
Neg logit contributions
 her-0.044
 the-0.039
 his-0.038
 their-0.037
 it-0.036
TokenShe
Token ID3347
Feature activation-2.99e-02

Pos logit contributions
 promise+0.083
 rule+0.078
 Driver+0.077
asis+0.076
 request+0.075
Neg logit contributions
 her-0.071
 the-0.061
 his-0.057
 their-0.056
 it-0.055
Token inserted
Token ID18846
Feature activation-1.21e-01

Pos logit contributions
 her+0.047
 their+0.043
 his+0.040
 the+0.040
 its+0.038
Neg logit contributions
 promise-0.061
oc-0.061
ney-0.058
 request-0.057
-0.057
Token the
Token ID262
Feature activation+1.71e-02

Pos logit contributions
led+0.145
bled+0.121
ied+0.098
icked+0.097
zed+0.097
Neg logit contributions
 shelf-0.230
 promise-0.215
 button-0.215
 hill-0.213
 goal-0.211
Token key
Token ID1994
Feature activation-4.78e-02

Pos logit contributions
opus+0.024
umpy+0.024
oc+0.024
+0.023
 promise+0.023
Neg logit contributions
 her-0.040
 the-0.037
 his-0.036
 their-0.035
 it-0.034
Token into
Token ID656
Feature activation-7.44e-02

Pos logit contributions
 her+0.065
 their+0.060
 his+0.058
 the+0.053
 its+0.051
Neg logit contributions
 promise-0.101
ney-0.099
ute-0.098
age-0.097
 hill-0.097
Token the
Token ID262
Feature activation+3.51e-02

Pos logit contributions
led+0.077
bled+0.055
ied+0.055
icked+0.048
ed+0.045
Neg logit contributions
 shelf-0.185
 hill-0.175
 rule-0.174
 house-0.173
 promise-0.171
Token key
Token ID1994
Feature activation-1.14e-02

Pos logit contributions
umpy+0.052
opus+0.052
ute+0.049
oc+0.049
iver+0.048
Neg logit contributions
 her-0.084
 the-0.078
 his-0.077
 their-0.072
 it-0.071
Token went
Token ID1816
Feature activation-1.58e-02

Pos logit contributions
 her+0.059
 their+0.053
 his+0.053
 its+0.052
 the+0.052
Neg logit contributions
 promise-0.079
ari-0.075
oc-0.074
ney-0.074
age-0.072
Token on
Token ID319
Feature activation-5.42e-02

Pos logit contributions
 her+0.020
 his+0.017
 their+0.017
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.036
 rule-0.035
ute-0.035
-0.035
 request-0.034
Token a
Token ID257
Feature activation+4.91e-03

Pos logit contributions
led+0.064
bled+0.056
icked+0.048
ied+0.048
ed+0.048
Neg logit contributions
 promise-0.127
 rule-0.127
 story-0.127
 farm-0.127
 hill-0.124
Token walk
Token ID2513
Feature activation-2.45e-02

Pos logit contributions
 promise+0.006
umpy+0.006
opus+0.006
oc+0.006
 rule+0.006
Neg logit contributions
 her-0.012
 the-0.011
 their-0.011
 his-0.011
 its-0.010
Token and
Token ID290
Feature activation-8.44e-02

Pos logit contributions
 her+0.040
 his+0.036
 nothing+0.034
bled+0.034
 their+0.033
Neg logit contributions
 promise-0.046
 rule-0.045
ute-0.045
 world-0.045
ney-0.045
Token soon
Token ID2582
Feature activation-1.20e-01

Pos logit contributions
bled+0.150
led+0.143
 such+0.143
ped+0.142
ries+0.141
Neg logit contributions
 swing-0.116
 station-0.114
 house-0.111
 team-0.109
 hill-0.109
Token they
Token ID484
Feature activation-6.49e-02

Pos logit contributions
 his+0.183
nam+0.182
bled+0.179
 how+0.175
years+0.172
Neg logit contributions
ari-0.186
-0.173
 team-0.167
 group-0.165
 house-0.164
Token saw
Token ID2497
Feature activation-5.93e-02

Pos logit contributions
 her+0.118
 his+0.111
 how+0.110
 such+0.109
 its+0.108
Neg logit contributions
ari-0.096
 team-0.093
 house-0.089
 promise-0.089
 hill-0.088
Token a
Token ID257
Feature activation+8.60e-02

Pos logit contributions
led+0.069
ened+0.066
 her+0.065
ged+0.063
bled+0.062
Neg logit contributions
 world-0.125
 fence-0.124
 farm-0.121
 house-0.119
 hill-0.117
Token lab
Token ID2248
Feature activation+4.16e-02

Pos logit contributions
.'"+0.194
iver+0.193
rr+0.192
umpy+0.192
'."+0.192
Neg logit contributions
 the-0.133
 it-0.124
 her-0.122
 their-0.115
 his-0.110
Token.
Token ID13
Feature activation-6.26e-03

Pos logit contributions
 promise+0.091
+0.090
umpy+0.089
opus+0.089
oc+0.089
Neg logit contributions
 her-0.066
 the-0.061
 their-0.059
 his-0.057
 it-0.055
Token followed
Token ID3940
Feature activation-4.86e-02

Pos logit contributions
 her+0.048
 their+0.044
 his+0.042
 the+0.040
 its+0.040
Neg logit contributions
 promise-0.057
oc-0.056
-0.053
 request-0.053
ney-0.053
Token the
Token ID262
Feature activation+3.62e-02

Pos logit contributions
led+0.031
bled+0.024
icked+0.017
lled+0.016
ied+0.016
Neg logit contributions
 request-0.137
 promise-0.136
 rule-0.133
 story-0.130
 shelf-0.129
Token ringing
Token ID32333
Feature activation-3.54e-03

Pos logit contributions
oc+0.063
+0.059
affe+0.058
umpy+0.057
!'"+0.057
Neg logit contributions
 her-0.082
 the-0.075
 their-0.071
 his-0.070
 it-0.066
Token sound
Token ID2128
Feature activation-4.20e-02

Pos logit contributions
 her+0.007
 his+0.007
 their+0.007
 the+0.007
 its+0.006
Neg logit contributions
 promise-0.006
 rule-0.005
opus-0.005
umpy-0.005
-0.005
Token and
Token ID290
Feature activation-8.13e-02

Pos logit contributions
 her+0.070
 his+0.064
 their+0.063
 the+0.060
bled+0.060
Neg logit contributions
 promise-0.072
 rule-0.071
umpy-0.071
affe-0.070
age-0.069
Token soon
Token ID2582
Feature activation-1.19e-01

Pos logit contributions
ied+0.119
led+0.118
bled+0.115
icked+0.112
 theirs+0.111
Neg logit contributions
 promise-0.125
 house-0.125
 judge-0.124
 hill-0.123
 request-0.122
Token found
Token ID1043
Feature activation-7.34e-02

Pos logit contributions
bled+0.183
nam+0.164
 theirs+0.162
ied+0.162
ped+0.157
Neg logit contributions
ari-0.163
 house-0.162
 team-0.162
pillar-0.158
 group-0.158
Token the
Token ID262
Feature activation+4.26e-02

Pos logit contributions
led+0.085
ged+0.074
bled+0.070
ened+0.066
lled+0.060
Neg logit contributions
 name-0.151
age-0.150
 world-0.148
 veterinarian-0.148
 rule-0.147
Token source
Token ID2723
Feature activation-2.72e-02

Pos logit contributions
oc+0.083
affe+0.079
opus+0.077
+0.076
rr+0.074
Neg logit contributions
 her-0.087
 the-0.081
 their-0.075
 his-0.071
 it-0.069
Token
Token ID6184
Feature activation+4.91e-02

Pos logit contributions
 her+0.050
 his+0.046
 their+0.045
 the+0.043
 its+0.041
Neg logit contributions
 promise-0.046
umpy-0.046
 rule-0.045
oc-0.044
opus-0.044
Token
Token ID95
Feature activation+4.60e-02

Pos logit contributions
 promise+0.069
oc+0.064
umpy+0.064
 rule+0.062
ney+0.062
Neg logit contributions
 her-0.111
 the-0.105
 their-0.097
 it-0.097
 its-0.094
Token as
Token ID355
Feature activation-6.49e-02

Pos logit contributions
 her+0.086
bled+0.081
lled+0.076
ied+0.073
 none+0.072
Neg logit contributions
-0.140
affe-0.139
oc-0.133
 road-0.131
age-0.127
Token they
Token ID484
Feature activation-6.22e-02

Pos logit contributions
led+0.068
lled+0.063
 her+0.061
ened+0.057
ied+0.053
Neg logit contributions
 hill-0.145
 road-0.136
 farm-0.135
 team-0.134
 ramp-0.132
Token could
Token ID714
Feature activation-2.93e-02

Pos logit contributions
 her+0.125
 their+0.112
 his+0.105
 its+0.104
 such+0.103
Neg logit contributions
 promise-0.094
ite-0.092
ari-0.091
-0.091
 hill-0.090
Token,
Token ID11
Feature activation-4.12e-02

Pos logit contributions
 her+0.056
 their+0.049
 his+0.049
led+0.049
 the+0.048
Neg logit contributions
 promise-0.050
 rule-0.047
ute-0.047
age-0.045
 request-0.045
Token and
Token ID290
Feature activation-8.81e-02

Pos logit contributions
 her+0.079
led+0.072
 their+0.068
 his+0.068
ied+0.066
Neg logit contributions
 hill-0.065
 promise-0.065
 rule-0.063
 ramp-0.062
 road-0.062
Token soon
Token ID2582
Feature activation-1.18e-01

Pos logit contributions
icked+0.107
 her+0.106
bled+0.105
 his+0.103
ries+0.099
Neg logit contributions
 station-0.151
 team-0.150
 farm-0.149
 promise-0.148
 hill-0.147
Token they
Token ID484
Feature activation-5.76e-02

Pos logit contributions
bled+0.152
years+0.141
ied+0.135
led+0.130
nam+0.129
Neg logit contributions
 team-0.220
-0.207
 house-0.205
 story-0.202
ari-0.201
Token were
Token ID547
Feature activation-5.82e-02

Pos logit contributions
 her+0.104
 their+0.083
 his+0.082
 how+0.080
 its+0.079
Neg logit contributions
 team-0.100
 promise-0.099
ari-0.098
umpy-0.098
 hill-0.098
Token far
Token ID1290
Feature activation-1.54e-02

Pos logit contributions
 her+0.106
led+0.097
ted+0.089
ied+0.089
 their+0.086
Neg logit contributions
 promise-0.089
umpy-0.087
 world-0.085
 swing-0.084
 driver-0.083
Token away
Token ID1497
Feature activation-4.82e-02

Pos logit contributions
 her+0.022
 their+0.018
 his+0.018
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.035
umpy-0.034
 rule-0.034
-0.034
oc-0.033
Token from
Token ID422
Feature activation-3.79e-02

Pos logit contributions
 her+0.080
 his+0.066
led+0.065
bled+0.065
 their+0.063
Neg logit contributions
umpy-0.088
 world-0.086
 rule-0.084
 promise-0.084
ney-0.082
Token through
Token ID832
Feature activation-6.40e-02

Pos logit contributions
 nothing+0.077
bled+0.076
lled+0.074
led+0.074
ened+0.072
Neg logit contributions
-0.099
ny-0.099
 hill-0.099
 world-0.098
age-0.095
Token the
Token ID262
Feature activation+2.69e-02

Pos logit contributions
bled+0.071
led+0.063
icked+0.055
ied+0.049
lled+0.044
Neg logit contributions
 world-0.166
 hill-0.163
 farm-0.159
 house-0.158
 fence-0.157
Token forest
Token ID8222
Feature activation-4.42e-02

Pos logit contributions
 promise+0.042
ater+0.039
oc+0.038
umpy+0.038
 rule+0.038
Neg logit contributions
 her-0.062
 the-0.060
 his-0.057
 their-0.055
 its-0.053
Token,
Token ID11
Feature activation-6.95e-02

Pos logit contributions
 her+0.082
 his+0.076
bled+0.074
 their+0.074
led+0.073
Neg logit contributions
 world-0.069
 farm-0.069
ney-0.068
 rule-0.068
 hill-0.068
Token but
Token ID475
Feature activation-9.44e-02

Pos logit contributions
 her+0.115
 their+0.111
icked+0.110
 his+0.109
 him+0.107
Neg logit contributions
 farm-0.109
 hill-0.107
 swing-0.106
 world-0.105
 fence-0.104
Token soon
Token ID2582
Feature activation-1.18e-01

Pos logit contributions
lled+0.115
 his+0.111
 their+0.108
 theirs+0.106
led+0.103
Neg logit contributions
 farm-0.167
 hill-0.167
 world-0.166
 house-0.161
pillar-0.160
Token it
Token ID340
Feature activation-5.49e-02

Pos logit contributions
bled+0.169
lled+0.164
led+0.162
ped+0.155
ied+0.153
Neg logit contributions
pillar-0.185
 world-0.183
 group-0.173
 story-0.170
 judge-0.168
Token was
Token ID373
Feature activation-4.77e-02

Pos logit contributions
 her+0.101
 their+0.098
 his+0.090
 theirs+0.087
 how+0.086
Neg logit contributions
age-0.093
umpy-0.091
ney-0.089
 world-0.087
asis-0.086
Token getting
Token ID1972
Feature activation-6.54e-02

Pos logit contributions
led+0.082
 her+0.081
ed+0.077
 their+0.076
ged+0.075
Neg logit contributions
 rule-0.078
 world-0.077
 promise-0.074
 story-0.072
umpy-0.071
Token dark
Token ID3223
Feature activation-2.51e-02

Pos logit contributions
 her+0.123
 their+0.112
led+0.112
lled+0.109
ged+0.109
Neg logit contributions
-0.095
 promise-0.087
 world-0.087
umpy-0.087
 river-0.084
Token.
Token ID13
Feature activation-1.53e-02

Pos logit contributions
 her+0.051
 their+0.047
 his+0.046
 the+0.045
 its+0.044
Neg logit contributions
umpy-0.037
-0.037
 world-0.036
 rule-0.036
ney-0.036
Token but
Token ID475
Feature activation-7.73e-02

Pos logit contributions
led+0.115
bled+0.114
icked+0.113
lled+0.109
ied+0.109
Neg logit contributions
 rule-0.064
 promise-0.062
 road-0.059
 world-0.059
 judge-0.059
Token she
Token ID673
Feature activation-1.46e-02

Pos logit contributions
lled+0.112
bled+0.107
led+0.107
icked+0.105
ts+0.103
Neg logit contributions
 promise-0.146
 request-0.129
 judge-0.123
-0.115
 story-0.115
Token kept
Token ID4030
Feature activation-4.31e-02

Pos logit contributions
 her+0.023
 their+0.021
 the+0.020
 his+0.020
 its+0.019
Neg logit contributions
 promise-0.031
 rule-0.029
-0.029
oc-0.029
opus-0.029
Token walking
Token ID6155
Feature activation-1.45e-02

Pos logit contributions
led+0.067
ied+0.058
bled+0.057
lled+0.057
ed+0.057
Neg logit contributions
 promise-0.081
 request-0.077
 rule-0.075
 story-0.075
 world-0.075
Token and
Token ID290
Feature activation-8.14e-02

Pos logit contributions
 her+0.023
 their+0.021
 his+0.021
 the+0.020
led+0.019
Neg logit contributions
 promise-0.029
 rule-0.028
umpy-0.027
-0.027
oc-0.027
Token soon
Token ID2582
Feature activation-1.17e-01

Pos logit contributions
led+0.145
icked+0.133
ied+0.127
bled+0.127
ped+0.124
Neg logit contributions
 hill-0.127
 promise-0.126
 swing-0.124
 house-0.123
 world-0.117
Token found
Token ID1043
Feature activation-6.77e-02

Pos logit contributions
bled+0.167
ped+0.161
lled+0.156
ries+0.155
 how+0.149
Neg logit contributions
-0.167
 team-0.162
 house-0.160
ari-0.156
 swing-0.154
Token a
Token ID257
Feature activation+6.38e-02

Pos logit contributions
led+0.093
ened+0.082
lled+0.079
ged+0.078
bled+0.074
Neg logit contributions
 veterinarian-0.132
age-0.126
 rule-0.124
 world-0.124
 house-0.123
Token beautiful
Token ID4950
Feature activation+5.23e-02

Pos logit contributions
rr+0.140
umpy+0.139
'."+0.136
iver+0.135
.'"+0.135
Neg logit contributions
 the-0.101
 her-0.099
 it-0.094
 their-0.093
 his-0.091
Token flower
Token ID15061
Feature activation+5.17e-02

Pos logit contributions
umpy+0.085
rr+0.081
 Deal+0.078
 promise+0.078
ater+0.078
Neg logit contributions
 her-0.122
 the-0.117
 it-0.111
 their-0.110
 his-0.109
Token.
Token ID13
Feature activation+7.88e-03

Pos logit contributions
umpy+0.123
rr+0.120
oc+0.119
 Driver+0.118
+0.118
Neg logit contributions
 her-0.081
 the-0.069
 their-0.064
 it-0.064
 his-0.060
Token scared
Token ID12008
Feature activation-4.39e-02

Pos logit contributions
 her+0.079
led+0.079
bled+0.079
ied+0.077
ries+0.077
Neg logit contributions
 promise-0.122
umpy-0.112
 request-0.110
-0.109
 world-0.108
Token at
Token ID379
Feature activation-3.50e-02

Pos logit contributions
 her+0.078
 their+0.076
bled+0.075
led+0.073
 his+0.073
Neg logit contributions
umpy-0.067
 promise-0.064
 world-0.063
 rule-0.061
ater-0.060
Token first
Token ID717
Feature activation-1.24e-02

Pos logit contributions
led+0.071
 her+0.070
 their+0.068
 his+0.066
ed+0.065
Neg logit contributions
 promise-0.052
 rule-0.049
 hill-0.046
 request-0.045
 world-0.044
Token,
Token ID11
Feature activation-3.32e-02

Pos logit contributions
 her+0.020
 their+0.017
 his+0.017
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.025
 rule-0.025
-0.024
umpy-0.024
oc-0.024
Token but
Token ID475
Feature activation-7.72e-02

Pos logit contributions
 her+0.060
 their+0.055
 his+0.055
led+0.054
 the+0.052
Neg logit contributions
 promise-0.057
 rule-0.054
 request-0.052
-0.051
 world-0.051
Token soon
Token ID2582
Feature activation-1.17e-01

Pos logit contributions
led+0.122
 theirs+0.113
ries+0.112
lled+0.112
 their+0.112
Neg logit contributions
 promise-0.140
 request-0.114
-0.103
 rule-0.100
 judge-0.099
Token the
Token ID262
Feature activation+4.52e-02

Pos logit contributions
bled+0.171
led+0.152
ied+0.149
iddled+0.145
 theirs+0.138
Neg logit contributions
-0.163
 world-0.160
 promise-0.157
 story-0.153
 bird-0.153
Token fear
Token ID3252
Feature activation-3.86e-02

Pos logit contributions
oc+0.093
ari+0.085
affe+0.085
umpy+0.082
'."+0.082
Neg logit contributions
 her-0.084
 the-0.084
 his-0.080
 their-0.077
 it-0.074
Token was
Token ID373
Feature activation-4.66e-02

Pos logit contributions
 her+0.064
 his+0.056
 their+0.056
 the+0.052
 its+0.050
Neg logit contributions
 promise-0.074
umpy-0.073
 rule-0.072
age-0.072
opus-0.071
Token replaced
Token ID6928
Feature activation-4.79e-02

Pos logit contributions
 her+0.068
led+0.062
 their+0.058
 his+0.058
ed+0.057
Neg logit contributions
 promise-0.092
 rule-0.089
 request-0.084
 world-0.082
age-0.081
Token by
Token ID416
Feature activation-3.72e-02

Pos logit contributions
 her+0.085
 their+0.083
 his+0.081
led+0.076
bled+0.076
Neg logit contributions
 world-0.074
 promise-0.072
ater-0.069
umpy-0.069
 rule-0.068
Token police
Token ID1644
Feature activation-1.60e-02

Pos logit contributions
ute+0.037
asis+0.036
 promise+0.036
ater+0.036
opus+0.036
Neg logit contributions
 her-0.052
 the-0.048
 his-0.047
 their-0.046
 it-0.042
Token officer
Token ID3818
Feature activation-2.63e-02

Pos logit contributions
 her+0.037
 their+0.034
 his+0.034
 the+0.034
 its+0.032
Neg logit contributions
 promise-0.021
opus-0.020
oc-0.020
-0.020
umpy-0.020
Token called
Token ID1444
Feature activation-3.23e-02

Pos logit contributions
 her+0.041
 their+0.037
 his+0.035
 the+0.034
 its+0.033
Neg logit contributions
 promise-0.055
oc-0.053
-0.052
 request-0.051
opus-0.051
Token Sarah
Token ID10490
Feature activation-1.26e-02

Pos logit contributions
 her+0.016
led+0.013
 their+0.012
 his+0.012
 the+0.010
Neg logit contributions
 promise-0.094
 rule-0.093
 request-0.091
-0.089
 story-0.089
Token and
Token ID290
Feature activation-7.12e-02

Pos logit contributions
 her+0.022
 their+0.020
 his+0.019
 the+0.019
 its+0.018
Neg logit contributions
 promise-0.024
oc-0.023
-0.023
 rule-0.023
opus-0.023
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
led+0.073
bled+0.070
lled+0.065
ied+0.065
 her+0.056
Neg logit contributions
 judge-0.162
 promise-0.160
 team-0.156
 mechanic-0.153
 request-0.152
Token she
Token ID673
Feature activation-2.49e-02

Pos logit contributions
bled+0.145
nam+0.129
ries+0.126
 herself+0.123
ped+0.122
Neg logit contributions
 team-0.217
ari-0.205
 story-0.203
 group-0.194
affe-0.191
Token arrived
Token ID5284
Feature activation-2.32e-02

Pos logit contributions
 her+0.039
 their+0.034
 his+0.033
 the+0.033
 its+0.032
Neg logit contributions
 promise-0.050
oc-0.050
-0.048
opus-0.048
ari-0.048
Token to
Token ID284
Feature activation-1.88e-02

Pos logit contributions
 her+0.033
 his+0.028
 the+0.028
 their+0.028
led+0.026
Neg logit contributions
 promise-0.050
ute-0.047
 rule-0.046
-0.046
 request-0.046
Token pick
Token ID2298
Feature activation-5.60e-02

Pos logit contributions
 her+0.032
led+0.030
 his+0.028
 their+0.028
 the+0.028
Neg logit contributions
 promise-0.035
 rule-0.034
 request-0.033
-0.033
oc-0.032
Token up
Token ID510
Feature activation-5.89e-02

Pos logit contributions
bled+0.038
 her+0.033
 nothing+0.033
led+0.031
 their+0.030
Neg logit contributions
oc-0.142
-0.140
pillar-0.138
affe-0.137
 world-0.136
Token little
Token ID1310
Feature activation-5.63e-03

Pos logit contributions
ute+0.067
iver+0.063
ater+0.063
+0.063
opus+0.063
Neg logit contributions
 her-0.066
 the-0.062
 their-0.057
 his-0.056
 it-0.051
Token girl
Token ID2576
Feature activation-1.24e-02

Pos logit contributions
 her+0.016
 his+0.015
 their+0.015
 the+0.015
 its+0.014
Neg logit contributions
 promise-0.005
 rule-0.005
-0.005
umpy-0.005
oc-0.005
Token was
Token ID373
Feature activation-3.49e-02

Pos logit contributions
 her+0.018
 their+0.016
 his+0.016
 the+0.015
 its+0.014
Neg logit contributions
 promise-0.028
oc-0.027
-0.026
 rule-0.026
opus-0.026
Token disappointed
Token ID11679
Feature activation-1.01e-02

Pos logit contributions
 her+0.060
led+0.055
 their+0.053
 his+0.053
 him+0.052
Neg logit contributions
 promise-0.064
umpy-0.061
 rule-0.061
 request-0.058
age-0.058
Token but
Token ID475
Feature activation-6.57e-02

Pos logit contributions
 her+0.017
 their+0.015
 his+0.015
 the+0.015
 its+0.014
Neg logit contributions
 promise-0.020
 rule-0.019
umpy-0.019
-0.019
ute-0.019
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
bled+0.099
 them+0.094
lled+0.093
ied+0.092
led+0.092
Neg logit contributions
 promise-0.117
 judge-0.114
 request-0.104
 world-0.098
 swing-0.096
Token spotted
Token ID13489
Feature activation-7.98e-02

Pos logit contributions
bled+0.184
iddled+0.173
ied+0.167
lled+0.162
ped+0.158
Neg logit contributions
-0.186
 team-0.177
 story-0.169
 group-0.157
pillar-0.155
Token a
Token ID257
Feature activation+8.28e-02

Pos logit contributions
led+0.083
bled+0.072
lled+0.069
ged+0.065
ied+0.063
Neg logit contributions
 veterinarian-0.164
 farm-0.162
 world-0.160
 judge-0.154
 rule-0.154
Token small
Token ID1402
Feature activation+5.75e-02

Pos logit contributions
iver+0.176
'."+0.174
umpy+0.172
ater+0.171
.'"+0.171
Neg logit contributions
 the-0.142
 her-0.127
 their-0.127
 it-0.125
 his-0.125
Token elephant
Token ID20950
Feature activation+5.34e-02

Pos logit contributions
umpy+0.089
arl+0.089
ute+0.089
opus+0.089
icious+0.088
Neg logit contributions
 the-0.127
 her-0.123
 their-0.118
 his-0.117
 it-0.112
Token standing
Token ID5055
Feature activation-1.73e-02

Pos logit contributions
umpy+0.128
arl+0.126
+0.125
 promise+0.124
 rule+0.122
Neg logit contributions
 her-0.080
 the-0.074
 his-0.070
 it-0.065
 their-0.064
Token baby
Token ID5156
Feature activation-7.23e-03

Pos logit contributions
ater+0.058
asis+0.057
oc+0.057
ute+0.057
!'"+0.056
Neg logit contributions
 her-0.059
 the-0.055
 his-0.052
 their-0.052
 it-0.047
Token was
Token ID373
Feature activation-4.33e-02

Pos logit contributions
 her+0.014
 their+0.012
 his+0.012
 the+0.012
 its+0.011
Neg logit contributions
 promise-0.013
oc-0.013
opus-0.013
-0.013
umpy-0.013
Token disappointed
Token ID11679
Feature activation-1.57e-02

Pos logit contributions
 her+0.077
led+0.072
 their+0.070
 his+0.068
 him+0.068
Neg logit contributions
 promise-0.071
 rule-0.071
umpy-0.070
 request-0.067
age-0.065
Token,
Token ID11
Feature activation-4.78e-02

Pos logit contributions
 her+0.028
 their+0.025
 his+0.025
 the+0.024
led+0.023
Neg logit contributions
 promise-0.029
 rule-0.028
umpy-0.028
-0.027
ute-0.027
Token but
Token ID475
Feature activation-7.83e-02

Pos logit contributions
bled+0.087
led+0.086
 her+0.085
lled+0.082
ied+0.081
Neg logit contributions
 rule-0.069
 promise-0.068
 world-0.067
 hill-0.066
 farm-0.065
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
lled+0.118
bled+0.113
ied+0.108
icked+0.108
 theirs+0.101
Neg logit contributions
 promise-0.124
-0.118
 judge-0.115
 veterinarian-0.114
 hill-0.112
Token she
Token ID673
Feature activation-1.67e-02

Pos logit contributions
bled+0.183
ied+0.174
iddled+0.172
lled+0.166
alled+0.159
Neg logit contributions
-0.192
 team-0.166
 bird-0.163
 story-0.161
!"-0.158
Token saw
Token ID2497
Feature activation-7.24e-02

Pos logit contributions
 her+0.027
 their+0.024
 his+0.023
 the+0.023
 its+0.022
Neg logit contributions
 promise-0.035
oc-0.033
-0.033
opus-0.033
umpy-0.033
Token a
Token ID257
Feature activation+8.23e-02

Pos logit contributions
led+0.080
bled+0.079
lled+0.070
ened+0.067
ied+0.067
Neg logit contributions
 veterinarian-0.144
 farm-0.139
 world-0.135
 house-0.135
 judge-0.133
Token little
Token ID1310
Feature activation+4.09e-02

Pos logit contributions
'."+0.178
.'"+0.175
rr+0.173
!'"+0.173
asis+0.172
Neg logit contributions
 the-0.136
 her-0.132
 his-0.124
 it-0.122
 their-0.122
Token bird
Token ID6512
Feature activation+3.24e-02

Pos logit contributions
asis+0.058
ute+0.056
opus+0.056
rr+0.055
umpy+0.055
Neg logit contributions
 her-0.096
 the-0.092
 his-0.088
 their-0.088
 it-0.084
Token.
Token ID13
Feature activation-6.45e-03

Pos logit contributions
opus+0.065
oc+0.064
 promise+0.063
affe+0.062
umpy+0.062
Neg logit contributions
 her-0.052
 the-0.047
 their-0.043
 his-0.043
 it-0.041
Token She
Token ID1375
Feature activation-1.99e-02

Pos logit contributions
 her+0.011
 his+0.010
 their+0.010
 the+0.010
 its+0.009
Neg logit contributions
 promise-0.012
 rule-0.012
-0.012
umpy-0.012
opus-0.012
Token started
Token ID2067
Feature activation-4.56e-02

Pos logit contributions
 her+0.031
 their+0.027
 his+0.027
 the+0.026
 its+0.025
Neg logit contributions
 promise-0.042
-0.040
oc-0.040
 rule-0.040
opus-0.039
Token walking
Token ID6155
Feature activation-2.48e-02

Pos logit contributions
led+0.070
bled+0.067
 her+0.062
lled+0.059
 none+0.059
Neg logit contributions
 rule-0.083
 request-0.083
 swing-0.082
 farm-0.082
 hill-0.081
Token but
Token ID475
Feature activation-8.35e-02

Pos logit contributions
 her+0.039
 their+0.034
led+0.034
 his+0.033
bled+0.033
Neg logit contributions
 promise-0.047
 hill-0.046
 rule-0.046
-0.045
 world-0.045
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
 his+0.107
led+0.098
icked+0.097
 their+0.096
ped+0.092
Neg logit contributions
 promise-0.158
 judge-0.157
-0.150
 hill-0.149
 world-0.148
Token realized
Token ID6939
Feature activation-7.72e-02

Pos logit contributions
bled+0.192
ped+0.180
led+0.176
ied+0.176
 his+0.173
Neg logit contributions
-0.164
pillar-0.156
 judge-0.154
ari-0.149
 world-0.147
Token it
Token ID340
Feature activation-4.31e-02

Pos logit contributions
led+0.103
bled+0.088
lled+0.087
ped+0.081
ed+0.080
Neg logit contributions
 veterinarian-0.146
 swing-0.145
 world-0.143
 rule-0.142
 judge-0.142
Token was
Token ID373
Feature activation-5.60e-02

Pos logit contributions
 her+0.074
 their+0.072
 his+0.070
 its+0.067
bled+0.064
Neg logit contributions
umpy-0.074
 promise-0.073
 rule-0.073
 world-0.072
 story-0.070
Token farther
Token ID18485
Feature activation-3.65e-02

Pos logit contributions
led+0.101
ed+0.095
ged+0.092
ted+0.090
bled+0.089
Neg logit contributions
 rule-0.100
 promise-0.096
 world-0.090
 story-0.087
 request-0.085
Token away
Token ID1497
Feature activation-3.87e-02

Pos logit contributions
 her+0.075
bled+0.068
 his+0.064
 the+0.064
 their+0.064
Neg logit contributions
 promise-0.059
 rule-0.055
umpy-0.055
ater-0.054
ute-0.051
Token for
Token ID329
Feature activation-6.64e-02

Pos logit contributions
led+0.046
 her+0.044
bled+0.043
ed+0.037
ied+0.035
Neg logit contributions
 promise-0.120
 rule-0.117
 world-0.117
 farm-0.114
 story-0.114
Token a
Token ID257
Feature activation+3.10e-02

Pos logit contributions
led+0.087
ied+0.087
bled+0.084
 her+0.079
angled+0.072
Neg logit contributions
 story-0.139
oc-0.136
 rule-0.135
 tale-0.133
 promise-0.133
Token bit
Token ID1643
Feature activation-2.64e-03

Pos logit contributions
umpy+0.069
ney+0.067
ater+0.067
ute+0.067
affe+0.066
Neg logit contributions
 her-0.046
 the-0.044
 their-0.042
 his-0.041
 it-0.038
Token,
Token ID11
Feature activation-5.99e-02

Pos logit contributions
 her+0.004
 his+0.004
 their+0.004
 the+0.004
 its+0.003
Neg logit contributions
 promise-0.006
 rule-0.005
umpy-0.005
-0.005
oc-0.005
Token but
Token ID475
Feature activation-7.83e-02

Pos logit contributions
 her+0.108
led+0.104
ied+0.098
bled+0.097
icked+0.096
Neg logit contributions
 promise-0.096
 world-0.092
 rule-0.091
 farm-0.088
oc-0.087
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
ries+0.118
 her+0.117
led+0.107
iddled+0.104
icked+0.102
Neg logit contributions
 promise-0.149
 team-0.130
 rule-0.129
oc-0.129
 hill-0.124
Token he
Token ID339
Feature activation-3.15e-02

Pos logit contributions
bled+0.155
 her+0.154
ied+0.153
iddled+0.152
nam+0.142
Neg logit contributions
 team-0.211
 story-0.192
-0.189
oc-0.185
 world-0.184
Token was
Token ID373
Feature activation-4.03e-02

Pos logit contributions
 her+0.053
 their+0.046
 his+0.043
 the+0.042
 its+0.042
Neg logit contributions
 promise-0.063
opus-0.062
umpy-0.061
oc-0.060
-0.060
Token back
Token ID736
Feature activation-2.19e-02

Pos logit contributions
 her+0.065
led+0.059
 their+0.058
ted+0.054
 his+0.054
Neg logit contributions
 promise-0.076
umpy-0.073
 rule-0.071
 world-0.071
 team-0.068
Token home
Token ID1363
Feature activation-2.35e-02

Pos logit contributions
 her+0.030
 their+0.025
 his+0.025
 the+0.024
 its+0.023
Neg logit contributions
 promise-0.049
umpy-0.047
ute-0.047
 rule-0.047
-0.046
Token and
Token ID290
Feature activation-4.93e-02

Pos logit contributions
 her+0.046
 their+0.041
 his+0.040
 the+0.039
led+0.038
Neg logit contributions
 promise-0.040
 rule-0.037
umpy-0.037
oc-0.037
ney-0.036
Token over
Token ID625
Feature activation-3.64e-02

Pos logit contributions
bled+0.129
ied+0.125
 theirs+0.121
 such+0.118
 their+0.116
Neg logit contributions
 swing-0.116
 hill-0.109
 slide-0.104
 fence-0.101
 river-0.097
Token the
Token ID262
Feature activation+5.79e-02

Pos logit contributions
 her+0.039
led+0.039
bled+0.036
 his+0.036
 their+0.036
Neg logit contributions
 world-0.082
 hill-0.081
 promise-0.080
-0.079
 rule-0.079
Token hills
Token ID18639
Feature activation-3.31e-02

Pos logit contributions
oc+0.117
 Amen+0.109
umpy+0.109
iver+0.109
ari+0.107
Neg logit contributions
 the-0.120
 her-0.116
 it-0.109
 their-0.096
 its-0.092
Token,
Token ID11
Feature activation-6.97e-02

Pos logit contributions
 her+0.062
 his+0.056
 their+0.056
led+0.054
 the+0.054
Neg logit contributions
 promise-0.052
 rule-0.052
umpy-0.050
 hill-0.050
ute-0.049
Token but
Token ID475
Feature activation-9.07e-02

Pos logit contributions
icked+0.118
 her+0.112
 their+0.111
bled+0.110
 his+0.110
Neg logit contributions
 hill-0.113
 swing-0.113
 fence-0.105
 farm-0.104
 slide-0.103
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
led+0.093
lled+0.092
 his+0.085
 their+0.084
icked+0.083
Neg logit contributions
 hill-0.183
 swing-0.178
 promise-0.172
 goal-0.167
-0.165
Token she
Token ID673
Feature activation-3.45e-02

Pos logit contributions
bled+0.181
led+0.178
ied+0.174
lled+0.169
ped+0.160
Neg logit contributions
-0.168
 swing-0.156
 world-0.151
 house-0.151
!"-0.148
Token found
Token ID1043
Feature activation-7.72e-02

Pos logit contributions
 her+0.060
 their+0.057
 his+0.053
 its+0.050
 the+0.050
Neg logit contributions
opus-0.067
umpy-0.064
 promise-0.063
pillar-0.063
affe-0.062
Token herself
Token ID5223
Feature activation-6.34e-03

Pos logit contributions
led+0.092
 their+0.078
lled+0.070
 his+0.069
bled+0.067
Neg logit contributions
 cage-0.145
 world-0.145
age-0.142
 fence-0.141
 veterinarian-0.140
Token lost
Token ID2626
Feature activation-2.38e-02

Pos logit contributions
 her+0.009
 his+0.008
 their+0.008
 the+0.008
 its+0.007
Neg logit contributions
 promise-0.014
 rule-0.014
-0.013
umpy-0.013
opus-0.013
Token in
Token ID287
Feature activation-5.18e-02

Pos logit contributions
 her+0.030
 their+0.028
led+0.027
 his+0.026
 the+0.026
Neg logit contributions
 promise-0.052
 rule-0.052
ater-0.050
 world-0.050
ute-0.050
Token walked
Token ID6807
Feature activation-3.11e-02

Pos logit contributions
 her+0.047
 their+0.042
 his+0.040
 its+0.040
 the+0.040
Neg logit contributions
 promise-0.059
-0.056
oc-0.055
 request-0.054
 rule-0.054
Token and
Token ID290
Feature activation-6.85e-02

Pos logit contributions
 her+0.048
 their+0.042
 nothing+0.042
 its+0.041
led+0.041
Neg logit contributions
-0.056
 promise-0.056
ute-0.055
 hill-0.054
 rule-0.054
Token walked
Token ID6807
Feature activation-2.03e-02

Pos logit contributions
bled+0.113
 such+0.111
lled+0.111
 anything+0.101
 her+0.101
Neg logit contributions
 hill-0.113
-0.107
 swing-0.101
 slide-0.101
 road-0.100
Token,
Token ID11
Feature activation-5.46e-02

Pos logit contributions
 her+0.032
 their+0.027
 the+0.027
 his+0.027
 its+0.026
Neg logit contributions
 promise-0.039
-0.039
 rule-0.039
ute-0.038
oc-0.038
Token but
Token ID475
Feature activation-7.65e-02

Pos logit contributions
 her+0.102
led+0.097
 their+0.097
icked+0.096
 its+0.094
Neg logit contributions
 hill-0.082
 farm-0.077
 swing-0.077
 promise-0.077
 road-0.077
Token soon
Token ID2582
Feature activation-1.16e-01

Pos logit contributions
led+0.102
icked+0.102
lled+0.100
 her+0.098
ped+0.091
Neg logit contributions
 hill-0.147
 road-0.140
 promise-0.137
oc-0.135
 driver-0.132
Token he
Token ID339
Feature activation-3.99e-02

Pos logit contributions
led+0.197
bled+0.196
ied+0.189
lled+0.185
ped+0.184
Neg logit contributions
 team-0.154
-0.151
 river-0.149
 hill-0.145
 road-0.145
Token realized
Token ID6939
Feature activation-6.36e-02

Pos logit contributions
 her+0.073
 their+0.067
 its+0.063
 his+0.061
 the+0.059
Neg logit contributions
opus-0.073
umpy-0.073
 promise-0.070
-0.070
oc-0.068
Token he
Token ID339
Feature activation-1.03e-02

Pos logit contributions
led+0.074
 her+0.066
lled+0.066
bled+0.062
ed+0.061
Neg logit contributions
 hill-0.131
opus-0.127
 story-0.126
 world-0.125
 rule-0.125
Token didn
Token ID1422
Feature activation+1.74e-02

Pos logit contributions
 her+0.016
 their+0.014
 his+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.022
umpy-0.021
-0.021
 rule-0.021
opus-0.021
Token't
Token ID470
Feature activation+2.00e-02

Pos logit contributions
 promise+0.030
oc+0.029
ater+0.028
opus+0.028
affe+0.028
Neg logit contributions
 her-0.036
 the-0.033
 his-0.033
 their-0.032
 it-0.031
Token They
Token ID1119
Feature activation-6.18e-02

Pos logit contributions
 her+0.039
 his+0.035
 the+0.034
 their+0.034
led+0.033
Neg logit contributions
 rule-0.028
 promise-0.028
-0.028
umpy-0.028
oc-0.027
Token searched
Token ID16499
Feature activation-6.67e-02

Pos logit contributions
 her+0.105
bled+0.100
 its+0.099
led+0.099
 their+0.098
Neg logit contributions
 promise-0.112
 request-0.094
 judge-0.092
 rule-0.092
 team-0.089
Token and
Token ID290
Feature activation-8.17e-02

Pos logit contributions
bled+0.104
ed+0.097
icked+0.097
led+0.097
 her+0.096
Neg logit contributions
 hill-0.128
 promise-0.124
 rule-0.116
 world-0.114
 road-0.114
Token searched
Token ID16499
Feature activation-6.18e-02

Pos logit contributions
 such+0.136
bled+0.135
led+0.135
ied+0.133
 her+0.133
Neg logit contributions
 hill-0.122
 promise-0.117
 ramp-0.116
 road-0.108
-0.108
Token,
Token ID11
Feature activation-9.55e-02

Pos logit contributions
 her+0.111
bled+0.109
led+0.107
ed+0.105
icked+0.104
Neg logit contributions
 hill-0.099
 promise-0.096
 rule-0.089
 world-0.089
 road-0.087
Token but
Token ID475
Feature activation-1.15e-01

Pos logit contributions
icked+0.228
led+0.219
 her+0.215
bled+0.210
ied+0.210
Neg logit contributions
 hill-0.086
 farm-0.086
 rule-0.084
 promise-0.082
 road-0.081
Token it
Token ID340
Feature activation-7.14e-02

Pos logit contributions
led+0.162
ped+0.157
med+0.152
ened+0.149
lled+0.146
Neg logit contributions
 hill-0.203
 promise-0.202
 fence-0.188
 house-0.187
 rule-0.185
Token was
Token ID373
Feature activation-5.51e-02

Pos logit contributions
 her+0.148
 its+0.138
 their+0.138
 such+0.135
 his+0.134
Neg logit contributions
 promise-0.100
ney-0.090
 hill-0.088
 rule-0.087
 request-0.082
Token gone
Token ID3750
Feature activation-9.84e-03

Pos logit contributions
ted+0.092
led+0.091
ed+0.090
 her+0.087
ged+0.085
Neg logit contributions
 promise-0.102
 rule-0.100
 shelf-0.089
 world-0.089
 hill-0.087
Token.
Token ID13
Feature activation-1.94e-02

Pos logit contributions
 her+0.020
 his+0.018
 their+0.018
 the+0.018
 its+0.016
Neg logit contributions
 promise-0.016
-0.016
 rule-0.016
umpy-0.016
oc-0.015
Token They
Token ID1119
Feature activation-6.91e-02

Pos logit contributions
 her+0.040
 his+0.036
 the+0.035
 their+0.035
led+0.034
Neg logit contributions
 promise-0.030
-0.030
 rule-0.030
umpy-0.030
oc-0.029
Token,"
Token ID553
Feature activation-3.48e-02

Pos logit contributions
 promise+0.001
 rule+0.001
opus+0.001
umpy+0.001
oc+0.001
Neg logit contributions
 her-0.003
 the-0.003
 his-0.003
 their-0.003
 its-0.003
Token said
Token ID531
Feature activation-5.35e-02

Pos logit contributions
 her+0.057
 their+0.053
bled+0.052
 his+0.051
 nothing+0.051
Neg logit contributions
-0.067
oc-0.064
 promise-0.062
affe-0.059
 request-0.057
Token Lily
Token ID20037
Feature activation-1.99e-02

Pos logit contributions
lled+0.065
led+0.061
bled+0.061
ied+0.057
 her+0.053
Neg logit contributions
-0.119
oc-0.112
 promise-0.111
 judge-0.111
 veterinarian-0.109
Token.
Token ID13
Feature activation+8.64e-03

Pos logit contributions
 her+0.041
 their+0.038
 the+0.037
 his+0.037
 its+0.035
Neg logit contributions
 promise-0.031
oc-0.030
umpy-0.028
-0.028
age-0.028
Token She
Token ID1375
Feature activation-4.75e-02

Pos logit contributions
 promise+0.018
 rule+0.018
opus+0.018
umpy+0.018
ater+0.017
Neg logit contributions
 her-0.015
 the-0.013
 his-0.013
 their-0.013
 its-0.011
Token organized
Token ID8389
Feature activation-1.15e-01

Pos logit contributions
 her+0.079
 their+0.074
 his+0.069
 the+0.068
 such+0.066
Neg logit contributions
 promise-0.093
oc-0.091
 request-0.086
-0.085
ari-0.085
Token all
Token ID477
Feature activation-7.40e-02

Pos logit contributions
led+0.101
lled+0.078
ment+0.072
 happened+0.072
 colder+0.070
Neg logit contributions
 shelf-0.249
 store-0.215
 label-0.213
 shop-0.213
 switch-0.212
Token the
Token ID262
Feature activation-1.52e-02

Pos logit contributions
bled+0.093
 none+0.083
 nothing+0.081
 neither+0.062
 something+0.060
Neg logit contributions
 promise-0.159
oc-0.156
ney-0.151
 world-0.150
ny-0.150
Token toys
Token ID14958
Feature activation-8.08e-02

Pos logit contributions
 her+0.030
 their+0.027
 the+0.026
 his+0.026
 its+0.025
Neg logit contributions
 promise-0.026
 rule-0.026
oc-0.025
umpy-0.025
ater-0.024
Token in
Token ID287
Feature activation-4.79e-02

Pos logit contributions
 her+0.126
 his+0.116
 their+0.112
 its+0.103
 the+0.100
Neg logit contributions
 promise-0.145
 hill-0.138
ney-0.135
ute-0.130
 house-0.129
Token the
Token ID262
Feature activation+3.69e-02

Pos logit contributions
led+0.037
bled+0.025
ied+0.021
ed+0.020
 her+0.018
Neg logit contributions
 rule-0.136
oc-0.135
 promise-0.134
 shelf-0.131
 world-0.131
Token under
Token ID739
Feature activation-6.16e-02

Pos logit contributions
 her+0.055
 nothing+0.048
led+0.046
 his+0.046
lled+0.046
Neg logit contributions
ute-0.091
 promise-0.089
 world-0.087
 house-0.087
-0.087
Token the
Token ID262
Feature activation+3.45e-02

Pos logit contributions
led+0.059
ied+0.054
bled+0.048
icked+0.047
ered+0.046
Neg logit contributions
 farm-0.151
 team-0.150
-0.150
 hill-0.149
 fence-0.147
Token umbrella
Token ID25510
Feature activation-1.85e-02

Pos logit contributions
ater+0.060
opus+0.059
oc+0.058
 promise+0.057
rr+0.057
Neg logit contributions
 her-0.078
 the-0.075
 their-0.069
 his-0.067
 it-0.064
Token,
Token ID11
Feature activation-6.33e-02

Pos logit contributions
 her+0.033
 his+0.029
 their+0.029
 the+0.029
 its+0.027
Neg logit contributions
 promise-0.034
oc-0.033
 rule-0.033
age-0.033
ute-0.033
Token but
Token ID475
Feature activation-8.24e-02

Pos logit contributions
 her+0.107
 his+0.100
 him+0.099
led+0.098
ied+0.096
Neg logit contributions
 swing-0.099
 farm-0.091
 promise-0.091
 slide-0.088
 hill-0.087
Token soon
Token ID2582
Feature activation-1.15e-01

Pos logit contributions
lled+0.098
led+0.097
icked+0.091
ped+0.086
ered+0.085
Neg logit contributions
 promise-0.172
 team-0.157
 shelf-0.157
 rule-0.154
 judge-0.153
Token they
Token ID484
Feature activation-4.29e-02

Pos logit contributions
ped+0.133
bled+0.133
led+0.129
years+0.126
ied+0.121
Neg logit contributions
 team-0.226
 party-0.202
 story-0.202
-0.199
 judge-0.188
Token noticed
Token ID6810
Feature activation-6.68e-02

Pos logit contributions
 her+0.068
 his+0.061
 their+0.059
 its+0.056
 the+0.056
Neg logit contributions
 promise-0.083
 team-0.078
affe-0.075
ater-0.074
-0.074
Token that
Token ID326
Feature activation-6.28e-02

Pos logit contributions
 her+0.061
led+0.061
bled+0.055
lled+0.053
 his+0.052
Neg logit contributions
 farm-0.139
 world-0.139
 rule-0.139
 story-0.138
 promise-0.137
Token their
Token ID511
Feature activation+1.89e-02

Pos logit contributions
led+0.057
bled+0.048
icked+0.044
lled+0.043
ped+0.043
Neg logit contributions
 team-0.154
 rule-0.154
 story-0.151
 promise-0.151
 world-0.150
Token feet
Token ID3625
Feature activation-3.07e-02

Pos logit contributions
oc+0.034
opus+0.033
+0.033
umpy+0.032
affe+0.032
Neg logit contributions
 her-0.038
 the-0.034
 their-0.034
 his-0.033
 it-0.031
Token she
Token ID673
Feature activation-3.32e-02

Pos logit contributions
led+0.102
bled+0.091
lled+0.091
icked+0.084
ied+0.081
Neg logit contributions
 swing-0.135
 house-0.132
 judge-0.131
 promise-0.130
 veterinarian-0.128
Token screamed
Token ID25421
Feature activation-2.98e-02

Pos logit contributions
 her+0.052
 their+0.048
 his+0.045
 the+0.044
 its+0.043
Neg logit contributions
 promise-0.067
age-0.063
 request-0.062
-0.061
oc-0.061
Token for
Token ID329
Feature activation-7.07e-02

Pos logit contributions
 her+0.048
led+0.044
 their+0.044
 his+0.044
 the+0.043
Neg logit contributions
 promise-0.053
 rule-0.051
 request-0.051
ute-0.050
oc-0.049
Token help
Token ID1037
Feature activation-2.79e-02

Pos logit contributions
led+0.214
ed+0.194
ied+0.190
lled+0.187
ened+0.178
Neg logit contributions
 promise-0.057
 rule-0.050
 request-0.047
 test-0.042
 spell-0.039
Token.
Token ID13
Feature activation-1.98e-02

Pos logit contributions
led+0.053
 her+0.052
 their+0.051
 his+0.050
bled+0.048
Neg logit contributions
 promise-0.043
 rule-0.042
 request-0.040
age-0.040
oc-0.038
Token Soon
Token ID15894
Feature activation-1.15e-01

Pos logit contributions
 her+0.038
 their+0.036
 his+0.035
 the+0.034
led+0.034
Neg logit contributions
-0.031
 rule-0.031
 promise-0.031
 request-0.030
umpy-0.030
Token a
Token ID257
Feature activation+6.37e-02

Pos logit contributions
led+0.303
bled+0.279
lled+0.276
ped+0.267
ied+0.264
Neg logit contributions
 house-0.070
 story-0.067
 promise-0.060
 swing-0.059
 team-0.058
Token friendly
Token ID8030
Feature activation+1.97e-02

Pos logit contributions
ater+0.136
umpy+0.131
'."+0.131
.'"+0.130
!'"+0.130
Neg logit contributions
 her-0.107
 the-0.103
 his-0.099
 it-0.095
 their-0.092
Token polar
Token ID13559
Feature activation+1.37e-02

Pos logit contributions
+0.024
 promise+0.024
oc+0.023
umpy+0.023
ater+0.023
Neg logit contributions
 her-0.051
 the-0.047
 his-0.047
 their-0.046
 its-0.044
Token bear
Token ID6842
Feature activation-9.76e-04

Pos logit contributions
 promise+0.015
umpy+0.015
+0.015
 rule+0.015
ute+0.015
Neg logit contributions
 her-0.035
 his-0.033
 the-0.033
 their-0.032
 its-0.031
Token appeared
Token ID4120
Feature activation-2.54e-02

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.002
 rule-0.002
opus-0.002
-0.002
oc-0.002
Token She
Token ID1375
Feature activation-2.05e-02

Pos logit contributions
 promise+0.016
 rule+0.015
opus+0.015
oc+0.015
+0.015
Neg logit contributions
 her-0.012
 the-0.010
 his-0.010
 their-0.010
 its-0.009
Token followed
Token ID3940
Feature activation-4.61e-02

Pos logit contributions
 her+0.031
 their+0.028
 his+0.027
 the+0.027
 its+0.024
Neg logit contributions
 promise-0.043
oc-0.042
-0.041
opus-0.041
 request-0.040
Token the
Token ID262
Feature activation+2.67e-02

Pos logit contributions
led+0.031
bled+0.025
 none+0.020
icked+0.020
lled+0.019
Neg logit contributions
 promise-0.127
 request-0.126
-0.122
 rule-0.120
 story-0.119
Token singing
Token ID13777
Feature activation-3.22e-02

Pos logit contributions
oc+0.048
ute+0.048
ater+0.047
opus+0.046
+0.046
Neg logit contributions
 her-0.054
 the-0.050
 his-0.047
 it-0.044
 their-0.043
Token and
Token ID290
Feature activation-8.45e-02

Pos logit contributions
 her+0.051
 his+0.046
 their+0.046
 the+0.044
led+0.041
Neg logit contributions
 promise-0.060
age-0.059
ute-0.059
 rule-0.058
affe-0.057
Token soon
Token ID2582
Feature activation-1.15e-01

Pos logit contributions
bled+0.131
 theirs+0.131
ries+0.130
ied+0.130
icked+0.122
Neg logit contributions
 promise-0.127
 house-0.118
 request-0.117
 swing-0.115
 hill-0.113
Token she
Token ID673
Feature activation-2.95e-02

Pos logit contributions
nam+0.169
bled+0.168
ries+0.148
 theirs+0.147
ied+0.146
Neg logit contributions
pillar-0.175
 world-0.171
 group-0.168
 house-0.167
 story-0.166
Token came
Token ID1625
Feature activation-2.93e-02

Pos logit contributions
 her+0.049
 their+0.045
 his+0.044
 the+0.041
 its+0.040
Neg logit contributions
 promise-0.056
oc-0.055
opus-0.055
-0.054
umpy-0.054
Token upon
Token ID2402
Feature activation-2.90e-02

Pos logit contributions
 her+0.045
 their+0.041
 his+0.039
led+0.038
 the+0.038
Neg logit contributions
umpy-0.057
 promise-0.057
 world-0.056
age-0.056
ute-0.056
Token a
Token ID257
Feature activation+5.92e-02

Pos logit contributions
 her+0.027
 their+0.023
 his+0.023
led+0.022
 the+0.021
Neg logit contributions
 promise-0.073
 rule-0.073
-0.071
 world-0.070
 request-0.070
Token pond
Token ID16723
Feature activation+1.85e-02

Pos logit contributions
ater+0.121
rr+0.120
umpy+0.119
ute+0.117
affe+0.117
Neg logit contributions
 the-0.110
 her-0.103
 it-0.098
 his-0.093
 their-0.091
Token and
Token ID290
Feature activation-6.27e-02

Pos logit contributions
 her+0.054
led+0.052
bled+0.051
 their+0.050
lled+0.049
Neg logit contributions
 promise-0.060
oc-0.060
 world-0.060
 rule-0.059
 request-0.058
Token flew
Token ID13112
Feature activation-4.03e-02

Pos logit contributions
 her+0.081
 their+0.079
bled+0.075
ied+0.073
 his+0.072
Neg logit contributions
-0.126
oc-0.118
 hill-0.114
 promise-0.113
asis-0.108
Token away
Token ID1497
Feature activation-3.29e-02

Pos logit contributions
 her+0.058
 their+0.054
led+0.052
bled+0.051
 how+0.051
Neg logit contributions
 promise-0.070
-0.070
 world-0.070
age-0.068
 hill-0.068
Token,
Token ID11
Feature activation-3.55e-02

Pos logit contributions
 her+0.063
 their+0.057
 his+0.057
 him+0.054
 the+0.054
Neg logit contributions
 promise-0.052
 rule-0.050
umpy-0.049
-0.048
oc-0.047
Token but
Token ID475
Feature activation-7.85e-02

Pos logit contributions
 her+0.059
 their+0.053
 his+0.052
led+0.051
 the+0.048
Neg logit contributions
 promise-0.066
 rule-0.064
 world-0.062
 hill-0.061
-0.060
Token soon
Token ID2582
Feature activation-1.15e-01

Pos logit contributions
lled+0.073
led+0.070
 their+0.069
ries+0.069
icked+0.067
Neg logit contributions
 promise-0.173
-0.159
 hill-0.158
 rule-0.158
oc-0.157
Token returned
Token ID4504
Feature activation-3.75e-02

Pos logit contributions
bled+0.134
ied+0.128
led+0.123
ged+0.120
ped+0.119
Neg logit contributions
 story-0.212
 world-0.211
 team-0.210
-0.208
 house-0.198
Token with
Token ID351
Feature activation-7.02e-02

Pos logit contributions
 her+0.063
 their+0.059
led+0.059
 his+0.055
lled+0.055
Neg logit contributions
 promise-0.063
-0.060
 world-0.060
 rule-0.060
age-0.059
Token a
Token ID257
Feature activation+6.14e-02

Pos logit contributions
led+0.070
bled+0.066
ied+0.056
lled+0.053
ed+0.052
Neg logit contributions
 promise-0.157
 story-0.151
 rule-0.149
 world-0.149
 request-0.146
Token friend
Token ID1545
Feature activation+2.23e-02

Pos logit contributions
ater+0.130
rr+0.129
umpy+0.128
'."+0.125
opus+0.124
Neg logit contributions
 her-0.099
 the-0.099
 his-0.092
 it-0.089
 their-0.088
Token.
Token ID13
Feature activation+2.88e-03

Pos logit contributions
opus+0.045
 promise+0.045
 rule+0.045
umpy+0.045
ute+0.045
Neg logit contributions
 her-0.040
 the-0.036
 his-0.035
 their-0.034
 it-0.031
Token a
Token ID257
Feature activation+4.68e-02

Pos logit contributions
 her+0.020
 their+0.016
 the+0.015
 his+0.015
led+0.015
Neg logit contributions
-0.065
 promise-0.064
 rule-0.063
oc-0.063
umpy-0.063
Token time
Token ID640
Feature activation-3.94e-02

Pos logit contributions
opus+0.094
ute+0.091
asis+0.090
 Driver+0.090
affe+0.089
Neg logit contributions
 her-0.083
 their-0.074
 the-0.073
 his-0.072
 it-0.063
Token,
Token ID11
Feature activation-3.85e-02

Pos logit contributions
 her+0.073
led+0.068
 their+0.066
 his+0.065
lled+0.065
Neg logit contributions
-0.063
opus-0.062
 promise-0.062
oc-0.061
 story-0.061
Token there
Token ID612
Feature activation-5.16e-02

Pos logit contributions
 her+0.042
led+0.034
 their+0.034
 his+0.033
lled+0.032
Neg logit contributions
-0.090
 promise-0.087
oc-0.085
umpy-0.084
 rule-0.083
Token was
Token ID373
Feature activation-4.10e-02

Pos logit contributions
 her+0.093
led+0.090
bled+0.087
 their+0.085
 able+0.082
Neg logit contributions
oc-0.083
-0.080
 story-0.079
 world-0.079
 promise-0.077
Token a
Token ID257
Feature activation+8.87e-02

Pos logit contributions
led+0.038
 her+0.037
bled+0.033
ened+0.033
ied+0.032
Neg logit contributions
-0.095
 world-0.091
 team-0.091
 story-0.091
 farm-0.091
Token little
Token ID1310
Feature activation+6.30e-02

Pos logit contributions
age+0.216
'."+0.212
!'"+0.210
ides+0.207
.'"+0.205
Neg logit contributions
 the-0.140
 her-0.134
 his-0.123
 their-0.121
 it-0.111
Token boy
Token ID2933
Feature activation+2.04e-02

Pos logit contributions
age+0.112
'."+0.107
!'"+0.107
iver+0.106
 rule+0.105
Neg logit contributions
 her-0.135
 the-0.129
 their-0.121
 his-0.119
 it-0.114
Token named
Token ID3706
Feature activation-1.29e-02

Pos logit contributions
 promise+0.037
opus+0.037
umpy+0.036
 rule+0.036
+0.036
Neg logit contributions
 her-0.041
 the-0.037
 his-0.037
 their-0.037
 its-0.033
Token Tim
Token ID5045
Feature activation+2.01e-02

Pos logit contributions
 her+0.013
 their+0.010
led+0.009
 the+0.009
 his+0.009
Neg logit contributions
 promise-0.033
-0.033
 rule-0.033
opus-0.033
oc-0.033
Tokenmy
Token ID1820
Feature activation+2.53e-02

Pos logit contributions
opus+0.035
ute+0.033
 rule+0.033
 promise+0.032
+0.032
Neg logit contributions
 her-0.044
 his-0.041
 the-0.040
 their-0.039
 its-0.035
Token and
Token ID290
Feature activation-5.15e-02

Pos logit contributions
 her+0.023
 his+0.020
 their+0.020
 the+0.020
 its+0.019
Neg logit contributions
 promise-0.024
 rule-0.023
opus-0.023
-0.023
umpy-0.023
Token plants
Token ID6134
Feature activation-2.87e-04

Pos logit contributions
 her+0.076
bled+0.072
led+0.071
 his+0.067
 their+0.066
Neg logit contributions
 promise-0.094
 judge-0.094
 world-0.093
 rule-0.092
 farm-0.091
Token.
Token ID13
Feature activation-6.76e-03

Pos logit contributions
 her+0.001
 the+0.000
 his+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.001
 rule-0.000
opus-0.000
-0.000
umpy-0.000
Token It
Token ID632
Feature activation-4.85e-02

Pos logit contributions
 her+0.012
 his+0.011
 their+0.010
 the+0.010
 its+0.010
Neg logit contributions
 promise-0.013
-0.013
 rule-0.013
umpy-0.012
oc-0.012
Token was
Token ID373
Feature activation-5.43e-02

Pos logit contributions
 her+0.085
 their+0.078
 his+0.077
bled+0.075
 its+0.074
Neg logit contributions
 promise-0.080
oc-0.080
 hill-0.079
 world-0.078
 rule-0.078
Token a
Token ID257
Feature activation+8.25e-02

Pos logit contributions
led+0.072
ged+0.064
 her+0.062
ed+0.061
bled+0.060
Neg logit contributions
 world-0.114
 rule-0.111
 farm-0.109
 story-0.106
 promise-0.105
Token magical
Token ID10883
Feature activation+7.53e-02

Pos logit contributions
rr+0.198
umpy+0.196
'."+0.195
.'"+0.194
iver+0.193
Neg logit contributions
 the-0.122
 her-0.116
 their-0.115
 it-0.105
 his-0.101
Token place
Token ID1295
Feature activation-2.80e-02

Pos logit contributions
umpy+0.135
rr+0.130
arl+0.129
 Deal+0.128
icious+0.126
Neg logit contributions
 her-0.154
 the-0.153
 it-0.143
 their-0.139
 his-0.136
Token!
Token ID0
Feature activation+9.36e-03

Pos logit contributions
 her+0.042
 his+0.037
 their+0.036
 the+0.035
led+0.033
Neg logit contributions
 promise-0.059
 rule-0.058
ute-0.055
 world-0.055
ney-0.055
Token
Token ID198
Feature activation+1.72e-02

Pos logit contributions
 promise+0.020
opus+0.019
 rule+0.019
ater+0.019
oc+0.019
Neg logit contributions
 her-0.016
 the-0.014
 his-0.013
 their-0.013
 it-0.012
Token
Token ID198
Feature activation+2.04e-02

Pos logit contributions
 promise+0.034
 rule+0.033
ater+0.032
opus+0.032
oc+0.032
Neg logit contributions
 her-0.032
 the-0.029
 his-0.028
 their-0.028
 it-0.026
Token in
Token ID287
Feature activation-2.04e-02

Pos logit contributions
 her+0.030
 their+0.025
 his+0.025
 the+0.025
 its+0.022
Neg logit contributions
 promise-0.049
 rule-0.048
umpy-0.048
ute-0.048
opus-0.047
Token the
Token ID262
Feature activation+5.83e-02

Pos logit contributions
 her+0.009
 their+0.006
led+0.005
 his+0.005
 the+0.004
Neg logit contributions
 rule-0.062
 promise-0.062
oc-0.061
-0.061
 hill-0.060
Token curtain
Token ID29461
Feature activation+3.72e-02

Pos logit contributions
opus+0.102
umpy+0.098
ute+0.098
arl+0.094
ater+0.094
Neg logit contributions
 her-0.124
 the-0.123
 their-0.112
 his-0.110
 its-0.106
Token,
Token ID11
Feature activation-1.07e-02

Pos logit contributions
 promise+0.078
opus+0.077
oc+0.076
umpy+0.076
 rule+0.075
Neg logit contributions
 her-0.064
 the-0.057
 his-0.055
 their-0.054
 it-0.051
Token and
Token ID290
Feature activation-4.67e-02

Pos logit contributions
 her+0.015
 his+0.013
 their+0.013
 the+0.013
 its+0.012
Neg logit contributions
 promise-0.023
 rule-0.023
-0.023
umpy-0.023
oc-0.023
Token a
Token ID257
Feature activation+9.94e-02

Pos logit contributions
 her+0.038
 their+0.031
 his+0.028
led+0.026
 the+0.024
Neg logit contributions
 promise-0.120
-0.118
 rule-0.117
opus-0.117
oc-0.116
Token sneaky
Token ID48930
Feature activation+7.28e-02

Pos logit contributions
ari+0.188
iver+0.187
umpy+0.187
asis+0.185
rr+0.180
Neg logit contributions
 her-0.178
 the-0.177
 it-0.165
 their-0.160
 his-0.158
Token thief
Token ID25906
Feature activation+3.27e-02

Pos logit contributions
+0.111
ute+0.107
'."+0.103
umpy+0.103
ari+0.102
Neg logit contributions
 her-0.164
 the-0.157
 his-0.152
 it-0.148
 their-0.144
Token was
Token ID373
Feature activation-1.13e-02

Pos logit contributions
 promise+0.076
ute+0.076
 rule+0.075
umpy+0.075
+0.074
Neg logit contributions
 her-0.049
 the-0.043
 his-0.042
 their-0.039
 it-0.038
Token reaching
Token ID8978
Feature activation-2.18e-02

Pos logit contributions
 her+0.017
 their+0.014
 his+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.024
 rule-0.024
umpy-0.024
oc-0.023
opus-0.023
Token in
Token ID287
Feature activation-3.85e-03

Pos logit contributions
 her+0.023
 their+0.019
 his+0.019
 the+0.018
led+0.018
Neg logit contributions
 promise-0.054
 rule-0.053
 request-0.051
-0.051
oc-0.050
Token The
Token ID383
Feature activation+8.08e-02

Pos logit contributions
 promise+0.089
 rule+0.087
ater+0.086
asis+0.085
opus+0.084
Neg logit contributions
 her-0.055
 the-0.049
 his-0.048
 their-0.047
 its-0.044
Token cat
Token ID3797
Feature activation+8.86e-03

Pos logit contributions
!'"+0.147
'."+0.142
asis+0.140
ater+0.139
rr+0.138
Neg logit contributions
 her-0.170
 the-0.168
 his-0.148
 their-0.144
 a-0.140
Token was
Token ID373
Feature activation-2.56e-02

Pos logit contributions
 promise+0.021
 rule+0.020
umpy+0.020
ute+0.020
opus+0.020
Neg logit contributions
 her-0.012
 the-0.011
 his-0.010
 their-0.010
 its-0.009
Token sitting
Token ID5586
Feature activation-1.92e-02

Pos logit contributions
 her+0.040
 their+0.035
 his+0.034
led+0.034
 the+0.033
Neg logit contributions
 promise-0.050
umpy-0.049
 rule-0.048
age-0.047
 request-0.046
Token under
Token ID739
Feature activation-3.45e-02

Pos logit contributions
 her+0.025
 their+0.022
 his+0.021
 the+0.021
 its+0.019
Neg logit contributions
 promise-0.044
umpy-0.043
-0.043
oc-0.042
ute-0.042
Token a
Token ID257
Feature activation+8.08e-02

Pos logit contributions
led+0.027
 her+0.025
bled+0.023
ied+0.023
lled+0.021
Neg logit contributions
-0.087
oc-0.086
 rule-0.084
 world-0.084
 farm-0.084
Token tree
Token ID5509
Feature activation+3.38e-02

Pos logit contributions
ider+0.155
rr+0.154
oral+0.150
 Driver+0.149
 promise+0.147
Neg logit contributions
 the-0.166
 her-0.163
 his-0.147
 it-0.145
 its-0.139
Token.
Token ID13
Feature activation+9.00e-03

Pos logit contributions
opus+0.076
 promise+0.075
umpy+0.073
+0.072
 rule+0.072
Neg logit contributions
 her-0.055
 the-0.049
 his-0.047
 their-0.045
 its-0.042
Token The
Token ID383
Feature activation+6.89e-02

Pos logit contributions
 promise+0.020
 rule+0.020
opus+0.019
ater+0.019
ute+0.019
Neg logit contributions
 her-0.014
 the-0.012
 his-0.012
 their-0.012
 its-0.011
Token bird
Token ID6512
Feature activation-7.81e-03

Pos logit contributions
!'"+0.127
'."+0.124
rr+0.120
ater+0.118
asis+0.118
Neg logit contributions
 the-0.143
 her-0.142
 his-0.120
 their-0.118
 a-0.117
Token said
Token ID531
Feature activation-3.72e-02

Pos logit contributions
 her+0.012
 their+0.010
 his+0.010
 the+0.010
 its+0.009
Neg logit contributions
 promise-0.017
oc-0.017
-0.016
opus-0.016
umpy-0.016
Token fun
Token ID1257
Feature activation+4.58e-03

Pos logit contributions
opus+0.045
oc+0.042
umpy+0.042
ute+0.042
ater+0.041
Neg logit contributions
 her-0.031
 their-0.026
 the-0.026
 his-0.026
 it-0.023
Token.
Token ID13
Feature activation-1.82e-03

Pos logit contributions
 promise+0.009
opus+0.009
umpy+0.009
 rule+0.009
+0.009
Neg logit contributions
 her-0.008
 the-0.007
 their-0.007
 his-0.007
 its-0.006
Token<|endoftext|>
Token ID50256
Feature activation+4.07e-02

Pos logit contributions
 her+0.004
 his+0.003
 the+0.003
 their+0.003
 its+0.003
Neg logit contributions
 promise-0.003
 rule-0.003
-0.003
umpy-0.003
oc-0.003
TokenZero
Token ID28667
Feature activation-1.88e-02

Pos logit contributions
 promise+0.090
 rule+0.083
affe+0.080
ater+0.080
 request+0.080
Neg logit contributions
 her-0.074
 the-0.071
 his-0.067
 their-0.065
 it-0.062
Token is
Token ID318
Feature activation-1.48e-02

Pos logit contributions
 her+0.029
 their+0.025
 his+0.025
 the+0.024
 its+0.023
Neg logit contributions
oc-0.039
-0.039
 promise-0.039
opus-0.038
umpy-0.038
Token a
Token ID257
Feature activation+1.07e-01

Pos logit contributions
 her+0.012
 their+0.009
 the+0.008
 his+0.008
led+0.008
Neg logit contributions
 promise-0.040
-0.040
umpy-0.040
 rule-0.040
ute-0.040
Token green
Token ID4077
Feature activation+1.20e-01

Pos logit contributions
ides+0.262
age+0.260
iver+0.260
!'"+0.247
'."+0.246
Neg logit contributions
 the-0.174
 her-0.166
 his-0.155
 their-0.143
 it-0.143
Token cat
Token ID3797
Feature activation+6.70e-02

Pos logit contributions
umpy+0.212
icious+0.198
 Deal+0.195
iver+0.194
rig+0.189
Neg logit contributions
 her-0.265
 the-0.256
 his-0.244
 it-0.223
 their-0.222
Token.
Token ID13
Feature activation+4.40e-04

Pos logit contributions
 promise+0.165
arl+0.159
umpy+0.157
 rule+0.155
reath+0.153
Neg logit contributions
 her-0.105
 the-0.099
 his-0.092
 their-0.085
 it-0.083
Token He
Token ID679
Feature activation+1.78e-02

Pos logit contributions
 promise+0.001
 rule+0.001
opus+0.001
umpy+0.001
+0.001
Neg logit contributions
 her-0.001
 his-0.001
 the-0.001
 their-0.001
 its-0.001
Token likes
Token ID7832
Feature activation-8.89e-03

Pos logit contributions
 promise+0.030
 rule+0.029
opus+0.029
umpy+0.029
ute+0.029
Neg logit contributions
 her-0.037
 his-0.034
 the-0.034
 their-0.033
 it-0.031
Token he
Token ID339
Feature activation-3.72e-03

Pos logit contributions
 her+0.030
led+0.029
bled+0.027
 their+0.024
lled+0.024
Neg logit contributions
 rule-0.089
 story-0.089
 judge-0.087
 promise-0.087
-0.086
Token had
Token ID550
Feature activation-2.02e-02

Pos logit contributions
 her+0.006
 their+0.005
 his+0.005
 the+0.005
 its+0.005
Neg logit contributions
 promise-0.008
 rule-0.008
-0.008
oc-0.008
umpy-0.008
Token written
Token ID3194
Feature activation-2.20e-02

Pos logit contributions
 her+0.031
 their+0.027
 his+0.027
 the+0.027
 its+0.025
Neg logit contributions
 promise-0.043
 rule-0.041
-0.041
ater-0.041
umpy-0.040
Token.
Token ID13
Feature activation-1.48e-02

Pos logit contributions
 her+0.036
 their+0.032
 his+0.031
 the+0.031
 its+0.029
Neg logit contributions
 promise-0.042
 rule-0.041
-0.041
umpy-0.041
ute-0.041
Token
Token ID198
Feature activation+1.36e-02

Pos logit contributions
 her+0.030
 their+0.028
 the+0.027
 his+0.027
led+0.026
Neg logit contributions
 promise-0.023
-0.022
umpy-0.022
oc-0.022
opus-0.022
Token
Token ID198
Feature activation+2.28e-02

Pos logit contributions
 promise+0.026
 rule+0.025
ater+0.025
opus+0.025
umpy+0.024
Neg logit contributions
 her-0.026
 the-0.023
 his-0.023
 their-0.023
 it-0.021
TokenHis
Token ID6653
Feature activation+4.40e-03

Pos logit contributions
 promise+0.041
 rule+0.040
+0.039
asis+0.038
ater+0.038
Neg logit contributions
 her-0.045
 his-0.041
 the-0.040
 their-0.040
 it-0.037
Token mom
Token ID1995
Feature activation-9.01e-03

Pos logit contributions
 promise+0.009
oc+0.009
umpy+0.009
opus+0.009
+0.009
Neg logit contributions
 her-0.008
 his-0.007
 the-0.007
 their-0.007
 its-0.006
Token was
Token ID373
Feature activation-2.79e-02

Pos logit contributions
 her+0.013
 their+0.011
 his+0.011
 the+0.010
 its+0.010
Neg logit contributions
 promise-0.021
oc-0.020
-0.020
opus-0.020
 rule-0.020
Token very
Token ID845
Feature activation-5.01e-03

Pos logit contributions
 her+0.040
 their+0.035
 his+0.035
led+0.034
 the+0.032
Neg logit contributions
 promise-0.059
 rule-0.057
umpy-0.056
 request-0.055
-0.055
Token proud
Token ID6613
Feature activation-2.49e-03

Pos logit contributions
 her+0.009
 his+0.007
 their+0.007
 the+0.007
 its+0.007
Neg logit contributions
 promise-0.010
 rule-0.010
umpy-0.010
-0.010
opus-0.010
Token mom
Token ID1995
Feature activation+3.12e-02

Pos logit contributions
oc+0.061
asis+0.059
umpy+0.058
ater+0.058
opus+0.058
Neg logit contributions
 her-0.045
 his-0.038
 the-0.038
 their-0.038
 its-0.033
Token said
Token ID531
Feature activation-2.69e-03

Pos logit contributions
opus+0.080
ute+0.079
umpy+0.078
 rule+0.077
ater+0.077
Neg logit contributions
 her-0.044
 the-0.036
 his-0.035
 their-0.034
 him-0.029
Token.
Token ID13
Feature activation+3.34e-02

Pos logit contributions
 her+0.005
 his+0.005
 the+0.005
 their+0.005
 its+0.004
Neg logit contributions
 promise-0.005
 rule-0.005
-0.004
oc-0.004
umpy-0.004
Token
Token ID198
Feature activation+4.35e-02

Pos logit contributions
 promise+0.076
asis+0.073
 rule+0.073
opus+0.073
ater+0.072
Neg logit contributions
 her-0.056
 the-0.050
 their-0.049
 his-0.048
 its-0.044
Token
Token ID198
Feature activation+5.91e-02

Pos logit contributions
 promise+0.099
opus+0.097
asis+0.096
ater+0.096
 rule+0.095
Neg logit contributions
 her-0.075
 the-0.066
 their-0.065
 his-0.063
 it-0.059
TokenL
Token ID43
Feature activation+1.96e-02

Pos logit contributions
 promise+0.120
asis+0.119
 rule+0.113
opus+0.113
ode+0.112
Neg logit contributions
 her-0.119
 their-0.104
 the-0.100
 his-0.096
 its-0.091
Tokenily
Token ID813
Feature activation+1.83e-02

Pos logit contributions
 promise+0.047
+0.047
opus+0.046
 rule+0.046
oc+0.046
Neg logit contributions
 her-0.029
 the-0.022
 his-0.022
 their-0.022
 its-0.019
Token felt
Token ID2936
Feature activation-4.47e-02

Pos logit contributions
 rule+0.042
ute+0.042
opus+0.042
 promise+0.042
umpy+0.042
Neg logit contributions
 her-0.026
 the-0.022
 his-0.022
 their-0.021
 it-0.019
Token much
Token ID881
Feature activation-1.03e-02

Pos logit contributions
 her+0.071
ied+0.071
 their+0.068
bled+0.068
led+0.067
Neg logit contributions
 promise-0.078
-0.076
umpy-0.073
 request-0.072
 rule-0.071
Token better
Token ID1365
Feature activation+7.00e-05

Pos logit contributions
 her+0.020
 their+0.018
 his+0.018
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.019
-0.018
 rule-0.018
umpy-0.017
oc-0.017
Token inside
Token ID2641
Feature activation-3.68e-02

Pos logit contributions
 promise+0.000
 rule+0.000
opus+0.000
+0.000
umpy+0.000
Neg logit contributions
 her-0.000
 the-0.000
 his-0.000
 their-0.000
 its-0.000
Token was
Token ID373
Feature activation-4.23e-02

Pos logit contributions
 her+0.042
 their+0.038
 his+0.036
 the+0.036
 its+0.035
Neg logit contributions
 promise-0.043
umpy-0.041
opus-0.041
-0.041
 rule-0.040
Token the
Token ID262
Feature activation+5.01e-02

Pos logit contributions
 her+0.068
led+0.067
ed+0.064
ted+0.063
ged+0.062
Neg logit contributions
 promise-0.078
 rule-0.077
 world-0.073
umpy-0.069
 request-0.069
Token best
Token ID1266
Feature activation+5.52e-03

Pos logit contributions
oc+0.126
affe+0.115
ari+0.111
opus+0.111
age+0.110
Neg logit contributions
 the-0.073
 her-0.072
 his-0.070
 their-0.065
 nothing-0.064
Token day
Token ID1110
Feature activation-3.97e-02

Pos logit contributions
 promise+0.008
oc+0.008
+0.008
opus+0.008
umpy+0.008
Neg logit contributions
 her-0.012
 his-0.011
 the-0.011
 their-0.011
 it-0.011
Token of
Token ID286
Feature activation-2.61e-02

Pos logit contributions
 her+0.071
 their+0.062
 his+0.061
 the+0.059
led+0.058
Neg logit contributions
 promise-0.068
opus-0.066
oc-0.065
 rule-0.065
umpy-0.065
Token his
Token ID465
Feature activation+3.53e-02

Pos logit contributions
 her+0.022
led+0.016
 their+0.016
 the+0.014
 his+0.014
Neg logit contributions
 promise-0.070
 rule-0.068
 world-0.065
asis-0.065
 team-0.065
Token life
Token ID1204
Feature activation+1.68e-02

Pos logit contributions
oc+0.061
affe+0.061
umpy+0.060
+0.060
age+0.060
Neg logit contributions
 her-0.069
 the-0.068
 their-0.066
 his-0.066
 it-0.060
Token!
Token ID0
Feature activation+1.41e-02

Pos logit contributions
 promise+0.032
oc+0.032
 rule+0.032
umpy+0.032
opus+0.032
Neg logit contributions
 her-0.030
 the-0.027
 his-0.027
 their-0.027
 it-0.025
Token
Token ID220
Feature activation+2.68e-02

Pos logit contributions
 promise+0.028
 rule+0.027
ater+0.027
opus+0.027
ute+0.027
Neg logit contributions
 her-0.025
 his-0.023
 the-0.023
 their-0.022
 it-0.021
Token
Token ID198
Feature activation+2.34e-02

Pos logit contributions
 promise+0.055
 rule+0.054
opus+0.052
ater+0.052
oc+0.051
Neg logit contributions
 her-0.047
 the-0.044
 his-0.043
 their-0.041
 it-0.038
Token
Token ID198
Feature activation+2.69e-02

Pos logit contributions
 promise+0.049
 rule+0.048
ater+0.048
opus+0.046
ute+0.046
Neg logit contributions
 her-0.040
 the-0.037
 his-0.037
 their-0.035
 it-0.034
Token Now
Token ID2735
Feature activation+7.42e-03

Pos logit contributions
ater+0.075
 promise+0.075
 rule+0.074
asis+0.074
opus+0.070
Neg logit contributions
 her-0.093
 the-0.088
 it-0.082
 his-0.082
 their-0.080
Token I
Token ID314
Feature activation+5.30e-02

Pos logit contributions
umpy+0.013
oc+0.013
 promise+0.013
opus+0.013
 rule+0.013
Neg logit contributions
 her-0.014
 the-0.013
 his-0.013
 their-0.013
 it-0.012
Token have
Token ID423
Feature activation-3.78e-02

Pos logit contributions
opus+0.098
affe+0.091
ute+0.090
round+0.090
umpy+0.088
Neg logit contributions
 her-0.107
 the-0.099
 his-0.093
 it-0.093
 their-0.088
Token something
Token ID1223
Feature activation-2.96e-03

Pos logit contributions
 her+0.039
led+0.035
bled+0.033
 their+0.033
 his+0.032
Neg logit contributions
-0.090
 promise-0.089
 rule-0.085
 story-0.083
 favor-0.083
Token pretty
Token ID2495
Feature activation+2.85e-02

Pos logit contributions
 her+0.005
 his+0.004
 the+0.004
 their+0.004
 its+0.004
Neg logit contributions
 promise-0.006
 rule-0.006
-0.006
opus-0.006
umpy-0.006
Token to
Token ID284
Feature activation+4.16e-02

Pos logit contributions
opus+0.065
umpy+0.064
oc+0.064
 promise+0.064
 rule+0.063
Neg logit contributions
 her-0.045
 the-0.038
 his-0.037
 their-0.037
 it-0.036
Token wear
Token ID5806
Feature activation+1.71e-02

Pos logit contributions
umpy+0.078
opus+0.075
ney+0.072
ari+0.071
 Driver+0.069
Neg logit contributions
 her-0.098
 the-0.088
 his-0.084
 it-0.081
 its-0.080
Token."
Token ID526
Feature activation+2.66e-03

Pos logit contributions
 promise+0.034
umpy+0.033
oc+0.033
opus+0.033
 rule+0.033
Neg logit contributions
 her-0.031
 the-0.028
 his-0.027
 their-0.027
 it-0.025
Token
Token ID220
Feature activation+2.04e-02

Pos logit contributions
 promise+0.006
 rule+0.005
opus+0.005
umpy+0.005
oc+0.005
Neg logit contributions
 her-0.004
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token
Token ID198
Feature activation+2.20e-02

Pos logit contributions
 promise+0.040
 rule+0.039
opus+0.039
ater+0.038
oc+0.038
Neg logit contributions
 her-0.038
 the-0.034
 his-0.033
 their-0.033
 it-0.030
Token
Token ID198
Feature activation+3.20e-02

Pos logit contributions
 promise+0.046
 rule+0.045
opus+0.044
ater+0.044
asis+0.044
Neg logit contributions
 her-0.040
 the-0.036
 his-0.034
 their-0.034
 it-0.031
Tokenpy
Token ID9078
Feature activation-1.20e-03

Pos logit contributions
 promise+0.005
 rule+0.005
+0.005
opus+0.005
oc+0.005
Neg logit contributions
 her-0.007
 the-0.006
 his-0.006
 their-0.006
 its-0.006
Token did
Token ID750
Feature activation-3.39e-02

Pos logit contributions
 her+0.002
 his+0.002
 the+0.002
 their+0.002
 its+0.001
Neg logit contributions
 promise-0.003
 rule-0.003
opus-0.003
-0.003
oc-0.003
Token it
Token ID340
Feature activation+6.75e-03

Pos logit contributions
led+0.046
 her+0.043
ed+0.041
bled+0.040
ied+0.039
Neg logit contributions
 promise-0.074
 rule-0.071
 request-0.068
-0.066
 story-0.066
Token and
Token ID290
Feature activation-4.68e-02

Pos logit contributions
 promise+0.014
opus+0.014
oc+0.014
 rule+0.014
+0.014
Neg logit contributions
 her-0.011
 the-0.009
 his-0.009
 their-0.009
 its-0.008
Token the
Token ID262
Feature activation+4.98e-02

Pos logit contributions
 her+0.071
bled+0.067
 his+0.065
led+0.064
ied+0.063
Neg logit contributions
 promise-0.085
-0.081
 judge-0.080
oc-0.080
 team-0.080
Token other
Token ID584
Feature activation+2.41e-02

Pos logit contributions
oc+0.120
opus+0.118
ater+0.116
+0.114
ari+0.114
Neg logit contributions
 her-0.083
 the-0.078
 his-0.068
 their-0.068
 its-0.063
Token animals
Token ID4695
Feature activation+1.48e-03

Pos logit contributions
opus+0.053
+0.052
 promise+0.051
ute+0.051
ater+0.050
Neg logit contributions
 her-0.041
 the-0.037
 their-0.034
 his-0.034
 its-0.031
Token liked
Token ID8288
Feature activation-4.84e-02

Pos logit contributions
 promise+0.003
 rule+0.003
opus+0.003
oc+0.003
umpy+0.003
Neg logit contributions
 her-0.002
 the-0.002
 his-0.002
 their-0.002
 its-0.002
Token it
Token ID340
Feature activation-9.21e-03

Pos logit contributions
led+0.056
bled+0.054
lled+0.049
 her+0.046
ed+0.043
Neg logit contributions
 rule-0.109
 world-0.106
 story-0.106
 farm-0.103
 team-0.103
Token.
Token ID13
Feature activation-4.78e-03

Pos logit contributions
 her+0.018
 his+0.016
 their+0.016
 the+0.016
 its+0.014
Neg logit contributions
 promise-0.016
 rule-0.016
umpy-0.015
-0.015
oc-0.015
Token They
Token ID1119
Feature activation-2.10e-02

Pos logit contributions
 her+0.009
 his+0.008
 their+0.008
 the+0.008
 its+0.007
Neg logit contributions
 promise-0.009
oc-0.009
 rule-0.009
-0.009
umpy-0.009
Token there
Token ID612
Feature activation-4.07e-02

Pos logit contributions
 her+0.017
 his+0.014
 their+0.013
 the+0.013
 its+0.011
Neg logit contributions
 promise-0.044
-0.044
umpy-0.044
ute-0.044
 rule-0.043
Token was
Token ID373
Feature activation-2.47e-02

Pos logit contributions
 her+0.071
led+0.067
bled+0.063
 their+0.063
 his+0.062
Neg logit contributions
oc-0.069
 world-0.067
 promise-0.067
-0.067
 story-0.065
Token an
Token ID281
Feature activation+4.12e-02

Pos logit contributions
 her+0.021
 his+0.017
led+0.016
 their+0.016
 the+0.016
Neg logit contributions
-0.063
umpy-0.062
 promise-0.062
ute-0.062
 rule-0.061
Token enormous
Token ID9812
Feature activation+1.10e-01

Pos logit contributions
+0.084
!'"+0.078
age+0.078
 promise+0.078
'."+0.078
Neg logit contributions
 her-0.077
 the-0.069
 his-0.068
 their-0.064
 its-0.059
Token elephant
Token ID20950
Feature activation+5.93e-02

Pos logit contributions
umpy+0.198
iver+0.195
age+0.188
'."+0.187
rr+0.184
Neg logit contributions
 the-0.235
 her-0.228
 it-0.211
 his-0.209
 their-0.196
Token.
Token ID13
Feature activation+7.62e-03

Pos logit contributions
 promise+0.141
arl+0.140
 Deal+0.135
 rule+0.134
reath+0.133
Neg logit contributions
 her-0.097
 the-0.093
 his-0.087
 their-0.078
 its-0.077
Token He
Token ID679
Feature activation+8.01e-03

Pos logit contributions
 promise+0.016
opus+0.016
 rule+0.016
ute+0.016
+0.015
Neg logit contributions
 her-0.013
 his-0.011
 the-0.011
 their-0.011
 its-0.010
Token liked
Token ID8288
Feature activation-2.24e-02

Pos logit contributions
 promise+0.018
 rule+0.018
umpy+0.017
opus+0.017
ute+0.017
Neg logit contributions
 her-0.012
 his-0.011
 the-0.011
 their-0.010
 its-0.009
Token to
Token ID284
Feature activation+5.91e-02

Pos logit contributions
 her+0.028
 their+0.023
led+0.023
 his+0.022
 the+0.022
Neg logit contributions
 rule-0.051
opus-0.049
affe-0.049
oc-0.049
 promise-0.049
Token wander
Token ID27776
Feature activation-1.62e-02

Pos logit contributions
umpy+0.116
opus+0.111
ari+0.105
rot+0.102
ute+0.102
Neg logit contributions
 her-0.141
 his-0.132
 the-0.128
 their-0.118
 its-0.113
Token around
Token ID1088
Feature activation-5.29e-03

Pos logit contributions
 her+0.019
 their+0.015
 his+0.014
 the+0.014
bled+0.013
Neg logit contributions
-0.039
 promise-0.039
 rule-0.039
oc-0.038
umpy-0.038
Token cover
Token ID3002
Feature activation+9.01e-03

Pos logit contributions
umpy+0.073
oc+0.073
 promise+0.072
opus+0.071
rr+0.071
Neg logit contributions
 her-0.134
 the-0.124
 their-0.115
 his-0.113
 it-0.103
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
 promise+0.017
opus+0.016
oc+0.016
 rule+0.016
+0.016
Neg logit contributions
 her-0.017
 the-0.015
 his-0.015
 their-0.015
 its-0.014
Token
Token ID220
Feature activation-6.23e-03

Pos logit contributions
 her+0.025
 his+0.023
 their+0.023
 the+0.023
led+0.022
Neg logit contributions
-0.023
 promise-0.023
 rule-0.022
oc-0.022
umpy-0.022
Token
Token ID198
Feature activation+6.95e-03

Pos logit contributions
 her+0.013
 his+0.012
 the+0.011
 their+0.011
 its+0.011
Neg logit contributions
 promise-0.010
 rule-0.010
-0.010
umpy-0.010
oc-0.010
Token
Token ID198
Feature activation+2.59e-02

Pos logit contributions
 promise+0.012
opus+0.012
 rule+0.012
umpy+0.012
oc+0.012
Neg logit contributions
 her-0.014
 the-0.012
 their-0.012
 his-0.012
 it-0.011
TokenL
Token ID43
Feature activation-2.51e-03

Pos logit contributions
asis+0.049
opus+0.048
 promise+0.047
affe+0.046
 Driver+0.045
Neg logit contributions
 her-0.054
 the-0.046
 their-0.046
 his-0.043
 it-0.042
Tokenily
Token ID813
Feature activation-6.89e-03

Pos logit contributions
 her+0.004
 his+0.004
 the+0.004
 their+0.004
 its+0.003
Neg logit contributions
 promise-0.005
 rule-0.005
umpy-0.005
opus-0.005
-0.005
Token said
Token ID531
Feature activation-4.68e-02

Pos logit contributions
 her+0.012
 their+0.010
 his+0.010
 the+0.010
 its+0.009
Neg logit contributions
 promise-0.014
oc-0.013
-0.013
opus-0.013
 rule-0.013
Token to
Token ID284
Feature activation-3.47e-02

Pos logit contributions
lled+0.089
 her+0.083
led+0.082
 his+0.079
 their+0.077
Neg logit contributions
-0.071
 promise-0.068
 team-0.066
ute-0.066
opus-0.066
Token her
Token ID607
Feature activation-1.70e-02

Pos logit contributions
led+0.019
ed+0.011
lled+0.008
ied+0.007
ged+0.006
Neg logit contributions
 promise-0.106
 request-0.105
 rule-0.105
 judge-0.104
-0.100
Token mom
Token ID1995
Feature activation-2.05e-02

Pos logit contributions
 her+0.033
 his+0.030
 their+0.030
led+0.030
 the+0.029
Neg logit contributions
 promise-0.030
 rule-0.027
 request-0.027
-0.026
 story-0.026
Token.
Token ID13
Feature activation+2.64e-03

Pos logit contributions
opus+0.115
umpy+0.110
arl+0.110
ute+0.109
oc+0.109
Neg logit contributions
 her-0.078
 the-0.071
 their-0.064
 his-0.064
 it-0.058
Token The
Token ID383
Feature activation+2.25e-02

Pos logit contributions
 promise+0.006
 rule+0.006
opus+0.006
umpy+0.005
oc+0.005
Neg logit contributions
 her-0.004
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token infant
Token ID11212
Feature activation-9.50e-04

Pos logit contributions
oc+0.039
asis+0.038
ute+0.037
ater+0.037
opus+0.037
Neg logit contributions
 her-0.050
 the-0.045
 their-0.043
 his-0.043
 it-0.040
Token was
Token ID373
Feature activation-4.22e-02

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.002
 rule-0.002
opus-0.002
-0.002
oc-0.002
Token very
Token ID845
Feature activation-1.60e-02

Pos logit contributions
led+0.076
ted+0.073
 her+0.072
ened+0.070
 him+0.069
Neg logit contributions
 promise-0.071
 rule-0.067
umpy-0.063
 request-0.061
iver-0.060
Token happy
Token ID3772
Feature activation-1.30e-02

Pos logit contributions
 her+0.029
 his+0.026
 the+0.026
 their+0.026
 him+0.025
Neg logit contributions
 promise-0.030
umpy-0.028
 rule-0.028
 request-0.027
-0.027
Token and
Token ID290
Feature activation-5.52e-02

Pos logit contributions
 her+0.026
 their+0.023
 his+0.023
 the+0.022
led+0.022
Neg logit contributions
 promise-0.021
 rule-0.021
-0.020
umpy-0.020
opus-0.020
Token played
Token ID2826
Feature activation-1.41e-02

Pos logit contributions
 her+0.089
lled+0.085
 them+0.084
 him+0.083
 any+0.082
Neg logit contributions
 promise-0.096
-0.087
 judge-0.084
 request-0.084
 team-0.084
Token with
Token ID351
Feature activation-5.74e-02

Pos logit contributions
 her+0.021
 their+0.019
 his+0.018
 the+0.018
led+0.017
Neg logit contributions
 promise-0.029
 rule-0.029
-0.029
oc-0.028
umpy-0.028
Token the
Token ID262
Feature activation+1.97e-02

Pos logit contributions
led+0.054
bled+0.049
ged+0.046
ed+0.045
icked+0.041
Neg logit contributions
-0.135
 team-0.134
 promise-0.133
 world-0.132
 rule-0.131
Token ball
Token ID2613
Feature activation+2.86e-03

Pos logit contributions
oc+0.026
asis+0.025
ater+0.024
opus+0.024
ari+0.024
Neg logit contributions
 her-0.051
 the-0.047
 their-0.046
 his-0.046
 it-0.043
Token him
Token ID683
Feature activation+2.54e-02

Pos logit contributions
led+0.042
 her+0.030
bled+0.030
ed+0.028
 their+0.025
Neg logit contributions
 rule-0.138
 hill-0.135
 world-0.134
 promise-0.134
 road-0.134
Token.
Token ID13
Feature activation+7.71e-03

Pos logit contributions
opus+0.053
+0.052
umpy+0.051
asis+0.051
ari+0.050
Neg logit contributions
 her-0.045
 the-0.040
 his-0.040
 their-0.039
 it-0.035
Token She
Token ID1375
Feature activation+2.45e-02

Pos logit contributions
 promise+0.016
opus+0.015
 rule+0.015
+0.015
umpy+0.015
Neg logit contributions
 her-0.014
 his-0.012
 the-0.012
 their-0.012
 its-0.011
Token keeps
Token ID7622
Feature activation-3.50e-02

Pos logit contributions
opus+0.043
umpy+0.043
ute+0.043
asis+0.042
oc+0.042
Neg logit contributions
 her-0.049
 the-0.044
 his-0.043
 their-0.043
 it-0.040
Token looking
Token ID2045
Feature activation-2.04e-02

Pos logit contributions
led+0.041
 her+0.040
 their+0.035
bled+0.034
 his+0.033
Neg logit contributions
 rule-0.075
 promise-0.074
 judge-0.074
 request-0.073
 story-0.072
Token for
Token ID329
Feature activation-4.82e-02

Pos logit contributions
 her+0.029
 their+0.025
 his+0.025
 the+0.024
 its+0.023
Neg logit contributions
 promise-0.045
 rule-0.043
umpy-0.043
oc-0.042
-0.042
Token her
Token ID607
Feature activation+4.02e-02

Pos logit contributions
led+0.043
bled+0.036
ied+0.035
lled+0.033
 her+0.032
Neg logit contributions
 promise-0.122
 rule-0.120
 judge-0.117
 request-0.117
 story-0.114
Token own
Token ID898
Feature activation+4.53e-02

Pos logit contributions
opus+0.082
umpy+0.081
oc+0.081
asis+0.077
rr+0.077
Neg logit contributions
 her-0.075
 the-0.069
 his-0.066
 their-0.065
 it-0.060
Token pieces
Token ID5207
Feature activation+4.74e-03

Pos logit contributions
opus+0.091
umpy+0.090
oc+0.089
ari+0.088
asis+0.088
Neg logit contributions
 her-0.087
 the-0.081
 his-0.078
 their-0.075
 it-0.071
Token.
Token ID13
Feature activation-8.26e-05

Pos logit contributions
 promise+0.008
opus+0.008
oc+0.008
+0.008
 rule+0.008
Neg logit contributions
 her-0.009
 the-0.008
 his-0.008
 their-0.008
 its-0.007
Token
Token ID198
Feature activation+1.30e-02

Pos logit contributions
 her+0.000
 his+0.000
 the+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.000
 rule-0.000
opus-0.000
-0.000
umpy-0.000
Token tired
Token ID10032
Feature activation+3.41e-02

Pos logit contributions
 her+0.046
 their+0.040
 his+0.039
 him+0.038
 the+0.038
Neg logit contributions
 promise-0.050
umpy-0.046
-0.046
 rule-0.045
 request-0.043
Token."
Token ID526
Feature activation+1.46e-02

Pos logit contributions
opus+0.077
oc+0.074
affe+0.074
ute+0.073
rot+0.072
Neg logit contributions
 her-0.057
 the-0.052
 his-0.049
 their-0.049
 it-0.046
Token
Token ID198
Feature activation+2.70e-02

Pos logit contributions
 promise+0.030
 rule+0.029
oc+0.029
opus+0.029
asis+0.029
Neg logit contributions
 her-0.027
 the-0.024
 their-0.024
 his-0.023
 it-0.022
Token
Token ID198
Feature activation+3.74e-02

Pos logit contributions
 promise+0.057
ater+0.057
 rule+0.056
opus+0.056
asis+0.055
Neg logit contributions
 her-0.048
 the-0.043
 their-0.042
 his-0.041
 it-0.039
TokenBen
Token ID11696
Feature activation+5.40e-03

Pos logit contributions
 promise+0.069
asis+0.068
 rule+0.067
ute+0.067
 request+0.066
Neg logit contributions
 her-0.077
 the-0.068
 their-0.068
 his-0.063
 it-0.061
Token and
Token ID290
Feature activation-2.40e-02

Pos logit contributions
 promise+0.011
 rule+0.011
opus+0.011
oc+0.011
umpy+0.011
Neg logit contributions
 her-0.009
 the-0.008
 his-0.008
 their-0.007
 its-0.007
Token Lily
Token ID20037
Feature activation+4.76e-03

Pos logit contributions
 her+0.025
led+0.025
lled+0.024
 their+0.023
bled+0.023
Neg logit contributions
 promise-0.057
 rule-0.055
 request-0.054
-0.053
 team-0.052
Token nodded
Token ID14464
Feature activation-2.16e-02

Pos logit contributions
 promise+0.011
 rule+0.011
oc+0.011
opus+0.011
umpy+0.011
Neg logit contributions
 her-0.007
 the-0.006
 his-0.006
 their-0.006
 its-0.005
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
 her+0.043
 the+0.039
 his+0.038
 their+0.038
led+0.037
Neg logit contributions
 promise-0.035
-0.032
 request-0.031
 rule-0.031
oc-0.030
Token They
Token ID1119
Feature activation-1.26e-02

Pos logit contributions
opus+0.031
 promise+0.031
asis+0.031
oc+0.030
ater+0.030
Neg logit contributions
 her-0.024
 their-0.023
 the-0.022
 his-0.022
 its-0.019
Token were
Token ID547
Feature activation-2.91e-02

Pos logit contributions
 her+0.019
 their+0.016
 his+0.016
 the+0.016
 its+0.015
Neg logit contributions
 promise-0.028
-0.027
 rule-0.026
oc-0.026
umpy-0.026
Token all
Token ID477
Feature activation-6.06e-02

Pos logit contributions
led+0.094
 her+0.085
ged+0.077
bled+0.076
ened+0.074
Neg logit contributions
 driver-0.147
oc-0.147
affe-0.146
 fence-0.143
 rule-0.142
Token the
Token ID262
Feature activation-1.93e-03

Pos logit contributions
bled+0.121
 nothing+0.108
led+0.104
ged+0.102
 none+0.101
Neg logit contributions
 world-0.095
 hill-0.090
ney-0.088
 rule-0.086
oc-0.086
Token numbers
Token ID3146
Feature activation-4.66e-02

Pos logit contributions
 her+0.003
 his+0.003
 their+0.003
 the+0.003
 its+0.003
Neg logit contributions
 promise-0.004
 rule-0.004
oc-0.004
umpy-0.004
-0.004
Token,
Token ID11
Feature activation-4.82e-02

Pos logit contributions
 her+0.084
 his+0.072
 their+0.072
 the+0.069
 its+0.068
Neg logit contributions
 promise-0.077
 hill-0.076
 rule-0.076
affe-0.075
ari-0.074
Token but
Token ID475
Feature activation-7.90e-02

Pos logit contributions
 her+0.090
led+0.082
bled+0.081
 their+0.078
ied+0.076
Neg logit contributions
 rule-0.075
 farm-0.075
oc-0.075
 road-0.074
 hill-0.074
Token when
Token ID618
Feature activation-6.11e-02

Pos logit contributions
 her+0.100
led+0.088
icked+0.080
lled+0.074
ted+0.074
Neg logit contributions
 hill-0.159
 promise-0.156
 judge-0.156
 driver-0.152
oc-0.150
Token he
Token ID339
Feature activation-3.07e-02

Pos logit contributions
led+0.083
lled+0.072
 her+0.071
ied+0.070
icked+0.066
Neg logit contributions
 hill-0.135
 road-0.128
-0.128
 story-0.124
 team-0.123
Token got
Token ID1392
Feature activation-5.43e-02

Pos logit contributions
 her+0.052
 their+0.048
 his+0.042
 its+0.042
 the+0.041
Neg logit contributions
oc-0.062
 promise-0.061
-0.059
asis-0.058
opus-0.058
Token to
Token ID284
Feature activation-2.24e-02

Pos logit contributions
led+0.082
 her+0.079
lled+0.077
ped+0.072
bled+0.071
Neg logit contributions
 promise-0.095
ite-0.093
-0.090
 house-0.089
 judge-0.089
Token the
Token ID262
Feature activation+2.57e-02

Pos logit contributions
led+0.013
 her+0.012
bled+0.008
ied+0.008
ed+0.008
Neg logit contributions
 promise-0.068
 rule-0.067
 request-0.065
-0.065
 hill-0.065
Token end
Token ID886
Feature activation-9.04e-03

Pos logit contributions
umpy+0.041
rr+0.039
ute+0.038
opus+0.037
oc+0.037
Neg logit contributions
 her-0.059
 the-0.056
 his-0.052
 it-0.051
 their-0.049
Token started
Token ID2067
Feature activation-4.10e-02

Pos logit contributions
 her+0.031
 his+0.027
 their+0.027
 the+0.026
 its+0.026
Neg logit contributions
 promise-0.050
-0.048
affe-0.047
oc-0.047
 team-0.047
Token to
Token ID284
Feature activation+3.99e-02

Pos logit contributions
led+0.072
 her+0.069
bled+0.067
lled+0.061
 its+0.061
Neg logit contributions
 rule-0.065
 team-0.065
 hill-0.063
 story-0.063
 request-0.063
Token count
Token ID954
Feature activation-3.69e-02

Pos logit contributions
umpy+0.066
opus+0.066
ute+0.064
affe+0.059
actus+0.059
Neg logit contributions
 her-0.097
 the-0.092
 his-0.089
 their-0.087
 it-0.079
Token.
Token ID13
Feature activation-5.18e-03

Pos logit contributions
 her+0.023
led+0.021
bled+0.018
 their+0.016
lled+0.016
Neg logit contributions
 rule-0.098
ute-0.097
 hill-0.096
 world-0.096
-0.096
Token But
Token ID887
Feature activation-6.92e-02

Pos logit contributions
 her+0.009
 his+0.008
 the+0.008
 their+0.008
 its+0.007
Neg logit contributions
 promise-0.010
-0.010
 rule-0.010
umpy-0.010
oc-0.010
Token then
Token ID788
Feature activation-6.26e-02

Pos logit contributions
lled+0.111
led+0.110
icked+0.103
ped+0.099
ed+0.097
Neg logit contributions
 promise-0.117
-0.107
 rule-0.103
 judge-0.099
 team-0.098
Token,
Token ID11
Feature activation-5.83e-02

Pos logit contributions
lled+0.137
led+0.130
ied+0.128
bled+0.127
ped+0.124
Neg logit contributions
-0.083
 team-0.076
 promise-0.070
 story-0.069
 judge-0.066
Token Lily
Token ID20037
Feature activation+2.74e-03

Pos logit contributions
lled+0.071
led+0.068
ied+0.062
ped+0.059
ened+0.057
Neg logit contributions
-0.129
 promise-0.119
 team-0.118
 judge-0.113
 swing-0.112
Token's
Token ID338
Feature activation+2.57e-02

Pos logit contributions
 promise+0.006
 rule+0.006
oc+0.006
umpy+0.006
opus+0.006
Neg logit contributions
 her-0.004
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token little
Token ID1310
Feature activation+1.71e-02

Pos logit contributions
oc+0.039
umpy+0.037
+0.036
asis+0.036
opus+0.036
Neg logit contributions
 her-0.061
 the-0.055
 his-0.055
 their-0.052
 it-0.050
Token brother
Token ID3956
Feature activation-1.35e-02

Pos logit contributions
oc+0.026
umpy+0.026
 promise+0.026
opus+0.026
asis+0.025
Neg logit contributions
 her-0.038
 the-0.035
 his-0.034
 their-0.034
 it-0.031
Tokenained
Token ID1328
Feature activation-5.08e-02

Pos logit contributions
 her+0.051
 his+0.047
 the+0.046
 their+0.046
 it+0.044
Neg logit contributions
umpy-0.047
ney-0.044
ater-0.044
ute-0.044
rr-0.043
Token all
Token ID477
Feature activation-5.04e-02

Pos logit contributions
 her+0.072
led+0.070
 his+0.068
ened+0.065
 their+0.064
Neg logit contributions
 rule-0.095
 world-0.094
umpy-0.093
 hill-0.091
-0.091
Token night
Token ID1755
Feature activation+4.03e-03

Pos logit contributions
 her+0.087
 his+0.086
bled+0.085
 their+0.082
 him+0.081
Neg logit contributions
umpy-0.083
 world-0.082
 rule-0.078
ney-0.076
 hill-0.075
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
 promise+0.007
opus+0.007
 rule+0.007
oc+0.007
+0.007
Neg logit contributions
 her-0.008
 the-0.007
 their-0.007
 his-0.007
 its-0.006
Token She
Token ID1375
Feature activation-3.77e-02

Pos logit contributions
 her+0.028
 their+0.026
 his+0.026
 the+0.025
led+0.024
Neg logit contributions
-0.030
 promise-0.029
umpy-0.028
 rule-0.028
oc-0.028
Token put
Token ID1234
Feature activation-7.23e-02

Pos logit contributions
 her+0.061
 their+0.059
 his+0.059
 its+0.055
 such+0.054
Neg logit contributions
 promise-0.070
-0.066
 request-0.062
umpy-0.061
opus-0.061
Token on
Token ID319
Feature activation-7.29e-02

Pos logit contributions
lled+0.103
 none+0.095
 theirs+0.094
arted+0.092
 Joe+0.091
Neg logit contributions
-0.121
 promise-0.103
age-0.103
 slide-0.103
 shelf-0.098
Token her
Token ID607
Feature activation-3.78e-05

Pos logit contributions
bled+0.048
led+0.047
lled+0.041
icked+0.040
 none+0.038
Neg logit contributions
 swing-0.188
 slide-0.185
-0.183
 house-0.179
 shelf-0.179
Token rain
Token ID6290
Feature activation-9.79e-03

Pos logit contributions
 her+0.000
 his+0.000
 the+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.000
 rule-0.000
opus-0.000
oc-0.000
umpy-0.000
Tokencoat
Token ID31434
Feature activation-2.57e-02

Pos logit contributions
 her+0.026
 his+0.024
 their+0.024
 the+0.024
 its+0.023
Neg logit contributions
 promise-0.010
 rule-0.009
-0.009
umpy-0.009
ute-0.009
Token and
Token ID290
Feature activation-6.23e-02

Pos logit contributions
 her+0.050
 their+0.046
 his+0.046
 its+0.044
 the+0.044
Neg logit contributions
 promise-0.041
-0.040
age-0.040
ute-0.039
 rule-0.038
Token to
Token ID284
Feature activation+1.18e-02

Pos logit contributions
 her+0.042
led+0.040
bled+0.040
 his+0.037
lled+0.035
Neg logit contributions
 rule-0.070
 promise-0.070
 request-0.067
 story-0.065
 judge-0.065
Token laugh
Token ID6487
Feature activation-1.08e-02

Pos logit contributions
opus+0.018
umpy+0.017
affe+0.017
oc+0.017
ute+0.017
Neg logit contributions
 her-0.028
 the-0.025
 their-0.025
 his-0.025
 its-0.023
Token.
Token ID13
Feature activation-1.72e-02

Pos logit contributions
 her+0.021
 his+0.019
 their+0.019
 the+0.019
 its+0.017
Neg logit contributions
 promise-0.018
 rule-0.018
-0.018
umpy-0.017
oc-0.017
Token They
Token ID1119
Feature activation-2.51e-02

Pos logit contributions
 her+0.036
 his+0.033
 the+0.032
 their+0.031
led+0.031
Neg logit contributions
 rule-0.026
umpy-0.026
 promise-0.025
-0.025
 request-0.024
Token pointed
Token ID6235
Feature activation-5.41e-02

Pos logit contributions
 her+0.038
 his+0.033
 their+0.033
 the+0.031
 its+0.031
Neg logit contributions
 promise-0.053
 rule-0.050
-0.049
 judge-0.048
oc-0.048
Token at
Token ID379
Feature activation-6.14e-02

Pos logit contributions
bled+0.079
lled+0.071
led+0.068
 her+0.061
 none+0.059
Neg logit contributions
 promise-0.109
 request-0.103
 rule-0.103
-0.096
.'"-0.095
Token her
Token ID607
Feature activation+1.98e-02

Pos logit contributions
led+0.038
bled+0.034
lled+0.028
 his+0.027
 able+0.023
Neg logit contributions
 rule-0.161
 team-0.159
 promise-0.159
 judge-0.158
 world-0.156
Token and
Token ID290
Feature activation-6.08e-02

Pos logit contributions
umpy+0.038
opus+0.038
oc+0.038
+0.037
asis+0.036
Neg logit contributions
 her-0.038
 the-0.034
 their-0.033
 his-0.033
 it-0.030
Token made
Token ID925
Feature activation-4.62e-02

Pos logit contributions
 her+0.092
 him+0.089
lled+0.089
bled+0.088
ied+0.087
Neg logit contributions
 promise-0.101
 judge-0.098
 rule-0.095
 request-0.094
-0.093
Token fun
Token ID1257
Feature activation+2.34e-02

Pos logit contributions
 her+0.025
 his+0.024
led+0.022
bled+0.019
 their+0.018
Neg logit contributions
 promise-0.128
 request-0.125
 rule-0.124
 story-0.120
ute-0.118
Token of
Token ID286
Feature activation-4.64e-02

Pos logit contributions
opus+0.046
+0.045
affe+0.045
oc+0.044
 promise+0.044
Neg logit contributions
 her-0.044
 their-0.040
 the-0.040
 his-0.039
 it-0.036
Token down
Token ID866
Feature activation-3.59e-02

Pos logit contributions
 her+0.027
 their+0.022
 the+0.022
 his+0.022
 its+0.020
Neg logit contributions
 promise-0.044
-0.044
 rule-0.044
ute-0.044
umpy-0.043
Token his
Token ID465
Feature activation+5.60e-02

Pos logit contributions
 her+0.053
led+0.048
 their+0.046
bled+0.046
lled+0.045
Neg logit contributions
 rule-0.069
 world-0.067
 hill-0.066
umpy-0.066
oc-0.065
Token head
Token ID1182
Feature activation+9.77e-03

Pos logit contributions
oc+0.083
umpy+0.083
icious+0.081
oral+0.076
opus+0.075
Neg logit contributions
 her-0.137
 the-0.126
 his-0.125
 their-0.118
 its-0.111
Token.
Token ID13
Feature activation-2.98e-03

Pos logit contributions
 promise+0.018
opus+0.018
+0.018
 rule+0.018
oc+0.018
Neg logit contributions
 her-0.018
 his-0.016
 the-0.016
 their-0.016
 its-0.015
Token He
Token ID679
Feature activation-3.89e-02

Pos logit contributions
 her+0.006
 the+0.005
 their+0.005
 his+0.005
 its+0.005
Neg logit contributions
 promise-0.005
 rule-0.005
oc-0.005
umpy-0.005
-0.005
Token saw
Token ID2497
Feature activation-5.95e-02

Pos logit contributions
 her+0.064
 their+0.057
 the+0.052
 his+0.052
 its+0.052
Neg logit contributions
 promise-0.075
oc-0.071
-0.070
 rule-0.069
 request-0.068
Token his
Token ID465
Feature activation+3.70e-02

Pos logit contributions
led+0.045
bled+0.041
lled+0.039
 her+0.035
ened+0.034
Neg logit contributions
 rule-0.140
affe-0.138
 slide-0.135
opus-0.135
 driver-0.135
Token scarf
Token ID36953
Feature activation-1.44e-02

Pos logit contributions
oc+0.060
umpy+0.059
+0.056
opus+0.055
asis+0.055
Neg logit contributions
 her-0.084
 the-0.077
 his-0.075
 their-0.073
 it-0.069
Token falling
Token ID7463
Feature activation-3.21e-02

Pos logit contributions
 her+0.025
 their+0.022
 the+0.021
 his+0.021
 its+0.020
Neg logit contributions
 promise-0.028
 rule-0.027
oc-0.027
opus-0.026
umpy-0.026
Token off
Token ID572
Feature activation-2.79e-02

Pos logit contributions
 her+0.040
 their+0.034
 the+0.033
 his+0.032
led+0.029
Neg logit contributions
 promise-0.070
 rule-0.070
ute-0.070
-0.069
umpy-0.069
Token his
Token ID465
Feature activation+5.78e-02

Pos logit contributions
 her+0.024
 their+0.019
led+0.018
 his+0.017
bled+0.017
Neg logit contributions
 rule-0.071
oc-0.071
-0.071
 hill-0.070
 promise-0.070