TOP ACTIVATIONS
MAX = 4.979

fingers . She inserted the key into the key
went on a walk and soon they saw a lab .
followed the ringing sound and soon found the source
as they could , and soon they were far away from
through the forest , but soon it was getting dark .
scared at first , but soon the fear was replaced by
police officer called Sarah and soon she arrived to pick up
but she kept walking and soon found a beautiful flower .
little girl was disappointed but soon spotted a small elephant standing
baby was disappointed , but soon she saw a little bird
for a bit , but soon he was back home and
. She started walking but soon realized it was farther away
over the hills , but soon she found herself lost in
walked and walked , but soon he realized he didn 't
They searched and searched , but it was gone . They
under the umbrella , but soon they noticed that their feet
," said Lily . She organized all the toys in the
and flew away , but soon returned with a friend .
She followed the singing and soon she came upon a pond
she screamed for help . Soon a friendly polar bear appeared

MIN ACTIVATIONS
MIN = -6.069

this is a sculpture of a cat . It is white
you can see a little face in the bark
I have two cards with a dog !" Ben says .
I am baking a delicious cake .
flag , a map , a landmark , or a symbol
, like a flag , a map , a landmark ,
and a monkey , and a gir affe , and a
a gir affe , and a snake !" He points to
a tiny box shaped like a heart .
, there was an old wooden box that no one could
" I was born as a cater pillar . A cater
? It looks more like a mess . And some of
has a long nose called a trunk . It can use
Wow ! You are a purple z ebra ! Can I
" Look , there 's a stand !" said Lucy .
, a landmark , or a symbol . That way ,
draw me a picture of a banana ?" L
, a nose , and a collar . Then your dog
is a lion , and a monkey , and a gir
a ball , and even a spoon . He likes to

INTERVAL 2.217 - 4.979
CONTAINS 6.375%

a bump on her head and a scratch on her arm
with her ball . She saw a boy who spoke a
went to the store to buy some apples . As they
parent said , " Where are you , little one ?
. Suddenly , they heard a noise .

INTERVAL -0.545 - 2.217
CONTAINS 58.536%

lie on the bed . She put her dolls
it !" Jimmy put the axe down . He
told him to be careful . " I will explain why
eggs . I was hungry and I wanted to make an
The boy was so happy and he was amazed by how

INTERVAL -3.307 - -0.545
CONTAINS 33.677%

Jack what was wrong . Jack showed the squirrel
the sky . The man pushed the button and
Instead , Su zy decided to take a nap under the
the town and no longer come searching for food .
you . But you have to promise to be honest and

INTERVAL -6.069 - -3.307
CONTAINS 1.412%

Then , she shut the door behind them so that
my and started running after the bird . She ran and
lived at the bottom of a big old tree , close
As she stepped up to the edge of the pond ,
a time , there was a little train . It was
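
The four INTERVAL boundaries above are consistent with splitting the range from MIN (-6.069) to MAX (4.979) into four equal-width bins, each 2.762 wide. The dump does not state this; it is inferred from the printed numbers. A minimal Python sketch under that assumption, where the helper name `equal_width_edges` is hypothetical:

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi.

    Hypothetical helper: the tool that produced this dump may compute
    its intervals differently.
    """
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as printed in the dump; the bin count of 4 is an assumption
# based on the four INTERVAL headers.
edges = equal_width_edges(-6.069, 4.979, 4)
rounded_edges = [round(e, 3) for e in edges]

# Percentages printed under each INTERVAL header, highest bin first.
shares = [6.375, 58.536, 33.677, 1.412]
total_share = round(sum(shares), 3)

print(rounded_edges)
print(total_share)
```

Rounded to three decimals, the computed edges match the printed interval boundaries, and the four CONTAINS percentages sum to 100, so each activation falls in exactly one interval.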
Token fingers
Token ID9353
Feature activation+5.73e-01

Pos logit contributions
opus+0.090
umpy+0.086
asis+0.084
ute+0.084
+0.080
Neg logit contributions
 her-0.100
 his-0.091
 the-0.090
 their-0.085
 its-0.083
Token.
Token ID13
Feature activation-2.83e-01

Pos logit contributions
 her+0.028
 their+0.025
 his+0.025
 the+0.025
 its+0.023
Neg logit contributions
 promise-0.023
 rule-0.022
umpy-0.022
ute-0.021
oc-0.021
Token
Token ID198
Feature activation-1.01e+00

Pos logit contributions
 promise+0.014
 rule+0.014
opus+0.014
oc+0.013
umpy+0.013
Neg logit contributions
 her-0.012
 the-0.010
 his-0.010
 their-0.010
 it-0.009
Token
Token ID198
Feature activation-1.59e+00

Pos logit contributions
 promise+0.052
 rule+0.050
ater+0.050
opus+0.049
asis+0.048
Neg logit contributions
 her-0.044
 the-0.039
 his-0.038
 their-0.037
 it-0.036
TokenShe
Token ID3347
Feature activation+1.23e+00

Pos logit contributions
 promise+0.082
 rule+0.078
 Driver+0.077
asis+0.076
 request+0.075
Neg logit contributions
 her-0.071
 the-0.061
 his-0.057
 their-0.056
 it-0.055
Token inserted
Token ID18846
Feature activation+4.98e+00

Pos logit contributions
 her+0.047
 their+0.043
 his+0.040
 the+0.040
 its+0.038
Neg logit contributions
 promise-0.061
oc-0.061
ney-0.058
 request-0.056
-0.056
Token the
Token ID262
Feature activation-7.06e-01

Pos logit contributions
led+0.145
bled+0.121
icked+0.099
ied+0.098
zed+0.098
Neg logit contributions
 shelf-0.230
 button-0.215
 promise-0.215
 hill-0.214
 goal-0.211
Token key
Token ID1994
Feature activation+1.96e+00

Pos logit contributions
opus+0.024
umpy+0.024
oc+0.024
+0.023
 promise+0.023
Neg logit contributions
 her-0.040
 the-0.037
 his-0.036
 their-0.035
 it-0.034
Token into
Token ID656
Feature activation+3.06e+00

Pos logit contributions
 her+0.065
 their+0.060
 his+0.058
 the+0.053
 its+0.051
Neg logit contributions
 promise-0.100
ney-0.099
ute-0.098
 hill-0.097
age-0.096
Token the
Token ID262
Feature activation-1.45e+00

Pos logit contributions
led+0.077
bled+0.055
ied+0.055
icked+0.049
ed+0.045
Neg logit contributions
 shelf-0.185
 hill-0.175
 rule-0.173
 house-0.173
 world-0.171
Token key
Token ID1994
Feature activation+4.68e-01

Pos logit contributions
opus+0.052
umpy+0.052
ute+0.049
oc+0.049
iver+0.048
Neg logit contributions
 her-0.084
 the-0.079
 his-0.077
 their-0.072
 it-0.072
Token went
Token ID1816
Feature activation+6.35e-01

Pos logit contributions
 her+0.058
 their+0.053
 his+0.053
 its+0.051
 the+0.051
Neg logit contributions
 promise-0.078
ari-0.074
oc-0.073
ney-0.073
age-0.071
Token on
Token ID319
Feature activation+2.21e+00

Pos logit contributions
 her+0.020
 his+0.017
 their+0.016
 the+0.016
 its+0.016
Neg logit contributions
 promise-0.036
 rule-0.035
ute-0.034
-0.034
 request-0.034
Token a
Token ID257
Feature activation-2.37e-01

Pos logit contributions
led+0.063
bled+0.055
icked+0.048
ied+0.047
ed+0.047
Neg logit contributions
 promise-0.126
 story-0.126
 rule-0.126
 farm-0.126
 hill-0.124
Token walk
Token ID2513
Feature activation+9.94e-01

Pos logit contributions
 promise+0.007
umpy+0.007
opus+0.007
oc+0.007
 rule+0.007
Neg logit contributions
 her-0.014
 the-0.013
 their-0.013
 his-0.013
 it-0.012
Token and
Token ID290
Feature activation+3.47e+00

Pos logit contributions
 her+0.039
 his+0.035
 nothing+0.034
bled+0.033
 their+0.033
Neg logit contributions
 promise-0.045
ute-0.044
 rule-0.044
 world-0.044
ney-0.044
Token soon
Token ID2582
Feature activation+4.93e+00

Pos logit contributions
bled+0.150
 such+0.144
led+0.143
ped+0.143
ries+0.142
Neg logit contributions
 swing-0.116
 station-0.113
 house-0.111
 hill-0.109
 team-0.109
Token they
Token ID484
Feature activation+2.65e+00

Pos logit contributions
nam+0.182
 his+0.182
bled+0.178
 how+0.175
years+0.171
Neg logit contributions
ari-0.185
-0.173
 team-0.167
 group-0.165
 house-0.165
Token saw
Token ID2497
Feature activation+2.43e+00

Pos logit contributions
 her+0.117
 his+0.111
 how+0.110
 such+0.109
 its+0.107
Neg logit contributions
ari-0.095
 team-0.092
 house-0.089
 promise-0.088
 hill-0.088
Token a
Token ID257
Feature activation-3.57e+00

Pos logit contributions
led+0.069
ened+0.067
 her+0.065
ged+0.063
bled+0.062
Neg logit contributions
 world-0.125
 fence-0.124
 farm-0.121
 house-0.119
 hill-0.117
Token lab
Token ID2248
Feature activation-1.72e+00

Pos logit contributions
iver+0.194
.'"+0.194
umpy+0.194
rr+0.193
ater+0.192
Neg logit contributions
 the-0.135
 it-0.126
 her-0.124
 their-0.117
 his-0.112
Token.
Token ID13
Feature activation+2.69e-01

Pos logit contributions
 promise+0.091
+0.090
opus+0.090
umpy+0.090
oc+0.090
Neg logit contributions
 her-0.067
 the-0.062
 their-0.059
 his-0.058
 it-0.055
Token followed
Token ID3940
Feature activation+1.99e+00

Pos logit contributions
 her+0.048
 their+0.043
 his+0.042
 the+0.040
 its+0.040
Neg logit contributions
 promise-0.056
oc-0.056
-0.053
 request-0.053
ney-0.053
Token the
Token ID262
Feature activation-1.50e+00

Pos logit contributions
led+0.031
bled+0.024
icked+0.018
lled+0.017
ied+0.016
Neg logit contributions
 request-0.136
 promise-0.135
 rule-0.133
 story-0.129
 shelf-0.129
Token ringing
Token ID32333
Feature activation+1.48e-01

Pos logit contributions
oc+0.063
+0.059
affe+0.058
umpy+0.058
!'"+0.057
Neg logit contributions
 her-0.083
 the-0.076
 their-0.071
 his-0.071
 it-0.067
Token sound
Token ID2128
Feature activation+1.73e+00

Pos logit contributions
 her+0.008
 his+0.007
 their+0.007
 the+0.007
 its+0.006
Neg logit contributions
 promise-0.006
 rule-0.005
opus-0.005
umpy-0.005
oc-0.005
Token and
Token ID290
Feature activation+3.35e+00

Pos logit contributions
 her+0.071
 his+0.064
 their+0.064
 the+0.061
bled+0.060
Neg logit contributions
 promise-0.071
 rule-0.071
umpy-0.070
affe-0.070
opus-0.068
Token soon
Token ID2582
Feature activation+4.91e+00

Pos logit contributions
ied+0.120
led+0.118
bled+0.116
icked+0.114
 theirs+0.113
Neg logit contributions
 house-0.125
 promise-0.124
 judge-0.123
 hill-0.123
 request-0.121
Token found
Token ID1043
Feature activation+3.03e+00

Pos logit contributions
bled+0.183
nam+0.165
 theirs+0.164
ied+0.162
ped+0.158
Neg logit contributions
ari-0.163
 house-0.163
 team-0.162
pillar-0.158
 group-0.158
Token the
Token ID262
Feature activation-1.76e+00

Pos logit contributions
led+0.086
ged+0.074
bled+0.070
ened+0.067
lled+0.061
Neg logit contributions
 name-0.151
age-0.150
 world-0.148
 veterinarian-0.148
 house-0.147
Token source
Token ID2723
Feature activation+1.12e+00

Pos logit contributions
oc+0.083
affe+0.079
opus+0.077
+0.076
rr+0.074
Neg logit contributions
 her-0.088
 the-0.082
 their-0.076
 his-0.071
 it-0.070
Token
Token ID6184
Feature activation-2.02e+00

Pos logit contributions
 her+0.050
 his+0.046
 their+0.045
 the+0.044
 its+0.042
Neg logit contributions
umpy-0.046
 promise-0.046
 rule-0.044
oc-0.044
opus-0.044
Token
Token ID95
Feature activation-1.88e+00

Pos logit contributions
 promise+0.069
oc+0.064
umpy+0.064
 rule+0.062
ney+0.062
Neg logit contributions
 her-0.110
 the-0.105
 their-0.097
 it-0.097
 its-0.093
Token as
Token ID355
Feature activation+2.67e+00

Pos logit contributions
 her+0.086
bled+0.081
lled+0.076
 none+0.073
ied+0.073
Neg logit contributions
-0.140
affe-0.139
oc-0.133
 road-0.132
 hill-0.127
Token they
Token ID484
Feature activation+2.56e+00

Pos logit contributions
led+0.068
lled+0.064
 her+0.061
ened+0.058
ied+0.053
Neg logit contributions
 hill-0.146
 road-0.136
 farm-0.135
 team-0.135
 fence-0.133
Token could
Token ID714
Feature activation+1.19e+00

Pos logit contributions
 her+0.125
 their+0.111
 his+0.105
 its+0.104
 such+0.103
Neg logit contributions
 promise-0.094
ite-0.092
ari-0.091
-0.091
 hill-0.091
Token,
Token ID11
Feature activation+1.70e+00

Pos logit contributions
 her+0.056
 their+0.049
 his+0.049
led+0.048
 the+0.048
Neg logit contributions
 promise-0.049
 rule-0.047
ute-0.046
 request-0.045
age-0.045
Token and
Token ID290
Feature activation+3.63e+00

Pos logit contributions
 her+0.079
led+0.072
 their+0.068
 his+0.068
ied+0.066
Neg logit contributions
 hill-0.065
 promise-0.064
 rule-0.062
 ramp-0.062
 road-0.062
Token soon
Token ID2582
Feature activation+4.88e+00

Pos logit contributions
icked+0.108
 her+0.106
bled+0.105
 his+0.104
ries+0.100
Neg logit contributions
 station-0.152
 team-0.151
 farm-0.150
 hill-0.148
 promise-0.147
Token they
Token ID484
Feature activation+2.36e+00

Pos logit contributions
bled+0.152
years+0.142
ied+0.134
nam+0.130
led+0.129
Neg logit contributions
 team-0.221
-0.207
 house-0.206
 story-0.202
ari-0.201
Token were
Token ID547
Feature activation+2.39e+00

Pos logit contributions
 her+0.104
 their+0.083
 his+0.082
 how+0.080
 its+0.079
Neg logit contributions
 team-0.099
 promise-0.098
ari-0.098
 hill-0.097
umpy-0.097
Token far
Token ID1290
Feature activation+6.32e-01

Pos logit contributions
 her+0.106
led+0.096
ted+0.089
ied+0.088
 their+0.086
Neg logit contributions
 promise-0.089
umpy-0.087
 world-0.084
 swing-0.084
 hill-0.083
Token away
Token ID1497
Feature activation+1.98e+00

Pos logit contributions
 her+0.022
 their+0.018
 his+0.018
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.035
umpy-0.034
-0.034
 rule-0.034
oc-0.033
Token from
Token ID422
Feature activation+1.56e+00

Pos logit contributions
 her+0.080
 his+0.066
led+0.065
bled+0.065
 their+0.063
Neg logit contributions
umpy-0.087
 world-0.085
 rule-0.083
 promise-0.083
ney-0.082
Token through
Token ID832
Feature activation+2.62e+00

Pos logit contributions
 nothing+0.077
bled+0.075
lled+0.074
led+0.073
ened+0.072
Neg logit contributions
ny-0.099
-0.099
 hill-0.099
 world-0.098
ute-0.094
Token the
Token ID262
Feature activation-1.12e+00

Pos logit contributions
bled+0.071
led+0.062
icked+0.055
ied+0.048
lled+0.044
Neg logit contributions
 world-0.165
 hill-0.163
 farm-0.158
 house-0.158
 fence-0.156
Token forest
Token ID8222
Feature activation+1.80e+00

Pos logit contributions
 promise+0.043
ater+0.039
oc+0.039
umpy+0.038
 rule+0.038
Neg logit contributions
 her-0.063
 the-0.060
 his-0.058
 their-0.056
 its-0.053
Token,
Token ID11
Feature activation+2.85e+00

Pos logit contributions
 her+0.081
 his+0.076
bled+0.074
 their+0.074
led+0.072
Neg logit contributions
 farm-0.068
 world-0.068
ney-0.068
 hill-0.067
 rule-0.067
Token but
Token ID475
Feature activation+3.89e+00

Pos logit contributions
 her+0.114
 their+0.111
icked+0.110
 his+0.109
 him+0.107
Neg logit contributions
 farm-0.109
 hill-0.107
 swing-0.105
 world-0.104
 fence-0.103
Token soon
Token ID2582
Feature activation+4.84e+00

Pos logit contributions
lled+0.116
 his+0.111
 their+0.108
 theirs+0.107
led+0.103
Neg logit contributions
 farm-0.168
 hill-0.167
 world-0.167
 house-0.161
pillar-0.160
Token it
Token ID340
Feature activation+2.24e+00

Pos logit contributions
bled+0.169
lled+0.163
led+0.161
ped+0.155
 theirs+0.153
Neg logit contributions
pillar-0.185
 world-0.183
 group-0.174
 story-0.170
 house-0.169
Token was
Token ID373
Feature activation+1.95e+00

Pos logit contributions
 her+0.100
 their+0.098
 his+0.089
 theirs+0.087
 how+0.086
Neg logit contributions
age-0.092
umpy-0.091
ney-0.089
 world-0.086
asis-0.085
Token getting
Token ID1972
Feature activation+2.68e+00

Pos logit contributions
led+0.082
 her+0.081
ed+0.077
 their+0.076
ged+0.075
Neg logit contributions
 rule-0.077
 world-0.076
 promise-0.073
 story-0.072
umpy-0.071
Token dark
Token ID3223
Feature activation+1.01e+00

Pos logit contributions
 her+0.122
 their+0.112
led+0.111
lled+0.109
 how+0.108
Neg logit contributions
-0.094
 world-0.087
umpy-0.087
 promise-0.087
 river-0.083
Token.
Token ID13
Feature activation+6.33e-01

Pos logit contributions
 her+0.050
 their+0.046
 his+0.045
 the+0.044
 its+0.043
Neg logit contributions
umpy-0.036
-0.036
 world-0.036
ney-0.035
 rule-0.035
Token scared
Token ID12008
Feature activation+1.81e+00

Pos logit contributions
 her+0.080
led+0.079
bled+0.079
ied+0.078
ries+0.078
Neg logit contributions
 promise-0.122
umpy-0.112
 request-0.110
-0.109
 world-0.108
Token at
Token ID379
Feature activation+1.44e+00

Pos logit contributions
 her+0.078
 their+0.076
bled+0.075
 his+0.073
led+0.073
Neg logit contributions
umpy-0.066
 promise-0.064
 world-0.063
 rule-0.061
ater-0.060
Token first
Token ID717
Feature activation+5.27e-01

Pos logit contributions
led+0.071
 her+0.070
 their+0.068
 his+0.066
ed+0.065
Neg logit contributions
 promise-0.052
 rule-0.049
 hill-0.046
 request-0.045
 world-0.044
Token,
Token ID11
Feature activation+1.38e+00

Pos logit contributions
 her+0.020
 their+0.018
 his+0.018
 the+0.018
 its+0.016
Neg logit contributions
 promise-0.026
 rule-0.025
-0.025
umpy-0.025
oc-0.025
Token but
Token ID475
Feature activation+3.18e+00

Pos logit contributions
 her+0.061
 their+0.056
 his+0.056
led+0.055
 the+0.053
Neg logit contributions
 promise-0.057
 rule-0.054
 request-0.052
-0.051
 world-0.051
Token soon
Token ID2582
Feature activation+4.80e+00

Pos logit contributions
led+0.123
 theirs+0.114
ries+0.114
lled+0.113
 their+0.112
Neg logit contributions
 promise-0.140
 request-0.114
-0.102
 rule-0.099
 judge-0.099
Token the
Token ID262
Feature activation-1.86e+00

Pos logit contributions
bled+0.171
led+0.152
ied+0.149
iddled+0.146
 theirs+0.140
Neg logit contributions
-0.163
 world-0.160
 promise-0.156
 bird-0.154
 story-0.154
Token fear
Token ID3252
Feature activation+1.59e+00

Pos logit contributions
oc+0.093
ari+0.085
affe+0.084
umpy+0.082
'."+0.081
Neg logit contributions
 her-0.084
 the-0.084
 his-0.080
 their-0.077
 it-0.075
Token was
Token ID373
Feature activation+1.92e+00

Pos logit contributions
 her+0.064
 his+0.056
 their+0.056
 the+0.053
 its+0.050
Neg logit contributions
 promise-0.074
umpy-0.073
 rule-0.072
age-0.071
opus-0.071
Token replaced
Token ID6928
Feature activation+1.96e+00

Pos logit contributions
 her+0.068
led+0.062
 their+0.059
ed+0.058
 his+0.058
Neg logit contributions
 promise-0.091
 rule-0.088
 request-0.083
 world-0.082
age-0.080
Token by
Token ID416
Feature activation+1.55e+00

Pos logit contributions
 her+0.084
 their+0.083
 his+0.081
bled+0.076
led+0.076
Neg logit contributions
 world-0.074
 promise-0.071
ater-0.068
umpy-0.068
 rule-0.067
Token police
Token ID1644
Feature activation+6.59e-01

Pos logit contributions
ute+0.037
asis+0.037
 promise+0.036
ater+0.036
opus+0.036
Neg logit contributions
 her-0.052
 the-0.048
 his-0.047
 their-0.046
 it-0.042
Token officer
Token ID3818
Feature activation+1.07e+00

Pos logit contributions
 her+0.037
 their+0.034
 his+0.034
 the+0.034
 its+0.033
Neg logit contributions
 promise-0.021
opus-0.020
oc-0.020
-0.020
umpy-0.020
Token called
Token ID1444
Feature activation+1.34e+00

Pos logit contributions
 her+0.041
 their+0.036
 his+0.035
 the+0.034
 its+0.033
Neg logit contributions
 promise-0.054
oc-0.053
-0.052
opus-0.051
 request-0.051
Token Sarah
Token ID10490
Feature activation+5.17e-01

Pos logit contributions
 her+0.016
led+0.013
 their+0.012
 his+0.012
lled+0.010
Neg logit contributions
 promise-0.095
 rule-0.094
 request-0.091
 story-0.090
-0.090
Token and
Token ID290
Feature activation+2.95e+00

Pos logit contributions
 her+0.022
 their+0.020
 his+0.019
 the+0.019
 its+0.018
Neg logit contributions
 promise-0.024
oc-0.023
-0.023
 rule-0.023
opus-0.022
Token soon
Token ID2582
Feature activation+4.79e+00

Pos logit contributions
led+0.074
bled+0.071
lled+0.066
ied+0.066
 her+0.056
Neg logit contributions
 judge-0.162
 promise-0.160
 team-0.157
 mechanic-0.153
 request-0.152
Token she
Token ID673
Feature activation+1.01e+00

Pos logit contributions
bled+0.146
nam+0.130
ries+0.126
 herself+0.124
ped+0.123
Neg logit contributions
 team-0.217
ari-0.205
 story-0.203
 group-0.194
 party-0.191
Token arrived
Token ID5284
Feature activation+9.53e-01

Pos logit contributions
 her+0.038
 their+0.034
 his+0.033
 the+0.033
 its+0.031
Neg logit contributions
 promise-0.050
oc-0.049
-0.047
opus-0.047
ari-0.047
Token to
Token ID284
Feature activation+7.91e-01

Pos logit contributions
 her+0.033
 his+0.028
 the+0.028
 their+0.028
led+0.026
Neg logit contributions
 promise-0.049
ute-0.046
-0.046
 rule-0.046
 request-0.046
Token pick
Token ID2298
Feature activation+2.31e+00

Pos logit contributions
 her+0.032
led+0.031
 his+0.029
 their+0.028
 the+0.028
Neg logit contributions
 promise-0.036
 rule-0.035
 request-0.034
-0.033
oc-0.033
Token up
Token ID510
Feature activation+2.43e+00

Pos logit contributions
bled+0.039
 nothing+0.033
 her+0.033
led+0.031
 none+0.031
Neg logit contributions
oc-0.142
-0.140
pillar-0.138
affe-0.137
 world-0.137
Token but
Token ID475
Feature activation+3.16e+00

Pos logit contributions
led+0.115
bled+0.114
icked+0.113
lled+0.109
ied+0.108
Neg logit contributions
 rule-0.063
 promise-0.062
 road-0.059
 world-0.059
 farm-0.058
Token she
Token ID673
Feature activation+5.75e-01

Pos logit contributions
lled+0.111
bled+0.106
led+0.105
icked+0.104
ts+0.102
Neg logit contributions
 promise-0.145
 request-0.128
 judge-0.122
 story-0.116
-0.115
Token kept
Token ID4030
Feature activation+1.77e+00

Pos logit contributions
 her+0.022
 their+0.020
 the+0.019
 his+0.019
 its+0.018
Neg logit contributions
 promise-0.030
-0.028
oc-0.028
 rule-0.028
opus-0.028
Token walking
Token ID6155
Feature activation+5.91e-01

Pos logit contributions
led+0.067
ied+0.058
bled+0.058
lled+0.057
ed+0.057
Neg logit contributions
 promise-0.080
 request-0.077
 rule-0.075
 story-0.075
 world-0.074
Token and
Token ID290
Feature activation+3.34e+00

Pos logit contributions
 her+0.023
 their+0.021
 his+0.020
 the+0.020
led+0.019
Neg logit contributions
 promise-0.029
 rule-0.028
umpy-0.027
-0.027
oc-0.027
Token soon
Token ID2582
Feature activation+4.79e+00

Pos logit contributions
led+0.145
icked+0.134
bled+0.127
ied+0.127
ped+0.124
Neg logit contributions
 hill-0.127
 promise-0.125
 swing-0.124
 house-0.124
 world-0.117
Token found
Token ID1043
Feature activation+2.79e+00

Pos logit contributions
bled+0.166
ped+0.161
lled+0.155
ries+0.155
 how+0.150
Neg logit contributions
-0.167
 team-0.162
 house-0.161
ari-0.156
 group-0.154
Token a
Token ID257
Feature activation-2.65e+00

Pos logit contributions
led+0.094
ened+0.083
lled+0.079
ged+0.078
bled+0.075
Neg logit contributions
 veterinarian-0.132
age-0.125
 world-0.124
 house-0.124
 rule-0.123
Token beautiful
Token ID4950
Feature activation-2.18e+00

Pos logit contributions
rr+0.141
umpy+0.140
'."+0.137
iver+0.136
ater+0.136
Neg logit contributions
 the-0.102
 her-0.101
 it-0.095
 their-0.095
 his-0.092
Token flower
Token ID15061
Feature activation-2.15e+00

Pos logit contributions
umpy+0.086
rr+0.081
ater+0.079
 promise+0.078
arl+0.078
Neg logit contributions
 her-0.124
 the-0.119
 it-0.113
 their-0.112
 his-0.111
Token.
Token ID13
Feature activation-3.18e-01

Pos logit contributions
umpy+0.124
rr+0.120
oc+0.119
 Driver+0.119
+0.118
Neg logit contributions
 her-0.082
 the-0.070
 it-0.065
 their-0.065
 his-0.061
Token little
Token ID1310
Feature activation+2.22e-01

Pos logit contributions
ute+0.068
iver+0.064
ater+0.064
+0.064
opus+0.064
Neg logit contributions
 her-0.067
 the-0.063
 their-0.058
 his-0.057
 it-0.052
Token girl
Token ID2576
Feature activation+5.02e-01

Pos logit contributions
 her+0.015
 his+0.014
 their+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.005
 rule-0.004
-0.004
umpy-0.004
oc-0.004
Token was
Token ID373
Feature activation+1.43e+00

Pos logit contributions
 her+0.018
 their+0.015
 his+0.015
 the+0.015
 its+0.014
Neg logit contributions
 promise-0.027
oc-0.026
-0.026
 rule-0.026
opus-0.026
Token disappointed
Token ID11679
Feature activation+4.12e-01

Pos logit contributions
 her+0.060
led+0.055
 their+0.053
 his+0.053
 him+0.052
Neg logit contributions
 promise-0.063
umpy-0.060
 rule-0.060
 request-0.057
age-0.057
Token but
Token ID475
Feature activation+2.70e+00

Pos logit contributions
 her+0.017
 their+0.015
 his+0.015
 the+0.015
 its+0.014
Neg logit contributions
 promise-0.020
 rule-0.019
umpy-0.019
-0.019
ute-0.019
Token soon
Token ID2582
Feature activation+4.78e+00

Pos logit contributions
bled+0.099
 them+0.094
lled+0.093
ied+0.092
led+0.091
Neg logit contributions
 promise-0.117
 judge-0.114
 request-0.104
 world-0.098
 swing-0.096
Token spotted
Token ID13489
Feature activation+3.29e+00

Pos logit contributions
bled+0.183
iddled+0.173
ied+0.166
lled+0.162
ped+0.158
Neg logit contributions
-0.186
 team-0.178
 story-0.170
 group-0.158
pillar-0.155
Token a
Token ID257
Feature activation-3.44e+00

Pos logit contributions
led+0.084
bled+0.073
lled+0.070
ged+0.065
ied+0.063
Neg logit contributions
 veterinarian-0.163
 farm-0.163
 world-0.160
 judge-0.153
 rule-0.153
Token small
Token ID1402
Feature activation-2.39e+00

Pos logit contributions
iver+0.177
'."+0.174
umpy+0.173
ater+0.172
.'"+0.171
Neg logit contributions
 the-0.144
 her-0.129
 their-0.129
 it-0.127
 his-0.126
Token elephant
Token ID20950
Feature activation-2.22e+00

Pos logit contributions
umpy+0.090
ute+0.090
opus+0.089
arl+0.089
icious+0.088
Neg logit contributions
 the-0.129
 her-0.125
 their-0.120
 his-0.119
 it-0.114
Token standing
Token ID5055
Feature activation+6.99e-01

Pos logit contributions
umpy+0.129
arl+0.127
+0.126
 promise+0.125
 rule+0.123
Neg logit contributions
 her-0.081
 the-0.075
 his-0.071
 it-0.066
 their-0.066
Token baby
Token ID5156
Feature activation+2.78e-01

Pos logit contributions
ater+0.059
asis+0.059
oc+0.058
ute+0.058
rr+0.058
Neg logit contributions
 her-0.061
 the-0.056
 his-0.053
 their-0.053
 it-0.049
Token was
Token ID373
Feature activation+1.76e+00

Pos logit contributions
 her+0.013
 their+0.011
 his+0.011
 the+0.011
 its+0.010
Neg logit contributions
 promise-0.012
oc-0.012
opus-0.012
-0.012
umpy-0.012
Token disappointed
Token ID11679
Feature activation+6.37e-01

Pos logit contributions
 her+0.077
led+0.072
 their+0.070
 him+0.068
 his+0.068
Neg logit contributions
 promise-0.070
 rule-0.070
umpy-0.069
 request-0.066
 story-0.064
Token,
Token ID11
Feature activation+1.97e+00

Pos logit contributions
 her+0.028
 their+0.024
 his+0.024
 the+0.024
led+0.023
Neg logit contributions
 promise-0.028
 rule-0.027
umpy-0.027
-0.027
ute-0.027
Token but
Token ID475
Feature activation+3.22e+00

Pos logit contributions
bled+0.088
led+0.086
 her+0.085
lled+0.082
ied+0.082
Neg logit contributions
 rule-0.068
 promise-0.067
 world-0.067
 hill-0.066
 farm-0.065
Token soon
Token ID2582
Feature activation+4.77e+00

Pos logit contributions
lled+0.118
bled+0.113
icked+0.108
ied+0.108
 theirs+0.102
Neg logit contributions
 promise-0.123
-0.117
 judge-0.115
 veterinarian-0.114
 hill-0.113
Token she
Token ID673
Feature activation+6.69e-01

Pos logit contributions
bled+0.183
ied+0.174
iddled+0.172
lled+0.166
alled+0.160
Neg logit contributions
-0.192
 team-0.166
 bird-0.163
 story-0.161
!"-0.158
Token saw
Token ID2497
Feature activation+2.98e+00

Pos logit contributions
 her+0.027
 their+0.024
 his+0.023
 the+0.023
 its+0.021
Neg logit contributions
 promise-0.034
oc-0.032
-0.032
opus-0.032
umpy-0.032
Token a
Token ID257
Feature activation-3.42e+00

Pos logit contributions
led+0.080
bled+0.079
lled+0.070
ened+0.068
ied+0.067
Neg logit contributions
 veterinarian-0.143
 farm-0.139
 house-0.135
 world-0.135
 fence-0.132
Token little
Token ID1310
Feature activation-1.71e+00

Pos logit contributions
'."+0.179
.'"+0.175
rr+0.175
asis+0.174
!'"+0.173
Neg logit contributions
 the-0.138
 her-0.134
 his-0.126
 it-0.124
 their-0.123
Token bird
Token ID6512
Feature activation-1.36e+00

Pos logit contributions
asis+0.058
ute+0.056
opus+0.056
umpy+0.056
rr+0.056
Neg logit contributions
 her-0.098
 the-0.094
 his-0.090
 their-0.089
 it-0.086
Token for
Token ID329
Feature activation+2.73e+00

Pos logit contributions
led+0.046
 her+0.044
bled+0.043
ed+0.037
ied+0.035
Neg logit contributions
 promise-0.119
 rule-0.117
 world-0.117
 farm-0.114
 story-0.114
Token a
Token ID257
Feature activation-1.31e+00

Pos logit contributions
led+0.087
ied+0.087
bled+0.084
 her+0.079
angled+0.073
Neg logit contributions
 story-0.139
oc-0.136
 rule-0.135
 tale-0.133
 promise-0.132
Token bit
Token ID1643
Feature activation+9.84e-02

Pos logit contributions
umpy+0.071
ney+0.069
ater+0.069
ute+0.068
affe+0.068
Neg logit contributions
 her-0.047
 the-0.045
 their-0.043
 his-0.042
 it-0.040
Token,
Token ID11
Feature activation+2.47e+00

Pos logit contributions
 her+0.004
 his+0.003
 their+0.003
 the+0.003
 its+0.003
Neg logit contributions
 promise-0.005
 rule-0.005
umpy-0.005
oc-0.005
opus-0.005
Token but
Token ID475
Feature activation+3.22e+00

Pos logit contributions
 her+0.108
led+0.104
ied+0.099
bled+0.097
icked+0.097
Neg logit contributions
 promise-0.096
 world-0.092
 rule-0.090
 farm-0.088
oc-0.086
Token soon
Token ID2582
Feature activation+4.77e+00

Pos logit contributions
ries+0.119
 her+0.117
led+0.107
iddled+0.105
icked+0.103
Neg logit contributions
 promise-0.149
 team-0.131
oc-0.129
 rule-0.128
 hill-0.124
Token he
Token ID339
Feature activation+1.29e+00

Pos logit contributions
bled+0.155
 her+0.154
iddled+0.152
ied+0.152
nam+0.143
Neg logit contributions
 team-0.211
 story-0.193
-0.189
oc-0.185
 world-0.185
Token was
Token ID373
Feature activation+1.65e+00

Pos logit contributions
 her+0.053
 their+0.046
 his+0.043
 the+0.042
 its+0.042
Neg logit contributions
 promise-0.062
opus-0.061
umpy-0.061
oc-0.060
-0.059
Token back
Token ID736
Feature activation+8.95e-01

Pos logit contributions
 her+0.064
led+0.059
 their+0.058
ted+0.055
 his+0.054
Neg logit contributions
 promise-0.076
umpy-0.072
 world-0.071
 rule-0.070
 team-0.068
Token home
Token ID1363
Feature activation+9.55e-01

Pos logit contributions
 her+0.030
 their+0.025
 his+0.025
 the+0.024
 its+0.023
Neg logit contributions
 promise-0.049
umpy-0.047
ute-0.047
 rule-0.046
 world-0.046
Token and
Token ID290
Feature activation+2.04e+00

Pos logit contributions
 her+0.045
 their+0.041
 his+0.040
 the+0.039
 its+0.038
Neg logit contributions
 promise-0.039
 rule-0.037
umpy-0.036
oc-0.036
ney-0.036
Token.
Token ID13
Feature activation+2.79e-01

Pos logit contributions
opus+0.065
oc+0.064
 promise+0.063
umpy+0.062
affe+0.062
Neg logit contributions
 her-0.053
 the-0.048
 their-0.044
 his-0.043
 it-0.042
Token She
Token ID1375
Feature activation+8.16e-01

Pos logit contributions
 her+0.012
 his+0.011
 their+0.011
 the+0.010
 its+0.010
Neg logit contributions
 promise-0.013
-0.013
 rule-0.013
umpy-0.013
opus-0.012
Token started
Token ID2067
Feature activation+1.87e+00

Pos logit contributions
 her+0.031
 their+0.027
 his+0.027
 the+0.026
 its+0.025
Neg logit contributions
 promise-0.042
oc-0.040
-0.040
 rule-0.039
opus-0.039
Token walking
Token ID6155
Feature activation+1.01e+00

Pos logit contributions
led+0.070
bled+0.067
 her+0.062
lled+0.059
 none+0.059
Neg logit contributions
 rule-0.082
 request-0.082
 swing-0.082
 farm-0.081
 hill-0.081
Token but
Token ID475
Feature activation+3.44e+00

Pos logit contributions
 her+0.038
 their+0.034
led+0.034
 his+0.033
 nothing+0.033
Neg logit contributions
 promise-0.046
 hill-0.045
 rule-0.045
-0.045
 world-0.044
Token soon
Token ID2582
Feature activation+4.76e+00

Pos logit contributions
 his+0.107
led+0.098
icked+0.097
 their+0.096
ped+0.093
Neg logit contributions
 promise-0.158
 judge-0.157
-0.150
 hill-0.149
 world-0.148
Token realized
Token ID6939
Feature activation+3.19e+00

Pos logit contributions
bled+0.192
ped+0.180
led+0.175
ied+0.175
 his+0.173
Neg logit contributions
-0.164
pillar-0.156
 judge-0.153
ari-0.149
 world-0.147
Token it
Token ID340
Feature activation+1.76e+00

Pos logit contributions
led+0.104
bled+0.089
lled+0.088
ped+0.082
ed+0.082
Neg logit contributions
 veterinarian-0.146
 swing-0.145
 world-0.143
 rule-0.142
 judge-0.142
Token was
Token ID373
Feature activation+2.30e+00

Pos logit contributions
 her+0.074
 their+0.072
 his+0.070
 its+0.067
bled+0.064
Neg logit contributions
umpy-0.074
 promise-0.072
 rule-0.072
 world-0.072
 story-0.070
Token farther
Token ID18485
Feature activation+1.49e+00

Pos logit contributions
led+0.101
ed+0.096
ged+0.092
ted+0.090
bled+0.089
Neg logit contributions
 rule-0.099
 promise-0.095
 world-0.090
 story-0.087
 request-0.085
Token away
Token ID1497
Feature activation+1.59e+00

Pos logit contributions
 her+0.074
bled+0.068
 his+0.064
 the+0.064
 their+0.064
Neg logit contributions
 promise-0.058
 rule-0.055
umpy-0.054
ater-0.053
ute-0.051
Token over
Token ID625
Feature activation+1.48e+00

Pos logit contributions
bled+0.130
ied+0.125
 theirs+0.122
 such+0.119
 their+0.116
Neg logit contributions
 swing-0.116
 hill-0.109
 slide-0.103
 fence-0.101
 river-0.097
Token the
Token ID262
Feature activation-2.40e+00

Pos logit contributions
 her+0.039
led+0.038
bled+0.036
 his+0.036
 their+0.035
Neg logit contributions
 world-0.081
 hill-0.080
 promise-0.079
-0.078
 rule-0.077
Token hills
Token ID18639
Feature activation+1.34e+00

Pos logit contributions
oc+0.118
umpy+0.109
 Amen+0.109
iver+0.109
ari+0.107
Neg logit contributions
 the-0.121
 her-0.117
 it-0.110
 their-0.097
 its-0.093
Token,
Token ID11
Feature activation+2.86e+00

Pos logit contributions
 her+0.062
 his+0.056
 their+0.056
led+0.054
 the+0.053
Neg logit contributions
 promise-0.051
 rule-0.050
umpy-0.050
 hill-0.050
ute-0.049
Token but
Token ID475
Feature activation+3.74e+00

Pos logit contributions
icked+0.118
 her+0.111
 their+0.111
bled+0.110
 his+0.110
Neg logit contributions
 hill-0.113
 swing-0.112
 fence-0.105
 farm-0.104
 slide-0.103
Token soon
Token ID2582
Feature activation+4.76e+00

Pos logit contributions
led+0.092
lled+0.092
 his+0.085
 their+0.084
icked+0.084
Neg logit contributions
 hill-0.184
 swing-0.178
 promise-0.172
 goal-0.167
-0.165
Token she
Token ID673
Feature activation+1.40e+00

Pos logit contributions
bled+0.181
led+0.178
ied+0.174
lled+0.169
ped+0.161
Neg logit contributions
-0.168
 swing-0.157
 house-0.152
 world-0.151
!"-0.148
Token found
Token ID1043
Feature activation+3.18e+00

Pos logit contributions
 her+0.060
 their+0.056
 his+0.053
 its+0.050
 the+0.049
Neg logit contributions
opus-0.066
umpy-0.064
pillar-0.062
 promise-0.062
affe-0.061
Token herself
Token ID5223
Feature activation+2.49e-01

Pos logit contributions
led+0.092
 their+0.078
lled+0.070
 his+0.069
bled+0.068
Neg logit contributions
 cage-0.145
 world-0.145
age-0.142
 fence-0.141
 veterinarian-0.140
Token lost
Token ID2626
Feature activation+9.67e-01

Pos logit contributions
 her+0.009
 his+0.008
 their+0.008
 the+0.008
 its+0.007
Neg logit contributions
 promise-0.013
 rule-0.013
umpy-0.013
-0.013
opus-0.013
Token in
Token ID287
Feature activation+2.13e+00

Pos logit contributions
 her+0.030
 their+0.028
led+0.026
 his+0.026
 the+0.026
Neg logit contributions
 promise-0.051
 rule-0.051
ater-0.049
 world-0.049
ute-0.049
Token walked
Token ID6807
Feature activation+1.26e+00

Pos logit contributions
 her+0.046
 their+0.042
 his+0.040
 its+0.039
 the+0.039
Neg logit contributions
 promise-0.058
-0.055
oc-0.055
 request-0.053
 rule-0.053
Token and
Token ID290
Feature activation+2.81e+00

Pos logit contributions
 her+0.047
 nothing+0.042
 their+0.042
 its+0.041
 how+0.041
Neg logit contributions
-0.055
 promise-0.055
ute-0.054
 hill-0.054
 rule-0.054
Token walked
Token ID6807
Feature activation+8.20e-01

Pos logit contributions
bled+0.113
 such+0.112
lled+0.111
 anything+0.102
 theirs+0.101
Neg logit contributions
 hill-0.113
-0.107
 swing-0.100
 slide-0.100
 road-0.099
Token,
Token ID11
Feature activation+2.23e+00

Pos logit contributions
 her+0.031
 their+0.027
 the+0.026
 his+0.026
 its+0.025
Neg logit contributions
 promise-0.039
-0.039
 rule-0.038
ute-0.037
oc-0.037
Token but
Token ID475
Feature activation+3.15e+00

Pos logit contributions
 her+0.101
led+0.096
 their+0.096
icked+0.095
 its+0.094
Neg logit contributions
 hill-0.082
 farm-0.077
 swing-0.077
 road-0.076
 promise-0.076
Token soon
Token ID2582
Feature activation+4.75e+00

Pos logit contributions
icked+0.103
led+0.102
lled+0.101
 her+0.098
ped+0.091
Neg logit contributions
 hill-0.148
 road-0.140
 promise-0.137
oc-0.135
 driver-0.132
Token he
Token ID339
Feature activation+1.62e+00

Pos logit contributions
led+0.196
bled+0.195
ied+0.188
lled+0.185
ped+0.185
Neg logit contributions
 team-0.155
-0.151
 river-0.149
 hill-0.146
 road-0.145
Token realized
Token ID6939
Feature activation+2.63e+00

Pos logit contributions
 her+0.072
 their+0.067
 its+0.062
 his+0.060
 how+0.059
Neg logit contributions
opus-0.073
umpy-0.072
 promise-0.069
-0.069
oc-0.067
Token he
Token ID339
Feature activation+4.20e-01

Pos logit contributions
led+0.075
lled+0.067
 her+0.066
bled+0.063
ed+0.062
Neg logit contributions
 hill-0.132
opus-0.127
 story-0.126
 world-0.126
 rule-0.125
Token didn
Token ID1422
Feature activation-7.11e-01

Pos logit contributions
 her+0.016
 their+0.014
 his+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.022
umpy-0.021
-0.021
opus-0.021
 rule-0.021
Token't
Token ID470
Feature activation-8.11e-01

Pos logit contributions
 promise+0.030
oc+0.029
ater+0.028
opus+0.028
affe+0.028
Neg logit contributions
 her-0.036
 the-0.033
 his-0.032
 their-0.032
 it-0.030
Token They
Token ID1119
Feature activation+2.53e+00

Pos logit contributions
 her+0.040
 his+0.036
 the+0.034
 their+0.034
led+0.034
Neg logit contributions
 promise-0.028
 rule-0.028
-0.028
umpy-0.028
oc-0.028
Token searched
Token ID16499
Feature activation+2.72e+00

Pos logit contributions
 her+0.104
bled+0.099
 its+0.099
 their+0.098
led+0.098
Neg logit contributions
 promise-0.111
 request-0.094
 judge-0.092
 rule-0.091
 team-0.089
Token and
Token ID290
Feature activation+3.34e+00

Pos logit contributions
bled+0.103
icked+0.096
ed+0.096
led+0.096
 her+0.095
Neg logit contributions
 hill-0.127
 promise-0.122
 rule-0.115
 world-0.113
 road-0.113
Token searched
Token ID16499
Feature activation+2.52e+00

Pos logit contributions
 such+0.136
bled+0.134
led+0.133
 her+0.132
ped+0.132
Neg logit contributions
 hill-0.122
 promise-0.116
 ramp-0.116
 road-0.108
-0.107
Token,
Token ID11
Feature activation+3.91e+00

Pos logit contributions
 her+0.110
bled+0.108
led+0.106
ed+0.104
icked+0.103
Neg logit contributions
 hill-0.098
 promise-0.094
 rule-0.088
 world-0.088
 farm-0.086
Token but
Token ID475
Feature activation+4.75e+00

Pos logit contributions
icked+0.226
led+0.216
 her+0.214
bled+0.208
ied+0.208
Neg logit contributions
 hill-0.087
 farm-0.087
 rule-0.084
 promise-0.082
 road-0.082
Token it
Token ID340
Feature activation+2.92e+00

Pos logit contributions
led+0.161
ped+0.156
med+0.152
ened+0.149
lled+0.145
Neg logit contributions
 hill-0.204
 promise-0.202
 fence-0.189
 house-0.189
 rule-0.185
Token was
Token ID373
Feature activation+2.26e+00

Pos logit contributions
 her+0.147
 its+0.137
 their+0.137
 such+0.134
 his+0.133
Neg logit contributions
 promise-0.099
ney-0.090
 hill-0.087
 rule-0.086
 request-0.081
Token gone
Token ID3750
Feature activation+3.91e-01

Pos logit contributions
ted+0.092
led+0.091
ed+0.090
 her+0.087
ged+0.084
Neg logit contributions
 promise-0.102
 rule-0.099
 shelf-0.089
 world-0.089
 hill-0.087
Token.
Token ID13
Feature activation+8.04e-01

Pos logit contributions
 her+0.019
 his+0.017
 their+0.017
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.016
-0.015
 rule-0.015
umpy-0.015
oc-0.015
Token They
Token ID1119
Feature activation+2.83e+00

Pos logit contributions
 her+0.041
 his+0.037
 the+0.035
 their+0.035
led+0.034
Neg logit contributions
 promise-0.030
-0.030
umpy-0.030
 rule-0.030
oc-0.029
Token under
Token ID739
Feature activation+2.52e+00

Pos logit contributions
 her+0.054
 nothing+0.047
 his+0.045
led+0.045
lled+0.045
Neg logit contributions
ute-0.090
 promise-0.088
 world-0.086
 house-0.086
-0.086
Token the
Token ID262
Feature activation-1.44e+00

Pos logit contributions
led+0.058
ied+0.053
bled+0.047
icked+0.046
ered+0.046
Neg logit contributions
 farm-0.151
 team-0.149
 hill-0.149
-0.149
 fence-0.147
Token umbrella
Token ID25510
Feature activation+7.48e-01

Pos logit contributions
ater+0.061
opus+0.060
oc+0.059
rr+0.057
 promise+0.057
Neg logit contributions
 her-0.079
 the-0.076
 their-0.070
 his-0.068
 it-0.065
Token,
Token ID11
Feature activation+2.60e+00

Pos logit contributions
 her+0.032
 his+0.029
 their+0.029
 the+0.028
 its+0.026
Neg logit contributions
 promise-0.033
oc-0.033
age-0.032
ute-0.032
 rule-0.032
Token but
Token ID475
Feature activation+3.40e+00

Pos logit contributions
 her+0.107
 his+0.100
 him+0.100
led+0.098
ied+0.096
Neg logit contributions
 swing-0.098
 farm-0.091
 promise-0.090
 slide-0.087
 hill-0.087
Token soon
Token ID2582
Feature activation+4.74e+00

Pos logit contributions
lled+0.099
led+0.097
icked+0.093
ped+0.087
ered+0.086
Neg logit contributions
 promise-0.172
 team-0.158
 shelf-0.157
 rule-0.154
 judge-0.153
Token they
Token ID484
Feature activation+1.75e+00

Pos logit contributions
ped+0.134
bled+0.133
led+0.129
years+0.127
ied+0.121
Neg logit contributions
 team-0.226
 story-0.203
 party-0.203
-0.199
 picnic-0.188
Token noticed
Token ID6810
Feature activation+2.75e+00

Pos logit contributions
 her+0.068
 his+0.061
 their+0.059
 its+0.056
 the+0.055
Neg logit contributions
 promise-0.083
 team-0.077
affe-0.074
ater-0.074
-0.074
Token that
Token ID326
Feature activation+2.59e+00

Pos logit contributions
 her+0.062
led+0.061
bled+0.056
lled+0.053
 his+0.052
Neg logit contributions
 farm-0.139
 world-0.139
 story-0.138
 rule-0.138
 veterinarian-0.136
Token their
Token ID511
Feature activation-7.90e-01

Pos logit contributions
led+0.058
bled+0.048
icked+0.045
lled+0.044
ped+0.044
Neg logit contributions
 team-0.154
 rule-0.153
 story-0.151
 promise-0.150
 world-0.150
Token feet
Token ID3625
Feature activation+1.27e+00

Pos logit contributions
oc+0.034
opus+0.034
+0.033
umpy+0.033
affe+0.033
Neg logit contributions
 her-0.039
 the-0.035
 their-0.035
 his-0.033
 it-0.032
Token,"
Token ID553
Feature activation+1.42e+00

Pos logit contributions
 promise+0.001
opus+0.001
 rule+0.001
umpy+0.001
oc+0.001
Neg logit contributions
 her-0.003
 the-0.003
 his-0.003
 their-0.003
 its-0.003
Token said
Token ID531
Feature activation+2.19e+00

Pos logit contributions
 her+0.057
 their+0.053
bled+0.052
 nothing+0.051
 his+0.051
Neg logit contributions
-0.066
oc-0.064
 promise-0.062
affe-0.059
 request-0.057
Token Lily
Token ID20037
Feature activation+7.93e-01

Pos logit contributions
lled+0.065
led+0.060
bled+0.060
ied+0.056
 her+0.052
Neg logit contributions
-0.119
oc-0.112
 promise-0.111
 judge-0.110
 veterinarian-0.108
Token.
Token ID13
Feature activation-3.57e-01

Pos logit contributions
 her+0.040
 their+0.037
 the+0.036
 his+0.036
 its+0.034
Neg logit contributions
 promise-0.030
oc-0.029
umpy-0.028
-0.028
age-0.027
Token She
Token ID1375
Feature activation+1.93e+00

Pos logit contributions
 promise+0.018
 rule+0.018
opus+0.018
umpy+0.018
ute+0.017
Neg logit contributions
 her-0.015
 the-0.013
 his-0.013
 their-0.013
 its-0.012
Token organized
Token ID8389
Feature activation+4.74e+00

Pos logit contributions
 her+0.078
 their+0.073
 his+0.068
 the+0.067
 such+0.066
Neg logit contributions
 promise-0.092
oc-0.090
 request-0.085
-0.084
ari-0.084
Token all
Token ID477
Feature activation+3.03e+00

Pos logit contributions
led+0.100
lled+0.078
ment+0.072
 happened+0.071
 colder+0.069
Neg logit contributions
 shelf-0.249
 store-0.215
 label-0.213
 shop-0.213
 switch-0.212
Token the
Token ID262
Feature activation+6.10e-01

Pos logit contributions
bled+0.092
 none+0.083
 nothing+0.081
 neither+0.062
 something+0.060
Neg logit contributions
 promise-0.159
oc-0.156
ney-0.151
 world-0.150
ny-0.149
Token toys
Token ID14958
Feature activation+3.31e+00

Pos logit contributions
 her+0.029
 their+0.026
 the+0.026
 his+0.025
 its+0.024
Neg logit contributions
 promise-0.026
 rule-0.025
oc-0.025
umpy-0.024
ater-0.024
Token in
Token ID287
Feature activation+1.97e+00

Pos logit contributions
 her+0.126
 his+0.116
 their+0.112
 its+0.103
 such+0.101
Neg logit contributions
 promise-0.144
 hill-0.138
ney-0.135
ute-0.129
 house-0.129
Token the
Token ID262
Feature activation-1.53e+00

Pos logit contributions
led+0.037
bled+0.025
ied+0.021
ed+0.021
 her+0.018
Neg logit contributions
 rule-0.136
oc-0.135
 promise-0.133
 shelf-0.131
 world-0.131
Token and
Token ID290
Feature activation+2.59e+00

Pos logit contributions
 her+0.054
led+0.052
bled+0.051
 their+0.050
lled+0.050
Neg logit contributions
 promise-0.060
oc-0.060
 world-0.059
 rule-0.059
 request-0.058
Token flew
Token ID13112
Feature activation+1.66e+00

Pos logit contributions
 her+0.082
 their+0.080
bled+0.076
ied+0.074
 theirs+0.072
Neg logit contributions
-0.126
oc-0.118
 hill-0.115
 promise-0.113
asis-0.108
Token away
Token ID1497
Feature activation+1.35e+00

Pos logit contributions
 her+0.058
 their+0.054
led+0.052
 how+0.051
bled+0.051
Neg logit contributions
-0.070
 promise-0.070
 world-0.070
age-0.068
 hill-0.068
Token,
Token ID11
Feature activation+1.46e+00

Pos logit contributions
 her+0.063
 their+0.057
 his+0.057
 him+0.055
 the+0.053
Neg logit contributions
 promise-0.051
 rule-0.049
umpy-0.048
-0.047
oc-0.047
Token but
Token ID475
Feature activation+3.24e+00

Pos logit contributions
 her+0.059
 their+0.053
 his+0.052
led+0.052
 the+0.049
Neg logit contributions
 promise-0.065
 rule-0.063
 world-0.062
 hill-0.061
-0.060
Token soon
Token ID2582
Feature activation+4.73e+00

Pos logit contributions
lled+0.074
led+0.070
ries+0.070
 their+0.069
icked+0.069
Neg logit contributions
 promise-0.173
 hill-0.159
-0.159
oc-0.158
 rule-0.157
Token returned
Token ID4504
Feature activation+1.55e+00

Pos logit contributions
bled+0.135
ied+0.129
led+0.122
ged+0.120
ped+0.120
Neg logit contributions
 story-0.213
 world-0.211
 team-0.210
-0.208
 house-0.198
Token with
Token ID351
Feature activation+2.89e+00

Pos logit contributions
 her+0.063
 their+0.060
led+0.059
lled+0.056
 his+0.056
Neg logit contributions
 promise-0.063
 world-0.061
-0.060
 rule-0.059
age-0.059
Token a
Token ID257
Feature activation-2.55e+00

Pos logit contributions
led+0.071
bled+0.066
ied+0.056
lled+0.053
ed+0.052
Neg logit contributions
 promise-0.157
 story-0.151
 world-0.149
 rule-0.149
 request-0.146
Token friend
Token ID1545
Feature activation-9.17e-01

Pos logit contributions
ater+0.130
umpy+0.129
rr+0.129
opus+0.125
agons+0.125
Neg logit contributions
 her-0.101
 the-0.101
 his-0.093
 it-0.091
 their-0.089
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
opus+0.045
 promise+0.045
 rule+0.045
umpy+0.045
ute+0.045
Neg logit contributions
 her-0.040
 the-0.036
 his-0.035
 their-0.034
 it-0.031
Token She
Token ID1375
Feature activation+8.33e-01

Pos logit contributions
 promise+0.015
opus+0.015
 rule+0.015
oc+0.014
umpy+0.014
Neg logit contributions
 her-0.011
 the-0.010
 his-0.010
 their-0.010
 its-0.009
Token followed
Token ID3940
Feature activation+1.89e+00

Pos logit contributions
 her+0.031
 their+0.028
 his+0.027
 the+0.026
 its+0.024
Neg logit contributions
 promise-0.043
oc-0.041
-0.041
opus-0.040
 request-0.040
Token the
Token ID262
Feature activation-1.11e+00

Pos logit contributions
led+0.030
bled+0.025
 none+0.021
icked+0.020
lled+0.019
Neg logit contributions
 promise-0.126
 request-0.125
-0.122
 rule-0.119
 story-0.118
Token singing
Token ID13777
Feature activation+1.31e+00

Pos logit contributions
oc+0.049
ute+0.048
ater+0.048
opus+0.047
+0.046
Neg logit contributions
 her-0.055
 the-0.051
 his-0.047
 it-0.045
 their-0.044
Token and
Token ID290
Feature activation+3.48e+00

Pos logit contributions
 her+0.050
 his+0.046
 their+0.046
 the+0.044
led+0.041
Neg logit contributions
 promise-0.059
age-0.059
ute-0.058
 rule-0.057
affe-0.057
Token soon
Token ID2582
Feature activation+4.73e+00

Pos logit contributions
 theirs+0.132
bled+0.132
ries+0.131
ied+0.130
icked+0.124
Neg logit contributions
 promise-0.126
 house-0.118
 request-0.117
 swing-0.115
 hill-0.113
Token she
Token ID673
Feature activation+1.20e+00

Pos logit contributions
nam+0.169
bled+0.168
ries+0.149
 theirs+0.148
ied+0.146
Neg logit contributions
pillar-0.175
 world-0.171
 group-0.168
 house-0.167
 story-0.167
Token came
Token ID1625
Feature activation+1.20e+00

Pos logit contributions
 her+0.049
 their+0.045
 his+0.044
 the+0.041
 its+0.040
Neg logit contributions
 promise-0.055
oc-0.055
opus-0.054
-0.054
umpy-0.053
Token upon
Token ID2402
Feature activation+1.19e+00

Pos logit contributions
 her+0.045
 their+0.041
 his+0.039
led+0.038
 the+0.038
Neg logit contributions
umpy-0.057
 promise-0.056
 world-0.056
ute-0.055
age-0.055
Token a
Token ID257
Feature activation-2.46e+00

Pos logit contributions
 her+0.026
 their+0.023
 his+0.023
led+0.022
 the+0.021
Neg logit contributions
 promise-0.073
 rule-0.072
-0.071
 world-0.070
 request-0.070
Token pond
Token ID16723
Feature activation-7.80e-01

Pos logit contributions
ater+0.122
rr+0.121
umpy+0.119
ute+0.118
affe+0.117
Neg logit contributions
 the-0.112
 her-0.105
 it-0.099
 his-0.094
 their-0.092
Token she
Token ID673
Feature activation+1.36e+00

Pos logit contributions
led+0.101
bled+0.091
lled+0.091
icked+0.084
ied+0.081
Neg logit contributions
 swing-0.135
 house-0.132
 judge-0.131
 promise-0.130
 veterinarian-0.127
Token screamed
Token ID25421
Feature activation+1.21e+00

Pos logit contributions
 her+0.052
 their+0.048
 his+0.045
 the+0.044
 its+0.043
Neg logit contributions
 promise-0.067
age-0.062
 request-0.061
oc-0.061
-0.061
Token for
Token ID329
Feature activation+2.91e+00

Pos logit contributions
 her+0.048
led+0.044
 their+0.044
 his+0.043
 the+0.042
Neg logit contributions
 promise-0.052
 rule-0.050
ute-0.050
 request-0.050
oc-0.048
Token help
Token ID1037
Feature activation+1.14e+00

Pos logit contributions
led+0.213
ed+0.194
ied+0.189
lled+0.187
ened+0.178
Neg logit contributions
 promise-0.057
 rule-0.050
 request-0.048
 test-0.042
 spell-0.039
Token.
Token ID13
Feature activation+8.27e-01

Pos logit contributions
led+0.052
 her+0.052
 their+0.051
 his+0.049
bled+0.048
Neg logit contributions
 promise-0.042
 rule-0.041
 request-0.040
age-0.039
oc-0.037
Token Soon
Token ID15894
Feature activation+4.73e+00

Pos logit contributions
 her+0.038
 their+0.036
 his+0.036
 the+0.035
led+0.034
Neg logit contributions
-0.032
 rule-0.032
 promise-0.031
 request-0.031
umpy-0.031
Token a
Token ID257
Feature activation-2.65e+00

Pos logit contributions
led+0.303
bled+0.279
lled+0.276
ped+0.268
ied+0.263
Neg logit contributions
 house-0.070
 story-0.068
 promise-0.060
 swing-0.059
 team-0.058
Token friendly
Token ID8030
Feature activation-8.31e-01

Pos logit contributions
ater+0.137
umpy+0.132
'."+0.131
.'"+0.130
!'"+0.130
Neg logit contributions
 her-0.108
 the-0.105
 his-0.100
 it-0.097
 their-0.094
Token polar
Token ID13559
Feature activation-5.77e-01

Pos logit contributions
+0.024
 promise+0.024
oc+0.024
umpy+0.024
ater+0.023
Neg logit contributions
 her-0.052
 the-0.048
 his-0.048
 their-0.047
 its-0.045
Token bear
Token ID6842
Feature activation+2.20e-02

Pos logit contributions
 promise+0.016
umpy+0.015
+0.015
 rule+0.015
ute+0.015
Neg logit contributions
 her-0.036
 his-0.034
 the-0.034
 their-0.033
 its-0.032
Token appeared
Token ID4120
Feature activation+1.03e+00

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.001
 rule-0.001
opus-0.001
oc-0.001
umpy-0.001
Token this
Token ID428
Feature activation-2.40e+00

Pos logit contributions
 her+0.009
 his+0.007
 their+0.007
 the+0.007
 its+0.006
Neg logit contributions
 promise-0.018
 rule-0.017
-0.017
opus-0.017
oc-0.017
Token is
Token ID318
Feature activation+4.43e-03

Pos logit contributions
ode+0.104
ney+0.104
opus+0.102
umpy+0.102
uer+0.101
Neg logit contributions
 her-0.117
 the-0.117
 their-0.110
 it-0.107
 his-0.102
Token a
Token ID257
Feature activation-4.76e+00

Pos logit contributions
 her+0.000
 his+0.000
 the+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.000
 rule-0.000
opus-0.000
umpy-0.000
oc-0.000
Token sculpture
Token ID26924
Feature activation-3.82e+00

Pos logit contributions
ode+0.237
ari+0.237
ney+0.233
iver+0.224
asis+0.224
Neg logit contributions
 her-0.201
 the-0.201
 their-0.200
 it-0.195
 his-0.188
Token of
Token ID286
Feature activation-3.85e-01

Pos logit contributions
iver+0.224
umpy+0.219
 promise+0.218
ini+0.218
 Driver+0.215
Neg logit contributions
 her-0.131
 the-0.126
 his-0.116
 their-0.113
 it-0.112
Token a
Token ID257
Feature activation-6.07e+00

Pos logit contributions
 promise+0.026
umpy+0.026
opus+0.026
oc+0.026
+0.026
Neg logit contributions
 her-0.009
 the-0.007
 their-0.007
 his-0.007
 its-0.006
Token cat
Token ID3797
Feature activation-3.45e+00

Pos logit contributions
iver+0.297
ari+0.263
asis+0.260
umpy+0.257
rig+0.251
Neg logit contributions
 it-0.299
 the-0.298
 their-0.278
 her-0.266
 its-0.265
Token.
Token ID13
Feature activation-2.69e+00

Pos logit contributions
oral+0.204
 promise+0.199
Two+0.197
ini+0.196
umpy+0.196
Neg logit contributions
 her-0.128
 the-0.119
 it-0.114
 their-0.110
 his-0.099
Token It
Token ID632
Feature activation-1.39e+00

Pos logit contributions
 promise+0.161
asis+0.159
ater+0.155
opus+0.152
ute+0.151
Neg logit contributions
 her-0.097
 the-0.090
 their-0.085
 his-0.083
 its-0.082
Token is
Token ID318
Feature activation+1.26e-01

Pos logit contributions
ute+0.058
opus+0.058
umpy+0.057
 promise+0.057
ater+0.056
Neg logit contributions
 her-0.069
 the-0.066
 their-0.063
 his-0.061
 it-0.060
Token white
Token ID2330
Feature activation-1.80e+00

Pos logit contributions
 her+0.004
 his+0.004
 the+0.004
 their+0.004
 its+0.003
Neg logit contributions
 promise-0.007
 rule-0.007
umpy-0.007
opus-0.007
-0.007
Token
Token ID129
Feature activation-2.38e+00

Pos logit contributions
 rule+0.147
 promise+0.147
 favor+0.145
reath+0.142
ari+0.141
Neg logit contributions
 her-0.129
 the-0.119
 his-0.114
 it-0.110
 their-0.107
Token
Token ID241
Feature activation-2.98e+00

Pos logit contributions
 promise+0.098
 rule+0.090
ari+0.087
 favor+0.087
opus+0.086
Neg logit contributions
 her-0.128
 the-0.119
 his-0.117
 its-0.107
 their-0.106
Tokenyou
Token ID5832
Feature activation-2.70e+00

Pos logit contributions
 promise+0.158
ari+0.156
opus+0.149
affe+0.149
asis+0.148
Neg logit contributions
 her-0.128
 the-0.118
 his-0.114
 it-0.112
 their-0.108
Token can
Token ID460
Feature activation-3.26e+00

Pos logit contributions
opus+0.125
oc+0.124
asis+0.118
ater+0.118
actus+0.115
Neg logit contributions
 her-0.128
 the-0.127
 it-0.118
 his-0.110
 their-0.108
Token see
Token ID766
Feature activation+1.24e-01

Pos logit contributions
oc+0.165
affe+0.156
ari+0.155
opus+0.150
umpy+0.149
Neg logit contributions
 her-0.165
 the-0.152
 it-0.145
 his-0.139
 their-0.133
Token a
Token ID257
Feature activation-5.95e+00

Pos logit contributions
 her+0.003
 his+0.002
 their+0.002
 the+0.002
 its+0.002
Neg logit contributions
 promise-0.008
 rule-0.008
opus-0.008
-0.008
oc-0.008
Token little
Token ID1310
Feature activation-4.07e+00

Pos logit contributions
iver+0.267
umpy+0.267
rr+0.253
age+0.252
ari+0.252
Neg logit contributions
 the-0.279
 it-0.270
 her-0.256
 his-0.246
 their-0.242
Token face
Token ID1986
Feature activation-2.41e+00

Pos logit contributions
ari+0.180
arl+0.172
ute+0.170
age+0.170
oral+0.169
Neg logit contributions
 the-0.203
 her-0.195
 it-0.190
 his-0.177
 their-0.171
Token in
Token ID287
Feature activation-1.34e-01

Pos logit contributions
 promise+0.126
oc+0.126
opus+0.125
 rule+0.122
affe+0.122
Neg logit contributions
 her-0.100
 the-0.089
 his-0.083
 their-0.082
 it-0.079
Token the
Token ID262
Feature activation-3.04e+00

Pos logit contributions
 promise+0.010
opus+0.010
umpy+0.010
oc+0.010
 rule+0.010
Neg logit contributions
 her-0.002
 the-0.002
 his-0.002
 their-0.002
 its-0.001
Token bark
Token ID21405
Feature activation-2.38e+00

Pos logit contributions
oc+0.125
uer+0.125
umpy+0.122
opus+0.122
affe+0.121
Neg logit contributions
 her-0.166
 the-0.160
 his-0.151
 their-0.141
 its-0.132
Token I
Token ID314
Feature activation-1.24e+00

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.003
 rule-0.003
opus-0.003
-0.003
oc-0.003
Token have
Token ID423
Feature activation+7.35e-01

Pos logit contributions
affe+0.066
umpy+0.064
ute+0.063
opus+0.063
ney+0.063
Neg logit contributions
 her-0.049
 the-0.047
 his-0.044
 their-0.042
 it-0.042
Token two
Token ID734
Feature activation-2.28e+00

Pos logit contributions
 her+0.017
 their+0.013
 his+0.013
 the+0.012
led+0.011
Neg logit contributions
-0.047
 promise-0.047
 rule-0.046
opus-0.046
ater-0.046
Token cards
Token ID4116
Feature activation-2.82e+00

Pos logit contributions
ute+0.107
ney+0.105
oc+0.104
opus+0.103
umpy+0.102
Neg logit contributions
 her-0.108
 the-0.100
 his-0.094
 their-0.093
 it-0.089
Token with
Token ID351
Feature activation-4.37e-01

Pos logit contributions
oc+0.115
umpy+0.115
opus+0.115
asis+0.111
ute+0.111
Neg logit contributions
 her-0.148
 the-0.137
 their-0.131
 his-0.129
 it-0.127
Token a
Token ID257
Feature activation-5.85e+00

Pos logit contributions
 promise+0.024
umpy+0.024
opus+0.023
oc+0.023
 rule+0.023
Neg logit contributions
 her-0.016
 the-0.014
 their-0.014
 his-0.013
 its-0.012
Token dog
Token ID3290
Feature activation-3.86e+00

Pos logit contributions
iver+0.275
umpy+0.256
asis+0.246
ney+0.242
ini+0.242
Neg logit contributions
 her-0.291
 the-0.285
 their-0.283
 it-0.276
 his-0.267
Token!"
Token ID2474
Feature activation+1.51e+00

Pos logit contributions
oral+0.157
 promise+0.154
umpy+0.151
ini+0.148
reath+0.147
Neg logit contributions
 her-0.216
 the-0.207
 it-0.192
 their-0.191
 his-0.189
Token Ben
Token ID3932
Feature activation-1.44e-01

Pos logit contributions
bled+0.039
lled+0.039
 her+0.039
led+0.038
ied+0.034
Neg logit contributions
-0.085
 promise-0.081
 rule-0.080
 request-0.079
 judge-0.078
Token says
Token ID1139
Feature activation+1.63e+00

Pos logit contributions
 promise+0.006
opus+0.006
 rule+0.006
umpy+0.006
oc+0.006
Neg logit contributions
 her-0.007
 the-0.006
 his-0.006
 their-0.006
 its-0.005
Token.
Token ID13
Feature activation+4.13e-01

Pos logit contributions
 her+0.087
lled+0.080
 their+0.077
 his+0.077
 the+0.077
Neg logit contributions
 promise-0.049
 request-0.045
 rule-0.044
-0.043
 team-0.043
Token
Token ID129
Feature activation-2.56e+00

Pos logit contributions
 promise+0.177
 rule+0.177
 favor+0.173
reath+0.168
ode+0.167
Neg logit contributions
 her-0.118
 the-0.113
 it-0.104
 his-0.104
 their-0.095
Token
Token ID241
Feature activation-3.04e+00

Pos logit contributions
 promise+0.100
ode+0.093
 rule+0.090
opus+0.089
 difference+0.089
Neg logit contributions
 her-0.137
 the-0.129
 his-0.125
 their-0.116
 its-0.115
TokenI
Token ID40
Feature activation-2.45e+00

Pos logit contributions
 promise+0.162
ari+0.158
opus+0.154
ode+0.152
asis+0.152
Neg logit contributions
 her-0.128
 the-0.119
 it-0.115
 his-0.115
 their-0.111
Token am
Token ID716
Feature activation-4.52e-01

Pos logit contributions
opus+0.100
affe+0.095
ari+0.089
asis+0.089
 farm+0.088
Neg logit contributions
 her-0.133
 the-0.124
 his-0.123
 it-0.119
 their-0.117
Token baking
Token ID16871
Feature activation-5.94e-01

Pos logit contributions
oc+0.026
 promise+0.026
opus+0.026
umpy+0.025
+0.025
Neg logit contributions
 her-0.015
 the-0.013
 his-0.013
 their-0.013
 it-0.011
Token a
Token ID257
Feature activation-5.80e+00

Pos logit contributions
 promise+0.036
umpy+0.035
opus+0.035
 rule+0.034
oc+0.034
Neg logit contributions
 her-0.019
 the-0.016
 his-0.016
 their-0.016
 it-0.014
Token delicious
Token ID12625
Feature activation-4.41e+00

Pos logit contributions
umpy+0.314
rr+0.299
iver+0.298
arl+0.295
ari+0.288
Neg logit contributions
 the-0.238
 her-0.235
 it-0.222
 their-0.205
 his-0.191
Token cake
Token ID12187
Feature activation-3.16e+00

Pos logit contributions
umpy+0.232
arl+0.216
ari+0.212
rr+0.211
 Driver+0.204
Neg logit contributions
 the-0.205
 her-0.204
 it-0.195
 his-0.172
 their-0.158
Token.
Token ID13
Feature activation-3.69e+00

Pos logit contributions
umpy+0.142
 promise+0.134
Two+0.131
opus+0.129
arl+0.128
Neg logit contributions
 her-0.160
 the-0.146
 his-0.139
 it-0.137
 their-0.136
Tokenâ
Token ID22940
Feature activation-2.08e+00

Pos logit contributions
 promise+0.205
ater+0.198
asis+0.195
 rule+0.192
an+0.192
Neg logit contributions
 her-0.150
 the-0.142
 his-0.134
 it-0.132
 its-0.118
Token
Token ID26391
Feature activation-2.55e+00

Pos logit contributions
 team+0.079
oc+0.079
opus+0.079
 rule+0.078
affe+0.077
Neg logit contributions
 her-0.104
 the-0.098
 his-0.092
 their-0.084
 him-0.084
Token flag
Token ID6056
Feature activation-3.21e+00

Pos logit contributions
iver+0.240
umpy+0.237
rig+0.226
asis+0.220
ney+0.216
Neg logit contributions
 it-0.275
 her-0.274
 the-0.271
 their-0.262
 his-0.249
Token,
Token ID11
Feature activation-5.27e-01

Pos logit contributions
opus+0.189
 promise+0.186
umpy+0.185
oral+0.182
Two+0.182
Neg logit contributions
 her-0.114
 the-0.110
 his-0.097
 their-0.096
 it-0.092
Token a
Token ID257
Feature activation-5.73e+00

Pos logit contributions
 promise+0.029
umpy+0.029
opus+0.029
oc+0.028
+0.028
Neg logit contributions
 her-0.019
 the-0.017
 their-0.017
 his-0.016
 its-0.015
Token map
Token ID3975
Feature activation-3.66e+00

Pos logit contributions
umpy+0.252
iver+0.249
ater+0.227
rig+0.224
ini+0.220
Neg logit contributions
 her-0.297
 the-0.292
 it-0.284
 their-0.283
 his-0.274
Token,
Token ID11
Feature activation-4.18e-01

Pos logit contributions
 promise+0.215
opus+0.215
ute+0.214
iver+0.214
ini+0.213
Neg logit contributions
 her-0.122
 the-0.122
 it-0.102
 his-0.101
 their-0.101
Token a
Token ID257
Feature activation-5.74e+00

Pos logit contributions
 promise+0.022
opus+0.022
umpy+0.022
oc+0.022
+0.022
Neg logit contributions
 her-0.016
 the-0.014
 their-0.014
 his-0.014
 its-0.012
Token landmark
Token ID20533
Feature activation-1.77e+00

Pos logit contributions
umpy+0.255
iver+0.249
ater+0.225
ivering+0.222
ini+0.218
Neg logit contributions
 her-0.302
 the-0.294
 their-0.285
 it-0.282
 his-0.279
Token,
Token ID11
Feature activation-6.52e-01

Pos logit contributions
opus+0.102
oc+0.101
umpy+0.100
 promise+0.100
+0.098
Neg logit contributions
 her-0.060
 the-0.056
 their-0.052
 his-0.050
 it-0.047
Token or
Token ID393
Feature activation-2.88e-01

Pos logit contributions
opus+0.035
 promise+0.035
umpy+0.035
oc+0.034
+0.034
Neg logit contributions
 her-0.025
 the-0.022
 their-0.022
 his-0.021
 its-0.020
Token a
Token ID257
Feature activation-5.63e+00

Pos logit contributions
 promise+0.020
opus+0.020
umpy+0.020
oc+0.020
+0.020
Neg logit contributions
 her-0.006
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token symbol
Token ID6194
Feature activation-2.88e+00

Pos logit contributions
umpy+0.235
iver+0.235
ivering+0.208
ini+0.207
ater+0.206
Neg logit contributions
 her-0.301
 the-0.295
 their-0.285
 it-0.284
 his-0.278
Token,
Token ID11
Feature activation-1.10e+00

Pos logit contributions
opus+0.061
oc+0.060
 promise+0.060
affe+0.059
umpy+0.059
Neg logit contributions
 her-0.043
 the-0.038
 their-0.037
 his-0.037
 it-0.033
Token like
Token ID588
Feature activation-1.15e+00

Pos logit contributions
opus+0.058
umpy+0.057
+0.056
oc+0.056
asis+0.056
Neg logit contributions
 her-0.045
 the-0.040
 their-0.039
 his-0.039
 it-0.036
Token a
Token ID257
Feature activation-5.51e+00

Pos logit contributions
umpy+0.068
 promise+0.067
opus+0.066
oc+0.066
asis+0.066
Neg logit contributions
 her-0.042
 the-0.037
 their-0.035
 his-0.035
 it-0.032
Token flag
Token ID6056
Feature activation-3.21e+00

Pos logit contributions
iver+0.240
umpy+0.237
rig+0.226
asis+0.220
ney+0.216
Neg logit contributions
 it-0.275
 her-0.274
 the-0.271
 their-0.262
 his-0.249
Token,
Token ID11
Feature activation-5.27e-01

Pos logit contributions
opus+0.189
 promise+0.186
umpy+0.185
oral+0.182
Two+0.182
Neg logit contributions
 her-0.114
 the-0.110
 his-0.097
 their-0.096
 it-0.092
Token a
Token ID257
Feature activation-5.73e+00

Pos logit contributions
 promise+0.029
umpy+0.029
opus+0.029
oc+0.028
+0.028
Neg logit contributions
 her-0.019
 the-0.017
 their-0.017
 his-0.016
 its-0.015
Token map
Token ID3975
Feature activation-3.66e+00

Pos logit contributions
umpy+0.252
iver+0.249
ater+0.227
rig+0.224
ini+0.220
Neg logit contributions
 her-0.297
 the-0.292
 it-0.284
 their-0.283
 his-0.274
Token,
Token ID11
Feature activation-4.18e-01

Pos logit contributions
 promise+0.215
opus+0.215
ute+0.214
iver+0.214
ini+0.213
Neg logit contributions
 her-0.122
 the-0.122
 it-0.102
 his-0.101
 their-0.101
Token a
Token ID257
Feature activation-5.74e+00

Pos logit contributions
 promise+0.022
opus+0.022
umpy+0.022
oc+0.022
+0.022
Neg logit contributions
 her-0.016
 the-0.014
 their-0.014
 his-0.014
 its-0.012
Token landmark
Token ID20533
Feature activation-1.77e+00

Pos logit contributions
umpy+0.255
iver+0.249
ater+0.225
ivering+0.222
ini+0.218
Neg logit contributions
 her-0.302
 the-0.294
 their-0.285
 it-0.282
 his-0.279
Token,
Token ID11
Feature activation-6.52e-01

Pos logit contributions
opus+0.102
oc+0.101
umpy+0.100
 promise+0.100
+0.098
Neg logit contributions
 her-0.060
 the-0.056
 their-0.052
 his-0.050
 it-0.047
Token and
Token ID290
Feature activation-2.17e-01

Pos logit contributions
umpy+0.058
 promise+0.057
opus+0.056
oc+0.056
ater+0.056
Neg logit contributions
 her-0.039
 the-0.036
 his-0.035
 their-0.034
 it-0.032
Token a
Token ID257
Feature activation-5.60e+00

Pos logit contributions
 promise+0.014
umpy+0.014
opus+0.014
 rule+0.014
oc+0.014
Neg logit contributions
 her-0.006
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token monkey
Token ID21657
Feature activation-3.83e+00

Pos logit contributions
iver+0.261
umpy+0.246
ater+0.236
ivering+0.228
inning+0.224
Neg logit contributions
 the-0.290
 his-0.270
 their-0.266
 her-0.265
 it-0.263
Token,
Token ID11
Feature activation-1.83e+00

Pos logit contributions
oral+0.158
ini+0.151
arl+0.151
icious+0.150
 promise+0.150
Neg logit contributions
 the-0.201
 her-0.199
 his-0.195
 it-0.187
 their-0.182
Token and
Token ID290
Feature activation-2.24e-01

Pos logit contributions
umpy+0.113
 promise+0.110
opus+0.110
ater+0.110
oc+0.109
Neg logit contributions
 her-0.063
 the-0.058
 his-0.056
 their-0.052
 it-0.049
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 promise+0.015
umpy+0.015
opus+0.015
 rule+0.015
oc+0.015
Neg logit contributions
 her-0.005
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token gir
Token ID37370
Feature activation-3.12e+00

Pos logit contributions
iver+0.264
umpy+0.247
ater+0.240
ivering+0.228
inning+0.226
Neg logit contributions
 the-0.296
 his-0.279
 her-0.277
 their-0.271
 it-0.264
Tokenaffe
Token ID21223
Feature activation-3.75e+00

Pos logit contributions
 promise+0.080
ograph+0.074
iver+0.072
+0.071
ute+0.071
Neg logit contributions
 her-0.190
 the-0.188
 his-0.188
 its-0.177
 him-0.176
Token,
Token ID11
Feature activation-1.99e+00

Pos logit contributions
oral+0.161
icious+0.155
arl+0.154
iver+0.153
 promise+0.153
Neg logit contributions
 her-0.198
 the-0.198
 his-0.192
 it-0.182
 their-0.177
Token and
Token ID290
Feature activation-4.79e-01

Pos logit contributions
umpy+0.122
opus+0.120
 promise+0.119
ater+0.118
asis+0.117
Neg logit contributions
 her-0.068
 the-0.062
 his-0.059
 their-0.056
 it-0.052
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 promise+0.034
umpy+0.034
opus+0.034
 rule+0.033
ute+0.033
Neg logit contributions
 her-0.010
 the-0.007
 his-0.007
 their-0.007
 its-0.006
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 promise+0.015
umpy+0.015
opus+0.015
 rule+0.015
oc+0.015
Neg logit contributions
 her-0.005
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token gir
Token ID37370
Feature activation-3.12e+00

Pos logit contributions
iver+0.264
umpy+0.247
ater+0.240
ivering+0.228
inning+0.226
Neg logit contributions
 the-0.296
 his-0.279
 her-0.277
 their-0.271
 it-0.264
Tokenaffe
Token ID21223
Feature activation-3.75e+00

Pos logit contributions
 promise+0.080
ograph+0.074
iver+0.072
+0.071
ute+0.071
Neg logit contributions
 her-0.190
 the-0.188
 his-0.188
 its-0.177
 him-0.176
Token,
Token ID11
Feature activation-1.99e+00

Pos logit contributions
oral+0.161
icious+0.155
arl+0.154
iver+0.153
 promise+0.153
Neg logit contributions
 her-0.198
 the-0.198
 his-0.192
 it-0.182
 their-0.177
Token and
Token ID290
Feature activation-4.79e-01

Pos logit contributions
umpy+0.122
opus+0.120
 promise+0.119
ater+0.118
asis+0.117
Neg logit contributions
 her-0.068
 the-0.062
 his-0.059
 their-0.056
 it-0.052
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 promise+0.034
umpy+0.034
opus+0.034
 rule+0.033
ute+0.033
Neg logit contributions
 her-0.010
 the-0.007
 his-0.007
 their-0.007
 its-0.006
Token snake
Token ID17522
Feature activation-4.51e+00

Pos logit contributions
iver+0.264
umpy+0.247
ater+0.239
ivering+0.228
inning+0.226
Neg logit contributions
 the-0.296
 her-0.281
 his-0.278
 their-0.271
 it-0.261
Token!"
Token ID2474
Feature activation+3.63e-01

Pos logit contributions
oral+0.206
iver+0.205
icious+0.204
ini+0.202
umpy+0.202
Neg logit contributions
 her-0.211
 the-0.208
 his-0.201
 their-0.196
 it-0.189
Token He
Token ID679
Feature activation-2.95e-01

Pos logit contributions
 her+0.013
 their+0.011
 the+0.011
 his+0.011
 its+0.010
Neg logit contributions
 promise-0.019
-0.019
 rule-0.019
oc-0.018
umpy-0.018
Token points
Token ID2173
Feature activation+2.50e+00

Pos logit contributions
 promise+0.012
umpy+0.012
opus+0.012
 rule+0.012
oc+0.012
Neg logit contributions
 her-0.014
 the-0.013
 his-0.013
 their-0.013
 its-0.012
Token to
Token ID284
Feature activation+1.17e+00

Pos logit contributions
bled+0.089
led+0.080
lled+0.076
 her+0.072
ered+0.066
Neg logit contributions
 goal-0.117
 name-0.115
 request-0.112
 team-0.111
 promise-0.110
Token a
Token ID257
Feature activation-4.62e+00

Pos logit contributions
 her+0.007
 his+0.006
 their+0.006
 the+0.006
 its+0.005
Neg logit contributions
 promise-0.013
 rule-0.012
umpy-0.012
-0.012
opus-0.012
Token tiny
Token ID7009
Feature activation-3.69e+00

Pos logit contributions
umpy+0.259
iver+0.258
ides+0.254
rr+0.252
'."+0.250
Neg logit contributions
 the-0.180
 her-0.177
 their-0.165
 his-0.163
 it-0.161
Token box
Token ID3091
Feature activation-2.15e+00

Pos logit contributions
icious+0.160
asis+0.156
ari+0.156
age+0.153
iver+0.152
Neg logit contributions
 her-0.186
 the-0.181
 his-0.170
 it-0.165
 their-0.163
Token shaped
Token ID14292
Feature activation-3.04e-01

Pos logit contributions
 promise+0.130
opus+0.128
oc+0.128
umpy+0.127
+0.127
Neg logit contributions
 her-0.070
 the-0.062
 his-0.059
 their-0.056
 it-0.053
Token like
Token ID588
Feature activation+7.46e-01

Pos logit contributions
 promise+0.017
opus+0.016
+0.016
umpy+0.016
oc+0.016
Neg logit contributions
 her-0.011
 the-0.009
 his-0.009
 their-0.009
 its-0.008
Token a
Token ID257
Feature activation-5.70e+00

Pos logit contributions
 her+0.020
 their+0.017
 his+0.016
 the+0.016
led+0.014
Neg logit contributions
 promise-0.044
 rule-0.044
opus-0.043
-0.043
oc-0.043
Token heart
Token ID2612
Feature activation-2.04e+00

Pos logit contributions
umpy+0.280
iver+0.270
ater+0.247
rr+0.245
ari+0.244
Neg logit contributions
 her-0.279
 it-0.274
 the-0.270
 his-0.256
 their-0.256
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
 promise+0.110
opus+0.110
oc+0.108
+0.106
 rule+0.106
Neg logit contributions
 her-0.080
 the-0.073
 his-0.067
 their-0.066
 it-0.064
Token (space)
Token ID220
Feature activation-6.83e-01

Pos logit contributions
 promise+0.103
ater+0.099
asis+0.098
 rule+0.098
opus+0.097
Neg logit contributions
 her-0.067
 the-0.060
 his-0.059
 their-0.056
 it-0.053
Token (newline)
Token ID198
Feature activation-1.89e+00

Pos logit contributions
 promise+0.031
opus+0.030
 rule+0.030
oc+0.030
ater+0.030
Neg logit contributions
 her-0.032
 the-0.028
 his-0.028
 their-0.028
 it-0.026
Token (newline)
Token ID198
Feature activation-2.35e+00

Pos logit contributions
 promise+0.107
asis+0.103
opus+0.103
 rule+0.102
ater+0.102
Neg logit contributions
 her-0.076
 the-0.068
 his-0.067
 their-0.063
 it-0.061
Token,
Token ID11
Feature activation+9.78e-01

Pos logit contributions
 her+0.052
 his+0.047
 their+0.046
led+0.045
 the+0.045
Neg logit contributions
opus-0.049
-0.048
 promise-0.048
umpy-0.047
rr-0.047
Token there
Token ID612
Feature activation+2.12e+00

Pos logit contributions
 her+0.026
 his+0.021
 their+0.020
 the+0.019
 him+0.019
Neg logit contributions
-0.059
ute-0.059
umpy-0.058
 promise-0.058
 rule-0.057
Token was
Token ID373
Feature activation+1.19e+00

Pos logit contributions
 her+0.095
led+0.093
bled+0.087
 able+0.085
 their+0.085
Neg logit contributions
 world-0.078
oc-0.078
 story-0.076
-0.074
 promise-0.073
Token an
Token ID281
Feature activation-1.46e+00

Pos logit contributions
 her+0.026
 his+0.021
led+0.021
 their+0.020
 the+0.020
Neg logit contributions
-0.073
umpy-0.071
ute-0.070
opus-0.070
 world-0.070
Token old
Token ID1468
Feature activation-3.00e+00

Pos logit contributions
+0.072
 promise+0.068
oc+0.067
!'"+0.067
age+0.067
Neg logit contributions
 her-0.065
 the-0.059
 his-0.058
 their-0.056
 its-0.051
Token wooden
Token ID13510
Feature activation-5.70e+00

Pos logit contributions
icious+0.131
 Deal+0.125
umpy+0.124
rr+0.123
rig+0.123
Neg logit contributions
 her-0.166
 the-0.154
 his-0.150
 their-0.139
 it-0.138
Token box
Token ID3091
Feature activation-1.42e+00

Pos logit contributions
iver+0.254
umpy+0.244
icious+0.243
!'"+0.223
rig+0.223
Neg logit contributions
 her-0.282
 the-0.276
 his-0.275
 their-0.247
 all-0.244
Token that
Token ID326
Feature activation+2.12e+00

Pos logit contributions
 promise+0.078
opus+0.076
oc+0.076
+0.076
umpy+0.075
Neg logit contributions
 her-0.054
 the-0.049
 his-0.046
 their-0.045
 it-0.041
Token no
Token ID645
Feature activation-7.92e-01

Pos logit contributions
led+0.090
 her+0.089
lled+0.088
ged+0.088
 their+0.085
Neg logit contributions
 promise-0.082
 world-0.079
 rule-0.078
ney-0.078
 hill-0.076
Token one
Token ID530
Feature activation-8.45e-03

Pos logit contributions
+0.038
umpy+0.038
opus+0.037
 promise+0.037
asis+0.037
Neg logit contributions
 her-0.036
 his-0.032
 the-0.032
 their-0.031
 it-0.029
Token could
Token ID714
Feature activation-7.42e-01

Pos logit contributions
 promise+0.000
 rule+0.000
opus+0.000
umpy+0.000
oc+0.000
Neg logit contributions
 her-0.000
 his-0.000
 the-0.000
 their-0.000
 its-0.000
Token "
Token ID366
Feature activation-2.38e+00

Pos logit contributions
oc+0.080
 Driver+0.078
ney+0.076
age+0.074
umpy+0.074
Neg logit contributions
 her-0.088
 the-0.083
 it-0.075
 its-0.069
 their-0.067
TokenI
Token ID40
Feature activation-2.30e+00

Pos logit contributions
ari+0.115
opus+0.114
 promise+0.113
asis+0.113
ute+0.112
Neg logit contributions
 her-0.106
 his-0.093
 the-0.092
 their-0.087
 it-0.086
Token was
Token ID373
Feature activation-6.18e-01

Pos logit contributions
opus+0.097
affe+0.092
ari+0.091
ute+0.088
asis+0.088
Neg logit contributions
 her-0.123
 the-0.113
 his-0.112
 their-0.110
 it-0.108
Token born
Token ID4642
Feature activation-1.49e+00

Pos logit contributions
 promise+0.035
oc+0.035
opus+0.035
umpy+0.035
affe+0.035
Neg logit contributions
 her-0.021
 the-0.018
 his-0.018
 their-0.018
 it-0.016
Token as
Token ID355
Feature activation-5.03e-01

Pos logit contributions
 promise+0.085
opus+0.083
umpy+0.082
oc+0.082
 rule+0.082
Neg logit contributions
 her-0.054
 the-0.050
 his-0.046
 their-0.044
 it-0.042
Token a
Token ID257
Feature activation-5.69e+00

Pos logit contributions
 promise+0.027
oc+0.026
 rule+0.026
opus+0.026
umpy+0.026
Neg logit contributions
 her-0.019
 the-0.017
 his-0.017
 their-0.016
 it-0.015
Token cater
Token ID20825
Feature activation-2.65e+00

Pos logit contributions
iver+0.296
umpy+0.278
ney+0.264
ater+0.262
ulture+0.258
Neg logit contributions
 the-0.241
 her-0.237
 it-0.234
 his-0.218
 their-0.211
Tokenpillar
Token ID41643
Feature activation-3.09e+00

Pos logit contributions
 promise+0.079
 rule+0.071
 Driver+0.069
ney+0.067
 request+0.066
Neg logit contributions
 her-0.169
 the-0.156
led-0.150
ted-0.149
 it-0.148
Token.
Token ID13
Feature activation-3.25e+00

Pos logit contributions
 promise+0.160
 rule+0.158
umpy+0.157
ute+0.153
ney+0.153
Neg logit contributions
 her-0.129
 the-0.116
 it-0.106
 his-0.104
 its-0.100
Token A
Token ID317
Feature activation-3.81e+00

Pos logit contributions
 promise+0.170
asis+0.168
ater+0.167
 rule+0.160
actus+0.159
Neg logit contributions
 her-0.137
 the-0.131
 his-0.123
 it-0.120
 its-0.116
Token cater
Token ID20825
Feature activation-2.34e+00

Pos logit contributions
ari+0.196
iver+0.195
ney+0.191
umpy+0.190
 Driver+0.189
Neg logit contributions
 her-0.163
 the-0.159
 it-0.146
 its-0.132
 his-0.129
Token?
Token ID30
Feature activation-1.54e+00

Pos logit contributions
iver+0.128
umpy+0.126
ute+0.125
ater+0.122
opus+0.121
Neg logit contributions
 the-0.094
 her-0.092
 their-0.088
 his-0.081
 it-0.080
Token It
Token ID632
Feature activation-5.28e-01

Pos logit contributions
 promise+0.074
umpy+0.070
ater+0.070
asis+0.070
 rule+0.070
Neg logit contributions
 her-0.071
 the-0.066
 their-0.064
 his-0.063
 it-0.062
Token looks
Token ID3073
Feature activation+5.49e-01

Pos logit contributions
 promise+0.019
opus+0.019
umpy+0.019
ute+0.019
ater+0.019
Neg logit contributions
 her-0.029
 the-0.027
 their-0.026
 his-0.026
 it-0.024
Token more
Token ID517
Feature activation-1.20e+00

Pos logit contributions
 her+0.024
 his+0.021
 their+0.021
 the+0.021
 its+0.020
Neg logit contributions
 promise-0.025
 rule-0.024
-0.024
umpy-0.023
oc-0.023
Token like
Token ID588
Feature activation+1.65e-01

Pos logit contributions
opus+0.064
 promise+0.062
oc+0.061
affe+0.061
age+0.061
Neg logit contributions
 her-0.047
 the-0.044
 their-0.043
 his-0.043
 it-0.040
Token a
Token ID257
Feature activation-5.68e+00

Pos logit contributions
 her+0.004
 his+0.003
 their+0.003
 the+0.003
 its+0.003
Neg logit contributions
 promise-0.011
 rule-0.011
-0.011
opus-0.011
umpy-0.011
Token mess
Token ID2085
Feature activation-1.15e+00

Pos logit contributions
iver+0.270
ater+0.254
umpy+0.247
ute+0.236
int+0.234
Neg logit contributions
 it-0.268
 the-0.267
 their-0.262
 her-0.245
 they-0.245
Token.
Token ID13
Feature activation-1.48e+00

Pos logit contributions
 promise+0.053
oc+0.051
opus+0.051
ute+0.051
 rule+0.050
Neg logit contributions
 her-0.055
 the-0.050
 their-0.048
 his-0.048
 it-0.046
Token And
Token ID843
Feature activation+2.34e-02

Pos logit contributions
 promise+0.070
opus+0.070
asis+0.068
 rule+0.066
ater+0.065
Neg logit contributions
 her-0.066
 the-0.061
 their-0.060
 his-0.057
 it-0.057
Token some
Token ID617
Feature activation-1.63e+00

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.001
 rule-0.001
opus-0.001
oc-0.001
-0.001
Token of
Token ID286
Feature activation+5.76e-01

Pos logit contributions
opus+0.081
ute+0.080
umpy+0.079
+0.078
ater+0.078
Neg logit contributions
 her-0.070
 the-0.065
 their-0.064
 his-0.063
 it-0.058
Token has
Token ID468
Feature activation+4.65e-01

Pos logit contributions
 promise+0.047
ute+0.047
umpy+0.047
ater+0.047
opus+0.046
Neg logit contributions
 her-0.062
 the-0.057
 his-0.056
 their-0.056
 it-0.052
Token a
Token ID257
Feature activation-4.84e+00

Pos logit contributions
 her+0.014
 their+0.012
 his+0.012
 the+0.012
 its+0.010
Neg logit contributions
 promise-0.026
 rule-0.026
oc-0.026
umpy-0.026
opus-0.026
Token long
Token ID890
Feature activation-2.79e+00

Pos logit contributions
iver+0.228
umpy+0.220
ater+0.218
asis+0.210
inning+0.209
Neg logit contributions
 her-0.227
 his-0.217
 it-0.213
 the-0.212
 its-0.204
Token nose
Token ID9686
Feature activation-2.19e+00

Pos logit contributions
 promise+0.120
opus+0.119
umpy+0.116
rig+0.112
icious+0.111
Neg logit contributions
 her-0.141
 his-0.134
 the-0.130
 its-0.125
 it-0.124
Token called
Token ID1444
Feature activation-2.18e+00

Pos logit contributions
 promise+0.135
 rule+0.126
+0.126
 favor+0.125
oral+0.124
Neg logit contributions
 her-0.075
 the-0.068
 his-0.064
 it-0.062
 their-0.060
Token a
Token ID257
Feature activation-5.65e+00

Pos logit contributions
 promise+0.129
umpy+0.122
asis+0.122
 rule+0.120
opus+0.120
Neg logit contributions
 her-0.080
 the-0.074
 his-0.068
 their-0.068
 its-0.063
Token trunk
Token ID21427
Feature activation-3.29e+00

Pos logit contributions
iver+0.250
umpy+0.232
ater+0.227
rig+0.225
 promise+0.224
Neg logit contributions
 the-0.296
 her-0.295
 his-0.282
 it-0.274
 its-0.266
Token.
Token ID13
Feature activation-1.86e+00

Pos logit contributions
 promise+0.206
iver+0.202
opus+0.198
icious+0.198
 Driver+0.196
Neg logit contributions
 the-0.104
 her-0.100
 his-0.088
 it-0.086
 their-0.082
Token It
Token ID632
Feature activation-1.16e+00

Pos logit contributions
 promise+0.097
asis+0.092
opus+0.091
 rule+0.090
ater+0.090
Neg logit contributions
 her-0.081
 the-0.074
 his-0.073
 their-0.072
 its-0.069
Token can
Token ID460
Feature activation-1.96e+00

Pos logit contributions
 promise+0.045
umpy+0.045
ute+0.045
opus+0.045
ater+0.045
Neg logit contributions
 her-0.061
 the-0.057
 his-0.056
 their-0.055
 it-0.052
Token use
Token ID779
Feature activation-4.94e-01

Pos logit contributions
umpy+0.081
ari+0.079
opus+0.079
ater+0.078
ute+0.078
Neg logit contributions
 her-0.108
 the-0.099
 his-0.098
 their-0.093
 it-0.090
TokenWow
Token ID22017
Feature activation-1.95e-01

Pos logit contributions
umpy+0.027
oc+0.027
opus+0.025
 promise+0.025
ater+0.025
Neg logit contributions
 her-0.050
 the-0.048
 his-0.046
 their-0.046
 it-0.044
Token!
Token ID0
Feature activation-2.49e+00

Pos logit contributions
 promise+0.009
oc+0.009
 rule+0.009
umpy+0.009
opus+0.009
Neg logit contributions
 her-0.008
 the-0.007
 his-0.007
 their-0.007
 it-0.006
Token You
Token ID921
Feature activation-1.18e+00

Pos logit contributions
ater+0.117
 promise+0.116
 rule+0.114
asis+0.114
ute+0.111
Neg logit contributions
 her-0.120
 the-0.110
 his-0.106
 their-0.103
 it-0.101
Token are
Token ID389
Feature activation+4.33e-01

Pos logit contributions
opus+0.055
oc+0.055
ute+0.055
 rule+0.054
umpy+0.053
Neg logit contributions
 her-0.052
 the-0.048
 his-0.044
 their-0.044
 it-0.043
Token a
Token ID257
Feature activation-4.06e+00

Pos logit contributions
 her+0.014
 his+0.012
 their+0.012
 the+0.012
 its+0.011
Neg logit contributions
 promise-0.024
 rule-0.024
umpy-0.023
opus-0.023
-0.023
Token purple
Token ID14032
Feature activation-5.65e+00

Pos logit contributions
iver+0.233
age+0.230
ney+0.228
asis+0.227
ari+0.224
Neg logit contributions
 her-0.152
 their-0.141
 the-0.139
 his-0.132
 it-0.129
Token z
Token ID1976
Feature activation-2.84e+00

Pos logit contributions
umpy+0.272
iver+0.265
rig+0.259
ney+0.247
asis+0.247
Neg logit contributions
 her-0.292
 the-0.261
 his-0.255
 it-0.249
 their-0.237
Tokenebra
Token ID37052
Feature activation-4.53e+00

Pos logit contributions
umpy+0.093
 difference+0.089
 rule+0.086
ney+0.085
 promise+0.085
Neg logit contributions
 her-0.154
 his-0.147
 the-0.143
bled-0.139
 their-0.136
Token!
Token ID0
Feature activation-2.91e+00

Pos logit contributions
oral+0.222
arl+0.221
umpy+0.215
reath+0.212
iver+0.210
Neg logit contributions
 her-0.202
 the-0.190
 his-0.178
 it-0.177
 their-0.171
Token Can
Token ID1680
Feature activation-8.03e-01

Pos logit contributions
asis+0.147
 promise+0.146
ater+0.140
opus+0.138
uddle+0.136
Neg logit contributions
 her-0.137
 the-0.125
 his-0.122
 its-0.121
 their-0.119
Token I
Token ID314
Feature activation-3.80e+00

Pos logit contributions
umpy+0.032
ater+0.031
 promise+0.030
oc+0.030
asis+0.030
Neg logit contributions
 her-0.044
 the-0.041
 his-0.039
 their-0.039
 it-0.039
Token"
Token ID1
Feature activation-1.98e+00

Pos logit contributions
 promise+0.077
 rule+0.073
ater+0.072
opus+0.070
affe+0.070
Neg logit contributions
 her-0.067
 the-0.063
 his-0.060
 their-0.058
 it-0.056
TokenLook
Token ID8567
Feature activation+2.85e-01

Pos logit contributions
 promise+0.089
oc+0.081
 rule+0.081
opus+0.080
asis+0.078
Neg logit contributions
 her-0.101
 the-0.092
 his-0.089
 their-0.087
 it-0.083
Token,
Token ID11
Feature activation-7.46e-01

Pos logit contributions
 her+0.009
 his+0.008
 their+0.007
 the+0.007
 its+0.007
Neg logit contributions
 promise-0.017
-0.016
opus-0.016
oc-0.016
 rule-0.016
Token there
Token ID612
Feature activation-1.59e+00

Pos logit contributions
 promise+0.052
umpy+0.051
oc+0.051
opus+0.051
 rule+0.051
Neg logit contributions
 her-0.017
 the-0.014
 his-0.013
 their-0.013
 it-0.011
Token's
Token ID338
Feature activation-6.64e-01

Pos logit contributions
umpy+0.069
ater+0.068
 promise+0.066
opus+0.066
oc+0.066
Neg logit contributions
 her-0.078
 the-0.074
 their-0.070
 his-0.069
 it-0.068
Token a
Token ID257
Feature activation-5.64e+00

Pos logit contributions
opus+0.049
oc+0.049
umpy+0.048
 promise+0.048
affe+0.048
Neg logit contributions
 her-0.013
 his-0.009
 their-0.009
 the-0.009
 its-0.007
Token stand
Token ID1302
Feature activation-2.44e+00

Pos logit contributions
age+0.323
umpy+0.316
iver+0.316
ater+0.314
ides+0.309
Neg logit contributions
 the-0.227
 it-0.203
 her-0.203
 his-0.188
 their-0.185
Token!"
Token ID2474
Feature activation+1.72e+00

Pos logit contributions
 promise+0.139
 rule+0.133
reath+0.131
icious+0.131
affe+0.131
Neg logit contributions
 her-0.082
 the-0.081
 it-0.074
 his-0.074
 their-0.072
Token said
Token ID531
Feature activation+1.93e+00

Pos logit contributions
 her+0.066
bled+0.063
 their+0.059
lled+0.058
 the+0.056
Neg logit contributions
-0.080
oc-0.073
 request-0.072
affe-0.072
 rule-0.072
Token Lucy
Token ID22162
Feature activation-2.48e-01

Pos logit contributions
 her+0.041
led+0.036
lled+0.034
bled+0.033
 their+0.030
Neg logit contributions
-0.117
 promise-0.115
 team-0.115
 request-0.114
 rule-0.113
Token.
Token ID13
Feature activation-1.15e+00

Pos logit contributions
 promise+0.012
 rule+0.011
umpy+0.011
opus+0.011
+0.011
Neg logit contributions
 her-0.011
 his-0.010
 the-0.010
 their-0.009
 its-0.009
Token,
Token ID11
Feature activation-4.18e-01

Pos logit contributions
 promise+0.215
opus+0.215
ute+0.214
iver+0.214
ini+0.213
Neg logit contributions
 her-0.122
 the-0.122
 it-0.102
 his-0.101
 their-0.101
Token a
Token ID257
Feature activation-5.74e+00

Pos logit contributions
 promise+0.022
opus+0.022
umpy+0.022
oc+0.022
+0.022
Neg logit contributions
 her-0.016
 the-0.014
 their-0.014
 his-0.014
 its-0.012
Token landmark
Token ID20533
Feature activation-1.77e+00

Pos logit contributions
umpy+0.255
iver+0.249
ater+0.225
ivering+0.222
ini+0.218
Neg logit contributions
 her-0.302
 the-0.294
 their-0.285
 it-0.282
 his-0.279
Token,
Token ID11
Feature activation-6.52e-01

Pos logit contributions
opus+0.102
oc+0.101
umpy+0.100
 promise+0.100
+0.098
Neg logit contributions
 her-0.060
 the-0.056
 their-0.052
 his-0.050
 it-0.047
Token or
Token ID393
Feature activation-2.88e-01

Pos logit contributions
opus+0.035
 promise+0.035
umpy+0.035
oc+0.034
+0.034
Neg logit contributions
 her-0.025
 the-0.022
 their-0.022
 his-0.021
 its-0.020
Token a
Token ID257
Feature activation-5.63e+00

Pos logit contributions
 promise+0.020
opus+0.020
umpy+0.020
oc+0.020
+0.020
Neg logit contributions
 her-0.006
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token symbol
Token ID6194
Feature activation-2.88e+00

Pos logit contributions
umpy+0.235
iver+0.235
ivering+0.208
ini+0.207
ater+0.206
Neg logit contributions
 her-0.301
 the-0.295
 their-0.285
 it-0.284
 his-0.278
Token.
Token ID13
Feature activation-2.13e+00

Pos logit contributions
 promise+0.135
opus+0.130
oc+0.130
iver+0.129
umpy+0.127
Neg logit contributions
 her-0.134
 the-0.129
 his-0.119
 their-0.118
 it-0.114
Token That
Token ID1320
Feature activation-1.06e+00

Pos logit contributions
opus+0.098
 promise+0.096
asis+0.096
ater+0.092
ute+0.092
Neg logit contributions
 her-0.103
 the-0.093
 his-0.090
 their-0.090
 its-0.086
Token way
Token ID835
Feature activation-9.20e-01

Pos logit contributions
umpy+0.044
opus+0.044
ater+0.043
arl+0.042
ute+0.042
Neg logit contributions
 her-0.053
 the-0.049
 their-0.048
 his-0.047
 it-0.046
Token,
Token ID11
Feature activation+4.29e-02

Pos logit contributions
oc+0.052
umpy+0.051
opus+0.051
ater+0.050
affe+0.050
Neg logit contributions
 her-0.034
 the-0.031
 their-0.029
 his-0.028
 it-0.027
Token draw
Token ID3197
Feature activation+6.09e-01

Pos logit contributions
umpy+0.143
ney+0.137
actus+0.135
opus+0.134
opard+0.131
Neg logit contributions
 her-0.125
 the-0.119
 his-0.108
 it-0.107
 him-0.099
Token me
Token ID502
Feature activation-1.02e+00

Pos logit contributions
 her+0.010
 their+0.008
 his+0.007
 the+0.007
 its+0.005
Neg logit contributions
-0.043
oc-0.042
 promise-0.042
opus-0.042
 rule-0.042
Token a
Token ID257
Feature activation-5.11e+00

Pos logit contributions
umpy+0.036
 promise+0.036
 rule+0.035
opus+0.035
affe+0.034
Neg logit contributions
 her-0.058
 the-0.054
 his-0.053
 their-0.050
 it-0.049
Token picture
Token ID4286
Feature activation-2.09e+00

Pos logit contributions
umpy+0.244
rr+0.241
ari+0.238
agons+0.236
iver+0.236
Neg logit contributions
 it-0.232
 the-0.226
 her-0.220
 his-0.207
 its-0.196
Token of
Token ID286
Feature activation-6.05e-01

Pos logit contributions
 promise+0.088
oc+0.087
ater+0.085
umpy+0.083
opus+0.082
Neg logit contributions
 her-0.114
 the-0.104
 his-0.101
 it-0.095
 their-0.094
Token a
Token ID257
Feature activation-5.62e+00

Pos logit contributions
umpy+0.036
 promise+0.036
opus+0.035
ute+0.035
oc+0.035
Neg logit contributions
 her-0.020
 the-0.018
 his-0.017
 their-0.017
 it-0.015
Token banana
Token ID25996
Feature activation-3.95e+00

Pos logit contributions
iver+0.270
umpy+0.258
rr+0.252
arl+0.252
agons+0.246
Neg logit contributions
 it-0.275
 the-0.267
 her-0.257
 his-0.235
 its-0.222
Token?"
Token ID1701
Feature activation-1.97e-01

Pos logit contributions
umpy+0.156
ini+0.152
rr+0.147
 promise+0.146
rig+0.144
Neg logit contributions
 her-0.213
 the-0.209
 his-0.194
 it-0.191
 how-0.175
Token (newline)
Token ID198
Feature activation-9.73e-01

Pos logit contributions
 promise+0.009
 rule+0.008
opus+0.008
umpy+0.008
oc+0.008
Neg logit contributions
 her-0.010
 the-0.008
 his-0.008
 their-0.008
 it-0.008
Token (newline)
Token ID198
Feature activation-1.74e+00

Pos logit contributions
asis+0.049
opus+0.048
 promise+0.048
 rule+0.048
ater+0.048
Neg logit contributions
 her-0.043
 the-0.039
 his-0.037
 their-0.037
 it-0.035
TokenL
Token ID43
Feature activation-6.77e-01

Pos logit contributions
asis+0.098
ode+0.088
affe+0.088
opus+0.088
ari+0.087
Neg logit contributions
 her-0.081
 the-0.068
 their-0.064
 his-0.063
 it-0.061
Token,
Token ID11
Feature activation-1.36e+00

Pos logit contributions
opus+0.082
 promise+0.082
oc+0.081
asis+0.080
umpy+0.080
Neg logit contributions
 her-0.049
 the-0.044
 their-0.042
 his-0.041
 it-0.039
Token a
Token ID257
Feature activation-5.47e+00

Pos logit contributions
opus+0.071
umpy+0.071
oc+0.069
asis+0.069
+0.068
Neg logit contributions
 her-0.058
 the-0.052
 his-0.051
 their-0.050
 it-0.045
Token nose
Token ID9686
Feature activation-2.19e+00

Pos logit contributions
umpy+0.236
asis+0.221
ari+0.218
ini+0.216
ater+0.216
Neg logit contributions
 her-0.284
 the-0.278
 his-0.269
 their-0.261
 its-0.260
Token,
Token ID11
Feature activation-9.91e-01

Pos logit contributions
opus+0.136
 promise+0.136
asis+0.134
+0.133
umpy+0.132
Neg logit contributions
 her-0.067
 the-0.059
 his-0.056
 their-0.054
 it-0.049
Token and
Token ID290
Feature activation-9.52e-01

Pos logit contributions
umpy+0.062
opus+0.062
 promise+0.061
oc+0.061
asis+0.060
Neg logit contributions
 her-0.030
 the-0.026
 his-0.025
 their-0.025
 its-0.022
Token a
Token ID257
Feature activation-5.61e+00

Pos logit contributions
umpy+0.068
opus+0.067
 promise+0.067
asis+0.066
ute+0.066
Neg logit contributions
 her-0.021
 the-0.017
 his-0.016
 their-0.016
 its-0.013
Token collar
Token ID19908
Feature activation-2.10e+00

Pos logit contributions
umpy+0.247
ater+0.233
ari+0.230
asis+0.230
iver+0.227
Neg logit contributions
 her-0.291
 the-0.280
 his-0.276
 their-0.267
 its-0.262
Token.
Token ID13
Feature activation-2.08e+00

Pos logit contributions
opus+0.115
asis+0.112
 promise+0.111
umpy+0.110
oc+0.110
Neg logit contributions
 her-0.086
 the-0.078
 his-0.073
 their-0.072
 it-0.069
Token Then
Token ID3244
Feature activation-7.33e-02

Pos logit contributions
asis+0.094
opus+0.093
 promise+0.092
ute+0.090
ater+0.090
Neg logit contributions
 her-0.101
 the-0.093
 his-0.090
 their-0.088
 its-0.085
Token your
Token ID534
Feature activation-2.58e+00

Pos logit contributions
 promise+0.003
umpy+0.003
opus+0.003
 rule+0.003
oc+0.003
Neg logit contributions
 her-0.004
 the-0.003
 his-0.003
 their-0.003
 its-0.003
Token dog
Token ID3290
Feature activation-2.38e+00

Pos logit contributions
oc+0.105
asis+0.103
umpy+0.102
opus+0.100
rr+0.097
Neg logit contributions
 her-0.132
 the-0.125
 their-0.121
 his-0.120
 it-0.113
Token is
Token ID318
Feature activation-4.89e-02

Pos logit contributions
umpy+0.037
ute+0.036
 promise+0.036
oc+0.036
ater+0.036
Neg logit contributions
 her-0.030
 the-0.027
 his-0.027
 their-0.026
 it-0.025
Token a
Token ID257
Feature activation-4.60e+00

Pos logit contributions
 promise+0.003
opus+0.003
 rule+0.003
umpy+0.003
oc+0.003
Neg logit contributions
 her-0.001
 the-0.001
 his-0.001
 their-0.001
 its-0.001
Token lion
Token ID18744
Feature activation-3.98e+00

Pos logit contributions
iver+0.215
ater+0.205
umpy+0.204
ivering+0.201
pillar+0.184
Neg logit contributions
 the-0.239
 their-0.229
 it-0.217
 his-0.214
 her-0.214
Token,
Token ID11
Feature activation-1.03e+00

Pos logit contributions
oral+0.174
ini+0.167
 promise+0.165
umpy+0.165
arl+0.164
Neg logit contributions
 the-0.202
 her-0.196
 his-0.191
 it-0.184
 their-0.177
Token and
Token ID290
Feature activation-2.17e-01

Pos logit contributions
umpy+0.058
 promise+0.057
opus+0.056
oc+0.056
ater+0.056
Neg logit contributions
 her-0.039
 the-0.036
 his-0.035
 their-0.034
 it-0.032
Token a
Token ID257
Feature activation-5.60e+00

Pos logit contributions
 promise+0.014
umpy+0.014
opus+0.014
 rule+0.014
oc+0.014
Neg logit contributions
 her-0.006
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token monkey
Token ID21657
Feature activation-3.83e+00

Pos logit contributions
iver+0.261
umpy+0.246
ater+0.236
ivering+0.228
inning+0.224
Neg logit contributions
 the-0.290
 his-0.270
 their-0.266
 her-0.265
 it-0.263
Token,
Token ID11
Feature activation-1.83e+00

Pos logit contributions
oral+0.158
ini+0.151
arl+0.151
icious+0.150
 promise+0.150
Neg logit contributions
 the-0.201
 her-0.199
 his-0.195
 it-0.187
 their-0.182
Token and
Token ID290
Feature activation-2.24e-01

Pos logit contributions
umpy+0.113
 promise+0.110
opus+0.110
ater+0.110
oc+0.109
Neg logit contributions
 her-0.063
 the-0.058
 his-0.056
 their-0.052
 it-0.049
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 promise+0.015
umpy+0.015
opus+0.015
 rule+0.015
oc+0.015
Neg logit contributions
 her-0.005
 the-0.004
 his-0.004
 their-0.004
 its-0.003
Token gir
Token ID37370
Feature activation-3.12e+00

Pos logit contributions
iver+0.264
umpy+0.247
ater+0.240
ivering+0.228
inning+0.226
Neg logit contributions
 the-0.296
 his-0.279
 her-0.277
 their-0.271
 it-0.264
Token a
Token ID257
Feature activation-4.86e+00

Pos logit contributions
 her+0.016
 their+0.014
 his+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.027
 rule-0.026
-0.026
opus-0.025
oc-0.025
Token ball
Token ID2613
Feature activation-2.41e+00

Pos logit contributions
umpy+0.231
iver+0.220
ari+0.213
oc+0.210
ney+0.210
Neg logit contributions
 her-0.241
 the-0.227
 his-0.220
 it-0.204
 their-0.199
Token,
Token ID11
Feature activation+3.93e-01

Pos logit contributions
 promise+0.152
umpy+0.151
iver+0.149
asis+0.149
ini+0.149
Neg logit contributions
 her-0.074
 the-0.068
 his-0.065
 their-0.059
 it-0.055
Token and
Token ID290
Feature activation+5.89e-01

Pos logit contributions
 her+0.016
 their+0.014
 his+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.018
-0.018
 rule-0.018
oc-0.018
opus-0.018
Token even
Token ID772
Feature activation+6.35e-01

Pos logit contributions
 her+0.012
 their+0.010
 his+0.009
 the+0.009
 its+0.008
Neg logit contributions
 promise-0.039
 rule-0.038
-0.038
oc-0.038
opus-0.038
Token a
Token ID257
Feature activation-5.60e+00

Pos logit contributions
 her+0.016
 their+0.013
 his+0.012
 the+0.012
 its+0.011
Neg logit contributions
 promise-0.040
 rule-0.039
-0.039
oc-0.039
umpy-0.038
Token spoon
Token ID24556
Feature activation-1.70e+00

Pos logit contributions
iver+0.255
umpy+0.243
asis+0.227
ari+0.226
rr+0.222
Neg logit contributions
 her-0.280
 the-0.271
 his-0.259
 it-0.252
 their-0.248
Token.
Token ID13
Feature activation-3.16e-01

Pos logit contributions
 promise+0.087
+0.086
oc+0.086
opus+0.085
umpy+0.084
Neg logit contributions
 her-0.070
 his-0.062
 the-0.062
 their-0.060
 it-0.055
Token He
Token ID679
Feature activation-8.38e-01

Pos logit contributions
 promise+0.018
opus+0.017
oc+0.017
 rule+0.017
umpy+0.017
Neg logit contributions
 her-0.011
 his-0.010
 the-0.010
 their-0.010
 it-0.009
Token likes
Token ID7832
Feature activation+4.12e-01

Pos logit contributions
umpy+0.033
opus+0.032
 promise+0.032
 rule+0.032
ute+0.032
Neg logit contributions
 her-0.043
 the-0.040
 his-0.040
 their-0.039
 it-0.037
Token to
Token ID284
Feature activation-2.04e+00

Pos logit contributions
 her+0.011
 their+0.009
 his+0.009
 the+0.009
 its+0.008
Neg logit contributions
 promise-0.025
 rule-0.024
opus-0.024
oc-0.024
-0.024
Token a
Token ID257
Feature activation-1.15e+00

Pos logit contributions
led+0.068
bled+0.062
lled+0.058
 her+0.056
ied+0.056
Neg logit contributions
 promise-0.106
 rule-0.103
 judge-0.097
age-0.096
 request-0.096
Token bump
Token ID13852
Feature activation+1.59e+00

Pos logit contributions
opus+0.048
umpy+0.048
ater+0.045
ari+0.045
asis+0.045
Neg logit contributions
 her-0.060
 his-0.055
 their-0.055
 the-0.054
 its-0.051
Token on
Token ID319
Feature activation+2.64e+00

Pos logit contributions
 her+0.063
 their+0.057
 his+0.055
 the+0.054
 its+0.051
Neg logit contributions
umpy-0.073
ater-0.071
ute-0.069
ney-0.067
age-0.067
Token her
Token ID607
Feature activation-8.90e-01

Pos logit contributions
led+0.053
lled+0.033
ied+0.032
ed+0.031
bled+0.029
Neg logit contributions
 rule-0.171
 shelf-0.170
 veterinarian-0.166
 road-0.166
 judge-0.164
Token head
Token ID1182
Feature activation+9.70e-02

Pos logit contributions
opus+0.029
umpy+0.028
+0.027
oc+0.027
ari+0.027
Neg logit contributions
 her-0.054
 his-0.051
 the-0.050
 their-0.049
 its-0.047
Token and
Token ID290
Feature activation+2.43e+00

Pos logit contributions
 her+0.004
 their+0.004
 his+0.004
 the+0.004
 its+0.004
Neg logit contributions
 promise-0.004
 rule-0.004
umpy-0.004
oc-0.004
opus-0.004
Token a
Token ID257
Feature activation-1.61e+00

Pos logit contributions
lled+0.068
 her+0.066
ied+0.064
bled+0.064
led+0.063
Neg logit contributions
 promise-0.119
-0.115
 judge-0.113
 rule-0.113
 request-0.112
Token scratch
Token ID12692
Feature activation+9.39e-01

Pos logit contributions
opus+0.062
umpy+0.062
ari+0.061
asis+0.059
+0.058
Neg logit contributions
 her-0.090
 his-0.084
 their-0.081
 the-0.081
 its-0.078
Token on
Token ID319
Feature activation+1.85e+00

Pos logit contributions
 her+0.040
 their+0.037
 his+0.036
 the+0.036
 its+0.033
Neg logit contributions
umpy-0.040
 promise-0.040
-0.039
oc-0.039
 rule-0.039
Token her
Token ID607
Feature activation-1.02e+00

Pos logit contributions
led+0.024
lled+0.013
bled+0.013
ied+0.011
ed+0.011
Neg logit contributions
 rule-0.131
 shelf-0.127
oc-0.125
-0.125
 judge-0.125
Token arm
Token ID3211
Feature activation-4.08e-01

Pos logit contributions
opus+0.036
umpy+0.035
+0.035
ari+0.034
oc+0.034
Neg logit contributions
 her-0.060
 his-0.055
 the-0.054
 their-0.053
 its-0.051
Token with
Token ID351
Feature activation+2.24e+00

Pos logit contributions
 her+0.020
 their+0.018
 his+0.017
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.025
 rule-0.024
-0.024
oc-0.024
ute-0.024
Token her
Token ID607
Feature activation+6.98e-02

Pos logit contributions
led+0.048
bled+0.045
lled+0.044
 none+0.036
ened+0.036
Neg logit contributions
 swing-0.137
 slide-0.134
 house-0.128
-0.124
 rule-0.122
Token ball
Token ID2613
Feature activation-5.34e-01

Pos logit contributions
 her+0.004
 his+0.004
 the+0.004
 their+0.004
 its+0.004
Neg logit contributions
 promise-0.002
 rule-0.002
opus-0.002
-0.002
umpy-0.002
Token.
Token ID13
Feature activation+6.31e-01

Pos logit contributions
oc+0.025
opus+0.025
 promise+0.025
+0.025
umpy+0.025
Neg logit contributions
 her-0.024
 the-0.021
 his-0.021
 their-0.021
 its-0.019
Token She
Token ID1375
Feature activation+1.89e+00

Pos logit contributions
 her+0.029
 their+0.027
 his+0.027
 the+0.026
led+0.025
Neg logit contributions
-0.026
 promise-0.025
umpy-0.025
 rule-0.025
oc-0.025
Token saw
Token ID2497
Feature activation+3.13e+00

Pos logit contributions
 her+0.088
 their+0.079
 such+0.078
 its+0.077
 his+0.075
Neg logit contributions
 promise-0.072
-0.071
 hill-0.069
opus-0.068
 swing-0.067
Token a
Token ID257
Feature activation-3.15e+00

Pos logit contributions
led+0.106
ened+0.094
ered+0.085
bled+0.085
lled+0.085
Neg logit contributions
 swing-0.149
 slide-0.144
 park-0.139
 judge-0.139
 bird-0.138
Token boy
Token ID2933
Feature activation-5.27e-01

Pos logit contributions
'."+0.176
.'"+0.175
!'"+0.170
iver+0.167
ides+0.166
Neg logit contributions
 the-0.123
 it-0.115
 her-0.114
 their-0.110
 his-0.106
Token who
Token ID508
Feature activation+1.19e+00

Pos logit contributions
 promise+0.027
 rule+0.027
+0.027
umpy+0.027
opus+0.027
Neg logit contributions
 her-0.021
 his-0.019
 the-0.019
 their-0.018
 it-0.017
Token spoke
Token ID5158
Feature activation+2.02e+00

Pos logit contributions
 her+0.047
 their+0.042
 his+0.040
 the+0.039
 its+0.038
Neg logit contributions
-0.057
oc-0.056
opus-0.056
 promise-0.055
umpy-0.054
Token a
Token ID257
Feature activation-3.15e+00

Pos logit contributions
 her+0.062
lled+0.059
led+0.058
 their+0.056
bled+0.055
Neg logit contributions
ute-0.096
 promise-0.096
 rule-0.095
 world-0.094
 request-0.093
Token went
Token ID1816
Feature activation+8.33e-01

Pos logit contributions
 her+0.037
 his+0.033
 its+0.033
 their+0.032
 the+0.032
Neg logit contributions
 promise-0.045
-0.044
oc-0.042
umpy-0.042
 rule-0.042
Token to
Token ID284
Feature activation+8.44e-01

Pos logit contributions
 her+0.026
 his+0.022
 its+0.022
 the+0.022
 their+0.021
Neg logit contributions
 promise-0.045
 rule-0.045
-0.045
ute-0.044
ari-0.044
Token the
Token ID262
Feature activation-1.15e+00

Pos logit contributions
 her+0.009
led+0.008
bled+0.005
 his+0.005
 their+0.004
Neg logit contributions
 promise-0.063
-0.063
 rule-0.062
 request-0.061
oc-0.061
Token store
Token ID3650
Feature activation+1.36e+00

Pos logit contributions
 promise+0.032
affe+0.032
umpy+0.032
rr+0.032
ater+0.031
Neg logit contributions
 her-0.076
 the-0.071
 their-0.069
 his-0.069
 it-0.064
Token to
Token ID284
Feature activation+3.63e-01

Pos logit contributions
 her+0.052
led+0.046
 his+0.045
 its+0.045
 their+0.043
Neg logit contributions
 promise-0.064
 rule-0.063
-0.062
ney-0.062
ari-0.061
Token buy
Token ID2822
Feature activation+2.44e+00

Pos logit contributions
 her+0.022
 his+0.021
 their+0.021
 the+0.020
 its+0.020
Neg logit contributions
 promise-0.010
 rule-0.009
-0.009
oc-0.009
umpy-0.009
Token some
Token ID617
Feature activation-1.30e+00

Pos logit contributions
bled+0.066
led+0.066
lled+0.065
ened+0.061
ged+0.059
Neg logit contributions
-0.127
oc-0.122
 promise-0.120
 rule-0.118
 story-0.118
Token apples
Token ID22514
Feature activation+2.92e-01

Pos logit contributions
opus+0.069
umpy+0.068
ney+0.067
ater+0.067
oc+0.067
Neg logit contributions
 her-0.054
 the-0.048
 his-0.047
 their-0.046
 it-0.040
Token.
Token ID13
Feature activation+4.57e-01

Pos logit contributions
 her+0.013
 his+0.012
 their+0.011
 the+0.011
 its+0.011
Neg logit contributions
 promise-0.013
 rule-0.013
-0.013
oc-0.012
umpy-0.012
Token As
Token ID1081
Feature activation+1.53e+00

Pos logit contributions
 her+0.018
 his+0.016
 their+0.015
 the+0.015
 its+0.014
Neg logit contributions
-0.022
 promise-0.022
oc-0.022
 rule-0.022
umpy-0.022
Token they
Token ID484
Feature activation+1.93e+00

Pos logit contributions
 her+0.037
led+0.036
ped+0.030
lled+0.030
bled+0.030
Neg logit contributions
 hill-0.093
-0.092
 road-0.090
 farm-0.089
oc-0.088
Token parent
Token ID2560
Feature activation+6.36e-01

Pos logit contributions
opus+0.014
umpy+0.014
 promise+0.014
ater+0.014
+0.013
Neg logit contributions
 her-0.028
 his-0.026
 the-0.026
 their-0.025
 its-0.024
Token said
Token ID531
Feature activation+2.09e+00

Pos logit contributions
 her+0.023
 their+0.020
 his+0.020
 the+0.020
 its+0.018
Neg logit contributions
 promise-0.033
oc-0.032
opus-0.032
umpy-0.032
affe-0.032
Token,
Token ID11
Feature activation+3.05e-01

Pos logit contributions
lled+0.083
 her+0.083
led+0.075
 their+0.073
ened+0.070
Neg logit contributions
-0.083
affe-0.083
opus-0.082
 story-0.081
oc-0.078
Token "
Token ID366
Feature activation-5.53e-01

Pos logit contributions
 her+0.012
 their+0.010
 his+0.010
 the+0.010
 its+0.009
Neg logit contributions
 promise-0.015
oc-0.015
-0.015
opus-0.015
 rule-0.015
TokenWhere
Token ID8496
Feature activation+4.33e-01

Pos logit contributions
 promise+0.025
 rule+0.024
ute+0.023
opus+0.023
+0.023
Neg logit contributions
 her-0.026
 the-0.023
 his-0.023
 their-0.023
 it-0.022
Token are
Token ID389
Feature activation+2.58e+00

Pos logit contributions
 her+0.022
 their+0.020
 his+0.019
 the+0.019
 its+0.018
Neg logit contributions
 promise-0.017
 rule-0.017
-0.016
opus-0.016
umpy-0.016
Token you
Token ID345
Feature activation-5.94e-02

Pos logit contributions
 her+0.151
led+0.145
ged+0.143
lled+0.143
ened+0.139
Neg logit contributions
-0.060
opus-0.058
 team-0.058
 judge-0.055
 promise-0.053
Token,
Token ID11
Feature activation-1.33e+00

Pos logit contributions
 promise+0.003
oc+0.003
 rule+0.003
opus+0.003
umpy+0.003
Neg logit contributions
 her-0.002
 the-0.002
 his-0.002
 their-0.002
 its-0.002
Token little
Token ID1310
Feature activation-1.05e+00

Pos logit contributions
ney+0.059
umpy+0.059
ater+0.059
ute+0.059
age+0.059
Neg logit contributions
 her-0.065
 the-0.063
 his-0.061
 their-0.058
 it-0.057
Token one
Token ID530
Feature activation-2.54e+00

Pos logit contributions
umpy+0.038
ney+0.037
age+0.037
ute+0.036
 promise+0.036
Neg logit contributions
 her-0.060
 the-0.057
 his-0.054
 their-0.053
 it-0.052
Token?
Token ID30
Feature activation-2.25e+00

Pos logit contributions
ater+0.113
agons+0.108
arl+0.106
uer+0.106
 ramp+0.106
Neg logit contributions
 the-0.133
 her-0.125
 his-0.122
 it-0.117
 their-0.113
Token.
Token ID13
Feature activation+9.97e-01

Pos logit contributions
 her+0.003
 his+0.002
 the+0.002
 their+0.002
 its+0.002
Neg logit contributions
 promise-0.002
 rule-0.002
opus-0.002
umpy-0.002
oc-0.002
Token
Token ID220
Feature activation+1.12e-01

Pos logit contributions
 her+0.047
 their+0.041
 the+0.040
 his+0.039
led+0.039
Neg logit contributions
-0.040
 rule-0.039
oc-0.038
umpy-0.038
 promise-0.038
Token
Token ID198
Feature activation-1.84e-01

Pos logit contributions
 her+0.006
 his+0.005
 the+0.005
 their+0.005
 its+0.005
Neg logit contributions
 promise-0.004
 rule-0.004
umpy-0.004
-0.004
opus-0.004
Token
Token ID198
Feature activation-4.16e-01

Pos logit contributions
 promise+0.008
opus+0.007
 rule+0.007
oc+0.007
+0.007
Neg logit contributions
 her-0.009
 his-0.008
 the-0.008
 their-0.008
 its-0.007
TokenSuddenly
Token ID38582
Feature activation+3.16e+00

Pos logit contributions
 promise+0.016
opus+0.016
+0.016
 rule+0.015
oc+0.015
Neg logit contributions
 her-0.022
 his-0.020
 the-0.020
 their-0.020
 it-0.018
Token,
Token ID11
Feature activation+3.27e+00

Pos logit contributions
 her+0.150
lled+0.137
led+0.136
bled+0.136
ied+0.135
Neg logit contributions
-0.101
 team-0.100
opus-0.096
 goal-0.094
 hill-0.093
Token they
Token ID484
Feature activation+1.52e+00

Pos logit contributions
 her+0.115
led+0.110
angled+0.109
ied+0.106
ened+0.106
Neg logit contributions
 team-0.142
 goal-0.135
-0.132
 hill-0.130
opus-0.126
Token heard
Token ID2982
Feature activation+2.12e+00

Pos logit contributions
 her+0.068
 their+0.061
 his+0.060
 its+0.059
 the+0.057
Neg logit contributions
-0.061
opus-0.061
 promise-0.060
rr-0.059
umpy-0.058
Token a
Token ID257
Feature activation-2.52e+00

Pos logit contributions
 her+0.052
led+0.044
bled+0.044
ged+0.039
 their+0.037
Neg logit contributions
 story-0.121
affe-0.119
 team-0.117
 rule-0.116
rr-0.116
Token noise
Token ID7838
Feature activation+4.29e-01

Pos logit contributions
ater+0.136
ute+0.133
ari+0.131
umpy+0.131
ite+0.129
Neg logit contributions
 her-0.101
 the-0.096
 their-0.091
 his-0.091
 it-0.084
Token.
Token ID13
Feature activation+1.50e+00

Pos logit contributions
 her+0.019
 their+0.016
 his+0.016
 the+0.016
 its+0.015
Neg logit contributions
 promise-0.019
 rule-0.019
oc-0.019
umpy-0.018
opus-0.018
Token lie
Token ID6486
Feature activation+1.35e+00

Pos logit contributions
opus+0.077
umpy+0.076
oc+0.073
affe+0.072
rot+0.070
Neg logit contributions
 her-0.092
 the-0.085
 his-0.078
 their-0.077
 it-0.076
Token on
Token ID319
Feature activation+9.58e-01

Pos logit contributions
 her+0.043
 his+0.037
 their+0.037
 the+0.036
lled+0.034
Neg logit contributions
-0.071
 promise-0.070
ute-0.069
ater-0.067
 rule-0.066
Token the
Token ID262
Feature activation-1.29e+00

Pos logit contributions
 her+0.011
led+0.009
 his+0.008
 their+0.008
lled+0.006
Neg logit contributions
-0.068
 promise-0.067
 rule-0.067
oc-0.066
 world-0.066
Token bed
Token ID3996
Feature activation-6.69e-01

Pos logit contributions
oc+0.054
affe+0.053
umpy+0.053
opus+0.053
ute+0.052
Neg logit contributions
 her-0.067
 the-0.062
 his-0.059
 their-0.058
 it-0.056
Token.
Token ID13
Feature activation+6.92e-01

Pos logit contributions
 promise+0.031
opus+0.031
oc+0.030
+0.030
umpy+0.030
Neg logit contributions
 her-0.030
 the-0.027
 their-0.026
 his-0.026
 it-0.024
Token
Token ID198
Feature activation-8.83e-02

Pos logit contributions
 her+0.034
 his+0.032
 their+0.031
led+0.031
 the+0.031
Neg logit contributions
-0.026
umpy-0.025
 promise-0.025
 rule-0.025
oc-0.024
Token
Token ID198
Feature activation-6.40e-01

Pos logit contributions
 promise+0.004
 rule+0.004
opus+0.004
umpy+0.004
oc+0.004
Neg logit contributions
 her-0.004
 the-0.004
 his-0.004
 their-0.004
 its-0.004
TokenShe
Token ID3347
Feature activation+1.38e+00

Pos logit contributions
 promise+0.026
opus+0.026
 rule+0.026
affe+0.025
oc+0.025
Neg logit contributions
 her-0.034
 the-0.030
 their-0.029
 his-0.029
 it-0.027
Token put
Token ID1234
Feature activation+2.81e+00

Pos logit contributions
 her+0.053
 their+0.049
 his+0.047
 the+0.046
 its+0.046
Neg logit contributions
 promise-0.066
-0.062
 rule-0.061
oc-0.061
 request-0.061
Token her
Token ID607
Feature activation+1.90e-01

Pos logit contributions
lled+0.096
 able+0.084
 none+0.080
 Joe+0.078
arted+0.078
Neg logit contributions
-0.127
 shelf-0.111
 promise-0.108
 house-0.104
 slide-0.104
Token dolls
Token ID36062
Feature activation+1.32e+00

Pos logit contributions
 her+0.009
 their+0.008
 his+0.008
 the+0.008
 its+0.008
Neg logit contributions
 promise-0.008
 rule-0.008
-0.008
opus-0.008
umpy-0.008
Token it
Token ID340
Feature activation-1.27e+00

Pos logit contributions
 her+0.006
 their+0.004
 his+0.004
 the+0.003
 its+0.002
Neg logit contributions
 promise-0.039
-0.038
 rule-0.038
opus-0.038
oc-0.038
Token!"
Token ID2474
Feature activation-5.00e-01

Pos logit contributions
opus+0.062
oc+0.061
 promise+0.061
ney+0.060
ari+0.060
Neg logit contributions
 her-0.057
 the-0.051
 his-0.049
 their-0.048
 it-0.047
Token
Token ID198
Feature activation-1.85e+00

Pos logit contributions
 promise+0.026
 rule+0.025
opus+0.025
umpy+0.025
ater+0.025
Neg logit contributions
 her-0.020
 the-0.018
 his-0.018
 their-0.017
 it-0.016
Token
Token ID198
Feature activation-2.61e+00

Pos logit contributions
 promise+0.101
 rule+0.100
ater+0.099
opus+0.098
asis+0.098
Neg logit contributions
 her-0.076
 the-0.069
 his-0.068
 their-0.064
 it-0.061
TokenJimmy
Token ID40335
Feature activation-2.78e-01

Pos logit contributions
 promise+0.130
asis+0.129
 rule+0.126
actus+0.126
ari+0.124
Neg logit contributions
 her-0.118
 his-0.105
 their-0.101
 the-0.100
 its-0.091
Token put
Token ID1234
Feature activation+1.95e+00

Pos logit contributions
 promise+0.016
 rule+0.016
opus+0.016
umpy+0.016
ute+0.015
Neg logit contributions
 her-0.009
 the-0.008
 his-0.008
 their-0.008
 its-0.007
Token the
Token ID262
Feature activation-1.26e+00

Pos logit contributions
 her+0.026
 their+0.020
lled+0.019
 none+0.018
 nothing+0.017
Neg logit contributions
-0.129
 promise-0.125
oc-0.121
 rule-0.120
ater-0.118
Token axe
Token ID23272
Feature activation+1.26e+00

Pos logit contributions
oc+0.052
ari+0.051
umpy+0.050
+0.049
asis+0.049
Neg logit contributions
 her-0.065
 his-0.061
 the-0.060
 their-0.056
 it-0.055
Token down
Token ID866
Feature activation+1.14e+00

Pos logit contributions
 her+0.039
 their+0.034
 the+0.032
 his+0.032
 its+0.029
Neg logit contributions
 promise-0.070
ute-0.067
 rule-0.066
-0.065
ney-0.065
Token.
Token ID13
Feature activation-1.02e+00

Pos logit contributions
 her+0.044
 their+0.038
 his+0.036
 the+0.036
led+0.035
Neg logit contributions
 promise-0.053
 rule-0.052
umpy-0.051
-0.051
oc-0.051
Token He
Token ID679
Feature activation+3.32e-01

Pos logit contributions
 promise+0.055
asis+0.053
opus+0.053
 rule+0.053
ater+0.053
Neg logit contributions
 her-0.040
 his-0.037
 the-0.036
 their-0.034
 it-0.032
Token told
Token ID1297
Feature activation+1.70e+00

Pos logit contributions
 promise+0.009
opus+0.009
 rule+0.009
umpy+0.009
oc+0.009
Neg logit contributions
 her-0.006
 the-0.005
 his-0.005
 their-0.005
 it-0.004
Token him
Token ID683
Feature activation+2.12e+00

Pos logit contributions
led+0.034
bled+0.030
lled+0.029
ed+0.028
 her+0.024
Neg logit contributions
 promise-0.104
 story-0.104
 rule-0.103
-0.103
 request-0.102
Token to
Token ID284
Feature activation-2.19e+00

Pos logit contributions
led+0.073
 her+0.070
lled+0.070
bled+0.066
ged+0.064
Neg logit contributions
 promise-0.104
 story-0.103
 rule-0.101
affe-0.095
 team-0.095
Token be
Token ID307
Feature activation+1.03e+00

Pos logit contributions
opus+0.109
oc+0.104
umpy+0.104
affe+0.102
pillar+0.102
Neg logit contributions
 her-0.115
 the-0.104
 his-0.103
 their-0.099
 it-0.092
Token careful
Token ID8161
Feature activation+7.87e-01

Pos logit contributions
 her+0.048
led+0.044
 his+0.044
 their+0.044
 the+0.044
Neg logit contributions
 promise-0.040
 rule-0.039
umpy-0.036
-0.036
 request-0.035
Token.
Token ID13
Feature activation-1.36e-01

Pos logit contributions
 her+0.035
 his+0.031
 their+0.031
 the+0.031
 its+0.030
Neg logit contributions
 promise-0.034
 rule-0.033
umpy-0.033
ute-0.032
ater-0.031
Token "
Token ID366
Feature activation-4.97e-01

Pos logit contributions
 promise+0.006
opus+0.006
 rule+0.006
umpy+0.006
oc+0.006
Neg logit contributions
 her-0.006
 the-0.006
 his-0.006
 their-0.006
 its-0.005
TokenI
Token ID40
Feature activation-1.56e+00

Pos logit contributions
 promise+0.020
opus+0.020
ute+0.019
 rule+0.019
ari+0.019
Neg logit contributions
 her-0.025
 his-0.023
 their-0.023
 the-0.023
 it-0.021
Token will
Token ID481
Feature activation-1.94e+00

Pos logit contributions
opus+0.062
affe+0.056
ater+0.055
ute+0.055
rr+0.054
Neg logit contributions
 her-0.084
 the-0.080
 their-0.078
 his-0.077
 it-0.076
Token explain
Token ID4727
Feature activation+2.21e-02

Pos logit contributions
opus+0.102
umpy+0.097
pillar+0.097
affe+0.096
rot+0.095
Neg logit contributions
 her-0.096
 the-0.091
 his-0.086
 their-0.083
 it-0.079
Token why
Token ID1521
Feature activation-2.08e-01

Pos logit contributions
 her+0.001
 his+0.000
 the+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.001
 rule-0.001
opus-0.001
-0.001
oc-0.001
Token eggs
Token ID9653
Feature activation-1.59e+00

Pos logit contributions
umpy+0.063
opus+0.061
oc+0.059
affe+0.058
rr+0.057
Neg logit contributions
 her-0.063
 the-0.057
 their-0.055
 his-0.052
 it-0.051
Token.
Token ID13
Feature activation-8.69e-01

Pos logit contributions
opus+0.087
oc+0.083
Two+0.083
umpy+0.083
affe+0.083
Neg logit contributions
 her-0.063
 the-0.056
 their-0.053
 his-0.053
 it-0.048
Token I
Token ID314
Feature activation-1.11e+00

Pos logit contributions
 promise+0.035
opus+0.034
ater+0.033
ute+0.033
asis+0.033
Neg logit contributions
 her-0.046
 the-0.041
 his-0.041
 their-0.041
 its-0.039
Token was
Token ID373
Feature activation+3.07e-01

Pos logit contributions
opus+0.061
ute+0.059
affe+0.058
ari+0.058
umpy+0.058
Neg logit contributions
 her-0.040
 the-0.035
 his-0.035
 their-0.035
 it-0.032
Token hungry
Token ID14720
Feature activation-2.00e+00

Pos logit contributions
 her+0.011
 his+0.010
 their+0.010
 the+0.010
 its+0.009
Neg logit contributions
 promise-0.016
 rule-0.015
umpy-0.015
-0.015
opus-0.015
Token and
Token ID290
Feature activation+5.13e-01

Pos logit contributions
opus+0.114
ute+0.113
oc+0.105
+0.104
 Driver+0.104
Neg logit contributions
 her-0.081
 the-0.075
 his-0.070
 their-0.068
 it-0.066
Token I
Token ID314
Feature activation-1.56e+00

Pos logit contributions
 her+0.022
 their+0.020
 his+0.020
 the+0.019
 its+0.018
Neg logit contributions
 promise-0.023
-0.022
 rule-0.022
umpy-0.022
oc-0.022
Token wanted
Token ID2227
Feature activation+5.86e-01

Pos logit contributions
opus+0.086
ute+0.084
affe+0.082
rr+0.081
ari+0.081
Neg logit contributions
 her-0.058
 the-0.052
 his-0.050
 their-0.049
 it-0.047
Token to
Token ID284
Feature activation-2.13e+00

Pos logit contributions
 her+0.017
 his+0.014
 their+0.014
 the+0.013
led+0.012
Neg logit contributions
 promise-0.034
 rule-0.034
-0.034
opus-0.033
oc-0.033
Token make
Token ID787
Feature activation+3.03e-01

Pos logit contributions
umpy+0.106
opus+0.106
ari+0.104
ute+0.100
ney+0.099
Neg logit contributions
 her-0.112
 the-0.104
 his-0.096
 it-0.093
 their-0.092
Token an
Token ID281
Feature activation-1.77e+00

Pos logit contributions
 her+0.011
 his+0.009
 their+0.009
 the+0.009
 its+0.008
Neg logit contributions
 promise-0.016
 rule-0.015
-0.015
opus-0.015
oc-0.015
TokenThe
Token ID464
Feature activation-2.67e+00

Pos logit contributions
 promise+0.088
ater+0.084
 rule+0.083
+0.083
ute+0.082
Neg logit contributions
 her-0.079
 his-0.073
 the-0.070
 their-0.068
 it-0.063
Token boy
Token ID2933
Feature activation-2.53e-01

Pos logit contributions
iver+0.112
!'"+0.111
umpy+0.110
oc+0.108
ute+0.108
Neg logit contributions
 the-0.135
 her-0.134
 his-0.118
 their-0.114
 it-0.112
Token was
Token ID373
Feature activation+1.22e+00

Pos logit contributions
 promise+0.014
 rule+0.014
opus+0.014
umpy+0.014
+0.014
Neg logit contributions
 her-0.009
 the-0.008
 his-0.008
 their-0.007
 its-0.007
Token so
Token ID523
Feature activation+5.98e-01

Pos logit contributions
 her+0.046
 their+0.041
led+0.041
 his+0.040
 him+0.038
Neg logit contributions
 promise-0.059
umpy-0.057
 rule-0.056
age-0.055
ney-0.054
Token happy
Token ID3772
Feature activation+6.05e-01

Pos logit contributions
 her+0.026
 their+0.023
 his+0.023
 the+0.022
 its+0.021
Neg logit contributions
 promise-0.028
umpy-0.027
oc-0.026
-0.026
 rule-0.025
Token and
Token ID290
Feature activation+2.06e+00

Pos logit contributions
 her+0.021
 their+0.018
 his+0.018
 the+0.017
 its+0.016
Neg logit contributions
 promise-0.031
oc-0.031
opus-0.031
-0.031
 rule-0.031
Token he
Token ID339
Feature activation-4.16e-02

Pos logit contributions
 her+0.074
 them+0.067
 their+0.066
lled+0.061
 him+0.059
Neg logit contributions
 promise-0.102
-0.094
oc-0.094
 request-0.092
age-0.090
Token was
Token ID373
Feature activation+1.26e+00

Pos logit contributions
 promise+0.002
 rule+0.002
opus+0.002
umpy+0.002
oc+0.002
Neg logit contributions
 her-0.001
 the-0.001
 his-0.001
 their-0.001
 its-0.001
Token amazed
Token ID24562
Feature activation+1.07e+00

Pos logit contributions
 her+0.041
 their+0.037
led+0.037
 his+0.034
 him+0.033
Neg logit contributions
 promise-0.067
umpy-0.063
age-0.063
 rule-0.062
 request-0.060
Token by
Token ID416
Feature activation+1.73e+00

Pos logit contributions
 her+0.041
 their+0.037
 his+0.034
 the+0.034
 its+0.034
Neg logit contributions
umpy-0.049
-0.049
 promise-0.048
opus-0.048
ater-0.048
Token how
Token ID703
Feature activation+1.09e+00

Pos logit contributions
led+0.024
 her+0.022
ed+0.017
 their+0.013
lled+0.012
Neg logit contributions
 rule-0.119
 promise-0.118
 story-0.117
 hill-0.116
oc-0.115
Token Jack
Token ID3619
Feature activation+2.58e-01

Pos logit contributions
 her+0.024
led+0.021
lled+0.020
 their+0.019
bled+0.019
Neg logit contributions
 promise-0.092
-0.091
oc-0.091
 team-0.091
affe-0.090
Token what
Token ID644
Feature activation+1.47e+00

Pos logit contributions
 her+0.008
 their+0.007
 his+0.007
 the+0.007
 its+0.006
Neg logit contributions
 promise-0.015
 rule-0.014
oc-0.014
-0.014
opus-0.014
Token was
Token ID373
Feature activation+1.34e+00

Pos logit contributions
 her+0.043
bled+0.039
led+0.039
 their+0.035
lled+0.034
Neg logit contributions
 story-0.080
 judge-0.079
oc-0.079
opus-0.079
 rule-0.079
Token wrong
Token ID2642
Feature activation-3.85e-01

Pos logit contributions
 her+0.052
 their+0.047
 his+0.045
 the+0.043
led+0.043
Neg logit contributions
ute-0.059
 rule-0.059
opus-0.058
 promise-0.057
 judge-0.057
Token.
Token ID13
Feature activation+8.82e-03

Pos logit contributions
 promise+0.018
opus+0.018
+0.017
 rule+0.017
umpy+0.017
Neg logit contributions
 her-0.017
 the-0.015
 his-0.015
 their-0.015
 it-0.014
Token
Token ID198
Feature activation-1.46e+00

Pos logit contributions
 her+0.000
 the+0.000
 his+0.000
 their+0.000
 its+0.000
Neg logit contributions
 promise-0.001
 rule-0.001
opus-0.001
umpy-0.001
oc-0.001
Token
Token ID198
Feature activation-1.26e+00

Pos logit contributions
 promise+0.079
 rule+0.076
asis+0.076
ute+0.075
age+0.072
Neg logit contributions
 her-0.058
 the-0.058
 his-0.055
 their-0.052
 it-0.051
TokenJack
Token ID14295
Feature activation+4.35e-01

Pos logit contributions
 promise+0.054
ute+0.054
 rule+0.052
asis+0.051
age+0.049
Neg logit contributions
 her-0.063
 the-0.061
 his-0.060
 their-0.057
 it-0.054
Token showed
Token ID3751
Feature activation+2.44e+00

Pos logit contributions
 her+0.016
 their+0.014
 his+0.014
 the+0.014
 its+0.013
Neg logit contributions
 promise-0.022
oc-0.022
-0.022
opus-0.022
affe-0.021
Token the
Token ID262
Feature activation-1.07e+00

Pos logit contributions
led+0.044
bled+0.044
 none+0.041
 nothing+0.041
 her+0.041
Neg logit contributions
-0.141
 team-0.140
affe-0.140
 story-0.137
 request-0.135
Token squirrel
Token ID33039
Feature activation+2.39e+00

Pos logit contributions
+0.040
ater+0.039
'."+0.039
ute+0.039
umpy+0.039
Neg logit contributions
 her-0.059
 the-0.055
 his-0.053
 it-0.050
 their-0.048
Token the
Token ID262
Feature activation-1.25e+00

Pos logit contributions
 her+0.008
 their+0.005
led+0.004
 his+0.004
 the+0.004
Neg logit contributions
 promise-0.051
 rule-0.051
oc-0.050
-0.050
ute-0.049
Token sky
Token ID6766
Feature activation+1.91e-02

Pos logit contributions
oc+0.064
age+0.061
+0.060
ari+0.059
'."+0.059
Neg logit contributions
 the-0.055
 her-0.055
 his-0.051
 their-0.049
 it-0.048
Token.
Token ID13
Feature activation-1.81e-01

Pos logit contributions
 her+0.001
 his+0.001
 the+0.001
 their+0.001
 its+0.001
Neg logit contributions
 promise-0.001
 rule-0.001
opus-0.001
oc-0.001
umpy-0.001
Token
Token ID198
Feature activation-5.09e-01

Pos logit contributions
 promise+0.008
opus+0.008
 rule+0.008
umpy+0.008
oc+0.008
Neg logit contributions
 her-0.008
 his-0.008
 the-0.008
 their-0.008
 its-0.007
Token
Token ID198
Feature activation-4.00e-01

Pos logit contributions
 promise+0.023
opus+0.023
 rule+0.023
ater+0.022
umpy+0.022
Neg logit contributions
 her-0.023
 the-0.021
 his-0.021
 their-0.021
 it-0.019
TokenThe
Token ID464
Feature activation-8.19e-01

Pos logit contributions
 promise+0.014
opus+0.014
ute+0.014
 rule+0.013
+0.013
Neg logit contributions
 her-0.022
 the-0.021
 his-0.021
 their-0.020
 it-0.019
Token man
Token ID582
Feature activation+3.74e-01

Pos logit contributions
oc+0.035
age+0.033
!'"+0.033
umpy+0.032
'."+0.032
Neg logit contributions
 her-0.043
 the-0.042
 his-0.040
 their-0.038
 it-0.037
Token pushed
Token ID7121
Feature activation+1.71e+00

Pos logit contributions
 her+0.013
 their+0.011
 his+0.011
 the+0.011
 its+0.010
Neg logit contributions
 promise-0.020
oc-0.020
opus-0.020
 rule-0.020
-0.020
Token the
Token ID262
Feature activation-1.23e+00

Pos logit contributions
led+0.033
 her+0.033
bled+0.032
 none+0.029
 nothing+0.028
Neg logit contributions
rr-0.099
 promise-0.098
 rule-0.097
oc-0.097
 team-0.096
Token button
Token ID4936
Feature activation+7.25e-01

Pos logit contributions
oc+0.044
asis+0.042
age+0.042
+0.041
iver+0.041
Neg logit contributions
 the-0.070
 her-0.069
 his-0.065
 their-0.063
 it-0.063
Token and
Token ID290
Feature activation+2.78e+00

Pos logit contributions
 her+0.030
 their+0.026
 the+0.026
 his+0.026
 its+0.024
Neg logit contributions
 promise-0.034
 rule-0.034
ute-0.033
umpy-0.033
oc-0.032
Token Instead
Token ID5455
Feature activation+1.65e+00

Pos logit contributions
 promise+0.002
opus+0.002
 rule+0.002
oc+0.002
umpy+0.002
Neg logit contributions
 her-0.002
 the-0.002
 his-0.002
 their-0.002
 its-0.002
Token,
Token ID11
Feature activation+3.18e+00

Pos logit contributions
 her+0.070
led+0.069
 his+0.067
lled+0.067
bled+0.065
Neg logit contributions
 promise-0.069
 rule-0.063
ari-0.062
 world-0.061
 hill-0.061
Token Su
Token ID1778
Feature activation-4.88e-01

Pos logit contributions
led+0.103
bled+0.091
lled+0.089
icked+0.083
ped+0.080
Neg logit contributions
 swing-0.154
 world-0.148
 slide-0.148
 hill-0.148
 promise-0.146
Tokenzy
Token ID7357
Feature activation-1.02e+00

Pos logit contributions
 rule+0.020
 promise+0.020
umpy+0.019
oc+0.019
ute+0.019
Neg logit contributions
 her-0.024
 the-0.022
 his-0.021
 their-0.021
 it-0.020
Token decided
Token ID3066
Feature activation+1.37e+00

Pos logit contributions
 rule+0.057
ute+0.056
 promise+0.056
opus+0.056
umpy+0.056
Neg logit contributions
 her-0.036
 the-0.032
 his-0.031
 their-0.030
 it-0.027
Token to
Token ID284
Feature activation-2.43e+00

Pos logit contributions
 her+0.042
led+0.040
 their+0.040
 his+0.038
lled+0.037
Neg logit contributions
 promise-0.071
-0.070
 rule-0.069
 request-0.067
 judge-0.067
Token take
Token ID1011
Feature activation+2.09e+00

Pos logit contributions
umpy+0.117
opus+0.117
affe+0.108
ute+0.107
ater+0.107
Neg logit contributions
 her-0.126
 the-0.118
 his-0.114
 their-0.109
 it-0.107
Token a
Token ID257
Feature activation-1.99e+00

Pos logit contributions
led+0.047
bled+0.046
lled+0.043
ied+0.040
 none+0.039
Neg logit contributions
-0.119
age-0.110
 promise-0.109
 story-0.109
 world-0.108
Token nap
Token ID25422
Feature activation-3.29e-01

Pos logit contributions
umpy+0.087
oc+0.084
ney+0.082
ater+0.081
affe+0.079
Neg logit contributions
 her-0.103
 the-0.096
 his-0.093
 their-0.092
 it-0.091
Token under
Token ID739
Feature activation+1.01e+00

Pos logit contributions
 promise+0.017
opus+0.017
oc+0.017
umpy+0.017
 rule+0.017
Neg logit contributions
 her-0.012
 the-0.011
 his-0.011
 their-0.011
 its-0.009
Token the
Token ID262
Feature activation-1.82e+00

Pos logit contributions
 her+0.009
led+0.008
 their+0.007
 his+0.006
bled+0.005
Neg logit contributions
-0.074
 promise-0.072
 rule-0.072
 world-0.071
 hill-0.071
Token the
Token ID262
Feature activation-1.89e+00

Pos logit contributions
 her+0.025
led+0.022
bled+0.018
 their+0.018
 his+0.018
Neg logit contributions
 rule-0.104
 promise-0.103
 world-0.100
 farm-0.099
 request-0.099
Token town
Token ID3240
Feature activation-7.51e-01

Pos logit contributions
oc+0.084
rr+0.082
umpy+0.079
ari+0.078
opus+0.078
Neg logit contributions
 her-0.101
 the-0.095
 their-0.088
 it-0.086
 his-0.084
Token and
Token ID290
Feature activation+5.56e-01

Pos logit contributions
 promise+0.039
opus+0.039
+0.038
 rule+0.038
ute+0.038
Neg logit contributions
 her-0.029
 the-0.026
 their-0.025
 his-0.025
 it-0.023
Token no
Token ID645
Feature activation+5.23e-02

Pos logit contributions
 her+0.029
 their+0.026
 his+0.026
 the+0.026
led+0.025
Neg logit contributions
 promise-0.021
 rule-0.020
-0.019
oc-0.019
 request-0.018
Token longer
Token ID2392
Feature activation-1.66e-01

Pos logit contributions
 her+0.002
 his+0.002
 the+0.002
 their+0.002
 its+0.002
Neg logit contributions
 promise-0.002
 rule-0.002
oc-0.002
opus-0.002
-0.002
Token come
Token ID1282
Feature activation-1.20e+00

Pos logit contributions
 promise+0.008
opus+0.008
 rule+0.008
oc+0.008
+0.008
Neg logit contributions
 her-0.007
 the-0.006
 his-0.006
 their-0.006
 its-0.006
Token searching
Token ID10342
Feature activation+1.42e+00

Pos logit contributions
opus+0.078
+0.075
oc+0.075
 rule+0.075
affe+0.075
Neg logit contributions
 her-0.036
 the-0.030
 their-0.030
 his-0.029
 its-0.025
Token for
Token ID329
Feature activation+2.16e+00

Pos logit contributions
 her+0.051
bled+0.044
 his+0.043
 their+0.043
 the+0.042
Neg logit contributions
 promise-0.070
 rule-0.066
oc-0.065
-0.065
 request-0.064
Token food
Token ID2057
Feature activation-2.99e-01

Pos logit contributions
led+0.052
bled+0.050
 her+0.046
lled+0.046
ied+0.045
Neg logit contributions
 promise-0.127
 rule-0.121
oc-0.118
 request-0.117
 story-0.117
Token.
Token ID13
Feature activation-5.96e-01

Pos logit contributions
 promise+0.014
opus+0.014
umpy+0.014
oc+0.014
 rule+0.014
Neg logit contributions
 her-0.013
 the-0.011
 his-0.011
 their-0.011
 its-0.010
Token
Token ID198
Feature activation-4.42e-01

Pos logit contributions
 promise+0.028
ater+0.027
opus+0.027
 rule+0.027
ute+0.026
Neg logit contributions
 her-0.027
 the-0.024
 his-0.024
 their-0.023
 it-0.022
Token you
Token ID345
Feature activation-1.85e+00

Pos logit contributions
 promise+0.025
umpy+0.025
oc+0.025
opus+0.024
 rule+0.024
Neg logit contributions
 her-0.024
 the-0.021
 his-0.021
 their-0.021
 its-0.019
Token.
Token ID13
Feature activation-2.28e+00

Pos logit contributions
opus+0.088
ute+0.087
asis+0.086
umpy+0.086
age+0.085
Neg logit contributions
 her-0.085
 the-0.081
 his-0.076
 their-0.075
 it-0.071
Token But
Token ID887
Feature activation+2.17e-01

Pos logit contributions
asis+0.107
opus+0.104
ater+0.104
ute+0.103
umpy+0.102
Neg logit contributions
 her-0.108
 the-0.099
 his-0.097
 their-0.095
 its-0.092
Token you
Token ID345
Feature activation-1.78e+00

Pos logit contributions
 her+0.011
 their+0.009
 his+0.009
 the+0.009
 its+0.009
Neg logit contributions
 promise-0.009
 rule-0.009
-0.009
opus-0.009
oc-0.008
Token have
Token ID423
Feature activation+3.56e-01

Pos logit contributions
opus+0.077
ute+0.076
umpy+0.072
ater+0.072
ini+0.071
Neg logit contributions
 the-0.087
 her-0.084
 it-0.081
 his-0.077
 their-0.076
Token to
Token ID284
Feature activation-2.85e+00

Pos logit contributions
 her+0.010
 his+0.008
 their+0.008
 the+0.008
 its+0.007
Neg logit contributions
 promise-0.022
 rule-0.021
-0.021
umpy-0.021
opus-0.021
Token promise
Token ID6991
Feature activation-6.43e-01

Pos logit contributions
opus+0.147
pillar+0.130
umpy+0.130
ari+0.129
ney+0.128
Neg logit contributions
 her-0.153
 the-0.141
 his-0.140
 their-0.136
 its-0.122
Token to
Token ID284
Feature activation-1.10e+00

Pos logit contributions
umpy+0.030
oc+0.029
opus+0.029
 promise+0.029
ute+0.028
Neg logit contributions
 her-0.031
 the-0.027
 his-0.027
 their-0.026
 it-0.025
Token be
Token ID307
Feature activation+6.18e-01

Pos logit contributions
opus+0.048
umpy+0.048
oc+0.046
ute+0.045
ney+0.044
Neg logit contributions
 her-0.060
 the-0.054
 his-0.054
 their-0.052
 it-0.048
Token honest
Token ID5508
Feature activation-1.43e+00

Pos logit contributions
 her+0.031
 their+0.028
 his+0.027
 the+0.027
led+0.026
Neg logit contributions
 promise-0.024
 rule-0.023
-0.022
umpy-0.022
oc-0.021
Token and
Token ID290
Feature activation-5.08e-01

Pos logit contributions
opus+0.081
asis+0.080
ute+0.077
 promise+0.077
+0.075
Neg logit contributions
 her-0.052
 the-0.048
 his-0.046
 their-0.044
 it-0.041
Token
Token ID26391
Feature activation-1.98e+00

Pos logit contributions
opus+0.071
ute+0.069
 team+0.069
 rule+0.068
 promise+0.068
Neg logit contributions
 her-0.101
 the-0.098
 his-0.090
led-0.087
 their-0.086
Token Then
Token ID3244
Feature activation-6.51e-01

Pos logit contributions
 promise+0.075
 rule+0.074
reath+0.069
ute+0.067
ater+0.066
Neg logit contributions
 her-0.102
 the-0.101
 it-0.098
 their-0.095
 his-0.091
Token,
Token ID11
Feature activation-1.39e+00

Pos logit contributions
umpy+0.028
opus+0.028
oc+0.028
ute+0.027
 promise+0.027
Neg logit contributions
 her-0.031
 the-0.029
 his-0.028
 their-0.028
 it-0.026
Token she
Token ID673
Feature activation-3.16e+00

Pos logit contributions
umpy+0.067
opus+0.066
oc+0.065
ute+0.065
ater+0.065
Neg logit contributions
 her-0.063
 the-0.059
 his-0.056
 their-0.056
 it-0.054
Token shut
Token ID4423
Feature activation+1.57e+00

Pos logit contributions
ute+0.146
opus+0.145
ater+0.144
umpy+0.143
arl+0.141
Neg logit contributions
 her-0.142
 the-0.137
 it-0.128
 his-0.124
 their-0.123
Token the
Token ID262
Feature activation-3.61e+00

Pos logit contributions
 her+0.024
 his+0.017
 their+0.016
led+0.015
 nothing+0.013
Neg logit contributions
-0.106
ater-0.103
 promise-0.103
!'"-0.102
 rule-0.102
Token door
Token ID3420
Feature activation-1.12e+00

Pos logit contributions
iver+0.159
umpy+0.155
oc+0.149
+0.149
ari+0.149
Neg logit contributions
 her-0.176
 the-0.175
 his-0.167
 it-0.150
 their-0.150
Token behind
Token ID2157
Feature activation-6.24e-01

Pos logit contributions
opus+0.057
oc+0.055
 promise+0.055
affe+0.054
 rule+0.053
Neg logit contributions
 her-0.047
 the-0.042
 his-0.040
 their-0.040
 it-0.037
Token them
Token ID606
Feature activation-2.22e+00

Pos logit contributions
 promise+0.033
opus+0.032
umpy+0.032
 rule+0.032
oc+0.032
Neg logit contributions
 her-0.025
 the-0.022
 his-0.022
 their-0.022
 it-0.020
Token so
Token ID523
Feature activation+2.33e-01

Pos logit contributions
opus+0.093
oc+0.091
affe+0.090
 rule+0.088
ute+0.087
Neg logit contributions
 her-0.115
 the-0.110
 their-0.100
 his-0.100
 it-0.099
Token that
Token ID326
Feature activation+4.23e-01

Pos logit contributions
 her+0.008
 his+0.006
 their+0.006
 the+0.006
 its+0.006
Neg logit contributions
 promise-0.013
-0.013
 rule-0.013
opus-0.013
umpy-0.013
Tokenmy
Token ID1820
Feature activation-1.61e+00

Pos logit contributions
 promise+0.006
opus+0.005
umpy+0.005
 rule+0.005
oc+0.005
Neg logit contributions
 her-0.005
 the-0.005
 his-0.005
 their-0.005
 its-0.004
Token and
Token ID290
Feature activation+5.28e-01

Pos logit contributions
 rule+0.087
opus+0.086
umpy+0.086
ute+0.085
asis+0.084
Neg logit contributions
 her-0.065
 the-0.057
 his-0.056
 their-0.054
 it-0.050
Token started
Token ID2067
Feature activation+5.19e-01

Pos logit contributions
 her+0.016
 their+0.014
 his+0.013
 the+0.013
 its+0.012
Neg logit contributions
 promise-0.030
-0.030
oc-0.029
 rule-0.029
opus-0.029
Token running
Token ID2491
Feature activation+2.17e-01

Pos logit contributions
 her+0.017
 their+0.014
 his+0.014
 the+0.014
 its+0.012
Neg logit contributions
 promise-0.029
 rule-0.028
-0.028
umpy-0.028
ute-0.028
Token after
Token ID706
Feature activation+5.03e-01

Pos logit contributions
 her+0.007
 his+0.006
 their+0.006
 the+0.006
 its+0.006
Neg logit contributions
 promise-0.012
 rule-0.012
umpy-0.012
-0.012
oc-0.011
Token the
Token ID262
Feature activation-4.32e+00

Pos logit contributions
 her+0.008
 their+0.006
 his+0.006
 the+0.005
 its+0.004
Neg logit contributions
 promise-0.035
 rule-0.035
-0.035
opus-0.035
umpy-0.035
Token bird
Token ID6512
Feature activation-7.27e-01

Pos logit contributions
oc+0.172
iver+0.159
ater+0.159
umpy+0.155
ari+0.148
Neg logit contributions
 the-0.230
 her-0.225
 their-0.210
 his-0.208
 it-0.195
Token.
Token ID13
Feature activation-1.35e+00

Pos logit contributions
 promise+0.035
opus+0.035
 rule+0.035
+0.035
umpy+0.034
Neg logit contributions
 her-0.031
 the-0.028
 his-0.027
 their-0.027
 it-0.024
Token She
Token ID1375
Feature activation+6.61e-01

Pos logit contributions
 promise+0.081
 rule+0.079
ater+0.079
asis+0.077
opus+0.076
Neg logit contributions
 her-0.047
 the-0.042
 his-0.042
 their-0.040
 it-0.037
Token ran
Token ID4966
Feature activation+9.33e-01

Pos logit contributions
 her+0.024
 their+0.020
 his+0.020
 the+0.020
 its+0.018
Neg logit contributions
 promise-0.035
oc-0.034
-0.034
opus-0.034
 rule-0.033
Token and
Token ID290
Feature activation+2.32e+00

Pos logit contributions
 her+0.032
 their+0.028
 his+0.027
 the+0.026
led+0.026
Neg logit contributions
 promise-0.048
-0.047
umpy-0.047
 rule-0.047
ute-0.046
Token lived
Token ID5615
Feature activation+9.11e-01

Pos logit contributions
 her+0.021
 their+0.019
 the+0.018
 his+0.018
 its+0.017
Neg logit contributions
 promise-0.024
-0.023
oc-0.023
 rule-0.023
opus-0.023
Token at
Token ID379
Feature activation+1.30e+00

Pos logit contributions
 her+0.029
 their+0.026
 his+0.024
 the+0.023
 it+0.022
Neg logit contributions
 promise-0.049
ute-0.049
umpy-0.049
 rule-0.048
ater-0.048
Token the
Token ID262
Feature activation-1.83e+00

Pos logit contributions
led+0.019
 her+0.018
 their+0.013
ed+0.012
ied+0.011
Neg logit contributions
 rule-0.090
 hill-0.089
 world-0.089
 farm-0.089
asis-0.088
Token bottom
Token ID4220
Feature activation-8.58e-01

Pos logit contributions
oc+0.062
 promise+0.060
affe+0.058
umpy+0.057
rr+0.055
Neg logit contributions
 her-0.113
 the-0.108
 his-0.104
 their-0.099
 it-0.095
Token of
Token ID286
Feature activation+1.71e+00

Pos logit contributions
 promise+0.048
+0.046
 rule+0.045
oc+0.045
opus+0.045
Neg logit contributions
 her-0.032
 the-0.029
 his-0.028
 their-0.027
 it-0.025
Token a
Token ID257
Feature activation-3.60e+00

Pos logit contributions
led+0.031
ed+0.020
ied+0.018
bled+0.017
lled+0.017
Neg logit contributions
 hill-0.115
 farm-0.115
 river-0.114
 rule-0.114
 world-0.113
Token big
Token ID1263
Feature activation-2.74e+00

Pos logit contributions
ides+0.197
age+0.194
 Deal+0.193
iver+0.192
rr+0.190
Neg logit contributions
 the-0.145
 her-0.127
 it-0.125
 his-0.122
 their-0.115
Token old
Token ID1468
Feature activation-2.59e+00

Pos logit contributions
icious+0.106
age+0.106
ute+0.102
reath+0.099
umpy+0.099
Neg logit contributions
 the-0.156
 her-0.150
 his-0.143
 it-0.136
 their-0.133
Token tree
Token ID5509
Feature activation-9.89e-01

Pos logit contributions
icious+0.098
 Deal+0.092
rr+0.092
ari+0.092
rig+0.090
Neg logit contributions
 her-0.153
 the-0.145
 his-0.144
 it-0.137
 its-0.131
Token,
Token ID11
Feature activation+1.52e+00

Pos logit contributions
opus+0.055
 promise+0.055
+0.053
umpy+0.053
affe+0.053
Neg logit contributions
 her-0.036
 the-0.032
 his-0.031
 their-0.030
 its-0.028
Token close
Token ID1969
Feature activation+1.63e+00

Pos logit contributions
 her+0.058
 their+0.053
led+0.051
bled+0.049
 the+0.049
Neg logit contributions
 promise-0.068
 rule-0.067
 hill-0.067
 world-0.067
-0.064
TokenAs
Token ID1722
Feature activation+1.13e+00

Pos logit contributions
 promise+0.113
asis+0.107
ater+0.106
opus+0.105
 rule+0.104
Neg logit contributions
 her-0.110
 the-0.094
 his-0.094
 their-0.089
 it-0.086
Token she
Token ID673
Feature activation+1.16e-01

Pos logit contributions
 her+0.022
led+0.018
 his+0.018
 their+0.017
bled+0.015
Neg logit contributions
 promise-0.072
-0.072
oc-0.071
 rule-0.071
opus-0.070
Token stepped
Token ID10764
Feature activation+2.55e-01

Pos logit contributions
 her+0.004
 their+0.003
 his+0.003
 the+0.003
 its+0.003
Neg logit contributions
 promise-0.006
oc-0.006
opus-0.006
-0.006
 rule-0.006
Token up
Token ID510
Feature activation+1.79e+00

Pos logit contributions
 her+0.008
 their+0.007
 his+0.007
 the+0.007
 its+0.006
Neg logit contributions
 promise-0.014
-0.014
opus-0.014
umpy-0.014
ute-0.014
Token to
Token ID284
Feature activation+7.51e-01

Pos logit contributions
 her+0.052
led+0.051
bled+0.050
lled+0.050
 their+0.048
Neg logit contributions
-0.088
 hill-0.088
 rule-0.087
 promise-0.087
 story-0.085
Token the
Token ID262
Feature activation-3.69e+00

Pos logit contributions
 her+0.006
led+0.005
 their+0.003
 his+0.003
bled+0.002
Neg logit contributions
 promise-0.058
 rule-0.057
-0.057
 request-0.056
 hill-0.055
Token edge
Token ID5743
Feature activation+5.69e-01

Pos logit contributions
oc+0.143
umpy+0.141
rr+0.139
iver+0.136
!'"+0.133
Neg logit contributions
 the-0.204
 her-0.199
 his-0.180
 their-0.173
 it-0.171
Token of
Token ID286
Feature activation+1.99e+00

Pos logit contributions
 her+0.024
 their+0.021
 his+0.021
 the+0.021
 its+0.019
Neg logit contributions
 promise-0.026
umpy-0.026
opus-0.026
 rule-0.026
-0.025
Token the
Token ID262
Feature activation-3.14e+00

Pos logit contributions
led+0.035
bled+0.022
ed+0.022
ied+0.019
icked+0.018
Neg logit contributions
 hill-0.134
 rule-0.131
 world-0.129
 shelf-0.127
 fence-0.127
Token pond
Token ID16723
Feature activation+1.01e+00

Pos logit contributions
affe+0.117
!'"+0.113
umpy+0.113
'."+0.112
iver+0.112
Neg logit contributions
 the-0.178
 her-0.163
 his-0.158
 their-0.149
 it-0.147
Token,
Token ID11
Feature activation+3.05e+00

Pos logit contributions
 her+0.041
 their+0.036
 his+0.036
 the+0.035
 its+0.033
Neg logit contributions
 promise-0.047
-0.045
 rule-0.045
opus-0.045
umpy-0.044
Token a
Token ID257
Feature activation-2.23e+00

Pos logit contributions
 her+0.020
 their+0.015
 his+0.015
 the+0.014
led+0.014
Neg logit contributions
-0.063
 promise-0.063
 rule-0.062
oc-0.061
umpy-0.061
Token time
Token ID640
Feature activation+1.27e+00

Pos logit contributions
opus+0.114
 Driver+0.109
asis+0.108
ute+0.108
affe+0.108
Neg logit contributions
 her-0.091
 their-0.084
 the-0.081
 his-0.079
 it-0.069
Token,
Token ID11
Feature activation+1.14e+00

Pos logit contributions
 her+0.057
 his+0.050
 their+0.050
led+0.049
 the+0.049
Neg logit contributions
 promise-0.052
-0.052
opus-0.051
oc-0.050
 rule-0.050
Token there
Token ID612
Feature activation+2.21e+00

Pos logit contributions
 her+0.031
 his+0.025
 their+0.024
 the+0.023
 him+0.022
Neg logit contributions
-0.067
 promise-0.066
umpy-0.065
 rule-0.065
ute-0.064
Token was
Token ID373
Feature activation+1.42e+00

Pos logit contributions
 her+0.100
led+0.095
bled+0.091
 their+0.089
 able+0.088
Neg logit contributions
oc-0.083
 world-0.081
-0.080
 promise-0.080
 story-0.080
Token a
Token ID257
Feature activation-3.59e+00

Pos logit contributions
 her+0.032
led+0.029
 his+0.026
ened+0.025
bled+0.025
Neg logit contributions
-0.083
 promise-0.080
umpy-0.080
 rule-0.080
oc-0.080
Token little
Token ID1310
Feature activation-2.68e+00

Pos logit contributions
age+0.210
'."+0.205
!'"+0.203
ides+0.200
opl+0.197
Neg logit contributions
 the-0.140
 her-0.133
 their-0.124
 his-0.121
 it-0.112
Token train
Token ID4512
Feature activation-2.64e+00

Pos logit contributions
age+0.113
!'"+0.109
iver+0.109
'."+0.108
icious+0.107
Neg logit contributions
 her-0.139
 the-0.135
 their-0.127
 his-0.123
 it-0.118
Token.
Token ID13
Feature activation-3.58e-01

Pos logit contributions
 promise+0.163
reath+0.156
 Deal+0.155
arl+0.151
umpy+0.151
Neg logit contributions
 the-0.094
 her-0.090
 his-0.083
 their-0.081
 it-0.072
Token It
Token ID632
Feature activation-2.21e-01

Pos logit contributions
 promise+0.020
opus+0.020
 rule+0.019
ute+0.019
umpy+0.019
Neg logit contributions
 her-0.013
 the-0.011
 his-0.011
 their-0.011
 its-0.010
Token was
Token ID373
Feature activation+9.96e-01

Pos logit contributions
 promise+0.013
opus+0.012
 rule+0.012
umpy+0.012
+0.012
Neg logit contributions
 her-0.007
 the-0.006
 his-0.006
 their-0.006
 its-0.006