TOP ACTIVATIONS
MAX = 11.683

't worry , k itty . We can find another way
You 're not a princess . You 're just a baby
. Don 't be afraid . The dog is harmless .
Don 't worry , Play . You don 't have to
Mia . Don 't cry . The lizard did not mean
, do not hurt me . I am sorry . I
" Why , of course ! It would make us very
you have no pretty bricks . I win ," Tom said
that 's not a veil . That 's a curtain ,"
Max . Math is fun . Look , this is how
The slide is not real . It 's just a lie
back , " Go away ! I 'm not scared of
" Don 't be ashamed . You succeeded in playing beautifully
shouted , " Go away ! I have a magical shield
. The flower is boring . My gear is better .
head . " No way ! I 'm being careful and
Don 't be too mad . I 'm glad you followed
said , " Go away ! I don 't want to
beetle ! You are gross ! I hate you !" Lily
bad kids . Go away . Go away now ."

MIN ACTIVATIONS
MIN = -9.638

't hurt me !" sque aked the rabbit .
Stop !" a mouse sque aked , as it ran out
the king . They sne aked up on him while he
and cozy . He sne aked under it and waited for
's that ?" he bark ed , jumping on the box
kitchen . Then he sne aked out the door . He
with me ?" Spot bark ed . M ittens
O uch !" Ben cried out . He felt a sharp
sheep !" he squ eal ed . M ama
." Mom looked into the bucket and discovered the
quite curious . Alex walked up to him and said ,
." They walked deeper into the cave , until
when a small girl walked up and saw him . She
I will !" she ran off with a happy heart .
." S ara closed the book and thought for
S oph ie looked up and said , " Wow
qu ack ! Lucy pe ered closer and saw two ducks
the river . Tim pe ered over the side and saw
the squirrel ." He looked up and saw a three -
," he said . Lucy closed the box and put it

INTERVAL 6.353 - 11.683
CONTAINS 2.364%

, fire is dangerous . You can 't play with matches
new best friend , cow . You are so wise and
" Wow , Lily ! That 's an incredible picture !"
's OK , Tom . It 's just a balloon .
the girl said . " You are very kind . Do

INTERVAL 1.023 - 6.353
CONTAINS 27.301%

Because you won the hose , sweet heart . Now you
of the animals hidden away in the woods . Follow me
how to use a scale to weigh it . Tim my
Tom says , " Not now , horse . I have
way to unlock the hole to get the ball back .

INTERVAL -4.308 - 1.023
CONTAINS 67.075%

Max felt much better and was happy to be able to
air and it was getting darker . Sally was scared .
had managed to resist the bear 's gr ump iness and
led her hand . Sophie felt happy and the sadness started
. He wanted to take a nap in the grass .

INTERVAL -9.638 - -4.308
CONTAINS 3.260%

bird . The bird falls into the water with a th
, Lily ?" Lily said , " No , it is
and looked around . He saw a mountain in the distance
some water from the pot into the cup and handed it
was very excited and went straight to the market .
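
The four interval boundaries listed above look like equal-width bins spanning MIN to MAX, and the four CONTAINS percentages appear to partition all sampled tokens. This is an inference from the printed numbers, not something the dump states. A minimal sketch that checks both readings (the helper `equal_width_edges` and the variable names are hypothetical, introduced only for illustration):

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi (inclusive)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# Values copied from the MIN / MAX / INTERVAL / CONTAINS lines above.
MIN_ACT, MAX_ACT = -9.638, 11.683
listed_edges = [-9.638, -4.308, 1.023, 6.353, 11.683]
listed_share = [3.260, 67.075, 27.301, 2.364]  # percent, lowest bin first

# Assumption: the dashboard uses four equal-width bins.
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# Largest disagreement between computed and listed edges; the listed
# values are rounded to three decimals, so compare with a tolerance.
max_gap = max(abs(a - b) for a, b in zip(edges, listed_edges))

# If the bins partition the sample, the shares should total 100%.
total_share = sum(listed_share)

print(f"max edge gap: {max_gap:.5f}")
print(f"total share: {total_share:.3f}%")
```

If both checks hold, about two thirds of tokens fall in the bin around zero (-4.308 to 1.023), and only a few percent reach either extreme bin.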
Token't
Token ID470
Feature activation+5.42e+00

Pos logit contributions
 yours+0.136
 loser+0.119
 fault+0.104
osaurs+0.100
 virtue+0.098
Neg logit contributions
ised-0.218
ized-0.207
 early-0.206
 wide-0.206
 cautiously-0.196
Token worry
Token ID5490
Feature activation+4.84e+00

Pos logit contributions
 yours+0.138
 loser+0.125
 virtue+0.118
osaurs+0.114
 teaches+0.114
Neg logit contributions
ised-0.105
 wide-0.100
ized-0.100
 early-0.099
 cautiously-0.093
Token,
Token ID11
Feature activation+7.24e+00

Pos logit contributions
 yours+0.163
 loser+0.151
 deserve+0.145
 virtue+0.141
 losers+0.140
Neg logit contributions
ised-0.055
 early-0.051
ized-0.049
 widened-0.045
 wide-0.045
Token k
Token ID479
Feature activation+6.32e+00

Pos logit contributions
 yours+0.154
 loser+0.140
 deserve+0.127
 virtue+0.126
 fault+0.121
Neg logit contributions
ised-0.165
 early-0.161
ized-0.159
 wide-0.158
 cautiously-0.150
Tokenitty
Token ID9760
Feature activation+5.91e+00

Pos logit contributions
 yours+0.160
 fault+0.138
 loser+0.133
 virtue+0.133
 deserve+0.130
Neg logit contributions
ised-0.135
ized-0.130
awed-0.124
 wide-0.111
 early-0.106
Token.
Token ID13
Feature activation+1.17e+01

Pos logit contributions
 yours+0.182
 loser+0.169
 deserve+0.160
 policy+0.158
 losers+0.156
Neg logit contributions
ised-0.079
ized-0.075
 early-0.074
 wide-0.068
led-0.067
Token We
Token ID775
Feature activation+7.55e+00

Pos logit contributions
 yours+0.225
 loser+0.196
 policy+0.186
 virtue+0.177
 rule+0.176
Neg logit contributions
ised-0.284
 wide-0.257
 walked-0.257
 cautiously-0.255
ized-0.252
Token can
Token ID460
Feature activation+5.30e+00

Pos logit contributions
 yours+0.136
 loser+0.117
 policy+0.114
 occasions+0.104
grand+0.099
Neg logit contributions
 wide-0.192
ised-0.192
 quickly-0.187
 cautiously-0.183
 walked-0.182
Token find
Token ID1064
Feature activation+2.54e+00

Pos logit contributions
 yours+0.123
 loser+0.113
 policy+0.104
 virtue+0.102
 fault+0.099
Neg logit contributions
 wide-0.113
ised-0.112
ized-0.111
 curiously-0.110
 cautiously-0.110
Token another
Token ID1194
Feature activation+4.44e+00

Pos logit contributions
 loser+0.058
 yours+0.054
 virtue+0.051
 deserve+0.050
 winner+0.050
Neg logit contributions
ised-0.053
ized-0.051
 wide-0.050
 early-0.048
 cautiously-0.048
Token way
Token ID835
Feature activation+3.87e+00

Pos logit contributions
 yours+0.110
 loser+0.099
osaurs+0.096
 losers+0.096
 Friends+0.091
Neg logit contributions
ised-0.084
 wide-0.081
 early-0.076
ized-0.076
 cautiously-0.074
Token You
Token ID921
Feature activation+8.84e+00

Pos logit contributions
 yours+0.191
 virtue+0.186
 matter+0.165
 policy+0.163
 deserve+0.163
Neg logit contributions
 wide-0.231
ised-0.220
 symp-0.212
 The-0.209
 Suddenly-0.208
Token're
Token ID821
Feature activation+5.89e+00

Pos logit contributions
 yours+0.189
 policy+0.178
 virtue+0.168
 occasions+0.166
 loser+0.160
Neg logit contributions
 wide-0.209
 widened-0.188
ised-0.183
ized-0.180
 walked-0.179
Token not
Token ID407
Feature activation+4.56e+00

Pos logit contributions
grand+0.167
 policy+0.167
 yours+0.165
 occasions+0.163
 Friends+0.163
Neg logit contributions
 wide-0.083
 tall-0.074
 early-0.070
 low-0.065
 excited-0.062
Token a
Token ID257
Feature activation+5.27e+00

Pos logit contributions
 loser+0.101
 yours+0.100
 virtue+0.096
 policy+0.096
 Friends+0.094
Neg logit contributions
 wide-0.085
 early-0.084
ised-0.081
 tall-0.077
ized-0.076
Token princess
Token ID21752
Feature activation+5.72e+00

Pos logit contributions
 costs+0.136
 yours+0.135
works+0.129
oes+0.128
 deserve+0.127
Neg logit contributions
 wide-0.092
 tall-0.085
 low-0.080
 early-0.080
 quickly-0.080
Token.
Token ID13
Feature activation+1.17e+01

Pos logit contributions
 yours+0.199
 loser+0.179
 virtue+0.179
 deserve+0.178
 Congratulations+0.171
Neg logit contributions
ised-0.049
 early-0.048
ized-0.045
 wide-0.044
led-0.040
Token You
Token ID921
Feature activation+8.44e+00

Pos logit contributions
 yours+0.183
 virtue+0.181
 depends+0.164
 inconvenience+0.161
 matter+0.158
Neg logit contributions
 She-0.265
 wide-0.259
 early-0.245
 The-0.244
 Then-0.244
Token're
Token ID821
Feature activation+6.08e+00

Pos logit contributions
 yours+0.185
 policy+0.170
 loser+0.164
 virtue+0.162
 fault+0.149
Neg logit contributions
 wide-0.190
 widened-0.179
ised-0.173
ized-0.173
led-0.169
Token just
Token ID655
Feature activation+6.32e+00

Pos logit contributions
 policy+0.164
 loser+0.164
 yours+0.161
 virtue+0.159
 occasions+0.159
Neg logit contributions
 wide-0.091
 tall-0.082
 early-0.082
 low-0.072
 excited-0.071
Token a
Token ID257
Feature activation+4.59e+00

Pos logit contributions
 yours+0.172
 loser+0.169
 policy+0.161
 virtue+0.161
 deserve+0.160
Neg logit contributions
 early-0.093
 wide-0.091
ised-0.090
ized-0.084
 tall-0.079
Token baby
Token ID5156
Feature activation+7.20e+00

Pos logit contributions
 yours+0.116
 costs+0.109
 Go+0.100
 deserve+0.099
 Friends+0.099
Neg logit contributions
 wide-0.086
 tall-0.080
 early-0.079
 low-0.078
 quickly-0.077
Token.
Token ID13
Feature activation+9.33e+00

Pos logit contributions
 yours+0.160
 loser+0.149
 virtue+0.142
 deserve+0.142
 losers+0.140
Neg logit contributions
ised-0.042
 early-0.041
ized-0.037
 wide-0.036
 widened-0.033
Token Don
Token ID2094
Feature activation+9.18e+00

Pos logit contributions
 yours+0.159
 virtue+0.148
 policy+0.130
 deserve+0.126
 rule+0.125
Neg logit contributions
 wide-0.224
 walked-0.213
 early-0.211
ised-0.210
 low-0.203
Token't
Token ID470
Feature activation+6.23e+00

Pos logit contributions
 yours+0.147
 loser+0.125
 fault+0.111
 virtue+0.111
 policy+0.108
Neg logit contributions
ised-0.267
 early-0.258
 wide-0.256
ized-0.255
 widened-0.246
Token be
Token ID307
Feature activation+3.27e+00

Pos logit contributions
 yours+0.153
 virtue+0.143
 teaches+0.137
inea+0.136
osaurs+0.136
Neg logit contributions
 wide-0.119
 early-0.110
ised-0.108
 low-0.106
ized-0.104
Token afraid
Token ID7787
Feature activation+5.90e+00

Pos logit contributions
 yours+0.098
 loser+0.089
 deserve+0.085
 virtue+0.083
 policy+0.082
Neg logit contributions
ised-0.050
 early-0.047
ized-0.046
 widened-0.044
 wide-0.043
Token.
Token ID13
Feature activation+1.16e+01

Pos logit contributions
 yours+0.191
 loser+0.180
 virtue+0.175
 deserve+0.170
osaurs+0.170
Neg logit contributions
ised-0.069
 early-0.069
 wide-0.063
ized-0.062
led-0.053
Token The
Token ID383
Feature activation+3.67e+00

Pos logit contributions
 yours+0.232
 virtue+0.214
 policy+0.197
 deserve+0.190
 occasions+0.184
Neg logit contributions
 wide-0.245
ised-0.238
 early-0.228
 nervously-0.224
 widened-0.222
Token dog
Token ID3290
Feature activation+7.40e+00

Pos logit contributions
 yours+0.088
 loser+0.074
 deserve+0.071
 virtue+0.069
 policy+0.068
Neg logit contributions
ised-0.078
 early-0.077
 wide-0.075
ized-0.074
 widened-0.071
Token is
Token ID318
Feature activation+4.16e+00

Pos logit contributions
 yours+0.137
 loser+0.127
 policy+0.119
 virtue+0.117
 fault+0.107
Neg logit contributions
 wide-0.180
 early-0.178
led-0.178
ized-0.178
ised-0.175
Token harmless
Token ID23585
Feature activation+4.10e+00

Pos logit contributions
 yours+0.106
 loser+0.101
 virtue+0.094
 deserve+0.094
 policy+0.093
Neg logit contributions
 early-0.072
ised-0.070
 wide-0.070
ized-0.068
 widened-0.064
Token.
Token ID13
Feature activation+1.11e+01

Pos logit contributions
 yours+0.131
 loser+0.122
 deserve+0.116
 virtue+0.116
 policy+0.112
Neg logit contributions
ised-0.052
 early-0.050
ized-0.046
 wide-0.045
led-0.042
TokenDon
Token ID3987
Feature activation+7.55e+00

Pos logit contributions
 yours+0.136
 loser+0.120
 policy+0.114
 fault+0.113
 losers+0.109
Neg logit contributions
ised-0.115
ized-0.109
 early-0.102
 wide-0.101
 widened-0.100
Token't
Token ID470
Feature activation+5.20e+00

Pos logit contributions
 yours+0.139
 loser+0.119
 fault+0.113
osaurs+0.104
 policy+0.103
Neg logit contributions
ised-0.205
 early-0.196
 wide-0.194
ized-0.194
 cautiously-0.187
Token worry
Token ID5490
Feature activation+4.83e+00

Pos logit contributions
 yours+0.127
 loser+0.112
 virtue+0.107
 depends+0.106
osaurs+0.105
Neg logit contributions
ised-0.104
 early-0.102
ized-0.098
 wide-0.098
 curiously-0.092
Token,
Token ID11
Feature activation+7.50e+00

Pos logit contributions
 yours+0.165
 loser+0.152
 deserve+0.145
 virtue+0.142
 policy+0.141
Neg logit contributions
ised-0.053
 early-0.048
ized-0.048
 widened-0.044
 wide-0.042
Token Play
Token ID3811
Feature activation+5.00e+00

Pos logit contributions
 yours+0.159
 loser+0.145
 deserve+0.132
 virtue+0.129
 policy+0.127
Neg logit contributions
ised-0.171
 early-0.167
ized-0.163
 wide-0.161
 cautiously-0.155
Token.
Token ID13
Feature activation+1.15e+01

Pos logit contributions
 yours+0.128
 loser+0.115
 deserve+0.108
 policy+0.105
 depends+0.105
Neg logit contributions
ised-0.097
 early-0.094
ized-0.092
 wide-0.087
 widened-0.086
Token You
Token ID921
Feature activation+8.18e+00

Pos logit contributions
 yours+0.203
 policy+0.170
 loser+0.169
 rule+0.156
 virtue+0.152
Neg logit contributions
ised-0.287
 early-0.267
 wide-0.265
 cautiously-0.263
 nervously-0.255
Token don
Token ID836
Feature activation+5.36e+00

Pos logit contributions
 yours+0.148
 loser+0.124
 policy+0.123
 virtue+0.111
 Friends+0.106
Neg logit contributions
ised-0.216
 wide-0.215
 early-0.206
ized-0.205
 widened-0.203
Token't
Token ID470
Feature activation+5.76e+00

Pos logit contributions
 yours+0.085
 loser+0.070
 policy+0.060
 fault+0.060
 virtue+0.060
Neg logit contributions
ised-0.159
 early-0.154
ized-0.153
 wide-0.150
 widened-0.148
Token have
Token ID423
Feature activation+4.68e+00

Pos logit contributions
 yours+0.118
 loser+0.098
 policy+0.097
 depends+0.094
osaurs+0.092
Neg logit contributions
 wide-0.131
 early-0.131
ised-0.128
ized-0.121
 low-0.121
Token to
Token ID284
Feature activation+5.34e+00

Pos logit contributions
 yours+0.131
 loser+0.123
 policy+0.116
 virtue+0.114
 deserve+0.113
Neg logit contributions
ised-0.073
 early-0.072
 wide-0.068
ized-0.067
awed-0.063
Token Mia
Token ID32189
Feature activation+3.89e+00

Pos logit contributions
 yours+0.172
 loser+0.154
 deserve+0.146
 virtue+0.142
 policy+0.141
Neg logit contributions
ised-0.128
 early-0.123
ized-0.121
 widened-0.115
 wide-0.115
Token.
Token ID13
Feature activation+9.51e+00

Pos logit contributions
 yours+0.124
 loser+0.114
 deserve+0.109
 virtue+0.106
 losers+0.106
Neg logit contributions
ised-0.050
 early-0.048
ized-0.046
 wide-0.043
 widened-0.043
Token Don
Token ID2094
Feature activation+9.09e+00

Pos logit contributions
 yours+0.160
 loser+0.137
 virtue+0.127
 deserve+0.126
 losers+0.122
Neg logit contributions
ised-0.251
 early-0.245
 wide-0.243
ized-0.236
 widened-0.233
Token't
Token ID470
Feature activation+6.68e+00

Pos logit contributions
 yours+0.132
 loser+0.107
 virtue+0.091
 policy+0.089
 fault+0.088
Neg logit contributions
ised-0.281
 early-0.272
ized-0.270
 wide-0.265
 widened-0.262
Token cry
Token ID3960
Feature activation+6.24e+00

Pos logit contributions
 yours+0.174
 loser+0.156
 virtue+0.155
 teaches+0.151
 fault+0.150
Neg logit contributions
 wide-0.120
ised-0.119
ized-0.112
 early-0.111
 curiously-0.107
Token.
Token ID13
Feature activation+1.15e+01

Pos logit contributions
 yours+0.201
 loser+0.183
 deserve+0.176
 virtue+0.172
 losers+0.170
Neg logit contributions
ised-0.082
 early-0.077
ized-0.075
 widened-0.069
 wide-0.068
Token The
Token ID383
Feature activation+4.24e+00

Pos logit contributions
 yours+0.242
 loser+0.211
 virtue+0.204
 deserve+0.199
 winner+0.198
Neg logit contributions
ised-0.250
 wide-0.241
 early-0.240
 widened-0.231
ized-0.230
Token lizard
Token ID42406
Feature activation+6.61e+00

Pos logit contributions
 yours+0.119
 deserve+0.103
 loser+0.101
 virtue+0.098
 costs+0.098
Neg logit contributions
ised-0.069
 wide-0.069
 early-0.068
ized-0.064
 cautiously-0.063
Token did
Token ID750
Feature activation+3.65e+00

Pos logit contributions
 yours+0.128
 loser+0.119
 virtue+0.101
 policy+0.101
 deserve+0.097
Neg logit contributions
led-0.162
ized-0.161
ised-0.160
 wide-0.159
 early-0.158
Token not
Token ID407
Feature activation+4.84e+00

Pos logit contributions
 yours+0.096
 loser+0.094
 policy+0.088
 Friends+0.088
 depends+0.087
Neg logit contributions
 early-0.060
 wide-0.058
ised-0.055
ized-0.054
 low-0.052
Token mean
Token ID1612
Feature activation+3.68e+00

Pos logit contributions
 yours+0.117
 virtue+0.109
 loser+0.106
 costs+0.105
 Congratulations+0.105
Neg logit contributions
 wide-0.090
 low-0.079
 early-0.078
 quickly-0.077
 tall-0.075
Token,
Token ID11
Feature activation+5.93e+00

Pos logit contributions
 yours+0.243
 fault+0.216
 loser+0.213
 policy+0.212
 virtue+0.210
Neg logit contributions
 early-0.061
 wide-0.061
 quickly-0.054
ised-0.053
 widened-0.052
Token do
Token ID466
Feature activation+6.55e+00

Pos logit contributions
 yours+0.123
 virtue+0.122
 fault+0.118
 Friends+0.115
 occasions+0.112
Neg logit contributions
 wide-0.123
 tall-0.113
 nervously-0.112
 early-0.112
 cautiously-0.112
Token not
Token ID407
Feature activation+5.19e+00

Pos logit contributions
 yours+0.108
 loser+0.098
 fault+0.094
 virtue+0.092
 policy+0.091
Neg logit contributions
ised-0.173
 early-0.173
ized-0.171
 wide-0.169
 tall-0.158
Token hurt
Token ID5938
Feature activation+3.95e+00

Pos logit contributions
 yours+0.119
 virtue+0.110
 loser+0.109
 fault+0.107
 costs+0.107
Neg logit contributions
 wide-0.102
 early-0.101
ised-0.096
ized-0.092
 quickly-0.092
Token me
Token ID502
Feature activation+4.31e+00

Pos logit contributions
 yours+0.082
 virtue+0.080
 depends+0.079
 costs+0.078
 deserve+0.078
Neg logit contributions
ised-0.082
 early-0.082
ized-0.081
 wide-0.080
 cautiously-0.077
Token.
Token ID13
Feature activation+1.14e+01

Pos logit contributions
 yours+0.107
 loser+0.095
 deserve+0.090
 virtue+0.087
 policy+0.086
Neg logit contributions
ised-0.087
 early-0.085
ized-0.083
 widened-0.079
 wide-0.079
Token I
Token ID314
Feature activation+7.45e+00

Pos logit contributions
 yours+0.190
 virtue+0.166
 winner+0.152
 deserve+0.151
 inconvenience+0.148
Neg logit contributions
 nervously-0.271
 curiously-0.270
 wide-0.267
 early-0.267
ised-0.265
Token am
Token ID716
Feature activation+3.39e+00

Pos logit contributions
 yours+0.159
 loser+0.134
 policy+0.117
 losers+0.117
 virtue+0.115
Neg logit contributions
 early-0.169
 wide-0.166
ised-0.165
 widened-0.164
led-0.164
Token sorry
Token ID7926
Feature activation+6.76e+00

Pos logit contributions
 yours+0.089
 loser+0.088
 policy+0.083
 virtue+0.083
 deserve+0.082
Neg logit contributions
 early-0.055
 wide-0.053
ised-0.051
ized-0.051
 cautiously-0.047
Token.
Token ID13
Feature activation+1.05e+01

Pos logit contributions
 yours+0.189
 loser+0.180
 virtue+0.179
 deserve+0.178
 depends+0.174
Neg logit contributions
 early-0.108
 wide-0.095
ised-0.092
 and-0.089
 nervously-0.087
Token I
Token ID314
Feature activation+7.28e+00

Pos logit contributions
 yours+0.180
 virtue+0.159
 fault+0.156
 loser+0.154
 winner+0.150
Neg logit contributions
 wide-0.248
ised-0.240
 early-0.236
 excited-0.234
 nervously-0.231
Token "
Token ID366
Feature activation+6.40e+00

Pos logit contributions
ised+0.007
ized+0.007
 early+0.007
 widened+0.006
awed+0.006
Neg logit contributions
 yours-0.013
 loser-0.012
 deserve-0.012
 losers-0.011
 virtue-0.011
TokenWhy
Token ID5195
Feature activation+7.82e+00

Pos logit contributions
 yours+0.156
 loser+0.140
 fault+0.139
 policy+0.137
 virtue+0.127
Neg logit contributions
ised-0.135
ized-0.129
 cautiously-0.119
 widened-0.117
 wide-0.117
Token,
Token ID11
Feature activation+7.52e+00

Pos logit contributions
 yours+0.151
 loser+0.133
 virtue+0.121
 policy+0.120
 deserve+0.120
Neg logit contributions
 early-0.191
ised-0.182
 wide-0.182
 widened-0.172
ized-0.170
Token of
Token ID286
Feature activation+6.13e+00

Pos logit contributions
 yours+0.179
 loser+0.167
 policy+0.156
 virtue+0.155
 deserve+0.155
Neg logit contributions
 early-0.144
ised-0.144
 wide-0.139
ized-0.136
 cautiously-0.133
Token course
Token ID1781
Feature activation+5.78e+00

Pos logit contributions
 yours+0.136
 loser+0.124
 deserve+0.113
 virtue+0.111
 policy+0.110
Neg logit contributions
ised-0.139
ized-0.132
 early-0.131
 wide-0.125
 widened-0.123
Token!
Token ID0
Feature activation+1.14e+01

Pos logit contributions
 yours+0.108
 loser+0.100
 virtue+0.089
 fault+0.088
 policy+0.088
Neg logit contributions
ised-0.147
 early-0.145
ized-0.141
 wide-0.139
 cautiously-0.133
Token It
Token ID632
Feature activation+9.61e+00

Pos logit contributions
 yours+0.206
 policy+0.183
 loser+0.179
 fault+0.171
 virtue+0.170
Neg logit contributions
ised-0.262
 cautiously-0.261
 wide-0.250
 nervously-0.249
 early-0.247
Token would
Token ID561
Feature activation+6.69e+00

Pos logit contributions
 yours+0.226
 loser+0.218
 policy+0.202
 virtue+0.189
 fault+0.179
Neg logit contributions
ised-0.195
 wide-0.191
led-0.191
ized-0.188
 cautiously-0.184
Token make
Token ID787
Feature activation+3.67e+00

Pos logit contributions
 loser+0.145
 yours+0.141
inea+0.137
 policy+0.136
 fault+0.133
Neg logit contributions
 wide-0.142
 early-0.128
ized-0.127
ised-0.127
 tall-0.125
Token us
Token ID514
Feature activation+3.79e+00

Pos logit contributions
 yours+0.076
 loser+0.072
 policy+0.064
 deserve+0.063
 virtue+0.063
Neg logit contributions
ised-0.087
 early-0.084
ized-0.083
 wide-0.080
 cautiously-0.077
Token very
Token ID845
Feature activation+1.88e+00

Pos logit contributions
 yours+0.101
 loser+0.094
 policy+0.087
 virtue+0.087
 deserve+0.086
Neg logit contributions
 early-0.068
ised-0.067
ized-0.064
 wide-0.063
 widened-0.060
Token you
Token ID345
Feature activation+6.85e+00

Pos logit contributions
 yours+0.161
 loser+0.160
 virtue+0.158
 fault+0.151
 deserve+0.149
Neg logit contributions
 early-0.085
 wide-0.082
ised-0.081
ized-0.076
 walked-0.071
Token have
Token ID423
Feature activation+4.25e+00

Pos logit contributions
 yours+0.147
 policy+0.129
 occasions+0.126
 virtue+0.123
 Friends+0.117
Neg logit contributions
 wide-0.157
ised-0.148
ized-0.143
 widened-0.140
 early-0.140
Token no
Token ID645
Feature activation+3.21e+00

Pos logit contributions
 loser+0.111
 yours+0.110
 policy+0.108
osaurs+0.107
 Friends+0.106
Neg logit contributions
ised-0.063
 wide-0.063
 early-0.061
ized-0.060
 tall-0.056
Token pretty
Token ID2495
Feature activation+2.55e+00

Pos logit contributions
 yours+0.081
 loser+0.075
 virtue+0.074
 deserve+0.072
 Congratulations+0.071
Neg logit contributions
ised-0.058
 wide-0.057
 early-0.056
ized-0.054
 tall-0.053
Token bricks
Token ID28902
Feature activation+5.85e+00

Pos logit contributions
 yours+0.062
 loser+0.058
 virtue+0.056
 deserve+0.055
 depends+0.053
Neg logit contributions
ised-0.051
ized-0.048
 early-0.048
 wide-0.046
 widened-0.045
Token.
Token ID13
Feature activation+1.14e+01

Pos logit contributions
 yours+0.147
 loser+0.137
 virtue+0.133
 depends+0.127
 Friends+0.127
Neg logit contributions
 early-0.101
ised-0.100
ized-0.100
 wide-0.097
led-0.094
Token I
Token ID314
Feature activation+7.50e+00

Pos logit contributions
 yours+0.182
 virtue+0.167
 depends+0.165
 fault+0.164
 matter+0.160
Neg logit contributions
 She-0.240
 Then-0.237
ised-0.234
 The-0.233
 wide-0.232
Token win
Token ID1592
Feature activation+2.77e+00

Pos logit contributions
 yours+0.174
 loser+0.135
 policy+0.135
 Friends+0.130
 occasions+0.127
Neg logit contributions
 wide-0.159
 early-0.153
 widened-0.153
led-0.150
ised-0.150
Token,"
Token ID553
Feature activation-9.75e-01

Pos logit contributions
 yours+0.076
 loser+0.071
 policy+0.067
 virtue+0.067
 deserve+0.066
Neg logit contributions
ised-0.045
 early-0.044
ized-0.042
 wide-0.041
 widened-0.038
Token Tom
Token ID4186
Feature activation-2.29e+00

Pos logit contributions
ised+0.023
ized+0.021
led+0.020
 early+0.020
awed+0.020
Neg logit contributions
 yours-0.022
 loser-0.020
 deserve-0.018
 losers-0.018
 You-0.018
Token said
Token ID531
Feature activation-5.12e+00

Pos logit contributions
ised+0.040
 early+0.038
ized+0.037
 widened+0.033
awed+0.032
Neg logit contributions
 yours-0.059
 loser-0.057
 deserve-0.055
 costs-0.054
 depends-0.054
Token that
Token ID326
Feature activation+7.78e+00

Pos logit contributions
 yours+0.112
 loser+0.101
 virtue+0.089
 policy+0.089
 deserve+0.088
Neg logit contributions
ised-0.238
 early-0.233
 wide-0.231
ized-0.227
 cautiously-0.220
Token's
Token ID338
Feature activation+7.16e+00

Pos logit contributions
 yours+0.186
 loser+0.176
 virtue+0.166
 policy+0.160
 fault+0.154
Neg logit contributions
 wide-0.150
 early-0.148
ised-0.139
ized-0.137
 widened-0.133
Token not
Token ID407
Feature activation+5.67e+00

Pos logit contributions
 yours+0.184
 loser+0.182
 policy+0.169
 deserve+0.167
 Friends+0.163
Neg logit contributions
 early-0.121
ised-0.120
 wide-0.115
ized-0.110
 cautiously-0.105
Token a
Token ID257
Feature activation+5.81e+00

Pos logit contributions
 loser+0.141
 yours+0.138
 policy+0.132
 virtue+0.129
 Friends+0.128
Neg logit contributions
 early-0.095
 wide-0.092
ised-0.088
ized-0.083
 tall-0.082
Token veil
Token ID30615
Feature activation+6.49e+00

Pos logit contributions
 yours+0.144
 costs+0.140
osaurs+0.136
 Go+0.135
 deserve+0.132
Neg logit contributions
 wide-0.101
 early-0.094
 tall-0.094
 low-0.092
 quickly-0.088
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.194
 loser+0.177
 deserve+0.168
 virtue+0.165
 policy+0.163
Neg logit contributions
ised-0.095
 early-0.094
ized-0.090
 wide-0.085
 widened-0.084
Token That
Token ID1320
Feature activation+8.97e+00

Pos logit contributions
 yours+0.229
 virtue+0.208
 winner+0.197
 depends+0.194
 loser+0.192
Neg logit contributions
 wide-0.223
 early-0.217
ised-0.209
 She-0.208
 widened-0.207
Token's
Token ID338
Feature activation+7.64e+00

Pos logit contributions
 yours+0.212
 loser+0.197
 virtue+0.176
 policy+0.176
 losers+0.174
Neg logit contributions
 wide-0.180
 early-0.178
ised-0.163
ized-0.161
 widened-0.160
Token a
Token ID257
Feature activation+5.84e+00

Pos logit contributions
 loser+0.198
 yours+0.195
 policy+0.184
 deserve+0.179
 Friends+0.177
Neg logit contributions
 early-0.124
ised-0.122
 wide-0.119
ized-0.109
 cautiously-0.106
Token curtain
Token ID29461
Feature activation+6.47e+00

Pos logit contributions
 yours+0.167
 costs+0.161
 Go+0.160
osaurs+0.159
 Friends+0.154
Neg logit contributions
 wide-0.089
 tall-0.082
 early-0.079
 low-0.076
 high-0.066
Token,"
Token ID553
Feature activation-1.46e+00

Pos logit contributions
 yours+0.207
 loser+0.197
 deserve+0.185
 virtue+0.184
 losers+0.180
Neg logit contributions
 early-0.075
ised-0.070
ized-0.068
 wide-0.067
led-0.062
Token Max
Token ID5436
Feature activation+5.10e+00

Pos logit contributions
 virtue+0.142
 yours+0.137
 loser+0.132
 fault+0.130
 policy+0.127
Neg logit contributions
 wide-0.064
 slowly-0.052
,-0.052
 quickly-0.052
 and-0.051
Token.
Token ID13
Feature activation+1.04e+01

Pos logit contributions
 yours+0.197
 loser+0.183
 virtue+0.177
 losers+0.176
 deserve+0.175
Neg logit contributions
 early-0.026
ised-0.025
 wide-0.023
ized-0.019
 nervously-0.016
Token Math
Token ID16320
Feature activation+7.54e+00

Pos logit contributions
 yours+0.196
 virtue+0.166
 policy+0.162
 inconvenience+0.161
 possible+0.160
Neg logit contributions
 wide-0.225
 Then-0.204
ised-0.204
 low-0.203
 Suddenly-0.200
Token is
Token ID318
Feature activation+6.38e+00

Pos logit contributions
 yours+0.134
 loser+0.119
 deserve+0.107
 virtue+0.106
 policy+0.104
Neg logit contributions
ised-0.201
 early-0.196
ized-0.190
 wide-0.187
led-0.183
Token fun
Token ID1257
Feature activation+5.06e+00

Pos logit contributions
 loser+0.169
 yours+0.162
 policy+0.159
grand+0.159
 Friends+0.159
Neg logit contributions
 wide-0.102
 early-0.101
 tall-0.096
 low-0.089
 quickly-0.080
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.174
 loser+0.171
 deserve+0.164
inea+0.161
 virtue+0.160
Neg logit contributions
 early-0.047
ised-0.038
 wide-0.038
ized-0.033
 and-0.031
Token Look
Token ID6803
Feature activation+4.32e+00

Pos logit contributions
 yours+0.223
 virtue+0.193
 inconvenience+0.191
 belong+0.191
 fault+0.191
Neg logit contributions
 Then-0.229
 wide-0.227
ised-0.210
 Suddenly-0.207
 early-0.207
Token,
Token ID11
Feature activation+6.20e+00

Pos logit contributions
 yours+0.150
 loser+0.139
 deserve+0.133
 virtue+0.130
 policy+0.130
Neg logit contributions
ised-0.045
 early-0.041
ized-0.040
 widened-0.036
 wide-0.035
Token this
Token ID428
Feature activation+7.58e+00

Pos logit contributions
 yours+0.129
 virtue+0.119
 loser+0.112
 fault+0.110
 deserve+0.109
Neg logit contributions
 wide-0.139
 early-0.135
ised-0.134
 tall-0.127
 widened-0.126
Token is
Token ID318
Feature activation+5.88e+00

Pos logit contributions
 yours+0.147
 loser+0.141
 virtue+0.136
 policy+0.134
inea+0.128
Neg logit contributions
 early-0.170
 wide-0.170
 tall-0.161
 and-0.153
ised-0.150
Token how
Token ID703
Feature activation+7.25e+00

Pos logit contributions
 loser+0.148
 yours+0.143
 policy+0.138
 deserve+0.135
 Friends+0.134
Neg logit contributions
 early-0.103
 wide-0.101
 tall-0.092
 low-0.089
ised-0.089
Token The
Token ID383
Feature activation+4.20e+00

Pos logit contributions
 yours+0.188
 loser+0.163
 policy+0.158
 fault+0.154
 virtue+0.154
Neg logit contributions
 wide-0.197
ised-0.197
 early-0.184
 cautiously-0.184
 low-0.178
Token slide
Token ID10649
Feature activation+5.40e+00

Pos logit contributions
 yours+0.114
 deserve+0.099
 loser+0.099
osaurs+0.095
 virtue+0.095
Neg logit contributions
 wide-0.074
 early-0.072
 tall-0.068
ised-0.064
 low-0.063
Token is
Token ID318
Feature activation+3.87e+00

Pos logit contributions
 yours+0.100
 loser+0.088
 policy+0.077
 virtue+0.076
 deserve+0.075
Neg logit contributions
ised-0.141
 early-0.138
ized-0.137
 wide-0.133
led-0.132
Token not
Token ID407
Feature activation+4.56e+00

Pos logit contributions
 yours+0.104
 loser+0.095
 deserve+0.089
 virtue+0.086
 policy+0.086
Neg logit contributions
ised-0.069
 early-0.067
ized-0.065
 wide-0.062
 widened-0.062
Token real
Token ID1103
Feature activation+6.23e+00

Pos logit contributions
 yours+0.127
 loser+0.120
 deserve+0.111
 virtue+0.110
 policy+0.109
Neg logit contributions
ised-0.073
 early-0.072
ized-0.067
 wide-0.067
 widened-0.064
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.211
 loser+0.202
 virtue+0.197
 deserve+0.195
 losers+0.193
Neg logit contributions
 early-0.052
ised-0.052
 wide-0.049
ized-0.044
 curiously-0.042
Token It
Token ID632
Feature activation+8.85e+00

Pos logit contributions
 yours+0.326
 policy+0.294
 virtue+0.291
 loser+0.290
 losers+0.286
Neg logit contributions
 wide-0.142
ised-0.142
 cautiously-0.126
 early-0.120
 widened-0.115
Token's
Token ID338
Feature activation+5.33e+00

Pos logit contributions
 yours+0.210
 loser+0.208
 policy+0.184
 virtue+0.180
 inconvenience+0.169
Neg logit contributions
 wide-0.183
led-0.173
ised-0.168
ized-0.166
 slowly-0.164
Token just
Token ID655
Feature activation+5.81e+00

Pos logit contributions
 yours+0.138
 loser+0.137
 deserve+0.126
 policy+0.125
 Friends+0.124
Neg logit contributions
 early-0.088
ised-0.086
 wide-0.085
ized-0.079
 cautiously-0.077
Token a
Token ID257
Feature activation+4.41e+00

Pos logit contributions
 yours+0.175
 loser+0.165
 deserve+0.154
 virtue+0.152
 policy+0.152
Neg logit contributions
ised-0.081
 early-0.079
ized-0.075
 wide-0.073
 widened-0.070
Token lie
Token ID6486
Feature activation+4.77e+00

Pos logit contributions
 yours+0.112
 Go+0.100
 Friends+0.099
 costs+0.099
osaurs+0.098
Neg logit contributions
 wide-0.083
 early-0.078
 tall-0.077
 low-0.075
 slowly-0.065
Token back
Token ID736
Feature activation-3.55e+00

Pos logit contributions
ised+0.081
ized+0.073
awed+0.067
 early+0.065
 widened+0.064
Neg logit contributions
 yours-0.127
 loser-0.114
 You-0.108
 Go-0.108
 losers-0.106
Token,
Token ID11
Feature activation-3.33e+00

Pos logit contributions
ized+0.058
 widened+0.058
ised+0.057
led+0.052
awed+0.051
Neg logit contributions
 yours-0.099
 loser-0.091
 You-0.083
 will-0.083
 Go-0.082
Token "
Token ID366
Feature activation+5.40e+00

Pos logit contributions
ised+0.072
ized+0.067
led+0.064
awed+0.062
 early+0.061
Neg logit contributions
 yours-0.078
 loser-0.069
 You-0.069
 deserve-0.065
 losers-0.064
TokenGo
Token ID5247
Feature activation+3.73e+00

Pos logit contributions
 yours+0.135
 policy+0.115
 fault+0.115
 loser+0.110
 virtue+0.108
Neg logit contributions
ised-0.114
ized-0.111
 widened-0.106
awed-0.104
 wide-0.103
Token away
Token ID1497
Feature activation+2.66e+00

Pos logit contributions
 yours+0.104
 loser+0.094
 deserve+0.089
 policy+0.087
 virtue+0.086
Neg logit contributions
ised-0.064
 early-0.061
ized-0.060
 wide-0.057
 widened-0.056
Token!
Token ID0
Feature activation+1.13e+01

Pos logit contributions
 yours+0.055
 loser+0.049
 deserve+0.044
 virtue+0.043
 losers+0.042
Neg logit contributions
ised-0.065
ized-0.062
 early-0.061
 widened-0.061
awed-0.059
Token I
Token ID314
Feature activation+8.30e+00

Pos logit contributions
 yours+0.224
 virtue+0.195
 policy+0.187
 fault+0.179
 occasions+0.178
Neg logit contributions
 wide-0.242
ised-0.242
 walked-0.239
 excited-0.223
 cautiously-0.221
Token'm
Token ID1101
Feature activation+4.28e+00

Pos logit contributions
 yours+0.163
 loser+0.134
 policy+0.122
 virtue+0.120
 Friends+0.119
Neg logit contributions
 early-0.203
 wide-0.202
ised-0.200
 widened-0.197
ized-0.196
Token not
Token ID407
Feature activation+4.48e+00

Pos logit contributions
 yours+0.117
 loser+0.108
 deserve+0.103
 virtue+0.101
 policy+0.101
Neg logit contributions
ised-0.070
 early-0.070
ized-0.068
 wide-0.067
 widened-0.063
Token scared
Token ID12008
Feature activation+5.86e+00

Pos logit contributions
 yours+0.122
 loser+0.116
 Friends+0.111
 virtue+0.111
 costs+0.111
Neg logit contributions
 early-0.068
 wide-0.065
ised-0.064
ized-0.062
 tall-0.058
Token of
Token ID286
Feature activation+5.29e+00

Pos logit contributions
 yours+0.126
 loser+0.108
 depends+0.107
osaurs+0.105
 costs+0.105
Neg logit contributions
 early-0.131
ised-0.131
ized-0.125
 wide-0.125
 nervously-0.117
Token "
Token ID366
Feature activation+5.97e+00

Pos logit contributions
ised+0.030
ized+0.027
awed+0.026
 early+0.026
led+0.026
Neg logit contributions
 yours-0.036
 loser-0.032
 You-0.031
 deserve-0.030
 Friends-0.029
TokenDon
Token ID3987
Feature activation+7.47e+00

Pos logit contributions
 yours+0.159
 loser+0.140
 fault+0.133
 policy+0.132
 virtue+0.127
Neg logit contributions
ised-0.119
ized-0.115
 early-0.104
awed-0.101
led-0.100
Token't
Token ID470
Feature activation+5.08e+00

Pos logit contributions
 yours+0.136
 loser+0.120
 fault+0.111
osaurs+0.103
 virtue+0.102
Neg logit contributions
ised-0.203
ized-0.194
 early-0.193
 wide-0.191
 widened-0.181
Token be
Token ID307
Feature activation+2.42e+00

Pos logit contributions
 yours+0.134
 loser+0.121
 virtue+0.118
inea+0.117
osaurs+0.117
Neg logit contributions
ised-0.089
 wide-0.087
 early-0.087
ized-0.086
 curiously-0.080
Token ashamed
Token ID22461
Feature activation+5.25e+00

Pos logit contributions
 yours+0.074
 loser+0.068
 deserve+0.064
 losers+0.062
 virtue+0.062
Neg logit contributions
ised-0.036
 early-0.034
ized-0.034
 widened-0.031
awed-0.030
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.177
 loser+0.165
osaurs+0.162
 depends+0.161
 deserve+0.158
Neg logit contributions
 early-0.055
ised-0.053
ized-0.047
 wide-0.045
 curiously-0.043
Token You
Token ID921
Feature activation+7.86e+00

Pos logit contributions
 yours+0.201
 loser+0.173
 policy+0.162
 rule+0.161
 virtue+0.153
Neg logit contributions
ised-0.279
 wide-0.249
ized-0.247
 early-0.247
 nodded-0.244
Token succeeded
Token ID14131
Feature activation+2.62e+00

Pos logit contributions
 yours+0.161
 loser+0.138
 policy+0.137
 occasions+0.133
 virtue+0.128
Neg logit contributions
ised-0.185
 wide-0.185
ized-0.176
 quickly-0.174
led-0.169
Token in
Token ID287
Feature activation+3.08e+00

Pos logit contributions
 yours+0.079
 loser+0.073
 deserve+0.069
 virtue+0.067
 policy+0.067
Neg logit contributions
ised-0.039
 early-0.037
ized-0.036
 widened-0.033
 wide-0.033
Token playing
Token ID2712
Feature activation+2.88e+00

Pos logit contributions
 yours+0.079
 loser+0.073
 deserve+0.068
 virtue+0.067
 depends+0.067
Neg logit contributions
ised-0.057
 early-0.055
ized-0.054
 wide-0.051
 widened-0.049
Token beautifully
Token ID21104
Feature activation+1.35e+00

Pos logit contributions
 yours+0.084
 loser+0.077
 deserve+0.073
 policy+0.072
 virtue+0.071
Neg logit contributions
ised-0.046
 early-0.043
ized-0.043
 wide-0.039
 widened-0.038
Token shouted
Token ID17293
Feature activation-4.10e+00

Pos logit contributions
ised+0.029
awed+0.028
ized+0.026
 early+0.026
overed+0.024
Neg logit contributions
 yours-0.043
 will-0.040
 depends-0.039
 you-0.038
 deserve-0.038
Token,
Token ID11
Feature activation-4.01e+00

Pos logit contributions
ised+0.063
ized+0.058
awed+0.055
 early+0.052
 widened+0.052
Neg logit contributions
 yours-0.123
 loser-0.110
 You-0.103
 Go-0.102
 virtue-0.102
Token "
Token ID366
Feature activation+5.33e+00

Pos logit contributions
ised+0.091
ized+0.086
led+0.082
awed+0.081
 early+0.079
Neg logit contributions
 yours-0.084
 You-0.075
 loser-0.074
 I-0.068
 deserve-0.068
TokenGo
Token ID5247
Feature activation+3.63e+00

Pos logit contributions
 yours+0.134
 loser+0.114
 fault+0.112
 policy+0.112
 depends+0.107
Neg logit contributions
ised-0.113
ized-0.107
 early-0.102
 widened-0.102
 wide-0.100
Token away
Token ID1497
Feature activation+2.49e+00

Pos logit contributions
 yours+0.105
 loser+0.096
 deserve+0.090
 policy+0.089
 depends+0.089
Neg logit contributions
ised-0.058
 early-0.056
ized-0.054
 wide-0.051
 cautiously-0.050
Token!
Token ID0
Feature activation+1.13e+01

Pos logit contributions
 yours+0.062
 loser+0.055
 deserve+0.052
 virtue+0.050
 policy+0.049
Neg logit contributions
ised-0.051
ized-0.049
 early-0.048
 widened-0.047
awed-0.046
Token I
Token ID314
Feature activation+8.03e+00

Pos logit contributions
 yours+0.216
 policy+0.183
 virtue+0.177
 fault+0.173
 rule+0.173
Neg logit contributions
ised-0.259
 wide-0.251
 early-0.243
 cautiously-0.243
 walked-0.240
Token have
Token ID423
Feature activation+4.67e+00

Pos logit contributions
 yours+0.140
 loser+0.114
 policy+0.101
 Friends+0.097
 virtue+0.096
Neg logit contributions
 early-0.216
ised-0.215
 wide-0.212
 widened-0.208
ized-0.207
Token a
Token ID257
Feature activation+2.23e+00

Pos logit contributions
 yours+0.128
 loser+0.125
 policy+0.115
 Friends+0.113
 deserve+0.113
Neg logit contributions
ised-0.072
 early-0.069
 wide-0.068
ized-0.068
 cautiously-0.062
Token magical
Token ID10883
Feature activation+1.65e+00

Pos logit contributions
 yours+0.058
 Go+0.051
 costs+0.050
 deserve+0.050
 loser+0.050
Neg logit contributions
 wide-0.043
 early-0.041
 tall-0.038
 low-0.038
 nervously-0.038
Token shield
Token ID7614
Feature activation+6.31e+00

Pos logit contributions
 yours+0.038
 loser+0.034
 deserve+0.033
 costs+0.031
 losers+0.031
Neg logit contributions
ised-0.035
 early-0.035
 wide-0.033
ized-0.032
 cautiously-0.032
Token.
Token ID13
Feature activation+1.07e+01

Pos logit contributions
 yours+0.109
 loser+0.100
 deserve+0.096
 virtue+0.093
 losers+0.092
Neg logit contributions
ised-0.048
 early-0.045
ized-0.044
 widened-0.041
 wide-0.040
Token The
Token ID383
Feature activation+3.97e+00

Pos logit contributions
 yours+0.170
 virtue+0.150
 deserve+0.139
 winner+0.137
 depends+0.136
Neg logit contributions
 early-0.245
ised-0.245
 wide-0.243
 She-0.234
 excited-0.233
Token flower
Token ID15061
Feature activation+7.01e+00

Pos logit contributions
 yours+0.113
 deserve+0.105
grand+0.101
 virtue+0.100
 loser+0.099
Neg logit contributions
 wide-0.062
 early-0.058
 tall-0.055
ised-0.052
 low-0.051
Token is
Token ID318
Feature activation+4.88e+00

Pos logit contributions
 yours+0.134
 loser+0.122
 virtue+0.108
 deserve+0.103
 policy+0.103
Neg logit contributions
ised-0.172
 early-0.171
ized-0.170
led-0.169
 wide-0.169
Token boring
Token ID14262
Feature activation+4.22e+00

Pos logit contributions
 yours+0.125
 loser+0.119
 deserve+0.110
 policy+0.108
 virtue+0.108
Neg logit contributions
 early-0.086
ised-0.084
 wide-0.083
ized-0.080
 widened-0.077
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.137
 loser+0.127
 deserve+0.121
 virtue+0.119
 losers+0.118
Neg logit contributions
ised-0.052
 early-0.049
ized-0.047
 wide-0.043
 widened-0.043
Token My
Token ID2011
Feature activation+4.71e+00

Pos logit contributions
 yours+0.224
 virtue+0.199
 winner+0.192
 inconvenience+0.188
 depends+0.185
Neg logit contributions
 wide-0.229
ised-0.223
 early-0.219
 cautiously-0.211
 and-0.209
Token gear
Token ID7733
Feature activation+7.35e+00

Pos logit contributions
 yours+0.116
 deserve+0.104
 loser+0.101
 losers+0.097
 virtue+0.096
Neg logit contributions
 early-0.093
 wide-0.090
ised-0.088
 curiously-0.081
 nervously-0.080
Token is
Token ID318
Feature activation+5.16e+00

Pos logit contributions
 yours+0.141
 loser+0.128
 virtue+0.113
 deserve+0.110
 policy+0.109
Neg logit contributions
ised-0.179
 early-0.178
ized-0.177
 wide-0.174
led-0.173
Token better
Token ID1365
Feature activation+5.53e+00

Pos logit contributions
 yours+0.137
 loser+0.129
 deserve+0.121
 policy+0.117
 virtue+0.117
Neg logit contributions
 early-0.087
ised-0.086
 wide-0.083
ized-0.081
 widened-0.078
Token.
Token ID13
Feature activation+1.09e+01

Pos logit contributions
 yours+0.157
 loser+0.153
 deserve+0.145
 losers+0.144
 depends+0.142
Neg logit contributions
 early-0.082
ised-0.074
 wide-0.071
ized-0.069
led-0.064
Token head
Token ID1182
Feature activation-3.18e+00

Pos logit contributions
ised+0.072
ized+0.070
 early+0.067
awed+0.066
 widened+0.066
Neg logit contributions
 yours-0.060
 loser-0.055
 virtue-0.048
 deserve-0.048
 rule-0.047
Token.
Token ID13
Feature activation+2.22e+00

Pos logit contributions
ised+0.063
ized+0.056
 early+0.055
awed+0.054
 widened+0.051
Neg logit contributions
 yours-0.078
 loser-0.071
 winner-0.065
 will-0.065
 fault-0.064
Token "
Token ID366
Feature activation+6.02e+00

Pos logit contributions
 yours+0.057
 loser+0.051
 deserve+0.050
 You+0.050
 Congratulations+0.048
Neg logit contributions
ised-0.041
ized-0.040
led-0.038
 early-0.037
awed-0.036
TokenNo
Token ID2949
Feature activation+4.78e+00

Pos logit contributions
 yours+0.155
 loser+0.136
 policy+0.130
 fault+0.130
 depends+0.128
Neg logit contributions
ised-0.120
ized-0.113
 early-0.106
 wide-0.103
 widened-0.101
Token way
Token ID835
Feature activation+5.65e+00

Pos logit contributions
 yours+0.138
 loser+0.129
 deserve+0.124
 virtue+0.120
 losers+0.119
Neg logit contributions
ised-0.071
 early-0.069
 wide-0.065
ized-0.064
 cautiously-0.060
Token!
Token ID0
Feature activation+1.13e+01

Pos logit contributions
 yours+0.200
 loser+0.193
 deserve+0.184
 depends+0.183
 virtue+0.181
Neg logit contributions
ised-0.047
 early-0.044
 wide-0.040
ized-0.040
 cautiously-0.036
Token I
Token ID314
Feature activation+7.66e+00

Pos logit contributions
 yours+0.209
 policy+0.171
 belong+0.159
 loser+0.158
 rule+0.154
Neg logit contributions
ised-0.283
 wide-0.253
 early-0.252
 cautiously-0.250
 nervously-0.243
Token'm
Token ID1101
Feature activation+3.27e+00

Pos logit contributions
 yours+0.144
 loser+0.115
 policy+0.103
 losers+0.098
 Friends+0.096
Neg logit contributions
ised-0.203
 early-0.201
 wide-0.195
 widened-0.195
ized-0.193
Token being
Token ID852
Feature activation+3.45e+00

Pos logit contributions
 yours+0.088
 loser+0.080
 deserve+0.075
 policy+0.073
 virtue+0.073
Neg logit contributions
ised-0.059
 early-0.057
ized-0.056
 wide-0.053
 widened-0.052
Token careful
Token ID8161
Feature activation+3.88e+00

Pos logit contributions
 yours+0.105
 loser+0.097
 policy+0.094
 deserve+0.093
 depends+0.091
Neg logit contributions
ised-0.049
ized-0.046
 early-0.046
 wide-0.044
led-0.040
Token and
Token ID290
Feature activation+5.55e+00

Pos logit contributions
 yours+0.122
 loser+0.113
 deserve+0.107
 depends+0.104
 virtue+0.104
Neg logit contributions
ised-0.052
 early-0.050
ized-0.048
 wide-0.044
 widened-0.044
TokenDon
Token ID3987
Feature activation+7.64e+00

Pos logit contributions
 yours+0.156
 loser+0.135
 policy+0.128
 fault+0.123
 depends+0.122
Neg logit contributions
ised-0.137
ized-0.132
 early-0.124
awed-0.118
 cautiously-0.116
Token't
Token ID470
Feature activation+5.48e+00

Pos logit contributions
 yours+0.150
 loser+0.128
 fault+0.120
osaurs+0.116
 policy+0.112
Neg logit contributions
ised-0.201
ized-0.189
 early-0.186
 wide-0.183
 cautiously-0.178
Token be
Token ID307
Feature activation+2.91e+00

Pos logit contributions
 yours+0.134
 loser+0.121
 depends+0.114
 virtue+0.114
osaurs+0.113
Neg logit contributions
ised-0.107
 early-0.106
 wide-0.102
ized-0.101
 cautiously-0.096
Token too
Token ID1165
Feature activation+3.07e+00

Pos logit contributions
 yours+0.086
 loser+0.078
 deserve+0.074
 losers+0.073
 virtue+0.072
Neg logit contributions
ised-0.046
ized-0.044
 early-0.043
 widened-0.041
awed-0.040
Token mad
Token ID8805
Feature activation+4.34e+00

Pos logit contributions
 yours+0.081
 loser+0.075
 deserve+0.070
 virtue+0.068
 losers+0.068
Neg logit contributions
ised-0.055
 widened-0.054
awed-0.054
ized-0.053
led-0.052
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.144
 loser+0.133
 deserve+0.127
 virtue+0.124
 depends+0.124
Neg logit contributions
ised-0.051
 early-0.048
ized-0.046
 widened-0.042
 wide-0.041
Token I
Token ID314
Feature activation+7.35e+00

Pos logit contributions
 yours+0.201
 loser+0.167
 policy+0.163
 belong+0.150
 rule+0.149
Neg logit contributions
ised-0.296
 early-0.263
ized-0.263
 cautiously-0.260
 nervously-0.259
Token'm
Token ID1101
Feature activation+3.56e+00

Pos logit contributions
 yours+0.132
 loser+0.103
 policy+0.091
 losers+0.088
 virtue+0.082
Neg logit contributions
ised-0.200
 early-0.199
 wide-0.192
ized-0.191
 nervously-0.191
Token glad
Token ID9675
Feature activation+5.67e+00

Pos logit contributions
 yours+0.089
 loser+0.081
 deserve+0.076
 policy+0.074
 virtue+0.074
Neg logit contributions
ised-0.070
 early-0.068
ized-0.067
 wide-0.063
 widened-0.062
Token you
Token ID345
Feature activation+5.98e+00

Pos logit contributions
 loser+0.072
 yours+0.072
 deserve+0.058
osaurs+0.058
 virtue+0.057
Neg logit contributions
 early-0.174
ised-0.172
ized-0.167
 wide-0.162
 cautiously-0.161
Token followed
Token ID3940
Feature activation+6.82e-01

Pos logit contributions
 yours+0.124
 loser+0.109
 policy+0.101
 virtue+0.096
 fault+0.095
Neg logit contributions
ised-0.141
 early-0.137
ized-0.137
 widened-0.136
 wide-0.135
Token said
Token ID531
Feature activation-1.32e+00

Pos logit contributions
 yours+0.079
 loser+0.072
 deserve+0.070
 depends+0.067
 losers+0.066
Neg logit contributions
ised-0.052
 early-0.051
ized-0.049
 widened-0.045
awed-0.045
Token,
Token ID11
Feature activation-1.65e+00

Pos logit contributions
ised+0.019
 early+0.017
ized+0.017
awed+0.017
 widened+0.016
Neg logit contributions
 yours-0.041
 loser-0.037
 losers-0.034
 deserve-0.034
 virtue-0.034
Token "
Token ID366
Feature activation+6.48e+00

Pos logit contributions
ised+0.036
ized+0.035
awed+0.033
led+0.032
 early+0.032
Neg logit contributions
 yours-0.037
 You-0.033
 loser-0.032
 Friends-0.030
 teaches-0.030
TokenGo
Token ID5247
Feature activation+4.10e+00

Pos logit contributions
 yours+0.166
 fault+0.145
 loser+0.142
 policy+0.141
inea+0.134
Neg logit contributions
ised-0.140
ized-0.136
awed-0.115
 wide-0.114
 sudden-0.112
Token away
Token ID1497
Feature activation+2.73e+00

Pos logit contributions
 yours+0.120
 loser+0.111
 policy+0.104
 depends+0.104
 deserve+0.104
Neg logit contributions
ised-0.061
 early-0.060
ized-0.058
 wide-0.056
 cautiously-0.054
Token!
Token ID0
Feature activation+1.13e+01

Pos logit contributions
 yours+0.083
 loser+0.076
 deserve+0.072
 virtue+0.070
 policy+0.070
Neg logit contributions
ised-0.041
ized-0.038
 early-0.037
 widened-0.036
awed-0.035
Token I
Token ID314
Feature activation+8.11e+00

Pos logit contributions
 yours+0.195
 policy+0.166
 fault+0.166
 virtue+0.163
 rule+0.155
Neg logit contributions
ised-0.273
 walked-0.253
 wide-0.252
ized-0.244
 cautiously-0.240
Token don
Token ID836
Feature activation+5.60e+00

Pos logit contributions
 yours+0.155
 loser+0.128
 policy+0.115
 losers+0.111
 virtue+0.109
Neg logit contributions
 wide-0.202
 early-0.201
ised-0.201
ized-0.197
 widened-0.196
Token't
Token ID470
Feature activation+5.69e+00

Pos logit contributions
 yours+0.091
 loser+0.077
 virtue+0.068
 fault+0.068
 policy+0.066
Neg logit contributions
ised-0.160
ized-0.157
 early-0.155
 wide-0.152
 widened-0.148
Token want
Token ID765
Feature activation+5.37e+00

Pos logit contributions
 yours+0.121
 loser+0.109
 costs+0.108
 policy+0.107
 fault+0.107
Neg logit contributions
 wide-0.118
 early-0.116
 low-0.108
 quickly-0.108
ised-0.107
Token to
Token ID284
Feature activation+4.77e+00

Pos logit contributions
 loser+0.174
 virtue+0.167
osaurs+0.166
 Driver+0.160
 depends+0.158
Neg logit contributions
 wide-0.062
 early-0.055
 tall-0.053
 slowly-0.049
 the-0.048
Token beetle
Token ID45489
Feature activation+5.02e+00

Pos logit contributions
 yours+0.141
 loser+0.132
 virtue+0.132
 policy+0.127
 depends+0.127
Neg logit contributions
 early-0.083
 wide-0.082
ised-0.078
ized-0.078
 curiously-0.074
Token!
Token ID0
Feature activation+1.10e+01

Pos logit contributions
 yours+0.115
 loser+0.105
 deserve+0.095
 virtue+0.094
 policy+0.094
Neg logit contributions
ised-0.107
 early-0.104
ized-0.104
 wide-0.099
led-0.098
Token You
Token ID921
Feature activation+9.27e+00

Pos logit contributions
 yours+0.205
 virtue+0.176
 policy+0.176
 occasions+0.164
 rule+0.160
Neg logit contributions
ised-0.253
 wide-0.248
awed-0.239
lee-0.237
 symp-0.236
Token are
Token ID389
Feature activation+5.33e+00

Pos logit contributions
 yours+0.200
 policy+0.176
 loser+0.174
 virtue+0.170
 occasions+0.168
Neg logit contributions
 wide-0.215
ised-0.204
ized-0.200
 widened-0.197
led-0.192
Token gross
Token ID10319
Feature activation+3.63e+00

Pos logit contributions
grand+0.151
 policy+0.149
 occasions+0.149
 loser+0.147
osaurs+0.146
Neg logit contributions
 wide-0.077
 tall-0.070
ized-0.061
 early-0.060
 excited-0.059
Token!
Token ID0
Feature activation+1.13e+01

Pos logit contributions
 yours+0.079
 loser+0.069
 deserve+0.063
 losers+0.061
 policy+0.060
Neg logit contributions
ised-0.085
 early-0.081
ized-0.081
 widened-0.079
 wide-0.076
Token I
Token ID314
Feature activation+8.11e+00

Pos logit contributions
 yours+0.184
 virtue+0.157
 code+0.154
 belong+0.149
 policy+0.145
Neg logit contributions
lee-0.258
ised-0.255
 excited-0.255
 wide-0.255
 Suddenly-0.250
Token hate
Token ID5465
Feature activation+4.72e+00

Pos logit contributions
 yours+0.195
 loser+0.167
 policy+0.153
 virtue+0.149
 Friends+0.146
Neg logit contributions
led-0.161
 wide-0.160
 widened-0.160
 early-0.158
ised-0.158
Token you
Token ID345
Feature activation+4.79e+00

Pos logit contributions
 yours+0.106
 loser+0.098
 deserve+0.093
 virtue+0.092
 policy+0.090
Neg logit contributions
 early-0.098
ised-0.098
ized-0.097
 wide-0.095
 cautiously-0.089
Token!"
Token ID2474
Feature activation-1.78e+00

Pos logit contributions
 yours+0.092
 loser+0.078
 deserve+0.071
 losers+0.067
 virtue+0.067
Neg logit contributions
ised-0.127
ized-0.122
 early-0.121
 widened-0.118
awed-0.116
Token Lily
Token ID20037
Feature activation-3.42e+00

Pos logit contributions
ised+0.038
ized+0.037
 early+0.035
awed+0.035
 widened+0.035
Neg logit contributions
 yours-0.040
 loser-0.036
 deserve-0.034
 You-0.033
 you-0.033
Token bad
Token ID2089
Feature activation+5.52e+00

Pos logit contributions
 occasions+0.157
rav+0.153
 policy+0.152
 virtue+0.149
 loser+0.148
Neg logit contributions
 wide-0.058
 tall-0.055
 early-0.050
 excited-0.049
 behind-0.047
Token kids
Token ID3988
Feature activation+4.92e+00

Pos logit contributions
 yours+0.158
 deserve+0.138
 virtue+0.136
 loser+0.135
 belong+0.133
Neg logit contributions
 early-0.089
ised-0.086
ized-0.076
led-0.075
 wide-0.074
Token.
Token ID13
Feature activation+1.01e+01

Pos logit contributions
 yours+0.147
 loser+0.136
 virtue+0.128
 winner+0.127
 deserve+0.126
Neg logit contributions
 early-0.068
ised-0.066
ized-0.065
 wide-0.061
led-0.060
Token Go
Token ID1514
Feature activation+4.17e+00

Pos logit contributions
 yours+0.171
 virtue+0.158
 winner+0.152
 favor+0.141
 fault+0.139
Neg logit contributions
 excited-0.223
 early-0.222
 wide-0.216
 Then-0.214
 The-0.213
Token away
Token ID1497
Feature activation+2.72e+00

Pos logit contributions
 yours+0.133
 loser+0.119
 fault+0.115
 policy+0.114
 depends+0.112
Neg logit contributions
ised-0.054
 early-0.053
 wide-0.050
ized-0.049
 slowly-0.045
Token.
Token ID13
Feature activation+1.13e+01

Pos logit contributions
 yours+0.057
 loser+0.048
 deserve+0.046
 virtue+0.044
 losers+0.043
Neg logit contributions
ised-0.066
 early-0.066
ized-0.062
 wide-0.060
 widened-0.059
Token Go
Token ID1514
Feature activation+4.57e+00

Pos logit contributions
 yours+0.208
 virtue+0.192
 winner+0.174
 fault+0.171
 possible+0.167
Neg logit contributions
 excited-0.237
 The-0.236
 walked-0.235
 wide-0.234
 early-0.225
Token away
Token ID1497
Feature activation+2.95e+00

Pos logit contributions
 yours+0.146
 loser+0.131
 fault+0.128
 policy+0.127
 depends+0.125
Neg logit contributions
ised-0.058
 early-0.056
 wide-0.053
ized-0.052
 slowly-0.049
Token now
Token ID783
Feature activation+5.05e+00

Pos logit contributions
 yours+0.061
 loser+0.051
 deserve+0.049
 virtue+0.047
 costs+0.046
Neg logit contributions
ised-0.073
 early-0.072
ized-0.069
 wide-0.067
 widened-0.065
Token."
Token ID526
Feature activation+2.89e-01

Pos logit contributions
 yours+0.119
 loser+0.105
 deserve+0.098
 policy+0.098
 virtue+0.097
Neg logit contributions
 early-0.106
ised-0.106
ized-0.101
 wide-0.098
 widened-0.094
Token
Token ID198
Feature activation-2.86e+00

Pos logit contributions
 yours+0.008
 loser+0.008
 deserve+0.007
 virtue+0.007
 policy+0.007
Neg logit contributions
ised-0.005
 early-0.004
ized-0.004
 widened-0.004
 wide-0.004
Token't
Token ID470
Feature activation+5.18e+00

Pos logit contributions
 yours+0.076
 loser+0.062
 virtue+0.055
 fault+0.055
 deserve+0.053
Neg logit contributions
ised-0.145
ized-0.140
 early-0.139
 wide-0.137
 widened-0.135
Token hurt
Token ID5938
Feature activation+3.34e+00

Pos logit contributions
 yours+0.134
 loser+0.118
 fault+0.115
 virtue+0.115
 costs+0.110
Neg logit contributions
 wide-0.097
ised-0.097
 early-0.094
 cautiously-0.092
ized-0.092
Token me
Token ID502
Feature activation+4.23e+00

Pos logit contributions
 yours+0.080
 virtue+0.074
 loser+0.073
 policy+0.073
 deserve+0.072
Neg logit contributions
ised-0.066
ized-0.063
 wide-0.062
 early-0.060
 cautiously-0.059
Token!"
Token ID2474
Feature activation-1.54e+00

Pos logit contributions
 yours+0.088
 loser+0.077
 deserve+0.071
 virtue+0.068
 losers+0.067
Neg logit contributions
ised-0.103
 early-0.099
ized-0.098
 widened-0.095
 wide-0.093
Token sque
Token ID35776
Feature activation-3.39e+00

Pos logit contributions
ised+0.022
 early+0.021
ized+0.020
awed+0.020
 widened+0.019
Neg logit contributions
 yours-0.045
 loser-0.040
 deserve-0.040
 depends-0.039
 You-0.039
Tokenaked
Token ID4335
Feature activation-8.96e+00

Pos logit contributions
ised+0.062
 early+0.062
ized+0.061
 widened+0.058
 wide+0.057
Neg logit contributions
 yours-0.089
 loser-0.081
 deserve-0.076
 virtue-0.072
 depends-0.072
Token the
Token ID262
Feature activation-1.86e-01

Pos logit contributions
ised+0.131
 early+0.119
ized+0.119
awed+0.117
 widened+0.115
Neg logit contributions
 yours-0.261
 loser-0.239
 losers-0.225
 deserve-0.221
 Go-0.216
Token rabbit
Token ID22746
Feature activation-4.03e+00

Pos logit contributions
 wide+0.004
ised+0.004
ized+0.004
 early+0.004
 tall+0.004
Neg logit contributions
 yours-0.004
 policy-0.004
 fault-0.004
 loser-0.004
 virtue-0.004
Token.
Token ID13
Feature activation-2.47e-01

Pos logit contributions
ised+0.083
 early+0.080
ized+0.077
awed+0.075
 widened+0.074
Neg logit contributions
 yours-0.082
 loser-0.078
 deserve-0.076
 you-0.074
 depends-0.072
Token
Token ID198
Feature activation-4.55e+00

Pos logit contributions
ised+0.004
ized+0.004
 early+0.004
led+0.004
 widened+0.004
Neg logit contributions
 yours-0.007
 loser-0.006
 deserve-0.006
 depends-0.006
 You-0.006
Token
Token ID198
Feature activation-3.50e+00

Pos logit contributions
ised+0.076
ized+0.072
 early+0.071
 widened+0.068
led+0.066
Neg logit contributions
 yours-0.129
 loser-0.116
 deserve-0.111
 virtue-0.107
 policy-0.106
Token Stop
Token ID13707
Feature activation+4.20e+00

Pos logit contributions
 yours+0.173
 virtue+0.140
 policy+0.137
 deserve+0.131
 belong+0.129
Neg logit contributions
ised-0.204
awed-0.190
 wide-0.190
 nervously-0.190
 early-0.189
Token!"
Token ID2474
Feature activation-1.66e+00

Pos logit contributions
 yours+0.118
 loser+0.107
 deserve+0.102
 virtue+0.100
 policy+0.100
Neg logit contributions
ised-0.070
 early-0.067
ized-0.066
 wide-0.062
 widened-0.062
Token a
Token ID257
Feature activation-4.10e+00

Pos logit contributions
ised+0.029
 early+0.028
ized+0.027
 widened+0.026
awed+0.026
Neg logit contributions
 yours-0.045
 loser-0.041
 deserve-0.039
 losers-0.037
 depends-0.037
Token mouse
Token ID10211
Feature activation-4.73e+00

Pos logit contributions
ised+0.078
ized+0.075
 early+0.074
awed+0.071
led+0.070
Neg logit contributions
 yours-0.102
 loser-0.097
 deserve-0.089
 losers-0.086
 virtue-0.086
Token sque
Token ID35776
Feature activation-3.29e+00

Pos logit contributions
ised+0.087
 early+0.086
ized+0.079
awed+0.073
side+0.072
Neg logit contributions
 yours-0.111
 loser-0.106
 deserve-0.103
 Go-0.103
 you-0.099
Tokenaked
Token ID4335
Feature activation-8.70e+00

Pos logit contributions
ised+0.062
 early+0.060
ized+0.059
 widened+0.056
 wide+0.055
Neg logit contributions
 yours-0.085
 loser-0.080
 deserve-0.074
 losers-0.071
 depends-0.070
Token,
Token ID11
Feature activation-4.80e+00

Pos logit contributions
ised+0.152
awed+0.149
ized+0.138
 widened+0.134
 early+0.133
Neg logit contributions
 yours-0.194
 loser-0.183
 Go-0.174
 you-0.173
 You-0.171
Token as
Token ID355
Feature activation-8.96e-01

Pos logit contributions
ised+0.101
ized+0.090
 early+0.084
led+0.082
side+0.081
Neg logit contributions
 yours-0.105
 You-0.096
 losers-0.094
 loser-0.091
 deserve-0.090
Token it
Token ID340
Feature activation+6.85e-01

Pos logit contributions
ised+0.013
ized+0.012
awed+0.011
 early+0.011
 widened+0.011
Neg logit contributions
 yours-0.028
 loser-0.026
 deserve-0.024
 losers-0.024
 policy-0.023
Token ran
Token ID4966
Feature activation-4.37e+00

Pos logit contributions
 will+0.015
 yours+0.015
 loser+0.015
 teaches+0.015
 deserve+0.015
Neg logit contributions
ised-0.013
 early-0.013
ized-0.012
 curiously-0.010
lee-0.010
Token out
Token ID503
Feature activation-6.46e+00

Pos logit contributions
ised+0.090
ized+0.084
awed+0.079
 widened+0.077
 early+0.077
Neg logit contributions
 yours-0.109
 loser-0.097
 deserve-0.091
 losers-0.089
 virtue-0.087
Token the
Token ID262
Feature activation+6.47e-01

Pos logit contributions
 yours+0.013
 loser+0.013
 deserve+0.012
 virtue+0.012
 depends+0.012
Neg logit contributions
ised-0.006
 early-0.006
ized-0.006
 wide-0.006
 cautiously-0.005
Token king
Token ID5822
Feature activation+6.82e-01

Pos logit contributions
 yours+0.016
 loser+0.014
 deserve+0.014
 virtue+0.013
 policy+0.013
Neg logit contributions
 early-0.012
 wide-0.012
ised-0.012
ized-0.011
 tall-0.011
Token.
Token ID13
Feature activation+7.61e-01

Pos logit contributions
 yours+0.021
 loser+0.020
 deserve+0.019
 virtue+0.019
 losers+0.018
Neg logit contributions
ised-0.009
 early-0.009
ized-0.008
 wide-0.008
 widened-0.007
Token They
Token ID1119
Feature activation-3.40e-01

Pos logit contributions
 yours+0.023
 loser+0.021
 deserve+0.020
 virtue+0.019
 policy+0.019
Neg logit contributions
ised-0.011
 early-0.011
ized-0.010
 widened-0.010
 wide-0.010
Token sne
Token ID10505
Feature activation-3.06e+00

Pos logit contributions
ised+0.006
 early+0.006
ized+0.006
awed+0.005
 widened+0.005
Neg logit contributions
 yours-0.009
 loser-0.008
 deserve-0.008
 depends-0.007
 losers-0.007
Tokenaked
Token ID4335
Feature activation-8.69e+00

Pos logit contributions
ised+0.076
 early+0.074
ized+0.073
 widened+0.070
 wide+0.070
Neg logit contributions
 yours-0.062
 loser-0.054
 deserve-0.050
 virtue-0.047
 losers-0.047
Token up
Token ID510
Feature activation-6.64e+00

Pos logit contributions
ised+0.170
awed+0.161
ized+0.155
 widened+0.148
led+0.138
Neg logit contributions
 yours-0.201
 loser-0.167
 losers-0.166
 marry-0.160
 you-0.158
Token on
Token ID319
Feature activation-5.77e+00

Pos logit contributions
ised+0.157
ized+0.147
led+0.144
awed+0.143
 widened+0.137
Neg logit contributions
 yours-0.129
 loser-0.111
 fault-0.106
 rule-0.104
 losers-0.104
Token him
Token ID683
Feature activation-2.53e+00

Pos logit contributions
ised+0.087
led+0.083
ized+0.081
awed+0.079
 widened+0.076
Neg logit contributions
 yours-0.172
 loser-0.154
 losers-0.144
 fault-0.142
 rule-0.142
Token while
Token ID981
Feature activation-2.66e+00

Pos logit contributions
ised+0.040
ized+0.036
awed+0.035
 early+0.033
led+0.033
Neg logit contributions
 yours-0.073
 loser-0.064
 fault-0.063
 deserve-0.062
 belong-0.062
Token he
Token ID339
Feature activation-3.80e-01

Pos logit contributions
ised+0.045
ized+0.041
awed+0.039
led+0.039
 widened+0.038
Neg logit contributions
 yours-0.075
 loser-0.069
 losers-0.063
 policy-0.062
 You-0.062
Token and
Token ID290
Feature activation+9.66e-01

Pos logit contributions
ised+0.028
ized+0.025
 widened+0.024
 early+0.024
awed+0.023
Neg logit contributions
 yours-0.062
 loser-0.053
 You-0.052
 deserve-0.051
 will-0.050
Token cozy
Token ID37438
Feature activation-2.89e+00

Pos logit contributions
 yours+0.028
 deserve+0.024
 loser+0.024
 losers+0.023
 depends+0.023
Neg logit contributions
ised-0.016
ized-0.015
 early-0.014
awed-0.014
 widened-0.014
Token.
Token ID13
Feature activation+1.33e+00

Pos logit contributions
ised+0.051
ized+0.048
awed+0.046
 widened+0.045
 early+0.045
Neg logit contributions
 yours-0.081
 loser-0.071
 deserve-0.066
 policy-0.066
 virtue-0.065
Token He
Token ID679
Feature activation-3.40e-01

Pos logit contributions
 yours+0.041
 loser+0.038
 deserve+0.035
 virtue+0.035
 policy+0.035
Neg logit contributions
ised-0.019
 early-0.018
ized-0.017
 wide-0.017
 cautiously-0.017
Token sne
Token ID10505
Feature activation-3.21e+00

Pos logit contributions
ised+0.006
ized+0.005
 early+0.005
awed+0.005
 widened+0.005
Neg logit contributions
 yours-0.009
 loser-0.008
 deserve-0.008
 losers-0.008
 costs-0.008
Tokenaked
Token ID4335
Feature activation-8.66e+00

Pos logit contributions
 early+0.079
ised+0.078
ized+0.078
 wide+0.075
 widened+0.074
Neg logit contributions
 yours-0.065
 loser-0.057
 deserve-0.054
 virtue-0.049
 losers-0.049
Token under
Token ID739
Feature activation-5.51e+00

Pos logit contributions
ised+0.190
awed+0.175
ized+0.172
 widened+0.156
 early+0.140
Neg logit contributions
 yours-0.184
 marry-0.149
 ever-0.145
 loser-0.144
 mate-0.144
Token it
Token ID340
Feature activation-3.15e+00

Pos logit contributions
ised+0.102
ized+0.094
awed+0.086
led+0.086
 early+0.085
Neg logit contributions
 yours-0.153
 loser-0.134
 losers-0.128
 you-0.126
 You-0.123
Token and
Token ID290
Feature activation-2.25e+00

Pos logit contributions
ised+0.066
awed+0.062
ized+0.060
led+0.055
 early+0.053
Neg logit contributions
 yours-0.068
 will-0.067
 you-0.062
 I-0.060
 can-0.060
Token waited
Token ID13488
Feature activation-4.83e+00

Pos logit contributions
ised+0.045
ized+0.040
 early+0.037
awed+0.036
 widened+0.033
Neg logit contributions
 yours-0.055
 will-0.048
 deserve-0.047
 you-0.047
 I-0.046
Token for
Token ID329
Feature activation-2.96e+00

Pos logit contributions
ised+0.110
ized+0.102
awed+0.101
 widened+0.091
led+0.089
Neg logit contributions
 yours-0.118
 loser-0.090
 you-0.089
 You-0.086
 fault-0.086
Token's
Token ID338
Feature activation+6.30e+00

Pos logit contributions
 yours+0.120
 virtue+0.105
 loser+0.105
 policy+0.093
 losers+0.091
Neg logit contributions
 wide-0.164
 early-0.161
ised-0.156
 widened-0.153
led-0.151
Token that
Token ID326
Feature activation+7.75e+00

Pos logit contributions
 yours+0.176
 loser+0.168
 deserve+0.156
 virtue+0.153
 policy+0.151
Neg logit contributions
ised-0.105
 early-0.099
ized-0.098
 wide-0.092
led-0.089
Token?"
Token ID1701
Feature activation-1.91e+00

Pos logit contributions
 yours+0.153
 loser+0.150
 virtue+0.140
 deserve+0.132
 fault+0.130
Neg logit contributions
 wide-0.179
 early-0.174
ised-0.167
ized-0.164
 tall-0.162
Token he
Token ID339
Feature activation-3.33e+00

Pos logit contributions
ised+0.034
ized+0.032
 early+0.031
awed+0.030
led+0.030
Neg logit contributions
 yours-0.052
 loser-0.047
 deserve-0.045
 You-0.043
 losers-0.043
Token bark
Token ID21405
Feature activation-6.05e+00

Pos logit contributions
ised+0.064
ized+0.062
 early+0.061
side+0.056
awed+0.055
Neg logit contributions
 yours-0.077
 loser-0.072
 deserve-0.070
 losers-0.069
 you-0.068
Tokened
Token ID276
Feature activation-8.58e+00

Pos logit contributions
 early+0.108
ised+0.107
ized+0.093
side+0.092
 fields+0.090
Neg logit contributions
 loser-0.135
 Go-0.134
 teaches-0.123
 yours-0.121
 deserve-0.121
Token,
Token ID11
Feature activation-5.21e+00

Pos logit contributions
ised+0.173
ized+0.159
awed+0.158
side+0.155
 early+0.149
Neg logit contributions
 yours-0.192
 loser-0.177
 You-0.171
 Go-0.169
 losers-0.161
Token jumping
Token ID14284
Feature activation-5.66e+00

Pos logit contributions
ised+0.116
ized+0.103
side+0.094
led+0.091
awed+0.088
Neg logit contributions
 yours-0.110
 You-0.102
 grant-0.099
 I-0.098
 losers-0.095
Token on
Token ID319
Feature activation-5.81e+00

Pos logit contributions
ised+0.126
ized+0.111
 sudden+0.106
awed+0.101
 early+0.098
Neg logit contributions
 yours-0.126
 loser-0.117
 Go-0.110
 You-0.108
 winner-0.103
Token the
Token ID262
Feature activation-2.18e+00

Pos logit contributions
ised+0.102
ized+0.093
led+0.090
 early+0.087
awed+0.087
Neg logit contributions
 yours-0.165
 loser-0.149
 losers-0.137
 You-0.136
 deserve-0.134
Token box
Token ID3091
Feature activation-4.93e+00

Pos logit contributions
ised+0.041
 early+0.039
ized+0.039
 widened+0.037
 wide+0.037
Neg logit contributions
 yours-0.057
 loser-0.051
 deserve-0.049
 virtue-0.047
 losers-0.046
Token kitchen
Token ID9592
Feature activation-2.54e+00

Pos logit contributions
 early+0.021
 wide+0.020
ised+0.020
ized+0.019
 tall+0.018
Neg logit contributions
 yours-0.029
 loser-0.026
 deserve-0.026
 policy-0.024
 depends-0.024
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
ised+0.038
ized+0.035
 early+0.033
 widened+0.032
awed+0.032
Neg logit contributions
 yours-0.077
 loser-0.068
 deserve-0.066
 costs-0.065
 policy-0.064
Token Then
Token ID3244
Feature activation-3.25e-01

Pos logit contributions
 yours+0.044
 loser+0.041
 deserve+0.038
 virtue+0.038
 losers+0.038
Neg logit contributions
ised-0.021
 early-0.020
ized-0.019
 wide-0.019
 widened-0.018
Token he
Token ID339
Feature activation+7.49e-02

Pos logit contributions
ised+0.004
 early+0.004
ized+0.003
 widened+0.003
 wide+0.003
Neg logit contributions
 yours-0.011
 loser-0.010
 deserve-0.010
 virtue-0.009
 losers-0.009
Token sne
Token ID10505
Feature activation-2.88e+00

Pos logit contributions
 yours+0.002
 loser+0.002
 deserve+0.002
 virtue+0.002
 policy+0.002
Neg logit contributions
ised-0.001
 early-0.001
ized-0.001
 widened-0.001
 wide-0.001
Tokenaked
Token ID4335
Feature activation-8.53e+00

Pos logit contributions
ised+0.072
ized+0.070
 early+0.070
 widened+0.067
 wide+0.066
Neg logit contributions
 yours-0.058
 loser-0.051
 deserve-0.047
 virtue-0.044
 losers-0.044
Token out
Token ID503
Feature activation-7.04e+00

Pos logit contributions
ised+0.162
awed+0.149
ized+0.146
 widened+0.138
led+0.125
Neg logit contributions
 yours-0.218
 loser-0.178
 losers-0.173
 you-0.172
 Go-0.168
Token the
Token ID262
Feature activation-2.92e+00

Pos logit contributions
ised+0.145
ized+0.137
awed+0.133
led+0.122
 widened+0.120
Neg logit contributions
 yours-0.169
 loser-0.150
 losers-0.140
 deserve-0.139
 policy-0.139
Token door
Token ID3420
Feature activation-4.82e+00

Pos logit contributions
 early+0.047
ised+0.046
 wide+0.044
ized+0.043
 widened+0.041
Neg logit contributions
 yours-0.085
 loser-0.077
 deserve-0.074
 virtue-0.071
 depends-0.070
Token.
Token ID13
Feature activation+7.63e-01

Pos logit contributions
ised+0.088
ized+0.077
awed+0.075
led+0.071
 early+0.069
Neg logit contributions
 yours-0.128
 deserve-0.108
 loser-0.108
 rule-0.107
 policy-0.107
Token He
Token ID679
Feature activation-8.43e-01

Pos logit contributions
 yours+0.023
 loser+0.021
 deserve+0.020
 virtue+0.020
 losers+0.020
Neg logit contributions
ised-0.011
 early-0.010
ized-0.010
 wide-0.010
 widened-0.010
Token with
Token ID351
Feature activation+2.28e+00

Pos logit contributions
 yours+0.107
 loser+0.099
 deserve+0.094
 virtue+0.094
 policy+0.094
Neg logit contributions
ised-0.047
 early-0.044
ized-0.044
 wide-0.041
 widened-0.039
Token me
Token ID502
Feature activation+2.91e+00

Pos logit contributions
 yours+0.053
 loser+0.051
 deserve+0.047
 virtue+0.046
 policy+0.046
Neg logit contributions
ised-0.048
ized-0.046
 early-0.046
 wide-0.044
awed-0.042
Token?"
Token ID1701
Feature activation-2.54e+00

Pos logit contributions
 yours+0.042
 loser+0.034
 deserve+0.031
 virtue+0.028
 policy+0.028
Neg logit contributions
ised-0.089
 early-0.087
ized-0.086
 widened-0.083
 wide-0.083
Token Spot
Token ID15899
Feature activation-3.86e+00

Pos logit contributions
ised+0.051
 early+0.051
ized+0.050
 widened+0.048
led+0.046
Neg logit contributions
 yours-0.059
 loser-0.051
 deserve-0.050
 You-0.050
 costs-0.049
Token bark
Token ID21405
Feature activation-5.61e+00

Pos logit contributions
ised+0.069
ized+0.066
 early+0.066
 widened+0.060
led+0.058
Neg logit contributions
 yours-0.100
 loser-0.092
 deserve-0.089
 depends-0.087
 costs-0.086
Tokened
Token ID276
Feature activation-8.53e+00

Pos logit contributions
ised+0.095
 early+0.087
ized+0.084
 valleys+0.083
 fields+0.079
Neg logit contributions
 loser-0.136
 yours-0.133
 Go-0.131
 ever-0.123
 You-0.122
Token.
Token ID13
Feature activation-1.15e+00

Pos logit contributions
ised+0.160
side+0.151
ized+0.150
awed+0.143
 valleys+0.134
Neg logit contributions
 yours-0.207
 loser-0.185
 You-0.179
 Go-0.175
 losers-0.171
Token\n
Token ID198
Feature activation-4.86e+00

Pos logit contributions
ised+0.023
ized+0.023
 early+0.021
led+0.021
 widened+0.021
Neg logit contributions
 yours-0.028
 loser-0.025
 deserve-0.024
 You-0.024
 Friends-0.024
Token\n
Token ID198
Feature activation-3.44e+00

Pos logit contributions
ised+0.078
 early+0.073
ized+0.073
 widened+0.069
 wide+0.066
Neg logit contributions
 yours-0.141
 loser-0.127
 deserve-0.121
 virtue-0.117
 policy-0.117
TokenM
Token ID44
Feature activation+1.19e+00

Pos logit contributions
ised+0.081
 early+0.079
ized+0.078
 widened+0.077
led+0.074
Neg logit contributions
 yours-0.072
 loser-0.064
 deserve-0.061
 losers-0.058
 Go-0.056
Tokenittens
Token ID34978
Feature activation-6.61e-01

Pos logit contributions
 yours+0.035
 loser+0.032
 deserve+0.030
 policy+0.029
 virtue+0.029
Neg logit contributions
ised-0.019
 early-0.018
ized-0.018
 widened-0.017
 wide-0.017
TokenO
Token ID46
Feature activation+3.92e+00

Pos logit contributions
 yours+0.101
 fault+0.088
 policy+0.083
 virtue+0.081
 loser+0.079
Neg logit contributions
ised-0.070
 widened-0.067
 wide-0.065
ized-0.064
 cautiously-0.063
Tokenuch
Token ID794
Feature activation+1.75e+00

Pos logit contributions
 yours+0.039
 loser+0.032
 deserve+0.025
 virtue+0.025
 policy+0.022
Neg logit contributions
ised-0.135
 early-0.132
 widened-0.132
ized-0.131
 wide-0.129
Token!"
Token ID2474
Feature activation-2.17e+00

Pos logit contributions
 yours+0.035
 loser+0.031
 deserve+0.028
 losers+0.026
 virtue+0.026
Neg logit contributions
ised-0.043
 early-0.041
ized-0.041
 widened-0.041
awed-0.040
Token Ben
Token ID3932
Feature activation-3.07e+00

Pos logit contributions
ised+0.044
ized+0.042
 early+0.040
 widened+0.038
awed+0.038
Neg logit contributions
 yours-0.054
 loser-0.048
 deserve-0.045
 losers-0.044
 You-0.044
Token cried
Token ID16896
Feature activation-6.15e+00

Pos logit contributions
ised+0.054
ized+0.051
 early+0.050
 widened+0.046
awed+0.045
Neg logit contributions
 yours-0.084
 loser-0.077
 deserve-0.073
 losers-0.071
 depends-0.071
Token out
Token ID503
Feature activation-8.53e+00

Pos logit contributions
ised+0.117
ized+0.117
awed+0.110
 widened+0.102
 early+0.102
Neg logit contributions
 yours-0.153
 loser-0.147
 deserve-0.128
 fault-0.127
 policy-0.126
Token.
Token ID13
Feature activation-6.93e-01

Pos logit contributions
ized+0.181
ised+0.178
awed+0.175
side+0.159
led+0.147
Neg logit contributions
 loser-0.185
 yours-0.181
 policy-0.161
 rule-0.160
 teaches-0.156
Token He
Token ID679
Feature activation-6.00e-01

Pos logit contributions
ised+0.014
ized+0.013
 early+0.012
led+0.012
awed+0.012
Neg logit contributions
 yours-0.017
 loser-0.016
 You-0.015
 deserve-0.015
 Friends-0.015
Token felt
Token ID2936
Feature activation-4.27e+00

Pos logit contributions
ised+0.011
 early+0.011
ized+0.010
side+0.009
overed+0.009
Neg logit contributions
 yours-0.015
 deserve-0.014
 loser-0.014
 will-0.014
 losers-0.014
Token a
Token ID257
Feature activation-3.99e+00

Pos logit contributions
ised+0.095
ized+0.091
 early+0.089
awed+0.087
 widened+0.082
Neg logit contributions
 yours-0.099
 loser-0.086
 deserve-0.083
 belong-0.080
 losers-0.079
Token sharp
Token ID7786
Feature activation-2.86e+00

Pos logit contributions
ised+0.084
ized+0.081
 early+0.078
awed+0.076
 widened+0.070
Neg logit contributions
 yours-0.091
 loser-0.090
 deserve-0.081
 losers-0.078
 policy-0.077
Token sheep
Token ID15900
Feature activation+5.91e+00

Pos logit contributions
 yours+0.128
 loser+0.121
 policy+0.112
 virtue+0.111
 deserve+0.110
Neg logit contributions
ised-0.087
 early-0.084
 wide-0.084
ized-0.079
 cautiously-0.077
Token!"
Token ID2474
Feature activation-2.87e+00

Pos logit contributions
 yours+0.125
 loser+0.114
 policy+0.102
 deserve+0.102
 virtue+0.101
Neg logit contributions
ised-0.137
 early-0.133
ized-0.132
 wide-0.127
 widened-0.125
Token he
Token ID339
Feature activation-2.75e+00

Pos logit contributions
ised+0.053
ized+0.052
awed+0.051
 early+0.050
led+0.049
Neg logit contributions
 yours-0.068
 deserve-0.061
 depends-0.058
 You-0.058
 you-0.058
Token squ
Token ID2809
Feature activation-3.93e+00

Pos logit contributions
ised+0.048
ized+0.048
 early+0.047
awed+0.042
 widened+0.041
Neg logit contributions
 yours-0.067
 loser-0.061
 deserve-0.060
 losers-0.058
 Go-0.058
Tokeneal
Token ID2287
Feature activation-5.51e+00

Pos logit contributions
 early+0.087
 wide+0.077
 widened+0.077
side+0.076
ized+0.076
Neg logit contributions
 yours-0.085
 loser-0.071
 deserve-0.071
 Go-0.070
 losers-0.069
Tokened
Token ID276
Feature activation-8.45e+00

Pos logit contributions
 fields+0.122
 early+0.118
 valleys+0.114
ised+0.109
 pasture+0.106
Neg logit contributions
 ever-0.108
 yours-0.098
 loser-0.098
 Go-0.089
 You-0.089
Token.
Token ID13
Feature activation-3.34e-01

Pos logit contributions
ised+0.166
awed+0.156
led+0.148
ized+0.145
 early+0.139
Neg logit contributions
 yours-0.190
 loser-0.173
 marry-0.164
 Go-0.162
 You-0.161
Token\n
Token ID198
Feature activation-4.89e+00

Pos logit contributions
ised+0.006
ized+0.006
 early+0.006
 widened+0.005
led+0.005
Neg logit contributions
 yours-0.009
 loser-0.008
 deserve-0.008
 depends-0.007
 You-0.007
Token\n
Token ID198
Feature activation-3.83e+00

Pos logit contributions
ised+0.084
ized+0.081
 early+0.079
 widened+0.076
led+0.075
Neg logit contributions
 yours-0.134
 loser-0.120
 deserve-0.116
 virtue-0.111
 policy-0.111
TokenM
Token ID44
Feature activation+1.27e+00

Pos logit contributions
ised+0.073
 early+0.072
ized+0.071
awed+0.069
 widened+0.069
Neg logit contributions
 yours-0.095
 loser-0.084
 deserve-0.082
 virtue-0.078
 policy-0.077
Tokenama
Token ID1689
Feature activation-2.73e+00

Pos logit contributions
 yours+0.029
 loser+0.025
 deserve+0.024
 policy+0.023
 virtue+0.023
Neg logit contributions
ised-0.028
 early-0.027
ized-0.027
 widened-0.026
 wide-0.026
Token."
Token ID526
Feature activation-1.90e+00

Pos logit contributions
 yours+0.055
 loser+0.055
 deserve+0.049
 losers+0.049
 virtue+0.049
Neg logit contributions
ised-0.037
ized-0.036
 early-0.035
 wide-0.033
 cautiously-0.032
Token\n
Token ID198
Feature activation-4.42e+00

Pos logit contributions
ised+0.038
ized+0.037
led+0.037
 widened+0.036
 early+0.035
Neg logit contributions
 yours-0.044
 You-0.039
 Friends-0.038
 deserve-0.038
 loser-0.037
Token\n
Token ID198
Feature activation-3.60e+00

Pos logit contributions
ised+0.084
ized+0.080
led+0.078
 widened+0.077
 early+0.075
Neg logit contributions
 yours-0.112
 loser-0.098
 deserve-0.097
 policy-0.094
 virtue-0.093
TokenMom
Token ID29252
Feature activation-2.68e+00

Pos logit contributions
ised+0.067
led+0.064
 early+0.064
 widened+0.063
ized+0.063
Neg logit contributions
 yours-0.090
 deserve-0.080
 loser-0.080
 virtue-0.077
 policy-0.075
Token looked
Token ID3114
Feature activation-5.07e+00

Pos logit contributions
ised+0.040
ized+0.038
 early+0.037
 widened+0.034
awed+0.033
Neg logit contributions
 yours-0.080
 loser-0.072
 deserve-0.070
 depends-0.068
 losers-0.067
Token into
Token ID656
Feature activation-8.44e+00

Pos logit contributions
side+0.088
awed+0.086
ised+0.084
ized+0.082
edly+0.082
Neg logit contributions
 yours-0.122
 will-0.105
 depends-0.100
 costs-0.097
 teaches-0.096
Token the
Token ID262
Feature activation-1.35e+00

Pos logit contributions
awed+0.166
led+0.165
ized+0.153
ised+0.151
 early+0.150
Neg logit contributions
 yours-0.177
 gift-0.162
 deserve-0.160
 policy-0.151
 loser-0.150
Token bucket
Token ID19236
Feature activation-4.46e+00

Pos logit contributions
ised+0.026
 widened+0.026
awed+0.026
 early+0.025
 cautiously+0.025
Neg logit contributions
 yours-0.030
 loser-0.028
 deserve-0.026
 rule-0.026
 depends-0.026
Token and
Token ID290
Feature activation-1.85e+00

Pos logit contributions
ised+0.083
 widened+0.082
awed+0.080
led+0.077
 early+0.074
Neg logit contributions
 yours-0.092
 winner-0.084
 will-0.084
 rule-0.080
 Friends-0.078
Token discovered
Token ID5071
Feature activation-3.17e+00

Pos logit contributions
 early+0.032
ised+0.031
awed+0.030
edly+0.030
ized+0.029
Neg logit contributions
 yours-0.042
 you-0.038
 will-0.037
 depends-0.036
 deserve-0.036
Token the
Token ID262
Feature activation+1.14e+00

Pos logit contributions
 early+0.059
ised+0.056
 widened+0.054
awed+0.052
ized+0.051
Neg logit contributions
 yours-0.084
 deserve-0.068
 belong-0.067
 fault-0.067
 you-0.064
Token quite
Token ID2407
Feature activation-1.62e+00

Pos logit contributions
ised+0.026
ized+0.023
 widened+0.022
 early+0.022
awed+0.021
Neg logit contributions
 yours-0.042
 loser-0.037
 losers-0.036
 winner-0.035
 deserve-0.035
Token curious
Token ID11040
Feature activation-2.00e+00

Pos logit contributions
ised+0.026
ized+0.023
awed+0.022
 early+0.022
 widened+0.022
Neg logit contributions
 yours-0.048
 loser-0.044
 deserve-0.042
 winner-0.041
 losers-0.040
Token.
Token ID13
Feature activation+1.61e-01

Pos logit contributions
ised+0.036
ized+0.033
 early+0.032
awed+0.031
 widened+0.030
Neg logit contributions
 yours-0.056
 loser-0.049
 deserve-0.046
 policy-0.046
 fault-0.046
Token Alex
Token ID4422
Feature activation-2.74e+00

Pos logit contributions
 yours+0.004
 loser+0.004
 deserve+0.004
 depends+0.003
 virtue+0.003
Neg logit contributions
ised-0.003
ized-0.003
 early-0.003
 widened-0.003
awed-0.003
Token walked
Token ID6807
Feature activation-6.19e+00

Pos logit contributions
 early+0.049
ised+0.049
ized+0.046
awed+0.043
 widened+0.042
Neg logit contributions
 yours-0.064
 deserve-0.061
 loser-0.059
 costs-0.058
 depends-0.058
Token up
Token ID510
Feature activation-8.44e+00

Pos logit contributions
ised+0.135
awed+0.130
ized+0.126
 widened+0.125
led+0.117
Neg logit contributions
 yours-0.132
 winner-0.112
 loser-0.108
 will-0.104
 losers-0.103
Token to
Token ID284
Feature activation-4.40e+00

Pos logit contributions
ised+0.199
ized+0.192
 widened+0.190
led+0.186
awed+0.183
Neg logit contributions
 yours-0.156
 loser-0.138
 winner-0.131
 losers-0.129
 ever-0.127
Token him
Token ID683
Feature activation-3.04e+00

Pos logit contributions
ised+0.055
 early+0.052
ized+0.051
 widened+0.047
 wide+0.046
Neg logit contributions
 yours-0.143
 loser-0.132
 deserve-0.126
 virtue-0.123
 policy-0.122
Token and
Token ID290
Feature activation-3.09e+00

Pos logit contributions
ised+0.052
ized+0.050
awed+0.050
led+0.050
 widened+0.047
Neg logit contributions
 yours-0.078
 You-0.068
 costs-0.068
 Friends-0.066
 depends-0.066
Token said
Token ID531
Feature activation-3.85e+00

Pos logit contributions
ised+0.057
 early+0.054
ized+0.053
awed+0.052
 widened+0.050
Neg logit contributions
 yours-0.074
 deserve-0.067
 depends-0.064
 costs-0.063
 you-0.062
Token,
Token ID11
Feature activation-3.23e+00

Pos logit contributions
ised+0.054
ized+0.050
 early+0.047
awed+0.045
 widened+0.045
Neg logit contributions
 yours-0.123
 loser-0.111
 deserve-0.103
 policy-0.102
 virtue-0.101
Token."
Token ID526
Feature activation-1.35e+00

Pos logit contributions
 yours+0.081
 loser+0.080
 virtue+0.073
 deserve+0.073
 losers+0.073
Neg logit contributions
ised-0.063
 early-0.062
ized-0.059
 wide-0.058
led-0.057
Token\n
Token ID198
Feature activation-4.03e+00

Pos logit contributions
ised+0.020
 early+0.019
ized+0.019
 widened+0.018
 wide+0.017
Neg logit contributions
 yours-0.041
 loser-0.037
 deserve-0.035
 virtue-0.034
 losers-0.034
Token\n
Token ID198
Feature activation-3.32e+00

Pos logit contributions
ised+0.068
ized+0.064
 early+0.063
 widened+0.061
led+0.059
Neg logit contributions
 yours-0.113
 loser-0.102
 deserve-0.097
 depends-0.094
 policy-0.094
TokenThey
Token ID2990
Feature activation-2.27e+00

Pos logit contributions
ised+0.056
 early+0.054
ized+0.052
 cautiously+0.049
 wide+0.049
Neg logit contributions
 yours-0.096
 loser-0.085
 fault-0.082
 deserve-0.080
 costs-0.079
Token walked
Token ID6807
Feature activation-6.49e+00

Pos logit contributions
ised+0.044
ized+0.042
 early+0.041
awed+0.041
led+0.037
Neg logit contributions
 yours-0.056
 deserve-0.051
 loser-0.050
 belong-0.049
 will-0.048
Token deeper
Token ID9211
Feature activation-8.44e+00

Pos logit contributions
awed+0.149
ised+0.147
ized+0.147
 widened+0.141
itched+0.127
Neg logit contributions
 yours-0.129
 will-0.113
 winner-0.112
 loser-0.109
 rule-0.104
Token into
Token ID656
Feature activation-6.86e+00

Pos logit contributions
awed+0.158
ized+0.149
 widened+0.149
 widening+0.146
ised+0.141
Neg logit contributions
 yours-0.166
 our-0.147
 loser-0.141
 ever-0.138
 policy-0.138
Token the
Token ID262
Feature activation-2.90e+00

Pos logit contributions
awed+0.150
led+0.145
ized+0.140
ised+0.135
ocked+0.124
Neg logit contributions
 yours-0.147
 loser-0.126
 policy-0.125
 rule-0.123
 our-0.122
Token cave
Token ID11527
Feature activation-6.07e+00

Pos logit contributions
ised+0.057
ized+0.056
led+0.055
awed+0.055
 widened+0.052
Neg logit contributions
 yours-0.069
 loser-0.062
 deserve-0.060
 rule-0.059
 virtue-0.058
Token,
Token ID11
Feature activation-3.62e+00

Pos logit contributions
awed+0.119
ised+0.115
 widened+0.108
led+0.107
ized+0.106
Neg logit contributions
 yours-0.136
 rule-0.130
 loser-0.127
 policy-0.123
 fault-0.118
Token until
Token ID1566
Feature activation-1.98e+00

Pos logit contributions
ised+0.077
awed+0.067
ized+0.066
led+0.066
 widened+0.065
Neg logit contributions
 yours-0.082
 loser-0.067
 winner-0.067
 least-0.067
 you-0.065
Token when
Token ID618
Feature activation-3.00e-01

Pos logit contributions
 yours+0.012
 loser+0.010
 deserve+0.010
 depends+0.010
 virtue+0.010
Neg logit contributions
ised-0.006
ized-0.006
awed-0.006
 early-0.006
 widened-0.005
Token a
Token ID257
Feature activation-1.56e+00

Pos logit contributions
 early+0.003
ised+0.003
ized+0.003
 wide+0.003
 widened+0.003
Neg logit contributions
 yours-0.010
 loser-0.009
 deserve-0.009
 virtue-0.009
 depends-0.009
Token small
Token ID1402
Feature activation-1.95e+00

Pos logit contributions
 early+0.024
 wide+0.023
ised+0.023
 tall+0.020
ized+0.020
Neg logit contributions
 yours-0.048
 loser-0.041
 fault-0.040
 deserve-0.040
 costs-0.040
Token girl
Token ID2576
Feature activation-1.45e+00

Pos logit contributions
ised+0.042
 early+0.040
ized+0.039
 wide+0.038
 widened+0.037
Neg logit contributions
 yours-0.047
 loser-0.041
 deserve-0.039
 virtue-0.037
 policy-0.037
Token walked
Token ID6807
Feature activation-6.05e+00

Pos logit contributions
ised+0.020
 early+0.019
ized+0.019
 widened+0.017
 wide+0.017
Neg logit contributions
 yours-0.045
 loser-0.041
 deserve-0.039
 depends-0.038
 losers-0.038
Token up
Token ID510
Feature activation-8.44e+00

Pos logit contributions
ised+0.127
ized+0.116
awed+0.114
 widened+0.113
 sudden+0.110
Neg logit contributions
 yours-0.138
 you-0.109
 You-0.109
 will-0.109
 winner-0.107
Token and
Token ID290
Feature activation-2.93e+00

Pos logit contributions
ised+0.199
ized+0.187
awed+0.178
 widened+0.178
led+0.170
Neg logit contributions
 yours-0.177
 you-0.138
 loser-0.136
 policy-0.134
 ever-0.132
Token saw
Token ID2497
Feature activation-4.86e+00

Pos logit contributions
ised+0.047
ized+0.042
 widened+0.038
 early+0.038
awed+0.038
Neg logit contributions
 yours-0.082
 costs-0.071
 depends-0.070
 deserve-0.070
 will-0.069
Token him
Token ID683
Feature activation-2.01e+00

Pos logit contributions
ised+0.082
ized+0.075
 early+0.075
 widened+0.071
awed+0.071
Neg logit contributions
 yours-0.138
 loser-0.116
 deserve-0.111
 you-0.110
 depends-0.109
Token.
Token ID13
Feature activation+2.20e-01

Pos logit contributions
ised+0.038
ized+0.033
led+0.032
awed+0.031
 early+0.031
Neg logit contributions
 yours-0.051
 loser-0.044
 You-0.043
 belong-0.043
 deserve-0.042
Token She
Token ID1375
Feature activation-1.29e+00

Pos logit contributions
 yours+0.006
 loser+0.006
 deserve+0.005
 You+0.005
 costs+0.005
Neg logit contributions
ised-0.004
ized-0.004
 early-0.003
led-0.003
 widened-0.003
TokenI
Token ID40
Feature activation+5.29e+00

Pos logit contributions
 yours+0.143
 fault+0.138
 policy+0.122
 loser+0.121
 enemy+0.115
Neg logit contributions
ised-0.098
ized-0.086
 walked-0.080
 cautiously-0.078
 sudden-0.077
Token will
Token ID481
Feature activation+3.45e+00

Pos logit contributions
 yours+0.105
 loser+0.082
 policy+0.079
 losers+0.076
 virtue+0.073
Neg logit contributions
 wide-0.130
 early-0.130
 widened-0.129
ised-0.129
 nervously-0.124
Token!"
Token ID2474
Feature activation-1.11e+00

Pos logit contributions
 yours+0.071
 loser+0.065
 policy+0.060
 depends+0.059
inea+0.058
Neg logit contributions
 early-0.080
 wide-0.079
ised-0.076
 cautiously-0.074
 widened-0.074
Token she
Token ID673
Feature activation-1.67e+00

Pos logit contributions
ised+0.017
 early+0.016
ized+0.016
 widened+0.015
 wide+0.015
Neg logit contributions
 yours-0.033
 loser-0.030
 deserve-0.029
 virtue-0.028
 policy-0.028
Token ran
Token ID4966
Feature activation-5.83e+00

Pos logit contributions
ised+0.025
 early+0.024
ized+0.023
 widened+0.022
 wide+0.022
Neg logit contributions
 yours-0.050
 loser-0.046
 deserve-0.043
 policy-0.042
 virtue-0.042
Token off
Token ID572
Feature activation-8.43e+00

Pos logit contributions
ised+0.118
ized+0.111
awed+0.105
 widened+0.104
led+0.101
Neg logit contributions
 yours-0.151
 loser-0.131
 deserve-0.123
 virtue-0.120
 losers-0.120
Token with
Token ID351
Feature activation-4.81e+00

Pos logit contributions
ised+0.169
ized+0.155
awed+0.151
led+0.148
 widened+0.146
Neg logit contributions
 yours-0.213
 loser-0.182
 policy-0.173
 losers-0.172
 you-0.170
Token a
Token ID257
Feature activation-4.83e+00

Pos logit contributions
ised+0.071
ized+0.066
 early+0.065
awed+0.063
 widened+0.060
Neg logit contributions
 yours-0.150
 loser-0.132
 deserve-0.127
 losers-0.125
 belong-0.122
Token happy
Token ID3772
Feature activation-3.70e+00

Pos logit contributions
ised+0.084
ized+0.079
 early+0.074
led+0.073
awed+0.072
Neg logit contributions
 yours-0.134
 loser-0.123
 deserve-0.116
 policy-0.112
 virtue-0.112
Token heart
Token ID2612
Feature activation-3.25e+00

Pos logit contributions
ised+0.079
ized+0.076
awed+0.071
led+0.070
 widened+0.068
Neg logit contributions
 yours-0.088
 loser-0.080
 policy-0.074
 deserve-0.073
 rule-0.073
Token.
Token ID13
Feature activation+8.85e-01

Pos logit contributions
ised+0.067
ized+0.060
awed+0.058
 early+0.057
 curiously+0.056
Neg logit contributions
 yours-0.072
 loser-0.066
 winner-0.064
 deserve-0.061
 teaches-0.060
Token."
Token ID526
Feature activation-1.16e+00

Pos logit contributions
 yours+0.045
 loser+0.044
 deserve+0.042
 virtue+0.041
 depends+0.040
Neg logit contributions
 early-0.020
ised-0.019
ized-0.018
 wide-0.018
 curiously-0.016
Token\n
Token ID198
Feature activation-4.09e+00

Pos logit contributions
 early+0.015
ised+0.015
 wide+0.013
ized+0.013
 widened+0.013
Neg logit contributions
 yours-0.038
 loser-0.034
 deserve-0.033
 virtue-0.033
 losers-0.032
Token\n
Token ID198
Feature activation-3.32e+00

Pos logit contributions
ised+0.076
ized+0.073
 early+0.070
 widened+0.068
led+0.067
Neg logit contributions
 yours-0.107
 loser-0.096
 deserve-0.093
 policy-0.088
 depends-0.088
TokenS
Token ID50
Feature activation+2.88e-01

Pos logit contributions
ised+0.055
 early+0.052
ized+0.050
 widened+0.048
 wide+0.048
Neg logit contributions
 yours-0.097
 loser-0.086
 deserve-0.082
 fault-0.080
 depends-0.080
Tokenara
Token ID3301
Feature activation-3.40e+00

Pos logit contributions
 yours+0.006
 loser+0.005
 deserve+0.005
 policy+0.004
 costs+0.004
Neg logit contributions
ised-0.007
 early-0.007
ized-0.007
 widened-0.007
 wide-0.007
Token closed
Token ID4838
Feature activation-8.41e+00

Pos logit contributions
ised+0.064
ized+0.060
 early+0.056
side+0.050
 valleys+0.050
Neg logit contributions
 will-0.077
 deserve-0.076
 teaches-0.076
 depends-0.074
 loser-0.074
Token the
Token ID262
Feature activation-1.76e+00

Pos logit contributions
ised+0.172
awed+0.167
ized+0.167
 widened+0.166
rolling+0.165
Neg logit contributions
 yours-0.175
 my-0.129
 you-0.125
 our-0.124
 fault-0.117
Token book
Token ID1492
Feature activation-3.84e+00

Pos logit contributions
 early+0.028
ised+0.027
 wide+0.027
ized+0.025
 widened+0.025
Neg logit contributions
 yours-0.050
 loser-0.045
 deserve-0.044
 virtue-0.042
 losers-0.041
Token and
Token ID290
Feature activation-2.36e+00

Pos logit contributions
ised+0.064
awed+0.060
ized+0.059
 sudden+0.057
lee+0.056
Neg logit contributions
 yours-0.100
 rule-0.094
 teaches-0.094
 loser-0.092
 costs-0.090
Token thought
Token ID1807
Feature activation-9.99e-01

Pos logit contributions
ised+0.045
ized+0.040
awed+0.037
 early+0.037
lee+0.033
Neg logit contributions
 yours-0.059
 will-0.054
 deserve-0.053
 depends-0.052
 costs-0.052
Token for
Token ID329
Feature activation-1.84e+00

Pos logit contributions
ised+0.019
ized+0.018
awed+0.017
 widened+0.016
 early+0.016
Neg logit contributions
 yours-0.028
 loser-0.024
 fault-0.023
 policy-0.023
 losers-0.022
Token\n
Token ID198
Feature activation-4.80e+00

Pos logit contributions
ised+0.050
ized+0.049
led+0.045
awed+0.044
 widened+0.043
Neg logit contributions
 yours-0.062
 You-0.056
 I-0.055
 you-0.054
 deserve-0.054
TokenS
Token ID50
Feature activation+6.01e-01

Pos logit contributions
ised+0.094
ized+0.090
led+0.088
 widened+0.086
 early+0.085
Neg logit contributions
 yours-0.120
 loser-0.107
 deserve-0.103
 Go-0.099
 policy-0.098
Tokenoph
Token ID2522
Feature activation-3.61e+00

Pos logit contributions
 yours+0.015
 loser+0.014
 deserve+0.013
 virtue+0.012
 policy+0.012
Neg logit contributions
ised-0.012
 early-0.011
ized-0.011
 widened-0.011
 wide-0.010
Tokenie
Token ID494
Feature activation-4.12e+00

Pos logit contributions
ised+0.078
 early+0.076
ized+0.073
 widened+0.073
 wide+0.072
Neg logit contributions
 yours-0.083
 loser-0.067
 deserve-0.067
 policy-0.065
 teaches-0.064
Token looked
Token ID3114
Feature activation-5.44e+00

Pos logit contributions
ised+0.071
 early+0.067
ized+0.066
 widened+0.060
awed+0.058
Neg logit contributions
 yours-0.099
 loser-0.092
 deserve-0.091
 Go-0.090
 costs-0.089
Token up
Token ID510
Feature activation-8.41e+00

Pos logit contributions
ised+0.088
side+0.079
ized+0.078
awed+0.076
 early+0.075
Neg logit contributions
 yours-0.148
 You-0.121
 costs-0.120
 deserve-0.119
 depends-0.119
Token and
Token ID290
Feature activation-3.93e+00

Pos logit contributions
awed+0.171
ised+0.169
led+0.163
 valleys+0.161
 widened+0.153
Neg logit contributions
 yours-0.139
 ever-0.137
 you-0.134
 I-0.124
 winner-0.122
Token said
Token ID531
Feature activation-4.69e+00

Pos logit contributions
ised+0.072
 early+0.066
ized+0.063
side+0.062
edly+0.062
Neg logit contributions
 yours-0.092
 you-0.085
 I-0.084
 You-0.079
 will-0.078
Token,
Token ID11
Feature activation-4.38e+00

Pos logit contributions
ised+0.068
ized+0.063
 early+0.061
 widened+0.057
awed+0.054
Neg logit contributions
 yours-0.145
 loser-0.129
 Go-0.121
 You-0.121
 deserve-0.120
Token "
Token ID366
Feature activation+3.79e+00

Pos logit contributions
ised+0.098
ized+0.092
led+0.086
awed+0.085
 early+0.082
Neg logit contributions
 yours-0.099
 You-0.087
 loser-0.083
 I-0.081
 Go-0.078
Token Wow
Token ID24755
Feature activation+5.24e+00

Pos logit contributions
 yours+0.106
 loser+0.091
 fault+0.091
 policy+0.089
 virtue+0.088
Neg logit contributions
ised-0.068
ized-0.064
 widened-0.064
 wide-0.063
 cautiously-0.061
Token qu
Token ID627
Feature activation-1.25e+00

Pos logit contributions
ised+0.096
ized+0.092
awed+0.090
 early+0.089
led+0.087
Neg logit contributions
 yours-0.083
 You-0.072
 loser-0.071
 depends-0.070
 deserve-0.069
Tokenack
Token ID441
Feature activation-3.78e+00

Pos logit contributions
ised+0.025
 early+0.024
ized+0.023
 widened+0.023
awed+0.022
Neg logit contributions
 yours-0.029
 loser-0.027
 deserve-0.027
 losers-0.025
 You-0.024
Token!
Token ID0
Feature activation+6.55e-01

Pos logit contributions
ised+0.076
awed+0.068
 early+0.067
ized+0.067
 widened+0.064
Neg logit contributions
 yours-0.089
 loser-0.077
 you-0.077
 You-0.077
 Go-0.076
Token Lucy
Token ID22162
Feature activation-2.70e+00

Pos logit contributions
 yours+0.018
 loser+0.016
 deserve+0.015
 virtue+0.015
 You+0.015
Neg logit contributions
ised-0.012
ized-0.011
 early-0.011
 widened-0.010
led-0.010
Token pe
Token ID613
Feature activation-2.31e+00

Pos logit contributions
 early+0.055
ised+0.052
ized+0.050
side+0.045
awed+0.042
Neg logit contributions
 will-0.054
 deserve-0.054
 loser-0.053
 yours-0.052
 Go-0.052
Tokenered
Token ID1068
Feature activation-8.40e+00

Pos logit contributions
 early+0.052
 widened+0.044
 wide+0.044
ised+0.041
 fields+0.041
Neg logit contributions
 yours-0.051
 loser-0.044
 winner-0.041
 losers-0.041
osaurs-0.041
Token closer
Token ID5699
Feature activation-6.70e+00

Pos logit contributions
awed+0.173
ised+0.172
ized+0.159
 early+0.154
led+0.149
Neg logit contributions
 yours-0.188
 you-0.151
 ever-0.139
 You-0.138
 I-0.137
Token and
Token ID290
Feature activation-3.64e+00

Pos logit contributions
ised+0.154
ized+0.147
awed+0.146
led+0.145
 widened+0.135
Neg logit contributions
 yours-0.121
 will-0.107
 you-0.107
 ever-0.105
 I-0.101
Token saw
Token ID2497
Feature activation-4.33e+00

Pos logit contributions
ised+0.059
 early+0.057
ized+0.053
awed+0.051
 widened+0.048
Neg logit contributions
 yours-0.095
 you-0.087
 deserve-0.086
 I-0.083
 will-0.083
Token two
Token ID734
Feature activation-1.79e+00

Pos logit contributions
ised+0.076
 early+0.074
ized+0.071
awed+0.070
 widened+0.070
Neg logit contributions
 yours-0.118
 loser-0.101
 fault-0.096
 you-0.095
 deserve-0.095
Token ducks
Token ID39694
Feature activation-2.25e+00

Pos logit contributions
ised+0.034
 early+0.034
ized+0.031
awed+0.031
led+0.030
Neg logit contributions
 yours-0.042
 loser-0.040
 winner-0.038
 deserve-0.037
 losers-0.037
Token the
Token ID262
Feature activation-8.03e-01

Pos logit contributions
ised+0.056
ized+0.052
awed+0.050
 widened+0.050
 early+0.049
Neg logit contributions
 yours-0.140
 loser-0.122
 deserve-0.118
 policy-0.114
 depends-0.114
Token river
Token ID7850
Feature activation-2.15e+00

Pos logit contributions
ised+0.012
 wide+0.011
 early+0.011
ized+0.011
 widened+0.010
Neg logit contributions
 yours-0.025
 loser-0.022
 policy-0.021
 deserve-0.021
 virtue-0.020
Token.
Token ID13
Feature activation+4.80e-01

Pos logit contributions
ised+0.036
 early+0.032
ized+0.032
 widened+0.032
awed+0.032
Neg logit contributions
 yours-0.060
 loser-0.054
 deserve-0.052
 rule-0.050
 costs-0.050
Token Tim
Token ID5045
Feature activation-2.78e+00

Pos logit contributions
 yours+0.012
 loser+0.011
 deserve+0.011
 You+0.010
 costs+0.010
Neg logit contributions
ised-0.009
ized-0.009
 early-0.009
led-0.008
 widened-0.008
Token pe
Token ID613
Feature activation-1.91e+00

Pos logit contributions
ised+0.057
ized+0.055
 early+0.054
overed+0.050
 widened+0.047
Neg logit contributions
 will-0.058
 yours-0.058
 deserve-0.057
'll-0.056
 depends-0.055
Tokenered
Token ID1068
Feature activation-8.38e+00

Pos logit contributions
 early+0.042
 widened+0.037
 wide+0.037
ised+0.035
 fields+0.034
Neg logit contributions
 yours-0.043
 loser-0.038
osaurs-0.035
 losers-0.035
 winner-0.034
Token over
Token ID625
Feature activation-6.81e+00

Pos logit contributions
awed+0.165
ised+0.163
ized+0.152
 widened+0.145
 early+0.142
Neg logit contributions
 yours-0.199
 you-0.155
 You-0.146
 will-0.145
 loser-0.145
Token the
Token ID262
Feature activation-1.27e+00

Pos logit contributions
ised+0.144
ized+0.139
led+0.135
awed+0.133
 widened+0.132
Neg logit contributions
 yours-0.153
 loser-0.132
 policy-0.130
 you-0.126
 You-0.125
Token side
Token ID1735
Feature activation-3.48e+00

Pos logit contributions
ised+0.021
 early+0.020
ized+0.020
 wide+0.019
 widened+0.019
Neg logit contributions
 yours-0.037
 loser-0.033
 deserve-0.031
 policy-0.030
 virtue-0.030
Token and
Token ID290
Feature activation-3.51e+00

Pos logit contributions
ised+0.080
led+0.077
 widened+0.077
ized+0.075
awed+0.075
Neg logit contributions
 yours-0.074
 loser-0.065
 policy-0.062
 winner-0.058
 rule-0.057
Token saw
Token ID2497
Feature activation-4.46e+00

Pos logit contributions
ised+0.064
 early+0.060
ized+0.058
 widening+0.055
 widened+0.055
Neg logit contributions
 yours-0.081
 you-0.077
 I-0.075
 deserve-0.072
 will-0.071
Token the
Token ID262
Feature activation+1.05e+00

Pos logit contributions
 yours+0.060
 loser+0.055
 deserve+0.052
 virtue+0.052
 depends+0.051
Neg logit contributions
ised-0.056
 early-0.054
ized-0.053
 wide-0.052
 cautiously-0.050
Token squirrel
Token ID33039
Feature activation+3.50e+00

Pos logit contributions
 yours+0.027
 loser+0.023
 deserve+0.022
 virtue+0.022
 losers+0.021
Neg logit contributions
 early-0.020
 wide-0.020
ised-0.019
ized-0.019
 widened-0.018
Token."
Token ID526
Feature activation-1.15e+00

Pos logit contributions
 yours+0.103
 loser+0.095
 deserve+0.090
 virtue+0.088
 policy+0.087
Neg logit contributions
ised-0.053
 early-0.051
ized-0.050
 wide-0.047
 widened-0.046
Token He
Token ID679
Feature activation-2.94e+00

Pos logit contributions
ised+0.021
ized+0.020
 early+0.019
 widened+0.019
led+0.018
Neg logit contributions
 yours-0.030
 loser-0.026
 deserve-0.026
 You-0.026
 costs-0.025
Token looked
Token ID3114
Feature activation-5.55e+00

Pos logit contributions
ised+0.057
ized+0.056
 early+0.054
side+0.051
awed+0.049
Neg logit contributions
 yours-0.067
 deserve-0.061
 teaches-0.060
 costs-0.060
 depends-0.059
Token up
Token ID510
Feature activation-8.37e+00

Pos logit contributions
ised+0.095
side+0.090
awed+0.088
ized+0.085
edly+0.081
Neg logit contributions
 yours-0.145
 will-0.120
 depends-0.116
 You-0.113
 costs-0.113
Token and
Token ID290
Feature activation-3.72e+00

Pos logit contributions
awed+0.173
ised+0.169
 widened+0.158
side+0.158
led+0.153
Neg logit contributions
 yours-0.144
 you-0.134
 ever-0.129
 I-0.117
 gift-0.108
Token saw
Token ID2497
Feature activation-5.22e+00

Pos logit contributions
ised+0.072
edly+0.070
side+0.067
 early+0.066
overed+0.064
Neg logit contributions
 yours-0.076
 you-0.072
 will-0.069
 I-0.069
 depends-0.066
Token a
Token ID257
Feature activation-2.90e+00

Pos logit contributions
ised+0.085
 early+0.081
ized+0.077
awed+0.076
 widened+0.075
Neg logit contributions
 yours-0.151
 loser-0.129
 deserve-0.123
 fault-0.121
 losers-0.121
Token three
Token ID1115
Feature activation-3.63e+00

Pos logit contributions
ised+0.053
ized+0.050
 early+0.049
led+0.047
awed+0.046
Neg logit contributions
 yours-0.077
 loser-0.072
 deserve-0.067
 virtue-0.065
 depends-0.064
Token-
Token ID12
Feature activation-3.79e+00

Pos logit contributions
ised+0.069
 early+0.067
ized+0.065
awed+0.063
 widened+0.062
Neg logit contributions
 yours-0.083
 loser-0.081
 deserve-0.074
 winner-0.073
 rule-0.073
Token,"
Token ID553
Feature activation-2.33e+00

Pos logit contributions
 yours+0.054
 loser+0.049
 deserve+0.046
 virtue+0.044
 policy+0.044
Neg logit contributions
ised-0.046
 early-0.044
ized-0.043
 widened-0.041
 wide-0.041
Token he
Token ID339
Feature activation-2.30e+00

Pos logit contributions
ised+0.035
ized+0.033
 early+0.033
awed+0.031
 widened+0.031
Neg logit contributions
 yours-0.068
 loser-0.060
 deserve-0.060
 costs-0.058
 losers-0.058
Token said
Token ID531
Feature activation-6.04e+00

Pos logit contributions
ised+0.040
 early+0.039
ized+0.038
 widened+0.035
 valleys+0.034
Neg logit contributions
 yours-0.059
 deserve-0.054
 loser-0.053
 costs-0.052
 depends-0.052
Token.
Token ID13
Feature activation-7.07e-01

Pos logit contributions
ised+0.117
ized+0.107
awed+0.103
 fields+0.102
 widened+0.102
Neg logit contributions
 yours-0.155
 loser-0.138
 policy-0.128
 deserve-0.127
 You-0.126
Token Lucy
Token ID22162
Feature activation-3.02e+00

Pos logit contributions
ised+0.013
ized+0.013
 early+0.012
 widened+0.012
led+0.011
Neg logit contributions
 yours-0.019
 loser-0.017
 deserve-0.016
 costs-0.016
 You-0.016
Token closed
Token ID4838
Feature activation-8.36e+00

Pos logit contributions
ised+0.053
 early+0.051
ized+0.049
led+0.042
awed+0.041
Neg logit contributions
 yours-0.074
 deserve-0.070
 winner-0.069
 loser-0.069
 depends-0.068
Token the
Token ID262
Feature activation-7.11e-01

Pos logit contributions
ised+0.172
 widened+0.168
awed+0.167
 valleys+0.162
led+0.161
Neg logit contributions
 yours-0.171
 my-0.128
 our-0.123
 mine-0.121
 you-0.120
Token box
Token ID3091
Feature activation-6.12e+00

Pos logit contributions
ised+0.014
 early+0.014
ized+0.013
 wide+0.013
 widened+0.013
Neg logit contributions
 yours-0.018
 loser-0.016
 deserve-0.015
 losers-0.014
 virtue-0.014
Token and
Token ID290
Feature activation-2.25e+00

Pos logit contributions
awed+0.116
ised+0.108
 valleys+0.101
 sudden+0.101
 widened+0.100
Neg logit contributions
 yours-0.129
 teaches-0.122
 rule-0.119
 gift-0.113
 winner-0.113
Token put
Token ID1234
Feature activation-3.84e+00

Pos logit contributions
ised+0.045
ized+0.040
awed+0.039
 early+0.039
ling+0.039
Neg logit contributions
 yours-0.048
 will-0.045
 depends-0.043
 deserve-0.043
 costs-0.043
Token it
Token ID340
Feature activation-1.61e+00

Pos logit contributions
ised+0.069
 early+0.064
ized+0.064
 widened+0.062
awed+0.059
Neg logit contributions
 yours-0.111
 loser-0.092
 deserve-0.089
 You-0.086
 losers-0.086
Token,
Token ID11
Feature activation+4.00e+00

Pos logit contributions
 yours+0.151
 loser+0.140
 virtue+0.135
 policy+0.134
 losers+0.133
Neg logit contributions
ised-0.025
 early-0.024
 wide-0.022
ized-0.020
led-0.020
Token fire
Token ID2046
Feature activation+5.11e+00

Pos logit contributions
 yours+0.092
 virtue+0.090
 loser+0.087
 policy+0.083
 fault+0.082
Neg logit contributions
 early-0.069
 and-0.069
 wide-0.069
 quickly-0.063
,-0.062
Token is
Token ID318
Feature activation+3.32e+00

Pos logit contributions
 yours+0.130
 loser+0.115
 deserve+0.111
 virtue+0.109
 losers+0.108
Neg logit contributions
ised-0.098
 early-0.092
ized-0.090
led-0.089
 wide-0.086
Token dangerous
Token ID4923
Feature activation+3.20e+00

Pos logit contributions
 yours+0.097
 loser+0.093
 deserve+0.087
 policy+0.085
 depends+0.085
Neg logit contributions
 early-0.047
ised-0.047
 wide-0.046
ized-0.042
 widened-0.040
Token.
Token ID13
Feature activation+7.22e+00

Pos logit contributions
 yours+0.095
 loser+0.088
 deserve+0.083
 virtue+0.081
 losers+0.080
Neg logit contributions
ised-0.048
 early-0.046
ized-0.045
 wide-0.041
 widened-0.041
Token You
Token ID921
Feature activation+7.17e+00

Pos logit contributions
 yours+0.175
 virtue+0.157
 fault+0.154
 loser+0.149
 winner+0.149
Neg logit contributions
ised-0.131
 wide-0.127
 early-0.124
ized-0.114
 widened-0.114
Token can
Token ID460
Feature activation+4.59e+00

Pos logit contributions
 yours+0.145
 virtue+0.117
 loser+0.116
 occasions+0.115
 policy+0.115
Neg logit contributions
 wide-0.171
ised-0.163
 walked-0.159
 quickly-0.156
 widened-0.156
Token't
Token ID470
Feature activation+3.83e+00

Pos logit contributions
 yours+0.099
 virtue+0.083
 loser+0.082
 policy+0.081
 depends+0.078
Neg logit contributions
 wide-0.108
 early-0.103
ised-0.101
 slowly-0.101
 quickly-0.100
Token play
Token ID711
Feature activation+2.53e+00

Pos logit contributions
 yours+0.099
 fault+0.090
 virtue+0.090
 loser+0.088
 policy+0.085
Neg logit contributions
ised-0.069
 wide-0.069
 early-0.066
ized-0.064
 slowly-0.063
Token with
Token ID351
Feature activation+1.38e+00

Pos logit contributions
 yours+0.081
 loser+0.074
 deserve+0.072
 virtue+0.071
 policy+0.071
Neg logit contributions
ised-0.033
 early-0.031
ized-0.030
 wide-0.028
 widened-0.026
Token matches
Token ID7466
Feature activation+1.94e+00

Pos logit contributions
 yours+0.036
 loser+0.035
 deserve+0.032
 virtue+0.032
 depends+0.032
Neg logit contributions
ised-0.025
ized-0.024
 early-0.023
 wide-0.022
 widened-0.021
Token new
Token ID649
Feature activation+4.52e+00

Pos logit contributions
 yours+0.066
 deserve+0.058
grand+0.055
 loser+0.053
 costs+0.053
Neg logit contributions
 wide-0.064
 early-0.062
 tall-0.060
 cautiously-0.059
ised-0.058
Token best
Token ID1266
Feature activation+4.89e+00

Pos logit contributions
 yours+0.109
 deserve+0.102
 loser+0.094
 costs+0.091
 Sorry+0.090
Neg logit contributions
 early-0.092
 wide-0.087
 cautiously-0.084
ised-0.083
 tall-0.083
Token friend
Token ID1545
Feature activation+5.00e+00

Pos logit contributions
 yours+0.123
 deserve+0.114
 costs+0.110
 belong+0.107
 depends+0.107
Neg logit contributions
 early-0.095
 wide-0.085
 and-0.083
 quickly-0.083
 slowly-0.082
Token,
Token ID11
Feature activation+5.03e+00

Pos logit contributions
 yours+0.121
 loser+0.118
 inconvenience+0.116
 virtue+0.115
 fault+0.115
Neg logit contributions
 and-0.083
 early-0.076
 wide-0.075
,-0.075
 quickly-0.071
Token cow
Token ID9875
Feature activation+3.94e+00

Pos logit contributions
 yours+0.138
 loser+0.130
 deserve+0.125
 policy+0.122
 virtue+0.121
Neg logit contributions
 early-0.078
 wide-0.076
ised-0.073
 cautiously-0.071
ized-0.068
Token.
Token ID13
Feature activation+9.57e+00

Pos logit contributions
 yours+0.110
 loser+0.101
 policy+0.097
 deserve+0.096
 virtue+0.094
Neg logit contributions
 early-0.061
ised-0.055
 wide-0.055
ized-0.054
 cautiously-0.053
Token You
Token ID921
Feature activation+7.88e+00

Pos logit contributions
 yours+0.176
 loser+0.157
 fault+0.156
 policy+0.152
 rule+0.145
Neg logit contributions
 wide-0.206
 early-0.205
 cautiously-0.202
ised-0.198
 Suddenly-0.198
Token are
Token ID389
Feature activation+4.17e+00

Pos logit contributions
 yours+0.164
 policy+0.147
 loser+0.138
 fault+0.129
 virtue+0.125
Neg logit contributions
 wide-0.194
 widened-0.182
ised-0.180
 early-0.171
 cautiously-0.169
Token so
Token ID523
Feature activation+2.47e+00

Pos logit contributions
 loser+0.111
 yours+0.111
 policy+0.110
grand+0.104
 fault+0.103
Neg logit contributions
 wide-0.065
 early-0.062
ised-0.059
 tall-0.058
ized-0.055
Token wise
Token ID10787
Feature activation+2.98e+00

Pos logit contributions
 yours+0.068
 loser+0.061
 deserve+0.057
 virtue+0.056
 policy+0.055
Neg logit contributions
ised-0.044
 early-0.042
ized-0.042
 widened-0.039
 wide-0.039
Token and
Token ID290
Feature activation+5.28e+00

Pos logit contributions
 yours+0.071
 loser+0.065
 belong+0.061
 deserve+0.060
 policy+0.059
Neg logit contributions
 early-0.060
ised-0.060
 wide-0.055
ized-0.055
led-0.052
Token "
Token ID366
Feature activation+4.63e+00

Pos logit contributions
ised+0.051
ized+0.049
led+0.045
 early+0.044
awed+0.044
Neg logit contributions
 yours-0.060
 loser-0.053
 You-0.052
 Friends-0.049
 deserve-0.049
TokenWow
Token ID22017
Feature activation+3.49e+00

Pos logit contributions
 yours+0.132
 fault+0.124
 loser+0.114
 virtue+0.114
 policy+0.110
Neg logit contributions
ised-0.080
ized-0.077
 widened-0.071
awed-0.070
 wide-0.069
Token,
Token ID11
Feature activation+4.21e+00

Pos logit contributions
 yours+0.124
 loser+0.115
 deserve+0.111
 virtue+0.109
 policy+0.108
Neg logit contributions
ised-0.033
 early-0.029
ized-0.029
 widened-0.026
 wide-0.025
Token Lily
Token ID20037
Feature activation+3.64e+00

Pos logit contributions
 loser+0.120
 virtue+0.119
 yours+0.118
 policy+0.113
 fault+0.111
Neg logit contributions
 wide-0.054
ised-0.050
ized-0.047
 early-0.047
 tall-0.044
Token!
Token ID0
Feature activation+8.21e+00

Pos logit contributions
 yours+0.124
 loser+0.114
 deserve+0.109
 virtue+0.107
 losers+0.106
Neg logit contributions
ised-0.041
 early-0.038
ized-0.037
 widened-0.034
 wide-0.033
Token That
Token ID1320
Feature activation+8.18e+00

Pos logit contributions
 virtue+0.158
 yours+0.155
 rule+0.149
 losers+0.143
 loser+0.137
Neg logit contributions
 wide-0.172
ised-0.165
 walked-0.162
 early-0.161
awed-0.160
Token's
Token ID338
Feature activation+5.12e+00

Pos logit contributions
 yours+0.226
 loser+0.216
 losers+0.204
 policy+0.195
 inconvenience+0.190
Neg logit contributions
 wide-0.149
 tall-0.124
ised-0.124
 early-0.123
ized-0.119
Token an
Token ID281
Feature activation+2.49e+00

Pos logit contributions
 yours+0.149
 loser+0.145
 policy+0.132
 deserve+0.131
 Friends+0.130
Neg logit contributions
ised-0.074
 early-0.072
 wide-0.071
ized-0.066
 widened-0.063
Token incredible
Token ID8082
Feature activation+3.88e+00

Pos logit contributions
 yours+0.063
 loser+0.060
 deserve+0.056
 Friends+0.054
 losers+0.054
Neg logit contributions
ised-0.046
ized-0.045
 early-0.044
 wide-0.043
 curiously-0.040
Token picture
Token ID4286
Feature activation+4.40e+00

Pos logit contributions
 yours+0.091
 losers+0.085
 loser+0.085
 deserve+0.081
 Go+0.076
Neg logit contributions
 wide-0.077
 early-0.077
ised-0.071
 tall-0.070
ized-0.068
Token!"
Token ID2474
Feature activation-1.89e+00

Pos logit contributions
 loser+0.124
 yours+0.118
 virtue+0.116
 fault+0.113
 Deal+0.111
Neg logit contributions
 early-0.062
 wide-0.060
 and-0.055
led-0.053
 quickly-0.051
Token's
Token ID338
Feature activation+3.91e+00

Pos logit contributions
 yours+0.214
 loser+0.212
 policy+0.195
 inconvenience+0.190
 virtue+0.189
Neg logit contributions
 wide-0.092
led-0.089
 quickly-0.082
 and-0.082
 slowly-0.081
Token OK
Token ID7477
Feature activation+5.38e+00

Pos logit contributions
 loser+0.088
 yours+0.086
grand+0.085
 depends+0.085
 costs+0.084
Neg logit contributions
 wide-0.073
 early-0.072
ised-0.068
 low-0.065
 tall-0.063
Token,
Token ID11
Feature activation+4.26e+00

Pos logit contributions
 yours+0.212
 virtue+0.202
 loser+0.200
 depends+0.199
 deserve+0.199
Neg logit contributions
 early-0.019
 wide-0.017
ised-0.012
 quickly-0.009
,-0.009
Token Tom
Token ID4186
Feature activation+3.56e+00

Pos logit contributions
 yours+0.124
 virtue+0.117
 loser+0.115
 deserve+0.114
 policy+0.111
Neg logit contributions
 wide-0.057
 early-0.056
ised-0.055
ized-0.053
 widened-0.049
Token.
Token ID13
Feature activation+7.98e+00

Pos logit contributions
 yours+0.125
 loser+0.117
 virtue+0.113
 deserve+0.111
 losers+0.111
Neg logit contributions
 early-0.030
ised-0.027
 wide-0.026
ized-0.024
led-0.023
Token It
Token ID632
Feature activation+7.50e+00

Pos logit contributions
 yours+0.128
 virtue+0.113
 loser+0.105
 winner+0.103
 deserve+0.102
Neg logit contributions
ised-0.201
 wide-0.199
 early-0.195
 low-0.182
 widened-0.181
Token's
Token ID338
Feature activation+4.15e+00

Pos logit contributions
 yours+0.229
 loser+0.218
 policy+0.202
 virtue+0.192
 occasions+0.190
Neg logit contributions
led-0.109
 wide-0.108
 walked-0.099
 slowly-0.099
 and-0.097
Token just
Token ID655
Feature activation+5.36e+00

Pos logit contributions
 loser+0.091
 yours+0.090
 deserve+0.085
 policy+0.085
 costs+0.084
Neg logit contributions
 early-0.081
 wide-0.080
ised-0.075
 low-0.072
 tall-0.071
Token a
Token ID257
Feature activation+3.30e+00

Pos logit contributions
 yours+0.165
 loser+0.158
 policy+0.148
 virtue+0.148
 deserve+0.147
Neg logit contributions
ised-0.066
 early-0.066
 wide-0.063
ized-0.061
led-0.058
Token balloon
Token ID21190
Feature activation+4.51e+00

Pos logit contributions
 yours+0.083
 costs+0.073
 Go+0.073
 deserve+0.070
 You+0.069
Neg logit contributions
 wide-0.064
 early-0.062
 tall-0.057
 low-0.057
 sudden-0.053
Token.
Token ID13
Feature activation+8.50e+00

Pos logit contributions
 yours+0.155
 virtue+0.148
 deserve+0.146
 loser+0.146
 Congratulations+0.146
Neg logit contributions
 early-0.030
 and-0.028
 wide-0.028
led-0.022
 slowly-0.022
Token the
Token ID262
Feature activation+5.68e-01

Pos logit contributions
ised+0.016
 early+0.015
ized+0.015
 widened+0.014
 wide+0.013
Neg logit contributions
 yours-0.028
 loser-0.026
 deserve-0.024
 virtue-0.024
 losers-0.023
Token girl
Token ID2576
Feature activation-4.23e+00

Pos logit contributions
 yours+0.012
 loser+0.010
 deserve+0.010
 virtue+0.009
 policy+0.009
Neg logit contributions
ised-0.014
 early-0.013
ized-0.013
 widened-0.013
 wide-0.012
Token said
Token ID531
Feature activation-5.75e+00

Pos logit contributions
ised+0.072
 early+0.067
ized+0.066
 widened+0.060
awed+0.059
Neg logit contributions
 yours-0.114
 deserve-0.102
 loser-0.102
 depends-0.100
 costs-0.099
Token.
Token ID13
Feature activation-4.46e-01

Pos logit contributions
ised+0.117
ized+0.106
 widened+0.104
awed+0.102
 early+0.102
Neg logit contributions
 yours-0.138
 loser-0.121
 Friends-0.115
 You-0.114
 Go-0.111
Token "
Token ID366
Feature activation+3.69e+00

Pos logit contributions
ised+0.009
ized+0.009
led+0.008
 early+0.008
 widened+0.008
Neg logit contributions
 yours-0.011
 loser-0.010
 You-0.010
 Friends-0.010
 deserve-0.009
TokenYou
Token ID1639
Feature activation+6.84e+00

Pos logit contributions
 yours+0.075
 loser+0.059
 fault+0.057
 policy+0.057
 virtue+0.057
Neg logit contributions
ised-0.092
 early-0.092
ized-0.091
 wide-0.086
 widened-0.086
Token are
Token ID389
Feature activation+3.69e+00

Pos logit contributions
 yours+0.168
 policy+0.150
 occasions+0.145
 loser+0.142
 virtue+0.140
Neg logit contributions
 wide-0.144
 widened-0.140
ised-0.130
led-0.129
 walked-0.127
Token very
Token ID845
Feature activation+2.20e+00

Pos logit contributions
 occasions+0.116
 policy+0.114
 loser+0.112
 virtue+0.111
grand+0.111
Neg logit contributions
 wide-0.047
 tall-0.040
 excited-0.035
 surprised-0.035
 amazed-0.034
Token kind
Token ID1611
Feature activation+2.26e+00

Pos logit contributions
 yours+0.058
 loser+0.053
 policy+0.049
 fault+0.049
 virtue+0.049
Neg logit contributions
ised-0.040
 early-0.038
ized-0.038
 wide-0.038
led-0.036
Token.
Token ID13
Feature activation+7.22e+00

Pos logit contributions
 yours+0.078
 loser+0.072
 virtue+0.070
 deserve+0.069
 depends+0.069
Neg logit contributions
ised-0.022
 early-0.021
ized-0.019
 wide-0.019
led-0.017
Token Do
Token ID2141
Feature activation+7.10e+00

Pos logit contributions
 fault+0.114
 yours+0.114
 loser+0.111
 virtue+0.108
 inconvenience+0.102
Neg logit contributions
ised-0.169
 early-0.168
 wide-0.165
 Then-0.158
 excited-0.155
TokenBecause
Token ID8128
Feature activation+6.43e+00

Pos logit contributions
 yours+0.144
 loser+0.127
 deserve+0.118
 policy+0.117
 losers+0.115
Neg logit contributions
ised-0.115
ized-0.110
 early-0.108
 widened-0.103
 wide-0.102
Token you
Token ID345
Feature activation+6.52e+00

Pos logit contributions
 yours+0.155
 loser+0.142
 deserve+0.134
 virtue+0.130
 depends+0.128
Neg logit contributions
ised-0.131
 early-0.126
ized-0.124
 wide-0.118
 widened-0.114
Token won
Token ID1839
Feature activation+3.73e+00

Pos logit contributions
 yours+0.114
 loser+0.097
 policy+0.087
 virtue+0.080
 losers+0.079
Neg logit contributions
ised-0.182
ized-0.172
 wide-0.171
 early-0.169
 widened-0.168
Token the
Token ID262
Feature activation+2.26e+00

Pos logit contributions
 yours+0.068
 loser+0.057
 policy+0.052
 virtue+0.051
 deserve+0.051
Neg logit contributions
ised-0.101
 early-0.097
ized-0.096
 wide-0.094
 widened-0.093
Token hose
Token ID31489
Feature activation+4.97e+00

Pos logit contributions
 yours+0.054
 loser+0.048
 deserve+0.045
 policy+0.045
 losers+0.044
Neg logit contributions
ised-0.046
 wide-0.046
 early-0.045
ized-0.045
 tall-0.041
Token,
Token ID11
Feature activation+6.15e+00

Pos logit contributions
 yours+0.145
 loser+0.137
 deserve+0.127
 policy+0.127
 virtue+0.126
Neg logit contributions
 early-0.073
ised-0.072
ized-0.070
 wide-0.066
 widened-0.061
Token sweet
Token ID6029
Feature activation+3.32e+00

Pos logit contributions
 yours+0.155
 loser+0.144
 deserve+0.132
 policy+0.130
 virtue+0.129
Neg logit contributions
ised-0.116
 early-0.112
ized-0.110
 wide-0.107
 cautiously-0.103
Tokenheart
Token ID11499
Feature activation+4.81e+00

Pos logit contributions
 yours+0.077
 loser+0.066
 deserve+0.063
 policy+0.060
 virtue+0.059
Neg logit contributions
ised-0.073
ized-0.070
 early-0.070
 widened-0.068
awed-0.067
Token.
Token ID13
Feature activation+9.54e+00

Pos logit contributions
 yours+0.108
 loser+0.100
 losers+0.092
 fault+0.090
 deserve+0.090
Neg logit contributions
ised-0.101
 early-0.099
ized-0.096
 wide-0.092
led-0.090
Token Now
Token ID2735
Feature activation+5.35e+00

Pos logit contributions
 yours+0.211
 loser+0.191
 policy+0.186
 belong+0.177
 rule+0.174
Neg logit contributions
ised-0.202
 cautiously-0.182
 wide-0.182
 early-0.180
 nervously-0.174
Token you
Token ID345
Feature activation+6.80e+00

Pos logit contributions
 yours+0.100
 loser+0.089
 deserve+0.079
 virtue+0.078
 policy+0.078
Neg logit contributions
ised-0.139
 early-0.134
ized-0.133
 wide-0.129
 widened-0.127
Token of
Token ID286
Feature activation+9.87e-01

Pos logit contributions
 yours+0.018
 loser+0.017
 virtue+0.016
 deserve+0.016
 depends+0.016
Neg logit contributions
 wide-0.011
ised-0.011
 early-0.011
ized-0.011
 tall-0.010
Token the
Token ID262
Feature activation+6.97e-01

Pos logit contributions
 loser+0.024
 yours+0.023
 virtue+0.022
 deserve+0.021
 depends+0.021
Neg logit contributions
ised-0.018
 early-0.018
 wide-0.018
ized-0.017
 cautiously-0.016
Token animals
Token ID4695
Feature activation+3.48e+00

Pos logit contributions
 yours+0.019
 loser+0.018
 deserve+0.016
 virtue+0.016
 policy+0.016
Neg logit contributions
ised-0.012
 wide-0.011
 early-0.011
ized-0.011
 widened-0.010
Token hidden
Token ID7104
Feature activation-4.54e-01

Pos logit contributions
 yours+0.089
 loser+0.087
 virtue+0.080
 deserve+0.078
 policy+0.078
Neg logit contributions
 early-0.060
ised-0.059
ized-0.058
 wide-0.057
 cautiously-0.053
Token away
Token ID1497
Feature activation+6.74e-01

Pos logit contributions
ised+0.007
 early+0.007
ized+0.007
 wide+0.007
led+0.006
Neg logit contributions
 yours-0.012
 loser-0.012
 deserve-0.011
 virtue-0.011
 depends-0.011
Token in
Token ID287
Feature activation+1.40e+00

Pos logit contributions
 yours+0.015
 loser+0.014
 deserve+0.013
 virtue+0.012
 losers+0.012
Neg logit contributions
ised-0.015
 early-0.014
ized-0.014
 wide-0.014
 widened-0.013
Token the
Token ID262
Feature activation+7.20e-02

Pos logit contributions
 yours+0.045
 loser+0.043
 virtue+0.040
 deserve+0.040
 policy+0.039
Neg logit contributions
ised-0.017
ized-0.016
 early-0.015
 wide-0.015
 widened-0.014
Token woods
Token ID16479
Feature activation+7.40e-01

Pos logit contributions
 yours+0.002
 loser+0.002
 deserve+0.002
 virtue+0.002
 losers+0.002
Neg logit contributions
ised-0.001
 early-0.001
ized-0.001
 widened-0.001
 wide-0.001
Token.
Token ID13
Feature activation+7.52e+00

Pos logit contributions
 yours+0.018
 loser+0.016
 deserve+0.015
 virtue+0.014
 policy+0.014
Neg logit contributions
ised-0.016
 early-0.015
ized-0.015
 widened-0.014
 wide-0.014
Token Follow
Token ID7281
Feature activation+2.26e-01

Pos logit contributions
 yours+0.151
 fault+0.141
 loser+0.135
 policy+0.128
 virtue+0.128
Neg logit contributions
 cautiously-0.171
 wide-0.169
 nervously-0.162
 curiously-0.160
ised-0.160
Token me
Token ID502
Feature activation+2.10e+00

Pos logit contributions
 yours+0.005
 loser+0.005
 deserve+0.004
 virtue+0.004
 policy+0.004
Neg logit contributions
ised-0.005
 early-0.005
ized-0.005
 widened-0.004
 wide-0.004
Token how
Token ID703
Feature activation+1.44e+00

Pos logit contributions
ised+0.039
 widened+0.037
awed+0.036
ized+0.035
 widening+0.034
Neg logit contributions
 yours-0.053
 costs-0.043
 fault-0.043
 deserve-0.043
 loser-0.042
Token to
Token ID284
Feature activation+2.75e+00

Pos logit contributions
 yours+0.037
 loser+0.031
 deserve+0.031
 virtue+0.030
 losers+0.029
Neg logit contributions
ised-0.028
 widened-0.027
ized-0.026
 early-0.026
awed-0.026
Token use
Token ID779
Feature activation-1.18e-01

Pos logit contributions
 yours+0.054
 loser+0.047
 deserve+0.043
 virtue+0.041
 policy+0.041
Neg logit contributions
ised-0.070
 early-0.067
ized-0.067
 widened-0.064
 wide-0.064
Token a
Token ID257
Feature activation+1.83e+00

Pos logit contributions
ised+0.001
 early+0.001
ized+0.001
 widened+0.001
awed+0.001
Neg logit contributions
 yours-0.004
 loser-0.004
 deserve-0.003
 virtue-0.003
 losers-0.003
Token scale
Token ID5046
Feature activation+5.48e-01

Pos logit contributions
 yours+0.047
 loser+0.042
 deserve+0.040
 virtue+0.038
 losers+0.038
Neg logit contributions
ised-0.035
 early-0.034
ized-0.033
 wide-0.032
 widened-0.032
Token to
Token ID284
Feature activation+2.38e+00

Pos logit contributions
 yours+0.016
 loser+0.015
 deserve+0.014
 virtue+0.014
 policy+0.013
Neg logit contributions
ised-0.009
 early-0.008
ized-0.008
 widened-0.008
awed-0.007
Token weigh
Token ID10164
Feature activation-8.94e-01

Pos logit contributions
 yours+0.044
 loser+0.037
 deserve+0.034
 virtue+0.032
 depends+0.032
Neg logit contributions
ised-0.063
 early-0.061
ized-0.061
 widened-0.059
awed-0.058
Token it
Token ID340
Feature activation+1.27e+00

Pos logit contributions
ised+0.013
 early+0.012
ized+0.012
 widened+0.012
awed+0.012
Neg logit contributions
 yours-0.027
 loser-0.024
 deserve-0.023
 virtue-0.022
 costs-0.022
Token.
Token ID13
Feature activation+7.51e-01

Pos logit contributions
 yours+0.038
 loser+0.034
 deserve+0.033
 costs+0.032
 virtue+0.032
Neg logit contributions
ised-0.020
ized-0.018
 early-0.018
awed-0.017
 widened-0.017
Token Tim
Token ID5045
Feature activation-9.29e-01

Pos logit contributions
 yours+0.019
 loser+0.017
 deserve+0.016
 You+0.016
 costs+0.016
Neg logit contributions
ised-0.015
ized-0.014
led-0.014
 widened-0.013
 early-0.013
Tokenmy
Token ID1820
Feature activation-7.45e-01

Pos logit contributions
ised+0.026
ized+0.026
 early+0.025
 widened+0.025
 wide+0.024
Neg logit contributions
 yours-0.015
 loser-0.013
 deserve-0.012
 depends-0.011
 costs-0.011
TokenTom
Token ID13787
Feature activation-1.29e+00

Pos logit contributions
ised+0.065
ized+0.062
 early+0.061
led+0.060
 widened+0.059
Neg logit contributions
 yours-0.068
 loser-0.062
 deserve-0.059
 losers-0.055
 virtue-0.054
Token says
Token ID1139
Feature activation-3.43e+00

Pos logit contributions
ised+0.032
ized+0.031
 early+0.030
 widened+0.029
awed+0.028
Neg logit contributions
 yours-0.026
 loser-0.023
 deserve-0.022
 depends-0.021
 losers-0.021
Token,
Token ID11
Feature activation-3.37e+00

Pos logit contributions
ised+0.032
 early+0.029
ized+0.029
 widened+0.026
 wide+0.025
Neg logit contributions
 yours-0.123
 loser-0.113
 deserve-0.109
 virtue-0.106
 policy-0.106
Token "
Token ID366
Feature activation+4.23e+00

Pos logit contributions
ised+0.067
ized+0.064
 early+0.058
led+0.058
awed+0.057
Neg logit contributions
 yours-0.087
 loser-0.082
 You-0.074
 deserve-0.074
 losers-0.073
TokenNot
Token ID3673
Feature activation+4.28e+00

Pos logit contributions
 yours+0.106
 fault+0.093
 policy+0.087
 loser+0.084
 virtue+0.082
Neg logit contributions
ised-0.094
ized-0.088
 wide-0.086
 early-0.083
 widened-0.083
Token now
Token ID783
Feature activation+3.69e+00

Pos logit contributions
 yours+0.115
 loser+0.103
 deserve+0.097
 virtue+0.094
 policy+0.093
Neg logit contributions
ised-0.079
 early-0.075
ized-0.075
 widened-0.071
 wide-0.070
Token,
Token ID11
Feature activation+5.44e+00

Pos logit contributions
 yours+0.133
 loser+0.123
 deserve+0.120
 virtue+0.118
 depends+0.117
Neg logit contributions
ised-0.031
 early-0.030
ized-0.027
 wide-0.025
 widened-0.024
Token horse
Token ID8223
Feature activation+4.98e+00

Pos logit contributions
 yours+0.144
 loser+0.130
 virtue+0.128
 deserve+0.125
 fault+0.123
Neg logit contributions
ised-0.093
 early-0.091
 wide-0.089
ized-0.087
 widened-0.082
Token.
Token ID13
Feature activation+8.96e+00

Pos logit contributions
 yours+0.168
 loser+0.153
 virtue+0.151
 fault+0.150
 policy+0.149
Neg logit contributions
 early-0.046
 wide-0.042
ised-0.040
led-0.039
ized-0.039
Token I
Token ID314
Feature activation+6.07e+00

Pos logit contributions
 yours+0.185
 virtue+0.165
 fault+0.159
 policy+0.153
 loser+0.148
Neg logit contributions
 wide-0.190
ised-0.185
 early-0.181
 low-0.175
 cautiously-0.174
Token have
Token ID423
Feature activation+2.45e+00

Pos logit contributions
 yours+0.102
 loser+0.074
 policy+0.071
 virtue+0.066
 losers+0.064
Neg logit contributions
 early-0.173
 wide-0.170
ised-0.169
 widened-0.165
ized-0.162
Token way
Token ID835
Feature activation+2.22e+00

Pos logit contributions
 yours+0.067
 loser+0.061
osaurs+0.060
 deserve+0.060
grand+0.058
Neg logit contributions
 wide-0.043
 early-0.041
ised-0.038
 low-0.037
 tall-0.037
Token to
Token ID284
Feature activation+2.13e+00

Pos logit contributions
 yours+0.066
 loser+0.064
 deserve+0.061
 depends+0.060
 virtue+0.059
Neg logit contributions
 early-0.030
ised-0.029
 wide-0.028
ized-0.027
 cautiously-0.026
Token unlock
Token ID12116
Feature activation-2.12e+00

Pos logit contributions
 yours+0.037
 loser+0.030
 deserve+0.028
 belong+0.026
 losers+0.026
Neg logit contributions
ised-0.061
ized-0.058
 early-0.058
awed-0.057
led-0.057
Token the
Token ID262
Feature activation+1.88e+00

Pos logit contributions
ised+0.027
 early+0.025
ized+0.024
 widened+0.024
awed+0.023
Neg logit contributions
 yours-0.070
 loser-0.062
 deserve-0.059
 policy-0.058
 virtue-0.058
Token hole
Token ID7604
Feature activation-9.53e-01

Pos logit contributions
 yours+0.059
 loser+0.053
 deserve+0.052
osaurs+0.052
grand+0.051
Neg logit contributions
 wide-0.024
ised-0.024
 early-0.022
ized-0.021
 tall-0.020
Token to
Token ID284
Feature activation+2.45e+00

Pos logit contributions
ised+0.015
ized+0.014
 widened+0.013
 early+0.013
awed+0.013
Neg logit contributions
 yours-0.029
 loser-0.025
 deserve-0.024
 costs-0.023
 policy-0.023
Token get
Token ID651
Feature activation+1.04e+00

Pos logit contributions
 yours+0.047
 loser+0.038
 deserve+0.038
 belong+0.035
 depends+0.034
Neg logit contributions
ised-0.066
ized-0.062
 early-0.062
awed-0.061
led-0.061
Token the
Token ID262
Feature activation+2.45e+00

Pos logit contributions
 yours+0.033
 loser+0.031
 deserve+0.029
 virtue+0.029
 policy+0.029
Neg logit contributions
ised-0.013
 early-0.013
ized-0.013
 wide-0.012
led-0.011
Token ball
Token ID2613
Feature activation+2.53e+00

Pos logit contributions
 yours+0.062
grand+0.056
osaurs+0.056
 loser+0.055
 deserve+0.054
Neg logit contributions
 wide-0.045
ised-0.044
ized-0.042
 early-0.040
 tall-0.038
Token back
Token ID736
Feature activation+1.38e+00

Pos logit contributions
 yours+0.077
 loser+0.071
 deserve+0.067
 virtue+0.065
 losers+0.065
Neg logit contributions
ised-0.036
 early-0.035
ized-0.034
 wide-0.032
 widened-0.031
Token.
Token ID13
Feature activation+5.93e-01

Pos logit contributions
 yours+0.042
 loser+0.036
 deserve+0.035
 policy+0.034
 fault+0.034
Neg logit contributions
ised-0.022
ized-0.021
 widened-0.020
 early-0.020
awed-0.019
Token Max
Token ID5436
Feature activation-5.56e-01

Pos logit contributions
 yours+0.054
 loser+0.046
 deserve+0.045
 You+0.044
 losers+0.043
Neg logit contributions
ised-0.049
ized-0.048
led-0.046
 early-0.044
awed-0.044
Token felt
Token ID2936
Feature activation-3.35e+00

Pos logit contributions
ised+0.011
ized+0.011
 early+0.010
overed+0.009
awed+0.009
Neg logit contributions
 deserve-0.012
 yours-0.012
 will-0.012
 depends-0.011
 teaches-0.011
Token much
Token ID881
Feature activation-2.60e+00

Pos logit contributions
ised+0.076
awed+0.074
ized+0.073
 early+0.069
 cautiously+0.068
Neg logit contributions
 yours-0.075
 deserve-0.066
 belong-0.062
 loser-0.061
 virtue-0.059
Token better
Token ID1365
Feature activation-9.86e-01

Pos logit contributions
ised+0.080
ized+0.078
awed+0.076
 early+0.075
 cautiously+0.073
Neg logit contributions
 yours-0.035
 deserve-0.029
 loser-0.028
 depends-0.025
 virtue-0.024
Token and
Token ID290
Feature activation-9.73e-01

Pos logit contributions
ised+0.018
 widened+0.017
awed+0.017
ized+0.016
led+0.016
Neg logit contributions
 yours-0.025
 loser-0.025
 you-0.022
 least-0.022
 policy-0.022
Token was
Token ID373
Feature activation-1.37e+00

Pos logit contributions
ised+0.019
ized+0.017
awed+0.017
overed+0.016
 widened+0.016
Neg logit contributions
 yours-0.024
 depends-0.022
 deserve-0.021
 will-0.021
 teaches-0.021
Token happy
Token ID3772
Feature activation-1.07e+00

Pos logit contributions
ised+0.027
ized+0.025
 widened+0.023
awed+0.023
 early+0.022
Neg logit contributions
 yours-0.039
 loser-0.031
 deserve-0.031
 losers-0.030
 belong-0.030
Token to
Token ID284
Feature activation+8.93e-01

Pos logit contributions
ised+0.017
ized+0.016
 early+0.016
 widened+0.015
 wide+0.014
Neg logit contributions
 yours-0.031
 loser-0.028
 deserve-0.027
 policy-0.026
 virtue-0.026
Token be
Token ID307
Feature activation-4.23e-01

Pos logit contributions
 yours+0.018
 loser+0.015
 virtue+0.014
 costs+0.014
 fault+0.013
Neg logit contributions
 early-0.022
ised-0.022
 wide-0.021
 widened-0.021
ized-0.020
Token able
Token ID1498
Feature activation+1.24e+00

Pos logit contributions
ised+0.008
ized+0.007
 widened+0.007
 early+0.007
 curiously+0.007
Neg logit contributions
 yours-0.012
 loser-0.010
 deserve-0.010
 losers-0.009
 belong-0.009
Token to
Token ID284
Feature activation+1.84e+00

Pos logit contributions
 yours+0.037
 loser+0.034
 deserve+0.032
 virtue+0.032
 depends+0.031
Neg logit contributions
 early-0.018
ised-0.018
ized-0.017
 wide-0.017
awed-0.016
Token air
Token ID1633
Feature activation-2.45e+00

Pos logit contributions
ised+0.016
ized+0.015
 early+0.015
awed+0.014
 widened+0.013
Neg logit contributions
 yours-0.031
 loser-0.028
 deserve-0.027
 virtue-0.026
 depends-0.026
Token and
Token ID290
Feature activation+1.35e-01

Pos logit contributions
ised+0.040
ized+0.035
awed+0.034
 early+0.034
 widened+0.033
Neg logit contributions
 yours-0.070
 loser-0.062
 deserve-0.059
 fault-0.058
 winner-0.057
Token it
Token ID340
Feature activation+1.95e+00

Pos logit contributions
 yours+0.004
 loser+0.003
 deserve+0.003
 losers+0.003
 costs+0.003
Neg logit contributions
ised-0.002
ized-0.002
 early-0.002
awed-0.002
 widened-0.002
Token was
Token ID373
Feature activation-1.20e+00

Pos logit contributions
 yours+0.049
 loser+0.044
 deserve+0.044
 costs+0.042
 depends+0.042
Neg logit contributions
ised-0.037
 early-0.035
ized-0.034
awed-0.032
 curiously-0.030
Token getting
Token ID1972
Feature activation-2.79e+00

Pos logit contributions
ised+0.022
ized+0.021
awed+0.020
 early+0.019
led+0.019
Neg logit contributions
 yours-0.033
 loser-0.030
 deserve-0.028
 virtue-0.027
 losers-0.027
Token darker
Token ID18646
Feature activation-3.36e+00

Pos logit contributions
ised+0.045
awed+0.043
ized+0.043
 early+0.038
 widened+0.037
Neg logit contributions
 yours-0.080
 loser-0.069
 deserve-0.068
 losers-0.066
 virtue-0.064
Token.
Token ID13
Feature activation+1.32e+00

Pos logit contributions
awed+0.063
ised+0.062
ized+0.061
 widened+0.056
led+0.055
Neg logit contributions
 yours-0.081
 fault-0.070
 loser-0.068
 policy-0.066
 depends-0.064
Token Sally
Token ID25737
Feature activation-1.84e+00

Pos logit contributions
 yours+0.034
 loser+0.031
 deserve+0.030
 You+0.029
 virtue+0.028
Neg logit contributions
ised-0.024
ized-0.023
 early-0.023
led-0.022
 widened-0.022
Token was
Token ID373
Feature activation-3.08e+00

Pos logit contributions
ised+0.035
ized+0.032
 early+0.032
 widened+0.028
overed+0.028
Neg logit contributions
 yours-0.040
 will-0.039
 deserve-0.039
 loser-0.037
'll-0.037
Token scared
Token ID12008
Feature activation-1.55e+00

Pos logit contributions
ised+0.061
ized+0.058
awed+0.055
led+0.054
 widened+0.053
Neg logit contributions
 yours-0.071
 loser-0.070
 winner-0.065
 deserve-0.063
 losers-0.063
Token.
Token ID13
Feature activation+6.63e-01

Pos logit contributions
ised+0.025
 widened+0.024
ized+0.023
awed+0.023
led+0.023
Neg logit contributions
 yours-0.043
 loser-0.038
 you-0.037
 policy-0.037
 You-0.036
Token had
Token ID550
Feature activation-1.02e-01

Pos logit contributions
ised+0.016
 early+0.014
ized+0.014
 widened+0.013
awed+0.013
Neg logit contributions
 yours-0.027
 loser-0.025
 deserve-0.024
 depends-0.023
 virtue-0.023
Token managed
Token ID5257
Feature activation-1.03e+00

Pos logit contributions
ised+0.002
 early+0.002
ized+0.002
 widened+0.002
awed+0.002
Neg logit contributions
 yours-0.003
 loser-0.002
 deserve-0.002
 virtue-0.002
 losers-0.002
Token to
Token ID284
Feature activation+8.17e-01

Pos logit contributions
ised+0.022
ized+0.020
 widened+0.019
 early+0.019
awed+0.019
Neg logit contributions
 yours-0.027
 loser-0.022
 deserve-0.022
 fault-0.021
 virtue-0.021
Token resist
Token ID4180
Feature activation-2.74e+00

Pos logit contributions
 yours+0.016
 loser+0.013
 deserve+0.012
 virtue+0.012
 losers+0.012
Neg logit contributions
ised-0.021
 early-0.020
ized-0.020
 widened-0.020
awed-0.019
Token the
Token ID262
Feature activation+1.26e-01

Pos logit contributions
ised+0.040
ized+0.037
awed+0.036
 early+0.036
 widened+0.036
Neg logit contributions
 yours-0.084
 loser-0.074
 deserve-0.070
 losers-0.069
 fault-0.069
Token bear
Token ID6842
Feature activation+8.44e-01

Pos logit contributions
 yours+0.003
 loser+0.003
 deserve+0.003
 losers+0.003
 policy+0.003
Neg logit contributions
 early-0.002
ised-0.002
ized-0.002
 wide-0.002
 curiously-0.002
Token's
Token ID338
Feature activation-6.91e-01

Pos logit contributions
 yours+0.024
 loser+0.021
 deserve+0.020
 virtue+0.020
 depends+0.019
Neg logit contributions
ised-0.014
ized-0.013
 early-0.013
 widened-0.013
awed-0.013
Token gr
Token ID1036
Feature activation+8.44e-01

Pos logit contributions
ised+0.018
 early+0.017
ized+0.017
 wide+0.016
 widened+0.016
Neg logit contributions
 yours-0.013
 loser-0.012
 deserve-0.011
 losers-0.010
 virtue-0.010
Tokenump
Token ID931
Feature activation+6.73e-01

Pos logit contributions
 yours+0.019
 loser+0.016
 deserve+0.015
 virtue+0.015
 policy+0.015
Neg logit contributions
ised-0.020
 early-0.019
ized-0.019
 widened-0.018
 wide-0.018
Tokeniness
Token ID1272
Feature activation+1.25e+00

Pos logit contributions
 yours+0.017
 loser+0.015
 deserve+0.014
 depends+0.013
 virtue+0.013
Neg logit contributions
ised-0.014
 early-0.013
ized-0.013
led-0.012
 widened-0.012
Token and
Token ID290
Feature activation+2.84e-01

Pos logit contributions
 yours+0.039
 loser+0.036
 deserve+0.034
 virtue+0.033
 policy+0.033
Neg logit contributions
ised-0.017
 early-0.015
ized-0.015
awed-0.014
 widened-0.014
Tokenled
Token ID992
Feature activation-4.98e+00

Pos logit contributions
 early+0.021
ised+0.021
 widening+0.018
 wide+0.018
ized+0.017
Neg logit contributions
 yours-0.071
 costs-0.064
 will-0.063
 depends-0.062
 deserve-0.062
Token her
Token ID607
Feature activation-1.77e+00

Pos logit contributions
ised+0.062
ized+0.056
 early+0.056
 widened+0.052
awed+0.051
Neg logit contributions
 yours-0.164
 loser-0.147
 deserve-0.141
 losers-0.137
 policy-0.137
Token hand
Token ID1021
Feature activation-2.44e+00

Pos logit contributions
ised+0.031
ized+0.029
 early+0.029
 widened+0.028
awed+0.027
Neg logit contributions
 yours-0.049
 loser-0.044
 deserve-0.041
 virtue-0.040
 policy-0.040
Token.
Token ID13
Feature activation+2.25e+00

Pos logit contributions
ised+0.040
ized+0.037
 early+0.035
 widened+0.034
awed+0.034
Neg logit contributions
 yours-0.072
 loser-0.062
 deserve-0.059
 costs-0.059
 You-0.058
Token Sophie
Token ID35331
Feature activation-1.35e+00

Pos logit contributions
 yours+0.059
 loser+0.051
 You+0.051
 deserve+0.050
 Congratulations+0.049
Neg logit contributions
ised-0.044
ized-0.042
led-0.040
 widened-0.037
awed-0.037
Token felt
Token ID2936
Feature activation-3.11e+00

Pos logit contributions
ised+0.023
 early+0.021
ized+0.021
awed+0.019
 widened+0.019
Neg logit contributions
 yours-0.035
 deserve-0.032
 loser-0.032
 depends-0.031
 costs-0.031
Token happy
Token ID3772
Feature activation-1.16e+00

Pos logit contributions
ised+0.061
ized+0.057
awed+0.057
 early+0.056
led+0.051
Neg logit contributions
 yours-0.080
 deserve-0.071
 belong-0.068
 loser-0.067
 losers-0.065
Token and
Token ID290
Feature activation-1.04e+00

Pos logit contributions
ised+0.018
ized+0.016
 widened+0.015
awed+0.015
 early+0.014
Neg logit contributions
 yours-0.035
 loser-0.031
 deserve-0.030
 fault-0.029
 policy-0.029
Token the
Token ID262
Feature activation+8.41e-01

Pos logit contributions
ised+0.020
awed+0.017
ized+0.017
 widened+0.017
side+0.016
Neg logit contributions
 yours-0.026
 deserve-0.024
 you-0.023
 I-0.023
 belong-0.023
Token sadness
Token ID25303
Feature activation+7.22e-01

Pos logit contributions
 yours+0.018
 loser+0.016
 deserve+0.015
 losers+0.014
 winner+0.014
Neg logit contributions
ised-0.021
ized-0.020
 early-0.019
 widened-0.019
led-0.019
Token started
Token ID2067
Feature activation+3.90e-01

Pos logit contributions
 yours+0.019
 loser+0.017
 deserve+0.017
 belong+0.016
 losers+0.016
Neg logit contributions
ised-0.013
 early-0.012
ized-0.012
awed-0.011
 cautiously-0.011
Token.
Token ID13
Feature activation+1.73e+00

Pos logit contributions
 yours+0.042
 loser+0.038
 deserve+0.037
 virtue+0.036
 policy+0.036
Neg logit contributions
ised-0.020
 early-0.018
ized-0.018
 widened-0.017
 wide-0.017
Token He
Token ID679
Feature activation+7.67e-01

Pos logit contributions
 yours+0.050
 loser+0.046
 deserve+0.043
 virtue+0.042
 losers+0.042
Neg logit contributions
ised-0.028
 early-0.026
ized-0.026
 widened-0.025
 wide-0.024
Token wanted
Token ID2227
Feature activation-3.84e-01

Pos logit contributions
 yours+0.022
 loser+0.020
 deserve+0.019
 losers+0.019
 virtue+0.019
Neg logit contributions
ised-0.012
 early-0.011
ized-0.011
 widened-0.010
 wide-0.010
Token to
Token ID284
Feature activation+1.36e+00

Pos logit contributions
 early+0.005
ised+0.005
 wide+0.005
ized+0.005
 cautiously+0.005
Neg logit contributions
 yours-0.012
 loser-0.011
 deserve-0.011
 depends-0.010
 virtue-0.010
Token take
Token ID1011
Feature activation-1.95e+00

Pos logit contributions
 yours+0.032
 loser+0.028
osaurs+0.027
 policy+0.027
 costs+0.027
Neg logit contributions
 wide-0.027
 early-0.027
 quickly-0.027
 cautiously-0.027
 slowly-0.027
Token a
Token ID257
Feature activation-4.74e-01

Pos logit contributions
ised+0.025
 early+0.023
ized+0.022
 wide+0.022
 cautiously+0.020
Neg logit contributions
 yours-0.062
 loser-0.058
 deserve-0.055
 policy-0.055
 virtue-0.054
Token nap
Token ID25422
Feature activation-2.46e+00

Pos logit contributions
 early+0.009
ised+0.009
 wide+0.008
ized+0.008
 widened+0.008
Neg logit contributions
 yours-0.012
 loser-0.011
 deserve-0.010
 depends-0.010
 virtue-0.010
Token in
Token ID287
Feature activation-2.37e+00

Pos logit contributions
ised+0.038
ized+0.035
 early+0.034
awed+0.034
 widened+0.034
Neg logit contributions
 yours-0.072
 loser-0.067
 virtue-0.061
 deserve-0.061
 policy-0.061
Token the
Token ID262
Feature activation-2.02e-01

Pos logit contributions
ised+0.029
ized+0.027
 early+0.027
 widened+0.025
awed+0.024
Neg logit contributions
 yours-0.078
 loser-0.072
 deserve-0.068
 policy-0.066
 virtue-0.066
Token grass
Token ID8701
Feature activation-2.11e+00

Pos logit contributions
ised+0.003
 early+0.003
ized+0.003
 wide+0.003
 widened+0.003
Neg logit contributions
 yours-0.006
 loser-0.005
 deserve-0.005
 virtue-0.005
 policy-0.005
Token.
Token ID13
Feature activation+1.18e+00

Pos logit contributions
ised+0.033
ized+0.030
 widened+0.029
led+0.028
awed+0.028
Neg logit contributions
 yours-0.062
 loser-0.058
 fault-0.053
 winner-0.052
 virtue-0.052
Token bird
Token ID6512
Feature activation-1.10e+00

Pos logit contributions
ised+0.031
 early+0.030
ized+0.029
 widened+0.028
 wide+0.027
Neg logit contributions
 yours-0.047
 loser-0.042
 deserve-0.040
 virtue-0.038
 policy-0.038
Token.
Token ID13
Feature activation+1.81e+00

Pos logit contributions
ised+0.017
 early+0.016
ized+0.016
 widened+0.015
awed+0.014
Neg logit contributions
 yours-0.033
 loser-0.030
 deserve-0.028
 losers-0.027
 depends-0.027
Token The
Token ID383
Feature activation+1.26e+00

Pos logit contributions
 yours+0.046
 loser+0.042
 deserve+0.040
 You+0.040
 losers+0.039
Neg logit contributions
ised-0.035
ized-0.035
 early-0.031
led-0.031
awed-0.030
Token bird
Token ID6512
Feature activation+7.77e-01

Pos logit contributions
 yours+0.028
 loser+0.024
 deserve+0.023
 virtue+0.022
 policy+0.022
Neg logit contributions
ised-0.029
 early-0.028
ized-0.028
 wide-0.027
 widened-0.027
Token falls
Token ID8953
Feature activation-2.68e+00

Pos logit contributions
 yours+0.017
 loser+0.015
 deserve+0.014
 depends+0.014
 losers+0.014
Neg logit contributions
ised-0.018
 early-0.017
ized-0.017
 widened-0.016
 wide-0.016
Token into
Token ID656
Feature activation-5.66e+00

Pos logit contributions
ised+0.039
ized+0.036
 early+0.034
 widened+0.033
awed+0.031
Neg logit contributions
 yours-0.085
 loser-0.077
 deserve-0.072
 losers-0.071
 fault-0.070
Token the
Token ID262
Feature activation-4.96e-01

Pos logit contributions
ized+0.115
led+0.113
ised+0.113
awed+0.105
 early+0.103
Neg logit contributions
 yours-0.132
 loser-0.126
 you-0.119
 You-0.109
 losers-0.108
Token water
Token ID1660
Feature activation-2.03e+00

Pos logit contributions
ised+0.009
 early+0.008
ized+0.008
 widened+0.008
 wide+0.007
Neg logit contributions
 yours-0.014
 loser-0.013
 deserve-0.012
 virtue-0.012
 policy-0.011
Token with
Token ID351
Feature activation-2.98e+00

Pos logit contributions
ised+0.038
 early+0.033
ized+0.032
 widened+0.031
led+0.030
Neg logit contributions
 yours-0.054
 loser-0.050
 deserve-0.046
 winner-0.045
 losers-0.045
Token a
Token ID257
Feature activation-2.23e+00

Pos logit contributions
ised+0.056
 early+0.055
ized+0.053
led+0.047
awed+0.046
Neg logit contributions
 yours-0.083
 loser-0.071
 deserve-0.069
 losers-0.068
 belong-0.065
Token th
Token ID294
Feature activation-3.26e+00

Pos logit contributions
ised+0.039
 early+0.037
ized+0.037
led+0.034
awed+0.033
Neg logit contributions
 yours-0.062
 loser-0.058
 deserve-0.054
 losers-0.052
 depends-0.051
Token,
Token ID11
Feature activation+5.44e+00

Pos logit contributions
 yours+0.157
 loser+0.152
 virtue+0.142
 deserve+0.137
 policy+0.137
Neg logit contributions
 wide-0.141
 early-0.140
ised-0.133
ized-0.132
led-0.131
Token Lily
Token ID20037
Feature activation+4.07e+00

Pos logit contributions
 yours+0.133
 loser+0.127
 deserve+0.120
 virtue+0.119
 depends+0.117
Neg logit contributions
 early-0.101
 wide-0.096
ised-0.094
 cautiously-0.090
 curiously-0.089
Token?"
Token ID1701
Feature activation-2.39e+00

Pos logit contributions
 yours+0.078
 loser+0.074
 losers+0.069
 virtue+0.067
 deserve+0.066
Neg logit contributions
 early-0.098
ised-0.090
 wide-0.090
 nervously-0.086
ized-0.085
Token Lily
Token ID20037
Feature activation-4.39e+00

Pos logit contributions
ised+0.042
ized+0.040
 early+0.039
 widened+0.037
led+0.036
Neg logit contributions
 yours-0.066
 loser-0.060
 deserve-0.057
 policy-0.055
 depends-0.054
Token said
Token ID531
Feature activation-5.38e+00

Pos logit contributions
ised+0.079
ized+0.076
 early+0.074
 valleys+0.071
led+0.066
Neg logit contributions
 yours-0.102
 costs-0.099
 teaches-0.097
 depends-0.097
 deserve-0.096
Token,
Token ID11
Feature activation-4.83e+00

Pos logit contributions
ised+0.087
ized+0.080
 early+0.074
 widened+0.074
awed+0.073
Neg logit contributions
 yours-0.159
 loser-0.142
 You-0.132
 you-0.129
 Friends-0.128
Token "
Token ID366
Feature activation+4.07e+00

Pos logit contributions
ised+0.115
ized+0.106
led+0.100
awed+0.098
 early+0.094
Neg logit contributions
 yours-0.103
 You-0.091
 loser-0.089
 I-0.086
 grant-0.083
TokenNo
Token ID2949
Feature activation+3.75e+00

Pos logit contributions
 yours+0.101
 fault+0.089
 policy+0.084
 inconvenience+0.083
 loser+0.082
Neg logit contributions
ised-0.081
 wide-0.080
ized-0.078
 early-0.077
 widened-0.074
Token,
Token ID11
Feature activation+3.93e+00

Pos logit contributions
 yours+0.148
 deserve+0.141
 loser+0.140
 virtue+0.138
 depends+0.134
Neg logit contributions
 early-0.012
ised-0.011
 wide-0.010
 nervously-0.006
 curiously-0.005
Token it
Token ID340
Feature activation+7.37e+00

Pos logit contributions
 yours+0.092
 loser+0.086
 virtue+0.084
 deserve+0.082
 depends+0.080
Neg logit contributions
 wide-0.074
ised-0.073
 early-0.073
ized-0.070
 widened-0.068
Token is
Token ID318
Feature activation+3.72e+00

Pos logit contributions
 yours+0.168
 loser+0.153
 policy+0.140
 virtue+0.135
 deserve+0.133
Neg logit contributions
ised-0.157
 wide-0.155
led-0.154
ized-0.151
 early-0.150
Token and
Token ID290
Feature activation-1.65e-01

Pos logit contributions
ised+0.015
ized+0.014
 early+0.014
 widened+0.013
awed+0.013
Neg logit contributions
 yours-0.036
 loser-0.032
 deserve-0.031
 virtue-0.030
 losers-0.030
Token looked
Token ID3114
Feature activation-4.60e+00

Pos logit contributions
ised+0.003
ized+0.003
 early+0.003
 widened+0.003
awed+0.003
Neg logit contributions
 yours-0.004
 loser-0.004
 deserve-0.004
 losers-0.004
 depends-0.004
Token around
Token ID1088
Feature activation-6.42e+00

Pos logit contributions
ised+0.074
awed+0.068
ized+0.066
 early+0.062
 widened+0.061
Neg logit contributions
 yours-0.134
 deserve-0.108
 loser-0.106
 losers-0.105
 belong-0.104
Token.
Token ID13
Feature activation+7.26e-01

Pos logit contributions
ised+0.116
awed+0.112
ized+0.109
led+0.103
 widened+0.101
Neg logit contributions
 yours-0.170
 loser-0.138
 you-0.137
 policy-0.135
 You-0.135
Token He
Token ID679
Feature activation-1.29e+00

Pos logit contributions
 yours+0.022
 loser+0.020
 deserve+0.019
 virtue+0.018
 policy+0.018
Neg logit contributions
ised-0.011
 early-0.010
ized-0.010
 widened-0.010
 wide-0.009
Token saw
Token ID2497
Feature activation-4.57e+00

Pos logit contributions
ised+0.022
ized+0.020
 early+0.020
awed+0.018
 widened+0.018
Neg logit contributions
 yours-0.033
 deserve-0.030
 loser-0.029
 will-0.029
 costs-0.029
Token a
Token ID257
Feature activation-2.82e+00

Pos logit contributions
ised+0.082
ized+0.076
 early+0.076
awed+0.072
 widened+0.071
Neg logit contributions
 yours-0.124
 loser-0.107
 deserve-0.101
 losers-0.100
 fault-0.100
Token mountain
Token ID8598
Feature activation-2.69e+00

Pos logit contributions
ised+0.047
 early+0.045
ized+0.044
 widened+0.042
 wide+0.041
Neg logit contributions
 yours-0.080
 loser-0.072
 deserve-0.068
 virtue-0.066
 policy-0.066
Token in
Token ID287
Feature activation-3.56e+00

Pos logit contributions
ised+0.044
awed+0.039
 widened+0.039
ized+0.038
led+0.037
Neg logit contributions
 yours-0.078
 loser-0.067
 deserve-0.065
 rule-0.065
 winner-0.064
Token the
Token ID262
Feature activation-2.06e+00

Pos logit contributions
awed+0.061
led+0.059
ised+0.058
ized+0.057
 widened+0.051
Neg logit contributions
 yours-0.101
 deserve-0.085
 loser-0.085
 policy-0.080
 you-0.080
Token distance
Token ID5253
Feature activation-3.47e+00

Pos logit contributions
ised+0.029
 early+0.027
ized+0.026
 wide+0.026
 widened+0.025
Neg logit contributions
 yours-0.065
 loser-0.059
 deserve-0.056
 policy-0.055
 losers-0.054
Token some
Token ID617
Feature activation-1.53e+00

Pos logit contributions
ised+0.069
ized+0.065
awed+0.061
 early+0.060
 widened+0.059
Neg logit contributions
 yours-0.112
 loser-0.098
 deserve-0.092
 losers-0.091
 You-0.090
Token water
Token ID1660
Feature activation-2.01e+00

Pos logit contributions
ised+0.029
 early+0.028
ized+0.027
 wide+0.026
 widened+0.026
Neg logit contributions
 yours-0.040
 loser-0.037
 deserve-0.034
 virtue-0.033
 depends-0.033
Token from
Token ID422
Feature activation-4.01e+00

Pos logit contributions
ised+0.039
ized+0.035
 widened+0.034
 early+0.032
awed+0.032
Neg logit contributions
 yours-0.053
 loser-0.046
 will-0.044
 costs-0.042
 deserve-0.042
Token the
Token ID262
Feature activation-2.45e-01

Pos logit contributions
ised+0.063
ized+0.058
 early+0.055
 widened+0.054
awed+0.054
Neg logit contributions
 yours-0.121
 loser-0.107
 losers-0.099
 deserve-0.099
 costs-0.098
Token pot
Token ID1787
Feature activation-2.97e+00

Pos logit contributions
 early+0.004
 wide+0.004
ised+0.004
ized+0.004
 widened+0.004
Neg logit contributions
 yours-0.007
 loser-0.006
 deserve-0.006
 virtue-0.006
 depends-0.006
Token into
Token ID656
Feature activation-4.91e+00

Pos logit contributions
ised+0.051
ized+0.046
 widened+0.046
awed+0.044
 sudden+0.043
Neg logit contributions
 yours-0.081
 loser-0.069
 teaches-0.068
 costs-0.067
 will-0.067
Token the
Token ID262
Feature activation-8.97e-01

Pos logit contributions
ised+0.081
ized+0.078
awed+0.077
led+0.073
 early+0.071
Neg logit contributions
 yours-0.142
 loser-0.127
 deserve-0.118
 losers-0.116
 you-0.115
Token cup
Token ID6508
Feature activation-1.71e+00

Pos logit contributions
ised+0.016
 early+0.016
ized+0.016
 widened+0.015
 wide+0.015
Neg logit contributions
 yours-0.024
 loser-0.022
 deserve-0.020
 virtue-0.020
 losers-0.020
Token and
Token ID290
Feature activation-1.88e+00

Pos logit contributions
ised+0.033
ized+0.030
 widened+0.029
 early+0.028
awed+0.027
Neg logit contributions
 yours-0.044
 loser-0.040
 deserve-0.038
 costs-0.037
 policy-0.037
Token handed
Token ID10158
Feature activation-3.05e+00

Pos logit contributions
ised+0.041
ized+0.038
 early+0.035
 curiously+0.033
awed+0.033
Neg logit contributions
 yours-0.041
 deserve-0.036
 costs-0.036
 will-0.035
 teaches-0.035
Token it
Token ID340
Feature activation-1.65e+00

Pos logit contributions
 early+0.037
ised+0.037
 wide+0.036
ized+0.034
 curiously+0.033
Neg logit contributions
 yours-0.094
 loser-0.091
 deserve-0.086
 virtue-0.085
 depends-0.085
Token was
Token ID373
Feature activation-1.81e+00

Pos logit contributions
 yours+0.023
 loser+0.021
 deserve+0.020
 losers+0.020
 virtue+0.019
Neg logit contributions
ised-0.012
 early-0.011
ized-0.011
 widened-0.010
 wide-0.010
Token very
Token ID845
Feature activation-1.29e+00

Pos logit contributions
ised+0.028
 early+0.026
ized+0.026
 widened+0.024
 wide+0.024
Neg logit contributions
 yours-0.054
 loser-0.049
 deserve-0.047
 virtue-0.045
 policy-0.045
Token excited
Token ID6568
Feature activation-2.64e+00

Pos logit contributions
ised+0.018
ized+0.017
 early+0.017
 widened+0.016
awed+0.016
Neg logit contributions
 yours-0.040
 loser-0.036
 deserve-0.034
 losers-0.034
 virtue-0.034
Token and
Token ID290
Feature activation-1.56e+00

Pos logit contributions
ised+0.044
 widened+0.040
ized+0.040
awed+0.040
led+0.037
Neg logit contributions
 yours-0.076
 loser-0.064
 deserve-0.063
 policy-0.063
 fault-0.061
Token went
Token ID1816
Feature activation-4.99e+00

Pos logit contributions
ised+0.024
ized+0.022
awed+0.021
 early+0.020
 widened+0.020
Neg logit contributions
 yours-0.047
 deserve-0.041
 loser-0.040
 costs-0.039
 losers-0.039
Token straight
Token ID3892
Feature activation-5.45e+00

Pos logit contributions
ised+0.111
ized+0.103
 widened+0.098
awed+0.095
led+0.090
Neg logit contributions
 yours-0.126
 loser-0.098
 winner-0.097
 deserve-0.097
 belong-0.094
Token to
Token ID284
Feature activation-3.41e+00

Pos logit contributions
ised+0.102
ized+0.100
 widened+0.099
awed+0.096
 widening+0.090
Neg logit contributions
 yours-0.145
 loser-0.113
 fault-0.109
 deserve-0.108
 losers-0.108
Token the
Token ID262
Feature activation-1.77e+00

Pos logit contributions
ised+0.052
ized+0.048
awed+0.045
 early+0.045
 widened+0.044
Neg logit contributions
 yours-0.106
 loser-0.093
 deserve-0.090
 losers-0.087
 policy-0.087
Token market
Token ID1910
Feature activation-3.17e+00

Pos logit contributions
 early+0.028
 wide+0.027
ised+0.027
ized+0.026
 cautiously+0.025
Neg logit contributions
 yours-0.051
 loser-0.046
 deserve-0.044
 policy-0.043
 losers-0.042
Token.
Token ID13
Feature activation+5.01e-01

Pos logit contributions
ised+0.055
 widened+0.051
ized+0.049
awed+0.049
led+0.048
Neg logit contributions
 yours-0.090
 loser-0.076
 deserve-0.075
 policy-0.074
 costs-0.073
Token
Token ID220
Feature activation-9.29e-01

Pos logit contributions
 yours+0.015
 loser+0.013
 deserve+0.013
 virtue+0.012
 losers+0.012
Neg logit contributions
ised-0.008
 early-0.007
ized-0.007
 widened-0.007
 wide-0.007