TOP ACTIVATIONS
MAX = 5.175

" Can I drink the milk , please ?" But mom
rare color called " lav ender ". Lily thought it looked
my , " Can I drink the milk , please ?"
. She put on her red m itten and got a
. " What are you eating , sweet heart ?" asked
's mom put on her a pron and said , "
mom my put on a pretty a pron and started to
aunt put some soft flour and some sweet sugar in a
. Do you want some milk with it ?" her mom
. She put on a pink a pron and climbed on
Do you want some milk with it ?" her mom said
. She put on a pretty dress and ran outside to
stirred the jam . The jam smelled so good that Lily
like me to brush your hair ?" Tim my was so
it tastes . She picks a leaf and puts it in
some of my juice to drink ?" The deer
" Can I put the cup cakes in the oven ?"
I have some soup to drink ?" Her mom said ,
. " We need sugar to make cookies . What should
. He put on his a pron and started to mix

MIN ACTIVATIONS
MIN = -4.948

and suddenly , Sam started running really fast . He was
cat was terrified and started running away . In
home . Everywhere he went , people didn 't want him
agreed and they both started running . The fox
march like soldiers and make loud noises . They marched on
louder and soon everyone started running , screaming , and pan
really fast . He was running so fast that he couldn
cheering and someone else started running faster . It was Sarah
They all started running up the volcano as fast
Tim and Max started to run . They both ran very
with their skill and started running around the map . Each
race began . Jack started running as fast as he could
. Then they both started running . They ran so fast
raced and they were both running very fast . But in
racing . They both started running as fast as they could
race started and everyone was running very fast . Suddenly ,
race . Then they started running as fast as they could
race started and the rabbit ran very fast . But then
was going and started to run too . They were very
It chased them and they ran away . Lily was sad

INTERVAL 2.644 - 5.175
CONTAINS 0.632%

She ran to her mom my and said , " Mom
L ily said , " Okay , Mom my
His mom said , " Sure , Tim my
day , she spilled some juice on the carpet and didn
with a big bag . Inside was something green and fuzzy

INTERVAL 0.113 - 2.644
CONTAINS 43.627%

" M ilo , look at this feather !
She strapped herself in , and the car zoom ed around
. It was an ice - cream van coming down the
have to clean up the mess . Sara and Ben say
, it has a hole in it ." L

INTERVAL -2.417 - 0.113
CONTAINS 55.166%

yes , yes !" Tim and Sam say . They sit
was a little clumsy and she accidentally dropped the box .
one else in the world . One day , he decided
feel better !" The animals all worked together to find a
the y ummy cake . <|endoftext|> Once upon a time ,

INTERVAL -4.948 - -2.417
CONTAINS 0.575%

his mind and make the decision . He tried to decide
, feeling very brave and mighty . The fairy smiled and
both ran deep into the forest . It was so much
. Luckily , the driver was a very smart
stay in their room . They want to play with Mom
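
The four INTERVAL headers above are consistent with splitting the range from MIN = -4.948 to MAX = 5.175 into four equal-width bins, and the CONTAINS percentages sum to 100%. The sketch below reproduces that arithmetic. It assumes equal-width binning, which the numbers suggest but the dump does not state; the function name `equal_width_edges` is hypothetical and not part of any dashboard tool.

```python
# Minimal sketch: recompute the interval boundaries shown in the dump,
# ASSUMING the dashboard uses equal-width bins between MIN and MAX.
# `equal_width_edges` is an illustrative helper, not a real library call.

def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return the n_bins + 1 boundaries of equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX activations as reported in the dump.
edges = equal_width_edges(-4.948, 5.175, 4)

# Share of tokens per interval as reported (highest interval first).
shares = [0.632, 43.627, 55.166, 0.575]
total_share = sum(shares)

if __name__ == "__main__":
    for low, high in zip(edges[:-1], edges[1:]):
        print(f"INTERVAL {low:.3f} - {high:.3f}")
    print(f"total share: {total_share:.3f}%")
```

Under that assumption the computed boundaries agree with the reported ones to within rounding, and nearly all tokens (about 98.8%) fall in the two middle bins, so the top and bottom activation lists describe only the tails of the distribution.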
Token "
Token ID366
Feature activation+3.14e+00

Pos logit contributions
pron+0.174
ool+0.168
iflower+0.157
 butter+0.156
ons+0.154
Neg logit contributions
 War-0.054
 warriors-0.053
joy-0.052
 runners-0.051
 afraid-0.050
TokenCan
Token ID6090
Feature activation+2.85e+00

Pos logit contributions
pron+0.179
ool+0.172
on+0.168
 teaspoon+0.167
 butter+0.163
Neg logit contributions
Ready-0.068
minded-0.064
 villains-0.056
 War-0.055
 risks-0.054
Token I
Token ID314
Feature activation+2.69e+00

Pos logit contributions
pron+0.209
book+0.173
 butter+0.172
ool+0.172
 recipe+0.169
Neg logit contributions
 villains-0.063
 War-0.053
 loses-0.051
 wre-0.045
 opponents-0.043
Token drink
Token ID4144
Feature activation+5.00e+00

Pos logit contributions
pron+0.179
 butter+0.141
lf+0.140
iflower+0.139
 recipe+0.137
Neg logit contributions
 risks-0.073
 drown-0.072
 villains-0.071
 wre-0.071
 prisoners-0.064
Token the
Token ID262
Feature activation+3.01e+00

Pos logit contributions
pron+0.341
comb+0.286
lf+0.281
itten+0.280
 hen+0.270
Neg logit contributions
 risks-0.107
 wre-0.090
 turns-0.081
 trouble-0.080
 War-0.080
Token milk
Token ID7545
Feature activation+5.17e+00

Pos logit contributions
pron+0.128
on+0.098
lf+0.096
innie+0.086
mmm+0.084
Neg logit contributions
 risks-0.158
 noises-0.139
 unnecessary-0.136
 trouble-0.136
 chances-0.135
Token,
Token ID11
Feature activation+4.08e+00

Pos logit contributions
pron+0.286
 tul+0.243
innie+0.242
 basil+0.235
 lav+0.231
Neg logit contributions
 anymore-0.128
ashes-0.123
 faster-0.115
 away-0.108
 loses-0.106
Token please
Token ID3387
Feature activation+3.98e+00

Pos logit contributions
pron+0.217
iflower+0.206
ons+0.203
ool+0.202
pp+0.201
Neg logit contributions
 enemies-0.105
 War-0.102
 villain-0.102
 villains-0.098
 brakes-0.094
Token?"
Token ID1701
Feature activation+2.44e+00

Pos logit contributions
pron+0.211
itten+0.192
quet+0.189
ons+0.187
isy+0.185
Neg logit contributions
 wre-0.085
 drives-0.082
 War-0.081
 want-0.080
 win-0.078
Token But
Token ID887
Feature activation+2.71e+00

Pos logit contributions
pron+0.138
 butter+0.134
iflower+0.121
ool+0.120
comb+0.119
Neg logit contributions
 War-0.081
 villains-0.073
 shouts-0.068
 He-0.068
 opponents-0.066
Token mom
Token ID1995
Feature activation+2.61e+00

Pos logit contributions
pron+0.139
 butter+0.124
ool+0.119
iflower+0.116
 basil+0.116
Neg logit contributions
 villains-0.105
 War-0.099
 villain-0.093
 enemies-0.092
 opponents-0.092
Token rare
Token ID4071
Feature activation+2.86e+00

Pos logit contributions
pron+0.154
ocolate+0.152
mm+0.145
 butter+0.145
on+0.142
Neg logit contributions
 villains-0.090
 risks-0.088
 races-0.087
 prisoners-0.086
 wre-0.084
Token color
Token ID3124
Feature activation+2.47e+00

Pos logit contributions
pron+0.129
 butter+0.108
on+0.100
 bacon+0.098
 basil+0.097
Neg logit contributions
 villains-0.126
 risks-0.123
 warriors-0.121
 speeds-0.118
 prisoners-0.117
Token called
Token ID1444
Feature activation+1.33e+00

Pos logit contributions
pron+0.176
 butter+0.144
 bacon+0.140
 basil+0.134
 hen+0.133
Neg logit contributions
 villains-0.060
 speeds-0.056
 warriors-0.054
 wre-0.052
 scores-0.050
Token "
Token ID366
Feature activation+3.69e+00

Pos logit contributions
pron+0.075
 butter+0.061
 egg+0.052
 slice+0.052
 teaspoon+0.052
Neg logit contributions
 villains-0.065
 risks-0.059
 warriors-0.059
 War-0.057
 runners-0.057
Tokenlav
Token ID18809
Feature activation+1.33e+00

Pos logit contributions
pron+0.199
 butter+0.172
 egg+0.165
 teaspoon+0.164
 toast+0.161
Neg logit contributions
 villains-0.139
 risks-0.135
 War-0.130
 runners-0.129
Ready-0.122
Tokenender
Token ID2194
Feature activation+5.03e+00

Pos logit contributions
pron+0.068
 butter+0.055
 slice+0.050
 teaspoon+0.048
 spoon+0.048
Neg logit contributions
 villains-0.065
 risks-0.065
 speeds-0.062
 opponents-0.062
 warriors-0.061
Token".
Token ID1911
Feature activation+6.76e-01

Pos logit contributions
pron+0.316
 butter+0.242
on+0.239
 comb+0.224
lf+0.222
Neg logit contributions
 risks-0.160
 warriors-0.139
 scores-0.133
 runners-0.133
 War-0.132
Token Lily
Token ID20037
Feature activation+2.88e+00

Pos logit contributions
pron+0.041
 butter+0.036
 egg+0.032
on+0.032
 teaspoon+0.031
Neg logit contributions
 villains-0.030
 risks-0.026
 War-0.026
 opponents-0.025
 speeds-0.025
Token thought
Token ID1807
Feature activation+2.80e+00

Pos logit contributions
pron+0.174
 butter+0.151
ocolate+0.144
 basil+0.141
 egg+0.139
Neg logit contributions
 villains-0.091
 speeds-0.087
 warriors-0.083
 opponents-0.082
 shouts-0.081
Token it
Token ID340
Feature activation+1.92e+00

Pos logit contributions
pron+0.179
flower+0.146
 slice+0.143
 loaf+0.133
pp+0.133
Neg logit contributions
 risks-0.087
 villains-0.086
 unnecessary-0.085
 speeds-0.082
 obstacles-0.079
Token looked
Token ID3114
Feature activation+2.71e-01

Pos logit contributions
pron+0.114
 butter+0.096
 slice+0.090
 basil+0.088
 toast+0.087
Neg logit contributions
 villains-0.075
 speeds-0.071
 risks-0.069
 War-0.067
 warriors-0.064
Tokenmy
Token ID1820
Feature activation+2.83e+00

Pos logit contributions
pron+0.201
 panc+0.166
 butter+0.161
 teaspoon+0.160
 poppy+0.154
Neg logit contributions
minded-0.086
 wre-0.081
 enemies-0.075
 warriors-0.074
ks-0.071
Token,
Token ID11
Feature activation+2.69e+00

Pos logit contributions
pron+0.186
 butter+0.148
 panc+0.144
itten+0.140
 comb+0.138
Neg logit contributions
minded-0.082
 warriors-0.080
 wre-0.078
 enemies-0.078
 villains-0.077
Token "
Token ID366
Feature activation+3.14e+00

Pos logit contributions
pron+0.174
ool+0.168
iflower+0.157
 butter+0.156
ons+0.154
Neg logit contributions
 War-0.054
 warriors-0.053
joy-0.052
 runners-0.051
 afraid-0.050
TokenCan
Token ID6090
Feature activation+2.85e+00

Pos logit contributions
pron+0.179
ool+0.172
on+0.168
 teaspoon+0.167
 butter+0.163
Neg logit contributions
Ready-0.068
minded-0.064
 villains-0.056
 War-0.055
 risks-0.054
Token I
Token ID314
Feature activation+2.69e+00

Pos logit contributions
pron+0.209
book+0.173
 butter+0.172
ool+0.172
 recipe+0.169
Neg logit contributions
 villains-0.063
 War-0.053
 loses-0.051
 wre-0.045
 opponents-0.043
Token drink
Token ID4144
Feature activation+5.00e+00

Pos logit contributions
pron+0.179
 butter+0.141
lf+0.140
iflower+0.139
 recipe+0.137
Neg logit contributions
 risks-0.073
 drown-0.072
 villains-0.071
 wre-0.071
 prisoners-0.064
Token the
Token ID262
Feature activation+3.01e+00

Pos logit contributions
pron+0.341
comb+0.286
lf+0.281
itten+0.280
 hen+0.270
Neg logit contributions
 risks-0.107
 wre-0.090
 turns-0.081
 trouble-0.080
 War-0.080
Token milk
Token ID7545
Feature activation+5.17e+00

Pos logit contributions
pron+0.128
on+0.098
lf+0.096
innie+0.086
mmm+0.084
Neg logit contributions
 risks-0.158
 noises-0.139
 unnecessary-0.136
 trouble-0.136
 chances-0.135
Token,
Token ID11
Feature activation+4.08e+00

Pos logit contributions
pron+0.286
 tul+0.243
innie+0.242
 basil+0.235
 lav+0.231
Neg logit contributions
 anymore-0.128
ashes-0.123
 faster-0.115
 away-0.108
 loses-0.106
Token please
Token ID3387
Feature activation+3.98e+00

Pos logit contributions
pron+0.217
iflower+0.206
ons+0.203
ool+0.202
pp+0.201
Neg logit contributions
 enemies-0.105
 War-0.102
 villain-0.102
 villains-0.098
 brakes-0.094
Token?"
Token ID1701
Feature activation+2.44e+00

Pos logit contributions
pron+0.211
itten+0.192
quet+0.189
ons+0.187
isy+0.185
Neg logit contributions
 wre-0.085
 drives-0.082
 War-0.081
 want-0.080
 win-0.078
Token.
Token ID13
Feature activation+5.59e-01

Pos logit contributions
pron+0.094
 butter+0.078
 basil+0.075
on+0.074
 teaspoon+0.071
Neg logit contributions
 villains-0.056
 warriors-0.049
 risks-0.048
 opponents-0.047
 runners-0.047
Token She
Token ID1375
Feature activation+2.42e+00

Pos logit contributions
pron+0.031
 butter+0.027
 egg+0.023
on+0.023
 slice+0.023
Neg logit contributions
 villains-0.027
 opponents-0.024
 War-0.024
 risks-0.024
 speeds-0.024
Token put
Token ID1234
Feature activation+3.11e+00

Pos logit contributions
pron+0.149
 butter+0.135
ocolate+0.132
quet+0.130
 basil+0.129
Neg logit contributions
 villains-0.077
 speeds-0.060
 warriors-0.057
 wre-0.056
 prisoners-0.055
Token on
Token ID319
Feature activation+2.54e+00

Pos logit contributions
pron+0.191
isy+0.173
comb+0.169
ousy+0.157
quet+0.156
Neg logit contributions
 risks-0.076
 turns-0.074
 speeds-0.071
 wre-0.070
 brakes-0.068
Token her
Token ID607
Feature activation+3.11e+00

Pos logit contributions
pron+0.129
isy+0.102
comb+0.102
itten+0.099
quet+0.096
Neg logit contributions
 wre-0.107
 speeds-0.103
 races-0.102
 risks-0.099
 scores-0.094
Token red
Token ID2266
Feature activation+4.98e+00

Pos logit contributions
pron+0.147
pp+0.140
quet+0.138
comb+0.136
mmm+0.135
Neg logit contributions
 risks-0.097
 trouble-0.095
 wre-0.094
 opponents-0.094
 goals-0.093
Token m
Token ID285
Feature activation+3.13e+00

Pos logit contributions
isy+0.211
pron+0.205
Mill+0.194
ousy+0.193
mmm+0.193
Neg logit contributions
 speeds-0.158
 brakes-0.157
 scores-0.157
 noises-0.155
 drives-0.153
Tokenitten
Token ID2621
Feature activation+3.41e+00

Pos logit contributions
pron+0.126
 basil+0.117
 butter+0.113
 poppy+0.112
 herb+0.107
Neg logit contributions
 villains-0.132
 risks-0.129
 opponents-0.129
 grades-0.129
 brakes-0.124
Token and
Token ID290
Feature activation+3.99e+00

Pos logit contributions
pron+0.216
 butter+0.177
 basil+0.175
gary+0.174
 panc+0.172
Neg logit contributions
 villains-0.088
 speeds-0.079
 warriors-0.078
 runners-0.078
 opponents-0.077
Token got
Token ID1392
Feature activation+1.44e+00

Pos logit contributions
ocolate+0.210
pron+0.210
quet+0.207
isy+0.199
ummy+0.196
Neg logit contributions
 villains-0.104
 warriors-0.103
 speeds-0.097
 turns-0.096
 sped-0.090
Token a
Token ID257
Feature activation+3.16e+00

Pos logit contributions
pron+0.080
isy+0.059
 butter+0.057
 slice+0.055
comb+0.055
Neg logit contributions
 villains-0.064
 wre-0.064
 risks-0.063
 unnecessary-0.059
 speeds-0.059
Token.
Token ID13
Feature activation+1.42e+00

Pos logit contributions
pron+0.268
innie+0.206
 tul+0.199
 hen+0.197
 bacon+0.194
Neg logit contributions
 risks-0.069
 wre-0.068
 victorious-0.060
 warriors-0.059
 goals-0.058
Token "
Token ID366
Feature activation+1.67e+00

Pos logit contributions
pron+0.088
 butter+0.075
on+0.066
 slice+0.064
 egg+0.063
Neg logit contributions
 villains-0.061
 warriors-0.055
 War-0.054
 opponents-0.053
 speeds-0.053
TokenWhat
Token ID2061
Feature activation+1.48e+00

Pos logit contributions
pron+0.109
 butter+0.092
 teaspoon+0.088
 egg+0.086
 spoon+0.085
Neg logit contributions
 villains-0.055
 risks-0.052
 War-0.049
 runners-0.045
 wre-0.045
Token are
Token ID389
Feature activation+2.13e+00

Pos logit contributions
pron+0.095
 butter+0.072
 basil+0.069
 egg+0.068
 comb+0.067
Neg logit contributions
 villains-0.063
 risks-0.056
 enemies-0.055
 warriors-0.053
 wre-0.053
Token you
Token ID345
Feature activation+2.32e+00

Pos logit contributions
pron+0.149
 slice+0.108
on+0.108
 mug+0.107
 comb+0.107
Neg logit contributions
 risks-0.070
 villains-0.068
 wre-0.067
 unnecessary-0.062
 runners-0.061
Token eating
Token ID6600
Feature activation+4.97e+00

Pos logit contributions
pron+0.153
 butter+0.120
quet+0.117
brush+0.116
 ornament+0.115
Neg logit contributions
 wre-0.072
 risks-0.068
 enemies-0.067
 warriors-0.066
 enemy-0.065
Token,
Token ID11
Feature activation+1.41e+00

Pos logit contributions
pron+0.300
comb+0.227
isy+0.222
itten+0.221
innie+0.219
Neg logit contributions
 wre-0.163
 risks-0.154
 goals-0.136
 races-0.134
 scores-0.132
Token sweet
Token ID6029
Feature activation+1.19e+00

Pos logit contributions
pron+0.075
 butter+0.057
on+0.057
 comb+0.051
 baked+0.050
Neg logit contributions
 villains-0.068
 warriors-0.064
 enemies-0.063
 risks-0.063
 wre-0.061
Tokenheart
Token ID11499
Feature activation+1.36e+00

Pos logit contributions
pron+0.050
 butter+0.034
on+0.031
 comb+0.031
 slice+0.030
Neg logit contributions
 villains-0.070
 risks-0.067
 warriors-0.067
 wre-0.066
 runners-0.064
Token?"
Token ID1701
Feature activation+1.75e+00

Pos logit contributions
pron+0.059
 butter+0.047
 basil+0.041
 sugar+0.038
 slice+0.037
Neg logit contributions
 villains-0.079
 risks-0.074
 warriors-0.073
 wre-0.071
 prisoners-0.070
Token asked
Token ID1965
Feature activation+1.21e+00

Pos logit contributions
pron+0.093
 butter+0.072
 slice+0.063
ool+0.060
iflower+0.059
Neg logit contributions
 villains-0.087
 shouts-0.083
 warriors-0.077
 War-0.077
 risks-0.077
Token's
Token ID338
Feature activation+1.78e+00

Pos logit contributions
pron+0.134
 butter+0.108
 basil+0.102
 teaspoon+0.102
ocolate+0.101
Neg logit contributions
 villains-0.075
 warriors-0.067
 speeds-0.065
 wre-0.064
dies-0.060
Token mom
Token ID1995
Feature activation+2.92e+00

Pos logit contributions
pron+0.098
on+0.082
 butter+0.081
 basil+0.075
mmm+0.075
Neg logit contributions
 villains-0.070
 enemies-0.068
 prisoners-0.068
 enemy-0.065
 warriors-0.064
Token put
Token ID1234
Feature activation+3.11e+00

Pos logit contributions
pron+0.174
ocolate+0.152
 basil+0.151
comb+0.148
 teaspoon+0.146
Neg logit contributions
 shouts-0.072
 wre-0.070
 won-0.068
 prisoners-0.067
 warriors-0.067
Token on
Token ID319
Feature activation+2.55e+00

Pos logit contributions
pron+0.186
isy+0.157
comb+0.152
mmm+0.148
quet+0.139
Neg logit contributions
 risks-0.103
 wre-0.092
 unnecessary-0.091
 trouble-0.089
 War-0.087
Token her
Token ID607
Feature activation+3.66e+00

Pos logit contributions
pron+0.143
isy+0.112
comb+0.110
itten+0.109
on+0.105
Neg logit contributions
 wre-0.102
 risks-0.097
 villains-0.095
 races-0.093
 trouble-0.090
Token a
Token ID257
Feature activation+4.87e+00

Pos logit contributions
pron+0.162
on+0.143
ousy+0.140
 butter+0.138
quet+0.137
Neg logit contributions
 risks-0.144
 prisoners-0.133
 unnecessary-0.130
 trouble-0.130
 warriors-0.128
Tokenpron
Token ID31186
Feature activation+2.90e+00

Pos logit contributions
ocolate+0.170
mmm+0.164
comb+0.164
mm+0.161
ions+0.158
Neg logit contributions
 races-0.199
 risks-0.197
 wre-0.189
 villains-0.188
 unnecessary-0.187
Token and
Token ID290
Feature activation+3.33e+00

Pos logit contributions
pron+0.159
 butter+0.147
 basil+0.142
 sugar+0.137
 teaspoon+0.136
Neg logit contributions
 villains-0.106
 warriors-0.090
dies-0.087
 wre-0.085
 shouts-0.083
Token said
Token ID531
Feature activation+2.88e+00

Pos logit contributions
pron+0.182
ocolate+0.158
ummy+0.153
quet+0.152
itten+0.148
Neg logit contributions
 villains-0.116
 speeds-0.105
 warriors-0.104
 shouts-0.103
 drives-0.101
Token,
Token ID11
Feature activation+2.97e+00

Pos logit contributions
pron+0.178
 butter+0.139
flower+0.138
ousy+0.136
iflower+0.136
Neg logit contributions
 villains-0.105
 War-0.087
 risks-0.079
 warriors-0.078
 wre-0.077
Token "
Token ID366
Feature activation+1.08e+00

Pos logit contributions
pron+0.188
 butter+0.173
 egg+0.156
 slice+0.156
 teaspoon+0.153
Neg logit contributions
 villains-0.092
 War-0.087
 warriors-0.083
 opponents-0.073
 brakes-0.072
Token mom
Token ID1995
Feature activation+2.87e+00

Pos logit contributions
pron+0.064
 butter+0.055
on+0.052
 basil+0.049
mmm+0.047
Neg logit contributions
 villains-0.057
 enemies-0.052
 prisoners-0.052
 warriors-0.051
 opponents-0.051
Tokenmy
Token ID1820
Feature activation+2.68e+00

Pos logit contributions
pron+0.175
ocolate+0.149
comb+0.149
 basil+0.148
quet+0.146
Neg logit contributions
 drivers-0.073
 warriors-0.073
 wre-0.071
 shouts-0.070
 villains-0.068
Token put
Token ID1234
Feature activation+3.08e+00

Pos logit contributions
pron+0.156
 butter+0.122
quet+0.121
 teaspoon+0.120
ocolate+0.120
Neg logit contributions
 warriors-0.097
 villains-0.095
 wre-0.086
 prisoners-0.085
 speeds-0.083
Token on
Token ID319
Feature activation+2.59e+00

Pos logit contributions
pron+0.181
isy+0.154
comb+0.146
mmm+0.140
quet+0.135
Neg logit contributions
 risks-0.103
 trouble-0.096
 wre-0.093
 unnecessary-0.093
 War-0.087
Token a
Token ID257
Feature activation+3.97e+00

Pos logit contributions
pron+0.151
isy+0.119
itten+0.117
comb+0.114
on+0.109
Neg logit contributions
 wre-0.098
 risks-0.095
 trouble-0.091
 villains-0.090
 races-0.090
Token pretty
Token ID2495
Feature activation+4.86e+00

Pos logit contributions
ocolate+0.172
comb+0.172
mmm+0.172
mm+0.169
pp+0.168
Neg logit contributions
 wre-0.131
 villains-0.128
 risks-0.125
 trouble-0.116
 threat-0.115
Token a
Token ID257
Feature activation+4.08e+00

Pos logit contributions
mmm+0.211
mm+0.209
mmmm+0.206
comb+0.204
ocolate+0.202
Neg logit contributions
 races-0.157
 marches-0.152
 villains-0.147
 risks-0.147
 speeds-0.146
Tokenpron
Token ID31186
Feature activation+2.30e+00

Pos logit contributions
ocolate+0.118
comb+0.112
mmm+0.110
 butter+0.108
ries+0.107
Neg logit contributions
 villains-0.202
 risks-0.190
 wre-0.189
 War-0.183
 threat-0.181
Token and
Token ID290
Feature activation+3.14e+00

Pos logit contributions
pron+0.124
 butter+0.111
 basil+0.105
 sugar+0.102
 teaspoon+0.100
Neg logit contributions
 villains-0.096
 warriors-0.084
 wre-0.077
dies-0.077
 runners-0.077
Token started
Token ID2067
Feature activation+1.76e+00

Pos logit contributions
pron+0.167
ocolate+0.139
quet+0.136
 recipe+0.134
itten+0.133
Neg logit contributions
 villains-0.119
 speeds-0.106
 warriors-0.103
 shouts-0.101
 drives-0.099
Token to
Token ID284
Feature activation+2.37e+00

Pos logit contributions
pron+0.097
 butter+0.079
isy+0.078
 egg+0.073
comb+0.073
Neg logit contributions
 villains-0.072
 wre-0.068
 risks-0.067
 warriors-0.065
 speeds-0.064
Token aunt
Token ID25949
Feature activation+2.27e+00

Pos logit contributions
pron+0.091
 butter+0.078
on+0.074
 basil+0.070
 teaspoon+0.067
Neg logit contributions
 villains-0.067
 prisoners-0.061
 risks-0.056
 enemies-0.056
 warriors-0.055
Token put
Token ID1234
Feature activation+3.12e+00

Pos logit contributions
pron+0.121
 butter+0.089
 panc+0.087
ocolate+0.087
 basil+0.086
Neg logit contributions
 villains-0.105
 warriors-0.104
 wre-0.095
 speeds-0.094
 enemies-0.093
Token some
Token ID617
Feature activation+3.77e+00

Pos logit contributions
pron+0.197
isy+0.155
mmm+0.149
itten+0.144
comb+0.144
Neg logit contributions
 risks-0.097
 speeds-0.086
 wre-0.084
 obstacles-0.081
 warriors-0.081
Token soft
Token ID2705
Feature activation+3.43e+00

Pos logit contributions
pron+0.162
comb+0.113
pp+0.113
mmm+0.105
 basil+0.104
Neg logit contributions
 risks-0.195
 wre-0.185
 trouble-0.173
 warriors-0.171
 enemies-0.170
Token flour
Token ID10601
Feature activation+3.81e+00

Pos logit contributions
pron+0.165
isy+0.117
mmm+0.111
live+0.105
comb+0.102
Neg logit contributions
 risks-0.167
 warriors-0.155
 turns-0.152
 wre-0.150
 noises-0.150
Token and
Token ID290
Feature activation+4.83e+00

Pos logit contributions
pron+0.267
mmm+0.197
innie+0.197
isy+0.195
mmmm+0.190
Neg logit contributions
 warriors-0.080
 risks-0.079
 apart-0.073
 wre-0.071
 speeds-0.071
Token some
Token ID617
Feature activation+4.14e+00

Pos logit contributions
pron+0.233
isy+0.169
quet+0.166
ocolate+0.165
ase+0.155
Neg logit contributions
 turns-0.208
 speeds-0.193
 bumps-0.188
 drives-0.188
 obstacles-0.187
Token sweet
Token ID6029
Feature activation+3.53e+00

Pos logit contributions
pron+0.176
lf+0.109
pp+0.107
ocolate+0.106
on+0.106
Neg logit contributions
 risks-0.225
 wre-0.204
 obstacles-0.201
 goals-0.200
 speeds-0.199
Token sugar
Token ID7543
Feature activation+3.88e+00

Pos logit contributions
pron+0.148
chard+0.084
lf+0.083
comb+0.083
pp+0.081
Neg logit contributions
 risks-0.200
 turns-0.180
 noises-0.177
 obstacles-0.174
 speeds-0.170
Token in
Token ID287
Feature activation+3.07e+00

Pos logit contributions
pron+0.272
mmm+0.209
ocolate+0.202
mmmm+0.200
gary+0.199
Neg logit contributions
ashes-0.074
 speeds-0.069
 warriors-0.068
ks-0.068
 risks-0.067
Token a
Token ID257
Feature activation+3.41e+00

Pos logit contributions
pron+0.226
atmeal+0.185
isy+0.184
iflower+0.176
ipop+0.172
Neg logit contributions
 trouble-0.063
 risks-0.062
 opposite-0.061
 obstacles-0.057
 speeds-0.056
Token.
Token ID13
Feature activation-8.99e-02

Pos logit contributions
 villains+0.004
 risks+0.004
 warriors+0.004
 wre+0.004
 opponents+0.004
Neg logit contributions
pron-0.004
 butter-0.003
 comb-0.003
 basil-0.003
 teaspoon-0.003
Token Do
Token ID2141
Feature activation+1.80e-01

Pos logit contributions
 villains+0.004
 risks+0.003
 warriors+0.003
 opponents+0.003
 wre+0.003
Neg logit contributions
pron-0.006
 butter-0.005
 comb-0.004
on-0.004
 teaspoon-0.004
Token you
Token ID345
Feature activation-5.17e-01

Pos logit contributions
pron+0.010
 butter+0.008
 teaspoon+0.007
 basil+0.007
 comb+0.007
Neg logit contributions
 villains-0.009
 risks-0.008
 warriors-0.008
 wre-0.008
 opponents-0.008
Token want
Token ID765
Feature activation+3.09e+00

Pos logit contributions
 villains+0.018
 risks+0.015
 warriors+0.015
 wre+0.014
 opponents+0.014
Neg logit contributions
pron-0.037
 butter-0.033
 comb-0.030
on-0.029
 slice-0.029
Token some
Token ID617
Feature activation+3.24e+00

Pos logit contributions
pron+0.217
kin+0.170
pp+0.168
comb+0.165
itten+0.164
Neg logit contributions
 villains-0.092
 risks-0.085
 wre-0.080
 trouble-0.076
 races-0.071
Token milk
Token ID7545
Feature activation+4.78e+00

Pos logit contributions
pron+0.159
pp+0.124
comb+0.124
 basil+0.112
 hen+0.111
Neg logit contributions
 risks-0.144
 wre-0.135
 villains-0.134
 goals-0.132
 trouble-0.129
Token with
Token ID351
Feature activation+4.77e+00

Pos logit contributions
pron+0.264
isy+0.213
 basil+0.207
 tul+0.204
on+0.198
Neg logit contributions
 anymore-0.126
 wre-0.126
 risks-0.125
ashes-0.124
 warriors-0.123
Token it
Token ID340
Feature activation+4.03e+00

Pos logit contributions
pron+0.290
comb+0.236
quet+0.224
itten+0.220
pp+0.209
Neg logit contributions
 risks-0.144
 wre-0.123
 unnecessary-0.119
 danger-0.116
 enemies-0.114
Token?"
Token ID1701
Feature activation+4.12e-01

Pos logit contributions
pron+0.227
 tul+0.183
 butter+0.180
 basil+0.179
 teaspoon+0.177
Neg logit contributions
 villains-0.115
 enemies-0.115
 loses-0.115
 warriors-0.115
 speeds-0.111
Token her
Token ID607
Feature activation+2.69e-01

Pos logit contributions
pron+0.019
 butter+0.015
 slice+0.012
 teaspoon+0.012
 egg+0.012
Neg logit contributions
 villains-0.026
 warriors-0.023
 opponents-0.023
 War-0.023
 risks-0.023
Token mom
Token ID1995
Feature activation+2.32e+00

Pos logit contributions
pron+0.012
 butter+0.010
on+0.008
 basil+0.008
 sugar+0.008
Neg logit contributions
 villains-0.016
 warriors-0.015
 risks-0.014
 opponents-0.014
 speeds-0.014
Token.
Token ID13
Feature activation+1.25e+00

Pos logit contributions
pron+0.140
 butter+0.118
on+0.114
 teaspoon+0.109
 comb+0.108
Neg logit contributions
 wre-0.050
 villains-0.047
 opponents-0.045
 warriors-0.044
 risks-0.044
Token She
Token ID1375
Feature activation+2.22e+00

Pos logit contributions
pron+0.080
 butter+0.070
 egg+0.065
on+0.064
 panc+0.063
Neg logit contributions
 villains-0.048
 War-0.044
 opponents-0.043
 warriors-0.041
 speeds-0.041
Token put
Token ID1234
Feature activation+3.33e+00

Pos logit contributions
pron+0.140
 butter+0.125
quet+0.124
ocolate+0.117
mmm+0.117
Neg logit contributions
 villains-0.080
 warriors-0.065
 speeds-0.059
 wre-0.059
 enemies-0.056
Token on
Token ID319
Feature activation+2.90e+00

Pos logit contributions
pron+0.206
isy+0.183
comb+0.178
mmm+0.172
quet+0.167
Neg logit contributions
 risks-0.090
 wre-0.076
 shortcuts-0.075
 turns-0.073
 trouble-0.072
Token a
Token ID257
Feature activation+3.80e+00

Pos logit contributions
pron+0.151
isy+0.116
comb+0.115
itten+0.113
quet+0.110
Neg logit contributions
 wre-0.122
 risks-0.118
 races-0.118
 speeds-0.112
 obstacles-0.107
Token pink
Token ID11398
Feature activation+4.77e+00

Pos logit contributions
mmm+0.157
mm+0.156
mmmm+0.152
ocolate+0.151
comb+0.149
Neg logit contributions
 wre-0.135
 risks-0.131
 villains-0.131
 races-0.120
 warriors-0.120
Token a
Token ID257
Feature activation+4.32e+00

Pos logit contributions
mmm+0.197
pron+0.186
aul+0.185
mmmm+0.183
pp+0.176
Neg logit contributions
 villains-0.153
 pirate-0.149
 gear-0.146
 wre-0.146
 submarine-0.145
Tokenpron
Token ID31186
Feature activation+2.69e+00

Pos logit contributions
ocolate+0.173
mm+0.168
mmm+0.168
comb+0.167
mmmm+0.166
Neg logit contributions
 villains-0.150
 risks-0.140
 wre-0.138
 races-0.138
 scores-0.138
Token and
Token ID290
Feature activation+3.49e+00

Pos logit contributions
pron+0.148
 butter+0.132
 basil+0.125
 sugar+0.123
 teaspoon+0.120
Neg logit contributions
 villains-0.103
 warriors-0.092
dies-0.084
 wre-0.083
 runners-0.082
Token climbed
Token ID19952
Feature activation+1.72e+00

Pos logit contributions
pron+0.197
 recipe+0.173
ocolate+0.170
quet+0.169
 teaspoon+0.166
Neg logit contributions
 villains-0.113
 warriors-0.105
 speeds-0.104
 turns-0.096
 risks-0.096
Token on
Token ID319
Feature activation+2.09e+00

Pos logit contributions
pron+0.109
 butter+0.086
 slice+0.083
isy+0.080
 comb+0.080
Neg logit contributions
 speeds-0.058
 risks-0.055
 wre-0.055
 warriors-0.054
 runners-0.052
Token Do
Token ID2141
Feature activation+1.80e-01

Pos logit contributions
 villains+0.004
 risks+0.003
 warriors+0.003
 opponents+0.003
 wre+0.003
Neg logit contributions
pron-0.006
 butter-0.005
 comb-0.004
on-0.004
 teaspoon-0.004
Token you
Token ID345
Feature activation-5.17e-01

Pos logit contributions
pron+0.010
 butter+0.008
 teaspoon+0.007
 basil+0.007
 comb+0.007
Neg logit contributions
 villains-0.009
 risks-0.008
 warriors-0.008
 wre-0.008
 opponents-0.008
Token want
Token ID765
Feature activation+3.09e+00

Pos logit contributions
 villains+0.018
 risks+0.015
 warriors+0.015
 wre+0.014
 opponents+0.014
Neg logit contributions
pron-0.037
 butter-0.033
 comb-0.030
on-0.029
 slice-0.029
Token some
Token ID617
Feature activation+3.24e+00

Pos logit contributions
pron+0.217
kin+0.170
pp+0.168
comb+0.165
itten+0.164
Neg logit contributions
 villains-0.092
 risks-0.085
 wre-0.080
 trouble-0.076
 races-0.071
Token milk
Token ID7545
Feature activation+4.78e+00

Pos logit contributions
pron+0.159
pp+0.124
comb+0.124
 basil+0.112
 hen+0.111
Neg logit contributions
 risks-0.144
 wre-0.135
 villains-0.134
 goals-0.132
 trouble-0.129
Token with
Token ID351
Feature activation+4.77e+00

Pos logit contributions
pron+0.264
isy+0.213
 basil+0.207
 tul+0.204
on+0.198
Neg logit contributions
 anymore-0.126
 wre-0.126
 risks-0.125
ashes-0.124
 warriors-0.123
Token it
Token ID340
Feature activation+4.03e+00

Pos logit contributions
pron+0.290
comb+0.236
quet+0.224
itten+0.220
pp+0.209
Neg logit contributions
 risks-0.144
 wre-0.123
 unnecessary-0.119
 danger-0.116
 enemies-0.114
Token?"
Token ID1701
Feature activation+4.12e-01

Pos logit contributions
pron+0.227
 tul+0.183
 butter+0.180
 basil+0.179
 teaspoon+0.177
Neg logit contributions
 villains-0.115
 enemies-0.115
 loses-0.115
 warriors-0.115
 speeds-0.111
Token her
Token ID607
Feature activation+2.69e-01

Pos logit contributions
pron+0.019
 butter+0.015
 slice+0.012
 teaspoon+0.012
 egg+0.012
Neg logit contributions
 villains-0.026
 warriors-0.023
 opponents-0.023
 War-0.023
 risks-0.023
Token mom
Token ID1995
Feature activation+2.32e+00

Pos logit contributions
pron+0.012
 butter+0.010
on+0.008
 basil+0.008
 sugar+0.008
Neg logit contributions
 villains-0.016
 warriors-0.015
 risks-0.014
 opponents-0.014
 speeds-0.014
Token said
Token ID531
Feature activation+6.11e-01

Pos logit contributions
pron+0.147
 butter+0.124
 basil+0.118
 teaspoon+0.115
 sugar+0.112
Neg logit contributions
 villains-0.073
 shouts-0.073
 warriors-0.067
ks-0.067
 wre-0.065
Token.
Token ID13
Feature activation+1.32e+00

Pos logit contributions
 villains+0.005
 risks+0.005
 warriors+0.005
 wre+0.004
 opponents+0.004
Neg logit contributions
pron-0.005
 butter-0.004
 comb-0.003
 teaspoon-0.003
on-0.003
Token She
Token ID1375
Feature activation+2.00e+00

Pos logit contributions
pron+0.081
 butter+0.068
on+0.061
 slice+0.061
 egg+0.059
Neg logit contributions
 villains-0.058
 warriors-0.053
 opponents-0.052
 speeds-0.050
 runners-0.049
Token put
Token ID1234
Feature activation+3.33e+00

Pos logit contributions
pron+0.119
 butter+0.103
ocolate+0.098
on+0.098
 teaspoon+0.097
Neg logit contributions
 villains-0.077
 warriors-0.066
 speeds-0.066
 wre-0.064
 shouts-0.061
Token on
Token ID319
Feature activation+2.64e+00

Pos logit contributions
pron+0.204
comb+0.187
mmm+0.178
isy+0.175
 recipe+0.174
Neg logit contributions
 risks-0.074
 turns-0.069
 forward-0.068
 speeds-0.062
 drives-0.061
Token a
Token ID257
Feature activation+3.91e+00

Pos logit contributions
pron+0.126
comb+0.102
quet+0.095
isy+0.093
cake+0.092
Neg logit contributions
 races-0.117
 risks-0.116
 speeds-0.114
 walks-0.113
 wre-0.112
Token pretty
Token ID2495
Feature activation+4.76e+00

Pos logit contributions
ocolate+0.130
comb+0.127
mmm+0.125
mm+0.124
mmmm+0.119
Neg logit contributions
 risks-0.170
 villains-0.170
 marches-0.165
 races-0.165
 scores-0.161
Token dress
Token ID6576
Feature activation+2.09e+00

Pos logit contributions
comb+0.193
ocolate+0.193
mmm+0.186
mmmm+0.181
mm+0.180
Neg logit contributions
 races-0.176
 marches-0.176
 speeds-0.164
 risks-0.159
 walks-0.158
Token and
Token ID290
Feature activation+2.74e+00

Pos logit contributions
pron+0.118
 basil+0.104
 butter+0.104
 teaspoon+0.098
 sugar+0.097
Neg logit contributions
 villains-0.071
 warriors-0.067
 runners-0.065
 speeds-0.062
 opponents-0.062
Token ran
Token ID4966
Feature activation-6.28e-01

Pos logit contributions
pron+0.151
 recipe+0.123
ocolate+0.122
 teaspoon+0.121
 egg+0.118
Neg logit contributions
 villains-0.107
 speeds-0.099
 warriors-0.095
 shouts-0.092
 risks-0.089
Token outside
Token ID2354
Feature activation+2.82e-01

Pos logit contributions
 villains+0.038
 risks+0.035
 warriors+0.034
 wre+0.034
 opponents+0.034
Neg logit contributions
pron-0.029
 butter-0.024
 comb-0.020
on-0.020
 sugar-0.019
Token to
Token ID284
Feature activation+1.43e+00

Pos logit contributions
pron+0.016
 butter+0.014
 basil+0.011
 sugar+0.011
 comb+0.011
Neg logit contributions
 villains-0.014
 warriors-0.012
 risks-0.012
 wre-0.012
 opponents-0.012
Token stirred
Token ID33091
Feature activation+2.75e+00

Pos logit contributions
pron+0.137
ool+0.095
itten+0.092
ocolate+0.083
 ornament+0.083
Neg logit contributions
 warriors-0.160
 villains-0.157
 obstacles-0.156
 wre-0.155
 speeds-0.152
Token the
Token ID262
Feature activation+2.02e+00

Pos logit contributions
pron+0.181
 hen+0.136
 comb+0.136
itten+0.130
on+0.128
Neg logit contributions
 risks-0.089
 speeds-0.083
 wre-0.082
 goals-0.081
 obstacles-0.079
Token jam
Token ID16853
Feature activation+4.08e+00

Pos logit contributions
pron+0.070
on+0.050
Her+0.046
ocolate+0.041
 comb+0.040
Neg logit contributions
 risks-0.121
 wre-0.113
 speeds-0.110
 unnecessary-0.109
 runners-0.107
Token.
Token ID13
Feature activation+1.84e+00

Pos logit contributions
pron+0.253
innie+0.204
 hen+0.198
 comb+0.193
 basil+0.190
Neg logit contributions
 speeds-0.108
ashes-0.105
 goals-0.105
 villains-0.102
 warriors-0.101
Token The
Token ID383
Feature activation+1.42e+00

Pos logit contributions
pron+0.120
 butter+0.107
 egg+0.098
 slice+0.095
 teaspoon+0.095
Neg logit contributions
 villains-0.064
 War-0.061
 opponents-0.055
 speeds-0.054
 shouts-0.050
Token jam
Token ID16853
Feature activation+4.74e+00

Pos logit contributions
pron+0.051
on+0.038
 butter+0.033
 comb+0.030
ocolate+0.030
Neg logit contributions
 villains-0.085
 risks-0.083
 warriors-0.081
 wre-0.079
 runners-0.079
Token smelled
Token ID37609
Feature activation+2.43e+00

Pos logit contributions
pron+0.270
innie+0.229
 basil+0.220
 hen+0.219
 mug+0.214
Neg logit contributions
 speeds-0.124
 villains-0.109
 noises-0.107
 scores-0.102
 turns-0.099
Token so
Token ID523
Feature activation+1.48e+00

Pos logit contributions
pron+0.150
 comb+0.117
itten+0.116
 hen+0.116
on+0.113
Neg logit contributions
 risks-0.087
 wre-0.083
 victorious-0.077
 unnecessary-0.076
 goals-0.074
Token good
Token ID922
Feature activation+3.94e-01

Pos logit contributions
pron+0.085
 comb+0.066
 butter+0.065
on+0.064
 hen+0.064
Neg logit contributions
 villains-0.059
 risks-0.059
 wre-0.057
 speeds-0.057
 goals-0.056
Token that
Token ID326
Feature activation+1.94e+00

Pos logit contributions
pron+0.022
 butter+0.018
 comb+0.016
on+0.016
 basil+0.016
Neg logit contributions
 villains-0.019
 risks-0.018
 warriors-0.017
 wre-0.017
 opponents-0.017
Token Lily
Token ID20037
Feature activation+1.76e+00

Pos logit contributions
pron+0.101
 butter+0.071
 slice+0.070
 egg+0.068
itten+0.068
Neg logit contributions
 villains-0.096
 speeds-0.085
 risks-0.084
 wre-0.083
 War-0.082
Token like
Token ID588
Feature activation+3.54e+00

Pos logit contributions
pron+0.045
 butter+0.038
 teaspoon+0.035
 egg+0.035
 basil+0.035
Neg logit contributions
 villains-0.024
 risks-0.022
 warriors-0.020
 opponents-0.019
 unnecessary-0.019
Token me
Token ID502
Feature activation+2.81e+00

Pos logit contributions
pron+0.221
comb+0.184
kin+0.171
 hen+0.170
pp+0.170
Neg logit contributions
 risks-0.125
 villains-0.118
 races-0.109
 wre-0.105
 obstacles-0.102
Token to
Token ID284
Feature activation+2.43e+00

Pos logit contributions
pron+0.165
 basil+0.137
 butter+0.136
 egg+0.130
 teaspoon+0.129
Neg logit contributions
 risks-0.107
 villains-0.100
 warriors-0.098
 runners-0.095
 unnecessary-0.093
Token brush
Token ID14093
Feature activation+3.30e+00

Pos logit contributions
pron+0.151
 butter+0.120
 basil+0.116
 teaspoon+0.115
 panc+0.115
Neg logit contributions
 villains-0.096
 risks-0.090
 wre-0.081
 races-0.081
 opponents-0.079
Token your
Token ID534
Feature activation+1.99e+00

Pos logit contributions
pron+0.195
 slice+0.150
 butter+0.145
 baked+0.144
comb+0.144
Neg logit contributions
 risks-0.137
 wre-0.122
 unnecessary-0.121
 villains-0.119
 speeds-0.118
Token hair
Token ID4190
Feature activation+4.73e+00

Pos logit contributions
pron+0.122
 butter+0.102
on+0.101
 basil+0.098
 boiled+0.095
Neg logit contributions
 risks-0.072
 villains-0.069
 unnecessary-0.065
 opponents-0.065
 speeds-0.065
Token?"
Token ID1701
Feature activation+5.03e-01

Pos logit contributions
pron+0.217
 basil+0.198
 butter+0.192
 hen+0.189
 egg+0.187
Neg logit contributions
 speeds-0.170
 scores-0.168
 opponents-0.166
 goals-0.164
 risks-0.162
Token Tim
Token ID5045
Feature activation+1.25e+00

Pos logit contributions
pron+0.031
 butter+0.026
 teaspoon+0.023
 basil+0.023
 egg+0.023
Neg logit contributions
 villains-0.022
 risks-0.020
 wre-0.020
 War-0.019
 opponents-0.019
Tokenmy
Token ID1820
Feature activation+2.12e+00

Pos logit contributions
pron+0.059
 butter+0.049
 basil+0.045
 sugar+0.042
 baking+0.042
Neg logit contributions
 villains-0.066
 opponents-0.062
 speeds-0.061
 risks-0.061
 warriors-0.060
Token was
Token ID373
Feature activation+5.48e-01

Pos logit contributions
pron+0.128
 butter+0.116
 basil+0.109
 sugar+0.107
 flour+0.106
Neg logit contributions
 villains-0.078
 opponents-0.071
 warriors-0.069
 speeds-0.069
 wre-0.064
Token so
Token ID523
Feature activation+1.34e+00

Pos logit contributions
pron+0.033
 butter+0.028
 basil+0.025
 sugar+0.025
 egg+0.025
Neg logit contributions
 villains-0.024
 risks-0.021
 warriors-0.021
 wre-0.021
 opponents-0.020
Token it
Token ID340
Feature activation+5.55e-01

Pos logit contributions
pron+0.051
 butter+0.039
 comb+0.036
on+0.036
 teaspoon+0.035
Neg logit contributions
 villains-0.048
 risks-0.045
 warriors-0.043
 unnecessary-0.043
 wre-0.043
Token tastes
Token ID18221
Feature activation+1.50e+00

Pos logit contributions
pron+0.035
 butter+0.028
on+0.026
 slice+0.025
 egg+0.025
Neg logit contributions
 villains-0.024
 risks-0.022
 warriors-0.022
 speeds-0.021
 wre-0.021
Token.
Token ID13
Feature activation+5.87e-01

Pos logit contributions
pron+0.108
on+0.083
 butter+0.082
 slice+0.080
 comb+0.080
Neg logit contributions
 risks-0.050
 villains-0.047
 wre-0.044
 unnecessary-0.043
 warriors-0.043
Token She
Token ID1375
Feature activation-1.09e-01

Pos logit contributions
pron+0.034
 butter+0.029
 slice+0.024
on+0.024
 egg+0.024
Neg logit contributions
 villains-0.029
 risks-0.025
 speeds-0.025
 opponents-0.025
 warriors-0.025
Token picks
Token ID11103
Feature activation+1.70e+00

Pos logit contributions
 villains+0.004
 risks+0.003
 warriors+0.003
 wre+0.003
 opponents+0.003
Neg logit contributions
pron-0.008
 butter-0.007
on-0.006
 comb-0.006
 sugar-0.006
Token a
Token ID257
Feature activation+4.73e+00

Pos logit contributions
pron+0.108
 butter+0.074
 slice+0.073
 panc+0.072
 teaspoon+0.072
Neg logit contributions
 risks-0.070
 unnecessary-0.062
 villains-0.062
 warriors-0.062
 speeds-0.061
Token leaf
Token ID12835
Feature activation+3.16e+00

Pos logit contributions
mm+0.220
atmeal+0.217
pron+0.217
ocolate+0.211
ries+0.204
Neg logit contributions
 risks-0.156
 scores-0.142
 unnecessary-0.135
 threat-0.134
 villains-0.130
Token and
Token ID290
Feature activation+1.22e+00

Pos logit contributions
pron+0.214
innie+0.164
 butter+0.158
 bacon+0.157
 panc+0.155
Neg logit contributions
 warriors-0.083
 wre-0.076
 villains-0.076
 victorious-0.074
 scores-0.072
Token puts
Token ID7584
Feature activation+2.55e+00

Pos logit contributions
pron+0.081
 butter+0.062
on+0.059
 teaspoon+0.056
 recipe+0.056
Neg logit contributions
 villains-0.048
 risks-0.045
 speeds-0.045
 warriors-0.044
 wre-0.043
Token it
Token ID340
Feature activation+2.76e+00

Pos logit contributions
pron+0.171
 recipe+0.133
 panc+0.128
 hen+0.124
itten+0.122
Neg logit contributions
 risks-0.081
 scores-0.067
 victorious-0.066
 unnecessary-0.065
 speeds-0.065
Token in
Token ID287
Feature activation+2.49e+00

Pos logit contributions
pron+0.191
 panc+0.148
 butter+0.145
mmm+0.142
 mug+0.141
Neg logit contributions
 risks-0.079
ks-0.078
 villains-0.077
 goals-0.076
 warriors-0.076
Token some
Token ID617
Feature activation+2.72e+00

Pos logit contributions
pron+0.176
comb+0.134
itten+0.131
kin+0.127
on+0.125
Neg logit contributions
 risks-0.087
 villains-0.084
 wre-0.076
 fights-0.071
 races-0.071
Token of
Token ID286
Feature activation+2.73e+00

Pos logit contributions
pron+0.154
comb+0.111
on+0.105
 hen+0.105
iflower+0.102
Neg logit contributions
 risks-0.123
 wre-0.109
 races-0.107
 villains-0.107
 War-0.106
Token my
Token ID616
Feature activation+9.94e-01

Pos logit contributions
pron+0.169
comb+0.125
itten+0.122
on+0.121
 butter+0.116
Neg logit contributions
 risks-0.105
 wre-0.097
 villains-0.095
 scores-0.089
 unnecessary-0.089
Token juice
Token ID13135
Feature activation+4.08e+00

Pos logit contributions
pron+0.043
 butter+0.033
on+0.030
 egg+0.027
 comb+0.026
Neg logit contributions
 villains-0.058
 risks-0.056
 opponents-0.054
 prisoners-0.053
 wre-0.053
Token to
Token ID284
Feature activation+3.25e+00

Pos logit contributions
pron+0.211
 butter+0.161
 basil+0.160
 bacon+0.154
isy+0.149
Neg logit contributions
 wre-0.156
 warriors-0.153
 villains-0.153
 opponents-0.152
ashes-0.152
Token drink
Token ID4144
Feature activation+4.73e+00

Pos logit contributions
pron+0.205
itten+0.166
on+0.160
 butter+0.160
iflower+0.154
Neg logit contributions
 wre-0.100
 drown-0.095
 risks-0.092
 villains-0.092
 races-0.085
Token?"
Token ID1701
Feature activation+3.08e-01

Pos logit contributions
pron+0.262
itten+0.204
innie+0.202
comb+0.200
 hen+0.199
Neg logit contributions
 risks-0.149
 wre-0.142
 War-0.131
 anymore-0.130
ashes-0.123
Token
Token ID198
Feature activation+6.71e-01

Pos logit contributions
pron+0.018
 butter+0.015
 teaspoon+0.013
 slice+0.013
 egg+0.013
Neg logit contributions
 villains-0.015
 warriors-0.013
 risks-0.013
 opponents-0.012
 wre-0.012
Token
Token ID198
Feature activation+4.56e-01

Pos logit contributions
pron+0.042
 butter+0.036
 teaspoon+0.032
 egg+0.032
 basil+0.031
Neg logit contributions
 villains-0.030
 risks-0.025
 warriors-0.025
 War-0.024
 speeds-0.024
TokenThe
Token ID464
Feature activation-1.05e+00

Pos logit contributions
pron+0.023
 butter+0.019
 teaspoon+0.017
 egg+0.016
 sugar+0.016
Neg logit contributions
 villains-0.026
 warriors-0.023
 risks-0.022
 opponents-0.022
 runners-0.022
Token deer
Token ID20096
Feature activation+7.57e-01

Pos logit contributions
 villains+0.049
 risks+0.045
 opponents+0.043
 wre+0.042
 unnecessary+0.042
Neg logit contributions
pron-0.062
 butter-0.056
 egg-0.051
 comb-0.049
 teaspoon-0.048
Token"
Token ID1
Feature activation+2.30e+00

Pos logit contributions
pron+0.031
 butter+0.027
 sugar+0.024
 teaspoon+0.023
 basil+0.023
Neg logit contributions
 villains-0.031
 risks-0.028
 warriors-0.028
 opponents-0.027
 runners-0.027
TokenCan
Token ID6090
Feature activation+1.79e+00

Pos logit contributions
pron+0.126
 butter+0.108
on+0.106
 egg+0.106
 teaspoon+0.104
Neg logit contributions
 villains-0.087
 runners-0.083
 risks-0.081
Ready-0.079
 obstacles-0.079
Token I
Token ID314
Feature activation+3.06e+00

Pos logit contributions
pron+0.115
 butter+0.095
 slice+0.086
 basil+0.085
 egg+0.085
Neg logit contributions
 villains-0.072
 risks-0.062
 opponents-0.059
 War-0.058
 runners-0.058
Token put
Token ID1234
Feature activation+3.57e+00

Pos logit contributions
pron+0.195
 butter+0.160
 basil+0.155
iflower+0.152
 egg+0.152
Neg logit contributions
 risks-0.089
 villains-0.084
 wre-0.084
 win-0.082
 runners-0.078
Token the
Token ID262
Feature activation+2.57e+00

Pos logit contributions
pron+0.220
comb+0.167
 hen+0.163
isy+0.160
ousy+0.155
Neg logit contributions
 risks-0.136
 unnecessary-0.114
 wre-0.112
 speeds-0.105
 shortcuts-0.105
Token cup
Token ID6508
Feature activation+4.72e+00

Pos logit contributions
pron+0.089
on+0.064
Her+0.059
mmm+0.058
lf+0.055
Neg logit contributions
 risks-0.143
 prisoners-0.130
 wre-0.130
 runners-0.126
 brakes-0.126
Tokencakes
Token ID37263
Feature activation+3.88e+00

Pos logit contributions
pron+0.224
 basil+0.179
 hen+0.179
 butter+0.174
 lav+0.165
Neg logit contributions
ashes-0.169
 risks-0.159
 wre-0.154
 victorious-0.152
 speeds-0.151
Token in
Token ID287
Feature activation+3.74e+00

Pos logit contributions
pron+0.236
 basil+0.211
 hen+0.210
ousy+0.199
 herb+0.192
Neg logit contributions
 risks-0.100
 victorious-0.099
 races-0.097
 wre-0.096
 runners-0.095
Token the
Token ID262
Feature activation+1.15e+00

Pos logit contributions
pron+0.238
isy+0.195
atmeal+0.190
itten+0.190
 butter+0.188
Neg logit contributions
 risks-0.099
 wre-0.093
 trouble-0.091
 races-0.082
 victorious-0.081
Token oven
Token ID14361
Feature activation+3.46e+00

Pos logit contributions
pron+0.039
 butter+0.029
 comb+0.023
on+0.023
 basil+0.022
Neg logit contributions
 villains-0.072
 wre-0.071
 risks-0.071
 War-0.068
 speeds-0.068
Token?"
Token ID1701
Feature activation+5.95e-02

Pos logit contributions
pron+0.164
 butter+0.137
 basil+0.133
on+0.120
isy+0.120
Neg logit contributions
 wre-0.150
 warriors-0.144
 speeds-0.143
 risks-0.141
 shouts-0.141
Token I
Token ID314
Feature activation+2.84e+00

Pos logit contributions
pron+0.129
 butter+0.107
 slice+0.099
 recipe+0.097
 loaf+0.096
Neg logit contributions
 villains-0.067
 War-0.056
 risks-0.054
 loses-0.053
 opponents-0.052
Token have
Token ID423
Feature activation+2.78e+00

Pos logit contributions
pron+0.199
 butter+0.161
 batch+0.157
iflower+0.155
lf+0.155
Neg logit contributions
 villains-0.067
 risks-0.064
 drown-0.058
 win-0.058
 wre-0.057
Token some
Token ID617
Feature activation+3.29e+00

Pos logit contributions
pron+0.172
itten+0.136
comb+0.133
ool+0.130
flower+0.129
Neg logit contributions
 risks-0.111
 villains-0.099
 unnecessary-0.097
 wre-0.093
 trouble-0.092
Token soup
Token ID17141
Feature activation+3.65e+00

Pos logit contributions
pron+0.146
pp+0.113
Her+0.111
comb+0.110
 hen+0.101
Neg logit contributions
 risks-0.167
 wre-0.151
 trouble-0.145
 villains-0.145
 enemies-0.141
Token to
Token ID284
Feature activation+3.73e+00

Pos logit contributions
pron+0.188
 butter+0.152
 basil+0.146
 tul+0.145
 ornament+0.143
Neg logit contributions
 villains-0.137
 warriors-0.137
 enemies-0.131
 opponents-0.129
 risks-0.127
Token drink
Token ID4144
Feature activation+4.71e+00

Pos logit contributions
pron+0.207
itten+0.190
ool+0.162
cake+0.159
 ornament+0.159
Neg logit contributions
 wre-0.132
 villains-0.129
 drown-0.124
 enemies-0.112
 warriors-0.110
Token?"
Token ID1701
Feature activation+2.09e+00

Pos logit contributions
pron+0.275
itten+0.231
innie+0.228
 comb+0.223
 hen+0.212
Neg logit contributions
 risks-0.136
 wre-0.126
 sharks-0.120
 War-0.119
ashes-0.119
Token Her
Token ID2332
Feature activation+6.46e-01

Pos logit contributions
pron+0.133
 butter+0.119
 egg+0.107
 teaspoon+0.105
iflower+0.104
Neg logit contributions
 War-0.072
 villains-0.066
 shouts-0.062
 opponents-0.061
 He-0.055
Token mom
Token ID1995
Feature activation+2.67e+00

Pos logit contributions
pron+0.033
 butter+0.028
on+0.024
 basil+0.024
 sugar+0.023
Neg logit contributions
 villains-0.034
 enemies-0.030
 opponents-0.030
 warriors-0.030
 prisoners-0.029
Token said
Token ID531
Feature activation+2.77e+00

Pos logit contributions
pron+0.165
 butter+0.143
 basil+0.138
 flour+0.132
 loaf+0.131
Neg logit contributions
 shouts-0.079
 echoes-0.068
 speeds-0.068
ks-0.068
 wre-0.067
Token,
Token ID11
Feature activation+3.21e+00

Pos logit contributions
pron+0.166
 butter+0.138
 bacon+0.134
 basil+0.127
iflower+0.127
Neg logit contributions
 villains-0.089
ks-0.085
 War-0.082
 victorious-0.079
joy-0.078
Token.
Token ID13
Feature activation+1.77e+00

Pos logit contributions
pron+0.004
 butter+0.004
 basil+0.003
on+0.003
 sugar+0.003
Neg logit contributions
 villains-0.004
 risks-0.003
 warriors-0.003
 wre-0.003
 opponents-0.003
Token "
Token ID366
Feature activation+4.38e-01

Pos logit contributions
pron+0.107
 butter+0.098
 egg+0.087
iflower+0.084
on+0.084
Neg logit contributions
 villains-0.070
 War-0.067
 opponents-0.062
 warriors-0.060
 shouts-0.059
TokenWe
Token ID1135
Feature activation+3.12e-01

Pos logit contributions
pron+0.026
 butter+0.022
 teaspoon+0.020
 egg+0.019
on+0.019
Neg logit contributions
 villains-0.021
 risks-0.018
 War-0.017
 warriors-0.017
 wre-0.017
Token need
Token ID761
Feature activation+3.81e+00

Pos logit contributions
pron+0.021
 butter+0.018
 basil+0.016
on+0.016
 sugar+0.016
Neg logit contributions
 villains-0.012
 risks-0.011
 wre-0.010
 warriors-0.010
 prisoners-0.010
Token sugar
Token ID7543
Feature activation+3.07e+00

Pos logit contributions
pron+0.233
isy+0.187
comb+0.187
lf+0.184
pp+0.177
Neg logit contributions
 risks-0.142
 trouble-0.129
 unnecessary-0.121
 wre-0.117
 consequences-0.112
Token to
Token ID284
Feature activation+4.69e+00

Pos logit contributions
pron+0.173
 basil+0.131
Her+0.126
on+0.126
 hen+0.122
Neg logit contributions
 risks-0.123
 runners-0.112
 wre-0.110
 warriors-0.110
 unnecessary-0.107
Token make
Token ID787
Feature activation+3.48e+00

Pos logit contributions
pron+0.276
ipop+0.220
on+0.212
itten+0.208
berries+0.201
Neg logit contributions
 risks-0.150
 wre-0.145
 drown-0.132
 win-0.128
 trouble-0.118
Token cookies
Token ID14746
Feature activation+2.44e+00

Pos logit contributions
pron+0.182
comb+0.131
flower+0.128
ipop+0.125
itten+0.125
Neg logit contributions
 risks-0.160
 unnecessary-0.148
 wre-0.147
 trouble-0.139
 decisions-0.136
Token.
Token ID13
Feature activation+2.09e+00

Pos logit contributions
pron+0.143
 basil+0.114
 butter+0.110
on+0.109
 hen+0.105
Neg logit contributions
 villains-0.094
 risks-0.094
 warriors-0.087
 wre-0.087
 enemies-0.086
Token What
Token ID1867
Feature activation+1.64e+00

Pos logit contributions
pron+0.149
 butter+0.130
 comb+0.119
on+0.116
 egg+0.116
Neg logit contributions
 villains-0.070
 War-0.062
 risks-0.056
 warriors-0.056
 opponents-0.054
Token should
Token ID815
Feature activation+3.14e+00

Pos logit contributions
pron+0.102
 butter+0.078
 basil+0.078
 egg+0.074
on+0.073
Neg logit contributions
 villains-0.074
 enemies-0.064
 risks-0.063
 wre-0.059
 warriors-0.059
Token.
Token ID13
Feature activation-5.33e-01

Pos logit contributions
pron+0.155
 butter+0.132
 basil+0.129
 sugar+0.122
 teaspoon+0.120
Neg logit contributions
 villains-0.073
 warriors-0.067
 wre-0.059
 enemies-0.059
 runners-0.056
Token He
Token ID679
Feature activation+3.30e+00

Pos logit contributions
 villains+0.020
 risks+0.018
 wre+0.017
 warriors+0.017
 opponents+0.017
Neg logit contributions
pron-0.036
 butter-0.032
on-0.028
 sugar-0.028
 comb-0.028
Token put
Token ID1234
Feature activation+2.57e+00

Pos logit contributions
pron+0.198
ocolate+0.163
ummy+0.162
mmm+0.159
 butter+0.159
Neg logit contributions
 villains-0.111
 warriors-0.099
 speeds-0.096
 wre-0.089
 enemies-0.087
Token on
Token ID319
Feature activation+1.87e+00

Pos logit contributions
pron+0.162
comb+0.132
isy+0.132
 basil+0.130
 hen+0.129
Neg logit contributions
 risks-0.072
 brakes-0.068
 wre-0.068
 speeds-0.066
 warriors-0.065
Token his
Token ID465
Feature activation+3.57e+00

Pos logit contributions
pron+0.129
isy+0.106
quet+0.105
 basil+0.104
comb+0.104
Neg logit contributions
 wre-0.051
 speeds-0.049
 risks-0.046
 villains-0.045
 races-0.044
Token a
Token ID257
Feature activation+4.68e+00

Pos logit contributions
pron+0.138
pp+0.134
comb+0.120
 basil+0.120
quet+0.120
Neg logit contributions
 risks-0.149
 villains-0.143
 warriors-0.142
 brakes-0.141
 opponents-0.138
Tokenpron
Token ID31186
Feature activation+2.63e+00

Pos logit contributions
mm+0.134
ions+0.129
ocolate+0.124
ries+0.124
mmm+0.120
Neg logit contributions
 races-0.212
 villains-0.210
 risks-0.210
 threat-0.205
 trouble-0.204
Token and
Token ID290
Feature activation+3.30e+00

Pos logit contributions
pron+0.143
 butter+0.126
 basil+0.122
 sugar+0.119
mmm+0.117
Neg logit contributions
 villains-0.103
 warriors-0.093
 opponents-0.084
 wre-0.083
dies-0.083
Token started
Token ID2067
Feature activation+2.79e+00

Pos logit contributions
pron+0.197
ummy+0.164
ocolate+0.159
quet+0.156
 recipe+0.155
Neg logit contributions
 villains-0.108
 speeds-0.105
 warriors-0.104
 turns-0.095
 shouts-0.094
Token to
Token ID284
Feature activation+2.90e+00

Pos logit contributions
pron+0.157
isy+0.135
mm+0.123
comb+0.123
kin+0.120
Neg logit contributions
 wre-0.095
 warriors-0.091
 speeds-0.090
 villains-0.089
 races-0.089
Token mix
Token ID5022
Feature activation+1.80e+00

Pos logit contributions
pron+0.122
ipop+0.099
itten+0.098
ummy+0.089
isy+0.088
Neg logit contributions
 wre-0.138
 villains-0.130
 speeds-0.126
 risks-0.125
 warriors-0.122
Token and
Token ID290
Feature activation-2.22e+00

Pos logit contributions
 villains+0.201
 risks+0.191
 unnecessary+0.189
 prisoners+0.181
 slaves+0.178
Neg logit contributions
pron-0.068
 baking-0.058
 butter-0.056
 comb-0.054
on-0.053
Token suddenly
Token ID6451
Feature activation-2.67e+00

Pos logit contributions
 villains+0.127
 risks+0.124
 unnecessary+0.117
 prisoners+0.113
 opponents+0.109
Neg logit contributions
pron-0.088
 butter-0.084
on-0.074
 sugar-0.071
 baking-0.069
Token,
Token ID11
Feature activation-2.52e+00

Pos logit contributions
 villains+0.174
 unnecessary+0.169
 risks+0.168
 prisoners+0.159
 slaves+0.159
Neg logit contributions
pron-0.083
 butter-0.080
on-0.068
 tomato-0.059
 sugar-0.057
Token Sam
Token ID3409
Feature activation-1.87e+00

Pos logit contributions
 villains+0.159
 unnecessary+0.151
 risks+0.150
 prisoners+0.144
 opponents+0.140
Neg logit contributions
 butter-0.082
pron-0.079
on-0.073
 tomato-0.067
 sugar-0.064
Token started
Token ID2067
Feature activation-3.27e+00

Pos logit contributions
 villains+0.114
 risks+0.109
 unnecessary+0.106
 wre+0.101
 prisoners+0.100
Neg logit contributions
pron-0.080
 butter-0.066
on-0.060
 comb-0.059
 brush-0.053
Token running
Token ID2491
Feature activation-4.95e+00

Pos logit contributions
 villains+0.242
 unnecessary+0.231
 risks+0.230
 prisoners+0.224
 slaves+0.218
Neg logit contributions
pron-0.068
 baking-0.054
 comb-0.054
 butter-0.048
 brushing-0.048
Token really
Token ID1107
Feature activation-3.78e+00

Pos logit contributions
 unnecessary+0.325
 villains+0.320
 prisoners+0.300
 risks+0.299
 slaves+0.295
Neg logit contributions
 her-0.120
,-0.117
 and-0.101
 with-0.089
 she-0.085
Token fast
Token ID3049
Feature activation-3.88e+00

Pos logit contributions
 villains+0.228
 unnecessary+0.216
 risks+0.215
 endings+0.213
 habitats+0.212
Neg logit contributions
pron-0.092
 her-0.086
 she-0.085
 butter-0.085
 cherry-0.084
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
 villains+0.278
 unnecessary+0.257
 prisoners+0.254
 slaves+0.251
 risks+0.249
Neg logit contributions
 butter-0.061
 comb-0.060
pron-0.059
 her-0.052
 baking-0.051
Token He
Token ID679
Feature activation-3.02e+00

Pos logit contributions
 villains+0.076
 risks+0.074
 unnecessary+0.072
 prisoners+0.068
 wre+0.067
Neg logit contributions
pron-0.101
 butter-0.085
on-0.085
 basil-0.081
 sugar-0.079
Token was
Token ID373
Feature activation-2.65e+00

Pos logit contributions
 risks+0.190
 unnecessary+0.190
 villains+0.187
 prisoners+0.173
 decisions+0.170
Neg logit contributions
pron-0.093
 comb-0.078
 butter-0.074
on-0.070
 brush-0.067
Token cat
Token ID3797
Feature activation-1.40e+00

Pos logit contributions
 villains+0.052
 risks+0.048
 opponents+0.047
 wre+0.047
 unnecessary+0.046
Neg logit contributions
pron-0.045
 butter-0.040
 comb-0.034
 egg-0.034
 sugar-0.033
Token was
Token ID373
Feature activation-1.77e+00

Pos logit contributions
 villains+0.084
 risks+0.079
 unnecessary+0.078
 opponents+0.076
 warriors+0.075
Neg logit contributions
pron-0.059
 butter-0.049
 comb-0.044
on-0.043
 egg-0.040
Token terrified
Token ID22144
Feature activation-1.86e+00

Pos logit contributions
 villains+0.094
 opponents+0.086
 risks+0.085
dies+0.085
 runners+0.082
Neg logit contributions
pron-0.082
 butter-0.071
 comb-0.066
 baking-0.065
 egg-0.063
Token and
Token ID290
Feature activation-2.10e+00

Pos logit contributions
 villains+0.144
 prisoners+0.129
 opponents+0.129
 speeds+0.128
 risks+0.127
Neg logit contributions
pron-0.042
 butter-0.039
 egg-0.027
 comb-0.025
 sugar-0.025
Token started
Token ID2067
Feature activation-2.71e+00

Pos logit contributions
 villains+0.120
 risks+0.116
 opponents+0.114
 runners+0.112
 unnecessary+0.111
Neg logit contributions
pron-0.082
 butter-0.071
 comb-0.065
 egg-0.063
on-0.062
Token running
Token ID2491
Feature activation-4.67e+00

Pos logit contributions
 villains+0.197
 risks+0.183
 runners+0.181
 opponents+0.180
 prisoners+0.179
Neg logit contributions
pron-0.059
 comb-0.049
 baking-0.048
 butter-0.044
 brushing-0.042
Token away
Token ID1497
Feature activation-3.03e+00

Pos logit contributions
 villains+0.304
 opponents+0.293
 unnecessary+0.292
 slaves+0.283
 prisoners+0.278
Neg logit contributions
 her-0.104
,-0.104
 and-0.085
 in-0.080
 with-0.080
Token.
Token ID13
Feature activation-1.11e+00

Pos logit contributions
 villains+0.213
 unnecessary+0.198
 opponents+0.197
 risks+0.194
 War+0.193
Neg logit contributions
pron-0.058
 baking-0.056
 butter-0.050
 her-0.050
 teaspoon-0.049
Token
Token ID198
Feature activation+4.49e-02

Pos logit contributions
 villains+0.064
 risks+0.058
 warriors+0.057
 wre+0.056
 runners+0.056
Neg logit contributions
pron-0.053
 butter-0.042
on-0.039
 basil-0.038
 sugar-0.037
Token
Token ID198
Feature activation-3.09e-01

Pos logit contributions
pron+0.002
 butter+0.002
 sugar+0.002
 comb+0.002
 teaspoon+0.002
Neg logit contributions
 villains-0.002
 risks-0.002
 warriors-0.002
 wre-0.002
 opponents-0.002
TokenIn
Token ID818
Feature activation-5.15e-01

Pos logit contributions
 villains+0.018
 risks+0.017
 warriors+0.016
 wre+0.016
 opponents+0.016
Neg logit contributions
pron-0.015
 butter-0.012
on-0.010
 comb-0.010
 sugar-0.010
Token home
Token ID1363
Feature activation-6.88e-01

Pos logit contributions
 villains+0.067
 risks+0.061
 warriors+0.060
 wre+0.060
 opponents+0.059
Neg logit contributions
pron-0.063
 butter-0.054
on-0.047
 comb-0.046
 sugar-0.045
Token.
Token ID13
Feature activation-3.36e+00

Pos logit contributions
 villains+0.038
 risks+0.034
 warriors+0.033
 opponents+0.033
 wre+0.033
Neg logit contributions
pron-0.035
 butter-0.030
 comb-0.025
 sugar-0.025
on-0.025
Token Everywhere
Token ID48201
Feature activation-3.97e+00

Pos logit contributions
 risks+0.206
 unnecessary+0.205
 villains+0.203
 prisoners+0.201
 slaves+0.195
Neg logit contributions
pron-0.104
on-0.087
 butter-0.076
 basil-0.070
 comb-0.069
Token he
Token ID339
Feature activation-1.38e+00

Pos logit contributions
 villains+0.239
 risks+0.237
 drown+0.235
 wre+0.233
 unnecessary+0.230
Neg logit contributions
pron-0.120
on-0.107
 butter-0.105
 she-0.097
 her-0.088
Token went
Token ID1816
Feature activation-2.99e+00

Pos logit contributions
 villains+0.086
 risks+0.080
 opponents+0.080
 unnecessary+0.077
 runners+0.077
Neg logit contributions
pron-0.058
 butter-0.048
 comb-0.042
 egg-0.040
 sugar-0.038
Token,
Token ID11
Feature activation-4.67e+00

Pos logit contributions
 villains+0.225
 risks+0.218
 unnecessary+0.216
 drown+0.213
 slaves+0.211
Neg logit contributions
on-0.038
pron-0.031
 she-0.029
 butter-0.026
 her-0.024
Token people
Token ID661
Feature activation-4.98e-01

Pos logit contributions
 risks+0.269
 villains+0.264
 prisoners+0.263
 unnecessary+0.256
 wre+0.256
Neg logit contributions
 she-0.121
on-0.117
pron-0.114
 butter-0.108
 her-0.107
Token didn
Token ID1422
Feature activation-1.92e-03

Pos logit contributions
 villains+0.029
 risks+0.027
 warriors+0.026
 opponents+0.026
 wre+0.026
Neg logit contributions
pron-0.024
 butter-0.019
 comb-0.016
on-0.016
 teaspoon-0.016
Token't
Token ID470
Feature activation-1.52e+00

Pos logit contributions
 villains+0.000
 risks+0.000
 warriors+0.000
 wre+0.000
 opponents+0.000
Neg logit contributions
pron-0.000
 butter-0.000
 sugar-0.000
 basil-0.000
 slice-0.000
Token want
Token ID765
Feature activation-2.63e+00

Pos logit contributions
 villains+0.074
 opponents+0.071
 wre+0.069
 risks+0.068
 speeds+0.067
Neg logit contributions
pron-0.080
 butter-0.068
 comb-0.065
 brush-0.062
 bake-0.058
Token him
Token ID683
Feature activation-1.40e+00

Pos logit contributions
 villains+0.182
 slaves+0.174
 runners+0.170
 prisoners+0.170
 opponents+0.168
Neg logit contributions
 butter-0.063
pron-0.056
 comb-0.043
 sugar-0.042
on-0.042
Token agreed
Token ID4987
Feature activation-7.30e-01

Pos logit contributions
 villains+0.074
 risks+0.068
 unnecessary+0.066
 warriors+0.065
 wre+0.064
Neg logit contributions
pron-0.056
 butter-0.046
 comb-0.040
on-0.039
 teaspoon-0.038
Token and
Token ID290
Feature activation-1.40e+00

Pos logit contributions
 villains+0.047
 risks+0.043
 warriors+0.042
 wre+0.042
 opponents+0.042
Neg logit contributions
pron-0.031
 butter-0.025
 comb-0.020
on-0.020
 sugar-0.020
Token they
Token ID484
Feature activation-1.82e+00

Pos logit contributions
 villains+0.087
 risks+0.081
 unnecessary+0.078
 wre+0.076
 opponents+0.076
Neg logit contributions
pron-0.059
 butter-0.051
 comb-0.043
on-0.042
 sugar-0.041
Token both
Token ID1111
Feature activation-2.02e+00

Pos logit contributions
 villains+0.107
 risks+0.102
 unnecessary+0.100
 opponents+0.094
 warriors+0.094
Neg logit contributions
pron-0.081
 butter-0.069
 comb-0.064
on-0.062
 brush-0.059
Token started
Token ID2067
Feature activation-3.12e+00

Pos logit contributions
 villains+0.116
 risks+0.112
 unnecessary+0.112
 opponents+0.103
 warriors+0.103
Neg logit contributions
pron-0.090
 butter-0.075
 comb-0.070
on-0.070
 brush-0.067
Token running
Token ID2491
Feature activation-4.58e+00

Pos logit contributions
 villains+0.208
 risks+0.187
 unnecessary+0.185
 slaves+0.181
awks+0.181
Neg logit contributions
pron-0.094
 baking-0.089
 butter-0.080
 comb-0.076
 mixer-0.074
Token.
Token ID13
Feature activation-1.37e+00

Pos logit contributions
 villains+0.332
 unnecessary+0.316
 slaves+0.309
 prisoners+0.306
 enemy+0.305
Neg logit contributions
,-0.075
 her-0.064
 and-0.052
 in-0.043
 every-0.041
Token
Token ID198
Feature activation-2.27e-02

Pos logit contributions
 villains+0.088
 risks+0.080
 warriors+0.077
 unnecessary+0.077
 prisoners+0.077
Neg logit contributions
pron-0.056
 butter-0.045
on-0.042
 sugar-0.039
 basil-0.039
Token
Token ID198
Feature activation-4.05e-01

Pos logit contributions
 villains+0.001
 risks+0.001
 warriors+0.001
 wre+0.001
 opponents+0.001
Neg logit contributions
pron-0.001
 butter-0.001
 comb-0.001
 sugar-0.001
 teaspoon-0.001
TokenThe
Token ID464
Feature activation-5.21e-01

Pos logit contributions
 villains+0.025
 risks+0.022
 warriors+0.022
 wre+0.022
 opponents+0.022
Neg logit contributions
pron-0.019
 butter-0.015
on-0.013
 comb-0.012
 sugar-0.012
Token fox
Token ID21831
Feature activation-1.99e+00

Pos logit contributions
 villains+0.028
 risks+0.025
 wre+0.024
 warriors+0.024
 opponents+0.024
Neg logit contributions
pron-0.028
 butter-0.024
 comb-0.021
 basil-0.020
 sugar-0.020
Token march
Token ID9960
Feature activation-2.92e+00

Pos logit contributions
 villains+0.029
 risks+0.027
 warriors+0.026
 opponents+0.025
 wre+0.025
Neg logit contributions
pron-0.032
 butter-0.027
 comb-0.024
on-0.023
 sugar-0.023
Token like
Token ID588
Feature activation-1.19e+00

Pos logit contributions
 villains+0.189
 unnecessary+0.178
 drown+0.177
 prisoners+0.176
 slaves+0.173
Neg logit contributions
pron-0.073
 her-0.071
 butter-0.058
 brush-0.057
 comb-0.054
Token soldiers
Token ID5795
Feature activation-1.84e+00

Pos logit contributions
 villains+0.070
 risks+0.065
 opponents+0.064
 wre+0.062
 warriors+0.062
Neg logit contributions
pron-0.052
 butter-0.046
 sugar-0.038
 comb-0.038
 basil-0.036
Token and
Token ID290
Feature activation-2.24e+00

Pos logit contributions
 villains+0.112
 unnecessary+0.108
 risks+0.107
 wre+0.104
 opponents+0.104
Neg logit contributions
pron-0.070
 butter-0.063
 comb-0.054
on-0.051
 brush-0.049
Token make
Token ID787
Feature activation-3.54e+00

Pos logit contributions
 risks+0.101
 villains+0.099
 unnecessary+0.099
 victorious+0.095
 opponents+0.094
Neg logit contributions
pron-0.107
 butter-0.098
 comb-0.092
 brush-0.089
 bake-0.084
Token loud
Token ID7812
Feature activation-4.57e+00

Pos logit contributions
 villains+0.145
 prisoners+0.132
 warriors+0.129
 opponents+0.129
 risks+0.128
Neg logit contributions
 butter-0.178
pron-0.162
 toast-0.152
 cake-0.151
 flour-0.151
Token noises
Token ID26782
Feature activation-1.92e+00

Pos logit contributions
 villains+0.178
 prisoners+0.172
 unnecessary+0.165
 criminal+0.164
 slaves+0.162
Neg logit contributions
 butter-0.210
pron-0.188
 cake-0.186
 milk-0.178
 baking-0.176
Token.
Token ID13
Feature activation-6.49e-01

Pos logit contributions
 villains+0.115
 unnecessary+0.111
 risks+0.110
 opponents+0.108
 wre+0.107
Neg logit contributions
pron-0.076
 butter-0.068
on-0.055
 comb-0.054
 baking-0.053
Token They
Token ID1119
Feature activation-1.70e+00

Pos logit contributions
 villains+0.034
 risks+0.031
 warriors+0.030
 wre+0.030
 opponents+0.030
Neg logit contributions
pron-0.035
 butter-0.029
 sugar-0.025
on-0.025
 basil-0.025
Token marched
Token ID23558
Feature activation-2.58e+00

Pos logit contributions
 risks+0.105
 villains+0.104
 unnecessary+0.102
 opponents+0.097
 wre+0.096
Neg logit contributions
pron-0.068
 butter-0.054
 comb-0.051
 brush-0.048
on-0.045
Token on
Token ID319
Feature activation-1.68e+00

Pos logit contributions
 villains+0.170
 unnecessary+0.165
 risks+0.164
 wre+0.152
 drown+0.152
Neg logit contributions
pron-0.085
 her-0.058
 butter-0.056
 brush-0.051
 comb-0.051
Token louder
Token ID27089
Feature activation-2.39e+00

Pos logit contributions
 villains+0.038
 risks+0.033
 opponents+0.033
 warriors+0.032
 wre+0.031
Neg logit contributions
pron-0.046
 butter-0.041
 sugar-0.036
 comb-0.036
 basil-0.035
Token and
Token ID290
Feature activation-7.08e-01

Pos logit contributions
 villains+0.163
 unnecessary+0.150
 prisoners+0.149
 risks+0.149
 opponents+0.149
Neg logit contributions
pron-0.066
 butter-0.058
 baking-0.048
 egg-0.047
 sugar-0.045
Token soon
Token ID2582
Feature activation-1.29e+00

Pos logit contributions
 villains+0.039
 risks+0.035
 opponents+0.034
 warriors+0.034
 wre+0.033
Neg logit contributions
pron-0.036
 butter-0.031
 sugar-0.026
 comb-0.026
 basil-0.026
Token everyone
Token ID2506
Feature activation-2.43e+00

Pos logit contributions
 villains+0.082
 opponents+0.074
 risks+0.073
 wre+0.073
 runners+0.072
Neg logit contributions
pron-0.053
 butter-0.047
 sugar-0.038
on-0.037
 comb-0.036
Token started
Token ID2067
Feature activation-3.11e+00

Pos logit contributions
 villains+0.164
 risks+0.158
 unnecessary+0.158
 opponents+0.156
 warriors+0.148
Neg logit contributions
pron-0.072
on-0.057
 butter-0.055
 boiled-0.051
 teaspoon-0.051
Token running
Token ID2491
Feature activation-4.54e+00

Pos logit contributions
 villains+0.241
 slaves+0.222
 prisoners+0.222
 risks+0.217
 opponents+0.216
Neg logit contributions
pron-0.051
 baking-0.048
 her-0.043
 comb-0.040
 brushing-0.040
Token,
Token ID11
Feature activation-1.54e+00

Pos logit contributions
 villains+0.332
 opponents+0.327
 prisoners+0.317
 slaves+0.317
 unnecessary+0.317
Neg logit contributions
,-0.067
 her-0.065
 and-0.049
 in-0.046
 with-0.039
Token screaming
Token ID14788
Feature activation-2.33e+00

Pos logit contributions
 villains+0.095
 risks+0.087
 opponents+0.084
 unnecessary+0.084
 prisoners+0.082
Neg logit contributions
pron-0.067
 butter-0.055
 comb-0.047
 baking-0.047
 sugar-0.046
Token,
Token ID11
Feature activation-1.17e+00

Pos logit contributions
 villains+0.149
 risks+0.135
 prisoners+0.133
 unnecessary+0.133
 opponents+0.132
Neg logit contributions
pron-0.085
 butter-0.071
 baking-0.059
 comb-0.058
 mug-0.057
Token and
Token ID290
Feature activation-8.21e-01

Pos logit contributions
 villains+0.070
 risks+0.064
 unnecessary+0.061
 warriors+0.061
 opponents+0.061
Neg logit contributions
pron-0.055
 butter-0.044
 comb-0.039
 baking-0.037
 sugar-0.037
Token pan
Token ID3425
Feature activation-1.56e+00

Pos logit contributions
 villains+0.050
 risks+0.045
 warriors+0.045
 opponents+0.044
 wre+0.044
Neg logit contributions
pron-0.036
 butter-0.030
 sugar-0.025
 comb-0.025
 basil-0.025
Token really
Token ID1107
Feature activation-3.78e+00

Pos logit contributions
 unnecessary+0.325
 villains+0.320
 prisoners+0.300
 risks+0.299
 slaves+0.295
Neg logit contributions
 her-0.120
,-0.117
 and-0.101
 with-0.089
 she-0.085
Token fast
Token ID3049
Feature activation-3.88e+00

Pos logit contributions
 villains+0.228
 unnecessary+0.216
 risks+0.215
 endings+0.213
 habitats+0.212
Neg logit contributions
pron-0.092
 her-0.086
 she-0.085
 butter-0.085
 cherry-0.084
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
 villains+0.278
 unnecessary+0.257
 prisoners+0.254
 slaves+0.251
 risks+0.249
Neg logit contributions
 butter-0.061
 comb-0.060
pron-0.059
 her-0.052
 baking-0.051
Token He
Token ID679
Feature activation-3.02e+00

Pos logit contributions
 villains+0.076
 risks+0.074
 unnecessary+0.072
 prisoners+0.068
 wre+0.067
Neg logit contributions
pron-0.101
 butter-0.085
on-0.085
 basil-0.081
 sugar-0.079
Token was
Token ID373
Feature activation-2.65e+00

Pos logit contributions
 risks+0.190
 unnecessary+0.190
 villains+0.187
 prisoners+0.173
 decisions+0.170
Neg logit contributions
pron-0.093
 comb-0.078
 butter-0.074
on-0.070
 brush-0.067
Token running
Token ID2491
Feature activation-4.53e+00

Pos logit contributions
 villains+0.160
 risks+0.145
dies+0.144
 opponents+0.140
 unnecessary+0.137
Neg logit contributions
pron-0.105
 butter-0.087
 comb-0.084
 baking-0.083
 brush-0.081
Token so
Token ID523
Feature activation-3.13e+00

Pos logit contributions
 villains+0.302
 unnecessary+0.298
 risks+0.284
 prisoners+0.276
 slaves+0.273
Neg logit contributions
 her-0.098
,-0.094
 and-0.078
 she-0.076
pron-0.076
Token fast
Token ID3049
Feature activation-3.45e+00

Pos logit contributions
 villains+0.184
 risks+0.178
 enemy+0.176
 prisoners+0.174
 situations+0.174
Neg logit contributions
 cherry-0.077
 she-0.076
on-0.074
 her-0.073
pron-0.073
Token that
Token ID326
Feature activation-1.61e+00

Pos logit contributions
 villains+0.271
 unnecessary+0.249
 prisoners+0.247
 risks+0.246
 opponents+0.245
Neg logit contributions
pron-0.054
 butter-0.049
on-0.036
 she-0.033
 comb-0.032
Token he
Token ID339
Feature activation-2.94e+00

Pos logit contributions
 villains+0.092
 risks+0.090
 unnecessary+0.087
 prisoners+0.086
 wre+0.085
Neg logit contributions
pron-0.069
 butter-0.059
on-0.057
 comb-0.048
 sugar-0.048
Token couldn
Token ID3521
Feature activation-2.92e-01

Pos logit contributions
 risks+0.181
 villains+0.179
 unnecessary+0.176
 prisoners+0.167
minded+0.166
Neg logit contributions
pron-0.105
 butter-0.083
 comb-0.076
on-0.073
 spoon-0.069
Token cheering
Token ID29732
Feature activation-1.90e+00

Pos logit contributions
 villains+0.121
 risks+0.104
 unnecessary+0.100
 warriors+0.100
dies+0.100
Neg logit contributions
pron-0.100
 butter-0.083
 baking-0.075
 comb-0.073
 brush-0.070
Token and
Token ID290
Feature activation-2.16e+00

Pos logit contributions
 villains+0.122
 risks+0.109
 wre+0.108
 prisoners+0.106
 unnecessary+0.105
Neg logit contributions
pron-0.077
 butter-0.063
on-0.050
 comb-0.050
 baking-0.048
Token someone
Token ID2130
Feature activation-2.12e+00

Pos logit contributions
 villains+0.114
 risks+0.102
 unnecessary+0.096
 prisoners+0.096
 warriors+0.095
Neg logit contributions
pron-0.105
 butter-0.092
 comb-0.084
 baking-0.083
 sugar-0.080
Token else
Token ID2073
Feature activation-2.04e+00

Pos logit contributions
 villains+0.124
 risks+0.115
 opponents+0.112
 warriors+0.111
 unnecessary+0.111
Neg logit contributions
pron-0.093
 butter-0.077
 comb-0.072
on-0.066
 brush-0.065
Token started
Token ID2067
Feature activation-3.30e+00

Pos logit contributions
 villains+0.119
 risks+0.111
 unnecessary+0.108
 wre+0.106
 warriors+0.106
Neg logit contributions
pron-0.092
 butter-0.074
on-0.066
 comb-0.065
 teaspoon-0.062
Token running
Token ID2491
Feature activation-4.45e+00

Pos logit contributions
 villains+0.221
 prisoners+0.197
 risks+0.194
 unnecessary+0.194
 slaves+0.192
Neg logit contributions
pron-0.099
 baking-0.085
 butter-0.081
 comb-0.079
 brushing-0.073
Token faster
Token ID5443
Feature activation-3.62e+00

Pos logit contributions
 villains+0.292
 prisoners+0.270
 unnecessary+0.265
 slaves+0.264
 warriors+0.257
Neg logit contributions
 her-0.098
pron-0.095
,-0.083
 comb-0.081
 butter-0.081
Token.
Token ID13
Feature activation-2.17e+00

Pos logit contributions
 villains+0.250
 prisoners+0.228
 unnecessary+0.222
 War+0.219
 slaves+0.218
Neg logit contributions
pron-0.081
 butter-0.074
 comb-0.062
 her-0.059
 brush-0.057
Token It
Token ID632
Feature activation-2.87e+00

Pos logit contributions
 villains+0.115
 wre+0.099
 risks+0.098
 prisoners+0.097
 warriors+0.096
Neg logit contributions
pron-0.111
 butter-0.091
 mug-0.079
on-0.079
 basil-0.079
Token was
Token ID373
Feature activation-1.63e+00

Pos logit contributions
 villains+0.190
 risks+0.180
 unnecessary+0.180
 warriors+0.173
 opponents+0.173
Neg logit contributions
pron-0.090
 butter-0.070
 comb-0.067
 brush-0.058
on-0.057
Token Sarah
Token ID10490
Feature activation-1.41e+00

Pos logit contributions
 villains+0.097
 risks+0.085
 warriors+0.084
 opponents+0.084
 wre+0.082
Neg logit contributions
pron-0.075
 butter-0.062
 baking-0.053
 comb-0.053
 sugar-0.053
Token\n
Token ID198
Feature activation-2.12e-01

Pos logit contributions
 villains+0.079
 risks+0.074
 unnecessary+0.072
 prisoners+0.067
 wre+0.067
Neg logit contributions
pron-0.074
 butter-0.063
on-0.059
 mixer-0.058
 sugar-0.057
Token\n
Token ID198
Feature activation-8.14e-01

Pos logit contributions
 villains+0.011
 risks+0.010
 warriors+0.010
 wre+0.010
 opponents+0.010
Neg logit contributions
pron-0.011
 butter-0.009
 comb-0.008
on-0.008
 sugar-0.008
TokenThey
Token ID2990
Feature activation-1.16e+00

Pos logit contributions
 villains+0.047
 risks+0.043
 warriors+0.042
 unnecessary+0.042
 wre+0.042
Neg logit contributions
pron-0.039
 butter-0.031
on-0.027
 comb-0.026
 sugar-0.026
Token all
Token ID477
Feature activation-1.89e+00

Pos logit contributions
 villains+0.066
 risks+0.062
 unnecessary+0.061
 opponents+0.058
 wre+0.057
Neg logit contributions
pron-0.056
 butter-0.047
 comb-0.042
on-0.041
 brush-0.039
Token started
Token ID2067
Feature activation-2.89e+00

Pos logit contributions
 villains+0.111
 risks+0.104
 unnecessary+0.104
 opponents+0.099
 warriors+0.096
Neg logit contributions
pron-0.084
 butter-0.070
 comb-0.066
on-0.065
 brush-0.064
Token running
Token ID2491
Feature activation-4.41e+00

Pos logit contributions
 villains+0.215
 risks+0.199
 unnecessary+0.197
 prisoners+0.191
 slaves+0.188
Neg logit contributions
 baking-0.056
pron-0.055
 butter-0.048
 comb-0.043
 mixer-0.042
Token up
Token ID510
Feature activation-2.30e+00

Pos logit contributions
 villains+0.353
 unnecessary+0.348
 slaves+0.326
 prisoners+0.326
 enemy+0.324
Neg logit contributions
,-0.066
 and-0.047
 her-0.035
 while-0.025
 in-0.024
Token the
Token ID262
Feature activation-1.47e+00

Pos logit contributions
 villains+0.182
 unnecessary+0.171
 risks+0.167
 opponents+0.163
 prisoners+0.163
Neg logit contributions
pron-0.043
 butter-0.037
 baking-0.033
 sugar-0.031
 teaspoon-0.029
Token volcano
Token ID32410
Feature activation-2.27e+00

Pos logit contributions
 villains+0.084
 risks+0.077
 unnecessary+0.076
 opponents+0.073
 warriors+0.071
Neg logit contributions
pron-0.070
 butter-0.059
 teaspoon-0.055
 mixer-0.054
 spoon-0.053
Token as
Token ID355
Feature activation-1.97e+00

Pos logit contributions
 villains+0.158
 unnecessary+0.145
 risks+0.143
 prisoners+0.139
 War+0.138
Neg logit contributions
pron-0.071
 butter-0.061
 baking-0.051
 brush-0.051
 teaspoon-0.050
Token fast
Token ID3049
Feature activation-2.89e+00

Pos logit contributions
 villains+0.101
 risks+0.100
 unnecessary+0.094
 opponents+0.093
 wre+0.090
Neg logit contributions
pron-0.096
 butter-0.085
 sugar-0.073
on-0.073
 flour-0.072
Token Tim
Token ID5045
Feature activation-1.49e+00

Pos logit contributions
 villains+0.068
 risks+0.063
 warriors+0.060
 opponents+0.059
 unnecessary+0.059
Neg logit contributions
pron-0.053
 butter-0.045
on-0.038
 sugar-0.038
 comb-0.037
Token and
Token ID290
Feature activation-1.22e+00

Pos logit contributions
 villains+0.095
 risks+0.090
 unnecessary+0.088
 warriors+0.083
 wre+0.082
Neg logit contributions
pron-0.063
 butter-0.051
 comb-0.046
on-0.045
 brush-0.041
Token Max
Token ID5436
Feature activation-1.93e+00

Pos logit contributions
 villains+0.069
 risks+0.065
 unnecessary+0.061
 warriors+0.061
 wre+0.061
Neg logit contributions
pron-0.058
 butter-0.049
 comb-0.044
on-0.043
 mug-0.041
Token started
Token ID2067
Feature activation-2.60e+00

Pos logit contributions
 villains+0.114
 risks+0.110
 unnecessary+0.110
 opponents+0.101
 warriors+0.101
Neg logit contributions
pron-0.080
 butter-0.070
 comb-0.066
on-0.060
 brush-0.059
Token to
Token ID284
Feature activation-1.79e+00

Pos logit contributions
 villains+0.197
 risks+0.184
 unnecessary+0.180
 prisoners+0.179
 warriors+0.174
Neg logit contributions
pron-0.061
 baking-0.052
 comb-0.046
 butter-0.046
 brushing-0.041
Token run
Token ID1057
Feature activation-4.39e+00

Pos logit contributions
 villains+0.092
 risks+0.083
 unnecessary+0.080
 prisoners+0.079
 warriors+0.079
Neg logit contributions
pron-0.094
 butter-0.081
 comb-0.080
 brush-0.076
 bake-0.074
Token.
Token ID13
Feature activation-1.96e+00

Pos logit contributions
 villains+0.318
 slaves+0.303
 unnecessary+0.303
 prisoners+0.299
 enemy+0.295
Neg logit contributions
,-0.079
 her-0.059
 and-0.056
 with-0.045
 while-0.042
Token They
Token ID1119
Feature activation-1.97e+00

Pos logit contributions
 villains+0.106
 risks+0.106
 unnecessary+0.102
 prisoners+0.096
 warriors+0.095
Neg logit contributions
pron-0.090
 butter-0.070
 milk-0.068
 sugar-0.066
 baking-0.065
Token both
Token ID1111
Feature activation-2.50e+00

Pos logit contributions
 risks+0.118
 villains+0.118
 unnecessary+0.116
 opponents+0.109
 warriors+0.107
Neg logit contributions
pron-0.076
 butter-0.065
 comb-0.064
 brush-0.059
on-0.058
Token ran
Token ID4966
Feature activation-4.18e+00

Pos logit contributions
 villains+0.153
 risks+0.152
 unnecessary+0.150
 opponents+0.141
dies+0.140
Neg logit contributions
pron-0.086
 butter-0.073
 comb-0.072
 brush-0.070
on-0.070
Token very
Token ID845
Feature activation-2.94e+00

Pos logit contributions
 villains+0.338
 unnecessary+0.321
 slaves+0.311
 risks+0.310
 enemy+0.308
Neg logit contributions
,-0.051
 her-0.030
 and-0.029
 she-0.014
 while-0.013
Token with
Token ID351
Feature activation-1.16e+00

Pos logit contributions
 villains+0.149
 unnecessary+0.142
 risks+0.140
 brakes+0.135
 War+0.134
Neg logit contributions
pron-0.053
 butter-0.047
 baking-0.040
 cake-0.035
on-0.035
Token their
Token ID511
Feature activation-2.34e+00

Pos logit contributions
 villains+0.073
 warriors+0.065
 risks+0.064
 runners+0.063
 opponents+0.063
Neg logit contributions
pron-0.048
 butter-0.043
 comb-0.035
 sugar-0.035
 mug-0.034
Token skill
Token ID5032
Feature activation-1.29e+00

Pos logit contributions
 villains+0.129
 risks+0.112
 wre+0.108
 unnecessary+0.108
 War+0.107
Neg logit contributions
pron-0.109
 butter-0.094
 slice-0.089
 brush-0.085
 cake-0.084
Token and
Token ID290
Feature activation-1.54e+00

Pos logit contributions
 villains+0.077
 risks+0.070
 unnecessary+0.068
 wre+0.068
 warriors+0.067
Neg logit contributions
pron-0.060
 butter-0.049
on-0.044
 comb-0.042
 baking-0.040
Token started
Token ID2067
Feature activation-1.75e+00

Pos logit contributions
 villains+0.084
 risks+0.080
 unnecessary+0.076
 wre+0.076
 warriors+0.075
Neg logit contributions
pron-0.073
 butter-0.065
 comb-0.056
on-0.055
 brush-0.055
Token running
Token ID2491
Feature activation-4.38e+00

Pos logit contributions
 villains+0.114
 risks+0.103
 prisoners+0.100
 warriors+0.100
 unnecessary+0.098
Neg logit contributions
pron-0.069
 butter-0.060
on-0.052
 comb-0.051
 baking-0.051
Token around
Token ID1088
Feature activation-2.49e+00

Pos logit contributions
 villains+0.294
 unnecessary+0.284
 prisoners+0.280
 slaves+0.276
 enemy+0.272
Neg logit contributions
 her-0.093
,-0.081
 butter-0.064
 in-0.062
 with-0.062
Token the
Token ID262
Feature activation-2.53e+00

Pos logit contributions
 villains+0.169
 risks+0.160
 unnecessary+0.159
 prisoners+0.151
 brakes+0.150
Neg logit contributions
pron-0.075
 baking-0.064
 butter-0.061
on-0.055
 teaspoon-0.051
Token map
Token ID3975
Feature activation-1.42e+00

Pos logit contributions
 villains+0.152
 unnecessary+0.142
 risks+0.139
 drown+0.133
 prisoners+0.131
Neg logit contributions
pron-0.100
 tomato-0.081
 butter-0.080
 cake-0.078
on-0.076
Token.
Token ID13
Feature activation-6.82e-01

Pos logit contributions
 villains+0.081
 risks+0.074
 unnecessary+0.072
 wre+0.070
 warriors+0.070
Neg logit contributions
pron-0.070
 butter-0.059
on-0.051
 comb-0.050
 teaspoon-0.049
Token Each
Token ID5501
Feature activation-1.02e+00

Pos logit contributions
 villains+0.034
 risks+0.030
 warriors+0.029
 wre+0.029
 opponents+0.028
Neg logit contributions
pron-0.039
 butter-0.033
on-0.029
 basil-0.029
 sugar-0.029
Token race
Token ID3234
Feature activation-2.56e+00

Pos logit contributions
 villains+0.077
 unnecessary+0.074
 risks+0.074
 wre+0.072
 warriors+0.070
Neg logit contributions
pron-0.066
 butter-0.055
 comb-0.051
 teaspoon-0.051
on-0.049
Token began
Token ID2540
Feature activation-2.97e+00

Pos logit contributions
 villains+0.151
 risks+0.146
 unnecessary+0.145
 wre+0.137
 warriors+0.134
Neg logit contributions
pron-0.108
 butter-0.088
on-0.084
 comb-0.083
 brush-0.077
Token.
Token ID13
Feature activation-9.10e-01

Pos logit contributions
 villains+0.203
 unnecessary+0.188
 risks+0.188
 prisoners+0.180
 slaves+0.180
Neg logit contributions
pron-0.082
 baking-0.069
 butter-0.065
 comb-0.063
on-0.060
Token Jack
Token ID3619
Feature activation-1.77e+00

Pos logit contributions
 villains+0.052
 risks+0.047
 wre+0.045
 unnecessary+0.045
 warriors+0.045
Neg logit contributions
pron-0.045
 butter-0.037
on-0.033
 sugar-0.032
 basil-0.032
Token started
Token ID2067
Feature activation-2.81e+00

Pos logit contributions
 villains+0.105
 risks+0.098
 unnecessary+0.096
 wre+0.091
 warriors+0.089
Neg logit contributions
pron-0.080
 butter-0.066
 comb-0.060
on-0.060
 brush-0.055
Token running
Token ID2491
Feature activation-4.33e+00

Pos logit contributions
 villains+0.177
 risks+0.164
 unnecessary+0.163
 prisoners+0.156
 slaves+0.155
Neg logit contributions
pron-0.096
 comb-0.080
 baking-0.079
 butter-0.079
 brush-0.073
Token as
Token ID355
Feature activation-2.41e+00

Pos logit contributions
 villains+0.305
 unnecessary+0.297
 slaves+0.283
 prisoners+0.281
 risks+0.277
Neg logit contributions
 her-0.085
,-0.081
 she-0.062
 and-0.062
pron-0.055
Token fast
Token ID3049
Feature activation-3.17e+00

Pos logit contributions
 villains+0.122
 risks+0.121
 unnecessary+0.116
 prisoners+0.111
 wre+0.109
Neg logit contributions
pron-0.118
 butter-0.102
on-0.093
 sugar-0.091
 teaspoon-0.089
Token as
Token ID355
Feature activation-1.84e+00

Pos logit contributions
 villains+0.229
 slaves+0.216
 wre+0.210
 prisoners+0.207
 enemies+0.207
Neg logit contributions
 butter-0.071
pron-0.067
 comb-0.061
 sugar-0.054
 brush-0.052
Token he
Token ID339
Feature activation-2.75e+00

Pos logit contributions
 villains+0.106
 risks+0.100
 unnecessary+0.096
 wre+0.095
 prisoners+0.094
Neg logit contributions
pron-0.083
 butter-0.071
on-0.065
 sugar-0.062
 teaspoon-0.057
Token could
Token ID714
Feature activation-2.48e+00

Pos logit contributions
 villains+0.194
 risks+0.193
 unnecessary+0.190
 wre+0.180
 prisoners+0.176
Neg logit contributions
pron-0.068
 butter-0.054
on-0.053
 spoon-0.044
 comb-0.043
Token.
Token ID13
Feature activation-2.49e+00

Pos logit contributions
 villains+0.079
 risks+0.073
 unnecessary+0.071
 warriors+0.069
 wre+0.068
Neg logit contributions
pron-0.062
 butter-0.049
on-0.044
 baking-0.043
 slice-0.042
Token Then
Token ID3244
Feature activation-2.50e+00

Pos logit contributions
 villains+0.129
 risks+0.121
 unnecessary+0.116
 prisoners+0.115
 wre+0.114
Neg logit contributions
pron-0.113
 butter-0.095
 mug-0.089
 mixer-0.088
 milk-0.087
Token they
Token ID484
Feature activation-1.56e+00

Pos logit contributions
 villains+0.183
 wre+0.166
 risks+0.166
 slaves+0.165
 unnecessary+0.164
Neg logit contributions
pron-0.059
 butter-0.054
 comb-0.041
on-0.036
 baking-0.034
Token both
Token ID1111
Feature activation-2.41e+00

Pos logit contributions
 villains+0.090
 risks+0.086
 unnecessary+0.084
 wre+0.078
 opponents+0.078
Neg logit contributions
pron-0.072
 butter-0.062
 comb-0.054
on-0.053
 slice-0.051
Token started
Token ID2067
Feature activation-2.85e+00

Pos logit contributions
 villains+0.141
 risks+0.135
 unnecessary+0.134
 prisoners+0.122
dies+0.121
Neg logit contributions
pron-0.102
 butter-0.087
 comb-0.082
on-0.080
 brush-0.079
Token running
Token ID2491
Feature activation-4.33e+00

Pos logit contributions
 villains+0.192
 risks+0.176
 unnecessary+0.176
 prisoners+0.167
 slaves+0.164
Neg logit contributions
pron-0.087
 baking-0.083
 butter-0.073
 mixer-0.071
 comb-0.070
Token.
Token ID13
Feature activation-1.98e+00

Pos logit contributions
 villains+0.328
 unnecessary+0.318
 prisoners+0.301
 risks+0.296
 slaves+0.294
Neg logit contributions
,-0.062
 her-0.051
 and-0.047
 brush-0.034
pron-0.032
Token They
Token ID1119
Feature activation-2.13e+00

Pos logit contributions
 villains+0.101
 risks+0.091
 wre+0.088
 prisoners+0.087
 unnecessary+0.087
Neg logit contributions
pron-0.098
 butter-0.082
on-0.076
 mug-0.075
 mixer-0.074
Token ran
Token ID4966
Feature activation-4.11e+00

Pos logit contributions
 villains+0.125
 risks+0.123
 unnecessary+0.122
 opponents+0.111
 wre+0.110
Neg logit contributions
pron-0.087
 butter-0.075
 comb-0.072
 brush-0.068
on-0.067
Token so
Token ID523
Feature activation-2.65e+00

Pos logit contributions
 villains+0.311
 unnecessary+0.294
 risks+0.283
 slaves+0.279
 enemy+0.277
Neg logit contributions
,-0.061
 her-0.051
 and-0.043
pron-0.041
 brush-0.037
Token fast
Token ID3049
Feature activation-3.55e+00

Pos logit contributions
 villains+0.141
 prisoners+0.135
 risks+0.135
 warriors+0.134
 enemy+0.133
Neg logit contributions
pron-0.097
 butter-0.089
 cherry-0.083
on-0.082
 comb-0.075
Token raced
Token ID32897
Feature activation-3.01e+00

Pos logit contributions
 villains+0.074
 risks+0.070
 unnecessary+0.067
 warriors+0.066
 wre+0.065
Neg logit contributions
pron-0.061
 butter-0.052
 comb-0.047
on-0.043
 brush-0.043
Token and
Token ID290
Feature activation-1.99e+00

Pos logit contributions
 villains+0.212
 unnecessary+0.189
 risks+0.187
 prisoners+0.185
 warriors+0.184
Neg logit contributions
pron-0.079
 her-0.077
 comb-0.069
on-0.063
 butter-0.062
Token they
Token ID484
Feature activation-2.20e+00

Pos logit contributions
 villains+0.111
 risks+0.110
 unnecessary+0.104
 wre+0.099
 warriors+0.098
Neg logit contributions
pron-0.091
 butter-0.076
 comb-0.074
on-0.070
 brush-0.069
Token were
Token ID547
Feature activation-1.63e+00

Pos logit contributions
 villains+0.136
 risks+0.134
 unnecessary+0.130
 warriors+0.123
 opponents+0.122
Neg logit contributions
pron-0.087
 butter-0.072
 comb-0.070
on-0.066
 brush-0.064
Token both
Token ID1111
Feature activation-2.72e+00

Pos logit contributions
 villains+0.098
 risks+0.089
dies+0.087
 warriors+0.086
 opponents+0.085
Neg logit contributions
pron-0.073
 butter-0.062
 comb-0.055
 baking-0.055
 brush-0.052
Token running
Token ID2491
Feature activation-4.33e+00

Pos logit contributions
 villains+0.174
dies+0.166
 risks+0.165
 unnecessary+0.157
 warriors+0.156
Neg logit contributions
pron-0.093
 baking-0.078
 butter-0.073
 brush-0.071
 comb-0.071
Token very
Token ID845
Feature activation-3.17e+00

Pos logit contributions
 villains+0.301
 unnecessary+0.283
 enemy+0.278
 slaves+0.274
 prisoners+0.272
Neg logit contributions
,-0.083
 her-0.077
 and-0.059
 in-0.056
 with-0.054
Token fast
Token ID3049
Feature activation-3.47e+00

Pos logit contributions
 villains+0.176
 risks+0.168
dies+0.162
 warriors+0.162
 unnecessary+0.159
Neg logit contributions
pron-0.108
 butter-0.091
 comb-0.088
 brush-0.085
 cherry-0.084
Token.
Token ID13
Feature activation-1.01e+00

Pos logit contributions
 villains+0.229
 unnecessary+0.207
 risks+0.207
 War+0.206
dies+0.205
Neg logit contributions
pron-0.085
 butter-0.084
 comb-0.076
 baking-0.072
 brush-0.062
Token But
Token ID887
Feature activation+1.13e-01

Pos logit contributions
 villains+0.054
 risks+0.049
 warriors+0.047
 wre+0.047
 unnecessary+0.046
Neg logit contributions
pron-0.054
 butter-0.043
on-0.040
 sugar-0.039
 baking-0.038
Token in
Token ID287
Feature activation-6.74e-01

Pos logit contributions
pron+0.006
 butter+0.005
 basil+0.004
 teaspoon+0.004
on+0.004
Neg logit contributions
 villains-0.006
 risks-0.006
 warriors-0.006
 wre-0.006
 opponents-0.006
Token racing
Token ID11717
Feature activation-2.54e+00

Pos logit contributions
 villains+0.079
 risks+0.072
 warriors+0.070
 prisoners+0.070
 opponents+0.070
Neg logit contributions
pron-0.051
 butter-0.044
 sugar-0.036
on-0.035
 comb-0.035
Token.
Token ID13
Feature activation-2.20e+00

Pos logit contributions
 villains+0.157
 prisoners+0.147
 unnecessary+0.147
 risks+0.144
 wre+0.143
Neg logit contributions
pron-0.085
 butter-0.072
on-0.064
 comb-0.063
 mug-0.057
Token They
Token ID1119
Feature activation-1.59e+00

Pos logit contributions
 villains+0.114
 risks+0.109
 unnecessary+0.108
 prisoners+0.104
 wre+0.101
Neg logit contributions
pron-0.107
 butter-0.092
 basil-0.088
on-0.087
 sugar-0.085
Token both
Token ID1111
Feature activation-2.27e+00

Pos logit contributions
 villains+0.088
 risks+0.085
 unnecessary+0.083
 wre+0.077
 opponents+0.076
Neg logit contributions
pron-0.076
 butter-0.066
 comb-0.061
 brush-0.057
on-0.057
Token started
Token ID2067
Feature activation-3.40e+00

Pos logit contributions
 villains+0.135
 risks+0.131
 unnecessary+0.130
 wre+0.120
 prisoners+0.119
Neg logit contributions
pron-0.094
 butter-0.078
 comb-0.074
on-0.073
 brush-0.071
Token running
Token ID2491
Feature activation-4.33e+00

Pos logit contributions
 villains+0.233
 risks+0.222
 unnecessary+0.220
 prisoners+0.213
 slaves+0.212
Neg logit contributions
pron-0.073
 baking-0.073
 comb-0.065
 brushing-0.064
 butter-0.061
Token as
Token ID355
Feature activation-2.27e+00

Pos logit contributions
 villains+0.336
 unnecessary+0.333
 prisoners+0.319
 slaves+0.319
 risks+0.313
Neg logit contributions
,-0.059
 and-0.043
 her-0.035
 every-0.020
 while-0.019
Token fast
Token ID3049
Feature activation-3.37e+00

Pos logit contributions
 risks+0.114
 villains+0.113
 unnecessary+0.107
 prisoners+0.102
 opponents+0.099
Neg logit contributions
pron-0.115
 butter-0.100
on-0.088
 sugar-0.087
 flour-0.084
Token as
Token ID355
Feature activation-1.98e+00

Pos logit contributions
 slaves+0.238
 villains+0.237
 prisoners+0.227
 wre+0.223
 unnecessary+0.219
Neg logit contributions
 butter-0.068
pron-0.054
 comb-0.052
 sugar-0.043
 brush-0.039
Token they
Token ID484
Feature activation-3.27e+00

Pos logit contributions
 villains+0.119
 risks+0.116
 unnecessary+0.111
 prisoners+0.108
 wre+0.108
Neg logit contributions
pron-0.079
 butter-0.071
on-0.062
 sugar-0.058
 basil-0.055
Token could
Token ID714
Feature activation-2.57e+00

Pos logit contributions
 unnecessary+0.232
 risks+0.232
 villains+0.225
 decisions+0.216
 prisoners+0.215
Neg logit contributions
pron-0.066
on-0.061
 butter-0.057
 comb-0.052
 brush-0.045
Token race
Token ID3234
Feature activation-2.45e+00

Pos logit contributions
 villains+0.067
 risks+0.061
 wre+0.059
 unnecessary+0.059
 warriors+0.058
Neg logit contributions
pron-0.075
 butter-0.062
 comb-0.055
 teaspoon-0.054
 sugar-0.054
Token started
Token ID2067
Feature activation-2.68e+00

Pos logit contributions
 villains+0.148
 risks+0.144
 unnecessary+0.143
 wre+0.133
 warriors+0.131
Neg logit contributions
pron-0.102
 butter-0.079
on-0.075
 comb-0.071
 mixer-0.068
Token and
Token ID290
Feature activation-1.57e+00

Pos logit contributions
 villains+0.181
 risks+0.168
 unnecessary+0.168
 prisoners+0.158
 slaves+0.155
Neg logit contributions
pron-0.081
 butter-0.066
 baking-0.064
on-0.057
 comb-0.057
Token everyone
Token ID2506
Feature activation-2.21e+00

Pos logit contributions
 villains+0.095
 risks+0.089
 unnecessary+0.084
 wre+0.083
 opponents+0.083
Neg logit contributions
pron-0.067
 butter-0.056
 sugar-0.048
 comb-0.047
on-0.047
Token was
Token ID373
Feature activation-1.68e+00

Pos logit contributions
 villains+0.141
 risks+0.138
 unnecessary+0.138
 opponents+0.127
 warriors+0.126
Neg logit contributions
pron-0.085
on-0.065
 butter-0.064
 comb-0.060
 teaspoon-0.058
Token running
Token ID2491
Feature activation-4.32e+00

Pos logit contributions
 villains+0.092
 risks+0.083
 unnecessary+0.079
 opponents+0.078
dies+0.078
Neg logit contributions
pron-0.087
 butter-0.071
 baking-0.066
 comb-0.064
on-0.063
Token very
Token ID845
Feature activation-2.53e+00

Pos logit contributions
 villains+0.295
 unnecessary+0.276
 risks+0.264
 prisoners+0.259
 slaves+0.253
Neg logit contributions
pron-0.088
 her-0.082
,-0.079
 butter-0.066
 baking-0.066
Token fast
Token ID3049
Feature activation-3.22e+00

Pos logit contributions
 villains+0.140
 risks+0.131
 unnecessary+0.124
dies+0.122
 warriors+0.120
Neg logit contributions
pron-0.109
 butter-0.088
 teaspoon-0.082
 comb-0.080
 baking-0.080
Token.
Token ID13
Feature activation-1.38e+00

Pos logit contributions
 villains+0.220
 unnecessary+0.197
 risks+0.195
 wre+0.193
 prisoners+0.190
Neg logit contributions
pron-0.093
 butter-0.081
 comb-0.072
 baking-0.070
 brushing-0.062
Token Suddenly
Token ID24975
Feature activation-1.25e+00

Pos logit contributions
 villains+0.068
 risks+0.062
 wre+0.059
 unnecessary+0.058
 warriors+0.058
Neg logit contributions
pron-0.077
 butter-0.064
 basil-0.059
 sugar-0.058
on-0.057
Token,
Token ID11
Feature activation-1.43e+00

Pos logit contributions
 villains+0.088
 risks+0.080
 opponents+0.079
 wre+0.079
 unnecessary+0.078
Neg logit contributions
pron-0.042
 butter-0.035
 sugar-0.027
on-0.026
 teaspoon-0.026
Token race
Token ID3234
Feature activation-2.86e+00

Pos logit contributions
 villains+0.064
 risks+0.054
 warriors+0.054
 opponents+0.053
 War+0.053
Neg logit contributions
pron-0.083
 butter-0.072
 comb-0.065
 teaspoon-0.064
 brush-0.063
Token.
Token ID13
Feature activation-1.33e+00

Pos logit contributions
 villains+0.187
 unnecessary+0.180
 risks+0.175
 wre+0.163
 prisoners+0.162
Neg logit contributions
pron-0.095
 butter-0.075
on-0.073
 mixer-0.068
 comb-0.068
Token Then
Token ID3244
Feature activation-1.75e+00

Pos logit contributions
 villains+0.071
 risks+0.065
 warriors+0.062
 unnecessary+0.061
 wre+0.061
Neg logit contributions
pron-0.070
 butter-0.058
 sugar-0.052
on-0.052
 slice-0.051
Token they
Token ID484
Feature activation-1.90e+00

Pos logit contributions
 villains+0.133
 warriors+0.121
 risks+0.121
 opponents+0.120
 wre+0.120
Neg logit contributions
pron-0.044
 butter-0.041
 sugar-0.029
 comb-0.028
on-0.026
Token started
Token ID2067
Feature activation-3.07e+00

Pos logit contributions
 villains+0.112
 risks+0.107
 unnecessary+0.106
 War+0.097
 wre+0.096
Neg logit contributions
pron-0.082
 butter-0.071
 comb-0.063
on-0.063
 brush-0.060
Token running
Token ID2491
Feature activation-4.32e+00

Pos logit contributions
 villains+0.223
 risks+0.204
 unnecessary+0.203
 prisoners+0.195
 slaves+0.193
Neg logit contributions
pron-0.069
 baking-0.068
 butter-0.058
 mixer-0.055
 comb-0.054
Token as
Token ID355
Feature activation-2.22e+00

Pos logit contributions
 villains+0.344
 unnecessary+0.331
 prisoners+0.311
 risks+0.310
 enemy+0.309
Neg logit contributions
,-0.057
 her-0.045
 and-0.041
 she-0.022
 in-0.020
Token fast
Token ID3049
Feature activation-3.28e+00

Pos logit contributions
 villains+0.114
 risks+0.110
 unnecessary+0.103
 opponents+0.102
 prisoners+0.100
Neg logit contributions
pron-0.109
 butter-0.097
 sugar-0.086
 flour-0.083
on-0.081
Token as
Token ID355
Feature activation-1.74e+00

Pos logit contributions
 villains+0.244
 slaves+0.233
 wre+0.222
 prisoners+0.221
 enemies+0.220
Neg logit contributions
 butter-0.062
pron-0.053
 comb-0.050
 sugar-0.043
 brush-0.039
Token they
Token ID484
Feature activation-2.76e+00

Pos logit contributions
 villains+0.116
 risks+0.110
 opponents+0.105
 wre+0.105
 warriors+0.104
Neg logit contributions
pron-0.060
 butter-0.053
 sugar-0.042
on-0.040
 flour-0.039
Token could
Token ID714
Feature activation-2.71e+00

Pos logit contributions
 risks+0.197
 villains+0.197
 unnecessary+0.194
 prisoners+0.185
 decisions+0.185
Neg logit contributions
pron-0.059
on-0.049
 butter-0.049
 comb-0.039
 brush-0.035
Token race
Token ID3234
Feature activation-2.13e+00

Pos logit contributions
 villains+0.075
 risks+0.069
 unnecessary+0.067
 wre+0.067
 loses+0.066
Neg logit contributions
pron-0.080
 butter-0.068
 comb-0.060
 brush-0.059
 sugar-0.059
Token started
Token ID2067
Feature activation-2.98e+00

Pos logit contributions
 villains+0.135
 risks+0.133
 unnecessary+0.132
 wre+0.125
 opponents+0.122
Neg logit contributions
pron-0.079
 butter-0.062
on-0.058
 comb-0.054
 mixer-0.049
Token and
Token ID290
Feature activation-1.36e+00

Pos logit contributions
 villains+0.215
 risks+0.204
 unnecessary+0.202
 prisoners+0.196
 slaves+0.193
Neg logit contributions
pron-0.057
 baking-0.048
 butter-0.047
 comb-0.043
 brushing-0.042
Token the
Token ID262
Feature activation-2.01e+00

Pos logit contributions
 villains+0.087
 risks+0.083
 opponents+0.079
 unnecessary+0.078
 warriors+0.078
Neg logit contributions
pron-0.051
 butter-0.044
 sugar-0.036
 comb-0.035
on-0.035
Token rabbit
Token ID22746
Feature activation-2.13e+00

Pos logit contributions
 villains+0.109
 risks+0.102
ks+0.102
 wre+0.101
 unnecessary+0.100
Neg logit contributions
pron-0.089
 butter-0.080
 comb-0.070
 tomato-0.068
 brush-0.067
Token ran
Token ID4966
Feature activation-4.30e+00

Pos logit contributions
 villains+0.132
 risks+0.128
 unnecessary+0.127
 warriors+0.119
 wre+0.119
Neg logit contributions
pron-0.084
 butter-0.069
 comb-0.063
on-0.059
 brush-0.055
Token very
Token ID845
Feature activation-3.15e+00

Pos logit contributions
 villains+0.282
 unnecessary+0.270
 slaves+0.264
 prisoners+0.260
 risks+0.259
Neg logit contributions
 her-0.099
,-0.090
pron-0.083
 comb-0.069
h-0.069
Token fast
Token ID3049
Feature activation-3.78e+00

Pos logit contributions
 villains+0.174
 risks+0.173
 unnecessary+0.165
 warriors+0.161
dies+0.159
Neg logit contributions
pron-0.114
 butter-0.091
 comb-0.091
 brush-0.088
 teaspoon-0.088
Token.
Token ID13
Feature activation-6.56e-01

Pos logit contributions
 villains+0.262
 unnecessary+0.251
 risks+0.244
 prisoners+0.243
 slaves+0.241
Neg logit contributions
pron-0.060
 comb-0.057
 butter-0.057
 baking-0.049
 her-0.049
Token But
Token ID887
Feature activation-1.33e-01

Pos logit contributions
 villains+0.035
 risks+0.032
 warriors+0.031
 unnecessary+0.030
 wre+0.030
Neg logit contributions
pron-0.035
 butter-0.029
on-0.026
 sugar-0.026
 basil-0.025
Token then
Token ID788
Feature activation-5.72e-01

Pos logit contributions
 villains+0.008
 risks+0.007
 warriors+0.007
 wre+0.007
 opponents+0.007
Neg logit contributions
pron-0.006
 butter-0.005
 teaspoon-0.004
 comb-0.004
on-0.004
Token was
Token ID373
Feature activation-1.22e+00

Pos logit contributions
 villains+0.105
 risks+0.099
 unnecessary+0.096
 opponents+0.096
 wre+0.096
Neg logit contributions
pron-0.053
 butter-0.044
on-0.037
 comb-0.036
 teaspoon-0.031
Token going
Token ID1016
Feature activation-2.34e+00

Pos logit contributions
 villains+0.079
 risks+0.072
 warriors+0.070
 opponents+0.070
 wre+0.070
Neg logit contributions
pron-0.049
 butter-0.040
 comb-0.033
on-0.033
 baking-0.031
Token and
Token ID290
Feature activation-1.56e+00

Pos logit contributions
 villains+0.179
 risks+0.159
 unnecessary+0.157
 prisoners+0.156
 wre+0.155
Neg logit contributions
pron-0.058
 butter-0.047
on-0.038
 baking-0.034
 comb-0.033
Token started
Token ID2067
Feature activation-2.67e+00

Pos logit contributions
 villains+0.091
 risks+0.086
 unnecessary+0.082
 opponents+0.081
 warriors+0.079
Neg logit contributions
pron-0.068
 butter-0.060
 comb-0.051
on-0.050
 sugar-0.050
Token to
Token ID284
Feature activation-1.86e+00

Pos logit contributions
 villains+0.185
 risks+0.169
 prisoners+0.168
 unnecessary+0.166
 warriors+0.164
Neg logit contributions
pron-0.077
 butter-0.063
 comb-0.061
 baking-0.060
 brushing-0.054
Token run
Token ID1057
Feature activation-4.30e+00

Pos logit contributions
 villains+0.096
 risks+0.086
 prisoners+0.085
 warriors+0.084
 unnecessary+0.084
Neg logit contributions
pron-0.093
 butter-0.083
 comb-0.081
 brush-0.075
 bake-0.072
Token too
Token ID1165
Feature activation-3.35e+00

Pos logit contributions
 villains+0.252
 slaves+0.239
 prisoners+0.235
 opponents+0.233
 unnecessary+0.229
Neg logit contributions
 her-0.115
,-0.108
 comb-0.094
h-0.092
 butter-0.090
Token.
Token ID13
Feature activation-2.09e+00

Pos logit contributions
 villains+0.218
 opponents+0.202
 wre+0.195
 prisoners+0.195
 warriors+0.194
Neg logit contributions
 butter-0.081
pron-0.079
on-0.071
 she-0.068
 sugar-0.068
Token They
Token ID1119
Feature activation-1.64e+00

Pos logit contributions
 villains+0.101
 risks+0.097
 unnecessary+0.094
 prisoners+0.087
 wre+0.086
Neg logit contributions
pron-0.102
 butter-0.089
 basil-0.084
 mixer-0.083
 sugar-0.083
Token were
Token ID547
Feature activation-2.18e+00

Pos logit contributions
 villains+0.093
 risks+0.090
 unnecessary+0.088
 opponents+0.084
 wre+0.083
Neg logit contributions
pron-0.071
 butter-0.062
 comb-0.058
on-0.055
 brush-0.053
Token very
Token ID845
Feature activation-2.47e+00

Pos logit contributions
 villains+0.124
 risks+0.112
dies+0.111
 opponents+0.108
 unnecessary+0.106
Neg logit contributions
pron-0.095
 butter-0.084
 comb-0.076
 baking-0.073
 brush-0.070
Token It
Token ID632
Feature activation-8.23e-01

Pos logit contributions
 villains+0.033
 risks+0.031
 warriors+0.029
 wre+0.029
 opponents+0.029
Neg logit contributions
pron-0.031
 butter-0.026
on-0.022
 sugar-0.022
 basil-0.022
Token chased
Token ID26172
Feature activation-1.46e+00

Pos logit contributions
 villains+0.050
 risks+0.047
 opponents+0.046
 warriors+0.045
 unnecessary+0.045
Neg logit contributions
pron-0.036
 butter-0.029
 comb-0.025
on-0.025
 teaspoon-0.024
Token them
Token ID606
Feature activation-2.53e+00

Pos logit contributions
 villains+0.097
 wre+0.087
 opponents+0.087
 warriors+0.087
 prisoners+0.085
Neg logit contributions
pron-0.050
 butter-0.046
 comb-0.037
 sugar-0.037
on-0.036
Token and
Token ID290
Feature activation-1.71e+00

Pos logit contributions
 villains+0.177
 warriors+0.163
 opponents+0.163
 risks+0.161
 unnecessary+0.161
Neg logit contributions
pron-0.062
 butter-0.055
 brush-0.052
 comb-0.051
on-0.050
Token they
Token ID484
Feature activation-2.28e+00

Pos logit contributions
 villains+0.091
 risks+0.088
 opponents+0.085
 unnecessary+0.085
 runners+0.082
Neg logit contributions
pron-0.080
 butter-0.069
 comb-0.063
 brush-0.060
on-0.060
Token ran
Token ID4966
Feature activation-4.29e+00

Pos logit contributions
 risks+0.129
 villains+0.128
 unnecessary+0.126
 opponents+0.125
 War+0.119
Neg logit contributions
pron-0.086
 butter-0.078
 comb-0.078
 brush-0.072
on-0.069
Token away
Token ID1497
Feature activation-2.85e+00

Pos logit contributions
 villains+0.283
 opponents+0.276
 slaves+0.261
 unnecessary+0.261
 enemy+0.260
Neg logit contributions
 her-0.095
,-0.088
 in-0.069
 gently-0.067
 and-0.067
Token.
Token ID13
Feature activation-8.16e-01

Pos logit contributions
 villains+0.199
 unnecessary+0.191
 opponents+0.188
 risks+0.186
 War+0.185
Neg logit contributions
pron-0.053
 teaspoon-0.051
 baking-0.050
 butter-0.048
 brush-0.047
Token Lily
Token ID20037
Feature activation-1.46e+00

Pos logit contributions
 villains+0.045
 risks+0.041
 warriors+0.040
 wre+0.039
 opponents+0.039
Neg logit contributions
pron-0.042
 butter-0.034
 basil-0.030
 sugar-0.030
 teaspoon-0.029
Token was
Token ID373
Feature activation-2.02e+00

Pos logit contributions
 villains+0.095
 risks+0.091
 unnecessary+0.088
 opponents+0.086
 wre+0.086
Neg logit contributions
pron-0.056
 butter-0.045
 comb-0.040
on-0.039
 brush-0.036
Token sad
Token ID6507
Feature activation-1.53e+00

Pos logit contributions
 villains+0.112
 opponents+0.108
dies+0.107
 risks+0.102
 runners+0.101
Neg logit contributions
pron-0.083
 butter-0.073
 comb-0.070
 brush-0.067
 baking-0.065
TokenShe
Token ID3347
Feature activation-2.89e-01

Pos logit contributions
pron+0.021
 butter+0.019
 teaspoon+0.016
 egg+0.016
 sugar+0.016
Neg logit contributions
 villains-0.022
 risks-0.019
 warriors-0.018
 speeds-0.018
 War-0.018
Token ran
Token ID4966
Feature activation-1.89e+00

Pos logit contributions
 villains+0.015
 risks+0.014
 warriors+0.013
 wre+0.013
 opponents+0.013
Neg logit contributions
pron-0.016
 butter-0.013
 comb-0.011
on-0.011
 sugar-0.011
Token to
Token ID284
Feature activation+4.28e-01

Pos logit contributions
 villains+0.128
 opponents+0.117
 risks+0.114
 unnecessary+0.112
 wre+0.112
Neg logit contributions
pron-0.064
 butter-0.052
 comb-0.046
 brush-0.045
 sugar-0.045
Token her
Token ID607
Feature activation+7.82e-02

Pos logit contributions
pron+0.022
 butter+0.018
 basil+0.015
on+0.015
 slice+0.014
Neg logit contributions
 villains-0.024
 risks-0.022
 wre-0.022
 speeds-0.021
 warriors-0.021
Token mom
Token ID1995
Feature activation+2.97e+00

Pos logit contributions
pron+0.004
 butter+0.003
on+0.002
 basil+0.002
 sugar+0.002
Neg logit contributions
 villains-0.005
 risks-0.004
 warriors-0.004
 wre-0.004
 opponents-0.004
Tokenmy
Token ID1820
Feature activation+2.82e+00

Pos logit contributions
pron+0.211
 teaspoon+0.169
 butter+0.166
kin+0.166
 panc+0.157
Neg logit contributions
 wre-0.072
 drives-0.072
minded-0.069
 shouts-0.067
 warriors-0.067
Token and
Token ID290
Feature activation+2.16e+00

Pos logit contributions
pron+0.193
 butter+0.153
 teaspoon+0.150
mm+0.147
 icing+0.147
Neg logit contributions
 warriors-0.074
 wre-0.068
 villains-0.066
 drives-0.066
 shouts-0.065
Token said
Token ID531
Feature activation+1.50e+00

Pos logit contributions
pron+0.130
 butter+0.112
 basil+0.107
ool+0.107
 teaspoon+0.104
Neg logit contributions
 villains-0.070
 shouts-0.069
 warriors-0.068
 wre-0.066
 prisoners-0.065
Token,
Token ID11
Feature activation+2.45e+00

Pos logit contributions
pron+0.089
 butter+0.069
 slice+0.064
 basil+0.062
 mug+0.062
Neg logit contributions
 villains-0.071
 risks-0.062
 warriors-0.061
 War-0.061
 runners-0.059
Token "
Token ID366
Feature activation+1.03e+00

Pos logit contributions
pron+0.156
 butter+0.142
 batter+0.127
 slice+0.125
 toast+0.124
Neg logit contributions
 War-0.081
 villains-0.080
 warriors-0.071
 shouts-0.067
 brakes-0.067
TokenMom
Token ID29252
Feature activation+2.63e+00

Pos logit contributions
pron+0.044
 butter+0.036
 teaspoon+0.033
on+0.031
 basil+0.031
Neg logit contributions
 villains-0.059
 risks-0.055
 War-0.053
 wre-0.053
 warriors-0.052
Token\n
Token ID198
Feature activation+1.52e+00

Pos logit contributions
pron+0.088
 butter+0.077
 comb+0.068
 egg+0.067
 basil+0.067
Neg logit contributions
 villains-0.056
 War-0.049
 risks-0.049
 speeds-0.048
 wre-0.047
Token\n
Token ID198
Feature activation+1.16e+00

Pos logit contributions
pron+0.100
 butter+0.090
 egg+0.081
 basil+0.080
 comb+0.079
Neg logit contributions
 villains-0.055
 War-0.046
 speeds-0.046
 wre-0.045
 risks-0.045
TokenL
Token ID43
Feature activation+6.33e-01

Pos logit contributions
pron+0.065
 butter+0.061
 egg+0.057
 teaspoon+0.054
on+0.054
Neg logit contributions
 villains-0.050
 War-0.043
 speeds-0.042
 risks-0.041
 runners-0.041
Tokenily
Token ID813
Feature activation+2.03e+00

Pos logit contributions
pron+0.036
 butter+0.031
 comb+0.026
 egg+0.026
 basil+0.026
Neg logit contributions
 villains-0.031
 risks-0.027
 warriors-0.026
 wre-0.026
 speeds-0.026
Token said
Token ID531
Feature activation+2.56e+00

Pos logit contributions
pron+0.121
 butter+0.102
 basil+0.093
 sugar+0.093
 panc+0.091
Neg logit contributions
 villains-0.080
 speeds-0.073
 warriors-0.072
 wre-0.070
 opponents-0.070
Token,
Token ID11
Feature activation+2.80e+00

Pos logit contributions
pron+0.159
 egg+0.128
 butter+0.127
 basil+0.127
 bacon+0.126
Neg logit contributions
 villains-0.087
 War-0.069
 warriors-0.069
 opponents-0.067
 enemies-0.066
Token "
Token ID366
Feature activation+1.02e+00

Pos logit contributions
pron+0.179
 butter+0.170
 egg+0.159
 batter+0.156
ool+0.156
Neg logit contributions
 War-0.075
 villains-0.075
 warriors-0.066
 opponents-0.062
 enemies-0.055
TokenOkay
Token ID16454
Feature activation+8.87e-01

Pos logit contributions
pron+0.055
 butter+0.049
 teaspoon+0.047
on+0.045
 egg+0.045
Neg logit contributions
 villains-0.045
 risks-0.040
 War-0.040
 warriors-0.038
 opponents-0.038
Token,
Token ID11
Feature activation+2.02e+00

Pos logit contributions
pron+0.043
 butter+0.036
 basil+0.032
on+0.031
 teaspoon+0.030
Neg logit contributions
 villains-0.047
 risks-0.043
 opponents-0.043
 warriors-0.042
 speeds-0.041
Token Mom
Token ID11254
Feature activation+7.59e-01

Pos logit contributions
pron+0.108
 butter+0.084
on+0.083
 baked+0.081
ons+0.080
Neg logit contributions
 villains-0.093
 warriors-0.079
 enemies-0.078
 risks-0.078
 opponents-0.077
Tokenmy
Token ID1820
Feature activation-9.04e-01

Pos logit contributions
pron+0.035
 butter+0.028
 basil+0.025
 teaspoon+0.024
 egg+0.023
Neg logit contributions
 villains-0.043
 risks-0.039
 warriors-0.039
 opponents-0.038
 wre-0.038
Token\n
Token ID198
Feature activation+1.43e+00

Pos logit contributions
pron+0.051
 butter+0.044
 comb+0.039
 basil+0.038
 teaspoon+0.038
Neg logit contributions
 villains-0.040
 risks-0.035
 wre-0.034
 War-0.034
 speeds-0.033
Token\n
Token ID198
Feature activation+1.26e+00

Pos logit contributions
pron+0.093
 butter+0.084
 basil+0.075
 egg+0.073
 teaspoon+0.073
Neg logit contributions
 villains-0.053
 War-0.046
 wre-0.045
 risks-0.045
 speeds-0.044
TokenHis
Token ID6653
Feature activation+3.23e-01

Pos logit contributions
pron+0.073
 butter+0.072
 teaspoon+0.064
 basil+0.063
 egg+0.063
Neg logit contributions
 villains-0.050
 War-0.046
 wre-0.043
 speeds-0.042
 opponents-0.041
Token mom
Token ID1995
Feature activation+1.95e+00

Pos logit contributions
pron+0.015
 butter+0.012
 basil+0.011
on+0.010
 sugar+0.010
Neg logit contributions
 villains-0.019
 warriors-0.017
 opponents-0.017
 risks-0.017
 wre-0.017
Token said
Token ID531
Feature activation+2.33e+00

Pos logit contributions
pron+0.123
 butter+0.099
 basil+0.099
 panc+0.096
 teaspoon+0.093
Neg logit contributions
 villains-0.064
 warriors-0.061
 shouts-0.060
 wre-0.060
 prisoners-0.058
Token,
Token ID11
Feature activation+2.86e+00

Pos logit contributions
pron+0.149
 batter+0.114
 mug+0.112
 butter+0.111
 basil+0.110
Neg logit contributions
 villains-0.088
 warriors-0.072
 War-0.069
 enemies-0.066
 risks-0.064
Token "
Token ID366
Feature activation+2.02e-01

Pos logit contributions
pron+0.195
 butter+0.168
 slice+0.162
 batter+0.160
 icing+0.157
Neg logit contributions
 villains-0.079
 War-0.076
 warriors-0.072
 runners-0.063
 brakes-0.061
TokenSure
Token ID19457
Feature activation-6.44e-01

Pos logit contributions
pron+0.012
 butter+0.010
 teaspoon+0.009
on+0.009
 comb+0.009
Neg logit contributions
 villains-0.009
 risks-0.008
 warriors-0.008
 wre-0.008
 runners-0.008
Token,
Token ID11
Feature activation+2.23e-01

Pos logit contributions
 villains+0.045
 risks+0.041
 warriors+0.041
 opponents+0.041
 wre+0.041
Neg logit contributions
pron-0.023
 butter-0.018
 sugar-0.014
on-0.014
 egg-0.014
Token Tim
Token ID5045
Feature activation+4.53e-01

Pos logit contributions
pron+0.013
 butter+0.010
 comb+0.009
 basil+0.009
on+0.009
Neg logit contributions
 villains-0.011
 risks-0.010
 wre-0.010
 warriors-0.010
 opponents-0.010
Tokenmy
Token ID1820
Feature activation-1.05e+00

Pos logit contributions
pron+0.020
 butter+0.016
 basil+0.014
 teaspoon+0.013
 panc+0.013
Neg logit contributions
 villains-0.027
 risks-0.025
 warriors-0.025
 wre-0.024
 opponents-0.024
Token day
Token ID1110
Feature activation+1.86e+00

Pos logit contributions
pron+0.093
 butter+0.086
 basil+0.077
 comb+0.075
 teaspoon+0.075
Neg logit contributions
 villains-0.047
 War-0.040
 wre-0.037
 speeds-0.037
 risks-0.037
Token,
Token ID11
Feature activation+2.04e+00

Pos logit contributions
pron+0.116
 butter+0.094
 egg+0.088
 basil+0.087
 recipe+0.087
Neg logit contributions
 villains-0.074
 risks-0.066
 War-0.064
 runners-0.063
 wre-0.062
Token she
Token ID673
Feature activation+5.67e-02

Pos logit contributions
pron+0.106
iflower+0.090
 egg+0.090
ool+0.089
 butter+0.089
Neg logit contributions
 villains-0.079
 War-0.072
 villain-0.068
 races-0.064
dies-0.064
Token spilled
Token ID31896
Feature activation+2.91e+00

Pos logit contributions
pron+0.003
 butter+0.003
 teaspoon+0.002
 basil+0.002
 sugar+0.002
Neg logit contributions
 villains-0.003
 risks-0.003
 warriors-0.003
 wre-0.003
 opponents-0.002
Token some
Token ID617
Feature activation+2.77e+00

Pos logit contributions
pron+0.178
isy+0.125
pp+0.125
comb+0.124
 hen+0.123
Neg logit contributions
 risks-0.112
 wre-0.111
 trouble-0.106
 War-0.102
 villains-0.098
Token juice
Token ID13135
Feature activation+3.02e+00

Pos logit contributions
pron+0.142
comb+0.104
pp+0.097
 slice+0.096
mmm+0.096
Neg logit contributions
 risks-0.131
 wre-0.126
 villains-0.124
 unnecessary-0.115
 War-0.114
Token on
Token ID319
Feature activation+3.30e+00

Pos logit contributions
pron+0.197
 basil+0.148
 bacon+0.148
innie+0.145
 hen+0.145
Neg logit contributions
 speeds-0.085
 risks-0.085
 villains-0.085
 wre-0.083
ashes-0.080
Token the
Token ID262
Feature activation+1.37e+00

Pos logit contributions
pron+0.225
pp+0.187
itten+0.182
ipop+0.177
gary+0.175
Neg logit contributions
 risks-0.069
 wre-0.069
 opposite-0.068
 speeds-0.065
 trouble-0.064
Token carpet
Token ID20710
Feature activation+2.45e+00

Pos logit contributions
pron+0.074
 butter+0.062
on+0.057
 slice+0.057
 sugar+0.053
Neg logit contributions
 villains-0.067
 wre-0.059
 warriors-0.056
 risks-0.055
 opponents-0.055
Token and
Token ID290
Feature activation+1.63e+00

Pos logit contributions
pron+0.155
 butter+0.126
 basil+0.120
 bacon+0.119
 panc+0.114
Neg logit contributions
 villains-0.081
 wre-0.074
 risks-0.074
 warriors-0.073
 speeds-0.069
Token didn
Token ID1422
Feature activation+1.35e+00

Pos logit contributions
pron+0.090
 recipe+0.065
 basil+0.065
itten+0.064
 butter+0.064
Neg logit contributions
 villains-0.075
 speeds-0.065
 risks-0.064
 wre-0.063
 loses-0.062
Token with
Token ID351
Feature activation+2.14e+00

Pos logit contributions
pron+0.136
 butter+0.114
 basil+0.105
 sugar+0.102
itten+0.101
Neg logit contributions
 villains-0.088
 risks-0.084
 wre-0.082
 enemies-0.079
 opponents-0.078
Token a
Token ID257
Feature activation+2.13e+00

Pos logit contributions
pron+0.112
on+0.082
itten+0.079
isy+0.077
 butter+0.076
Neg logit contributions
 villains-0.102
 risks-0.101
 wre-0.092
 unnecessary-0.092
 trouble-0.090
Token big
Token ID1263
Feature activation+2.06e+00

Pos logit contributions
pron+0.110
 butter+0.097
on+0.094
 basil+0.092
isy+0.087
Neg logit contributions
 villains-0.083
 risks-0.078
 prisoners-0.078
 wre-0.077
 enemies-0.073
Token bag
Token ID6131
Feature activation+2.06e+00

Pos logit contributions
pron+0.091
 butter+0.070
 basil+0.067
on+0.063
Her+0.063
Neg logit contributions
 villains-0.107
 risks-0.102
 warriors-0.102
 wre-0.101
 prisoners-0.101
Token.
Token ID13
Feature activation+9.88e-01

Pos logit contributions
pron+0.123
 butter+0.106
 basil+0.100
 teaspoon+0.097
 sugar+0.095
Neg logit contributions
 villains-0.078
 warriors-0.070
 risks-0.067
 wre-0.066
 runners-0.066
Token Inside
Token ID14384
Feature activation+2.70e+00

Pos logit contributions
pron+0.058
 butter+0.049
on+0.043
 comb+0.042
 egg+0.041
Neg logit contributions
 villains-0.048
 warriors-0.041
 speeds-0.041
 runners-0.041
 risks-0.041
Token was
Token ID373
Feature activation+1.93e+00

Pos logit contributions
pron+0.182
 butter+0.147
 basil+0.142
 comb+0.137
 slice+0.137
Neg logit contributions
 speeds-0.077
 wre-0.068
 marches-0.067
 runners-0.067
 races-0.065
Token something
Token ID1223
Feature activation+1.57e+00

Pos logit contributions
pron+0.124
 butter+0.096
on+0.095
itten+0.089
 slice+0.089
Neg logit contributions
 villains-0.077
 prisoners-0.067
 risks-0.067
 wre-0.065
 speeds-0.064
Token green
Token ID4077
Feature activation+1.56e+00

Pos logit contributions
pron+0.089
 butter+0.070
on+0.064
 slice+0.063
 basil+0.059
Neg logit contributions
 villains-0.075
 risks-0.069
 wre-0.066
 warriors-0.064
 enemies-0.064
Token and
Token ID290
Feature activation+2.66e+00

Pos logit contributions
pron+0.088
 butter+0.066
on+0.060
 slice+0.059
 basil+0.057
Neg logit contributions
 villains-0.076
 risks-0.071
 warriors-0.069
 speeds-0.067
 wre-0.067
Token fuzzy
Token ID34669
Feature activation+8.46e-01

Pos logit contributions
pron+0.142
 butter+0.113
on+0.110
ocolate+0.106
 slice+0.106
Neg logit contributions
 villains-0.123
 obstacles-0.105
 speeds-0.104
 risks-0.103
 warriors-0.103
Token\n
Token ID198
Feature activation+1.16e+00

Pos logit contributions
pron+0.032
 butter+0.027
 slice+0.024
 comb+0.023
 sugar+0.023
Neg logit contributions
 villains-0.026
 warriors-0.022
 risks-0.022
 opponents-0.022
 War-0.022
Token\n
Token ID198
Feature activation+9.96e-01

Pos logit contributions
pron+0.075
 butter+0.067
 comb+0.059
 egg+0.059
 sugar+0.058
Neg logit contributions
 villains-0.048
 War-0.040
 risks-0.039
 speeds-0.038
 warriors-0.038
Token"
Token ID1
Feature activation+1.30e+00

Pos logit contributions
pron+0.047
 butter+0.040
 icing+0.035
 slice+0.035
 sugar+0.035
Neg logit contributions
 villains-0.059
 warriors-0.051
 risks-0.050
 War-0.049
 opponents-0.049
TokenM
Token ID44
Feature activation+5.38e-01

Pos logit contributions
pron+0.083
 butter+0.068
 baking+0.066
 mixer+0.065
on+0.064
Neg logit contributions
 villains-0.053
 warriors-0.044
 risks-0.044
 War-0.043
Check-0.040
Tokenilo
Token ID18526
Feature activation-3.12e-01

Pos logit contributions
pron+0.028
 butter+0.022
 comb+0.020
 slice+0.019
 mixer+0.019
Neg logit contributions
 villains-0.030
 risks-0.027
 warriors-0.026
 opponents-0.026
 unnecessary-0.026
Token,
Token ID11
Feature activation+8.00e-01

Pos logit contributions
 villains+0.020
 risks+0.019
 warriors+0.018
 wre+0.018
 opponents+0.018
Neg logit contributions
pron-0.013
 butter-0.010
 comb-0.008
on-0.008
 sugar-0.008
Token look
Token ID804
Feature activation-4.18e-01

Pos logit contributions
pron+0.048
 butter+0.038
on+0.033
 panc+0.033
 slice+0.033
Neg logit contributions
 villains-0.038
 risks-0.034
 wre-0.033
 warriors-0.033
 prisoners-0.031
Token at
Token ID379
Feature activation+6.03e-01

Pos logit contributions
 villains+0.022
 risks+0.020
 warriors+0.020
 opponents+0.019
 wre+0.019
Neg logit contributions
pron-0.022
 butter-0.019
on-0.016
 sugar-0.016
 comb-0.016
Token this
Token ID428
Feature activation+1.46e+00

Pos logit contributions
pron+0.033
 butter+0.026
 slice+0.024
 sugar+0.022
 comb+0.022
Neg logit contributions
 villains-0.032
 risks-0.030
 wre-0.029
 warriors-0.029
 unnecessary-0.028
Token feather
Token ID17972
Feature activation+1.15e+00

Pos logit contributions
pron+0.074
 butter+0.053
 mixer+0.052
ocolate+0.052
 baking+0.052
Neg logit contributions
 villains-0.072
 risks-0.072
 speeds-0.068
 warriors-0.066
 enemy-0.065
Token!
Token ID0
Feature activation+3.46e-01

Pos logit contributions
pron+0.074
 butter+0.061
 basil+0.058
 sugar+0.057
 slice+0.056
Neg logit contributions
 villains-0.046
 risks-0.040
 warriors-0.039
 wre-0.038
 speeds-0.038
Token She
Token ID1375
Feature activation-3.21e-01

Pos logit contributions
 villains+0.032
 risks+0.029
 wre+0.028
 warriors+0.028
 opponents+0.028
Neg logit contributions
pron-0.022
 butter-0.018
 sugar-0.015
 basil-0.015
on-0.015
Token strapped
Token ID37625
Feature activation+1.31e+00

Pos logit contributions
 villains+0.017
 risks+0.016
 warriors+0.015
 wre+0.015
 opponents+0.015
Neg logit contributions
pron-0.017
 butter-0.014
 comb-0.012
on-0.012
 sugar-0.012
Token herself
Token ID5223
Feature activation+5.36e-01

Pos logit contributions
pron+0.077
 butter+0.060
 slice+0.056
 recipe+0.056
 basil+0.055
Neg logit contributions
 villains-0.059
 risks-0.056
 runners-0.054
 unnecessary-0.053
 warriors-0.052
Token in
Token ID287
Feature activation+1.08e+00

Pos logit contributions
pron+0.030
 butter+0.025
 basil+0.023
 sugar+0.022
 slice+0.022
Neg logit contributions
 villains-0.026
 risks-0.024
 wre-0.023
 speeds-0.023
 runners-0.023
Token,
Token ID11
Feature activation+9.34e-01

Pos logit contributions
pron+0.061
 butter+0.050
 basil+0.045
 teaspoon+0.043
iflower+0.042
Neg logit contributions
 villains-0.048
 risks-0.046
 speeds-0.045
 wre-0.043
 runners-0.043
Token and
Token ID290
Feature activation+5.79e-01

Pos logit contributions
pron+0.050
 butter+0.042
 teaspoon+0.038
on+0.037
 egg+0.037
Neg logit contributions
 villains-0.044
 warriors-0.041
 risks-0.039
 speeds-0.039
 opponents-0.039
Token the
Token ID262
Feature activation-1.81e+00

Pos logit contributions
pron+0.032
 butter+0.027
 basil+0.024
 teaspoon+0.024
 egg+0.024
Neg logit contributions
 villains-0.028
 warriors-0.025
 risks-0.025
 speeds-0.025
 wre-0.024
Token car
Token ID1097
Feature activation-6.94e-01

Pos logit contributions
 villains+0.095
 risks+0.091
 unnecessary+0.090
 warriors+0.088
 opponents+0.086
Neg logit contributions
pron-0.094
 butter-0.077
 comb-0.073
 brush-0.069
 mixer-0.067
Token zoom
Token ID19792
Feature activation-1.63e+00

Pos logit contributions
 villains+0.036
 risks+0.033
 warriors+0.031
 wre+0.031
 opponents+0.031
Neg logit contributions
pron-0.038
 butter-0.032
 comb-0.028
on-0.028
 sugar-0.027
Tokened
Token ID276
Feature activation-1.14e+00

Pos logit contributions
 villains+0.082
 risks+0.074
 wre+0.072
 warriors+0.071
 unnecessary+0.071
Neg logit contributions
pron-0.087
 butter-0.075
on-0.068
 comb-0.068
 sugar-0.063
Token around
Token ID1088
Feature activation-5.96e-01

Pos logit contributions
 villains+0.061
 risks+0.054
 warriors+0.053
 opponents+0.052
 wre+0.052
Neg logit contributions
pron-0.058
 butter-0.051
 comb-0.045
 sugar-0.044
on-0.044
Token.
Token ID13
Feature activation-2.21e-01

Pos logit contributions
 villains+0.011
 risks+0.010
 warriors+0.010
 wre+0.010
 opponents+0.010
Neg logit contributions
pron-0.012
 butter-0.010
 comb-0.008
 sugar-0.008
on-0.008
Token It
Token ID632
Feature activation-9.76e-01

Pos logit contributions
 villains+0.010
 risks+0.009
 warriors+0.009
 wre+0.008
 opponents+0.008
Neg logit contributions
pron-0.014
 butter-0.012
 comb-0.010
on-0.010
 sugar-0.010
Token was
Token ID373
Feature activation-1.36e-01

Pos logit contributions
 villains+0.064
 risks+0.060
 opponents+0.059
 warriors+0.058
 unnecessary+0.058
Neg logit contributions
pron-0.038
 butter-0.030
 comb-0.026
on-0.025
 egg-0.024
Token an
Token ID281
Feature activation-1.25e-01

Pos logit contributions
 villains+0.007
 risks+0.006
 warriors+0.006
 wre+0.006
 opponents+0.006
Neg logit contributions
pron-0.007
 butter-0.006
on-0.005
 comb-0.005
 teaspoon-0.005
Token ice
Token ID4771
Feature activation-7.87e-02

Pos logit contributions
 villains+0.006
 risks+0.005
 warriors+0.005
 wre+0.005
 opponents+0.005
Neg logit contributions
pron-0.007
 butter-0.006
on-0.005
 comb-0.005
 sugar-0.005
Token-
Token ID12
Feature activation+3.27e-01

Pos logit contributions
 villains+0.005
 risks+0.004
 warriors+0.004
 wre+0.004
 opponents+0.004
Neg logit contributions
pron-0.004
 butter-0.003
 basil-0.003
 comb-0.003
 sugar-0.003
Tokencream
Token ID36277
Feature activation+3.27e-01

Pos logit contributions
pron+0.018
 butter+0.015
 teaspoon+0.013
 basil+0.013
 comb+0.013
Neg logit contributions
 villains-0.017
 risks-0.015
 wre-0.015
 warriors-0.014
 War-0.014
Token van
Token ID5719
Feature activation-1.48e+00

Pos logit contributions
pron+0.019
 butter+0.016
 basil+0.014
 comb+0.014
 sugar+0.014
Neg logit contributions
 villains-0.016
 risks-0.014
 wre-0.014
 warriors-0.014
 runners-0.013
Token coming
Token ID2406
Feature activation-1.38e+00

Pos logit contributions
 villains+0.088
 risks+0.081
 unnecessary+0.078
 opponents+0.077
 warriors+0.077
Neg logit contributions
pron-0.070
 butter-0.058
on-0.053
 comb-0.051
 baking-0.049
Token down
Token ID866
Feature activation-1.26e+00

Pos logit contributions
 villains+0.091
 risks+0.082
 opponents+0.082
 warriors+0.080
 unnecessary+0.079
Neg logit contributions
pron-0.052
 butter-0.044
 egg-0.038
on-0.037
 slice-0.036
Token the
Token ID262
Feature activation-8.45e-01

Pos logit contributions
 villains+0.085
 risks+0.077
 opponents+0.076
 unnecessary+0.075
 prisoners+0.074
Neg logit contributions
pron-0.049
 butter-0.041
on-0.036
 sugar-0.034
 baking-0.033
Token have
Token ID423
Feature activation+9.42e-01

Pos logit contributions
 villains+0.045
 opponents+0.040
 warriors+0.040
 risks+0.040
 wre+0.038
Neg logit contributions
pron-0.046
 butter-0.041
 comb-0.035
 sugar-0.034
on-0.034
Token to
Token ID284
Feature activation+1.03e+00

Pos logit contributions
pron+0.053
 butter+0.041
on+0.040
 teaspoon+0.038
 ornament+0.038
Neg logit contributions
 villains-0.046
 risks-0.043
 wre-0.041
 warriors-0.041
 enemies-0.040
Token clean
Token ID3424
Feature activation+1.87e-02

Pos logit contributions
pron+0.060
 butter+0.048
on+0.045
 basil+0.044
 teaspoon+0.044
Neg logit contributions
 villains-0.048
 risks-0.044
 wre-0.043
 speeds-0.041
 warriors-0.040
Token up
Token ID510
Feature activation+1.16e+00

Pos logit contributions
pron+0.001
 butter+0.001
on+0.001
 comb+0.001
 slice+0.001
Neg logit contributions
 villains-0.001
 risks-0.001
 wre-0.001
 warriors-0.001
 opponents-0.001
Token the
Token ID262
Feature activation+2.03e-01

Pos logit contributions
pron+0.070
 butter+0.056
on+0.051
 comb+0.050
 teaspoon+0.050
Neg logit contributions
 villains-0.052
 risks-0.048
 wre-0.047
 speeds-0.045
 warriors-0.045
Token mess
Token ID2085
Feature activation+3.95e-01

Pos logit contributions
pron+0.012
 butter+0.010
on+0.009
 basil+0.008
 comb+0.008
Neg logit contributions
 villains-0.010
 wre-0.009
 risks-0.009
 warriors-0.008
 speeds-0.008
Token.
Token ID13
Feature activation+5.11e-03

Pos logit contributions
pron+0.022
 butter+0.018
 comb+0.016
 basil+0.016
 slice+0.016
Neg logit contributions
 villains-0.020
 risks-0.017
 warriors-0.017
 wre-0.017
 opponents-0.017
Token Sara
Token ID24799
Feature activation-5.85e-01

Pos logit contributions
pron+0.000
 butter+0.000
on+0.000
 comb+0.000
 slice+0.000
Neg logit contributions
 villains-0.000
 risks-0.000
 warriors-0.000
 wre-0.000
 opponents-0.000
Token and
Token ID290
Feature activation+1.99e-01

Pos logit contributions
 villains+0.037
 risks+0.034
 warriors+0.033
 opponents+0.032
 wre+0.032
Neg logit contributions
pron-0.025
 butter-0.021
 comb-0.017
 sugar-0.017
on-0.017
Token Ben
Token ID3932
Feature activation-1.05e+00

Pos logit contributions
pron+0.013
 butter+0.011
on+0.010
 teaspoon+0.010
 slice+0.010
Neg logit contributions
 villains-0.008
 warriors-0.007
 risks-0.007
 opponents-0.007
 wre-0.007
Token say
Token ID910
Feature activation+4.37e-01

Pos logit contributions
 villains+0.058
 warriors+0.051
 risks+0.051
 opponents+0.050
 unnecessary+0.049
Neg logit contributions
pron-0.054
 butter-0.047
 comb-0.042
 brush-0.041
 slice-0.040
Token,
Token ID11
Feature activation+9.65e-01

Pos logit contributions
pron+0.054
 butter+0.041
 basil+0.040
 teaspoon+0.037
 recipe+0.037
Neg logit contributions
 villains-0.049
 risks-0.048
 wre-0.048
 speeds-0.047
 enemies-0.046
Token it
Token ID340
Feature activation+2.73e-01

Pos logit contributions
pron+0.052
 butter+0.041
on+0.037
 basil+0.036
 panc+0.036
Neg logit contributions
 villains-0.051
 warriors-0.045
 risks-0.045
 wre-0.043
 prisoners-0.043
Token has
Token ID468
Feature activation+7.37e-01

Pos logit contributions
pron+0.017
 butter+0.014
 basil+0.012
 comb+0.012
on+0.012
Neg logit contributions
 villains-0.012
 risks-0.011
 wre-0.011
 warriors-0.011
 speeds-0.011
Token a
Token ID257
Feature activation+1.24e+00

Pos logit contributions
pron+0.040
 butter+0.031
on+0.028
 teaspoon+0.028
 basil+0.027
Neg logit contributions
 villains-0.039
 risks-0.036
 wre-0.034
 warriors-0.034
 speeds-0.033
Token hole
Token ID7604
Feature activation+1.22e+00

Pos logit contributions
pron+0.059
 butter+0.049
on+0.044
 basil+0.044
 sugar+0.041
Neg logit contributions
 villains-0.067
 risks-0.060
 wre-0.058
 enemies-0.057
 warriors-0.057
Token in
Token ID287
Feature activation+1.06e+00

Pos logit contributions
pron+0.067
 butter+0.056
 comb+0.049
 basil+0.048
on+0.047
Neg logit contributions
 villains-0.060
 warriors-0.055
 wre-0.054
 risks-0.053
 speeds-0.053
Token it
Token ID340
Feature activation+1.65e+00

Pos logit contributions
pron+0.061
 butter+0.048
 basil+0.042
 comb+0.042
on+0.041
Neg logit contributions
 villains-0.050
 wre-0.046
 risks-0.046
 speeds-0.045
 warriors-0.044
Token."
Token ID526
Feature activation-9.01e-01

Pos logit contributions
pron+0.112
 butter+0.091
 comb+0.083
 basil+0.083
 teaspoon+0.082
Neg logit contributions
 villains-0.063
 speeds-0.056
 risks-0.055
 warriors-0.055
 wre-0.053
Token
Token ID198
Feature activation-5.33e-01

Pos logit contributions
 villains+0.057
 risks+0.053
 unnecessary+0.051
 wre+0.051
 warriors+0.051
Neg logit contributions
pron-0.039
 butter-0.031
 sugar-0.027
on-0.027
 basil-0.026
Token
Token ID198
Feature activation-7.39e-01

Pos logit contributions
 villains+0.031
 risks+0.028
 warriors+0.027
 opponents+0.027
 wre+0.027
Neg logit contributions
pron-0.026
 butter-0.021
on-0.019
 sugar-0.018
 comb-0.018
TokenL
Token ID43
Feature activation+1.01e-01

Pos logit contributions
 villains+0.044
 risks+0.042
 warriors+0.040
 opponents+0.040
 wre+0.040
Neg logit contributions
pron-0.034
 butter-0.027
on-0.023
 sugar-0.023
 slice-0.023
Token yes
Token ID3763
Feature activation-8.97e-02

Pos logit contributions
pron+0.037
 butter+0.028
 comb+0.024
 teaspoon+0.024
on+0.024
Neg logit contributions
 villains-0.048
 warriors-0.044
 risks-0.043
 wre-0.042
 War-0.042
Token,
Token ID11
Feature activation+4.64e-01

Pos logit contributions
 villains+0.005
 risks+0.005
 warriors+0.005
 wre+0.005
 opponents+0.005
Neg logit contributions
pron-0.004
 butter-0.004
on-0.003
 comb-0.003
 basil-0.003
Token yes
Token ID3763
Feature activation-1.42e-01

Pos logit contributions
pron+0.022
 butter+0.017
 teaspoon+0.015
 comb+0.015
 basil+0.015
Neg logit contributions
 villains-0.028
 warriors-0.025
 risks-0.025
 wre-0.024
 opponents-0.024
Token!"
Token ID2474
Feature activation-1.09e+00

Pos logit contributions
 villains+0.008
 risks+0.007
 warriors+0.007
 wre+0.007
 opponents+0.007
Neg logit contributions
pron-0.007
 butter-0.006
on-0.005
 comb-0.005
 teaspoon-0.005
Token Tim
Token ID5045
Feature activation-1.14e+00

Pos logit contributions
 villains+0.067
 risks+0.064
 unnecessary+0.062
 opponents+0.060
 warriors+0.060
Neg logit contributions
pron-0.044
 butter-0.039
 sugar-0.035
on-0.033
 mixer-0.032
Token and
Token ID290
Feature activation-1.52e+00

Pos logit contributions
 villains+0.079
 risks+0.072
 unnecessary+0.071
 prisoners+0.069
 wre+0.069
Neg logit contributions
pron-0.041
 butter-0.036
 comb-0.029
on-0.028
 sugar-0.028
Token Sam
Token ID3409
Feature activation-1.47e+00

Pos logit contributions
 villains+0.083
 risks+0.081
 unnecessary+0.077
 wre+0.074
 prisoners+0.074
Neg logit contributions
pron-0.070
 butter-0.063
 comb-0.056
 brush-0.056
on-0.055
Token say
Token ID910
Feature activation-1.19e+00

Pos logit contributions
 villains+0.088
 unnecessary+0.079
 risks+0.078
 prisoners+0.076
 warriors+0.075
Neg logit contributions
pron-0.065
 butter-0.059
 comb-0.053
 brush-0.052
on-0.050
Token.
Token ID13
Feature activation-1.10e+00

Pos logit contributions
 villains+0.068
 risks+0.064
 wre+0.062
 unnecessary+0.062
 warriors+0.061
Neg logit contributions
pron-0.057
 butter-0.047
on-0.043
 slice-0.040
 comb-0.040
Token They
Token ID1119
Feature activation-1.00e+00

Pos logit contributions
 villains+0.052
 risks+0.050
 unnecessary+0.049
 prisoners+0.046
 wre+0.045
Neg logit contributions
pron-0.062
 butter-0.053
on-0.048
 mug-0.048
 basil-0.048
Token sit
Token ID1650
Feature activation+2.33e-01

Pos logit contributions
 villains+0.053
 risks+0.048
 unnecessary+0.046
 warriors+0.045
 opponents+0.045
Neg logit contributions
pron-0.053
 butter-0.046
 comb-0.042
 brush-0.041
 slice-0.040
Token was
Token ID373
Feature activation+4.24e-01

Pos logit contributions
pron+0.035
 butter+0.029
 basil+0.026
 slice+0.026
 teaspoon+0.026
Neg logit contributions
 villains-0.029
 risks-0.026
 warriors-0.026
 opponents-0.025
 wre-0.025
Token a
Token ID257
Feature activation-2.08e-01

Pos logit contributions
pron+0.028
 butter+0.024
 slice+0.022
on+0.022
 basil+0.021
Neg logit contributions
 villains-0.017
 risks-0.015
 warriors-0.015
 wre-0.014
 prisoners-0.014
Token little
Token ID1310
Feature activation-1.10e+00

Pos logit contributions
 villains+0.011
 risks+0.009
 warriors+0.009
 wre+0.009
 opponents+0.009
Neg logit contributions
pron-0.012
 butter-0.010
on-0.009
 comb-0.008
 sugar-0.008
Token clumsy
Token ID39120
Feature activation-1.43e+00

Pos logit contributions
 villains+0.047
 opponents+0.039
 risks+0.039
 wre+0.038
 speeds+0.037
Neg logit contributions
pron-0.071
 butter-0.064
 comb-0.058
 teaspoon-0.057
 sugar-0.057
Token and
Token ID290
Feature activation+9.30e-01

Pos logit contributions
 villains+0.081
 wre+0.075
 risks+0.074
 opponents+0.074
 unnecessary+0.072
Neg logit contributions
pron-0.069
 butter-0.060
 comb-0.053
 teaspoon-0.051
 sugar-0.050
Token she
Token ID673
Feature activation-8.50e-02

Pos logit contributions
pron+0.054
 butter+0.046
 slice+0.043
on+0.042
 panc+0.042
Neg logit contributions
 villains-0.042
 speeds-0.035
 warriors-0.035
 risks-0.035
 unnecessary-0.034
Token accidentally
Token ID14716
Feature activation+1.41e+00

Pos logit contributions
 villains+0.004
 risks+0.004
 warriors+0.004
 wre+0.004
 opponents+0.004
Neg logit contributions
pron-0.005
 butter-0.004
on-0.003
 sugar-0.003
 basil-0.003
Token dropped
Token ID5710
Feature activation+2.54e-01

Pos logit contributions
pron+0.089
 butter+0.071
 basil+0.067
 recipe+0.065
 panc+0.064
Neg logit contributions
 villains-0.059
 risks-0.052
 wre-0.051
 speeds-0.051
 warriors-0.048
Token the
Token ID262
Feature activation+1.06e-01

Pos logit contributions
pron+0.014
 butter+0.010
 slice+0.009
on+0.009
 basil+0.009
Neg logit contributions
 villains-0.014
 risks-0.013
 wre-0.012
 warriors-0.012
 speeds-0.012
Token box
Token ID3091
Feature activation+9.65e-01

Pos logit contributions
pron+0.005
 butter+0.004
 basil+0.003
 comb+0.003
on+0.003
Neg logit contributions
 villains-0.006
 risks-0.006
 wre-0.006
 warriors-0.006
 prisoners-0.005
Token.
Token ID13
Feature activation+1.41e+00

Pos logit contributions
pron+0.060
 butter+0.049
 basil+0.045
 bacon+0.043
 sugar+0.043
Neg logit contributions
 villains-0.040
 risks-0.037
 wre-0.036
 speeds-0.036
 warriors-0.035
Token one
Token ID530
Feature activation-4.24e-01

Pos logit contributions
 villains+0.059
 warriors+0.053
 risks+0.051
 opponents+0.051
 wre+0.051
Neg logit contributions
pron-0.072
 butter-0.064
 comb-0.058
 brush-0.055
 egg-0.055
Token else
Token ID2073
Feature activation-1.59e+00

Pos logit contributions
 villains+0.023
 risks+0.021
 warriors+0.021
 wre+0.020
 opponents+0.020
Neg logit contributions
pron-0.022
 butter-0.018
 comb-0.016
on-0.016
 teaspoon-0.015
Token in
Token ID287
Feature activation-1.37e+00

Pos logit contributions
 villains+0.094
 risks+0.087
 warriors+0.086
 opponents+0.085
 wre+0.084
Neg logit contributions
pron-0.071
 butter-0.059
on-0.051
 comb-0.051
 sugar-0.049
Token the
Token ID262
Feature activation-8.97e-01

Pos logit contributions
 villains+0.079
 warriors+0.073
 risks+0.072
 opponents+0.071
 unnecessary+0.071
Neg logit contributions
pron-0.063
 butter-0.053
 comb-0.047
on-0.046
 teaspoon-0.045
Token world
Token ID995
Feature activation-1.12e+00

Pos logit contributions
 villains+0.046
 risks+0.041
 warriors+0.041
 opponents+0.040
 runners+0.039
Neg logit contributions
pron-0.050
 butter-0.043
 sugar-0.037
 comb-0.037
 mug-0.036
Token.
Token ID13
Feature activation-4.64e-01

Pos logit contributions
 villains+0.061
 risks+0.056
 warriors+0.055
 unnecessary+0.054
 opponents+0.054
Neg logit contributions
pron-0.057
 butter-0.048
on-0.042
 comb-0.041
 sugar-0.041
Token One
Token ID1881
Feature activation+4.20e-01

Pos logit contributions
 villains+0.023
 risks+0.021
 warriors+0.020
 wre+0.020
 opponents+0.020
Neg logit contributions
pron-0.026
 butter-0.022
on-0.020
 sugar-0.019
 comb-0.019
Token day
Token ID1110
Feature activation+3.26e-01

Pos logit contributions
pron+0.024
 butter+0.020
 basil+0.018
 teaspoon+0.018
 comb+0.018
Neg logit contributions
 villains-0.020
 risks-0.018
 warriors-0.017
 wre-0.017
 War-0.017
Token,
Token ID11
Feature activation+2.44e-01

Pos logit contributions
pron+0.016
 butter+0.012
 basil+0.011
 slice+0.010
 egg+0.010
Neg logit contributions
 villains-0.019
 risks-0.018
 warriors-0.017
 wre-0.017
 opponents-0.017
Token he
Token ID339
Feature activation-1.28e+00

Pos logit contributions
pron+0.016
 butter+0.013
 basil+0.012
 slice+0.012
 egg+0.012
Neg logit contributions
 villains-0.010
 risks-0.009
 warriors-0.009
 wre-0.009
 opponents-0.008
Token decided
Token ID3066
Feature activation-8.62e-01

Pos logit contributions
 villains+0.076
 risks+0.072
 unnecessary+0.068
 opponents+0.068
 warriors+0.068
Neg logit contributions
pron-0.059
 butter-0.049
 comb-0.043
on-0.040
 baked-0.040
Token feel
Token ID1254
Feature activation-5.10e-01

Pos logit contributions
 villains+0.018
 risks+0.016
 warriors+0.015
 wre+0.015
 opponents+0.015
Neg logit contributions
pron-0.017
 butter-0.014
 comb-0.012
on-0.012
 sugar-0.012
Token better
Token ID1365
Feature activation-6.53e-01

Pos logit contributions
 villains+0.026
 risks+0.023
 warriors+0.022
 opponents+0.022
 wre+0.022
Neg logit contributions
pron-0.029
 butter-0.025
 comb-0.021
 sugar-0.021
on-0.021
Token!"
Token ID2474
Feature activation-4.98e-01

Pos logit contributions
 villains+0.036
 risks+0.033
 wre+0.032
 warriors+0.032
 opponents+0.032
Neg logit contributions
pron-0.033
 butter-0.028
 comb-0.023
on-0.023
 sugar-0.023
Token The
Token ID383
Feature activation-1.14e+00

Pos logit contributions
 villains+0.024
 risks+0.022
 wre+0.021
 warriors+0.021
 opponents+0.020
Neg logit contributions
pron-0.029
 butter-0.025
 comb-0.022
on-0.022
 sugar-0.021
Token animals
Token ID4695
Feature activation+2.46e-03

Pos logit contributions
 villains+0.059
 risks+0.056
 wre+0.054
 warriors+0.053
 unnecessary+0.053
Neg logit contributions
pron-0.057
 butter-0.051
 comb-0.045
 mug-0.045
 brush-0.042
Token all
Token ID477
Feature activation-7.79e-01

Pos logit contributions
pron+0.000
 butter+0.000
 sugar+0.000
 basil+0.000
on+0.000
Neg logit contributions
 villains-0.000
 risks-0.000
 warriors-0.000
 wre-0.000
 opponents-0.000
Token worked
Token ID3111
Feature activation-1.03e+00

Pos logit contributions
 villains+0.044
 risks+0.041
 opponents+0.039
 warriors+0.039
 unnecessary+0.039
Neg logit contributions
pron-0.038
 butter-0.032
 comb-0.028
on-0.027
 slice-0.027
Token together
Token ID1978
Feature activation+1.11e+00

Pos logit contributions
 villains+0.065
 risks+0.057
 warriors+0.057
 opponents+0.056
 wre+0.056
Neg logit contributions
pron-0.045
 butter-0.038
on-0.031
 comb-0.031
 sugar-0.031
Token to
Token ID284
Feature activation+4.16e-01

Pos logit contributions
pron+0.066
 butter+0.054
 basil+0.048
 egg+0.047
 sugar+0.047
Neg logit contributions
 villains-0.052
 warriors-0.047
 risks-0.046
 opponents-0.046
 enemies-0.045
Token find
Token ID1064
Feature activation-1.10e+00

Pos logit contributions
pron+0.023
 butter+0.019
on+0.017
 teaspoon+0.017
 basil+0.017
Neg logit contributions
 villains-0.021
 risks-0.019
 wre-0.019
 warriors-0.018
 opponents-0.018
Token a
Token ID257
Feature activation+8.22e-01

Pos logit contributions
 villains+0.060
 risks+0.053
 opponents+0.053
 warriors+0.053
 runners+0.052
Neg logit contributions
pron-0.053
 butter-0.049
 sugar-0.041
on-0.040
 comb-0.040
Token the
Token ID262
Feature activation+1.03e+00

Pos logit contributions
pron+0.067
 butter+0.051
itten+0.049
ool+0.047
 egg+0.047
Neg logit contributions
 villains-0.076
 risks-0.070
 wre-0.070
 warriors-0.069
 opponents-0.069
Token y
Token ID331
Feature activation+1.49e+00

Pos logit contributions
pron+0.032
 butter+0.022
 basil+0.020
 comb+0.018
on+0.018
Neg logit contributions
 villains-0.068
 wre-0.067
 risks-0.067
 speeds-0.064
 prisoners-0.063
Tokenummy
Token ID13513
Feature activation+2.14e+00

Pos logit contributions
pron+0.060
 butter+0.040
 comb+0.038
 basil+0.035
on+0.033
Neg logit contributions
 wre-0.088
 risks-0.088
 villains-0.088
 opponents-0.086
 speeds-0.084
Token cake
Token ID12187
Feature activation+1.50e+00

Pos logit contributions
pron+0.073
 basil+0.048
 butter+0.047
 hen+0.044
 comb+0.041
Neg logit contributions
 risks-0.126
 villains-0.126
 wre-0.125
 speeds-0.122
 scores-0.121
Token.
Token ID13
Feature activation+1.21e-01

Pos logit contributions
pron+0.091
 butter+0.075
 basil+0.070
 sugar+0.067
 egg+0.067
Neg logit contributions
 villains-0.060
 warriors-0.054
 risks-0.052
 opponents-0.052
 wre-0.050
Token<|endoftext|>
Token ID50256
Feature activation-7.14e-01

Pos logit contributions
pron+0.007
 butter+0.006
 comb+0.005
 teaspoon+0.005
 egg+0.005
Neg logit contributions
 villains-0.006
 risks-0.005
 warriors-0.005
 opponents-0.005
 wre-0.005
TokenOnce
Token ID7454
Feature activation+7.00e-01

Pos logit contributions
 villains+0.042
 risks+0.038
 warriors+0.037
 opponents+0.037
 wre+0.037
Neg logit contributions
pron-0.034
 butter-0.028
on-0.025
 sugar-0.023
 comb-0.023
Token upon
Token ID2402
Feature activation-1.56e-01

Pos logit contributions
pron+0.041
 butter+0.035
 comb+0.031
 basil+0.031
 sugar+0.030
Neg logit contributions
 villains-0.032
 wre-0.028
 risks-0.028
 warriors-0.028
 opponents-0.027
Token a
Token ID257
Feature activation+4.10e-01

Pos logit contributions
 villains+0.009
 risks+0.008
 warriors+0.008
 wre+0.008
 opponents+0.008
Neg logit contributions
pron-0.008
 butter-0.006
 comb-0.005
 teaspoon-0.005
 sugar-0.005
Token time
Token ID640
Feature activation+3.96e-01

Pos logit contributions
pron+0.025
 butter+0.021
 basil+0.019
 comb+0.019
on+0.019
Neg logit contributions
 villains-0.018
 risks-0.016
 wre-0.016
 warriors-0.015
 opponents-0.015
Token,
Token ID11
Feature activation+1.04e+00

Pos logit contributions
pron+0.019
 butter+0.015
 comb+0.013
 basil+0.013
 sugar+0.012
Neg logit contributions
 villains-0.024
 risks-0.021
 wre-0.021
 warriors-0.021
 opponents-0.021
Token his
Token ID465
Feature activation-2.69e+00

Pos logit contributions
 villains+0.137
 risks+0.121
 warriors+0.119
 unnecessary+0.119
 runners+0.118
Neg logit contributions
pron-0.093
 butter-0.079
 baking-0.069
 comb-0.066
on-0.066
Token mind
Token ID2000
Feature activation-8.64e-01

Pos logit contributions
 villains+0.141
 risks+0.130
 wre+0.129
 unnecessary+0.127
 warriors+0.125
Neg logit contributions
pron-0.120
 brush-0.099
 butter-0.097
 comb-0.094
 mug-0.092
Token and
Token ID290
Feature activation-2.07e+00

Pos logit contributions
 villains+0.050
 risks+0.046
 warriors+0.045
 wre+0.044
 opponents+0.044
Neg logit contributions
pron-0.041
 butter-0.034
on-0.029
 comb-0.028
 teaspoon-0.028
Token make
Token ID787
Feature activation-2.63e+00

Pos logit contributions
 villains+0.112
 risks+0.107
 unnecessary+0.103
 warriors+0.103
 runners+0.101
Neg logit contributions
pron-0.096
 butter-0.080
 comb-0.074
 brush-0.070
 slice-0.067
Token the
Token ID262
Feature activation-3.06e+00

Pos logit contributions
 villains+0.161
 warriors+0.143
 runners+0.140
 risks+0.138
 opponents+0.137
Neg logit contributions
pron-0.101
 butter-0.098
on-0.080
 toast-0.078
 cake-0.075
Token decision
Token ID2551
Feature activation-2.56e+00

Pos logit contributions
 villains+0.170
 unnecessary+0.164
 wre+0.156
 risks+0.156
 warriors+0.154
Neg logit contributions
pron-0.119
 butter-0.100
 brush-0.099
 comb-0.094
 baking-0.091
Token.
Token ID13
Feature activation-1.53e+00

Pos logit contributions
 villains+0.154
 warriors+0.141
 risks+0.140
 unnecessary+0.137
 opponents+0.136
Neg logit contributions
pron-0.107
 butter-0.091
 comb-0.080
on-0.078
 brush-0.077
Token He
Token ID679
Feature activation-2.15e-01

Pos logit contributions
 villains+0.110
 risks+0.102
 unnecessary+0.100
 warriors+0.099
 prisoners+0.098
Neg logit contributions
pron-0.049
 butter-0.038
on-0.037
 sugar-0.031
 bowl-0.030
Token tried
Token ID3088
Feature activation-8.07e-01

Pos logit contributions
 villains+0.011
 risks+0.010
 warriors+0.010
 wre+0.010
 opponents+0.010
Neg logit contributions
pron-0.012
 butter-0.010
on-0.008
 sugar-0.008
 comb-0.008
Token to
Token ID284
Feature activation-5.53e-01

Pos logit contributions
 villains+0.048
 risks+0.043
 warriors+0.043
 wre+0.042
 opponents+0.042
Neg logit contributions
pron-0.037
 butter-0.031
 comb-0.026
on-0.026
 sugar-0.026
Token decide
Token ID5409
Feature activation-4.13e-01

Pos logit contributions
 villains+0.029
 risks+0.026
 warriors+0.025
 wre+0.025
 opponents+0.025
Neg logit contributions
pron-0.030
 butter-0.025
 comb-0.022
 teaspoon-0.022
on-0.022
Token,
Token ID11
Feature activation+7.63e-02

Pos logit contributions
 villains+0.016
 risks+0.015
 warriors+0.015
 wre+0.014
 opponents+0.014
Neg logit contributions
pron-0.014
 butter-0.012
 comb-0.010
on-0.010
 sugar-0.010
Token feeling
Token ID4203
Feature activation-7.01e-01

Pos logit contributions
pron+0.004
 butter+0.003
on+0.003
 teaspoon+0.003
 basil+0.003
Neg logit contributions
 villains-0.004
 risks-0.004
 warriors-0.003
 wre-0.003
 opponents-0.003
Token very
Token ID845
Feature activation-8.46e-01

Pos logit contributions
 villains+0.036
 risks+0.032
 warriors+0.031
 opponents+0.031
 wre+0.031
Neg logit contributions
pron-0.039
 butter-0.033
on-0.028
 comb-0.028
 sugar-0.028
Token brave
Token ID14802
Feature activation-2.48e+00

Pos logit contributions
 villains+0.043
 risks+0.038
 warriors+0.037
 opponents+0.036
 wre+0.036
Neg logit contributions
pron-0.048
 butter-0.040
 comb-0.035
on-0.035
 sugar-0.035
Token and
Token ID290
Feature activation-1.50e+00

Pos logit contributions
 villains+0.161
 wre+0.144
 War+0.144
 unnecessary+0.144
 risks+0.142
Neg logit contributions
pron-0.093
 butter-0.086
 egg-0.066
 comb-0.065
 sugar-0.065
Token mighty
Token ID18680
Feature activation-2.45e+00

Pos logit contributions
 villains+0.074
 wre+0.065
 risks+0.065
 warriors+0.064
 opponents+0.064
Neg logit contributions
pron-0.083
 butter-0.071
 comb-0.062
 sugar-0.061
 brush-0.060
Token.
Token ID13
Feature activation-1.23e+00

Pos logit contributions
 villains+0.143
 wre+0.130
 War+0.129
 unnecessary+0.128
 risks+0.127
Neg logit contributions
pron-0.104
 butter-0.094
 comb-0.079
 teaspoon-0.077
 egg-0.075
Token The
Token ID383
Feature activation-4.89e-01

Pos logit contributions
 villains+0.060
 risks+0.054
 wre+0.052
 warriors+0.052
 unnecessary+0.051
Neg logit contributions
pron-0.071
 butter-0.060
on-0.054
 sugar-0.053
 basil-0.053
Token fairy
Token ID25607
Feature activation+7.77e-01

Pos logit contributions
 villains+0.026
 risks+0.023
 warriors+0.023
 wre+0.022
 opponents+0.022
Neg logit contributions
pron-0.026
 butter-0.022
 comb-0.019
 teaspoon-0.019
 sugar-0.019
Token smiled
Token ID13541
Feature activation-1.04e-01

Pos logit contributions
pron+0.041
 butter+0.034
 sugar+0.030
 basil+0.030
 teaspoon+0.029
Neg logit contributions
 villains-0.041
 warriors-0.036
 risks-0.036
 opponents-0.035
 wre-0.035
Token and
Token ID290
Feature activation-2.63e-02

Pos logit contributions
 villains+0.006
 risks+0.005
 warriors+0.005
 opponents+0.005
 wre+0.005
Neg logit contributions
pron-0.005
 butter-0.004
on-0.004
 sugar-0.003
 comb-0.003
Token both
Token ID1111
Feature activation-1.52e+00

Pos logit contributions
 villains+0.052
 risks+0.048
 unnecessary+0.046
 wre+0.046
 opponents+0.046
Neg logit contributions
pron-0.049
 butter-0.041
 comb-0.036
on-0.036
 slice-0.035
Token ran
Token ID4966
Feature activation-3.87e+00

Pos logit contributions
 villains+0.088
 risks+0.083
 unnecessary+0.081
 opponents+0.078
 wre+0.078
Neg logit contributions
pron-0.069
 butter-0.059
on-0.052
 comb-0.052
 brush-0.050
Token deep
Token ID2769
Feature activation-3.07e+00

Pos logit contributions
 villains+0.280
 slaves+0.260
 unnecessary+0.257
 enemy+0.255
 opponents+0.254
Neg logit contributions
 her-0.075
,-0.072
 she-0.059
 brush-0.053
 butter-0.050
Token into
Token ID656
Feature activation-2.46e+00

Pos logit contributions
 villains+0.230
 War+0.211
 villain+0.200
 slaves+0.198
 prisoners+0.195
Neg logit contributions
 butter-0.061
 eggs-0.056
 sugar-0.052
 egg-0.052
 teaspoon-0.051
Token the
Token ID262
Feature activation-2.01e+00

Pos logit contributions
 villains+0.182
 opponents+0.168
 risks+0.164
 unnecessary+0.163
dies+0.163
Neg logit contributions
 butter-0.055
pron-0.055
 sugar-0.044
 teaspoon-0.044
 spoon-0.041
Token forest
Token ID8222
Feature activation-2.74e+00

Pos logit contributions
 villains+0.099
 opponents+0.083
 risks+0.082
 enemies+0.082
 prisoners+0.081
Neg logit contributions
pron-0.105
 butter-0.094
 sugar-0.086
 basil-0.086
 bowl-0.085
Token.
Token ID13
Feature activation-1.33e+00

Pos logit contributions
 villains+0.181
 unnecessary+0.170
 risks+0.165
 War+0.161
 opponents+0.160
Neg logit contributions
pron-0.088
 butter-0.078
 baking-0.075
on-0.066
 slice-0.064
Token It
Token ID632
Feature activation-9.73e-01

Pos logit contributions
 villains+0.073
 risks+0.065
 warriors+0.063
 wre+0.062
 enemies+0.061
Neg logit contributions
pron-0.068
 butter-0.057
 sugar-0.051
on-0.051
 basil-0.051
Token was
Token ID373
Feature activation-8.40e-02

Pos logit contributions
 villains+0.063
 risks+0.059
 opponents+0.057
 warriors+0.056
 unnecessary+0.056
Neg logit contributions
pron-0.039
 butter-0.032
 comb-0.026
on-0.025
 teaspoon-0.025
Token so
Token ID523
Feature activation-1.08e+00

Pos logit contributions
 villains+0.005
 risks+0.004
 warriors+0.004
 wre+0.004
 opponents+0.004
Neg logit contributions
pron-0.004
 butter-0.004
on-0.003
 comb-0.003
 teaspoon-0.003
Token much
Token ID881
Feature activation-1.41e+00

Pos logit contributions
 villains+0.061
 risks+0.056
 opponents+0.054
 warriors+0.054
 unnecessary+0.053
Neg logit contributions
pron-0.052
 butter-0.044
 sugar-0.038
on-0.038
 basil-0.037
Token.
Token ID13
Feature activation-6.44e-01

Pos logit contributions
 villains+0.127
 prisoners+0.114
 warriors+0.112
 opponents+0.111
 risks+0.109
Neg logit contributions
pron-0.075
 butter-0.068
 sugar-0.055
 comb-0.054
 egg-0.053
Token
Token ID198
Feature activation+1.73e-01

Pos logit contributions
 villains+0.034
 risks+0.030
 warriors+0.029
 wre+0.029
 opponents+0.029
Neg logit contributions
pron-0.035
 butter-0.029
on-0.026
 sugar-0.025
 basil-0.025
Token
Token ID198
Feature activation-5.19e-02

Pos logit contributions
pron+0.010
 butter+0.008
 comb+0.007
 teaspoon+0.007
 sugar+0.007
Neg logit contributions
 villains-0.009
 risks-0.008
 warriors-0.008
 wre-0.007
 opponents-0.007
TokenLuckily
Token ID42332
Feature activation-1.91e-01

Pos logit contributions
 villains+0.003
 risks+0.003
 warriors+0.003
 opponents+0.003
 wre+0.003
Neg logit contributions
pron-0.003
 butter-0.002
 comb-0.002
 teaspoon-0.002
 sugar-0.002
Token,
Token ID11
Feature activation-3.19e-01

Pos logit contributions
 villains+0.012
 risks+0.011
 warriors+0.011
 wre+0.011
 opponents+0.011
Neg logit contributions
pron-0.008
 butter-0.006
 comb-0.005
on-0.005
 sugar-0.005
Token the
Token ID262
Feature activation-2.59e+00

Pos logit contributions
 villains+0.019
 risks+0.017
 warriors+0.017
 wre+0.017
 opponents+0.017
Neg logit contributions
pron-0.015
 butter-0.012
 comb-0.010
on-0.010
 sugar-0.010
Token driver
Token ID4639
Feature activation-1.85e+00

Pos logit contributions
 villains+0.146
 unnecessary+0.144
 risks+0.143
minded+0.142
 warriors+0.142
Neg logit contributions
pron-0.113
 butter-0.087
 mixer-0.084
 brush-0.084
 comb-0.082
Token was
Token ID373
Feature activation-1.66e+00

Pos logit contributions
 villains+0.112
 risks+0.104
 unnecessary+0.101
 opponents+0.099
 warriors+0.098
Neg logit contributions
pron-0.081
 butter-0.067
on-0.060
 comb-0.059
 slice-0.054
Token a
Token ID257
Feature activation-2.72e+00

Pos logit contributions
 villains+0.095
dies+0.082
 opponents+0.082
 warriors+0.082
 risks+0.081
Neg logit contributions
pron-0.084
 butter-0.068
 comb-0.062
 baking-0.061
on-0.060
Token very
Token ID845
Feature activation-2.88e+00

Pos logit contributions
 villains+0.143
 risks+0.131
 speeds+0.130
 opponents+0.128
 wre+0.126
Neg logit contributions
pron-0.144
 butter-0.111
 comb-0.104
 teaspoon-0.102
 slice-0.101
Token smart
Token ID4451
Feature activation-3.35e+00

Pos logit contributions
 villains+0.164
dies+0.141
 risks+0.140
 warriors+0.140
 opponents+0.137
Neg logit contributions
pron-0.140
 butter-0.102
 brush-0.099
 comb-0.097
 teaspoon-0.094
Token stay
Token ID2652
Feature activation-1.91e+00

Pos logit contributions
 villains+0.097
 warriors+0.094
 risks+0.091
 unnecessary+0.090
 opponents+0.090
Neg logit contributions
pron-0.074
 comb-0.063
 butter-0.061
 brush-0.060
 bake-0.057
Token in
Token ID287
Feature activation-1.99e+00

Pos logit contributions
 villains+0.125
dies+0.110
 warriors+0.110
 opponents+0.109
 risks+0.109
Neg logit contributions
pron-0.076
 butter-0.059
 comb-0.055
 brush-0.053
 baking-0.052
Token their
Token ID511
Feature activation-2.36e+00

Pos logit contributions
 villains+0.135
 warriors+0.126
 opponents+0.126
 risks+0.125
 unnecessary+0.124
Neg logit contributions
pron-0.061
 butter-0.051
 mug-0.045
 baking-0.044
 teaspoon-0.043
Token room
Token ID2119
Feature activation-2.03e+00

Pos logit contributions
 villains+0.150
 risks+0.143
 warriors+0.143
 unnecessary+0.139
 runners+0.136
Neg logit contributions
pron-0.087
 mug-0.067
 brush-0.065
 butter-0.065
 bowl-0.064
Token.
Token ID13
Feature activation-1.43e+00

Pos logit contributions
 villains+0.122
 risks+0.113
 unnecessary+0.111
 warriors+0.109
 opponents+0.108
Neg logit contributions
pron-0.089
 butter-0.074
 comb-0.062
 slice-0.061
 baking-0.061
Token They
Token ID1119
Feature activation-2.79e+00

Pos logit contributions
 villains+0.071
 risks+0.066
 warriors+0.065
 wre+0.063
 prisoners+0.063
Neg logit contributions
pron-0.074
 butter-0.061
 sugar-0.058
 mug-0.058
 basil-0.057
Token want
Token ID765
Feature activation-2.81e-01

Pos logit contributions
 villains+0.162
 warriors+0.151
 opponents+0.148
 unnecessary+0.145
joy+0.142
Neg logit contributions
pron-0.109
 brush-0.106
 comb-0.106
 butter-0.103
 spoon-0.087
Token to
Token ID284
Feature activation-9.78e-01

Pos logit contributions
 villains+0.017
 risks+0.015
 warriors+0.015
 wre+0.015
 opponents+0.015
Neg logit contributions
pron-0.014
 butter-0.011
 comb-0.009
on-0.009
 sugar-0.009
Token play
Token ID711
Feature activation-6.42e-01

Pos logit contributions
 villains+0.054
 risks+0.050
 warriors+0.050
 opponents+0.048
 unnecessary+0.048
Neg logit contributions
pron-0.049
 butter-0.040
 comb-0.037
 brush-0.036
 bake-0.034
Token with
Token ID351
Feature activation-2.43e-01

Pos logit contributions
 villains+0.038
 risks+0.035
 warriors+0.035
 opponents+0.034
 wre+0.034
Neg logit contributions
pron-0.030
 butter-0.024
 comb-0.020
on-0.020
 teaspoon-0.020
Token Mom
Token ID11254
Feature activation+4.32e-02

Pos logit contributions
 villains+0.013
 risks+0.012
 warriors+0.011
 wre+0.011
 opponents+0.011
Neg logit contributions
pron-0.013
 butter-0.011
on-0.009
 comb-0.009
 sugar-0.009