TOP ACTIVATIONS
MAX = 0.186

will never take anything that isn ’ t hers
and learned that being smart isn 't just about knowing the
drew a pink flower . Isn 't it beautiful ?" she
hunger or eat things that aren 't good for him like
said , " Thank you Aunt ie !" <|endoftext|> Once upon
ask before taking something that isn 't available to him .
there is nothing there . Isn 't that interesting ?"
around before taking something that isn 't yours ."
ies spinning their magical club ." Sarah
made Ruby a princess . Isn 't she much better now
is the fairy garden . Isn 't it wonderful ?"
Look at my veil ! Isn 't it lovely ?" Anna
I painted the wall . Isn 't it nice and spicy
remind him that scary stories aren 't real life . <|endoftext|>
. Thank you for being so nice and trusting me ."
and not eat things that aren 't good for him .
acy and said , " Isn 't the sunset beautiful ?"
fair to take something that isn 't yours ."
never play with things you don 't know near the water
and respectful of animals we don 't know . They might

MIN ACTIVATIONS
MIN = -0.257

the beach . Dad was digging a deep hole in the
Tim my worked hard , digging and piling the sand ,
big tree . He was lying on the grass and holding
a surprise ! There , lying in front of him ,
special ! <|endoftext|> Kate was walking in the park with her
they saw a big bull lying on the ground . It
One day , Kate was walking in a deep , dark
unknown cord . It was lying on the ground , hidden
all !" <|endoftext|> Billy was walking to the station when he
One day , Anna was walking in the park . She
hut . <|endoftext|> John was walking in the park with his
he saw an interesting case lying in the grass . He
feet . <|endoftext|> John was hopping around the table . He
day , a baby was walking . She was very attractive
, Lily saw a skeleton lying on the ground . She
green frog found a handle lying in the bottom of the
day , as he was digging , he made a big
, the brown jet was flying very fast . It wanted
, the gr anny was walking in the park when she
. <|endoftext|> Tom my was walking in the park with his
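
The TOP and MIN lists rank short text windows by the feature's activation at a single token. Below is a minimal sketch of how such lists can be assembled. It assumes that per-token activations are already available as (token, activation) pairs for each document. The helper name `top_and_min_windows`, the window sizes, and the toy values are hypothetical and are not the visualization tool's own code.

```python
from typing import List, Tuple

# (token string, feature activation) for one position
Position = Tuple[str, float]

def top_and_min_windows(docs: List[List[Position]], k: int = 3,
                        left: int = 4, right: int = 2):
    """Hypothetical helper: return the k highest- and k lowest-activating
    positions across all docs, each paired with a short context window."""
    hits = [(act, d, i)
            for d, doc in enumerate(docs)
            for i, (_, act) in enumerate(doc)]
    hits.sort(key=lambda h: h[0])  # ascending by activation; stable for ties

    def window(d: int, i: int) -> str:
        doc = docs[d]
        # tokens joined with spaces, mirroring the spaced display above
        return " ".join(tok for tok, _ in doc[max(0, i - left): i + right + 1])

    # slice from max(0, len - k) so that k == 0 yields an empty list
    top = [(act, window(d, i)) for act, d, i in reversed(hits[max(0, len(hits) - k):])]
    bottom = [(act, window(d, i)) for act, d, i in hits[:k]]
    return top, bottom
```

For example, with two toy documents whose values are loosely modeled on the lists above, `top_and_min_windows(docs, k=1)` returns the "Isn 't it" window as the top hit and the "digging" window as the minimum.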

INTERVAL 0.075 - 0.186
CONTAINS 4.671%

" It 's where you can mail letters to your friends
learned that it 's important to rest when you 're tired
for being so helpful . From that day
berries are very good for you , my dear . They
sweet ie . We can 't go to the lake today

INTERVAL -0.035 - 0.075
CONTAINS 74.501%

his boss called him into his office , and asked John
man learned to be patient and not get upset with his
They would eat y ummy food and play games . One
no other uncle in the world . He loved to eat
It made me sad . â J

INTERVAL -0.146 - -0.035
CONTAINS 20.639%

an empty bowl . He has no past els . He
scary . But he was also very ignorant . He didn
enjoyed the sunny day . <|endoftext|> Once upon a time there
Jack who loved to listen . He was only three years
the boy happily accepted . He smiled and thanked the old

INTERVAL -0.257 - -0.146
CONTAINS 0.189%

she found a lady bug crawling across the grass . She
across a big juicy steak lying on the grass in front
to find that she was flying safely back into the sky
. She thought he was lying to make her feel better
day , while he was walking in the forest , he
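
The four INTERVAL bands appear to be equal-width bins between MIN and MAX. The bin width is (0.186 - (-0.257)) / 4 ≈ 0.111, which gives edges of about -0.257, -0.146, -0.035, 0.075 and 0.186. The four CONTAINS shares also sum to 100%. Below is a sketch of that binning in plain Python. It assumes equal-width bins with the top bin closed on the right, which is the convention `numpy.histogram` uses. The helper `interval_shares` and the sample activations are hypothetical.

```python
from typing import List, Tuple

def interval_shares(acts: List[float], n_bins: int = 4) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: equal-width bins from min to max; returns the
    bin edges and the percentage of activations falling in each bin."""
    lo, hi = min(acts), max(acts)
    if hi == lo:
        return [lo, hi], [100.0]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in acts:
        # the last bin is closed on the right so the maximum is counted
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + j * width for j in range(n_bins + 1)]
    shares = [100.0 * c / len(acts) for c in counts]
    return edges, shares
```

The shares depend entirely on which activations are sampled. Only the edges are fixed by MIN and MAX.
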
Token will
Token ID481
Feature activation+8.53e-02

Pos logit contributions
grand+0.117
 disobedience+0.107
 hairc+0.107
 supervision+0.105
 divorce+0.104
Neg logit contributions
 camps-0.093
 knights-0.084
latable-0.081
 hay-0.081
 clearing-0.078
Token never
Token ID1239
Feature activation+8.31e-02

Pos logit contributions
grand+0.140
 disobedience+0.131
 worry+0.130
 disob+0.130
 promise+0.128
Neg logit contributions
 camps-0.126
latable-0.113
 knights-0.112
 hay-0.109
aw-0.108
Token take
Token ID1011
Feature activation+1.02e-03

Pos logit contributions
grand+0.118
 disobedience+0.106
 hairc+0.105
 supervision+0.103
 divorce+0.102
Neg logit contributions
 camps-0.136
 knights-0.126
latable-0.122
 fish-0.120
 hay-0.120
Token anything
Token ID1997
Feature activation+9.90e-02

Pos logit contributions
grand+0.002
t+0.002
 disobedience+0.002
rav+0.002
 divorce+0.001
Neg logit contributions
 camps-0.001
 knights-0.001
 hay-0.001
 outside-0.001
 fish-0.001
Token that
Token ID326
Feature activation+5.68e-02

Pos logit contributions
grand+0.187
 disobedience+0.173
 hairc+0.172
 supervision+0.169
 divorce+0.168
Neg logit contributions
 camps-0.117
 knights-0.103
latable-0.100
 hay-0.099
 outside-0.096
Token isn
Token ID2125
Feature activation+1.86e-01

Pos logit contributions
 babys+0.097
grand+0.096
 passports+0.094
 hairc+0.093
 divorce+0.092
Neg logit contributions
 and-0.071
 outside-0.068
 He-0.068
 like-0.066
 in-0.064
Tokenâ
Token ID22940
Feature activation+7.49e-02

Pos logit contributions
grand+0.187
t+0.172
 disobedience+0.161
 hairc+0.158
 divorce+0.156
Neg logit contributions
 camps-0.387
 knights-0.366
latable-0.358
 hay-0.356
 clearing-0.351
Token
Token ID26391
Feature activation+1.46e-01

Pos logit contributions
grand+0.108
 hairc+0.107
 supervision+0.104
 disobedience+0.102
 divorce+0.100
Neg logit contributions
 camps-0.109
latable-0.105
 outside-0.103
 knights-0.101
acking-0.101
Token
Token ID8151
Feature activation+1.43e-01

Pos logit contributions
grand+0.228
 hairc+0.227
 disobedience+0.221
 supervision+0.219
 divorce+0.217
Neg logit contributions
 camps-0.197
 knights-0.186
 outside-0.178
aw-0.178
 jungle-0.177
Tokent
Token ID83
Feature activation+1.58e-01

Pos logit contributions
grand+0.101
 disobedience+0.090
 hairc+0.088
 supervision+0.082
 divorce+0.079
Neg logit contributions
 camps-0.321
 outside-0.304
acking-0.299
latable-0.298
 knights-0.297
Token hers
Token ID25144
Feature activation+8.76e-02

Pos logit contributions
grand+0.263
 hairc+0.244
 disobedience+0.241
 lesson+0.240
 divorce+0.238
Neg logit contributions
 camps-0.236
latable-0.211
 knights-0.201
acking-0.195
 clearing-0.194
Token and
Token ID290
Feature activation+5.24e-02

Pos logit contributions
grand+0.250
 disobedience+0.235
 hairc+0.233
 supervision+0.231
 divorce+0.229
Neg logit contributions
 camps-0.096
 knights-0.080
latable-0.077
 hay-0.076
 outside-0.072
Token learned
Token ID4499
Feature activation+3.67e-02

Pos logit contributions
grand+0.080
 disobedience+0.074
 hairc+0.073
 supervision+0.071
 divorce+0.071
Neg logit contributions
 camps-0.084
latable-0.075
 knights-0.073
 hay-0.073
 clearing-0.072
Token that
Token ID326
Feature activation+4.37e-02

Pos logit contributions
grand+0.082
 disobedience+0.077
 supervision+0.076
 hairc+0.076
t+0.075
Neg logit contributions
 camps-0.031
 knights-0.026
 hay-0.025
latable-0.025
 outside-0.024
Token being
Token ID852
Feature activation+1.90e-02

Pos logit contributions
grand+0.097
 supervision+0.088
t+0.087
rav+0.085
 babys+0.084
Neg logit contributions
 outside-0.040
 knights-0.039
 and-0.038
 in-0.035
 all-0.035
Token smart
Token ID4451
Feature activation+4.13e-02

Pos logit contributions
grand+0.032
 disobedience+0.030
 hairc+0.029
 supervision+0.029
 divorce+0.029
Neg logit contributions
 camps-0.026
 knights-0.024
latable-0.023
 hay-0.023
 clearing-0.022
Token isn
Token ID2125
Feature activation+1.84e-01

Pos logit contributions
grand+0.077
 disobedience+0.071
 hairc+0.071
 supervision+0.069
 divorce+0.069
Neg logit contributions
 camps-0.049
 knights-0.045
 hay-0.043
latable-0.042
 outside-0.041
Token't
Token ID470
Feature activation+1.27e-01

Pos logit contributions
grand+0.188
t+0.171
 disobedience+0.162
 hairc+0.161
 divorce+0.158
Neg logit contributions
 camps-0.378
 knights-0.357
latable-0.353
 hay-0.344
 clearing-0.342
Token just
Token ID655
Feature activation+1.09e-01

Pos logit contributions
grand+0.210
 hairc+0.205
 lesson+0.203
 divorce+0.198
 disobedience+0.197
Neg logit contributions
 camps-0.185
latable-0.165
acking-0.156
 clearing-0.151
 knights-0.148
Token about
Token ID546
Feature activation+3.90e-02

Pos logit contributions
grand+0.205
 supervision+0.197
 hairc+0.197
 disobedience+0.196
 lesson+0.189
Neg logit contributions
 camps-0.134
latable-0.113
 knights-0.106
acking-0.106
 clearing-0.103
Token knowing
Token ID6970
Feature activation+9.02e-02

Pos logit contributions
grand+0.074
 disobedience+0.068
 hairc+0.067
 supervision+0.066
 divorce+0.066
Neg logit contributions
 camps-0.046
 knights-0.042
latable-0.040
 hay-0.039
 clearing-0.038
Token the
Token ID262
Feature activation+6.49e-02

Pos logit contributions
grand+0.181
 disobedience+0.170
 hairc+0.170
 supervision+0.167
 divorce+0.165
Neg logit contributions
 camps-0.097
 knights-0.083
latable-0.082
 hay-0.078
 clearing-0.077
Token drew
Token ID9859
Feature activation-2.23e-02

Pos logit contributions
grand+0.004
 disobedience+0.004
 hairc+0.004
 supervision+0.004
t+0.004
Neg logit contributions
 camps-0.003
 knights-0.003
latable-0.003
 hay-0.003
 clearing-0.003
Token a
Token ID257
Feature activation+3.07e-02

Pos logit contributions
 camps+0.026
latable+0.025
acking+0.022
 clearing+0.021
 hay+0.019
Neg logit contributions
grand-0.042
 hairc-0.042
 rhy-0.039
 note-0.039
rav-0.038
Token pink
Token ID11398
Feature activation+1.18e-02

Pos logit contributions
grand+0.052
 hairc+0.048
 disobedience+0.048
 divorce+0.047
 lesson+0.047
Neg logit contributions
 camps-0.040
latable-0.039
 knights-0.035
 hay-0.034
acking-0.034
Token flower
Token ID15061
Feature activation+4.02e-02

Pos logit contributions
grand+0.018
 note+0.017
 poem+0.017
 rhy+0.017
 hairc+0.016
Neg logit contributions
latable-0.018
 camps-0.017
acking-0.016
und-0.015
aw-0.015
Token.
Token ID13
Feature activation-2.09e-03

Pos logit contributions
grand+0.077
 rhy+0.072
 note+0.072
 hairc+0.070
 promise+0.069
Neg logit contributions
latable-0.047
 camps-0.045
acking-0.037
 clearing-0.036
 knights-0.035
Token Isn
Token ID25110
Feature activation+1.80e-01

Pos logit contributions
 camps+0.003
latable+0.003
acking+0.003
 clearing+0.003
 hay+0.003
Neg logit contributions
grand-0.004
 hairc-0.003
 rhy-0.003
t-0.003
rav-0.003
Token't
Token ID470
Feature activation+1.20e-01

Pos logit contributions
grand+0.203
 hairc+0.177
 disobedience+0.176
t+0.175
 divorce+0.170
Neg logit contributions
 camps-0.351
 knights-0.325
latable-0.321
 hay-0.319
 clearing-0.313
Token it
Token ID340
Feature activation+1.24e-01

Pos logit contributions
grand+0.227
 disobedience+0.217
 supervision+0.215
 protests+0.212
 hairc+0.210
Neg logit contributions
 camps-0.135
 knights-0.124
 outside-0.116
 hay-0.115
aw-0.114
Token beautiful
Token ID4950
Feature activation+2.52e-02

Pos logit contributions
grand+0.240
 disobedience+0.228
 supervision+0.225
 hairc+0.222
t+0.218
Neg logit contributions
 camps-0.139
 knights-0.126
 hay-0.118
 fish-0.117
latable-0.116
Token?"
Token ID1701
Feature activation+5.09e-02

Pos logit contributions
grand+0.043
 hairc+0.041
 disobedience+0.039
 divorce+0.039
t+0.039
Neg logit contributions
 camps-0.034
latable-0.031
 knights-0.029
 hay-0.029
 clearing-0.029
Token she
Token ID673
Feature activation-3.08e-02

Pos logit contributions
grand+0.097
 supervision+0.091
 disobedience+0.090
rav+0.089
 policy+0.088
Neg logit contributions
 knights-0.055
 camps-0.055
 He-0.055
 outside-0.053
 hay-0.051
Token hunger
Token ID16460
Feature activation+2.46e-02

Pos logit contributions
grand+0.043
 disobedience+0.041
 hairc+0.041
 supervision+0.041
 divorce+0.039
Neg logit contributions
 camps-0.037
latable-0.034
 knights-0.032
 clearing-0.032
acking-0.031
Token or
Token ID393
Feature activation+8.84e-02

Pos logit contributions
grand+0.047
 disobedience+0.045
 supervision+0.045
 hairc+0.045
t+0.044
Neg logit contributions
 camps-0.028
latable-0.026
 clearing-0.024
 knights-0.024
 astronaut-0.023
Token eat
Token ID4483
Feature activation+2.12e-03

Pos logit contributions
grand+0.138
 disobedience+0.123
 hairc+0.123
t+0.122
rav+0.121
Neg logit contributions
 camps-0.130
 knights-0.121
 hay-0.118
 fish-0.117
 outside-0.115
Token things
Token ID1243
Feature activation+1.07e-01

Pos logit contributions
grand+0.003
 hairc+0.003
 supervision+0.003
 disobedience+0.003
 punishment+0.003
Neg logit contributions
 camps-0.003
 clearing-0.003
latable-0.003
 astronaut-0.003
 knights-0.002
Token that
Token ID326
Feature activation+6.16e-02

Pos logit contributions
grand+0.213
t+0.201
 hairc+0.200
 disobedience+0.200
 supervision+0.199
Neg logit contributions
 camps-0.121
latable-0.104
 knights-0.103
 clearing-0.103
 hay-0.097
Token aren
Token ID3588
Feature activation+1.79e-01

Pos logit contributions
oral+0.105
grand+0.104
 babys+0.104
 protests+0.103
rav+0.101
Neg logit contributions
 fish-0.075
 and-0.075
 He-0.074
 like-0.071
 outside-0.070
Token't
Token ID470
Feature activation+1.35e-01

Pos logit contributions
grand+0.216
 disobedience+0.192
 hairc+0.190
 supervision+0.186
 divorce+0.183
Neg logit contributions
 camps-0.336
 knights-0.309
latable-0.306
 hay-0.302
 clearing-0.296
Token good
Token ID922
Feature activation+9.50e-02

Pos logit contributions
grand+0.216
 lesson+0.205
 disobedience+0.199
 policy+0.198
 worry+0.197
Neg logit contributions
 camps-0.202
latable-0.183
 knights-0.174
acking-0.173
aw-0.170
Token for
Token ID329
Feature activation+6.95e-02

Pos logit contributions
grand+0.176
 disobedience+0.172
 policy+0.171
t+0.170
 supervision+0.168
Neg logit contributions
 camps-0.115
 clearing-0.105
latable-0.104
 hay-0.096
 jungle-0.094
Token him
Token ID683
Feature activation+1.01e-01

Pos logit contributions
grand+0.123
 disobedience+0.116
 hairc+0.114
 supervision+0.114
t+0.110
Neg logit contributions
 camps-0.092
latable-0.080
 knights-0.079
 clearing-0.077
 hay-0.077
Token like
Token ID588
Feature activation+5.91e-02

Pos logit contributions
grand+0.220
 disobedience+0.206
 hairc+0.205
 supervision+0.203
 divorce+0.202
Neg logit contributions
 camps-0.091
 knights-0.077
latable-0.074
 hay-0.072
 clearing-0.069
Token said
Token ID531
Feature activation+5.81e-02

Pos logit contributions
 camps+0.006
latable+0.005
 knights+0.005
 hay+0.005
 clearing+0.005
Neg logit contributions
grand-0.006
 disobedience-0.006
 hairc-0.006
 supervision-0.006
 divorce-0.006
Token,
Token ID11
Feature activation+4.74e-02

Pos logit contributions
grand+0.083
 hairc+0.079
 disobedience+0.077
 lesson+0.076
 punishment+0.075
Neg logit contributions
 camps-0.097
latable-0.089
 clearing-0.087
acking-0.083
 jungle-0.082
Token "
Token ID366
Feature activation+6.34e-02

Pos logit contributions
grand+0.102
 disobedience+0.096
 hairc+0.095
 supervision+0.094
 divorce+0.093
Neg logit contributions
 camps-0.044
 knights-0.037
latable-0.036
 hay-0.035
 clearing-0.034
TokenThank
Token ID10449
Feature activation+5.79e-02

Pos logit contributions
grand+0.104
 disobedience+0.094
 hairc+0.094
t+0.092
 supervision+0.091
Neg logit contributions
 camps-0.092
 knights-0.083
latable-0.083
 hay-0.080
 clearing-0.078
Token you
Token ID345
Feature activation+1.06e-01

Pos logit contributions
grand+0.072
 disobedience+0.066
 supervision+0.064
 hairc+0.064
 divorce+0.063
Neg logit contributions
 camps-0.104
 knights-0.098
latable-0.095
 hay-0.094
 clearing-0.092
Token Aunt
Token ID38074
Feature activation+1.78e-01

Pos logit contributions
grand+0.206
 hairc+0.192
t+0.191
 disobedience+0.191
rav+0.187
Neg logit contributions
 camps-0.114
latable-0.106
 knights-0.098
 clearing-0.096
acking-0.095
Tokenie
Token ID494
Feature activation+8.24e-02

Pos logit contributions
grand+0.318
 disobedience+0.306
 supervision+0.303
 hairc+0.297
 divorce+0.293
Neg logit contributions
 camps-0.223
 knights-0.194
 hay-0.186
latable-0.181
 outside-0.181
Token!"
Token ID2474
Feature activation+6.54e-02

Pos logit contributions
grand+0.167
 hairc+0.153
t+0.152
 disobedience+0.151
 lesson+0.149
Neg logit contributions
 camps-0.088
latable-0.081
 clearing-0.074
 knights-0.074
acking-0.072
Token<|endoftext|>
Token ID50256
Feature activation+2.77e-02

Pos logit contributions
grand+0.144
 disobedience+0.132
 hairc+0.132
t+0.130
 lesson+0.127
Neg logit contributions
 camps-0.062
latable-0.054
 knights-0.052
 clearing-0.047
 hay-0.047
TokenOnce
Token ID7454
Feature activation+9.25e-02

Pos logit contributions
grand+0.042
t+0.040
 hairc+0.040
 disobedience+0.038
 policy+0.037
Neg logit contributions
 camps-0.040
latable-0.038
 knights-0.036
 astronaut-0.035
 hay-0.035
Token upon
Token ID2402
Feature activation-1.39e-02

Pos logit contributions
grand+0.166
 disobedience+0.156
 supervision+0.155
 hairc+0.155
 divorce+0.152
Neg logit contributions
 camps-0.119
 outside-0.103
 knights-0.099
 and-0.099
 He-0.099
Token ask
Token ID1265
Feature activation+5.71e-02

Pos logit contributions
grand+0.121
 disobedience+0.111
 hairc+0.108
 supervision+0.107
 disob+0.106
Neg logit contributions
 camps-0.138
 knights-0.122
latable-0.121
 hay-0.117
aw-0.116
Token before
Token ID878
Feature activation+1.10e-01

Pos logit contributions
grand+0.119
t+0.110
 divorce+0.109
 hairc+0.109
 disobedience+0.109
Neg logit contributions
 camps-0.054
 knights-0.051
 outside-0.048
 hay-0.047
 and-0.044
Token taking
Token ID2263
Feature activation-1.10e-02

Pos logit contributions
grand+0.177
 disobedience+0.160
 hairc+0.160
 divorce+0.157
 supervision+0.157
Neg logit contributions
 camps-0.160
 knights-0.147
 hay-0.141
latable-0.140
 fish-0.138
Token something
Token ID1223
Feature activation+8.24e-02

Pos logit contributions
 camps+0.017
 knights+0.017
 hay+0.016
 outside+0.016
 fish+0.016
Neg logit contributions
grand-0.017
t-0.015
 disobedience-0.015
rav-0.014
 divorce-0.014
Token that
Token ID326
Feature activation+4.97e-02

Pos logit contributions
grand+0.158
 disobedience+0.146
rav+0.144
 hairc+0.143
 divorce+0.142
Neg logit contributions
 camps-0.093
 knights-0.082
 outside-0.082
 fish-0.079
 hay-0.079
Token isn
Token ID2125
Feature activation+1.77e-01

Pos logit contributions
 babys+0.084
 protests+0.081
grand+0.081
 hairc+0.081
 divorce+0.079
Neg logit contributions
 He-0.060
 and-0.060
 fish-0.059
 outside-0.058
 like-0.055
Token't
Token ID470
Feature activation+1.22e-01

Pos logit contributions
grand+0.178
t+0.163
 disobedience+0.152
 hairc+0.150
 divorce+0.148
Neg logit contributions
 camps-0.371
 knights-0.350
latable-0.341
 hay-0.338
 clearing-0.337
Token available
Token ID1695
Feature activation+6.83e-02

Pos logit contributions
grand+0.184
 lesson+0.175
 supervision+0.169
 hairc+0.168
 policy+0.168
Neg logit contributions
 camps-0.200
latable-0.184
acking-0.176
 clearing-0.169
 jungle-0.165
Token to
Token ID284
Feature activation+9.74e-02

Pos logit contributions
grand+0.143
 disobedience+0.134
 hairc+0.133
 supervision+0.131
 divorce+0.131
Neg logit contributions
 camps-0.067
 knights-0.057
latable-0.055
 hay-0.054
 clearing-0.052
Token him
Token ID683
Feature activation+8.04e-02

Pos logit contributions
grand+0.170
rav+0.160
 disobedience+0.158
 divorce+0.158
 hairc+0.158
Neg logit contributions
 camps-0.114
 fish-0.109
 knights-0.109
 outside-0.106
 hay-0.105
Token.
Token ID13
Feature activation+1.98e-02

Pos logit contributions
grand+0.180
 hairc+0.171
rav+0.170
 divorce+0.170
 disobedience+0.169
Neg logit contributions
 camps-0.060
 knights-0.053
 outside-0.049
 hay-0.048
 fish-0.047
Token there
Token ID612
Feature activation+4.23e-02

Pos logit contributions
grand+0.060
 disobedience+0.055
t+0.055
 divorce+0.054
rav+0.054
Neg logit contributions
 knights-0.034
 camps-0.033
 fish-0.032
 and-0.031
 hay-0.031
Token is
Token ID318
Feature activation+5.58e-02

Pos logit contributions
grand+0.074
 disobedience+0.070
 hairc+0.068
 divorce+0.068
 supervision+0.067
Neg logit contributions
 camps-0.051
 knights-0.048
 hay-0.046
 fish-0.046
 outside-0.045
Token nothing
Token ID2147
Feature activation+6.24e-02

Pos logit contributions
grand+0.120
 disobedience+0.112
 hairc+0.112
 supervision+0.110
 divorce+0.109
Neg logit contributions
 camps-0.052
 knights-0.044
latable-0.043
 hay-0.041
 clearing-0.040
Token there
Token ID612
Feature activation+4.73e-02

Pos logit contributions
grand+0.111
 disobedience+0.103
 hairc+0.102
 supervision+0.101
 divorce+0.099
Neg logit contributions
 camps-0.080
 knights-0.072
 hay-0.069
latable-0.068
 outside-0.067
Token.
Token ID13
Feature activation+1.65e-02

Pos logit contributions
grand+0.097
 disobedience+0.093
 hairc+0.092
 divorce+0.091
 supervision+0.091
Neg logit contributions
 camps-0.043
 knights-0.039
 outside-0.039
 hay-0.037
 fish-0.036
Token Isn
Token ID25110
Feature activation+1.77e-01

Pos logit contributions
grand+0.028
 disobedience+0.026
 hairc+0.025
t+0.025
 supervision+0.025
Neg logit contributions
 camps-0.024
latable-0.021
 knights-0.021
 hay-0.020
 football-0.019
Token't
Token ID470
Feature activation+1.22e-01

Pos logit contributions
grand+0.205
 disobedience+0.181
 hairc+0.179
t+0.174
 supervision+0.173
Neg logit contributions
 camps-0.340
 knights-0.314
latable-0.310
 hay-0.307
 clearing-0.301
Token that
Token ID326
Feature activation+5.71e-02

Pos logit contributions
grand+0.215
 hairc+0.194
 disobedience+0.193
 divorce+0.191
 lesson+0.189
Neg logit contributions
 camps-0.172
latable-0.158
acking-0.151
 knights-0.143
 clearing-0.141
Token interesting
Token ID3499
Feature activation+2.13e-02

Pos logit contributions
grand+0.105
 supervision+0.101
 disobedience+0.100
 protests+0.098
 babys+0.096
Neg logit contributions
 fish-0.065
 knights-0.063
 camps-0.061
 hay-0.060
 outside-0.060
Token?"
Token ID1701
Feature activation+6.27e-02

Pos logit contributions
grand+0.039
 disobedience+0.037
 supervision+0.036
t+0.036
 divorce+0.036
Neg logit contributions
 camps-0.025
 knights-0.024
 fish-0.022
 hay-0.022
 jungle-0.022
Token
Token ID198
Feature activation+5.90e-02

Pos logit contributions
grand+0.129
 disobedience+0.129
 supervision+0.128
 divorce+0.125
 policy+0.125
Neg logit contributions
 camps-0.059
 knights-0.051
 He-0.050
 fields-0.048
 and-0.048
Token around
Token ID1088
Feature activation+3.13e-02

Pos logit contributions
grand+0.102
 disobedience+0.095
 hairc+0.095
 supervision+0.093
t+0.092
Neg logit contributions
 camps-0.063
latable-0.054
 knights-0.054
 hay-0.052
 clearing-0.051
Token before
Token ID878
Feature activation+9.50e-02

Pos logit contributions
grand+0.054
 disobedience+0.052
 supervision+0.052
 hairc+0.051
 worry+0.050
Neg logit contributions
 camps-0.040
latable-0.040
acking-0.033
 clearing-0.032
aw-0.031
Token taking
Token ID2263
Feature activation-1.14e-02

Pos logit contributions
grand+0.140
 disobedience+0.130
 hairc+0.127
 supervision+0.125
 divorce+0.123
Neg logit contributions
 camps-0.151
latable-0.139
 knights-0.136
 hay-0.132
 clearing-0.131
Token something
Token ID1223
Feature activation+7.63e-02

Pos logit contributions
 camps+0.015
 knights+0.014
 hay+0.013
latable+0.013
 fish+0.013
Neg logit contributions
grand-0.020
 disobedience-0.019
 hairc-0.018
 divorce-0.018
 supervision-0.018
Token that
Token ID326
Feature activation+4.48e-02

Pos logit contributions
grand+0.146
 disobedience+0.137
 hairc+0.133
 divorce+0.133
 supervision+0.132
Neg logit contributions
 camps-0.088
 knights-0.078
 hay-0.074
 fish-0.073
 outside-0.073
Token isn
Token ID2125
Feature activation+1.76e-01

Pos logit contributions
grand+0.069
 disobedience+0.066
 hairc+0.066
 protests+0.066
 divorce+0.065
Neg logit contributions
 camps-0.061
 fish-0.060
 hay-0.059
 knights-0.058
 outside-0.058
Token't
Token ID470
Feature activation+1.22e-01

Pos logit contributions
grand+0.178
t+0.163
 disobedience+0.153
 hairc+0.151
 divorce+0.148
Neg logit contributions
 camps-0.362
 knights-0.344
latable-0.338
 hay-0.331
 clearing-0.328
Token yours
Token ID12431
Feature activation+7.48e-02

Pos logit contributions
grand+0.181
 lesson+0.164
 hairc+0.161
 policy+0.160
 supervision+0.159
Neg logit contributions
 camps-0.201
latable-0.188
acking-0.178
 knights-0.172
 clearing-0.169
Token."
Token ID526
Feature activation+5.12e-02

Pos logit contributions
grand+0.135
t+0.126
 disobedience+0.124
 hairc+0.124
 supervision+0.123
Neg logit contributions
 camps-0.096
latable-0.088
 clearing-0.082
 hay-0.081
 knights-0.080
Token
Token ID198
Feature activation+5.49e-02

Pos logit contributions
grand+0.091
 disobedience+0.085
 hairc+0.083
t+0.082
 supervision+0.081
Neg logit contributions
 camps-0.066
latable-0.059
 knights-0.055
 astronaut-0.054
 clearing-0.053
Token
Token ID198
Feature activation+4.17e-02

Pos logit contributions
grand+0.100
 disobedience+0.092
t+0.091
 hairc+0.091
 divorce+0.089
Neg logit contributions
 camps-0.069
 knights-0.061
latable-0.061
 hay-0.057
 clearing-0.056
Tokenies
Token ID444
Feature activation+9.91e-04

Pos logit contributions
grand+0.058
t+0.057
 hairc+0.057
 promise+0.055
 worry+0.055
Neg logit contributions
 camps-0.038
latable-0.035
 knights-0.032
 clearing-0.032
aw-0.031
Token spinning
Token ID19493
Feature activation-7.22e-02

Pos logit contributions
grand+0.002
t+0.002
 supervision+0.002
 hairc+0.002
 disobedience+0.002
Neg logit contributions
 camps-0.001
latable-0.001
 knights-0.001
acking-0.001
 clearing-0.001
Token
Token ID220
Feature activation+1.69e-01

Pos logit contributions
 camps+0.068
latable+0.059
 knights+0.056
 clearing+0.053
 hay+0.052
Neg logit contributions
grand-0.149
 hairc-0.145
t-0.144
 disobedience-0.141
 lesson-0.140
Token their
Token ID511
Feature activation+5.24e-02

Pos logit contributions
grand+0.240
 disobedience+0.218
 hairc+0.215
 supervision+0.212
 divorce+0.209
Neg logit contributions
 camps-0.278
 knights-0.253
latable-0.250
 hay-0.246
 clearing-0.241
Token magical
Token ID10883
Feature activation+9.12e-03

Pos logit contributions
grand+0.101
 disobedience+0.093
 divorce+0.090
 supervision+0.090
 hairc+0.090
Neg logit contributions
 camps-0.062
 knights-0.056
 outside-0.056
 pond-0.055
 jungle-0.054
Token
Token ID220
Feature activation+1.76e-01

Pos logit contributions
grand+0.015
 disobedience+0.014
 hairc+0.014
 supervision+0.014
 divorce+0.014
Neg logit contributions
 camps-0.013
 knights-0.012
latable-0.011
 hay-0.011
 clearing-0.011
Token club
Token ID3430
Feature activation+4.41e-03

Pos logit contributions
grand+0.235
 disobedience+0.211
 hairc+0.208
t+0.206
 supervision+0.204
Neg logit contributions
 camps-0.307
 knights-0.282
latable-0.280
 hay-0.275
 clearing-0.270
Token."
Token ID22135
Feature activation+5.48e-02

Pos logit contributions
grand+0.009
t+0.008
 disobedience+0.008
 hairc+0.008
 supervision+0.008
Neg logit contributions
 camps-0.005
latable-0.004
 knights-0.004
 hay-0.004
 clearing-0.004
Token
Token ID198
Feature activation+4.70e-02

Pos logit contributions
grand+0.094
 disobedience+0.087
 hairc+0.086
 supervision+0.085
 divorce+0.084
Neg logit contributions
 camps-0.074
 knights-0.067
latable-0.065
 hay-0.064
 clearing-0.063
Token
Token ID198
Feature activation+3.50e-02

Pos logit contributions
grand+0.106
 hairc+0.106
 supervision+0.105
 disobedience+0.102
 divorce+0.100
Neg logit contributions
 camps-0.033
 outside-0.029
 He-0.029
 clearing-0.028
 and-0.028
TokenSarah
Token ID29284
Feature activation+3.81e-02

Pos logit contributions
 hairc+0.063
grand+0.063
 supervision+0.062
 disobedience+0.060
 divorce+0.059
Neg logit contributions
 camps-0.041
 clearing-0.037
 knights-0.036
aw-0.036
 hay-0.036
Token made
Token ID925
Feature activation+1.94e-02

Pos logit contributions
grand+0.013
 disobedience+0.012
 hairc+0.012
 supervision+0.012
 divorce+0.012
Neg logit contributions
 camps-0.011
 knights-0.010
latable-0.010
 hay-0.010
 fish-0.010
Token Ruby
Token ID10888
Feature activation+6.54e-02

Pos logit contributions
grand+0.039
 hairc+0.037
 disobedience+0.036
 supervision+0.036
 divorce+0.035
Neg logit contributions
 camps-0.021
latable-0.018
 knights-0.017
 clearing-0.017
acking-0.017
Token a
Token ID257
Feature activation+3.41e-02

Pos logit contributions
grand+0.111
 hairc+0.103
 disobedience+0.101
 promise+0.101
 worry+0.101
Neg logit contributions
 camps-0.090
latable-0.082
 knights-0.077
 hay-0.077
 clearing-0.076
Token princess
Token ID21752
Feature activation+2.92e-02

Pos logit contributions
grand+0.060
 disobedience+0.056
 supervision+0.055
 hairc+0.055
t+0.054
Neg logit contributions
 camps-0.045
 knights-0.040
 hay-0.039
latable-0.038
 clearing-0.038
Token.
Token ID13
Feature activation+2.25e-03

Pos logit contributions
grand+0.054
 note+0.051
 promise+0.051
 hairc+0.049
t+0.047
Neg logit contributions
latable-0.037
 camps-0.035
acking-0.030
 fields-0.030
 hay-0.030
Token Isn
Token ID25110
Feature activation+1.76e-01

Pos logit contributions
grand+0.004
t+0.004
 disobedience+0.004
rav+0.004
 hairc+0.004
Neg logit contributions
 camps-0.003
latable-0.003
acking-0.003
 clearing-0.003
aw-0.003
Token't
Token ID470
Feature activation+1.18e-01

Pos logit contributions
grand+0.198
 disobedience+0.174
 hairc+0.172
t+0.172
 divorce+0.166
Neg logit contributions
 camps-0.344
 knights-0.318
latable-0.315
 hay-0.312
 clearing-0.307
Token she
Token ID673
Feature activation+3.13e-02

Pos logit contributions
grand+0.207
 disobedience+0.192
 hairc+0.190
 supervision+0.188
 divorce+0.185
Neg logit contributions
 camps-0.157
 knights-0.141
latable-0.136
 hay-0.136
 clearing-0.132
Token much
Token ID881
Feature activation+7.78e-02

Pos logit contributions
grand+0.055
 disobedience+0.052
 hairc+0.051
 supervision+0.051
t+0.050
Neg logit contributions
 camps-0.040
 knights-0.037
latable-0.035
 hay-0.035
 fish-0.034
Token better
Token ID1365
Feature activation+7.63e-02

Pos logit contributions
grand+0.143
 disobedience+0.133
 supervision+0.133
 hairc+0.131
t+0.128
Neg logit contributions
 camps-0.097
 knights-0.087
 hay-0.083
 outside-0.081
latable-0.081
Token now
Token ID783
Feature activation+1.16e-01

Pos logit contributions
grand+0.145
 disobedience+0.133
 hairc+0.133
 divorce+0.129
t+0.128
Neg logit contributions
 camps-0.093
latable-0.084
 knights-0.080
 hay-0.079
 clearing-0.078
Token is
Token ID318
Feature activation+4.31e-02

Pos logit contributions
grand+0.051
 lesson+0.047
 disobedience+0.046
 hairc+0.046
 promise+0.045
Neg logit contributions
 camps-0.049
latable-0.045
 knights-0.042
acking-0.042
 hay-0.041
Token the
Token ID262
Feature activation+2.09e-02

Pos logit contributions
grand+0.081
 hairc+0.076
 disobedience+0.076
 supervision+0.075
 lesson+0.075
Neg logit contributions
 camps-0.049
latable-0.043
acking-0.043
 knights-0.042
aw-0.039
Token fairy
Token ID25607
Feature activation+2.73e-02

Pos logit contributions
grand+0.036
 disobedience+0.033
 hairc+0.032
 supervision+0.032
 divorce+0.032
Neg logit contributions
 camps-0.029
 knights-0.026
latable-0.025
 hay-0.025
 clearing-0.025
Token garden
Token ID11376
Feature activation-4.84e-03

Pos logit contributions
grand+0.052
 disobedience+0.048
 hairc+0.048
 supervision+0.047
 divorce+0.047
Neg logit contributions
 camps-0.033
latable-0.028
 knights-0.028
 hay-0.027
 clearing-0.026
Token.
Token ID13
Feature activation-6.84e-03

Pos logit contributions
 camps+0.005
latable+0.004
 knights+0.004
 clearing+0.004
 hay+0.004
Neg logit contributions
grand-0.010
 disobedience-0.009
 hairc-0.009
 supervision-0.009
t-0.009
Token Isn
Token ID25110
Feature activation+1.76e-01

Pos logit contributions
 camps+0.011
latable+0.009
acking+0.009
 football+0.009
 hay+0.009
Neg logit contributions
grand-0.012
 disobedience-0.009
 hairc-0.009
t-0.009
 supervision-0.009
Token't
Token ID470
Feature activation+1.19e-01

Pos logit contributions
grand+0.189
t+0.167
 disobedience+0.160
 hairc+0.159
 divorce+0.155
Neg logit contributions
 camps-0.356
 knights-0.332
latable-0.326
 hay-0.325
 clearing-0.318
Token it
Token ID340
Feature activation+1.21e-01

Pos logit contributions
grand+0.209
 disobedience+0.189
 hairc+0.189
 divorce+0.186
t+0.185
Neg logit contributions
 camps-0.162
latable-0.147
 knights-0.141
 hay-0.137
acking-0.137
Token wonderful
Token ID7932
Feature activation+4.25e-02

Pos logit contributions
grand+0.218
 disobedience+0.201
 hairc+0.200
 divorce+0.196
 supervision+0.196
Neg logit contributions
 camps-0.156
latable-0.137
 knights-0.137
 hay-0.132
 clearing-0.130
Token?"
Token ID1701
Feature activation+4.67e-02

Pos logit contributions
grand+0.079
 hairc+0.073
 disobedience+0.073
 divorce+0.071
t+0.071
Neg logit contributions
 camps-0.052
latable-0.046
 knights-0.044
 hay-0.044
 clearing-0.043
Token
Token ID198
Feature activation+4.90e-02

Pos logit contributions
grand+0.103
 disobedience+0.097
 supervision+0.096
 hairc+0.095
 divorce+0.095
Neg logit contributions
 camps-0.038
 knights-0.035
 He-0.032
aw-0.031
 outside-0.031
TokenLook
Token ID8567
Feature activation+9.68e-02

Pos logit contributions
grand+0.087
 disobedience+0.079
 hairc+0.079
t+0.077
 supervision+0.077
Neg logit contributions
 camps-0.083
 knights-0.075
latable-0.074
 hay-0.072
 clearing-0.070
Token at
Token ID379
Feature activation+4.22e-02

Pos logit contributions
grand+0.207
 hairc+0.197
 disobedience+0.193
 supervision+0.192
 divorce+0.190
Neg logit contributions
 camps-0.089
 knights-0.074
latable-0.073
 hay-0.071
 clearing-0.071
Token my
Token ID616
Feature activation+8.02e-02

Pos logit contributions
grand+0.072
 hairc+0.069
 disobedience+0.068
 supervision+0.068
 divorce+0.067
Neg logit contributions
 camps-0.055
 knights-0.050
latable-0.049
 hay-0.047
acking-0.046
Token veil
Token ID30615
Feature activation+7.53e-02

Pos logit contributions
grand+0.106
 disobedience+0.095
 hairc+0.094
 supervision+0.092
t+0.091
Neg logit contributions
 camps-0.143
 knights-0.131
latable-0.128
 hay-0.127
 clearing-0.124
Token!
Token ID0
Feature activation+1.97e-02

Pos logit contributions
grand+0.141
t+0.134
 promise+0.129
 disobedience+0.127
 worry+0.127
Neg logit contributions
 camps-0.089
latable-0.084
 knights-0.077
 clearing-0.074
 hay-0.073
Token Isn
Token ID25110
Feature activation+1.75e-01

Pos logit contributions
grand+0.031
rav+0.027
 hairc+0.026
t+0.026
 rhy+0.026
Neg logit contributions
 camps-0.031
latable-0.028
acking-0.028
 hay-0.027
 clearing-0.027
Token't
Token ID470
Feature activation+1.18e-01

Pos logit contributions
grand+0.190
t+0.168
 disobedience+0.163
 hairc+0.162
 divorce+0.157
Neg logit contributions
 camps-0.351
 knights-0.327
latable-0.320
 hay-0.320
 clearing-0.315
Token it
Token ID340
Feature activation+1.24e-01

Pos logit contributions
grand+0.202
 disobedience+0.185
 hairc+0.185
 supervision+0.181
 divorce+0.180
Neg logit contributions
 camps-0.164
 knights-0.145
latable-0.145
 hay-0.140
 clearing-0.137
Token lovely
Token ID14081
Feature activation+4.21e-02

Pos logit contributions
grand+0.226
 disobedience+0.210
 hairc+0.208
 supervision+0.205
 divorce+0.203
Neg logit contributions
 camps-0.156
 knights-0.138
latable-0.135
 hay-0.133
 clearing-0.129
Token?"
Token ID1701
Feature activation+5.43e-02

Pos logit contributions
grand+0.063
 hairc+0.060
t+0.059
 lesson+0.058
 divorce+0.057
Neg logit contributions
 camps-0.062
latable-0.060
 hay-0.055
 clearing-0.055
 knights-0.055
Token Anna
Token ID11735
Feature activation+3.95e-02

Pos logit contributions
grand+0.110
 disobedience+0.103
 supervision+0.102
 hairc+0.102
 divorce+0.100
Neg logit contributions
 camps-0.057
 knights-0.050
latable-0.048
 hay-0.047
 clearing-0.045
Token I
Token ID314
Feature activation+1.45e-02

Pos logit contributions
grand+0.085
 hairc+0.079
 disobedience+0.077
 supervision+0.075
 divorce+0.075
Neg logit contributions
 camps-0.083
latable-0.074
acking-0.074
 knights-0.071
 clearing-0.071
Token painted
Token ID13055
Feature activation-1.23e-02

Pos logit contributions
grand+0.026
 supervision+0.025
 disobedience+0.025
 hairc+0.025
 protests+0.024
Neg logit contributions
 camps-0.017
 knights-0.015
 outside-0.015
 filled-0.015
latable-0.015
Token the
Token ID262
Feature activation+4.68e-02

Pos logit contributions
 camps+0.016
latable+0.016
acking+0.015
 clearing+0.014
 knights+0.013
Neg logit contributions
 hairc-0.021
grand-0.019
 rhy-0.018
 note-0.018
 cigarettes-0.018
Token wall
Token ID3355
Feature activation+5.76e-02

Pos logit contributions
grand+0.089
 supervision+0.084
rav+0.083
 disobedience+0.083
 hairc+0.080
Neg logit contributions
 camps-0.053
 jungle-0.052
 outside-0.052
 knights-0.051
 hay-0.049
Token.
Token ID13
Feature activation+1.33e-02

Pos logit contributions
t+0.104
grand+0.101
 hairc+0.098
 promise+0.096
 note+0.096
Neg logit contributions
 camps-0.070
latable-0.068
 hay-0.058
 knights-0.058
 clearing-0.057
Token Isn
Token ID25110
Feature activation+1.75e-01

Pos logit contributions
grand+0.021
t+0.017
 hairc+0.017
 rhy+0.017
 disobedience+0.017
Neg logit contributions
 camps-0.022
latable-0.020
acking-0.019
 knights-0.019
 hay-0.019
Token't
Token ID470
Feature activation+1.19e-01

Pos logit contributions
grand+0.190
t+0.169
 disobedience+0.164
 hairc+0.163
 divorce+0.157
Neg logit contributions
 camps-0.348
 knights-0.324
latable-0.319
 hay-0.319
 clearing-0.312
Token it
Token ID340
Feature activation+1.22e-01

Pos logit contributions
grand+0.207
 disobedience+0.191
 hairc+0.190
 supervision+0.186
 divorce+0.185
Neg logit contributions
 camps-0.162
 knights-0.143
latable-0.142
 hay-0.139
 clearing-0.135
Token nice
Token ID3621
Feature activation+3.86e-02

Pos logit contributions
grand+0.229
 disobedience+0.213
 hairc+0.210
 supervision+0.210
 divorce+0.205
Neg logit contributions
 camps-0.147
 knights-0.131
latable-0.126
 hay-0.125
 fish-0.121
Token and
Token ID290
Feature activation+8.47e-02

Pos logit contributions
grand+0.067
 hairc+0.063
t+0.063
 disobedience+0.061
 divorce+0.061
Neg logit contributions
 camps-0.051
latable-0.047
 hay-0.044
 knights-0.043
 clearing-0.043
Token spicy
Token ID26880
Feature activation+3.65e-02

Pos logit contributions
grand+0.152
 disobedience+0.140
 hairc+0.140
 supervision+0.137
 divorce+0.136
Neg logit contributions
 camps-0.112
 knights-0.097
latable-0.097
 hay-0.093
 clearing-0.092
Token remind
Token ID7101
Feature activation+9.91e-02

Pos logit contributions
grand+0.134
 worry+0.128
 disobedience+0.126
 promise+0.125
 disob+0.125
Neg logit contributions
 camps-0.178
 knights-0.161
latable-0.161
aw-0.157
 hay-0.154
Token him
Token ID683
Feature activation+8.19e-02

Pos logit contributions
grand+0.202
 disobedience+0.190
 hairc+0.190
 supervision+0.187
 divorce+0.184
Neg logit contributions
 camps-0.105
latable-0.087
 knights-0.087
 hay-0.083
 clearing-0.081
Token that
Token ID326
Feature activation+5.16e-02

Pos logit contributions
grand+0.156
 disobedience+0.148
 hairc+0.148
 lesson+0.147
t+0.145
Neg logit contributions
 camps-0.102
latable-0.088
 clearing-0.083
acking-0.081
 trailer-0.080
Token scary
Token ID14343
Feature activation+3.46e-03

Pos logit contributions
grand+0.106
 supervision+0.096
rav+0.096
t+0.096
 divorce+0.094
Neg logit contributions
 camps-0.051
 knights-0.050
 fish-0.048
 outside-0.046
 sailing-0.045
Token stories
Token ID3923
Feature activation+3.85e-02

Pos logit contributions
grand+0.006
 supervision+0.005
 disobedience+0.005
 hairc+0.005
t+0.005
Neg logit contributions
 camps-0.005
 knights-0.004
 hay-0.004
 fish-0.004
 outside-0.004
Token aren
Token ID3588
Feature activation+1.75e-01

Pos logit contributions
grand+0.065
 disobedience+0.060
 hairc+0.059
 supervision+0.058
 divorce+0.058
Neg logit contributions
 camps-0.053
 knights-0.048
latable-0.047
 hay-0.046
 clearing-0.045
Token't
Token ID470
Feature activation+1.29e-01

Pos logit contributions
grand+0.198
 disobedience+0.176
 hairc+0.172
t+0.172
 supervision+0.168
Neg logit contributions
 camps-0.341
 knights-0.316
latable-0.312
 hay-0.306
 clearing-0.302
Token real
Token ID1103
Feature activation+7.45e-02

Pos logit contributions
grand+0.222
 lesson+0.211
 disobedience+0.209
 divorce+0.208
 hairc+0.207
Neg logit contributions
 camps-0.175
latable-0.160
 knights-0.150
acking-0.149
aw-0.148
Token life
Token ID1204
Feature activation+8.56e-02

Pos logit contributions
grand+0.154
 disobedience+0.145
 hairc+0.144
 supervision+0.142
 divorce+0.141
Neg logit contributions
 camps-0.076
latable-0.064
 knights-0.064
 hay-0.061
 clearing-0.060
Token.
Token ID13
Feature activation+2.14e-02

Pos logit contributions
grand+0.182
 disobedience+0.171
 hairc+0.170
 supervision+0.168
 divorce+0.167
Neg logit contributions
 camps-0.081
 knights-0.068
latable-0.067
 hay-0.065
 clearing-0.062
Token<|endoftext|>
Token ID50256
Feature activation+1.90e-02

Pos logit contributions
grand+0.039
 disobedience+0.036
t+0.035
 divorce+0.034
 supervision+0.034
Neg logit contributions
 camps-0.028
 knights-0.025
latable-0.024
 hay-0.023
 stable-0.023
Token.
Token ID13
Feature activation+1.12e-02

Pos logit contributions
grand+0.176
 hairc+0.164
 disobedience+0.163
t+0.162
 supervision+0.160
Neg logit contributions
 camps-0.103
latable-0.091
 clearing-0.082
 knights-0.082
 hay-0.081
Token Thank
Token ID6952
Feature activation+5.64e-02

Pos logit contributions
grand+0.018
 disobedience+0.016
 hairc+0.016
 supervision+0.016
 policy+0.015
Neg logit contributions
 camps-0.018
latable-0.016
acking-0.015
 hay-0.015
 knights-0.015
Token you
Token ID345
Feature activation+1.16e-01

Pos logit contributions
grand+0.084
 disobedience+0.079
 divorce+0.079
 supervision+0.076
 virus+0.075
Neg logit contributions
 knights-0.085
 camps-0.082
aw-0.081
 fish-0.080
 outside-0.079
Token for
Token ID329
Feature activation+6.55e-02

Pos logit contributions
grand+0.241
 disobedience+0.226
 hairc+0.225
t+0.223
 supervision+0.222
Neg logit contributions
 camps-0.112
latable-0.098
 knights-0.092
 clearing-0.091
 hay-0.089
Token being
Token ID852
Feature activation+4.84e-02

Pos logit contributions
grand+0.116
 disobedience+0.108
 hairc+0.107
 supervision+0.106
 divorce+0.104
Neg logit contributions
 camps-0.087
latable-0.076
 knights-0.075
 hay-0.073
 clearing-0.071
Token so
Token ID523
Feature activation+1.75e-01

Pos logit contributions
grand+0.092
 hairc+0.085
 disobedience+0.085
 supervision+0.083
 divorce+0.083
Neg logit contributions
 camps-0.060
latable-0.050
 knights-0.049
 hay-0.048
 clearing-0.047
Token nice
Token ID3621
Feature activation+3.42e-02

Pos logit contributions
grand+0.311
 disobedience+0.287
 supervision+0.284
 hairc+0.284
 divorce+0.277
Neg logit contributions
 camps-0.224
 knights-0.207
 hay-0.195
latable-0.194
 fish-0.193
Token and
Token ID290
Feature activation+8.23e-02

Pos logit contributions
grand+0.071
 hairc+0.068
 disobedience+0.067
t+0.066
 supervision+0.065
Neg logit contributions
 camps-0.034
latable-0.033
 clearing-0.028
 hay-0.026
acking-0.025
Token trusting
Token ID33914
Feature activation+4.78e-02

Pos logit contributions
grand+0.126
 disobedience+0.116
 hairc+0.115
 supervision+0.114
 worry+0.111
Neg logit contributions
 camps-0.132
latable-0.118
 knights-0.111
aw-0.111
 hay-0.111
Token me
Token ID502
Feature activation+1.07e-01

Pos logit contributions
grand+0.096
 disobedience+0.090
 divorce+0.089
 supervision+0.088
t+0.087
Neg logit contributions
 camps-0.047
 knights-0.045
 outside-0.044
 hay-0.043
 fish-0.042
Token."
Token ID526
Feature activation+5.17e-02

Pos logit contributions
grand+0.217
 disobedience+0.203
 hairc+0.202
 supervision+0.199
t+0.198
Neg logit contributions
 camps-0.111
latable-0.095
 knights-0.094
 hay-0.089
 clearing-0.088
Token and
Token ID290
Feature activation+6.36e-02

Pos logit contributions
grand+0.282
 disobedience+0.263
 supervision+0.261
 hairc+0.258
t+0.255
Neg logit contributions
 camps-0.110
latable-0.094
 knights-0.088
 clearing-0.083
 hay-0.082
Token not
Token ID407
Feature activation+8.54e-02

Pos logit contributions
grand+0.096
 disobedience+0.087
 hairc+0.086
 supervision+0.085
 divorce+0.083
Neg logit contributions
 camps-0.102
latable-0.092
 knights-0.090
 hay-0.088
 clearing-0.088
Token eat
Token ID4483
Feature activation-2.08e-06

Pos logit contributions
grand+0.129
 disobedience+0.125
 disob+0.122
 worry+0.122
 hairc+0.121
Neg logit contributions
 camps-0.131
latable-0.118
 knights-0.116
 hay-0.112
aw-0.111
Token things
Token ID1243
Feature activation+1.06e-01

Feature not active on prev token; no predictions were affected.
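Every per-token entry in this listing follows the same fixed layout. First comes a `Token` line, where the token string is fused directly to the label and any leading space belongs to the token: `Token fairy` is the token ` fairy`, while `Token't` is `'t`. Next are a `Token ID` line and a `Feature activation` line. The entry ends with five positive and five negative logit contributions, each printed as a token string immediately followed by a signed value. When the feature was inactive on the previous token, both lists are replaced by the "Feature not active on prev token" message. The Python sketch below is a minimal, hypothetical parser for that layout. The names `TokenRecord`, `parse_record` and `split_records` are illustrative and are not part of any tool's API. It assumes the export always looks exactly like the entries shown here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# A contribution line: token string fused to a signed decimal, e.g. " hairc+0.076".
# The greedy (.*) keeps everything up to the last sign, including any leading space.
ENTRY = re.compile(r"^(.*)([+-]\d+(?:\.\d+)?)$")


@dataclass
class TokenRecord:
    token: str  # token string as printed, leading space included
    token_id: int
    activation: float
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


def parse_entry(line: str) -> Tuple[str, float]:
    m = ENTRY.match(line)
    if m is None:
        raise ValueError(f"not a logit-contribution line: {line!r}")
    return m.group(1), float(m.group(2))


def parse_record(text: str) -> TokenRecord:
    lines = text.strip("\n").split("\n")
    if not (lines[0].startswith("Token") and lines[1].startswith("Token ID")):
        raise ValueError("record must start with a Token line and a Token ID line")
    # Header fields are positional; each label is fused to its value.
    rec = TokenRecord(
        token=lines[0][len("Token"):],
        token_id=int(lines[1][len("Token ID"):]),
        activation=float(lines[2][len("Feature activation"):]),
    )
    target = None  # the contribution list currently being filled
    for line in lines[3:]:
        if line == "Pos logit contributions":
            target = rec.pos
        elif line == "Neg logit contributions":
            target = rec.neg
        elif line and target is not None:
            target.append(parse_entry(line))
        # Blank lines and the "Feature not active on prev token" message,
        # which appears instead of the two lists, are skipped.
    return rec


def split_records(dump: str) -> List[str]:
    # A record starts at a "Token..." line immediately followed by "Token ID...".
    # Lines before the first such header (a partial record) are dropped.
    lines = dump.split("\n")
    starts = [i for i in range(len(lines) - 1)
              if lines[i].startswith("Token") and lines[i + 1].startswith("Token ID")]
    ends = starts[1:] + [len(lines)]
    return ["\n".join(lines[a:b]) for a, b in zip(starts, ends)]
```

The splitter detects a record by the header pair (a `Token` line followed by a `Token ID` line) instead of the `Token` prefix alone. This keeps a token string such as ` ID` from being mistaken for the ID line.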
Token that
Token ID326
Feature activation+5.93e-02

Pos logit contributions
grand+0.210
 supervision+0.202
 hairc+0.201
t+0.201
 disobedience+0.199
Neg logit contributions
 camps-0.118
 clearing-0.101
latable-0.100
 knights-0.094
 jungle-0.093
Token aren
Token ID3588
Feature activation+1.74e-01

Pos logit contributions
grand+0.096
 babys+0.096
oral+0.092
 protests+0.091
 divorce+0.090
Neg logit contributions
 fish-0.083
 and-0.078
 outside-0.076
 like-0.075
 He-0.075
Token't
Token ID470
Feature activation+1.30e-01

Pos logit contributions
grand+0.199
 disobedience+0.176
 hairc+0.173
t+0.171
 supervision+0.169
Neg logit contributions
 camps-0.338
 knights-0.313
latable-0.310
 hay-0.305
 clearing-0.301
Token good
Token ID922
Feature activation+9.23e-02

Pos logit contributions
grand+0.207
 lesson+0.206
 disobedience+0.194
 worry+0.194
 policy+0.193
Neg logit contributions
 camps-0.193
latable-0.176
acking-0.169
 knights-0.163
aw-0.161
Token for
Token ID329
Feature activation+6.73e-02

Pos logit contributions
grand+0.174
 disobedience+0.169
 hairc+0.166
 policy+0.165
 supervision+0.165
Neg logit contributions
 camps-0.109
latable-0.098
 clearing-0.097
 hay-0.092
 jungle-0.088
Token him
Token ID683
Feature activation+9.71e-02

Pos logit contributions
grand+0.118
 disobedience+0.112
 hairc+0.111
 supervision+0.111
 divorce+0.107
Neg logit contributions
 camps-0.089
latable-0.078
 knights-0.076
 clearing-0.074
 hay-0.073
Token.
Token ID13
Feature activation+1.53e-02

Pos logit contributions
grand+0.210
 disobedience+0.197
 hairc+0.196
 supervision+0.194
 divorce+0.192
Neg logit contributions
 camps-0.089
 knights-0.075
latable-0.073
 hay-0.071
 clearing-0.068
Tokenacy
Token ID1590
Feature activation+7.82e-02

Pos logit contributions
grand+0.104
 supervision+0.093
 disobedience+0.092
 divorce+0.091
 hairc+0.087
Neg logit contributions
 camps-0.152
 knights-0.142
latable-0.136
 outside-0.135
acking-0.135
Token and
Token ID290
Feature activation+3.14e-02

Pos logit contributions
grand+0.161
 disobedience+0.150
 hairc+0.150
 supervision+0.149
t+0.146
Neg logit contributions
 camps-0.080
 knights-0.068
latable-0.067
 hay-0.064
 clearing-0.062
Token said
Token ID531
Feature activation+4.99e-02

Pos logit contributions
grand+0.038
 hairc+0.034
 disobedience+0.034
 supervision+0.033
 divorce+0.032
Neg logit contributions
 camps-0.059
latable-0.054
 knights-0.054
 hay-0.052
 jungle-0.051
Token,
Token ID11
Feature activation+3.54e-02

Pos logit contributions
grand+0.095
 hairc+0.094
t+0.092
 disobedience+0.091
rav+0.091
Neg logit contributions
 camps-0.059
latable-0.051
 clearing-0.049
 hay-0.048
 jungle-0.047
Token "
Token ID366
Feature activation+5.29e-02

Pos logit contributions
grand+0.062
t+0.060
 disobedience+0.059
 hairc+0.059
rav+0.058
Neg logit contributions
 camps-0.045
latable-0.040
 knights-0.038
 hay-0.037
aw-0.037
TokenIsn
Token ID41451
Feature activation+1.74e-01

Pos logit contributions
grand+0.087
 disobedience+0.081
 supervision+0.080
 hairc+0.080
 divorce+0.079
Neg logit contributions
 camps-0.075
 knights-0.067
latable-0.066
 hay-0.065
 clearing-0.065
Token't
Token ID470
Feature activation+1.12e-01

Pos logit contributions
grand+0.221
 disobedience+0.199
 hairc+0.196
 supervision+0.196
 divorce+0.188
Neg logit contributions
 camps-0.315
 knights-0.288
latable-0.286
 hay-0.281
 clearing-0.277
Token the
Token ID262
Feature activation+5.74e-02

Pos logit contributions
grand+0.187
 hairc+0.172
 disobedience+0.167
 divorce+0.167
t+0.165
Neg logit contributions
 camps-0.162
latable-0.152
acking-0.145
 knights-0.145
 hay-0.137
Token sunset
Token ID26428
Feature activation+8.65e-02

Pos logit contributions
grand+0.111
 disobedience+0.103
 hairc+0.103
 divorce+0.101
 supervision+0.101
Neg logit contributions
 camps-0.066
 knights-0.057
latable-0.057
 hay-0.055
 clearing-0.053
Token beautiful
Token ID4950
Feature activation+1.77e-02

Pos logit contributions
grand+0.161
 disobedience+0.150
 hairc+0.149
 supervision+0.147
 divorce+0.146
Neg logit contributions
 camps-0.104
 knights-0.091
latable-0.090
 hay-0.088
 clearing-0.085
Token?"
Token ID1701
Feature activation+5.65e-02

Pos logit contributions
grand+0.031
 hairc+0.029
 disobedience+0.029
 divorce+0.029
 lesson+0.028
Neg logit contributions
 camps-0.023
latable-0.021
 knights-0.020
 hay-0.019
 clearing-0.019
Token fair
Token ID3148
Feature activation+4.26e-02

Pos logit contributions
grand+0.124
 disobedience+0.115
 hairc+0.114
 supervision+0.113
 divorce+0.112
Neg logit contributions
 camps-0.091
latable-0.080
 knights-0.079
 hay-0.076
 clearing-0.075
Token to
Token ID284
Feature activation+8.07e-02

Pos logit contributions
grand+0.074
 hairc+0.070
 disobedience+0.069
 supervision+0.069
t+0.068
Neg logit contributions
 camps-0.057
latable-0.053
 clearing-0.051
 knights-0.049
 hay-0.049
Token take
Token ID1011
Feature activation-9.18e-03

Pos logit contributions
grand+0.137
 disobedience+0.128
 hairc+0.127
 supervision+0.126
t+0.125
Neg logit contributions
 camps-0.105
 knights-0.096
 hay-0.095
 fish-0.095
latable-0.094
Token something
Token ID1223
Feature activation+7.10e-02

Pos logit contributions
 camps+0.012
 knights+0.012
 hay+0.011
 outside+0.011
 fish+0.011
Neg logit contributions
grand-0.016
 disobedience-0.014
t-0.014
 divorce-0.014
rav-0.014
Token that
Token ID326
Feature activation+4.24e-02

Pos logit contributions
grand+0.129
 disobedience+0.118
rav+0.116
 hairc+0.116
 supervision+0.116
Neg logit contributions
 camps-0.089
 outside-0.078
 knights-0.078
 hay-0.076
 fish-0.076
Token isn
Token ID2125
Feature activation+1.74e-01

Pos logit contributions
grand+0.069
 hairc+0.066
 protests+0.065
 disobedience+0.065
 babys+0.065
Neg logit contributions
 fish-0.054
 outside-0.053
 hay-0.052
 camps-0.051
 and-0.051
Token't
Token ID470
Feature activation+1.20e-01

Pos logit contributions
grand+0.171
t+0.159
 disobedience+0.146
 hairc+0.144
 divorce+0.143
Neg logit contributions
 camps-0.366
 knights-0.349
latable-0.338
 hay-0.334
 clearing-0.334
Token yours
Token ID12431
Feature activation+7.13e-02

Pos logit contributions
grand+0.180
 lesson+0.163
 hairc+0.159
 divorce+0.159
 policy+0.157
Neg logit contributions
 camps-0.199
latable-0.181
acking-0.174
 knights-0.169
 clearing-0.167
Token."
Token ID526
Feature activation+4.76e-02

Pos logit contributions
grand+0.129
t+0.120
 disobedience+0.120
 supervision+0.119
 hairc+0.119
Neg logit contributions
 camps-0.091
latable-0.083
 clearing-0.078
 knights-0.077
 hay-0.076
Token
Token ID198
Feature activation+4.99e-02

Pos logit contributions
grand+0.091
 hairc+0.086
 disobedience+0.085
 supervision+0.084
 divorce+0.083
Neg logit contributions
 camps-0.052
aw-0.046
 hay-0.046
 He-0.046
 outside-0.046
Token
Token ID198
Feature activation+3.67e-02

Pos logit contributions
grand+0.095
 disobedience+0.088
 hairc+0.086
 supervision+0.085
t+0.085
Neg logit contributions
 camps-0.060
 knights-0.053
latable-0.051
 hay-0.050
 clearing-0.048
Token never
Token ID1239
Feature activation+8.02e-02

Pos logit contributions
grand+0.075
 disob+0.070
 promise+0.070
 worry+0.069
 disobedience+0.066
Neg logit contributions
 camps-0.130
 knights-0.118
 clearing-0.116
aw-0.116
 fields-0.115
Token play
Token ID711
Feature activation-2.24e-02

Pos logit contributions
grand+0.139
 disobedience+0.129
 hairc+0.128
 supervision+0.126
 divorce+0.125
Neg logit contributions
 camps-0.108
 knights-0.096
latable-0.095
 hay-0.093
 clearing-0.090
Token with
Token ID351
Feature activation+4.63e-02

Pos logit contributions
 camps+0.029
latable+0.025
acking+0.024
 clearing+0.023
acked+0.022
Neg logit contributions
 hairc-0.039
 note-0.038
 divorce-0.038
grand-0.037
 disobedience-0.037
Token things
Token ID1243
Feature activation+1.06e-01

Pos logit contributions
grand+0.080
 hairc+0.075
 disobedience+0.075
 supervision+0.074
 divorce+0.072
Neg logit contributions
 camps-0.063
latable-0.055
 knights-0.054
 clearing-0.053
acking-0.051
Token you
Token ID345
Feature activation+1.05e-01

Pos logit contributions
grand+0.209
t+0.207
 hairc+0.197
 supervision+0.196
 disobedience+0.196
Neg logit contributions
 camps-0.121
latable-0.109
 clearing-0.104
 jungle-0.096
 knights-0.096
Token don
Token ID836
Feature activation+1.74e-01

Pos logit contributions
grand+0.219
 disobedience+0.203
 hairc+0.202
 divorce+0.201
 supervision+0.199
Neg logit contributions
 camps-0.099
 knights-0.089
 outside-0.086
 fish-0.084
 hay-0.084
Token't
Token ID470
Feature activation+1.33e-01

Pos logit contributions
grand+0.199
t+0.175
 disobedience+0.173
 hairc+0.169
 divorce+0.164
Neg logit contributions
 camps-0.343
 knights-0.316
latable-0.313
 hay-0.307
 clearing-0.306
Token know
Token ID760
Feature activation+1.05e-01

Pos logit contributions
grand+0.198
 disobedience+0.181
 hairc+0.180
 supervision+0.177
 divorce+0.175
Neg logit contributions
 camps-0.208
 knights-0.191
latable-0.187
 hay-0.186
 outside-0.181
Token near
Token ID1474
Feature activation+4.39e-02

Pos logit contributions
grand+0.210
 hairc+0.196
 disobedience+0.196
 supervision+0.194
 divorce+0.190
Neg logit contributions
 camps-0.113
latable-0.096
 knights-0.095
 hay-0.091
 clearing-0.091
Token the
Token ID262
Feature activation+6.30e-02

Pos logit contributions
grand+0.081
 hairc+0.077
 disobedience+0.076
 supervision+0.076
 divorce+0.073
Neg logit contributions
 camps-0.054
latable-0.048
 knights-0.046
 clearing-0.044
acking-0.044
Token water
Token ID1660
Feature activation+4.30e-02

Pos logit contributions
grand+0.125
 hairc+0.119
 disobedience+0.118
 supervision+0.116
 divorce+0.116
Neg logit contributions
 camps-0.066
latable-0.058
 knights-0.057
 hay-0.054
 clearing-0.053
Token and
Token ID290
Feature activation+7.12e-02

Pos logit contributions
grand+0.163
 disobedience+0.155
t+0.155
 supervision+0.153
 hairc+0.152
Neg logit contributions
 camps-0.089
latable-0.076
 clearing-0.073
 hay-0.068
 knights-0.067
Token respectful
Token ID25923
Feature activation+3.32e-02

Pos logit contributions
grand+0.103
 hairc+0.092
 disobedience+0.092
 divorce+0.090
 supervision+0.089
Neg logit contributions
 camps-0.114
 knights-0.107
 hay-0.104
latable-0.103
 fish-0.101
Token of
Token ID286
Feature activation+1.64e-02

Pos logit contributions
grand+0.072
 disobedience+0.067
 hairc+0.067
 divorce+0.066
 supervision+0.066
Neg logit contributions
 camps-0.030
 knights-0.026
 hay-0.024
latable-0.024
 outside-0.023
Token animals
Token ID4695
Feature activation+3.24e-02

Pos logit contributions
grand+0.025
 disobedience+0.023
 hairc+0.023
 supervision+0.023
 divorce+0.022
Neg logit contributions
 camps-0.026
latable-0.023
 knights-0.022
acking-0.022
 clearing-0.022
Token we
Token ID356
Feature activation+7.45e-02

Pos logit contributions
grand+0.069
 disobedience+0.064
 hairc+0.064
 supervision+0.064
t+0.063
Neg logit contributions
 camps-0.031
latable-0.026
 knights-0.026
 clearing-0.025
 hay-0.024
Token don
Token ID836
Feature activation+1.74e-01

Pos logit contributions
grand+0.117
 disobedience+0.108
 hairc+0.107
 supervision+0.105
 divorce+0.105
Neg logit contributions
 camps-0.110
 knights-0.101
 hay-0.098
latable-0.098
 fish-0.095
Token't
Token ID470
Feature activation+1.32e-01

Pos logit contributions
grand+0.191
 disobedience+0.168
t+0.167
 hairc+0.165
 divorce+0.160
Neg logit contributions
 camps-0.345
 knights-0.319
latable-0.316
 hay-0.310
 clearing-0.309
Token know
Token ID760
Feature activation+1.03e-01

Pos logit contributions
grand+0.214
 disobedience+0.197
 hairc+0.195
 supervision+0.192
t+0.190
Neg logit contributions
 camps-0.195
 knights-0.174
latable-0.172
 hay-0.168
 clearing-0.165
Token.
Token ID13
Feature activation+6.70e-03

Pos logit contributions
grand+0.214
 disobedience+0.200
 hairc+0.198
 supervision+0.196
 divorce+0.195
Neg logit contributions
 camps-0.103
 knights-0.088
latable-0.085
 hay-0.084
 clearing-0.080
Token They
Token ID1119
Feature activation-5.66e-02

Pos logit contributions
grand+0.012
 disobedience+0.011
 hairc+0.011
 supervision+0.011
 divorce+0.011
Neg logit contributions
 camps-0.009
 knights-0.008
latable-0.008
 hay-0.008
 clearing-0.007
Token might
Token ID1244
Feature activation+7.96e-02

Pos logit contributions
 camps+0.076
 fish+0.071
 hay+0.070
 knights+0.070
 and+0.070
Neg logit contributions
grand-0.092
 hairc-0.088
 disobedience-0.085
 supervision-0.084
 policy-0.084
Token the
Token ID262
Feature activation-8.22e-02

Pos logit contributions
 camps+0.111
 knights+0.101
latable+0.101
acking+0.095
 astronaut+0.095
Neg logit contributions
 hairc-0.162
grand-0.158
 disobedience-0.153
 punishment-0.151
 lesson-0.149
Token beach
Token ID10481
Feature activation-1.19e-01

Pos logit contributions
 camps+0.113
 knights+0.105
latable+0.102
aw+0.098
acking+0.098
Neg logit contributions
grand-0.136
 hairc-0.129
 disobedience-0.123
 supervision-0.123
 divorce-0.121
Token.
Token ID13
Feature activation-6.59e-02

Pos logit contributions
 camps+0.123
latable+0.115
 knights+0.111
 astronaut+0.102
 clearing+0.099
Neg logit contributions
grand-0.232
 hairc-0.220
t-0.215
 disobedience-0.214
 note-0.212
Token Dad
Token ID17415
Feature activation-1.66e-03

Pos logit contributions
 camps+0.082
latable+0.075
 knights+0.074
 hay+0.070
 fish+0.069
Neg logit contributions
grand-0.121
 hairc-0.108
 disobedience-0.108
 supervision-0.106
t-0.106
Token was
Token ID373
Feature activation-4.55e-02

Pos logit contributions
 camps+0.003
 knights+0.003
latable+0.003
 jungle+0.003
 hay+0.003
Neg logit contributions
grand-0.002
 complained-0.002
 disagreed-0.002
t-0.002
 hairc-0.002
Token digging
Token ID18894
Feature activation-2.06e-01

Pos logit contributions
 camps+0.071
latable+0.064
 knights+0.063
 hay+0.063
 spears+0.060
Neg logit contributions
 hairc-0.065
 disob-0.063
 worry-0.062
 babys-0.061
grand-0.060
Token a
Token ID257
Feature activation-6.22e-02

Pos logit contributions
 camps+0.231
latable+0.229
 trailer+0.211
 astronaut+0.204
acking+0.193
Neg logit contributions
 lesson-0.351
t-0.331
 note-0.320
 hairc-0.319
 punishment-0.318
Token deep
Token ID2769
Feature activation-1.27e-01

Pos logit contributions
 camps+0.078
 knights+0.072
latable+0.071
 astronaut+0.066
aw+0.065
Neg logit contributions
grand-0.100
 note-0.098
 promise-0.097
 hairc-0.097
 disobedience-0.097
Token hole
Token ID7604
Feature activation-8.59e-02

Pos logit contributions
 knights+0.179
 lions+0.176
aw+0.172
 cops+0.171
 camps+0.171
Neg logit contributions
 worry-0.168
 moment-0.164
 reason-0.157
 mistake-0.156
 fuss-0.155
Token in
Token ID287
Feature activation-7.12e-02

Pos logit contributions
 camps+0.109
latable+0.100
 knights+0.098
 astronaut+0.095
 jungle+0.094
Neg logit contributions
t-0.157
 note-0.131
 disobedience-0.130
 worry-0.130
 complained-0.129
Token the
Token ID262
Feature activation-5.74e-02

Pos logit contributions
 camps+0.085
latable+0.083
aw+0.080
 knights+0.080
acking+0.079
Neg logit contributions
 worry-0.122
 hairc-0.120
 disobedience-0.118
 punishment-0.117
 lesson-0.115
TokenTim
Token ID14967
Feature activation-7.80e-02

Pos logit contributions
 camps+0.107
latable+0.094
 knights+0.094
 hay+0.091
 clearing+0.087
Neg logit contributions
grand-0.148
 disobedience-0.136
 hairc-0.134
 supervision-0.132
t-0.132
Tokenmy
Token ID1820
Feature activation-7.09e-02

Pos logit contributions
 camps+0.141
latable+0.137
 knights+0.126
 clearing+0.125
 sailing+0.124
Neg logit contributions
 promise-0.096
 complained-0.092
grand-0.091
 disagreed-0.085
 disob-0.085
Token worked
Token ID3111
Feature activation-1.36e-01

Pos logit contributions
 camps+0.136
latable+0.127
 clearing+0.124
 knights+0.122
 hay+0.122
Neg logit contributions
 complained-0.080
grand-0.076
 promise-0.075
 disob-0.073
 disagreed-0.072
Token hard
Token ID1327
Feature activation-8.23e-02

Pos logit contributions
 camps+0.204
 knights+0.183
latable+0.183
 hay+0.177
 clearing+0.173
Neg logit contributions
grand-0.212
 disobedience-0.199
 hairc-0.197
t-0.194
 supervision-0.192
Token,
Token ID11
Feature activation-4.96e-02

Pos logit contributions
 camps+0.096
latable+0.085
 knights+0.081
 clearing+0.079
 hay+0.078
Neg logit contributions
grand-0.154
 disobedience-0.150
t-0.150
 hairc-0.149
 lesson-0.146
Token digging
Token ID18894
Feature activation-2.06e-01

Pos logit contributions
 camps+0.065
latable+0.060
aw+0.056
 knights+0.054
 clearing+0.054
Neg logit contributions
 disobedience-0.083
grand-0.083
 hairc-0.083
 supervision-0.081
 worry-0.079
Token and
Token ID290
Feature activation-3.73e-02

Pos logit contributions
latable+0.257
 camps+0.250
acking+0.216
 jungle+0.215
 trailer+0.214
Neg logit contributions
 lesson-0.360
 punishment-0.323
 note-0.311
 instructions-0.310
t-0.310
Token piling
Token ID48060
Feature activation-1.13e-01

Pos logit contributions
 camps+0.049
latable+0.046
aw+0.043
 hay+0.041
 knights+0.040
Neg logit contributions
grand-0.064
 disobedience-0.062
 supervision-0.061
 hairc-0.060
t-0.059
Token the
Token ID262
Feature activation-4.06e-02

Pos logit contributions
 camps+0.142
latable+0.138
 astronaut+0.136
 jungle+0.132
 knights+0.131
Neg logit contributions
 hairc-0.179
 lesson-0.172
 instructions-0.161
 attention-0.158
 punishment-0.155
Token sand
Token ID6450
Feature activation-5.78e-02

Pos logit contributions
latable+0.048
 camps+0.048
acking+0.045
aw+0.044
und+0.044
Neg logit contributions
grand-0.068
 lesson-0.067
 hairc-0.066
 disobedience-0.064
 punishment-0.064
Token,
Token ID11
Feature activation-3.63e-02

Pos logit contributions
latable+0.070
 clearing+0.066
 camps+0.065
 astronaut+0.061
 knights+0.058
Neg logit contributions
t-0.099
 note-0.092
 lesson-0.092
 hairc-0.090
grand-0.089
Token big
Token ID1263
Feature activation-1.47e-01

Pos logit contributions
 camps+0.108
 knights+0.106
aw+0.099
latable+0.099
acking+0.096
Neg logit contributions
grand-0.128
 note-0.126
 virus-0.125
t-0.123
 promise-0.120
Token tree
Token ID5509
Feature activation-9.57e-02

Pos logit contributions
 camps+0.211
 knights+0.195
aw+0.193
latable+0.192
uting+0.188
Neg logit contributions
 virus-0.220
 worry-0.216
 note-0.213
grand-0.208
 hairc-0.205
Token.
Token ID13
Feature activation-7.35e-02

Pos logit contributions
 camps+0.135
latable+0.129
 knights+0.121
 astronaut+0.120
 football+0.112
Neg logit contributions
t-0.157
grand-0.156
 disobedience-0.144
 disob-0.144
 note-0.142
Token He
Token ID679
Feature activation-1.46e-01

Pos logit contributions
 camps+0.087
latable+0.080
 knights+0.070
 hay+0.070
acking+0.069
Neg logit contributions
grand-0.138
 hairc-0.123
 disobedience-0.122
t-0.122
 babys-0.117
Token was
Token ID373
Feature activation-5.01e-02

Pos logit contributions
 camps+0.229
latable+0.223
acking+0.205
 knights+0.205
 clearing+0.201
Neg logit contributions
grand-0.207
 hairc-0.195
 disob-0.189
 complained-0.188
 disobedience-0.186
Token lying
Token ID9105
Feature activation-2.04e-01

Pos logit contributions
 camps+0.088
latable+0.081
und+0.077
acking+0.077
aw+0.074
Neg logit contributions
 hairc-0.061
 disob-0.059
 worry-0.058
 punishment-0.054
 complained-0.054
Token on
Token ID319
Feature activation-6.40e-02

Pos logit contributions
latable+0.290
 camps+0.256
 trailer+0.225
 knights+0.224
acking+0.216
Neg logit contributions
t-0.338
 moment-0.282
 worry-0.279
 disobedience-0.278
 disagreed-0.278
Token the
Token ID262
Feature activation-4.63e-02

Pos logit contributions
acking+0.100
latable+0.092
 camps+0.090
aw+0.088
acked+0.087
Neg logit contributions
 hairc-0.098
 punishment-0.084
 lesson-0.084
 disobedience-0.078
 instructions-0.078
Token grass
Token ID8701
Feature activation-8.64e-02

Pos logit contributions
 camps+0.064
latable+0.059
acking+0.058
 knights+0.058
und+0.057
Neg logit contributions
grand-0.075
 hairc-0.070
 lesson-0.068
 disobedience-0.068
 supervision-0.067
Token and
Token ID290
Feature activation-3.51e-02

Pos logit contributions
latable+0.098
 camps+0.098
 astronaut+0.096
 knights+0.094
 hay+0.090
Neg logit contributions
t-0.162
 disagreed-0.131
 note-0.131
 worry-0.130
 disobedience-0.130
Token holding
Token ID4769
Feature activation-8.09e-02

Pos logit contributions
 camps+0.047
latable+0.046
acking+0.042
 trailer+0.039
 knights+0.039
Neg logit contributions
 hairc-0.056
grand-0.056
 disobedience-0.055
 complained-0.054
 worry-0.053
Token a
Token ID257
Feature activation-6.69e-02

Pos logit contributions
 camps+0.098
latable+0.088
acking+0.080
 knights+0.077
aw+0.075
Neg logit contributions
grand-0.188
 disobedience-0.185
 hairc-0.182
 punishment-0.181
 lesson-0.179
Token surprise
Token ID5975
Feature activation-1.48e-02

Pos logit contributions
 camps+0.083
 knights+0.075
latable+0.075
aw+0.071
acking+0.070
Neg logit contributions
grand-0.116
 disobedience-0.110
 divorce-0.109
 hairc-0.109
 worry-0.107
Token!
Token ID0
Feature activation-5.70e-02

Pos logit contributions
 camps+0.017
latable+0.015
 knights+0.015
acking+0.014
 jungle+0.014
Neg logit contributions
grand-0.028
 policy-0.026
 disobedience-0.026
 hairc-0.026
 supervision-0.026
Token There
Token ID1318
Feature activation-1.07e-01

Pos logit contributions
 camps+0.066
 knights+0.056
latable+0.055
 hay+0.054
aw+0.051
Neg logit contributions
grand-0.113
 hairc-0.103
 disobedience-0.103
t-0.102
 divorce-0.100
Token,
Token ID11
Feature activation-6.48e-02

Pos logit contributions
 camps+0.145
latable+0.129
 knights+0.127
 hay+0.123
 clearing+0.119
Neg logit contributions
grand-0.186
 disobedience-0.173
t-0.170
 hairc-0.169
 supervision-0.167
Token lying
Token ID9105
Feature activation-2.03e-01

Pos logit contributions
 camps+0.082
latable+0.076
acking+0.069
 knights+0.068
 hay+0.067
Neg logit contributions
grand-0.116
 hairc-0.111
 disobedience-0.109
t-0.107
 divorce-0.106
Token in
Token ID287
Feature activation-8.31e-02

Pos logit contributions
 camps+0.275
latable+0.261
 knights+0.233
 astronaut+0.223
 hay+0.223
Neg logit contributions
t-0.329
grand-0.317
 disobedience-0.305
 note-0.305
 complained-0.294
Token front
Token ID2166
Feature activation-4.24e-02

Pos logit contributions
 camps+0.095
latable+0.089
acking+0.084
aw+0.083
 knights+0.081
Neg logit contributions
 hairc-0.147
grand-0.147
 disobedience-0.145
 punishment-0.145
 worry-0.145
Token of
Token ID286
Feature activation-9.65e-02

Pos logit contributions
 camps+0.046
latable+0.040
 knights+0.039
 hay+0.036
 football+0.035
Neg logit contributions
grand-0.085
 disobedience-0.080
 hairc-0.079
t-0.078
 supervision-0.077
Token him
Token ID683
Feature activation-3.08e-02

Pos logit contributions
 camps+0.096
latable+0.082
acking+0.081
 knights+0.077
aw+0.076
Neg logit contributions
grand-0.191
 hairc-0.189
 disobedience-0.189
 supervision-0.185
 worry-0.183
Token,
Token ID11
Feature activation-4.46e-02

Pos logit contributions
 camps+0.035
latable+0.030
 knights+0.029
 hay+0.028
 astronaut+0.027
Neg logit contributions
grand-0.059
 disobedience-0.057
 supervision-0.056
t-0.056
 hairc-0.055
Token special
Token ID2041
Feature activation-1.06e-01

Pos logit contributions
 camps+0.018
 knights+0.017
latable+0.017
 hay+0.016
 clearing+0.015
Neg logit contributions
grand-0.018
 disobedience-0.016
 hairc-0.016
t-0.016
 lesson-0.016
Token!
Token ID0
Feature activation-6.94e-02

Pos logit contributions
 camps+0.120
latable+0.109
 knights+0.102
acking+0.099
 hay+0.093
Neg logit contributions
grand-0.209
 hairc-0.197
 disobedience-0.193
 policy-0.190
 lesson-0.190
Token<|endoftext|>
Token ID50256
Feature activation-7.61e-02

Pos logit contributions
 camps+0.087
latable+0.077
 knights+0.075
 hay+0.073
 clearing+0.069
Neg logit contributions
grand-0.131
 hairc-0.118
 disobedience-0.117
t-0.116
 supervision-0.114
TokenKate
Token ID45087
Feature activation-8.03e-02

Pos logit contributions
 camps+0.113
latable+0.110
 knights+0.103
 hay+0.101
 astronaut+0.099
Neg logit contributions
grand-0.115
t-0.110
 hairc-0.105
 disobedience-0.101
rav-0.101
Token was
Token ID373
Feature activation-6.02e-02

Pos logit contributions
 camps+0.083
latable+0.074
 knights+0.073
 clearing+0.068
 hay+0.068
Neg logit contributions
grand-0.168
 hairc-0.157
 disobedience-0.152
 supervision-0.151
 complained-0.150
Token walking
Token ID6155
Feature activation-2.02e-01

Pos logit contributions
 camps+0.081
latable+0.080
 fish+0.074
 hay+0.072
 trailer+0.070
Neg logit contributions
 babys-0.090
 disob-0.088
 hairc-0.086
 worry-0.086
grand-0.086
Token in
Token ID287
Feature activation-9.81e-02

Pos logit contributions
latable+0.283
 camps+0.236
aw+0.232
 trailer+0.220
 hay+0.217
Neg logit contributions
 hairc-0.330
 moment-0.320
t-0.315
nam-0.312
days-0.306
Token the
Token ID262
Feature activation-8.55e-02

Pos logit contributions
 camps+0.108
latable+0.100
aw+0.098
 knights+0.096
acking+0.093
Neg logit contributions
 hairc-0.181
grand-0.177
 disobedience-0.174
 worry-0.172
 punishment-0.168
Token park
Token ID3952
Feature activation-1.22e-01

Pos logit contributions
 camps+0.088
latable+0.086
 knights+0.084
acking+0.083
 trailer+0.075
Neg logit contributions
 hairc-0.151
grand-0.149
 divorce-0.144
 babys-0.143
 worry-0.141
Token with
Token ID351
Feature activation-6.40e-02

Pos logit contributions
 camps+0.147
latable+0.135
 hay+0.126
 knights+0.121
 lumber+0.119
Neg logit contributions
t-0.220
grand-0.217
 hairc-0.207
 disobedience-0.206
 note-0.204
Token her
Token ID607
Feature activation-3.20e-02

Pos logit contributions
 camps+0.097
latable+0.087
 knights+0.086
 hay+0.083
 clearing+0.082
Neg logit contributions
grand-0.100
 hairc-0.093
 disobedience-0.092
 supervision-0.090
 divorce-0.089
Token they
Token ID484
Feature activation-1.27e-01

Pos logit contributions
 camps+0.136
latable+0.134
acking+0.122
aw+0.116
 knights+0.115
Neg logit contributions
grand-0.163
 worry-0.158
 hairc-0.154
 disobedience-0.153
nam-0.149
Token saw
Token ID2497
Feature activation-1.25e-01

Pos logit contributions
 camps+0.202
latable+0.192
 hay+0.180
 clearing+0.177
 knights+0.175
Neg logit contributions
grand-0.187
 worry-0.171
 disob-0.170
 promise-0.169
 disobedience-0.169
Token a
Token ID257
Feature activation-7.02e-02

Pos logit contributions
 camps+0.137
acking+0.129
latable+0.120
 clearing+0.113
 trailer+0.110
Neg logit contributions
grand-0.239
 hairc-0.234
 divorce-0.232
 disobedience-0.232
 supervision-0.225
Token big
Token ID1263
Feature activation-1.32e-01

Pos logit contributions
 camps+0.081
 knights+0.073
latable+0.073
aw+0.068
 hay+0.068
Neg logit contributions
grand-0.129
 disobedience-0.122
 hairc-0.122
 divorce-0.121
 virus-0.118
Token bull
Token ID6473
Feature activation-6.75e-02

Pos logit contributions
 camps+0.172
latable+0.161
acking+0.157
 knights+0.155
aw+0.154
Neg logit contributions
 worry-0.214
 virus-0.210
 note-0.210
 hairc-0.204
 promise-0.204
Token lying
Token ID9105
Feature activation-2.00e-01

Pos logit contributions
latable+0.090
 camps+0.083
acking+0.076
igg+0.076
ate+0.075
Neg logit contributions
 disobedience-0.105
 note-0.104
 worry-0.103
 disob-0.101
 hairc-0.099
Token on
Token ID319
Feature activation-5.77e-02

Pos logit contributions
latable+0.261
 camps+0.257
 knights+0.222
 astronaut+0.205
 hay+0.205
Neg logit contributions
t-0.333
 note-0.303
 disobedience-0.299
 virus-0.293
 disob-0.291
Token the
Token ID262
Feature activation-3.98e-02

Pos logit contributions
 camps+0.079
acking+0.079
latable+0.074
 knights+0.070
aw+0.068
Neg logit contributions
 lesson-0.088
 hairc-0.087
 punishment-0.085
 reason-0.084
grand-0.083
Token ground
Token ID2323
Feature activation-5.78e-02

Pos logit contributions
 camps+0.056
 knights+0.054
und+0.053
acking+0.053
aw+0.052
Neg logit contributions
grand-0.062
 lesson-0.057
 virus-0.056
 supervision-0.055
 hairc-0.054
Token.
Token ID13
Feature activation-6.02e-02

Pos logit contributions
 camps+0.086
latable+0.081
 knights+0.078
 astronaut+0.076
 cowboy+0.070
Neg logit contributions
t-0.089
grand-0.084
 disobedience-0.083
 note-0.079
 supervision-0.078
Token It
Token ID632
Feature activation-4.72e-02

Pos logit contributions
 camps+0.071
 astronaut+0.062
latable+0.061
 knights+0.060
acking+0.055
Neg logit contributions
grand-0.112
 hairc-0.100
 disobedience-0.100
t-0.099
 babys-0.099
Token One
Token ID1881
Feature activation-1.15e-01

Pos logit contributions
grand+0.135
 disobedience+0.125
 hairc+0.124
 supervision+0.122
 divorce+0.121
Neg logit contributions
 camps-0.094
 knights-0.084
latable-0.082
 hay-0.081
 clearing-0.079
Token day
Token ID1110
Feature activation-6.25e-02

Pos logit contributions
 camps+0.199
 hay+0.180
 knights+0.175
 clearing+0.175
 fish+0.175
Neg logit contributions
grand-0.160
 supervision-0.147
 disobedience-0.144
 hairc-0.143
 divorce-0.140
Token,
Token ID11
Feature activation-9.21e-02

Pos logit contributions
 camps+0.073
latable+0.068
acking+0.061
 knights+0.061
 jungle+0.060
Neg logit contributions
 hairc-0.113
grand-0.113
t-0.113
 disobedience-0.108
 punishment-0.107
Token Kate
Token ID16693
Feature activation-6.43e-02

Pos logit contributions
 camps+0.129
acking+0.122
latable+0.121
aw+0.115
 knights+0.110
Neg logit contributions
grand-0.152
 hairc-0.145
 worry-0.138
 disobedience-0.137
t-0.137
Token was
Token ID373
Feature activation-6.88e-02

Pos logit contributions
 camps+0.089
latable+0.081
 knights+0.080
 hay+0.077
 clearing+0.077
Neg logit contributions
grand-0.111
 disobedience-0.099
 hairc-0.099
 supervision-0.097
t-0.096
Token walking
Token ID6155
Feature activation-1.99e-01

Pos logit contributions
 camps+0.109
latable+0.108
und+0.099
 football+0.099
 fish+0.096
Neg logit contributions
 disob-0.090
 hairc-0.088
 babys-0.085
 worry-0.085
 divorce-0.080
Token in
Token ID287
Feature activation-9.71e-02

Pos logit contributions
latable+0.277
aw+0.237
 camps+0.235
 trailer+0.225
und+0.217
Neg logit contributions
t-0.311
 hairc-0.308
 moment-0.307
 lesson-0.300
 permission-0.288
Token a
Token ID257
Feature activation-8.14e-02

Pos logit contributions
 camps+0.099
latable+0.093
 knights+0.089
aw+0.088
acking+0.086
Neg logit contributions
 hairc-0.185
grand-0.184
 disobedience-0.180
 worry-0.175
 punishment-0.173
Token deep
Token ID2769
Feature activation-1.39e-01

Pos logit contributions
 knights+0.089
 camps+0.088
latable+0.085
und+0.077
acking+0.076
Neg logit contributions
grand-0.137
 divorce-0.136
 virus-0.133
 hairc-0.132
 note-0.131
Token,
Token ID11
Feature activation-6.39e-02

Pos logit contributions
 knights+0.185
latable+0.176
 cops+0.171
 lions+0.170
 Enjoy+0.168
Neg logit contributions
 worry-0.203
 moment-0.187
 virus-0.183
 photo-0.180
 reason-0.180
Token dark
Token ID3223
Feature activation-1.13e-01

Pos logit contributions
 camps+0.096
latable+0.089
 knights+0.088
und+0.082
acking+0.082
Neg logit contributions
 worry-0.091
grand-0.088
nam-0.087
 disobedience-0.086
 hairc-0.085
Token unknown
Token ID6439
Feature activation-7.53e-02

Pos logit contributions
 camps+0.035
und+0.031
 knights+0.030
 clearing+0.030
acking+0.030
Neg logit contributions
grand-0.040
 hairc-0.037
 disob-0.035
 disobedience-0.034
 divorce-0.034
Token cord
Token ID15050
Feature activation-6.13e-02

Pos logit contributions
 camps+0.098
latable+0.091
acking+0.089
 knights+0.088
aw+0.085
Neg logit contributions
grand-0.129
 hairc-0.121
 virus-0.119
 lesson-0.119
 policy-0.118
Token.
Token ID13
Feature activation-5.85e-02

Pos logit contributions
 camps+0.082
latable+0.075
acking+0.071
 knights+0.070
 clearing+0.069
Neg logit contributions
grand-0.108
t-0.101
 hairc-0.095
 disobedience-0.094
 note-0.093
Token It
Token ID632
Feature activation-4.27e-02

Pos logit contributions
 camps+0.068
latable+0.053
 knights+0.052
acking+0.052
aw+0.052
Neg logit contributions
grand-0.117
 hairc-0.109
 disobedience-0.106
t-0.106
 policy-0.104
Token was
Token ID373
Feature activation-4.28e-02

Pos logit contributions
 camps+0.055
 knights+0.049
latable+0.048
 hay+0.048
 clearing+0.047
Neg logit contributions
grand-0.076
 disobedience-0.071
 supervision-0.070
 hairc-0.070
 divorce-0.069
Token lying
Token ID9105
Feature activation-1.98e-01

Pos logit contributions
 camps+0.058
 knights+0.051
latable+0.051
 hay+0.048
aw+0.047
Neg logit contributions
grand-0.073
 disobedience-0.068
 hairc-0.068
t-0.067
 divorce-0.066
Token on
Token ID319
Feature activation-5.85e-02

Pos logit contributions
 camps+0.248
latable+0.244
 knights+0.203
acking+0.193
 hay+0.191
Neg logit contributions
t-0.350
grand-0.317
 disobedience-0.311
 lesson-0.310
 virus-0.308
Token the
Token ID262
Feature activation-4.25e-02

Pos logit contributions
 camps+0.072
acking+0.068
latable+0.067
 knights+0.063
aw+0.059
Neg logit contributions
 hairc-0.102
grand-0.102
 lesson-0.096
 disobedience-0.093
 virus-0.093
Token ground
Token ID2323
Feature activation-5.35e-02

Pos logit contributions
 camps+0.056
 knights+0.050
latable+0.050
aw+0.047
 clearing+0.047
Neg logit contributions
grand-0.075
 hairc-0.068
 disobedience-0.068
 supervision-0.066
t-0.066
Token,
Token ID11
Feature activation-4.01e-02

Pos logit contributions
 camps+0.072
latable+0.067
 knights+0.063
 cowboy+0.063
 astronaut+0.061
Neg logit contributions
t-0.094
grand-0.082
 disobedience-0.081
 note-0.078
 disob-0.077
Token hidden
Token ID7104
Feature activation-1.20e-01

Pos logit contributions
 camps+0.056
latable+0.050
 knights+0.048
aw+0.045
 astronaut+0.044
Neg logit contributions
 disobedience-0.064
t-0.064
 worry-0.063
grand-0.062
 disob-0.061
Token all
Token ID477
Feature activation-9.41e-02

Pos logit contributions
grand+0.069
 disobedience+0.064
 hairc+0.064
 supervision+0.063
 divorce+0.063
Neg logit contributions
 camps-0.049
 knights-0.043
latable-0.042
 clearing-0.041
 hay-0.041
Token!"
Token ID2474
Feature activation-4.14e-02

Pos logit contributions
 camps+0.131
latable+0.118
 knights+0.116
 trailer+0.111
 football+0.105
Neg logit contributions
grand-0.159
 lesson-0.150
 disobedience-0.150
 hairc-0.148
t-0.146
Token<|endoftext|>
Token ID50256
Feature activation-7.17e-02

Pos logit contributions
 camps+0.056
latable+0.049
 knights+0.048
 hay+0.046
 clearing+0.046
Neg logit contributions
grand-0.074
 disobedience-0.067
 hairc-0.067
t-0.066
 supervision-0.066
TokenBilly
Token ID49640
Feature activation-8.30e-02

Pos logit contributions
 camps+0.101
latable+0.094
 knights+0.089
 hay+0.088
 astronaut+0.085
Neg logit contributions
grand-0.118
t-0.110
 hairc-0.108
 disobedience-0.106
rav-0.103
Token was
Token ID373
Feature activation-5.47e-02

Pos logit contributions
 camps+0.134
latable+0.131
 knights+0.124
 clearing+0.117
 hay+0.116
Neg logit contributions
grand-0.117
 complained-0.116
 disagreed-0.109
 worry-0.106
 hairc-0.106
Token walking
Token ID6155
Feature activation-1.97e-01

Pos logit contributions
 camps+0.071
latable+0.071
 fish+0.065
 hay+0.064
 lumber+0.063
Neg logit contributions
 babys-0.078
grand-0.077
 disob-0.077
 punishment-0.075
 hairc-0.074
Token to
Token ID284
Feature activation-3.78e-02

Pos logit contributions
latable+0.279
 camps+0.241
aw+0.221
 hay+0.208
igg+0.201
Neg logit contributions
 hairc-0.307
 moment-0.294
t-0.293
 lesson-0.290
 note-0.284
Token the
Token ID262
Feature activation-7.05e-02

Pos logit contributions
acking+0.052
latable+0.052
 camps+0.050
aw+0.048
acked+0.047
Neg logit contributions
 hairc-0.061
 worry-0.057
grand-0.057
 reason-0.055
 disobedience-0.054
Token station
Token ID4429
Feature activation-8.80e-02

Pos logit contributions
 camps+0.070
latable+0.070
acking+0.066
 knights+0.064
 hay+0.059
Neg logit contributions
grand-0.138
 hairc-0.132
 divorce-0.127
 disobedience-0.124
 note-0.123
Token when
Token ID618
Feature activation-3.48e-02

Pos logit contributions
 camps+0.102
latable+0.090
 knights+0.086
 hay+0.086
und+0.079
Neg logit contributions
grand-0.170
t-0.164
 hairc-0.155
 disobedience-0.155
 note-0.151
Token he
Token ID339
Feature activation-1.08e-01

Pos logit contributions
 camps+0.051
latable+0.050
 hay+0.047
acking+0.045
aw+0.044
Neg logit contributions
t-0.050
 disobedience-0.050
 worry-0.050
 hairc-0.050
 punishment-0.049
TokenOne
Token ID3198
Feature activation-1.07e-01

Pos logit contributions
grand+0.036
 worry+0.033
 hairc+0.033
 disobedience+0.032
t+0.031
Neg logit contributions
 camps-0.030
acking-0.027
latable-0.026
 clearing-0.024
 knights-0.024
Token day
Token ID1110
Feature activation-6.03e-02

Pos logit contributions
 camps+0.183
 hay+0.162
 clearing+0.161
 pond+0.159
 fish+0.159
Neg logit contributions
grand-0.155
 supervision-0.145
 disobedience-0.139
 divorce-0.138
 hairc-0.137
Token,
Token ID11
Feature activation-9.17e-02

Pos logit contributions
 camps+0.068
latable+0.064
acking+0.057
 hay+0.056
 clearing+0.056
Neg logit contributions
grand-0.112
 hairc-0.112
t-0.110
 disobedience-0.106
 punishment-0.106
Token Anna
Token ID11735
Feature activation-7.04e-02

Pos logit contributions
 camps+0.121
acking+0.115
latable+0.113
aw+0.106
 hay+0.100
Neg logit contributions
grand-0.160
 hairc-0.154
 worry-0.144
 disobedience-0.144
t-0.143
Token was
Token ID373
Feature activation-6.92e-02

Pos logit contributions
 camps+0.088
latable+0.084
 knights+0.077
 clearing+0.076
 football+0.076
Neg logit contributions
grand-0.130
 disobedience-0.116
 hairc-0.115
 disob-0.114
 supervision-0.112
Token walking
Token ID6155
Feature activation-1.96e-01

Pos logit contributions
 camps+0.105
latable+0.103
 football+0.094
und+0.092
 fish+0.092
Neg logit contributions
 disob-0.098
 hairc-0.094
 babys-0.093
 worry-0.091
 divorce-0.089
Token in
Token ID287
Feature activation-9.37e-02

Pos logit contributions
latable+0.298
aw+0.255
 camps+0.253
 trailer+0.246
und+0.227
Neg logit contributions
t-0.294
 moment-0.289
 hairc-0.281
 lesson-0.281
grand-0.270
Token the
Token ID262
Feature activation-7.51e-02

Pos logit contributions
 camps+0.103
latable+0.099
aw+0.096
 knights+0.095
acking+0.093
Neg logit contributions
 hairc-0.170
grand-0.167
 disobedience-0.164
 worry-0.162
 punishment-0.159
Token park
Token ID3952
Feature activation-1.13e-01

Pos logit contributions
 knights+0.084
latable+0.076
 camps+0.074
atched+0.068
 spears+0.067
Neg logit contributions
grand-0.133
 divorce-0.127
 hairc-0.124
 virus-0.124
 babys-0.123
Token.
Token ID13
Feature activation-7.78e-02

Pos logit contributions
 camps+0.129
latable+0.116
 hay+0.110
 knights+0.107
 trailer+0.105
Neg logit contributions
grand-0.213
t-0.211
 disobedience-0.203
 hairc-0.203
 worry-0.198
Token She
Token ID1375
Feature activation-1.37e-01

Pos logit contributions
 camps+0.090
 fish+0.076
latable+0.075
 knights+0.074
 sailing+0.073
Neg logit contributions
grand-0.143
 hairc-0.140
t-0.139
 disobedience-0.137
 protests-0.132
Token hut
Token ID40812
Feature activation-1.86e-01

Pos logit contributions
 camps+0.115
latable+0.105
 knights+0.104
 hay+0.100
acking+0.099
Neg logit contributions
grand-0.126
 disobedience-0.116
 hairc-0.115
 supervision-0.114
 divorce-0.111
Token.
Token ID13
Feature activation-8.02e-02

Pos logit contributions
 camps+0.222
latable+0.217
 knights+0.188
acking+0.177
aw+0.176
Neg logit contributions
grand-0.351
 disobedience-0.325
 promise-0.324
t-0.322
 worry-0.318
Token<|endoftext|>
Token ID50256
Feature activation-6.65e-02

Pos logit contributions
 camps+0.102
latable+0.093
 knights+0.086
 hay+0.083
 fish+0.082
Neg logit contributions
grand-0.148
 disobedience-0.130
t-0.130
 hairc-0.129
 supervision-0.126
TokenJohn
Token ID7554
Feature activation-6.20e-02

Pos logit contributions
 camps+0.100
latable+0.096
 knights+0.090
 hay+0.087
 astronaut+0.087
Neg logit contributions
grand-0.102
t-0.095
 hairc-0.093
 disobedience-0.089
 policy-0.087
Token was
Token ID373
Feature activation-5.34e-02

Pos logit contributions
latable+0.083
 camps+0.082
 knights+0.076
 football+0.069
 hay+0.069
Neg logit contributions
grand-0.112
 complained-0.101
 hairc-0.099
 disob-0.097
 worry-0.095
Token walking
Token ID6155
Feature activation-1.96e-01

Pos logit contributions
latable+0.067
 camps+0.063
 fish+0.058
 monkeys+0.056
 tiger+0.056
Neg logit contributions
grand-0.081
 babys-0.081
 disob-0.079
 punishment-0.079
 disobedience-0.076
Token in
Token ID287
Feature activation-9.35e-02

Pos logit contributions
latable+0.285
 camps+0.245
 hay+0.227
aw+0.219
 carrot+0.208
Neg logit contributions
 hairc-0.319
grand-0.306
t-0.298
 babys-0.288
 moment-0.284
Token the
Token ID262
Feature activation-7.87e-02

Pos logit contributions
latable+0.121
aw+0.120
 camps+0.117
acking+0.115
 knights+0.108
Neg logit contributions
 hairc-0.158
 worry-0.144
 disobedience-0.141
 punishment-0.140
grand-0.140
Token park
Token ID3952
Feature activation-1.18e-01

Pos logit contributions
latable+0.087
acking+0.082
 camps+0.082
 knights+0.081
igg+0.075
Neg logit contributions
 divorce-0.128
grand-0.128
 hairc-0.127
 babys-0.127
 worry-0.124
Token with
Token ID351
Feature activation-6.14e-02

Pos logit contributions
 camps+0.149
latable+0.142
 hay+0.132
 knights+0.127
 carrot+0.125
Neg logit contributions
t-0.203
grand-0.200
 hairc-0.189
 disobedience-0.186
 note-0.185
Token his
Token ID465
Feature activation-2.81e-02

Pos logit contributions
 camps+0.067
latable+0.062
acking+0.059
aw+0.057
 clearing+0.057
Neg logit contributions
grand-0.116
 hairc-0.114
 worry-0.109
nam-0.107
 disobedience-0.107
Token he
Token ID339
Feature activation-1.42e-01

Pos logit contributions
 camps+0.114
latable+0.107
acking+0.105
aw+0.100
 knights+0.097
Neg logit contributions
grand-0.129
 hairc-0.123
t-0.118
 disobedience-0.117
 worry-0.116
Token saw
Token ID2497
Feature activation-1.24e-01

Pos logit contributions
latable+0.216
 camps+0.205
 hay+0.198
 trailer+0.190
 football+0.189
Neg logit contributions
grand-0.210
 disob-0.193
 complained-0.191
 disobedience-0.188
 hairc-0.187
Token an
Token ID281
Feature activation-2.86e-02

Pos logit contributions
 camps+0.134
acking+0.124
latable+0.119
 trailer+0.102
 clearing+0.101
Neg logit contributions
grand-0.245
 hairc-0.240
 disobedience-0.237
 divorce-0.234
 punishment-0.229
Token interesting
Token ID3499
Feature activation-9.15e-02

Pos logit contributions
 camps+0.041
 knights+0.037
acking+0.036
und+0.036
latable+0.035
Neg logit contributions
grand-0.046
 hairc-0.043
 disobedience-0.042
 worry-0.041
 disob-0.041
Token case
Token ID1339
Feature activation-5.97e-02

Pos logit contributions
 camps+0.126
 knights+0.116
latable+0.112
aw+0.110
acking+0.110
Neg logit contributions
grand-0.145
 hairc-0.143
 note-0.140
 virus-0.139
 divorce-0.135
Token lying
Token ID9105
Feature activation-1.96e-01

Pos logit contributions
 camps+0.078
latable+0.071
 hay+0.066
acking+0.063
 jungle+0.063
Neg logit contributions
t-0.115
grand-0.105
 policy-0.101
 note-0.100
 disobedience-0.097
Token in
Token ID287
Feature activation-7.59e-02

Pos logit contributions
 camps+0.221
latable+0.206
 knights+0.190
 hay+0.174
acking+0.172
Neg logit contributions
t-0.360
grand-0.356
 disobedience-0.346
 policy-0.337
 hairc-0.337
Token the
Token ID262
Feature activation-5.24e-02

Pos logit contributions
 camps+0.084
latable+0.080
acking+0.078
 knights+0.074
aw+0.074
Neg logit contributions
 hairc-0.139
grand-0.137
 disobedience-0.134
 worry-0.132
 punishment-0.132
Token grass
Token ID8701
Feature activation-9.24e-02

Pos logit contributions
 camps+0.055
 knights+0.049
latable+0.048
acking+0.043
 hay+0.043
Neg logit contributions
grand-0.106
 hairc-0.098
 disobedience-0.097
 divorce-0.096
 virus-0.095
Token.
Token ID13
Feature activation-6.27e-02

Pos logit contributions
 camps+0.129
latable+0.124
 knights+0.123
 astronaut+0.122
 hay+0.114
Neg logit contributions
t-0.157
grand-0.133
 note-0.131
 disobedience-0.129
 complained-0.125
Token He
Token ID679
Feature activation-1.34e-01

Pos logit contributions
 camps+0.060
 astronaut+0.049
 knights+0.047
latable+0.046
 hay+0.042
Neg logit contributions
grand-0.130
 hairc-0.123
t-0.122
 disobedience-0.121
 punishment-0.118
Token feet
Token ID3625
Feature activation-3.01e-02

Pos logit contributions
grand+0.035
 disobedience+0.033
 hairc+0.033
 supervision+0.033
t+0.032
Neg logit contributions
 camps-0.020
 knights-0.017
latable-0.017
 clearing-0.016
 hay-0.016
Token.
Token ID13
Feature activation-6.40e-02

Pos logit contributions
 camps+0.035
latable+0.031
 knights+0.031
 clearing+0.030
 jungle+0.029
Neg logit contributions
grand-0.058
t-0.055
 supervision-0.054
 hairc-0.054
 disobedience-0.053
Token<|endoftext|>
Token ID50256
Feature activation-5.03e-02

Pos logit contributions
 camps+0.073
latable+0.065
 knights+0.063
aw+0.061
 hay+0.060
Neg logit contributions
grand-0.122
 disobedience-0.109
 hairc-0.109
t-0.108
 supervision-0.107
TokenJohn
Token ID7554
Feature activation-5.45e-02

Pos logit contributions
 camps+0.073
latable+0.070
 knights+0.066
 hay+0.064
 astronaut+0.064
Neg logit contributions
grand-0.078
t-0.074
 hairc-0.072
 disobedience-0.069
 lesson-0.068
Token was
Token ID373
Feature activation-4.65e-02

Pos logit contributions
latable+0.075
 camps+0.072
 knights+0.069
 football+0.063
 clearing+0.063
Neg logit contributions
grand-0.095
 complained-0.087
 hairc-0.083
 disob-0.082
 promise-0.081
Token hopping
Token ID47153
Feature activation-1.96e-01

Pos logit contributions
latable+0.063
 camps+0.058
 fish+0.055
 tiger+0.054
 monkeys+0.054
Neg logit contributions
 disob-0.064
 punishment-0.064
 babys-0.064
grand-0.063
 crank-0.063
Token around
Token ID1088
Feature activation-8.85e-02

Pos logit contributions
latable+0.239
 camps+0.232
 trailer+0.209
 knights+0.203
 jungle+0.199
Neg logit contributions
t-0.355
 lesson-0.313
grand-0.298
 hairc-0.297
 note-0.287
Token the
Token ID262
Feature activation-5.11e-02

Pos logit contributions
latable+0.114
aw+0.104
 camps+0.101
 trailer+0.098
bridge+0.095
Neg logit contributions
 hairc-0.148
 worry-0.136
 moment-0.131
 disobedience-0.131
 punishment-0.130
Token table
Token ID3084
Feature activation-6.56e-02

Pos logit contributions
 knights+0.053
 camps+0.052
latable+0.051
und+0.049
atched+0.049
Neg logit contributions
 hairc-0.081
 photo-0.079
 note-0.079
grand-0.079
 virus-0.078
Token.
Token ID13
Feature activation-6.17e-02

Pos logit contributions
latable+0.083
 camps+0.083
 snowball+0.074
 hay+0.073
 trailer+0.073
Neg logit contributions
t-0.107
 note-0.099
grand-0.098
 disobedience-0.098
 hairc-0.094
Token He
Token ID679
Feature activation-1.34e-01

Pos logit contributions
 camps+0.064
aw+0.063
 knights+0.055
latable+0.054
 hay+0.053
Neg logit contributions
grand-0.118
 hairc-0.112
 disobedience-0.112
 worry-0.111
t-0.110
Token day
Token ID1110
Feature activation-6.75e-02

Pos logit contributions
 camps+0.016
latable+0.015
 knights+0.015
 hay+0.014
aw+0.014
Neg logit contributions
grand-0.016
 disobedience-0.015
 hairc-0.015
t-0.015
 supervision-0.015
Token,
Token ID11
Feature activation-1.03e-01

Pos logit contributions
latable+0.078
 camps+0.077
 clearing+0.067
 hay+0.066
 trailer+0.064
Neg logit contributions
grand-0.124
t-0.119
 disobedience-0.117
 hairc-0.115
 punishment-0.115
Token a
Token ID257
Feature activation-1.06e-01

Pos logit contributions
latable+0.149
 camps+0.146
acking+0.138
aw+0.132
acked+0.123
Neg logit contributions
grand-0.167
 worry-0.155
 hairc-0.150
 disobedience-0.148
 punishment-0.146
Token baby
Token ID5156
Feature activation-5.05e-02

Pos logit contributions
 camps+0.116
latable+0.101
 knights+0.101
 hay+0.096
aw+0.093
Neg logit contributions
grand-0.207
 disobedience-0.194
 hairc-0.193
 divorce-0.191
 supervision-0.189
Token was
Token ID373
Feature activation-6.38e-02

Pos logit contributions
latable+0.079
 camps+0.076
acking+0.074
uting+0.068
 trailer+0.068
Neg logit contributions
grand-0.079
 babys-0.072
 worry-0.072
 disob-0.071
 hairc-0.070
Token walking
Token ID6155
Feature activation-1.95e-01

Pos logit contributions
 camps+0.089
latable+0.082
 football+0.075
 trailer+0.074
 knights+0.073
Neg logit contributions
 disob-0.102
grand-0.098
 babys-0.097
 hairc-0.095
 divorce-0.093
Token.
Token ID13
Feature activation-7.04e-02

Pos logit contributions
latable+0.261
 camps+0.240
 knights+0.203
 trailer+0.197
aw+0.191
Neg logit contributions
t-0.325
grand-0.303
 hairc-0.303
 lesson-0.301
 worry-0.301
Token She
Token ID1375
Feature activation-1.41e-01

Pos logit contributions
 camps+0.079
latable+0.066
 knights+0.066
 hay+0.062
 astronaut+0.061
Neg logit contributions
grand-0.135
 hairc-0.126
 disobedience-0.125
t-0.123
 babys-0.121
Token was
Token ID373
Feature activation-5.12e-02

Pos logit contributions
 camps+0.213
latable+0.198
 knights+0.195
 hay+0.189
 football+0.183
Neg logit contributions
grand-0.214
 hairc-0.199
 disob-0.195
 disobedience-0.195
t-0.194
Token very
Token ID845
Feature activation+2.55e-03

Pos logit contributions
 camps+0.083
latable+0.074
acking+0.071
aw+0.071
 football+0.070
Neg logit contributions
 disob-0.070
 hairc-0.070
 worry-0.069
 babys-0.065
 protests-0.063
Token attractive
Token ID10966
Feature activation-8.68e-02

Pos logit contributions
grand+0.003
 disob+0.003
 hairc+0.003
 worry+0.003
 disobedience+0.003
Neg logit contributions
 camps-0.004
latable-0.004
 knights-0.004
 hay-0.004
 trailer-0.004
Token,
Token ID11
Feature activation-7.90e-02

Pos logit contributions
 camps+0.160
latable+0.150
 knights+0.134
 hay+0.131
 astronaut+0.126
Neg logit contributions
grand-0.256
t-0.248
 disobedience-0.238
 lesson-0.233
 hairc-0.232
Token Lily
Token ID20037
Feature activation-6.82e-02

Pos logit contributions
 camps+0.101
latable+0.097
acking+0.088
 knights+0.081
 clearing+0.080
Neg logit contributions
grand-0.138
 hairc-0.129
 worry-0.128
 disobedience-0.128
t-0.125
Token saw
Token ID2497
Feature activation-1.25e-01

Pos logit contributions
latable+0.089
 camps+0.086
 football+0.072
 hay+0.072
 clearing+0.071
Neg logit contributions
grand-0.125
 supervision-0.116
 complained-0.113
 disob-0.113
 promise-0.112
Token a
Token ID257
Feature activation-6.56e-02

Pos logit contributions
 camps+0.144
acking+0.131
latable+0.126
 clearing+0.112
 trailer+0.109
Neg logit contributions
grand-0.237
 hairc-0.237
 disobedience-0.228
 supervision-0.228
 divorce-0.227
Token skeleton
Token ID18328
Feature activation-9.47e-02

Pos logit contributions
 camps+0.080
 knights+0.072
latable+0.072
 hay+0.067
aw+0.066
Neg logit contributions
grand-0.117
 hairc-0.110
 disobedience-0.110
 divorce-0.109
 virus-0.106
Token lying
Token ID9105
Feature activation-1.95e-01

Pos logit contributions
latable+0.129
 camps+0.118
acking+0.107
 Soon+0.104
uting+0.104
Neg logit contributions
 note-0.162
grand-0.154
t-0.147
 virus-0.145
 hairc-0.144
Token on
Token ID319
Feature activation-5.65e-02

Pos logit contributions
 camps+0.230
latable+0.228
 knights+0.202
 hay+0.187
acking+0.183
Neg logit contributions
t-0.344
grand-0.324
 note-0.319
 virus-0.319
 hairc-0.317
Token the
Token ID262
Feature activation-3.74e-02

Pos logit contributions
 camps+0.074
acking+0.073
latable+0.070
 knights+0.063
aw+0.061
Neg logit contributions
 hairc-0.095
grand-0.091
 lesson-0.088
 note-0.088
 punishment-0.086
Token ground
Token ID2323
Feature activation-5.44e-02

Pos logit contributions
 camps+0.054
 knights+0.049
latable+0.049
acking+0.048
und+0.047
Neg logit contributions
grand-0.060
 hairc-0.055
 disobedience-0.053
 lesson-0.053
 virus-0.052
Token.
Token ID13
Feature activation-6.56e-02

Pos logit contributions
 camps+0.080
latable+0.077
 knights+0.074
 astronaut+0.071
 cowboy+0.066
Neg logit contributions
t-0.086
grand-0.082
 disobedience-0.078
 note-0.076
 hairc-0.075
Token She
Token ID1375
Feature activation-1.29e-01

Pos logit contributions
 camps+0.085
latable+0.074
 knights+0.070
 hay+0.068
 astronaut+0.067
Neg logit contributions
grand-0.120
 hairc-0.108
t-0.108
 disobedience-0.107
 divorce-0.105
Token green
Token ID4077
Feature activation-1.39e-01

Pos logit contributions
latable+0.162
 camps+0.158
acking+0.147
und+0.141
uting+0.138
Neg logit contributions
 worry-0.118
grand-0.116
 hairc-0.115
 supervision-0.110
 promise-0.110
Token frog
Token ID21264
Feature activation-1.09e-01

Pos logit contributions
 camps+0.178
latable+0.173
acking+0.158
 knights+0.150
 clearing+0.145
Neg logit contributions
grand-0.237
 hairc-0.231
 disobedience-0.221
 lesson-0.218
 note-0.216
Token found
Token ID1043
Feature activation-9.91e-02

Pos logit contributions
 camps+0.171
latable+0.167
 knights+0.153
 astronaut+0.150
 hay+0.149
Neg logit contributions
grand-0.153
 complained-0.142
 disob-0.141
 promise-0.140
t-0.140
Token a
Token ID257
Feature activation-5.50e-02

Pos logit contributions
 camps+0.126
latable+0.116
acking+0.113
 astronaut+0.099
 trailer+0.098
Neg logit contributions
 lesson-0.168
 supervision-0.167
grand-0.167
 disobedience-0.167
 punishment-0.164
Token handle
Token ID5412
Feature activation-5.94e-02

Pos logit contributions
 camps+0.068
 knights+0.062
latable+0.061
aw+0.059
 hay+0.057
Neg logit contributions
grand-0.097
 disobedience-0.091
 hairc-0.091
 divorce-0.089
 lesson-0.088
Token lying
Token ID9105
Feature activation-1.95e-01

Pos logit contributions
 camps+0.070
 knights+0.066
latable+0.061
 jungle+0.060
 clearing+0.057
Neg logit contributions
t-0.114
grand-0.109
 hairc-0.107
 supervision-0.104
 disobedience-0.104
Token in
Token ID287
Feature activation-7.61e-02

Pos logit contributions
 camps+0.206
latable+0.194
 knights+0.186
acking+0.164
 jungle+0.161
Neg logit contributions
t-0.362
grand-0.350
 hairc-0.345
 disobedience-0.344
 policy-0.340
Token the
Token ID262
Feature activation-6.18e-02

Pos logit contributions
 camps+0.088
latable+0.082
aw+0.078
 knights+0.077
acking+0.075
Neg logit contributions
grand-0.137
 hairc-0.135
 worry-0.134
 disobedience-0.133
 punishment-0.132
Token bottom
Token ID4220
Feature activation-7.11e-02

Pos logit contributions
 camps+0.071
 knights+0.065
latable+0.065
acking+0.060
 cops+0.057
Neg logit contributions
grand-0.112
 hairc-0.104
 virus-0.101
 disobedience-0.101
 note-0.101
Token of
Token ID286
Feature activation-1.14e-01

Pos logit contributions
 camps+0.080
latable+0.075
 knights+0.070
 hay+0.064
 astronaut+0.063
Neg logit contributions
grand-0.135
t-0.133
 disobedience-0.127
 hairc-0.127
 supervision-0.125
Token the
Token ID262
Feature activation-5.41e-02

Pos logit contributions
 camps+0.139
acking+0.129
latable+0.128
aw+0.123
 knights+0.118
Neg logit contributions
 hairc-0.199
 punishment-0.193
 disobedience-0.191
 worry-0.190
 supervision-0.190
Token day
Token ID1110
Feature activation-2.19e-02

Pos logit contributions
 camps+0.102
 knights+0.093
latable+0.092
 hay+0.092
 clearing+0.090
Neg logit contributions
grand-0.069
 disobedience-0.062
 hairc-0.062
 supervision-0.061
 divorce-0.059
Token,
Token ID11
Feature activation-5.81e-02

Pos logit contributions
 camps+0.025
latable+0.024
 astronaut+0.023
 knights+0.021
 clearing+0.021
Neg logit contributions
t-0.040
 disobedience-0.038
 lesson-0.038
 hairc-0.038
grand-0.038
Token as
Token ID355
Feature activation-4.80e-02

Pos logit contributions
 camps+0.063
latable+0.059
 knights+0.056
 astronaut+0.056
acking+0.055
Neg logit contributions
grand-0.106
t-0.106
 disobedience-0.103
 lesson-0.102
 hairc-0.101
Token he
Token ID339
Feature activation-1.21e-01

Pos logit contributions
 camps+0.045
 knights+0.039
latable+0.038
 hay+0.036
 clearing+0.035
Neg logit contributions
grand-0.102
 disobedience-0.096
 hairc-0.095
t-0.094
 supervision-0.094
Token was
Token ID373
Feature activation-4.02e-02

Pos logit contributions
 camps+0.167
latable+0.152
 knights+0.150
 hay+0.146
aw+0.142
Neg logit contributions
grand-0.199
 disobedience-0.188
t-0.183
 hairc-0.182
 lesson-0.178
Token digging
Token ID18894
Feature activation-1.94e-01

Pos logit contributions
 camps+0.052
latable+0.050
 football+0.047
 astronaut+0.046
aw+0.045
Neg logit contributions
 disob-0.064
 disobedience-0.062
 punishment-0.061
 supervision-0.060
t-0.060
Token,
Token ID11
Feature activation-4.00e-02

Pos logit contributions
 camps+0.256
latable+0.254
 trailer+0.252
acking+0.226
 stable+0.221
Neg logit contributions
t-0.314
 lesson-0.305
 punishment-0.276
 worry-0.275
 disobedience-0.271
Token he
Token ID339
Feature activation-1.08e-01

Pos logit contributions
 camps+0.051
latable+0.050
acking+0.050
 astronaut+0.046
 knights+0.044
Neg logit contributions
 disobedience-0.064
t-0.062
 hairc-0.062
 lesson-0.062
 worry-0.062
Token made
Token ID925
Feature activation-4.17e-02

Pos logit contributions
 camps+0.153
latable+0.150
 hay+0.137
 football+0.134
 knights+0.133
Neg logit contributions
grand-0.167
 disobedience-0.158
t-0.156
 disob-0.153
 complained-0.151
Token a
Token ID257
Feature activation-2.56e-02

Pos logit contributions
 camps+0.053
latable+0.049
acking+0.045
und+0.044
 ranch+0.044
Neg logit contributions
 lesson-0.073
 hairc-0.070
 disobedience-0.069
 punishment-0.069
t-0.067
Token big
Token ID1263
Feature activation-9.21e-02

Pos logit contributions
 camps+0.031
latable+0.028
 knights+0.028
 hay+0.026
aw+0.026
Neg logit contributions
grand-0.046
 disobedience-0.044
 hairc-0.043
 divorce-0.042
 supervision-0.042
Token,
Token ID11
Feature activation-1.13e-01

Pos logit contributions
 hairc+0.020
grand+0.019
 supervision+0.019
 disobedience+0.019
 punishment+0.019
Neg logit contributions
 camps-0.016
 knights-0.016
aw-0.015
latable-0.015
acking-0.015
Token the
Token ID262
Feature activation-9.77e-02

Pos logit contributions
 camps+0.154
latable+0.148
acking+0.136
aw+0.127
 knights+0.127
Neg logit contributions
grand-0.191
 worry-0.180
 disobedience-0.177
 hairc-0.177
 supervision-0.176
Token brown
Token ID7586
Feature activation-1.39e-01

Pos logit contributions
 camps+0.116
latable+0.108
acking+0.101
 knights+0.100
aw+0.095
Neg logit contributions
grand-0.177
 disobedience-0.167
 hairc-0.167
 supervision-0.163
 divorce-0.163
Token jet
Token ID12644
Feature activation-1.18e-01

Pos logit contributions
 camps+0.175
latable+0.172
acking+0.167
und+0.150
 knights+0.149
Neg logit contributions
grand-0.229
 hairc-0.228
 disobedience-0.216
 virus-0.215
 lesson-0.214
Token was
Token ID373
Feature activation-5.83e-02

Pos logit contributions
 camps+0.183
latable+0.175
 knights+0.161
 hay+0.160
 football+0.157
Neg logit contributions
grand-0.179
t-0.176
 disob-0.162
 disobedience-0.160
 complained-0.157
Token flying
Token ID7348
Feature activation-1.94e-01

Pos logit contributions
 camps+0.079
 knights+0.073
latable+0.073
 football+0.067
aw+0.067
Neg logit contributions
grand-0.091
t-0.089
 disob-0.087
 worry-0.087
 disobedience-0.087
Token very
Token ID845
Feature activation-1.73e-02

Pos logit contributions
 camps+0.245
latable+0.240
 knights+0.225
 trailer+0.200
acking+0.194
Neg logit contributions
t-0.350
 lesson-0.319
 moment-0.284
 worry-0.284
 note-0.282
Token fast
Token ID3049
Feature activation-1.08e-01

Pos logit contributions
 camps+0.023
latable+0.022
 knights+0.022
 hay+0.020
 football+0.020
Neg logit contributions
t-0.028
grand-0.028
 lesson-0.027
 hairc-0.027
 disobedience-0.026
Token.
Token ID13
Feature activation-6.31e-02

Pos logit contributions
latable+0.155
 camps+0.152
 clearing+0.132
 knights+0.131
 trailer+0.130
Neg logit contributions
 hairc-0.162
t-0.158
 disobedience-0.147
 disob-0.145
 worry-0.144
Token It
Token ID632
Feature activation-5.16e-02

Pos logit contributions
 camps+0.072
latable+0.063
 knights+0.060
 hay+0.058
 outside+0.056
Neg logit contributions
grand-0.124
 hairc-0.111
 disobedience-0.111
t-0.110
 supervision-0.107
Token wanted
Token ID2227
Feature activation-6.45e-02

Pos logit contributions
 camps+0.073
latable+0.066
 knights+0.066
 hay+0.063
 football+0.062
Neg logit contributions
grand-0.085
 disobedience-0.078
t-0.078
 hairc-0.077
 divorce-0.075
Token,
Token ID11
Feature activation-1.03e-01

Pos logit contributions
latable+0.078
 camps+0.077
 clearing+0.067
 hay+0.066
 trailer+0.064
Neg logit contributions
grand-0.124
t-0.119
 disobedience-0.117
 hairc-0.115
 punishment-0.115
Token the
Token ID262
Feature activation-9.98e-02

Pos logit contributions
latable+0.149
 camps+0.146
acking+0.138
aw+0.132
acked+0.123
Neg logit contributions
grand-0.167
 worry-0.155
 hairc-0.150
 disobedience-0.148
 punishment-0.146
Token gr
Token ID1036
Feature activation-8.82e-02

Pos logit contributions
 camps+0.120
latable+0.112
 knights+0.105
acking+0.104
 hay+0.100
Neg logit contributions
grand-0.180
 disobedience-0.169
 hairc-0.168
 divorce-0.165
 supervision-0.163
Tokenanny
Token ID7737
Feature activation-3.60e-02

Pos logit contributions
 camps+0.138
 knights+0.124
latable+0.124
 hay+0.122
 clearing+0.119
Neg logit contributions
grand-0.137
 disobedience-0.123
 hairc-0.122
t-0.120
 supervision-0.119
Token was
Token ID373
Feature activation-6.50e-02

Pos logit contributions
 camps+0.046
latable+0.043
 clearing+0.040
 hay+0.039
 knights+0.038
Neg logit contributions
grand-0.067
 hairc-0.060
t-0.059
 disobedience-0.059
 disob-0.059
Token walking
Token ID6155
Feature activation-1.94e-01

Pos logit contributions
 camps+0.097
latable+0.093
 knights+0.086
 hay+0.084
 football+0.084
Neg logit contributions
grand-0.095
 disob-0.091
 punishment-0.089
 disobedience-0.089
 worry-0.089
Token in
Token ID287
Feature activation-9.16e-02

Pos logit contributions
latable+0.259
 camps+0.232
aw+0.197
 trailer+0.195
 jungle+0.191
Neg logit contributions
t-0.313
 lesson-0.304
 hairc-0.293
 moment-0.288
grand-0.286
Token the
Token ID262
Feature activation-7.49e-02

Pos logit contributions
 camps+0.089
latable+0.080
 knights+0.077
aw+0.075
acking+0.073
Neg logit contributions
grand-0.183
 hairc-0.179
 disobedience-0.178
 worry-0.171
 supervision-0.171
Token park
Token ID3952
Feature activation-1.12e-01

Pos logit contributions
 camps+0.068
 knights+0.063
latable+0.063
 hay+0.054
acking+0.053
Neg logit contributions
grand-0.154
 divorce-0.146
 hairc-0.146
 disobedience-0.142
 virus-0.142
Token when
Token ID618
Feature activation-3.87e-02

Pos logit contributions
 camps+0.138
latable+0.129
 hay+0.116
 knights+0.114
 astronaut+0.111
Neg logit contributions
t-0.195
grand-0.195
 disobedience-0.188
 hairc-0.184
 worry-0.180
Token she
Token ID673
Feature activation-1.11e-01

Pos logit contributions
 camps+0.053
latable+0.047
 knights+0.047
 hay+0.046
 clearing+0.045
Neg logit contributions
grand-0.066
 disobedience-0.061
 hairc-0.061
 divorce-0.059
 supervision-0.059
Token.
Token ID13
Feature activation-7.46e-02

Pos logit contributions
 camps+0.104
latable+0.099
 knights+0.092
 clearing+0.084
 astronaut+0.083
Neg logit contributions
grand-0.172
 hairc-0.166
 disobedience-0.165
 supervision-0.165
 worry-0.162
Token<|endoftext|>
Token ID50256
Feature activation-6.94e-02

Pos logit contributions
 camps+0.093
latable+0.085
 knights+0.079
 hay+0.076
acking+0.074
Neg logit contributions
grand-0.141
 disobedience-0.123
t-0.123
 hairc-0.122
 supervision-0.119
TokenTom
Token ID13787
Feature activation-6.52e-02

Pos logit contributions
 camps+0.102
latable+0.098
 knights+0.091
 hay+0.090
 astronaut+0.089
Neg logit contributions
grand-0.106
t-0.100
 hairc-0.098
 disobedience-0.094
rav-0.092
Tokenmy
Token ID1820
Feature activation-7.50e-02

Pos logit contributions
 camps+0.132
latable+0.129
 knights+0.126
 hay+0.118
 astronaut+0.116
Neg logit contributions
grand-0.068
 worry-0.058
 promise-0.058
 complained-0.057
 hairc-0.056
Token was
Token ID373
Feature activation-5.47e-02

Pos logit contributions
latable+0.116
 camps+0.112
 knights+0.099
 hay+0.096
 clearing+0.095
Neg logit contributions
grand-0.120
 complained-0.110
 hairc-0.107
 disob-0.107
 worry-0.105
Token walking
Token ID6155
Feature activation-1.94e-01

Pos logit contributions
latable+0.073
 camps+0.071
 fish+0.064
 trailer+0.063
 lake+0.062
Neg logit contributions
 babys-0.077
 disob-0.075
 punishment-0.075
grand-0.073
 crank-0.073
Token in
Token ID287
Feature activation-9.17e-02

Pos logit contributions
latable+0.284
 camps+0.251
aw+0.229
 hay+0.217
 knights+0.212
Neg logit contributions
 hairc-0.296
t-0.283
grand-0.283
 moment-0.276
 lesson-0.271
Token the
Token ID262
Feature activation-7.92e-02

Pos logit contributions
aw+0.124
latable+0.123
 camps+0.121
acking+0.118
 knights+0.112
Neg logit contributions
 hairc-0.147
 worry-0.139
 disobedience-0.133
 punishment-0.132
 reason-0.129
Token park
Token ID3952
Feature activation-1.17e-01

Pos logit contributions
 camps+0.083
 knights+0.081
latable+0.079
acking+0.074
 spears+0.067
Neg logit contributions
grand-0.138
 hairc-0.137
 divorce-0.133
 babys-0.132
 worry-0.130
Token with
Token ID351
Feature activation-5.90e-02

Pos logit contributions
 camps+0.147
latable+0.138
 hay+0.126
 knights+0.125
 astronaut+0.124
Neg logit contributions
t-0.203
grand-0.198
 hairc-0.190
 disobedience-0.188
 note-0.185
Token his
Token ID465
Feature activation-2.74e-02

Pos logit contributions
 camps+0.067
latable+0.061
acking+0.058
 clearing+0.056
aw+0.056
Neg logit contributions
grand-0.110
 hairc-0.107
 worry-0.103
 disobedience-0.102
 supervision-0.101
Token "
Token ID366
Feature activation+3.46e-02

Pos logit contributions
grand+0.067
 disobedience+0.063
 hairc+0.063
 supervision+0.062
 divorce+0.061
Neg logit contributions
 camps-0.024
 knights-0.020
latable-0.019
 hay-0.019
 clearing-0.017
TokenIt
Token ID1026
Feature activation+2.30e-02

Pos logit contributions
 supervision+0.066
 hairc+0.065
 protests+0.064
grand+0.063
 disobedience+0.063
Neg logit contributions
 camps-0.038
aw-0.034
 hay-0.034
 clearing-0.034
 outside-0.033
Token's
Token ID338
Feature activation+4.35e-02

Pos logit contributions
grand+0.034
 disobedience+0.030
 hairc+0.030
t+0.029
 divorce+0.029
Neg logit contributions
 camps-0.038
 knights-0.034
latable-0.034
 hay-0.033
 clearing-0.032
Token where
Token ID810
Feature activation+1.65e-02

Pos logit contributions
grand+0.096
 hairc+0.090
 disobedience+0.090
 supervision+0.090
t+0.088
Neg logit contributions
 camps-0.040
 hay-0.035
 outside-0.034
 clearing-0.033
 stable-0.032
Token you
Token ID345
Feature activation+4.22e-02

Pos logit contributions
grand+0.033
 protests+0.032
 disobedience+0.032
t+0.031
rav+0.031
Neg logit contributions
 knights-0.015
 hay-0.015
 fish-0.014
 camps-0.014
 outside-0.014
Token can
Token ID460
Feature activation+8.05e-02

Pos logit contributions
grand+0.068
 promise+0.062
t+0.061
 disob+0.061
 disobedience+0.060
Neg logit contributions
 camps-0.063
latable-0.056
 knights-0.056
 clearing-0.054
acking-0.054
Token mail
Token ID6920
Feature activation-2.43e-02

Pos logit contributions
grand+0.120
t+0.111
 disob+0.105
 disobedience+0.103
 worry+0.103
Neg logit contributions
 camps-0.133
 knights-0.122
aw-0.122
latable-0.117
 hay-0.115
Token letters
Token ID7475
Feature activation+2.74e-02

Pos logit contributions
 camps+0.023
 knights+0.019
latable+0.019
 hay+0.018
 clearing+0.018
Neg logit contributions
grand-0.052
 disobedience-0.049
 hairc-0.048
 supervision-0.048
 divorce-0.047
Token to
Token ID284
Feature activation+6.74e-02

Pos logit contributions
grand+0.055
t+0.051
 hairc+0.051
 disobedience+0.050
 supervision+0.049
Neg logit contributions
 camps-0.030
latable-0.027
 knights-0.025
 clearing-0.025
 hay-0.024
Token your
Token ID534
Feature activation+6.23e-02

Pos logit contributions
grand+0.120
 disobedience+0.118
t+0.117
rav+0.116
 hairc+0.115
Neg logit contributions
 camps-0.077
 hay-0.072
 knights-0.072
 outside-0.070
 fish-0.068
Token friends
Token ID2460
Feature activation+3.34e-02

Pos logit contributions
grand+0.105
 disobedience+0.097
 hairc+0.096
t+0.095
 supervision+0.094
Neg logit contributions
 camps-0.087
 knights-0.079
 hay-0.077
latable-0.075
 jungle-0.074
Token learned
Token ID4499
Feature activation+8.15e-03

Pos logit contributions
grand+0.010
 disobedience+0.009
t+0.009
 hairc+0.009
 divorce+0.009
Neg logit contributions
 camps-0.009
latable-0.008
 knights-0.008
 football-0.007
 hay-0.007
Token that
Token ID326
Feature activation+2.43e-02

Pos logit contributions
grand+0.019
 disobedience+0.018
 hairc+0.018
 supervision+0.017
 divorce+0.017
Neg logit contributions
 camps-0.006
 knights-0.005
latable-0.005
 hay-0.005
 clearing-0.005
Token it
Token ID340
Feature activation+5.88e-02

Pos logit contributions
grand+0.050
 supervision+0.046
rav+0.044
t+0.044
 hairc+0.044
Neg logit contributions
 camps-0.026
 knights-0.024
 fish-0.024
 hay-0.023
 outside-0.023
Token's
Token ID338
Feature activation+6.65e-02

Pos logit contributions
grand+0.098
 hairc+0.090
 disobedience+0.089
 supervision+0.088
 divorce+0.087
Neg logit contributions
 camps-0.083
 knights-0.076
 hay-0.072
latable-0.072
 outside-0.071
Token important
Token ID1593
Feature activation+3.97e-02

Pos logit contributions
grand+0.094
 disobedience+0.085
 hairc+0.084
 supervision+0.083
 divorce+0.082
Neg logit contributions
 camps-0.112
 knights-0.102
latable-0.100
 hay-0.099
 clearing-0.097
Token to
Token ID284
Feature activation+8.20e-02

Pos logit contributions
grand+0.081
 supervision+0.075
 divorce+0.074
 hairc+0.073
 babys+0.073
Neg logit contributions
 outside-0.038
 and-0.037
 camps-0.037
 knights-0.037
 hay-0.037
Token rest
Token ID1334
Feature activation-1.21e-02

Pos logit contributions
grand+0.106
 disobedience+0.101
 worry+0.100
 promise+0.100
 disob+0.099
Neg logit contributions
 camps-0.147
latable-0.133
 knights-0.132
aw-0.131
acking-0.127
Token when
Token ID618
Feature activation+5.17e-02

Pos logit contributions
 camps+0.010
latable+0.009
 knights+0.008
 clearing+0.007
acking+0.007
Neg logit contributions
grand-0.027
 disobedience-0.026
 supervision-0.025
 hairc-0.025
 policy-0.025
Token you
Token ID345
Feature activation+6.08e-02

Pos logit contributions
grand+0.081
 disobedience+0.076
 hairc+0.075
 supervision+0.073
 divorce+0.072
Neg logit contributions
 camps-0.077
latable-0.070
 knights-0.069
 hay-0.067
 clearing-0.066
Token're
Token ID821
Feature activation+8.65e-02

Pos logit contributions
grand+0.095
 disobedience+0.087
 hairc+0.087
 supervision+0.085
t+0.084
Neg logit contributions
 camps-0.092
 knights-0.083
latable-0.082
 hay-0.080
 clearing-0.079
Token tired
Token ID10032
Feature activation+6.86e-03

Pos logit contributions
grand+0.119
 disobedience+0.116
 hairc+0.115
t+0.114
 disob+0.114
Neg logit contributions
 camps-0.140
latable-0.131
 knights-0.123
 hay-0.123
 clearing-0.120
Token for
Token ID329
Feature activation+4.08e-02

Pos logit contributions
grand+0.097
 promise+0.091
 hairc+0.091
 disobedience+0.091
 supervision+0.091
Neg logit contributions
 camps-0.065
latable-0.063
und-0.056
 clearing-0.056
aw-0.054
Token being
Token ID852
Feature activation+2.26e-02

Pos logit contributions
grand+0.073
 disobedience+0.068
 hairc+0.067
 supervision+0.067
 divorce+0.066
Neg logit contributions
 camps-0.054
latable-0.047
 knights-0.045
 hay-0.045
aw-0.044
Token so
Token ID523
Feature activation+1.51e-01

Pos logit contributions
grand+0.040
 hairc+0.038
 disobedience+0.037
 supervision+0.037
 divorce+0.037
Neg logit contributions
 camps-0.031
latable-0.026
 hay-0.025
 clearing-0.025
 knights-0.024
Token helpful
Token ID7613
Feature activation+1.10e-02

Pos logit contributions
grand+0.238
 disobedience+0.219
 hairc+0.216
t+0.213
 supervision+0.212
Neg logit contributions
 camps-0.233
latable-0.207
 knights-0.201
 hay-0.201
 clearing-0.198
Token.
Token ID13
Feature activation+9.63e-03

Pos logit contributions
grand+0.021
 disobedience+0.020
 hairc+0.020
t+0.020
 supervision+0.020
Neg logit contributions
 camps-0.013
latable-0.012
 clearing-0.011
 hay-0.011
 knights-0.010
Token
Token ID220
Feature activation+1.34e-01

Pos logit contributions
grand+0.016
 disobedience+0.014
t+0.013
 lesson+0.013
 supervision+0.013
Neg logit contributions
 camps-0.014
latable-0.013
 knights-0.013
 hay-0.012
 astronaut-0.012
Token
Token ID198
Feature activation+4.61e-02

Pos logit contributions
grand+0.232
t+0.211
 disobedience+0.208
 supervision+0.205
 hairc+0.203
Neg logit contributions
 camps-0.188
latable-0.165
 knights-0.162
 hay-0.160
 clearing-0.155
Token
Token ID198
Feature activation+3.23e-02

Pos logit contributions
grand+0.084
 disobedience+0.076
t+0.076
 hairc+0.075
 divorce+0.074
Neg logit contributions
 camps-0.059
 knights-0.052
latable-0.051
 hay-0.050
 clearing-0.048
TokenFrom
Token ID4863
Feature activation-9.02e-03

Pos logit contributions
grand+0.053
t+0.050
 disobedience+0.047
 lesson+0.047
 divorce+0.045
Neg logit contributions
 camps-0.048
 knights-0.043
latable-0.042
 hay-0.041
 stable-0.040
Token that
Token ID326
Feature activation+3.40e-03

Pos logit contributions
 camps+0.011
 knights+0.010
latable+0.010
acking+0.010
 clearing+0.009
Neg logit contributions
grand-0.017
 lesson-0.015
 disob-0.015
 disobedience-0.015
 divorce-0.015
Token day
Token ID1110
Feature activation+5.85e-02

Pos logit contributions
 hairc+0.006
 supervision+0.006
grand+0.006
 passports+0.006
nam+0.006
Neg logit contributions
 and-0.004
 camps-0.004
 outside-0.004
 jungle-0.004
 He-0.004
Tokenberries
Token ID20853
Feature activation+2.09e-02

Pos logit contributions
 camps+0.005
latable+0.004
 knights+0.004
 clearing+0.004
 football+0.004
Neg logit contributions
grand-0.005
 hairc-0.005
 worry-0.005
t-0.005
 promise-0.005
Token are
Token ID389
Feature activation+4.92e-02

Pos logit contributions
grand+0.031
t+0.028
 disobedience+0.028
 hairc+0.028
 supervision+0.027
Neg logit contributions
 camps-0.034
 knights-0.031
latable-0.030
 clearing-0.029
 hay-0.029
Token very
Token ID845
Feature activation+9.32e-02

Pos logit contributions
grand+0.079
 hairc+0.072
 lesson+0.071
 disobedience+0.069
 worry+0.069
Neg logit contributions
 camps-0.072
latable-0.065
 knights-0.064
acking-0.062
aw-0.061
Token good
Token ID922
Feature activation+5.04e-02

Pos logit contributions
grand+0.157
 hairc+0.140
 disobedience+0.137
 lesson+0.137
t+0.137
Neg logit contributions
 camps-0.132
 knights-0.121
latable-0.118
 clearing-0.114
aw-0.111
Token for
Token ID329
Feature activation+3.41e-02

Pos logit contributions
grand+0.095
 hairc+0.091
t+0.088
 policy+0.087
 disobedience+0.087
Neg logit contributions
 camps-0.062
latable-0.060
 clearing-0.057
 trailer-0.053
 hay-0.051
Token you
Token ID345
Feature activation+9.21e-02

Pos logit contributions
grand+0.050
 disobedience+0.046
 hairc+0.045
 supervision+0.045
 divorce+0.044
Neg logit contributions
 camps-0.055
 knights-0.050
latable-0.049
 hay-0.048
 clearing-0.047
Token,
Token ID11
Feature activation+4.08e-02

Pos logit contributions
grand+0.197
 disobedience+0.185
 hairc+0.183
 supervision+0.181
t+0.180
Neg logit contributions
 camps-0.087
 knights-0.073
latable-0.072
 hay-0.069
 clearing-0.067
Token my
Token ID616
Feature activation+7.14e-02

Pos logit contributions
grand+0.068
 disobedience+0.064
 hairc+0.064
 worry+0.064
 lesson+0.063
Neg logit contributions
 camps-0.056
latable-0.051
acking-0.050
 clearing-0.048
 knights-0.048
Token dear
Token ID13674
Feature activation+6.19e-02

Pos logit contributions
grand+0.127
 disobedience+0.117
 hairc+0.115
 supervision+0.114
 divorce+0.114
Neg logit contributions
 camps-0.094
 knights-0.084
latable-0.081
 hay-0.081
 fish-0.078
Token.
Token ID13
Feature activation+1.00e-02

Pos logit contributions
grand+0.114
 hairc+0.102
 lesson+0.100
 disobedience+0.099
 promise+0.099
Neg logit contributions
latable-0.078
 camps-0.078
 clearing-0.071
acking-0.068
und-0.067
Token They
Token ID1119
Feature activation-6.79e-02

Pos logit contributions
grand+0.015
t+0.013
rav+0.013
 Story+0.012
 disobedience+0.012
Neg logit contributions
 camps-0.017
aw-0.015
latable-0.015
 knights-0.014
 football-0.014
Token sweet
Token ID6029
Feature activation+2.03e-02

Pos logit contributions
grand+0.039
 disobedience+0.037
 hairc+0.036
 supervision+0.036
 divorce+0.036
Neg logit contributions
 camps-0.032
 knights-0.028
latable-0.028
 clearing-0.026
 hay-0.026
Tokenie
Token ID494
Feature activation+1.89e-02

Pos logit contributions
grand+0.021
 hairc+0.020
 worry+0.020
 disobedience+0.018
 lesson+0.017
Neg logit contributions
 camps-0.041
latable-0.040
 knights-0.039
 clearing-0.038
 trailer-0.038
Token.
Token ID13
Feature activation-6.81e-03

Pos logit contributions
grand+0.028
 promise+0.024
 worry+0.024
 supervision+0.024
 disob+0.024
Neg logit contributions
 camps-0.033
latable-0.032
 football-0.028
 clearing-0.027
acking-0.027
Token We
Token ID775
Feature activation+9.03e-03

Pos logit contributions
 camps+0.009
latable+0.007
 knights+0.007
 clearing+0.007
 hay+0.007
Neg logit contributions
grand-0.013
 disobedience-0.012
 hairc-0.012
 supervision-0.011
 divorce-0.011
Token can
Token ID460
Feature activation+8.48e-02

Pos logit contributions
grand+0.014
 disobedience+0.012
 hairc+0.012
 supervision+0.012
 divorce+0.012
Neg logit contributions
 camps-0.014
 knights-0.013
latable-0.013
 hay-0.012
 fish-0.012
Token't
Token ID470
Feature activation+8.62e-02

Pos logit contributions
grand+0.089
t+0.084
 disob+0.080
 worry+0.079
 disobedience+0.079
Neg logit contributions
 camps-0.175
aw-0.165
 clearing-0.162
 knights-0.161
latable-0.159
Token go
Token ID467
Feature activation-7.09e-04

Pos logit contributions
grand+0.144
t+0.133
 worry+0.132
 disobedience+0.132
 disob+0.132
Neg logit contributions
 camps-0.123
latable-0.111
 knights-0.108
 clearing-0.106
aw-0.106
Token to
Token ID284
Feature activation+6.22e-02

Pos logit contributions
 camps+0.001
latable+0.001
 knights+0.001
aw+0.001
 hay+0.001
Neg logit contributions
grand-0.001
t-0.001
 hairc-0.001
 divorce-0.001
 disobedience-0.001
Token the
Token ID262
Feature activation+3.08e-02

Pos logit contributions
grand+0.130
 hairc+0.123
 disobedience+0.123
 divorce+0.121
 supervision+0.120
Neg logit contributions
 camps-0.060
latable-0.052
 knights-0.051
aw-0.048
 hay-0.048
Token lake
Token ID13546
Feature activation+3.23e-02

Pos logit contributions
 divorce+0.060
 hairc+0.059
 virus+0.058
 lesson+0.058
grand+0.058
Neg logit contributions
acking-0.028
und-0.027
 knights-0.026
latable-0.026
 camps-0.026
Token today
Token ID1909
Feature activation+8.32e-02

Pos logit contributions
grand+0.057
t+0.054
 hairc+0.054
 disobedience+0.054
 worry+0.053
Neg logit contributions
 camps-0.039
latable-0.038
 knights-0.035
 clearing-0.034
 hay-0.033
Token his
Token ID465
Feature activation+4.60e-03

Pos logit contributions
grand+0.057
 hairc+0.056
t+0.056
 punishment+0.055
 divorce+0.054
Neg logit contributions
 camps-0.036
latable-0.034
acking-0.032
 pond-0.031
 hay-0.031
Token boss
Token ID6478
Feature activation+2.33e-02

Pos logit contributions
grand+0.007
 supervision+0.006
 disobedience+0.006
t+0.006
 divorce+0.006
Neg logit contributions
 camps-0.007
 knights-0.007
 jungle-0.007
 fish-0.006
 outside-0.006
Token called
Token ID1444
Feature activation-1.14e-04

Pos logit contributions
grand+0.045
 disobedience+0.041
 hairc+0.041
t+0.040
 supervision+0.040
Neg logit contributions
 camps-0.027
 knights-0.024
latable-0.024
 hay-0.023
 clearing-0.022
Token him
Token ID683
Feature activation+3.12e-02

Pos logit contributions
 camps+0.000
latable+0.000
acking+0.000
 clearing+0.000
 hay+0.000
Neg logit contributions
grand-0.000
 hairc-0.000
 divorce-0.000
 disobedience-0.000
 punishment-0.000
Token into
Token ID656
Feature activation-2.86e-02

Pos logit contributions
grand+0.058
 disobedience+0.053
 hairc+0.052
 divorce+0.052
t+0.051
Neg logit contributions
 camps-0.039
latable-0.035
acking-0.033
 clearing-0.033
 knights-0.033
Token his
Token ID465
Feature activation+3.13e-02

Pos logit contributions
 camps+0.026
latable+0.022
 knights+0.020
 hay+0.019
aw+0.019
Neg logit contributions
grand-0.062
 disobedience-0.060
 hairc-0.059
 supervision-0.058
 divorce-0.058
Token office
Token ID2607
Feature activation+5.22e-03

Pos logit contributions
grand+0.058
t+0.056
rav+0.054
 divorce+0.054
 disobedience+0.053
Neg logit contributions
 camps-0.039
 jungle-0.036
 knights-0.036
 outside-0.034
 stable-0.034
Token,
Token ID11
Feature activation+7.96e-04

Pos logit contributions
grand+0.011
 disobedience+0.010
 hairc+0.010
t+0.010
 supervision+0.010
Neg logit contributions
 camps-0.005
latable-0.004
 knights-0.004
 hay-0.004
 clearing-0.004
Token and
Token ID290
Feature activation+4.89e-03

Pos logit contributions
grand+0.001
 disobedience+0.001
 hairc+0.001
 worry+0.001
 divorce+0.001
Neg logit contributions
 camps-0.001
latable-0.001
acking-0.001
 knights-0.001
 hay-0.001
Token asked
Token ID1965
Feature activation+2.37e-02

Pos logit contributions
grand+0.009
 hairc+0.008
 disobedience+0.008
 divorce+0.008
t+0.008
Neg logit contributions
 camps-0.006
latable-0.005
 knights-0.005
 hay-0.005
acking-0.005
Token John
Token ID1757
Feature activation+7.53e-02

Pos logit contributions
grand+0.048
 hairc+0.046
 disobedience+0.045
 supervision+0.044
 lesson+0.044
Neg logit contributions
 camps-0.025
latable-0.022
 clearing-0.020
acking-0.020
 pond-0.020
Token man
Token ID582
Feature activation-8.08e-02

Pos logit contributions
 camps+0.148
latable+0.139
 knights+0.130
acking+0.128
aw+0.123
Neg logit contributions
grand-0.214
 disobedience-0.200
 hairc-0.199
 supervision-0.198
 lesson-0.195
Token learned
Token ID4499
Feature activation-6.36e-02

Pos logit contributions
 camps+0.116
latable+0.108
 knights+0.098
 hay+0.098
 football+0.093
Neg logit contributions
grand-0.138
 disobedience-0.125
 hairc-0.125
 supervision-0.124
 promise-0.124
Token to
Token ID284
Feature activation-3.39e-03

Pos logit contributions
 camps+0.074
latable+0.063
 knights+0.062
acking+0.059
 clearing+0.059
Neg logit contributions
grand-0.124
 hairc-0.119
 disobedience-0.117
 lesson-0.114
 supervision-0.114
Token be
Token ID307
Feature activation-1.44e-02

Pos logit contributions
 camps+0.007
latable+0.006
 knights+0.006
acking+0.006
aw+0.006
Neg logit contributions
grand-0.004
 disob-0.004
 promise-0.004
 worry-0.004
 disobedience-0.004
Token patient
Token ID5827
Feature activation-4.81e-02

Pos logit contributions
 camps+0.028
latable+0.026
acking+0.024
und+0.023
 football+0.022
Neg logit contributions
 lesson-0.017
 disob-0.017
 supervision-0.017
 worry-0.016
 promise-0.016
Token and
Token ID290
Feature activation-4.41e-03

Pos logit contributions
 camps+0.046
latable+0.041
 clearing+0.036
 knights+0.034
 jungle+0.033
Neg logit contributions
grand-0.101
 disobedience-0.099
 supervision-0.099
 hairc-0.097
 worry-0.095
Token not
Token ID407
Feature activation+1.88e-02

Pos logit contributions
 camps+0.007
latable+0.007
 knights+0.006
 hay+0.006
 clearing+0.006
Neg logit contributions
grand-0.006
 disobedience-0.006
 supervision-0.006
 hairc-0.006
 disob-0.006
Token get
Token ID651
Feature activation-3.39e-02

Pos logit contributions
grand+0.029
 disobedience+0.027
 supervision+0.027
 hairc+0.026
 disob+0.026
Neg logit contributions
 camps-0.030
latable-0.027
 knights-0.026
 clearing-0.025
 hay-0.025
Token upset
Token ID9247
Feature activation-5.44e-02

Pos logit contributions
 camps+0.047
latable+0.042
 knights+0.041
 hay+0.040
 clearing+0.040
Neg logit contributions
grand-0.057
 hairc-0.053
 disobedience-0.053
 supervision-0.052
 divorce-0.052
Token with
Token ID351
Feature activation-1.03e-02

Pos logit contributions
 camps+0.063
latable+0.056
 clearing+0.054
 knights+0.051
 hay+0.050
Neg logit contributions
grand-0.104
 disobedience-0.101
 hairc-0.101
 supervision-0.100
t-0.097
Token his
Token ID465
Feature activation+1.29e-02

Pos logit contributions
 camps+0.013
latable+0.011
 knights+0.010
acking+0.010
 clearing+0.010
Neg logit contributions
grand-0.019
 supervision-0.019
 hairc-0.019
 disobedience-0.018
 punishment-0.017
Token They
Token ID1119
Feature activation-1.11e-01

Pos logit contributions
 camps+0.035
 knights+0.032
aw+0.030
latable+0.029
 fish+0.028
Neg logit contributions
grand-0.066
 hairc-0.060
t-0.059
 divorce-0.059
 disobedience-0.059
Token would
Token ID561
Feature activation-4.18e-02

Pos logit contributions
 camps+0.149
 knights+0.128
 hay+0.126
latable+0.126
 outside+0.124
Neg logit contributions
grand-0.198
 disobedience-0.188
 supervision-0.186
 hairc-0.183
 divorce-0.180
Token eat
Token ID4483
Feature activation-8.23e-02

Pos logit contributions
aw+0.062
 camps+0.060
 knights+0.060
acking+0.058
und+0.058
Neg logit contributions
grand-0.061
 worry-0.060
 promise-0.059
 complain-0.058
 argue-0.058
Token y
Token ID331
Feature activation-3.52e-04

Pos logit contributions
 camps+0.119
latable+0.112
 knights+0.111
acking+0.109
 clearing+0.109
Neg logit contributions
grand-0.122
 hairc-0.122
 lesson-0.116
 passports-0.110
 divorce-0.107
Tokenummy
Token ID13513
Feature activation-2.57e-02

Pos logit contributions
 camps+0.000
 knights+0.000
 clearing+0.000
latable+0.000
 hay+0.000
Neg logit contributions
grand-0.001
 hairc-0.001
 disobedience-0.001
t-0.001
 supervision-0.001
Token food
Token ID2057
Feature activation-1.34e-02

Pos logit contributions
 camps+0.037
 knights+0.033
latable+0.033
 clearing+0.032
 hay+0.032
Neg logit contributions
grand-0.042
 hairc-0.039
 disobedience-0.038
 supervision-0.037
 divorce-0.037
Token and
Token ID290
Feature activation+1.13e-02

Pos logit contributions
 camps+0.010
latable+0.009
 knights+0.008
 clearing+0.008
acking+0.007
Neg logit contributions
grand-0.030
 hairc-0.029
t-0.028
 disobedience-0.028
 lesson-0.028
Token play
Token ID711
Feature activation-8.56e-02

Pos logit contributions
grand+0.019
 hairc+0.018
 disobedience+0.017
 worry+0.017
 supervision+0.017
Neg logit contributions
 camps-0.016
latable-0.014
 knights-0.014
acking-0.014
aw-0.014
Token games
Token ID1830
Feature activation-3.23e-02

Pos logit contributions
 camps+0.112
latable+0.098
 clearing+0.096
 river+0.090
acking+0.089
Neg logit contributions
grand-0.137
 lesson-0.136
 hairc-0.133
 note-0.127
 divorce-0.126
Token.
Token ID13
Feature activation-2.70e-02

Pos logit contributions
latable+0.038
 camps+0.038
 clearing+0.034
acking+0.032
 river+0.031
Neg logit contributions
grand-0.057
t-0.057
 hairc-0.053
 lesson-0.053
 note-0.052
Token One
Token ID1881
Feature activation-2.92e-02

Pos logit contributions
 camps+0.031
 knights+0.027
latable+0.027
 hay+0.026
 clearing+0.025
Neg logit contributions
grand-0.052
 disobedience-0.048
 hairc-0.048
 supervision-0.047
 divorce-0.047
Token no
Token ID645
Feature activation+4.04e-02

Pos logit contributions
 camps+0.034
latable+0.029
acking+0.026
 knights+0.026
 clearing+0.026
Neg logit contributions
grand-0.085
 hairc-0.081
 disobedience-0.080
 supervision-0.078
 divorce-0.078
Token other
Token ID584
Feature activation+2.40e-02

Pos logit contributions
grand+0.075
t+0.072
 disobedience+0.070
 supervision+0.070
rav+0.069
Neg logit contributions
 camps-0.051
 hay-0.046
 knights-0.045
 jungle-0.043
 stable-0.043
Token uncle
Token ID7711
Feature activation+5.18e-02

Pos logit contributions
grand+0.044
t+0.043
 divorce+0.042
 supervision+0.042
 disobedience+0.041
Neg logit contributions
 camps-0.029
 hay-0.027
 and-0.026
 fields-0.026
 stable-0.026
Token in
Token ID287
Feature activation-1.29e-02

Pos logit contributions
grand+0.101
 disobedience+0.094
 hairc+0.094
 supervision+0.092
 divorce+0.091
Neg logit contributions
 camps-0.058
 knights-0.051
latable-0.050
 hay-0.048
 clearing-0.047
Token the
Token ID262
Feature activation-9.39e-03

Pos logit contributions
 camps+0.009
 knights+0.007
 hay+0.007
latable+0.007
 clearing+0.006
Neg logit contributions
grand-0.031
 disobedience-0.029
t-0.028
 supervision-0.028
 divorce-0.028
Token world
Token ID995
Feature activation+2.01e-02

Pos logit contributions
 camps+0.009
 knights+0.008
latable+0.008
 hay+0.007
 clearing+0.007
Neg logit contributions
grand-0.020
 disobedience-0.019
 hairc-0.018
 divorce-0.018
 supervision-0.018
Token.
Token ID13
Feature activation-2.15e-02

Pos logit contributions
grand+0.041
 hairc+0.038
t+0.038
 disobedience+0.038
 supervision+0.037
Neg logit contributions
 camps-0.021
latable-0.018
 knights-0.018
 hay-0.017
 clearing-0.017
Token He
Token ID679
Feature activation-9.26e-02

Pos logit contributions
 camps+0.019
 knights+0.017
latable+0.016
aw+0.015
acking+0.015
Neg logit contributions
grand-0.047
 hairc-0.043
 disobedience-0.043
t-0.042
 divorce-0.041
Token loved
Token ID6151
Feature activation-3.09e-02

Pos logit contributions
 camps+0.101
 knights+0.088
latable+0.086
 hay+0.084
 clearing+0.081
Neg logit contributions
grand-0.184
 disobedience-0.172
 hairc-0.171
 supervision-0.170
 divorce-0.167
Token to
Token ID284
Feature activation+3.76e-02

Pos logit contributions
 camps+0.040
 knights+0.035
 hay+0.034
latable+0.034
 jungle+0.033
Neg logit contributions
grand-0.056
 disobedience-0.052
 hairc-0.051
 supervision-0.051
t-0.050
Token eat
Token ID4483
Feature activation-5.48e-02

Pos logit contributions
grand+0.063
 hairc+0.058
 worry+0.057
 disobedience+0.056
 disob+0.055
Neg logit contributions
 camps-0.056
 knights-0.049
acking-0.048
aw-0.048
latable-0.048
Token It
Token ID632
Feature activation+1.78e-02

Pos logit contributions
grand+0.012
 hairc+0.011
t+0.010
 disobedience+0.010
 supervision+0.010
Neg logit contributions
 camps-0.011
latable-0.010
 knights-0.010
 hay-0.010
aw-0.009
Token made
Token ID925
Feature activation+1.02e-02

Pos logit contributions
grand+0.028
 disobedience+0.025
 hairc+0.025
t+0.025
 divorce+0.024
Neg logit contributions
 camps-0.027
 knights-0.024
latable-0.024
 hay-0.023
 clearing-0.023
Token me
Token ID502
Feature activation+6.19e-02

Pos logit contributions
grand+0.016
 hairc+0.015
 disobedience+0.015
 supervision+0.015
 divorce+0.015
Neg logit contributions
 camps-0.015
latable-0.013
 knights-0.013
 clearing-0.013
acking-0.012
Token sad
Token ID6507
Feature activation-2.42e-03

Pos logit contributions
grand+0.099
 hairc+0.092
 disobedience+0.092
 supervision+0.090
t+0.089
Neg logit contributions
 camps-0.092
 knights-0.082
latable-0.081
 hay-0.078
 clearing-0.078
Token.
Token ID13
Feature activation+1.23e-02

Pos logit contributions
 camps+0.003
latable+0.002
 clearing+0.002
 knights+0.002
 jungle+0.002
Neg logit contributions
grand-0.005
 hairc-0.005
 disobedience-0.005
 worry-0.004
 lesson-0.004
Tokenâ
Token ID22940
Feature activation+2.60e-02

Pos logit contributions
grand+0.021
 disobedience+0.019
t+0.019
 hairc+0.019
 supervision+0.019
Neg logit contributions
 camps-0.017
latable-0.015
 knights-0.015
 hay-0.015
 fish-0.014
Token
Token ID26391
Feature activation+8.75e-02

Pos logit contributions
grand+0.025
 disobedience+0.022
 hairc+0.022
 supervision+0.021
 divorce+0.021
Neg logit contributions
 camps-0.055
 knights-0.051
latable-0.051
 hay-0.050
 clearing-0.049
Token
Token ID220
Feature activation+1.45e-01

Pos logit contributions
grand+0.153
 disobedience+0.140
 hairc+0.139
t+0.138
 supervision+0.137
Neg logit contributions
 camps-0.118
 knights-0.104
latable-0.103
 hay-0.102
 clearing-0.098
Token
Token ID198
Feature activation+5.31e-02

Pos logit contributions
grand+0.272
 disobedience+0.250
 hairc+0.249
t+0.247
 supervision+0.245
Neg logit contributions
 camps-0.176
 knights-0.153
latable-0.153
 hay-0.148
 clearing-0.143
Token
Token ID198
Feature activation+3.84e-02

Pos logit contributions
grand+0.111
 hairc+0.106
 supervision+0.105
 disobedience+0.105
 divorce+0.102
Neg logit contributions
 camps-0.051
 knights-0.043
 clearing-0.042
latable-0.042
 hay-0.041
TokenJ
Token ID41
Feature activation+6.57e-02

Pos logit contributions
grand+0.059
 disobedience+0.054
t+0.053
 hairc+0.052
 divorce+0.052
Neg logit contributions
 camps-0.060
latable-0.054
 knights-0.054
 hay-0.052
 fish-0.050
Token an
Token ID281
Feature activation+4.44e-02

Pos logit contributions
 camps+0.003
latable+0.003
acking+0.003
aw+0.003
 jungle+0.003
Neg logit contributions
 hairc-0.004
 supervision-0.004
 worry-0.004
 protests-0.004
 disobedience-0.004
Token empty
Token ID6565
Feature activation-5.54e-02

Pos logit contributions
grand+0.084
 disobedience+0.078
 hairc+0.078
 supervision+0.077
 divorce+0.076
Neg logit contributions
 camps-0.052
 knights-0.046
latable-0.045
 hay-0.044
 clearing-0.043
Token bowl
Token ID9396
Feature activation+2.91e-02

Pos logit contributions
 camps+0.081
latable+0.071
 knights+0.071
 hay+0.066
aw+0.066
Neg logit contributions
grand-0.090
 hairc-0.083
 disobedience-0.081
 supervision-0.080
t-0.079
Token.
Token ID13
Feature activation-3.21e-02

Pos logit contributions
t+0.045
 note+0.042
grand+0.042
 policy+0.041
 promise+0.041
Neg logit contributions
 camps-0.043
latable-0.043
 clearing-0.036
 astronaut-0.036
 jungle-0.036
Token He
Token ID679
Feature activation-9.03e-02

Pos logit contributions
 camps+0.035
latable+0.032
 sailing+0.030
acking+0.030
 astronaut+0.029
Neg logit contributions
grand-0.060
 disobedience-0.058
 hairc-0.057
t-0.056
 rhy-0.056
Token has
Token ID468
Feature activation-5.20e-02

Pos logit contributions
latable+0.150
 camps+0.147
 clearing+0.142
acking+0.141
 hay+0.134
Neg logit contributions
 hairc-0.108
grand-0.108
 disob-0.104
 promise-0.103
 disobedience-0.102
Token no
Token ID645
Feature activation+4.89e-02

Pos logit contributions
 camps+0.051
 knights+0.043
latable+0.043
 hay+0.041
 clearing+0.040
Neg logit contributions
grand-0.109
 disobedience-0.102
 hairc-0.102
 supervision-0.100
 divorce-0.099
Token past
Token ID1613
Feature activation+1.76e-02

Pos logit contributions
grand+0.088
 hairc+0.082
 supervision+0.081
 disobedience+0.081
 lesson+0.080
Neg logit contributions
 camps-0.060
latable-0.056
acking-0.052
 knights-0.052
aw-0.050
Tokenels
Token ID1424
Feature activation+2.21e-02

Pos logit contributions
 hairc+0.025
t+0.023
 supervision+0.022
grand+0.022
 disobedience+0.022
Neg logit contributions
 camps-0.028
latable-0.026
 clearing-0.025
acking-0.024
 jungle-0.024
Token.
Token ID13
Feature activation-2.00e-02

Pos logit contributions
t+0.041
grand+0.040
 hairc+0.039
 supervision+0.038
 disobedience+0.037
Neg logit contributions
 camps-0.028
latable-0.026
 clearing-0.024
 knights-0.022
 jungle-0.022
Token He
Token ID679
Feature activation-7.78e-02

Pos logit contributions
 camps+0.019
latable+0.017
 sailing+0.015
acking+0.015
 astronaut+0.014
Neg logit contributions
grand-0.042
t-0.040
 disobedience-0.040
 hairc-0.039
 lesson-0.038
Token scary
Token ID14343
Feature activation-8.66e-02

Pos logit contributions
grand+0.036
 hairc+0.033
t+0.032
 disobedience+0.032
 disob+0.032
Neg logit contributions
 camps-0.036
latable-0.033
 clearing-0.031
 hay-0.031
 knights-0.030
Token.
Token ID13
Feature activation-4.55e-02

Pos logit contributions
 camps+0.124
latable+0.123
acking+0.113
 clearing+0.112
 trailer+0.103
Neg logit contributions
 hairc-0.136
grand-0.130
 disobedience-0.125
t-0.122
 policy-0.120
Token But
Token ID887
Feature activation+3.91e-02

Pos logit contributions
 camps+0.030
latable+0.023
 knights+0.023
 hay+0.022
 clearing+0.020
Neg logit contributions
grand-0.111
 hairc-0.104
 disobedience-0.104
t-0.102
 supervision-0.102
Token he
Token ID339
Feature activation-8.67e-02

Pos logit contributions
grand+0.080
 disobedience+0.075
 hairc+0.075
 supervision+0.073
 divorce+0.073
Neg logit contributions
 camps-0.040
latable-0.035
 knights-0.034
 hay-0.033
 clearing-0.032
Token was
Token ID373
Feature activation-1.17e-02

Pos logit contributions
 camps+0.116
 knights+0.104
latable+0.101
 hay+0.100
 fish+0.098
Neg logit contributions
grand-0.152
 disobedience-0.140
 supervision-0.140
 hairc-0.139
 divorce-0.137
Token also
Token ID635
Feature activation-5.44e-02

Pos logit contributions
 camps+0.022
latable+0.020
und+0.019
acking+0.019
aw+0.019
Neg logit contributions
 disob-0.014
 hairc-0.013
 worry-0.013
grand-0.013
t-0.013
Token very
Token ID845
Feature activation+3.53e-02

Pos logit contributions
 camps+0.088
latable+0.077
 hay+0.073
acking+0.072
 knights+0.072
Neg logit contributions
grand-0.079
 hairc-0.077
 disob-0.076
 disobedience-0.075
t-0.073
Token ignorant
Token ID20109
Feature activation-5.57e-02

Pos logit contributions
grand+0.048
 disob+0.045
 hairc+0.044
t+0.044
 disobedience+0.044
Neg logit contributions
 camps-0.061
latable-0.056
 clearing-0.053
 hay-0.053
 knights-0.052
Token.
Token ID13
Feature activation-4.02e-02

Pos logit contributions
 camps+0.059
latable+0.050
 clearing+0.048
 knights+0.047
 hay+0.045
Neg logit contributions
grand-0.113
 hairc-0.110
 disobedience-0.108
t-0.106
 supervision-0.104
Token He
Token ID679
Feature activation-1.10e-01

Pos logit contributions
 camps+0.027
 knights+0.022
latable+0.020
 hay+0.020
 jungle+0.019
Neg logit contributions
grand-0.097
 disobedience-0.092
 supervision-0.091
 hairc-0.091
 divorce-0.090
Token didn
Token ID1422
Feature activation+2.20e-02

Pos logit contributions
 camps+0.154
 knights+0.139
latable+0.134
 hay+0.134
 fish+0.131
Neg logit contributions
grand-0.185
 disobedience-0.170
 supervision-0.170
 hairc-0.168
 divorce-0.165
Token enjoyed
Token ID8359
Feature activation-1.03e-01

Pos logit contributions
 camps+0.161
latable+0.147
 knights+0.141
 hay+0.139
 clearing+0.138
Neg logit contributions
grand-0.183
 disobedience-0.167
 hairc-0.166
t-0.165
 supervision-0.164
Token the
Token ID262
Feature activation-5.18e-02

Pos logit contributions
 camps+0.122
latable+0.108
 knights+0.102
acking+0.101
 clearing+0.099
Neg logit contributions
 hairc-0.188
grand-0.187
 disobedience-0.184
 supervision-0.181
 lesson-0.179
Token sunny
Token ID27737
Feature activation-9.01e-02

Pos logit contributions
 camps+0.076
 knights+0.070
latable+0.068
 hay+0.065
 clearing+0.065
Neg logit contributions
grand-0.082
 disobedience-0.076
 hairc-0.075
 divorce-0.073
 supervision-0.073
Token day
Token ID1110
Feature activation+1.32e-02

Pos logit contributions
 knights+0.193
 camps+0.190
latable+0.183
 trailer+0.178
aw+0.176
Neg logit contributions
grand-0.072
 hairc-0.063
 lesson-0.062
 disobedience-0.061
t-0.059
Token.
Token ID13
Feature activation-5.39e-02

Pos logit contributions
grand+0.027
t+0.025
 hairc+0.025
 disobedience+0.025
 supervision+0.025
Neg logit contributions
 camps-0.013
latable-0.012
 knights-0.011
 clearing-0.011
 hay-0.011
Token<|endoftext|>
Token ID50256
Feature activation-5.03e-02

Pos logit contributions
 camps+0.064
 knights+0.058
latable+0.057
 hay+0.054
 outside+0.052
Neg logit contributions
grand-0.103
 disobedience-0.092
t-0.091
 hairc-0.091
 supervision-0.088
TokenOnce
Token ID7454
Feature activation+1.23e-02

Pos logit contributions
 camps+0.073
latable+0.068
 knights+0.066
 hay+0.063
 astronaut+0.062
Neg logit contributions
grand-0.079
t-0.075
 hairc-0.073
 disobedience-0.071
 lesson-0.069
Token upon
Token ID2402
Feature activation-9.68e-02

Pos logit contributions
grand+0.021
 hairc+0.020
 disobedience+0.020
 supervision+0.019
 divorce+0.019
Neg logit contributions
 camps-0.017
 knights-0.015
 outside-0.014
 fish-0.014
 clearing-0.014
Token a
Token ID257
Feature activation-8.12e-02

Pos logit contributions
acking+0.130
 camps+0.129
latable+0.117
und+0.112
 trailer+0.105
Neg logit contributions
 hairc-0.160
grand-0.158
 lesson-0.155
t-0.154
 worry-0.153
Token time
Token ID640
Feature activation+1.08e-02

Pos logit contributions
 camps+0.143
 knights+0.138
 clearing+0.132
latable+0.132
 hay+0.130
Neg logit contributions
grand-0.095
 disobedience-0.088
t-0.085
 promise-0.083
 lesson-0.083
Token there
Token ID612
Feature activation-6.61e-02

Pos logit contributions
grand+0.025
 disobedience+0.023
 hairc+0.023
 supervision+0.023
t+0.023
Neg logit contributions
 camps-0.008
 knights-0.007
latable-0.007
 hay-0.006
 clearing-0.006
Token Jack
Token ID3619
Feature activation-1.27e-02

Pos logit contributions
 camps+0.050
 knights+0.049
 clearing+0.047
aw+0.047
bridge+0.046
Neg logit contributions
grand-0.068
t-0.062
 hairc-0.061
 worry-0.059
 supervision-0.059
Token who
Token ID508
Feature activation-3.15e-02

Pos logit contributions
 camps+0.017
latable+0.017
 knights+0.016
 clearing+0.015
 hay+0.015
Neg logit contributions
grand-0.022
 hairc-0.019
 worry-0.019
 disobedience-0.019
 disob-0.019
Token loved
Token ID6151
Feature activation-5.39e-02

Pos logit contributions
 camps+0.033
 knights+0.028
latable+0.028
 hay+0.026
 clearing+0.026
Neg logit contributions
grand-0.066
 disobedience-0.061
 hairc-0.060
t-0.059
 divorce-0.058
Token to
Token ID284
Feature activation+1.81e-02

Pos logit contributions
 camps+0.075
 knights+0.067
latable+0.067
 hay+0.065
 clearing+0.064
Neg logit contributions
grand-0.090
 disobedience-0.083
 hairc-0.083
 supervision-0.081
 divorce-0.081
Token listen
Token ID6004
Feature activation-4.74e-02

Pos logit contributions
grand+0.030
 hairc+0.029
 worry+0.028
 rhy+0.028
 disobedience+0.027
Neg logit contributions
 camps-0.026
aw-0.024
 clearing-0.023
 knights-0.023
acking-0.023
Token.
Token ID13
Feature activation-3.94e-02

Pos logit contributions
 camps+0.074
latable+0.068
 knights+0.067
 clearing+0.065
 jungle+0.063
Neg logit contributions
grand-0.070
 hairc-0.066
 disobedience-0.065
t-0.064
 lesson-0.063
Token He
Token ID679
Feature activation-1.14e-01

Pos logit contributions
 camps+0.038
aw+0.033
 knights+0.031
 clearing+0.030
 hay+0.030
Neg logit contributions
grand-0.077
 disobedience-0.074
t-0.073
 lesson-0.073
 hairc-0.073
Token was
Token ID373
Feature activation-2.44e-02

Pos logit contributions
 camps+0.151
 knights+0.134
latable+0.130
 hay+0.129
 clearing+0.125
Neg logit contributions
grand-0.202
 disobedience-0.187
 supervision-0.185
 hairc-0.185
 divorce-0.182
Token only
Token ID691
Feature activation-1.19e-04

Pos logit contributions
 camps+0.040
latable+0.037
und+0.036
aw+0.036
acking+0.035
Neg logit contributions
 hairc-0.034
 disob-0.032
 worry-0.031
grand-0.030
nam-0.030
Token three
Token ID1115
Feature activation-1.13e-02

Pos logit contributions
 camps+0.000
latable+0.000
aw+0.000
acking+0.000
und+0.000
Neg logit contributions
 hairc-0.000
grand-0.000
 supervision-0.000
 worry-0.000
 disobedience-0.000
Token years
Token ID812
Feature activation+2.27e-02

Pos logit contributions
 camps+0.020
 knights+0.019
latable+0.018
 hay+0.018
 clearing+0.018
Neg logit contributions
grand-0.015
 disobedience-0.013
 hairc-0.013
 supervision-0.013
 divorce-0.013
Token the
Token ID262
Feature activation+3.91e-03

Pos logit contributions
grand+0.042
 hairc+0.040
 disobedience+0.040
 supervision+0.039
 divorce+0.039
Neg logit contributions
 camps-0.024
latable-0.021
 knights-0.020
 hay-0.019
acking-0.019
Token boy
Token ID2933
Feature activation+2.87e-02

Pos logit contributions
grand+0.007
 disobedience+0.006
 hairc+0.006
t+0.006
 supervision+0.006
Neg logit contributions
 camps-0.005
 knights-0.005
latable-0.005
 hay-0.005
 clearing-0.004
Token happily
Token ID18177
Feature activation+3.12e-03

Pos logit contributions
grand+0.042
 disobedience+0.038
 hairc+0.037
 supervision+0.036
t+0.036
Neg logit contributions
 camps-0.048
latable-0.043
 knights-0.042
 hay-0.042
 clearing-0.041
Token accepted
Token ID6292
Feature activation-3.85e-02

Pos logit contributions
grand+0.004
 promise+0.004
t+0.004
 complained+0.004
 disob+0.004
Neg logit contributions
 camps-0.005
latable-0.005
 knights-0.005
 jungle-0.005
 clearing-0.005
Token.
Token ID13
Feature activation+1.26e-03

Pos logit contributions
 camps+0.031
 knights+0.026
 hay+0.024
latable+0.024
 fish+0.023
Neg logit contributions
grand-0.088
 disobedience-0.082
 hairc-0.081
 divorce-0.081
t-0.080
Token He
Token ID679
Feature activation-7.56e-02

Pos logit contributions
grand+0.003
 disobedience+0.002
 hairc+0.002
 supervision+0.002
t+0.002
Neg logit contributions
 camps-0.001
 knights-0.001
latable-0.001
 hay-0.001
 clearing-0.001
Token smiled
Token ID13541
Feature activation-3.36e-02

Pos logit contributions
 camps+0.123
latable+0.114
 knights+0.113
 clearing+0.108
 hay+0.107
Neg logit contributions
grand-0.108
 disobedience-0.097
 hairc-0.095
 lesson-0.095
 disob-0.095
Token and
Token ID290
Feature activation+2.70e-02

Pos logit contributions
latable+0.041
 camps+0.040
 hay+0.038
 jungle+0.037
 swings+0.035
Neg logit contributions
 worry-0.058
 lesson-0.054
 punishment-0.053
 hairc-0.052
 note-0.052
Token thanked
Token ID26280
Feature activation+2.11e-02

Pos logit contributions
grand+0.032
 hairc+0.030
 complained+0.029
 lesson+0.029
 disob+0.029
Neg logit contributions
 camps-0.052
latable-0.048
 knights-0.046
 clearing-0.044
 jungle-0.044
Token the
Token ID262
Feature activation+4.15e-02

Pos logit contributions
grand+0.050
 disobedience+0.047
t+0.047
 supervision+0.047
 divorce+0.046
Neg logit contributions
 camps-0.014
 knights-0.012
 fish-0.011
 hay-0.010
 outside-0.010
Token old
Token ID1468
Feature activation-3.40e-03

Pos logit contributions
grand+0.078
 disobedience+0.072
t+0.072
 hairc+0.072
 supervision+0.071
Neg logit contributions
 camps-0.050
 knights-0.045
 hay-0.043
latable-0.043
 fish-0.043
Token she
Token ID673
Feature activation-1.20e-01

Pos logit contributions
 camps+0.074
latable+0.068
 knights+0.067
 jungle+0.064
acking+0.064
Neg logit contributions
grand-0.065
 hairc-0.060
 disobedience-0.059
t-0.057
 supervision-0.057
Token found
Token ID1043
Feature activation-8.46e-02

Pos logit contributions
 camps+0.179
latable+0.177
 football+0.164
 knights+0.161
 jungle+0.158
Neg logit contributions
grand-0.178
 complained-0.165
t-0.163
 disob-0.162
 hairc-0.156
Token a
Token ID257
Feature activation-4.85e-02

Pos logit contributions
 camps+0.094
latable+0.087
acking+0.084
 jungle+0.078
 knights+0.074
Neg logit contributions
 hairc-0.157
grand-0.157
 supervision-0.155
 disobedience-0.151
 permission-0.149
Token lady
Token ID10846
Feature activation-2.35e-02

Pos logit contributions
 camps+0.062
 knights+0.055
latable+0.054
 hay+0.053
aw+0.051
Neg logit contributions
grand-0.086
 disobedience-0.080
 hairc-0.080
 supervision-0.078
 divorce-0.078
Tokenbug
Token ID25456
Feature activation-1.07e-01

Pos logit contributions
 camps+0.028
latable+0.026
 hay+0.023
 knights+0.022
 football+0.022
Neg logit contributions
grand-0.045
 supervision-0.042
 hairc-0.041
 note-0.040
 worry-0.040
Token crawling
Token ID34499
Feature activation-1.50e-01

Pos logit contributions
latable+0.126
 camps+0.124
acking+0.100
 astronaut+0.098
uting+0.097
Neg logit contributions
grand-0.193
 note-0.188
 supervision-0.186
 promise-0.184
 disobedience-0.179
Token across
Token ID1973
Feature activation-8.25e-02

Pos logit contributions
 camps+0.173
latable+0.160
 knights+0.150
 jungle+0.142
 astronaut+0.135
Neg logit contributions
t-0.278
grand-0.274
 hairc-0.269
 lesson-0.263
 disobedience-0.262
Token the
Token ID262
Feature activation-3.96e-02

Pos logit contributions
 camps+0.086
latable+0.075
 knights+0.069
acking+0.066
 astronaut+0.063
Neg logit contributions
grand-0.167
 hairc-0.161
 disobedience-0.159
 supervision-0.154
 divorce-0.154
Token grass
Token ID8701
Feature activation-6.93e-02

Pos logit contributions
 camps+0.047
latable+0.041
 knights+0.039
 football+0.036
acking+0.035
Neg logit contributions
grand-0.075
 hairc-0.071
 disobedience-0.068
 divorce-0.068
 lesson-0.068
Token.
Token ID13
Feature activation-5.11e-02

Pos logit contributions
 camps+0.099
latable+0.096
 astronaut+0.092
 knights+0.091
 cowboy+0.083
Neg logit contributions
t-0.115
 note-0.102
grand-0.096
 disobedience-0.095
 disagreed-0.093
Token She
Token ID1375
Feature activation-1.18e-01

Pos logit contributions
 camps+0.064
latable+0.055
 knights+0.053
 astronaut+0.052
 sailing+0.050
Neg logit contributions
grand-0.094
 hairc-0.088
t-0.087
 disobedience-0.084
 babys-0.084
Token across
Token ID1973
Feature activation-9.05e-02

Pos logit contributions
 camps+0.102
latable+0.097
 astronaut+0.090
 knights+0.085
 jungle+0.085
Neg logit contributions
t-0.091
 disobedience-0.085
 worry-0.085
rav-0.085
 hairc-0.083
Token a
Token ID257
Feature activation-6.31e-02

Pos logit contributions
 camps+0.076
latable+0.063
 knights+0.059
acking+0.054
 hay+0.054
Neg logit contributions
grand-0.201
 disobedience-0.193
 hairc-0.193
 divorce-0.188
 supervision-0.188
Token big
Token ID1263
Feature activation-1.29e-01

Pos logit contributions
 camps+0.079
 knights+0.071
latable+0.070
 hay+0.066
aw+0.065
Neg logit contributions
grand-0.112
 disobedience-0.105
 hairc-0.104
 divorce-0.103
 supervision-0.101
Token juicy
Token ID35075
Feature activation-1.16e-01

Pos logit contributions
 camps+0.171
 knights+0.154
latable+0.153
aw+0.146
uting+0.145
Neg logit contributions
 worry-0.207
 note-0.203
 virus-0.199
 promise-0.199
 divorce-0.194
Token steak
Token ID26320
Feature activation-6.04e-02

Pos logit contributions
 camps+0.169
 knights+0.164
latable+0.161
 astronaut+0.161
und+0.153
Neg logit contributions
 note-0.178
 worry-0.161
 promise-0.159
 reason-0.159
 punishment-0.155
Token lying
Token ID9105
Feature activation-1.87e-01

Pos logit contributions
latable+0.074
 camps+0.073
 knights+0.065
 astronaut+0.062
 jungle+0.060
Neg logit contributions
 note-0.106
t-0.104
grand-0.101
 policy-0.099
 disobedience-0.097
Token on
Token ID319
Feature activation-4.34e-02

Pos logit contributions
 camps+0.217
latable+0.211
 knights+0.191
 astronaut+0.171
 jungle+0.171
Neg logit contributions
t-0.343
 note-0.320
grand-0.320
 hairc-0.312
 disobedience-0.312
Token the
Token ID262
Feature activation-2.78e-02

Pos logit contributions
 camps+0.062
latable+0.057
 knights+0.053
acking+0.053
 astronaut+0.050
Neg logit contributions
 hairc-0.068
grand-0.068
 punishment-0.067
 note-0.066
 lesson-0.066
Token grass
Token ID8701
Feature activation-7.27e-02

Pos logit contributions
 camps+0.041
 knights+0.038
latable+0.038
und+0.037
acking+0.037
Neg logit contributions
grand-0.044
 hairc-0.039
 supervision-0.038
 lesson-0.038
 disobedience-0.038
Token in
Token ID287
Feature activation-5.20e-02

Pos logit contributions
 camps+0.105
 astronaut+0.101
latable+0.101
 knights+0.101
 cops+0.090
Neg logit contributions
t-0.118
 note-0.104
grand-0.103
 disobedience-0.098
 supervision-0.097
Token front
Token ID2166
Feature activation-1.55e-02

Pos logit contributions
 camps+0.067
latable+0.060
 knights+0.058
aw+0.053
acking+0.053
Neg logit contributions
grand-0.093
 disobedience-0.088
 hairc-0.088
 worry-0.087
 punishment-0.086
Token to
Token ID284
Feature activation-1.94e-02

Pos logit contributions
 camps+0.036
latable+0.033
 knights+0.032
 jungle+0.031
 hay+0.029
Neg logit contributions
t-0.065
grand-0.064
 hairc-0.061
 disobedience-0.061
 note-0.060
Token find
Token ID1064
Feature activation-6.74e-02

Pos logit contributions
 camps+0.028
latable+0.025
 knights+0.024
acking+0.024
aw+0.023
Neg logit contributions
grand-0.032
 hairc-0.029
 disobedience-0.029
 supervision-0.029
 promise-0.028
Token that
Token ID326
Feature activation-4.07e-02

Pos logit contributions
 camps+0.086
latable+0.078
acking+0.074
 knights+0.070
 jungle+0.070
Neg logit contributions
grand-0.122
 hairc-0.115
 supervision-0.113
 disobedience-0.113
 divorce-0.110
Token she
Token ID673
Feature activation-9.41e-02

Pos logit contributions
 camps+0.051
 knights+0.046
 hay+0.045
 outside+0.044
 fish+0.043
Neg logit contributions
grand-0.073
 disobedience-0.068
 supervision-0.067
t-0.066
 hairc-0.066
Token was
Token ID373
Feature activation-2.58e-02

Pos logit contributions
 camps+0.150
latable+0.137
 knights+0.134
 hay+0.132
 clearing+0.129
Neg logit contributions
grand-0.141
 disobedience-0.128
 hairc-0.126
 supervision-0.124
t-0.124
Token flying
Token ID7348
Feature activation-1.64e-01

Pos logit contributions
 camps+0.037
latable+0.035
acking+0.032
 football+0.032
 hay+0.031
Neg logit contributions
grand-0.040
 worry-0.039
 disob-0.039
 hairc-0.039
 disobedience-0.038
Token safely
Token ID11512
Feature activation-4.69e-02

Pos logit contributions
 camps+0.172
latable+0.153
 knights+0.140
acking+0.129
 hay+0.128
Neg logit contributions
grand-0.321
t-0.319
 hairc-0.306
 lesson-0.305
 disobedience-0.304
Token back
Token ID736
Feature activation-4.92e-02

Pos logit contributions
 camps+0.055
latable+0.052
 knights+0.049
 hay+0.047
 football+0.045
Neg logit contributions
grand-0.086
t-0.085
 hairc-0.081
 disobedience-0.081
 supervision-0.080
Token into
Token ID656
Feature activation-9.20e-02

Pos logit contributions
 camps+0.071
latable+0.071
 knights+0.063
 hay+0.060
 football+0.060
Neg logit contributions
grand-0.078
 hairc-0.074
 disobedience-0.072
 supervision-0.071
 policy-0.069
Token the
Token ID262
Feature activation-4.84e-02

Pos logit contributions
 camps+0.091
latable+0.080
 knights+0.074
 hay+0.072
acking+0.071
Neg logit contributions
grand-0.190
 hairc-0.182
 disobedience-0.180
 supervision-0.176
 divorce-0.175
Token sky
Token ID6766
Feature activation-4.01e-02

Pos logit contributions
 camps+0.057
latable+0.054
 knights+0.052
acking+0.049
 hay+0.047
Neg logit contributions
grand-0.087
 divorce-0.081
 hairc-0.081
 disobedience-0.081
 virus-0.079
Token.
Token ID13
Feature activation-3.15e-02

Pos logit contributions
grand+0.035
 disobedience+0.033
 hairc+0.033
 supervision+0.032
t+0.032
Neg logit contributions
 camps-0.018
 knights-0.016
latable-0.016
 hay-0.015
 clearing-0.015
Token She
Token ID1375
Feature activation-9.40e-02

Pos logit contributions
 camps+0.042
latable+0.038
 astronaut+0.037
 hay+0.033
 knights+0.033
Neg logit contributions
grand-0.050
 disobedience-0.050
 hairc-0.049
 supervision-0.048
 punishment-0.048
Token thought
Token ID1807
Feature activation+1.41e-02

Pos logit contributions
 camps+0.170
 knights+0.153
latable+0.152
 clearing+0.149
 hay+0.147
Neg logit contributions
grand-0.121
 disob-0.113
 disobedience-0.110
 hairc-0.110
 complained-0.109
Token he
Token ID339
Feature activation-4.88e-02

Pos logit contributions
grand+0.026
 hairc+0.024
 disobedience+0.024
 supervision+0.024
t+0.024
Neg logit contributions
 camps-0.018
latable-0.015
 clearing-0.015
 knights-0.015
 hay-0.014
Token was
Token ID373
Feature activation+1.41e-02

Pos logit contributions
 camps+0.074
 knights+0.066
latable+0.065
 hay+0.064
 clearing+0.062
Neg logit contributions
grand-0.077
 disobedience-0.071
 hairc-0.070
t-0.069
 supervision-0.068
Token lying
Token ID9105
Feature activation-1.50e-01

Pos logit contributions
 hairc+0.020
grand+0.020
 disob+0.019
 supervision+0.019
 disobedience+0.019
Neg logit contributions
 camps-0.024
acking-0.020
latable-0.020
und-0.019
 clearing-0.019
Token to
Token ID284
Feature activation+2.50e-02

Pos logit contributions
 camps+0.159
latable+0.144
 knights+0.126
 clearing+0.124
 hay+0.119
Neg logit contributions
grand-0.290
 disobedience-0.286
t-0.283
 hairc-0.283
 supervision-0.277
Token make
Token ID787
Feature activation-8.49e-03

Pos logit contributions
grand+0.050
 disobedience+0.047
 hairc+0.046
 supervision+0.046
 divorce+0.046
Neg logit contributions
 camps-0.027
 knights-0.023
latable-0.023
 hay-0.022
 clearing-0.022
Token her
Token ID607
Feature activation+2.91e-02

Pos logit contributions
 camps+0.010
latable+0.009
 clearing+0.009
 knights+0.008
 jungle+0.008
Neg logit contributions
grand-0.016
 hairc-0.015
 disobedience-0.015
 supervision-0.015
 divorce-0.014
Token feel
Token ID1254
Feature activation-2.44e-02

Pos logit contributions
grand+0.047
t+0.042
 disobedience+0.041
rav+0.041
 policy+0.040
Neg logit contributions
 camps-0.043
 knights-0.040
 hay-0.039
 outside-0.039
 fish-0.038
Token better
Token ID1365
Feature activation+8.67e-03

Pos logit contributions
 camps+0.043
latable+0.039
 knights+0.038
 clearing+0.037
 jungle+0.037
Neg logit contributions
grand-0.032
 disobedience-0.030
 hairc-0.030
 lesson-0.029
 supervision-0.029
Token day
Token ID1110
Feature activation-1.34e-03

Pos logit contributions
 camps+0.020
 knights+0.018
latable+0.018
 hay+0.017
 clearing+0.017
Neg logit contributions
grand-0.017
 disobedience-0.016
 hairc-0.016
 supervision-0.015
 divorce-0.015
Token,
Token ID11
Feature activation-5.41e-02

Pos logit contributions
 camps+0.002
latable+0.001
acking+0.001
 hay+0.001
 knights+0.001
Neg logit contributions
grand-0.003
 hairc-0.002
t-0.002
 disobedience-0.002
 punishment-0.002
Token while
Token ID981
Feature activation-1.58e-02

Pos logit contributions
 camps+0.062
latable+0.058
acking+0.055
 hay+0.049
 knights+0.049
Neg logit contributions
grand-0.103
 hairc-0.101
 disobedience-0.097
 divorce-0.096
 punishment-0.095
Token he
Token ID339
Feature activation-1.07e-01

Pos logit contributions
 camps+0.020
latable+0.020
acking+0.017
 hay+0.017
aw+0.016
Neg logit contributions
grand-0.027
 hairc-0.027
 disobedience-0.026
 punishment-0.026
 divorce-0.026
Token was
Token ID373
Feature activation-2.63e-02

Pos logit contributions
 camps+0.161
 knights+0.147
latable+0.147
 hay+0.144
 clearing+0.138
Neg logit contributions
grand-0.164
 disobedience-0.152
 hairc-0.152
t-0.148
 disob-0.146
Token walking
Token ID6155
Feature activation-1.56e-01

Pos logit contributions
 camps+0.033
latable+0.031
 hay+0.027
und+0.027
aw+0.027
Neg logit contributions
 disob-0.044
 hairc-0.044
 divorce-0.044
 punishment-0.043
 worry-0.042
Token in
Token ID287
Feature activation-5.81e-02

Pos logit contributions
 camps+0.200
latable+0.194
aw+0.163
acking+0.161
 trailer+0.160
Neg logit contributions
t-0.264
 lesson-0.253
 worry-0.251
 hairc-0.250
grand-0.250
Token the
Token ID262
Feature activation-4.64e-02

Pos logit contributions
 camps+0.058
latable+0.052
 knights+0.050
aw+0.048
acking+0.047
Neg logit contributions
grand-0.114
 hairc-0.113
 disobedience-0.112
 worry-0.108
 supervision-0.108
Token forest
Token ID8222
Feature activation-8.53e-02

Pos logit contributions
 camps+0.041
latable+0.039
 knights+0.036
acking+0.036
und+0.034
Neg logit contributions
grand-0.091
 hairc-0.087
 divorce-0.087
 disobedience-0.086
 virus-0.086
Token,
Token ID11
Feature activation-3.86e-02

Pos logit contributions
 camps+0.101
latable+0.093
 hay+0.084
 trailer+0.081
 knights+0.080
Neg logit contributions
grand-0.156
t-0.156
 disobedience-0.153
 worry-0.153
 hairc-0.149
Token he
Token ID339
Feature activation-1.06e-01

Pos logit contributions
 camps+0.054
latable+0.052
acking+0.049
 trailer+0.045
 hay+0.045
Neg logit contributions
 hairc-0.060
 disobedience-0.059
grand-0.058
t-0.058
 worry-0.057