TOP ACTIVATIONS
MAX = 16.158

can always make the conscious choice to take control of our
, how can we help Mitt ens feel fine again ?"
honest was always the best choice . <|endoftext|> Once upon a
" Thank you very much Mrs Berry ." Mrs Berry then
, " What a creative choice ! Now , no more
, but he had no choice . So he was taken
You have to respect his choice and be nice to him
" That 's a good choice , Lily . The song
" Thank you for sharing Mitt ens with me . I
not fighting was the better choice . The
honesty is always the best choice . It may not be
and brave was the right choice . She was so happy
Don 't yield to the temptation to mist reat and break
party and she had no choice . She said Jenny had
wished she had never let Mitt ens run away . <|endoftext|>
that he wouldn 't scare Mitt ens again . The end
love your ch ubby cat Mitt ens ," Sarah said .
children . You have no choice . Here 's the first
being brave was the best choice . The
. " Hi Mrs . Johnson ! What 's

MIN ACTIVATIONS
MIN = -9.560

upon a time , there lived a little hedge hog .
a small house , there lived a little girl named Sue
upon a time , there lived two best friends , John
upon a time , there lived two little birds named George
upon a time , there lived a gentle village . One
a small town , there lived a little boy named Tim
upon a time , there lived a little girl named Sue
upon a time , there lived a little girl who liked
upon a time , there lived a little girl called Kim
upon a time , there lived a delicate white rabbit who
upon a time , there lived a compassionate boy who loved
a big forest , there lived a little deer . The
a small town , there lived a fat cat named Tom
upon a time , there lived an ancient soldier . He
upon a time , there lived a young man . His
upon a time , there lived little Dave . He was
a big forest , there lived a lion . The lion
a big tree , there lived a mighty bird . The
upon a time , there lived a little girl named Grace
a big forest , there lived a gentle deer . The

INTERVAL 9.729 - 16.158
CONTAINS 0.042%

smiled and said , " Of course , sweet ie .
Tom , stop sc aring Miss Bunny . She is very
. You learned a valuable lesson today ." <|endoftext|> L ily
" That was a good choice . Let 's put the
was very happy to meet Mrs . Smith and he loved

INTERVAL 3.299 - 9.729
CONTAINS 12.182%

happily . He felt extra healthy and ready to play !
to put the v ase back on the table . But
bored . " M ummy , can I do
" No , Ben ," Lily says . "
you how to make the perfect cup of tea !"

INTERVAL -3.130 - 3.299
CONTAINS 70.972%

thanked the gray bird very much and flew off into the
job and he needed a nice suit . Mom said they
mom came to help and together they found some old clothes
Max let the cat hold onto his red collar . Together
it tasted so good ! Now the little boy goes searching

INTERVAL -9.560 - -3.130
CONTAINS 16.804%

rules . <|endoftext|> Once upon a time , there was a
room . <|endoftext|> One day , a little boy was walking
farms , and they both looked fancy . They could now
find . One day , he found a y ummy treat
out of the fireplace and was sitting in the corner of
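
The four INTERVAL blocks above appear to be equal-width bins spanning the reported MIN and MAX activations, with CONTAINS giving the share of tokens in each bin. The following is a minimal sketch that checks this reading against the printed numbers. The helper name `equal_width_edges` and the choice of four bins are assumptions made for illustration; they are not taken from the tool that produced this dump.

```python
# Sketch: check that the INTERVAL bounds are equal-width bins between MIN and MAX.
# `equal_width_edges` is a hypothetical helper, not part of any dashboard API.

def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

MIN_ACT = -9.560   # from "MIN = -9.560"
MAX_ACT = 16.158   # from "MAX = 16.158"

# Edges and shares as printed in the INTERVAL / CONTAINS lines, lowest bin first.
reported_edges = [-9.560, -3.130, 3.299, 9.729, 16.158]
reported_shares = [16.804, 70.972, 12.182, 0.042]

edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# Largest gap between a computed edge and the printed (3-decimal) edge.
max_edge_gap = max(abs(e - r) for e, r in zip(edges, reported_edges))
total_share = sum(reported_shares)

for lo, hi, share in zip(edges, edges[1:], reported_shares):
    print(f"[{lo:8.3f}, {hi:8.3f}]  {share:7.3f}%")
print(f"max edge gap: {max_edge_gap:.4f}, total share: {total_share:.3f}%")
```

Under this reading, most tokens fall in the bin around zero, and only a very small fraction reaches the top bin where the "choice", "Mitt" and "Mrs" examples sit.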
Token can
Token ID460
Feature activation+4.64e+00

Pos logit contributions
 supervision+0.147
 permission+0.125
 apology+0.122
 choices+0.122
 disl+0.120
Neg logit contributions
lic-0.142
-0.135
Inside-0.135
On-0.131
Along-0.130
Token always
Token ID1464
Feature activation+3.56e+00

Pos logit contributions
 supervision+0.151
 permission+0.129
 apology+0.124
 choices+0.123
 disl+0.122
Neg logit contributions
lic-0.167
Inside-0.157
-0.157
On-0.153
Along-0.152
Token make
Token ID787
Feature activation+5.21e+00

Pos logit contributions
 supervision+0.113
 permission+0.095
 apology+0.093
 choices+0.092
 disl+0.091
Neg logit contributions
lic-0.127
Inside-0.124
-0.124
On-0.121
Along-0.119
Token the
Token ID262
Feature activation+4.63e+00

Pos logit contributions
 supervision+0.185
 apology+0.159
 disl+0.156
 permission+0.155
grand+0.150
Neg logit contributions
lic-0.146
-0.141
Inside-0.139
On-0.135
Along-0.134
Token conscious
Token ID6921
Feature activation+7.99e+00

Pos logit contributions
 supervision+0.175
 disl+0.153
 permission+0.149
 apology+0.146
 choices+0.142
Neg logit contributions
-0.125
lic-0.120
On-0.118
Along-0.118
 clearing-0.117
Token choice
Token ID3572
Feature activation+1.39e+01

Pos logit contributions
 supervision+0.322
 disl+0.273
 permission+0.269
 apology+0.251
 choices+0.249
Neg logit contributions
lic-0.209
Inside-0.193
On-0.193
Along-0.189
-0.178
Token to
Token ID284
Feature activation+9.75e+00

Pos logit contributions
 supervision+0.529
 disl+0.471
 apology+0.462
 permission+0.457
 choices+0.447
Neg logit contributions
lic-0.385
Inside-0.373
On-0.354
-0.343
Along-0.342
Token take
Token ID1011
Feature activation+3.70e+00

Pos logit contributions
 supervision+0.292
 permission+0.244
 apology+0.235
 choices+0.233
 disl+0.229
Neg logit contributions
lic-0.376
-0.361
Inside-0.359
On-0.351
Along-0.347
Token control
Token ID1630
Feature activation+3.77e+00

Pos logit contributions
 supervision+0.102
 disl+0.082
 apology+0.080
 permission+0.080
 choices+0.075
Neg logit contributions
lic-0.141
Inside-0.139
-0.136
On-0.133
Along-0.133
Token of
Token ID286
Feature activation+5.09e+00

Pos logit contributions
 supervision+0.148
 apology+0.131
 disl+0.127
 permission+0.127
 choices+0.123
Neg logit contributions
lic-0.094
Inside-0.094
On-0.092
-0.088
Along-0.086
Token our
Token ID674
Feature activation+3.87e+00

Pos logit contributions
 supervision+0.160
 disl+0.137
 apology+0.136
 permission+0.133
 choices+0.132
Neg logit contributions
Inside-0.162
-0.161
On-0.155
lic-0.153
Along-0.151
Token,
Token ID11
Feature activation+1.23e+00

Pos logit contributions
 supervision+0.132
 permission+0.118
 apology+0.116
 choices+0.114
 disl+0.113
Neg logit contributions
lic-0.073
-0.070
Inside-0.067
On-0.064
Along-0.063
Token how
Token ID703
Feature activation+3.94e+00

Pos logit contributions
 supervision+0.045
 apology+0.038
 choices+0.038
 harm+0.038
 disl+0.038
Neg logit contributions
lic-0.038
Inside-0.037
-0.035
On-0.035
Along-0.034
Token can
Token ID460
Feature activation+5.63e+00

Pos logit contributions
 supervision+0.163
 permission+0.142
 apology+0.133
 harm+0.131
 choices+0.130
Neg logit contributions
lic-0.116
Inside-0.106
On-0.102
In-0.096
Along-0.095
Token we
Token ID356
Feature activation+7.18e+00

Pos logit contributions
 supervision+0.201
 permission+0.168
 choices+0.166
 apology+0.165
 disl+0.165
Neg logit contributions
-0.186
Inside-0.178
lic-0.176
On-0.170
Along-0.161
Token help
Token ID1037
Feature activation+7.75e+00

Pos logit contributions
 supervision+0.200
 permission+0.171
 apology+0.160
 kindness+0.158
 choices+0.157
Neg logit contributions
lic-0.292
Inside-0.273
-0.270
Along-0.266
On-0.266
Token Mitt
Token ID16627
Feature activation+1.32e+01

Pos logit contributions
 supervision+0.271
 apology+0.229
 disl+0.228
 permission+0.227
 choices+0.224
Neg logit contributions
lic-0.251
Inside-0.247
-0.241
On-0.240
Along-0.232
Tokenens
Token ID641
Feature activation+6.50e+00

Pos logit contributions
 supervision+0.302
 permission+0.238
 apology+0.225
 disl+0.222
 choices+0.219
Neg logit contributions
lic-0.602
-0.587
Inside-0.574
On-0.564
Along-0.556
Token feel
Token ID1254
Feature activation+1.14e+00

Pos logit contributions
 supervision+0.239
 permission+0.208
 apology+0.201
 choices+0.200
 disl+0.197
Neg logit contributions
lic-0.206
-0.195
Inside-0.194
On-0.188
Along-0.186
Token fine
Token ID3734
Feature activation+3.30e+00

Pos logit contributions
 supervision+0.041
 permission+0.034
 disl+0.033
 apology+0.033
 choices+0.032
Neg logit contributions
lic-0.037
Inside-0.036
On-0.035
-0.035
Along-0.033
Token again
Token ID757
Feature activation+2.99e+00

Pos logit contributions
 supervision+0.110
 permission+0.093
 apology+0.091
 disl+0.090
 choices+0.089
Neg logit contributions
lic-0.115
-0.110
Inside-0.110
On-0.107
Along-0.106
Token?"
Token ID1701
Feature activation+6.39e-01

Pos logit contributions
 supervision+0.094
 permission+0.080
 choices+0.077
 apology+0.076
 kindness+0.075
Neg logit contributions
lic-0.111
-0.106
Inside-0.105
On-0.102
Along-0.102
Token honest
Token ID5508
Feature activation+7.10e+00

Pos logit contributions
 supervision+0.090
 permission+0.077
 apology+0.076
 choices+0.075
 disl+0.074
Neg logit contributions
lic-0.081
Inside-0.076
-0.074
On-0.073
Along-0.073
Token was
Token ID373
Feature activation-1.46e+00

Pos logit contributions
 supervision+0.301
 permission+0.266
 apology+0.262
 disl+0.261
 choices+0.253
Neg logit contributions
lic-0.174
Inside-0.163
-0.163
On-0.158
Along-0.153
Token always
Token ID1464
Feature activation+2.16e+00

Pos logit contributions
lic+0.045
Inside+0.044
On+0.043
Along+0.042
+0.042
Neg logit contributions
 supervision-0.052
 apology-0.044
 disl-0.043
 choices-0.043
 permission-0.043
Token the
Token ID262
Feature activation+3.97e+00

Pos logit contributions
 supervision+0.080
 apology+0.069
 permission+0.068
 disl+0.067
 choices+0.065
Neg logit contributions
lic-0.063
Inside-0.062
On-0.061
-0.059
Along-0.058
Token best
Token ID1266
Feature activation+6.06e+00

Pos logit contributions
 supervision+0.144
 disl+0.123
 permission+0.119
 apology+0.117
 choices+0.115
Neg logit contributions
-0.116
lic-0.116
Along-0.114
Inside-0.114
On-0.114
Token choice
Token ID3572
Feature activation+1.31e+01

Pos logit contributions
 supervision+0.209
 disl+0.176
 permission+0.173
 apology+0.170
 choices+0.164
Neg logit contributions
-0.192
lic-0.191
Inside-0.189
Along-0.185
On-0.184
Token.
Token ID13
Feature activation+2.06e+00

Pos logit contributions
 supervision+0.532
 disl+0.479
 apology+0.477
 permission+0.464
 choices+0.461
Neg logit contributions
lic-0.332
Inside-0.322
-0.313
On-0.310
Along-0.298
Token<|endoftext|>
Token ID50256
Feature activation+4.46e-01

Pos logit contributions
 supervision+0.092
 permission+0.082
 apology+0.081
 disl+0.080
 choices+0.080
Neg logit contributions
lic-0.049
Inside-0.044
-0.044
On-0.042
Along-0.042
TokenOnce
Token ID7454
Feature activation-2.35e+00

Pos logit contributions
 supervision+0.015
 harm+0.014
 kindness+0.014
 choices+0.014
 apology+0.014
Neg logit contributions
lic-0.012
Inside-0.010
Along-0.010
-0.010
On-0.010
Token upon
Token ID2402
Feature activation-5.53e+00

Pos logit contributions
lic+0.037
+0.033
Inside+0.033
On+0.030
Along+0.030
Neg logit contributions
 supervision-0.124
 permission-0.112
 apology-0.110
 choices-0.110
 disl-0.108
Token a
Token ID257
Feature activation-3.27e+00

Pos logit contributions
lic+0.169
+0.166
Inside+0.151
Along+0.147
ink+0.146
Neg logit contributions
 supervision-0.199
 permission-0.181
 kindness-0.176
 apology-0.174
 choices-0.173
Token "
Token ID366
Feature activation+3.68e+00

Pos logit contributions
 supervision+0.016
 permission+0.015
 apology+0.014
 choices+0.014
 disl+0.014
Neg logit contributions
lic-0.011
-0.010
Inside-0.010
On-0.010
Along-0.010
TokenThank
Token ID10449
Feature activation+6.30e+00

Pos logit contributions
 supervision+0.120
 permission+0.104
 disl+0.100
 kindness+0.100
 apology+0.099
Neg logit contributions
lic-0.125
Inside-0.117
-0.115
Along-0.113
On-0.113
Token you
Token ID345
Feature activation+7.57e+00

Pos logit contributions
 supervision+0.155
 permission+0.128
 choices+0.120
 apology+0.120
 kindness+0.119
Neg logit contributions
lic-0.276
Inside-0.262
-0.260
On-0.260
Along-0.258
Token very
Token ID845
Feature activation+4.47e+00

Pos logit contributions
 supervision+0.274
 permission+0.240
 apology+0.231
 choices+0.231
 kindness+0.227
Neg logit contributions
lic-0.242
-0.230
Inside-0.230
On-0.223
Along-0.221
Token much
Token ID881
Feature activation+4.93e+00

Pos logit contributions
 supervision+0.177
 permission+0.157
 choices+0.154
 apology+0.154
 kindness+0.152
Neg logit contributions
lic-0.126
-0.119
Inside-0.117
On-0.114
Along-0.113
Token Mrs
Token ID9074
Feature activation+1.30e+01

Pos logit contributions
 supervision+0.178
 permission+0.153
 apology+0.148
 disl+0.146
 choices+0.146
Neg logit contributions
lic-0.160
-0.152
Inside-0.152
On-0.148
Along-0.146
Token Berry
Token ID20165
Feature activation+7.43e+00

Pos logit contributions
 supervision+0.467
 permission+0.399
 apology+0.394
 disl+0.393
 choices+0.383
Neg logit contributions
lic-0.420
-0.401
Inside-0.394
On-0.383
Along-0.380
Token."
Token ID526
Feature activation+2.44e+00

Pos logit contributions
 supervision+0.261
 permission+0.222
 apology+0.218
 disl+0.214
 choices+0.214
Neg logit contributions
lic-0.246
-0.234
Inside-0.231
On-0.224
Along-0.223
Token Mrs
Token ID9074
Feature activation+8.84e+00

Pos logit contributions
 supervision+0.093
 permission+0.081
 kindness+0.079
 apology+0.079
 choices+0.079
Neg logit contributions
lic-0.074
Inside-0.068
-0.066
Along-0.066
On-0.066
Token Berry
Token ID20165
Feature activation+3.42e+00

Pos logit contributions
 supervision+0.311
 permission+0.267
 apology+0.260
 disl+0.256
 choices+0.255
Neg logit contributions
lic-0.294
-0.281
Inside-0.277
On-0.270
Along-0.266
Token then
Token ID788
Feature activation+2.68e-01

Pos logit contributions
 supervision+0.123
 permission+0.107
 apology+0.103
 choices+0.103
 disl+0.101
Neg logit contributions
lic-0.110
-0.105
Inside-0.104
On-0.101
Along-0.100
Token,
Token ID11
Feature activation+1.75e+00

Pos logit contributions
 supervision+0.134
 choices+0.116
 permission+0.116
 apology+0.115
 disl+0.114
Neg logit contributions
lic-0.092
-0.085
Inside-0.083
On-0.081
Along-0.079
Token "
Token ID366
Feature activation+4.31e+00

Pos logit contributions
 supervision+0.071
 permission+0.063
 apology+0.061
 kindness+0.060
 disl+0.060
Neg logit contributions
lic-0.047
-0.046
Inside-0.044
On-0.043
Along-0.043
TokenWhat
Token ID2061
Feature activation+5.52e+00

Pos logit contributions
 supervision+0.165
 permission+0.144
 disl+0.143
 apology+0.140
 kindness+0.137
Neg logit contributions
lic-0.128
-0.119
Inside-0.116
Along-0.113
On-0.111
Token a
Token ID257
Feature activation+3.38e-01

Pos logit contributions
 supervision+0.215
 permission+0.189
 apology+0.182
 attention+0.178
 choices+0.176
Neg logit contributions
lic-0.161
Inside-0.154
On-0.149
-0.145
Along-0.145
Token creative
Token ID7325
Feature activation+6.24e+00

Pos logit contributions
 supervision+0.012
 permission+0.011
 apology+0.011
 choices+0.010
 kindness+0.010
Neg logit contributions
lic-0.011
-0.010
Inside-0.010
On-0.010
Along-0.010
Token choice
Token ID3572
Feature activation+1.30e+01

Pos logit contributions
 supervision+0.216
 permission+0.185
 apology+0.178
 choices+0.178
 disl+0.176
Neg logit contributions
lic-0.211
-0.201
Inside-0.199
On-0.194
Along-0.193
Token!
Token ID0
Feature activation+4.46e+00

Pos logit contributions
 supervision+0.557
 permission+0.483
 apology+0.480
 disl+0.479
 choices+0.475
Neg logit contributions
lic-0.326
Inside-0.310
-0.290
On-0.285
Along-0.280
Token Now
Token ID2735
Feature activation+1.67e+00

Pos logit contributions
 supervision+0.141
 permission+0.118
 apology+0.116
 disl+0.114
 kindness+0.113
Neg logit contributions
lic-0.162
-0.154
Inside-0.152
On-0.150
Along-0.149
Token,
Token ID11
Feature activation-2.52e-01

Pos logit contributions
 supervision+0.056
 permission+0.048
 apology+0.047
 disl+0.046
 choices+0.046
Neg logit contributions
lic-0.057
Inside-0.056
-0.054
On-0.053
Along-0.053
Token no
Token ID645
Feature activation+3.30e+00

Pos logit contributions
lic+0.009
Inside+0.009
On+0.008
+0.008
Along+0.008
Neg logit contributions
 supervision-0.008
 permission-0.007
 apology-0.007
 choices-0.007
 disl-0.007
Token more
Token ID517
Feature activation+3.30e+00

Pos logit contributions
 supervision+0.112
 permission+0.096
 apology+0.093
 choices+0.092
 disl+0.092
Neg logit contributions
lic-0.114
-0.109
Inside-0.108
On-0.105
Along-0.104
Token,
Token ID11
Feature activation+3.06e-01

Pos logit contributions
 supervision+0.346
 permission+0.293
 apology+0.289
 disl+0.289
 choices+0.283
Neg logit contributions
lic-0.242
On-0.221
Inside-0.220
-0.215
Along-0.210
Token but
Token ID475
Feature activation+4.44e+00

Pos logit contributions
 supervision+0.011
 permission+0.010
 apology+0.009
 choices+0.009
 disl+0.009
Neg logit contributions
lic-0.010
Inside-0.009
-0.009
On-0.009
Along-0.009
Token he
Token ID339
Feature activation-1.01e+00

Pos logit contributions
 supervision+0.160
 apology+0.140
 permission+0.138
 disl+0.136
 choices+0.135
Neg logit contributions
lic-0.133
-0.131
Inside-0.131
On-0.129
Along-0.124
Token had
Token ID550
Feature activation-2.37e+00

Pos logit contributions
lic+0.031
Inside+0.029
+0.029
On+0.029
Along+0.028
Neg logit contributions
 supervision-0.038
 permission-0.033
 choices-0.032
 apology-0.032
 disl-0.031
Token no
Token ID645
Feature activation+4.98e+00

Pos logit contributions
lic+0.070
Inside+0.067
+0.065
On+0.065
Along+0.064
Neg logit contributions
 supervision-0.091
 permission-0.078
 apology-0.078
 choices-0.077
 disl-0.076
Token choice
Token ID3572
Feature activation+1.29e+01

Pos logit contributions
 supervision+0.164
 disl+0.141
 permission+0.135
 choices+0.130
 apology+0.130
Neg logit contributions
On-0.155
lic-0.155
-0.154
Inside-0.153
Along-0.150
Token.
Token ID13
Feature activation+2.86e+00

Pos logit contributions
 supervision+0.510
 disl+0.462
 apology+0.457
 choices+0.449
 permission+0.437
Neg logit contributions
lic-0.320
Inside-0.309
On-0.300
-0.297
Along-0.286
Token So
Token ID1406
Feature activation+3.49e-02

Pos logit contributions
 supervision+0.112
 disl+0.096
 apology+0.096
 kindness+0.095
 permission+0.095
Neg logit contributions
lic-0.081
Inside-0.071
-0.070
ink-0.070
Along-0.069
Token he
Token ID339
Feature activation-2.31e+00

Pos logit contributions
 supervision+0.001
 permission+0.001
 apology+0.001
 choices+0.001
 disl+0.001
Neg logit contributions
lic-0.001
-0.001
Inside-0.001
On-0.001
Along-0.001
Token was
Token ID373
Feature activation-3.97e+00

Pos logit contributions
lic+0.076
+0.076
Inside+0.072
On+0.070
Along+0.069
Neg logit contributions
 supervision-0.078
 permission-0.069
 apology-0.066
 kindness-0.065
 choices-0.064
Token taken
Token ID2077
Feature activation-2.44e-01

Pos logit contributions
lic+0.143
+0.139
Inside+0.136
On+0.132
Along+0.131
Neg logit contributions
 supervision-0.130
 permission-0.113
 apology-0.105
 choices-0.104
 kindness-0.104
Token You
Token ID921
Feature activation+3.61e+00

Pos logit contributions
 supervision+0.082
 permission+0.074
 apology+0.071
 choices+0.071
 attention+0.070
Neg logit contributions
lic-0.043
-0.041
Inside-0.041
On-0.039
Along-0.038
Token have
Token ID423
Feature activation-9.83e-01

Pos logit contributions
 supervision+0.124
 permission+0.106
 choices+0.103
 apology+0.103
 disl+0.101
Neg logit contributions
lic-0.123
Inside-0.119
-0.118
On-0.115
Along-0.113
Token to
Token ID284
Feature activation+3.20e+00

Pos logit contributions
lic+0.033
Inside+0.032
+0.031
On+0.031
Along+0.031
Neg logit contributions
 supervision-0.033
 apology-0.028
 permission-0.028
 choices-0.028
 disl-0.027
Token respect
Token ID2461
Feature activation+4.07e+00

Pos logit contributions
 supervision+0.093
 permission+0.076
 choices+0.074
 apology+0.073
 disl+0.072
Neg logit contributions
lic-0.122
Inside-0.119
-0.119
On-0.117
Along-0.114
Token his
Token ID465
Feature activation+3.39e+00

Pos logit contributions
 supervision+0.135
 permission+0.114
 disl+0.114
 apology+0.113
 choices+0.110
Neg logit contributions
lic-0.133
Inside-0.131
On-0.128
-0.125
Along-0.122
Token choice
Token ID3572
Feature activation+1.28e+01

Pos logit contributions
 supervision+0.111
 disl+0.091
 permission+0.091
 apology+0.088
 kindness+0.084
Neg logit contributions
-0.101
Inside-0.099
lic-0.097
On-0.097
Along-0.096
Token and
Token ID290
Feature activation+5.05e+00

Pos logit contributions
 supervision+0.544
 disl+0.504
 permission+0.495
 apology+0.487
 choices+0.469
Neg logit contributions
Inside-0.280
lic-0.276
On-0.264
-0.247
Along-0.236
Token be
Token ID307
Feature activation+5.71e-01

Pos logit contributions
 supervision+0.118
 harm+0.105
 disl+0.101
 permission+0.100
 apology+0.098
Neg logit contributions
Inside-0.184
On-0.170
lic-0.166
A-0.162
Under-0.159
Token nice
Token ID3621
Feature activation+3.95e+00

Pos logit contributions
 supervision+0.021
 apology+0.017
 permission+0.017
 disl+0.017
 choices+0.017
Neg logit contributions
lic-0.018
Inside-0.017
On-0.017
Along-0.016
-0.016
Token to
Token ID284
Feature activation+5.74e+00

Pos logit contributions
 supervision+0.158
 disl+0.143
 apology+0.142
 permission+0.133
 kindness+0.131
Neg logit contributions
lic-0.094
Inside-0.093
On-0.093
Along-0.088
-0.082
Token him
Token ID683
Feature activation+6.22e-01

Pos logit contributions
 supervision+0.182
 permission+0.151
 apology+0.151
 choices+0.149
 disl+0.147
Neg logit contributions
lic-0.201
Inside-0.195
-0.193
On-0.191
Along-0.186
Token "
Token ID366
Feature activation+4.19e+00

Pos logit contributions
 supervision+0.047
 permission+0.041
 apology+0.039
 kindness+0.039
 disl+0.039
Neg logit contributions
lic-0.033
-0.032
Inside-0.031
Along-0.031
On-0.030
TokenThat
Token ID2504
Feature activation+3.80e+00

Pos logit contributions
 supervision+0.157
 permission+0.137
 apology+0.134
 disl+0.133
 kindness+0.132
Neg logit contributions
lic-0.128
-0.122
Inside-0.117
Along-0.114
On-0.113
Token's
Token ID338
Feature activation+3.50e+00

Pos logit contributions
 supervision+0.135
 permission+0.116
 apology+0.110
 choices+0.109
 disl+0.108
Neg logit contributions
lic-0.127
Inside-0.121
-0.121
On-0.119
Along-0.116
Token a
Token ID257
Feature activation+1.15e-02

Pos logit contributions
 supervision+0.113
 permission+0.099
 choices+0.097
 kindness+0.095
 attention+0.094
Neg logit contributions
lic-0.130
-0.124
Inside-0.118
Along-0.117
On-0.114
Token good
Token ID922
Feature activation+6.27e+00

Pos logit contributions
 supervision+0.000
 permission+0.000
 apology+0.000
 kindness+0.000
 choices+0.000
Neg logit contributions
lic-0.000
-0.000
Inside-0.000
On-0.000
Along-0.000
Token choice
Token ID3572
Feature activation+1.28e+01

Pos logit contributions
 supervision+0.225
 disl+0.181
 permission+0.180
 choices+0.166
 privacy+0.163
Neg logit contributions
lic-0.193
-0.191
On-0.188
Along-0.187
Inside-0.187
Token,
Token ID11
Feature activation+3.18e+00

Pos logit contributions
 supervision+0.552
 disl+0.493
 apology+0.489
 permission+0.483
 choices+0.475
Neg logit contributions
lic-0.289
Inside-0.271
-0.268
On-0.249
Along-0.240
Token Lily
Token ID20037
Feature activation+5.27e+00

Pos logit contributions
 supervision+0.116
 permission+0.099
 disl+0.096
 apology+0.095
 choices+0.094
Neg logit contributions
lic-0.098
Inside-0.098
On-0.094
-0.093
Along-0.092
Token.
Token ID13
Feature activation+3.45e+00

Pos logit contributions
 supervision+0.224
 apology+0.203
 permission+0.197
 disl+0.195
 choices+0.189
Neg logit contributions
lic-0.124
-0.116
Inside-0.110
ink-0.108
On-0.107
Token The
Token ID383
Feature activation+1.32e+00

Pos logit contributions
 supervision+0.124
 permission+0.109
 apology+0.105
 choices+0.104
 disl+0.102
Neg logit contributions
lic-0.109
-0.105
Inside-0.104
On-0.099
Along-0.099
Token song
Token ID3496
Feature activation-2.77e-01

Pos logit contributions
 supervision+0.049
 permission+0.042
 disl+0.041
 choices+0.040
 apology+0.040
Neg logit contributions
lic-0.040
-0.039
Inside-0.038
On-0.037
Along-0.037
Token "
Token ID366
Feature activation+2.08e+00

Pos logit contributions
lic+0.052
+0.050
Inside+0.049
Along+0.048
On+0.047
Neg logit contributions
 supervision-0.062
 permission-0.056
 kindness-0.053
 apology-0.053
 disl-0.051
TokenThank
Token ID10449
Feature activation+5.03e+00

Pos logit contributions
 supervision+0.084
 permission+0.074
 apology+0.071
 choices+0.071
 disl+0.071
Neg logit contributions
lic-0.059
-0.055
Inside-0.055
On-0.053
Along-0.052
Token you
Token ID345
Feature activation+6.10e+00

Pos logit contributions
 supervision+0.139
 permission+0.115
 apology+0.110
 choices+0.109
 disl+0.107
Neg logit contributions
lic-0.205
-0.197
Inside-0.196
On-0.192
Along-0.191
Token for
Token ID329
Feature activation+4.95e+00

Pos logit contributions
 supervision+0.177
 permission+0.150
 apology+0.142
 choices+0.142
 kindness+0.140
Neg logit contributions
lic-0.239
-0.232
Inside-0.231
On-0.226
Along-0.224
Token sharing
Token ID7373
Feature activation+3.55e+00

Pos logit contributions
 supervision+0.151
 permission+0.125
 apology+0.122
 choices+0.122
 disl+0.119
Neg logit contributions
lic-0.187
Inside-0.180
-0.179
On-0.175
Along-0.173
Token Mitt
Token ID16627
Feature activation+1.27e+01

Pos logit contributions
 supervision+0.135
 permission+0.116
 apology+0.111
 choices+0.108
 harm+0.107
Neg logit contributions
lic-0.101
Inside-0.100
-0.098
On-0.097
A-0.090
Tokenens
Token ID641
Feature activation+6.34e+00

Pos logit contributions
 supervision+0.305
 permission+0.248
 apology+0.230
 disl+0.226
 choices+0.217
Neg logit contributions
lic-0.569
-0.549
Inside-0.533
On-0.518
Along-0.513
Token with
Token ID351
Feature activation+2.40e+00

Pos logit contributions
 supervision+0.248
 permission+0.213
 apology+0.213
 disl+0.211
 choices+0.201
Neg logit contributions
lic-0.181
-0.172
Inside-0.169
On-0.164
Along-0.164
Token me
Token ID502
Feature activation+1.84e+00

Pos logit contributions
 supervision+0.092
 permission+0.080
 apology+0.078
 choices+0.077
 disl+0.077
Neg logit contributions
lic-0.072
-0.069
Inside-0.068
On-0.066
Along-0.065
Token.
Token ID13
Feature activation+2.84e+00

Pos logit contributions
 supervision+0.058
 permission+0.049
 apology+0.047
 choices+0.047
 disl+0.046
Neg logit contributions
lic-0.068
-0.065
Inside-0.064
On-0.063
Along-0.062
Token I
Token ID314
Feature activation+1.67e+00

Pos logit contributions
 supervision+0.094
 permission+0.079
 apology+0.079
 disl+0.078
 kindness+0.078
Neg logit contributions
lic-0.097
Inside-0.093
-0.091
Along-0.091
On-0.091
Token not
Token ID407
Feature activation+4.41e+00

Pos logit contributions
 supervision+0.140
 apology+0.125
 disl+0.118
 permission+0.117
 choices+0.114
Neg logit contributions
On-0.105
Inside-0.095
-0.094
Along-0.093
In-0.091
Token fighting
Token ID4330
Feature activation+4.17e+00

Pos logit contributions
 supervision+0.150
 apology+0.126
 permission+0.123
 disl+0.121
 choices+0.119
Neg logit contributions
lic-0.142
Inside-0.140
On-0.140
Along-0.134
-0.128
Token was
Token ID373
Feature activation-1.09e+00

Pos logit contributions
 supervision+0.156
 permission+0.136
 apology+0.131
 disl+0.131
 choices+0.126
Neg logit contributions
lic-0.122
On-0.118
Inside-0.116
-0.110
Along-0.110
Token the
Token ID262
Feature activation+4.37e+00

Pos logit contributions
lic+0.032
On+0.031
Inside+0.031
+0.030
Along+0.029
Neg logit contributions
 supervision-0.039
 apology-0.033
 disl-0.033
 permission-0.032
 choices-0.032
Token better
Token ID1365
Feature activation+6.29e+00

Pos logit contributions
 supervision+0.162
 disl+0.135
 permission+0.132
grand+0.130
 privacy+0.130
Neg logit contributions
-0.125
lic-0.122
Along-0.117
On-0.116
Inside-0.112
Token choice
Token ID3572
Feature activation+1.27e+01

Pos logit contributions
 supervision+0.224
 permission+0.185
 apology+0.181
 disl+0.177
 kindness+0.168
Neg logit contributions
lic-0.189
On-0.182
Inside-0.178
Along-0.176
-0.173
Token.
Token ID13
Feature activation+1.52e+00

Pos logit contributions
 supervision+0.528
 disl+0.477
 apology+0.474
 permission+0.465
 choices+0.457
Neg logit contributions
lic-0.305
Inside-0.292
On-0.289
-0.285
Along-0.272
Token (space)
Token ID220
Feature activation-1.87e+00

Pos logit contributions
 supervision+0.065
 permission+0.057
 apology+0.057
 kindness+0.056
 choices+0.056
Neg logit contributions
lic-0.038
Inside-0.035
-0.034
Along-0.033
On-0.033
Token (newline)
Token ID198
Feature activation-4.32e+00

Pos logit contributions
lic+0.042
Inside+0.038
+0.037
Along+0.037
 clearing+0.036
Neg logit contributions
 supervision-0.084
 permission-0.073
 apology-0.073
 disl-0.072
 choices-0.071
Token (newline)
Token ID198
Feature activation-4.72e+00

Pos logit contributions
lic+0.099
+0.095
Along+0.087
Inside+0.086
 clearing+0.083
Neg logit contributions
 supervision-0.182
 apology-0.168
 disl-0.163
 kindness-0.162
 permission-0.160
TokenThe
Token ID464
Feature activation-2.87e-01

Pos logit contributions
lic+0.119
+0.114
Inside+0.100
Along+0.098
 clearing+0.097
Neg logit contributions
 supervision-0.191
 apology-0.173
 permission-0.172
 disl-0.170
 kindness-0.169
Token honesty
Token ID24345
Feature activation+5.36e+00

Pos logit contributions
 supervision+0.144
 disl+0.122
 apology+0.122
 permission+0.117
 attention+0.110
Neg logit contributions
On-0.112
Inside-0.110
Along-0.107
 clearing-0.102
-0.101
Token is
Token ID318
Feature activation+9.87e-02

Pos logit contributions
 supervision+0.219
 permission+0.194
 disl+0.191
 apology+0.188
 choices+0.184
Neg logit contributions
Inside-0.135
lic-0.134
-0.134
On-0.128
Along-0.126
Token always
Token ID1464
Feature activation+2.38e+00

Pos logit contributions
 supervision+0.003
 disl+0.003
 apology+0.003
 choices+0.003
 permission+0.003
Neg logit contributions
Inside-0.003
lic-0.003
On-0.003
Along-0.003
In-0.003
Token the
Token ID262
Feature activation+4.00e+00

Pos logit contributions
 supervision+0.087
 apology+0.073
 permission+0.072
 disl+0.072
 choices+0.069
Neg logit contributions
Inside-0.072
lic-0.072
On-0.070
Along-0.068
-0.066
Token best
Token ID1266
Feature activation+6.06e+00

Pos logit contributions
 supervision+0.149
 disl+0.132
 permission+0.121
 apology+0.121
 choices+0.115
Neg logit contributions
Along-0.113
On-0.113
Inside-0.112
-0.109
lic-0.108
Token choice
Token ID3572
Feature activation+1.27e+01

Pos logit contributions
 supervision+0.207
 disl+0.186
 apology+0.168
 permission+0.165
minded+0.159
Neg logit contributions
Inside-0.180
Along-0.175
On-0.174
-0.171
lic-0.165
Token.
Token ID13
Feature activation+1.77e+00

Pos logit contributions
 supervision+0.516
 apology+0.463
 disl+0.462
 permission+0.449
 choices+0.444
Neg logit contributions
lic-0.310
Inside-0.308
-0.291
On-0.290
Along-0.282
Token It
Token ID632
Feature activation-1.26e+00

Pos logit contributions
 supervision+0.074
 permission+0.066
 apology+0.065
 kindness+0.064
 choices+0.064
Neg logit contributions
lic-0.048
Inside-0.043
-0.043
On-0.041
Along-0.041
Token may
Token ID743
Feature activation+2.98e+00

Pos logit contributions
lic+0.038
Inside+0.037
+0.036
On+0.036
Along+0.036
Neg logit contributions
 supervision-0.047
 permission-0.040
 apology-0.038
 disl-0.038
 choices-0.037
Token not
Token ID407
Feature activation+4.89e+00

Pos logit contributions
 supervision+0.114
 permission+0.097
 choices+0.094
 apology+0.094
 disl+0.092
Neg logit contributions
lic-0.087
Inside-0.086
-0.086
On-0.085
Along-0.082
Token be
Token ID307
Feature activation+9.08e-01

Pos logit contributions
 supervision+0.188
 permission+0.155
 disl+0.153
 choices+0.152
 apology+0.150
Neg logit contributions
Inside-0.141
On-0.140
lic-0.139
-0.134
Along-0.133
Token and
Token ID290
Feature activation+3.46e+00

Pos logit contributions
 supervision+0.295
 permission+0.260
 apology+0.258
 disl+0.258
 choices+0.248
Neg logit contributions
lic-0.170
Inside-0.159
On-0.158
-0.157
Along-0.150
Token brave
Token ID14802
Feature activation+5.83e+00

Pos logit contributions
 supervision+0.113
 apology+0.098
 permission+0.096
 choices+0.090
 disl+0.089
Neg logit contributions
Inside-0.109
lic-0.109
On-0.105
-0.100
Along-0.100
Token was
Token ID373
Feature activation-2.45e+00

Pos logit contributions
 supervision+0.225
 apology+0.202
 disl+0.200
 permission+0.197
 choices+0.187
Neg logit contributions
lic-0.156
Inside-0.145
-0.143
On-0.143
Along-0.139
Token the
Token ID262
Feature activation+3.30e+00

Pos logit contributions
lic+0.078
Inside+0.074
On+0.074
+0.073
Along+0.072
Neg logit contributions
 supervision-0.087
 apology-0.074
 permission-0.073
 choices-0.072
 disl-0.072
Token right
Token ID826
Feature activation+4.47e+00

Pos logit contributions
 supervision+0.119
 disl+0.101
 permission+0.099
 apology+0.097
 choices+0.096
Neg logit contributions
-0.098
lic-0.098
On-0.096
Along-0.096
Inside-0.096
Token choice
Token ID3572
Feature activation+1.27e+01

Pos logit contributions
 supervision+0.153
 disl+0.129
 permission+0.126
 apology+0.124
 kindness+0.120
Neg logit contributions
lic-0.143
On-0.139
-0.139
Inside-0.139
Along-0.137
Token.
Token ID13
Feature activation+2.14e+00

Pos logit contributions
 supervision+0.504
 disl+0.456
 apology+0.451
 permission+0.438
 choices+0.435
Neg logit contributions
lic-0.329
Inside-0.315
On-0.307
-0.307
Along-0.293
Token She
Token ID1375
Feature activation-1.66e+00

Pos logit contributions
 supervision+0.091
 permission+0.079
 disl+0.079
 apology+0.078
 kindness+0.078
Neg logit contributions
lic-0.055
-0.049
Inside-0.049
Along-0.048
On-0.047
Token was
Token ID373
Feature activation-4.48e+00

Pos logit contributions
lic+0.058
+0.056
Inside+0.055
On+0.054
Along+0.053
Neg logit contributions
 supervision-0.055
 permission-0.047
 apology-0.045
 choices-0.045
 disl-0.044
Token so
Token ID523
Feature activation+1.72e+00

Pos logit contributions
lic+0.154
+0.147
Inside+0.146
On+0.142
Along+0.141
Neg logit contributions
 supervision-0.153
 permission-0.130
 apology-0.127
 choices-0.126
 disl-0.124
Token happy
Token ID3772
Feature activation+2.82e+00

Pos logit contributions
 supervision+0.052
 permission+0.044
 apology+0.042
 choices+0.042
 disl+0.041
Neg logit contributions
lic-0.066
-0.063
Inside-0.063
On-0.061
Along-0.061
Token Don
Token ID2094
Feature activation+1.26e+01

Pos logit contributions
 supervision+0.042
 kindness+0.037
 apology+0.037
 disl+0.036
 permission+0.036
Neg logit contributions
lic-0.033
Inside-0.028
ink-0.028
Along-0.027
-0.026
Token't
Token ID470
Feature activation+9.34e+00

Pos logit contributions
 supervision+0.607
 permission+0.544
 choices+0.531
 disl+0.527
 apology+0.525
Neg logit contributions
lic-0.253
-0.239
Inside-0.232
On-0.221
Along-0.213
Token yield
Token ID7800
Feature activation+5.89e+00

Pos logit contributions
 supervision+0.310
 choices+0.255
 apology+0.251
 permission+0.247
 disl+0.242
Neg logit contributions
lic-0.288
Inside-0.281
On-0.280
Along-0.266
A-0.265
Token to
Token ID284
Feature activation+6.73e+00

Pos logit contributions
 supervision+0.223
 apology+0.194
 permission+0.192
 disl+0.184
grand+0.180
Neg logit contributions
Inside-0.153
On-0.152
lic-0.151
-0.146
Along-0.139
Token the
Token ID262
Feature activation+4.79e+00

Pos logit contributions
 supervision+0.216
 permission+0.180
 choices+0.180
 apology+0.179
 disl+0.174
Neg logit contributions
-0.213
Inside-0.213
lic-0.212
On-0.208
Along-0.200
Token temptation
Token ID29062
Feature activation+1.27e+01

Pos logit contributions
 supervision+0.177
 disl+0.152
 permission+0.152
 choices+0.147
 apology+0.146
Neg logit contributions
lic-0.139
-0.136
Inside-0.132
On-0.130
Along-0.129
Token to
Token ID284
Feature activation+9.30e+00

Pos logit contributions
 supervision+0.472
 permission+0.410
 disl+0.409
 apology+0.401
 choices+0.399
Neg logit contributions
lic-0.373
Inside-0.358
On-0.349
-0.346
Along-0.337
Token mist
Token ID4020
Feature activation+7.11e-01

Pos logit contributions
 supervision+0.288
 permission+0.241
 choices+0.234
 apology+0.232
 disl+0.228
Neg logit contributions
lic-0.346
-0.334
Inside-0.333
On-0.325
Along-0.319
Tokenreat
Token ID630
Feature activation+2.84e+00

Pos logit contributions
 supervision+0.025
 apology+0.021
 permission+0.021
 disl+0.020
 choices+0.020
Neg logit contributions
lic-0.024
-0.022
Inside-0.022
On-0.021
Along-0.021
Token and
Token ID290
Feature activation+4.20e+00

Pos logit contributions
 supervision+0.091
 permission+0.079
 apology+0.075
 choices+0.075
 kindness+0.074
Neg logit contributions
lic-0.101
-0.097
Inside-0.096
On-0.094
Along-0.094
Token break
Token ID2270
Feature activation+1.21e+00

Pos logit contributions
 supervision+0.137
 permission+0.118
 apology+0.117
 choices+0.114
 disl+0.110
Neg logit contributions
Inside-0.135
lic-0.131
On-0.131
-0.128
Along-0.125
Token party
Token ID2151
Feature activation-1.44e-02

Pos logit contributions
 supervision+0.094
 permission+0.080
 apology+0.079
 disl+0.079
 choices+0.079
Neg logit contributions
lic-0.057
Inside-0.056
On-0.055
-0.054
Along-0.054
Token and
Token ID290
Feature activation+1.90e+00

Pos logit contributions
lic+0.000
+0.000
Inside+0.000
On+0.000
Along+0.000
Neg logit contributions
 supervision-0.001
 permission-0.000
 apology-0.000
 choices-0.000
 disl-0.000
Token she
Token ID673
Feature activation-1.56e+00

Pos logit contributions
 supervision+0.064
 choices+0.057
 apology+0.056
 permission+0.056
 disl+0.054
Neg logit contributions
Inside-0.058
On-0.057
-0.053
lic-0.053
Along-0.052
Token had
Token ID550
Feature activation-2.87e+00

Pos logit contributions
lic+0.040
On+0.038
Inside+0.038
+0.036
Along+0.036
Neg logit contributions
 supervision-0.065
 permission-0.058
 disl-0.057
 choices-0.057
 apology-0.055
Token no
Token ID645
Feature activation+5.32e+00

Pos logit contributions
On+0.080
Inside+0.080
Along+0.075
lic+0.075
+0.074
Neg logit contributions
 supervision-0.106
 apology-0.096
 disl-0.094
 choices-0.094
 permission-0.087
Token choice
Token ID3572
Feature activation+1.26e+01

Pos logit contributions
 supervision+0.200
 disl+0.177
 permission+0.164
 apology+0.163
 harm+0.159
Neg logit contributions
On-0.141
Inside-0.139
lic-0.135
Along-0.132
-0.132
Token.
Token ID13
Feature activation+2.23e+00

Pos logit contributions
 supervision+0.458
 disl+0.409
 apology+0.403
 choices+0.395
 permission+0.384
Neg logit contributions
lic-0.353
Inside-0.353
On-0.344
-0.338
Along-0.326
Token She
Token ID1375
Feature activation-7.65e-01

Pos logit contributions
 supervision+0.097
 permission+0.087
 apology+0.085
 choices+0.084
 disl+0.083
Neg logit contributions
lic-0.056
-0.052
Inside-0.051
Along-0.049
On-0.049
Token said
Token ID531
Feature activation+9.91e-01

Pos logit contributions
lic+0.025
Inside+0.024
On+0.024
+0.023
Along+0.023
Neg logit contributions
 supervision-0.027
 permission-0.023
 disl-0.023
 choices-0.023
 attention-0.022
Token Jenny
Token ID28437
Feature activation+5.46e+00

Pos logit contributions
 supervision+0.037
 choices+0.032
 apology+0.032
 permission+0.031
 attention+0.031
Neg logit contributions
lic-0.028
-0.027
Inside-0.027
On-0.026
A-0.024
Token had
Token ID550
Feature activation-1.94e+00

Pos logit contributions
 supervision+0.238
 permission+0.212
 apology+0.211
 disl+0.210
 choices+0.207
Neg logit contributions
lic-0.128
Inside-0.122
On-0.118
-0.118
Along-0.114
Token wished
Token ID16555
Feature activation+4.00e+00

Pos logit contributions
 supervision+0.026
 apology+0.023
 permission+0.023
 choices+0.022
 disl+0.022
Neg logit contributions
lic-0.020
On-0.019
Inside-0.018
A-0.018
-0.018
Token she
Token ID673
Feature activation+2.47e+00

Pos logit contributions
 supervision+0.149
 disl+0.129
 apology+0.128
 permission+0.127
 choices+0.126
Neg logit contributions
-0.120
lic-0.118
Inside-0.113
On-0.113
Along-0.105
Token had
Token ID550
Feature activation-3.82e-01

Pos logit contributions
 supervision+0.103
 permission+0.090
 apology+0.089
 choices+0.088
 disl+0.088
Neg logit contributions
lic-0.068
Inside-0.064
-0.064
On-0.062
Along-0.061
Token never
Token ID1239
Feature activation+4.19e+00

Pos logit contributions
lic+0.011
Inside+0.011
On+0.011
+0.011
Along+0.011
Neg logit contributions
 supervision-0.014
 apology-0.012
 choices-0.012
 disl-0.012
 permission-0.012
Token let
Token ID1309
Feature activation+7.78e+00

Pos logit contributions
 supervision+0.159
 permission+0.137
 apology+0.137
 choices+0.136
 disl+0.134
Neg logit contributions
lic-0.127
Inside-0.122
On-0.118
-0.116
Along-0.115
Token Mitt
Token ID16627
Feature activation+1.26e+01

Pos logit contributions
 supervision+0.260
 disl+0.232
 apology+0.225
 choices+0.223
 permission+0.212
Neg logit contributions
-0.244
lic-0.242
Inside-0.239
On-0.232
Along-0.225
Tokenens
Token ID641
Feature activation+6.01e+00

Pos logit contributions
 supervision+0.236
 permission+0.173
 apology+0.159
 choices+0.158
 kindness+0.152
Neg logit contributions
lic-0.627
Inside-0.606
Along-0.599
On-0.598
-0.590
Token run
Token ID1057
Feature activation+2.97e+00

Pos logit contributions
 supervision+0.197
 choices+0.164
 disl+0.164
 apology+0.162
 permission+0.161
Neg logit contributions
lic-0.208
-0.204
Inside-0.202
On-0.198
Along-0.191
Token away
Token ID1497
Feature activation-1.02e-01

Pos logit contributions
 supervision+0.106
 disl+0.090
 apology+0.089
 choices+0.088
 permission+0.087
Neg logit contributions
lic-0.096
-0.093
Inside-0.092
On-0.091
Along-0.088
Token.
Token ID13
Feature activation+1.18e+00

Pos logit contributions
lic+0.003
On+0.003
Inside+0.003
+0.003
Along+0.003
Neg logit contributions
 supervision-0.004
 disl-0.003
 apology-0.003
 permission-0.003
 choices-0.003
Token<|endoftext|>
Token ID50256
Feature activation-7.46e-01

Pos logit contributions
 supervision+0.059
 permission+0.053
 apology+0.052
 choices+0.052
 disl+0.051
Neg logit contributions
lic-0.022
-0.021
Inside-0.020
On-0.020
Along-0.019
Token that
Token ID326
Feature activation+3.70e+00

Pos logit contributions
 supervision+0.147
 permission+0.126
 apology+0.123
 choices+0.122
 disl+0.121
Neg logit contributions
lic-0.129
-0.126
Inside-0.125
On-0.123
Along-0.119
Token he
Token ID339
Feature activation-3.15e-01

Pos logit contributions
 supervision+0.142
 apology+0.130
 disl+0.123
 permission+0.122
 choices+0.117
Neg logit contributions
On-0.101
Inside-0.098
-0.096
lic-0.093
Along-0.091
Token wouldn
Token ID3636
Feature activation+6.79e+00

Pos logit contributions
lic+0.010
+0.009
Inside+0.009
On+0.009
Along+0.009
Neg logit contributions
 supervision-0.012
 permission-0.010
 apology-0.010
 choices-0.010
 disl-0.010
Token't
Token ID470
Feature activation+5.55e+00

Pos logit contributions
 supervision+0.308
 permission+0.277
 apology+0.269
 choices+0.266
 disl+0.265
Neg logit contributions
lic-0.154
Inside-0.141
-0.138
On-0.134
Along-0.134
Token scare
Token ID19437
Feature activation+6.41e+00

Pos logit contributions
 supervision+0.185
 permission+0.154
 apology+0.152
 choices+0.152
 disl+0.148
Neg logit contributions
lic-0.189
On-0.185
Inside-0.185
-0.184
Along-0.178
Token Mitt
Token ID16627
Feature activation+1.25e+01

Pos logit contributions
 supervision+0.202
 disl+0.174
 apology+0.170
 permission+0.159
 generosity+0.156
Neg logit contributions
lic-0.208
-0.208
Inside-0.204
On-0.204
Along-0.197
Tokenens
Token ID641
Feature activation+5.77e+00

Pos logit contributions
 supervision+0.265
 permission+0.205
 apology+0.193
 choices+0.190
 disl+0.186
Neg logit contributions
lic-0.592
-0.572
Inside-0.568
On-0.558
Along-0.553
Token again
Token ID757
Feature activation+4.27e+00

Pos logit contributions
 supervision+0.190
 apology+0.163
 disl+0.161
 permission+0.160
 choices+0.157
Neg logit contributions
lic-0.198
-0.192
Inside-0.189
On-0.187
Along-0.184
Token.
Token ID13
Feature activation+1.53e+00

Pos logit contributions
 supervision+0.155
 apology+0.134
 permission+0.133
 choices+0.130
 disl+0.130
Neg logit contributions
lic-0.134
-0.129
Inside-0.129
On-0.127
Along-0.124
Token The
Token ID383
Feature activation+4.23e-01

Pos logit contributions
 supervision+0.071
 permission+0.063
 kindness+0.062
 choices+0.062
 apology+0.062
Neg logit contributions
lic-0.034
Inside-0.031
-0.030
Along-0.029
On-0.029
Token end
Token ID886
Feature activation+1.04e-01

Pos logit contributions
 supervision+0.013
 permission+0.011
 disl+0.010
 choices+0.010
 apology+0.010
Neg logit contributions
lic-0.016
-0.016
Inside-0.016
On-0.015
Along-0.015
Token love
Token ID1842
Feature activation+4.19e+00

Pos logit contributions
 supervision+0.105
 permission+0.091
 apology+0.087
 disl+0.086
 kindness+0.086
Neg logit contributions
lic-0.115
-0.110
Inside-0.110
On-0.107
Along-0.106
Token your
Token ID534
Feature activation+5.41e+00

Pos logit contributions
 supervision+0.149
 apology+0.131
 permission+0.126
 choices+0.124
 attention+0.117
Neg logit contributions
lic-0.130
Inside-0.128
-0.125
On-0.121
Along-0.119
Token ch
Token ID442
Feature activation+2.95e+00

Pos logit contributions
 supervision+0.189
 permission+0.158
 choices+0.152
 apology+0.151
 disl+0.147
Neg logit contributions
lic-0.180
-0.176
Inside-0.171
On-0.165
Along-0.165
Tokenubby
Token ID38393
Feature activation+5.49e+00

Pos logit contributions
 supervision+0.123
 permission+0.111
 apology+0.109
 choices+0.106
 disl+0.104
Neg logit contributions
lic-0.083
Inside-0.080
On-0.079
Along-0.075
-0.073
Token cat
Token ID3797
Feature activation+5.07e+00

Pos logit contributions
 supervision+0.185
 permission+0.157
 apology+0.151
 choices+0.150
 disl+0.149
Neg logit contributions
lic-0.190
-0.183
Inside-0.179
On-0.175
Along-0.173
Token Mitt
Token ID16627
Feature activation+1.25e+01

Pos logit contributions
 supervision+0.188
 permission+0.165
 apology+0.159
 disl+0.154
 choices+0.151
Neg logit contributions
lic-0.158
-0.147
Inside-0.145
On-0.141
Along-0.138
Tokenens
Token ID641
Feature activation+5.86e+00

Pos logit contributions
 supervision+0.312
 permission+0.257
 apology+0.242
 disl+0.230
 choices+0.228
Neg logit contributions
lic-0.544
-0.524
Inside-0.504
On-0.486
Along-0.482
Token,"
Token ID553
Feature activation+1.78e+00

Pos logit contributions
 supervision+0.191
 permission+0.164
 apology+0.159
 disl+0.156
 choices+0.153
Neg logit contributions
lic-0.208
-0.199
Inside-0.197
On-0.192
Along-0.190
Token Sarah
Token ID10490
Feature activation+2.91e+00

Pos logit contributions
 supervision+0.077
 permission+0.070
 apology+0.066
 choices+0.065
 harm+0.064
Neg logit contributions
Inside-0.047
lic-0.047
On-0.045
-0.040
A-0.040
Token said
Token ID531
Feature activation+3.10e-01

Pos logit contributions
 supervision+0.116
 permission+0.103
 choices+0.103
 apology+0.101
 attention+0.101
Neg logit contributions
lic-0.083
Inside-0.077
On-0.076
-0.073
Along-0.071
Token.
Token ID13
Feature activation+1.73e+00

Pos logit contributions
 supervision+0.014
 disl+0.013
 privacy+0.012
 choices+0.012
 permission+0.012
Neg logit contributions
lic-0.007
-0.006
Inside-0.006
On-0.006
Along-0.005
Token children
Token ID1751
Feature activation+4.11e+00

Pos logit contributions
 supervision+0.054
 permission+0.046
 disl+0.044
 choices+0.044
 apology+0.043
Neg logit contributions
lic-0.051
Inside-0.048
-0.047
On-0.046
Along-0.046
Token.
Token ID13
Feature activation+2.72e+00

Pos logit contributions
 supervision+0.162
 disl+0.143
 apology+0.142
 permission+0.141
 choices+0.140
Neg logit contributions
lic-0.109
-0.105
Inside-0.103
On-0.100
Along-0.098
Token You
Token ID921
Feature activation+4.07e+00

Pos logit contributions
 supervision+0.101
 permission+0.089
 choices+0.086
 apology+0.085
 attention+0.084
Neg logit contributions
-0.080
lic-0.079
Inside-0.078
On-0.074
Along-0.073
Token have
Token ID423
Feature activation-4.93e-01

Pos logit contributions
 supervision+0.141
 permission+0.120
 apology+0.116
 choices+0.116
 disl+0.114
Neg logit contributions
lic-0.139
-0.132
Inside-0.132
On-0.128
Along-0.127
Token no
Token ID645
Feature activation+5.05e+00

Pos logit contributions
lic+0.016
Inside+0.016
On+0.015
Along+0.015
+0.015
Neg logit contributions
 supervision-0.017
 apology-0.015
 choices-0.014
 permission-0.014
 disl-0.014
Token choice
Token ID3572
Feature activation+1.24e+01

Pos logit contributions
 supervision+0.177
 disl+0.146
 permission+0.145
 apology+0.142
 choices+0.139
Neg logit contributions
lic-0.154
Inside-0.149
On-0.146
-0.145
Along-0.144
Token.
Token ID13
Feature activation+3.27e+00

Pos logit contributions
 supervision+0.477
 disl+0.418
 apology+0.413
 choices+0.408
 permission+0.407
Neg logit contributions
lic-0.330
Inside-0.324
-0.311
On-0.307
Along-0.296
Token Here
Token ID3423
Feature activation+3.06e+00

Pos logit contributions
 supervision+0.121
 permission+0.107
 choices+0.103
 attention+0.102
 harm+0.102
Neg logit contributions
-0.095
Inside-0.094
lic-0.093
On-0.088
Along-0.087
Token's
Token ID338
Feature activation+2.67e+00

Pos logit contributions
 supervision+0.136
 permission+0.122
 apology+0.121
 disl+0.119
 choices+0.118
Neg logit contributions
-0.065
Inside-0.063
lic-0.061
On-0.058
ink-0.057
Token the
Token ID262
Feature activation+3.25e+00

Pos logit contributions
 supervision+0.101
 permission+0.087
 disl+0.085
 choices+0.085
 apology+0.085
Neg logit contributions
lic-0.078
Inside-0.076
-0.074
On-0.074
Along-0.072
Token first
Token ID717
Feature activation+4.98e+00

Pos logit contributions
 supervision+0.115
 permission+0.099
 apology+0.095
 choices+0.095
 disl+0.094
Neg logit contributions
lic-0.107
-0.102
Inside-0.101
On-0.098
Along-0.097
Token being
Token ID852
Feature activation+2.16e+00

Pos logit contributions
 supervision+0.143
 apology+0.129
 disl+0.124
 permission+0.118
 choices+0.114
Neg logit contributions
On-0.118
Inside-0.115
-0.110
Along-0.109
 clearing-0.104
Token brave
Token ID14802
Feature activation+6.57e+00

Pos logit contributions
 supervision+0.078
 apology+0.068
 permission+0.066
 choices+0.065
 disl+0.065
Neg logit contributions
lic-0.068
Inside-0.065
On-0.063
Along-0.062
-0.061
Token was
Token ID373
Feature activation-1.78e+00

Pos logit contributions
 supervision+0.260
 permission+0.229
 disl+0.228
 apology+0.226
 choices+0.217
Neg logit contributions
lic-0.179
-0.167
Inside-0.166
On-0.164
Along-0.160
Token the
Token ID262
Feature activation+3.08e+00

Pos logit contributions
lic+0.057
Inside+0.056
On+0.055
Along+0.054
+0.053
Neg logit contributions
 supervision-0.062
 apology-0.053
 disl-0.052
 permission-0.052
 choices-0.051
Token best
Token ID1266
Feature activation+5.45e+00

Pos logit contributions
 supervision+0.121
 disl+0.106
 apology+0.100
 permission+0.099
 choices+0.097
Neg logit contributions
-0.080
On-0.080
Along-0.080
Inside-0.079
lic-0.078
Token choice
Token ID3572
Feature activation+1.24e+01

Pos logit contributions
 supervision+0.193
 disl+0.162
 permission+0.156
 apology+0.154
 choices+0.145
Neg logit contributions
Inside-0.170
lic-0.169
-0.168
On-0.165
Along-0.164
Token.
Token ID13
Feature activation+1.25e+00

Pos logit contributions
 supervision+0.493
 disl+0.441
 apology+0.437
 permission+0.427
 choices+0.424
Neg logit contributions
lic-0.333
Inside-0.321
-0.310
On-0.309
Along-0.298
Token
Token ID220
Feature activation-2.30e+00

Pos logit contributions
 supervision+0.057
 permission+0.051
 apology+0.050
 kindness+0.050
 choices+0.050
Neg logit contributions
lic-0.029
Inside-0.025
-0.025
Along-0.024
On-0.024
Token
Token ID198
Feature activation-4.63e+00

Pos logit contributions
lic+0.055
+0.049
On+0.049
Inside+0.049
Along+0.048
Neg logit contributions
 supervision-0.099
 apology-0.084
 permission-0.084
 disl-0.083
 kindness-0.082
Token
Token ID198
Feature activation-5.04e+00

Pos logit contributions
lic+0.106
+0.101
Inside+0.094
Along+0.091
On+0.090
Neg logit contributions
 supervision-0.204
 apology-0.183
 permission-0.182
 kindness-0.180
 disl-0.179
TokenThe
Token ID464
Feature activation-1.28e+00

Pos logit contributions
lic+0.123
+0.117
Inside+0.108
Along+0.103
On+0.101
Neg logit contributions
 supervision-0.216
 permission-0.194
 apology-0.191
 kindness-0.189
 disl-0.188
Token.
Token ID13
Feature activation+3.59e+00

Pos logit contributions
 supervision+0.132
 permission+0.112
 apology+0.110
 choices+0.109
 disl+0.109
Neg logit contributions
lic-0.102
Inside-0.093
On-0.092
-0.090
Along-0.089
Token
Token ID198
Feature activation-2.60e+00

Pos logit contributions
 supervision+0.143
 disl+0.124
 apology+0.123
 permission+0.123
 kindness+0.122
Neg logit contributions
lic-0.101
-0.093
Inside-0.091
Along-0.089
On-0.088
Token
Token ID198
Feature activation-3.22e+00

Pos logit contributions
+0.066
lic+0.064
Along+0.053
Inside+0.053
ink+0.052
Neg logit contributions
 supervision-0.105
 apology-0.096
 disl-0.094
 kindness-0.092
 permission-0.091
Token"
Token ID1
Feature activation+5.03e+00

Pos logit contributions
lic+0.083
+0.083
ink+0.068
Along+0.062
Inside+0.061
Neg logit contributions
 supervision-0.131
 disl-0.119
 apology-0.114
 permission-0.113
 kindness-0.110
TokenHi
Token ID17250
Feature activation+1.64e+00

Pos logit contributions
 supervision+0.170
 permission+0.149
 disl+0.149
 apology+0.147
 kindness+0.145
Neg logit contributions
lic-0.166
-0.159
ink-0.148
Inside-0.148
Along-0.145
Token Mrs
Token ID9074
Feature activation+1.24e+01

Pos logit contributions
 supervision+0.061
 permission+0.053
 choices+0.053
 kindness+0.052
 apology+0.051
Neg logit contributions
lic-0.050
-0.049
Inside-0.046
On-0.046
Along-0.045
Token.
Token ID13
Feature activation+6.12e+00

Pos logit contributions
 supervision+0.424
 permission+0.359
 disl+0.357
 apology+0.354
 choices+0.343
Neg logit contributions
lic-0.408
-0.395
Inside-0.391
On-0.379
Along-0.375
Token Johnson
Token ID5030
Feature activation+4.89e+00

Pos logit contributions
 supervision+0.188
 permission+0.160
 kindness+0.152
 apology+0.149
 disl+0.145
Neg logit contributions
lic-0.233
Inside-0.219
On-0.211
Along-0.210
-0.205
Token!
Token ID0
Feature activation+3.94e+00

Pos logit contributions
 supervision+0.171
 permission+0.150
 apology+0.144
 choices+0.139
 disl+0.139
Neg logit contributions
lic-0.163
Inside-0.152
-0.151
On-0.148
Along-0.145
Token What
Token ID1867
Feature activation+3.69e+00

Pos logit contributions
 supervision+0.127
 permission+0.106
 disl+0.105
 apology+0.104
 kindness+0.103
Neg logit contributions
lic-0.139
-0.133
Inside-0.132
On-0.130
Along-0.129
Token's
Token ID338
Feature activation+2.75e+00

Pos logit contributions
 supervision+0.139
 permission+0.121
 apology+0.117
 choices+0.114
 disl+0.113
Neg logit contributions
lic-0.114
Inside-0.109
On-0.106
Along-0.103
-0.103
Token upon
Token ID2402
Feature activation-7.01e+00

Pos logit contributions
lic+0.069
+0.063
Inside+0.061
On+0.056
Along+0.056
Neg logit contributions
 supervision-0.206
 permission-0.187
 apology-0.182
 choices-0.181
 attention-0.180
Token a
Token ID257
Feature activation-4.61e+00

Pos logit contributions
lic+0.225
+0.221
Inside+0.199
ink+0.199
Along+0.197
Neg logit contributions
 supervision-0.236
 permission-0.216
 kindness-0.212
 apology-0.207
 choices-0.204
Token time
Token ID640
Feature activation-3.77e+00

Pos logit contributions
lic+0.145
Inside+0.127
+0.127
ink+0.115
Along+0.113
Neg logit contributions
 supervision-0.157
 apology-0.143
 kindness-0.142
 harm-0.137
 permission-0.136
Token,
Token ID11
Feature activation-3.94e+00

Pos logit contributions
lic+0.137
Inside+0.127
Along+0.126
On+0.124
+0.123
Neg logit contributions
 supervision-0.113
 permission-0.095
 disl-0.095
 kindness-0.093
 apology-0.091
Token there
Token ID612
Feature activation-6.31e+00

Pos logit contributions
lic+0.121
+0.119
Inside+0.107
Along+0.103
ink+0.101
Neg logit contributions
 supervision-0.131
 permission-0.116
 kindness-0.113
 attention-0.112
 apology-0.111
Token lived
Token ID5615
Feature activation-9.56e+00

Pos logit contributions
lic+0.206
Inside+0.191
+0.187
Along+0.186
On+0.180
Neg logit contributions
 supervision-0.218
 permission-0.189
 kindness-0.187
 choices-0.187
 apology-0.187
Token a
Token ID257
Feature activation-5.47e+00

Pos logit contributions
lic+0.386
+0.367
ink+0.343
br+0.334
Inside+0.324
Neg logit contributions
 permission-0.221
 kindness-0.199
 forgiveness-0.189
 supervision-0.188
 apology-0.165
Token little
Token ID1310
Feature activation-1.19e-01

Pos logit contributions
lic+0.172
+0.163
Inside+0.162
Along+0.153
On+0.152
Neg logit contributions
 supervision-0.182
 kindness-0.166
 apology-0.165
 permission-0.162
 disl-0.162
Token hedge
Token ID19859
Feature activation-4.92e+00

Pos logit contributions
lic+0.005
Inside+0.005
+0.004
On+0.004
Along+0.004
Neg logit contributions
 supervision-0.003
 permission-0.003
 apology-0.003
 choices-0.003
 kindness-0.003
Tokenhog
Token ID31897
Feature activation-1.48e+00

Pos logit contributions
lic+0.168
Inside+0.161
+0.160
On+0.155
Along+0.154
Neg logit contributions
 supervision-0.162
 permission-0.139
 choices-0.136
 apology-0.135
 kindness-0.134
Token.
Token ID13
Feature activation-5.92e-01

Pos logit contributions
lic+0.036
+0.034
Inside+0.033
On+0.033
Along+0.032
Neg logit contributions
 supervision-0.066
 permission-0.058
 apology-0.057
 choices-0.057
 attention-0.056
Token a
Token ID257
Feature activation-5.86e+00

Pos logit contributions
lic+0.156
+0.134
Along+0.128
Inside+0.128
ink+0.127
Neg logit contributions
 supervision-0.123
 kindness-0.114
 permission-0.109
 apology-0.106
 attention-0.105
Token small
Token ID1402
Feature activation-3.52e+00

Pos logit contributions
lic+0.190
+0.178
Inside+0.155
Along+0.146
ink+0.142
Neg logit contributions
 supervision-0.205
 disl-0.186
 harm-0.179
 kindness-0.179
 permission-0.175
Token house
Token ID2156
Feature activation-5.72e+00

Pos logit contributions
lic+0.127
+0.114
Inside+0.112
Along+0.104
On+0.104
Neg logit contributions
 supervision-0.113
 choices-0.101
 permission-0.100
 kindness-0.098
 harm-0.097
Token,
Token ID11
Feature activation-3.83e+00

Pos logit contributions
lic+0.180
+0.169
Inside+0.166
Along+0.164
On+0.161
Neg logit contributions
 supervision-0.209
 permission-0.183
 choices-0.179
 kindness-0.176
 apology-0.175
Token there
Token ID612
Feature activation-6.26e+00

Pos logit contributions
lic+0.102
+0.100
Along+0.087
Inside+0.086
On+0.084
Neg logit contributions
 supervision-0.153
 permission-0.136
 choices-0.136
 kindness-0.136
 apology-0.136
Token lived
Token ID5615
Feature activation-9.45e+00

Pos logit contributions
lic+0.205
+0.187
Inside+0.186
Along+0.183
On+0.178
Neg logit contributions
 supervision-0.221
 permission-0.192
 kindness-0.189
 disl-0.188
 choices-0.187
Token a
Token ID257
Feature activation-4.89e+00

Pos logit contributions
+0.370
lic+0.363
ink+0.340
br+0.324
zag+0.301
Neg logit contributions
 permission-0.239
 kindness-0.212
 supervision-0.207
 forgiveness-0.197
 apology-0.190
Token little
Token ID1310
Feature activation+2.98e-01

Pos logit contributions
lic+0.159
+0.151
Inside+0.148
Along+0.142
On+0.141
Neg logit contributions
 supervision-0.160
 kindness-0.144
 apology-0.143
 disl-0.142
 permission-0.140
Token girl
Token ID2576
Feature activation-4.09e-01

Pos logit contributions
 supervision+0.008
 permission+0.007
 choices+0.006
 kindness+0.006
 apology+0.006
Neg logit contributions
lic-0.012
Inside-0.012
-0.012
On-0.011
Along-0.011
Token named
Token ID3706
Feature activation+3.65e+00

Pos logit contributions
lic+0.011
+0.011
Inside+0.010
On+0.010
Along+0.010
Neg logit contributions
 supervision-0.017
 permission-0.015
 apology-0.014
 choices-0.014
 disl-0.014
Token Sue
Token ID26089
Feature activation+1.27e+00

Pos logit contributions
 supervision+0.100
 permission+0.086
 apology+0.081
 attention+0.081
 kindness+0.080
Neg logit contributions
lic-0.142
-0.141
On-0.133
Along-0.132
Inside-0.132
Token upon
Token ID2402
Feature activation-6.90e+00

Pos logit contributions
lic+0.062
+0.056
Inside+0.055
On+0.051
Along+0.050
Neg logit contributions
 supervision-0.206
 permission-0.188
 apology-0.184
 choices-0.183
 disl-0.181
Token a
Token ID257
Feature activation-4.53e+00

Pos logit contributions
lic+0.223
+0.220
ink+0.200
Inside+0.199
Along+0.197
Neg logit contributions
 supervision-0.226
 permission-0.208
 kindness-0.204
 apology-0.200
 disl-0.197
Token time
Token ID640
Feature activation-3.63e+00

Pos logit contributions
lic+0.142
Inside+0.125
+0.123
ink+0.114
Along+0.112
Neg logit contributions
 supervision-0.154
 apology-0.141
 kindness-0.139
 permission-0.134
 harm-0.134
Token,
Token ID11
Feature activation-3.79e+00

Pos logit contributions
lic+0.131
Inside+0.122
Along+0.121
+0.119
On+0.119
Neg logit contributions
 supervision-0.111
 permission-0.095
 disl-0.093
 kindness-0.092
 apology-0.091
Token there
Token ID612
Feature activation-6.16e+00

Pos logit contributions
lic+0.111
+0.109
Inside+0.099
Along+0.095
ink+0.092
Neg logit contributions
 supervision-0.134
 permission-0.120
 kindness-0.117
 apology-0.117
 attention-0.115
Token lived
Token ID5615
Feature activation-9.43e+00

Pos logit contributions
lic+0.199
Inside+0.184
+0.182
Along+0.180
On+0.175
Neg logit contributions
 supervision-0.216
 permission-0.187
 choices-0.185
 kindness-0.185
 apology-0.185
Token two
Token ID734
Feature activation-2.31e+00

Pos logit contributions
lic+0.379
+0.364
ink+0.337
br+0.326
Inside+0.317
Neg logit contributions
 permission-0.219
 kindness-0.199
 forgiveness-0.186
 supervision-0.184
 apology-0.166
Token best
Token ID1266
Feature activation+1.01e+00

Pos logit contributions
lic+0.081
+0.077
Inside+0.076
On+0.074
Along+0.074
Neg logit contributions
 supervision-0.078
 permission-0.066
 apology-0.064
 choices-0.064
 disl-0.063
Token friends
Token ID2460
Feature activation+1.24e+00

Pos logit contributions
 supervision+0.030
 permission+0.026
 apology+0.025
 choices+0.025
 kindness+0.025
Neg logit contributions
lic-0.038
Inside-0.037
-0.037
On-0.037
Along-0.035
Token,
Token ID11
Feature activation-3.05e+00

Pos logit contributions
 supervision+0.058
 apology+0.050
 disl+0.050
 permission+0.049
 choices+0.049
Neg logit contributions
lic-0.027
-0.025
Inside-0.025
On-0.025
Along-0.024
Token John
Token ID1757
Feature activation-5.68e-02

Pos logit contributions
lic+0.115
+0.109
Inside+0.107
On+0.103
ink+0.103
Neg logit contributions
 supervision-0.091
 permission-0.079
 kindness-0.075
 apology-0.074
 disl-0.071
Token upon
Token ID2402
Feature activation-7.12e+00

Pos logit contributions
lic+0.057
+0.050
Inside+0.050
On+0.047
Along+0.045
Neg logit contributions
 supervision-0.236
 permission-0.215
 apology-0.211
 choices-0.210
 disl-0.209
Token a
Token ID257
Feature activation-4.51e+00

Pos logit contributions
lic+0.226
+0.217
Inside+0.200
ink+0.196
Along+0.196
Neg logit contributions
 supervision-0.245
 permission-0.224
 kindness-0.219
 apology-0.214
 choices-0.212
Token time
Token ID640
Feature activation-3.81e+00

Pos logit contributions
lic+0.143
Inside+0.125
+0.123
Along+0.111
ink+0.111
Neg logit contributions
 supervision-0.153
 kindness-0.140
 apology-0.139
 harm-0.135
 permission-0.133
Token,
Token ID11
Feature activation-3.96e+00

Pos logit contributions
lic+0.131
Inside+0.119
+0.118
On+0.117
Along+0.116
Neg logit contributions
 supervision-0.128
 permission-0.111
 disl-0.107
 kindness-0.107
 apology-0.105
Token there
Token ID612
Feature activation-6.18e+00

Pos logit contributions
lic+0.116
+0.107
Inside+0.100
Along+0.095
ink+0.093
Neg logit contributions
 supervision-0.145
 permission-0.131
 kindness-0.128
 apology-0.125
 disl-0.125
Token lived
Token ID5615
Feature activation-9.42e+00

Pos logit contributions
lic+0.195
Inside+0.180
+0.177
Along+0.174
On+0.171
Neg logit contributions
 supervision-0.226
 permission-0.197
 choices-0.193
 kindness-0.193
 apology-0.192
Token two
Token ID734
Feature activation-2.50e+00

Pos logit contributions
lic+0.373
+0.355
ink+0.331
br+0.316
Inside+0.310
Neg logit contributions
 permission-0.232
 kindness-0.212
 supervision-0.203
 forgiveness-0.197
 apology-0.174
Token little
Token ID1310
Feature activation+4.28e-01

Pos logit contributions
lic+0.086
+0.082
Inside+0.081
On+0.079
Along+0.079
Neg logit contributions
 supervision-0.085
 permission-0.073
 apology-0.071
 choices-0.070
 disl-0.069
Token birds
Token ID10087
Feature activation-3.08e+00

Pos logit contributions
 supervision+0.015
 apology+0.013
 permission+0.013
 disl+0.013
 choices+0.012
Neg logit contributions
lic-0.014
-0.013
Inside-0.013
Along-0.013
On-0.012
Token named
Token ID3706
Feature activation+3.51e+00

Pos logit contributions
lic+0.097
Inside+0.093
+0.093
On+0.089
Along+0.089
Neg logit contributions
 supervision-0.113
 permission-0.099
 choices-0.094
 apology-0.094
 kindness-0.094
Token George
Token ID4502
Feature activation-1.14e+00

Pos logit contributions
 supervision+0.116
 permission+0.100
 apology+0.095
 choices+0.095
 disl+0.094
Neg logit contributions
lic-0.123
-0.117
Inside-0.117
On-0.114
Along-0.113
Token upon
Token ID2402
Feature activation-6.80e+00

Pos logit contributions
lic+0.064
+0.059
Inside+0.057
Along+0.053
On+0.052
Neg logit contributions
 supervision-0.187
 permission-0.170
 apology-0.166
 choices-0.165
 attention-0.164
Token a
Token ID257
Feature activation-4.42e+00

Pos logit contributions
+0.218
lic+0.218
Inside+0.194
ink+0.194
Along+0.192
Neg logit contributions
 supervision-0.225
 permission-0.207
 kindness-0.203
 apology-0.199
 choices-0.196
Token time
Token ID640
Feature activation-3.59e+00

Pos logit contributions
lic+0.138
Inside+0.125
+0.121
ink+0.111
Along+0.110
Neg logit contributions
 supervision-0.144
 apology-0.133
 kindness-0.131
 harm-0.127
 disl-0.125
Token,
Token ID11
Feature activation-3.70e+00

Pos logit contributions
lic+0.135
Inside+0.126
Along+0.125
+0.122
On+0.122
Neg logit contributions
 supervision-0.097
 disl-0.083
 permission-0.080
 kindness-0.080
 attention-0.078
Token there
Token ID612
Feature activation-6.00e+00

Pos logit contributions
+0.117
lic+0.116
Inside+0.106
Along+0.101
ink+0.099
Neg logit contributions
 supervision-0.115
 permission-0.103
 attention-0.100
 kindness-0.099
 apology-0.099
Token lived
Token ID5615
Feature activation-9.40e+00

Pos logit contributions
lic+0.200
Inside+0.186
+0.184
Along+0.182
On+0.175
Neg logit contributions
 supervision-0.201
 permission-0.173
 kindness-0.172
 apology-0.171
 disl-0.171
Token a
Token ID257
Feature activation-5.17e+00

Pos logit contributions
lic+0.368
+0.354
ink+0.327
br+0.320
Inside+0.312
Neg logit contributions
 permission-0.232
 kindness-0.206
 supervision-0.202
 forgiveness-0.193
 apology-0.179
Token gentle
Token ID10296
Feature activation+4.62e-01

Pos logit contributions
lic+0.163
Inside+0.156
+0.156
On+0.148
Along+0.147
Neg logit contributions
 supervision-0.170
 kindness-0.155
 apology-0.155
 permission-0.152
 disl-0.151
Token village
Token ID7404
Feature activation-3.48e+00

Pos logit contributions
 supervision+0.013
 kindness+0.012
 apology+0.012
 permission+0.011
 choices+0.011
Neg logit contributions
lic-0.018
Inside-0.017
On-0.017
-0.016
Along-0.016
Token.
Token ID13
Feature activation-6.36e-01

Pos logit contributions
lic+0.108
+0.103
Inside+0.102
On+0.099
Along+0.098
Neg logit contributions
 supervision-0.129
 permission-0.113
 apology-0.109
 choices-0.109
 disl-0.108
Token One
Token ID1881
Feature activation-3.35e+00

Pos logit contributions
lic+0.021
+0.019
ink+0.019
P+0.018
Finally+0.018
Neg logit contributions
 supervision-0.019
 disl-0.018
 permission-0.018
 kindness-0.017
 apology-0.017
Token a
Token ID257
Feature activation-5.94e+00

Pos logit contributions
lic+0.145
+0.128
Inside+0.121
ink+0.120
Along+0.119
Neg logit contributions
 supervision-0.135
 kindness-0.120
 permission-0.120
 attention-0.113
 apology-0.113
Token small
Token ID1402
Feature activation-3.64e+00

Pos logit contributions
lic+0.188
+0.177
Inside+0.154
Along+0.143
ink+0.142
Neg logit contributions
 supervision-0.211
 disl-0.188
 kindness-0.185
 permission-0.183
 harm-0.180
Token town
Token ID3240
Feature activation-5.70e+00

Pos logit contributions
lic+0.132
+0.117
Inside+0.114
Along+0.105
On+0.105
Neg logit contributions
 supervision-0.117
 permission-0.105
 choices-0.104
 kindness-0.103
 harm-0.099
Token,
Token ID11
Feature activation-3.65e+00

Pos logit contributions
lic+0.174
+0.167
Inside+0.158
Along+0.153
On+0.153
Neg logit contributions
 supervision-0.215
 permission-0.189
 choices-0.187
 kindness-0.181
 disl-0.181
Token there
Token ID612
Feature activation-6.19e+00

Pos logit contributions
lic+0.106
+0.103
ink+0.087
Inside+0.087
Along+0.086
Neg logit contributions
 supervision-0.131
 kindness-0.119
 choices-0.119
 permission-0.118
 attention-0.117
Token lived
Token ID5615
Feature activation-9.38e+00

Pos logit contributions
lic+0.222
+0.199
Inside+0.195
Along+0.192
On+0.185
Neg logit contributions
 supervision-0.198
 permission-0.172
 kindness-0.171
 choices-0.168
 disl-0.166
Token a
Token ID257
Feature activation-4.95e+00

Pos logit contributions
+0.368
lic+0.367
ink+0.344
br+0.321
zag+0.297
Neg logit contributions
 permission-0.233
 kindness-0.209
 supervision-0.201
 forgiveness-0.189
 apology-0.186
Token little
Token ID1310
Feature activation+1.98e-01

Pos logit contributions
lic+0.160
+0.152
Inside+0.150
On+0.143
Along+0.142
Neg logit contributions
 supervision-0.161
 kindness-0.145
 apology-0.143
 disl-0.143
 permission-0.142
Token boy
Token ID2933
Feature activation-1.98e-01

Pos logit contributions
 supervision+0.005
 permission+0.004
 choices+0.004
 kindness+0.004
 apology+0.004
Neg logit contributions
lic-0.008
Inside-0.008
-0.008
On-0.008
Along-0.008
Token named
Token ID3706
Feature activation+3.75e+00

Pos logit contributions
lic+0.006
+0.005
Inside+0.005
On+0.005
Along+0.005
Neg logit contributions
 supervision-0.008
 permission-0.007
 choices-0.007
 apology-0.007
 kindness-0.007
Token Tim
Token ID5045
Feature activation+1.84e+00

Pos logit contributions
 supervision+0.098
 permission+0.081
 choices+0.076
 attention+0.076
 kindness+0.076
Neg logit contributions
lic-0.154
-0.152
Inside-0.145
On-0.144
Along-0.142
Token upon
Token ID2402
Feature activation-6.79e+00

Pos logit contributions
lic+0.058
+0.052
Inside+0.051
On+0.048
Along+0.047
Neg logit contributions
 supervision-0.194
 permission-0.176
 apology-0.173
 choices-0.172
 disl-0.170
Token a
Token ID257
Feature activation-4.33e+00

Pos logit contributions
lic+0.210
+0.207
Inside+0.187
Along+0.184
ink+0.183
Neg logit contributions
 supervision-0.240
 permission-0.218
 kindness-0.212
 apology-0.210
 choices-0.207
Token time
Token ID640
Feature activation-3.52e+00

Pos logit contributions
lic+0.137
Inside+0.122
+0.119
 clearing+0.109
Along+0.109
Neg logit contributions
 supervision-0.145
 apology-0.132
 kindness-0.129
 harm-0.126
 attention-0.124
Token,
Token ID11
Feature activation-3.67e+00

Pos logit contributions
lic+0.129
Inside+0.120
Along+0.119
+0.117
On+0.116
Neg logit contributions
 supervision-0.102
 disl-0.086
 permission-0.085
 kindness-0.084
 attention-0.082
Token there
Token ID612
Feature activation-5.98e+00

Pos logit contributions
+0.114
lic+0.113
Inside+0.102
Along+0.099
ink+0.096
Neg logit contributions
 supervision-0.119
 permission-0.106
 attention-0.103
 kindness-0.102
 apology-0.101
Token lived
Token ID5615
Feature activation-9.34e+00

Pos logit contributions
lic+0.193
Inside+0.180
+0.178
Along+0.176
On+0.170
Neg logit contributions
 supervision-0.208
 permission-0.180
 kindness-0.178
 apology-0.178
 choices-0.178
Token a
Token ID257
Feature activation-5.11e+00

Pos logit contributions
lic+0.372
+0.359
ink+0.328
br+0.323
Inside+0.316
Neg logit contributions
 permission-0.223
 kindness-0.201
 supervision-0.194
 forgiveness-0.187
 apology-0.173
Token little
Token ID1310
Feature activation+2.04e-01

Pos logit contributions
lic+0.162
+0.155
Inside+0.154
On+0.146
Along+0.146
Neg logit contributions
 supervision-0.170
 kindness-0.153
 apology-0.152
 permission-0.151
 disl-0.149
Token girl
Token ID2576
Feature activation-4.23e-01

Pos logit contributions
 supervision+0.006
 permission+0.005
 apology+0.005
 choices+0.005
 kindness+0.005
Neg logit contributions
lic-0.008
Inside-0.008
-0.008
On-0.008
Along-0.008
Token named
Token ID3706
Feature activation+3.73e+00

Pos logit contributions
lic+0.011
+0.010
Inside+0.010
On+0.010
Along+0.010
Neg logit contributions
 supervision-0.018
 permission-0.016
 apology-0.016
 choices-0.016
 disl-0.016
Token Sue
Token ID26089
Feature activation+1.19e+00

Pos logit contributions
 supervision+0.105
 permission+0.089
 apology+0.084
 kindness+0.083
 choices+0.082
Neg logit contributions
lic-0.147
-0.142
Inside-0.139
On-0.137
Along-0.136
Token upon
Token ID2402
Feature activation-6.82e+00

Pos logit contributions
lic+0.053
+0.047
Inside+0.047
On+0.044
Along+0.042
Neg logit contributions
 supervision-0.199
 permission-0.182
 apology-0.178
 choices-0.178
 disl-0.176
Token a
Token ID257
Feature activation-4.28e+00

Pos logit contributions
lic+0.217
+0.216
Inside+0.194
ink+0.192
Along+0.190
Neg logit contributions
 supervision-0.229
 permission-0.211
 kindness-0.207
 apology-0.203
 choices-0.200
Token time
Token ID640
Feature activation-3.63e+00

Pos logit contributions
lic+0.130
Inside+0.113
+0.113
 clearing+0.101
Along+0.100
Neg logit contributions
 supervision-0.156
 apology-0.139
 kindness-0.137
 permission-0.135
 harm-0.135
Token,
Token ID11
Feature activation-3.59e+00

Pos logit contributions
lic+0.131
Inside+0.120
+0.119
Along+0.117
On+0.116
Neg logit contributions
 supervision-0.113
 permission-0.096
 kindness-0.093
 disl-0.092
 attention-0.091
Token there
Token ID612
Feature activation-5.74e+00

Pos logit contributions
+0.119
lic+0.118
Inside+0.105
Along+0.099
ink+0.098
Neg logit contributions
 supervision-0.106
 permission-0.097
 attention-0.092
 kindness-0.092
 apology-0.089
Token lived
Token ID5615
Feature activation-9.34e+00

Pos logit contributions
lic+0.193
Inside+0.178
+0.176
Along+0.174
On+0.168
Neg logit contributions
 supervision-0.194
 permission-0.167
 kindness-0.165
 choices-0.164
 apology-0.163
Token a
Token ID257
Feature activation-5.19e+00

Pos logit contributions
lic+0.370
+0.362
ink+0.320
br+0.316
Inside+0.306
Neg logit contributions
 permission-0.230
 kindness-0.206
 supervision-0.201
 forgiveness-0.186
 attention-0.173
Token little
Token ID1310
Feature activation-1.22e-01

Pos logit contributions
lic+0.163
+0.156
Inside+0.155
On+0.146
Along+0.145
Neg logit contributions
 supervision-0.173
 kindness-0.156
 apology-0.155
 permission-0.153
 disl-0.152
Token girl
Token ID2576
Feature activation-5.91e-01

Pos logit contributions
lic+0.005
Inside+0.005
+0.005
On+0.005
Along+0.005
Neg logit contributions
 supervision-0.003
 permission-0.003
 choices-0.003
 apology-0.003
 kindness-0.003
Token who
Token ID508
Feature activation+2.40e-01

Pos logit contributions
lic+0.015
+0.014
Inside+0.014
On+0.013
Along+0.013
Neg logit contributions
 supervision-0.026
 permission-0.023
 apology-0.023
 choices-0.022
 disl-0.022
Token liked
Token ID8288
Feature activation-9.27e-01

Pos logit contributions
 supervision+0.009
 permission+0.008
 apology+0.008
 choices+0.008
 disl+0.007
Neg logit contributions
lic-0.008
-0.007
Inside-0.007
On-0.007
Along-0.007
Token upon
Token ID2402
Feature activation-6.77e+00

Pos logit contributions
lic+0.061
+0.056
Inside+0.054
On+0.050
Along+0.050
Neg logit contributions
 supervision-0.190
 permission-0.172
 apology-0.169
 choices-0.168
 kindness-0.166
Token a
Token ID257
Feature activation-4.21e+00

Pos logit contributions
lic+0.211
+0.207
Inside+0.189
Along+0.185
ink+0.184
Neg logit contributions
 supervision-0.235
 permission-0.214
 kindness-0.210
 apology-0.207
 choices-0.204
Token time
Token ID640
Feature activation-3.55e+00

Pos logit contributions
lic+0.133
Inside+0.118
+0.115
Along+0.105
 clearing+0.105
Neg logit contributions
 supervision-0.140
 apology-0.129
 kindness-0.126
 disl-0.123
 harm-0.122
Token,
Token ID11
Feature activation-3.45e+00

Pos logit contributions
lic+0.134
Inside+0.123
Along+0.122
On+0.118
+0.118
Neg logit contributions
 supervision-0.101
 disl-0.085
 permission-0.084
 kindness-0.082
 attention-0.080
Token there
Token ID612
Feature activation-5.98e+00

Pos logit contributions
lic+0.110
+0.109
Inside+0.098
Along+0.095
ink+0.092
Neg logit contributions
 supervision-0.107
 permission-0.095
 attention-0.094
 kindness-0.094
 apology-0.094
Token lived
Token ID5615
Feature activation-9.33e+00

Pos logit contributions
lic+0.201
Inside+0.187
+0.184
Along+0.183
On+0.176
Neg logit contributions
 supervision-0.199
 permission-0.170
 kindness-0.169
 apology-0.169
 disl-0.169
Token a
Token ID257
Feature activation-5.15e+00

Pos logit contributions
lic+0.370
+0.360
ink+0.332
br+0.318
Inside+0.313
Neg logit contributions
 permission-0.223
 kindness-0.203
 supervision-0.197
 forgiveness-0.188
 apology-0.174
Token little
Token ID1310
Feature activation+2.25e-03

Pos logit contributions
lic+0.164
+0.157
Inside+0.155
Along+0.147
On+0.146
Neg logit contributions
 supervision-0.168
 kindness-0.153
 apology-0.152
 permission-0.149
 disl-0.149
Token girl
Token ID2576
Feature activation-5.47e-01

Pos logit contributions
 supervision+0.000
 permission+0.000
 choices+0.000
 apology+0.000
 kindness+0.000
Neg logit contributions
lic-0.000
Inside-0.000
-0.000
On-0.000
Along-0.000
Token called
Token ID1444
Feature activation+6.39e-01

Pos logit contributions
lic+0.014
+0.013
Inside+0.013
On+0.013
Along+0.012
Neg logit contributions
 supervision-0.023
 permission-0.021
 apology-0.020
 choices-0.020
 disl-0.020
Token Kim
Token ID6502
Feature activation+3.98e+00

Pos logit contributions
 supervision+0.019
 permission+0.016
 apology+0.015
 kindness+0.015
 disl+0.015
Neg logit contributions
lic-0.025
-0.024
Inside-0.023
On-0.023
Along-0.023
Token upon
Token ID2402
Feature activation-6.73e+00

Pos logit contributions
lic+0.055
+0.050
Inside+0.049
On+0.046
Along+0.044
Neg logit contributions
 supervision-0.196
 permission-0.178
 apology-0.174
 choices-0.174
 disl-0.172
Token a
Token ID257
Feature activation-4.23e+00

Pos logit contributions
lic+0.208
+0.202
Inside+0.187
Along+0.182
ink+0.178
Neg logit contributions
 supervision-0.238
 permission-0.217
 kindness-0.211
 choices-0.208
 apology-0.208
Token time
Token ID640
Feature activation-3.45e+00

Pos logit contributions
lic+0.131
Inside+0.115
+0.112
Along+0.103
ink+0.102
Neg logit contributions
 supervision-0.146
 apology-0.132
 kindness-0.130
 disl-0.128
 harm-0.128
Token,
Token ID11
Feature activation-3.59e+00

Pos logit contributions
lic+0.123
Inside+0.114
Along+0.112
On+0.110
+0.109
Neg logit contributions
 supervision-0.109
 disl-0.093
 permission-0.092
 kindness-0.090
 apology-0.088
Token there
Token ID612
Feature activation-5.95e+00

Pos logit contributions
lic+0.106
+0.104
Inside+0.095
Along+0.091
ink+0.088
Neg logit contributions
 supervision-0.126
 permission-0.112
 kindness-0.110
 apology-0.110
 attention-0.109
Token lived
Token ID5615
Feature activation-9.32e+00

Pos logit contributions
lic+0.199
Inside+0.184
+0.181
Along+0.180
On+0.173
Neg logit contributions
 supervision-0.200
 permission-0.173
 kindness-0.172
 disl-0.171
 choices-0.171
Token a
Token ID257
Feature activation-5.23e+00

Pos logit contributions
lic+0.365
+0.348
ink+0.323
br+0.315
Inside+0.310
Neg logit contributions
 permission-0.229
 kindness-0.213
 supervision-0.204
 forgiveness-0.194
 apology-0.183
Token delicate
Token ID19217
Feature activation-1.03e+00

Pos logit contributions
lic+0.164
+0.157
Inside+0.155
On+0.147
Along+0.147
Neg logit contributions
 supervision-0.175
 kindness-0.159
 apology-0.157
 disl-0.155
 permission-0.155
Token white
Token ID2330
Feature activation-2.42e+00

Pos logit contributions
lic+0.036
Inside+0.035
+0.034
On+0.034
Along+0.033
Neg logit contributions
 supervision-0.033
 permission-0.029
 apology-0.029
 kindness-0.028
 choices-0.028
Token rabbit
Token ID22746
Feature activation-1.65e+00

Pos logit contributions
lic+0.099
Inside+0.096
On+0.092
Along+0.090
+0.089
Neg logit contributions
 supervision-0.059
 permission-0.054
 kindness-0.054
 apology-0.054
 choices-0.050
Token who
Token ID508
Feature activation+5.61e-01

Pos logit contributions
lic+0.051
Inside+0.049
+0.048
Along+0.046
On+0.046
Neg logit contributions
 supervision-0.060
 permission-0.053
 kindness-0.051
 apology-0.051
 choices-0.051
Token upon
Token ID2402
Feature activation-6.82e+00

Pos logit contributions
lic+0.056
+0.050
Inside+0.050
On+0.046
Along+0.045
Neg logit contributions
 supervision-0.199
 permission-0.181
 apology-0.177
 choices-0.177
 disl-0.175
Token a
Token ID257
Feature activation-4.41e+00

Pos logit contributions
lic+0.209
+0.205
Inside+0.187
Along+0.183
ink+0.182
Neg logit contributions
 supervision-0.241
 permission-0.219
 kindness-0.214
 apology-0.213
 choices-0.210
Token time
Token ID640
Feature activation-3.50e+00

Pos logit contributions
lic+0.135
Inside+0.120
+0.116
Along+0.108
 clearing+0.107
Neg logit contributions
 supervision-0.156
 apology-0.141
 kindness-0.138
 harm-0.135
 disl-0.134
Token,
Token ID11
Feature activation-3.55e+00

Pos logit contributions
lic+0.124
Inside+0.115
Along+0.113
+0.112
On+0.111
Neg logit contributions
 supervision-0.111
 permission-0.094
 disl-0.093
 kindness-0.092
 apology-0.090
Token there
Token ID612
Feature activation-5.88e+00

Pos logit contributions
lic+0.111
+0.110
Inside+0.099
Along+0.096
ink+0.093
Neg logit contributions
 supervision-0.113
 permission-0.101
 kindness-0.100
 apology-0.100
 attention-0.099
Token lived
Token ID5615
Feature activation-9.32e+00

Pos logit contributions
lic+0.194
Inside+0.181
+0.178
Along+0.177
On+0.171
Neg logit contributions
 supervision-0.200
 permission-0.172
 kindness-0.171
 choices-0.170
 disl-0.170
Token a
Token ID257
Feature activation-5.12e+00

Pos logit contributions
lic+0.368
+0.357
ink+0.328
br+0.316
Inside+0.309
Neg logit contributions
 permission-0.231
 kindness-0.211
 supervision-0.205
 forgiveness-0.193
 apology-0.185
Token compassionate
Token ID32533
Feature activation+3.13e+00

Pos logit contributions
lic+0.163
+0.156
Inside+0.154
On+0.146
Along+0.146
Neg logit contributions
 supervision-0.170
 kindness-0.153
 apology-0.152
 permission-0.150
 disl-0.149
Token boy
Token ID2933
Feature activation+3.48e-01

Pos logit contributions
 supervision+0.095
 permission+0.085
 apology+0.084
 kindness+0.083
 choices+0.081
Neg logit contributions
lic-0.114
Inside-0.110
-0.109
On-0.107
Along-0.105
Token who
Token ID508
Feature activation+8.63e-01

Pos logit contributions
 supervision+0.015
 permission+0.013
 apology+0.013
 choices+0.013
 disl+0.013
Neg logit contributions
lic-0.009
-0.008
Inside-0.008
On-0.008
Along-0.008
Token loved
Token ID6151
Feature activation+8.20e-01

Pos logit contributions
 supervision+0.038
 permission+0.033
 apology+0.032
 choices+0.031
 attention+0.030
Neg logit contributions
lic-0.023
Inside-0.022
On-0.022
-0.021
Along-0.021
Token a
Token ID257
Feature activation-5.52e+00

Pos logit contributions
lic+0.135
+0.121
Inside+0.112
ink+0.112
Along+0.110
Neg logit contributions
 supervision-0.117
 kindness-0.107
 permission-0.104
 apology-0.100
 attention-0.099
Token big
Token ID1263
Feature activation-2.26e+00

Pos logit contributions
lic+0.179
+0.170
Inside+0.151
Along+0.137
ink+0.136
Neg logit contributions
 supervision-0.187
 disl-0.170
 kindness-0.166
 harm-0.163
 permission-0.162
Token forest
Token ID8222
Feature activation-8.56e+00

Pos logit contributions
lic+0.067
Inside+0.060
+0.059
On+0.057
Along+0.056
Neg logit contributions
 supervision-0.090
 permission-0.080
 kindness-0.079
 choices-0.078
 disl-0.078
Token,
Token ID11
Feature activation-3.31e+00

Pos logit contributions
lic+0.301
+0.299
Inside+0.275
Along+0.273
On+0.264
Neg logit contributions
 supervision-0.254
 choices-0.232
 harm-0.227
 disl-0.224
 permission-0.224
Token there
Token ID612
Feature activation-5.96e+00

Pos logit contributions
+0.095
lic+0.093
Inside+0.078
Along+0.077
ink+0.076
Neg logit contributions
 supervision-0.117
 kindness-0.109
 choices-0.109
 apology-0.107
 attention-0.106
Token lived
Token ID5615
Feature activation-9.30e+00

Pos logit contributions
lic+0.211
+0.196
Inside+0.188
Along+0.185
On+0.179
Neg logit contributions
 supervision-0.191
 permission-0.167
 kindness-0.166
 choices-0.164
 disl-0.163
Token a
Token ID257
Feature activation-4.77e+00

Pos logit contributions
+0.381
lic+0.358
ink+0.339
br+0.316
icycle+0.296
Neg logit contributions
 permission-0.218
 kindness-0.205
 supervision-0.181
 attention-0.176
 forgiveness-0.176
Token little
Token ID1310
Feature activation+1.83e-01

Pos logit contributions
lic+0.156
+0.153
Inside+0.145
Along+0.138
On+0.136
Neg logit contributions
 supervision-0.138
 kindness-0.137
 disl-0.130
 apology-0.129
 harm-0.125
Token deer
Token ID20096
Feature activation-3.24e+00

Pos logit contributions
 supervision+0.005
 kindness+0.004
 choices+0.004
 permission+0.004
 apology+0.004
Neg logit contributions
lic-0.007
Inside-0.007
-0.007
On-0.007
Along-0.007
Token.
Token ID13
Feature activation-5.44e-02

Pos logit contributions
lic+0.110
+0.104
Inside+0.104
Along+0.099
On+0.099
Neg logit contributions
 supervision-0.107
 permission-0.094
 kindness-0.090
 choices-0.090
 apology-0.090
Token The
Token ID383
Feature activation-1.16e+00

Pos logit contributions
lic+0.002
+0.002
Finally+0.002
P+0.002
ink+0.002
Neg logit contributions
 kindness-0.001
 supervision-0.001
 disl-0.001
 honesty-0.001
 apology-0.001
Token a
Token ID257
Feature activation-5.68e+00

Pos logit contributions
lic+0.145
+0.128
Inside+0.122
Along+0.120
ink+0.119
Neg logit contributions
 supervision-0.124
 kindness-0.114
 permission-0.110
 apology-0.106
 choices-0.105
Token small
Token ID1402
Feature activation-3.54e+00

Pos logit contributions
lic+0.183
+0.172
Inside+0.152
Along+0.140
ink+0.138
Neg logit contributions
 supervision-0.196
 disl-0.179
 kindness-0.174
 harm-0.171
 permission-0.171
Token town
Token ID3240
Feature activation-5.57e+00

Pos logit contributions
lic+0.127
+0.114
Inside+0.112
Along+0.102
On+0.102
Neg logit contributions
 supervision-0.114
 choices-0.103
 permission-0.103
 kindness-0.101
 harm-0.099
Token,
Token ID11
Feature activation-3.36e+00

Pos logit contributions
lic+0.168
+0.161
Inside+0.154
Along+0.150
On+0.149
Neg logit contributions
 supervision-0.210
 permission-0.185
 choices-0.183
 disl-0.179
 kindness-0.178
Token there
Token ID612
Feature activation-6.09e+00

Pos logit contributions
lic+0.095
+0.093
Inside+0.079
ink+0.078
Along+0.077
Neg logit contributions
 supervision-0.124
 choices-0.113
 kindness-0.113
 apology-0.112
 permission-0.111
Token lived
Token ID5615
Feature activation-9.30e+00

Pos logit contributions
lic+0.210
+0.191
Inside+0.189
Along+0.185
On+0.178
Neg logit contributions
 supervision-0.202
 permission-0.176
 kindness-0.174
 choices-0.172
 disl-0.171
Token a
Token ID257
Feature activation-4.81e+00

Pos logit contributions
+0.357
lic+0.354
ink+0.335
br+0.306
und+0.285
Neg logit contributions
 permission-0.238
 kindness-0.220
 supervision-0.214
 apology-0.201
 forgiveness-0.200
Token fat
Token ID3735
Feature activation+9.66e-01

Pos logit contributions
lic+0.158
+0.150
Inside+0.147
On+0.140
Along+0.140
Neg logit contributions
 supervision-0.153
 kindness-0.139
 apology-0.137
 disl-0.137
 permission-0.135
Token cat
Token ID3797
Feature activation-2.78e-02

Pos logit contributions
 supervision+0.023
 apology+0.020
 kindness+0.020
 permission+0.020
 choices+0.019
Neg logit contributions
lic-0.040
Inside-0.039
On-0.038
Along-0.037
-0.037
Token named
Token ID3706
Feature activation+3.89e+00

Pos logit contributions
lic+0.001
+0.001
Inside+0.001
Along+0.001
On+0.001
Neg logit contributions
 supervision-0.001
 permission-0.001
 disl-0.001
 choices-0.001
 apology-0.001
Token Tom
Token ID4186
Feature activation+1.90e+00

Pos logit contributions
 supervision+0.113
 permission+0.094
 kindness+0.090
 disl+0.089
 apology+0.088
Neg logit contributions
lic-0.150
-0.144
Inside-0.140
On-0.139
Along-0.139
Token upon
Token ID2402
Feature activation-6.72e+00

Pos logit contributions
lic+0.052
+0.047
Inside+0.046
On+0.043
Along+0.042
Neg logit contributions
 supervision-0.192
 permission-0.175
 apology-0.171
 choices-0.171
 disl-0.170
Token a
Token ID257
Feature activation-4.23e+00

Pos logit contributions
lic+0.208
+0.203
Inside+0.187
Along+0.182
ink+0.181
Neg logit contributions
 supervision-0.235
 permission-0.215
 kindness-0.210
 apology-0.206
 choices-0.205
Token time
Token ID640
Feature activation-3.53e+00

Pos logit contributions
lic+0.127
Inside+0.110
+0.108
ink+0.098
Along+0.098
Neg logit contributions
 supervision-0.154
 apology-0.138
 kindness-0.137
 disl-0.134
 harm-0.134
Token,
Token ID11
Feature activation-3.50e+00

Pos logit contributions
lic+0.131
Inside+0.122
Along+0.119
+0.118
On+0.117
Neg logit contributions
 supervision-0.103
 disl-0.086
 permission-0.086
 kindness-0.085
 apology-0.082
Token there
Token ID612
Feature activation-5.85e+00

Pos logit contributions
+0.116
lic+0.114
Inside+0.102
Along+0.098
ink+0.098
Neg logit contributions
 supervision-0.103
 permission-0.092
 attention-0.091
 kindness-0.090
 apology-0.089
Token lived
Token ID5615
Feature activation-9.30e+00

Pos logit contributions
lic+0.193
Inside+0.178
+0.175
Along+0.173
On+0.168
Neg logit contributions
 supervision-0.202
 permission-0.175
 choices-0.172
 kindness-0.172
 disl-0.171
Token an
Token ID281
Feature activation-1.41e+00

Pos logit contributions
lic+0.364
+0.352
ink+0.323
Inside+0.308
br+0.308
Neg logit contributions
 permission-0.236
 kindness-0.215
 supervision-0.213
 forgiveness-0.198
 apology-0.187
Token ancient
Token ID6156
Feature activation-5.87e-01

Pos logit contributions
lic+0.045
Inside+0.043
Along+0.043
On+0.043
+0.042
Neg logit contributions
 supervision-0.052
 permission-0.042
 choices-0.040
 disl-0.040
 attention-0.040
Token soldier
Token ID10686
Feature activation-1.22e+00

Pos logit contributions
lic+0.021
Inside+0.019
+0.019
On+0.019
Along+0.019
Neg logit contributions
 supervision-0.019
 permission-0.017
 apology-0.016
 kindness-0.016
 choices-0.016
Token.
Token ID13
Feature activation-3.69e-01

Pos logit contributions
lic+0.036
+0.034
Inside+0.034
On+0.033
Along+0.033
Neg logit contributions
 supervision-0.047
 permission-0.041
 apology-0.040
 choices-0.040
 kindness-0.039
Token He
Token ID679
Feature activation-4.34e+00

Pos logit contributions
lic+0.012
+0.011
Inside+0.010
Along+0.010
On+0.010
Neg logit contributions
 supervision-0.013
 permission-0.012
 kindness-0.012
 apology-0.012
 disl-0.012
Token upon
Token ID2402
Feature activation-6.69e+00

Pos logit contributions
lic+0.050
+0.044
Inside+0.044
On+0.041
Along+0.040
Neg logit contributions
 supervision-0.192
 permission-0.175
 apology-0.171
 choices-0.171
 disl-0.170
Token a
Token ID257
Feature activation-4.17e+00

Pos logit contributions
lic+0.196
+0.189
Inside+0.176
Along+0.171
On+0.167
Neg logit contributions
 supervision-0.252
 permission-0.228
 kindness-0.220
 apology-0.220
 choices-0.218
Token time
Token ID640
Feature activation-3.46e+00

Pos logit contributions
lic+0.125
Inside+0.110
+0.107
ink+0.099
Along+0.098
Neg logit contributions
 supervision-0.151
 apology-0.137
 kindness-0.135
 harm-0.131
 disl-0.131
Token,
Token ID11
Feature activation-3.45e+00

Pos logit contributions
lic+0.125
Inside+0.115
Along+0.113
+0.111
On+0.110
Neg logit contributions
 supervision-0.106
 permission-0.089
 disl-0.089
 kindness-0.088
 attention-0.085
Token there
Token ID612
Feature activation-5.81e+00

Pos logit contributions
lic+0.111
+0.109
Inside+0.097
ink+0.093
Along+0.093
Neg logit contributions
 supervision-0.108
 permission-0.097
 kindness-0.095
 attention-0.095
 apology-0.093
Token lived
Token ID5615
Feature activation-9.30e+00

Pos logit contributions
lic+0.188
Inside+0.173
+0.171
Along+0.169
On+0.164
Neg logit contributions
 supervision-0.206
 permission-0.178
 kindness-0.175
 choices-0.175
 apology-0.174
Token a
Token ID257
Feature activation-5.17e+00

Pos logit contributions
lic+0.372
+0.353
ink+0.326
br+0.310
Inside+0.308
Neg logit contributions
 permission-0.228
 kindness-0.210
 supervision-0.205
 forgiveness-0.195
 apology-0.178
Token young
Token ID1862
Feature activation+7.58e-01

Pos logit contributions
lic+0.162
+0.154
Inside+0.152
On+0.144
Along+0.144
Neg logit contributions
 supervision-0.175
 kindness-0.158
 apology-0.157
 permission-0.155
 disl-0.154
Token man
Token ID582
Feature activation-2.74e-01

Pos logit contributions
 supervision+0.025
 permission+0.021
 apology+0.020
 choices+0.020
 disl+0.020
Neg logit contributions
lic-0.027
-0.026
Inside-0.026
On-0.025
Along-0.025
Token.
Token ID13
Feature activation-1.93e-01

Pos logit contributions
lic+0.007
+0.006
Inside+0.006
On+0.006
Along+0.006
Neg logit contributions
 supervision-0.012
 permission-0.011
 apology-0.010
 choices-0.010
 disl-0.010
Token His
Token ID2399
Feature activation-3.13e+00

Pos logit contributions
lic+0.006
+0.006
Inside+0.005
Along+0.005
On+0.005
Neg logit contributions
 supervision-0.007
 disl-0.006
 kindness-0.006
 permission-0.006
 choices-0.006
Token upon
Token ID2402
Feature activation-6.79e+00

Pos logit contributions
lic+0.052
+0.046
Inside+0.046
On+0.043
Along+0.041
Neg logit contributions
 supervision-0.200
 permission-0.182
 apology-0.179
 choices-0.178
 disl-0.177
Token a
Token ID257
Feature activation-4.34e+00

Pos logit contributions
lic+0.207
+0.201
Inside+0.184
Along+0.179
ink+0.178
Neg logit contributions
 supervision-0.246
 permission-0.224
 kindness-0.217
 apology-0.215
 choices-0.214
Token time
Token ID640
Feature activation-3.58e+00

Pos logit contributions
lic+0.134
Inside+0.116
+0.115
ink+0.106
Along+0.104
Neg logit contributions
 supervision-0.154
 apology-0.138
 kindness-0.137
 harm-0.134
 permission-0.133
Token,
Token ID11
Feature activation-3.74e+00

Pos logit contributions
lic+0.120
Inside+0.111
+0.110
Along+0.108
On+0.107
Neg logit contributions
 supervision-0.122
 permission-0.105
 disl-0.102
 kindness-0.101
 apology-0.099
Token there
Token ID612
Feature activation-5.85e+00

Pos logit contributions
lic+0.114
+0.109
Inside+0.099
ink+0.096
Along+0.095
Neg logit contributions
 supervision-0.127
 permission-0.114
 kindness-0.111
 attention-0.109
 apology-0.108
Token lived
Token ID5615
Feature activation-9.30e+00

Pos logit contributions
lic+0.190
Inside+0.176
+0.173
Along+0.171
On+0.167
Neg logit contributions
 supervision-0.205
 permission-0.177
 kindness-0.175
 choices-0.174
 apology-0.174
Token little
Token ID1310
Feature activation+1.27e-01

Pos logit contributions
lic+0.372
+0.358
ink+0.330
br+0.315
Inside+0.310
Neg logit contributions
 permission-0.222
 kindness-0.202
 supervision-0.197
 forgiveness-0.186
 apology-0.167
Token Dave
Token ID9935
Feature activation-1.19e+00

Pos logit contributions
 supervision+0.004
 permission+0.003
 choices+0.003
 kindness+0.003
 disl+0.003
Neg logit contributions
lic-0.005
-0.004
Inside-0.004
On-0.004
Along-0.004
Token.
Token ID13
Feature activation-4.29e-01

Pos logit contributions
lic+0.035
+0.033
Inside+0.033
On+0.032
Along+0.032
Neg logit contributions
 supervision-0.047
 permission-0.041
 apology-0.040
 choices-0.040
 disl-0.039
Token He
Token ID679
Feature activation-4.63e+00

Pos logit contributions
lic+0.018
+0.016
Inside+0.016
ink+0.015
br+0.015
Neg logit contributions
 supervision-0.010
 choices-0.009
 kindness-0.009
 permission-0.009
 disl-0.009
Token was
Token ID373
Feature activation-6.10e+00

Pos logit contributions
lic+0.156
+0.153
Inside+0.147
Along+0.142
On+0.140
Neg logit contributions
 supervision-0.146
 disl-0.133
 permission-0.130
 kindness-0.128
 apology-0.127
Token a
Token ID257
Feature activation-5.80e+00

Pos logit contributions
lic+0.151
+0.137
ink+0.128
Inside+0.126
Along+0.125
Neg logit contributions
 supervision-0.129
 kindness-0.119
 permission-0.116
 apology-0.112
 attention-0.111
Token big
Token ID1263
Feature activation-2.46e+00

Pos logit contributions
lic+0.186
+0.175
Inside+0.152
Along+0.142
ink+0.139
Neg logit contributions
 supervision-0.204
 disl-0.185
 kindness-0.180
 harm-0.178
 permission-0.177
Token forest
Token ID8222
Feature activation-8.81e+00

Pos logit contributions
lic+0.074
Inside+0.065
+0.065
On+0.062
Along+0.061
Neg logit contributions
 supervision-0.097
 permission-0.086
 choices-0.085
 kindness-0.084
 disl-0.084
Token,
Token ID11
Feature activation-3.56e+00

Pos logit contributions
lic+0.309
+0.304
Inside+0.281
Along+0.280
On+0.269
Neg logit contributions
 supervision-0.266
 choices-0.243
 harm-0.238
 disl-0.234
 permission-0.233
Token there
Token ID612
Feature activation-6.27e+00

Pos logit contributions
+0.097
lic+0.096
Inside+0.082
Along+0.082
On+0.078
Neg logit contributions
 supervision-0.135
 choices-0.122
 kindness-0.122
 apology-0.121
 permission-0.121
Token lived
Token ID5615
Feature activation-9.30e+00

Pos logit contributions
lic+0.223
+0.206
Inside+0.200
Along+0.198
On+0.190
Neg logit contributions
 supervision-0.195
 kindness-0.171
 permission-0.170
 choices-0.169
 disl-0.169
Token a
Token ID257
Feature activation-4.72e+00

Pos logit contributions
+0.378
lic+0.355
ink+0.336
br+0.311
icycle+0.294
Neg logit contributions
 permission-0.216
 kindness-0.204
 attention-0.183
 choices-0.179
 supervision-0.179
Token lion
Token ID18744
Feature activation-1.94e+00

Pos logit contributions
lic+0.154
+0.152
Inside+0.144
Along+0.137
On+0.135
Neg logit contributions
 kindness-0.134
 supervision-0.134
 disl-0.128
 apology-0.127
 harm-0.124
Token.
Token ID13
Feature activation-3.94e-01

Pos logit contributions
lic+0.072
+0.070
Inside+0.069
Along+0.066
On+0.065
Neg logit contributions
 supervision-0.054
 harm-0.047
 kindness-0.047
 permission-0.046
 choices-0.046
Token The
Token ID383
Feature activation-1.27e+00

Pos logit contributions
lic+0.014
+0.014
Along+0.013
On+0.013
Inside+0.013
Neg logit contributions
 supervision-0.011
 kindness-0.010
 disl-0.010
 apology-0.010
 choices-0.009
Token lion
Token ID18744
Feature activation-1.70e+00

Pos logit contributions
+0.051
lic+0.050
Inside+0.048
On+0.047
Along+0.046
Neg logit contributions
 supervision-0.034
 kindness-0.030
 apology-0.030
 choices-0.029
 permission-0.029
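
The per-token records in this dump follow a fixed layout: a `Token` line, a `Token ID` line, a `Feature activation` line, then five positive and five negative logit contributions. In each value line the label and the signed number are fused with no separator. The sketch below shows one way such records might be read back into structured form. It is a minimal illustration inferred from the layout visible here, not the tool that produced the dump; the names `TokenRecord` and `parse_records` are hypothetical, and it assumes no contribution line itself begins with the word `Token`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Patterns inferred from the dump: labels and signed values are fused together.
ACTIVATION_RE = re.compile(r"^Feature activation([+-]\d+\.\d+e[+-]\d+)$")
TOKEN_ID_RE = re.compile(r"^Token ID(\d+)$")
CONTRIB_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")  # token text (may be empty), then signed value


@dataclass
class TokenRecord:  # hypothetical container, not part of the original dump
    token: str
    token_id: Optional[int] = None
    activation: Optional[float] = None
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


def parse_records(text: str) -> List[TokenRecord]:
    """Parse fused 'Token / Token ID / Feature activation / contributions' records."""
    records: List[TokenRecord] = []
    current: Optional[TokenRecord] = None
    bucket: Optional[List[Tuple[str, float]]] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator line inside a record
        m = TOKEN_ID_RE.match(line)
        if m and current is not None and current.token_id is None:
            current.token_id = int(m.group(1))
            continue
        if line.startswith("Token"):
            # Everything after 'Token' is the token text, including a leading space.
            current = TokenRecord(token=line[len("Token"):])
            records.append(current)
            bucket = None
            continue
        if current is None:
            continue  # skip a partial record at the start of a chunk
        m = ACTIVATION_RE.match(line)
        if m:
            current.activation = float(m.group(1))
            continue
        if line == "Pos logit contributions":
            bucket = current.pos
            continue
        if line == "Neg logit contributions":
            bucket = current.neg
            continue
        m = CONTRIB_RE.match(line)
        if m and bucket is not None:
            bucket.append((m.group(1), float(m.group(2))))
    return records


sample = """Token He
Token ID679
Feature activation-4.63e+00

Pos logit contributions
lic+0.018
+0.016
Inside+0.016
ink+0.015
br+0.015
Neg logit contributions
 supervision-0.010
 choices-0.009
 kindness-0.009
 permission-0.009
 disl-0.009
Token was
Token ID373
Feature activation-6.10e+00
"""

records = parse_records(sample)
for r in records:
    print(repr(r.token), r.token_id, r.activation, len(r.pos), len(r.neg))
```

The leading space of a token such as ` He` is kept on purpose, since the dump appears to use it to distinguish word-initial tokens from continuation pieces like `ily` or `aring`. A contribution line with no visible label (for example `+0.016`) is stored with an empty token string rather than dropped.
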
Token a
Token ID257
Feature activation-5.69e+00

Pos logit contributions
lic+0.140
+0.124
Inside+0.117
ink+0.116
Along+0.114
Neg logit contributions
 supervision-0.129
 kindness-0.116
 permission-0.116
 apology-0.109
 attention-0.109
Token big
Token ID1263
Feature activation-2.36e+00

Pos logit contributions
lic+0.180
+0.169
Inside+0.148
Along+0.135
ink+0.134
Neg logit contributions
 supervision-0.205
 disl-0.181
 permission-0.178
 kindness-0.178
 harm-0.176
Token tree
Token ID5509
Feature activation-6.63e+00

Pos logit contributions
lic+0.067
Inside+0.060
+0.060
On+0.057
Along+0.056
Neg logit contributions
 supervision-0.097
 permission-0.086
 choices-0.084
 kindness-0.083
 disl-0.083
Token,
Token ID11
Feature activation-3.46e+00

Pos logit contributions
lic+0.246
+0.223
Inside+0.215
Along+0.210
On+0.199
Neg logit contributions
 supervision-0.202
 harm-0.172
 permission-0.170
grand-0.166
 choices-0.166
Token there
Token ID612
Feature activation-6.03e+00

Pos logit contributions
lic+0.094
+0.092
Inside+0.078
Along+0.077
On+0.075
Neg logit contributions
 supervision-0.136
 permission-0.122
 choices-0.122
 kindness-0.121
 apology-0.120
Token lived
Token ID5615
Feature activation-9.29e+00

Pos logit contributions
lic+0.211
+0.194
Inside+0.189
Along+0.185
On+0.180
Neg logit contributions
 supervision-0.198
 permission-0.172
 kindness-0.170
 disl-0.167
 choices-0.167
Token a
Token ID257
Feature activation-4.82e+00

Pos logit contributions
+0.377
lic+0.373
ink+0.339
br+0.328
omed+0.303
Neg logit contributions
 permission-0.225
 kindness-0.196
 supervision-0.186
 apology-0.174
 forgiveness-0.172
Token mighty
Token ID18680
Feature activation+4.13e-01

Pos logit contributions
lic+0.158
+0.152
Inside+0.148
On+0.140
Along+0.140
Neg logit contributions
 supervision-0.150
 kindness-0.138
 apology-0.135
 disl-0.134
 permission-0.132
Token bird
Token ID6512
Feature activation-1.71e+00

Pos logit contributions
 supervision+0.012
 kindness+0.011
 apology+0.011
 permission+0.011
 choices+0.011
Neg logit contributions
lic-0.015
Inside-0.014
-0.014
On-0.014
Along-0.013
Token.
Token ID13
Feature activation-2.26e-01

Pos logit contributions
lic+0.054
+0.051
Inside+0.051
Along+0.048
On+0.048
Neg logit contributions
 supervision-0.062
 permission-0.054
 choices-0.052
 disl-0.052
 harm-0.052
Token The
Token ID383
Feature activation-1.09e+00

Pos logit contributions
lic+0.009
+0.008
Along+0.008
P+0.008
Inside+0.008
Neg logit contributions
 supervision-0.006
 disl-0.005
 kindness-0.005
 apology-0.005
 permission-0.005
Token upon
Token ID2402
Feature activation-6.76e+00

Pos logit contributions
lic+0.047
Inside+0.041
+0.041
On+0.040
Along+0.037
Neg logit contributions
 supervision-0.200
 permission-0.183
 apology-0.179
 choices-0.179
 disl-0.179
Token a
Token ID257
Feature activation-4.27e+00

Pos logit contributions
lic+0.207
+0.201
Inside+0.183
Along+0.179
ink+0.178
Neg logit contributions
 supervision-0.243
 permission-0.221
 kindness-0.215
 apology-0.212
 choices-0.211
Token time
Token ID640
Feature activation-3.51e+00

Pos logit contributions
lic+0.130
Inside+0.114
+0.113
ink+0.103
 clearing+0.103
Neg logit contributions
 supervision-0.152
 apology-0.136
 kindness-0.134
 permission-0.131
 harm-0.130
Token,
Token ID11
Feature activation-3.71e+00

Pos logit contributions
lic+0.126
Inside+0.114
+0.114
Along+0.112
On+0.110
Neg logit contributions
 supervision-0.111
 permission-0.094
 disl-0.092
 kindness-0.090
 attention-0.088
Token there
Token ID612
Feature activation-5.89e+00

Pos logit contributions
lic+0.117
+0.110
Inside+0.100
Along+0.096
ink+0.095
Neg logit contributions
 supervision-0.124
 permission-0.112
 kindness-0.108
 attention-0.106
 apology-0.105
Token lived
Token ID5615
Feature activation-9.29e+00

Pos logit contributions
lic+0.195
Inside+0.179
+0.177
Along+0.176
On+0.170
Neg logit contributions
 supervision-0.201
 permission-0.173
 kindness-0.172
 choices-0.172
 apology-0.170
Token a
Token ID257
Feature activation-5.19e+00

Pos logit contributions
lic+0.368
+0.348
ink+0.320
br+0.312
Inside+0.304
Neg logit contributions
 permission-0.229
 kindness-0.205
 supervision-0.202
 forgiveness-0.192
 apology-0.172
Token little
Token ID1310
Feature activation+6.61e-02

Pos logit contributions
lic+0.162
+0.154
Inside+0.152
On+0.144
Along+0.143
Neg logit contributions
 supervision-0.177
 kindness-0.159
 apology-0.157
 permission-0.156
 disl-0.154
Token girl
Token ID2576
Feature activation-5.49e-01

Pos logit contributions
 supervision+0.002
 permission+0.002
 choices+0.001
 apology+0.001
 kindness+0.001
Neg logit contributions
lic-0.003
Inside-0.003
-0.003
On-0.002
Along-0.002
Token named
Token ID3706
Feature activation+3.69e+00

Pos logit contributions
lic+0.013
+0.012
Inside+0.012
On+0.012
Along+0.011
Neg logit contributions
 supervision-0.025
 permission-0.022
 apology-0.022
 choices-0.022
 disl-0.021
Token Grace
Token ID16156
Feature activation-1.48e-01

Pos logit contributions
 supervision+0.114
 permission+0.097
 apology+0.092
 choices+0.092
 disl+0.090
Neg logit contributions
lic-0.139
-0.133
Inside-0.132
On-0.129
Along-0.128
Token a
Token ID257
Feature activation-5.37e+00

Pos logit contributions
lic+0.131
+0.116
ink+0.110
Inside+0.109
Along+0.106
Neg logit contributions
 supervision-0.110
 kindness-0.102
 permission-0.098
 apology-0.094
 attention-0.093
Token big
Token ID1263
Feature activation-2.12e+00

Pos logit contributions
lic+0.177
+0.166
Inside+0.149
Along+0.136
ink+0.134
Neg logit contributions
 supervision-0.181
 disl-0.163
 kindness-0.159
 harm-0.156
 permission-0.156
Token forest
Token ID8222
Feature activation-8.43e+00

Pos logit contributions
lic+0.064
Inside+0.058
+0.057
On+0.055
Along+0.054
Neg logit contributions
 supervision-0.083
 permission-0.074
 choices-0.072
 kindness-0.072
 apology-0.071
Token,
Token ID11
Feature activation-3.22e+00

Pos logit contributions
lic+0.291
+0.284
Inside+0.265
Along+0.262
On+0.254
Neg logit contributions
 supervision-0.264
 choices-0.237
 harm-0.232
 permission-0.231
 disl-0.229
Token there
Token ID612
Feature activation-5.91e+00

Pos logit contributions
+0.091
lic+0.090
Inside+0.075
Along+0.075
ink+0.073
Neg logit contributions
 supervision-0.116
 kindness-0.106
 choices-0.106
 attention-0.105
 apology-0.105
Token lived
Token ID5615
Feature activation-9.28e+00

Pos logit contributions
lic+0.210
+0.193
Inside+0.188
Along+0.184
On+0.178
Neg logit contributions
 supervision-0.189
 permission-0.165
 kindness-0.164
 disl-0.161
 choices-0.161
Token a
Token ID257
Feature activation-4.70e+00

Pos logit contributions
+0.380
lic+0.364
ink+0.342
br+0.321
icycle+0.300
Neg logit contributions
 permission-0.214
 kindness-0.199
 supervision-0.175
 forgiveness-0.173
 attention-0.171
Token gentle
Token ID10296
Feature activation+5.57e-01

Pos logit contributions
lic+0.154
+0.151
Inside+0.143
Along+0.136
On+0.135
Neg logit contributions
 supervision-0.136
 kindness-0.134
 disl-0.128
 apology-0.126
 harm-0.123
Token deer
Token ID20096
Feature activation-2.56e+00

Pos logit contributions
 supervision+0.013
 kindness+0.013
 apology+0.013
 permission+0.012
 choices+0.012
Neg logit contributions
lic-0.022
Inside-0.021
-0.021
On-0.021
Along-0.020
Token.
Token ID13
Feature activation+1.95e-02

Pos logit contributions
lic+0.088
+0.083
Inside+0.083
On+0.080
Along+0.079
Neg logit contributions
 supervision-0.085
 permission-0.074
 apology-0.071
 choices-0.071
 kindness-0.071
Token The
Token ID383
Feature activation-9.85e-01

Pos logit contributions
 supervision+0.000
 kindness+0.000
 disl+0.000
 apology+0.000
 honesty+0.000
Neg logit contributions
lic-0.001
Finally-0.001
-0.001
ink-0.001
P-0.001
Token smiled
Token ID13541
Feature activation-2.29e+00

Pos logit contributions
 supervision+0.014
 permission+0.012
 apology+0.012
 kindness+0.011
 disl+0.011
Neg logit contributions
lic-0.018
-0.017
Inside-0.017
On-0.017
Along-0.017
Token and
Token ID290
Feature activation+1.44e+00

Pos logit contributions
lic+0.067
+0.063
Inside+0.063
On+0.060
Along+0.060
Neg logit contributions
 supervision-0.089
 permission-0.079
 apology-0.076
 choices-0.076
 kindness-0.075
Token said
Token ID531
Feature activation+4.06e-01

Pos logit contributions
 supervision+0.048
 permission+0.040
 choices+0.039
 apology+0.039
 disl+0.038
Neg logit contributions
lic-0.052
Inside-0.049
-0.049
On-0.048
Along-0.047
Token,
Token ID11
Feature activation-1.01e+00

Pos logit contributions
 supervision+0.019
 permission+0.017
 harm+0.017
 choices+0.017
 disl+0.017
Neg logit contributions
lic-0.009
-0.008
Inside-0.007
On-0.007
Along-0.006
Token "
Token ID366
Feature activation+2.89e+00

Pos logit contributions
lic+0.028
+0.027
Inside+0.026
Along+0.026
On+0.026
Neg logit contributions
 supervision-0.039
 permission-0.035
 apology-0.034
 kindness-0.034
 choices-0.033
TokenOf
Token ID5189
Feature activation+9.90e+00

Pos logit contributions
 supervision+0.099
 permission+0.085
 apology+0.082
 choices+0.082
 disl+0.081
Neg logit contributions
lic-0.099
-0.095
Inside-0.093
On-0.091
Along-0.090
Token course
Token ID1781
Feature activation+4.06e+00

Pos logit contributions
 supervision+0.462
 permission+0.403
 disl+0.401
 choices+0.391
 apology+0.386
Neg logit contributions
lic-0.212
-0.210
Inside-0.196
On-0.186
A-0.176
Token,
Token ID11
Feature activation+2.45e+00

Pos logit contributions
 supervision+0.189
 permission+0.168
 disl+0.166
 apology+0.164
 choices+0.161
Neg logit contributions
lic-0.088
Inside-0.082
-0.082
On-0.074
Along-0.073
Token sweet
Token ID6029
Feature activation+5.32e+00

Pos logit contributions
 supervision+0.084
 permission+0.072
 disl+0.069
 apology+0.068
 choices+0.068
Neg logit contributions
lic-0.083
Inside-0.081
-0.078
On-0.077
Along-0.075
Tokenie
Token ID494
Feature activation+2.49e+00

Pos logit contributions
 supervision+0.140
 permission+0.114
 apology+0.108
 choices+0.107
 disl+0.105
Neg logit contributions
lic-0.227
-0.217
Inside-0.216
On-0.211
Along-0.210
Token.
Token ID13
Feature activation+3.50e+00

Pos logit contributions
 supervision+0.109
 disl+0.096
 apology+0.096
 permission+0.094
 kindness+0.089
Neg logit contributions
lic-0.058
-0.054
Inside-0.052
ink-0.051
Along-0.048
TokenTom
Token ID13787
Feature activation+3.94e+00

Pos logit contributions
 supervision+0.123
 permission+0.107
 apology+0.105
 disl+0.103
 kindness+0.102
Neg logit contributions
lic-0.106
-0.099
Inside-0.095
Along-0.094
On-0.092
Token,
Token ID11
Feature activation+8.00e-01

Pos logit contributions
 supervision+0.163
 permission+0.144
 apology+0.140
 choices+0.139
 disl+0.138
Neg logit contributions
lic-0.107
-0.101
Inside-0.100
On-0.097
Along-0.096
Token stop
Token ID2245
Feature activation+1.43e+00

Pos logit contributions
 supervision+0.026
 choices+0.022
 permission+0.022
 disl+0.021
 apology+0.021
Neg logit contributions
lic-0.027
Inside-0.027
-0.026
On-0.025
Along-0.025
Token sc
Token ID629
Feature activation+3.10e+00

Pos logit contributions
 supervision+0.057
 apology+0.052
 permission+0.051
 choices+0.049
 kindness+0.049
Neg logit contributions
lic-0.037
-0.035
Inside-0.034
On-0.032
ink-0.032
Tokenaring
Token ID1723
Feature activation+5.26e+00

Pos logit contributions
 supervision+0.093
 disl+0.081
 choices+0.080
 permission+0.079
 kindness+0.077
Neg logit contributions
lic-0.117
Inside-0.109
-0.108
On-0.105
Along-0.105
Token Miss
Token ID4544
Feature activation+1.08e+01

Pos logit contributions
 supervision+0.213
 permission+0.187
 disl+0.182
 apology+0.181
 choices+0.179
Neg logit contributions
lic-0.145
Inside-0.138
-0.137
On-0.133
Along-0.129
Token Bunny
Token ID28683
Feature activation+4.10e+00

Pos logit contributions
 supervision+0.385
 permission+0.334
 apology+0.322
 choices+0.320
 disl+0.315
Neg logit contributions
lic-0.356
-0.339
Inside-0.339
On-0.329
Along-0.326
Token.
Token ID13
Feature activation+3.32e+00

Pos logit contributions
 supervision+0.158
 disl+0.141
 permission+0.139
 apology+0.138
 kindness+0.131
Neg logit contributions
lic-0.115
-0.107
Inside-0.107
On-0.106
Along-0.104
Token She
Token ID1375
Feature activation-4.16e-02

Pos logit contributions
 supervision+0.131
 permission+0.115
 apology+0.111
 choices+0.111
 disl+0.110
Neg logit contributions
lic-0.096
-0.091
Inside-0.090
On-0.088
Along-0.087
Token is
Token ID318
Feature activation-2.67e+00

Pos logit contributions
lic+0.001
+0.001
Inside+0.001
On+0.001
Along+0.001
Neg logit contributions
 supervision-0.001
 permission-0.001
 apology-0.001
 choices-0.001
 disl-0.001
Token very
Token ID845
Feature activation+9.92e-01

Pos logit contributions
lic+0.088
Inside+0.083
+0.083
On+0.081
Along+0.080
Neg logit contributions
 supervision-0.095
 permission-0.082
 apology-0.080
 choices-0.079
 disl-0.078
Token.
Token ID13
Feature activation+3.15e+00

Pos logit contributions
 supervision+0.273
 disl+0.257
 harm+0.227
 apology+0.223
 permission+0.216
Neg logit contributions
lic-0.152
Inside-0.144
-0.142
On-0.141
Along-0.136
Token You
Token ID921
Feature activation+4.56e+00

Pos logit contributions
 supervision+0.113
 permission+0.099
 apology+0.095
 choices+0.094
 attention+0.092
Neg logit contributions
lic-0.101
-0.099
Inside-0.097
On-0.093
Along-0.092
Token learned
Token ID4499
Feature activation+4.21e+00

Pos logit contributions
 supervision+0.164
 permission+0.138
 apology+0.135
 choices+0.135
 disl+0.132
Neg logit contributions
lic-0.152
Inside-0.145
-0.143
On-0.138
Along-0.138
Token a
Token ID257
Feature activation+1.62e+00

Pos logit contributions
 supervision+0.163
 permission+0.136
 apology+0.130
 choices+0.129
 disl+0.129
Neg logit contributions
lic-0.125
Inside-0.124
On-0.121
Along-0.114
-0.113
Token valuable
Token ID8119
Feature activation+8.07e+00

Pos logit contributions
 supervision+0.046
 permission+0.038
 apology+0.036
 choices+0.036
 disl+0.035
Neg logit contributions
lic-0.065
-0.063
Inside-0.062
On-0.061
Along-0.061
Token lesson
Token ID11483
Feature activation+1.04e+01

Pos logit contributions
 supervision+0.226
 permission+0.152
 disl+0.152
 choices+0.138
 kindness+0.126
Neg logit contributions
lic-0.343
Inside-0.332
On-0.331
Along-0.328
-0.312
Token today
Token ID1909
Feature activation+2.53e+00

Pos logit contributions
 supervision+0.410
 disl+0.346
 permission+0.343
 apology+0.343
 choices+0.338
Neg logit contributions
Inside-0.281
lic-0.280
On-0.272
Along-0.257
-0.254
Token."
Token ID526
Feature activation+1.61e+00

Pos logit contributions
 supervision+0.090
 permission+0.078
 apology+0.075
 disl+0.074
 choices+0.074
Neg logit contributions
lic-0.082
-0.079
Inside-0.079
On-0.076
Along-0.075
Token<|endoftext|>
Token ID50256
Feature activation+1.75e-02

Pos logit contributions
 supervision+0.076
 permission+0.068
 apology+0.067
 disl+0.067
 choices+0.067
Neg logit contributions
lic-0.034
Inside-0.030
-0.030
Along-0.029
On-0.029
TokenL
Token ID43
Feature activation+3.25e+00

Pos logit contributions
 supervision+0.001
 harm+0.001
 kindness+0.001
 choices+0.001
 apology+0.001
Neg logit contributions
lic-0.000
Inside-0.000
Along-0.000
Finally-0.000
-0.000
Tokenily
Token ID813
Feature activation+1.18e+00

Pos logit contributions
 supervision+0.113
 permission+0.095
 disl+0.093
 apology+0.093
 choices+0.093
Neg logit contributions
lic-0.111
Inside-0.106
-0.106
On-0.103
Along-0.102
Token "
Token ID366
Feature activation+2.36e+00

Pos logit contributions
lic+0.036
Inside+0.034
+0.034
Along+0.034
On+0.033
Neg logit contributions
 supervision-0.048
 permission-0.042
 apology-0.041
 disl-0.041
 kindness-0.041
TokenThat
Token ID2504
Feature activation+2.09e+00

Pos logit contributions
 supervision+0.089
 permission+0.078
 apology+0.076
 disl+0.076
 kindness+0.075
Neg logit contributions
lic-0.071
-0.068
Inside-0.066
Along-0.063
On-0.063
Token was
Token ID373
Feature activation-2.03e+00

Pos logit contributions
 supervision+0.076
 permission+0.064
 disl+0.060
 apology+0.060
 choices+0.058
Neg logit contributions
lic-0.068
Inside-0.066
-0.065
On-0.064
Along-0.063
Token a
Token ID257
Feature activation-1.86e+00

Pos logit contributions
lic+0.059
+0.057
Inside+0.055
On+0.054
Along+0.053
Neg logit contributions
 supervision-0.080
 permission-0.071
 apology-0.068
 choices-0.068
 kindness-0.068
Token good
Token ID922
Feature activation+5.13e+00

Pos logit contributions
lic+0.062
Inside+0.057
+0.057
On+0.056
Along+0.055
Neg logit contributions
 supervision-0.065
 permission-0.057
 apology-0.056
 kindness-0.056
 disl-0.055
Token choice
Token ID3572
Feature activation+1.14e+01

Pos logit contributions
 supervision+0.194
 permission+0.155
 disl+0.152
 choices+0.143
 privacy+0.141
Neg logit contributions
-0.156
Inside-0.153
lic-0.153
On-0.152
Along-0.151
Token.
Token ID13
Feature activation+2.97e+00

Pos logit contributions
 supervision+0.501
 disl+0.445
 apology+0.438
 permission+0.437
 choices+0.429
Neg logit contributions
lic-0.262
Inside-0.244
-0.243
On-0.224
Along-0.217
Token Let
Token ID3914
Feature activation+5.37e+00

Pos logit contributions
 supervision+0.099
 permission+0.084
 apology+0.083
 disl+0.081
 choices+0.081
Neg logit contributions
lic-0.104
-0.098
Inside-0.097
On-0.096
Along-0.095
Token's
Token ID338
Feature activation+1.38e+00

Pos logit contributions
 supervision+0.196
 disl+0.167
 permission+0.163
 apology+0.162
 choices+0.158
Neg logit contributions
lic-0.168
Inside-0.164
-0.161
On-0.159
Along-0.156
Token put
Token ID1234
Feature activation-1.50e-01

Pos logit contributions
 supervision+0.043
 permission+0.036
 apology+0.035
 choices+0.035
 disl+0.034
Neg logit contributions
lic-0.051
-0.049
Inside-0.049
On-0.048
Along-0.047
Token the
Token ID262
Feature activation+2.66e+00

Pos logit contributions
lic+0.005
+0.005
Inside+0.005
On+0.005
Along+0.005
Neg logit contributions
 supervision-0.005
 permission-0.004
 apology-0.004
 disl-0.004
 choices-0.004
Token was
Token ID373
Feature activation-3.97e+00

Pos logit contributions
 supervision+0.035
 permission+0.031
 apology+0.030
 choices+0.030
 disl+0.029
Neg logit contributions
lic-0.029
-0.027
Inside-0.027
On-0.027
Along-0.026
Token very
Token ID845
Feature activation+6.56e-01

Pos logit contributions
lic+0.128
+0.123
Inside+0.122
On+0.118
Along+0.117
Neg logit contributions
 supervision-0.143
 permission-0.124
 choices-0.120
 apology-0.120
 disl-0.118
Token happy
Token ID3772
Feature activation+3.71e+00

Pos logit contributions
 supervision+0.020
 permission+0.017
 apology+0.016
 choices+0.016
 disl+0.016
Neg logit contributions
lic-0.025
Inside-0.024
-0.024
On-0.024
Along-0.023
Token to
Token ID284
Feature activation+5.20e+00

Pos logit contributions
 supervision+0.158
 apology+0.137
 permission+0.136
 disl+0.132
 choices+0.130
Neg logit contributions
lic-0.097
-0.094
Inside-0.093
On-0.092
Along-0.088
Token meet
Token ID1826
Feature activation+5.13e-01

Pos logit contributions
 supervision+0.158
 permission+0.129
 choices+0.126
 apology+0.124
 disl+0.121
Neg logit contributions
lic-0.200
-0.195
Inside-0.193
On-0.187
Along-0.184
Token Mrs
Token ID9074
Feature activation+1.03e+01

Pos logit contributions
 supervision+0.017
 permission+0.014
 apology+0.013
 choices+0.013
 disl+0.013
Neg logit contributions
lic-0.019
-0.018
Inside-0.018
On-0.018
Along-0.017
Token.
Token ID13
Feature activation+3.17e+00

Pos logit contributions
 supervision+0.373
 permission+0.320
 apology+0.311
 disl+0.307
 choices+0.304
Neg logit contributions
lic-0.334
-0.320
Inside-0.314
On-0.308
Along-0.302
Token Smith
Token ID4176
Feature activation+2.46e+00

Pos logit contributions
 supervision+0.088
 permission+0.074
 disl+0.070
 choices+0.069
 kindness+0.069
Neg logit contributions
lic-0.129
Inside-0.119
On-0.116
Along-0.116
-0.114
Token and
Token ID290
Feature activation+1.33e+00

Pos logit contributions
 supervision+0.099
 disl+0.085
 apology+0.085
 permission+0.085
 choices+0.082
Neg logit contributions
lic-0.068
On-0.065
-0.064
Inside-0.063
Along-0.063
Token he
Token ID339
Feature activation-3.11e+00

Pos logit contributions
 supervision+0.053
 permission+0.046
 apology+0.045
 attention+0.044
 choices+0.043
Neg logit contributions
lic-0.036
Inside-0.036
On-0.035
-0.033
Along-0.032
Token loved
Token ID6151
Feature activation+1.77e+00

Pos logit contributions
lic+0.103
Inside+0.097
+0.097
On+0.095
Along+0.094
Neg logit contributions
 supervision-0.113
 permission-0.097
 choices-0.093
 apology-0.093
 attention-0.091
Token happily
Token ID18177
Feature activation-4.30e-02

Pos logit contributions
lic+0.026
+0.025
Inside+0.024
On+0.023
Along+0.023
Neg logit contributions
 supervision-0.028
 permission-0.024
 choices-0.023
 apology-0.023
 kindness-0.023
Token.
Token ID13
Feature activation+7.05e-01

Pos logit contributions
lic+0.001
+0.001
Inside+0.001
On+0.001
Along+0.001
Neg logit contributions
 supervision-0.002
 apology-0.001
 disl-0.001
 permission-0.001
 choices-0.001
Token He
Token ID679
Feature activation-3.00e+00

Pos logit contributions
 supervision+0.027
 permission+0.024
 apology+0.023
 choices+0.023
 kindness+0.023
Neg logit contributions
lic-0.021
-0.019
Inside-0.018
ink-0.018
Along-0.018
Token felt
Token ID2936
Feature activation-2.49e+00

Pos logit contributions
lic+0.085
+0.080
Inside+0.080
On+0.079
Along+0.077
Neg logit contributions
 supervision-0.122
 permission-0.106
 choices-0.103
 apology-0.103
 attention-0.102
Token extra
Token ID3131
Feature activation+5.61e-01

Pos logit contributions
lic+0.057
Inside+0.056
On+0.055
+0.052
Along+0.051
Neg logit contributions
 supervision-0.112
 permission-0.095
 apology-0.094
 disl-0.091
 choices-0.091
Token healthy
Token ID5448
Feature activation+5.10e+00

Pos logit contributions
 supervision+0.019
 permission+0.016
 apology+0.016
 choices+0.016
 disl+0.015
Neg logit contributions
lic-0.019
-0.019
Inside-0.019
On-0.018
Along-0.018
Token and
Token ID290
Feature activation+1.84e+00

Pos logit contributions
 supervision+0.233
 apology+0.215
 disl+0.210
 permission+0.205
 choices+0.196
Neg logit contributions
-0.105
lic-0.101
Inside-0.098
On-0.097
Along-0.094
Token ready
Token ID3492
Feature activation+5.22e-01

Pos logit contributions
 supervision+0.073
 permission+0.065
 apology+0.063
 disl+0.059
 harm+0.059
Neg logit contributions
lic-0.051
Inside-0.050
On-0.049
-0.046
Along-0.045
Token to
Token ID284
Feature activation+3.38e+00

Pos logit contributions
 supervision+0.018
 permission+0.015
 apology+0.015
 disl+0.015
 choices+0.014
Neg logit contributions
lic-0.018
Inside-0.017
-0.017
On-0.017
Along-0.016
Token play
Token ID711
Feature activation-8.58e-01

Pos logit contributions
 supervision+0.098
 permission+0.083
 apology+0.079
 choices+0.078
 kindness+0.078
Neg logit contributions
lic-0.135
-0.127
Inside-0.126
On-0.123
Along-0.123
Token!
Token ID0
Feature activation+9.60e-01

Pos logit contributions
lic+0.028
+0.027
Inside+0.027
On+0.026
Along+0.026
Neg logit contributions
 supervision-0.031
 permission-0.027
 apology-0.026
 choices-0.026
 disl-0.025
Token to
Token ID284
Feature activation+3.74e+00

Pos logit contributions
lic+0.017
On+0.017
Inside+0.017
+0.016
Along+0.016
Neg logit contributions
 supervision-0.020
 apology-0.017
 choices-0.017
 permission-0.017
 disl-0.016
Token put
Token ID1234
Feature activation+4.45e-01

Pos logit contributions
 supervision+0.114
 permission+0.093
 choices+0.091
 disl+0.088
 apology+0.086
Neg logit contributions
lic-0.139
Inside-0.139
-0.136
On-0.136
Along-0.131
Token the
Token ID262
Feature activation+3.60e+00

Pos logit contributions
 supervision+0.015
 permission+0.012
 apology+0.012
 disl+0.012
 choices+0.012
Neg logit contributions
lic-0.016
-0.015
Inside-0.015
On-0.015
Along-0.014
Token v
Token ID410
Feature activation+2.48e+00

Pos logit contributions
 supervision+0.124
 permission+0.105
 disl+0.103
 apology+0.102
 choices+0.101
Neg logit contributions
lic-0.121
-0.117
Inside-0.117
On-0.113
Along-0.112
Tokenase
Token ID589
Feature activation+2.45e-01

Pos logit contributions
 supervision+0.098
 permission+0.088
 apology+0.088
 choices+0.086
 disl+0.083
Neg logit contributions
Inside-0.062
lic-0.062
On-0.061
Along-0.060
 clearing-0.058
Token back
Token ID736
Feature activation+3.73e+00

Pos logit contributions
 supervision+0.009
 disl+0.008
 apology+0.007
 permission+0.007
 choices+0.007
Neg logit contributions
lic-0.008
Inside-0.008
On-0.007
-0.007
Along-0.007
Token on
Token ID319
Feature activation+1.02e+00

Pos logit contributions
 supervision+0.154
 permission+0.136
 apology+0.132
 choices+0.131
 disl+0.130
Neg logit contributions
lic-0.101
-0.095
Inside-0.094
On-0.091
Along-0.090
Token the
Token ID262
Feature activation+2.86e+00

Pos logit contributions
 supervision+0.031
 permission+0.026
 apology+0.025
 choices+0.025
 disl+0.025
Neg logit contributions
lic-0.039
-0.037
Inside-0.037
On-0.036
Along-0.036
Token table
Token ID3084
Feature activation-3.21e+00

Pos logit contributions
 supervision+0.122
 disl+0.109
 permission+0.107
 choices+0.102
 apology+0.101
Neg logit contributions
Inside-0.069
-0.067
On-0.065
lic-0.065
Along-0.064
Token.
Token ID13
Feature activation+1.18e+00

Pos logit contributions
lic+0.099
+0.095
Inside+0.094
On+0.092
Along+0.090
Neg logit contributions
 supervision-0.117
 permission-0.102
 disl-0.101
 apology-0.100
 choices-0.098
Token But
Token ID887
Feature activation+2.92e+00

Pos logit contributions
 supervision+0.041
 permission+0.035
 apology+0.034
 disl+0.034
 kindness+0.033
Neg logit contributions
lic-0.041
Along-0.036
Inside-0.036
 clearing-0.035
ink-0.035
Token bored
Token ID17533
Feature activation+4.03e+00

Pos logit contributions
lic+0.048
Inside+0.045
On+0.045
+0.044
Along+0.043
Neg logit contributions
 supervision-0.063
 disl-0.056
 apology-0.055
 permission-0.051
 privacy-0.051
Token.
Token ID13
Feature activation+1.14e+00

Pos logit contributions
 supervision+0.155
 apology+0.136
 disl+0.134
 permission+0.134
 choices+0.126
Neg logit contributions
lic-0.117
Inside-0.112
On-0.112
-0.112
Along-0.106
Token
Token ID198
Feature activation-4.54e+00

Pos logit contributions
 supervision+0.037
 permission+0.034
 disl+0.033
 kindness+0.032
 apology+0.031
Neg logit contributions
lic-0.036
On-0.032
Inside-0.032
br-0.032
ink-0.032
Token
Token ID198
Feature activation-4.79e+00

Pos logit contributions
+0.118
lic+0.115
Along+0.100
 clearing+0.099
ink+0.095
Neg logit contributions
 supervision-0.164
 apology-0.153
 disl-0.151
 permission-0.149
 kindness-0.148
Token"
Token ID1
Feature activation+2.98e+00

Pos logit contributions
+0.122
lic+0.113
 clearing+0.102
ink+0.092
Along+0.091
Neg logit contributions
 supervision-0.172
 disl-0.165
 permission-0.162
grand-0.154
 apology-0.153
TokenM
Token ID44
Feature activation+3.99e+00

Pos logit contributions
 supervision+0.109
 permission+0.102
 kindness+0.098
 apology+0.095
 disl+0.095
Neg logit contributions
lic-0.080
-0.079
ink-0.077
Along-0.069
Inside-0.069
Tokenummy
Token ID13513
Feature activation+4.35e+00

Pos logit contributions
 supervision+0.130
 apology+0.121
 choices+0.121
 permission+0.119
 disl+0.114
Neg logit contributions
lic-0.126
On-0.123
 clearing-0.122
Along-0.122
Inside-0.119
Token,
Token ID11
Feature activation+2.42e-01

Pos logit contributions
 supervision+0.171
 permission+0.149
 apology+0.146
 disl+0.144
 choices+0.142
Neg logit contributions
lic-0.126
-0.118
Inside-0.117
On-0.111
Along-0.111
Token can
Token ID460
Feature activation+2.81e+00

Pos logit contributions
 supervision+0.008
 permission+0.007
 apology+0.007
 choices+0.007
 disl+0.007
Neg logit contributions
lic-0.008
Inside-0.008
-0.008
On-0.008
Along-0.008
Token I
Token ID314
Feature activation+3.68e+00

Pos logit contributions
 supervision+0.097
 permission+0.083
 choices+0.080
 apology+0.080
 disl+0.079
Neg logit contributions
lic-0.095
-0.094
Inside-0.092
On-0.089
Along-0.087
Token do
Token ID466
Feature activation+1.11e+00

Pos logit contributions
 supervision+0.108
 permission+0.093
 apology+0.088
 kindness+0.086
 choices+0.086
Neg logit contributions
lic-0.143
-0.135
Inside-0.135
On-0.132
Along-0.131
Token
Token ID198
Feature activation-4.41e+00

Pos logit contributions
 supervision+0.041
 disl+0.037
 kindness+0.036
 apology+0.036
 permission+0.034
Neg logit contributions
lic-0.033
Along-0.032
-0.031
ink-0.030
Inside-0.030
Token
Token ID198
Feature activation-4.68e+00

Pos logit contributions
lic+0.093
+0.088
Inside+0.081
Along+0.081
On+0.077
Neg logit contributions
 supervision-0.206
 apology-0.184
 permission-0.182
 disl-0.181
 kindness-0.180
Token"
Token ID1
Feature activation+3.09e+00

Pos logit contributions
lic+0.124
+0.122
Along+0.102
Inside+0.097
 clearing+0.096
Neg logit contributions
 supervision-0.194
 permission-0.166
 disl-0.165
 apology-0.164
 choices-0.163
TokenNo
Token ID2949
Feature activation+7.53e+00

Pos logit contributions
 supervision+0.115
 apology+0.103
 disl+0.103
 permission+0.101
 kindness+0.101
Neg logit contributions
lic-0.090
-0.086
Inside-0.081
Along-0.080
ink-0.079
Token,
Token ID11
Feature activation+1.56e+00

Pos logit contributions
 supervision+0.343
 permission+0.305
 disl+0.302
 choices+0.293
 apology+0.292
Neg logit contributions
lic-0.164
-0.156
Inside-0.148
On-0.141
Along-0.136
Token Ben
Token ID3932
Feature activation+3.47e+00

Pos logit contributions
 supervision+0.056
 permission+0.048
 choices+0.047
 disl+0.047
 harm+0.046
Neg logit contributions
-0.045
lic-0.045
Inside-0.044
On-0.041
A-0.040
Token,"
Token ID553
Feature activation+1.44e+00

Pos logit contributions
 supervision+0.159
 permission+0.147
 apology+0.141
 disl+0.140
 kindness+0.136
Neg logit contributions
lic-0.071
Inside-0.065
-0.064
On-0.062
ink-0.062
Token Lily
Token ID20037
Feature activation+1.73e+00

Pos logit contributions
 supervision+0.054
 permission+0.047
 choices+0.044
 disl+0.043
 harm+0.043
Neg logit contributions
lic-0.044
Inside-0.044
On-0.042
-0.042
Along-0.040
Token says
Token ID1139
Feature activation+4.59e-01

Pos logit contributions
 supervision+0.045
 permission+0.037
 disl+0.033
 choices+0.033
 apology+0.032
Neg logit contributions
lic-0.072
Inside-0.070
On-0.070
-0.069
Along-0.068
Token.
Token ID13
Feature activation+1.82e+00

Pos logit contributions
 supervision+0.020
 disl+0.019
grand+0.018
 privacy+0.017
 permission+0.017
Neg logit contributions
-0.010
lic-0.010
Inside-0.009
On-0.009
Along-0.008
Token "
Token ID366
Feature activation+2.29e+00

Pos logit contributions
 supervision+0.063
 permission+0.055
 kindness+0.052
 apology+0.052
 attention+0.050
Neg logit contributions
lic-0.060
Inside-0.054
Along-0.054
-0.054
On-0.053
Token you
Token ID345
Feature activation+4.11e+00

Pos logit contributions
 supervision+0.114
 disl+0.094
 permission+0.093
 apology+0.093
 choices+0.092
Neg logit contributions
lic-0.131
Inside-0.127
-0.126
On-0.121
Along-0.121
Token how
Token ID703
Feature activation+5.28e+00

Pos logit contributions
 supervision+0.136
 permission+0.117
 apology+0.112
 choices+0.112
 disl+0.111
Neg logit contributions
lic-0.144
-0.138
Inside-0.137
On-0.133
Along-0.132
Token to
Token ID284
Feature activation+5.67e+00

Pos logit contributions
 supervision+0.198
 apology+0.166
 permission+0.166
 choices+0.160
 disl+0.155
Neg logit contributions
lic-0.167
Inside-0.161
On-0.153
-0.153
Along-0.152
Token make
Token ID787
Feature activation+4.02e+00

Pos logit contributions
 supervision+0.152
 permission+0.130
 apology+0.124
 disl+0.121
 kindness+0.121
Neg logit contributions
lic-0.235
-0.221
Inside-0.220
On-0.217
Along-0.217
Token the
Token ID262
Feature activation+3.07e+00

Pos logit contributions
 supervision+0.150
 permission+0.126
 apology+0.125
 disl+0.121
 choices+0.118
Neg logit contributions
lic-0.125
Inside-0.117
-0.115
On-0.112
Along-0.111
Token perfect
Token ID2818
Feature activation+3.57e+00

Pos logit contributions
 supervision+0.130
 permission+0.111
 disl+0.109
 apology+0.108
 harm+0.105
Neg logit contributions
lic-0.082
-0.077
Inside-0.075
On-0.073
Along-0.072
Token cup
Token ID6508
Feature activation+7.28e-01

Pos logit contributions
 supervision+0.143
 permission+0.123
 apology+0.119
 disl+0.119
 choices+0.117
Neg logit contributions
lic-0.103
-0.097
Inside-0.096
On-0.093
Along-0.092
Token of
Token ID286
Feature activation+4.24e+00

Pos logit contributions
 supervision+0.024
 permission+0.021
 choices+0.020
 apology+0.020
 kindness+0.020
Neg logit contributions
lic-0.025
-0.024
Inside-0.024
On-0.023
Along-0.023
Token tea
Token ID8887
Feature activation+7.48e-01

Pos logit contributions
 supervision+0.169
 permission+0.146
 apology+0.143
 choices+0.139
 disl+0.138
Neg logit contributions
lic-0.118
-0.114
Inside-0.111
On-0.106
Along-0.103
Token!"
Token ID2474
Feature activation+9.27e-01

Pos logit contributions
 supervision+0.022
 permission+0.019
 choices+0.018
 apology+0.018
 kindness+0.018
Neg logit contributions
lic-0.028
-0.027
Inside-0.027
On-0.027
Along-0.026
Token
Token ID220
Feature activation-2.81e+00

Pos logit contributions
 supervision+0.040
 permission+0.036
 apology+0.035
 kindness+0.035
 choices+0.035
Neg logit contributions
lic-0.023
Inside-0.021
-0.020
On-0.020
Along-0.020
Token thanked
Token ID26280
Feature activation+1.49e+00

Pos logit contributions
lic+0.139
+0.132
Inside+0.131
On+0.127
Along+0.125
Neg logit contributions
 supervision-0.167
 permission-0.146
 apology-0.141
 choices-0.140
 kindness-0.139
Token the
Token ID262
Feature activation+2.20e+00

Pos logit contributions
 supervision+0.051
 permission+0.041
 disl+0.041
 choices+0.040
 apology+0.040
Neg logit contributions
lic-0.051
Inside-0.050
-0.050
On-0.049
Along-0.048
Token gray
Token ID12768
Feature activation+1.61e+00

Pos logit contributions
 supervision+0.092
 permission+0.079
 choices+0.078
 disl+0.077
 apology+0.077
Neg logit contributions
lic-0.058
-0.056
Inside-0.055
On-0.054
Along-0.054
Token bird
Token ID6512
Feature activation+1.25e+00

Pos logit contributions
 supervision+0.056
 permission+0.047
 apology+0.046
 choices+0.046
 disl+0.045
Neg logit contributions
lic-0.054
-0.052
Inside-0.052
On-0.050
Along-0.050
Token very
Token ID845
Feature activation+1.04e+00

Pos logit contributions
 supervision+0.045
 disl+0.038
 permission+0.037
 apology+0.036
 choices+0.036
Neg logit contributions
lic-0.039
On-0.038
Inside-0.038
Along-0.037
-0.037
Token much
Token ID881
Feature activation+1.89e+00

Pos logit contributions
 supervision+0.037
 permission+0.032
 apology+0.031
 choices+0.031
 disl+0.030
Neg logit contributions
lic-0.034
-0.033
Inside-0.032
On-0.031
Along-0.031
Token and
Token ID290
Feature activation+5.61e-01

Pos logit contributions
 supervision+0.080
 disl+0.069
 permission+0.068
 apology+0.066
 choices+0.063
Neg logit contributions
lic-0.049
On-0.048
Inside-0.047
Along-0.045
-0.045
Token flew
Token ID13112
Feature activation-4.37e+00

Pos logit contributions
 supervision+0.023
 permission+0.020
 apology+0.020
 disl+0.019
 choices+0.019
Neg logit contributions
lic-0.016
Inside-0.015
On-0.014
Along-0.014
-0.014
Token off
Token ID572
Feature activation-2.41e+00

Pos logit contributions
lic+0.178
+0.173
Inside+0.169
On+0.165
Along+0.164
Neg logit contributions
 supervision-0.120
 permission-0.101
 choices-0.096
 apology-0.096
 kindness-0.095
Token into
Token ID656
Feature activation-5.03e+00

Pos logit contributions
lic+0.083
+0.079
Inside+0.078
On+0.076
Along+0.076
Neg logit contributions
 supervision-0.082
 permission-0.071
 apology-0.068
 choices-0.067
 kindness-0.067
Token the
Token ID262
Feature activation-1.16e+00

Pos logit contributions
lic+0.160
Inside+0.160
On+0.158
+0.157
Along+0.154
Neg logit contributions
 supervision-0.174
 permission-0.145
 apology-0.144
 choices-0.142
 disl-0.142
Token job
Token ID1693
Feature activation+1.49e+00

Pos logit contributions
 supervision+0.023
 permission+0.020
 choices+0.019
 apology+0.019
 disl+0.019
Neg logit contributions
lic-0.017
-0.016
Inside-0.016
On-0.015
Along-0.015
Token and
Token ID290
Feature activation+1.61e+00

Pos logit contributions
 supervision+0.054
 permission+0.047
 apology+0.045
 choices+0.045
 disl+0.044
Neg logit contributions
lic-0.048
-0.046
Inside-0.045
On-0.044
Along-0.044
Token he
Token ID339
Feature activation-2.42e+00

Pos logit contributions
 supervision+0.060
 permission+0.052
 apology+0.051
 choices+0.051
 disl+0.050
Neg logit contributions
lic-0.050
Inside-0.048
On-0.046
-0.045
Along-0.044
Token needed
Token ID2622
Feature activation+4.03e-01

Pos logit contributions
lic+0.072
Inside+0.068
+0.067
On+0.066
Along+0.065
Neg logit contributions
 supervision-0.095
 permission-0.083
 choices-0.080
 apology-0.080
 disl-0.079
Token a
Token ID257
Feature activation-9.60e-01

Pos logit contributions
 supervision+0.014
 permission+0.012
 disl+0.012
 apology+0.012
 choices+0.012
Neg logit contributions
lic-0.013
Inside-0.013
-0.013
On-0.012
Along-0.012
Token nice
Token ID3621
Feature activation+1.75e+00

Pos logit contributions
lic+0.032
+0.030
Inside+0.030
On+0.029
Along+0.029
Neg logit contributions
 supervision-0.034
 permission-0.030
 apology-0.029
 kindness-0.028
 attention-0.028
Token suit
Token ID6050
Feature activation+2.45e-01

Pos logit contributions
 supervision+0.062
 permission+0.055
 apology+0.053
 choices+0.052
 kindness+0.052
Neg logit contributions
lic-0.057
-0.054
Inside-0.054
On-0.052
Along-0.052
Token.
Token ID13
Feature activation+1.22e+00

Pos logit contributions
 supervision+0.009
 apology+0.008
 disl+0.008
 permission+0.007
 choices+0.007
Neg logit contributions
lic-0.008
Inside-0.007
-0.007
On-0.007
Along-0.007
Token Mom
Token ID11254
Feature activation+3.99e+00

Pos logit contributions
 supervision+0.047
 permission+0.041
 apology+0.040
 disl+0.040
 choices+0.039
Neg logit contributions
lic-0.036
-0.032
ink-0.031
Inside-0.031
Along-0.030
Token said
Token ID531
Feature activation-7.08e-03

Pos logit contributions
 supervision+0.135
 permission+0.116
 apology+0.112
 choices+0.111
 disl+0.109
Neg logit contributions
lic-0.138
-0.132
Inside-0.131
On-0.127
Along-0.126
Token they
Token ID484
Feature activation+8.89e-01

Pos logit contributions
lic+0.000
+0.000
Inside+0.000
On+0.000
Along+0.000
Neg logit contributions
 supervision-0.000
 choices-0.000
 permission-0.000
 disl-0.000
 apology-0.000
Token mom
Token ID1995
Feature activation+4.62e+00

Pos logit contributions
lic+0.044
+0.043
Inside+0.042
On+0.041
Along+0.041
Neg logit contributions
 supervision-0.034
 permission-0.028
 apology-0.027
 choices-0.027
 disl-0.027
Token came
Token ID1625
Feature activation-4.61e+00

Pos logit contributions
 supervision+0.177
 disl+0.154
 permission+0.152
 choices+0.152
 apology+0.150
Neg logit contributions
lic-0.145
Inside-0.133
On-0.133
-0.129
Along-0.127
Token to
Token ID284
Feature activation+8.24e-02

Pos logit contributions
lic+0.158
+0.155
Inside+0.146
On+0.137
Along+0.137
Neg logit contributions
 supervision-0.155
 permission-0.143
 attention-0.131
 choices-0.130
 apology-0.129
Token help
Token ID1037
Feature activation+4.30e+00

Pos logit contributions
 supervision+0.003
 choices+0.002
 decisions+0.002
 apology+0.002
 attention+0.002
Neg logit contributions
Inside-0.003
On-0.003
-0.003
lic-0.003
Along-0.002
Token and
Token ID290
Feature activation+6.39e-01

Pos logit contributions
 supervision+0.144
 disl+0.124
 choices+0.119
 apology+0.118
 permission+0.118
Neg logit contributions
Inside-0.144
lic-0.143
On-0.141
-0.141
Along-0.131
Token together
Token ID1978
Feature activation-1.85e+00

Pos logit contributions
 supervision+0.025
 choices+0.022
 apology+0.022
 disl+0.022
 permission+0.022
Neg logit contributions
lic-0.019
Inside-0.018
On-0.018
Along-0.016
Under-0.016
Token they
Token ID484
Feature activation-1.86e+00

Pos logit contributions
lic+0.059
+0.057
Inside+0.056
On+0.055
Along+0.053
Neg logit contributions
 supervision-0.069
 apology-0.059
 permission-0.059
 disl-0.058
 choices-0.058
Token found
Token ID1043
Feature activation-4.24e+00

Pos logit contributions
lic+0.052
Inside+0.049
+0.048
On+0.048
Along+0.047
Neg logit contributions
 supervision-0.077
 apology-0.067
 disl-0.066
 permission-0.066
 choices-0.066
Token some
Token ID617
Feature activation+8.45e-01

Pos logit contributions
lic+0.093
Inside+0.085
+0.084
On+0.082
Along+0.079
Neg logit contributions
 supervision-0.192
 disl-0.175
 apology-0.170
 permission-0.167
 choices-0.165
Token old
Token ID1468
Feature activation+1.80e+00

Pos logit contributions
 supervision+0.031
 permission+0.026
 disl+0.026
 apology+0.025
 choices+0.025
Neg logit contributions
lic-0.027
Inside-0.025
On-0.025
-0.025
Along-0.024
Token clothes
Token ID8242
Feature activation-8.21e-01

Pos logit contributions
 supervision+0.068
 apology+0.060
 disl+0.059
 permission+0.056
 harm+0.055
Neg logit contributions
lic-0.050
Inside-0.049
On-0.048
Along-0.046
 clearing-0.045
Token Max
Token ID5436
Feature activation+9.08e-01

Pos logit contributions
 supervision+0.049
 apology+0.042
 permission+0.042
 choices+0.042
 kindness+0.041
Neg logit contributions
lic-0.042
-0.039
Inside-0.039
Along-0.038
On-0.038
Token let
Token ID1309
Feature activation+4.79e+00

Pos logit contributions
 supervision+0.030
 permission+0.026
 apology+0.026
 kindness+0.025
 disl+0.025
Neg logit contributions
lic-0.030
-0.030
Inside-0.029
On-0.028
Along-0.028
Token the
Token ID262
Feature activation+1.23e+00

Pos logit contributions
 supervision+0.156
 permission+0.132
 disl+0.128
 choices+0.127
 apology+0.127
Neg logit contributions
lic-0.173
-0.166
Inside-0.166
On-0.162
Along-0.160
Token cat
Token ID3797
Feature activation+2.98e+00

Pos logit contributions
 supervision+0.044
 permission+0.038
 choices+0.036
 disl+0.036
 apology+0.036
Neg logit contributions
lic-0.041
-0.039
Inside-0.039
On-0.038
Along-0.037
Token hold
Token ID1745
Feature activation-8.09e-01

Pos logit contributions
 supervision+0.097
 permission+0.078
 disl+0.078
 choices+0.078
 apology+0.076
Neg logit contributions
lic-0.107
Inside-0.102
-0.101
On-0.100
Along-0.099
Token onto
Token ID4291
Feature activation-1.22e+00

Pos logit contributions
+0.022
lic+0.022
Inside+0.022
On+0.021
Along+0.021
Neg logit contributions
 supervision-0.031
 disl-0.030
 permission-0.026
 privacy-0.026
grand-0.025
Token his
Token ID465
Feature activation-2.10e-01

Pos logit contributions
+0.038
lic+0.037
Inside+0.036
On+0.036
Along+0.035
Neg logit contributions
 supervision-0.044
 disl-0.038
 permission-0.037
 choices-0.036
 apology-0.036
Token red
Token ID2266
Feature activation+1.30e+00

Pos logit contributions
+0.006
Inside+0.006
lic+0.006
On+0.006
Along+0.006
Neg logit contributions
 supervision-0.007
 disl-0.007
 permission-0.006
 choices-0.006
 apology-0.006
Token collar
Token ID19908
Feature activation-3.90e-01

Pos logit contributions
 supervision+0.048
 permission+0.042
 apology+0.040
 choices+0.040
 kindness+0.039
Neg logit contributions
lic-0.043
-0.039
Inside-0.039
Along-0.038
On-0.038
Token.
Token ID13
Feature activation+9.10e-01

Pos logit contributions
lic+0.010
+0.010
Inside+0.010
On+0.010
Along+0.009
Neg logit contributions
 supervision-0.016
 permission-0.014
 disl-0.014
 apology-0.014
 choices-0.014
Token Together
Token ID17083
Feature activation-1.37e+00

Pos logit contributions
 supervision+0.029
 apology+0.025
 kindness+0.025
 choices+0.025
 disl+0.024
Neg logit contributions
lic-0.030
-0.029
Along-0.029
Inside-0.028
On-0.028
Token it
Token ID340
Feature activation-2.53e+00

Pos logit contributions
 supervision+0.039
 permission+0.034
 apology+0.034
 disl+0.032
 choices+0.031
Neg logit contributions
lic-0.026
Inside-0.025
On-0.024
-0.023
 clearing-0.023
Token tasted
Token ID29187
Feature activation-4.50e-01

Pos logit contributions
lic+0.075
+0.072
Inside+0.071
On+0.069
Along+0.069
Neg logit contributions
 supervision-0.095
 permission-0.083
 choices-0.081
 apology-0.081
 kindness-0.081
Token so
Token ID523
Feature activation+3.52e+00

Pos logit contributions
lic+0.014
+0.013
Inside+0.013
On+0.013
Along+0.013
Neg logit contributions
 supervision-0.017
 permission-0.014
 apology-0.014
 choices-0.014
 disl-0.014
Token good
Token ID922
Feature activation+6.17e+00

Pos logit contributions
 supervision+0.129
 permission+0.112
 apology+0.108
 choices+0.108
 disl+0.106
Neg logit contributions
lic-0.113
-0.107
Inside-0.106
On-0.103
Along-0.102
Token!
Token ID0
Feature activation+9.94e-01

Pos logit contributions
 supervision+0.267
 apology+0.233
 permission+0.230
 disl+0.225
 kindness+0.216
Neg logit contributions
lic-0.156
-0.148
Inside-0.145
On-0.139
Along-0.135
Token Now
Token ID2735
Feature activation+5.97e-01

Pos logit contributions
 supervision+0.035
 disl+0.031
 choices+0.031
 apology+0.031
 kindness+0.031
Neg logit contributions
lic-0.031
On-0.028
Inside-0.028
Along-0.028
-0.028
Token the
Token ID262
Feature activation+1.67e+00

Pos logit contributions
 supervision+0.023
 permission+0.019
 apology+0.019
 disl+0.019
 choices+0.018
Neg logit contributions
lic-0.018
Inside-0.018
-0.017
On-0.017
Along-0.017
Token little
Token ID1310
Feature activation+3.80e+00

Pos logit contributions
 supervision+0.069
 permission+0.058
 disl+0.057
 apology+0.056
 choices+0.055
Neg logit contributions
lic-0.046
Inside-0.045
-0.044
On-0.044
Along-0.043
Token boy
Token ID2933
Feature activation+2.87e+00

Pos logit contributions
 supervision+0.139
 permission+0.116
 disl+0.114
 apology+0.112
 harm+0.107
Neg logit contributions
lic-0.118
-0.117
Inside-0.116
On-0.113
Along-0.111
Token goes
Token ID2925
Feature activation-1.24e+00

Pos logit contributions
 supervision+0.123
 permission+0.107
 apology+0.106
 choices+0.105
 disl+0.104
Neg logit contributions
lic-0.074
-0.071
Inside-0.070
On-0.068
Along-0.067
Token searching
Token ID10342
Feature activation-1.80e+00

Pos logit contributions
lic+0.049
+0.047
Inside+0.046
On+0.045
Along+0.045
Neg logit contributions
 supervision-0.036
 permission-0.031
 choices-0.029
 apology-0.028
 attention-0.028
Token rules
Token ID3173
Feature activation+5.17e+00

Pos logit contributions
 supervision+0.111
 disl+0.090
 permission+0.089
 apology+0.085
 choices+0.085
Neg logit contributions
lic-0.116
On-0.112
Inside-0.111
Along-0.111
-0.110
Token.
Token ID13
Feature activation+1.34e+00

Pos logit contributions
 supervision+0.194
 disl+0.171
 permission+0.170
 apology+0.168
 choices+0.167
Neg logit contributions
lic-0.149
-0.146
Inside-0.145
On-0.143
Along-0.138
Token<|endoftext|>
Token ID50256
Feature activation-3.27e-01

Pos logit contributions
 supervision+0.058
 permission+0.051
 apology+0.051
 kindness+0.050
 choices+0.050
Neg logit contributions
lic-0.034
Inside-0.031
-0.030
Along-0.030
On-0.029
TokenOnce
Token ID7454
Feature activation-2.87e+00

Pos logit contributions
lic+0.009
Inside+0.008
Along+0.008
On+0.007
Finally+0.007
Neg logit contributions
 supervision-0.011
 kindness-0.010
 harm-0.010
 permission-0.010
 apology-0.010
Token upon
Token ID2402
Feature activation-5.94e+00

Pos logit contributions
lic+0.040
+0.035
Inside+0.035
On+0.033
Along+0.032
Neg logit contributions
 supervision-0.157
 permission-0.143
 choices-0.139
 apology-0.139
 disl-0.139
Token a
Token ID257
Feature activation-3.55e+00

Pos logit contributions
lic+0.170
+0.162
Inside+0.154
Along+0.149
On+0.146
Neg logit contributions
 supervision-0.230
 permission-0.207
 apology-0.201
 kindness-0.201
 choices-0.200
Token time
Token ID640
Feature activation-3.01e+00

Pos logit contributions
lic+0.105
Inside+0.092
+0.089
 clearing+0.083
Along+0.083
Neg logit contributions
 supervision-0.132
 apology-0.120
 kindness-0.117
 harm-0.114
 permission-0.114
Token,
Token ID11
Feature activation-2.86e+00

Pos logit contributions
lic+0.103
Inside+0.094
Along+0.092
+0.091
On+0.091
Neg logit contributions
 supervision-0.104
 permission-0.089
 disl-0.086
 kindness-0.086
 apology-0.085
Token there
Token ID612
Feature activation-5.28e+00

Pos logit contributions
+0.087
lic+0.086
Inside+0.075
Along+0.074
ink+0.071
Neg logit contributions
 supervision-0.100
 permission-0.088
 apology-0.087
 kindness-0.087
 attention-0.086
Token was
Token ID373
Feature activation-6.13e+00

Pos logit contributions
lic+0.156
Inside+0.144
+0.143
Along+0.139
On+0.138
Neg logit contributions
 supervision-0.204
 permission-0.179
 apology-0.175
 choices-0.175
 kindness-0.173
Token a
Token ID257
Feature activation-5.26e+00

Pos logit contributions
lic+0.187
+0.172
Inside+0.167
Along+0.159
On+0.158
Neg logit contributions
 supervision-0.227
 permission-0.205
 kindness-0.200
 apology-0.196
 choices-0.195
Token room
Token ID2119
Feature activation-5.90e+00

Pos logit contributions
lic+0.086
+0.082
Inside+0.078
Along+0.077
On+0.076
Neg logit contributions
 supervision-0.097
 permission-0.087
 choices-0.085
 apology-0.082
 kindness-0.082
Token.
Token ID13
Feature activation+1.22e-01

Pos logit contributions
lic+0.204
+0.195
Inside+0.193
On+0.188
Along+0.187
Neg logit contributions
 supervision-0.199
 permission-0.171
 apology-0.164
 choices-0.164
 disl-0.161
Token<|endoftext|>
Token ID50256
Feature activation-1.47e+00

Pos logit contributions
 supervision+0.005
 disl+0.004
 apology+0.004
 permission+0.004
 kindness+0.004
Neg logit contributions
lic-0.003
Inside-0.003
Along-0.003
On-0.003
-0.003
TokenOne
Token ID3198
Feature activation-3.44e+00

Pos logit contributions
lic+0.040
Inside+0.035
Along+0.035
Finally+0.034
On+0.033
Neg logit contributions
 supervision-0.045
 harm-0.042
 kindness-0.042
 permission-0.040
 choices-0.040
Token day
Token ID1110
Feature activation-5.04e+00

Pos logit contributions
lic+0.135
+0.130
Inside+0.127
Along+0.123
On+0.122
Neg logit contributions
 supervision-0.100
 permission-0.086
 apology-0.082
 choices-0.082
 kindness-0.082
Token,
Token ID11
Feature activation-3.89e+00

Pos logit contributions
lic+0.184
+0.167
Inside+0.166
Along+0.163
 clearing+0.163
Neg logit contributions
 supervision-0.148
 permission-0.136
 attention-0.124
 kindness-0.121
 disl-0.119
Token a
Token ID257
Feature activation-5.19e+00

Pos logit contributions
lic+0.150
+0.134
Inside+0.130
ink+0.129
Along+0.127
Neg logit contributions
 supervision-0.094
 permission-0.089
 kindness-0.084
 attention-0.081
 disl-0.079
Token little
Token ID1310
Feature activation-3.95e-01

Pos logit contributions
lic+0.166
Inside+0.161
+0.158
Along+0.151
On+0.151
Neg logit contributions
 supervision-0.167
 apology-0.153
 kindness-0.153
 permission-0.148
 disl-0.145
Token boy
Token ID2933
Feature activation-7.60e-01

Pos logit contributions
lic+0.016
Inside+0.016
+0.015
On+0.015
Along+0.015
Neg logit contributions
 supervision-0.011
 permission-0.009
 apology-0.009
 choices-0.009
 kindness-0.009
Token was
Token ID373
Feature activation-5.84e+00

Pos logit contributions
lic+0.018
+0.017
Inside+0.016
On+0.016
Along+0.015
Neg logit contributions
 supervision-0.035
 permission-0.031
 apology-0.030
 choices-0.030
 attention-0.030
Token walking
Token ID6155
Feature activation-5.02e+00

Pos logit contributions
lic+0.197
Inside+0.186
+0.184
On+0.178
Along+0.176
Neg logit contributions
 supervision-0.205
 permission-0.181
 choices-0.171
 apology-0.171
 kindness-0.168
Token farms
Token ID15893
Feature activation-9.77e-01

Pos logit contributions
 supervision+0.066
 apology+0.055
 disl+0.054
 permission+0.053
 attention+0.050
Neg logit contributions
lic-0.072
Inside-0.072
On-0.071
Along-0.070
-0.070
Token,
Token ID11
Feature activation-1.13e+00

Pos logit contributions
lic+0.025
+0.024
Inside+0.023
On+0.023
Along+0.022
Neg logit contributions
 supervision-0.038
 apology-0.035
 disl-0.034
 permission-0.034
 choices-0.032
Token and
Token ID290
Feature activation+1.06e+00

Pos logit contributions
lic+0.030
Inside+0.029
+0.029
On+0.029
Along+0.028
Neg logit contributions
 supervision-0.046
 permission-0.040
 apology-0.040
 disl-0.039
 choices-0.038
Token they
Token ID484
Feature activation-1.69e+00

Pos logit contributions
 supervision+0.039
 apology+0.035
 disl+0.033
 permission+0.032
 harm+0.031
Neg logit contributions
lic-0.030
Inside-0.029
Along-0.029
On-0.028
-0.027
Token both
Token ID1111
Feature activation-1.29e+00

Pos logit contributions
lic+0.048
+0.045
Inside+0.045
On+0.044
Along+0.043
Neg logit contributions
 supervision-0.069
 permission-0.060
 apology-0.060
 choices-0.059
 attention-0.058
Token looked
Token ID3114
Feature activation-5.26e+00

Pos logit contributions
lic+0.037
+0.034
Inside+0.033
On+0.032
Along+0.032
Neg logit contributions
 supervision-0.051
 permission-0.045
 apology-0.045
 choices-0.044
 attention-0.044
Token fancy
Token ID14996
Feature activation+8.91e-01

Pos logit contributions
lic+0.161
Inside+0.150
On+0.148
+0.147
Along+0.147
Neg logit contributions
 supervision-0.205
 permission-0.174
 apology-0.172
 disl-0.170
 choices-0.166
Token.
Token ID13
Feature activation+9.79e-01

Pos logit contributions
 supervision+0.034
 apology+0.029
 permission+0.029
 disl+0.028
 choices+0.027
Neg logit contributions
lic-0.027
-0.026
Inside-0.025
On-0.025
Along-0.025
Token They
Token ID1119
Feature activation-1.68e+00

Pos logit contributions
 supervision+0.041
 permission+0.037
 apology+0.036
 choices+0.036
 disl+0.035
Neg logit contributions
lic-0.026
-0.024
Inside-0.024
On-0.023
Along-0.023
Token could
Token ID714
Feature activation+2.59e-02

Pos logit contributions
lic+0.048
Inside+0.046
+0.045
On+0.045
Along+0.044
Neg logit contributions
 supervision-0.068
 choices-0.059
 apology-0.059
 permission-0.058
 attention-0.057
Token now
Token ID783
Feature activation+1.70e+00

Pos logit contributions
 supervision+0.001
 permission+0.001
 apology+0.001
 choices+0.001
 disl+0.001
Neg logit contributions
lic-0.001
-0.001
Inside-0.001
On-0.001
Along-0.001
Token find
Token ID1064
Feature activation+3.39e-01

Pos logit contributions
 supervision+0.079
 permission+0.066
 apology+0.064
 choices+0.064
 disl+0.063
Neg logit contributions
lic-0.086
Inside-0.084
-0.083
On-0.082
Along-0.080
Token.
Token ID13
Feature activation+1.91e-01

Pos logit contributions
 supervision+0.014
 apology+0.013
 disl+0.012
 permission+0.012
 choices+0.012
Neg logit contributions
Inside-0.009
lic-0.008
On-0.008
Along-0.008
-0.007
Token One
Token ID1881
Feature activation-2.58e+00

Pos logit contributions
 supervision+0.006
 kindness+0.006
 apology+0.006
 permission+0.005
 choices+0.005
Neg logit contributions
lic-0.006
ink-0.006
-0.006
P-0.005
ithering-0.005
Token day
Token ID1110
Feature activation-4.08e+00

Pos logit contributions
lic+0.112
+0.111
Inside+0.104
Along+0.100
On+0.099
Neg logit contributions
 supervision-0.060
 permission-0.051
 kindness-0.051
 choices-0.050
 apology-0.050
Token,
Token ID11
Feature activation-3.56e+00

Pos logit contributions
lic+0.147
+0.139
Inside+0.132
Along+0.131
On+0.131
Neg logit contributions
 supervision-0.124
 permission-0.113
 attention-0.109
 kindness-0.109
 choices-0.107
Token he
Token ID339
Feature activation-4.58e+00

Pos logit contributions
lic+0.141
+0.131
Inside+0.130
Along+0.127
On+0.125
Neg logit contributions
 supervision-0.101
 permission-0.088
 kindness-0.083
 choices-0.083
 apology-0.082
Token found
Token ID1043
Feature activation-4.81e+00

Pos logit contributions
+0.162
lic+0.157
Inside+0.142
On+0.135
Along+0.133
Neg logit contributions
 supervision-0.141
 permission-0.129
 harm-0.122
 kindness-0.117
 disl-0.117
Token a
Token ID257
Feature activation-4.16e+00

Pos logit contributions
lic+0.131
+0.126
Inside+0.121
On+0.118
Along+0.116
Neg logit contributions
 supervision-0.199
 permission-0.179
 apology-0.171
 choices-0.171
 kindness-0.170
Token y
Token ID331
Feature activation+2.40e+00

Pos logit contributions
lic+0.126
+0.124
Inside+0.116
ink+0.113
Along+0.110
Neg logit contributions
 supervision-0.147
 kindness-0.134
 permission-0.131
 apology-0.131
 attention-0.127
Tokenummy
Token ID13513
Feature activation+1.44e+00

Pos logit contributions
 supervision+0.070
 choices+0.067
 attention+0.064
 permission+0.063
 harm+0.062
Neg logit contributions
lic-0.086
Inside-0.083
On-0.081
Along-0.080
Under-0.078
Token treat
Token ID2190
Feature activation-7.18e-03

Pos logit contributions
 supervision+0.042
 choices+0.038
 kindness+0.038
 apology+0.037
 permission+0.037
Neg logit contributions
lic-0.053
-0.050
Inside-0.050
On-0.049
Along-0.049
Token out
Token ID503
Feature activation-3.50e+00

Pos logit contributions
lic+0.055
Inside+0.052
+0.051
On+0.050
Along+0.049
Neg logit contributions
 supervision-0.069
 permission-0.059
 apology-0.058
 disl-0.057
 choices-0.057
Token of
Token ID286
Feature activation+1.12e+00

Pos logit contributions
lic+0.108
Inside+0.101
+0.101
On+0.099
Along+0.097
Neg logit contributions
 supervision-0.133
 permission-0.115
 apology-0.112
 choices-0.111
 disl-0.111
Token the
Token ID262
Feature activation+1.01e+00

Pos logit contributions
 supervision+0.038
 permission+0.032
 apology+0.031
 choices+0.031
 disl+0.031
Neg logit contributions
lic-0.038
-0.036
Inside-0.036
On-0.035
Along-0.035
Token fireplace
Token ID44250
Feature activation-3.68e+00

Pos logit contributions
 supervision+0.042
 disl+0.036
 permission+0.036
 apology+0.035
 choices+0.035
Neg logit contributions
lic-0.027
-0.026
Inside-0.026
On-0.025
Along-0.025
Token and
Token ID290
Feature activation+5.90e-01

Pos logit contributions
lic+0.112
+0.106
Inside+0.105
On+0.102
Along+0.101
Neg logit contributions
 supervision-0.139
 permission-0.122
 apology-0.119
 choices-0.117
 disl-0.117
Token was
Token ID373
Feature activation-4.95e+00

Pos logit contributions
 supervision+0.022
 permission+0.019
 apology+0.018
 choices+0.018
 disl+0.018
Neg logit contributions
lic-0.019
-0.018
Inside-0.017
On-0.017
Along-0.017
Token sitting
Token ID5586
Feature activation-5.15e+00

Pos logit contributions
+0.159
lic+0.158
Inside+0.154
On+0.148
Along+0.147
Neg logit contributions
 supervision-0.178
 permission-0.157
 kindness-0.147
 harm-0.145
 apology-0.144
Token in
Token ID287
Feature activation-2.40e+00

Pos logit contributions
lic+0.158
Inside+0.148
+0.147
On+0.141
Along+0.140
Neg logit contributions
 supervision-0.192
 permission-0.175
 choices-0.168
 kindness-0.163
 attention-0.163
Token the
Token ID262
Feature activation-6.10e-01

Pos logit contributions
lic+0.088
+0.082
Inside+0.082
Along+0.080
On+0.079
Neg logit contributions
 supervision-0.077
 permission-0.067
 apology-0.064
 kindness-0.063
 choices-0.063
Token corner
Token ID5228
Feature activation-2.96e+00

Pos logit contributions
lic+0.017
+0.016
Inside+0.016
On+0.016
Along+0.015
Neg logit contributions
 supervision-0.025
 permission-0.022
 apology-0.021
 choices-0.021
 kindness-0.021
Token of
Token ID286
Feature activation+1.43e+00

Pos logit contributions
lic+0.090
+0.086
Inside+0.085
On+0.082
Along+0.081
Neg logit contributions
 supervision-0.112
 permission-0.098
 apology-0.095
 choices-0.094
 disl-0.093