TOP ACTIVATIONS
MAX = 14.150

bought us a treat . Sp aghetti !" he said .
. Maybe Red dy or Sp icy or Hot ." Lily
? That is silly . Sp iders don 't talk ,"
to leave it alone . Sp iders can be a bit
named Sp ide y . Sp ide y lived in a
rooms . One day , Sp ide y was trying to
, something strange happened . Sp ide y 's web started
named Sp ide y . Sp ide y lived in a
attic . Every day , Sp ide y would spin cob
his house look dirty . Sp ide y decided to clean
named Sp ide y . Sp ide y lived in a
back to their homes . Sp ide y was so happy
forest . One day , Sp ide y noticed that there
From that day on , Sp ide y knew that even
caught in his web . Sp ide y was so happy
's web went flying . Sp ide y held on tight
over . When Sp ide y opened his eyes
blew through the attic and Sp ide y 's web went
was a little spider named Sp ide y . Sp ide
catch Sp ide y . Sp ide y wasn 't ready

MIN ACTIVATIONS
MIN = -10.010

heavy . Jack bid the old man farewell ,
shrimp and the butterfly that whispered in the pond . <|endoftext|>
slide always listened as they whispered their secrets . <|endoftext|> Once
, even when its friend whispered its name . <|endoftext|> Once
fun afternoon . He then bid farewell to all the birds
some of his toys and bid him goodbye .
. So , the whale whispered to the little fish ,
't tell you ," Lily whispered . " It 's too
on , Tweet ie never whispered happy secrets again . <|endoftext|>
skull thanked the tree and whispered , " Thank you for
she remembered what the wall whispered to her . <|endoftext|> Once
for them ! Tim my whispered to Billy , " We
morning , the white flower whispered its morning song to the
ill . The pr une whispered back to its friend ,
and the shadows of night whispered in the distance . <|endoftext|>
" Come back ," whispered Tim my .
I 'm hiding ," Annie whispered . " My friends were
watched the sunset , and whispered about what a beautiful day
L ily whispered , " What do they
en ched , the mouse bid goodbye to the huge ant

INTERVAL 8.110 - 14.150
CONTAINS 0.131%

replied , “ No , I don
feel better ." Tim my felt better knowing that
the noise . Tim my pushed and pulled with
said , “ Do you want to
our walk ." O live agreed to be patient

INTERVAL 2.070 - 8.110
CONTAINS 21.618%

he shouted . L ily was scared . She
S ara opens the glue and squee zes some on
every day . She sings in the morning , in the
stop weeping . She we pt and we pt until her
dress - up . <|endoftext|> Once upon a time , there

INTERVAL -3.970 - 2.070
CONTAINS 71.891%

made them open their mouths and say " ah h ".
new car . This one is broken ." Billy
she asked her mother for a blue berry because she had
. They were boring and quiet . He liked trucks .
it some y ummy food and a bed to sleep in

INTERVAL -10.010 - -3.970
CONTAINS 6.361%

ground . He stopped running and picked them up . But
, there was a little girl named Lily . One day
are selling leaves !" Ben says . " They are pretty
, her books , and her envelop es in the box
spotted an old witch in her path . The witch had
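
NOTE ON INTERVAL BOUNDARIES
The four INTERVAL ranges above appear consistent with splitting the span from MIN (-10.010) to MAX (14.150) into four equal-width bins, each 6.040 wide. The sketch below reproduces those boundaries under that assumption. The function name interval_edges and the n_bins parameter are hypothetical and are not taken from the tool that produced this dump.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced boundaries from lo to hi.

    Assumption: the dashboard uses equal-width bins between the
    minimum and maximum activation. This is inferred from the
    printed boundaries, not confirmed by the source.
    """
    width = (hi - lo) / n_bins
    # Round to three decimals to match the precision shown in the dump.
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX as printed in the dump.
edges = interval_edges(-10.010, 14.150)

# Pair consecutive boundaries into (low, high) intervals, lowest first.
intervals = list(zip(edges[:-1], edges[1:]))
for low, high in intervals:
    print(f"INTERVAL {low:.3f} - {high:.3f}")
```

The printed ranges are listed from lowest to highest, whereas the dump lists them from highest to lowest.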
Token bought
Token ID5839
Feature activation-6.76e-01

Pos logit contributions
ocol+0.134
wic+0.133
ions+0.132
ptions+0.132
olving+0.129
Neg logit contributions
 back-0.088
 proud-0.078
 goodbye-0.076
 thanked-0.076
 happy-0.076
Token us
Token ID514
Feature activation+1.15e+00

Pos logit contributions
 back+0.022
 goodbye+0.020
 proud+0.019
 woman+0.019
 happy+0.018
Neg logit contributions
wic-0.034
ocol-0.033
olving-0.033
ptions-0.033
ions-0.033
Token a
Token ID257
Feature activation+8.03e-01

Pos logit contributions
ocol+0.062
ptions+0.061
wic+0.061
olving+0.061
lex+0.061
Neg logit contributions
 back-0.035
 goodbye-0.029
 proud-0.028
 happy-0.027
 thanked-0.026
Token treat
Token ID2190
Feature activation-7.86e-01

Pos logit contributions
ocol+0.044
wic+0.044
ions+0.044
ptions+0.043
olving+0.043
Neg logit contributions
 back-0.023
 goodbye-0.020
 proud-0.020
 happy-0.019
 woman-0.019
Token.
Token ID13
Feature activation+3.63e-01

Pos logit contributions
 back+0.021
 goodbye+0.017
 proud+0.017
 woman+0.016
 happy+0.016
Neg logit contributions
ocol-0.045
ptions-0.045
wic-0.045
olving-0.044
lex-0.043
Token Sp
Token ID1338
Feature activation+1.42e+01

Pos logit contributions
wic+0.017
ocol+0.017
olving+0.017
ptions+0.017
ions+0.016
Neg logit contributions
 back-0.013
 goodbye-0.012
 proud-0.011
 woman-0.011
 relieved-0.011
Tokenaghetti
Token ID35812
Feature activation+5.48e+00

Pos logit contributions
wic+0.438
ocol+0.437
ptions+0.417
olving+0.406
 cheating+0.404
Neg logit contributions
 back-0.672
 woman-0.635
 proud-0.631
 happy-0.620
 excited-0.616
Token!"
Token ID2474
Feature activation+2.38e-01

Pos logit contributions
ocol+0.290
ptions+0.288
olving+0.280
wic+0.279
lex+0.279
Neg logit contributions
 back-0.173
 goodbye-0.146
 proud-0.141
 woman-0.139
 thanked-0.139
Token he
Token ID339
Feature activation-1.45e+00

Pos logit contributions
wic+0.012
ions+0.012
ocol+0.012
 Sp+0.011
lex+0.011
Neg logit contributions
 back-0.008
 goodbye-0.007
 proud-0.007
 relieved-0.006
 woman-0.006
Token said
Token ID531
Feature activation-7.06e+00

Pos logit contributions
 back+0.027
 goodbye+0.020
 proud+0.019
 woman+0.018
 relieved+0.017
Neg logit contributions
wic-0.092
 Sp-0.089
ions-0.088
 arose-0.087
ocol-0.087
Token.
Token ID13
Feature activation+2.30e-01

Pos logit contributions
 back+0.194
 goodbye+0.168
 proud+0.161
 happy+0.155
 relieved+0.152
Neg logit contributions
wic-0.398
ocol-0.397
ptions-0.391
ions-0.388
olving-0.386
Token.
Token ID13
Feature activation+9.68e-02

Pos logit contributions
 back+0.100
 and+0.090
 goodbye+0.081
 happy+0.076
 about+0.060
Neg logit contributions
lex-0.351
ptions-0.350
bek-0.350
icer-0.349
amy-0.344
Token Maybe
Token ID6674
Feature activation+1.42e-01

Pos logit contributions
ptions+0.006
ocol+0.006
wic+0.006
olving+0.006
ions+0.005
Neg logit contributions
 back-0.002
 goodbye-0.002
 relieved-0.002
 woman-0.002
 proud-0.002
Token Red
Token ID2297
Feature activation+3.21e+00

Pos logit contributions
ptions+0.009
wic+0.009
olving+0.009
icer+0.009
ocol+0.008
Neg logit contributions
 back-0.004
 happy-0.003
 woman-0.003
 goodbye-0.003
 proud-0.003
Tokendy
Token ID9892
Feature activation+3.84e+00

Pos logit contributions
ocol+0.166
ptions+0.163
ions+0.159
olving+0.159
lex+0.158
Neg logit contributions
 back-0.108
 proud-0.091
 thanked-0.089
 woman-0.089
 goodbye-0.088
Token or
Token ID393
Feature activation+8.63e-01

Pos logit contributions
ocol+0.201
ptions+0.197
wic+0.194
ions+0.193
olving+0.193
Neg logit contributions
 back-0.124
 goodbye-0.106
 proud-0.103
 thanked-0.102
 woman-0.101
Token Sp
Token ID1338
Feature activation+1.41e+01

Pos logit contributions
wic+0.046
olving+0.046
ocol+0.046
ions+0.045
ptions+0.045
Neg logit contributions
 back-0.028
 happy-0.024
 proud-0.023
 goodbye-0.023
 woman-0.022
Tokenicy
Token ID4611
Feature activation+6.08e+00

Pos logit contributions
wic+0.324
ocol+0.302
ptions+0.298
 Sp+0.282
 cheating+0.269
Neg logit contributions
 back-0.751
 and-0.731
five-0.722
 proud-0.715
 thanked-0.714
Token or
Token ID393
Feature activation+8.24e-01

Pos logit contributions
ptions+0.335
olving+0.309
ocol+0.308
 centimeters+0.302
redible+0.297
Neg logit contributions
 back-0.214
 and-0.188
 man-0.172
 says-0.170
 said-0.166
Token Hot
Token ID6964
Feature activation+7.64e+00

Pos logit contributions
olving+0.042
ptions+0.042
wic+0.042
ions+0.042
ocol+0.042
Neg logit contributions
 back-0.028
 happy-0.024
 proud-0.023
 goodbye-0.023
 woman-0.022
Token."
Token ID526
Feature activation-1.15e+00

Pos logit contributions
ptions+0.452
bek+0.422
olving+0.415
redible+0.402
lex+0.398
Neg logit contributions
 back-0.232
 and-0.221
 says-0.189
 said-0.178
 man-0.176
Token Lily
Token ID20037
Feature activation+2.44e+00

Pos logit contributions
 back+0.024
 proud+0.020
 goodbye+0.020
 woman+0.019
 relieved+0.018
Neg logit contributions
 Sp-0.067
wic-0.067
ions-0.066
lex-0.066
ocol-0.065
Token?
Token ID30
Feature activation+1.09e+00

Pos logit contributions
wic+0.190
ocol+0.189
ions+0.186
olving+0.185
ptions+0.184
Neg logit contributions
 back-0.091
 goodbye-0.077
 proud-0.076
 woman-0.074
 relieved-0.072
Token That
Token ID1320
Feature activation+2.34e+00

Pos logit contributions
ptions+0.060
wic+0.058
olving+0.057
ocol+0.057
icer+0.056
Neg logit contributions
 back-0.031
 goodbye-0.028
 proud-0.027
 excited-0.027
 relieved-0.027
Token is
Token ID318
Feature activation+1.81e+00

Pos logit contributions
ocol+0.127
wic+0.122
ptions+0.122
ions+0.121
lex+0.121
Neg logit contributions
 back-0.071
 goodbye-0.060
 woman-0.060
 proud-0.060
 happy-0.059
Token silly
Token ID14397
Feature activation+3.70e+00

Pos logit contributions
ocol+0.116
ptions+0.114
icer+0.113
olving+0.112
wic+0.112
Neg logit contributions
 back-0.044
 happy-0.041
 proud-0.040
 excited-0.033
 relieved-0.033
Token.
Token ID13
Feature activation+2.40e-02

Pos logit contributions
wic+0.221
ocol+0.220
ptions+0.219
olving+0.217
lex+0.216
Neg logit contributions
 back-0.089
 goodbye-0.072
 happy-0.070
 proud-0.070
 woman-0.069
Token Sp
Token ID1338
Feature activation+1.40e+01

Pos logit contributions
wic+0.001
ptions+0.001
olving+0.001
ocol+0.001
 centimeters+0.001
Neg logit contributions
 back-0.001
 goodbye-0.001
 proud-0.001
 relieved-0.001
 woman-0.001
Tokeniders
Token ID4157
Feature activation+5.35e+00

Pos logit contributions
ocol+0.380
wic+0.366
ptions+0.334
uces+0.319
esan+0.317
Neg logit contributions
 back-0.698
 and-0.690
 proud-0.675
 around-0.674
 excited-0.672
Token don
Token ID836
Feature activation+7.02e+00

Pos logit contributions
wic+0.223
ocol+0.219
ions+0.219
olving+0.212
 cheating+0.209
Neg logit contributions
 back-0.208
 goodbye-0.191
 woman-0.188
 wife-0.183
 proud-0.183
Token't
Token ID470
Feature activation+3.53e+00

Pos logit contributions
ions+0.233
wic+0.218
ocol+0.214
 centimeters+0.210
olving+0.210
Neg logit contributions
 back-0.341
 goodbye-0.337
 wife-0.328
 woman-0.321
 thanked-0.313
Token talk
Token ID1561
Feature activation-3.67e+00

Pos logit contributions
ocol+0.143
 cheating+0.140
lex+0.139
wic+0.136
ptions+0.136
Neg logit contributions
 back-0.124
 goodbye-0.123
 relieved-0.117
 woman-0.112
 thanked-0.109
Token,"
Token ID553
Feature activation-2.22e+00

Pos logit contributions
 back+0.103
 goodbye+0.088
 proud+0.086
 woman+0.084
 happy+0.082
Neg logit contributions
wic-0.200
ocol-0.198
ions-0.195
olving-0.194
ptions-0.193
Token to
Token ID284
Feature activation+1.87e+00

Pos logit contributions
 back+0.065
 goodbye+0.057
 proud+0.056
 woman+0.055
 happy+0.054
Neg logit contributions
wic-0.099
ocol-0.098
ions-0.096
olving-0.096
ptions-0.096
Token leave
Token ID2666
Feature activation-2.21e+00

Pos logit contributions
wic+0.108
ions+0.107
ocol+0.106
icer+0.105
ptions+0.104
Neg logit contributions
 back-0.053
 know-0.044
 goodbye-0.042
 happy-0.042
 proud-0.041
Token it
Token ID340
Feature activation-3.25e+00

Pos logit contributions
 back+0.063
 around+0.046
 and+0.045
 goodbye+0.045
 happy+0.045
Neg logit contributions
wic-0.135
olving-0.133
ptions-0.133
lex-0.130
ocol-0.129
Token alone
Token ID3436
Feature activation-1.18e+00

Pos logit contributions
 back+0.080
 goodbye+0.065
 proud+0.063
 happy+0.061
 woman+0.059
Neg logit contributions
wic-0.193
ocol-0.190
ptions-0.186
ions-0.186
olving-0.185
Token.
Token ID13
Feature activation-1.62e-01

Pos logit contributions
 back+0.030
 happy+0.022
 around+0.022
 excited+0.022
 and+0.022
Neg logit contributions
wic-0.074
ocol-0.073
lex-0.071
ptions-0.071
olving-0.069
Token Sp
Token ID1338
Feature activation+1.40e+01

Pos logit contributions
 back+0.005
 goodbye+0.004
 relieved+0.004
 proud+0.004
 woman+0.004
Neg logit contributions
wic-0.009
ocol-0.009
ptions-0.009
olving-0.009
ions-0.008
Tokeniders
Token ID4157
Feature activation+5.35e+00

Pos logit contributions
ocol+0.420
wic+0.402
ptions+0.360
arine+0.350
 Sp+0.350
Neg logit contributions
 back-0.665
 and-0.657
he-0.644
 excited-0.640
 woman-0.635
Token can
Token ID460
Feature activation+7.78e+00

Pos logit contributions
wic+0.217
ocol+0.215
ions+0.212
olving+0.209
ptions+0.205
Neg logit contributions
 back-0.216
 goodbye-0.198
 woman-0.196
 proud-0.194
 relieved-0.190
Token be
Token ID307
Feature activation+1.70e+00

Pos logit contributions
ocol+0.327
wic+0.326
lex+0.321
ions+0.320
olving+0.316
Neg logit contributions
 goodbye-0.277
 back-0.275
 woman-0.262
 relieved-0.260
 proud-0.259
Token a
Token ID257
Feature activation+7.33e-01

Pos logit contributions
wic+0.090
ions+0.087
ocol+0.087
olving+0.086
lex+0.086
Neg logit contributions
 back-0.056
 proud-0.048
 happy-0.048
 relieved-0.046
 goodbye-0.046
Token bit
Token ID1643
Feature activation+2.38e+00

Pos logit contributions
ions+0.043
ocol+0.043
wic+0.042
ptions+0.041
icer+0.040
Neg logit contributions
 back-0.023
 happy-0.020
 proud-0.019
 woman-0.019
 goodbye-0.018
Token named
Token ID3706
Feature activation+1.05e+00

Pos logit contributions
ocol+0.184
wic+0.183
ptions+0.181
olving+0.179
lex+0.176
Neg logit contributions
 back-0.107
 woman-0.092
 goodbye-0.091
 proud-0.091
 thanked-0.088
Token Sp
Token ID1338
Feature activation+1.37e+01

Pos logit contributions
ocol+0.046
olving+0.046
 centimeters+0.045
wic+0.045
 topp+0.044
Neg logit contributions
 back-0.041
 around-0.037
-0.035
 out-0.035
 woman-0.035
Tokenide
Token ID485
Feature activation+4.22e+00

Pos logit contributions
ocol+0.498
wic+0.478
 cheating+0.451
ptions+0.446
arine+0.443
Neg logit contributions
 back-0.569
 and-0.560
he-0.553
 woman-0.542
 around-0.538
Tokeny
Token ID88
Feature activation+1.37e+00

Pos logit contributions
ocol+0.228
ptions+0.226
lex+0.219
 cheating+0.217
arine+0.217
Neg logit contributions
 back-0.135
 and-0.116
 around-0.114
 happy-0.108
 excited-0.107
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
ocol+0.095
ptions+0.093
olving+0.090
lex+0.089
ions+0.089
Neg logit contributions
 back-0.025
 woman-0.021
 and-0.021
 thanked-0.018
 happy-0.018
Token Sp
Token ID1338
Feature activation+1.40e+01

Pos logit contributions
 He+0.001
+0.001
 excited+0.001
 She+0.001
 back+0.001
Neg logit contributions
gh-0.000
 injuries-0.000
ocol-0.000
ptions-0.000
bek-0.000
Tokenide
Token ID485
Feature activation+4.50e+00

Pos logit contributions
ocol+0.440
wic+0.433
ptions+0.401
 cheating+0.398
arine+0.393
Neg logit contributions
 back-0.662
 happy-0.622
 proud-0.619
 excited-0.617
 around-0.612
Tokeny
Token ID88
Feature activation+1.44e+00

Pos logit contributions
ocol+0.212
ptions+0.205
wic+0.204
lex+0.204
olving+0.199
Neg logit contributions
 back-0.165
 around-0.143
 happy-0.142
 excited-0.141
 thanked-0.141
Token lived
Token ID5615
Feature activation+2.09e+00

Pos logit contributions
ocol+0.077
wic+0.076
ptions+0.075
ions+0.075
olving+0.075
Neg logit contributions
 back-0.042
 proud-0.036
 goodbye-0.036
 woman-0.035
 happy-0.035
Token in
Token ID287
Feature activation+3.35e+00

Pos logit contributions
wic+0.120
ocol+0.120
ions+0.116
lex+0.116
ptions+0.116
Neg logit contributions
 back-0.056
 proud-0.046
 goodbye-0.045
 happy-0.044
 relieved-0.043
Token a
Token ID257
Feature activation+4.20e-01

Pos logit contributions
wic+0.212
ocol+0.208
ptions+0.207
lex+0.207
 degrees+0.204
Neg logit contributions
 back-0.074
 happy-0.062
 proud-0.058
 excited-0.057
 relieved-0.052
Token rooms
Token ID9519
Feature activation+4.39e+00

Pos logit contributions
 back+0.011
 happy+0.010
 proud+0.010
 excited+0.010
 goodbye+0.009
Neg logit contributions
wic-0.018
lex-0.018
olving-0.018
ptions-0.018
arine-0.018
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
wic+0.260
ocol+0.258
ions+0.252
olving+0.252
ptions+0.252
Neg logit contributions
 back-0.104
 goodbye-0.086
 proud-0.084
 woman-0.081
 happy-0.079
Token One
Token ID1881
Feature activation+2.07e+00

Pos logit contributions
 back+0.000
 happy+0.000
 excited+0.000
 proud+0.000
 goodbye+0.000
Neg logit contributions
wic-0.001
ocol-0.001
olving-0.001
ptions-0.001
lex-0.001
Token day
Token ID1110
Feature activation-1.16e+00

Pos logit contributions
wic+0.097
ions+0.096
olving+0.095
ocol+0.094
 Sp+0.093
Neg logit contributions
 back-0.071
 goodbye-0.065
 proud-0.062
 thanked-0.061
 woman-0.061
Token,
Token ID11
Feature activation+3.50e+00

Pos logit contributions
 back+0.022
 excited+0.021
 happy+0.020
 goodbye+0.020
 and+0.019
Neg logit contributions
lex-0.077
ptions-0.077
arine-0.076
olving-0.076
icer-0.075
Token Sp
Token ID1338
Feature activation+1.40e+01

Pos logit contributions
ocol+0.117
olving+0.115
lex+0.112
arine+0.109
gh+0.106
Neg logit contributions
 happy-0.181
 excited-0.178
 back-0.177
 proud-0.175
 and-0.168
Tokenide
Token ID485
Feature activation+4.46e+00

Pos logit contributions
ocol+0.410
wic+0.406
ptions+0.377
 cheating+0.371
olving+0.369
Neg logit contributions
 back-0.707
 proud-0.656
 happy-0.655
 woman-0.652
 excited-0.647
Tokeny
Token ID88
Feature activation+1.40e+00

Pos logit contributions
ocol+0.197
wic+0.193
lex+0.189
ptions+0.188
olving+0.186
Neg logit contributions
 back-0.176
 goodbye-0.150
 woman-0.149
 proud-0.148
 happy-0.148
Token was
Token ID373
Feature activation+2.18e+00

Pos logit contributions
wic+0.074
ocol+0.072
ions+0.072
olving+0.072
ptions+0.071
Neg logit contributions
 back-0.040
 goodbye-0.035
 proud-0.035
 woman-0.033
 relieved-0.033
Token trying
Token ID2111
Feature activation+2.53e-01

Pos logit contributions
ocol+0.122
wic+0.121
lex+0.119
ions+0.117
olving+0.116
Neg logit contributions
 back-0.068
 proud-0.057
 happy-0.056
 excited-0.055
 goodbye-0.053
Token to
Token ID284
Feature activation+1.70e+00

Pos logit contributions
ocol+0.019
wic+0.018
icer+0.018
ions+0.018
lex+0.018
Neg logit contributions
 back-0.004
 happy-0.002
 proud-0.002
 excited-0.002
 goodbye-0.002
Token,
Token ID11
Feature activation+3.45e+00

Pos logit contributions
 back+0.023
 goodbye+0.021
 and+0.019
 happy+0.018
 excited+0.018
Neg logit contributions
icer-0.088
lex-0.087
ptions-0.087
amy-0.086
uces-0.086
Token something
Token ID1223
Feature activation-3.74e-01

Pos logit contributions
ocol+0.162
lex+0.162
amy+0.157
ptions+0.153
arine+0.151
Neg logit contributions
 back-0.131
 happy-0.130
 excited-0.125
 proud-0.123
 goodbye-0.119
Token strange
Token ID6283
Feature activation+2.10e+00

Pos logit contributions
 back+0.012
 goodbye+0.011
 proud+0.011
 woman+0.010
 happy+0.010
Neg logit contributions
wic-0.019
ocol-0.019
ions-0.018
olving-0.018
ptions-0.018
Token happened
Token ID3022
Feature activation-1.56e+00

Pos logit contributions
ocol+0.102
lex+0.099
ptions+0.099
icer+0.099
wic+0.098
Neg logit contributions
 back-0.076
 thanked-0.066
 goodbye-0.066
 around-0.064
 proud-0.063
Token.
Token ID13
Feature activation-2.00e-02

Pos logit contributions
 back+0.031
 and+0.026
 around+0.020
 excited+0.017
 out+0.016
Neg logit contributions
ocol-0.117
lex-0.115
ptions-0.113
icer-0.112
wic-0.109
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
 back+0.001
+0.001
 excited+0.001
 happy+0.001
 proud+0.001
Neg logit contributions
ocol-0.001
olving-0.001
wic-0.001
 cheating-0.001
lex-0.001
Tokenide
Token ID485
Feature activation+4.40e+00

Pos logit contributions
ocol+0.423
wic+0.419
ptions+0.386
 cheating+0.383
olving+0.379
Neg logit contributions
 back-0.683
 happy-0.637
 proud-0.633
 excited-0.630
 thanked-0.628
Tokeny
Token ID88
Feature activation+1.31e+00

Pos logit contributions
ocol+0.209
wic+0.202
lex+0.202
ptions+0.200
olving+0.196
Neg logit contributions
 back-0.164
 around-0.138
 thanked-0.135
 excited-0.134
 goodbye-0.134
Token's
Token ID338
Feature activation-2.02e+00

Pos logit contributions
ocol+0.076
wic+0.074
ions+0.073
lex+0.073
olving+0.072
Neg logit contributions
 back-0.035
 thanked-0.030
 goodbye-0.029
 proud-0.029
 woman-0.029
Token web
Token ID3992
Feature activation+3.62e+00

Pos logit contributions
 back+0.061
 goodbye+0.053
 happy+0.050
 woman+0.049
 proud+0.049
Neg logit contributions
ocol-0.111
wic-0.111
ions-0.109
ptions-0.109
lex-0.108
Token started
Token ID2067
Feature activation-5.22e-02

Pos logit contributions
ocol+0.202
lex+0.198
ptions+0.197
wic+0.196
icer+0.192
Neg logit contributions
 back-0.109
 goodbye-0.089
 thanked-0.088
 proud-0.086
 happy-0.085
Token named
Token ID3706
Feature activation+1.39e+00

Pos logit contributions
ocol+0.164
wic+0.162
ptions+0.160
olving+0.159
ions+0.157
Neg logit contributions
 back-0.122
 proud-0.107
 goodbye-0.106
 woman-0.106
 happy-0.103
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
ocol+0.064
olving+0.061
 arose+0.059
 centimeters+0.059
wic+0.058
Neg logit contributions
 back-0.059
 around-0.049
 happy-0.047
 out-0.047
 and-0.046
Tokenide
Token ID485
Feature activation+4.02e+00

Pos logit contributions
ocol+0.473
wic+0.454
 cheating+0.433
ptions+0.429
arine+0.418
Neg logit contributions
 back-0.609
 happy-0.579
 and-0.579
 woman-0.575
 proud-0.570
Tokeny
Token ID88
Feature activation+1.18e+00

Pos logit contributions
ocol+0.228
ptions+0.224
lex+0.217
arine+0.217
 cheating+0.216
Neg logit contributions
 back-0.116
 and-0.095
 around-0.094
 happy-0.093
 excited-0.089
Token.
Token ID13
Feature activation-8.95e-02

Pos logit contributions
ocol+0.082
ptions+0.080
olving+0.079
ions+0.077
arine+0.077
Neg logit contributions
 back-0.021
 woman-0.017
 and-0.016
 happy-0.015
 proud-0.015
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
 He+0.005
+0.005
 She+0.005
 excited+0.005
 back+0.005
Neg logit contributions
gh-0.002
ocol-0.002
bek-0.002
 injuries-0.002
ptions-0.002
Tokenide
Token ID485
Feature activation+4.32e+00

Pos logit contributions
ocol+0.419
wic+0.410
ptions+0.382
olving+0.377
 cheating+0.376
Neg logit contributions
 back-0.688
 happy-0.646
 proud-0.644
 excited-0.636
 woman-0.635
Tokeny
Token ID88
Feature activation+1.25e+00

Pos logit contributions
ocol+0.192
wic+0.186
ptions+0.186
lex+0.183
olving+0.183
Neg logit contributions
 back-0.167
 proud-0.144
 happy-0.144
 thanked-0.143
 excited-0.142
Token lived
Token ID5615
Feature activation+2.20e+00

Pos logit contributions
wic+0.061
ocol+0.061
ions+0.059
olving+0.059
ptions+0.059
Neg logit contributions
 back-0.042
 goodbye-0.037
 proud-0.036
 woman-0.036
 happy-0.035
Token in
Token ID287
Feature activation+3.25e+00

Pos logit contributions
wic+0.109
ions+0.109
olving+0.109
ocol+0.107
ptions+0.107
Neg logit contributions
 back-0.065
 goodbye-0.061
 woman-0.059
 thanked-0.056
 relieved-0.056
Token a
Token ID257
Feature activation+4.07e-01

Pos logit contributions
wic+0.189
ocol+0.188
olving+0.184
ptions+0.184
lex+0.183
Neg logit contributions
 back-0.082
 proud-0.068
 happy-0.068
 goodbye-0.065
 woman-0.064
Token attic
Token ID46113
Feature activation+1.29e+00

Pos logit contributions
ocol+0.003
wic+0.003
ions+0.003
ptions+0.003
lex+0.003
Neg logit contributions
 back-0.002
 happy-0.001
 proud-0.001
 woman-0.001
 excited-0.001
Token.
Token ID13
Feature activation+5.64e-02

Pos logit contributions
wic+0.071
ocol+0.070
ions+0.069
olving+0.069
ptions+0.068
Neg logit contributions
 back-0.035
 goodbye-0.030
 proud-0.030
 woman-0.029
 relieved-0.028
Token Every
Token ID3887
Feature activation-1.80e+00

Pos logit contributions
ocol+0.003
wic+0.003
olving+0.003
ptions+0.003
lex+0.003
Neg logit contributions
 back-0.002
 excited-0.002
 happy-0.002
 proud-0.002
 relieved-0.002
Token day
Token ID1110
Feature activation-1.14e+00

Pos logit contributions
 back+0.048
 goodbye+0.041
 happy+0.039
 woman+0.038
 proud+0.038
Neg logit contributions
ocol-0.105
wic-0.104
ptions-0.103
lex-0.101
olving-0.101
Token,
Token ID11
Feature activation+3.58e+00

Pos logit contributions
 back+0.028
 of+0.026
 excited+0.026
 goodbye+0.026
 happy+0.025
Neg logit contributions
icer-0.069
ptions-0.069
lex-0.069
ocol-0.068
arine-0.068
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
lex+0.111
ocol+0.106
arine+0.106
amy+0.106
gh+0.105
Neg logit contributions
 happy-0.191
 excited-0.186
 back-0.185
 proud-0.181
 he-0.180
Tokenide
Token ID485
Feature activation+4.42e+00

Pos logit contributions
ocol+0.433
wic+0.422
ptions+0.395
 cheating+0.393
olving+0.384
Neg logit contributions
 back-0.678
 happy-0.631
 proud-0.627
 excited-0.625
 woman-0.624
Tokeny
Token ID88
Feature activation+1.36e+00

Pos logit contributions
ocol+0.213
wic+0.205
lex+0.205
ptions+0.204
 centimeters+0.199
Neg logit contributions
 back-0.164
 around-0.137
 happy-0.133
 woman-0.133
 and-0.132
Token would
Token ID561
Feature activation+3.62e+00

Pos logit contributions
ocol+0.071
wic+0.070
ions+0.068
ptions+0.067
lex+0.067
Neg logit contributions
 back-0.043
 goodbye-0.037
 woman-0.036
 proud-0.036
 thanked-0.035
Token spin
Token ID7906
Feature activation+1.54e+00

Pos logit contributions
wic+0.201
ocol+0.201
ions+0.197
ptions+0.195
lex+0.193
Neg logit contributions
 back-0.104
 goodbye-0.088
 proud-0.085
 happy-0.084
 woman-0.082
Token cob
Token ID22843
Feature activation+6.17e+00

Pos logit contributions
ocol+0.113
wic+0.113
olving+0.111
ions+0.111
lex+0.110
Neg logit contributions
 back-0.016
 goodbye-0.008
 proud-0.008
 happy-0.007
 woman-0.006
Token his
Token ID465
Feature activation-2.87e+00

Pos logit contributions
lex+0.055
ocol+0.054
wic+0.054
ptions+0.054
olving+0.054
Neg logit contributions
 back-0.025
 happy-0.021
 goodbye-0.020
 excited-0.020
 proud-0.019
Token house
Token ID2156
Feature activation-7.27e-01

Pos logit contributions
five+0.091
 goodbye+0.089
 back+0.088
 thanked+0.087
 proud+0.086
Neg logit contributions
olving-0.128
wic-0.125
 Sp-0.125
icer-0.122
 centimeters-0.121
Token look
Token ID804
Feature activation-1.63e+00

Pos logit contributions
 back+0.023
 proud+0.019
 goodbye+0.019
 happy+0.019
 relieved+0.018
Neg logit contributions
ocol-0.039
wic-0.038
ions-0.038
lex-0.037
ptions-0.037
Token dirty
Token ID11841
Feature activation+4.07e+00

Pos logit contributions
 back+0.049
 goodbye+0.044
 woman+0.044
 proud+0.042
 thanked+0.042
Neg logit contributions
wic-0.083
ocol-0.082
olving-0.080
ptions-0.080
ions-0.080
Token.
Token ID13
Feature activation-2.27e-01

Pos logit contributions
ocol+0.245
wic+0.244
ptions+0.238
ions+0.238
lex+0.237
Neg logit contributions
 back-0.097
 goodbye-0.076
 proud-0.074
 happy-0.071
 woman-0.070
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
 back+0.003
 goodbye+0.002
 proud+0.002
 happy+0.002
 thanked+0.002
Neg logit contributions
ocol-0.015
wic-0.015
olving-0.015
ions-0.015
ptions-0.015
Tokenide
Token ID485
Feature activation+4.33e+00

Pos logit contributions
ocol+0.402
wic+0.399
olving+0.365
ptions+0.364
lex+0.359
Neg logit contributions
 back-0.707
 happy-0.660
 proud-0.659
 thanked-0.651
 excited-0.650
Tokeny
Token ID88
Feature activation+1.20e+00

Pos logit contributions
ocol+0.186
wic+0.184
olving+0.179
ptions+0.178
lex+0.177
Neg logit contributions
 back-0.174
 goodbye-0.152
 proud-0.150
 woman-0.150
 thanked-0.149
Token decided
Token ID3066
Feature activation-1.92e+00

Pos logit contributions
wic+0.066
ocol+0.066
ions+0.065
olving+0.064
ptions+0.064
Neg logit contributions
 back-0.033
 goodbye-0.028
 proud-0.027
 woman-0.027
 thanked-0.026
Token to
Token ID284
Feature activation+1.61e+00

Pos logit contributions
 back+0.046
 proud+0.042
 relieved+0.042
 goodbye+0.040
 woman+0.040
Neg logit contributions
wic-0.100
 cheating-0.099
 Sp-0.099
olving-0.098
 arose-0.097
Token clean
Token ID3424
Feature activation+4.22e+00

Pos logit contributions
ocol+0.093
 centimeters+0.091
wic+0.091
ptions+0.091
icer+0.090
Neg logit contributions
 back-0.046
 know-0.038
 thanked-0.036
 face-0.036
 happy-0.036
Token named
Token ID3706
Feature activation+8.82e-01

Pos logit contributions
wic+0.164
ocol+0.163
ptions+0.160
olving+0.159
ions+0.158
Neg logit contributions
 back-0.110
 goodbye-0.096
 proud-0.095
 woman-0.095
 thanked-0.092
Token Sp
Token ID1338
Feature activation+1.36e+01

Pos logit contributions
wic+0.038
ocol+0.038
 topp+0.037
 centimeters+0.037
olving+0.037
Neg logit contributions
 back-0.036
 around-0.032
 woman-0.030
 excited-0.030
-0.030
Tokenide
Token ID485
Feature activation+4.08e+00

Pos logit contributions
ocol+0.478
wic+0.464
 cheating+0.435
ptions+0.433
arine+0.428
Neg logit contributions
 back-0.572
 and-0.556
he-0.555
 woman-0.549
 around-0.539
Tokeny
Token ID88
Feature activation+1.21e+00

Pos logit contributions
ocol+0.217
ptions+0.216
lex+0.209
 cheating+0.207
arine+0.207
Neg logit contributions
 back-0.134
 and-0.115
 around-0.113
 happy-0.107
 excited-0.106
Token.
Token ID13
Feature activation-2.75e-01

Pos logit contributions
ocol+0.081
ptions+0.080
lex+0.077
olving+0.077
ions+0.076
Neg logit contributions
 back-0.024
 woman-0.020
 and-0.019
 thanked-0.019
 goodbye-0.018
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
+0.014
 He+0.014
 excited+0.013
 back+0.013
 She+0.013
Neg logit contributions
gh-0.009
olving-0.009
ocol-0.009
ptions-0.008
 injuries-0.008
Tokenide
Token ID485
Feature activation+4.40e+00

Pos logit contributions
ocol+0.410
wic+0.409
ptions+0.379
olving+0.371
 cheating+0.370
Neg logit contributions
 back-0.694
 proud-0.646
 happy-0.645
 woman-0.642
 excited-0.638
Tokeny
Token ID88
Feature activation+1.32e+00

Pos logit contributions
ocol+0.201
wic+0.195
ptions+0.194
lex+0.193
olving+0.189
Neg logit contributions
 back-0.166
 happy-0.143
 thanked-0.143
 around-0.142
 proud-0.142
Token lived
Token ID5615
Feature activation+1.93e+00

Pos logit contributions
ocol+0.071
wic+0.070
ions+0.069
olving+0.068
ptions+0.068
Neg logit contributions
 back-0.038
 goodbye-0.033
 proud-0.032
 woman-0.032
 thanked-0.031
Token in
Token ID287
Feature activation+3.24e+00

Pos logit contributions
wic+0.111
ocol+0.110
ions+0.107
ptions+0.107
lex+0.107
Neg logit contributions
 back-0.051
 proud-0.043
 goodbye-0.042
 happy-0.041
 woman-0.040
Token a
Token ID257
Feature activation+3.17e-01

Pos logit contributions
wic+0.200
ocol+0.196
ptions+0.196
lex+0.195
 degrees+0.192
Neg logit contributions
 back-0.076
 happy-0.063
 proud-0.060
 excited-0.059
 goodbye-0.056
Token back
Token ID736
Feature activation-4.94e+00

Pos logit contributions
ions+0.002
wic+0.002
ocol+0.002
 Sp+0.002
olving+0.002
Neg logit contributions
 woman-0.000
 proud-0.000
 relieved-0.000
 wife-0.000
 thanked-0.000
Token to
Token ID284
Feature activation+1.62e+00

Pos logit contributions
 proud+0.085
 relieved+0.083
 woman+0.083
 thanked+0.080
 goodbye+0.075
Neg logit contributions
ions-0.294
olving-0.287
wic-0.286
 Sp-0.284
ocol-0.281
Token their
Token ID511
Feature activation-2.68e+00

Pos logit contributions
ocol+0.079
wic+0.079
ions+0.078
 Sp+0.077
olving+0.077
Neg logit contributions
 back-0.050
 goodbye-0.044
 woman-0.044
 proud-0.044
 relieved-0.043
Token homes
Token ID5682
Feature activation+1.25e+00

Pos logit contributions
 back+0.077
 goodbye+0.063
 proud+0.060
 happy+0.060
 woman+0.059
Neg logit contributions
ocol-0.152
ions-0.152
wic-0.151
ptions-0.149
olving-0.147
Token.
Token ID13
Feature activation-1.99e-01

Pos logit contributions
wic+0.063
olving+0.063
ions+0.063
ocol+0.063
ptions+0.061
Neg logit contributions
 back-0.035
 proud-0.031
 goodbye-0.031
 woman-0.031
 relieved-0.030
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
 back+0.006
 goodbye+0.005
 proud+0.005
 woman+0.005
 relieved+0.005
Neg logit contributions
wic-0.010
ocol-0.010
ions-0.010
olving-0.010
ptions-0.010
Tokenide
Token ID485
Feature activation+4.25e+00

Pos logit contributions
wic+0.381
ocol+0.380
ptions+0.353
olving+0.351
ions+0.349
Neg logit contributions
 back-0.738
 proud-0.687
 happy-0.681
 thanked-0.679
 goodbye-0.675
Tokeny
Token ID88
Feature activation+1.18e+00

Pos logit contributions
ocol+0.178
wic+0.177
ptions+0.172
olving+0.171
ions+0.171
Neg logit contributions
 back-0.174
 goodbye-0.156
 proud-0.153
 woman-0.152
 thanked-0.151
Token was
Token ID373
Feature activation+1.80e+00

Pos logit contributions
wic+0.059
olving+0.056
ions+0.056
 Sp+0.056
 centimeters+0.055
Neg logit contributions
 back-0.035
 proud-0.030
 goodbye-0.030
 woman-0.029
 relieved-0.028
Token so
Token ID523
Feature activation-8.17e-01

Pos logit contributions
olving+0.092
 centimeters+0.092
ions+0.092
wic+0.091
 Sp+0.091
Neg logit contributions
 woman-0.040
five-0.038
 pony-0.036
 wife-0.036
 back-0.035
Token happy
Token ID3772
Feature activation-5.18e+00

Pos logit contributions
 pony+0.015
 woman+0.015
 back+0.015
five+0.013
 wife+0.012
Neg logit contributions
olving-0.045
ocol-0.044
 Sp-0.044
 centimeters-0.044
ions-0.043
Token forest
Token ID8222
Feature activation+2.12e+00

Pos logit contributions
wic+0.003
ocol+0.003
ions+0.003
olving+0.003
ptions+0.003
Neg logit contributions
 back-0.002
 goodbye-0.002
 proud-0.002
 woman-0.002
 happy-0.002
Token.
Token ID13
Feature activation-1.48e-01

Pos logit contributions
wic+0.121
ocol+0.121
ions+0.118
ptions+0.117
lex+0.117
Neg logit contributions
 back-0.055
 proud-0.046
 goodbye-0.045
 happy-0.045
 woman-0.044
Token One
Token ID1881
Feature activation+1.95e+00

Pos logit contributions
 back+0.006
 happy+0.005
 excited+0.005
 proud+0.005
 goodbye+0.005
Neg logit contributions
ocol-0.006
wic-0.006
olving-0.006
ptions-0.006
 centimeters-0.006
Token day
Token ID1110
Feature activation-1.25e+00

Pos logit contributions
 Sp+0.078
wic+0.077
olving+0.076
ions+0.075
lex+0.074
Neg logit contributions
 back-0.075
 goodbye-0.074
 thanked-0.069
 relieved-0.068
 proud-0.067
Token,
Token ID11
Feature activation+3.46e+00

Pos logit contributions
 back+0.022
 excited+0.019
 happy+0.019
 and+0.018
 goodbye+0.018
Neg logit contributions
ptions-0.085
arine-0.083
lex-0.083
ocol-0.082
olving-0.082
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
ocol+0.093
olving+0.090
lex+0.089
arine+0.088
ptions+0.085
Neg logit contributions
 happy-0.204
 excited-0.200
 proud-0.198
 back-0.196
 and-0.189
Tokenide
Token ID485
Feature activation+4.29e+00

Pos logit contributions
ocol+0.427
wic+0.416
ptions+0.391
 cheating+0.387
olving+0.379
Neg logit contributions
 back-0.673
 happy-0.632
 proud-0.627
 excited-0.622
 woman-0.620
Tokeny
Token ID88
Feature activation+1.19e+00

Pos logit contributions
ocol+0.182
wic+0.180
ptions+0.174
lex+0.174
olving+0.174
Neg logit contributions
 back-0.174
 goodbye-0.152
 proud-0.151
 woman-0.150
 happy-0.149
Token noticed
Token ID6810
Feature activation-9.12e-01

Pos logit contributions
wic+0.061
olving+0.059
ptions+0.058
ions+0.058
 Sp+0.058
Neg logit contributions
 back-0.035
 goodbye-0.032
 proud-0.031
 relieved-0.030
 woman-0.029
Token that
Token ID326
Feature activation-3.29e+00

Pos logit contributions
 back+0.020
 proud+0.015
 happy+0.015
 goodbye+0.014
 woman+0.014
Neg logit contributions
ocol-0.059
lex-0.057
olving-0.056
wic-0.056
ptions-0.056
Token there
Token ID612
Feature activation+2.89e-01

Pos logit contributions
 back+0.104
 goodbye+0.091
 proud+0.089
 woman+0.088
 relieved+0.086
Neg logit contributions
wic-0.168
ocol-0.167
ions-0.164
olving-0.163
ptions-0.162
Token From
Token ID3574
Feature activation-1.62e+00

Pos logit contributions
 back+0.003
 goodbye+0.003
 proud+0.003
 relieved+0.003
 thanked+0.003
Neg logit contributions
ocol-0.003
wic-0.003
ptions-0.003
olving-0.003
lex-0.003
Token that
Token ID326
Feature activation-3.33e+00

Pos logit contributions
 back+0.043
 goodbye+0.037
 woman+0.036
 proud+0.036
 relieved+0.034
Neg logit contributions
wic-0.089
ocol-0.088
ions-0.088
olving-0.087
ptions-0.086
Token day
Token ID1110
Feature activation-1.23e+00

Pos logit contributions
 back+0.104
 goodbye+0.090
 proud+0.089
 woman+0.087
 happy+0.086
Neg logit contributions
wic-0.173
ocol-0.172
ions-0.168
ptions-0.167
olving-0.167
Token on
Token ID319
Feature activation+1.99e+00

Pos logit contributions
 back+0.027
 goodbye+0.021
 proud+0.021
 happy+0.019
 relieved+0.019
Neg logit contributions
wic-0.077
ocol-0.076
ptions-0.075
olving-0.074
lex-0.074
Token,
Token ID11
Feature activation+3.07e+00

Pos logit contributions
ptions+0.150
icious+0.140
amy+0.139
redible+0.139
lex+0.137
Neg logit contributions
 back-0.034
 and-0.029
 he-0.025
 around-0.025
 happy-0.022
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
lex+0.076
ocol+0.075
arine+0.073
ptions+0.071
amy+0.071
Neg logit contributions
 happy-0.185
 back-0.181
 excited-0.177
 proud-0.177
 around-0.172
Tokenide
Token ID485
Feature activation+4.23e+00

Pos logit contributions
ocol+0.421
wic+0.414
ptions+0.384
 cheating+0.380
lex+0.378
Neg logit contributions
 back-0.689
 happy-0.640
 proud-0.633
 excited-0.630
 thanked-0.630
Tokeny
Token ID88
Feature activation+1.00e+00

Pos logit contributions
ocol+0.198
wic+0.193
lex+0.190
ptions+0.190
 cheating+0.183
Neg logit contributions
 back-0.162
 goodbye-0.133
 around-0.133
 happy-0.133
 thanked-0.133
Token knew
Token ID2993
Feature activation-2.88e+00

Pos logit contributions
wic+0.053
olving+0.052
ions+0.052
ocol+0.052
ptions+0.051
Neg logit contributions
 back-0.029
 goodbye-0.025
 proud-0.024
 woman-0.024
 relieved-0.023
Token that
Token ID326
Feature activation-3.35e+00

Pos logit contributions
 back+0.054
 goodbye+0.041
 proud+0.040
 happy+0.037
 woman+0.037
Neg logit contributions
ocol-0.189
wic-0.189
ptions-0.185
lex-0.183
ions-0.182
Token even
Token ID772
Feature activation-2.51e+00

Pos logit contributions
 back+0.094
 happy+0.084
 proud+0.079
 excited+0.078
 goodbye+0.078
Neg logit contributions
ocol-0.196
wic-0.194
ptions-0.190
icer-0.190
lex-0.188
Token caught
Token ID4978
Feature activation+2.14e+00

Pos logit contributions
 back+0.066
 goodbye+0.055
 proud+0.054
 woman+0.053
 happy+0.053
Neg logit contributions
wic-0.119
ocol-0.119
lex-0.116
ptions-0.115
ions-0.114
Token in
Token ID287
Feature activation+3.08e+00

Pos logit contributions
lex+0.150
elve+0.147
ework+0.147
ptions+0.147
wic+0.145
Neg logit contributions
 back-0.048
 and-0.034
 around-0.032
 up-0.029
 out-0.028
Token his
Token ID465
Feature activation-3.02e+00

Pos logit contributions
wic+0.222
iders+0.218
lex+0.215
ptions+0.211
ocol+0.210
Neg logit contributions
 back-0.066
 and-0.053
 happy-0.042
 home-0.039
 excited-0.038
Token web
Token ID3992
Feature activation+3.49e+00

Pos logit contributions
five+0.083
 thanked+0.082
 proud+0.082
 goodbye+0.080
 relieved+0.079
Neg logit contributions
olving-0.144
wic-0.141
ocol-0.137
 centimeters-0.135
 Sp-0.135
Token.
Token ID13
Feature activation-1.41e-02

Pos logit contributions
wic+0.215
ocol+0.215
ptions+0.210
lex+0.209
icer+0.204
Neg logit contributions
 back-0.083
 goodbye-0.067
 proud-0.062
 happy-0.061
 woman-0.060
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
 back+0.000
 goodbye+0.000
 relieved+0.000
 excited+0.000
 proud+0.000
Neg logit contributions
ocol-0.001
ptions-0.001
wic-0.001
olving-0.001
lex-0.001
Tokenide
Token ID485
Feature activation+4.20e+00

Pos logit contributions
wic+0.397
ocol+0.395
olving+0.361
ptions+0.360
lex+0.359
Neg logit contributions
 back-0.710
 proud-0.657
 happy-0.657
 thanked-0.652
 woman-0.650
Tokeny
Token ID88
Feature activation+1.01e+00

Pos logit contributions
ocol+0.195
wic+0.190
lex+0.189
ptions+0.188
olving+0.186
Neg logit contributions
 back-0.159
 thanked-0.134
 happy-0.133
 goodbye-0.133
 around-0.133
Token was
Token ID373
Feature activation+1.99e+00

Pos logit contributions
wic+0.053
olving+0.051
ocol+0.051
ions+0.051
ptions+0.051
Neg logit contributions
 back-0.030
 goodbye-0.026
 proud-0.025
 woman-0.025
 relieved-0.024
Token so
Token ID523
Feature activation-6.22e-01

Pos logit contributions
olving+0.108
 centimeters+0.108
ptions+0.107
ions+0.107
 Sp+0.106
Neg logit contributions
 goodbye-0.042
 back-0.042
 woman-0.042
five-0.040
 wife-0.039
Token happy
Token ID3772
Feature activation-5.21e+00

Pos logit contributions
 pony+0.013
 goodbye+0.013
 back+0.013
 woman+0.012
 wife+0.012
Neg logit contributions
olving-0.036
 Sp-0.035
 centimeters-0.034
 arose-0.032
ocol-0.032
Token's
Token ID338
Feature activation-2.15e+00

Pos logit contributions
ocol+0.058
wic+0.057
lex+0.056
ions+0.056
olving+0.056
Neg logit contributions
 back-0.030
 goodbye-0.025
 proud-0.025
 thanked-0.024
 happy-0.024
Token web
Token ID3992
Feature activation+3.53e+00

Pos logit contributions
 back+0.069
 goodbye+0.061
 proud+0.060
 woman+0.059
 relieved+0.058
Neg logit contributions
wic-0.107
ocol-0.106
olving-0.105
ions-0.105
ptions-0.103
Token went
Token ID1816
Feature activation-1.86e+00

Pos logit contributions
ocol+0.200
lex+0.195
ptions+0.194
wic+0.193
icer+0.190
Neg logit contributions
 back-0.106
 goodbye-0.086
 thanked-0.085
 proud-0.084
 excited-0.083
Token flying
Token ID7348
Feature activation-2.79e+00

Pos logit contributions
 back+0.043
 around+0.027
 goodbye+0.027
 proud+0.025
 excited+0.023
Neg logit contributions
ocol-0.126
wic-0.125
lex-0.123
ions-0.121
ptions-0.121
Token.
Token ID13
Feature activation-2.36e-01

Pos logit contributions
 back+0.061
 around+0.045
 and+0.035
 goodbye+0.033
 excited+0.031
Neg logit contributions
ocol-0.204
wic-0.197
lex-0.194
ptions-0.191
esan-0.188
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
 back+0.005
 proud+0.004
 goodbye+0.004
 relieved+0.004
 thanked+0.004
Neg logit contributions
ocol-0.014
wic-0.014
ptions-0.014
olving-0.014
lex-0.014
Tokenide
Token ID485
Feature activation+4.21e+00

Pos logit contributions
ocol+0.397
wic+0.396
ptions+0.361
olving+0.359
lex+0.358
Neg logit contributions
 back-0.715
 happy-0.659
 proud-0.658
 thanked-0.655
 woman-0.653
Tokeny
Token ID88
Feature activation+1.04e+00

Pos logit contributions
ocol+0.191
wic+0.187
lex+0.182
ptions+0.181
olving+0.180
Neg logit contributions
 back-0.165
 goodbye-0.140
 thanked-0.139
 woman-0.138
 happy-0.137
Token held
Token ID2714
Feature activation-2.52e+00

Pos logit contributions
wic+0.052
olving+0.050
ions+0.050
ptions+0.050
ocol+0.050
Neg logit contributions
 back-0.033
 goodbye-0.029
 proud-0.029
 woman-0.028
 wife-0.027
Token on
Token ID319
Feature activation+1.96e+00

Pos logit contributions
 back+0.059
 goodbye+0.046
 proud+0.045
 happy+0.043
 relieved+0.042
Neg logit contributions
wic-0.156
ptions-0.154
ocol-0.154
olving-0.152
ions-0.150
Token tight
Token ID5381
Feature activation+1.19e+00

Pos logit contributions
wic+0.110
olving+0.110
ocol+0.108
lex+0.106
 centimeters+0.106
Neg logit contributions
 back-0.046
 woman-0.042
 goodbye-0.040
 proud-0.039
 wife-0.038
Token over
Token ID625
Feature activation-2.81e+00

Pos logit contributions
wic+0.088
ocol+0.088
lex+0.087
ions+0.086
ptions+0.085
Neg logit contributions
 back-0.040
 happy-0.033
 proud-0.033
 relieved-0.031
 goodbye-0.031
Token.
Token ID13
Feature activation-2.39e-01

Pos logit contributions
 back+0.057
 and+0.041
 happy+0.036
 excited+0.034
 home+0.033
Neg logit contributions
wic-0.207
ptions-0.204
lex-0.204
redible-0.202
iders-0.201
Token\n
Token ID198
Feature activation-2.45e+00

Pos logit contributions
 back+0.004
 goodbye+0.003
 woman+0.003
 proud+0.003
 happy+0.003
Neg logit contributions
wic-0.015
ocol-0.015
ions-0.015
olving-0.015
 Sp-0.015
Token\n
Token ID198
Feature activation-2.38e+00

Pos logit contributions
 back+0.052
 goodbye+0.046
 woman+0.043
 pony+0.041
 wife+0.041
Neg logit contributions
 Sp-0.149
ions-0.148
wic-0.146
olving-0.144
ocol-0.142
TokenWhen
Token ID2215
Feature activation+7.96e-01

Pos logit contributions
 proud+0.071
 happy+0.070
 back+0.069
 relieved+0.067
 thanked+0.066
Neg logit contributions
ocol-0.128
wic-0.123
lex-0.118
arine-0.117
ptions-0.116
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
ocol+0.044
ptions+0.044
lex+0.043
wic+0.043
olving+0.043
Neg logit contributions
 back-0.024
 proud-0.021
 happy-0.020
 goodbye-0.020
 excited-0.019
Tokenide
Token ID485
Feature activation+4.19e+00

Pos logit contributions
ocol+0.423
wic+0.421
 cheating+0.388
ptions+0.387
arine+0.377
Neg logit contributions
 back-0.671
 happy-0.626
 thanked-0.623
 excited-0.622
 proud-0.620
Tokeny
Token ID88
Feature activation+9.96e-01

Pos logit contributions
ocol+0.195
wic+0.191
ptions+0.187
lex+0.187
olving+0.182
Neg logit contributions
 back-0.162
 goodbye-0.135
 thanked-0.135
 around-0.133
 woman-0.132
Token opened
Token ID4721
Feature activation-2.60e-01

Pos logit contributions
wic+0.047
olving+0.046
ions+0.046
ptions+0.045
ocol+0.045
Neg logit contributions
 back-0.033
 goodbye-0.030
 proud-0.030
 woman-0.029
 wife-0.028
Token his
Token ID465
Feature activation-3.00e+00

Pos logit contributions
 back+0.008
 woman+0.007
 proud+0.007
 goodbye+0.007
 relieved+0.007
Neg logit contributions
wic-0.013
ocol-0.013
ions-0.013
olving-0.013
ptions-0.013
Token eyes
Token ID2951
Feature activation+1.79e-01

Pos logit contributions
five+0.111
 thanked+0.110
 proud+0.109
 woman+0.108
 goodbye+0.105
Neg logit contributions
olving-0.123
wic-0.120
ocol-0.115
 centimeters-0.111
 Sp-0.111
Token blew
Token ID17948
Feature activation-1.26e+00

Pos logit contributions
ocol+0.166
wic+0.166
lex+0.163
olving+0.161
ptions+0.160
Neg logit contributions
 back-0.127
 thanked-0.108
 proud-0.108
 goodbye-0.107
 woman-0.106
Token through
Token ID832
Feature activation+1.18e+00

Pos logit contributions
 back+0.031
 around+0.023
 and+0.022
 out+0.021
 off+0.019
Neg logit contributions
ework-0.087
ocol-0.087
olving-0.087
wic-0.086
lex-0.086
Token the
Token ID262
Feature activation-1.31e-02

Pos logit contributions
ocol+0.076
wic+0.076
lex+0.076
arine+0.075
olving+0.075
Neg logit contributions
 back-0.026
 and-0.023
 happy-0.019
 around-0.019
 excited-0.019
Token attic
Token ID46113
Feature activation+1.14e+00

Pos logit contributions
 back+0.000
 goodbye+0.000
 proud+0.000
 thanked+0.000
 relieved+0.000
Neg logit contributions
wic-0.001
olving-0.001
 centimeters-0.001
ptions-0.001
ions-0.001
Token and
Token ID290
Feature activation-4.10e+00

Pos logit contributions
wic+0.073
ocol+0.073
ions+0.071
olving+0.071
ptions+0.071
Neg logit contributions
 back-0.022
 goodbye-0.017
 proud-0.017
 woman-0.016
 thanked-0.016
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
 back+0.127
 proud+0.104
 thanked+0.102
 happy+0.102
 excited+0.100
Neg logit contributions
ocol-0.232
lex-0.225
wic-0.222
olving-0.220
ptions-0.217
Tokenide
Token ID485
Feature activation+4.19e+00

Pos logit contributions
ocol+0.419
wic+0.418
ptions+0.383
 cheating+0.381
lex+0.379
Neg logit contributions
 back-0.678
 happy-0.620
 thanked-0.619
 proud-0.619
 excited-0.618
Tokeny
Token ID88
Feature activation+1.04e+00

Pos logit contributions
ocol+0.199
wic+0.193
lex+0.192
ptions+0.189
olving+0.187
Neg logit contributions
 back-0.160
 around-0.132
 goodbye-0.132
 thanked-0.130
 excited-0.129
Token's
Token ID338
Feature activation-2.15e+00

Pos logit contributions
ocol+0.058
wic+0.057
lex+0.056
ions+0.056
olving+0.056
Neg logit contributions
 back-0.030
 goodbye-0.025
 proud-0.025
 thanked-0.024
 happy-0.024
Token web
Token ID3992
Feature activation+3.53e+00

Pos logit contributions
 back+0.069
 goodbye+0.061
 proud+0.060
 woman+0.059
 relieved+0.058
Neg logit contributions
wic-0.107
ocol-0.106
olving-0.105
ions-0.105
ptions-0.103
Token went
Token ID1816
Feature activation-1.86e+00

Pos logit contributions
ocol+0.200
lex+0.195
ptions+0.194
wic+0.193
icer+0.190
Neg logit contributions
 back-0.106
 goodbye-0.086
 thanked-0.085
 proud-0.084
 excited-0.083
Token was
Token ID373
Feature activation+2.24e+00

Pos logit contributions
ocol+0.022
wic+0.022
ptions+0.021
ions+0.021
olving+0.021
Neg logit contributions
 back-0.012
 proud-0.010
 goodbye-0.010
 happy-0.010
 woman-0.010
Token a
Token ID257
Feature activation+4.52e-01

Pos logit contributions
ions+0.113
wic+0.113
ocol+0.113
ptions+0.110
olving+0.110
Neg logit contributions
 back-0.069
 goodbye-0.062
 woman-0.059
 proud-0.058
 relieved-0.058
Token little
Token ID1310
Feature activation+6.39e-01

Pos logit contributions
ions+0.029
ocol+0.029
wic+0.029
olving+0.029
ptions+0.028
Neg logit contributions
 back-0.009
 proud-0.007
 happy-0.007
 woman-0.007
 goodbye-0.007
Token spider
Token ID19230
Feature activation+3.42e+00

Pos logit contributions
wic+0.042
olving+0.042
ocol+0.042
ptions+0.042
ions+0.042
Neg logit contributions
 back-0.011
 woman-0.008
 proud-0.008
 goodbye-0.008
 happy-0.007
Token named
Token ID3706
Feature activation+1.39e+00

Pos logit contributions
ocol+0.164
wic+0.162
ptions+0.160
olving+0.159
ions+0.157
Neg logit contributions
 back-0.122
 proud-0.107
 goodbye-0.106
 woman-0.106
 happy-0.103
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
ocol+0.064
olving+0.061
 arose+0.059
 centimeters+0.059
wic+0.058
Neg logit contributions
 back-0.059
 around-0.049
 happy-0.047
 out-0.047
 and-0.046
Tokenide
Token ID485
Feature activation+4.02e+00

Pos logit contributions
ocol+0.473
wic+0.454
 cheating+0.433
ptions+0.429
arine+0.418
Neg logit contributions
 back-0.609
 happy-0.579
 and-0.579
 woman-0.575
 proud-0.570
Tokeny
Token ID88
Feature activation+1.18e+00

Pos logit contributions
ocol+0.228
ptions+0.224
lex+0.217
arine+0.217
 cheating+0.216
Neg logit contributions
 back-0.116
 and-0.095
 around-0.094
 happy-0.093
 excited-0.089
Token.
Token ID13
Feature activation-8.95e-02

Pos logit contributions
ocol+0.082
ptions+0.080
olving+0.079
ions+0.077
arine+0.077
Neg logit contributions
 back-0.021
 woman-0.017
 and-0.016
 happy-0.015
 proud-0.015
Token Sp
Token ID1338
Feature activation+1.39e+01

Pos logit contributions
 He+0.005
+0.005
 She+0.005
 excited+0.005
 back+0.005
Neg logit contributions
gh-0.002
ocol-0.002
bek-0.002
 injuries-0.002
ptions-0.002
Tokenide
Token ID485
Feature activation+4.32e+00

Pos logit contributions
ocol+0.419
wic+0.410
ptions+0.382
olving+0.377
 cheating+0.376
Neg logit contributions
 back-0.688
 happy-0.646
 proud-0.644
 excited-0.636
 woman-0.635
Token catch
Token ID4929
Feature activation+9.80e-01

Pos logit contributions
wic+0.091
olving+0.091
 centimeters+0.090
ptions+0.089
ions+0.089
Neg logit contributions
 back-0.048
 happy-0.037
 proud-0.037
 goodbye-0.037
 thanked-0.036
Token Sp
Token ID1338
Feature activation+1.37e+01

Pos logit contributions
olving+0.058
ptions+0.057
lex+0.056
ocol+0.056
 centimeters+0.056
Neg logit contributions
 back-0.029
 around-0.023
 goodbye-0.021
 happy-0.021
 excited-0.021
Tokenide
Token ID485
Feature activation+4.03e+00

Pos logit contributions
ocol+0.482
wic+0.474
 cheating+0.439
ptions+0.417
olving+0.415
Neg logit contributions
 back-0.612
 and-0.563
 woman-0.559
 around-0.558
 happy-0.556
Tokeny
Token ID88
Feature activation+9.78e-01

Pos logit contributions
ptions+0.230
ocol+0.228
wic+0.223
lex+0.221
 cheating+0.219
Neg logit contributions
 back-0.131
 around-0.100
 and-0.098
 goodbye-0.096
 happy-0.088
Token.
Token ID13
Feature activation-1.06e-01

Pos logit contributions
ptions+0.064
ocol+0.062
wic+0.062
lex+0.061
icer+0.061
Neg logit contributions
 back-0.024
 goodbye-0.018
 and-0.017
 around-0.017
 happy-0.016
Token Sp
Token ID1338
Feature activation+1.38e+01

Pos logit contributions
 back+0.003
 proud+0.002
 relieved+0.002
 thanked+0.002
 excited+0.002
Neg logit contributions
ocol-0.006
ptions-0.006
wic-0.006
olving-0.006
lex-0.006
Tokenide
Token ID485
Feature activation+4.15e+00

Pos logit contributions
ocol+0.436
wic+0.432
arine+0.393
ptions+0.393
 cheating+0.392
Neg logit contributions
 back-0.660
 happy-0.622
 thanked-0.617
 proud-0.617
 excited-0.611
Tokeny
Token ID88
Feature activation+1.03e+00

Pos logit contributions
ocol+0.202
ptions+0.195
lex+0.194
wic+0.194
olving+0.191
Neg logit contributions
 back-0.154
 thanked-0.130
 around-0.128
 happy-0.127
 excited-0.126
Token wasn
Token ID2492
Feature activation+5.68e+00

Pos logit contributions
ocol+0.057
wic+0.055
ions+0.055
olving+0.054
ptions+0.054
Neg logit contributions
 back-0.029
 goodbye-0.025
 proud-0.025
 thanked-0.025
 woman-0.024
Token't
Token ID470
Feature activation+3.27e+00

Pos logit contributions
ions+0.181
ocol+0.174
wic+0.173
 Sp+0.171
 centimeters+0.171
Neg logit contributions
 woman-0.265
 back-0.263
 wife-0.257
 goodbye-0.257
 thanked-0.247
Token ready
Token ID3492
Feature activation-1.75e+00

Pos logit contributions
ocol+0.179
ptions+0.179
 cheating+0.177
wic+0.177
olving+0.175
Neg logit contributions
 back-0.078
 goodbye-0.074
 woman-0.067
 thanked-0.065
 proud-0.062
Token heavy
Token ID4334
Feature activation+3.91e+00

Pos logit contributions
olving+0.067
ocol+0.066
 centimeters+0.066
 cheating+0.064
 degrees+0.064
Neg logit contributions
 back-0.030
 goodbye-0.026
 woman-0.025
five-0.024
 pony-0.023
Token.
Token ID13
Feature activation-1.70e-01

Pos logit contributions
wic+0.222
ocol+0.221
ptions+0.218
ions+0.218
lex+0.215
Neg logit contributions
 back-0.106
 goodbye-0.089
 proud-0.085
 woman-0.082
 happy-0.082
Token\n
Token ID198
Feature activation-2.36e+00

Pos logit contributions
 back+0.004
 woman+0.003
 pony+0.003
 goodbye+0.003
 wife+0.003
Neg logit contributions
 Sp-0.010
wic-0.010
ions-0.010
olving-0.010
ocol-0.010
Token\n
Token ID198
Feature activation-2.31e+00

Pos logit contributions
 back+0.062
 woman+0.059
 pony+0.058
 goodbye+0.058
 wife+0.055
Neg logit contributions
 Sp-0.125
ions-0.125
wic-0.125
ocol-0.120
olving-0.119
TokenJack
Token ID14295
Feature activation+4.89e+00

Pos logit contributions
 back+0.103
 goodbye+0.094
 proud+0.092
 woman+0.092
 relieved+0.090
Neg logit contributions
wic-0.087
ocol-0.086
ions-0.085
olving-0.084
ptions-0.083
Token bid
Token ID8406
Feature activation-8.85e+00

Pos logit contributions
wic+0.272
olving+0.260
ions+0.260
ocol+0.256
 Sp+0.256
Neg logit contributions
 back-0.116
 goodbye-0.099
 proud-0.096
 woman-0.096
 relieved-0.090
Token the
Token ID262
Feature activation-1.93e-01

Pos logit contributions
 woman+0.248
 relieved+0.235
 back+0.233
 pony+0.233
 proud+0.233
Neg logit contributions
ocol-0.417
lex-0.413
olving-0.403
wic-0.403
gh-0.399
Token old
Token ID1468
Feature activation-1.60e+00

Pos logit contributions
 back+0.004
 around+0.004
 goodbye+0.003
 relieved+0.003
amb+0.003
Neg logit contributions
olving-0.011
 cheating-0.010
lex-0.010
wic-0.010
arine-0.010
Token man
Token ID582
Feature activation-3.65e+00

Pos logit contributions
 back+0.034
 relieved+0.033
 thanked+0.031
 exclaimed+0.030
 around+0.030
Neg logit contributions
lex-0.082
gh-0.081
olving-0.081
 cheating-0.080
wic-0.080
Token farewell
Token ID33458
Feature activation-5.67e+00

Pos logit contributions
 back+0.075
 woman+0.075
 proud+0.071
 relieved+0.067
 thanked+0.066
Neg logit contributions
wic-0.196
olving-0.193
ions-0.191
lex-0.191
 Sp-0.189
Token,
Token ID11
Feature activation+3.15e+00

Pos logit contributions
 know+0.154
 understand+0.152
 woman+0.138
 wife+0.135
found+0.133
Neg logit contributions
olving-0.255
 Sp-0.242
wic-0.242
 degrees-0.242
 centimeters-0.242
Token shrimp
Token ID30060
Feature activation-5.25e-01

Pos logit contributions
 back+0.002
 proud+0.002
 happy+0.002
 woman+0.002
 goodbye+0.002
Neg logit contributions
wic-0.004
ions-0.004
ocol-0.004
ptions-0.004
lex-0.004
Token and
Token ID290
Feature activation-4.14e+00

Pos logit contributions
 back+0.015
 woman+0.014
 goodbye+0.013
 proud+0.013
 relieved+0.013
Neg logit contributions
wic-0.027
ions-0.026
olving-0.026
ocol-0.025
 Sp-0.025
Token the
Token ID262
Feature activation-1.08e-01

Pos logit contributions
 back+0.125
 goodbye+0.109
 proud+0.107
 woman+0.105
 relieved+0.103
Neg logit contributions
wic-0.215
ocol-0.214
ions-0.210
olving-0.209
ptions-0.208
Token butterfly
Token ID35113
Feature activation-2.32e+00

Pos logit contributions
 back+0.003
 happy+0.003
 proud+0.003
 woman+0.003
 goodbye+0.003
Neg logit contributions
ions-0.006
ocol-0.006
wic-0.006
ptions-0.006
lex-0.006
Token that
Token ID326
Feature activation-3.43e+00

Pos logit contributions
 back+0.065
 woman+0.063
 proud+0.059
 relieved+0.058
 goodbye+0.057
Neg logit contributions
wic-0.112
olving-0.111
ions-0.110
 Sp-0.108
ocol-0.106
Token whispered
Token ID25029
Feature activation-8.83e+00

Pos logit contributions
 back+0.109
 goodbye+0.096
 proud+0.094
 woman+0.092
 relieved+0.091
Neg logit contributions
wic-0.174
ocol-0.172
ions-0.169
olving-0.169
ptions-0.168
Token in
Token ID287
Feature activation+3.14e+00

Pos logit contributions
 back+0.200
 goodbye+0.170
 happy+0.136
 proud+0.134
 relieved+0.125
Neg logit contributions
ocol-0.577
ptions-0.566
icer-0.566
wic-0.562
lex-0.561
Token the
Token ID262
Feature activation-1.53e-01

Pos logit contributions
ptions+0.201
wic+0.199
iders+0.190
lex+0.190
 centimeters+0.187
Neg logit contributions
 back-0.076
 happy-0.071
 proud-0.066
 goodbye-0.065
 thanked-0.058
Token pond
Token ID16723
Feature activation+1.40e+00

Pos logit contributions
 back+0.005
 goodbye+0.004
 proud+0.004
 woman+0.004
 happy+0.004
Neg logit contributions
wic-0.008
ions-0.008
ptions-0.008
ocol-0.008
olving-0.008
Token.
Token ID13
Feature activation-3.18e-01

Pos logit contributions
ocol+0.085
ptions+0.085
lex+0.083
wic+0.083
icer+0.082
Neg logit contributions
 back-0.036
 goodbye-0.031
 happy-0.028
 proud-0.028
 thanked-0.027
Token<|endoftext|>
Token ID50256
Feature activation-9.30e-01

Pos logit contributions
 back+0.006
 goodbye+0.005
 woman+0.005
 proud+0.005
 happy+0.004
Neg logit contributions
ions-0.020
wic-0.020
 Sp-0.020
ocol-0.019
lex-0.019
Token slide
Token ID10649
Feature activation+7.76e-01

Pos logit contributions
 back+0.036
 relieved+0.031
 goodbye+0.029
 proud+0.029
 thanked+0.028
Neg logit contributions
wic-0.098
lex-0.093
olving-0.093
ocol-0.092
 arose-0.092
Token always
Token ID1464
Feature activation+3.54e-01

Pos logit contributions
ocol+0.044
wic+0.043
ptions+0.042
lex+0.042
ions+0.041
Neg logit contributions
 back-0.023
 goodbye-0.020
 proud-0.019
 thanked-0.019
 happy-0.018
Token listened
Token ID16399
Feature activation-3.38e+00

Pos logit contributions
ocol+0.019
wic+0.019
ions+0.019
lex+0.019
ptions+0.019
Neg logit contributions
 back-0.011
 goodbye-0.009
 proud-0.009
 happy-0.009
 thanked-0.009
Token as
Token ID355
Feature activation-2.95e+00

Pos logit contributions
 back+0.074
 woman+0.070
 proud+0.063
 thanked+0.062
 wife+0.062
Neg logit contributions
wic-0.189
olving-0.189
ocol-0.185
 centimeters-0.185
ions-0.184
Token they
Token ID484
Feature activation-2.68e+00

Pos logit contributions
 back+0.069
 happy+0.064
 proud+0.062
 goodbye+0.061
 excited+0.057
Neg logit contributions
icer-0.191
ptions-0.185
ocol-0.183
wic-0.182
lex-0.178
Token whispered
Token ID25029
Feature activation-8.81e+00

Pos logit contributions
 back+0.081
 goodbye+0.070
 proud+0.068
 woman+0.067
 thanked+0.066
Neg logit contributions
wic-0.141
ocol-0.140
ions-0.138
olving-0.137
ptions-0.137
Token their
Token ID511
Feature activation-2.76e+00

Pos logit contributions
 back+0.218
 goodbye+0.188
 happy+0.158
 proud+0.157
 excited+0.150
Neg logit contributions
ocol-0.547
wic-0.544
icer-0.541
ptions-0.537
lex-0.532
Token secrets
Token ID13141
Feature activation-7.32e-01

Pos logit contributions
 back+0.091
 goodbye+0.081
 happy+0.074
 proud+0.074
 woman+0.071
Neg logit contributions
ions-0.155
ocol-0.151
ptions-0.150
wic-0.145
lex-0.142
Token.
Token ID13
Feature activation-3.66e-01

Pos logit contributions
 back+0.020
 goodbye+0.017
 proud+0.017
 woman+0.016
 relieved+0.016
Neg logit contributions
wic-0.040
ocol-0.040
ions-0.039
olving-0.039
ptions-0.039
Token<|endoftext|>
Token ID50256
Feature activation-8.83e-01

Pos logit contributions
 back+0.009
 pony+0.008
 goodbye+0.008
 woman+0.008
 proud+0.007
Neg logit contributions
 Sp-0.020
ions-0.020
wic-0.019
lex-0.019
olving-0.019
TokenOnce
Token ID7454
Feature activation+3.35e+00

Pos logit contributions
 back+0.022
 around+0.017
+0.017
 goodbye+0.017
 proud+0.016
Neg logit contributions
ocol-0.053
wic-0.052
olving-0.051
lex-0.051
ptions-0.051
Token,
Token ID11
Feature activation+3.10e+00

Pos logit contributions
 back+0.026
 happy+0.021
 goodbye+0.020
 proud+0.020
 excited+0.019
Neg logit contributions
wic-0.068
lex-0.066
ptions-0.066
ocol-0.066
ions-0.065
Token even
Token ID772
Feature activation-2.60e+00

Pos logit contributions
wic+0.169
ocol+0.168
olving+0.165
ions+0.165
ptions+0.163
Neg logit contributions
 back-0.083
 goodbye-0.072
 proud-0.070
 woman-0.069
 relieved-0.067
Token when
Token ID618
Feature activation-8.83e-01

Pos logit contributions
 back+0.092
 happy+0.081
 proud+0.081
 goodbye+0.079
 woman+0.078
Neg logit contributions
wic-0.130
ocol-0.125
ions-0.123
lex-0.122
ptions-0.121
Token its
Token ID663
Feature activation-3.69e+00

Pos logit contributions
 back+0.020
 happy+0.017
 goodbye+0.017
 proud+0.017
 excited+0.015
Neg logit contributions
ptions-0.057
lex-0.057
wic-0.056
ocol-0.055
olving-0.054
Token friend
Token ID1545
Feature activation-4.18e+00

Pos logit contributions
 back+0.105
 goodbye+0.086
 woman+0.082
 proud+0.082
 happy+0.082
Neg logit contributions
ions-0.211
wic-0.211
ptions-0.211
lex-0.204
ocol-0.203
Token whispered
Token ID25029
Feature activation-8.81e+00

Pos logit contributions
 back+0.115
 thanked+0.102
 goodbye+0.095
 happy+0.093
 woman+0.092
Neg logit contributions
ocol-0.244
wic-0.241
lex-0.239
ptions-0.239
ions-0.236
Token its
Token ID663
Feature activation-3.68e+00

Pos logit contributions
 back+0.231
 goodbye+0.199
 happy+0.155
 around+0.151
 proud+0.148
Neg logit contributions
icer-0.561
ptions-0.559
ocol-0.557
wic-0.549
lex-0.543
Token name
Token ID1438
Feature activation-1.47e+00

Pos logit contributions
 back+0.093
 goodbye+0.077
 proud+0.072
 happy+0.072
 woman+0.071
Neg logit contributions
ions-0.219
wic-0.218
ocol-0.217
ptions-0.216
lex-0.211
Token.
Token ID13
Feature activation-3.82e-01

Pos logit contributions
 back+0.044
 goodbye+0.038
 thanked+0.035
 proud+0.035
 woman+0.034
Neg logit contributions
ocol-0.081
wic-0.080
ptions-0.079
lex-0.078
ions-0.078
Token<|endoftext|>
Token ID50256
Feature activation-6.06e-01

Pos logit contributions
 back+0.007
 goodbye+0.006
 proud+0.005
 thanked+0.005
 woman+0.005
Neg logit contributions
wic-0.025
ocol-0.025
olving-0.024
ptions-0.024
 centimeters-0.024
TokenOnce
Token ID7454
Feature activation+3.42e+00

Pos logit contributions
 back+0.011
 proud+0.008
 goodbye+0.008
 thanked+0.007
 woman+0.007
Neg logit contributions
ocol-0.040
wic-0.039
olving-0.039
ptions-0.039
lex-0.038
Token fun
Token ID1257
Feature activation+1.74e+00

Pos logit contributions
 back+0.008
 goodbye+0.007
 proud+0.007
 happy+0.007
 woman+0.007
Neg logit contributions
ocol-0.014
wic-0.014
ions-0.014
ptions-0.014
olving-0.013
Token afternoon
Token ID6672
Feature activation-4.50e-01

Pos logit contributions
wic+0.087
ions+0.087
ocol+0.086
 arose+0.084
olving+0.084
Neg logit contributions
 back-0.051
 relieved-0.045
 proud-0.045
 goodbye-0.043
 woman-0.043
Token.
Token ID13
Feature activation-2.96e-01

Pos logit contributions
 back+0.012
 goodbye+0.010
 proud+0.010
 woman+0.010
 relieved+0.010
Neg logit contributions
wic-0.025
ocol-0.025
ions-0.024
olving-0.024
ptions-0.024
Token He
Token ID679
Feature activation+1.19e+00

Pos logit contributions
 back+0.008
 woman+0.006
 proud+0.006
 goodbye+0.006
 happy+0.006
Neg logit contributions
wic-0.016
 Sp-0.016
ions-0.016
ocol-0.016
lex-0.016
Token then
Token ID788
Feature activation-1.38e+00

Pos logit contributions
wic+0.057
 Sp+0.055
 arose+0.054
ions+0.053
 centimeters+0.052
Neg logit contributions
 back-0.034
 woman-0.030
five-0.030
 proud-0.030
 goodbye-0.030
Token bid
Token ID8406
Feature activation-8.81e+00

Pos logit contributions
 back+0.038
 woman+0.033
 proud+0.033
 relieved+0.032
five+0.031
Neg logit contributions
wic-0.065
 arose-0.064
ions-0.063
 cheating-0.063
 Sp-0.062
Token farewell
Token ID33458
Feature activation-5.58e+00

Pos logit contributions
 woman+0.166
 proud+0.164
 relieved+0.158
 back+0.153
 pony+0.149
Neg logit contributions
ocol-0.487
gh-0.473
wic-0.469
lex-0.469
 Sp-0.466
Token to
Token ID284
Feature activation+1.49e+00

Pos logit contributions
 know+0.148
 understand+0.144
 invites+0.127
 woman+0.124
found+0.124
Neg logit contributions
olving-0.243
 centimeters-0.239
 degrees-0.238
 Sp-0.237
wic-0.230
Token all
Token ID477
Feature activation-2.29e+00

Pos logit contributions
ocol+0.063
 Sp+0.062
ions+0.061
wic+0.061
lex+0.060
Neg logit contributions
 back-0.054
 relieved-0.051
 proud-0.049
 woman-0.049
 thanked-0.048
Token the
Token ID262
Feature activation-3.30e-01

Pos logit contributions
 back+0.072
 proud+0.067
 relieved+0.065
 wife+0.063
 woman+0.063
Neg logit contributions
ions-0.112
ocol-0.106
wic-0.102
lex-0.101
olving-0.100
Token birds
Token ID10087
Feature activation+1.69e+00

Pos logit contributions
 back+0.009
 goodbye+0.008
 proud+0.008
 relieved+0.008
 woman+0.008
Neg logit contributions
wic-0.018
ocol-0.018
ptions-0.017
ions-0.017
olving-0.017
Token some
Token ID617
Feature activation-1.31e+00

Pos logit contributions
 back+0.061
 goodbye+0.053
 proud+0.051
 happy+0.050
 woman+0.050
Neg logit contributions
wic-0.113
ocol-0.111
ptions-0.110
olving-0.110
ions-0.109
Token of
Token ID286
Feature activation-7.28e-01

Pos logit contributions
 back+0.041
 goodbye+0.035
 proud+0.034
 happy+0.033
 woman+0.033
Neg logit contributions
wic-0.069
ptions-0.068
ocol-0.068
olving-0.067
ions-0.067
Token his
Token ID465
Feature activation-2.98e+00

Pos logit contributions
 back+0.019
 goodbye+0.018
 happy+0.016
 thanked+0.016
 proud+0.015
Neg logit contributions
ptions-0.043
olving-0.043
wic-0.042
lex-0.042
 centimeters-0.042
Token toys
Token ID14958
Feature activation+9.99e-01

Pos logit contributions
 back+0.090
 goodbye+0.078
 proud+0.076
 woman+0.075
 happy+0.073
Neg logit contributions
wic-0.156
ocol-0.155
ions-0.153
olving-0.152
ptions-0.152
Token and
Token ID290
Feature activation-4.00e+00

Pos logit contributions
wic+0.054
olving+0.054
ions+0.053
ocol+0.053
 centimeters+0.052
Neg logit contributions
 back-0.025
 woman-0.022
 goodbye-0.022
 proud-0.022
 relieved-0.021
Token bid
Token ID8406
Feature activation-8.79e+00

Pos logit contributions
 back+0.109
 proud+0.093
 goodbye+0.092
 woman+0.090
 relieved+0.088
Neg logit contributions
wic-0.215
ocol-0.209
ions-0.209
olving-0.209
ptions-0.205
Token him
Token ID683
Feature activation-1.55e+00

Pos logit contributions
 back+0.204
 proud+0.193
 woman+0.190
 relieved+0.183
 thanked+0.174
Neg logit contributions
ocol-0.474
wic-0.473
olving-0.465
lex-0.458
ions-0.457
Token goodbye
Token ID24829
Feature activation-4.91e+00

Pos logit contributions
 back+0.040
 goodbye+0.034
 proud+0.033
 woman+0.031
 happy+0.031
Neg logit contributions
ocol-0.088
wic-0.088
ions-0.086
ptions-0.086
olving-0.086
Token.
Token ID13
Feature activation-1.99e-01

Pos logit contributions
 back+0.136
 woman+0.130
 proud+0.126
 relieved+0.124
 thanked+0.124
Neg logit contributions
olving-0.236
wic-0.234
 centimeters-0.230
 Sp-0.227
lex-0.226
Token\n
Token ID198
Feature activation-2.39e+00

Pos logit contributions
 back+0.005
 goodbye+0.004
 woman+0.004
 proud+0.004
 pony+0.004
Neg logit contributions
olving-0.011
 Sp-0.011
wic-0.011
ions-0.010
lex-0.010
Token\n
Token ID198
Feature activation-2.35e+00

Pos logit contributions
 back+0.061
 goodbye+0.054
 pony+0.050
 woman+0.049
 wife+0.048
Neg logit contributions
ions-0.128
 Sp-0.127
lex-0.124
wic-0.123
olving-0.123
Token.
Token ID13
Feature activation-3.61e-01

Pos logit contributions
wic+0.136
olving+0.136
ocol+0.134
ptions+0.133
ions+0.133
Neg logit contributions
 back-0.059
 happy-0.046
 proud-0.044
 goodbye-0.042
 excited-0.041
Token So
Token ID1406
Feature activation+1.30e+00

Pos logit contributions
 back+0.007
 goodbye+0.006
 proud+0.006
 woman+0.005
 thanked+0.005
Neg logit contributions
ocol-0.023
wic-0.023
ions-0.022
olving-0.022
ptions-0.022
Token,
Token ID11
Feature activation+3.25e+00

Pos logit contributions
ocol+0.073
wic+0.071
ptions+0.071
olving+0.070
ions+0.070
Neg logit contributions
 back-0.039
 goodbye-0.035
 proud-0.033
 happy-0.032
 thanked-0.032
Token the
Token ID262
Feature activation-1.43e-01

Pos logit contributions
wic+0.166
ocol+0.164
ions+0.162
olving+0.161
ptions+0.160
Neg logit contributions
 back-0.101
 goodbye-0.088
 proud-0.086
 woman-0.086
 relieved-0.084
Token whale
Token ID22206
Feature activation-2.25e+00

Pos logit contributions
 back+0.006
 goodbye+0.006
five+0.005
chair+0.005
eal+0.005
Neg logit contributions
wic-0.005
 vinegar-0.005
 topp-0.005
 Sp-0.005
 centimeters-0.005
Token whispered
Token ID25029
Feature activation-8.77e+00

Pos logit contributions
 back+0.069
 goodbye+0.064
 proud+0.064
 woman+0.062
 relieved+0.061
Neg logit contributions
wic-0.105
 Sp-0.102
olving-0.102
 arose-0.101
ptions-0.100
Token to
Token ID284
Feature activation+1.46e+00

Pos logit contributions
 back+0.185
 goodbye+0.146
 proud+0.119
 happy+0.118
 excited+0.117
Neg logit contributions
ocol-0.587
wic-0.572
icer-0.568
olving-0.567
ions-0.565
Token the
Token ID262
Feature activation-1.68e-01

Pos logit contributions
wic+0.073
ocol+0.073
ions+0.072
olving+0.071
ptions+0.071
Neg logit contributions
 back-0.047
 goodbye-0.042
 proud-0.041
 woman-0.040
 relieved-0.040
Token little
Token ID1310
Feature activation+4.52e-01

Pos logit contributions
 back+0.005
 goodbye+0.005
 relieved+0.005
 thanked+0.005
 proud+0.005
Neg logit contributions
wic-0.008
ptions-0.007
 centimeters-0.007
 topp-0.007
olving-0.007
Token fish
Token ID5916
Feature activation+3.94e-01

Pos logit contributions
wic+0.020
ocol+0.020
ions+0.019
ptions+0.019
olving+0.019
Neg logit contributions
 back-0.016
 goodbye-0.015
 proud-0.015
 relieved-0.015
 woman-0.014
Token,
Token ID11
Feature activation+3.22e+00

Pos logit contributions
wic+0.020
ocol+0.020
ions+0.020
olving+0.020
ptions+0.019
Neg logit contributions
 back-0.012
 goodbye-0.011
 proud-0.011
 woman-0.010
 relieved-0.010
Token't
Token ID470
Feature activation+3.44e+00

Pos logit contributions
 Sp+0.266
ions+0.258
wic+0.256
ocol+0.245
olving+0.240
Neg logit contributions
 back-0.341
 goodbye-0.334
 woman-0.325
 relieved-0.317
 wife-0.304
Token tell
Token ID1560
Feature activation-7.02e+00

Pos logit contributions
ocol+0.163
ptions+0.160
 Sp+0.159
lex+0.154
 cheating+0.154
Neg logit contributions
 goodbye-0.110
 relieved-0.104
 back-0.095
 woman-0.093
 proud-0.090
Token you
Token ID345
Feature activation+1.38e-01

Pos logit contributions
 back+0.291
 goodbye+0.260
 happy+0.244
 proud+0.239
 woman+0.237
Neg logit contributions
wic-0.310
ocol-0.302
olving-0.302
ptions-0.296
lex-0.295
Token,"
Token ID553
Feature activation-2.48e+00

Pos logit contributions
wic+0.007
ocol+0.007
ions+0.007
olving+0.007
ptions+0.007
Neg logit contributions
 back-0.004
 goodbye-0.003
 proud-0.003
 woman-0.003
 happy-0.003
Token Lily
Token ID20037
Feature activation+2.05e+00

Pos logit contributions
 back+0.068
 goodbye+0.060
 proud+0.054
 wife+0.053
 relieved+0.052
Neg logit contributions
 Sp-0.131
wic-0.127
 cheating-0.124
lex-0.123
ions-0.122
Token whispered
Token ID25029
Feature activation-8.77e+00

Pos logit contributions
 Sp+0.115
wic+0.113
 arose+0.112
 cheating+0.111
lex+0.109
Neg logit contributions
 back-0.047
 goodbye-0.042
five-0.038
 pony-0.036
 woman-0.036
Token.
Token ID13
Feature activation-6.54e-01

Pos logit contributions
 back+0.222
 proud+0.204
 woman+0.200
 goodbye+0.196
 relieved+0.195
Neg logit contributions
wic-0.470
ocol-0.461
ions-0.460
olving-0.459
lex-0.457
Token "
Token ID366
Feature activation+3.80e+00

Pos logit contributions
 back+0.017
 goodbye+0.015
 proud+0.014
 wife+0.014
 woman+0.014
Neg logit contributions
 Sp-0.036
ions-0.035
wic-0.035
olving-0.034
ptions-0.034
TokenIt
Token ID1026
Feature activation+5.99e-01

Pos logit contributions
ptions+0.138
ions+0.137
lex+0.136
olving+0.136
wic+0.135
Neg logit contributions
 back-0.168
 goodbye-0.160
 proud-0.148
 relieved-0.147
 thanked-0.145
Token's
Token ID338
Feature activation-2.01e+00

Pos logit contributions
wic+0.037
olving+0.036
ions+0.036
ptions+0.036
 Sp+0.035
Neg logit contributions
 back-0.012
 goodbye-0.010
 proud-0.010
 woman-0.010
 wife-0.009
Token too
Token ID1165
Feature activation+4.28e-01

Pos logit contributions
 back+0.061
 happy+0.051
 proud+0.045
 excited+0.044
 and+0.043
Neg logit contributions
wic-0.119
ocol-0.119
lex-0.115
ions-0.115
anish-0.114
Token on
Token ID319
Feature activation+1.98e+00

Pos logit contributions
 woman+0.034
 back+0.033
 thanked+0.033
 wife+0.032
 goodbye+0.031
Neg logit contributions
olving-0.061
ions-0.061
wic-0.060
ocol-0.059
 centimeters-0.058
Token,
Token ID11
Feature activation+2.97e+00

Pos logit contributions
ptions+0.126
ocol+0.120
wic+0.120
lex+0.119
olving+0.117
Neg logit contributions
 back-0.051
 goodbye-0.042
 and-0.039
 happy-0.038
 around-0.037
Token Tweet
Token ID18752
Feature activation+2.81e+00

Pos logit contributions
ocol+0.136
wic+0.134
ptions+0.131
lex+0.129
olving+0.127
Neg logit contributions
 back-0.117
 happy-0.106
 proud-0.104
 goodbye-0.103
 thanked-0.099
Tokenie
Token ID494
Feature activation-5.36e-01

Pos logit contributions
wic+0.112
ocol+0.111
ions+0.108
olving+0.107
ptions+0.107
Neg logit contributions
 back-0.121
 goodbye-0.109
 proud-0.108
 woman-0.106
 happy-0.105
Token never
Token ID1239
Feature activation+1.07e+00

Pos logit contributions
 back+0.017
 goodbye+0.016
 woman+0.015
 relieved+0.015
 proud+0.015
Neg logit contributions
olving-0.025
wic-0.025
 Sp-0.024
ions-0.024
ptions-0.023
Token whispered
Token ID25029
Feature activation-8.77e+00

Pos logit contributions
wic+0.052
olving+0.052
ptions+0.052
 cheating+0.050
 Sp+0.050
Neg logit contributions
 back-0.032
 goodbye-0.030
 proud-0.029
 woman-0.029
 relieved-0.028
Token happy
Token ID3772
Feature activation-5.16e+00

Pos logit contributions
 back+0.194
 goodbye+0.157
 proud+0.144
 happy+0.140
 woman+0.137
Neg logit contributions
wic-0.548
ocol-0.547
ions-0.535
ptions-0.531
lex-0.530
Token secrets
Token ID13141
Feature activation-4.26e-01

Pos logit contributions
 back+0.138
 proud+0.124
 relieved+0.122
 woman+0.121
 goodbye+0.118
Neg logit contributions
olving-0.271
wic-0.271
 Sp-0.265
ions-0.264
ocol-0.264
Token again
Token ID757
Feature activation-1.45e+00

Pos logit contributions
 back+0.011
 proud+0.010
 woman+0.010
 goodbye+0.010
 relieved+0.010
Neg logit contributions
wic-0.023
olving-0.023
ions-0.023
ocol-0.022
ptions-0.022
Token.
Token ID13
Feature activation-5.63e-01

Pos logit contributions
 back+0.031
 goodbye+0.023
 happy+0.023
 proud+0.023
 woman+0.022
Neg logit contributions
wic-0.093
ocol-0.092
ions-0.090
ptions-0.090
lex-0.089
Token<|endoftext|>
Token ID50256
Feature activation-1.11e+00

Pos logit contributions
 back+0.013
 goodbye+0.012
 woman+0.011
 wife+0.011
 couple+0.011
Neg logit contributions
 Sp-0.033
ions-0.031
wic-0.031
olving-0.031
lex-0.030
Token skull
Token ID14511
Feature activation-1.74e+00

Pos logit contributions
 back+0.002
 proud+0.001
 goodbye+0.001
 woman+0.001
 happy+0.001
Neg logit contributions
wic-0.004
ocol-0.003
ions-0.003
ptions-0.003
olving-0.003
Token thanked
Token ID26280
Feature activation-5.03e+00

Pos logit contributions
 back+0.049
five+0.047
 proud+0.046
 goodbye+0.045
 woman+0.045
Neg logit contributions
olving-0.081
wic-0.079
 arose-0.077
 centimeters-0.076
 cheating-0.075
Token the
Token ID262
Feature activation-9.62e-02

Pos logit contributions
 back+0.134
 goodbye+0.119
 woman+0.110
 happy+0.102
 proud+0.102
Neg logit contributions
wic-0.300
ptions-0.297
olving-0.295
 centimeters-0.294
 degrees-0.286
Token tree
Token ID5509
Feature activation-4.35e-01

Pos logit contributions
 back+0.002
 woman+0.002
 happy+0.002
 proud+0.002
 goodbye+0.002
Neg logit contributions
wic-0.006
ions-0.006
ocol-0.006
ptions-0.006
lex-0.006
Token and
Token ID290
Feature activation-4.29e+00

Pos logit contributions
 back+0.010
 proud+0.009
 woman+0.009
 relieved+0.008
 thanked+0.008
Neg logit contributions
olving-0.023
wic-0.023
ions-0.023
 Sp-0.023
ocol-0.023
Token whispered
Token ID25029
Feature activation-8.76e+00

Pos logit contributions
 back+0.118
 woman+0.103
 proud+0.102
 pony+0.101
five+0.100
Neg logit contributions
wic-0.210
 Sp-0.209
ions-0.208
 arose-0.203
olving-0.201
Token,
Token ID11
Feature activation+3.03e+00

Pos logit contributions
 back+0.126
 goodbye+0.093
 proud+0.080
 woman+0.072
 happy+0.072
Neg logit contributions
wic-0.608
ocol-0.608
ions-0.598
ptions-0.597
olving-0.593
Token "
Token ID366
Feature activation+3.50e+00

Pos logit contributions
olving+0.118
 Sp+0.117
wic+0.117
ocol+0.116
ions+0.113
Neg logit contributions
 back-0.113
 woman-0.102
 goodbye-0.100
 wife-0.099
five-0.098
TokenThank
Token ID10449
Feature activation+1.23e+00

Pos logit contributions
wic+0.159
ocol+0.157
olving+0.152
ptions+0.151
ions+0.150
Neg logit contributions
 back-0.131
 goodbye-0.117
 proud-0.117
 woman-0.116
 thanked-0.115
Token you
Token ID345
Feature activation+6.86e-02

Pos logit contributions
ptions+0.069
olving+0.068
wic+0.066
lex+0.064
icer+0.063
Neg logit contributions
 back-0.041
 goodbye-0.035
 proud-0.032
 happy-0.031
 and-0.030
Token for
Token ID329
Feature activation-7.45e-01

Pos logit contributions
ocol+0.004
ptions+0.004
olving+0.004
wic+0.004
lex+0.004
Neg logit contributions
 back-0.002
 goodbye-0.002
 happy-0.001
 proud-0.001
 woman-0.001
Token she
Token ID673
Feature activation-3.91e+00

Pos logit contributions
wic+0.200
ocol+0.198
olving+0.194
ions+0.194
ptions+0.194
Neg logit contributions
 back-0.071
 goodbye-0.058
 proud-0.057
 woman-0.056
 happy-0.054
Token remembered
Token ID12086
Feature activation-9.88e-01

Pos logit contributions
 back+0.121
 goodbye+0.108
 proud+0.104
 woman+0.102
 relieved+0.101
Neg logit contributions
wic-0.193
olving-0.187
ocol-0.185
ions-0.185
ptions-0.185
Token what
Token ID644
Feature activation-8.17e-01

Pos logit contributions
 back+0.027
 goodbye+0.024
 happy+0.022
 proud+0.022
 woman+0.022
Neg logit contributions
wic-0.057
ocol-0.057
icer-0.056
lex-0.056
ptions-0.056
Token the
Token ID262
Feature activation-3.57e-02

Pos logit contributions
 back+0.017
 woman+0.016
 happy+0.015
 goodbye+0.015
 proud+0.015
Neg logit contributions
icer-0.052
ocol-0.052
wic-0.051
ptions-0.051
 centimeters-0.050
Token wall
Token ID3355
Feature activation+2.96e+00

Pos logit contributions
 back+0.001
 goodbye+0.001
 proud+0.001
 woman+0.001
 happy+0.001
Neg logit contributions
wic-0.002
ocol-0.002
ions-0.002
ptions-0.002
olving-0.002
Token whispered
Token ID25029
Feature activation-8.75e+00

Pos logit contributions
wic+0.147
ocol+0.145
ions+0.144
olving+0.144
ptions+0.141
Neg logit contributions
 back-0.094
 goodbye-0.083
 proud-0.081
 woman-0.080
 relieved-0.079
Token to
Token ID284
Feature activation+1.62e+00

Pos logit contributions
 back+0.235
 goodbye+0.195
 happy+0.166
 around+0.163
 proud+0.163
Neg logit contributions
ocol-0.542
icer-0.531
wic-0.529
ptions-0.526
lex-0.523
Token her
Token ID607
Feature activation-4.94e+00

Pos logit contributions
wic+0.094
ptions+0.091
ions+0.091
ocol+0.091
 centimeters+0.090
Neg logit contributions
 back-0.044
 goodbye-0.037
 woman-0.037
 happy-0.036
 proud-0.036
Token.
Token ID13
Feature activation-2.68e-01

Pos logit contributions
 back+0.128
 goodbye+0.108
 proud+0.107
 woman+0.104
 happy+0.103
Neg logit contributions
ocol-0.285
wic-0.285
ptions-0.281
ions-0.281
lex-0.275
Token<|endoftext|>
Token ID50256
Feature activation-9.48e-01

Pos logit contributions
 back+0.005
 goodbye+0.004
 proud+0.004
 woman+0.004
 relieved+0.003
Neg logit contributions
wic-0.017
ocol-0.017
ions-0.017
olving-0.017
ptions-0.017
TokenOnce
Token ID7454
Feature activation+3.36e+00

Pos logit contributions
 back+0.025
 goodbye+0.019
 around+0.019
 proud+0.019
 woman+0.019
Neg logit contributions
ocol-0.055
wic-0.054
ptions-0.054
olving-0.054
lex-0.053
Token for
Token ID329
Feature activation-3.05e+00

Pos logit contributions
wic+0.091
ocol+0.089
olving+0.089
ions+0.087
ptions+0.087
Neg logit contributions
 back-0.095
 goodbye-0.087
 proud-0.085
 woman-0.084
 relieved-0.084
Token them
Token ID606
Feature activation-3.16e+00

Pos logit contributions
 back+0.098
 happy+0.085
 proud+0.082
 goodbye+0.082
 thanked+0.077
Neg logit contributions
ptions-0.171
wic-0.168
icer-0.168
lex-0.164
olving-0.163
Token!
Token ID0
Feature activation+3.25e-01

Pos logit contributions
 back+0.083
 goodbye+0.071
 proud+0.070
 woman+0.069
 relieved+0.067
Neg logit contributions
wic-0.174
ocol-0.173
ions-0.170
olving-0.170
ptions-0.168
Token Tim
Token ID5045
Feature activation+4.99e+00

Pos logit contributions
wic+0.020
ocol+0.020
ptions+0.020
olving+0.020
 centimeters+0.020
Neg logit contributions
 back-0.007
 goodbye-0.006
 proud-0.005
 thanked-0.005
 woman-0.005
Tokenmy
Token ID1820
Feature activation+2.20e-01

Pos logit contributions
wic+0.241
ocol+0.239
ions+0.235
olving+0.234
ptions+0.232
Neg logit contributions
 back-0.170
 goodbye-0.150
 proud-0.148
 woman-0.145
 relieved-0.142
Token whispered
Token ID25029
Feature activation-8.74e+00

Pos logit contributions
wic+0.012
ocol+0.012
olving+0.011
ions+0.011
ptions+0.011
Neg logit contributions
 back-0.006
 goodbye-0.005
 proud-0.005
 woman-0.005
 relieved-0.005
Token to
Token ID284
Feature activation+1.53e+00

Pos logit contributions
 back+0.217
 goodbye+0.184
 proud+0.163
 happy+0.160
 excited+0.154
Neg logit contributions
ocol-0.538
wic-0.527
ptions-0.524
ions-0.524
icer-0.520
Token Billy
Token ID15890
Feature activation+6.36e-01

Pos logit contributions
wic+0.083
ocol+0.082
ions+0.081
olving+0.081
ptions+0.080
Neg logit contributions
 back-0.044
 goodbye-0.038
 proud-0.037
 woman-0.036
 happy-0.036
Token,
Token ID11
Feature activation+3.05e+00

Pos logit contributions
ocol+0.036
wic+0.036
ions+0.035
olving+0.035
ptions+0.035
Neg logit contributions
 back-0.017
 goodbye-0.014
 proud-0.014
 woman-0.013
 happy-0.013
Token "
Token ID366
Feature activation+3.44e+00

Pos logit contributions
wic+0.140
ocol+0.138
olving+0.136
ions+0.135
ptions+0.134
Neg logit contributions
 back-0.105
 goodbye-0.093
 proud-0.090
 woman-0.090
 thanked-0.089
TokenWe
Token ID1135
Feature activation+1.56e+00

Pos logit contributions
wic+0.136
ocol+0.135
ions+0.132
olving+0.132
ptions+0.131
Neg logit contributions
 back-0.146
 goodbye-0.133
 proud-0.131
 woman-0.129
 relieved-0.127
Token morning
Token ID3329
Feature activation-1.01e+00

Pos logit contributions
 back+0.069
 goodbye+0.060
 proud+0.059
 woman+0.058
 relieved+0.057
Neg logit contributions
wic-0.109
ocol-0.108
ions-0.106
olving-0.106
ptions-0.105
Token,
Token ID11
Feature activation+3.34e+00

Pos logit contributions
 back+0.025
 goodbye+0.023
 happy+0.022
 woman+0.020
 proud+0.020
Neg logit contributions
icer-0.061
lex-0.061
olving-0.061
ptions-0.061
ocol-0.061
Token the
Token ID262
Feature activation+1.57e-02

Pos logit contributions
wic+0.206
ocol+0.203
lex+0.203
ptions+0.202
olving+0.201
Neg logit contributions
 back-0.078
 happy-0.071
 goodbye-0.069
 proud-0.068
 excited-0.066
Token white
Token ID2330
Feature activation+2.08e+00

Pos logit contributions
wic+0.001
ocol+0.001
ions+0.001
olving+0.001
ptions+0.001
Neg logit contributions
 back-0.000
 goodbye-0.000
 proud-0.000
 woman-0.000
 happy-0.000
Token flower
Token ID15061
Feature activation-3.42e+00

Pos logit contributions
wic+0.109
ocol+0.108
ions+0.106
olving+0.106
ptions+0.104
Neg logit contributions
 back-0.062
 goodbye-0.054
 proud-0.054
 relieved-0.052
 woman-0.051
Token whispered
Token ID25029
Feature activation-8.74e+00

Pos logit contributions
 back+0.143
 goodbye+0.133
 proud+0.130
 woman+0.129
 relieved+0.129
Neg logit contributions
wic-0.130
olving-0.127
ions-0.126
ocol-0.120
 Sp-0.119
Token its
Token ID663
Feature activation-3.65e+00

Pos logit contributions
 back+0.190
 goodbye+0.157
 proud+0.135
 happy+0.133
 woman+0.131
Neg logit contributions
ocol-0.561
wic-0.550
lex-0.544
icer-0.542
ions-0.542
Token morning
Token ID3329
Feature activation-1.01e+00

Pos logit contributions
 back+0.098
 goodbye+0.084
 proud+0.082
 woman+0.080
 happy+0.078
Neg logit contributions
wic-0.204
ocol-0.202
ions-0.200
olving-0.198
ptions-0.198
Token song
Token ID3496
Feature activation-3.89e+00

Pos logit contributions
 back+0.022
 goodbye+0.018
 proud+0.017
 woman+0.017
 happy+0.016
Neg logit contributions
wic-0.061
ocol-0.061
ions-0.060
olving-0.060
ptions-0.060
Token to
Token ID284
Feature activation+1.50e+00

Pos logit contributions
 back+0.094
 proud+0.082
 relieved+0.079
 woman+0.079
 goodbye+0.078
Neg logit contributions
wic-0.216
ions-0.214
olving-0.214
ocol-0.207
 Sp-0.206
Token the
Token ID262
Feature activation-4.97e-02

Pos logit contributions
ocol+0.076
wic+0.074
olving+0.074
ions+0.074
ptions+0.072
Neg logit contributions
 back-0.045
 goodbye-0.040
 proud-0.039
 relieved-0.039
 woman-0.039
Tokenill
Token ID359
Feature activation+4.40e+00

Pos logit contributions
ions+0.235
wic+0.234
olving+0.234
ocol+0.233
ptions+0.228
Neg logit contributions
 back-0.135
 woman-0.122
 goodbye-0.122
 proud-0.121
 relieved-0.116
Token.
Token ID13
Feature activation-6.65e-02

Pos logit contributions
ocol+0.234
wic+0.234
ions+0.230
olving+0.230
ptions+0.226
Neg logit contributions
 back-0.119
 proud-0.106
 goodbye-0.104
 woman-0.103
 relieved-0.102
Token The
Token ID383
Feature activation+2.16e+00

Pos logit contributions
 back+0.002
 goodbye+0.001
 proud+0.001
 thanked+0.001
 relieved+0.001
Neg logit contributions
wic-0.004
ocol-0.004
ptions-0.004
olving-0.004
ions-0.004
Token pr
Token ID778
Feature activation+6.32e+00

Pos logit contributions
ions+0.105
ptions+0.096
wic+0.096
ocol+0.094
iders+0.093
Neg logit contributions
 back-0.093
 proud-0.083
 happy-0.083
 woman-0.081
 man-0.078
Tokenune
Token ID1726
Feature activation+3.51e+00

Pos logit contributions
wic+0.247
ptions+0.244
 cheating+0.241
ocol+0.237
ions+0.234
Neg logit contributions
 back-0.270
 proud-0.235
 relieved-0.230
 woman-0.229
 thanked-0.228
Token whispered
Token ID25029
Feature activation-8.73e+00

Pos logit contributions
ions+0.164
wic+0.163
olving+0.161
 Sp+0.156
 centimeters+0.156
Neg logit contributions
 back-0.109
 goodbye-0.097
 proud-0.095
 woman-0.092
 wife-0.091
Token back
Token ID736
Feature activation-5.12e+00

Pos logit contributions
 back+0.235
 goodbye+0.203
 happy+0.171
 proud+0.168
 excited+0.165
Neg logit contributions
ocol-0.539
icer-0.529
wic-0.525
ptions-0.521
ions-0.519
Token to
Token ID284
Feature activation+1.48e+00

Pos logit contributions
 back+0.140
 goodbye+0.118
 happy+0.105
 proud+0.104
 woman+0.103
Neg logit contributions
ptions-0.306
ocol-0.305
wic-0.304
ions-0.296
lex-0.295
Token its
Token ID663
Feature activation-3.76e+00

Pos logit contributions
wic+0.085
ocol+0.083
ptions+0.082
 centimeters+0.081
ions+0.081
Neg logit contributions
 back-0.044
 happy-0.035
 goodbye-0.035
 proud-0.035
 woman-0.032
Token friend
Token ID1545
Feature activation-4.00e+00

Pos logit contributions
 back+0.101
 goodbye+0.080
 happy+0.078
 proud+0.078
 woman+0.076
Neg logit contributions
ions-0.224
wic-0.222
ocol-0.220
ptions-0.217
lex-0.214
Token,
Token ID11
Feature activation+3.19e+00

Pos logit contributions
 back+0.114
 goodbye+0.097
 happy+0.087
 woman+0.085
 excited+0.084
Neg logit contributions
ocol-0.240
wic-0.237
ptions-0.233
icer-0.232
ions-0.228
Token and
Token ID290
Feature activation-4.02e+00

Pos logit contributions
wic+0.009
ocol+0.009
ptions+0.008
ions+0.008
olving+0.008
Neg logit contributions
 back-0.004
 goodbye-0.003
 proud-0.003
 woman-0.003
 thanked-0.003
Token the
Token ID262
Feature activation-1.96e-01

Pos logit contributions
 back+0.117
 proud+0.101
 goodbye+0.100
 woman+0.100
 relieved+0.097
Neg logit contributions
olving-0.200
wic-0.200
ocol-0.198
ions-0.197
ptions-0.192
Token shadows
Token ID16187
Feature activation+2.61e+00

Pos logit contributions
 back+0.005
 goodbye+0.004
 proud+0.004
 woman+0.004
 relieved+0.004
Neg logit contributions
wic-0.011
ocol-0.011
ions-0.011
olving-0.011
ptions-0.011
Token of
Token ID286
Feature activation-6.99e-01

Pos logit contributions
olving+0.125
ions+0.125
wic+0.123
ocol+0.121
 arose+0.119
Neg logit contributions
 back-0.074
 woman-0.070
 wife-0.069
 proud-0.066
 goodbye-0.064
Token night
Token ID1755
Feature activation-1.54e+00

Pos logit contributions
 back+0.021
 goodbye+0.017
 happy+0.017
 proud+0.017
 woman+0.017
Neg logit contributions
wic-0.040
ptions-0.038
ocol-0.038
lex-0.037
olving-0.037
Token whispered
Token ID25029
Feature activation-8.73e+00

Pos logit contributions
 back+0.048
 goodbye+0.041
 proud+0.041
 woman+0.040
 thanked+0.039
Neg logit contributions
wic-0.080
ocol-0.079
ions-0.078
ptions-0.077
olving-0.077
Token in
Token ID287
Feature activation+3.13e+00

Pos logit contributions
 back+0.163
 goodbye+0.132
 proud+0.112
 happy+0.107
 woman+0.105
Neg logit contributions
wic-0.577
ocol-0.575
ptions-0.567
ions-0.564
lex-0.562
Token the
Token ID262
Feature activation-2.03e-01

Pos logit contributions
wic+0.181
ptions+0.175
ocol+0.170
lex+0.169
ions+0.169
Neg logit contributions
 back-0.091
 goodbye-0.080
 proud-0.076
 happy-0.076
 woman-0.073
Token distance
Token ID5253
Feature activation-2.73e+00

Pos logit contributions
 back+0.006
 goodbye+0.005
 proud+0.005
 woman+0.005
 relieved+0.005
Neg logit contributions
wic-0.011
ocol-0.011
ions-0.011
olving-0.011
ptions-0.011
Token.
Token ID13
Feature activation-3.79e-01

Pos logit contributions
 back+0.071
 goodbye+0.061
 happy+0.054
 proud+0.054
 woman+0.053
Neg logit contributions
wic-0.164
ptions-0.163
ocol-0.161
icer-0.160
ions-0.159
Token<|endoftext|>
Token ID50256
Feature activation-8.83e-01

Pos logit contributions
 back+0.009
 goodbye+0.007
 woman+0.007
 proud+0.007
 pony+0.006
Neg logit contributions
ions-0.022
wic-0.022
olving-0.022
 Sp-0.022
ocol-0.022
Token\n
Token ID198
Feature activation-2.39e+00

Pos logit contributions
 back+0.040
 goodbye+0.033
 woman+0.029
 proud+0.028
 wife+0.026
Neg logit contributions
wic-0.158
ions-0.157
ocol-0.155
olving-0.154
 Sp-0.154
Token"
Token ID1
Feature activation+2.12e+00

Pos logit contributions
 proud+0.078
 happy+0.076
 back+0.075
 relieved+0.074
 excited+0.074
Neg logit contributions
ocol-0.132
wic-0.122
olving-0.116
ions-0.116
 cheating-0.115
TokenCome
Token ID16773
Feature activation+2.71e+00

Pos logit contributions
ocol+0.112
wic+0.111
olving+0.100
 degrees+0.098
ions+0.098
Neg logit contributions
 back-0.070
 woman-0.068
 thanked-0.068
 proud-0.064
 happy-0.064
Token back
Token ID736
Feature activation-5.19e+00

Pos logit contributions
wic+0.163
ocol+0.163
olving+0.159
ions+0.159
ptions+0.158
Neg logit contributions
 back-0.063
 goodbye-0.051
 proud-0.050
 woman-0.047
 happy-0.047
Token,"
Token ID553
Feature activation-2.26e+00

Pos logit contributions
 back+0.166
 goodbye+0.139
 proud+0.136
 woman+0.134
 happy+0.133
Neg logit contributions
ocol-0.271
wic-0.271
ions-0.263
olving-0.262
ptions-0.261
Token whispered
Token ID25029
Feature activation-8.73e+00

Pos logit contributions
 back+0.031
 goodbye+0.022
 proud+0.020
 woman+0.019
 relieved+0.017
Neg logit contributions
wic-0.155
ocol-0.154
ions-0.152
olving-0.151
ptions-0.150
Token Tim
Token ID5045
Feature activation+6.02e+00

Pos logit contributions
 back+0.393
 goodbye+0.346
 excited+0.331
 proud+0.330
 woman+0.327
Neg logit contributions
wic-0.362
ocol-0.357
 centimeters-0.354
olving-0.354
ptions-0.349
Tokenmy
Token ID1820
Feature activation+6.17e-01

Pos logit contributions
wic+0.298
ocol+0.297
ions+0.290
olving+0.287
ptions+0.286
Neg logit contributions
 back-0.205
 goodbye-0.176
 proud-0.173
 woman-0.171
 happy-0.169
Token.
Token ID13
Feature activation-6.93e-01

Pos logit contributions
wic+0.031
olving+0.031
ocol+0.031
lex+0.030
ions+0.030
Neg logit contributions
 back-0.017
 woman-0.016
 goodbye-0.016
 proud-0.016
 thanked-0.015
Token\n
Token ID198
Feature activation-2.70e+00

Pos logit contributions
 back+0.013
 goodbye+0.010
 proud+0.010
 woman+0.009
 thanked+0.009
Neg logit contributions
wic-0.044
ions-0.043
ocol-0.043
olving-0.043
 Sp-0.043
Token\n
Token ID198
Feature activation-2.63e+00

Pos logit contributions
 back+0.041
 goodbye+0.032
 proud+0.030
 woman+0.029
 relieved+0.027
Neg logit contributions
wic-0.179
ocol-0.176
ions-0.176
olving-0.176
lex-0.174
TokenI
Token ID40
Feature activation+2.46e+00

Pos logit contributions
wic+0.103
ocol+0.101
olving+0.096
 centimeters+0.093
ptions+0.093
Neg logit contributions
 back-0.075
 woman-0.072
 proud-0.070
 thanked-0.070
 excited-0.068
Token'm
Token ID1101
Feature activation+3.58e+00

Pos logit contributions
wic+0.116
ocol+0.115
ions+0.114
ptions+0.113
olving+0.112
Neg logit contributions
 back-0.086
 goodbye-0.076
 proud-0.076
 woman-0.074
 happy-0.074
Token hiding
Token ID11816
Feature activation+1.15e+00

Pos logit contributions
wic+0.173
olving+0.166
ptions+0.165
lex+0.163
ocol+0.163
Neg logit contributions
 back-0.131
 proud-0.120
 happy-0.116
 goodbye-0.115
 excited-0.114
Token,"
Token ID553
Feature activation-2.27e+00

Pos logit contributions
wic+0.066
ptions+0.066
olving+0.065
ocol+0.064
lex+0.064
Neg logit contributions
 back-0.033
 goodbye-0.026
 proud-0.025
 happy-0.025
 around-0.025
Token Annie
Token ID25860
Feature activation+3.15e-01

Pos logit contributions
 back+0.059
 goodbye+0.049
 proud+0.045
 woman+0.044
 relieved+0.044
Neg logit contributions
ions-0.126
wic-0.125
 Sp-0.125
ocol-0.124
 arose-0.120
Token whispered
Token ID25029
Feature activation-8.73e+00

Pos logit contributions
wic+0.019
 Sp+0.019
ocol+0.019
ions+0.019
 arose+0.019
Neg logit contributions
 back-0.006
 goodbye-0.004
 proud-0.004
 woman-0.004
 pony-0.004
Token.
Token ID13
Feature activation-2.87e-01

Pos logit contributions
 back+0.233
 proud+0.202
 goodbye+0.200
 woman+0.199
 thanked+0.193
Neg logit contributions
wic-0.468
ocol-0.465
ions-0.459
olving-0.459
lex-0.454
Token "
Token ID366
Feature activation+3.92e+00

Pos logit contributions
 back+0.006
 goodbye+0.005
 proud+0.005
 woman+0.005
 thanked+0.005
Neg logit contributions
 Sp-0.017
ions-0.017
wic-0.016
olving-0.016
ocol-0.016
TokenMy
Token ID3666
Feature activation-9.02e-01

Pos logit contributions
wic+0.169
ocol+0.167
ions+0.161
olving+0.161
ptions+0.159
Neg logit contributions
 back-0.156
 proud-0.140
 goodbye-0.139
 woman-0.138
 happy-0.136
Token friends
Token ID2460
Feature activation-2.79e+00

Pos logit contributions
 back+0.021
 goodbye+0.017
 proud+0.016
 woman+0.016
 happy+0.016
Neg logit contributions
wic-0.055
ocol-0.054
ptions-0.054
ions-0.054
olving-0.053
Token were
Token ID547
Feature activation+2.51e+00

Pos logit contributions
 back+0.100
 goodbye+0.089
 proud+0.088
 woman+0.087
 relieved+0.085
Neg logit contributions
wic-0.128
ocol-0.127
ions-0.125
olving-0.124
ptions-0.123
Token watched
Token ID7342
Feature activation+1.59e+00

Pos logit contributions
 back+0.024
 goodbye+0.021
 proud+0.020
 woman+0.020
 relieved+0.019
Neg logit contributions
wic-0.045
ocol-0.045
ions-0.044
olving-0.044
ptions-0.043
Token the
Token ID262
Feature activation-9.01e-02

Pos logit contributions
wic+0.092
ptions+0.091
ocol+0.089
lex+0.089
icer+0.089
Neg logit contributions
 back-0.046
 goodbye-0.040
 happy-0.037
 proud-0.037
 excited-0.036
Token sunset
Token ID26428
Feature activation-4.20e+00

Pos logit contributions
 back+0.002
 woman+0.002
 goodbye+0.002
 proud+0.002
 happy+0.002
Neg logit contributions
ions-0.005
wic-0.005
ocol-0.005
ptions-0.005
lex-0.005
Token,
Token ID11
Feature activation+3.10e+00

Pos logit contributions
 back+0.113
 goodbye+0.103
 woman+0.103
 relieved+0.101
 proud+0.100
Neg logit contributions
wic-0.217
olving-0.217
ions-0.217
 centimeters-0.212
ocol-0.212
Token and
Token ID290
Feature activation-4.17e+00

Pos logit contributions
wic+0.174
olving+0.173
ocol+0.171
ions+0.170
 centimeters+0.169
Neg logit contributions
 back-0.067
 woman-0.056
 goodbye-0.054
 wife-0.052
 pony-0.049
Token whispered
Token ID25029
Feature activation-8.72e+00

Pos logit contributions
 back+0.132
 goodbye+0.115
 proud+0.113
 woman+0.112
 relieved+0.109
Neg logit contributions
wic-0.208
ocol-0.206
ions-0.203
olving-0.203
ptions-0.201
Token about
Token ID546
Feature activation-1.67e+00

Pos logit contributions
 back+0.199
 goodbye+0.171
 happy+0.140
 proud+0.138
 excited+0.131
Neg logit contributions
ocol-0.566
icer-0.563
ptions-0.557
wic-0.554
lex-0.551
Token what
Token ID644
Feature activation-7.65e-01

Pos logit contributions
 back+0.051
 goodbye+0.044
 proud+0.044
 happy+0.043
 woman+0.042
Neg logit contributions
wic-0.089
ocol-0.087
ptions-0.086
olving-0.086
ions-0.086
Token a
Token ID257
Feature activation+3.17e-01

Pos logit contributions
 back+0.020
 happy+0.017
 woman+0.017
 proud+0.017
 thanked+0.017
Neg logit contributions
ocol-0.046
wic-0.045
icer-0.045
ptions-0.044
olving-0.044
Token beautiful
Token ID4950
Feature activation-4.99e-01

Pos logit contributions
wic+0.017
ocol+0.017
ions+0.017
olving+0.017
ptions+0.017
Neg logit contributions
 back-0.009
 goodbye-0.008
 proud-0.007
 woman-0.007
 happy-0.007
Token day
Token ID1110
Feature activation-1.29e+00

Pos logit contributions
 back+0.016
 goodbye+0.014
 proud+0.014
 relieved+0.013
 woman+0.013
Neg logit contributions
wic-0.024
ocol-0.024
olving-0.024
ions-0.024
lex-0.023
Token
Token ID220
Feature activation-1.06e+00

Pos logit contributions
 back+0.014
 goodbye+0.011
 proud+0.011
 happy+0.011
 excited+0.011
Neg logit contributions
ocol-0.047
wic-0.047
ptions-0.046
 centimeters-0.045
olving-0.045
Token
Token ID198
Feature activation-2.85e+00

Pos logit contributions
 back+0.015
 goodbye+0.011
 proud+0.010
 woman+0.010
 relieved+0.009
Neg logit contributions
wic-0.072
ocol-0.072
ions-0.071
olving-0.071
ptions-0.070
Token
Token ID198
Feature activation-2.79e+00

Pos logit contributions
 back+0.026
 proud+0.015
 happy+0.015
 woman+0.013
 goodbye+0.013
Neg logit contributions
wic-0.212
ocol-0.211
olving-0.206
ptions-0.205
 centimeters-0.204
TokenL
Token ID43
Feature activation+5.93e+00

Pos logit contributions
 back+0.095
 happy+0.093
 proud+0.093
 excited+0.090
 thanked+0.089
Neg logit contributions
wic-0.140
ocol-0.137
olving-0.130
ptions-0.129
 centimeters-0.127
Tokenily
Token ID813
Feature activation+2.45e+00

Pos logit contributions
wic+0.310
ocol+0.297
ptions+0.295
uces+0.293
arine+0.292
Neg logit contributions
 back-0.174
 happy-0.152
 proud-0.151
 woman-0.150
 thanked-0.149
Token whispered
Token ID25029
Feature activation-8.72e+00

Pos logit contributions
wic+0.137
olving+0.133
ocol+0.132
ions+0.132
ptions+0.132
Neg logit contributions
 back-0.062
 goodbye-0.054
 proud-0.052
 woman-0.050
 relieved-0.048
Token,
Token ID11
Feature activation+2.89e+00

Pos logit contributions
 back+0.249
 goodbye+0.214
 proud+0.200
 happy+0.194
 woman+0.193
Neg logit contributions
ocol-0.492
wic-0.488
ions-0.479
ptions-0.477
olving-0.474
Token "
Token ID366
Feature activation+3.49e+00

Pos logit contributions
wic+0.140
ocol+0.139
ions+0.137
olving+0.136
ptions+0.135
Neg logit contributions
 back-0.097
 goodbye-0.085
 proud-0.084
 woman-0.083
 relieved-0.081
TokenWhat
Token ID2061
Feature activation+3.01e+00

Pos logit contributions
wic+0.147
ocol+0.146
ions+0.143
olving+0.143
ptions+0.142
Neg logit contributions
 back-0.141
 goodbye-0.127
 proud-0.125
 woman-0.123
 relieved-0.121
Token do
Token ID466
Feature activation+3.50e+00

Pos logit contributions
ocol+0.170
icer+0.159
ions+0.157
lex+0.157
wic+0.156
Neg logit contributions
 back-0.088
 happy-0.080
 proud-0.076
 excited-0.075
 goodbye-0.075
Token they
Token ID484
Feature activation-2.54e+00

Pos logit contributions
ocol+0.163
ptions+0.162
wic+0.162
icer+0.161
olving+0.160
Neg logit contributions
 back-0.125
 proud-0.114
 goodbye-0.113
 woman-0.113
 happy-0.112
Tokenen
Token ID268
Feature activation+2.45e-01

Pos logit contributions
 back+0.001
 goodbye+0.001
 proud+0.001
 woman+0.001
 thanked+0.001
Neg logit contributions
wic-0.001
ocol-0.001
ions-0.001
ptions-0.001
olving-0.001
Tokenched
Token ID1740
Feature activation-2.04e+00

Pos logit contributions
wic+0.014
ocol+0.014
olving+0.014
ions+0.014
ptions+0.014
Neg logit contributions
 back-0.006
 goodbye-0.005
 proud-0.005
 woman-0.005
 thanked-0.005
Token,
Token ID11
Feature activation+2.57e+00

Pos logit contributions
 back+0.042
 proud+0.036
 goodbye+0.036
 woman+0.036
 relieved+0.035
Neg logit contributions
wic-0.122
ocol-0.121
ions-0.120
olving-0.118
lex-0.117
Token the
Token ID262
Feature activation-5.91e-01

Pos logit contributions
ocol+0.151
wic+0.151
ptions+0.149
olving+0.149
ions+0.148
Neg logit contributions
 back-0.067
 goodbye-0.055
 proud-0.054
 happy-0.053
 woman-0.052
Token mouse
Token ID10211
Feature activation+7.68e-01

Pos logit contributions
 back+0.020
 woman+0.017
 goodbye+0.017
 proud+0.016
 happy+0.016
Neg logit contributions
ptions-0.031
wic-0.030
lex-0.030
ocol-0.030
ions-0.030
Token bid
Token ID8406
Feature activation-8.71e+00

Pos logit contributions
ocol+0.042
ptions+0.042
wic+0.042
lex+0.041
olving+0.041
Neg logit contributions
 back-0.023
 goodbye-0.019
 thanked-0.019
 proud-0.018
 happy-0.018
Token goodbye
Token ID24829
Feature activation-4.72e+00

Pos logit contributions
 back+0.254
 goodbye+0.219
 proud+0.217
 woman+0.213
 relieved+0.208
Neg logit contributions
wic-0.461
ocol-0.458
ions-0.450
olving-0.449
ptions-0.445
Token to
Token ID284
Feature activation+1.61e+00

Pos logit contributions
 back+0.105
 proud+0.097
 woman+0.095
 relieved+0.092
 thanked+0.089
Neg logit contributions
wic-0.251
olving-0.250
 Sp-0.248
ocol-0.246
lex-0.245
Token the
Token ID262
Feature activation-4.57e-01

Pos logit contributions
wic+0.079
ocol+0.078
olving+0.077
ions+0.077
ptions+0.077
Neg logit contributions
 back-0.054
 goodbye-0.047
 proud-0.047
 woman-0.046
 happy-0.045
Token huge
Token ID3236
Feature activation+1.23e+00

Pos logit contributions
 back+0.013
 woman+0.011
 happy+0.011
 proud+0.011
 goodbye+0.010
Neg logit contributions
wic-0.026
ions-0.026
ocol-0.026
ptions-0.026
lex-0.025
Token ant
Token ID1885
Feature activation+2.74e+00

Pos logit contributions
wic+0.069
ptions+0.069
ions+0.067
lex+0.066
ocol+0.066
Neg logit contributions
 back-0.038
 woman-0.032
 goodbye-0.032
 happy-0.030
 pony-0.029
Token replied
Token ID8712
Feature activation-6.65e+00

Pos logit contributions
ocol+0.068
wic+0.068
ions+0.067
olving+0.066
ptions+0.066
Neg logit contributions
 back-0.036
 goodbye-0.031
 proud-0.030
 woman-0.030
 thanked-0.030
Token,
Token ID11
Feature activation+3.22e+00

Pos logit contributions
 back+0.212
 goodbye+0.187
 proud+0.176
 happy+0.170
 relieved+0.167
Neg logit contributions
ptions-0.349
wic-0.349
ocol-0.348
ions-0.345
olving-0.340
Token �
Token ID6184
Feature activation+5.85e+00

Pos logit contributions
ocol+0.184
wic+0.183
ptions+0.183
ions+0.181
olving+0.179
Neg logit contributions
 back-0.093
 goodbye-0.080
 proud-0.077
 excited-0.075
 happy-0.075
Token�
Token ID95
Feature activation+5.83e+00

Pos logit contributions
ptions+0.236
ions+0.234
olving+0.223
ocol+0.204
anish+0.204
Neg logit contributions
 back-0.261
 proud-0.259
 goodbye-0.243
 relieved-0.238
five-0.231
Token€
Token ID26391
Feature activation+3.24e+00

Pos logit contributions
ions+0.249
ocol+0.238
ptions+0.236
olving+0.235
wic+0.233
Neg logit contributions
 back-0.245
 proud-0.227
five-0.222
 relieved-0.219
 goodbye-0.215
Token�
Token ID129
Feature activation+8.78e+00

Pos logit contributions
ocol+0.126
ions+0.124
ptions+0.121
wic+0.120
olving+0.119
Neg logit contributions
 back-0.143
 proud-0.134
 relieved-0.132
 goodbye-0.131
 woman-0.129
Token�
Token ID241
Feature activation+4.35e+00

Pos logit contributions
wic+0.328
ocol+0.324
ions+0.320
olving+0.316
ptions+0.314
Neg logit contributions
 back-0.396
 goodbye-0.360
 proud-0.359
 woman-0.351
 relieved-0.348
TokenNo
Token ID2949
Feature activation+3.31e+00

Pos logit contributions
wic+0.200
ocol+0.198
ptions+0.196
ions+0.194
olving+0.192
Neg logit contributions
 back-0.159
 proud-0.152
 excited-0.149
 happy-0.148
 woman-0.147
Token,
Token ID11
Feature activation+3.09e+00

Pos logit contributions
ptions+0.199
amy+0.193
olving+0.190
ions+0.189
ocol+0.187
Neg logit contributions
 goodbye-0.090
 back-0.086
 says-0.076
 said-0.075
 and-0.074
Token I
Token ID314
Feature activation+2.16e+00

Pos logit contributions
ptions+0.170
wic+0.167
ocol+0.167
olving+0.166
ions+0.163
Neg logit contributions
 back-0.093
 goodbye-0.085
 proud-0.079
 happy-0.079
 woman-0.076
Token don
Token ID836
Feature activation+6.81e+00

Pos logit contributions
ocol+0.101
wic+0.101
ions+0.100
ptions+0.100
olving+0.098
Neg logit contributions
 back-0.077
 proud-0.069
 goodbye-0.068
 happy-0.067
 relieved-0.066
Token feel
Token ID1254
Feature activation-2.57e+00

Pos logit contributions
 woman+0.077
 goodbye+0.075
 back+0.074
 couple+0.071
 pony+0.069
Neg logit contributions
olving-0.123
wic-0.122
 centimeters-0.118
 Sp-0.116
ions-0.116
Token better
Token ID1365
Feature activation-3.46e+00

Pos logit contributions
 back+0.068
 woman+0.068
 goodbye+0.063
 couple+0.062
 wife+0.062
Neg logit contributions
wic-0.128
ocol-0.126
olving-0.125
 centimeters-0.123
 Sp-0.122
Token."
Token ID526
Feature activation-1.48e+00

Pos logit contributions
 back+0.123
 goodbye+0.121
 woman+0.119
 relieved+0.116
 pony+0.113
Neg logit contributions
 centimeters-0.140
olving-0.140
 degrees-0.138
wic-0.133
 cheating-0.132
Token
Token ID198
Feature activation-2.74e+00

Pos logit contributions
 back+0.041
 goodbye+0.037
 woman+0.035
 couple+0.034
 pony+0.033
Neg logit contributions
 Sp-0.081
wic-0.076
 arose-0.074
lex-0.073
ions-0.073
Token
Token ID198
Feature activation-2.66e+00

Pos logit contributions
 back+0.080
 goodbye+0.080
 wife+0.072
 woman+0.069
 pony+0.069
Neg logit contributions
 Sp-0.144
wic-0.132
ions-0.132
 arose-0.132
lex-0.130
TokenTim
Token ID14967
Feature activation+8.52e+00

Pos logit contributions
 back+0.107
 proud+0.101
 happy+0.099
 thanked+0.098
 woman+0.096
Neg logit contributions
ocol-0.121
wic-0.116
ions-0.113
ptions-0.109
olving-0.108
Tokenmy
Token ID1820
Feature activation+5.84e-01

Pos logit contributions
wic+0.364
olving+0.357
ptions+0.351
lex+0.348
ions+0.345
Neg logit contributions
 back-0.320
 goodbye-0.304
 proud-0.297
 woman-0.289
 relieved-0.287
Token felt
Token ID2936
Feature activation-1.69e+00

Pos logit contributions
wic+0.028
nexpected+0.027
 Sp+0.027
olving+0.026
 centimeters+0.026
Neg logit contributions
 back-0.017
 goodbye-0.016
 woman-0.015
five-0.015
 proud-0.015
Token better
Token ID1365
Feature activation-3.42e+00

Pos logit contributions
 woman+0.059
 couple+0.057
amb+0.053
 pony+0.053
five+0.053
Neg logit contributions
 Sp-0.067
wic-0.066
 cheating-0.064
olving-0.064
 centimeters-0.063
Token knowing
Token ID6970
Feature activation-5.43e+00

Pos logit contributions
 back+0.075
 goodbye+0.069
 pony+0.068
 thanked+0.064
 woman+0.062
Neg logit contributions
olving-0.183
 centimeters-0.181
 cheating-0.178
 Sp-0.178
 degrees-0.173
Token that
Token ID326
Feature activation-3.35e+00

Pos logit contributions
 back+0.133
 goodbye+0.110
 woman+0.108
 proud+0.107
 relieved+0.106
Neg logit contributions
wic-0.303
ions-0.303
ocol-0.302
olving-0.298
 Sp-0.295
Token the
Token ID262
Feature activation+6.50e-02

Pos logit contributions
wic+0.106
ptions+0.105
lex+0.103
ocol+0.102
icer+0.101
Neg logit contributions
 back-0.053
 happy-0.041
 goodbye-0.041
 proud-0.040
 around-0.039
Token noise
Token ID7838
Feature activation-5.23e-01

Pos logit contributions
wic+0.003
ocol+0.003
olving+0.003
ptions+0.003
 centimeters+0.003
Neg logit contributions
 back-0.002
 goodbye-0.002
 proud-0.002
 relieved-0.002
 thanked-0.002
Token.
Token ID13
Feature activation-2.47e-01

Pos logit contributions
 back+0.013
 goodbye+0.011
 proud+0.011
 happy+0.010
 woman+0.010
Neg logit contributions
wic-0.030
ocol-0.030
ptions-0.029
ions-0.029
olving-0.029
Token
Token ID198
Feature activation-2.48e+00

Pos logit contributions
 back+0.006
 goodbye+0.005
 woman+0.005
 pony+0.005
 wife+0.004
Neg logit contributions
 Sp-0.014
wic-0.014
lex-0.014
ions-0.014
olving-0.014
Token
Token ID198
Feature activation-2.43e+00

Pos logit contributions
 back+0.052
 goodbye+0.047
 woman+0.042
 wife+0.041
 pony+0.041
Neg logit contributions
ions-0.147
 Sp-0.146
wic-0.146
lex-0.144
olving-0.144
TokenTim
Token ID14967
Feature activation+8.65e+00

Pos logit contributions
 back+0.097
 proud+0.093
 relieved+0.091
 happy+0.090
 thanked+0.089
Neg logit contributions
ocol-0.109
wic-0.108
ptions-0.102
ions-0.101
olving-0.100
Tokenmy
Token ID1820
Feature activation+8.58e-01

Pos logit contributions
wic+0.403
ocol+0.396
olving+0.392
ions+0.391
ptions+0.390
Neg logit contributions
 back-0.305
 goodbye-0.275
 proud-0.271
 woman-0.265
 relieved-0.260
Token pushed
Token ID7121
Feature activation+8.32e-01

Pos logit contributions
wic+0.040
 Sp+0.039
ptions+0.038
 centimeters+0.038
 arose+0.038
Neg logit contributions
 back-0.027
 goodbye-0.025
 proud-0.025
 wife-0.024
 woman-0.024
Token and
Token ID290
Feature activation-4.05e+00

Pos logit contributions
wic+0.046
ocol+0.046
ions+0.045
olving+0.045
ptions+0.044
Neg logit contributions
 back-0.023
 goodbye-0.020
 proud-0.019
 woman-0.019
 relieved-0.018
Token pulled
Token ID5954
Feature activation+9.48e-01

Pos logit contributions
 back+0.135
 proud+0.115
 goodbye+0.114
 thanked+0.112
 happy+0.112
Neg logit contributions
ocol-0.211
wic-0.209
olving-0.201
ions-0.201
lex-0.200
Token with
Token ID351
Feature activation-2.10e+00

Pos logit contributions
wic+0.065
ocol+0.063
ptions+0.062
esan+0.061
olving+0.061
Neg logit contributions
 back-0.022
 around-0.014
 and-0.014
 happy-0.014
 excited-0.013
Token said
Token ID531
Feature activation-7.33e+00

Pos logit contributions
wic+0.083
ptions+0.080
ocol+0.080
ions+0.080
olving+0.080
Neg logit contributions
 back-0.044
 goodbye-0.039
 proud-0.037
 woman-0.036
 relieved-0.035
Token,
Token ID11
Feature activation+3.35e+00

Pos logit contributions
 back+0.226
 goodbye+0.200
 proud+0.187
 happy+0.184
 relieved+0.176
Neg logit contributions
ocol-0.396
ptions-0.394
wic-0.394
ions-0.391
icer-0.387
Token �
Token ID6184
Feature activation+5.72e+00

Pos logit contributions
ocol+0.182
wic+0.182
ions+0.178
ptions+0.177
olving+0.177
Neg logit contributions
 back-0.100
 goodbye-0.086
 proud-0.086
 happy-0.083
 woman-0.082
Token�
Token ID95
Feature activation+5.82e+00

Pos logit contributions
ions+0.210
ptions+0.200
ocol+0.199
wic+0.199
olving+0.198
Neg logit contributions
 back-0.266
 proud-0.254
 relieved-0.241
 goodbye-0.237
 woman-0.233
Token€
Token ID26391
Feature activation+3.22e+00

Pos logit contributions
wic+0.269
ocol+0.267
ions+0.264
olving+0.257
ptions+0.255
Neg logit contributions
 back-0.214
 proud-0.190
 relieved-0.184
 woman-0.183
 goodbye-0.182
Token�
Token ID129
Feature activation+8.93e+00

Pos logit contributions
wic+0.127
olving+0.123
ocol+0.121
ptions+0.120
ions+0.119
Neg logit contributions
 back-0.136
 goodbye-0.125
 woman-0.121
 proud-0.120
 thanked-0.117
Token�
Token ID241
Feature activation+4.45e+00

Pos logit contributions
ocol+0.309
wic+0.308
lex+0.305
olving+0.305
ptions+0.298
Neg logit contributions
 back-0.407
 goodbye-0.381
 woman-0.376
 thanked-0.365
 proud-0.362
TokenDo
Token ID5211
Feature activation+6.84e+00

Pos logit contributions
wic+0.215
ocol+0.214
ptions+0.210
olving+0.210
ions+0.207
Neg logit contributions
 back-0.150
 proud-0.143
 woman-0.142
 happy-0.140
 excited-0.139
Token you
Token ID345
Feature activation+2.18e-02

Pos logit contributions
ocol+0.312
wic+0.311
ions+0.302
ptions+0.299
olving+0.298
Neg logit contributions
 back-0.258
 proud-0.231
 goodbye-0.230
 happy-0.226
 woman-0.220
Token want
Token ID765
Feature activation-9.76e-01

Pos logit contributions
ocol+0.001
wic+0.001
olving+0.001
ions+0.001
lex+0.001
Neg logit contributions
 back-0.001
 woman-0.000
 goodbye-0.000
 proud-0.000
 relieved-0.000
Token to
Token ID284
Feature activation+1.42e+00

Pos logit contributions
 back+0.019
 goodbye+0.015
 proud+0.015
 woman+0.014
 thanked+0.014
Neg logit contributions
wic-0.062
ocol-0.061
olving-0.060
ptions-0.060
ions-0.060
Token our
Token ID674
Feature activation+1.88e-01

Pos logit contributions
ptions+0.115
wic+0.113
lex+0.112
olving+0.110
anish+0.109
Neg logit contributions
 back-0.064
 happy-0.052
 proud-0.052
 and-0.051
 home-0.049
Token walk
Token ID2513
Feature activation-1.84e+00

Pos logit contributions
ptions+0.009
ocol+0.009
ions+0.009
wic+0.009
lex+0.008
Neg logit contributions
 back-0.007
 goodbye-0.006
 happy-0.006
 proud-0.006
 excited-0.006
Token."
Token ID526
Feature activation-1.12e+00

Pos logit contributions
 back+0.052
 goodbye+0.040
 around+0.038
 proud+0.038
 happy+0.037
Neg logit contributions
ocol-0.109
wic-0.108
ptions-0.105
lex-0.105
olving-0.104
Token
Token ID198
Feature activation-2.21e+00

Pos logit contributions
 back+0.018
 goodbye+0.013
 proud+0.013
 woman+0.012
 relieved+0.011
Neg logit contributions
wic-0.074
ocol-0.073
ions-0.073
olving-0.072
lex-0.072
Token
Token ID198
Feature activation-2.14e+00

Pos logit contributions
 back+0.037
 goodbye+0.029
 proud+0.026
 woman+0.025
 relieved+0.023
Neg logit contributions
wic-0.143
ions-0.142
ocol-0.142
lex-0.140
olving-0.140
TokenO
Token ID46
Feature activation+9.13e+00

Pos logit contributions
 woman+0.066
 proud+0.062
 happy+0.062
 back+0.061
 thanked+0.061
Neg logit contributions
ocol-0.116
wic-0.115
olving-0.109
 centimeters-0.104
ions-0.104
Tokenlive
Token ID12583
Feature activation+1.11e+00

Pos logit contributions
ptions+0.324
ocol+0.323
ions+0.313
wic+0.297
 Sp+0.295
Neg logit contributions
 proud-0.412
 thanked-0.405
 back-0.403
 relieved-0.393
 happy-0.388
Token agreed
Token ID4987
Feature activation-3.81e+00

Pos logit contributions
ocol+0.069
ions+0.066
ptions+0.065
olving+0.065
wic+0.065
Neg logit contributions
 back-0.028
 thanked-0.024
 goodbye-0.023
 proud-0.022
 excited-0.022
Token to
Token ID284
Feature activation+1.73e+00

Pos logit contributions
 back+0.076
 goodbye+0.061
 proud+0.059
 woman+0.056
 relieved+0.054
Neg logit contributions
wic-0.239
ocol-0.238
ions-0.234
olving-0.233
ptions-0.233
Token be
Token ID307
Feature activation+1.56e+00

Pos logit contributions
icer+0.109
ocol+0.107
wic+0.107
ptions+0.105
ions+0.102
Neg logit contributions
 back-0.049
 happy-0.039
 goodbye-0.039
 proud-0.038
 excited-0.036
Token patient
Token ID5827
Feature activation+8.82e-01

Pos logit contributions
wic+0.092
ocol+0.091
icer+0.088
ptions+0.087
ions+0.087
Neg logit contributions
 back-0.047
 proud-0.039
 happy-0.038
 relieved-0.036
 excited-0.036
Token he
Token ID339
Feature activation-1.69e+00

Pos logit contributions
 back+0.018
 goodbye+0.015
 proud+0.015
 woman+0.014
 relieved+0.014
Neg logit contributions
 Sp-0.023
ocol-0.023
ions-0.022
wic-0.022
lex-0.022
Token shouted
Token ID17293
Feature activation-3.80e+00

Pos logit contributions
 back+0.038
 goodbye+0.031
 proud+0.030
 woman+0.029
 relieved+0.027
Neg logit contributions
wic-0.101
ptions-0.098
 Sp-0.097
ions-0.097
 arose-0.096
Token.
Token ID13
Feature activation-1.28e-01

Pos logit contributions
 back+0.109
 woman+0.105
 thanked+0.104
 proud+0.103
 goodbye+0.100
Neg logit contributions
ocol-0.182
 Sp-0.180
wic-0.180
lex-0.178
olving-0.176
Token
Token ID198
Feature activation-2.19e+00

Pos logit contributions
 back+0.003
 goodbye+0.002
 proud+0.002
 woman+0.002
 wife+0.002
Neg logit contributions
ions-0.008
 Sp-0.008
wic-0.008
ocol-0.008
olving-0.007
Token
Token ID198
Feature activation-2.15e+00

Pos logit contributions
 back+0.040
 goodbye+0.036
 woman+0.031
 pony+0.030
 proud+0.030
Neg logit contributions
ions-0.136
wic-0.135
 Sp-0.135
lex-0.134
ocol-0.134
TokenL
Token ID43
Feature activation+6.33e+00

Pos logit contributions
 back+0.087
 proud+0.078
 relieved+0.077
 woman+0.076
 happy+0.076
Neg logit contributions
ocol-0.101
wic-0.100
olving-0.094
ptions-0.093
ions-0.093
Tokenily
Token ID813
Feature activation+3.04e+00

Pos logit contributions
wic+0.302
ocol+0.299
olving+0.293
ions+0.292
ptions+0.291
Neg logit contributions
 back-0.222
 goodbye-0.197
 proud-0.194
 woman-0.190
 relieved-0.187
Token was
Token ID373
Feature activation+2.39e+00

Pos logit contributions
ocol+0.203
wic+0.202
ions+0.199
olving+0.198
ptions+0.197
Neg logit contributions
 back-0.048
 goodbye-0.036
 proud-0.035
 woman-0.033
 thanked-0.032
Token scared
Token ID12008
Feature activation-6.10e-03

Pos logit contributions
 cheating+0.133
 centimeters+0.126
ptions+0.126
wic+0.123
 Sp+0.121
Neg logit contributions
 goodbye-0.056
 woman-0.056
 back-0.053
 pony-0.052
 wife-0.052
Token.
Token ID13
Feature activation-3.61e-02

Pos logit contributions
 back+0.000
 goodbye+0.000
 proud+0.000
 woman+0.000
 thanked+0.000
Neg logit contributions
ocol-0.000
ions-0.000
wic-0.000
olving-0.000
ptions-0.000
Token She
Token ID1375
Feature activation+2.55e-02

Pos logit contributions
 back+0.001
 proud+0.000
 goodbye+0.000
 woman+0.000
 relieved+0.000
Neg logit contributions
ocol-0.003
wic-0.002
ptions-0.002
 centimeters-0.002
ions-0.002
Token
Token ID198
Feature activation-1.93e+00

Pos logit contributions
 back+0.045
 goodbye+0.042
 pony+0.038
 woman+0.038
 wife+0.037
Neg logit contributions
 Sp-0.114
ions-0.113
wic-0.111
lex-0.110
 arose-0.109
TokenS
Token ID50
Feature activation+3.93e+00

Pos logit contributions
 back+0.058
 woman+0.053
 proud+0.053
 happy+0.051
 relieved+0.051
Neg logit contributions
wic-0.106
ocol-0.105
olving-0.101
ptions-0.099
icer-0.097
Tokenara
Token ID3301
Feature activation+2.45e+00

Pos logit contributions
 Sp+0.145
ocol+0.143
 accompl+0.143
ions+0.141
 degrees+0.140
Neg logit contributions
 back-0.175
 says-0.158
 and-0.152
five-0.150
 goodbye-0.143
Token opens
Token ID9808
Feature activation+4.68e-01

Pos logit contributions
wic+0.132
ocol+0.131
ions+0.129
olving+0.129
ptions+0.128
Neg logit contributions
 back-0.070
 goodbye-0.060
 proud-0.059
 woman-0.058
 relieved-0.056
Token the
Token ID262
Feature activation+3.16e-01

Pos logit contributions
wic+0.024
ocol+0.024
ions+0.024
olving+0.024
ptions+0.023
Neg logit contributions
 back-0.014
 goodbye-0.013
 proud-0.012
 woman-0.012
 happy-0.012
Token glue
Token ID22749
Feature activation+2.16e+00

Pos logit contributions
ions+0.016
ocol+0.016
ptions+0.016
wic+0.016
lex+0.015
Neg logit contributions
 back-0.011
 goodbye-0.009
 proud-0.009
 woman-0.009
 happy-0.009
Token and
Token ID290
Feature activation-3.59e+00

Pos logit contributions
wic+0.140
ions+0.140
olving+0.139
ocol+0.138
ptions+0.136
Neg logit contributions
 back-0.032
 goodbye-0.027
 proud-0.027
 relieved-0.026
 woman-0.024
Token squee
Token ID13170
Feature activation+8.22e+00

Pos logit contributions
 back+0.125
 goodbye+0.122
 woman+0.118
 relieved+0.118
 proud+0.116
Neg logit contributions
wic-0.163
ocol-0.153
 Sp-0.152
ions-0.152
 cheating-0.149
Tokenzes
Token ID12271
Feature activation-1.57e+00

Pos logit contributions
ptions+0.322
ions+0.319
olving+0.319
 centimeters+0.313
ocol+0.299
Neg logit contributions
 back-0.380
 goodbye-0.338
 thanked-0.325
 says-0.319
 proud-0.316
Token some
Token ID617
Feature activation-1.08e+00

Pos logit contributions
 back+0.043
 around+0.031
 and+0.030
 goodbye+0.030
 excited+0.030
Neg logit contributions
wic-0.096
ptions-0.095
ocol-0.093
olving-0.093
 centimeters-0.093
Token on
Token ID319
Feature activation+2.20e+00

Pos logit contributions
 back+0.049
 proud+0.044
 goodbye+0.044
 happy+0.043
 thanked+0.042
Neg logit contributions
wic-0.041
ptions-0.040
olving-0.039
ocol-0.039
lex-0.039
Token every
Token ID790
Feature activation-2.37e+00

Pos logit contributions
 back+0.169
 relieved+0.148
 woman+0.148
 goodbye+0.146
 proud+0.146
Neg logit contributions
wic-0.336
ions-0.332
olving-0.329
ptions-0.327
ocol-0.327
Token day
Token ID1110
Feature activation-1.29e+00

Pos logit contributions
 back+0.078
 proud+0.069
 goodbye+0.068
 relieved+0.068
 thanked+0.066
Neg logit contributions
wic-0.114
ocol-0.113
ions-0.112
olving-0.112
lex-0.109
Token.
Token ID13
Feature activation-1.09e-01

Pos logit contributions
 back+0.034
 goodbye+0.029
 proud+0.029
 woman+0.028
 relieved+0.028
Neg logit contributions
wic-0.070
ocol-0.070
ions-0.069
olving-0.069
ptions-0.068
Token She
Token ID1375
Feature activation+2.33e-01

Pos logit contributions
 back+0.002
 goodbye+0.002
 proud+0.002
 woman+0.002
 relieved+0.002
Neg logit contributions
wic-0.007
ocol-0.007
ions-0.006
olving-0.006
ptions-0.006
Token sings
Token ID33041
Feature activation-6.32e+00

Pos logit contributions
wic+0.010
 Sp+0.009
 cheating+0.009
olving+0.009
ions+0.009
Neg logit contributions
 back-0.008
 relieved-0.008
 goodbye-0.008
 woman-0.008
 couple-0.007
Token in
Token ID287
Feature activation+3.13e+00

Pos logit contributions
 back+0.165
 woman+0.150
 relieved+0.149
 thanked+0.146
 proud+0.145
Neg logit contributions
wic-0.328
ions-0.325
olving-0.324
 Sp-0.321
ptions-0.321
Token the
Token ID262
Feature activation+3.73e-01

Pos logit contributions
wic+0.188
ocol+0.182
olving+0.182
lex+0.182
ions+0.181
Neg logit contributions
 back-0.082
 happy-0.070
 goodbye-0.068
 proud-0.065
 woman-0.063
Token morning
Token ID3329
Feature activation-8.54e-01

Pos logit contributions
wic+0.020
ocol+0.020
ions+0.020
olving+0.020
ptions+0.020
Neg logit contributions
 back-0.011
 goodbye-0.009
 proud-0.009
 woman-0.009
 relieved-0.009
Token,
Token ID11
Feature activation+3.73e+00

Pos logit contributions
 back+0.022
 goodbye+0.019
 proud+0.019
 relieved+0.019
 woman+0.018
Neg logit contributions
wic-0.047
ocol-0.046
ions-0.046
olving-0.046
ptions-0.045
Token in
Token ID287
Feature activation+3.09e+00

Pos logit contributions
wic+0.206
ocol+0.202
olving+0.202
ions+0.202
 Sp+0.199
Neg logit contributions
 back-0.089
 goodbye-0.076
 proud-0.075
 relieved-0.075
 woman-0.075
Token the
Token ID262
Feature activation+3.39e-01

Pos logit contributions
wic+0.181
ocol+0.176
lex+0.174
ptions+0.173
olving+0.173
Neg logit contributions
 back-0.088
 goodbye-0.073
 happy-0.073
 proud-0.070
 woman-0.069
Token stop
Token ID2245
Feature activation+6.78e-01

Pos logit contributions
 Sp+0.139
gh+0.135
olving+0.134
lex+0.134
ocol+0.131
Neg logit contributions
 goodbye-0.109
 relieved-0.107
 woman-0.105
 says-0.100
eal-0.100
Token weeping
Token ID45705
Feature activation-4.12e+00

Pos logit contributions
wic+0.030
ocol+0.030
olving+0.030
ions+0.030
ptions+0.029
Neg logit contributions
 back-0.025
 goodbye-0.022
 proud-0.022
 woman-0.022
 relieved-0.021
Token.
Token ID13
Feature activation-3.85e-01

Pos logit contributions
 back+0.112
 goodbye+0.095
 proud+0.092
 woman+0.090
 happy+0.089
Neg logit contributions
wic-0.230
ocol-0.229
ions-0.224
ptions-0.224
olving-0.223
Token She
Token ID1375
Feature activation-2.16e-01

Pos logit contributions
 back+0.008
 goodbye+0.007
 proud+0.007
 woman+0.007
 relieved+0.006
Neg logit contributions
wic-0.023
olving-0.023
ocol-0.023
ions-0.022
ptions-0.022
Token we
Token ID356
Feature activation-1.16e-01

Pos logit contributions
 back+0.007
 goodbye+0.006
 proud+0.006
 woman+0.006
 relieved+0.006
Neg logit contributions
wic-0.011
 Sp-0.010
ptions-0.010
olving-0.010
ocol-0.010
Tokenpt
Token ID457
Feature activation+2.38e+00

Pos logit contributions
 back+0.005
 woman+0.005
 goodbye+0.005
 proud+0.005
 relieved+0.005
Neg logit contributions
wic-0.004
ocol-0.004
ions-0.004
olving-0.004
ptions-0.004
Token and
Token ID290
Feature activation-3.88e+00

Pos logit contributions
wic+0.143
ions+0.140
olving+0.140
ocol+0.139
 centimeters+0.138
Neg logit contributions
 back-0.049
 goodbye-0.042
 woman-0.040
 proud-0.040
 thanked-0.039
Token we
Token ID356
Feature activation-1.20e-01

Pos logit contributions
 back+0.143
 woman+0.127
 goodbye+0.126
 proud+0.125
 relieved+0.124
Neg logit contributions
wic-0.170
olving-0.163
ocol-0.162
ions-0.162
 Sp-0.160
Tokenpt
Token ID457
Feature activation+2.40e+00

Pos logit contributions
 back+0.005
 woman+0.005
 goodbye+0.005
 proud+0.005
 wife+0.005
Neg logit contributions
wic-0.005
ocol-0.004
olving-0.004
ions-0.004
 Sp-0.004
Token until
Token ID1566
Feature activation+1.92e+00

Pos logit contributions
wic+0.131
ocol+0.129
ions+0.128
olving+0.128
ptions+0.126
Neg logit contributions
 back-0.065
 goodbye-0.057
 proud-0.055
 woman-0.055
 thanked-0.053
Token her
Token ID607
Feature activation-5.00e+00

Pos logit contributions
wic+0.102
ocol+0.101
ions+0.100
olving+0.099
ptions+0.098
Neg logit contributions
 back-0.055
 goodbye-0.047
 proud-0.047
 woman-0.046
 relieved-0.045
Token dress
Token ID6576
Feature activation-1.13e+00

Pos logit contributions
wic+0.066
ocol+0.065
ptions+0.065
lex+0.064
 centimeters+0.063
Neg logit contributions
 back-0.024
 happy-0.018
 proud-0.017
 around-0.017
 and-0.017
Token-
Token ID12
Feature activation+1.16e+00

Pos logit contributions
 back+0.033
 goodbye+0.029
 proud+0.028
 woman+0.028
 relieved+0.027
Neg logit contributions
wic-0.059
ocol-0.059
olving-0.058
ions-0.058
ptions-0.057
Tokenup
Token ID929
Feature activation-5.59e-01

Pos logit contributions
wic+0.060
ocol+0.060
ions+0.059
olving+0.059
ptions+0.058
Neg logit contributions
 back-0.035
 goodbye-0.031
 proud-0.030
 woman-0.030
 relieved-0.029
Token.
Token ID13
Feature activation-3.90e-01

Pos logit contributions
 back+0.014
 goodbye+0.012
 proud+0.012
 woman+0.012
 relieved+0.012
Neg logit contributions
wic-0.031
ocol-0.031
olving-0.030
ions-0.030
ptions-0.030
Token<|endoftext|>
Token ID50256
Feature activation-1.18e+00

Pos logit contributions
 back+0.009
 goodbye+0.009
 wife+0.008
 woman+0.008
 proud+0.008
Neg logit contributions
 Sp-0.023
wic-0.021
ions-0.021
olving-0.021
lex-0.020
TokenOnce
Token ID7454
Feature activation+3.25e+00

Pos logit contributions
 back+0.017
 goodbye+0.013
 proud+0.012
 woman+0.012
 relieved+0.011
Neg logit contributions
wic-0.079
ocol-0.078
ions-0.078
olving-0.077
ptions-0.077
Token upon
Token ID2402
Feature activation+2.72e+00

Pos logit contributions
olving+0.075
 Sp+0.074
ions+0.073
icer+0.072
 arose+0.071
Neg logit contributions
five-0.175
 woman-0.168
 thanked-0.167
 back-0.165
 couple-0.165
Token a
Token ID257
Feature activation+4.00e-01

Pos logit contributions
wic+0.107
 Sp+0.106
ptions+0.105
ions+0.103
 centimeters+0.103
Neg logit contributions
 back-0.103
 goodbye-0.097
 woman-0.096
 thanked-0.092
 relieved-0.091
Token time
Token ID640
Feature activation+2.83e+00

Pos logit contributions
wic+0.013
ocol+0.013
ions+0.012
olving+0.012
ptions+0.012
Neg logit contributions
 back-0.020
 goodbye-0.018
 proud-0.018
 woman-0.018
 relieved-0.018
Token,
Token ID11
Feature activation+3.46e+00

Pos logit contributions
wic+0.155
ocol+0.155
ptions+0.154
olving+0.151
lex+0.151
Neg logit contributions
 back-0.082
 goodbye-0.070
 happy-0.069
 proud-0.067
 excited-0.066
Token there
Token ID612
Feature activation+9.24e-02

Pos logit contributions
wic+0.173
ocol+0.172
olving+0.166
ions+0.165
ptions+0.165
Neg logit contributions
 back-0.118
 proud-0.102
 happy-0.102
 goodbye-0.100
 relieved-0.098
Token made
Token ID925
Feature activation+1.18e+00

Pos logit contributions
 back+0.015
 goodbye+0.013
 proud+0.012
 woman+0.012
 relieved+0.012
Neg logit contributions
wic-0.032
ocol-0.031
olving-0.031
ions-0.031
ptions-0.031
Token them
Token ID606
Feature activation-2.23e+00

Pos logit contributions
wic+0.057
ocol+0.057
ions+0.056
olving+0.056
ptions+0.055
Neg logit contributions
 back-0.040
 goodbye-0.035
 proud-0.035
 woman-0.034
 relieved-0.034
Token open
Token ID1280
Feature activation+3.19e-02

Pos logit contributions
 woman+0.062
 back+0.060
 thanked+0.056
 couple+0.056
 relieved+0.054
Neg logit contributions
wic-0.109
olving-0.107
 centimeters-0.106
ions-0.102
ocol-0.102
Token their
Token ID511
Feature activation-2.19e+00

Pos logit contributions
ptions+0.002
ocol+0.002
lex+0.002
wic+0.002
uces+0.002
Neg logit contributions
 back-0.001
 and-0.000
 goodbye-0.000
 around-0.000
 happy-0.000
Token mouths
Token ID28552
Feature activation+2.57e+00

Pos logit contributions
 back+0.077
 goodbye+0.067
 proud+0.065
 woman+0.065
 happy+0.064
Neg logit contributions
wic-0.105
ocol-0.104
ions-0.104
ptions-0.102
olving-0.102
Token and
Token ID290
Feature activation-3.88e+00

Pos logit contributions
wic+0.163
olving+0.162
ions+0.160
 centimeters+0.158
ocol+0.158
Neg logit contributions
 back-0.040
 woman-0.035
 proud-0.033
 goodbye-0.033
 relieved-0.032
Token say
Token ID910
Feature activation-7.04e+00

Pos logit contributions
 back+0.117
 goodbye+0.102
 proud+0.100
 woman+0.098
 thanked+0.097
Neg logit contributions
wic-0.203
ocol-0.203
ions-0.200
ptions-0.199
olving-0.198
Token "
Token ID366
Feature activation+4.05e+00

Pos logit contributions
 back+0.186
 goodbye+0.182
 happy+0.156
 proud+0.150
 excited+0.136
Neg logit contributions
ptions-0.445
icer-0.438
amy-0.423
lex-0.421
uces-0.418
Tokenah
Token ID993
Feature activation+2.57e+00

Pos logit contributions
wic+0.159
ocol+0.153
olving+0.151
ptions+0.150
ions+0.147
Neg logit contributions
 back-0.177
 woman-0.162
 proud-0.161
 thanked-0.160
 happy-0.159
Tokenh
Token ID71
Feature activation+4.69e+00

Pos logit contributions
ptions+0.139
ocol+0.135
lex+0.132
wic+0.131
amy+0.131
Neg logit contributions
 back-0.080
 goodbye-0.073
 happy-0.069
 proud-0.068
 relieved-0.065
Token".
Token ID1911
Feature activation+4.11e-01

Pos logit contributions
ptions+0.224
amy+0.204
esan+0.203
ocol+0.203
 cheating+0.202
Neg logit contributions
 back-0.175
 goodbye-0.170
 proud-0.157
 and-0.155
 happy-0.151
Token new
Token ID649
Feature activation-2.78e+00

Pos logit contributions
ocol+0.024
wic+0.024
ions+0.023
olving+0.023
ptions+0.023
Neg logit contributions
 back-0.013
 goodbye-0.011
 proud-0.011
 happy-0.011
 woman-0.011
Token car
Token ID1097
Feature activation-1.44e+00

Pos logit contributions
 back+0.074
 goodbye+0.064
 proud+0.063
 woman+0.061
 relieved+0.061
Neg logit contributions
wic-0.153
ocol-0.151
olving-0.150
ions-0.149
ptions-0.148
Token.
Token ID13
Feature activation-6.68e-01

Pos logit contributions
 back+0.041
 goodbye+0.036
 proud+0.035
 woman+0.034
 relieved+0.033
Neg logit contributions
wic-0.077
ocol-0.076
ions-0.075
olving-0.075
ptions-0.074
Token This
Token ID770
Feature activation+3.38e+00

Pos logit contributions
 back+0.019
 goodbye+0.016
 proud+0.015
 relieved+0.015
 excited+0.015
Neg logit contributions
ocol-0.037
wic-0.037
olving-0.036
ptions-0.036
 degrees-0.036
Token one
Token ID530
Feature activation-5.86e+00

Pos logit contributions
wic+0.185
olving+0.183
ions+0.180
 cheating+0.180
ptions+0.179
Neg logit contributions
 back-0.081
 relieved-0.075
 proud-0.074
 goodbye-0.074
 woman-0.074
Token is
Token ID318
Feature activation+1.48e+00

Pos logit contributions
 back+0.223
 goodbye+0.209
 proud+0.203
 woman+0.203
 relieved+0.202
Neg logit contributions
wic-0.247
ions-0.244
olving-0.238
ocol-0.238
 cheating-0.235
Token broken
Token ID5445
Feature activation+1.44e+00

Pos logit contributions
ocol+0.087
wic+0.087
icer+0.084
ptions+0.084
ions+0.083
Neg logit contributions
 back-0.043
 happy-0.034
 proud-0.034
 goodbye-0.033
 excited-0.031
Token."
Token ID526
Feature activation-1.59e+00

Pos logit contributions
ocol+0.086
wic+0.085
ptions+0.082
lex+0.082
ions+0.081
Neg logit contributions
 back-0.040
 goodbye-0.030
 happy-0.029
 proud-0.029
 around-0.028
Token\n
Token ID198
Feature activation-2.57e+00

Pos logit contributions
 back+0.032
 goodbye+0.028
 woman+0.025
 proud+0.025
 relieved+0.023
Neg logit contributions
wic-0.096
 Sp-0.094
lex-0.093
olving-0.093
ptions-0.093
Token\n
Token ID198
Feature activation-2.50e+00

Pos logit contributions
 back+0.061
 goodbye+0.061
 woman+0.051
 pony+0.050
 wife+0.049
Neg logit contributions
 Sp-0.143
ions-0.140
lex-0.140
 arose-0.140
wic-0.139
TokenBilly
Token ID49640
Feature activation+2.95e+00

Pos logit contributions
 back+0.119
 proud+0.110
 woman+0.109
 thanked+0.107
 happy+0.107
Neg logit contributions
ocol-0.097
ions-0.092
wic-0.092
olving-0.089
 centimeters-0.086
Token she
Token ID673
Feature activation-3.91e+00

Pos logit contributions
 back+0.030
 goodbye+0.026
 happy+0.024
 and+0.023
 excited+0.023
Neg logit contributions
icer-0.091
ptions-0.090
olving-0.090
ocol-0.090
wic-0.089
Token asked
Token ID1965
Feature activation-4.45e+00

Pos logit contributions
 back+0.103
 goodbye+0.092
 proud+0.087
 woman+0.085
 relieved+0.084
Neg logit contributions
wic-0.215
ocol-0.209
olving-0.209
ptions-0.209
ions-0.208
Token her
Token ID607
Feature activation-5.02e+00

Pos logit contributions
 back+0.090
 goodbye+0.071
 happy+0.064
 woman+0.062
 proud+0.061
Neg logit contributions
wic-0.291
ocol-0.289
 centimeters-0.282
icer-0.282
ions-0.281
Token mother
Token ID2802
Feature activation-3.18e+00

Pos logit contributions
 back+0.113
 goodbye+0.095
 proud+0.093
 relieved+0.088
 woman+0.088
Neg logit contributions
wic-0.299
ocol-0.295
olving-0.293
ions-0.291
ptions-0.289
Token for
Token ID329
Feature activation-9.67e-01

Pos logit contributions
 back+0.099
 goodbye+0.081
 happy+0.078
 proud+0.074
 woman+0.074
Neg logit contributions
ocol-0.178
wic-0.177
icer-0.172
ptions-0.172
ions-0.171
Token a
Token ID257
Feature activation+3.13e-01

Pos logit contributions
 back+0.026
 goodbye+0.022
 happy+0.021
 wife+0.020
 proud+0.020
Neg logit contributions
icer-0.061
wic-0.059
olving-0.057
iders-0.056
ptions-0.056
Token blue
Token ID4171
Feature activation+2.39e+00

Pos logit contributions
ocol+0.017
wic+0.016
ions+0.016
ptions+0.016
olving+0.016
Neg logit contributions
 back-0.010
 goodbye-0.009
 woman-0.008
 happy-0.008
 proud-0.008
Tokenberry
Token ID8396
Feature activation+1.60e+00

Pos logit contributions
ptions+0.131
 centimeters+0.128
wic+0.128
olving+0.127
ocol+0.126
Neg logit contributions
 back-0.074
 goodbye-0.064
 woman-0.063
 pony-0.062
 wife-0.059
Token because
Token ID780
Feature activation-9.84e-01

Pos logit contributions
ptions+0.097
ocol+0.095
wic+0.095
 centimeters+0.093
olving+0.092
Neg logit contributions
 back-0.042
 goodbye-0.032
 woman-0.032
 proud-0.031
 happy-0.030
Token she
Token ID673
Feature activation-4.01e+00

Pos logit contributions
 back+0.022
 goodbye+0.018
 proud+0.018
 woman+0.017
 happy+0.017
Neg logit contributions
wic-0.060
ocol-0.059
olving-0.059
ptions-0.059
lex-0.058
Token had
Token ID550
Feature activation+2.61e+00

Pos logit contributions
 back+0.115
 proud+0.100
 woman+0.097
 goodbye+0.097
 thanked+0.096
Neg logit contributions
ocol-0.217
wic-0.214
ions-0.212
olving-0.212
ptions-0.209
Token.
Token ID13
Feature activation+2.82e-01

Pos logit contributions
wic+0.087
ocol+0.087
ions+0.085
olving+0.085
ptions+0.084
Neg logit contributions
 back-0.048
 goodbye-0.041
 proud-0.041
 woman-0.040
 happy-0.039
Token They
Token ID1119
Feature activation+2.24e+00

Pos logit contributions
wic+0.021
ocol+0.021
ptions+0.020
ions+0.020
olving+0.020
Neg logit contributions
 back-0.003
 goodbye-0.002
 proud-0.002
 happy-0.002
 relieved-0.002
Token were
Token ID547
Feature activation+2.87e+00

Pos logit contributions
ocol+0.111
wic+0.111
ions+0.109
olving+0.109
ptions+0.108
Neg logit contributions
 back-0.074
 goodbye-0.065
 proud-0.064
 woman-0.063
 thanked-0.063
Token boring
Token ID14262
Feature activation+4.74e+00

Pos logit contributions
wic+0.160
ocol+0.158
ions+0.156
olving+0.155
ptions+0.154
Neg logit contributions
 back-0.077
 goodbye-0.066
 proud-0.065
 woman-0.063
 relieved-0.061
Token and
Token ID290
Feature activation-3.67e+00

Pos logit contributions
wic+0.283
ocol+0.282
ions+0.275
ptions+0.274
lex+0.274
Neg logit contributions
 back-0.112
 goodbye-0.091
 proud-0.089
 woman-0.087
 happy-0.086
Token quiet
Token ID5897
Feature activation+1.57e+00

Pos logit contributions
 back+0.109
 happy+0.104
 proud+0.101
 excited+0.096
 goodbye+0.093
Neg logit contributions
ions-0.203
ocol-0.203
icer-0.203
ptions-0.202
wic-0.201
Token.
Token ID13
Feature activation+2.95e-01

Pos logit contributions
ocol+0.100
wic+0.098
lex+0.097
ptions+0.096
icer+0.095
Neg logit contributions
 back-0.038
 goodbye-0.029
 happy-0.027
 excited-0.026
 proud-0.026
Token He
Token ID679
Feature activation+1.96e+00

Pos logit contributions
wic+0.018
ocol+0.018
olving+0.017
ptions+0.017
ions+0.017
Neg logit contributions
 back-0.007
 goodbye-0.006
 proud-0.005
 happy-0.005
 relieved-0.005
Token liked
Token ID8288
Feature activation+2.07e+00

Pos logit contributions
ocol+0.099
wic+0.097
ions+0.097
olving+0.096
ptions+0.095
Neg logit contributions
 back-0.064
 goodbye-0.057
 proud-0.056
 thanked-0.056
 woman-0.055
Token trucks
Token ID13960
Feature activation+4.49e+00

Pos logit contributions
wic+0.130
ocol+0.130
icer+0.129
 centimeters+0.128
ptions+0.127
Neg logit contributions
 back-0.053
 happy-0.041
 goodbye-0.041
 woman-0.036
 and-0.036
Token.
Token ID13
Feature activation+3.69e-01

Pos logit contributions
wic+0.247
ocol+0.244
ions+0.242
olving+0.241
ptions+0.238
Neg logit contributions
 back-0.119
 goodbye-0.104
 proud-0.101
 woman-0.100
 relieved-0.097
Token it
Token ID340
Feature activation-3.21e+00

Pos logit contributions
 back+0.131
 goodbye+0.112
 proud+0.106
 woman+0.106
 thanked+0.104
Neg logit contributions
wic-0.210
olving-0.206
ocol-0.206
 centimeters-0.203
ptions-0.203
Token some
Token ID617
Feature activation-1.15e+00

Pos logit contributions
 back+0.072
 proud+0.067
 woman+0.066
 relieved+0.066
 goodbye+0.064
Neg logit contributions
wic-0.181
ions-0.178
ocol-0.175
olving-0.174
 Sp-0.172
Token y
Token ID331
Feature activation+3.45e+00

Pos logit contributions
 back+0.039
 goodbye+0.035
 proud+0.034
 happy+0.033
 woman+0.033
Neg logit contributions
ptions-0.057
wic-0.056
olving-0.055
lex-0.055
ocol-0.055
Tokenummy
Token ID13513
Feature activation+6.92e-01

Pos logit contributions
ptions+0.163
 cheating+0.159
wic+0.152
olving+0.151
redible+0.150
Neg logit contributions
 back-0.132
 goodbye-0.110
 and-0.109
 home-0.106
 thanked-0.105
Token food
Token ID2057
Feature activation+9.60e-01

Pos logit contributions
ptions+0.032
olving+0.032
wic+0.032
ocol+0.031
lex+0.031
Neg logit contributions
 back-0.026
 goodbye-0.023
 proud-0.022
 happy-0.022
 thanked-0.022
Token and
Token ID290
Feature activation-3.87e+00

Pos logit contributions
wic+0.059
ocol+0.059
ptions+0.058
olving+0.058
ions+0.058
Neg logit contributions
 back-0.020
 goodbye-0.016
 proud-0.015
 woman-0.015
 thanked-0.014
Token a
Token ID257
Feature activation+7.61e-01

Pos logit contributions
 back+0.114
 thanked+0.102
 goodbye+0.101
 proud+0.100
 happy+0.098
Neg logit contributions
ocol-0.208
ptions-0.207
wic-0.204
olving-0.204
lex-0.203
Token bed
Token ID3996
Feature activation-7.11e-01

Pos logit contributions
wic+0.038
ocol+0.038
ions+0.037
olving+0.037
ptions+0.037
Neg logit contributions
 back-0.025
 goodbye-0.022
 proud-0.022
 woman-0.021
 happy-0.021
Token to
Token ID284
Feature activation+2.16e+00

Pos logit contributions
 back+0.018
 goodbye+0.015
 proud+0.014
 thanked+0.014
 happy+0.014
Neg logit contributions
ocol-0.042
wic-0.042
ptions-0.042
lex-0.041
olving-0.040
Token sleep
Token ID3993
Feature activation-1.26e+00

Pos logit contributions
ptions+0.125
olving+0.123
wic+0.122
 centimeters+0.121
ions+0.121
Neg logit contributions
 back-0.059
 goodbye-0.051
 happy-0.050
 thanked-0.049
 proud-0.048
Token in
Token ID287
Feature activation+3.50e+00

Pos logit contributions
 back+0.033
 goodbye+0.028
 proud+0.026
 happy+0.025
 woman+0.025
Neg logit contributions
wic-0.073
ocol-0.073
ptions-0.072
lex-0.072
olving-0.071
Token ground
Token ID2323
Feature activation+1.64e+00

Pos logit contributions
 back+0.000
 goodbye+0.000
 proud+0.000
 relieved+0.000
 thanked+0.000
Neg logit contributions
wic-0.000
ocol-0.000
olving-0.000
ions-0.000
ptions-0.000
Token.
Token ID13
Feature activation-2.01e-01

Pos logit contributions
ocol+0.115
wic+0.113
ptions+0.109
icer+0.109
ions+0.108
Neg logit contributions
 back-0.036
 around-0.025
 and-0.022
 excited-0.021
 goodbye-0.021
Token He
Token ID679
Feature activation+1.57e+00

Pos logit contributions
 back+0.004
 goodbye+0.003
 proud+0.003
 excited+0.003
 happy+0.003
Neg logit contributions
ocol-0.013
wic-0.013
ions-0.013
ptions-0.013
olving-0.013
Token stopped
Token ID5025
Feature activation+3.24e-01

Pos logit contributions
wic+0.080
ocol+0.078
ptions+0.077
ions+0.077
olving+0.076
Neg logit contributions
 back-0.049
 goodbye-0.044
 proud-0.042
 woman-0.042
 relieved-0.041
Token running
Token ID2491
Feature activation+1.10e+00

Pos logit contributions
wic+0.019
ocol+0.019
ions+0.019
olving+0.019
ptions+0.018
Neg logit contributions
 back-0.008
 proud-0.007
 goodbye-0.006
 happy-0.006
 woman-0.006
Token and
Token ID290
Feature activation-4.04e+00

Pos logit contributions
ions+0.064
olving+0.064
 Sp+0.064
wic+0.064
 cheating+0.063
Neg logit contributions
 wife-0.018
 proud-0.016
 woman-0.016
 thanked-0.016
 relieved-0.016
Token picked
Token ID6497
Feature activation-1.01e+00

Pos logit contributions
 back+0.114
 goodbye+0.097
 proud+0.096
 woman+0.094
 relieved+0.091
Neg logit contributions
wic-0.220
ocol-0.219
ions-0.215
olving-0.215
ptions-0.213
Token them
Token ID606
Feature activation-2.63e+00

Pos logit contributions
 back+0.025
 woman+0.022
 goodbye+0.022
 proud+0.022
 relieved+0.021
Neg logit contributions
wic-0.056
ocol-0.055
ions-0.055
ptions-0.054
olving-0.054
Token up
Token ID510
Feature activation-6.30e+00

Pos logit contributions
 woman+0.047
 back+0.041
 wife+0.041
 goodbye+0.040
 thanked+0.039
Neg logit contributions
wic-0.162
ions-0.157
olving-0.157
lex-0.156
 cheating-0.155
Token.
Token ID13
Feature activation-1.84e-01

Pos logit contributions
 back+0.146
 goodbye+0.123
 woman+0.120
 proud+0.119
 relieved+0.114
Neg logit contributions
wic-0.366
ocol-0.364
olving-0.361
ions-0.360
lex-0.356
Token But
Token ID887
Feature activation+3.22e+00

Pos logit contributions
 back+0.005
 goodbye+0.004
 proud+0.004
 excited+0.004
 thanked+0.004
Neg logit contributions
ocol-0.011
wic-0.010
ions-0.010
ptions-0.010
olving-0.010
Token,
Token ID11
Feature activation+3.64e+00

Pos logit contributions
ocol+0.160
wic+0.159
ptions+0.159
arine+0.155
lex+0.155
Neg logit contributions
 back-0.075
 happy-0.063
 goodbye-0.062
 excited-0.060
 proud-0.058
Token there
Token ID612
Feature activation+1.31e-01

Pos logit contributions
wic+0.182
ocol+0.181
olving+0.175
ptions+0.175
ions+0.174
Neg logit contributions
 back-0.124
 happy-0.106
 proud-0.106
 goodbye-0.105
 relieved-0.102
Token was
Token ID373
Feature activation+2.08e+00

Pos logit contributions
ocol+0.008
ptions+0.008
 centimeters+0.008
 degrees+0.008
wic+0.008
Neg logit contributions
 back-0.003
 happy-0.003
 excited-0.002
 proud-0.002
 thanked-0.002
Token a
Token ID257
Feature activation+4.09e-01

Pos logit contributions
wic+0.109
ocol+0.108
ions+0.108
olving+0.106
ptions+0.106
Neg logit contributions
 back-0.061
 goodbye-0.053
 woman-0.052
 proud-0.051
 relieved-0.050
Token little
Token ID1310
Feature activation+6.35e-01

Pos logit contributions
ions+0.026
ocol+0.026
wic+0.026
olving+0.026
ptions+0.026
Neg logit contributions
 back-0.008
 proud-0.007
 happy-0.007
 woman-0.006
 goodbye-0.006
Token girl
Token ID2576
Feature activation-3.98e+00

Pos logit contributions
wic+0.042
ocol+0.041
olving+0.041
ions+0.041
ptions+0.041
Neg logit contributions
 back-0.011
 woman-0.008
 proud-0.008
 goodbye-0.008
 happy-0.008
Token named
Token ID3706
Feature activation+1.20e+00

Pos logit contributions
 back+0.148
 woman+0.131
 proud+0.130
 happy+0.129
 goodbye+0.127
Neg logit contributions
wic-0.188
ocol-0.185
olving-0.182
ptions-0.181
ions-0.178
Token Lily
Token ID20037
Feature activation+1.93e+00

Pos logit contributions
ocol+0.059
wic+0.059
olving+0.057
ions+0.056
 centimeters+0.056
Neg logit contributions
 back-0.044
 happy-0.038
 goodbye-0.038
 proud-0.037
 excited-0.036
Token.
Token ID13
Feature activation-3.47e-01

Pos logit contributions
lex+0.081
 Sp+0.081
wic+0.080
olving+0.080
ions+0.080
Neg logit contributions
 goodbye-0.069
 back-0.067
 relieved-0.063
 proud-0.062
 wife-0.060
Token One
Token ID1881
Feature activation+1.92e+00

Pos logit contributions
 back+0.004
 happy+0.003
 excited+0.002
 goodbye+0.002
 around+0.002
Neg logit contributions
ocol-0.026
wic-0.026
 degrees-0.025
 centimeters-0.025
ptions-0.025
Token day
Token ID1110
Feature activation-1.28e+00

Pos logit contributions
lex+0.069
 Sp+0.069
wic+0.067
olving+0.066
ions+0.064
Neg logit contributions
 back-0.078
 goodbye-0.076
 thanked-0.075
 relieved-0.072
 woman-0.071
Token are
Token ID389
Feature activation+2.36e+00

Pos logit contributions
wic+0.096
ocol+0.096
ions+0.095
olving+0.094
ptions+0.094
Neg logit contributions
 back-0.057
 proud-0.049
 thanked-0.048
 happy-0.048
 goodbye-0.047
Token selling
Token ID6301
Feature activation-3.85e-01

Pos logit contributions
wic+0.107
ocol+0.106
ions+0.104
olving+0.104
ptions+0.103
Neg logit contributions
 back-0.088
 goodbye-0.078
 proud-0.078
 woman-0.076
 happy-0.075
Token leaves
Token ID5667
Feature activation+4.32e+00

Pos logit contributions
 back+0.013
 goodbye+0.011
 proud+0.011
 woman+0.010
 happy+0.010
Neg logit contributions
wic-0.019
ocol-0.019
olving-0.019
ptions-0.019
ions-0.019
Token!"
Token ID2474
Feature activation-5.98e-01

Pos logit contributions
wic+0.248
ocol+0.245
ptions+0.244
olving+0.243
lex+0.240
Neg logit contributions
 back-0.112
 goodbye-0.091
 proud-0.089
 woman-0.086
 happy-0.085
Token Ben
Token ID3932
Feature activation+4.09e+00

Pos logit contributions
 back+0.021
 pony+0.019
found+0.018
 goodbye+0.018
 woman+0.018
Neg logit contributions
 Sp-0.027
ions-0.024
ocol-0.024
wic-0.024
 vinegar-0.023
Token says
Token ID1139
Feature activation-7.10e+00

Pos logit contributions
 cheating+0.211
 Sp+0.201
wic+0.200
 centimeters+0.189
iles+0.187
Neg logit contributions
 goodbye-0.096
 back-0.093
 proud-0.091
found-0.088
 couple-0.086
Token.
Token ID13
Feature activation+3.71e-01

Pos logit contributions
 woman+0.219
 back+0.215
 pony+0.209
 wife+0.208
 proud+0.208
Neg logit contributions
wic-0.318
ocol-0.316
 Sp-0.311
olving-0.310
ions-0.310
Token "
Token ID366
Feature activation+4.44e+00

Pos logit contributions
 Sp+0.019
ions+0.018
olving+0.017
 arose+0.017
wic+0.017
Neg logit contributions
 back-0.010
 proud-0.009
 goodbye-0.009
 pony-0.009
 relieved-0.009
TokenThey
Token ID2990
Feature activation+1.64e+00

Pos logit contributions
wic+0.191
ocol+0.189
olving+0.183
ions+0.183
ptions+0.182
Neg logit contributions
 back-0.176
 proud-0.158
 goodbye-0.157
 woman-0.156
 happy-0.153
Token are
Token ID389
Feature activation+2.40e+00

Pos logit contributions
wic+0.078
ocol+0.077
ions+0.076
olving+0.075
ptions+0.075
Neg logit contributions
 back-0.057
 goodbye-0.050
 proud-0.050
 woman-0.049
 happy-0.048
Token pretty
Token ID2495
Feature activation+6.15e-01

Pos logit contributions
wic+0.132
olving+0.131
ptions+0.130
ions+0.129
 arose+0.129
Neg logit contributions
 back-0.073
 proud-0.064
 happy-0.063
 excited-0.057
 goodbye-0.057
Token,
Token ID11
Feature activation+3.52e+00

Pos logit contributions
wic+0.061
ocol+0.061
ptions+0.060
ions+0.060
olving+0.059
Neg logit contributions
 back-0.027
 goodbye-0.022
 proud-0.022
 happy-0.021
 woman-0.021
Token her
Token ID607
Feature activation-4.85e+00

Pos logit contributions
wic+0.161
ocol+0.159
ions+0.158
ptions+0.155
olving+0.154
Neg logit contributions
 back-0.125
 goodbye-0.113
 proud-0.111
 relieved-0.110
 woman-0.109
Token books
Token ID3835
Feature activation+3.98e+00

Pos logit contributions
 back+0.149
 goodbye+0.126
 happy+0.126
 proud+0.124
 woman+0.121
Neg logit contributions
ocol-0.262
wic-0.260
ptions-0.260
lex-0.255
ions-0.255
Token,
Token ID11
Feature activation+3.52e+00

Pos logit contributions
ions+0.191
olving+0.189
wic+0.181
ptions+0.174
lex+0.174
Neg logit contributions
 wife-0.119
 woman-0.117
 goodbye-0.117
 back-0.115
 relieved-0.112
Token and
Token ID290
Feature activation-3.73e+00

Pos logit contributions
wic+0.220
ions+0.218
ocol+0.217
olving+0.215
ptions+0.213
Neg logit contributions
 back-0.067
 goodbye-0.055
 proud-0.054
 relieved-0.052
 woman-0.052
Token her
Token ID607
Feature activation-4.85e+00

Pos logit contributions
 back+0.115
 relieved+0.102
 proud+0.101
 goodbye+0.101
 woman+0.097
Neg logit contributions
olving-0.178
wic-0.178
ions-0.177
ocol-0.176
 cheating-0.174
Token envelop
Token ID16441
Feature activation+3.39e-01

Pos logit contributions
 back+0.157
 goodbye+0.134
 happy+0.132
 proud+0.129
 woman+0.128
Neg logit contributions
ocol-0.255
wic-0.253
ptions-0.252
ions-0.251
lex-0.249
Tokenes
Token ID274
Feature activation+1.42e+00

Pos logit contributions
ocol+0.015
wic+0.015
lex+0.014
ptions+0.014
olving+0.014
Neg logit contributions
 back-0.014
 goodbye-0.012
 woman-0.012
 proud-0.011
 thanked-0.011
Token in
Token ID287
Feature activation+3.12e+00

Pos logit contributions
wic+0.077
ocol+0.076
ions+0.076
olving+0.075
ptions+0.075
Neg logit contributions
 back-0.039
 goodbye-0.034
 proud-0.033
 woman-0.033
 relieved-0.032
Token the
Token ID262
Feature activation+1.09e-01

Pos logit contributions
wic+0.189
ocol+0.183
olving+0.181
lex+0.180
ions+0.178
Neg logit contributions
 back-0.081
 happy-0.067
 proud-0.062
 woman-0.061
 excited-0.060
Token box
Token ID3091
Feature activation+1.14e+00

Pos logit contributions
wic+0.006
ocol+0.006
ions+0.005
olving+0.005
ptions+0.005
Neg logit contributions
 back-0.004
 goodbye-0.003
 proud-0.003
 woman-0.003
 happy-0.003
Token spotted
Token ID13489
Feature activation+8.49e-02

Pos logit contributions
 back+0.112
 goodbye+0.103
 proud+0.099
 relieved+0.098
 woman+0.097
Neg logit contributions
ptions-0.193
 Sp-0.192
wic-0.192
 arose-0.184
ions-0.183
Token an
Token ID281
Feature activation+2.34e+00

Pos logit contributions
ocol+0.005
wic+0.005
lex+0.005
olving+0.005
ptions+0.005
Neg logit contributions
 back-0.002
 goodbye-0.002
 woman-0.002
 proud-0.001
 happy-0.001
Token old
Token ID1468
Feature activation-1.72e+00

Pos logit contributions
ocol+0.124
lex+0.122
 centimeters+0.121
wic+0.120
olving+0.120
Neg logit contributions
 back-0.061
 goodbye-0.059
 thanked-0.055
 relieved-0.054
 proud-0.052
Token witch
Token ID16365
Feature activation-1.24e+00

Pos logit contributions
 back+0.039
 goodbye+0.032
 woman+0.032
 proud+0.032
 happy+0.030
Neg logit contributions
wic-0.102
ocol-0.102
ions-0.100
olving-0.100
ptions-0.100
Token in
Token ID287
Feature activation+3.15e+00

Pos logit contributions
 back+0.034
 goodbye+0.028
 proud+0.028
 woman+0.027
 happy+0.026
Neg logit contributions
wic-0.070
ocol-0.069
ions-0.068
ptions-0.068
olving-0.068
Token her
Token ID607
Feature activation-4.90e+00

Pos logit contributions
wic+0.195
olving+0.185
ptions+0.185
ocol+0.184
lex+0.183
Neg logit contributions
 back-0.077
 woman-0.067
 goodbye-0.063
 happy-0.062
 proud-0.060
Token path
Token ID3108
Feature activation-1.33e+00

Pos logit contributions
five+0.190
 thanked+0.177
 proud+0.176
 relieved+0.175
 goodbye+0.171
Neg logit contributions
olving-0.172
wic-0.168
icer-0.168
 Sp-0.166
arine-0.162
Token.
Token ID13
Feature activation-8.44e-02

Pos logit contributions
 back+0.033
 goodbye+0.027
 proud+0.026
 woman+0.026
 happy+0.025
Neg logit contributions
wic-0.079
ocol-0.079
ptions-0.077
lex-0.076
ions-0.076
Token The
Token ID383
Feature activation+2.09e+00

Pos logit contributions
 back+0.002
 proud+0.002
 goodbye+0.002
 wife+0.002
 happy+0.002
Neg logit contributions
 Sp-0.005
wic-0.005
olving-0.005
ions-0.005
lex-0.004
Token witch
Token ID16365
Feature activation-1.27e+00

Pos logit contributions
wic+0.114
ocol+0.113
olving+0.112
ptions+0.112
ions+0.111
Neg logit contributions
 back-0.056
 goodbye-0.048
 proud-0.047
 thanked-0.046
 relieved-0.045
Token had
Token ID550
Feature activation+2.61e+00

Pos logit contributions
 back+0.037
 goodbye+0.033
 proud+0.031
 woman+0.030
 relieved+0.030
Neg logit contributions
wic-0.065
ptions-0.064
ocol-0.064
olving-0.064
ions-0.063