TOP ACTIVATIONS
MAX = 0.755

has a big books he lf with many books . She
go to the books he lf and start to climb .
hiding behind the books he lf . The cat jumps out
careful . The books he lf is tall and we are
chair to the books he lf . She stands on the
to climb the books he lf . She pushes the chair
can climb the books he lf and see what is on
had a big books he lf in her room with lots
had a big books he lf in her room . Her
book from the books he lf . She opens it and
I finished the books he lf !" Andy said with a
ran to the books he lf and put the box on
falls off the books he lf . It makes a loud
chair to the books he lf and climbed on it .
lf . The books he lf is high , so Lily
, on a books he lf . The books he lf
Sarah found a books he lf in her room . She
I touch the books he lf too ?" His mom smiled
we have a books he lf !" <|endoftext|> M ummy and
not climb the books he lf . It is dangerous .

MIN ACTIVATIONS
MIN = -0.648

He was proud of his names . M ia
were two friends . Their names were Lisa and Jerry .
ia wanted to have nice names for her animals . She
was a girl . Their names were Lucy and Joe .
best friends . Their names were Lucy and Charlotte .
in a tree . Their names were Max and Shy .
a large house . Their names were Molly and Jacob .
, far away . Their names were Pat and Bob .
good friends and signed their names together on everything they could
were two friends . Their names were Jack and Kim .
dogs and practiced saying their names . The
of the paintings or the names of the statues .
the pictures and learn the names of the animals . Tom
a green closet . Their names were Dave and L ulu
were two friends . Their names were Max and Ma isy
learned how to sign their names . They were so proud
She was happy with her names . Ben and
the words and learn the names of the animals .
in her nest . Their names were Tim , Kim ,
animals . He knew the names of many animals and the

INTERVAL 0.405 - 0.755
CONTAINS 1.643%

trouble . M ummy : " Did you drop
could wear proudly . And she had her mom , who
She asked Tim my why he lied to Jimmy . Tim
, too . Even if he is naughty sometimes . Come
the sky , even if she couldn 't bring a piece

INTERVAL 0.054 - 0.405
CONTAINS 29.998%

red box again . Tim remembered it from before . He
One day , Maria found an old map .
So they got in the man 's car and drove to
robe and gave Molly a loving pat on the head .
mom my and saw a boy playing with a toy gun

INTERVAL -0.297 - 0.054
CONTAINS 63.579%

not y ummy ! This is sour and bitter !"
on paper with her cr ay ons . One day ,
then some makeup . She wanted to make sure she looked
Jane quickly became troubled . She didn 't know what to
to find a muff in . They looked in the kitchen

INTERVAL -0.648 - -0.297
CONTAINS 4.780%

can do this . We have the will to win .
secret that he wanted to share with someone . One day
your imagination . Go to sleep and have sweet dreams ."
say they loved her . They sent her to bed without
I am Sara . I like you . You are very
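
The per-token records that follow fuse each label to its value, and the spacing carries information: `Token has` is the token " has" (with a leading space), while `Tokenhe` is the token "he" (no leading space). The same holds for contribution lines such as ` names-0.098` versus `lf+0.145`. Below is a minimal sketch of reading such records while preserving that distinction. The function name `parse_records`, the dictionary keys, and the regular expression are illustrative assumptions, not part of the tool that produced this dump, and the record layout is inferred only from the lines shown here.

```python
import re

# Assumed layout of a contribution line: the token text, then a signed decimal,
# e.g. " names-0.098" or "lf+0.145". The token text may start with a space.
CONTRIB = re.compile(r"^(?P<tok>.*?)(?P<val>[+-]\d+\.\d+)$")


def parse_records(text):
    """Split a tooltip dump into one dict per token record (hypothetical helper)."""
    records, current, bucket = [], None, None
    for line in text.splitlines():
        if line.startswith("Token ID"):
            # "Token ID468" -> 468; assumes a "Token..." line came first.
            current["id"] = int(line[len("Token ID"):])
        elif line.startswith("Feature activation"):
            # "Feature activation-1.30e-01" -> -0.13
            current["activation"] = float(line[len("Feature activation"):])
        elif line.startswith("Token"):
            # Keep everything after "Token" verbatim so a leading space survives.
            current = {"token": line[len("Token"):], "pos": [], "neg": []}
            records.append(current)
            bucket = None
        elif line == "Pos logit contributions":
            bucket = "pos"
        elif line == "Neg logit contributions":
            bucket = "neg"
        elif current is not None and bucket is not None:
            m = CONTRIB.match(line)
            if m:
                current[bucket].append((m.group("tok"), float(m.group("val"))))
    return records


# First record of the dump, written as a list so leading spaces stay visible.
SAMPLE = "\n".join([
    "Token has",
    "Token ID468",
    "Feature activation-1.30e-01",
    "",
    "Pos logit contributions",
    "lf+0.145",
    "mber+0.133",
    "opus+0.133",
    "very+0.122",
    "ths+0.120",
    "Neg logit contributions",
    " names-0.098",
    "ws-0.078",
    " name-0.074",
    " meet-0.073",
    " respect-0.072",
])

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        # repr() makes the leading space in the token text visible.
        print(repr(record["token"]), record["id"], record["activation"])
```

Slicing off the fixed label prefixes, rather than splitting on whitespace, is what keeps " has" and "he" distinguishable. A token whose own text begins with `ID` would be ambiguous under this scheme and is not handled by the sketch.
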
Token has
Token ID468
Feature activation-1.30e-01

Pos logit contributions
lf+0.145
mber+0.133
opus+0.133
very+0.122
ths+0.120
Neg logit contributions
 names-0.098
ws-0.078
 name-0.074
 meet-0.073
 respect-0.072
Token a
Token ID257
Feature activation-1.48e-01

Pos logit contributions
 names+0.269
ws+0.250
 sneak+0.226
 name+0.225
ashes+0.219
Neg logit contributions
lf-0.324
mber-0.288
 importantly-0.281
opus-0.266
ths-0.263
Token big
Token ID1263
Feature activation-4.08e-02

Pos logit contributions
 names+0.323
 name+0.258
 sneak+0.258
ws+0.256
 meet+0.254
Neg logit contributions
lf-0.389
opus-0.360
mber-0.358
 importantly-0.338
izing-0.332
Token books
Token ID3835
Feature activation-5.63e-02

Pos logit contributions
 names+0.095
ws+0.078
 sneak+0.078
 meet+0.077
 name+0.076
Neg logit contributions
lf-0.101
opus-0.093
mber-0.092
 importantly-0.086
izing-0.083
Tokenhe
Token ID258
Feature activation+1.80e-01

Pos logit contributions
 names+0.141
 name+0.119
 sneak+0.118
 meet+0.117
 find+0.116
Neg logit contributions
lf-0.126
opus-0.119
mber-0.118
 importantly-0.110
ths-0.107
Tokenlf
Token ID1652
Feature activation+7.39e-01

Pos logit contributions
mber+0.066
opus+0.059
 importantly+0.037
izing+0.027
ths+0.021
Neg logit contributions
 names-0.754
 find-0.700
 play-0.695
 name-0.694
 sneak-0.694
Token with
Token ID351
Feature activation+6.11e-02

Pos logit contributions
lf+2.060
opus+1.941
mber+1.935
 importantly+1.822
izing+1.787
Neg logit contributions
 names-1.486
 sneak-1.169
 name-1.163
ws-1.145
 meet-1.135
Token many
Token ID867
Feature activation+9.10e-03

Pos logit contributions
lf+0.167
opus+0.150
mber+0.150
 importantly+0.145
izing+0.140
Neg logit contributions
 names-0.125
ws-0.103
 sneak-0.102
 name-0.100
 meet-0.100
Token books
Token ID3835
Feature activation-5.61e-02

Pos logit contributions
lf+0.026
opus+0.025
mber+0.025
 importantly+0.023
izing+0.023
Neg logit contributions
 names-0.018
 name-0.014
 sneak-0.013
 meet-0.013
ws-0.013
Token.
Token ID13
Feature activation-9.04e-02

Pos logit contributions
 names+0.104
 name+0.081
 sneak+0.079
 meet+0.079
 find+0.078
Neg logit contributions
lf-0.162
mber-0.157
opus-0.156
 importantly-0.146
ths-0.144
Token She
Token ID1375
Feature activation-6.20e-02

Pos logit contributions
 names+0.195
 name+0.155
 sneak+0.154
ws+0.154
 meet+0.153
Neg logit contributions
lf-0.243
opus-0.225
mber-0.223
 importantly-0.211
izing-0.207
Token go
Token ID467
Feature activation-4.48e-01

Pos logit contributions
lf+0.173
opus+0.149
mber+0.146
ths+0.144
 she+0.140
Neg logit contributions
 names-0.075
ws-0.055
ashes-0.052
 name-0.051
 meet-0.047
Token to
Token ID284
Feature activation-2.12e-01

Pos logit contributions
 names+0.773
ws+0.574
 name+0.570
 sneak+0.563
 meet+0.554
Neg logit contributions
lf-1.395
opus-1.298
mber-1.293
 importantly-1.221
ths-1.208
Token the
Token ID262
Feature activation+3.62e-02

Pos logit contributions
 names+0.615
ws+0.544
 name+0.503
ashes+0.503
 want+0.482
Neg logit contributions
lf-0.387
mber-0.300
opus-0.276
very-0.268
 she-0.259
Token books
Token ID3835
Feature activation-5.19e-02

Pos logit contributions
lf+0.093
opus+0.081
mber+0.079
 importantly+0.074
ths+0.074
Neg logit contributions
 names-0.084
ws-0.071
 name-0.069
 meet-0.067
 sneak-0.067
Tokenhe
Token ID258
Feature activation+1.81e-01

Pos logit contributions
 names+0.134
 sneak+0.111
 name+0.111
ws+0.110
 meet+0.110
Neg logit contributions
lf-0.116
opus-0.107
mber-0.106
 importantly-0.099
izing-0.096
Tokenlf
Token ID1652
Feature activation+7.39e-01

Pos logit contributions
mber+0.042
opus+0.040
lf+0.032
 importantly+0.018
izing+0.001
Neg logit contributions
 names-0.799
 sneak-0.728
 find-0.721
 name-0.718
 meet-0.717
Token and
Token ID290
Feature activation+2.63e-02

Pos logit contributions
lf+1.899
opus+1.673
mber+1.655
 importantly+1.578
izing+1.572
Neg logit contributions
 names-1.673
 name-1.352
ws-1.346
 meet-1.336
 sneak-1.320
Token start
Token ID923
Feature activation-3.57e-01

Pos logit contributions
lf+0.085
mber+0.073
opus+0.071
 she+0.071
ths+0.070
Neg logit contributions
 names-0.038
ws-0.029
 name-0.027
ashes-0.026
 meet-0.024
Token to
Token ID284
Feature activation-2.12e-01

Pos logit contributions
 names+0.721
ws+0.587
 name+0.581
 meet+0.570
 sneak+0.564
Neg logit contributions
lf-0.999
mber-0.843
opus-0.829
ths-0.828
izing-0.807
Token climb
Token ID12080
Feature activation-2.23e-01

Pos logit contributions
 names+0.379
ashes+0.294
ws+0.283
 name+0.275
ags+0.250
Neg logit contributions
lf-0.666
ths-0.526
izing-0.500
mber-0.500
 she-0.496
Token.
Token ID13
Feature activation-8.80e-02

Pos logit contributions
 names+0.516
 sneak+0.427
 meet+0.426
ws+0.424
 name+0.421
Neg logit contributions
lf-0.552
opus-0.487
mber-0.484
izing-0.461
ths-0.454
Token hiding
Token ID11816
Feature activation-9.33e-03

Pos logit contributions
 names+0.426
 sneak+0.351
ws+0.350
 meet+0.346
 name+0.345
Neg logit contributions
lf-0.407
opus-0.368
mber-0.367
 importantly-0.349
izing-0.336
Token behind
Token ID2157
Feature activation+8.44e-02

Pos logit contributions
 names+0.021
 name+0.017
 sneak+0.017
ws+0.017
 meet+0.016
Neg logit contributions
lf-0.024
opus-0.023
mber-0.022
 importantly-0.021
ths-0.021
Token the
Token ID262
Feature activation+2.87e-02

Pos logit contributions
lf+0.212
opus+0.185
mber+0.183
 importantly+0.176
izing+0.174
Neg logit contributions
 names-0.189
ws-0.161
 sneak-0.158
 name-0.153
 meet-0.149
Token books
Token ID3835
Feature activation-5.39e-02

Pos logit contributions
lf+0.069
opus+0.063
mber+0.059
 importantly+0.056
izing+0.054
Neg logit contributions
 names-0.069
ws-0.062
 sneak-0.057
 name-0.056
 meet-0.056
Tokenhe
Token ID258
Feature activation+1.81e-01

Pos logit contributions
 names+0.120
 name+0.095
 sneak+0.095
ws+0.095
 meet+0.094
Neg logit contributions
lf-0.141
opus-0.129
mber-0.128
 importantly-0.121
izing-0.119
Tokenlf
Token ID1652
Feature activation+7.38e-01

Pos logit contributions
mber+0.031
opus+0.023
lf+0.019
 importantly+0.013
ths-0.012
Neg logit contributions
 names-0.806
 sneak-0.734
 name-0.731
 meet-0.726
 find-0.726
Token.
Token ID13
Feature activation-9.08e-02

Pos logit contributions
lf+2.091
opus+1.884
mber+1.849
izing+1.772
 importantly+1.770
Neg logit contributions
 names-1.488
 name-1.169
ws-1.155
 meet-1.147
 sneak-1.139
Token The
Token ID383
Feature activation-2.00e-01

Pos logit contributions
 names+0.168
 name+0.129
 sneak+0.126
 meet+0.126
ws+0.126
Neg logit contributions
lf-0.272
opus-0.255
mber-0.252
 importantly-0.239
izing-0.237
Token cat
Token ID3797
Feature activation-5.79e-02

Pos logit contributions
 names+0.456
ws+0.387
 sneak+0.372
 meet+0.366
 name+0.361
Neg logit contributions
lf-0.509
opus-0.468
mber-0.454
 importantly-0.432
izing-0.417
Token jumps
Token ID18045
Feature activation-1.51e-01

Pos logit contributions
 names+0.106
 meet+0.082
ws+0.082
 name+0.080
 sneak+0.078
Neg logit contributions
lf-0.168
opus-0.157
mber-0.153
 importantly-0.143
ths-0.143
Token out
Token ID503
Feature activation-2.31e-01

Pos logit contributions
 names+0.325
 meet+0.260
ws+0.259
 sneak+0.257
 name+0.256
Neg logit contributions
lf-0.393
opus-0.364
mber-0.360
izing-0.338
 importantly-0.332
Token careful
Token ID8161
Feature activation+1.81e-01

Pos logit contributions
 names+0.765
 sneak+0.645
 name+0.643
ws+0.640
 meet+0.640
Neg logit contributions
lf-0.512
mber-0.452
opus-0.443
 importantly-0.425
ths-0.406
Token.
Token ID13
Feature activation-8.78e-02

Pos logit contributions
lf+0.513
opus+0.480
mber+0.477
 importantly+0.451
izing+0.442
Neg logit contributions
 names-0.359
 sneak-0.279
 name-0.279
ws-0.277
 meet-0.274
Token The
Token ID383
Feature activation-1.98e-01

Pos logit contributions
 names+0.185
ws+0.158
 sneak+0.149
 find+0.145
 meet+0.142
Neg logit contributions
lf-0.216
 importantly-0.202
opus-0.202
mber-0.198
izing-0.183
Token books
Token ID3835
Feature activation-5.23e-02

Pos logit contributions
 names+0.428
ws+0.348
 sneak+0.343
 name+0.341
 meet+0.337
Neg logit contributions
lf-0.534
opus-0.484
mber-0.478
 importantly-0.454
izing-0.441
Tokenhe
Token ID258
Feature activation+1.80e-01

Pos logit contributions
 names+0.075
 name+0.049
ws+0.048
 sneak+0.048
 meet+0.048
Neg logit contributions
lf-0.179
opus-0.166
mber-0.162
izing-0.158
 importantly-0.156
Tokenlf
Token ID1652
Feature activation+7.38e-01

Pos logit contributions
lf+0.160
opus+0.142
mber+0.140
 importantly+0.123
izing+0.110
Neg logit contributions
 names-0.694
 sneak-0.617
ws-0.613
 name-0.612
 meet-0.610
Token is
Token ID318
Feature activation-1.71e-01

Pos logit contributions
lf+2.136
opus+1.970
mber+1.895
izing+1.850
ths+1.817
Neg logit contributions
 names-1.332
 meet-1.035
 name-1.021
 sneak-0.974
ws-0.974
Token tall
Token ID7331
Feature activation+9.55e-02

Pos logit contributions
 names+0.395
 sneak+0.319
 name+0.318
ws+0.317
 meet+0.317
Neg logit contributions
lf-0.428
mber-0.388
opus-0.388
 importantly-0.368
izing-0.359
Token and
Token ID290
Feature activation+2.63e-02

Pos logit contributions
lf+0.243
opus+0.226
mber+0.224
 importantly+0.211
izing+0.206
Neg logit contributions
 names-0.217
 name-0.174
 sneak-0.174
ws-0.173
 meet-0.171
Token we
Token ID356
Feature activation+2.16e-01

Pos logit contributions
lf+0.070
opus+0.064
mber+0.064
 importantly+0.060
izing+0.059
Neg logit contributions
 names-0.057
 name-0.045
ws-0.045
 sneak-0.045
 meet-0.045
Token are
Token ID389
Feature activation-3.63e-01

Pos logit contributions
lf+0.591
opus+0.535
ths+0.493
 she+0.480
 he+0.476
Neg logit contributions
 names-0.361
 name-0.253
ws-0.253
 meet-0.248
ashes-0.238
Token chair
Token ID5118
Feature activation+1.90e-01

Pos logit contributions
lf+0.070
opus+0.060
mber+0.059
izing+0.052
 importantly+0.051
Neg logit contributions
 names-0.104
ws-0.090
 meet-0.090
 name-0.088
 sneak-0.088
Token to
Token ID284
Feature activation-2.12e-01

Pos logit contributions
lf+0.480
opus+0.421
mber+0.413
izing+0.408
ths+0.407
Neg logit contributions
 names-0.428
 meet-0.347
ws-0.346
 name-0.341
 sneak-0.332
Token the
Token ID262
Feature activation+3.59e-02

Pos logit contributions
 names+0.571
ws+0.464
 name+0.462
 meet+0.443
ashes+0.436
Neg logit contributions
lf-0.469
izing-0.366
mber-0.359
opus-0.356
ths-0.355
Token books
Token ID3835
Feature activation-5.27e-02

Pos logit contributions
lf+0.086
opus+0.075
mber+0.073
izing+0.068
ths+0.067
Neg logit contributions
 names-0.089
ws-0.075
 meet-0.073
 name-0.073
 sneak-0.071
Tokenhe
Token ID258
Feature activation+1.81e-01

Pos logit contributions
 names+0.144
ws+0.120
 name+0.120
 meet+0.119
 sneak+0.119
Neg logit contributions
lf-0.112
opus-0.099
mber-0.097
izing-0.090
 importantly-0.090
Tokenlf
Token ID1652
Feature activation+7.37e-01

Pos logit contributions
opus+0.035
mber+0.031
lf+0.030
 importantly+0.015
izing-0.002
Neg logit contributions
 names-0.803
 sneak-0.731
 name-0.724
 find-0.723
ws-0.719
Token.
Token ID13
Feature activation-8.49e-02

Pos logit contributions
lf+1.993
opus+1.752
mber+1.727
izing+1.681
 importantly+1.650
Neg logit contributions
 names-1.573
ws-1.257
 meet-1.255
 name-1.249
 sneak-1.213
Token She
Token ID1375
Feature activation-6.08e-02

Pos logit contributions
 names+0.181
ws+0.151
 sneak+0.148
 meet+0.146
 name+0.141
Neg logit contributions
lf-0.220
opus-0.202
mber-0.202
 importantly-0.192
izing-0.188
Token stands
Token ID6296
Feature activation-1.10e-01

Pos logit contributions
 names+0.118
 meet+0.096
 name+0.091
ws+0.091
 sneak+0.090
Neg logit contributions
lf-0.167
opus-0.152
mber-0.145
ths-0.142
 importantly-0.140
Token on
Token ID319
Feature activation+5.93e-02

Pos logit contributions
 names+0.226
 name+0.184
 try+0.177
 sneak+0.177
 find+0.175
Neg logit contributions
lf-0.306
opus-0.301
mber-0.299
 importantly-0.281
very-0.261
Token the
Token ID262
Feature activation+3.63e-02

Pos logit contributions
lf+0.154
opus+0.145
mber+0.144
 importantly+0.136
izing+0.131
Neg logit contributions
 names-0.132
 sneak-0.106
 name-0.105
ws-0.104
 meet-0.104
Token to
Token ID284
Feature activation-2.12e-01

Pos logit contributions
lf+0.540
opus+0.493
mber+0.488
 importantly+0.468
izing+0.463
Neg logit contributions
 names-0.361
ws-0.283
 meet-0.281
 name-0.279
 sneak-0.277
Token climb
Token ID12080
Feature activation-2.25e-01

Pos logit contributions
 names+0.363
ws+0.251
 name+0.249
ashes+0.238
 meet+0.224
Neg logit contributions
lf-0.695
izing-0.552
opus-0.552
mber-0.541
ths-0.540
Token the
Token ID262
Feature activation+3.64e-02

Pos logit contributions
 names+0.526
 sneak+0.427
 name+0.426
ws+0.425
 meet+0.422
Neg logit contributions
lf-0.560
opus-0.513
mber-0.509
 importantly-0.478
izing-0.471
Token books
Token ID3835
Feature activation-5.27e-02

Pos logit contributions
lf+0.072
opus+0.060
mber+0.058
ement+0.052
izing+0.052
Neg logit contributions
 names-0.103
ws-0.091
 meet-0.089
 name-0.088
 sneak-0.087
Tokenhe
Token ID258
Feature activation+1.81e-01

Pos logit contributions
 names+0.126
ws+0.103
 name+0.102
 sneak+0.102
 meet+0.101
Neg logit contributions
lf-0.129
opus-0.117
mber-0.115
 importantly-0.108
izing-0.107
Tokenlf
Token ID1652
Feature activation+7.37e-01

Pos logit contributions
lf+0.088
opus+0.080
mber+0.076
 importantly+0.056
izing+0.043
Neg logit contributions
 names-0.759
 sneak-0.682
 name-0.681
ws-0.674
 meet-0.674
Token.
Token ID13
Feature activation-8.49e-02

Pos logit contributions
lf+2.063
opus+1.846
mber+1.822
izing+1.744
 importantly+1.731
Neg logit contributions
 names-1.504
ws-1.191
 name-1.184
 meet-1.175
 sneak-1.156
Token She
Token ID1375
Feature activation-6.06e-02

Pos logit contributions
 names+0.184
ws+0.153
 sneak+0.149
 meet+0.148
 name+0.145
Neg logit contributions
lf-0.221
opus-0.204
mber-0.204
 importantly-0.193
izing-0.189
Token pushes
Token ID20070
Feature activation-1.66e-01

Pos logit contributions
 names+0.114
 meet+0.091
 name+0.087
ws+0.087
 sneak+0.085
Neg logit contributions
lf-0.170
opus-0.155
mber-0.152
ths-0.147
 importantly-0.143
Token the
Token ID262
Feature activation+3.61e-02

Pos logit contributions
 names+0.393
 name+0.319
 sneak+0.319
ws+0.317
 meet+0.314
Neg logit contributions
lf-0.412
opus-0.383
mber-0.380
 importantly-0.357
izing-0.346
Token chair
Token ID5118
Feature activation+1.90e-01

Pos logit contributions
lf+0.070
opus+0.060
mber+0.059
izing+0.052
 importantly+0.051
Neg logit contributions
 names-0.104
ws-0.090
 meet-0.090
 name-0.088
 sneak-0.088
Token can
Token ID460
Feature activation-2.28e-01

Pos logit contributions
 names+0.092
ags+0.079
ashes+0.078
ws+0.078
 sneak+0.068
Neg logit contributions
lf-0.116
opus-0.103
 importantly-0.091
mber-0.091
ths-0.089
Token climb
Token ID12080
Feature activation-2.26e-01

Pos logit contributions
 names+0.311
ws+0.269
ashes+0.238
 proper+0.210
 name+0.188
Neg logit contributions
lf-0.766
opus-0.657
ths-0.605
ister-0.598
izing-0.583
Token the
Token ID262
Feature activation+3.38e-02

Pos logit contributions
 names+0.533
 name+0.437
 sneak+0.436
ws+0.435
 meet+0.431
Neg logit contributions
lf-0.553
opus-0.501
mber-0.494
izing-0.462
 importantly-0.461
Token books
Token ID3835
Feature activation-5.34e-02

Pos logit contributions
lf+0.082
opus+0.071
mber+0.067
ths+0.062
ement+0.062
Neg logit contributions
 names-0.081
ws-0.071
 name-0.069
 sneak-0.068
 meet-0.067
Tokenhe
Token ID258
Feature activation+1.79e-01

Pos logit contributions
 names+0.128
 name+0.104
ws+0.104
 sneak+0.103
 meet+0.102
Neg logit contributions
lf-0.130
opus-0.119
mber-0.118
 importantly-0.109
izing-0.109
Tokenlf
Token ID1652
Feature activation+7.37e-01

Pos logit contributions
lf+0.054
opus+0.037
mber+0.037
 importantly+0.016
izing+0.005
Neg logit contributions
 names-0.789
 sneak-0.712
 name-0.711
 meet-0.707
ws-0.706
Token and
Token ID290
Feature activation+2.43e-02

Pos logit contributions
lf+2.037
opus+1.847
mber+1.824
izing+1.724
 importantly+1.717
Neg logit contributions
 names-1.518
 name-1.202
ws-1.186
 meet-1.177
 sneak-1.173
Token see
Token ID766
Feature activation-3.50e-01

Pos logit contributions
lf+0.083
opus+0.070
mber+0.070
ths+0.069
 she+0.068
Neg logit contributions
 names-0.031
ws-0.023
 name-0.021
ashes-0.020
ags-0.017
Token what
Token ID644
Feature activation+1.18e-01

Pos logit contributions
 names+0.786
 sneak+0.628
 name+0.626
ws+0.623
 meet+0.620
Neg logit contributions
lf-0.905
opus-0.844
mber-0.837
 importantly-0.784
izing-0.768
Token is
Token ID318
Feature activation-1.73e-01

Pos logit contributions
lf+0.269
opus+0.245
mber+0.243
 importantly+0.227
izing+0.220
Neg logit contributions
 names-0.301
ws-0.250
 name-0.250
 sneak-0.250
 meet-0.246
Token on
Token ID319
Feature activation+5.88e-02

Pos logit contributions
 names+0.379
 sneak+0.302
 name+0.302
ws+0.301
 meet+0.298
Neg logit contributions
lf-0.457
opus-0.424
mber-0.421
 importantly-0.397
izing-0.390
Token had
Token ID550
Feature activation+3.34e-02

Pos logit contributions
 names+0.153
ws+0.122
 sneak+0.122
 name+0.121
 meet+0.120
Neg logit contributions
lf-0.184
opus-0.170
mber-0.168
 importantly-0.159
izing-0.155
Token a
Token ID257
Feature activation-1.51e-01

Pos logit contributions
lf+0.088
mber+0.079
 importantly+0.079
opus+0.077
very+0.072
Neg logit contributions
 names-0.067
ws-0.062
 sneak-0.057
 name-0.053
 meet-0.052
Token big
Token ID1263
Feature activation-4.57e-02

Pos logit contributions
 names+0.338
ws+0.279
 sneak+0.273
 meet+0.270
 name+0.267
Neg logit contributions
lf-0.387
opus-0.359
mber-0.356
 importantly-0.334
izing-0.324
Token books
Token ID3835
Feature activation-6.14e-02

Pos logit contributions
 names+0.109
 sneak+0.089
ws+0.089
 name+0.088
 meet+0.088
Neg logit contributions
lf-0.111
opus-0.102
mber-0.101
 importantly-0.095
izing-0.093
Tokenhe
Token ID258
Feature activation+1.75e-01

Pos logit contributions
 names+0.154
 name+0.132
 find+0.130
 meet+0.129
 play+0.129
Neg logit contributions
lf-0.134
opus-0.128
mber-0.128
 importantly-0.115
izing-0.115
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
mber+0.063
opus+0.057
 importantly+0.039
izing+0.025
ement+0.018
Neg logit contributions
 names-0.726
 find-0.680
 play-0.675
 sneak-0.675
 talk-0.665
Token in
Token ID287
Feature activation+3.54e-03

Pos logit contributions
lf+1.908
opus+1.798
mber+1.795
 importantly+1.667
izing+1.639
Neg logit contributions
 names-1.616
 sneak-1.304
 name-1.298
ws-1.280
 meet-1.272
Token her
Token ID607
Feature activation+3.87e-04

Pos logit contributions
lf+0.009
mber+0.008
opus+0.007
 importantly+0.007
izing+0.007
Neg logit contributions
 names-0.009
ws-0.007
 sneak-0.007
 name-0.007
 meet-0.007
Token room
Token ID2119
Feature activation-2.89e-02

Pos logit contributions
lf+0.001
opus+0.001
mber+0.001
 importantly+0.001
ths+0.001
Neg logit contributions
 names-0.001
ws-0.001
 sneak-0.001
 meet-0.001
 name-0.001
Token with
Token ID351
Feature activation+5.80e-02

Pos logit contributions
 names+0.062
 name+0.050
 sneak+0.050
 meet+0.049
 find+0.048
Neg logit contributions
lf-0.075
mber-0.072
opus-0.072
 importantly-0.067
izing-0.065
Token lots
Token ID6041
Feature activation-4.13e-02

Pos logit contributions
lf+0.139
mber+0.116
 importantly+0.115
opus+0.114
ths+0.108
Neg logit contributions
 names-0.134
ws-0.120
 sneak-0.118
 meet-0.114
 name-0.113
Token had
Token ID550
Feature activation+3.63e-02

Pos logit contributions
lf+1.238
opus+1.110
mber+1.108
 importantly+1.037
ths+1.016
Neg logit contributions
 names-1.065
ws-0.865
 sneak-0.854
 name-0.854
 meet-0.839
Token a
Token ID257
Feature activation-1.47e-01

Pos logit contributions
lf+0.100
mber+0.091
opus+0.087
 importantly+0.087
ths+0.084
Neg logit contributions
 names-0.071
ws-0.062
 sneak-0.059
 name-0.057
 meet-0.056
Token big
Token ID1263
Feature activation-4.21e-02

Pos logit contributions
 names+0.339
ws+0.290
 sneak+0.279
 meet+0.275
 name+0.265
Neg logit contributions
lf-0.366
opus-0.335
mber-0.335
 importantly-0.313
very-0.300
Token books
Token ID3835
Feature activation-6.29e-02

Pos logit contributions
 names+0.103
ws+0.087
 sneak+0.085
 meet+0.084
 name+0.082
Neg logit contributions
lf-0.100
opus-0.090
mber-0.089
 importantly-0.083
very-0.080
Tokenhe
Token ID258
Feature activation+1.75e-01

Pos logit contributions
 names+0.165
 name+0.137
 sneak+0.137
ws+0.137
 meet+0.136
Neg logit contributions
lf-0.138
opus-0.126
mber-0.125
 importantly-0.116
izing-0.114
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
mber+0.052
opus+0.049
 importantly+0.021
izing+0.013
lf+0.002
Neg logit contributions
 names-0.749
 find-0.690
 play-0.684
 name-0.684
 sneak-0.683
Token in
Token ID287
Feature activation+6.27e-03

Pos logit contributions
lf+1.990
opus+1.802
mber+1.774
 importantly+1.704
izing+1.669
Neg logit contributions
 names-1.581
ws-1.261
 name-1.246
 sneak-1.244
 meet-1.243
Token her
Token ID607
Feature activation+2.92e-03

Pos logit contributions
lf+0.015
mber+0.012
 importantly+0.012
opus+0.011
izing+0.011
Neg logit contributions
 names-0.016
ws-0.014
 sneak-0.013
ashes-0.013
 name-0.013
Token room
Token ID2119
Feature activation-3.01e-02

Pos logit contributions
lf+0.007
opus+0.007
mber+0.006
 importantly+0.006
ths+0.006
Neg logit contributions
 names-0.007
ws-0.006
 sneak-0.006
 meet-0.005
 argue-0.005
Token.
Token ID13
Feature activation-9.66e-02

Pos logit contributions
 names+0.059
ws+0.047
 sneak+0.046
 name+0.046
 meet+0.046
Neg logit contributions
lf-0.087
opus-0.079
mber-0.078
 importantly-0.075
izing-0.073
Token Her
Token ID2332
Feature activation-9.34e-02

Pos logit contributions
 names+0.191
ws+0.166
 sneak+0.165
 meet+0.150
 respect+0.147
Neg logit contributions
lf-0.257
mber-0.239
opus-0.236
 importantly-0.228
very-0.221
Token book
Token ID1492
Feature activation+8.58e-02

Pos logit contributions
lf+0.060
opus+0.055
mber+0.053
 importantly+0.047
very+0.047
Neg logit contributions
 names-0.069
 sneak-0.060
ws-0.060
 meet-0.058
 name-0.057
Token from
Token ID422
Feature activation+2.87e-02

Pos logit contributions
lf+0.217
opus+0.193
mber+0.189
 importantly+0.177
izing+0.177
Neg logit contributions
 names-0.192
ws-0.161
 meet-0.156
 sneak-0.154
 name-0.150
Token the
Token ID262
Feature activation+2.71e-02

Pos logit contributions
lf+0.072
opus+0.066
mber+0.065
 importantly+0.061
izing+0.060
Neg logit contributions
 names-0.067
ws-0.054
 sneak-0.054
 name-0.054
 meet-0.053
Token books
Token ID3835
Feature activation-5.45e-02

Pos logit contributions
lf+0.063
opus+0.056
mber+0.053
 importantly+0.048
izing+0.047
Neg logit contributions
 names-0.067
ws-0.058
 sneak-0.056
 meet-0.055
 name-0.054
Tokenhe
Token ID258
Feature activation+1.77e-01

Pos logit contributions
 names+0.114
 name+0.089
ws+0.089
 sneak+0.089
 meet+0.088
Neg logit contributions
lf-0.150
opus-0.138
mber-0.137
 importantly-0.129
izing-0.128
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
mber+0.029
opus+0.027
lf+0.019
 importantly+0.014
izing-0.000
Neg logit contributions
 names-0.791
 sneak-0.717
 name-0.715
 find-0.708
 meet-0.708
Token.
Token ID13
Feature activation-9.56e-02

Pos logit contributions
lf+2.042
opus+1.825
mber+1.811
 importantly+1.730
izing+1.711
Neg logit contributions
 names-1.497
ws-1.183
 meet-1.180
 name-1.180
 sneak-1.165
Token She
Token ID1375
Feature activation-6.47e-02

Pos logit contributions
 names+0.195
ws+0.159
 sneak+0.159
 meet+0.156
 find+0.149
Neg logit contributions
lf-0.253
mber-0.239
opus-0.237
 importantly-0.228
izing-0.220
Token opens
Token ID9808
Feature activation-2.20e-01

Pos logit contributions
 names+0.114
 meet+0.092
ws+0.088
 respect+0.085
 name+0.084
Neg logit contributions
lf-0.181
opus-0.167
mber-0.163
ement-0.154
ths-0.153
Token it
Token ID340
Feature activation+1.73e-01

Pos logit contributions
 names+0.527
 name+0.428
ws+0.425
 sneak+0.423
 meet+0.418
Neg logit contributions
lf-0.542
opus-0.509
mber-0.501
 importantly-0.475
izing-0.456
Token and
Token ID290
Feature activation+1.63e-02

Pos logit contributions
lf+0.429
opus+0.392
mber+0.389
 importantly+0.365
izing+0.360
Neg logit contributions
 names-0.405
 sneak-0.329
ws-0.329
 name-0.327
 meet-0.327
Token I
Token ID314
Feature activation+8.58e-02

Pos logit contributions
lf+0.734
opus+0.686
mber+0.678
izing+0.647
 importantly+0.643
Neg logit contributions
 names-0.570
 name-0.458
 sneak-0.447
 meet-0.443
 try-0.443
Token finished
Token ID5201
Feature activation+1.17e-02

Pos logit contributions
lf+0.244
mber+0.231
opus+0.224
 importantly+0.216
very+0.212
Neg logit contributions
 names-0.160
ws-0.125
 sneak-0.118
 name-0.113
 meet-0.111
Token the
Token ID262
Feature activation+3.47e-02

Pos logit contributions
lf+0.027
opus+0.025
mber+0.025
 importantly+0.023
izing+0.022
Neg logit contributions
 names-0.029
 name-0.024
 sneak-0.024
 meet-0.024
ws-0.024
Token books
Token ID3835
Feature activation-6.24e-02

Pos logit contributions
lf+0.091
opus+0.081
mber+0.081
 importantly+0.077
izing+0.075
Neg logit contributions
 names-0.077
ws-0.064
 sneak-0.063
 name-0.061
 meet-0.061
Tokenhe
Token ID258
Feature activation+1.75e-01

Pos logit contributions
 names+0.151
 sneak+0.129
 find+0.128
 try+0.128
 meet+0.127
Neg logit contributions
lf-0.139
opus-0.131
mber-0.130
 importantly-0.125
ths-0.118
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
opus+0.040
mber+0.039
 importantly+0.025
izing+0.015
lf+0.001
Neg logit contributions
 names-0.751
 find-0.692
 sneak-0.686
 meet-0.686
 name-0.682
Token!"
Token ID2474
Feature activation-1.50e-02

Pos logit contributions
lf+1.867
opus+1.765
mber+1.748
 importantly+1.657
izing+1.598
Neg logit contributions
 names-1.632
 sneak-1.340
 name-1.314
ws-1.309
 find-1.308
Token Andy
Token ID12382
Feature activation+2.11e-01

Pos logit contributions
 names+0.038
 name+0.033
 play+0.033
 meet+0.033
 try+0.033
Neg logit contributions
lf-0.032
opus-0.031
mber-0.031
izing-0.028
ement-0.027
Token said
Token ID531
Feature activation+1.88e-01

Pos logit contributions
lf+0.462
opus+0.451
mber+0.421
izing+0.418
 importantly+0.415
Neg logit contributions
 names-0.537
 sneak-0.458
 name-0.453
 find-0.450
 try-0.449
Token with
Token ID351
Feature activation+6.03e-02

Pos logit contributions
lf+0.484
opus+0.450
mber+0.445
 importantly+0.419
izing+0.412
Neg logit contributions
 names-0.421
 name-0.338
 sneak-0.338
ws-0.334
 meet-0.333
Token a
Token ID257
Feature activation-1.50e-01

Pos logit contributions
lf+0.167
mber+0.150
 importantly+0.150
opus+0.150
ement+0.141
Neg logit contributions
 names-0.120
 sneak-0.100
ws-0.099
 meet-0.094
 name-0.091
Token ran
Token ID4966
Feature activation-1.96e-03

Pos logit contributions
lf+0.129
opus+0.116
mber+0.114
 importantly+0.108
ths+0.107
Neg logit contributions
 names-0.119
 name-0.097
ws-0.097
 meet-0.095
 sneak-0.093
Token to
Token ID284
Feature activation-2.12e-01

Pos logit contributions
 names+0.004
ws+0.003
 name+0.003
 meet+0.003
 sneak+0.003
Neg logit contributions
lf-0.006
mber-0.005
opus-0.005
ths-0.005
izing-0.005
Token the
Token ID262
Feature activation+3.00e-02

Pos logit contributions
 names+0.561
ws+0.494
ashes+0.458
 name+0.435
wn+0.426
Neg logit contributions
lf-0.410
mber-0.342
opus-0.325
 he-0.313
 she-0.309
Token books
Token ID3835
Feature activation-6.03e-02

Pos logit contributions
lf+0.076
opus+0.068
mber+0.066
 importantly+0.062
izing+0.061
Neg logit contributions
 names-0.069
ws-0.058
 meet-0.055
 sneak-0.055
 name-0.055
Tokenhe
Token ID258
Feature activation+1.76e-01

Pos logit contributions
 names+0.133
 sneak+0.115
 find+0.115
 try+0.113
 play+0.113
Neg logit contributions
mber-0.144
lf-0.143
opus-0.142
 importantly-0.136
ths-0.131
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
mber+0.075
opus+0.065
 importantly+0.049
izing+0.027
ement+0.026
Neg logit contributions
 names-0.727
 find-0.680
 sneak-0.678
 play-0.673
 make-0.663
Token and
Token ID290
Feature activation+1.71e-02

Pos logit contributions
lf+1.924
opus+1.846
mber+1.838
 importantly+1.721
izing+1.669
Neg logit contributions
 names-1.595
 sneak-1.299
 name-1.260
 find-1.260
ws-1.256
Token put
Token ID1234
Feature activation-2.34e-01

Pos logit contributions
lf+0.046
mber+0.042
opus+0.042
 importantly+0.039
ths+0.038
Neg logit contributions
 names-0.037
 name-0.029
ws-0.029
 meet-0.029
 sneak-0.029
Token the
Token ID262
Feature activation+2.99e-02

Pos logit contributions
 names+0.512
 sneak+0.413
 meet+0.411
 name+0.408
ws+0.404
Neg logit contributions
lf-0.611
opus-0.555
mber-0.553
 importantly-0.521
ths-0.515
Token box
Token ID3091
Feature activation+3.52e-02

Pos logit contributions
lf+0.073
opus+0.066
mber+0.063
 importantly+0.061
izing+0.060
Neg logit contributions
 names-0.071
 sneak-0.060
 meet-0.058
ws-0.058
 name-0.057
Token on
Token ID319
Feature activation+5.67e-02

Pos logit contributions
lf+0.097
opus+0.090
mber+0.089
 importantly+0.085
izing+0.083
Neg logit contributions
 names-0.072
 sneak-0.057
 name-0.057
ws-0.057
 meet-0.056
Token falls
Token ID8953
Feature activation-3.98e-02

Pos logit contributions
lf+0.234
opus+0.212
mber+0.203
ths+0.197
 importantly+0.197
Neg logit contributions
 names-0.167
 meet-0.134
ws-0.130
 name-0.129
 sneak-0.128
Token off
Token ID572
Feature activation-7.69e-02

Pos logit contributions
 names+0.090
 meet+0.076
 sneak+0.074
ws+0.072
 name+0.070
Neg logit contributions
lf-0.097
opus-0.085
izing-0.084
mber-0.082
ths-0.080
Token the
Token ID262
Feature activation+3.37e-02

Pos logit contributions
 names+0.154
 sneak+0.122
 name+0.121
ws+0.119
 try+0.118
Neg logit contributions
lf-0.212
opus-0.204
mber-0.204
 importantly-0.191
ths-0.187
Token books
Token ID3835
Feature activation-5.41e-02

Pos logit contributions
lf+0.084
opus+0.072
mber+0.069
 importantly+0.064
izing+0.063
Neg logit contributions
 names-0.081
ws-0.069
 sneak-0.066
 meet-0.066
 name-0.066
Tokenhe
Token ID258
Feature activation+1.79e-01

Pos logit contributions
 names+0.146
 name+0.120
ws+0.119
 meet+0.119
 sneak+0.118
Neg logit contributions
lf-0.120
opus-0.104
mber-0.101
izing-0.095
 importantly-0.093
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
lf+0.006
opus-0.005
mber-0.005
 importantly-0.025
izing-0.039
Neg logit contributions
 names-0.833
 sneak-0.756
 name-0.756
ws-0.752
 meet-0.750
Token.
Token ID13
Feature activation-8.79e-02

Pos logit contributions
lf+1.968
opus+1.660
izing+1.583
mber+1.583
ement+1.569
Neg logit contributions
 names-1.639
 meet-1.318
 name-1.311
ws-1.286
 sneak-1.239
Token It
Token ID632
Feature activation-2.16e-01

Pos logit contributions
 names+0.203
 sneak+0.169
 meet+0.167
ws+0.164
 name+0.161
Neg logit contributions
lf-0.214
opus-0.202
mber-0.199
 importantly-0.188
izing-0.181
Token makes
Token ID1838
Feature activation-2.52e-01

Pos logit contributions
 names+0.470
 meet+0.400
 name+0.362
ws+0.360
 sneak+0.356
Neg logit contributions
lf-0.526
opus-0.467
ls-0.449
 she-0.429
ths-0.425
Token a
Token ID257
Feature activation-1.50e-01

Pos logit contributions
 names+0.509
 sneak+0.409
 meet+0.408
ws+0.406
 name+0.399
Neg logit contributions
lf-0.696
opus-0.630
mber-0.616
 importantly-0.596
izing-0.588
Token loud
Token ID7812
Feature activation+2.87e-03

Pos logit contributions
 names+0.359
ws+0.299
 meet+0.291
 sneak+0.287
 name+0.285
Neg logit contributions
lf-0.367
opus-0.334
mber-0.328
 importantly-0.307
izing-0.306
Token chair
Token ID5118
Feature activation+1.85e-01

Pos logit contributions
 names+0.341
 sneak+0.282
 name+0.281
 try+0.276
 meet+0.273
Neg logit contributions
lf-0.361
opus-0.338
mber-0.336
 importantly-0.328
ths-0.320
Token to
Token ID284
Feature activation-2.11e-01

Pos logit contributions
lf+0.466
opus+0.434
mber+0.433
 importantly+0.406
izing+0.394
Neg logit contributions
 names-0.426
 sneak-0.348
 name-0.345
ws-0.342
 meet-0.340
Token the
Token ID262
Feature activation+3.41e-02

Pos logit contributions
 names+0.432
ws+0.327
 name+0.315
ashes+0.311
 meet+0.282
Neg logit contributions
lf-0.620
mber-0.498
opus-0.497
izing-0.488
ths-0.479
Token books
Token ID3835
Feature activation-5.78e-02

Pos logit contributions
lf+0.092
opus+0.081
mber+0.079
izing+0.074
 importantly+0.073
Neg logit contributions
 names-0.075
ws-0.063
 name-0.060
 meet-0.059
 sneak-0.058
Tokenhe
Token ID258
Feature activation+1.77e-01

Pos logit contributions
 names+0.146
 sneak+0.126
 find+0.123
 try+0.123
 name+0.123
Neg logit contributions
lf-0.125
opus-0.119
mber-0.119
 importantly-0.113
ths-0.107
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
opus+0.048
mber+0.046
 importantly+0.035
lf+0.014
izing+0.009
Neg logit contributions
 names-0.763
 sneak-0.703
 find-0.698
 name-0.692
 play-0.689
Token and
Token ID290
Feature activation+2.41e-02

Pos logit contributions
lf+1.891
opus+1.772
mber+1.763
 importantly+1.660
izing+1.609
Neg logit contributions
 names-1.642
 sneak-1.328
 name-1.313
ws-1.301
 meet-1.295
Token climbed
Token ID19952
Feature activation-2.78e-02

Pos logit contributions
lf+0.064
opus+0.060
mber+0.059
 importantly+0.056
izing+0.055
Neg logit contributions
 names-0.052
 sneak-0.041
 name-0.041
ws-0.041
 meet-0.041
Token on
Token ID319
Feature activation+5.72e-02

Pos logit contributions
 names+0.064
 sneak+0.052
ws+0.052
 name+0.052
 meet+0.051
Neg logit contributions
lf-0.070
opus-0.064
mber-0.063
 importantly-0.059
izing-0.059
Token it
Token ID340
Feature activation+1.81e-01

Pos logit contributions
lf+0.139
opus+0.127
mber+0.126
 importantly+0.118
izing+0.116
Neg logit contributions
 names-0.137
ws-0.112
 sneak-0.112
 name-0.112
 meet-0.111
Token.
Token ID13
Feature activation-8.97e-02

Pos logit contributions
lf+0.491
opus+0.434
mber+0.431
izing+0.418
 importantly+0.407
Neg logit contributions
 names-0.383
ws-0.311
 meet-0.302
 sneak-0.300
 name-0.298
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
opus+0.032
mber+0.030
 importantly+0.010
lf+0.005
izing-0.007
Neg logit contributions
 names-0.796
 sneak-0.723
 name-0.721
 find-0.719
 play-0.713
Token.
Token ID13
Feature activation-9.06e-02

Pos logit contributions
lf+2.104
mber+2.033
opus+2.023
 importantly+1.902
izing+1.858
Neg logit contributions
 names-1.404
 sneak-1.093
 name-1.072
ws-1.057
 meet-1.050
Token The
Token ID383
Feature activation-2.00e-01

Pos logit contributions
 names+0.198
ws+0.166
 sneak+0.164
 meet+0.157
 name+0.156
Neg logit contributions
lf-0.233
mber-0.215
opus-0.213
 importantly-0.207
ement-0.196
Token books
Token ID3835
Feature activation-5.54e-02

Pos logit contributions
 names+0.428
ws+0.359
 sneak+0.351
 meet+0.342
 name+0.342
Neg logit contributions
lf-0.534
opus-0.482
mber-0.479
 importantly-0.452
izing-0.436
Tokenhe
Token ID258
Feature activation+1.79e-01

Pos logit contributions
 names+0.101
 name+0.077
 sneak+0.076
 meet+0.076
ws+0.075
Neg logit contributions
lf-0.164
opus-0.157
mber-0.156
 importantly-0.148
izing-0.145
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
opus+0.034
mber+0.028
 importantly+0.027
lf+0.019
izing+0.008
Neg logit contributions
 names-0.794
 sneak-0.721
 name-0.714
 find-0.711
 meet-0.706
Token is
Token ID318
Feature activation-1.73e-01

Pos logit contributions
lf+2.187
opus+2.009
mber+1.989
 importantly+1.888
izing+1.860
Neg logit contributions
 names-1.351
ws-1.035
 name-1.033
 meet-1.029
 sneak-1.028
Token high
Token ID1029
Feature activation+2.04e-02

Pos logit contributions
 names+0.377
 name+0.301
 sneak+0.294
 meet+0.290
ws+0.290
Neg logit contributions
lf-0.461
opus-0.446
mber-0.434
izing-0.406
 importantly-0.404
Token,
Token ID11
Feature activation+2.69e-01

Pos logit contributions
lf+0.052
opus+0.048
mber+0.048
 importantly+0.045
izing+0.044
Neg logit contributions
 names-0.047
 sneak-0.038
 name-0.038
ws-0.038
 meet-0.037
Token so
Token ID523
Feature activation+9.01e-02

Pos logit contributions
lf+0.709
opus+0.663
mber+0.656
 importantly+0.615
izing+0.603
Neg logit contributions
 names-0.593
 name-0.473
 sneak-0.473
ws-0.468
 meet-0.466
Token Lily
Token ID20037
Feature activation+4.97e-02

Pos logit contributions
opus+0.205
lf+0.199
mber+0.197
izing+0.184
ement+0.182
Neg logit contributions
 names-0.226
 name-0.190
 play-0.185
 go-0.182
 try-0.180
Token,
Token ID11
Feature activation+2.69e-01

Pos logit contributions
 names+0.051
 sneak+0.040
 name+0.040
ws+0.039
 meet+0.039
Neg logit contributions
lf-0.069
opus-0.065
mber-0.064
 importantly-0.061
izing-0.060
Token on
Token ID319
Feature activation+5.63e-02

Pos logit contributions
lf+0.680
mber+0.598
opus+0.593
 importantly+0.582
izing+0.557
Neg logit contributions
 names-0.597
ws-0.505
 sneak-0.481
 name-0.475
 meet-0.472
Token a
Token ID257
Feature activation-1.48e-01

Pos logit contributions
lf+0.143
mber+0.126
opus+0.125
 importantly+0.117
izing+0.117
Neg logit contributions
 names-0.130
ws-0.107
 name-0.107
 sneak-0.106
 meet-0.105
Token books
Token ID3835
Feature activation-5.54e-02

Pos logit contributions
 names+0.336
 sneak+0.270
 name+0.270
ws+0.269
 meet+0.267
Neg logit contributions
lf-0.379
opus-0.350
mber-0.348
 importantly-0.328
izing-0.321
Tokenhe
Token ID258
Feature activation+1.78e-01

Pos logit contributions
 names+0.137
 sneak+0.114
 name+0.114
 meet+0.113
 find+0.112
Neg logit contributions
lf-0.125
opus-0.120
mber-0.119
 importantly-0.112
ths-0.109
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
opus+0.032
mber+0.030
 importantly+0.010
lf+0.005
izing-0.007
Neg logit contributions
 names-0.796
 sneak-0.723
 name-0.721
 find-0.719
 play-0.713
Token.
Token ID13
Feature activation-9.06e-02

Pos logit contributions
lf+2.104
mber+2.033
opus+2.023
 importantly+1.902
izing+1.858
Neg logit contributions
 names-1.404
 sneak-1.093
 name-1.072
ws-1.057
 meet-1.050
Token The
Token ID383
Feature activation-2.00e-01

Pos logit contributions
 names+0.198
ws+0.166
 sneak+0.164
 meet+0.157
 name+0.156
Neg logit contributions
lf-0.233
mber-0.215
opus-0.213
 importantly-0.207
ement-0.196
Token books
Token ID3835
Feature activation-5.54e-02

Pos logit contributions
 names+0.428
ws+0.359
 sneak+0.351
 meet+0.342
 name+0.342
Neg logit contributions
lf-0.534
opus-0.482
mber-0.479
 importantly-0.452
izing-0.436
Tokenhe
Token ID258
Feature activation+1.79e-01

Pos logit contributions
 names+0.101
 name+0.077
 sneak+0.076
 meet+0.076
ws+0.075
Neg logit contributions
lf-0.164
opus-0.157
mber-0.156
 importantly-0.148
izing-0.145
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
opus+0.034
mber+0.028
 importantly+0.027
lf+0.019
izing+0.008
Neg logit contributions
 names-0.794
 sneak-0.721
 name-0.714
 find-0.711
 meet-0.706
Token Sarah
Token ID10490
Feature activation+1.93e-01

Pos logit contributions
opus+0.704
lf+0.652
mber+0.637
ement+0.582
 importantly+0.575
Neg logit contributions
 names-0.672
 play-0.555
 name-0.552
 try-0.545
 talk-0.533
Token found
Token ID1043
Feature activation-1.29e-01

Pos logit contributions
lf+0.512
opus+0.479
mber+0.472
 importantly+0.463
izing+0.453
Neg logit contributions
 names-0.405
 sneak-0.332
 meet-0.327
 name-0.327
 find-0.326
Token a
Token ID257
Feature activation-1.46e-01

Pos logit contributions
 names+0.205
 name+0.148
 meet+0.138
 find+0.138
 try+0.136
Neg logit contributions
lf-0.433
opus-0.405
mber-0.401
 importantly-0.380
izing-0.377
Token books
Token ID3835
Feature activation-6.17e-02

Pos logit contributions
 names+0.325
 name+0.270
 sneak+0.261
 try+0.256
 meet+0.253
Neg logit contributions
lf-0.376
opus-0.346
mber-0.344
 importantly-0.332
izing-0.325
Tokenhe
Token ID258
Feature activation+1.76e-01

Pos logit contributions
 names+0.146
 find+0.124
 sneak+0.123
 name+0.123
 try+0.122
Neg logit contributions
lf-0.140
mber-0.136
opus-0.135
 importantly-0.125
ths-0.121
Tokenlf
Token ID1652
Feature activation+7.36e-01

Pos logit contributions
mber+0.079
opus+0.076
 importantly+0.066
izing+0.040
ement+0.039
Neg logit contributions
 names-0.708
 find-0.665
 sneak-0.662
 play-0.658
 talk-0.648
Token in
Token ID287
Feature activation+8.16e-03

Pos logit contributions
lf+1.936
mber+1.847
opus+1.838
 importantly+1.716
izing+1.666
Neg logit contributions
 names-1.559
 sneak-1.266
 name-1.241
ws-1.231
 meet-1.221
Token her
Token ID607
Feature activation+3.26e-03

Pos logit contributions
lf+0.022
mber+0.020
opus+0.019
 importantly+0.019
izing+0.018
Neg logit contributions
 names-0.018
ws-0.014
 sneak-0.014
 name-0.014
 meet-0.014
Token room
Token ID2119
Feature activation-2.93e-02

Pos logit contributions
lf+0.009
opus+0.008
mber+0.008
 importantly+0.008
ths+0.007
Neg logit contributions
 names-0.007
ws-0.006
 sneak-0.006
 name-0.006
 meet-0.006
Token.
Token ID13
Feature activation-9.07e-02

Pos logit contributions
 names+0.049
 sneak+0.037
 name+0.037
 find+0.035
 meet+0.035
Neg logit contributions
lf-0.089
opus-0.087
mber-0.087
 importantly-0.082
ths-0.080
Token She
Token ID1375
Feature activation-6.76e-02

Pos logit contributions
 names+0.198
ws+0.167
 sneak+0.161
 meet+0.157
 name+0.155
Neg logit contributions
lf-0.235
mber-0.217
opus-0.216
 importantly-0.204
izing-0.201
Token I
Token ID314
Feature activation+8.57e-02

Pos logit contributions
 names+0.292
 try+0.249
 name+0.245
 meet+0.245
 sneak+0.244
Neg logit contributions
lf-0.329
mber-0.311
opus-0.300
 importantly-0.294
izing-0.290
Token touch
Token ID3638
Feature activation-3.92e-01

Pos logit contributions
lf+0.273
opus+0.258
mber+0.246
ister+0.234
 importantly+0.233
Neg logit contributions
 names-0.118
ws-0.107
ashes-0.088
 argue-0.070
 sneak-0.068
Token the
Token ID262
Feature activation+3.43e-02

Pos logit contributions
 names+0.894
 name+0.714
 try+0.706
 meet+0.704
 sneak+0.697
Neg logit contributions
lf-0.998
mber-0.946
opus-0.936
 importantly-0.887
ths-0.841
Token books
Token ID3835
Feature activation-6.23e-02

Pos logit contributions
lf+0.091
opus+0.082
mber+0.081
 importantly+0.077
izing+0.075
Neg logit contributions
 names-0.074
ws-0.062
 sneak-0.061
 meet-0.060
 name-0.059
Tokenhe
Token ID258
Feature activation+1.75e-01

Pos logit contributions
 names+0.151
 name+0.128
 sneak+0.127
 try+0.126
 find+0.126
Neg logit contributions
lf-0.143
mber-0.133
opus-0.132
 importantly-0.127
ths-0.121
Tokenlf
Token ID1652
Feature activation+7.35e-01

Pos logit contributions
opus+0.039
mber+0.038
 importantly+0.036
izing+0.021
ths+0.007
Neg logit contributions
 names-0.756
 name-0.695
 find-0.691
 sneak-0.683
 meet-0.682
Token too
Token ID1165
Feature activation+8.67e-03

Pos logit contributions
lf+1.768
opus+1.652
mber+1.651
 importantly+1.575
izing+1.507
Neg logit contributions
 names-1.725
 sneak-1.440
 name-1.430
 find-1.408
 meet-1.403
Token?"
Token ID1701
Feature activation+6.01e-02

Pos logit contributions
lf+0.023
opus+0.022
mber+0.021
 importantly+0.020
ths+0.020
Neg logit contributions
 names-0.018
 name-0.016
 try-0.015
 sneak-0.015
 look-0.014
Token His
Token ID2399
Feature activation-3.62e-02

Pos logit contributions
lf+0.164
opus+0.152
mber+0.151
 importantly+0.143
izing+0.140
Neg logit contributions
 names-0.127
 sneak-0.101
 name-0.100
ws-0.100
 meet-0.099
Token mom
Token ID1995
Feature activation+8.44e-02

Pos logit contributions
 names+0.089
 name+0.074
 sneak+0.072
 meet+0.071
ws+0.071
Neg logit contributions
lf-0.087
opus-0.080
mber-0.080
 importantly-0.074
izing-0.074
Token smiled
Token ID13541
Feature activation+2.29e-01

Pos logit contributions
lf+0.186
opus+0.183
mber+0.173
 importantly+0.169
izing+0.165
Neg logit contributions
 names-0.210
 name-0.181
 try-0.177
 play-0.175
 sneak-0.175
Token we
Token ID356
Feature activation+2.19e-01

Pos logit contributions
lf+0.264
opus+0.247
mber+0.244
 importantly+0.229
ement+0.222
Neg logit contributions
 names-0.250
 name-0.206
 sneak-0.206
 try-0.204
 meet-0.203
Token have
Token ID423
Feature activation-3.98e-01

Pos logit contributions
lf+0.566
opus+0.532
mber+0.503
ths+0.488
 importantly+0.484
Neg logit contributions
 names-0.416
ashes-0.323
ws-0.315
 sneak-0.309
 name-0.288
Token a
Token ID257
Feature activation-1.52e-01

Pos logit contributions
 names+0.780
 sneak+0.619
 name+0.616
ws+0.610
 meet+0.596
Neg logit contributions
lf-1.116
mber-1.046
opus-1.029
 importantly-0.994
ths-0.975
Token books
Token ID3835
Feature activation-6.05e-02

Pos logit contributions
 names+0.343
ws+0.283
 sneak+0.275
 meet+0.270
 name+0.267
Neg logit contributions
lf-0.383
opus-0.356
mber-0.354
 importantly-0.333
ths-0.326
Tokenhe
Token ID258
Feature activation+1.76e-01

Pos logit contributions
 names+0.146
 sneak+0.119
 name+0.119
ws+0.118
 meet+0.118
Neg logit contributions
lf-0.146
opus-0.134
mber-0.133
 importantly-0.125
izing-0.122
Tokenlf
Token ID1652
Feature activation+7.35e-01

Pos logit contributions
mber+0.390
lf+0.384
opus+0.383
 importantly+0.353
izing+0.346
Neg logit contributions
 names-0.419
 find-0.357
 sneak-0.352
 name-0.351
 meet-0.350
Token!"
Token ID2474
Feature activation-8.14e-03

Pos logit contributions
lf+1.868
opus+1.719
mber+1.703
 importantly+1.606
izing+1.575
Neg logit contributions
 names-1.667
 name-1.341
 sneak-1.340
ws-1.337
 meet-1.324
Token<|endoftext|>
Token ID50256
Feature activation+7.18e-02

Pos logit contributions
 names+0.017
 name+0.014
 meet+0.013
ws+0.013
 sneak+0.013
Neg logit contributions
lf-0.023
opus-0.021
mber-0.020
izing-0.019
 importantly-0.019
TokenM
Token ID44
Feature activation+2.65e-01

Pos logit contributions
lf+0.169
opus+0.155
mber+0.154
 importantly+0.144
izing+0.140
Neg logit contributions
 names-0.176
 sneak-0.146
ws-0.145
 name-0.144
 meet-0.143
Tokenummy
Token ID13513
Feature activation+4.18e-01

Pos logit contributions
lf+0.436
izing+0.386
opus+0.373
mber+0.366
 importantly+0.329
Neg logit contributions
 names-0.821
 name-0.719
 play-0.702
 meet-0.699
 try-0.698
Token and
Token ID290
Feature activation+2.35e-02

Pos logit contributions
lf+1.104
opus+1.029
mber+1.017
 importantly+0.995
izing+0.980
Neg logit contributions
 names-0.895
 name-0.723
 sneak-0.718
 meet-0.714
ws-0.701
Token not
Token ID407
Feature activation-1.85e-02

Pos logit contributions
 names+0.316
ws+0.246
ashes+0.229
 name+0.226
 sneak+0.208
Neg logit contributions
lf-0.466
opus-0.388
mber-0.386
 importantly-0.370
 she-0.363
Token climb
Token ID12080
Feature activation-2.28e-01

Pos logit contributions
 names+0.028
ws+0.024
ags+0.018
 name+0.018
ashes+0.018
Neg logit contributions
lf-0.056
opus-0.046
ister-0.045
ths-0.045
 she-0.045
Token the
Token ID262
Feature activation+2.36e-02

Pos logit contributions
 names+0.518
 name+0.419
 sneak+0.418
ws+0.414
 meet+0.414
Neg logit contributions
lf-0.584
opus-0.532
mber-0.525
 importantly-0.497
izing-0.495
Token books
Token ID3835
Feature activation-5.30e-02

Pos logit contributions
lf+0.054
opus+0.045
mber+0.042
izing+0.040
very+0.039
Neg logit contributions
 names-0.062
ws-0.054
 name-0.052
 sneak-0.052
 meet-0.052
Tokenhe
Token ID258
Feature activation+1.77e-01

Pos logit contributions
 names+0.137
 name+0.113
ws+0.112
 sneak+0.112
 meet+0.111
Neg logit contributions
lf-0.120
opus-0.109
mber-0.107
 importantly-0.099
izing-0.099
Tokenlf
Token ID1652
Feature activation+7.35e-01

Pos logit contributions
lf+0.019
opus+0.012
mber+0.012
 importantly-0.008
izing-0.022
Neg logit contributions
 names-0.803
 sneak-0.729
 name-0.727
 meet-0.725
ws-0.723
Token.
Token ID13
Feature activation-9.78e-02

Pos logit contributions
lf+2.059
opus+1.878
mber+1.850
 importantly+1.755
izing+1.751
Neg logit contributions
 names-1.509
 name-1.186
ws-1.162
 meet-1.157
 sneak-1.155
Token It
Token ID632
Feature activation-2.19e-01

Pos logit contributions
 names+0.222
 sneak+0.190
ws+0.188
 meet+0.176
 argue+0.176
Neg logit contributions
lf-0.227
 importantly-0.213
opus-0.209
mber-0.205
izing-0.188
Token is
Token ID318
Feature activation-1.79e-01

Pos logit contributions
 names+0.512
ws+0.425
 name+0.414
 meet+0.405
 sneak+0.393
Neg logit contributions
lf-0.532
opus-0.484
mber-0.446
ths-0.436
izing-0.431
Token dangerous
Token ID4923
Feature activation+8.39e-02

Pos logit contributions
 names+0.437
 meet+0.356
 sneak+0.354
ws+0.352
 name+0.352
Neg logit contributions
lf-0.430
mber-0.376
opus-0.374
 importantly-0.371
izing-0.355
Token.
Token ID13
Feature activation-9.78e-02

Pos logit contributions
lf+0.214
opus+0.188
mber+0.183
 importantly+0.182
izing+0.181
Neg logit contributions
 names-0.188
 sneak-0.154
 name-0.153
ws-0.150
 meet-0.149
Token He
Token ID679
Feature activation-6.92e-02

Pos logit contributions
 names+0.313
 name+0.251
 sneak+0.249
 meet+0.247
ws+0.245
Neg logit contributions
lf-0.378
opus-0.351
mber-0.347
 importantly-0.327
izing-0.322
Token was
Token ID373
Feature activation+5.03e-02

Pos logit contributions
 names+0.162
 sneak+0.131
 name+0.131
ws+0.130
 meet+0.130
Neg logit contributions
lf-0.172
opus-0.158
mber-0.157
 importantly-0.148
izing-0.145
Token proud
Token ID6613
Feature activation+2.40e-01

Pos logit contributions
lf+0.128
opus+0.121
mber+0.118
izing+0.110
 importantly+0.110
Neg logit contributions
 names-0.116
 name-0.095
 meet-0.093
 sneak-0.093
ws-0.091
Token of
Token ID286
Feature activation+1.87e-01

Pos logit contributions
lf+0.621
opus+0.591
mber+0.569
izing+0.529
ths+0.521
Neg logit contributions
 names-0.542
 play-0.451
 name-0.441
 try-0.436
 meet-0.425
Token his
Token ID465
Feature activation+1.83e-02

Pos logit contributions
opus+0.503
lf+0.503
mber+0.490
izing+0.446
 importantly+0.441
Neg logit contributions
 names-0.391
 meet-0.315
 name-0.307
 try-0.303
 play-0.300
Token names
Token ID3891
Feature activation-6.48e-01

Pos logit contributions
lf+0.047
opus+0.044
mber+0.042
 importantly+0.040
izing+0.039
Neg logit contributions
 names-0.039
ws-0.034
 sneak-0.033
 meet-0.031
 find-0.030
Token.
Token ID13
Feature activation-9.66e-02

Pos logit contributions
 names+1.354
 name+1.071
 play+1.052
 meet+1.032
 try+1.027
Neg logit contributions
lf-1.703
mber-1.662
opus-1.639
izing-1.491
ths-1.488
Token\n
Token ID198
Feature activation-1.23e-02

Pos logit contributions
 names+0.183
 name+0.145
 meet+0.138
 sneak+0.132
 try+0.131
Neg logit contributions
lf-0.289
opus-0.271
mber-0.266
izing-0.256
ths-0.249
Token\n
Token ID198
Feature activation-1.31e-02

Pos logit contributions
 names+0.023
 meet+0.019
 name+0.018
 sneak+0.017
 play+0.017
Neg logit contributions
lf-0.035
opus-0.034
mber-0.033
izing-0.032
 importantly-0.032
TokenM
Token ID44
Feature activation+2.54e-01

Pos logit contributions
 names+0.031
 sneak+0.027
 meet+0.027
 play+0.027
 take+0.026
Neg logit contributions
lf-0.029
opus-0.028
mber-0.028
ement-0.027
 importantly-0.026
Tokenia
Token ID544
Feature activation+2.46e-01

Pos logit contributions
lf+0.443
mber+0.417
opus+0.394
izing+0.356
 importantly+0.347
Neg logit contributions
 names-0.754
 name-0.655
 sneak-0.643
 meet-0.622
 play-0.619
Token were
Token ID547
Feature activation-6.65e-02

Pos logit contributions
lf+0.843
mber+0.810
opus+0.804
 importantly+0.757
izing+0.755
Neg logit contributions
 names-0.699
 name-0.571
 meet-0.558
 find-0.553
 play-0.551
Token two
Token ID734
Feature activation+2.89e-02

Pos logit contributions
 names+0.139
 meet+0.108
 name+0.105
 play+0.103
 try+0.101
Neg logit contributions
lf-0.189
opus-0.173
mber-0.169
izing-0.168
ement-0.160
Token friends
Token ID2460
Feature activation+1.43e-01

Pos logit contributions
lf+0.070
mber+0.066
opus+0.064
izing+0.060
very+0.059
Neg logit contributions
 names-0.071
 sneak-0.057
 name-0.056
 find-0.055
 meet-0.055
Token.
Token ID13
Feature activation-1.14e-01

Pos logit contributions
lf+0.344
opus+0.331
mber+0.329
izing+0.301
ths+0.289
Neg logit contributions
 names-0.345
 find-0.293
 meet-0.291
 play-0.288
 sneak-0.284
Token Their
Token ID5334
Feature activation-1.82e-01

Pos logit contributions
 names+0.175
ws+0.144
 sneak+0.137
 try+0.128
 name+0.127
Neg logit contributions
lf-0.359
mber-0.330
opus-0.327
 importantly-0.323
very-0.304
Token names
Token ID3891
Feature activation-6.47e-01

Pos logit contributions
ws+0.118
 names+0.062
 sneak+0.053
 want+0.048
 meet+0.045
Neg logit contributions
lf-0.669
opus-0.648
mber-0.637
 importantly-0.603
ths-0.602
Token were
Token ID547
Feature activation-6.53e-02

Pos logit contributions
 names+1.401
 sneak+1.120
ws+1.114
 name+1.113
 meet+1.100
Neg logit contributions
lf-1.720
opus-1.587
mber-1.572
 importantly-1.491
izing-1.457
Token Lisa
Token ID15378
Feature activation+1.32e-01

Pos logit contributions
 names+0.149
 name+0.117
 meet+0.115
 sneak+0.112
 play+0.111
Neg logit contributions
lf-0.175
opus-0.164
mber-0.162
izing-0.157
ement-0.153
Token and
Token ID290
Feature activation+8.75e-03

Pos logit contributions
lf+0.363
opus+0.347
mber+0.341
izing+0.326
ths+0.325
Neg logit contributions
 names-0.272
 name-0.216
 sneak-0.210
 meet-0.210
 find-0.209
Token Jerry
Token ID13075
Feature activation+1.53e-01

Pos logit contributions
lf+0.023
opus+0.022
mber+0.021
izing+0.020
ement+0.020
Neg logit contributions
 names-0.019
 sneak-0.015
 meet-0.015
 name-0.015
 try-0.015
Token.
Token ID13
Feature activation-1.09e-01

Pos logit contributions
lf+0.400
opus+0.368
mber+0.362
 importantly+0.360
izing+0.340
Neg logit contributions
 names-0.318
ws-0.269
 sneak-0.264
 meet-0.253
 name-0.252
Tokenia
Token ID544
Feature activation+2.46e-01

Pos logit contributions
lf+0.443
mber+0.417
opus+0.394
izing+0.356
 importantly+0.347
Neg logit contributions
 names-0.754
 name-0.655
 sneak-0.643
 meet-0.622
 play-0.619
Token wanted
Token ID2227
Feature activation-1.04e-01

Pos logit contributions
lf+0.650
opus+0.611
mber+0.603
 importantly+0.584
izing+0.577
Neg logit contributions
 names-0.531
 sneak-0.430
 name-0.422
 meet-0.422
 find-0.412
Token to
Token ID284
Feature activation-2.12e-01

Pos logit contributions
 names+0.171
ws+0.136
 name+0.128
 sneak+0.128
 meet+0.124
Neg logit contributions
lf-0.324
mber-0.302
opus-0.300
 importantly-0.285
ths-0.282
Token have
Token ID423
Feature activation-4.06e-01

Pos logit contributions
 names+0.272
ws+0.270
ashes+0.209
inters+0.171
 proper+0.159
Neg logit contributions
lf-0.703
opus-0.624
mber-0.617
ister-0.588
very-0.568
Token nice
Token ID3621
Feature activation-1.15e-02

Pos logit contributions
 names+0.802
ws+0.631
 sneak+0.629
 name+0.627
 meet+0.611
Neg logit contributions
lf-1.156
mber-1.068
opus-1.065
 importantly-1.015
izing-0.986
Token names
Token ID3891
Feature activation-6.44e-01

Pos logit contributions
 names+0.026
ws+0.022
 sneak+0.022
 meet+0.021
 name+0.021
Neg logit contributions
lf-0.028
mber-0.025
opus-0.025
 importantly-0.024
ths-0.023
Token for
Token ID329
Feature activation-7.81e-02

Pos logit contributions
 names+1.371
 name+1.082
 play+1.046
 meet+1.036
 try+1.036
Neg logit contributions
lf-1.723
mber-1.661
opus-1.657
izing-1.501
 importantly-1.497
Token her
Token ID607
Feature activation-6.33e-03

Pos logit contributions
 names+0.172
 sneak+0.136
 name+0.136
 meet+0.135
ws+0.133
Neg logit contributions
lf-0.204
opus-0.193
mber-0.192
 importantly-0.179
izing-0.176
Token animals
Token ID4695
Feature activation-4.43e-02

Pos logit contributions
 names+0.015
 name+0.012
 sneak+0.012
 meet+0.012
ws+0.012
Neg logit contributions
lf-0.015
opus-0.014
mber-0.014
 importantly-0.013
izing-0.013
Token.
Token ID13
Feature activation-9.85e-02

Pos logit contributions
 names+0.091
 name+0.071
 sneak+0.071
 meet+0.070
ws+0.069
Neg logit contributions
lf-0.124
opus-0.115
mber-0.115
 importantly-0.108
izing-0.106
Token She
Token ID1375
Feature activation-7.34e-02

Pos logit contributions
 names+0.205
ws+0.170
 sneak+0.165
 meet+0.163
 name+0.161
Neg logit contributions
lf-0.261
mber-0.244
opus-0.243
 importantly-0.229
izing-0.224
Token was
Token ID373
Feature activation+4.65e-02

Pos logit contributions
 names+0.060
 name+0.049
 sneak+0.048
 meet+0.047
ws+0.047
Neg logit contributions
lf-0.070
opus-0.065
mber-0.064
izing-0.060
 importantly-0.060
Token a
Token ID257
Feature activation-1.52e-01

Pos logit contributions
lf+0.115
opus+0.105
mber+0.104
 importantly+0.100
izing+0.095
Neg logit contributions
 names-0.110
ws-0.090
 sneak-0.089
 meet-0.088
 name-0.088
Token girl
Token ID2576
Feature activation+1.58e-01

Pos logit contributions
 names+0.375
 name+0.312
 sneak+0.309
 meet+0.305
ws+0.300
Neg logit contributions
lf-0.359
opus-0.327
mber-0.325
 importantly-0.306
izing-0.303
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
lf+0.381
opus+0.350
mber+0.348
 importantly+0.325
izing+0.319
Neg logit contributions
 names-0.380
 sneak-0.310
 name-0.310
ws-0.309
 meet-0.307
Token Their
Token ID5334
Feature activation-1.76e-01

Pos logit contributions
 names+0.179
ws+0.163
 sneak+0.149
 respect+0.139
 meet+0.131
Neg logit contributions
lf-0.283
mber-0.263
opus-0.253
 importantly-0.250
very-0.244
Token names
Token ID3891
Feature activation-6.44e-01

Pos logit contributions
ws+0.190
 names+0.135
 sneak+0.120
 meet+0.117
 want+0.112
Neg logit contributions
lf-0.573
opus-0.565
mber-0.538
ths-0.518
 importantly-0.516
Token were
Token ID547
Feature activation-5.70e-02

Pos logit contributions
 names+1.357
 name+1.071
 sneak+1.067
ws+1.064
 meet+1.055
Neg logit contributions
lf-1.740
opus-1.624
mber-1.617
 importantly-1.521
izing-1.499
Token Lucy
Token ID22162
Feature activation+1.49e-01

Pos logit contributions
 names+0.129
 name+0.103
 meet+0.100
 sneak+0.098
 play+0.097
Neg logit contributions
lf-0.150
opus-0.142
mber-0.139
izing-0.137
ement-0.134
Token and
Token ID290
Feature activation+1.39e-02

Pos logit contributions
lf+0.401
opus+0.375
mber+0.371
 importantly+0.350
izing+0.345
Neg logit contributions
 names-0.321
 name-0.257
 sneak-0.254
 meet-0.249
ws-0.247
Token Joe
Token ID5689
Feature activation+1.79e-01

Pos logit contributions
lf+0.030
opus+0.030
mber+0.028
izing+0.027
ement+0.027
Neg logit contributions
 names-0.036
 name-0.030
 sneak-0.030
 play-0.030
 try-0.030
Token.
Token ID13
Feature activation-1.00e-01

Pos logit contributions
lf+0.452
opus+0.411
mber+0.405
 importantly+0.396
izing+0.377
Neg logit contributions
 names-0.383
ws-0.322
 sneak-0.314
 name-0.308
 meet-0.302
Token best
Token ID1266
Feature activation+1.09e-01

Pos logit contributions
 names+0.133
 name+0.107
 sneak+0.107
 meet+0.106
ws+0.105
Neg logit contributions
lf-0.143
opus-0.132
mber-0.131
 importantly-0.122
izing-0.121
Token friends
Token ID2460
Feature activation+1.46e-01

Pos logit contributions
lf+0.204
opus+0.186
mber+0.180
 importantly+0.176
izing+0.162
Neg logit contributions
 names-0.311
ws-0.273
 sneak-0.270
 meet-0.267
 name-0.264
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
lf+0.404
opus+0.368
mber+0.365
 importantly+0.359
izing+0.346
Neg logit contributions
 names-0.299
ws-0.240
 sneak-0.240
 name-0.234
 meet-0.230
Token\n
Token ID198
Feature activation-1.93e-02

Pos logit contributions
 names+0.163
ws+0.158
 sneak+0.138
 try+0.131
 name+0.126
Neg logit contributions
lf-0.295
mber-0.282
 importantly-0.269
very-0.257
opus-0.256
TokenTheir
Token ID14574
Feature activation+3.91e-02

Pos logit contributions
 names+0.039
 name+0.031
 meet+0.031
 sneak+0.030
 find+0.030
Neg logit contributions
lf-0.054
opus-0.051
mber-0.050
izing-0.047
 importantly-0.047
Token names
Token ID3891
Feature activation-6.44e-01

Pos logit contributions
lf+0.090
opus+0.083
mber+0.081
ths+0.077
 importantly+0.076
Neg logit contributions
 names-0.084
ws-0.083
 sneak-0.073
 meet-0.073
 try-0.070
Token were
Token ID547
Feature activation-5.52e-02

Pos logit contributions
 names+1.289
 name+1.000
ws+0.985
 meet+0.981
 sneak+0.981
Neg logit contributions
lf-1.804
opus-1.705
mber-1.703
 importantly-1.590
izing-1.584
Token Lucy
Token ID22162
Feature activation+1.50e-01

Pos logit contributions
 names+0.125
 name+0.097
 meet+0.096
 sneak+0.093
 play+0.092
Neg logit contributions
lf-0.149
opus-0.143
mber-0.136
izing-0.134
ement-0.132
Token and
Token ID290
Feature activation+1.33e-02

Pos logit contributions
lf+0.396
opus+0.369
mber+0.364
 importantly+0.342
izing+0.338
Neg logit contributions
 names-0.331
 name-0.265
 sneak-0.263
 meet-0.260
ws-0.257
Token Charlotte
Token ID14685
Feature activation+2.02e-01

Pos logit contributions
lf+0.036
opus+0.034
mber+0.033
izing+0.032
ement+0.031
Neg logit contributions
 names-0.029
 sneak-0.023
 meet-0.022
 name-0.022
 try-0.022
Token.
Token ID13
Feature activation-1.01e-01

Pos logit contributions
lf+0.497
 importantly+0.453
mber+0.451
opus+0.442
izing+0.420
Neg logit contributions
 names-0.437
ws-0.392
 sneak-0.364
 name-0.349
 meet-0.348
Token in
Token ID287
Feature activation-4.81e-04

Pos logit contributions
lf+0.471
opus+0.440
mber+0.438
 importantly+0.409
izing+0.403
Neg logit contributions
 names-0.346
 name-0.271
 sneak-0.268
 meet-0.267
ws-0.264
Token a
Token ID257
Feature activation-1.53e-01

Pos logit contributions
 names+0.001
ws+0.001
 sneak+0.001
 name+0.001
ashes+0.001
Neg logit contributions
lf-0.001
mber-0.001
 importantly-0.001
opus-0.001
izing-0.001
Token tree
Token ID5509
Feature activation+1.29e-01

Pos logit contributions
 names+0.326
 sneak+0.261
 meet+0.261
 name+0.260
ws+0.253
Neg logit contributions
lf-0.412
opus-0.383
mber-0.377
 importantly-0.361
izing-0.354
Token.
Token ID13
Feature activation-1.05e-01

Pos logit contributions
lf+0.373
opus+0.347
mber+0.345
 importantly+0.327
izing+0.322
Neg logit contributions
 names-0.250
 sneak-0.193
 name-0.192
ws-0.192
 meet-0.190
Token Their
Token ID5334
Feature activation-1.81e-01

Pos logit contributions
 names+0.157
ws+0.136
 sneak+0.127
 find+0.118
 respect+0.116
Neg logit contributions
lf-0.325
mber-0.306
opus-0.297
 importantly-0.294
very-0.280
Token names
Token ID3891
Feature activation-6.43e-01

Pos logit contributions
ws+0.297
 names+0.296
 sneak+0.256
 want+0.244
 find+0.242
Neg logit contributions
lf-0.488
opus-0.465
mber-0.459
ths-0.445
izing-0.429
Token were
Token ID547
Feature activation-5.94e-02

Pos logit contributions
 names+1.312
 name+1.023
 meet+1.011
 sneak+1.009
ws+1.008
Neg logit contributions
lf-1.782
opus-1.680
mber-1.673
 importantly-1.569
izing-1.555
Token Max
Token ID5436
Feature activation+5.68e-02

Pos logit contributions
 names+0.138
 name+0.109
 meet+0.108
 play+0.101
 sneak+0.100
Neg logit contributions
lf-0.159
opus-0.148
izing-0.143
mber-0.141
ement-0.140
Token and
Token ID290
Feature activation+1.30e-02

Pos logit contributions
lf+0.156
opus+0.148
mber+0.146
 importantly+0.138
ths+0.137
Neg logit contributions
 names-0.117
 name-0.094
 sneak-0.091
 meet-0.090
 find-0.089
Token Shy
Token ID36838
Feature activation+1.80e-01

Pos logit contributions
lf+0.031
opus+0.029
mber+0.028
izing+0.027
ement+0.026
Neg logit contributions
 names-0.032
 meet-0.026
 sneak-0.026
 name-0.026
 try-0.026
Token.
Token ID13
Feature activation-1.04e-01

Pos logit contributions
lf+0.523
opus+0.487
mber+0.484
 importantly+0.459
izing+0.452
Neg logit contributions
 names-0.351
 name-0.270
 sneak-0.268
ws-0.264
 meet-0.264
Token a
Token ID257
Feature activation-1.54e-01

Pos logit contributions
 names+0.010
ws+0.009
 sneak+0.008
 name+0.008
 meet+0.008
Neg logit contributions
lf-0.015
mber-0.014
opus-0.014
 importantly-0.013
izing-0.013
Token large
Token ID1588
Feature activation+8.49e-02

Pos logit contributions
 names+0.341
 name+0.278
 sneak+0.275
 meet+0.275
 find+0.270
Neg logit contributions
lf-0.403
opus-0.373
mber-0.371
izing-0.352
 importantly-0.352
Token house
Token ID2156
Feature activation-1.18e-01

Pos logit contributions
lf+0.243
mber+0.228
opus+0.227
 importantly+0.219
izing+0.217
Neg logit contributions
 names-0.168
 name-0.132
 meet-0.129
 sneak-0.128
 play-0.127
Token.
Token ID13
Feature activation-1.11e-01

Pos logit contributions
 names+0.255
ws+0.204
 sneak+0.202
 name+0.201
 meet+0.199
Neg logit contributions
lf-0.318
opus-0.289
mber-0.285
 importantly-0.274
izing-0.267
Token Their
Token ID5334
Feature activation-1.81e-01

Pos logit contributions
 names+0.174
ws+0.156
 sneak+0.141
 find+0.132
 try+0.130
Neg logit contributions
lf-0.340
mber-0.313
opus-0.305
 importantly-0.303
very-0.285
Token names
Token ID3891
Feature activation-6.43e-01

Pos logit contributions
ws+0.156
 names+0.114
 sneak+0.092
 want+0.088
 meet+0.088
Neg logit contributions
lf-0.635
opus-0.602
mber-0.594
ths-0.569
 importantly-0.566
Token were
Token ID547
Feature activation-6.44e-02

Pos logit contributions
 names+1.393
 sneak+1.108
 name+1.106
ws+1.103
 meet+1.092
Neg logit contributions
lf-1.710
opus-1.582
mber-1.570
 importantly-1.484
izing-1.454
Token Molly
Token ID30236
Feature activation+1.71e-01

Pos logit contributions
 names+0.144
 name+0.113
 meet+0.111
 sneak+0.108
 play+0.106
Neg logit contributions
lf-0.174
opus-0.167
mber-0.164
izing-0.159
ement-0.156
Token and
Token ID290
Feature activation+9.00e-03

Pos logit contributions
lf+0.479
opus+0.461
mber+0.447
ths+0.438
izing+0.434
Neg logit contributions
 names-0.334
 name-0.267
 sneak-0.256
 find-0.256
 meet-0.251
Token Jacob
Token ID12806
Feature activation+1.87e-01

Pos logit contributions
lf+0.024
opus+0.023
mber+0.023
izing+0.022
ement+0.022
Neg logit contributions
 names-0.019
 sneak-0.015
 name-0.015
 meet-0.015
 try-0.015
Token.
Token ID13
Feature activation-1.07e-01

Pos logit contributions
lf+0.443
mber+0.402
opus+0.401
 importantly+0.400
very+0.365
Neg logit contributions
 names-0.408
ws-0.366
 sneak-0.347
 meet-0.327
 name-0.325
Token,
Token ID11
Feature activation+2.60e-01

Pos logit contributions
 names+0.002
ws+0.002
 sneak+0.002
 meet+0.002
 name+0.002
Neg logit contributions
lf-0.003
mber-0.003
izing-0.003
opus-0.003
 importantly-0.003
Token far
Token ID1290
Feature activation+2.85e-04

Pos logit contributions
lf+0.627
opus+0.578
mber+0.572
 importantly+0.532
izing+0.526
Neg logit contributions
 names-0.636
 name-0.522
 sneak-0.520
 meet-0.515
ws-0.514
Token away
Token ID1497
Feature activation-2.22e-01

Pos logit contributions
lf+0.001
opus+0.001
mber+0.001
izing+0.001
 importantly+0.001
Neg logit contributions
 names-0.001
ws-0.000
 meet-0.000
 sneak-0.000
 name-0.000
Token.
Token ID13
Feature activation-1.08e-01

Pos logit contributions
 names+0.500
ws+0.412
 sneak+0.405
 name+0.400
 meet+0.399
Neg logit contributions
lf-0.559
opus-0.510
mber-0.508
 importantly-0.487
izing-0.481
Token Their
Token ID5334
Feature activation-1.78e-01

Pos logit contributions
 names+0.160
ws+0.127
 sneak+0.121
 find+0.112
 meet+0.112
Neg logit contributions
lf-0.351
mber-0.330
opus-0.324
 importantly-0.317
very-0.303
Token names
Token ID3891
Feature activation-6.43e-01

Pos logit contributions
ws+0.259
 names+0.253
 sneak+0.219
 meet+0.210
 want+0.208
Neg logit contributions
lf-0.522
opus-0.489
mber-0.475
 importantly-0.456
ths-0.452
Token were
Token ID547
Feature activation-5.67e-02

Pos logit contributions
 names+1.265
 name+0.980
 meet+0.979
 play+0.959
ws+0.948
Neg logit contributions
lf-1.794
mber-1.739
opus-1.730
izing-1.608
 importantly-1.590
Token Pat
Token ID3208
Feature activation+1.20e-01

Pos logit contributions
 names+0.132
 name+0.105
 meet+0.103
 sneak+0.098
 play+0.097
Neg logit contributions
lf-0.151
opus-0.141
mber-0.140
izing-0.136
ement-0.132
Token and
Token ID290
Feature activation+1.21e-02

Pos logit contributions
lf+0.336
mber+0.315
opus+0.313
 importantly+0.306
ths+0.306
Neg logit contributions
 names-0.237
 name-0.185
 sneak-0.184
 meet-0.179
 play-0.178
Token Bob
Token ID5811
Feature activation+1.08e-01

Pos logit contributions
lf+0.027
opus+0.026
mber+0.025
izing+0.024
ement+0.024
Neg logit contributions
 names-0.032
 name-0.026
 meet-0.026
 sneak-0.026
 play-0.026
Token.
Token ID13
Feature activation-1.05e-01

Pos logit contributions
lf+0.301
opus+0.280
mber+0.278
 importantly+0.264
izing+0.258
Neg logit contributions
 names-0.219
 sneak-0.172
ws-0.171
 name-0.171
 meet-0.169
Token good
Token ID922
Feature activation-1.62e-02

Pos logit contributions
 names+0.048
 name+0.039
 meet+0.038
 sneak+0.038
ws+0.038
Neg logit contributions
lf-0.049
opus-0.045
mber-0.044
 importantly-0.041
izing-0.041
Token friends
Token ID2460
Feature activation+1.44e-01

Pos logit contributions
 names+0.045
 name+0.038
 sneak+0.038
ws+0.037
 meet+0.037
Neg logit contributions
lf-0.034
opus-0.031
mber-0.030
 importantly-0.028
izing-0.027
Token and
Token ID290
Feature activation+1.10e-02

Pos logit contributions
lf+0.420
opus+0.393
mber+0.392
 importantly+0.360
izing+0.359
Neg logit contributions
 names-0.278
 name-0.213
 meet-0.212
 sneak-0.209
ws-0.205
Token signed
Token ID4488
Feature activation-7.79e-02

Pos logit contributions
lf+0.030
mber+0.027
opus+0.027
 importantly+0.026
ths+0.025
Neg logit contributions
 names-0.022
ws-0.018
 name-0.018
 sneak-0.017
 meet-0.016
Token their
Token ID511
Feature activation-1.34e-01

Pos logit contributions
 names+0.160
 sneak+0.125
 name+0.125
ws+0.125
 meet+0.123
Neg logit contributions
lf-0.216
opus-0.201
mber-0.199
 importantly-0.189
izing-0.185
Token names
Token ID3891
Feature activation-6.43e-01

Pos logit contributions
ws+0.111
 names+0.107
 sneak+0.087
 meet+0.083
 want+0.082
Neg logit contributions
lf-0.445
ths-0.418
opus-0.417
mber-0.416
 importantly-0.412
Token together
Token ID1978
Feature activation+1.66e-01

Pos logit contributions
 names+1.467
 name+1.173
 sneak+1.163
ws+1.160
 meet+1.159
Neg logit contributions
lf-1.622
opus-1.522
mber-1.518
 importantly-1.403
izing-1.380
Token on
Token ID319
Feature activation+4.63e-02

Pos logit contributions
lf+0.455
opus+0.415
mber+0.412
 importantly+0.398
izing+0.392
Neg logit contributions
 names-0.339
ws-0.274
 sneak-0.271
 name-0.267
 meet-0.265
Token everything
Token ID2279
Feature activation+8.94e-02

Pos logit contributions
lf+0.122
opus+0.111
mber+0.111
 importantly+0.106
izing+0.104
Neg logit contributions
 names-0.100
ws-0.082
 sneak-0.081
 name-0.081
 meet-0.080
Token they
Token ID484
Feature activation+3.54e-01

Pos logit contributions
lf+0.244
mber+0.232
opus+0.232
 importantly+0.211
izing+0.207
Neg logit contributions
 names-0.185
 meet-0.144
 name-0.144
 sneak-0.144
 play-0.143
Token could
Token ID714
Feature activation+3.15e-02

Pos logit contributions
lf+0.879
mber+0.798
opus+0.785
 importantly+0.742
ths+0.720
Neg logit contributions
 names-0.778
ws-0.645
 name-0.637
 sneak-0.627
 try-0.587
Token were
Token ID547
Feature activation-5.97e-02

Pos logit contributions
lf+0.841
mber+0.810
opus+0.809
 importantly+0.758
izing+0.755
Neg logit contributions
 names-0.696
 name-0.567
 meet-0.557
 find-0.550
 play-0.550
Token two
Token ID734
Feature activation+2.84e-02

Pos logit contributions
 names+0.123
 meet+0.097
 play+0.093
 name+0.093
 find+0.090
Neg logit contributions
lf-0.171
opus-0.157
mber-0.154
izing-0.153
ement-0.142
Token friends
Token ID2460
Feature activation+1.42e-01

Pos logit contributions
lf+0.070
mber+0.066
opus+0.064
izing+0.061
very+0.059
Neg logit contributions
 names-0.069
 sneak-0.055
 name-0.054
 play-0.053
 find-0.053
Token.
Token ID13
Feature activation-1.06e-01

Pos logit contributions
lf+0.353
opus+0.346
mber+0.342
izing+0.316
ths+0.300
Neg logit contributions
 names-0.321
 find-0.277
 play-0.271
 meet-0.269
 sneak-0.262
Token Their
Token ID5334
Feature activation-1.78e-01

Pos logit contributions
 names+0.170
ws+0.138
 sneak+0.132
 name+0.126
 try+0.123
Neg logit contributions
lf-0.333
mber-0.308
opus-0.302
 importantly-0.299
very-0.284
Token names
Token ID3891
Feature activation-6.43e-01

Pos logit contributions
ws+0.118
 names+0.075
 sneak+0.063
 want+0.052
 meet+0.050
Neg logit contributions
lf-0.660
opus-0.634
mber-0.620
 importantly-0.593
ths-0.590
Token were
Token ID547
Feature activation-5.93e-02

Pos logit contributions
 names+1.291
 name+1.006
ws+0.980
 meet+0.980
 sneak+0.980
Neg logit contributions
lf-1.800
opus-1.704
mber-1.701
izing-1.586
 importantly-1.586
Token Jack
Token ID3619
Feature activation+1.95e-01

Pos logit contributions
 names+0.137
 name+0.107
 meet+0.106
 sneak+0.103
 play+0.102
Neg logit contributions
lf-0.157
opus-0.147
mber-0.145
izing-0.140
ement-0.136
Token and
Token ID290
Feature activation+1.11e-02

Pos logit contributions
lf+0.531
opus+0.501
mber+0.491
 importantly+0.468
izing+0.466
Neg logit contributions
 names-0.405
 name-0.323
 sneak-0.315
 find-0.312
 meet-0.311
Token Kim
Token ID6502
Feature activation+6.87e-03

Pos logit contributions
lf+0.026
opus+0.025
mber+0.024
izing+0.023
 importantly+0.022
Neg logit contributions
 names-0.027
 name-0.022
 sneak-0.022
 meet-0.022
ws-0.022
Token.
Token ID13
Feature activation-1.04e-01

Pos logit contributions
lf+0.019
opus+0.017
mber+0.017
 importantly+0.016
izing+0.016
Neg logit contributions
 names-0.014
ws-0.011
 sneak-0.011
 name-0.011
 meet-0.011
Token dogs
Token ID6844
Feature activation-7.62e-02

Pos logit contributions
lf+0.478
opus+0.435
mber+0.433
 importantly+0.416
izing+0.406
Neg logit contributions
 names-0.400
ws-0.329
 sneak-0.326
 name-0.320
 meet-0.317
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
 names+0.184
 name+0.150
 sneak+0.150
 meet+0.149
 play+0.147
Neg logit contributions
lf-0.183
mber-0.172
opus-0.169
 importantly-0.160
ths-0.153
Token practiced
Token ID19893
Feature activation-1.57e-01

Pos logit contributions
lf+0.028
opus+0.026
mber+0.026
 importantly+0.024
izing+0.024
Neg logit contributions
 names-0.025
 sneak-0.020
 name-0.020
 meet-0.020
ws-0.019
Token saying
Token ID2282
Feature activation+1.68e-01

Pos logit contributions
 names+0.387
 sneak+0.319
ws+0.317
 name+0.317
 meet+0.314
Neg logit contributions
lf-0.371
opus-0.334
mber-0.333
 importantly-0.314
izing-0.308
Token their
Token ID511
Feature activation-1.35e-01

Pos logit contributions
lf+0.399
opus+0.367
mber+0.362
 importantly+0.333
ths+0.326
Neg logit contributions
 names-0.429
 name-0.351
 sneak-0.342
ws-0.339
 meet-0.339
Token names
Token ID3891
Feature activation-6.42e-01

Pos logit contributions
ws+0.087
 names+0.070
 meet+0.065
 find+0.061
 sneak+0.057
Neg logit contributions
lf-0.483
opus-0.465
ths-0.454
mber-0.453
izing-0.450
Token.
Token ID13
Feature activation-1.06e-01

Pos logit contributions
 names+1.390
 name+1.095
 sneak+1.086
ws+1.083
 meet+1.076
Neg logit contributions
lf-1.703
opus-1.597
mber-1.589
 importantly-1.484
izing-1.455
Token
Token ID220
Feature activation+3.12e-02

Pos logit contributions
 names+0.217
 name+0.171
 sneak+0.169
 meet+0.168
ws+0.167
Neg logit contributions
lf-0.299
opus-0.278
mber-0.274
 importantly-0.260
izing-0.257
Token
Token ID198
Feature activation-1.70e-02

Pos logit contributions
lf+0.078
opus+0.073
mber+0.072
 importantly+0.068
izing+0.066
Neg logit contributions
 names-0.070
 sneak-0.058
ws-0.057
 name-0.056
 meet-0.056
Token
Token ID198
Feature activation-1.70e-02

Pos logit contributions
 names+0.032
 meet+0.025
 name+0.025
 find+0.024
 sneak+0.024
Neg logit contributions
lf-0.050
opus-0.047
mber-0.046
 importantly-0.045
izing-0.044
TokenThe
Token ID464
Feature activation+1.57e-01

Pos logit contributions
 names+0.044
 meet+0.039
 sneak+0.039
 play+0.037
 find+0.037
Neg logit contributions
lf-0.035
opus-0.033
ement-0.032
mber-0.032
 importantly-0.031
Token of
Token ID286
Feature activation+1.84e-01

Pos logit contributions
 names+0.277
 name+0.238
 sneak+0.235
 meet+0.235
 try+0.235
Neg logit contributions
lf-0.142
opus-0.128
mber-0.126
 importantly-0.111
ths-0.106
Token the
Token ID262
Feature activation+2.91e-02

Pos logit contributions
lf+0.447
mber+0.403
opus+0.397
 importantly+0.383
izing+0.378
Neg logit contributions
 names-0.428
ws-0.361
 sneak-0.358
 name-0.349
 meet-0.344
Token paintings
Token ID21641
Feature activation-6.56e-03

Pos logit contributions
lf+0.075
opus+0.069
mber+0.069
 importantly+0.065
izing+0.063
Neg logit contributions
 names-0.064
ws-0.053
 sneak-0.053
 name-0.051
 meet-0.051
Token or
Token ID393
Feature activation-1.10e-01

Pos logit contributions
 names+0.020
 name+0.017
 meet+0.017
 sneak+0.017
 try+0.017
Neg logit contributions
lf-0.011
opus-0.011
mber-0.010
 importantly-0.009
izing-0.009
Token the
Token ID262
Feature activation+2.90e-02

Pos logit contributions
 names+0.251
ws+0.209
 sneak+0.205
 name+0.205
 meet+0.201
Neg logit contributions
lf-0.272
mber-0.247
opus-0.246
 importantly-0.236
izing-0.230
Token names
Token ID3891
Feature activation-6.42e-01

Pos logit contributions
lf+0.076
opus+0.070
mber+0.069
 importantly+0.066
izing+0.064
Neg logit contributions
 names-0.062
ws-0.051
 sneak-0.051
 name-0.050
 meet-0.049
Token of
Token ID286
Feature activation+1.86e-01

Pos logit contributions
 names+1.706
 name+1.412
 meet+1.392
 sneak+1.390
 try+1.378
Neg logit contributions
lf-1.375
opus-1.286
mber-1.276
 importantly-1.165
izing-1.133
Token the
Token ID262
Feature activation+3.04e-02

Pos logit contributions
lf+0.477
mber+0.434
opus+0.432
 importantly+0.413
izing+0.404
Neg logit contributions
 names-0.408
ws-0.338
 sneak-0.334
 name-0.328
 meet-0.323
Token statues
Token ID25827
Feature activation+1.80e-01

Pos logit contributions
lf+0.082
opus+0.076
mber+0.075
 importantly+0.071
izing+0.070
Neg logit contributions
 names-0.064
ws-0.052
 sneak-0.051
 name-0.051
 meet-0.050
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
lf+0.497
opus+0.468
mber+0.466
 importantly+0.434
izing+0.427
Neg logit contributions
 names-0.371
 name-0.290
 meet-0.288
 sneak-0.287
 try-0.280
Token
Token ID198
Feature activation-1.40e-02

Pos logit contributions
 names+0.230
 sneak+0.191
ws+0.191
 meet+0.184
 name+0.182
Neg logit contributions
lf-0.258
mber-0.240
opus-0.238
 importantly-0.230
ement-0.220
Token the
Token ID262
Feature activation+2.64e-02

Pos logit contributions
 names+0.841
 sneak+0.686
ws+0.684
 name+0.682
 meet+0.676
Neg logit contributions
lf-0.887
opus-0.814
mber-0.808
 importantly-0.762
izing-0.746
Token pictures
Token ID5986
Feature activation-1.83e-01

Pos logit contributions
lf+0.068
opus+0.063
mber+0.062
 importantly+0.058
izing+0.057
Neg logit contributions
 names-0.058
ws-0.049
 sneak-0.048
 meet-0.047
 name-0.046
Token and
Token ID290
Feature activation+9.93e-03

Pos logit contributions
 names+0.439
 name+0.358
 sneak+0.355
 meet+0.354
ws+0.350
Neg logit contributions
lf-0.440
opus-0.416
mber-0.414
 importantly-0.383
izing-0.375
Token learn
Token ID2193
Feature activation-4.26e-01

Pos logit contributions
lf+0.028
mber+0.025
 importantly+0.024
opus+0.023
izing+0.023
Neg logit contributions
 names-0.017
ws-0.016
ashes-0.014
 name-0.014
 sneak-0.014
Token the
Token ID262
Feature activation+2.58e-02

Pos logit contributions
 names+1.008
 sneak+0.833
ws+0.832
 name+0.822
 meet+0.822
Neg logit contributions
lf-1.032
opus-0.939
mber-0.936
 importantly-0.887
izing-0.868
Token names
Token ID3891
Feature activation-6.42e-01

Pos logit contributions
lf+0.076
mber+0.072
opus+0.069
 importantly+0.067
izing+0.066
Neg logit contributions
ws-0.035
 names-0.032
 sneak-0.030
ashes-0.030
 meet-0.028
Token of
Token ID286
Feature activation+1.81e-01

Pos logit contributions
 names+1.375
 name+1.083
 sneak+1.073
ws+1.066
 meet+1.064
Neg logit contributions
lf-1.714
opus-1.613
mber-1.602
 importantly-1.502
izing-1.478
Token the
Token ID262
Feature activation+2.83e-02

Pos logit contributions
lf+0.462
mber+0.409
opus+0.407
 importantly+0.394
izing+0.385
Neg logit contributions
 names-0.386
ws-0.333
 sneak-0.324
 name-0.314
 meet-0.310
Token animals
Token ID4695
Feature activation-3.90e-02

Pos logit contributions
lf+0.071
opus+0.065
mber+0.065
 importantly+0.061
izing+0.059
Neg logit contributions
 names-0.060
ws-0.054
 sneak-0.051
 meet-0.049
 try-0.049
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
 names+0.077
ws+0.061
 sneak+0.061
 name+0.060
 meet+0.059
Neg logit contributions
lf-0.110
opus-0.102
mber-0.101
 importantly-0.096
izing-0.094
Token Tom
Token ID4186
Feature activation-1.50e-02

Pos logit contributions
 names+0.199
ws+0.160
 sneak+0.158
 meet+0.152
 name+0.152
Neg logit contributions
lf-0.292
mber-0.272
opus-0.269
 importantly-0.259
ement-0.251
Token a
Token ID257
Feature activation-1.52e-01

Pos logit contributions
 names+0.002
ws+0.002
 sneak+0.002
 name+0.002
 meet+0.002
Neg logit contributions
lf-0.002
mber-0.002
 importantly-0.002
opus-0.002
izing-0.002
Token green
Token ID4077
Feature activation+1.43e-02

Pos logit contributions
 names+0.324
 sneak+0.261
 name+0.260
 meet+0.260
 find+0.254
Neg logit contributions
lf-0.409
opus-0.376
mber-0.373
 importantly-0.358
izing-0.352
Token closet
Token ID21615
Feature activation-1.17e-01

Pos logit contributions
lf+0.037
opus+0.034
mber+0.034
 importantly+0.032
izing+0.031
Neg logit contributions
 names-0.032
ws-0.026
 sneak-0.026
 name-0.026
 meet-0.026
Token.
Token ID13
Feature activation-1.07e-01

Pos logit contributions
 names+0.202
 name+0.151
 sneak+0.151
 meet+0.146
 find+0.145
Neg logit contributions
lf-0.355
mber-0.340
opus-0.338
 importantly-0.318
ths-0.314
Token Their
Token ID5334
Feature activation-1.78e-01

Pos logit contributions
 names+0.157
ws+0.123
 sneak+0.118
 name+0.111
 meet+0.109
Neg logit contributions
lf-0.346
mber-0.326
opus-0.320
 importantly-0.313
very-0.298
Token names
Token ID3891
Feature activation-6.42e-01

Pos logit contributions
 names+0.187
ws+0.187
 sneak+0.148
 meet+0.143
 want+0.136
Neg logit contributions
lf-0.581
opus-0.555
mber-0.552
 importantly-0.520
ths-0.516
Token were
Token ID547
Feature activation-5.95e-02

Pos logit contributions
 names+1.200
 name+0.908
 meet+0.884
 play+0.883
ws+0.874
Neg logit contributions
lf-1.864
opus-1.803
mber-1.797
izing-1.698
ths-1.675
Token Dave
Token ID9935
Feature activation+1.75e-01

Pos logit contributions
 names+0.144
 name+0.114
 meet+0.110
 sneak+0.105
 play+0.104
Neg logit contributions
lf-0.154
opus-0.144
izing-0.138
mber-0.138
ement-0.136
Token and
Token ID290
Feature activation+1.04e-02

Pos logit contributions
lf+0.508
opus+0.471
mber+0.462
ths+0.458
izing+0.449
Neg logit contributions
 names-0.341
 name-0.272
 find-0.254
 sneak-0.253
 meet-0.252
Token L
Token ID406
Feature activation-2.53e-02

Pos logit contributions
lf+0.026
opus+0.025
mber+0.024
izing+0.023
ths+0.022
Neg logit contributions
 names-0.025
 name-0.020
 meet-0.020
 sneak-0.020
 try-0.019
Tokenulu
Token ID15712
Feature activation+3.24e-01

Pos logit contributions
 names+0.079
 name+0.068
 sneak+0.068
 meet+0.068
ws+0.068
Neg logit contributions
lf-0.044
opus-0.038
mber-0.037
 importantly-0.035
izing-0.034
Token were
Token ID547
Feature activation-5.58e-02

Pos logit contributions
lf+0.907
mber+0.886
opus+0.878
izing+0.831
 importantly+0.828
Neg logit contributions
 names-0.670
 name-0.553
 find-0.539
 play-0.535
 meet-0.533
Token two
Token ID734
Feature activation+3.47e-02

Pos logit contributions
 names+0.121
 meet+0.095
 name+0.092
 play+0.091
 try+0.089
Neg logit contributions
lf-0.153
opus-0.141
mber-0.138
izing-0.136
ement-0.128
Token friends
Token ID2460
Feature activation+1.47e-01

Pos logit contributions
lf+0.086
mber+0.080
opus+0.079
izing+0.074
very+0.072
Neg logit contributions
 names-0.084
 sneak-0.067
 name-0.067
 play-0.065
 find-0.065
Token.
Token ID13
Feature activation-1.06e-01

Pos logit contributions
lf+0.371
opus+0.365
mber+0.362
izing+0.335
ths+0.318
Neg logit contributions
 names-0.331
 find-0.284
 play-0.278
 meet-0.275
 sneak-0.267
Token Their
Token ID5334
Feature activation-1.79e-01

Pos logit contributions
 names+0.183
ws+0.143
 sneak+0.141
 name+0.137
 meet+0.134
Neg logit contributions
lf-0.325
mber-0.299
opus-0.298
 importantly-0.290
izing-0.276
Token names
Token ID3891
Feature activation-6.41e-01

Pos logit contributions
ws+0.112
 sneak+0.054
 names+0.052
 want+0.044
 meet+0.039
Neg logit contributions
lf-0.667
opus-0.641
mber-0.626
 importantly-0.600
ement-0.593
Token were
Token ID547
Feature activation-5.77e-02

Pos logit contributions
 names+1.239
 name+0.956
 meet+0.920
 play+0.915
ws+0.912
Neg logit contributions
lf-1.836
mber-1.770
opus-1.769
izing-1.651
 importantly-1.624
Token Max
Token ID5436
Feature activation+5.51e-02

Pos logit contributions
 names+0.134
 name+0.104
 meet+0.103
 sneak+0.099
 play+0.099
Neg logit contributions
lf-0.153
opus-0.142
mber-0.140
izing-0.139
ths-0.134
Token and
Token ID290
Feature activation+1.29e-02

Pos logit contributions
lf+0.148
opus+0.139
mber+0.137
 importantly+0.129
izing+0.128
Neg logit contributions
 names-0.118
 name-0.094
 sneak-0.093
 meet-0.092
ws-0.091
Token Ma
Token ID6669
Feature activation+9.43e-02

Pos logit contributions
lf+0.031
opus+0.028
mber+0.028
 importantly+0.026
izing+0.026
Neg logit contributions
 names-0.032
 sneak-0.026
 name-0.026
ws-0.026
 meet-0.026
Tokenisy
Token ID13560
Feature activation+5.07e-01

Pos logit contributions
lf+0.169
mber+0.156
opus+0.153
 importantly+0.144
izing+0.144
Neg logit contributions
 names-0.279
 sneak-0.236
 meet-0.236
 name-0.233
ws-0.228
Token learned
Token ID4499
Feature activation-1.04e-01

Pos logit contributions
lf+0.166
opus+0.153
mber+0.152
 importantly+0.143
izing+0.140
Neg logit contributions
 names-0.148
 sneak-0.119
 name-0.119
ws-0.119
 meet-0.117
Token how
Token ID703
Feature activation+7.74e-02

Pos logit contributions
 names+0.229
 name+0.182
 sneak+0.181
ws+0.179
 meet+0.179
Neg logit contributions
lf-0.274
opus-0.255
mber-0.252
 importantly-0.237
izing-0.232
Token to
Token ID284
Feature activation-2.25e-01

Pos logit contributions
lf+0.268
opus+0.250
mber+0.248
izing+0.234
 importantly+0.234
Neg logit contributions
 names-0.110
 name-0.077
 meet-0.074
 try-0.073
 sneak-0.073
Token sign
Token ID1051
Feature activation-2.11e-01

Pos logit contributions
 names+0.312
ws+0.227
 name+0.190
ashes+0.188
 sneak+0.180
Neg logit contributions
lf-0.779
opus-0.701
mber-0.690
izing-0.672
ths-0.655
Token their
Token ID511
Feature activation-1.40e-01

Pos logit contributions
 names+0.487
 sneak+0.398
ws+0.396
 name+0.395
 meet+0.391
Neg logit contributions
lf-0.521
opus-0.478
mber-0.474
 importantly-0.451
izing-0.441
Token names
Token ID3891
Feature activation-6.41e-01

Pos logit contributions
 names+0.199
ws+0.176
 sneak+0.168
 meet+0.166
 find+0.154
Neg logit contributions
lf-0.422
opus-0.399
mber-0.392
 importantly-0.380
ths-0.377
Token.
Token ID13
Feature activation-1.15e-01

Pos logit contributions
 names+1.411
 name+1.122
 sneak+1.113
 meet+1.111
ws+1.105
Neg logit contributions
lf-1.661
mber-1.564
opus-1.559
 importantly-1.449
izing-1.422
Token They
Token ID1119
Feature activation-3.31e-01

Pos logit contributions
 names+0.232
 name+0.184
 meet+0.180
 sneak+0.176
ws+0.176
Neg logit contributions
lf-0.326
opus-0.303
mber-0.299
izing-0.284
 importantly-0.283
Token were
Token ID547
Feature activation-6.07e-02

Pos logit contributions
 names+0.740
ws+0.605
 name+0.599
 sneak+0.589
 meet+0.574
Neg logit contributions
lf-0.843
mber-0.764
opus-0.757
 importantly-0.707
ths-0.706
Token so
Token ID523
Feature activation+8.49e-02

Pos logit contributions
 names+0.147
 name+0.121
 meet+0.120
 sneak+0.119
ws+0.117
Neg logit contributions
lf-0.148
opus-0.141
mber-0.138
izing-0.127
 importantly-0.125
Token proud
Token ID6613
Feature activation+2.39e-01

Pos logit contributions
lf+0.195
opus+0.178
mber+0.174
izing+0.160
 importantly+0.159
Neg logit contributions
 names-0.222
 name-0.186
 meet-0.183
 sneak-0.183
 try-0.179
Token She
Token ID1375
Feature activation-7.15e-02

Pos logit contributions
 names+0.302
 name+0.239
 sneak+0.239
ws+0.237
 meet+0.236
Neg logit contributions
lf-0.387
opus-0.359
mber-0.356
 importantly-0.336
izing-0.330
Token was
Token ID373
Feature activation+4.46e-02

Pos logit contributions
 names+0.167
ws+0.136
 sneak+0.135
 name+0.135
 meet+0.134
Neg logit contributions
lf-0.178
opus-0.163
mber-0.162
 importantly-0.152
izing-0.148
Token happy
Token ID3772
Feature activation+1.49e-01

Pos logit contributions
lf+0.111
opus+0.103
mber+0.102
 importantly+0.095
izing+0.094
Neg logit contributions
 names-0.105
 name-0.085
 sneak-0.085
ws-0.084
 meet-0.084
Token with
Token ID351
Feature activation+5.78e-02

Pos logit contributions
lf+0.437
opus+0.417
mber+0.405
 importantly+0.377
izing+0.375
Neg logit contributions
 names-0.286
 name-0.225
 play-0.218
 try-0.218
 meet-0.217
Token her
Token ID607
Feature activation-3.62e-03

Pos logit contributions
lf+0.153
opus+0.144
mber+0.142
 importantly+0.132
izing+0.131
Neg logit contributions
 names-0.128
 name-0.102
 meet-0.101
 sneak-0.101
ws-0.100
Token names
Token ID3891
Feature activation-6.41e-01

Pos logit contributions
 names+0.008
ws+0.007
 sneak+0.007
 meet+0.007
 name+0.007
Neg logit contributions
lf-0.009
opus-0.008
mber-0.008
 importantly-0.008
izing-0.008
Token.
Token ID13
Feature activation-9.96e-02

Pos logit contributions
 names+1.378
 name+1.101
 meet+1.072
 sneak+1.059
 try+1.052
Neg logit contributions
lf-1.680
opus-1.575
mber-1.573
 importantly-1.485
ths-1.457
Token
Token ID198
Feature activation-1.19e-02

Pos logit contributions
 names+0.200
 name+0.159
 meet+0.153
 sneak+0.149
 try+0.147
Neg logit contributions
lf-0.289
opus-0.269
mber-0.262
izing-0.251
ths-0.248
Token
Token ID198
Feature activation-1.23e-02

Pos logit contributions
 names+0.022
 meet+0.017
 name+0.017
 find+0.016
 play+0.016
Neg logit contributions
lf-0.035
opus-0.034
mber-0.033
izing-0.033
 importantly-0.032
TokenBen
Token ID11696
Feature activation+1.22e-01

Pos logit contributions
 names+0.028
 sneak+0.025
 meet+0.025
 play+0.024
 take+0.024
Neg logit contributions
ement-0.028
lf-0.027
opus-0.027
mber-0.026
 importantly-0.026
Token and
Token ID290
Feature activation+1.71e-02

Pos logit contributions
lf+0.321
opus+0.300
mber+0.295
 importantly+0.288
izing+0.281
Neg logit contributions
 names-0.267
 sneak-0.217
 meet-0.213
 name-0.213
 find-0.208
Token the
Token ID262
Feature activation+2.35e-02

Pos logit contributions
 names+0.392
 name+0.308
 sneak+0.307
ws+0.305
 meet+0.302
Neg logit contributions
lf-0.467
opus-0.444
mber-0.436
 importantly-0.413
izing-0.400
Token words
Token ID2456
Feature activation-1.82e-01

Pos logit contributions
lf+0.053
opus+0.048
mber+0.047
 importantly+0.043
izing+0.042
Neg logit contributions
 names-0.058
ws-0.051
 sneak-0.050
 meet-0.049
 name-0.047
Token and
Token ID290
Feature activation+6.69e-03

Pos logit contributions
 names+0.414
 name+0.334
 sneak+0.333
 meet+0.327
ws+0.327
Neg logit contributions
lf-0.460
opus-0.433
mber-0.428
 importantly-0.402
izing-0.390
Token learn
Token ID2193
Feature activation-4.28e-01

Pos logit contributions
lf+0.019
mber+0.017
opus+0.017
 importantly+0.017
ths+0.016
Neg logit contributions
 names-0.013
ws-0.010
 name-0.010
 sneak-0.010
 meet-0.010
Token the
Token ID262
Feature activation+2.50e-02

Pos logit contributions
 names+1.001
 name+0.807
 sneak+0.804
ws+0.800
 meet+0.793
Neg logit contributions
lf-1.071
opus-0.992
mber-0.981
 importantly-0.918
izing-0.900
Token names
Token ID3891
Feature activation-6.41e-01

Pos logit contributions
lf+0.065
mber+0.060
opus+0.059
 importantly+0.055
izing+0.054
Neg logit contributions
 names-0.050
ws-0.045
 sneak-0.043
 meet-0.042
 find-0.040
Token of
Token ID286
Feature activation+1.83e-01

Pos logit contributions
 names+1.374
 name+1.081
 sneak+1.071
ws+1.062
 meet+1.061
Neg logit contributions
lf-1.701
opus-1.609
mber-1.601
 importantly-1.497
izing-1.469
Token the
Token ID262
Feature activation+2.81e-02

Pos logit contributions
lf+0.466
opus+0.426
mber+0.424
 importantly+0.401
izing+0.393
Neg logit contributions
 names-0.414
ws-0.338
 sneak-0.336
 name-0.334
 meet-0.330
Token animals
Token ID4695
Feature activation-4.08e-02

Pos logit contributions
lf+0.073
opus+0.067
mber+0.066
 importantly+0.062
izing+0.061
Neg logit contributions
 names-0.062
ws-0.051
 sneak-0.050
 name-0.050
 meet-0.049
Token.
Token ID13
Feature activation-1.04e-01

Pos logit contributions
 names+0.081
ws+0.063
 sneak+0.063
 name+0.063
 meet+0.062
Neg logit contributions
lf-0.115
opus-0.107
mber-0.106
 importantly-0.101
izing-0.099
Token
Token ID198
Feature activation-1.97e-02

Pos logit contributions
 names+0.171
 name+0.125
 sneak+0.124
ws+0.123
 meet+0.123
Neg logit contributions
lf-0.332
opus-0.311
mber-0.309
 importantly-0.294
izing-0.291
Token in
Token ID287
Feature activation+3.48e-04

Pos logit contributions
lf+0.034
mber+0.031
opus+0.031
 importantly+0.028
ths+0.028
Neg logit contributions
 names-0.035
 name-0.029
 meet-0.028
 sneak-0.028
 play-0.028
Token her
Token ID607
Feature activation-3.34e-03

Pos logit contributions
lf+0.001
opus+0.001
mber+0.001
 importantly+0.001
izing+0.001
Neg logit contributions
 names-0.001
 sneak-0.001
 name-0.001
ws-0.001
 meet-0.001
Token nest
Token ID16343
Feature activation+5.36e-02

Pos logit contributions
 names+0.007
ws+0.005
 sneak+0.005
 name+0.005
 meet+0.005
Neg logit contributions
lf-0.009
opus-0.009
mber-0.009
 importantly-0.008
izing-0.008
Token.
Token ID13
Feature activation-1.04e-01

Pos logit contributions
lf+0.144
opus+0.137
mber+0.134
 importantly+0.126
ths+0.123
Neg logit contributions
 names-0.114
 name-0.090
 sneak-0.089
ws-0.089
 meet-0.088
Token Their
Token ID5334
Feature activation-1.77e-01

Pos logit contributions
 names+0.218
 name+0.174
 meet+0.170
 sneak+0.167
 try+0.165
Neg logit contributions
lf-0.294
opus-0.270
mber-0.265
izing-0.251
 importantly-0.250
Token names
Token ID3891
Feature activation-6.41e-01

Pos logit contributions
 names+0.412
ws+0.344
 sneak+0.340
 name+0.334
 meet+0.333
Neg logit contributions
lf-0.438
opus-0.400
mber-0.393
 importantly-0.374
ths-0.367
Token were
Token ID547
Feature activation-5.88e-02

Pos logit contributions
 names+1.227
 meet+0.940
 name+0.938
 play+0.931
 go+0.924
Neg logit contributions
lf-1.823
opus-1.763
mber-1.734
 importantly-1.615
izing-1.613
Token Tim
Token ID5045
Feature activation+4.40e-02

Pos logit contributions
 names+0.145
 name+0.116
 meet+0.113
 play+0.106
 sneak+0.105
Neg logit contributions
lf-0.145
opus-0.138
mber-0.130
izing-0.130
ement-0.123
Token,
Token ID11
Feature activation+2.62e-01

Pos logit contributions
lf+0.127
opus+0.119
mber+0.118
 importantly+0.112
izing+0.110
Neg logit contributions
 names-0.086
 name-0.067
 sneak-0.066
ws-0.065
 meet-0.065
Token Kim
Token ID6502
Feature activation+5.28e-03

Pos logit contributions
lf+0.702
opus+0.690
mber+0.655
izing+0.625
ths+0.621
Neg logit contributions
 names-0.558
 name-0.452
 play-0.435
 sneak-0.432
 meet-0.425
Token,
Token ID11
Feature activation+2.63e-01

Pos logit contributions
lf+0.014
opus+0.014
mber+0.014
 importantly+0.013
izing+0.013
Neg logit contributions
 names-0.011
 name-0.009
 meet-0.009
 sneak-0.009
 find-0.008
Token animals
Token ID4695
Feature activation-3.83e-02

Pos logit contributions
 names+0.088
 name+0.068
 sneak+0.068
 meet+0.067
ws+0.067
Neg logit contributions
lf-0.121
opus-0.114
mber-0.113
 importantly-0.107
izing-0.105
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
 names+0.071
 name+0.054
 meet+0.052
 sneak+0.052
 find+0.051
Neg logit contributions
lf-0.115
opus-0.108
mber-0.107
ths-0.101
 importantly-0.101
Token He
Token ID679
Feature activation-6.45e-02

Pos logit contributions
 names+0.176
ws+0.134
 sneak+0.133
 name+0.129
 meet+0.127
Neg logit contributions
lf-0.312
mber-0.292
opus-0.292
 importantly-0.278
izing-0.268
Token knew
Token ID2993
Feature activation+1.16e-01

Pos logit contributions
 names+0.141
 sneak+0.113
 name+0.112
ws+0.111
 meet+0.111
Neg logit contributions
lf-0.169
opus-0.158
mber-0.156
 importantly-0.148
izing-0.146
Token the
Token ID262
Feature activation+3.10e-02

Pos logit contributions
lf+0.219
opus+0.217
mber+0.206
izing+0.193
ement+0.187
Neg logit contributions
 names-0.329
 try-0.281
 name-0.279
 find-0.278
 sneak-0.275
Token names
Token ID3891
Feature activation-6.41e-01

Pos logit contributions
lf+0.079
opus+0.073
mber+0.073
 importantly+0.066
very+0.065
Neg logit contributions
 names-0.066
ws-0.058
 sneak-0.054
 meet-0.053
 name-0.052
Token of
Token ID286
Feature activation+1.90e-01

Pos logit contributions
 names+1.635
 meet+1.341
 name+1.339
 sneak+1.330
 play+1.327
Neg logit contributions
lf-1.378
opus-1.325
mber-1.316
izing-1.206
 importantly-1.203
Token many
Token ID867
Feature activation+4.28e-03

Pos logit contributions
lf+0.486
opus+0.444
mber+0.442
 importantly+0.418
izing+0.409
Neg logit contributions
 names-0.427
ws-0.346
 sneak-0.345
 name-0.344
 meet-0.338
Token animals
Token ID4695
Feature activation-3.81e-02

Pos logit contributions
lf+0.011
opus+0.011
mber+0.011
ths+0.010
 importantly+0.010
Neg logit contributions
 names-0.009
 sneak-0.007
 name-0.007
 meet-0.007
ws-0.007
Token and
Token ID290
Feature activation+1.51e-02

Pos logit contributions
 names+0.091
 name+0.074
 sneak+0.074
ws+0.073
 meet+0.073
Neg logit contributions
lf-0.093
opus-0.085
mber-0.085
 importantly-0.079
izing-0.078
Token the
Token ID262
Feature activation+3.27e-02

Pos logit contributions
lf+0.037
mber+0.033
opus+0.033
 importantly+0.031
izing+0.030
Neg logit contributions
 names-0.036
ws-0.030
 name-0.029
 sneak-0.029
 meet-0.029
Token trouble
Token ID5876
Feature activation-4.70e-02

Pos logit contributions
lf+0.010
mber+0.009
opus+0.009
 importantly+0.008
izing+0.008
Neg logit contributions
 names-0.009
ws-0.007
 sneak-0.007
 name-0.007
 meet-0.007
Token.
Token ID13
Feature activation-9.43e-02

Pos logit contributions
 names+0.086
 sneak+0.066
 name+0.066
 meet+0.065
ws+0.064
Neg logit contributions
lf-0.139
opus-0.132
mber-0.132
 importantly-0.124
ths-0.121
Token
Token ID198
Feature activation-5.39e-03

Pos logit contributions
 names+0.183
 name+0.150
 try+0.141
 meet+0.136
 find+0.136
Neg logit contributions
lf-0.285
opus-0.259
mber-0.247
izing-0.245
ths-0.241
Token
Token ID198
Feature activation-5.53e-03

Pos logit contributions
 names+0.008
 name+0.007
 meet+0.006
 try+0.006
 find+0.006
Neg logit contributions
lf-0.017
opus-0.016
izing-0.016
ths-0.015
mber-0.015
TokenM
Token ID44
Feature activation+2.58e-01

Pos logit contributions
 names+0.013
 try+0.011
 name+0.011
 take+0.011
 play+0.011
Neg logit contributions
lf-0.013
ement-0.011
opus-0.011
izing-0.011
mber-0.010
Tokenummy
Token ID13513
Feature activation+4.12e-01

Pos logit contributions
lf+0.385
mber+0.316
opus+0.301
izing+0.293
 importantly+0.274
Neg logit contributions
 names-0.855
 name-0.763
 try-0.724
 sneak-0.723
 play-0.708
Token:
Token ID25
Feature activation+2.23e-01

Pos logit contributions
lf+0.992
opus+0.943
izing+0.939
 importantly+0.929
mber+0.900
Neg logit contributions
 names-0.967
 sneak-0.814
 name-0.801
 try-0.779
 meet-0.774
Token "
Token ID366
Feature activation+1.22e-01

Pos logit contributions
lf+0.539
opus+0.504
mber+0.492
 importantly+0.486
izing+0.476
Neg logit contributions
 names-0.527
 name-0.455
 meet-0.438
 try-0.432
 sneak-0.423
TokenDid
Token ID11633
Feature activation+1.96e-01

Pos logit contributions
lf+0.308
opus+0.279
mber+0.269
 importantly+0.255
ement+0.255
Neg logit contributions
 names-0.286
 sneak-0.237
 meet-0.231
 name-0.229
 try-0.228
Token you
Token ID345
Feature activation+1.47e-01

Pos logit contributions
lf+0.455
mber+0.432
opus+0.425
ement+0.393
 importantly+0.392
Neg logit contributions
 names-0.491
 sneak-0.409
 try-0.406
 meet-0.399
 name-0.397
Token drop
Token ID4268
Feature activation-2.60e-01

Pos logit contributions
lf+0.465
opus+0.425
mber+0.413
 importantly+0.406
 unfortunately+0.398
Neg logit contributions
 names-0.224
ws-0.162
ashes-0.136
 name-0.126
 meet-0.123
Token could
Token ID714
Feature activation+3.65e-02

Pos logit contributions
lf+1.148
opus+1.048
mber+1.038
 importantly+0.977
izing+0.940
Neg logit contributions
 names-1.129
ws-0.916
 sneak-0.911
 name-0.908
 meet-0.897
Token wear
Token ID5806
Feature activation-2.39e-01

Pos logit contributions
lf+0.116
opus+0.103
mber+0.097
 importantly+0.097
izing+0.093
Neg logit contributions
 names-0.060
ws-0.049
ashes-0.041
 sneak-0.039
 meet-0.038
Token proudly
Token ID23017
Feature activation+1.76e-01

Pos logit contributions
 names+0.574
 name+0.469
 sneak+0.467
ws+0.465
 meet+0.463
Neg logit contributions
lf-0.577
opus-0.532
mber-0.527
 importantly-0.494
izing-0.483
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
lf+0.450
opus+0.404
mber+0.403
 importantly+0.389
izing+0.376
Neg logit contributions
 names-0.385
 sneak-0.314
ws-0.312
 meet-0.300
 name-0.298
Token And
Token ID843
Feature activation-7.23e-02

Pos logit contributions
 names+0.229
 sneak+0.189
ws+0.187
 meet+0.181
 name+0.180
Neg logit contributions
lf-0.259
mber-0.242
opus-0.240
 importantly-0.230
izing-0.219
Token she
Token ID673
Feature activation+4.70e-01

Pos logit contributions
 names+0.193
 name+0.166
 try+0.164
 find+0.163
 meet+0.161
Neg logit contributions
lf-0.162
opus-0.158
mber-0.145
ement-0.130
izing-0.129
Token had
Token ID550
Feature activation+3.54e-02

Pos logit contributions
lf+1.166
opus+1.074
mber+1.065
 importantly+1.001
izing+0.980
Neg logit contributions
 names-1.104
 sneak-0.895
 name-0.894
ws-0.892
 meet-0.884
Token her
Token ID607
Feature activation-1.43e-03

Pos logit contributions
lf+0.096
mber+0.089
opus+0.086
 importantly+0.085
ths+0.081
Neg logit contributions
 names-0.073
 sneak-0.060
ws-0.059
 name-0.058
 meet-0.057
Token mom
Token ID1995
Feature activation+7.79e-02

Pos logit contributions
 names+0.003
ws+0.003
 sneak+0.003
 name+0.003
 meet+0.003
Neg logit contributions
lf-0.004
opus-0.003
mber-0.003
 importantly-0.003
izing-0.003
Token,
Token ID11
Feature activation+2.64e-01

Pos logit contributions
lf+0.212
opus+0.196
mber+0.195
 importantly+0.185
izing+0.181
Neg logit contributions
 names-0.163
ws-0.129
 sneak-0.129
 name-0.128
 meet-0.126
Token who
Token ID508
Feature activation+8.23e-02

Pos logit contributions
lf+0.705
opus+0.661
mber+0.649
izing+0.612
 importantly+0.601
Neg logit contributions
 names-0.579
 name-0.470
 meet-0.461
 sneak-0.455
 try-0.453
Token She
Token ID1375
Feature activation-7.29e-02

Pos logit contributions
 names+0.210
 name+0.163
 sneak+0.160
 meet+0.160
 try+0.155
Neg logit contributions
lf-0.332
opus-0.313
mber-0.306
 importantly-0.292
izing-0.289
Token asked
Token ID1965
Feature activation+1.86e-02

Pos logit contributions
 names+0.144
 sneak+0.125
 find+0.123
 take+0.120
 name+0.119
Neg logit contributions
opus-0.192
lf-0.188
mber-0.184
izing-0.183
 importantly-0.179
Token Tim
Token ID5045
Feature activation+4.23e-02

Pos logit contributions
lf+0.058
opus+0.057
mber+0.056
 importantly+0.054
izing+0.053
Neg logit contributions
 names-0.031
 sneak-0.024
 find-0.023
 try-0.023
 play-0.023
Tokenmy
Token ID1820
Feature activation-9.86e-02

Pos logit contributions
lf+0.104
mber+0.096
opus+0.095
very+0.088
 importantly+0.087
Neg logit contributions
 names-0.094
 name-0.075
 sneak-0.075
ws-0.075
 meet-0.074
Token why
Token ID1521
Feature activation+9.41e-02

Pos logit contributions
 names+0.236
 name+0.194
 sneak+0.191
 find+0.191
 try+0.190
Neg logit contributions
lf-0.248
opus-0.230
mber-0.226
 importantly-0.214
izing-0.211
Token he
Token ID339
Feature activation+4.50e-01

Pos logit contributions
mber+0.152
opus+0.152
lf+0.151
ths+0.137
ement+0.134
Neg logit contributions
 names-0.278
 try-0.250
 play-0.249
 make-0.247
 take-0.246
Token lied
Token ID19837
Feature activation-3.14e-02

Pos logit contributions
lf+0.973
opus+0.856
mber+0.825
 importantly+0.769
ths+0.741
Neg logit contributions
 names-1.208
ws-1.010
 name-0.971
 sneak-0.971
ashes-0.958
Token to
Token ID284
Feature activation-2.19e-01

Pos logit contributions
 names+0.063
 name+0.050
 sneak+0.050
ws+0.049
 meet+0.048
Neg logit contributions
lf-0.088
opus-0.083
mber-0.082
 importantly-0.078
izing-0.076
Token Jimmy
Token ID12963
Feature activation+1.91e-01

Pos logit contributions
 names+0.462
ws+0.379
 name+0.364
 sneak+0.361
 meet+0.352
Neg logit contributions
lf-0.597
opus-0.533
mber-0.532
 importantly-0.504
izing-0.500
Token.
Token ID13
Feature activation-1.12e-01

Pos logit contributions
lf+0.480
opus+0.438
mber+0.433
izing+0.403
 importantly+0.402
Neg logit contributions
 names-0.435
ws-0.351
 name-0.346
 sneak-0.346
 meet-0.345
Token Tim
Token ID5045
Feature activation+4.31e-02

Pos logit contributions
 names+0.252
 name+0.208
 sneak+0.205
 meet+0.204
 play+0.203
Neg logit contributions
lf-0.281
opus-0.265
mber-0.258
 importantly-0.249
izing-0.242
Token,
Token ID11
Feature activation+2.66e-01

Pos logit contributions
lf+0.638
opus+0.587
mber+0.582
 importantly+0.549
izing+0.538
Neg logit contributions
 names-0.534
ws-0.436
 sneak-0.432
 meet-0.432
 name-0.431
Token too
Token ID1165
Feature activation+5.63e-03

Pos logit contributions
lf+0.697
mber+0.634
opus+0.632
izing+0.603
ths+0.590
Neg logit contributions
 names-0.605
 name-0.496
 sneak-0.483
 meet-0.483
 try-0.478
Token.
Token ID13
Feature activation-9.65e-02

Pos logit contributions
lf+0.017
opus+0.016
mber+0.016
 importantly+0.015
izing+0.015
Neg logit contributions
 names-0.010
 name-0.008
 sneak-0.008
ws-0.008
 meet-0.008
Token Even
Token ID3412
Feature activation-1.42e-02

Pos logit contributions
 names+0.221
ws+0.188
 sneak+0.187
 find+0.179
 argue+0.176
Neg logit contributions
lf-0.228
mber-0.217
opus-0.214
 importantly-0.211
very-0.194
Token if
Token ID611
Feature activation+8.09e-02

Pos logit contributions
 names+0.036
 name+0.030
ws+0.029
 sneak+0.029
 try+0.029
Neg logit contributions
lf-0.034
opus-0.031
mber-0.031
ths-0.028
 importantly-0.028
Token he
Token ID339
Feature activation+4.57e-01

Pos logit contributions
lf+0.170
opus+0.154
mber+0.151
izing+0.142
ement+0.140
Neg logit contributions
 names-0.228
 sneak-0.191
 try-0.190
 name-0.189
 meet-0.185
Token is
Token ID318
Feature activation-1.73e-01

Pos logit contributions
opus+1.168
lf+1.134
mber+1.083
ement+1.002
ths+0.987
Neg logit contributions
 names-0.849
 meet-0.658
 sneak-0.645
 name-0.632
 argue-0.625
Token naughty
Token ID45128
Feature activation+1.75e-01

Pos logit contributions
 names+0.409
 sneak+0.332
 name+0.331
ws+0.331
 meet+0.328
Neg logit contributions
lf-0.427
opus-0.393
mber-0.390
 importantly-0.367
izing-0.359
Token sometimes
Token ID3360
Feature activation+4.41e-03

Pos logit contributions
lf+0.476
opus+0.442
mber+0.439
 importantly+0.414
izing+0.407
Neg logit contributions
 names-0.368
 name-0.289
 sneak-0.289
ws-0.288
 meet-0.285
Token.
Token ID13
Feature activation-9.57e-02

Pos logit contributions
lf+0.012
opus+0.011
mber+0.011
 importantly+0.011
izing+0.011
Neg logit contributions
 names-0.009
 sneak-0.007
ws-0.007
 name-0.007
 meet-0.007
Token Come
Token ID7911
Feature activation-1.98e-01

Pos logit contributions
 names+0.208
ws+0.175
 sneak+0.173
 find+0.166
 meet+0.162
Neg logit contributions
lf-0.234
mber-0.222
opus-0.221
 importantly-0.220
very-0.201
Token the
Token ID262
Feature activation+3.17e-02

Pos logit contributions
 names+0.368
 meet+0.292
 sneak+0.288
 name+0.287
 try+0.285
Neg logit contributions
lf-0.433
opus-0.425
mber-0.404
 importantly-0.376
izing-0.367
Token sky
Token ID6766
Feature activation+9.27e-02

Pos logit contributions
lf+0.081
opus+0.075
mber+0.074
 importantly+0.070
izing+0.069
Neg logit contributions
 names-0.072
 sneak-0.058
ws-0.058
 name-0.057
 meet-0.057
Token,
Token ID11
Feature activation+2.64e-01

Pos logit contributions
lf+0.219
opus+0.205
mber+0.201
 importantly+0.185
izing+0.181
Neg logit contributions
 names-0.228
 name-0.188
 sneak-0.188
 meet-0.186
 find-0.184
Token even
Token ID772
Feature activation-1.59e-02

Pos logit contributions
lf+0.685
opus+0.643
mber+0.625
izing+0.570
 importantly+0.566
Neg logit contributions
 names-0.603
 name-0.490
 meet-0.488
 sneak-0.485
 find-0.482
Token if
Token ID611
Feature activation+8.02e-02

Pos logit contributions
 names+0.045
 sneak+0.038
 name+0.038
ws+0.038
 meet+0.037
Neg logit contributions
lf-0.033
opus-0.030
mber-0.029
ement-0.026
 importantly-0.026
Token she
Token ID673
Feature activation+4.72e-01

Pos logit contributions
lf+0.159
opus+0.149
mber+0.137
ement+0.126
izing+0.124
Neg logit contributions
 names-0.237
 try-0.199
 sneak-0.198
 name-0.196
 meet-0.195
Token couldn
Token ID3521
Feature activation+1.72e-01

Pos logit contributions
lf+1.093
opus+1.004
mber+0.985
 importantly+0.923
ths+0.907
Neg logit contributions
 names-1.160
ws-0.947
 name-0.933
 sneak-0.933
 meet-0.929
Token't
Token ID470
Feature activation+1.30e-02

Pos logit contributions
lf+0.373
opus+0.343
mber+0.316
izing+0.297
ths+0.295
Neg logit contributions
 names-0.471
ws-0.399
 name-0.378
 sneak-0.375
 meet-0.366
Token bring
Token ID2222
Feature activation-3.22e-01

Pos logit contributions
lf+0.041
opus+0.038
izing+0.036
mber+0.035
ths+0.035
Neg logit contributions
 names-0.021
ws-0.017
 name-0.014
ashes-0.013
 sneak-0.013
Token a
Token ID257
Feature activation-1.50e-01

Pos logit contributions
 names+0.740
 sneak+0.618
ws+0.611
 meet+0.601
 name+0.600
Neg logit contributions
lf-0.785
mber-0.719
opus-0.715
 importantly-0.683
izing-0.668
Token piece
Token ID3704
Feature activation-5.33e-02

Pos logit contributions
 names+0.352
 sneak+0.286
ws+0.286
 name+0.284
 meet+0.282
Neg logit contributions
lf-0.373
opus-0.345
mber-0.342
 importantly-0.321
izing-0.314
Token red
Token ID2266
Feature activation-2.06e-02

Pos logit contributions
 names+0.085
 name+0.072
 try+0.069
 sneak+0.069
 play+0.068
Neg logit contributions
lf-0.090
mber-0.086
opus-0.085
 importantly-0.084
ths-0.081
Token box
Token ID3091
Feature activation+3.00e-02

Pos logit contributions
 names+0.047
ws+0.038
 sneak+0.038
 meet+0.038
 name+0.038
Neg logit contributions
lf-0.052
opus-0.048
mber-0.047
 importantly-0.044
izing-0.044
Token again
Token ID757
Feature activation+1.04e-01

Pos logit contributions
lf+0.079
opus+0.074
mber+0.074
 importantly+0.069
izing+0.067
Neg logit contributions
 names-0.065
 sneak-0.052
 name-0.052
 meet-0.051
ws-0.051
Token.
Token ID13
Feature activation-1.07e-01

Pos logit contributions
lf+0.288
opus+0.274
mber+0.269
 importantly+0.252
ths+0.247
Neg logit contributions
 names-0.214
 name-0.171
 sneak-0.169
 meet-0.169
 find-0.169
Token Tim
Token ID5045
Feature activation+4.35e-02

Pos logit contributions
 names+0.188
 name+0.148
 meet+0.142
 sneak+0.136
 try+0.135
Neg logit contributions
lf-0.328
opus-0.310
mber-0.299
izing-0.296
ths-0.292
Token remembered
Token ID12086
Feature activation+1.22e-01

Pos logit contributions
lf+0.116
mber+0.106
opus+0.106
 importantly+0.098
ths+0.096
Neg logit contributions
 names-0.095
ws-0.075
 name-0.075
 sneak-0.074
 meet-0.074
Token it
Token ID340
Feature activation+1.72e-01

Pos logit contributions
opus+0.301
lf+0.294
mber+0.282
ement+0.269
izing+0.267
Neg logit contributions
 names-0.282
 find-0.237
 play-0.235
 try-0.235
 sneak-0.233
Token from
Token ID422
Feature activation+2.81e-02

Pos logit contributions
lf+0.428
opus+0.398
mber+0.394
 importantly+0.367
izing+0.362
Neg logit contributions
 names-0.407
 name-0.331
 sneak-0.330
 meet-0.326
ws-0.326
Token before
Token ID878
Feature activation+1.56e-01

Pos logit contributions
lf+0.071
opus+0.064
mber+0.064
 importantly+0.061
izing+0.059
Neg logit contributions
 names-0.065
 sneak-0.053
 name-0.053
ws-0.052
 meet-0.052
Token.
Token ID13
Feature activation-1.09e-01

Pos logit contributions
opus+0.335
mber+0.330
lf+0.329
 importantly+0.290
izing+0.289
Neg logit contributions
 names-0.392
 play-0.344
 try-0.342
 meet-0.339
 find-0.337
Token He
Token ID679
Feature activation-7.25e-02

Pos logit contributions
 names+0.209
 name+0.163
 sneak+0.158
 meet+0.158
ws+0.156
Neg logit contributions
lf-0.319
opus-0.298
mber-0.294
 importantly-0.279
izing-0.277
Token
Token ID198
Feature activation-9.82e-03

Pos logit contributions
 names+0.196
ws+0.165
 sneak+0.162
 name+0.153
 meet+0.150
Neg logit contributions
lf-0.260
mber-0.238
opus-0.234
 importantly-0.227
ths-0.218
Token
Token ID198
Feature activation-9.63e-03

Pos logit contributions
 names+0.018
 meet+0.014
 name+0.014
 find+0.014
 play+0.013
Neg logit contributions
lf-0.029
opus-0.028
mber-0.027
izing-0.026
 importantly-0.026
TokenOne
Token ID3198
Feature activation-4.08e-03

Pos logit contributions
 names+0.020
 meet+0.016
 sneak+0.016
 name+0.015
 play+0.015
Neg logit contributions
opus-0.025
lf-0.025
mber-0.024
 importantly-0.023
izing-0.023
Token day
Token ID1110
Feature activation+9.61e-02

Pos logit contributions
 names+0.010
 name+0.008
 sneak+0.008
 meet+0.008
 find+0.008
Neg logit contributions
lf-0.010
opus-0.009
mber-0.009
 importantly-0.008
izing-0.008
Token,
Token ID11
Feature activation+2.71e-01

Pos logit contributions
opus+0.241
mber+0.221
lf+0.214
izing+0.207
ement+0.201
Neg logit contributions
 names-0.229
 look-0.193
 name-0.192
 play-0.191
 try-0.189
Token Maria
Token ID14200
Feature activation+2.41e-01

Pos logit contributions
opus+0.595
lf+0.548
mber+0.536
ement+0.488
 importantly+0.483
Neg logit contributions
 names-0.739
 name-0.622
 try-0.613
 play-0.612
 find-0.611
Token found
Token ID1043
Feature activation-1.32e-01

Pos logit contributions
lf+0.650
opus+0.598
mber+0.595
 importantly+0.559
izing+0.547
Neg logit contributions
 names-0.515
ws-0.407
 name-0.407
 sneak-0.403
 meet-0.398
Token an
Token ID281
Feature activation-1.44e-01

Pos logit contributions
 names+0.198
 name+0.140
 find+0.130
 meet+0.130
 try+0.128
Neg logit contributions
lf-0.447
opus-0.425
mber-0.418
izing-0.401
 importantly-0.400
Token old
Token ID1468
Feature activation+9.39e-02

Pos logit contributions
 names+0.346
 name+0.285
 sneak+0.283
ws+0.277
 meet+0.275
Neg logit contributions
lf-0.345
opus-0.323
mber-0.322
 importantly-0.300
izing-0.293
Token map
Token ID3975
Feature activation+5.43e-02

Pos logit contributions
lf+0.237
opus+0.225
mber+0.223
 importantly+0.216
ths+0.210
Neg logit contributions
 names-0.209
 name-0.174
 sneak-0.168
 find-0.166
 try-0.164
Token.
Token ID13
Feature activation-8.85e-02

Pos logit contributions
lf+0.144
mber+0.138
opus+0.137
 importantly+0.127
ths+0.126
Neg logit contributions
 names-0.113
 name-0.091
 sneak-0.090
 meet-0.089
 find-0.089
TokenSo
Token ID2396
Feature activation+1.71e-01

Pos logit contributions
 names+0.012
 meet+0.010
 sneak+0.010
 play+0.010
 try+0.010
Neg logit contributions
lf-0.011
opus-0.010
ement-0.010
 importantly-0.010
mber-0.010
Token they
Token ID484
Feature activation+3.60e-01

Pos logit contributions
lf+0.441
opus+0.438
mber+0.397
 importantly+0.389
ement+0.385
Neg logit contributions
 names-0.370
 try-0.336
 take-0.330
 go-0.327
 make-0.325
Token got
Token ID1392
Feature activation-3.74e-02

Pos logit contributions
lf+0.983
mber+0.897
opus+0.880
 importantly+0.828
ths+0.823
Neg logit contributions
 names-0.738
 name-0.585
ws-0.583
 meet-0.560
 sneak-0.560
Token in
Token ID287
Feature activation+5.35e-03

Pos logit contributions
 names+0.081
ws+0.067
 name+0.066
 sneak+0.065
 meet+0.064
Neg logit contributions
lf-0.096
mber-0.088
opus-0.087
 importantly-0.082
ths-0.082
Token the
Token ID262
Feature activation+3.29e-02

Pos logit contributions
lf+0.013
mber+0.012
opus+0.012
 importantly+0.011
ths+0.011
Neg logit contributions
 names-0.013
ws-0.010
 sneak-0.010
 name-0.010
 meet-0.010
Token man
Token ID582
Feature activation+2.74e-01

Pos logit contributions
lf+0.078
opus+0.072
mber+0.071
 importantly+0.066
izing+0.065
Neg logit contributions
 names-0.082
 sneak-0.067
 name-0.067
ws-0.067
 meet-0.066
Token's
Token ID338
Feature activation-1.28e-01

Pos logit contributions
lf+0.706
opus+0.642
mber+0.635
 importantly+0.604
ths+0.592
Neg logit contributions
 names-0.592
ws-0.487
 name-0.466
 sneak-0.454
 meet-0.445
Token car
Token ID1097
Feature activation-4.29e-03

Pos logit contributions
 names+0.308
ws+0.256
 sneak+0.250
 name+0.248
 meet+0.247
Neg logit contributions
lf-0.314
opus-0.282
mber-0.280
 importantly-0.265
ths-0.256
Token and
Token ID290
Feature activation+2.05e-02

Pos logit contributions
 names+0.010
 name+0.008
 sneak+0.008
ws+0.008
 meet+0.008
Neg logit contributions
lf-0.011
opus-0.010
mber-0.010
 importantly-0.009
izing-0.009
Token drove
Token ID10357
Feature activation-5.13e-02

Pos logit contributions
lf+0.056
opus+0.052
mber+0.052
 importantly+0.049
izing+0.049
Neg logit contributions
 names-0.043
 sneak-0.034
 name-0.034
 meet-0.034
ws-0.034
Token to
Token ID284
Feature activation-2.10e-01

Pos logit contributions
 names+0.099
ws+0.078
 sneak+0.077
 name+0.077
 meet+0.075
Neg logit contributions
lf-0.148
mber-0.136
opus-0.136
 importantly-0.128
izing-0.127
Tokenrobe
Token ID25481
Feature activation-5.81e-02

Pos logit contributions
 names+0.314
 name+0.254
 play+0.245
 sneak+0.244
 find+0.243
Neg logit contributions
lf-0.357
opus-0.344
mber-0.339
 importantly-0.324
izing-0.318
Token and
Token ID290
Feature activation+4.96e-03

Pos logit contributions
 names+0.113
 play+0.093
 find+0.093
 name+0.092
 look+0.091
Neg logit contributions
lf-0.157
opus-0.152
mber-0.148
 importantly-0.147
izing-0.140
Token gave
Token ID2921
Feature activation+2.28e-02

Pos logit contributions
opus+0.012
lf+0.012
izing+0.012
ement+0.012
mber+0.011
Neg logit contributions
 names-0.011
 play-0.010
 take-0.010
 sneak-0.009
 find-0.009
Token Molly
Token ID30236
Feature activation+1.73e-01

Pos logit contributions
lf+0.048
opus+0.044
mber+0.043
 importantly+0.040
izing+0.039
Neg logit contributions
 names-0.062
 sneak-0.052
 name-0.052
ws-0.052
 meet-0.051
Token a
Token ID257
Feature activation-1.56e-01

Pos logit contributions
lf+0.459
mber+0.438
opus+0.421
 importantly+0.399
very+0.392
Neg logit contributions
 names-0.339
ws-0.282
 sneak-0.272
 meet-0.266
 name-0.266
Token loving
Token ID14442
Feature activation+1.12e-01

Pos logit contributions
 names+0.301
ws+0.243
 sneak+0.235
 meet+0.228
 name+0.221
Neg logit contributions
lf-0.444
mber-0.423
opus-0.421
 importantly-0.393
very-0.383
Token pat
Token ID1458
Feature activation-3.19e-01

Pos logit contributions
lf+0.346
mber+0.333
opus+0.317
 importantly+0.312
very+0.303
Neg logit contributions
 names-0.170
ws-0.144
 sneak-0.130
 meet-0.120
ashes-0.115
Token on
Token ID319
Feature activation+4.82e-02

Pos logit contributions
 names+0.677
 sneak+0.536
ws+0.535
 name+0.532
 meet+0.527
Neg logit contributions
lf-0.860
opus-0.798
mber-0.792
 importantly-0.747
izing-0.735
Token the
Token ID262
Feature activation+2.63e-02

Pos logit contributions
lf+0.122
opus+0.113
mber+0.112
 importantly+0.106
izing+0.103
Neg logit contributions
 names-0.111
 name-0.090
 sneak-0.090
ws-0.089
 meet-0.089
Token head
Token ID1182
Feature activation-2.82e-01

Pos logit contributions
lf+0.069
opus+0.064
mber+0.062
 importantly+0.058
izing+0.057
Neg logit contributions
 names-0.056
ws-0.049
 sneak-0.044
 meet-0.044
 name-0.042
Token.
Token ID13
Feature activation-1.15e-01

Pos logit contributions
 names+0.536
 name+0.414
 sneak+0.412
ws+0.404
 meet+0.404
Neg logit contributions
lf-0.824
opus-0.771
mber-0.766
 importantly-0.728
izing-0.714
Token mom
Token ID1995
Feature activation+7.75e-02

Pos logit contributions
 names+0.006
 name+0.005
 play+0.004
 find+0.004
 meet+0.004
Neg logit contributions
lf-0.007
mber-0.007
opus-0.006
izing-0.006
 importantly-0.006
Tokenmy
Token ID1820
Feature activation-9.97e-02

Pos logit contributions
lf+0.209
opus+0.189
mber+0.188
 importantly+0.179
very+0.175
Neg logit contributions
 names-0.165
ws-0.134
 sneak-0.131
 name-0.130
 meet-0.126
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
 names+0.213
ws+0.169
 sneak+0.168
 name+0.168
 meet+0.165
Neg logit contributions
lf-0.270
opus-0.248
mber-0.246
 importantly-0.232
izing-0.229
Token saw
Token ID2497
Feature activation-6.56e-03

Pos logit contributions
opus+0.030
ement+0.030
lf+0.029
 importantly+0.029
izing+0.028
Neg logit contributions
 names-0.028
 find-0.025
 play-0.024
 meet-0.024
 sneak-0.024
Token a
Token ID257
Feature activation-1.55e-01

Pos logit contributions
 names+0.011
 name+0.008
 sneak+0.008
ws+0.008
 meet+0.008
Neg logit contributions
lf-0.021
opus-0.020
mber-0.020
 importantly-0.019
izing-0.019
Token boy
Token ID2933
Feature activation+8.21e-02

Pos logit contributions
 names+0.300
 name+0.262
 play+0.245
 sneak+0.241
 try+0.241
Neg logit contributions
lf-0.421
 importantly-0.394
ement-0.393
ths-0.390
izing-0.387
Token playing
Token ID2712
Feature activation+5.55e-02

Pos logit contributions
lf+0.189
opus+0.172
mber+0.168
 importantly+0.159
izing+0.155
Neg logit contributions
 names-0.209
ws-0.172
 name-0.171
 sneak-0.171
 meet-0.169
Token with
Token ID351
Feature activation+5.41e-02

Pos logit contributions
lf+0.122
opus+0.106
mber+0.099
 importantly+0.098
izing+0.097
Neg logit contributions
 names-0.146
ws-0.125
 sneak-0.122
 meet-0.120
 name-0.119
Token a
Token ID257
Feature activation-1.54e-01

Pos logit contributions
lf+0.149
opus+0.133
mber+0.130
 importantly+0.129
izing+0.124
Neg logit contributions
 names-0.109
ws-0.089
 sneak-0.089
 meet-0.088
 name-0.087
Token toy
Token ID13373
Feature activation-7.81e-02

Pos logit contributions
 names+0.300
 name+0.240
 sneak+0.229
 play+0.227
 try+0.227
Neg logit contributions
lf-0.437
mber-0.407
opus-0.399
izing-0.392
 importantly-0.392
Token gun
Token ID2485
Feature activation+4.92e-03

Pos logit contributions
 names+0.174
ws+0.140
 name+0.139
 sneak+0.139
 meet+0.138
Neg logit contributions
lf-0.204
opus-0.188
mber-0.183
 importantly-0.175
izing-0.171
Token not
Token ID407
Feature activation-9.02e-03

Pos logit contributions
 names+0.413
 name+0.333
 sneak+0.332
ws+0.330
 meet+0.325
Neg logit contributions
lf-0.459
opus-0.428
mber-0.421
 importantly-0.394
izing-0.386
Token y
Token ID331
Feature activation+6.84e-02

Pos logit contributions
 names+0.021
 sneak+0.017
 meet+0.017
 name+0.017
ws+0.017
Neg logit contributions
lf-0.023
mber-0.021
opus-0.021
 importantly-0.020
izing-0.019
Tokenummy
Token ID13513
Feature activation+4.18e-01

Pos logit contributions
lf+0.080
opus+0.071
mber+0.066
izing+0.059
 importantly+0.057
Neg logit contributions
 names-0.248
 meet-0.219
 name-0.217
 sneak-0.217
ws-0.217
Token!
Token ID0
Feature activation+1.31e-01

Pos logit contributions
lf+1.074
opus+0.992
mber+0.988
 importantly+0.935
ths+0.903
Neg logit contributions
 names-0.950
 sneak-0.762
 name-0.760
ws-0.757
 meet-0.745
Token This
Token ID770
Feature activation-2.32e-01

Pos logit contributions
lf+0.343
opus+0.337
mber+0.331
 importantly+0.314
very+0.301
Neg logit contributions
 names-0.258
ws-0.214
 sneak-0.212
 meet-0.204
 find-0.203
Token is
Token ID318
Feature activation-1.79e-01

Pos logit contributions
 names+0.452
 sneak+0.356
ws+0.355
 name+0.350
 meet+0.350
Neg logit contributions
lf-0.655
opus-0.608
mber-0.605
 importantly-0.565
izing-0.564
Token sour
Token ID11348
Feature activation+1.59e-01

Pos logit contributions
 names+0.418
 sneak+0.339
 name+0.338
ws+0.338
 meet+0.337
Neg logit contributions
lf-0.441
opus-0.403
mber-0.403
 importantly-0.380
izing-0.373
Token and
Token ID290
Feature activation+1.87e-02

Pos logit contributions
lf+0.403
mber+0.370
opus+0.369
 importantly+0.350
ths+0.340
Neg logit contributions
 names-0.367
 name-0.296
 sneak-0.295
ws-0.292
 meet-0.286
Token bitter
Token ID12922
Feature activation+1.73e-01

Pos logit contributions
lf+0.042
opus+0.038
mber+0.038
 importantly+0.035
izing+0.034
Neg logit contributions
 names-0.048
 sneak-0.040
 name-0.040
ws-0.040
 meet-0.040
Token!"
Token ID2474
Feature activation-1.43e-02

Pos logit contributions
lf+0.445
opus+0.411
mber+0.408
 importantly+0.385
izing+0.376
Neg logit contributions
 names-0.390
 sneak-0.313
 name-0.313
ws-0.311
 meet-0.308
Token
Token ID198
Feature activation-1.06e-02

Pos logit contributions
 names+0.029
 name+0.023
 meet+0.023
 sneak+0.022
ws+0.022
Neg logit contributions
lf-0.041
opus-0.038
mber-0.038
 importantly-0.035
izing-0.035
Token on
Token ID319
Feature activation+4.89e-02

Pos logit contributions
 names+0.717
 name+0.536
 meet+0.486
 sneak+0.480
 play+0.478
Neg logit contributions
lf-1.171
opus-1.160
mber-1.153
 importantly-1.074
ths-1.063
Token paper
Token ID3348
Feature activation-7.32e-02

Pos logit contributions
lf+0.126
opus+0.114
mber+0.113
izing+0.106
 importantly+0.106
Neg logit contributions
 names-0.110
ws-0.090
 name-0.089
 sneak-0.089
 meet-0.088
Token with
Token ID351
Feature activation+5.05e-02

Pos logit contributions
 names+0.148
ws+0.119
 sneak+0.119
 meet+0.118
 name+0.116
Neg logit contributions
lf-0.204
opus-0.185
mber-0.183
 importantly-0.174
izing-0.174
Token her
Token ID607
Feature activation-3.97e-03

Pos logit contributions
lf+0.120
mber+0.103
opus+0.103
 importantly+0.101
izing+0.097
Neg logit contributions
 names-0.118
ws-0.104
 sneak-0.103
 meet-0.099
 name-0.097
Token cr
Token ID1067
Feature activation-5.22e-02

Pos logit contributions
 names+0.009
 name+0.007
 try+0.007
 play+0.007
 find+0.007
Neg logit contributions
lf-0.011
mber-0.010
opus-0.010
izing-0.009
 importantly-0.009
Tokenay
Token ID323
Feature activation+2.77e-02

Pos logit contributions
 names+0.138
 sneak+0.122
 meet+0.118
ws+0.118
 find+0.115
Neg logit contributions
lf-0.113
opus-0.101
mber-0.098
 importantly-0.092
ement-0.090
Tokenons
Token ID684
Feature activation+3.76e-01

Pos logit contributions
lf+0.036
mber+0.032
opus+0.032
 importantly+0.028
izing+0.028
Neg logit contributions
 names-0.098
 name-0.085
ws-0.084
 meet-0.083
 try-0.082
Token.
Token ID13
Feature activation-1.10e-01

Pos logit contributions
lf+1.063
opus+0.981
mber+0.969
 importantly+0.931
izing+0.918
Neg logit contributions
 names-0.740
 sneak-0.591
ws-0.586
 meet-0.575
 name-0.571
Token One
Token ID1881
Feature activation-2.54e-01

Pos logit contributions
 names+0.188
ws+0.162
 sneak+0.157
 meet+0.142
 respect+0.140
Neg logit contributions
lf-0.323
mber-0.298
opus-0.295
 importantly-0.282
ement-0.279
Token day
Token ID1110
Feature activation+9.30e-02

Pos logit contributions
 names+0.578
 name+0.480
 play+0.470
 meet+0.469
 sneak+0.465
Neg logit contributions
lf-0.635
opus-0.599
mber-0.589
izing-0.565
 importantly-0.564
Token,
Token ID11
Feature activation+2.60e-01

Pos logit contributions
lf+0.225
mber+0.212
opus+0.209
ths+0.204
izing+0.202
Neg logit contributions
 names-0.220
 play-0.193
 try-0.191
 name-0.189
 go-0.187
Token then
Token ID788
Feature activation+2.86e-01

Pos logit contributions
lf+0.048
opus+0.047
mber+0.046
 importantly+0.044
izing+0.044
Neg logit contributions
 names-0.035
 sneak-0.028
 name-0.027
 find-0.027
 play-0.027
Token some
Token ID617
Feature activation-1.23e-01

Pos logit contributions
lf+0.795
opus+0.751
mber+0.742
 importantly+0.714
izing+0.707
Neg logit contributions
 names-0.587
 sneak-0.466
 name-0.463
ws-0.454
 meet-0.452
Token makeup
Token ID16029
Feature activation-6.93e-03

Pos logit contributions
 names+0.273
 sneak+0.219
 name+0.219
ws+0.219
 meet+0.216
Neg logit contributions
lf-0.318
opus-0.294
mber-0.292
 importantly-0.276
izing-0.269
Token.
Token ID13
Feature activation-9.78e-02

Pos logit contributions
 names+0.015
ws+0.012
 sneak+0.012
 meet+0.012
 name+0.012
Neg logit contributions
lf-0.019
opus-0.017
mber-0.017
 importantly-0.016
izing-0.016
Token She
Token ID1375
Feature activation-7.39e-02

Pos logit contributions
 names+0.206
 name+0.165
 meet+0.161
 sneak+0.160
ws+0.157
Neg logit contributions
lf-0.270
opus-0.250
mber-0.246
 importantly-0.233
izing-0.231
Token wanted
Token ID2227
Feature activation-1.06e-01

Pos logit contributions
 names+0.157
 sneak+0.125
 name+0.124
ws+0.123
 meet+0.122
Neg logit contributions
lf-0.197
opus-0.186
mber-0.185
 importantly-0.176
izing-0.175
Token to
Token ID284
Feature activation-2.14e-01

Pos logit contributions
 names+0.169
ws+0.126
 sneak+0.124
 name+0.123
 meet+0.120
Neg logit contributions
lf-0.344
opus-0.320
mber-0.320
 importantly-0.306
ths-0.299
Token make
Token ID787
Feature activation-4.60e-01

Pos logit contributions
 names+0.364
ws+0.305
ashes+0.267
 sneak+0.216
 name+0.214
Neg logit contributions
lf-0.707
opus-0.592
mber-0.569
izing-0.538
 importantly-0.537
Token sure
Token ID1654
Feature activation+6.06e-02

Pos logit contributions
 names+1.004
 name+0.801
 sneak+0.788
ws+0.781
 meet+0.780
Neg logit contributions
lf-1.224
opus-1.146
mber-1.135
 importantly-1.066
izing-1.045
Token she
Token ID673
Feature activation+4.70e-01

Pos logit contributions
lf+0.127
opus+0.122
mber+0.119
izing+0.107
 importantly+0.107
Neg logit contributions
 names-0.166
 try-0.140
 find-0.140
 name-0.139
 sneak-0.139
Token looked
Token ID3114
Feature activation+7.46e-03

Pos logit contributions
lf+1.047
opus+0.936
mber+0.855
 importantly+0.815
ths+0.811
Neg logit contributions
 names-1.208
ws-1.020
 meet-1.004
 name-0.985
 respect-0.968
Token Jane
Token ID12091
Feature activation+2.06e-01

Pos logit contributions
 names+0.373
 try+0.324
 sneak+0.323
 find+0.322
 play+0.321
Neg logit contributions
lf-0.317
opus-0.316
mber-0.309
 importantly-0.286
ement-0.284
Token quickly
Token ID2952
Feature activation+1.73e-01

Pos logit contributions
lf+0.516
opus+0.476
mber+0.474
 importantly+0.449
izing+0.439
Neg logit contributions
 names-0.477
 sneak-0.391
 name-0.389
ws-0.386
 meet-0.383
Token became
Token ID2627
Feature activation-1.26e-02

Pos logit contributions
lf+0.467
opus+0.436
mber+0.432
 importantly+0.409
izing+0.402
Neg logit contributions
 names-0.366
 sneak-0.292
 name-0.290
ws-0.287
 meet-0.285
Token troubled
Token ID17840
Feature activation+5.88e-02

Pos logit contributions
 names+0.029
 name+0.024
 sneak+0.024
ws+0.023
 meet+0.023
Neg logit contributions
lf-0.032
opus-0.030
mber-0.029
 importantly-0.027
izing-0.027
Token.
Token ID13
Feature activation-9.07e-02

Pos logit contributions
lf+0.163
opus+0.153
mber+0.152
 importantly+0.141
izing+0.138
Neg logit contributions
 names-0.121
 name-0.095
 sneak-0.094
ws-0.093
 meet-0.092
Token She
Token ID1375
Feature activation-6.67e-02

Pos logit contributions
 names+0.180
 name+0.141
 meet+0.138
 sneak+0.138
ws+0.135
Neg logit contributions
lf-0.259
opus-0.242
mber-0.238
 importantly-0.225
izing-0.224
Token didn
Token ID1422
Feature activation+1.14e-01

Pos logit contributions
 names+0.159
 sneak+0.130
 name+0.130
ws+0.129
 meet+0.128
Neg logit contributions
lf-0.163
opus-0.150
mber-0.148
 importantly-0.139
izing-0.136
Token't
Token ID470
Feature activation+1.41e-02

Pos logit contributions
lf+0.250
opus+0.227
mber+0.218
 importantly+0.205
izing+0.202
Neg logit contributions
 names-0.300
ws-0.254
 sneak-0.244
 name-0.241
 meet-0.241
Token know
Token ID760
Feature activation-4.34e-01

Pos logit contributions
lf+0.045
mber+0.043
 importantly+0.041
 she+0.040
very+0.039
Neg logit contributions
 names-0.021
ws-0.014
 sneak-0.012
 name-0.012
ashes-0.011
Token what
Token ID644
Feature activation+1.11e-01

Pos logit contributions
 names+0.875
 look+0.673
 respect+0.667
 make+0.665
 try+0.665
Neg logit contributions
opus-1.223
mber-1.222
lf-1.150
very-1.099
va-1.055
Token to
Token ID284
Feature activation-2.17e-01

Pos logit contributions
lf+0.355
opus+0.339
mber+0.334
izing+0.318
 importantly+0.316
Neg logit contributions
 names-0.187
 name-0.136
 meet-0.132
 sneak-0.131
 try-0.130
Token to
Token ID284
Feature activation-2.13e-01

Pos logit contributions
lf+0.126
opus+0.119
mber+0.117
 importantly+0.111
izing+0.108
Neg logit contributions
 names-0.093
 sneak-0.073
 name-0.072
 meet-0.072
ws-0.071
Token find
Token ID1064
Feature activation-4.68e-01

Pos logit contributions
 names+0.300
ws+0.224
ashes+0.199
 name+0.182
 sneak+0.158
Neg logit contributions
lf-0.723
mber-0.641
opus-0.623
 importantly-0.601
izing-0.599
Token a
Token ID257
Feature activation-1.53e-01

Pos logit contributions
 names+0.985
 name+0.768
 sneak+0.758
ws+0.750
 meet+0.750
Neg logit contributions
lf-1.307
opus-1.221
mber-1.193
 importantly-1.125
izing-1.112
Token muff
Token ID27563
Feature activation+6.38e-02

Pos logit contributions
 names+0.355
 name+0.290
 sneak+0.288
 meet+0.285
 try+0.280
Neg logit contributions
lf-0.384
opus-0.353
mber-0.349
 importantly-0.333
izing-0.327
Tokenin
Token ID259
Feature activation+4.17e-01

Pos logit contributions
lf+0.099
mber+0.082
opus+0.079
 importantly+0.076
izing+0.071
Neg logit contributions
 names-0.201
ws-0.176
 name-0.171
 sneak-0.170
 try-0.169
Token.
Token ID13
Feature activation-9.49e-02

Pos logit contributions
lf+1.115
opus+1.033
mber+1.026
 importantly+0.971
izing+0.952
Neg logit contributions
 names-0.901
 sneak-0.716
 name-0.715
ws-0.709
 meet-0.705
Token They
Token ID1119
Feature activation-3.26e-01

Pos logit contributions
 names+0.190
ws+0.161
 sneak+0.151
 meet+0.146
 name+0.144
Neg logit contributions
lf-0.253
mber-0.244
opus-0.237
 importantly-0.230
very-0.220
Token looked
Token ID3114
Feature activation+7.71e-03

Pos logit contributions
 names+0.713
ws+0.578
 name+0.569
 sneak+0.559
 meet+0.554
Neg logit contributions
lf-0.856
mber-0.777
opus-0.774
 importantly-0.728
ths-0.718
Token in
Token ID287
Feature activation+3.05e-03

Pos logit contributions
lf+0.023
opus+0.022
mber+0.021
 importantly+0.019
izing+0.019
Neg logit contributions
 names-0.015
 name-0.012
 sneak-0.012
 find-0.012
 meet-0.011
Token the
Token ID262
Feature activation+3.12e-02

Pos logit contributions
lf+0.008
opus+0.007
mber+0.007
 importantly+0.007
izing+0.007
Neg logit contributions
 names-0.007
 sneak-0.006
ws-0.006
 name-0.006
 meet-0.006
Token kitchen
Token ID9592
Feature activation+4.85e-02

Pos logit contributions
lf+0.079
opus+0.073
mber+0.072
 importantly+0.068
izing+0.066
Neg logit contributions
 names-0.073
 sneak-0.059
 name-0.059
ws-0.059
 meet-0.058
Token can
Token ID460
Feature activation-2.30e-01

Pos logit contributions
 names+0.035
ws+0.029
 sneak+0.028
ashes+0.027
 meet+0.025
Neg logit contributions
lf-0.069
opus-0.066
mber-0.065
 importantly-0.061
very-0.058
Token do
Token ID466
Feature activation-2.58e-01

Pos logit contributions
 names+0.381
ws+0.348
ashes+0.270
 sneak+0.258
 proper+0.254
Neg logit contributions
lf-0.700
opus-0.654
mber-0.605
izing-0.588
ths-0.586
Token this
Token ID428
Feature activation+2.36e-01

Pos logit contributions
 names+0.698
 name+0.582
 sneak+0.582
ws+0.578
 meet+0.576
Neg logit contributions
lf-0.551
opus-0.501
mber-0.497
 importantly-0.461
izing-0.449
Token.
Token ID13
Feature activation-1.05e-01

Pos logit contributions
lf+0.592
opus+0.542
mber+0.539
 importantly+0.504
izing+0.499
Neg logit contributions
 names-0.542
ws-0.443
 sneak-0.439
 name-0.436
 meet-0.433
Token We
Token ID775
Feature activation-5.10e-02

Pos logit contributions
 names+0.221
ws+0.198
 sneak+0.188
 find+0.183
 meet+0.173
Neg logit contributions
lf-0.244
opus-0.238
mber-0.237
 importantly-0.235
very-0.216
Token have
Token ID423
Feature activation-4.03e-01

Pos logit contributions
 names+0.082
ws+0.066
 sneak+0.063
 meet+0.059
ashes+0.058
Neg logit contributions
lf-0.135
opus-0.129
mber-0.122
 importantly-0.115
dd-0.113
Token the
Token ID262
Feature activation+2.43e-02

Pos logit contributions
 names+0.779
ws+0.621
 name+0.620
 sneak+0.617
 meet+0.600
Neg logit contributions
lf-1.134
mber-1.066
opus-1.048
 importantly-1.007
ths-0.979
Token will
Token ID481
Feature activation-3.36e-01

Pos logit contributions
lf+0.056
opus+0.052
mber+0.051
 importantly+0.048
izing+0.046
Neg logit contributions
 names-0.057
ws-0.051
 sneak-0.049
 meet-0.048
 name-0.046
Token to
Token ID284
Feature activation-2.17e-01

Pos logit contributions
 names+0.544
ws+0.422
 sneak+0.372
 name+0.361
ashes+0.346
Neg logit contributions
lf-1.060
mber-0.940
opus-0.939
 importantly-0.904
izing-0.901
Token win
Token ID1592
Feature activation-2.99e-01

Pos logit contributions
 names+0.308
ws+0.239
ashes+0.201
 name+0.188
 sneak+0.174
Neg logit contributions
lf-0.724
opus-0.627
mber-0.625
izing-0.616
 importantly-0.600
Token.
Token ID13
Feature activation-1.04e-01

Pos logit contributions
 names+0.646
 sneak+0.513
 name+0.512
ws+0.511
 meet+0.506
Neg logit contributions
lf-0.797
opus-0.738
mber-0.733
 importantly-0.692
izing-0.679
Token secret
Token ID3200
Feature activation-9.91e-02

Pos logit contributions
 names+0.102
 sneak+0.082
ws+0.082
 name+0.082
 meet+0.081
Neg logit contributions
lf-0.118
opus-0.109
mber-0.108
 importantly-0.101
izing-0.099
Token that
Token ID326
Feature activation+3.08e-01

Pos logit contributions
 names+0.213
 name+0.171
 sneak+0.168
 meet+0.165
ws+0.164
Neg logit contributions
lf-0.262
opus-0.247
mber-0.245
 importantly-0.229
izing-0.228
Token he
Token ID339
Feature activation+4.60e-01

Pos logit contributions
lf+0.700
opus+0.648
mber+0.643
 importantly+0.597
izing+0.589
Neg logit contributions
 names-0.784
 sneak-0.647
 name-0.647
 meet-0.641
ws-0.639
Token wanted
Token ID2227
Feature activation-1.02e-01

Pos logit contributions
lf+1.104
opus+0.975
mber+0.965
 importantly+0.907
ths+0.881
Neg logit contributions
 names-1.131
ws-0.928
 name-0.922
 meet-0.902
 sneak-0.897
Token to
Token ID284
Feature activation-2.10e-01

Pos logit contributions
 names+0.174
ws+0.137
 name+0.132
 sneak+0.131
 meet+0.127
Neg logit contributions
lf-0.315
mber-0.286
opus-0.285
 importantly-0.277
izing-0.269
Token share
Token ID2648
Feature activation-3.76e-01

Pos logit contributions
 names+0.326
ws+0.288
ashes+0.256
ksh+0.206
inters+0.195
Neg logit contributions
lf-0.689
opus-0.557
mber-0.554
ister-0.546
 she-0.532
Token with
Token ID351
Feature activation+5.89e-02

Pos logit contributions
 names+0.878
 sneak+0.714
ws+0.711
 name+0.710
 meet+0.704
Neg logit contributions
lf-0.938
opus-0.858
mber-0.852
 importantly-0.802
izing-0.787
Token someone
Token ID2130
Feature activation+8.93e-02

Pos logit contributions
lf+0.150
opus+0.141
mber+0.140
 importantly+0.131
izing+0.129
Neg logit contributions
 names-0.135
 name-0.108
 sneak-0.108
 meet-0.107
ws-0.106
Token.
Token ID13
Feature activation-9.56e-02

Pos logit contributions
lf+0.231
opus+0.215
mber+0.215
 importantly+0.199
izing+0.198
Neg logit contributions
 names-0.201
 name-0.162
 sneak-0.161
 meet-0.159
ws-0.159
Token One
Token ID1881
Feature activation-2.48e-01

Pos logit contributions
 names+0.211
ws+0.193
 sneak+0.179
 meet+0.179
 find+0.171
Neg logit contributions
lf-0.232
mber-0.205
opus-0.203
 importantly-0.198
ement-0.191
Token day
Token ID1110
Feature activation+9.60e-02

Pos logit contributions
 names+0.501
 name+0.429
 play+0.412
 find+0.400
 sneak+0.397
Neg logit contributions
lf-0.676
opus-0.641
izing-0.634
mber-0.627
ths-0.620
Token your
Token ID534
Feature activation+1.75e-03

Pos logit contributions
 names+0.165
 sneak+0.133
ws+0.132
 meet+0.126
 name+0.126
Neg logit contributions
lf-0.214
mber-0.195
 importantly-0.191
opus-0.191
izing-0.178
Token imagination
Token ID13843
Feature activation-2.22e-01

Pos logit contributions
lf+0.005
opus+0.004
mber+0.004
 importantly+0.004
ement+0.004
Neg logit contributions
 names-0.003
ws-0.003
 sneak-0.003
 meet-0.003
ashes-0.002
Token.
Token ID13
Feature activation-9.96e-02

Pos logit contributions
 names+0.464
ws+0.366
 name+0.365
 sneak+0.365
 meet+0.358
Neg logit contributions
lf-0.609
opus-0.560
mber-0.556
 importantly-0.524
izing-0.518
Token Go
Token ID1514
Feature activation-1.78e-01

Pos logit contributions
 names+0.217
ws+0.188
 sneak+0.185
 argue+0.179
 find+0.176
Neg logit contributions
 importantly-0.223
lf-0.220
mber-0.216
opus-0.213
very-0.199
Token to
Token ID284
Feature activation-2.17e-01

Pos logit contributions
 names+0.350
ws+0.278
ashes+0.246
 name+0.239
 argue+0.225
Neg logit contributions
lf-0.454
mber-0.435
opus-0.428
very-0.420
ths-0.381
Token sleep
Token ID3993
Feature activation-3.56e-01

Pos logit contributions
 names+0.566
ws+0.469
 name+0.461
ashes+0.459
 sneak+0.439
Neg logit contributions
lf-0.460
mber-0.383
 importantly-0.366
opus-0.362
ths-0.343
Token and
Token ID290
Feature activation+1.50e-02

Pos logit contributions
 names+0.777
 name+0.632
 sneak+0.631
 meet+0.621
ws+0.617
Neg logit contributions
lf-0.911
opus-0.869
mber-0.858
 importantly-0.815
ths-0.790
Token have
Token ID423
Feature activation-4.04e-01

Pos logit contributions
lf+0.049
mber+0.045
opus+0.044
 importantly+0.043
ths+0.041
Neg logit contributions
 names-0.021
ws-0.016
ashes-0.013
 name-0.013
 sneak-0.012
Token sweet
Token ID6029
Feature activation+1.50e-01

Pos logit contributions
 names+0.895
 name+0.748
 sneak+0.743
ws+0.742
 meet+0.706
Neg logit contributions
lf-1.012
mber-0.928
opus-0.892
 importantly-0.890
ths-0.848
Token dreams
Token ID10625
Feature activation-9.49e-02

Pos logit contributions
lf+0.384
mber+0.351
opus+0.350
 importantly+0.337
ement+0.319
Neg logit contributions
 names-0.317
 sneak-0.270
ws-0.268
 meet-0.260
 name-0.254
Token."
Token ID526
Feature activation-2.16e-01

Pos logit contributions
 names+0.168
ws+0.127
 sneak+0.126
 name+0.125
 meet+0.123
Neg logit contributions
lf-0.290
opus-0.269
mber-0.268
 importantly-0.255
izing-0.251
Token say
Token ID910
Feature activation-3.31e-01

Pos logit contributions
 names+0.018
ws+0.014
 name+0.012
 gra+0.012
ashes+0.012
Neg logit contributions
lf-0.033
mber-0.029
ister-0.028
very-0.027
 she-0.027
Token they
Token ID484
Feature activation+3.54e-01

Pos logit contributions
 names+0.848
 name+0.706
 try+0.690
 meet+0.680
 sneak+0.673
Neg logit contributions
lf-0.782
mber-0.710
opus-0.708
izing-0.657
 importantly-0.655
Token loved
Token ID6151
Feature activation-4.75e-02

Pos logit contributions
lf+0.585
opus+0.464
 importantly+0.455
 she+0.441
ths+0.435
Neg logit contributions
 names-0.949
ws-0.887
 name-0.849
ags-0.809
ashes-0.800
Token her
Token ID607
Feature activation+3.40e-04

Pos logit contributions
 names+0.109
ws+0.089
 name+0.089
 sneak+0.089
 meet+0.088
Neg logit contributions
lf-0.119
mber-0.107
opus-0.107
 importantly-0.102
izing-0.100
Token.
Token ID13
Feature activation-9.54e-02

Pos logit contributions
lf+0.001
opus+0.001
mber+0.001
 importantly+0.001
ement+0.001
Neg logit contributions
 names-0.001
ws-0.001
 sneak-0.001
 meet-0.001
 name-0.001
Token They
Token ID1119
Feature activation-3.28e-01

Pos logit contributions
 names+0.152
ws+0.113
 sneak+0.113
 meet+0.108
 name+0.108
Neg logit contributions
lf-0.306
mber-0.288
opus-0.288
 importantly-0.277
izing-0.269
Token sent
Token ID1908
Feature activation-3.02e-02

Pos logit contributions
 names+0.896
ws+0.788
 meet+0.747
 name+0.743
ashes+0.742
Neg logit contributions
lf-0.611
opus-0.511
mber-0.504
 importantly-0.488
ths-0.454
Token her
Token ID607
Feature activation+6.15e-04

Pos logit contributions
 names+0.072
ws+0.061
 sneak+0.060
 name+0.060
 meet+0.060
Neg logit contributions
lf-0.069
mber-0.062
opus-0.060
 importantly-0.058
izing-0.057
Token to
Token ID284
Feature activation-2.13e-01

Pos logit contributions
lf+0.002
opus+0.001
mber+0.001
ths+0.001
 importantly+0.001
Neg logit contributions
 names-0.001
ws-0.001
 sneak-0.001
 meet-0.001
 name-0.001
Token bed
Token ID3996
Feature activation-5.90e-03

Pos logit contributions
 names+0.585
ws+0.511
 name+0.479
ashes+0.475
 sneak+0.459
Neg logit contributions
lf-0.422
mber-0.339
 importantly-0.323
 she-0.311
 unfortunately-0.306
Token without
Token ID1231
Feature activation+9.35e-02

Pos logit contributions
 names+0.012
 name+0.009
 sneak+0.009
ws+0.009
 meet+0.009
Neg logit contributions
lf-0.016
opus-0.016
mber-0.016
 importantly-0.015
izing-0.014
Token I
Token ID314
Feature activation+7.75e-02

Pos logit contributions
lf+0.290
opus+0.283
mber+0.277
 importantly+0.266
very+0.251
Neg logit contributions
 names-0.271
ws-0.238
 sneak-0.226
 find-0.218
 name-0.212
Token am
Token ID716
Feature activation-3.01e-01

Pos logit contributions
lf+0.211
opus+0.202
mber+0.197
 importantly+0.190
very+0.183
Neg logit contributions
 names-0.125
ws-0.099
ags-0.092
 sneak-0.089
 backs-0.084
Token Sara
Token ID24799
Feature activation-5.63e-03

Pos logit contributions
 names+0.653
 name+0.531
 meet+0.518
 sneak+0.511
 try+0.498
Neg logit contributions
lf-0.808
opus-0.755
mber-0.741
izing-0.702
ths-0.696
Token.
Token ID13
Feature activation-9.97e-02

Pos logit contributions
 names+0.011
 name+0.009
 sneak+0.009
 meet+0.009
ws+0.009
Neg logit contributions
lf-0.016
opus-0.015
mber-0.015
 importantly-0.014
ths-0.014
Token I
Token ID314
Feature activation+7.78e-02

Pos logit contributions
 names+0.218
ws+0.190
 sneak+0.180
 find+0.173
 name+0.171
Neg logit contributions
lf-0.243
opus-0.229
mber-0.229
 importantly-0.221
very-0.206
Token like
Token ID588
Feature activation-3.39e-01

Pos logit contributions
lf+0.210
opus+0.203
mber+0.196
 importantly+0.191
dd+0.182
Neg logit contributions
 names-0.124
ws-0.097
ags-0.088
 sneak-0.088
 argue-0.081
Token you
Token ID345
Feature activation+1.46e-01

Pos logit contributions
 names+0.793
ws+0.656
 sneak+0.649
 name+0.646
 meet+0.633
Neg logit contributions
lf-0.829
opus-0.763
mber-0.759
 importantly-0.714
izing-0.693
Token.
Token ID13
Feature activation-1.00e-01

Pos logit contributions
lf+0.379
opus+0.347
mber+0.341
 importantly+0.318
izing+0.317
Neg logit contributions
 names-0.315
ws-0.259
 sneak-0.253
 name-0.247
 meet-0.243
Token You
Token ID921
Feature activation-7.46e-02

Pos logit contributions
 names+0.233
ws+0.210
 sneak+0.198
 argue+0.193
 find+0.191
Neg logit contributions
lf-0.216
 importantly-0.206
mber-0.205
opus-0.204
very-0.185
Token are
Token ID389
Feature activation-3.69e-01

Pos logit contributions
 names+0.101
ws+0.083
ashes+0.074
 sneak+0.070
 name+0.066
Neg logit contributions
lf-0.227
opus-0.217
mber-0.210
 importantly-0.200
very-0.190
Token very
Token ID845
Feature activation-1.49e-02

Pos logit contributions
 names+0.846
 sneak+0.682
 name+0.681
ws+0.679
 meet+0.673
Neg logit contributions
lf-0.939
opus-0.866
mber-0.860
 importantly-0.809
izing-0.793