TOP ACTIVATIONS
MAX = 0.365

was different from the flash lights that other kids had .
big box full of flash lights , so L ola was
liked to play with flash lights in the dark . They
, they saw many flash lights . Some were big ,
. They find some flash lights and tape them to their
the tent with their flash lights and their t eddy bears
everybody ". L ill ie nodding , then turned
". But L ill ie still wasn 't sure
mom 's encouragement , L ill ie reached her dream and
said it was the y umm iest ice - cream they
Tom always collected the y umm iest ice - cream for
make the food even y umm ier . Let 's see
little boy had the y umm iest treat . He asked
back . J ill was excited to win her
! It was the y umm iest stew ever !
, picking out the y umm iest treats and sharing them
it and found some y umm my food inside . Leo
. J ill was so excited ! She
. It was even y umm ier than the first card
placed it on her windows ill . Every time she saw

MIN ACTIVATIONS
MIN = -0.320

They saw the wood cut ter coming to the woods
." The wood cut ter was sad . He
Stop cutting !" The wood cut ters heard them and stopped
." The wood cut ter sighed and shook his
not understand what a " cut " was . The voice
. They are for wood cut ters ." They took Lily
them . But the wood cut ter heard their voices and
ground that said " short cut ". She thought it would
little boy getting a buzz cut and a lady getting a
. The wood cut ter was sad , but
too big . The wood cut ter could not part them
his friends , the wood cut ter was able to part
time there was a wood cut ter who was strong and
One day , the wood cut ter got an idea to
there was a kind wood cut ter who lived in the
The wood cut ter was pleased with himself
was mild . The wood cut ter worked for a long
ever . So the wood cut ter went to the tallest
he wanted to play ' cut the boat '. The little
finally agreed . The wood cut ter was very excited ,

INTERVAL 0.193 to 0.365
CONTAINS 0.785%

can I make you big again ?" L ily
The train felt proud of itself for being independent and helpful
dive into strange bins ever again . Jerry agreed
had lots of different cr ay ons and was proud of
. They laughed and cl apped and cheered .

INTERVAL 0.022 to 0.193
CONTAINS 33.988%

out of the bath tub . From that day
what it was , and her mum said , " It
had lots of fun . The end . <|endoftext|> Once upon
follow your dreams , just like Tim my loved to paint
she felt the y ummy juice sliding down her throat .

INTERVAL -0.149 to 0.022
CONTAINS 63.393%

Emma and she was three - years - old . She
. Tim my loved to play with his toys all day
called Bill who loved spicy food . Every day he would
the coin too . She thought Tom was being mean and
a big truck with a trunk . The trunk is orange

INTERVAL -0.320 to -0.149
CONTAINS 1.834%

big and shiny . They wanted to see what was inside
not mean . It just wanted to play .
they were important . They said they would always support them
, Bun - B un met a little girl named Lily
should do it . You might not win ."
Token was
Token ID373
Feature activation+2.28e-02

Pos logit contributions
ill+0.268
igsaw+0.267
lights+0.254
etti+0.246
ushes+0.241
Neg logit contributions
 asking-0.191
 trying-0.185
 mean-0.169
 hunters-0.164
 sneaking-0.164
Token different
Token ID1180
Feature activation-3.60e-02

Pos logit contributions
ill+0.045
lights+0.044
igsaw+0.043
ushes+0.041
etti+0.040
Neg logit contributions
 asking-0.033
 trying-0.031
 hunters-0.030
 challenged-0.030
 sneaking-0.030
Token from
Token ID422
Feature activation+8.58e-02

Pos logit contributions
 asking+0.053
 trying+0.051
 mean+0.047
 hunters+0.045
 sneaking+0.045
Neg logit contributions
ill-0.078
igsaw-0.078
lights-0.074
etti-0.072
ushes-0.070
Token the
Token ID262
Feature activation+5.99e-03

Pos logit contributions
igsaw+0.173
ill+0.171
lights+0.170
etti+0.159
ushes+0.151
Neg logit contributions
 asking-0.125
 trying-0.120
 mean-0.115
 challenged-0.113
 hunters-0.113
Token flash
Token ID7644
Feature activation+1.01e-01

Pos logit contributions
ill+0.010
lights+0.010
igsaw+0.010
umm+0.010
ushes+0.009
Neg logit contributions
 asking-0.010
 trying-0.009
 challenged-0.009
 sneaking-0.009
 chased-0.009
Tokenlights
Token ID8091
Feature activation+3.65e-01

Pos logit contributions
igsaw+0.232
ill+0.223
ushes+0.214
etti+0.213
aire+0.208
Neg logit contributions
 asking-0.131
 trying-0.129
 mean-0.121
 hunters-0.111
 going-0.109
Token that
Token ID326
Feature activation-1.01e-02

Pos logit contributions
ill+0.846
igsaw+0.840
lights+0.804
etti+0.780
ushes+0.767
Neg logit contributions
 asking-0.474
 trying-0.455
 mean-0.410
 hunters-0.398
 sneaking-0.397
Token other
Token ID584
Feature activation-1.09e-01

Pos logit contributions
 asking+0.014
 trying+0.014
 mean+0.013
 going+0.012
 telling+0.012
Neg logit contributions
igsaw-0.023
ill-0.023
etti-0.021
ushes-0.021
lights-0.021
Token kids
Token ID3988
Feature activation-6.55e-02

Pos logit contributions
 asking+0.146
 trying+0.141
 mean+0.130
 hunters+0.124
 telling+0.122
Neg logit contributions
igsaw-0.258
ill-0.251
etti-0.237
lights-0.234
ushes-0.231
Token had
Token ID550
Feature activation+2.22e-03

Pos logit contributions
 asking+0.102
 trying+0.097
 hunters+0.094
 sneaking+0.091
 mean+0.091
Neg logit contributions
ill-0.132
igsaw-0.124
lights-0.122
etti-0.119
ushes-0.115
Token.
Token ID13
Feature activation+1.88e-02

Pos logit contributions
ill+0.005
lights+0.004
etti+0.004
igsaw+0.004
ushes+0.004
Neg logit contributions
 asking-0.003
 trying-0.003
 mean-0.003
 cowboy-0.003
 sneaking-0.003
Token big
Token ID1263
Feature activation+7.56e-03

Pos logit contributions
 asking+0.065
 trying+0.061
 hunters+0.059
 sneaking+0.057
 telling+0.056
Neg logit contributions
igsaw-0.080
ill-0.075
lights-0.072
etti-0.070
ushes-0.070
Token box
Token ID3091
Feature activation+5.69e-03

Pos logit contributions
igsaw+0.014
ill+0.014
lights+0.013
etti+0.013
ushes+0.012
Neg logit contributions
 asking-0.013
 trying-0.013
 hunters-0.012
 mean-0.012
 sneaking-0.012
Token full
Token ID1336
Feature activation+1.26e-01

Pos logit contributions
ill+0.011
igsaw+0.011
lights+0.011
etti+0.010
ushes+0.010
Neg logit contributions
 asking-0.009
 trying-0.009
 mean-0.008
 hunters-0.008
 sneaking-0.008
Token of
Token ID286
Feature activation+8.85e-02

Pos logit contributions
ill+0.223
igsaw+0.220
lights+0.208
etti+0.198
ushes+0.194
Neg logit contributions
 asking-0.232
 trying-0.226
 mean-0.211
 sneaking-0.207
 hunters-0.207
Token flash
Token ID7644
Feature activation+1.03e-01

Pos logit contributions
igsaw+0.175
lights+0.173
ill+0.172
ushes+0.163
etti+0.159
Neg logit contributions
 asking-0.135
 trying-0.129
 sneaking-0.120
 mean-0.118
 hunters-0.117
Tokenlights
Token ID8091
Feature activation+3.64e-01

Pos logit contributions
igsaw+0.241
ill+0.234
etti+0.219
aire+0.217
+0.217
Neg logit contributions
 asking-0.132
 trying-0.129
 mean-0.115
 telling-0.109
 hunters-0.109
Token,
Token ID11
Feature activation+2.07e-03

Pos logit contributions
ill+0.836
igsaw+0.829
lights+0.796
etti+0.771
ushes+0.754
Neg logit contributions
 asking-0.477
 trying-0.458
 mean-0.416
 hunters-0.405
 sneaking-0.404
Token so
Token ID523
Feature activation+5.67e-02

Pos logit contributions
ill+0.005
igsaw+0.005
lights+0.005
etti+0.004
ushes+0.004
Neg logit contributions
 asking-0.003
 trying-0.003
 mean-0.002
 hunters-0.002
 sneaking-0.002
Token L
Token ID406
Feature activation+6.32e-03

Pos logit contributions
ill+0.106
igsaw+0.106
lights+0.098
etti+0.096
umm+0.093
Neg logit contributions
 asking-0.102
 trying-0.098
 mean-0.091
 telling-0.089
 sneaking-0.088
Tokenola
Token ID5708
Feature activation+2.29e-01

Pos logit contributions
lights+0.007
umm+0.006
igsaw+0.006
etti+0.006
ushes+0.006
Neg logit contributions
 asking-0.015
 trying-0.015
 mean-0.014
 sneaking-0.014
 going-0.014
Token was
Token ID373
Feature activation+2.78e-02

Pos logit contributions
ill+0.508
igsaw+0.496
lights+0.477
etti+0.466
ushes+0.456
Neg logit contributions
 asking-0.320
 trying-0.307
 mean-0.285
 hunters-0.279
 sneaking-0.277
Token liked
Token ID8288
Feature activation-7.13e-02

Pos logit contributions
 asking+0.052
 trying+0.050
 hunters+0.048
 sneaking+0.047
 telling+0.045
Neg logit contributions
ill-0.074
igsaw-0.070
lights-0.067
etti-0.066
ushes-0.066
Token to
Token ID284
Feature activation-1.10e-01

Pos logit contributions
cod+0.110
aver+0.108
 asking+0.108
eh+0.106
 sheriff+0.106
Neg logit contributions
lights-0.118
igsaw-0.113
ill-0.108
etti-0.103
ushes-0.102
Token play
Token ID711
Feature activation+5.08e-03

Pos logit contributions
 asking+0.151
 trying+0.148
 mean+0.139
 going+0.128
 walking+0.127
Neg logit contributions
ill-0.250
igsaw-0.249
lights-0.238
umm-0.234
ushes-0.229
Token with
Token ID351
Feature activation+8.93e-02

Pos logit contributions
ill+0.010
igsaw+0.010
lights+0.010
etti+0.009
ushes+0.009
Neg logit contributions
 asking-0.008
 trying-0.007
 mean-0.007
 sneaking-0.007
 telling-0.007
Token flash
Token ID7644
Feature activation+9.76e-02

Pos logit contributions
ill+0.185
igsaw+0.184
lights+0.177
etti+0.170
ushes+0.167
Neg logit contributions
 asking-0.139
 trying-0.134
 mean-0.123
 sneaking-0.121
 hunters-0.120
Tokenlights
Token ID8091
Feature activation+3.57e-01

Pos logit contributions
igsaw+0.074
ill+0.061
+0.056
umm+0.053
aire+0.050
Neg logit contributions
 asking-0.279
 trying-0.278
 mean-0.271
 guns-0.271
 animals-0.269
Token in
Token ID287
Feature activation+8.53e-02

Pos logit contributions
ill+0.789
igsaw+0.768
lights+0.756
etti+0.717
ushes+0.700
Neg logit contributions
 asking-0.499
 trying-0.479
 sneaking-0.440
 mean-0.440
 hunters-0.433
Token the
Token ID262
Feature activation+1.61e-04

Pos logit contributions
igsaw+0.216
ill+0.211
umm+0.205
etti+0.197
lights+0.196
Neg logit contributions
 asking-0.100
 trying-0.097
 mean-0.085
 walking-0.081
 z-0.080
Token dark
Token ID3223
Feature activation-2.16e-02

Pos logit contributions
igsaw+0.000
lights+0.000
ill+0.000
umm+0.000
etti+0.000
Neg logit contributions
 asking-0.000
 trying-0.000
 challenged-0.000
 telling-0.000
 sneaking-0.000
Token.
Token ID13
Feature activation+5.80e-03

Pos logit contributions
 asking+0.023
 trying+0.022
 mean+0.020
 hunters+0.019
 animals+0.019
Neg logit contributions
igsaw-0.055
ill-0.053
ushes-0.052
etti-0.051
-0.051
Token They
Token ID1119
Feature activation-3.25e-02

Pos logit contributions
igsaw+0.012
lights+0.012
ill+0.011
umm+0.011
etti+0.011
Neg logit contributions
 asking-0.009
 trying-0.009
 mean-0.008
 le-0.008
 walking-0.008
Token,
Token ID11
Feature activation-1.23e-02

Pos logit contributions
 asking+0.137
 trying+0.131
 mean+0.120
 hunters+0.119
 sneaking+0.119
Neg logit contributions
ill-0.245
igsaw-0.238
lights-0.229
etti-0.223
ushes-0.220
Token they
Token ID484
Feature activation-3.21e-02

Pos logit contributions
 asking+0.018
 trying+0.018
 mean+0.016
 going+0.015
 walking+0.015
Neg logit contributions
igsaw-0.028
ill-0.027
lights-0.026
umm-0.025
etti-0.025
Token saw
Token ID2497
Feature activation-1.41e-01

Pos logit contributions
 asking+0.035
 trying+0.033
 mean+0.029
 hunters+0.028
 sneaking+0.028
Neg logit contributions
ill-0.082
igsaw-0.081
lights-0.078
etti-0.076
ushes-0.075
Token many
Token ID867
Feature activation-8.13e-02

Pos logit contributions
 asking+0.199
 trying+0.191
 sneaking+0.172
 mean+0.172
 hunters+0.167
Neg logit contributions
ill-0.310
igsaw-0.307
lights-0.297
etti-0.284
ushes-0.284
Token flash
Token ID7644
Feature activation+9.33e-02

Pos logit contributions
 asking+0.104
 trying+0.099
 sneaking+0.093
 telling+0.088
 guarding+0.087
Neg logit contributions
lights-0.182
igsaw-0.178
ill-0.176
ushes-0.170
umm-0.161
Tokenlights
Token ID8091
Feature activation+3.55e-01

Pos logit contributions
igsaw+0.166
ill+0.154
+0.148
aire+0.145
umm+0.141
Neg logit contributions
 asking-0.171
 trying-0.171
 mean-0.164
 animals-0.158
 hunters-0.157
Token.
Token ID13
Feature activation+3.50e-03

Pos logit contributions
ill+0.779
igsaw+0.756
lights+0.747
etti+0.718
ushes+0.684
Neg logit contributions
 asking-0.491
 trying-0.468
 mean-0.438
 hunters-0.435
 sneaking-0.427
Token Some
Token ID2773
Feature activation-1.05e-01

Pos logit contributions
ill+0.007
igsaw+0.007
lights+0.006
etti+0.006
aire+0.006
Neg logit contributions
 asking-0.006
 trying-0.005
 le-0.005
 mean-0.005
 men-0.005
Token were
Token ID547
Feature activation+5.68e-02

Pos logit contributions
 asking+0.170
 trying+0.164
 mean+0.151
 sneaking+0.148
 hunters+0.148
Neg logit contributions
ill-0.210
igsaw-0.208
lights-0.200
etti-0.191
ushes-0.188
Token big
Token ID1263
Feature activation-3.25e-03

Pos logit contributions
ill+0.125
lights+0.123
igsaw+0.120
ushes+0.117
etti+0.113
Neg logit contributions
 asking-0.073
 trying-0.069
 hunters-0.066
 sneaking-0.064
 cowboy-0.064
Token,
Token ID11
Feature activation-1.20e-02

Pos logit contributions
 asking+0.004
 trying+0.004
 mean+0.004
 sneaking+0.004
 telling+0.004
Neg logit contributions
ill-0.007
igsaw-0.007
lights-0.007
etti-0.007
ushes-0.006
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
 cowboy+0.051
 gir+0.051
 asking+0.051
 sheriff+0.049
 trying+0.048
Neg logit contributions
ill-0.045
lights-0.039
etti-0.036
igsaw-0.036
fulness-0.033
Token They
Token ID1119
Feature activation-3.25e-02

Pos logit contributions
igsaw+0.024
ill+0.023
lights+0.022
etti+0.022
umm+0.021
Neg logit contributions
 asking-0.023
 trying-0.022
 hunters-0.021
 mean-0.020
 sheep-0.020
Token find
Token ID1064
Feature activation+2.79e-02

Pos logit contributions
 asking+0.054
 trying+0.051
 hunters+0.048
 sneaking+0.047
 telling+0.047
Neg logit contributions
ill-0.063
igsaw-0.058
aire-0.054
lights-0.053
ushes-0.053
Token some
Token ID617
Feature activation+6.44e-02

Pos logit contributions
igsaw+0.069
ill+0.069
lights+0.064
etti+0.063
umm+0.063
Neg logit contributions
 asking-0.034
 trying-0.032
 mean-0.029
 hunters-0.028
 telling-0.028
Token flash
Token ID7644
Feature activation+9.63e-02

Pos logit contributions
lights+0.126
igsaw+0.126
ill+0.124
ushes+0.116
etti+0.113
Neg logit contributions
 asking-0.101
 trying-0.094
 sneaking-0.088
 telling-0.085
 challenged-0.085
Tokenlights
Token ID8091
Feature activation+3.54e-01

Pos logit contributions
igsaw+0.116
+0.100
umm+0.099
ill+0.093
aire+0.093
Neg logit contributions
 mean-0.235
 trying-0.234
 asking-0.230
 guns-0.226
ep-0.223
Token and
Token ID290
Feature activation+6.52e-03

Pos logit contributions
ill+0.825
igsaw+0.815
lights+0.786
etti+0.759
ushes+0.746
Neg logit contributions
 asking-0.458
 trying-0.438
 mean-0.395
 hunters-0.386
 sneaking-0.385
Token tape
Token ID9154
Feature activation+1.61e-03

Pos logit contributions
ill+0.014
igsaw+0.014
lights+0.013
etti+0.013
ushes+0.013
Neg logit contributions
 asking-0.009
 trying-0.009
 mean-0.008
 hunters-0.008
 sneaking-0.008
Token them
Token ID606
Feature activation+4.74e-03

Pos logit contributions
ill+0.004
igsaw+0.003
lights+0.003
etti+0.003
ushes+0.003
Neg logit contributions
 asking-0.002
 trying-0.002
 mean-0.002
 sneaking-0.002
 hunters-0.002
Token to
Token ID284
Feature activation-1.07e-01

Pos logit contributions
igsaw+0.011
ill+0.011
lights+0.010
etti+0.010
umm+0.010
Neg logit contributions
 asking-0.007
 trying-0.006
 mean-0.006
 sneaking-0.006
 hunters-0.006
Token their
Token ID511
Feature activation-4.25e-02

Pos logit contributions
 asking+0.151
 trying+0.146
 mean+0.138
 z+0.131
 telling+0.127
Neg logit contributions
igsaw-0.249
ill-0.237
umm-0.233
lights-0.231
etti-0.221
Token the
Token ID262
Feature activation-1.17e-02

Pos logit contributions
ill+0.207
igsaw+0.207
lights+0.196
umm+0.194
etti+0.193
Neg logit contributions
 asking-0.120
 trying-0.115
 mean-0.101
 telling-0.100
 sneaking-0.098
Token tent
Token ID11105
Feature activation-3.83e-02

Pos logit contributions
 asking+0.021
 trying+0.020
 challenged+0.018
 hunters+0.018
 telling+0.018
Neg logit contributions
lights-0.021
igsaw-0.021
ill-0.020
ushes-0.019
etti-0.018
Token with
Token ID351
Feature activation+8.06e-02

Pos logit contributions
 asking+0.047
 trying+0.045
 mean+0.040
 telling+0.039
 sneaking+0.039
Neg logit contributions
igsaw-0.092
ill-0.091
lights-0.088
etti-0.085
ushes-0.084
Token their
Token ID511
Feature activation-5.15e-02

Pos logit contributions
igsaw+0.167
ill+0.165
umm+0.153
lights+0.151
etti+0.151
Neg logit contributions
 asking-0.129
 trying-0.126
 mean-0.119
 sheep-0.115
 walking-0.114
Token flash
Token ID7644
Feature activation+9.07e-02

Pos logit contributions
 asking+0.087
 trying+0.083
 sneaking+0.077
 challenged+0.077
 hunters+0.076
Neg logit contributions
lights-0.096
igsaw-0.096
ill-0.095
ushes-0.090
etti-0.087
Tokenlights
Token ID8091
Feature activation+3.51e-01

Pos logit contributions
igsaw+0.087
+0.072
ill+0.069
aire+0.067
umm+0.067
Neg logit contributions
 asking-0.235
 trying-0.234
 mean-0.233
ep-0.228
 angry-0.225
Token and
Token ID290
Feature activation-6.85e-04

Pos logit contributions
igsaw+0.852
ill+0.847
lights+0.802
etti+0.787
ushes+0.778
Neg logit contributions
 asking-0.429
 trying-0.413
 mean-0.365
 telling-0.351
 sneaking-0.347
Token their
Token ID511
Feature activation-5.05e-02

Pos logit contributions
 asking+0.001
 trying+0.001
 mean+0.001
 sneaking+0.001
 telling+0.001
Neg logit contributions
igsaw-0.001
ill-0.001
lights-0.001
umm-0.001
etti-0.001
Token t
Token ID256
Feature activation+1.27e-02

Pos logit contributions
 asking+0.090
 trying+0.086
 mean+0.080
 sneaking+0.079
 telling+0.078
Neg logit contributions
lights-0.092
igsaw-0.092
ill-0.091
ushes-0.085
etti-0.084
Tokeneddy
Token ID21874
Feature activation+1.81e-01

Pos logit contributions
igsaw+0.018
ill+0.016
umm+0.014
+0.014
 pleasure+0.013
Neg logit contributions
ep-0.027
 asking-0.027
 trying-0.026
 hunters-0.026
 guns-0.025
Token bears
Token ID13062
Feature activation-1.12e-01

Pos logit contributions
lights+0.454
ill+0.449
igsaw+0.443
etti+0.434
ushes+0.421
Neg logit contributions
 asking-0.183
 trying-0.172
 mean-0.158
 sneaking-0.149
 stealing-0.148
Token everybody
Token ID7288
Feature activation+1.98e-01

Pos logit contributions
ill+0.030
igsaw+0.030
lights+0.028
etti+0.027
ushes+0.027
Neg logit contributions
 asking-0.016
 trying-0.015
 mean-0.014
 hunters-0.013
 sneaking-0.013
Token".
Token ID1911
Feature activation+6.45e-02

Pos logit contributions
ill+0.438
igsaw+0.436
lights+0.416
etti+0.403
ushes+0.395
Neg logit contributions
 asking-0.282
 trying-0.273
 mean-0.248
 hunters-0.240
 sneaking-0.240
Token
Token ID198
Feature activation+3.94e-02

Pos logit contributions
igsaw+0.148
lights+0.147
umm+0.142
etti+0.141
ill+0.130
Neg logit contributions
 asking-0.086
 trying-0.082
 telling-0.072
 called-0.070
 walking-0.068
Token
Token ID198
Feature activation+3.70e-02

Pos logit contributions
igsaw+0.078
ill+0.076
lights+0.074
etti+0.072
umm+0.072
Neg logit contributions
 asking-0.065
 trying-0.063
 mean-0.058
 telling-0.057
 sneaking-0.056
TokenL
Token ID43
Feature activation-2.55e-02

Pos logit contributions
ill+0.069
igsaw+0.068
lights+0.065
etti+0.062
ushes+0.061
Neg logit contributions
 asking-0.066
 trying-0.064
 mean-0.059
 hunters-0.058
 sneaking-0.058
Tokenill
Token ID359
Feature activation+3.32e-01

Pos logit contributions
 asking+0.042
 trying+0.041
 mean+0.038
 sneaking+0.037
 telling+0.037
Neg logit contributions
igsaw-0.048
lights-0.047
etti-0.046
umm-0.045
-0.044
Tokenie
Token ID494
Feature activation+4.68e-02

Pos logit contributions
ill+0.686
igsaw+0.667
lights+0.640
etti+0.614
ushes+0.603
Neg logit contributions
 asking-0.521
 trying-0.504
 mean-0.464
 hunters-0.462
 sneaking-0.457
Token nodding
Token ID47583
Feature activation+3.67e-02

Pos logit contributions
ill+0.095
igsaw+0.094
lights+0.090
etti+0.086
ushes+0.085
Neg logit contributions
 asking-0.075
 trying-0.072
 mean-0.067
 hunters-0.065
 sneaking-0.065
Token,
Token ID11
Feature activation+4.29e-03

Pos logit contributions
igsaw+0.088
ill+0.082
umm+0.082
etti+0.080
lights+0.080
Neg logit contributions
 asking-0.051
 trying-0.049
 mean-0.043
 telling-0.042
 walking-0.041
Token then
Token ID788
Feature activation+7.99e-02

Pos logit contributions
igsaw+0.010
ill+0.010
lights+0.010
etti+0.009
ushes+0.009
Neg logit contributions
 asking-0.006
 trying-0.006
 telling-0.005
 mean-0.005
 sneaking-0.005
Token turned
Token ID2900
Feature activation+5.03e-02

Pos logit contributions
ill+0.168
igsaw+0.165
lights+0.158
etti+0.152
ushes+0.150
Neg logit contributions
 asking-0.119
 trying-0.115
 mean-0.107
 hunters-0.106
 sneaking-0.104
Token".
Token ID1911
Feature activation+6.40e-02

Pos logit contributions
ill+0.096
igsaw+0.096
lights+0.092
etti+0.089
ushes+0.087
Neg logit contributions
 asking-0.055
 trying-0.053
 mean-0.048
 hunters-0.047
 sneaking-0.047
Token
Token ID198
Feature activation+3.37e-02

Pos logit contributions
igsaw+0.133
lights+0.130
umm+0.125
ill+0.120
etti+0.120
Neg logit contributions
 asking-0.102
 trying-0.096
 telling-0.088
 called-0.084
 walking-0.084
Token
Token ID198
Feature activation+3.06e-02

Pos logit contributions
ill+0.061
igsaw+0.060
lights+0.057
etti+0.054
ushes+0.053
Neg logit contributions
 asking-0.062
 trying-0.060
 mean-0.056
 hunters-0.055
 sneaking-0.055
TokenBut
Token ID1537
Feature activation+1.45e-03

Pos logit contributions
ill+0.054
igsaw+0.051
lights+0.047
ushes+0.046
etti+0.045
Neg logit contributions
 asking-0.060
 trying-0.058
 hunters-0.055
 mean-0.054
 sneaking-0.053
Token L
Token ID406
Feature activation+1.49e-02

Pos logit contributions
ill+0.003
igsaw+0.003
lights+0.003
etti+0.003
ushes+0.002
Neg logit contributions
 asking-0.002
 trying-0.002
 mean-0.002
 hunters-0.002
 sneaking-0.002
Tokenill
Token ID359
Feature activation+3.29e-01

Pos logit contributions
lights+0.019
igsaw+0.019
umm+0.019
etti+0.018
ushes+0.017
Neg logit contributions
 trying-0.032
 asking-0.032
 mean-0.031
 sneaking-0.030
 telling-0.030
Tokenie
Token ID494
Feature activation+4.60e-02

Pos logit contributions
ill+0.705
igsaw+0.625
lights+0.609
aney+0.568
aire+0.561
Neg logit contributions
 asking-0.508
 hunters-0.482
 trying-0.479
 sneaking-0.448
 cowboy-0.444
Token still
Token ID991
Feature activation+1.34e-01

Pos logit contributions
ill+0.096
igsaw+0.088
lights+0.082
aire+0.081
etti+0.081
Neg logit contributions
 asking-0.067
 hunters-0.065
 trying-0.061
 sneaking-0.060
 cowboy-0.060
Token wasn
Token ID2492
Feature activation-1.44e-02

Pos logit contributions
igsaw+0.319
ill+0.314
lights+0.306
etti+0.297
umm+0.293
Neg logit contributions
 asking-0.174
 trying-0.170
 mean-0.154
 telling-0.146
 sneaking-0.143
Token't
Token ID470
Feature activation-7.34e-02

Pos logit contributions
 asking+0.021
 trying+0.020
 mean+0.018
 hunters+0.018
 sneaking+0.018
Neg logit contributions
ill-0.031
igsaw-0.031
lights-0.029
etti-0.028
ushes-0.028
Token sure
Token ID1654
Feature activation+1.94e-02

Pos logit contributions
 asking+0.091
 trying+0.087
 gir+0.084
 sneaking+0.083
 hunters+0.083
Neg logit contributions
ill-0.166
igsaw-0.162
lights-0.148
ushes-0.146
etti-0.143
Token mom
Token ID1995
Feature activation+1.03e-01

Pos logit contributions
ill+0.079
igsaw+0.078
lights+0.074
etti+0.072
umm+0.070
Neg logit contributions
 asking-0.060
 trying-0.058
 mean-0.053
 telling-0.052
 walking-0.051
Token's
Token ID338
Feature activation-5.00e-02

Pos logit contributions
ill+0.233
igsaw+0.225
lights+0.212
etti+0.209
ushes+0.205
Neg logit contributions
 asking-0.134
 trying-0.129
 mean-0.124
 hunters-0.121
 sneaking-0.120
Token encouragement
Token ID27529
Feature activation+4.19e-02

Pos logit contributions
 asking+0.081
 sneaking+0.078
 trying+0.078
 hunters+0.077
 cowboy+0.076
Neg logit contributions
igsaw-0.092
ill-0.086
lights-0.084
ushes-0.082
 pleasure-0.078
Token,
Token ID11
Feature activation+7.80e-03

Pos logit contributions
ill+0.099
igsaw+0.099
lights+0.094
etti+0.092
ushes+0.090
Neg logit contributions
 asking-0.053
 trying-0.051
 mean-0.045
 hunters-0.044
 sneaking-0.044
Token L
Token ID406
Feature activation+8.23e-03

Pos logit contributions
igsaw+0.016
ill+0.015
lights+0.015
etti+0.015
umm+0.014
Neg logit contributions
 asking-0.013
 trying-0.013
 telling-0.011
 mean-0.011
 walking-0.011
Tokenill
Token ID359
Feature activation+3.28e-01

Pos logit contributions
igsaw+0.007
umm+0.007
lights+0.007
etti+0.007
+0.006
Neg logit contributions
 trying-0.021
 asking-0.021
 mean-0.020
 sneaking-0.020
 naughty-0.019
Tokenie
Token ID494
Feature activation+4.34e-02

Pos logit contributions
ill+0.682
igsaw+0.632
lights+0.629
etti+0.585
ushes+0.581
Neg logit contributions
 asking-0.520
 trying-0.507
 hunters-0.469
 mean-0.469
 sneaking-0.464
Token reached
Token ID4251
Feature activation-2.84e-03

Pos logit contributions
ill+0.091
igsaw+0.089
lights+0.086
etti+0.083
ushes+0.082
Neg logit contributions
 asking-0.065
 trying-0.062
 mean-0.058
 hunters-0.057
 sneaking-0.057
Token her
Token ID607
Feature activation+3.82e-02

Pos logit contributions
 asking+0.004
 trying+0.004
 walking+0.003
 telling+0.003
 mean+0.003
Neg logit contributions
ill-0.007
igsaw-0.007
lights-0.007
etti-0.006
umm-0.006
Token dream
Token ID4320
Feature activation+6.00e-02

Pos logit contributions
ill+0.072
igsaw+0.072
lights+0.068
etti+0.065
ushes+0.065
Neg logit contributions
 asking-0.065
 trying-0.063
 mean-0.059
 hunters-0.058
 sneaking-0.058
Token and
Token ID290
Feature activation+1.50e-02

Pos logit contributions
igsaw+0.136
ill+0.136
lights+0.130
etti+0.127
ushes+0.125
Neg logit contributions
 asking-0.080
 trying-0.078
 mean-0.070
 sneaking-0.068
 telling-0.067
Token said
Token ID531
Feature activation-1.40e-01

Pos logit contributions
igsaw+0.435
ill+0.430
lights+0.412
etti+0.400
ushes+0.393
Neg logit contributions
 asking-0.304
 trying-0.294
 mean-0.269
 telling-0.261
 sneaking-0.258
Token it
Token ID340
Feature activation+8.75e-02

Pos logit contributions
 asking+0.190
 trying+0.183
 mean+0.166
 hunters+0.160
 sneaking+0.159
Neg logit contributions
ill-0.319
igsaw-0.318
lights-0.303
etti-0.294
ushes-0.289
Token was
Token ID373
Feature activation+3.04e-02

Pos logit contributions
igsaw+0.197
ill+0.194
lights+0.186
etti+0.179
ushes+0.177
Neg logit contributions
 asking-0.123
 trying-0.121
 mean-0.110
 telling-0.106
 sneaking-0.105
Token the
Token ID262
Feature activation+1.97e-02

Pos logit contributions
ill+0.067
igsaw+0.067
lights+0.064
etti+0.062
ushes+0.061
Neg logit contributions
 asking-0.044
 trying-0.042
 mean-0.038
 sneaking-0.037
 hunters-0.037
Token y
Token ID331
Feature activation-2.47e-03

Pos logit contributions
ill+0.034
igsaw+0.034
umm+0.033
lights+0.032
etti+0.030
Neg logit contributions
 asking-0.035
 trying-0.033
 sneaking-0.032
 hunters-0.032
 challenged-0.031
Tokenumm
Token ID13929
Feature activation+3.28e-01

Pos logit contributions
 asking+0.005
 trying+0.005
 le+0.005
 mean+0.005
 gir+0.005
Neg logit contributions
ill-0.003
ushes-0.003
lights-0.003
igsaw-0.003
etti-0.003
Tokeniest
Token ID6386
Feature activation+2.10e-01

Pos logit contributions
igsaw+0.490
ushes+0.444
ill+0.435
etti+0.433
lights+0.426
Neg logit contributions
 asking-0.699
 trying-0.680
 going-0.635
 telling-0.629
 mean-0.629
Token ice
Token ID4771
Feature activation+3.98e-02

Pos logit contributions
igsaw+0.460
ill+0.458
lights+0.434
etti+0.424
ushes+0.418
Neg logit contributions
 asking-0.309
 trying-0.299
 mean-0.270
 telling-0.263
 sneaking-0.260
Token-
Token ID12
Feature activation+1.14e-02

Pos logit contributions
ill+0.106
igsaw+0.106
lights+0.100
ushes+0.099
etti+0.097
Neg logit contributions
 asking-0.037
 trying-0.037
 going-0.031
 le-0.031
 mean-0.030
Tokencream
Token ID36277
Feature activation+8.46e-02

Pos logit contributions
ill+0.019
igsaw+0.019
lights+0.018
etti+0.017
ushes+0.017
Neg logit contributions
 asking-0.022
 trying-0.022
 mean-0.020
 le-0.020
 gir-0.020
Token they
Token ID484
Feature activation-1.38e-02

Pos logit contributions
igsaw+0.209
ill+0.202
ushes+0.194
lights+0.192
aire+0.186
Neg logit contributions
 trying-0.103
 asking-0.102
 le-0.090
 going-0.089
 gir-0.088
Token Tom
Token ID4186
Feature activation-9.24e-02

Pos logit contributions
igsaw+0.020
ill+0.020
lights+0.019
etti+0.019
ushes+0.018
Neg logit contributions
 asking-0.013
 trying-0.013
 mean-0.012
 telling-0.011
 hunters-0.011
Token always
Token ID1464
Feature activation-2.95e-02

Pos logit contributions
 asking+0.125
 trying+0.119
 mean+0.108
 hunters+0.107
 sneaking+0.106
Neg logit contributions
ill-0.210
igsaw-0.204
lights-0.196
etti-0.190
ushes-0.187
Token collected
Token ID7723
Feature activation+1.24e-02

Pos logit contributions
 asking+0.040
 trying+0.038
 mean+0.035
 sneaking+0.033
 hunters+0.033
Neg logit contributions
ill-0.068
igsaw-0.067
lights-0.064
etti-0.062
ushes-0.061
Token the
Token ID262
Feature activation+2.25e-02

Pos logit contributions
igsaw+0.031
ill+0.031
etti+0.030
lights+0.029
aire+0.028
Neg logit contributions
 asking-0.015
 trying-0.014
 mean-0.013
 le-0.013
 naughty-0.012
Token y
Token ID331
Feature activation-2.21e-03

Pos logit contributions
ill+0.044
igsaw+0.044
lights+0.042
umm+0.041
ushes+0.040
Neg logit contributions
 asking-0.036
 trying-0.034
 hunters-0.032
 sneaking-0.032
 mean-0.031
Tokenumm
Token ID13929
Feature activation+3.28e-01

Pos logit contributions
 asking+0.005
 trying+0.005
 le+0.005
 going+0.004
 gir+0.004
Neg logit contributions
ill-0.003
igsaw-0.003
lights-0.003
etti-0.002
ushes-0.002
Tokeniest
Token ID6386
Feature activation+2.11e-01

Pos logit contributions
igsaw+0.518
etti+0.477
ill+0.461
ushes+0.453
aire+0.442
Neg logit contributions
 asking-0.683
 trying-0.648
 going-0.613
 telling-0.598
 mean-0.598
Token ice
Token ID4771
Feature activation+3.86e-02

Pos logit contributions
igsaw+0.456
ill+0.456
lights+0.430
etti+0.419
ushes+0.409
Neg logit contributions
 asking-0.317
 trying-0.306
 mean-0.279
 telling-0.269
 sneaking-0.268
Token-
Token ID12
Feature activation+9.93e-03

Pos logit contributions
ill+0.106
igsaw+0.106
lights+0.100
ushes+0.098
etti+0.098
Neg logit contributions
 asking-0.034
 trying-0.033
 le-0.027
 going-0.027
 telling-0.026
Tokencream
Token ID36277
Feature activation+8.32e-02

Pos logit contributions
ill+0.017
igsaw+0.017
lights+0.017
etti+0.015
ushes+0.015
Neg logit contributions
 asking-0.018
 trying-0.018
 le-0.017
 gir-0.017
 mean-0.017
Token for
Token ID329
Feature activation+1.84e-02

Pos logit contributions
igsaw+0.209
ill+0.202
lights+0.195
ushes+0.192
aire+0.186
Neg logit contributions
 asking-0.099
 trying-0.098
 le-0.088
 going-0.085
 gir-0.084
Token make
Token ID787
Feature activation+5.37e-02

Pos logit contributions
igsaw+0.018
lights+0.017
ill+0.017
ushes+0.017
umm+0.016
Neg logit contributions
 trying-0.010
 asking-0.010
 going-0.009
 mean-0.009
 walking-0.009
Token the
Token ID262
Feature activation+1.58e-02

Pos logit contributions
ill+0.110
igsaw+0.109
lights+0.103
etti+0.101
ushes+0.098
Neg logit contributions
 asking-0.084
 trying-0.081
 mean-0.075
 sneaking-0.073
 hunters-0.073
Token food
Token ID2057
Feature activation+3.57e-03

Pos logit contributions
umm+0.024
 perfection+0.021
ill+0.021
aire+0.021
igsaw+0.021
Neg logit contributions
 chased-0.030
 asking-0.029
 sneaking-0.028
 trying-0.028
 guarding-0.028
Token even
Token ID772
Feature activation+1.99e-01

Pos logit contributions
igsaw+0.009
ill+0.008
lights+0.008
etti+0.008
ushes+0.008
Neg logit contributions
 trying-0.004
 asking-0.004
 mean-0.004
 going-0.004
 telling-0.003
Token y
Token ID331
Feature activation-3.99e-04

Pos logit contributions
ill+0.423
igsaw+0.421
lights+0.399
etti+0.388
ushes+0.379
Neg logit contributions
 asking-0.302
 trying-0.292
 mean-0.268
 hunters-0.260
 sneaking-0.259
Tokenumm
Token ID13929
Feature activation+3.26e-01

Pos logit contributions
 asking+0.001
 trying+0.001
 hunters+0.001
 le+0.001
 mean+0.001
Neg logit contributions
ill-0.000
igsaw-0.000
lights-0.000
ushes-0.000
etti-0.000
Tokenier
Token ID959
Feature activation+1.97e-01

Pos logit contributions
igsaw+0.472
ill+0.430
etti+0.415
lights+0.413
ushes+0.408
Neg logit contributions
 asking-0.732
 trying-0.707
 going-0.664
 mean-0.661
 telling-0.648
Token.
Token ID13
Feature activation+4.27e-02

Pos logit contributions
ill+0.419
igsaw+0.416
lights+0.396
etti+0.383
ushes+0.376
Neg logit contributions
 asking-0.295
 trying-0.285
 mean-0.261
 hunters-0.254
 sneaking-0.254
Token Let
Token ID3914
Feature activation+3.15e-02

Pos logit contributions
igsaw+0.082
ill+0.082
lights+0.082
ushes+0.073
aney+0.071
Neg logit contributions
 asking-0.075
 trying-0.072
 telling-0.068
 mean-0.063
 stealing-0.063
Token's
Token ID338
Feature activation-3.82e-02

Pos logit contributions
ill+0.086
igsaw+0.085
lights+0.082
etti+0.080
ushes+0.079
Neg logit contributions
 asking-0.028
 trying-0.027
 mean-0.022
 hunters-0.022
 sneaking-0.022
Token see
Token ID766
Feature activation+7.76e-02

Pos logit contributions
 asking+0.061
 trying+0.059
 mean+0.054
 hunters+0.053
 sneaking+0.053
Neg logit contributions
ill-0.077
igsaw-0.077
lights-0.073
etti-0.071
ushes-0.069
Token little
Token ID1310
Feature activation-9.42e-02

Pos logit contributions
igsaw+0.039
ill+0.039
lights+0.038
umm+0.036
etti+0.035
Neg logit contributions
 asking-0.027
 trying-0.025
 sneaking-0.024
 hunters-0.023
 telling-0.022
Token boy
Token ID2933
Feature activation-9.96e-02

Pos logit contributions
 asking+0.134
 trying+0.128
 sneaking+0.116
 mean+0.114
 telling+0.114
Neg logit contributions
ill-0.210
igsaw-0.207
lights-0.198
etti-0.191
umm-0.187
Token had
Token ID550
Feature activation+1.50e-02

Pos logit contributions
 asking+0.136
 trying+0.131
 hunters+0.128
 mean+0.126
 sneaking+0.123
Neg logit contributions
ill-0.219
igsaw-0.200
etti-0.196
lights-0.195
aire-0.188
Token the
Token ID262
Feature activation+1.81e-02

Pos logit contributions
ill+0.033
igsaw+0.032
etti+0.030
lights+0.030
ushes+0.030
Neg logit contributions
 asking-0.020
 trying-0.019
 sneaking-0.018
 mean-0.018
 hunters-0.018
Token y
Token ID331
Feature activation-3.74e-03

Pos logit contributions
ill+0.040
igsaw+0.040
lights+0.038
etti+0.037
ushes+0.036
Neg logit contributions
 asking-0.026
 trying-0.025
 mean-0.023
 hunters-0.022
 sneaking-0.022
Tokenumm
Token ID13929
Feature activation+3.25e-01

Pos logit contributions
 asking+0.009
 trying+0.009
 le+0.008
 walking+0.008
 mean+0.008
Neg logit contributions
ill-0.004
igsaw-0.004
lights-0.004
ushes-0.004
etti-0.004
Tokeniest
Token ID6386
Feature activation+2.06e-01

Pos logit contributions
igsaw+0.438
ill+0.397
etti+0.387
lights+0.385
ushes+0.368
Neg logit contributions
 asking-0.753
 trying-0.734
 mean-0.690
 going-0.675
 telling-0.674
Token treat
Token ID2190
Feature activation+9.17e-02

Pos logit contributions
ill+0.431
igsaw+0.430
lights+0.408
etti+0.396
ushes+0.388
Neg logit contributions
 asking-0.317
 trying-0.307
 mean-0.281
 sneaking-0.272
 hunters-0.271
Token.
Token ID13
Feature activation+4.35e-02

Pos logit contributions
igsaw+0.215
ill+0.210
lights+0.205
ushes+0.198
etti+0.197
Neg logit contributions
 asking-0.121
 trying-0.118
 mean-0.103
 telling-0.101
 going-0.099
Token He
Token ID679
Feature activation+1.10e-02

Pos logit contributions
ill+0.097
lights+0.093
igsaw+0.092
etti+0.090
ushes+0.088
Neg logit contributions
 asking-0.059
 trying-0.054
 demanding-0.046
 telling-0.045
 going-0.045
Token asked
Token ID1965
Feature activation-1.47e-01

Pos logit contributions
ill+0.024
igsaw+0.023
lights+0.022
etti+0.021
ushes+0.021
Neg logit contributions
 asking-0.016
 trying-0.015
 mean-0.014
 hunters-0.014
 sneaking-0.014
Token back
Token ID736
Feature activation+2.44e-01

Pos logit contributions
 asking+0.056
 trying+0.053
 mean+0.049
 sneaking+0.047
 hunters+0.047
Neg logit contributions
ill-0.089
igsaw-0.088
lights-0.083
etti-0.080
ushes-0.079
Token.
Token ID13
Feature activation+3.23e-02

Pos logit contributions
ill+0.564
igsaw+0.563
lights+0.539
etti+0.522
ushes+0.513
Neg logit contributions
 asking-0.324
 trying-0.313
 mean-0.280
 hunters-0.273
 sneaking-0.272
Token\n
Token ID198
Feature activation+3.78e-02

Pos logit contributions
lights+0.073
igsaw+0.070
etti+0.068
umm+0.067
ill+0.065
Neg logit contributions
 asking-0.047
 trying-0.043
 telling-0.038
 walking-0.038
 demanding-0.037
Token\n
Token ID198
Feature activation+3.55e-02

Pos logit contributions
igsaw+0.078
ill+0.077
lights+0.076
etti+0.073
umm+0.073
Neg logit contributions
 asking-0.059
 trying-0.057
 mean-0.052
 telling-0.051
 walking-0.050
TokenJ
Token ID41
Feature activation+1.10e-01

Pos logit contributions
ill+0.059
igsaw+0.059
lights+0.055
etti+0.053
ushes+0.051
Neg logit contributions
 asking-0.070
 trying-0.068
 mean-0.064
 sneaking-0.062
 hunters-0.062
Tokenill
Token ID359
Feature activation+3.25e-01

Pos logit contributions
lights+0.036
igsaw+0.036
etti+0.029
umm+0.028
+0.022
Neg logit contributions
 asking-0.337
 trying-0.331
 mean-0.324
 hunters-0.316
ep-0.316
Token was
Token ID373
Feature activation+2.96e-02

Pos logit contributions
ill+0.634
igsaw+0.565
lights+0.564
etti+0.528
aire+0.516
Neg logit contributions
 asking-0.550
 trying-0.532
 hunters-0.510
 mean-0.497
 sneaking-0.496
Token excited
Token ID6568
Feature activation-6.43e-03

Pos logit contributions
igsaw+0.071
ill+0.070
lights+0.068
etti+0.065
umm+0.065
Neg logit contributions
 asking-0.039
 trying-0.038
 mean-0.034
 telling-0.032
 walking-0.032
Token to
Token ID284
Feature activation-9.55e-02

Pos logit contributions
 asking+0.008
 trying+0.008
 mean+0.007
 hunters+0.007
 sneaking+0.007
Neg logit contributions
ill-0.015
igsaw-0.014
lights-0.014
etti-0.013
umm-0.013
Token win
Token ID1592
Feature activation+1.38e-02

Pos logit contributions
 asking+0.149
 trying+0.145
 mean+0.132
 telling+0.129
 walking+0.127
Neg logit contributions
igsaw-0.205
lights-0.203
ill-0.202
ushes-0.190
umm-0.189
Token her
Token ID607
Feature activation+4.57e-02

Pos logit contributions
igsaw+0.032
ill+0.032
lights+0.030
etti+0.029
ushes+0.029
Neg logit contributions
 asking-0.019
 trying-0.018
 mean-0.016
 hunters-0.016
 sneaking-0.016
Token!
Token ID0
Feature activation+8.98e-02

Pos logit contributions
 asking+0.004
 trying+0.004
 mean+0.004
 telling+0.003
 hunters+0.003
Neg logit contributions
igsaw-0.008
ill-0.008
lights-0.008
etti-0.008
ushes-0.008
Token It
Token ID632
Feature activation+1.36e-01

Pos logit contributions
lights+0.211
ill+0.208
umm+0.194
kin+0.191
imm+0.189
Neg logit contributions
 asking-0.098
 trying-0.089
 telling-0.087
 called-0.086
 named-0.085
Token was
Token ID373
Feature activation+3.75e-02

Pos logit contributions
igsaw+0.358
ill+0.326
lights+0.320
etti+0.319
ushes+0.314
Neg logit contributions
 trying-0.158
 asking-0.155
 mean-0.146
 telling-0.139
 chased-0.134
Token the
Token ID262
Feature activation+1.57e-02

Pos logit contributions
igsaw+0.088
ill+0.087
lights+0.083
etti+0.081
ushes+0.079
Neg logit contributions
 asking-0.050
 trying-0.048
 mean-0.043
 telling-0.042
 sneaking-0.041
Token y
Token ID331
Feature activation-3.17e-03

Pos logit contributions
ill+0.028
igsaw+0.028
lights+0.027
umm+0.026
etti+0.025
Neg logit contributions
 asking-0.028
 trying-0.027
 sneaking-0.025
 hunters-0.025
 mean-0.024
Tokenumm
Token ID13929
Feature activation+3.24e-01

Pos logit contributions
 asking+0.007
 trying+0.007
 le+0.007
 hunters+0.007
 mean+0.007
Neg logit contributions
ill-0.004
igsaw-0.003
lights-0.003
ushes-0.003
etti-0.003
Tokeniest
Token ID6386
Feature activation+2.07e-01

Pos logit contributions
igsaw+0.539
ushes+0.484
etti+0.477
ill+0.469
lights+0.464
Neg logit contributions
 asking-0.655
 trying-0.641
 going-0.601
 mean-0.583
 telling-0.579
Token stew
Token ID20798
Feature activation-8.17e-03

Pos logit contributions
igsaw+0.435
ill+0.433
lights+0.406
etti+0.400
ushes+0.393
Neg logit contributions
 asking-0.327
 trying-0.317
 mean-0.286
 telling-0.282
 hunters-0.278
Token ever
Token ID1683
Feature activation+1.90e-01

Pos logit contributions
 asking+0.011
 trying+0.011
 telling+0.010
 going+0.010
 mean+0.009
Neg logit contributions
igsaw-0.019
ill-0.019
lights-0.018
ushes-0.018
etti-0.017
Token!
Token ID0
Feature activation+8.92e-02

Pos logit contributions
igsaw+0.426
ill+0.421
lights+0.406
etti+0.392
ushes+0.391
Neg logit contributions
 asking-0.269
 trying-0.260
 mean-0.234
 telling-0.229
 sneaking-0.226
Token\n
Token ID198
Feature activation+3.72e-02

Pos logit contributions
lights+0.255
ill+0.250
umm+0.245
igsaw+0.239
etti+0.234
Neg logit contributions
 asking-0.051
 named-0.048
 called-0.047
 telling-0.044
 trying-0.041
Token,
Token ID11
Feature activation+6.23e-03

Pos logit contributions
ill+0.264
igsaw+0.252
lights+0.238
etti+0.233
ushes+0.229
Neg logit contributions
 asking-0.170
 trying-0.160
 mean-0.159
 hunters-0.152
 sneaking-0.152
Token picking
Token ID10868
Feature activation-1.27e-01

Pos logit contributions
ill+0.015
igsaw+0.014
etti+0.014
lights+0.013
ushes+0.013
Neg logit contributions
 asking-0.006
 mean-0.006
 hunters-0.006
 doctors-0.006
 trying-0.006
Token out
Token ID503
Feature activation+9.18e-02

Pos logit contributions
 asking+0.163
 trying+0.158
 mean+0.140
 hunters+0.138
 le+0.138
Neg logit contributions
igsaw-0.299
ill-0.298
lights-0.286
etti-0.280
ushes-0.269
Token the
Token ID262
Feature activation+1.47e-02

Pos logit contributions
ill+0.206
igsaw+0.205
lights+0.195
etti+0.192
umm+0.188
Neg logit contributions
 asking-0.126
 trying-0.122
 mean-0.112
 hunters-0.109
 telling-0.106
Token y
Token ID331
Feature activation-5.21e-03

Pos logit contributions
igsaw+0.030
ill+0.029
umm+0.028
lights+0.027
ushes+0.026
Neg logit contributions
 asking-0.021
 trying-0.021
 sneaking-0.020
 hunters-0.020
 chased-0.019
Tokenumm
Token ID13929
Feature activation+3.24e-01

Pos logit contributions
 asking+0.013
 trying+0.012
 le+0.012
 mean+0.011
 hunters+0.011
Neg logit contributions
ill-0.006
igsaw-0.005
lights-0.005
etti-0.005
ushes-0.005
Tokeniest
Token ID6386
Feature activation+2.06e-01

Pos logit contributions
igsaw+0.463
etti+0.429
ill+0.429
ushes+0.405
aire+0.389
Neg logit contributions
 asking-0.718
 trying-0.690
 mean-0.640
 going-0.632
 le-0.632
Token treats
Token ID18432
Feature activation+6.40e-02

Pos logit contributions
ill+0.471
igsaw+0.468
lights+0.450
umm+0.432
etti+0.429
Neg logit contributions
 asking-0.260
 trying-0.251
 mean-0.228
 hunters-0.227
 sneaking-0.225
Token and
Token ID290
Feature activation+1.92e-02

Pos logit contributions
igsaw+0.155
ill+0.149
lights+0.147
ushes+0.142
etti+0.139
Neg logit contributions
 trying-0.081
 asking-0.081
 going-0.069
 telling-0.069
 mean-0.067
Token sharing
Token ID7373
Feature activation+2.95e-02

Pos logit contributions
igsaw+0.041
ill+0.041
lights+0.038
etti+0.037
ushes+0.037
Neg logit contributions
 asking-0.030
 trying-0.029
 mean-0.026
 telling-0.026
 sneaking-0.025
Token them
Token ID606
Feature activation+2.27e-02

Pos logit contributions
igsaw+0.071
ill+0.069
lights+0.067
etti+0.066
umm+0.065
Neg logit contributions
 asking-0.038
 trying-0.037
 mean-0.033
 telling-0.032
 hunters-0.032
Token it
Token ID340
Feature activation+8.11e-02

Pos logit contributions
ill+0.064
igsaw+0.063
lights+0.061
etti+0.059
ushes+0.058
Neg logit contributions
 asking-0.042
 trying-0.041
 mean-0.038
 sneaking-0.037
 hunters-0.037
Token and
Token ID290
Feature activation+1.37e-02

Pos logit contributions
ill+0.192
igsaw+0.192
lights+0.183
etti+0.178
ushes+0.175
Neg logit contributions
 asking-0.103
 trying-0.099
 mean-0.088
 hunters-0.085
 sneaking-0.085
Token found
Token ID1043
Feature activation+5.58e-03

Pos logit contributions
igsaw+0.037
ill+0.037
lights+0.036
etti+0.035
ushes+0.034
Neg logit contributions
 asking-0.013
 trying-0.012
 mean-0.010
 sneaking-0.010
 telling-0.010
Token some
Token ID617
Feature activation+7.11e-02

Pos logit contributions
ill+0.013
igsaw+0.012
lights+0.012
ushes+0.012
etti+0.012
Neg logit contributions
 asking-0.008
 trying-0.007
 mean-0.007
 sneaking-0.007
 telling-0.006
Token y
Token ID331
Feature activation-5.86e-03

Pos logit contributions
igsaw+0.135
ill+0.134
lights+0.132
ushes+0.126
umm+0.124
Neg logit contributions
 asking-0.114
 trying-0.108
 sneaking-0.104
 hunters-0.100
 challenged-0.100
Tokenumm
Token ID13929
Feature activation+3.23e-01

Pos logit contributions
 asking+0.015
 trying+0.014
 hunters+0.013
 mean+0.013
 le+0.013
Neg logit contributions
ill-0.006
igsaw-0.006
lights-0.006
etti-0.005
ushes-0.005
Tokenmy
Token ID1820
Feature activation-4.44e-02

Pos logit contributions
igsaw+0.409
ill+0.397
lights+0.368
etti+0.351
ushes+0.340
Neg logit contributions
 asking-0.772
 trying-0.755
 mean-0.708
 hunters-0.699
 sneaking-0.698
Token food
Token ID2057
Feature activation+1.75e-04

Pos logit contributions
 asking+0.064
 trying+0.062
 mean+0.057
 hunters+0.056
 sneaking+0.055
Neg logit contributions
ill-0.096
igsaw-0.095
lights-0.090
etti-0.088
ushes-0.086
Token inside
Token ID2641
Feature activation+1.61e-01

Pos logit contributions
ill+0.000
igsaw+0.000
lights+0.000
etti+0.000
ushes+0.000
Neg logit contributions
 asking-0.000
 trying-0.000
 mean-0.000
 sneaking-0.000
 hunters-0.000
Token.
Token ID13
Feature activation+2.42e-02

Pos logit contributions
ill+0.354
igsaw+0.343
lights+0.331
etti+0.320
ushes+0.313
Neg logit contributions
 asking-0.226
 trying-0.216
 mean-0.204
 hunters-0.198
 sneaking-0.196
Token Leo
Token ID19632
Feature activation-9.10e-02

Pos logit contributions
ill+0.044
igsaw+0.043
lights+0.040
etti+0.039
ushes+0.039
Neg logit contributions
 asking-0.044
 trying-0.043
 mean-0.040
 sneaking-0.039
 hunters-0.039
Token.
Token ID13
Feature activation+4.06e-02

Pos logit contributions
ill+0.205
igsaw+0.201
lights+0.192
etti+0.187
ushes+0.180
Neg logit contributions
 asking-0.136
 trying-0.130
 mean-0.123
 hunters-0.123
 sneaking-0.121
Token
Token ID220
Feature activation+4.79e-02

Pos logit contributions
lights+0.091
ill+0.084
etti+0.080
ushes+0.080
igsaw+0.079
Neg logit contributions
 asking-0.059
 trying-0.057
 telling-0.048
 challenged-0.047
 walking-0.046
Token\n
Token ID198
Feature activation+4.44e-02

Pos logit contributions
lights+0.107
igsaw+0.107
umm+0.105
ill+0.102
etti+0.098
Neg logit contributions
 asking-0.063
 trying-0.062
 walking-0.053
 hunters-0.052
 telling-0.052
Token\n
Token ID198
Feature activation+4.11e-02

Pos logit contributions
lights+0.091
igsaw+0.089
ill+0.088
etti+0.086
umm+0.086
Neg logit contributions
 asking-0.069
 trying-0.067
 mean-0.060
 telling-0.059
 hunters-0.058
TokenJ
Token ID41
Feature activation+1.15e-01

Pos logit contributions
ill+0.067
igsaw+0.067
lights+0.064
etti+0.061
ushes+0.059
Neg logit contributions
 asking-0.082
 trying-0.080
 mean-0.075
 hunters-0.073
 sneaking-0.073
Tokenill
Token ID359
Feature activation+3.23e-01

Pos logit contributions
lights+0.118
umm+0.102
etti+0.099
igsaw+0.096
ushes+0.093
Neg logit contributions
 asking-0.275
 trying-0.270
 mean-0.264
 telling-0.254
 hunters-0.252
Token was
Token ID373
Feature activation+3.15e-02

Pos logit contributions
ill+0.648
igsaw+0.603
lights+0.582
etti+0.573
ushes+0.546
Neg logit contributions
 asking-0.526
 trying-0.504
 hunters-0.493
 mean-0.479
 sneaking-0.476
Token so
Token ID523
Feature activation+6.28e-02

Pos logit contributions
ill+0.075
igsaw+0.075
lights+0.072
etti+0.069
ushes+0.068
Neg logit contributions
 asking-0.040
 trying-0.038
 mean-0.034
 sneaking-0.033
 hunters-0.033
Token excited
Token ID6568
Feature activation-3.31e-03

Pos logit contributions
igsaw+0.166
lights+0.165
ill+0.162
etti+0.158
umm+0.156
Neg logit contributions
 trying-0.070
 asking-0.067
 mean-0.064
 angry-0.062
 naughty-0.057
Token!
Token ID0
Feature activation+9.22e-02

Pos logit contributions
 asking+0.005
 trying+0.005
 mean+0.005
 hunters+0.004
 sneaking+0.004
Neg logit contributions
ill-0.007
igsaw-0.007
lights-0.006
etti-0.006
umm-0.006
Token She
Token ID1375
Feature activation+6.35e-02

Pos logit contributions
lights+0.175
ill+0.165
igsaw+0.164
etti+0.158
ushes+0.154
Neg logit contributions
 asking-0.167
 trying-0.158
 telling-0.139
 walking-0.138
 demanding-0.137
Token.
Token ID13
Feature activation+2.73e-02

Pos logit contributions
igsaw+0.268
ill+0.262
lights+0.254
etti+0.252
ushes+0.245
Neg logit contributions
 asking-0.127
 trying-0.125
 mean-0.108
 telling-0.103
 walking-0.101
Token It
Token ID632
Feature activation+1.34e-01

Pos logit contributions
lights+0.066
ill+0.057
etti+0.055
igsaw+0.055
kin+0.054
Neg logit contributions
 asking-0.040
 trying-0.038
 walking-0.033
 telling-0.032
 le-0.031
Token was
Token ID373
Feature activation+3.44e-02

Pos logit contributions
igsaw+0.307
ill+0.298
lights+0.285
etti+0.278
ushes+0.275
Neg logit contributions
 asking-0.187
 trying-0.184
 mean-0.171
 telling-0.161
 sneaking-0.160
Token even
Token ID772
Feature activation+1.96e-01

Pos logit contributions
ill+0.076
igsaw+0.075
lights+0.072
etti+0.069
ushes+0.069
Neg logit contributions
 asking-0.048
 trying-0.045
 hunters-0.042
 mean-0.041
 sneaking-0.041
Token y
Token ID331
Feature activation-1.32e-03

Pos logit contributions
ill+0.376
igsaw+0.366
lights+0.365
ushes+0.345
umm+0.325
Neg logit contributions
 asking-0.307
 trying-0.294
 hunters-0.280
 sneaking-0.278
 sheep-0.272
Tokenumm
Token ID13929
Feature activation+3.23e-01

Pos logit contributions
 trying+0.003
 asking+0.003
 le+0.003
 mean+0.003
 going+0.003
Neg logit contributions
lights-0.002
ill-0.001
igsaw-0.001
ushes-0.001
etti-0.001
Tokenier
Token ID959
Feature activation+1.95e-01

Pos logit contributions
igsaw+0.457
ill+0.428
lights+0.422
etti+0.395
ushes+0.394
Neg logit contributions
 asking-0.723
 trying-0.709
 mean-0.650
 sneaking-0.649
 going-0.647
Token than
Token ID621
Feature activation+6.12e-02

Pos logit contributions
ill+0.376
igsaw+0.370
lights+0.352
etti+0.338
ushes+0.331
Neg logit contributions
 asking-0.331
 trying-0.320
 mean-0.300
 hunters-0.292
 sneaking-0.291
Token the
Token ID262
Feature activation+2.15e-02

Pos logit contributions
igsaw+0.129
lights+0.126
ill+0.125
etti+0.123
ushes+0.122
Neg logit contributions
 trying-0.095
 asking-0.094
 going-0.081
 mean-0.081
 telling-0.081
Token first
Token ID717
Feature activation-5.11e-02

Pos logit contributions
ill+0.042
igsaw+0.042
umm+0.039
etti+0.039
lights+0.038
Neg logit contributions
 asking-0.033
 trying-0.030
 sneaking-0.030
 hunters-0.029
 challenged-0.028
Token card
Token ID2657
Feature activation+1.88e-02

Pos logit contributions
 asking+0.069
 trying+0.066
 mean+0.060
 sneaking+0.059
 hunters+0.059
Neg logit contributions
ill-0.116
igsaw-0.115
lights-0.108
etti-0.106
ushes-0.104
Token placed
Token ID4624
Feature activation+6.80e-02

Pos logit contributions
igsaw+0.066
ill+0.065
lights+0.062
etti+0.060
umm+0.059
Neg logit contributions
 asking-0.035
 trying-0.034
 telling-0.029
 mean-0.029
 walking-0.028
Token it
Token ID340
Feature activation+8.51e-02

Pos logit contributions
igsaw+0.164
ill+0.162
lights+0.153
umm+0.150
etti+0.149
Neg logit contributions
 asking-0.089
 trying-0.086
 walking-0.073
 hunters-0.073
 mean-0.072
Token on
Token ID319
Feature activation+5.92e-02

Pos logit contributions
igsaw+0.223
ill+0.219
lights+0.206
umm+0.205
etti+0.204
Neg logit contributions
 asking-0.091
 trying-0.089
 walking-0.076
 mean-0.076
 sneaking-0.072
Token her
Token ID607
Feature activation+4.63e-02

Pos logit contributions
igsaw+0.146
ill+0.144
lights+0.141
ushes+0.136
umm+0.135
Neg logit contributions
 asking-0.070
 trying-0.069
 mean-0.058
 telling-0.056
 going-0.056
Token windows
Token ID9168
Feature activation+2.85e-02

Pos logit contributions
ill+0.087
igsaw+0.084
lights+0.081
etti+0.077
ushes+0.076
Neg logit contributions
 asking-0.084
 trying-0.083
 mean-0.075
 telling-0.073
 sneaking-0.073
Tokenill
Token ID359
Feature activation+3.23e-01

Pos logit contributions
igsaw+0.065
lights+0.058
umm+0.057
aire+0.057
etti+0.056
Neg logit contributions
 asking-0.041
 trying-0.040
 telling-0.034
 going-0.034
 walking-0.034
Token.
Token ID13
Feature activation+3.24e-02

Pos logit contributions
ill+0.695
igsaw+0.677
lights+0.646
etti+0.639
ushes+0.613
Neg logit contributions
 asking-0.457
 trying-0.440
 mean-0.437
 hunters-0.431
 sneaking-0.405
Token Every
Token ID3887
Feature activation+2.93e-02

Pos logit contributions
lights+0.070
ill+0.067
igsaw+0.065
etti+0.061
umm+0.060
Neg logit contributions
 asking-0.049
 trying-0.044
 walking-0.040
 telling-0.040
 le-0.038
Token time
Token ID640
Feature activation-3.37e-02

Pos logit contributions
ill+0.064
igsaw+0.064
lights+0.060
etti+0.058
ushes+0.057
Neg logit contributions
 asking-0.043
 trying-0.041
 mean-0.037
 hunters-0.037
 sneaking-0.036
Token she
Token ID673
Feature activation+1.00e-02

Pos logit contributions
 asking+0.054
 trying+0.053
 mean+0.048
 sneaking+0.047
 hunters+0.047
Neg logit contributions
ill-0.068
igsaw-0.068
lights-0.064
etti-0.062
ushes-0.061
Token saw
Token ID2497
Feature activation-1.24e-01

Pos logit contributions
igsaw+0.023
ill+0.023
lights+0.022
etti+0.021
ushes+0.021
Neg logit contributions
 asking-0.014
 trying-0.013
 mean-0.012
 sneaking-0.011
 telling-0.011
Token\n
Token ID198
Feature activation-6.71e-03

Pos logit contributions
 asking+0.010
 trying+0.010
 mean+0.009
 sneaking+0.009
 hunters+0.009
Neg logit contributions
ill-0.009
igsaw-0.009
lights-0.008
ushes-0.008
etti-0.008
TokenThey
Token ID2990
Feature activation-3.59e-02

Pos logit contributions
 asking+0.012
 trying+0.012
 mean+0.011
 hunters+0.011
 sneaking+0.011
Neg logit contributions
ill-0.012
igsaw-0.012
ushes-0.011
lights-0.011
etti-0.010
Token saw
Token ID2497
Feature activation-1.58e-01

Pos logit contributions
 asking+0.056
 trying+0.054
 hunters+0.049
 sneaking+0.048
 mean+0.048
Neg logit contributions
ill-0.074
igsaw-0.071
lights-0.068
etti-0.066
ushes-0.065
Token the
Token ID262
Feature activation-1.93e-02

Pos logit contributions
 asking+0.239
 trying+0.228
 sneaking+0.207
 mean+0.205
 telling+0.202
Neg logit contributions
ill-0.328
igsaw-0.324
lights-0.318
ushes-0.298
etti-0.298
Token wood
Token ID4898
Feature activation-1.22e-01

Pos logit contributions
 asking+0.027
 trying+0.025
 sneaking+0.023
 challenged+0.022
 telling+0.022
Neg logit contributions
igsaw-0.042
lights-0.042
ill-0.041
umm-0.038
ushes-0.038
Tokencut
Token ID8968
Feature activation-3.04e-01

Pos logit contributions
 asking+0.168
 trying+0.153
 cowboy+0.145
 mean+0.145
 sneaking+0.142
Neg logit contributions
lights-0.267
ill-0.266
igsaw-0.258
etti-0.240
ushes-0.235
Tokenter
Token ID353
Feature activation+5.29e-02

Pos logit contributions
 asking+0.561
 trying+0.529
 mean+0.525
 sneaking+0.518
 cowboy+0.502
Neg logit contributions
ill-0.513
lights-0.477
etti-0.447
igsaw-0.439
umm-0.417
Token coming
Token ID2406
Feature activation-1.35e-01

Pos logit contributions
ill+0.112
lights+0.105
etti+0.100
igsaw+0.097
+0.093
Neg logit contributions
 cowboy-0.067
 asking-0.066
 sheriff-0.062
 mean-0.061
 challenged-0.060
Token to
Token ID284
Feature activation-1.19e-01

Pos logit contributions
 asking+0.239
 cowboy+0.230
 trying+0.226
 gir+0.217
 mean+0.217
Neg logit contributions
ill-0.254
lights-0.222
etti-0.215
igsaw-0.212
ushes-0.208
Token the
Token ID262
Feature activation-9.26e-03

Pos logit contributions
 asking+0.170
 trying+0.164
 mean+0.149
 hunters+0.145
 sneaking+0.145
Neg logit contributions
ill-0.262
igsaw-0.261
lights-0.249
etti-0.241
ushes-0.237
Token woods
Token ID16479
Feature activation-1.12e-01

Pos logit contributions
 asking+0.014
 trying+0.013
 telling+0.012
 challenged+0.012
 sneaking+0.012
Neg logit contributions
igsaw-0.019
ill-0.018
lights-0.018
umm-0.017
etti-0.017
Token."
Token ID526
Feature activation+3.95e-02

Pos logit contributions
ill+0.213
igsaw+0.211
lights+0.202
etti+0.195
ushes+0.192
Neg logit contributions
 asking-0.130
 trying-0.125
 mean-0.114
 hunters-0.111
 sneaking-0.111
Token\n
Token ID198
Feature activation+1.54e-02

Pos logit contributions
ill+0.068
igsaw+0.068
lights+0.064
etti+0.061
ushes+0.060
Neg logit contributions
 asking-0.075
 trying-0.073
 mean-0.068
 hunters-0.067
 sneaking-0.067
Token\n
Token ID198
Feature activation+1.01e-02

Pos logit contributions
ill+0.026
igsaw+0.025
lights+0.023
ushes+0.022
etti+0.022
Neg logit contributions
 asking-0.030
 trying-0.030
 mean-0.028
 hunters-0.028
 sneaking-0.028
TokenThe
Token ID464
Feature activation+2.93e-02

Pos logit contributions
ill+0.018
igsaw+0.017
lights+0.015
etti+0.015
ushes+0.015
Neg logit contributions
 asking-0.019
 trying-0.019
 mean-0.017
 hunters-0.017
 sneaking-0.017
Token wood
Token ID4898
Feature activation-1.12e-01

Pos logit contributions
igsaw+0.060
ill+0.055
lights+0.054
umm+0.052
+0.050
Neg logit contributions
 asking-0.043
 trying-0.039
 challenged-0.037
 sneaking-0.037
 telling-0.035
Tokencut
Token ID8968
Feature activation-2.98e-01

Pos logit contributions
 asking+0.171
 trying+0.160
 sneaking+0.154
 hunters+0.150
 cowboy+0.148
Neg logit contributions
ill-0.231
lights-0.223
igsaw-0.214
etti-0.202
ushes-0.201
Tokenter
Token ID353
Feature activation+6.04e-02

Pos logit contributions
 asking+0.530
 trying+0.506
 sneaking+0.484
 mean+0.483
 hunters+0.471
Neg logit contributions
ill-0.541
igsaw-0.494
lights-0.487
etti-0.474
umm-0.443
Token was
Token ID373
Feature activation+1.40e-02

Pos logit contributions
ill+0.113
lights+0.101
etti+0.097
igsaw+0.095
ushes+0.092
Neg logit contributions
 asking-0.094
 hunters-0.091
 cowboy-0.089
 sheriff-0.085
 trying-0.085
Token sad
Token ID6507
Feature activation-8.67e-02

Pos logit contributions
ill+0.028
ushes+0.026
lights+0.025
igsaw+0.025
etti+0.023
Neg logit contributions
 hunters-0.018
 cowboy-0.018
 guns-0.018
 gir-0.018
 asking-0.018
Token.
Token ID13
Feature activation+8.36e-03

Pos logit contributions
 asking+0.112
 trying+0.107
 mean+0.097
 hunters+0.094
 sneaking+0.094
Neg logit contributions
ill-0.203
igsaw-0.201
lights-0.193
etti-0.186
ushes-0.183
Token He
Token ID679
Feature activation-1.68e-02

Pos logit contributions
ill+0.014
igsaw+0.014
lights+0.013
etti+0.013
ushes+0.012
Neg logit contributions
 asking-0.016
 trying-0.016
 mean-0.015
 hunters-0.015
 sneaking-0.014
Token Stop
Token ID13707
Feature activation-1.52e-02

Pos logit contributions
ill+0.184
igsaw+0.182
lights+0.170
etti+0.163
ushes+0.158
Neg logit contributions
 asking-0.229
 trying-0.224
 mean-0.210
 hunters-0.206
 sneaking-0.206
Token cutting
Token ID7720
Feature activation-1.30e-01

Pos logit contributions
 hunters+0.024
 asking+0.024
 gir+0.023
 trying+0.022
 mean+0.022
Neg logit contributions
ill-0.028
igsaw-0.026
etti-0.025
lights-0.025
-0.023
Token!"
Token ID2474
Feature activation+5.27e-02

Pos logit contributions
 asking+0.157
 trying+0.154
 mean+0.140
 guns+0.127
ep+0.126
Neg logit contributions
igsaw-0.319
ill-0.307
lights-0.298
etti-0.295
umm-0.290
Token The
Token ID383
Feature activation+1.09e-02

Pos logit contributions
ill+0.090
igsaw+0.089
lights+0.084
etti+0.080
ushes+0.078
Neg logit contributions
 asking-0.100
 trying-0.097
 mean-0.091
 hunters-0.090
 sneaking-0.090
Token wood
Token ID4898
Feature activation-1.11e-01

Pos logit contributions
igsaw+0.024
ill+0.022
lights+0.022
umm+0.020
etti+0.020
Neg logit contributions
 asking-0.015
 trying-0.014
 sneaking-0.013
 challenged-0.013
 telling-0.013
Tokencut
Token ID8968
Feature activation-2.97e-01

Pos logit contributions
 asking+0.165
 trying+0.154
 hunters+0.146
 sneaking+0.143
 cowboy+0.139
Neg logit contributions
ill-0.238
lights-0.229
igsaw-0.225
etti-0.209
ushes-0.208
Tokenters
Token ID1010
Feature activation-2.54e-02

Pos logit contributions
 asking+0.493
 trying+0.461
 sneaking+0.452
 mean+0.451
 hunters+0.445
Neg logit contributions
ill-0.565
lights-0.498
etti-0.495
igsaw-0.490
umm-0.453
Token heard
Token ID2982
Feature activation-1.05e-01

Pos logit contributions
 asking+0.037
 hunters+0.034
 trying+0.033
 cowboy+0.033
 sneaking+0.032
Neg logit contributions
ill-0.055
lights-0.049
igsaw-0.049
etti-0.047
ushes-0.046
Token them
Token ID606
Feature activation+9.10e-03

Pos logit contributions
 asking+0.154
 trying+0.150
 mean+0.136
 sneaking+0.134
 hunters+0.134
Neg logit contributions
ill-0.227
igsaw-0.221
lights-0.211
etti-0.204
ushes-0.198
Token and
Token ID290
Feature activation+2.35e-03

Pos logit contributions
ill+0.019
lights+0.016
etti+0.016
igsaw+0.016
aney+0.015
Neg logit contributions
 cowboy-0.014
 hunters-0.014
 asking-0.014
 doctors-0.013
 sheriff-0.013
Token stopped
Token ID5025
Feature activation-6.19e-02

Pos logit contributions
ill+0.005
lights+0.005
etti+0.005
igsaw+0.004
ushes+0.004
Neg logit contributions
 asking-0.003
 trying-0.003
 hunters-0.003
 mean-0.003
 cowboy-0.003
Token."
Token ID526
Feature activation+1.91e-02

Pos logit contributions
 asking+0.133
 trying+0.128
 cowboy+0.121
 hunters+0.119
 chased+0.119
Neg logit contributions
ill-0.144
igsaw-0.127
lights-0.125
etti-0.120
ushes-0.118
Token\n
Token ID198
Feature activation-6.07e-03

Pos logit contributions
igsaw+0.042
ill+0.040
lights+0.040
etti+0.039
umm+0.039
Neg logit contributions
 asking-0.029
 trying-0.028
 mean-0.026
 telling-0.024
 walking-0.024
Token\n
Token ID198
Feature activation-7.71e-03

Pos logit contributions
 asking+0.011
 trying+0.011
 mean+0.010
 hunters+0.010
 telling+0.010
Neg logit contributions
ill-0.011
igsaw-0.011
lights-0.010
etti-0.010
umm-0.010
TokenThe
Token ID464
Feature activation+1.83e-02

Pos logit contributions
 asking+0.015
 trying+0.015
 sneaking+0.014
 going+0.014
 walking+0.014
Neg logit contributions
ill-0.013
igsaw-0.012
lights-0.011
ushes-0.011
etti-0.011
Token wood
Token ID4898
Feature activation-1.17e-01

Pos logit contributions
umm+0.031
lights+0.030
igsaw+0.029
olk+0.028
+0.028
Neg logit contributions
 sneaking-0.025
 asking-0.024
 chased-0.023
 chasing-0.023
 challenged-0.023
Tokencut
Token ID8968
Feature activation-2.97e-01

Pos logit contributions
 asking+0.167
 trying+0.155
 sneaking+0.150
 going+0.145
 hunters+0.139
Neg logit contributions
ill-0.253
lights-0.248
igsaw-0.225
etti-0.222
ushes-0.220
Tokenter
Token ID353
Feature activation+5.69e-02

Pos logit contributions
 asking+0.513
 trying+0.495
 mean+0.459
 sneaking+0.455
 hunters+0.448
Neg logit contributions
ill-0.563
igsaw-0.545
lights-0.527
etti-0.508
ushes-0.487
Token sighed
Token ID21893
Feature activation-4.00e-02

Pos logit contributions
ill+0.099
lights+0.087
etti+0.082
igsaw+0.079
cake+0.079
Neg logit contributions
 asking-0.105
 trying-0.097
 hunters-0.096
 going-0.095
 cowboy-0.092
Token and
Token ID290
Feature activation-1.80e-02

Pos logit contributions
 asking+0.049
 trying+0.047
 mean+0.042
 sneaking+0.041
 hunters+0.040
Neg logit contributions
ill-0.096
igsaw-0.096
lights-0.091
etti-0.089
ushes-0.088
Token shook
Token ID14682
Feature activation-1.50e-02

Pos logit contributions
 asking+0.025
 trying+0.024
 hunters+0.022
 sneaking+0.022
 mean+0.021
Neg logit contributions
ill-0.038
lights-0.037
igsaw-0.036
etti-0.035
ushes-0.034
Token his
Token ID465
Feature activation-5.97e-02

Pos logit contributions
 asking+0.019
 trying+0.018
 mean+0.016
 hunters+0.016
 sneaking+0.016
Neg logit contributions
ill-0.036
igsaw-0.036
lights-0.034
etti-0.033
ushes-0.033
Token not
Token ID407
Feature activation-8.63e-02

Pos logit contributions
igsaw+0.058
ill+0.057
lights+0.056
etti+0.054
umm+0.054
Neg logit contributions
 asking-0.022
 trying-0.022
 mean-0.019
 telling-0.017
 sneaking-0.017
Token understand
Token ID1833
Feature activation+1.05e-02

Pos logit contributions
 asking+0.128
 trying+0.124
 mean+0.116
 sneaking+0.110
 hunters+0.110
Neg logit contributions
igsaw-0.186
ill-0.183
lights-0.178
etti-0.171
ushes-0.170
Token what
Token ID644
Feature activation+6.00e-03

Pos logit contributions
ill+0.022
igsaw+0.021
lights+0.021
etti+0.020
ushes+0.019
Neg logit contributions
 asking-0.016
 trying-0.015
 hunters-0.014
 mean-0.014
 sneaking-0.013
Token a
Token ID257
Feature activation-4.06e-02

Pos logit contributions
igsaw+0.012
ill+0.012
lights+0.012
etti+0.011
ushes+0.011
Neg logit contributions
 asking-0.009
 trying-0.008
 hunters-0.008
 cowboy-0.008
 gir-0.007
Token "
Token ID366
Feature activation-1.00e-01

Pos logit contributions
 asking+0.060
 trying+0.055
 hunters+0.051
ep+0.050
 telling+0.049
Neg logit contributions
igsaw-0.085
lights-0.075
ill-0.074
umm-0.074
 pleasure-0.071
Tokencut
Token ID8968
Feature activation-2.95e-01

Pos logit contributions
 asking+0.168
 trying+0.163
 mean+0.153
 telling+0.147
 hunters+0.147
Neg logit contributions
igsaw-0.198
ill-0.193
lights-0.187
etti-0.179
ushes-0.173
Token"
Token ID1
Feature activation-5.59e-02

Pos logit contributions
 asking+0.571
 trying+0.551
 mean+0.532
 sneaking+0.530
 chased+0.519
Neg logit contributions
ill-0.467
igsaw-0.424
umm-0.397
etti-0.395
lights-0.383
Token was
Token ID373
Feature activation+1.83e-02

Pos logit contributions
 asking+0.071
 trying+0.068
 mean+0.063
 telling+0.059
 hunters+0.059
Neg logit contributions
igsaw-0.135
ill-0.131
lights-0.127
etti-0.125
ushes-0.121
Token.
Token ID13
Feature activation+2.16e-02

Pos logit contributions
igsaw+0.036
 Jane+0.035
ill+0.034
 pleasure+0.032
 Sus+0.032
Neg logit contributions
 gir-0.017
 hunters-0.016
 cowboy-0.016
cod-0.016
 doctors-0.016
Token The
Token ID383
Feature activation+1.73e-02

Pos logit contributions
ill+0.039
igsaw+0.039
lights+0.038
etti+0.036
ushes+0.035
Neg logit contributions
 asking-0.039
 trying-0.038
 mean-0.035
 hunters-0.034
 telling-0.034
Token voice
Token ID3809
Feature activation-2.57e-02

Pos logit contributions
umm+0.035
igsaw+0.034
+0.033
lights+0.033
ill+0.032
Neg logit contributions
 asking-0.021
 sneaking-0.020
 trying-0.019
 chased-0.018
 cowboy-0.018
Token.
Token ID13
Feature activation-4.01e-03

Pos logit contributions
 asking+0.076
 trying+0.073
 mean+0.066
 hunters+0.065
 sneaking+0.064
Neg logit contributions
ill-0.135
igsaw-0.135
lights-0.128
etti-0.125
ushes-0.122
Token They
Token ID1119
Feature activation-3.77e-02

Pos logit contributions
 asking+0.007
 trying+0.007
 mean+0.007
 hunters+0.007
 doctors+0.006
Neg logit contributions
igsaw-0.007
ill-0.007
lights-0.007
etti-0.006
ushes-0.006
Token are
Token ID389
Feature activation+8.58e-02

Pos logit contributions
 asking+0.080
 trying+0.075
 sneaking+0.073
 hunters+0.072
 cowboy+0.071
Neg logit contributions
ill-0.054
igsaw-0.049
lights-0.047
etti-0.046
ushes-0.045
Token for
Token ID329
Feature activation-3.94e-03

Pos logit contributions
ushes+0.165
lights+0.164
ill+0.164
igsaw+0.161
 pleasure+0.145
Neg logit contributions
 challenged-0.103
 asking-0.101
 gir-0.098
 cowboy-0.095
 sneaking-0.091
Token wood
Token ID4898
Feature activation-1.12e-01

Pos logit contributions
 cowboy+0.004
 gir+0.003
 asking+0.003
 mean+0.003
 challenged+0.003
Neg logit contributions
igsaw-0.009
ill-0.009
lights-0.009
etti-0.008
ushes-0.008
Tokencut
Token ID8968
Feature activation-2.92e-01

Pos logit contributions
 asking+0.170
 trying+0.158
 sneaking+0.149
 mean+0.147
 walking+0.147
Neg logit contributions
ill-0.230
lights-0.226
igsaw-0.220
ushes-0.206
etti-0.202
Tokenters
Token ID1010
Feature activation-2.42e-02

Pos logit contributions
 asking+0.489
 trying+0.472
 mean+0.454
 sneaking+0.448
 cowboy+0.440
Neg logit contributions
ill-0.550
lights-0.518
igsaw-0.499
etti-0.492
umm-0.468
Token."
Token ID526
Feature activation+2.46e-02

Pos logit contributions
 asking+0.037
 trying+0.034
 cowboy+0.034
 sneaking+0.033
 gir+0.033
Neg logit contributions
ill-0.048
lights-0.043
igsaw-0.043
ushes-0.041
etti-0.040
Token They
Token ID1119
Feature activation-4.42e-02

Pos logit contributions
igsaw+0.051
ill+0.049
umm+0.048
etti+0.047
lights+0.046
Neg logit contributions
 asking-0.040
 trying-0.037
 telling-0.035
 mean-0.033
 hunters-0.033
Token took
Token ID1718
Feature activation-6.96e-02

Pos logit contributions
 asking+0.053
 trying+0.051
 mean+0.045
 hunters+0.044
 sneaking+0.044
Neg logit contributions
ill-0.108
igsaw-0.104
lights-0.101
etti-0.098
ushes-0.097
Token Lily
Token ID20037
Feature activation-2.33e-02

Pos logit contributions
 asking+0.092
 trying+0.089
 mean+0.082
 hunters+0.080
 guns+0.078
Neg logit contributions
ill-0.161
igsaw-0.160
lights-0.151
etti-0.146
umm-0.146
Token them
Token ID606
Feature activation+5.40e-04

Pos logit contributions
ill+0.120
igsaw+0.118
lights+0.115
etti+0.110
ushes+0.108
Neg logit contributions
 asking-0.083
 trying-0.081
 mean-0.074
 sneaking-0.073
 hunters-0.072
Token.
Token ID13
Feature activation-1.09e-02

Pos logit contributions
ill+0.001
lights+0.001
etti+0.001
igsaw+0.001
ushes+0.001
Neg logit contributions
 asking-0.001
 gir-0.001
 chased-0.001
 trying-0.001
 cowboy-0.001
Token But
Token ID887
Feature activation-4.59e-02

Pos logit contributions
 asking+0.019
 trying+0.018
 mean+0.017
 hunters+0.016
 telling+0.016
Neg logit contributions
igsaw-0.022
ill-0.021
lights-0.020
etti-0.020
umm-0.020
Token the
Token ID262
Feature activation-1.30e-02

Pos logit contributions
 asking+0.078
 trying+0.075
 mean+0.071
 sneaking+0.069
 hunters+0.069
Neg logit contributions
ill-0.086
igsaw-0.082
lights-0.080
etti-0.076
ushes-0.073
Token wood
Token ID4898
Feature activation-1.11e-01

Pos logit contributions
 asking+0.018
 sneaking+0.017
 challenged+0.017
 telling+0.016
 trying+0.016
Neg logit contributions
lights-0.024
umm-0.023
ushes-0.022
igsaw-0.022
ill-0.021
Tokencut
Token ID8968
Feature activation-2.91e-01

Pos logit contributions
 asking+0.141
 trying+0.129
 sneaking+0.124
 hunters+0.116
 mean+0.116
Neg logit contributions
ill-0.256
lights-0.254
igsaw-0.236
etti-0.229
ushes-0.226
Tokenter
Token ID353
Feature activation+6.30e-02

Pos logit contributions
 asking+0.512
 trying+0.493
 mean+0.458
 sneaking+0.456
 telling+0.448
Neg logit contributions
ill-0.543
igsaw-0.517
lights-0.508
etti-0.488
umm-0.463
Token heard
Token ID2982
Feature activation-1.12e-01

Pos logit contributions
ill+0.106
lights+0.089
kin+0.084
etti+0.082
cy+0.082
Neg logit contributions
 asking-0.111
 trying-0.099
 hunters-0.098
 going-0.097
 demanding-0.096
Token their
Token ID511
Feature activation-4.81e-02

Pos logit contributions
 asking+0.174
 trying+0.172
 sneaking+0.159
 gir+0.156
 challenged+0.155
Neg logit contributions
ill-0.218
lights-0.207
igsaw-0.199
etti-0.187
ushes-0.187
Token voices
Token ID10839
Feature activation-2.11e-03

Pos logit contributions
 asking+0.082
 trying+0.078
 challenged+0.077
eh+0.075
 sneaking+0.074
Neg logit contributions
lights-0.086
ushes-0.076
ill-0.073
umm-0.072
igsaw-0.072
Token and
Token ID290
Feature activation-1.13e-02

Pos logit contributions
 asking+0.003
 trying+0.003
 hunters+0.003
 gir+0.003
 cowboy+0.003
Neg logit contributions
ill-0.004
lights-0.004
etti-0.004
igsaw-0.004
ushes-0.003
Token ground
Token ID2323
Feature activation-6.23e-02

Pos logit contributions
igsaw+0.005
ill+0.005
etti+0.005
lights+0.005
umm+0.005
Neg logit contributions
 asking-0.004
 trying-0.004
 sneaking-0.004
 hunters-0.004
 walking-0.004
Token that
Token ID326
Feature activation-1.69e-02

Pos logit contributions
 asking+0.100
 cowboy+0.097
 hunters+0.095
 trying+0.093
 doctors+0.090
Neg logit contributions
ill-0.125
lights-0.114
igsaw-0.112
etti-0.109
cake-0.100
Token said
Token ID531
Feature activation-1.51e-01

Pos logit contributions
 asking+0.019
 trying+0.018
 mean+0.017
 telling+0.015
 going+0.015
Neg logit contributions
igsaw-0.044
ill-0.043
lights-0.041
etti-0.040
umm-0.040
Token "
Token ID366
Feature activation-1.02e-01

Pos logit contributions
 asking+0.188
 trying+0.184
 mean+0.170
 sneaking+0.157
 telling+0.156
Neg logit contributions
igsaw-0.360
ill-0.355
lights-0.349
umm-0.335
ushes-0.332
Tokenshort
Token ID19509
Feature activation-4.80e-02

Pos logit contributions
 asking+0.177
 trying+0.171
 mean+0.159
 hunters+0.154
 sneaking+0.154
Neg logit contributions
igsaw-0.194
ill-0.193
lights-0.186
etti-0.176
ushes-0.172
Tokencut
Token ID8968
Feature activation-2.88e-01

Pos logit contributions
 asking+0.094
 trying+0.091
 sneaking+0.084
 mean+0.084
 hunters+0.083
Neg logit contributions
igsaw-0.079
ill-0.079
etti-0.071
lights-0.068
umm-0.067
Token".
Token ID1911
Feature activation+5.43e-02

Pos logit contributions
 asking+0.535
 trying+0.518
 mean+0.499
 sneaking+0.491
 dangerous+0.490
Neg logit contributions
ill-0.492
igsaw-0.455
etti-0.442
umm-0.434
aire-0.406
Token She
Token ID1375
Feature activation+4.94e-02

Pos logit contributions
lights+0.101
ill+0.101
igsaw+0.097
etti+0.092
ushes+0.091
Neg logit contributions
 asking-0.098
 trying-0.094
 mean-0.086
 le-0.084
 telling-0.084
Token thought
Token ID1807
Feature activation-1.32e-01

Pos logit contributions
ill+0.103
igsaw+0.098
etti+0.096
lights+0.094
ushes+0.092
Neg logit contributions
 asking-0.069
 trying-0.066
 hunters-0.064
 mean-0.062
 sneaking-0.062
Token it
Token ID340
Feature activation+7.20e-02

Pos logit contributions
 asking+0.207
 trying+0.197
 hunters+0.181
 mean+0.180
 sneaking+0.180
Neg logit contributions
ill-0.266
igsaw-0.264
lights-0.251
etti-0.239
ushes-0.236
Token would
Token ID561
Feature activation-6.21e-02

Pos logit contributions
ill+0.123
igsaw+0.117
lights+0.116
etti+0.115
ushes+0.106
Neg logit contributions
 asking-0.126
 hunters-0.119
 trying-0.118
 sneaking-0.112
 stealing-0.111
Token little
Token ID1310
Feature activation-1.02e-01

Pos logit contributions
 asking+0.054
 trying+0.051
 hunters+0.047
 sneaking+0.046
 mean+0.045
Neg logit contributions
igsaw-0.085
ill-0.085
lights-0.081
etti-0.078
ushes-0.077
Token boy
Token ID2933
Feature activation-1.10e-01

Pos logit contributions
 asking+0.153
 trying+0.145
 hunters+0.132
 sneaking+0.132
 mean+0.131
Neg logit contributions
ill-0.212
igsaw-0.211
lights-0.200
etti-0.193
umm-0.190
Token getting
Token ID1972
Feature activation-4.84e-02

Pos logit contributions
 cowboy+0.130
 hunters+0.129
 asking+0.125
 sheriff+0.122
 mean+0.120
Neg logit contributions
ill-0.241
etti-0.219
igsaw-0.208
cy-0.199
-0.195
Token a
Token ID257
Feature activation-3.84e-02

Pos logit contributions
 asking+0.057
 trying+0.054
 sneaking+0.052
 hunters+0.052
 gir+0.051
Neg logit contributions
ill-0.111
igsaw-0.104
ushes-0.103
lights-0.103
etti-0.097
Token buzz
Token ID14713
Feature activation-8.34e-02

Pos logit contributions
 asking+0.062
 trying+0.061
 mean+0.056
 sneaking+0.054
 telling+0.054
Neg logit contributions
ill-0.078
igsaw-0.076
lights-0.074
etti-0.071
ushes-0.069
Tokencut
Token ID8968
Feature activation-2.87e-01

Pos logit contributions
 asking+0.137
 trying+0.127
 hunters+0.116
 sneaking+0.116
 stealing+0.115
Neg logit contributions
ill-0.172
igsaw-0.164
lights-0.157
etti-0.150
umm-0.145
Token and
Token ID290
Feature activation+1.41e-02

Pos logit contributions
 asking+0.520
 mean+0.496
 trying+0.494
 sneaking+0.479
 dangerous+0.478
Neg logit contributions
ill-0.506
etti-0.424
lights-0.421
igsaw-0.412
umm-0.408
Token a
Token ID257
Feature activation-3.97e-02

Pos logit contributions
ill+0.029
igsaw+0.028
etti+0.028
lights+0.027
ushes+0.026
Neg logit contributions
 hunters-0.017
 mean-0.016
 asking-0.016
 cowboy-0.016
 trying-0.015
Token lady
Token ID10846
Feature activation-5.50e-02

Pos logit contributions
 asking+0.061
 trying+0.057
 hunters+0.055
 sneaking+0.052
 telling+0.052
Neg logit contributions
igsaw-0.081
ill-0.079
lights-0.075
etti-0.072
ushes-0.072
Token getting
Token ID1972
Feature activation-4.90e-02

Pos logit contributions
 hunters+0.076
 cowboy+0.076
 sheriff+0.073
 asking+0.072
 pistols+0.068
Neg logit contributions
ill-0.116
etti-0.104
lights-0.099
cy-0.095
igsaw-0.094
Token a
Token ID257
Feature activation-3.93e-02

Pos logit contributions
 asking+0.070
 gir+0.067
 trying+0.067
 sneaking+0.067
 cowboy+0.067
Neg logit contributions
ill-0.091
ushes-0.088
lights-0.087
igsaw-0.085
etti-0.080
Token.
Token ID13
Feature activation+2.88e-02

Pos logit contributions
 asking+0.154
 trying+0.149
 hunters+0.139
 mean+0.136
 sneaking+0.134
Neg logit contributions
ill-0.281
igsaw-0.278
lights-0.264
etti-0.254
ushes-0.251
Token
Token ID198
Feature activation+3.40e-02

Pos logit contributions
ill+0.052
igsaw+0.052
lights+0.051
etti+0.048
ushes+0.047
Neg logit contributions
 asking-0.053
 trying-0.050
 telling-0.045
 mean-0.045
 walking-0.045
Token
Token ID198
Feature activation+2.98e-02

Pos logit contributions
ill+0.056
igsaw+0.055
lights+0.051
ushes+0.048
etti+0.048
Neg logit contributions
 asking-0.068
 trying-0.066
 mean-0.062
 hunters-0.062
 sneaking-0.061
TokenThe
Token ID464
Feature activation+4.35e-02

Pos logit contributions
ill+0.050
igsaw+0.050
+0.042
ushes+0.042
lights+0.042
Neg logit contributions
 asking-0.058
 trying-0.056
 hunters-0.053
 sneaking-0.052
 mean-0.052
Token wood
Token ID4898
Feature activation-9.69e-02

Pos logit contributions
igsaw+0.085
umm+0.079
+0.076
lights+0.071
cake+0.070
Neg logit contributions
 asking-0.063
 guns-0.057
 trying-0.057
 sneaking-0.056
 challenged-0.056
Tokencut
Token ID8968
Feature activation-2.85e-01

Pos logit contributions
 asking+0.100
 sneaking+0.093
 trying+0.092
 hunters+0.086
 cowboy+0.085
Neg logit contributions
ill-0.235
lights-0.229
igsaw-0.220
cake-0.209
ushes-0.208
Tokenter
Token ID353
Feature activation+6.61e-02

Pos logit contributions
 asking+0.517
 trying+0.498
 mean+0.471
 sneaking+0.469
 hunters+0.461
Neg logit contributions
ill-0.510
igsaw-0.476
lights-0.461
etti-0.449
umm-0.425
Token was
Token ID373
Feature activation+2.42e-02

Pos logit contributions
ill+0.128
lights+0.112
igsaw+0.112
etti+0.107
aire+0.106
Neg logit contributions
 asking-0.098
 hunters-0.092
 trying-0.092
 sneaking-0.090
 mean-0.088
Token sad
Token ID6507
Feature activation-7.61e-02

Pos logit contributions
ill+0.050
igsaw+0.047
ushes+0.046
lights+0.045
aney+0.043
Neg logit contributions
 hunters-0.029
 gir-0.029
 guns-0.029
 asking-0.029
 cowboy-0.029
Token,
Token ID11
Feature activation+7.79e-04

Pos logit contributions
 asking+0.095
 trying+0.092
 mean+0.082
 telling+0.079
 sneaking+0.079
Neg logit contributions
ill-0.182
igsaw-0.182
lights-0.173
etti-0.169
ushes-0.166
Token but
Token ID475
Feature activation-2.45e-02

Pos logit contributions
ill+0.002
igsaw+0.002
lights+0.002
etti+0.001
umm+0.001
Neg logit contributions
 asking-0.001
 trying-0.001
 hunters-0.001
 mean-0.001
 sneaking-0.001
Token too
Token ID1165
Feature activation+7.87e-02

Pos logit contributions
ill+0.141
lights+0.136
igsaw+0.133
ushes+0.129
etti+0.121
Neg logit contributions
 asking-0.093
 trying-0.084
 cowboy-0.084
 hunters-0.083
 sneaking-0.083
Token big
Token ID1263
Feature activation+9.32e-03

Pos logit contributions
ill+0.181
igsaw+0.180
lights+0.173
ushes+0.166
etti+0.163
Neg logit contributions
 asking-0.105
 trying-0.099
 sneaking-0.089
 hunters-0.088
 telling-0.087
Token.
Token ID13
Feature activation+1.70e-02

Pos logit contributions
ill+0.017
igsaw+0.016
lights+0.016
umm+0.015
etti+0.015
Neg logit contributions
ep-0.014
 asking-0.014
 mean-0.013
 challenged-0.013
cod-0.013
Token The
Token ID383
Feature activation+2.20e-02

Pos logit contributions
igsaw+0.030
ill+0.030
lights+0.029
etti+0.027
ushes+0.027
Neg logit contributions
 asking-0.031
 trying-0.031
 hunters-0.028
 mean-0.028
 telling-0.027
Token wood
Token ID4898
Feature activation-1.07e-01

Pos logit contributions
umm+0.041
ill+0.041
lights+0.040
igsaw+0.040
ushes+0.039
Neg logit contributions
 asking-0.031
 sneaking-0.029
 yells-0.028
 trying-0.028
 challenged-0.027
Tokencut
Token ID8968
Feature activation-2.85e-01

Pos logit contributions
 asking+0.139
 trying+0.129
 sneaking+0.125
 mean+0.119
 hunters+0.117
Neg logit contributions
ill-0.243
lights-0.238
igsaw-0.225
etti-0.218
ushes-0.215
Tokenter
Token ID353
Feature activation+6.95e-02

Pos logit contributions
 asking+0.518
 trying+0.497
 mean+0.468
 sneaking+0.466
 telling+0.455
Neg logit contributions
ill-0.513
igsaw-0.480
lights-0.472
etti-0.458
umm-0.437
Token could
Token ID714
Feature activation+3.34e-02

Pos logit contributions
ill+0.121
lights+0.100
etti+0.096
+0.096
cy+0.096
Neg logit contributions
 asking-0.120
 hunters-0.108
 sneaking-0.107
 cowboy-0.107
 doctors-0.106
Token not
Token ID407
Feature activation-7.76e-02

Pos logit contributions
ill+0.074
igsaw+0.073
lights+0.070
etti+0.068
ushes+0.067
Neg logit contributions
 asking-0.047
 trying-0.045
 mean-0.041
 hunters-0.040
 sneaking-0.040
Token part
Token ID636
Feature activation+1.31e-01

Pos logit contributions
 asking+0.115
 trying+0.112
 mean+0.103
 hunters+0.099
 sneaking+0.099
Neg logit contributions
igsaw-0.166
ill-0.165
lights-0.159
etti-0.153
ushes-0.151
Token them
Token ID606
Feature activation+1.72e-02

Pos logit contributions
ill+0.290
igsaw+0.289
lights+0.275
etti+0.267
ushes+0.262
Neg logit contributions
 asking-0.185
 trying-0.179
 mean-0.162
 hunters-0.158
 sneaking-0.158
Token his
Token ID465
Feature activation-2.73e-02

Pos logit contributions
ill+0.204
igsaw+0.203
lights+0.193
etti+0.187
ushes+0.183
Neg logit contributions
 asking-0.138
 trying-0.133
 mean-0.121
 sneaking-0.118
 hunters-0.118
Token friends
Token ID2460
Feature activation+4.27e-02

Pos logit contributions
 asking+0.039
 trying+0.036
 sneaking+0.035
 telling+0.034
 challenged+0.034
Neg logit contributions
igsaw-0.057
ill-0.055
lights-0.054
ushes-0.053
umm-0.051
Token,
Token ID11
Feature activation+1.90e-02

Pos logit contributions
ill+0.099
igsaw+0.098
lights+0.093
etti+0.091
ushes+0.089
Neg logit contributions
 asking-0.056
 trying-0.053
 mean-0.049
 hunters-0.048
 sneaking-0.047
Token the
Token ID262
Feature activation+2.43e-02

Pos logit contributions
ill+0.038
igsaw+0.037
lights+0.035
etti+0.034
ushes+0.034
Neg logit contributions
 asking-0.031
 trying-0.030
 mean-0.028
 hunters-0.027
 sneaking-0.027
Token wood
Token ID4898
Feature activation-9.89e-02

Pos logit contributions
igsaw+0.049
umm+0.046
ill+0.046
lights+0.045
 pleasure+0.044
Neg logit contributions
 asking-0.033
 sneaking-0.030
 trying-0.029
 yells-0.029
 challenged-0.029
Tokencut
Token ID8968
Feature activation-2.85e-01

Pos logit contributions
 asking+0.137
 trying+0.127
 sneaking+0.124
 mean+0.122
 hunters+0.117
Neg logit contributions
ill-0.217
lights-0.209
igsaw-0.199
etti-0.194
ushes-0.192
Tokenter
Token ID353
Feature activation+7.49e-02

Pos logit contributions
 asking+0.504
 trying+0.481
 mean+0.460
 sneaking+0.458
 hunters+0.440
Neg logit contributions
ill-0.529
lights-0.482
igsaw-0.481
etti-0.467
umm-0.440
Token was
Token ID373
Feature activation+3.46e-02

Pos logit contributions
ill+0.148
lights+0.130
etti+0.126
igsaw+0.122
ushes+0.121
Neg logit contributions
 asking-0.114
 mean-0.107
 stealing-0.104
 trying-0.104
 sneaking-0.104
Token able
Token ID1498
Feature activation+7.33e-02

Pos logit contributions
ill+0.068
igsaw+0.063
lights+0.062
ushes+0.061
etti+0.060
Neg logit contributions
 asking-0.055
 trying-0.052
 hunters-0.050
 sneaking-0.050
 mean-0.049
Token to
Token ID284
Feature activation-8.69e-02

Pos logit contributions
ill+0.164
igsaw+0.157
lights+0.150
ushes+0.148
etti+0.145
Neg logit contributions
 asking-0.096
 trying-0.089
 mean-0.085
 cowboy-0.085
 sneaking-0.084
Token part
Token ID636
Feature activation+1.47e-01

Pos logit contributions
 asking+0.131
 trying+0.125
 mean+0.115
 hunters+0.114
 sneaking+0.113
Neg logit contributions
ill-0.183
igsaw-0.180
lights-0.170
etti-0.167
ushes-0.162
Token time
Token ID640
Feature activation-4.24e-02

Pos logit contributions
 asking+0.048
 trying+0.045
 hunters+0.042
 telling+0.040
 sneaking+0.039
Neg logit contributions
ill-0.078
igsaw-0.077
lights-0.071
ushes-0.069
aire-0.068
Token there
Token ID612
Feature activation+3.29e-02

Pos logit contributions
 asking+0.052
 trying+0.050
 mean+0.044
 hunters+0.043
 sneaking+0.043
Neg logit contributions
ill-0.102
igsaw-0.101
lights-0.097
etti-0.094
ushes-0.093
Token was
Token ID373
Feature activation+2.55e-02

Pos logit contributions
igsaw+0.090
lights+0.085
ushes+0.082
ill+0.082
umm+0.082
Neg logit contributions
 trying-0.034
 asking-0.033
 mean-0.029
 walking-0.028
 z-0.027
Token a
Token ID257
Feature activation-3.55e-02

Pos logit contributions
ill+0.058
igsaw+0.057
lights+0.053
etti+0.053
ushes+0.051
Neg logit contributions
 asking-0.035
 trying-0.032
 mean-0.030
 sneaking-0.030
 telling-0.029
Token wood
Token ID4898
Feature activation-1.09e-01

Pos logit contributions
 asking+0.046
 trying+0.042
 sneaking+0.037
 hunters+0.037
 telling+0.037
Neg logit contributions
igsaw-0.082
ill-0.080
lights-0.075
aire-0.075
etti-0.075
Tokencut
Token ID8968
Feature activation-2.85e-01

Pos logit contributions
 asking+0.169
 trying+0.153
 sneaking+0.144
 yells+0.143
 demanding+0.139
Neg logit contributions
ill-0.208
lights-0.207
igsaw-0.201
etti-0.185
kin-0.183
Tokenter
Token ID353
Feature activation+7.07e-02

Pos logit contributions
 asking+0.589
 trying+0.559
 mean+0.554
 sneaking+0.550
 demanding+0.534
Neg logit contributions
ill-0.405
lights-0.327
etti-0.327
umm-0.292
cy-0.288
Token who
Token ID508
Feature activation-1.32e-01

Pos logit contributions
ill+0.112
cy+0.096
lights+0.092
ment+0.089
ossom+0.088
Neg logit contributions
 asking-0.124
 robbers-0.118
 yells-0.117
 thieves-0.117
 cowboy-0.116
Token was
Token ID373
Feature activation+2.46e-02

Pos logit contributions
 asking+0.245
 trying+0.227
 cowboy+0.218
 hunters+0.218
 sneaking+0.217
Neg logit contributions
ill-0.209
lights-0.199
igsaw-0.197
ushes-0.193
etti-0.189
Token strong
Token ID1913
Feature activation+3.24e-02

Pos logit contributions
ushes+0.024
+0.023
 moist+0.023
ill+0.022
 perfection+0.022
Neg logit contributions
 cowboy-0.045
 guns-0.043
 sheriff-0.042
 yells-0.041
 goats-0.041
Token and
Token ID290
Feature activation+1.54e-02

Pos logit contributions
ill+0.048
lights+0.047
umm+0.045
igsaw+0.045
etti+0.042
Neg logit contributions
 asking-0.057
have-0.053
 challenged-0.053
ep-0.053
 sneaking-0.051
TokenOne
Token ID3198
Feature activation-9.66e-02

Pos logit contributions
ill+0.022
igsaw+0.020
lights+0.018
ushes+0.017
cake+0.017
Neg logit contributions
 asking-0.027
 thieves-0.026
 trying-0.025
 guns-0.024
 yells-0.024
Token day
Token ID1110
Feature activation-4.28e-02

Pos logit contributions
 asking+0.151
 trying+0.145
 sneaking+0.130
 mean+0.128
 hunters+0.128
Neg logit contributions
ill-0.201
igsaw-0.199
lights-0.188
etti-0.179
ushes-0.178
Token,
Token ID11
Feature activation-7.10e-03

Pos logit contributions
 asking+0.068
 trying+0.063
 sneaking+0.060
 telling+0.060
 mean+0.059
Neg logit contributions
ill-0.082
igsaw-0.080
lights-0.076
etti-0.076
aire-0.072
Token the
Token ID262
Feature activation+3.90e-03

Pos logit contributions
 asking+0.011
 trying+0.010
 sneaking+0.010
 mean+0.009
 telling+0.009
Neg logit contributions
ill-0.014
igsaw-0.014
lights-0.013
etti-0.013
-0.013
Token wood
Token ID4898
Feature activation-1.06e-01

Pos logit contributions
umm+0.007
lights+0.007
ill+0.007
+0.006
igsaw+0.006
Neg logit contributions
 asking-0.006
 yells-0.005
 sneaking-0.005
 telling-0.005
 trying-0.005
Tokencut
Token ID8968
Feature activation-2.84e-01

Pos logit contributions
 asking+0.148
 sneaking+0.145
 cowboy+0.137
 trying+0.136
 chasing+0.130
Neg logit contributions
ill-0.210
lights-0.202
cake-0.177
ment-0.174
kin-0.172
Tokenter
Token ID353
Feature activation+6.79e-02

Pos logit contributions
 asking+0.527
 trying+0.508
 sneaking+0.480
 mean+0.473
 telling+0.464
Neg logit contributions
ill-0.494
lights-0.453
igsaw-0.450
etti-0.438
umm-0.413
Token got
Token ID1392
Feature activation-3.04e-02

Pos logit contributions
ill+0.115
cy+0.103
aire+0.094
ossom+0.089
piece+0.082
Neg logit contributions
 asking-0.111
 cowboy-0.106
 tresp-0.105
 training-0.103
 robber-0.102
Token an
Token ID281
Feature activation-1.63e-02

Pos logit contributions
 asking+0.042
 gir+0.042
 cowboy+0.042
 sneaking+0.041
 sheep+0.039
Neg logit contributions
ill-0.055
ushes-0.055
lights-0.053
igsaw-0.050
etti-0.047
Token idea
Token ID2126
Feature activation-2.50e-02

Pos logit contributions
 asking+0.024
 sneaking+0.022
 trying+0.022
 chasing+0.021
 chased+0.021
Neg logit contributions
igsaw-0.033
umm-0.032
ill-0.031
lights-0.030
ushes-0.029
Token to
Token ID284
Feature activation-1.04e-01

Pos logit contributions
 cowboy+0.043
 asking+0.042
 sheriff+0.042
 gir+0.041
 mean+0.041
Neg logit contributions
ill-0.043
igsaw-0.035
etti-0.035
lights-0.034
cake-0.033
Token there
Token ID612
Feature activation+3.29e-02

Pos logit contributions
 asking+0.005
 trying+0.004
 mean+0.004
 sneaking+0.004
 hunters+0.004
Neg logit contributions
igsaw-0.012
ill-0.012
lights-0.011
etti-0.011
umm-0.011
Token was
Token ID373
Feature activation+2.37e-02

Pos logit contributions
igsaw+0.089
lights+0.085
ill+0.083
ushes+0.082
umm+0.082
Neg logit contributions
 trying-0.035
 asking-0.033
 walking-0.030
 mean-0.029
 z-0.028
Token a
Token ID257
Feature activation-3.75e-02

Pos logit contributions
ill+0.055
igsaw+0.055
lights+0.052
etti+0.051
ushes+0.050
Neg logit contributions
 asking-0.031
 trying-0.029
 mean-0.026
 sneaking-0.026
 telling-0.025
Token kind
Token ID1611
Feature activation-3.37e-02

Pos logit contributions
 asking+0.047
 trying+0.043
 sneaking+0.038
 hunters+0.038
 telling+0.038
Neg logit contributions
igsaw-0.088
ill-0.086
lights-0.081
etti-0.080
aire-0.079
Token wood
Token ID4898
Feature activation-1.05e-01

Pos logit contributions
 asking+0.030
 mean+0.029
 trying+0.028
 gir+0.027
 sheep+0.027
Neg logit contributions
ill-0.091
igsaw-0.090
lights-0.089
etti-0.087
ushes-0.086
Tokencut
Token ID8968
Feature activation-2.83e-01

Pos logit contributions
 asking+0.137
 trying+0.126
 yells+0.119
 sneaking+0.116
 mean+0.113
Neg logit contributions
ill-0.223
igsaw-0.220
lights-0.215
etti-0.203
cake-0.202
Tokenter
Token ID353
Feature activation+6.92e-02

Pos logit contributions
 asking+0.575
 trying+0.547
 mean+0.534
 sneaking+0.533
 demanding+0.513
Neg logit contributions
ill-0.425
etti-0.365
lights-0.360
igsaw-0.331
umm-0.328
Token who
Token ID508
Feature activation-1.33e-01

Pos logit contributions
ill+0.131
cy+0.116
ment+0.112
lights+0.111
ugg+0.110
Neg logit contributions
 asking-0.106
 yells-0.097
 trying-0.094
 robbers-0.094
 thieves-0.093
Token lived
Token ID5615
Feature activation-7.50e-03

Pos logit contributions
 asking+0.248
 trying+0.234
 sneaking+0.221
 hunters+0.219
 telling+0.215
Neg logit contributions
ill-0.214
igsaw-0.207
lights-0.202
etti-0.198
ushes-0.196
Token in
Token ID287
Feature activation+9.16e-02

Pos logit contributions
 yells+0.013
 cowboy+0.013
 gir+0.013
 asking+0.013
 goats+0.013
Neg logit contributions
ushes-0.011
ill-0.011
aps-0.009
utter-0.009
igsaw-0.009
Token the
Token ID262
Feature activation+6.07e-03

Pos logit contributions
igsaw+0.189
ill+0.181
lights+0.168
etti+0.167
ushes+0.164
Neg logit contributions
 asking-0.135
 trying-0.127
 cowboy-0.119
 challenged-0.119
 sneaking-0.119
Token
Token ID220
Feature activation+4.26e-02

Pos logit contributions
lights+0.048
ill+0.046
igsaw+0.046
etti+0.043
ushes+0.043
Neg logit contributions
 asking-0.042
 trying-0.040
 telling-0.036
 le-0.036
 walking-0.035
Token
Token ID198
Feature activation+3.39e-02

Pos logit contributions
igsaw+0.086
lights+0.086
ill+0.084
umm+0.080
etti+0.079
Neg logit contributions
 asking-0.068
 trying-0.066
 walking-0.058
 hunters-0.058
 telling-0.057
Token
Token ID198
Feature activation+3.14e-02

Pos logit contributions
igsaw+0.065
ill+0.064
lights+0.063
etti+0.060
umm+0.059
Neg logit contributions
 asking-0.058
 trying-0.056
 mean-0.051
 telling-0.051
 sneaking-0.050
TokenThe
Token ID464
Feature activation+4.32e-02

Pos logit contributions
ill+0.053
igsaw+0.052
lights+0.046
ushes+0.045
etti+0.045
Neg logit contributions
 asking-0.063
 trying-0.061
 mean-0.056
 hunters-0.056
 sneaking-0.056
Token wood
Token ID4898
Feature activation-9.89e-02

Pos logit contributions
igsaw+0.080
umm+0.076
+0.072
ill+0.066
 pleasure+0.066
Neg logit contributions
 asking-0.062
 guns-0.058
 yells-0.058
 challenged-0.057
 sneaking-0.057
Tokencut
Token ID8968
Feature activation-2.83e-01

Pos logit contributions
 asking+0.091
 trying+0.082
 sneaking+0.081
 mean+0.078
 cowboy+0.075
Neg logit contributions
ill-0.255
lights-0.246
igsaw-0.242
etti-0.231
ushes-0.230
Tokenter
Token ID353
Feature activation+6.74e-02

Pos logit contributions
 asking+0.508
 trying+0.488
 mean+0.463
 sneaking+0.459
 hunters+0.448
Neg logit contributions
ill-0.510
igsaw-0.483
lights-0.467
etti-0.455
umm-0.433
Token was
Token ID373
Feature activation+2.52e-02

Pos logit contributions
ill+0.128
igsaw+0.114
lights+0.111
etti+0.110
ushes+0.105
Neg logit contributions
 asking-0.109
 trying-0.102
 mean-0.102
 hunters-0.102
 sneaking-0.100
Token pleased
Token ID10607
Feature activation+8.57e-02

Pos logit contributions
ill+0.052
igsaw+0.049
ushes+0.047
lights+0.046
etti+0.046
Neg logit contributions
 asking-0.036
 hunters-0.034
 guns-0.033
 trying-0.033
 cowboy-0.033
Token with
Token ID351
Feature activation+9.82e-02

Pos logit contributions
ill+0.194
igsaw+0.191
lights+0.183
etti+0.178
ushes+0.174
Neg logit contributions
 asking-0.118
 trying-0.113
 mean-0.103
 sneaking-0.100
 hunters-0.100
Token himself
Token ID2241
Feature activation+8.96e-02

Pos logit contributions
igsaw+0.208
ill+0.207
lights+0.194
etti+0.187
ushes+0.185
Neg logit contributions
 asking-0.142
 trying-0.136
 mean-0.128
 sneaking-0.125
 hunters-0.125
Token was
Token ID373
Feature activation+2.60e-02

Pos logit contributions
igsaw+0.088
ill+0.088
lights+0.083
etti+0.081
ushes+0.080
Neg logit contributions
 asking-0.059
 trying-0.057
 mean-0.052
 hunters-0.050
 sneaking-0.050
Token mild
Token ID11607
Feature activation+6.40e-02

Pos logit contributions
ill+0.052
lights+0.052
igsaw+0.048
ushes+0.047
etti+0.045
Neg logit contributions
 asking-0.038
 hunters-0.036
 trying-0.035
 gir-0.034
 sneaking-0.034
Token.
Token ID13
Feature activation+1.71e-02

Pos logit contributions
ill+0.147
igsaw+0.146
lights+0.140
etti+0.135
umm+0.133
Neg logit contributions
 asking-0.085
 trying-0.082
 mean-0.074
 sneaking-0.072
 hunters-0.072
Token The
Token ID383
Feature activation+1.98e-02

Pos logit contributions
ill+0.032
igsaw+0.031
lights+0.031
etti+0.029
ushes+0.029
Neg logit contributions
 asking-0.031
 trying-0.030
 hunters-0.027
 mean-0.027
 walking-0.027
Token wood
Token ID4898
Feature activation-1.06e-01

Pos logit contributions
umm+0.038
lights+0.036
+0.035
ill+0.035
igsaw+0.035
Neg logit contributions
 asking-0.027
 sneaking-0.026
 yells-0.026
 trying-0.024
 chasing-0.024
Tokencut
Token ID8968
Feature activation-2.83e-01

Pos logit contributions
 asking+0.132
 trying+0.121
 sneaking+0.120
 hunters+0.113
 stealing+0.109
Neg logit contributions
lights-0.242
ill-0.237
igsaw-0.218
etti-0.213
ushes-0.211
Tokenter
Token ID353
Feature activation+7.02e-02

Pos logit contributions
 asking+0.502
 trying+0.484
 mean+0.450
 sneaking+0.449
 telling+0.439
Neg logit contributions
ill-0.520
igsaw-0.498
lights-0.486
etti-0.467
umm-0.448
Token worked
Token ID3111
Feature activation-8.89e-02

Pos logit contributions
ill+0.120
lights+0.111
cy+0.098
etti+0.096
aire+0.096
Neg logit contributions
 asking-0.121
 stealing-0.112
 trying-0.111
 sneaking-0.111
 hunters-0.110
Token for
Token ID329
Feature activation+9.52e-03

Pos logit contributions
 asking+0.117
 trying+0.109
 mean+0.101
 hunters+0.101
 sneaking+0.100
Neg logit contributions
ill-0.202
igsaw-0.200
lights-0.194
ushes-0.184
etti-0.184
Token a
Token ID257
Feature activation-3.77e-02

Pos logit contributions
igsaw+0.020
ill+0.020
lights+0.019
etti+0.018
ushes+0.018
Neg logit contributions
 asking-0.013
 trying-0.013
 mean-0.012
 sneaking-0.012
 telling-0.012
Token long
Token ID890
Feature activation-7.51e-02

Pos logit contributions
 asking+0.059
 hunters+0.053
 trying+0.053
 yells+0.051
 stealing+0.051
Neg logit contributions
igsaw-0.076
ill-0.067
-0.066
lights-0.065
aire-0.065
Token ever
Token ID1683
Feature activation+1.85e-01

Pos logit contributions
 asking+0.146
 trying+0.138
 mean+0.128
 sneaking+0.125
 hunters+0.123
Neg logit contributions
ill-0.233
igsaw-0.228
lights-0.222
etti-0.212
ushes-0.207
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
ill+0.374
igsaw+0.330
lights+0.328
etti+0.308
umm+0.300
Neg logit contributions
 asking-0.286
 cowboy-0.279
 trying-0.269
 sheriff-0.268
 mean-0.268
Token So
Token ID1406
Feature activation-4.29e-02

Pos logit contributions
ill+0.024
igsaw+0.023
lights+0.022
etti+0.021
ushes+0.021
Neg logit contributions
 asking-0.024
 trying-0.023
 mean-0.021
 hunters-0.021
 telling-0.021
Token the
Token ID262
Feature activation+9.53e-03

Pos logit contributions
 asking+0.060
 trying+0.057
 mean+0.055
 sneaking+0.054
 hunters+0.052
Neg logit contributions
ill-0.092
igsaw-0.091
lights-0.088
etti-0.085
umm-0.083
Token wood
Token ID4898
Feature activation-1.06e-01

Pos logit contributions
umm+0.017
lights+0.015
ill+0.015
 perfection+0.015
igsaw+0.014
Neg logit contributions
 yells-0.014
 sneaking-0.014
 asking-0.013
 chasing-0.013
 barking-0.013
Tokencut
Token ID8968
Feature activation-2.83e-01

Pos logit contributions
 asking+0.121
 sneaking+0.116
 trying+0.112
 mean+0.105
 stealing+0.104
Neg logit contributions
ill-0.254
lights-0.244
igsaw-0.225
ushes-0.224
etti-0.222
Tokenter
Token ID353
Feature activation+7.09e-02

Pos logit contributions
 asking+0.555
 trying+0.531
 sneaking+0.510
 mean+0.505
 telling+0.496
Neg logit contributions
ill-0.462
lights-0.412
etti-0.404
igsaw-0.402
umm-0.375
Token went
Token ID1816
Feature activation+2.05e-03

Pos logit contributions
ill+0.132
cy+0.113
aire+0.108
lights+0.108
etti+0.107
Neg logit contributions
 asking-0.112
 stealing-0.106
 sneaking-0.105
 trying-0.104
 demanding-0.102
Token to
Token ID284
Feature activation-1.01e-01

Pos logit contributions
igsaw+0.003
ill+0.003
lights+0.003
+0.003
etti+0.003
Neg logit contributions
cod-0.004
 yells-0.003
have-0.003
 goats-0.003
 asking-0.003
Token the
Token ID262
Feature activation+9.72e-03

Pos logit contributions
 asking+0.152
 trying+0.145
 mean+0.143
 hunters+0.141
 sneaking+0.137
Neg logit contributions
ill-0.201
igsaw-0.201
etti-0.184
lights-0.177
ushes-0.173
Token tallest
Token ID38760
Feature activation+1.06e-01

Pos logit contributions
umm+0.018
ill+0.017
igsaw+0.016
 perfection+0.016
aire+0.015
Neg logit contributions
 asking-0.015
 yells-0.014
 trying-0.013
 chased-0.013
 sneaking-0.013
Token he
Token ID339
Feature activation-6.67e-02

Pos logit contributions
 asking+0.224
 trying+0.208
 hunters+0.198
 sneaking+0.196
ep+0.189
Neg logit contributions
ill-0.275
igsaw-0.263
lights-0.260
etti-0.245
ushes-0.231
Token wanted
Token ID2227
Feature activation-1.83e-01

Pos logit contributions
 asking+0.116
 trying+0.106
 sneaking+0.104
 hunters+0.103
 cowboy+0.098
Neg logit contributions
ill-0.119
etti-0.101
ushes-0.100
lights-0.100
igsaw-0.098
Token to
Token ID284
Feature activation-1.00e-01

Pos logit contributions
 asking+0.250
 trying+0.242
ep+0.233
 gir+0.230
 hunters+0.229
Neg logit contributions
ill-0.376
igsaw-0.364
lights-0.343
etti-0.324
ushes-0.316
Token play
Token ID711
Feature activation+1.30e-02

Pos logit contributions
 asking+0.155
 trying+0.150
 mean+0.138
 hunters+0.135
 sneaking+0.134
Neg logit contributions
ill-0.207
igsaw-0.206
lights-0.196
etti-0.189
ushes-0.186
Token '
Token ID705
Feature activation-6.22e-02

Pos logit contributions
ill+0.027
igsaw+0.027
lights+0.025
etti+0.024
ushes+0.024
Neg logit contributions
 asking-0.020
 trying-0.019
 sneaking-0.017
 mean-0.017
 hunters-0.017
Tokencut
Token ID8968
Feature activation-2.83e-01

Pos logit contributions
 asking+0.100
 trying+0.097
 mean+0.090
 hunters+0.087
 sneaking+0.087
Neg logit contributions
igsaw-0.124
ill-0.124
lights-0.120
etti-0.115
ushes-0.112
Token the
Token ID262
Feature activation+1.23e-02

Pos logit contributions
 asking+0.478
 trying+0.447
 gir+0.447
 mean+0.445
 chased+0.440
Neg logit contributions
ill-0.508
igsaw-0.479
etti-0.442
umm-0.428
aire-0.428
Token boat
Token ID8848
Feature activation-1.86e-02

Pos logit contributions
igsaw+0.024
ill+0.022
umm+0.022
lights+0.022
 pleasure+0.021
Neg logit contributions
 asking-0.018
 trying-0.016
 sneaking-0.015
ep-0.015
 challenged-0.015
Token'.
Token ID4458
Feature activation+8.43e-02

Pos logit contributions
 asking+0.027
 trying+0.026
 sneaking+0.023
 mean+0.023
 hunters+0.023
Neg logit contributions
ill-0.040
igsaw-0.040
lights-0.037
etti-0.036
ushes-0.035
Token The
Token ID383
Feature activation+2.45e-02

Pos logit contributions
ill+0.148
igsaw+0.147
lights+0.144
etti+0.134
ushes+0.133
Neg logit contributions
 asking-0.160
 trying-0.156
 mean-0.142
 telling-0.140
 hunters-0.140
Token little
Token ID1310
Feature activation-9.52e-02

Pos logit contributions
igsaw+0.049
lights+0.048
ill+0.046
umm+0.043
aire+0.042
Neg logit contributions
 asking-0.038
 trying-0.035
 sneaking-0.033
 telling-0.032
 going-0.031
Token finally
Token ID3443
Feature activation+1.17e-01

Pos logit contributions
ill+0.182
igsaw+0.177
lights+0.171
etti+0.165
ushes+0.160
Neg logit contributions
 asking-0.106
 trying-0.104
 sneaking-0.095
 hunters-0.093
 mean-0.093
Token agreed
Token ID4987
Feature activation+1.37e-03

Pos logit contributions
ill+0.262
igsaw+0.252
lights+0.240
etti+0.236
ushes+0.227
Neg logit contributions
 asking-0.159
 trying-0.153
 sneaking-0.140
 mean-0.138
 hunters-0.138
Token.
Token ID13
Feature activation+2.23e-02

Pos logit contributions
ill+0.003
igsaw+0.003
lights+0.003
etti+0.003
ushes+0.003
Neg logit contributions
 asking-0.002
 trying-0.002
 hunters-0.002
 mean-0.002
 sneaking-0.002
Token The
Token ID383
Feature activation+2.40e-02

Pos logit contributions
ill+0.039
igsaw+0.039
lights+0.039
etti+0.036
ushes+0.035
Neg logit contributions
 asking-0.042
 trying-0.040
 telling-0.036
 hunters-0.036
 walking-0.036
Token wood
Token ID4898
Feature activation-9.75e-02

Pos logit contributions
igsaw+0.044
umm+0.043
+0.041
ill+0.039
 pleasure+0.038
Neg logit contributions
 asking-0.036
 sneaking-0.034
 guns-0.033
 challenged-0.032
 telling-0.032
Tokencut
Token ID8968
Feature activation-2.82e-01

Pos logit contributions
 asking+0.093
 trying+0.086
 sneaking+0.083
 mean+0.081
 hunters+0.078
Neg logit contributions
ill-0.251
lights-0.242
igsaw-0.238
etti-0.227
ushes-0.227
Tokenter
Token ID353
Feature activation+6.81e-02

Pos logit contributions
 asking+0.507
 trying+0.488
 mean+0.463
 sneaking+0.458
 hunters+0.449
Neg logit contributions
ill-0.507
igsaw-0.478
lights-0.465
etti-0.452
umm-0.430
Token was
Token ID373
Feature activation+2.81e-02

Pos logit contributions
ill+0.138
igsaw+0.127
lights+0.124
etti+0.122
ushes+0.117
Neg logit contributions
 asking-0.102
 trying-0.097
 mean-0.095
 sneaking-0.093
 hunters-0.092
Token very
Token ID845
Feature activation-3.48e-02

Pos logit contributions
ill+0.059
igsaw+0.056
lights+0.054
ushes+0.053
etti+0.052
Neg logit contributions
 asking-0.040
 trying-0.038
 hunters-0.038
 sneaking-0.036
 guns-0.036
Token excited
Token ID6568
Feature activation-8.65e-03

Pos logit contributions
 asking+0.055
 trying+0.052
 hunters+0.050
 sneaking+0.050
 guns+0.048
Neg logit contributions
ill-0.070
igsaw-0.069
lights-0.065
ushes-0.064
etti-0.061
Token,
Token ID11
Feature activation+4.02e-03

Pos logit contributions
 asking+0.012
 trying+0.011
 mean+0.011
 hunters+0.011
 sneaking+0.011
Neg logit contributions
ill-0.019
igsaw-0.018
lights-0.018
etti-0.017
umm-0.017
Token can
Token ID460
Feature activation-8.28e-03

Pos logit contributions
 mean+0.030
 trying+0.028
 asking+0.027
 naughty+0.026
 rude+0.026
Neg logit contributions
igsaw-0.057
umm-0.054
ill-0.054
etti-0.053
lights-0.053
Token I
Token ID314
Feature activation+8.06e-02

Pos logit contributions
 asking+0.012
 trying+0.012
 mean+0.011
 naughty+0.010
 telling+0.010
Neg logit contributions
ill-0.018
igsaw-0.018
ushes-0.018
lights-0.017
umm-0.017
Token make
Token ID787
Feature activation+3.93e-02

Pos logit contributions
igsaw+0.186
lights+0.182
ill+0.182
ushes+0.173
umm+0.172
Neg logit contributions
 asking-0.106
 trying-0.106
 mean-0.100
 telling-0.091
 sneaking-0.089
Token you
Token ID345
Feature activation-3.26e-02

Pos logit contributions
ill+0.092
igsaw+0.091
lights+0.088
etti+0.085
ushes+0.084
Neg logit contributions
 asking-0.051
 trying-0.049
 mean-0.044
 hunters-0.043
 sneaking-0.043
Token big
Token ID1263
Feature activation-2.33e-03

Pos logit contributions
 asking+0.049
 trying+0.047
 mean+0.044
 telling+0.042
 sneaking+0.042
Neg logit contributions
ill-0.069
igsaw-0.069
lights-0.066
etti-0.064
ushes-0.063
Token again
Token ID757
Feature activation+2.40e-01

Pos logit contributions
 asking+0.005
 trying+0.004
 sneaking+0.004
 walking+0.004
 mean+0.004
Neg logit contributions
ill-0.004
igsaw-0.004
lights-0.004
etti-0.003
ushes-0.003
Token?"
Token ID1701
Feature activation+8.26e-02

Pos logit contributions
igsaw+0.589
ill+0.568
lights+0.560
etti+0.546
umm+0.538
Neg logit contributions
 asking-0.302
 trying-0.285
 mean-0.267
 telling-0.249
 going-0.238
Token
Token ID198
Feature activation+4.89e-03

Pos logit contributions
igsaw+0.193
etti+0.193
lights+0.192
ill+0.187
umm+0.187
Neg logit contributions
 asking-0.112
 trying-0.110
 mean-0.103
 angry-0.095
 telling-0.092
Token
Token ID198
Feature activation+3.36e-03

Pos logit contributions
igsaw+0.011
umm+0.011
ill+0.011
etti+0.011
lights+0.011
Neg logit contributions
 asking-0.007
 trying-0.006
 mean-0.006
 angry-0.006
 naughty-0.005
TokenL
Token ID43
Feature activation-5.12e-02

Pos logit contributions
ill+0.007
igsaw+0.007
lights+0.007
etti+0.006
umm+0.006
Neg logit contributions
 asking-0.005
 trying-0.005
 mean-0.005
 hunters-0.005
 sneaking-0.005
Tokenily
Token ID813
Feature activation+5.40e-02

Pos logit contributions
 trying+0.071
 asking+0.070
 mean+0.067
 hunters+0.062
 sneaking+0.060
Neg logit contributions
lights-0.113
igsaw-0.113
umm-0.109
etti-0.108
ill-0.103
Token The
Token ID383
Feature activation+2.96e-02

Pos logit contributions
lights+0.066
ill+0.065
igsaw+0.065
umm+0.060
ushes+0.060
Neg logit contributions
 asking-0.056
 trying-0.053
 walking-0.048
 telling-0.048
 le-0.046
Token train
Token ID4512
Feature activation-1.03e-01

Pos logit contributions
igsaw+0.052
lights+0.049
 pleasure+0.045
umm+0.043
kin+0.043
Neg logit contributions
 asking-0.050
 sneaking-0.048
 trying-0.047
 guarding-0.045
 challenged-0.045
Token felt
Token ID2936
Feature activation+1.02e-01

Pos logit contributions
 asking+0.158
 trying+0.153
 mean+0.142
 sneaking+0.140
 hunters+0.139
Neg logit contributions
ill-0.210
igsaw-0.204
lights-0.199
etti-0.192
ushes-0.186
Token proud
Token ID6613
Feature activation+1.05e-01

Pos logit contributions
ill+0.254
igsaw+0.252
lights+0.242
etti+0.239
umm+0.229
Neg logit contributions
 trying-0.126
 asking-0.125
 mean-0.119
 naughty-0.106
 telling-0.106
Token of
Token ID286
Feature activation+9.70e-02

Pos logit contributions
ill+0.230
igsaw+0.228
lights+0.218
etti+0.211
ushes+0.207
Neg logit contributions
 asking-0.151
 trying-0.146
 mean-0.133
 sneaking-0.129
 hunters-0.129
Token itself
Token ID2346
Feature activation+2.27e-01

Pos logit contributions
ill+0.208
igsaw+0.200
umm+0.196
lights+0.196
etti+0.194
Neg logit contributions
 trying-0.170
 asking-0.166
 telling-0.149
 stealing-0.142
 mean-0.141
Token for
Token ID329
Feature activation+1.83e-02

Pos logit contributions
igsaw+0.511
ill+0.506
lights+0.476
etti+0.465
ushes+0.444
Neg logit contributions
 asking-0.302
 trying-0.280
 mean-0.271
 sneaking-0.258
 hunters-0.255
Token being
Token ID852
Feature activation-5.27e-02

Pos logit contributions
igsaw+0.045
ill+0.045
lights+0.043
etti+0.042
ushes+0.041
Neg logit contributions
 asking-0.018
 trying-0.017
 mean-0.016
 hunters-0.016
 gir-0.015
Token independent
Token ID4795
Feature activation+5.60e-02

Pos logit contributions
 asking+0.066
 trying+0.063
 mean+0.056
 sneaking+0.055
 hunters+0.055
Neg logit contributions
ill-0.125
igsaw-0.124
lights-0.118
etti-0.115
ushes-0.114
Token and
Token ID290
Feature activation+3.48e-02

Pos logit contributions
ill+0.120
igsaw+0.116
lights+0.114
etti+0.107
ushes+0.106
Neg logit contributions
 asking-0.079
 trying-0.075
 sneaking-0.071
 mean-0.071
 gir-0.068
Token helpful
Token ID7613
Feature activation+2.79e-02

Pos logit contributions
igsaw+0.082
ill+0.082
lights+0.079
etti+0.077
ushes+0.076
Neg logit contributions
 asking-0.040
 trying-0.038
 mean-0.035
 hunters-0.034
 sneaking-0.034
Token dive
Token ID15647
Feature activation+1.89e-02

Pos logit contributions
 asking+0.147
 trying+0.143
 hunters+0.139
 gir+0.132
 sneaking+0.131
Neg logit contributions
ill-0.187
igsaw-0.178
lights-0.167
etti-0.166
umm-0.159
Token into
Token ID656
Feature activation+1.04e-01

Pos logit contributions
igsaw+0.046
ill+0.044
umm+0.044
etti+0.043
lights+0.042
Neg logit contributions
 trying-0.022
 asking-0.022
 mean-0.020
 hunters-0.019
 going-0.019
Token strange
Token ID6283
Feature activation-2.44e-02

Pos logit contributions
ill+0.223
igsaw+0.222
lights+0.211
etti+0.203
ushes+0.200
Neg logit contributions
 asking-0.150
 trying-0.144
 mean-0.133
 hunters-0.130
 sneaking-0.129
Token bins
Token ID41701
Feature activation-2.63e-02

Pos logit contributions
 asking+0.033
 trying+0.031
 mean+0.029
 telling+0.028
 sneaking+0.028
Neg logit contributions
ill-0.055
igsaw-0.054
lights-0.053
etti-0.050
ushes-0.049
Token ever
Token ID1683
Feature activation+1.91e-01

Pos logit contributions
 asking+0.038
 trying+0.037
 mean+0.034
 hunters+0.033
 sneaking+0.033
Neg logit contributions
ill-0.057
igsaw-0.056
lights-0.054
etti-0.052
ushes-0.051
Token again
Token ID757
Feature activation+2.60e-01

Pos logit contributions
ill+0.306
igsaw+0.267
lights+0.256
etti+0.236
aire+0.230
Neg logit contributions
 cowboy-0.364
 asking-0.362
 gir-0.356
 trying-0.354
 hunters-0.354
Token.
Token ID13
Feature activation+3.75e-02

Pos logit contributions
ill+0.546
igsaw+0.531
lights+0.509
etti+0.483
ushes+0.471
Neg logit contributions
 asking-0.379
 trying-0.366
 mean-0.346
 gir-0.336
 hunters-0.336
Token
Token ID198
Feature activation+4.34e-02

Pos logit contributions
lights+0.078
ill+0.078
igsaw+0.075
etti+0.074
ushes+0.070
Neg logit contributions
 asking-0.059
 trying-0.058
 telling-0.051
 mean-0.051
 hunters-0.051
Token
Token ID198
Feature activation+4.05e-02

Pos logit contributions
ill+0.081
igsaw+0.081
lights+0.076
etti+0.073
ushes+0.072
Neg logit contributions
 asking-0.076
 trying-0.074
 mean-0.069
 sneaking-0.067
 hunters-0.067
TokenJerry
Token ID43462
Feature activation-8.70e-02

Pos logit contributions
igsaw+0.074
ill+0.074
ushes+0.065
lights+0.065
etti+0.063
Neg logit contributions
 asking-0.073
 trying-0.070
 sneaking-0.065
 hunters-0.063
 mean-0.063
Token agreed
Token ID4987
Feature activation+2.02e-03

Pos logit contributions
 asking+0.144
 trying+0.137
 hunters+0.130
 mean+0.128
 sneaking+0.128
Neg logit contributions
ill-0.174
igsaw-0.163
lights-0.154
etti-0.150
ushes-0.148
Token had
Token ID550
Feature activation+2.87e-03

Pos logit contributions
 asking+0.001
 trying+0.001
 hunters+0.001
 sneaking+0.001
 stealing+0.001
Neg logit contributions
ill-0.001
igsaw-0.001
lights-0.001
etti-0.001
ushes-0.001
Token lots
Token ID6041
Feature activation+8.30e-02

Pos logit contributions
ill+0.006
igsaw+0.005
ushes+0.005
lights+0.005
etti+0.005
Neg logit contributions
 asking-0.004
 trying-0.004
 sneaking-0.004
 hunters-0.004
 demanding-0.004
Token of
Token ID286
Feature activation+8.96e-02

Pos logit contributions
ill+0.150
igsaw+0.148
lights+0.140
etti+0.134
ushes+0.132
Neg logit contributions
 asking-0.152
 trying-0.148
 mean-0.138
 sneaking-0.135
 hunters-0.135
Token different
Token ID1180
Feature activation-3.12e-02

Pos logit contributions
igsaw+0.159
ill+0.154
ushes+0.150
lights+0.144
etti+0.130
Neg logit contributions
 sneaking-0.152
 asking-0.151
 trying-0.147
 guarding-0.146
 hunters-0.143
Token cr
Token ID1067
Feature activation+5.09e-02

Pos logit contributions
 asking+0.059
 trying+0.058
 sneaking+0.056
 hunters+0.055
 cowboy+0.055
Neg logit contributions
ill-0.052
igsaw-0.051
lights-0.049
ushes-0.046
etti-0.045
Tokenay
Token ID323
Feature activation+2.09e-01

Pos logit contributions
igsaw+0.058
aire+0.052
ill+0.050
lights+0.050
cy+0.048
Neg logit contributions
 asking-0.114
 trying-0.110
ep-0.108
 mean-0.106
 hunters-0.104
Tokenons
Token ID684
Feature activation+1.81e-02

Pos logit contributions
ill+0.400
igsaw+0.394
lights+0.373
etti+0.361
ushes+0.356
Neg logit contributions
 asking-0.358
 trying-0.347
 mean-0.323
 sneaking-0.321
 hunters-0.320
Token and
Token ID290
Feature activation+1.78e-02

Pos logit contributions
ill+0.041
igsaw+0.040
lights+0.039
etti+0.037
ushes+0.037
Neg logit contributions
 asking-0.025
 trying-0.023
 sneaking-0.022
 mean-0.021
 hunters-0.021
Token was
Token ID373
Feature activation+2.71e-02

Pos logit contributions
ill+0.033
lights+0.033
igsaw+0.033
ushes+0.031
etti+0.031
Neg logit contributions
 asking-0.029
 trying-0.028
 sneaking-0.027
 hunters-0.026
 mean-0.026
Token proud
Token ID6613
Feature activation+9.71e-02

Pos logit contributions
ill+0.054
ushes+0.052
igsaw+0.050
lights+0.049
etti+0.047
Neg logit contributions
 hunters-0.037
 guns-0.037
 asking-0.036
 sneaking-0.035
 gun-0.034
Token of
Token ID286
Feature activation+9.08e-02

Pos logit contributions
ill+0.160
igsaw+0.153
lights+0.153
umm+0.140
etti+0.139
Neg logit contributions
 asking-0.184
 trying-0.181
 sneaking-0.174
 mean-0.173
 chased-0.168
Token.
Token ID13
Feature activation+6.39e-03

Pos logit contributions
ill+0.086
igsaw+0.086
lights+0.082
etti+0.079
ushes+0.078
Neg logit contributions
 asking-0.052
 trying-0.050
 mean-0.045
 hunters-0.044
 sneaking-0.044
Token They
Token ID1119
Feature activation-3.42e-02

Pos logit contributions
igsaw+0.012
lights+0.011
ill+0.011
etti+0.011
umm+0.010
Neg logit contributions
 asking-0.012
 trying-0.011
 le-0.010
 walking-0.010
 hunters-0.010
Token laughed
Token ID13818
Feature activation+4.41e-03

Pos logit contributions
 asking+0.055
 trying+0.053
 hunters+0.049
 sneaking+0.049
 mean+0.048
Neg logit contributions
ill-0.070
igsaw-0.066
lights-0.064
etti-0.062
ushes-0.061
Token and
Token ID290
Feature activation+4.95e-03

Pos logit contributions
ill+0.010
igsaw+0.009
lights+0.009
etti+0.009
ushes+0.009
Neg logit contributions
 asking-0.006
 trying-0.006
 sneaking-0.005
 mean-0.005
 hunters-0.005
Token cl
Token ID537
Feature activation+1.76e-02

Pos logit contributions
ill+0.011
igsaw+0.011
lights+0.010
etti+0.010
ushes+0.010
Neg logit contributions
 asking-0.007
 trying-0.007
 mean-0.006
 hunters-0.006
 sneaking-0.006
Tokenapped
Token ID6320
Feature activation+2.03e-01

Pos logit contributions
igsaw+0.033
lights+0.028
ill+0.027
 perfection+0.026
umm+0.026
Neg logit contributions
ep-0.028
 trying-0.028
 asking-0.027
 challenged-0.027
 named-0.026
Token and
Token ID290
Feature activation+5.53e-03

Pos logit contributions
ill+0.456
igsaw+0.451
lights+0.434
etti+0.418
ushes+0.412
Neg logit contributions
 asking-0.281
 trying-0.270
 mean-0.245
 sneaking-0.239
 hunters-0.239
Token cheered
Token ID34767
Feature activation+4.69e-02

Pos logit contributions
igsaw+0.010
ill+0.010
lights+0.009
etti+0.009
umm+0.009
Neg logit contributions
 asking-0.011
 trying-0.010
 mean-0.010
 telling-0.009
 sneaking-0.009
Token.
Token ID13
Feature activation+6.77e-03

Pos logit contributions
ill+0.105
igsaw+0.104
lights+0.100
etti+0.097
ushes+0.095
Neg logit contributions
 asking-0.065
 trying-0.062
 mean-0.056
 sneaking-0.055
 hunters-0.055
Token
Token ID198
Feature activation+1.09e-02

Pos logit contributions
igsaw+0.015
lights+0.014
ill+0.014
umm+0.014
etti+0.013
Neg logit contributions
 asking-0.010
 trying-0.010
 walking-0.009
 le-0.009
 telling-0.009
Token
Token ID198
Feature activation+9.59e-03

Pos logit contributions
igsaw+0.022
ill+0.021
lights+0.021
umm+0.020
etti+0.020
Neg logit contributions
 asking-0.018
 trying-0.017
 mean-0.016
 telling-0.015
 walking-0.015
Token out
Token ID503
Feature activation+9.50e-02

Pos logit contributions
ill+0.015
igsaw+0.015
lights+0.014
etti+0.013
ushes+0.013
Neg logit contributions
 asking-0.012
 trying-0.011
 hunters-0.011
 mean-0.010
 gir-0.010
Token of
Token ID286
Feature activation+9.44e-02

Pos logit contributions
ill+0.202
igsaw+0.200
lights+0.191
etti+0.183
ushes+0.181
Neg logit contributions
 asking-0.142
 trying-0.137
 mean-0.127
 hunters-0.125
 sneaking-0.123
Token the
Token ID262
Feature activation+7.09e-03

Pos logit contributions
ill+0.230
igsaw+0.230
lights+0.223
etti+0.219
umm+0.219
Neg logit contributions
 asking-0.123
 trying-0.117
 telling-0.099
 walking-0.096
 sneaking-0.095
Token bath
Token ID7837
Feature activation+4.74e-02

Pos logit contributions
igsaw+0.011
lights+0.010
ill+0.010
umm+0.009
etti+0.009
Neg logit contributions
 asking-0.014
 gir-0.014
 hunters-0.013
 trying-0.013
 sneaking-0.013
Tokentub
Token ID37995
Feature activation+1.13e-01

Pos logit contributions
igsaw+0.072
ill+0.067
etti+0.064
ushes+0.061
umm+0.060
Neg logit contributions
 asking-0.101
 trying-0.101
 telling-0.091
 guns-0.090
 mean-0.090
Token.
Token ID13
Feature activation+3.48e-02

Pos logit contributions
ill+0.260
igsaw+0.259
lights+0.248
etti+0.241
ushes+0.237
Neg logit contributions
 asking-0.148
 trying-0.142
 mean-0.128
 sneaking-0.124
 hunters-0.123
Token
Token ID198
Feature activation+3.45e-02

Pos logit contributions
ill+0.073
lights+0.072
igsaw+0.071
etti+0.066
umm+0.066
Neg logit contributions
 asking-0.057
 trying-0.054
 telling-0.047
 walking-0.045
 demanding-0.045
Token
Token ID198
Feature activation+3.14e-02

Pos logit contributions
igsaw+0.069
ill+0.069
lights+0.067
umm+0.066
etti+0.064
Neg logit contributions
 asking-0.056
 trying-0.054
 mean-0.049
 telling-0.049
 walking-0.048
TokenFrom
Token ID4863
Feature activation+5.23e-02

Pos logit contributions
ill+0.059
igsaw+0.059
lights+0.054
etti+0.053
ushes+0.052
Neg logit contributions
 asking-0.055
 trying-0.053
 hunters-0.050
 mean-0.049
 sneaking-0.049
Token that
Token ID326
Feature activation-9.48e-03

Pos logit contributions
igsaw+0.096
ill+0.095
lights+0.095
etti+0.087
ushes+0.082
Neg logit contributions
 asking-0.083
 sneaking-0.078
 trying-0.077
 gir-0.076
 cowboy-0.074
Token day
Token ID1110
Feature activation-3.90e-02

Pos logit contributions
 asking+0.015
 hunters+0.015
 trying+0.015
 gir+0.014
 cowboy+0.014
Neg logit contributions
igsaw-0.019
ill-0.018
lights-0.016
aire-0.016
umm-0.015
Token what
Token ID644
Feature activation+1.39e-02

Pos logit contributions
ill+0.360
igsaw+0.326
lights+0.307
ushes+0.295
etti+0.289
Neg logit contributions
 asking-0.247
 trying-0.245
ep-0.242
 chased-0.240
 hunters-0.240
Token it
Token ID340
Feature activation+8.47e-02

Pos logit contributions
igsaw+0.031
ill+0.031
lights+0.030
etti+0.029
ushes+0.028
Neg logit contributions
 asking-0.020
 trying-0.020
 mean-0.019
 walking-0.017
 telling-0.017
Token was
Token ID373
Feature activation+2.62e-02

Pos logit contributions
ill+0.178
igsaw+0.177
lights+0.168
etti+0.163
ushes+0.160
Neg logit contributions
 asking-0.130
 trying-0.125
 mean-0.115
 hunters-0.112
 sneaking-0.112
Token,
Token ID11
Feature activation-1.98e-03

Pos logit contributions
ill+0.053
igsaw+0.046
etti+0.044
lights+0.043
umm+0.042
Neg logit contributions
 cowboy-0.038
 hunters-0.038
 chased-0.037
 asking-0.036
 gir-0.036
Token and
Token ID290
Feature activation+1.26e-02

Pos logit contributions
 asking+0.003
 trying+0.002
 mean+0.002
 hunters+0.002
 sneaking+0.002
Neg logit contributions
ill-0.005
igsaw-0.005
lights-0.004
etti-0.004
ushes-0.004
Token her
Token ID607
Feature activation+4.13e-02

Pos logit contributions
ill+0.022
igsaw+0.020
lights+0.019
etti+0.019
ushes+0.018
Neg logit contributions
 hunters-0.022
 asking-0.022
 trying-0.022
 mean-0.021
 sneaking-0.021
Token mum
Token ID25682
Feature activation+1.75e-01

Pos logit contributions
ill+0.071
igsaw+0.069
lights+0.067
etti+0.064
ushes+0.061
Neg logit contributions
 asking-0.080
 trying-0.080
 mean-0.076
 walking-0.073
 sheep-0.073
Token said
Token ID531
Feature activation-1.42e-01

Pos logit contributions
ill+0.445
igsaw+0.407
etti+0.399
lights+0.394
ushes+0.378
Neg logit contributions
 asking-0.183
 trying-0.177
 hunters-0.174
 sneaking-0.166
 mean-0.161
Token,
Token ID11
Feature activation-1.30e-03

Pos logit contributions
 asking+0.219
 trying+0.209
 hunters+0.193
 mean+0.192
 sneaking+0.190
Neg logit contributions
ill-0.294
igsaw-0.291
lights-0.277
etti-0.270
ushes-0.259
Token "
Token ID366
Feature activation-9.50e-02

Pos logit contributions
 hunters+0.002
 asking+0.002
 mean+0.002
 thieves+0.002
ep+0.002
Neg logit contributions
ill-0.002
igsaw-0.002
etti-0.002
lights-0.002
-0.002
TokenIt
Token ID1026
Feature activation+1.48e-01

Pos logit contributions
 asking+0.176
 trying+0.170
 mean+0.158
 telling+0.157
 sneaking+0.154
Neg logit contributions
igsaw-0.174
lights-0.165
ill-0.163
etti-0.153
ushes-0.150
Token had
Token ID550
Feature activation+1.89e-02

Pos logit contributions
igsaw+0.058
ill+0.058
lights+0.055
etti+0.053
ushes+0.052
Neg logit contributions
 asking-0.038
 trying-0.037
 mean-0.034
 sneaking-0.033
 hunters-0.033
Token lots
Token ID6041
Feature activation+8.79e-02

Pos logit contributions
igsaw+0.041
ill+0.041
lights+0.040
etti+0.038
umm+0.038
Neg logit contributions
 asking-0.027
 trying-0.027
 mean-0.025
 hunters-0.024
 telling-0.023
Token of
Token ID286
Feature activation+9.49e-02

Pos logit contributions
ill+0.178
igsaw+0.176
etti+0.168
lights+0.165
aire+0.160
Neg logit contributions
 trying-0.137
 asking-0.137
 mean-0.129
 hunters-0.124
 telling-0.121
Token fun
Token ID1257
Feature activation+1.03e-02

Pos logit contributions
ill+0.232
igsaw+0.231
lights+0.221
etti+0.215
ushes+0.211
Neg logit contributions
 asking-0.114
 trying-0.109
 mean-0.097
 hunters-0.094
 sneaking-0.093
Token.
Token ID13
Feature activation+3.79e-02

Pos logit contributions
ill+0.022
igsaw+0.022
lights+0.021
etti+0.020
ushes+0.020
Neg logit contributions
 asking-0.015
 trying-0.015
 mean-0.013
 sneaking-0.013
 hunters-0.013
Token The
Token ID383
Feature activation+2.96e-02

Pos logit contributions
lights+0.099
ill+0.097
etti+0.095
igsaw+0.095
umm+0.093
Neg logit contributions
 asking-0.035
 trying-0.034
 telling-0.034
 le-0.031
 named-0.029
Token end
Token ID886
Feature activation+9.79e-02

Pos logit contributions
igsaw+0.051
ill+0.051
lights+0.048
umm+0.044
ushes+0.043
Neg logit contributions
 asking-0.057
 trying-0.053
 sneaking-0.051
 challenged-0.050
 demanding-0.049
Token.
Token ID13
Feature activation+3.66e-02

Pos logit contributions
ill+0.272
igsaw+0.269
umm+0.263
etti+0.258
aire+0.258
Neg logit contributions
 trying-0.079
 asking-0.074
 mean-0.070
 telling-0.067
 dangerous-0.060
Token<|endoftext|>
Token ID50256
Feature activation-4.80e-02

Pos logit contributions
lights+0.107
igsaw+0.104
ill+0.102
umm+0.101
etti+0.100
Neg logit contributions
 telling-0.026
 asking-0.025
 trying-0.025
 going-0.020
 walking-0.020
TokenOnce
Token ID7454
Feature activation-1.54e-02

Pos logit contributions
 trying+0.062
 asking+0.061
 walking+0.056
 telling+0.053
 mean+0.052
Neg logit contributions
igsaw-0.113
lights-0.111
umm-0.110
ill-0.106
etti-0.104
Token upon
Token ID2402
Feature activation-7.94e-02

Pos logit contributions
 asking+0.022
 trying+0.021
 mean+0.019
 walking+0.018
 telling+0.018
Neg logit contributions
igsaw-0.034
ill-0.033
etti-0.032
lights-0.032
umm-0.032
Token follow
Token ID1061
Feature activation-4.20e-02

Pos logit contributions
ill+0.069
igsaw+0.068
lights+0.065
etti+0.063
ushes+0.061
Neg logit contributions
 asking-0.053
 trying-0.052
 mean-0.047
 hunters-0.046
 sneaking-0.046
Token your
Token ID534
Feature activation-1.20e-02

Pos logit contributions
 asking+0.056
 trying+0.054
 mean+0.049
 hunters+0.047
 telling+0.047
Neg logit contributions
ill-0.098
igsaw-0.096
lights-0.092
etti-0.090
umm-0.088
Token dreams
Token ID10625
Feature activation+1.78e-01

Pos logit contributions
 asking+0.021
 trying+0.020
 sneaking+0.019
 hunters+0.019
 mean+0.019
Neg logit contributions
igsaw-0.022
ill-0.021
lights-0.021
ushes-0.020
umm-0.019
Token,
Token ID11
Feature activation+1.69e-02

Pos logit contributions
igsaw+0.439
ill+0.431
lights+0.412
umm+0.408
etti+0.407
Neg logit contributions
 asking-0.209
 trying-0.207
 mean-0.185
 going-0.175
 telling-0.171
Token just
Token ID655
Feature activation+7.28e-02

Pos logit contributions
ill+0.038
igsaw+0.038
lights+0.036
etti+0.035
ushes+0.034
Neg logit contributions
 asking-0.023
 trying-0.023
 mean-0.020
 hunters-0.020
 sneaking-0.020
Token like
Token ID588
Feature activation+7.13e-02

Pos logit contributions
ill+0.149
igsaw+0.147
lights+0.140
etti+0.135
ushes+0.133
Neg logit contributions
 asking-0.111
 trying-0.107
 mean-0.099
 hunters-0.098
 sneaking-0.098
Token Tim
Token ID5045
Feature activation-1.29e-02

Pos logit contributions
ill+0.149
igsaw+0.148
lights+0.142
etti+0.138
ushes+0.134
Neg logit contributions
 asking-0.094
 trying-0.090
 mean-0.087
 sneaking-0.085
 hunters-0.084
Tokenmy
Token ID1820
Feature activation-2.61e-02

Pos logit contributions
 asking+0.022
 trying+0.021
 hunters+0.021
 sneaking+0.020
 gir+0.020
Neg logit contributions
ill-0.024
igsaw-0.021
lights-0.019
ushes-0.019
cy-0.019
Token loved
Token ID6151
Feature activation-3.90e-02

Pos logit contributions
 asking+0.032
 trying+0.029
 hunters+0.029
 sneaking+0.028
 cowboy+0.028
Neg logit contributions
ill-0.059
igsaw-0.058
lights-0.054
etti-0.053
ushes-0.052
Token to
Token ID284
Feature activation-8.80e-02

Pos logit contributions
 asking+0.052
 trying+0.050
 mean+0.050
 chased+0.048
 sneaking+0.048
Neg logit contributions
igsaw-0.080
ill-0.079
lights-0.076
etti-0.072
ushes-0.071
Token paint
Token ID7521
Feature activation+1.13e-01

Pos logit contributions
 asking+0.139
 trying+0.135
 mean+0.124
 sneaking+0.119
 telling+0.119
Neg logit contributions
ill-0.182
igsaw-0.181
lights-0.172
etti-0.166
ushes-0.164
Token she
Token ID673
Feature activation+4.12e-03

Pos logit contributions
igsaw+0.070
ill+0.069
etti+0.066
lights+0.065
umm+0.065
Neg logit contributions
 trying-0.039
 asking-0.038
 mean-0.036
 going-0.035
 walking-0.034
Token felt
Token ID2936
Feature activation+9.44e-02

Pos logit contributions
igsaw+0.009
ill+0.009
lights+0.009
etti+0.008
ushes+0.008
Neg logit contributions
 asking-0.006
 trying-0.006
 mean-0.005
 sneaking-0.005
 hunters-0.005
Token the
Token ID262
Feature activation+8.85e-03

Pos logit contributions
igsaw+0.226
ill+0.224
lights+0.216
etti+0.213
umm+0.207
Neg logit contributions
 trying-0.125
 asking-0.125
 mean-0.114
 going-0.105
 telling-0.105
Token y
Token ID331
Feature activation-6.41e-03

Pos logit contributions
ill+0.019
igsaw+0.019
lights+0.018
etti+0.017
ushes+0.017
Neg logit contributions
 asking-0.014
 trying-0.014
 mean-0.013
 naughty-0.012
 telling-0.012
Tokenummy
Token ID13513
Feature activation+1.78e-01

Pos logit contributions
 le+0.013
 asking+0.013
 trying+0.013
 mean+0.012
 naughty+0.012
Neg logit contributions
ill-0.008
lights-0.008
igsaw-0.008
ushes-0.008
etti-0.007
Token juice
Token ID13135
Feature activation+4.76e-02

Pos logit contributions
ill+0.361
igsaw+0.360
lights+0.345
etti+0.338
ushes+0.333
Neg logit contributions
 trying-0.299
 asking-0.296
 mean-0.277
 le-0.264
 telling-0.258
Token sliding
Token ID22292
Feature activation-7.48e-03

Pos logit contributions
igsaw+0.115
ill+0.114
lights+0.109
etti+0.106
ushes+0.105
Neg logit contributions
 asking-0.058
 trying-0.057
 mean-0.050
 sneaking-0.048
 telling-0.048
Token down
Token ID866
Feature activation+1.11e-01

Pos logit contributions
 asking+0.010
 trying+0.010
 mean+0.009
 guns+0.009
 le+0.009
Neg logit contributions
igsaw-0.017
ill-0.017
etti-0.016
lights-0.015
ushes-0.015
Token her
Token ID607
Feature activation+4.03e-02

Pos logit contributions
igsaw+0.258
ill+0.257
lights+0.246
etti+0.240
ushes+0.235
Neg logit contributions
 asking-0.144
 trying-0.139
 mean-0.124
 sneaking-0.120
 hunters-0.119
Token throat
Token ID13589
Feature activation+5.09e-02

Pos logit contributions
ill+0.085
igsaw+0.084
lights+0.080
etti+0.078
ushes+0.077
Neg logit contributions
 asking-0.061
 trying-0.060
 mean-0.054
 sneaking-0.053
 telling-0.052
Token.
Token ID13
Feature activation+2.15e-02

Pos logit contributions
ill+0.110
igsaw+0.105
lights+0.101
etti+0.097
ushes+0.095
Neg logit contributions
 asking-0.075
 trying-0.072
 hunters-0.067
 mean-0.066
 sneaking-0.066
Token Emma
Token ID18966
Feature activation+6.04e-02

Pos logit contributions
ill+0.014
igsaw+0.014
lights+0.012
etti+0.012
ushes+0.012
Neg logit contributions
 asking-0.014
 trying-0.014
 sneaking-0.013
 hunters-0.013
 mean-0.013
Token and
Token ID290
Feature activation+2.03e-02

Pos logit contributions
ill+0.135
igsaw+0.131
lights+0.126
etti+0.123
ushes+0.120
Neg logit contributions
 asking-0.082
 trying-0.079
 mean-0.073
 sneaking-0.071
 hunters-0.071
Token she
Token ID673
Feature activation+7.13e-03

Pos logit contributions
igsaw+0.042
ill+0.042
lights+0.040
etti+0.039
ushes+0.038
Neg logit contributions
 asking-0.032
 trying-0.031
 mean-0.028
 telling-0.027
 hunters-0.027
Token was
Token ID373
Feature activation+1.97e-02

Pos logit contributions
igsaw+0.017
ill+0.017
lights+0.016
etti+0.015
umm+0.015
Neg logit contributions
 asking-0.009
 trying-0.009
 mean-0.008
 telling-0.008
 sneaking-0.008
Token three
Token ID1115
Feature activation+2.27e-02

Pos logit contributions
ill+0.042
igsaw+0.040
etti+0.038
ushes+0.038
lights+0.038
Neg logit contributions
 asking-0.025
 guns-0.024
 hunters-0.024
 trying-0.023
 sneaking-0.023
Token-
Token ID12
Feature activation+6.56e-03

Pos logit contributions
lights+0.044
igsaw+0.044
ill+0.043
ushes+0.041
aney+0.039
Neg logit contributions
 asking-0.035
 trying-0.033
 cowboy-0.032
 sneaking-0.031
 hunters-0.031
Tokenyears
Token ID19002
Feature activation-2.98e-02

Pos logit contributions
ill+0.011
igsaw+0.010
aney+0.009
ushes+0.009
etti+0.009
Neg logit contributions
 mean-0.013
 gun-0.012
 cowboy-0.012
 guns-0.012
 trying-0.012
Token-
Token ID12
Feature activation+1.15e-02

Pos logit contributions
 asking+0.042
 trying+0.041
 sneaking+0.036
 mean+0.035
 telling+0.035
Neg logit contributions
igsaw-0.068
ill-0.066
lights-0.063
ushes-0.060
umm-0.059
Tokenold
Token ID727
Feature activation-1.31e-01

Pos logit contributions
ill+0.024
igsaw+0.023
etti+0.021
ushes+0.021
lights+0.021
Neg logit contributions
 asking-0.018
 trying-0.018
 mean-0.017
 sneaking-0.016
 hunters-0.016
Token.
Token ID13
Feature activation+2.09e-02

Pos logit contributions
 asking+0.191
 trying+0.186
 mean+0.184
 sneaking+0.178
ep+0.174
Neg logit contributions
ill-0.268
igsaw-0.256
lights-0.240
etti-0.234
aire-0.233
Token She
Token ID1375
Feature activation+5.25e-02

Pos logit contributions
ill+0.039
igsaw+0.038
lights+0.038
ushes+0.036
etti+0.036
Neg logit contributions
 asking-0.038
 trying-0.036
 walking-0.032
 telling-0.032
 mean-0.031
Token.
Token ID13
Feature activation+3.46e-02

Pos logit contributions
 asking+0.092
 trying+0.087
 sneaking+0.086
 hunters+0.085
 cowboy+0.084
Neg logit contributions
ill-0.082
etti-0.071
igsaw-0.070
cy-0.069
lights-0.069
Token Tim
Token ID5045
Feature activation-2.24e-02

Pos logit contributions
igsaw+0.075
ill+0.074
lights+0.072
ushes+0.070
etti+0.068
Neg logit contributions
 asking-0.050
 trying-0.049
 mean-0.045
 hunters-0.043
 sneaking-0.043
Tokenmy
Token ID1820
Feature activation-4.97e-02

Pos logit contributions
 asking+0.031
 trying+0.030
 mean+0.027
 sneaking+0.026
 hunters+0.026
Neg logit contributions
igsaw-0.051
ill-0.051
lights-0.048
etti-0.047
ushes-0.046
Token loved
Token ID6151
Feature activation-5.19e-02

Pos logit contributions
 asking+0.074
 trying+0.069
 hunters+0.065
 sneaking+0.063
 mean+0.062
Neg logit contributions
ill-0.105
igsaw-0.102
lights-0.098
etti-0.096
aire-0.092
Token to
Token ID284
Feature activation-1.03e-01

Pos logit contributions
 mean+0.075
 charged+0.075
 trying+0.074
aver+0.074
 asking+0.073
Neg logit contributions
igsaw-0.087
lights-0.079
etti-0.077
umm-0.076
ill-0.075
Token play
Token ID711
Feature activation+1.13e-02

Pos logit contributions
 asking+0.148
 trying+0.144
 mean+0.133
 walking+0.126
 sneaking+0.126
Neg logit contributions
ill-0.228
igsaw-0.225
lights-0.218
ushes-0.209
etti-0.208
Token with
Token ID351
Feature activation+9.79e-02

Pos logit contributions
igsaw+0.020
ill+0.019
etti+0.019
lights+0.018
+0.017
Neg logit contributions
 asking-0.020
 trying-0.018
 demanding-0.018
 sneaking-0.017
 mean-0.017
Token his
Token ID465
Feature activation-3.54e-02

Pos logit contributions
lights+0.174
igsaw+0.173
etti+0.155
ill+0.152
ushes+0.147
Neg logit contributions
 asking-0.167
 trying-0.158
 sneaking-0.156
 telling-0.152
 demanding-0.149
Token toys
Token ID14958
Feature activation-3.56e-02

Pos logit contributions
 asking+0.054
 sneaking+0.051
 trying+0.049
 telling+0.046
 challenged+0.046
Neg logit contributions
igsaw-0.073
lights-0.068
ushes-0.065
etti-0.064
-0.063
Token all
Token ID477
Feature activation+1.97e-01

Pos logit contributions
 asking+0.055
 trying+0.050
 sneaking+0.050
 hunters+0.048
 demanding+0.048
Neg logit contributions
ill-0.072
igsaw-0.068
lights-0.068
etti-0.063
-0.062
Token day
Token ID1110
Feature activation-4.01e-02

Pos logit contributions
ill+0.408
igsaw+0.403
lights+0.385
ushes+0.362
etti+0.361
Neg logit contributions
 asking-0.299
 trying-0.289
 hunters-0.269
 sneaking-0.266
 mean-0.265
Token called
Token ID1444
Feature activation-1.34e-01

Pos logit contributions
 cowboy+0.212
 hunters+0.208
 asking+0.207
 men+0.202
 sheriff+0.198
Neg logit contributions
ill-0.337
etti-0.295
igsaw-0.279
cy-0.278
lights-0.273
Token Bill
Token ID3941
Feature activation+2.62e-02

Pos logit contributions
 asking+0.255
 trying+0.253
 mean+0.238
 sneaking+0.231
 hunters+0.231
Neg logit contributions
ill-0.226
igsaw-0.224
etti-0.201
lights-0.194
aire-0.182
Token who
Token ID508
Feature activation-1.32e-01

Pos logit contributions
ill+0.051
lights+0.041
igsaw+0.041
etti+0.041
aney+0.039
Neg logit contributions
 asking-0.044
 trying-0.043
 mean-0.041
 hunters-0.041
 sneaking-0.041
Token loved
Token ID6151
Feature activation-5.00e-02

Pos logit contributions
 asking+0.200
 trying+0.191
 hunters+0.174
 sneaking+0.172
 mean+0.172
Neg logit contributions
ill-0.274
igsaw-0.271
lights-0.258
etti-0.252
ushes-0.246
Token spicy
Token ID26880
Feature activation-9.82e-02

Pos logit contributions
 trying+0.072
 mean+0.072
 sneaking+0.071
 asking+0.071
ep+0.071
Neg logit contributions
igsaw-0.086
lights-0.080
umm-0.078
ill-0.078
ushes-0.074
Token food
Token ID2057
Feature activation-1.25e-03

Pos logit contributions
 asking+0.141
 trying+0.135
 mean+0.121
 sneaking+0.121
 telling+0.119
Neg logit contributions
igsaw-0.209
ill-0.207
lights-0.201
umm-0.197
etti-0.191
Token.
Token ID13
Feature activation+1.94e-02

Pos logit contributions
 asking+0.002
 trying+0.002
 mean+0.001
 hunters+0.001
 sneaking+0.001
Neg logit contributions
ill-0.003
igsaw-0.003
lights-0.003
etti-0.003
ushes-0.003
Token Every
Token ID3887
Feature activation+2.36e-02

Pos logit contributions
igsaw+0.037
ill+0.036
lights+0.034
etti+0.033
ushes+0.032
Neg logit contributions
 asking-0.035
 trying-0.034
 mean-0.031
 hunters-0.030
 gir-0.030
Token day
Token ID1110
Feature activation-4.27e-02

Pos logit contributions
igsaw+0.046
ill+0.046
lights+0.043
umm+0.041
etti+0.041
Neg logit contributions
 asking-0.039
 trying-0.037
 hunters-0.034
 sneaking-0.034
 mean-0.033
Token he
Token ID339
Feature activation-6.28e-02

Pos logit contributions
 asking+0.060
 trying+0.058
 sneaking+0.053
 hunters+0.052
 telling+0.052
Neg logit contributions
ill-0.093
igsaw-0.089
lights-0.087
etti-0.084
umm-0.082
Token would
Token ID561
Feature activation-5.60e-02

Pos logit contributions
 asking+0.102
 trying+0.097
 hunters+0.091
 demanding+0.088
 telling+0.087
Neg logit contributions
ill-0.121
lights-0.111
igsaw-0.111
etti-0.107
aire-0.106
Token the
Token ID262
Feature activation+5.25e-03

Pos logit contributions
ill+0.147
igsaw+0.146
lights+0.139
etti+0.135
ushes+0.133
Neg logit contributions
 asking-0.094
 trying-0.091
 mean-0.083
 hunters-0.080
 sneaking-0.080
Token coin
Token ID10752
Feature activation+1.88e-02

Pos logit contributions
ill+0.008
igsaw+0.008
lights+0.008
etti+0.007
umm+0.007
Neg logit contributions
 asking-0.010
 trying-0.010
 sneaking-0.010
 telling-0.009
 hunters-0.009
Token too
Token ID1165
Feature activation+7.37e-02

Pos logit contributions
ill+0.041
igsaw+0.040
lights+0.038
etti+0.037
ushes+0.036
Neg logit contributions
 asking-0.028
 trying-0.026
 mean-0.024
 hunters-0.024
 sneaking-0.024
Token.
Token ID13
Feature activation+8.20e-03

Pos logit contributions
ill+0.171
igsaw+0.170
lights+0.162
etti+0.157
ushes+0.155
Neg logit contributions
 asking-0.098
 trying-0.094
 mean-0.085
 hunters-0.082
 sneaking-0.082
Token She
Token ID1375
Feature activation+4.35e-02

Pos logit contributions
igsaw+0.015
ill+0.014
lights+0.014
etti+0.013
umm+0.013
Neg logit contributions
 asking-0.016
 trying-0.015
 mean-0.014
 telling-0.014
 hunters-0.014
Token thought
Token ID1807
Feature activation-1.29e-01

Pos logit contributions
ill+0.081
oph+0.071
lights+0.070
aire+0.068
ushes+0.067
Neg logit contributions
 asking-0.064
 hunters-0.063
 trying-0.062
 sneaking-0.059
 stealing-0.059
Token Tom
Token ID4186
Feature activation-1.03e-01

Pos logit contributions
 asking+0.189
 trying+0.185
 mean+0.168
 sneaking+0.161
 telling+0.160
Neg logit contributions
igsaw-0.282
ill-0.281
lights-0.268
etti-0.263
umm-0.261
Token was
Token ID373
Feature activation+1.80e-02

Pos logit contributions
 asking+0.170
 trying+0.158
 hunters+0.154
 sneaking+0.149
 cowboy+0.147
Neg logit contributions
ill-0.201
igsaw-0.180
lights-0.175
etti-0.171
ushes-0.170
Token being
Token ID852
Feature activation-6.70e-02

Pos logit contributions
ill+0.040
igsaw+0.040
etti+0.037
lights+0.036
ushes+0.035
Neg logit contributions
 asking-0.020
 gir-0.017
 sheep-0.017
cut-0.016
 challenged-0.016
Token mean
Token ID1612
Feature activation-1.66e-01

Pos logit contributions
 asking+0.058
 sneaking+0.051
 trying+0.048
 sheep+0.046
 gir+0.045
Neg logit contributions
ill-0.168
igsaw-0.167
ushes-0.159
lights-0.151
aney-0.149
Token and
Token ID290
Feature activation+1.18e-03

Pos logit contributions
 asking+0.246
 trying+0.232
 cowboy+0.225
 sneaking+0.224
 gir+0.223
Neg logit contributions
ill-0.352
lights-0.320
igsaw-0.315
ushes-0.303
etti-0.290
Token a
Token ID257
Feature activation-5.47e-02

Pos logit contributions
ill+0.112
igsaw+0.107
lights+0.105
etti+0.100
ushes+0.100
Neg logit contributions
 asking-0.067
 trying-0.062
 sneaking-0.056
 telling-0.055
 mean-0.054
Token big
Token ID1263
Feature activation-1.09e-02

Pos logit contributions
 asking+0.075
 trying+0.070
 hunters+0.062
 telling+0.062
 sneaking+0.062
Neg logit contributions
igsaw-0.124
ill-0.122
lights-0.116
etti-0.113
umm-0.110
Token truck
Token ID7779
Feature activation-1.67e-01

Pos logit contributions
 asking+0.016
 trying+0.016
 mean+0.014
 hunters+0.014
 sneaking+0.014
Neg logit contributions
igsaw-0.023
ill-0.023
lights-0.022
etti-0.021
umm-0.021
Token with
Token ID351
Feature activation+8.18e-02

Pos logit contributions
 asking+0.270
 cowboy+0.267
 hunters+0.261
 mean+0.252
 sheep+0.251
Neg logit contributions
ill-0.310
etti-0.299
lights-0.279
igsaw-0.270
-0.259
Token a
Token ID257
Feature activation-5.49e-02

Pos logit contributions
ushes+0.133
lights+0.131
ill+0.127
igsaw+0.121
etti+0.120
Neg logit contributions
 asking-0.137
 sneaking-0.135
 trying-0.132
 telling-0.124
 challenged-0.123
Token trunk
Token ID21427
Feature activation-9.59e-02

Pos logit contributions
 asking+0.088
 hunters+0.082
 sneaking+0.079
 trying+0.078
 telling+0.076
Neg logit contributions
igsaw-0.101
lights-0.095
etti-0.095
ill-0.092
-0.091
Token.
Token ID13
Feature activation-2.04e-03

Pos logit contributions
 asking+0.150
 trying+0.139
 hunters+0.133
 sneaking+0.133
 cowboy+0.133
Neg logit contributions
ill-0.197
lights-0.186
igsaw-0.186
etti-0.180
umm-0.169
Token The
Token ID383
Feature activation-2.65e-03

Pos logit contributions
 asking+0.004
 trying+0.004
 mean+0.003
 hunters+0.003
 walking+0.003
Neg logit contributions
igsaw-0.004
ill-0.004
umm-0.004
lights-0.004
etti-0.003
Token trunk
Token ID21427
Feature activation-9.65e-02

Pos logit contributions
 asking+0.004
 sneaking+0.004
 trying+0.004
 tri+0.003
 telling+0.003
Neg logit contributions
lights-0.005
ushes-0.005
ill-0.005
igsaw-0.005
etti-0.005
Token is
Token ID318
Feature activation-3.88e-03

Pos logit contributions
 asking+0.164
 trying+0.158
 mean+0.145
 sneaking+0.144
 hunters+0.143
Neg logit contributions
ill-0.186
igsaw-0.183
lights-0.177
etti-0.169
ushes-0.165
Token orange
Token ID10912
Feature activation+2.94e-02

Pos logit contributions
cod+0.006
 asking+0.005
 sneaking+0.005
 cowboy+0.005
 challenged+0.005
Neg logit contributions
lights-0.007
ill-0.007
ushes-0.007
etti-0.006
igsaw-0.006
Token big
Token ID1263
Feature activation-8.85e-03

Pos logit contributions
ill+0.011
igsaw+0.011
lights+0.011
ushes+0.011
etti+0.010
Neg logit contributions
 asking-0.008
 trying-0.008
 challenged-0.007
 cowboy-0.007
 sneaking-0.007
Token and
Token ID290
Feature activation-9.54e-04

Pos logit contributions
 asking+0.014
 challenged+0.013
ep+0.012
 trying+0.012
 mean+0.012
Neg logit contributions
ill-0.017
igsaw-0.017
lights-0.016
etti-0.015
umm-0.015
Token shiny
Token ID22441
Feature activation+4.61e-02

Pos logit contributions
 asking+0.001
 trying+0.001
 mean+0.001
 telling+0.001
 sneaking+0.001
Neg logit contributions
ill-0.002
igsaw-0.002
lights-0.002
etti-0.002
umm-0.002
Token.
Token ID13
Feature activation+7.63e-04

Pos logit contributions
ill+0.109
igsaw+0.108
lights+0.103
etti+0.100
ushes+0.099
Neg logit contributions
 asking-0.059
 trying-0.057
 mean-0.051
 hunters-0.050
 sneaking-0.049
Token They
Token ID1119
Feature activation-3.88e-02

Pos logit contributions
igsaw+0.002
lights+0.002
etti+0.002
umm+0.002
ill+0.001
Neg logit contributions
 asking-0.001
 trying-0.001
 mean-0.001
 le-0.001
 challenged-0.001
Token wanted
Token ID2227
Feature activation-2.00e-01

Pos logit contributions
 asking+0.045
 trying+0.042
 hunters+0.038
 sneaking+0.038
 mean+0.038
Neg logit contributions
ill-0.097
igsaw-0.093
lights-0.089
etti-0.088
ushes-0.087
Token to
Token ID284
Feature activation-1.13e-01

Pos logit contributions
 asking+0.242
 trying+0.231
 mean+0.208
 sneaking+0.202
 hunters+0.202
Neg logit contributions
ill-0.485
igsaw-0.476
lights-0.456
etti-0.446
ushes-0.439
Token see
Token ID766
Feature activation+5.64e-02

Pos logit contributions
 asking+0.131
 trying+0.129
 mean+0.123
 le+0.113
 sc+0.111
Neg logit contributions
lights-0.297
igsaw-0.278
ill-0.265
umm-0.263
ushes-0.256
Token what
Token ID644
Feature activation-5.04e-03

Pos logit contributions
igsaw+0.132
ill+0.130
lights+0.124
etti+0.120
umm+0.119
Neg logit contributions
 asking-0.076
 trying-0.074
 mean-0.067
 hunters-0.064
 telling-0.064
Token was
Token ID373
Feature activation+7.80e-03

Pos logit contributions
 trying+0.006
 asking+0.006
 mean+0.006
 le+0.005
 walking+0.005
Neg logit contributions
igsaw-0.013
ill-0.013
etti-0.012
lights-0.012
umm-0.012
Token inside
Token ID2641
Feature activation+1.51e-01

Pos logit contributions
ill+0.010
igsaw+0.009
etti+0.009
lights+0.008
ushes+0.008
Neg logit contributions
 cowboy-0.015
 sheriff-0.015
 asking-0.015
 demanding-0.014
 sheep-0.014
Token not
Token ID407
Feature activation-8.95e-02

Pos logit contributions
ill+0.016
lights+0.015
igsaw+0.015
ushes+0.014
etti+0.014
Neg logit contributions
 asking-0.011
 trying-0.010
 sneaking-0.009
 hunters-0.009
 gir-0.009
Token mean
Token ID1612
Feature activation-1.71e-01

Pos logit contributions
 asking+0.107
 trying+0.098
 sneaking+0.096
 challenged+0.090
 gir+0.090
Neg logit contributions
ill-0.200
igsaw-0.181
lights-0.181
ushes-0.172
etti-0.171
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
 asking+0.299
ep+0.294
 trying+0.286
 cowboy+0.281
cut+0.281
Neg logit contributions
ill-0.310
igsaw-0.265
lights-0.263
ushes-0.247
etti-0.244
Token It
Token ID632
Feature activation+1.10e-01

Pos logit contributions
 mean+0.020
 trying+0.019
 asking+0.019
 hunters+0.018
 rude+0.017
Neg logit contributions
lights-0.028
igsaw-0.028
umm-0.027
ill-0.027
etti-0.026
Token just
Token ID655
Feature activation+4.18e-02

Pos logit contributions
ill+0.197
lights+0.181
etti+0.167
ushes+0.166
igsaw+0.164
Neg logit contributions
 asking-0.189
 trying-0.173
 hunters-0.168
 sneaking-0.158
 guns-0.158
Token wanted
Token ID2227
Feature activation-1.93e-01

Pos logit contributions
igsaw+0.106
ill+0.106
lights+0.101
etti+0.099
ushes+0.097
Neg logit contributions
 asking-0.046
 trying-0.044
 mean-0.040
 hunters-0.038
 sneaking-0.037
Token to
Token ID284
Feature activation-1.12e-01

Pos logit contributions
 asking+0.260
 trying+0.253
 mean+0.225
 sneaking+0.225
 hunters+0.224
Neg logit contributions
ill-0.427
igsaw-0.416
lights-0.400
etti-0.386
ushes-0.383
Token play
Token ID711
Feature activation-1.43e-03

Pos logit contributions
 asking+0.167
 trying+0.161
 mean+0.147
 hunters+0.144
 sneaking+0.144
Neg logit contributions
ill-0.239
igsaw-0.237
lights-0.226
etti-0.219
ushes-0.214
Token.
Token ID13
Feature activation-1.22e-02

Pos logit contributions
 asking+0.002
 trying+0.002
 mean+0.002
 sneaking+0.002
 hunters+0.002
Neg logit contributions
ill-0.003
igsaw-0.003
lights-0.003
etti-0.003
ushes-0.003
Token\n
Token ID198
Feature activation-1.35e-03

Pos logit contributions
 asking+0.017
 trying+0.017
 mean+0.017
 hunters+0.015
 rude+0.015
Neg logit contributions
igsaw-0.027
ill-0.026
lights-0.026
umm-0.026
etti-0.025
Token\n
Token ID198
Feature activation-3.01e-03

Pos logit contributions
 asking+0.002
 trying+0.002
 mean+0.002
 hunters+0.002
 telling+0.002
Neg logit contributions
igsaw-0.003
ill-0.003
lights-0.002
etti-0.002
umm-0.002
Token they
Token ID484
Feature activation-4.86e-02

Pos logit contributions
 asking+0.011
 trying+0.011
 mean+0.010
 hunters+0.009
 sneaking+0.009
Neg logit contributions
ill-0.015
igsaw-0.015
lights-0.014
etti-0.013
ushes-0.013
Token were
Token ID547
Feature activation+3.90e-02

Pos logit contributions
 asking+0.072
 trying+0.069
 mean+0.064
 hunters+0.061
 telling+0.061
Neg logit contributions
igsaw-0.105
ill-0.104
lights-0.099
etti-0.096
ushes-0.094
Token important
Token ID1593
Feature activation-6.00e-02

Pos logit contributions
ill+0.079
lights+0.077
igsaw+0.075
ushes+0.074
etti+0.073
Neg logit contributions
 asking-0.055
 trying-0.052
 sneaking-0.051
 hunters-0.050
 mean-0.048
Token.
Token ID13
Feature activation-3.05e-04

Pos logit contributions
 asking+0.087
 trying+0.083
 mean+0.082
 sneaking+0.080
 hunters+0.077
Neg logit contributions
ill-0.121
igsaw-0.119
lights-0.119
ushes-0.112
etti-0.112
Token They
Token ID1119
Feature activation-4.72e-02

Pos logit contributions
 asking+0.001
 trying+0.001
 telling+0.000
 mean+0.000
 walking+0.000
Neg logit contributions
igsaw-0.001
ill-0.001
lights-0.001
umm-0.001
etti-0.001
Token said
Token ID531
Feature activation-1.62e-01

Pos logit contributions
 asking+0.063
 trying+0.060
 mean+0.055
 hunters+0.053
 sneaking+0.052
Neg logit contributions
ill-0.109
igsaw-0.108
lights-0.104
etti-0.100
ushes-0.099
Token they
Token ID484
Feature activation-4.90e-02

Pos logit contributions
 asking+0.314
 trying+0.302
 sneaking+0.287
 hunters+0.282
 mean+0.275
Neg logit contributions
ill-0.259
igsaw-0.253
lights-0.253
etti-0.231
ushes-0.224
Token would
Token ID561
Feature activation-7.83e-02

Pos logit contributions
 asking+0.069
 trying+0.069
 mean+0.065
 telling+0.060
 going+0.059
Neg logit contributions
igsaw-0.113
ill-0.108
lights-0.105
etti-0.101
umm-0.101
Token always
Token ID1464
Feature activation-5.18e-02

Pos logit contributions
 asking+0.121
 trying+0.117
 mean+0.108
 sneaking+0.104
 hunters+0.104
Neg logit contributions
ill-0.163
igsaw-0.163
lights-0.155
etti-0.149
ushes-0.147
Token support
Token ID1104
Feature activation+4.85e-02

Pos logit contributions
 asking+0.081
 trying+0.080
 mean+0.074
 telling+0.070
 walking+0.069
Neg logit contributions
igsaw-0.110
ill-0.109
lights-0.105
etti-0.101
umm-0.100
Token them
Token ID606
Feature activation+1.08e-03

Pos logit contributions
igsaw+0.101
ill+0.100
lights+0.094
etti+0.093
umm+0.091
Neg logit contributions
 asking-0.080
 trying-0.076
 mean-0.071
 telling-0.069
 hunters-0.068
Token,
Token ID11
Feature activation-5.33e-03

Pos logit contributions
 asking+0.070
 trying+0.065
 sneaking+0.062
 hunters+0.062
 telling+0.061
Neg logit contributions
igsaw-0.095
ill-0.093
lights-0.092
etti-0.090
-0.086
Token Bun
Token ID28515
Feature activation-1.22e-02

Pos logit contributions
 asking+0.009
 trying+0.009
 sneaking+0.008
 gir+0.008
 hunters+0.008
Neg logit contributions
lights-0.009
igsaw-0.009
ill-0.009
etti-0.009
-0.008
Token-
Token ID12
Feature activation-1.33e-04

Pos logit contributions
 asking+0.022
 trying+0.021
 stealing+0.021
 men+0.020
 sneaking+0.020
Neg logit contributions
ill-0.018
cy-0.018
aire-0.017
ugg-0.016
lights-0.016
TokenB
Token ID33
Feature activation-9.51e-02

Pos logit contributions
 asking+0.000
 trying+0.000
 sneaking+0.000
 mean+0.000
 telling+0.000
Neg logit contributions
ill-0.000
lights-0.000
igsaw-0.000
ushes-0.000
etti-0.000
Tokenun
Token ID403
Feature activation-9.55e-02

Pos logit contributions
 asking+0.157
 trying+0.152
 mean+0.140
 sneaking+0.137
 telling+0.136
Neg logit contributions
ill-0.189
igsaw-0.187
lights-0.178
etti-0.172
ushes-0.170
Token met
Token ID1138
Feature activation-1.55e-01

Pos logit contributions
 asking+0.175
 trying+0.166
 sneaking+0.158
 stealing+0.157
 demanding+0.155
Neg logit contributions
ill-0.162
aire-0.145
igsaw-0.141
cy-0.139
etti-0.138
Token a
Token ID257
Feature activation-4.38e-02

Pos logit contributions
 asking+0.198
 trying+0.190
 mean+0.170
 sneaking+0.167
 hunters+0.165
Neg logit contributions
ill-0.365
igsaw-0.360
lights-0.344
etti-0.335
ushes-0.331
Token little
Token ID1310
Feature activation-1.09e-01

Pos logit contributions
 asking+0.061
 trying+0.054
 sneaking+0.051
 hunters+0.050
 yells+0.049
Neg logit contributions
igsaw-0.094
ill-0.088
aire-0.084
etti-0.083
lights-0.082
Token girl
Token ID2576
Feature activation-3.29e-02

Pos logit contributions
 asking+0.169
 trying+0.158
 sneaking+0.149
ep+0.146
 going+0.145
Neg logit contributions
igsaw-0.213
ill-0.209
lights-0.196
aire-0.195
cy-0.193
Token named
Token ID3706
Feature activation-1.37e-01

Pos logit contributions
 cowboy+0.036
 asking+0.034
 men+0.032
 sheriff+0.031
 hunters+0.031
Neg logit contributions
ill-0.078
etti-0.070
cy-0.069
igsaw-0.068
lights-0.067
Token Lily
Token ID20037
Feature activation+1.02e-03

Pos logit contributions
 asking+0.263
 trying+0.255
 mean+0.239
 sneaking+0.233
 hunters+0.233
Neg logit contributions
ill-0.236
igsaw-0.235
lights-0.223
etti-0.213
ushes-0.208
Token should
Token ID815
Feature activation-6.82e-02

Pos logit contributions
 asking+0.031
 trying+0.030
 mean+0.028
 telling+0.027
 hunters+0.026
Neg logit contributions
igsaw-0.039
ill-0.039
lights-0.038
etti-0.036
umm-0.035
Token do
Token ID466
Feature activation-2.16e-02

Pos logit contributions
 asking+0.103
 trying+0.100
 mean+0.092
 sneaking+0.089
 hunters+0.088
Neg logit contributions
ill-0.144
igsaw-0.144
lights-0.137
etti-0.132
ushes-0.130
Token it
Token ID340
Feature activation+8.33e-02

Pos logit contributions
 asking+0.036
 trying+0.034
 hunters+0.032
 mean+0.032
 sneaking+0.032
Neg logit contributions
igsaw-0.041
ill-0.040
lights-0.038
etti-0.037
ushes-0.036
Token.
Token ID13
Feature activation+2.97e-02

Pos logit contributions
ill+0.188
igsaw+0.186
lights+0.177
etti+0.172
ushes+0.168
Neg logit contributions
 asking-0.114
 trying-0.109
 mean-0.098
 hunters-0.097
 sneaking-0.097
Token You
Token ID921
Feature activation-3.95e-02

Pos logit contributions
igsaw+0.053
ill+0.053
lights+0.050
ushes+0.047
etti+0.046
Neg logit contributions
 asking-0.056
 trying-0.054
 mean-0.051
 hunters-0.050
 telling-0.049
Token might
Token ID1244
Feature activation-1.57e-01

Pos logit contributions
 asking+0.062
 trying+0.060
 mean+0.055
 sneaking+0.054
 hunters+0.054
Neg logit contributions
ill-0.081
igsaw-0.081
lights-0.077
etti-0.074
ushes-0.073
Token not
Token ID407
Feature activation-7.52e-02

Pos logit contributions
 asking+0.249
 trying+0.240
 mean+0.219
 hunters+0.217
 sneaking+0.216
Neg logit contributions
ill-0.321
igsaw-0.319
lights-0.301
etti-0.292
ushes-0.286
Token win
Token ID1592
Feature activation+1.04e-02

Pos logit contributions
 asking+0.116
 trying+0.113
 mean+0.106
 telling+0.100
 sneaking+0.099
Neg logit contributions
igsaw-0.159
ill-0.158
lights-0.153
etti-0.147
ushes-0.144
Token."
Token ID526
Feature activation+4.52e-02

Pos logit contributions
ill+0.024
igsaw+0.023
lights+0.022
etti+0.022
ushes+0.021
Neg logit contributions
 asking-0.014
 trying-0.014
 mean-0.012
 hunters-0.012
 sneaking-0.012
Token\n
Token ID198
Feature activation+2.56e-02

Pos logit contributions
ill+0.088
igsaw+0.088
lights+0.086
etti+0.082
ushes+0.081
Neg logit contributions
 asking-0.079
 trying-0.075
 telling-0.067
 mean-0.067
 walking-0.064
Token\n
Token ID198
Feature activation+2.38e-02

Pos logit contributions
igsaw+0.049
ill+0.048
lights+0.046
etti+0.045
umm+0.044
Neg logit contributions
 asking-0.045
 trying-0.043
 mean-0.040
 telling-0.039
 sneaking-0.038