TOP ACTIVATIONS
MAX = 17.759

aded in the water delic ately with his fishing rod .
through the water and delic ately scoop ed up some of
day . They worked delic ately and carefully and nothing broke
of sprink les and delic ately spread them over the ice
uff led his hair affection ately . The
work , carefully and delic ately repairing the butterfly 's wings
little girl 's shoulders affection ately . Sarah smiled
the brav est and boun ci est bunny in the whole
will buy you the fan ci est dress in the store
on her walk , R yle e stumbled across a mysterious
to have a cool hairst yle , but he didn 't
was full of people b owing to the stage . The
new home with the fan ci est king in the land
corner . It was me owing softly . The girl was
noticed a little kitten me owing for help . It was
Fl uffy immediately stopped me owing . Lily was so happy
way back home , v owing never to return to that
happy ! He kept b owing and w agging his tail
sun sets and the b owing all made her feel happy
courage and loyalty by b owing to Daisy . <|endoftext|> Once

MIN ACTIVATIONS
MIN = -14.939

is s og gy and mush y . It makes Ben
. It is brown and mush y . "
't . They are all mush y and wet . I
sour . The strawberries were mush y and brown . The
. It was sour and mush y , but he did
" It is sour and mush y . Can I have
the soft sand and the mush y mud .
vegetable . And it is mush y and brown inside ."
. They are dry and mush y ," Sara said .
the jelly was a bit mush y . But that was
. It is wet and mush y . I put it
herbs had turned into a mush y green mixture .
, the wire was all mush y and wet .
It will get brown and mush y and make a mess
that they were old and mush y . Lily was sad
apple was so old and mush y that it started to
it was all brown and mush y . The
The tomatoes get soft and mush y . The cheese gets
loud noise and a big mess . Lily was scared and
they also made a big mess . They threw their toys

INTERVAL 9.584 - 17.759
CONTAINS 0.116%

named Molly . She was 3 years old and loved to
Everyone watched him in amaz ement and cheered .
one . Her mom said yes , and she happily ate
. He had heard about aer opl anes before ; they
Tom was visiting his grand pa 's house . His grand

INTERVAL 1.410 - 9.584
CONTAINS 25.074%

. She wants to have a party at her house .
went into the street . A car came and hit the
Tim my said , " Sure ! Let 's build
" Sure , you can play with us .
Sara and Ben say . " It is very fancy ."

INTERVAL -6.764 - 1.410
CONTAINS 73.258%

something wonderful with his yarn . <|endoftext|> Once upon a time
what to do . She came to Lily and said ,
smile . " Do you want to try ?"
help but smile . The moral of this story
he tried it . But little did he know , it

INTERVAL -14.939 - -6.764
CONTAINS 1.552%

to promise not to eat my cake again . OK ?"
y ! You can 't hurt my rabbit ! He is
to take them away . Lily 's mom told her to
." Anna did not want to be weak and
to stay on the first floor . The elevator
Tokenaded
Token ID5286
Feature activation+3.84e+00

Pos logit contributions
 mess+0.121
 mush+0.114
omo+0.107
 slippery+0.106
 time+0.103
Neg logit contributions
ci-0.148
piring-0.148
work-0.148
cient-0.145
illa-0.142
Token in
Token ID287
Feature activation+1.67e+00

Pos logit contributions
piring+0.216
cient+0.212
work+0.206
leaf+0.200
ci+0.199
Neg logit contributions
 mess-0.240
 mush-0.228
omo-0.217
onna-0.211
 slippery-0.208
Token the
Token ID262
Feature activation+7.04e-01

Pos logit contributions
piring+0.095
cient+0.093
work+0.091
igh+0.090
leaf+0.090
Neg logit contributions
 mess-0.097
 mush-0.094
omo-0.091
onna-0.086
 slippery-0.084
Token water
Token ID1660
Feature activation-5.36e+00

Pos logit contributions
cient+0.043
piring+0.043
work+0.041
 father+0.041
 photographer+0.040
Neg logit contributions
 mess-0.039
 mush-0.037
omo-0.035
onna-0.035
 slippery-0.033
Token delic
Token ID8675
Feature activation-5.20e+00

Pos logit contributions
 mess+0.276
 mush+0.260
 slippery+0.239
omo+0.233
onna+0.228
Neg logit contributions
piring-0.360
cient-0.355
leaf-0.344
work-0.344
illa-0.342
Tokenately
Token ID1286
Feature activation+1.78e+01

Pos logit contributions
onna+0.314
 mess+0.313
 mush+0.297
omo+0.296
 holes+0.280
Neg logit contributions
ci-0.266
cient-0.265
leaf-0.256
piring-0.249
 photographer-0.248
Token with
Token ID351
Feature activation-4.00e+00

Pos logit contributions
piring+1.113
cient+1.107
work+1.069
ci+1.050
leaf+1.048
Neg logit contributions
 mess-0.973
 mush-0.929
omo-0.859
 slippery-0.844
onna-0.818
Token his
Token ID465
Feature activation-1.01e+00

Pos logit contributions
 mess+0.207
 mush+0.190
omo+0.182
onna+0.179
 slippery+0.173
Neg logit contributions
piring-0.266
cient-0.261
work-0.250
 father-0.250
leaf-0.250
Token fishing
Token ID12478
Feature activation+7.35e+00

Pos logit contributions
 mess+0.055
 mush+0.054
 slippery+0.053
 messy+0.050
 time+0.050
Neg logit contributions
illa-0.065
piring-0.064
work-0.064
cient-0.063
ica-0.063
Token rod
Token ID15299
Feature activation+5.06e+00

Pos logit contributions
piring+0.324
cient+0.318
work+0.305
illa+0.300
leaf+0.299
Neg logit contributions
 mess-0.545
 mush-0.525
 slippery-0.494
omo-0.491
onna-0.480
Token.
Token ID13
Feature activation+3.64e-01

Pos logit contributions
piring+0.360
cient+0.359
leaf+0.349
 photographer+0.348
illa+0.347
Neg logit contributions
 mess-0.237
 mush-0.224
 slippery-0.204
omo-0.196
onna-0.193
Token through
Token ID832
Feature activation-4.32e-01

Pos logit contributions
 mess+0.027
 slippery+0.025
 mush+0.025
 sticky+0.022
 messy+0.022
Neg logit contributions
cient-0.050
 photographer-0.048
illa-0.048
iya-0.047
thel-0.047
Token the
Token ID262
Feature activation+6.88e-01

Pos logit contributions
 mess+0.022
 mush+0.021
 slippery+0.019
 Lily+0.018
 messy+0.018
Neg logit contributions
cient-0.030
piring-0.030
illa-0.029
work-0.029
leaf-0.029
Token water
Token ID1660
Feature activation-5.38e+00

Pos logit contributions
piring+0.049
cient+0.049
work+0.047
ci+0.047
leaf+0.046
Neg logit contributions
 mess-0.031
 mush-0.030
omo-0.028
onna-0.028
 slippery-0.027
Token and
Token ID290
Feature activation+4.79e-01

Pos logit contributions
 mess+0.258
 mush+0.241
 slippery+0.225
onna+0.210
 Lily+0.209
Neg logit contributions
cient-0.379
piring-0.374
leaf-0.371
 photographer-0.366
illa-0.364
Token delic
Token ID8675
Feature activation-5.21e+00

Pos logit contributions
cient+0.031
piring+0.031
 photographer+0.029
leaf+0.029
illa+0.029
Neg logit contributions
 mess-0.026
 mush-0.025
 slippery-0.023
 Lily-0.023
 messy-0.022
Tokenately
Token ID1286
Feature activation+1.77e+01

Pos logit contributions
 mess+0.421
onna+0.404
 mush+0.402
omo+0.401
 slippery+0.386
Neg logit contributions
cient-0.171
 photographer-0.164
leaf-0.158
ci-0.157
bike-0.155
Token scoop
Token ID23529
Feature activation-1.52e-01

Pos logit contributions
piring+1.103
cient+1.089
work+1.054
illa+1.040
leaf+1.037
Neg logit contributions
 mess-0.989
 mush-0.942
omo-0.869
 slippery-0.859
onna-0.833
Tokened
Token ID276
Feature activation+4.52e+00

Pos logit contributions
 mess+0.013
 mush+0.013
omo+0.013
onna+0.012
 slippery+0.012
Neg logit contributions
piring-0.005
cient-0.004
work-0.004
leaf-0.004
illa-0.004
Token up
Token ID510
Feature activation-1.93e+00

Pos logit contributions
piring+0.277
cient+0.274
work+0.266
leaf+0.261
illa+0.261
Neg logit contributions
 mess-0.256
 mush-0.243
omo-0.225
 slippery-0.223
onna-0.217
Token some
Token ID617
Feature activation-1.27e+00

Pos logit contributions
 mess+0.102
 mush+0.097
 slippery+0.090
 Lily+0.086
omo+0.085
Neg logit contributions
cient-0.128
piring-0.126
illa-0.121
work-0.121
 photographer-0.121
Token of
Token ID286
Feature activation-2.15e+00

Pos logit contributions
 mess+0.062
 mush+0.060
omo+0.058
onna+0.055
 slippery+0.051
Neg logit contributions
piring-0.087
leaf-0.082
cient-0.082
work-0.082
igh-0.079
Token day
Token ID1110
Feature activation+1.96e+00

Pos logit contributions
 mess+0.018
 mush+0.017
omo+0.016
 slippery+0.016
onna+0.015
Neg logit contributions
piring-0.022
cient-0.022
work-0.021
leaf-0.021
illa-0.021
Token.
Token ID13
Feature activation+2.87e-01

Pos logit contributions
cient+0.141
piring+0.140
work+0.135
illa+0.135
leaf+0.134
Neg logit contributions
 mess-0.093
 mush-0.085
 slippery-0.080
onna-0.076
omo-0.075
Token They
Token ID1119
Feature activation-1.21e-01

Pos logit contributions
piring+0.019
cient+0.019
work+0.018
ci+0.018
illa+0.018
Neg logit contributions
 mess-0.014
 mush-0.014
omo-0.013
 Lily-0.013
onna-0.013
Token worked
Token ID3111
Feature activation+2.42e+00

Pos logit contributions
 mess+0.006
 mush+0.006
 slippery+0.006
omo+0.005
 Lily+0.005
Neg logit contributions
cient-0.008
piring-0.008
owers-0.008
leaf-0.008
 photographer-0.008
Token delic
Token ID8675
Feature activation-5.27e+00

Pos logit contributions
piring+0.174
work+0.170
cient+0.170
ci+0.167
leaf+0.165
Neg logit contributions
 mess-0.105
 mush-0.103
omo-0.094
onna-0.089
 slippery-0.086
Tokenately
Token ID1286
Feature activation+1.77e+01

Pos logit contributions
onna+0.407
 mess+0.398
omo+0.394
 slippery+0.374
 mush+0.372
Neg logit contributions
 photographer-0.190
cient-0.188
 goat-0.181
ci-0.170
piring-0.169
Token and
Token ID290
Feature activation+4.18e-01

Pos logit contributions
piring+1.102
cient+1.095
work+1.052
illa+1.044
 photographer+1.042
Neg logit contributions
 mess-0.990
 mush-0.936
 slippery-0.872
omo-0.852
onna-0.834
Token carefully
Token ID7773
Feature activation+2.08e-01

Pos logit contributions
cient+0.026
piring+0.026
work+0.025
 photographer+0.025
leaf+0.025
Neg logit contributions
 mess-0.023
 mush-0.022
 slippery-0.021
omo-0.020
onna-0.020
Token and
Token ID290
Feature activation+4.13e-01

Pos logit contributions
cient+0.014
piring+0.014
 photographer+0.014
illa+0.014
leaf+0.014
Neg logit contributions
 mess-0.011
 mush-0.010
 slippery-0.009
onna-0.009
 Lily-0.009
Token nothing
Token ID2147
Feature activation-4.66e+00

Pos logit contributions
cient+0.027
piring+0.027
work+0.026
leaf+0.026
 photographer+0.026
Neg logit contributions
 mess-0.022
 mush-0.021
 slippery-0.019
omo-0.019
 Lily-0.018
Token broke
Token ID6265
Feature activation-8.32e+00

Pos logit contributions
 mess+0.258
 mush+0.251
omo+0.244
onna+0.221
 slippery+0.219
Neg logit contributions
piring-0.285
work-0.272
cient-0.272
leaf-0.269
ci-0.264
Token of
Token ID286
Feature activation-2.24e+00

Pos logit contributions
 mess+0.136
omo+0.129
 mush+0.127
onna+0.119
 painful+0.116
Neg logit contributions
piring-0.130
work-0.121
owers-0.117
amb-0.117
leaf-0.113
Token sprink
Token ID27562
Feature activation-3.40e+00

Pos logit contributions
 mess+0.131
 mush+0.120
onna+0.117
omo+0.116
 steep+0.112
Neg logit contributions
leaf-0.124
piring-0.124
cient-0.120
illa-0.116
work-0.115
Tokenles
Token ID829
Feature activation-1.99e+00

Pos logit contributions
 mess+0.179
 mush+0.171
 slippery+0.157
omo+0.156
onna+0.154
Neg logit contributions
piring-0.219
cient-0.216
work-0.213
ci-0.208
igh-0.208
Token and
Token ID290
Feature activation+3.59e-01

Pos logit contributions
 mess+0.094
 mush+0.088
 slippery+0.081
 messy+0.076
 Lily+0.076
Neg logit contributions
cient-0.146
piring-0.139
avi-0.138
leaf-0.137
illa-0.136
Token delic
Token ID8675
Feature activation-5.33e+00

Pos logit contributions
piring+0.020
cient+0.020
work+0.020
leaf+0.019
ci+0.019
Neg logit contributions
 mess-0.022
 mush-0.020
omo-0.019
 slippery-0.019
onna-0.019
Tokenately
Token ID1286
Feature activation+1.76e+01

Pos logit contributions
 mess+0.397
onna+0.394
 mush+0.392
omo+0.390
 slippery+0.358
Neg logit contributions
cient-0.208
 photographer-0.202
ci-0.201
piring-0.197
 goat-0.196
Token spread
Token ID4104
Feature activation+2.03e+00

Pos logit contributions
piring+0.971
cient+0.955
work+0.933
leaf+0.925
illa+0.923
Neg logit contributions
 mess-1.063
 mush-1.012
omo-0.953
 slippery-0.935
 messy-0.899
Token them
Token ID606
Feature activation-1.03e+00

Pos logit contributions
piring+0.110
work+0.106
cient+0.105
leaf+0.103
 father+0.102
Neg logit contributions
 mess-0.126
omo-0.119
 mush-0.116
 slippery-0.109
onna-0.108
Token over
Token ID625
Feature activation-7.67e+00

Pos logit contributions
 mess+0.059
 mush+0.056
omo+0.053
 slippery+0.051
onna+0.050
Neg logit contributions
piring-0.063
cient-0.061
work-0.061
ci-0.059
moon-0.059
Token the
Token ID262
Feature activation+5.36e-01

Pos logit contributions
 mess+0.463
 mush+0.435
omo+0.428
 slippery+0.406
onna+0.400
Neg logit contributions
piring-0.420
work-0.413
 father-0.402
illa-0.397
 photographer-0.396
Token ice
Token ID4771
Feature activation+2.17e+00

Pos logit contributions
work+0.030
moon+0.029
piring+0.029
cient+0.028
 father+0.028
Neg logit contributions
omo-0.029
 mush-0.027
 mess-0.027
onna-0.026
 painful-0.026
Tokenuff
Token ID1648
Feature activation+9.24e+00

Pos logit contributions
 mess+0.531
 mush+0.509
omo+0.487
 slippery+0.474
onna+0.461
Neg logit contributions
piring-0.408
cient-0.406
work-0.385
 photographer-0.380
leaf-0.380
Tokenled
Token ID992
Feature activation+1.11e+00

Pos logit contributions
piring+0.556
cient+0.550
 photographer+0.535
work+0.532
moon+0.530
Neg logit contributions
 mess-0.528
 mush-0.505
 slippery-0.471
omo-0.466
 Lily-0.455
Token his
Token ID465
Feature activation-1.18e+00

Pos logit contributions
piring+0.069
cient+0.067
work+0.066
ci+0.065
illa+0.064
Neg logit contributions
 mess-0.063
 mush-0.059
omo-0.055
 slippery-0.055
onna-0.054
Token hair
Token ID4190
Feature activation-1.96e+00

Pos logit contributions
 mess+0.073
 mush+0.069
omo+0.065
 slippery+0.064
 messy+0.063
Neg logit contributions
piring-0.071
leaf-0.068
cient-0.067
illa-0.067
work-0.067
Token affection
Token ID17696
Feature activation-1.77e-01

Pos logit contributions
 mess+0.103
 mush+0.097
 slippery+0.090
omo+0.088
onna+0.086
Neg logit contributions
piring-0.129
cient-0.128
leaf-0.124
work-0.124
 photographer-0.123
Tokenately
Token ID1286
Feature activation+1.76e+01

Pos logit contributions
 mess+0.014
 mush+0.013
omo+0.013
 Lily+0.013
 slippery+0.013
Neg logit contributions
cient-0.007
leaf-0.007
piring-0.007
 photographer-0.006
work-0.006
Token.
Token ID13
Feature activation+1.78e-01

Pos logit contributions
cient+1.042
piring+1.041
ci+1.033
work+1.020
moon+0.997
Neg logit contributions
 mess-0.978
 mush-0.946
omo-0.889
 slippery-0.856
onna-0.832
Token
Token ID220
Feature activation-3.98e-01

Pos logit contributions
piring+0.012
cient+0.012
work+0.011
illa+0.011
leaf+0.011
Neg logit contributions
 mess-0.009
 mush-0.009
 slippery-0.008
omo-0.008
onna-0.008
Token
Token ID198
Feature activation+7.59e-01

Pos logit contributions
 mess+0.021
 mush+0.020
 slippery+0.018
omo+0.018
onna+0.018
Neg logit contributions
piring-0.026
cient-0.025
work-0.025
illa-0.024
leaf-0.024
Token
Token ID198
Feature activation+7.63e-01

Pos logit contributions
cient+0.043
piring+0.043
illa+0.042
leaf+0.042
moon+0.041
Neg logit contributions
 mess-0.046
 mush-0.044
 slippery-0.040
omo-0.039
 messy-0.038
TokenThe
Token ID464
Feature activation+9.22e-01

Pos logit contributions
piring+0.048
cient+0.047
 photographer+0.046
illa+0.046
leaf+0.046
Neg logit contributions
 mess-0.043
 mush-0.040
 slippery-0.036
onna-0.035
omo-0.034
Token work
Token ID670
Feature activation+2.67e+00

Pos logit contributions
 mess+0.001
 mush+0.001
 slippery+0.001
 Lily+0.001
 messy+0.001
Neg logit contributions
piring-0.001
cient-0.001
amb-0.001
illa-0.001
owers-0.001
Token,
Token ID11
Feature activation-6.29e-01

Pos logit contributions
piring+0.161
cient+0.158
work+0.158
ci+0.153
moon+0.151
Neg logit contributions
 mess-0.149
 mush-0.144
omo-0.136
 slippery-0.132
onna-0.129
Token carefully
Token ID7773
Feature activation+7.50e-02

Pos logit contributions
 mess+0.033
 mush+0.031
 slippery+0.028
omo+0.027
 Lily+0.026
Neg logit contributions
piring-0.043
cient-0.042
work-0.041
illa-0.041
leaf-0.040
Token and
Token ID290
Feature activation+2.77e-01

Pos logit contributions
cient+0.005
piring+0.005
illa+0.005
iya+0.005
leaf+0.005
Neg logit contributions
 mess-0.004
 mush-0.004
 slippery-0.003
 messy-0.003
 sticky-0.003
Token delic
Token ID8675
Feature activation-5.40e+00

Pos logit contributions
piring+0.018
cient+0.018
leaf+0.017
owers+0.017
illa+0.017
Neg logit contributions
 mess-0.015
 mush-0.014
 slippery-0.013
 Lily-0.012
omo-0.012
Tokenately
Token ID1286
Feature activation+1.76e+01

Pos logit contributions
onna+0.408
 mess+0.407
omo+0.398
 mush+0.388
 slippery+0.374
Neg logit contributions
 photographer-0.187
leaf-0.187
cient-0.181
ci-0.179
thel-0.175
Token repairing
Token ID39005
Feature activation+3.78e+00

Pos logit contributions
piring+1.086
cient+1.076
work+1.040
leaf+1.026
 photographer+1.023
Neg logit contributions
 mess-1.000
 mush-0.945
 slippery-0.876
omo-0.863
onna-0.836
Token the
Token ID262
Feature activation+5.35e-01

Pos logit contributions
piring+0.222
cient+0.221
work+0.215
illa+0.211
ci+0.210
Neg logit contributions
 mess-0.222
 mush-0.212
omo-0.197
 slippery-0.195
onna-0.192
Token butterfly
Token ID35113
Feature activation+2.44e+00

Pos logit contributions
piring+0.032
cient+0.032
work+0.031
illa+0.031
leaf+0.031
Neg logit contributions
 mess-0.032
 mush-0.030
 slippery-0.028
omo-0.027
 messy-0.026
Token's
Token ID338
Feature activation+2.82e+00

Pos logit contributions
piring+0.163
cient+0.160
leaf+0.157
 photographer+0.155
illa+0.155
Neg logit contributions
 mess-0.127
 mush-0.119
 slippery-0.110
omo-0.104
onna-0.104
Token wings
Token ID12098
Feature activation-3.16e-01

Pos logit contributions
piring+0.172
cient+0.169
illa+0.163
leaf+0.163
work+0.162
Neg logit contributions
 mess-0.167
 mush-0.158
 slippery-0.146
omo-0.141
 messy-0.139
Token little
Token ID1310
Feature activation-4.27e-01

Pos logit contributions
piring+0.020
cient+0.020
work+0.019
leaf+0.019
illa+0.019
Neg logit contributions
 mess-0.020
 mush-0.019
omo-0.018
 slippery-0.018
onna-0.017
Token girl
Token ID2576
Feature activation+3.15e+00

Pos logit contributions
 mess+0.028
 mush+0.027
omo+0.025
 slippery+0.025
onna+0.024
Neg logit contributions
piring-0.022
cient-0.022
work-0.021
ci-0.021
illa-0.021
Token's
Token ID338
Feature activation+2.58e+00

Pos logit contributions
cient+0.228
piring+0.222
leaf+0.221
 photographer+0.221
avi+0.217
Neg logit contributions
 mess-0.149
 mush-0.138
 slippery-0.130
 messy-0.123
 Lily-0.121
Token shoulders
Token ID12450
Feature activation+1.73e+00

Pos logit contributions
piring+0.143
cient+0.142
work+0.137
leaf+0.134
illa+0.134
Neg logit contributions
 mess-0.165
 mush-0.157
 slippery-0.147
omo-0.145
onna-0.141
Token affection
Token ID17696
Feature activation-4.10e-01

Pos logit contributions
cient+0.124
piring+0.122
 photographer+0.119
leaf+0.119
illa+0.118
Neg logit contributions
 mess-0.082
 mush-0.076
 slippery-0.071
 messy-0.067
onna-0.067
Tokenately
Token ID1286
Feature activation+1.74e+01

Pos logit contributions
 mess+0.028
 mush+0.026
 Lily+0.025
 slippery+0.025
omo+0.025
Neg logit contributions
cient-0.021
leaf-0.020
 photographer-0.019
piring-0.019
work-0.019
Token.
Token ID13
Feature activation-5.33e-02

Pos logit contributions
piring+1.037
cient+1.015
ci+1.011
work+0.997
moon+0.988
Neg logit contributions
 mess-0.950
 mush-0.936
omo-0.886
 slippery-0.829
onna-0.813
Token
Token ID198
Feature activation+5.45e-01

Pos logit contributions
 mess+0.003
 mush+0.003
 slippery+0.003
omo+0.003
onna+0.003
Neg logit contributions
piring-0.003
cient-0.003
work-0.003
illa-0.003
leaf-0.003
Token
Token ID198
Feature activation+5.35e-01

Pos logit contributions
piring+0.033
cient+0.032
illa+0.031
leaf+0.031
work+0.031
Neg logit contributions
 mess-0.031
 mush-0.030
 slippery-0.028
omo-0.027
onna-0.026
TokenSarah
Token ID29284
Feature activation-8.03e-01

Pos logit contributions
piring+0.035
cient+0.034
 photographer+0.034
leaf+0.033
illa+0.033
Neg logit contributions
 mess-0.028
 mush-0.027
 slippery-0.024
onna-0.024
omo-0.024
Token smiled
Token ID13541
Feature activation-2.00e+00

Pos logit contributions
 mess+0.046
 mush+0.044
omo+0.041
 slippery+0.041
onna+0.039
Neg logit contributions
piring-0.049
cient-0.049
work-0.047
leaf-0.046
 photographer-0.046
Token the
Token ID262
Feature activation+7.06e-01

Pos logit contributions
 mess+0.127
 mush+0.118
onna+0.109
omo+0.109
 slippery+0.106
Neg logit contributions
piring-0.132
cient-0.128
work-0.125
leaf-0.124
 photographer-0.123
Token brav
Token ID49025
Feature activation+4.79e+00

Pos logit contributions
piring+0.043
cient+0.043
work+0.041
igh+0.040
leaf+0.040
Neg logit contributions
 mess-0.042
 slippery-0.040
 mush-0.039
omo-0.037
 messy-0.037
Tokenest
Token ID395
Feature activation+2.22e+00

Pos logit contributions
illa+0.233
piring+0.231
work+0.231
ci+0.228
cient+0.226
Neg logit contributions
 mess-0.322
 mush-0.307
onna-0.298
 slippery-0.291
 messy-0.278
Token and
Token ID290
Feature activation+5.09e-01

Pos logit contributions
cient+0.132
piring+0.131
work+0.124
leaf+0.123
illa+0.122
Neg logit contributions
 mess-0.136
 mush-0.125
 slippery-0.119
omo-0.119
 Lily-0.113
Token boun
Token ID31283
Feature activation-4.45e+00

Pos logit contributions
piring+0.036
cient+0.035
work+0.034
ci+0.034
leaf+0.034
Neg logit contributions
 mess-0.025
 mush-0.023
omo-0.021
 slippery-0.021
onna-0.020
Tokenci
Token ID979
Feature activation+1.41e+01

Pos logit contributions
 mess+0.323
 mush+0.310
 slippery+0.291
omo+0.290
onna+0.284
Neg logit contributions
piring-0.203
cient-0.200
work-0.193
ci-0.188
illa-0.188
Tokenest
Token ID395
Feature activation+2.22e+00

Pos logit contributions
piring+0.862
cient+0.853
work+0.826
illa+0.812
leaf+0.812
Neg logit contributions
 mess-0.807
 mush-0.767
 slippery-0.707
omo-0.705
onna-0.681
Token bunny
Token ID44915
Feature activation+2.65e+00

Pos logit contributions
cient+0.126
piring+0.124
work+0.118
illa+0.117
leaf+0.116
Neg logit contributions
 mess-0.141
 mush-0.131
omo-0.124
 slippery-0.124
 messy-0.118
Token in
Token ID287
Feature activation+1.68e+00

Pos logit contributions
cient+0.178
avi+0.172
 photographer+0.169
iya+0.168
leaf+0.168
Neg logit contributions
 mess-0.125
 mush-0.120
 slippery-0.118
 much-0.118
 Lily-0.117
Token the
Token ID262
Feature activation+7.07e-01

Pos logit contributions
piring+0.102
cient+0.100
work+0.098
illa+0.096
leaf+0.096
Neg logit contributions
 mess-0.097
 mush-0.092
omo-0.085
 slippery-0.084
onna-0.082
Token whole
Token ID2187
Feature activation-1.08e+00

Pos logit contributions
cient+0.037
piring+0.037
work+0.035
 photographer+0.035
ci+0.035
Neg logit contributions
 mess-0.046
 mush-0.044
omo-0.042
onna-0.041
 slippery-0.040
Token will
Token ID481
Feature activation+6.07e-02

Pos logit contributions
piring+0.031
cient+0.030
work+0.029
ica+0.028
ci+0.028
Neg logit contributions
 mess-0.032
 mush-0.031
omo-0.028
 slippery-0.028
onna-0.028
Token buy
Token ID2822
Feature activation+6.13e+00

Pos logit contributions
piring+0.003
cient+0.003
leaf+0.003
ci+0.003
work+0.003
Neg logit contributions
 mess-0.004
 mush-0.004
omo-0.003
 slippery-0.003
onna-0.003
Token you
Token ID345
Feature activation+1.68e+00

Pos logit contributions
piring+0.400
cient+0.398
work+0.384
 photographer+0.379
illa+0.378
Neg logit contributions
 mess-0.326
 mush-0.308
 slippery-0.283
omo-0.278
 Lily-0.273
Token the
Token ID262
Feature activation+7.02e-01

Pos logit contributions
piring+0.107
cient+0.106
work+0.101
avi+0.100
 photographer+0.100
Neg logit contributions
 mess-0.093
 mush-0.087
 slippery-0.081
omo-0.080
 Lily-0.079
Token fan
Token ID4336
Feature activation+1.35e+00

Pos logit contributions
piring+0.043
cient+0.043
work+0.041
illa+0.041
avi+0.041
Neg logit contributions
 mess-0.041
 mush-0.038
 slippery-0.035
omo-0.035
 messy-0.034
Tokenci
Token ID979
Feature activation+1.41e+01

Pos logit contributions
piring+0.099
cient+0.098
leaf+0.097
avi+0.097
iya+0.095
Neg logit contributions
 mess-0.059
 mush-0.056
 Lily-0.052
 messy-0.049
 slippery-0.049
Tokenest
Token ID395
Feature activation+2.20e+00

Pos logit contributions
piring+0.874
cient+0.861
work+0.832
leaf+0.823
illa+0.821
Neg logit contributions
 mess-0.798
 mush-0.758
 slippery-0.695
omo-0.694
onna-0.669
Token dress
Token ID6576
Feature activation+1.84e+00

Pos logit contributions
piring+0.139
cient+0.138
work+0.133
illa+0.132
leaf+0.131
Neg logit contributions
 mess-0.122
 mush-0.115
 slippery-0.106
omo-0.105
onna-0.102
Token in
Token ID287
Feature activation+1.68e+00

Pos logit contributions
piring+0.131
cient+0.131
leaf+0.126
avi+0.125
work+0.125
Neg logit contributions
 mess-0.085
 mush-0.080
 slippery-0.072
omo-0.071
onna-0.070
Token the
Token ID262
Feature activation+6.98e-01

Pos logit contributions
cient+0.106
piring+0.105
work+0.105
illa+0.103
 photographer+0.102
Neg logit contributions
 mess-0.088
 mush-0.084
omo-0.078
 slippery-0.077
onna-0.076
Token store
Token ID3650
Feature activation+2.85e+00

Pos logit contributions
cient+0.037
piring+0.036
work+0.036
 photographer+0.035
ci+0.035
Neg logit contributions
 mess-0.044
 mush-0.043
omo-0.041
 slippery-0.040
onna-0.040
Token on
Token ID319
Feature activation+2.98e+00

Pos logit contributions
 Lily+0.021
 mess+0.021
 slippery+0.020
 much+0.020
 mush+0.019
Neg logit contributions
work-0.022
 photographer-0.021
leaf-0.021
bike-0.021
igh-0.021
Token her
Token ID607
Feature activation+9.52e-01

Pos logit contributions
piring+0.195
cient+0.192
work+0.189
leaf+0.188
 photographer+0.188
Neg logit contributions
 mess-0.159
 mush-0.148
 slippery-0.142
omo-0.137
 Lily-0.136
Token walk
Token ID2513
Feature activation+5.54e+00

Pos logit contributions
leaf+0.072
piring+0.068
cient+0.067
igh+0.066
owers+0.066
Neg logit contributions
 mess-0.050
 time-0.048
 messy-0.045
 slippery-0.042
 Lily-0.042
Token,
Token ID11
Feature activation-3.72e-01

Pos logit contributions
piring+0.352
cient+0.349
work+0.338
leaf+0.334
illa+0.334
Neg logit contributions
 mess-0.302
 mush-0.286
 slippery-0.264
omo-0.260
onna-0.253
Token R
Token ID371
Feature activation+5.53e+00

Pos logit contributions
 Lily+0.020
 mess+0.019
 much+0.018
 slippery+0.018
 mush+0.017
Neg logit contributions
work-0.024
 photographer-0.024
leaf-0.023
igh-0.023
cient-0.023
Tokenyle
Token ID2349
Feature activation+1.41e+01

Pos logit contributions
piring+0.243
cient+0.231
work+0.212
ishly+0.207
ci+0.202
Neg logit contributions
 mess-0.405
 time-0.388
omo-0.386
onna-0.377
L-0.365
Tokene
Token ID68
Feature activation+7.33e+00

Pos logit contributions
piring+0.882
cient+0.855
work+0.845
leaf+0.844
 photographer+0.813
Neg logit contributions
 mess-0.788
 mush-0.725
 slippery-0.689
omo-0.683
 Lily-0.672
Token stumbled
Token ID24241
Feature activation-3.97e+00

Pos logit contributions
piring+0.457
cient+0.450
work+0.436
leaf+0.433
illa+0.428
Neg logit contributions
 mess-0.410
 mush-0.388
 slippery-0.357
omo-0.357
onna-0.344
Token across
Token ID1973
Feature activation-1.63e+00

Pos logit contributions
 mush+0.261
 mess+0.252
omo+0.235
 gross+0.223
 slippery+0.214
Neg logit contributions
piring-0.212
 father-0.197
moon-0.197
illa-0.196
work-0.196
Token a
Token ID257
Feature activation+2.32e+00

Pos logit contributions
 mess+0.097
 mush+0.091
 slippery+0.086
omo+0.082
 Lily+0.082
Neg logit contributions
cient-0.098
piring-0.097
work-0.093
 photographer-0.093
leaf-0.092
Token mysterious
Token ID11428
Feature activation+1.67e+00

Pos logit contributions
cient+0.143
piring+0.137
 goat+0.135
 father+0.134
work+0.133
Neg logit contributions
 mush-0.121
 mess-0.119
omo-0.117
onna-0.110
 Lily-0.104
Token to
Token ID284
Feature activation+2.64e-01

Pos logit contributions
piring+0.007
work+0.007
cient+0.007
 father+0.007
 grandfather+0.007
Neg logit contributions
 mess-0.011
 mush-0.010
omo-0.010
 slippery-0.009
onna-0.009
Token have
Token ID423
Feature activation-1.78e+00

Pos logit contributions
piring+0.013
cient+0.013
work+0.012
illa+0.012
leaf+0.012
Neg logit contributions
 mess-0.018
 mush-0.017
 slippery-0.016
omo-0.016
onna-0.016
Token a
Token ID257
Feature activation+2.31e+00

Pos logit contributions
 mess+0.111
 mush+0.108
omo+0.108
onna+0.103
 slippery+0.099
Neg logit contributions
piring-0.087
cient-0.085
 father-0.085
 grandfather-0.084
work-0.084
Token cool
Token ID3608
Feature activation-3.55e-01

Pos logit contributions
cient+0.145
 photographer+0.139
work+0.139
piring+0.138
 grandfather+0.138
Neg logit contributions
 mess-0.118
 mush-0.115
omo-0.113
 slippery-0.105
onna-0.103
Token hairst
Token ID42658
Feature activation-2.33e+00

Pos logit contributions
 mess+0.021
 mush+0.020
omo+0.019
 slippery+0.019
onna+0.018
Neg logit contributions
cient-0.020
work-0.020
piring-0.020
 photographer-0.019
ci-0.019
Tokenyle
Token ID2349
Feature activation+1.41e+01

Pos logit contributions
 mess+0.184
 mush+0.178
omo+0.175
 messy+0.172
 slippery+0.169
Neg logit contributions
cient-0.081
piring-0.074
owers-0.073
avi-0.070
 goat-0.069
Token,
Token ID11
Feature activation-3.80e-01

Pos logit contributions
piring+0.971
cient+0.959
avi+0.947
leaf+0.930
ica+0.929
Neg logit contributions
 mess-0.667
 mush-0.630
 Lily-0.582
 slippery-0.582
 much-0.563
Token but
Token ID475
Feature activation+4.99e-01

Pos logit contributions
 mess+0.019
 mush+0.018
 slippery+0.017
omo+0.016
onna+0.016
Neg logit contributions
piring-0.026
cient-0.026
work-0.025
illa-0.024
leaf-0.024
Token he
Token ID339
Feature activation-8.32e-01

Pos logit contributions
piring+0.032
cient+0.031
work+0.031
ci+0.030
illa+0.030
Neg logit contributions
 mess-0.027
 mush-0.026
omo-0.024
onna-0.023
 slippery-0.023
Token didn
Token ID1422
Feature activation-2.02e+00

Pos logit contributions
 mess+0.052
 mush+0.050
onna+0.046
 slippery+0.045
omo+0.043
Neg logit contributions
illa-0.045
ci-0.044
 father-0.043
cient-0.043
 grandfather-0.042
Token't
Token ID470
Feature activation-8.74e-01

Pos logit contributions
 mess+0.120
 mush+0.117
omo+0.107
 slippery+0.102
 messy+0.099
Neg logit contributions
piring-0.115
work-0.111
cient-0.111
illa-0.109
moon-0.109
Token was
Token ID373
Feature activation-1.15e+00

Pos logit contributions
piring+0.152
cient+0.151
work+0.148
ci+0.145
illa+0.143
Neg logit contributions
 mess-0.120
 mush-0.115
omo-0.105
 slippery-0.104
onna-0.101
Token full
Token ID1336
Feature activation-4.08e+00

Pos logit contributions
 mess+0.068
 mush+0.066
onna+0.063
omo+0.062
 slippery+0.057
Neg logit contributions
work-0.061
piring-0.060
 father-0.059
cient-0.059
 grandfather-0.057
Token of
Token ID286
Feature activation-2.11e+00

Pos logit contributions
 mess+0.178
 mush+0.167
 slippery+0.150
omo+0.149
onna+0.142
Neg logit contributions
piring-0.303
cient-0.301
work-0.293
illa-0.290
leaf-0.289
Token people
Token ID661
Feature activation+5.09e-01

Pos logit contributions
 mess+0.114
 mush+0.105
 slippery+0.098
 messy+0.095
 Lily+0.095
Neg logit contributions
cient-0.139
piring-0.137
illa-0.134
ci-0.133
work-0.133
Token b
Token ID275
Feature activation-1.86e-01

Pos logit contributions
avi+0.038
cient+0.038
used+0.037
leaf+0.037
iya+0.036
Neg logit contributions
 mess-0.022
 much-0.020
 messy-0.019
 Lily-0.018
 slippery-0.018
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.013
 mush+0.012
omo+0.011
 slippery+0.011
onna+0.011
Neg logit contributions
piring-0.009
cient-0.009
work-0.009
 photographer-0.009
ci-0.009
Token to
Token ID284
Feature activation+2.80e-01

Pos logit contributions
piring+0.945
cient+0.943
illa+0.927
leaf+0.926
 photographer+0.923
Neg logit contributions
 mess-0.708
 mush-0.657
 slippery-0.613
 messy-0.590
 much-0.576
Token the
Token ID262
Feature activation+7.15e-01

Pos logit contributions
piring+0.017
cient+0.016
leaf+0.016
amb+0.016
ica+0.016
Neg logit contributions
 mess-0.017
 mush-0.015
 Lily-0.015
 slippery-0.014
 much-0.014
Token stage
Token ID3800
Feature activation+2.59e+00

Pos logit contributions
piring+0.037
cient+0.036
igh+0.035
illa+0.035
ci+0.034
Neg logit contributions
 mess-0.049
 mush-0.045
 slippery-0.043
 messy-0.042
 Lily-0.041
Token.
Token ID13
Feature activation+3.78e-01

Pos logit contributions
piring+0.208
leaf+0.207
igh+0.204
cient+0.202
avi+0.201
Neg logit contributions
 mess-0.098
 mush-0.080
 messy-0.080
 Lily-0.080
 much-0.079
Token The
Token ID383
Feature activation+1.97e+00

Pos logit contributions
piring+0.027
cient+0.026
work+0.026
illa+0.026
ci+0.025
Neg logit contributions
 mess-0.018
 mush-0.017
 Lily-0.016
omo-0.015
 slippery-0.015
Token new
Token ID649
Feature activation+2.40e-01

Pos logit contributions
piring+0.054
cient+0.053
leaf+0.052
illa+0.051
work+0.051
Neg logit contributions
 mess-0.049
 mush-0.046
 slippery-0.042
omo-0.041
 messy-0.040
Token home
Token ID1363
Feature activation+8.42e-01

Pos logit contributions
piring+0.014
cient+0.014
work+0.013
ci+0.013
 photographer+0.013
Neg logit contributions
 mess-0.014
 mush-0.014
 slippery-0.013
omo-0.013
onna-0.012
Token with
Token ID351
Feature activation-4.08e+00

Pos logit contributions
cient+0.066
leaf+0.064
piring+0.063
 photographer+0.063
avi+0.062
Neg logit contributions
 mess-0.035
 mush-0.032
 slippery-0.029
 much-0.028
 Lily-0.028
Token the
Token ID262
Feature activation+6.20e-01

Pos logit contributions
 mess+0.235
 mush+0.224
 slippery+0.206
omo+0.205
onna+0.198
Neg logit contributions
piring-0.249
cient-0.246
work-0.238
leaf-0.234
illa-0.234
Token fan
Token ID4336
Feature activation+1.28e+00

Pos logit contributions
piring+0.037
cient+0.036
work+0.035
illa+0.034
leaf+0.034
Neg logit contributions
 mess-0.037
 mush-0.035
 slippery-0.033
omo-0.032
onna-0.031
Tokenci
Token ID979
Feature activation+1.41e+01

Pos logit contributions
cient+0.106
leaf+0.106
avi+0.104
 photographer+0.102
iya+0.101
Neg logit contributions
 mess-0.042
 mush-0.036
 much-0.036
 slippery-0.035
 Lily-0.035
Tokenest
Token ID395
Feature activation+2.15e+00

Pos logit contributions
piring+0.852
cient+0.845
work+0.819
illa+0.805
leaf+0.803
Neg logit contributions
 mess-0.805
 mush-0.767
 slippery-0.708
omo-0.704
onna-0.683
Token king
Token ID5822
Feature activation-7.17e-01

Pos logit contributions
cient+0.130
piring+0.127
illa+0.121
ci+0.120
work+0.120
Neg logit contributions
 mess-0.131
 mush-0.121
 slippery-0.112
omo-0.112
 messy-0.108
Token in
Token ID287
Feature activation+1.62e+00

Pos logit contributions
 mess+0.029
 Lily+0.027
 mush+0.027
 much+0.026
 slippery+0.025
Neg logit contributions
cient-0.053
avi-0.050
leaf-0.050
used-0.049
piring-0.049
Token the
Token ID262
Feature activation+6.66e-01

Pos logit contributions
piring+0.106
cient+0.105
work+0.101
leaf+0.101
ci+0.100
Neg logit contributions
 mess-0.086
 mush-0.081
 slippery-0.074
omo-0.073
 Lily-0.071
Token land
Token ID1956
Feature activation+5.57e-01

Pos logit contributions
cient+0.031
work+0.031
 photographer+0.030
piring+0.030
ci+0.030
Neg logit contributions
 mess-0.045
 mush-0.044
omo-0.043
onna-0.042
 slippery-0.042
Token corner
Token ID5228
Feature activation-4.70e+00

Pos logit contributions
work+0.036
cient+0.036
 goat+0.035
piring+0.035
 father+0.034
Neg logit contributions
 mush-0.041
 mess-0.040
omo-0.039
onna-0.038
 slippery-0.037
Token.
Token ID13
Feature activation+3.35e-01

Pos logit contributions
 mess+0.241
 mush+0.228
omo+0.208
 slippery+0.208
onna+0.201
Neg logit contributions
piring-0.313
cient-0.308
work-0.302
illa-0.296
leaf-0.295
Token It
Token ID632
Feature activation+2.28e+00

Pos logit contributions
cient+0.019
piring+0.019
work+0.018
leaf+0.018
illa+0.018
Neg logit contributions
 mess-0.021
 mush-0.019
 slippery-0.018
omo-0.018
onna-0.017
Token was
Token ID373
Feature activation-1.18e+00

Pos logit contributions
piring+0.128
cient+0.127
ci+0.123
work+0.120
ica+0.120
Neg logit contributions
 mess-0.136
 mush-0.128
onna-0.124
omo-0.123
 slippery-0.114
Token me
Token ID502
Feature activation-5.72e+00

Pos logit contributions
 mess+0.073
 mush+0.069
omo+0.066
onna+0.065
 slippery+0.063
Neg logit contributions
piring-0.063
 father-0.060
work-0.059
cient-0.059
illa-0.059
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.525
 mush+0.502
 slippery+0.498
onna+0.485
omo+0.477
Neg logit contributions
 photographer-0.152
cient-0.151
leaf-0.139
piring-0.138
owers-0.135
Token softly
Token ID26625
Feature activation-7.13e-01

Pos logit contributions
piring+0.847
cient+0.821
work+0.815
ci+0.793
illa+0.777
Neg logit contributions
 mess-0.801
 mush-0.762
omo-0.731
onna-0.692
 slippery-0.690
Token.
Token ID13
Feature activation+3.54e-01

Pos logit contributions
 mess+0.031
 mush+0.027
 slippery+0.025
 Lily+0.025
 messy+0.024
Neg logit contributions
cient-0.055
leaf-0.054
piring-0.053
 photographer-0.053
igh-0.052
Token The
Token ID383
Feature activation+1.96e+00

Pos logit contributions
cient+0.021
piring+0.020
leaf+0.019
work+0.019
ica+0.019
Neg logit contributions
 mess-0.022
 mush-0.020
 slippery-0.019
omo-0.019
onna-0.018
Token girl
Token ID2576
Feature activation+3.53e+00

Pos logit contributions
cient+0.084
piring+0.083
work+0.082
 father+0.080
 photographer+0.079
Neg logit contributions
 mess-0.141
 mush-0.140
onna-0.138
omo-0.131
 slippery-0.130
Token was
Token ID373
Feature activation-1.15e+00

Pos logit contributions
piring+0.230
cient+0.227
work+0.221
 photographer+0.217
leaf+0.217
Neg logit contributions
 mess-0.189
 mush-0.179
 slippery-0.165
omo-0.164
onna-0.157
Token noticed
Token ID6810
Feature activation+3.73e-01

Pos logit contributions
piring+0.032
moon+0.032
amb+0.032
cient+0.032
 photographer+0.032
Neg logit contributions
 mess-0.028
 mush-0.025
 Lily-0.025
 slippery-0.024
 messy-0.023
Token a
Token ID257
Feature activation+2.33e+00

Pos logit contributions
cient+0.024
piring+0.024
 photographer+0.024
igh+0.023
amb+0.023
Neg logit contributions
 mess-0.020
 Lily-0.019
 mush-0.018
 slippery-0.018
 much-0.017
Token little
Token ID1310
Feature activation-4.02e-02

Pos logit contributions
cient+0.153
piring+0.151
work+0.147
 photographer+0.147
 father+0.145
Neg logit contributions
 mess-0.113
 mush-0.111
omo-0.103
onna-0.101
 slippery-0.100
Token kitten
Token ID42143
Feature activation+1.56e+00

Pos logit contributions
 mess+0.003
 mush+0.003
 slippery+0.002
omo+0.002
onna+0.002
Neg logit contributions
piring-0.002
cient-0.002
work-0.002
illa-0.002
leaf-0.002
Token me
Token ID502
Feature activation-5.71e+00

Pos logit contributions
piring+0.101
cient+0.100
leaf+0.097
work+0.097
 photographer+0.096
Neg logit contributions
 mess-0.085
 mush-0.079
 slippery-0.073
omo-0.071
 Lily-0.070
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.524
 mush+0.507
 slippery+0.485
omo+0.480
onna+0.475
Neg logit contributions
piring-0.151
cient-0.151
 photographer-0.145
leaf-0.135
work-0.134
Token for
Token ID329
Feature activation+1.19e+00

Pos logit contributions
piring+0.827
cient+0.797
work+0.790
ci+0.781
illa+0.747
Neg logit contributions
 mess-0.812
 mush-0.775
omo-0.773
onna-0.738
 slippery-0.702
Token help
Token ID1037
Feature activation+5.12e+00

Pos logit contributions
piring+0.052
cient+0.050
leaf+0.047
illa+0.047
work+0.047
Neg logit contributions
 mess-0.088
 mush-0.085
omo-0.082
 slippery-0.080
onna-0.079
Token.
Token ID13
Feature activation+3.62e-01

Pos logit contributions
piring+0.302
cient+0.294
illa+0.284
work+0.282
ci+0.279
Neg logit contributions
 mess-0.293
 mush-0.287
omo-0.281
onna-0.260
 slippery-0.259
Token It
Token ID632
Feature activation+2.29e+00

Pos logit contributions
piring+0.022
work+0.022
cient+0.021
illa+0.021
leaf+0.021
Neg logit contributions
 mess-0.021
 mush-0.020
 Lily-0.019
 slippery-0.019
omo-0.018
Token was
Token ID373
Feature activation-1.16e+00

Pos logit contributions
cient+0.124
piring+0.120
ci+0.118
work+0.113
 goat+0.113
Neg logit contributions
 mess-0.141
 mush-0.132
onna-0.131
omo-0.131
 slippery-0.119
Token Fl
Token ID1610
Feature activation-5.54e+00

Pos logit contributions
cient+0.029
piring+0.029
work+0.028
ci+0.028
illa+0.027
Neg logit contributions
 mess-0.029
 mush-0.028
omo-0.026
onna-0.025
 slippery-0.025
Tokenuffy
Token ID15352
Feature activation+1.27e+00

Pos logit contributions
 mess+0.289
 mush+0.272
omo+0.252
 slippery+0.249
onna+0.241
Neg logit contributions
piring-0.370
cient-0.362
work-0.354
leaf-0.352
 photographer-0.350
Token immediately
Token ID3393
Feature activation+7.70e-01

Pos logit contributions
piring+0.087
cient+0.086
work+0.083
 photographer+0.082
leaf+0.082
Neg logit contributions
 mess-0.063
 mush-0.060
 slippery-0.054
omo-0.054
onna-0.052
Token stopped
Token ID5025
Feature activation-4.91e+00

Pos logit contributions
piring+0.045
cient+0.045
work+0.043
ci+0.043
moon+0.043
Neg logit contributions
 mess-0.044
 mush-0.042
omo-0.040
 slippery-0.039
onna-0.038
Token me
Token ID502
Feature activation-5.71e+00

Pos logit contributions
 mess+0.294
 mush+0.277
 slippery+0.258
onna+0.248
 Lily+0.246
Neg logit contributions
piring-0.295
cient-0.293
 photographer-0.284
work-0.283
leaf-0.281
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.306
 mush+0.282
 slippery+0.271
onna+0.256
 much+0.254
Neg logit contributions
 photographer-0.379
piring-0.371
cient-0.364
leaf-0.361
iya-0.356
Token.
Token ID13
Feature activation+3.64e-01

Pos logit contributions
piring+0.861
work+0.828
cient+0.825
ci+0.803
illa+0.782
Neg logit contributions
 mess-0.782
 mush-0.749
omo-0.734
 slippery-0.680
onna-0.673
Token Lily
Token ID20037
Feature activation-8.36e+00

Pos logit contributions
cient+0.020
amb+0.018
used+0.018
piring+0.018
illa+0.018
Neg logit contributions
 mess-0.024
 mush-0.021
 slippery-0.021
 messy-0.019
omo-0.019
Token was
Token ID373
Feature activation-1.14e+00

Pos logit contributions
 mess+0.436
 mush+0.413
 slippery+0.377
omo+0.375
onna+0.362
Neg logit contributions
piring-0.551
cient-0.546
work-0.530
illa-0.523
leaf-0.522
Token so
Token ID523
Feature activation+3.42e+00

Pos logit contributions
 mess+0.069
 mush+0.066
omo+0.063
onna+0.060
 slippery+0.060
Neg logit contributions
piring-0.063
cient-0.061
work-0.061
illa-0.061
 father-0.060
Token happy
Token ID3772
Feature activation+6.74e-01

Pos logit contributions
piring+0.228
cient+0.225
work+0.218
leaf+0.216
illa+0.214
Neg logit contributions
 mess-0.175
 mush-0.163
omo-0.151
 slippery-0.149
onna-0.146
Token way
Token ID835
Feature activation-4.72e+00

Pos logit contributions
 mess+0.043
 mush+0.039
 slippery+0.036
omo+0.035
onna+0.033
Neg logit contributions
piring-0.079
cient-0.078
work-0.076
leaf-0.076
illa-0.075
Token back
Token ID736
Feature activation-5.06e-01

Pos logit contributions
 mess+0.300
 mush+0.291
omo+0.278
onna+0.261
 slippery+0.259
Neg logit contributions
piring-0.259
work-0.246
cient-0.244
 father-0.242
ci-0.238
Token home
Token ID1363
Feature activation+9.26e-01

Pos logit contributions
 mess+0.028
 mush+0.027
 slippery+0.025
onna+0.024
 Lily+0.024
Neg logit contributions
piring-0.032
cient-0.032
 photographer-0.030
leaf-0.030
avi-0.030
Token,
Token ID11
Feature activation-3.71e-01

Pos logit contributions
piring+0.058
cient+0.057
work+0.056
ci+0.055
illa+0.054
Neg logit contributions
 mess-0.050
 mush-0.048
omo-0.045
 slippery-0.043
onna-0.042
Token v
Token ID410
Feature activation-3.46e+00

Pos logit contributions
 mess+0.020
 mush+0.019
omo+0.018
 slippery+0.018
onna+0.017
Neg logit contributions
piring-0.023
cient-0.023
work-0.022
illa-0.022
ci-0.022
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.249
 mush+0.236
 slippery+0.222
omo+0.214
onna+0.214
Neg logit contributions
piring-0.165
cient-0.163
work-0.158
illa-0.157
ci-0.152
Token never
Token ID1239
Feature activation-4.87e+00

Pos logit contributions
piring+0.845
cient+0.820
work+0.819
ci+0.788
bike+0.764
Neg logit contributions
 mess-0.795
 mush-0.767
omo-0.749
onna-0.692
 slippery-0.686
Token to
Token ID284
Feature activation+2.78e-01

Pos logit contributions
 mess+0.288
 mush+0.276
omo+0.257
 slippery+0.254
onna+0.246
Neg logit contributions
piring-0.287
cient-0.283
work-0.275
illa-0.268
leaf-0.268
Token return
Token ID1441
Feature activation-7.39e-01

Pos logit contributions
piring+0.015
cient+0.015
work+0.014
moon+0.014
illa+0.014
Neg logit contributions
 mess-0.017
 mush-0.017
omo-0.016
 slippery-0.015
onna-0.015
Token to
Token ID284
Feature activation+2.86e-01

Pos logit contributions
 mess+0.047
omo+0.045
 mush+0.044
onna+0.040
 slippery+0.040
Neg logit contributions
piring-0.038
 father-0.038
work-0.038
cient-0.037
 permission-0.037
Token that
Token ID326
Feature activation+3.63e+00

Pos logit contributions
piring+0.018
cient+0.018
work+0.017
illa+0.017
leaf+0.017
Neg logit contributions
 mess-0.016
 mush-0.015
omo-0.014
 slippery-0.014
onna-0.013
Token happy
Token ID3772
Feature activation+6.68e-01

Pos logit contributions
piring+0.202
cient+0.195
work+0.190
leaf+0.189
 photographer+0.185
Neg logit contributions
 mess-0.199
 mush-0.187
omo-0.173
onna-0.171
 slippery-0.169
Token!
Token ID0
Feature activation+3.89e-01

Pos logit contributions
cient+0.046
piring+0.044
igh+0.044
leaf+0.043
illa+0.042
Neg logit contributions
 mess-0.035
 mush-0.032
 Lily-0.030
 slippery-0.030
onna-0.029
Token He
Token ID679
Feature activation-2.10e+00

Pos logit contributions
cient+0.025
piring+0.025
 father+0.024
amb+0.024
leaf+0.024
Neg logit contributions
 mess-0.021
 mush-0.019
 slippery-0.018
omo-0.017
 messy-0.016
Token kept
Token ID4030
Feature activation+1.05e+00

Pos logit contributions
 mess+0.124
 mush+0.116
 slippery+0.110
omo+0.108
 Lily+0.105
Neg logit contributions
piring-0.127
cient-0.124
work-0.120
illa-0.118
owers-0.117
Token b
Token ID275
Feature activation-1.82e-01

Pos logit contributions
cient+0.069
owers+0.068
piring+0.064
moon+0.062
illa+0.062
Neg logit contributions
 mess-0.063
 Lily-0.058
 mush-0.055
 slippery-0.055
 late-0.053
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.016
 mush+0.016
 slippery+0.015
omo+0.015
onna+0.015
Neg logit contributions
cient-0.005
piring-0.005
work-0.005
 photographer-0.005
 father-0.005
Token and
Token ID290
Feature activation+5.01e-01

Pos logit contributions
piring+0.863
cient+0.833
work+0.827
ci+0.802
leaf+0.793
Neg logit contributions
 mess-0.794
 mush-0.758
omo-0.725
 slippery-0.684
onna-0.681
Token w
Token ID266
Feature activation-2.28e+00

Pos logit contributions
cient+0.026
piring+0.026
work+0.025
leaf+0.025
illa+0.025
Neg logit contributions
 mess-0.033
 mush-0.032
 slippery-0.030
omo-0.029
onna-0.028
Tokenagging
Token ID16406
Feature activation+5.22e+00

Pos logit contributions
 mess+0.127
 mush+0.120
 slippery+0.111
omo+0.110
onna+0.107
Neg logit contributions
piring-0.144
cient-0.143
work-0.139
ci-0.137
 photographer-0.137
Token his
Token ID465
Feature activation-1.01e+00

Pos logit contributions
piring+0.339
cient+0.324
work+0.324
ci+0.317
illa+0.313
Neg logit contributions
 mess-0.278
 mush-0.265
omo-0.251
onna-0.239
 slippery-0.238
Token tail
Token ID7894
Feature activation-2.52e+00

Pos logit contributions
 mess+0.053
 mush+0.050
 slippery+0.045
omo+0.045
onna+0.045
Neg logit contributions
piring-0.067
cient-0.066
work-0.064
ci-0.063
 photographer-0.063
Token sun
Token ID4252
Feature activation-3.34e+00

Pos logit contributions
piring+0.048
cient+0.047
work+0.046
illa+0.045
leaf+0.045
Neg logit contributions
 mess-0.036
 mush-0.034
 slippery-0.031
omo-0.031
onna-0.029
Tokensets
Token ID28709
Feature activation+8.37e+00

Pos logit contributions
 mess+0.159
 mush+0.141
 slippery+0.139
omo+0.134
 Lily+0.131
Neg logit contributions
leaf-0.238
cient-0.234
piring-0.230
 photographer-0.229
avi-0.224
Token and
Token ID290
Feature activation+5.00e-01

Pos logit contributions
cient+0.502
 photographer+0.495
leaf+0.495
piring+0.492
illa+0.490
Neg logit contributions
 mess-0.483
 mush-0.443
 slippery-0.435
 Lily-0.407
 messy-0.404
Token the
Token ID262
Feature activation+6.93e-01

Pos logit contributions
piring+0.034
cient+0.034
illa+0.033
leaf+0.033
amb+0.033
Neg logit contributions
 mess-0.025
 mush-0.024
 slippery-0.022
 Lily-0.022
 messy-0.021
Token b
Token ID275
Feature activation-1.93e-01

Pos logit contributions
piring+0.045
cient+0.044
work+0.043
illa+0.043
ci+0.042
Neg logit contributions
 mess-0.038
 mush-0.035
 slippery-0.033
omo-0.032
 messy-0.031
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.014
 mush+0.013
omo+0.013
 slippery+0.013
onna+0.012
Neg logit contributions
piring-0.009
cient-0.009
 photographer-0.008
work-0.008
 recorder-0.008
Token all
Token ID477
Feature activation-2.84e-01

Pos logit contributions
piring+0.881
cient+0.876
work+0.845
illa+0.840
leaf+0.838
Neg logit contributions
 mess-0.777
 mush-0.734
 slippery-0.679
omo-0.670
onna-0.648
Token made
Token ID925
Feature activation-2.53e+00

Pos logit contributions
 mess+0.015
 mush+0.015
omo+0.014
 slippery+0.013
onna+0.013
Neg logit contributions
piring-0.018
cient-0.018
work-0.018
ci-0.017
leaf-0.017
Token her
Token ID607
Feature activation+9.57e-01

Pos logit contributions
 mess+0.137
 mush+0.127
 slippery+0.119
 Lily+0.115
omo+0.114
Neg logit contributions
cient-0.165
piring-0.165
work-0.158
 photographer-0.157
igh-0.156
Token feel
Token ID1254
Feature activation-1.43e+00

Pos logit contributions
cient+0.068
piring+0.067
leaf+0.066
 photographer+0.066
illa+0.065
Neg logit contributions
 mess-0.049
 slippery-0.044
 mush-0.044
 much-0.041
 messy-0.040
Token happy
Token ID3772
Feature activation+6.63e-01

Pos logit contributions
omo+0.078
 mess+0.076
onna+0.074
 mush+0.072
 slippery+0.061
Neg logit contributions
piring-0.086
 father-0.081
 wife-0.080
leaf-0.080
cient-0.080
Token courage
Token ID11917
Feature activation+5.54e-01

Pos logit contributions
piring+0.214
cient+0.208
ci+0.200
moon+0.199
avi+0.199
Neg logit contributions
 mess-0.191
 mush-0.176
 slippery-0.163
 messy-0.158
 much-0.156
Token and
Token ID290
Feature activation+5.03e-01

Pos logit contributions
cient+0.038
piring+0.037
illa+0.036
 photographer+0.036
work+0.036
Neg logit contributions
 mess-0.028
 mush-0.026
 slippery-0.024
onna-0.023
 much-0.023
Token loyalty
Token ID15843
Feature activation+5.56e+00

Pos logit contributions
piring+0.030
cient+0.030
illa+0.029
work+0.029
 photographer+0.029
Neg logit contributions
 mess-0.029
 mush-0.027
 slippery-0.026
 Lily-0.025
 much-0.025
Token by
Token ID416
Feature activation+9.97e-01

Pos logit contributions
cient+0.361
piring+0.357
illa+0.350
 photographer+0.348
ci+0.345
Neg logit contributions
 mess-0.297
 mush-0.274
 slippery-0.257
 Lily-0.249
onna-0.242
Token b
Token ID275
Feature activation-1.93e-01

Pos logit contributions
piring+0.055
cient+0.054
 photographer+0.052
leaf+0.052
moon+0.052
Neg logit contributions
 mess-0.063
 mush-0.060
 slippery-0.056
omo-0.055
 Lily-0.055
Tokenowing
Token ID7855
Feature activation+1.41e+01

Pos logit contributions
 mess+0.015
 mush+0.014
onna+0.014
 slippery+0.014
omo+0.014
Neg logit contributions
piring-0.008
cient-0.007
 photographer-0.007
work-0.007
leaf-0.007
Token to
Token ID284
Feature activation+2.71e-01

Pos logit contributions
piring+0.880
cient+0.870
work+0.845
leaf+0.827
illa+0.825
Neg logit contributions
 mess-0.777
 mush-0.739
omo-0.682
 slippery-0.678
onna-0.656
Token Daisy
Token ID40355
Feature activation-3.58e+00

Pos logit contributions
piring+0.017
cient+0.017
leaf+0.016
work+0.016
amb+0.016
Neg logit contributions
 mess-0.015
 mush-0.014
 slippery-0.013
 Lily-0.013
 messy-0.012
Token.
Token ID13
Feature activation+3.64e-01

Pos logit contributions
 mess+0.184
 mush+0.166
 slippery+0.156
 much+0.154
 Lily+0.153
Neg logit contributions
leaf-0.243
piring-0.239
 photographer-0.238
cient-0.234
avi-0.233
Token<|endoftext|>
Token ID50256
Feature activation-4.55e+00

Pos logit contributions
piring+0.025
cient+0.024
work+0.024
leaf+0.024
ci+0.024
Neg logit contributions
 mess-0.017
 mush-0.017
 Lily-0.016
omo-0.015
onna-0.015
TokenOnce
Token ID7454
Feature activation-2.93e+00

Pos logit contributions
 mess+0.275
 mush+0.262
 slippery+0.243
omo+0.242
onna+0.235
Neg logit contributions
piring-0.262
cient-0.259
work-0.250
leaf-0.246
illa-0.246
Token is
Token ID318
Feature activation-1.71e+00

Pos logit contributions
work+0.097
cient+0.095
piring+0.095
leaf+0.092
ci+0.089
Neg logit contributions
omo-0.106
 mess-0.103
 mush-0.103
 late-0.096
onna-0.094
Token s
Token ID264
Feature activation-2.47e+00

Pos logit contributions
 mess+0.071
hell+0.070
 million+0.069
L+0.069
omo+0.066
Neg logit contributions
 father-0.108
work-0.103
piring-0.102
 wife-0.100
 goat-0.097
Tokenog
Token ID519
Feature activation-8.77e-01

Pos logit contributions
 mess+0.148
 mush+0.142
 slippery+0.134
omo+0.133
onna+0.132
Neg logit contributions
piring-0.132
owers-0.132
igh-0.131
ica-0.131
illa-0.130
Tokengy
Token ID1360
Feature activation+3.47e-01

Pos logit contributions
 mess+0.053
 mush+0.050
 slippery+0.047
omo+0.046
onna+0.045
Neg logit contributions
piring-0.051
cient-0.051
leaf-0.049
 photographer-0.048
work-0.048
Token and
Token ID290
Feature activation+1.81e-02

Pos logit contributions
piring+0.022
work+0.021
cient+0.021
moon+0.021
ci+0.020
Neg logit contributions
 mess-0.018
 mush-0.018
omo-0.017
 slippery-0.016
onna-0.016
Token mush
Token ID15026
Feature activation-1.49e+01

Pos logit contributions
piring+0.001
work+0.001
cient+0.001
 father+0.001
 recorder+0.001
Neg logit contributions
omo-0.001
 mess-0.001
 mush-0.001
hell-0.001
onna-0.001
Tokeny
Token ID88
Feature activation+2.65e+00

Pos logit contributions
 mess+0.923
 mush+0.891
omo+0.838
 slippery+0.823
onna+0.813
Neg logit contributions
piring-0.836
work-0.800
cient-0.791
illa-0.783
ci-0.773
Token.
Token ID13
Feature activation-1.43e-01

Pos logit contributions
piring+0.140
work+0.137
moon+0.129
owers+0.127
illa+0.126
Neg logit contributions
omo-0.153
 mess-0.151
 mush-0.151
 slippery-0.140
onna-0.139
Token It
Token ID632
Feature activation+1.79e+00

Pos logit contributions
 mess+0.010
 mush+0.009
 slippery+0.009
omo+0.008
 muddy+0.008
Neg logit contributions
avi-0.007
piring-0.007
cient-0.007
leaf-0.007
work-0.007
Token makes
Token ID1838
Feature activation-4.05e+00

Pos logit contributions
work+0.094
cient+0.094
piring+0.093
leaf+0.091
avi+0.088
Neg logit contributions
omo-0.099
 mess-0.096
 mush-0.093
 late-0.089
Tim-0.086
Token Ben
Token ID3932
Feature activation+4.29e-02

Pos logit contributions
 mush+0.258
 mess+0.258
omo+0.249
onna+0.240
 slippery+0.233
Neg logit contributions
piring-0.192
work-0.190
cient-0.186
 father-0.184
illa-0.179
Token.
Token ID13
Feature activation-2.73e-01

Pos logit contributions
 mess+0.113
 mush+0.107
omo+0.102
 slippery+0.098
onna+0.097
Neg logit contributions
work-0.114
piring-0.113
cient-0.112
ci-0.111
illa-0.109
Token It
Token ID632
Feature activation+1.70e+00

Pos logit contributions
 mess+0.015
 mush+0.014
 slippery+0.013
omo+0.013
onna+0.013
Neg logit contributions
piring-0.017
cient-0.017
work-0.016
illa-0.016
leaf-0.016
Token is
Token ID318
Feature activation-1.81e+00

Pos logit contributions
cient+0.097
work+0.095
piring+0.095
ci+0.092
leaf+0.091
Neg logit contributions
 mess-0.094
 mush-0.090
omo-0.090
 slippery-0.084
 late-0.082
Token brown
Token ID7586
Feature activation-1.11e+00

Pos logit contributions
 mess+0.090
omo+0.084
 mush+0.083
onna+0.077
 million+0.075
Neg logit contributions
 father-0.108
piring-0.108
 wife-0.105
work-0.105
 grandfather-0.104
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
 mess+0.068
 mush+0.065
omo+0.063
 slippery+0.061
onna+0.060
Neg logit contributions
piring-0.061
work-0.060
cient-0.058
illa-0.057
ci-0.057
Token mush
Token ID15026
Feature activation-1.49e+01

Pos logit contributions
 mess+0.001
omo+0.001
onna+0.001
 mush+0.001
 late+0.001
Neg logit contributions
piring-0.001
ci-0.001
cient-0.001
work-0.001
 goat-0.001
Tokeny
Token ID88
Feature activation+2.68e+00

Pos logit contributions
 mess+0.940
 mush+0.898
omo+0.844
 slippery+0.836
onna+0.825
Neg logit contributions
piring-0.825
work-0.798
cient-0.779
illa-0.776
ci-0.770
Token.
Token ID13
Feature activation-4.99e-02

Pos logit contributions
piring+0.144
work+0.139
illa+0.131
cient+0.130
owers+0.130
Neg logit contributions
 mess-0.158
omo-0.157
 mush-0.152
onna-0.142
 slippery-0.142
Token\n
Token ID198
Feature activation+5.46e-01

Pos logit contributions
 mess+0.003
 mush+0.003
 slippery+0.003
omo+0.002
 messy+0.002
Neg logit contributions
piring-0.003
cient-0.003
work-0.003
leaf-0.003
illa-0.003
Token\n
Token ID198
Feature activation+5.85e-01

Pos logit contributions
avi+0.024
illa+0.023
 grandfather+0.023
 father+0.022
leaf+0.022
Neg logit contributions
 mess-0.039
 mush-0.038
 slippery-0.034
 messy-0.034
 gross-0.033
Token"
Token ID1
Feature activation-2.32e+00

Pos logit contributions
avi+0.036
illa+0.034
ani+0.034
iya+0.034
ica+0.034
Neg logit contributions
 mess-0.033
 mush-0.031
onna-0.028
 slippery-0.027
 messy-0.026
Token't
Token ID470
Feature activation-1.30e+00

Pos logit contributions
 mess+0.019
 mush+0.019
omo+0.017
 slippery+0.017
onna+0.016
Neg logit contributions
piring-0.024
cient-0.023
work-0.023
leaf-0.022
illa-0.022
Token.
Token ID13
Feature activation-7.37e-02

Pos logit contributions
 mess+0.078
 mush+0.075
omo+0.070
 slippery+0.069
onna+0.068
Neg logit contributions
piring-0.074
cient-0.072
work-0.070
 photographer-0.069
leaf-0.069
Token They
Token ID1119
Feature activation-4.86e-01

Pos logit contributions
 mess+0.004
 mush+0.004
 slippery+0.004
omo+0.004
onna+0.004
Neg logit contributions
piring-0.004
cient-0.004
work-0.004
illa-0.004
ci-0.004
Token are
Token ID389
Feature activation+2.60e+00

Pos logit contributions
 mess+0.034
 mush+0.032
 slippery+0.030
omo+0.030
onna+0.029
Neg logit contributions
piring-0.024
cient-0.023
work-0.023
 photographer-0.023
leaf-0.022
Token all
Token ID477
Feature activation-7.20e-01

Pos logit contributions
work+0.163
 father+0.161
piring+0.161
 whales+0.160
 goat+0.158
Neg logit contributions
omo-0.113
 mess-0.100
L-0.099
ork-0.095
onna-0.095
Token mush
Token ID15026
Feature activation-1.49e+01

Pos logit contributions
omo+0.030
 mess+0.028
 mush+0.025
L+0.025
onna+0.025
Neg logit contributions
work-0.050
piring-0.049
leaf-0.049
cient-0.048
 whales-0.046
Tokeny
Token ID88
Feature activation+2.66e+00

Pos logit contributions
 mess+0.963
 mush+0.927
 slippery+0.885
onna+0.874
omo+0.866
Neg logit contributions
piring-0.769
work-0.754
ci-0.733
illa-0.719
owers-0.713
Token and
Token ID290
Feature activation+1.81e-02

Pos logit contributions
piring+0.155
work+0.149
sters+0.147
owers+0.147
 father+0.140
Neg logit contributions
omo-0.142
 mush-0.136
 mess-0.135
onna-0.128
 slippery-0.126
Token wet
Token ID9583
Feature activation-7.67e+00

Pos logit contributions
cient+0.001
piring+0.001
work+0.001
ci+0.001
 father+0.001
Neg logit contributions
 mess-0.001
omo-0.001
onna-0.001
 late-0.001
 mush-0.001
Token.
Token ID13
Feature activation-1.26e-01

Pos logit contributions
omo+0.429
onna+0.428
 mess+0.419
 mush+0.419
 slippery+0.401
Neg logit contributions
piring-0.423
owers-0.415
work-0.414
sters-0.392
ci-0.389
Token I
Token ID314
Feature activation-2.41e+00

Pos logit contributions
 mess+0.008
 mush+0.007
 slippery+0.007
omo+0.007
onna+0.007
Neg logit contributions
cient-0.007
piring-0.007
work-0.007
illa-0.007
avi-0.007
Token sour
Token ID11348
Feature activation-6.24e+00

Pos logit contributions
work+0.006
piring+0.006
 father+0.006
ci+0.006
 recorder+0.006
Neg logit contributions
 mess-0.005
omo-0.004
onna-0.004
umpy-0.004
hell-0.004
Token.
Token ID13
Feature activation-5.34e-02

Pos logit contributions
 mess+0.338
 mush+0.329
omo+0.307
 slippery+0.303
onna+0.300
Neg logit contributions
piring-0.384
work-0.372
cient-0.364
moon-0.356
owers-0.355
Token The
Token ID383
Feature activation+1.54e+00

Pos logit contributions
 mess+0.004
 slippery+0.003
 mush+0.003
 steep+0.003
omo+0.003
Neg logit contributions
cient-0.003
owers-0.003
 father-0.002
 photographer-0.002
 grandfather-0.002
Token strawberries
Token ID36973
Feature activation-4.08e-01

Pos logit contributions
piring+0.080
 father+0.079
work+0.079
 wife+0.079
 photographer+0.079
Neg logit contributions
 mush-0.092
 mess-0.092
onna-0.089
 slippery-0.087
omo-0.086
Token were
Token ID547
Feature activation+2.93e+00

Pos logit contributions
 mess+0.026
 mush+0.025
omo+0.024
 slippery+0.023
onna+0.023
Neg logit contributions
piring-0.021
cient-0.021
work-0.020
ci-0.020
moon-0.020
Token mush
Token ID15026
Feature activation-1.49e+01

Pos logit contributions
 father+0.183
piring+0.178
work+0.177
 wife+0.162
 camera+0.160
Neg logit contributions
L-0.125
 mess-0.121
CR-0.120
omo-0.119
onna-0.118
Tokeny
Token ID88
Feature activation+2.69e+00

Pos logit contributions
 mess+0.939
 mush+0.914
onna+0.860
omo+0.857
 slippery+0.850
Neg logit contributions
piring-0.795
work-0.778
ci-0.754
illa-0.752
cient-0.730
Token and
Token ID290
Feature activation+3.79e-02

Pos logit contributions
piring+0.110
work+0.109
sters+0.098
moon+0.096
owers+0.095
Neg logit contributions
omo-0.187
 mush-0.177
onna-0.175
 mess-0.173
 painful-0.169
Token brown
Token ID7586
Feature activation-1.06e+00

Pos logit contributions
 father+0.002
piring+0.002
work+0.002
cient+0.002
ci+0.002
Neg logit contributions
 mess-0.002
omo-0.001
umpy-0.001
onna-0.001
hell-0.001
Token.
Token ID13
Feature activation-1.20e-01

Pos logit contributions
 mess+0.059
 mush+0.056
omo+0.053
 slippery+0.052
onna+0.051
Neg logit contributions
piring-0.064
work-0.062
cient-0.061
owers-0.060
moon-0.060
Token The
Token ID383
Feature activation+1.49e+00

Pos logit contributions
 mess+0.006
 slippery+0.005
 mush+0.005
 steep+0.005
omo+0.005
Neg logit contributions
cient-0.008
owers-0.008
 photographer-0.008
 father-0.008
piring-0.008
Token.
Token ID13
Feature activation+6.39e-02

Pos logit contributions
 mess+0.105
 mush+0.100
omo+0.099
 slippery+0.090
onna+0.089
Neg logit contributions
piring-0.104
work-0.101
moon-0.098
cient-0.098
ci-0.096
Token It
Token ID632
Feature activation+1.99e+00

Pos logit contributions
 wife+0.003
 father+0.003
 grandfather+0.003
amb+0.003
 Jasper+0.003
Neg logit contributions
 mess-0.004
 mush-0.004
 poisonous-0.004
 slippery-0.004
 muddy-0.004
Token was
Token ID373
Feature activation-1.47e+00

Pos logit contributions
work+0.105
piring+0.104
ci+0.103
moon+0.103
illa+0.101
Neg logit contributions
 mess-0.118
omo-0.108
 mush-0.108
 late-0.104
onna-0.102
Token sour
Token ID11348
Feature activation-6.14e+00

Pos logit contributions
 mess+0.078
 mush+0.067
onna+0.067
ork+0.066
 million+0.066
Neg logit contributions
 father-0.084
piring-0.083
 wife-0.082
illa-0.078
 husband-0.078
Token and
Token ID290
Feature activation+1.66e-01

Pos logit contributions
 mess+0.373
 mush+0.364
omo+0.348
 slippery+0.337
onna+0.329
Neg logit contributions
piring-0.338
work-0.323
moon-0.309
illa-0.306
leaf-0.305
Token mush
Token ID15026
Feature activation-1.48e+01

Pos logit contributions
piring+0.011
work+0.011
cient+0.011
ci+0.011
 father+0.010
Neg logit contributions
 mess-0.007
omo-0.006
ork-0.006
onna-0.006
 million-0.005
Tokeny
Token ID88
Feature activation+2.78e+00

Pos logit contributions
 mess+0.950
 mush+0.900
omo+0.848
 slippery+0.846
onna+0.820
Neg logit contributions
piring-0.811
work-0.790
illa-0.758
cient-0.756
ci-0.756
Token,
Token ID11
Feature activation-7.20e-01

Pos logit contributions
piring+0.146
work+0.140
moon+0.130
owers+0.126
sters+0.126
Neg logit contributions
 mess-0.172
omo-0.170
 mush-0.166
 slippery-0.155
onna-0.152
Token but
Token ID475
Feature activation+1.53e-01

Pos logit contributions
 mess+0.042
omo+0.039
 mush+0.038
 million+0.038
onna+0.036
Neg logit contributions
piring-0.037
work-0.036
 wife-0.035
ci-0.035
 goat-0.034
Token he
Token ID339
Feature activation-1.20e+00

Pos logit contributions
piring+0.009
work+0.008
 father+0.008
 wife+0.008
illa+0.008
Neg logit contributions
 mess-0.009
 mush-0.008
omo-0.008
onna-0.008
 slippery-0.007
Token did
Token ID750
Feature activation+2.85e+00

Pos logit contributions
 mess+0.072
 mush+0.067
 slippery+0.063
omo+0.062
onna+0.061
Neg logit contributions
piring-0.066
illa-0.065
work-0.065
ci-0.064
cient-0.064
Token "
Token ID366
Feature activation+1.44e+00

Pos logit contributions
 mess+0.004
 mush+0.003
 slippery+0.003
 poisonous+0.003
omo+0.003
Neg logit contributions
 father-0.002
piring-0.002
cient-0.002
 wife-0.002
 grandfather-0.002
TokenIt
Token ID1026
Feature activation+1.15e+00

Pos logit contributions
piring+0.087
cient+0.084
work+0.082
leaf+0.082
avi+0.082
Neg logit contributions
 mess-0.084
 mush-0.078
omo-0.074
 slippery-0.073
onna-0.071
Token is
Token ID318
Feature activation-1.63e+00

Pos logit contributions
cient+0.064
work+0.060
ci+0.059
piring+0.058
illa+0.056
Neg logit contributions
 mess-0.070
omo-0.067
 mush-0.065
 late-0.062
 slippery-0.062
Token sour
Token ID11348
Feature activation-6.16e+00

Pos logit contributions
 mess+0.076
omo+0.071
hell+0.071
 million+0.068
lights+0.068
Neg logit contributions
 goat-0.096
 father-0.096
 wife-0.092
 whales-0.090
 permission-0.089
Token and
Token ID290
Feature activation+1.72e-01

Pos logit contributions
 mess+0.363
 mush+0.361
omo+0.353
onna+0.342
 slippery+0.336
Neg logit contributions
piring-0.325
owers-0.312
work-0.303
cient-0.302
sters-0.297
Token mush
Token ID15026
Feature activation-1.48e+01

Pos logit contributions
cient+0.012
piring+0.012
 goat+0.012
ci+0.012
work+0.012
Neg logit contributions
omo-0.006
 mess-0.006
onna-0.005
ork-0.005
 million-0.005
Tokeny
Token ID88
Feature activation+2.84e+00

Pos logit contributions
 mess+0.945
 mush+0.903
omo+0.850
 slippery+0.848
onna+0.838
Neg logit contributions
piring-0.789
work-0.767
cient-0.760
ci-0.749
illa-0.740
Token.
Token ID13
Feature activation+1.00e-01

Pos logit contributions
piring+0.168
cient+0.162
work+0.162
owers+0.161
sters+0.156
Neg logit contributions
 mess-0.149
omo-0.146
 mush-0.143
 slippery-0.136
onna-0.133
Token Can
Token ID1680
Feature activation-8.69e-01

Pos logit contributions
piring+0.005
cient+0.005
work+0.005
illa+0.005
 photographer+0.005
Neg logit contributions
 mess-0.007
 mush-0.006
 slippery-0.006
omo-0.005
 messy-0.005
Token I
Token ID314
Feature activation-2.14e+00

Pos logit contributions
 mess+0.064
 mush+0.057
 slippery+0.055
omo+0.054
 painful+0.053
Neg logit contributions
cient-0.040
work-0.038
ci-0.037
piring-0.036
 father-0.035
Token have
Token ID423
Feature activation-1.98e+00

Pos logit contributions
 mess+0.148
 mush+0.138
 slippery+0.135
omo+0.134
 holes+0.131
Neg logit contributions
cient-0.094
 wife-0.091
illa-0.088
piring-0.087
igh-0.087
Token the
Token ID262
Feature activation+3.76e-01

Pos logit contributions
 mess+0.037
 mush+0.035
 slippery+0.033
omo+0.031
 Lily+0.031
Neg logit contributions
cient-0.049
piring-0.048
illa-0.047
 photographer-0.046
work-0.046
Token soft
Token ID2705
Feature activation-3.52e+00

Pos logit contributions
piring+0.024
cient+0.024
work+0.023
 father+0.023
 wife+0.023
Neg logit contributions
 mess-0.019
 mush-0.018
omo-0.017
onna-0.017
 slippery-0.016
Token sand
Token ID6450
Feature activation-2.92e+00

Pos logit contributions
 mess+0.185
 mush+0.182
omo+0.177
 late+0.176
onna+0.171
Neg logit contributions
piring-0.198
 goat-0.195
work-0.195
ci-0.193
cient-0.193
Token and
Token ID290
Feature activation+2.38e-01

Pos logit contributions
 mess+0.160
 mush+0.153
omo+0.140
 slippery+0.140
onna+0.135
Neg logit contributions
piring-0.183
cient-0.180
work-0.176
ci-0.172
illa-0.171
Token the
Token ID262
Feature activation+4.50e-01

Pos logit contributions
piring+0.014
work+0.014
cient+0.014
ci+0.014
leaf+0.013
Neg logit contributions
 mess-0.013
 mush-0.012
omo-0.012
onna-0.012
 slippery-0.011
Token mush
Token ID15026
Feature activation-1.47e+01

Pos logit contributions
piring+0.028
cient+0.028
work+0.027
 father+0.027
leaf+0.027
Neg logit contributions
 mess-0.023
 mush-0.022
omo-0.021
onna-0.021
 slippery-0.020
Tokeny
Token ID88
Feature activation+2.91e+00

Pos logit contributions
 mess+0.812
 mush+0.771
 slippery+0.712
omo+0.706
onna+0.683
Neg logit contributions
cient-0.918
piring-0.917
leaf-0.876
work-0.875
 photographer-0.874
Token mud
Token ID17492
Feature activation-1.04e+01

Pos logit contributions
piring+0.170
cient+0.166
work+0.166
 wife+0.162
ci+0.161
Neg logit contributions
 mess-0.159
 mush-0.153
omo-0.144
onna-0.141
 slippery-0.140
Token.
Token ID13
Feature activation+1.71e-01

Pos logit contributions
 mess+0.560
 mush+0.542
omo+0.518
 slippery+0.485
onna+0.481
Neg logit contributions
piring-0.639
work-0.617
cient-0.609
ci-0.596
owers-0.595
Token
Token ID220
Feature activation-3.87e-01

Pos logit contributions
cient+0.010
piring+0.010
work+0.009
avi+0.009
 father+0.009
Neg logit contributions
 mess-0.010
 mush-0.010
 slippery-0.009
omo-0.009
 messy-0.008
Token
Token ID198
Feature activation+7.92e-01

Pos logit contributions
 mess+0.023
 mush+0.022
 slippery+0.020
omo+0.020
 messy+0.019
Neg logit contributions
cient-0.022
piring-0.022
illa-0.021
work-0.021
avi-0.021
Token vegetable
Token ID20236
Feature activation+1.46e+00

Pos logit contributions
 goat+0.133
 photographer+0.132
 wife+0.129
 camera+0.128
 father+0.128
Neg logit contributions
omo-0.093
 mess-0.087
 mush-0.085
 slippery-0.083
 late-0.082
Token.
Token ID13
Feature activation+1.38e-01

Pos logit contributions
piring+0.090
cient+0.089
work+0.088
illa+0.085
 photographer+0.085
Neg logit contributions
 mess-0.076
 mush-0.074
omo-0.071
 slippery-0.069
onna-0.065
Token And
Token ID843
Feature activation+2.85e+00

Pos logit contributions
piring+0.008
cient+0.008
work+0.007
illa+0.007
leaf+0.007
Neg logit contributions
 mess-0.009
 mush-0.008
 slippery-0.008
omo-0.008
onna-0.007
Token it
Token ID340
Feature activation-1.68e+00

Pos logit contributions
piring+0.185
cient+0.184
work+0.178
leaf+0.176
illa+0.176
Neg logit contributions
 mess-0.152
 mush-0.144
 slippery-0.132
omo-0.130
onna-0.126
Token is
Token ID318
Feature activation-1.45e+00

Pos logit contributions
 mess+0.095
omo+0.093
 mush+0.092
onna+0.087
 slippery+0.086
Neg logit contributions
cient-0.096
work-0.092
piring-0.092
ci-0.089
illa-0.088
Token mush
Token ID15026
Feature activation-1.46e+01

Pos logit contributions
omo+0.065
 mess+0.061
 mush+0.056
onna+0.055
hell+0.055
Neg logit contributions
 father-0.092
work-0.090
piring-0.089
cient-0.088
 permission-0.088
Tokeny
Token ID88
Feature activation+2.98e+00

Pos logit contributions
 mess+0.944
 mush+0.916
 slippery+0.863
omo+0.858
onna+0.856
Neg logit contributions
piring-0.751
work-0.747
ci-0.716
illa-0.713
cient-0.696
Token and
Token ID290
Feature activation+3.52e-01

Pos logit contributions
piring+0.172
work+0.170
cient+0.160
owers+0.158
sters+0.157
Neg logit contributions
omo-0.159
 mush-0.154
 mess-0.153
onna-0.142
 slippery-0.141
Token brown
Token ID7586
Feature activation-7.24e-01

Pos logit contributions
piring+0.026
cient+0.025
work+0.025
 father+0.025
ci+0.024
Neg logit contributions
omo-0.012
 mess-0.012
onna-0.010
hell-0.009
 late-0.009
Token inside
Token ID2641
Feature activation-2.07e+00

Pos logit contributions
 mess+0.038
 mush+0.037
omo+0.036
 slippery+0.035
onna+0.034
Neg logit contributions
piring-0.045
work-0.043
cient-0.042
owers-0.042
illa-0.041
Token."
Token ID526
Feature activation-4.78e+00

Pos logit contributions
 mess+0.105
 mush+0.103
omo+0.103
onna+0.095
 slippery+0.093
Neg logit contributions
piring-0.127
cient-0.124
work-0.123
illa-0.120
igh-0.118
Token.
Token ID13
Feature activation+2.23e-01

Pos logit contributions
 mess+0.195
 mush+0.189
omo+0.176
 slippery+0.171
onna+0.164
Neg logit contributions
piring-0.236
cient-0.231
work-0.225
moon-0.224
illa-0.222
Token They
Token ID1119
Feature activation-1.81e-01

Pos logit contributions
piring+0.013
cient+0.013
work+0.013
illa+0.013
avi+0.012
Neg logit contributions
 mess-0.013
 mush-0.012
 slippery-0.012
omo-0.011
onna-0.011
Token are
Token ID389
Feature activation+2.92e+00

Pos logit contributions
 mess+0.013
 mush+0.013
omo+0.012
 slippery+0.012
onna+0.012
Neg logit contributions
piring-0.008
cient-0.008
work-0.007
illa-0.007
leaf-0.007
Token dry
Token ID5894
Feature activation-3.28e+00

Pos logit contributions
 father+0.173
 whales+0.172
piring+0.170
 wife+0.170
 goat+0.169
Neg logit contributions
omo-0.124
 mess-0.118
ork-0.115
L-0.113
CR-0.113
Token and
Token ID290
Feature activation+3.90e-01

Pos logit contributions
 mush+0.197
omo+0.196
 mess+0.196
onna+0.190
 slippery+0.180
Neg logit contributions
ci-0.166
work-0.164
piring-0.163
amb-0.157
cient-0.151
Token mush
Token ID15026
Feature activation-1.46e+01

Pos logit contributions
piring+0.026
cient+0.025
ci+0.025
work+0.024
 wife+0.024
Neg logit contributions
omo-0.016
 mess-0.015
onna-0.014
ork-0.014
 mush-0.013
Tokeny
Token ID88
Feature activation+3.03e+00

Pos logit contributions
 mess+0.913
 mush+0.883
 slippery+0.828
omo+0.825
onna+0.812
Neg logit contributions
piring-0.786
work-0.771
ci-0.751
cient-0.745
illa-0.735
Token,"
Token ID553
Feature activation-2.38e+00

Pos logit contributions
piring+0.158
work+0.155
owers+0.145
moon+0.142
cient+0.142
Neg logit contributions
omo-0.175
 mess-0.167
 mush-0.166
onna-0.159
 slippery-0.156
Token Sara
Token ID24799
Feature activation-5.88e+00

Pos logit contributions
 mess+0.086
 mush+0.079
 slippery+0.068
omo+0.063
onna+0.062
Neg logit contributions
 wife-0.190
 father-0.189
piring-0.187
avi-0.186
cient-0.185
Token said
Token ID531
Feature activation+3.79e+00

Pos logit contributions
 mess+0.410
 mush+0.392
omo+0.367
 slippery+0.366
onna+0.356
Neg logit contributions
piring-0.294
cient-0.287
work-0.277
leaf-0.271
 photographer-0.269
Token.
Token ID13
Feature activation+2.89e-01

Pos logit contributions
cient+0.299
piring+0.296
 photographer+0.293
illa+0.290
leaf+0.289
Neg logit contributions
 mess-0.148
 mush-0.139
 Lily-0.131
onna-0.124
 slippery-0.122
Token the
Token ID262
Feature activation+5.93e-01

Pos logit contributions
 mess+0.026
 mush+0.025
omo+0.022
 slippery+0.022
onna+0.022
Neg logit contributions
piring-0.032
cient-0.032
work-0.031
ci-0.030
illa-0.030
Token jelly
Token ID27644
Feature activation-1.56e-01

Pos logit contributions
piring+0.035
work+0.035
cient+0.034
 father+0.034
 photographer+0.034
Neg logit contributions
 mess-0.033
 mush-0.032
onna-0.030
omo-0.029
 slippery-0.029
Token was
Token ID373
Feature activation-1.25e+00

Pos logit contributions
 mess+0.009
 mush+0.009
omo+0.008
onna+0.008
 slippery+0.008
Neg logit contributions
piring-0.009
cient-0.009
work-0.008
moon-0.008
ci-0.008
Token a
Token ID257
Feature activation+2.24e+00

Pos logit contributions
onna+0.068
 mush+0.068
 mess+0.067
hell+0.063
ork+0.062
Neg logit contributions
 father-0.065
work-0.063
piring-0.060
moon-0.059
 wife-0.058
Token bit
Token ID1643
Feature activation-5.73e+00

Pos logit contributions
 photographer+0.145
 father+0.144
 wife+0.143
work+0.140
 goat+0.140
Neg logit contributions
 mush-0.105
 mess-0.100
onna-0.097
omo-0.094
 slippery-0.089
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
omo+0.256
onna+0.235
 mess+0.227
 mush+0.219
ork+0.211
Neg logit contributions
work-0.376
piring-0.374
cient-0.355
 father-0.353
leaf-0.348
Tokeny
Token ID88
Feature activation+3.06e+00

Pos logit contributions
 mess+0.752
 mush+0.710
 slippery+0.653
omo+0.632
 messy+0.619
Neg logit contributions
cient-0.986
piring-0.951
leaf-0.944
 photographer-0.929
avi-0.919
Token.
Token ID13
Feature activation+3.01e-01

Pos logit contributions
piring+0.180
work+0.178
cient+0.170
ci+0.168
moon+0.168
Neg logit contributions
 mess-0.167
 mush-0.164
omo-0.160
onna-0.150
 slippery-0.149
Token But
Token ID887
Feature activation-2.43e+00

Pos logit contributions
cient+0.019
piring+0.019
work+0.018
leaf+0.018
 father+0.018
Neg logit contributions
 mess-0.017
 mush-0.015
 slippery-0.014
omo-0.014
 painful-0.013
Token that
Token ID326
Feature activation+3.57e+00

Pos logit contributions
 mess+0.140
 mush+0.133
omo+0.123
 slippery+0.121
onna+0.119
Neg logit contributions
piring-0.147
cient-0.144
work-0.143
ci-0.140
illa-0.140
Token was
Token ID373
Feature activation-1.21e+00

Pos logit contributions
piring+0.217
ci+0.216
work+0.214
cient+0.214
illa+0.207
Neg logit contributions
 mess-0.191
 mush-0.185
omo-0.175
onna-0.172
 slippery-0.168
Token.
Token ID13
Feature activation+2.51e-01

Pos logit contributions
omo+0.069
 mess+0.060
 mush+0.060
Tim+0.054
onna+0.053
Neg logit contributions
cient-0.074
piring-0.072
owers-0.071
work-0.070
 whales-0.070
Token It
Token ID632
Feature activation+2.20e+00

Pos logit contributions
piring+0.014
cient+0.014
work+0.014
illa+0.014
leaf+0.014
Neg logit contributions
 mess-0.015
 mush-0.014
 slippery-0.013
omo-0.013
onna-0.013
Token is
Token ID318
Feature activation-1.37e+00

Pos logit contributions
cient+0.123
work+0.122
ci+0.119
piring+0.117
illa+0.112
Neg logit contributions
 mess-0.124
omo-0.124
 mush-0.119
onna-0.112
 late-0.111
Token wet
Token ID9583
Feature activation-7.27e+00

Pos logit contributions
omo+0.065
hell+0.060
 mess+0.059
Tim+0.059
L+0.058
Neg logit contributions
 father-0.083
work-0.079
piring-0.078
cient-0.077
 permission-0.076
Token and
Token ID290
Feature activation+4.22e-01

Pos logit contributions
omo+0.498
 mush+0.484
 mess+0.480
onna+0.474
 painful+0.448
Neg logit contributions
piring-0.323
work-0.323
ci-0.299
owers-0.290
amb-0.286
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
piring+0.028
ci+0.028
cient+0.027
work+0.027
 father+0.026
Neg logit contributions
omo-0.017
ork-0.014
hell-0.014
onna-0.013
 mess-0.013
Tokeny
Token ID88
Feature activation+3.06e+00

Pos logit contributions
 mess+0.927
 mush+0.886
omo+0.835
 slippery+0.827
onna+0.811
Neg logit contributions
piring-0.789
work-0.770
cient-0.748
ci-0.741
illa-0.737
Token.
Token ID13
Feature activation+3.01e-01

Pos logit contributions
piring+0.158
work+0.153
moon+0.145
owers+0.142
cient+0.141
Neg logit contributions
omo-0.185
 mess-0.177
 mush-0.175
onna-0.164
 painful-0.161
Token I
Token ID314
Feature activation-1.96e+00

Pos logit contributions
cient+0.015
piring+0.015
illa+0.014
work+0.014
avi+0.014
Neg logit contributions
 mess-0.021
 mush-0.019
 slippery-0.018
omo-0.017
onna-0.017
Token put
Token ID1234
Feature activation-4.97e-01

Pos logit contributions
 mess+0.130
 mush+0.125
omo+0.121
 slippery+0.117
 million+0.114
Neg logit contributions
cient-0.090
piring-0.089
ci-0.088
leaf-0.088
illa-0.088
Token it
Token ID340
Feature activation-1.54e+00

Pos logit contributions
 mess+0.029
omo+0.028
onna+0.026
 mush+0.026
 slippery+0.025
Neg logit contributions
piring-0.028
work-0.028
igh-0.028
cient-0.028
illa-0.026
Token herbs
Token ID25411
Feature activation-1.82e+00

Pos logit contributions
 mess+0.279
 mush+0.276
onna+0.267
omo+0.263
 slippery+0.257
Neg logit contributions
work-0.275
 father-0.273
piring-0.271
cient-0.268
ci-0.267
Token had
Token ID550
Feature activation-2.63e+00

Pos logit contributions
 mess+0.111
 mush+0.105
 slippery+0.099
 messy+0.093
omo+0.093
Neg logit contributions
cient-0.105
piring-0.104
 photographer-0.101
illa-0.101
avi-0.100
Token turned
Token ID2900
Feature activation-2.31e+00

Pos logit contributions
 mess+0.143
 mush+0.135
omo+0.131
 slippery+0.122
onna+0.120
Neg logit contributions
piring-0.162
work-0.157
leaf-0.154
cient-0.153
moon-0.152
Token into
Token ID656
Feature activation-3.11e+00

Pos logit contributions
omo+0.135
 mess+0.135
 mush+0.126
onna+0.121
 slippery+0.115
Neg logit contributions
piring-0.132
work-0.131
cient-0.127
 father-0.120
igh-0.118
Token a
Token ID257
Feature activation+2.26e+00

Pos logit contributions
 mess+0.189
 mush+0.181
omo+0.171
 slippery+0.168
onna+0.166
Neg logit contributions
piring-0.174
cient-0.169
work-0.168
illa-0.164
leaf-0.162
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
cient+0.145
piring+0.145
work+0.143
 father+0.143
 photographer+0.141
Neg logit contributions
omo-0.103
 mush-0.103
 mess-0.102
onna-0.096
 slippery-0.091
Tokeny
Token ID88
Feature activation+3.06e+00

Pos logit contributions
 mess+0.799
 mush+0.757
 slippery+0.695
omo+0.689
onna+0.667
Neg logit contributions
cient-0.919
piring-0.919
work-0.876
leaf-0.872
illa-0.867
Token green
Token ID4077
Feature activation-2.37e+00

Pos logit contributions
work+0.165
 goat+0.165
cient+0.164
piring+0.162
 father+0.162
Neg logit contributions
 mess-0.169
 mush-0.165
omo-0.164
onna-0.159
 slippery-0.157
Token mixture
Token ID11710
Feature activation-1.71e+00

Pos logit contributions
 mess+0.140
 mush+0.133
omo+0.130
 slippery+0.127
onna+0.123
Neg logit contributions
piring-0.131
cient-0.130
work-0.130
moon-0.124
owers-0.124
Token.
Token ID13
Feature activation+2.79e-01

Pos logit contributions
 mess+0.095
 mush+0.091
omo+0.084
 slippery+0.083
onna+0.080
Neg logit contributions
piring-0.106
cient-0.103
work-0.102
illa-0.099
moon-0.099
Token
Token ID198
Feature activation+8.84e-01

Pos logit contributions
cient+0.016
piring+0.016
work+0.015
leaf+0.015
illa+0.015
Neg logit contributions
 mess-0.017
 mush-0.016
 slippery-0.015
omo-0.015
 messy-0.014
Token,
Token ID11
Feature activation-3.99e-01

Pos logit contributions
piring+0.050
cient+0.049
work+0.048
illa+0.047
 photographer+0.047
Neg logit contributions
 mess-0.043
 mush-0.041
omo-0.038
 slippery-0.037
onna-0.037
Token the
Token ID262
Feature activation+6.80e-01

Pos logit contributions
 mess+0.018
 mush+0.017
 slippery+0.016
 Lily+0.016
 messy+0.015
Neg logit contributions
cient-0.029
piring-0.029
leaf-0.029
amb-0.029
illa-0.028
Token wire
Token ID6503
Feature activation+8.71e-01

Pos logit contributions
piring+0.045
cient+0.044
work+0.043
illa+0.042
leaf+0.042
Neg logit contributions
 mess-0.037
 mush-0.034
 slippery-0.032
omo-0.031
 messy-0.030
Token was
Token ID373
Feature activation-1.19e+00

Pos logit contributions
cient+0.052
work+0.052
piring+0.052
ci+0.051
moon+0.050
Neg logit contributions
 mess-0.047
 mush-0.046
omo-0.043
onna-0.042
 slippery-0.042
Token all
Token ID477
Feature activation-3.21e-01

Pos logit contributions
omo+0.060
L+0.059
hell+0.058
 mess+0.056
Anna+0.055
Neg logit contributions
 father-0.066
work-0.062
 permission-0.062
piring-0.061
moon-0.059
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
omo+0.015
 mess+0.013
L+0.012
hell+0.011
ork+0.011
Neg logit contributions
work-0.023
piring-0.021
cient-0.020
leaf-0.020
ci-0.020
Tokeny
Token ID88
Feature activation+3.10e+00

Pos logit contributions
 mess+0.931
 mush+0.885
omo+0.839
 slippery+0.823
onna+0.823
Neg logit contributions
piring-0.785
work-0.764
ci-0.744
cient-0.743
illa-0.738
Token and
Token ID290
Feature activation+4.50e-01

Pos logit contributions
piring+0.155
work+0.151
owers+0.142
moon+0.141
ci+0.138
Neg logit contributions
omo-0.199
 mess-0.192
 mush-0.184
onna-0.177
 painful-0.172
Token wet
Token ID9583
Feature activation-7.23e+00

Pos logit contributions
 father+0.027
piring+0.027
cient+0.026
work+0.026
ci+0.026
Neg logit contributions
 mess-0.022
omo-0.021
onna-0.020
hell-0.020
 mush-0.019
Token.
Token ID13
Feature activation+3.06e-01

Pos logit contributions
omo+0.462
 mess+0.453
onna+0.450
 mush+0.428
 painful+0.417
Neg logit contributions
work-0.373
piring-0.367
owers-0.348
ci-0.337
al-0.320
Token
Token ID198
Feature activation+9.07e-01

Pos logit contributions
piring+0.020
cient+0.019
work+0.019
leaf+0.019
 photographer+0.019
Neg logit contributions
 mess-0.017
 mush-0.016
omo-0.015
 slippery-0.015
 Lily-0.014
Token It
Token ID632
Feature activation+2.28e+00

Pos logit contributions
piring+0.019
cient+0.018
work+0.018
illa+0.017
leaf+0.017
Neg logit contributions
 mess-0.022
 mush-0.021
 slippery-0.019
omo-0.019
onna-0.019
Token will
Token ID481
Feature activation+4.51e-02

Pos logit contributions
cient+0.134
work+0.129
piring+0.127
ci+0.124
illa+0.122
Neg logit contributions
 mess-0.129
 mush-0.125
omo-0.124
 slippery-0.115
onna-0.113
Token get
Token ID651
Feature activation+7.54e-01

Pos logit contributions
cient+0.002
work+0.002
piring+0.002
 send+0.002
 communicate+0.002
Neg logit contributions
omo-0.003
 mess-0.003
 mush-0.003
 million-0.002
 late-0.002
Token brown
Token ID7586
Feature activation-6.18e-01

Pos logit contributions
 permission+0.045
cient+0.045
work+0.044
 father+0.044
piring+0.042
Neg logit contributions
omo-0.037
onna-0.032
 mess-0.031
ue-0.029
hell-0.029
Token and
Token ID290
Feature activation+4.59e-01

Pos logit contributions
 mess+0.036
 mush+0.035
omo+0.033
 slippery+0.032
onna+0.031
Neg logit contributions
piring-0.035
work-0.035
cient-0.034
ci-0.033
illa-0.033
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
piring+0.029
cient+0.029
ci+0.028
 wife+0.028
 father+0.027
Neg logit contributions
 mess-0.020
omo-0.019
onna-0.016
hell-0.016
 anymore-0.016
Tokeny
Token ID88
Feature activation+3.11e+00

Pos logit contributions
 mess+0.892
 mush+0.849
 slippery+0.793
omo+0.792
onna+0.766
Neg logit contributions
piring-0.819
work-0.803
cient-0.791
illa-0.780
ci-0.777
Token and
Token ID290
Feature activation+4.66e-01

Pos logit contributions
work+0.184
piring+0.182
cient+0.171
moon+0.170
igh+0.168
Neg logit contributions
omo-0.164
 mess-0.158
 mush-0.154
 painful-0.146
 slippery-0.142
Token make
Token ID787
Feature activation+2.24e+00

Pos logit contributions
cient+0.030
piring+0.030
ci+0.029
work+0.029
 wife+0.028
Neg logit contributions
 mess-0.020
omo-0.019
onna-0.016
hell-0.016
 mush-0.016
Token a
Token ID257
Feature activation+2.28e+00

Pos logit contributions
cient+0.125
piring+0.123
work+0.117
illa+0.117
 father+0.116
Neg logit contributions
 mess-0.127
 mush-0.125
omo-0.122
onna-0.118
 slippery-0.116
Token mess
Token ID2085
Feature activation-1.38e+01

Pos logit contributions
cient+0.144
 wife+0.141
 father+0.139
 goat+0.139
piring+0.134
Neg logit contributions
omo-0.090
 late-0.085
 mush-0.084
 steep-0.081
 slippery-0.080
Token that
Token ID326
Feature activation+3.62e+00

Pos logit contributions
 mess+0.040
 mush+0.038
 Lily+0.036
 slippery+0.035
omo+0.035
Neg logit contributions
piring-0.035
cient-0.034
leaf-0.033
work-0.033
igh-0.033
Token they
Token ID484
Feature activation+2.37e+00

Pos logit contributions
piring+0.209
work+0.207
cient+0.204
ci+0.201
 photographer+0.197
Neg logit contributions
 mess-0.207
 mush-0.200
omo-0.192
 slippery-0.187
onna-0.186
Token were
Token ID547
Feature activation+3.37e+00

Pos logit contributions
piring+0.143
cient+0.141
work+0.136
leaf+0.135
illa+0.135
Neg logit contributions
 mess-0.139
 mush-0.132
 slippery-0.122
omo-0.120
onna-0.117
Token old
Token ID1468
Feature activation-8.95e-01

Pos logit contributions
piring+0.206
work+0.200
cient+0.200
leaf+0.196
 father+0.196
Neg logit contributions
 mess-0.179
omo-0.170
 mush-0.169
onna-0.165
 slippery-0.156
Token and
Token ID290
Feature activation+4.92e-01

Pos logit contributions
 mess+0.048
 mush+0.044
 slippery+0.040
omo+0.039
 messy+0.039
Neg logit contributions
piring-0.061
cient-0.061
leaf-0.058
illa-0.058
igh-0.058
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
cient+0.035
piring+0.034
work+0.033
ci+0.033
leaf+0.032
Neg logit contributions
 mess-0.023
 mush-0.021
omo-0.020
onna-0.020
 slippery-0.020
Tokeny
Token ID88
Feature activation+3.12e+00

Pos logit contributions
 mess+0.851
 mush+0.811
 slippery+0.750
omo+0.747
onna+0.728
Neg logit contributions
piring-0.859
cient-0.848
work-0.825
illa-0.810
ci-0.808
Token.
Token ID13
Feature activation+3.40e-01

Pos logit contributions
piring+0.191
work+0.187
cient+0.185
ci+0.180
illa+0.178
Neg logit contributions
 mess-0.170
 mush-0.162
omo-0.154
 slippery-0.150
onna-0.147
Token Lily
Token ID20037
Feature activation-8.37e+00

Pos logit contributions
cient+0.018
 father+0.017
 grandfather+0.017
 photographer+0.017
amb+0.017
Neg logit contributions
 mess-0.021
 slippery-0.021
 mush-0.020
 steep-0.018
 poisonous-0.018
Token was
Token ID373
Feature activation-1.17e+00

Pos logit contributions
 mess+0.509
 mush+0.489
 slippery+0.453
omo+0.450
onna+0.440
Neg logit contributions
piring-0.472
cient-0.471
work-0.455
illa-0.449
ci-0.447
Token sad
Token ID6507
Feature activation-4.94e+00

Pos logit contributions
 mess+0.062
 mush+0.059
omo+0.058
onna+0.056
 slippery+0.054
Neg logit contributions
piring-0.071
work-0.070
 father-0.070
 wife-0.069
cient-0.068
Token apple
Token ID17180
Feature activation-1.13e+00

Pos logit contributions
piring+0.129
cient+0.128
work+0.124
illa+0.122
leaf+0.122
Neg logit contributions
 mess-0.104
 mush-0.098
 slippery-0.090
omo-0.090
onna-0.087
Token was
Token ID373
Feature activation-1.15e+00

Pos logit contributions
 mess+0.062
 mush+0.058
omo+0.054
 slippery+0.053
onna+0.053
Neg logit contributions
piring-0.069
cient-0.068
work-0.067
ci-0.066
moon-0.066
Token so
Token ID523
Feature activation+3.42e+00

Pos logit contributions
 mess+0.064
 mush+0.057
omo+0.056
onna+0.054
 Lily+0.051
Neg logit contributions
piring-0.067
 father-0.064
work-0.064
illa-0.062
cient-0.062
Token old
Token ID1468
Feature activation-8.86e-01

Pos logit contributions
piring+0.234
cient+0.223
leaf+0.222
work+0.215
 wife+0.212
Neg logit contributions
 mess-0.158
 mush-0.132
omo-0.130
onna-0.121
 Lily-0.119
Token and
Token ID290
Feature activation+5.16e-01

Pos logit contributions
 mess+0.043
 mush+0.038
 slippery+0.036
 Lily+0.036
 messy+0.035
Neg logit contributions
cient-0.065
piring-0.063
leaf-0.061
igh-0.061
illa-0.060
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
piring+0.033
cient+0.033
work+0.033
ci+0.033
 father+0.031
Neg logit contributions
 mess-0.027
 mush-0.024
onna-0.023
omo-0.023
 slippery-0.022
Tokeny
Token ID88
Feature activation+3.13e+00

Pos logit contributions
 mess+0.780
 mush+0.747
 slippery+0.687
omo+0.680
onna+0.654
Neg logit contributions
cient-0.916
piring-0.906
leaf-0.880
 photographer-0.876
work-0.863
Token that
Token ID326
Feature activation+3.63e+00

Pos logit contributions
piring+0.194
cient+0.191
work+0.187
illa+0.182
ci+0.182
Neg logit contributions
 mess-0.173
 mush-0.165
omo-0.153
 slippery-0.151
onna-0.147
Token it
Token ID340
Feature activation-1.49e+00

Pos logit contributions
piring+0.222
cient+0.219
work+0.214
ci+0.210
illa+0.209
Neg logit contributions
 mess-0.206
 mush-0.196
omo-0.181
 slippery-0.180
onna-0.175
Token started
Token ID2067
Feature activation-2.91e+00

Pos logit contributions
 mess+0.084
 mush+0.078
omo+0.077
onna+0.072
 slippery+0.071
Neg logit contributions
cient-0.087
ci-0.087
work-0.086
piring-0.086
illa-0.082
Token to
Token ID284
Feature activation+2.77e-01

Pos logit contributions
 mess+0.190
 mush+0.181
omo+0.172
 slippery+0.168
onna+0.163
Neg logit contributions
piring-0.152
cient-0.148
work-0.146
ci-0.141
leaf-0.140
Token it
Token ID340
Feature activation-1.49e+00

Pos logit contributions
 mess+0.019
 mush+0.018
 slippery+0.017
 Lily+0.016
 messy+0.016
Neg logit contributions
piring-0.024
cient-0.024
leaf-0.023
illa-0.023
amb-0.022
Token was
Token ID373
Feature activation-1.16e+00

Pos logit contributions
omo+0.096
 mush+0.093
onna+0.091
 mess+0.090
 late+0.088
Neg logit contributions
work-0.062
ci-0.057
moon-0.057
 father-0.057
cient-0.056
Token all
Token ID477
Feature activation-2.74e-01

Pos logit contributions
L+0.067
Anna+0.067
umpy+0.064
hell+0.064
omo+0.062
Neg logit contributions
 father-0.055
 permission-0.049
work-0.048
 walking-0.045
 going-0.044
Token brown
Token ID7586
Feature activation-5.86e-01

Pos logit contributions
omo+0.011
 mess+0.010
L+0.010
ork+0.010
hell+0.009
Neg logit contributions
work-0.020
piring-0.018
ci-0.018
cient-0.018
leaf-0.017
Token and
Token ID290
Feature activation+5.03e-01

Pos logit contributions
 mess+0.037
omo+0.035
 mush+0.035
onna+0.034
 slippery+0.033
Neg logit contributions
piring-0.031
work-0.030
owers-0.030
cient-0.028
ci-0.028
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
 father+0.033
work+0.033
piring+0.032
ci+0.032
cient+0.032
Neg logit contributions
 mess-0.022
omo-0.020
onna-0.020
hell-0.019
umpy-0.019
Tokeny
Token ID88
Feature activation+3.12e+00

Pos logit contributions
 mess+0.897
 mush+0.852
omo+0.795
 slippery+0.789
onna+0.777
Neg logit contributions
piring-0.821
cient-0.794
work-0.791
ci-0.768
illa-0.765
Token.
Token ID13
Feature activation+3.54e-01

Pos logit contributions
piring+0.179
work+0.174
cient+0.168
owers+0.164
moon+0.163
Neg logit contributions
 mess-0.179
omo-0.171
 mush-0.170
onna-0.159
 slippery-0.157
Token
Token ID198
Feature activation+9.61e-01

Pos logit contributions
cient+0.020
piring+0.019
work+0.019
leaf+0.019
 photographer+0.019
Neg logit contributions
 mess-0.022
 mush-0.021
 slippery-0.020
omo-0.019
onna-0.019
Token
Token ID198
Feature activation+9.54e-01

Pos logit contributions
cient+0.048
piring+0.047
leaf+0.047
avi+0.046
illa+0.046
Neg logit contributions
 mess-0.065
 mush-0.062
 slippery-0.056
omo-0.056
 messy-0.055
TokenThe
Token ID464
Feature activation+1.11e+00

Pos logit contributions
piring+0.049
leaf+0.048
cient+0.048
avi+0.048
 photographer+0.047
Neg logit contributions
 mess-0.064
 mush-0.061
onna-0.055
 slippery-0.053
omo-0.052
Token The
Token ID383
Feature activation+1.97e+00

Pos logit contributions
piring+0.019
cient+0.019
work+0.018
avi+0.018
leaf+0.018
Neg logit contributions
 mess-0.023
 mush-0.021
 slippery-0.021
omo-0.019
 poisonous-0.019
Token tomatoes
Token ID23972
Feature activation-8.28e-01

Pos logit contributions
 father+0.114
 wife+0.109
work+0.108
cient+0.106
piring+0.105
Neg logit contributions
onna-0.105
omo-0.102
 mush-0.100
 mess-0.100
 slippery-0.098
Token get
Token ID651
Feature activation+7.73e-01

Pos logit contributions
 mess+0.058
 mush+0.056
omo+0.053
 slippery+0.053
onna+0.051
Neg logit contributions
piring-0.039
cient-0.038
work-0.036
illa-0.035
leaf-0.035
Token soft
Token ID2705
Feature activation-3.21e+00

Pos logit contributions
piring+0.048
work+0.047
 father+0.045
cient+0.044
 permission+0.043
Neg logit contributions
omo-0.041
 mess-0.041
onna-0.036
 mush-0.036
 steep-0.033
Token and
Token ID290
Feature activation+5.03e-01

Pos logit contributions
 mess+0.198
omo+0.196
 mush+0.187
onna+0.183
 slippery+0.181
Neg logit contributions
work-0.169
piring-0.167
owers-0.156
ci-0.155
moon-0.154
Token mush
Token ID15026
Feature activation-1.45e+01

Pos logit contributions
piring+0.033
 father+0.032
ci+0.032
cient+0.031
 wife+0.031
Neg logit contributions
 mess-0.021
omo-0.020
hell-0.018
umpy-0.018
onna-0.018
Tokeny
Token ID88
Feature activation+3.14e+00

Pos logit contributions
 mess+0.893
 mush+0.853
omo+0.800
 slippery+0.788
onna+0.768
Neg logit contributions
piring-0.824
work-0.792
cient-0.785
illa-0.768
ci-0.761
Token.
Token ID13
Feature activation+3.70e-01

Pos logit contributions
piring+0.163
work+0.157
moon+0.145
owers+0.144
sters+0.143
Neg logit contributions
omo-0.192
 mess-0.186
 mush-0.180
 painful-0.170
 slippery-0.169
Token The
Token ID383
Feature activation+1.97e+00

Pos logit contributions
piring+0.019
cient+0.019
work+0.019
avi+0.019
owers+0.019
Neg logit contributions
 mess-0.024
 mush-0.022
 slippery-0.021
omo-0.020
 poisonous-0.020
Token cheese
Token ID9891
Feature activation-3.89e+00

Pos logit contributions
 father+0.115
 wife+0.110
cient+0.109
work+0.109
 photographer+0.108
Neg logit contributions
onna-0.104
 mush-0.101
omo-0.101
 mess-0.100
 slippery-0.098
Token gets
Token ID3011
Feature activation-3.10e+00

Pos logit contributions
omo+0.203
 mush+0.197
 mess+0.194
 late+0.188
 slippery+0.182
Neg logit contributions
piring-0.220
sters-0.213
work-0.213
owers-0.211
cient-0.208
Token loud
Token ID7812
Feature activation-2.62e+00

Pos logit contributions
 mess+0.025
omo+0.025
 mush+0.024
 slippery+0.022
onna+0.022
Neg logit contributions
work-0.015
cient-0.014
piring-0.014
 father-0.014
leaf-0.014
Token noise
Token ID7838
Feature activation-4.85e+00

Pos logit contributions
 mess+0.118
 mush+0.109
 slippery+0.098
omo+0.097
onna+0.093
Neg logit contributions
piring-0.192
cient-0.191
work-0.185
illa-0.183
leaf-0.183
Token and
Token ID290
Feature activation-3.40e-01

Pos logit contributions
 mess+0.255
 mush+0.244
omo+0.224
 slippery+0.223
onna+0.215
Neg logit contributions
piring-0.315
cient-0.311
work-0.304
illa-0.297
leaf-0.297
Token a
Token ID257
Feature activation+1.56e+00

Pos logit contributions
 mess+0.019
 mush+0.018
omo+0.017
 slippery+0.016
onna+0.016
Neg logit contributions
piring-0.021
cient-0.020
work-0.020
leaf-0.019
ci-0.019
Token big
Token ID1263
Feature activation-1.87e+00

Pos logit contributions
cient+0.111
piring+0.109
 father+0.107
 grandfather+0.106
 goat+0.106
Neg logit contributions
 mess-0.064
 mush-0.063
omo-0.060
onna-0.057
 slippery-0.056
Token mess
Token ID2085
Feature activation-1.44e+01

Pos logit contributions
 mess+0.103
 mush+0.100
omo+0.095
 slippery+0.093
onna+0.093
Neg logit contributions
piring-0.110
cient-0.110
work-0.107
 photographer-0.105
igh-0.104
Token.
Token ID13
Feature activation-2.35e-01

Pos logit contributions
 mess+0.776
 mush+0.741
omo+0.678
 slippery+0.678
onna+0.655
Neg logit contributions
piring-0.920
cient-0.907
work-0.885
illa-0.867
leaf-0.865
Token Lily
Token ID20037
Feature activation-8.90e+00

Pos logit contributions
 mess+0.015
 mush+0.013
 slippery+0.013
omo+0.012
 poisonous+0.012
Neg logit contributions
cient-0.013
 father-0.012
 grandfather-0.012
piring-0.012
avi-0.012
Token was
Token ID373
Feature activation-1.68e+00

Pos logit contributions
 mush+0.512
 mess+0.506
omo+0.476
 slippery+0.462
onna+0.462
Neg logit contributions
cient-0.506
piring-0.497
ci-0.482
illa-0.481
work-0.480
Token scared
Token ID12008
Feature activation-4.22e+00

Pos logit contributions
omo+0.092
 mush+0.089
 mess+0.089
onna+0.087
 slippery+0.079
Neg logit contributions
piring-0.095
 father-0.095
work-0.093
cient-0.093
leaf-0.093
Token and
Token ID290
Feature activation+3.20e-02

Pos logit contributions
 mush+0.302
onna+0.286
omo+0.283
 mess+0.280
 slippery+0.264
Neg logit contributions
piring-0.182
 father-0.175
 grandfather-0.161
work-0.161
ica-0.161
Token they
Token ID484
Feature activation+1.47e+00

Pos logit contributions
 mess+0.128
 mush+0.115
 slippery+0.106
omo+0.102
 messy+0.101
Neg logit contributions
piring-0.191
cient-0.184
 photographer-0.181
iya-0.180
moon-0.180
Token also
Token ID635
Feature activation-7.02e-01

Pos logit contributions
piring+0.092
cient+0.090
illa+0.088
work+0.087
ica+0.087
Neg logit contributions
 mess-0.082
 mush-0.076
 slippery-0.072
 Lily-0.069
omo-0.069
Token made
Token ID925
Feature activation-3.36e+00

Pos logit contributions
 mess+0.042
 mush+0.039
 slippery+0.036
omo+0.035
 Lily+0.035
Neg logit contributions
piring-0.042
cient-0.041
illa-0.040
work-0.040
 photographer-0.040
Token a
Token ID257
Feature activation+1.56e+00

Pos logit contributions
 mess+0.193
 mush+0.178
 slippery+0.168
 Lily+0.164
omo+0.162
Neg logit contributions
piring-0.210
cient-0.209
 photographer-0.198
moon-0.198
work-0.197
Token big
Token ID1263
Feature activation-1.87e+00

Pos logit contributions
cient+0.110
piring+0.109
work+0.109
 photographer+0.107
 wife+0.107
Neg logit contributions
 mush-0.066
 mess-0.065
omo-0.060
 slippery-0.057
onna-0.057
Token mess
Token ID2085
Feature activation-1.44e+01

Pos logit contributions
 mess+0.102
 mush+0.099
omo+0.091
 slippery+0.091
onna+0.088
Neg logit contributions
piring-0.115
cient-0.114
work-0.112
 photographer-0.109
illa-0.108
Token.
Token ID13
Feature activation-2.35e-01

Pos logit contributions
 mess+0.735
 mush+0.677
 slippery+0.621
omo+0.605
 messy+0.593
Neg logit contributions
cient-0.977
piring-0.966
leaf-0.934
illa-0.929
 photographer-0.928
Token They
Token ID1119
Feature activation-5.98e-01

Pos logit contributions
 mess+0.014
 mush+0.013
 slippery+0.013
 messy+0.012
 poisonous+0.012
Neg logit contributions
cient-0.013
piring-0.013
work-0.013
 photographer-0.013
avi-0.013
Token threw
Token ID9617
Feature activation-4.21e+00

Pos logit contributions
 mess+0.035
 mush+0.033
 slippery+0.030
omo+0.030
onna+0.029
Neg logit contributions
piring-0.037
cient-0.036
work-0.035
illa-0.035
leaf-0.034
Token their
Token ID511
Feature activation+9.66e-01

Pos logit contributions
omo+0.263
onna+0.260
 mess+0.254
 mush+0.245
 painful+0.225
Neg logit contributions
piring-0.212
work-0.208
cient-0.197
igh-0.195
leaf-0.195
Token toys
Token ID14958
Feature activation-3.22e+00

Pos logit contributions
piring+0.059
cient+0.058
work+0.057
ci+0.055
leaf+0.055
Neg logit contributions
 mess-0.054
 mush-0.051
omo-0.048
onna-0.047
 slippery-0.047
Token named
Token ID3706
Feature activation-5.32e-02

Pos logit contributions
piring+0.198
cient+0.193
leaf+0.185
work+0.185
 photographer+0.180
Neg logit contributions
 mess-0.223
 mush-0.214
omo-0.198
 slippery-0.198
 Lily-0.194
Token Molly
Token ID30236
Feature activation-4.11e+00

Pos logit contributions
 mess+0.002
omo+0.002
 mush+0.002
 unstable+0.001
 slippery+0.001
Neg logit contributions
cient-0.004
illa-0.004
ci-0.004
 father-0.004
piring-0.004
Token.
Token ID13
Feature activation+3.46e-01

Pos logit contributions
 mess+0.147
 mush+0.136
 Lily+0.134
 much+0.119
 slippery+0.116
Neg logit contributions
piring-0.343
cient-0.336
leaf-0.333
igh-0.327
used-0.326
Token She
Token ID1375
Feature activation+2.62e-01

Pos logit contributions
piring+0.015
owers+0.015
cient+0.014
 father+0.014
amb+0.014
Neg logit contributions
 mess-0.026
 mush-0.023
 slippery-0.023
 poisonous-0.022
omo-0.022
Token was
Token ID373
Feature activation-1.18e+00

Pos logit contributions
cient+0.012
piring+0.012
ci+0.012
work+0.011
leaf+0.011
Neg logit contributions
 mess-0.019
 mush-0.018
omo-0.017
 slippery-0.017
onna-0.016
Token 3
Token ID513
Feature activation+1.05e+01

Pos logit contributions
 mess+0.058
 mush+0.056
 Lily+0.053
 messy+0.052
 slippery+0.052
Neg logit contributions
piring-0.082
cient-0.080
avi-0.079
 photographer-0.078
igh-0.077
Token years
Token ID812
Feature activation-2.89e-01

Pos logit contributions
illa+0.539
ci+0.539
ica+0.525
piring+0.524
avi+0.514
Neg logit contributions
onna-0.634
 mess-0.623
 mush-0.619
 slippery-0.576
 steep-0.563
Token old
Token ID1468
Feature activation-9.19e-01

Pos logit contributions
 mess+0.015
 mush+0.014
omo+0.014
 slippery+0.013
onna+0.012
Neg logit contributions
 father-0.018
 grandfather-0.017
cient-0.017
 wife-0.017
piring-0.017
Token and
Token ID290
Feature activation+4.67e-01

Pos logit contributions
 mess+0.042
 mush+0.038
 Lily+0.036
omo+0.034
 messy+0.034
Neg logit contributions
cient-0.070
piring-0.069
igh-0.067
leaf-0.067
work-0.065
Token loved
Token ID6151
Feature activation-5.21e-01

Pos logit contributions
piring+0.028
cient+0.028
work+0.027
moon+0.027
igh+0.027
Neg logit contributions
 mess-0.027
 mush-0.026
 slippery-0.024
 Lily-0.024
omo-0.024
Token to
Token ID284
Feature activation+2.53e-01

Pos logit contributions
 mess+0.027
 mush+0.024
 messy+0.023
 Lily+0.023
 slippery+0.022
Neg logit contributions
igh-0.038
piring-0.038
illa-0.037
cient-0.037
used-0.037
Token Everyone
Token ID11075
Feature activation-8.74e-01

Pos logit contributions
cient+0.006
piring+0.006
work+0.006
leaf+0.006
 photographer+0.006
Neg logit contributions
 mess-0.005
 mush-0.004
 slippery-0.004
omo-0.004
onna-0.004
Token watched
Token ID7342
Feature activation+1.88e+00

Pos logit contributions
 mess+0.042
 mush+0.037
 slippery+0.035
 messy+0.034
 Lily+0.034
Neg logit contributions
cient-0.063
piring-0.062
iya-0.062
owers-0.062
moon-0.062
Token him
Token ID683
Feature activation+6.38e+00

Pos logit contributions
cient+0.126
piring+0.124
illa+0.124
ci+0.123
moon+0.122
Neg logit contributions
 mess-0.098
 mush-0.089
 Lily-0.087
 slippery-0.084
 messy-0.080
Token in
Token ID287
Feature activation+1.37e+00

Pos logit contributions
iya+0.455
avi+0.439
illa+0.437
thel+0.436
piring+0.433
Neg logit contributions
 mess-0.354
 slippery-0.298
 mush-0.286
 messy-0.277
 much-0.276
Token amaz
Token ID40642
Feature activation+7.09e+00

Pos logit contributions
cient+0.087
piring+0.086
illa+0.083
work+0.082
ci+0.082
Neg logit contributions
 mess-0.081
 mush-0.075
 slippery-0.070
 Lily-0.066
omo-0.066
Tokenement
Token ID972
Feature activation+9.61e+00

Pos logit contributions
piring+0.303
cient+0.299
work+0.285
leaf+0.279
illa+0.278
Neg logit contributions
 mess-0.537
 mush-0.516
 slippery-0.486
omo-0.485
onna-0.472
Token and
Token ID290
Feature activation+1.85e-01

Pos logit contributions
leaf+0.758
cient+0.753
piring+0.744
 photographer+0.735
igh+0.731
Neg logit contributions
 mess-0.400
 mush-0.331
 slippery-0.314
omo-0.313
 messy-0.309
Token cheered
Token ID34767
Feature activation-8.88e-01

Pos logit contributions
cient+0.011
piring+0.011
owers+0.011
leaf+0.011
illa+0.011
Neg logit contributions
 mess-0.011
 mush-0.010
 Lily-0.010
 slippery-0.009
 messy-0.009
Token.
Token ID13
Feature activation+2.79e-02

Pos logit contributions
 mess+0.043
 mush+0.038
 Lily+0.038
 messy+0.035
 slippery+0.035
Neg logit contributions
cient-0.062
leaf-0.062
igh-0.062
piring-0.062
moon-0.061
Token
Token ID220
Feature activation-5.46e-01

Pos logit contributions
piring+0.002
moon+0.002
illa+0.002
ci+0.002
leaf+0.002
Neg logit contributions
 mush-0.001
 Lily-0.001
 mess-0.001
omo-0.001
 slippery-0.001
Token
Token ID198
Feature activation+6.11e-01

Pos logit contributions
 mess+0.031
 mush+0.030
 slippery+0.027
omo+0.027
 messy+0.026
Neg logit contributions
piring-0.033
cient-0.033
work-0.032
illa-0.031
leaf-0.031
Token one
Token ID530
Feature activation-1.28e+00

Pos logit contributions
 mess+0.117
 mush+0.112
omo+0.107
onna+0.107
 slippery+0.104
Neg logit contributions
piring-0.084
leaf-0.081
 wife-0.080
illa-0.080
moon-0.080
Token.
Token ID13
Feature activation+3.53e-01

Pos logit contributions
 mess+0.067
 mush+0.063
omo+0.057
 slippery+0.056
onna+0.055
Neg logit contributions
piring-0.083
cient-0.081
work-0.081
illa-0.081
ci-0.080
Token Her
Token ID2332
Feature activation+3.99e+00

Pos logit contributions
piring+0.023
igh+0.021
cient+0.021
leaf+0.021
illa+0.021
Neg logit contributions
 Lily-0.020
 mush-0.019
 mess-0.019
omo-0.018
onna-0.018
Token mom
Token ID1995
Feature activation-9.25e-01

Pos logit contributions
piring+0.204
cient+0.201
 father+0.200
illa+0.195
work+0.195
Neg logit contributions
 mess-0.255
 mush-0.250
 slippery-0.232
onna-0.232
omo-0.222
Token said
Token ID531
Feature activation+3.85e+00

Pos logit contributions
 mess+0.055
 mush+0.052
omo+0.050
 Lily+0.049
 slippery+0.049
Neg logit contributions
piring-0.057
cient-0.056
owers-0.053
avi-0.052
work-0.052
Token yes
Token ID3763
Feature activation+1.10e+01

Pos logit contributions
cient+0.158
piring+0.154
work+0.149
ci+0.147
leaf+0.145
Neg logit contributions
 mess-0.293
 mush-0.290
 Lily-0.277
 slippery-0.273
omo-0.269
Token,
Token ID11
Feature activation-3.93e-01

Pos logit contributions
piring+0.686
cient+0.680
work+0.660
illa+0.653
ci+0.648
Neg logit contributions
 mess-0.602
 mush-0.581
 slippery-0.529
omo-0.528
onna-0.513
Token and
Token ID290
Feature activation+4.77e-01

Pos logit contributions
 mess+0.018
 mush+0.017
 Lily+0.016
 slippery+0.016
omo+0.015
Neg logit contributions
cient-0.029
piring-0.029
work-0.028
ci-0.028
illa-0.028
Token she
Token ID673
Feature activation+5.35e-01

Pos logit contributions
cient+0.033
piring+0.033
avi+0.032
igh+0.032
owers+0.031
Neg logit contributions
 Lily-0.023
 mess-0.023
 mush-0.022
 slippery-0.021
omo-0.020
Token happily
Token ID18177
Feature activation-2.97e+00

Pos logit contributions
piring+0.033
cient+0.033
work+0.031
avi+0.031
leaf+0.031
Neg logit contributions
 mess-0.030
 mush-0.029
omo-0.027
 slippery-0.027
onna-0.026
Token ate
Token ID15063
Feature activation-3.21e+00

Pos logit contributions
 mess+0.154
 mush+0.147
 Lily+0.138
omo+0.138
 slippery+0.136
Neg logit contributions
cient-0.197
piring-0.193
avi-0.188
work-0.183
 grandfather-0.183
Token.
Token ID13
Feature activation+2.13e-01

Pos logit contributions
cient+0.703
piring+0.689
leaf+0.688
 photographer+0.681
avi+0.675
Neg logit contributions
 mess-0.626
 mush-0.551
 slippery-0.547
 messy-0.512
 muddy-0.496
Token He
Token ID679
Feature activation-2.24e+00

Pos logit contributions
cient+0.013
piring+0.013
leaf+0.013
work+0.013
illa+0.013
Neg logit contributions
 mess-0.012
 mush-0.011
 slippery-0.010
omo-0.010
onna-0.010
Token had
Token ID550
Feature activation-2.71e+00

Pos logit contributions
 mess+0.135
 mush+0.129
 slippery+0.119
omo+0.119
onna+0.115
Neg logit contributions
piring-0.130
cient-0.128
work-0.124
illa-0.122
leaf-0.122
Token heard
Token ID2982
Feature activation-1.15e+00

Pos logit contributions
 mess+0.138
 mush+0.131
omo+0.121
 slippery+0.116
onna+0.115
Neg logit contributions
piring-0.181
cient-0.177
leaf-0.173
work-0.172
illa-0.171
Token about
Token ID546
Feature activation-3.45e+00

Pos logit contributions
 mess+0.064
 mush+0.061
 slippery+0.056
omo+0.055
onna+0.054
Neg logit contributions
piring-0.071
cient-0.071
work-0.068
illa-0.067
leaf-0.067
Token aer
Token ID9551
Feature activation+1.05e+01

Pos logit contributions
 mess+0.178
 mush+0.156
 slippery+0.156
 much+0.151
 Lily+0.150
Neg logit contributions
cient-0.241
owers-0.236
piring-0.236
 photographer-0.235
amb-0.235
Tokenopl
Token ID20106
Feature activation+7.33e-01

Pos logit contributions
igh+0.454
cient+0.448
piring+0.445
leaf+0.444
ica+0.443
Neg logit contributions
 mess-0.758
 mush-0.735
 slippery-0.694
 Lily-0.692
 messy-0.683
Tokenanes
Token ID7305
Feature activation+6.97e+00

Pos logit contributions
ci+0.030
 wife+0.030
cient+0.030
 goat+0.029
 photographer+0.029
Neg logit contributions
 mess-0.050
omo-0.047
 mush-0.047
onna-0.047
 late-0.046
Token before
Token ID878
Feature activation-5.29e+00

Pos logit contributions
 photographer+0.530
iya+0.527
leaf+0.522
cient+0.519
 wife+0.513
Neg logit contributions
 mess-0.319
 slippery-0.278
 much-0.254
 mush-0.253
 messy-0.242
Token;
Token ID26
Feature activation+3.78e+00

Pos logit contributions
 mess+0.241
 mush+0.213
 slippery+0.208
 Lily+0.207
 much+0.200
Neg logit contributions
cient-0.398
piring-0.391
leaf-0.387
ci-0.385
 photographer-0.382
Token they
Token ID484
Feature activation+2.30e+00

Pos logit contributions
cient+0.291
piring+0.286
illa+0.283
 photographer+0.282
owers+0.281
Neg logit contributions
 mess-0.152
 slippery-0.142
 Lily-0.140
 mush-0.140
 much-0.139
Token Tom
Token ID4186
Feature activation-1.16e+00

Pos logit contributions
 mess+0.042
 mush+0.040
 slippery+0.036
omo+0.036
onna+0.035
Neg logit contributions
piring-0.054
cient-0.053
work-0.052
illa-0.051
leaf-0.051
Token was
Token ID373
Feature activation-1.60e+00

Pos logit contributions
 mess+0.072
 mush+0.068
 slippery+0.063
omo+0.063
 messy+0.061
Neg logit contributions
piring-0.067
cient-0.066
work-0.064
leaf-0.064
illa-0.062
Token visiting
Token ID10013
Feature activation+5.82e+00

Pos logit contributions
 mess+0.117
 mush+0.112
 slippery+0.106
omo+0.104
 messy+0.103
Neg logit contributions
piring-0.073
cient-0.072
work-0.069
 photographer-0.067
leaf-0.067
Token his
Token ID465
Feature activation-1.47e+00

Pos logit contributions
piring+0.385
cient+0.380
work+0.365
leaf+0.362
 photographer+0.360
Neg logit contributions
 mess-0.309
 mush-0.289
 slippery-0.268
omo-0.262
 Lily-0.257
Token grand
Token ID4490
Feature activation+7.32e+00

Pos logit contributions
 mess+0.101
 messy+0.089
 time+0.082
 slippery+0.080
 mush+0.077
Neg logit contributions
piring-0.095
leaf-0.091
owers-0.090
ance-0.088
amb-0.086
Tokenpa
Token ID8957
Feature activation+9.72e+00

Pos logit contributions
piring+0.266
cient+0.251
leaf+0.234
owers+0.226
used+0.221
Neg logit contributions
 mess-0.688
 mush-0.597
omo-0.597
 time-0.590
 messy-0.563
Token's
Token ID338
Feature activation+2.49e+00

Pos logit contributions
leaf+0.740
piring+0.725
 downs+0.694
amb+0.691
avi+0.690
Neg logit contributions
 mess-0.444
 Lily-0.407
 time-0.392
 messy-0.350
 anymore-0.345
Token house
Token ID2156
Feature activation+1.10e+00

Pos logit contributions
piring+0.186
leaf+0.179
ance+0.174
igh+0.174
owers+0.173
Neg logit contributions
 mess-0.144
 messy-0.119
 mush-0.109
 slippery-0.107
 muddy-0.102
Token.
Token ID13
Feature activation-1.43e-01

Pos logit contributions
piring+0.080
cient+0.079
leaf+0.078
 photographer+0.075
avi+0.075
Neg logit contributions
 mess-0.052
 mush-0.047
 slippery-0.044
 messy-0.042
 Lily-0.041
Token His
Token ID2399
Feature activation+1.77e+00

Pos logit contributions
 mess+0.006
 mush+0.006
 Lily+0.005
omo+0.005
 slippery+0.005
Neg logit contributions
piring-0.011
cient-0.011
work-0.010
ci-0.010
illa-0.010
Token grand
Token ID4490
Feature activation+7.28e+00

Pos logit contributions
piring+0.115
owers+0.115
leaf+0.112
ance+0.110
amb+0.102
Neg logit contributions
 mess-0.110
 time-0.108
 messy-0.096
omo-0.088
 broken-0.086
Token.
Token ID13
Feature activation+1.02e-01

Pos logit contributions
cient+0.310
piring+0.293
avi+0.292
iya+0.289
igh+0.288
Neg logit contributions
 mess-0.131
 mush-0.112
 Lily-0.110
 slippery-0.100
 time-0.100
Token She
Token ID1375
Feature activation+1.82e-02

Pos logit contributions
piring+0.006
cient+0.006
work+0.006
leaf+0.006
illa+0.006
Neg logit contributions
 mess-0.006
 mush-0.006
 slippery-0.005
omo-0.005
onna-0.005
Token wants
Token ID3382
Feature activation-1.35e+00

Pos logit contributions
piring+0.001
cient+0.001
work+0.001
illa+0.001
ci+0.001
Neg logit contributions
 mess-0.001
 mush-0.001
 slippery-0.001
omo-0.001
onna-0.001
Token to
Token ID284
Feature activation+8.44e-03

Pos logit contributions
 mess+0.075
 mush+0.071
 slippery+0.066
 Lily+0.066
 messy+0.064
Neg logit contributions
cient-0.089
piring-0.087
igh-0.084
amb-0.083
illa-0.083
Token have
Token ID423
Feature activation-2.04e+00

Pos logit contributions
piring+0.001
cient+0.001
illa+0.000
work+0.000
igh+0.000
Neg logit contributions
 mess-0.001
 mush-0.000
 slippery-0.000
 Lily-0.000
 messy-0.000
Token a
Token ID257
Feature activation+2.03e+00

Pos logit contributions
 mess+0.116
 mush+0.111
 slippery+0.103
 Lily+0.100
omo+0.099
Neg logit contributions
cient-0.129
piring-0.127
work-0.122
illa-0.122
ci-0.119
Token party
Token ID2151
Feature activation+3.14e+00

Pos logit contributions
piring+0.140
cient+0.134
avi+0.132
illa+0.132
work+0.130
Neg logit contributions
 mess-0.108
 mush-0.099
 slippery-0.093
 messy-0.090
omo-0.088
Token at
Token ID379
Feature activation+2.22e+00

Pos logit contributions
cient+0.250
piring+0.242
leaf+0.237
avi+0.235
igh+0.235
Neg logit contributions
 mess-0.133
 mush-0.115
 Lily-0.113
 messy-0.106
 slippery-0.104
Token her
Token ID607
Feature activation+6.57e-01

Pos logit contributions
piring+0.140
cient+0.139
work+0.133
leaf+0.132
 photographer+0.131
Neg logit contributions
 mess-0.125
 mush-0.117
 slippery-0.109
omo-0.107
onna-0.105
Token house
Token ID2156
Feature activation+1.28e+00

Pos logit contributions
piring+0.039
leaf+0.037
cient+0.037
igh+0.035
work+0.035
Neg logit contributions
 mess-0.042
 mush-0.038
omo-0.036
 slippery-0.036
 messy-0.036
Token.
Token ID13
Feature activation+4.61e-02

Pos logit contributions
piring+0.092
cient+0.091
leaf+0.088
work+0.087
 photographer+0.086
Neg logit contributions
 mess-0.061
 mush-0.056
 slippery-0.051
omo-0.050
 Lily-0.049
Token went
Token ID1816
Feature activation+1.02e+00

Pos logit contributions
work+0.023
cient+0.023
bike+0.022
 father+0.021
 goat+0.021
Neg logit contributions
omo-0.029
 mess-0.027
onna-0.026
hell-0.026
umpy-0.026
Token into
Token ID656
Feature activation-3.10e+00

Pos logit contributions
work+0.048
piring+0.046
bike+0.045
cient+0.045
 camera+0.043
Neg logit contributions
omo-0.069
 mess-0.062
onna-0.060
 mush-0.059
 million-0.057
Token the
Token ID262
Feature activation+6.80e-01

Pos logit contributions
 mess+0.180
 mush+0.175
omo+0.166
 slippery+0.159
onna+0.158
Neg logit contributions
piring-0.178
cient-0.172
work-0.172
illa-0.168
 photographer-0.168
Token street
Token ID4675
Feature activation-1.41e+00

Pos logit contributions
work+0.033
 goat+0.032
cient+0.032
 camera+0.032
 photographer+0.031
Neg logit contributions
onna-0.043
omo-0.041
 mush-0.040
 mess-0.039
hell-0.037
Token.
Token ID13
Feature activation+3.28e-01

Pos logit contributions
omo+0.088
 mush+0.086
 mess+0.083
onna+0.078
 painful+0.078
Neg logit contributions
work-0.073
piring-0.069
cient-0.067
bike-0.066
moon-0.066
Token A
Token ID317
Feature activation+4.18e+00

Pos logit contributions
cient+0.016
work+0.015
piring+0.015
 photographer+0.015
 father+0.015
Neg logit contributions
 mess-0.023
 mush-0.021
 slippery-0.020
omo-0.020
onna-0.019
Token car
Token ID1097
Feature activation+1.02e+00

Pos logit contributions
 goat+0.261
 photographer+0.252
 wife+0.246
 father+0.243
 camera+0.242
Neg logit contributions
onna-0.206
 mush-0.205
omo-0.194
 mess-0.187
 slippery-0.172
Token came
Token ID1625
Feature activation-8.58e-01

Pos logit contributions
piring+0.065
cient+0.064
work+0.063
leaf+0.061
illa+0.061
Neg logit contributions
 mess-0.055
 mush-0.053
omo-0.048
 slippery-0.048
onna-0.047
Token and
Token ID290
Feature activation+4.62e-01

Pos logit contributions
 mess+0.051
 mush+0.050
omo+0.046
 slippery+0.044
onna+0.044
Neg logit contributions
piring-0.050
cient-0.049
work-0.049
illa-0.047
leaf-0.047
Token hit
Token ID2277
Feature activation-7.02e+00

Pos logit contributions
work+0.026
cient+0.026
piring+0.026
leaf+0.025
ci+0.024
Neg logit contributions
 mess-0.028
 mush-0.027
omo-0.026
onna-0.025
 slippery-0.023
Token the
Token ID262
Feature activation+6.60e-01

Pos logit contributions
 mess+0.440
 mush+0.423
omo+0.417
onna+0.415
 slippery+0.379
Neg logit contributions
work-0.374
piring-0.368
cient-0.362
illa-0.346
igh-0.345
Token
Token ID198
Feature activation+7.10e-01

Pos logit contributions
cient+0.035
ci+0.034
piring+0.033
amb+0.033
illa+0.033
Neg logit contributions
 mess-0.049
 mush-0.048
 slippery-0.043
onna-0.042
 poisonous-0.042
TokenTim
Token ID14967
Feature activation-5.35e+00

Pos logit contributions
bike+0.040
cient+0.039
ci+0.039
 photographer+0.038
piring+0.038
Neg logit contributions
 mush-0.041
 mess-0.041
onna-0.039
 poisonous-0.037
 slippery-0.036
Tokenmy
Token ID1820
Feature activation-2.48e+00

Pos logit contributions
 mess+0.275
 mush+0.264
 slippery+0.240
onna+0.235
omo+0.234
Neg logit contributions
cient-0.350
piring-0.346
ci-0.340
work-0.339
illa-0.338
Token said
Token ID531
Feature activation+3.61e+00

Pos logit contributions
 mess+0.171
 mush+0.163
omo+0.153
 slippery+0.152
 Lily+0.148
Neg logit contributions
piring-0.127
cient-0.122
work-0.118
leaf-0.117
illa-0.115
Token,
Token ID11
Feature activation-6.40e-01

Pos logit contributions
piring+0.202
cient+0.199
work+0.187
 wife+0.184
 father+0.184
Neg logit contributions
 mess-0.233
 mush-0.214
omo-0.203
 slippery-0.201
onna-0.194
Token "
Token ID366
Feature activation+1.59e+00

Pos logit contributions
 Lily+0.029
 mess+0.028
 mush+0.028
 slippery+0.024
omo+0.024
Neg logit contributions
piring-0.046
igh-0.045
illa-0.045
moon-0.045
ci-0.045
TokenSure
Token ID19457
Feature activation+5.13e+00

Pos logit contributions
piring+0.080
cient+0.079
work+0.077
illa+0.075
igh+0.075
Neg logit contributions
 mess-0.107
 mush-0.104
 slippery-0.098
omo-0.096
onna-0.094
Token!
Token ID0
Feature activation+9.62e-02

Pos logit contributions
piring+0.305
cient+0.303
work+0.295
 photographer+0.294
ci+0.292
Neg logit contributions
 mess-0.292
 mush-0.281
 slippery-0.258
onna-0.252
omo-0.252
Token Let
Token ID3914
Feature activation+6.51e-02

Pos logit contributions
piring+0.006
cient+0.006
work+0.006
leaf+0.006
moon+0.006
Neg logit contributions
 mess-0.005
 mush-0.005
omo-0.005
 slippery-0.005
 Lily-0.005
Token's
Token ID338
Feature activation+2.67e+00

Pos logit contributions
piring+0.003
cient+0.003
work+0.003
illa+0.003
ci+0.003
Neg logit contributions
 mess-0.005
 mush-0.005
 slippery-0.004
omo-0.004
onna-0.004
Token build
Token ID1382
Feature activation+2.95e+00

Pos logit contributions
piring+0.150
cient+0.143
leaf+0.141
work+0.138
iya+0.138
Neg logit contributions
 mess-0.174
 mush-0.164
 slippery-0.153
omo-0.151
 messy-0.148
Token
Token ID198
Feature activation+9.23e-01

Pos logit contributions
piring+0.018
cient+0.018
work+0.017
 photographer+0.017
leaf+0.017
Neg logit contributions
 mess-0.018
 mush-0.017
 slippery-0.016
omo-0.015
onna-0.015
Token
Token ID198
Feature activation+9.19e-01

Pos logit contributions
piring+0.055
cient+0.055
leaf+0.053
avi+0.053
illa+0.052
Neg logit contributions
 mess-0.053
 mush-0.051
 slippery-0.046
omo-0.046
onna-0.045
Token"
Token ID1
Feature activation-1.99e+00

Pos logit contributions
piring+0.056
cient+0.055
avi+0.054
leaf+0.054
illa+0.053
Neg logit contributions
 mess-0.053
 mush-0.051
 slippery-0.045
onna-0.045
omo-0.044
TokenSure
Token ID19457
Feature activation+5.38e+00

Pos logit contributions
 mess+0.102
 mush+0.100
 Lily+0.094
omo+0.092
L+0.091
Neg logit contributions
cient-0.137
igh-0.132
piring-0.130
illa-0.126
ci-0.125
Token,
Token ID11
Feature activation-4.07e-01

Pos logit contributions
piring+0.330
cient+0.327
work+0.318
illa+0.315
 photographer+0.311
Neg logit contributions
 mess-0.302
 mush-0.291
 slippery-0.265
omo-0.262
onna-0.258
Token you
Token ID345
Feature activation+1.65e+00

Pos logit contributions
 mess+0.022
 mush+0.020
 slippery+0.019
omo+0.019
 Lily+0.018
Neg logit contributions
cient-0.027
piring-0.027
work-0.026
ci-0.026
moon-0.025
Token can
Token ID460
Feature activation-1.05e-02

Pos logit contributions
cient+0.090
piring+0.089
igh+0.085
work+0.084
 photographer+0.083
Neg logit contributions
 mess-0.105
 mush-0.103
 slippery-0.096
onna-0.096
omo-0.096
Token play
Token ID711
Feature activation+2.09e+00

Pos logit contributions
 mess+0.001
 mush+0.001
 slippery+0.001
omo+0.001
onna+0.001
Neg logit contributions
piring-0.001
cient-0.001
work-0.001
leaf-0.001
illa-0.001
Token with
Token ID351
Feature activation-4.02e+00

Pos logit contributions
piring+0.132
work+0.131
cient+0.131
 photographer+0.127
ci+0.127
Neg logit contributions
 mess-0.106
 mush-0.104
omo-0.101
onna-0.097
 slippery-0.096
Token us
Token ID514
Feature activation-3.00e-01

Pos logit contributions
 mess+0.217
 mush+0.207
omo+0.194
onna+0.191
 slippery+0.189
Neg logit contributions
cient-0.253
piring-0.250
work-0.245
illa-0.240
ci-0.238
Token.
Token ID13
Feature activation+3.50e-01

Pos logit contributions
 mush+0.016
 mess+0.016
omo+0.014
 slippery+0.014
onna+0.014
Neg logit contributions
work-0.018
cient-0.018
piring-0.018
illa-0.018
igh-0.018
Token Sara
Token ID24799
Feature activation-5.82e+00

Pos logit contributions
 mess+0.081
 mush+0.074
 Lily+0.069
 slippery+0.066
omo+0.065
Neg logit contributions
cient-0.197
piring-0.197
work-0.193
ci-0.192
illa-0.190
Token and
Token ID290
Feature activation+4.77e-01

Pos logit contributions
 mess+0.367
 mush+0.322
 Lily+0.317
 slippery+0.306
onna+0.306
Neg logit contributions
cient-0.352
piring-0.350
ci-0.325
work-0.321
illa-0.316
Token Ben
Token ID3932
Feature activation+5.24e-01

Pos logit contributions
piring+0.041
work+0.041
ci+0.040
cient+0.040
illa+0.040
Neg logit contributions
 Lily-0.019
 mess-0.016
 mush-0.015
 slippery-0.014
 sticky-0.012
Token say
Token ID910
Feature activation+6.73e+00

Pos logit contributions
ci+0.028
piring+0.027
cient+0.025
illa+0.025
 downs+0.025
Neg logit contributions
 mess-0.037
 Lily-0.033
 time-0.031
 slippery-0.031
 mush-0.030
Token.
Token ID13
Feature activation+3.38e-01

Pos logit contributions
cient+0.514
piring+0.507
illa+0.504
ci+0.498
ica+0.498
Neg logit contributions
 mess-0.277
 mush-0.255
 Lily-0.240
 much-0.233
 slippery-0.229
Token "
Token ID366
Feature activation+1.83e+00

Pos logit contributions
ci+0.026
illa+0.025
igh+0.025
piring+0.025
cient+0.025
Neg logit contributions
 Lily-0.015
 mess-0.014
 mush-0.013
onna-0.013
 late-0.013
TokenIt
Token ID1026
Feature activation+1.49e+00

Pos logit contributions
cient+0.129
ica+0.124
ci+0.122
 father+0.121
igh+0.120
Neg logit contributions
 mush-0.083
 late-0.082
 mess-0.082
L-0.079
 much-0.078
Token is
Token ID318
Feature activation-1.30e+00

Pos logit contributions
piring+0.116
 photographer+0.116
iya+0.115
avi+0.113
moon+0.112
Neg logit contributions
 mess-0.063
 mush-0.055
 slippery-0.052
 messy-0.049
 Lily-0.048
Token very
Token ID845
Feature activation+1.95e+00

Pos logit contributions
 mess+0.071
 mush+0.068
onna+0.062
omo+0.060
 slippery+0.060
Neg logit contributions
piring-0.077
 father-0.076
 grandfather-0.076
cient-0.075
work-0.075
Token fancy
Token ID14996
Feature activation+1.10e+00

Pos logit contributions
piring+0.132
leaf+0.131
cient+0.130
 wife+0.130
 photographer+0.126
Neg logit contributions
 mess-0.089
 mush-0.084
onna-0.075
omo-0.071
 million-0.070
Token."
Token ID526
Feature activation-4.70e+00

Pos logit contributions
piring+0.071
work+0.069
cient+0.069
illa+0.066
leaf+0.066
Neg logit contributions
 mess-0.058
 mush-0.056
omo-0.052
 slippery-0.051
onna-0.049
Token something
Token ID1223
Feature activation-1.55e+00

Pos logit contributions
 mess+0.142
 mush+0.131
 slippery+0.124
omo+0.119
 Lily+0.118
Neg logit contributions
piring-0.159
cient-0.159
work-0.153
 photographer-0.153
moon-0.151
Token wonderful
Token ID7932
Feature activation+1.01e+00

Pos logit contributions
 mess+0.066
 mush+0.060
 slippery+0.057
 messy+0.055
 much+0.053
Neg logit contributions
cient-0.121
piring-0.118
avi-0.117
igh-0.117
 photographer-0.117
Token with
Token ID351
Feature activation-4.00e+00

Pos logit contributions
cient+0.088
avi+0.081
iya+0.081
igh+0.080
leaf+0.080
Neg logit contributions
 mess-0.042
 mush-0.033
 Lily-0.031
 slippery-0.031
 messy-0.031
Token his
Token ID465
Feature activation-1.02e+00

Pos logit contributions
 mess+0.221
 mush+0.208
omo+0.194
 slippery+0.188
onna+0.188
Neg logit contributions
piring-0.250
cient-0.245
work-0.241
illa-0.237
leaf-0.236
Token yarn
Token ID21181
Feature activation-3.78e+00

Pos logit contributions
 mess+0.054
 mush+0.051
 slippery+0.046
omo+0.046
onna+0.045
Neg logit contributions
piring-0.067
cient-0.067
work-0.065
illa-0.064
leaf-0.064
Token.
Token ID13
Feature activation+3.64e-01

Pos logit contributions
 mess+0.192
 mush+0.177
 slippery+0.163
omo+0.158
onna+0.155
Neg logit contributions
cient-0.259
piring-0.258
leaf-0.251
illa-0.247
avi-0.247
Token<|endoftext|>
Token ID50256
Feature activation-4.55e+00

Pos logit contributions
piring+0.025
cient+0.024
work+0.024
illa+0.023
ci+0.023
Neg logit contributions
 mess-0.019
 mush-0.018
omo-0.016
 slippery-0.016
onna-0.015
TokenOnce
Token ID7454
Feature activation-2.93e+00

Pos logit contributions
 mess+0.273
 mush+0.261
 slippery+0.242
omo+0.242
 Lily+0.236
Neg logit contributions
piring-0.262
cient-0.261
work-0.250
 photographer-0.247
leaf-0.246
Token upon
Token ID2402
Feature activation-4.17e+00

Pos logit contributions
 mess+0.167
 mush+0.166
 slippery+0.145
omo+0.142
onna+0.138
Neg logit contributions
cient-0.174
piring-0.170
 father-0.169
igh-0.165
 grandfather-0.163
Token a
Token ID257
Feature activation+2.30e+00

Pos logit contributions
 mess+0.267
 mush+0.256
omo+0.237
 slippery+0.237
onna+0.231
Neg logit contributions
piring-0.225
cient-0.221
work-0.214
leaf-0.211
illa-0.210
Token time
Token ID640
Feature activation-5.88e+00

Pos logit contributions
avi+0.184
moon+0.184
iya+0.181
illa+0.178
 photographer+0.176
Neg logit contributions
 mush-0.071
 mess-0.069
onna-0.059
 slippery-0.053
 million-0.053
Token what
Token ID644
Feature activation+1.48e+00

Pos logit contributions
piring+0.182
cient+0.177
work+0.173
illa+0.172
 father+0.170
Neg logit contributions
 mess-0.223
 mush-0.219
omo-0.203
 slippery-0.200
onna-0.200
Token to
Token ID284
Feature activation+2.80e-01

Pos logit contributions
piring+0.083
cient+0.082
work+0.079
leaf+0.078
 photographer+0.078
Neg logit contributions
 mess-0.092
 mush-0.088
omo-0.082
 slippery-0.082
onna-0.079
Token do
Token ID466
Feature activation+8.89e+00

Pos logit contributions
piring+0.011
cient+0.011
work+0.011
leaf+0.011
illa+0.010
Neg logit contributions
 mess-0.022
 mush-0.021
 slippery-0.020
omo-0.020
onna-0.019
Token.
Token ID13
Feature activation+3.71e-01

Pos logit contributions
piring+0.530
work+0.517
cient+0.509
ci+0.508
illa+0.495
Neg logit contributions
 mess-0.491
 mush-0.486
omo-0.463
 slippery-0.449
onna-0.439
Token She
Token ID1375
Feature activation+2.86e-01

Pos logit contributions
cient+0.017
 father+0.015
 grandfather+0.014
ci+0.014
piring+0.014
Neg logit contributions
 mess-0.026
 mush-0.025
omo-0.025
 slippery-0.025
onna-0.024
Token came
Token ID1625
Feature activation-8.16e-01

Pos logit contributions
cient+0.015
piring+0.015
work+0.014
illa+0.014
 photographer+0.014
Neg logit contributions
 mess-0.019
 mush-0.018
 slippery-0.017
omo-0.017
onna-0.017
Token to
Token ID284
Feature activation+2.88e-01

Pos logit contributions
 mess+0.040
 mush+0.037
 slippery+0.035
omo+0.033
 messy+0.033
Neg logit contributions
piring-0.057
cient-0.057
work-0.054
owers-0.054
avi-0.054
Token Lily
Token ID20037
Feature activation-8.36e+00

Pos logit contributions
piring+0.016
cient+0.015
work+0.015
owers+0.014
ci+0.014
Neg logit contributions
 mess-0.020
 mush-0.018
 Lily-0.018
 slippery-0.017
 messy-0.017
Token and
Token ID290
Feature activation+4.99e-01

Pos logit contributions
 mess+0.472
 Lily+0.374
 mush+0.374
 messy+0.370
 slippery+0.353
Neg logit contributions
piring-0.610
leaf-0.597
avi-0.576
iya-0.566
owers-0.564
Token said
Token ID531
Feature activation+3.88e+00

Pos logit contributions
piring+0.032
owers+0.031
leaf+0.030
igh+0.030
work+0.030
Neg logit contributions
 mess-0.030
 Lily-0.027
 mush-0.027
 messy-0.025
 slippery-0.025
Token,
Token ID11
Feature activation-3.72e-01

Pos logit contributions
piring+0.258
cient+0.255
work+0.249
illa+0.246
leaf+0.245
Neg logit contributions
 mess-0.202
 mush-0.191
 slippery-0.174
omo-0.172
onna-0.168
Token smile
Token ID8212
Feature activation-2.91e+00

Pos logit contributions
piring+0.131
cient+0.130
leaf+0.125
 father+0.125
work+0.125
Neg logit contributions
 mess-0.127
 mush-0.121
omo-0.116
 slippery-0.113
onna-0.111
Token.
Token ID13
Feature activation+3.06e-01

Pos logit contributions
 mess+0.109
 mush+0.099
 Lily+0.091
omo+0.086
 messy+0.086
Neg logit contributions
cient-0.240
leaf-0.234
piring-0.230
avi-0.225
illa-0.225
Token "
Token ID366
Feature activation+1.80e+00

Pos logit contributions
piring+0.018
cient+0.017
work+0.017
leaf+0.017
 photographer+0.016
Neg logit contributions
 mess-0.019
 mush-0.018
 slippery-0.016
omo-0.016
onna-0.016
TokenDo
Token ID5211
Feature activation+2.52e+00

Pos logit contributions
cient+0.109
piring+0.108
work+0.103
illa+0.102
igh+0.102
Neg logit contributions
 mess-0.102
 mush-0.100
 slippery-0.091
omo-0.090
onna-0.089
Token you
Token ID345
Feature activation+1.64e+00

Pos logit contributions
piring+0.174
cient+0.173
work+0.170
leaf+0.167
 photographer+0.167
Neg logit contributions
 mess-0.123
 mush-0.116
 Lily-0.113
 slippery-0.111
omo-0.109
Token want
Token ID765
Feature activation+6.73e-02

Pos logit contributions
piring+0.072
igh+0.071
cient+0.070
leaf+0.069
work+0.068
Neg logit contributions
 mess-0.117
 mush-0.112
omo-0.108
 slippery-0.106
onna-0.104
Token to
Token ID284
Feature activation+2.45e-01

Pos logit contributions
work+0.003
leaf+0.003
piring+0.003
cient+0.003
illa+0.003
Neg logit contributions
 mess-0.005
omo-0.004
 mush-0.004
 slippery-0.004
 steep-0.004
Token try
Token ID1949
Feature activation+3.44e+00

Pos logit contributions
cient+0.009
work+0.009
illa+0.009
piring+0.009
igh+0.009
Neg logit contributions
 mess-0.018
 mush-0.017
 slippery-0.017
 painful-0.016
omo-0.016
Token?"
Token ID1701
Feature activation-1.50e+00

Pos logit contributions
illa+0.185
igh+0.183
ica+0.182
work+0.182
cient+0.182
Neg logit contributions
 mess-0.200
 mush-0.194
omo-0.182
 painful-0.176
 slippery-0.175
Token\n
Token ID198
Feature activation+9.34e-01

Pos logit contributions
 mess+0.069
 mush+0.066
 Lily+0.062
omo+0.061
 slippery+0.060
Neg logit contributions
piring-0.109
cient-0.107
work-0.106
illa-0.103
ci-0.103
Token\n
Token ID198
Feature activation+9.33e-01

Pos logit contributions
cient+0.046
avi+0.045
leaf+0.045
piring+0.045
illa+0.045
Neg logit contributions
 mess-0.063
 mush-0.061
 slippery-0.055
 messy-0.053
omo-0.052
Token help
Token ID1037
Feature activation+5.09e+00

Pos logit contributions
 mess+0.061
 mush+0.056
 slippery+0.056
onna+0.054
 Lily+0.053
Neg logit contributions
cient-0.048
piring-0.046
 photographer-0.044
leaf-0.044
owers-0.044
Token but
Token ID475
Feature activation+4.62e-01

Pos logit contributions
piring+0.360
cient+0.358
work+0.347
leaf+0.343
illa+0.341
Neg logit contributions
 mess-0.245
 mush-0.229
 slippery-0.208
omo-0.205
onna-0.199
Token smile
Token ID8212
Feature activation-2.89e+00

Pos logit contributions
cient+0.027
piring+0.027
illa+0.026
 photographer+0.026
avi+0.026
Neg logit contributions
 mess-0.028
 mush-0.026
 slippery-0.025
onna-0.024
 messy-0.024
Token.
Token ID13
Feature activation+3.22e-01

Pos logit contributions
 mess+0.145
 mush+0.132
 slippery+0.122
 messy+0.121
 Lily+0.120
Neg logit contributions
leaf-0.200
cient-0.197
piring-0.196
illa-0.192
 photographer-0.192
Token\n
Token ID198
Feature activation+9.22e-01

Pos logit contributions
piring+0.020
cient+0.020
work+0.020
leaf+0.019
illa+0.019
Neg logit contributions
 mess-0.018
 mush-0.017
 slippery-0.015
omo-0.015
onna-0.015
Token\n
Token ID198
Feature activation+9.10e-01

Pos logit contributions
piring+0.064
cient+0.063
work+0.061
illa+0.061
leaf+0.061
Neg logit contributions
 mess-0.045
 mush-0.043
 slippery-0.039
omo-0.039
onna-0.037
TokenThe
Token ID464
Feature activation+1.08e+00

Pos logit contributions
piring+0.058
cient+0.057
leaf+0.056
illa+0.055
 photographer+0.055
Neg logit contributions
 mess-0.050
 mush-0.048
 slippery-0.043
omo-0.043
onna-0.042
Token moral
Token ID6573
Feature activation-4.85e+00

Pos logit contributions
piring+0.069
cient+0.068
work+0.067
leaf+0.066
illa+0.065
Neg logit contributions
 mess-0.057
 mush-0.056
onna-0.051
omo-0.050
 slippery-0.050
Token of
Token ID286
Feature activation-2.18e+00

Pos logit contributions
 mess+0.236
 mush+0.225
omo+0.202
 slippery+0.198
onna+0.193
Neg logit contributions
piring-0.337
cient-0.330
leaf-0.328
work-0.327
illa-0.322
Token this
Token ID428
Feature activation+1.44e+00

Pos logit contributions
 mess+0.109
 mush+0.100
 slippery+0.094
 Lily+0.094
omo+0.090
Neg logit contributions
piring-0.151
cient-0.150
 photographer-0.143
owers-0.142
work-0.142
Token story
Token ID1621
Feature activation+4.21e+00

Pos logit contributions
piring+0.071
cient+0.069
work+0.069
illa+0.067
ci+0.066
Neg logit contributions
 mess-0.097
 mush-0.096
 slippery-0.088
omo-0.087
onna-0.086
Token he
Token ID339
Feature activation-1.18e+00

Pos logit contributions
 father+0.125
piring+0.120
 wife+0.118
bike+0.116
 grandfather+0.114
Neg logit contributions
 mess-0.222
 mush-0.206
onna-0.205
omo-0.190
oe-0.187
Token tried
Token ID3088
Feature activation-1.97e-01

Pos logit contributions
 mess+0.085
 mush+0.081
onna+0.076
omo+0.075
 slippery+0.074
Neg logit contributions
cient-0.054
piring-0.052
 father-0.052
 wife-0.050
ci-0.050
Token it
Token ID340
Feature activation-1.85e+00

Pos logit contributions
 mess+0.013
 mush+0.013
omo+0.013
 slippery+0.012
onna+0.012
Neg logit contributions
piring-0.009
work-0.009
illa-0.009
cient-0.009
 father-0.009
Token.
Token ID13
Feature activation-2.67e-03

Pos logit contributions
 mess+0.115
 mush+0.108
omo+0.104
 slippery+0.098
onna+0.098
Neg logit contributions
piring-0.106
cient-0.104
work-0.100
ci-0.098
illa-0.098
Token But
Token ID887
Feature activation-2.74e+00

Pos logit contributions
 mess+0.000
 mush+0.000
 slippery+0.000
 poisonous+0.000
omo+0.000
Neg logit contributions
cient-0.000
amb-0.000
bike-0.000
ci-0.000
piring-0.000
Token little
Token ID1310
Feature activation-4.44e-01

Pos logit contributions
 mess+0.180
 mush+0.171
onna+0.165
omo+0.164
 slippery+0.151
Neg logit contributions
cient-0.136
ci-0.136
piring-0.136
 father-0.135
work-0.133
Token did
Token ID750
Feature activation+2.82e+00

Pos logit contributions
 mess+0.028
 mush+0.028
onna+0.027
omo+0.026
 slippery+0.026
Neg logit contributions
piring-0.021
 father-0.021
ci-0.021
work-0.021
 goat-0.021
Token he
Token ID339
Feature activation-1.23e+00

Pos logit contributions
piring+0.181
cient+0.180
work+0.173
illa+0.171
ci+0.170
Neg logit contributions
 mess-0.153
 mush-0.145
omo-0.132
 slippery-0.132
onna-0.129
Token know
Token ID760
Feature activation+2.35e+00

Pos logit contributions
 mess+0.071
 mush+0.069
omo+0.064
 slippery+0.062
onna+0.061
Neg logit contributions
cient-0.072
piring-0.071
work-0.069
illa-0.069
ci-0.068
Token,
Token ID11
Feature activation-7.76e-01

Pos logit contributions
piring+0.125
cient+0.124
work+0.119
leaf+0.117
 photographer+0.117
Neg logit contributions
 mess-0.154
 mush-0.147
 slippery-0.138
omo-0.137
onna-0.133
Token it
Token ID340
Feature activation-1.90e+00

Pos logit contributions
 mess+0.036
 mush+0.034
 slippery+0.032
 Lily+0.031
 messy+0.030
Neg logit contributions
piring-0.056
cient-0.056
work-0.055
 photographer-0.055
illa-0.055
Token to
Token ID284
Feature activation-1.38e-01

Pos logit contributions
 mess+0.148
 mush+0.142
omo+0.136
 slippery+0.132
onna+0.128
Neg logit contributions
cient-0.102
piring-0.102
work-0.101
leaf-0.099
 father-0.098
Token promise
Token ID6991
Feature activation+1.41e+00

Pos logit contributions
 mess+0.009
 mush+0.009
 slippery+0.008
omo+0.008
onna+0.008
Neg logit contributions
piring-0.007
cient-0.007
illa-0.007
work-0.006
avi-0.006
Token not
Token ID407
Feature activation-7.37e+00

Pos logit contributions
illa+0.079
work+0.078
 father+0.077
ica+0.077
moon+0.076
Neg logit contributions
 mess-0.082
 mush-0.081
 slippery-0.075
 million-0.073
omo-0.073
Token to
Token ID284
Feature activation-1.57e-01

Pos logit contributions
 mush+0.513
 mess+0.509
omo+0.502
 slippery+0.467
 painful+0.451
Neg logit contributions
work-0.313
piring-0.310
moon-0.288
cient-0.288
leaf-0.287
Token eat
Token ID4483
Feature activation-6.66e-01

Pos logit contributions
 mush+0.010
 mess+0.010
omo+0.009
 slippery+0.009
onna+0.009
Neg logit contributions
illa-0.007
work-0.007
cient-0.007
avi-0.007
moon-0.007
Token my
Token ID616
Feature activation-7.15e+00

Pos logit contributions
omo+0.037
 mush+0.036
 mess+0.035
onna+0.034
 painful+0.032
Neg logit contributions
work-0.035
piring-0.035
cient-0.035
 father-0.034
illa-0.034
Token cake
Token ID12187
Feature activation-6.16e-01

Pos logit contributions
 mess+0.387
 mush+0.356
 slippery+0.325
 messy+0.314
omo+0.314
Neg logit contributions
piring-0.487
cient-0.477
illa-0.459
work-0.457
leaf-0.457
Token again
Token ID757
Feature activation-4.87e+00

Pos logit contributions
 mess+0.032
 mush+0.032
omo+0.030
 slippery+0.028
onna+0.028
Neg logit contributions
work-0.038
piring-0.038
moon-0.037
ci-0.037
illa-0.037
Token.
Token ID13
Feature activation-1.20e-01

Pos logit contributions
 mush+0.298
 mess+0.294
omo+0.287
onna+0.270
 slippery+0.266
Neg logit contributions
 father-0.237
ci-0.236
piring-0.236
moon-0.235
avi-0.235
Token OK
Token ID7477
Feature activation-3.11e+00

Pos logit contributions
 mess+0.008
 mush+0.008
 slippery+0.008
omo+0.007
 messy+0.007
Neg logit contributions
piring-0.006
cient-0.006
illa-0.006
work-0.006
leaf-0.006
Token?"
Token ID1701
Feature activation-1.96e+00

Pos logit contributions
 mess+0.193
 mush+0.193
 slippery+0.182
 holes+0.173
omo+0.171
Neg logit contributions
work-0.170
illa-0.160
ci-0.158
cient-0.157
 father-0.155
Tokeny
Token ID88
Feature activation+2.75e+00

Pos logit contributions
 mess+0.291
 mush+0.280
 slippery+0.260
omo+0.257
onna+0.251
Neg logit contributions
piring-0.274
cient-0.272
work-0.266
illa-0.258
igh-0.257
Token!
Token ID0
Feature activation-3.16e-03

Pos logit contributions
piring+0.185
cient+0.183
work+0.178
illa+0.175
leaf+0.175
Neg logit contributions
 mess-0.139
 mush-0.131
 slippery-0.119
omo-0.119
onna-0.115
Token You
Token ID921
Feature activation+1.96e+00

Pos logit contributions
 mess+0.000
 mush+0.000
omo+0.000
 slippery+0.000
onna+0.000
Neg logit contributions
piring-0.000
cient-0.000
work-0.000
leaf-0.000
illa-0.000
Token can
Token ID460
Feature activation-3.94e-01

Pos logit contributions
piring+0.128
iya+0.124
 photographer+0.123
cient+0.122
ica+0.122
Neg logit contributions
 mess-0.106
 mush-0.097
 slippery-0.092
 messy-0.086
 Lily-0.085
Token't
Token ID470
Feature activation-1.28e+00

Pos logit contributions
 mess+0.023
 mush+0.021
 slippery+0.020
omo+0.020
onna+0.019
Neg logit contributions
piring-0.024
cient-0.024
work-0.023
leaf-0.023
 photographer-0.023
Token hurt
Token ID5938
Feature activation-7.20e+00

Pos logit contributions
 mess+0.082
 mush+0.077
 slippery+0.072
omo+0.071
onna+0.070
Neg logit contributions
piring-0.071
cient-0.069
work-0.066
ci-0.066
leaf-0.065
Token my
Token ID616
Feature activation-7.12e+00

Pos logit contributions
 mess+0.413
 mush+0.393
omo+0.377
onna+0.359
 slippery+0.358
Neg logit contributions
cient-0.432
piring-0.432
illa-0.417
work-0.416
avi-0.414
Token rabbit
Token ID22746
Feature activation+3.68e+00

Pos logit contributions
 mess+0.466
 mush+0.433
 slippery+0.399
omo+0.391
 messy+0.390
Neg logit contributions
piring-0.403
leaf-0.401
used-0.370
owers-0.370
cient-0.369
Token!
Token ID0
Feature activation-4.70e-02

Pos logit contributions
cient+0.259
piring+0.256
work+0.251
ci+0.247
illa+0.246
Neg logit contributions
 mess-0.169
 mush-0.164
omo-0.151
 slippery-0.146
onna-0.143
Token He
Token ID679
Feature activation-2.53e+00

Pos logit contributions
 mess+0.003
 mush+0.003
 slippery+0.003
omo+0.003
onna+0.003
Neg logit contributions
piring-0.003
cient-0.003
work-0.002
illa-0.002
leaf-0.002
Token is
Token ID318
Feature activation-1.71e+00

Pos logit contributions
 mess+0.130
 mush+0.120
 slippery+0.111
omo+0.107
 messy+0.104
Neg logit contributions
piring-0.174
cient-0.167
 photographer-0.165
work-0.163
leaf-0.163
Token to
Token ID284
Feature activation+5.17e-02

Pos logit contributions
 mess+0.004
 mush+0.004
omo+0.003
 slippery+0.003
onna+0.003
Neg logit contributions
piring-0.003
cient-0.003
work-0.003
leaf-0.003
illa-0.003
Token take
Token ID1011
Feature activation+1.30e+00

Pos logit contributions
piring+0.003
cient+0.003
work+0.003
ci+0.003
illa+0.003
Neg logit contributions
 mess-0.003
 mush-0.003
 slippery-0.003
omo-0.003
onna-0.003
Token them
Token ID606
Feature activation-1.11e+00

Pos logit contributions
cient+0.078
piring+0.078
work+0.076
leaf+0.074
illa+0.074
Neg logit contributions
 mess-0.074
 mush-0.070
omo-0.068
onna-0.066
 slippery-0.065
Token away
Token ID1497
Feature activation-3.08e+00

Pos logit contributions
 mess+0.060
 mush+0.058
omo+0.054
 slippery+0.051
onna+0.051
Neg logit contributions
piring-0.071
cient-0.070
work-0.068
illa-0.067
ci-0.067
Token.
Token ID13
Feature activation+1.12e-01

Pos logit contributions
 mess+0.159
 mush+0.150
 slippery+0.138
omo+0.136
onna+0.132
Neg logit contributions
piring-0.204
cient-0.204
work-0.196
leaf-0.195
 photographer-0.194
Token Lily
Token ID20037
Feature activation-8.61e+00

Pos logit contributions
cient+0.008
 grandson+0.007
 wife+0.007
 son+0.007
 father+0.007
Neg logit contributions
omo-0.004
 mess-0.004
 slippery-0.004
 mush-0.004
 poisonous-0.004
Token's
Token ID338
Feature activation+2.71e+00

Pos logit contributions
 mess+0.434
 mush+0.389
 slippery+0.368
omo+0.358
 Lily+0.346
Neg logit contributions
piring-0.620
leaf-0.584
work-0.582
cient-0.579
 photographer-0.574
Token mom
Token ID1995
Feature activation-1.17e+00

Pos logit contributions
piring+0.162
cient+0.156
leaf+0.156
illa+0.152
owers+0.151
Neg logit contributions
 mess-0.175
 mush-0.157
 messy-0.149
 slippery-0.149
omo-0.148
Token told
Token ID1297
Feature activation+2.20e+00

Pos logit contributions
 mess+0.069
 mush+0.062
 slippery+0.060
omo+0.059
 Lily+0.058
Neg logit contributions
piring-0.074
leaf-0.070
work-0.069
owers-0.069
 photographer-0.069
Token her
Token ID607
Feature activation+6.86e-01

Pos logit contributions
cient+0.113
piring+0.112
work+0.106
 father+0.105
illa+0.105
Neg logit contributions
 mess-0.140
 mush-0.140
omo-0.133
onna-0.130
 slippery-0.128
Token to
Token ID284
Feature activation-6.24e-03

Pos logit contributions
cient+0.042
piring+0.042
leaf+0.041
work+0.041
ci+0.041
Neg logit contributions
 mess-0.039
 mush-0.036
 Lily-0.034
 slippery-0.033
omo-0.033
Token."
Token ID526
Feature activation-4.94e+00

Pos logit contributions
 mess+0.159
 mush+0.153
omo+0.148
 slippery+0.141
onna+0.136
Neg logit contributions
piring-0.226
work-0.219
cient-0.218
igh-0.210
avi-0.209
Token\n
Token ID198
Feature activation+6.68e-01

Pos logit contributions
 mess+0.326
 mush+0.314
 slippery+0.298
omo+0.290
 muddy+0.274
Neg logit contributions
piring-0.244
cient-0.244
avi-0.242
 father-0.239
 wife-0.237
Token\n
Token ID198
Feature activation+6.71e-01

Pos logit contributions
avi+0.031
cient+0.030
piring+0.030
leaf+0.030
illa+0.030
Neg logit contributions
 mush-0.045
 mess-0.045
omo-0.041
 slippery-0.040
 messy-0.039
TokenAnna
Token ID31160
Feature activation-5.72e+00

Pos logit contributions
avi+0.045
thel+0.044
ani+0.043
ica+0.043
iya+0.043
Neg logit contributions
 mush-0.033
 mess-0.032
onna-0.029
omo-0.028
 gross-0.027
Token did
Token ID750
Feature activation+2.92e+00

Pos logit contributions
 mess+0.367
 mush+0.348
omo+0.324
 slippery+0.323
onna+0.312
Neg logit contributions
piring-0.318
cient-0.312
work-0.300
leaf-0.296
 photographer-0.293
Token not
Token ID407
Feature activation-7.24e+00

Pos logit contributions
 wife+0.189
used+0.187
leaf+0.185
piring+0.185
illa+0.184
Neg logit contributions
 mush-0.136
omo-0.130
 mess-0.127
 million-0.122
onna-0.113
Token want
Token ID765
Feature activation-2.18e-01

Pos logit contributions
 mess+0.570
 mush+0.556
omo+0.525
 slippery+0.523
onna+0.510
Neg logit contributions
piring-0.286
cient-0.273
work-0.265
leaf-0.261
 photographer-0.259
Token to
Token ID284
Feature activation-3.48e-02

Pos logit contributions
 mess+0.015
omo+0.015
 mush+0.015
 slippery+0.014
 million+0.013
Neg logit contributions
work-0.009
leaf-0.009
 father-0.009
piring-0.009
 wife-0.009
Token be
Token ID307
Feature activation-1.79e-01

Pos logit contributions
 mess+0.002
 mush+0.002
omo+0.002
 slippery+0.002
onna+0.002
Neg logit contributions
piring-0.002
cient-0.002
work-0.002
illa-0.002
avi-0.002
Token weak
Token ID4939
Feature activation-2.59e+00

Pos logit contributions
 mess+0.008
omo+0.008
 mush+0.008
onna+0.008
 slippery+0.007
Neg logit contributions
 wife-0.012
piring-0.011
cient-0.011
 permission-0.011
 father-0.011
Token and
Token ID290
Feature activation+1.51e-01

Pos logit contributions
 mush+0.139
 mess+0.139
omo+0.132
 slippery+0.127
onna+0.124
Neg logit contributions
piring-0.159
work-0.154
cient-0.151
igh-0.146
 father-0.146
Token to
Token ID284
Feature activation-2.58e-02

Pos logit contributions
 mess+0.007
 mush+0.007
omo+0.006
 slippery+0.006
onna+0.006
Neg logit contributions
piring-0.006
cient-0.005
work-0.005
leaf-0.005
illa-0.005
Token stay
Token ID2652
Feature activation+2.91e+00

Pos logit contributions
 mess+0.002
 mush+0.002
 slippery+0.001
omo+0.001
onna+0.001
Neg logit contributions
piring-0.001
cient-0.001
work-0.001
 photographer-0.001
owers-0.001
Token on
Token ID319
Feature activation+2.65e+00

Pos logit contributions
piring+0.199
cient+0.198
work+0.191
leaf+0.190
illa+0.189
Neg logit contributions
 mess-0.149
 mush-0.139
 slippery-0.129
omo-0.126
 messy-0.121
Token the
Token ID262
Feature activation+3.88e-01

Pos logit contributions
cient+0.187
piring+0.186
work+0.179
leaf+0.177
illa+0.177
Neg logit contributions
 mess-0.133
 mush-0.121
 slippery-0.117
 Lily-0.109
omo-0.107
Token first
Token ID717
Feature activation+6.58e+00

Pos logit contributions
piring+0.023
cient+0.023
work+0.022
 photographer+0.022
igh+0.022
Neg logit contributions
 mess-0.024
 mush-0.022
 slippery-0.022
 messy-0.020
omo-0.020
Token floor
Token ID4314
Feature activation-9.36e+00

Pos logit contributions
piring+0.434
cient+0.429
work+0.419
illa+0.413
ci+0.411
Neg logit contributions
 mess-0.339
 mush-0.324
omo-0.294
 slippery-0.293
onna-0.284
Token.
Token ID13
Feature activation+1.83e-02

Pos logit contributions
 mess+0.511
 mush+0.496
omo+0.462
 slippery+0.441
onna+0.434
Neg logit contributions
piring-0.577
cient-0.563
work-0.560
ci-0.545
illa-0.544
Token\n
Token ID198
Feature activation+6.11e-01

Pos logit contributions
cient+0.001
piring+0.001
ci+0.001
 father+0.001
amb+0.001
Neg logit contributions
 mess-0.001
 mush-0.001
 slippery-0.001
onna-0.001
 poisonous-0.001
Token\n
Token ID198
Feature activation+6.16e-01

Pos logit contributions
piring+0.031
cient+0.031
illa+0.031
amb+0.030
 father+0.030
Neg logit contributions
 mess-0.040
 mush-0.039
 slippery-0.034
 messy-0.033
 poisonous-0.033
TokenThe
Token ID464
Feature activation+7.80e-01

Pos logit contributions
piring+0.040
cient+0.039
ci+0.039
 father+0.038
amb+0.038
Neg logit contributions
 mess-0.033
 mush-0.032
onna-0.028
 slippery-0.027
 poisonous-0.026
Token elevator
Token ID20932
Feature activation+4.09e+00

Pos logit contributions
cient+0.045
piring+0.045
igh+0.043
owers+0.042
illa+0.042
Neg logit contributions
 mess-0.051
 mush-0.046
 slippery-0.045
 messy-0.043
omo-0.043