TOP ACTIVATIONS
MAX = 0.209

for you . It has meat and vegetables . It is
. Maybe they would have meat too , but not mouse
. She had put special bait on the line that was
the grill . He has meat and corn and bread .
a burger . It had meat and cheese and lettuce .
for you . It has chicken and rice and carrots ."
was happy . It had meat . It chose Tim and
y ummy . We have chicken and potatoes and carrots and
cage . He put some meat in the cage and waited
new color . It is tur qu oise , like the
, " I 'm making chicken soup . It 's y
saw her mom put some meat in the big box that
, " They are av oc ados . My mom makes
farm , she saw two human children . They were playing
dinner . They are cooking chicken , rice and carrots .
Yes , you can have meat , but you have to
a bowl . Dad put tomato sauce over the pasta ,
top , sometimes they had meat balls , sometimes they had
, " Be ef is meat that comes from cows .
Mom says they are av oc ados . Sara wants to

MIN ACTIVATIONS
MIN = -0.207

and forth , helping the p anda to complete the climb
very happy and thanked the p anda . The p anda
together every day and the p anda was no longer gloomy
that day on , the p anda and the bird became
that day on , the p anda and the bird played
day onward , the young p anda tried to behave himself
or the porch or the p uddle . She did not
do you go ?" The p anda replied , " I
and very soft . The p anda loved to fold things
polite ." The p anda felt very sad ,
than ever , and the p uddle was gone . In
tree does not miss the p ears anymore ! <|endoftext|> J
wrong ?". The p anda said , " I
sc ow led at the p uddle . Suddenly
The bird thanked the p anda and offered him a
was feeling helpless because the p uddle was too deep to
. This continued until the p anda finally stopped .
the little girl and the p anda played together every day
that day on , the p anda made sure to always
The night came and the p anda was still lost .

INTERVAL 0.105 - 0.209
CONTAINS 0.154%

. " Good morning , O ll ie !"
them . " Help , help !" Tom and
" I have bread for the ducks ," Lily
, but we have each other and that 's what 's
waiter said " We have spaghetti available . Would you like

INTERVAL 0.001 - 0.105
CONTAINS 47.200%

Jill followed the bird and they flew high up in the
to disappear and the tree began to get bigger and bigger
He dreamed that he was flying in a big balloon .
day , the bird flew by and listened to the family
Stop , dog , stop !" Ben says . " You

INTERVAL -0.103 - 0.001
CONTAINS 52.371%

" Maybe there is a treasure in the hole !"
that his v ase was broken . He cried and tried
bake a cake . Leo wanted to make a y ummy
snow man . They couldn 't wait to play outside again
sky is blue !" The rh ino was

INTERVAL -0.207 - -0.103
CONTAINS 0.275%

<|endoftext|> L ily was a p anda who liked to deliver
. The town spe ople thought and thought
while we go on the slide . It will make you
's mom asked him to clean his messy room . Tim
. Then they decorated the room with other flowers too .

Token for
Token ID329
Feature activation+8.06e-02

Pos logit contributions
aw+0.222
Run+0.208
ted+0.204
plets+0.201
 trigger+0.200
Neg logit contributions
 look-0.144
 Jane-0.135
days-0.128
 Sus-0.125
 tidy-0.125
Token you
Token ID345
Feature activation+2.31e-02

Pos logit contributions
aw+0.123
Run+0.121
 sudden+0.113
ted+0.113
 trigger+0.112
Neg logit contributions
 look-0.164
days-0.152
 tidy-0.149
 Jane-0.149
 clean-0.148
Token.
Token ID13
Feature activation+6.22e-03

Pos logit contributions
aw+0.050
Run+0.048
 trigger+0.045
 sudden+0.044
ted+0.044
Neg logit contributions
 look-0.029
 Jane-0.027
 Sus-0.027
 tidy-0.025
days-0.025
Token It
Token ID632
Feature activation-8.95e-03

Pos logit contributions
aw+0.012
 sudden+0.011
 trigger+0.011
Run+0.011
plets+0.011
Neg logit contributions
 look-0.012
 Jane-0.010
 clean-0.010
 tidy-0.010
days-0.010
Token has
Token ID468
Feature activation+2.41e-02

Pos logit contributions
 Jane+0.016
 look+0.016
 Sus+0.015
days+0.015
 tidy+0.015
Neg logit contributions
aw-0.015
Run-0.015
ted-0.014
rots-0.014
plets-0.014
Token meat
Token ID6174
Feature activation+1.88e-01

Pos logit contributions
aw+0.047
Run+0.045
plets+0.044
 trigger+0.044
ted+0.044
Neg logit contributions
 look-0.040
 Jane-0.036
days-0.036
 tidy-0.036
 clean-0.036
Token and
Token ID290
Feature activation+5.36e-02

Pos logit contributions
aw+0.424
Run+0.419
 trigger+0.408
plets+0.405
ted+0.403
Neg logit contributions
 look-0.264
 clean-0.231
 Jane-0.229
 tidy-0.226
days-0.215
Token vegetables
Token ID13701
Feature activation+7.49e-02

Pos logit contributions
aw+0.099
Run+0.096
 trigger+0.094
plets+0.094
rified+0.093
Neg logit contributions
 look-0.101
 clean-0.090
 Jane-0.087
 tidy-0.087
 have-0.080
Token.
Token ID13
Feature activation+9.87e-03

Pos logit contributions
aw+0.164
Run+0.155
plets+0.152
ted+0.150
 trigger+0.149
Neg logit contributions
 Jane-0.101
 look-0.099
 Sus-0.093
days-0.093
 tidy-0.088
Token It
Token ID632
Feature activation-7.64e-03

Pos logit contributions
 sudden+0.019
aw+0.019
 trigger+0.019
Run+0.018
ted+0.018
Neg logit contributions
 look-0.017
 Jane-0.016
 clean-0.015
 tidy-0.015
 have-0.015
Token is
Token ID318
Feature activation-9.34e-03

Pos logit contributions
 look+0.013
 Jane+0.012
days+0.012
 tidy+0.012
 clean+0.012
Neg logit contributions
aw-0.014
Run-0.014
ted-0.013
plets-0.013
 trigger-0.013
Token.
Token ID13
Feature activation+1.94e-02

Pos logit contributions
aw+0.139
ted+0.124
Run+0.122
rots+0.120
plets+0.119
Neg logit contributions
 look-0.068
 Jane-0.060
 Sus-0.058
days-0.057
 tidy-0.055
Token Maybe
Token ID6674
Feature activation-1.20e-02

Pos logit contributions
aw+0.049
 trigger+0.048
Run+0.048
plets+0.048
ted+0.046
Neg logit contributions
 look-0.022
 Jane-0.020
 clean-0.018
 tidy-0.018
days-0.017
Token they
Token ID484
Feature activation+3.89e-02

Pos logit contributions
 look+0.018
 Jane+0.017
 clean+0.016
 tidy+0.015
 Lucy+0.014
Neg logit contributions
 trigger-0.027
plets-0.027
aw-0.027
Run-0.026
rified-0.025
Token would
Token ID561
Feature activation+3.46e-02

Pos logit contributions
aw+0.074
Run+0.073
ted+0.073
rots+0.069
rified+0.067
Neg logit contributions
 Jane-0.064
 Sus-0.059
 look-0.059
 tidy-0.059
 clean-0.058
Token have
Token ID423
Feature activation+1.73e-03

Pos logit contributions
 trigger+0.072
Run+0.071
aw+0.069
 sudden+0.067
Leave+0.067
Neg logit contributions
 Jane-0.043
 Sus-0.040
days-0.040
any-0.036
 Lucy-0.035
Token meat
Token ID6174
Feature activation+1.86e-01

Pos logit contributions
aw+0.003
Run+0.003
ted+0.003
 sudden+0.003
rots+0.003
Neg logit contributions
 look-0.003
days-0.003
 tidy-0.003
 Jane-0.003
 clean-0.003
Token too
Token ID1165
Feature activation-4.34e-02

Pos logit contributions
aw+0.395
Run+0.385
 trigger+0.375
ted+0.373
plets+0.373
Neg logit contributions
 look-0.280
 Jane-0.252
 clean-0.248
 tidy-0.246
days-0.238
Token,
Token ID11
Feature activation+3.99e-02

Pos logit contributions
 look+0.057
 Sus+0.055
days+0.055
 Jane+0.053
 Ell+0.050
Neg logit contributions
aw-0.099
Run-0.089
 sudden-0.083
ls-0.083
plets-0.082
Token but
Token ID475
Feature activation+7.76e-02

Pos logit contributions
aw+0.082
Run+0.081
 trigger+0.079
plets+0.079
ted+0.078
Neg logit contributions
 look-0.065
 Jane-0.058
 clean-0.057
 tidy-0.056
days-0.054
Token not
Token ID407
Feature activation-5.96e-02

Pos logit contributions
aw+0.142
Run+0.140
 trigger+0.135
ted+0.134
plets+0.133
Neg logit contributions
 look-0.138
 Jane-0.125
 clean-0.125
 tidy-0.124
days-0.122
Token mouse
Token ID10211
Feature activation+1.01e-01

Pos logit contributions
 look+0.086
days+0.080
 Jane+0.079
 tidy+0.078
 clean+0.077
Neg logit contributions
aw-0.124
Run-0.120
 trigger-0.116
ted-0.114
 sudden-0.113
Token.
Token ID13
Feature activation-6.46e-04

Pos logit contributions
aw+0.058
 sudden+0.054
Run+0.053
 trigger+0.053
ted+0.052
Neg logit contributions
days-0.033
 look-0.032
 Jane-0.030
 tidy-0.029
 clean-0.027
Token She
Token ID1375
Feature activation-3.59e-02

Pos logit contributions
 look+0.001
 tidy+0.001
 clean+0.001
days+0.001
 Jane+0.001
Neg logit contributions
aw-0.001
Run-0.001
ted-0.001
plets-0.001
 trigger-0.001
Token had
Token ID550
Feature activation+4.44e-03

Pos logit contributions
 look+0.052
 Jane+0.046
days+0.045
 tidy+0.045
 clean+0.045
Neg logit contributions
aw-0.079
Run-0.077
ted-0.074
 sudden-0.073
plets-0.073
Token put
Token ID1234
Feature activation+9.09e-05

Pos logit contributions
aw+0.009
 sudden+0.009
Run+0.009
ted+0.008
plets+0.008
Neg logit contributions
days-0.007
 look-0.006
 tidy-0.006
 Jane-0.005
 clean-0.005
Token special
Token ID2041
Feature activation+6.70e-02

Pos logit contributions
aw+0.000
Run+0.000
ted+0.000
 trigger+0.000
plets+0.000
Neg logit contributions
 look-0.000
days-0.000
 tidy-0.000
 Jane-0.000
 clean-0.000
Token bait
Token ID26536
Feature activation+1.83e-01

Pos logit contributions
plets+0.123
aw+0.123
Run+0.122
 trigger+0.122
rified+0.117
Neg logit contributions
 look-0.124
 clean-0.110
 tidy-0.107
 Jane-0.107
days-0.100
Token on
Token ID319
Feature activation+1.10e-02

Pos logit contributions
aw+0.380
Run+0.368
 trigger+0.360
ted+0.358
plets+0.356
Neg logit contributions
 look-0.282
 Jane-0.255
 clean-0.249
 tidy-0.249
days-0.246
Token the
Token ID262
Feature activation+5.74e-03

Pos logit contributions
aw+0.027
Run+0.026
 trigger+0.026
ted+0.025
plets+0.025
Neg logit contributions
 look-0.013
 Jane-0.011
 clean-0.011
 tidy-0.011
days-0.011
Token line
Token ID1627
Feature activation+1.68e-02

Pos logit contributions
aw+0.011
Run+0.011
ted+0.011
rots+0.010
plets+0.010
Neg logit contributions
 look-0.010
 clean-0.009
 tidy-0.009
 Jane-0.009
 smile-0.008
Token that
Token ID326
Feature activation-2.19e-02

Pos logit contributions
aw+0.035
Run+0.033
plets+0.033
ted+0.033
 trigger+0.032
Neg logit contributions
 look-0.025
 tidy-0.022
days-0.022
 Jane-0.022
 clean-0.022
Token was
Token ID373
Feature activation+1.35e-02

Pos logit contributions
 look+0.029
days+0.027
 Jane+0.026
 tidy+0.026
 clean+0.025
Neg logit contributions
aw-0.048
Run-0.047
ted-0.046
 trigger-0.045
plets-0.045
Token the
Token ID262
Feature activation+2.37e-03

Pos logit contributions
aw+0.001
Run+0.001
ted+0.000
 sudden+0.000
 trigger+0.000
Neg logit contributions
 look-0.000
days-0.000
 tidy-0.000
 clean-0.000
 Jane-0.000
Token grill
Token ID29901
Feature activation+3.53e-02

Pos logit contributions
aw+0.005
 trigger+0.005
Run+0.005
ted+0.004
 sudden+0.004
Neg logit contributions
 look-0.004
 Jane-0.004
days-0.003
 Sus-0.003
 tidy-0.003
Token.
Token ID13
Feature activation+5.06e-03

Pos logit contributions
aw+0.086
Run+0.082
ted+0.081
 trigger+0.079
plets+0.078
Neg logit contributions
 look-0.041
 Jane-0.036
 tidy-0.036
days-0.034
 clean-0.034
Token He
Token ID679
Feature activation+2.67e-02

Pos logit contributions
aw+0.010
Run+0.010
ted+0.009
plets+0.009
 trigger+0.009
Neg logit contributions
 look-0.009
days-0.008
 tidy-0.008
 clean-0.008
 Jane-0.007
Token has
Token ID468
Feature activation+1.90e-02

Pos logit contributions
aw+0.050
Run+0.049
ted+0.047
 trigger+0.046
plets+0.046
Neg logit contributions
 look-0.047
 Jane-0.043
 clean-0.042
 tidy-0.042
days-0.042
Token meat
Token ID6174
Feature activation+1.81e-01

Pos logit contributions
aw+0.043
Run+0.041
ted+0.040
 sudden+0.040
 trigger+0.039
Neg logit contributions
 look-0.025
days-0.024
 Jane-0.023
 tidy-0.022
 clean-0.022
Token and
Token ID290
Feature activation+5.05e-02

Pos logit contributions
Run+0.380
aw+0.378
 trigger+0.365
plets+0.363
 sudden+0.360
Neg logit contributions
 look-0.284
 clean-0.253
 Jane-0.243
 tidy-0.242
days-0.229
Token corn
Token ID11676
Feature activation+1.19e-01

Pos logit contributions
aw+0.101
plets+0.099
 trigger+0.098
Run+0.098
rified+0.096
Neg logit contributions
 look-0.087
 clean-0.077
 Jane-0.076
 tidy-0.073
 have-0.068
Token and
Token ID290
Feature activation+3.05e-02

Pos logit contributions
aw+0.258
Run+0.249
ted+0.248
 trigger+0.246
plets+0.245
Neg logit contributions
 look-0.168
 Jane-0.158
 tidy-0.153
 clean-0.150
days-0.149
Token bread
Token ID8509
Feature activation+1.19e-01

Pos logit contributions
aw+0.062
Run+0.061
 trigger+0.060
plets+0.060
rots+0.058
Neg logit contributions
 look-0.051
 clean-0.045
 Jane-0.044
 tidy-0.044
days-0.039
Token.
Token ID13
Feature activation+4.41e-03

Pos logit contributions
aw+0.284
Run+0.274
 trigger+0.271
ted+0.270
plets+0.269
Neg logit contributions
 look-0.151
 Jane-0.134
 tidy-0.131
 clean-0.129
days-0.127
Token a
Token ID257
Feature activation+1.23e-02

Pos logit contributions
days+0.037
 look+0.033
 tidy+0.032
 clean+0.031
 Jane+0.030
Neg logit contributions
aw-0.049
Run-0.047
 trigger-0.046
 sudden-0.046
ted-0.046
Token burger
Token ID26593
Feature activation+8.26e-02

Pos logit contributions
aw+0.026
Run+0.025
 trigger+0.025
plets+0.025
ted+0.024
Neg logit contributions
 look-0.019
 Jane-0.017
 clean-0.017
 tidy-0.017
days-0.016
Token.
Token ID13
Feature activation+9.48e-03

Pos logit contributions
aw+0.189
Run+0.180
ted+0.178
 trigger+0.178
plets+0.174
Neg logit contributions
 look-0.111
 Jane-0.100
 tidy-0.099
days-0.099
 clean-0.098
Token It
Token ID632
Feature activation-1.04e-02

Pos logit contributions
aw+0.018
Run+0.017
 trigger+0.017
plets+0.017
ted+0.017
Neg logit contributions
 look-0.016
days-0.015
 tidy-0.015
 clean-0.015
 Jane-0.014
Token had
Token ID550
Feature activation+1.91e-02

Pos logit contributions
days+0.014
 look+0.014
 Jane+0.013
 tidy+0.013
 clean+0.013
Neg logit contributions
aw-0.022
Run-0.022
ted-0.021
 sudden-0.020
 trigger-0.020
Token meat
Token ID6174
Feature activation+1.79e-01

Pos logit contributions
aw+0.042
Run+0.041
 trigger+0.041
plets+0.040
ted+0.040
Neg logit contributions
 look-0.027
 Jane-0.024
 clean-0.024
 tidy-0.024
days-0.023
Token and
Token ID290
Feature activation+6.04e-02

Pos logit contributions
Run+0.364
aw+0.356
 sudden+0.341
plets+0.341
 trigger+0.340
Neg logit contributions
 look-0.299
 clean-0.265
 tidy-0.255
 Jane-0.254
days-0.235
Token cheese
Token ID9891
Feature activation+1.23e-01

Pos logit contributions
aw+0.116
Run+0.113
rified+0.113
plets+0.110
rots+0.108
Neg logit contributions
 look-0.111
 clean-0.097
 Jane-0.093
 tidy-0.093
 have-0.086
Token and
Token ID290
Feature activation+3.91e-02

Pos logit contributions
aw+0.278
Run+0.272
 trigger+0.266
plets+0.265
ted+0.263
Neg logit contributions
 look-0.172
 Jane-0.152
 clean-0.150
 tidy-0.149
days-0.143
Token lettuce
Token ID39406
Feature activation+9.07e-02

Pos logit contributions
aw+0.075
Run+0.074
rified+0.073
plets+0.072
 trigger+0.072
Neg logit contributions
 look-0.070
 clean-0.062
 Jane-0.060
 tidy-0.060
 have-0.055
Token.
Token ID13
Feature activation+9.38e-03

Pos logit contributions
aw+0.211
 trigger+0.202
Run+0.201
ted+0.200
plets+0.199
Neg logit contributions
 look-0.116
 Jane-0.108
days-0.106
 tidy-0.104
 clean-0.102
Token for
Token ID329
Feature activation+5.53e-02

Pos logit contributions
aw+0.052
Run+0.051
ted+0.050
plets+0.049
 trigger+0.049
Neg logit contributions
 look-0.040
 tidy-0.036
 Jane-0.036
 clean-0.036
days-0.036
Token you
Token ID345
Feature activation+9.44e-03

Pos logit contributions
aw+0.094
Run+0.090
 sudden+0.088
ted+0.086
plets+0.085
Neg logit contributions
 look-0.102
days-0.093
 Jane-0.092
 tidy-0.088
 clean-0.087
Token.
Token ID13
Feature activation+9.60e-04

Pos logit contributions
aw+0.019
Run+0.018
plets+0.017
 trigger+0.017
ted+0.017
Neg logit contributions
 look-0.012
 Jane-0.012
 Sus-0.012
days-0.011
 tidy-0.011
Token It
Token ID632
Feature activation-7.86e-03

Pos logit contributions
aw+0.002
Run+0.002
 trigger+0.002
 sudden+0.002
ted+0.002
Neg logit contributions
 look-0.002
 Jane-0.002
 clean-0.002
 tidy-0.002
days-0.002
Token has
Token ID468
Feature activation+2.37e-02

Pos logit contributions
 Jane+0.014
 look+0.014
days+0.014
 Sus+0.013
 tidy+0.013
Neg logit contributions
aw-0.013
Run-0.013
ted-0.013
plets-0.012
rots-0.012
Token chicken
Token ID9015
Feature activation+1.79e-01

Pos logit contributions
aw+0.049
Run+0.046
ted+0.045
 trigger+0.045
plets+0.045
Neg logit contributions
 look-0.037
 Jane-0.034
days-0.034
 tidy-0.033
 clean-0.033
Token and
Token ID290
Feature activation+5.36e-02

Pos logit contributions
aw+0.351
Run+0.342
 trigger+0.333
plets+0.332
ted+0.330
Neg logit contributions
 look-0.298
 Jane-0.271
 clean-0.266
 tidy-0.264
days-0.257
Token rice
Token ID11464
Feature activation+6.39e-02

Pos logit contributions
aw+0.100
Run+0.097
plets+0.095
 trigger+0.094
rified+0.093
Neg logit contributions
 look-0.099
 clean-0.089
 Jane-0.088
 tidy-0.086
 have-0.078
Token and
Token ID290
Feature activation+3.65e-02

Pos logit contributions
aw+0.133
ted+0.126
 trigger+0.124
plets+0.123
Run+0.123
Neg logit contributions
 look-0.092
 Jane-0.091
days-0.086
 Sus-0.085
 tidy-0.082
Token carrots
Token ID34397
Feature activation+9.55e-02

Pos logit contributions
aw+0.068
Run+0.066
 trigger+0.065
plets+0.065
ted+0.063
Neg logit contributions
 look-0.067
 clean-0.060
 Jane-0.059
 tidy-0.059
days-0.054
Token."
Token ID526
Feature activation-2.32e-02

Pos logit contributions
aw+0.209
Run+0.198
ted+0.195
plets+0.193
 trigger+0.193
Neg logit contributions
 look-0.129
 Jane-0.129
 Sus-0.119
days-0.119
 tidy-0.117
Token was
Token ID373
Feature activation+1.01e-02

Pos logit contributions
aw+0.123
Run+0.122
 trigger+0.121
plets+0.121
ted+0.117
Neg logit contributions
 look-0.067
 clean-0.057
 Jane-0.057
 tidy-0.056
days-0.051
Token happy
Token ID3772
Feature activation+1.93e-02

Pos logit contributions
aw+0.020
Run+0.019
 sudden+0.019
 trigger+0.018
ted+0.018
Neg logit contributions
days-0.016
 look-0.016
 Jane-0.015
 Sus-0.014
 tidy-0.013
Token.
Token ID13
Feature activation+9.69e-03

Pos logit contributions
aw+0.043
Run+0.041
ted+0.041
 sudden+0.038
rots+0.038
Neg logit contributions
 look-0.025
days-0.024
 tidy-0.024
 clean-0.022
 Jane-0.021
Token It
Token ID632
Feature activation-1.65e-02

Pos logit contributions
aw+0.021
Run+0.020
 trigger+0.020
plets+0.020
ted+0.020
Neg logit contributions
 look-0.015
 Jane-0.013
 clean-0.013
 tidy-0.013
days-0.012
Token had
Token ID550
Feature activation+1.02e-02

Pos logit contributions
 look+0.022
days+0.021
 Jane+0.021
 tidy+0.020
 clean+0.019
Neg logit contributions
aw-0.037
Run-0.037
ted-0.036
 trigger-0.034
 sudden-0.034
Token meat
Token ID6174
Feature activation+1.78e-01

Pos logit contributions
aw+0.023
Run+0.021
ted+0.021
 sudden+0.020
 trigger+0.020
Neg logit contributions
days-0.014
 look-0.014
 Jane-0.013
 tidy-0.012
 clean-0.012
Token.
Token ID13
Feature activation+1.80e-02

Pos logit contributions
aw+0.367
Run+0.358
 trigger+0.350
plets+0.347
ted+0.346
Neg logit contributions
 look-0.279
 Jane-0.251
 clean-0.248
 tidy-0.246
days-0.239
Token It
Token ID632
Feature activation-9.47e-03

Pos logit contributions
aw+0.036
Run+0.035
 trigger+0.035
plets+0.034
ted+0.034
Neg logit contributions
 look-0.030
 Jane-0.027
 clean-0.027
 tidy-0.026
days-0.025
Token chose
Token ID7690
Feature activation+1.46e-02

Pos logit contributions
 look+0.013
days+0.012
 Jane+0.012
 tidy+0.011
 Sus+0.011
Neg logit contributions
aw-0.021
Run-0.021
ted-0.020
 trigger-0.019
plets-0.019
Token Tim
Token ID5045
Feature activation+3.82e-03

Pos logit contributions
aw+0.032
ted+0.031
Run+0.030
 trigger+0.030
rots+0.029
Neg logit contributions
 look-0.021
days-0.019
 tidy-0.019
 clean-0.018
 Jane-0.018
Token and
Token ID290
Feature activation+8.06e-03

Pos logit contributions
aw+0.007
ted+0.006
Run+0.006
rots+0.006
 trigger+0.006
Neg logit contributions
 look-0.007
 Jane-0.007
 tidy-0.006
days-0.006
 Sus-0.006
Token y
Token ID331
Feature activation-4.25e-04

Pos logit contributions
 look+0.055
 Jane+0.053
days+0.052
 Sus+0.051
any+0.048
Neg logit contributions
aw-0.066
Run-0.064
 sudden-0.062
led-0.059
lish-0.058
Tokenummy
Token ID13513
Feature activation-7.23e-03

Pos logit contributions
 look+0.001
 Jane+0.001
 tidy+0.001
 clean+0.001
days+0.001
Neg logit contributions
aw-0.001
Run-0.001
ted-0.001
 trigger-0.000
plets-0.000
Token.
Token ID13
Feature activation+8.12e-03

Pos logit contributions
 look+0.010
 Jane+0.009
 tidy+0.009
days+0.009
 clean+0.008
Neg logit contributions
aw-0.016
Run-0.015
ted-0.014
plets-0.014
 trigger-0.014
Token We
Token ID775
Feature activation+2.00e-02

Pos logit contributions
aw+0.014
 trigger+0.014
Run+0.013
ted+0.013
plets+0.013
Neg logit contributions
 look-0.016
 Jane-0.015
 clean-0.014
 tidy-0.014
days-0.014
Token have
Token ID423
Feature activation+6.73e-03

Pos logit contributions
Run+0.036
 trigger+0.036
aw+0.034
ted+0.034
plets+0.033
Neg logit contributions
 Jane-0.032
 Sus-0.027
 Sally-0.026
 Sue-0.025
 Jill-0.025
Token chicken
Token ID9015
Feature activation+1.76e-01

Pos logit contributions
aw+0.015
Run+0.014
ted+0.013
 sudden+0.013
rots+0.013
Neg logit contributions
 look-0.009
days-0.009
 Jane-0.008
 tidy-0.008
 clean-0.008
Token and
Token ID290
Feature activation+5.59e-02

Pos logit contributions
aw+0.343
Run+0.333
ted+0.323
 trigger+0.323
plets+0.322
Neg logit contributions
 look-0.296
 Jane-0.270
 clean-0.264
 tidy-0.263
days-0.254
Token potatoes
Token ID18821
Feature activation+1.31e-01

Pos logit contributions
aw+0.103
Run+0.101
 trigger+0.098
plets+0.098
ted+0.097
Neg logit contributions
 look-0.100
 Jane-0.091
 clean-0.090
 tidy-0.089
days-0.087
Token and
Token ID290
Feature activation+4.09e-02

Pos logit contributions
aw+0.246
ted+0.230
Run+0.223
rots+0.221
plets+0.220
Neg logit contributions
 look-0.210
 Jane-0.209
 Sus-0.194
 tidy-0.192
days-0.192
Token carrots
Token ID34397
Feature activation+9.44e-02

Pos logit contributions
aw+0.076
Run+0.074
 trigger+0.072
plets+0.072
ted+0.071
Neg logit contributions
 look-0.074
 clean-0.067
 Jane-0.067
 tidy-0.066
days-0.063
Token and
Token ID290
Feature activation+4.06e-02

Pos logit contributions
aw+0.174
ted+0.161
Run+0.161
plets+0.157
rots+0.156
Neg logit contributions
 look-0.153
 Jane-0.152
 tidy-0.144
days-0.142
 Sus-0.142
Token cage
Token ID17186
Feature activation-3.75e-03

Pos logit contributions
 look+0.019
 Jane+0.018
days+0.018
 tidy+0.018
 clean+0.018
Neg logit contributions
aw-0.023
Run-0.023
 trigger-0.023
plets-0.022
ted-0.022
Token.
Token ID13
Feature activation+8.69e-03

Pos logit contributions
 look+0.004
 Jane+0.003
 tidy+0.003
days+0.003
 clean+0.003
Neg logit contributions
aw-0.010
Run-0.009
ted-0.009
 trigger-0.009
plets-0.009
Token He
Token ID679
Feature activation+2.96e-02

Pos logit contributions
aw+0.018
Run+0.017
 trigger+0.017
plets+0.017
ted+0.017
Neg logit contributions
 look-0.014
 clean-0.013
 Jane-0.013
 tidy-0.013
days-0.012
Token put
Token ID1234
Feature activation-1.36e-03

Pos logit contributions
aw+0.067
Run+0.066
ted+0.063
plets+0.062
rots+0.061
Neg logit contributions
 look-0.040
days-0.037
 Jane-0.036
 clean-0.036
 tidy-0.035
Token some
Token ID617
Feature activation+4.34e-02

Pos logit contributions
 look+0.002
 Jane+0.001
 clean+0.001
 tidy+0.001
days+0.001
Neg logit contributions
aw-0.003
Run-0.003
plets-0.003
 trigger-0.003
ted-0.003
Token meat
Token ID6174
Feature activation+1.75e-01

Pos logit contributions
plets+0.091
rified+0.088
Run+0.087
izing+0.086
aw+0.084
Neg logit contributions
 look-0.070
 clean-0.065
 tidy-0.059
 have-0.054
 Jane-0.054
Token in
Token ID287
Feature activation+3.75e-03

Pos logit contributions
aw+0.383
Run+0.379
plets+0.372
 sudden+0.371
 trigger+0.364
Neg logit contributions
 look-0.259
 clean-0.230
 tidy-0.218
 Jane-0.215
days-0.204
Token the
Token ID262
Feature activation+3.36e-02

Pos logit contributions
Run+0.008
aw+0.008
 trigger+0.008
ted+0.008
 sudden+0.008
Neg logit contributions
 look-0.005
days-0.005
 tidy-0.004
 clean-0.004
 Jane-0.004
Token cage
Token ID17186
Feature activation+1.11e-02

Pos logit contributions
Run+0.067
aw+0.067
plets+0.066
ted+0.065
 trigger+0.065
Neg logit contributions
 look-0.056
 clean-0.049
 tidy-0.048
 Jane-0.047
days-0.044
Token and
Token ID290
Feature activation+4.40e-02

Pos logit contributions
aw+0.025
ted+0.022
Run+0.022
 trigger+0.022
plets+0.022
Neg logit contributions
 look-0.016
 Jane-0.015
days-0.015
 tidy-0.014
 clean-0.014
Token waited
Token ID13488
Feature activation+2.17e-02

Pos logit contributions
aw+0.100
Run+0.099
ted+0.097
 trigger+0.095
plets+0.095
Neg logit contributions
 look-0.058
 Jane-0.052
days-0.052
 clean-0.051
 tidy-0.051
Token new
Token ID649
Feature activation+6.60e-03

Pos logit contributions
aw+0.127
Run+0.125
 trigger+0.125
plets+0.122
 sudden+0.121
Neg logit contributions
 look-0.081
 Jane-0.074
days-0.071
 clean-0.069
 tidy-0.069
Token color
Token ID3124
Feature activation+1.24e-02

Pos logit contributions
aw+0.013
 trigger+0.012
Run+0.012
plets+0.012
 sudden+0.012
Neg logit contributions
 look-0.011
days-0.010
 tidy-0.010
 Jane-0.010
 clean-0.009
Token.
Token ID13
Feature activation+1.53e-02

Pos logit contributions
aw+0.030
plets+0.029
 trigger+0.029
rots+0.028
Run+0.028
Neg logit contributions
 look-0.014
days-0.013
 Jane-0.013
 tidy-0.012
 clean-0.012
Token It
Token ID632
Feature activation-6.84e-03

Pos logit contributions
aw+0.036
Run+0.035
 trigger+0.035
plets+0.034
ted+0.034
Neg logit contributions
 look-0.020
 Jane-0.018
 clean-0.018
 tidy-0.017
days-0.017
Token is
Token ID318
Feature activation-7.81e-03

Pos logit contributions
 look+0.012
 Jane+0.012
days+0.012
 tidy+0.012
 clean+0.011
Neg logit contributions
aw-0.012
Run-0.011
ted-0.011
 trigger-0.011
plets-0.011
Token tur
Token ID7858
Feature activation+1.75e-01

Pos logit contributions
days+0.010
 look+0.010
 Jane+0.009
 tidy+0.008
 clean+0.008
Neg logit contributions
aw-0.018
 sudden-0.016
plets-0.016
Run-0.016
 trigger-0.016
Tokenqu
Token ID421
Feature activation-7.05e-02

Pos logit contributions
 trigger+0.236
ted+0.221
rified+0.204
Run+0.197
ked+0.188
Neg logit contributions
 look-0.395
 tidy-0.377
 clean-0.377
days-0.377
 Jane-0.374
Tokenoise
Token ID25678
Feature activation+1.10e-01

Pos logit contributions
 look+0.171
 Jane+0.160
 clean+0.157
 tidy+0.157
days+0.153
Neg logit contributions
aw-0.093
Run-0.084
plets-0.082
 trigger-0.081
ted-0.080
Token,
Token ID11
Feature activation+3.73e-02

Pos logit contributions
aw+0.258
Run+0.251
 trigger+0.245
plets+0.245
ted+0.244
Neg logit contributions
 look-0.142
 Jane-0.124
 clean-0.122
 tidy-0.121
days-0.118
Token like
Token ID588
Feature activation+3.58e-03

Pos logit contributions
aw+0.076
Run+0.074
 trigger+0.072
plets+0.072
ted+0.071
Neg logit contributions
 look-0.060
 Jane-0.054
 clean-0.054
 tidy-0.053
days-0.052
Token the
Token ID262
Feature activation-1.21e-02

Pos logit contributions
aw+0.008
Run+0.007
ted+0.007
 trigger+0.007
plets+0.007
Neg logit contributions
 look-0.005
days-0.005
 tidy-0.005
 clean-0.005
 Jane-0.005
Token,
Token ID11
Feature activation-1.02e-02

Pos logit contributions
aw+0.017
Run+0.016
ted+0.016
rots+0.016
 trigger+0.015
Neg logit contributions
 look-0.010
 tidy-0.010
days-0.009
 clean-0.009
 Jane-0.009
Token "
Token ID366
Feature activation+5.75e-02

Pos logit contributions
 look+0.018
days+0.016
 tidy+0.016
 Jane+0.016
 clean+0.016
Neg logit contributions
aw-0.019
Run-0.019
ted-0.018
plets-0.018
 trigger-0.018
TokenI
Token ID40
Feature activation-2.83e-02

Pos logit contributions
 trigger+0.098
aw+0.093
Run+0.093
rified+0.090
ted+0.090
Neg logit contributions
 look-0.110
 Jane-0.103
 clean-0.100
 tidy-0.099
days-0.096
Token'm
Token ID1101
Feature activation-2.19e-02

Pos logit contributions
 Jane+0.033
 tidy+0.032
 look+0.030
 clean+0.030
 Lucy+0.029
Neg logit contributions
Run-0.061
aw-0.058
plets-0.056
 sudden-0.055
ted-0.055
Token making
Token ID1642
Feature activation+3.17e-02

Pos logit contributions
 look+0.035
days+0.035
 tidy+0.032
 Jane+0.031
my+0.031
Neg logit contributions
aw-0.039
Run-0.039
plets-0.038
 sudden-0.036
ted-0.035
Token chicken
Token ID9015
Feature activation+1.75e-01

Pos logit contributions
aw+0.068
Run+0.068
 trigger+0.067
plets+0.066
ted+0.066
Neg logit contributions
 look-0.048
 clean-0.042
 Jane-0.042
 tidy-0.041
days-0.038
Token soup
Token ID17141
Feature activation+1.07e-01

Pos logit contributions
aw+0.366
Run+0.355
 trigger+0.347
plets+0.346
ted+0.343
Neg logit contributions
 look-0.266
 Jane-0.242
 clean-0.235
 tidy-0.234
days-0.228
Token.
Token ID13
Feature activation+1.52e-02

Pos logit contributions
aw+0.233
rots+0.216
plets+0.215
Run+0.212
ted+0.210
Neg logit contributions
 look-0.144
 Jane-0.138
days-0.131
 Sus-0.130
 tidy-0.127
Token It
Token ID632
Feature activation-2.94e-03

Pos logit contributions
aw+0.028
Run+0.027
 trigger+0.026
 sudden+0.026
ted+0.026
Neg logit contributions
 look-0.028
 Jane-0.026
 clean-0.025
 tidy-0.025
days-0.024
Token's
Token ID338
Feature activation+4.69e-02

Pos logit contributions
 look+0.005
 Jane+0.005
 tidy+0.005
days+0.005
 clean+0.005
Neg logit contributions
aw-0.005
Run-0.005
plets-0.005
rots-0.005
ted-0.005
Token y
Token ID331
Feature activation+2.40e-02

Pos logit contributions
aw+0.104
Run+0.097
 sudden+0.095
plets+0.095
 trigger+0.094
Neg logit contributions
days-0.061
 look-0.060
 Jane-0.057
 Sus-0.052
 tidy-0.052
Token saw
Token ID2497
Feature activation+3.52e-02

Pos logit contributions
 look+0.077
 Jane+0.068
 clean+0.066
 tidy+0.066
days+0.066
Neg logit contributions
aw-0.129
Run-0.125
ted-0.122
 trigger-0.121
plets-0.120
Token her
Token ID607
Feature activation-1.95e-02

Pos logit contributions
aw+0.088
Run+0.086
ted+0.084
 trigger+0.084
plets+0.083
Neg logit contributions
 look-0.041
 Jane-0.035
 clean-0.035
 tidy-0.034
days-0.034
Token mom
Token ID1995
Feature activation+8.04e-03

Pos logit contributions
 look+0.029
 clean+0.026
 tidy+0.025
 Jane+0.025
days+0.023
Neg logit contributions
aw-0.044
Run-0.044
plets-0.043
 trigger-0.041
ted-0.041
Token put
Token ID1234
Feature activation+5.98e-03

Pos logit contributions
aw+0.016
ted+0.015
 sudden+0.015
Run+0.015
ting+0.014
Neg logit contributions
 Jane-0.011
 look-0.011
days-0.011
 Sus-0.011
 tidy-0.010
Token some
Token ID617
Feature activation+4.54e-02

Pos logit contributions
aw+0.014
ted+0.013
Run+0.013
 trigger+0.013
plets+0.013
Neg logit contributions
 look-0.007
days-0.007
 Jane-0.007
 tidy-0.007
 clean-0.006
Token meat
Token ID6174
Feature activation+1.74e-01

Pos logit contributions
Run+0.097
plets+0.096
aw+0.094
rots+0.092
rified+0.091
Neg logit contributions
 look-0.076
 clean-0.073
 tidy-0.068
 organized-0.058
 Jane-0.058
Token in
Token ID287
Feature activation+2.26e-03

Pos logit contributions
aw+0.345
Run+0.337
 trigger+0.330
plets+0.327
ted+0.326
Neg logit contributions
 look-0.287
 Jane-0.260
 clean-0.256
 tidy-0.255
days-0.247
Token the
Token ID262
Feature activation+3.73e-02

Pos logit contributions
Run+0.004
 sudden+0.004
aw+0.004
 trigger+0.004
ted+0.004
Neg logit contributions
days-0.003
 look-0.003
 tidy-0.003
bi-0.003
 Jane-0.003
Token big
Token ID1263
Feature activation+1.75e-02

Pos logit contributions
aw+0.086
Run+0.084
 trigger+0.082
ted+0.081
plets+0.081
Neg logit contributions
 look-0.050
 Jane-0.044
 clean-0.043
 tidy-0.043
days-0.042
Token box
Token ID3091
Feature activation-6.30e-02

Pos logit contributions
aw+0.038
Run+0.037
 trigger+0.036
ted+0.036
plets+0.036
Neg logit contributions
 look-0.025
 Jane-0.023
 clean-0.022
 tidy-0.022
days-0.022
Token that
Token ID326
Feature activation-1.05e-02

Pos logit contributions
 look+0.077
 Jane+0.071
 Sus+0.066
days+0.065
 tidy+0.065
Neg logit contributions
aw-0.147
ted-0.137
Run-0.134
plets-0.131
 trigger-0.130
Token,
Token ID11
Feature activation-2.10e-02

Pos logit contributions
 look+0.005
 tidy+0.005
days+0.005
 clean+0.005
 Jane+0.005
Neg logit contributions
aw-0.010
Run-0.009
 trigger-0.009
ted-0.009
rots-0.009
Token "
Token ID366
Feature activation+5.12e-02

Pos logit contributions
days+0.039
 tidy+0.039
 look+0.038
 clean+0.038
 Jane+0.035
Neg logit contributions
Run-0.035
aw-0.034
ted-0.034
 trigger-0.032
plets-0.032
TokenThey
Token ID2990
Feature activation-1.64e-03

Pos logit contributions
aw+0.095
 trigger+0.095
ted+0.093
Run+0.092
plets+0.092
Neg logit contributions
 look-0.089
 Jane-0.078
 clean-0.077
 tidy-0.077
days-0.076
Token are
Token ID389
Feature activation+6.75e-03

Pos logit contributions
 Jane+0.002
days+0.002
 Sus+0.002
 Lucy+0.002
 tidy+0.002
Neg logit contributions
aw-0.003
Run-0.003
 trigger-0.003
ted-0.003
rots-0.003
Token av
Token ID1196
Feature activation+7.75e-04

Pos logit contributions
aw+0.015
Run+0.014
 trigger+0.014
plets+0.014
ted+0.014
Neg logit contributions
 look-0.009
days-0.009
 Jane-0.008
 tidy-0.008
 clean-0.008
Tokenoc
Token ID420
Feature activation+1.72e-01

Pos logit contributions
Run+0.001
 trigger+0.001
aw+0.001
ted+0.001
rified+0.001
Neg logit contributions
 look-0.002
 clean-0.002
 tidy-0.002
my-0.002
 Jane-0.002
Tokenados
Token ID22484
Feature activation+5.59e-02

Pos logit contributions
Run+0.214
 sudden+0.207
plets+0.199
aw+0.196
 trigger+0.192
Neg logit contributions
 look-0.401
 Jane-0.385
 clean-0.367
 tidy-0.363
 Sus-0.352
Token.
Token ID13
Feature activation+1.07e-02

Pos logit contributions
aw+0.126
plets+0.119
rots+0.118
Run+0.117
ted+0.116
Neg logit contributions
 look-0.072
 Jane-0.071
days-0.067
 Sus-0.065
 tidy-0.064
Token My
Token ID2011
Feature activation+9.89e-03

Pos logit contributions
aw+0.021
Run+0.020
 trigger+0.020
plets+0.020
ted+0.020
Neg logit contributions
 look-0.018
 Jane-0.016
 clean-0.016
 tidy-0.016
days-0.015
Token mom
Token ID1995
Feature activation-7.92e-03

Pos logit contributions
aw+0.019
 trigger+0.018
Run+0.018
rots+0.018
ted+0.018
Neg logit contributions
 look-0.016
 Jane-0.015
days-0.015
 clean-0.015
 tidy-0.015
Token makes
Token ID1838
Feature activation+1.78e-02

Pos logit contributions
days+0.013
 Jane+0.013
 Sus+0.012
 tidy+0.012
 look+0.012
Neg logit contributions
aw-0.015
rots-0.014
ted-0.013
plets-0.012
Run-0.012
Token farm
Token ID5318
Feature activation-6.84e-03

Pos logit contributions
 look+0.021
 clean+0.017
 tidy+0.016
 Jane+0.014
 search+0.014
Neg logit contributions
aw-0.045
plets-0.044
rots-0.044
Run-0.044
ted-0.042
Token,
Token ID11
Feature activation-1.05e-02

Pos logit contributions
 look+0.009
 tidy+0.008
 clean+0.008
days+0.008
 Jane+0.007
Neg logit contributions
aw-0.016
Run-0.015
ted-0.015
 trigger-0.014
 sudden-0.014
Token she
Token ID673
Feature activation-2.37e-04

Pos logit contributions
 look+0.013
 clean+0.012
 tidy+0.012
 Jane+0.012
days+0.011
Neg logit contributions
aw-0.024
Run-0.024
ted-0.024
plets-0.023
 trigger-0.023
Token saw
Token ID2497
Feature activation+3.36e-02

Pos logit contributions
 look+0.000
 Jane+0.000
 clean+0.000
 tidy+0.000
days+0.000
Neg logit contributions
aw-0.001
Run-0.001
ted-0.000
plets-0.000
 trigger-0.000
Token two
Token ID734
Feature activation+4.82e-02

Pos logit contributions
Run+0.083
 trigger+0.083
plets+0.083
aw+0.082
ted+0.079
Neg logit contributions
 look-0.039
 Jane-0.035
 clean-0.033
 tidy-0.032
 Sus-0.028
Token human
Token ID1692
Feature activation+1.72e-01

Pos logit contributions
Run+0.077
 trigger+0.077
plets+0.075
ted+0.074
aw+0.074
Neg logit contributions
 look-0.098
 clean-0.091
 tidy-0.090
 Jane-0.087
 smile-0.083
Token children
Token ID1751
Feature activation+5.82e-02

Pos logit contributions
aw+0.358
Run+0.353
 trigger+0.346
plets+0.344
ted+0.338
Neg logit contributions
 look-0.266
 Jane-0.240
 clean-0.236
 tidy-0.233
days-0.225
Token.
Token ID13
Feature activation+1.52e-03

Pos logit contributions
aw+0.120
Run+0.113
ted+0.112
 sudden+0.107
rots+0.106
Neg logit contributions
 look-0.088
 tidy-0.081
 clean-0.079
 Jane-0.077
days-0.073
Token They
Token ID1119
Feature activation-2.16e-03

Pos logit contributions
Run+0.003
ted+0.003
aw+0.003
plets+0.003
 trigger+0.003
Neg logit contributions
 look-0.002
 tidy-0.002
 clean-0.002
days-0.002
 Jane-0.002
Token were
Token ID547
Feature activation+1.65e-03

Pos logit contributions
 look+0.003
days+0.003
 tidy+0.003
 Jane+0.003
 clean+0.003
Neg logit contributions
Run-0.004
ted-0.004
 sudden-0.004
aw-0.004
ked-0.004
Token playing
Token ID2712
Feature activation+1.85e-03

Pos logit contributions
aw+0.004
Run+0.004
 sudden+0.003
ted+0.003
plets+0.003
Neg logit contributions
 look-0.002
days-0.002
 Jane-0.002
 clean-0.002
 tidy-0.002
Token dinner
Token ID8073
Feature activation+8.79e-02

Pos logit contributions
aw+0.111
Run+0.107
 trigger+0.105
ted+0.104
plets+0.103
Neg logit contributions
 look-0.071
 Jane-0.062
 tidy-0.061
 clean-0.061
days-0.060
Token.
Token ID13
Feature activation+1.87e-02

Pos logit contributions
aw+0.200
plets+0.185
Run+0.181
 sudden+0.179
ted+0.177
Neg logit contributions
 look-0.107
 Jane-0.098
days-0.090
 Sus-0.089
 tidy-0.085
Token They
Token ID1119
Feature activation+1.14e-02

Pos logit contributions
aw+0.035
Run+0.035
plets+0.033
ted+0.033
 sudden+0.032
Neg logit contributions
 look-0.033
 tidy-0.029
 clean-0.029
days-0.028
 Jane-0.026
Token are
Token ID389
Feature activation+8.10e-03

Pos logit contributions
aw+0.024
Run+0.024
ted+0.022
 trigger+0.021
plets+0.021
Neg logit contributions
 Jane-0.014
days-0.014
 Sus-0.013
 look-0.012
 tidy-0.012
Token cooking
Token ID10801
Feature activation+6.92e-02

Pos logit contributions
aw+0.017
 sudden+0.017
Run+0.016
plets+0.015
ted+0.015
Neg logit contributions
 look-0.012
days-0.011
 Jane-0.010
 Sus-0.009
|-0.009
Token chicken
Token ID9015
Feature activation+1.70e-01

Pos logit contributions
aw+0.152
Run+0.147
 trigger+0.142
ted+0.142
plets+0.142
Neg logit contributions
 look-0.099
 Jane-0.088
 tidy-0.086
 clean-0.086
days-0.085
Token,
Token ID11
Feature activation+4.49e-02

Pos logit contributions
aw+0.368
Run+0.364
 trigger+0.355
plets+0.351
ted+0.349
Neg logit contributions
 look-0.252
 clean-0.223
 Jane-0.222
 tidy-0.219
days-0.211
Token rice
Token ID11464
Feature activation+5.96e-02

Pos logit contributions
aw+0.083
Run+0.081
 trigger+0.080
plets+0.079
ted+0.078
Neg logit contributions
 look-0.082
 Jane-0.074
 clean-0.074
 tidy-0.073
days-0.070
Token and
Token ID290
Feature activation+3.89e-02

Pos logit contributions
aw+0.120
plets+0.113
rots+0.113
ted+0.113
Run+0.111
Neg logit contributions
 look-0.089
 Jane-0.084
days-0.081
 Sus-0.080
 tidy-0.079
Token carrots
Token ID34397
Feature activation+9.43e-02

Pos logit contributions
aw+0.070
Run+0.069
 trigger+0.067
plets+0.066
ted+0.066
Neg logit contributions
 look-0.072
 Jane-0.065
 clean-0.065
 tidy-0.064
days-0.062
Token.
Token ID13
Feature activation+1.72e-02

Pos logit contributions
aw+0.206
ted+0.195
plets+0.193
Run+0.192
 trigger+0.190
Neg logit contributions
 look-0.125
 Jane-0.123
 tidy-0.115
 Sus-0.115
days-0.113
TokenYes
Token ID5297
Feature activation-3.63e-02

Pos logit contributions
 trigger+0.096
plets+0.091
rots+0.089
aw+0.087
ted+0.084
Neg logit contributions
 look-0.120
 Jane-0.114
 clean-0.109
 tidy-0.108
days-0.106
Token,
Token ID11
Feature activation+4.11e-03

Pos logit contributions
 look+0.046
days+0.041
 tidy+0.040
 clean+0.040
 Jane+0.039
Neg logit contributions
aw-0.085
Run-0.083
 trigger-0.080
plets-0.079
ted-0.079
Token you
Token ID345
Feature activation-9.87e-03

Pos logit contributions
aw+0.008
 trigger+0.008
Run+0.008
plets+0.008
ted+0.008
Neg logit contributions
 look-0.007
 Jane-0.007
 clean-0.007
 tidy-0.006
days-0.006
Token can
Token ID460
Feature activation-2.72e-02

Pos logit contributions
 Jane+0.016
days+0.016
 look+0.015
 Sus+0.015
 tidy+0.015
Neg logit contributions
aw-0.018
 trigger-0.017
Run-0.017
plets-0.017
rots-0.017
Token have
Token ID423
Feature activation-2.68e-02

Pos logit contributions
 Sus+0.030
 Jane+0.028
days+0.027
any+0.025
 tidy+0.024
Neg logit contributions
aw-0.058
 trigger-0.056
ted-0.055
Run-0.055
plets-0.055
Token meat
Token ID6174
Feature activation+1.69e-01

Pos logit contributions
 look+0.041
 Jane+0.037
 clean+0.036
 tidy+0.036
days+0.035
Neg logit contributions
aw-0.056
Run-0.055
 trigger-0.054
ted-0.053
plets-0.053
Token,
Token ID11
Feature activation+3.84e-02

Pos logit contributions
aw+0.356
Run+0.347
 trigger+0.339
plets+0.336
ted+0.335
Neg logit contributions
 look-0.259
 Jane-0.233
 clean-0.229
 tidy-0.227
days-0.220
Token but
Token ID475
Feature activation+7.44e-02

Pos logit contributions
aw+0.076
Run+0.074
 trigger+0.070
ted+0.070
plets+0.070
Neg logit contributions
 look-0.063
 clean-0.056
 tidy-0.056
days-0.056
 Jane-0.056
Token you
Token ID345
Feature activation+1.13e-02

Pos logit contributions
aw+0.152
Run+0.150
 trigger+0.145
ted+0.143
plets+0.142
Neg logit contributions
 look-0.114
 Jane-0.104
 clean-0.102
 tidy-0.101
days-0.101
Token have
Token ID423
Feature activation-4.18e-03

Pos logit contributions
aw+0.027
Run+0.027
 trigger+0.027
plets+0.026
rots+0.026
Neg logit contributions
 Jane-0.011
 Sus-0.010
days-0.009
 look-0.009
 tidy-0.008
Token to
Token ID284
Feature activation+1.59e-02

Pos logit contributions
 look+0.005
 Jane+0.005
days+0.005
 tidy+0.005
 clean+0.005
Neg logit contributions
aw-0.010
Run-0.009
ted-0.009
 trigger-0.009
 sudden-0.009
Token a
Token ID257
Feature activation+3.83e-02

Pos logit contributions
 sudden+0.034
aw+0.033
Run+0.032
 trigger+0.031
plets+0.031
Neg logit contributions
 look-0.023
days-0.022
 Jane-0.021
 tidy-0.020
my-0.019
Token bowl
Token ID9396
Feature activation-6.39e-03

Pos logit contributions
aw+0.083
 sudden+0.083
Run+0.081
 trigger+0.080
plets+0.079
Neg logit contributions
 look-0.050
 Jane-0.047
days-0.046
 Sus-0.042
 tidy-0.041
Token.
Token ID13
Feature activation+1.10e-03

Pos logit contributions
 look+0.011
days+0.010
 Jane+0.010
 tidy+0.010
 clean+0.009
Neg logit contributions
aw-0.012
plets-0.011
Run-0.010
ted-0.010
 trigger-0.010
Token Dad
Token ID17415
Feature activation+9.68e-03

Pos logit contributions
aw+0.002
Run+0.002
plets+0.002
ted+0.002
 trigger+0.002
Neg logit contributions
 look-0.002
 tidy-0.002
 clean-0.002
days-0.002
 Jane-0.002
Token put
Token ID1234
Feature activation-1.72e-02

Pos logit contributions
aw+0.021
Run+0.020
ted+0.020
plets+0.020
rots+0.019
Neg logit contributions
 look-0.013
days-0.012
 Jane-0.012
 tidy-0.012
 clean-0.011
Token tomato
Token ID24240
Feature activation+1.68e-01

Pos logit contributions
 look+0.026
 Jane+0.023
 tidy+0.023
days+0.023
 clean+0.023
Neg logit contributions
aw-0.037
Run-0.036
ted-0.035
 trigger-0.035
plets-0.034
Token sauce
Token ID10746
Feature activation+1.18e-01

Pos logit contributions
aw+0.334
Run+0.326
 trigger+0.318
plets+0.316
ted+0.316
Neg logit contributions
 look-0.278
 Jane-0.251
 clean-0.247
 tidy-0.246
days-0.239
Token over
Token ID625
Feature activation-3.26e-02

Pos logit contributions
aw+0.231
Run+0.224
 trigger+0.219
ted+0.218
plets+0.218
Neg logit contributions
 look-0.197
 Jane-0.178
 clean-0.175
 tidy-0.175
days-0.172
Token the
Token ID262
Feature activation-1.13e-02

Pos logit contributions
 look+0.043
 Jane+0.037
 clean+0.037
 tidy+0.037
days+0.036
Neg logit contributions
aw-0.076
Run-0.074
 trigger-0.072
ted-0.072
plets-0.071
Token pasta
Token ID26296
Feature activation+2.27e-02

Pos logit contributions
 look+0.015
 Jane+0.014
 clean+0.013
 tidy+0.013
days+0.013
Neg logit contributions
aw-0.026
Run-0.025
 trigger-0.025
plets-0.025
ted-0.024
Token,
Token ID11
Feature activation+1.75e-02

Pos logit contributions
aw+0.044
plets+0.041
 trigger+0.040
ted+0.040
Run+0.040
Neg logit contributions
 look-0.036
days-0.035
 Jane-0.034
 tidy-0.033
 Sus-0.033
Token top
Token ID1353
Feature activation+2.64e-02

Pos logit contributions
 look+0.000
 Jane+0.000
 clean+0.000
 tidy+0.000
days+0.000
Neg logit contributions
aw-0.001
Run-0.001
 trigger-0.001
plets-0.001
ted-0.001
Token,
Token ID11
Feature activation+4.84e-03

Pos logit contributions
aw+0.048
Run+0.046
plets+0.044
 trigger+0.044
ted+0.044
Neg logit contributions
 look-0.049
 Jane-0.045
 clean-0.044
 tidy-0.044
days-0.043
Token sometimes
Token ID3360
Feature activation-2.46e-02

Pos logit contributions
aw+0.010
Run+0.010
 trigger+0.010
plets+0.010
ted+0.010
Neg logit contributions
 look-0.008
 Jane-0.007
 clean-0.007
 tidy-0.006
days-0.006
Token they
Token ID484
Feature activation+2.28e-02

Pos logit contributions
 look+0.044
 Jane+0.040
 clean+0.039
 tidy+0.039
days+0.038
Neg logit contributions
aw-0.046
Run-0.045
 trigger-0.043
plets-0.043
ted-0.043
Token had
Token ID550
Feature activation-1.67e-03

Pos logit contributions
aw+0.044
ted+0.041
Run+0.040
plets+0.039
rots+0.039
Neg logit contributions
 Jane-0.033
 look-0.032
 Sus-0.031
days-0.031
 tidy-0.029
Token meat
Token ID6174
Feature activation+1.68e-01

Pos logit contributions
 look+0.002
days+0.002
 Jane+0.002
 tidy+0.002
 clean+0.002
Neg logit contributions
aw-0.004
Run-0.003
 sudden-0.003
 trigger-0.003
ted-0.003
Tokenballs
Token ID21591
Feature activation+7.45e-02

Pos logit contributions
aw+0.360
Run+0.356
 trigger+0.346
plets+0.344
ted+0.341
Neg logit contributions
 look-0.254
 Jane-0.225
 clean-0.224
 tidy-0.221
days-0.212
Token,
Token ID11
Feature activation+3.21e-02

Pos logit contributions
aw+0.163
Run+0.159
 trigger+0.156
plets+0.155
ted+0.154
Neg logit contributions
 look-0.108
 Jane-0.096
 clean-0.095
 tidy-0.094
days-0.091
Token sometimes
Token ID3360
Feature activation-1.39e-02

Pos logit contributions
aw+0.067
Run+0.067
plets+0.065
 trigger+0.065
ted+0.064
Neg logit contributions
 look-0.052
 clean-0.045
 Jane-0.045
 tidy-0.044
days-0.041
Token they
Token ID484
Feature activation+3.03e-02

Pos logit contributions
 look+0.024
 Jane+0.022
 clean+0.022
 tidy+0.021
days+0.021
Neg logit contributions
aw-0.027
Run-0.026
 trigger-0.025
plets-0.025
ted-0.025
Token had
Token ID550
Feature activation+3.59e-03

Pos logit contributions
aw+0.061
ted+0.058
Run+0.057
 trigger+0.056
plets+0.056
Neg logit contributions
 Jane-0.042
 look-0.041
 Sus-0.040
days-0.039
 tidy-0.037
Token,
Token ID11
Feature activation-1.09e-02

Pos logit contributions
aw+0.008
Run+0.008
ted+0.008
 trigger+0.008
plets+0.008
Neg logit contributions
 look-0.004
 tidy-0.004
 clean-0.004
 Jane-0.004
days-0.003
Token "
Token ID366
Feature activation+5.37e-02

Pos logit contributions
 look+0.018
 tidy+0.017
 Jane+0.017
 clean+0.017
days+0.017
Neg logit contributions
Run-0.021
aw-0.021
ted-0.020
plets-0.020
 trigger-0.020
TokenBe
Token ID3856
Feature activation+4.25e-02

Pos logit contributions
 trigger+0.095
rified+0.089
ted+0.088
plets+0.088
ked+0.086
Neg logit contributions
 look-0.094
 clean-0.086
 tidy-0.086
 Jane-0.082
my-0.080
Tokenef
Token ID891
Feature activation+5.62e-02

Pos logit contributions
aw+0.076
Run+0.071
ted+0.066
 trigger+0.065
 sudden+0.065
Neg logit contributions
 look-0.081
 Jane-0.075
days-0.073
 Sus-0.071
 clean-0.070
Token is
Token ID318
Feature activation-8.53e-03

Pos logit contributions
aw+0.106
rots+0.099
ted+0.098
 trigger+0.098
Run+0.098
Neg logit contributions
 look-0.096
 tidy-0.093
 Jane-0.091
 clean-0.091
 Sus-0.086
Token meat
Token ID6174
Feature activation+1.68e-01

Pos logit contributions
 look+0.014
days+0.012
 Jane+0.012
 tidy+0.012
 clean+0.012
Neg logit contributions
aw-0.017
Run-0.016
 trigger-0.016
plets-0.016
ted-0.016
Token that
Token ID326
Feature activation-5.28e-03

Pos logit contributions
aw+0.408
Run+0.400
 trigger+0.393
plets+0.390
ted+0.388
Neg logit contributions
 look-0.204
 Jane-0.178
 clean-0.174
 tidy-0.172
days-0.165
Token comes
Token ID2058
Feature activation-1.24e-02

Pos logit contributions
 look+0.010
 Jane+0.009
 clean+0.009
 tidy+0.009
days+0.008
Neg logit contributions
aw-0.010
Run-0.009
 trigger-0.009
plets-0.009
ted-0.009
Token from
Token ID422
Feature activation-4.51e-03

Pos logit contributions
 look+0.020
 Jane+0.018
 clean+0.018
 tidy+0.017
days+0.017
Neg logit contributions
aw-0.025
Run-0.025
 trigger-0.024
plets-0.024
ted-0.024
Token cows
Token ID22575
Feature activation+7.11e-02

Pos logit contributions
 look+0.006
 Jane+0.005
days+0.005
 tidy+0.005
 clean+0.005
Neg logit contributions
aw-0.010
Run-0.010
 trigger-0.010
ted-0.010
plets-0.010
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
aw+0.164
ted+0.153
Run+0.152
 trigger+0.151
plets+0.150
Neg logit contributions
 look-0.089
 Jane-0.086
 tidy-0.080
 Sus-0.078
days-0.078
Token Mom
Token ID11254
Feature activation+4.50e-03

Pos logit contributions
aw+0.043
Run+0.043
plets+0.042
 trigger+0.042
rots+0.041
Neg logit contributions
 look-0.021
 Jane-0.018
 clean-0.017
 tidy-0.017
days-0.015
Token says
Token ID1139
Feature activation+2.36e-02

Pos logit contributions
aw+0.007
ted+0.006
Run+0.006
rots+0.006
plets+0.005
Neg logit contributions
 look-0.009
days-0.009
 tidy-0.009
 Jane-0.009
 clean-0.009
Token they
Token ID484
Feature activation+2.81e-02

Pos logit contributions
aw+0.055
Run+0.054
ted+0.052
 trigger+0.052
plets+0.052
Neg logit contributions
 look-0.030
 clean-0.027
 tidy-0.027
 Jane-0.026
days-0.026
Token are
Token ID389
Feature activation-1.51e-02

Pos logit contributions
aw+0.050
Run+0.049
 trigger+0.049
ted+0.047
rots+0.046
Neg logit contributions
 Jane-0.046
days-0.043
 Sus-0.041
 tidy-0.040
 clean-0.039
Token av
Token ID1196
Feature activation-4.42e-03

Pos logit contributions
days+0.024
 look+0.021
|+0.020
 Sus+0.019
 Jane+0.019
Neg logit contributions
aw-0.028
Run-0.026
rots-0.026
 hunters-0.025
 sudden-0.025
Tokenoc
Token ID420
Feature activation+1.68e-01

Pos logit contributions
 look+0.012
 clean+0.011
 tidy+0.011
 Jane+0.011
days+0.011
Neg logit contributions
Run-0.004
aw-0.004
 trigger-0.004
plets-0.004
ted-0.004
Tokenados
Token ID22484
Feature activation+4.09e-02

Pos logit contributions
Run+0.195
aw+0.186
 sudden+0.183
plets+0.180
 trigger+0.177
Neg logit contributions
 look-0.409
 Jane-0.392
 clean-0.378
 tidy-0.374
days-0.363
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
aw+0.085
rots+0.080
ted+0.079
plets+0.075
Run+0.075
Neg logit contributions
 look-0.055
 Jane-0.052
 tidy-0.051
 Sus-0.050
days-0.050
Token Sara
Token ID24799
Feature activation-2.23e-02

Pos logit contributions
aw+0.041
 trigger+0.040
plets+0.040
Run+0.040
rots+0.040
Neg logit contributions
 look-0.016
 Jane-0.014
 clean-0.013
 tidy-0.012
 Lucy-0.011
Token wants
Token ID3382
Feature activation+1.89e-02

Pos logit contributions
 look+0.042
 Jane+0.039
days+0.038
 clean+0.038
 tidy+0.038
Neg logit contributions
aw-0.039
Run-0.038
ted-0.036
 trigger-0.036
plets-0.035
Token to
Token ID284
Feature activation-6.25e-03

Pos logit contributions
aw+0.039
ted+0.037
Run+0.036
 sudden+0.035
 trigger+0.035
Neg logit contributions
days-0.029
 look-0.027
 tidy-0.026
 clean-0.025
 Jane-0.025
Token and
Token ID290
Feature activation+6.67e-03

Pos logit contributions
 look+0.081
 tidy+0.077
days+0.077
 Jane+0.076
 clean+0.074
Neg logit contributions
ted-0.090
aw-0.090
Run-0.084
 sudden-0.081
plets-0.080
Token forth
Token ID6071
Feature activation+4.97e-02

Pos logit contributions
aw+0.014
Run+0.013
ted+0.013
 trigger+0.013
plets+0.013
Neg logit contributions
 look-0.011
 Jane-0.009
days-0.009
 clean-0.009
 tidy-0.009
Token,
Token ID11
Feature activation-8.30e-03

Pos logit contributions
aw+0.098
ted+0.096
plets+0.095
Run+0.095
 trigger+0.094
Neg logit contributions
 look-0.075
 Jane-0.071
days-0.070
 tidy-0.067
 clean-0.066
Token helping
Token ID5742
Feature activation-3.60e-02

Pos logit contributions
 look+0.012
days+0.011
 Jane+0.011
 clean+0.010
 tidy+0.010
Neg logit contributions
aw-0.018
Run-0.018
ted-0.017
plets-0.017
 trigger-0.017
Token the
Token ID262
Feature activation-4.04e-02

Pos logit contributions
 look+0.046
days+0.040
 Jane+0.040
 clean+0.039
 tidy+0.039
Neg logit contributions
aw-0.085
Run-0.082
 trigger-0.081
ted-0.080
plets-0.079
Token p
Token ID279
Feature activation-2.07e-01

Pos logit contributions
 look+0.070
 Jane+0.067
days+0.065
 clean+0.060
 tidy+0.060
Neg logit contributions
 sudden-0.074
aw-0.073
Run-0.072
 trigger-0.071
ted-0.066
Tokenanda
Token ID5282
Feature activation-1.05e-01

Pos logit contributions
 look+0.464
 Jane+0.433
 clean+0.422
 tidy+0.421
days+0.414
Neg logit contributions
aw-0.319
plets-0.274
rots-0.267
ted-0.257
 sudden-0.253
Token to
Token ID284
Feature activation-3.54e-02

Pos logit contributions
 Jane+0.150
 look+0.145
 Sus+0.139
days+0.138
 tidy+0.135
Neg logit contributions
aw-0.214
ted-0.200
plets-0.198
 trigger-0.195
Run-0.195
Token complete
Token ID1844
Feature activation-2.11e-02

Pos logit contributions
days+0.051
bie+0.043
 Jane+0.042
 Sus+0.042
any+0.041
Neg logit contributions
 trigger-0.063
Run-0.062
aw-0.060
 arrows-0.058
 sudden-0.058
Token the
Token ID262
Feature activation-1.26e-02

Pos logit contributions
days+0.034
 look+0.031
my+0.027
 tidy+0.027
|+0.027
Neg logit contributions
aw-0.039
 sudden-0.037
 arrows-0.035
Run-0.035
izing-0.035
Token climb
Token ID12080
Feature activation-1.07e-02

Pos logit contributions
 look+0.018
 Jane+0.018
days+0.017
 Sus+0.016
 clean+0.015
Neg logit contributions
aw-0.026
Run-0.026
 sudden-0.025
 trigger-0.025
plets-0.024
Token very
Token ID845
Feature activation-2.28e-02

Pos logit contributions
aw+0.002
 sudden+0.002
king+0.002
Run+0.002
ked+0.002
Neg logit contributions
days-0.002
 look-0.002
 Jane-0.001
 Sus-0.001
 cabinet-0.001
Token happy
Token ID3772
Feature activation+8.69e-03

Pos logit contributions
 look+0.028
 Jane+0.026
days+0.025
 Sus+0.023
 Lucy+0.020
Neg logit contributions
Run-0.052
 sudden-0.052
aw-0.052
 trigger-0.049
plets-0.049
Token and
Token ID290
Feature activation-7.05e-03

Pos logit contributions
aw+0.018
Run+0.017
ted+0.017
plets+0.017
 trigger+0.017
Neg logit contributions
 look-0.014
 Jane-0.012
 tidy-0.012
 clean-0.012
days-0.012
Token thanked
Token ID26280
Feature activation-5.58e-02

Pos logit contributions
 look+0.011
 Jane+0.009
 clean+0.009
 tidy+0.009
days+0.009
Neg logit contributions
aw-0.015
Run-0.015
 trigger-0.015
plets-0.014
ted-0.014
Token the
Token ID262
Feature activation-3.30e-02

Pos logit contributions
 look+0.073
days+0.070
 tidy+0.068
 clean+0.066
 Jane+0.062
Neg logit contributions
aw-0.126
Run-0.124
ted-0.123
 trigger-0.118
rots-0.115
Token p
Token ID279
Feature activation-2.05e-01

Pos logit contributions
 look+0.059
 Jane+0.059
days+0.056
 tidy+0.054
 clean+0.054
Neg logit contributions
aw-0.057
 sudden-0.056
Run-0.056
 trigger-0.055
ted-0.051
Tokenanda
Token ID5282
Feature activation-9.81e-02

Pos logit contributions
 look+0.401
 Jane+0.367
 clean+0.361
 tidy+0.359
days+0.351
Neg logit contributions
aw-0.366
plets-0.332
Run-0.329
rots-0.328
ted-0.326
Token.
Token ID13
Feature activation-1.24e-02

Pos logit contributions
 look+0.165
 Jane+0.160
 tidy+0.156
days+0.156
 Sus+0.152
Neg logit contributions
aw-0.175
ted-0.165
rots-0.162
 trigger-0.159
rified-0.154
Token The
Token ID383
Feature activation-1.19e-02

Pos logit contributions
 look+0.023
days+0.021
 tidy+0.021
 clean+0.021
 Jane+0.020
Neg logit contributions
aw-0.023
Run-0.022
ted-0.022
 trigger-0.021
plets-0.021
Token p
Token ID279
Feature activation-1.87e-01

Pos logit contributions
 look+0.020
 Jane+0.019
days+0.018
 clean+0.017
 tidy+0.017
Neg logit contributions
Run-0.023
aw-0.023
 sudden-0.023
 trigger-0.022
ted-0.021
Tokenanda
Token ID5282
Feature activation-7.75e-02

Pos logit contributions
 look+0.342
 Jane+0.310
 clean+0.307
 tidy+0.305
days+0.296
Neg logit contributions
aw-0.348
Run-0.331
plets-0.322
ted-0.321
 trigger-0.321
Token together
Token ID1978
Feature activation-4.16e-02

Pos logit contributions
 look+0.069
 Jane+0.062
 clean+0.062
 tidy+0.061
days+0.058
Neg logit contributions
aw-0.086
Run-0.084
 trigger-0.083
plets-0.082
ted-0.081
Token every
Token ID790
Feature activation+9.22e-03

Pos logit contributions
 look+0.059
 Jane+0.053
 clean+0.052
 tidy+0.051
days+0.050
Neg logit contributions
aw-0.092
Run-0.089
ted-0.087
 trigger-0.087
plets-0.086
Token day
Token ID1110
Feature activation+3.39e-02

Pos logit contributions
 sudden+0.021
Run+0.021
aw+0.019
ted+0.019
 trigger+0.019
Neg logit contributions
 look-0.012
 Jane-0.010
 Sus-0.010
 tidy-0.010
 clean-0.010
Token and
Token ID290
Feature activation-1.42e-02

Pos logit contributions
aw+0.091
Run+0.091
 trigger+0.089
plets+0.088
ted+0.087
Neg logit contributions
 look-0.032
 Jane-0.027
 clean-0.027
 tidy-0.026
days-0.024
Token the
Token ID262
Feature activation-3.68e-02

Pos logit contributions
 look+0.017
days+0.016
 tidy+0.016
 clean+0.015
 Jane+0.015
Neg logit contributions
aw-0.032
ted-0.031
Run-0.030
 sudden-0.030
plets-0.029
Token p
Token ID279
Feature activation-2.04e-01

Pos logit contributions
 look+0.057
 Jane+0.054
days+0.053
 clean+0.049
 tidy+0.049
Neg logit contributions
 sudden-0.074
Run-0.072
 trigger-0.071
aw-0.071
ted-0.066
Tokenanda
Token ID5282
Feature activation-1.06e-01

Pos logit contributions
 look+0.461
 Jane+0.415
 tidy+0.414
 clean+0.413
days+0.387
Neg logit contributions
aw-0.320
rots-0.265
ted-0.258
plets-0.257
 sudden-0.250
Token was
Token ID373
Feature activation+3.62e-03

Pos logit contributions
 look+0.157
 Jane+0.151
 tidy+0.139
 Sus+0.138
 clean+0.137
Neg logit contributions
aw-0.227
ted-0.210
rots-0.200
 sudden-0.196
Run-0.196
Token no
Token ID645
Feature activation+6.52e-02

Pos logit contributions
aw+0.008
 sudden+0.007
king+0.006
Leave+0.006
med+0.006
Neg logit contributions
 look-0.005
days-0.005
 Jane-0.004
 cabinet-0.004
 lie-0.004
Token longer
Token ID2392
Feature activation+2.96e-02

Pos logit contributions
aw+0.084
 sudden+0.079
Run+0.079
 trigger+0.075
ted+0.073
Neg logit contributions
 look-0.151
 Jane-0.142
days-0.141
 clean-0.136
 tidy-0.136
Token gloomy
Token ID46400
Feature activation+1.64e-02

Pos logit contributions
aw+0.058
 sudden+0.054
Run+0.053
plets+0.052
 trigger+0.050
Neg logit contributions
 look-0.050
days-0.045
 Jane-0.044
 tidy-0.042
 clean-0.042
Token that
Token ID326
Feature activation-6.15e-02

Pos logit contributions
 look+0.045
 clean+0.042
 tidy+0.040
 Jane+0.040
 Sus+0.036
Neg logit contributions
plets-0.055
 trigger-0.054
aw-0.054
Run-0.053
rots-0.052
Token day
Token ID1110
Feature activation+4.02e-02

Pos logit contributions
 look+0.068
 Jane+0.060
 tidy+0.056
 clean+0.056
days+0.054
Neg logit contributions
Run-0.150
 sudden-0.144
aw-0.144
ted-0.143
 trigger-0.143
Token on
Token ID319
Feature activation-6.04e-02

Pos logit contributions
aw+0.086
 trigger+0.084
Run+0.083
plets+0.082
 sudden+0.080
Neg logit contributions
 look-0.062
 clean-0.055
 tidy-0.053
 Jane-0.053
 have-0.051
Token,
Token ID11
Feature activation-2.83e-02

Pos logit contributions
 look+0.074
 Jane+0.064
 clean+0.063
 tidy+0.063
days+0.061
Neg logit contributions
aw-0.146
Run-0.144
 trigger-0.140
ted-0.140
plets-0.139
Token the
Token ID262
Feature activation-3.22e-02

Pos logit contributions
 look+0.035
 tidy+0.030
 clean+0.030
 Jane+0.029
days+0.029
Neg logit contributions
aw-0.067
Run-0.066
ted-0.065
plets-0.064
 trigger-0.064
Token p
Token ID279
Feature activation-2.04e-01

Pos logit contributions
 look+0.041
 Jane+0.040
days+0.037
 tidy+0.035
 clean+0.034
Neg logit contributions
 sudden-0.074
Run-0.071
 trigger-0.069
aw-0.068
ted-0.066
Tokenanda
Token ID5282
Feature activation-9.36e-02

Pos logit contributions
 look+0.405
 Jane+0.367
 clean+0.363
 tidy+0.362
days+0.347
Neg logit contributions
aw-0.359
rots-0.323
plets-0.322
ted-0.320
Run-0.318
Token and
Token ID290
Feature activation-5.42e-03

Pos logit contributions
 look+0.130
 Jane+0.118
 clean+0.112
 tidy+0.112
days+0.109
Neg logit contributions
aw-0.212
Run-0.202
ted-0.200
plets-0.198
 trigger-0.196
Token the
Token ID262
Feature activation-3.22e-02

Pos logit contributions
 look+0.008
days+0.008
 tidy+0.007
 Jane+0.007
 clean+0.007
Neg logit contributions
Run-0.011
ted-0.011
aw-0.010
 trigger-0.010
rified-0.010
Token bird
Token ID6512
Feature activation-1.58e-02

Pos logit contributions
 Jane+0.055
 look+0.055
days+0.053
 tidy+0.051
 clean+0.050
Neg logit contributions
 sudden-0.058
Run-0.057
aw-0.056
 trigger-0.056
ted-0.052
Token became
Token ID2627
Feature activation-5.44e-03

Pos logit contributions
 look+0.024
 Jane+0.023
 tidy+0.022
 clean+0.022
 Sus+0.021
Neg logit contributions
aw-0.033
ted-0.031
Run-0.031
plets-0.030
rots-0.030
Token that
Token ID326
Feature activation-6.20e-02

Pos logit contributions
 look+0.080
 Jane+0.073
 clean+0.072
 tidy+0.072
days+0.071
Neg logit contributions
aw-0.081
Run-0.079
 trigger-0.076
ted-0.076
plets-0.075
Token day
Token ID1110
Feature activation+3.83e-02

Pos logit contributions
 look+0.071
 Jane+0.063
 tidy+0.059
 clean+0.058
days+0.056
Neg logit contributions
Run-0.146
aw-0.142
 sudden-0.140
ted-0.140
 trigger-0.138
Token on
Token ID319
Feature activation-6.09e-02

Pos logit contributions
aw+0.082
 trigger+0.082
Run+0.081
plets+0.080
 sudden+0.078
Neg logit contributions
 look-0.056
 clean-0.051
 tidy-0.049
 Jane-0.048
 have-0.047
Token,
Token ID11
Feature activation-2.85e-02

Pos logit contributions
 look+0.079
 clean+0.068
 tidy+0.068
 Jane+0.067
days+0.066
Neg logit contributions
aw-0.140
Run-0.138
ted-0.134
 trigger-0.133
plets-0.133
Token the
Token ID262
Feature activation-2.94e-02

Pos logit contributions
 look+0.040
 tidy+0.034
 clean+0.034
days+0.033
 Jane+0.033
Neg logit contributions
aw-0.063
Run-0.062
ted-0.060
plets-0.059
 sudden-0.059
Token p
Token ID279
Feature activation-2.01e-01

Pos logit contributions
 look+0.041
 Jane+0.038
days+0.036
 tidy+0.033
 clean+0.033
Neg logit contributions
 sudden-0.063
Run-0.061
aw-0.061
 trigger-0.058
ted-0.056
Tokenanda
Token ID5282
Feature activation-9.94e-02

Pos logit contributions
 look+0.441
 clean+0.391
 tidy+0.390
 Jane+0.390
days+0.368
Neg logit contributions
aw-0.330
rots-0.286
ted-0.272
plets-0.271
 sudden-0.264
Token and
Token ID290
Feature activation-8.00e-03

Pos logit contributions
 look+0.151
 Jane+0.138
 tidy+0.128
 clean+0.127
days+0.126
Neg logit contributions
aw-0.215
ted-0.200
rots-0.196
Run-0.194
plets-0.192
Token the
Token ID262
Feature activation-3.04e-02

Pos logit contributions
 look+0.012
 tidy+0.011
 clean+0.010
days+0.010
 Jane+0.010
Neg logit contributions
Run-0.016
aw-0.016
ted-0.016
 trigger-0.015
 sudden-0.015
Token bird
Token ID6512
Feature activation-1.39e-02

Pos logit contributions
 look+0.046
 Jane+0.045
days+0.044
 tidy+0.041
 clean+0.040
Neg logit contributions
 sudden-0.060
aw-0.059
Run-0.059
 trigger-0.057
ted-0.054
Token played
Token ID2826
Feature activation-4.03e-02

Pos logit contributions
 look+0.023
 Jane+0.022
 tidy+0.021
 Sus+0.020
 clean+0.020
Neg logit contributions
aw-0.027
ted-0.026
rots-0.025
Run-0.024
plets-0.024
Token day
Token ID1110
Feature activation+3.26e-02

Pos logit contributions
 look+0.077
 Jane+0.068
 clean+0.066
 tidy+0.066
days+0.061
Neg logit contributions
Run-0.159
 sudden-0.151
ted-0.151
 trigger-0.150
aw-0.148
Token onward
Token ID38232
Feature activation-2.85e-02

Pos logit contributions
aw+0.068
 trigger+0.065
Run+0.065
plets+0.064
rots+0.063
Neg logit contributions
 look-0.053
 clean-0.047
 tidy-0.046
 Jane-0.045
 have-0.044
Token,
Token ID11
Feature activation-2.83e-02

Pos logit contributions
 look+0.031
 Jane+0.027
 clean+0.026
 tidy+0.026
days+0.023
Neg logit contributions
aw-0.074
Run-0.072
 trigger-0.071
plets-0.071
ted-0.069
Token the
Token ID262
Feature activation-3.35e-02

Pos logit contributions
 look+0.036
 clean+0.031
 tidy+0.030
 Jane+0.030
days+0.030
Neg logit contributions
aw-0.066
Run-0.066
ted-0.064
plets-0.064
 trigger-0.063
Token young
Token ID1862
Feature activation-4.61e-02

Pos logit contributions
 look+0.047
days+0.046
 Jane+0.045
 tidy+0.039
 clean+0.038
Neg logit contributions
 sudden-0.069
Run-0.063
 trigger-0.062
aw-0.062
ted-0.060
Token p
Token ID279
Feature activation-2.00e-01

Pos logit contributions
 look+0.083
days+0.079
 tidy+0.077
 clean+0.076
 Jane+0.074
Neg logit contributions
aw-0.077
 sudden-0.075
ted-0.073
Run-0.073
 trigger-0.072
Tokenanda
Token ID5282
Feature activation-9.62e-02

Pos logit contributions
 look+0.439
 clean+0.392
 tidy+0.391
 Jane+0.389
days+0.373
Neg logit contributions
aw-0.326
rots-0.284
ted-0.277
plets-0.273
 sudden-0.267
Token tried
Token ID3088
Feature activation+1.76e-02

Pos logit contributions
 look+0.144
 Jane+0.130
 clean+0.123
 tidy+0.123
days+0.121
Neg logit contributions
aw-0.210
ted-0.198
Run-0.196
rots-0.193
plets-0.191
Token to
Token ID284
Feature activation+3.04e-03

Pos logit contributions
aw+0.038
 sudden+0.036
ted+0.036
Run+0.036
 trigger+0.035
Neg logit contributions
 look-0.023
days-0.023
 Jane-0.022
 tidy-0.021
 clean-0.020
Token behave
Token ID17438
Feature activation-3.45e-02

Pos logit contributions
Run+0.006
 trigger+0.006
aw+0.006
 sudden+0.006
Leave+0.006
Neg logit contributions
 Jane-0.004
days-0.004
 Sus-0.004
 look-0.004
 Lucy-0.003
Token himself
Token ID2241
Feature activation-9.88e-03

Pos logit contributions
 look+0.053
 Jane+0.050
days+0.048
 clean+0.047
 tidy+0.047
Neg logit contributions
aw-0.071
Run-0.068
ted-0.067
 trigger-0.066
plets-0.065
Token or
Token ID393
Feature activation-4.06e-03

Pos logit contributions
aw+0.056
Run+0.054
ted+0.053
 trigger+0.052
plets+0.051
Neg logit contributions
 look-0.031
 Jane-0.028
 tidy-0.027
 clean-0.027
days-0.026
Token the
Token ID262
Feature activation-3.13e-02

Pos logit contributions
 look+0.005
 Jane+0.005
days+0.005
 tidy+0.004
 clean+0.004
Neg logit contributions
aw-0.009
Run-0.009
ted-0.009
 trigger-0.009
 sudden-0.009
Token porch
Token ID33179
Feature activation-1.22e-01

Pos logit contributions
 look+0.054
 Jane+0.049
 clean+0.048
 tidy+0.048
days+0.046
Neg logit contributions
aw-0.060
Run-0.059
 trigger-0.058
plets-0.057
ted-0.057
Token or
Token ID393
Feature activation-7.90e-03

Pos logit contributions
 look+0.162
 Jane+0.150
days+0.145
 Sus+0.144
 tidy+0.140
Neg logit contributions
aw-0.277
ted-0.255
Run-0.250
 trigger-0.249
plets-0.243
Token the
Token ID262
Feature activation-3.61e-02

Pos logit contributions
 look+0.009
days+0.009
 Jane+0.009
 tidy+0.008
 clean+0.008
Neg logit contributions
Run-0.018
aw-0.018
ted-0.018
 trigger-0.017
plets-0.017
Token p
Token ID279
Feature activation-2.00e-01

Pos logit contributions
 look+0.062
 Jane+0.056
 clean+0.056
 tidy+0.055
days+0.053
Neg logit contributions
aw-0.070
Run-0.068
 trigger-0.066
plets-0.066
ted-0.065
Tokenuddle
Token ID24500
Feature activation-1.16e-01

Pos logit contributions
 look+0.449
 Jane+0.415
 tidy+0.410
 clean+0.410
days+0.400
Neg logit contributions
aw-0.301
Run-0.266
plets-0.265
ted-0.262
 trigger-0.258
Token.
Token ID13
Feature activation+1.60e-03

Pos logit contributions
 look+0.144
 Jane+0.125
 tidy+0.121
days+0.120
 clean+0.118
Neg logit contributions
aw-0.282
ted-0.266
Run-0.265
 trigger-0.262
rified-0.254
Token She
Token ID1375
Feature activation-3.83e-02

Pos logit contributions
aw+0.004
Run+0.004
 trigger+0.004
plets+0.004
ted+0.004
Neg logit contributions
 look-0.002
 Jane-0.002
 clean-0.002
 tidy-0.002
days-0.002
Token did
Token ID750
Feature activation+4.99e-04

Pos logit contributions
 look+0.049
 clean+0.043
 Jane+0.042
 tidy+0.042
days+0.039
Neg logit contributions
aw-0.090
Run-0.089
 trigger-0.088
plets-0.088
ted-0.086
Token not
Token ID407
Feature activation-5.52e-02

Pos logit contributions
aw+0.001
Run+0.001
 trigger+0.001
 sudden+0.001
ted+0.001
Neg logit contributions
days-0.001
 Jane-0.001
 look-0.001
 Sus-0.001
 tidy-0.001
Token do
Token ID466
Feature activation-2.70e-02

Pos logit contributions
aw+0.014
Run+0.013
rots+0.013
ted+0.013
rified+0.012
Neg logit contributions
 Jane-0.009
 look-0.009
 tidy-0.009
 clean-0.009
days-0.009
Token you
Token ID345
Feature activation-4.03e-03

Pos logit contributions
 look+0.057
 Jane+0.052
 clean+0.052
 tidy+0.051
days+0.050
Neg logit contributions
aw-0.042
Run-0.041
 trigger-0.040
plets-0.040
ted-0.039
Token go
Token ID467
Feature activation-6.65e-03

Pos logit contributions
 Jane+0.005
 Sus+0.005
 Lucy+0.005
bie+0.004
 Ell+0.004
Neg logit contributions
Run-0.008
aw-0.008
 trigger-0.008
ted-0.007
Leave-0.007
Token?"
Token ID1701
Feature activation-4.10e-02

Pos logit contributions
 Jane+0.008
 look+0.007
days+0.007
 tidy+0.007
 Sus+0.007
Neg logit contributions
aw-0.015
ted-0.015
Run-0.015
plets-0.014
 trigger-0.014
Token The
Token ID383
Feature activation-3.60e-02

Pos logit contributions
 look+0.063
 tidy+0.057
days+0.056
 clean+0.056
 Jane+0.055
Neg logit contributions
aw-0.085
Run-0.083
ted-0.082
 sudden-0.079
 trigger-0.077
Token p
Token ID279
Feature activation-2.00e-01

Pos logit contributions
 Jane+0.057
 look+0.057
days+0.056
 Sus+0.050
 tidy+0.050
Neg logit contributions
 sudden-0.068
 trigger-0.067
aw-0.067
Run-0.066
ted-0.063
Tokenanda
Token ID5282
Feature activation-9.43e-02

Pos logit contributions
 look+0.438
 clean+0.404
 tidy+0.402
 Jane+0.399
days+0.382
Neg logit contributions
aw-0.325
rots-0.276
ted-0.268
plets-0.262
Run-0.259
Token replied
Token ID8712
Feature activation+2.42e-03

Pos logit contributions
days+0.173
 look+0.168
 tidy+0.161
 clean+0.158
 Jane+0.156
Neg logit contributions
aw-0.165
ted-0.152
Run-0.151
rots-0.148
 sudden-0.138
Token,
Token ID11
Feature activation-1.86e-02

Pos logit contributions
aw+0.006
Run+0.005
ted+0.005
 trigger+0.005
plets+0.005
Neg logit contributions
 look-0.003
days-0.003
 tidy-0.003
 Jane-0.003
 clean-0.003
Token "
Token ID366
Feature activation+5.13e-02

Pos logit contributions
 look+0.033
days+0.032
 tidy+0.031
 clean+0.031
 Jane+0.030
Neg logit contributions
Run-0.033
aw-0.032
ted-0.032
plets-0.030
 trigger-0.030
TokenI
Token ID40
Feature activation-3.05e-02

Pos logit contributions
 trigger+0.111
aw+0.110
Run+0.106
rots+0.105
plets+0.105
Neg logit contributions
 look-0.076
 tidy-0.066
days-0.066
 Jane-0.066
 clean-0.066
Token and
Token ID290
Feature activation-2.06e-02

Pos logit contributions
aw+0.031
Run+0.029
ted+0.029
 trigger+0.028
rified+0.028
Neg logit contributions
 look-0.013
 tidy-0.012
 Jane-0.012
 clean-0.011
days-0.011
Token very
Token ID845
Feature activation-6.16e-02

Pos logit contributions
days+0.028
 look+0.027
 Jane+0.027
 Sus+0.024
ill+0.023
Neg logit contributions
 sudden-0.040
ted-0.039
Run-0.039
aw-0.039
plets-0.037
Token soft
Token ID2705
Feature activation-4.02e-02

Pos logit contributions
days+0.096
 look+0.089
 Jane+0.088
 Sus+0.082
ill+0.079
Neg logit contributions
aw-0.116
 sudden-0.108
Run-0.103
ted-0.102
ls-0.101
Token.
Token ID13
Feature activation-2.74e-02

Pos logit contributions
 look+0.048
 Jane+0.045
days+0.044
 tidy+0.044
 clean+0.042
Neg logit contributions
aw-0.099
ted-0.094
rified-0.089
Run-0.089
 trigger-0.087
Token The
Token ID383
Feature activation-2.61e-02

Pos logit contributions
 look+0.049
 tidy+0.046
days+0.045
 clean+0.045
 smile+0.042
Neg logit contributions
aw-0.048
Run-0.046
ted-0.045
 trigger-0.042
rified-0.042
Token p
Token ID279
Feature activation-1.99e-01

Pos logit contributions
 Jane+0.033
days+0.032
 look+0.031
 Sus+0.028
 tidy+0.027
Neg logit contributions
 sudden-0.059
aw-0.056
 trigger-0.055
Run-0.055
ted-0.054
Tokenanda
Token ID5282
Feature activation-9.36e-02

Pos logit contributions
 look+0.443
 clean+0.404
 tidy+0.402
 Jane+0.396
days+0.383
Neg logit contributions
aw-0.322
rots-0.266
ted-0.265
plets-0.260
 sudden-0.253
Token loved
Token ID6151
Feature activation-2.61e-02

Pos logit contributions
 look+0.136
 Jane+0.123
days+0.118
 tidy+0.114
 clean+0.113
Neg logit contributions
aw-0.213
ted-0.193
Run-0.188
rots-0.186
 sudden-0.184
Token to
Token ID284
Feature activation-3.27e-02

Pos logit contributions
days+0.040
 look+0.039
 Jane+0.036
 tidy+0.036
 clean+0.034
Neg logit contributions
aw-0.051
ted-0.050
 sudden-0.048
Run-0.047
ting-0.043
Token fold
Token ID5591
Feature activation+1.26e-02

Pos logit contributions
 Jane+0.045
 Sus+0.041
days+0.041
bi+0.041
abby+0.040
Neg logit contributions
Run-0.057
aw-0.055
ted-0.055
Leave-0.054
 arrows-0.054
Token things
Token ID1243
Feature activation+2.69e-02

Pos logit contributions
aw+0.028
Run+0.028
 trigger+0.027
ted+0.027
plets+0.027
Neg logit contributions
 look-0.017
 Jane-0.015
 clean-0.015
 tidy-0.015
days-0.015
Token polite
Token ID23507
Feature activation+1.39e-02

Pos logit contributions
days+0.090
 Jane+0.081
 Sus+0.077
 look+0.074
any+0.073
Neg logit contributions
Run-0.104
aw-0.104
 trigger-0.103
 unnecessary-0.099
 arrows-0.098
Token."
Token ID526
Feature activation-4.14e-02

Pos logit contributions
aw+0.034
Run+0.032
ted+0.032
 trigger+0.032
plets+0.031
Neg logit contributions
 look-0.017
 tidy-0.015
 clean-0.015
 Jane-0.015
days-0.014
Token
Token ID198
Feature activation-1.64e-02

Pos logit contributions
 look+0.068
 Jane+0.062
 clean+0.061
 tidy+0.060
days+0.057
Neg logit contributions
aw-0.084
Run-0.082
 trigger-0.081
plets-0.080
ted-0.079
Token
Token ID198
Feature activation-1.94e-02

Pos logit contributions
 look+0.018
 Jane+0.016
 clean+0.015
 tidy+0.015
days+0.013
Neg logit contributions
aw-0.042
 trigger-0.042
Run-0.041
plets-0.041
ted-0.040
TokenThe
Token ID464
Feature activation-1.24e-02

Pos logit contributions
 look+0.035
 tidy+0.035
 clean+0.035
 Jane+0.032
days+0.030
Neg logit contributions
 trigger-0.034
plets-0.034
ted-0.034
aw-0.033
Run-0.032
Token p
Token ID279
Feature activation-1.99e-01

Pos logit contributions
 look+0.022
 Jane+0.021
 clean+0.020
 tidy+0.020
days+0.020
Neg logit contributions
aw-0.023
Run-0.023
 trigger-0.022
ted-0.021
plets-0.021
Tokenanda
Token ID5282
Feature activation-8.68e-02

Pos logit contributions
 look+0.363
 Jane+0.330
 clean+0.328
 tidy+0.325
days+0.315
Neg logit contributions
aw-0.365
Run-0.355
 trigger-0.344
plets-0.342
ted-0.340
Token felt
Token ID2936
Feature activation+1.31e-03

Pos logit contributions
 look+0.123
 Jane+0.110
 clean+0.107
days+0.107
 tidy+0.106
Neg logit contributions
aw-0.195
Run-0.188
ted-0.183
 trigger-0.180
 sudden-0.179
Token very
Token ID845
Feature activation-3.72e-03

Pos logit contributions
aw+0.003
Run+0.003
plets+0.003
 trigger+0.003
ted+0.003
Neg logit contributions
 look-0.002
 clean-0.001
 tidy-0.001
 Jane-0.001
days-0.001
Token sad
Token ID6507
Feature activation+2.15e-02

Pos logit contributions
 look+0.005
 Jane+0.005
 clean+0.005
days+0.005
 tidy+0.005
Neg logit contributions
aw-0.008
Run-0.008
 trigger-0.008
plets-0.008
ted-0.008
Token,
Token ID11
Feature activation-2.60e-03

Pos logit contributions
aw+0.047
Run+0.046
 trigger+0.045
ted+0.044
plets+0.044
Neg logit contributions
 look-0.031
 Jane-0.028
 clean-0.028
 tidy-0.027
days-0.027
Token than
Token ID621
Feature activation+1.68e-02

Pos logit contributions
aw+0.090
Run+0.086
plets+0.084
ted+0.083
 sudden+0.083
Neg logit contributions
 look-0.072
 Jane-0.066
 clean-0.065
 tidy-0.065
days-0.064
Token ever
Token ID1683
Feature activation-3.58e-02

Pos logit contributions
aw+0.032
ted+0.032
Run+0.030
 trigger+0.030
rified+0.029
Neg logit contributions
 look-0.025
days-0.025
 tidy-0.024
 clean-0.023
 Jane-0.022
Token,
Token ID11
Feature activation-5.32e-04

Pos logit contributions
 tidy+0.049
days+0.049
 look+0.048
 clean+0.047
 Jane+0.045
Neg logit contributions
aw-0.073
ted-0.070
ting-0.066
Run-0.066
rified-0.064
Token and
Token ID290
Feature activation-1.87e-03

Pos logit contributions
 look+0.001
 Jane+0.001
 clean+0.001
 tidy+0.001
days+0.001
Neg logit contributions
aw-0.001
Run-0.001
 trigger-0.001
plets-0.001
ted-0.001
Token the
Token ID262
Feature activation-2.53e-02

Pos logit contributions
 look+0.003
 clean+0.002
days+0.002
 tidy+0.002
 Jane+0.002
Neg logit contributions
aw-0.004
Run-0.004
ted-0.004
 trigger-0.004
plets-0.004
Token p
Token ID279
Feature activation-1.99e-01

Pos logit contributions
 look+0.040
days+0.038
 Jane+0.037
 tidy+0.034
 clean+0.033
Neg logit contributions
 sudden-0.049
 trigger-0.049
Run-0.049
aw-0.048
ted-0.045
Tokenuddle
Token ID24500
Feature activation-1.16e-01

Pos logit contributions
 look+0.538
 Jane+0.494
 tidy+0.491
 clean+0.490
days+0.486
Neg logit contributions
aw-0.215
plets-0.179
Run-0.171
 sudden-0.167
ted-0.166
Token was
Token ID373
Feature activation+6.24e-03

Pos logit contributions
 look+0.164
days+0.147
 Jane+0.143
 tidy+0.141
 clean+0.140
Neg logit contributions
aw-0.252
Run-0.242
ted-0.240
 trigger-0.238
plets-0.236
Token gone
Token ID3750
Feature activation-2.19e-02

Pos logit contributions
 sudden+0.012
aw+0.011
Run+0.011
plets+0.011
king+0.011
Neg logit contributions
days-0.009
 look-0.009
 Jane-0.008
|-0.007
 lie-0.007
Token.
Token ID13
Feature activation+1.70e-03

Pos logit contributions
days+0.030
 look+0.029
 Jane+0.029
 tidy+0.026
 clean+0.025
Neg logit contributions
aw-0.045
 trigger-0.043
Run-0.042
rified-0.041
plets-0.041
Token In
Token ID554
Feature activation-3.37e-02

Pos logit contributions
aw+0.004
Run+0.004
plets+0.004
ted+0.004
 trigger+0.004
Neg logit contributions
 look-0.002
 clean-0.002
 tidy-0.002
days-0.002
 Jane-0.002
Token tree
Token ID5509
Feature activation-7.95e-02

Pos logit contributions
 look+0.025
days+0.025
 Jane+0.024
 Sus+0.023
 tidy+0.022
Neg logit contributions
Run-0.024
 sudden-0.023
 trigger-0.022
aw-0.022
plets-0.021
Token does
Token ID857
Feature activation-2.23e-02

Pos logit contributions
 look+0.111
days+0.100
 Jane+0.099
 clean+0.095
 tidy+0.095
Neg logit contributions
aw-0.175
ted-0.166
Run-0.164
rots-0.162
plets-0.158
Token not
Token ID407
Feature activation-6.91e-02

Pos logit contributions
days+0.034
 Jane+0.031
my+0.029
 Sus+0.029
 look+0.028
Neg logit contributions
aw-0.045
 arrows-0.040
 sudden-0.040
rots-0.040
plets-0.040
Token miss
Token ID2051
Feature activation-5.11e-02

Pos logit contributions
days+0.089
 Jane+0.079
bie+0.077
any+0.073
 cabinet+0.072
Neg logit contributions
 trigger-0.141
Run-0.135
Leave-0.129
 sudden-0.126
plets-0.120
Token the
Token ID262
Feature activation-2.60e-02

Pos logit contributions
 look+0.070
days+0.067
 tidy+0.062
 Jane+0.061
 clean+0.061
Neg logit contributions
aw-0.110
ted-0.105
Run-0.104
 trigger-0.104
rots-0.101
Token p
Token ID279
Feature activation-1.98e-01

Pos logit contributions
days+0.042
 look+0.036
any+0.034
 Jane+0.034
|+0.033
Neg logit contributions
 sudden-0.045
Run-0.044
 trigger-0.044
aw-0.043
rified-0.042
Tokenears
Token ID4127
Feature activation-1.19e-01

Pos logit contributions
 look+0.402
days+0.372
 tidy+0.360
 clean+0.359
 Jane+0.344
Neg logit contributions
aw-0.326
plets-0.288
ls-0.285
rots-0.279
ted-0.265
Token anymore
Token ID7471
Feature activation-5.58e-02

Pos logit contributions
days+0.174
 look+0.172
 Jane+0.163
 Sus+0.157
 tidy+0.155
Neg logit contributions
aw-0.230
plets-0.210
ted-0.210
rified-0.203
Run-0.201
Token!
Token ID0
Feature activation+1.81e-02

Pos logit contributions
 look+0.086
 tidy+0.079
 clean+0.078
days+0.074
 Jane+0.072
Neg logit contributions
aw-0.116
Run-0.108
ted-0.107
rots-0.106
plets-0.104
Token<|endoftext|>
Token ID50256
Feature activation+5.59e-02

Pos logit contributions
aw+0.039
Run+0.038
plets+0.037
ted+0.036
 trigger+0.036
Neg logit contributions
 look-0.028
 tidy-0.025
 clean-0.025
days-0.024
 Jane-0.024
TokenJ
Token ID41
Feature activation-4.70e-02

Pos logit contributions
aw+0.091
Run+0.087
ted+0.087
rots+0.081
plets+0.080
Neg logit contributions
 look-0.112
 tidy-0.109
 clean-0.106
days-0.101
 Jane-0.099
Token wrong
Token ID2642
Feature activation+2.07e-02

Pos logit contributions
 look+0.082
 Jane+0.073
 clean+0.072
 tidy+0.071
days+0.067
Neg logit contributions
aw-0.115
Run-0.113
plets-0.111
 trigger-0.111
ted-0.109
Token?".
Token ID43634
Feature activation-1.61e-02

Pos logit contributions
aw+0.052
Run+0.052
plets+0.049
 trigger+0.049
ted+0.049
Neg logit contributions
 look-0.022
 clean-0.019
 Jane-0.018
 tidy-0.018
days-0.018
Token
Token ID198
Feature activation-1.38e-02

Pos logit contributions
 look+0.027
 tidy+0.025
 clean+0.024
days+0.024
 Jane+0.023
Neg logit contributions
aw-0.032
Run-0.031
ted-0.030
plets-0.029
 sudden-0.029
Token
Token ID198
Feature activation-1.72e-02

Pos logit contributions
 look+0.019
 Jane+0.017
 clean+0.016
 tidy+0.016
days+0.016
Neg logit contributions
aw-0.031
Run-0.031
 trigger-0.030
plets-0.030
ted-0.030
TokenThe
Token ID464
Feature activation-1.25e-02

Pos logit contributions
 look+0.036
 Jane+0.034
 clean+0.034
 tidy+0.033
days+0.032
Neg logit contributions
aw-0.026
 trigger-0.026
Run-0.025
ted-0.025
plets-0.025
Token p
Token ID279
Feature activation-1.98e-01

Pos logit contributions
 look+0.018
 Jane+0.017
days+0.017
 clean+0.016
 tidy+0.016
Neg logit contributions
 sudden-0.026
Run-0.026
 trigger-0.025
aw-0.025
ted-0.024
Tokenanda
Token ID5282
Feature activation-9.16e-02

Pos logit contributions
 look+0.413
 clean+0.374
 tidy+0.372
 Jane+0.370
days+0.359
Neg logit contributions
aw-0.341
rots-0.301
plets-0.297
ted-0.295
Run-0.295
Token said
Token ID531
Feature activation-6.73e-03

Pos logit contributions
 look+0.146
days+0.134
 Jane+0.131
 tidy+0.131
 clean+0.131
Neg logit contributions
aw-0.190
Run-0.180
ted-0.175
 sudden-0.171
rots-0.170
Token,
Token ID11
Feature activation-2.49e-02

Pos logit contributions
 look+0.008
 clean+0.007
 Jane+0.007
 tidy+0.007
days+0.007
Neg logit contributions
aw-0.016
Run-0.016
 trigger-0.016
ted-0.016
plets-0.015
Token "
Token ID366
Feature activation+5.05e-02

Pos logit contributions
 look+0.041
days+0.038
 tidy+0.038
 clean+0.037
 Jane+0.037
Neg logit contributions
Run-0.048
aw-0.048
ted-0.046
 trigger-0.046
plets-0.046
TokenI
Token ID40
Feature activation-2.86e-02

Pos logit contributions
aw+0.096
 trigger+0.093
plets+0.089
rots+0.088
ted+0.087
Neg logit contributions
 look-0.085
 Jane-0.079
days-0.077
 clean-0.076
 tidy-0.076
Token sc
Token ID629
Feature activation-7.55e-02

Pos logit contributions
 look+0.017
days+0.015
 Jane+0.015
 tidy+0.014
 clean+0.014
Neg logit contributions
aw-0.021
Run-0.021
ted-0.020
 trigger-0.019
 sudden-0.019
Tokenow
Token ID322
Feature activation-4.11e-02

Pos logit contributions
 look+0.160
 Jane+0.147
 tidy+0.146
 clean+0.145
days+0.144
Neg logit contributions
aw-0.121
Run-0.110
plets-0.109
ted-0.108
 trigger-0.107
Tokenled
Token ID992
Feature activation+3.10e-02

Pos logit contributions
 look+0.099
 Jane+0.093
 clean+0.092
 tidy+0.092
days+0.090
Neg logit contributions
aw-0.050
Run-0.049
 trigger-0.047
plets-0.046
ted-0.045
Token at
Token ID379
Feature activation+3.61e-03

Pos logit contributions
aw+0.075
Run+0.074
 trigger+0.072
ted+0.071
plets+0.071
Neg logit contributions
 look-0.038
 Jane-0.033
 clean-0.032
 tidy-0.032
days-0.031
Token the
Token ID262
Feature activation-3.76e-02

Pos logit contributions
aw+0.007
ted+0.007
 trigger+0.007
 sudden+0.007
Run+0.007
Neg logit contributions
days-0.005
 look-0.005
 tidy-0.005
 Jane-0.005
 clean-0.004
Token p
Token ID279
Feature activation-1.98e-01

Pos logit contributions
days+0.058
 Jane+0.057
 look+0.056
 tidy+0.051
 Sus+0.050
Neg logit contributions
 trigger-0.073
 sudden-0.071
aw-0.071
Run-0.070
rified-0.065
Tokenuddle
Token ID24500
Feature activation-1.21e-01

Pos logit contributions
 look+0.592
 tidy+0.549
 Jane+0.547
 clean+0.546
days+0.540
Neg logit contributions
aw-0.177
plets-0.125
rots-0.115
ted-0.108
 sudden-0.100
Token.
Token ID13
Feature activation-5.42e-03

Pos logit contributions
 look+0.140
days+0.120
 tidy+0.119
 Jane+0.116
 clean+0.116
Neg logit contributions
aw-0.300
 trigger-0.282
ted-0.282
Run-0.282
plets-0.275
Token
Token ID198
Feature activation-3.59e-03

Pos logit contributions
 look+0.009
 tidy+0.008
days+0.008
 clean+0.008
 Jane+0.008
Neg logit contributions
aw-0.011
Run-0.010
ted-0.010
plets-0.010
 sudden-0.010
Token
Token ID198
Feature activation-1.08e-02

Pos logit contributions
 look+0.005
 Jane+0.004
 clean+0.004
 tidy+0.004
days+0.004
Neg logit contributions
aw-0.008
Run-0.008
 trigger-0.008
ted-0.008
plets-0.008
TokenSuddenly
Token ID38582
Feature activation+4.96e-02

Pos logit contributions
 look+0.024
 clean+0.023
 tidy+0.023
 Jane+0.022
days+0.022
Neg logit contributions
aw-0.015
 trigger-0.015
ted-0.014
plets-0.014
Run-0.014
Token
Token ID198
Feature activation-1.09e-02

Pos logit contributions
 look+0.102
 tidy+0.091
 clean+0.089
 Jane+0.088
days+0.088
Neg logit contributions
aw-0.145
Run-0.139
plets-0.138
 trigger-0.135
rots-0.135
TokenThe
Token ID464
Feature activation-4.59e-03

Pos logit contributions
 look+0.022
 Jane+0.021
 clean+0.021
 tidy+0.021
days+0.020
Neg logit contributions
aw-0.017
 trigger-0.016
Run-0.016
ted-0.016
plets-0.016
Token bird
Token ID6512
Feature activation+3.55e-03

Pos logit contributions
 look+0.008
 Jane+0.008
 clean+0.008
 tidy+0.008
days+0.007
Neg logit contributions
aw-0.008
Run-0.008
 trigger-0.008
plets-0.008
ted-0.008
Token thanked
Token ID26280
Feature activation-5.22e-02

Pos logit contributions
aw+0.008
Run+0.007
ted+0.007
 sudden+0.007
plets+0.007
Neg logit contributions
 look-0.005
 Jane-0.005
 clean-0.005
 tidy-0.005
days-0.004
Token the
Token ID262
Feature activation-3.65e-02

Pos logit contributions
 look+0.066
days+0.060
 tidy+0.059
 clean+0.059
 Jane+0.055
Neg logit contributions
aw-0.122
Run-0.119
ted-0.117
 trigger-0.116
plets-0.112
Token p
Token ID279
Feature activation-1.98e-01

Pos logit contributions
 look+0.072
 Jane+0.068
days+0.067
 clean+0.066
 tidy+0.066
Neg logit contributions
aw-0.059
Run-0.059
 trigger-0.057
 sudden-0.055
plets-0.054
Tokenanda
Token ID5282
Feature activation-1.06e-01

Pos logit contributions
 look+0.353
 Jane+0.321
 clean+0.319
 tidy+0.316
days+0.306
Neg logit contributions
aw-0.367
Run-0.360
 trigger-0.350
plets-0.347
ted-0.345
Token and
Token ID290
Feature activation-2.35e-03

Pos logit contributions
 look+0.177
 Jane+0.167
days+0.164
 tidy+0.164
 clean+0.161
Neg logit contributions
aw-0.195
ted-0.184
rots-0.179
Run-0.178
 trigger-0.177
Token offered
Token ID4438
Feature activation-3.21e-02

Pos logit contributions
 look+0.004
 Jane+0.003
 clean+0.003
 tidy+0.003
days+0.003
Neg logit contributions
aw-0.005
Run-0.005
ted-0.005
plets-0.004
 trigger-0.004
Token him
Token ID683
Feature activation+2.78e-02

Pos logit contributions
 look+0.050
 Jane+0.044
 clean+0.044
 tidy+0.044
days+0.043
Neg logit contributions
aw-0.067
Run-0.065
 trigger-0.064
ted-0.064
plets-0.063
Token a
Token ID257
Feature activation+1.64e-02

Pos logit contributions
aw+0.051
ted+0.049
rots+0.045
 arrows+0.045
Run+0.043
Neg logit contributions
days-0.042
 Jane-0.041
 look-0.040
 Sus-0.040
 tidy-0.039
Token was
Token ID373
Feature activation+6.01e-03

Pos logit contributions
aw+0.100
Run+0.098
 trigger+0.096
plets+0.095
ted+0.095
Neg logit contributions
 look-0.058
 Jane-0.051
 clean-0.050
 tidy-0.050
days-0.048
Token feeling
Token ID4203
Feature activation+1.67e-02

Pos logit contributions
aw+0.012
 sudden+0.012
Run+0.011
plets+0.011
Leave+0.011
Neg logit contributions
 look-0.009
days-0.009
 Jane-0.008
 tidy-0.007
 Sus-0.007
Token helpless
Token ID21144
Feature activation+6.04e-02

Pos logit contributions
aw+0.038
Run+0.037
 sudden+0.037
plets+0.036
 trigger+0.036
Neg logit contributions
 look-0.021
 Jane-0.020
days-0.019
 tidy-0.017
 clean-0.017
Token because
Token ID780
Feature activation-2.89e-02

Pos logit contributions
aw+0.130
Run+0.126
 trigger+0.123
ted+0.123
plets+0.123
Neg logit contributions
 look-0.089
 Jane-0.080
days-0.078
 clean-0.078
 tidy-0.078
Token the
Token ID262
Feature activation-2.76e-02

Pos logit contributions
 look+0.048
 clean+0.042
 Jane+0.042
days+0.042
 tidy+0.042
Neg logit contributions
aw-0.056
Run-0.055
ted-0.053
 sudden-0.053
 trigger-0.053
Token p
Token ID279
Feature activation-1.97e-01

Pos logit contributions
 look+0.042
days+0.042
 Jane+0.041
 tidy+0.036
 Sus+0.036
Neg logit contributions
 sudden-0.054
Run-0.052
 trigger-0.051
aw-0.050
ted-0.048
Tokenuddle
Token ID24500
Feature activation-1.06e-01

Pos logit contributions
 look+0.523
 Jane+0.488
 clean+0.482
 tidy+0.481
days+0.474
Neg logit contributions
aw-0.214
plets-0.191
Run-0.185
ted-0.180
 sudden-0.173
Token was
Token ID373
Feature activation+1.26e-02

Pos logit contributions
 look+0.146
 Jane+0.131
days+0.129
 tidy+0.125
 clean+0.124
Neg logit contributions
aw-0.234
Run-0.227
ted-0.225
 trigger-0.223
plets-0.222
Token too
Token ID1165
Feature activation-6.59e-02

Pos logit contributions
 sudden+0.024
Run+0.023
 trigger+0.022
ted+0.022
aw+0.022
Neg logit contributions
days-0.019
 look-0.018
 Jane-0.017
bi-0.015
|-0.015
Token deep
Token ID2769
Feature activation+1.16e-01

Pos logit contributions
days+0.102
 look+0.101
 Jane+0.099
 Sus+0.085
 Ell+0.082
Neg logit contributions
 sudden-0.130
Run-0.130
aw-0.124
plets-0.124
 trigger-0.116
Token to
Token ID284
Feature activation-1.16e-02

Pos logit contributions
aw+0.272
Run+0.268
 trigger+0.261
plets+0.261
ted+0.258
Neg logit contributions
 look-0.150
 clean-0.130
 Jane-0.129
 tidy-0.127
days-0.119
Token.
Token ID13
Feature activation-4.10e-03

Pos logit contributions
aw+0.060
ted+0.058
 trigger+0.058
Run+0.057
rified+0.055
Neg logit contributions
 Jane-0.053
days-0.052
 look-0.051
 Sus-0.050
 tidy-0.049
Token This
Token ID770
Feature activation-2.98e-02

Pos logit contributions
 look+0.007
 tidy+0.007
 clean+0.007
days+0.007
 smile+0.007
Neg logit contributions
ted-0.007
aw-0.006
Run-0.006
plets-0.006
rified-0.006
Token continued
Token ID3767
Feature activation-2.18e-02

Pos logit contributions
 look+0.041
days+0.040
 Jane+0.039
 clean+0.038
 tidy+0.038
Neg logit contributions
aw-0.062
 sudden-0.060
ted-0.058
Run-0.058
rified-0.054
Token until
Token ID1566
Feature activation+6.27e-02

Pos logit contributions
 look+0.029
 Jane+0.025
 clean+0.025
 tidy+0.025
days+0.024
Neg logit contributions
aw-0.050
Run-0.049
ted-0.048
 trigger-0.048
plets-0.047
Token the
Token ID262
Feature activation-2.70e-02

Pos logit contributions
Run+0.122
ted+0.121
aw+0.118
 sudden+0.116
plets+0.112
Neg logit contributions
 look-0.096
days-0.096
 tidy-0.094
 clean-0.092
my-0.086
Token p
Token ID279
Feature activation-1.97e-01

Pos logit contributions
 look+0.035
 Jane+0.034
days+0.032
 tidy+0.031
 clean+0.031
Neg logit contributions
 sudden-0.059
aw-0.058
Run-0.058
ted-0.057
 trigger-0.057
Tokenanda
Token ID5282
Feature activation-8.84e-02

Pos logit contributions
 look+0.436
 tidy+0.392
 clean+0.392
 Jane+0.391
days+0.379
Neg logit contributions
aw-0.325
ted-0.275
plets-0.272
rots-0.266
 sudden-0.260
Token finally
Token ID3443
Feature activation-3.36e-03

Pos logit contributions
 look+0.127
 Jane+0.115
 clean+0.110
days+0.110
 tidy+0.109
Neg logit contributions
aw-0.197
Run-0.187
ted-0.185
plets-0.181
 sudden-0.180
Token stopped
Token ID5025
Feature activation-1.94e-02

Pos logit contributions
days+0.005
 look+0.005
 tidy+0.005
 Jane+0.005
 clean+0.005
Neg logit contributions
aw-0.007
ted-0.006
Run-0.006
plets-0.006
rified-0.006
Token.
Token ID13
Feature activation-5.46e-03

Pos logit contributions
 look+0.027
 Jane+0.024
 clean+0.024
 tidy+0.024
days+0.023
Neg logit contributions
aw-0.043
Run-0.043
 trigger-0.041
ted-0.041
plets-0.041
Token
Token ID220
Feature activation-5.69e-02

Pos logit contributions
days+0.009
 look+0.009
 tidy+0.009
 clean+0.009
 organized+0.008
Neg logit contributions
ted-0.008
aw-0.008
rified-0.008
Run-0.008
plets-0.008
Token the
Token ID262
Feature activation-2.91e-02

Pos logit contributions
 look+0.028
 clean+0.025
 tidy+0.025
days+0.024
 Jane+0.023
Neg logit contributions
aw-0.047
Run-0.047
ted-0.045
 sudden-0.045
plets-0.044
Token little
Token ID1310
Feature activation-2.67e-02

Pos logit contributions
 look+0.047
 Jane+0.043
 clean+0.042
 tidy+0.041
days+0.041
Neg logit contributions
aw-0.059
Run-0.058
 trigger-0.056
plets-0.055
ted-0.055
Token girl
Token ID2576
Feature activation-1.27e-02

Pos logit contributions
 look+0.047
 Jane+0.043
 clean+0.043
 tidy+0.042
days+0.042
Neg logit contributions
aw-0.050
Run-0.049
 trigger-0.047
plets-0.046
ted-0.046
Token and
Token ID290
Feature activation-1.78e-02

Pos logit contributions
 look+0.018
 Jane+0.017
days+0.016
 tidy+0.015
 clean+0.015
Neg logit contributions
aw-0.028
ted-0.026
Run-0.025
rots-0.025
 sudden-0.025
Token the
Token ID262
Feature activation-3.47e-02

Pos logit contributions
 look+0.020
 clean+0.017
 tidy+0.017
 Jane+0.017
days+0.016
Neg logit contributions
aw-0.044
Run-0.043
ted-0.043
 trigger-0.041
 sudden-0.041
Token p
Token ID279
Feature activation-1.97e-01

Pos logit contributions
 look+0.059
 Jane+0.054
days+0.053
 clean+0.052
 tidy+0.052
Neg logit contributions
aw-0.066
Run-0.065
 trigger-0.064
 sudden-0.062
plets-0.061
Tokenanda
Token ID5282
Feature activation-1.04e-01

Pos logit contributions
 look+0.433
 Jane+0.391
 clean+0.390
 tidy+0.389
days+0.365
Neg logit contributions
aw-0.313
rots-0.267
ted-0.267
plets-0.259
 sudden-0.256
Token played
Token ID2826
Feature activation-4.23e-02

Pos logit contributions
 look+0.156
 Jane+0.151
 tidy+0.140
 clean+0.137
 Sus+0.135
Neg logit contributions
aw-0.222
ted-0.207
rots-0.198
Run-0.194
 sudden-0.193
Token together
Token ID1978
Feature activation-4.16e-02

Pos logit contributions
 look+0.069
 Jane+0.062
 clean+0.062
 tidy+0.061
days+0.058
Neg logit contributions
aw-0.086
Run-0.084
 trigger-0.083
plets-0.082
ted-0.081
Token every
Token ID790
Feature activation+9.22e-03

Pos logit contributions
 look+0.059
 Jane+0.053
 clean+0.052
 tidy+0.051
days+0.050
Neg logit contributions
aw-0.092
Run-0.089
ted-0.087
 trigger-0.087
plets-0.086
Token day
Token ID1110
Feature activation+3.39e-02

Pos logit contributions
 sudden+0.021
Run+0.021
aw+0.019
ted+0.019
 trigger+0.019
Neg logit contributions
 look-0.012
 Jane-0.010
 Sus-0.010
 tidy-0.010
 clean-0.010
Token that
Token ID326
Feature activation-6.14e-02

Pos logit contributions
 look+0.068
 Jane+0.062
days+0.062
 clean+0.061
 tidy+0.061
Neg logit contributions
aw-0.059
Run-0.057
ted-0.055
 trigger-0.054
plets-0.054
Token day
Token ID1110
Feature activation+3.94e-02

Pos logit contributions
 look+0.063
 Jane+0.056
 tidy+0.051
 clean+0.051
days+0.049
Neg logit contributions
Run-0.155
aw-0.152
 trigger-0.150
 sudden-0.148
plets-0.147
Token on
Token ID319
Feature activation-5.76e-02

Pos logit contributions
aw+0.089
Run+0.086
plets+0.085
 trigger+0.085
 sudden+0.082
Neg logit contributions
 look-0.056
 clean-0.051
 tidy-0.048
 have-0.047
 Jane-0.046
Token,
Token ID11
Feature activation-2.60e-02

Pos logit contributions
 look+0.075
 Jane+0.065
 tidy+0.064
days+0.064
 clean+0.064
Neg logit contributions
aw-0.131
Run-0.129
plets-0.125
 trigger-0.125
ted-0.124
Token the
Token ID262
Feature activation-3.20e-02

Pos logit contributions
 look+0.036
 Jane+0.031
 clean+0.031
 tidy+0.031
days+0.030
Neg logit contributions
aw-0.059
Run-0.058
 trigger-0.057
plets-0.056
ted-0.056
Token p
Token ID279
Feature activation-1.97e-01

Pos logit contributions
 look+0.053
 Jane+0.048
 clean+0.047
 tidy+0.046
days+0.046
Neg logit contributions
aw-0.064
Run-0.063
 trigger-0.061
plets-0.060
ted-0.060
Tokenanda
Token ID5282
Feature activation-9.95e-02

Pos logit contributions
 look+0.437
 Jane+0.402
 clean+0.397
 tidy+0.395
days+0.379
Neg logit contributions
aw-0.309
rots-0.258
plets-0.257
ted-0.244
Run-0.241
Token made
Token ID925
Feature activation+1.42e-03

Pos logit contributions
 look+0.146
 Jane+0.135
 tidy+0.125
 clean+0.125
days+0.124
Neg logit contributions
aw-0.218
ted-0.203
Run-0.203
rots-0.200
plets-0.199
Token sure
Token ID1654
Feature activation-3.54e-02

Pos logit contributions
aw+0.003
Run+0.003
 trigger+0.003
plets+0.003
ted+0.003
Neg logit contributions
 look-0.002
 Jane-0.002
 clean-0.002
 tidy-0.002
days-0.002
Token to
Token ID284
Feature activation-2.91e-02

Pos logit contributions
 look+0.051
days+0.046
 Jane+0.045
 tidy+0.045
 clean+0.045
Neg logit contributions
aw-0.076
Run-0.073
ted-0.072
 trigger-0.070
 sudden-0.070
Token always
Token ID1464
Feature activation-3.21e-02

Pos logit contributions
days+0.042
 Jane+0.040
bi+0.037
|+0.036
 Sus+0.036
Neg logit contributions
ising-0.047
aw-0.047
 sudden-0.045
 trigger-0.044
Then-0.044
TokenThe
Token ID464
Feature activation-6.35e-03

Pos logit contributions
 look+0.025
 Jane+0.024
 clean+0.024
 tidy+0.024
days+0.023
Neg logit contributions
aw-0.015
Run-0.014
 trigger-0.014
plets-0.014
ted-0.014
Token night
Token ID1755
Feature activation+2.58e-02

Pos logit contributions
 look+0.011
 Jane+0.010
days+0.010
 clean+0.009
 tidy+0.009
Neg logit contributions
Run-0.012
 trigger-0.012
 sudden-0.012
aw-0.011
ted-0.011
Token came
Token ID1625
Feature activation-4.17e-02

Pos logit contributions
aw+0.059
Run+0.058
 trigger+0.057
plets+0.057
ted+0.056
Neg logit contributions
 look-0.035
 Jane-0.031
 clean-0.030
 tidy-0.030
days-0.029
Token and
Token ID290
Feature activation+6.35e-03

Pos logit contributions
 look+0.060
 Jane+0.054
 tidy+0.052
 clean+0.052
days+0.051
Neg logit contributions
aw-0.090
Run-0.089
 trigger-0.087
ted-0.086
plets-0.086
Token the
Token ID262
Feature activation-1.95e-02

Pos logit contributions
Run+0.014
ted+0.014
aw+0.013
 trigger+0.013
plets+0.013
Neg logit contributions
 look-0.009
 Jane-0.008
days-0.008
 tidy-0.008
 clean-0.008
Token p
Token ID279
Feature activation-1.97e-01

Pos logit contributions
 look+0.030
 Jane+0.028
days+0.027
 clean+0.026
 tidy+0.026
Neg logit contributions
Run-0.039
 sudden-0.039
 trigger-0.039
aw-0.039
ted-0.037
Tokenanda
Token ID5282
Feature activation-1.02e-01

Pos logit contributions
 look+0.402
 Jane+0.365
 clean+0.361
 tidy+0.359
days+0.348
Neg logit contributions
aw-0.338
ted-0.297
plets-0.297
rots-0.295
Run-0.294
Token was
Token ID373
Feature activation+5.10e-03

Pos logit contributions
 look+0.147
 Jane+0.138
 tidy+0.130
 clean+0.130
days+0.129
Neg logit contributions
aw-0.222
Run-0.211
ted-0.210
 trigger-0.203
 sudden-0.203
Token still
Token ID991
Feature activation-3.98e-03

Pos logit contributions
aw+0.011
 sudden+0.011
Run+0.010
ted+0.010
Leave+0.010
Neg logit contributions
 look-0.007
days-0.007
 Jane-0.007
 Sus-0.006
 collect-0.005
Token lost
Token ID2626
Feature activation-6.19e-03

Pos logit contributions
 look+0.006
days+0.006
 Jane+0.006
 tidy+0.005
 Sus+0.005
Neg logit contributions
aw-0.008
 sudden-0.008
Run-0.007
ted-0.007
king-0.007
Token.
Token ID13
Feature activation-8.23e-03

Pos logit contributions
 look+0.008
days+0.008
 Jane+0.008
 tidy+0.007
 clean+0.007
Neg logit contributions
aw-0.013
 trigger-0.012
Run-0.012
ted-0.012
rified-0.012
Token.
Token ID13
Feature activation+9.87e-04

Pos logit contributions
 look+0.017
days+0.016
 Jane+0.016
 tidy+0.015
 clean+0.015
Neg logit contributions
aw-0.024
ted-0.023
Run-0.023
 trigger-0.022
plets-0.022
Token
Token ID198
Feature activation+7.38e-04

Pos logit contributions
aw+0.002
Run+0.002
ted+0.002
plets+0.002
 trigger+0.002
Neg logit contributions
 look-0.001
 tidy-0.001
days-0.001
 clean-0.001
 Jane-0.001
Token
Token ID198
Feature activation-7.74e-03

Pos logit contributions
aw+0.002
Run+0.002
ted+0.002
plets+0.002
 trigger+0.002
Neg logit contributions
 look-0.001
 Jane-0.001
 tidy-0.001
 clean-0.001
days-0.001
Token"
Token ID1
Feature activation+4.64e-02

Pos logit contributions
 look+0.017
 clean+0.015
 tidy+0.015
 Jane+0.015
days+0.015
Neg logit contributions
aw-0.012
 trigger-0.011
Run-0.011
plets-0.011
ted-0.011
TokenGood
Token ID10248
Feature activation-2.51e-02

Pos logit contributions
aw+0.078
 trigger+0.074
plets+0.074
rots+0.072
Run+0.071
Neg logit contributions
 look-0.091
 Jane-0.086
 clean-0.084
 tidy-0.083
days-0.081
Token morning
Token ID3329
Feature activation+1.07e-01

Pos logit contributions
 look+0.034
 Jane+0.029
 clean+0.029
 tidy+0.029
days+0.027
Neg logit contributions
aw-0.058
Run-0.058
 trigger-0.056
plets-0.056
ted-0.056
Token,
Token ID11
Feature activation+6.46e-03

Pos logit contributions
aw+0.238
Run+0.236
ted+0.225
plets+0.225
 trigger+0.222
Neg logit contributions
 look-0.141
days-0.131
 tidy-0.127
 clean-0.124
 Jane-0.121
Token O
Token ID440
Feature activation-5.34e-02

Pos logit contributions
Run+0.015
aw+0.015
ted+0.014
 trigger+0.014
plets+0.014
Neg logit contributions
 look-0.008
days-0.008
 Jane-0.007
 tidy-0.007
 clean-0.007
Tokenll
Token ID297
Feature activation-7.98e-02

Pos logit contributions
 look+0.085
 Jane+0.080
 tidy+0.075
 clean+0.074
days+0.072
Neg logit contributions
aw-0.113
plets-0.107
Run-0.106
rots-0.106
ted-0.102
Tokenie
Token ID494
Feature activation+2.35e-02

Pos logit contributions
 look+0.110
 Jane+0.104
 tidy+0.101
 clean+0.100
days+0.099
Neg logit contributions
aw-0.170
 trigger-0.166
Run-0.164
ted-0.160
plets-0.158
Token!"
Token ID2474
Feature activation+3.02e-02

Pos logit contributions
aw+0.051
Run+0.049
plets+0.047
 trigger+0.047
ted+0.046
Neg logit contributions
 look-0.034
 Jane-0.031
days-0.030
 tidy-0.030
 clean-0.029
Token them
Token ID606
Feature activation-3.25e-02

Pos logit contributions
plets+0.076
 sudden+0.073
Run+0.072
rots+0.072
aw+0.071
Neg logit contributions
 look-0.041
 Jane-0.036
 clean-0.034
 tidy-0.031
 Lucy-0.030
Token.
Token ID13
Feature activation+1.71e-02

Pos logit contributions
 look+0.035
 Jane+0.034
 tidy+0.032
days+0.031
 Sus+0.030
Neg logit contributions
aw-0.078
 trigger-0.076
Run-0.075
ted-0.074
rified-0.072
Token
Token ID198
Feature activation+1.53e-02

Pos logit contributions
 trigger+0.041
aw+0.041
Run+0.040
plets+0.040
ted+0.040
Neg logit contributions
 look-0.021
 Jane-0.020
 clean-0.018
 tidy-0.017
 Lucy-0.016
Token
Token ID198
Feature activation+6.81e-03

Pos logit contributions
aw+0.035
Run+0.034
 trigger+0.033
plets+0.033
ted+0.033
Neg logit contributions
 look-0.021
 Jane-0.018
 clean-0.018
 tidy-0.018
days-0.017
Token"
Token ID1
Feature activation+5.77e-02

Pos logit contributions
 trigger+0.015
ted+0.015
rified+0.014
plets+0.014
rots+0.014
Neg logit contributions
 clean-0.009
 look-0.009
 tidy-0.009
 Jane-0.009
 have-0.008
TokenHelp
Token ID22087
Feature activation+1.31e-01

Pos logit contributions
rots+0.085
aw+0.085
 trigger+0.082
plets+0.080
ted+0.078
Neg logit contributions
 look-0.113
 Jane-0.108
 tidy-0.107
 clean-0.107
 have-0.101
Token,
Token ID11
Feature activation+2.95e-02

Pos logit contributions
plets+0.335
aw+0.332
 trigger+0.330
Run+0.325
 sudden+0.321
Neg logit contributions
 look-0.149
 Jane-0.128
 clean-0.116
 tidy-0.106
 Sus-0.101
Token help
Token ID1037
Feature activation-2.26e-02

Pos logit contributions
Run+0.073
aw+0.071
 trigger+0.070
ted+0.069
plets+0.069
Neg logit contributions
 look-0.033
 Jane-0.029
 tidy-0.029
days-0.029
 clean-0.028
Token!"
Token ID2474
Feature activation+4.88e-02

Pos logit contributions
 look+0.029
 Jane+0.025
 clean+0.025
 tidy+0.025
days+0.024
Neg logit contributions
aw-0.054
Run-0.053
 trigger-0.052
plets-0.052
ted-0.051
Token Tom
Token ID4186
Feature activation+1.30e-02

Pos logit contributions
aw+0.128
 trigger+0.126
plets+0.125
Run+0.123
ted+0.123
Neg logit contributions
 look-0.053
 Jane-0.047
 clean-0.043
 tidy-0.041
days-0.036
Token and
Token ID290
Feature activation+9.91e-03

Pos logit contributions
plets+0.030
 trigger+0.029
 sudden+0.029
rots+0.028
aw+0.028
Neg logit contributions
 look-0.020
 clean-0.016
 Jane-0.016
 tidy-0.015
my-0.015
Token
Token ID198
Feature activation+3.90e-03

Pos logit contributions
aw+0.017
Run+0.016
plets+0.016
 trigger+0.016
ted+0.016
Neg logit contributions
 look-0.008
 Jane-0.007
 clean-0.007
 tidy-0.007
days-0.006
Token
Token ID198
Feature activation-5.56e-03

Pos logit contributions
aw+0.009
Run+0.009
 trigger+0.008
ted+0.008
plets+0.008
Neg logit contributions
 look-0.005
 Jane-0.005
 clean-0.005
 tidy-0.005
days-0.004
Token"
Token ID1
Feature activation+4.66e-02

Pos logit contributions
 look+0.009
 clean+0.009
 tidy+0.009
 Jane+0.008
days+0.008
Neg logit contributions
 trigger-0.011
aw-0.011
plets-0.011
ted-0.011
rots-0.011
TokenI
Token ID40
Feature activation-2.14e-02

Pos logit contributions
plets+0.089
rots+0.088
aw+0.087
 trigger+0.084
izing+0.083
Neg logit contributions
 look-0.079
 Jane-0.075
 clean-0.072
 tidy-0.071
 play-0.066
Token have
Token ID423
Feature activation+6.46e-03

Pos logit contributions
 look+0.028
 Jane+0.027
 tidy+0.025
 clean+0.025
days+0.025
Neg logit contributions
aw-0.048
Run-0.048
 trigger-0.046
ted-0.045
plets-0.045
Token bread
Token ID8509
Feature activation+1.48e-01

Pos logit contributions
 trigger+0.017
plets+0.017
Run+0.017
aw+0.017
rified+0.017
Neg logit contributions
 look-0.007
 clean-0.006
 Jane-0.005
 tidy-0.005
 have-0.004
Token for
Token ID329
Feature activation+8.24e-02

Pos logit contributions
aw+0.350
Run+0.349
plets+0.341
 trigger+0.341
ted+0.335
Neg logit contributions
 look-0.193
 clean-0.165
 Jane-0.163
 tidy-0.159
days-0.150
Token the
Token ID262
Feature activation+4.75e-03

Pos logit contributions
aw+0.163
Run+0.159
 trigger+0.155
plets+0.154
ted+0.153
Neg logit contributions
 look-0.138
 Jane-0.125
 clean-0.124
 tidy-0.123
days-0.119
Token ducks
Token ID39694
Feature activation+3.51e-02

Pos logit contributions
aw+0.008
Run+0.007
plets+0.007
ted+0.007
 trigger+0.007
Neg logit contributions
 look-0.010
 clean-0.009
 tidy-0.009
 Jane-0.009
days-0.008
Token,"
Token ID553
Feature activation+1.83e-02

Pos logit contributions
aw+0.075
Run+0.072
 trigger+0.069
plets+0.068
rots+0.068
Neg logit contributions
 look-0.049
 Jane-0.047
 tidy-0.046
days-0.045
 clean-0.045
Token Lily
Token ID20037
Feature activation-4.50e-02

Pos logit contributions
aw+0.049
Run+0.049
 trigger+0.048
plets+0.047
ted+0.047
Neg logit contributions
 look-0.017
 Jane-0.014
 clean-0.014
 tidy-0.014
days-0.013
Token,
Token ID11
Feature activation+1.78e-02

Pos logit contributions
aw+0.049
ted+0.046
 trigger+0.046
plets+0.045
Run+0.045
Neg logit contributions
 look-0.033
 Jane-0.032
days-0.030
 tidy-0.029
 clean-0.029
Token but
Token ID475
Feature activation+6.36e-02

Pos logit contributions
aw+0.040
Run+0.039
plets+0.039
 trigger+0.038
ted+0.038
Neg logit contributions
 look-0.026
 clean-0.023
 Jane-0.023
 tidy-0.023
days-0.021
Token we
Token ID356
Feature activation+3.26e-03

Pos logit contributions
Run+0.112
aw+0.111
 trigger+0.108
ted+0.106
plets+0.104
Neg logit contributions
 look-0.109
 Jane-0.102
days-0.101
 clean-0.100
 tidy-0.100
Token have
Token ID423
Feature activation-1.99e-03

Pos logit contributions
 trigger+0.006
rots+0.005
aw+0.005
Run+0.005
ted+0.005
Neg logit contributions
 Jane-0.006
 Sus-0.005
days-0.005
 tidy-0.005
 Lucy-0.005
Token each
Token ID1123
Feature activation+8.18e-03

Pos logit contributions
 look+0.003
days+0.003
 Jane+0.003
 tidy+0.002
 clean+0.002
Neg logit contributions
aw-0.004
 sudden-0.004
Run-0.004
ted-0.004
rots-0.004
Token other
Token ID584
Feature activation+1.10e-01

Pos logit contributions
aw+0.017
Run+0.017
 trigger+0.016
ted+0.016
plets+0.016
Neg logit contributions
 look-0.012
 Jane-0.011
 tidy-0.011
 clean-0.011
days-0.011
Token and
Token ID290
Feature activation+3.24e-02

Pos logit contributions
aw+0.229
ted+0.209
 trigger+0.203
rots+0.203
Run+0.202
Neg logit contributions
 Jane-0.153
 look-0.149
 Sus-0.138
days-0.128
 tidy-0.127
Token that
Token ID326
Feature activation-4.14e-02

Pos logit contributions
aw+0.069
Run+0.067
 trigger+0.066
ted+0.065
plets+0.064
Neg logit contributions
 look-0.045
 Jane-0.042
days-0.040
 clean-0.039
 tidy-0.039
Token's
Token ID338
Feature activation+9.83e-03

Pos logit contributions
 look+0.076
 Jane+0.071
 clean+0.069
 tidy+0.069
days+0.068
Neg logit contributions
aw-0.073
Run-0.071
 trigger-0.069
ted-0.068
plets-0.068
Token what
Token ID644
Feature activation-6.04e-02

Pos logit contributions
aw+0.019
 trigger+0.017
 arrows+0.017
rots+0.017
 unnecessary+0.017
Neg logit contributions
days-0.015
 Jane-0.014
 look-0.014
 Sus-0.012
|-0.012
Token's
Token ID338
Feature activation-7.59e-04

Pos logit contributions
 look+0.114
 Jane+0.104
 clean+0.104
 tidy+0.103
days+0.099
Neg logit contributions
aw-0.106
Run-0.104
 trigger-0.101
plets-0.101
ted-0.099
Token waiter
Token ID46612
Feature activation+1.83e-02

Pos logit contributions
 look+0.018
 Jane+0.017
days+0.016
 clean+0.016
 tidy+0.016
Neg logit contributions
aw-0.019
Run-0.019
 trigger-0.018
ted-0.018
plets-0.017
Token said
Token ID531
Feature activation+1.24e-02

Pos logit contributions
ted+0.036
aw+0.036
Run+0.035
rots+0.033
 trigger+0.033
Neg logit contributions
 look-0.027
 tidy-0.025
days-0.025
 clean-0.024
 Jane-0.024
Token "
Token ID366
Feature activation+6.06e-02

Pos logit contributions
aw+0.022
ted+0.021
Run+0.021
 trigger+0.021
plets+0.020
Neg logit contributions
 look-0.021
days-0.020
 tidy-0.019
 clean-0.019
 Jane-0.019
TokenWe
Token ID1135
Feature activation+1.31e-02

Pos logit contributions
aw+0.096
 trigger+0.095
Run+0.094
plets+0.089
ted+0.089
Neg logit contributions
 look-0.122
 Jane-0.113
days-0.111
 clean-0.111
 tidy-0.111
Token have
Token ID423
Feature activation+1.26e-02

Pos logit contributions
Run+0.026
aw+0.025
ted+0.025
 trigger+0.024
plets+0.024
Neg logit contributions
 Jane-0.019
 Sus-0.017
 look-0.017
 tidy-0.017
days-0.016
Token spaghetti
Token ID48241
Feature activation+1.26e-01

Pos logit contributions
aw+0.028
Run+0.026
ted+0.025
 sudden+0.025
plets+0.025
Neg logit contributions
 look-0.018
days-0.016
 Jane-0.016
 tidy-0.016
 clean-0.015
Token available
Token ID1695
Feature activation+7.45e-02

Pos logit contributions
aw+0.261
ted+0.245
plets+0.242
Run+0.242
rots+0.240
Neg logit contributions
 look-0.186
 Jane-0.166
 tidy-0.164
 clean-0.162
days-0.160
Token.
Token ID13
Feature activation+6.58e-03

Pos logit contributions
aw+0.149
ted+0.141
Run+0.140
plets+0.138
 trigger+0.137
Neg logit contributions
 look-0.115
 Jane-0.103
 tidy-0.101
days-0.101
 clean-0.099
Token Would
Token ID10928
Feature activation+2.99e-03

Pos logit contributions
aw+0.011
Run+0.011
 trigger+0.010
ted+0.010
plets+0.010
Neg logit contributions
 look-0.013
 Jane-0.012
 clean-0.012
 tidy-0.012
days-0.011
Token you
Token ID345
Feature activation+8.95e-03

Pos logit contributions
aw+0.004
Run+0.004
ted+0.004
plets+0.004
 trigger+0.004
Neg logit contributions
 look-0.007
 clean-0.006
 Jane-0.006
 tidy-0.006
days-0.006
Token like
Token ID588
Feature activation+9.58e-04

Pos logit contributions
aw+0.015
Run+0.014
ted+0.013
plets+0.013
rots+0.013
Neg logit contributions
 Jane-0.014
 Sus-0.013
days-0.013
 hall-0.013
 cabinet-0.013
Token Jill
Token ID22772
Feature activation-2.27e-02

Pos logit contributions
 look+0.016
 Jane+0.014
 clean+0.013
 tidy+0.013
days+0.013
Neg logit contributions
aw-0.028
Run-0.028
 trigger-0.027
plets-0.027
ted-0.027
Token followed
Token ID3940
Feature activation+2.48e-02

Pos logit contributions
 look+0.035
 Jane+0.032
days+0.032
 tidy+0.031
 clean+0.031
Neg logit contributions
aw-0.048
Run-0.045
ted-0.044
rots-0.042
plets-0.042
Token the
Token ID262
Feature activation+9.99e-03

Pos logit contributions
aw+0.060
Run+0.059
 trigger+0.058
ted+0.058
plets+0.057
Neg logit contributions
 look-0.029
 tidy-0.025
 clean-0.025
 Jane-0.024
days-0.024
Token bird
Token ID6512
Feature activation+2.29e-02

Pos logit contributions
aw+0.019
Run+0.019
 trigger+0.019
 sudden+0.018
plets+0.018
Neg logit contributions
 look-0.017
 Jane-0.016
days-0.015
 clean-0.015
 tidy-0.015
Token and
Token ID290
Feature activation+2.44e-02

Pos logit contributions
aw+0.048
Run+0.047
 trigger+0.046
ted+0.046
plets+0.045
Neg logit contributions
 look-0.034
 Jane-0.031
 tidy-0.030
 clean-0.030
days-0.029
Token they
Token ID484
Feature activation+4.01e-02

Pos logit contributions
aw+0.052
Run+0.051
ted+0.050
 trigger+0.049
plets+0.048
Neg logit contributions
 look-0.036
 Jane-0.032
 clean-0.032
 tidy-0.032
days-0.032
Token flew
Token ID13112
Feature activation+4.28e-02

Pos logit contributions
aw+0.082
ted+0.078
Run+0.077
 sudden+0.074
ked+0.074
Neg logit contributions
 look-0.061
 Jane-0.059
days-0.058
 tidy-0.056
 clean-0.055
Token high
Token ID1029
Feature activation+5.43e-02

Pos logit contributions
aw+0.088
Run+0.086
ted+0.084
 trigger+0.084
plets+0.081
Neg logit contributions
 look-0.066
days-0.061
 tidy-0.059
 clean-0.059
 Jane-0.058
Token up
Token ID510
Feature activation+1.96e-02

Pos logit contributions
aw+0.108
Run+0.106
 trigger+0.102
ted+0.102
plets+0.102
Neg logit contributions
 look-0.090
 Jane-0.082
 clean-0.081
 tidy-0.080
days-0.078
Token in
Token ID287
Feature activation-3.71e-02

Pos logit contributions
aw+0.040
Run+0.040
ted+0.038
 sudden+0.038
plets+0.037
Neg logit contributions
 look-0.030
days-0.028
 tidy-0.027
 clean-0.027
 Jane-0.026
Token the
Token ID262
Feature activation+2.14e-03

Pos logit contributions
days+0.051
 look+0.050
 tidy+0.046
 clean+0.045
 Jane+0.042
Neg logit contributions
aw-0.079
Run-0.079
 sudden-0.077
ted-0.074
 trigger-0.072
Token to
Token ID284
Feature activation+1.78e-02

Pos logit contributions
aw+0.085
Run+0.082
ted+0.081
 trigger+0.080
 sudden+0.079
Neg logit contributions
 look-0.048
 tidy-0.043
 Jane-0.042
 clean-0.042
days-0.042
Token disappear
Token ID10921
Feature activation+2.07e-02

Pos logit contributions
Run+0.038
 trigger+0.034
 sudden+0.034
aw+0.033
plets+0.033
Neg logit contributions
days-0.023
 Jane-0.022
bie-0.021
bi-0.020
 Lucy-0.020
Token and
Token ID290
Feature activation+1.82e-02

Pos logit contributions
aw+0.049
Run+0.046
 trigger+0.046
plets+0.045
ted+0.045
Neg logit contributions
 look-0.025
days-0.023
 Jane-0.023
 tidy-0.022
 clean-0.022
Token the
Token ID262
Feature activation-1.72e-02

Pos logit contributions
aw+0.039
Run+0.038
ted+0.037
 trigger+0.037
plets+0.036
Neg logit contributions
 look-0.026
days-0.024
 Jane-0.023
 clean-0.023
 tidy-0.023
Token tree
Token ID5509
Feature activation-6.89e-02

Pos logit contributions
 look+0.030
 Jane+0.027
days+0.027
 clean+0.027
 tidy+0.027
Neg logit contributions
aw-0.032
Run-0.032
 trigger-0.031
 sudden-0.030
plets-0.030
Token began
Token ID2540
Feature activation+2.55e-02

Pos logit contributions
 look+0.104
 Jane+0.094
 clean+0.092
 tidy+0.091
days+0.090
Neg logit contributions
aw-0.147
Run-0.142
plets-0.139
 trigger-0.139
ted-0.138
Token to
Token ID284
Feature activation+1.68e-02

Pos logit contributions
aw+0.057
Run+0.055
ted+0.054
 sudden+0.054
 trigger+0.053
Neg logit contributions
 look-0.034
days-0.030
 Jane-0.030
 tidy-0.030
 clean-0.029
Token get
Token ID651
Feature activation-2.29e-02

Pos logit contributions
Run+0.033
 trigger+0.030
 sudden+0.029
Leave+0.029
plets+0.029
Neg logit contributions
days-0.024
 Jane-0.023
bie-0.022
bi-0.021
 Sus-0.020
Token bigger
Token ID5749
Feature activation+3.10e-02

Pos logit contributions
 look+0.034
days+0.034
 Jane+0.032
 tidy+0.030
 clean+0.029
Neg logit contributions
aw-0.047
Run-0.045
 trigger-0.045
ted-0.045
 sudden-0.044
Token and
Token ID290
Feature activation+4.07e-03

Pos logit contributions
aw+0.077
Run+0.076
 trigger+0.074
plets+0.074
ted+0.073
Neg logit contributions
 look-0.036
 Jane-0.031
 clean-0.031
 tidy-0.030
days-0.029
Token bigger
Token ID5749
Feature activation+1.28e-02

Pos logit contributions
Run+0.008
aw+0.008
 trigger+0.008
plets+0.008
ted+0.007
Neg logit contributions
 look-0.007
days-0.006
 Jane-0.006
 clean-0.006
 tidy-0.006
TokenHe
Token ID1544
Feature activation+2.34e-02

Pos logit contributions
 trigger+0.002
ted+0.002
plets+0.002
aw+0.002
rots+0.002
Neg logit contributions
 look-0.002
 tidy-0.002
 clean-0.002
 Jane-0.002
days-0.002
Token dreamed
Token ID27947
Feature activation+1.40e-03

Pos logit contributions
aw+0.053
Run+0.051
ted+0.050
 sudden+0.048
 trigger+0.047
Neg logit contributions
 look-0.031
 Jane-0.029
days-0.028
 clean-0.027
 tidy-0.027
Token that
Token ID326
Feature activation-4.55e-02

Pos logit contributions
aw+0.003
Run+0.002
ted+0.002
plets+0.002
 trigger+0.002
Neg logit contributions
 look-0.003
 clean-0.002
 tidy-0.002
 Jane-0.002
days-0.002
Token he
Token ID339
Feature activation+5.79e-02

Pos logit contributions
 look+0.068
 Jane+0.061
days+0.061
 clean+0.061
 tidy+0.060
Neg logit contributions
aw-0.094
Run-0.092
ted-0.089
 trigger-0.088
 sudden-0.087
Token was
Token ID373
Feature activation+6.77e-04

Pos logit contributions
aw+0.114
ted+0.109
Run+0.108
rots+0.105
rified+0.103
Neg logit contributions
 Jane-0.091
 look-0.091
 clean-0.084
 tidy-0.082
 Sus-0.082
Token flying
Token ID7348
Feature activation+2.55e-02

Pos logit contributions
aw+0.001
 sudden+0.001
ting+0.001
ted+0.001
rified+0.001
Neg logit contributions
days-0.001
 look-0.001
 Jane-0.001
 Sus-0.001
 tidy-0.001
Token in
Token ID287
Feature activation-1.89e-02

Pos logit contributions
aw+0.054
Run+0.050
ted+0.050
 trigger+0.049
rified+0.049
Neg logit contributions
 look-0.038
 Jane-0.035
days-0.035
 clean-0.034
 tidy-0.034
Token a
Token ID257
Feature activation+3.01e-02

Pos logit contributions
 look+0.025
days+0.025
 tidy+0.023
 Jane+0.023
bi+0.023
Neg logit contributions
aw-0.039
Run-0.038
 sudden-0.037
ted-0.037
 trigger-0.035
Token big
Token ID1263
Feature activation-1.40e-02

Pos logit contributions
aw+0.069
 sudden+0.066
Run+0.066
 trigger+0.064
plets+0.063
Neg logit contributions
 look-0.040
 Jane-0.038
days-0.036
 Sus-0.034
 clean-0.033
Token balloon
Token ID21190
Feature activation+7.09e-04

Pos logit contributions
days+0.019
 look+0.018
 Jane+0.018
 Sus+0.016
 tidy+0.016
Neg logit contributions
aw-0.031
 sudden-0.028
Run-0.028
ted-0.027
 trigger-0.027
Token.
Token ID13
Feature activation+2.53e-04

Pos logit contributions
aw+0.002
Run+0.002
ted+0.002
 trigger+0.002
plets+0.002
Neg logit contributions
 look-0.001
 Jane-0.001
 tidy-0.001
 clean-0.001
days-0.001
Token day
Token ID1110
Feature activation+4.63e-02

Pos logit contributions
 look+0.049
 Jane+0.042
 clean+0.041
 tidy+0.041
days+0.039
Neg logit contributions
aw-0.125
Run-0.123
ted-0.118
 trigger-0.118
 sudden-0.117
Token,
Token ID11
Feature activation-1.69e-02

Pos logit contributions
 trigger+0.148
Run+0.144
rots+0.143
plets+0.141
aw+0.140
Neg logit contributions
 look-0.031
 clean-0.022
 Jane-0.019
 tidy-0.017
 have-0.015
Token the
Token ID262
Feature activation-3.00e-02

Pos logit contributions
 look+0.020
 Jane+0.016
 clean+0.016
 tidy+0.015
 have+0.014
Neg logit contributions
aw-0.046
plets-0.046
 trigger-0.045
rots-0.044
rified-0.043
Token bird
Token ID6512
Feature activation-2.42e-03

Pos logit contributions
 look+0.048
 Jane+0.044
 clean+0.043
 tidy+0.043
days+0.041
Neg logit contributions
aw-0.061
Run-0.060
 trigger-0.059
plets-0.058
ted-0.058
Token flew
Token ID13112
Feature activation+3.79e-02

Pos logit contributions
 look+0.003
 Jane+0.003
 clean+0.003
 tidy+0.003
days+0.003
Neg logit contributions
aw-0.006
Run-0.006
ted-0.005
 trigger-0.005
plets-0.005
Token by
Token ID416
Feature activation+2.30e-02

Pos logit contributions
aw+0.080
Run+0.077
ted+0.076
 trigger+0.075
plets+0.073
Neg logit contributions
 look-0.058
days-0.053
 Jane-0.053
 tidy-0.051
 clean-0.051
Token and
Token ID290
Feature activation+2.19e-02

Pos logit contributions
aw+0.045
Run+0.044
 sudden+0.043
ted+0.043
 trigger+0.042
Neg logit contributions
 look-0.035
 tidy-0.033
 clean-0.032
 Jane-0.032
days-0.032
Token listened
Token ID16399
Feature activation-2.84e-02

Pos logit contributions
aw+0.045
ted+0.045
Run+0.045
 trigger+0.042
plets+0.042
Neg logit contributions
 look-0.033
days-0.030
 Jane-0.030
 tidy-0.030
 clean-0.029
Token to
Token ID284
Feature activation-6.49e-03

Pos logit contributions
 look+0.038
 Jane+0.034
days+0.033
 clean+0.033
 tidy+0.033
Neg logit contributions
aw-0.064
Run-0.062
ted-0.061
 trigger-0.061
plets-0.060
Token the
Token ID262
Feature activation-1.19e-02

Pos logit contributions
 look+0.009
days+0.009
 Jane+0.009
 tidy+0.009
 clean+0.009
Neg logit contributions
aw-0.013
Run-0.013
 trigger-0.013
 sudden-0.012
ted-0.012
Token family
Token ID1641
Feature activation-2.19e-02

Pos logit contributions
 look+0.020
 clean+0.018
 tidy+0.018
 Jane+0.018
days+0.017
Neg logit contributions
aw-0.023
Run-0.023
plets-0.022
ted-0.022
 trigger-0.022
TokenStop
Token ID19485
Feature activation+1.86e-02

Pos logit contributions
rots+0.093
aw+0.092
plets+0.087
ted+0.082
 trigger+0.081
Neg logit contributions
 clean-0.101
 look-0.100
 tidy-0.100
 Jane-0.092
 have-0.089
Token,
Token ID11
Feature activation+2.52e-02

Pos logit contributions
aw+0.045
Run+0.043
 trigger+0.043
plets+0.043
ted+0.042
Neg logit contributions
 look-0.024
 Jane-0.021
 clean-0.020
 tidy-0.019
days-0.018
Token dog
Token ID3290
Feature activation+9.47e-02

Pos logit contributions
aw+0.055
plets+0.054
rots+0.053
 sudden+0.052
ted+0.052
Neg logit contributions
 look-0.040
 clean-0.034
 Jane-0.033
 tidy-0.032
 have-0.030
Token,
Token ID11
Feature activation+3.05e-02

Pos logit contributions
plets+0.250
 sudden+0.250
aw+0.248
 trigger+0.246
Run+0.245
Neg logit contributions
 look-0.098
 clean-0.076
 Jane-0.074
 tidy-0.069
 have-0.060
Token stop
Token ID2245
Feature activation-1.36e-02

Pos logit contributions
aw+0.079
plets+0.078
rots+0.076
 sudden+0.076
Run+0.076
Neg logit contributions
 look-0.037
 clean-0.029
 Jane-0.028
 tidy-0.027
 have-0.025
Token!"
Token ID2474
Feature activation+4.66e-02

Pos logit contributions
 look+0.016
 clean+0.014
 Jane+0.013
 tidy+0.012
 Sus+0.011
Neg logit contributions
aw-0.034
rots-0.034
 trigger-0.033
plets-0.033
Run-0.033
Token Ben
Token ID3932
Feature activation+5.80e-02

Pos logit contributions
aw+0.120
rots+0.117
plets+0.116
Run+0.114
 trigger+0.114
Neg logit contributions
 look-0.056
 Jane-0.048
 clean-0.047
 tidy-0.044
 Sus-0.039
Token says
Token ID1139
Feature activation+1.84e-02

Pos logit contributions
plets+0.122
 sudden+0.114
izing+0.111
ging+0.110
 trigger+0.109
Neg logit contributions
 look-0.112
 clean-0.088
 lie-0.086
 have-0.084
 p-0.083
Token.
Token ID13
Feature activation+2.73e-03

Pos logit contributions
aw+0.052
Run+0.051
 trigger+0.050
plets+0.050
ted+0.049
Neg logit contributions
 look-0.016
 Jane-0.013
 clean-0.012
 tidy-0.012
days-0.011
Token "
Token ID366
Feature activation+7.11e-02

Pos logit contributions
aw+0.007
Run+0.006
plets+0.006
 trigger+0.006
ted+0.006
Neg logit contributions
 look-0.004
 Jane-0.003
 clean-0.003
 tidy-0.003
days-0.003
TokenYou
Token ID1639
Feature activation+1.11e-02

Pos logit contributions
rots+0.131
aw+0.122
ted+0.118
plets+0.116
 trigger+0.114
Neg logit contributions
 look-0.130
 clean-0.120
 have-0.116
 tidy-0.115
 play-0.106
Token "
Token ID366
Feature activation+6.02e-02

Pos logit contributions
 look+0.017
 Jane+0.015
 clean+0.015
 tidy+0.014
days+0.014
Neg logit contributions
aw-0.030
Run-0.030
 trigger-0.029
plets-0.029
ted-0.029
TokenMaybe
Token ID13300
Feature activation-2.89e-02

Pos logit contributions
ted+0.100
rots+0.100
 trigger+0.100
rified+0.096
plets+0.096
Neg logit contributions
 look-0.123
 clean-0.111
 tidy-0.110
days-0.106
 Jane-0.103
Token there
Token ID612
Feature activation-1.66e-02

Pos logit contributions
 look+0.048
 Jane+0.043
 clean+0.042
 tidy+0.041
days+0.038
Neg logit contributions
aw-0.060
 trigger-0.059
plets-0.059
Run-0.058
ted-0.058
Token is
Token ID318
Feature activation-3.33e-02

Pos logit contributions
 look+0.029
 Jane+0.028
 clean+0.027
 tidy+0.027
days+0.027
Neg logit contributions
aw-0.030
Run-0.028
rots-0.027
ted-0.026
plets-0.026
Token a
Token ID257
Feature activation+3.43e-02

Pos logit contributions
 look+0.045
days+0.041
 Jane+0.041
 tidy+0.041
 clean+0.041
Neg logit contributions
aw-0.074
Run-0.072
 trigger-0.069
ted-0.068
plets-0.068
Token treasure
Token ID14068
Feature activation-5.25e-04

Pos logit contributions
Run+0.068
 sudden+0.068
aw+0.067
 trigger+0.067
plets+0.062
Neg logit contributions
days-0.050
 Jane-0.050
 look-0.048
 Sus-0.045
 clean-0.043
Token in
Token ID287
Feature activation-2.28e-02

Pos logit contributions
 look+0.001
 Jane+0.001
 clean+0.001
 tidy+0.001
days+0.001
Neg logit contributions
aw-0.001
Run-0.001
 trigger-0.001
plets-0.001
ted-0.001
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
 look+0.031
days+0.030
 Jane+0.028
 tidy+0.027
 clean+0.027
Neg logit contributions
Run-0.049
aw-0.049
 trigger-0.046
 sudden-0.045
plets-0.045
Token hole
Token ID7604
Feature activation-5.07e-03

Pos logit contributions
aw+0.026
Run+0.025
 trigger+0.024
 sudden+0.024
plets+0.024
Neg logit contributions
 look-0.016
 Jane-0.015
days-0.015
 clean-0.014
 tidy-0.014
Token!"
Token ID2474
Feature activation+2.33e-03

Pos logit contributions
 look+0.007
 tidy+0.006
 Jane+0.006
 clean+0.006
days+0.006
Neg logit contributions
aw-0.012
Run-0.011
plets-0.011
 trigger-0.011
ted-0.011
Token
Token ID198
Feature activation+5.86e-03

Pos logit contributions
 trigger+0.006
plets+0.006
aw+0.005
rots+0.005
Run+0.005
Neg logit contributions
 look-0.003
 Jane-0.003
 clean-0.003
 tidy-0.002
 Sus-0.002
Token that
Token ID326
Feature activation-4.80e-02

Pos logit contributions
aw+0.007
Run+0.007
 trigger+0.007
plets+0.007
ted+0.006
Neg logit contributions
 look-0.004
 Jane-0.004
 clean-0.004
 tidy-0.003
days-0.003
Token his
Token ID465
Feature activation+6.29e-03

Pos logit contributions
 look+0.080
days+0.074
 Jane+0.073
 clean+0.070
 tidy+0.070
Neg logit contributions
Run-0.086
aw-0.085
 sudden-0.085
ted-0.085
 trigger-0.083
Token v
Token ID410
Feature activation-6.89e-02

Pos logit contributions
aw+0.014
Run+0.013
 trigger+0.013
plets+0.013
ted+0.013
Neg logit contributions
 look-0.009
 Jane-0.008
 clean-0.008
 tidy-0.008
days-0.008
Tokenase
Token ID589
Feature activation-4.65e-02

Pos logit contributions
 look+0.114
 clean+0.104
 Jane+0.104
 tidy+0.103
days+0.102
Neg logit contributions
Run-0.135
aw-0.132
 trigger-0.129
plets-0.126
ted-0.126
Token was
Token ID373
Feature activation+1.30e-02

Pos logit contributions
 look+0.061
 Jane+0.055
days+0.052
 clean+0.051
 tidy+0.051
Neg logit contributions
aw-0.107
Run-0.104
ted-0.103
 trigger-0.102
plets-0.100
Token broken
Token ID5445
Feature activation-1.22e-03

Pos logit contributions
 sudden+0.024
 trigger+0.023
aw+0.023
king+0.022
Run+0.022
Neg logit contributions
days-0.021
 look-0.019
 Jane-0.019
bi-0.017
 Sus-0.016
Token.
Token ID13
Feature activation-9.12e-04

Pos logit contributions
 look+0.002
 Jane+0.002
days+0.002
 tidy+0.001
 clean+0.001
Neg logit contributions
aw-0.003
 trigger-0.003
ted-0.002
Run-0.002
rified-0.002
Token He
Token ID679
Feature activation+2.06e-02

Pos logit contributions
 look+0.001
 Jane+0.001
 clean+0.001
 tidy+0.001
days+0.001
Neg logit contributions
aw-0.002
Run-0.002
 trigger-0.002
ted-0.002
plets-0.002
Token cried
Token ID16896
Feature activation+3.07e-02

Pos logit contributions
aw+0.044
Run+0.044
ted+0.042
 trigger+0.042
 sudden+0.042
Neg logit contributions
 look-0.030
 Jane-0.028
days-0.027
 clean-0.026
 tidy-0.026
Token and
Token ID290
Feature activation+1.95e-02

Pos logit contributions
aw+0.068
Run+0.068
plets+0.066
 trigger+0.065
rots+0.064
Neg logit contributions
 look-0.045
 clean-0.040
 tidy-0.039
 Jane-0.039
days-0.036
Token tried
Token ID3088
Feature activation+2.11e-02

Pos logit contributions
Run+0.036
aw+0.035
ted+0.035
 sudden+0.034
 trigger+0.034
Neg logit contributions
 look-0.034
days-0.032
 Jane-0.031
 tidy-0.030
 clean-0.030
Token bake
Token ID28450
Feature activation-2.75e-02

Pos logit contributions
days+0.009
any+0.008
bie+0.008
bi+0.007
my+0.007
Neg logit contributions
 trigger-0.012
aw-0.012
 arrows-0.012
Run-0.012
ted-0.012
Token a
Token ID257
Feature activation+2.68e-02

Pos logit contributions
 look+0.035
days+0.033
 Jane+0.031
 tidy+0.030
 clean+0.029
Neg logit contributions
aw-0.063
Run-0.055
ted-0.054
rots-0.054
plets-0.053
Token cake
Token ID12187
Feature activation-2.72e-02

Pos logit contributions
aw+0.063
Run+0.061
 trigger+0.060
plets+0.060
 sudden+0.059
Neg logit contributions
 look-0.034
 Jane-0.030
 clean-0.029
 tidy-0.029
days-0.029
Token.
Token ID13
Feature activation-6.64e-03

Pos logit contributions
 look+0.034
days+0.033
 Jane+0.032
 tidy+0.031
 clean+0.030
Neg logit contributions
aw-0.058
plets-0.054
 trigger-0.053
rots-0.053
ted-0.052
Token Leo
Token ID19632
Feature activation+5.79e-02

Pos logit contributions
 look+0.011
 Jane+0.011
 clean+0.009
 Lucy+0.009
 tidy+0.009
Neg logit contributions
aw-0.013
 trigger-0.013
plets-0.013
Run-0.013
ted-0.012
Token wanted
Token ID2227
Feature activation-1.93e-03

Pos logit contributions
aw+0.128
Run+0.124
ted+0.121
 sudden+0.119
rots+0.119
Neg logit contributions
 look-0.081
 Jane-0.071
days-0.069
 clean-0.069
 tidy-0.068
Token to
Token ID284
Feature activation-1.51e-02

Pos logit contributions
days+0.003
 look+0.003
 tidy+0.003
 smile+0.003
 clean+0.002
Neg logit contributions
aw-0.004
ted-0.004
 sudden-0.003
Run-0.003
rots-0.003
Token make
Token ID787
Feature activation+4.56e-03

Pos logit contributions
bi+0.020
days+0.019
bie+0.019
any+0.018
my+0.018
Neg logit contributions
Run-0.027
 trigger-0.026
 sudden-0.025
aw-0.025
plets-0.025
Token a
Token ID257
Feature activation+4.79e-02

Pos logit contributions
aw+0.010
Run+0.009
 trigger+0.009
ted+0.009
 sudden+0.009
Neg logit contributions
days-0.006
 look-0.006
 tidy-0.006
 Jane-0.006
 clean-0.006
Token y
Token ID331
Feature activation-1.93e-02

Pos logit contributions
aw+0.106
 trigger+0.102
 sudden+0.101
Run+0.100
rots+0.096
Neg logit contributions
 look-0.061
 Jane-0.059
days-0.058
 tidy-0.053
 clean-0.053
Tokenummy
Token ID13513
Feature activation-1.49e-02

Pos logit contributions
 look+0.041
 Jane+0.038
 clean+0.038
 tidy+0.037
days+0.037
Neg logit contributions
aw-0.029
Run-0.028
 trigger-0.027
plets-0.027
ted-0.027
Token snow
Token ID6729
Feature activation-3.79e-03

Pos logit contributions
aw+0.023
 sudden+0.022
rots+0.021
Run+0.021
plets+0.020
Neg logit contributions
 look-0.014
 Jane-0.013
days-0.013
 clean-0.012
 tidy-0.012
Tokenman
Token ID805
Feature activation+4.16e-02

Pos logit contributions
 look+0.006
 Jane+0.006
 tidy+0.006
 clean+0.006
days+0.005
Neg logit contributions
aw-0.008
ted-0.007
Run-0.007
 trigger-0.007
plets-0.007
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
aw+0.102
Run+0.098
ted+0.097
 trigger+0.096
plets+0.096
Neg logit contributions
 look-0.049
 Jane-0.043
 tidy-0.042
 clean-0.042
days-0.040
Token They
Token ID1119
Feature activation-1.25e-03

Pos logit contributions
 look+0.019
 tidy+0.018
 clean+0.018
days+0.017
 smile+0.016
Neg logit contributions
aw-0.019
Run-0.019
ted-0.018
plets-0.018
 trigger-0.017
Token couldn
Token ID3521
Feature activation+2.93e-02

Pos logit contributions
 look+0.002
days+0.002
 Jane+0.002
 tidy+0.002
 clean+0.002
Neg logit contributions
aw-0.003
ted-0.002
Run-0.002
 sudden-0.002
ked-0.002
Token't
Token ID470
Feature activation-5.54e-02

Pos logit contributions
Run+0.053
aw+0.053
plets+0.051
ted+0.050
 trigger+0.049
Neg logit contributions
 look-0.051
 tidy-0.048
 clean-0.047
 Jane-0.047
days-0.046
Token wait
Token ID4043
Feature activation+2.08e-02

Pos logit contributions
 Jane+0.074
 tidy+0.070
 look+0.069
 clean+0.068
bi+0.067
Neg logit contributions
Run-0.111
 trigger-0.108
plets-0.103
aw-0.103
 sudden-0.103
Token to
Token ID284
Feature activation-2.41e-04

Pos logit contributions
aw+0.042
Run+0.040
 sudden+0.040
ted+0.039
plets+0.037
Neg logit contributions
 look-0.028
days-0.028
 tidy-0.028
 Jane-0.027
 clean-0.026
Token play
Token ID711
Feature activation-5.27e-02

Pos logit contributions
days+0.000
 Jane+0.000
any+0.000
 Sus+0.000
bi+0.000
Neg logit contributions
Run-0.000
aw-0.000
 sudden-0.000
 trigger-0.000
ted-0.000
Token outside
Token ID2354
Feature activation-4.37e-02

Pos logit contributions
 look+0.081
 Jane+0.075
days+0.074
 tidy+0.074
 clean+0.073
Neg logit contributions
aw-0.108
Run-0.102
ted-0.098
 trigger-0.097
plets-0.096
Token again
Token ID757
Feature activation-3.29e-02

Pos logit contributions
 look+0.066
 tidy+0.059
 clean+0.058
 Jane+0.057
days+0.055
Neg logit contributions
aw-0.090
Run-0.086
ted-0.084
plets-0.082
 sudden-0.081
Token sky
Token ID6766
Feature activation-6.30e-02

Pos logit contributions
 look+0.085
 Jane+0.080
 clean+0.077
days+0.077
 tidy+0.077
Neg logit contributions
aw-0.084
Run-0.082
 trigger-0.080
 sudden-0.078
plets-0.078
Token is
Token ID318
Feature activation-5.04e-02

Pos logit contributions
 Jane+0.102
 look+0.102
days+0.095
 tidy+0.094
 clean+0.093
Neg logit contributions
aw-0.119
plets-0.112
Run-0.112
rots-0.109
ted-0.108
Token blue
Token ID4171
Feature activation-1.48e-02

Pos logit contributions
days+0.072
 look+0.068
 Jane+0.067
 Sus+0.060
|+0.059
Neg logit contributions
aw-0.097
 sudden-0.096
Run-0.092
plets-0.091
king-0.087
Token!"
Token ID2474
Feature activation-1.55e-02

Pos logit contributions
 look+0.022
 Jane+0.020
 tidy+0.020
 clean+0.020
days+0.019
Neg logit contributions
aw-0.031
plets-0.030
Run-0.029
 trigger-0.029
rots-0.029
Token
Token ID220
Feature activation-6.59e-02

Pos logit contributions
 look+0.020
 Jane+0.017
 clean+0.017
 tidy+0.017
days+0.016
Neg logit contributions
aw-0.037
Run-0.036
 trigger-0.035
plets-0.035
ted-0.035
Token
Token ID198
Feature activation-7.15e-03

Pos logit contributions
 look+0.092
 tidy+0.081
 clean+0.080
 Jane+0.080
days+0.078
Neg logit contributions
aw-0.148
Run-0.143
plets-0.140
 trigger-0.139
ted-0.138
Token
Token ID198
Feature activation-1.18e-02

Pos logit contributions
 look+0.009
 Jane+0.008
 clean+0.008
 tidy+0.008
days+0.008
Neg logit contributions
aw-0.017
Run-0.016
 trigger-0.016
plets-0.016
ted-0.016
TokenThe
Token ID464
Feature activation-7.17e-03

Pos logit contributions
 look+0.022
 Jane+0.022
 tidy+0.021
 clean+0.021
days+0.021
Neg logit contributions
 trigger-0.020
ted-0.019
aw-0.019
plets-0.019
Run-0.019
Token rh
Token ID9529
Feature activation+1.02e-02

Pos logit contributions
 look+0.012
 Jane+0.012
 clean+0.011
days+0.011
 tidy+0.011
Neg logit contributions
aw-0.013
Run-0.013
 trigger-0.013
 sudden-0.013
ted-0.013
Tokenino
Token ID2879
Feature activation+4.07e-02

Pos logit contributions
Run+0.020
 trigger+0.019
aw+0.019
plets+0.019
ted+0.019
Neg logit contributions
 look-0.016
 clean-0.015
 tidy-0.015
 Jane-0.015
days-0.015
Token was
Token ID373
Feature activation-1.03e-02

Pos logit contributions
aw+0.090
Run+0.087
ted+0.084
plets+0.082
 sudden+0.082
Neg logit contributions
 look-0.059
 Jane-0.053
 clean-0.052
 tidy-0.052
days-0.051
Token<|endoftext|>
Token ID50256
Feature activation+6.21e-02

Pos logit contributions
aw+0.029
Run+0.029
 trigger+0.028
plets+0.028
ted+0.028
Neg logit contributions
 look-0.017
 Jane-0.015
 clean-0.015
 tidy-0.015
days-0.014
TokenL
Token ID43
Feature activation-3.91e-02

Pos logit contributions
aw+0.101
Run+0.096
ted+0.094
rots+0.090
plets+0.090
Neg logit contributions
 look-0.125
 tidy-0.118
 clean-0.116
 Jane-0.114
days-0.112
Tokenily
Token ID813
Feature activation-3.66e-02

Pos logit contributions
 look+0.063
 tidy+0.057
 clean+0.056
days+0.056
 Jane+0.056
Neg logit contributions
aw-0.079
Run-0.073
ted-0.073
 trigger-0.071
plets-0.070
Token was
Token ID373
Feature activation-1.65e-02

Pos logit contributions
 look+0.055
days+0.051
 Jane+0.051
 tidy+0.047
 clean+0.046
Neg logit contributions
aw-0.079
ted-0.074
Run-0.073
plets-0.067
rified-0.067
Token a
Token ID257
Feature activation+8.34e-03

Pos logit contributions
 look+0.023
 Jane+0.020
days+0.020
 tidy+0.019
 clean+0.019
Neg logit contributions
aw-0.037
Run-0.035
 sudden-0.034
ted-0.034
plets-0.034
Token p
Token ID279
Feature activation-1.68e-01

Pos logit contributions
 sudden+0.016
aw+0.016
Run+0.016
plets+0.015
 trigger+0.014
Neg logit contributions
 look-0.013
 Jane-0.012
days-0.012
 Sus-0.011
 Lucy-0.010
Tokenanda
Token ID5282
Feature activation-6.34e-02

Pos logit contributions
 look+0.370
 Jane+0.350
 clean+0.330
 tidy+0.328
 Lucy+0.326
Neg logit contributions
aw-0.271
rots-0.223
plets-0.220
Run-0.212
ted-0.211
Token who
Token ID508
Feature activation-4.22e-02

Pos logit contributions
 look+0.083
 Jane+0.074
 tidy+0.070
 clean+0.069
days+0.068
Neg logit contributions
aw-0.151
ted-0.138
Run-0.138
plets-0.136
rots-0.136
Token liked
Token ID8288
Feature activation-7.52e-03

Pos logit contributions
 Jane+0.068
days+0.065
 look+0.064
 Sus+0.062
 tidy+0.057
Neg logit contributions
aw-0.082
ted-0.075
Run-0.065
rots-0.065
led-0.063
Token to
Token ID284
Feature activation-1.87e-02

Pos logit contributions
 look+0.012
days+0.011
 Jane+0.011
 tidy+0.010
 clean+0.010
Neg logit contributions
aw-0.014
ted-0.014
 sudden-0.013
Run-0.013
 trigger-0.012
Token deliver
Token ID5203
Feature activation+7.77e-03

Pos logit contributions
 Jane+0.024
days+0.023
bi+0.023
 Sus+0.022
bie+0.022
Neg logit contributions
Run-0.033
 trigger-0.032
aw-0.031
 arrows-0.031
Leave-0.031
Token.
Token ID13
Feature activation-3.36e-03

Pos logit contributions
 look+0.088
 Jane+0.087
 tidy+0.085
 clean+0.079
days+0.078
Neg logit contributions
aw-0.131
ted-0.124
 trigger-0.119
ays-0.119
plets-0.118
Token
Token ID220
Feature activation-5.85e-02

Pos logit contributions
 look+0.006
 Jane+0.005
 clean+0.005
 tidy+0.005
days+0.005
Neg logit contributions
aw-0.007
Run-0.006
 trigger-0.006
plets-0.006
ted-0.006
Token
Token ID198
Feature activation-5.16e-03

Pos logit contributions
 look+0.079
 Jane+0.069
 clean+0.069
 tidy+0.068
days+0.066
Neg logit contributions
aw-0.134
Run-0.130
plets-0.127
 trigger-0.127
ted-0.126
Token
Token ID198
Feature activation-1.11e-02

Pos logit contributions
 look+0.007
 clean+0.006
 Jane+0.006
 tidy+0.006
days+0.006
Neg logit contributions
aw-0.012
Run-0.012
 trigger-0.011
plets-0.011
ted-0.011
TokenThe
Token ID464
Feature activation-5.45e-03

Pos logit contributions
 look+0.023
 tidy+0.022
 clean+0.022
 Jane+0.022
days+0.021
Neg logit contributions
 trigger-0.017
aw-0.017
ted-0.016
plets-0.016
Run-0.015
Token town
Token ID3240
Feature activation-1.05e-01

Pos logit contributions
 look+0.009
 Jane+0.009
days+0.008
 Sus+0.008
 clean+0.007
Neg logit contributions
Run-0.010
 sudden-0.010
 trigger-0.010
aw-0.010
plets-0.009
Tokenspe
Token ID4125
Feature activation-4.78e-02

Pos logit contributions
 look+0.159
 Jane+0.142
 clean+0.140
 tidy+0.138
days+0.134
Neg logit contributions
aw-0.223
Run-0.218
 trigger-0.211
ted-0.209
plets-0.209
Tokenople
Token ID643
Feature activation-3.60e-02

Pos logit contributions
 look+0.098
 tidy+0.091
 clean+0.091
 Jane+0.091
days+0.089
Neg logit contributions
Run-0.072
 trigger-0.071
plets-0.070
 sudden-0.066
aw-0.066
Token thought
Token ID1807
Feature activation-3.54e-02

Pos logit contributions
 look+0.051
 Jane+0.049
 tidy+0.047
 clean+0.046
days+0.046
Neg logit contributions
aw-0.077
Run-0.075
ted-0.073
 sudden-0.072
 trigger-0.071
Token and
Token ID290
Feature activation-6.98e-03

Pos logit contributions
 look+0.048
 Jane+0.043
 clean+0.042
 tidy+0.042
days+0.041
Neg logit contributions
aw-0.080
Run-0.079
 trigger-0.076
ted-0.076
plets-0.076
Token thought
Token ID1807
Feature activation-4.45e-02

Pos logit contributions
 look+0.010
days+0.009
 tidy+0.009
 clean+0.009
 Jane+0.008
Neg logit contributions
aw-0.015
Run-0.015
ted-0.015
 sudden-0.014
rified-0.014
Token while
Token ID981
Feature activation+1.38e-01

Pos logit contributions
aw+0.088
 trigger+0.084
Run+0.083
ted+0.083
rots+0.082
Neg logit contributions
 look-0.066
 Jane-0.063
days-0.060
 Sus-0.057
 tidy-0.057
Token we
Token ID356
Feature activation+1.49e-02

Pos logit contributions
aw+0.201
Run+0.192
plets+0.184
ted+0.182
 trigger+0.182
Neg logit contributions
 look-0.295
 tidy-0.271
 Jane-0.271
 clean-0.271
days-0.269
Token go
Token ID467
Feature activation+1.26e-02

Pos logit contributions
aw+0.031
ted+0.030
Run+0.030
plets+0.030
 trigger+0.029
Neg logit contributions
 Jane-0.022
 Sus-0.019
days-0.017
 Lucy-0.017
 tidy-0.017
Token on
Token ID319
Feature activation+4.90e-03

Pos logit contributions
aw+0.023
ted+0.023
 trigger+0.022
plets+0.022
rots+0.021
Neg logit contributions
 Jane-0.019
 look-0.018
 tidy-0.018
days-0.017
 clean-0.017
Token the
Token ID262
Feature activation+1.07e-02

Pos logit contributions
aw+0.011
Run+0.010
 sudden+0.010
ted+0.010
 trigger+0.010
Neg logit contributions
 look-0.007
days-0.006
 Jane-0.006
 tidy-0.006
 clean-0.006
Token slide
Token ID10649
Feature activation-1.06e-01

Pos logit contributions
aw+0.021
 trigger+0.021
Run+0.020
ted+0.019
plets+0.019
Neg logit contributions
days-0.016
 Jane-0.016
 look-0.016
 Sus-0.015
 tidy-0.014
Token.
Token ID13
Feature activation+1.35e-02

Pos logit contributions
 look+0.136
 Jane+0.119
 tidy+0.117
 clean+0.115
days+0.110
Neg logit contributions
aw-0.255
ted-0.229
Run-0.227
plets-0.224
rots-0.221
Token It
Token ID632
Feature activation-1.26e-02

Pos logit contributions
aw+0.025
 trigger+0.024
 sudden+0.024
Run+0.024
ted+0.024
Neg logit contributions
 look-0.025
 Jane-0.023
 clean-0.022
 have-0.021
 tidy-0.021
Token will
Token ID481
Feature activation-1.48e-02

Pos logit contributions
 Jane+0.021
 look+0.021
days+0.020
 tidy+0.020
 clean+0.019
Neg logit contributions
aw-0.024
Run-0.022
rots-0.022
ted-0.022
plets-0.022
Token make
Token ID787
Feature activation+1.54e-02

Pos logit contributions
 Jane+0.019
days+0.018
 Sus+0.016
bi+0.015
any+0.015
Neg logit contributions
 trigger-0.032
Run-0.031
plets-0.030
aw-0.030
Leave-0.029
Token you
Token ID345
Feature activation+3.97e-02

Pos logit contributions
Run+0.032
plets+0.031
 trigger+0.031
ted+0.031
aw+0.030
Neg logit contributions
 look-0.026
 clean-0.023
 Jane-0.022
 tidy-0.022
 Sus-0.020
Token's
Token ID338
Feature activation+8.50e-03

Pos logit contributions
 look+0.046
 clean+0.043
 tidy+0.042
 Jane+0.039
my+0.037
Neg logit contributions
 trigger-0.067
plets-0.067
Run-0.067
aw-0.066
rots-0.064
Token mom
Token ID1995
Feature activation-8.86e-03

Pos logit contributions
Run+0.021
aw+0.021
plets+0.021
ted+0.020
rots+0.019
Neg logit contributions
 look-0.011
 clean-0.010
 tidy-0.010
 Jane-0.009
 smile-0.008
Token asked
Token ID1965
Feature activation-1.14e-02

Pos logit contributions
 look+0.011
days+0.010
 Jane+0.010
 tidy+0.009
 clean+0.009
Neg logit contributions
aw-0.020
ted-0.019
Run-0.019
 sudden-0.019
 trigger-0.018
Token him
Token ID683
Feature activation+2.34e-02

Pos logit contributions
 look+0.021
 Jane+0.019
 clean+0.019
 tidy+0.019
days+0.018
Neg logit contributions
aw-0.021
Run-0.020
ted-0.019
 trigger-0.019
plets-0.019
Token to
Token ID284
Feature activation-1.75e-02

Pos logit contributions
ted+0.041
aw+0.040
rots+0.040
 trigger+0.038
Run+0.037
Neg logit contributions
 look-0.034
days-0.034
 tidy-0.033
 clean-0.031
 Jane-0.031
Token clean
Token ID3424
Feature activation-1.15e-01

Pos logit contributions
 Jane+0.018
days+0.018
 look+0.017
 Sus+0.016
bi+0.016
Neg logit contributions
 trigger-0.038
Run-0.038
aw-0.038
 sudden-0.037
ted-0.036
Token his
Token ID465
Feature activation+2.42e-02

Pos logit contributions
 look+0.138
 Jane+0.119
 clean+0.118
 tidy+0.116
days+0.108
Neg logit contributions
aw-0.285
Run-0.281
 trigger-0.274
plets-0.274
ted-0.271
Token messy
Token ID24097
Feature activation-6.20e-02

Pos logit contributions
Run+0.052
aw+0.051
plets+0.051
 trigger+0.050
ted+0.050
Neg logit contributions
 look-0.038
 clean-0.035
 tidy-0.034
 Jane-0.032
days-0.030
Token room
Token ID2119
Feature activation-9.75e-02

Pos logit contributions
 look+0.105
 Jane+0.097
 clean+0.094
 tidy+0.093
days+0.093
Neg logit contributions
aw-0.120
Run-0.115
 trigger-0.114
ted-0.112
plets-0.111
Token.
Token ID13
Feature activation-2.43e-02

Pos logit contributions
bi+0.131
 look+0.131
 Jane+0.126
 smile+0.119
 Sus+0.116
Neg logit contributions
aw-0.172
izing-0.164
ted-0.162
 trigger-0.158
ising-0.158
Token Tim
Token ID5045
Feature activation-3.60e-02

Pos logit contributions
 look+0.034
days+0.032
 tidy+0.030
 clean+0.029
 smile+0.027
Neg logit contributions
Run-0.051
ted-0.051
aw-0.050
 sudden-0.048
 trigger-0.047
Token.
Token ID13
Feature activation+1.87e-03

Pos logit contributions
 look+0.027
 Jane+0.024
days+0.022
 tidy+0.022
 clean+0.022
Neg logit contributions
aw-0.057
ted-0.052
plets-0.052
Run-0.051
 trigger-0.051
Token Then
Token ID3244
Feature activation+3.53e-02

Pos logit contributions
aw+0.003
Run+0.003
plets+0.003
ted+0.003
 trigger+0.003
Neg logit contributions
 look-0.003
 tidy-0.003
 clean-0.003
days-0.003
 Jane-0.003
Token they
Token ID484
Feature activation+3.40e-02

Pos logit contributions
aw+0.074
Run+0.073
 trigger+0.070
ted+0.070
plets+0.070
Neg logit contributions
 look-0.054
 clean-0.049
 tidy-0.048
 Jane-0.048
days-0.047
Token decorated
Token ID24789
Feature activation-4.87e-02

Pos logit contributions
aw+0.075
ted+0.071
Run+0.070
plets+0.068
 trigger+0.068
Neg logit contributions
 look-0.047
 Jane-0.044
days-0.044
 tidy-0.040
 clean-0.040
Token the
Token ID262
Feature activation+4.94e-03

Pos logit contributions
 look+0.070
 Jane+0.062
 clean+0.061
 tidy+0.060
days+0.058
Neg logit contributions
aw-0.108
Run-0.106
 trigger-0.104
plets-0.103
ted-0.102
Token room
Token ID2119
Feature activation-1.09e-01

Pos logit contributions
aw+0.012
Run+0.012
 trigger+0.011
plets+0.011
ted+0.011
Neg logit contributions
 look-0.006
 Jane-0.005
 clean-0.005
 tidy-0.005
days-0.005
Token with
Token ID351
Feature activation-1.18e-02

Pos logit contributions
 look+0.197
days+0.186
 tidy+0.179
 Jane+0.179
 clean+0.177
Neg logit contributions
aw-0.177
plets-0.155
 sudden-0.153
ted-0.153
Run-0.147
Token other
Token ID584
Feature activation+5.50e-02

Pos logit contributions
 look+0.016
days+0.015
 Jane+0.015
 tidy+0.014
 clean+0.014
Neg logit contributions
aw-0.026
Run-0.025
ted-0.025
 trigger-0.025
 sudden-0.025
Token flowers
Token ID12734
Feature activation-2.55e-03

Pos logit contributions
aw+0.107
Run+0.105
 trigger+0.102
plets+0.101
ted+0.101
Neg logit contributions
 look-0.094
 Jane-0.085
 clean-0.085
 tidy-0.084
days-0.081
Token too
Token ID1165
Feature activation-6.26e-02

Pos logit contributions
 look+0.004
days+0.004
 Jane+0.004
 tidy+0.004
 clean+0.004
Neg logit contributions
aw-0.005
plets-0.004
ted-0.004
rots-0.004
Run-0.004
Token.
Token ID13
Feature activation+5.35e-03

Pos logit contributions
days+0.087
 look+0.084
 Jane+0.076
|+0.076
 tidy+0.072
Neg logit contributions
aw-0.136
plets-0.128
 sudden-0.123
Run-0.121
rots-0.118