TOP ACTIVATIONS
MAX = 0.247

He takes the screw driver and opens the hole .
my showed Billy his screw driver and they decided to build
the screw with this screw driver . Let 's find more
blue ocean with a paint brush in his hand .
, Jan found her tooth brush at her doorstep , but
to put on her tooth brush . One
born only a few days ago . The deer felt sorry
dirty trophy back in the sandbox . He would never have
story . A long time ago , there was an adventurous
in her daddy 's tool box . She thought it would
poking out of the book case ! Lisa was so excited
she took a thin paint brush and made tiny little circles
He was here a minute ago . He was chasing the
lived here a long time ago ." " Can
you carry a l ily pad too ?" The
our country a long time ago . He helped many people
Let ’ s go find something else
- up box in the attic ! She went to get
and walk through a metal detector . Tom and Lily thought
them into beautiful or nam ents . One day

MIN ACTIVATIONS
MIN = -0.147

, wearing their fast red swe aters . <|endoftext|> Once upon
, he noticed that the Sm elly King was cruel and
Spot wanted to thank Ch ir py for singing .
go with a stranger . Str angers can be bad and
With Simone 's gentle reass urance , Josh stepped into
very relieved and hugged the naughty boy . From then on
sat down to enjoy their humble dinner together . From that
getting very annoyed with his rud eness and complained that he
, feeling grateful for the kind stranger 's help . <|endoftext|>
too . They hoped the rude bird would change and help
ran over and grabbed the naughty boy , and gave him
told his family about the wise owl and they all went
sign that said " Hom eless and Hungry ". Tim my
Buzz y asked , Should
, all thanks to the wise owl 's delivery . <|endoftext|>
When they were finished the wise woman said , " You
happy . They thanked the wise old man and played with
long . Soon , the wicked man became greedy again and
he screamed and he said This
m op call after her Remember

INTERVAL 0.148 - 0.247
CONTAINS 0.236%

I ’ m sorry Jane . I
would supply them with things they needed like food , water
comes out and the dro plets of water in the sky
outside . Kitty and Dog gy were inside the house ,
Tim my met some friendly aliens who were also pretending to

INTERVAL 0.050 - 0.148
CONTAINS 11.724%

wearing her red m itten when it was cold outside .
. They sh ivered . They were cold and scared .
, a little boy named Tim went to the park with
tent . They even stayed up late into the night playing
that the little duck was not feeling well . They asked

INTERVAL -0.048 - 0.050
CONTAINS 75.974%

cried . Her mom smiled and said , "
in your eye ?" Sara asked . The frog
to feel gr umpy . He kept z ipping around the
to take them . He walked over to them and said
was getting dark outside , and it needed to find its

INTERVAL -0.147 - -0.048
CONTAINS 12.067%

Ben . " Run , run !" Sara says
't believe his eyes . The truck was so orange !
, all wrapped in the soft leaves . <|endoftext|> Once upon
better . He was really proud of his work .
for permission than to feel guilty . <|endoftext|> Once upon a
Token
Token ID198
Feature activation+8.10e-02

Pos logit contributions
 in+0.147
o+0.128
 got+0.122
 just+0.122
 not+0.121
Neg logit contributions
umm-0.221
 disapproval-0.192
Warning-0.190
 heavens-0.187
 horror-0.182
TokenHe
Token ID1544
Feature activation+1.46e-01

Pos logit contributions
 in+0.126
o+0.115
 not+0.108
 got+0.107
 found+0.107
Neg logit contributions
umm-0.173
 disapproval-0.143
 heavens-0.141
 horror-0.139
 horizon-0.138
Token takes
Token ID2753
Feature activation+6.37e-02

Pos logit contributions
 in+0.148
gy+0.146
o+0.136
 out+0.126
ie+0.118
Neg logit contributions
 disapproval-0.305
umm-0.299
Warning-0.297
 horror-0.297
 heavens-0.296
Token the
Token ID262
Feature activation+2.01e-02

Pos logit contributions
 in+0.051
 would+0.044
o+0.044
gy+0.043
 got+0.040
Neg logit contributions
umm-0.162
 disapproval-0.152
Warning-0.151
 heavens-0.148
 sentences-0.146
Token screw
Token ID9580
Feature activation+8.60e-02

Pos logit contributions
 in+0.032
 got+0.027
 just+0.027
gy+0.027
o+0.027
Neg logit contributions
umm-0.039
Warning-0.036
 disapproval-0.035
 heavens-0.034
 horror-0.033
Tokendriver
Token ID26230
Feature activation+1.94e-01

Pos logit contributions
 in+0.101
 found+0.091
 would+0.090
o+0.089
 got+0.089
Neg logit contributions
umm-0.181
Warning-0.166
 clerk-0.165
 disapproval-0.161
 heavens-0.160
Token and
Token ID290
Feature activation+9.50e-02

Pos logit contributions
gy+0.132
o+0.123
 in+0.122
 would+0.118
 found+0.106
Neg logit contributions
umm-0.452
 clerk-0.444
Warning-0.438
 heavens-0.430
 disapproval-0.430
Token opens
Token ID9808
Feature activation+9.59e-02

Pos logit contributions
o+0.124
gy+0.118
 in+0.116
ie+0.115
 would+0.098
Neg logit contributions
 subt-0.181
 horror-0.179
Warning-0.176
umm-0.174
 sentences-0.170
Token the
Token ID262
Feature activation+2.14e-02

Pos logit contributions
o+0.089
 would+0.081
gy+0.080
 got+0.077
 in+0.076
Neg logit contributions
umm-0.218
Warning-0.210
 horror-0.208
 disapproval-0.207
 delight-0.207
Token hole
Token ID7604
Feature activation+1.10e-01

Pos logit contributions
 in+0.038
 out+0.033
 just+0.033
 got+0.033
gy+0.032
Neg logit contributions
umm-0.038
Warning-0.034
 disapproval-0.034
 heavens-0.032
 gentleman-0.031
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
gy+0.091
 in+0.090
o+0.085
 got+0.085
 would+0.081
Neg logit contributions
umm-0.266
Warning-0.243
 disapproval-0.242
 heavens-0.239
 horizon-0.237
Tokenmy
Token ID1820
Feature activation+1.16e-01

Pos logit contributions
 in+0.085
 out+0.072
gy+0.069
 not+0.066
 there+0.065
Neg logit contributions
 disapproval-0.193
Warning-0.188
 Painter-0.187
umm-0.186
 heavens-0.185
Token showed
Token ID3751
Feature activation+4.38e-02

Pos logit contributions
 in+0.118
 out+0.108
gy+0.102
 there+0.098
 where+0.094
Neg logit contributions
 disapproval-0.215
 gentleman-0.215
 subt-0.213
 orchestra-0.210
 Painter-0.209
Token Billy
Token ID15890
Feature activation+8.31e-02

Pos logit contributions
 in+0.050
 would+0.042
 got+0.042
gy+0.042
o+0.041
Neg logit contributions
umm-0.099
 disapproval-0.092
Warning-0.090
 heavens-0.088
 horror-0.088
Token his
Token ID465
Feature activation+3.15e-02

Pos logit contributions
 in+0.065
 would+0.064
 will+0.063
 got+0.063
gy+0.061
Neg logit contributions
 disapproval-0.173
 clerk-0.168
 languages-0.167
 amusement-0.166
 sentences-0.166
Token screw
Token ID9580
Feature activation+8.10e-02

Pos logit contributions
 in+0.045
 got+0.039
o+0.038
gy+0.038
 just+0.038
Neg logit contributions
umm-0.065
Warning-0.059
 disapproval-0.059
 heavens-0.057
 jan-0.055
Tokendriver
Token ID26230
Feature activation+1.90e-01

Pos logit contributions
 in+0.098
o+0.090
ie+0.089
gy+0.088
 found+0.088
Neg logit contributions
umm-0.154
 clerk-0.146
Warning-0.146
 jan-0.141
 disapproval-0.139
Token and
Token ID290
Feature activation+7.96e-02

Pos logit contributions
gy+0.118
 in+0.111
o+0.109
 got+0.092
 would+0.088
Neg logit contributions
umm-0.456
 disapproval-0.451
Warning-0.448
 clerk-0.447
 heavens-0.434
Token they
Token ID484
Feature activation+1.12e-01

Pos logit contributions
 in+0.093
o+0.085
gy+0.081
ie+0.071
 just+0.070
Neg logit contributions
umm-0.161
 subt-0.160
 clerk-0.156
Warning-0.155
 jan-0.155
Token decided
Token ID3066
Feature activation+5.96e-02

Pos logit contributions
 in+0.097
o+0.085
gy+0.082
 not+0.079
?"+0.078
Neg logit contributions
umm-0.251
 disapproval-0.246
Warning-0.242
 subt-0.238
 horror-0.236
Token to
Token ID284
Feature activation+6.68e-02

Pos logit contributions
o+0.043
gy+0.040
ie+0.039
 in+0.037
bo+0.030
Neg logit contributions
Warning-0.139
 clerk-0.138
umm-0.137
 subt-0.136
Fighting-0.135
Token build
Token ID1382
Feature activation+6.68e-02

Pos logit contributions
 in+0.089
o+0.087
ie+0.075
 out+0.074
gy+0.073
Neg logit contributions
umm-0.130
Warning-0.118
 disapproval-0.117
 delight-0.115
 jan-0.114
Token the
Token ID262
Feature activation+2.98e-02

Pos logit contributions
 in+0.066
gy+0.058
o+0.056
 got+0.052
 would+0.049
Neg logit contributions
umm-0.240
Warning-0.226
 disapproval-0.224
 heavens-0.220
 horror-0.216
Token screw
Token ID9580
Feature activation+9.64e-02

Pos logit contributions
 in+0.049
gy+0.043
o+0.043
 got+0.043
 just+0.042
Neg logit contributions
umm-0.055
Warning-0.049
 disapproval-0.049
 heavens-0.048
 horror-0.046
Token with
Token ID351
Feature activation+7.90e-02

Pos logit contributions
o+0.077
gy+0.077
 in+0.076
 found+0.071
 not+0.068
Neg logit contributions
umm-0.226
Warning-0.208
 heavens-0.201
 clerk-0.201
 Painter-0.199
Token this
Token ID428
Feature activation+4.20e-02

Pos logit contributions
 in+0.063
gy+0.052
o+0.052
 got+0.049
 would+0.047
Neg logit contributions
umm-0.208
 disapproval-0.194
Warning-0.194
 heavens-0.190
 horror-0.187
Token screw
Token ID9580
Feature activation+8.51e-02

Pos logit contributions
 in+0.051
o+0.048
gy+0.047
 got+0.044
 not+0.043
Neg logit contributions
umm-0.089
Warning-0.080
 disapproval-0.079
 heavens-0.078
 horizon-0.077
Tokendriver
Token ID26230
Feature activation+1.85e-01

Pos logit contributions
 in+0.083
o+0.082
gy+0.080
 found+0.076
 got+0.073
Neg logit contributions
umm-0.182
Warning-0.168
 clerk-0.163
 heavens-0.160
 Painter-0.160
Token.
Token ID13
Feature activation+2.47e-02

Pos logit contributions
gy+0.155
o+0.152
 in+0.143
 found+0.121
 got+0.121
Neg logit contributions
umm-0.418
Warning-0.397
 heavens-0.388
 horizon-0.387
 disapproval-0.387
Token Let
Token ID3914
Feature activation+6.37e-02

Pos logit contributions
 in+0.038
 just+0.034
o+0.034
 not+0.032
 there+0.032
Neg logit contributions
umm-0.051
 disapproval-0.046
 sidelines-0.045
 gentleman-0.044
irk-0.043
Token's
Token ID338
Feature activation+5.69e-02

Pos logit contributions
 in+0.060
gy+0.058
o+0.057
 where+0.048
 not+0.044
Neg logit contributions
umm-0.144
Warning-0.140
 disapproval-0.136
 heavens-0.135
 clerk-0.133
Token find
Token ID1064
Feature activation+8.32e-02

Pos logit contributions
o+0.075
led+0.069
gy+0.064
eh+0.064
 in+0.063
Neg logit contributions
umm-0.101
 delight-0.099
 horror-0.096
 Math-0.094
 Teacher-0.093
Token more
Token ID517
Feature activation+3.11e-02

Pos logit contributions
gy+0.075
o+0.068
 in+0.066
 would+0.065
 got+0.062
Neg logit contributions
umm-0.192
Warning-0.184
 sentences-0.181
 horror-0.180
 heavens-0.177
Token blue
Token ID4171
Feature activation+4.96e-02

Pos logit contributions
 in+0.097
gy+0.091
o+0.091
 got+0.086
ie+0.085
Neg logit contributions
umm-0.132
 disapproval-0.123
 heavens-0.121
Warning-0.120
 horizon-0.117
Token ocean
Token ID9151
Feature activation+1.31e-01

Pos logit contributions
 in+0.087
o+0.082
gy+0.081
 not+0.080
 got+0.079
Neg logit contributions
umm-0.084
 disapproval-0.071
Warning-0.070
 heavens-0.070
 horizon-0.069
Token with
Token ID351
Feature activation+6.89e-02

Pos logit contributions
gy+0.134
o+0.129
 in+0.120
ie+0.110
 got+0.109
Neg logit contributions
umm-0.271
 horizon-0.265
 heavens-0.250
 gentleman-0.245
 disapproval-0.240
Token a
Token ID257
Feature activation+6.29e-02

Pos logit contributions
 in+0.081
gy+0.071
o+0.069
 got+0.066
 would+0.065
Neg logit contributions
umm-0.157
 disapproval-0.146
Warning-0.144
 heavens-0.141
 horror-0.138
Token paint
Token ID7521
Feature activation+7.78e-02

Pos logit contributions
 in+0.097
gy+0.084
o+0.084
 got+0.083
 just+0.082
Neg logit contributions
umm-0.123
 disapproval-0.111
Warning-0.111
 heavens-0.108
 horror-0.104
Tokenbrush
Token ID32680
Feature activation+1.84e-01

Pos logit contributions
 in+0.087
o+0.084
gy+0.083
 not+0.078
 got+0.076
Neg logit contributions
umm-0.174
 heavens-0.155
Warning-0.154
 disapproval-0.153
 horizon-0.153
Token in
Token ID287
Feature activation+7.12e-02

Pos logit contributions
o+0.179
gy+0.174
by+0.168
 searched+0.165
 would+0.161
Neg logit contributions
 horizon-0.316
umm-0.301
ement-0.299
 heavens-0.288
Warning-0.285
Token his
Token ID465
Feature activation+4.70e-02

Pos logit contributions
 in+0.056
o+0.041
gy+0.040
 got+0.040
 just+0.039
Neg logit contributions
umm-0.194
Warning-0.181
 disapproval-0.181
 heavens-0.177
 horror-0.172
Token hand
Token ID1021
Feature activation+7.08e-02

Pos logit contributions
 in+0.075
o+0.071
gy+0.069
 got+0.068
 not+0.067
Neg logit contributions
umm-0.080
 disapproval-0.072
 heavens-0.071
 horizon-0.070
 horror-0.069
Token.
Token ID13
Feature activation+2.91e-02

Pos logit contributions
 in+0.061
gy+0.055
o+0.054
 got+0.053
 not+0.051
Neg logit contributions
umm-0.165
 disapproval-0.153
Warning-0.152
 heavens-0.151
 horizon-0.149
Token
Token ID220
Feature activation+6.55e-02

Pos logit contributions
 in+0.046
o+0.041
 just+0.040
gy+0.039
 got+0.039
Neg logit contributions
umm-0.058
 disapproval-0.052
Warning-0.051
 heavens-0.049
 jan-0.048
Token,
Token ID11
Feature activation+7.60e-02

Pos logit contributions
gy+0.114
o+0.109
 in+0.101
 would+0.101
 found+0.092
Neg logit contributions
umm-0.267
 gentleman-0.253
 heavens-0.251
 orchestra-0.248
 horizon-0.245
Token Jan
Token ID2365
Feature activation+8.57e-02

Pos logit contributions
gy+0.089
 in+0.080
o+0.080
ie+0.071
 got+0.071
Neg logit contributions
umm-0.143
 gentleman-0.142
 clerk-0.138
 orchestra-0.137
 disapproval-0.136
Token found
Token ID1043
Feature activation+8.09e-02

Pos logit contributions
 in+0.100
gy+0.090
 not+0.090
o+0.087
 out+0.086
Neg logit contributions
umm-0.176
Warning-0.166
 disapproval-0.165
 heavens-0.164
 Painter-0.163
Token her
Token ID607
Feature activation+4.17e-02

Pos logit contributions
 in+0.055
gy+0.048
o+0.045
 got+0.044
 would+0.040
Neg logit contributions
umm-0.218
 disapproval-0.205
Warning-0.202
 heavens-0.202
 clerk-0.197
Token tooth
Token ID16162
Feature activation+6.61e-02

Pos logit contributions
 in+0.061
o+0.055
 got+0.054
gy+0.054
 found+0.052
Neg logit contributions
umm-0.082
 disapproval-0.073
Warning-0.072
 heavens-0.071
 jan-0.069
Tokenbrush
Token ID32680
Feature activation+1.80e-01

Pos logit contributions
o+0.084
gy+0.079
 in+0.073
 our+0.068
 not+0.068
Neg logit contributions
umm-0.127
 clerk-0.111
 heavens-0.108
 disapproval-0.108
 gentleman-0.107
Token at
Token ID379
Feature activation+6.40e-02

Pos logit contributions
o+0.147
 swim+0.146
gy+0.138
 in+0.129
 will+0.127
Neg logit contributions
umm-0.355
 clerk-0.345
ement-0.330
 horizon-0.324
 disapproval-0.321
Token her
Token ID607
Feature activation+3.50e-02

Pos logit contributions
 in+0.064
o+0.053
 got+0.053
gy+0.053
 just+0.050
Neg logit contributions
umm-0.156
Warning-0.145
 disapproval-0.144
 heavens-0.141
 horror-0.137
Token doorstep
Token ID41594
Feature activation+1.15e-01

Pos logit contributions
 in+0.053
o+0.050
 got+0.048
gy+0.048
 found+0.046
Neg logit contributions
umm-0.066
 disapproval-0.057
 heavens-0.057
Warning-0.056
 horror-0.055
Token,
Token ID11
Feature activation+5.38e-02

Pos logit contributions
 in+0.098
o+0.093
gy+0.091
 got+0.083
 found+0.077
Neg logit contributions
umm-0.285
 disapproval-0.259
Warning-0.258
 heavens-0.256
 horizon-0.252
Token but
Token ID475
Feature activation+4.05e-02

Pos logit contributions
 in+0.055
gy+0.054
o+0.052
ie+0.046
 got+0.045
Neg logit contributions
umm-0.119
 disapproval-0.116
 heavens-0.113
Warning-0.111
 horror-0.110
Token to
Token ID284
Feature activation+9.22e-02

Pos logit contributions
 in+0.112
gy+0.106
o+0.103
ie+0.091
 out+0.084
Neg logit contributions
umm-0.292
 disapproval-0.284
 heavens-0.279
Warning-0.278
 subt-0.268
Token put
Token ID1234
Feature activation+7.71e-02

Pos logit contributions
o+0.117
 in+0.109
gy+0.099
 would+0.094
 got+0.092
Neg logit contributions
 disapproval-0.186
umm-0.186
Warning-0.177
 heavens-0.177
 delight-0.173
Token on
Token ID319
Feature activation+7.31e-02

Pos logit contributions
gy+0.060
o+0.057
 in+0.053
 got+0.050
 would+0.049
Neg logit contributions
umm-0.188
 disapproval-0.176
Warning-0.171
 heavens-0.170
 clerk-0.168
Token her
Token ID607
Feature activation+3.58e-02

Pos logit contributions
 in+0.073
o+0.065
gy+0.063
 got+0.062
 would+0.059
Neg logit contributions
umm-0.173
 disapproval-0.162
Warning-0.160
 heavens-0.158
 subt-0.153
Token tooth
Token ID16162
Feature activation+6.17e-02

Pos logit contributions
o+0.050
 in+0.048
gy+0.048
 got+0.046
 found+0.043
Neg logit contributions
umm-0.064
 disapproval-0.060
 heavens-0.058
Warning-0.057
 horizon-0.056
Tokenbrush
Token ID32680
Feature activation+1.79e-01

Pos logit contributions
o+0.095
 in+0.093
gy+0.091
 got+0.088
 not+0.087
Neg logit contributions
umm-0.102
 clerk-0.089
 heavens-0.087
 disapproval-0.087
Warning-0.084
Token.
Token ID13
Feature activation+3.11e-02

Pos logit contributions
o+0.141
 in+0.139
 got+0.139
gy+0.133
 found+0.132
Neg logit contributions
umm-0.367
 clerk-0.362
 horizon-0.347
 disapproval-0.342
Warning-0.335
Token
Token ID220
Feature activation+6.66e-02

Pos logit contributions
 in+0.050
o+0.045
 just+0.045
 got+0.043
gy+0.043
Neg logit contributions
umm-0.061
Warning-0.054
 disapproval-0.054
 jan-0.051
 heavens-0.051
Token
Token ID198
Feature activation+5.21e-02

Pos logit contributions
 in+0.099
 got+0.086
gy+0.086
 just+0.085
 would+0.083
Neg logit contributions
umm-0.128
Warning-0.119
 disapproval-0.116
 heavens-0.114
 clerk-0.111
Token
Token ID198
Feature activation+3.99e-02

Pos logit contributions
 in+0.051
o+0.045
 found+0.039
 not+0.039
 got+0.038
Neg logit contributions
umm-0.142
 disapproval-0.125
 horror-0.123
Warning-0.121
 jan-0.121
TokenOne
Token ID3198
Feature activation+3.84e-02

Pos logit contributions
 in+0.060
o+0.053
 not+0.051
 got+0.051
 found+0.051
Neg logit contributions
umm-0.088
 horror-0.075
 disapproval-0.073
 horizon-0.071
 heavens-0.070
Token born
Token ID4642
Feature activation+4.82e-02

Pos logit contributions
gy+0.090
o+0.086
 in+0.077
ie+0.076
 would+0.073
Neg logit contributions
 disapproval-0.140
 heavens-0.134
umm-0.134
 horror-0.133
 anticipation-0.130
Token only
Token ID691
Feature activation+6.68e-02

Pos logit contributions
 in+0.042
o+0.039
gy+0.038
 got+0.035
 just+0.033
Neg logit contributions
umm-0.118
 disapproval-0.111
 heavens-0.108
Warning-0.108
 gentleman-0.105
Token a
Token ID257
Feature activation+4.73e-02

Pos logit contributions
 in+0.070
o+0.065
gy+0.063
 got+0.060
 not+0.057
Neg logit contributions
umm-0.152
 disapproval-0.144
Warning-0.140
 heavens-0.139
 horror-0.137
Token few
Token ID1178
Feature activation+4.79e-02

Pos logit contributions
 in+0.076
o+0.065
gy+0.065
 got+0.065
 just+0.064
Neg logit contributions
umm-0.090
Warning-0.081
 disapproval-0.081
 heavens-0.079
 horror-0.075
Token days
Token ID1528
Feature activation+6.69e-02

Pos logit contributions
 in+0.093
o+0.083
gy+0.083
 got+0.082
 just+0.082
Neg logit contributions
umm-0.075
Warning-0.066
 disapproval-0.066
 heavens-0.063
 horror-0.060
Token ago
Token ID2084
Feature activation+1.77e-01

Pos logit contributions
 in+0.054
gy+0.049
o+0.048
 not+0.045
 just+0.044
Neg logit contributions
umm-0.168
Warning-0.154
 disapproval-0.153
 heavens-0.152
 gentleman-0.148
Token.
Token ID13
Feature activation+2.74e-02

Pos logit contributions
o+0.160
 in+0.152
gy+0.148
 got+0.137
ie+0.132
Neg logit contributions
umm-0.390
Warning-0.362
 disapproval-0.361
 clerk-0.358
 gentleman-0.358
Token The
Token ID383
Feature activation-5.78e-03

Pos logit contributions
 in+0.041
o+0.036
 just+0.036
gy+0.035
 got+0.034
Neg logit contributions
umm-0.056
Warning-0.051
 disapproval-0.051
 heavens-0.049
 jan-0.048
Token deer
Token ID20096
Feature activation+8.83e-02

Pos logit contributions
umm+0.011
Warning+0.010
 disapproval+0.009
 heavens+0.009
 horror+0.009
Neg logit contributions
 in-0.010
 just-0.009
 got-0.009
 would-0.008
gy-0.008
Token felt
Token ID2936
Feature activation+4.52e-02

Pos logit contributions
 in+0.070
 out+0.059
gy+0.059
o+0.058
 not+0.054
Neg logit contributions
umm-0.212
 disapproval-0.206
 heavens-0.203
Warning-0.202
 gentleman-0.196
Token sorry
Token ID7926
Feature activation+2.87e-02

Pos logit contributions
 in+0.058
o+0.057
gy+0.057
 would+0.051
 got+0.050
Neg logit contributions
umm-0.090
 disapproval-0.088
 heavens-0.086
Warning-0.083
 horror-0.082
Token dirty
Token ID11841
Feature activation+9.85e-02

Pos logit contributions
umm+0.207
 disapproval+0.189
Warning+0.189
 heavens+0.185
 horror+0.180
Neg logit contributions
 in-0.108
o-0.091
gy-0.090
 got-0.088
 not-0.087
Token trophy
Token ID16383
Feature activation+1.92e-01

Pos logit contributions
o+0.104
gy+0.099
 in+0.095
 found+0.090
 not+0.088
Neg logit contributions
umm-0.220
 disapproval-0.191
Warning-0.188
 horizon-0.186
 clerk-0.186
Token back
Token ID736
Feature activation+1.21e-01

Pos logit contributions
gy+0.177
o+0.157
 in+0.146
 got+0.132
 not+0.128
Neg logit contributions
umm-0.429
Warning-0.393
 clerk-0.384
 heavens-0.382
 Painter-0.380
Token in
Token ID287
Feature activation+1.10e-01

Pos logit contributions
gy+0.091
o+0.087
 in+0.083
 would+0.078
 got+0.077
Neg logit contributions
umm-0.272
 disapproval-0.258
 horizon-0.255
 gentleman-0.254
 heavens-0.254
Token the
Token ID262
Feature activation+5.56e-02

Pos logit contributions
 in+0.102
 got+0.075
 just+0.075
gy+0.074
o+0.073
Neg logit contributions
umm-0.291
Warning-0.269
 disapproval-0.266
 heavens-0.262
 jan-0.254
Token sandbox
Token ID35204
Feature activation+1.75e-01

Pos logit contributions
 in+0.099
o+0.087
gy+0.087
 got+0.087
 just+0.086
Neg logit contributions
umm-0.097
Warning-0.086
 disapproval-0.086
 heavens-0.083
 horror-0.079
Token.
Token ID13
Feature activation+4.66e-02

Pos logit contributions
gy+0.181
o+0.168
 in+0.159
 got+0.149
 would+0.142
Neg logit contributions
umm-0.379
 horizon-0.340
 heavens-0.334
 disapproval-0.332
 clerk-0.329
Token He
Token ID679
Feature activation+1.28e-01

Pos logit contributions
 in+0.070
o+0.063
 just+0.060
 not+0.059
 there+0.059
Neg logit contributions
umm-0.098
 disapproval-0.087
Warning-0.086
 heavens-0.083
 jan-0.081
Token would
Token ID561
Feature activation+1.04e-01

Pos logit contributions
gy+0.142
 in+0.131
o+0.129
 out+0.118
ie+0.110
Neg logit contributions
 disapproval-0.257
 horror-0.253
 subt-0.248
 jan-0.247
Warning-0.247
Token never
Token ID1239
Feature activation+1.09e-01

Pos logit contributions
o+0.118
 in+0.109
gy+0.108
ie+0.096
ies+0.091
Neg logit contributions
 disapproval-0.214
 sidelines-0.209
umm-0.207
 heavens-0.206
 delight-0.203
Token have
Token ID423
Feature activation+8.39e-02

Pos logit contributions
 in+0.171
o+0.167
gy+0.165
 out+0.151
 up+0.145
Neg logit contributions
umm-0.188
 disapproval-0.178
Warning-0.171
 horror-0.165
 sidelines-0.165
Token story
Token ID1621
Feature activation+1.02e-01

Pos logit contributions
 in+0.120
 would+0.110
 just+0.107
 not+0.105
 got+0.105
Neg logit contributions
umm-0.138
Warning-0.124
 disapproval-0.120
 subt-0.112
 jan-0.111
Token.
Token ID13
Feature activation+3.35e-02

Pos logit contributions
 in+0.094
gy+0.083
 got+0.079
o+0.078
 found+0.074
Neg logit contributions
umm-0.244
Warning-0.229
 heavens-0.227
 disapproval-0.223
 sentences-0.220
Token A
Token ID317
Feature activation+3.75e-02

Pos logit contributions
 in+0.055
o+0.049
 just+0.048
gy+0.047
 not+0.046
Neg logit contributions
umm-0.065
 disapproval-0.058
Warning-0.057
 sidelines-0.055
 jan-0.054
Token long
Token ID890
Feature activation+2.54e-02

Pos logit contributions
 in+0.061
 would+0.055
 got+0.053
 just+0.053
 not+0.051
Neg logit contributions
umm-0.076
Warning-0.069
 disapproval-0.067
 horror-0.062
 heavens-0.062
Token time
Token ID640
Feature activation+7.93e-02

Pos logit contributions
 in+0.039
o+0.035
gy+0.034
 just+0.033
 out+0.033
Neg logit contributions
umm-0.047
 disapproval-0.043
Warning-0.042
 heavens-0.042
 gentleman-0.041
Token ago
Token ID2084
Feature activation+1.75e-01

Pos logit contributions
 in+0.074
gy+0.072
o+0.072
ie+0.065
 got+0.064
Neg logit contributions
umm-0.175
 heavens-0.166
Warning-0.166
 disapproval-0.165
 horror-0.163
Token,
Token ID11
Feature activation+5.31e-02

Pos logit contributions
o+0.166
 in+0.164
gy+0.161
ie+0.151
 just+0.146
Neg logit contributions
umm-0.361
Warning-0.349
 Painter-0.345
 clerk-0.341
 sentences-0.335
Token there
Token ID612
Feature activation+1.04e-01

Pos logit contributions
o+0.046
gy+0.044
 got+0.039
ie+0.038
 in+0.038
Neg logit contributions
umm-0.123
Warning-0.120
 heavens-0.116
 Cars-0.115
 orchestra-0.114
Token was
Token ID373
Feature activation+4.40e-02

Pos logit contributions
 in+0.092
gy+0.073
o+0.071
 got+0.067
 just+0.067
Neg logit contributions
umm-0.266
Warning-0.249
 disapproval-0.244
 heavens-0.242
 Painter-0.236
Token an
Token ID281
Feature activation+1.06e-02

Pos logit contributions
 in+0.030
gy+0.021
o+0.021
 got+0.021
 just+0.019
Neg logit contributions
umm-0.123
Warning-0.115
 disapproval-0.114
 heavens-0.113
 subt-0.110
Token adventurous
Token ID39770
Feature activation+1.27e-02

Pos logit contributions
 in+0.019
 just+0.016
ie+0.016
?"+0.016
 out+0.015
Neg logit contributions
umm-0.020
Warning-0.019
 disapproval-0.019
 subt-0.018
Fighting-0.017
Token in
Token ID287
Feature activation+1.42e-01

Pos logit contributions
umm+0.169
Warning+0.160
 disapproval+0.157
irk+0.156
 narrative+0.153
Neg logit contributions
 in-0.097
 got-0.081
 just-0.081
 to-0.079
 not-0.078
Token her
Token ID607
Feature activation+1.02e-01

Pos logit contributions
o+0.147
gy+0.140
 in+0.134
 would+0.131
 got+0.127
Neg logit contributions
 disapproval-0.309
umm-0.306
 horror-0.303
Warning-0.300
 anticipation-0.297
Token daddy
Token ID35822
Feature activation+1.40e-01

Pos logit contributions
o+0.176
 in+0.167
gy+0.163
 got+0.161
 would+0.158
Neg logit contributions
umm-0.147
 horror-0.141
 heavens-0.139
 horizon-0.139
 disapproval-0.138
Token's
Token ID338
Feature activation+7.77e-02

Pos logit contributions
gy+0.129
 in+0.113
o+0.113
 not+0.113
 would+0.106
Neg logit contributions
umm-0.280
 anticipation-0.278
 heavens-0.273
 clerk-0.272
 delight-0.272
Token tool
Token ID2891
Feature activation+1.23e-01

Pos logit contributions
 in+0.127
o+0.125
 got+0.120
gy+0.119
 would+0.117
Neg logit contributions
umm-0.132
 disapproval-0.121
 heavens-0.117
 horror-0.115
 clerk-0.113
Tokenbox
Token ID3524
Feature activation+1.73e-01

Pos logit contributions
 in+0.120
o+0.111
gy+0.111
 got+0.104
 just+0.102
Neg logit contributions
umm-0.269
 clerk-0.250
Warning-0.247
 heavens-0.244
 disapproval-0.242
Token.
Token ID13
Feature activation+4.06e-02

Pos logit contributions
o+0.151
gy+0.147
 in+0.137
 got+0.123
 would+0.123
Neg logit contributions
umm-0.376
 clerk-0.365
Warning-0.347
 disapproval-0.343
 heavens-0.341
Token She
Token ID1375
Feature activation+1.26e-01

Pos logit contributions
 in+0.071
o+0.062
 just+0.061
 got+0.061
 there+0.060
Neg logit contributions
umm-0.074
 disapproval-0.067
Warning-0.066
 heavens-0.062
 jan-0.061
Token thought
Token ID1807
Feature activation+7.80e-02

Pos logit contributions
gy+0.131
 in+0.129
o+0.121
ie+0.111
 out+0.110
Neg logit contributions
 subt-0.261
 horror-0.256
 disapproval-0.256
umm-0.251
 heavens-0.249
Token it
Token ID340
Feature activation+9.76e-02

Pos logit contributions
 in+0.057
gy+0.056
o+0.055
 got+0.046
 found+0.043
Neg logit contributions
umm-0.199
Warning-0.187
 disapproval-0.186
 horror-0.183
 heavens-0.183
Token would
Token ID561
Feature activation+9.09e-02

Pos logit contributions
 in+0.066
o+0.054
gy+0.048
 where+0.042
 not+0.042
Neg logit contributions
umm-0.252
Warning-0.234
 heavens-0.231
 disapproval-0.231
 Painter-0.225
Token poking
Token ID39048
Feature activation+9.33e-02

Pos logit contributions
 in+0.111
gy+0.097
o+0.094
 got+0.090
 not+0.088
Neg logit contributions
umm-0.320
 disapproval-0.304
Warning-0.303
 heavens-0.299
 gentleman-0.296
Token out
Token ID503
Feature activation+1.24e-01

Pos logit contributions
 in+0.074
o+0.060
 got+0.060
gy+0.059
 would+0.059
Neg logit contributions
umm-0.240
 disapproval-0.228
Warning-0.226
 heavens-0.218
 horror-0.214
Token of
Token ID286
Feature activation+1.03e-01

Pos logit contributions
 in+0.113
gy+0.094
o+0.094
 got+0.092
 would+0.090
Neg logit contributions
umm-0.310
 disapproval-0.291
Warning-0.289
 heavens-0.281
 horror-0.277
Token the
Token ID262
Feature activation+2.84e-02

Pos logit contributions
 in+0.098
o+0.084
gy+0.082
 got+0.081
 would+0.080
Neg logit contributions
umm-0.254
 disapproval-0.241
Warning-0.237
 heavens-0.231
 horror-0.228
Token book
Token ID1492
Feature activation+9.49e-02

Pos logit contributions
 in+0.049
 got+0.043
o+0.043
gy+0.042
 just+0.042
Neg logit contributions
umm-0.051
Warning-0.045
 disapproval-0.045
 heavens-0.044
 horror-0.042
Tokencase
Token ID7442
Feature activation+1.73e-01

Pos logit contributions
 in+0.084
gy+0.068
o+0.067
 got+0.066
 just+0.064
Neg logit contributions
umm-0.243
Warning-0.225
 disapproval-0.223
 heavens-0.220
 horror-0.213
Token!
Token ID0
Feature activation+3.62e-02

Pos logit contributions
 in+0.146
o+0.140
gy+0.134
 got+0.128
 found+0.120
Neg logit contributions
umm-0.403
Warning-0.379
 disapproval-0.372
 heavens-0.370
 horizon-0.366
Token Lisa
Token ID15378
Feature activation+9.79e-02

Pos logit contributions
 in+0.060
o+0.053
 just+0.052
gy+0.051
 got+0.050
Neg logit contributions
umm-0.071
 disapproval-0.062
Warning-0.062
 heavens-0.059
irk-0.059
Token was
Token ID373
Feature activation+7.95e-02

Pos logit contributions
 in+0.080
gy+0.067
o+0.065
 out+0.064
 not+0.061
Neg logit contributions
umm-0.229
 disapproval-0.227
 heavens-0.223
Warning-0.222
 horror-0.216
Token so
Token ID523
Feature activation+5.85e-02

Pos logit contributions
 in+0.097
gy+0.088
 would+0.088
o+0.087
 got+0.085
Neg logit contributions
umm-0.165
 disapproval-0.159
 heavens-0.154
 horror-0.152
Warning-0.151
Token excited
Token ID6568
Feature activation+5.27e-02

Pos logit contributions
 in+0.094
 got+0.080
 just+0.080
o+0.080
gy+0.079
Neg logit contributions
umm-0.114
Warning-0.103
 disapproval-0.102
 heavens-0.099
 horror-0.095
Token she
Token ID673
Feature activation+1.20e-01

Pos logit contributions
 in+0.059
gy+0.058
o+0.057
 would+0.052
ie+0.052
Neg logit contributions
umm-0.119
Warning-0.112
 disapproval-0.112
 clerk-0.112
 gentleman-0.111
Token took
Token ID1718
Feature activation+6.11e-02

Pos logit contributions
gy+0.127
 in+0.108
o+0.099
 where+0.090
 out+0.086
Neg logit contributions
umm-0.245
 horror-0.242
 Painter-0.241
 heavens-0.241
 disapproval-0.240
Token a
Token ID257
Feature activation+5.70e-02

Pos logit contributions
 in+0.047
gy+0.037
o+0.037
 would+0.035
 got+0.034
Neg logit contributions
umm-0.165
 disapproval-0.153
Warning-0.153
 heavens-0.149
 horror-0.147
Token thin
Token ID7888
Feature activation+1.31e-02

Pos logit contributions
 in+0.079
gy+0.074
o+0.070
 not+0.069
 just+0.069
Neg logit contributions
umm-0.109
 disapproval-0.102
Warning-0.101
 heavens-0.101
 sidelines-0.098
Token paint
Token ID7521
Feature activation+7.23e-02

Pos logit contributions
 in+0.020
o+0.018
gy+0.018
 not+0.018
 just+0.017
Neg logit contributions
umm-0.025
 disapproval-0.023
Warning-0.022
 heavens-0.022
 gentleman-0.021
Tokenbrush
Token ID32680
Feature activation+1.72e-01

Pos logit contributions
 in+0.099
gy+0.090
o+0.090
 got+0.086
 not+0.085
Neg logit contributions
umm-0.150
Warning-0.135
 disapproval-0.133
 heavens-0.132
 clerk-0.127
Token and
Token ID290
Feature activation+9.23e-02

Pos logit contributions
ts+0.146
gy+0.135
 dug+0.134
 searched+0.133
o+0.132
Neg logit contributions
umm-0.332
 clerk-0.322
 horizon-0.321
 heavens-0.307
ula-0.305
Token made
Token ID925
Feature activation+6.22e-02

Pos logit contributions
gy+0.110
o+0.108
 in+0.098
ie+0.090
 not+0.080
Neg logit contributions
 subt-0.187
umm-0.187
 heavens-0.178
Warning-0.175
 clerk-0.174
Token tiny
Token ID7009
Feature activation+1.06e-02

Pos logit contributions
 in+0.046
gy+0.039
o+0.037
 got+0.034
 would+0.034
Neg logit contributions
umm-0.166
 disapproval-0.155
Warning-0.154
 heavens-0.152
 sentences-0.149
Token little
Token ID1310
Feature activation+1.53e-02

Pos logit contributions
 in+0.019
 just+0.017
 got+0.017
gy+0.016
o+0.016
Neg logit contributions
umm-0.018
Warning-0.017
 disapproval-0.016
 heavens-0.016
 jan-0.015
Token circles
Token ID13332
Feature activation+7.03e-02

Pos logit contributions
 in+0.027
 just+0.023
 got+0.023
 out+0.023
gy+0.023
Neg logit contributions
umm-0.027
Warning-0.025
 disapproval-0.025
 heavens-0.024
 jan-0.023
Token He
Token ID679
Feature activation+1.11e-01

Pos logit contributions
 in+0.033
o+0.028
gy+0.028
 just+0.028
 got+0.027
Neg logit contributions
umm-0.043
Warning-0.039
 disapproval-0.039
 heavens-0.037
 sidelines-0.036
Token was
Token ID373
Feature activation+7.68e-02

Pos logit contributions
 in+0.118
gy+0.103
 where+0.101
 out+0.100
o+0.095
Neg logit contributions
umm-0.216
 disapproval-0.216
 heavens-0.214
 horror-0.211
Warning-0.207
Token here
Token ID994
Feature activation+8.76e-02

Pos logit contributions
gy+0.082
 would+0.079
ie+0.077
o+0.075
 got+0.074
Neg logit contributions
umm-0.159
 disapproval-0.157
 horror-0.153
 heavens-0.152
 jan-0.151
Token a
Token ID257
Feature activation+5.13e-02

Pos logit contributions
 in+0.054
gy+0.053
 got+0.047
o+0.046
 found+0.042
Neg logit contributions
umm-0.226
 heavens-0.212
 disapproval-0.211
Warning-0.211
 clerk-0.205
Token minute
Token ID5664
Feature activation+5.28e-02

Pos logit contributions
 in+0.079
 just+0.068
 would+0.067
 got+0.066
 out+0.065
Neg logit contributions
umm-0.105
Warning-0.099
 disapproval-0.095
 heavens-0.090
 subt-0.089
Token ago
Token ID2084
Feature activation+1.70e-01

Pos logit contributions
 in+0.044
gy+0.043
o+0.039
 got+0.039
 found+0.037
Neg logit contributions
umm-0.126
 heavens-0.115
Warning-0.114
 disapproval-0.114
 horror-0.111
Token.
Token ID13
Feature activation+2.42e-02

Pos logit contributions
o+0.159
gy+0.157
 in+0.155
ie+0.144
 just+0.142
Neg logit contributions
umm-0.351
 clerk-0.332
Warning-0.328
 gentleman-0.326
 Painter-0.324
Token He
Token ID679
Feature activation+1.06e-01

Pos logit contributions
 in+0.044
o+0.038
 just+0.038
 got+0.038
gy+0.038
Neg logit contributions
umm-0.042
 disapproval-0.037
Warning-0.037
 heavens-0.035
 sidelines-0.034
Token was
Token ID373
Feature activation+6.73e-02

Pos logit contributions
 in+0.098
gy+0.085
 out+0.085
o+0.075
 where+0.073
Neg logit contributions
umm-0.229
 heavens-0.229
 disapproval-0.228
Warning-0.225
 jan-0.222
Token chasing
Token ID20023
Feature activation+2.00e-02

Pos logit contributions
gy+0.071
 in+0.068
 would+0.067
ie+0.067
o+0.066
Neg logit contributions
umm-0.141
 disapproval-0.137
 heavens-0.135
 horror-0.134
 jan-0.132
Token the
Token ID262
Feature activation-1.29e-02

Pos logit contributions
 in+0.018
gy+0.015
 got+0.015
o+0.014
 would+0.014
Neg logit contributions
umm-0.051
Warning-0.047
 disapproval-0.047
 heavens-0.046
 jan-0.044
Token lived
Token ID5615
Feature activation+7.06e-02

Pos logit contributions
 in+0.096
ies+0.090
ie+0.088
 where+0.087
gy+0.086
Neg logit contributions
Warning-0.178
 heavens-0.172
 horror-0.165
 cheers-0.163
umm-0.160
Token here
Token ID994
Feature activation+7.21e-02

Pos logit contributions
gy+0.055
 got+0.052
o+0.045
ie+0.043
 would+0.039
Neg logit contributions
Warning-0.154
 heavens-0.149
umm-0.147
 disapproval-0.144
 Spanish-0.143
Token a
Token ID257
Feature activation+4.63e-02

Pos logit contributions
gy+0.060
 in+0.054
 got+0.054
o+0.053
 found+0.045
Neg logit contributions
umm-0.166
Warning-0.159
 heavens-0.159
 disapproval-0.156
 horizon-0.153
Token long
Token ID890
Feature activation+1.57e-02

Pos logit contributions
 in+0.071
 just+0.060
 got+0.060
 would+0.059
 out+0.058
Neg logit contributions
umm-0.095
Warning-0.087
 disapproval-0.086
 heavens-0.081
 subt-0.081
Token time
Token ID640
Feature activation+7.15e-02

Pos logit contributions
 in+0.023
o+0.022
gy+0.020
 got+0.020
 not+0.020
Neg logit contributions
umm-0.029
 disapproval-0.026
 horizon-0.026
 heavens-0.026
Warning-0.026
Token ago
Token ID2084
Feature activation+1.69e-01

Pos logit contributions
gy+0.087
o+0.079
ie+0.077
 in+0.077
 got+0.076
Neg logit contributions
umm-0.139
 heavens-0.133
Warning-0.131
 horizon-0.130
 disapproval-0.127
Token."
Token ID526
Feature activation+2.57e-02

Pos logit contributions
o+0.160
gy+0.153
 in+0.146
 got+0.143
ie+0.141
Neg logit contributions
umm-0.348
Warning-0.329
 clerk-0.323
 gentleman-0.320
 Painter-0.320
Token\n
Token ID198
Feature activation+4.93e-02

Pos logit contributions
 in+0.031
o+0.025
 just+0.025
 got+0.025
 found+0.024
Neg logit contributions
umm-0.065
Warning-0.057
 disapproval-0.056
 heavens-0.055
 jan-0.054
Token\n
Token ID198
Feature activation+3.56e-02

Pos logit contributions
 in+0.053
o+0.045
 not+0.040
 just+0.040
 found+0.039
Neg logit contributions
umm-0.132
 disapproval-0.115
Warning-0.113
 heavens-0.112
 subt-0.110
Token"
Token ID1
Feature activation+3.74e-02

Pos logit contributions
 in+0.055
o+0.051
 not+0.049
 got+0.047
 just+0.047
Neg logit contributions
umm-0.080
 horror-0.063
 horizon-0.063
 disapproval-0.062
 heavens-0.061
TokenCan
Token ID6090
Feature activation+3.38e-02

Pos logit contributions
 in+0.062
 would+0.057
 got+0.056
 just+0.055
o+0.053
Neg logit contributions
umm-0.073
 horror-0.061
irk-0.060
andel-0.060
 disapproval-0.060
Token you
Token ID345
Feature activation+1.37e-01

Pos logit contributions
gy+0.182
 in+0.174
o+0.168
 would+0.152
 got+0.151
Neg logit contributions
 disapproval-0.328
 orchestra-0.321
Warning-0.318
umm-0.318
 heavens-0.317
Token carry
Token ID3283
Feature activation+7.35e-02

Pos logit contributions
 in+0.127
gy+0.123
o+0.119
 out+0.098
 got+0.096
Neg logit contributions
umm-0.305
 disapproval-0.299
Warning-0.295
 heavens-0.291
 gentleman-0.281
Token a
Token ID257
Feature activation+8.67e-02

Pos logit contributions
gy+0.071
o+0.070
 in+0.064
 would+0.063
 got+0.062
Neg logit contributions
umm-0.160
Warning-0.152
 disapproval-0.150
 heavens-0.147
 clerk-0.146
Token l
Token ID300
Feature activation+5.20e-02

Pos logit contributions
 in+0.135
o+0.122
gy+0.122
 not+0.116
 got+0.116
Neg logit contributions
umm-0.162
 disapproval-0.143
 heavens-0.143
Warning-0.140
 clerk-0.138
Tokenily
Token ID813
Feature activation+9.18e-02

Pos logit contributions
 in+0.105
gy+0.102
o+0.098
 got+0.097
 would+0.096
Neg logit contributions
umm-0.075
Warning-0.061
 disapproval-0.061
 heavens-0.060
irk-0.059
Token pad
Token ID14841
Feature activation+1.69e-01

Pos logit contributions
 not+0.096
gy+0.094
 in+0.093
o+0.090
 got+0.088
Neg logit contributions
umm-0.181
Warning-0.166
 heavens-0.164
 clerk-0.164
 gentleman-0.163
Token too
Token ID1165
Feature activation+9.62e-02

Pos logit contributions
ep+0.130
ie+0.127
 in+0.126
ts+0.124
gy+0.123
Neg logit contributions
umm-0.390
 clerk-0.350
 gentleman-0.339
Warning-0.336
 heavens-0.329
Token?"
Token ID1701
Feature activation+4.86e-02

Pos logit contributions
 in+0.118
gy+0.110
o+0.105
 got+0.099
 found+0.096
Neg logit contributions
umm-0.197
Warning-0.184
 heavens-0.184
 disapproval-0.182
 horizon-0.176
Token
Token ID220
Feature activation+7.24e-02

Pos logit contributions
 in+0.065
 just+0.054
o+0.053
 got+0.052
 not+0.051
Neg logit contributions
umm-0.116
Warning-0.103
 disapproval-0.100
 heavens-0.099
 horizon-0.097
Token\n
Token ID198
Feature activation+5.64e-02

Pos logit contributions
 in+0.097
 got+0.081
gy+0.081
 just+0.080
o+0.080
Neg logit contributions
umm-0.157
Warning-0.144
 disapproval-0.142
 heavens-0.139
 horror-0.134
TokenThe
Token ID464
Feature activation+2.44e-05

Pos logit contributions
 in+0.053
o+0.047
 just+0.038
 found+0.038
 got+0.037
Neg logit contributions
umm-0.160
 disapproval-0.138
 horror-0.136
Warning-0.136
 horizon-0.135
Token our
Token ID674
Feature activation+1.89e-02

Pos logit contributions
 in+0.060
 just+0.043
 out+0.040
 got+0.040
 not+0.039
Neg logit contributions
umm-0.191
Warning-0.174
 disapproval-0.171
 heavens-0.169
 jan-0.165
Token country
Token ID1499
Feature activation+8.99e-02

Pos logit contributions
 in+0.034
 just+0.029
gy+0.029
 got+0.029
 found+0.028
Neg logit contributions
umm-0.034
Warning-0.032
 disapproval-0.030
 heavens-0.029
 subt-0.028
Token a
Token ID257
Feature activation+4.41e-02

Pos logit contributions
 in+0.073
gy+0.059
o+0.058
 got+0.056
 just+0.055
Neg logit contributions
umm-0.231
Warning-0.214
 disapproval-0.213
 heavens-0.212
 gentleman-0.206
Token long
Token ID890
Feature activation+1.78e-02

Pos logit contributions
 in+0.071
 just+0.061
 got+0.061
 would+0.060
gy+0.059
Neg logit contributions
umm-0.087
Warning-0.079
 disapproval-0.077
 heavens-0.073
 horror-0.072
Token time
Token ID640
Feature activation+6.86e-02

Pos logit contributions
 in+0.027
o+0.025
gy+0.024
 just+0.024
 got+0.024
Neg logit contributions
umm-0.033
 disapproval-0.030
Warning-0.030
 heavens-0.029
 gentleman-0.029
Token ago
Token ID2084
Feature activation+1.68e-01

Pos logit contributions
 in+0.067
gy+0.059
o+0.058
 got+0.056
 would+0.054
Neg logit contributions
umm-0.161
 disapproval-0.149
Warning-0.149
 heavens-0.147
 gentleman-0.143
Token.
Token ID13
Feature activation+1.81e-02

Pos logit contributions
o+0.150
 in+0.147
gy+0.140
 got+0.134
ie+0.132
Neg logit contributions
umm-0.361
Warning-0.341
 clerk-0.338
 gentleman-0.336
 disapproval-0.334
Token He
Token ID679
Feature activation+9.89e-02

Pos logit contributions
 in+0.025
o+0.023
 just+0.021
 there+0.020
gy+0.020
Neg logit contributions
umm-0.041
 disapproval-0.037
Warning-0.036
 heavens-0.034
irk-0.034
Token helped
Token ID4193
Feature activation+5.26e-03

Pos logit contributions
 in+0.078
gy+0.061
o+0.059
 out+0.058
 just+0.052
Neg logit contributions
umm-0.251
Warning-0.240
 disapproval-0.240
 heavens-0.237
 gentleman-0.231
Token many
Token ID867
Feature activation+1.53e-02

Pos logit contributions
 in+0.006
 got+0.004
 just+0.004
gy+0.004
o+0.004
Neg logit contributions
umm-0.013
Warning-0.012
 disapproval-0.012
 heavens-0.012
 horror-0.011
Token people
Token ID661
Feature activation+7.56e-02

Pos logit contributions
 in+0.026
 just+0.022
 got+0.022
 would+0.022
 out+0.021
Neg logit contributions
umm-0.031
Warning-0.027
 disapproval-0.026
 heavens-0.026
 Painter-0.025
Token
Token ID129
Feature activation+1.22e-01

Pos logit contributions
umm+0.076
 disapproval+0.069
Warning+0.069
 heavens+0.066
 jan+0.065
Neg logit contributions
 in-0.046
 got-0.037
 would-0.037
 just-0.036
o-0.036
Token
Token ID241
Feature activation+1.20e-01

Pos logit contributions
 in+0.229
gy+0.202
 got+0.202
o+0.201
 just+0.201
Neg logit contributions
umm-0.203
Warning-0.179
 disapproval-0.177
 heavens-0.171
 horror-0.164
TokenLet
Token ID5756
Feature activation+1.22e-01

Pos logit contributions
 in+0.195
gy+0.172
 got+0.170
 just+0.167
 would+0.166
Neg logit contributions
umm-0.221
Warning-0.204
 disapproval-0.199
 heavens-0.193
 gentleman-0.184
Tokenâ
Token ID22940
Feature activation+5.78e-02

Pos logit contributions
 in+0.119
gy+0.115
o+0.112
 where+0.094
 not+0.090
Neg logit contributions
umm-0.270
Warning-0.265
 disapproval-0.263
 clerk-0.252
 heavens-0.251
Token
Token ID26391
Feature activation+6.03e-02

Pos logit contributions
 in+0.100
 got+0.092
gy+0.091
o+0.091
 just+0.088
Neg logit contributions
umm-0.104
 disapproval-0.087
Warning-0.086
 heavens-0.086
 horror-0.085
Token
Token ID8151
Feature activation+1.68e-01

Pos logit contributions
 in+0.099
ie+0.085
 out+0.083
 just+0.082
 would+0.082
Neg logit contributions
umm-0.105
Warning-0.098
 disapproval-0.094
 gentleman-0.094
 Painter-0.094
Tokens
Token ID82
Feature activation+8.76e-02

Pos logit contributions
 in+0.202
gy+0.164
 just+0.163
 got+0.161
 found+0.157
Neg logit contributions
umm-0.383
Warning-0.354
 disapproval-0.350
 heavens-0.340
 jan-0.331
Token go
Token ID467
Feature activation+9.64e-02

Pos logit contributions
 in+0.088
o+0.087
gy+0.080
ie+0.073
 found+0.071
Neg logit contributions
umm-0.175
 disapproval-0.172
 clerk-0.169
 delight-0.167
Warning-0.167
Token find
Token ID1064
Feature activation+1.03e-01

Pos logit contributions
 in+0.079
gy+0.073
 not+0.066
o+0.066
 would+0.065
Neg logit contributions
 gentleman-0.213
umm-0.211
Warning-0.210
 clerk-0.210
 heavens-0.208
Token something
Token ID1223
Feature activation+1.04e-01

Pos logit contributions
 in+0.083
gy+0.076
o+0.072
 got+0.069
 would+0.067
Neg logit contributions
umm-0.258
Warning-0.244
 disapproval-0.242
 heavens-0.238
 clerk-0.235
Token else
Token ID2073
Feature activation+1.16e-01

Pos logit contributions
gy+0.162
 in+0.159
o+0.153
 got+0.144
 found+0.140
Neg logit contributions
umm-0.181
Warning-0.162
 disapproval-0.159
 heavens-0.157
 jan-0.152
Token-
Token ID12
Feature activation+1.22e-01

Pos logit contributions
o+0.111
gy+0.106
 in+0.095
 got+0.084
 found+0.084
Neg logit contributions
umm-0.323
 clerk-0.299
 orchestra-0.299
 heavens-0.295
 gentleman-0.293
Tokenup
Token ID929
Feature activation+1.57e-01

Pos logit contributions
gy+0.212
 in+0.203
 not+0.198
 would+0.193
o+0.188
Neg logit contributions
 clerk-0.165
Warning-0.165
umm-0.160
 orchestra-0.155
Ta-0.155
Token box
Token ID3091
Feature activation+1.54e-01

Pos logit contributions
o+0.149
gy+0.146
 in+0.132
 not+0.117
 got+0.115
Neg logit contributions
umm-0.341
 orchestra-0.329
 clerk-0.326
 gentleman-0.321
 disapproval-0.319
Token in
Token ID287
Feature activation+9.17e-02

Pos logit contributions
gy+0.118
 in+0.118
o+0.115
 got+0.106
 would+0.097
Neg logit contributions
umm-0.373
 heavens-0.339
Warning-0.336
 clerk-0.336
 horizon-0.334
Token the
Token ID262
Feature activation+3.95e-02

Pos logit contributions
 in+0.072
 just+0.048
 got+0.047
 out+0.046
 found+0.045
Neg logit contributions
umm-0.263
 disapproval-0.240
Warning-0.239
 heavens-0.236
 sidelines-0.230
Token attic
Token ID46113
Feature activation+1.68e-01

Pos logit contributions
 in+0.066
 got+0.056
 out+0.056
 found+0.056
 just+0.055
Neg logit contributions
umm-0.077
Warning-0.069
 disapproval-0.069
irk-0.065
 heavens-0.064
Token!
Token ID0
Feature activation+3.98e-02

Pos logit contributions
o+0.155
gy+0.151
 in+0.148
 got+0.135
 just+0.129
Neg logit contributions
umm-0.377
 heavens-0.354
 horizon-0.354
Warning-0.350
 clerk-0.341
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
 in+0.065
o+0.058
 there+0.056
 just+0.055
 got+0.054
Neg logit contributions
umm-0.080
 disapproval-0.071
Warning-0.068
 heavens-0.067
 sidelines-0.066
Token went
Token ID1816
Feature activation+8.51e-02

Pos logit contributions
gy+0.121
o+0.109
 in+0.108
 out+0.096
ie+0.086
Neg logit contributions
 heavens-0.259
 disapproval-0.258
 subt-0.258
 orchestra-0.256
umm-0.254
Token to
Token ID284
Feature activation+8.36e-02

Pos logit contributions
gy+0.069
o+0.064
ie+0.058
seed+0.055
 just+0.051
Neg logit contributions
 clerk-0.193
 gentleman-0.191
 heavens-0.191
 jan-0.191
 professor-0.184
Token get
Token ID651
Feature activation+6.94e-02

Pos logit contributions
 in+0.077
o+0.072
gy+0.069
 would+0.062
 got+0.062
Neg logit contributions
umm-0.196
Warning-0.194
 disapproval-0.190
 heavens-0.187
 horror-0.183
Token and
Token ID290
Feature activation+1.03e-01

Pos logit contributions
 in+0.099
o+0.086
gy+0.085
 got+0.082
 just+0.081
Neg logit contributions
umm-0.268
Warning-0.246
 heavens-0.245
 horizon-0.244
 clerk-0.243
Token walk
Token ID2513
Feature activation+9.65e-02

Pos logit contributions
ie+0.106
o+0.105
gy+0.102
 in+0.098
 just+0.078
Neg logit contributions
 horror-0.208
 anticipation-0.207
 disapproval-0.206
 subt-0.205
 jan-0.205
Token through
Token ID832
Feature activation+7.28e-02

Pos logit contributions
o+0.071
gy+0.067
 in+0.064
 not+0.064
 got+0.060
Neg logit contributions
umm-0.217
Warning-0.214
 disapproval-0.212
 heavens-0.212
 gentleman-0.211
Token a
Token ID257
Feature activation+6.36e-02

Pos logit contributions
 in+0.061
o+0.053
gy+0.051
 got+0.050
 just+0.047
Neg logit contributions
umm-0.185
Warning-0.174
 disapproval-0.174
 heavens-0.170
 horror-0.168
Token metal
Token ID6147
Feature activation+8.47e-02

Pos logit contributions
 in+0.098
 got+0.084
 found+0.082
 out+0.082
 just+0.081
Neg logit contributions
umm-0.132
Warning-0.122
 disapproval-0.117
 subt-0.112
 Painter-0.111
Token detector
Token ID31029
Feature activation+1.66e-01

Pos logit contributions
 in+0.137
o+0.133
 not+0.127
 got+0.127
gy+0.126
Neg logit contributions
umm-0.147
Warning-0.130
 disapproval-0.129
 heavens-0.127
 gentleman-0.125
Token.
Token ID13
Feature activation+2.36e-02

Pos logit contributions
o+0.155
gy+0.152
 in+0.138
 got+0.126
 found+0.126
Neg logit contributions
 horizon-0.330
umm-0.326
 orchestra-0.325
Warning-0.325
 gentleman-0.322
Token Tom
Token ID4186
Feature activation+7.49e-02

Pos logit contributions
 in+0.034
o+0.030
 there+0.029
 just+0.029
 where+0.028
Neg logit contributions
umm-0.052
 disapproval-0.045
Warning-0.045
 heavens-0.044
 jan-0.042
Token and
Token ID290
Feature activation+6.97e-02

Pos logit contributions
 in+0.071
gy+0.062
 not+0.059
 out+0.058
ie+0.058
Neg logit contributions
 heavens-0.159
 disapproval-0.158
Warning-0.158
 horror-0.157
umm-0.155
Token Lily
Token ID20037
Feature activation+7.52e-02

Pos logit contributions
 in+0.120
gy+0.105
 just+0.104
o+0.104
 got+0.103
Neg logit contributions
umm-0.117
Warning-0.110
 disapproval-0.109
 jan-0.104
 heavens-0.104
Token thought
Token ID1807
Feature activation+6.45e-02

Pos logit contributions
 in+0.075
o+0.070
gy+0.067
 not+0.064
ie+0.061
Neg logit contributions
 disapproval-0.162
umm-0.160
Warning-0.159
 heavens-0.158
 horror-0.157
Token them
Token ID606
Feature activation+1.93e-01

Pos logit contributions
umm+0.037
Warning+0.034
 disapproval+0.034
 heavens+0.033
 jan+0.033
Neg logit contributions
 in-0.011
gy-0.008
o-0.008
 got-0.008
 just-0.008
Token into
Token ID656
Feature activation+1.29e-01

Pos logit contributions
gy+0.155
o+0.139
 in+0.133
 got+0.117
ie+0.115
Neg logit contributions
umm-0.429
 disapproval-0.425
 horizon-0.412
 heavens-0.405
Warning-0.405
Token beautiful
Token ID4950
Feature activation+6.05e-02

Pos logit contributions
 in+0.109
o+0.083
gy+0.082
 got+0.082
 just+0.080
Neg logit contributions
umm-0.342
Warning-0.320
 disapproval-0.319
 heavens-0.311
 horror-0.305
Token or
Token ID393
Feature activation+9.58e-02

Pos logit contributions
o+0.108
 not+0.103
 in+0.103
gy+0.102
 got+0.098
Neg logit contributions
umm-0.098
 horizon-0.085
 disapproval-0.083
 heavens-0.083
 orchestra-0.082
Tokennam
Token ID7402
Feature activation+1.28e-01

Pos logit contributions
gy+0.121
o+0.120
 in+0.118
ie+0.108
 got+0.105
Neg logit contributions
umm-0.187
 disapproval-0.180
 heavens-0.175
 horror-0.171
Warning-0.171
Tokenents
Token ID658
Feature activation+1.65e-01

Pos logit contributions
 in+0.249
 would+0.240
 got+0.227
 just+0.225
gy+0.224
Neg logit contributions
umm-0.188
Warning-0.164
 disapproval-0.161
 heavens-0.152
 Painter-0.150
Token.
Token ID13
Feature activation+4.28e-02

Pos logit contributions
gy+0.141
 in+0.137
o+0.133
 not+0.123
 just+0.122
Neg logit contributions
umm-0.367
Warning-0.347
 heavens-0.340
 disapproval-0.340
 horizon-0.336
Token\n
Token ID198
Feature activation+6.10e-02

Pos logit contributions
 in+0.064
o+0.057
 just+0.055
 there+0.053
 got+0.053
Neg logit contributions
umm-0.090
 disapproval-0.080
Warning-0.080
 heavens-0.077
 jan-0.075
Token\n
Token ID198
Feature activation+4.85e-02

Pos logit contributions
 in+0.064
o+0.056
 not+0.049
 found+0.049
 just+0.049
Neg logit contributions
umm-0.161
 disapproval-0.141
Warning-0.138
 heavens-0.137
 jan-0.135
TokenOne
Token ID3198
Feature activation+4.35e-02

Pos logit contributions
 in+0.070
o+0.065
 not+0.060
 found+0.060
 there+0.060
Neg logit contributions
umm-0.110
 disapproval-0.090
 horror-0.089
 heavens-0.088
 horizon-0.088
Token day
Token ID1110
Feature activation+7.76e-02

Pos logit contributions
 in+0.067
gy+0.059
o+0.058
 got+0.058
 just+0.058
Neg logit contributions
umm-0.085
Warning-0.077
 disapproval-0.076
 heavens-0.074
 horror-0.071
Token,
Token ID11
Feature activation-4.77e-02

Pos logit contributions
umm+0.093
Warning+0.087
 disapproval+0.086
 heavens+0.084
 horror+0.082
Neg logit contributions
 in-0.031
o-0.024
gy-0.024
 got-0.024
 just-0.023
Token wearing
Token ID5762
Feature activation-3.10e-02

Pos logit contributions
umm+0.106
 disapproval+0.098
Warning+0.096
 heavens+0.095
 horror+0.094
Neg logit contributions
 in-0.056
gy-0.048
o-0.047
 got-0.047
 would-0.045
Token their
Token ID511
Feature activation-7.86e-02

Pos logit contributions
umm+0.071
 disapproval+0.070
Warning+0.067
 sentences+0.066
 heavens+0.066
Neg logit contributions
 in-0.030
o-0.028
gy-0.025
 got-0.025
 would-0.024
Token fast
Token ID3049
Feature activation-9.73e-02

Pos logit contributions
umm+0.160
Warning+0.145
 disapproval+0.142
 heavens+0.139
 Painter+0.134
Neg logit contributions
 in-0.120
 got-0.102
 just-0.102
gy-0.101
o-0.100
Token red
Token ID2266
Feature activation-8.31e-02

Pos logit contributions
umm+0.169
 disapproval+0.151
Warning+0.151
 heavens+0.145
 horror+0.139
Neg logit contributions
 in-0.170
o-0.151
 got-0.148
gy-0.148
 just-0.147
Token swe
Token ID3490
Feature activation-1.44e-01

Pos logit contributions
umm+0.113
 disapproval+0.100
Warning+0.099
 heavens+0.093
 clerk+0.089
Neg logit contributions
 in-0.171
o-0.158
 got-0.153
 just-0.152
 not-0.151
Tokenaters
Token ID8605
Feature activation-3.56e-03

Pos logit contributions
umm+0.173
Warning+0.152
 disapproval+0.148
 heavens+0.143
 jan+0.134
Neg logit contributions
 in-0.327
gy-0.300
o-0.298
 just-0.296
 got-0.295
Token.
Token ID13
Feature activation-7.10e-02

Pos logit contributions
umm+0.009
Warning+0.008
 heavens+0.008
 disapproval+0.008
 clerk+0.008
Neg logit contributions
 in-0.002
o-0.002
gy-0.002
 not-0.002
 got-0.002
Token<|endoftext|>
Token ID50256
Feature activation-1.76e-02

Pos logit contributions
umm+0.152
 disapproval+0.137
Warning+0.136
 heavens+0.133
 jan+0.129
Neg logit contributions
 in-0.103
o-0.091
 just-0.089
 got-0.088
gy-0.087
TokenOnce
Token ID7454
Feature activation-1.75e-02

Pos logit contributions
umm+0.036
Warning+0.033
 disapproval+0.032
 heavens+0.032
 horror+0.030
Neg logit contributions
 in-0.025
gy-0.022
 got-0.022
 just-0.022
o-0.021
Token upon
Token ID2402
Feature activation+1.05e-02

Pos logit contributions
umm+0.037
irk+0.036
 horror+0.036
 Painter+0.035
Warning+0.035
Neg logit contributions
 in-0.016
gy-0.016
 just-0.014
 got-0.013
o-0.013
Token,
Token ID11
Feature activation-3.49e-02

Pos logit contributions
umm+0.062
 disapproval+0.058
Warning+0.058
 heavens+0.057
 horror+0.055
Neg logit contributions
 in-0.019
o-0.017
gy-0.016
 got-0.015
 just-0.015
Token he
Token ID339
Feature activation+1.48e-02

Pos logit contributions
umm+0.091
Warning+0.085
 disapproval+0.084
 heavens+0.082
 horror+0.080
Neg logit contributions
 in-0.032
 got-0.024
 just-0.024
gy-0.024
o-0.024
Token noticed
Token ID6810
Feature activation-3.90e-02

Pos logit contributions
 in+0.014
gy+0.012
o+0.012
 out+0.011
 not+0.010
Neg logit contributions
umm-0.033
 heavens-0.032
 disapproval-0.032
Warning-0.032
 horizon-0.031
Token that
Token ID326
Feature activation-1.91e-02

Pos logit contributions
umm+0.106
Warning+0.099
 disapproval+0.099
 heavens+0.097
 jan+0.095
Neg logit contributions
 in-0.029
gy-0.021
o-0.021
 got-0.021
 just-0.020
Token the
Token ID262
Feature activation-9.43e-02

Pos logit contributions
umm+0.051
Warning+0.047
 disapproval+0.047
 heavens+0.046
 horror+0.045
Neg logit contributions
 in-0.017
 got-0.013
 just-0.012
o-0.012
gy-0.012
Token Sm
Token ID2439
Feature activation-1.44e-01

Pos logit contributions
umm+0.169
Warning+0.152
 disapproval+0.148
 heavens+0.140
irk+0.138
Neg logit contributions
 in-0.169
 just-0.146
 got-0.146
 not-0.143
 found-0.143
Tokenelly
Token ID6148
Feature activation-8.78e-02

Pos logit contributions
umm+0.229
Warning+0.205
 disapproval+0.204
 heavens+0.194
 horror+0.187
Neg logit contributions
 in-0.277
 just-0.241
 got-0.241
gy-0.240
o-0.240
Token King
Token ID2677
Feature activation-3.10e-02

Pos logit contributions
umm+0.141
 disapproval+0.123
 heavens+0.122
Warning+0.120
 gentleman+0.114
Neg logit contributions
 in-0.161
o-0.153
gy-0.147
 just-0.144
 got-0.143
Token was
Token ID373
Feature activation-1.02e-02

Pos logit contributions
umm+0.082
Warning+0.076
 disapproval+0.076
 heavens+0.075
 Painter+0.074
Neg logit contributions
 in-0.022
gy-0.018
o-0.017
 out-0.016
 not-0.016
Token cruel
Token ID12177
Feature activation-9.23e-02

Pos logit contributions
umm+0.022
 disapproval+0.021
 heavens+0.020
 horror+0.020
Warning+0.020
Neg logit contributions
 in-0.012
gy-0.011
o-0.011
 would-0.011
 got-0.010
Token and
Token ID290
Feature activation-1.25e-02

Pos logit contributions
umm+0.234
 disapproval+0.218
Warning+0.215
 heavens+0.212
 gentleman+0.212
Neg logit contributions
 in-0.067
gy-0.065
o-0.062
 got-0.056
 found-0.052
Token\n
Token ID198
Feature activation-3.59e-02

Pos logit contributions
umm+0.048
Warning+0.046
 disapproval+0.044
 heavens+0.043
irk+0.042
Neg logit contributions
 in-0.035
gy-0.031
 just-0.030
 got-0.030
o-0.029
TokenSpot
Token ID32565
Feature activation-3.05e-02

Pos logit contributions
umm+0.078
 disapproval+0.069
 horror+0.067
 Painter+0.066
Warning+0.066
Neg logit contributions
 in-0.056
 got-0.049
 would-0.047
 just-0.047
 not-0.046
Token wanted
Token ID2227
Feature activation-3.66e-02

Pos logit contributions
umm+0.075
 heavens+0.071
Warning+0.071
 disapproval+0.070
 Painter+0.069
Neg logit contributions
 in-0.026
gy-0.020
o-0.019
 out-0.019
 not-0.019
Token to
Token ID284
Feature activation-2.64e-02

Pos logit contributions
umm+0.128
 disapproval+0.119
Warning+0.119
 heavens+0.115
 horror+0.114
Neg logit contributions
 in-0.010
 just-0.001
 to-0.000
 got+0.000
 not+0.000
Token thank
Token ID5875
Feature activation-1.01e-01

Pos logit contributions
umm+0.049
 disapproval+0.047
Warning+0.047
 heavens+0.047
 orchestra+0.045
Neg logit contributions
 in-0.038
o-0.035
gy-0.034
 got-0.032
 would-0.032
Token Ch
Token ID609
Feature activation-1.43e-01

Pos logit contributions
umm+0.214
Warning+0.197
 disapproval+0.194
 heavens+0.190
 jan+0.183
Neg logit contributions
 in-0.138
gy-0.117
 got-0.117
o-0.116
 just-0.115
Tokenir
Token ID343
Feature activation-1.06e-01

Pos logit contributions
umm+0.265
Warning+0.236
 disapproval+0.235
 heavens+0.231
 subt+0.224
Neg logit contributions
 in-0.244
gy-0.224
o-0.221
 just-0.210
 got-0.208
Tokenpy
Token ID9078
Feature activation-4.41e-02

Pos logit contributions
umm+0.193
 disapproval+0.174
Warning+0.174
 heavens+0.169
 Painter+0.161
Neg logit contributions
 in-0.174
 got-0.153
 would-0.151
 found-0.150
 just-0.149
Token for
Token ID329
Feature activation-5.19e-02

Pos logit contributions
umm+0.115
Warning+0.108
 disapproval+0.108
 heavens+0.106
 horror+0.103
Neg logit contributions
 in-0.036
gy-0.027
 got-0.027
o-0.027
 just-0.026
Token singing
Token ID13777
Feature activation-6.96e-02

Pos logit contributions
umm+0.129
Warning+0.117
 disapproval+0.116
 heavens+0.114
irk+0.110
Neg logit contributions
 in-0.059
 just-0.046
 not-0.046
 got-0.045
gy-0.045
Token.
Token ID13
Feature activation-7.36e-02

Pos logit contributions
umm+0.193
Warning+0.179
 disapproval+0.178
 heavens+0.175
 horror+0.170
Neg logit contributions
 in-0.053
 got-0.037
gy-0.037
o-0.037
 just-0.037
Token go
Token ID467
Feature activation-4.17e-02

Pos logit contributions
umm+0.040
 disapproval+0.038
Warning+0.037
 heavens+0.036
 jan+0.035
Neg logit contributions
 in-0.022
gy-0.020
o-0.019
 would-0.019
 out-0.018
Token with
Token ID351
Feature activation-4.05e-02

Pos logit contributions
umm+0.094
 heavens+0.090
Warning+0.090
 jan+0.090
 disapproval+0.089
Neg logit contributions
 would-0.033
 in-0.032
gy-0.030
o-0.028
 got-0.028
Token a
Token ID257
Feature activation-5.03e-02

Pos logit contributions
umm+0.091
 disapproval+0.085
Warning+0.085
 heavens+0.083
 clerk+0.081
Neg logit contributions
 in-0.046
o-0.040
gy-0.040
 would-0.038
 got-0.038
Token stranger
Token ID16195
Feature activation-2.40e-02

Pos logit contributions
umm+0.092
 disapproval+0.083
Warning+0.083
 heavens+0.080
 gentleman+0.078
Neg logit contributions
 in-0.084
o-0.073
gy-0.073
 just-0.072
 got-0.072
Token.
Token ID13
Feature activation-6.47e-02

Pos logit contributions
umm+0.056
Warning+0.052
 gentleman+0.051
 disapproval+0.051
 heavens+0.051
Neg logit contributions
 in-0.021
gy-0.018
 got-0.018
o-0.017
 not-0.017
Token Str
Token ID4285
Feature activation-1.43e-01

Pos logit contributions
umm+0.116
 disapproval+0.103
Warning+0.102
 heavens+0.097
 sidelines+0.096
Neg logit contributions
 in-0.112
o-0.102
gy-0.100
 just-0.099
 not-0.097
Tokenangers
Token ID6606
Feature activation+6.63e-03

Pos logit contributions
umm+0.217
Warning+0.191
 disapproval+0.190
 heavens+0.182
 horror+0.173
Neg logit contributions
 in-0.285
gy-0.253
o-0.253
 got-0.252
 just-0.251
Token can
Token ID460
Feature activation-1.33e-02

Pos logit contributions
o+0.006
gy+0.006
 in+0.006
 got+0.005
 found+0.005
Neg logit contributions
umm-0.015
 heavens-0.014
 disapproval-0.014
Warning-0.014
 clerk-0.013
Token be
Token ID307
Feature activation-4.18e-02

Pos logit contributions
umm+0.029
Warning+0.028
 heavens+0.027
 disapproval+0.027
 sidelines+0.026
Neg logit contributions
 in-0.015
o-0.014
gy-0.013
 out-0.012
 got-0.012
Token bad
Token ID2089
Feature activation-8.48e-02

Pos logit contributions
 disapproval+0.074
umm+0.068
 sentences+0.068
 clerk+0.067
Warning+0.067
Neg logit contributions
gy-0.050
 in-0.050
o-0.048
 got-0.046
 would-0.046
Token and
Token ID290
Feature activation-1.59e-02

Pos logit contributions
umm+0.217
Warning+0.197
 disapproval+0.196
 heavens+0.195
 clerk+0.191
Neg logit contributions
 in-0.066
gy-0.063
o-0.060
 got-0.057
 found-0.053
Token\n
Token ID198
Feature activation-2.97e-02

Pos logit contributions
umm+0.045
 disapproval+0.040
Warning+0.040
 heavens+0.039
 jan+0.038
Neg logit contributions
 in-0.024
o-0.020
 got-0.020
 just-0.020
 not-0.019
TokenWith
Token ID3152
Feature activation-3.63e-02

Pos logit contributions
umm+0.058
 disapproval+0.049
 horror+0.046
 heavens+0.046
Warning+0.046
Neg logit contributions
 in-0.050
 would-0.045
 got-0.045
 not-0.045
o-0.044
Token Simone
Token ID47592
Feature activation-4.17e-02

Pos logit contributions
umm+0.090
Warning+0.084
 disapproval+0.084
 heavens+0.082
 horror+0.080
Neg logit contributions
 in-0.037
gy-0.029
o-0.029
 got-0.029
 just-0.028
Token's
Token ID338
Feature activation-6.15e-02

Pos logit contributions
umm+0.111
Warning+0.105
 disapproval+0.104
 heavens+0.103
 horror+0.101
Neg logit contributions
 in-0.028
gy-0.024
o-0.022
 not-0.021
 just-0.020
Token gentle
Token ID10296
Feature activation-1.09e-01

Pos logit contributions
umm+0.136
Warning+0.123
 disapproval+0.120
 heavens+0.118
 jan+0.114
Neg logit contributions
 in-0.086
 got-0.071
 just-0.071
 not-0.069
 out-0.069
Token reass
Token ID12719
Feature activation-1.41e-01

Pos logit contributions
umm+0.172
Warning+0.151
 disapproval+0.150
 heavens+0.145
 horror+0.138
Neg logit contributions
 in-0.210
o-0.186
 got-0.186
gy-0.186
 just-0.185
Tokenurance
Token ID3874
Feature activation-6.51e-02

Pos logit contributions
umm+0.158
Warning+0.141
 jan+0.129
Cool+0.124
 disapproval+0.121
Neg logit contributions
 in-0.326
gy-0.322
 not-0.304
 just-0.298
 out-0.293
Token,
Token ID11
Feature activation-4.23e-02

Pos logit contributions
umm+0.204
Warning+0.189
 disapproval+0.188
 sidelines+0.183
 heavens+0.182
Neg logit contributions
 in-0.040
 got-0.025
 would-0.022
 just-0.021
 out-0.021
Token Josh
Token ID8518
Feature activation-2.01e-02

Pos logit contributions
umm+0.070
Warning+0.061
 disapproval+0.060
 heavens+0.058
 jan+0.055
Neg logit contributions
 in-0.081
 just-0.072
 got-0.072
o-0.071
gy-0.071
Token stepped
Token ID10764
Feature activation-3.97e-02

Pos logit contributions
umm+0.045
 disapproval+0.043
 heavens+0.043
Warning+0.042
 horror+0.042
Neg logit contributions
 in-0.019
gy-0.016
o-0.016
 not-0.015
 out-0.015
Token into
Token ID656
Feature activation-2.39e-02

Pos logit contributions
umm+0.104
Warning+0.097
 disapproval+0.096
 heavens+0.095
 horror+0.092
Neg logit contributions
 in-0.034
gy-0.027
o-0.027
 got-0.026
 would-0.026
Token very
Token ID845
Feature activation-4.58e-02

Pos logit contributions
umm+0.050
 disapproval+0.048
Warning+0.047
 heavens+0.047
 horror+0.046
Neg logit contributions
 in-0.030
gy-0.027
 would-0.027
o-0.027
 got-0.026
Token relieved
Token ID22598
Feature activation-6.07e-02

Pos logit contributions
umm+0.093
Warning+0.083
 disapproval+0.081
 heavens+0.079
 horizon+0.076
Neg logit contributions
 in-0.074
 got-0.063
 not-0.062
 just-0.062
o-0.062
Token and
Token ID290
Feature activation-1.39e-02

Pos logit contributions
umm+0.164
 disapproval+0.153
Warning+0.152
 heavens+0.150
 horror+0.147
Neg logit contributions
 in-0.042
gy-0.036
o-0.035
 got-0.033
 would-0.031
Token hugged
Token ID41130
Feature activation-5.88e-02

Pos logit contributions
umm+0.028
 disapproval+0.026
 heavens+0.025
Warning+0.025
 horror+0.025
Neg logit contributions
 in-0.019
gy-0.018
o-0.017
 just-0.016
 out-0.016
Token the
Token ID262
Feature activation-1.13e-01

Pos logit contributions
umm+0.149
Warning+0.138
 disapproval+0.138
 heavens+0.135
 horror+0.132
Neg logit contributions
 in-0.056
 got-0.045
o-0.045
gy-0.044
 just-0.043
Token naughty
Token ID45128
Feature activation-1.41e-01

Pos logit contributions
umm+0.217
Warning+0.196
 disapproval+0.194
 heavens+0.187
 subt+0.182
Neg logit contributions
 in-0.186
 got-0.160
 just-0.158
gy-0.156
o-0.156
Token boy
Token ID2933
Feature activation-2.62e-02

Pos logit contributions
umm+0.249
Warning+0.221
 disapproval+0.220
 heavens+0.213
 jan+0.206
Neg logit contributions
 in-0.245
o-0.216
gy-0.216
 got-0.213
 just-0.212
Token.
Token ID13
Feature activation-7.07e-02

Pos logit contributions
umm+0.062
 disapproval+0.059
 horror+0.058
Warning+0.057
 heavens+0.057
Neg logit contributions
 in-0.022
gy-0.019
 would-0.019
 got-0.019
o-0.019
Token From
Token ID3574
Feature activation-4.96e-02

Pos logit contributions
umm+0.117
Warning+0.111
 disapproval+0.110
 jan+0.099
 orchestra+0.098
Neg logit contributions
 in-0.133
o-0.118
gy-0.117
 just-0.117
 not-0.116
Token then
Token ID788
Feature activation-3.81e-02

Pos logit contributions
umm+0.125
Warning+0.118
 disapproval+0.117
 heavens+0.116
 gentleman+0.112
Neg logit contributions
 in-0.044
gy-0.037
o-0.036
 got-0.036
 would-0.034
Token on
Token ID319
Feature activation-4.52e-02

Pos logit contributions
umm+0.124
 disapproval+0.116
Warning+0.114
 heavens+0.113
 sidelines+0.111
Neg logit contributions
 in-0.016
 got-0.006
 just-0.005
 out-0.005
o-0.004
Token sat
Token ID3332
Feature activation-4.14e-02

Pos logit contributions
 in+0.001
gy+0.000
 out+0.000
 not+0.000
o+0.000
Neg logit contributions
 disapproval-0.001
umm-0.001
 horror-0.001
Warning-0.001
 heavens-0.001
Token down
Token ID866
Feature activation-4.34e-02

Pos logit contributions
umm+0.095
irk+0.087
ement+0.086
 heavens+0.086
Fighting+0.086
Neg logit contributions
 would-0.035
gy-0.034
 got-0.033
o-0.031
 not-0.030
Token to
Token ID284
Feature activation-3.14e-02

Pos logit contributions
umm+0.134
Warning+0.126
 disapproval+0.125
 heavens+0.123
 horror+0.120
Neg logit contributions
 in-0.021
 got-0.010
o-0.010
 just-0.010
gy-0.010
Token enjoy
Token ID2883
Feature activation-5.23e-02

Pos logit contributions
 horror+0.056
 sidelines+0.056
 delight+0.053
 disapproval+0.051
 amusement+0.050
Neg logit contributions
o-0.045
 in-0.045
ies-0.042
--0.042
 would-0.041
Token their
Token ID511
Feature activation-8.71e-02

Pos logit contributions
umm+0.117
 disapproval+0.108
Warning+0.106
 horror+0.105
 subt+0.104
Neg logit contributions
 in-0.054
gy-0.053
 got-0.051
o-0.051
 would-0.048
Token humble
Token ID19695
Feature activation-1.40e-01

Pos logit contributions
umm+0.164
 disapproval+0.142
Warning+0.140
 heavens+0.139
 horror+0.137
Neg logit contributions
 in-0.136
gy-0.125
o-0.123
 got-0.122
 out-0.119
Token dinner
Token ID8073
Feature activation-6.06e-02

Pos logit contributions
umm+0.265
 disapproval+0.233
 heavens+0.225
Warning+0.221
 horror+0.221
Neg logit contributions
 in-0.223
o-0.206
 got-0.198
 just-0.193
gy-0.192
Token together
Token ID1978
Feature activation-1.92e-02

Pos logit contributions
umm+0.164
Warning+0.147
 disapproval+0.146
 heavens+0.145
 subt+0.144
Neg logit contributions
 in-0.042
gy-0.037
o-0.035
 got-0.033
 found-0.030
Token.
Token ID13
Feature activation-7.16e-02

Pos logit contributions
umm+0.051
Warning+0.047
 disapproval+0.047
 heavens+0.046
 horror+0.045
Neg logit contributions
 in-0.015
gy-0.012
o-0.012
 got-0.012
 just-0.011
Token From
Token ID3574
Feature activation-5.26e-02

Pos logit contributions
umm+0.148
Warning+0.135
 disapproval+0.134
 heavens+0.127
 gentleman+0.127
Neg logit contributions
 in-0.112
 got-0.096
 just-0.095
o-0.094
 not-0.093
Token that
Token ID326
Feature activation-3.50e-02

Pos logit contributions
umm+0.132
Warning+0.124
 disapproval+0.123
 heavens+0.123
 gentleman+0.119
Neg logit contributions
 in-0.044
gy-0.039
o-0.037
 got-0.037
 would-0.035
Token getting
Token ID1972
Feature activation-3.03e-02

Pos logit contributions
umm+0.014
 disapproval+0.013
Warning+0.012
 heavens+0.012
 horror+0.012
Neg logit contributions
 in-0.007
gy-0.006
 got-0.006
o-0.006
 would-0.005
Token very
Token ID845
Feature activation-3.29e-02

Pos logit contributions
umm+0.065
 disapproval+0.060
Warning+0.059
 heavens+0.059
 jan+0.057
Neg logit contributions
 in-0.039
gy-0.035
 got-0.033
o-0.033
 would-0.032
Token annoyed
Token ID25602
Feature activation-5.51e-02

Pos logit contributions
umm+0.061
 disapproval+0.055
Warning+0.055
 heavens+0.054
 jan+0.053
Neg logit contributions
 in-0.051
gy-0.047
o-0.045
 would-0.045
 got-0.044
Token with
Token ID351
Feature activation-3.89e-02

Pos logit contributions
umm+0.155
Warning+0.144
 disapproval+0.144
 heavens+0.141
 horror+0.137
Neg logit contributions
 in-0.041
 got-0.028
 just-0.028
o-0.028
gy-0.027
Token his
Token ID465
Feature activation-7.41e-02

Pos logit contributions
umm+0.090
 disapproval+0.089
 horror+0.086
Warning+0.086
 heavens+0.083
Neg logit contributions
 in-0.038
o-0.035
gy-0.033
 got-0.032
 would-0.032
Token rud
Token ID28906
Feature activation-1.40e-01

Pos logit contributions
umm+0.158
Warning+0.146
 disapproval+0.141
 heavens+0.138
 subt+0.133
Neg logit contributions
 in-0.110
 just-0.092
 got-0.091
gy-0.091
o-0.091
Tokeneness
Token ID9449
Feature activation-5.66e-02

Pos logit contributions
umm+0.126
Warning+0.100
 disapproval+0.097
 heavens+0.091
 horror+0.081
Neg logit contributions
 in-0.369
 got-0.337
gy-0.337
o-0.337
 just-0.337
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
umm+0.164
Warning+0.152
 disapproval+0.150
 heavens+0.148
 subt+0.144
Neg logit contributions
 in-0.042
 just-0.027
 got-0.027
gy-0.026
 would-0.026
Token complained
Token ID13832
Feature activation-5.68e-02

Pos logit contributions
umm+0.036
 disapproval+0.035
Warning+0.034
 horror+0.033
 jan+0.033
Neg logit contributions
 in-0.019
gy-0.016
o-0.016
 out-0.015
 got-0.014
Token that
Token ID326
Feature activation-2.42e-02

Pos logit contributions
umm+0.157
Warning+0.146
 disapproval+0.145
 heavens+0.142
 horror+0.138
Neg logit contributions
 in-0.043
 got-0.031
gy-0.030
o-0.030
 just-0.030
Token he
Token ID339
Feature activation+1.51e-02

Pos logit contributions
umm+0.062
Warning+0.058
 disapproval+0.057
 heavens+0.056
 horror+0.055
Neg logit contributions
 in-0.021
gy-0.017
o-0.017
 got-0.016
 just-0.016
Token,
Token ID11
Feature activation-2.95e-02

Pos logit contributions
umm+0.064
 clerk+0.060
 jan+0.060
 gentleman+0.060
 disapproval+0.060
Neg logit contributions
 in-0.020
 would-0.019
ie-0.018
o-0.017
 got-0.017
Token feeling
Token ID4203
Feature activation-4.24e-02

Pos logit contributions
umm+0.058
 disapproval+0.055
 horror+0.053
 heavens+0.053
 gentleman+0.053
Neg logit contributions
 in-0.038
gy-0.034
ie-0.033
 would-0.032
o-0.032
Token grateful
Token ID14066
Feature activation-7.12e-02

Pos logit contributions
umm+0.076
 disapproval+0.070
 heavens+0.069
 horror+0.067
Warning+0.065
Neg logit contributions
gy-0.065
 in-0.064
o-0.061
ie-0.058
 would-0.058
Token for
Token ID329
Feature activation-5.27e-02

Pos logit contributions
umm+0.167
irk+0.148
 heavens+0.148
 horizon+0.147
 clerk+0.145
Neg logit contributions
gy-0.070
o-0.060
 got-0.056
ep-0.053
 in-0.053
Token the
Token ID262
Feature activation-1.14e-01

Pos logit contributions
umm+0.131
Warning+0.120
 disapproval+0.118
 heavens+0.117
 horizon+0.113
Neg logit contributions
 in-0.061
 just-0.048
 not-0.047
 got-0.046
o-0.045
Token kind
Token ID1611
Feature activation-1.40e-01

Pos logit contributions
umm+0.260
Warning+0.241
 disapproval+0.232
andel+0.225
 subt+0.225
Neg logit contributions
 in-0.157
 just-0.136
 not-0.134
 would-0.128
 got-0.128
Token stranger
Token ID16195
Feature activation-2.57e-02

Pos logit contributions
umm+0.226
Warning+0.209
 subt+0.194
 disapproval+0.191
 sidelines+0.188
Neg logit contributions
 in-0.284
 just-0.256
 got-0.252
 to-0.250
 would-0.245
Token's
Token ID338
Feature activation-7.77e-02

Pos logit contributions
umm+0.064
 disapproval+0.059
Warning+0.059
 gentleman+0.059
 heavens+0.059
Neg logit contributions
 in-0.019
gy-0.016
 got-0.016
o-0.015
 found-0.015
Token help
Token ID1037
Feature activation-6.75e-02

Pos logit contributions
umm+0.156
Warning+0.142
 disapproval+0.139
 heavens+0.136
 jan+0.131
Neg logit contributions
 in-0.121
 just-0.103
 got-0.102
o-0.102
gy-0.100
Token.
Token ID13
Feature activation-6.58e-02

Pos logit contributions
umm+0.190
Warning+0.176
 disapproval+0.175
 heavens+0.172
 jan+0.168
Neg logit contributions
 in-0.052
 just-0.035
 got-0.035
o-0.034
gy-0.034
Token<|endoftext|>
Token ID50256
Feature activation-1.73e-02

Pos logit contributions
umm+0.136
Warning+0.124
 disapproval+0.123
 heavens+0.120
 horror+0.115
Neg logit contributions
 in-0.096
 got-0.081
o-0.081
gy-0.081
 just-0.081
Token too
Token ID1165
Feature activation-2.25e-02

Pos logit contributions
umm+0.215
 disapproval+0.198
Warning+0.195
 heavens+0.194
 horror+0.191
Neg logit contributions
gy-0.060
 in-0.058
o-0.056
 got-0.053
 not-0.047
Token.
Token ID13
Feature activation-6.35e-02

Pos logit contributions
umm+0.061
Warning+0.056
 disapproval+0.056
 heavens+0.055
 horror+0.054
Neg logit contributions
 in-0.018
gy-0.013
o-0.013
 got-0.013
 just-0.013
Token They
Token ID1119
Feature activation+2.28e-02

Pos logit contributions
umm+0.126
 disapproval+0.113
Warning+0.112
 heavens+0.109
 jan+0.108
Neg logit contributions
 in-0.104
o-0.090
 got-0.090
 just-0.089
 not-0.088
Token hoped
Token ID10719
Feature activation-5.44e-02

Pos logit contributions
 in+0.022
gy+0.020
o+0.019
 out+0.018
 not+0.017
Neg logit contributions
 disapproval-0.049
umm-0.049
 heavens-0.048
Warning-0.048
 horror-0.046
Token the
Token ID262
Feature activation-1.09e-01

Pos logit contributions
umm+0.123
 heavens+0.118
Warning+0.116
 horror+0.116
 disapproval+0.115
Neg logit contributions
o-0.049
gy-0.049
 in-0.045
 got-0.041
 found-0.037
Token rude
Token ID22066
Feature activation-1.40e-01

Pos logit contributions
umm+0.218
Warning+0.197
 disapproval+0.196
 heavens+0.191
 horror+0.183
Neg logit contributions
 in-0.168
o-0.144
gy-0.143
 got-0.143
 just-0.142
Token bird
Token ID6512
Feature activation-3.90e-02

Pos logit contributions
umm+0.236
 heavens+0.211
 disapproval+0.205
Warning+0.205
 Painter+0.199
Neg logit contributions
 in-0.238
o-0.219
gy-0.218
 got-0.207
 out-0.206
Token would
Token ID561
Feature activation-1.02e-02

Pos logit contributions
 heavens+0.081
 languages+0.078
Warning+0.078
 disapproval+0.076
 horror+0.076
Neg logit contributions
gy-0.023
o-0.023
 out-0.021
 in-0.021
ep-0.018
Token change
Token ID1487
Feature activation-5.28e-02

Pos logit contributions
umm+0.023
 heavens+0.022
 disapproval+0.022
Warning+0.022
 horizon+0.021
Neg logit contributions
 in-0.010
o-0.010
gy-0.009
 got-0.008
ie-0.008
Token and
Token ID290
Feature activation-1.90e-02

Pos logit contributions
umm+0.112
 disapproval+0.110
 sentences+0.105
 horror+0.104
 heavens+0.104
Neg logit contributions
o-0.052
gy-0.050
 got-0.049
 in-0.047
 found-0.043
Token help
Token ID1037
Feature activation-6.49e-02

Pos logit contributions
 heavens+0.034
 delight+0.033
umm+0.033
 horror+0.032
 disapproval+0.032
Neg logit contributions
o-0.025
gy-0.023
ie-0.020
ep-0.020
 in-0.020
Token ran
Token ID4966
Feature activation-3.26e-02

Pos logit contributions
 in+0.026
gy+0.024
 out+0.022
o+0.021
 where+0.020
Neg logit contributions
 disapproval-0.054
umm-0.053
 clerk-0.052
 heavens-0.052
Warning-0.051
Token over
Token ID625
Feature activation-1.47e-02

Pos logit contributions
umm+0.091
Warning+0.085
 disapproval+0.084
 heavens+0.082
 horror+0.080
Neg logit contributions
 in-0.024
 got-0.017
gy-0.017
o-0.017
 just-0.017
Token and
Token ID290
Feature activation-1.46e-02

Pos logit contributions
umm+0.046
Warning+0.043
 disapproval+0.043
 heavens+0.042
 jan+0.041
Neg logit contributions
 in-0.007
 got-0.003
 just-0.003
o-0.003
gy-0.003
Token grabbed
Token ID13176
Feature activation-4.60e-02

Pos logit contributions
 clerk+0.029
 gentleman+0.029
umm+0.029
 disapproval+0.028
 heavens+0.028
Neg logit contributions
 in-0.016
gy-0.015
ie-0.014
 would-0.014
?"-0.013
Token the
Token ID262
Feature activation-1.05e-01

Pos logit contributions
umm+0.112
 disapproval+0.110
Warning+0.108
 horror+0.106
 heavens+0.103
Neg logit contributions
 in-0.039
gy-0.039
o-0.036
 would-0.034
 got-0.033
Token naughty
Token ID45128
Feature activation-1.40e-01

Pos logit contributions
umm+0.208
Warning+0.178
 disapproval+0.176
 heavens+0.175
 subt+0.171
Neg logit contributions
 in-0.180
 got-0.158
 just-0.156
 out-0.154
 found-0.151
Token boy
Token ID2933
Feature activation-2.69e-02

Pos logit contributions
umm+0.278
Warning+0.250
 heavens+0.242
 disapproval+0.239
 subt+0.234
Neg logit contributions
 in-0.228
 got-0.201
 just-0.195
 found-0.191
 out-0.191
Token,
Token ID11
Feature activation-2.84e-02

Pos logit contributions
umm+0.070
 disapproval+0.067
Warning+0.066
 horror+0.064
 heavens+0.064
Neg logit contributions
 in-0.020
 would-0.015
 got-0.015
gy-0.015
o-0.015
Token and
Token ID290
Feature activation-1.97e-02

Pos logit contributions
 disapproval+0.056
umm+0.054
 horror+0.053
 heavens+0.052
Warning+0.052
Neg logit contributions
 in-0.035
gy-0.034
ie-0.033
o-0.033
 would-0.031
Token gave
Token ID2921
Feature activation-5.41e-02

Pos logit contributions
 disapproval+0.040
umm+0.039
 horror+0.039
 clerk+0.039
 gentleman+0.038
Neg logit contributions
gy-0.022
 in-0.020
o-0.020
ie-0.019
?"-0.018
Token him
Token ID683
Feature activation-2.51e-02

Pos logit contributions
 disapproval+0.100
umm+0.098
 anticipation+0.098
 delight+0.096
Warning+0.096
Neg logit contributions
gy-0.080
 in-0.067
o-0.066
 got-0.062
 would-0.062
Token told
Token ID1297
Feature activation-5.61e-02

Pos logit contributions
 heavens+0.020
 horror+0.018
 horizon+0.018
 gentleman+0.018
 Painter+0.018
Neg logit contributions
 in-0.013
gy-0.012
o-0.012
 not-0.011
 where-0.011
Token his
Token ID465
Feature activation-6.84e-02

Pos logit contributions
umm+0.156
Warning+0.145
 disapproval+0.145
 heavens+0.142
 horror+0.139
Neg logit contributions
 in-0.043
o-0.031
gy-0.030
 got-0.030
 just-0.029
Token family
Token ID1641
Feature activation-2.39e-02

Pos logit contributions
umm+0.144
Warning+0.132
 disapproval+0.130
 subt+0.121
 sidelines+0.121
Neg logit contributions
 in-0.104
 not-0.089
 just-0.088
 got-0.088
 would-0.087
Token about
Token ID546
Feature activation-3.27e-02

Pos logit contributions
 heavens+0.042
 clerk+0.042
umm+0.041
 gentleman+0.041
Warning+0.041
Neg logit contributions
gy-0.031
o-0.029
 in-0.027
ep-0.026
bo-0.025
Token the
Token ID262
Feature activation-9.79e-02

Pos logit contributions
umm+0.077
 disapproval+0.073
Warning+0.072
 heavens+0.072
 horror+0.071
Neg logit contributions
 in-0.030
o-0.028
gy-0.027
 got-0.027
 found-0.025
Token wise
Token ID10787
Feature activation-1.39e-01

Pos logit contributions
umm+0.193
 disapproval+0.174
Warning+0.172
 subt+0.166
irk+0.163
Neg logit contributions
 in-0.167
 just-0.149
 would-0.145
 not-0.141
 found-0.141
Token owl
Token ID39610
Feature activation-3.42e-02

Pos logit contributions
umm+0.238
Warning+0.212
 disapproval+0.208
 heavens+0.194
 subt+0.194
Neg logit contributions
 in-0.262
 just-0.231
gy-0.228
 got-0.226
 would-0.224
Token and
Token ID290
Feature activation-1.97e-02

Pos logit contributions
umm+0.077
 heavens+0.075
 horror+0.075
Warning+0.075
 disapproval+0.075
Neg logit contributions
 in-0.024
gy-0.024
o-0.023
 got-0.021
 not-0.020
Token they
Token ID484
Feature activation+1.58e-02

Pos logit contributions
 horror+0.037
 heavens+0.037
 Painter+0.036
 amaz+0.036
 jan+0.036
Neg logit contributions
gy-0.022
o-0.022
 in-0.021
bo-0.020
ie-0.019
Token all
Token ID477
Feature activation+2.51e-03

Pos logit contributions
 in+0.015
gy+0.014
o+0.013
 not+0.012
 out+0.012
Neg logit contributions
 heavens-0.034
umm-0.034
 disapproval-0.033
Warning-0.033
 horror-0.032
Token went
Token ID1816
Feature activation-1.90e-02

Pos logit contributions
gy+0.003
 in+0.003
o+0.002
 not+0.002
 out+0.002
Neg logit contributions
 disapproval-0.005
 horror-0.005
 heavens-0.005
 anticipation-0.005
 orchestra-0.005
Token sign
Token ID1051
Feature activation-1.22e-02

Pos logit contributions
umm+0.111
Warning+0.100
 disapproval+0.097
 heavens+0.095
 subt+0.092
Neg logit contributions
 in-0.086
 got-0.074
 just-0.073
 out-0.072
 would-0.071
Token that
Token ID326
Feature activation-9.15e-03

Pos logit contributions
umm+0.037
Warning+0.034
 disapproval+0.034
 heavens+0.033
 Painter+0.032
Neg logit contributions
 in-0.009
 just-0.005
 out-0.005
 got-0.005
 would-0.005
Token said
Token ID531
Feature activation-3.42e-02

Pos logit contributions
umm+0.023
Warning+0.021
 disapproval+0.021
 heavens+0.020
 sidelines+0.020
Neg logit contributions
 in-0.011
 got-0.009
 just-0.009
 would-0.008
 not-0.008
Token "
Token ID366
Feature activation-7.10e-02

Pos logit contributions
umm+0.076
Warning+0.070
 disapproval+0.069
 heavens+0.068
 horror+0.065
Neg logit contributions
 in-0.044
gy-0.036
o-0.036
 got-0.036
 just-0.036
TokenHom
Token ID28718
Feature activation-1.05e-01

Pos logit contributions
umm+0.115
 horror+0.112
 disapproval+0.100
 jan+0.098
 sentences+0.095
Neg logit contributions
 in-0.138
o-0.127
 got-0.126
 just-0.122
 found-0.119
Tokeneless
Token ID5321
Feature activation-1.39e-01

Pos logit contributions
umm+0.159
Warning+0.140
 disapproval+0.137
 heavens+0.130
 subt+0.124
Neg logit contributions
 in-0.213
 got-0.190
gy-0.189
 just-0.189
o-0.189
Token and
Token ID290
Feature activation-3.27e-02

Pos logit contributions
umm+0.294
Warning+0.264
 disapproval+0.261
 heavens+0.256
 gentleman+0.248
Neg logit contributions
 in-0.195
o-0.167
gy-0.165
 just-0.165
 got-0.164
Token Hungry
Token ID42939
Feature activation-1.32e-01

Pos logit contributions
umm+0.057
Warning+0.052
 disapproval+0.051
 heavens+0.051
 subt+0.049
Neg logit contributions
 in-0.054
gy-0.050
o-0.049
 got-0.047
 just-0.047
Token".
Token ID1911
Feature activation-6.60e-02

Pos logit contributions
umm+0.211
 disapproval+0.192
Warning+0.192
 heavens+0.180
 horror+0.175
Neg logit contributions
 in-0.256
gy-0.227
 got-0.223
o-0.223
 just-0.221
Token Tim
Token ID5045
Feature activation-1.98e-02

Pos logit contributions
umm+0.157
Warning+0.142
 disapproval+0.140
 heavens+0.138
 orchestra+0.134
Neg logit contributions
 in-0.085
 got-0.072
 just-0.071
gy-0.071
o-0.069
Tokenmy
Token ID1820
Feature activation+2.29e-03

Pos logit contributions
umm+0.044
 disapproval+0.043
irk+0.041
Warning+0.041
 Painter+0.041
Neg logit contributions
 in-0.022
 out-0.017
 not-0.017
 there-0.017
 just-0.016
Token\n
Token ID198
Feature activation-3.94e-02

Pos logit contributions
umm+0.071
 disapproval+0.065
Warning+0.064
 heavens+0.063
 horror+0.061
Neg logit contributions
 in-0.038
o-0.032
 got-0.031
 just-0.031
gy-0.030
TokenBuzz
Token ID48230
Feature activation-3.95e-02

Pos logit contributions
umm+0.071
Warning+0.063
 disapproval+0.063
 heavens+0.061
 horror+0.059
Neg logit contributions
 in-0.069
 got-0.060
o-0.059
 just-0.059
gy-0.059
Tokeny
Token ID88
Feature activation-3.69e-02

Pos logit contributions
umm+0.083
 disapproval+0.078
Warning+0.077
irk+0.076
 heavens+0.076
Neg logit contributions
 not-0.038
 in-0.037
 where-0.037
 out-0.035
 just-0.032
Token asked
Token ID1965
Feature activation-4.44e-02

Pos logit contributions
umm+0.089
 disapproval+0.085
Warning+0.085
 heavens+0.085
 clerk+0.083
Neg logit contributions
 in-0.031
 out-0.025
gy-0.025
o-0.025
 not-0.023
Token,
Token ID11
Feature activation-3.39e-02

Pos logit contributions
umm+0.102
 disapproval+0.096
Warning+0.095
 heavens+0.094
 horror+0.092
Neg logit contributions
 in-0.039
gy-0.038
o-0.038
 got-0.037
 just-0.034
Token
Token ID6184
Feature activation-1.39e-01

Pos logit contributions
umm+0.077
Warning+0.068
 disapproval+0.067
 heavens+0.064
 sidelines+0.063
Neg logit contributions
 in-0.052
 just-0.042
 not-0.041
 out-0.041
 got-0.040
Token
Token ID95
Feature activation-1.35e-01

Pos logit contributions
umm+0.246
Warning+0.227
 disapproval+0.226
 horror+0.209
 heavens+0.208
Neg logit contributions
 in-0.243
gy-0.215
 just-0.212
 not-0.208
 got-0.206
Token
Token ID26391
Feature activation-1.09e-01

Pos logit contributions
umm+0.239
 disapproval+0.214
Warning+0.213
 jan+0.206
 gentleman+0.203
Neg logit contributions
 in-0.230
gy-0.212
 just-0.207
o-0.200
 not-0.200
Token
Token ID129
Feature activation-1.20e-01

Pos logit contributions
umm+0.189
 Painter+0.167
Warning+0.159
 heavens+0.158
 Cater+0.154
Neg logit contributions
 in-0.185
 out-0.162
ie-0.160
 up-0.156
 just-0.155
Token
Token ID241
Feature activation-5.79e-02

Pos logit contributions
umm+0.221
Warning+0.193
 disapproval+0.189
 heavens+0.183
 Painter+0.181
Neg logit contributions
 in-0.208
gy-0.185
 just-0.185
 not-0.184
o-0.179
TokenShould
Token ID19926
Feature activation-5.55e-02

Pos logit contributions
umm+0.098
 disapproval+0.085
Warning+0.085
 heavens+0.082
 horror+0.079
Neg logit contributions
 in-0.110
 just-0.097
o-0.096
 got-0.095
 not-0.094
Token,
Token ID11
Feature activation-3.34e-02

Pos logit contributions
umm+0.022
Warning+0.020
 disapproval+0.020
 heavens+0.020
 horror+0.020
Neg logit contributions
 in-0.007
gy-0.006
 got-0.006
o-0.006
 found-0.006
Token all
Token ID477
Feature activation-8.10e-03

Pos logit contributions
umm+0.086
 disapproval+0.080
Warning+0.079
 heavens+0.079
 horror+0.077
Neg logit contributions
 in-0.026
gy-0.021
 got-0.020
o-0.020
 out-0.019
Token thanks
Token ID5176
Feature activation-6.25e-02

Pos logit contributions
umm+0.018
 disapproval+0.017
 horror+0.017
 heavens+0.017
 Painter+0.016
Neg logit contributions
 in-0.007
gy-0.007
 got-0.007
o-0.007
 not-0.007
Token to
Token ID284
Feature activation-4.10e-02

Pos logit contributions
umm+0.181
 disapproval+0.168
Warning+0.168
 heavens+0.166
 horror+0.162
Neg logit contributions
 in-0.029
gy-0.021
 got-0.021
o-0.020
 found-0.018
Token the
Token ID262
Feature activation-1.07e-01

Pos logit contributions
umm+0.111
Warning+0.103
 disapproval+0.103
 heavens+0.100
 jan+0.098
Neg logit contributions
 in-0.035
 just-0.026
gy-0.025
 got-0.025
o-0.025
Token wise
Token ID10787
Feature activation-1.38e-01

Pos logit contributions
umm+0.238
ement+0.200
Warning+0.200
 horror+0.199
ula+0.198
Neg logit contributions
 in-0.155
 would-0.146
 just-0.143
 will-0.135
 found-0.134
Token owl
Token ID39610
Feature activation-2.91e-02

Pos logit contributions
umm+0.243
Warning+0.217
 disapproval+0.212
 sidelines+0.203
 subt+0.202
Neg logit contributions
 in-0.256
 just-0.226
gy-0.225
 got-0.218
 would-0.218
Token's
Token ID338
Feature activation-6.85e-02

Pos logit contributions
umm+0.074
Warning+0.068
 heavens+0.068
 disapproval+0.068
 Painter+0.067
Neg logit contributions
 in-0.021
 got-0.018
 not-0.017
gy-0.016
 found-0.016
Token delivery
Token ID7585
Feature activation-5.78e-02

Pos logit contributions
umm+0.112
Warning+0.098
 disapproval+0.091
 sidelines+0.088
 jan+0.088
Neg logit contributions
 in-0.138
 just-0.123
 not-0.122
 would-0.121
 to-0.118
Token.
Token ID13
Feature activation-6.06e-02

Pos logit contributions
umm+0.150
Warning+0.143
 disapproval+0.140
 subt+0.134
 heavens+0.134
Neg logit contributions
 in-0.060
 just-0.048
 would-0.046
 got-0.043
o-0.043
Token<|endoftext|>
Token ID50256
Feature activation-1.09e-02

Pos logit contributions
umm+0.147
 disapproval+0.137
Warning+0.134
 heavens+0.130
 sidelines+0.129
Neg logit contributions
 in-0.069
o-0.060
gy-0.056
 got-0.055
 just-0.055
Token When
Token ID1649
Feature activation-4.76e-03

Pos logit contributions
umm+0.137
 disapproval+0.128
Warning+0.124
 jan+0.120
 orchestra+0.117
Neg logit contributions
 in-0.112
o-0.100
 out-0.098
gy-0.095
 where-0.094
Token they
Token ID484
Feature activation+4.45e-03

Pos logit contributions
umm+0.013
 disapproval+0.013
Warning+0.012
irk+0.012
 heavens+0.012
Neg logit contributions
 in-0.004
 just-0.003
 got-0.003
 not-0.003
 out-0.003
Token were
Token ID547
Feature activation-3.29e-02

Pos logit contributions
 in+0.005
gy+0.004
o+0.004
 not+0.004
 just+0.004
Neg logit contributions
umm-0.010
Warning-0.010
 disapproval-0.010
 heavens-0.010
 horror-0.009
Token finished
Token ID5201
Feature activation-4.95e-02

Pos logit contributions
umm+0.075
 disapproval+0.071
 heavens+0.069
Warning+0.068
 horror+0.068
Neg logit contributions
 in-0.035
gy-0.031
o-0.030
 would-0.030
 got-0.029
Token the
Token ID262
Feature activation-9.90e-02

Pos logit contributions
umm+0.122
Warning+0.112
 disapproval+0.112
 heavens+0.111
 subt+0.109
Neg logit contributions
 in-0.042
o-0.039
gy-0.036
 got-0.036
 would-0.035
Token wise
Token ID10787
Feature activation-1.38e-01

Pos logit contributions
umm+0.195
Warning+0.179
 disapproval+0.175
 heavens+0.163
 subt+0.162
Neg logit contributions
 in-0.164
 just-0.139
 got-0.138
 found-0.137
 out-0.136
Token woman
Token ID2415
Feature activation-1.64e-02

Pos logit contributions
umm+0.197
Warning+0.171
 disapproval+0.168
 heavens+0.155
 subt+0.151
Neg logit contributions
 in-0.297
 got-0.266
 just-0.265
gy-0.263
 found-0.261
Token said
Token ID531
Feature activation-4.29e-02

Pos logit contributions
umm+0.044
Warning+0.041
 disapproval+0.041
 heavens+0.040
 horror+0.039
Neg logit contributions
 in-0.013
gy-0.009
o-0.009
 got-0.009
 just-0.009
Token,
Token ID11
Feature activation-4.69e-02

Pos logit contributions
umm+0.113
Warning+0.105
 disapproval+0.104
 heavens+0.102
 jan+0.099
Neg logit contributions
 in-0.040
 got-0.029
 just-0.029
o-0.029
gy-0.029
Token "
Token ID366
Feature activation-6.79e-02

Pos logit contributions
umm+0.101
Warning+0.090
 disapproval+0.090
 jan+0.085
irk+0.084
Neg logit contributions
 in-0.072
 just-0.060
 got-0.059
 out-0.059
 not-0.058
TokenYou
Token ID1639
Feature activation-1.71e-02

Pos logit contributions
umm+0.096
 disapproval+0.085
 horror+0.082
Warning+0.082
 jan+0.081
Neg logit contributions
 in-0.142
 just-0.128
o-0.127
 would-0.127
 got-0.126
Token happy
Token ID3772
Feature activation-5.18e-02

Pos logit contributions
umm+0.007
 disapproval+0.006
Warning+0.006
 heavens+0.006
 horror+0.006
Neg logit contributions
 in-0.003
gy-0.003
o-0.003
 got-0.003
 would-0.003
Token.
Token ID13
Feature activation-5.39e-02

Pos logit contributions
umm+0.153
Warning+0.143
 disapproval+0.142
 heavens+0.140
 jan+0.136
Neg logit contributions
 in-0.032
 just-0.020
 got-0.020
o-0.019
gy-0.019
Token They
Token ID1119
Feature activation+2.80e-02

Pos logit contributions
umm+0.128
 disapproval+0.125
Warning+0.120
 sidelines+0.116
 orchestra+0.116
Neg logit contributions
 in-0.065
o-0.058
 to-0.053
gy-0.052
 out-0.052
Token thanked
Token ID26280
Feature activation-4.68e-02

Pos logit contributions
 in+0.022
 got+0.016
 just+0.016
o+0.016
gy+0.015
Neg logit contributions
umm-0.077
Warning-0.072
 disapproval-0.071
 heavens-0.070
 horror-0.068
Token the
Token ID262
Feature activation-1.07e-01

Pos logit contributions
umm+0.143
Warning+0.131
 disapproval+0.130
 jan+0.126
 heavens+0.126
Neg logit contributions
 in-0.031
 just-0.019
 not-0.018
 out-0.017
 got-0.017
Token wise
Token ID10787
Feature activation-1.38e-01

Pos logit contributions
umm+0.225
 disapproval+0.201
Warning+0.198
 horror+0.187
ula+0.186
Neg logit contributions
 in-0.178
 got-0.155
 would-0.152
 found-0.152
 not-0.151
Token old
Token ID1468
Feature activation-1.01e-01

Pos logit contributions
umm+0.234
Warning+0.209
 disapproval+0.206
 subt+0.193
 sidelines+0.190
Neg logit contributions
 in-0.265
 got-0.230
 just-0.228
 found-0.226
 out-0.225
Token man
Token ID582
Feature activation-1.73e-02

Pos logit contributions
umm+0.173
 disapproval+0.151
Warning+0.148
 subt+0.138
 horror+0.137
Neg logit contributions
 in-0.194
 got-0.172
 found-0.169
 just-0.168
 would-0.168
Token and
Token ID290
Feature activation-1.30e-02

Pos logit contributions
umm+0.045
Warning+0.042
 disapproval+0.042
 heavens+0.042
 horror+0.041
Neg logit contributions
 in-0.012
 got-0.010
gy-0.009
 found-0.009
 not-0.009
Token played
Token ID2826
Feature activation-4.80e-02

Pos logit contributions
umm+0.032
Warning+0.030
 disapproval+0.030
 heavens+0.029
 subt+0.029
Neg logit contributions
 in-0.012
gy-0.010
o-0.010
 just-0.009
 not-0.009
Token with
Token ID351
Feature activation-4.54e-02

Pos logit contributions
umm+0.131
Warning+0.123
 disapproval+0.122
 heavens+0.120
 horror+0.117
Neg logit contributions
 in-0.031
o-0.027
gy-0.026
 got-0.025
 would-0.023
Token long
Token ID890
Feature activation-7.38e-02

Pos logit contributions
umm+0.140
Warning+0.128
 heavens+0.125
 disapproval+0.124
 sidelines+0.121
Neg logit contributions
 in-0.060
 got-0.047
 just-0.046
 not-0.044
 found-0.044
Token.
Token ID13
Feature activation-7.18e-02

Pos logit contributions
umm+0.181
Warning+0.167
 disapproval+0.166
 heavens+0.163
 horror+0.158
Neg logit contributions
 in-0.079
 got-0.062
gy-0.062
 just-0.062
o-0.062
Token Soon
Token ID15894
Feature activation-2.02e-02

Pos logit contributions
umm+0.151
Warning+0.147
 disapproval+0.146
 orchestra+0.145
irk+0.141
Neg logit contributions
 in-0.098
 just-0.083
gy-0.081
 got-0.080
o-0.079
Token,
Token ID11
Feature activation-4.47e-02

Pos logit contributions
umm+0.055
Warning+0.051
 disapproval+0.051
 heavens+0.050
 horror+0.049
Neg logit contributions
 in-0.016
gy-0.012
o-0.012
 got-0.012
 just-0.011
Token the
Token ID262
Feature activation-8.72e-02

Pos logit contributions
umm+0.112
Warning+0.105
 disapproval+0.103
 heavens+0.102
 jan+0.100
Neg logit contributions
 in-0.039
o-0.036
gy-0.035
 would-0.030
ie-0.030
Token wicked
Token ID20589
Feature activation-1.38e-01

Pos logit contributions
umm+0.183
Warning+0.167
 disapproval+0.163
 heavens+0.156
 horror+0.155
Neg logit contributions
 in-0.135
 got-0.118
 found-0.115
 not-0.115
 just-0.114
Token man
Token ID582
Feature activation-1.32e-02

Pos logit contributions
umm+0.218
Warning+0.198
 disapproval+0.192
 heavens+0.178
 subt+0.175
Neg logit contributions
 in-0.273
 got-0.245
 just-0.242
 not-0.238
 found-0.237
Token became
Token ID2627
Feature activation-3.02e-02

Pos logit contributions
umm+0.034
Warning+0.032
 disapproval+0.032
 heavens+0.032
 clerk+0.031
Neg logit contributions
 in-0.010
gy-0.008
o-0.007
 out-0.007
 not-0.007
Token greedy
Token ID31828
Feature activation-8.10e-02

Pos logit contributions
umm+0.058
 disapproval+0.056
Warning+0.053
 jan+0.053
 heavens+0.052
Neg logit contributions
 in-0.040
gy-0.037
o-0.037
 would-0.037
 got-0.035
Token again
Token ID757
Feature activation-2.40e-02

Pos logit contributions
umm+0.197
 disapproval+0.179
 gentleman+0.178
irk+0.177
Warning+0.176
Neg logit contributions
gy-0.068
o-0.065
 in-0.064
 got-0.057
 found-0.054
Token and
Token ID290
Feature activation-1.64e-02

Pos logit contributions
umm+0.058
Warning+0.056
 disapproval+0.055
 heavens+0.053
 horror+0.052
Neg logit contributions
 in-0.021
gy-0.018
o-0.018
 got-0.017
 would-0.017
Token he
Token ID339
Feature activation+9.32e-03

Pos logit contributions
umm+0.038
 disapproval+0.036
Warning+0.035
 subt+0.035
 gentleman+0.034
Neg logit contributions
 in-0.024
gy-0.021
o-0.020
 out-0.020
 just-0.019
Token screamed
Token ID25421
Feature activation-5.12e-02

Pos logit contributions
 in+0.009
gy+0.008
 out+0.008
o+0.007
 where+0.007
Neg logit contributions
umm-0.020
 disapproval-0.020
Warning-0.019
 heavens-0.019
 subt-0.019
Token and
Token ID290
Feature activation-1.75e-02

Pos logit contributions
umm+0.142
Warning+0.132
 disapproval+0.132
 heavens+0.129
 horror+0.126
Neg logit contributions
 in-0.037
o-0.026
 got-0.026
gy-0.026
 just-0.025
Token he
Token ID339
Feature activation+4.72e-03

Pos logit contributions
umm+0.036
 disapproval+0.036
Warning+0.034
 horror+0.034
 gentleman+0.034
Neg logit contributions
 in-0.020
gy-0.019
o-0.018
ie-0.016
 out-0.016
Token said
Token ID531
Feature activation-3.97e-02

Pos logit contributions
 in+0.004
gy+0.004
 out+0.004
o+0.004
 where+0.003
Neg logit contributions
umm-0.010
 disapproval-0.010
Warning-0.010
 horror-0.010
 jan-0.010
Token
Token ID6184
Feature activation-1.38e-01

Pos logit contributions
umm+0.083
 disapproval+0.077
 delight+0.077
 sentences+0.075
 gentleman+0.075
Neg logit contributions
gy-0.048
o-0.045
 in-0.041
ie-0.038
 found-0.036
Token
Token ID95
Feature activation-1.29e-01

Pos logit contributions
umm+0.230
 disapproval+0.209
Warning+0.208
 heavens+0.197
 horror+0.192
Neg logit contributions
 in-0.258
 just-0.236
 not-0.232
 got-0.228
gy-0.226
Token
Token ID26391
Feature activation-1.05e-01

Pos logit contributions
umm+0.207
 disapproval+0.184
Warning+0.177
 horror+0.175
 heavens+0.171
Neg logit contributions
 in-0.243
 just-0.227
 got-0.221
 not-0.220
gy-0.219
Token
Token ID129
Feature activation-1.19e-01

Pos logit contributions
umm+0.182
Warning+0.165
 Painter+0.160
 disapproval+0.156
 subt+0.154
Neg logit contributions
 in-0.177
ie-0.154
 out-0.152
 just-0.148
gy-0.144
Token
Token ID241
Feature activation-5.85e-02

Pos logit contributions
umm+0.218
Warning+0.191
 disapproval+0.188
 heavens+0.183
 Painter+0.178
Neg logit contributions
 in-0.206
 just-0.182
 not-0.181
gy-0.180
 got-0.180
TokenThis
Token ID1212
Feature activation-6.14e-02

Pos logit contributions
umm+0.094
 disapproval+0.082
Warning+0.082
 heavens+0.079
 horror+0.076
Neg logit contributions
 in-0.114
 just-0.101
o-0.101
 got-0.100
 not-0.098
Token m
Token ID285
Feature activation-1.17e-01

Pos logit contributions
umm+0.227
Warning+0.213
 subt+0.194
igsaw+0.193
 disapproval+0.186
Neg logit contributions
 in-0.131
 just-0.118
 out-0.110
 got-0.110
 not-0.109
Tokenop
Token ID404
Feature activation-3.90e-02

Pos logit contributions
umm+0.224
Warning+0.209
 disapproval+0.206
 heavens+0.198
 subt+0.194
Neg logit contributions
 in-0.181
o-0.170
 got-0.162
 just-0.156
--0.155
Token call
Token ID869
Feature activation-2.38e-02

Pos logit contributions
umm+0.107
Warning+0.099
 disapproval+0.098
 heavens+0.096
 subt+0.094
Neg logit contributions
 in-0.033
 just-0.024
 got-0.023
o-0.023
gy-0.023
Token after
Token ID706
Feature activation-4.11e-02

Pos logit contributions
umm+0.075
Warning+0.066
 disapproval+0.065
 subt+0.065
 heavens+0.064
Neg logit contributions
 in-0.020
 out-0.014
 just-0.012
 to-0.012
 got-0.011
Token her
Token ID607
Feature activation-8.19e-02

Pos logit contributions
umm+0.122
Warning+0.110
 disapproval+0.108
 heavens+0.106
 Painter+0.105
Neg logit contributions
 in-0.035
 just-0.024
 out-0.023
 not-0.022
 to-0.021
Token
Token ID6184
Feature activation-1.38e-01

Pos logit contributions
umm+0.240
Warning+0.226
 disapproval+0.218
 heavens+0.215
 sidelines+0.212
Neg logit contributions
 in-0.067
 just-0.047
 out-0.041
 not-0.041
 to-0.040
Token
Token ID95
Feature activation-1.29e-01

Pos logit contributions
umm+0.235
 disapproval+0.201
Warning+0.200
 horror+0.185
 Painter+0.185
Neg logit contributions
 in-0.251
 got-0.230
 just-0.230
 not-0.229
gy-0.223
Token
Token ID26391
Feature activation-9.86e-02

Pos logit contributions
umm+0.213
 disapproval+0.175
Warning+0.172
 horror+0.168
 Painter+0.163
Neg logit contributions
 in-0.248
 just-0.229
 got-0.228
 not-0.224
 found-0.222
Token
Token ID129
Feature activation-1.15e-01

Pos logit contributions
umm+0.177
Warning+0.162
 disapproval+0.158
 heavens+0.155
 Painter+0.149
Neg logit contributions
 in-0.165
gy-0.142
 just-0.141
o-0.139
 got-0.139
Token
Token ID241
Feature activation-5.75e-02

Pos logit contributions
umm+0.224
Warning+0.190
 disapproval+0.185
 Painter+0.181
 heavens+0.179
Neg logit contributions
 in-0.196
 not-0.175
 just-0.175
 got-0.170
gy-0.169
TokenRemember
Token ID16676
Feature activation-4.91e-02

Pos logit contributions
umm+0.094
 disapproval+0.080
Warning+0.080
 heavens+0.077
 horror+0.075
Neg logit contributions
 in-0.112
 just-0.100
 got-0.099
o-0.099
 would-0.097
Token
Token ID129
Feature activation+1.22e-01

Pos logit contributions
umm+0.017
 disapproval+0.017
Warning+0.017
 clerk+0.016
 horror+0.016
Neg logit contributions
 in-0.009
gy-0.008
o-0.008
 not-0.007
 out-0.007
Token
Token ID241
Feature activation+1.20e-01

Pos logit contributions
 in+0.229
gy+0.202
 got+0.202
o+0.201
 just+0.201
Neg logit contributions
umm-0.203
Warning-0.179
 disapproval-0.177
 heavens-0.171
 horror-0.164
TokenI
Token ID40
Feature activation+1.39e-01

Pos logit contributions
 in+0.195
gy+0.172
 got+0.170
 just+0.167
 would+0.166
Neg logit contributions
umm-0.221
Warning-0.204
 disapproval-0.199
 heavens-0.193
 gentleman-0.184
Tokenâ
Token ID22940
Feature activation+6.19e-02

Pos logit contributions
gy+0.184
 in+0.180
o+0.156
 where+0.153
ie+0.151
Neg logit contributions
 disapproval-0.255
umm-0.239
Warning-0.236
irk-0.236
 horror-0.230
Token
Token ID26391
Feature activation+6.25e-02

Pos logit contributions
 in+0.120
o+0.108
 got+0.108
gy+0.106
 just+0.106
Neg logit contributions
umm-0.098
Warning-0.085
 disapproval-0.083
 heavens-0.081
 horror-0.081
Token
Token ID8151
Feature activation+1.65e-01

Pos logit contributions
 in+0.099
 out+0.085
 would+0.085
ie+0.085
 just+0.084
Neg logit contributions
umm-0.112
Warning-0.101
 disapproval-0.100
 gentleman-0.100
 Painter-0.099
Tokenm
Token ID76
Feature activation+6.46e-02

Pos logit contributions
 in+0.201
 got+0.163
gy+0.163
 just+0.162
o+0.162
Neg logit contributions
umm-0.378
Warning-0.347
 disapproval-0.345
 heavens-0.336
 horror-0.325
Token sorry
Token ID7926
Feature activation+5.86e-02

Pos logit contributions
 in+0.097
 would+0.090
gy+0.088
 got+0.086
 found+0.086
Neg logit contributions
umm-0.117
 disapproval-0.109
 heavens-0.103
Warning-0.100
 Painter-0.099
Token Jane
Token ID12091
Feature activation+1.06e-01

Pos logit contributions
 in+0.053
gy+0.048
 found+0.044
o+0.044
 got+0.043
Neg logit contributions
umm-0.145
 disapproval-0.132
Warning-0.130
 heavens-0.129
 gentleman-0.127
Token.
Token ID13
Feature activation+3.10e-02

Pos logit contributions
 in+0.095
gy+0.083
 not+0.078
 out+0.076
 where+0.075
Neg logit contributions
umm-0.242
 disapproval-0.228
Warning-0.224
 heavens-0.221
 Painter-0.221
Token I
Token ID314
Feature activation+1.02e-01

Pos logit contributions
 in+0.049
o+0.044
 just+0.043
gy+0.042
 there+0.042
Neg logit contributions
umm-0.060
 disapproval-0.055
Warning-0.055
 heavens-0.051
 sidelines-0.051
Token would
Token ID561
Feature activation+1.70e-01

Pos logit contributions
 orchestra+0.058
Warning+0.057
 disapproval+0.055
 sidelines+0.055
 heavens+0.055
Neg logit contributions
o-0.047
 in-0.045
gy-0.040
ie-0.039
 got-0.038
Token supply
Token ID5127
Feature activation+4.98e-02

Pos logit contributions
o+0.224
gy+0.196
 in+0.185
led+0.182
ie+0.177
Neg logit contributions
 delight-0.299
 heavens-0.296
 disapproval-0.294
 sidelines-0.294
 orchestra-0.288
Token them
Token ID606
Feature activation+1.49e-01

Pos logit contributions
 in+0.048
gy+0.048
o+0.045
 got+0.045
 would+0.041
Neg logit contributions
umm-0.116
 disapproval-0.105
Warning-0.103
 clerk-0.102
 heavens-0.102
Token with
Token ID351
Feature activation+9.50e-02

Pos logit contributions
o+0.139
gy+0.130
 got+0.115
ep+0.109
bo+0.105
Neg logit contributions
 anticipation-0.308
umm-0.301
 sentences-0.295
 languages-0.294
 heavens-0.293
Token things
Token ID1243
Feature activation+1.34e-01

Pos logit contributions
 in+0.110
o+0.099
gy+0.097
 got+0.094
 just+0.088
Neg logit contributions
umm-0.211
 disapproval-0.200
Warning-0.195
 heavens-0.191
 horror-0.189
Token they
Token ID484
Feature activation+1.49e-01

Pos logit contributions
gy+0.127
o+0.123
 got+0.102
 in+0.098
ie+0.095
Neg logit contributions
umm-0.302
 heavens-0.282
 clerk-0.275
 horizon-0.274
Warning-0.272
Token needed
Token ID2622
Feature activation+9.91e-02

Pos logit contributions
gy+0.132
o+0.129
 in+0.125
 out+0.110
ie+0.103
Neg logit contributions
umm-0.324
 heavens-0.319
 disapproval-0.318
Warning-0.309
 anticipation-0.298
Token like
Token ID588
Feature activation+7.02e-02

Pos logit contributions
 in+0.078
o+0.071
gy+0.071
 got+0.065
 just+0.059
Neg logit contributions
umm-0.249
Warning-0.234
 disapproval-0.233
 heavens-0.229
 horror-0.223
Token food
Token ID2057
Feature activation+7.35e-02

Pos logit contributions
 in+0.076
o+0.063
gy+0.063
 got+0.062
 just+0.060
Neg logit contributions
umm-0.168
Warning-0.154
 disapproval-0.154
 heavens-0.150
 horror-0.146
Token,
Token ID11
Feature activation+5.68e-02

Pos logit contributions
gy+0.059
 in+0.059
o+0.058
 got+0.052
 just+0.048
Neg logit contributions
umm-0.179
 heavens-0.161
Warning-0.161
 disapproval-0.160
 clerk-0.158
Token water
Token ID1660
Feature activation+9.44e-02

Pos logit contributions
 in+0.068
gy+0.062
o+0.062
 got+0.059
 out+0.056
Neg logit contributions
umm-0.123
 disapproval-0.114
Warning-0.112
 heavens-0.110
 horror-0.107
Token comes
Token ID2058
Feature activation+1.37e-01

Pos logit contributions
gy+0.161
 not+0.153
o+0.138
 in+0.130
 found+0.130
Neg logit contributions
umm-0.382
 horizon-0.351
 heavens-0.326
ement-0.318
 delight-0.305
Token out
Token ID503
Feature activation+1.38e-01

Pos logit contributions
 in+0.091
o+0.090
 got+0.084
gy+0.084
 would+0.083
Neg logit contributions
umm-0.344
 disapproval-0.325
 heavens-0.320
Warning-0.318
 horizon-0.317
Token and
Token ID290
Feature activation+1.15e-01

Pos logit contributions
 in+0.122
gy+0.107
o+0.106
 got+0.102
 would+0.101
Neg logit contributions
umm-0.347
 disapproval-0.321
Warning-0.318
 heavens-0.315
 horror-0.310
Token the
Token ID262
Feature activation+3.95e-02

Pos logit contributions
gy+0.128
o+0.122
 in+0.110
ie+0.110
-+0.097
Neg logit contributions
 heavens-0.235
umm-0.231
 horror-0.226
 horizon-0.224
 disapproval-0.223
Token dro
Token ID3102
Feature activation+4.99e-02

Pos logit contributions
 in+0.069
o+0.062
gy+0.062
 got+0.061
 just+0.061
Neg logit contributions
umm-0.068
 disapproval-0.060
Warning-0.060
 heavens-0.059
 horizon-0.056
Tokenplets
Token ID46916
Feature activation+1.55e-01

Pos logit contributions
 in+0.115
 not+0.105
 would+0.105
 just+0.104
 found+0.103
Neg logit contributions
umm-0.063
 heavens-0.050
Warning-0.049
 disapproval-0.049
 horizon-0.046
Token of
Token ID286
Feature activation+9.25e-02

Pos logit contributions
gy+0.155
 in+0.145
o+0.139
 not+0.120
 just+0.118
Neg logit contributions
umm-0.351
 heavens-0.332
 disapproval-0.331
Warning-0.328
 horror-0.321
Token water
Token ID1660
Feature activation+1.20e-01

Pos logit contributions
 in+0.103
o+0.092
gy+0.091
 got+0.086
 would+0.085
Neg logit contributions
umm-0.213
 disapproval-0.200
Warning-0.197
 heavens-0.195
 horror-0.189
Token in
Token ID287
Feature activation+7.15e-02

Pos logit contributions
gy+0.098
o+0.094
 in+0.093
 not+0.083
 just+0.081
Neg logit contributions
umm-0.280
 heavens-0.260
 horizon-0.259
 disapproval-0.258
Warning-0.257
Token the
Token ID262
Feature activation+2.21e-02

Pos logit contributions
 in+0.060
o+0.046
gy+0.046
 got+0.045
 just+0.044
Neg logit contributions
umm-0.189
Warning-0.177
 disapproval-0.176
 heavens-0.172
 horror-0.168
Token sky
Token ID6766
Feature activation+1.25e-01

Pos logit contributions
 in+0.041
o+0.036
gy+0.036
 got+0.036
 just+0.036
Neg logit contributions
umm-0.037
Warning-0.032
 disapproval-0.032
 heavens-0.031
 horror-0.030
Token outside
Token ID2354
Feature activation+1.25e-01

Pos logit contributions
o+0.062
gy+0.060
 found+0.058
 got+0.057
 would+0.057
Neg logit contributions
umm-0.174
 horizon-0.153
 heavens-0.150
 gentleman-0.144
Warning-0.143
Token.
Token ID13
Feature activation+4.79e-02

Pos logit contributions
o+0.110
gy+0.104
 in+0.096
 got+0.091
 would+0.089
Neg logit contributions
umm-0.281
 horror-0.259
 disapproval-0.259
 heavens-0.257
 horizon-0.255
Token Kitty
Token ID21168
Feature activation+8.01e-02

Pos logit contributions
 in+0.081
o+0.071
 just+0.070
 got+0.069
gy+0.069
Neg logit contributions
umm-0.092
Warning-0.081
 disapproval-0.080
 heavens-0.077
 jan-0.075
Token and
Token ID290
Feature activation+7.85e-02

Pos logit contributions
 in+0.067
gy+0.059
 out+0.059
 not+0.057
 where+0.055
Neg logit contributions
 disapproval-0.171
 heavens-0.170
umm-0.169
Warning-0.165
 gentleman-0.163
Token Dog
Token ID8532
Feature activation+6.88e-02

Pos logit contributions
 in+0.133
gy+0.118
o+0.117
 got+0.117
 just+0.115
Neg logit contributions
umm-0.117
 disapproval-0.115
Warning-0.111
 horror-0.110
 gentleman-0.108
Tokengy
Token ID1360
Feature activation+1.54e-01

Pos logit contributions
 in+0.067
 not+0.056
 just+0.053
o+0.052
?"+0.052
Neg logit contributions
 disapproval-0.147
umm-0.146
Warning-0.144
 heavens-0.140
 horror-0.139
Token were
Token ID547
Feature activation+8.06e-02

Pos logit contributions
 in+0.155
o+0.141
 out+0.131
 not+0.128
 there+0.127
Neg logit contributions
 disapproval-0.300
Warning-0.288
 horror-0.280
umm-0.277
 heavens-0.276
Token inside
Token ID2641
Feature activation+1.09e-01

Pos logit contributions
gy+0.098
 in+0.096
 would+0.095
o+0.095
 got+0.094
Neg logit contributions
umm-0.147
 disapproval-0.144
 horror-0.142
 heavens-0.138
 delight-0.136
Token the
Token ID262
Feature activation-2.42e-03

Pos logit contributions
gy+0.087
o+0.084
 in+0.081
 got+0.077
bo+0.071
Neg logit contributions
umm-0.255
 disapproval-0.248
Warning-0.244
 horror-0.241
 heavens-0.239
Token house
Token ID2156
Feature activation+9.60e-02

Pos logit contributions
umm+0.005
Warning+0.004
 disapproval+0.004
 subt+0.004
irk+0.004
Neg logit contributions
 in-0.004
 out-0.003
 just-0.003
 would-0.003
 got-0.003
Token,
Token ID11
Feature activation+5.75e-02

Pos logit contributions
 in+0.073
gy+0.070
o+0.070
 got+0.062
 just+0.060
Neg logit contributions
umm-0.232
 heavens-0.214
Warning-0.211
 disapproval-0.210
 clerk-0.209
Token Tim
Token ID5045
Feature activation+1.08e-01

Pos logit contributions
 in+0.129
gy+0.125
o+0.119
 got+0.114
ie+0.109
Neg logit contributions
umm-0.220
 disapproval-0.207
Warning-0.203
 heavens-0.203
 gentleman-0.201
Tokenmy
Token ID1820
Feature activation+1.36e-01

Pos logit contributions
 in+0.107
gy+0.088
 where+0.088
 not+0.087
 out+0.083
Neg logit contributions
 Painter-0.207
 disapproval-0.206
Warning-0.201
umm-0.201
irk-0.201
Token met
Token ID1138
Feature activation+1.06e-01

Pos logit contributions
 in+0.136
gy+0.110
 where+0.108
 out+0.105
 not+0.104
Neg logit contributions
 gentleman-0.246
 Painter-0.245
 cheers-0.237
 horror-0.235
 disapproval-0.235
Token some
Token ID617
Feature activation+7.21e-02

Pos logit contributions
 in+0.068
 got+0.053
gy+0.052
 would+0.050
o+0.049
Neg logit contributions
umm-0.286
 disapproval-0.276
Warning-0.273
 heavens-0.265
 clerk-0.261
Token friendly
Token ID8030
Feature activation+1.94e-02

Pos logit contributions
 in+0.111
 just+0.092
 got+0.090
 would+0.089
 out+0.088
Neg logit contributions
umm-0.153
Warning-0.139
 disapproval-0.134
 heavens-0.131
 horror-0.128
Token aliens
Token ID16269
Feature activation+1.51e-01

Pos logit contributions
 in+0.033
 got+0.028
o+0.028
 just+0.028
 out+0.028
Neg logit contributions
umm-0.037
Warning-0.034
 disapproval-0.032
 heavens-0.032
 horror-0.031
Token who
Token ID508
Feature activation+1.09e-01

Pos logit contributions
gy+0.096
 not+0.090
 in+0.090
 got+0.089
o+0.078
Neg logit contributions
umm-0.359
Warning-0.342
 gentleman-0.341
 disapproval-0.338
 heavens-0.337
Token were
Token ID547
Feature activation+8.40e-02

Pos logit contributions
 in+0.098
gy+0.081
 out+0.077
o+0.076
 not+0.075
Neg logit contributions
umm-0.263
Warning-0.248
 disapproval-0.247
 heavens-0.245
 gentleman-0.240
Token also
Token ID635
Feature activation+1.13e-01

Pos logit contributions
 in+0.096
 got+0.084
gy+0.084
o+0.082
 would+0.080
Neg logit contributions
umm-0.185
 disapproval-0.173
Warning-0.170
 heavens-0.170
 jan-0.165
Token pretending
Token ID23662
Feature activation+8.80e-02

Pos logit contributions
 in+0.164
gy+0.148
 got+0.148
o+0.144
 would+0.144
Neg logit contributions
umm-0.208
 disapproval-0.194
 heavens-0.189
Warning-0.187
 jan-0.181
Token to
Token ID284
Feature activation+5.39e-02

Pos logit contributions
gy+0.049
 in+0.046
o+0.043
 got+0.036
 found+0.034
Neg logit contributions
umm-0.232
 disapproval-0.217
Warning-0.217
 heavens-0.216
 sentences-0.209
Token wearing
Token ID5762
Feature activation+6.67e-02

Pos logit contributions
 in+0.017
gy+0.016
 got+0.013
o+0.013
 found+0.012
Neg logit contributions
umm-0.077
 heavens-0.071
 subt-0.070
Warning-0.070
 disapproval-0.070
Token her
Token ID607
Feature activation+2.13e-02

Pos logit contributions
o+0.088
 in+0.086
gy+0.085
 would+0.078
 got+0.078
Neg logit contributions
umm-0.127
 disapproval-0.119
Warning-0.117
 heavens-0.117
 jan-0.116
Token red
Token ID2266
Feature activation+1.70e-02

Pos logit contributions
o+0.035
 in+0.034
gy+0.033
 got+0.032
ie+0.031
Neg logit contributions
umm-0.033
 heavens-0.029
 disapproval-0.029
 orchestra-0.029
 gentleman-0.029
Token m
Token ID285
Feature activation-2.15e-02

Pos logit contributions
 in+0.027
o+0.025
 not+0.024
gy+0.024
 found+0.024
Neg logit contributions
umm-0.029
 disapproval-0.026
Warning-0.026
 gentleman-0.025
 heavens-0.025
Tokenitten
Token ID2621
Feature activation+6.69e-02

Pos logit contributions
umm+0.035
Warning+0.031
 disapproval+0.031
 heavens+0.029
 horror+0.028
Neg logit contributions
 in-0.041
o-0.037
 got-0.036
gy-0.036
 just-0.036
Token when
Token ID618
Feature activation+5.47e-02

Pos logit contributions
 in+0.048
 found+0.046
 got+0.045
o+0.043
gy+0.043
Neg logit contributions
umm-0.160
 heavens-0.146
 clerk-0.145
Warning-0.143
 horizon-0.142
Token it
Token ID340
Feature activation+7.33e-02

Pos logit contributions
 in+0.072
gy+0.067
o+0.065
 got+0.062
 just+0.061
Neg logit contributions
umm-0.112
Warning-0.103
 disapproval-0.101
 heavens-0.100
 jan-0.096
Token was
Token ID373
Feature activation+6.61e-02

Pos logit contributions
 in+0.062
 out+0.050
gy+0.050
 not+0.049
 where+0.048
Neg logit contributions
umm-0.174
 heavens-0.163
Warning-0.161
 disapproval-0.160
 jan-0.156
Token cold
Token ID4692
Feature activation+2.27e-02

Pos logit contributions
 in+0.085
gy+0.085
o+0.082
 got+0.079
 would+0.077
Neg logit contributions
umm-0.123
 heavens-0.113
 disapproval-0.113
Warning-0.108
 horizon-0.107
Token outside
Token ID2354
Feature activation+7.40e-02

Pos logit contributions
o+0.019
gy+0.019
 not+0.018
 found+0.018
 would+0.017
Neg logit contributions
umm-0.056
 heavens-0.049
 horizon-0.048
Warning-0.046
 disapproval-0.044
Token.
Token ID13
Feature activation+8.16e-03

Pos logit contributions
o+0.063
 in+0.062
gy+0.060
 got+0.056
 found+0.053
Neg logit contributions
umm-0.180
Warning-0.164
 heavens-0.162
 disapproval-0.162
 horror-0.157
Token.
Token ID13
Feature activation-4.09e-03

Pos logit contributions
 in+0.037
o+0.031
 got+0.031
gy+0.031
 would+0.031
Neg logit contributions
umm-0.116
 disapproval-0.111
Warning-0.109
 horror-0.107
 heavens-0.106
Token They
Token ID1119
Feature activation+9.48e-02

Pos logit contributions
umm+0.011
 disapproval+0.009
Warning+0.009
 sidelines+0.009
igsaw+0.009
Neg logit contributions
 in-0.005
 not-0.004
 to-0.004
 out-0.004
 just-0.004
Token sh
Token ID427
Feature activation-1.60e-03

Pos logit contributions
 in+0.093
o+0.084
gy+0.081
ie+0.074
 not+0.070
Neg logit contributions
 disapproval-0.214
umm-0.208
Warning-0.208
 horror-0.203
 jan-0.200
Tokenivered
Token ID6396
Feature activation+2.77e-02

Pos logit contributions
umm+0.003
 disapproval+0.002
Warning+0.002
irk+0.002
 heavens+0.002
Neg logit contributions
 in-0.003
 would-0.003
 got-0.003
 out-0.003
 just-0.003
Token.
Token ID13
Feature activation-6.04e-03

Pos logit contributions
o+0.019
 in+0.019
gy+0.018
 would+0.017
 got+0.017
Neg logit contributions
umm-0.070
 disapproval-0.068
Warning-0.066
 heavens-0.065
 horror-0.065
Token They
Token ID1119
Feature activation+9.25e-02

Pos logit contributions
umm+0.018
 sidelines+0.016
 disapproval+0.015
igsaw+0.015
 orchestra+0.015
Neg logit contributions
 in-0.006
 to-0.005
 not-0.005
 there-0.004
 a-0.004
Token were
Token ID547
Feature activation+5.36e-02

Pos logit contributions
 in+0.091
o+0.090
gy+0.086
ie+0.079
 up+0.072
Neg logit contributions
 disapproval-0.202
 horror-0.193
Warning-0.191
 jan-0.186
 disbelief-0.186
Token cold
Token ID4692
Feature activation+6.49e-03

Pos logit contributions
 would+0.082
gy+0.081
o+0.079
ie+0.078
 got+0.078
Neg logit contributions
 disapproval-0.086
 horror-0.086
umm-0.082
 heavens-0.080
 jan-0.079
Token and
Token ID290
Feature activation+4.82e-02

Pos logit contributions
 in+0.004
 would+0.004
gy+0.004
 got+0.004
o+0.004
Neg logit contributions
umm-0.017
 disapproval-0.016
 heavens-0.016
Warning-0.016
 horror-0.015
Token scared
Token ID12008
Feature activation-7.91e-03

Pos logit contributions
gy+0.076
 in+0.074
o+0.072
ie+0.069
 out+0.067
Neg logit contributions
 disapproval-0.075
 horror-0.072
umm-0.069
 heavens-0.068
 anticipation-0.066
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
umm+0.023
Warning+0.022
 disapproval+0.022
 heavens+0.021
 subt+0.021
Neg logit contributions
 in-0.006
 just-0.004
 out-0.003
 got-0.003
o-0.003
Token,
Token ID11
Feature activation+3.16e-02

Pos logit contributions
gy+0.079
 would+0.070
o+0.069
 just+0.068
 in+0.066
Neg logit contributions
umm-0.124
Warning-0.119
 jan-0.118
 disapproval-0.118
 Cars-0.117
Token a
Token ID257
Feature activation+4.49e-02

Pos logit contributions
ishes+0.031
ches+0.030
aches+0.029
 would+0.028
ie+0.028
Neg logit contributions
Cool-0.056
Wow-0.056
Warning-0.055
 Kay-0.053
 clerk-0.052
Token little
Token ID1310
Feature activation+2.94e-02

Pos logit contributions
 in+0.060
 got+0.050
 just+0.049
o+0.049
gy+0.049
Neg logit contributions
umm-0.100
Warning-0.092
 disapproval-0.091
 heavens-0.087
 horror-0.085
Token boy
Token ID2933
Feature activation+8.34e-02

Pos logit contributions
 in+0.054
 got+0.048
gy+0.048
 just+0.048
 not+0.048
Neg logit contributions
umm-0.049
Warning-0.042
 disapproval-0.042
 heavens-0.041
 gentleman-0.039
Token named
Token ID3706
Feature activation+4.94e-02

Pos logit contributions
 not+0.112
ches+0.105
 up+0.103
 out+0.103
aches+0.101
Neg logit contributions
 Painter-0.120
 jan-0.119
Warning-0.119
 horror-0.116
umm-0.113
Token Tim
Token ID5045
Feature activation+7.23e-02

Pos logit contributions
 in+0.086
gy+0.085
ies+0.083
 would+0.082
-+0.082
Neg logit contributions
umm-0.060
 Cars-0.058
 jan-0.055
 Math-0.055
Cool-0.055
Token went
Token ID1816
Feature activation+5.85e-02

Pos logit contributions
 not+0.065
 in+0.058
 where+0.057
 as+0.053
 just+0.053
Neg logit contributions
Warning-0.146
irk-0.135
 disapproval-0.135
 Painter-0.131
 jan-0.130
Token to
Token ID284
Feature activation+6.49e-02

Pos logit contributions
 would+0.038
gy+0.037
ie+0.036
 just+0.035
 got+0.035
Neg logit contributions
 jan-0.131
umm-0.130
 heavens-0.129
 sidelines-0.129
Warning-0.128
Token the
Token ID262
Feature activation+3.43e-03

Pos logit contributions
 in+0.078
o+0.077
 would+0.075
led+0.074
ie+0.073
Neg logit contributions
Warning-0.114
 amusement-0.113
 sidelines-0.110
 Miss-0.110
 De-0.110
Token park
Token ID3952
Feature activation+7.23e-02

Pos logit contributions
 in+0.006
 just+0.005
 not+0.005
 got+0.005
 out+0.005
Neg logit contributions
umm-0.007
 disapproval-0.006
Warning-0.006
irk-0.006
 subt-0.006
Token with
Token ID351
Feature activation+3.53e-02

Pos logit contributions
 would+0.056
 in+0.054
gy+0.053
ie+0.051
 will+0.051
Neg logit contributions
umm-0.161
 gentleman-0.155
Warning-0.155
 jan-0.153
 clerk-0.151
Token tent
Token ID11105
Feature activation+1.03e-01

Pos logit contributions
 in+0.011
 got+0.009
 just+0.009
 out+0.009
 found+0.009
Neg logit contributions
umm-0.012
Warning-0.011
 disapproval-0.011
irk-0.010
 subt-0.010
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
 in+0.081
o+0.069
gy+0.068
 got+0.064
 just+0.061
Neg logit contributions
umm-0.262
Warning-0.240
 disapproval-0.239
 heavens-0.239
 horizon-0.234
Token They
Token ID1119
Feature activation+1.03e-01

Pos logit contributions
 in+0.019
 just+0.016
o+0.016
 not+0.016
 out+0.016
Neg logit contributions
umm-0.025
Warning-0.023
 disapproval-0.022
 heavens-0.021
 jan-0.021
Token even
Token ID772
Feature activation+9.87e-02

Pos logit contributions
 in+0.096
gy+0.081
o+0.079
 out+0.074
 not+0.070
Neg logit contributions
umm-0.237
 disapproval-0.234
Warning-0.227
 heavens-0.226
 horror-0.222
Token stayed
Token ID9658
Feature activation+4.40e-02

Pos logit contributions
 in+0.101
gy+0.093
?"+0.086
 not+0.085
 out+0.084
Neg logit contributions
 disapproval-0.200
 subt-0.198
 jan-0.193
 horror-0.190
 gentleman-0.190
Token up
Token ID510
Feature activation+5.95e-02

Pos logit contributions
o+0.036
 would+0.032
 got+0.031
gy+0.030
ie+0.028
Neg logit contributions
 disapproval-0.096
 orchestra-0.095
umm-0.095
 anticipation-0.094
 horror-0.094
Token late
Token ID2739
Feature activation+4.39e-02

Pos logit contributions
 got+0.040
gy+0.039
o+0.039
 would+0.038
 in+0.038
Neg logit contributions
umm-0.145
 disapproval-0.136
Warning-0.136
 heavens-0.136
 horizon-0.134
Token into
Token ID656
Feature activation+3.97e-02

Pos logit contributions
 in+0.034
o+0.034
 got+0.031
gy+0.031
 found+0.030
Neg logit contributions
umm-0.111
Warning-0.101
 disapproval-0.101
 heavens-0.100
 horizon-0.100
Token the
Token ID262
Feature activation-7.93e-03

Pos logit contributions
 in+0.036
o+0.035
gy+0.032
 would+0.031
 got+0.031
Neg logit contributions
umm-0.096
 disapproval-0.091
 horror-0.090
Warning-0.090
 heavens-0.088
Token night
Token ID1755
Feature activation+3.13e-02

Pos logit contributions
umm+0.015
Warning+0.014
 disapproval+0.014
 subt+0.013
 heavens+0.013
Neg logit contributions
 in-0.013
 just-0.011
 out-0.011
 got-0.011
 would-0.011
Token playing
Token ID2712
Feature activation+3.38e-02

Pos logit contributions
 in+0.049
o+0.044
gy+0.044
 got+0.042
 just+0.042
Neg logit contributions
umm-0.060
 heavens-0.052
Warning-0.051
 horizon-0.051
 disapproval-0.051
Token that
Token ID326
Feature activation+4.48e-02

Pos logit contributions
 in+0.022
gy+0.020
o+0.018
 got+0.017
 found+0.016
Neg logit contributions
umm-0.072
Warning-0.069
 disapproval-0.068
 heavens-0.067
 horror-0.065
Token the
Token ID262
Feature activation-3.20e-02

Pos logit contributions
 in+0.036
gy+0.027
o+0.027
 got+0.027
 just+0.026
Neg logit contributions
umm-0.120
Warning-0.112
 disapproval-0.112
 heavens-0.109
 horror-0.107
Token little
Token ID1310
Feature activation-2.87e-02

Pos logit contributions
umm+0.064
 disapproval+0.058
Warning+0.058
 heavens+0.057
 jan+0.054
Neg logit contributions
 in-0.048
o-0.041
gy-0.041
 got-0.041
 just-0.040
Token duck
Token ID22045
Feature activation+5.01e-02

Pos logit contributions
umm+0.056
 disapproval+0.050
Warning+0.049
 heavens+0.049
 gentleman+0.047
Neg logit contributions
 in-0.042
o-0.038
 not-0.037
gy-0.037
 got-0.037
Token was
Token ID373
Feature activation+5.20e-02

Pos logit contributions
 in+0.038
gy+0.035
 out+0.034
o+0.033
 where+0.033
Neg logit contributions
 disapproval-0.107
umm-0.104
 heavens-0.103
 horror-0.101
Fighting-0.100
Token not
Token ID407
Feature activation+8.22e-02

Pos logit contributions
 would+0.061
o+0.060
gy+0.060
ie+0.057
 in+0.056
Neg logit contributions
 disapproval-0.101
 heavens-0.094
 horror-0.094
umm-0.094
 clerk-0.087
Token feeling
Token ID4203
Feature activation+3.12e-02

Pos logit contributions
gy+0.118
o+0.116
 would+0.115
 in+0.108
 got+0.106
Neg logit contributions
 disapproval-0.131
umm-0.125
 heavens-0.118
Fighting-0.116
irk-0.113
Token well
Token ID880
Feature activation+3.43e-02

Pos logit contributions
gy+0.048
o+0.047
 would+0.044
 in+0.043
 got+0.042
Neg logit contributions
umm-0.053
 disapproval-0.051
 heavens-0.048
 horror-0.048
Warning-0.045
Token.
Token ID13
Feature activation-8.41e-03

Pos logit contributions
 in+0.030
o+0.029
gy+0.027
 got+0.025
 found+0.024
Neg logit contributions
umm-0.083
 disapproval-0.078
Warning-0.076
 heavens-0.075
 horror-0.074
Token They
Token ID1119
Feature activation+8.60e-02

Pos logit contributions
umm+0.019
Warning+0.017
 disapproval+0.017
 jan+0.016
 orchestra+0.016
Neg logit contributions
 in-0.012
 just-0.010
o-0.010
gy-0.010
 not-0.010
Token asked
Token ID1965
Feature activation+1.54e-02

Pos logit contributions
 in+0.089
gy+0.077
o+0.075
 out+0.069
 just+0.068
Neg logit contributions
umm-0.200
 disapproval-0.193
Warning-0.190
 heavens-0.186
 horror-0.183
Token cried
Token ID16896
Feature activation-3.52e-02

Pos logit contributions
umm+0.009
 disapproval+0.008
Warning+0.008
 heavens+0.008
 horror+0.008
Neg logit contributions
 in-0.006
gy-0.005
o-0.005
 would-0.005
 out-0.005
Token.
Token ID13
Feature activation-7.00e-02

Pos logit contributions
umm+0.109
Warning+0.101
 disapproval+0.097
 heavens+0.097
 sidelines+0.096
Neg logit contributions
 in-0.026
 out-0.016
 just-0.015
 got-0.013
gy-0.012
Token
Token ID198
Feature activation-3.65e-02

Pos logit contributions
umm+0.159
Warning+0.140
 disapproval+0.135
 orchestra+0.133
 amusement+0.132
Neg logit contributions
 in-0.108
 not-0.094
 to-0.093
 got-0.090
 out-0.089
Token
Token ID198
Feature activation-4.49e-02

Pos logit contributions
umm+0.110
 disapproval+0.094
Warning+0.091
 horizon+0.091
 jan+0.090
Neg logit contributions
 in-0.033
o-0.026
 not-0.026
 to-0.024
 got-0.023
TokenHer
Token ID9360
Feature activation-6.28e-02

Pos logit contributions
umm+0.105
 disapproval+0.085
Warning+0.084
 horizon+0.083
andel+0.083
Neg logit contributions
 in-0.069
 not-0.061
 got-0.059
 just-0.056
 found-0.055
Token mom
Token ID1995
Feature activation-7.88e-03

Pos logit contributions
umm+0.162
Warning+0.152
 heavens+0.141
 disapproval+0.139
 sidelines+0.137
Neg logit contributions
 in-0.076
 not-0.061
 just-0.060
 got-0.059
 out-0.055
Token smiled
Token ID13541
Feature activation-3.16e-02

Pos logit contributions
 disapproval+0.017
umm+0.017
 horror+0.016
 clerk+0.016
Warning+0.016
Neg logit contributions
 in-0.008
gy-0.007
o-0.007
 out-0.006
 not-0.006
Token and
Token ID290
Feature activation-5.77e-03

Pos logit contributions
umm+0.081
 disapproval+0.078
 heavens+0.076
 horror+0.075
Warning+0.075
Neg logit contributions
 in-0.021
o-0.021
gy-0.020
 got-0.018
 would-0.018
Token said
Token ID531
Feature activation-3.80e-02

Pos logit contributions
umm+0.011
 disapproval+0.011
 heavens+0.011
Warning+0.011
 clerk+0.010
Neg logit contributions
 in-0.008
gy-0.007
o-0.007
 not-0.006
 would-0.006
Token,
Token ID11
Feature activation-3.29e-02

Pos logit contributions
umm+0.080
 clerk+0.078
 disapproval+0.077
 horror+0.076
 heavens+0.075
Neg logit contributions
o-0.040
gy-0.039
 in-0.034
ep-0.032
 found-0.031
Token "
Token ID366
Feature activation-6.28e-02

Pos logit contributions
umm+0.058
 disapproval+0.052
Warning+0.052
 heavens+0.051
 horror+0.049
Neg logit contributions
 in-0.055
gy-0.050
o-0.049
 got-0.048
 would-0.048
Token in
Token ID287
Feature activation+2.76e-02

Pos logit contributions
gy+0.038
 got+0.034
 searched+0.034
 would+0.033
 dug+0.032
Neg logit contributions
umm-0.096
 gentleman-0.094
 clerk-0.093
 jan-0.086
 Painter-0.085
Token your
Token ID534
Feature activation-7.94e-03

Pos logit contributions
 in+0.030
 would+0.029
o+0.028
gy+0.027
 got+0.026
Neg logit contributions
umm-0.059
Warning-0.059
 disapproval-0.057
 heavens-0.055
 horror-0.054
Token eye
Token ID4151
Feature activation+4.40e-02

Pos logit contributions
umm+0.013
 horror+0.011
 disapproval+0.011
 heavens+0.011
Warning+0.011
Neg logit contributions
 in-0.014
o-0.013
 would-0.013
 got-0.013
gy-0.013
Token?"
Token ID1701
Feature activation+8.73e-03

Pos logit contributions
 in+0.030
gy+0.030
o+0.027
 got+0.026
 found+0.024
Neg logit contributions
umm-0.112
Warning-0.102
 clerk-0.102
 heavens-0.101
 gentleman-0.101
Token Sara
Token ID24799
Feature activation+5.02e-02

Pos logit contributions
 in+0.012
 just+0.010
 got+0.010
 not+0.009
o+0.009
Neg logit contributions
umm-0.020
Warning-0.018
 disapproval-0.018
 heavens-0.017
 subt-0.017
Token asked
Token ID1965
Feature activation+2.63e-02

Pos logit contributions
 in+0.064
gy+0.055
o+0.053
 got+0.052
 would+0.052
Neg logit contributions
umm-0.107
Warning-0.100
 disapproval-0.100
 heavens-0.097
 horror-0.095
Token.
Token ID13
Feature activation-1.91e-02

Pos logit contributions
 in+0.021
 got+0.016
gy+0.016
o+0.016
 would+0.016
Neg logit contributions
umm-0.069
Warning-0.065
 disapproval-0.065
 heavens-0.063
 horror-0.062
Token
Token ID198
Feature activation+1.47e-02

Pos logit contributions
umm+0.052
 disapproval+0.046
Warning+0.046
 orchestra+0.044
 sidelines+0.044
Neg logit contributions
 in-0.020
 out-0.016
 just-0.015
 not-0.015
o-0.015
Token
Token ID198
Feature activation+5.21e-03

Pos logit contributions
 in+0.018
o+0.015
 got+0.014
 just+0.014
gy+0.014
Neg logit contributions
umm-0.035
 disapproval-0.031
Warning-0.031
 heavens-0.030
 jan-0.030
TokenThe
Token ID464
Feature activation-5.21e-02

Pos logit contributions
 in+0.007
 got+0.006
o+0.006
gy+0.006
 just+0.006
Neg logit contributions
umm-0.012
Warning-0.010
 disapproval-0.010
 heavens-0.010
 horror-0.010
Token frog
Token ID21264
Feature activation+2.66e-02

Pos logit contributions
umm+0.107
Warning+0.096
 disapproval+0.093
 subt+0.089
 horror+0.089
Neg logit contributions
 in-0.085
 got-0.071
 just-0.071
 not-0.069
 found-0.069
Token to
Token ID284
Feature activation-5.34e-03

Pos logit contributions
umm+0.068
Warning+0.064
 disapproval+0.064
 heavens+0.063
 jan+0.062
Neg logit contributions
 in-0.012
o-0.007
gy-0.007
 got-0.007
 just-0.006
Token feel
Token ID1254
Feature activation-3.18e-02

Pos logit contributions
umm+0.011
Warning+0.011
 disapproval+0.011
 heavens+0.011
 jan+0.011
Neg logit contributions
 in-0.006
o-0.006
 got-0.005
gy-0.005
 would-0.005
Token gr
Token ID1036
Feature activation-8.83e-02

Pos logit contributions
umm+0.058
 disapproval+0.056
Warning+0.054
 heavens+0.054
 horror+0.052
Neg logit contributions
 in-0.048
gy-0.045
o-0.045
 would-0.041
 got-0.041
Tokenumpy
Token ID32152
Feature activation-7.02e-02

Pos logit contributions
umm+0.105
Warning+0.100
 horror+0.093
 disapproval+0.092
 sidelines+0.090
Neg logit contributions
 in-0.200
o-0.186
 got-0.180
gy-0.178
ie-0.177
Token.
Token ID13
Feature activation-6.63e-02

Pos logit contributions
umm+0.181
Warning+0.168
 disapproval+0.168
 heavens+0.164
 jan+0.161
Neg logit contributions
 in-0.056
o-0.048
gy-0.047
 got-0.045
 found-0.043
Token He
Token ID679
Feature activation+2.51e-02

Pos logit contributions
umm+0.144
Warning+0.133
 disapproval+0.132
 orchestra+0.131
 sidelines+0.131
Neg logit contributions
 in-0.091
gy-0.085
 to-0.081
 not-0.080
 just-0.080
Token kept
Token ID4030
Feature activation-3.24e-02

Pos logit contributions
 in+0.024
 out+0.021
o+0.021
 where+0.020
gy+0.020
Neg logit contributions
 disapproval-0.054
Warning-0.054
umm-0.053
 heavens-0.052
 jan-0.051
Token z
Token ID1976
Feature activation-8.12e-02

Pos logit contributions
umm+0.062
 disapproval+0.061
Warning+0.060
 heavens+0.058
 jan+0.058
Neg logit contributions
 in-0.042
gy-0.039
o-0.039
 would-0.038
 got-0.038
Tokenipping
Token ID4501
Feature activation-5.17e-02

Pos logit contributions
umm+0.124
 disapproval+0.110
Warning+0.107
irk+0.105
andel+0.102
Neg logit contributions
 in-0.154
 would-0.147
gy-0.144
 got-0.143
 found-0.140
Token around
Token ID1088
Feature activation-1.33e-02

Pos logit contributions
umm+0.141
Warning+0.131
 disapproval+0.131
 heavens+0.128
 horror+0.124
Neg logit contributions
 in-0.040
o-0.030
gy-0.029
 got-0.029
 just-0.028
Token the
Token ID262
Feature activation-9.11e-02

Pos logit contributions
umm+0.038
Warning+0.036
 disapproval+0.036
 heavens+0.035
 horror+0.034
Neg logit contributions
 in-0.008
o-0.005
 got-0.005
gy-0.005
 just-0.005
Token to
Token ID284
Feature activation+5.20e-04

Pos logit contributions
umm+0.031
Warning+0.029
 disapproval+0.029
 heavens+0.029
 horror+0.028
Neg logit contributions
 in-0.006
gy-0.004
o-0.004
 got-0.004
 just-0.003
Token take
Token ID1011
Feature activation-2.78e-02

Pos logit contributions
 in+0.001
o+0.001
gy+0.001
 would+0.001
 just+0.001
Neg logit contributions
umm-0.001
Warning-0.001
 disapproval-0.001
 heavens-0.001
 jan-0.001
Token them
Token ID606
Feature activation+2.04e-02

Pos logit contributions
umm+0.070
 disapproval+0.068
Warning+0.067
 horror+0.065
 sentences+0.065
Neg logit contributions
 in-0.021
o-0.018
 would-0.017
gy-0.017
 got-0.016
Token.
Token ID13
Feature activation-4.41e-02

Pos logit contributions
 in+0.017
 got+0.014
o+0.014
 would+0.014
gy+0.014
Neg logit contributions
umm-0.050
Warning-0.048
 disapproval-0.048
 heavens-0.046
 jan-0.046
Token He
Token ID679
Feature activation+4.56e-02

Pos logit contributions
umm+0.100
 disapproval+0.090
Warning+0.089
 heavens+0.088
 orchestra+0.087
Neg logit contributions
 in-0.064
 got-0.058
 to-0.055
 not-0.055
 just-0.054
Token walked
Token ID6807
Feature activation-2.21e-03

Pos logit contributions
 in+0.048
gy+0.042
 out+0.041
o+0.039
 where+0.036
Neg logit contributions
 disapproval-0.098
umm-0.097
Warning-0.097
 jan-0.095
 clerk-0.094
Token over
Token ID625
Feature activation+6.18e-03

Pos logit contributions
umm+0.006
Warning+0.006
 disapproval+0.005
 heavens+0.005
 jan+0.005
Neg logit contributions
 in-0.002
 got-0.001
gy-0.001
o-0.001
 just-0.001
Token to
Token ID284
Feature activation-2.63e-03

Pos logit contributions
 in+0.003
 got+0.002
 just+0.002
o+0.002
gy+0.001
Neg logit contributions
umm-0.019
Warning-0.018
 disapproval-0.018
 heavens-0.017
 horror-0.017
Token them
Token ID606
Feature activation+2.69e-02

Pos logit contributions
umm+0.006
Warning+0.006
 disapproval+0.006
 heavens+0.006
 gentleman+0.006
Neg logit contributions
 in-0.003
gy-0.002
o-0.002
 got-0.002
 would-0.002
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
 in+0.017
o+0.013
gy+0.013
 got+0.013
 would+0.012
Neg logit contributions
umm-0.073
 disapproval-0.069
Warning-0.069
 heavens-0.068
 horror-0.067
Token said
Token ID531
Feature activation-1.55e-02

Pos logit contributions
 in+0.013
gy+0.012
o+0.012
ie+0.011
 out+0.011
Neg logit contributions
umm-0.020
 clerk-0.020
 gentleman-0.020
 disapproval-0.020
 jan-0.019
Token was
Token ID373
Feature activation+5.94e-02

Pos logit contributions
o+0.070
 in+0.068
gy+0.067
 where+0.062
 out+0.061
Neg logit contributions
 heavens-0.160
umm-0.158
 disapproval-0.157
Warning-0.157
 horizon-0.150
Token getting
Token ID1972
Feature activation+4.04e-02

Pos logit contributions
o+0.073
gy+0.069
 in+0.061
 got+0.061
ie+0.059
Neg logit contributions
 disapproval-0.116
umm-0.113
 heavens-0.110
 horror-0.107
 horizon-0.104
Token dark
Token ID3223
Feature activation+1.90e-02

Pos logit contributions
gy+0.055
 in+0.054
o+0.052
 got+0.049
 would+0.048
Neg logit contributions
umm-0.076
 disapproval-0.073
 heavens-0.072
Warning-0.070
 clerk-0.067
Token outside
Token ID2354
Feature activation+6.97e-02

Pos logit contributions
o+0.017
 got+0.015
 not+0.015
gy+0.015
 found+0.013
Neg logit contributions
umm-0.045
 horizon-0.041
 heavens-0.041
Warning-0.039
 disapproval-0.038
Token,
Token ID11
Feature activation+3.17e-02

Pos logit contributions
 in+0.055
o+0.051
gy+0.049
 got+0.047
 found+0.043
Neg logit contributions
umm-0.175
Warning-0.162
 disapproval-0.161
 heavens-0.160
 horizon-0.156
Token and
Token ID290
Feature activation+4.43e-02

Pos logit contributions
 in+0.035
o+0.033
gy+0.033
 got+0.029
ie+0.028
Neg logit contributions
umm-0.070
 disapproval-0.067
 heavens-0.065
Warning-0.065
 horror-0.064
Token it
Token ID340
Feature activation+7.36e-02

Pos logit contributions
o+0.039
gy+0.039
 in+0.038
 got+0.029
ie+0.028
Neg logit contributions
umm-0.103
 disapproval-0.098
 heavens-0.098
Warning-0.097
 horror-0.095
Token needed
Token ID2622
Feature activation+4.59e-02

Pos logit contributions
o+0.069
 in+0.067
gy+0.066
 where+0.060
 out+0.057
Neg logit contributions
 heavens-0.160
umm-0.158
 disapproval-0.155
 horror-0.152
Warning-0.150
Token to
Token ID284
Feature activation+4.57e-02

Pos logit contributions
 in+0.027
o+0.022
gy+0.022
 got+0.019
 just+0.016
Neg logit contributions
umm-0.128
Warning-0.122
 disapproval-0.121
 heavens-0.120
 horror-0.116
Token find
Token ID1064
Feature activation+4.33e-02

Pos logit contributions
o+0.053
 in+0.051
gy+0.043
 got+0.042
 would+0.042
Neg logit contributions
umm-0.096
 disapproval-0.094
 heavens-0.094
Warning-0.093
 horror-0.090
Token its
Token ID663
Feature activation-8.91e-03

Pos logit contributions
o+0.045
gy+0.043
 in+0.042
 got+0.039
 would+0.037
Neg logit contributions
umm-0.097
 disapproval-0.092
Warning-0.091
 heavens-0.091
 horror-0.090
Token Ben
Token ID3932
Feature activation-3.48e-02

Pos logit contributions
 horror+0.024
 delight+0.022
 disapproval+0.022
 sidelines+0.021
 jan+0.021
Neg logit contributions
 in-0.046
gy-0.045
o-0.044
 would-0.043
ery-0.041
Token.
Token ID13
Feature activation-6.02e-02

Pos logit contributions
 disapproval+0.080
Warning+0.079
 horror+0.078
umm+0.077
 heavens+0.074
Neg logit contributions
gy-0.025
 in-0.025
o-0.024
 would-0.022
 got-0.022
Token
Token ID198
Feature activation-2.24e-02

Pos logit contributions
umm+0.119
Warning+0.109
 disapproval+0.107
 heavens+0.105
 horror+0.102
Neg logit contributions
 in-0.090
gy-0.079
o-0.076
 got-0.076
 just-0.076
Token
Token ID198
Feature activation-3.50e-02

Pos logit contributions
umm+0.050
Warning+0.045
 disapproval+0.045
 heavens+0.044
 horror+0.042
Neg logit contributions
 in-0.029
 got-0.024
gy-0.024
o-0.024
 just-0.024
Token"
Token ID1
Feature activation-4.95e-02

Pos logit contributions
umm+0.062
Warning+0.056
 disapproval+0.055
 heavens+0.054
 gentleman+0.051
Neg logit contributions
 in-0.061
gy-0.054
 just-0.053
o-0.053
 got-0.052
TokenRun
Token ID10987
Feature activation-5.24e-02

Pos logit contributions
umm+0.079
 disapproval+0.063
 heavens+0.063
Warning+0.062
 subt+0.062
Neg logit contributions
 in-0.101
 got-0.092
 found-0.090
 just-0.090
o-0.089
Token,
Token ID11
Feature activation-4.80e-02

Pos logit contributions
umm+0.120
Warning+0.115
 disapproval+0.114
 heavens+0.110
 gentleman+0.110
Neg logit contributions
 in-0.049
 would-0.044
o-0.041
 just-0.040
gy-0.040
Token run
Token ID1057
Feature activation-4.92e-02

Pos logit contributions
umm+0.075
 disapproval+0.070
 horror+0.069
Warning+0.069
 jan+0.068
Neg logit contributions
 in-0.080
gy-0.079
o-0.078
 would-0.075
ie-0.071
Token!"
Token ID2474
Feature activation-5.85e-02

Pos logit contributions
umm+0.107
Warning+0.104
 disapproval+0.101
 heavens+0.099
 gentleman+0.099
Neg logit contributions
 in-0.046
 would-0.043
o-0.042
gy-0.040
 not-0.039
Token Sara
Token ID24799
Feature activation-2.24e-02

Pos logit contributions
umm+0.118
Warning+0.108
 disapproval+0.107
 heavens+0.104
 horror+0.102
Neg logit contributions
 in-0.085
gy-0.074
o-0.074
 would-0.072
 just-0.071
Token says
Token ID1139
Feature activation-3.37e-02

Pos logit contributions
umm+0.044
Warning+0.041
 disapproval+0.040
 heavens+0.039
 horror+0.038
Neg logit contributions
 in-0.034
gy-0.029
 got-0.029
o-0.029
 just-0.029
Token't
Token ID470
Feature activation+3.20e-02

Pos logit contributions
 in+0.034
gy+0.031
o+0.031
 would+0.030
 found+0.030
Neg logit contributions
umm-0.042
Warning-0.041
 heavens-0.039
 disapproval-0.037
 clerk-0.036
Token believe
Token ID1975
Feature activation-1.69e-02

Pos logit contributions
 in+0.038
o+0.031
gy+0.031
 got+0.031
 just+0.030
Neg logit contributions
umm-0.074
Warning-0.069
 disapproval-0.068
 heavens-0.067
 horror-0.065
Token his
Token ID465
Feature activation-3.71e-02

Pos logit contributions
umm+0.037
 horror+0.035
 jan+0.034
 gentleman+0.034
 delight+0.034
Neg logit contributions
o-0.018
gy-0.018
 got-0.014
 will-0.014
 found-0.014
Token eyes
Token ID2951
Feature activation+2.63e-02

Pos logit contributions
umm+0.069
Warning+0.064
 disapproval+0.062
 heavens+0.060
 sidelines+0.058
Neg logit contributions
 in-0.064
 just-0.055
 out-0.054
 not-0.053
 got-0.053
Token.
Token ID13
Feature activation-3.94e-02

Pos logit contributions
 in+0.022
 just+0.015
 got+0.015
 out+0.014
 would+0.014
Neg logit contributions
umm-0.073
Warning-0.068
 disapproval-0.068
 heavens-0.066
irk-0.064
Token The
Token ID383
Feature activation-6.74e-02

Pos logit contributions
umm+0.073
Warning+0.068
 disapproval+0.067
 heavens+0.063
irk+0.062
Neg logit contributions
 in-0.067
 got-0.058
 just-0.057
 out-0.056
 not-0.056
Token truck
Token ID7779
Feature activation+9.37e-03

Pos logit contributions
umm+0.145
Warning+0.134
 disapproval+0.131
irk+0.130
andel+0.123
Neg logit contributions
 in-0.101
 not-0.086
 just-0.085
 got-0.083
 would-0.082
Token was
Token ID373
Feature activation+6.17e-03

Pos logit contributions
 in+0.006
 got+0.004
 just+0.004
o+0.003
gy+0.003
Neg logit contributions
umm-0.028
Warning-0.026
 disapproval-0.026
 heavens-0.025
irk-0.025
Token so
Token ID523
Feature activation-1.02e-02

Pos logit contributions
 in+0.007
gy+0.005
o+0.005
 got+0.005
 just+0.005
Neg logit contributions
umm-0.015
Warning-0.014
 disapproval-0.014
 heavens-0.013
 horror-0.013
Token orange
Token ID10912
Feature activation-1.72e-02

Pos logit contributions
umm+0.024
Warning+0.022
 disapproval+0.022
 heavens+0.021
 horror+0.021
Neg logit contributions
 in-0.014
 not-0.012
 got-0.012
 just-0.011
 found-0.011
Token!
Token ID0
Feature activation-3.95e-02

Pos logit contributions
umm+0.044
Warning+0.038
 jan+0.038
 heavens+0.037
 gentleman+0.037
Neg logit contributions
gy-0.013
 in-0.013
o-0.013
 found-0.011
 got-0.011
Token,
Token ID11
Feature activation-2.16e-02

Pos logit contributions
umm+0.068
 heavens+0.064
 disapproval+0.063
Warning+0.063
 horror+0.062
Neg logit contributions
o-0.023
gy-0.019
 in-0.019
 got-0.018
 would-0.017
Token all
Token ID477
Feature activation-4.97e-03

Pos logit contributions
umm+0.041
 horror+0.040
 disapproval+0.039
 heavens+0.039
 horizon+0.037
Neg logit contributions
o-0.027
 in-0.027
gy-0.027
ie-0.025
 got-0.025
Token wrapped
Token ID12908
Feature activation-5.44e-02

Pos logit contributions
umm+0.011
 horror+0.010
 heavens+0.010
 disapproval+0.010
Warning+0.010
Neg logit contributions
 in-0.005
gy-0.005
o-0.005
 not-0.005
 just-0.005
Token in
Token ID287
Feature activation-4.81e-02

Pos logit contributions
umm+0.133
 disapproval+0.124
Warning+0.122
 heavens+0.119
 clerk+0.116
Neg logit contributions
 got-0.043
 in-0.042
 not-0.040
 would-0.039
o-0.039
Token the
Token ID262
Feature activation-1.07e-01

Pos logit contributions
umm+0.134
Warning+0.124
 disapproval+0.123
 heavens+0.121
 jan+0.118
Neg logit contributions
 in-0.039
 just-0.027
 got-0.026
o-0.026
gy-0.026
Token soft
Token ID2705
Feature activation-9.88e-02

Pos logit contributions
umm+0.206
Warning+0.186
 disapproval+0.184
 heavens+0.179
 horror+0.172
Neg logit contributions
 in-0.171
 got-0.147
o-0.147
gy-0.147
 just-0.146
Token leaves
Token ID5667
Feature activation-1.02e-02

Pos logit contributions
umm+0.167
 disapproval+0.145
 heavens+0.143
Warning+0.139
 horizon+0.137
Neg logit contributions
 in-0.165
o-0.161
 not-0.156
 just-0.153
 got-0.149
Token.
Token ID13
Feature activation-5.58e-02

Pos logit contributions
umm+0.025
 heavens+0.023
Warning+0.023
 horizon+0.023
 disapproval+0.023
Neg logit contributions
 in-0.008
 got-0.007
gy-0.007
o-0.007
 found-0.007
Token<|endoftext|>
Token ID50256
Feature activation+5.75e-04

Pos logit contributions
umm+0.128
 disapproval+0.119
Warning+0.116
 jan+0.113
 orchestra+0.111
Neg logit contributions
 in-0.073
o-0.065
 just-0.061
 got-0.060
gy-0.059
TokenOnce
Token ID7454
Feature activation-2.82e-03

Pos logit contributions
 in+0.001
gy+0.001
 got+0.001
 just+0.001
o+0.001
Neg logit contributions
umm-0.001
Warning-0.001
 disapproval-0.001
 heavens-0.001
 horror-0.001
Token upon
Token ID2402
Feature activation+2.38e-02

Pos logit contributions
umm+0.006
irk+0.006
 Painter+0.006
 horror+0.006
Warning+0.006
Neg logit contributions
gy-0.003
 in-0.003
 just-0.002
 got-0.002
o-0.002
Token better
Token ID1365
Feature activation-4.58e-02

Pos logit contributions
gy+0.091
o+0.085
 in+0.076
 not+0.072
 would+0.071
Neg logit contributions
umm-0.060
 subt-0.051
 sentences-0.051
 heavens-0.051
 anticipation-0.050
Token.
Token ID13
Feature activation-5.08e-02

Pos logit contributions
umm+0.105
Warning+0.089
 heavens+0.089
 horizon+0.088
 sentences+0.087
Neg logit contributions
o-0.049
gy-0.048
 got-0.045
 in-0.045
ie-0.042
Token He
Token ID679
Feature activation+2.53e-02

Pos logit contributions
umm+0.108
 disapproval+0.100
Warning+0.099
 heavens+0.095
 jan+0.094
Neg logit contributions
 in-0.074
o-0.066
gy-0.063
 got-0.062
 just-0.062
Token was
Token ID373
Feature activation-1.25e-02

Pos logit contributions
gy+0.022
 in+0.022
 out+0.021
 where+0.020
o+0.019
Neg logit contributions
umm-0.053
 subt-0.051
 Painter-0.051
Warning-0.049
 heavens-0.049
Token really
Token ID1107
Feature activation-8.17e-03

Pos logit contributions
umm+0.026
 disapproval+0.025
 heavens+0.024
 horror+0.024
 subt+0.024
Neg logit contributions
gy-0.014
 in-0.014
ie-0.013
o-0.013
 would-0.012
Token proud
Token ID6613
Feature activation-5.24e-02

Pos logit contributions
umm+0.013
 disapproval+0.011
 subt+0.011
 jan+0.011
 heavens+0.011
Neg logit contributions
gy-0.014
 in-0.014
o-0.013
ie-0.013
 would-0.013
Token of
Token ID286
Feature activation-3.68e-02

Pos logit contributions
umm+0.117
Warning+0.107
 disapproval+0.106
 heavens+0.104
 horror+0.100
Neg logit contributions
 in-0.069
 got-0.056
o-0.056
 just-0.056
gy-0.056
Token his
Token ID465
Feature activation-8.13e-02

Pos logit contributions
umm+0.074
Warning+0.069
 disapproval+0.068
 horror+0.067
 subt+0.066
Neg logit contributions
gy-0.047
o-0.045
 got-0.043
 in-0.043
 would-0.041
Token work
Token ID670
Feature activation-3.67e-02

Pos logit contributions
umm+0.146
 heavens+0.124
 horizon+0.123
 horror+0.123
 disapproval+0.121
Neg logit contributions
 in-0.126
o-0.124
gy-0.120
 got-0.115
 would-0.112
Token.
Token ID13
Feature activation-5.18e-02

Pos logit contributions
umm+0.093
Warning+0.084
 disapproval+0.083
 heavens+0.083
 Painter+0.081
Neg logit contributions
 in-0.032
gy-0.029
o-0.028
 got-0.026
 found-0.025
Token
Token ID220
Feature activation-2.25e-02

Pos logit contributions
umm+0.121
 disapproval+0.114
Warning+0.111
 sidelines+0.107
 heavens+0.106
Neg logit contributions
 in-0.065
o-0.061
 just-0.054
 got-0.054
gy-0.053
Token for
Token ID329
Feature activation-2.17e-02

Pos logit contributions
umm+0.119
Warning+0.110
 disapproval+0.109
 heavens+0.107
 jan+0.105
Neg logit contributions
 in-0.036
 just-0.025
 got-0.025
 out-0.024
 not-0.024
Token permission
Token ID7170
Feature activation-6.18e-02

Pos logit contributions
umm+0.037
 disapproval+0.035
Warning+0.035
 heavens+0.035
 sentences+0.033
Neg logit contributions
 in-0.035
gy-0.034
o-0.034
 got-0.031
 would-0.031
Token than
Token ID621
Feature activation-3.59e-02

Pos logit contributions
umm+0.147
Warning+0.137
 disapproval+0.135
 heavens+0.133
 clerk+0.128
Neg logit contributions
 in-0.067
gy-0.055
 got-0.055
o-0.055
 just-0.053
Token to
Token ID284
Feature activation-3.46e-02

Pos logit contributions
umm+0.078
 heavens+0.074
 disapproval+0.074
Warning+0.073
 sidelines+0.070
Neg logit contributions
o-0.037
gy-0.037
 in-0.033
 got-0.031
 would-0.030
Token feel
Token ID1254
Feature activation-4.66e-02

Pos logit contributions
umm+0.077
Warning+0.073
 disapproval+0.072
 heavens+0.069
 gentleman+0.068
Neg logit contributions
 in-0.041
o-0.036
gy-0.035
 got-0.034
 would-0.033
Token guilty
Token ID6717
Feature activation-7.96e-02

Pos logit contributions
umm+0.099
 disapproval+0.092
Warning+0.091
 heavens+0.091
 horror+0.087
Neg logit contributions
 in-0.056
gy-0.054
o-0.053
 got-0.048
 would-0.047
Token.
Token ID13
Feature activation-5.50e-02

Pos logit contributions
umm+0.210
Warning+0.193
 disapproval+0.193
 heavens+0.188
 horror+0.184
Neg logit contributions
 in-0.064
o-0.054
gy-0.054
 got-0.051
 just-0.048
Token<|endoftext|>
Token ID50256
Feature activation-7.09e-03

Pos logit contributions
umm+0.125
 disapproval+0.115
Warning+0.112
 sidelines+0.110
 orchestra+0.107
Neg logit contributions
 in-0.073
o-0.066
 got-0.062
 just-0.061
 not-0.061
TokenOnce
Token ID7454
Feature activation-9.65e-03

Pos logit contributions
umm+0.015
Warning+0.014
 disapproval+0.013
 heavens+0.013
 jan+0.013
Neg logit contributions
 in-0.010
gy-0.008
 got-0.008
 just-0.008
o-0.008
Token upon
Token ID2402
Feature activation+1.76e-02

Pos logit contributions
umm+0.020
irk+0.020
 Painter+0.019
 horror+0.019
Warning+0.019
Neg logit contributions
gy-0.009
 in-0.009
 just-0.008
o-0.007
 got-0.007
Token a
Token ID257
Feature activation-4.32e-02

Pos logit contributions
gy+0.024
ishes+0.023
ish+0.023
ie+0.020
ies+0.020
Neg logit contributions
Warning-0.033
 Many-0.030
 footsteps-0.030
 disapproval-0.030
 amusement-0.029