TOP ACTIVATIONS
MAX = 0.296

Sour candy made her face scr unch up . She did
listen to Mom ." She scr ubs them with soap and
. She felt her face scr unch up and her tongue
sour and made their faces scr unch up .
and lady bugs and cater pill ars ." Tim
hard and made her face scr unch up . She spat
. Soon , more cater pill ars arrived . They also
. It makes her face scr unch up . She sp
bitter and made her face scr unch up .
Lily , no . Cater pill ars are not for keeping
ily liked to keep cater pill ars in a jar .
but he also knew cater pill ars were not pets .
it made Jack 's face scr unch up !
you , Lily . Cater pill ars can leave and never
made a funny face , scr unch ing up her nose
put it back . Cater pill ars need to eat leaves
it made Bob 's face scr unch up . He didn
a little bit and then scr abb led underneath some nearby
sour it made her face scr unch up !
said , " C ater pill ars are not meant to

MIN ACTIVATIONS
MIN = -0.354

he noticed something shiny wed ged between some branches . It
soft swing . He tug ged on his mom 's hand
The hero carefully tug ged and pulled until the rabbit
grabbed the hook and tug ged it . Suddenly , an
Tom tug ged at the knob and pulled
so badly that he lun ged forward and slur ped it
the counter . She tug ged on it , to see
. She pulled and tug ged until suddenly the zipper broke
was calm and he nud ged Meg 's arm reassuring ly
Suddenly , the dog lun ged at Tim my and bit
it moved . He tug ged harder and the wire came
they were all sm ud ged and she couldn 't read
it , so she tug ged and pulled and finally opened
cry . He tr ud ged back home , carrying the
. She pushed and tug ged at her k ite ,
The m ama bear nud ged the daddy bear and said
. She pushed and tug ged harder than ever , but
aled and wrapped and bag ged and boxed and cart oned
drawings would get sm ud ged because the paper wasn 't
. He shook and tug ged it , but it wouldn

INTERVAL 0.133 - 0.296
CONTAINS 7.466%

a very special day . It was pretty ! Mom my
to open z ips . She sees Ben and Lily struggling
. The man loved her very much . One day ,
her toys were gone . She had nothing left . She
plant her pine seed . She hoped that one day ,

INTERVAL -0.029 - 0.133
CONTAINS 55.989%

said the same magic word . Suddenly , the silver ball
is loyal and it was wrong to try and take something
many flowers and a fountain . She liked to spray water
bed to sleep in . The puppy was no longer scared
They play with Rex and the man . They share their

INTERVAL -0.192 - -0.029
CONTAINS 35.124%

of help from her mum , the garden was soon clean
and flew away with a smile on his face . He
from his bag until he had enough energy to yield himself
carrots to chew on . <|endoftext|> Once upon a time ,
happily showed it to her mom my . The crystal was

INTERVAL -0.354 - -0.192
CONTAINS 1.421%

<|endoftext|> Once upon a time , there was a little girl
, all of the sudden , it started to rain .
his tail . He l icked the fat boy 's face
drink . His mom said , " John , drink some
know that L ila was inside . She picked up the
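
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX, with each CONTAINS figure giving the share of activations in that bin. The sketch below checks that reading. The equal-width assumption is not stated in the dump, and the variable names are illustrative only.

```python
# Values copied from the dump above.
MIN_ACT = -0.354
MAX_ACT = 0.296

# (low, high, percent) for each INTERVAL block, top to bottom.
reported = [
    (0.133, 0.296, 7.466),
    (-0.029, 0.133, 55.989),
    (-0.192, -0.029, 35.124),
    (-0.354, -0.192, 1.421),
]

# Assumption: the range [MIN, MAX] is split into equal-width bins.
n_bins = len(reported)
width = (MAX_ACT - MIN_ACT) / n_bins
edges = [MIN_ACT + i * width for i in range(n_bins + 1)]

# The distinct reported bounds, sorted ascending, should match the
# computed edges up to the three-decimal rounding used in the dump.
reported_edges = sorted({b for low, high, _ in reported for b in (low, high)})
max_edge_error = max(abs(e - r) for e, r in zip(edges, reported_edges))

# The four shares should account for all activations.
total_percent = sum(p for _, _, p in reported)

print(f"bin width: {width:.4f}")
print(f"largest edge mismatch: {max_edge_error:.4f}")
print(f"total percent: {total_percent:.3f}")
```

If the mismatch stays below the rounding step (0.001) and the shares sum to 100, the bins are consistent with an even four-way split of the activation range.
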
Token Sour
Token ID47694
Feature activation+3.50e-03

Pos logit contributions
 rather+0.117
 annoying+0.112
rig+0.112
 stump+0.106
 probably+0.101
Neg logit contributions
ged-0.139
iked-0.124
 recess-0.123
or-0.118
ed-0.111
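
Each per-token record in this dump follows the same layout: a token line, an ID line, an activation line, a blank line, then positive and negative logit-contribution lists in which the token string is fused directly to a signed number. The sketch below shows one way such a record could be parsed. `parse_record` is a hypothetical helper written for this illustration, not part of any tool that produced the dump, and it assumes the fixed layout holds for every record.

```python
import re

# A contribution line is a token string immediately followed by a signed
# decimal, e.g. " rather+0.117" or "ged-0.139".
CONTRIB_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


def parse_record(lines):
    """Parse one per-token record (a list of lines) into a dict.

    Assumes the layout seen in this dump: token line, ID line,
    activation line, then the two contribution lists.
    """
    token = lines[0][len("Token"):]  # keeps a leading space, if any
    token_id = int(lines[1][len("Token ID"):])
    activation = float(lines[2][len("Feature activation"):])

    pos, neg, current = [], [], None
    for line in lines[3:]:
        if line.startswith("Pos logit contributions"):
            current = pos
        elif line.startswith("Neg logit contributions"):
            current = neg
        elif current is not None:
            match = CONTRIB_RE.match(line)
            if match:
                current.append((match.group(1), float(match.group(2))))
    return {"token": token, "id": token_id, "activation": activation,
            "pos": pos, "neg": neg}


# The first record of the dump, copied verbatim.
record = parse_record([
    "Token Sour",
    "Token ID47694",
    "Feature activation+3.50e-03",
    "",
    "Pos logit contributions",
    " rather+0.117",
    " annoying+0.112",
    "rig+0.112",
    " stump+0.106",
    " probably+0.101",
    "Neg logit contributions",
    "ged-0.139",
    "iked-0.124",
    " recess-0.123",
    "or-0.118",
    "ed-0.111",
])
print(record["token"], record["id"], record["activation"])
```

Slicing off the fixed labels rather than splitting on whitespace preserves the leading space that distinguishes word-initial tokens (such as ` Sour`) from continuation tokens (such as `unch`).
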
Token candy
Token ID18550
Feature activation+1.44e-02

Pos logit contributions
 rather+0.006
 annoying+0.006
rig+0.005
 stump+0.005
 probably+0.005
Neg logit contributions
ged-0.006
 recess-0.005
iked-0.005
or-0.005
 roles-0.005
Token made
Token ID925
Feature activation-1.47e-02

Pos logit contributions
 rather+0.031
 annoying+0.030
rig+0.030
 stump+0.029
 probably+0.028
Neg logit contributions
ged-0.018
 recess-0.015
iked-0.015
or-0.014
ed-0.012
Token her
Token ID607
Feature activation-5.37e-02

Pos logit contributions
ged+0.016
 recess+0.013
iked+0.012
or+0.012
ed+0.012
Neg logit contributions
 rather-0.034
rig-0.033
 stump-0.032
 annoying-0.032
 tre-0.030
Token face
Token ID1986
Feature activation+1.19e-01

Pos logit contributions
ged+0.084
 recess+0.070
ies+0.069
iked+0.068
ed+0.067
Neg logit contributions
rig-0.094
 rather-0.089
 annoying-0.089
 stump-0.088
 wire-0.084
Token scr
Token ID6040
Feature activation+2.96e-01

Pos logit contributions
rig+0.187
 rather+0.172
 stump+0.166
 annoying+0.165
 wire+0.154
Neg logit contributions
ged-0.215
ies-0.187
ed-0.182
or-0.180
iked-0.179
Tokenunch
Token ID3316
Feature activation+7.02e-02

Pos logit contributions
 annoying+0.522
 rather+0.516
 stump+0.508
rig+0.508
 wire+0.499
Neg logit contributions
ged-0.423
iked-0.418
or-0.392
ipped-0.362
 recess-0.345
Token up
Token ID510
Feature activation-2.57e-02

Pos logit contributions
 rather+0.147
 annoying+0.143
rig+0.142
 stump+0.137
 probably+0.133
Neg logit contributions
ged-0.092
iked-0.078
 recess-0.078
or-0.072
ed-0.066
Token.
Token ID13
Feature activation+7.34e-02

Pos logit contributions
ged+0.037
iked+0.032
 recess+0.032
or+0.030
ed+0.028
Neg logit contributions
 rather-0.050
rig-0.049
 annoying-0.049
 stump-0.047
 probably-0.045
Token She
Token ID1375
Feature activation+1.76e-01

Pos logit contributions
 rather+0.122
 annoying+0.117
rig+0.116
 stump+0.111
 probably+0.107
Neg logit contributions
ged-0.129
iked-0.115
 recess-0.114
or-0.109
ed-0.102
Token did
Token ID750
Feature activation-4.88e-02

Pos logit contributions
rig+0.392
 stump+0.373
 rather+0.367
 annoying+0.362
 eerie+0.348
Neg logit contributions
ged-0.203
ed-0.158
 recess-0.150
ies-0.145
iked-0.143
Token listen
Token ID6004
Feature activation-9.10e-02

Pos logit contributions
 rather+0.048
rig+0.047
 annoying+0.046
 stump+0.045
 unlike+0.042
Neg logit contributions
ged-0.048
 recess-0.044
iked-0.042
or-0.041
ies-0.038
Token to
Token ID284
Feature activation+2.86e-02

Pos logit contributions
ged+0.144
iked+0.126
 recess+0.125
or+0.119
ed+0.112
Neg logit contributions
 rather-0.165
rig-0.160
 annoying-0.160
 stump-0.154
 probably-0.147
Token Mom
Token ID11254
Feature activation-4.30e-02

Pos logit contributions
 rather+0.055
 annoying+0.053
rig+0.052
 stump+0.050
 probably+0.049
Neg logit contributions
ged-0.044
iked-0.038
 recess-0.037
or-0.035
ed-0.033
Token."
Token ID526
Feature activation-2.57e-02

Pos logit contributions
ged+0.075
iked+0.066
 recess+0.066
or+0.063
ed+0.059
Neg logit contributions
 rather-0.072
 annoying-0.069
rig-0.069
 stump-0.066
 probably-0.063
Token She
Token ID1375
Feature activation+1.70e-01

Pos logit contributions
ged+0.041
 recess+0.036
iked+0.036
or+0.034
ed+0.031
Neg logit contributions
 rather-0.047
rig-0.045
 annoying-0.045
 stump-0.043
 probably-0.041
Token scr
Token ID6040
Feature activation+2.92e-01

Pos logit contributions
 rather+0.351
 stump+0.332
 annoying+0.331
rig+0.326
 probably+0.307
Neg logit contributions
ged-0.228
iked-0.197
 recess-0.196
or-0.183
ed-0.176
Tokenubs
Token ID23161
Feature activation+1.20e-02

Pos logit contributions
 rather+0.584
 stump+0.583
 annoying+0.569
rig+0.557
 spe+0.547
Neg logit contributions
ged-0.374
iked-0.369
or-0.334
 recess-0.307
ipped-0.294
Token them
Token ID606
Feature activation-5.45e-02

Pos logit contributions
 rather+0.025
 annoying+0.024
rig+0.024
 probably+0.023
 stump+0.023
Neg logit contributions
ged-0.015
iked-0.013
 recess-0.012
rav-0.011
or-0.011
Token with
Token ID351
Feature activation-1.08e-02

Pos logit contributions
ged+0.085
iked+0.072
 recess+0.068
ed+0.066
or+0.065
Neg logit contributions
 rather-0.104
rig-0.101
 annoying-0.099
 stump-0.094
 probably-0.093
Token soap
Token ID19533
Feature activation+1.08e-01

Pos logit contributions
ged+0.016
 recess+0.014
iked+0.012
 roles+0.012
ed+0.012
Neg logit contributions
rig-0.022
 rather-0.020
 annoying-0.020
 stump-0.019
 probably-0.019
Token and
Token ID290
Feature activation+1.61e-01

Pos logit contributions
 rather+0.170
rig+0.170
 annoying+0.166
 stump+0.157
 probably+0.150
Neg logit contributions
ged-0.204
iked-0.175
 recess-0.171
or-0.169
ed-0.167
Token.
Token ID13
Feature activation+6.62e-02

Pos logit contributions
rig+0.240
 annoying+0.228
 rather+0.223
 stump+0.211
 wire+0.207
Neg logit contributions
ged-0.119
ed-0.092
iked-0.088
,-0.084
or-0.084
Token She
Token ID1375
Feature activation+1.69e-01

Pos logit contributions
 rather+0.090
 annoying+0.085
rig+0.085
 stump+0.080
 probably+0.076
Neg logit contributions
ged-0.137
iked-0.124
 recess-0.124
or-0.119
ed-0.112
Token felt
Token ID2936
Feature activation+3.45e-02

Pos logit contributions
rig+0.383
 rather+0.374
 stump+0.365
 annoying+0.349
 wire+0.329
Neg logit contributions
ged-0.205
ed-0.151
 recess-0.139
iked-0.138
ies-0.138
Token her
Token ID607
Feature activation-6.28e-02

Pos logit contributions
rig+0.059
 rather+0.055
 stump+0.055
 annoying+0.054
 wire+0.050
Neg logit contributions
ged-0.059
 recess-0.049
ed-0.047
iked-0.047
or-0.045
Token face
Token ID1986
Feature activation+1.14e-01

Pos logit contributions
ged+0.081
 recess+0.070
ipop+0.065
iked+0.065
,+0.065
Neg logit contributions
rig-0.129
 stump-0.122
 rather-0.119
 annoying-0.114
 eerie-0.111
Token scr
Token ID6040
Feature activation+2.91e-01

Pos logit contributions
rig+0.245
 rather+0.227
 annoying+0.219
 stump+0.211
 likely+0.207
Neg logit contributions
ged-0.167
ed-0.132
iked-0.121
ies-0.116
,-0.116
Tokenunch
Token ID3316
Feature activation+6.53e-02

Pos logit contributions
 annoying+0.499
 rather+0.496
 stump+0.496
rig+0.492
 wire+0.490
Neg logit contributions
ged-0.429
iked-0.412
or-0.369
ipped-0.362
bled-0.351
Token up
Token ID510
Feature activation-3.23e-02

Pos logit contributions
 rather+0.146
 annoying+0.142
rig+0.140
 stump+0.137
 probably+0.133
Neg logit contributions
ged-0.075
 recess-0.064
iked-0.063
or-0.057
 roles-0.050
Token and
Token ID290
Feature activation+1.57e-01

Pos logit contributions
ged+0.048
iked+0.041
 recess+0.040
or+0.038
ed+0.037
Neg logit contributions
rig-0.062
 rather-0.062
 annoying-0.061
 stump-0.058
 probably-0.055
Token her
Token ID607
Feature activation-6.44e-02

Pos logit contributions
 rather+0.259
 annoying+0.252
 stump+0.238
 probably+0.229
rig+0.225
Neg logit contributions
ged-0.261
or-0.257
iked-0.255
 recess-0.250
rav-0.225
Token tongue
Token ID11880
Feature activation+9.96e-02

Pos logit contributions
ged+0.105
 recess+0.091
iked+0.089
or+0.083
ed+0.082
Neg logit contributions
 rather-0.116
rig-0.115
 annoying-0.111
 stump-0.108
 probably-0.101
Token sour
Token ID11348
Feature activation+1.86e-01

Pos logit contributions
rig+0.237
 stump+0.207
 rather+0.203
umbling+0.181
 prized+0.174
Neg logit contributions
ged-0.327
ies-0.280
 roles-0.277
 recess-0.269
ed-0.269
Token and
Token ID290
Feature activation+1.61e-01

Pos logit contributions
rig+0.395
 rather+0.389
 annoying+0.375
 stump+0.371
 probably+0.353
Neg logit contributions
ged-0.244
iked-0.196
 recess-0.195
ed-0.194
ies-0.184
Token made
Token ID925
Feature activation-2.78e-02

Pos logit contributions
rig+0.259
 stump+0.227
 rather+0.218
umbling+0.204
 former+0.193
Neg logit contributions
ged-0.305
ies-0.257
 roles-0.257
ed-0.254
 recess-0.250
Token their
Token ID511
Feature activation-1.73e-02

Pos logit contributions
ged+0.026
 recess+0.021
iked+0.020
or+0.017
ed+0.017
Neg logit contributions
 rather-0.069
rig-0.068
 stump-0.067
 annoying-0.064
 tre-0.062
Token faces
Token ID6698
Feature activation+1.30e-02

Pos logit contributions
ged+0.022
ies+0.019
 recess+0.019
 roles+0.018
iked+0.018
Neg logit contributions
rig-0.035
 rather-0.035
 stump-0.034
 annoying-0.033
 likely-0.032
Token scr
Token ID6040
Feature activation+2.88e-01

Pos logit contributions
rig+0.030
 stump+0.028
 rather+0.027
pill+0.027
 annoying+0.026
Neg logit contributions
ged-0.015
ies-0.012
ed-0.011
,-0.011
iked-0.010
Tokenunch
Token ID3316
Feature activation+6.11e-02

Pos logit contributions
 stump+0.523
 rather+0.521
 annoying+0.516
 probably+0.503
rig+0.496
Neg logit contributions
ged-0.390
iked-0.382
or-0.347
ipped-0.328
bled-0.324
Token up
Token ID510
Feature activation-3.33e-02

Pos logit contributions
 rather+0.134
rig+0.131
 annoying+0.129
 stump+0.126
 probably+0.119
Neg logit contributions
ged-0.078
iked-0.062
or-0.062
 recess-0.060
ed-0.059
Token.
Token ID13
Feature activation+6.39e-02

Pos logit contributions
ged+0.046
 recess+0.037
iked+0.037
or+0.035
ed+0.034
Neg logit contributions
rig-0.068
 rather-0.067
 annoying-0.066
 stump-0.064
 probably-0.061
Token\n
Token ID198
Feature activation+2.83e-02

Pos logit contributions
 rather+0.108
rig+0.104
 annoying+0.104
 stump+0.100
 probably+0.094
Neg logit contributions
ged-0.110
 recess-0.098
iked-0.097
or-0.092
ed-0.087
Token\n
Token ID198
Feature activation+2.93e-02

Pos logit contributions
 rather+0.041
rig+0.039
 annoying+0.037
 stump+0.036
 probably+0.034
Neg logit contributions
ged-0.056
iked-0.052
 recess-0.052
 roles-0.049
or-0.048
Token and
Token ID290
Feature activation+1.59e-01

Pos logit contributions
rig+0.027
 annoying+0.026
 rather+0.026
 stump+0.025
 intense+0.024
Neg logit contributions
ged-0.024
 recess-0.020
or-0.020
iked-0.020
ies-0.019
Token lady
Token ID10846
Feature activation+2.48e-02

Pos logit contributions
rig+0.279
 rather+0.257
 annoying+0.253
 stump+0.252
 intense+0.234
Neg logit contributions
ged-0.258
 recess-0.242
iked-0.226
ies-0.216
 roles-0.214
Tokenbugs
Token ID32965
Feature activation+6.07e-02

Pos logit contributions
 rather+0.040
 annoying+0.038
rig+0.038
 stump+0.036
 probably+0.035
Neg logit contributions
ged-0.045
iked-0.040
 recess-0.040
or-0.038
ed-0.035
Token and
Token ID290
Feature activation+1.58e-01

Pos logit contributions
rig+0.104
 rather+0.104
 annoying+0.103
 stump+0.099
 probably+0.093
Neg logit contributions
ged-0.099
 recess-0.087
iked-0.086
or-0.084
ed-0.080
Token cater
Token ID20825
Feature activation+1.39e-01

Pos logit contributions
rig+0.273
 rather+0.254
 stump+0.249
 annoying+0.246
 probably+0.228
Neg logit contributions
ged-0.264
 recess-0.242
iked-0.225
ies-0.218
 roles-0.217
Tokenpill
Token ID27215
Feature activation+2.87e-01

Pos logit contributions
rig+0.139
 annoying+0.133
 rather+0.131
 stump+0.127
 probably+0.122
Neg logit contributions
ged-0.342
 recess-0.306
or-0.297
ed-0.292
iked-0.292
Tokenars
Token ID945
Feature activation-3.83e-02

Pos logit contributions
 rather+0.546
 annoying+0.538
rig+0.534
 stump+0.524
 probably+0.513
Neg logit contributions
ged-0.411
 recess-0.355
or-0.342
iked-0.336
ed-0.314
Token."
Token ID526
Feature activation-3.66e-02

Pos logit contributions
ged+0.063
iked+0.056
 recess+0.056
or+0.052
ed+0.048
Neg logit contributions
 rather-0.068
 annoying-0.065
rig-0.064
 stump-0.061
 probably-0.060
Token\n
Token ID198
Feature activation+2.76e-02

Pos logit contributions
ged+0.066
 recess+0.059
iked+0.058
or+0.056
 roles+0.053
Neg logit contributions
 rather-0.060
rig-0.057
 annoying-0.055
 stump-0.055
 probably-0.052
Token\n
Token ID198
Feature activation+2.93e-02

Pos logit contributions
 rather+0.044
rig+0.044
 annoying+0.042
 stump+0.042
 wire+0.039
Neg logit contributions
ged-0.049
iked-0.045
 recess-0.045
or-0.043
 roles-0.041
TokenTim
Token ID14967
Feature activation-4.43e-02

Pos logit contributions
 rather+0.053
rig+0.051
 annoying+0.051
 stump+0.048
 probably+0.047
Neg logit contributions
ged-0.048
 recess-0.040
iked-0.040
or-0.037
ies-0.035
Token hard
Token ID1327
Feature activation+1.19e-01

Pos logit contributions
rig+0.228
 rather+0.207
 stump+0.202
umbling+0.176
 prized+0.175
Neg logit contributions
ged-0.257
 recess-0.211
ies-0.211
ed-0.207
 roles-0.201
Token and
Token ID290
Feature activation+1.37e-01

Pos logit contributions
 rather+0.232
rig+0.226
 annoying+0.224
 stump+0.215
 probably+0.206
Neg logit contributions
ged-0.177
iked-0.151
 recess-0.151
or-0.142
ed-0.134
Token made
Token ID925
Feature activation-2.72e-02

Pos logit contributions
rig+0.287
 rather+0.271
 stump+0.265
 annoying+0.249
 likely+0.240
Neg logit contributions
ged-0.186
 recess-0.152
ed-0.142
iked-0.139
ies-0.138
Token her
Token ID607
Feature activation-7.11e-02

Pos logit contributions
ged+0.032
 recess+0.027
iked+0.025
ed+0.024
ies+0.023
Neg logit contributions
 rather-0.060
rig-0.060
 stump-0.057
 annoying-0.055
 tre-0.054
Token face
Token ID1986
Feature activation+1.06e-01

Pos logit contributions
ged+0.111
ies+0.099
 recess+0.094
ed+0.089
iked+0.089
Neg logit contributions
rig-0.124
 rather-0.118
 annoying-0.115
 stump-0.109
pill-0.108
Token scr
Token ID6040
Feature activation+2.87e-01

Pos logit contributions
rig+0.165
 rather+0.152
 stump+0.145
 annoying+0.144
pill+0.137
Neg logit contributions
ged-0.194
ies-0.171
ed-0.162
iked-0.160
or-0.157
Tokenunch
Token ID3316
Feature activation+6.07e-02

Pos logit contributions
 rather+0.493
 annoying+0.493
rig+0.482
 stump+0.481
 wire+0.474
Neg logit contributions
ged-0.417
iked-0.415
or-0.375
ipped-0.360
 recess-0.342
Token up
Token ID510
Feature activation-3.49e-02

Pos logit contributions
 rather+0.138
rig+0.135
 annoying+0.133
 stump+0.129
 probably+0.123
Neg logit contributions
ged-0.072
iked-0.057
or-0.055
 recess-0.055
ed-0.052
Token.
Token ID13
Feature activation+3.73e-02

Pos logit contributions
ged+0.050
iked+0.043
 recess+0.043
or+0.040
ed+0.038
Neg logit contributions
 rather-0.068
rig-0.067
 annoying-0.067
 stump-0.064
 probably-0.061
Token She
Token ID1375
Feature activation+1.62e-01

Pos logit contributions
 rather+0.057
 annoying+0.055
rig+0.054
 stump+0.052
 probably+0.049
Neg logit contributions
ged-0.071
 recess-0.063
iked-0.063
or-0.060
ed-0.056
Token spat
Token ID15246
Feature activation+4.63e-02

Pos logit contributions
rig+0.389
 rather+0.363
 stump+0.361
 annoying+0.344
 breathtaking+0.344
Neg logit contributions
ged-0.182
ed-0.127
ies-0.116
or-0.112
iked-0.106
Token.
Token ID13
Feature activation+8.06e-02

Pos logit contributions
ged+0.084
iked+0.075
 recess+0.074
or+0.070
 roles+0.064
Neg logit contributions
 rather-0.090
 annoying-0.084
rig-0.083
 stump-0.081
 probably-0.079
Token Soon
Token ID15894
Feature activation-3.08e-02

Pos logit contributions
 rather+0.146
 annoying+0.142
rig+0.142
 stump+0.138
 probably+0.130
Neg logit contributions
ged-0.125
 recess-0.112
iked-0.111
or-0.104
ed-0.098
Token,
Token ID11
Feature activation-1.53e-01

Pos logit contributions
ged+0.055
or+0.047
 recess+0.041
ging+0.040
iked+0.039
Neg logit contributions
 rather-0.050
 spe-0.049
 spr-0.044
 beautiful-0.043
 annoying-0.042
Token more
Token ID517
Feature activation+1.07e-01

Pos logit contributions
ged+0.185
iked+0.153
 recess+0.152
or+0.143
ed+0.128
Neg logit contributions
 rather-0.338
 annoying-0.328
rig-0.326
 stump-0.315
 probably-0.306
Token cater
Token ID20825
Feature activation+1.40e-01

Pos logit contributions
 rather+0.191
rig+0.189
 stump+0.184
 annoying+0.180
 probably+0.171
Neg logit contributions
ged-0.168
 recess-0.154
iked-0.147
ies-0.139
or-0.137
Tokenpill
Token ID27215
Feature activation+2.86e-01

Pos logit contributions
 rather+0.149
rig+0.146
 annoying+0.135
 stump+0.132
 probably+0.129
Neg logit contributions
ged-0.336
 recess-0.304
ed-0.300
or-0.300
iked-0.286
Tokenars
Token ID945
Feature activation-3.40e-02

Pos logit contributions
 rather+0.516
rig+0.500
 annoying+0.497
 stump+0.479
 probably+0.461
Neg logit contributions
ged-0.462
 recess-0.403
iked-0.397
or-0.383
ed-0.363
Token arrived
Token ID5284
Feature activation-1.49e-01

Pos logit contributions
ged+0.053
 recess+0.046
iked+0.046
or+0.044
ed+0.040
Neg logit contributions
 rather-0.063
 annoying-0.060
rig-0.059
 stump-0.057
 probably-0.055
Token.
Token ID13
Feature activation+7.53e-02

Pos logit contributions
ged+0.245
 recess+0.215
iked+0.213
or+0.202
 roles+0.185
Neg logit contributions
 rather-0.264
rig-0.247
 annoying-0.246
 stump-0.235
 probably-0.230
Token They
Token ID1119
Feature activation+1.37e-01

Pos logit contributions
 rather+0.135
 annoying+0.131
rig+0.129
 stump+0.124
 probably+0.119
Neg logit contributions
ged-0.120
 recess-0.108
iked-0.107
or-0.100
ed-0.095
Token also
Token ID635
Feature activation+2.18e-01

Pos logit contributions
 rather+0.272
rig+0.265
 annoying+0.264
 stump+0.254
 probably+0.244
Neg logit contributions
ged-0.195
 recess-0.167
iked-0.166
or-0.156
ed-0.145
Token.
Token ID13
Feature activation+4.11e-02

Pos logit contributions
 rather+0.337
rig+0.336
 annoying+0.324
 stump+0.318
 probably+0.304
Neg logit contributions
ged-0.213
iked-0.176
 recess-0.174
ed-0.168
or-0.163
Token It
Token ID632
Feature activation+1.68e-01

Pos logit contributions
 rather+0.071
rig+0.070
 annoying+0.068
 stump+0.067
 tre+0.063
Neg logit contributions
ged-0.066
 recess-0.059
iked-0.058
ed-0.055
or-0.053
Token makes
Token ID1838
Feature activation+1.88e-02

Pos logit contributions
rig+0.351
 rather+0.334
 stump+0.333
umbling+0.317
 breathtaking+0.308
Neg logit contributions
ged-0.188
 roles-0.175
ed-0.167
,-0.153
ies-0.149
Token her
Token ID607
Feature activation-7.30e-02

Pos logit contributions
 rather+0.045
rig+0.045
 stump+0.043
 annoying+0.042
 tre+0.042
Neg logit contributions
ged-0.017
 recess-0.015
iked-0.013
ies-0.013
ed-0.013
Token face
Token ID1986
Feature activation+1.06e-01

Pos logit contributions
ged+0.102
ies+0.095
 recess+0.089
,+0.086
 sister+0.086
Neg logit contributions
rig-0.131
 rather-0.124
 likely-0.121
pill-0.120
 annoying-0.120
Token scr
Token ID6040
Feature activation+2.84e-01

Pos logit contributions
rig+0.162
 rather+0.142
 stump+0.138
pill+0.135
 tre+0.132
Neg logit contributions
ged-0.197
ies-0.184
,-0.174
ed-0.169
 later-0.159
Tokenunch
Token ID3316
Feature activation+5.93e-02

Pos logit contributions
 rather+0.497
 annoying+0.487
rig+0.482
 stump+0.482
 probably+0.474
Neg logit contributions
ged-0.405
iked-0.393
or-0.362
ipped-0.348
bled-0.334
Token up
Token ID510
Feature activation-3.79e-02

Pos logit contributions
 rather+0.140
rig+0.137
 annoying+0.135
 stump+0.131
 probably+0.125
Neg logit contributions
ged-0.066
or-0.050
iked-0.050
 recess-0.048
ed-0.048
Token.
Token ID13
Feature activation+4.16e-02

Pos logit contributions
ged+0.057
iked+0.048
 recess+0.047
or+0.045
ed+0.044
Neg logit contributions
rig-0.072
 rather-0.071
 annoying-0.070
 stump-0.068
 probably-0.064
Token She
Token ID1375
Feature activation+1.52e-01

Pos logit contributions
 rather+0.076
rig+0.073
 annoying+0.073
 stump+0.070
 probably+0.067
Neg logit contributions
ged-0.066
 recess-0.058
iked-0.058
or-0.054
ed-0.052
Token sp
Token ID599
Feature activation+1.30e-01

Pos logit contributions
rig+0.318
 stump+0.313
 rather+0.303
umbling+0.292
 breathtaking+0.290
Neg logit contributions
ged-0.181
ed-0.142
 roles-0.139
ies-0.137
,-0.134
Token bitter
Token ID12922
Feature activation+1.60e-01

Pos logit contributions
rig+0.209
 rather+0.182
umbling+0.165
 stump+0.165
 annoying+0.150
Neg logit contributions
ged-0.283
 recess-0.240
ies-0.235
,-0.231
kins-0.230
Token and
Token ID290
Feature activation+1.44e-01

Pos logit contributions
rig+0.351
 rather+0.343
 annoying+0.335
 stump+0.328
 probably+0.316
Neg logit contributions
ged-0.196
iked-0.160
ed-0.156
 recess-0.152
ies-0.146
Token made
Token ID925
Feature activation-3.52e-02

Pos logit contributions
rig+0.225
 rather+0.191
 stump+0.190
umbling+0.182
 prized+0.165
Neg logit contributions
ged-0.267
ies-0.227
 recess-0.223
 roles-0.222
ed-0.220
Token her
Token ID607
Feature activation-7.49e-02

Pos logit contributions
ged+0.036
 recess+0.029
iked+0.026
ed+0.026
ies+0.025
Neg logit contributions
rig-0.083
 rather-0.082
 stump-0.079
 tre-0.075
 annoying-0.075
Token face
Token ID1986
Feature activation+1.04e-01

Pos logit contributions
ged+0.109
ies+0.103
,+0.092
 recess+0.092
 roles+0.090
Neg logit contributions
rig-0.130
 rather-0.121
 annoying-0.117
 stump-0.116
pill-0.115
Token scr
Token ID6040
Feature activation+2.84e-01

Pos logit contributions
rig+0.153
 rather+0.137
 stump+0.129
 annoying+0.125
pill+0.124
Neg logit contributions
ged-0.195
ies-0.183
,-0.167
ed-0.167
iked-0.160
Tokenunch
Token ID3316
Feature activation+5.48e-02

Pos logit contributions
 annoying+0.491
 rather+0.489
 stump+0.487
rig+0.480
 wire+0.478
Neg logit contributions
ged-0.391
iked-0.390
or-0.364
ipped-0.330
 recess-0.329
Token up
Token ID510
Feature activation-3.92e-02

Pos logit contributions
 rather+0.118
rig+0.115
 annoying+0.114
 stump+0.110
 probably+0.106
Neg logit contributions
ged-0.070
iked-0.058
 recess-0.057
or-0.055
ed-0.051
Token.
Token ID13
Feature activation+4.03e-02

Pos logit contributions
ged+0.053
iked+0.043
 recess+0.042
or+0.040
ed+0.039
Neg logit contributions
rig-0.080
 rather-0.079
 annoying-0.078
 stump-0.076
 wire-0.072
Token\n
Token ID198
Feature activation+1.16e-02

Pos logit contributions
 rather+0.072
rig+0.069
 annoying+0.068
 stump+0.066
 probably+0.064
Neg logit contributions
ged-0.066
 recess-0.058
iked-0.058
or-0.055
 roles-0.051
Token\n
Token ID198
Feature activation+1.44e-02

Pos logit contributions
 rather+0.019
rig+0.018
 annoying+0.017
 stump+0.017
 probably+0.016
Neg logit contributions
ged-0.021
iked-0.019
 recess-0.019
 roles-0.018
or-0.017
Token Lily
Token ID20037
Feature activation-1.32e-01

Pos logit contributions
ged+0.214
iked+0.184
 recess+0.179
or+0.171
ed+0.158
Neg logit contributions
 rather-0.313
 annoying-0.303
rig-0.295
 stump-0.286
 probably-0.281
Token,
Token ID11
Feature activation-1.52e-01

Pos logit contributions
ged+0.177
 recess+0.166
iked+0.165
or+0.142
rav+0.128
Neg logit contributions
 rather-0.275
 annoying-0.262
 probably-0.259
 stump-0.255
rig-0.246
Token no
Token ID645
Feature activation+6.88e-02

Pos logit contributions
ged+0.249
 recess+0.219
iked+0.216
or+0.206
ed+0.191
Neg logit contributions
 rather-0.266
rig-0.264
 annoying-0.257
 stump-0.252
 probably-0.233
Token.
Token ID13
Feature activation+7.74e-02

Pos logit contributions
 rather+0.156
 annoying+0.151
rig+0.146
 stump+0.144
 probably+0.139
Neg logit contributions
ged-0.085
iked-0.072
 recess-0.068
or-0.065
ed-0.060
Token Cater
Token ID40742
Feature activation+9.05e-02

Pos logit contributions
 rather+0.124
rig+0.122
 annoying+0.120
 stump+0.114
 probably+0.107
Neg logit contributions
ged-0.138
iked-0.124
 recess-0.123
or-0.119
ed-0.113
Tokenpill
Token ID27215
Feature activation+2.84e-01

Pos logit contributions
 rather+0.072
 annoying+0.072
rig+0.071
 probably+0.063
 stump+0.061
Neg logit contributions
ged-0.233
 recess-0.220
or-0.213
ed-0.212
iked-0.208
Tokenars
Token ID945
Feature activation-3.50e-02

Pos logit contributions
 rather+0.569
rig+0.564
 stump+0.551
 annoying+0.550
 eerie+0.520
Neg logit contributions
ged-0.387
or-0.315
 recess-0.313
iked-0.297
ed-0.289
Token are
Token ID389
Feature activation+2.62e-02

Pos logit contributions
ged+0.047
iked+0.038
 recess+0.038
or+0.037
ed+0.034
Neg logit contributions
 rather-0.073
rig-0.071
 annoying-0.070
 stump-0.069
 probably-0.065
Token not
Token ID407
Feature activation+1.53e-01

Pos logit contributions
rig+0.044
umbling+0.041
 stump+0.040
pill+0.037
 rather+0.037
Neg logit contributions
ged-0.039
 recess-0.037
,-0.037
ies-0.036
 roles-0.035
Token for
Token ID329
Feature activation-9.84e-02

Pos logit contributions
rig+0.232
 rather+0.226
 stump+0.211
 annoying+0.205
 tre+0.193
Neg logit contributions
ged-0.280
 recess-0.248
ies-0.243
iked-0.233
ed-0.226
Token keeping
Token ID5291
Feature activation-5.49e-02

Pos logit contributions
ged+0.165
iked+0.146
 recess+0.144
or+0.139
ed+0.130
Neg logit contributions
 rather-0.171
 annoying-0.165
rig-0.162
 stump-0.156
 probably-0.150
Tokenily
Token ID813
Feature activation-1.50e-01

Pos logit contributions
ged+0.102
iked+0.083
 recess+0.078
 roles+0.077
ging+0.075
Neg logit contributions
 rather-0.087
 probably-0.086
rig-0.083
 wire-0.082
 certainly-0.081
Token liked
Token ID8288
Feature activation-6.10e-05

Pos logit contributions
ged+0.269
iked+0.241
 recess+0.237
or+0.228
ed+0.210
Neg logit contributions
 rather-0.242
 annoying-0.229
rig-0.228
 stump-0.219
 probably-0.211
Token to
Token ID284
Feature activation+2.39e-02

Pos logit contributions
ged+0.000
 recess+0.000
iked+0.000
or+0.000
 roles+0.000
Neg logit contributions
rig-0.000
 rather-0.000
 annoying-0.000
 stump-0.000
 probably-0.000
Token keep
Token ID1394
Feature activation-3.45e-03

Pos logit contributions
rig+0.049
 rather+0.047
 annoying+0.047
 stump+0.046
 awful+0.045
Neg logit contributions
ged-0.031
 recess-0.030
iked-0.026
or-0.025
,-0.023
Token cater
Token ID20825
Feature activation+1.27e-01

Pos logit contributions
ged+0.004
iked+0.003
 recess+0.003
or+0.003
ed+0.003
Neg logit contributions
 rather-0.008
 annoying-0.008
rig-0.007
 stump-0.007
 probably-0.007
Tokenpill
Token ID27215
Feature activation+2.83e-01

Pos logit contributions
 probably+0.162
 rather+0.155
 stump+0.153
 annoying+0.148
rig+0.144
Neg logit contributions
ged-0.265
or-0.253
 recess-0.239
ies-0.235
iked-0.235
Tokenars
Token ID945
Feature activation-4.27e-02

Pos logit contributions
 rather+0.567
 annoying+0.563
rig+0.548
 probably+0.544
 stump+0.538
Neg logit contributions
ged-0.353
or-0.315
 recess-0.307
iked-0.296
ed-0.269
Token in
Token ID287
Feature activation-5.49e-02

Pos logit contributions
ged+0.054
or+0.044
ies+0.044
iked+0.044
,+0.041
Neg logit contributions
rig-0.090
 annoying-0.088
 stump-0.086
 rather-0.085
 probably-0.081
Token a
Token ID257
Feature activation+9.96e-02

Pos logit contributions
ged+0.043
 recess+0.041
iked+0.029
or+0.026
 roles+0.023
Neg logit contributions
rig-0.144
 rather-0.141
 annoying-0.138
 stump-0.134
 probably-0.132
Token jar
Token ID17379
Feature activation+5.93e-02

Pos logit contributions
 rather+0.145
rig+0.139
 annoying+0.136
 probably+0.126
 stump+0.124
Neg logit contributions
ged-0.186
 recess-0.183
or-0.167
iked-0.161
ies-0.156
Token.
Token ID13
Feature activation+1.53e-02

Pos logit contributions
 annoying+0.133
rig+0.129
 rather+0.126
 probably+0.121
 awful+0.118
Neg logit contributions
ged-0.064
or-0.058
iked-0.056
 recess-0.053
,-0.052
Token but
Token ID475
Feature activation+2.57e-02

Pos logit contributions
ged+0.148
,+0.138
 Lily+0.120
 there+0.118
 later+0.116
Neg logit contributions
rig-0.374
 stump-0.326
umbling-0.322
 former-0.319
ilt-0.310
Token he
Token ID339
Feature activation-6.38e-03

Pos logit contributions
 rather+0.054
rig+0.052
 annoying+0.052
 stump+0.050
 probably+0.048
Neg logit contributions
ged-0.034
 recess-0.029
iked-0.029
or-0.027
ed-0.025
Token also
Token ID635
Feature activation+2.09e-01

Pos logit contributions
ged+0.008
ed+0.007
ies+0.006
or+0.006
iked+0.006
Neg logit contributions
rig-0.013
 rather-0.013
 stump-0.012
 annoying-0.012
 highest-0.012
Token knew
Token ID2993
Feature activation-4.90e-02

Pos logit contributions
rig+0.438
 rather+0.417
 stump+0.416
 annoying+0.400
 wire+0.392
Neg logit contributions
ged-0.279
ed-0.221
 recess-0.215
iked-0.198
ies-0.192
Token cater
Token ID20825
Feature activation+1.36e-01

Pos logit contributions
ged+0.070
 recess+0.060
iked+0.060
or+0.056
ed+0.052
Neg logit contributions
 rather-0.097
rig-0.095
 annoying-0.094
 stump-0.091
 probably-0.087
Tokenpill
Token ID27215
Feature activation+2.83e-01

Pos logit contributions
 rather+0.132
 annoying+0.132
 stump+0.129
rig+0.127
 probably+0.127
Neg logit contributions
ged-0.328
or-0.305
ed-0.295
 recess-0.291
ipped-0.280
Tokenars
Token ID945
Feature activation-3.76e-02

Pos logit contributions
 rather+0.553
 stump+0.547
rig+0.545
 annoying+0.544
 probably+0.516
Neg logit contributions
ged-0.381
or-0.324
 recess-0.315
iked-0.302
ed-0.298
Token were
Token ID547
Feature activation-1.02e-02

Pos logit contributions
ged+0.058
iked+0.050
 recess+0.050
or+0.047
ed+0.044
Neg logit contributions
 rather-0.070
 annoying-0.068
rig-0.067
 stump-0.065
 probably-0.062
Token not
Token ID407
Feature activation+1.50e-01

Pos logit contributions
ged+0.019
 recess+0.017
ies+0.016
iked+0.016
 roles+0.016
Neg logit contributions
rig-0.015
 rather-0.014
 stump-0.013
 annoying-0.012
umbling-0.012
Token pets
Token ID17252
Feature activation+1.99e-03

Pos logit contributions
 rather+0.228
rig+0.223
 annoying+0.215
 stump+0.210
 probably+0.195
Neg logit contributions
ged-0.280
 recess-0.248
iked-0.245
or-0.232
ies-0.226
Token.
Token ID13
Feature activation+6.58e-02

Pos logit contributions
rig+0.004
 rather+0.004
 annoying+0.004
 stump+0.004
 intense+0.004
Neg logit contributions
ged-0.003
 recess-0.002
or-0.002
iked-0.002
ed-0.002
Token it
Token ID340
Feature activation-1.49e-02

Pos logit contributions
ged+0.217
,+0.172
ed+0.147
 recess+0.142
iked+0.142
Neg logit contributions
rig-0.388
 stump-0.368
 rather-0.347
 web-0.337
 annoying-0.336
Token made
Token ID925
Feature activation-3.46e-02

Pos logit contributions
ged+0.018
ed+0.016
ies+0.014
,+0.014
led+0.013
Neg logit contributions
rig-0.031
 stump-0.030
 rather-0.029
 former-0.027
 heavens-0.027
Token Jack
Token ID3619
Feature activation-1.59e-01

Pos logit contributions
ged+0.028
 recess+0.021
iked+0.019
ed+0.018
or+0.016
Neg logit contributions
 rather-0.089
rig-0.089
 stump-0.087
 annoying-0.084
 tre-0.082
Token's
Token ID338
Feature activation+8.78e-02

Pos logit contributions
ged+0.245
ed+0.203
ies+0.203
,+0.193
iked+0.185
Neg logit contributions
rig-0.272
 rather-0.269
 annoying-0.267
 wire-0.260
 stump-0.258
Token face
Token ID1986
Feature activation+1.06e-01

Pos logit contributions
rig+0.168
 rather+0.161
 annoying+0.160
 stump+0.155
 likely+0.153
Neg logit contributions
ged-0.119
 recess-0.100
ies-0.098
iked-0.095
,-0.092
Token scr
Token ID6040
Feature activation+2.81e-01

Pos logit contributions
rig+0.155
 stump+0.135
 wire+0.134
 rather+0.132
 annoying+0.130
Neg logit contributions
ged-0.203
ies-0.184
ed-0.183
,-0.180
iked-0.166
Tokenunch
Token ID3316
Feature activation+5.71e-02

Pos logit contributions
 annoying+0.506
 wire+0.496
 stump+0.494
 rather+0.490
 probably+0.483
Neg logit contributions
iked-0.388
ged-0.383
ipped-0.342
or-0.336
bled-0.317
Token up
Token ID510
Feature activation-4.03e-02

Pos logit contributions
 rather+0.138
rig+0.136
 annoying+0.133
 stump+0.129
 probably+0.123
Neg logit contributions
ged-0.063
ed-0.047
or-0.046
iked-0.044
 recess-0.040
Token!
Token ID0
Feature activation+4.91e-02

Pos logit contributions
ged+0.051
iked+0.041
 recess+0.040
or+0.038
ed+0.037
Neg logit contributions
 rather-0.086
rig-0.086
 annoying-0.085
 stump-0.082
 wire-0.079
Token
Token ID220
Feature activation-1.22e-01

Pos logit contributions
 rather+0.099
rig+0.096
 annoying+0.095
 stump+0.093
 probably+0.089
Neg logit contributions
ged-0.069
 recess-0.059
iked-0.059
or-0.056
ies-0.051
Token
Token ID198
Feature activation+1.13e-03

Pos logit contributions
ged+0.209
 recess+0.189
iked+0.184
or+0.173
 roles+0.165
Neg logit contributions
 rather-0.208
rig-0.203
 annoying-0.195
 stump-0.191
 probably-0.184
Token you
Token ID345
Feature activation+4.61e-02

Pos logit contributions
ged+0.297
iked+0.265
 recess+0.263
or+0.250
ed+0.238
Neg logit contributions
 rather-0.242
 annoying-0.232
rig-0.223
 stump-0.210
 probably-0.210
Token,
Token ID11
Feature activation-1.79e-01

Pos logit contributions
 rather+0.080
rig+0.079
 annoying+0.077
 stump+0.075
 probably+0.070
Neg logit contributions
ged-0.075
iked-0.067
 recess-0.066
or-0.063
ed-0.059
Token Lily
Token ID20037
Feature activation-1.50e-01

Pos logit contributions
ged+0.160
iked+0.127
 recess+0.126
ies+0.117
or+0.116
Neg logit contributions
 rather-0.425
rig-0.422
 stump-0.413
 annoying-0.406
 intense-0.389
Token.
Token ID13
Feature activation+5.07e-02

Pos logit contributions
ged+0.212
iked+0.185
 recess+0.184
or+0.166
 roles+0.149
Neg logit contributions
 rather-0.312
 annoying-0.295
rig-0.287
 probably-0.285
 stump-0.282
Token Cater
Token ID40742
Feature activation+8.16e-02

Pos logit contributions
 rather+0.079
rig+0.077
 annoying+0.077
 stump+0.073
 intense+0.068
Neg logit contributions
ged-0.091
iked-0.082
 recess-0.081
or-0.079
ed-0.075
Tokenpill
Token ID27215
Feature activation+2.79e-01

Pos logit contributions
 rather+0.062
 annoying+0.062
rig+0.059
 stump+0.055
 probably+0.054
Neg logit contributions
ged-0.211
 recess-0.198
or-0.197
ed-0.194
iked-0.191
Tokenars
Token ID945
Feature activation-4.18e-02

Pos logit contributions
 rather+0.540
rig+0.536
 annoying+0.529
 stump+0.520
 probably+0.488
Neg logit contributions
ged-0.393
 recess-0.323
or-0.322
iked-0.316
ed-0.305
Token can
Token ID460
Feature activation+4.98e-02

Pos logit contributions
ged+0.051
or+0.040
iked+0.039
 recess+0.039
ies+0.037
Neg logit contributions
 rather-0.089
rig-0.089
 stump-0.087
 annoying-0.086
 intense-0.079
Token leave
Token ID2666
Feature activation-5.19e-02

Pos logit contributions
rig+0.097
 rather+0.090
 stump+0.088
 annoying+0.087
pill+0.085
Neg logit contributions
ged-0.067
or-0.062
 recess-0.059
ies-0.054
,-0.053
Token and
Token ID290
Feature activation+1.37e-01

Pos logit contributions
ged+0.076
 recess+0.065
iked+0.064
or+0.062
ed+0.057
Neg logit contributions
 rather-0.101
rig-0.099
 annoying-0.098
 stump-0.094
 probably-0.090
Token never
Token ID1239
Feature activation+9.91e-02

Pos logit contributions
rig+0.249
 rather+0.243
 annoying+0.239
 stump+0.232
 intense+0.217
Neg logit contributions
ged-0.206
 recess-0.184
iked-0.178
or-0.169
 roles-0.163
Token made
Token ID925
Feature activation-4.30e-02

Pos logit contributions
rig+0.278
 stump+0.267
 rather+0.267
 annoying+0.258
 eerie+0.243
Neg logit contributions
ged-0.170
ed-0.126
iked-0.120
ies-0.117
or-0.117
Token a
Token ID257
Feature activation+9.80e-02

Pos logit contributions
ged+0.060
ed+0.052
iked+0.050
 recess+0.049
ies+0.047
Neg logit contributions
 rather-0.083
rig-0.083
 stump-0.081
 annoying-0.079
 tre-0.074
Token funny
Token ID8258
Feature activation+1.58e-01

Pos logit contributions
 rather+0.146
rig+0.141
 stump+0.140
pill+0.133
 annoying+0.127
Neg logit contributions
ged-0.163
ies-0.153
,-0.145
by-0.138
 recess-0.131
Token face
Token ID1986
Feature activation+1.01e-01

Pos logit contributions
 rather+0.282
rig+0.275
 annoying+0.264
 stump+0.263
 tre+0.251
Neg logit contributions
ged-0.255
ies-0.219
iked-0.203
ed-0.202
,-0.201
Token,
Token ID11
Feature activation-1.89e-01

Pos logit contributions
rig+0.212
 annoying+0.194
 rather+0.192
 stump+0.189
 tre+0.180
Neg logit contributions
ged-0.148
,-0.121
ed-0.114
ies-0.112
iked-0.110
Token scr
Token ID6040
Feature activation+2.78e-01

Pos logit contributions
ged+0.308
,+0.242
ed+0.231
iked+0.219
ies+0.219
Neg logit contributions
rig-0.348
 stump-0.322
 rather-0.300
umbling-0.297
 annoying-0.296
Tokenunch
Token ID3316
Feature activation+5.20e-02

Pos logit contributions
 stump+0.509
 annoying+0.494
 rather+0.494
 wire+0.484
rig+0.482
Neg logit contributions
ged-0.391
iked-0.389
or-0.344
ipped-0.323
bled-0.316
Tokening
Token ID278
Feature activation-5.74e-02

Pos logit contributions
 rather+0.120
rig+0.117
 annoying+0.116
 stump+0.112
 probably+0.108
Neg logit contributions
ged-0.060
iked-0.048
 recess-0.046
or-0.045
ed-0.042
Token up
Token ID510
Feature activation-4.61e-02

Pos logit contributions
ged+0.084
ed+0.065
iked+0.065
 recess+0.063
or+0.061
Neg logit contributions
 rather-0.118
rig-0.116
 annoying-0.113
 stump-0.110
 probably-0.105
Token her
Token ID607
Feature activation-9.30e-02

Pos logit contributions
ged+0.072
iked+0.062
 recess+0.059
ed+0.058
or+0.056
Neg logit contributions
 rather-0.085
rig-0.085
 annoying-0.084
 stump-0.081
 probably-0.076
Token nose
Token ID9686
Feature activation+4.69e-02

Pos logit contributions
ged+0.154
iked+0.129
 recess+0.128
ed+0.123
ies+0.121
Neg logit contributions
 rather-0.160
rig-0.159
 annoying-0.158
 stump-0.151
 probably-0.141
Token put
Token ID1234
Feature activation-1.42e-02

Pos logit contributions
rig+0.049
 rather+0.048
 annoying+0.048
pill+0.046
 stump+0.046
Neg logit contributions
ged-0.036
 recess-0.034
or-0.031
iked-0.030
ies-0.028
Token it
Token ID340
Feature activation-6.49e-03

Pos logit contributions
ged+0.020
 recess+0.017
iked+0.017
or+0.016
ed+0.015
Neg logit contributions
 rather-0.029
rig-0.028
 annoying-0.028
 stump-0.027
 probably-0.026
Token back
Token ID736
Feature activation-6.07e-02

Pos logit contributions
ged+0.009
iked+0.008
 recess+0.008
or+0.008
rav+0.007
Neg logit contributions
 rather-0.013
 annoying-0.012
rig-0.012
 stump-0.012
 probably-0.011
Token.
Token ID13
Feature activation+7.02e-02

Pos logit contributions
ged+0.090
 recess+0.080
iked+0.080
or+0.073
rav+0.067
Neg logit contributions
 rather-0.117
rig-0.110
 annoying-0.109
 stump-0.107
 probably-0.103
Token Cater
Token ID40742
Feature activation+8.52e-02

Pos logit contributions
rig+0.125
 annoying+0.124
 rather+0.116
 intense+0.115
 stump+0.112
Neg logit contributions
ged-0.098
iked-0.090
 recess-0.090
 Lee-0.088
or-0.084
Tokenpill
Token ID27215
Feature activation+2.78e-01

Pos logit contributions
 rather+0.073
 annoying+0.071
rig+0.066
 probably+0.064
 stump+0.060
Neg logit contributions
ged-0.215
 recess-0.203
or-0.198
ed-0.196
iked-0.193
Tokenars
Token ID945
Feature activation-4.20e-02

Pos logit contributions
 rather+0.541
rig+0.533
 annoying+0.532
 stump+0.519
 probably+0.492
Neg logit contributions
ged-0.392
 recess-0.324
or-0.323
iked-0.314
ed-0.299
Token need
Token ID761
Feature activation+1.04e-01

Pos logit contributions
ged+0.057
or+0.048
iked+0.045
 recess+0.044
ed+0.043
Neg logit contributions
rig-0.085
 rather-0.084
 annoying-0.083
 stump-0.082
 intense-0.076
Token to
Token ID284
Feature activation+2.64e-02

Pos logit contributions
rig+0.220
 annoying+0.208
 stump+0.199
 rather+0.199
 intense+0.187
Neg logit contributions
ged-0.139
 recess-0.121
iked-0.114
or-0.111
ed-0.109
Token eat
Token ID4483
Feature activation-7.18e-02

Pos logit contributions
rig+0.049
 rather+0.048
 annoying+0.047
pill+0.046
 unlike+0.045
Neg logit contributions
ged-0.037
 recess-0.036
or-0.031
iked-0.030
 later-0.030
Token leaves
Token ID5667
Feature activation+1.16e-01

Pos logit contributions
ged+0.106
 recess+0.090
or+0.088
iked+0.088
ed+0.081
Neg logit contributions
rig-0.135
 rather-0.134
 annoying-0.134
 stump-0.128
 probably-0.121
Token it
Token ID340
Feature activation-1.60e-02

Pos logit contributions
rig+0.154
 rather+0.152
 annoying+0.147
 stump+0.144
 probably+0.136
Neg logit contributions
ged-0.086
iked-0.073
 recess-0.071
or-0.066
ed-0.063
Token made
Token ID925
Feature activation-3.63e-02

Pos logit contributions
ged+0.022
ed+0.019
ies+0.017
or+0.015
 roles+0.015
Neg logit contributions
rig-0.033
 rather-0.032
 stump-0.030
 former-0.029
 eerie-0.028
Token Bob
Token ID5811
Feature activation-1.40e-01

Pos logit contributions
ged+0.035
iked+0.028
 recess+0.027
ed+0.023
or+0.022
Neg logit contributions
 rather-0.090
rig-0.088
 stump-0.084
 annoying-0.084
 tre-0.082
Token's
Token ID338
Feature activation+8.33e-02

Pos logit contributions
ged+0.240
ies+0.205
ed+0.195
iked+0.191
or+0.185
Neg logit contributions
rig-0.227
 rather-0.223
 annoying-0.211
 tre-0.206
 stump-0.205
Token face
Token ID1986
Feature activation+9.99e-02

Pos logit contributions
rig+0.159
 rather+0.152
 annoying+0.146
pill+0.139
 likely+0.136
Neg logit contributions
ged-0.127
ies-0.106
 recess-0.106
iked-0.104
ed-0.099
Token scr
Token ID6040
Feature activation+2.76e-01

Pos logit contributions
rig+0.181
 rather+0.160
 stump+0.154
 annoying+0.152
pill+0.151
Neg logit contributions
ged-0.165
ies-0.146
ed-0.141
,-0.140
iked-0.128
Tokenunch
Token ID3316
Feature activation+5.04e-02

Pos logit contributions
 annoying+0.494
 rather+0.489
rig+0.481
 stump+0.477
 probably+0.474
Neg logit contributions
iked-0.385
ged-0.385
or-0.346
ipped-0.336
 recess-0.326
Token up
Token ID510
Feature activation-4.31e-02

Pos logit contributions
 rather+0.114
rig+0.112
 annoying+0.110
 stump+0.106
 probably+0.102
Neg logit contributions
ged-0.060
iked-0.048
 recess-0.046
or-0.046
ed-0.044
Token.
Token ID13
Feature activation+2.03e-02

Pos logit contributions
ged+0.064
iked+0.055
 recess+0.055
or+0.052
ed+0.049
Neg logit contributions
 rather-0.083
rig-0.080
 annoying-0.080
 stump-0.077
 probably-0.074
Token He
Token ID679
Feature activation+1.12e-01

Pos logit contributions
 rather+0.043
rig+0.042
 annoying+0.041
 stump+0.038
 tre+0.038
Neg logit contributions
ged-0.025
iked-0.024
ed-0.021
 recess-0.020
or-0.019
Token didn
Token ID1422
Feature activation-6.45e-02

Pos logit contributions
rig+0.249
 rather+0.243
 annoying+0.233
 stump+0.231
 eerie+0.220
Neg logit contributions
ged-0.139
ed-0.100
iked-0.100
ies-0.095
or-0.094
Token a
Token ID257
Feature activation+1.01e-01

Pos logit contributions
ged+0.195
 recess+0.181
iked+0.167
or+0.150
ed+0.149
Neg logit contributions
rig-0.202
 rather-0.189
 annoying-0.184
 stump-0.175
 probably-0.166
Token little
Token ID1310
Feature activation+1.23e-01

Pos logit contributions
 rather+0.179
 annoying+0.178
rig+0.174
 probably+0.164
 stump+0.163
Neg logit contributions
ged-0.160
 recess-0.133
iked-0.130
or-0.127
ies-0.118
Token bit
Token ID1643
Feature activation+1.56e-01

Pos logit contributions
 annoying+0.221
rig+0.217
 probably+0.214
 rather+0.210
 certainly+0.206
Neg logit contributions
ged-0.184
,-0.158
 recess-0.147
or-0.143
ies-0.141
Token and
Token ID290
Feature activation+1.21e-01

Pos logit contributions
rig+0.362
 annoying+0.357
 rather+0.351
 stump+0.349
 probably+0.345
Neg logit contributions
ged-0.170
iked-0.128
 recess-0.126
or-0.124
ed-0.116
Token then
Token ID788
Feature activation-7.86e-02

Pos logit contributions
 rather+0.229
rig+0.221
 annoying+0.221
 stump+0.212
 probably+0.203
Neg logit contributions
ged-0.184
iked-0.159
 recess-0.159
or-0.150
ed-0.140
Token scr
Token ID6040
Feature activation+2.76e-01

Pos logit contributions
ged+0.123
 recess+0.112
iked+0.111
or+0.107
rav+0.098
Neg logit contributions
 rather-0.139
 annoying-0.134
 stump-0.127
 probably-0.124
rig-0.124
Tokenabb
Token ID6485
Feature activation+4.75e-02

Pos logit contributions
 stump+0.486
 annoying+0.486
 rather+0.478
 wire+0.470
rig+0.468
Neg logit contributions
iked-0.408
ged-0.403
or-0.364
ipped-0.335
ogged-0.323
Tokenled
Token ID992
Feature activation-2.05e-01

Pos logit contributions
 annoying+0.102
 rather+0.101
rig+0.100
 stump+0.096
 probably+0.095
Neg logit contributions
 recess-0.052
ged-0.051
iked-0.047
 roles-0.039
or-0.039
Token underneath
Token ID14638
Feature activation-1.50e-01

Pos logit contributions
ged+0.278
 recess+0.268
iked+0.267
or+0.249
rav+0.240
Neg logit contributions
 rather-0.396
 annoying-0.376
rig-0.361
 stump-0.354
 probably-0.351
Token some
Token ID617
Feature activation-6.52e-03

Pos logit contributions
ged+0.191
 recess+0.155
iked+0.152
or+0.142
ed+0.135
Neg logit contributions
rig-0.328
 rather-0.325
 annoying-0.320
 stump-0.302
 probably-0.292
Token nearby
Token ID6716
Feature activation+1.01e-01

Pos logit contributions
ged+0.012
 recess+0.011
iked+0.010
or+0.010
ed+0.010
Neg logit contributions
rig-0.010
 rather-0.010
 annoying-0.009
 stump-0.009
 probably-0.009
Token sour
Token ID11348
Feature activation+1.71e-01

Pos logit contributions
 rather+0.170
rig+0.166
 stump+0.162
 wire+0.149
 target+0.146
Neg logit contributions
ged-0.260
ed-0.236
iked-0.224
,-0.221
 recess-0.217
Token it
Token ID340
Feature activation-2.59e-02

Pos logit contributions
rig+0.337
 rather+0.332
 annoying+0.326
 stump+0.321
 wire+0.300
Neg logit contributions
ged-0.246
ed-0.213
,-0.205
iked-0.199
ies-0.195
Token made
Token ID925
Feature activation-4.68e-02

Pos logit contributions
ged+0.040
ed+0.034
ies+0.030
or+0.029
,+0.029
Neg logit contributions
rig-0.048
 rather-0.047
 stump-0.046
 annoying-0.043
 wire-0.042
Token her
Token ID607
Feature activation-8.73e-02

Pos logit contributions
ged+0.038
 recess+0.029
iked+0.027
or+0.025
ed+0.024
Neg logit contributions
 rather-0.122
rig-0.120
 annoying-0.117
 stump-0.117
 tre-0.110
Token face
Token ID1986
Feature activation+1.01e-01

Pos logit contributions
ged+0.132
ies+0.113
,+0.108
ed+0.108
iked+0.102
Neg logit contributions
rig-0.148
 annoying-0.147
 rather-0.141
 stump-0.139
pill-0.139
Token scr
Token ID6040
Feature activation+2.75e-01

Pos logit contributions
rig+0.161
 rather+0.147
 annoying+0.145
 stump+0.143
 wire+0.140
Neg logit contributions
ged-0.183
ed-0.156
ies-0.155
iked-0.148
,-0.147
Tokenunch
Token ID3316
Feature activation+4.80e-02

Pos logit contributions
 annoying+0.497
 wire+0.484
 rather+0.475
 stump+0.472
 probably+0.467
Neg logit contributions
iked-0.379
ged-0.371
or-0.334
ipped-0.327
bled-0.300
Token up
Token ID510
Feature activation-4.77e-02

Pos logit contributions
 rather+0.109
rig+0.107
 annoying+0.106
 stump+0.102
 probably+0.098
Neg logit contributions
ged-0.057
iked-0.044
or-0.043
 recess-0.042
ed-0.041
Token!
Token ID0
Feature activation+4.16e-02

Pos logit contributions
ged+0.068
iked+0.059
 recess+0.059
or+0.055
ed+0.051
Neg logit contributions
 rather-0.094
 annoying-0.091
rig-0.091
 stump-0.087
 probably-0.084
Token
Token ID220
Feature activation-1.32e-01

Pos logit contributions
 rather+0.081
rig+0.078
 annoying+0.077
 stump+0.075
 probably+0.072
Neg logit contributions
ged-0.062
 recess-0.053
iked-0.053
or-0.050
ies-0.046
Token
Token ID198
Feature activation-1.02e-02

Pos logit contributions
ged+0.243
 recess+0.227
iked+0.216
or+0.202
 roles+0.199
Neg logit contributions
 rather-0.209
rig-0.208
 stump-0.191
 annoying-0.188
 probably-0.186
Token said
Token ID531
Feature activation-1.33e-01

Pos logit contributions
 rather+0.339
 annoying+0.328
rig+0.327
 stump+0.316
 probably+0.306
Neg logit contributions
ged-0.202
iked-0.170
 recess-0.170
or-0.159
ed-0.143
Token,
Token ID11
Feature activation-1.57e-01

Pos logit contributions
 recess+0.222
 forth+0.182
iked+0.170
ged+0.169
or+0.166
Neg logit contributions
 rather-0.229
 sharp-0.198
 awful-0.196
 probably-0.195
 web-0.183
Token "
Token ID366
Feature activation+3.53e-02

Pos logit contributions
ged+0.246
 recess+0.230
iked+0.222
or+0.217
 roles+0.197
Neg logit contributions
 rather-0.286
 annoying-0.274
rig-0.266
 stump-0.262
 probably-0.255
TokenC
Token ID34
Feature activation-1.58e-02

Pos logit contributions
 rather+0.069
rig+0.067
 stump+0.062
 probably+0.058
 awful+0.058
Neg logit contributions
ged-0.055
ed-0.044
rav-0.042
 recess-0.041
olph-0.039
Tokenater
Token ID729
Feature activation+1.20e-01

Pos logit contributions
ged+0.026
or+0.024
 recess+0.023
iked+0.023
ipped+0.022
Neg logit contributions
 rather-0.028
 annoying-0.028
rig-0.026
 stump-0.025
 probably-0.025
Tokenpill
Token ID27215
Feature activation+2.75e-01

Pos logit contributions
 rather+0.188
rig+0.176
 annoying+0.175
 stump+0.162
 probably+0.155
Neg logit contributions
ged-0.231
or-0.201
ed-0.200
iked-0.198
 recess-0.195
Tokenars
Token ID945
Feature activation-4.30e-02

Pos logit contributions
 rather+0.539
rig+0.537
 annoying+0.527
 stump+0.514
 probably+0.487
Neg logit contributions
ged-0.378
 recess-0.312
or-0.308
iked-0.298
ed-0.290
Token are
Token ID389
Feature activation+2.18e-02

Pos logit contributions
ged+0.057
iked+0.048
 recess+0.048
or+0.045
ed+0.041
Neg logit contributions
 rather-0.090
 annoying-0.087
rig-0.086
 stump-0.083
 probably-0.081
Token not
Token ID407
Feature activation+1.49e-01

Pos logit contributions
rig+0.035
 rather+0.032
 stump+0.032
umbling+0.030
 annoying+0.030
Neg logit contributions
ged-0.037
 recess-0.033
iked-0.031
ed-0.030
,-0.030
Token meant
Token ID4001
Feature activation-5.48e-02

Pos logit contributions
 rather+0.211
rig+0.202
 annoying+0.200
 stump+0.190
 probably+0.179
Neg logit contributions
ged-0.297
 recess-0.266
iked-0.265
or-0.254
ed-0.242
Token to
Token ID284
Feature activation+2.68e-02

Pos logit contributions
ged+0.080
iked+0.069
 recess+0.069
or+0.065
ed+0.060
Neg logit contributions
 rather-0.107
 annoying-0.103
rig-0.103
 stump-0.099
 probably-0.095
Token he
Token ID339
Feature activation-4.56e-02

Pos logit contributions
 rather+0.213
 annoying+0.208
 stump+0.203
 probably+0.190
 tre+0.186
Neg logit contributions
iked-0.189
 recess-0.186
ged-0.186
or-0.179
rav-0.164
Token noticed
Token ID6810
Feature activation-1.36e-01

Pos logit contributions
ged+0.063
iked+0.054
 recess+0.053
or+0.051
ed+0.047
Neg logit contributions
 rather-0.092
rig-0.089
 annoying-0.089
 stump-0.086
 probably-0.082
Token something
Token ID1223
Feature activation+5.05e-02

Pos logit contributions
ged+0.217
 recess+0.179
iked+0.177
ed+0.173
or+0.168
Neg logit contributions
rig-0.251
 rather-0.245
 annoying-0.232
 stump-0.225
 likely-0.214
Token shiny
Token ID22441
Feature activation+1.00e-01

Pos logit contributions
 rather+0.061
rig+0.060
 stump+0.056
 annoying+0.054
 former+0.054
Neg logit contributions
ged-0.109
iked-0.089
or-0.088
,-0.088
ed-0.088
Token wed
Token ID29445
Feature activation+2.20e-02

Pos logit contributions
rig+0.238
 rather+0.230
 annoying+0.225
 stump+0.214
pill+0.208
Neg logit contributions
ged-0.118
ed-0.086
or-0.076
iked-0.076
,-0.075
Tokenged
Token ID2004
Feature activation-3.54e-01

Pos logit contributions
 rather+0.060
 stump+0.057
 probably+0.056
 annoying+0.055
rig+0.054
Neg logit contributions
 recess-0.010
iked-0.008
rav-0.007
 roles-0.006
 classmates-0.006
Token between
Token ID1022
Feature activation-9.16e-02

Pos logit contributions
ged+0.464
iked+0.385
 recess+0.378
or+0.362
ed+0.334
Neg logit contributions
 rather-0.749
rig-0.733
 annoying-0.731
 stump-0.701
 probably-0.675
Token some
Token ID617
Feature activation-2.10e-02

Pos logit contributions
ged+0.103
 recess+0.080
iked+0.079
or+0.069
ed+0.066
Neg logit contributions
rig-0.219
 rather-0.217
 annoying-0.208
 stump-0.201
 probably-0.197
Token branches
Token ID13737
Feature activation+9.96e-02

Pos logit contributions
ged+0.042
 recess+0.035
iked+0.034
or+0.034
ed+0.034
Neg logit contributions
rig-0.031
 rather-0.030
 annoying-0.028
 probably-0.027
 stump-0.026
Token.
Token ID13
Feature activation+1.26e-02

Pos logit contributions
 annoying+0.235
rig+0.234
 rather+0.227
 former+0.221
 fullest+0.212
Neg logit contributions
ged-0.115
,-0.082
or-0.079
iked-0.078
ed-0.075
Token It
Token ID632
Feature activation+1.43e-01

Pos logit contributions
 rather+0.020
 annoying+0.019
rig+0.019
 stump+0.018
 probably+0.017
Neg logit contributions
ged-0.023
iked-0.021
 recess-0.020
or-0.020
ed-0.019
Token soft
Token ID2705
Feature activation+1.80e-01

Pos logit contributions
 rather+0.350
rig+0.346
 annoying+0.334
 stump+0.320
 probably+0.311
Neg logit contributions
ged-0.291
 recess-0.257
iked-0.242
or-0.236
ies-0.223
Token swing
Token ID9628
Feature activation+5.55e-02

Pos logit contributions
 rather+0.303
rig+0.296
 annoying+0.287
 stump+0.261
 awful+0.257
Neg logit contributions
ged-0.312
 recess-0.270
iked-0.262
or-0.260
ies-0.254
Token.
Token ID13
Feature activation+2.10e-02

Pos logit contributions
 rather+0.110
rig+0.109
 annoying+0.106
 stump+0.101
 probably+0.100
Neg logit contributions
ged-0.075
or-0.061
 recess-0.060
,-0.059
ed-0.058
Token He
Token ID679
Feature activation+9.79e-02

Pos logit contributions
rig+0.032
 annoying+0.032
 rather+0.032
 intense+0.029
 awful+0.029
Neg logit contributions
ged-0.035
 recess-0.032
iked-0.031
or-0.031
 Lee-0.029
Token tug
Token ID27762
Feature activation-2.29e-02

Pos logit contributions
rig+0.158
 rather+0.155
 annoying+0.149
 stump+0.145
 intense+0.140
Neg logit contributions
ged-0.163
ed-0.142
or-0.140
,-0.135
iked-0.130
Tokenged
Token ID2004
Feature activation-3.52e-01

Pos logit contributions
iked+0.023
 recess+0.023
ged+0.019
or+0.017
rav+0.016
Neg logit contributions
 annoying-0.051
 rather-0.051
rig-0.046
 sharp-0.046
 intense-0.046
Token on
Token ID319
Feature activation-1.57e-01

Pos logit contributions
ged+0.507
 recess+0.456
iked+0.455
or+0.412
 roles+0.380
Neg logit contributions
 rather-0.684
 annoying-0.656
rig-0.645
 stump-0.619
 probably-0.599
Token his
Token ID465
Feature activation-8.51e-02

Pos logit contributions
ged+0.204
or+0.165
 recess+0.165
iked+0.161
 roles+0.142
Neg logit contributions
 rather-0.331
rig-0.328
 annoying-0.316
 stump-0.311
 probably-0.295
Token mom
Token ID1995
Feature activation-6.91e-02

Pos logit contributions
ged+0.127
 recess+0.109
iked+0.109
or+0.101
ed+0.096
Neg logit contributions
 rather-0.161
rig-0.156
 annoying-0.154
 stump-0.150
 probably-0.144
Token's
Token ID338
Feature activation+7.01e-02

Pos logit contributions
ged+0.121
iked+0.105
 recess+0.104
or+0.102
ed+0.096
Neg logit contributions
 rather-0.114
 annoying-0.110
rig-0.110
 stump-0.104
 probably-0.100
Token hand
Token ID1021
Feature activation+2.84e-02

Pos logit contributions
 rather+0.124
 annoying+0.124
rig+0.123
 stump+0.113
 intense+0.112
Neg logit contributions
ged-0.110
iked-0.094
 recess-0.092
or-0.091
ed-0.085
Token
Token ID26391
Feature activation-1.64e-01

Pos logit contributions
 rather+0.139
 annoying+0.134
rig+0.134
 stump+0.128
 probably+0.123
Neg logit contributions
ged-0.119
iked-0.104
 recess-0.104
or-0.098
ed-0.091
Token The
Token ID383
Feature activation+7.18e-02

Pos logit contributions
ged+0.264
 recess+0.231
iked+0.231
or+0.219
ed+0.202
Neg logit contributions
 rather-0.298
rig-0.285
 annoying-0.285
 stump-0.275
 probably-0.263
Token hero
Token ID4293
Feature activation+4.17e-02

Pos logit contributions
 rather+0.112
rig+0.108
 annoying+0.101
 spe+0.092
 intense+0.091
Neg logit contributions
ged-0.121
 recess-0.113
ed-0.104
or-0.104
iked-0.104
Token carefully
Token ID7773
Feature activation+5.57e-03

Pos logit contributions
 rather+0.076
 annoying+0.074
rig+0.073
 stump+0.070
 awful+0.066
Neg logit contributions
ged-0.065
or-0.055
iked-0.054
ed-0.054
 recess-0.053
Token tug
Token ID27762
Feature activation-2.10e-02

Pos logit contributions
 rather+0.010
 annoying+0.010
rig+0.010
 stump+0.009
 awful+0.009
Neg logit contributions
ged-0.009
iked-0.008
or-0.007
 recess-0.007
ed-0.007
Tokenged
Token ID2004
Feature activation-3.51e-01

Pos logit contributions
iked+0.022
 recess+0.021
ged+0.018
rav+0.016
 roles+0.015
Neg logit contributions
 rather-0.045
 annoying-0.045
rig-0.042
 awful-0.040
 electric-0.040
Token and
Token ID290
Feature activation+1.19e-01

Pos logit contributions
ged+0.540
iked+0.467
 recess+0.466
or+0.443
ed+0.411
Neg logit contributions
 rather-0.657
 annoying-0.634
rig-0.633
 stump-0.608
 probably-0.583
Token pulled
Token ID5954
Feature activation-1.31e-01

Pos logit contributions
 rather+0.228
rig+0.223
 annoying+0.220
 stump+0.211
 probably+0.201
Neg logit contributions
ged-0.182
iked-0.153
 recess-0.152
or-0.146
ed-0.139
Token until
Token ID1566
Feature activation-1.99e-01

Pos logit contributions
ged+0.182
iked+0.157
 recess+0.156
or+0.146
ed+0.134
Neg logit contributions
 rather-0.267
 annoying-0.257
rig-0.257
 stump-0.248
 probably-0.239
Token the
Token ID262
Feature activation-3.81e-02

Pos logit contributions
ged+0.284
iked+0.245
 recess+0.241
or+0.229
ies+0.210
Neg logit contributions
 rather-0.392
 annoying-0.376
rig-0.367
 stump-0.362
 probably-0.353
Token rabbit
Token ID22746
Feature activation+3.52e-02

Pos logit contributions
ged+0.069
 recess+0.066
iked+0.062
or+0.060
ed+0.060
Neg logit contributions
 rather-0.059
rig-0.056
 annoying-0.052
 probably-0.047
 stump-0.047
Token grabbed
Token ID13176
Feature activation-8.29e-02

Pos logit contributions
 rather+0.227
 annoying+0.223
rig+0.220
 stump+0.216
 awful+0.207
Neg logit contributions
ged-0.133
iked-0.105
ed-0.098
or-0.097
 recess-0.092
Token the
Token ID262
Feature activation-3.23e-02

Pos logit contributions
ged+0.132
iked+0.121
 recess+0.119
or+0.112
rav+0.106
Neg logit contributions
 rather-0.145
 annoying-0.140
rig-0.137
 stump-0.133
 probably-0.128
Token hook
Token ID8011
Feature activation+1.55e-01

Pos logit contributions
ged+0.059
ed+0.049
or+0.048
 recess+0.048
iked+0.047
Neg logit contributions
 rather-0.052
rig-0.050
 annoying-0.047
 likely-0.042
pill-0.042
Token and
Token ID290
Feature activation+1.23e-01

Pos logit contributions
 annoying+0.304
rig+0.285
 rather+0.284
pill+0.265
 probably+0.263
Neg logit contributions
ged-0.242
iked-0.204
or-0.201
ed-0.198
,-0.197
Token tug
Token ID27762
Feature activation-1.87e-02

Pos logit contributions
 rather+0.245
rig+0.236
 annoying+0.236
 stump+0.227
 probably+0.218
Neg logit contributions
ged-0.177
iked-0.151
 recess-0.149
or-0.141
ed-0.133
Tokenged
Token ID2004
Feature activation-3.48e-01

Pos logit contributions
 recess+0.010
iked+0.009
rav+0.007
riage+0.007
ail+0.006
Neg logit contributions
 rather-0.048
 annoying-0.046
 quite-0.045
 intense-0.045
 sharp-0.045
Token it
Token ID340
Feature activation-2.73e-02

Pos logit contributions
ged+0.490
 recess+0.447
iked+0.440
or+0.401
 roles+0.373
Neg logit contributions
 rather-0.688
 annoying-0.652
rig-0.651
 stump-0.627
 probably-0.604
Token.
Token ID13
Feature activation+2.60e-02

Pos logit contributions
ged+0.039
iked+0.034
 recess+0.034
or+0.032
ed+0.029
Neg logit contributions
 rather-0.054
 annoying-0.052
rig-0.052
 stump-0.050
 probably-0.048
Token Suddenly
Token ID24975
Feature activation-6.91e-02

Pos logit contributions
 rather+0.041
 annoying+0.041
rig+0.040
 stump+0.038
 probably+0.036
Neg logit contributions
ged-0.047
iked-0.043
 recess-0.041
or-0.040
ed-0.039
Token,
Token ID11
Feature activation-1.93e-01

Pos logit contributions
ged+0.082
 recess+0.073
iked+0.071
or+0.065
rav+0.065
Neg logit contributions
 rather-0.142
 annoying-0.134
 probably-0.131
 sharp-0.131
 stump-0.130
Token an
Token ID281
Feature activation+7.53e-02

Pos logit contributions
ged+0.266
 recess+0.227
iked+0.226
or+0.212
ed+0.196
Neg logit contributions
 rather-0.396
rig-0.384
 annoying-0.382
 stump-0.368
 probably-0.354
Token
Token ID220
Feature activation-1.44e-01

Pos logit contributions
 rather+0.071
 annoying+0.069
rig+0.069
 stump+0.066
 probably+0.064
Neg logit contributions
ged-0.047
iked-0.040
 recess-0.040
or-0.037
ed-0.034
Token
Token ID198
Feature activation-2.63e-02

Pos logit contributions
ged+0.257
 recess+0.237
iked+0.227
or+0.209
 roles+0.206
Neg logit contributions
 rather-0.236
rig-0.233
 stump-0.218
 annoying-0.217
 probably-0.210
Token
Token ID198
Feature activation-2.48e-02

Pos logit contributions
ged+0.045
 recess+0.040
iked+0.040
or+0.036
 roles+0.036
Neg logit contributions
 rather-0.045
rig-0.044
 annoying-0.042
 stump-0.042
 probably-0.039
TokenTom
Token ID13787
Feature activation-1.20e-01

Pos logit contributions
ged+0.035
iked+0.030
 recess+0.029
or+0.028
ed+0.026
Neg logit contributions
 rather-0.050
 annoying-0.048
rig-0.048
 stump-0.046
 probably-0.045
Token tug
Token ID27762
Feature activation-2.09e-02

Pos logit contributions
ged+0.173
iked+0.144
 recess+0.136
or+0.135
ed+0.132
Neg logit contributions
 rather-0.240
 annoying-0.230
rig-0.230
 stump-0.218
 probably-0.210
Tokenged
Token ID2004
Feature activation-3.48e-01

Pos logit contributions
 recess+0.011
iked+0.010
rav+0.007
riage+0.007
ail+0.007
Neg logit contributions
 annoying-0.055
 rather-0.054
 intense-0.053
 sharp-0.052
 quite-0.051
Token at
Token ID379
Feature activation-1.20e-01

Pos logit contributions
ged+0.542
 recess+0.480
iked+0.479
or+0.449
ed+0.415
Neg logit contributions
 rather-0.644
 annoying-0.618
rig-0.616
 stump-0.591
 probably-0.567
Token the
Token ID262
Feature activation-3.29e-02

Pos logit contributions
ged+0.138
 recess+0.111
iked+0.105
or+0.102
ed+0.090
Neg logit contributions
 rather-0.275
rig-0.270
 annoying-0.267
 stump-0.254
 probably-0.247
Token knob
Token ID36842
Feature activation+5.49e-02

Pos logit contributions
ged+0.050
or+0.040
iked+0.039
ed+0.039
,+0.038
Neg logit contributions
rig-0.062
 rather-0.062
 annoying-0.057
pill-0.056
 stump-0.052
Token and
Token ID290
Feature activation+1.22e-01

Pos logit contributions
rig+0.128
 annoying+0.124
 rather+0.122
 likely+0.114
pill+0.114
Neg logit contributions
ged-0.065
iked-0.050
,-0.047
or-0.047
ed-0.046
Token pulled
Token ID5954
Feature activation-1.31e-01

Pos logit contributions
 rather+0.231
 annoying+0.224
rig+0.219
 stump+0.215
 probably+0.207
Neg logit contributions
ged-0.180
 recess-0.163
iked-0.162
or-0.153
 roles-0.138
Token so
Token ID523
Feature activation+1.31e-01

Pos logit contributions
ged+0.042
 recess+0.036
iked+0.036
or+0.033
ed+0.030
Neg logit contributions
 rather-0.067
 annoying-0.064
rig-0.064
 stump-0.062
 probably-0.060
Token badly
Token ID11234
Feature activation+1.27e-01

Pos logit contributions
rig+0.273
 rather+0.268
 annoying+0.264
 stump+0.262
 probably+0.245
Neg logit contributions
ged-0.174
iked-0.142
 recess-0.139
or-0.136
ed-0.125
Token that
Token ID326
Feature activation+5.77e-02

Pos logit contributions
 rather+0.320
 annoying+0.315
rig+0.315
 stump+0.304
 probably+0.294
Neg logit contributions
ged-0.115
iked-0.087
 recess-0.086
or-0.079
ed-0.068
Token he
Token ID339
Feature activation-4.35e-02

Pos logit contributions
 rather+0.122
rig+0.121
 annoying+0.118
 stump+0.115
 probably+0.110
Neg logit contributions
ged-0.074
 recess-0.063
iked-0.061
or-0.058
ed-0.053
Token lun
Token ID14678
Feature activation+2.24e-02

Pos logit contributions
ged+0.061
iked+0.050
or+0.049
ed+0.048
 recess+0.046
Neg logit contributions
 rather-0.087
rig-0.086
 annoying-0.083
 stump-0.082
 wire-0.076
Tokenged
Token ID2004
Feature activation-3.47e-01

Pos logit contributions
 rather+0.050
 probably+0.047
 quite+0.046
 stump+0.046
 sharp+0.045
Neg logit contributions
 recess-0.019
 classmates-0.016
iked-0.015
 roles-0.013
rav-0.013
Token forward
Token ID2651
Feature activation-1.62e-01

Pos logit contributions
ged+0.554
 recess+0.494
iked+0.492
or+0.461
ed+0.428
Neg logit contributions
 rather-0.627
 annoying-0.599
rig-0.594
 stump-0.570
 probably-0.549
Token and
Token ID290
Feature activation+1.21e-01

Pos logit contributions
ged+0.290
 recess+0.274
iked+0.263
or+0.250
rav+0.238
Neg logit contributions
 rather-0.257
rig-0.236
 stump-0.233
 annoying-0.233
 probably-0.217
Token slur
Token ID40066
Feature activation+3.79e-02

Pos logit contributions
 rather+0.210
rig+0.205
 annoying+0.202
 stump+0.195
 probably+0.184
Neg logit contributions
ged-0.203
 recess-0.176
iked-0.176
or-0.167
ed-0.158
Tokenped
Token ID9124
Feature activation-1.96e-01

Pos logit contributions
 rather+0.086
 annoying+0.084
rig+0.084
 stump+0.081
 probably+0.079
Neg logit contributions
ged-0.042
 recess-0.035
iked-0.035
or-0.032
 roles-0.028
Token it
Token ID340
Feature activation-3.17e-02

Pos logit contributions
ged+0.278
iked+0.227
 recess+0.221
or+0.215
ed+0.208
Neg logit contributions
 rather-0.394
rig-0.392
 annoying-0.387
 stump-0.372
 probably-0.356
Token the
Token ID262
Feature activation-2.72e-02

Pos logit contributions
ged+0.142
 recess+0.121
or+0.099
iked+0.094
,+0.091
Neg logit contributions
rig-0.393
 annoying-0.378
 rather-0.373
 stump-0.350
pill-0.342
Token counter
Token ID3753
Feature activation+6.44e-02

Pos logit contributions
ged+0.046
 recess+0.043
or+0.040
 roles+0.039
iked+0.038
Neg logit contributions
 rather-0.044
rig-0.043
 annoying-0.041
 likely-0.040
 probably-0.037
Token.
Token ID13
Feature activation+2.52e-02

Pos logit contributions
 annoying+0.137
rig+0.135
 rather+0.133
 stump+0.124
 probably+0.122
Neg logit contributions
ged-0.083
iked-0.072
or-0.067
 recess-0.067
ed-0.065
Token She
Token ID1375
Feature activation+1.27e-01

Pos logit contributions
 rather+0.040
 annoying+0.039
rig+0.039
 stump+0.036
 probably+0.035
Neg logit contributions
ged-0.045
iked-0.041
 recess-0.040
or-0.038
ed-0.037
Token tug
Token ID27762
Feature activation-1.74e-02

Pos logit contributions
 rather+0.258
rig+0.257
 annoying+0.257
 stump+0.242
 awful+0.232
Neg logit contributions
ged-0.175
iked-0.143
ed-0.136
or-0.134
 recess-0.131
Tokenged
Token ID2004
Feature activation-3.47e-01

Pos logit contributions
iked+0.010
 recess+0.009
riage+0.007
ail+0.006
rav+0.006
Neg logit contributions
 rather-0.044
 annoying-0.044
 intense-0.043
 sharp-0.042
 quite-0.042
Token on
Token ID319
Feature activation-1.49e-01

Pos logit contributions
ged+0.549
 recess+0.514
iked+0.512
or+0.462
 roles+0.439
Neg logit contributions
 rather-0.619
 annoying-0.580
rig-0.570
 stump-0.548
 probably-0.530
Token it
Token ID340
Feature activation-2.48e-02

Pos logit contributions
ged+0.173
 recess+0.134
iked+0.132
or+0.127
ed+0.117
Neg logit contributions
rig-0.342
 rather-0.338
 annoying-0.332
 stump-0.316
 probably-0.303
Token,
Token ID11
Feature activation-1.83e-01

Pos logit contributions
ged+0.028
iked+0.022
or+0.022
ed+0.021
 recess+0.020
Neg logit contributions
 annoying-0.056
rig-0.055
 rather-0.055
 stump-0.051
 probably-0.051
Token to
Token ID284
Feature activation+1.01e-02

Pos logit contributions
ged+0.311
ed+0.225
iked+0.225
,+0.221
 recess+0.205
Neg logit contributions
rig-0.340
 rather-0.312
 annoying-0.312
 stump-0.305
pill-0.296
Token see
Token ID766
Feature activation-7.99e-02

Pos logit contributions
rig+0.021
 rather+0.021
 annoying+0.020
 stump+0.019
pill+0.019
Neg logit contributions
ged-0.013
 recess-0.012
iked-0.010
or-0.009
 later-0.009
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
ged+0.113
iked+0.096
or+0.093
 recess+0.092
ed+0.086
Neg logit contributions
 annoying-0.169
 rather-0.169
rig-0.166
 stump-0.156
 probably-0.153
Token She
Token ID1375
Feature activation+1.27e-01

Pos logit contributions
 rather+0.033
rig+0.031
 annoying+0.031
 stump+0.029
 probably+0.028
Neg logit contributions
ged-0.047
 recess-0.043
iked-0.042
or-0.040
 roles-0.038
Token pulled
Token ID5954
Feature activation-1.28e-01

Pos logit contributions
 rather+0.251
 annoying+0.246
rig+0.244
 stump+0.228
 awful+0.223
Neg logit contributions
ged-0.189
iked-0.155
or-0.151
ed-0.148
 recess-0.146
Token and
Token ID290
Feature activation+1.25e-01

Pos logit contributions
ged+0.190
iked+0.173
 recess+0.169
or+0.157
 roles+0.143
Neg logit contributions
 rather-0.244
 annoying-0.233
rig-0.229
 stump-0.223
 probably-0.215
Token tug
Token ID27762
Feature activation-1.89e-02

Pos logit contributions
 rather+0.200
 annoying+0.192
 stump+0.188
rig+0.186
 wire+0.175
Neg logit contributions
ged-0.206
iked-0.199
 recess-0.198
or-0.189
rav-0.180
Tokenged
Token ID2004
Feature activation-3.47e-01

Pos logit contributions
iked+0.011
 recess+0.010
rav+0.008
riage+0.007
 roles+0.006
Neg logit contributions
 rather-0.050
 annoying-0.048
 sharp-0.046
 intense-0.046
 quite-0.046
Token until
Token ID1566
Feature activation-1.89e-01

Pos logit contributions
ged+0.517
 recess+0.472
iked+0.470
or+0.430
 roles+0.402
Neg logit contributions
 rather-0.660
 annoying-0.622
rig-0.615
 stump-0.595
 probably-0.573
Token suddenly
Token ID6451
Feature activation-6.45e-02

Pos logit contributions
ged+0.309
iked+0.300
rav+0.283
or+0.282
ies+0.273
Neg logit contributions
 rather-0.313
 annoying-0.281
 probably-0.274
 awful-0.268
 stump-0.267
Token the
Token ID262
Feature activation-2.92e-02

Pos logit contributions
ged+0.088
iked+0.080
or+0.076
 recess+0.072
rav+0.070
Neg logit contributions
 rather-0.126
 annoying-0.120
 probably-0.117
 stump-0.116
 web-0.116
Token zipper
Token ID48992
Feature activation+5.36e-02

Pos logit contributions
ged+0.055
 recess+0.049
iked+0.049
or+0.047
ed+0.044
Neg logit contributions
 rather-0.046
rig-0.044
 annoying-0.043
 stump-0.041
 probably-0.039
Token broke
Token ID6265
Feature activation-8.35e-02

Pos logit contributions
 rather+0.095
 annoying+0.093
 stump+0.091
rig+0.090
 probably+0.088
Neg logit contributions
ged-0.080
 recess-0.074
iked-0.072
or-0.067
rav-0.063
Token was
Token ID373
Feature activation+5.23e-02

Pos logit contributions
 rather+0.208
rig+0.205
 annoying+0.202
 stump+0.193
 probably+0.181
Neg logit contributions
ged-0.160
iked-0.135
or-0.130
 recess-0.127
ed-0.123
Token calm
Token ID9480
Feature activation+1.02e-01

Pos logit contributions
rig+0.076
pill+0.070
 stump+0.068
 rather+0.067
 annoying+0.067
Neg logit contributions
ged-0.098
 recess-0.083
iked-0.082
,-0.080
ed-0.079
Token and
Token ID290
Feature activation+1.24e-01

Pos logit contributions
rig+0.179
 annoying+0.176
 rather+0.176
 stump+0.169
 wire+0.159
Neg logit contributions
ged-0.170
iked-0.146
ed-0.140
 recess-0.138
or-0.136
Token he
Token ID339
Feature activation-3.84e-02

Pos logit contributions
 rather+0.196
rig+0.195
 stump+0.188
 annoying+0.181
pill+0.170
Neg logit contributions
ged-0.219
iked-0.187
 recess-0.183
ed-0.174
ies-0.171
Token nud
Token ID26731
Feature activation-2.49e-02

Pos logit contributions
ged+0.048
iked+0.038
ed+0.038
or+0.035
ies+0.034
Neg logit contributions
 rather-0.080
rig-0.079
 annoying-0.078
 stump-0.078
 awful-0.073
Tokenged
Token ID2004
Feature activation-3.46e-01

Pos logit contributions
 recess+0.017
 roles+0.011
rav+0.010
iked+0.009
 classmates+0.008
Neg logit contributions
 rather-0.061
 stump-0.058
 probably-0.058
 wire-0.056
 quite-0.055
Token Meg
Token ID8336
Feature activation-1.70e-01

Pos logit contributions
ged+0.507
 recess+0.451
iked+0.448
or+0.410
 roles+0.383
Neg logit contributions
 rather-0.667
 annoying-0.640
rig-0.636
 stump-0.609
 probably-0.592
Token's
Token ID338
Feature activation+7.53e-02

Pos logit contributions
ged+0.280
iked+0.243
 recess+0.237
or+0.233
ed+0.220
Neg logit contributions
 rather-0.297
 annoying-0.292
rig-0.287
 stump-0.272
 probably-0.263
Token arm
Token ID3211
Feature activation+1.20e-01

Pos logit contributions
 annoying+0.150
 rather+0.147
rig+0.146
pill+0.140
 stump+0.132
Neg logit contributions
ged-0.099
iked-0.084
or-0.080
ed-0.079
 recess-0.078
Token reassuring
Token ID36450
Feature activation-2.22e-02

Pos logit contributions
rig+0.272
 annoying+0.270
 rather+0.260
 probably+0.253
 likely+0.248
Neg logit contributions
ged-0.132
ed-0.108
or-0.105
,-0.100
iked-0.096
Tokenly
Token ID306
Feature activation-9.35e-02

Pos logit contributions
ged+0.033
 recess+0.029
iked+0.029
or+0.027
ed+0.025
Neg logit contributions
 rather-0.043
 annoying-0.041
rig-0.041
 stump-0.039
 probably-0.038
Token Suddenly
Token ID24975
Feature activation-7.50e-02

Pos logit contributions
 rather+0.049
 annoying+0.047
rig+0.046
 stump+0.044
 web+0.042
Neg logit contributions
ged-0.040
iked-0.037
 recess-0.034
ed-0.033
or-0.032
Token,
Token ID11
Feature activation-2.00e-01

Pos logit contributions
ged+0.089
 recess+0.084
iked+0.083
or+0.081
rav+0.076
Neg logit contributions
 rather-0.151
 wire-0.147
 annoying-0.146
 sharp-0.144
 smoke-0.142
Token the
Token ID262
Feature activation-3.65e-02

Pos logit contributions
ged+0.249
 recess+0.208
iked+0.203
or+0.185
ed+0.175
Neg logit contributions
 rather-0.440
rig-0.437
 annoying-0.423
 stump-0.409
 probably-0.394
Token dog
Token ID3290
Feature activation+7.48e-02

Pos logit contributions
ged+0.046
 recess+0.042
ed+0.037
iked+0.034
 roles+0.033
Neg logit contributions
rig-0.077
 rather-0.076
 likely-0.068
 unlike-0.068
 probably-0.067
Token lun
Token ID14678
Feature activation+2.41e-02

Pos logit contributions
 rather+0.157
rig+0.154
 annoying+0.149
 stump+0.146
 probably+0.138
Neg logit contributions
ged-0.102
iked-0.082
 recess-0.079
or-0.076
ed-0.075
Tokenged
Token ID2004
Feature activation-3.46e-01

Pos logit contributions
 rather+0.054
 annoying+0.052
 electric+0.051
 stump+0.050
 sharp+0.050
Neg logit contributions
 recess-0.018
 classmates-0.017
iked-0.015
kins-0.015
packs-0.014
Token at
Token ID379
Feature activation-1.18e-01

Pos logit contributions
ged+0.439
iked+0.362
 recess+0.358
or+0.338
ed+0.312
Neg logit contributions
 rather-0.748
rig-0.728
 annoying-0.725
 stump-0.703
 probably-0.677
Token Tim
Token ID5045
Feature activation-1.69e-01

Pos logit contributions
ged+0.150
iked+0.131
 recess+0.127
or+0.121
ed+0.108
Neg logit contributions
 rather-0.251
 annoying-0.243
rig-0.239
 stump-0.234
 probably-0.228
Tokenmy
Token ID1820
Feature activation-1.30e-01

Pos logit contributions
ged+0.213
iked+0.179
 recess+0.179
or+0.166
ed+0.151
Neg logit contributions
 rather-0.364
 annoying-0.352
rig-0.351
 stump-0.339
 probably-0.328
Token and
Token ID290
Feature activation+1.16e-01

Pos logit contributions
ged+0.198
,+0.161
or+0.161
iked+0.157
ed+0.149
Neg logit contributions
 annoying-0.248
rig-0.241
 rather-0.240
 tre-0.219
 intense-0.219
Token bit
Token ID1643
Feature activation+1.49e-01

Pos logit contributions
 rather+0.227
 annoying+0.221
rig+0.216
 stump+0.211
 probably+0.204
Neg logit contributions
ged-0.167
 recess-0.147
iked-0.147
or-0.139
ed-0.124
Token it
Token ID340
Feature activation-3.29e-02

Pos logit contributions
rig+0.020
 rather+0.019
 annoying+0.018
 stump+0.018
 web+0.017
Neg logit contributions
ged-0.014
 recess-0.013
iked-0.012
ed-0.011
 roles-0.010
Token moved
Token ID3888
Feature activation+6.31e-04

Pos logit contributions
ged+0.036
ed+0.031
,+0.028
iked+0.025
ies+0.024
Neg logit contributions
rig-0.073
 rather-0.072
 annoying-0.069
 stump-0.066
 former-0.064
Token.
Token ID13
Feature activation+1.76e-02

Pos logit contributions
 rather+0.001
 annoying+0.001
rig+0.001
 stump+0.001
 probably+0.001
Neg logit contributions
ged-0.001
iked-0.001
or-0.001
 recess-0.001
ed-0.001
Token He
Token ID679
Feature activation+1.04e-01

Pos logit contributions
 annoying+0.032
 rather+0.032
rig+0.030
 stump+0.029
 awful+0.028
Neg logit contributions
ged-0.028
iked-0.025
 recess-0.023
or-0.023
ed-0.022
Token tug
Token ID27762
Feature activation-1.72e-02

Pos logit contributions
 rather+0.218
 annoying+0.217
rig+0.215
 stump+0.209
 awful+0.204
Neg logit contributions
ged-0.135
iked-0.103
ed-0.102
or-0.098
ies-0.090
Tokenged
Token ID2004
Feature activation-3.45e-01

Pos logit contributions
 recess+0.007
iked+0.006
riage+0.003
rav+0.003
 roles+0.003
Neg logit contributions
 rather-0.047
 annoying-0.046
 intense-0.045
 sharp-0.043
 quite-0.043
Token harder
Token ID7069
Feature activation-7.95e-03

Pos logit contributions
ged+0.494
 recess+0.448
iked+0.445
or+0.404
 roles+0.377
Neg logit contributions
 rather-0.674
 annoying-0.641
rig-0.637
 stump-0.611
 probably-0.588
Token and
Token ID290
Feature activation+1.22e-01

Pos logit contributions
ged+0.014
iked+0.011
or+0.011
 recess+0.011
ed+0.010
Neg logit contributions
 annoying-0.014
rig-0.014
 rather-0.014
 probably-0.013
 stump-0.013
Token the
Token ID262
Feature activation-3.57e-02

Pos logit contributions
 rather+0.219
rig+0.214
 annoying+0.211
 stump+0.202
 probably+0.194
Neg logit contributions
ged-0.202
 recess-0.172
iked-0.170
or-0.163
ed-0.156
Token wire
Token ID6503
Feature activation+1.86e-01

Pos logit contributions
ged+0.083
ed+0.072
 recess+0.070
 roles+0.069
,+0.069
Neg logit contributions
 rather-0.040
rig-0.040
pill-0.032
 annoying-0.029
 likely-0.027
Token came
Token ID1625
Feature activation-1.41e-01

Pos logit contributions
 rather+0.359
rig+0.355
 annoying+0.344
 stump+0.327
 probably+0.313
Neg logit contributions
ged-0.289
iked-0.242
 recess-0.229
or-0.228
ed-0.227
Token they
Token ID484
Feature activation-6.76e-02

Pos logit contributions
ged+0.004
iked+0.004
 recess+0.004
or+0.004
ed+0.003
Neg logit contributions
 rather-0.006
 annoying-0.006
rig-0.006
 stump-0.005
 probably-0.005
Token were
Token ID547
Feature activation-3.50e-02

Pos logit contributions
ged+0.093
iked+0.074
or+0.073
 recess+0.069
ed+0.068
Neg logit contributions
rig-0.143
 rather-0.139
 annoying-0.138
 stump-0.133
 awful-0.123
Token all
Token ID477
Feature activation-3.41e-03

Pos logit contributions
ged+0.065
 recess+0.053
iked+0.052
ies+0.052
ed+0.052
Neg logit contributions
rig-0.056
 stump-0.047
pill-0.046
Ble-0.046
 rather-0.045
Token sm
Token ID895
Feature activation+1.47e-01

Pos logit contributions
ged+0.006
 recess+0.005
iked+0.005
 roles+0.005
ed+0.005
Neg logit contributions
rig-0.005
 rather-0.005
 annoying-0.005
 stump-0.005
 wire-0.004
Tokenud
Token ID463
Feature activation+9.13e-02

Pos logit contributions
 rather+0.156
 annoying+0.151
rig+0.148
 stump+0.139
 probably+0.135
Neg logit contributions
ged-0.349
iked-0.322
 recess-0.319
or-0.309
ed-0.304
Tokenged
Token ID2004
Feature activation-3.45e-01

Pos logit contributions
 rather+0.189
 annoying+0.177
 stump+0.176
 probably+0.174
rig+0.168
Neg logit contributions
 recess-0.105
iked-0.093
ged-0.089
rav-0.083
 roles-0.082
Token and
Token ID290
Feature activation+1.22e-01

Pos logit contributions
ged+0.510
iked+0.438
 recess+0.437
or+0.413
ed+0.384
Neg logit contributions
 rather-0.667
rig-0.645
 annoying-0.644
 stump-0.619
 probably-0.594
Token she
Token ID673
Feature activation-2.06e-02

Pos logit contributions
rig+0.234
 rather+0.229
 annoying+0.217
 stump+0.216
 probably+0.203
Neg logit contributions
ged-0.191
 recess-0.158
iked-0.156
ed-0.144
or-0.143
Token couldn
Token ID3521
Feature activation-2.98e-02

Pos logit contributions
ged+0.027
ed+0.020
or+0.020
iked+0.019
ies+0.018
Neg logit contributions
rig-0.046
 rather-0.044
 stump-0.043
 annoying-0.042
 heavens-0.040
Token't
Token ID470
Feature activation+1.34e-01

Pos logit contributions
ged+0.042
or+0.036
 recess+0.035
iked+0.034
ed+0.033
Neg logit contributions
rig-0.059
 rather-0.058
 annoying-0.057
 stump-0.054
 tre-0.053
Token read
Token ID1100
Feature activation-5.62e-02

Pos logit contributions
rig+0.250
 rather+0.246
 annoying+0.246
 stump+0.230
pill+0.221
Neg logit contributions
ged-0.208
 recess-0.179
or-0.177
iked-0.172
ed-0.155
Token it
Token ID340
Feature activation-2.73e-02

Pos logit contributions
ged+0.306
iked+0.265
 recess+0.261
or+0.243
ed+0.228
Neg logit contributions
 rather-0.422
 annoying-0.403
rig-0.398
 stump-0.388
 probably-0.377
Token,
Token ID11
Feature activation-1.76e-01

Pos logit contributions
ged+0.038
iked+0.033
 recess+0.033
or+0.031
ed+0.028
Neg logit contributions
 rather-0.055
 annoying-0.053
rig-0.053
 stump-0.051
 probably-0.050
Token so
Token ID523
Feature activation+1.32e-01

Pos logit contributions
ged+0.218
,+0.198
 down+0.179
 later+0.173
iked+0.173
Neg logit contributions
rig-0.342
 annoying-0.332
pill-0.332
 stump-0.324
vous-0.319
Token she
Token ID673
Feature activation-3.05e-02

Pos logit contributions
 rather+0.237
 annoying+0.223
 stump+0.212
 probably+0.209
rig+0.205
Neg logit contributions
ged-0.208
 recess-0.184
iked-0.182
or-0.180
rav-0.165
Token tug
Token ID27762
Feature activation-1.97e-02

Pos logit contributions
ged+0.044
iked+0.037
 recess+0.037
or+0.035
ed+0.032
Neg logit contributions
 rather-0.061
 annoying-0.059
rig-0.058
 stump-0.056
 probably-0.054
Tokenged
Token ID2004
Feature activation-3.45e-01

Pos logit contributions
iked+0.016
 recess+0.014
rav+0.014
 majesty+0.014
ail+0.013
Neg logit contributions
 annoying-0.044
 quite-0.043
 rather-0.043
 really-0.042
 sharp-0.042
Token and
Token ID290
Feature activation+1.29e-01

Pos logit contributions
ged+0.560
iked+0.506
 recess+0.505
or+0.471
ed+0.439
Neg logit contributions
 rather-0.613
 annoying-0.581
rig-0.575
 stump-0.553
 probably-0.531
Token pulled
Token ID5954
Feature activation-1.29e-01

Pos logit contributions
 rather+0.231
 annoying+0.222
 stump+0.212
rig+0.210
 probably+0.206
Neg logit contributions
ged-0.195
 recess-0.187
iked-0.185
or-0.176
rav-0.162
Token and
Token ID290
Feature activation+1.32e-01

Pos logit contributions
ged+0.189
iked+0.176
 recess+0.173
or+0.158
rav+0.149
Neg logit contributions
 rather-0.248
 annoying-0.234
rig-0.227
 stump-0.225
 probably-0.217
Token finally
Token ID3443
Feature activation-9.57e-02

Pos logit contributions
 rather+0.222
 annoying+0.218
 stump+0.205
rig+0.199
 probably+0.195
Neg logit contributions
ged-0.210
iked-0.207
 recess-0.203
or-0.198
rav-0.184
Token opened
Token ID4721
Feature activation-2.92e-02

Pos logit contributions
ged+0.128
 recess+0.122
iked+0.120
or+0.115
rav+0.111
Neg logit contributions
 rather-0.183
 annoying-0.173
 sharp-0.169
 stump-0.169
 probably-0.168
Token cry
Token ID3960
Feature activation+8.52e-03

Pos logit contributions
 rather+0.010
rig+0.010
 annoying+0.010
 stump+0.010
pill+0.009
Neg logit contributions
ged-0.009
 recess-0.008
iked-0.007
or-0.007
ies-0.007
Token.
Token ID13
Feature activation+2.01e-02

Pos logit contributions
rig+0.018
 annoying+0.017
 rather+0.017
 stump+0.017
 probably+0.016
Neg logit contributions
ged-0.012
iked-0.009
or-0.009
ed-0.009
 recess-0.009
Token He
Token ID679
Feature activation+1.07e-01

Pos logit contributions
 rather+0.045
 annoying+0.044
rig+0.044
 stump+0.042
 probably+0.041
Neg logit contributions
ged-0.023
iked-0.019
 recess-0.019
or-0.018
ed-0.016
Token tr
Token ID491
Feature activation+3.85e-02

Pos logit contributions
rig+0.225
 rather+0.218
 stump+0.215
 annoying+0.212
pill+0.198
Neg logit contributions
ged-0.141
ed-0.106
iked-0.104
or-0.104
 recess-0.097
Tokenud
Token ID463
Feature activation+9.54e-02

Pos logit contributions
 rather+0.057
 annoying+0.055
rig+0.053
 stump+0.051
 probably+0.049
Neg logit contributions
ged-0.077
iked-0.069
 recess-0.067
or-0.066
ipped-0.062
Tokenged
Token ID2004
Feature activation-3.45e-01

Pos logit contributions
 rather+0.243
 probably+0.229
 stump+0.227
 annoying+0.225
 wire+0.218
Neg logit contributions
 recess-0.062
rav-0.041
 roles-0.041
iked-0.033
 halftime-0.024
Token back
Token ID736
Feature activation-7.09e-02

Pos logit contributions
ged+0.438
iked+0.385
 recess+0.382
or+0.351
ed+0.311
Neg logit contributions
 rather-0.736
 annoying-0.709
rig-0.699
 stump-0.675
 probably-0.655
Token home
Token ID1363
Feature activation-1.63e-01

Pos logit contributions
ged+0.116
 recess+0.103
iked+0.102
or+0.096
rav+0.094
Neg logit contributions
 rather-0.124
rig-0.118
 stump-0.115
 annoying-0.113
 probably-0.106
Token,
Token ID11
Feature activation-1.89e-01

Pos logit contributions
ged+0.254
iked+0.221
 recess+0.221
or+0.209
ed+0.194
Neg logit contributions
 rather-0.303
 annoying-0.291
rig-0.291
 stump-0.279
 probably-0.268
Token carrying
Token ID6872
Feature activation-9.77e-02

Pos logit contributions
ged+0.322
iked+0.259
 recess+0.239
or+0.236
ed+0.234
Neg logit contributions
rig-0.332
 annoying-0.318
 rather-0.315
 stump-0.298
pill-0.292
Token the
Token ID262
Feature activation-2.71e-02

Pos logit contributions
ged+0.141
iked+0.121
 recess+0.121
or+0.114
ed+0.105
Neg logit contributions
 rather-0.193
 annoying-0.186
rig-0.186
 stump-0.179
 probably-0.172
Token.
Token ID13
Feature activation+1.62e-02

Pos logit contributions
 annoying+0.001
rig+0.001
 rather+0.001
 stump+0.001
 probably+0.001
Neg logit contributions
ged-0.001
iked-0.001
 recess-0.001
or-0.001
ed-0.001
Token She
Token ID1375
Feature activation+1.16e-01

Pos logit contributions
 annoying+0.028
 rather+0.027
rig+0.026
 stump+0.025
 awful+0.024
Neg logit contributions
ged-0.027
iked-0.025
 recess-0.024
or-0.023
ed-0.022
Token pushed
Token ID7121
Feature activation-1.40e-01

Pos logit contributions
 annoying+0.231
rig+0.230
 rather+0.230
 stump+0.217
 awful+0.212
Neg logit contributions
ged-0.170
iked-0.137
or-0.136
ed-0.131
 recess-0.128
Token and
Token ID290
Feature activation+1.21e-01

Pos logit contributions
ged+0.213
 recess+0.183
iked+0.183
or+0.174
ed+0.161
Neg logit contributions
 rather-0.267
 annoying-0.258
rig-0.258
 stump-0.248
 probably-0.239
Token tug
Token ID27762
Feature activation-2.17e-02

Pos logit contributions
 rather+0.220
 annoying+0.212
rig+0.211
 stump+0.203
 probably+0.194
Neg logit contributions
ged-0.193
iked-0.170
 recess-0.169
or-0.160
ed-0.149
Tokenged
Token ID2004
Feature activation-3.44e-01

Pos logit contributions
iked+0.018
 recess+0.017
rav+0.014
ged+0.014
or+0.013
Neg logit contributions
 rather-0.053
 annoying-0.051
 intense-0.047
rig-0.047
 quite-0.046
Token at
Token ID379
Feature activation-1.19e-01

Pos logit contributions
ged+0.541
iked+0.464
 recess+0.462
or+0.443
ed+0.412
Neg logit contributions
 rather-0.635
 annoying-0.617
rig-0.617
 stump-0.593
 probably-0.568
Token her
Token ID607
Feature activation-9.96e-02

Pos logit contributions
ged+0.139
 recess+0.111
iked+0.103
or+0.100
ed+0.090
Neg logit contributions
rig-0.270
 rather-0.266
 annoying-0.263
 stump-0.252
 probably-0.244
Token k
Token ID479
Feature activation+1.22e-01

Pos logit contributions
ged+0.169
iked+0.149
 recess+0.148
or+0.143
 roles+0.137
Neg logit contributions
 annoying-0.161
 rather-0.160
rig-0.159
 stump-0.150
 probably-0.146
Tokenite
Token ID578
Feature activation+3.93e-02

Pos logit contributions
 annoying+0.246
rig+0.239
 rather+0.238
 awful+0.234
 probably+0.227
Neg logit contributions
ged-0.181
iked-0.151
or-0.151
 recess-0.150
ipped-0.144
Token,
Token ID11
Feature activation-1.89e-01

Pos logit contributions
rig+0.088
 annoying+0.088
 rather+0.087
 stump+0.081
 probably+0.081
Neg logit contributions
ged-0.044
iked-0.035
 recess-0.034
or-0.034
ed-0.031
Token The
Token ID383
Feature activation+6.67e-02

Pos logit contributions
 rather+0.033
 annoying+0.031
rig+0.031
 stump+0.030
 probably+0.029
Neg logit contributions
ged-0.031
 recess-0.027
iked-0.027
or-0.026
ed-0.024
Token m
Token ID285
Feature activation+8.77e-02

Pos logit contributions
 rather+0.122
rig+0.119
 annoying+0.113
 stump+0.107
pill+0.105
Neg logit contributions
ged-0.102
 recess-0.088
iked-0.087
ies-0.082
or-0.082
Tokenama
Token ID1689
Feature activation-7.20e-02

Pos logit contributions
 rather+0.186
 annoying+0.178
rig+0.178
 stump+0.168
 probably+0.162
Neg logit contributions
ged-0.118
iked-0.100
or-0.096
ed-0.090
ies-0.086
Token bear
Token ID6842
Feature activation+1.76e-02

Pos logit contributions
ged+0.096
ies+0.078
iked+0.078
ed+0.078
or+0.074
Neg logit contributions
 rather-0.151
rig-0.143
 annoying-0.141
 stump-0.134
 awful-0.127
Token nud
Token ID26731
Feature activation-2.44e-02

Pos logit contributions
 rather+0.037
rig+0.037
 annoying+0.036
 stump+0.035
 probably+0.032
Neg logit contributions
ged-0.023
iked-0.019
ed-0.017
or-0.017
 recess-0.017
Tokenged
Token ID2004
Feature activation-3.44e-01

Pos logit contributions
 recess+0.021
iked+0.015
rav+0.014
 roles+0.014
 majesty+0.012
Neg logit contributions
 rather-0.053
 wire-0.052
 stump-0.051
 probably-0.050
 dot-0.048
Token the
Token ID262
Feature activation-4.24e-02

Pos logit contributions
ged+0.470
iked+0.406
 recess+0.406
or+0.376
ed+0.342
Neg logit contributions
 rather-0.704
 annoying-0.680
rig-0.675
 stump-0.652
 probably-0.630
Token daddy
Token ID35822
Feature activation-8.26e-02

Pos logit contributions
ged+0.075
ed+0.062
 recess+0.060
iked+0.060
or+0.060
Neg logit contributions
 rather-0.071
rig-0.071
 annoying-0.065
pill-0.059
 likely-0.058
Token bear
Token ID6842
Feature activation+1.84e-02

Pos logit contributions
ged+0.131
or+0.108
iked+0.106
,+0.102
ed+0.102
Neg logit contributions
 annoying-0.153
 rather-0.151
rig-0.151
 probably-0.137
 likely-0.135
Token and
Token ID290
Feature activation+1.17e-01

Pos logit contributions
 annoying+0.037
rig+0.035
 rather+0.035
pill+0.033
 spe+0.033
Neg logit contributions
ged-0.026
,-0.023
or-0.021
iked-0.021
ed-0.020
Token said
Token ID531
Feature activation-1.81e-01

Pos logit contributions
 rather+0.234
 annoying+0.227
rig+0.221
 stump+0.218
 probably+0.211
Neg logit contributions
ged-0.160
 recess-0.141
iked-0.141
or-0.134
ed-0.118
Token.
Token ID13
Feature activation+6.76e-02

Pos logit contributions
 rather+0.113
 stump+0.109
rig+0.109
 annoying+0.108
 wire+0.107
Neg logit contributions
ged-0.156
 recess-0.141
iked-0.139
or-0.139
ed-0.127
Token She
Token ID1375
Feature activation+1.46e-01

Pos logit contributions
 rather+0.109
 annoying+0.104
rig+0.104
 stump+0.099
 probably+0.094
Neg logit contributions
ged-0.121
iked-0.108
 recess-0.108
or-0.103
ed-0.097
Token pushed
Token ID7121
Feature activation-1.35e-01

Pos logit contributions
 rather+0.288
rig+0.285
 annoying+0.282
 stump+0.279
 awful+0.257
Neg logit contributions
ged-0.201
iked-0.165
 recess-0.164
or-0.159
ed-0.154
Token and
Token ID290
Feature activation+1.56e-01

Pos logit contributions
ged+0.169
iked+0.147
 recess+0.143
or+0.133
ed+0.120
Neg logit contributions
 rather-0.291
 annoying-0.282
rig-0.279
 stump-0.271
 probably-0.261
Token tug
Token ID27762
Feature activation-1.69e-02

Pos logit contributions
 rather+0.284
 annoying+0.277
rig+0.269
 stump+0.262
 probably+0.250
Neg logit contributions
ged-0.238
iked-0.222
 recess-0.217
or-0.207
rav-0.187
Tokenged
Token ID2004
Feature activation-3.44e-01

Pos logit contributions
iked+0.020
 recess+0.018
rav+0.015
or+0.015
riage+0.014
Neg logit contributions
 rather-0.037
 annoying-0.036
 awful-0.032
 intense-0.032
 sharp-0.032
Token harder
Token ID7069
Feature activation-3.04e-03

Pos logit contributions
ged+0.493
iked+0.422
 recess+0.421
or+0.397
ed+0.367
Neg logit contributions
 rather-0.681
 annoying-0.658
rig-0.658
 stump-0.633
 probably-0.610
Token than
Token ID621
Feature activation+2.06e-02

Pos logit contributions
ged+0.004
 recess+0.003
iked+0.003
or+0.003
ed+0.002
Neg logit contributions
 rather-0.007
rig-0.007
 annoying-0.007
 stump-0.006
 probably-0.006
Token ever
Token ID1683
Feature activation-2.27e-02

Pos logit contributions
 rather+0.042
rig+0.041
 annoying+0.041
 stump+0.039
 probably+0.038
Neg logit contributions
ged-0.028
 recess-0.024
iked-0.024
or-0.022
ed-0.021
Token,
Token ID11
Feature activation-1.56e-01

Pos logit contributions
ged+0.040
iked+0.038
 recess+0.036
or+0.035
rav+0.032
Neg logit contributions
 rather-0.038
 annoying-0.036
rig-0.034
 stump-0.034
 probably-0.033
Token but
Token ID475
Feature activation+1.79e-02

Pos logit contributions
ged+0.252
,+0.188
 recess+0.187
iked+0.184
ed+0.183
Neg logit contributions
rig-0.298
 annoying-0.268
 stump-0.265
 rather-0.258
pill-0.248
Tokenaled
Token ID3021
Feature activation-9.03e-02

Pos logit contributions
ged+0.087
iked+0.074
 recess+0.073
or+0.071
ed+0.066
Neg logit contributions
 rather-0.106
rig-0.103
 annoying-0.102
 stump-0.098
 probably-0.094
Token and
Token ID290
Feature activation+1.29e-01

Pos logit contributions
ged+0.119
 recess+0.100
iked+0.100
or+0.094
ed+0.087
Neg logit contributions
 rather-0.188
rig-0.183
 annoying-0.183
 stump-0.176
 probably-0.170
Token wrapped
Token ID12908
Feature activation-1.41e-03

Pos logit contributions
 rather+0.212
rig+0.207
 annoying+0.202
 stump+0.193
 probably+0.184
Neg logit contributions
ged-0.227
 recess-0.202
iked-0.199
or-0.188
ed-0.181
Token and
Token ID290
Feature activation+1.29e-01

Pos logit contributions
ged+0.002
iked+0.002
 recess+0.002
or+0.001
ed+0.001
Neg logit contributions
 rather-0.003
 annoying-0.003
rig-0.003
 stump-0.003
 probably-0.003
Token bag
Token ID6131
Feature activation+2.32e-02

Pos logit contributions
 rather+0.188
rig+0.182
 annoying+0.177
 stump+0.169
 probably+0.161
Neg logit contributions
ged-0.253
 recess-0.227
iked-0.225
or-0.214
ed-0.208
Tokenged
Token ID2004
Feature activation-3.44e-01

Pos logit contributions
 rather+0.045
 annoying+0.043
rig+0.042
 stump+0.041
 probably+0.039
Neg logit contributions
ged-0.034
 recess-0.030
iked-0.030
or-0.028
 roles-0.025
Token and
Token ID290
Feature activation+1.30e-01

Pos logit contributions
ged+0.487
iked+0.418
 recess+0.417
or+0.392
ed+0.362
Neg logit contributions
 rather-0.686
 annoying-0.663
rig-0.662
 stump-0.637
 probably-0.614
Token boxed
Token ID46069
Feature activation-5.41e-02

Pos logit contributions
 rather+0.228
rig+0.225
 annoying+0.220
 stump+0.211
 probably+0.201
Neg logit contributions
ged-0.216
 recess-0.188
iked-0.186
or-0.175
ed-0.169
Token and
Token ID290
Feature activation+1.30e-01

Pos logit contributions
ged+0.080
 recess+0.072
iked+0.070
or+0.065
rav+0.060
Neg logit contributions
 rather-0.102
rig-0.098
 annoying-0.097
 stump-0.095
 probably-0.091
Token cart
Token ID6383
Feature activation+5.07e-02

Pos logit contributions
 rather+0.221
 annoying+0.212
rig+0.211
 stump+0.202
 probably+0.193
Neg logit contributions
ged-0.224
iked-0.198
 recess-0.197
or-0.188
ed-0.176
Tokenoned
Token ID12004
Feature activation-2.31e-01

Pos logit contributions
 rather+0.110
 annoying+0.106
rig+0.106
 stump+0.103
 probably+0.099
Neg logit contributions
ged-0.062
 recess-0.052
iked-0.052
or-0.048
 roles-0.043
Token drawings
Token ID23388
Feature activation-7.54e-03

Pos logit contributions
ged+0.175
 recess+0.158
iked+0.155
or+0.149
ed+0.140
Neg logit contributions
 rather-0.166
rig-0.161
 annoying-0.160
 stump-0.154
 probably-0.145
Token would
Token ID561
Feature activation-3.94e-02

Pos logit contributions
ged+0.012
 recess+0.010
iked+0.010
or+0.010
ed+0.009
Neg logit contributions
 rather-0.014
rig-0.014
 annoying-0.013
 stump-0.013
 probably-0.012
Token get
Token ID651
Feature activation+2.69e-02

Pos logit contributions
 recess+0.051
ged+0.047
,+0.047
ies+0.043
 later+0.043
Neg logit contributions
rig-0.079
umbling-0.068
pill-0.068
 stump-0.067
 annoying-0.063
Token sm
Token ID895
Feature activation+1.46e-01

Pos logit contributions
rig+0.037
 stump+0.035
 rather+0.035
 annoying+0.033
 web+0.030
Neg logit contributions
ged-0.054
 recess-0.049
iked-0.046
or-0.046
ies-0.046
Tokenud
Token ID463
Feature activation+9.11e-02

Pos logit contributions
 annoying+0.145
 rather+0.143
rig+0.138
 stump+0.134
 probably+0.131
Neg logit contributions
ged-0.348
 recess-0.327
iked-0.322
or-0.320
ed-0.310
Tokenged
Token ID2004
Feature activation-3.44e-01

Pos logit contributions
 rather+0.166
 annoying+0.157
rig+0.155
 stump+0.151
 probably+0.145
Neg logit contributions
ged-0.140
iked-0.128
 recess-0.128
or-0.116
 roles-0.108
Token because
Token ID780
Feature activation+7.33e-02

Pos logit contributions
ged+0.497
 recess+0.423
iked+0.417
or+0.402
ed+0.376
Neg logit contributions
 rather-0.671
rig-0.666
 annoying-0.654
 stump-0.637
 probably-0.605
Token the
Token ID262
Feature activation-3.43e-02

Pos logit contributions
rig+0.173
 rather+0.164
 stump+0.163
 annoying+0.162
 probably+0.151
Neg logit contributions
ged-0.078
 recess-0.069
iked-0.063
or-0.061
 roles-0.056
Token paper
Token ID3348
Feature activation+1.22e-01

Pos logit contributions
 recess+0.057
ged+0.054
 roles+0.050
or+0.048
ies+0.047
Neg logit contributions
rig-0.058
 rather-0.051
 stump-0.050
 annoying-0.049
 intense-0.046
Token wasn
Token ID2492
Feature activation+3.20e-05

Pos logit contributions
rig+0.253
 rather+0.243
 annoying+0.239
 stump+0.233
 intense+0.220
Neg logit contributions
ged-0.156
ed-0.132
or-0.132
 recess-0.131
ies-0.129
Token't
Token ID470
Feature activation+1.31e-01

Pos logit contributions
rig+0.000
 stump+0.000
 rather+0.000
 tre+0.000
 web+0.000
Neg logit contributions
ged-0.000
 recess-0.000
or-0.000
 later-0.000
iked-0.000
Token.
Token ID13
Feature activation+2.02e-02

Pos logit contributions
ged+0.094
 recess+0.084
iked+0.083
or+0.077
ed+0.073
Neg logit contributions
 rather-0.111
rig-0.105
 annoying-0.105
 stump-0.101
 probably-0.098
Token He
Token ID679
Feature activation+1.03e-01

Pos logit contributions
 rather+0.038
 annoying+0.038
rig+0.037
 stump+0.035
 probably+0.034
Neg logit contributions
ged-0.031
iked-0.027
 recess-0.026
or-0.025
ed-0.024
Token shook
Token ID14682
Feature activation-3.42e-02

Pos logit contributions
 rather+0.221
rig+0.218
 annoying+0.213
 stump+0.203
 awful+0.196
Neg logit contributions
ged-0.139
iked-0.107
or-0.103
ed-0.101
 recess-0.095
Token and
Token ID290
Feature activation+1.24e-01

Pos logit contributions
ged+0.049
 recess+0.045
iked+0.044
or+0.041
ed+0.037
Neg logit contributions
 rather-0.067
 annoying-0.064
rig-0.063
 stump-0.062
 probably-0.060
Token tug
Token ID27762
Feature activation-1.74e-02

Pos logit contributions
 rather+0.248
rig+0.243
 annoying+0.238
 stump+0.231
 probably+0.220
Neg logit contributions
ged-0.182
iked-0.147
 recess-0.144
or-0.139
ed-0.133
Tokenged
Token ID2004
Feature activation-3.44e-01

Pos logit contributions
 recess+0.010
iked+0.010
rav+0.007
ail+0.006
riage+0.005
Neg logit contributions
 rather-0.047
 annoying-0.046
 sharp-0.044
 quite-0.043
 intense-0.043
Token it
Token ID340
Feature activation-3.10e-02

Pos logit contributions
ged+0.516
 recess+0.473
iked+0.466
or+0.429
 roles+0.399
Neg logit contributions
 rather-0.653
 annoying-0.619
rig-0.612
 stump-0.590
 probably-0.569
Token,
Token ID11
Feature activation-1.85e-01

Pos logit contributions
ged+0.044
iked+0.038
 recess+0.037
or+0.035
ed+0.033
Neg logit contributions
 rather-0.061
rig-0.060
 annoying-0.059
 stump-0.057
 probably-0.055
Token but
Token ID475
Feature activation-2.02e-03

Pos logit contributions
ged+0.295
,+0.218
iked+0.204
ed+0.197
 down+0.187
Neg logit contributions
rig-0.366
 annoying-0.331
 rather-0.323
 stump-0.321
pill-0.319
Token it
Token ID340
Feature activation-2.98e-02

Pos logit contributions
ged+0.003
 recess+0.003
iked+0.003
or+0.002
 roles+0.002
Neg logit contributions
 rather-0.004
 annoying-0.004
rig-0.004
 stump-0.004
 probably-0.004
Token wouldn
Token ID3636
Feature activation-8.55e-02

Pos logit contributions
ged+0.036
ed+0.030
or+0.026
iked+0.025
bled+0.024
Neg logit contributions
 rather-0.067
rig-0.066
 annoying-0.062
pill-0.060
 stump-0.059
Token a
Token ID257
Feature activation+8.89e-02

Pos logit contributions
rig+0.083
 annoying+0.078
 stump+0.077
 rather+0.076
 web+0.075
Neg logit contributions
ged-0.056
 recess-0.052
,-0.051
iked-0.048
 later-0.048
Token very
Token ID845
Feature activation+1.85e-01

Pos logit contributions
 rather+0.138
rig+0.137
pill+0.132
 stump+0.132
umbling+0.126
Neg logit contributions
 recess-0.135
ged-0.132
,-0.128
rav-0.116
ies-0.115
Token special
Token ID2041
Feature activation+1.21e-01

Pos logit contributions
 rather+0.233
 stump+0.218
rig+0.216
 annoying+0.205
 probably+0.192
Neg logit contributions
ged-0.364
 recess-0.358
iked-0.343
or-0.329
ies-0.317
Token day
Token ID1110
Feature activation-1.68e-01

Pos logit contributions
 annoying+0.271
 rather+0.270
rig+0.265
 awful+0.251
 stump+0.250
Neg logit contributions
 recess-0.113
ged-0.107
,-0.105
ies-0.095
iked-0.092
Token.
Token ID13
Feature activation+1.82e-02

Pos logit contributions
ged+0.225
iked+0.194
 recess+0.186
or+0.171
ed+0.170
Neg logit contributions
rig-0.350
 annoying-0.345
 rather-0.345
 stump-0.330
 probably-0.313
Token It
Token ID632
Feature activation+1.52e-01

Pos logit contributions
rig+0.033
 rather+0.033
 annoying+0.033
 awful+0.030
 web+0.030
Neg logit contributions
ged-0.025
 recess-0.025
iked-0.025
ed-0.021
,-0.020
Token was
Token ID373
Feature activation+4.53e-02

Pos logit contributions
 rather+0.385
rig+0.381
 annoying+0.363
 stump+0.363
 former+0.345
Neg logit contributions
ged-0.098
ed-0.086
,-0.078
 came-0.070
 later-0.060
Token pretty
Token ID2495
Feature activation+1.42e-01

Pos logit contributions
rig+0.080
pill+0.075
 stump+0.073
 rather+0.073
 annoying+0.072
Neg logit contributions
ged-0.066
 recess-0.063
,-0.059
iked-0.058
 later-0.056
Token!
Token ID0
Feature activation+3.77e-02

Pos logit contributions
 annoying+0.338
rig+0.335
 rather+0.310
 painful+0.304
 awful+0.302
Neg logit contributions
,-0.162
ies-0.132
ed-0.119
ged-0.111
 down-0.109
Token Mom
Token ID11254
Feature activation-7.48e-02

Pos logit contributions
 rather+0.062
 annoying+0.062
rig+0.062
 stump+0.057
 awful+0.056
Neg logit contributions
ged-0.061
 recess-0.058
iked-0.058
or-0.051
ed-0.050
Tokenmy
Token ID1820
Feature activation-1.34e-01

Pos logit contributions
ged+0.096
iked+0.080
 recess+0.077
or+0.075
ed+0.070
Neg logit contributions
 rather-0.159
 annoying-0.154
rig-0.154
 stump-0.148
 probably-0.141
Token to
Token ID284
Feature activation+2.98e-02

Pos logit contributions
 rather+0.006
 annoying+0.006
rig+0.006
 stump+0.006
 probably+0.006
Neg logit contributions
ged-0.005
iked-0.005
 recess-0.005
or-0.004
ed-0.004
Token open
Token ID1280
Feature activation+9.34e-02

Pos logit contributions
rig+0.062
pill+0.060
 unlike+0.059
 rather+0.058
 annoying+0.057
Neg logit contributions
 recess-0.034
ged-0.033
 later-0.031
,-0.030
or-0.030
Token z
Token ID1976
Feature activation+3.87e-02

Pos logit contributions
 rather+0.207
rig+0.202
 annoying+0.201
 stump+0.194
 probably+0.187
Neg logit contributions
ged-0.111
 recess-0.093
iked-0.092
or-0.086
ed-0.078
Tokenips
Token ID2419
Feature activation-1.02e-01

Pos logit contributions
 rather+0.084
rig+0.082
 annoying+0.080
 stump+0.076
 wire+0.076
Neg logit contributions
ged-0.041
iked-0.033
 recess-0.031
ies-0.028
rav-0.028
Token.
Token ID13
Feature activation+6.55e-02

Pos logit contributions
ged+0.137
 recess+0.115
iked+0.114
or+0.110
ed+0.102
Neg logit contributions
 rather-0.211
 annoying-0.206
rig-0.206
 stump-0.198
 probably-0.191
Token She
Token ID1375
Feature activation+1.72e-01

Pos logit contributions
 rather+0.090
 annoying+0.085
rig+0.085
 stump+0.080
 probably+0.076
Neg logit contributions
ged-0.135
iked-0.121
 recess-0.121
or-0.116
ed-0.110
Token sees
Token ID7224
Feature activation-3.05e-02

Pos logit contributions
 stump+0.361
 rather+0.357
rig+0.356
 annoying+0.353
umbling+0.342
Neg logit contributions
ged-0.179
ed-0.148
 recess-0.145
 roles-0.145
or-0.143
Token Ben
Token ID3932
Feature activation-1.13e-01

Pos logit contributions
ged+0.042
 recess+0.036
iked+0.034
ed+0.032
 roles+0.031
Neg logit contributions
rig-0.063
 rather-0.062
 stump-0.060
 annoying-0.059
 probably-0.057
Token and
Token ID290
Feature activation+1.66e-01

Pos logit contributions
ged+0.196
iked+0.164
or+0.161
 recess+0.161
ed+0.157
Neg logit contributions
 rather-0.189
rig-0.187
 annoying-0.183
 stump-0.172
 intense-0.164
Token Lily
Token ID20037
Feature activation-1.31e-01

Pos logit contributions
 rather+0.383
rig+0.372
 annoying+0.371
 stump+0.360
 probably+0.347
Neg logit contributions
ged-0.186
 recess-0.152
iked-0.151
or-0.139
ed-0.125
Token struggling
Token ID9648
Feature activation+8.66e-02

Pos logit contributions
ged+0.193
 recess+0.171
iked+0.170
or+0.161
ed+0.147
Neg logit contributions
 rather-0.252
 annoying-0.241
rig-0.236
 stump-0.233
 probably-0.224
Token.
Token ID13
Feature activation+5.07e-02

Pos logit contributions
 rather+0.028
 annoying+0.028
rig+0.028
 stump+0.026
 probably+0.025
Neg logit contributions
ged-0.022
iked-0.019
 recess-0.019
or-0.019
ed-0.017
Token The
Token ID383
Feature activation+8.41e-02

Pos logit contributions
 rather+0.087
 annoying+0.085
rig+0.085
 stump+0.081
 probably+0.077
Neg logit contributions
ged-0.084
 recess-0.074
iked-0.074
or-0.070
ed-0.065
Token man
Token ID582
Feature activation+2.18e-02

Pos logit contributions
 rather+0.141
rig+0.136
 annoying+0.134
 stump+0.130
 probably+0.123
Neg logit contributions
ged-0.145
 recess-0.131
iked-0.127
or-0.122
ed-0.114
Token loved
Token ID6151
Feature activation-8.81e-04

Pos logit contributions
rig+0.045
 rather+0.045
 annoying+0.044
 stump+0.043
 awful+0.041
Neg logit contributions
ged-0.029
iked-0.023
or-0.022
 recess-0.022
ed-0.022
Token her
Token ID607
Feature activation-8.35e-02

Pos logit contributions
ged+0.001
 recess+0.001
iked+0.001
or+0.001
 roles+0.001
Neg logit contributions
rig-0.002
 rather-0.002
 annoying-0.002
 stump-0.002
 probably-0.002
Token very
Token ID845
Feature activation+1.91e-01

Pos logit contributions
ged+0.139
 recess+0.125
iked+0.123
or+0.117
ies+0.115
Neg logit contributions
 rather-0.138
 annoying-0.138
rig-0.135
 stump-0.132
 probably-0.124
Token much
Token ID881
Feature activation+1.29e-01

Pos logit contributions
 stump+0.241
rig+0.234
 annoying+0.225
 wire+0.221
 rather+0.217
Neg logit contributions
ged-0.365
 recess-0.353
or-0.351
iked-0.343
,-0.336
Token.
Token ID13
Feature activation+4.49e-02

Pos logit contributions
rig+0.285
 annoying+0.284
 stump+0.280
 wire+0.270
 rather+0.269
Neg logit contributions
ged-0.131
,-0.120
or-0.112
 recess-0.110
ed-0.107
Token One
Token ID1881
Feature activation+1.37e-02

Pos logit contributions
 rather+0.079
 annoying+0.079
rig+0.077
 stump+0.073
 intense+0.070
Neg logit contributions
ged-0.070
 recess-0.064
iked-0.062
or-0.059
ed-0.054
Token day
Token ID1110
Feature activation-1.66e-01

Pos logit contributions
 rather+0.034
rig+0.033
 annoying+0.033
 stump+0.032
 probably+0.031
Neg logit contributions
ged-0.012
 recess-0.010
or-0.010
iked-0.010
ies-0.009
Token,
Token ID11
Feature activation-1.83e-01

Pos logit contributions
ged+0.231
iked+0.210
bled+0.206
 recess+0.198
or+0.188
Neg logit contributions
 rather-0.308
 annoying-0.272
 spe-0.271
 sharp-0.266
 smoke-0.263
Token her
Token ID607
Feature activation-7.15e-02

Pos logit contributions
 rather+0.249
rig+0.242
 annoying+0.240
 stump+0.233
 probably+0.223
Neg logit contributions
ged-0.161
 recess-0.139
iked-0.137
or-0.129
ed-0.120
Token toys
Token ID14958
Feature activation-2.71e-02

Pos logit contributions
ged+0.077
 recess+0.063
iked+0.062
or+0.057
ed+0.051
Neg logit contributions
 rather-0.167
rig-0.163
 annoying-0.162
 stump-0.158
 probably-0.152
Token were
Token ID547
Feature activation-1.46e-02

Pos logit contributions
ged+0.039
 recess+0.034
iked+0.034
or+0.033
rav+0.029
Neg logit contributions
 rather-0.052
 annoying-0.050
rig-0.049
 stump-0.048
 probably-0.047
Token gone
Token ID3750
Feature activation-5.74e-03

Pos logit contributions
ged+0.028
 recess+0.025
iked+0.025
or+0.023
 roles+0.023
Neg logit contributions
rig-0.022
 rather-0.021
 annoying-0.020
 stump-0.020
 probably-0.018
Token.
Token ID13
Feature activation+5.51e-02

Pos logit contributions
ged+0.009
iked+0.008
 recess+0.008
or+0.007
ed+0.007
Neg logit contributions
 rather-0.011
 annoying-0.010
rig-0.010
 stump-0.010
 probably-0.010
Token She
Token ID1375
Feature activation+1.61e-01

Pos logit contributions
 rather+0.078
 annoying+0.073
rig+0.072
 stump+0.069
 probably+0.066
Neg logit contributions
ged-0.113
iked-0.100
 recess-0.099
or-0.097
ies-0.091
Token had
Token ID550
Feature activation-3.96e-02

Pos logit contributions
 rather+0.334
rig+0.332
 annoying+0.324
 stump+0.319
 probably+0.295
Neg logit contributions
ged-0.215
iked-0.174
 recess-0.174
or-0.164
ed-0.160
Token nothing
Token ID2147
Feature activation+5.11e-02

Pos logit contributions
ged+0.072
 recess+0.065
iked+0.064
or+0.060
ed+0.058
Neg logit contributions
 rather-0.062
rig-0.062
 annoying-0.060
 stump-0.058
 probably-0.054
Token left
Token ID1364
Feature activation-2.00e-02

Pos logit contributions
 rather+0.092
rig+0.089
 annoying+0.089
 stump+0.085
 probably+0.081
Neg logit contributions
ged-0.082
iked-0.071
 recess-0.071
or-0.068
ed-0.063
Token.
Token ID13
Feature activation+5.82e-02

Pos logit contributions
ged+0.030
iked+0.026
 recess+0.026
or+0.024
ed+0.022
Neg logit contributions
 rather-0.038
 annoying-0.037
rig-0.036
 stump-0.035
 probably-0.034
Token She
Token ID1375
Feature activation+1.64e-01

Pos logit contributions
 rather+0.082
 annoying+0.076
rig+0.075
 stump+0.071
 probably+0.068
Neg logit contributions
ged-0.120
iked-0.108
 recess-0.106
or-0.103
ies-0.098
Token plant
Token ID4618
Feature activation+1.09e-01

Pos logit contributions
 rather+0.055
rig+0.054
 annoying+0.054
 stump+0.049
 awful+0.048
Neg logit contributions
ged-0.043
 recess-0.039
iked-0.037
or-0.035
ed-0.032
Token her
Token ID607
Feature activation-6.96e-02

Pos logit contributions
 annoying+0.235
rig+0.232
 rather+0.229
 stump+0.208
 intense+0.207
Neg logit contributions
ged-0.140
iked-0.115
 recess-0.115
or-0.110
ed-0.106
Token pine
Token ID20161
Feature activation+9.72e-02

Pos logit contributions
ged+0.126
 recess+0.114
iked+0.114
or+0.108
ies+0.106
Neg logit contributions
 annoying-0.106
 rather-0.105
rig-0.103
 stump-0.091
 intense-0.090
Token seed
Token ID9403
Feature activation+5.06e-02

Pos logit contributions
 annoying+0.153
rig+0.150
 rather+0.148
 probably+0.129
 intense+0.128
Neg logit contributions
ged-0.180
iked-0.166
 recess-0.160
or-0.160
ed-0.154
Token.
Token ID13
Feature activation+5.95e-02

Pos logit contributions
 annoying+0.112
 rather+0.108
rig+0.108
 awful+0.099
 probably+0.099
Neg logit contributions
ged-0.061
iked-0.051
or-0.048
 recess-0.048
ies-0.047
Token She
Token ID1375
Feature activation+1.64e-01

Pos logit contributions
 rather+0.089
 annoying+0.087
rig+0.086
 stump+0.081
 probably+0.077
Neg logit contributions
ged-0.113
iked-0.101
 recess-0.100
or-0.094
ed-0.092
Token hoped
Token ID10719
Feature activation-3.95e-02

Pos logit contributions
rig+0.349
 annoying+0.349
 rather+0.345
 stump+0.326
 awful+0.319
Neg logit contributions
ged-0.210
iked-0.164
ed-0.162
 recess-0.153
or-0.151
Token that
Token ID326
Feature activation+8.79e-02

Pos logit contributions
ged+0.054
 recess+0.049
iked+0.048
or+0.045
 roles+0.042
Neg logit contributions
rig-0.080
 rather-0.078
 annoying-0.077
 stump-0.075
 probably-0.070
Token one
Token ID530
Feature activation-2.44e-03

Pos logit contributions
rig+0.196
 rather+0.191
 annoying+0.188
 stump+0.182
 probably+0.173
Neg logit contributions
ged-0.102
 recess-0.088
iked-0.084
or-0.078
 roles-0.074
Token day
Token ID1110
Feature activation-1.51e-01

Pos logit contributions
ged+0.003
 recess+0.002
iked+0.002
or+0.002
ed+0.002
Neg logit contributions
rig-0.006
 rather-0.006
 annoying-0.005
 stump-0.005
 probably-0.005
Token,
Token ID11
Feature activation-1.57e-01

Pos logit contributions
ged+0.185
iked+0.155
 recess+0.154
or+0.144
ed+0.131
Neg logit contributions
 rather-0.329
rig-0.321
 annoying-0.320
 stump-0.309
 probably-0.298
Token said
Token ID531
Feature activation-1.65e-01

Pos logit contributions
ged+0.049
iked+0.041
 recess+0.041
or+0.039
ed+0.037
Neg logit contributions
 rather-0.068
rig-0.065
 annoying-0.065
 stump-0.063
 probably-0.060
Token the
Token ID262
Feature activation-2.86e-02

Pos logit contributions
 recess+0.233
ged+0.226
iked+0.191
packs+0.189
or+0.185
Neg logit contributions
 rather-0.333
 annoying-0.291
 sharp-0.285
 beautifully-0.285
 awful-0.285
Token same
Token ID976
Feature activation+8.21e-02

Pos logit contributions
ged+0.051
 recess+0.050
iked+0.045
 roles+0.044
or+0.043
Neg logit contributions
rig-0.042
 rather-0.042
 annoying-0.038
 stump-0.035
 intense-0.035
Token magic
Token ID5536
Feature activation+1.08e-01

Pos logit contributions
rig+0.135
 rather+0.133
 annoying+0.131
 stump+0.125
pill+0.118
Neg logit contributions
ged-0.135
 recess-0.120
iked-0.117
or-0.115
,-0.114
Token word
Token ID1573
Feature activation+9.69e-02

Pos logit contributions
rig+0.171
 rather+0.169
 annoying+0.166
 stump+0.150
 intense+0.146
Neg logit contributions
ged-0.194
 recess-0.172
iked-0.171
or-0.167
ed-0.167
Token.
Token ID13
Feature activation+2.24e-02

Pos logit contributions
rig+0.199
 rather+0.194
 annoying+0.193
 stump+0.184
 probably+0.174
Neg logit contributions
ged-0.132
iked-0.111
 recess-0.105
or-0.103
ed-0.103
Token Suddenly
Token ID24975
Feature activation-6.38e-02

Pos logit contributions
 rather+0.037
 annoying+0.036
rig+0.036
 stump+0.033
 probably+0.032
Neg logit contributions
ged-0.039
iked-0.036
 recess-0.035
or-0.033
ed-0.032
Token,
Token ID11
Feature activation-1.91e-01

Pos logit contributions
ged+0.080
 recess+0.073
iked+0.073
or+0.068
rav+0.065
Neg logit contributions
 rather-0.128
 wire-0.121
 annoying-0.120
 sharp-0.120
 smoke-0.119
Token the
Token ID262
Feature activation-2.97e-02

Pos logit contributions
ged+0.267
 recess+0.230
iked+0.227
or+0.212
ed+0.200
Neg logit contributions
 rather-0.385
rig-0.379
 annoying-0.371
 stump-0.358
 probably-0.344
Token silver
Token ID8465
Feature activation+1.10e-01

Pos logit contributions
ged+0.057
 recess+0.054
iked+0.048
ed+0.047
 roles+0.047
Neg logit contributions
rig-0.043
 rather-0.042
 annoying-0.038
 stump-0.035
 probably-0.034
Token ball
Token ID2613
Feature activation+8.59e-02

Pos logit contributions
rig+0.183
 rather+0.182
 annoying+0.175
 stump+0.156
 awful+0.151
Neg logit contributions
ged-0.203
 recess-0.173
iked-0.173
or-0.170
ed-0.169
Token is
Token ID318
Feature activation+1.12e-01

Pos logit contributions
 rather+0.020
rig+0.020
 annoying+0.019
 stump+0.019
 probably+0.018
Neg logit contributions
ged-0.018
iked-0.016
 recess-0.015
or-0.015
ed-0.014
Token loyal
Token ID9112
Feature activation+2.24e-02

Pos logit contributions
 rather+0.176
rig+0.171
 annoying+0.169
 stump+0.162
 probably+0.152
Neg logit contributions
ged-0.204
 recess-0.181
iked-0.180
or-0.173
ed-0.163
Token and
Token ID290
Feature activation+1.26e-01

Pos logit contributions
 rather+0.041
rig+0.040
 annoying+0.040
 stump+0.038
 probably+0.037
Neg logit contributions
ged-0.035
iked-0.030
 recess-0.030
or-0.029
ed-0.027
Token it
Token ID340
Feature activation-2.32e-02

Pos logit contributions
rig+0.202
 stump+0.196
 rather+0.194
 annoying+0.186
 web+0.182
Neg logit contributions
ged-0.212
iked-0.190
 recess-0.184
ed-0.183
or-0.173
Token was
Token ID373
Feature activation+5.45e-02

Pos logit contributions
ged+0.030
ed+0.027
or+0.024
 roles+0.023
iked+0.023
Neg logit contributions
rig-0.046
 stump-0.046
 rather-0.045
 annoying-0.043
 intense-0.041
Token wrong
Token ID2642
Feature activation+2.41e-02

Pos logit contributions
rig+0.077
 rather+0.076
 annoying+0.074
 stump+0.072
 intense+0.064
Neg logit contributions
ged-0.106
 recess-0.094
iked-0.092
or-0.089
ed-0.085
Token to
Token ID284
Feature activation+1.08e-02

Pos logit contributions
 rather+0.045
 annoying+0.044
rig+0.044
 stump+0.042
 probably+0.040
Neg logit contributions
ged-0.037
iked-0.032
 recess-0.031
or-0.030
ed-0.028
Token try
Token ID1949
Feature activation+4.62e-02

Pos logit contributions
 rather+0.019
 annoying+0.018
rig+0.018
 stump+0.017
 probably+0.017
Neg logit contributions
ged-0.018
iked-0.016
 recess-0.016
or-0.015
ed-0.014
Token and
Token ID290
Feature activation+1.24e-01

Pos logit contributions
 rather+0.089
rig+0.087
 annoying+0.087
 stump+0.083
 probably+0.080
Neg logit contributions
ged-0.068
 recess-0.058
iked-0.057
or-0.055
ed-0.051
Token take
Token ID1011
Feature activation+7.00e-03

Pos logit contributions
rig+0.218
 rather+0.217
 annoying+0.213
 stump+0.205
 awful+0.192
Neg logit contributions
ged-0.197
 recess-0.176
iked-0.171
or-0.164
ed-0.152
Token something
Token ID1223
Feature activation+6.25e-02

Pos logit contributions
 rather+0.013
 annoying+0.013
rig+0.013
 stump+0.012
 probably+0.012
Neg logit contributions
ged-0.011
iked-0.009
 recess-0.009
or-0.009
ed-0.008
Token many
Token ID867
Feature activation+5.60e-02

Pos logit contributions
ged+0.031
 recess+0.028
iked+0.028
ed+0.026
or+0.024
Neg logit contributions
rig-0.048
 rather-0.045
 annoying-0.044
 stump-0.043
 probably-0.041
Token flowers
Token ID12734
Feature activation+3.27e-02

Pos logit contributions
 rather+0.081
 annoying+0.078
rig+0.077
 stump+0.073
 probably+0.070
Neg logit contributions
ged-0.110
iked-0.098
 recess-0.098
or-0.094
ed-0.089
Token and
Token ID290
Feature activation+1.22e-01

Pos logit contributions
 annoying+0.070
rig+0.069
 rather+0.066
 awful+0.063
 intense+0.062
Neg logit contributions
ged-0.043
iked-0.037
,-0.035
 recess-0.035
ies-0.034
Token a
Token ID257
Feature activation+9.52e-02

Pos logit contributions
rig+0.225
 rather+0.214
 annoying+0.205
 stump+0.199
 probably+0.191
Neg logit contributions
ged-0.191
 recess-0.174
iked-0.165
ies-0.158
ed-0.155
Token fountain
Token ID29420
Feature activation+1.39e-01

Pos logit contributions
 rather+0.132
rig+0.132
 annoying+0.126
 stump+0.117
 probably+0.112
Neg logit contributions
ged-0.185
 recess-0.173
iked-0.164
or-0.158
ies-0.158
Token.
Token ID13
Feature activation+1.83e-02

Pos logit contributions
 rather+0.297
 annoying+0.297
rig+0.292
 stump+0.274
 probably+0.271
Neg logit contributions
ged-0.174
iked-0.152
 recess-0.151
or-0.143
ies-0.137
Token She
Token ID1375
Feature activation+1.35e-01

Pos logit contributions
 rather+0.026
rig+0.024
 annoying+0.024
 stump+0.023
 probably+0.022
Neg logit contributions
ged-0.037
iked-0.033
 recess-0.033
or-0.032
ed-0.031
Token liked
Token ID8288
Feature activation-5.06e-03

Pos logit contributions
rig+0.262
 annoying+0.257
 rather+0.255
 stump+0.249
 awful+0.242
Neg logit contributions
ged-0.193
 recess-0.164
iked-0.162
or-0.156
ed-0.154
Token to
Token ID284
Feature activation+1.62e-02

Pos logit contributions
ged+0.005
 recess+0.005
iked+0.005
 roles+0.004
ies+0.004
Neg logit contributions
rig-0.012
 rather-0.011
 annoying-0.011
 stump-0.011
 awful-0.011
Token spray
Token ID11662
Feature activation+1.34e-01

Pos logit contributions
rig+0.033
umbling+0.032
 annoying+0.032
 rather+0.032
 awful+0.032
Neg logit contributions
 recess-0.019
ged-0.018
,-0.016
iked-0.015
ies-0.015
Token water
Token ID1660
Feature activation+1.69e-01

Pos logit contributions
 rather+0.250
rig+0.245
 annoying+0.241
 stump+0.234
 probably+0.225
Neg logit contributions
ged-0.208
 recess-0.182
iked-0.178
or-0.171
ed-0.164
Token bed
Token ID3996
Feature activation-2.45e-02

Pos logit contributions
 annoying+0.173
 rather+0.168
rig+0.164
 awful+0.154
 probably+0.152
Neg logit contributions
ged-0.119
 recess-0.100
iked-0.099
or-0.093
ies-0.091
Token to
Token ID284
Feature activation+1.32e-02

Pos logit contributions
ged+0.033
iked+0.028
ed+0.027
or+0.027
 recess+0.026
Neg logit contributions
 annoying-0.051
rig-0.050
 rather-0.049
 stump-0.046
 probably-0.045
Token sleep
Token ID3993
Feature activation-9.46e-03

Pos logit contributions
 rather+0.023
rig+0.023
 annoying+0.022
 stump+0.021
 awful+0.020
Neg logit contributions
ged-0.021
 recess-0.019
iked-0.018
or-0.018
ies-0.016
Token in
Token ID287
Feature activation-6.35e-02

Pos logit contributions
ged+0.014
iked+0.012
 recess+0.012
or+0.012
ed+0.011
Neg logit contributions
 rather-0.018
rig-0.017
 annoying-0.017
 stump-0.017
 probably-0.016
Token.
Token ID13
Feature activation+2.66e-02

Pos logit contributions
ged+0.074
 recess+0.067
iked+0.057
or+0.056
 roles+0.051
Neg logit contributions
rig-0.141
 annoying-0.138
 rather-0.138
 stump-0.129
 probably-0.125
Token The
Token ID383
Feature activation+8.26e-02

Pos logit contributions
 rather+0.045
 annoying+0.043
rig+0.043
 stump+0.041
 probably+0.039
Neg logit contributions
ged-0.046
iked-0.041
 recess-0.040
or-0.038
ed-0.036
Token puppy
Token ID26188
Feature activation+2.50e-02

Pos logit contributions
rig+0.143
 rather+0.138
 annoying+0.130
 stump+0.127
 unlike+0.120
Neg logit contributions
ged-0.137
 recess-0.125
iked-0.115
ed-0.109
or-0.108
Token was
Token ID373
Feature activation+5.13e-02

Pos logit contributions
 rather+0.050
rig+0.050
 annoying+0.049
 stump+0.046
 awful+0.044
Neg logit contributions
ged-0.034
iked-0.028
 recess-0.027
ed-0.027
or-0.027
Token no
Token ID645
Feature activation+5.22e-02

Pos logit contributions
rig+0.076
 annoying+0.068
pill+0.068
 stump+0.068
 rather+0.067
Neg logit contributions
ged-0.094
 recess-0.080
,-0.079
ed-0.078
iked-0.078
Token longer
Token ID2392
Feature activation+2.52e-03

Pos logit contributions
 rather+0.074
rig+0.073
 annoying+0.071
 stump+0.070
 probably+0.064
Neg logit contributions
ged-0.103
 recess-0.092
iked-0.090
or-0.087
ed-0.083
Token scared
Token ID12008
Feature activation+8.90e-02

Pos logit contributions
rig+0.004
 rather+0.004
 stump+0.003
 annoying+0.003
 wire+0.003
Neg logit contributions
ged-0.005
 recess-0.004
ed-0.004
iked-0.004
ies-0.004
Token They
Token ID1119
Feature activation+1.25e-01

Pos logit contributions
 rather+0.107
rig+0.107
 annoying+0.105
 stump+0.102
 probably+0.099
Neg logit contributions
ged-0.109
 recess-0.101
iked-0.097
 roles-0.090
ed-0.088
Token play
Token ID711
Feature activation-2.10e-02

Pos logit contributions
rig+0.241
 rather+0.239
 stump+0.238
 annoying+0.237
 unlike+0.224
Neg logit contributions
ged-0.167
 recess-0.153
or-0.139
iked-0.138
ies-0.127
Token with
Token ID351
Feature activation-1.46e-02

Pos logit contributions
ged+0.035
iked+0.032
 recess+0.031
or+0.030
ed+0.027
Neg logit contributions
 rather-0.037
 annoying-0.035
rig-0.035
 stump-0.033
 probably-0.032
Token Rex
Token ID17853
Feature activation-4.71e-02

Pos logit contributions
ged+0.018
 recess+0.015
iked+0.014
or+0.013
ed+0.012
Neg logit contributions
 rather-0.032
rig-0.032
 annoying-0.031
 stump-0.030
 probably-0.029
Token and
Token ID290
Feature activation+1.63e-01

Pos logit contributions
ged+0.077
 recess+0.069
iked+0.068
or+0.065
ed+0.060
Neg logit contributions
 rather-0.084
 annoying-0.080
rig-0.079
 stump-0.076
 probably-0.073
Token the
Token ID262
Feature activation-1.52e-04

Pos logit contributions
 rather+0.304
rig+0.293
 annoying+0.293
 stump+0.281
 probably+0.270
Neg logit contributions
ged-0.253
 recess-0.220
iked-0.220
or-0.208
ed-0.193
Token man
Token ID582
Feature activation+4.29e-02

Pos logit contributions
ged+0.000
 recess+0.000
iked+0.000
 roles+0.000
or+0.000
Neg logit contributions
 rather-0.000
rig-0.000
 annoying-0.000
 stump-0.000
 probably-0.000
Token.
Token ID13
Feature activation+7.03e-02

Pos logit contributions
 rather+0.083
rig+0.081
 annoying+0.081
 stump+0.077
 probably+0.075
Neg logit contributions
ged-0.063
 recess-0.054
iked-0.054
or-0.051
ed-0.048
Token They
Token ID1119
Feature activation+1.28e-01

Pos logit contributions
rig+0.118
 rather+0.116
 annoying+0.116
 stump+0.111
 probably+0.110
Neg logit contributions
ged-0.114
 recess-0.107
iked-0.101
 roles-0.095
ed-0.092
Token share
Token ID2648
Feature activation-1.52e-01

Pos logit contributions
rig+0.261
 annoying+0.250
 rather+0.249
 stump+0.248
 intense+0.237
Neg logit contributions
ged-0.160
 recess-0.153
or-0.131
iked-0.127
ies-0.119
Token their
Token ID511
Feature activation-8.78e-03

Pos logit contributions
ged+0.192
iked+0.161
 recess+0.161
or+0.150
ed+0.136
Neg logit contributions
 rather-0.329
 annoying-0.319
rig-0.318
 stump-0.307
 probably-0.297
Token of
Token ID286
Feature activation+7.95e-02

Pos logit contributions
rig+0.276
 rather+0.274
 stump+0.268
 annoying+0.268
 probably+0.259
Neg logit contributions
ged-0.258
 recess-0.223
iked-0.221
or-0.211
ed-0.207
Token help
Token ID1037
Feature activation-3.30e-02

Pos logit contributions
 rather+0.138
rig+0.134
 annoying+0.132
 stump+0.128
 probably+0.122
Neg logit contributions
ged-0.135
 recess-0.120
iked-0.116
or-0.110
ed-0.104
Token from
Token ID422
Feature activation-6.24e-02

Pos logit contributions
ged+0.019
iked+0.012
 recess+0.012
or+0.011
ed+0.008
Neg logit contributions
 rather-0.092
rig-0.092
 annoying-0.091
 stump-0.089
 probably-0.086
Token her
Token ID607
Feature activation-7.73e-02

Pos logit contributions
ged+0.075
 recess+0.063
iked+0.062
or+0.058
ed+0.052
Neg logit contributions
 rather-0.138
rig-0.134
 annoying-0.134
 stump-0.130
 probably-0.125
Token mum
Token ID25682
Feature activation-7.70e-02

Pos logit contributions
ged+0.089
 recess+0.078
iked+0.073
or+0.069
ies+0.063
Neg logit contributions
 rather-0.169
rig-0.168
 annoying-0.167
 stump-0.161
 probably-0.155
Token,
Token ID11
Feature activation-1.80e-01

Pos logit contributions
ged+0.094
 recess+0.080
iked+0.080
or+0.072
ed+0.065
Neg logit contributions
 rather-0.169
rig-0.162
 annoying-0.161
 stump-0.158
 probably-0.153
Token the
Token ID262
Feature activation-1.30e-02

Pos logit contributions
ged+0.241
iked+0.207
 recess+0.203
or+0.193
ed+0.173
Neg logit contributions
 rather-0.376
 annoying-0.361
rig-0.355
 stump-0.343
 probably-0.337
Token garden
Token ID11376
Feature activation+7.73e-02

Pos logit contributions
ged+0.021
 recess+0.021
iked+0.019
or+0.018
 roles+0.018
Neg logit contributions
rig-0.022
 rather-0.021
 annoying-0.020
 stump-0.019
 probably-0.018
Token was
Token ID373
Feature activation+6.25e-02

Pos logit contributions
 rather+0.157
rig+0.156
 annoying+0.154
 stump+0.145
 probably+0.137
Neg logit contributions
ged-0.105
iked-0.091
ed-0.086
 recess-0.086
or-0.084
Token soon
Token ID2582
Feature activation-1.24e-01

Pos logit contributions
 rather+0.088
rig+0.087
 annoying+0.084
 stump+0.080
 probably+0.074
Neg logit contributions
ged-0.123
 recess-0.111
iked-0.110
or-0.104
ed-0.100
Token clean
Token ID3424
Feature activation+5.06e-02

Pos logit contributions
ged+0.214
 recess+0.188
iked+0.186
or+0.174
ed+0.168
Neg logit contributions
 rather-0.210
rig-0.205
 annoying-0.202
 stump-0.192
 probably-0.182
Token and
Token ID290
Feature activation+1.28e-01

Pos logit contributions
ged+0.071
iked+0.062
 recess+0.061
or+0.058
ed+0.055
Neg logit contributions
 rather-0.082
 annoying-0.081
rig-0.080
 stump-0.076
 probably-0.073
Token flew
Token ID13112
Feature activation-6.57e-02

Pos logit contributions
 rather+0.223
 annoying+0.215
rig+0.212
 stump+0.210
 probably+0.205
Neg logit contributions
ged-0.207
 recess-0.190
iked-0.186
or-0.179
rav-0.161
Token away
Token ID1497
Feature activation-1.17e-01

Pos logit contributions
ged+0.087
iked+0.074
 recess+0.073
or+0.068
rav+0.062
Neg logit contributions
 rather-0.138
rig-0.131
 annoying-0.130
 stump-0.128
 probably-0.123
Token with
Token ID351
Feature activation-3.09e-02

Pos logit contributions
ged+0.166
iked+0.142
 recess+0.140
or+0.132
ed+0.125
Neg logit contributions
 rather-0.232
 annoying-0.226
rig-0.226
 stump-0.216
 probably-0.208
Token a
Token ID257
Feature activation+1.03e-01

Pos logit contributions
ged+0.042
 recess+0.038
iked+0.035
or+0.032
ed+0.031
Neg logit contributions
rig-0.064
 rather-0.062
 annoying-0.060
 stump-0.059
 probably-0.056
Token smile
Token ID8212
Feature activation-4.32e-02

Pos logit contributions
 rather+0.175
 annoying+0.174
rig+0.172
 stump+0.159
 probably+0.158
Neg logit contributions
ged-0.157
 recess-0.141
iked-0.130
ies-0.126
,-0.126
Token on
Token ID319
Feature activation-1.42e-01

Pos logit contributions
ged+0.050
 recess+0.043
iked+0.042
or+0.039
 roles+0.035
Neg logit contributions
 rather-0.095
 annoying-0.092
rig-0.091
 stump-0.090
 probably-0.087
Token his
Token ID465
Feature activation-6.95e-02

Pos logit contributions
ged+0.150
 recess+0.120
iked+0.120
or+0.109
ed+0.096
Neg logit contributions
 rather-0.337
rig-0.331
 annoying-0.326
 stump-0.318
 probably-0.305
Token face
Token ID1986
Feature activation+1.00e-01

Pos logit contributions
ged+0.091
iked+0.074
 recess+0.072
ies+0.071
ed+0.067
Neg logit contributions
 rather-0.144
rig-0.133
 annoying-0.132
pill-0.131
 stump-0.129
Token.
Token ID13
Feature activation+2.96e-02

Pos logit contributions
 rather+0.201
rig+0.197
 annoying+0.197
 stump+0.186
 probably+0.180
Neg logit contributions
ged-0.141
iked-0.121
 recess-0.117
or-0.113
ed-0.106
Token He
Token ID679
Feature activation+1.19e-01

Pos logit contributions
 rather+0.039
 annoying+0.038
rig+0.037
 stump+0.035
 probably+0.033
Neg logit contributions
ged-0.061
iked-0.056
 recess-0.055
or-0.053
ed-0.051
Token from
Token ID422
Feature activation-6.95e-02

Pos logit contributions
ged+0.086
iked+0.072
or+0.069
 recess+0.069
ed+0.067
Neg logit contributions
 annoying-0.131
rig-0.130
 rather-0.129
 stump-0.120
 probably-0.118
Token his
Token ID465
Feature activation-6.90e-02

Pos logit contributions
ged+0.087
 recess+0.074
iked+0.071
or+0.067
ed+0.060
Neg logit contributions
 rather-0.150
rig-0.147
 annoying-0.146
 stump-0.140
 probably-0.135
Token bag
Token ID6131
Feature activation+2.25e-02

Pos logit contributions
ged+0.111
 recess+0.107
iked+0.101
or+0.096
 roles+0.094
Neg logit contributions
 annoying-0.113
rig-0.113
 rather-0.112
 stump-0.100
 intense-0.099
Token until
Token ID1566
Feature activation-1.86e-01

Pos logit contributions
 annoying+0.046
 rather+0.046
rig+0.045
 stump+0.042
 probably+0.041
Neg logit contributions
ged-0.030
iked-0.026
or-0.025
 recess-0.025
ed-0.024
Token he
Token ID339
Feature activation-3.04e-02

Pos logit contributions
ged+0.295
iked+0.264
or+0.261
 recess+0.254
rav+0.242
Neg logit contributions
 rather-0.324
 annoying-0.305
 probably-0.297
 stump-0.293
 awful-0.288
Token had
Token ID550
Feature activation-5.70e-02

Pos logit contributions
ged+0.043
iked+0.036
 recess+0.035
or+0.034
ed+0.033
Neg logit contributions
 rather-0.062
rig-0.060
 annoying-0.060
 stump-0.057
 probably-0.054
Token enough
Token ID1576
Feature activation-9.12e-02

Pos logit contributions
ged+0.091
 recess+0.081
iked+0.080
or+0.074
ed+0.072
Neg logit contributions
rig-0.102
 rather-0.102
 annoying-0.099
 stump-0.096
 wire-0.088
Token energy
Token ID2568
Feature activation+3.80e-02

Pos logit contributions
ged+0.147
iked+0.130
 recess+0.130
or+0.122
ed+0.115
Neg logit contributions
 rather-0.163
 annoying-0.158
rig-0.158
 stump-0.150
 probably-0.144
Token to
Token ID284
Feature activation+1.20e-02

Pos logit contributions
 rather+0.072
 annoying+0.071
rig+0.071
 stump+0.067
 probably+0.064
Neg logit contributions
ged-0.058
iked-0.050
 recess-0.049
or-0.047
ed-0.044
Token yield
Token ID7800
Feature activation-1.08e-02

Pos logit contributions
 annoying+0.023
 rather+0.022
rig+0.022
pill+0.021
 unlike+0.021
Neg logit contributions
ged-0.017
 recess-0.016
or-0.015
iked-0.015
 later-0.014
Token himself
Token ID2241
Feature activation-7.76e-02

Pos logit contributions
ged+0.015
 recess+0.012
iked+0.012
or+0.012
ed+0.011
Neg logit contributions
 rather-0.022
rig-0.021
 annoying-0.021
 stump-0.020
 probably-0.020
Token carrots
Token ID34397
Feature activation+1.83e-02

Pos logit contributions
ged+0.107
 recess+0.096
iked+0.094
or+0.089
 roles+0.084
Neg logit contributions
 rather-0.119
rig-0.116
 annoying-0.116
 stump-0.111
 probably-0.106
Token to
Token ID284
Feature activation+1.39e-02

Pos logit contributions
 rather+0.035
 annoying+0.034
rig+0.034
 stump+0.033
 probably+0.031
Neg logit contributions
ged-0.027
iked-0.023
 recess-0.023
or-0.022
ed-0.021
Token chew
Token ID34722
Feature activation-1.93e-02

Pos logit contributions
 rather+0.026
rig+0.026
 annoying+0.025
 stump+0.024
 awful+0.023
Neg logit contributions
ged-0.021
 recess-0.019
iked-0.018
or-0.017
ies-0.015
Token on
Token ID319
Feature activation-1.41e-01

Pos logit contributions
ged+0.027
 recess+0.022
iked+0.022
or+0.022
ed+0.020
Neg logit contributions
 rather-0.039
rig-0.039
 annoying-0.038
 stump-0.037
 probably-0.035
Token.
Token ID13
Feature activation+2.02e-02

Pos logit contributions
ged+0.157
 recess+0.131
or+0.124
,+0.114
 roles+0.113
Neg logit contributions
rig-0.318
 annoying-0.305
 rather-0.296
 stump-0.286
 intense-0.279
Token<|endoftext|>
Token ID50256
Feature activation-1.22e-01

Pos logit contributions
 annoying+0.042
 rather+0.042
rig+0.041
 stump+0.039
 probably+0.038
Neg logit contributions
ged-0.025
 recess-0.022
iked-0.022
or-0.021
ed-0.019
TokenOnce
Token ID7454
Feature activation-1.81e-01

Pos logit contributions
ged+0.264
iked+0.252
ies+0.231
 recess+0.231
or+0.225
Neg logit contributions
 rather-0.144
 stump-0.144
rig-0.136
 probably-0.127
 wire-0.120
Token upon
Token ID2402
Feature activation-8.26e-02

Pos logit contributions
ged+0.309
iked+0.271
 recess+0.264
ipped+0.254
ed+0.252
Neg logit contributions
 rather-0.293
rig-0.281
 really-0.266
 truly-0.262
 probably-0.259
Token a
Token ID257
Feature activation+8.72e-02

Pos logit contributions
ged+0.131
 recess+0.113
iked+0.107
ed+0.104
 later+0.102
Neg logit contributions
rig-0.145
 rather-0.139
 annoying-0.139
 stump-0.128
 electric-0.124
Token time
Token ID640
Feature activation-9.12e-02

Pos logit contributions
 rather+0.163
 annoying+0.162
 stump+0.159
rig+0.156
 probably+0.147
Neg logit contributions
ged-0.122
 recess-0.106
iked-0.102
or-0.098
ies-0.089
Token,
Token ID11
Feature activation-1.91e-01

Pos logit contributions
ged+0.153
 recess+0.144
iked+0.132
or+0.132
 roles+0.129
Neg logit contributions
 rather-0.153
 unlike-0.129
 probably-0.129
 really-0.129
-0.127
Token happily
Token ID18177
Feature activation-9.33e-02

Pos logit contributions
ged+0.173
iked+0.144
ed+0.135
or+0.134
 recess+0.125
Neg logit contributions
 rather-0.270
 annoying-0.269
rig-0.264
 stump-0.258
 awful-0.246
Token showed
Token ID3751
Feature activation-1.29e-01

Pos logit contributions
ged+0.151
 recess+0.135
iked+0.132
or+0.126
 roles+0.118
Neg logit contributions
 rather-0.165
rig-0.159
 annoying-0.158
 stump-0.151
 probably-0.145
Token it
Token ID340
Feature activation-1.82e-02

Pos logit contributions
ged+0.212
iked+0.188
 recess+0.166
olph+0.162
rav+0.161
Neg logit contributions
 rather-0.227
 annoying-0.211
 tre-0.205
 wire-0.198
 sharp-0.198
Token to
Token ID284
Feature activation+1.35e-02

Pos logit contributions
ged+0.021
iked+0.020
 recess+0.017
ed+0.017
or+0.017
Neg logit contributions
 annoying-0.040
 rather-0.039
rig-0.039
 stump-0.038
 probably-0.036
Token her
Token ID607
Feature activation-8.84e-02

Pos logit contributions
 rather+0.028
 annoying+0.027
 probably+0.025
rig+0.024
 tre+0.024
Neg logit contributions
ged-0.019
iked-0.017
or-0.015
 recess-0.015
ed-0.014
Token mom
Token ID1995
Feature activation-5.85e-02

Pos logit contributions
ged+0.123
iked+0.102
 recess+0.097
or+0.095
ed+0.087
Neg logit contributions
 rather-0.190
rig-0.180
 annoying-0.178
 stump-0.170
 probably-0.168
Tokenmy
Token ID1820
Feature activation-1.23e-01

Pos logit contributions
ged+0.098
 recess+0.089
iked+0.087
or+0.079
 roles+0.076
Neg logit contributions
 rather-0.107
rig-0.099
 stump-0.097
 annoying-0.095
 probably-0.092
Token.
Token ID13
Feature activation+2.82e-02

Pos logit contributions
ged+0.214
 recess+0.193
iked+0.191
or+0.179
ed+0.169
Neg logit contributions
 rather-0.208
rig-0.197
 annoying-0.193
 stump-0.190
 probably-0.180
Token The
Token ID383
Feature activation+8.57e-02

Pos logit contributions
 rather+0.045
rig+0.042
 annoying+0.042
 stump+0.040
 probably+0.038
Neg logit contributions
ged-0.052
 recess-0.046
iked-0.046
or-0.044
ed-0.041
Token crystal
Token ID15121
Feature activation+5.04e-02

Pos logit contributions
rig+0.139
 rather+0.137
 annoying+0.128
 stump+0.121
 probably+0.112
Neg logit contributions
ged-0.153
 recess-0.138
iked-0.132
or-0.127
ed-0.126
Token was
Token ID373
Feature activation+5.69e-02

Pos logit contributions
 rather+0.101
rig+0.100
 annoying+0.098
 stump+0.090
 awful+0.086
Neg logit contributions
ged-0.071
ies-0.058
iked-0.058
ed-0.057
or-0.053
Token<|endoftext|>
Token ID50256
Feature activation-1.18e-01

Pos logit contributions
 rather+0.034
rig+0.032
 stump+0.031
 probably+0.030
 tre+0.029
Neg logit contributions
ged-0.036
 recess-0.032
iked-0.031
or-0.030
 roles-0.029
TokenOnce
Token ID7454
Feature activation-1.85e-01

Pos logit contributions
ged+0.246
iked+0.234
 recess+0.215
ies+0.214
or+0.209
Neg logit contributions
 rather-0.148
 stump-0.148
rig-0.141
 probably-0.131
 annoying-0.125
Token upon
Token ID2402
Feature activation-8.51e-02

Pos logit contributions
ged+0.320
iked+0.290
 recess+0.278
 roles+0.274
ipped+0.271
Neg logit contributions
 rather-0.277
 really-0.261
rig-0.260
 much-0.250
 probably-0.249
Token a
Token ID257
Feature activation+8.40e-02

Pos logit contributions
ged+0.128
 recess+0.112
iked+0.103
ed+0.102
 later+0.101
Neg logit contributions
rig-0.156
 rather-0.148
 annoying-0.147
 stump-0.137
 electric-0.134
Token time
Token ID640
Feature activation-9.44e-02

Pos logit contributions
 rather+0.159
 annoying+0.157
 stump+0.156
rig+0.151
 probably+0.145
Neg logit contributions
ged-0.115
 recess-0.101
iked-0.096
or-0.092
ies-0.083
Token,
Token ID11
Feature activation-1.96e-01

Pos logit contributions
ged+0.161
 recess+0.155
 roles+0.146
or+0.145
iked+0.144
Neg logit contributions
 rather-0.149
 really-0.129
-0.125
 unlike-0.123
 because-0.121
Token there
Token ID612
Feature activation-1.88e-01

Pos logit contributions
iked+0.327
 recess+0.319
or+0.311
ged+0.303
 roles+0.288
Neg logit contributions
 rather-0.322
 probably-0.317
 stump-0.308
 annoying-0.306
 wire-0.305
Token was
Token ID373
Feature activation+3.83e-02

Pos logit contributions
ged+0.406
iked+0.388
 recess+0.384
or+0.360
 roles+0.345
Neg logit contributions
 rather-0.230
 probably-0.218
 annoying-0.207
 stump-0.203
 wire-0.199
Token a
Token ID257
Feature activation+8.53e-02

Pos logit contributions
rig+0.066
 rather+0.063
 annoying+0.063
 stump+0.061
 probably+0.056
Neg logit contributions
ged-0.063
 recess-0.060
iked-0.054
ed-0.051
 later-0.049
Token little
Token ID1310
Feature activation+1.13e-01

Pos logit contributions
 rather+0.145
rig+0.142
 stump+0.136
 annoying+0.134
 probably+0.127
Neg logit contributions
ged-0.134
 recess-0.124
ies-0.114
iked-0.112
or-0.111
Token girl
Token ID2576
Feature activation-4.78e-02

Pos logit contributions
rig+0.273
 rather+0.268
 annoying+0.255
 probably+0.254
 stump+0.251
Neg logit contributions
ged-0.109
 recess-0.097
ies-0.096
,-0.084
or-0.078
Token,
Token ID11
Feature activation-2.00e-01

Pos logit contributions
ged+0.140
or+0.122
iked+0.120
 recess+0.111
ipped+0.109
Neg logit contributions
 rather-0.181
 annoying-0.170
 probably-0.164
 stump-0.163
 wire-0.158
Token all
Token ID477
Feature activation-1.21e-03

Pos logit contributions
ged+0.291
 recess+0.266
or+0.262
iked+0.259
rav+0.237
Neg logit contributions
 rather-0.377
 annoying-0.350
 probably-0.346
 stump-0.339
 wire-0.327
Token of
Token ID286
Feature activation+6.83e-02

Pos logit contributions
ged+0.002
iked+0.001
 recess+0.001
or+0.001
ed+0.001
Neg logit contributions
 rather-0.003
 annoying-0.002
rig-0.002
 stump-0.002
 probably-0.002
Token the
Token ID262
Feature activation-3.82e-02

Pos logit contributions
rig+0.137
 rather+0.132
 annoying+0.126
 stump+0.121
 probably+0.115
Neg logit contributions
ged-0.097
 recess-0.092
iked-0.083
 later-0.075
ies-0.072
Token sudden
Token ID4802
Feature activation+1.84e-02

Pos logit contributions
ged+0.059
 recess+0.054
iked+0.053
or+0.050
ies+0.048
Neg logit contributions
 rather-0.070
rig-0.068
 annoying-0.065
 stump-0.064
 probably-0.062
Token,
Token ID11
Feature activation-1.99e-01

Pos logit contributions
 wire+0.032
 smoke+0.032
 sharp+0.032
 rather+0.031
 spe+0.031
Neg logit contributions
ged-0.027
or-0.023
 recess-0.022
iked-0.021
rav-0.021
Token it
Token ID340
Feature activation-2.54e-02

Pos logit contributions
ged+0.271
iked+0.231
 recess+0.230
or+0.220
ed+0.196
Neg logit contributions
 rather-0.409
 annoying-0.395
rig-0.383
 stump-0.380
 probably-0.371
Token started
Token ID2067
Feature activation-6.28e-02

Pos logit contributions
ged+0.032
iked+0.027
 recess+0.026
or+0.025
ed+0.024
Neg logit contributions
 rather-0.055
rig-0.053
 annoying-0.052
 stump-0.051
 probably-0.048
Token to
Token ID284
Feature activation+1.05e-02

Pos logit contributions
ged+0.058
iked+0.053
 recess+0.048
 roles+0.044
,+0.042
Neg logit contributions
rig-0.149
 annoying-0.142
 rather-0.142
 certainly-0.137
 awful-0.135
Token rain
Token ID6290
Feature activation+1.02e-01

Pos logit contributions
rig+0.020
 rather+0.018
 prized+0.018
pill+0.018
 annoying+0.017
Neg logit contributions
,-0.013
 recess-0.013
 later-0.013
ged-0.012
ies-0.012
Token.
Token ID13
Feature activation+1.44e-02

Pos logit contributions
 rather+0.220
rig+0.216
 annoying+0.212
 probably+0.204
 likely+0.203
Neg logit contributions
ged-0.115
iked-0.095
ed-0.094
 recess-0.091
,-0.091
Token his
Token ID465
Feature activation-5.92e-02

Pos logit contributions
ged+0.142
iked+0.115
 recess+0.114
ed+0.103
or+0.101
Neg logit contributions
 rather-0.256
rig-0.251
 annoying-0.247
 stump-0.238
 probably-0.231
Token tail
Token ID7894
Feature activation+4.55e-02

Pos logit contributions
ged+0.127
 recess+0.113
iked+0.111
ed+0.105
ies+0.105
Neg logit contributions
 rather-0.077
rig-0.072
 annoying-0.071
 stump-0.064
 probably-0.062
Token.
Token ID13
Feature activation+5.53e-02

Pos logit contributions
rig+0.087
 annoying+0.087
 rather+0.083
 stump+0.077
 probably+0.076
Neg logit contributions
ged-0.070
iked-0.059
ed-0.058
or-0.053
 recess-0.053
Token He
Token ID679
Feature activation+1.28e-01

Pos logit contributions
 rather+0.085
 annoying+0.083
rig+0.082
 stump+0.077
 probably+0.074
Neg logit contributions
ged-0.099
 recess-0.092
iked-0.092
or-0.084
ed-0.082
Token l
Token ID300
Feature activation+1.39e-01

Pos logit contributions
 rather+0.251
rig+0.250
 annoying+0.247
 stump+0.246
 awful+0.233
Neg logit contributions
ged-0.170
iked-0.134
 recess-0.132
ed-0.132
or-0.127
Tokenicked
Token ID9484
Feature activation-2.18e-01

Pos logit contributions
 rather+0.368
 annoying+0.361
rig+0.360
 stump+0.349
 probably+0.340
Neg logit contributions
ged-0.109
iked-0.081
 recess-0.079
or-0.069
ed-0.061
Token the
Token ID262
Feature activation-5.67e-03

Pos logit contributions
ged+0.273
iked+0.236
 recess+0.235
or+0.216
 roles+0.192
Neg logit contributions
 rather-0.466
 annoying-0.449
rig-0.443
 stump-0.428
 probably-0.418
Token fat
Token ID3735
Feature activation+1.64e-01

Pos logit contributions
ged+0.010
 recess+0.009
iked+0.009
ed+0.008
 roles+0.008
Neg logit contributions
 rather-0.009
rig-0.009
 annoying-0.008
 likely-0.008
 probably-0.008
Token boy
Token ID2933
Feature activation+3.26e-03

Pos logit contributions
rig+0.298
 rather+0.283
 annoying+0.267
 likely+0.254
 probably+0.253
Neg logit contributions
ged-0.289
 recess-0.233
iked-0.230
ed-0.230
ies-0.221
Token's
Token ID338
Feature activation+9.10e-02

Pos logit contributions
 annoying+0.006
 rather+0.006
rig+0.006
 stump+0.005
 probably+0.005
Neg logit contributions
ged-0.005
iked-0.004
ed-0.004
or-0.004
ies-0.004
Token face
Token ID1986
Feature activation+1.05e-01

Pos logit contributions
rig+0.158
 rather+0.158
 annoying+0.152
 likely+0.146
 probably+0.142
Neg logit contributions
ged-0.148
ed-0.118
iked-0.118
 recess-0.116
ies-0.110
Token drink
Token ID4144
Feature activation+2.63e-02

Pos logit contributions
 rather+0.130
 annoying+0.125
rig+0.124
 stump+0.119
 probably+0.114
Neg logit contributions
ged-0.140
iked-0.125
 recess-0.125
or-0.119
ed-0.111
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
rig+0.058
 annoying+0.057
 rather+0.056
 stump+0.054
 probably+0.052
Neg logit contributions
ged-0.032
iked-0.026
or-0.026
ed-0.025
 recess-0.025
Token His
Token ID2399
Feature activation+1.61e-04

Pos logit contributions
 rather+0.016
rig+0.015
 annoying+0.015
 stump+0.015
 probably+0.014
Neg logit contributions
ged-0.019
 recess-0.017
iked-0.017
or-0.016
 roles-0.015
Token mom
Token ID1995
Feature activation-7.53e-02

Pos logit contributions
 rather+0.000
 annoying+0.000
rig+0.000
 stump+0.000
 probably+0.000
Neg logit contributions
ged-0.000
 recess-0.000
iked-0.000
or-0.000
ed-0.000
Token said
Token ID531
Feature activation-1.82e-01

Pos logit contributions
ged+0.100
iked+0.083
 recess+0.080
or+0.077
ed+0.073
Neg logit contributions
 rather-0.157
rig-0.153
 annoying-0.153
 stump-0.147
 probably-0.140
Token,
Token ID11
Feature activation-1.97e-01

Pos logit contributions
 recess+0.211
ged+0.204
iked+0.180
 roles+0.164
or+0.163
Neg logit contributions
 rather-0.413
 annoying-0.372
 probably-0.369
 awful-0.362
 unlike-0.353
Token "
Token ID366
Feature activation-1.29e-02

Pos logit contributions
ged+0.293
 recess+0.259
iked+0.254
or+0.240
ed+0.222
Neg logit contributions
 rather-0.378
 annoying-0.362
rig-0.361
 stump-0.348
 probably-0.335
TokenJohn
Token ID7554
Feature activation-1.24e-01

Pos logit contributions
ged+0.023
iked+0.021
 recess+0.020
or+0.020
ed+0.019
Neg logit contributions
 rather-0.021
rig-0.020
 annoying-0.020
 stump-0.019
 probably-0.018
Token,
Token ID11
Feature activation-1.95e-01

Pos logit contributions
ged+0.109
 recess+0.092
iked+0.086
or+0.071
 roles+0.063
Neg logit contributions
 rather-0.315
 annoying-0.302
rig-0.299
 stump-0.295
 probably-0.294
Token drink
Token ID4144
Feature activation+2.97e-02

Pos logit contributions
ged+0.261
or+0.230
iked+0.229
 recess+0.221
rav+0.204
Neg logit contributions
rig-0.381
 rather-0.370
 stump-0.369
 annoying-0.368
 intense-0.342
Token some
Token ID617
Feature activation-1.63e-02

Pos logit contributions
rig+0.054
 rather+0.054
 annoying+0.052
 stump+0.051
 probably+0.048
Neg logit contributions
ged-0.046
or-0.039
 recess-0.039
iked-0.038
ed-0.037
Token know
Token ID760
Feature activation-6.65e-02

Pos logit contributions
 rather+0.238
 annoying+0.228
rig+0.228
 stump+0.217
 probably+0.207
Neg logit contributions
ged-0.261
iked-0.232
 recess-0.231
or-0.221
ed-0.208
Token that
Token ID326
Feature activation+8.28e-02

Pos logit contributions
ged+0.086
 recess+0.074
iked+0.074
or+0.068
ed+0.062
Neg logit contributions
 rather-0.142
 annoying-0.136
rig-0.133
 stump-0.130
 probably-0.128
Token L
Token ID406
Feature activation-7.86e-02

Pos logit contributions
 rather+0.186
rig+0.182
 annoying+0.181
 stump+0.175
 probably+0.169
Neg logit contributions
ged-0.096
 recess-0.080
iked-0.079
or-0.074
ed-0.066
Tokenila
Token ID10102
Feature activation-3.74e-02

Pos logit contributions
ged+0.093
 recess+0.078
iked+0.077
or+0.068
 roles+0.065
Neg logit contributions
 rather-0.172
rig-0.171
 annoying-0.168
 stump-0.161
 probably-0.159
Token was
Token ID373
Feature activation+7.91e-02

Pos logit contributions
ged+0.057
iked+0.048
 recess+0.047
or+0.047
ed+0.044
Neg logit contributions
 rather-0.071
rig-0.069
 annoying-0.068
 stump-0.065
 intense-0.062
Token inside
Token ID2641
Feature activation-1.93e-01

Pos logit contributions
rig+0.141
 rather+0.139
 annoying+0.133
 stump+0.130
 intense+0.121
Neg logit contributions
ged-0.127
 recess-0.111
iked-0.106
or-0.101
ed-0.099
Token.
Token ID13
Feature activation+6.06e-02

Pos logit contributions
ged+0.253
 recess+0.217
iked+0.211
or+0.202
ed+0.183
Neg logit contributions
rig-0.398
 rather-0.398
 annoying-0.391
 stump-0.376
 probably-0.359
Token She
Token ID1375
Feature activation+1.69e-01

Pos logit contributions
 rather+0.086
 annoying+0.082
rig+0.082
 stump+0.077
 probably+0.073
Neg logit contributions
ged-0.121
iked-0.109
 recess-0.109
or-0.104
ed-0.099
Token picked
Token ID6497
Feature activation-9.65e-02

Pos logit contributions
 rather+0.379
rig+0.370
 annoying+0.359
 stump+0.359
 highest+0.332
Neg logit contributions
ged-0.202
ed-0.154
iked-0.149
or-0.140
 recess-0.135
Token up
Token ID510
Feature activation-3.36e-02

Pos logit contributions
ged+0.141
iked+0.129
 recess+0.126
or+0.111
rav+0.108
Neg logit contributions
 rather-0.179
 annoying-0.172
rig-0.170
 stump-0.165
 probably-0.163
Token the
Token ID262
Feature activation-4.89e-03

Pos logit contributions
ged+0.045
iked+0.039
 recess+0.039
or+0.036
ed+0.033
Neg logit contributions
 rather-0.070
 annoying-0.067
rig-0.067
 stump-0.065
 probably-0.062