TOP ACTIVATIONS
MAX = 0.265

her mom my watched in amaz ement as the machine flew
't know the rh in oc eros was wild . Can
When Daisy put the magn ifying glass down , the cater
John chose some carrots and cuc umbers to take home .
giving him too many ch ocol ates ." The king learned
he would take the av oc ados and scatter them around
reward . It was ch ocol ates and sweets !
you go , my d arl ings . Enjoy your lunch
give to the rh in oc eros . Lily walked to
like lettuce , tomatoes and cuc umbers . He smiled as
there was a rh in oc eros who was always naughty
go ! Everyone watched in amaz ement as it took off
am taking the rh in oc eros to a new home
. Tim my watched with amaz ement as he saw stars
eddy bear and watched in amaz ement as it shrunk an
as he opened the ch ocol ates and started eating them
egg ! He watched in amaz ement as the egg began
air . He watched with amaz ement as the z ig
quiet compared to the noisy se ag ull s outside .
children put on their bath ro bes and ran around in

MIN ACTIVATIONS
MIN = -0.112

pot in the sky . Joe was so happy that he
river and went home . Mark learned an important lesson that
! As Daisy enjoyed the honey , she
had a great time . Joe was happy and he didn
When they got there , John 's dad had a surprise
him for the gift . Bob smiled and waved goodbye ,
mistakes . When John arrived home , the sun
after that . Every day Lucy would observe the sheep and
Bob and the man . Bob was happy because of his
the theater . Even though John was only three , he
to play with . As Lucy was pushing the soap around
After that day Sam was always happy to go
. And that 's how Tim shared his special surprise with
mom my drove home . John was very happy and thanked
time and said goodbye . Joe felt much better now that
's the story of how Mia and her mom turned apples
the mug . Unfortunately , John was too honest and had
time to go home . Tim said goodbye to the squirrel
. From then on , Sandy treated her little gray t
Max stopped sh ivering . Tim learned that it 's important

INTERVAL 0.171 - 0.265
CONTAINS 0.036%

as he opened the ch ocol ates and started eating them
egg ! He watched in amaz ement as the egg began
go ! Everyone watched in amaz ement as it took off
eddy bear and watched in amaz ement as it shrunk an
that the bad rh in oc eros was now nice .

INTERVAL 0.077 - 0.171
CONTAINS 3.321%

the sunshine . <|endoftext|> Once upon a time , there was
a time , in a small house , there lived a
She wanted to take a picture now ! M
, " That 's my dictionary . It helps me learn
saw a beautiful bird in a tree .

INTERVAL -0.018 - 0.077
CONTAINS 62.167%

father , " What 's wrong ?" His
, and looked after it like it was his own .
hand . He knew it was something special and he couldn
She tried to get on , but she was too small
pilot realize the power of being generous . From then on

INTERVAL -0.112 - -0.018
CONTAINS 34.476%

goes to his room and takes his trumpet . He blows
" cup cake " to create her game .
before agreeing . He happily put the pe b ble in
. She hugs her friends and says you are welcome .
said Joe . The bunny hopped over to Joe
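
A note on the INTERVAL blocks above: the four reported ranges look like equal-width bins spanning MIN to MAX, and the four CONTAINS percentages add up to 100%. The dump does not say how the bins were built, so the equal-width rule in this sketch is an assumption inferred from the printed numbers, and `bin_edges` is a hypothetical helper rather than part of any dashboard tool.

```python
# Sketch: check that the reported interval edges match equal-width binning
# between MIN and MAX. The equal-width rule is an assumption inferred from
# the numbers in the dump, not a documented property of the tool.

def bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi (hypothetical helper)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# Values copied from the dump.
reported_min, reported_max = -0.112, 0.265
reported_edges = [-0.112, -0.018, 0.077, 0.171, 0.265]
# CONTAINS percentages, ordered from the lowest interval to the highest.
reported_shares = [34.476, 62.167, 3.321, 0.036]

edges = bin_edges(reported_min, reported_max, 4)

# The dump prints three decimals, so compare with a matching tolerance.
edges_match = all(abs(e - r) < 1e-3 for e, r in zip(edges, reported_edges))
total_share = sum(reported_shares)

print("edges match equal-width bins:", edges_match)
print("total share (%):", round(total_share, 3))
```

Read this way, about 96.6% of token positions fall in the two lowest bins, and only 0.036% reach the top bin that holds the strongest examples.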
Token her
Token ID607
Feature activation+1.30e-01

Pos logit contributions
Who+0.066
 had+0.063
 other+0.061
Where+0.060
 moral+0.059
Neg logit contributions
cient-0.108
ute-0.108
ap-0.108
uttering-0.107
ach-0.107
Token mom
Token ID1995
Feature activation+8.51e-02

Pos logit contributions
uttering+0.216
cient+0.212
iddy+0.211
ute+0.209
ach+0.209
Neg logit contributions
 had-0.183
 other-0.182
 moral-0.164
 also-0.160
Who-0.160
Tokenmy
Token ID1820
Feature activation+8.50e-02

Pos logit contributions
ute+0.169
cient+0.169
ap+0.168
uttering+0.167
ach+0.167
Neg logit contributions
Who-0.080
 had-0.078
 other-0.074
Where-0.071
 moral-0.071
Token watched
Token ID7342
Feature activation+1.17e-01

Pos logit contributions
ute+0.169
uttering+0.169
ach+0.167
ap+0.166
cient+0.166
Neg logit contributions
 had-0.084
Who-0.080
 other-0.076
 moral-0.071
 dial-0.071
Token in
Token ID287
Feature activation+6.83e-02

Pos logit contributions
ute+0.230
uttering+0.228
cient+0.227
 Majesty+0.223
ower+0.218
Neg logit contributions
 other-0.129
 had-0.123
Who-0.122
 dial-0.113
Where-0.109
Token amaz
Token ID40642
Feature activation+2.02e-01

Pos logit contributions
cient+0.120
ute+0.119
ower+0.116
uttering+0.116
ap+0.114
Neg logit contributions
Who-0.087
 had-0.086
 other-0.084
Where-0.081
 moral-0.079
Tokenement
Token ID972
Feature activation+1.21e-01

Pos logit contributions
cient+0.398
ute+0.387
oppy+0.383
uttering+0.376
sed+0.374
Neg logit contributions
 other-0.206
 sounded-0.203
 had-0.193
Who-0.193
He-0.190
Token as
Token ID355
Feature activation+8.85e-02

Pos logit contributions
cient+0.244
ute+0.236
ocol+0.235
 Majesty+0.233
uttering+0.231
Neg logit contributions
 had-0.125
 other-0.116
Who-0.104
 sounded-0.103
 shouted-0.102
Token the
Token ID262
Feature activation+6.16e-02

Pos logit contributions
cient+0.147
ap+0.147
ute+0.147
ach+0.145
uttering+0.145
Neg logit contributions
Who-0.111
 had-0.107
 other-0.104
Where-0.102
 moral-0.101
Token machine
Token ID4572
Feature activation+6.12e-02

Pos logit contributions
ummy+0.091
ach+0.090
ap+0.090
cient+0.090
ute+0.089
Neg logit contributions
Who-0.086
He-0.081
Where-0.081
 had-0.081
Less-0.080
Token flew
Token ID13112
Feature activation+9.53e-02

Pos logit contributions
ap+0.121
cient+0.120
ute+0.118
ach+0.117
ower+0.117
Neg logit contributions
Who-0.059
Where-0.053
 moral-0.052
Less-0.051
 other-0.051
Token't
Token ID470
Feature activation+1.15e-01

Pos logit contributions
 had+0.029
Who+0.025
 other+0.025
 also+0.023
 let+0.023
Neg logit contributions
ute-0.038
uttering-0.037
ower-0.035
ach-0.035
cient-0.034
Token know
Token ID760
Feature activation+1.08e-01

Pos logit contributions
cient+0.225
ute+0.225
ummy+0.223
uttering+0.222
ach+0.220
Neg logit contributions
 had-0.123
Who-0.120
 other-0.117
 moral-0.107
 also-0.105
Token the
Token ID262
Feature activation+9.38e-02

Pos logit contributions
uttering+0.176
ute+0.176
ower+0.170
cient+0.168
ach+0.167
Neg logit contributions
 other-0.159
 had-0.157
Who-0.154
 sounded-0.141
 also-0.141
Token rh
Token ID9529
Feature activation+1.72e-01

Pos logit contributions
ap+0.156
uttering+0.155
ach+0.154
cient+0.153
iddy+0.152
Neg logit contributions
 other-0.124
 had-0.122
Who-0.122
 moral-0.118
Where-0.112
Tokenin
Token ID259
Feature activation+7.22e-02

Pos logit contributions
icol+0.244
iddy+0.243
uttering+0.243
 Majesty+0.243
ap+0.242
Neg logit contributions
 other-0.254
Who-0.253
 had-0.252
 moral-0.240
Where-0.240
Tokenoc
Token ID420
Feature activation+1.93e-01

Pos logit contributions
cient+0.105
ach+0.102
ute+0.102
ap+0.102
ummy+0.100
Neg logit contributions
Who-0.107
 had-0.101
 other-0.100
Where-0.100
 moral-0.098
Tokeneros
Token ID27498
Feature activation+8.24e-02

Pos logit contributions
uttering+0.207
ute+0.175
utter+0.173
iddy+0.165
ap+0.165
Neg logit contributions
 other-0.382
Who-0.370
 moral-0.367
 Moral-0.364
 dial-0.353
Token was
Token ID373
Feature activation+7.20e-02

Pos logit contributions
cient+0.187
ap+0.186
ute+0.185
ach+0.184
uttering+0.183
Neg logit contributions
Who-0.054
 had-0.049
 other-0.047
Where-0.046
 moral-0.045
Token wild
Token ID4295
Feature activation+1.04e-01

Pos logit contributions
ute+0.136
cient+0.134
ap+0.133
ower+0.132
ach+0.132
Neg logit contributions
 had-0.083
Who-0.078
 other-0.078
Where-0.074
 moral-0.073
Token.
Token ID13
Feature activation+6.67e-02

Pos logit contributions
ute+0.219
uttering+0.217
ocol+0.216
cient+0.215
oppy+0.214
Neg logit contributions
 had-0.104
 other-0.094
 sounded-0.087
Who-0.082
 dial-0.079
Token Can
Token ID1680
Feature activation+8.86e-02

Pos logit contributions
cient+0.112
ap+0.111
ute+0.110
ach+0.109
ummy+0.109
Neg logit contributions
Who-0.083
 had-0.081
 other-0.079
 moral-0.077
Where-0.076
Token When
Token ID1649
Feature activation+9.11e-02

Pos logit contributions
cient+0.169
ute+0.169
uttering+0.169
ap+0.168
ach+0.167
Neg logit contributions
Who-0.093
 had-0.090
 other-0.086
Where-0.084
 moral-0.082
Token Daisy
Token ID40355
Feature activation+2.75e-02

Pos logit contributions
uttering+0.187
ute+0.186
ower+0.183
cient+0.181
iddy+0.180
Neg logit contributions
 had-0.097
Who-0.091
 other-0.090
 sounded-0.079
Where-0.079
Token put
Token ID1234
Feature activation+7.42e-02

Pos logit contributions
cient+0.058
ap+0.058
ute+0.057
ach+0.057
ummy+0.057
Neg logit contributions
Who-0.022
 other-0.020
 had-0.020
Where-0.020
 moral-0.020
Token the
Token ID262
Feature activation+6.10e-02

Pos logit contributions
cient+0.143
uttering+0.140
ute+0.140
ach+0.140
ap+0.139
Neg logit contributions
 had-0.081
 other-0.081
Who-0.078
Where-0.073
 sounded-0.071
Token magn
Token ID7842
Feature activation+1.23e-01

Pos logit contributions
ap+0.094
uttering+0.094
iddy+0.093
cient+0.093
ach+0.093
Neg logit contributions
 other-0.089
 had-0.085
Who-0.084
 moral-0.081
 dial-0.079
Tokenifying
Token ID4035
Feature activation+1.92e-01

Pos logit contributions
 Majesty+0.134
 disl+0.132
cient+0.127
ap+0.124
ach+0.122
Neg logit contributions
 had-0.232
Who-0.223
 other-0.219
He-0.218
Where-0.217
Token glass
Token ID5405
Feature activation+1.10e-01

Pos logit contributions
uttering+0.398
 Majesty+0.391
ower+0.390
cient+0.381
iddy+0.371
Neg logit contributions
 other-0.214
 had-0.211
See-0.191
 also-0.183
Oh-0.181
Token down
Token ID866
Feature activation+7.44e-02

Pos logit contributions
cient+0.162
 Majesty+0.161
uttering+0.159
ute+0.152
ocol+0.150
Neg logit contributions
 had-0.187
 other-0.177
 sounded-0.165
He-0.161
 also-0.159
Token,
Token ID11
Feature activation+6.10e-02

Pos logit contributions
cient+0.099
ute+0.098
uttering+0.095
 Majesty+0.094
ach+0.093
Neg logit contributions
 had-0.129
 other-0.126
Who-0.122
Where-0.117
He-0.115
Token the
Token ID262
Feature activation+5.39e-02

Pos logit contributions
ute+0.144
ach+0.142
uttering+0.141
ower+0.141
ap+0.140
Neg logit contributions
 had-0.053
 other-0.050
Who-0.046
Where-0.040
 sounded-0.039
Token cater
Token ID20825
Feature activation+1.01e-01

Pos logit contributions
cient+0.083
ap+0.082
ute+0.082
uttering+0.081
ach+0.081
Neg logit contributions
Who-0.075
 had-0.073
 other-0.071
Where-0.070
 moral-0.069
Token John
Token ID1757
Feature activation+1.02e-01

Pos logit contributions
Who+0.089
Where+0.087
Less+0.081
 moral+0.080
He+0.079
Neg logit contributions
ap-0.162
cient-0.161
iddy-0.147
ute-0.146
uttering-0.145
Token chose
Token ID7690
Feature activation+9.47e-02

Pos logit contributions
ute+0.218
uttering+0.216
cient+0.213
ap+0.212
ower+0.212
Neg logit contributions
 had-0.087
Who-0.083
 other-0.078
Where-0.071
 dial-0.071
Token some
Token ID617
Feature activation+9.10e-02

Pos logit contributions
cient+0.170
ute+0.166
uttering+0.164
ap+0.164
ower+0.163
Neg logit contributions
 other-0.124
 had-0.123
Who-0.114
 also-0.108
Where-0.108
Token carrots
Token ID34397
Feature activation+1.19e-01

Pos logit contributions
cient+0.151
ute+0.148
uttering+0.148
ap+0.146
 Majesty+0.146
Neg logit contributions
 other-0.125
 had-0.122
Who-0.117
 moral-0.112
Where-0.109
Token and
Token ID290
Feature activation+5.03e-02

Pos logit contributions
 Majesty+0.244
icol+0.235
ocol+0.228
 disl+0.226
uttering+0.222
Neg logit contributions
 had-0.151
 other-0.136
 also-0.133
He-0.122
 decided-0.114
Token cuc
Token ID38421
Feature activation+1.90e-01

Pos logit contributions
ute+0.104
uttering+0.103
iddy+0.100
ower+0.099
ach+0.098
Neg logit contributions
 had-0.057
 other-0.053
Who-0.051
 also-0.048
 sounded-0.045
Tokenumbers
Token ID17024
Feature activation+7.66e-02

Pos logit contributions
 disl+0.257
arter+0.244
icol+0.244
sing+0.241
ower+0.237
Neg logit contributions
He-0.295
Who-0.285
 other-0.284
Less-0.282
See-0.281
Token to
Token ID284
Feature activation+5.80e-02

Pos logit contributions
ute+0.144
uttering+0.143
cient+0.141
icol+0.141
 Majesty+0.141
Neg logit contributions
 had-0.094
 other-0.086
 also-0.080
He-0.078
 sounded-0.077
Token take
Token ID1011
Feature activation+7.02e-02

Pos logit contributions
uttering+0.122
ute+0.119
ower+0.118
iddy+0.116
ummy+0.112
Neg logit contributions
 other-0.065
 had-0.061
Who-0.061
 also-0.058
 dial-0.053
Token home
Token ID1363
Feature activation+8.68e-02

Pos logit contributions
ute+0.127
uttering+0.127
ower+0.123
cient+0.122
iddy+0.122
Neg logit contributions
 other-0.087
 had-0.085
Who-0.081
Where-0.076
 dial-0.076
Token.
Token ID13
Feature activation+4.99e-02

Pos logit contributions
ute+0.173
cient+0.171
uttering+0.170
 Majesty+0.169
icol+0.167
Neg logit contributions
 other-0.093
 had-0.090
He-0.078
 also-0.077
Who-0.076
Token giving
Token ID3501
Feature activation+8.62e-02

Pos logit contributions
uttering+0.127
ute+0.127
cient+0.122
ap+0.120
utter+0.118
Neg logit contributions
 other-0.096
 had-0.093
Who-0.092
 moral-0.088
 also-0.086
Token him
Token ID683
Feature activation+6.57e-02

Pos logit contributions
uttering+0.170
ute+0.163
ower+0.162
ap+0.158
iddy+0.157
Neg logit contributions
 other-0.105
 had-0.098
Who-0.093
Where-0.089
 dial-0.087
Token too
Token ID1165
Feature activation+7.78e-02

Pos logit contributions
uttering+0.135
ower+0.132
ute+0.129
icol+0.127
iddy+0.126
Neg logit contributions
 other-0.081
 had-0.071
 moral-0.069
 better-0.062
 also-0.061
Token many
Token ID867
Feature activation+7.07e-02

Pos logit contributions
cient+0.156
ute+0.155
ap+0.155
uttering+0.154
ach+0.154
Neg logit contributions
Who-0.072
 had-0.070
 other-0.068
Where-0.065
 moral-0.065
Token ch
Token ID442
Feature activation+1.08e-01

Pos logit contributions
cient+0.116
ute+0.116
ap+0.116
uttering+0.115
ach+0.114
Neg logit contributions
Who-0.091
 had-0.089
 other-0.088
 moral-0.084
Where-0.084
Tokenocol
Token ID4668
Feature activation+1.89e-01

Pos logit contributions
icol+0.109
cient+0.095
idents+0.086
ower+0.084
sing+0.083
Neg logit contributions
 other-0.211
 moral-0.202
He-0.202
 had-0.197
 generations-0.197
Tokenates
Token ID689
Feature activation+9.31e-02

Pos logit contributions
cient+0.309
icol+0.304
 disl+0.288
iddy+0.284
 Majesty+0.284
Neg logit contributions
 other-0.223
Less-0.219
 counted-0.217
 had-0.213
eps-0.213
Token."
Token ID526
Feature activation+5.90e-02

Pos logit contributions
uttering+0.184
icol+0.184
cient+0.183
 Majesty+0.183
ap+0.179
Neg logit contributions
 had-0.102
 other-0.098
 also-0.086
He-0.083
Who-0.077
Token The
Token ID383
Feature activation+4.58e-02

Pos logit contributions
cient+0.117
ap+0.117
ute+0.116
ach+0.115
ummy+0.115
Neg logit contributions
Who-0.055
 had-0.053
 other-0.051
Where-0.050
 moral-0.049
Token king
Token ID5822
Feature activation+3.00e-02

Pos logit contributions
cient+0.074
ap+0.073
ute+0.073
ach+0.072
uttering+0.072
Neg logit contributions
Who-0.060
 had-0.058
Where-0.056
 other-0.056
 moral-0.054
Token learned
Token ID4499
Feature activation+6.96e-02

Pos logit contributions
ap+0.060
cient+0.059
ute+0.059
ach+0.058
ummy+0.057
Neg logit contributions
Who-0.029
Where-0.028
 other-0.025
Less-0.024
See-0.024
Token he
Token ID339
Feature activation+5.48e-02

Pos logit contributions
uttering+0.299
ute+0.297
ower+0.292
cient+0.290
iddy+0.290
Neg logit contributions
 had-0.174
 other-0.162
Who-0.161
 sounded-0.148
Where-0.142
Token would
Token ID561
Feature activation+8.41e-02

Pos logit contributions
cient+0.119
ap+0.118
ute+0.117
ach+0.116
 Majesty+0.115
Neg logit contributions
Who-0.041
Where-0.037
 other-0.036
 had-0.035
 moral-0.035
Token take
Token ID1011
Feature activation+8.30e-02

Pos logit contributions
uttering+0.160
ute+0.157
ummy+0.157
ower+0.156
ach+0.156
Neg logit contributions
 had-0.095
Who-0.090
 other-0.088
 also-0.084
 dial-0.083
Token the
Token ID262
Feature activation+8.46e-02

Pos logit contributions
cient+0.144
uttering+0.144
ute+0.143
ower+0.141
iddy+0.140
Neg logit contributions
 other-0.102
Who-0.101
 had-0.101
Where-0.093
 dial-0.093
Token av
Token ID1196
Feature activation+1.57e-01

Pos logit contributions
ap+0.131
uttering+0.130
ach+0.128
iddy+0.128
ute+0.128
Neg logit contributions
 other-0.119
Who-0.117
 had-0.116
 moral-0.112
 dial-0.109
Tokenoc
Token ID420
Feature activation+1.88e-01

Pos logit contributions
uttering+0.279
iddy+0.277
ute+0.274
ower+0.274
ap+0.272
Neg logit contributions
 other-0.172
 had-0.170
He-0.170
Who-0.167
Where-0.160
Tokenados
Token ID22484
Feature activation+9.60e-02

Pos logit contributions
uttering+0.186
ute+0.165
iddy+0.155
ap+0.154
utter+0.153
Neg logit contributions
 other-0.373
 moral-0.362
 Moral-0.360
Who-0.359
Less-0.355
Token and
Token ID290
Feature activation+4.48e-02

Pos logit contributions
 Majesty+0.155
ocol+0.154
icol+0.148
iddy+0.145
ap+0.144
Neg logit contributions
 had-0.139
 other-0.138
He-0.128
 also-0.127
Who-0.122
Token scatter
Token ID41058
Feature activation+8.85e-02

Pos logit contributions
ute+0.080
uttering+0.080
ap+0.078
iddy+0.077
ower+0.077
Neg logit contributions
 had-0.056
Who-0.055
 other-0.054
 moral-0.050
Where-0.050
Token them
Token ID606
Feature activation+5.35e-02

Pos logit contributions
ute+0.163
uttering+0.161
ower+0.159
ap+0.158
cient+0.158
Neg logit contributions
 other-0.105
 had-0.103
Who-0.100
Where-0.095
He-0.092
Token around
Token ID1088
Feature activation+7.90e-02

Pos logit contributions
cient+0.087
ute+0.087
ocol+0.086
ummy+0.086
uttering+0.085
Neg logit contributions
 had-0.072
 other-0.070
Who-0.065
 also-0.063
 moral-0.063
Token reward
Token ID6721
Feature activation+1.56e-01

Pos logit contributions
uttering+0.332
iddy+0.323
ower+0.316
cient+0.315
ute+0.312
Neg logit contributions
 other-0.257
 had-0.248
 moral-0.214
Who-0.213
 sounded-0.204
Token.
Token ID13
Feature activation+8.48e-02

Pos logit contributions
ute+0.260
uttering+0.260
cient+0.260
 Majesty+0.253
ap+0.252
Neg logit contributions
 had-0.216
 other-0.206
Who-0.194
 also-0.185
 sounded-0.183
Token It
Token ID632
Feature activation+4.30e-02

Pos logit contributions
cient+0.151
ap+0.150
ute+0.149
ach+0.148
ummy+0.147
Neg logit contributions
Who-0.097
 had-0.093
 other-0.091
 moral-0.088
Where-0.088
Token was
Token ID373
Feature activation+6.67e-02

Pos logit contributions
cient+0.091
ap+0.089
 Majesty+0.087
ute+0.085
ish+0.085
Neg logit contributions
Who-0.035
Where-0.032
Less-0.030
 other-0.029
 moral-0.029
Token ch
Token ID442
Feature activation+1.09e-01

Pos logit contributions
ute+0.110
cient+0.110
ap+0.109
uttering+0.108
ach+0.108
Neg logit contributions
 had-0.086
Who-0.086
 other-0.083
 moral-0.080
Where-0.080
Tokenocol
Token ID4668
Feature activation+1.85e-01

Pos logit contributions
 Majesty+0.091
cient+0.088
icol+0.087
 disl+0.083
ower+0.082
Neg logit contributions
He-0.225
 had-0.220
 other-0.214
See-0.214
 moral-0.213
Tokenates
Token ID689
Feature activation+9.34e-02

Pos logit contributions
cient+0.280
ute+0.270
icol+0.260
 disl+0.259
iddy+0.255
Neg logit contributions
 counted-0.250
 had-0.249
Who-0.248
 other-0.246
Less-0.243
Token and
Token ID290
Feature activation+4.05e-02

Pos logit contributions
 Majesty+0.162
 disl+0.161
icol+0.160
cient+0.160
ocol+0.159
Neg logit contributions
 had-0.127
 other-0.114
 also-0.110
He-0.103
 sounded-0.101
Token sweets
Token ID42402
Feature activation+7.45e-02

Pos logit contributions
ute+0.075
uttering+0.075
ap+0.074
cient+0.073
ach+0.073
Neg logit contributions
 had-0.048
Who-0.046
 other-0.044
Where-0.042
 sounded-0.041
Token!
Token ID0
Feature activation+5.65e-02

Pos logit contributions
ap+0.127
ute+0.126
cient+0.126
ocol+0.126
uttering+0.125
Neg logit contributions
 had-0.103
 other-0.093
Who-0.091
 also-0.088
 sounded-0.088
Token\n
Token ID198
Feature activation+7.08e-02

Pos logit contributions
cient+0.094
ap+0.094
ute+0.093
ach+0.092
ummy+0.092
Neg logit contributions
Who-0.071
 had-0.069
 other-0.067
 moral-0.065
Where-0.065
Token you
Token ID345
Feature activation+1.13e-01

Pos logit contributions
Who+0.086
 had+0.080
 moral+0.079
 dial+0.079
Where+0.079
Neg logit contributions
ap-0.084
cient-0.083
ute-0.082
uttering-0.081
ummy-0.081
Token go
Token ID467
Feature activation+1.07e-01

Pos logit contributions
uttering+0.206
ower+0.206
ocol+0.202
ummy+0.202
ute+0.200
Neg logit contributions
 had-0.150
 other-0.138
 also-0.128
 moral-0.126
 learned-0.123
Token,
Token ID11
Feature activation+7.03e-02

Pos logit contributions
ocol+0.166
cient+0.165
ower+0.164
uttering+0.163
ummy+0.160
Neg logit contributions
 other-0.164
 had-0.163
 also-0.152
Who-0.148
He-0.144
Token my
Token ID616
Feature activation+9.26e-02

Pos logit contributions
uttering+0.131
uc+0.127
cient+0.125
ower+0.124
ocol+0.124
Neg logit contributions
 other-0.092
 had-0.092
Who-0.089
 also-0.088
 moral-0.083
Token d
Token ID288
Feature activation+1.20e-01

Pos logit contributions
uttering+0.146
iddy+0.145
cient+0.142
ocol+0.139
ach+0.138
Neg logit contributions
 other-0.136
 had-0.130
 moral-0.129
Who-0.125
 also-0.117
Tokenarl
Token ID7063
Feature activation+1.84e-01

Pos logit contributions
 disl+0.133
cient+0.130
 Majesty+0.115
icol+0.105
 crop+0.104
Neg logit contributions
He-0.214
 moral-0.204
 other-0.201
Miss-0.194
 had-0.192
Tokenings
Token ID654
Feature activation+9.07e-02

Pos logit contributions
ocol+0.255
uttering+0.234
cient+0.230
latable+0.229
 disl+0.226
Neg logit contributions
 had-0.307
 other-0.306
He-0.295
 moral-0.288
 shouted-0.268
Token.
Token ID13
Feature activation+6.56e-02

Pos logit contributions
uttering+0.139
ocol+0.134
ute+0.134
 Majesty+0.133
cient+0.133
Neg logit contributions
 had-0.139
 other-0.130
Who-0.129
He-0.126
 sounded-0.125
Token Enjoy
Token ID18179
Feature activation+9.65e-02

Pos logit contributions
uttering+0.104
ute+0.102
ach+0.101
ower+0.100
cient+0.100
Neg logit contributions
Who-0.093
 had-0.089
 other-0.086
Where-0.086
 moral-0.085
Token your
Token ID534
Feature activation+6.26e-02

Pos logit contributions
uttering+0.186
sing+0.173
cient+0.171
iddy+0.169
utter+0.167
Neg logit contributions
 other-0.148
 had-0.120
 kn-0.116
 moral-0.115
 dial-0.113
Token lunch
Token ID9965
Feature activation+9.91e-02

Pos logit contributions
uttering+0.106
iddy+0.104
cient+0.099
ower+0.095
ute+0.095
Neg logit contributions
 other-0.095
 moral-0.088
 had-0.086
Who-0.082
 dial-0.080
Token give
Token ID1577
Feature activation+1.02e-01

Pos logit contributions
uttering+0.195
ummy+0.190
ower+0.189
iddy+0.188
ute+0.187
Neg logit contributions
 other-0.105
 had-0.098
Who-0.093
 also-0.093
 moral-0.084
Token to
Token ID284
Feature activation+7.40e-02

Pos logit contributions
uttering+0.195
ute+0.192
ap+0.188
iddy+0.186
ower+0.186
Neg logit contributions
 other-0.125
 had-0.118
Who-0.113
Where-0.106
 dial-0.102
Token the
Token ID262
Feature activation+7.53e-02

Pos logit contributions
uttering+0.153
ute+0.149
ower+0.147
ach+0.145
iddy+0.141
Neg logit contributions
 other-0.089
 had-0.080
Who-0.074
 also-0.071
 moral-0.071
Token rh
Token ID9529
Feature activation+1.60e-01

Pos logit contributions
ap+0.121
uttering+0.120
ach+0.120
ute+0.119
cient+0.119
Neg logit contributions
 other-0.102
Who-0.101
 had-0.100
 moral-0.097
Where-0.093
Tokenin
Token ID259
Feature activation+7.79e-02

Pos logit contributions
uttering+0.222
ap+0.219
cient+0.219
iddy+0.219
ower+0.218
Neg logit contributions
Who-0.243
 other-0.240
 had-0.239
 moral-0.231
Where-0.229
Tokenoc
Token ID420
Feature activation+1.83e-01

Pos logit contributions
uttering+0.138
iddy+0.135
ach+0.132
ower+0.132
ap+0.131
Neg logit contributions
 other-0.097
 had-0.095
Who-0.093
 moral-0.092
He-0.088
Tokeneros
Token ID27498
Feature activation+9.74e-02

Pos logit contributions
uttering+0.202
ute+0.167
utter+0.164
iddy+0.163
ower+0.159
Neg logit contributions
 other-0.358
 moral-0.352
 Moral-0.349
Who-0.346
 dial-0.336
Token.
Token ID13
Feature activation+6.27e-02

Pos logit contributions
uttering+0.177
ower+0.173
iddy+0.172
ach+0.171
ute+0.170
Neg logit contributions
 other-0.126
 had-0.124
 moral-0.112
Who-0.110
 also-0.106
Token Lily
Token ID20037
Feature activation+1.70e-02

Pos logit contributions
cient+0.109
ap+0.109
ute+0.108
ach+0.107
ummy+0.107
Neg logit contributions
Who-0.074
 had-0.072
 other-0.070
Where-0.068
 moral-0.068
Token walked
Token ID6807
Feature activation+7.69e-02

Pos logit contributions
cient+0.036
ap+0.036
ute+0.036
ach+0.036
uttering+0.036
Neg logit contributions
Who-0.014
 had-0.013
 other-0.013
Where-0.012
 moral-0.012
Token to
Token ID284
Feature activation+6.11e-02

Pos logit contributions
ap+0.112
uttering+0.112
ute+0.112
ach+0.111
cient+0.111
Neg logit contributions
Who-0.115
 had-0.107
Less-0.105
 moral-0.104
Where-0.104
Token like
Token ID588
Feature activation+1.04e-01

Pos logit contributions
cient+0.221
ute+0.220
uttering+0.220
 Majesty+0.217
ap+0.217
Neg logit contributions
 had-0.179
 other-0.166
Who-0.159
 also-0.154
 sounded-0.151
Token lettuce
Token ID39406
Feature activation+1.20e-01

Pos logit contributions
uttering+0.172
ute+0.172
cient+0.170
ap+0.169
ower+0.168
Neg logit contributions
 other-0.143
 had-0.142
Who-0.136
 moral-0.127
Where-0.126
Token,
Token ID11
Feature activation+6.47e-02

Pos logit contributions
uttering+0.208
icol+0.198
ute+0.196
cient+0.191
 Majesty+0.191
Neg logit contributions
 had-0.186
 other-0.177
 also-0.159
He-0.147
 sounded-0.146
Token tomatoes
Token ID23972
Feature activation+9.69e-02

Pos logit contributions
uttering+0.107
ute+0.104
ap+0.100
iddy+0.099
ower+0.098
Neg logit contributions
 other-0.104
 had-0.102
 also-0.095
Who-0.095
 moral-0.091
Token and
Token ID290
Feature activation+5.17e-02

Pos logit contributions
uttering+0.151
ute+0.144
icol+0.143
 Majesty+0.143
ocol+0.141
Neg logit contributions
 had-0.161
 other-0.150
 also-0.143
He-0.132
 sounded-0.131
Token cuc
Token ID38421
Feature activation+1.83e-01

Pos logit contributions
uttering+0.083
ute+0.083
ower+0.079
iddy+0.078
ap+0.077
Neg logit contributions
 other-0.089
 had-0.088
Who-0.084
 also-0.080
 moral-0.079
Tokenumbers
Token ID17024
Feature activation+8.56e-02

Pos logit contributions
icol+0.250
 disl+0.246
sing+0.243
sed+0.238
iddy+0.237
Neg logit contributions
He-0.268
 other-0.267
Who-0.266
Where-0.258
 had-0.255
Token.
Token ID13
Feature activation+6.21e-02

Pos logit contributions
uttering+0.171
icol+0.171
ute+0.171
cient+0.170
 Majesty+0.169
Neg logit contributions
 had-0.091
 other-0.085
 also-0.077
Who-0.074
He-0.073
Token He
Token ID679
Feature activation+8.60e-03

Pos logit contributions
cient+0.112
ap+0.112
ute+0.111
ach+0.110
uttering+0.110
Neg logit contributions
Who-0.069
 had-0.067
 other-0.064
Where-0.063
 moral-0.062
Token smiled
Token ID13541
Feature activation+8.81e-02

Pos logit contributions
uttering+0.019
ute+0.019
cient+0.019
ower+0.019
ap+0.019
Neg logit contributions
 had-0.007
Who-0.006
 other-0.006
 moral-0.006
 also-0.006
Token as
Token ID355
Feature activation+5.70e-02

Pos logit contributions
ocol+0.209
igsaw+0.191
cient+0.190
imb+0.188
ute+0.187
Neg logit contributions
He-0.086
 had-0.074
 also-0.071
 other-0.071
 decided-0.066
Token there
Token ID612
Feature activation+8.49e-02

Pos logit contributions
cient+0.119
ute+0.117
ummy+0.112
ower+0.111
ap+0.110
Neg logit contributions
 had-0.191
Who-0.183
 other-0.177
Where-0.173
 also-0.171
Token was
Token ID373
Feature activation+7.92e-02

Pos logit contributions
cient+0.196
ute+0.195
uttering+0.193
ocol+0.193
ower+0.193
Neg logit contributions
 had-0.059
Who-0.055
 other-0.052
Where-0.047
 sounded-0.046
Token a
Token ID257
Feature activation+6.89e-02

Pos logit contributions
cient+0.127
ute+0.126
ap+0.124
ach+0.123
uttering+0.123
Neg logit contributions
Who-0.110
 had-0.109
 other-0.106
Where-0.103
 moral-0.102
Token rh
Token ID9529
Feature activation+1.54e-01

Pos logit contributions
cient+0.105
ap+0.104
ute+0.104
uttering+0.102
ach+0.102
Neg logit contributions
Who-0.097
 had-0.093
 other-0.091
Where-0.090
 moral-0.089
Tokenin
Token ID259
Feature activation+6.38e-02

Pos logit contributions
cient+0.204
ap+0.201
ute+0.200
uttering+0.200
ach+0.199
Neg logit contributions
Who-0.246
 had-0.241
 other-0.239
Where-0.232
 moral-0.231
Tokenoc
Token ID420
Feature activation+1.82e-01

Pos logit contributions
cient+0.089
ute+0.088
ap+0.087
ach+0.086
ummy+0.085
Neg logit contributions
Who-0.100
 had-0.095
Where-0.094
 other-0.093
See-0.091
Tokeneros
Token ID27498
Feature activation+7.79e-02

Pos logit contributions
uttering+0.208
ute+0.187
utter+0.180
oppy+0.180
ower+0.174
Neg logit contributions
 other-0.329
 Moral-0.310
 moral-0.308
 dial-0.307
Who-0.307
Token who
Token ID508
Feature activation+2.60e-02

Pos logit contributions
cient+0.133
ute+0.133
ap+0.133
uttering+0.132
ach+0.131
Neg logit contributions
Who-0.095
 had-0.092
 other-0.089
Where-0.087
 moral-0.086
Token was
Token ID373
Feature activation+5.79e-02

Pos logit contributions
cient+0.053
ach+0.049
ap+0.049
ute+0.048
ower+0.047
Neg logit contributions
Who-0.026
 moral-0.023
Less-0.023
Where-0.023
 other-0.022
Token always
Token ID1464
Feature activation+4.92e-02

Pos logit contributions
ute+0.113
cient+0.112
ap+0.112
uttering+0.112
ower+0.111
Neg logit contributions
 had-0.058
 other-0.056
Who-0.056
 also-0.052
Where-0.052
Token naughty
Token ID45128
Feature activation+8.22e-02

Pos logit contributions
ute+0.081
cient+0.080
ummy+0.079
ap+0.079
ach+0.078
Neg logit contributions
 had-0.066
Who-0.065
 other-0.063
Where-0.061
 sounded-0.058
Token go
Token ID467
Feature activation+1.04e-01

Pos logit contributions
uttering+0.264
ower+0.258
ute+0.252
ummy+0.251
iddy+0.250
Neg logit contributions
 other-0.128
 had-0.117
 also-0.113
Who-0.110
 asked-0.104
Token!
Token ID0
Feature activation+8.21e-02

Pos logit contributions
cient+0.164
ocol+0.160
 Majesty+0.159
ummy+0.158
ower+0.157
Neg logit contributions
 had-0.156
 other-0.152
Who-0.145
 also-0.142
 dial-0.139
Token Everyone
Token ID11075
Feature activation+3.01e-02

Pos logit contributions
cient+0.148
ap+0.147
ute+0.147
uttering+0.146
ach+0.146
Neg logit contributions
Who-0.092
 had-0.089
 other-0.086
Where-0.084
 moral-0.083
Token watched
Token ID7342
Feature activation+8.80e-02

Pos logit contributions
cient+0.063
ap+0.063
ute+0.062
ach+0.062
uttering+0.062
Neg logit contributions
Who-0.025
 had-0.022
 other-0.022
Where-0.022
 moral-0.022
Token in
Token ID287
Feature activation+4.95e-02

Pos logit contributions
ute+0.153
cient+0.148
uttering+0.148
 Majesty+0.146
ower+0.145
Neg logit contributions
 other-0.113
 had-0.112
Who-0.108
 dial-0.101
 sounded-0.099
Token amaz
Token ID40642
Feature activation+1.81e-01

Pos logit contributions
cient+0.078
ap+0.078
ute+0.078
uttering+0.077
ach+0.077
Neg logit contributions
Who-0.067
 had-0.065
 other-0.064
Where-0.062
 moral-0.062
Tokenement
Token ID972
Feature activation+1.04e-01

Pos logit contributions
cient+0.339
oppy+0.331
ute+0.323
opsy+0.312
uttering+0.311
Neg logit contributions
 other-0.199
 sounded-0.197
 had-0.191
He-0.187
Who-0.185
Token as
Token ID355
Feature activation+7.35e-02

Pos logit contributions
cient+0.187
ocol+0.185
ute+0.179
 Majesty+0.174
uttering+0.172
Neg logit contributions
 had-0.129
 other-0.121
 shouted-0.114
 also-0.111
 sounded-0.107
Token it
Token ID340
Feature activation+2.18e-02

Pos logit contributions
cient+0.118
ap+0.118
ute+0.117
ach+0.116
uttering+0.116
Neg logit contributions
Who-0.097
 had-0.094
 other-0.091
Where-0.089
 moral-0.089
Token took
Token ID1718
Feature activation+5.91e-02

Pos logit contributions
ap+0.044
cient+0.044
ute+0.043
ach+0.043
ower+0.042
Neg logit contributions
Who-0.019
Where-0.018
Less-0.017
 moral-0.017
 other-0.016
Token off
Token ID572
Feature activation+7.48e-02

Pos logit contributions
ap+0.086
cient+0.086
ute+0.085
ach+0.085
uttering+0.084
Neg logit contributions
Who-0.087
 had-0.084
 other-0.082
Where-0.081
 moral-0.081
Token am
Token ID716
Feature activation+7.64e-02

Pos logit contributions
cient+0.130
uttering+0.130
ute+0.130
ummy+0.129
ach+0.129
Neg logit contributions
 had-0.076
Who-0.075
 other-0.071
 moral-0.068
Where-0.068
Token taking
Token ID2263
Feature activation+8.82e-02

Pos logit contributions
ach+0.137
ower+0.136
uttering+0.136
ute+0.136
ummy+0.135
Neg logit contributions
 had-0.095
 other-0.094
 also-0.092
Who-0.088
 moral-0.084
Token the
Token ID262
Feature activation+9.33e-02

Pos logit contributions
uttering+0.156
iddy+0.152
ower+0.152
ute+0.151
cient+0.150
Neg logit contributions
 other-0.120
 had-0.110
Who-0.108
Where-0.102
 dial-0.102
Token rh
Token ID9529
Feature activation+1.59e-01

Pos logit contributions
uttering+0.145
iddy+0.143
ap+0.142
ach+0.139
ower+0.138
Neg logit contributions
 other-0.140
 moral-0.130
 had-0.130
Who-0.127
 dial-0.124
Tokenin
Token ID259
Feature activation+7.38e-02

Pos logit contributions
icol+0.238
 Majesty+0.234
uttering+0.233
iddy+0.233
ower+0.230
Neg logit contributions
 other-0.228
Who-0.221
 had-0.219
 moral-0.213
He-0.213
Tokenoc
Token ID420
Feature activation+1.80e-01

Pos logit contributions
uttering+0.134
iddy+0.132
ocol+0.129
ap+0.127
 Majesty+0.127
Neg logit contributions
 other-0.088
 had-0.085
Who-0.083
 moral-0.082
Where-0.077
Tokeneros
Token ID27498
Feature activation+9.30e-02

Pos logit contributions
uttering+0.204
utter+0.168
ute+0.165
oppy+0.163
iddy+0.160
Neg logit contributions
 other-0.333
 moral-0.324
 Moral-0.322
Who-0.310
Less-0.308
Token to
Token ID284
Feature activation+6.15e-02

Pos logit contributions
ocol+0.153
iddy+0.150
uttering+0.149
ummy+0.145
ower+0.145
Neg logit contributions
 other-0.134
 had-0.132
Who-0.119
 moral-0.116
 also-0.114
Token a
Token ID257
Feature activation+7.74e-02

Pos logit contributions
uttering+0.109
ower+0.108
ute+0.107
iddy+0.104
ap+0.104
Neg logit contributions
 other-0.081
Who-0.077
 had-0.075
 moral-0.072
 also-0.071
Token new
Token ID649
Feature activation+9.12e-02

Pos logit contributions
ap+0.126
ute+0.126
uttering+0.126
cient+0.125
ach+0.125
Neg logit contributions
Who-0.100
 had-0.098
 other-0.097
Where-0.093
 moral-0.093
Token home
Token ID1363
Feature activation+8.81e-02

Pos logit contributions
uttering+0.149
ocol+0.148
iddy+0.147
ower+0.146
cient+0.143
Neg logit contributions
 other-0.129
 had-0.123
 moral-0.120
Who-0.120
 dial-0.115
Token.
Token ID13
Feature activation+1.14e-01

Pos logit contributions
 had+0.053
 other+0.052
Who+0.052
Where+0.049
 also+0.048
Neg logit contributions
cient-0.071
ute-0.070
ummy-0.069
ach-0.068
ap-0.068
Token Tim
Token ID5045
Feature activation+3.49e-02

Pos logit contributions
uttering+0.218
ute+0.218
cient+0.217
ap+0.216
ach+0.215
Neg logit contributions
Who-0.119
 had-0.116
 other-0.110
Where-0.107
He-0.105
Tokenmy
Token ID1820
Feature activation+4.25e-02

Pos logit contributions
cient+0.075
ap+0.075
ute+0.073
ach+0.073
ummy+0.073
Neg logit contributions
Who-0.028
Where-0.025
 moral-0.022
Less-0.022
 other-0.022
Token watched
Token ID7342
Feature activation+9.78e-02

Pos logit contributions
cient+0.093
ap+0.093
ute+0.092
ach+0.092
uttering+0.091
Neg logit contributions
Who-0.031
 had-0.029
 other-0.028
Where-0.027
 moral-0.027
Token with
Token ID351
Feature activation+5.99e-02

Pos logit contributions
cient+0.194
ute+0.188
uttering+0.188
 Majesty+0.184
owers+0.182
Neg logit contributions
 other-0.112
Who-0.104
 moral-0.098
 had-0.096
 dial-0.096
Token amaz
Token ID40642
Feature activation+1.79e-01

Pos logit contributions
ute+0.127
ower+0.125
cient+0.124
ach+0.123
uttering+0.121
Neg logit contributions
 other-0.066
 had-0.063
Who-0.060
Where-0.057
 also-0.056
Tokenement
Token ID972
Feature activation+1.07e-01

Pos logit contributions
cient+0.316
oppy+0.297
ute+0.296
 Majesty+0.293
uttering+0.291
Neg logit contributions
 other-0.218
 sounded-0.216
He-0.214
Who-0.214
 had-0.209
Token as
Token ID355
Feature activation+7.41e-02

Pos logit contributions
cient+0.223
ench+0.217
 Majesty+0.211
ocol+0.210
imb+0.209
Neg logit contributions
 had-0.106
 other-0.096
 shouted-0.089
He-0.086
 sounded-0.084
Token he
Token ID339
Feature activation-5.22e-03

Pos logit contributions
cient+0.139
ute+0.137
ap+0.137
uttering+0.136
ach+0.135
Neg logit contributions
Who-0.081
 had-0.079
 other-0.078
Where-0.073
 moral-0.073
Token saw
Token ID2497
Feature activation+6.84e-02

Pos logit contributions
Less+0.006
 moral+0.006
Who+0.006
Those+0.006
Where+0.006
Neg logit contributions
cient-0.008
 Majesty-0.008
ap-0.008
ish-0.008
uc-0.008
Token stars
Token ID5788
Feature activation+6.86e-02

Pos logit contributions
ute+0.113
ap+0.112
cient+0.111
uttering+0.111
ower+0.110
Neg logit contributions
Who-0.092
 other-0.091
 had-0.090
Where-0.086
 sounded-0.083
Tokeneddy
Token ID21874
Feature activation+1.12e-01

Pos logit contributions
 disl+0.192
cient+0.188
ocol+0.188
 Majesty+0.187
ap+0.185
Neg logit contributions
He-0.198
Mom-0.190
 other-0.189
Who-0.189
 had-0.186
Token bear
Token ID6842
Feature activation+7.84e-02

Pos logit contributions
uttering+0.259
ach+0.242
ower+0.241
ap+0.241
ute+0.240
Neg logit contributions
 had-0.100
 other-0.095
Who-0.089
 dial-0.086
 sounded-0.083
Token and
Token ID290
Feature activation+4.31e-02

Pos logit contributions
uttering+0.154
ute+0.153
ap+0.152
ach+0.151
cient+0.151
Neg logit contributions
 had-0.078
Who-0.075
 other-0.073
 dial-0.067
Where-0.067
Token watched
Token ID7342
Feature activation+9.11e-02

Pos logit contributions
uttering+0.089
ute+0.089
ach+0.087
ap+0.087
cient+0.086
Neg logit contributions
 had-0.043
Who-0.042
 other-0.040
 dial-0.037
 prayed-0.037
Token in
Token ID287
Feature activation+4.68e-02

Pos logit contributions
cient+0.164
ute+0.163
uttering+0.162
ower+0.160
 Majesty+0.157
Neg logit contributions
 other-0.109
Who-0.108
 had-0.103
Where-0.098
 moral-0.097
Token amaz
Token ID40642
Feature activation+1.79e-01

Pos logit contributions
ap+0.070
ocol+0.070
ach+0.070
ummy+0.069
uttering+0.069
Neg logit contributions
Who-0.066
 had-0.063
 other-0.060
 moral-0.060
He-0.060
Tokenement
Token ID972
Feature activation+1.04e-01

Pos logit contributions
cient+0.319
ute+0.307
uttering+0.301
oppy+0.299
ap+0.294
Neg logit contributions
 other-0.215
Who-0.212
He-0.211
 sounded-0.210
 had-0.206
Token as
Token ID355
Feature activation+6.86e-02

Pos logit contributions
cient+0.221
ute+0.211
 Majesty+0.209
ocol+0.209
ubby+0.207
Neg logit contributions
 had-0.089
 other-0.088
Who-0.080
 shouted-0.075
He-0.075
Token it
Token ID340
Feature activation+1.69e-02

Pos logit contributions
cient+0.114
ap+0.114
ute+0.114
ach+0.113
uttering+0.113
Neg logit contributions
Who-0.086
 had-0.082
 other-0.079
Where-0.079
 moral-0.078
Token shrunk
Token ID38900
Feature activation+8.39e-02

Pos logit contributions
cient+0.031
ap+0.030
uc+0.029
uttering+0.029
ach+0.029
Neg logit contributions
Who-0.017
Less-0.017
 moral-0.016
Where-0.016
He-0.015
Token an
Token ID281
Feature activation+6.63e-02

Pos logit contributions
cient+0.150
ute+0.148
uttering+0.147
ap+0.145
ower+0.142
Neg logit contributions
 had-0.102
 other-0.101
Who-0.099
Where-0.094
 sounded-0.092
Token as
Token ID355
Feature activation+5.12e-02

Pos logit contributions
cient+0.159
uttering+0.159
ute+0.158
ocol+0.157
ap+0.156
Neg logit contributions
 had-0.092
 other-0.090
Who-0.089
 moral-0.083
 dial-0.082
Token he
Token ID339
Feature activation-8.10e-03

Pos logit contributions
cient+0.099
ute+0.098
ap+0.098
uttering+0.097
ach+0.097
Neg logit contributions
 had-0.054
 other-0.052
Who-0.052
 moral-0.047
He-0.047
Token opened
Token ID4721
Feature activation+6.80e-02

Pos logit contributions
Who+0.007
Where+0.006
Less+0.006
 moral+0.006
 other+0.006
Neg logit contributions
cient-0.017
ap-0.016
ach-0.016
ute-0.016
 Majesty-0.016
Token the
Token ID262
Feature activation+5.19e-02

Pos logit contributions
cient+0.135
uttering+0.133
ower+0.132
ute+0.131
ap+0.129
Neg logit contributions
 other-0.072
 had-0.069
 dial-0.066
Who-0.066
Where-0.063
Token ch
Token ID442
Feature activation+9.39e-02

Pos logit contributions
uttering+0.089
ap+0.089
ach+0.088
cient+0.088
ute+0.088
Neg logit contributions
 other-0.063
Who-0.063
 had-0.063
 moral-0.060
Where-0.059
Tokenocol
Token ID4668
Feature activation+1.78e-01

Pos logit contributions
cient+0.080
icol+0.069
ower+0.068
idents+0.064
uttering+0.063
Neg logit contributions
 other-0.190
 moral-0.185
He-0.182
 had-0.178
 kn-0.176
Tokenates
Token ID689
Feature activation+7.55e-02

Pos logit contributions
cient+0.272
iddy+0.258
ummy+0.256
 disl+0.256
icol+0.253
Neg logit contributions
 other-0.228
 counted-0.224
Less-0.222
 had-0.220
Who-0.219
Token and
Token ID290
Feature activation+2.01e-02

Pos logit contributions
cient+0.136
 Majesty+0.134
ocol+0.133
uttering+0.133
ute+0.132
Neg logit contributions
 had-0.086
 other-0.085
Who-0.082
 also-0.078
He-0.077
Token started
Token ID2067
Feature activation+4.68e-02

Pos logit contributions
ute+0.041
uttering+0.041
ummy+0.040
iddy+0.040
ower+0.040
Neg logit contributions
 had-0.020
Who-0.020
 other-0.019
 dial-0.018
Where-0.018
Token eating
Token ID6600
Feature activation+6.87e-02

Pos logit contributions
ute+0.070
cient+0.070
ummy+0.069
iddy+0.068
ower+0.068
Neg logit contributions
 had-0.070
Who-0.069
 other-0.068
 dial-0.066
Where-0.064
Token them
Token ID606
Feature activation+3.30e-02

Pos logit contributions
ute+0.125
cient+0.125
uttering+0.124
ap+0.121
ower+0.121
Neg logit contributions
 other-0.085
 had-0.080
Who-0.079
Where-0.074
He-0.073
Token egg
Token ID5935
Feature activation+1.24e-01

Pos logit contributions
uttering+0.176
ute+0.174
iddy+0.173
cient+0.173
ach+0.171
Neg logit contributions
 had-0.154
Who-0.149
 other-0.144
Where-0.136
 moral-0.136
Token!
Token ID0
Feature activation+7.46e-02

Pos logit contributions
uttering+0.237
igsaw+0.236
 Majesty+0.233
 disl+0.230
ocol+0.229
Neg logit contributions
 had-0.168
 also-0.128
 sounded-0.128
 other-0.125
 shouted-0.120
Token He
Token ID679
Feature activation+1.33e-02

Pos logit contributions
cient+0.127
ap+0.127
ute+0.127
uttering+0.125
ach+0.125
Neg logit contributions
Who-0.091
 had-0.088
 other-0.085
Where-0.083
 moral-0.082
Token watched
Token ID7342
Feature activation+8.42e-02

Pos logit contributions
cient+0.030
ap+0.030
ute+0.029
ach+0.029
uttering+0.029
Neg logit contributions
Who-0.009
 other-0.008
 had-0.008
Where-0.008
 moral-0.008
Token in
Token ID287
Feature activation+4.33e-02

Pos logit contributions
ute+0.168
uttering+0.164
cient+0.162
ower+0.162
 disl+0.156
Neg logit contributions
 other-0.094
Who-0.089
 had-0.086
 dial-0.080
He-0.080
Token amaz
Token ID40642
Feature activation+1.77e-01

Pos logit contributions
cient+0.066
ap+0.066
ute+0.066
ach+0.065
uttering+0.065
Neg logit contributions
Who-0.060
 had-0.059
 other-0.057
Where-0.056
 moral-0.056
Tokenement
Token ID972
Feature activation+1.02e-01

Pos logit contributions
cient+0.338
ute+0.332
oppy+0.320
uttering+0.313
ower+0.308
Neg logit contributions
He-0.193
 sounded-0.188
 other-0.187
Who-0.184
 had-0.183
Token as
Token ID355
Feature activation+7.27e-02

Pos logit contributions
cient+0.181
ute+0.174
ocol+0.170
iddy+0.166
 Majesty+0.165
Neg logit contributions
 had-0.129
 other-0.122
 shouted-0.119
He-0.119
Who-0.114
Token the
Token ID262
Feature activation+4.31e-02

Pos logit contributions
cient+0.130
ute+0.129
ap+0.129
uttering+0.128
ach+0.127
Neg logit contributions
Who-0.085
 had-0.083
 other-0.081
Where-0.077
 moral-0.077
Token egg
Token ID5935
Feature activation+5.51e-02

Pos logit contributions
ummy+0.057
cient+0.054
ach+0.053
ower+0.052
 Majesty+0.052
Neg logit contributions
Who-0.070
Where-0.066
 had-0.066
Less-0.066
He-0.065
Token began
Token ID2540
Feature activation+6.84e-02

Pos logit contributions
ute+0.108
cient+0.108
uttering+0.108
ap+0.107
ach+0.107
Neg logit contributions
 had-0.055
Who-0.052
 other-0.051
 sounded-0.047
Where-0.047
Token air
Token ID1633
Feature activation+1.80e-01

Pos logit contributions
 other+0.034
Who+0.032
 had+0.031
Where+0.030
 dial+0.029
Neg logit contributions
ute-0.053
ap-0.052
ower-0.051
uttering-0.051
ach-0.050
Token.
Token ID13
Feature activation+9.88e-02

Pos logit contributions
 Majesty+0.378
cient+0.374
ocol+0.374
oppy+0.370
ute+0.363
Neg logit contributions
 had-0.202
 other-0.171
 shouted-0.159
 sounded-0.150
 also-0.146
Token He
Token ID679
Feature activation+2.75e-02

Pos logit contributions
uttering+0.188
ute+0.187
cient+0.186
ach+0.185
ap+0.184
Neg logit contributions
Who-0.105
 had-0.102
 other-0.096
Where-0.094
He-0.093
Token watched
Token ID7342
Feature activation+9.42e-02

Pos logit contributions
cient+0.058
ap+0.058
ute+0.057
ach+0.057
uttering+0.056
Neg logit contributions
Who-0.023
 had-0.021
 other-0.020
Where-0.020
 moral-0.019
Token with
Token ID351
Feature activation+5.90e-02

Pos logit contributions
ute+0.190
cient+0.190
uttering+0.189
ower+0.184
 Majesty+0.178
Neg logit contributions
 other-0.105
Who-0.095
 had-0.093
 dial-0.086
 moral-0.086
Token amaz
Token ID40642
Feature activation+1.77e-01

Pos logit contributions
ute+0.110
cient+0.110
ower+0.108
ach+0.107
uttering+0.106
Neg logit contributions
 other-0.071
 had-0.070
Who-0.069
Where-0.065
 moral-0.063
Tokenement
Token ID972
Feature activation+1.07e-01

Pos logit contributions
cient+0.321
ute+0.306
oppy+0.306
uttering+0.301
 Majesty+0.295
Neg logit contributions
 sounded-0.206
 other-0.203
 had-0.201
Who-0.200
He-0.198
Token as
Token ID355
Feature activation+7.32e-02

Pos logit contributions
cient+0.216
ench+0.207
ute+0.207
ocol+0.204
imb+0.202
Neg logit contributions
 had-0.120
 other-0.103
 shouted-0.099
 sounded-0.095
He-0.093
Token the
Token ID262
Feature activation+5.04e-02

Pos logit contributions
cient+0.131
ap+0.130
ute+0.130
uttering+0.129
ach+0.129
Neg logit contributions
Who-0.084
 had-0.081
 other-0.078
Where-0.076
 moral-0.076
Token z
Token ID1976
Feature activation+1.04e-01

Pos logit contributions
cient+0.076
ute+0.075
ap+0.074
ach+0.074
ummy+0.073
Neg logit contributions
Who-0.070
Where-0.066
 had-0.066
Less-0.065
He-0.064
Tokenig
Token ID328
Feature activation+6.29e-02

Pos logit contributions
cient+0.140
ummy+0.131
 disl+0.127
 squash+0.126
ench+0.126
Neg logit contributions
 had-0.152
Who-0.150
 dial-0.147
Ann-0.146
 kn-0.146
Token quiet
Token ID5897
Feature activation+1.75e-01

Pos logit contributions
 had+0.045
 other+0.042
 also+0.039
Who+0.038
 moral+0.038
Neg logit contributions
uttering-0.056
ute-0.055
ach-0.054
cient-0.054
ower-0.054
Token compared
Token ID3688
Feature activation+1.49e-01

Pos logit contributions
cient+0.388
uttering+0.385
ocol+0.384
ute+0.376
oppy+0.370
Neg logit contributions
 had-0.171
 other-0.161
 sounded-0.135
 shouted-0.134
Who-0.131
Token to
Token ID284
Feature activation+7.26e-02

Pos logit contributions
cient+0.275
ute+0.269
ap+0.265
uttering+0.264
 Majesty+0.263
Neg logit contributions
 other-0.190
 had-0.181
Who-0.173
Where-0.164
He-0.162
Token the
Token ID262
Feature activation+9.21e-02

Pos logit contributions
ute+0.125
cient+0.124
uttering+0.124
ummy+0.122
ach+0.122
Neg logit contributions
Who-0.091
 had-0.090
 other-0.090
Where-0.083
 moral-0.082
Token noisy
Token ID31210
Feature activation+1.13e-01

Pos logit contributions
cient+0.153
ap+0.153
uttering+0.152
ute+0.151
ach+0.151
Neg logit contributions
Who-0.117
 other-0.117
 had-0.116
 moral-0.111
Where-0.108
Token se
Token ID384
Feature activation+1.77e-01

Pos logit contributions
uttering+0.199
cient+0.198
iddy+0.190
ute+0.189
ocol+0.187
Neg logit contributions
 other-0.150
 had-0.149
Who-0.138
 moral-0.133
 sounded-0.130
Tokenag
Token ID363
Feature activation+1.24e-01

Pos logit contributions
ummy+0.250
uttering+0.246
cient+0.243
iddy+0.240
ach+0.233
Neg logit contributions
 other-0.259
 had-0.257
 moral-0.254
He-0.250
Who-0.247
Tokenull
Token ID724
Feature activation+8.87e-02

Pos logit contributions
icol+0.200
 Majesty+0.196
cient+0.195
ummy+0.192
ocol+0.190
Neg logit contributions
 had-0.183
 other-0.177
 sounded-0.165
He-0.163
 moral-0.158
Tokens
Token ID82
Feature activation+7.77e-02

Pos logit contributions
ocol+0.168
uttering+0.166
ummy+0.165
ach+0.163
ute+0.162
Neg logit contributions
 had-0.102
 other-0.093
Who-0.087
 sounded-0.085
He-0.085
Token outside
Token ID2354
Feature activation+9.28e-02

Pos logit contributions
cient+0.134
ocol+0.132
ute+0.132
ummy+0.131
ach+0.131
Neg logit contributions
 had-0.099
 other-0.094
Who-0.088
 moral-0.086
 also-0.085
Token.
Token ID13
Feature activation+6.70e-02

Pos logit contributions
cient+0.180
ute+0.180
uttering+0.179
 Majesty+0.176
ach+0.175
Neg logit contributions
 had-0.100
 other-0.098
Who-0.092
 sounded-0.086
Where-0.085
Token children
Token ID1751
Feature activation+1.06e-01

Pos logit contributions
Who+0.092
Where+0.087
Less+0.079
 moral+0.077
 other+0.073
Neg logit contributions
cient-0.213
iddy-0.210
ap-0.210
ach-0.205
ummy-0.203
Token put
Token ID1234
Feature activation+1.10e-01

Pos logit contributions
ute+0.204
uttering+0.202
cient+0.202
ach+0.201
ocol+0.201
Neg logit contributions
 had-0.109
Who-0.105
 other-0.102
 moral-0.094
Where-0.094
Token on
Token ID319
Feature activation+9.60e-02

Pos logit contributions
uttering+0.199
cient+0.199
ute+0.198
 Majesty+0.198
ower+0.194
Neg logit contributions
 other-0.138
 had-0.134
Who-0.119
 sounded-0.116
 dial-0.113
Token their
Token ID511
Feature activation+9.24e-02

Pos logit contributions
cient+0.165
ute+0.165
uttering+0.164
ower+0.162
ap+0.160
Neg logit contributions
 other-0.124
 had-0.118
Who-0.118
 dial-0.111
 moral-0.110
Token bath
Token ID7837
Feature activation+1.32e-01

Pos logit contributions
uttering+0.144
cient+0.140
iddy+0.140
ute+0.139
ummy+0.137
Neg logit contributions
 other-0.138
 had-0.135
Who-0.129
 moral-0.125
 dial-0.123
Tokenro
Token ID305
Feature activation+1.74e-01

Pos logit contributions
ocol+0.235
cient+0.218
uttering+0.216
arter+0.206
 Majesty+0.206
Neg logit contributions
 had-0.189
 other-0.186
 sounded-0.165
 dial-0.158
 was-0.153
Tokenbes
Token ID12636
Feature activation+1.30e-01

Pos logit contributions
ummy+0.383
cient+0.379
ocol+0.368
arter+0.366
imb+0.360
Neg logit contributions
 had-0.189
 shouted-0.162
 other-0.154
 dial-0.149
 learned-0.145
Token and
Token ID290
Feature activation+4.70e-02

Pos logit contributions
ocol+0.262
 Majesty+0.257
ach+0.247
uttering+0.246
 disl+0.244
Neg logit contributions
 had-0.146
 other-0.136
 sounded-0.120
 also-0.119
He-0.116
Token ran
Token ID4966
Feature activation+7.68e-02

Pos logit contributions
cient+0.097
ute+0.097
uttering+0.097
ach+0.097
ap+0.096
Neg logit contributions
 had-0.041
Who-0.041
 other-0.039
Where-0.036
 dial-0.036
Token around
Token ID1088
Feature activation+8.53e-02

Pos logit contributions
cient+0.116
ocol+0.113
ummy+0.111
ute+0.111
ap+0.111
Neg logit contributions
 had-0.114
Who-0.112
 other-0.112
Where-0.107
 dial-0.105
Token in
Token ID287
Feature activation+5.52e-02

Pos logit contributions
cient+0.153
ute+0.150
ower+0.150
 Majesty+0.148
uttering+0.148
Neg logit contributions
 had-0.105
 other-0.105
Who-0.102
Where-0.095
 moral-0.093
Token pot
Token ID1787
Feature activation-5.40e-02

Pos logit contributions
Who+0.107
 had+0.106
 other+0.105
 moral+0.102
Where+0.100
Neg logit contributions
cient-0.088
ap-0.088
ach-0.087
ute-0.087
uttering-0.086
Token in
Token ID287
Feature activation-6.37e-02

Pos logit contributions
Who+0.066
 had+0.065
 other+0.064
 moral+0.061
Where+0.061
Neg logit contributions
cient-0.092
ute-0.091
ap-0.090
ach-0.090
uttering-0.090
Token the
Token ID262
Feature activation-4.47e-02

Pos logit contributions
Who+0.103
 had+0.098
Well+0.096
He+0.094
 moral+0.094
Neg logit contributions
ap-0.081
ummy-0.077
uttering-0.076
ach-0.075
ocol-0.075
Token sky
Token ID6766
Feature activation-3.03e-02

Pos logit contributions
Who+0.061
 had+0.060
 other+0.059
Where+0.057
 moral+0.057
Neg logit contributions
cient-0.070
ute-0.070
ap-0.070
ach-0.070
uttering-0.069
Token.
Token ID13
Feature activation-6.14e-02

Pos logit contributions
 had+0.030
 other+0.029
Who+0.028
 sounded+0.026
Where+0.026
Neg logit contributions
cient-0.060
ute-0.059
ocol-0.059
ach-0.058
uttering-0.058
Token Joe
Token ID5689
Feature activation-1.11e-01

Pos logit contributions
Who+0.093
 had+0.090
 other+0.088
Where+0.088
 moral+0.087
Neg logit contributions
ap-0.087
cient-0.087
ummy-0.087
ach-0.085
uttering-0.085
Token was
Token ID373
Feature activation-5.69e-02

Pos logit contributions
Who+0.107
Where+0.100
Less+0.090
 moral+0.089
 other+0.086
Neg logit contributions
ap-0.221
cient-0.218
ute-0.216
iddy-0.213
ach-0.212
Token so
Token ID523
Feature activation-5.64e-02

Pos logit contributions
 had+0.063
 other+0.060
 also+0.059
Who+0.058
Where+0.054
Neg logit contributions
cient-0.112
ocol-0.110
ute-0.110
ach-0.109
ummy-0.109
Token happy
Token ID3772
Feature activation-2.10e-02

Pos logit contributions
 had+0.062
Who+0.061
 other+0.060
 moral+0.057
 also+0.056
Neg logit contributions
cient-0.105
ute-0.104
ach-0.103
ocol-0.103
ummy-0.103
Token that
Token ID326
Feature activation-4.71e-02

Pos logit contributions
 had+0.031
Who+0.029
 other+0.028
He+0.027
Where+0.027
Neg logit contributions
cient-0.036
ocol-0.035
ute-0.034
 Majesty-0.033
ubby-0.033
Token he
Token ID339
Feature activation-1.06e-01

Pos logit contributions
Who+0.058
Where+0.053
 dial+0.052
 moral+0.052
 had+0.051
Neg logit contributions
ap-0.081
ummy-0.080
cient-0.078
iddy-0.077
uttering-0.077
Token river
Token ID7850
Feature activation-3.69e-02

Pos logit contributions
Who+0.110
 had+0.109
 other+0.108
 moral+0.105
Where+0.104
Neg logit contributions
cient-0.059
ap-0.059
ute-0.058
ach-0.058
uttering-0.058
Token and
Token ID290
Feature activation-6.40e-02

Pos logit contributions
 had+0.049
 other+0.047
Who+0.046
 moral+0.044
 dial+0.044
Neg logit contributions
ocol-0.060
cient-0.060
ute-0.060
uttering-0.059
ach-0.059
Token went
Token ID1816
Feature activation-4.63e-02

Pos logit contributions
Who+0.055
 moral+0.049
Where+0.049
 other+0.046
Less+0.045
Neg logit contributions
ap-0.134
cient-0.131
ute-0.128
ach-0.128
uttering-0.127
Token home
Token ID1363
Feature activation-3.79e-02

Pos logit contributions
 had+0.063
 other+0.062
Who+0.061
 dial+0.058
Where+0.057
Neg logit contributions
cient-0.075
ach-0.072
ocol-0.072
ummy-0.072
ute-0.072
Token.
Token ID13
Feature activation-5.81e-02

Pos logit contributions
 had+0.046
Who+0.045
 other+0.045
 dial+0.042
Where+0.041
Neg logit contributions
cient-0.066
ute-0.065
ocol-0.065
ach-0.064
ummy-0.064
Token Mark
Token ID2940
Feature activation-1.09e-01

Pos logit contributions
Who+0.088
 other+0.084
 moral+0.083
Where+0.082
Looks+0.079
Neg logit contributions
ummy-0.080
ap-0.077
cient-0.074
icol-0.073
ower-0.069
Token learned
Token ID4499
Feature activation-4.15e-02

Pos logit contributions
Who+0.107
Where+0.096
Less+0.087
Looks+0.087
 moral+0.083
Neg logit contributions
ap-0.220
cient-0.218
ute-0.212
iddy-0.209
 Majesty-0.209
Token an
Token ID281
Feature activation-4.29e-02

Pos logit contributions
 other+0.066
 had+0.064
 also+0.063
He+0.063
Who+0.063
Neg logit contributions
ocol-0.063
cient-0.061
ower-0.061
ap-0.060
uttering-0.060
Token important
Token ID1593
Feature activation-2.53e-02

Pos logit contributions
Who+0.069
See+0.065
Where+0.065
 had+0.061
Maybe+0.060
Neg logit contributions
ummy-0.064
ute-0.062
iddy-0.061
cient-0.060
ubby-0.059
Token lesson
Token ID11483
Feature activation-1.63e-02

Pos logit contributions
Who+0.026
Where+0.023
 had+0.023
See+0.022
 other+0.022
Neg logit contributions
ute-0.051
ower-0.048
uttering-0.048
ap-0.048
cient-0.048
Token that
Token ID326
Feature activation-5.10e-02

Pos logit contributions
Who+0.031
 had+0.030
 other+0.030
Where+0.029
 moral+0.029
Neg logit contributions
ute-0.017
ap-0.017
cient-0.017
ach-0.016
uttering-0.016
Token!
Token ID0
Feature activation-4.70e-02

Pos logit contributions
 had+0.023
 other+0.020
Who+0.019
 sounded+0.018
Where+0.018
Neg logit contributions
cient-0.038
ute-0.036
 Majesty-0.034
uttering-0.033
ap-0.033
Token
Token ID220
Feature activation-1.01e-02

Pos logit contributions
 had+0.044
Who+0.044
 other+0.041
See+0.040
Where+0.040
Neg logit contributions
ach-0.096
ute-0.096
cient-0.095
ap-0.095
ocol-0.094
Token
Token ID198
Feature activation-2.42e-02

Pos logit contributions
Who+0.006
 had+0.006
 other+0.005
He+0.005
Where+0.005
Neg logit contributions
cient-0.024
ocol-0.023
ute-0.023
ap-0.023
ach-0.023
Token
Token ID198
Feature activation-4.15e-02

Pos logit contributions
Who+0.020
 had+0.019
 other+0.018
Where+0.017
 moral+0.017
Neg logit contributions
cient-0.051
ute-0.051
ap-0.051
ach-0.051
uttering-0.051
TokenAs
Token ID1722
Feature activation-3.33e-02

Pos logit contributions
Who+0.032
 had+0.031
 other+0.030
 dial+0.028
 moral+0.028
Neg logit contributions
ute-0.088
cient-0.088
ap-0.088
uttering-0.088
ach-0.088
Token Daisy
Token ID40355
Feature activation-1.09e-01

Pos logit contributions
Who+0.048
 dial+0.045
 had+0.045
 moral+0.044
Where+0.044
Neg logit contributions
ummy-0.049
ap-0.048
uttering-0.048
ach-0.048
cient-0.047
Token enjoyed
Token ID8359
Feature activation-4.35e-02

Pos logit contributions
Who+0.127
Where+0.121
 moral+0.118
 other+0.117
Less+0.116
Neg logit contributions
ach-0.184
ap-0.182
ummy-0.180
cient-0.180
uck-0.178
Token the
Token ID262
Feature activation-5.44e-02

Pos logit contributions
 other+0.039
Who+0.038
 had+0.038
 also+0.033
Where+0.033
Neg logit contributions
cient-0.102
ute-0.101
ower-0.097
ish-0.094
ap-0.093
Token honey
Token ID12498
Feature activation-5.44e-02

Pos logit contributions
 other+0.081
 had+0.079
 moral+0.076
Who+0.075
 sounded+0.071
Neg logit contributions
cient-0.085
ute-0.082
ach-0.082
ap-0.082
uttering-0.081
Token,
Token ID11
Feature activation-4.21e-02

Pos logit contributions
 had+0.101
 other+0.096
Who+0.092
 also+0.091
 sounded+0.090
Neg logit contributions
cient-0.067
ute-0.066
 Majesty-0.064
ach-0.063
ocol-0.063
Token she
Token ID673
Feature activation-9.34e-02

Pos logit contributions
 had+0.038
Who+0.038
 other+0.038
Where+0.034
 also+0.032
Neg logit contributions
cient-0.090
ute-0.090
ap-0.089
ach-0.089
ower-0.087
Token had
Token ID550
Feature activation-4.08e-02

Pos logit contributions
Who+0.067
Where+0.062
 moral+0.059
He+0.056
Maybe+0.054
Neg logit contributions
ap-0.114
cient-0.110
iddy-0.109
ute-0.107
ach-0.107
Token a
Token ID257
Feature activation-5.24e-02

Pos logit contributions
Who+0.055
 had+0.052
Where+0.051
 other+0.050
 moral+0.050
Neg logit contributions
ap-0.065
cient-0.065
ach-0.064
ute-0.064
uttering-0.063
Token great
Token ID1049
Feature activation-3.34e-02

Pos logit contributions
Who+0.071
Where+0.066
 had+0.066
 other+0.064
He+0.063
Neg logit contributions
ap-0.084
ute-0.083
cient-0.082
uttering-0.081
ummy-0.080
Token time
Token ID640
Feature activation-1.70e-02

Pos logit contributions
 had+0.034
 other+0.032
 moral+0.031
Who+0.030
 sounded+0.028
Neg logit contributions
cient-0.068
ocol-0.067
ach-0.065
uttering-0.065
ute-0.065
Token.
Token ID13
Feature activation-5.23e-02

Pos logit contributions
Who+0.025
 had+0.024
 moral+0.023
Where+0.023
 other+0.023
Neg logit contributions
ap-0.025
ach-0.025
uttering-0.024
ute-0.024
cient-0.024
Token Joe
Token ID5689
Feature activation-1.08e-01

Pos logit contributions
Who+0.097
 dial+0.093
Where+0.091
 other+0.088
 prayed+0.088
Neg logit contributions
ummy-0.063
ap-0.062
cient-0.057
iddy-0.056
ower-0.056
Token was
Token ID373
Feature activation-5.06e-02

Pos logit contributions
Who+0.091
Where+0.082
 other+0.076
 had+0.075
 moral+0.074
Neg logit contributions
ap-0.230
cient-0.228
ute-0.227
ach-0.224
iddy-0.223
Token happy
Token ID3772
Feature activation-2.43e-02

Pos logit contributions
 had+0.059
Who+0.057
 other+0.056
Where+0.053
 moral+0.053
Neg logit contributions
cient-0.092
ute-0.091
ach-0.091
ower-0.090
ummy-0.090
Token and
Token ID290
Feature activation-6.79e-02

Pos logit contributions
Who+0.040
 had+0.039
 other+0.038
Where+0.037
 moral+0.037
Neg logit contributions
cient-0.032
ute-0.032
ap-0.032
ach-0.031
uttering-0.031
Token he
Token ID339
Feature activation-9.97e-02

Pos logit contributions
 had+0.068
Who+0.065
 other+0.064
 moral+0.058
Where+0.058
Neg logit contributions
cient-0.133
ute-0.133
uttering-0.132
ocol-0.132
ach-0.132
Token didn
Token ID1422
Feature activation-1.92e-02

Pos logit contributions
Who+0.095
Where+0.089
 moral+0.078
Less+0.078
 dial+0.074
Neg logit contributions
cient-0.198
ap-0.198
iddy-0.197
ute-0.194
 Majesty-0.191
Token When
Token ID1649
Feature activation-4.11e-02

Pos logit contributions
Who+0.101
 other+0.098
 moral+0.097
Where+0.095
 had+0.094
Neg logit contributions
ummy-0.075
ap-0.071
ach-0.067
ower-0.067
cient-0.066
Token they
Token ID484
Feature activation-9.16e-02

Pos logit contributions
Who+0.065
Where+0.062
 moral+0.061
 dial+0.060
Less+0.059
Neg logit contributions
ummy-0.054
ap-0.052
ach-0.050
iddy-0.048
cient-0.047
Token got
Token ID1392
Feature activation-5.08e-02

Pos logit contributions
Who+0.125
 moral+0.121
Looks+0.116
Where+0.115
Anne+0.113
Neg logit contributions
cient-0.133
iddy-0.127
ap-0.127
uck-0.123
melon-0.120
Token there
Token ID612
Feature activation-4.74e-02

Pos logit contributions
 had+0.087
 other+0.083
Who+0.080
 also+0.079
Where+0.078
Neg logit contributions
cient-0.072
ocol-0.069
 Majesty-0.068
ute-0.068
ower-0.067
Token,
Token ID11
Feature activation-4.68e-02

Pos logit contributions
 had+0.096
 other+0.092
Who+0.089
 sounded+0.086
 also+0.086
Neg logit contributions
cient-0.050
ocol-0.050
ute-0.050
ower-0.048
uttering-0.047
Token John
Token ID1757
Feature activation-1.08e-01

Pos logit contributions
Who+0.077
 moral+0.073
 had+0.073
Where+0.072
 other+0.071
Neg logit contributions
ap-0.058
cient-0.057
ummy-0.057
ach-0.056
uttering-0.056
Token's
Token ID338
Feature activation-4.98e-02

Pos logit contributions
Who+0.111
Where+0.104
Less+0.102
 moral+0.102
Looks+0.097
Neg logit contributions
cient-0.201
ap-0.201
 Majesty-0.196
ach-0.195
iddy-0.195
Token dad
Token ID9955
Feature activation-9.54e-02

Pos logit contributions
 other+0.077
 had+0.076
 moral+0.073
Who+0.073
 also+0.069
Neg logit contributions
cient-0.074
ocol-0.073
ach-0.071
uttering-0.071
ute-0.069
Token had
Token ID550
Feature activation-4.68e-02

Pos logit contributions
Who+0.097
Less+0.091
 moral+0.091
Where+0.090
Looks+0.084
Neg logit contributions
cient-0.179
ap-0.176
ach-0.174
iddy-0.172
ummy-0.170
Token a
Token ID257
Feature activation-5.39e-02

Pos logit contributions
Who+0.059
 moral+0.054
Where+0.054
 had+0.054
 other+0.053
Neg logit contributions
ap-0.077
cient-0.077
ach-0.077
uttering-0.076
ute-0.076
Token surprise
Token ID5975
Feature activation-2.48e-02

Pos logit contributions
 had+0.067
Who+0.067
 other+0.065
 moral+0.063
Where+0.061
Neg logit contributions
ach-0.092
cient-0.091
ute-0.091
uttering-0.090
ap-0.090
Token him
Token ID683
Feature activation-5.77e-02

Pos logit contributions
Who+0.044
 other+0.043
 had+0.042
Where+0.041
 moral+0.040
Neg logit contributions
ute-0.038
uttering-0.038
cient-0.037
ower-0.037
ap-0.037
Token for
Token ID329
Feature activation-5.15e-02

Pos logit contributions
Who+0.081
 had+0.078
 other+0.076
Where+0.075
 moral+0.074
Neg logit contributions
ap-0.089
cient-0.089
ute-0.088
ach-0.087
uttering-0.087
Token the
Token ID262
Feature activation-5.93e-02

Pos logit contributions
Who+0.075
 had+0.069
Maybe+0.068
 moral+0.068
Where+0.068
Neg logit contributions
ap-0.077
uttering-0.074
cient-0.074
ach-0.073
ocol-0.073
Token gift
Token ID6979
Feature activation-3.58e-02

Pos logit contributions
Who+0.079
 had+0.077
 other+0.076
 moral+0.073
Where+0.073
Neg logit contributions
cient-0.096
ute-0.095
ap-0.095
ach-0.094
uttering-0.094
Token.
Token ID13
Feature activation-5.36e-02

Pos logit contributions
Who+0.054
Where+0.049
See+0.048
 moral+0.048
Less+0.047
Neg logit contributions
ap-0.054
ummy-0.054
ute-0.052
ach-0.051
cient-0.051
Token Bob
Token ID5811
Feature activation-1.08e-01

Pos logit contributions
Who+0.089
 had+0.087
 other+0.085
Where+0.084
 moral+0.083
Neg logit contributions
cient-0.069
ap-0.068
ute-0.068
uttering-0.067
ach-0.067
Token smiled
Token ID13541
Feature activation-1.96e-02

Pos logit contributions
Who+0.087
Where+0.078
 other+0.072
 had+0.070
 moral+0.069
Neg logit contributions
ap-0.233
cient-0.231
ute-0.229
ach-0.227
iddy-0.225
Token and
Token ID290
Feature activation-6.73e-02

Pos logit contributions
 had+0.022
He+0.022
 other+0.019
 also+0.019
 sounded+0.019
Neg logit contributions
ocol-0.042
cient-0.041
 disl-0.038
ach-0.038
ower-0.038
Token waved
Token ID26834
Feature activation-1.12e-02

Pos logit contributions
Who+0.045
 had+0.042
 other+0.040
Where+0.038
 moral+0.038
Neg logit contributions
cient-0.153
ap-0.152
ute-0.152
uttering-0.151
ach-0.151
Token goodbye
Token ID24829
Feature activation-8.73e-04

Pos logit contributions
 had+0.017
 other+0.017
He+0.016
Who+0.016
Where+0.015
Neg logit contributions
cient-0.019
owers-0.018
ower-0.018
ute-0.018
ocol-0.017
Token,
Token ID11
Feature activation-5.25e-02

Pos logit contributions
 had+0.001
 other+0.001
He+0.001
 shouted+0.001
Who+0.001
Neg logit contributions
ocol-0.002
cient-0.002
imb-0.001
ute-0.001
ubby-0.001
Token mistakes
Token ID10135
Feature activation-2.28e-02

Pos logit contributions
 other+0.043
Who+0.042
 had+0.040
 moral+0.039
Where+0.038
Neg logit contributions
cient-0.074
uttering-0.073
ute-0.071
ower-0.071
ap-0.070
Token.
Token ID13
Feature activation-5.32e-02

Pos logit contributions
 had+0.034
 other+0.032
Who+0.031
 sounded+0.031
 also+0.030
Neg logit contributions
uttering-0.036
cient-0.036
ocol-0.035
ute-0.035
ach-0.035
Token\n
Token ID198
Feature activation-2.32e-02

Pos logit contributions
Who+0.067
 other+0.065
 had+0.064
 dial+0.063
 sounded+0.061
Neg logit contributions
ummy-0.090
ap-0.086
iddy-0.081
ach-0.081
imb-0.080
Token\n
Token ID198
Feature activation-4.39e-02

Pos logit contributions
Who+0.035
 dial+0.033
Where+0.032
 had+0.031
 other+0.031
Neg logit contributions
ute-0.033
ummy-0.032
ach-0.032
ap-0.032
uttering-0.031
TokenWhen
Token ID2215
Feature activation-4.00e-02

Pos logit contributions
 dial+0.058
 whispered+0.052
 other+0.050
Who+0.050
 counted+0.049
Neg logit contributions
ach-0.070
ummy-0.065
ute-0.065
ap-0.063
uttering-0.063
Token John
Token ID1757
Feature activation-1.07e-01

Pos logit contributions
Who+0.056
Where+0.053
 had+0.053
 moral+0.052
 dial+0.051
Neg logit contributions
ap-0.060
ummy-0.060
ach-0.059
cient-0.057
ute-0.057
Token arrived
Token ID5284
Feature activation-4.80e-02

Pos logit contributions
Who+0.113
Where+0.112
 moral+0.104
Less+0.104
Looks+0.103
Neg logit contributions
ap-0.197
uck-0.196
ish-0.194
cient-0.191
iddy-0.191
Token home
Token ID1363
Feature activation-4.93e-02

Pos logit contributions
 other+0.080
 had+0.079
He+0.073
 also+0.073
Who+0.072
Neg logit contributions
cient-0.069
 disl-0.063
ocol-0.063
ute-0.063
ower-0.062
Token,
Token ID11
Feature activation-5.12e-02

Pos logit contributions
 had+0.096
 other+0.092
Who+0.091
 also+0.087
He+0.086
Neg logit contributions
cient-0.057
ute-0.056
uttering-0.055
ocol-0.052
ower-0.052
Token the
Token ID262
Feature activation-6.31e-02

Pos logit contributions
Who+0.056
 had+0.052
Where+0.051
 other+0.050
 moral+0.050
Neg logit contributions
ap-0.093
cient-0.092
ummy-0.091
ute-0.091
ach-0.091
Token sun
Token ID4252
Feature activation-7.42e-02

Pos logit contributions
Who+0.092
Where+0.086
 had+0.086
He+0.084
 dial+0.084
Neg logit contributions
ap-0.092
ute-0.091
ower-0.090
ummy-0.090
iddy-0.089
Token after
Token ID706
Feature activation-3.51e-02

Pos logit contributions
Who+0.048
 had+0.047
 other+0.045
Where+0.044
 moral+0.043
Neg logit contributions
cient-0.070
ute-0.069
ap-0.069
ach-0.068
uttering-0.068
Token that
Token ID326
Feature activation-4.89e-02

Pos logit contributions
 had+0.050
Who+0.049
 other+0.048
 moral+0.046
Where+0.045
Neg logit contributions
cient-0.056
ute-0.055
ap-0.054
ach-0.053
 Majesty-0.053
Token.
Token ID13
Feature activation-5.51e-02

Pos logit contributions
Who+0.078
Where+0.073
 dial+0.071
He+0.071
 had+0.070
Neg logit contributions
ap-0.069
cient-0.068
ute-0.066
ummy-0.066
uttering-0.065
Token Every
Token ID3887
Feature activation-3.84e-02

Pos logit contributions
Who+0.076
 had+0.071
 other+0.070
Where+0.070
 dial+0.070
Neg logit contributions
ummy-0.088
ap-0.086
cient-0.084
iddy-0.082
ach-0.081
Token day
Token ID1110
Feature activation-3.10e-02

Pos logit contributions
Who+0.044
 had+0.044
 other+0.043
 moral+0.041
Where+0.041
Neg logit contributions
cient-0.069
ute-0.068
ach-0.068
ap-0.067
uttering-0.067
Token Lucy
Token ID22162
Feature activation-1.07e-01

Pos logit contributions
Who+0.067
 moral+0.065
 had+0.064
Where+0.064
 other+0.064
Neg logit contributions
ap-0.023
ummy-0.022
ach-0.022
cient-0.022
uttering-0.021
Token would
Token ID561
Feature activation-5.28e-02

Pos logit contributions
 had+0.094
Who+0.089
 other+0.085
 also+0.079
 prayed+0.078
Neg logit contributions
ute-0.223
uttering-0.223
ach-0.222
ap-0.221
cient-0.221
Token observe
Token ID12414
Feature activation-3.98e-02

Pos logit contributions
 had+0.059
 also+0.055
 other+0.054
Who+0.051
 prayed+0.049
Neg logit contributions
uttering-0.103
ower-0.103
ute-0.103
ocol-0.102
ach-0.101
Token the
Token ID262
Feature activation-7.07e-02

Pos logit contributions
 other+0.047
 had+0.043
Who+0.042
Where+0.040
 sounded+0.039
Neg logit contributions
ute-0.077
ower-0.076
uttering-0.075
ach-0.074
cient-0.074
Token sheep
Token ID15900
Feature activation-5.70e-02

Pos logit contributions
Who+0.118
 had+0.110
Less+0.110
He+0.110
Where+0.109
Neg logit contributions
cient-0.093
ummy-0.091
ute-0.090
ach-0.090
uttering-0.089
Token and
Token ID290
Feature activation-6.95e-02

Pos logit contributions
 had+0.080
 other+0.078
 also+0.073
Who+0.072
 moral+0.071
Neg logit contributions
ach-0.091
ocol-0.090
ute-0.090
ower-0.090
uttering-0.090
Token Bob
Token ID5811
Feature activation-7.62e-02

Pos logit contributions
Who+0.068
Maybe+0.064
Where+0.063
He+0.063
 moral+0.063
Neg logit contributions
ap-0.048
ocol-0.048
ach-0.047
ummy-0.046
 Majesty-0.046
Token and
Token ID290
Feature activation-5.31e-02

Pos logit contributions
Who+0.116
 had+0.110
 other+0.108
Where+0.108
 moral+0.107
Neg logit contributions
cient-0.108
ap-0.108
ute-0.107
ach-0.106
uttering-0.106
Token the
Token ID262
Feature activation-5.89e-02

Pos logit contributions
 had+0.074
Who+0.073
 other+0.072
Where+0.068
 moral+0.067
Neg logit contributions
ute-0.086
uttering-0.084
cient-0.084
ap-0.083
ower-0.083
Token man
Token ID582
Feature activation-7.90e-02

Pos logit contributions
Who+0.077
 had+0.074
Looks+0.074
Less+0.073
Where+0.073
Neg logit contributions
ower-0.089
ap-0.088
cient-0.088
uttering-0.088
ute-0.087
Token.
Token ID13
Feature activation-6.07e-02

Pos logit contributions
Who+0.088
Where+0.079
 moral+0.076
Looks+0.074
Less+0.074
Neg logit contributions
ap-0.151
cient-0.148
ute-0.143
ach-0.141
uttering-0.139
Token Bob
Token ID5811
Feature activation-1.07e-01

Pos logit contributions
Who+0.078
 had+0.074
 other+0.073
 moral+0.072
Where+0.071
Neg logit contributions
ap-0.099
ummy-0.099
cient-0.099
ach-0.098
ute-0.096
Token was
Token ID373
Feature activation-5.37e-02

Pos logit contributions
Who+0.097
Where+0.088
Less+0.078
 moral+0.077
 other+0.076
Neg logit contributions
cient-0.221
ap-0.221
ute-0.214
ach-0.214
 Majesty-0.213
Token happy
Token ID3772
Feature activation-2.44e-02

Pos logit contributions
 had+0.062
 other+0.057
 also+0.056
Who+0.056
Where+0.053
Neg logit contributions
cient-0.102
ute-0.101
ocol-0.101
ach-0.100
ower-0.100
Token because
Token ID780
Feature activation-6.18e-02

Pos logit contributions
Who+0.039
 had+0.038
 other+0.038
Where+0.037
 moral+0.037
Neg logit contributions
cient-0.032
ap-0.032
ute-0.032
uttering-0.032
ach-0.032
Token of
Token ID286
Feature activation-5.27e-02

Pos logit contributions
Who+0.075
 dial+0.070
Where+0.068
 had+0.067
Less+0.067
Neg logit contributions
ummy-0.104
ach-0.102
ap-0.102
iddy-0.099
uttering-0.099
Token his
Token ID465
Feature activation-6.55e-02

Pos logit contributions
Who+0.062
 had+0.062
 other+0.061
 moral+0.057
Where+0.057
Neg logit contributions
ute-0.094
cient-0.094
uttering-0.093
ap-0.092
ower-0.092
Token the
Token ID262
Feature activation-5.20e-02

Pos logit contributions
Who+0.038
 had+0.037
 other+0.036
Where+0.035
 moral+0.035
Neg logit contributions
ap-0.040
cient-0.040
ute-0.040
uttering-0.040
ach-0.040
Token theater
Token ID13766
Feature activation-3.53e-02

Pos logit contributions
Who+0.088
See+0.086
Maybe+0.085
Less+0.085
He+0.085
Neg logit contributions
ummy-0.066
cient-0.062
ap-0.061
ach-0.060
ute-0.057
Token.
Token ID13
Feature activation-5.65e-02

Pos logit contributions
Who+0.047
 had+0.044
 other+0.043
Where+0.043
 moral+0.043
Neg logit contributions
ap-0.058
cient-0.057
ute-0.057
ach-0.056
uttering-0.056
Token Even
Token ID3412
Feature activation-4.40e-02

Pos logit contributions
Who+0.085
 other+0.082
 moral+0.081
 had+0.080
Where+0.078
Neg logit contributions
ummy-0.080
ap-0.079
cient-0.078
iddy-0.075
ach-0.075
Token though
Token ID996
Feature activation-3.35e-02

Pos logit contributions
Who+0.063
 had+0.062
 other+0.060
Where+0.059
 moral+0.058
Neg logit contributions
cient-0.066
ap-0.066
ute-0.066
uttering-0.065
ach-0.065
Token John
Token ID1757
Feature activation-1.07e-01

Pos logit contributions
Who+0.044
 had+0.043
 other+0.042
Where+0.040
 moral+0.040
Neg logit contributions
cient-0.055
ute-0.055
ap-0.054
uttering-0.054
ach-0.054
Token was
Token ID373
Feature activation-4.55e-02

Pos logit contributions
Who+0.097
Where+0.089
Less+0.087
 moral+0.086
 other+0.081
Neg logit contributions
cient-0.217
ap-0.216
ach-0.215
ower-0.210
iddy-0.209
Token only
Token ID691
Feature activation-6.16e-02

Pos logit contributions
 had+0.052
 other+0.050
Who+0.050
 also+0.049
Where+0.047
Neg logit contributions
cient-0.087
ute-0.085
ummy-0.083
ach-0.083
ocol-0.083
Token three
Token ID1115
Feature activation-5.64e-02

Pos logit contributions
 had+0.084
 other+0.081
Who+0.081
 also+0.076
Where+0.076
Neg logit contributions
cient-0.105
ute-0.104
ummy-0.104
ach-0.101
ocol-0.098
Token,
Token ID11
Feature activation-5.44e-02

Pos logit contributions
Who+0.096
 had+0.091
Where+0.090
 other+0.090
 moral+0.089
Neg logit contributions
ap-0.070
ach-0.069
cient-0.069
ute-0.067
ummy-0.067
Token he
Token ID339
Feature activation-1.01e-01

Pos logit contributions
Who+0.068
 had+0.063
 moral+0.063
Where+0.062
 other+0.062
Neg logit contributions
ap-0.089
ach-0.088
cient-0.088
ummy-0.088
ute-0.087
Token to
Token ID284
Feature activation-4.50e-02

Pos logit contributions
 had+0.027
 other+0.025
 sounded+0.023
 also+0.022
Who+0.022
Neg logit contributions
cient-0.050
 Majesty-0.049
ocol-0.049
ach-0.048
uttering-0.048
Token play
Token ID711
Feature activation-2.00e-02

Pos logit contributions
 had+0.052
 other+0.052
Who+0.051
 dial+0.048
 also+0.047
Neg logit contributions
uttering-0.082
cient-0.082
ute-0.081
 Majesty-0.080
ummy-0.080
Token with
Token ID351
Feature activation-3.60e-02

Pos logit contributions
 other+0.024
 had+0.023
Who+0.022
 sounded+0.022
Where+0.022
Neg logit contributions
cient-0.039
 Majesty-0.038
ummy-0.037
uttering-0.036
ocol-0.036
Token.
Token ID13
Feature activation-5.06e-02

Pos logit contributions
Who+0.042
 had+0.040
 other+0.039
Where+0.038
 moral+0.038
Neg logit contributions
cient-0.065
ute-0.064
ap-0.064
uttering-0.064
ach-0.063
Token As
Token ID1081
Feature activation-3.08e-02

Pos logit contributions
Who+0.065
 had+0.063
 other+0.061
Where+0.060
 moral+0.059
Neg logit contributions
cient-0.084
ap-0.083
ute-0.083
uttering-0.082
ach-0.082
Token Lucy
Token ID22162
Feature activation-1.07e-01

Pos logit contributions
Who+0.042
He+0.039
 moral+0.039
 dial+0.039
Where+0.039
Neg logit contributions
ummy-0.047
ap-0.045
uttering-0.045
ach-0.044
iddy-0.044
Token was
Token ID373
Feature activation-4.46e-02

Pos logit contributions
Who+0.111
Where+0.104
 moral+0.102
Less+0.100
 other+0.096
Neg logit contributions
ap-0.201
ach-0.197
uc-0.195
iddy-0.194
uck-0.193
Token pushing
Token ID7796
Feature activation-3.56e-02

Pos logit contributions
 had+0.072
 other+0.069
Who+0.068
 also+0.066
 dial+0.065
Neg logit contributions
cient-0.065
ute-0.064
ummy-0.062
ach-0.061
ocol-0.060
Token the
Token ID262
Feature activation-6.12e-02

Pos logit contributions
Who+0.050
 had+0.048
 other+0.047
Where+0.046
 moral+0.046
Neg logit contributions
cient-0.055
ap-0.055
ute-0.055
uttering-0.054
ach-0.054
Token soap
Token ID19533
Feature activation-4.29e-02

Pos logit contributions
Who+0.092
 had+0.087
 prayed+0.086
Less+0.086
Where+0.086
Neg logit contributions
ute-0.087
ap-0.086
ummy-0.086
ower-0.085
ach-0.085
Token around
Token ID1088
Feature activation-3.35e-02

Pos logit contributions
 had+0.067
 other+0.066
 sounded+0.061
Who+0.061
 also+0.059
Neg logit contributions
cient-0.064
ocol-0.064
 Majesty-0.063
ach-0.062
uttering-0.061
Token\n
Token ID198
Feature activation-1.56e-02

Pos logit contributions
Who+0.065
 dial+0.061
 prayed+0.060
 moral+0.060
Where+0.059
Neg logit contributions
ap-0.074
ummy-0.071
cient-0.070
ute-0.069
ower-0.068
Token\n
Token ID198
Feature activation-3.81e-02

Pos logit contributions
Who+0.020
 dial+0.018
Where+0.017
 prayed+0.017
 had+0.017
Neg logit contributions
ute-0.027
ap-0.026
uttering-0.026
cient-0.026
ummy-0.026
TokenAfter
Token ID3260
Feature activation-2.99e-02

Pos logit contributions
 dial+0.047
 prayed+0.043
Who+0.043
 other+0.038
 had+0.038
Neg logit contributions
ute-0.064
ap-0.064
uttering-0.063
uck-0.062
ach-0.062
Token that
Token ID326
Feature activation-3.96e-02

Pos logit contributions
Who+0.047
 had+0.044
 other+0.044
Where+0.043
 moral+0.043
Neg logit contributions
ap-0.041
cient-0.040
uttering-0.040
ute-0.040
ach-0.040
Token day
Token ID1110
Feature activation-2.50e-02

Pos logit contributions
 had+0.071
 other+0.067
 moral+0.066
Who+0.066
 also+0.063
Neg logit contributions
ocol-0.053
cient-0.051
uttering-0.050
 Majesty-0.050
ach-0.049
Token Sam
Token ID3409
Feature activation-1.07e-01

Pos logit contributions
Who+0.044
 moral+0.041
Where+0.041
 other+0.041
 dial+0.041
Neg logit contributions
ap-0.031
uttering-0.030
ute-0.029
ach-0.029
ummy-0.029
Token was
Token ID373
Feature activation-4.46e-02

Pos logit contributions
Who+0.099
Where+0.093
Less+0.083
 moral+0.081
Looks+0.079
Neg logit contributions
ap-0.212
iddy-0.210
cient-0.209
ach-0.207
ummy-0.206
Token always
Token ID1464
Feature activation-4.52e-02

Pos logit contributions
 had+0.055
 other+0.050
 also+0.049
 learned+0.045
 moral+0.045
Neg logit contributions
cient-0.087
ach-0.085
ocol-0.084
ute-0.082
ower-0.082
Token happy
Token ID3772
Feature activation-1.29e-02

Pos logit contributions
 had+0.058
 other+0.054
Who+0.052
 also+0.052
 prayed+0.050
Neg logit contributions
cient-0.085
ocol-0.082
ute-0.080
ach-0.079
ummy-0.079
Token to
Token ID284
Feature activation-4.90e-02

Pos logit contributions
 had+0.020
Who+0.018
 other+0.018
 sounded+0.017
Where+0.017
Neg logit contributions
cient-0.021
ute-0.021
ocol-0.020
uttering-0.020
ower-0.019
Token go
Token ID467
Feature activation-4.18e-02

Pos logit contributions
 had+0.040
Who+0.039
 other+0.038
 moral+0.035
 also+0.034
Neg logit contributions
cient-0.109
uttering-0.109
ute-0.107
ower-0.107
ach-0.107
Token.
Token ID13
Feature activation-4.30e-02

Pos logit contributions
Who+0.078
 had+0.073
Where+0.071
 other+0.071
 moral+0.070
Neg logit contributions
ap-0.094
cient-0.094
ute-0.093
uttering-0.092
ach-0.092
Token And
Token ID843
Feature activation-5.83e-02

Pos logit contributions
Who+0.062
 dial+0.057
Where+0.057
 prayed+0.056
 moral+0.055
Neg logit contributions
ummy-0.069
cient-0.067
ap-0.065
iddy-0.065
ach-0.063
Token that
Token ID326
Feature activation-4.58e-02

Pos logit contributions
Who+0.087
 had+0.080
Where+0.080
 moral+0.079
 other+0.079
Neg logit contributions
ap-0.085
ach-0.084
cient-0.083
ute-0.083
ummy-0.083
Token's
Token ID338
Feature activation-6.27e-02

Pos logit contributions
Who+0.050
Where+0.044
He+0.041
 dial+0.041
See+0.040
Neg logit contributions
ap-0.090
ummy-0.087
cient-0.087
iddy-0.084
ute-0.083
Token how
Token ID703
Feature activation-2.97e-02

Pos logit contributions
 had+0.099
Who+0.097
 other+0.096
 moral+0.095
 also+0.093
Neg logit contributions
ute-0.088
cient-0.088
uttering-0.088
ach-0.087
ower-0.087
Token Tim
Token ID5045
Feature activation-1.06e-01

Pos logit contributions
 had+0.043
 other+0.041
 also+0.036
He+0.035
Who+0.035
Neg logit contributions
ute-0.054
ap-0.051
icol-0.050
ach-0.050
uttering-0.050
Token shared
Token ID4888
Feature activation-3.69e-02

Pos logit contributions
Who+0.109
Where+0.102
Less+0.096
 moral+0.095
 other+0.091
Neg logit contributions
cient-0.209
ap-0.206
ute-0.204
iddy-0.201
ach-0.200
Token his
Token ID465
Feature activation-6.03e-02

Pos logit contributions
Who+0.045
 had+0.043
 other+0.042
Where+0.041
 moral+0.041
Neg logit contributions
ap-0.063
cient-0.063
ute-0.063
ach-0.062
uttering-0.062
Token special
Token ID2041
Feature activation-4.85e-02

Pos logit contributions
Who+0.085
 had+0.085
 other+0.084
 moral+0.081
 dial+0.079
Neg logit contributions
cient-0.093
ap-0.091
uttering-0.091
ute-0.091
ach-0.090
Token surprise
Token ID5975
Feature activation-3.65e-02

Pos logit contributions
 had+0.071
Who+0.071
 other+0.070
 moral+0.068
 dial+0.067
Neg logit contributions
cient-0.072
uttering-0.070
ap-0.070
ach-0.069
ute-0.069
Token with
Token ID351
Feature activation-4.98e-02

Pos logit contributions
Who+0.059
Where+0.054
 had+0.054
 moral+0.053
See+0.053
Neg logit contributions
ap-0.050
ute-0.050
ach-0.049
cient-0.049
ummy-0.049
Token mom
Token ID1995
Feature activation-8.41e-02

Pos logit contributions
Who+0.065
Where+0.061
 had+0.060
See+0.059
He+0.059
Neg logit contributions
ap-0.057
ummy-0.057
ute-0.056
cient-0.056
ower-0.055
Tokenmy
Token ID1820
Feature activation-8.41e-02

Pos logit contributions
Who+0.113
Where+0.105
 moral+0.100
Warning+0.097
Less+0.096
Neg logit contributions
ish-0.150
ap-0.148
ummy-0.146
cient-0.144
uc-0.143
Token drove
Token ID10357
Feature activation-2.13e-02

Pos logit contributions
Who+0.100
Where+0.095
 moral+0.089
Oh+0.086
Less+0.084
Neg logit contributions
cient-0.155
ish-0.154
ap-0.153
iddy-0.148
uc-0.146
Token home
Token ID1363
Feature activation-3.03e-02

Pos logit contributions
Who+0.038
 had+0.036
 moral+0.035
Where+0.035
 other+0.035
Neg logit contributions
ap-0.025
ach-0.025
ute-0.025
cient-0.025
uttering-0.024
Token.
Token ID13
Feature activation-5.99e-02

Pos logit contributions
Who+0.047
 had+0.045
 other+0.044
Where+0.044
 moral+0.043
Neg logit contributions
cient-0.042
ap-0.042
ute-0.042
uttering-0.041
ach-0.041
Token John
Token ID1757
Feature activation-1.06e-01

Pos logit contributions
Who+0.095
 other+0.093
 moral+0.091
 had+0.090
Where+0.089
Neg logit contributions
ummy-0.080
ap-0.076
cient-0.073
ach-0.072
ower-0.071
Token was
Token ID373
Feature activation-5.12e-02

Pos logit contributions
Who+0.111
Where+0.102
Less+0.098
 moral+0.098
Looks+0.094
Neg logit contributions
cient-0.210
ap-0.201
 Majesty-0.194
ish-0.194
iddy-0.194
Token very
Token ID845
Feature activation-5.31e-02

Pos logit contributions
 had+0.057
 other+0.053
Who+0.053
 also+0.052
Where+0.050
Neg logit contributions
ute-0.098
cient-0.098
ocol-0.096
ower-0.096
ach-0.096
Token happy
Token ID3772
Feature activation-1.72e-02

Pos logit contributions
Who+0.062
 had+0.062
 other+0.060
 moral+0.057
Where+0.057
Neg logit contributions
cient-0.094
ute-0.094
ap-0.092
ach-0.092
uttering-0.092
Token and
Token ID290
Feature activation-7.23e-02

Pos logit contributions
Who+0.022
 had+0.022
 other+0.021
Where+0.021
 moral+0.020
Neg logit contributions
cient-0.029
ute-0.028
ap-0.028
uttering-0.028
ach-0.028
Token thanked
Token ID26280
Feature activation-3.73e-02

Pos logit contributions
Who+0.074
 had+0.069
 other+0.068
Where+0.067
 moral+0.067
Neg logit contributions
cient-0.138
ap-0.138
ute-0.137
ach-0.136
uttering-0.136
Token time
Token ID640
Feature activation-2.42e-02

Pos logit contributions
Who+0.029
 had+0.029
 other+0.028
 moral+0.027
Where+0.027
Neg logit contributions
cient-0.046
ute-0.045
ap-0.045
uttering-0.045
ach-0.044
Token and
Token ID290
Feature activation-5.71e-02

Pos logit contributions
 had+0.035
Who+0.033
 other+0.033
Where+0.031
He+0.031
Neg logit contributions
cient-0.038
ute-0.038
uttering-0.037
ower-0.036
ap-0.036
Token said
Token ID531
Feature activation-2.21e-02

Pos logit contributions
Who+0.060
Where+0.054
 moral+0.053
 other+0.052
 had+0.051
Neg logit contributions
ap-0.109
cient-0.106
ach-0.106
ummy-0.105
ute-0.104
Token goodbye
Token ID24829
Feature activation+4.69e-03

Pos logit contributions
 had+0.034
 other+0.034
He+0.030
 moral+0.030
 better+0.030
Neg logit contributions
uttering-0.037
ower-0.036
ute-0.034
ocol-0.034
arter-0.034
Token.
Token ID13
Feature activation-5.11e-02

Pos logit contributions
ocol+0.010
cient+0.010
ute+0.009
ubby+0.009
uttering+0.009
Neg logit contributions
 had-0.005
 other-0.004
He-0.004
Who-0.004
 also-0.004
Token Joe
Token ID5689
Feature activation-1.05e-01

Pos logit contributions
Who+0.086
 had+0.083
 other+0.082
Where+0.081
 dial+0.081
Neg logit contributions
ap-0.065
ummy-0.064
cient-0.063
ach-0.061
uttering-0.061
Token felt
Token ID2936
Feature activation-3.42e-02

Pos logit contributions
Who+0.101
Where+0.092
 other+0.083
Less+0.083
 moral+0.082
Neg logit contributions
ap-0.211
cient-0.209
ute-0.207
ach-0.205
iddy-0.205
Token much
Token ID881
Feature activation-3.62e-02

Pos logit contributions
 had+0.036
 other+0.034
Who+0.032
 also+0.031
He+0.031
Neg logit contributions
ower-0.070
cient-0.070
ute-0.069
ach-0.069
ummy-0.069
Token better
Token ID1365
Feature activation-1.95e-02

Pos logit contributions
Who+0.034
Mark+0.031
 dial+0.030
Ann+0.030
Anne+0.030
Neg logit contributions
ap-0.073
uttering-0.068
ummy-0.068
iddy-0.068
ute-0.067
Token now
Token ID783
Feature activation-4.16e-02

Pos logit contributions
 had+0.024
Who+0.022
 other+0.022
 also+0.021
Where+0.021
Neg logit contributions
cient-0.037
ocol-0.036
ute-0.035
ummy-0.034
 Majesty-0.034
Token that
Token ID326
Feature activation-4.90e-02

Pos logit contributions
Who+0.059
 had+0.056
 other+0.055
Where+0.055
 moral+0.054
Neg logit contributions
ap-0.064
ute-0.063
cient-0.062
ach-0.062
uttering-0.062
Token's
Token ID338
Feature activation-6.28e-02

Pos logit contributions
Who+0.049
Where+0.043
 dial+0.042
He+0.041
 had+0.040
Neg logit contributions
ap-0.096
ummy-0.093
cient-0.093
ute-0.092
iddy-0.090
Token the
Token ID262
Feature activation-5.66e-02

Pos logit contributions
 had+0.093
 other+0.091
Who+0.091
 moral+0.089
 also+0.088
Neg logit contributions
cient-0.095
ute-0.094
uttering-0.094
ach-0.094
ower-0.093
Token story
Token ID1621
Feature activation-5.03e-02

Pos logit contributions
Who+0.066
 had+0.060
 dial+0.060
 shouted+0.059
Where+0.058
Neg logit contributions
ap-0.100
ummy-0.100
ute-0.097
cient-0.097
uttering-0.096
Token of
Token ID286
Feature activation-5.54e-02

Pos logit contributions
Who+0.065
Where+0.060
He+0.058
Less+0.058
Looks+0.057
Neg logit contributions
ap-0.086
ute-0.081
ummy-0.081
cient-0.081
iddy-0.079
Token how
Token ID703
Feature activation-3.35e-02

Pos logit contributions
 other+0.071
 moral+0.065
 had+0.065
 also+0.063
Who+0.062
Neg logit contributions
cient-0.101
ower-0.101
ute-0.101
uttering-0.099
ach-0.099
Token Mia
Token ID32189
Feature activation-1.05e-01

Pos logit contributions
 had+0.041
 other+0.040
 also+0.035
Who+0.033
 learned+0.033
Neg logit contributions
icol-0.065
ute-0.065
ach-0.064
ocol-0.064
ower-0.063
Token and
Token ID290
Feature activation-5.44e-02

Pos logit contributions
Who+0.090
 had+0.085
 other+0.082
Where+0.079
 moral+0.078
Neg logit contributions
cient-0.219
ap-0.218
ute-0.218
ach-0.216
uttering-0.216
Token her
Token ID607
Feature activation-6.13e-02

Pos logit contributions
 had+0.061
Who+0.061
 other+0.061
 moral+0.056
Where+0.053
Neg logit contributions
ute-0.104
ach-0.102
uttering-0.102
cient-0.100
ower-0.100
Token mom
Token ID1995
Feature activation-9.01e-02

Pos logit contributions
 other+0.077
 had+0.073
 moral+0.071
Who+0.069
 also+0.064
Neg logit contributions
ute-0.112
cient-0.111
ach-0.110
uttering-0.110
ap-0.107
Token turned
Token ID2900
Feature activation-3.91e-02

Pos logit contributions
Who+0.098
Where+0.090
 moral+0.085
Less+0.084
Looks+0.081
Neg logit contributions
cient-0.174
ap-0.174
ish-0.165
ute-0.164
iddy-0.163
Token apples
Token ID22514
Feature activation-4.55e-02

Pos logit contributions
 other+0.049
 had+0.047
 moral+0.041
Who+0.041
 learned+0.039
Neg logit contributions
 Majesty-0.075
ute-0.074
ower-0.074
uttering-0.073
ach-0.072
Token the
Token ID262
Feature activation-6.08e-02

Pos logit contributions
 other+0.065
Who+0.065
 had+0.064
Where+0.061
 dial+0.060
Neg logit contributions
ute-0.075
cient-0.074
ower-0.073
uttering-0.073
iddy-0.072
Token mug
Token ID25152
Feature activation-3.16e-02

Pos logit contributions
Who+0.129
 had+0.122
He+0.122
Where+0.122
Less+0.118
Neg logit contributions
ummy-0.052
ute-0.050
cient-0.050
ap-0.049
uttering-0.049
Token.
Token ID13
Feature activation-5.65e-02

Pos logit contributions
Who+0.041
 had+0.041
 other+0.040
Where+0.038
 moral+0.038
Neg logit contributions
cient-0.051
ute-0.051
ap-0.050
ach-0.050
uttering-0.050
Token Unfortunately
Token ID8989
Feature activation-2.96e-02

Pos logit contributions
Who+0.070
 other+0.067
 had+0.066
Where+0.065
 moral+0.063
Neg logit contributions
ummy-0.095
ap-0.093
cient-0.093
 Majesty-0.091
ach-0.090
Token,
Token ID11
Feature activation-4.51e-02

Pos logit contributions
 had+0.059
 other+0.056
Who+0.054
 moral+0.053
 sounded+0.053
Neg logit contributions
uttering-0.035
ute-0.034
ower-0.033
cient-0.032
ach-0.031
Token John
Token ID1757
Feature activation-1.05e-01

Pos logit contributions
Who+0.065
 had+0.062
Where+0.061
 other+0.061
 moral+0.060
Neg logit contributions
ap-0.065
cient-0.065
ummy-0.064
ach-0.064
ute-0.063
Token was
Token ID373
Feature activation-5.11e-02

Pos logit contributions
Who+0.107
Where+0.094
Less+0.091
 moral+0.088
Looks+0.088
Neg logit contributions
cient-0.203
ap-0.202
ish-0.197
ach-0.194
iddy-0.194
Token too
Token ID1165
Feature activation-4.07e-02

Pos logit contributions
Who+0.059
 had+0.058
 other+0.056
Where+0.054
 moral+0.054
Neg logit contributions
cient-0.092
ute-0.091
ap-0.091
ach-0.090
uttering-0.090
Token honest
Token ID5508
Feature activation-1.16e-02

Pos logit contributions
 had+0.046
 other+0.044
Who+0.044
He+0.042
 also+0.042
Neg logit contributions
cient-0.077
ocol-0.075
ute-0.075
ap-0.075
 Majesty-0.075
Token and
Token ID290
Feature activation-7.55e-02

Pos logit contributions
 had+0.015
 other+0.013
He+0.012
Who+0.012
 sounded+0.012
Neg logit contributions
cient-0.022
ocol-0.022
ute-0.021
 Majesty-0.021
uttering-0.020
Token had
Token ID550
Feature activation-4.32e-02

Pos logit contributions
Who+0.065
 had+0.058
 other+0.057
Where+0.057
 moral+0.056
Neg logit contributions
cient-0.156
ap-0.155
ute-0.153
ach-0.153
uttering-0.153
Token time
Token ID640
Feature activation-2.11e-02

Pos logit contributions
 had+0.040
 other+0.038
Who+0.038
 moral+0.036
Where+0.036
Neg logit contributions
cient-0.073
ute-0.072
ach-0.071
ocol-0.070
ower-0.070
Token to
Token ID284
Feature activation-5.33e-02

Pos logit contributions
 had+0.037
 other+0.035
Who+0.034
 moral+0.034
 also+0.033
Neg logit contributions
cient-0.027
ocol-0.027
ach-0.026
 Majesty-0.026
ute-0.026
Token go
Token ID467
Feature activation-4.44e-02

Pos logit contributions
 other+0.059
 had+0.059
Who+0.057
 moral+0.057
 also+0.054
Neg logit contributions
uttering-0.100
cient-0.100
ower-0.099
ute-0.098
ocol-0.097
Token home
Token ID1363
Feature activation-2.46e-02

Pos logit contributions
 had+0.083
Who+0.083
 other+0.082
Where+0.079
 moral+0.078
Neg logit contributions
cient-0.048
ocol-0.046
ute-0.046
ach-0.046
ap-0.046
Token.
Token ID13
Feature activation-4.55e-02

Pos logit contributions
 had+0.022
 other+0.022
Who+0.020
 also+0.020
 moral+0.019
Neg logit contributions
cient-0.052
ocol-0.052
ute-0.050
uttering-0.050
ach-0.050
Token Tim
Token ID5045
Feature activation-1.05e-01

Pos logit contributions
Who+0.079
Where+0.074
 had+0.074
 other+0.073
 moral+0.073
Neg logit contributions
ummy-0.056
ap-0.056
cient-0.056
ach-0.054
ute-0.054
Token said
Token ID531
Feature activation-2.68e-02

Pos logit contributions
Who+0.120
Where+0.113
Looks+0.101
Less+0.099
See+0.097
Neg logit contributions
ap-0.191
cient-0.191
ute-0.185
iddy-0.185
 Majesty-0.184
Token goodbye
Token ID24829
Feature activation+4.64e-03

Pos logit contributions
 other+0.047
 had+0.046
 moral+0.044
He+0.041
 learned+0.041
Neg logit contributions
ocol-0.040
ower-0.040
ach-0.039
uttering-0.037
ute-0.037
Token to
Token ID284
Feature activation-4.52e-02

Pos logit contributions
ocol+0.009
cient+0.008
ach+0.008
igsaw+0.007
 Majesty+0.007
Neg logit contributions
 had-0.007
 other-0.006
 learned-0.006
 also-0.006
 shouted-0.006
Token the
Token ID262
Feature activation-5.79e-02

Pos logit contributions
Who+0.066
 had+0.063
Where+0.062
Maybe+0.060
See+0.060
Neg logit contributions
cient-0.067
ap-0.066
ummy-0.064
ute-0.063
iddy-0.063
Token squirrel
Token ID33039
Feature activation-5.20e-02

Pos logit contributions
Who+0.077
 had+0.073
Where+0.072
 dial+0.071
 prayed+0.070
Neg logit contributions
cient-0.092
ute-0.092
ummy-0.091
ap-0.090
uttering-0.090
Token.
Token ID13
Feature activation-5.67e-02

Pos logit contributions
Who+0.035
 had+0.035
 other+0.034
Where+0.032
 moral+0.032
Neg logit contributions
cient-0.062
ute-0.061
uttering-0.060
ap-0.060
ach-0.060
Token From
Token ID3574
Feature activation-4.80e-02

Pos logit contributions
Who+0.063
 had+0.060
 other+0.058
Where+0.057
 dial+0.056
Neg logit contributions
ap-0.104
ummy-0.104
ute-0.103
cient-0.103
uttering-0.102
Token then
Token ID788
Feature activation-4.73e-02

Pos logit contributions
 had+0.067
 also+0.066
 other+0.065
 moral+0.061
Who+0.061
Neg logit contributions
ocol-0.081
cient-0.081
 Majesty-0.080
ap-0.079
uttering-0.079
Token on
Token ID319
Feature activation-3.22e-02

Pos logit contributions
Who+0.068
 dial+0.062
Where+0.062
 other+0.061
 moral+0.059
Neg logit contributions
ap-0.076
ute-0.074
uttering-0.071
ummy-0.071
cient-0.070
Token,
Token ID11
Feature activation-4.03e-02

Pos logit contributions
 had+0.065
 other+0.064
Who+0.061
 moral+0.060
 also+0.060
Neg logit contributions
cient-0.033
uttering-0.033
ute-0.031
ach-0.031
 Majesty-0.031
Token Sandy
Token ID17581
Feature activation-1.05e-01

Pos logit contributions
 other+0.065
 had+0.065
Who+0.061
 also+0.061
 moral+0.059
Neg logit contributions
cient-0.061
ute-0.057
 Majesty-0.057
ocol-0.056
ach-0.056
Token treated
Token ID5716
Feature activation-4.17e-02

Pos logit contributions
 had+0.083
Who+0.076
 other+0.074
 moral+0.069
 also+0.068
Neg logit contributions
uttering-0.232
ocol-0.230
ute-0.230
cient-0.229
ach-0.229
Token her
Token ID607
Feature activation-4.98e-02

Pos logit contributions
Who+0.047
 other+0.046
 had+0.045
Where+0.043
 dial+0.042
Neg logit contributions
ute-0.078
uttering-0.077
cient-0.077
ap-0.076
ower-0.076
Token little
Token ID1310
Feature activation-4.64e-02

Pos logit contributions
Who+0.063
 had+0.061
 other+0.060
Where+0.058
 moral+0.058
Neg logit contributions
cient-0.084
ap-0.083
ute-0.083
uttering-0.083
ach-0.082
Token gray
Token ID12768
Feature activation-3.71e-02

Pos logit contributions
 other+0.062
 had+0.060
Who+0.059
 moral+0.057
 dial+0.056
Neg logit contributions
uttering-0.077
cient-0.077
ute-0.076
ach-0.075
iddy-0.074
Token t
Token ID256
Feature activation-3.01e-02

Pos logit contributions
 had+0.036
 other+0.036
Who+0.034
 dial+0.033
 moral+0.032
Neg logit contributions
uttering-0.075
cient-0.074
ap-0.074
ach-0.073
ute-0.072
Token Max
Token ID5436
Feature activation-8.74e-02

Pos logit contributions
Who+0.069
 had+0.064
 other+0.063
Where+0.063
 moral+0.062
Neg logit contributions
cient-0.103
ap-0.103
ute-0.102
ach-0.101
uttering-0.100
Token stopped
Token ID5025
Feature activation-2.43e-02

Pos logit contributions
Who+0.089
Where+0.079
 moral+0.074
Less+0.073
See+0.070
Neg logit contributions
ap-0.172
cient-0.171
ute-0.166
iddy-0.166
uc-0.164
Token sh
Token ID427
Feature activation-1.24e-02

Pos logit contributions
Who+0.031
 had+0.031
 other+0.031
 also+0.029
Where+0.029
Neg logit contributions
cient-0.043
ute-0.043
 Majesty-0.042
igsaw-0.041
ach-0.041
Tokenivering
Token ID35598
Feature activation-1.39e-02

Pos logit contributions
Who+0.020
 had+0.020
He+0.020
Where+0.020
 sounded+0.019
Neg logit contributions
 Majesty-0.015
cient-0.015
ocol-0.014
ute-0.014
ach-0.014
Token.
Token ID13
Feature activation-5.09e-02

Pos logit contributions
 had+0.013
 other+0.012
Who+0.012
 sounded+0.012
Where+0.011
Neg logit contributions
cient-0.029
ocol-0.029
ute-0.029
 Majesty-0.029
uttering-0.028
Token Tim
Token ID5045
Feature activation-1.05e-01

Pos logit contributions
Who+0.071
 other+0.066
Where+0.065
 had+0.065
 moral+0.065
Neg logit contributions
ummy-0.081
cient-0.080
ap-0.077
ower-0.077
iddy-0.076
Token learned
Token ID4499
Feature activation-3.82e-02

Pos logit contributions
Who+0.108
Where+0.102
Less+0.090
See+0.087
Looks+0.085
Neg logit contributions
ap-0.200
cient-0.200
iddy-0.195
ish-0.195
 Majesty-0.193
Token that
Token ID326
Feature activation-4.77e-02

Pos logit contributions
 other+0.068
Who+0.068
 had+0.068
 moral+0.066
He+0.065
Neg logit contributions
ocol-0.047
cient-0.046
ap-0.046
uttering-0.046
 Majesty-0.045
Token it
Token ID340
Feature activation-7.97e-02

Pos logit contributions
Who+0.065
Where+0.059
 had+0.059
 dial+0.058
See+0.057
Neg logit contributions
ap-0.075
ummy-0.074
ute-0.073
iddy-0.071
ach-0.071
Token's
Token ID338
Feature activation-6.05e-02

Pos logit contributions
Who+0.095
Where+0.086
Less+0.079
See+0.077
He+0.076
Neg logit contributions
cient-0.145
ummy-0.143
ap-0.142
iddy-0.141
ute-0.140
Token important
Token ID1593
Feature activation-2.25e-02

Pos logit contributions
Who+0.085
 had+0.078
Where+0.077
 other+0.076
 dial+0.075
Neg logit contributions
ap-0.094
ute-0.094
cient-0.093
uttering-0.092
ummy-0.091
Token as
Token ID355
Feature activation+5.12e-02

Pos logit contributions
cient+0.159
uttering+0.159
ute+0.158
ocol+0.157
ap+0.156
Neg logit contributions
 had-0.092
 other-0.090
Who-0.089
 moral-0.083
 dial-0.082
Token he
Token ID339
Feature activation-8.10e-03

Pos logit contributions
cient+0.099
ute+0.098
ap+0.098
uttering+0.097
ach+0.097
Neg logit contributions
 had-0.054
 other-0.052
Who-0.052
 moral-0.047
He-0.047
Token opened
Token ID4721
Feature activation+6.80e-02

Pos logit contributions
Who+0.007
Where+0.006
Less+0.006
 moral+0.006
 other+0.006
Neg logit contributions
cient-0.017
ap-0.016
ach-0.016
ute-0.016
 Majesty-0.016
Token the
Token ID262
Feature activation+5.19e-02

Pos logit contributions
cient+0.135
uttering+0.133
ower+0.132
ute+0.131
ap+0.129
Neg logit contributions
 other-0.072
 had-0.069
 dial-0.066
Who-0.066
Where-0.063
Token ch
Token ID442
Feature activation+9.39e-02

Pos logit contributions
uttering+0.089
ap+0.089
ach+0.088
cient+0.088
ute+0.088
Neg logit contributions
 other-0.063
Who-0.063
 had-0.063
 moral-0.060
Where-0.059
Tokenocol
Token ID4668
Feature activation+1.78e-01

Pos logit contributions
cient+0.080
icol+0.069
ower+0.068
idents+0.064
uttering+0.063
Neg logit contributions
 other-0.190
 moral-0.185
He-0.182
 had-0.178
 kn-0.176
Tokenates
Token ID689
Feature activation+7.55e-02

Pos logit contributions
cient+0.272
iddy+0.258
ummy+0.256
 disl+0.256
icol+0.253
Neg logit contributions
 other-0.228
 counted-0.224
Less-0.222
 had-0.220
Who-0.219
Token and
Token ID290
Feature activation+2.01e-02

Pos logit contributions
cient+0.136
 Majesty+0.134
ocol+0.133
uttering+0.133
ute+0.132
Neg logit contributions
 had-0.086
 other-0.085
Who-0.082
 also-0.078
He-0.077
Token started
Token ID2067
Feature activation+4.68e-02

Pos logit contributions
ute+0.041
uttering+0.041
ummy+0.040
iddy+0.040
ower+0.040
Neg logit contributions
 had-0.020
Who-0.020
 other-0.019
 dial-0.018
Where-0.018
Token eating
Token ID6600
Feature activation+6.87e-02

Pos logit contributions
ute+0.070
cient+0.070
ummy+0.069
iddy+0.068
ower+0.068
Neg logit contributions
 had-0.070
Who-0.069
 other-0.068
 dial-0.066
Where-0.064
Token them
Token ID606
Feature activation+3.30e-02

Pos logit contributions
ute+0.125
cient+0.125
uttering+0.124
ap+0.121
ower+0.121
Neg logit contributions
 other-0.085
 had-0.080
Who-0.079
Where-0.074
He-0.073
Token egg
Token ID5935
Feature activation+1.24e-01

Pos logit contributions
uttering+0.176
ute+0.174
iddy+0.173
cient+0.173
ach+0.171
Neg logit contributions
 had-0.154
Who-0.149
 other-0.144
Where-0.136
 moral-0.136
Token!
Token ID0
Feature activation+7.46e-02

Pos logit contributions
uttering+0.237
igsaw+0.236
 Majesty+0.233
 disl+0.230
ocol+0.229
Neg logit contributions
 had-0.168
 also-0.128
 sounded-0.128
 other-0.125
 shouted-0.120
Token He
Token ID679
Feature activation+1.33e-02

Pos logit contributions
cient+0.127
ap+0.127
ute+0.127
uttering+0.125
ach+0.125
Neg logit contributions
Who-0.091
 had-0.088
 other-0.085
Where-0.083
 moral-0.082
Token watched
Token ID7342
Feature activation+8.42e-02

Pos logit contributions
cient+0.030
ap+0.030
ute+0.029
ach+0.029
uttering+0.029
Neg logit contributions
Who-0.009
 other-0.008
 had-0.008
Where-0.008
 moral-0.008
Token in
Token ID287
Feature activation+4.33e-02

Pos logit contributions
ute+0.168
uttering+0.164
cient+0.162
ower+0.162
 disl+0.156
Neg logit contributions
 other-0.094
Who-0.089
 had-0.086
 dial-0.080
He-0.080
Token amaz
Token ID40642
Feature activation+1.77e-01

Pos logit contributions
cient+0.066
ap+0.066
ute+0.066
ach+0.065
uttering+0.065
Neg logit contributions
Who-0.060
 had-0.059
 other-0.057
Where-0.056
 moral-0.056
Tokenement
Token ID972
Feature activation+1.02e-01

Pos logit contributions
cient+0.338
ute+0.332
oppy+0.320
uttering+0.313
ower+0.308
Neg logit contributions
He-0.193
 sounded-0.188
 other-0.187
Who-0.184
 had-0.183
Token as
Token ID355
Feature activation+7.27e-02

Pos logit contributions
cient+0.181
ute+0.174
ocol+0.170
iddy+0.166
 Majesty+0.165
Neg logit contributions
 had-0.129
 other-0.122
 shouted-0.119
He-0.119
Who-0.114
Token the
Token ID262
Feature activation+4.31e-02

Pos logit contributions
cient+0.130
ute+0.129
ap+0.129
uttering+0.128
ach+0.127
Neg logit contributions
Who-0.085
 had-0.083
 other-0.081
Where-0.077
 moral-0.077
Token egg
Token ID5935
Feature activation+5.51e-02

Pos logit contributions
ummy+0.057
cient+0.054
ach+0.053
ower+0.052
 Majesty+0.052
Neg logit contributions
Who-0.070
Where-0.066
 had-0.066
Less-0.066
He-0.065
Token began
Token ID2540
Feature activation+6.84e-02

Pos logit contributions
ute+0.108
cient+0.108
uttering+0.108
ap+0.107
ach+0.107
Neg logit contributions
 had-0.055
Who-0.052
 other-0.051
 sounded-0.047
Where-0.047
Token go
Token ID467
Feature activation+1.04e-01

Pos logit contributions
uttering+0.264
ower+0.258
ute+0.252
ummy+0.251
iddy+0.250
Neg logit contributions
 other-0.128
 had-0.117
 also-0.113
Who-0.110
 asked-0.104
Token!
Token ID0
Feature activation+8.21e-02

Pos logit contributions
cient+0.164
ocol+0.160
 Majesty+0.159
ummy+0.158
ower+0.157
Neg logit contributions
 had-0.156
 other-0.152
Who-0.145
 also-0.142
 dial-0.139
Token Everyone
Token ID11075
Feature activation+3.01e-02

Pos logit contributions
cient+0.148
ap+0.147
ute+0.147
uttering+0.146
ach+0.146
Neg logit contributions
Who-0.092
 had-0.089
 other-0.086
Where-0.084
 moral-0.083
Token watched
Token ID7342
Feature activation+8.80e-02

Pos logit contributions
cient+0.063
ap+0.063
ute+0.062
ach+0.062
uttering+0.062
Neg logit contributions
Who-0.025
 had-0.022
 other-0.022
Where-0.022
 moral-0.022
Token in
Token ID287
Feature activation+4.95e-02

Pos logit contributions
ute+0.153
cient+0.148
uttering+0.148
 Majesty+0.146
ower+0.145
Neg logit contributions
 other-0.113
 had-0.112
Who-0.108
 dial-0.101
 sounded-0.099
Token amaz
Token ID40642
Feature activation+1.81e-01

Pos logit contributions
cient+0.078
ap+0.078
ute+0.078
uttering+0.077
ach+0.077
Neg logit contributions
Who-0.067
 had-0.065
 other-0.064
Where-0.062
 moral-0.062
Tokenement
Token ID972
Feature activation+1.04e-01

Pos logit contributions
cient+0.339
oppy+0.331
ute+0.323
opsy+0.312
uttering+0.311
Neg logit contributions
 other-0.199
 sounded-0.197
 had-0.191
He-0.187
Who-0.185
Token as
Token ID355
Feature activation+7.35e-02

Pos logit contributions
cient+0.187
ocol+0.185
ute+0.179
 Majesty+0.174
uttering+0.172
Neg logit contributions
 had-0.129
 other-0.121
 shouted-0.114
 also-0.111
 sounded-0.107
Token it
Token ID340
Feature activation+2.18e-02

Pos logit contributions
cient+0.118
ap+0.118
ute+0.117
ach+0.116
uttering+0.116
Neg logit contributions
Who-0.097
 had-0.094
 other-0.091
Where-0.089
 moral-0.089
Token took
Token ID1718
Feature activation+5.91e-02

Pos logit contributions
ap+0.044
cient+0.044
ute+0.043
ach+0.043
ower+0.042
Neg logit contributions
Who-0.019
Where-0.018
Less-0.017
 moral-0.017
 other-0.016
Token off
Token ID572
Feature activation+7.48e-02

Pos logit contributions
ap+0.086
cient+0.086
ute+0.085
ach+0.085
uttering+0.084
Neg logit contributions
Who-0.087
 had-0.084
 other-0.082
Where-0.081
 moral-0.081
Tokeneddy
Token ID21874
Feature activation+1.12e-01

Pos logit contributions
 disl+0.192
cient+0.188
ocol+0.188
 Majesty+0.187
ap+0.185
Neg logit contributions
He-0.198
Mom-0.190
 other-0.189
Who-0.189
 had-0.186
Token bear
Token ID6842
Feature activation+7.84e-02

Pos logit contributions
uttering+0.259
ach+0.242
ower+0.241
ap+0.241
ute+0.240
Neg logit contributions
 had-0.100
 other-0.095
Who-0.089
 dial-0.086
 sounded-0.083
Token and
Token ID290
Feature activation+4.31e-02

Pos logit contributions
uttering+0.154
ute+0.153
ap+0.152
ach+0.151
cient+0.151
Neg logit contributions
 had-0.078
Who-0.075
 other-0.073
 dial-0.067
Where-0.067
Token watched
Token ID7342
Feature activation+9.11e-02

Pos logit contributions
uttering+0.089
ute+0.089
ach+0.087
ap+0.087
cient+0.086
Neg logit contributions
 had-0.043
Who-0.042
 other-0.040
 dial-0.037
 prayed-0.037
Token in
Token ID287
Feature activation+4.68e-02

Pos logit contributions
cient+0.164
ute+0.163
uttering+0.162
ower+0.160
 Majesty+0.157
Neg logit contributions
 other-0.109
Who-0.108
 had-0.103
Where-0.098
 moral-0.097
Token amaz
Token ID40642
Feature activation+1.79e-01

Pos logit contributions
ap+0.070
ocol+0.070
ach+0.070
ummy+0.069
uttering+0.069
Neg logit contributions
Who-0.066
 had-0.063
 other-0.060
 moral-0.060
He-0.060
Tokenement
Token ID972
Feature activation+1.04e-01

Pos logit contributions
cient+0.319
ute+0.307
uttering+0.301
oppy+0.299
ap+0.294
Neg logit contributions
 other-0.215
Who-0.212
He-0.211
 sounded-0.210
 had-0.206
Token as
Token ID355
Feature activation+6.86e-02

Pos logit contributions
cient+0.221
ute+0.211
 Majesty+0.209
ocol+0.209
ubby+0.207
Neg logit contributions
 had-0.089
 other-0.088
Who-0.080
 shouted-0.075
He-0.075
Token it
Token ID340
Feature activation+1.69e-02

Pos logit contributions
cient+0.114
ap+0.114
ute+0.114
ach+0.113
uttering+0.113
Neg logit contributions
Who-0.086
 had-0.082
 other-0.079
Where-0.079
 moral-0.078
Token shrunk
Token ID38900
Feature activation+8.39e-02

Pos logit contributions
cient+0.031
ap+0.030
uc+0.029
uttering+0.029
ach+0.029
Neg logit contributions
Who-0.017
Less-0.017
 moral-0.016
Where-0.016
He-0.015
Token an
Token ID281
Feature activation+6.63e-02

Pos logit contributions
cient+0.150
ute+0.148
uttering+0.147
ap+0.145
ower+0.142
Neg logit contributions
 had-0.102
 other-0.101
Who-0.099
Where-0.094
 sounded-0.092
Token that
Token ID326
Feature activation+7.74e-02

Pos logit contributions
cient+0.221
ocol+0.219
ute+0.207
ubby+0.204
oppy+0.203
Neg logit contributions
 had-0.156
 other-0.136
Who-0.135
 sounded-0.130
 also-0.127
Token the
Token ID262
Feature activation+7.19e-02

Pos logit contributions
ap+0.130
cient+0.130
ute+0.129
ach+0.128
uttering+0.128
Neg logit contributions
Who-0.096
 had-0.092
 other-0.089
Where-0.088
 moral-0.087
Token bad
Token ID2089
Feature activation+8.47e-02

Pos logit contributions
cient+0.114
ap+0.113
ute+0.113
uttering+0.112
ach+0.112
Neg logit contributions
Who-0.097
 had-0.094
 other-0.092
 moral-0.090
Where-0.090
Token rh
Token ID9529
Feature activation+1.42e-01

Pos logit contributions
cient+0.148
uttering+0.148
ute+0.146
ach+0.144
ap+0.144
Neg logit contributions
 had-0.107
Who-0.104
 other-0.103
 moral-0.099
 sounded-0.094
Tokenin
Token ID259
Feature activation+3.78e-02

Pos logit contributions
cient+0.181
ap+0.180
ute+0.180
uttering+0.178
ach+0.178
Neg logit contributions
Who-0.233
 had-0.229
 other-0.224
Where-0.219
 moral-0.218
Tokenoc
Token ID420
Feature activation+1.71e-01

Pos logit contributions
cient+0.052
ach+0.050
ap+0.049
ute+0.049
ower+0.048
Neg logit contributions
Who-0.059
Where-0.056
Less-0.053
 other-0.053
See-0.053
Tokeneros
Token ID27498
Feature activation+4.50e-02

Pos logit contributions
uttering+0.170
ute+0.148
utter+0.137
ap+0.135
iddy+0.134
Neg logit contributions
 other-0.345
 moral-0.336
Who-0.336
 Moral-0.330
 had-0.328
Token was
Token ID373
Feature activation+5.46e-02

Pos logit contributions
cient+0.101
ap+0.100
ute+0.099
ach+0.099
ummy+0.097
Neg logit contributions
Who-0.031
Where-0.026
 moral-0.025
 other-0.025
Less-0.024
Token now
Token ID783
Feature activation+5.54e-02

Pos logit contributions
ute+0.102
cient+0.102
ap+0.099
ach+0.099
ower+0.099
Neg logit contributions
 had-0.063
Who-0.059
 other-0.059
Where-0.056
 also-0.056
Token nice
Token ID3621
Feature activation+7.46e-02

Pos logit contributions
cient+0.096
ute+0.095
ach+0.094
ap+0.093
uttering+0.093
Neg logit contributions
 had-0.073
 other-0.068
Who-0.068
 moral-0.064
Where-0.063
Token.
Token ID13
Feature activation+4.89e-02

Pos logit contributions
cient+0.128
ute+0.126
uttering+0.126
ap+0.124
ocol+0.124
Neg logit contributions
 had-0.100
 other-0.094
Who-0.093
 sounded-0.087
Where-0.087
Token the
Token ID262
Feature activation+5.21e-02

Pos logit contributions
cient+0.049
ute+0.048
ower+0.048
uttering+0.048
ach+0.047
Neg logit contributions
 other-0.028
 had-0.028
Who-0.027
Where-0.026
 also-0.025
Token sunshine
Token ID34488
Feature activation+6.33e-02

Pos logit contributions
cient+0.092
ach+0.091
uttering+0.090
ute+0.090
ap+0.090
Neg logit contributions
 other-0.065
 had-0.064
Who-0.063
 moral-0.060
Where-0.059
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
ocol+0.129
cient+0.126
uttering+0.125
ute+0.123
 Majesty+0.123
Neg logit contributions
 had-0.068
 other-0.066
 also-0.056
 sounded-0.056
Who-0.055
Token<|endoftext|>
Token ID50256
Feature activation+8.45e-02

Pos logit contributions
ummy+0.035
cient+0.035
 Majesty+0.034
ap+0.033
iddy+0.033
Neg logit contributions
Who-0.026
 dial-0.025
 other-0.024
Where-0.024
 moral-0.023
TokenOnce
Token ID7454
Feature activation+9.63e-02

Pos logit contributions
ap+0.147
ummy+0.147
ach+0.146
ute+0.146
iddy+0.143
Neg logit contributions
Who-0.097
 moral-0.088
 dial-0.087
Less-0.085
 prayed-0.085
Token upon
Token ID2402
Feature activation+8.68e-02

Pos logit contributions
ach+0.148
cient+0.145
 Majesty+0.144
ocol+0.143
ower+0.141
Neg logit contributions
 had-0.147
 other-0.142
 also-0.134
 moral-0.134
Who-0.131
Token a
Token ID257
Feature activation+8.31e-02

Pos logit contributions
cient+0.148
ute+0.144
ap+0.144
ach+0.143
ower+0.142
Neg logit contributions
Who-0.115
 had-0.114
 other-0.111
Where-0.107
He-0.106
Token time
Token ID640
Feature activation+8.09e-02

Pos logit contributions
cient+0.187
ocol+0.180
ute+0.179
ach+0.176
 Majesty+0.175
Neg logit contributions
 had-0.072
 other-0.064
 moral-0.063
 also-0.061
One-0.059
Token,
Token ID11
Feature activation+6.96e-02

Pos logit contributions
cient+0.138
 Majesty+0.137
ocol+0.136
uc+0.133
ute+0.132
Neg logit contributions
 had-0.128
 also-0.107
 other-0.107
 was-0.104
Once-0.104
Token there
Token ID612
Feature activation+4.80e-02

Pos logit contributions
cient+0.112
 Majesty+0.111
ocol+0.110
ach+0.110
ute+0.108
Neg logit contributions
 had-0.103
Who-0.096
 other-0.094
 also-0.094
Where-0.093
Token was
Token ID373
Feature activation+5.80e-02

Pos logit contributions
cient+0.121
ute+0.120
ap+0.120
uttering+0.119
ach+0.119
Neg logit contributions
Who-0.021
 had-0.021
 other-0.018
Where-0.016
 moral-0.016
Token a
Token ID257
Feature activation+1.07e-01

Pos logit contributions
cient+0.200
ute+0.194
ap+0.193
ower+0.192
ocol+0.191
Neg logit contributions
Who-0.150
 had-0.149
 other-0.146
Where-0.140
He-0.138
Token time
Token ID640
Feature activation+9.91e-02

Pos logit contributions
cient+0.236
ute+0.229
ocol+0.227
iddy+0.224
ach+0.224
Neg logit contributions
 had-0.094
 other-0.089
 moral-0.084
 also-0.081
He-0.079
Token,
Token ID11
Feature activation+8.59e-02

Pos logit contributions
ocol+0.174
 Majesty+0.173
cient+0.173
icol+0.170
uc+0.167
Neg logit contributions
 had-0.150
 also-0.126
 other-0.125
Once-0.123
 was-0.119
Token in
Token ID287
Feature activation+6.26e-02

Pos logit contributions
ocol+0.160
 disl+0.160
 Majesty+0.159
icol+0.156
cient+0.154
Neg logit contributions
 had-0.117
 also-0.109
 other-0.102
Who-0.098
Where-0.098
Token a
Token ID257
Feature activation+8.43e-02

Pos logit contributions
cient+0.135
ower+0.133
ute+0.130
 Majesty+0.129
ach+0.127
Neg logit contributions
 had-0.065
 other-0.063
 also-0.061
Who-0.061
Where-0.060
Token small
Token ID1402
Feature activation+8.80e-02

Pos logit contributions
ocol+0.165
cient+0.162
uc+0.162
ower+0.161
ach+0.161
Neg logit contributions
 other-0.092
 had-0.089
Who-0.089
 also-0.084
 moral-0.082
Token house
Token ID2156
Feature activation+6.85e-02

Pos logit contributions
 disl+0.189
ancing+0.183
ocol+0.183
uttering+0.182
ower+0.182
Neg logit contributions
 other-0.099
 had-0.086
 also-0.074
 corner-0.072
 dial-0.070
Token,
Token ID11
Feature activation+6.28e-02

Pos logit contributions
uttering+0.088
 Majesty+0.087
ocol+0.085
ute+0.085
cient+0.084
Neg logit contributions
 had-0.135
 other-0.121
 also-0.115
Who-0.115
 sounded-0.113
Token there
Token ID612
Feature activation+5.05e-02

Pos logit contributions
ach+0.094
cient+0.093
ocol+0.093
 Majesty+0.092
ute+0.092
Neg logit contributions
 had-0.096
Who-0.093
 other-0.090
Where-0.087
 also-0.087
Token lived
Token ID5615
Feature activation+7.63e-02

Pos logit contributions
ap+0.115
ute+0.114
cient+0.114
ach+0.113
uttering+0.113
Neg logit contributions
Who-0.033
 had-0.029
 other-0.028
Where-0.028
 moral-0.028
Token a
Token ID257
Feature activation+6.00e-02

Pos logit contributions
cient+0.127
ute+0.124
ower+0.123
 Majesty+0.123
ocol+0.122
Neg logit contributions
Who-0.101
 had-0.099
 other-0.099
Where-0.095
 moral-0.093
Token She
Token ID1375
Feature activation+2.59e-02

Pos logit contributions
uttering+0.195
ute+0.195
cient+0.193
ap+0.193
ach+0.192
Neg logit contributions
Who-0.087
 had-0.085
 other-0.078
He-0.076
Where-0.076
Token wanted
Token ID2227
Feature activation+8.40e-02

Pos logit contributions
cient+0.054
ap+0.053
ute+0.052
ach+0.052
 Majesty+0.052
Neg logit contributions
Who-0.022
Where-0.020
 other-0.019
 had-0.019
 moral-0.019
Token to
Token ID284
Feature activation+6.64e-02

Pos logit contributions
ute+0.108
cient+0.108
ap+0.107
uttering+0.106
ach+0.104
Neg logit contributions
Who-0.143
 had-0.141
 other-0.140
Where-0.134
 moral-0.133
Token take
Token ID1011
Feature activation+6.87e-02

Pos logit contributions
uttering+0.131
cient+0.129
ute+0.127
ummy+0.126
ower+0.125
Neg logit contributions
 other-0.072
Who-0.072
 had-0.070
 dial-0.065
 also-0.064
Token a
Token ID257
Feature activation+6.51e-02

Pos logit contributions
cient+0.116
uttering+0.115
ute+0.115
ap+0.114
ower+0.113
Neg logit contributions
Who-0.088
 other-0.088
 had-0.086
Where-0.082
 dial-0.080
Token picture
Token ID4286
Feature activation+1.14e-01

Pos logit contributions
ach+0.116
 Majesty+0.115
ap+0.115
iddy+0.114
ummy+0.113
Neg logit contributions
 other-0.080
 had-0.080
Who-0.078
 dial-0.074
 moral-0.071
Token now
Token ID783
Feature activation+5.62e-02

Pos logit contributions
cient+0.198
ute+0.192
uttering+0.192
 Majesty+0.192
ach+0.191
Neg logit contributions
 had-0.138
 other-0.135
Who-0.135
 moral-0.126
He-0.126
Token!
Token ID0
Feature activation+5.13e-02

Pos logit contributions
cient+0.091
ap+0.091
ute+0.091
uttering+0.090
ach+0.090
Neg logit contributions
Who-0.074
 had-0.072
 other-0.070
Where-0.068
 moral-0.068
Token
Token ID198
Feature activation+6.91e-02

Pos logit contributions
cient+0.094
ap+0.093
ute+0.093
ach+0.092
ummy+0.092
Neg logit contributions
Who-0.056
 had-0.054
 other-0.053
 moral-0.051
Where-0.051
Token
Token ID198
Feature activation+5.37e-02

Pos logit contributions
ute+0.123
ach+0.122
ap+0.121
ummy+0.119
uttering+0.118
Neg logit contributions
Who-0.077
 had-0.073
 dial-0.072
 sounded-0.070
 prayed-0.069
TokenM
Token ID44
Feature activation+6.39e-02

Pos logit contributions
ap+0.103
ute+0.102
ach+0.102
ower+0.099
uttering+0.099
Neg logit contributions
 had-0.053
 dial-0.051
 other-0.050
Who-0.050
 sounded-0.049
Token,
Token ID11
Feature activation+7.45e-02

Pos logit contributions
ocol+0.174
imb+0.168
arter+0.168
akes+0.166
uttering+0.164
Neg logit contributions
 had-0.140
 other-0.117
 was-0.115
 shouted-0.115
 sounded-0.112
Token "
Token ID366
Feature activation+4.75e-02

Pos logit contributions
ocol+0.148
ach+0.145
uttering+0.142
ute+0.142
cient+0.140
Neg logit contributions
 had-0.086
Who-0.081
He-0.075
 other-0.073
Where-0.073
TokenThat
Token ID2504
Feature activation+5.36e-02

Pos logit contributions
ach+0.093
ower+0.089
ute+0.088
uck+0.087
uttering+0.087
Neg logit contributions
 other-0.046
 moral-0.044
 dial-0.042
 had-0.041
 also-0.036
Token's
Token ID338
Feature activation+5.35e-02

Pos logit contributions
ocol+0.134
uttering+0.132
ach+0.128
cient+0.125
iddy+0.124
Neg logit contributions
 had-0.041
 other-0.037
 sounded-0.035
 also-0.032
 moral-0.030
Token my
Token ID616
Feature activation+7.06e-02

Pos logit contributions
cient+0.080
ap+0.080
ute+0.080
ach+0.079
uttering+0.079
Neg logit contributions
Who-0.078
 had-0.076
 other-0.074
Where-0.072
 moral-0.072
Token dictionary
Token ID22155
Feature activation+9.06e-02

Pos logit contributions
uttering+0.108
cient+0.108
ach+0.106
iddy+0.106
ap+0.104
Neg logit contributions
 other-0.106
 moral-0.101
 had-0.098
Who-0.098
 dial-0.092
Token.
Token ID13
Feature activation+4.68e-02

Pos logit contributions
uttering+0.155
ute+0.152
ach+0.152
cient+0.151
ap+0.150
Neg logit contributions
 had-0.118
Who-0.111
 other-0.110
 sounded-0.104
 moral-0.103
Token It
Token ID632
Feature activation+2.23e-02

Pos logit contributions
cient+0.059
ap+0.058
ute+0.058
ummy+0.057
ach+0.057
Neg logit contributions
Who-0.077
 had-0.076
 other-0.075
 moral-0.073
Where-0.073
Token helps
Token ID5419
Feature activation+5.50e-02

Pos logit contributions
cient+0.039
ish+0.035
ap+0.034
 Majesty+0.034
ute+0.033
Neg logit contributions
Who-0.030
Where-0.029
Less-0.028
 moral-0.027
He-0.027
Token me
Token ID502
Feature activation+4.33e-02

Pos logit contributions
uttering+0.086
ute+0.086
cient+0.086
ap+0.085
ower+0.084
Neg logit contributions
Who-0.075
 other-0.074
 had-0.074
 moral-0.070
Where-0.070
Token learn
Token ID2193
Feature activation+7.40e-02

Pos logit contributions
uttering+0.078
cient+0.078
ute+0.078
ach+0.078
ap+0.077
Neg logit contributions
Who-0.048
 had-0.047
 other-0.047
 dial-0.044
Where-0.044
Token saw
Token ID2497
Feature activation+9.28e-02

Pos logit contributions
cient+0.095
ap+0.094
ute+0.093
ach+0.092
uttering+0.092
Neg logit contributions
Who-0.042
 had-0.039
 other-0.038
Where-0.037
 moral-0.037
Token a
Token ID257
Feature activation+7.64e-02

Pos logit contributions
ute+0.152
uttering+0.150
ap+0.148
cient+0.147
iddy+0.145
Neg logit contributions
 other-0.132
 had-0.131
Who-0.129
Where-0.121
 sounded-0.120
Token beautiful
Token ID4950
Feature activation+1.06e-01

Pos logit contributions
cient+0.127
ap+0.126
ach+0.125
uttering+0.125
ute+0.125
Neg logit contributions
Who-0.099
 had-0.097
 other-0.095
 moral-0.091
Where-0.091
Token bird
Token ID6512
Feature activation+8.08e-02

Pos logit contributions
uttering+0.201
cient+0.199
iddy+0.192
uc+0.191
oppy+0.190
Neg logit contributions
 had-0.131
 other-0.124
Who-0.116
 dial-0.113
 sounded-0.110
Token in
Token ID287
Feature activation+6.43e-02

Pos logit contributions
 Majesty+0.143
ummy+0.140
cient+0.140
iddy+0.138
ocol+0.138
Neg logit contributions
 had-0.107
Who-0.095
 other-0.095
 also-0.092
Where-0.089
Token a
Token ID257
Feature activation+8.35e-02

Pos logit contributions
cient+0.115
ute+0.114
ap+0.113
ower+0.112
uttering+0.112
Neg logit contributions
Who-0.081
 had-0.078
 other-0.078
Where-0.076
 moral-0.072
Token tree
Token ID5509
Feature activation+8.72e-02

Pos logit contributions
ach+0.142
uttering+0.142
ap+0.141
cient+0.140
ute+0.140
Neg logit contributions
 had-0.105
Who-0.104
 other-0.103
Where-0.097
 moral-0.094
Token.
Token ID13
Feature activation+5.06e-02

Pos logit contributions
 Majesty+0.177
cient+0.173
ocol+0.172
ute+0.169
iddy+0.167
Neg logit contributions
 had-0.097
 also-0.082
 other-0.080
Who-0.077
 shouted-0.074
Token
Token ID220
Feature activation+9.04e-02

Pos logit contributions
cient+0.077
ap+0.077
ute+0.077
ach+0.076
uttering+0.076
Neg logit contributions
Who-0.071
 had-0.069
 other-0.067
Where-0.066
 moral-0.065
Token
Token ID198
Feature activation+6.01e-02

Pos logit contributions
cient+0.204
ap+0.202
ach+0.202
ute+0.202
uttering+0.201
Neg logit contributions
 had-0.059
Who-0.059
 other-0.056
Where-0.052
He-0.051
Token
Token ID198
Feature activation+4.77e-02

Pos logit contributions
ach+0.103
ute+0.100
ummy+0.097
cient+0.097
iddy+0.097
Neg logit contributions
 prayed-0.067
Who-0.066
 dial-0.065
 had-0.064
 moral-0.064
Token father
Token ID2988
Feature activation+1.31e-02

Pos logit contributions
cient+0.011
ap+0.011
ute+0.011
ach+0.011
uttering+0.011
Neg logit contributions
Who-0.011
 had-0.011
 other-0.010
Where-0.010
 moral-0.010
Token,
Token ID11
Feature activation+7.72e-03

Pos logit contributions
ower+0.021
ute+0.020
uttering+0.020
ach+0.020
icol+0.019
Neg logit contributions
 had-0.020
 other-0.020
 also-0.019
 moral-0.019
Who-0.018
Token "
Token ID366
Feature activation-1.19e-02

Pos logit contributions
ute+0.012
cient+0.012
ap+0.012
ach+0.012
uttering+0.012
Neg logit contributions
Who-0.011
 had-0.011
 other-0.010
Where-0.010
 moral-0.010
TokenWhat
Token ID2061
Feature activation+1.24e-02

Pos logit contributions
 had+0.010
 other+0.009
 learned+0.009
 teaches+0.009
Less+0.008
Neg logit contributions
ach-0.024
cient-0.023
uttering-0.023
uck-0.022
imb-0.022
Token's
Token ID338
Feature activation-3.35e-03

Pos logit contributions
ocol+0.027
uttering+0.027
ach+0.026
ute+0.026
ower+0.026
Neg logit contributions
 had-0.011
 other-0.010
Who-0.010
 sounded-0.010
 also-0.009
Token wrong
Token ID2642
Feature activation+2.83e-02

Pos logit contributions
 other+0.004
 had+0.004
 also+0.004
Who+0.004
 sounded+0.004
Neg logit contributions
ocol-0.006
 Majesty-0.006
ummy-0.006
uc-0.006
iddy-0.006
Token?"
Token ID1701
Feature activation+1.02e-02

Pos logit contributions
ocol+0.054
uttering+0.048
uc+0.047
awk+0.044
cient+0.044
Neg logit contributions
 had-0.040
 moral-0.039
 sounded-0.039
Who-0.039
 other-0.038
Token
Token ID220
Feature activation+5.50e-02

Pos logit contributions
ap+0.019
cient+0.018
ute+0.018
ach+0.018
ummy+0.018
Neg logit contributions
Who-0.011
 had-0.011
Where-0.011
 moral-0.011
 other-0.011
Token
Token ID198
Feature activation+3.30e-02

Pos logit contributions
cient+0.121
ap+0.120
ute+0.120
uttering+0.120
ach+0.120
Neg logit contributions
Who-0.039
 had-0.038
 other-0.037
Where-0.034
 moral-0.034
Token
Token ID198
Feature activation+1.39e-02

Pos logit contributions
ute+0.056
ap+0.056
ach+0.056
ummy+0.054
cient+0.054
Neg logit contributions
Who-0.039
 had-0.037
Looks-0.036
Where-0.035
Less-0.035
TokenHis
Token ID6653
Feature activation-4.95e-03

Pos logit contributions
ap+0.028
ach+0.028
ute+0.028
cient+0.027
ower+0.026
Neg logit contributions
 had-0.013
 other-0.012
Who-0.012
Looks-0.012
Less-0.011
Token,
Token ID11
Feature activation-8.23e-03

Pos logit contributions
ute+0.045
cient+0.044
ap+0.043
uttering+0.043
 Majesty+0.043
Neg logit contributions
 had-0.030
Who-0.028
 other-0.028
Where-0.026
 sounded-0.026
Token and
Token ID290
Feature activation-1.67e-02

Pos logit contributions
 had+0.010
Who+0.010
 other+0.010
Where+0.010
 moral+0.010
Neg logit contributions
ute-0.014
cient-0.014
ap-0.014
ach-0.014
uttering-0.014
Token looked
Token ID3114
Feature activation+1.53e-02

Pos logit contributions
Who+0.018
 had+0.017
 other+0.017
Where+0.017
 moral+0.016
Neg logit contributions
cient-0.031
ap-0.031
ute-0.030
ach-0.030
uttering-0.030
Token after
Token ID706
Feature activation+7.83e-03

Pos logit contributions
ute+0.023
ower+0.022
cient+0.022
ach+0.022
ummy+0.022
Neg logit contributions
 had-0.024
 other-0.023
Who-0.022
Where-0.022
He-0.021
Token it
Token ID340
Feature activation-2.53e-02

Pos logit contributions
ute+0.013
ap+0.013
cient+0.013
uttering+0.013
ower+0.012
Neg logit contributions
Who-0.010
 had-0.010
 other-0.010
Where-0.010
 moral-0.009
Token like
Token ID588
Feature activation+7.79e-03

Pos logit contributions
Who+0.037
 moral+0.033
Where+0.033
Less+0.032
See+0.032
Neg logit contributions
ap-0.039
cient-0.038
uttering-0.037
ach-0.037
ute-0.037
Token it
Token ID340
Feature activation-2.69e-02

Pos logit contributions
cient+0.012
ute+0.012
ap+0.012
uttering+0.012
ach+0.012
Neg logit contributions
Who-0.011
 had-0.010
 other-0.010
Where-0.010
 moral-0.010
Token was
Token ID373
Feature activation+1.61e-03

Pos logit contributions
Who+0.035
Where+0.031
 moral+0.031
 dial+0.031
Less+0.031
Neg logit contributions
cient-0.045
ap-0.044
ummy-0.042
ach-0.042
uttering-0.042
Token his
Token ID465
Feature activation-3.48e-03

Pos logit contributions
cient+0.003
ute+0.003
ap+0.003
uttering+0.003
ach+0.003
Neg logit contributions
Who-0.002
 had-0.002
 other-0.002
Where-0.002
 moral-0.002
Token own
Token ID898
Feature activation-1.05e-02

Pos logit contributions
Who+0.005
 had+0.005
 other+0.005
 moral+0.005
Where+0.005
Neg logit contributions
cient-0.005
ap-0.005
ute-0.005
ach-0.005
uttering-0.005
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 had+0.014
Who+0.014
 other+0.014
 moral+0.013
Where+0.013
Neg logit contributions
cient-0.017
ap-0.017
ute-0.017
ach-0.016
uttering-0.016
Token hand
Token ID1021
Feature activation+7.30e-02

Pos logit contributions
ach+0.093
uttering+0.093
iddy+0.092
cient+0.092
ocol+0.090
Neg logit contributions
 other-0.070
 had-0.069
Who-0.061
 moral-0.060
 dial-0.060
Token.
Token ID13
Feature activation+3.11e-02

Pos logit contributions
cient+0.147
 Majesty+0.145
ocol+0.145
ute+0.143
uttering+0.142
Neg logit contributions
 had-0.074
 other-0.069
He-0.061
 dial-0.061
Who-0.061
Token He
Token ID679
Feature activation-1.43e-02

Pos logit contributions
ute+0.056
cient+0.055
ap+0.055
uttering+0.055
ach+0.054
Neg logit contributions
Who-0.037
 had-0.036
Where-0.035
He-0.035
 other-0.034
Token knew
Token ID2993
Feature activation+4.57e-02

Pos logit contributions
Who+0.013
Less+0.012
Where+0.012
 moral+0.011
 other+0.011
Neg logit contributions
cient-0.028
ap-0.028
ach-0.027
ute-0.027
 Majesty-0.027
Token it
Token ID340
Feature activation+9.98e-03

Pos logit contributions
cient+0.082
uttering+0.081
ute+0.080
ap+0.080
ower+0.079
Neg logit contributions
 had-0.059
 other-0.057
Who-0.056
 also-0.056
He-0.052
Token was
Token ID373
Feature activation+3.51e-02

Pos logit contributions
cient+0.020
ummy+0.019
ap+0.019
iddy+0.019
ach+0.019
Neg logit contributions
Who-0.009
Less-0.008
 moral-0.008
Where-0.008
See-0.007
Token something
Token ID1223
Feature activation+4.39e-02

Pos logit contributions
ap+0.058
cient+0.058
ute+0.057
ach+0.057
uttering+0.057
Neg logit contributions
Who-0.045
 had-0.042
 other-0.041
Where-0.041
 moral-0.040
Token special
Token ID2041
Feature activation+5.92e-02

Pos logit contributions
cient+0.076
ap+0.075
ute+0.075
uttering+0.074
ach+0.074
Neg logit contributions
Who-0.053
 had-0.053
 other-0.051
Where-0.049
 moral-0.048
Token and
Token ID290
Feature activation+2.68e-03

Pos logit contributions
cient+0.094
ute+0.092
ap+0.091
uttering+0.091
 Majesty+0.091
Neg logit contributions
 had-0.086
Who-0.080
 other-0.080
 sounded-0.076
Where-0.076
Token he
Token ID339
Feature activation-2.24e-02

Pos logit contributions
ute+0.005
cient+0.005
ap+0.005
uttering+0.005
ach+0.005
Neg logit contributions
Who-0.003
 had-0.003
 other-0.003
Where-0.003
 moral-0.003
Token couldn
Token ID3521
Feature activation+5.69e-02

Pos logit contributions
Who+0.020
Where+0.018
Less+0.018
 other+0.016
 moral+0.016
Neg logit contributions
cient-0.045
ap-0.044
ach-0.044
 Majesty-0.043
ute-0.043
Token She
Token ID1375
Feature activation-5.87e-03

Pos logit contributions
cient+0.087
ute+0.086
ap+0.086
uttering+0.086
ocol+0.086
Neg logit contributions
Who-0.037
 had-0.035
 other-0.033
Where-0.033
He-0.031
Token tried
Token ID3088
Feature activation+6.35e-02

Pos logit contributions
Who+0.005
Where+0.005
Less+0.004
 moral+0.004
 other+0.004
Neg logit contributions
cient-0.012
ap-0.012
ute-0.012
ach-0.012
 Majesty-0.012
Token to
Token ID284
Feature activation+4.26e-02

Pos logit contributions
cient+0.074
ap+0.074
ute+0.073
uttering+0.072
ach+0.072
Neg logit contributions
Who-0.113
 had-0.110
 other-0.108
Where-0.106
 moral-0.105
Token get
Token ID651
Feature activation+4.35e-02

Pos logit contributions
cient+0.078
uttering+0.077
ute+0.077
ap+0.077
ummy+0.077
Neg logit contributions
Who-0.048
 had-0.047
 other-0.046
 moral-0.043
 dial-0.043
Token on
Token ID319
Feature activation+4.77e-02

Pos logit contributions
ap+0.058
uttering+0.057
cient+0.057
ocol+0.057
ute+0.057
Neg logit contributions
Who-0.067
 had-0.063
 moral-0.063
He-0.062
Less-0.062
Token,
Token ID11
Feature activation+4.09e-02

Pos logit contributions
cient+0.076
ute+0.076
ap+0.076
uttering+0.075
ach+0.075
Neg logit contributions
Who-0.064
 had-0.063
 other-0.061
Where-0.059
 moral-0.059
Token but
Token ID475
Feature activation+1.96e-02

Pos logit contributions
cient+0.067
ap+0.067
ute+0.067
ach+0.066
uttering+0.066
Neg logit contributions
Who-0.053
 had-0.051
 other-0.050
Where-0.049
 moral-0.048
Token she
Token ID673
Feature activation-1.04e-02

Pos logit contributions
ach+0.038
cient+0.038
ute+0.038
 Majesty+0.038
ap+0.038
Neg logit contributions
 had-0.022
 other-0.021
Who-0.020
 also-0.020
Where-0.019
Token was
Token ID373
Feature activation+3.26e-02

Pos logit contributions
Who+0.012
Where+0.011
 moral+0.010
 other+0.010
Less+0.010
Neg logit contributions
ap-0.019
ute-0.019
cient-0.019
ower-0.018
ach-0.018
Token too
Token ID1165
Feature activation+4.18e-02

Pos logit contributions
uttering+0.052
ap+0.051
iddy+0.051
ute+0.050
cient+0.050
Neg logit contributions
Who-0.043
Where-0.039
 other-0.039
 had-0.038
 moral-0.038
Token small
Token ID1402
Feature activation+6.41e-02

Pos logit contributions
cient+0.080
ap+0.079
ute+0.078
ach+0.078
ocol+0.077
Neg logit contributions
 had-0.045
Who-0.044
 other-0.043
 also-0.041
 moral-0.041
Token pilot
Token ID8022
Feature activation-5.02e-03

Pos logit contributions
Who+0.026
 had+0.026
 other+0.025
Where+0.024
 moral+0.024
Neg logit contributions
cient-0.027
ute-0.027
ap-0.027
uttering-0.026
ach-0.026
Token realize
Token ID6537
Feature activation+8.21e-03

Pos logit contributions
 had+0.006
 other+0.005
Who+0.005
 moral+0.005
 prayed+0.005
Neg logit contributions
ach-0.009
cient-0.009
uttering-0.009
ute-0.009
ocol-0.009
Token the
Token ID262
Feature activation-1.64e-02

Pos logit contributions
uttering+0.012
ower+0.012
ute+0.012
ap+0.012
cient+0.012
Neg logit contributions
 had-0.013
 other-0.013
Who-0.013
 also-0.012
 moral-0.012
Token power
Token ID1176
Feature activation-4.41e-04

Pos logit contributions
Who+0.024
 had+0.022
Where+0.022
He+0.022
Less+0.021
Neg logit contributions
ute-0.024
ap-0.024
cient-0.024
uttering-0.024
 Majesty-0.023
Token of
Token ID286
Feature activation-7.51e-03

Pos logit contributions
 had+0.000
Who+0.000
 other+0.000
 moral+0.000
Where+0.000
Neg logit contributions
cient-0.001
ach-0.001
ap-0.001
uttering-0.001
ute-0.001
Token being
Token ID852
Feature activation-5.33e-03

Pos logit contributions
 other+0.010
 had+0.009
Who+0.009
 moral+0.009
Where+0.009
Neg logit contributions
ute-0.013
cient-0.013
uttering-0.013
ower-0.013
ap-0.013
Token generous
Token ID14431
Feature activation+2.80e-02

Pos logit contributions
 had+0.006
 other+0.006
Who+0.006
 prayed+0.005
 learned+0.005
Neg logit contributions
ute-0.010
ummy-0.010
ower-0.010
ach-0.010
ocol-0.010
Token.
Token ID13
Feature activation-1.55e-02

Pos logit contributions
cient+0.053
ocol+0.053
ute+0.052
uttering+0.052
ower+0.052
Neg logit contributions
 had-0.033
 other-0.031
Who-0.029
 moral-0.028
 sounded-0.027
Token From
Token ID3574
Feature activation-4.41e-03

Pos logit contributions
Who+0.019
 had+0.018
 other+0.018
Where+0.017
 dial+0.017
Neg logit contributions
cient-0.027
ummy-0.027
ap-0.026
iddy-0.026
ute-0.026
Token then
Token ID788
Feature activation-4.77e-03

Pos logit contributions
 had+0.006
 also+0.006
 other+0.005
 moral+0.005
Who+0.005
Neg logit contributions
ocol-0.008
ower-0.008
uttering-0.008
ap-0.008
OO-0.008
Token on
Token ID319
Feature activation+6.31e-03

Pos logit contributions
Who+0.006
Where+0.006
 dial+0.006
 other+0.006
 had+0.006
Neg logit contributions
ap-0.008
ute-0.008
cient-0.008
uttering-0.008
ummy-0.007
Token goes
Token ID2925
Feature activation-2.94e-02

Pos logit contributions
Who+0.082
 had+0.077
Where+0.076
 other+0.075
Less+0.073
Neg logit contributions
ap-0.136
cient-0.135
ute-0.134
ach-0.131
 Majesty-0.131
Token to
Token ID284
Feature activation-1.90e-02

Pos logit contributions
Who+0.058
 had+0.056
Less+0.055
 moral+0.055
 other+0.055
Neg logit contributions
ap-0.028
ute-0.027
uttering-0.027
cient-0.027
ach-0.026
Token his
Token ID465
Feature activation-2.33e-02

Pos logit contributions
Who+0.023
 other+0.023
 had+0.022
 moral+0.021
Where+0.021
Neg logit contributions
cient-0.034
ute-0.033
ower-0.033
uttering-0.033
ap-0.033
Token room
Token ID2119
Feature activation+1.11e-03

Pos logit contributions
Who+0.036
 had+0.034
See+0.034
Where+0.034
Looks+0.033
Neg logit contributions
ap-0.033
ute-0.033
ummy-0.032
cient-0.031
ach-0.031
Token and
Token ID290
Feature activation-5.29e-02

Pos logit contributions
cient+0.002
ap+0.002
ute+0.002
ach+0.002
uttering+0.002
Neg logit contributions
Who-0.001
 had-0.001
 other-0.001
Where-0.001
 moral-0.001
Token takes
Token ID2753
Feature activation-3.20e-02

Pos logit contributions
Who+0.058
 had+0.056
 other+0.055
Where+0.053
 moral+0.052
Neg logit contributions
cient-0.097
ute-0.097
ap-0.096
ach-0.096
uttering-0.096
Token his
Token ID465
Feature activation-4.42e-02

Pos logit contributions
Who+0.048
 had+0.046
 moral+0.045
Where+0.044
Less+0.044
Neg logit contributions
ap-0.047
ach-0.045
cient-0.045
ute-0.045
ocol-0.044
Token trumpet
Token ID40334
Feature activation-9.42e-04

Pos logit contributions
Who+0.070
 had+0.067
Where+0.066
Less+0.064
He+0.064
Neg logit contributions
ute-0.061
cient-0.060
ap-0.059
ower-0.058
ach-0.057
Token.
Token ID13
Feature activation-3.92e-02

Pos logit contributions
Who+0.001
 had+0.001
 other+0.001
Where+0.001
 moral+0.001
Neg logit contributions
cient-0.002
ap-0.002
ach-0.002
ute-0.002
uttering-0.002
Token He
Token ID679
Feature activation-8.44e-02

Pos logit contributions
Who+0.030
 had+0.028
 other+0.027
He+0.026
Where+0.025
Neg logit contributions
cient-0.088
ach-0.087
uttering-0.087
ute-0.087
ower-0.086
Token blows
Token ID20385
Feature activation-2.83e-03

Pos logit contributions
Who+0.093
 had+0.090
 other+0.087
Where+0.085
 moral+0.084
Neg logit contributions
cient-0.154
ute-0.153
ap-0.153
uttering-0.152
ach-0.152
Token "
Token ID366
Feature activation-5.61e-02

Pos logit contributions
Who+0.008
 had+0.008
Where+0.008
 moral+0.008
 other+0.008
Neg logit contributions
ap-0.007
cient-0.007
ute-0.007
ummy-0.007
uttering-0.007
Tokencup
Token ID25244
Feature activation-3.72e-02

Pos logit contributions
 had+0.071
Who+0.070
 other+0.069
 moral+0.069
 also+0.068
Neg logit contributions
cient-0.090
ute-0.089
ach-0.089
ower-0.087
ap-0.085
Tokencake
Token ID30560
Feature activation-3.39e-02

Pos logit contributions
 had+0.054
Who+0.050
 dial+0.050
 other+0.050
 moral+0.048
Neg logit contributions
uttering-0.057
icol-0.055
ute-0.054
ocol-0.054
iddy-0.053
Token"
Token ID1
Feature activation-5.12e-02

Pos logit contributions
Who+0.044
 other+0.044
 had+0.043
Where+0.041
 moral+0.041
Neg logit contributions
uttering-0.057
iddy-0.055
ute-0.055
ocol-0.054
icol-0.053
Token to
Token ID284
Feature activation-4.96e-02

Pos logit contributions
Who+0.079
 moral+0.075
Less+0.075
Looks+0.074
Where+0.073
Neg logit contributions
ute-0.070
ower-0.068
ummy-0.067
ap-0.066
uttering-0.066
Token create
Token ID2251
Feature activation-2.66e-02

Pos logit contributions
Who+0.054
 had+0.052
 other+0.051
 moral+0.048
 dial+0.048
Neg logit contributions
ute-0.093
cient-0.093
uttering-0.093
ap-0.092
ach-0.091
Token her
Token ID607
Feature activation-4.31e-02

Pos logit contributions
Who+0.038
 had+0.037
 other+0.036
Where+0.035
 moral+0.035
Neg logit contributions
cient-0.040
ap-0.040
ute-0.040
ach-0.039
uttering-0.039
Token game
Token ID983
Feature activation-7.46e-03

Pos logit contributions
Who+0.060
 had+0.059
 other+0.059
 moral+0.056
Where+0.056
Neg logit contributions
cient-0.066
ute-0.065
uttering-0.065
ach-0.065
ap-0.065
Token.
Token ID13
Feature activation-3.41e-02

Pos logit contributions
Who+0.010
 had+0.009
Where+0.009
 moral+0.009
 other+0.009
Neg logit contributions
ap-0.012
ute-0.012
ach-0.012
uttering-0.012
ummy-0.012
Token\n
Token ID198
Feature activation-7.34e-03

Pos logit contributions
Who+0.046
 had+0.044
 prayed+0.042
 moral+0.041
Where+0.041
Neg logit contributions
ap-0.056
ummy-0.052
cient-0.052
uck-0.051
ute-0.051
Token\n
Token ID198
Feature activation-2.73e-02

Pos logit contributions
Who+0.010
 dial+0.009
Looks+0.009
 prayed+0.009
 had+0.009
Neg logit contributions
ute-0.011
ap-0.011
ummy-0.011
ach-0.011
utter-0.011
Token before
Token ID878
Feature activation-2.41e-02

Pos logit contributions
Who+0.018
 had+0.018
 other+0.017
Where+0.017
 moral+0.017
Neg logit contributions
cient-0.022
ute-0.022
ap-0.022
uttering-0.021
ach-0.021
Token agreeing
Token ID24093
Feature activation-1.28e-02

Pos logit contributions
Who+0.033
 had+0.031
 other+0.031
Where+0.031
 moral+0.031
Neg logit contributions
ach-0.037
ummy-0.037
uttering-0.037
ap-0.037
cient-0.036
Token.
Token ID13
Feature activation-3.25e-02

Pos logit contributions
Who+0.017
 had+0.016
 other+0.016
Where+0.016
 moral+0.015
Neg logit contributions
ap-0.020
cient-0.020
ute-0.020
ach-0.020
uttering-0.020
Token He
Token ID679
Feature activation-7.76e-02

Pos logit contributions
Who+0.034
 had+0.032
 other+0.031
Where+0.030
 moral+0.030
Neg logit contributions
cient-0.062
ap-0.061
ute-0.061
ummy-0.061
ach-0.060
Token happily
Token ID18177
Feature activation-3.85e-02

Pos logit contributions
Who+0.069
Where+0.064
 other+0.060
Less+0.057
See+0.055
Neg logit contributions
cient-0.156
ap-0.156
 Majesty-0.152
ach-0.152
ute-0.151
Token put
Token ID1234
Feature activation-2.40e-02

Pos logit contributions
Who+0.036
Where+0.033
 other+0.032
 moral+0.031
Less+0.030
Neg logit contributions
ap-0.078
cient-0.078
ute-0.076
ach-0.076
uttering-0.075
Token the
Token ID262
Feature activation-4.91e-02

Pos logit contributions
Who+0.032
 had+0.031
 other+0.030
Where+0.029
 moral+0.029
Neg logit contributions
cient-0.039
ap-0.039
ute-0.039
uttering-0.038
ach-0.038
Token pe
Token ID613
Feature activation-1.81e-02

Pos logit contributions
Who+0.080
Anne+0.078
 prayed+0.078
Where+0.077
 sounded+0.077
Neg logit contributions
ach-0.061
ute-0.060
ocol-0.060
 Majesty-0.059
ish-0.059
Tokenb
Token ID65
Feature activation-1.35e-02

Pos logit contributions
Who+0.016
 had+0.015
 other+0.015
 moral+0.014
Where+0.014
Neg logit contributions
cient-0.037
ap-0.037
ummy-0.037
uttering-0.036
ute-0.036
Tokenble
Token ID903
Feature activation-3.42e-02

Pos logit contributions
 had+0.026
Who+0.026
 other+0.026
He+0.026
 moral+0.025
Neg logit contributions
cient-0.013
ap-0.013
ummy-0.013
icol-0.013
uttering-0.012
Token in
Token ID287
Feature activation-3.96e-02

Pos logit contributions
 had+0.043
 other+0.041
Who+0.040
He+0.039
Where+0.038
Neg logit contributions
cient-0.059
 Majesty-0.058
ach-0.058
ap-0.057
ummy-0.057
Token.
Token ID13
Feature activation-1.61e-02

Pos logit contributions
ute+0.017
cient+0.017
uttering+0.016
ach+0.016
ap+0.016
Neg logit contributions
 had-0.011
Who-0.011
 other-0.011
Where-0.010
 moral-0.010
Token She
Token ID1375
Feature activation-6.35e-02

Pos logit contributions
Who+0.018
 had+0.017
 other+0.017
 moral+0.016
Where+0.016
Neg logit contributions
ap-0.029
cient-0.029
ute-0.029
uttering-0.029
ummy-0.029
Token hugs
Token ID40657
Feature activation+8.39e-03

Pos logit contributions
Who+0.074
Where+0.070
 had+0.067
Less+0.066
 other+0.066
Neg logit contributions
ap-0.113
cient-0.110
ute-0.110
 Majesty-0.110
ach-0.107
Token her
Token ID607
Feature activation-3.24e-03

Pos logit contributions
ower+0.018
 Majesty+0.018
ute+0.018
uttering+0.017
cient+0.017
Neg logit contributions
 other-0.009
Who-0.008
 had-0.007
Where-0.007
 moral-0.007
Token friends
Token ID2460
Feature activation-9.47e-03

Pos logit contributions
 other+0.004
 had+0.004
 moral+0.004
Who+0.004
 dial+0.004
Neg logit contributions
ach-0.006
cient-0.006
ocol-0.006
uttering-0.006
ap-0.006
Token and
Token ID290
Feature activation-3.43e-02

Pos logit contributions
 had+0.011
 other+0.011
Who+0.011
 moral+0.010
 also+0.010
Neg logit contributions
cient-0.017
ocol-0.017
ach-0.017
ute-0.017
ower-0.017
Token says
Token ID1139
Feature activation+6.18e-03

Pos logit contributions
 had+0.028
 other+0.028
Who+0.027
 moral+0.026
 also+0.025
Neg logit contributions
ach-0.075
uttering-0.075
ower-0.075
ute-0.075
cient-0.075
Token you
Token ID345
Feature activation-3.00e-02

Pos logit contributions
ach+0.011
uttering+0.010
ower+0.010
ute+0.010
ish+0.010
Neg logit contributions
 other-0.009
 had-0.009
 moral-0.008
 sounded-0.008
Who-0.008
Token are
Token ID389
Feature activation-2.09e-02

Pos logit contributions
 had+0.038
Who+0.037
 other+0.037
 moral+0.034
Where+0.034
Neg logit contributions
cient-0.051
uttering-0.051
ute-0.050
ach-0.050
ap-0.050
Token welcome
Token ID7062
Feature activation+9.52e-03

Pos logit contributions
Who+0.031
 had+0.028
Where+0.028
He+0.028
Maybe+0.027
Neg logit contributions
ap-0.031
cient-0.031
uttering-0.030
ute-0.030
 Majesty-0.030
Token.
Token ID13
Feature activation-1.99e-02

Pos logit contributions
ute+0.019
cient+0.019
ower+0.019
ocol+0.019
uttering+0.019
Neg logit contributions
 had-0.009
 other-0.009
Who-0.009
Where-0.009
 moral-0.008
Token said
Token ID531
Feature activation+9.31e-05

Pos logit contributions
Who+0.033
 moral+0.031
Where+0.031
 had+0.030
 other+0.029
Neg logit contributions
cient-0.067
ap-0.067
ach-0.066
ummy-0.066
ute-0.065
Token Joe
Token ID5689
Feature activation-4.24e-02

Pos logit contributions
ute+0.000
uttering+0.000
ower+0.000
 Majesty+0.000
cient+0.000
Neg logit contributions
Who-0.000
 other-0.000
 had-0.000
 sounded-0.000
Where-0.000
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
Who+0.048
 had+0.047
 other+0.045
Where+0.044
 moral+0.044
Neg logit contributions
cient-0.076
ute-0.076
ap-0.076
uttering-0.075
ach-0.075
Token\n
Token ID198
Feature activation+9.56e-04

Pos logit contributions
Who+0.014
 had+0.013
 other+0.013
Where+0.013
He+0.012
Neg logit contributions
ute-0.021
ap-0.021
ach-0.021
uttering-0.021
cient-0.021
Token\n
Token ID198
Feature activation-1.77e-02

Pos logit contributions
ute+0.002
ach+0.002
cient+0.002
ap+0.002
ummy+0.002
Neg logit contributions
Who-0.001
 had-0.001
Where-0.001
 other-0.001
 prayed-0.001
TokenThe
Token ID464
Feature activation-4.09e-02

Pos logit contributions
Who+0.019
 had+0.019
 other+0.019
 dial+0.018
 moral+0.018
Neg logit contributions
ach-0.032
cient-0.031
ap-0.031
ute-0.031
ower-0.031
Token bunny
Token ID44915
Feature activation-5.14e-02

Pos logit contributions
Who+0.074
 had+0.070
Less+0.069
 counted+0.069
Where+0.069
Neg logit contributions
ummy-0.046
ute-0.045
cient-0.044
ap-0.042
iddy-0.041
Token hopped
Token ID45230
Feature activation-1.01e-02

Pos logit contributions
Who+0.055
Where+0.051
Less+0.050
 other+0.048
See+0.048
Neg logit contributions
cient-0.095
ap-0.095
ute-0.092
 Majesty-0.091
ach-0.091
Token over
Token ID625
Feature activation-1.06e-02

Pos logit contributions
Who+0.014
 had+0.013
 other+0.013
Where+0.013
 moral+0.013
Neg logit contributions
uttering-0.015
ap-0.015
ach-0.015
ute-0.015
cient-0.015
Token to
Token ID284
Feature activation-1.65e-02

Pos logit contributions
Who+0.019
 had+0.019
 other+0.018
Where+0.018
 moral+0.018
Neg logit contributions
cient-0.012
ute-0.012
ap-0.012
uttering-0.012
ach-0.012
Token Joe
Token ID5689
Feature activation-4.58e-02

Pos logit contributions
Who+0.028
 had+0.027
 other+0.027
Where+0.026
 moral+0.026
Neg logit contributions
ute-0.022
ower-0.022
uttering-0.022
cient-0.021
ach-0.021