TOP ACTIVATIONS
MAX = 15.921

day to be careful with strangers and to not trust them
. Max bark ed at strangers and chased squirrel s .
be patient and not trust strangers . <|endoftext|> Once upon a
remember to be careful of strangers !" Sam my
is dangerous . You are strangers . You are big and
t scared of strangers anymore - he was very
to their friends and to strangers . They hugged each other
!" The strangers took Sarah 's coat and
. They do not follow strangers . They are polite and
be careful and not trust strangers with her things . The
" You shouldn 't trust strangers !" in a harsh voice
promised never to frown on strangers again . They also promised
shy and does not like strangers ." " Thank
words about not talking to strangers . L ily
match for the group of strangers . She realized that in
. Lily learned that sometimes strangers can be nice , but
't open the door to strangers . They might not be
you know and not follow strangers , even if it 's
rules . She knew that strangers could be dangerous and that
friends and not talking to strangers . Lily thought that was

MIN ACTIVATIONS
MIN = -14.583

tomorrow and play again . Of course , she did and
an ugly little mouse . Of course , Joy couldn 't
very brave and smart . Of course you can have your
to run too fast . Of course , Tommy was still
moment . " Yes ! Of course ," he said .
folded it in two . Of course , it didn 't
to go golf ing . Of course she said yes ,
ride in the wagon . Of course , Molly and Sam
his fish with him ! Of course , it wasn 't
said , " Yes ! Of course . I have a
could hardly stand it ! Of course she wanted to go
. So , Moe thought and thought . Suddenly
said , " Yes ! Of course you can borrow some
there was a mouse named Moe . Moe liked to play
are my best friend . Of course I include you in
a good brother too . Of course we can still be
friends to go home . Of course , they had a
a mouse named Moe . Moe liked to play hide and
my cute little girl ! Of course you can have a
you are very kind . Of course you can touch it

INTERVAL 8.295 - 15.921
CONTAINS 0.327%

the ground , and the wolf ran away .
She is scared of the bull . " The bull is
it on the shelf . Billy hated heights so he couldn
the old man . <|endoftext|> Billy loved to decor ate .
very young , only three years old . They

INTERVAL 0.669 - 8.295
CONTAINS 38.629%

all the plants were so happy . They danced and sang
how to make friends with animals ." She hugs them and
boy and his dad . The little boy wanted nothing more
me ?" Max said , " Sure , let 's bow
. But she will be okay ." They give Lily a

INTERVAL -6.957 - 0.669
CONTAINS 57.931%

come to the office anymore . You are grounded ."
smiled and picked up the baby and hugged him .
girls stayed and played together in the park . They ran
. Please forgive us ." Their parents sighed and
. Then one cold winter , the family decided to settle

INTERVAL -14.583 - -6.957
CONTAINS 3.113%

at his surroundings and marvel ed at the beauty of it
a big , white box with a door that opened with
He also said sorry and together they built an even better
there it was - the biggest and most beautiful tree Lily
. He roared and ran towards them . "
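
NOTE ON THE INTERVALS
The four INTERVAL ranges above are consistent with splitting the span from MIN to MAX into four equal-width bins, and the CONTAINS shares sum to 100%. This is inferred from the printed numbers only; the source does not state how the bins were chosen. The sketch below reproduces the edges under that assumption. The function name `interval_edges` and the bin count of 4 are illustrative, not taken from the dashboard's code.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi, rounded to 3 decimals.

    Assumption: the dashboard uses equal-width bins. This is inferred from the
    printed ranges and is not confirmed by the source.
    """
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX as printed in the export.
edges = interval_edges(-14.583, 15.921)

# CONTAINS percentages as printed, ordered from the highest interval to the lowest.
shares = [0.327, 38.629, 57.931, 3.113]
total_share = sum(shares)

print(edges)
print(total_share)
```

Consecutive pairs of `edges` can be compared against the INTERVAL lines, and `total_share` against 100, to check the reading of the export.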
Token day
Token ID1110
Feature activation-3.64e+00

Pos logit contributions
 strangers+0.087
 tresp+0.085
 supposed+0.080
vern+0.079
ache+0.079
Neg logit contributions
cy-0.046
sc-0.043
 Of-0.041
 up-0.040
ed-0.039
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 up+0.165
ed+0.164
 onto+0.158
cy+0.156
 Of+0.152
Neg logit contributions
 tresp-0.241
 strangers-0.238
 liar-0.234
cod-0.232
vern-0.230
Token be
Token ID307
Feature activation-7.10e-01

Pos logit contributions
 strangers+0.072
 tresp+0.072
 supposed+0.069
ache+0.069
 liar+0.068
Neg logit contributions
cy-0.070
 up-0.068
 Of-0.062
sc-0.061
 onto-0.060
Token careful
Token ID8161
Feature activation+1.10e+00

Pos logit contributions
 up+0.044
cy+0.044
 down+0.041
ed+0.041
sc+0.041
Neg logit contributions
 strangers-0.036
 tresp-0.035
ache-0.032
vern-0.032
 supposed-0.031
Token with
Token ID351
Feature activation-8.83e+00

Pos logit contributions
 tresp+0.084
 strangers+0.083
 liar+0.079
vern+0.077
ache+0.076
Neg logit contributions
 up-0.044
cy-0.041
sc-0.039
ed-0.038
 down-0.038
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 up+0.418
 into+0.357
 onto+0.355
 down+0.351
cy+0.348
Neg logit contributions
ache-0.557
vern-0.547
 tresp-0.544
 liar-0.536
 supposed-0.513
Token and
Token ID290
Feature activation+2.83e+00

Pos logit contributions
 tresp+1.077
 strangers+1.073
vern+1.024
 supposed+1.015
 liar+1.005
Neg logit contributions
cy-0.736
 up-0.696
sc-0.689
 onto-0.641
ed-0.636
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 tresp+0.154
vern+0.149
 strangers+0.148
ache+0.146
 supposed+0.142
Neg logit contributions
 up-0.154
cy-0.154
 Of-0.148
 girl-0.145
 down-0.145
Token not
Token ID407
Feature activation+5.69e+00

Pos logit contributions
 tresp+0.072
ache+0.072
vern+0.070
 supposed+0.069
 strangers+0.068
Neg logit contributions
 up-0.066
cy-0.066
 Of-0.060
 down-0.059
 girl-0.058
Token trust
Token ID3774
Feature activation+2.50e+00

Pos logit contributions
 tresp+0.343
vern+0.339
 strangers+0.333
 supposed+0.318
ache+0.316
Neg logit contributions
cy-0.288
 up-0.280
 Of-0.262
 down-0.261
 into-0.258
Token them
Token ID606
Feature activation+5.02e+00

Pos logit contributions
vern+0.142
 tresp+0.140
ache+0.135
 liar+0.133
cod+0.128
Neg logit contributions
 up-0.142
cy-0.137
 down-0.134
 into-0.132
 onto-0.127
Token.
Token ID13
Feature activation+4.86e-01

Pos logit contributions
 strangers+0.147
 tresp+0.144
vern+0.141
 supposed+0.141
 liar+0.134
Neg logit contributions
cy-0.085
sc-0.078
 up-0.076
ed-0.073
 Of-0.071
Token Max
Token ID5436
Feature activation+2.99e+00

Pos logit contributions
 strangers+0.033
 tresp+0.032
 supposed+0.030
vern+0.030
ache+0.030
Neg logit contributions
cy-0.023
sc-0.022
 Of-0.021
 up-0.021
ed-0.020
Token bark
Token ID21405
Feature activation-1.38e+00

Pos logit contributions
 strangers+0.222
vern+0.216
 tresp+0.216
 supposed+0.210
 liar+0.206
Neg logit contributions
cy-0.122
 up-0.115
sc-0.111
ed-0.106
 girl-0.101
Tokened
Token ID276
Feature activation-7.68e+00

Pos logit contributions
cy+0.012
sc+0.008
 up+0.005
 Of+0.005
ed+0.004
Neg logit contributions
 strangers-0.148
 tresp-0.145
 supposed-0.140
vern-0.138
ache-0.137
Token at
Token ID379
Feature activation+1.23e-01

Pos logit contributions
cy+0.466
sc+0.445
 Of+0.429
ed+0.408
 onto+0.402
Neg logit contributions
 strangers-0.423
 tresp-0.400
 supposed-0.358
ache-0.354
 liar-0.354
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.009
 strangers+0.009
vern+0.009
 supposed+0.008
ache+0.008
Neg logit contributions
cy-0.005
 up-0.005
sc-0.005
 Of-0.005
 onto-0.004
Token and
Token ID290
Feature activation+2.84e+00

Pos logit contributions
 strangers+1.034
 tresp+1.015
 supposed+0.959
vern+0.951
ache+0.927
Neg logit contributions
cy-0.793
sc-0.746
 up-0.729
 Of-0.711
ed-0.694
Token chased
Token ID26172
Feature activation+4.18e+00

Pos logit contributions
 strangers+0.170
 tresp+0.161
 cruel+0.144
 liar+0.144
ache+0.142
Neg logit contributions
cy-0.165
sc-0.153
 Of-0.147
 up-0.144
ed-0.143
Token squirrel
Token ID33039
Feature activation+6.44e+00

Pos logit contributions
 strangers+0.251
 tresp+0.227
 supposed+0.202
 liar+0.199
 cruel+0.196
Neg logit contributions
cy-0.243
sc-0.225
 Of-0.223
 mixture-0.215
 up-0.214
Tokens
Token ID82
Feature activation-4.14e+00

Pos logit contributions
 strangers+0.560
 tresp+0.539
 supposed+0.508
vern+0.505
ache+0.503
Neg logit contributions
cy-0.167
 Of-0.161
sc-0.142
antly-0.130
 mixture-0.127
Token.
Token ID13
Feature activation+4.82e-01

Pos logit contributions
cy+0.177
sc+0.174
 up+0.172
 onto+0.162
 down+0.159
Neg logit contributions
 strangers-0.291
 tresp-0.288
 supposed-0.276
ache-0.271
vern-0.270
Token be
Token ID307
Feature activation-7.20e-01

Pos logit contributions
ache+0.082
 tresp+0.081
 liar+0.080
 strangers+0.078
 supposed+0.078
Neg logit contributions
cy-0.056
 up-0.056
 Of-0.048
 girl-0.048
 down-0.046
Token patient
Token ID5827
Feature activation-9.37e-01

Pos logit contributions
cy+0.048
 up+0.047
sc+0.046
 Of+0.045
ed+0.045
Neg logit contributions
 strangers-0.034
 tresp-0.032
ache-0.029
vern-0.029
 supposed-0.029
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
cy+0.047
sc+0.044
 up+0.043
 Of+0.042
ed+0.042
Neg logit contributions
 strangers-0.061
 tresp-0.059
 supposed-0.055
vern-0.055
ache-0.054
Token not
Token ID407
Feature activation+5.70e+00

Pos logit contributions
 strangers+0.130
 tresp+0.126
 supposed+0.113
vern+0.113
ache+0.111
Neg logit contributions
cy-0.191
sc-0.182
 up-0.182
 Of-0.180
ed-0.174
Token trust
Token ID3774
Feature activation+2.51e+00

Pos logit contributions
vern+0.342
 tresp+0.325
 strangers+0.325
 liar+0.305
ache+0.304
Neg logit contributions
cy-0.293
 up-0.290
 down-0.280
 into-0.272
 Of-0.266
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
vern+0.140
 tresp+0.135
ache+0.132
 liar+0.131
cod+0.125
Neg logit contributions
 up-0.147
cy-0.140
 down-0.139
 into-0.137
 onto-0.130
Token.
Token ID13
Feature activation+4.93e-01

Pos logit contributions
 tresp+1.130
 strangers+1.121
vern+1.085
 supposed+1.074
 liar+1.055
Neg logit contributions
cy-0.671
 up-0.637
sc-0.629
 down-0.579
ed-0.575
Token<|endoftext|>
Token ID50256
Feature activation+7.18e+00

Pos logit contributions
 strangers+0.032
 tresp+0.031
 supposed+0.029
vern+0.028
 liar+0.028
Neg logit contributions
cy-0.024
sc-0.023
 up-0.022
ed-0.022
 onto-0.021
TokenOnce
Token ID7454
Feature activation+1.20e+00

Pos logit contributions
 strangers+0.422
 tresp+0.408
 supposed+0.383
vern+0.371
 liar+0.351
Neg logit contributions
cy-0.375
sc-0.356
 Of-0.348
 mixture-0.337
 up-0.327
Token upon
Token ID2402
Feature activation+2.16e+00

Pos logit contributions
 tresp+0.090
 strangers+0.088
ache+0.084
 liar+0.082
 supposed+0.081
Neg logit contributions
cy-0.054
 up-0.050
 Of-0.046
 down-0.045
ed-0.044
Token a
Token ID257
Feature activation-1.23e+00

Pos logit contributions
 strangers+0.163
 tresp+0.163
ache+0.152
 supposed+0.149
vern+0.148
Neg logit contributions
cy-0.082
 up-0.079
sc-0.075
 Of-0.072
ed-0.071
Token remember
Token ID3505
Feature activation+1.57e-01

Pos logit contributions
ache+0.106
 strangers+0.105
 tresp+0.104
 liar+0.101
 supposed+0.096
Neg logit contributions
 up-0.103
cy-0.101
 Of-0.096
 down-0.096
 onto-0.095
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 tresp+0.010
ache+0.010
 liar+0.010
 strangers+0.010
vern+0.009
Neg logit contributions
 up-0.008
cy-0.007
 onto-0.007
 into-0.007
 down-0.007
Token be
Token ID307
Feature activation-7.28e-01

Pos logit contributions
 strangers+0.074
 tresp+0.074
ache+0.073
 liar+0.071
cod+0.070
Neg logit contributions
 up-0.063
cy-0.061
 onto-0.057
 Of-0.056
 down-0.056
Token careful
Token ID8161
Feature activation+1.10e+00

Pos logit contributions
cy+0.046
 up+0.046
sc+0.044
 down+0.043
 Of+0.043
Neg logit contributions
 strangers-0.036
 tresp-0.034
ache-0.032
vern-0.031
 supposed-0.031
Token of
Token ID286
Feature activation+2.26e+00

Pos logit contributions
 strangers+0.080
 tresp+0.080
 liar+0.076
vern+0.075
ache+0.074
Neg logit contributions
 up-0.044
cy-0.042
sc-0.041
 down-0.039
 onto-0.039
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
ache+0.136
vern+0.133
 liar+0.133
 tresp+0.132
 strangers+0.127
Neg logit contributions
 up-0.115
 down-0.101
 onto-0.100
 into-0.098
sc-0.094
Token!"
Token ID2474
Feature activation-2.41e-01

Pos logit contributions
 tresp+1.080
 strangers+1.079
vern+1.038
 supposed+1.023
 liar+1.017
Neg logit contributions
cy-0.710
sc-0.682
 up-0.682
 onto-0.629
 down-0.622
Token
Token ID198
Feature activation-6.93e-01

Pos logit contributions
cy+0.012
sc+0.011
 up+0.011
ed+0.010
 Of+0.010
Neg logit contributions
 strangers-0.016
 tresp-0.015
 supposed-0.014
ache-0.014
vern-0.014
Token
Token ID198
Feature activation-7.02e-01

Pos logit contributions
cy+0.022
sc+0.022
 Of+0.021
 up+0.020
antly+0.018
Neg logit contributions
 strangers-0.058
 tresp-0.056
 supposed-0.054
ache-0.053
 liar-0.052
TokenSam
Token ID16305
Feature activation+4.38e+00

Pos logit contributions
sc+0.033
cy+0.033
 Of+0.030
antly+0.029
 up+0.028
Neg logit contributions
 strangers-0.048
 tresp-0.047
 supposed-0.046
 allergic-0.043
ache-0.043
Tokenmy
Token ID1820
Feature activation+3.14e+00

Pos logit contributions
 strangers+0.274
 tresp+0.266
 supposed+0.248
vern+0.247
 liar+0.244
Neg logit contributions
cy-0.240
sc-0.217
 up-0.216
ed-0.211
 Of-0.208
Token is
Token ID318
Feature activation-1.85e+00

Pos logit contributions
cy+0.128
sc+0.121
 up+0.115
 Of+0.114
ed+0.109
Neg logit contributions
 strangers-0.224
 tresp-0.220
 supposed-0.207
vern-0.205
ache-0.203
Token dangerous
Token ID4923
Feature activation+1.05e+01

Pos logit contributions
 up+0.106
 Of+0.101
sc+0.101
cy+0.101
 down+0.097
Neg logit contributions
 strangers-0.100
 tresp-0.096
vern-0.090
ache-0.089
 liar-0.087
Token.
Token ID13
Feature activation+4.78e-01

Pos logit contributions
 strangers+0.706
 tresp+0.687
 supposed+0.645
vern+0.633
ache+0.631
Neg logit contributions
cy-0.503
sc-0.472
 Of-0.453
 up-0.450
ed-0.435
Token You
Token ID921
Feature activation-4.48e+00

Pos logit contributions
 strangers+0.033
 tresp+0.032
 supposed+0.030
 liar+0.030
vern+0.029
Neg logit contributions
cy-0.022
sc-0.020
 up-0.020
ed-0.019
 onto-0.019
Token are
Token ID389
Feature activation+4.14e+00

Pos logit contributions
cy+0.254
 up+0.235
sc+0.233
 onto+0.224
 girl+0.218
Neg logit contributions
 strangers-0.254
 tresp-0.245
vern-0.242
 supposed-0.231
cod-0.225
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 strangers+0.220
 tresp+0.209
vern+0.206
ache+0.205
cod+0.189
Neg logit contributions
 up-0.240
cy-0.229
 into-0.219
 Of-0.219
sc-0.217
Token.
Token ID13
Feature activation+4.93e-01

Pos logit contributions
 strangers+1.124
 tresp+1.122
vern+1.086
 supposed+1.068
isdom+1.031
Neg logit contributions
cy-0.670
sc-0.635
 up-0.633
 down-0.588
ed-0.588
Token You
Token ID921
Feature activation-4.47e+00

Pos logit contributions
 strangers+0.035
 tresp+0.034
 supposed+0.031
 liar+0.031
vern+0.030
Neg logit contributions
cy-0.022
 up-0.021
sc-0.020
ed-0.019
 onto-0.019
Token are
Token ID389
Feature activation+4.13e+00

Pos logit contributions
cy+0.249
sc+0.231
 up+0.227
 onto+0.218
ed+0.216
Neg logit contributions
 strangers-0.253
vern-0.249
 tresp-0.242
 supposed-0.229
ules-0.228
Token big
Token ID1263
Feature activation+3.84e+00

Pos logit contributions
 strangers+0.207
vern+0.200
ache+0.197
 tresp+0.195
cod+0.183
Neg logit contributions
 up-0.248
 into-0.228
cy-0.227
 onto-0.221
 down-0.220
Token and
Token ID290
Feature activation+2.83e+00

Pos logit contributions
 strangers+0.244
 tresp+0.238
vern+0.233
 supposed+0.232
ache+0.222
Neg logit contributions
cy-0.188
 up-0.176
sc-0.174
 girl-0.171
ed-0.168
Token
Token ID26391
Feature activation-2.93e-01

Pos logit contributions
 strangers+0.236
 tresp+0.232
 supposed+0.224
ache+0.212
vern+0.212
Neg logit contributions
cy-0.200
sc-0.193
 up-0.187
 Of-0.184
 girl-0.183
Token
Token ID8151
Feature activation+7.92e+00

Pos logit contributions
cy+0.017
ed+0.016
 Of+0.016
sc+0.016
 up+0.016
Neg logit contributions
 strangers-0.017
 tresp-0.016
 supposed-0.015
 cruel-0.015
vern-0.015
Tokent
Token ID83
Feature activation+1.97e+00

Pos logit contributions
 strangers+0.511
 tresp+0.490
 supposed+0.446
 cruel+0.441
 liar+0.441
Neg logit contributions
cy-0.420
sc-0.401
ed-0.380
 up-0.380
 Of-0.375
Token scared
Token ID12008
Feature activation+1.70e+00

Pos logit contributions
 strangers+0.108
 tresp+0.106
vern+0.101
 liar+0.096
isdom+0.095
Neg logit contributions
 up-0.112
cy-0.111
 down-0.107
sc-0.104
 onto-0.104
Token of
Token ID286
Feature activation+2.26e+00

Pos logit contributions
 strangers+0.109
 tresp+0.107
 supposed+0.100
vern+0.099
ache+0.098
Neg logit contributions
cy-0.086
sc-0.081
 up-0.079
 Of-0.078
ed-0.075
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.140
 strangers+0.140
vern+0.139
ache+0.136
 supposed+0.135
Neg logit contributions
 up-0.107
cy-0.103
 Of-0.102
sc-0.099
 onto-0.099
Token anymore
Token ID7471
Feature activation+5.03e+00

Pos logit contributions
 strangers+1.070
 tresp+1.053
 supposed+0.995
vern+0.992
 liar+0.971
Neg logit contributions
cy-0.748
sc-0.703
 up-0.688
 Of-0.666
 onto-0.650
Token -
Token ID532
Feature activation+3.87e+00

Pos logit contributions
 strangers+0.352
 tresp+0.345
vern+0.328
 supposed+0.325
 liar+0.322
Neg logit contributions
cy-0.214
sc-0.202
 up-0.199
 Of-0.198
ed-0.191
Token he
Token ID339
Feature activation-7.01e-01

Pos logit contributions
 strangers+0.262
 tresp+0.257
vern+0.246
 supposed+0.245
ache+0.242
Neg logit contributions
cy-0.173
sc-0.165
 Of-0.163
 up-0.163
 onto-0.155
Token was
Token ID373
Feature activation-2.44e+00

Pos logit contributions
cy+0.037
 up+0.035
sc+0.034
ed+0.033
 onto+0.032
Neg logit contributions
 strangers-0.042
 tresp-0.042
vern-0.041
 supposed-0.039
 liar-0.039
Token very
Token ID845
Feature activation+1.51e+00

Pos logit contributions
 up+0.144
cy+0.132
 onto+0.131
 down+0.130
 into+0.130
Neg logit contributions
 strangers-0.129
vern-0.126
 tresp-0.126
ache-0.120
cod-0.118
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 strangers+0.053
 tresp+0.051
 supposed+0.048
vern+0.047
ache+0.046
Neg logit contributions
cy-0.045
sc-0.043
 Of-0.041
 up-0.041
ed-0.040
Token their
Token ID511
Feature activation+6.91e-01

Pos logit contributions
ache+0.093
vern+0.089
 tresp+0.087
 liar+0.083
 supposed+0.082
Neg logit contributions
 up-0.051
 Of-0.046
cy-0.046
 down-0.043
 girl-0.042
Token friends
Token ID2460
Feature activation+3.14e-01

Pos logit contributions
 strangers+0.040
 tresp+0.040
 supposed+0.038
ache+0.037
vern+0.037
Neg logit contributions
cy-0.038
 up-0.035
sc-0.035
 Of-0.035
 girl-0.033
Token and
Token ID290
Feature activation+2.84e+00

Pos logit contributions
 tresp+0.022
 strangers+0.022
 supposed+0.021
vern+0.020
ache+0.020
Neg logit contributions
cy-0.014
 up-0.013
sc-0.013
 Of-0.012
 onto-0.012
Token to
Token ID284
Feature activation+1.28e+00

Pos logit contributions
 strangers+0.160
 tresp+0.158
ache+0.149
 supposed+0.148
vern+0.146
Neg logit contributions
cy-0.158
 up-0.149
 Of-0.149
sc-0.146
 down-0.142
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
ache+0.081
 tresp+0.076
vern+0.074
 supposed+0.072
 strangers+0.071
Neg logit contributions
 up-0.063
cy-0.059
 Of-0.058
 down-0.056
 together-0.055
Token.
Token ID13
Feature activation+4.94e-01

Pos logit contributions
 tresp+1.141
 strangers+1.135
 supposed+1.085
vern+1.068
ache+1.049
Neg logit contributions
cy-0.675
 up-0.624
sc-0.623
 Of-0.579
ed-0.573
Token They
Token ID1119
Feature activation-1.21e+00

Pos logit contributions
 strangers+0.036
 tresp+0.035
 supposed+0.033
vern+0.032
 liar+0.032
Neg logit contributions
cy-0.021
sc-0.020
 up-0.019
 Of-0.019
ed-0.018
Token hugged
Token ID41130
Feature activation+1.37e+00

Pos logit contributions
cy+0.076
 girl+0.068
 up+0.067
ed+0.067
sc+0.066
Neg logit contributions
 strangers-0.062
 tresp-0.060
 supposed-0.060
ache-0.056
 allergic-0.056
Token each
Token ID1123
Feature activation+3.20e+00

Pos logit contributions
 tresp+0.087
 strangers+0.085
 supposed+0.083
vern+0.081
ache+0.080
Neg logit contributions
cy-0.071
 up-0.066
sc-0.065
 Of-0.064
ed-0.063
Token other
Token ID584
Feature activation-1.01e+00

Pos logit contributions
 strangers+0.219
 tresp+0.215
 supposed+0.202
vern+0.199
ache+0.199
Neg logit contributions
cy-0.148
sc-0.139
 up-0.133
 Of-0.132
ed-0.128
Token!"
Token ID2474
Feature activation-2.48e-01

Pos logit contributions
 strangers+0.152
 tresp+0.148
 supposed+0.139
vern+0.136
ache+0.135
Neg logit contributions
cy-0.117
sc-0.110
 up-0.106
 Of-0.105
ed-0.102
Token
Token ID220
Feature activation-2.40e+00

Pos logit contributions
cy+0.012
sc+0.011
 up+0.010
cer+0.010
 aboard+0.010
Neg logit contributions
 strangers-0.015
 tresp-0.014
 understand-0.014
 supposed-0.014
isdom-0.014
Token
Token ID198
Feature activation-6.99e-01

Pos logit contributions
cy+0.104
sc+0.096
 Of+0.091
 up+0.090
ed+0.087
Neg logit contributions
 strangers-0.172
 tresp-0.167
 supposed-0.158
vern-0.156
ache-0.154
Token
Token ID198
Feature activation-6.91e-01

Pos logit contributions
cy+0.028
sc+0.026
 Of+0.025
 up+0.025
ed+0.024
Neg logit contributions
 strangers-0.052
 tresp-0.051
 supposed-0.048
ache-0.047
vern-0.047
TokenThe
Token ID464
Feature activation+2.99e+00

Pos logit contributions
cy+0.035
sc+0.034
 Of+0.033
 up+0.032
ed+0.031
Neg logit contributions
 strangers-0.045
 tresp-0.044
 supposed-0.041
ache-0.040
 liar-0.040
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.184
 strangers+0.178
 supposed+0.175
vern+0.175
ache+0.171
Neg logit contributions
cy-0.149
 up-0.148
 girl-0.144
sc-0.138
 down-0.135
Token took
Token ID1718
Feature activation-8.53e-01

Pos logit contributions
 strangers+0.966
 tresp+0.934
 supposed+0.862
ache+0.851
vern+0.839
Neg logit contributions
cy-0.857
sc-0.814
 Of-0.788
 up-0.774
ed-0.751
Token Sarah
Token ID10490
Feature activation+4.84e+00

Pos logit contributions
cy+0.048
sc+0.047
 Of+0.044
ipt+0.043
suit+0.041
Neg logit contributions
 strangers-0.049
 tresp-0.046
 supposed-0.043
 cruel-0.040
 allergic-0.040
Token's
Token ID338
Feature activation-1.30e+00

Pos logit contributions
 strangers+0.333
 tresp+0.323
 supposed+0.309
ache+0.301
vern+0.298
Neg logit contributions
cy-0.219
sc-0.207
 Of-0.201
ed-0.185
 up-0.185
Token coat
Token ID13209
Feature activation+2.34e+00

Pos logit contributions
cy+0.066
sc+0.062
 up+0.061
 Of+0.060
ed+0.058
Neg logit contributions
 strangers-0.083
 tresp-0.082
 supposed-0.078
vern-0.075
 liar-0.075
Token and
Token ID290
Feature activation+2.84e+00

Pos logit contributions
 strangers+0.149
 tresp+0.145
 supposed+0.135
vern+0.132
ache+0.131
Neg logit contributions
cy-0.121
sc-0.114
 up-0.110
 Of-0.109
ed-0.106
Token.
Token ID13
Feature activation+4.93e-01

Pos logit contributions
cy+0.192
 up+0.173
 Of+0.166
sc+0.165
 onto+0.160
Neg logit contributions
 tresp-0.369
 strangers-0.367
vern-0.347
 supposed-0.345
 liar-0.345
Token They
Token ID1119
Feature activation-1.20e+00

Pos logit contributions
 strangers+0.030
 tresp+0.028
 supposed+0.026
 liar+0.026
vern+0.025
Neg logit contributions
sc-0.027
cy-0.027
 up-0.025
ed-0.024
 onto-0.024
Token do
Token ID466
Feature activation+3.14e+00

Pos logit contributions
cy+0.063
 up+0.058
 onto+0.053
 girl+0.052
sc+0.051
Neg logit contributions
 supposed-0.072
 tresp-0.070
 strangers-0.070
vern-0.069
ache-0.067
Token not
Token ID407
Feature activation+5.69e+00

Pos logit contributions
 strangers+0.110
 tresp+0.109
ache+0.100
 liar+0.096
vern+0.090
Neg logit contributions
 up-0.251
cy-0.231
sc-0.225
 into-0.224
 Of-0.224
Token follow
Token ID1061
Feature activation-3.02e+00

Pos logit contributions
vern+0.319
 tresp+0.315
 strangers+0.295
cod+0.280
 supposed+0.272
Neg logit contributions
cy-0.336
 up-0.303
 down-0.296
 girl-0.289
 into-0.288
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
cy+0.148
 up+0.142
sc+0.138
 Of+0.135
 onto+0.133
Neg logit contributions
 tresp-0.190
 strangers-0.188
 supposed-0.180
vern-0.180
ache-0.178
Token.
Token ID13
Feature activation+4.77e-01

Pos logit contributions
 strangers+1.093
 tresp+1.088
 supposed+1.025
vern+1.018
ache+0.998
Neg logit contributions
cy-0.724
sc-0.676
 up-0.664
 Of-0.632
ed-0.619
Token They
Token ID1119
Feature activation-1.22e+00

Pos logit contributions
 strangers+0.032
 tresp+0.029
 supposed+0.028
 liar+0.027
vern+0.027
Neg logit contributions
cy-0.024
sc-0.023
 up-0.022
ed-0.021
 onto-0.021
Token are
Token ID389
Feature activation+4.12e+00

Pos logit contributions
cy+0.065
 up+0.061
 onto+0.056
sc+0.056
 girl+0.054
Neg logit contributions
 supposed-0.071
vern-0.069
 tresp-0.069
 strangers-0.068
ache-0.067
Token polite
Token ID23507
Feature activation+2.32e+00

Pos logit contributions
vern+0.205
 tresp+0.204
 strangers+0.203
ache+0.198
isdom+0.198
Neg logit contributions
 up-0.259
 into-0.233
 down-0.230
cy-0.224
 onto-0.222
Token and
Token ID290
Feature activation+2.83e+00

Pos logit contributions
 strangers+0.155
 tresp+0.153
 supposed+0.143
vern+0.143
ache+0.140
Neg logit contributions
cy-0.110
sc-0.102
 up-0.101
 Of-0.099
ed-0.099
Token be
Token ID307
Feature activation-7.22e-01

Pos logit contributions
 strangers+0.069
 tresp+0.067
 supposed+0.062
ache+0.061
vern+0.060
Neg logit contributions
cy-0.074
 up-0.069
sc-0.069
 Of-0.068
 onto-0.065
Token careful
Token ID8161
Feature activation+1.09e+00

Pos logit contributions
cy+0.046
sc+0.043
 Of+0.042
 up+0.042
ed+0.041
Neg logit contributions
 strangers-0.038
 tresp-0.037
 supposed-0.034
vern-0.033
ache-0.033
Token and
Token ID290
Feature activation+2.83e+00

Pos logit contributions
 strangers+0.089
 tresp+0.088
 supposed+0.082
 liar+0.082
vern+0.082
Neg logit contributions
cy-0.035
 up-0.034
sc-0.033
 Of-0.031
ed-0.030
Token not
Token ID407
Feature activation+5.70e+00

Pos logit contributions
 strangers+0.111
 tresp+0.107
vern+0.093
 supposed+0.093
ache+0.091
Neg logit contributions
cy-0.212
sc-0.203
 up-0.203
 Of-0.202
ed-0.196
Token trust
Token ID3774
Feature activation+2.50e+00

Pos logit contributions
 tresp+0.325
vern+0.322
 strangers+0.322
 supposed+0.295
 liar+0.292
Neg logit contributions
cy-0.315
 up-0.300
 down-0.293
 Of-0.289
sc-0.283
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.166
vern+0.162
ache+0.154
 liar+0.154
cod+0.153
Neg logit contributions
 up-0.117
cy-0.115
 down-0.109
 into-0.106
 onto-0.105
Token with
Token ID351
Feature activation-8.84e+00

Pos logit contributions
 tresp+1.092
 strangers+1.091
vern+1.026
 supposed+1.025
 liar+0.999
Neg logit contributions
cy-0.721
sc-0.675
 up-0.670
 Of-0.626
ed-0.625
Token her
Token ID607
Feature activation+9.88e-01

Pos logit contributions
 up+0.348
cy+0.314
 onto+0.313
 down+0.301
 Of+0.300
Neg logit contributions
 tresp-0.621
ache-0.611
vern-0.592
cod-0.592
 liar-0.583
Token things
Token ID1243
Feature activation-1.55e+00

Pos logit contributions
 tresp+0.069
 strangers+0.069
 supposed+0.066
vern+0.065
ache+0.064
Neg logit contributions
cy-0.042
 up-0.039
sc-0.038
 Of-0.038
 onto-0.037
Token.
Token ID13
Feature activation+4.78e-01

Pos logit contributions
cy+0.069
sc+0.065
 up+0.064
 Of+0.061
 onto+0.061
Neg logit contributions
 strangers-0.108
 tresp-0.107
 supposed-0.100
vern-0.098
 liar-0.098
Token The
Token ID383
Feature activation+9.09e-01

Pos logit contributions
 strangers+0.031
 tresp+0.029
 supposed+0.028
vern+0.027
ache+0.027
Neg logit contributions
cy-0.024
sc-0.023
 up-0.021
ed-0.021
 onto-0.021
Token "
Token ID366
Feature activation+3.88e+00

Pos logit contributions
 Of+0.011
cy+0.011
sc+0.011
 up+0.010
 onto+0.010
Neg logit contributions
 strangers-0.017
 tresp-0.016
 supposed-0.015
vern-0.015
 liar-0.015
TokenYou
Token ID1639
Feature activation-2.48e+00

Pos logit contributions
 strangers+0.248
 tresp+0.232
 supposed+0.232
ache+0.221
vern+0.219
Neg logit contributions
 up-0.181
cy-0.180
sc-0.178
 onto-0.174
 down-0.167
Token shouldn
Token ID6584
Feature activation+1.27e+01

Pos logit contributions
cy+0.150
ed+0.138
 up+0.138
 onto+0.133
 girl+0.131
Neg logit contributions
 strangers-0.139
 tresp-0.126
vern-0.121
cod-0.121
 supposed-0.120
Token't
Token ID470
Feature activation+3.31e+00

Pos logit contributions
 strangers+0.655
 tresp+0.631
 supposed+0.578
ache+0.570
 liar+0.569
Neg logit contributions
cy-0.797
 up-0.761
sc-0.757
 Of-0.753
 onto-0.725
Token trust
Token ID3774
Feature activation+2.50e+00

Pos logit contributions
 strangers+0.183
 tresp+0.180
 supposed+0.167
vern+0.166
ache+0.163
Neg logit contributions
cy-0.197
 Of-0.183
sc-0.183
 up-0.181
 onto-0.175
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.175
vern+0.174
ache+0.168
 liar+0.167
cod+0.165
Neg logit contributions
 up-0.108
cy-0.098
 down-0.097
 into-0.095
 onto-0.095
Token!"
Token ID2474
Feature activation-2.43e-01

Pos logit contributions
 tresp+1.094
 strangers+1.092
vern+1.053
 supposed+1.032
 liar+1.016
Neg logit contributions
cy-0.701
 up-0.669
sc-0.662
 down-0.614
 onto-0.607
Token in
Token ID287
Feature activation-2.11e+00

Pos logit contributions
cy+0.013
sc+0.012
 up+0.011
antly+0.011
 aboard+0.011
Neg logit contributions
 strangers-0.014
 supposed-0.013
 tresp-0.013
 understand-0.012
vern-0.012
Token a
Token ID257
Feature activation-1.22e+00

Pos logit contributions
cy+0.101
sc+0.094
 Of+0.092
 up+0.091
 onto+0.087
Neg logit contributions
 strangers-0.142
 tresp-0.139
 supposed-0.131
vern-0.129
ache-0.129
Token harsh
Token ID11859
Feature activation+5.38e+00

Pos logit contributions
cy+0.070
sc+0.068
 up+0.065
 Of+0.064
 girl+0.063
Neg logit contributions
 strangers-0.069
 tresp-0.068
 supposed-0.065
ache-0.062
vern-0.061
Token voice
Token ID3809
Feature activation-2.67e+00

Pos logit contributions
 strangers+0.379
 tresp+0.375
 supposed+0.359
vern+0.353
ache+0.348
Neg logit contributions
cy-0.233
sc-0.214
 up-0.212
 Of-0.210
ed-0.199
Token promised
Token ID8072
Feature activation+1.55e+00

Pos logit contributions
 strangers+0.173
 tresp+0.168
 supposed+0.156
ache+0.153
vern+0.152
Neg logit contributions
cy-0.152
sc-0.144
 Of-0.138
 up-0.137
ed-0.133
Token never
Token ID1239
Feature activation+6.44e+00

Pos logit contributions
 strangers+0.101
 tresp+0.099
 supposed+0.093
vern+0.091
ache+0.091
Neg logit contributions
cy-0.075
sc-0.071
 up-0.070
 Of-0.069
ed-0.066
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 strangers+0.424
 tresp+0.418
vern+0.392
 supposed+0.387
ache+0.380
Neg logit contributions
cy-0.311
sc-0.287
 up-0.285
 Of-0.284
ed-0.270
Token frown
Token ID22002
Feature activation+2.88e+00

Pos logit contributions
 tresp+0.074
 strangers+0.073
vern+0.070
 supposed+0.068
 liar+0.066
Neg logit contributions
cy-0.069
 up-0.065
 Of-0.063
 down-0.062
sc-0.061
Token on
Token ID319
Feature activation-3.09e+00

Pos logit contributions
 strangers+0.218
 tresp+0.210
 supposed+0.200
ache+0.199
vern+0.195
Neg logit contributions
cy-0.114
sc-0.106
 Of-0.101
 up-0.094
ed-0.092
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
cy+0.135
 up+0.133
sc+0.124
 Of+0.122
 onto+0.121
Neg logit contributions
 tresp-0.206
 strangers-0.204
 supposed-0.199
vern-0.199
ache-0.196
Token again
Token ID757
Feature activation-4.96e+00

Pos logit contributions
 strangers+1.261
 tresp+1.221
 supposed+1.158
ache+1.145
vern+1.137
Neg logit contributions
cy-0.570
sc-0.523
 Of-0.498
 up-0.476
ed-0.468
Token.
Token ID13
Feature activation+4.88e-01

Pos logit contributions
cy+0.217
sc+0.198
 up+0.197
 Of+0.193
 onto+0.186
Neg logit contributions
 strangers-0.348
 tresp-0.347
 supposed-0.325
vern-0.324
ache-0.319
Token They
Token ID1119
Feature activation-1.21e+00

Pos logit contributions
 strangers+0.034
 tresp+0.033
 supposed+0.032
vern+0.031
ache+0.030
Neg logit contributions
cy-0.021
sc-0.020
 up-0.019
ed-0.018
 Of-0.018
Token also
Token ID635
Feature activation+2.30e+00

Pos logit contributions
cy+0.068
 up+0.060
ed+0.059
 onto+0.058
sc+0.056
Neg logit contributions
 strangers-0.075
 tresp-0.074
vern-0.071
 supposed-0.071
 liar-0.069
Token promised
Token ID8072
Feature activation+1.56e+00

Pos logit contributions
 strangers+0.136
 tresp+0.130
vern+0.126
 supposed+0.124
ache+0.119
Neg logit contributions
cy-0.135
 up-0.120
ed-0.117
sc-0.115
 onto-0.115
Token shy
Token ID15800
Feature activation+1.54e+00

Pos logit contributions
 strangers+0.079
 tresp+0.077
vern+0.070
 supposed+0.070
ache+0.070
Neg logit contributions
cy-0.092
 up-0.089
sc-0.086
 Of-0.083
 onto-0.082
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
 strangers+0.104
 tresp+0.102
 supposed+0.096
vern+0.095
ache+0.093
Neg logit contributions
cy-0.073
sc-0.069
 up-0.066
 Of-0.066
ed-0.064
Token does
Token ID857
Feature activation-1.71e-01

Pos logit contributions
 strangers+0.137
 tresp+0.135
vern+0.129
 supposed+0.125
ache+0.122
Neg logit contributions
cy-0.178
 up-0.174
sc-0.170
 Of-0.166
 onto-0.165
Token not
Token ID407
Feature activation+5.69e+00

Pos logit contributions
 up+0.013
 onto+0.012
 down+0.012
 into+0.012
 Of+0.012
Neg logit contributions
 tresp-0.006
 strangers-0.006
ache-0.005
 supposed-0.005
 liar-0.005
Token like
Token ID588
Feature activation+1.18e+00

Pos logit contributions
 tresp+0.277
vern+0.267
 strangers+0.267
 supposed+0.239
isdom+0.236
Neg logit contributions
cy-0.354
 up-0.346
 down-0.334
sc-0.328
 girl-0.326
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.074
vern+0.073
 strangers+0.072
 supposed+0.070
ache+0.070
Neg logit contributions
 up-0.055
cy-0.052
sc-0.050
 onto-0.050
 down-0.050
Token."
Token ID526
Feature activation-6.17e-01

Pos logit contributions
 tresp+1.147
vern+1.119
 strangers+1.107
 supposed+1.076
 liar+1.057
Neg logit contributions
cy-0.653
 up-0.639
sc-0.613
 down-0.597
 onto-0.567
Token
Token ID198
Feature activation-6.96e-01

Pos logit contributions
cy+0.035
sc+0.034
ed+0.032
 up+0.032
 mixture+0.031
Neg logit contributions
 strangers-0.036
 tresp-0.033
 liar-0.031
vern-0.030
 understand-0.029
Token
Token ID198
Feature activation-6.99e-01

Pos logit contributions
cy+0.026
sc+0.024
 Of+0.023
 up+0.023
ed+0.021
Neg logit contributions
 strangers-0.054
 tresp-0.053
 supposed-0.051
ache-0.050
vern-0.049
Token"
Token ID1
Feature activation+5.69e+00

Pos logit contributions
cy+0.035
sc+0.033
 Of+0.031
 up+0.031
ed+0.030
Neg logit contributions
 strangers-0.046
 tresp-0.045
 supposed-0.042
vern-0.041
ache-0.041
TokenThank
Token ID10449
Feature activation-4.87e+00

Pos logit contributions
 strangers+0.370
 cruel+0.342
 liar+0.341
 tresp+0.336
ache+0.333
Neg logit contributions
 up-0.292
 onto-0.275
cy-0.275
 down-0.269
 into-0.259
Token words
Token ID2456
Feature activation+1.41e+00

Pos logit contributions
 Of+0.060
 up+0.060
 girl+0.058
cy+0.056
 mixture+0.055
Neg logit contributions
 strangers-0.080
 tresp-0.080
ache-0.075
cod-0.075
 supposed-0.073
Token about
Token ID546
Feature activation+6.52e+00

Pos logit contributions
 tresp+0.102
 strangers+0.101
cod+0.096
 liar+0.094
 supposed+0.093
Neg logit contributions
 up-0.059
cy-0.057
 onto-0.055
sc-0.055
ed-0.053
Token not
Token ID407
Feature activation+5.68e+00

Pos logit contributions
 tresp+0.435
vern+0.425
ache+0.420
 strangers+0.419
 liar+0.412
Neg logit contributions
 up-0.286
cy-0.272
 Of-0.262
 onto-0.252
 down-0.248
Token talking
Token ID3375
Feature activation+1.25e+00

Pos logit contributions
 strangers+0.261
 tresp+0.259
vern+0.236
ache+0.230
 liar+0.226
Neg logit contributions
cy-0.376
 up-0.367
sc-0.359
 Of-0.356
 onto-0.347
Token to
Token ID284
Feature activation+1.26e+00

Pos logit contributions
 tresp+0.072
 strangers+0.071
vern+0.067
 supposed+0.066
 liar+0.065
Neg logit contributions
 up-0.068
cy-0.067
sc-0.065
 Of-0.064
 down-0.063
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.081
vern+0.081
 supposed+0.076
ache+0.075
 liar+0.074
Neg logit contributions
 up-0.059
cy-0.058
 Of-0.057
 down-0.054
 girl-0.053
Token.
Token ID13
Feature activation+4.87e-01

Pos logit contributions
 tresp+1.119
 strangers+1.118
vern+1.053
 supposed+1.051
 liar+1.031
Neg logit contributions
cy-0.690
 up-0.645
sc-0.640
 Of-0.606
 onto-0.594
Token
Token ID198
Feature activation-6.96e-01

Pos logit contributions
 strangers+0.027
 tresp+0.023
 supposed+0.023
isdom+0.023
 shouldn+0.023
Neg logit contributions
cy-0.027
sc-0.026
 mixture-0.023
 onto-0.023
 aboard-0.023
Token
Token ID198
Feature activation-6.99e-01

Pos logit contributions
cy+0.023
sc+0.021
 up+0.021
 Of+0.021
ed+0.018
Neg logit contributions
 strangers-0.057
 tresp-0.057
ache-0.054
 supposed-0.053
 liar-0.053
TokenL
Token ID43
Feature activation-1.79e+00

Pos logit contributions
cy+0.032
sc+0.031
 Of+0.029
 up+0.029
ed+0.028
Neg logit contributions
 strangers-0.049
 tresp-0.048
 supposed-0.045
ache-0.044
vern-0.043
Tokenily
Token ID813
Feature activation+1.92e+00

Pos logit contributions
cy+0.112
sc+0.107
 up+0.103
 Of+0.103
ed+0.101
Neg logit contributions
 strangers-0.095
 tresp-0.093
 supposed-0.084
ache-0.083
vern-0.082
Token match
Token ID2872
Feature activation-6.97e+00

Pos logit contributions
 strangers+0.184
 tresp+0.174
 supposed+0.165
 allergic+0.160
 liar+0.159
Neg logit contributions
cy-0.145
sc-0.142
 Of-0.133
ed-0.133
 onto-0.126
Token for
Token ID329
Feature activation-3.00e-01

Pos logit contributions
cy+0.322
 up+0.307
sc+0.298
ed+0.288
 onto+0.287
Neg logit contributions
 strangers-0.466
 tresp-0.458
vern-0.432
 supposed-0.431
ache-0.427
Token the
Token ID262
Feature activation+1.41e+00

Pos logit contributions
 up+0.012
 onto+0.012
 Of+0.011
cy+0.011
 into+0.011
Neg logit contributions
ache-0.020
 tresp-0.020
cod-0.020
 supposed-0.019
vern-0.019
Token group
Token ID1448
Feature activation+2.18e+00

Pos logit contributions
 strangers+0.093
 tresp+0.090
 supposed+0.083
 liar+0.082
ache+0.081
Neg logit contributions
cy-0.070
sc-0.068
 Of-0.064
ed-0.064
 onto-0.062
Token of
Token ID286
Feature activation+2.25e+00

Pos logit contributions
 strangers+0.139
 tresp+0.139
vern+0.131
 supposed+0.130
ache+0.130
Neg logit contributions
cy-0.105
sc-0.103
 up-0.098
ed-0.095
 onto-0.095
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
ache+0.140
 tresp+0.136
vern+0.136
 supposed+0.135
 strangers+0.128
Neg logit contributions
 up-0.103
cy-0.100
 girl-0.099
 Of-0.098
sc-0.094
Token.
Token ID13
Feature activation+4.87e-01

Pos logit contributions
 strangers+1.114
 tresp+1.108
vern+1.067
 supposed+1.039
 liar+1.029
Neg logit contributions
cy-0.701
sc-0.659
 up-0.651
 onto-0.614
 down-0.610
Token She
Token ID1375
Feature activation+6.47e-01

Pos logit contributions
 strangers+0.033
 tresp+0.032
 supposed+0.030
vern+0.030
 liar+0.029
Neg logit contributions
cy-0.022
sc-0.021
 up-0.020
ed-0.019
 Of-0.019
Token realized
Token ID6939
Feature activation+5.02e+00

Pos logit contributions
 strangers+0.038
 tresp+0.037
vern+0.037
 supposed+0.036
 liar+0.034
Neg logit contributions
cy-0.036
 up-0.033
sc-0.033
ed-0.032
 onto-0.032
Token that
Token ID326
Feature activation+1.15e+00

Pos logit contributions
 tresp+0.318
 supposed+0.308
 strangers+0.307
vern+0.305
cod+0.302
Neg logit contributions
 up-0.241
 onto-0.223
 Of-0.219
cy-0.215
ed-0.212
Token in
Token ID287
Feature activation-2.11e+00

Pos logit contributions
 strangers+0.071
 tresp+0.071
vern+0.067
ache+0.066
 supposed+0.066
Neg logit contributions
cy-0.058
 up-0.055
sc-0.055
 Of-0.054
 onto-0.052
Token.
Token ID13
Feature activation+4.96e-01

Pos logit contributions
 strangers+0.209
 tresp+0.204
 supposed+0.193
ache+0.187
vern+0.187
Neg logit contributions
cy-0.146
sc-0.138
 Of-0.129
 up-0.123
 onto-0.119
Token Lily
Token ID20037
Feature activation-3.08e+00

Pos logit contributions
 strangers+0.028
 tresp+0.026
 supposed+0.024
isdom+0.024
 liar+0.024
Neg logit contributions
cy-0.026
sc-0.025
 up-0.023
ed-0.023
antly-0.023
Token learned
Token ID4499
Feature activation+6.12e+00

Pos logit contributions
cy+0.149
 up+0.139
ed+0.136
sc+0.132
 onto+0.128
Neg logit contributions
 strangers-0.216
 tresp-0.205
 supposed-0.195
vern-0.194
 liar-0.193
Token that
Token ID326
Feature activation+1.16e+00

Pos logit contributions
 strangers+0.376
 tresp+0.374
 liar+0.349
 supposed+0.348
ache+0.346
Neg logit contributions
 up-0.317
 Of-0.293
cy-0.289
 onto-0.289
sc-0.283
Token sometimes
Token ID3360
Feature activation+2.49e-01

Pos logit contributions
 strangers+0.070
 tresp+0.069
 supposed+0.064
vern+0.063
ache+0.063
Neg logit contributions
cy-0.063
sc-0.060
 Of-0.058
 up-0.058
ed-0.055
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.018
 strangers+0.018
ache+0.017
 supposed+0.017
vern+0.017
Neg logit contributions
cy-0.010
 up-0.010
sc-0.009
 Of-0.009
 down-0.009
Token can
Token ID460
Feature activation+5.10e+00

Pos logit contributions
 tresp+0.920
 strangers+0.913
vern+0.900
ache+0.872
 liar+0.868
Neg logit contributions
cy-0.904
 up-0.870
sc-0.848
 onto-0.796
ed-0.795
Token be
Token ID307
Feature activation-7.15e-01

Pos logit contributions
 tresp+0.316
 supposed+0.310
 strangers+0.308
 liar+0.303
isdom+0.290
Neg logit contributions
 up-0.258
cy-0.245
 edge-0.228
 into-0.227
 onto-0.223
Token nice
Token ID3621
Feature activation-6.04e-02

Pos logit contributions
 up+0.046
 down+0.043
cy+0.042
 into+0.042
 girl+0.040
Neg logit contributions
 strangers-0.035
 tresp-0.034
vern-0.033
ache-0.032
 liar-0.031
Token,
Token ID11
Feature activation-2.46e-01

Pos logit contributions
cy+0.003
sc+0.003
 Of+0.003
 up+0.003
ed+0.003
Neg logit contributions
 strangers-0.004
 tresp-0.004
 supposed-0.003
vern-0.003
ache-0.003
Token but
Token ID475
Feature activation-2.54e+00

Pos logit contributions
cy+0.011
 up+0.010
sc+0.010
 Of+0.010
 onto+0.009
Neg logit contributions
 tresp-0.017
 strangers-0.017
vern-0.016
ache-0.016
 supposed-0.016
Token't
Token ID470
Feature activation+3.28e+00

Pos logit contributions
 strangers+0.349
 tresp+0.345
 liar+0.340
 supposed+0.324
cod+0.322
Neg logit contributions
 up-0.308
 onto-0.294
sc-0.290
 Of-0.284
cy-0.278
Token open
Token ID1280
Feature activation+3.64e-01

Pos logit contributions
 strangers+0.180
 tresp+0.176
 supposed+0.163
vern+0.160
ache+0.157
Neg logit contributions
cy-0.197
sc-0.186
 Of-0.181
 up-0.181
ed-0.174
Token the
Token ID262
Feature activation+1.40e+00

Pos logit contributions
 strangers+0.024
 tresp+0.023
 supposed+0.021
ache+0.021
vern+0.021
Neg logit contributions
cy-0.019
sc-0.018
 Of-0.017
ed-0.016
 up-0.016
Token door
Token ID3420
Feature activation-3.30e+00

Pos logit contributions
 strangers+0.096
 tresp+0.094
 supposed+0.089
vern+0.087
ache+0.086
Neg logit contributions
cy-0.064
sc-0.060
 up-0.058
 Of-0.058
ed-0.055
Token to
Token ID284
Feature activation+1.25e+00

Pos logit contributions
cy+0.172
sc+0.162
 up+0.157
 Of+0.154
ed+0.151
Neg logit contributions
 strangers-0.208
 tresp-0.203
 supposed-0.189
vern-0.187
ache-0.184
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 tresp+0.078
 strangers+0.077
 supposed+0.073
vern+0.073
ache+0.072
Neg logit contributions
cy-0.064
 up-0.061
sc-0.058
 Of-0.057
 onto-0.056
Token.
Token ID13
Feature activation+4.78e-01

Pos logit contributions
 strangers+1.063
 tresp+1.040
 supposed+0.975
vern+0.961
ache+0.953
Neg logit contributions
cy-0.766
sc-0.718
 up-0.688
 Of-0.685
ed-0.662
Token They
Token ID1119
Feature activation-1.22e+00

Pos logit contributions
 strangers+0.033
 tresp+0.032
 supposed+0.030
vern+0.029
 liar+0.029
Neg logit contributions
cy-0.022
sc-0.021
 up-0.020
ed-0.019
 onto-0.019
Token might
Token ID1244
Feature activation+4.25e+00

Pos logit contributions
cy+0.068
 up+0.063
sc+0.061
 onto+0.058
 girl+0.057
Neg logit contributions
 strangers-0.071
 tresp-0.071
vern-0.068
 liar-0.068
 supposed-0.067
Token not
Token ID407
Feature activation+5.69e+00

Pos logit contributions
isdom+0.249
ules+0.249
 tresp+0.245
 liar+0.244
vern+0.242
Neg logit contributions
 up-0.210
cy-0.193
 away-0.187
 into-0.184
 down-0.178
Token be
Token ID307
Feature activation-7.27e-01

Pos logit contributions
 tresp+0.355
vern+0.352
isdom+0.332
 liar+0.326
ules+0.323
Neg logit contributions
 up-0.277
cy-0.261
 down-0.258
 into-0.257
 with-0.245
Token you
Token ID345
Feature activation-5.86e-01

Pos logit contributions
 tresp+0.206
 strangers+0.200
 liar+0.192
vern+0.191
 supposed+0.189
Neg logit contributions
cy-0.152
 up-0.149
sc-0.140
 down-0.136
 into-0.134
Token know
Token ID760
Feature activation+4.89e+00

Pos logit contributions
 up+0.030
cy+0.028
 down+0.028
 into+0.027
ed+0.027
Neg logit contributions
 tresp-0.036
vern-0.035
 liar-0.035
 strangers-0.033
ache-0.033
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
 tresp+0.341
 liar+0.333
 strangers+0.328
ache+0.325
 supposed+0.325
Neg logit contributions
 up-0.203
cy-0.179
sc-0.179
 onto-0.175
 into-0.175
Token not
Token ID407
Feature activation+5.68e+00

Pos logit contributions
 strangers+0.128
 tresp+0.126
ache+0.113
vern+0.113
 supposed+0.112
Neg logit contributions
cy-0.188
 up-0.183
 Of-0.180
sc-0.179
 onto-0.174
Token follow
Token ID1061
Feature activation-3.03e+00

Pos logit contributions
vern+0.308
 tresp+0.306
 strangers+0.302
cod+0.277
 liar+0.276
Neg logit contributions
cy-0.318
 up-0.312
 down-0.300
 Of-0.297
sc-0.293
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 up+0.147
cy+0.135
 down+0.130
sc+0.130
 into+0.130
Neg logit contributions
 tresp-0.194
ache-0.187
 liar-0.186
vern-0.186
 supposed-0.182
Token,
Token ID11
Feature activation-2.42e-01

Pos logit contributions
 tresp+1.106
 strangers+1.105
vern+1.043
 supposed+1.042
 liar+1.027
Neg logit contributions
cy-0.698
sc-0.659
 up-0.655
ed-0.603
 onto-0.601
Token even
Token ID772
Feature activation+3.04e+00

Pos logit contributions
 up+0.012
cy+0.011
sc+0.011
 onto+0.011
 Of+0.011
Neg logit contributions
 tresp-0.015
vern-0.015
 supposed-0.014
 strangers-0.014
 liar-0.014
Token if
Token ID611
Feature activation-1.75e+00

Pos logit contributions
 strangers+0.198
 tresp+0.191
 supposed+0.179
ache+0.175
vern+0.174
Neg logit contributions
cy-0.154
sc-0.144
 Of-0.139
 up-0.137
ed-0.134
Token it
Token ID340
Feature activation+2.37e-01

Pos logit contributions
cy+0.066
 up+0.065
 Of+0.062
sc+0.062
ed+0.059
Neg logit contributions
 tresp-0.126
 strangers-0.124
ache-0.121
vern-0.120
 supposed-0.120
Token's
Token ID338
Feature activation-1.30e+00

Pos logit contributions
 strangers+0.017
 tresp+0.017
vern+0.016
 supposed+0.016
 liar+0.016
Neg logit contributions
cy-0.010
sc-0.009
 up-0.009
ed-0.009
 Of-0.009
Token rules
Token ID3173
Feature activation-1.11e+00

Pos logit contributions
 strangers+0.067
 tresp+0.064
 supposed+0.060
ache+0.059
vern+0.059
Neg logit contributions
cy-0.049
sc-0.045
 Of-0.043
ed-0.042
 up-0.040
Token.
Token ID13
Feature activation+4.77e-01

Pos logit contributions
cy+0.044
sc+0.043
 up+0.042
ed+0.040
 onto+0.039
Neg logit contributions
 tresp-0.081
 strangers-0.081
 liar-0.076
 supposed-0.075
cod-0.075
Token She
Token ID1375
Feature activation+6.32e-01

Pos logit contributions
 strangers+0.030
 tresp+0.028
 supposed+0.027
vern+0.026
 liar+0.026
Neg logit contributions
cy-0.024
sc-0.024
 up-0.022
ed-0.022
 onto-0.022
Token knew
Token ID2993
Feature activation+4.84e+00

Pos logit contributions
 strangers+0.034
 tresp+0.033
 supposed+0.031
vern+0.030
 liar+0.030
Neg logit contributions
cy-0.039
 up-0.036
sc-0.036
ed-0.036
 girl-0.035
Token that
Token ID326
Feature activation+1.15e+00

Pos logit contributions
 strangers+0.290
 tresp+0.287
 supposed+0.267
vern+0.263
ache+0.261
Neg logit contributions
cy-0.257
 up-0.245
 Of-0.241
sc-0.240
 onto-0.232
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
 strangers+0.074
 tresp+0.073
vern+0.068
 supposed+0.068
ache+0.067
Neg logit contributions
cy-0.056
sc-0.053
 up-0.052
 Of-0.052
 onto-0.049
Token could
Token ID714
Feature activation+5.08e+00

Pos logit contributions
vern+1.005
 tresp+0.998
 strangers+0.969
 liar+0.960
 supposed+0.942
Neg logit contributions
cy-0.837
 up-0.792
sc-0.763
ed-0.749
 onto-0.726
Token be
Token ID307
Feature activation-7.27e-01

Pos logit contributions
 liar+0.286
 tresp+0.285
cod+0.277
 supposed+0.275
ache+0.275
Neg logit contributions
 up-0.272
cy-0.249
 onto-0.236
 into-0.231
 edge-0.229
Token dangerous
Token ID4923
Feature activation+1.05e+01

Pos logit contributions
 up+0.045
cy+0.042
 down+0.041
 into+0.040
 onto+0.039
Neg logit contributions
 tresp-0.037
 strangers-0.037
vern-0.035
 liar-0.034
ache-0.034
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
 strangers+0.668
 tresp+0.653
 supposed+0.610
vern+0.603
ache+0.598
Neg logit contributions
cy-0.535
sc-0.505
 up-0.488
 Of-0.484
ed-0.472
Token that
Token ID326
Feature activation+1.15e+00

Pos logit contributions
 tresp+0.146
vern+0.142
 strangers+0.137
ache+0.133
cod+0.130
Neg logit contributions
 up-0.167
cy-0.159
 down-0.158
 Of-0.153
 girl-0.153
Token friends
Token ID2460
Feature activation+3.16e-01

Pos logit contributions
 strangers+0.064
 tresp+0.062
 supposed+0.058
ache+0.056
vern+0.056
Neg logit contributions
cy-0.050
sc-0.048
 Of-0.046
ed-0.044
 up-0.044
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
 strangers+0.022
 tresp+0.022
vern+0.020
 supposed+0.020
 liar+0.020
Neg logit contributions
cy-0.014
sc-0.014
 up-0.013
 Of-0.013
 onto-0.012
Token not
Token ID407
Feature activation+5.68e+00

Pos logit contributions
 strangers+0.149
 tresp+0.145
 supposed+0.134
vern+0.134
ache+0.132
Neg logit contributions
cy-0.171
sc-0.163
 up-0.161
 Of-0.160
 onto-0.154
Token talking
Token ID3375
Feature activation+1.23e+00

Pos logit contributions
 strangers+0.290
 tresp+0.285
vern+0.283
 liar+0.256
cod+0.256
Neg logit contributions
cy-0.338
 up-0.330
 down-0.322
sc-0.314
 Of-0.314
Token to
Token ID284
Feature activation+1.26e+00

Pos logit contributions
 tresp+0.085
 strangers+0.084
vern+0.080
 supposed+0.079
 liar+0.078
Neg logit contributions
 up-0.053
cy-0.052
sc-0.051
 down-0.050
ed-0.049
Token strangers
Token ID18981
Feature activation+1.59e+01

Pos logit contributions
vern+0.081
 tresp+0.078
ache+0.075
 liar+0.073
 supposed+0.072
Neg logit contributions
 up-0.062
cy-0.057
 down-0.057
 girl-0.055
 Of-0.055
Token.
Token ID13
Feature activation+4.79e-01

Pos logit contributions
 tresp+1.123
 strangers+1.118
vern+1.065
 supposed+1.057
 liar+1.037
Neg logit contributions
cy-0.682
 up-0.643
sc-0.642
 down-0.590
 onto-0.588
Token Lily
Token ID20037
Feature activation-3.09e+00

Pos logit contributions
 strangers+0.029
 hated+0.026
 shouldn+0.025
 tresp+0.025
 mean+0.024
Neg logit contributions
cy-0.023
 aboard-0.022
cer-0.022
sc-0.022
antly-0.021
Token thought
Token ID1807
Feature activation-3.11e+00

Pos logit contributions
cy+0.178
sc+0.167
 up+0.164
 Of+0.162
ed+0.160
Neg logit contributions
 strangers-0.178
 tresp-0.174
 supposed-0.162
vern-0.159
ache-0.155
Token that
Token ID326
Feature activation+1.14e+00

Pos logit contributions
 up+0.161
 onto+0.131
 into+0.124
 down+0.121
 Of+0.117
Neg logit contributions
cod-0.208
 tresp-0.207
 liar-0.206
ache-0.199
vern-0.196
Token was
Token ID373
Feature activation-2.45e+00

Pos logit contributions
 tresp+0.076
 strangers+0.076
vern+0.073
ache+0.072
 liar+0.070
Neg logit contributions
cy-0.050
 up-0.049
sc-0.049
 Of-0.049
 girl-0.046
Token tomorrow
Token ID9439
Feature activation-6.30e-01

Pos logit contributions
cy+0.013
sc+0.012
 Of+0.012
ed+0.011
 up+0.011
Neg logit contributions
 strangers-0.016
 tresp-0.015
 supposed-0.014
vern-0.014
ache-0.014
Token and
Token ID290
Feature activation+2.23e+00

Pos logit contributions
cy+0.033
sc+0.031
 Of+0.030
 up+0.029
ed+0.029
Neg logit contributions
 strangers-0.040
 tresp-0.038
 supposed-0.036
ache-0.035
vern-0.035
Token play
Token ID711
Feature activation+1.92e-01

Pos logit contributions
 strangers+0.125
 tresp+0.118
 supposed+0.108
ache+0.105
 liar+0.103
Neg logit contributions
cy-0.132
sc-0.129
 Of-0.121
ed-0.120
 up-0.115
Token again
Token ID757
Feature activation-5.57e+00

Pos logit contributions
 strangers+0.015
 tresp+0.014
vern+0.013
 liar+0.013
 supposed+0.013
Neg logit contributions
cy-0.007
 Of-0.006
sc-0.006
ipt-0.006
ed-0.006
Token.
Token ID13
Feature activation-1.41e-01

Pos logit contributions
cy+0.275
sc+0.263
 Of+0.252
 up+0.245
ed+0.243
Neg logit contributions
 strangers-0.366
 tresp-0.352
 supposed-0.331
vern-0.323
ache-0.322
Token Of
Token ID3226
Feature activation-1.39e+01

Pos logit contributions
cy+0.008
sc+0.007
 up+0.007
ed+0.007
 mixture+0.007
Neg logit contributions
 strangers-0.008
 tresp-0.008
 liar-0.007
 supposed-0.007
isdom-0.007
Token course
Token ID1781
Feature activation+8.77e-01

Pos logit contributions
cy+0.767
sc+0.705
 up+0.699
 Of+0.693
 girl+0.670
Neg logit contributions
 strangers-0.811
 tresp-0.793
 supposed-0.773
ache-0.750
vern-0.750
Token,
Token ID11
Feature activation-8.83e-01

Pos logit contributions
 strangers+0.053
 tresp+0.051
 supposed+0.046
vern+0.045
ache+0.045
Neg logit contributions
cy-0.048
sc-0.047
 Of-0.044
 up-0.043
ed-0.043
Token she
Token ID673
Feature activation-1.10e+00

Pos logit contributions
cy+0.049
sc+0.049
ed+0.044
 Of+0.044
ASH+0.043
Neg logit contributions
 strangers-0.052
 tresp-0.049
 supposed-0.044
 cruel-0.043
 allergic-0.043
Token did
Token ID750
Feature activation+2.20e+00

Pos logit contributions
cy+0.061
sc+0.057
 up+0.056
ed+0.055
 onto+0.054
Neg logit contributions
 strangers-0.067
 tresp-0.065
 supposed-0.060
vern-0.060
ache-0.059
Token and
Token ID290
Feature activation+2.16e+00

Pos logit contributions
 strangers+0.152
 tresp+0.149
ache+0.141
 supposed+0.139
 liar+0.138
Neg logit contributions
cy-0.096
 up-0.096
sc-0.091
 Of-0.089
 onto-0.088
Token an
Token ID281
Feature activation+1.23e+00

Pos logit contributions
cy+0.167
sc+0.159
 up+0.155
 Of+0.151
ed+0.147
Neg logit contributions
 strangers-0.264
 tresp-0.256
vern-0.243
 supposed-0.242
ache-0.241
Token ugly
Token ID13400
Feature activation+3.16e+00

Pos logit contributions
 strangers+0.067
 tresp+0.066
vern+0.066
ache+0.063
 liar+0.062
Neg logit contributions
 up-0.068
cy-0.067
 girl-0.064
 Of-0.062
 onto-0.061
Token little
Token ID1310
Feature activation-8.58e-01

Pos logit contributions
 strangers+0.196
 tresp+0.191
 supposed+0.181
vern+0.180
ache+0.175
Neg logit contributions
cy-0.164
sc-0.152
 up-0.151
 Of-0.149
 girl-0.143
Token mouse
Token ID10211
Feature activation-7.59e-02

Pos logit contributions
cy+0.043
sc+0.042
 Of+0.041
ed+0.040
 up+0.039
Neg logit contributions
 strangers-0.055
 tresp-0.053
 liar-0.049
ache-0.048
 supposed-0.047
Token.
Token ID13
Feature activation-2.20e-01

Pos logit contributions
cy+0.003
sc+0.003
 up+0.003
 onto+0.003
ed+0.003
Neg logit contributions
 strangers-0.005
vern-0.005
 tresp-0.005
 supposed-0.005
 liar-0.005
Token Of
Token ID3226
Feature activation-1.39e+01

Pos logit contributions
cy+0.010
sc+0.009
 up+0.009
 Of+0.009
ed+0.009
Neg logit contributions
 strangers-0.015
 tresp-0.015
 supposed-0.014
vern-0.014
ache-0.014
Token course
Token ID1781
Feature activation+9.39e-01

Pos logit contributions
cy+0.816
sc+0.765
 Of+0.751
 up+0.749
ed+0.723
Neg logit contributions
 strangers-0.783
 tresp-0.759
 supposed-0.714
ache-0.701
vern-0.696
Token,
Token ID11
Feature activation-7.62e-01

Pos logit contributions
 strangers+0.060
 tresp+0.058
 supposed+0.054
vern+0.052
ache+0.052
Neg logit contributions
cy-0.049
sc-0.046
 Of-0.044
 up-0.043
ed-0.043
Token Joy
Token ID14087
Feature activation-5.98e+00

Pos logit contributions
cy+0.038
sc+0.036
 Of+0.034
 up+0.033
ed+0.033
Neg logit contributions
 strangers-0.050
 tresp-0.048
 supposed-0.045
vern-0.044
 liar-0.044
Token couldn
Token ID3521
Feature activation+6.20e+00

Pos logit contributions
cy+0.297
ed+0.272
sc+0.271
 up+0.271
 onto+0.258
Neg logit contributions
 strangers-0.398
 tresp-0.384
vern-0.369
 supposed-0.366
 liar-0.351
Token't
Token ID470
Feature activation+2.91e+00

Pos logit contributions
 strangers+0.340
 tresp+0.325
 supposed+0.301
ache+0.296
 liar+0.296
Neg logit contributions
cy-0.373
sc-0.361
 up-0.360
 onto-0.348
 Of-0.347
Token very
Token ID845
Feature activation+9.41e-01

Pos logit contributions
 strangers+0.219
 tresp+0.206
ache+0.198
vern+0.194
 liar+0.190
Neg logit contributions
cy-0.185
 up-0.184
sc-0.173
 Of-0.170
 onto-0.170
Token brave
Token ID14802
Feature activation+3.29e+00

Pos logit contributions
 strangers+0.048
 tresp+0.045
 supposed+0.041
ache+0.040
vern+0.040
Neg logit contributions
cy-0.061
sc-0.058
 up-0.057
 Of-0.057
ed-0.055
Token and
Token ID290
Feature activation+2.25e+00

Pos logit contributions
 strangers+0.206
 tresp+0.199
 supposed+0.186
ache+0.182
vern+0.180
Neg logit contributions
cy-0.174
sc-0.164
 Of-0.160
 up-0.155
ed-0.151
Token smart
Token ID4451
Feature activation+2.85e+00

Pos logit contributions
 strangers+0.097
 tresp+0.094
 supposed+0.086
vern+0.082
ache+0.082
Neg logit contributions
cy-0.162
sc-0.155
 Of-0.149
 up-0.149
ed-0.146
Token.
Token ID13
Feature activation-1.16e-01

Pos logit contributions
 strangers+0.189
 tresp+0.183
 supposed+0.171
ache+0.168
vern+0.167
Neg logit contributions
cy-0.141
sc-0.132
 Of-0.127
 up-0.125
ed-0.121
Token Of
Token ID3226
Feature activation-1.38e+01

Pos logit contributions
cy+0.005
sc+0.005
 up+0.005
 mixture+0.005
 onto+0.004
Neg logit contributions
 strangers-0.008
 tresp-0.008
 supposed-0.007
isdom-0.007
 liar-0.007
Token course
Token ID1781
Feature activation+8.93e-01

Pos logit contributions
cy+0.731
 up+0.666
sc+0.664
 Of+0.662
 girl+0.644
Neg logit contributions
 strangers-0.821
 tresp-0.795
 supposed-0.787
ache-0.781
vern-0.748
Token you
Token ID345
Feature activation-1.20e+00

Pos logit contributions
 strangers+0.058
 tresp+0.056
 supposed+0.052
vern+0.051
ache+0.050
Neg logit contributions
cy-0.045
sc-0.043
 Of-0.041
 up-0.040
ed-0.039
Token can
Token ID460
Feature activation+4.46e+00

Pos logit contributions
cy+0.062
sc+0.055
ed+0.055
 up+0.054
 girl+0.052
Neg logit contributions
 strangers-0.080
 tresp-0.073
 supposed-0.070
vern-0.069
ache-0.069
Token have
Token ID423
Feature activation+3.92e+00

Pos logit contributions
 strangers+0.267
 tresp+0.261
 supposed+0.239
 liar+0.232
ache+0.232
Neg logit contributions
cy-0.250
 up-0.235
sc-0.226
 Of-0.224
 onto-0.220
Token your
Token ID534
Feature activation-9.21e-01

Pos logit contributions
 strangers+0.277
 tresp+0.269
 supposed+0.254
vern+0.250
ache+0.248
Neg logit contributions
cy-0.177
sc-0.166
 Of-0.157
 up-0.154
ed-0.151
Token to
Token ID284
Feature activation+8.74e-01

Pos logit contributions
 strangers+0.306
 tresp+0.305
vern+0.302
 liar+0.279
 supposed+0.277
Neg logit contributions
cy-0.286
 up-0.276
 down-0.264
sc-0.262
 Of-0.261
Token run
Token ID1057
Feature activation+4.55e+00

Pos logit contributions
vern+0.052
 tresp+0.051
 strangers+0.048
 supposed+0.048
 liar+0.048
Neg logit contributions
 up-0.043
cy-0.043
 down-0.041
 into-0.038
 onto-0.038
Token too
Token ID1165
Feature activation+1.66e-02

Pos logit contributions
 strangers+0.362
 tresp+0.361
vern+0.344
 supposed+0.344
ache+0.340
Neg logit contributions
cy-0.154
sc-0.149
 up-0.145
 onto-0.133
 down-0.132
Token fast
Token ID3049
Feature activation+3.43e-01

Pos logit contributions
 tresp+0.001
ache+0.001
isdom+0.001
 strangers+0.001
cod+0.001
Neg logit contributions
 up-0.001
cy-0.001
 onto-0.001
 down-0.001
sc-0.001
Token.
Token ID13
Feature activation+6.71e-02

Pos logit contributions
 strangers+0.024
 tresp+0.024
vern+0.023
 supposed+0.023
ache+0.022
Neg logit contributions
cy-0.014
sc-0.014
 up-0.014
ed-0.013
 onto-0.013
Token Of
Token ID3226
Feature activation-1.37e+01

Pos logit contributions
 strangers+0.004
 tresp+0.004
 supposed+0.004
vern+0.004
ache+0.004
Neg logit contributions
cy-0.003
sc-0.003
 up-0.003
ed-0.003
 Of-0.003
Token course
Token ID1781
Feature activation+1.07e+00

Pos logit contributions
cy+0.626
 Of+0.585
 up+0.584
 girl+0.573
sc+0.562
Neg logit contributions
 supposed-0.860
 strangers-0.852
ache-0.850
 tresp-0.846
vern-0.837
Token,
Token ID11
Feature activation-6.72e-01

Pos logit contributions
 strangers+0.067
 tresp+0.065
 supposed+0.059
vern+0.058
ache+0.058
Neg logit contributions
cy-0.058
sc-0.056
 Of-0.052
 up-0.051
ed-0.051
Token Tommy
Token ID19919
Feature activation+2.53e+00

Pos logit contributions
cy+0.038
sc+0.037
ed+0.033
 Of+0.033
 up+0.033
Neg logit contributions
 strangers-0.040
 tresp-0.038
 supposed-0.036
ache-0.034
 liar-0.034
Token was
Token ID373
Feature activation-2.89e+00

Pos logit contributions
 strangers+0.167
 tresp+0.161
vern+0.153
 liar+0.148
 supposed+0.147
Neg logit contributions
cy-0.129
 up-0.123
ed-0.121
 onto-0.115
sc-0.115
Token still
Token ID991
Feature activation+1.79e+00

Pos logit contributions
 up+0.178
 down+0.163
cy+0.160
 onto+0.160
ed+0.155
Neg logit contributions
 strangers-0.161
 tresp-0.155
ache-0.142
vern-0.139
cod-0.138
Token moment
Token ID2589
Feature activation-8.92e+00

Pos logit contributions
cy+0.039
sc+0.033
 up+0.031
 Of+0.031
ed+0.030
Neg logit contributions
 strangers-0.143
 tresp-0.141
 supposed-0.134
vern-0.132
ache-0.132
Token.
Token ID13
Feature activation+1.02e-01

Pos logit contributions
cy+0.426
sc+0.403
 up+0.392
 Of+0.389
ed+0.376
Neg logit contributions
 strangers-0.599
 tresp-0.586
 supposed-0.548
vern-0.540
 liar-0.538
Token "
Token ID366
Feature activation+3.51e+00

Pos logit contributions
 strangers+0.005
 tresp+0.004
 shouldn+0.004
vern+0.004
 supposed+0.004
Neg logit contributions
cy-0.007
sc-0.006
 mixture-0.006
cer-0.005
 up-0.005
TokenYes
Token ID5297
Feature activation-1.18e+01

Pos logit contributions
 strangers+0.219
 understand+0.204
vern+0.202
Days+0.201
 tresp+0.201
Neg logit contributions
 up-0.163
cy-0.151
 onto-0.149
 roles-0.148
 into-0.145
Token!
Token ID0
Feature activation+1.49e+00

Pos logit contributions
cy+0.565
sc+0.527
 up+0.526
 Of+0.517
 onto+0.496
Neg logit contributions
 strangers-0.782
 tresp-0.765
 supposed-0.736
vern-0.705
 liar-0.699
Token Of
Token ID3226
Feature activation-1.36e+01

Pos logit contributions
 strangers+0.089
 tresp+0.087
isdom+0.082
 liar+0.080
 supposed+0.079
Neg logit contributions
cy-0.071
sc-0.068
 up-0.067
 mixture-0.062
 onto-0.062
Token course
Token ID1781
Feature activation+1.09e+00

Pos logit contributions
cy+0.665
 up+0.640
sc+0.623
 Of+0.615
 onto+0.603
Neg logit contributions
 strangers-0.848
ache-0.816
 supposed-0.795
 tresp-0.794
 liar-0.769
Token,"
Token ID553
Feature activation-3.60e-01

Pos logit contributions
 strangers+0.070
 tresp+0.068
 supposed+0.062
vern+0.061
ache+0.061
Neg logit contributions
cy-0.056
sc-0.053
 Of-0.050
ed-0.049
 up-0.049
Token he
Token ID339
Feature activation-1.13e+00

Pos logit contributions
cy+0.022
sc+0.020
ASH+0.019
umm+0.019
 mixture+0.018
Neg logit contributions
 strangers-0.019
 tresp-0.018
 cruel-0.016
 understand-0.016
ache-0.016
Token said
Token ID531
Feature activation-4.75e+00

Pos logit contributions
cy+0.064
sc+0.056
 Of+0.055
 up+0.053
umm+0.052
Neg logit contributions
 tresp-0.066
 strangers-0.065
ache-0.061
 cruel-0.059
 supposed-0.059
Token.
Token ID13
Feature activation+4.94e-02

Pos logit contributions
cy+0.283
sc+0.257
 Of+0.237
Wow+0.237
iffs+0.228
Neg logit contributions
 strangers-0.263
 tresp-0.257
vern-0.248
ache-0.243
 supposed-0.238
Token folded
Token ID24650
Feature activation-4.41e+00

Pos logit contributions
 strangers+0.110
 tresp+0.109
 supposed+0.098
 mean+0.097
 allergic+0.096
Neg logit contributions
cy-0.159
 Of-0.157
umm-0.155
sc-0.146
iffs-0.138
Token it
Token ID340
Feature activation-1.06e-01

Pos logit contributions
cy+0.328
sc+0.297
 Of+0.293
umm+0.289
 girl+0.281
Neg logit contributions
 strangers-0.160
 tresp-0.137
 allergic-0.135
 supposed-0.135
eping-0.129
Token in
Token ID287
Feature activation-2.46e+00

Pos logit contributions
cy+0.005
 Of+0.005
Wow+0.005
 girl+0.005
ucks+0.005
Neg logit contributions
 supposed-0.006
 strangers-0.006
 tresp-0.006
 never-0.006
 not-0.006
Token two
Token ID734
Feature activation-6.28e+00

Pos logit contributions
cy+0.137
sc+0.130
ed+0.125
 Of+0.125
 up+0.118
Neg logit contributions
 strangers-0.146
 tresp-0.143
 supposed-0.134
vern-0.130
 allergic-0.126
Token.
Token ID13
Feature activation+1.27e-01

Pos logit contributions
cy+0.262
sc+0.244
 Of+0.232
 up+0.230
ed+0.221
Neg logit contributions
 strangers-0.462
 tresp-0.451
 supposed-0.426
vern-0.418
ache-0.417
Token Of
Token ID3226
Feature activation-1.36e+01

Pos logit contributions
 strangers+0.007
 tresp+0.007
 supposed+0.006
isdom+0.006
 liar+0.006
Neg logit contributions
cy-0.007
sc-0.006
cer-0.006
 girl-0.006
 up-0.006
Token course
Token ID1781
Feature activation+1.12e+00

Pos logit contributions
cy+0.744
sc+0.685
 up+0.684
 Of+0.674
ed+0.655
Neg logit contributions
 strangers-0.798
 tresp-0.777
 supposed-0.755
ache-0.738
vern-0.737
Token,
Token ID11
Feature activation-6.23e-01

Pos logit contributions
 strangers+0.068
 tresp+0.066
 supposed+0.061
vern+0.060
ache+0.059
Neg logit contributions
cy-0.061
sc-0.059
 Of-0.056
 up-0.055
ed-0.054
Token it
Token ID340
Feature activation-1.62e-01

Pos logit contributions
cy+0.035
sc+0.033
 Of+0.030
ed+0.030
 up+0.029
Neg logit contributions
 strangers-0.039
 tresp-0.037
 liar-0.033
 supposed-0.033
 cruel-0.032
Token didn
Token ID1422
Feature activation+6.60e+00

Pos logit contributions
cy+0.009
 Of+0.008
sc+0.008
 onto+0.007
ed+0.007
Neg logit contributions
 strangers-0.010
 tresp-0.009
 supposed-0.009
ache-0.009
vern-0.009
Token't
Token ID470
Feature activation+2.93e+00

Pos logit contributions
 strangers+0.336
 tresp+0.321
 supposed+0.293
ache+0.292
 liar+0.290
Neg logit contributions
cy-0.416
sc-0.410
 up-0.404
 Of-0.396
 onto-0.391
Token to
Token ID284
Feature activation+1.01e+00

Pos logit contributions
 strangers+0.167
 tresp+0.164
 supposed+0.153
vern+0.151
ache+0.151
Neg logit contributions
cy-0.116
sc-0.109
 up-0.106
 Of-0.106
ed-0.101
Token go
Token ID467
Feature activation+1.73e+00

Pos logit contributions
 strangers+0.064
 tresp+0.063
ache+0.057
 supposed+0.057
 liar+0.056
Neg logit contributions
cy-0.053
 up-0.050
 Of-0.049
sc-0.049
 onto-0.048
Token golf
Token ID13126
Feature activation+3.52e+00

Pos logit contributions
 strangers+0.117
 tresp+0.113
 supposed+0.108
vern+0.104
ache+0.103
Neg logit contributions
cy-0.081
sc-0.077
 Of-0.074
ed-0.071
 up-0.069
Tokening
Token ID278
Feature activation-4.54e+00

Pos logit contributions
 strangers+0.239
 tresp+0.230
vern+0.217
 liar+0.211
ache+0.209
Neg logit contributions
cy-0.157
 Of-0.146
sc-0.137
 Sure-0.135
antly-0.135
Token.
Token ID13
Feature activation+2.03e-01

Pos logit contributions
cy+0.229
sc+0.211
 Of+0.210
ed+0.198
 Sure+0.196
Neg logit contributions
 strangers-0.297
 tresp-0.279
 supposed-0.254
vern-0.249
 liar-0.248
Token Of
Token ID3226
Feature activation-1.35e+01

Pos logit contributions
 strangers+0.010
 tresp+0.009
 liar+0.008
 supposed+0.008
 shouldn+0.008
Neg logit contributions
cy-0.012
sc-0.012
antly-0.011
 up-0.011
fl-0.011
Token course
Token ID1781
Feature activation+1.21e+00

Pos logit contributions
cy+0.818
sc+0.774
 Of+0.752
 up+0.749
ed+0.729
Neg logit contributions
 strangers-0.741
 tresp-0.718
 supposed-0.667
vern-0.651
ache-0.649
Token she
Token ID673
Feature activation-7.53e-01

Pos logit contributions
 strangers+0.071
 tresp+0.067
 supposed+0.061
vern+0.059
 cruel+0.058
Neg logit contributions
cy-0.068
sc-0.067
 Of-0.061
ed-0.060
 up-0.060
Token said
Token ID531
Feature activation-4.63e+00

Pos logit contributions
cy+0.037
sc+0.035
 Of+0.034
 up+0.033
ed+0.032
Neg logit contributions
 strangers-0.050
 tresp-0.048
 supposed-0.045
vern-0.044
ache-0.044
Token yes
Token ID3763
Feature activation-1.37e-01

Pos logit contributions
cy+0.258
sc+0.250
 Of+0.248
 up+0.246
ed+0.239
Neg logit contributions
 strangers-0.267
 tresp-0.259
 supposed-0.244
ache-0.234
vern-0.233
Token,
Token ID11
Feature activation-5.56e-01

Pos logit contributions
cy+0.007
sc+0.006
 up+0.006
 Of+0.006
ed+0.006
Neg logit contributions
 strangers-0.009
 tresp-0.009
 supposed-0.008
 liar-0.008
vern-0.008
Token ride
Token ID6594
Feature activation+2.05e+00

Pos logit contributions
 tresp+0.057
 strangers+0.056
 supposed+0.056
ache+0.052
vern+0.052
Neg logit contributions
cy-0.040
 up-0.038
 Of-0.037
sc-0.036
 onto-0.035
Token in
Token ID287
Feature activation-2.44e+00

Pos logit contributions
 strangers+0.138
 tresp+0.129
 supposed+0.118
vern+0.117
 liar+0.114
Neg logit contributions
cy-0.093
sc-0.090
 Of-0.086
ipt-0.082
 mixture-0.082
Token the
Token ID262
Feature activation+1.12e+00

Pos logit contributions
cy+0.113
sc+0.106
 up+0.102
 Of+0.102
 onto+0.098
Neg logit contributions
 strangers-0.167
 tresp-0.163
 supposed-0.154
ache-0.153
vern-0.151
Token wagon
Token ID31860
Feature activation+3.61e+00

Pos logit contributions
 strangers+0.071
 tresp+0.068
 supposed+0.063
vern+0.062
ache+0.062
Neg logit contributions
cy-0.059
sc-0.055
 Of-0.054
 up-0.052
ed-0.052
Token.
Token ID13
Feature activation+2.32e-01

Pos logit contributions
 strangers+0.229
 tresp+0.217
ache+0.202
 supposed+0.202
vern+0.200
Neg logit contributions
cy-0.178
sc-0.168
 Of-0.166
ed-0.160
 up-0.154
Token Of
Token ID3226
Feature activation-1.35e+01

Pos logit contributions
 strangers+0.014
 tresp+0.013
 supposed+0.012
vern+0.012
isdom+0.012
Neg logit contributions
cy-0.012
sc-0.012
 up-0.011
ed-0.011
 mixture-0.010
Token course
Token ID1781
Feature activation+1.28e+00

Pos logit contributions
cy+0.769
sc+0.710
 up+0.710
 Of+0.710
ed+0.684
Neg logit contributions
 strangers-0.768
 tresp-0.752
 supposed-0.717
ache-0.701
vern-0.688
Token,
Token ID11
Feature activation-4.36e-01

Pos logit contributions
 strangers+0.076
 tresp+0.072
 supposed+0.065
 cruel+0.061
vern+0.061
Neg logit contributions
cy-0.071
sc-0.070
ipt-0.063
 Of-0.062
ed-0.062
Token Molly
Token ID30236
Feature activation-7.29e+00

Pos logit contributions
cy+0.023
sc+0.022
ASH+0.020
 mixture+0.019
ipt+0.019
Neg logit contributions
 strangers-0.027
 tresp-0.026
 supposed-0.023
 understand-0.023
 cruel-0.023
Token and
Token ID290
Feature activation+2.65e+00

Pos logit contributions
cy+0.339
sc+0.310
 up+0.308
 Of+0.299
ed+0.298
Neg logit contributions
 strangers-0.505
 tresp-0.490
 supposed-0.466
vern-0.456
 liar-0.449
Token Sam
Token ID3409
Feature activation+3.41e+00

Pos logit contributions
 strangers+0.162
 tresp+0.154
 supposed+0.140
ache+0.137
 liar+0.136
Neg logit contributions
cy-0.146
sc-0.145
 Of-0.131
 up-0.129
ed-0.127
Token his
Token ID465
Feature activation-2.89e-01

Pos logit contributions
cy+0.115
sc+0.104
 Of+0.100
ipt+0.099
 girl+0.095
Neg logit contributions
 strangers-0.110
 tresp-0.108
 supposed-0.100
 mean-0.091
vern-0.091
Token fish
Token ID5916
Feature activation-8.26e-01

Pos logit contributions
cy+0.015
sc+0.014
 Of+0.013
 up+0.013
ed+0.013
Neg logit contributions
 strangers-0.019
 tresp-0.018
 supposed-0.017
vern-0.017
ache-0.017
Token with
Token ID351
Feature activation-9.05e+00

Pos logit contributions
cy+0.035
sc+0.033
 Of+0.032
 up+0.030
ed+0.030
Neg logit contributions
 strangers-0.060
 tresp-0.058
 supposed-0.055
vern-0.054
ache-0.054
Token him
Token ID683
Feature activation+4.24e+00

Pos logit contributions
cy+0.562
sc+0.538
 up+0.534
 Of+0.521
 onto+0.516
Neg logit contributions
 strangers-0.474
 tresp-0.463
 supposed-0.428
ache-0.426
vern-0.422
Token!
Token ID0
Feature activation+1.66e+00

Pos logit contributions
 strangers+0.293
 tresp+0.286
 supposed+0.270
vern+0.269
 liar+0.265
Neg logit contributions
cy-0.194
sc-0.182
 up-0.181
 onto-0.173
 Of-0.173
Token Of
Token ID3226
Feature activation-1.34e+01

Pos logit contributions
 strangers+0.099
 tresp+0.095
 supposed+0.092
vern+0.086
 liar+0.086
Neg logit contributions
cy-0.088
sc-0.083
 up-0.078
ed-0.076
 girl-0.075
Token course
Token ID1781
Feature activation+1.29e+00

Pos logit contributions
cy+0.711
 up+0.660
sc+0.658
 Of+0.655
 onto+0.631
Neg logit contributions
 strangers-0.803
 tresp-0.781
 supposed-0.764
ache-0.754
vern-0.749
Token,
Token ID11
Feature activation-4.64e-01

Pos logit contributions
 strangers+0.081
 tresp+0.078
 supposed+0.073
vern+0.070
ache+0.070
Neg logit contributions
cy-0.068
sc-0.065
 Of-0.061
 up-0.060
ed-0.059
Token it
Token ID340
Feature activation-9.84e-03

Pos logit contributions
cy+0.025
sc+0.024
 Of+0.023
 up+0.022
ed+0.022
Neg logit contributions
 strangers-0.028
 tresp-0.027
 supposed-0.025
vern-0.025
 liar-0.024
Token wasn
Token ID2492
Feature activation+4.43e+00

Pos logit contributions
cy+0.001
sc+0.000
 Of+0.000
 up+0.000
ed+0.000
Neg logit contributions
 strangers-0.001
 tresp-0.001
 supposed-0.001
ache-0.001
vern-0.001
Token't
Token ID470
Feature activation+3.08e+00

Pos logit contributions
 strangers+0.278
 tresp+0.262
 liar+0.256
cod+0.253
ache+0.250
Neg logit contributions
 up-0.232
 onto-0.228
sc-0.228
cy-0.214
 Of-0.208
Token said
Token ID531
Feature activation-4.50e+00

Pos logit contributions
 strangers+0.152
 tresp+0.143
ache+0.136
 cruel+0.134
 mean+0.130
Neg logit contributions
sc-0.145
cy-0.145
 Of-0.128
iffs-0.125
 up-0.124
Token,
Token ID11
Feature activation-4.45e-01

Pos logit contributions
cy+0.233
sc+0.220
 Of+0.213
 up+0.212
ed+0.206
Neg logit contributions
 strangers-0.284
 tresp-0.276
 supposed-0.259
vern-0.253
ache-0.252
Token "
Token ID366
Feature activation+3.69e+00

Pos logit contributions
cy+0.022
sc+0.021
 Of+0.021
 up+0.021
ed+0.020
Neg logit contributions
 strangers-0.029
 tresp-0.028
 supposed-0.026
vern-0.026
ache-0.025
TokenYes
Token ID5297
Feature activation-1.16e+01

Pos logit contributions
 strangers+0.241
 understand+0.228
ache+0.227
Days+0.225
vern+0.224
Neg logit contributions
 up-0.153
cy-0.140
 onto-0.135
ed-0.130
 into-0.130
Token!
Token ID0
Feature activation+1.66e+00

Pos logit contributions
cy+0.579
sc+0.544
 up+0.529
 Of+0.527
ed+0.504
Neg logit contributions
 strangers-0.752
 tresp-0.733
 supposed-0.694
vern-0.675
ache-0.672
Token Of
Token ID3226
Feature activation-1.34e+01

Pos logit contributions
 strangers+0.102
 tresp+0.100
isdom+0.095
 liar+0.092
 supposed+0.091
Neg logit contributions
cy-0.074
 up-0.070
sc-0.069
ed-0.067
 onto-0.064
Token course
Token ID1781
Feature activation+1.27e+00

Pos logit contributions
cy+0.674
 up+0.627
sc+0.617
 Of+0.611
 onto+0.587
Neg logit contributions
 strangers-0.832
ache-0.804
 tresp-0.794
 supposed-0.780
 liar-0.761
Token.
Token ID13
Feature activation+2.44e-01

Pos logit contributions
 strangers+0.085
 tresp+0.083
 supposed+0.077
vern+0.075
ache+0.075
Neg logit contributions
cy-0.062
sc-0.058
 Of-0.055
 up-0.054
ed-0.053
Token I
Token ID314
Feature activation-1.52e+00

Pos logit contributions
 tresp+0.015
 strangers+0.015
isdom+0.014
 supposed+0.014
 liar+0.013
Neg logit contributions
cy-0.011
 up-0.010
ed-0.010
sc-0.010
 mixture-0.010
Token have
Token ID423
Feature activation+4.31e+00

Pos logit contributions
cy+0.086
sc+0.078
 up+0.073
ed+0.071
 Of+0.070
Neg logit contributions
 strangers-0.101
 tresp-0.092
 supposed-0.085
vern-0.084
 allergic-0.084
Token a
Token ID257
Feature activation-1.48e+00

Pos logit contributions
 strangers+0.310
 tresp+0.302
 supposed+0.285
ache+0.284
vern+0.282
Neg logit contributions
cy-0.180
sc-0.171
 up-0.166
 Of-0.163
ed-0.159
Token could
Token ID714
Feature activation+4.97e+00

Pos logit contributions
cy+0.030
sc+0.029
 up+0.029
ed+0.028
 onto+0.027
Neg logit contributions
 strangers-0.036
 tresp-0.034
vern-0.033
 supposed-0.032
 liar-0.032
Token hardly
Token ID8941
Feature activation+4.11e+00

Pos logit contributions
 tresp+0.283
 strangers+0.283
cod+0.268
 supposed+0.252
ache+0.251
Neg logit contributions
 up-0.286
 onto-0.282
cy-0.271
 Of-0.271
 down-0.265
Token stand
Token ID1302
Feature activation-2.15e+00

Pos logit contributions
 strangers+0.230
 tresp+0.224
 supposed+0.207
vern+0.203
ache+0.201
Neg logit contributions
cy-0.243
sc-0.230
 Of-0.224
 up-0.222
ed-0.216
Token it
Token ID340
Feature activation+1.07e-01

Pos logit contributions
cy+0.046
 up+0.044
sc+0.037
 onto+0.036
 down+0.035
Neg logit contributions
 strangers-0.199
 tresp-0.197
vern-0.187
 supposed-0.186
 liar-0.186
Token!
Token ID0
Feature activation+1.74e+00

Pos logit contributions
 strangers+0.007
 tresp+0.007
 supposed+0.007
ache+0.006
vern+0.006
Neg logit contributions
cy-0.005
 Of-0.005
sc-0.005
ed-0.004
 girl-0.004
Token Of
Token ID3226
Feature activation-1.34e+01

Pos logit contributions
 strangers+0.098
 tresp+0.087
 supposed+0.087
 allergic+0.082
isdom+0.082
Neg logit contributions
cy-0.102
sc-0.099
cer-0.087
antly-0.086
ed-0.086
Token course
Token ID1781
Feature activation+1.36e+00

Pos logit contributions
cy+0.752
sc+0.697
 up+0.696
 Of+0.687
ed+0.663
Neg logit contributions
 strangers-0.768
 tresp-0.753
 supposed-0.721
ache-0.701
vern-0.700
Token she
Token ID673
Feature activation-5.99e-01

Pos logit contributions
 strangers+0.082
 tresp+0.078
 supposed+0.073
vern+0.070
ache+0.069
Neg logit contributions
cy-0.076
sc-0.074
 Of-0.069
 up-0.067
ed-0.067
Token wanted
Token ID2227
Feature activation+2.58e+00

Pos logit contributions
cy+0.033
 up+0.031
sc+0.030
ed+0.030
 onto+0.029
Neg logit contributions
 strangers-0.038
 tresp-0.035
vern-0.034
 supposed-0.033
 liar-0.032
Token to
Token ID284
Feature activation+1.11e+00

Pos logit contributions
 strangers+0.174
 tresp+0.170
 supposed+0.159
vern+0.158
ache+0.157
Neg logit contributions
cy-0.123
sc-0.115
 up-0.113
 Of-0.112
 onto-0.107
Token go
Token ID467
Feature activation+1.83e+00

Pos logit contributions
 strangers+0.072
 tresp+0.071
ache+0.067
 supposed+0.067
 liar+0.066
Neg logit contributions
cy-0.053
 up-0.051
 Of-0.049
 onto-0.049
 down-0.047
Token.
Token ID13
Feature activation+1.65e-01

Pos logit contributions
cy+0.059
sc+0.055
 up+0.054
 Of+0.052
 onto+0.051
Neg logit contributions
 strangers-0.093
 tresp-0.091
 supposed-0.086
vern-0.084
ache-0.083
Token\n
Token ID198
Feature activation-1.02e+00

Pos logit contributions
 hated+0.006
 about+0.005
 supposed+0.005
 strangers+0.005
 shouldn+0.005
Neg logit contributions
sc-0.011
fl-0.010
cer-0.010
cy-0.010
als-0.010
Token\n
Token ID198
Feature activation-1.03e+00

Pos logit contributions
cy+0.058
sc+0.053
ed+0.052
 down+0.051
 up+0.051
Neg logit contributions
 strangers-0.054
vern-0.053
 tresp-0.051
 supposed-0.050
 understand-0.048
TokenSo
Token ID2396
Feature activation-3.66e+00

Pos logit contributions
cy+0.063
ed+0.059
 Of+0.058
 up+0.058
sc+0.058
Neg logit contributions
 strangers-0.049
 tresp-0.047
vern-0.047
Days-0.046
 supposed-0.045
Token,
Token ID11
Feature activation-5.87e-01

Pos logit contributions
sc+0.283
ucks+0.281
ASH+0.273
rolley+0.252
fl+0.250
Neg logit contributions
 never-0.118
 not-0.113
 always-0.105
 often-0.102
 strangers-0.100
Token Moe
Token ID46488
Feature activation-1.34e+01

Pos logit contributions
ucks+0.039
sc+0.038
ASH+0.038
antly+0.035
liness+0.035
Neg logit contributions
 not-0.023
 never-0.023
 strangers-0.022
 always-0.021
 often-0.020
Token thought
Token ID1807
Feature activation-3.46e+00

Pos logit contributions
cy+0.697
sc+0.655
 up+0.632
ed+0.622
 Of+0.609
Neg logit contributions
 strangers-0.847
 tresp-0.828
vern-0.777
 supposed-0.775
 liar-0.749
Token and
Token ID290
Feature activation+2.46e+00

Pos logit contributions
 up+0.174
 onto+0.161
 into+0.154
 down+0.148
 Of+0.148
Neg logit contributions
 tresp-0.223
 strangers-0.222
 liar-0.218
cod-0.217
ache-0.216
Token thought
Token ID1807
Feature activation-3.48e+00

Pos logit contributions
 strangers+0.138
 tresp+0.132
 supposed+0.123
ache+0.119
 liar+0.116
Neg logit contributions
sc-0.147
cy-0.141
 Of-0.130
 up-0.126
 onto-0.125
Token.
Token ID13
Feature activation+9.80e-02

Pos logit contributions
 up+0.173
 onto+0.163
 Of+0.161
sc+0.160
 into+0.157
Neg logit contributions
 strangers-0.226
 tresp-0.224
 liar-0.210
ache-0.209
 supposed-0.207
Token Suddenly
Token ID24975
Feature activation-8.13e+00

Pos logit contributions
 hated+0.003
 strangers+0.003
 supposed+0.003
 tresp+0.003
 shouldn+0.003
Neg logit contributions
cy-0.006
sc-0.006
cer-0.006
fl-0.006
 therapist-0.005
Token said
Token ID531
Feature activation-4.47e+00

Pos logit contributions
 strangers+0.110
 tresp+0.107
 supposed+0.100
vern+0.099
ache+0.098
Neg logit contributions
cy-0.079
sc-0.074
 Of-0.072
 up-0.071
ed-0.069
Token,
Token ID11
Feature activation-3.83e-01

Pos logit contributions
cy+0.231
sc+0.220
 Of+0.213
 up+0.213
ed+0.207
Neg logit contributions
 strangers-0.282
 tresp-0.274
 supposed-0.258
vern-0.251
ache-0.249
Token "
Token ID366
Feature activation+3.76e+00

Pos logit contributions
cy+0.021
sc+0.020
 Of+0.019
 up+0.019
ed+0.018
Neg logit contributions
 strangers-0.023
 tresp-0.023
 supposed-0.021
vern-0.021
ache-0.021
TokenYes
Token ID5297
Feature activation-1.15e+01

Pos logit contributions
 strangers+0.224
ache+0.207
 understand+0.205
cod+0.201
 tresp+0.199
Neg logit contributions
 up-0.188
 roles-0.171
 onto-0.167
cy-0.166
 into-0.165
Token!
Token ID0
Feature activation+1.75e+00

Pos logit contributions
cy+0.528
 up+0.491
sc+0.486
 Of+0.481
 onto+0.463
Neg logit contributions
 strangers-0.784
 tresp-0.767
 supposed-0.741
vern-0.712
ache-0.705
Token Of
Token ID3226
Feature activation-1.33e+01

Pos logit contributions
 strangers+0.111
 tresp+0.109
isdom+0.102
 liar+0.101
 supposed+0.100
Neg logit contributions
cy-0.079
sc-0.076
 up-0.076
ed-0.069
 onto-0.069
Token course
Token ID1781
Feature activation+1.41e+00

Pos logit contributions
cy+0.629
 up+0.594
 Of+0.577
sc+0.573
 onto+0.565
Neg logit contributions
 strangers-0.844
ache-0.828
 supposed-0.799
 tresp-0.793
 liar-0.774
Token you
Token ID345
Feature activation-6.65e-01

Pos logit contributions
 strangers+0.089
 tresp+0.086
 supposed+0.080
vern+0.078
ache+0.078
Neg logit contributions
cy-0.073
sc-0.069
 Of-0.066
 up-0.065
ed-0.064
Token can
Token ID460
Feature activation+5.02e+00

Pos logit contributions
cy+0.028
 up+0.028
sc+0.027
ed+0.025
 down+0.025
Neg logit contributions
 strangers-0.049
 tresp-0.044
ache-0.043
cod-0.043
vern-0.043
Token borrow
Token ID8804
Feature activation+3.81e+00

Pos logit contributions
 strangers+0.306
 tresp+0.294
 supposed+0.272
ache+0.266
 liar+0.266
Neg logit contributions
cy-0.282
sc-0.264
 up-0.264
 Of-0.255
 onto-0.252
Token some
Token ID617
Feature activation-3.56e+00

Pos logit contributions
 strangers+0.271
 tresp+0.265
 supposed+0.250
vern+0.245
ache+0.245
Neg logit contributions
cy-0.166
sc-0.155
 Of-0.148
 up-0.148
ed-0.142
Token there
Token ID612
Feature activation-8.66e-01

Pos logit contributions
sc+0.037
cy+0.036
 Of+0.036
ed+0.033
 onto+0.033
Neg logit contributions
 strangers-0.043
 tresp-0.042
 supposed-0.040
 cruel-0.038
ache-0.038
Token was
Token ID373
Feature activation-2.84e+00

Pos logit contributions
cy+0.034
sc+0.032
 Of+0.030
 up+0.029
ed+0.028
Neg logit contributions
 strangers-0.065
 tresp-0.064
 supposed-0.060
vern-0.059
ache-0.058
Token a
Token ID257
Feature activation-1.59e+00

Pos logit contributions
cy+0.139
sc+0.130
 Of+0.125
 up+0.124
ed+0.120
Neg logit contributions
 strangers-0.189
 tresp-0.184
 supposed-0.173
vern-0.169
ache-0.168
Token mouse
Token ID10211
Feature activation+3.82e-01

Pos logit contributions
cy+0.074
 girl+0.068
 up+0.063
 Of+0.062
sc+0.061
Neg logit contributions
 strangers-0.111
 tresp-0.109
 supposed-0.106
vern-0.104
ache-0.099
Token named
Token ID3706
Feature activation+5.90e+00

Pos logit contributions
 strangers+0.016
 tresp+0.016
vern+0.016
 supposed+0.015
ache+0.014
Neg logit contributions
cy-0.027
 up-0.025
sc-0.025
 girl-0.025
 Of-0.024
Token Moe
Token ID46488
Feature activation-1.33e+01

Pos logit contributions
 strangers+0.405
 tresp+0.399
 supposed+0.375
vern+0.374
ache+0.370
Neg logit contributions
cy-0.275
sc-0.253
 up-0.251
 Of-0.248
 onto-0.234
Token.
Token ID13
Feature activation+2.32e-01

Pos logit contributions
cy+0.531
ed+0.435
 up+0.424
 girl+0.390
sc+0.378
Neg logit contributions
 tresp-1.042
 strangers-1.039
vern-1.002
ache-0.973
 cruel-0.970
Token Moe
Token ID46488
Feature activation-1.33e+01

Pos logit contributions
 tresp+0.016
 strangers+0.016
 liar+0.014
 supposed+0.014
 mean+0.014
Neg logit contributions
sc-0.009
antly-0.009
cy-0.008
 mixture-0.008
 girl-0.008
Token liked
Token ID8288
Feature activation+2.91e+00

Pos logit contributions
cy+0.881
ed+0.795
 up+0.780
sc+0.757
 girl+0.746
Neg logit contributions
 strangers-0.700
 tresp-0.677
vern-0.655
ache-0.617
 supposed-0.609
Token to
Token ID284
Feature activation+1.07e+00

Pos logit contributions
vern+0.238
 tresp+0.231
ache+0.224
 strangers+0.223
 liar+0.220
Neg logit contributions
 up-0.091
 Of-0.086
 down-0.084
 into-0.081
 onto-0.080
Token play
Token ID711
Feature activation+6.15e-01

Pos logit contributions
 tresp+0.081
 supposed+0.081
 strangers+0.077
vern+0.077
 cruel+0.076
Neg logit contributions
cy-0.038
 up-0.036
 onto-0.035
 Of-0.034
 tin-0.033
Token are
Token ID389
Feature activation+4.09e+00

Pos logit contributions
cy+0.249
sc+0.227
 up+0.226
ed+0.214
 girl+0.213
Neg logit contributions
 strangers-0.272
 tresp-0.255
vern-0.246
 supposed-0.244
ache-0.237
Token my
Token ID616
Feature activation-3.04e+00

Pos logit contributions
 strangers+0.289
 tresp+0.278
ache+0.275
vern+0.267
 liar+0.261
Neg logit contributions
 up-0.171
cy-0.167
 Of-0.151
 onto-0.150
 into-0.149
Token best
Token ID1266
Feature activation-6.78e+00

Pos logit contributions
cy+0.158
sc+0.151
 Of+0.144
 up+0.142
ed+0.141
Neg logit contributions
 strangers-0.192
 tresp-0.187
 supposed-0.173
vern-0.171
 liar-0.170
Token friend
Token ID1545
Feature activation-2.67e+00

Pos logit contributions
cy+0.374
sc+0.357
 Of+0.342
 up+0.337
ed+0.330
Neg logit contributions
 strangers-0.409
 tresp-0.399
 supposed-0.371
 liar-0.362
ache-0.362
Token.
Token ID13
Feature activation+4.38e-01

Pos logit contributions
cy+0.119
sc+0.110
 up+0.106
 Of+0.104
ed+0.103
Neg logit contributions
 strangers-0.188
 tresp-0.183
 supposed-0.173
vern-0.171
ache-0.170
Token Of
Token ID3226
Feature activation-1.33e+01

Pos logit contributions
 strangers+0.028
 tresp+0.027
 supposed+0.025
 liar+0.025
isdom+0.024
Neg logit contributions
cy-0.022
 up-0.021
sc-0.020
 mixture-0.020
ed-0.020
Token course
Token ID1781
Feature activation+1.44e+00

Pos logit contributions
cy+0.718
 up+0.659
sc+0.647
 Of+0.645
 girl+0.632
Neg logit contributions
 strangers-0.785
 tresp-0.760
 supposed-0.741
ache-0.735
vern-0.716
Token I
Token ID314
Feature activation-1.33e+00

Pos logit contributions
 strangers+0.089
 tresp+0.086
 supposed+0.078
vern+0.077
ache+0.076
Neg logit contributions
cy-0.077
sc-0.075
 Of-0.070
 up-0.069
ed-0.068
Token include
Token ID2291
Feature activation-7.88e-01

Pos logit contributions
cy+0.064
sc+0.056
 up+0.052
ed+0.051
 girl+0.050
Neg logit contributions
 strangers-0.099
 tresp-0.090
 allergic-0.085
ache-0.084
vern-0.084
Token you
Token ID345
Feature activation-6.36e-01

Pos logit contributions
cy+0.041
sc+0.038
 Of+0.037
 up+0.035
ed+0.035
Neg logit contributions
 strangers-0.051
 tresp-0.049
 supposed-0.045
vern-0.044
 liar-0.044
Token in
Token ID287
Feature activation-2.17e+00

Pos logit contributions
cy+0.029
sc+0.027
 up+0.027
 Of+0.025
ed+0.025
Neg logit contributions
 strangers-0.045
 tresp-0.044
 supposed-0.041
ache-0.040
vern-0.040
Token a
Token ID257
Feature activation-1.23e+00

Pos logit contributions
 strangers+0.230
ache+0.216
 tresp+0.216
vern+0.211
isdom+0.201
Neg logit contributions
 up-0.235
cy-0.217
 into-0.210
 down-0.209
 onto-0.206
Token good
Token ID922
Feature activation-2.82e+00

Pos logit contributions
cy+0.071
 girl+0.067
sc+0.066
 up+0.065
 Of+0.063
Neg logit contributions
 strangers-0.069
 tresp-0.065
 supposed-0.062
vern-0.060
isdom-0.059
Token brother
Token ID3956
Feature activation+2.84e+00

Pos logit contributions
cy+0.162
sc+0.160
 Of+0.152
 onto+0.149
 up+0.148
Neg logit contributions
 strangers-0.161
 tresp-0.155
 liar-0.146
 supposed-0.143
ache-0.142
Token too
Token ID1165
Feature activation+3.91e-01

Pos logit contributions
 strangers+0.196
 tresp+0.192
 supposed+0.182
vern+0.179
ache+0.175
Neg logit contributions
cy-0.128
sc-0.121
 up-0.116
 Of-0.113
ed-0.112
Token.
Token ID13
Feature activation+4.80e-01

Pos logit contributions
 strangers+0.030
 tresp+0.029
 supposed+0.028
vern+0.028
ache+0.027
Neg logit contributions
 up-0.013
cy-0.013
sc-0.013
ed-0.012
 down-0.012
Token Of
Token ID3226
Feature activation-1.33e+01

Pos logit contributions
 strangers+0.032
 tresp+0.031
 liar+0.029
 supposed+0.028
isdom+0.028
Neg logit contributions
cy-0.023
sc-0.022
 up-0.021
ed-0.020
 mixture-0.020
Token course
Token ID1781
Feature activation+1.49e+00

Pos logit contributions
cy+0.713
 up+0.659
 Of+0.647
sc+0.646
 girl+0.635
Neg logit contributions
 strangers-0.780
 tresp-0.763
 supposed-0.746
ache-0.734
vern-0.715
Token we
Token ID356
Feature activation-1.72e+00

Pos logit contributions
 strangers+0.093
 tresp+0.089
 supposed+0.081
vern+0.079
 cruel+0.079
Neg logit contributions
cy-0.080
sc-0.077
 Of-0.071
ed-0.070
 up-0.070
Token can
Token ID460
Feature activation+5.08e+00

Pos logit contributions
cy+0.080
 up+0.072
 down+0.066
 girl+0.066
sc+0.065
Neg logit contributions
 strangers-0.122
 tresp-0.114
vern-0.112
ache-0.110
 allergic-0.110
Token still
Token ID991
Feature activation+2.22e+00

Pos logit contributions
 strangers+0.313
 tresp+0.309
 supposed+0.288
ache+0.279
 liar+0.279
Neg logit contributions
cy-0.281
 up-0.268
sc-0.246
 Of-0.246
 onto-0.239
Token be
Token ID307
Feature activation-7.44e-01

Pos logit contributions
 strangers+0.146
 tresp+0.144
 supposed+0.137
ache+0.132
vern+0.131
Neg logit contributions
cy-0.113
 up-0.103
sc-0.097
 Of-0.097
 onto-0.095
Token friends
Token ID2460
Feature activation+2.99e-01

Pos logit contributions
cy+0.005
 up+0.005
 girl+0.005
 Of+0.005
 onto+0.005
Neg logit contributions
 tresp-0.006
 strangers-0.006
 supposed-0.006
cod-0.006
isdom-0.006
Token to
Token ID284
Feature activation+1.25e+00

Pos logit contributions
 strangers+0.021
 tresp+0.021
 supposed+0.019
vern+0.019
 liar+0.019
Neg logit contributions
cy-0.013
sc-0.012
 up-0.012
 Of-0.012
 onto-0.012
Token go
Token ID467
Feature activation+1.98e+00

Pos logit contributions
 supposed+0.078
 tresp+0.078
 strangers+0.078
 liar+0.078
ache+0.077
Neg logit contributions
 up-0.060
cy-0.058
 onto-0.056
 down-0.053
 into-0.051
Token home
Token ID1363
Feature activation-3.23e+00

Pos logit contributions
 strangers+0.147
 tresp+0.143
 supposed+0.135
vern+0.133
ache+0.132
Neg logit contributions
cy-0.082
sc-0.076
 Of-0.073
 up-0.070
ed-0.069
Token.
Token ID13
Feature activation+4.72e-01

Pos logit contributions
cy+0.136
 up+0.128
sc+0.125
 onto+0.119
 Of+0.119
Neg logit contributions
 strangers-0.234
 tresp-0.228
 supposed-0.213
 liar-0.210
vern-0.209
Token Of
Token ID3226
Feature activation-1.33e+01

Pos logit contributions
 strangers+0.028
 tresp+0.028
 supposed+0.026
vern+0.025
 liar+0.025
Neg logit contributions
cy-0.025
sc-0.024
 up-0.023
ed-0.022
 mixture-0.021
Token course
Token ID1781
Feature activation+1.49e+00

Pos logit contributions
cy+0.717
 up+0.657
sc+0.656
 Of+0.649
 girl+0.632
Neg logit contributions
 strangers-0.783
 tresp-0.766
 supposed-0.750
ache-0.731
vern-0.724
Token,
Token ID11
Feature activation-2.64e-01

Pos logit contributions
 strangers+0.091
 tresp+0.089
 supposed+0.080
vern+0.079
ache+0.078
Neg logit contributions
cy-0.081
sc-0.078
 Of-0.073
 up-0.071
ed-0.071
Token they
Token ID484
Feature activation-6.33e-01

Pos logit contributions
cy+0.015
sc+0.014
 Of+0.013
ed+0.013
 up+0.012
Neg logit contributions
 strangers-0.016
 tresp-0.015
 supposed-0.014
 liar-0.014
 cruel-0.013
Token had
Token ID550
Feature activation+5.57e+00

Pos logit contributions
cy+0.034
 up+0.031
 down+0.030
ed+0.030
 onto+0.030
Neg logit contributions
 strangers-0.041
 tresp-0.040
vern-0.038
 supposed-0.038
 liar-0.038
Token a
Token ID257
Feature activation-1.24e+00

Pos logit contributions
 strangers+0.376
 tresp+0.359
ache+0.341
 liar+0.335
 supposed+0.333
Neg logit contributions
cy-0.267
 up-0.252
sc-0.251
ed-0.244
 Of-0.240
Token a
Token ID257
Feature activation-1.59e+00

Pos logit contributions
cy+0.139
sc+0.130
 Of+0.125
 up+0.124
ed+0.120
Neg logit contributions
 strangers-0.189
 tresp-0.184
 supposed-0.173
vern-0.169
ache-0.168
Token mouse
Token ID10211
Feature activation+3.82e-01

Pos logit contributions
cy+0.074
 girl+0.068
 up+0.063
 Of+0.062
sc+0.061
Neg logit contributions
 strangers-0.111
 tresp-0.109
 supposed-0.106
vern-0.104
ache-0.099
Token named
Token ID3706
Feature activation+5.90e+00

Pos logit contributions
 strangers+0.016
 tresp+0.016
vern+0.016
 supposed+0.015
ache+0.014
Neg logit contributions
cy-0.027
 up-0.025
sc-0.025
 girl-0.025
 Of-0.024
Token Moe
Token ID46488
Feature activation-1.33e+01

Pos logit contributions
 strangers+0.405
 tresp+0.399
 supposed+0.375
vern+0.374
ache+0.370
Neg logit contributions
cy-0.275
sc-0.253
 up-0.251
 Of-0.248
 onto-0.234
Token.
Token ID13
Feature activation+2.32e-01

Pos logit contributions
cy+0.531
ed+0.435
 up+0.424
 girl+0.390
sc+0.378
Neg logit contributions
 tresp-1.042
 strangers-1.039
vern-1.002
ache-0.973
 cruel-0.970
Token Moe
Token ID46488
Feature activation-1.33e+01

Pos logit contributions
 tresp+0.016
 strangers+0.016
 liar+0.014
 supposed+0.014
 mean+0.014
Neg logit contributions
sc-0.009
antly-0.009
cy-0.008
 mixture-0.008
 girl-0.008
Token liked
Token ID8288
Feature activation+2.91e+00

Pos logit contributions
cy+0.881
ed+0.795
 up+0.780
sc+0.757
 girl+0.746
Neg logit contributions
 strangers-0.700
 tresp-0.677
vern-0.655
ache-0.617
 supposed-0.609
Token to
Token ID284
Feature activation+1.07e+00

Pos logit contributions
vern+0.238
 tresp+0.231
ache+0.224
 strangers+0.223
 liar+0.220
Neg logit contributions
 up-0.091
 Of-0.086
 down-0.084
 into-0.081
 onto-0.080
Token play
Token ID711
Feature activation+6.15e-01

Pos logit contributions
 tresp+0.081
 supposed+0.081
 strangers+0.077
vern+0.077
 cruel+0.076
Neg logit contributions
cy-0.038
 up-0.036
 onto-0.035
 Of-0.034
 tin-0.033
Token hide
Token ID7808
Feature activation-3.04e+00

Pos logit contributions
 strangers+0.051
 tresp+0.050
 supposed+0.047
vern+0.046
ache+0.046
Neg logit contributions
cy-0.020
sc-0.018
 Of-0.017
ed-0.016
 up-0.016
Token and
Token ID290
Feature activation+2.68e+00

Pos logit contributions
cy+0.146
sc+0.139
 up+0.136
ed+0.130
 onto+0.128
Neg logit contributions
 strangers-0.201
 tresp-0.199
 supposed-0.191
vern-0.186
ache-0.184
Token my
Token ID616
Feature activation-3.01e+00

Pos logit contributions
cy+0.012
sc+0.011
 Of+0.011
 up+0.011
 onto+0.010
Neg logit contributions
 strangers-0.017
 tresp-0.016
 supposed-0.015
vern-0.015
ache-0.015
Token cute
Token ID13779
Feature activation+2.26e+00

Pos logit contributions
cy+0.168
sc+0.160
 Of+0.153
 up+0.152
ed+0.148
Neg logit contributions
 strangers-0.179
 tresp-0.174
 supposed-0.162
vern-0.159
ache-0.158
Token little
Token ID1310
Feature activation+1.79e-02

Pos logit contributions
 strangers+0.150
 tresp+0.146
 supposed+0.137
vern+0.135
ache+0.134
Neg logit contributions
cy-0.112
sc-0.104
 up-0.101
 Of-0.100
ed-0.097
Token girl
Token ID2576
Feature activation-9.91e+00

Pos logit contributions
 strangers+0.001
 tresp+0.001
 mean+0.001
 liar+0.001
 supposed+0.001
Neg logit contributions
sc-0.001
ipt-0.001
antly-0.001
 onto-0.001
iffs-0.001
Token!
Token ID0
Feature activation+1.86e+00

Pos logit contributions
cy+0.486
sc+0.445
 up+0.433
 Of+0.431
ed+0.421
Neg logit contributions
 strangers-0.663
 tresp-0.639
 supposed-0.597
vern-0.596
ache-0.593
Token Of
Token ID3226
Feature activation-1.32e+01

Pos logit contributions
 strangers+0.111
 tresp+0.110
isdom+0.106
 shouldn+0.103
 supposed+0.102
Neg logit contributions
cy-0.083
 up-0.081
sc-0.079
ed-0.076
 mixture-0.076
Token course
Token ID1781
Feature activation+1.50e+00

Pos logit contributions
cy+0.633
 up+0.587
 Of+0.565
sc+0.560
 girl+0.557
Neg logit contributions
 strangers-0.841
ache-0.828
 tresp-0.805
 supposed-0.804
 liar-0.789
Token you
Token ID345
Feature activation-5.93e-01

Pos logit contributions
 strangers+0.097
 tresp+0.095
 supposed+0.088
vern+0.086
ache+0.085
Neg logit contributions
cy-0.075
sc-0.072
 Of-0.068
 up-0.066
ed-0.065
Token can
Token ID460
Feature activation+5.09e+00

Pos logit contributions
cy+0.028
 up+0.026
sc+0.024
ed+0.024
 down+0.023
Neg logit contributions
 strangers-0.043
 tresp-0.040
ache-0.039
 allergic-0.038
cod-0.038
Token have
Token ID423
Feature activation+4.52e+00

Pos logit contributions
 strangers+0.333
 tresp+0.324
 liar+0.301
ache+0.300
 cruel+0.299
Neg logit contributions
cy-0.269
 up-0.255
 onto-0.239
 Of-0.238
sc-0.235
Token a
Token ID257
Feature activation-1.24e+00

Pos logit contributions
 strangers+0.337
 tresp+0.334
ache+0.320
 liar+0.314
 supposed+0.311
Neg logit contributions
cy-0.168
 up-0.168
sc-0.161
 Of-0.157
 onto-0.156
Token you
Token ID345
Feature activation-5.77e-01

Pos logit contributions
cy+0.008
 Of+0.007
sc+0.007
 up+0.007
 girl+0.007
Neg logit contributions
 strangers-0.019
 tresp-0.019
 supposed-0.019
vern-0.018
ache-0.018
Token are
Token ID389
Feature activation+4.12e+00

Pos logit contributions
cy+0.036
 up+0.034
sc+0.034
 Of+0.033
ed+0.033
Neg logit contributions
 strangers-0.031
 tresp-0.029
ache-0.027
 supposed-0.027
vern-0.027
Token very
Token ID845
Feature activation+1.49e+00

Pos logit contributions
 strangers+0.250
 tresp+0.240
ache+0.225
vern+0.220
 supposed+0.220
Neg logit contributions
cy-0.225
 up-0.213
sc-0.211
 Of-0.206
 onto-0.200
Token kind
Token ID1611
Feature activation+4.14e+00

Pos logit contributions
 strangers+0.067
 tresp+0.065
 supposed+0.059
vern+0.058
ache+0.056
Neg logit contributions
cy-0.105
sc-0.102
 Of-0.098
 up-0.097
ed-0.096
Token.
Token ID13
Feature activation+4.63e-01

Pos logit contributions
 strangers+0.243
 tresp+0.230
 supposed+0.212
ache+0.210
 liar+0.207
Neg logit contributions
cy-0.237
sc-0.224
 Of-0.217
 up-0.209
 onto-0.206
Token Of
Token ID3226
Feature activation-1.32e+01

Pos logit contributions
 strangers+0.030
 tresp+0.029
isdom+0.027
 supposed+0.027
 liar+0.027
Neg logit contributions
cy-0.022
sc-0.021
 up-0.020
ed-0.020
 mixture-0.019
Token course
Token ID1781
Feature activation+1.48e+00

Pos logit contributions
cy+0.724
 up+0.662
sc+0.661
 Of+0.654
 girl+0.632
Neg logit contributions
 strangers-0.774
 tresp-0.751
ache-0.736
 supposed-0.731
vern-0.703
Token you
Token ID345
Feature activation-5.86e-01

Pos logit contributions
 strangers+0.095
 tresp+0.091
 supposed+0.083
vern+0.082
ache+0.081
Neg logit contributions
cy-0.077
sc-0.074
 Of-0.069
 up-0.068
ed-0.067
Token can
Token ID460
Feature activation+5.07e+00

Pos logit contributions
cy+0.029
 up+0.027
sc+0.026
ed+0.025
 girl+0.025
Neg logit contributions
 strangers-0.039
 tresp-0.036
ache-0.036
cod-0.035
 allergic-0.035
Token touch
Token ID3638
Feature activation-2.22e+00

Pos logit contributions
 strangers+0.324
 tresp+0.319
 liar+0.295
 supposed+0.290
 cruel+0.289
Neg logit contributions
cy-0.273
 up-0.255
sc-0.239
 onto-0.237
 Of-0.237
Token it
Token ID340
Feature activation+1.95e-01

Pos logit contributions
cy+0.107
sc+0.100
 Of+0.096
ed+0.092
 up+0.092
Neg logit contributions
 strangers-0.151
 tresp-0.144
 supposed-0.135
vern-0.133
ache-0.132
Token the
Token ID262
Feature activation+9.71e-01

Pos logit contributions
 tresp+0.058
 strangers+0.058
 supposed+0.057
vern+0.056
 liar+0.055
Neg logit contributions
cy-0.034
 up-0.034
 onto-0.031
sc-0.031
 down-0.030
Token ground
Token ID2323
Feature activation-1.24e-01

Pos logit contributions
 strangers+0.067
 tresp+0.065
 supposed+0.062
vern+0.061
ache+0.060
Neg logit contributions
cy-0.045
sc-0.042
 up-0.040
 Of-0.040
ed-0.039
Token,
Token ID11
Feature activation-7.01e-01

Pos logit contributions
cy+0.006
sc+0.006
 up+0.006
 Of+0.005
ed+0.005
Neg logit contributions
 strangers-0.008
 tresp-0.008
 supposed-0.007
vern-0.007
 liar-0.007
Token and
Token ID290
Feature activation+2.36e+00

Pos logit contributions
cy+0.039
sc+0.037
 Of+0.035
 up+0.035
ed+0.034
Neg logit contributions
 strangers-0.042
 tresp-0.041
 supposed-0.038
vern-0.037
ache-0.037
Token the
Token ID262
Feature activation+9.38e-01

Pos logit contributions
 strangers+0.129
 tresp+0.122
 cruel+0.114
 allergic+0.112
 supposed+0.111
Neg logit contributions
cy-0.142
sc-0.131
 Of-0.130
ipt-0.124
ucks-0.122
Token wolf
Token ID17481
Feature activation+1.02e+01

Pos logit contributions
 tresp+0.037
vern+0.037
 strangers+0.037
 supposed+0.036
isdom+0.033
Neg logit contributions
cy-0.066
 up-0.065
 girl-0.065
sc-0.063
 down-0.062
Token ran
Token ID4966
Feature activation+1.90e+00

Pos logit contributions
 strangers+0.649
 tresp+0.627
 supposed+0.597
vern+0.591
 liar+0.570
Neg logit contributions
cy-0.528
sc-0.501
 up-0.485
ed-0.477
 Of-0.471
Token away
Token ID1497
Feature activation-8.77e+00

Pos logit contributions
 strangers+0.160
 tresp+0.157
 supposed+0.148
vern+0.141
ache+0.140
Neg logit contributions
cy-0.060
 Of-0.056
sc-0.048
ipt-0.045
 mixture-0.045
Token.
Token ID13
Feature activation-3.54e-02

Pos logit contributions
cy+0.462
sc+0.434
 Of+0.422
 up+0.388
ed+0.386
Neg logit contributions
 strangers-0.552
 tresp-0.546
 supposed-0.507
vern-0.496
ache-0.491
Token\n
Token ID198
Feature activation-1.22e+00

Pos logit contributions
cy+0.002
sc+0.002
 Of+0.002
 up+0.002
ed+0.002
Neg logit contributions
 strangers-0.002
 tresp-0.002
 supposed-0.002
vern-0.002
 allergic-0.002
Token\n
Token ID198
Feature activation-1.22e+00

Pos logit contributions
cy+0.046
sc+0.046
 up+0.044
 Of+0.042
 onto+0.041
Neg logit contributions
 strangers-0.096
 tresp-0.094
 liar-0.087
ache-0.087
 supposed-0.087
Token She
Token ID1375
Feature activation+6.41e-01

Pos logit contributions
 strangers+0.031
 tresp+0.030
 supposed+0.028
ache+0.027
 liar+0.027
Neg logit contributions
cy-0.026
sc-0.025
 Of-0.023
 up-0.023
ed-0.022
Token is
Token ID318
Feature activation-1.85e+00

Pos logit contributions
 strangers+0.045
 tresp+0.044
 supposed+0.042
vern+0.041
ache+0.041
Neg logit contributions
cy-0.029
sc-0.027
 up-0.026
 Of-0.025
ed-0.025
Token scared
Token ID12008
Feature activation+1.70e+00

Pos logit contributions
 up+0.122
 down+0.113
cy+0.112
 onto+0.111
 into+0.111
Neg logit contributions
 strangers-0.090
 tresp-0.088
vern-0.086
isdom-0.084
ache-0.083
Token of
Token ID286
Feature activation+2.26e+00

Pos logit contributions
 tresp+0.124
 strangers+0.120
 supposed+0.116
vern+0.115
 liar+0.114
Neg logit contributions
cy-0.070
 up-0.068
sc-0.065
 down-0.063
 Of-0.063
Token the
Token ID262
Feature activation+1.41e+00

Pos logit contributions
vern+0.183
ache+0.179
 tresp+0.178
 supposed+0.177
 liar+0.176
Neg logit contributions
 up-0.078
 down-0.064
 onto-0.062
 into-0.058
cy-0.057
Token bull
Token ID6473
Feature activation+8.45e+00

Pos logit contributions
 tresp+0.072
vern+0.069
 strangers+0.067
ache+0.066
 liar+0.065
Neg logit contributions
 up-0.089
cy-0.085
 girl-0.085
 down-0.083
sc-0.079
Token.
Token ID13
Feature activation+4.71e-01

Pos logit contributions
 strangers+0.580
 tresp+0.569
 supposed+0.536
vern+0.530
ache+0.522
Neg logit contributions
cy-0.388
sc-0.364
 up-0.351
 Of-0.345
ed-0.333
Token "
Token ID366
Feature activation+3.89e+00

Pos logit contributions
 strangers+0.033
 tresp+0.032
 supposed+0.030
vern+0.030
ache+0.030
Neg logit contributions
cy-0.021
sc-0.020
 Of-0.019
 up-0.019
ed-0.018
TokenThe
Token ID464
Feature activation+2.97e+00

Pos logit contributions
 strangers+0.253
 tresp+0.247
 supposed+0.231
vern+0.227
ache+0.226
Neg logit contributions
cy-0.195
sc-0.183
 Of-0.176
 up-0.175
ed-0.169
Token bull
Token ID6473
Feature activation+8.45e+00

Pos logit contributions
vern+0.145
 tresp+0.145
 strangers+0.140
ache+0.139
 supposed+0.139
Neg logit contributions
 up-0.181
 girl-0.180
cy-0.178
 down-0.171
sc-0.167
Token is
Token ID318
Feature activation-1.86e+00

Pos logit contributions
 strangers+0.634
 tresp+0.617
vern+0.614
 supposed+0.606
 liar+0.586
Neg logit contributions
cy-0.333
 up-0.305
sc-0.300
 onto-0.275
ed-0.274
Token it
Token ID340
Feature activation-8.03e-02

Pos logit contributions
cy+0.111
sc+0.104
 Of+0.103
 Sure+0.094
ipt+0.093
Neg logit contributions
 strangers-0.135
 tresp-0.129
 supposed-0.123
 allergic-0.116
 liar-0.115
Token on
Token ID319
Feature activation-3.41e+00

Pos logit contributions
cy+0.003
sc+0.003
 Of+0.003
 up+0.003
ed+0.003
Neg logit contributions
 strangers-0.006
 tresp-0.006
 supposed-0.006
ache-0.006
vern-0.005
Token the
Token ID262
Feature activation+1.10e+00

Pos logit contributions
cy+0.178
sc+0.167
 Of+0.161
ed+0.158
 up+0.155
Neg logit contributions
 strangers-0.219
 tresp-0.210
 supposed-0.195
ache-0.188
vern-0.188
Token shelf
Token ID18316
Feature activation-2.27e+00

Pos logit contributions
 strangers+0.074
 tresp+0.072
 supposed+0.067
ache+0.065
 liar+0.064
Neg logit contributions
cy-0.054
sc-0.049
 Of-0.049
ed-0.048
 up-0.045
Token.
Token ID13
Feature activation+1.48e-01

Pos logit contributions
cy+0.116
sc+0.107
 Of+0.107
 up+0.101
ed+0.101
Neg logit contributions
 strangers-0.143
 tresp-0.138
 supposed-0.130
ache-0.127
vern-0.127
Token Billy
Token ID15890
Feature activation+9.69e+00

Pos logit contributions
 strangers+0.010
 tresp+0.010
 supposed+0.010
vern+0.009
ache+0.009
Neg logit contributions
cy-0.007
sc-0.006
 up-0.006
 Of-0.006
ed-0.006
Token hated
Token ID16563
Feature activation+9.87e+00

Pos logit contributions
vern+0.684
 strangers+0.677
 tresp+0.670
 supposed+0.638
 liar+0.627
Neg logit contributions
cy-0.442
 up-0.421
sc-0.393
ed-0.385
 down-0.374
Token heights
Token ID23245
Feature activation+2.13e+00

Pos logit contributions
vern+0.736
 tresp+0.706
ache+0.690
 liar+0.675
cod+0.671
Neg logit contributions
 up-0.422
 down-0.355
 into-0.353
 Of-0.349
 onto-0.349
Token so
Token ID523
Feature activation-2.37e+00

Pos logit contributions
 strangers+0.141
 tresp+0.140
 supposed+0.130
vern+0.130
 liar+0.128
Neg logit contributions
cy-0.100
 up-0.097
sc-0.094
ed-0.091
 onto-0.091
Token he
Token ID339
Feature activation-1.08e+00

Pos logit contributions
 up+0.111
 Of+0.107
cy+0.105
 onto+0.102
ed+0.102
Neg logit contributions
 tresp-0.149
 strangers-0.144
 supposed-0.144
vern-0.141
ache-0.138
Token couldn
Token ID3521
Feature activation+6.24e+00

Pos logit contributions
sc+0.055
 up+0.055
cy+0.054
ed+0.054
 onto+0.051
Neg logit contributions
vern-0.065
 strangers-0.065
 tresp-0.062
 supposed-0.060
isdom-0.058
Token the
Token ID262
Feature activation+1.41e+00

Pos logit contributions
ache+0.164
 tresp+0.163
vern+0.159
 supposed+0.157
 strangers+0.154
Neg logit contributions
 up-0.085
cy-0.080
 Of-0.080
 onto-0.076
 girl-0.072
Token old
Token ID1468
Feature activation+4.48e+00

Pos logit contributions
 strangers+0.086
 tresp+0.085
 supposed+0.081
vern+0.080
ache+0.078
Neg logit contributions
cy-0.074
 up-0.069
sc-0.069
 girl-0.068
 Of-0.068
Token man
Token ID582
Feature activation-2.74e-02

Pos logit contributions
 strangers+0.338
 tresp+0.331
 supposed+0.314
vern+0.309
ache+0.306
Neg logit contributions
cy-0.177
sc-0.163
 Of-0.157
 up-0.156
ed-0.149
Token.
Token ID13
Feature activation+4.91e-01

Pos logit contributions
cy+0.001
 up+0.001
sc+0.001
 Of+0.001
ed+0.001
Neg logit contributions
 strangers-0.002
 tresp-0.002
vern-0.002
 supposed-0.002
 liar-0.002
Token<|endoftext|>
Token ID50256
Feature activation+7.17e+00

Pos logit contributions
 strangers+0.034
 tresp+0.032
 supposed+0.031
vern+0.030
ache+0.030
Neg logit contributions
cy-0.023
sc-0.021
 up-0.020
ed-0.020
 Of-0.020
TokenBilly
Token ID49640
Feature activation+1.16e+01

Pos logit contributions
 strangers+0.441
 tresp+0.426
 supposed+0.398
vern+0.375
 liar+0.363
Neg logit contributions
sc-0.312
cy-0.311
 mixture-0.306
 Of-0.303
 up-0.283
Token loved
Token ID6151
Feature activation+5.66e+00

Pos logit contributions
 strangers+0.790
 tresp+0.778
vern+0.767
 liar+0.734
 supposed+0.726
Neg logit contributions
cy-0.606
 up-0.537
ed-0.507
sc-0.493
 down-0.492
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 tresp+0.399
vern+0.389
 strangers+0.388
ache+0.377
 liar+0.374
Neg logit contributions
 up-0.238
cy-0.229
 Of-0.223
 down-0.219
 onto-0.217
Token decor
Token ID11705
Feature activation+3.22e+00

Pos logit contributions
 tresp+0.088
 strangers+0.086
 supposed+0.083
ache+0.082
 liar+0.080
Neg logit contributions
cy-0.058
 Of-0.056
 up-0.055
 onto-0.054
 down-0.051
Tokenate
Token ID378
Feature activation-6.69e+00

Pos logit contributions
 strangers+0.231
vern+0.215
 tresp+0.211
 supposed+0.210
isdom+0.207
Neg logit contributions
sc-0.136
cy-0.128
 Sure-0.124
 Of-0.115
ed-0.107
Token.
Token ID13
Feature activation+4.96e-01

Pos logit contributions
cy+0.344
sc+0.324
 Of+0.311
ed+0.298
 up+0.296
Neg logit contributions
 strangers-0.431
 tresp-0.415
 supposed-0.384
vern-0.379
ache-0.375
Token very
Token ID845
Feature activation+1.40e+00

Pos logit contributions
 tresp+0.122
 strangers+0.119
 supposed+0.117
 liar+0.105
vern+0.103
Neg logit contributions
cy-0.179
sc-0.176
 Of-0.169
antly-0.165
ed-0.163
Token young
Token ID1862
Feature activation+3.30e+00

Pos logit contributions
 strangers+0.074
 tresp+0.070
ache+0.065
vern+0.063
 supposed+0.063
Neg logit contributions
cy-0.091
 up-0.083
 Of-0.082
sc-0.082
ed-0.079
Token,
Token ID11
Feature activation-3.57e-01

Pos logit contributions
 strangers+0.194
 tresp+0.188
 supposed+0.174
 liar+0.167
ache+0.167
Neg logit contributions
cy-0.184
sc-0.180
 Of-0.172
 up-0.167
ed-0.164
Token only
Token ID691
Feature activation-3.61e+00

Pos logit contributions
cy+0.018
sc+0.017
 Of+0.017
 up+0.017
ed+0.016
Neg logit contributions
 strangers-0.023
 tresp-0.022
 supposed-0.021
vern-0.020
ache-0.020
Token three
Token ID1115
Feature activation-5.07e+00

Pos logit contributions
cy+0.159
sc+0.155
ipt+0.142
antly+0.142
 Of+0.140
Neg logit contributions
 strangers-0.250
 tresp-0.249
 supposed-0.235
 liar-0.225
 cruel-0.221
Token years
Token ID812
Feature activation+8.55e+00

Pos logit contributions
cy+0.309
 girl+0.303
ed+0.298
 up+0.289
sc+0.289
Neg logit contributions
 supposed-0.249
vern-0.247
 tresp-0.240
ache-0.236
 strangers-0.232
Token old
Token ID1468
Feature activation+4.35e+00

Pos logit contributions
 strangers+0.531
 tresp+0.527
 supposed+0.495
vern+0.484
 liar+0.481
Neg logit contributions
cy-0.437
 up-0.417
sc-0.407
 Of-0.400
 girl-0.394
Token.
Token ID13
Feature activation+3.47e-01

Pos logit contributions
 strangers+0.271
 tresp+0.261
 supposed+0.239
ache+0.236
 liar+0.234
Neg logit contributions
cy-0.229
sc-0.223
 Of-0.211
 up-0.208
 onto-0.203
Token\n
Token ID198
Feature activation-8.46e-01

Pos logit contributions
 strangers+0.021
 tresp+0.019
 supposed+0.019
vern+0.018
isdom+0.018
Neg logit contributions
cy-0.018
sc-0.017
ed-0.016
 up-0.016
 onto-0.016
Token\n
Token ID198
Feature activation-8.49e-01

Pos logit contributions
cy+0.037
sc+0.034
 Of+0.033
 up+0.032
ed+0.032
Neg logit contributions
 strangers-0.060
 tresp-0.058
 supposed-0.055
vern-0.054
ache-0.053
TokenThey
Token ID2990
Feature activation+4.07e-01

Pos logit contributions
cy+0.056
sc+0.054
 Of+0.052
 up+0.052
ed+0.050
Neg logit contributions
 strangers-0.043
 tresp-0.042
 supposed-0.038
ache-0.038
vern-0.037
Token all
Token ID477
Feature activation+2.46e+00

Pos logit contributions
 strangers+0.157
 tresp+0.147
 supposed+0.139
ache+0.135
 liar+0.132
Neg logit contributions
cy-0.124
sc-0.120
 Of-0.114
ed-0.106
 up-0.106
Token the
Token ID262
Feature activation+9.93e-01

Pos logit contributions
 strangers+0.169
 tresp+0.166
 supposed+0.155
vern+0.154
ache+0.153
Neg logit contributions
cy-0.115
sc-0.107
 up-0.104
 Of-0.103
ed-0.099
Token plants
Token ID6134
Feature activation+3.32e+00

Pos logit contributions
 strangers+0.066
 tresp+0.064
 supposed+0.060
vern+0.059
ache+0.059
Neg logit contributions
cy-0.049
sc-0.046
 Of-0.044
 up-0.044
ed-0.042
Token were
Token ID547
Feature activation+3.60e+00

Pos logit contributions
 tresp+0.220
 strangers+0.210
vern+0.207
 liar+0.203
 supposed+0.196
Neg logit contributions
cy-0.167
 up-0.166
ed-0.161
 onto-0.158
 down-0.157
Token so
Token ID523
Feature activation-2.45e+00

Pos logit contributions
 strangers+0.220
 tresp+0.218
vern+0.205
 liar+0.204
cod+0.200
Neg logit contributions
 up-0.195
cy-0.183
 into-0.181
 down-0.179
 onto-0.177
Token happy
Token ID3772
Feature activation+2.60e+00

Pos logit contributions
cy+0.139
 up+0.131
 Of+0.124
sc+0.123
ed+0.121
Neg logit contributions
 strangers-0.147
 tresp-0.145
ache-0.132
 liar-0.129
 supposed-0.128
Token.
Token ID13
Feature activation+2.00e-02

Pos logit contributions
 strangers+0.168
 tresp+0.162
 supposed+0.154
ache+0.147
vern+0.147
Neg logit contributions
cy-0.134
sc-0.126
 Of-0.121
 up-0.116
ed-0.114
Token They
Token ID1119
Feature activation-1.68e+00

Pos logit contributions
 strangers+0.001
 tresp+0.001
 supposed+0.001
 allergic+0.001
 liar+0.001
Neg logit contributions
cy-0.001
sc-0.001
ed-0.001
 up-0.001
cer-0.001
Token danced
Token ID39480
Feature activation-2.33e+00

Pos logit contributions
cy+0.077
 up+0.069
ed+0.066
 into+0.066
 onto+0.065
Neg logit contributions
vern-0.127
 tresp-0.122
 supposed-0.116
 strangers-0.116
 liar-0.115
Token and
Token ID290
Feature activation+2.34e+00

Pos logit contributions
cy+0.124
 Of+0.114
sc+0.113
ipt+0.106
 Sure+0.103
Neg logit contributions
 strangers-0.140
 tresp-0.133
 supposed-0.124
vern-0.121
ache-0.121
Token sang
Token ID25889
Feature activation-2.75e+00

Pos logit contributions
 strangers+0.128
 tresp+0.123
 supposed+0.113
ache+0.107
 liar+0.107
Neg logit contributions
cy-0.140
sc-0.134
 Of-0.132
 up-0.128
ed-0.124
Token how
Token ID703
Feature activation+2.51e+00

Pos logit contributions
 tresp+0.274
 strangers+0.272
 supposed+0.263
ache+0.262
vern+0.261
Neg logit contributions
 up-0.253
cy-0.247
 Of-0.239
sc-0.231
 down-0.229
Token to
Token ID284
Feature activation+1.17e+00

Pos logit contributions
 strangers+0.179
 tresp+0.179
ache+0.171
cod+0.167
vern+0.164
Neg logit contributions
 up-0.100
cy-0.094
 Of-0.093
 down-0.092
sc-0.088
Token make
Token ID787
Feature activation-2.88e+00

Pos logit contributions
 tresp+0.073
 strangers+0.072
 supposed+0.071
 liar+0.071
ache+0.070
Neg logit contributions
cy-0.054
 up-0.054
 with-0.046
 onto-0.046
 edge-0.046
Token friends
Token ID2460
Feature activation+2.17e-01

Pos logit contributions
cy+0.161
sc+0.151
 Of+0.142
ed+0.141
 onto+0.134
Neg logit contributions
 strangers-0.182
 tresp-0.172
 supposed-0.160
vern-0.158
 allergic-0.153
Token with
Token ID351
Feature activation-8.92e+00

Pos logit contributions
 strangers+0.017
 tresp+0.017
 supposed+0.016
ache+0.015
vern+0.015
Neg logit contributions
cy-0.008
 up-0.007
sc-0.007
 Of-0.007
 down-0.007
Token animals
Token ID4695
Feature activation+1.72e+00

Pos logit contributions
 up+0.378
cy+0.374
sc+0.345
 onto+0.338
 down+0.336
Neg logit contributions
 tresp-0.610
 supposed-0.592
ache-0.580
 strangers-0.576
vern-0.574
Token."
Token ID526
Feature activation-6.85e-01

Pos logit contributions
 tresp+0.125
 strangers+0.124
 supposed+0.119
 liar+0.115
vern+0.115
Neg logit contributions
cy-0.069
 up-0.066
sc-0.066
 onto-0.061
 down-0.061
Token She
Token ID1375
Feature activation+5.86e-01

Pos logit contributions
cy+0.034
sc+0.033
 up+0.031
ed+0.030
 Of+0.029
Neg logit contributions
 strangers-0.044
 tresp-0.043
 supposed-0.040
vern-0.039
 liar-0.039
Token hugs
Token ID40657
Feature activation-1.79e+00

Pos logit contributions
 strangers+0.040
 tresp+0.039
 supposed+0.036
vern+0.036
ache+0.035
Neg logit contributions
cy-0.028
sc-0.025
 Of-0.025
ed-0.023
 up-0.023
Token them
Token ID606
Feature activation+4.98e+00

Pos logit contributions
cy+0.102
 up+0.096
sc+0.095
 Of+0.095
 onto+0.092
Neg logit contributions
 tresp-0.101
 strangers-0.101
 supposed-0.096
vern-0.095
ache-0.092
Token and
Token ID290
Feature activation+2.79e+00

Pos logit contributions
 strangers+0.264
 tresp+0.243
ache+0.230
vern+0.225
 allergic+0.224
Neg logit contributions
cy-0.298
sc-0.296
 Of-0.280
ed-0.263
antly-0.262
Token boy
Token ID2933
Feature activation-8.05e+00

Pos logit contributions
sc+0.002
cy+0.002
 Of+0.002
ed+0.002
 onto+0.002
Neg logit contributions
 strangers-0.009
 tresp-0.009
 liar-0.008
ache-0.008
 cruel-0.008
Token and
Token ID290
Feature activation+2.69e+00

Pos logit contributions
cy+0.605
 up+0.545
sc+0.543
 Of+0.538
 girl+0.528
Neg logit contributions
 strangers-0.334
 tresp-0.325
vern-0.324
 supposed-0.299
ache-0.299
Token his
Token ID465
Feature activation-2.46e-01

Pos logit contributions
 strangers+0.211
 tresp+0.207
ache+0.201
 supposed+0.198
vern+0.197
Neg logit contributions
cy-0.095
 girl-0.092
 Of-0.090
 up-0.090
 down-0.083
Token dad
Token ID9955
Feature activation+4.53e+00

Pos logit contributions
cy+0.012
 up+0.011
 girl+0.011
 Of+0.011
ed+0.011
Neg logit contributions
 strangers-0.016
 tresp-0.016
 supposed-0.015
vern-0.015
ache-0.014
Token.
Token ID13
Feature activation+3.30e-01

Pos logit contributions
 strangers+0.287
 tresp+0.280
vern+0.265
ache+0.259
 supposed+0.257
Neg logit contributions
cy-0.242
sc-0.218
 up-0.216
 Of-0.214
ed-0.208
Token The
Token ID383
Feature activation+7.45e-01

Pos logit contributions
 strangers+0.020
 tresp+0.019
 supposed+0.018
 liar+0.018
 allergic+0.017
Neg logit contributions
sc-0.016
cy-0.016
ed-0.016
 up-0.016
antly-0.015
Token little
Token ID1310
Feature activation-1.36e-01

Pos logit contributions
 strangers+0.052
 tresp+0.051
 supposed+0.048
vern+0.047
ache+0.047
Neg logit contributions
cy-0.034
sc-0.031
 up-0.030
 Of-0.030
ed-0.029
Token boy
Token ID2933
Feature activation-8.09e+00

Pos logit contributions
sc+0.006
antly+0.005
 onto+0.005
 Of+0.005
iffs+0.004
Neg logit contributions
 strangers-0.010
 mean-0.010
 liar-0.009
 tresp-0.009
 hated-0.008
Token wanted
Token ID2227
Feature activation+2.55e+00

Pos logit contributions
cy+0.476
 up+0.426
ed+0.419
 girl+0.403
sc+0.398
Neg logit contributions
 strangers-0.498
vern-0.478
 tresp-0.472
ache-0.442
 liar-0.432
Token nothing
Token ID2147
Feature activation+2.02e-01

Pos logit contributions
 strangers+0.176
 tresp+0.174
vern+0.164
ache+0.163
 supposed+0.163
Neg logit contributions
cy-0.111
 up-0.107
sc-0.105
 Of-0.104
 onto-0.099
Token more
Token ID517
Feature activation-4.70e+00

Pos logit contributions
 strangers+0.013
 tresp+0.012
 supposed+0.012
vern+0.011
ache+0.011
Neg logit contributions
cy-0.011
sc-0.010
 Of-0.010
ed-0.010
 up-0.010
Token me
Token ID502
Feature activation+9.10e-01

Pos logit contributions
 tresp+0.115
vern+0.110
 strangers+0.110
 liar+0.108
 supposed+0.108
Neg logit contributions
 up-0.064
cy-0.064
sc-0.059
 down-0.057
 into-0.055
Token?"
Token ID1701
Feature activation+4.62e+00

Pos logit contributions
 strangers+0.055
 tresp+0.054
 liar+0.050
 supposed+0.049
vern+0.049
Neg logit contributions
cy-0.050
 up-0.049
sc-0.045
 onto-0.045
 down-0.045
Token Max
Token ID5436
Feature activation+2.40e+00

Pos logit contributions
 strangers+0.262
 tresp+0.252
 supposed+0.233
ache+0.229
vern+0.228
Neg logit contributions
cy-0.271
sc-0.258
 Of-0.246
 up-0.246
ed-0.241
Token said
Token ID531
Feature activation-4.92e+00

Pos logit contributions
 strangers+0.166
 tresp+0.162
ache+0.153
 supposed+0.151
vern+0.149
Neg logit contributions
cy-0.105
sc-0.102
 Of-0.094
 up-0.092
 onto-0.090
Token,
Token ID11
Feature activation-8.46e-01

Pos logit contributions
cy+0.262
sc+0.247
 Of+0.237
 up+0.235
ed+0.228
Neg logit contributions
 strangers-0.306
 tresp-0.297
 supposed-0.277
vern-0.272
ache-0.271
Token "
Token ID366
Feature activation+3.28e+00

Pos logit contributions
cy+0.038
 up+0.037
 Of+0.037
sc+0.037
ed+0.036
Neg logit contributions
 strangers-0.058
 tresp-0.057
vern-0.054
 supposed-0.053
ache-0.051
TokenSure
Token ID19457
Feature activation-1.12e+01

Pos logit contributions
 strangers+0.226
ache+0.212
 cruel+0.211
 understand+0.208
 liar+0.207
Neg logit contributions
cy-0.129
 up-0.128
 onto-0.118
 roles-0.114
ed-0.114
Token,
Token ID11
Feature activation-8.81e-01

Pos logit contributions
cy+0.561
sc+0.519
 Of+0.509
 up+0.508
ed+0.488
Neg logit contributions
 strangers-0.728
 tresp-0.705
 supposed-0.668
vern-0.658
ache-0.649
Token let
Token ID1309
Feature activation-3.94e+00

Pos logit contributions
cy+0.036
sc+0.034
 Of+0.033
 up+0.033
 onto+0.031
Neg logit contributions
 strangers-0.064
 tresp-0.063
 supposed-0.060
vern-0.059
ache-0.058
Token's
Token ID338
Feature activation-1.95e+00

Pos logit contributions
cy+0.173
sc+0.162
 Of+0.156
 up+0.149
ed+0.147
Neg logit contributions
 strangers-0.282
 tresp-0.274
 supposed-0.259
vern-0.253
ache-0.252
Token bow
Token ID9563
Feature activation-1.53e+00

Pos logit contributions
cy+0.105
 up+0.103
 onto+0.095
 down+0.094
sc+0.093
Neg logit contributions
 strangers-0.119
 tresp-0.118
ache-0.112
 liar-0.112
vern-0.111
Token.
Token ID13
Feature activation+4.43e-01

Pos logit contributions
cy+0.284
sc+0.271
 up+0.260
 Of+0.250
 onto+0.248
Neg logit contributions
 strangers-0.496
 tresp-0.493
vern-0.468
 supposed-0.465
ache-0.460
Token But
Token ID887
Feature activation-8.59e+00

Pos logit contributions
 strangers+0.029
 tresp+0.028
 supposed+0.026
 liar+0.026
 allergic+0.025
Neg logit contributions
cy-0.022
sc-0.020
 up-0.020
ed-0.019
 onto-0.019
Token she
Token ID673
Feature activation-4.99e-01

Pos logit contributions
cy+0.485
sc+0.471
ed+0.434
 Of+0.428
 up+0.414
Neg logit contributions
 strangers-0.517
 tresp-0.497
 supposed-0.452
 cruel-0.438
 liar-0.435
Token will
Token ID481
Feature activation-1.08e+00

Pos logit contributions
cy+0.022
 up+0.020
sc+0.019
ed+0.018
 onto+0.017
Neg logit contributions
 strangers-0.035
vern-0.034
 tresp-0.034
 supposed-0.032
 liar-0.032
Token be
Token ID307
Feature activation-7.81e-01

Pos logit contributions
 up+0.061
cy+0.060
 onto+0.057
 into+0.057
sc+0.056
Neg logit contributions
 tresp-0.055
 strangers-0.054
 supposed-0.053
cod-0.052
vern-0.052
Token okay
Token ID8788
Feature activation+5.86e+00

Pos logit contributions
cy+0.047
 up+0.047
sc+0.044
 down+0.044
 onto+0.044
Neg logit contributions
 strangers-0.041
 tresp-0.040
vern-0.037
ache-0.036
 supposed-0.036
Token."
Token ID526
Feature activation-6.76e-01

Pos logit contributions
 tresp+0.437
 strangers+0.434
vern+0.411
 liar+0.404
 supposed+0.399
Neg logit contributions
cy-0.233
 up-0.224
 down-0.205
sc-0.199
 Of-0.199
Token They
Token ID1119
Feature activation-1.27e+00

Pos logit contributions
cy+0.036
sc+0.035
ed+0.033
 up+0.032
 mixture+0.032
Neg logit contributions
 strangers-0.041
 tresp-0.037
vern-0.035
 liar-0.035
 supposed-0.034
Token give
Token ID1577
Feature activation-2.90e+00

Pos logit contributions
cy+0.062
 up+0.060
 onto+0.057
sc+0.055
 down+0.055
Neg logit contributions
 supposed-0.083
 strangers-0.080
 tresp-0.080
vern-0.080
isdom-0.076
Token Lily
Token ID20037
Feature activation-3.16e+00

Pos logit contributions
sc+0.156
 Of+0.153
cy+0.147
umm+0.144
 mixture+0.140
Neg logit contributions
 strangers-0.179
 tresp-0.149
 supposed-0.144
 allergic-0.142
 cruel-0.137
Token a
Token ID257
Feature activation-1.30e+00

Pos logit contributions
cy+0.144
sc+0.133
 up+0.129
 Of+0.127
ed+0.124
Neg logit contributions
 strangers-0.220
 tresp-0.216
 supposed-0.203
vern-0.199
ache-0.198
Token come
Token ID1282
Feature activation+3.74e+00

Pos logit contributions
 tresp+0.006
 strangers+0.006
vern+0.006
 supposed+0.006
ache+0.006
Neg logit contributions
cy-0.006
 up-0.006
sc-0.005
 Of-0.005
 onto-0.005
Token to
Token ID284
Feature activation+3.85e-01

Pos logit contributions
 strangers+0.274
 tresp+0.267
 supposed+0.252
vern+0.247
ache+0.246
Neg logit contributions
cy-0.157
sc-0.147
 Of-0.140
 up-0.137
ed-0.133
Token the
Token ID262
Feature activation+6.30e-01

Pos logit contributions
 tresp+0.027
 supposed+0.026
vern+0.026
 strangers+0.026
ache+0.026
Neg logit contributions
cy-0.016
 up-0.016
 Of-0.014
 down-0.014
 onto-0.014
Token office
Token ID2607
Feature activation+4.80e+00

Pos logit contributions
 strangers+0.038
 tresp+0.036
 supposed+0.034
ache+0.033
vern+0.033
Neg logit contributions
cy-0.035
sc-0.033
 Of-0.032
 up-0.032
ed-0.031
Token anymore
Token ID7471
Feature activation+4.39e+00

Pos logit contributions
 strangers+0.321
 tresp+0.312
 supposed+0.293
vern+0.287
ache+0.286
Neg logit contributions
cy-0.231
sc-0.216
 Of-0.208
 up-0.205
ed-0.199
Token.
Token ID13
Feature activation-7.84e-02

Pos logit contributions
 strangers+0.313
 tresp+0.308
vern+0.291
 supposed+0.289
ache+0.288
Neg logit contributions
cy-0.185
sc-0.173
 up-0.171
 Of-0.164
ed-0.159
Token You
Token ID921
Feature activation-5.01e+00

Pos logit contributions
cy+0.004
sc+0.003
 up+0.003
ed+0.003
 onto+0.003
Neg logit contributions
 strangers-0.006
 tresp-0.005
 supposed-0.005
 liar-0.005
vern-0.005
Token are
Token ID389
Feature activation+3.64e+00

Pos logit contributions
cy+0.277
 up+0.252
sc+0.248
 onto+0.245
 girl+0.237
Neg logit contributions
 strangers-0.292
vern-0.282
isdom-0.275
 tresp-0.273
 supposed-0.270
Token grounded
Token ID22804
Feature activation+4.71e+00

Pos logit contributions
 strangers+0.212
ache+0.210
vern+0.207
isdom+0.204
cod+0.194
Neg logit contributions
 up-0.197
 into-0.176
 down-0.174
 onto-0.170
cy-0.165
Token."
Token ID526
Feature activation-1.02e+00

Pos logit contributions
 strangers+0.322
 tresp+0.315
 supposed+0.297
ache+0.295
vern+0.295
Neg logit contributions
cy-0.214
sc-0.202
 up-0.201
 Of-0.191
 onto-0.190
Token\n
Token ID198
Feature activation-1.07e+00

Pos logit contributions
cy+0.049
sc+0.047
 up+0.043
ed+0.043
 Of+0.042
Neg logit contributions
 strangers-0.069
 tresp-0.066
 liar-0.061
 supposed-0.060
vern-0.060
Token smiled
Token ID13541
Feature activation-2.13e+00

Pos logit contributions
 strangers+0.020
 tresp+0.019
 supposed+0.018
vern+0.018
ache+0.018
Neg logit contributions
cy-0.013
sc-0.012
 up-0.012
 Of-0.011
ed-0.011
Token and
Token ID290
Feature activation+2.45e+00

Pos logit contributions
sc+0.132
cy+0.130
 Of+0.119
 mixture+0.109
 onto+0.109
Neg logit contributions
 strangers-0.112
 tresp-0.105
ache-0.101
 supposed-0.098
vern-0.097
Token picked
Token ID6497
Feature activation+3.50e-01

Pos logit contributions
 tresp+0.128
 strangers+0.126
 supposed+0.114
ache+0.111
 cruel+0.108
Neg logit contributions
cy-0.147
sc-0.147
 Of-0.135
 up-0.127
ASH-0.126
Token up
Token ID510
Feature activation-9.53e+00

Pos logit contributions
 strangers+0.025
 tresp+0.022
 supposed+0.022
 allergic+0.022
 understand+0.021
Neg logit contributions
sc-0.013
cy-0.012
ipt-0.012
suit-0.012
 Of-0.011
Token the
Token ID262
Feature activation+1.03e+00

Pos logit contributions
cy+0.486
sc+0.465
 Of+0.436
 up+0.422
ed+0.419
Neg logit contributions
 strangers-0.620
 tresp-0.596
 supposed-0.546
ache-0.541
vern-0.540
Token baby
Token ID5156
Feature activation-2.25e+00

Pos logit contributions
 strangers+0.065
 tresp+0.064
 supposed+0.058
 liar+0.057
vern+0.056
Neg logit contributions
cy-0.055
sc-0.053
 Of-0.051
antly-0.049
ed-0.049
Token and
Token ID290
Feature activation+2.42e+00

Pos logit contributions
sc+0.106
 Of+0.103
cy+0.100
Wow+0.093
 Sure+0.092
Neg logit contributions
 strangers-0.150
 tresp-0.142
ache-0.133
 supposed-0.129
 allergic-0.126
Token hugged
Token ID41130
Feature activation+9.50e-01

Pos logit contributions
 strangers+0.128
 tresp+0.128
 supposed+0.114
ache+0.110
 cruel+0.106
Neg logit contributions
cy-0.144
sc-0.142
 Of-0.135
antly-0.124
 up-0.123
Token him
Token ID683
Feature activation+4.02e+00

Pos logit contributions
 strangers+0.051
 tresp+0.045
ache+0.041
 supposed+0.040
 liar+0.040
Neg logit contributions
sc-0.060
cy-0.058
 Of-0.053
ed-0.052
suit-0.051
Token.
Token ID13
Feature activation+5.76e-02

Pos logit contributions
 strangers+0.222
 tresp+0.214
vern+0.203
ache+0.202
 supposed+0.197
Neg logit contributions
cy-0.221
sc-0.219
 Of-0.208
ipt-0.197
Wow-0.194
Token\n
Token ID198
Feature activation-1.14e+00

Pos logit contributions
 strangers+0.003
 tresp+0.003
 supposed+0.003
vern+0.003
isdom+0.003
Neg logit contributions
sc-0.003
cy-0.003
 up-0.003
ed-0.003
 onto-0.003
Token girls
Token ID4813
Feature activation-3.58e+00

Pos logit contributions
cy+0.345
sc+0.331
 Of+0.320
 up+0.314
ed+0.306
Neg logit contributions
 strangers-0.340
 tresp-0.325
 supposed-0.300
ache-0.293
vern-0.292
Token stayed
Token ID9658
Feature activation+2.71e+00

Pos logit contributions
cy+0.201
sc+0.182
 up+0.181
ed+0.179
 onto+0.175
Neg logit contributions
 strangers-0.218
 tresp-0.214
vern-0.203
 supposed-0.199
 liar-0.196
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
 strangers+0.183
 tresp+0.176
 supposed+0.167
 allergic+0.161
vern+0.160
Neg logit contributions
sc-0.126
cy-0.125
 Of-0.118
ed-0.111
 Sure-0.105
Token played
Token ID2826
Feature activation+2.27e+00

Pos logit contributions
 strangers+0.133
 tresp+0.128
 supposed+0.117
 never+0.109
 understand+0.107
Neg logit contributions
sc-0.183
cy-0.173
 Of-0.169
ipt-0.162
 up-0.161
Token together
Token ID1978
Feature activation-8.45e+00

Pos logit contributions
 strangers+0.174
 tresp+0.168
vern+0.154
 supposed+0.153
 liar+0.152
Neg logit contributions
cy-0.089
 Of-0.073
sc-0.073
ipt-0.072
ed-0.067
Token in
Token ID287
Feature activation-2.12e+00

Pos logit contributions
cy+0.410
sc+0.374
 Of+0.366
ed+0.349
 up+0.340
Neg logit contributions
 strangers-0.568
 tresp-0.542
 supposed-0.515
vern-0.493
ache-0.492
Token the
Token ID262
Feature activation+1.40e+00

Pos logit contributions
cy+0.102
sc+0.096
 Of+0.092
 up+0.091
ed+0.089
Neg logit contributions
 strangers-0.143
 tresp-0.138
 supposed-0.130
vern-0.127
ache-0.126
Token park
Token ID3952
Feature activation+1.64e+00

Pos logit contributions
 strangers+0.082
 tresp+0.076
 supposed+0.070
 mean+0.068
 cruel+0.068
Neg logit contributions
sc-0.077
cy-0.077
ed-0.075
 Of-0.074
 onto-0.069
Token.
Token ID13
Feature activation+4.58e-01

Pos logit contributions
 strangers+0.110
 tresp+0.106
 supposed+0.099
ache+0.096
vern+0.096
Neg logit contributions
cy-0.079
sc-0.074
 Of-0.071
 up-0.069
ed-0.067
Token They
Token ID1119
Feature activation-1.23e+00

Pos logit contributions
 strangers+0.030
 supposed+0.027
 tresp+0.027
 allergic+0.026
isdom+0.025
Neg logit contributions
sc-0.020
cy-0.019
 up-0.018
ed-0.018
 girl-0.017
Token ran
Token ID4966
Feature activation+2.36e+00

Pos logit contributions
cy+0.073
 up+0.064
sc+0.063
ed+0.062
 onto+0.061
Neg logit contributions
 strangers-0.074
vern-0.074
 tresp-0.074
ache-0.069
 supposed-0.069
Token.
Token ID13
Feature activation+4.20e-01

Pos logit contributions
 strangers+0.223
 tresp+0.218
 supposed+0.202
vern+0.202
 liar+0.199
Neg logit contributions
cy-0.133
 up-0.120
sc-0.117
ed-0.115
 Of-0.111
Token Please
Token ID4222
Feature activation-7.02e-01

Pos logit contributions
 strangers+0.027
 tresp+0.026
 liar+0.024
 supposed+0.024
 allergic+0.023
Neg logit contributions
cy-0.020
sc-0.019
 up-0.019
ed-0.019
 onto-0.018
Token forgive
Token ID20927
Feature activation+3.58e+00

Pos logit contributions
cy+0.035
 up+0.032
 Of+0.031
sc+0.030
 down+0.030
Neg logit contributions
 strangers-0.046
 tresp-0.045
 supposed-0.044
vern-0.042
 liar-0.042
Token us
Token ID514
Feature activation+6.37e+00

Pos logit contributions
 strangers+0.175
 tresp+0.159
 supposed+0.143
ache+0.141
vern+0.137
Neg logit contributions
cy-0.244
sc-0.239
 Of-0.227
ed-0.225
 onto-0.221
Token."
Token ID526
Feature activation-6.64e-01

Pos logit contributions
 strangers+0.471
 tresp+0.464
 supposed+0.438
vern+0.427
 liar+0.425
Neg logit contributions
cy-0.260
 up-0.241
sc-0.239
 Of-0.228
ed-0.221
Token\n
Token ID198
Feature activation-7.32e-01

Pos logit contributions
cy+0.039
sc+0.038
 up+0.034
ed+0.034
 mixture+0.033
Neg logit contributions
 strangers-0.037
 tresp-0.034
 liar-0.032
vern-0.032
ache-0.031
Token\n
Token ID198
Feature activation-7.37e-01

Pos logit contributions
cy+0.023
 Of+0.022
sc+0.021
 up+0.021
 onto+0.019
Neg logit contributions
 strangers-0.060
 tresp-0.060
 supposed-0.057
ache-0.056
 liar-0.055
TokenTheir
Token ID14574
Feature activation+1.30e+00

Pos logit contributions
cy+0.038
sc+0.035
 Of+0.034
 up+0.033
ed+0.032
Neg logit contributions
 strangers-0.048
 tresp-0.046
 supposed-0.044
ache-0.043
vern-0.042
Token parents
Token ID3397
Feature activation+1.93e+00

Pos logit contributions
 strangers+0.088
 tresp+0.086
vern+0.081
 supposed+0.081
ache+0.081
Neg logit contributions
cy-0.059
 up-0.056
 Of-0.055
 girl-0.053
sc-0.052
Token sighed
Token ID21893
Feature activation+1.22e+00

Pos logit contributions
 strangers+0.122
 tresp+0.118
 supposed+0.113
vern+0.110
ache+0.107
Neg logit contributions
cy-0.101
sc-0.094
 up-0.091
 Of-0.091
ed-0.090
Token and
Token ID290
Feature activation+2.81e+00

Pos logit contributions
 strangers+0.072
 tresp+0.069
ache+0.064
 supposed+0.064
vern+0.062
Neg logit contributions
cy-0.069
sc-0.066
 Of-0.062
ed-0.059
 onto-0.058
Token.
Token ID13
Feature activation+3.39e-01

Pos logit contributions
 strangers+0.567
 tresp+0.560
 supposed+0.525
vern+0.517
 liar+0.513
Neg logit contributions
cy-0.403
sc-0.380
 up-0.374
 Of-0.363
ed-0.356
Token Then
Token ID3244
Feature activation-9.58e+00

Pos logit contributions
 strangers+0.024
 tresp+0.023
 supposed+0.021
vern+0.021
 liar+0.021
Neg logit contributions
cy-0.015
sc-0.014
 up-0.014
ed-0.014
 Of-0.013
Token one
Token ID530
Feature activation-7.96e+00

Pos logit contributions
sc+0.582
cy+0.567
ipt+0.529
 Of+0.508
ed+0.507
Neg logit contributions
 strangers-0.550
 tresp-0.514
 liar-0.422
 mean-0.418
 cruel-0.415
Token cold
Token ID4692
Feature activation+9.91e-01

Pos logit contributions
sc+0.383
cy+0.371
ed+0.343
 Of+0.340
ASH+0.339
Neg logit contributions
 strangers-0.531
 tresp-0.496
ache-0.470
 supposed-0.466
 allergic-0.462
Token winter
Token ID7374
Feature activation+4.88e+00

Pos logit contributions
 strangers+0.069
 tresp+0.065
ache+0.060
 supposed+0.060
 liar+0.058
Neg logit contributions
cy-0.045
sc-0.044
 Of-0.042
 up-0.040
 onto-0.038
Token,
Token ID11
Feature activation-3.33e-01

Pos logit contributions
 strangers+0.317
 tresp+0.306
 supposed+0.284
ache+0.282
 liar+0.267
Neg logit contributions
cy-0.241
sc-0.231
 Of-0.225
 up-0.212
 onto-0.207
Token the
Token ID262
Feature activation+1.33e+00

Pos logit contributions
cy+0.017
sc+0.016
 Of+0.015
 up+0.015
ed+0.015
Neg logit contributions
 strangers-0.022
 tresp-0.021
 supposed-0.019
ache-0.019
vern-0.019
Token family
Token ID1641
Feature activation+3.97e+00

Pos logit contributions
 strangers+0.082
 tresp+0.079
 supposed+0.071
ache+0.070
 liar+0.070
Neg logit contributions
cy-0.072
sc-0.069
 Of-0.067
ed-0.065
 up-0.065
Token decided
Token ID3066
Feature activation+6.58e-01

Pos logit contributions
 strangers+0.236
 tresp+0.232
vern+0.219
 supposed+0.216
 liar+0.210
Neg logit contributions
cy-0.222
sc-0.203
 up-0.203
ed-0.197
 Of-0.193
Token to
Token ID284
Feature activation+1.20e+00

Pos logit contributions
 strangers+0.046
 tresp+0.045
vern+0.043
 supposed+0.043
 liar+0.042
Neg logit contributions
 up-0.028
cy-0.027
sc-0.026
 Of-0.026
 onto-0.025
Token settle
Token ID12259
Feature activation-2.40e+00

Pos logit contributions
 strangers+0.072
 tresp+0.071
 supposed+0.069
ache+0.067
vern+0.066
Neg logit contributions
cy-0.064
 up-0.060
sc-0.058
 onto-0.057
 Of-0.056
Token at
Token ID379
Feature activation+1.79e-02

Pos logit contributions
cy+0.022
sc+0.021
 Of+0.020
 up+0.020
ed+0.019
Neg logit contributions
 strangers-0.030
 tresp-0.029
 supposed-0.027
vern-0.027
ache-0.027
Token his
Token ID465
Feature activation-1.86e-01

Pos logit contributions
 tresp+0.001
 strangers+0.001
ache+0.001
vern+0.001
 supposed+0.001
Neg logit contributions
cy-0.001
 up-0.001
sc-0.001
 Of-0.001
 onto-0.001
Token surroundings
Token ID21334
Feature activation-2.07e+00

Pos logit contributions
cy+0.009
sc+0.008
 Of+0.008
 up+0.008
ed+0.008
Neg logit contributions
 strangers-0.012
 tresp-0.012
 supposed-0.011
vern-0.011
ache-0.011
Token and
Token ID290
Feature activation+2.77e+00

Pos logit contributions
cy+0.098
sc+0.092
 up+0.091
 Of+0.089
 onto+0.086
Neg logit contributions
 strangers-0.138
 tresp-0.136
 supposed-0.127
vern-0.125
 liar-0.124
Token marvel
Token ID22672
Feature activation-4.97e+00

Pos logit contributions
 strangers+0.159
 tresp+0.150
 supposed+0.140
ache+0.135
 cruel+0.133
Neg logit contributions
cy-0.161
sc-0.156
 Of-0.146
 up-0.140
ed-0.139
Tokened
Token ID276
Feature activation-7.73e+00

Pos logit contributions
cy+0.138
sc+0.131
 Of+0.120
 up+0.104
 onto+0.098
Neg logit contributions
 strangers-0.427
 tresp-0.414
 supposed-0.399
ache-0.395
vern-0.392
Token at
Token ID379
Feature activation+6.13e-02

Pos logit contributions
cy+0.383
sc+0.364
 Of+0.344
 up+0.330
ed+0.323
Neg logit contributions
 strangers-0.507
 tresp-0.490
 supposed-0.459
ache-0.449
vern-0.448
Token the
Token ID262
Feature activation+1.38e+00

Pos logit contributions
 tresp+0.004
 strangers+0.004
 supposed+0.004
ache+0.004
vern+0.004
Neg logit contributions
cy-0.003
 up-0.003
 Of-0.003
 onto-0.003
sc-0.003
Token beauty
Token ID8737
Feature activation+8.24e-01

Pos logit contributions
 strangers+0.086
 tresp+0.083
 supposed+0.077
ache+0.075
vern+0.074
Neg logit contributions
cy-0.073
sc-0.069
 Of-0.067
ed-0.065
 up-0.065
Token of
Token ID286
Feature activation+2.22e+00

Pos logit contributions
 strangers+0.043
 tresp+0.038
ache+0.036
 supposed+0.036
vern+0.034
Neg logit contributions
cy-0.052
sc-0.049
 Of-0.047
ASH-0.047
ipt-0.046
Token it
Token ID340
Feature activation+1.99e-01

Pos logit contributions
ache+0.156
 tresp+0.152
vern+0.152
 supposed+0.147
 liar+0.147
Neg logit contributions
 up-0.093
cy-0.088
 Of-0.085
 onto-0.084
 down-0.082
Token a
Token ID257
Feature activation-1.27e+00

Pos logit contributions
 strangers+0.211
 tresp+0.208
 supposed+0.199
vern+0.198
ache+0.194
Neg logit contributions
cy-0.126
 up-0.121
sc-0.117
ed-0.114
 Of-0.114
Token big
Token ID1263
Feature activation+3.79e+00

Pos logit contributions
cy+0.067
 girl+0.063
sc+0.063
 up+0.063
 Of+0.060
Neg logit contributions
 strangers-0.077
 tresp-0.074
 supposed-0.073
vern-0.071
isdom-0.068
Token,
Token ID11
Feature activation-3.01e-01

Pos logit contributions
 strangers+0.264
 tresp+0.256
vern+0.255
 supposed+0.255
ache+0.245
Neg logit contributions
cy-0.169
 up-0.156
sc-0.148
 girl-0.148
 down-0.146
Token white
Token ID2330
Feature activation-1.06e-01

Pos logit contributions
cy+0.015
 up+0.015
sc+0.015
 girl+0.014
 down+0.014
Neg logit contributions
 strangers-0.019
 tresp-0.018
vern-0.018
 supposed-0.017
ache-0.017
Token box
Token ID3091
Feature activation-3.60e+00

Pos logit contributions
cy+0.005
sc+0.005
 up+0.005
 Of+0.005
ed+0.005
Neg logit contributions
 strangers-0.007
 tresp-0.006
 supposed-0.006
vern-0.006
ache-0.006
Token with
Token ID351
Feature activation-8.91e+00

Pos logit contributions
cy+0.160
sc+0.150
 up+0.144
 Of+0.143
ed+0.138
Neg logit contributions
 strangers-0.254
 tresp-0.247
 supposed-0.232
vern-0.230
ache-0.228
Token a
Token ID257
Feature activation-1.30e+00

Pos logit contributions
cy+0.390
sc+0.373
 up+0.371
 Of+0.355
ed+0.355
Neg logit contributions
 strangers-0.613
 tresp-0.598
vern-0.569
ache-0.566
 supposed-0.564
Token door
Token ID3420
Feature activation-3.36e+00

Pos logit contributions
cy+0.066
sc+0.063
 up+0.062
 Of+0.060
 onto+0.059
Neg logit contributions
 strangers-0.082
 tresp-0.079
 supposed-0.075
vern-0.073
ache-0.072
Token that
Token ID326
Feature activation+1.07e+00

Pos logit contributions
cy+0.155
sc+0.144
 Of+0.138
 up+0.133
ed+0.131
Neg logit contributions
 strangers-0.232
 tresp-0.226
 supposed-0.214
ache-0.209
vern-0.207
Token opened
Token ID4721
Feature activation-1.83e+00

Pos logit contributions
 strangers+0.059
 tresp+0.059
 supposed+0.055
 mean+0.053
 cruel+0.051
Neg logit contributions
cy-0.064
sc-0.058
 Of-0.056
ucks-0.052
iffs-0.051
Token with
Token ID351
Feature activation-8.92e+00

Pos logit contributions
cy+0.091
sc+0.086
 Of+0.083
ed+0.076
 up+0.074
Neg logit contributions
 strangers-0.120
 tresp-0.117
 supposed-0.109
vern-0.106
ache-0.105
Token He
Token ID679
Feature activation-1.89e-01

Pos logit contributions
 strangers+0.028
 tresp+0.028
 supposed+0.026
vern+0.025
ache+0.024
Neg logit contributions
cy-0.026
sc-0.025
 up-0.024
ed-0.023
 mixture-0.022
Token also
Token ID635
Feature activation+2.29e+00

Pos logit contributions
cy+0.010
sc+0.010
 up+0.009
ed+0.009
 Of+0.009
Neg logit contributions
 strangers-0.012
 tresp-0.011
 supposed-0.011
vern-0.010
 liar-0.010
Token said
Token ID531
Feature activation-4.32e+00

Pos logit contributions
 strangers+0.138
 tresp+0.131
vern+0.126
 supposed+0.123
ache+0.120
Neg logit contributions
cy-0.130
sc-0.119
 up-0.118
ed-0.117
 Of-0.114
Token sorry
Token ID7926
Feature activation+4.93e+00

Pos logit contributions
cy+0.209
 up+0.204
 Of+0.202
sc+0.201
ed+0.198
Neg logit contributions
 strangers-0.280
 tresp-0.270
 supposed-0.260
ache-0.247
vern-0.246
Token and
Token ID290
Feature activation+2.82e+00

Pos logit contributions
 strangers+0.280
 tresp+0.261
ache+0.253
 supposed+0.244
 cruel+0.235
Neg logit contributions
cy-0.292
sc-0.273
 Of-0.259
ipt-0.246
 onto-0.244
Token together
Token ID1978
Feature activation-8.43e+00

Pos logit contributions
 strangers+0.149
 tresp+0.147
 cruel+0.131
ache+0.131
 supposed+0.130
Neg logit contributions
cy-0.172
sc-0.169
 Of-0.156
 up-0.148
antly-0.146
Token they
Token ID484
Feature activation-6.32e-01

Pos logit contributions
cy+0.374
sc+0.347
 up+0.337
 Of+0.335
ed+0.324
Neg logit contributions
 strangers-0.599
 tresp-0.586
 supposed-0.553
vern-0.545
ache-0.540
Token built
Token ID3170
Feature activation+5.11e+00

Pos logit contributions
cy+0.034
ed+0.030
 up+0.029
 onto+0.028
 down+0.028
Neg logit contributions
 strangers-0.043
 tresp-0.043
vern-0.042
 liar-0.040
 supposed-0.040
Token an
Token ID281
Feature activation+2.36e+00

Pos logit contributions
 strangers+0.300
 tresp+0.288
 supposed+0.267
vern+0.257
 allergic+0.252
Neg logit contributions
cy-0.285
sc-0.275
 Of-0.266
ed-0.247
ipt-0.240
Token even
Token ID772
Feature activation+3.02e+00

Pos logit contributions
 tresp+0.131
 strangers+0.126
ache+0.122
vern+0.117
 liar+0.116
Neg logit contributions
 up-0.141
cy-0.132
 girl-0.122
 down-0.118
 onto-0.117
Token better
Token ID1365
Feature activation-4.17e+00

Pos logit contributions
 strangers+0.173
 mean+0.164
 cruel+0.161
 tresp+0.156
ache+0.152
Neg logit contributions
cy-0.173
ipt-0.157
 Of-0.156
sc-0.155
ed-0.152
Token there
Token ID612
Feature activation-4.57e-01

Pos logit contributions
 strangers+0.150
 tresp+0.144
 supposed+0.129
ache+0.128
 liar+0.123
Neg logit contributions
cy-0.165
sc-0.164
 Of-0.157
antly-0.148
umm-0.148
Token it
Token ID340
Feature activation+2.17e-01

Pos logit contributions
cy+0.025
sc+0.024
 Of+0.023
 up+0.022
ed+0.021
Neg logit contributions
 strangers-0.028
 tresp-0.027
 supposed-0.025
vern-0.025
ache-0.024
Token was
Token ID373
Feature activation-2.47e+00

Pos logit contributions
 strangers+0.012
 tresp+0.012
 supposed+0.011
ache+0.011
 cruel+0.010
Neg logit contributions
cy-0.012
 Of-0.012
sc-0.012
Wow-0.010
ipt-0.010
Token -
Token ID532
Feature activation+3.84e+00

Pos logit contributions
 up+0.111
cy+0.104
sc+0.102
 onto+0.102
 down+0.097
Neg logit contributions
 strangers-0.171
 tresp-0.164
ache-0.158
vern-0.157
 liar-0.156
Token the
Token ID262
Feature activation+1.38e+00

Pos logit contributions
 strangers+0.246
 tresp+0.238
 supposed+0.222
ache+0.214
vern+0.214
Neg logit contributions
cy-0.198
sc-0.186
 Of-0.175
ed-0.172
 up-0.169
Token biggest
Token ID4094
Feature activation-8.58e+00

Pos logit contributions
 strangers+0.089
 tresp+0.087
 supposed+0.082
vern+0.081
ache+0.079
Neg logit contributions
cy-0.070
sc-0.065
 up-0.064
 Of-0.063
 girl-0.061
Token and
Token ID290
Feature activation+2.80e+00

Pos logit contributions
cy+0.467
sc+0.437
 Of+0.425
 up+0.424
ed+0.409
Neg logit contributions
 strangers-0.523
 tresp-0.509
 supposed-0.476
vern-0.467
ache-0.463
Token most
Token ID749
Feature activation-8.10e+00

Pos logit contributions
 strangers+0.176
 tresp+0.168
 supposed+0.161
 mean+0.159
 cruel+0.158
Neg logit contributions
cy-0.144
sc-0.139
antly-0.133
ed-0.132
ipt-0.124
Token beautiful
Token ID4950
Feature activation+2.36e+00

Pos logit contributions
cy+0.533
sc+0.509
 Of+0.491
 up+0.485
ed+0.480
Neg logit contributions
 strangers-0.406
 tresp-0.391
 supposed-0.361
vern-0.349
ache-0.346
Token tree
Token ID5509
Feature activation+5.75e+00

Pos logit contributions
 strangers+0.148
 tresp+0.141
 supposed+0.131
ache+0.128
vern+0.126
Neg logit contributions
cy-0.125
sc-0.121
 Of-0.114
ed-0.111
 up-0.110
Token Lily
Token ID20037
Feature activation-3.12e+00

Pos logit contributions
 strangers+0.371
 tresp+0.360
 supposed+0.337
vern+0.330
ache+0.329
Neg logit contributions
cy-0.290
sc-0.273
 Of-0.263
 up-0.260
ed-0.253
Token.
Token ID13
Feature activation+4.38e-01

Pos logit contributions
 strangers+0.319
 tresp+0.306
 supposed+0.287
ache+0.278
vern+0.274
Neg logit contributions
cy-0.277
sc-0.259
 Of-0.249
 up-0.236
 onto-0.232
Token He
Token ID679
Feature activation-2.31e-01

Pos logit contributions
 strangers+0.029
 tresp+0.028
 supposed+0.026
 liar+0.026
vern+0.025
Neg logit contributions
cy-0.022
sc-0.021
 up-0.019
 Of-0.019
ed-0.018
Token roared
Token ID45615
Feature activation+1.37e+00

Pos logit contributions
sc+0.011
 up+0.011
cy+0.011
ed+0.010
 onto+0.010
Neg logit contributions
vern-0.015
 strangers-0.015
 tresp-0.014
 supposed-0.014
ache-0.013
Token and
Token ID290
Feature activation+2.78e+00

Pos logit contributions
 strangers+0.091
 tresp+0.090
 supposed+0.085
vern+0.084
ache+0.082
Neg logit contributions
cy-0.066
sc-0.063
 up-0.062
ed-0.060
 Of-0.059
Token ran
Token ID4966
Feature activation+2.34e+00

Pos logit contributions
 strangers+0.157
 tresp+0.153
 supposed+0.139
ache+0.136
 liar+0.135
Neg logit contributions
cy-0.166
sc-0.156
 Of-0.152
 up-0.149
ed-0.145
Token towards
Token ID3371
Feature activation-7.02e+00

Pos logit contributions
 strangers+0.192
 tresp+0.187
 supposed+0.172
 liar+0.167
ache+0.167
Neg logit contributions
cy-0.085
 Of-0.078
sc-0.068
 mixture-0.067
antly-0.065
Token them
Token ID606
Feature activation+4.96e+00

Pos logit contributions
cy+0.353
sc+0.339
 Of+0.322
 up+0.312
ed+0.307
Neg logit contributions
 strangers-0.463
 tresp-0.438
 supposed-0.406
 liar-0.395
vern-0.394
Token.
Token ID13
Feature activation+4.07e-01

Pos logit contributions
 strangers+0.321
 tresp+0.311
 supposed+0.292
ache+0.284
vern+0.283
Neg logit contributions
cy-0.253
sc-0.236
 Of-0.230
ed-0.218
 up-0.217
Token\n
Token ID198
Feature activation-7.75e-01

Pos logit contributions
 strangers+0.025
 tresp+0.024
 liar+0.022
 cruel+0.022
 supposed+0.022
Neg logit contributions
cy-0.022
sc-0.020
 up-0.019
 mixture-0.018
ed-0.018
Token\n
Token ID198
Feature activation-7.78e-01

Pos logit contributions
cy+0.027
sc+0.025
 up+0.025
 Of+0.024
ed+0.023
Neg logit contributions
 strangers-0.062
 tresp-0.061
 supposed-0.058
ache-0.057
 liar-0.056
Token"
Token ID1
Feature activation+5.61e+00

Pos logit contributions
cy+0.049
sc+0.047
 Of+0.046
 up+0.046
 onto+0.045
Neg logit contributions
 strangers-0.040
 tresp-0.038
vern-0.034
 supposed-0.034
 liar-0.034