TOP ACTIVATIONS
MAX = 0.228

They hurt their knees and elbows . They cry and hug
their heads and knees and elbows . They cry and bleed
scrap ed her knees and elbows . She felt a lot
is angry . It his ses . It scratches . It
's very mild , like tal cum powder . I 'm
nose . The cat his ses and scratches his face .
scare it . It his ses and shows its teeth .
strangers . The cat his ses at Tom and Sue .
que aky in its sharp tal ons .
a cat . She his ses and scratches at the dog
angry snake . It his ses and snaps at Ben .
of Tom . It his ses and scratches . It hurts
it . The cat his ses and scratches him . Ben
like that . It his ses and fl aps its wings
and angry . It his ses and fl aps its wings
The snake his ses . L ila and Tom
is afraid . It his ses and scratches Tom 's hand
is mean . He his ses and runs away . He
grabbed the snake with its tal ons . The hawk flew
with the pipe in its tal ons . The pipe was

MIN ACTIVATIONS
MIN = -0.218

. It says " Fr idge ". She does not know
tell Daddy 's three - year - old a story about
belonged to a three - year - old girl . She
a very adventurous three - year - old . She was
John , a three - year - old , was sitting
was a polite three - year - old . Whenever somebody
real that the 3 - year - old boy could feel
there was a three - year - old girl named Katie
Mark was a three - year - old boy who was
day , a 3 - year - old child ventured into
was a curious three - year - old . She saw
day , a three - year - old boy was very
on , the three - year - old started testing and
, a little three - year - old boy named John
was a curious three - year - old boy . Everywhere
presents ! The 3 - year - old was so happy
presents ! The 3 - year - old was so happy
and their little three - year - old daughter . They
, a small three - year old child spotted the boot
kids , a 3 - year - old and their older

INTERVAL 0.116 - 0.228
CONTAINS 0.413%

the rabbit realised she had no food to eat . She
, it 's so pretty ," Max said . " Can
so pretty with its orange wings and long be ak .
want to hurt Lily 's feelings , but he did not
It might get cold outside ," her dad said .

INTERVAL 0.005 - 0.116
CONTAINS 38.715%

there , Joe mentioned that he had never been to a
" Yeah , I 'm the fastest in the
new jewelry in her mom 's drawer . It is a
. He took a big bite , and it was so
of the hole and found herself in a beautiful garden .

INTERVAL -0.106 - 0.005
CONTAINS 60.554%

let 's go for a march !" Sally was excited and
bit scared because he had never been to the hall before
and said to his sister : " Y ay , I
they like to eat carrots and they like to hop around
The end . <|endoftext|> A unt S ari and her son

INTERVAL -0.218 - -0.106
CONTAINS 0.318%

. She prayed for help . Suddenly ,
and dad watched to make sure she was safe .
." So the four friends discussed what game to play .
safe . Jo and Jeff were relieved .
smiled . He couldn 't believe he had eaten something so
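
Note on the INTERVAL blocks above: the four ranges look like equal-width bins spanning MIN = -0.218 to MAX = 0.228 (width about 0.1115), and the four CONTAINS percentages sum to 100%. The sketch below reproduces that binning under this assumption. The function names `equal_width_intervals` and `interval_percentages` are hypothetical and are not taken from the tool that produced this dump; the real tool may bin differently.

```python
# Minimal sketch, assuming the dashboard uses equal-width bins between the
# minimum and maximum feature activation. Function names are hypothetical.

def equal_width_intervals(lo, hi, n_bins=4):
    """Return n_bins (start, end) pairs of equal width covering [lo, hi]."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def interval_percentages(activations, lo, hi, n_bins=4):
    """Return the percentage of activations that fall into each bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp so values at (or slightly beyond) the ends land in an end bin.
        idx = int((a - lo) / width)
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    total = len(activations)
    return [100.0 * c / total for c in counts]


# Bin edges implied by the MIN and MAX values reported in this dump.
intervals = equal_width_intervals(-0.218, 0.228)

# Percentages reported under the four INTERVAL headers, lowest bin first.
reported = [0.318, 60.554, 38.715, 0.413]
```

Under this reading, almost all token positions (about 99.3%) sit in the two middle bins around zero, and only about 0.4% reach the top bin that the TOP ACTIVATIONS examples are drawn from.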
Token They
Token ID1119
Feature activation+1.88e-02

Pos logit contributions
ator+0.015
-+0.015
ail+0.015
 carefully+0.015
yl+0.014
Neg logit contributions
 Leave-0.020
irms-0.019
vious-0.018
ses-0.018
Leave-0.018
Token hurt
Token ID5938
Feature activation+1.18e-01

Pos logit contributions
 Leave+0.035
irms+0.032
ses+0.032
Ha+0.031
Leave+0.030
Neg logit contributions
 carefully-0.035
--0.031
ep-0.030
ail-0.030
ator-0.029
Token their
Token ID511
Feature activation-9.62e-03

Pos logit contributions
 Leave+0.287
irms+0.262
ses+0.255
Ha+0.252
vious+0.251
Neg logit contributions
--0.162
ator-0.160
ail-0.149
 carefully-0.140
yl-0.138
Token knees
Token ID14475
Feature activation+9.10e-02

Pos logit contributions
-+0.013
ator+0.013
ail+0.013
 carefully+0.012
lf+0.011
Neg logit contributions
 Leave-0.024
irms-0.021
vious-0.021
Ha-0.021
Leave-0.021
Token and
Token ID290
Feature activation-6.11e-03

Pos logit contributions
 Leave+0.247
irms+0.224
Leave+0.220
ses+0.220
vious+0.217
Neg logit contributions
ator-0.101
ail-0.090
--0.088
yl-0.085
lf-0.079
Token elbows
Token ID37076
Feature activation+2.28e-01

Pos logit contributions
-+0.013
 carefully+0.013
ep+0.012
 room+0.012
 next+0.011
Neg logit contributions
 Leave-0.009
vious-0.009
Ha-0.009
rified-0.008
 Wally-0.007
Token.
Token ID13
Feature activation-7.11e-03

Pos logit contributions
 Leave+0.621
irms+0.570
ses+0.562
Leave+0.551
 drown+0.547
Neg logit contributions
ator-0.249
ail-0.225
--0.211
yl-0.207
lf-0.192
Token They
Token ID1119
Feature activation+1.92e-02

Pos logit contributions
-+0.009
ator+0.009
ail+0.009
 carefully+0.008
An+0.008
Neg logit contributions
 Leave-0.016
vious-0.016
irms-0.016
ses-0.015
awks-0.015
Token cry
Token ID3960
Feature activation+1.09e-01

Pos logit contributions
 Leave+0.033
Ha+0.033
irms+0.033
ses+0.032
 Pip+0.031
Neg logit contributions
 carefully-0.036
--0.031
 very-0.029
a-0.029
 quickly-0.029
Token and
Token ID290
Feature activation-5.67e-03

Pos logit contributions
 Leave+0.319
irms+0.297
ses+0.290
Ha+0.288
Leave+0.287
Neg logit contributions
--0.096
ator-0.092
ail-0.085
 carefully-0.075
yl-0.073
Token hug
Token ID16225
Feature activation+4.52e-02

Pos logit contributions
 carefully+0.010
-+0.010
 very+0.009
 room+0.009
 quickly+0.009
Neg logit contributions
vious-0.010
 Leave-0.010
Ha-0.010
rified-0.009
umm-0.009
Token their
Token ID511
Feature activation-1.31e-02

Pos logit contributions
 Leave+0.260
irms+0.241
ses+0.232
Leave+0.227
Ha+0.225
Neg logit contributions
ator-0.185
ail-0.177
--0.175
yl-0.162
lf-0.159
Token heads
Token ID6665
Feature activation+9.03e-02

Pos logit contributions
-+0.020
ator+0.020
ail+0.019
 carefully+0.018
yl+0.017
Neg logit contributions
 Leave-0.030
irms-0.027
vious-0.027
Leave-0.026
Ha-0.026
Token and
Token ID290
Feature activation-3.96e-03

Pos logit contributions
 Leave+0.224
irms+0.200
Leave+0.197
 drown+0.196
vious+0.195
Neg logit contributions
ator-0.126
ail-0.115
yl-0.108
lf-0.103
ades-0.096
Token knees
Token ID14475
Feature activation+9.12e-02

Pos logit contributions
-+0.008
 carefully+0.007
ep+0.007
 room+0.007
 together+0.006
Neg logit contributions
 Leave-0.007
vious-0.007
Ha-0.006
rified-0.006
 Pip-0.006
Token and
Token ID290
Feature activation-4.05e-03

Pos logit contributions
 Leave+0.242
irms+0.220
Leave+0.217
ses+0.213
 drown+0.213
Neg logit contributions
ator-0.113
ail-0.100
yl-0.093
lf-0.090
--0.088
Token elbows
Token ID37076
Feature activation+2.27e-01

Pos logit contributions
-+0.009
 carefully+0.009
ep+0.009
 room+0.008
 next+0.008
Neg logit contributions
 Leave-0.006
vious-0.005
 Pip-0.005
 Wally-0.005
rified-0.005
Token.
Token ID13
Feature activation-4.97e-03

Pos logit contributions
 Leave+0.597
irms+0.551
ses+0.541
 drown+0.529
Leave+0.527
Neg logit contributions
ator-0.276
ail-0.253
yl-0.233
--0.223
lf-0.217
Token They
Token ID1119
Feature activation+1.98e-02

Pos logit contributions
-+0.007
ator+0.006
An+0.006
yl+0.006
 carefully+0.006
Neg logit contributions
 Leave-0.011
vious-0.011
irms-0.011
ses-0.010
Leave-0.010
Token cry
Token ID3960
Feature activation+1.09e-01

Pos logit contributions
 Leave+0.032
 Pip+0.031
 Wally+0.031
irms+0.031
Ha+0.030
Neg logit contributions
 carefully-0.039
--0.034
ep-0.033
 very-0.032
 quickly-0.032
Token and
Token ID290
Feature activation-4.40e-03

Pos logit contributions
 Leave+0.311
irms+0.290
ses+0.283
Leave+0.277
Ha+0.276
Neg logit contributions
ator-0.105
--0.097
ail-0.093
yl-0.084
lf-0.080
Token bleed
Token ID30182
Feature activation+1.57e-01

Pos logit contributions
 carefully+0.008
-+0.008
 room+0.007
 very+0.007
 quickly+0.007
Neg logit contributions
 Leave-0.008
vious-0.008
Ha-0.007
rified-0.007
Leave-0.007
Token scrap
Token ID15881
Feature activation+1.18e-01

Pos logit contributions
-+0.007
 carefully+0.006
ep+0.006
lf+0.006
yl+0.006
Neg logit contributions
 Leave-0.008
irms-0.007
ses-0.007
vious-0.007
awks-0.007
Tokened
Token ID276
Feature activation+3.73e-02

Pos logit contributions
 Leave+0.244
irms+0.221
ses+0.218
 drown+0.210
Leave+0.209
Neg logit contributions
--0.206
ator-0.197
ail-0.189
yl-0.180
 carefully-0.177
Token her
Token ID607
Feature activation-1.96e-02

Pos logit contributions
 Leave+0.088
irms+0.080
ses+0.079
Ha+0.076
Leave+0.076
Neg logit contributions
--0.055
ator-0.052
ail-0.049
 carefully-0.047
yl-0.047
Token knees
Token ID14475
Feature activation+8.59e-02

Pos logit contributions
-+0.044
ator+0.044
ail+0.043
 carefully+0.041
yl+0.041
Neg logit contributions
 Leave-0.030
irms-0.026
ses-0.025
Ha-0.024
Leave-0.024
Token and
Token ID290
Feature activation-4.05e-03

Pos logit contributions
 Leave+0.222
irms+0.201
Leave+0.200
vious+0.199
ses+0.197
Neg logit contributions
ator-0.110
ail-0.097
yl-0.089
lf-0.087
--0.084
Token elbows
Token ID37076
Feature activation+2.19e-01

Pos logit contributions
-+0.009
 carefully+0.008
lf+0.008
ep+0.008
 room+0.007
Neg logit contributions
 Leave-0.007
vious-0.006
irms-0.005
 Sorry-0.005
Ha-0.005
Token.
Token ID13
Feature activation-8.29e-03

Pos logit contributions
 Leave+0.597
irms+0.552
ses+0.544
Leave+0.535
vious+0.529
Neg logit contributions
ator-0.240
ail-0.215
--0.199
yl-0.190
lf-0.182
Token She
Token ID1375
Feature activation-1.04e-02

Pos logit contributions
-+0.011
 carefully+0.011
ail+0.011
 new+0.010
yl+0.010
Neg logit contributions
 Leave-0.017
 drown-0.017
vious-0.017
haw-0.016
irms-0.016
Token felt
Token ID2936
Feature activation+4.04e-02

Pos logit contributions
-+0.018
 carefully+0.017
ep+0.017
lf+0.016
a+0.015
Neg logit contributions
 Leave-0.022
irms-0.021
ses-0.020
 Sorry-0.020
awks-0.019
Token a
Token ID257
Feature activation-8.15e-03

Pos logit contributions
ses+0.099
 Leave+0.097
 Sorry+0.097
Ha+0.092
use+0.092
Neg logit contributions
--0.050
 very-0.049
 new-0.046
 carefully-0.045
 special-0.038
Token lot
Token ID1256
Feature activation+1.44e-02

Pos logit contributions
-+0.015
 carefully+0.013
ator+0.013
ail+0.013
yl+0.012
Neg logit contributions
 Leave-0.017
irms-0.016
ses-0.015
vious-0.015
awks-0.014
Token is
Token ID318
Feature activation+2.17e-02

Pos logit contributions
-+0.001
ator+0.001
 carefully+0.001
ail+0.001
ep+0.001
Neg logit contributions
 Leave-0.001
irms-0.001
 Pip-0.001
ses-0.001
Leave-0.001
Token angry
Token ID7954
Feature activation+5.50e-02

Pos logit contributions
 Leave+0.051
irms+0.047
ses+0.045
Leave+0.045
Ha+0.045
Neg logit contributions
ator-0.031
--0.031
ail-0.029
yl-0.027
 carefully-0.027
Token.
Token ID13
Feature activation-5.73e-03

Pos logit contributions
 Leave+0.161
irms+0.150
ses+0.147
Leave+0.145
Ha+0.145
Neg logit contributions
ator-0.047
--0.046
ail-0.043
yl-0.037
 carefully-0.036
Token It
Token ID632
Feature activation-3.23e-03

Pos logit contributions
-+0.011
ator+0.011
ail+0.010
yl+0.010
 carefully+0.010
Neg logit contributions
 Leave-0.010
irms-0.010
ses-0.009
vious-0.009
 drown-0.009
Token his
Token ID465
Feature activation+3.24e-02

Pos logit contributions
-+0.007
 carefully+0.006
ep+0.006
ator+0.006
yl+0.006
Neg logit contributions
 Leave-0.005
irms-0.005
 Pip-0.005
Leave-0.005
 drown-0.004
Tokenses
Token ID8448
Feature activation+2.07e-01

Pos logit contributions
 Leave+0.042
Leave+0.035
irms+0.034
vious+0.034
Ha+0.032
Neg logit contributions
--0.083
ator-0.079
ail-0.078
 carefully-0.076
yl-0.075
Token.
Token ID13
Feature activation-5.00e-03

Pos logit contributions
 Leave+0.611
irms+0.575
ses+0.560
Ha+0.552
Leave+0.551
Neg logit contributions
--0.175
ator-0.154
ail-0.140
 carefully-0.134
yl-0.124
Token It
Token ID632
Feature activation-4.65e-03

Pos logit contributions
-+0.009
ator+0.009
yl+0.008
 carefully+0.008
ail+0.008
Neg logit contributions
 Leave-0.009
irms-0.009
ses-0.008
 drown-0.008
vious-0.008
Token scratches
Token ID37627
Feature activation+7.98e-02

Pos logit contributions
-+0.010
 carefully+0.009
ep+0.009
yl+0.008
ator+0.008
Neg logit contributions
 Leave-0.008
irms-0.007
 Pip-0.007
Leave-0.006
 Woo-0.006
Token.
Token ID13
Feature activation-4.38e-03

Pos logit contributions
 Leave+0.174
irms+0.158
ses+0.151
Leave+0.151
Ha+0.150
Neg logit contributions
ator-0.130
--0.123
ail-0.123
yl-0.113
lf-0.112
Token It
Token ID632
Feature activation-6.11e-03

Pos logit contributions
-+0.007
ator+0.007
yl+0.007
 carefully+0.007
ail+0.006
Neg logit contributions
 Leave-0.009
irms-0.008
vious-0.008
ses-0.008
 drown-0.008
Token's
Token ID338
Feature activation+8.85e-03

Pos logit contributions
ator+0.125
lf+0.113
-+0.113
ail+0.113
yl+0.111
Neg logit contributions
 Leave-0.153
ses-0.131
vious-0.131
irms-0.130
 Are-0.128
Token very
Token ID845
Feature activation-5.46e-02

Pos logit contributions
 Leave+0.019
ses+0.018
irms+0.017
 Pip+0.017
 pain+0.016
Neg logit contributions
ator-0.014
--0.013
ail-0.013
ades-0.012
yl-0.012
Token mild
Token ID11607
Feature activation+2.19e-02

Pos logit contributions
ator+0.082
ail+0.072
-+0.071
lf+0.069
ades+0.067
Neg logit contributions
 Leave-0.117
 Pip-0.113
irms-0.113
ses-0.111
 pain-0.107
Token,
Token ID11
Feature activation+7.40e-03

Pos logit contributions
 Leave+0.055
irms+0.051
ses+0.049
Leave+0.049
Ha+0.049
Neg logit contributions
--0.028
ator-0.028
ail-0.026
 carefully-0.024
yl-0.024
Token like
Token ID588
Feature activation+1.02e-02

Pos logit contributions
 Leave+0.018
irms+0.016
ses+0.016
Leave+0.015
Ha+0.015
Neg logit contributions
ator-0.010
--0.010
ail-0.010
yl-0.009
 carefully-0.009
Token tal
Token ID3305
Feature activation+2.06e-01

Pos logit contributions
 Leave+0.025
irms+0.023
Leave+0.022
ses+0.022
Ha+0.022
Neg logit contributions
--0.014
ator-0.013
 carefully-0.012
ail-0.012
yl-0.012
Tokencum
Token ID36340
Feature activation-1.93e-02

Pos logit contributions
 Leave+0.484
irms+0.440
vious+0.438
Leave+0.429
Ha+0.417
Neg logit contributions
ator-0.294
ail-0.259
--0.258
yl-0.242
pack-0.241
Token powder
Token ID11913
Feature activation-8.37e-02

Pos logit contributions
ator+0.025
-+0.024
ail+0.023
lf+0.021
 carefully+0.021
Neg logit contributions
 Leave-0.047
irms-0.043
ses-0.043
vious-0.042
Ha-0.041
Token.
Token ID13
Feature activation-6.08e-03

Pos logit contributions
ator+0.112
ail+0.102
-+0.101
yl+0.100
ep+0.095
Neg logit contributions
 Leave-0.197
irms-0.177
ses-0.176
vious-0.174
 Pip-0.171
Token I
Token ID314
Feature activation+8.47e-02

Pos logit contributions
ator+0.012
-+0.012
ail+0.012
 carefully+0.011
yl+0.011
Neg logit contributions
 Leave-0.010
irms-0.009
ses-0.009
Leave-0.009
 drown-0.008
Token'm
Token ID1101
Feature activation+3.50e-03

Pos logit contributions
 Leave+0.152
irms+0.137
ses+0.130
Leave+0.128
Ha+0.126
Neg logit contributions
--0.169
ator-0.167
ail-0.162
 carefully-0.157
yl-0.154
Token nose
Token ID9686
Feature activation+7.11e-02

Pos logit contributions
-+0.035
ator+0.035
ail+0.033
yl+0.031
 carefully+0.031
Neg logit contributions
 Leave-0.036
irms-0.032
ses-0.031
Ha-0.030
Leave-0.030
Token.
Token ID13
Feature activation+2.25e-03

Pos logit contributions
 Leave+0.179
irms+0.168
vious+0.163
awks+0.158
Leave+0.158
Neg logit contributions
ator-0.097
ail-0.083
yl-0.077
lf-0.076
pack-0.071
Token The
Token ID383
Feature activation+1.40e-02

Pos logit contributions
 Leave+0.004
irms+0.004
vious+0.004
Leave+0.004
ses+0.004
Neg logit contributions
--0.004
 carefully-0.004
ail-0.004
ator-0.004
yl-0.003
Token cat
Token ID3797
Feature activation+3.63e-03

Pos logit contributions
 Leave+0.030
irms+0.027
vious+0.026
 drown+0.025
Leave+0.025
Neg logit contributions
--0.023
ator-0.022
ail-0.021
 carefully-0.020
yl-0.020
Token his
Token ID465
Feature activation+2.43e-02

Pos logit contributions
 Leave+0.007
irms+0.006
ses+0.006
Leave+0.006
 Pip+0.006
Neg logit contributions
--0.007
 carefully-0.006
ail-0.006
ator-0.006
yl-0.006
Tokenses
Token ID8448
Feature activation+2.06e-01

Pos logit contributions
 Leave+0.036
irms+0.031
vious+0.030
Ha+0.030
Leave+0.030
Neg logit contributions
--0.058
ail-0.053
ator-0.052
 carefully-0.051
yl-0.050
Token and
Token ID290
Feature activation+3.29e-03

Pos logit contributions
 Leave+0.607
irms+0.574
ses+0.563
 drown+0.552
Ha+0.550
Neg logit contributions
--0.175
ator-0.143
 carefully-0.134
ail-0.132
yl-0.118
Token scratches
Token ID37627
Feature activation+8.31e-02

Pos logit contributions
 Leave+0.007
vious+0.006
rified+0.006
irms+0.006
Leave+0.006
Neg logit contributions
--0.005
 carefully-0.005
 new-0.005
ep-0.004
 next-0.004
Token his
Token ID465
Feature activation+2.44e-02

Pos logit contributions
 Leave+0.179
irms+0.163
ses+0.158
Leave+0.154
Ha+0.154
Neg logit contributions
ator-0.136
--0.134
ail-0.128
yl-0.120
 carefully-0.119
Token face
Token ID1986
Feature activation+4.67e-02

Pos logit contributions
 Leave+0.044
irms+0.039
Ha+0.038
vious+0.037
Leave+0.037
Neg logit contributions
--0.050
ail-0.046
ator-0.045
 carefully-0.044
yl-0.043
Token.
Token ID13
Feature activation+2.65e-03

Pos logit contributions
 Leave+0.121
irms+0.111
vious+0.109
Leave+0.106
 pain+0.105
Neg logit contributions
ator-0.062
ail-0.054
yl-0.049
lf-0.048
ades-0.045
Token scare
Token ID19437
Feature activation+3.17e-02

Pos logit contributions
 Leave+0.021
irms+0.019
ses+0.019
Ha+0.018
vious+0.018
Neg logit contributions
ator-0.017
--0.017
ail-0.016
 carefully-0.016
yl-0.015
Token it
Token ID340
Feature activation-2.00e-02

Pos logit contributions
 Leave+0.071
ses+0.071
irms+0.065
 drown+0.064
Leave+0.063
Neg logit contributions
--0.050
 carefully-0.043
ail-0.038
yl-0.038
ator-0.037
Token.
Token ID13
Feature activation+1.70e-03

Pos logit contributions
ator+0.027
-+0.026
ail+0.025
yl+0.023
lf+0.023
Neg logit contributions
 Leave-0.049
irms-0.045
ses-0.044
Leave-0.043
Ha-0.043
Token It
Token ID632
Feature activation-1.28e-03

Pos logit contributions
 Leave+0.003
 drown+0.003
Leave+0.002
 bleed+0.002
ses+0.002
Neg logit contributions
 carefully-0.003
--0.003
 patiently-0.003
 new-0.003
"-0.003
Token his
Token ID465
Feature activation+2.20e-02

Pos logit contributions
-+0.003
 carefully+0.003
ator+0.002
ail+0.002
yl+0.002
Neg logit contributions
 Leave-0.002
irms-0.002
 Pip-0.002
ses-0.002
Leave-0.002
Tokenses
Token ID8448
Feature activation+2.05e-01

Pos logit contributions
 Leave+0.040
irms+0.035
Leave+0.034
Ha+0.034
vious+0.034
Neg logit contributions
--0.044
ator-0.043
ail-0.042
 carefully-0.040
yl-0.040
Token and
Token ID290
Feature activation+2.52e-03

Pos logit contributions
 Leave+0.588
irms+0.549
ses+0.536
Ha+0.527
Leave+0.527
Neg logit contributions
--0.186
ator-0.183
ail-0.166
 carefully-0.148
yl-0.147
Token shows
Token ID2523
Feature activation-3.31e-02

Pos logit contributions
 Leave+0.004
vious+0.004
rified+0.004
irms+0.004
Leave+0.004
Neg logit contributions
--0.005
 carefully-0.004
ep-0.004
 new-0.004
yl-0.004
Token its
Token ID663
Feature activation+3.29e-02

Pos logit contributions
-+0.063
 carefully+0.058
ator+0.056
ail+0.055
yl+0.052
Neg logit contributions
 Leave-0.064
ses-0.059
irms-0.058
Leave-0.057
 drown-0.056
Token teeth
Token ID9941
Feature activation+1.58e-01

Pos logit contributions
 Leave+0.066
irms+0.060
ses+0.058
Leave+0.056
Ha+0.056
Neg logit contributions
ator-0.058
--0.057
ail-0.055
yl-0.052
 carefully-0.051
Token.
Token ID13
Feature activation+2.04e-03

Pos logit contributions
 Leave+0.407
irms+0.367
vious+0.359
Leave+0.357
ses+0.356
Neg logit contributions
ator-0.206
ail-0.181
yl-0.164
lf-0.164
--0.162
Token strangers
Token ID18981
Feature activation+8.73e-02

Pos logit contributions
 Leave+0.028
irms+0.026
ses+0.025
Ha+0.025
Leave+0.024
Neg logit contributions
--0.020
ator-0.019
ail-0.018
 carefully-0.018
yl-0.018
Token.
Token ID13
Feature activation-1.08e-03

Pos logit contributions
 Leave+0.264
irms+0.248
ses+0.243
Ha+0.240
Leave+0.239
Neg logit contributions
--0.065
ator-0.062
ail-0.055
 carefully-0.049
yl-0.048
Token The
Token ID383
Feature activation+1.27e-02

Pos logit contributions
-+0.002
 carefully+0.002
ator+0.002
yl+0.002
ail+0.002
Neg logit contributions
 Leave-0.002
irms-0.002
 drown-0.002
ses-0.002
vious-0.002
Token cat
Token ID3797
Feature activation+9.93e-04

Pos logit contributions
 Leave+0.026
irms+0.023
Ha+0.022
vious+0.022
Leave+0.022
Neg logit contributions
--0.023
ator-0.022
ail-0.021
yl-0.020
 carefully-0.020
Token his
Token ID465
Feature activation+2.26e-02

Pos logit contributions
 Leave+0.002
irms+0.002
Leave+0.002
ses+0.002
Ha+0.002
Neg logit contributions
--0.002
 carefully-0.002
ail-0.002
ator-0.002
yl-0.002
Tokenses
Token ID8448
Feature activation+2.05e-01

Pos logit contributions
 Leave+0.024
irms+0.019
Ha+0.018
Leave+0.018
vious+0.017
Neg logit contributions
--0.062
ator-0.061
ail-0.060
yl-0.057
 carefully-0.057
Token at
Token ID379
Feature activation-7.07e-04

Pos logit contributions
 Leave+0.563
irms+0.522
ses+0.509
Leave+0.501
Ha+0.500
Neg logit contributions
ator-0.215
--0.207
ail-0.196
yl-0.176
 carefully-0.170
Token Tom
Token ID4186
Feature activation-4.52e-03

Pos logit contributions
ator+0.001
-+0.001
ail+0.001
yl+0.001
 carefully+0.001
Neg logit contributions
 Leave-0.001
irms-0.001
ses-0.001
Leave-0.001
Ha-0.001
Token and
Token ID290
Feature activation+6.39e-04

Pos logit contributions
ator+0.005
ail+0.004
-+0.004
yl+0.004
ades+0.004
Neg logit contributions
 Leave-0.012
irms-0.011
ses-0.011
 drown-0.011
Leave-0.011
Token Sue
Token ID26089
Feature activation-2.20e-02

Pos logit contributions
 Leave+0.001
vious+0.001
irms+0.001
Leave+0.001
 drown+0.001
Neg logit contributions
--0.001
 carefully-0.001
 new-0.001
yl-0.001
 patiently-0.001
Token.
Token ID13
Feature activation-6.36e-04

Pos logit contributions
-+0.018
ail+0.016
ator+0.016
 carefully+0.015
yl+0.013
Neg logit contributions
 Leave-0.064
irms-0.061
ses-0.059
Ha-0.059
Leave-0.059
Tokenque
Token ID4188
Feature activation+1.29e-02

Pos logit contributions
 Leave+0.025
vious+0.021
irms+0.020
Ha+0.020
 Pip+0.020
Neg logit contributions
ator-0.032
lf-0.030
--0.029
ail-0.029
 carefully-0.029
Tokenaky
Token ID15492
Feature activation-9.34e-03

Pos logit contributions
 Leave+0.029
irms+0.026
awks+0.026
ses+0.025
vious+0.025
Neg logit contributions
ator-0.021
lf-0.018
ail-0.018
--0.018
yl-0.017
Token in
Token ID287
Feature activation-9.68e-03

Pos logit contributions
ator+0.014
ail+0.012
-+0.012
yl+0.012
lf+0.012
Neg logit contributions
 Leave-0.022
irms-0.020
ses-0.019
vious-0.019
 drown-0.019
Token its
Token ID663
Feature activation+3.13e-02

Pos logit contributions
-+0.017
ator+0.016
ail+0.016
yl+0.015
 carefully+0.015
Neg logit contributions
 Leave-0.020
irms-0.018
ses-0.018
Ha-0.017
vious-0.017
Token sharp
Token ID7786
Feature activation+5.09e-02

Pos logit contributions
 Leave+0.058
irms+0.053
vious+0.051
Ha+0.050
Leave+0.050
Neg logit contributions
--0.062
ail-0.058
ator-0.057
 carefully-0.055
ep-0.053
Token tal
Token ID3305
Feature activation+2.04e-01

Pos logit contributions
 Leave+0.076
irms+0.067
Ha+0.066
 Sorry+0.064
Leave+0.064
Neg logit contributions
--0.126
ail-0.109
ep-0.107
ator-0.106
 carefully-0.105
Tokenons
Token ID684
Feature activation-5.65e-02

Pos logit contributions
 Leave+0.511
irms+0.463
Leave+0.446
vious+0.444
Ha+0.436
Neg logit contributions
ator-0.275
--0.244
ail-0.239
yl-0.224
 carefully-0.219
Token.
Token ID13
Feature activation-9.15e-03

Pos logit contributions
ator+0.062
-+0.054
ail+0.053
yl+0.050
lf+0.049
Neg logit contributions
 Leave-0.152
irms-0.139
ses-0.137
 Pip-0.135
 drown-0.134
Token
Token ID220
Feature activation+4.66e-03

Pos logit contributions
ail+0.013
-+0.013
ator+0.012
 carefully+0.012
yl+0.012
Neg logit contributions
 Leave-0.020
irms-0.020
vious-0.019
ses-0.019
Leave-0.018
Token
Token ID198
Feature activation-4.57e-03

Pos logit contributions
 Leave+0.011
irms+0.010
ses+0.010
Ha+0.009
Leave+0.009
Neg logit contributions
ator-0.007
--0.007
ail-0.006
yl-0.006
lf-0.006
Token
Token ID198
Feature activation-4.27e-03

Pos logit contributions
ator+0.007
-+0.007
ail+0.006
lf+0.006
yl+0.006
Neg logit contributions
 Leave-0.011
irms-0.010
Ha-0.009
ses-0.009
Leave-0.009
Token a
Token ID257
Feature activation-8.90e-03

Pos logit contributions
 Leave+0.023
irms+0.022
ses+0.021
 drown+0.020
vious+0.020
Neg logit contributions
--0.024
ator-0.023
ail-0.022
 carefully-0.022
yl-0.022
Token cat
Token ID3797
Feature activation-6.19e-04

Pos logit contributions
ator+0.019
-+0.019
ail+0.018
yl+0.018
 carefully+0.017
Neg logit contributions
 Leave-0.015
irms-0.013
ses-0.012
Ha-0.012
Leave-0.012
Token.
Token ID13
Feature activation-1.52e-03

Pos logit contributions
-+0.001
ator+0.001
ail+0.000
 carefully+0.000
yl+0.000
Neg logit contributions
 Leave-0.002
irms-0.002
ses-0.002
Ha-0.002
Leave-0.002
Token She
Token ID1375
Feature activation-9.27e-03

Pos logit contributions
-+0.003
ail+0.003
 carefully+0.003
yl+0.002
"+0.002
Neg logit contributions
 Leave-0.003
ses-0.003
irms-0.003
 drown-0.003
vious-0.003
Token his
Token ID465
Feature activation+2.03e-02

Pos logit contributions
-+0.020
 carefully+0.019
ail+0.019
ator+0.019
yl+0.018
Neg logit contributions
 Leave-0.014
irms-0.013
Leave-0.012
ses-0.012
vious-0.012
Tokenses
Token ID8448
Feature activation+2.04e-01

Pos logit contributions
 Leave+0.024
irms+0.019
Leave+0.019
Ha+0.018
vious+0.018
Neg logit contributions
--0.054
ator-0.052
ail-0.051
 carefully-0.049
yl-0.049
Token and
Token ID290
Feature activation+7.38e-04

Pos logit contributions
 Leave+0.548
irms+0.507
ses+0.494
Leave+0.487
Ha+0.486
Neg logit contributions
ator-0.228
--0.217
ail-0.209
yl-0.188
lf-0.183
Token scratches
Token ID37627
Feature activation+8.02e-02

Pos logit contributions
 Leave+0.001
vious+0.001
rified+0.001
Leave+0.001
irms+0.001
Neg logit contributions
--0.001
 carefully-0.001
ep-0.001
 new-0.001
yl-0.001
Token at
Token ID379
Feature activation-1.18e-03

Pos logit contributions
 Leave+0.179
irms+0.163
ses+0.158
Ha+0.155
Leave+0.155
Neg logit contributions
ator-0.124
--0.123
ail-0.117
yl-0.110
 carefully-0.109
Token the
Token ID262
Feature activation+4.44e-03

Pos logit contributions
-+0.002
ator+0.002
ail+0.002
 carefully+0.002
yl+0.002
Neg logit contributions
 Leave-0.003
irms-0.002
vious-0.002
ses-0.002
Ha-0.002
Token dog
Token ID3290
Feature activation+5.30e-02

Pos logit contributions
 Leave+0.008
irms+0.007
Ha+0.007
vious+0.007
Leave+0.007
Neg logit contributions
--0.009
ator-0.008
ail-0.008
 carefully-0.007
lf-0.007
Token angry
Token ID7954
Feature activation+5.11e-02

Pos logit contributions
 Leave+0.023
irms+0.021
vious+0.020
 drown+0.020
Ha+0.019
Neg logit contributions
--0.014
 carefully-0.012
ep-0.011
yl-0.011
ator-0.010
Token snake
Token ID17522
Feature activation-8.83e-03

Pos logit contributions
 Leave+0.114
Leave+0.101
irms+0.098
Ha+0.097
 drown+0.096
Neg logit contributions
--0.098
ator-0.079
yl-0.076
ail-0.072
lf-0.071
Token.
Token ID13
Feature activation+3.07e-04

Pos logit contributions
-+0.009
ator+0.008
ail+0.008
yl+0.007
 carefully+0.007
Neg logit contributions
 Leave-0.024
irms-0.023
Ha-0.022
ses-0.022
Leave-0.022
Token It
Token ID632
Feature activation-2.24e-03

Pos logit contributions
 Leave+0.001
irms+0.001
ses+0.001
Leave+0.000
vious+0.000
Neg logit contributions
--0.001
ator-0.001
ail-0.001
 carefully-0.001
yl-0.001
Token his
Token ID465
Feature activation+2.35e-02

Pos logit contributions
-+0.005
 carefully+0.004
ep+0.004
ator+0.004
ail+0.004
Neg logit contributions
 Leave-0.004
irms-0.003
 Pip-0.003
Leave-0.003
 Woo-0.003
Tokenses
Token ID8448
Feature activation+2.03e-01

Pos logit contributions
 Leave+0.039
Leave+0.033
irms+0.033
Ha+0.032
vious+0.032
Neg logit contributions
--0.052
ator-0.050
ail-0.049
 carefully-0.047
yl-0.046
Token and
Token ID290
Feature activation+1.62e-03

Pos logit contributions
 Leave+0.577
irms+0.543
ses+0.528
Ha+0.519
Leave+0.519
Neg logit contributions
--0.194
ator-0.176
ail-0.162
 carefully-0.154
yl-0.146
Token snaps
Token ID23429
Feature activation+3.62e-02

Pos logit contributions
 Leave+0.003
vious+0.003
irms+0.002
rified+0.002
Leave+0.002
Neg logit contributions
--0.003
 carefully-0.003
ep-0.003
 new-0.003
yl-0.003
Token at
Token ID379
Feature activation-4.98e-04

Pos logit contributions
 Leave+0.086
irms+0.079
ses+0.076
Leave+0.075
Ha+0.075
Neg logit contributions
ator-0.052
--0.050
ail-0.049
yl-0.045
lf-0.044
Token Ben
Token ID3932
Feature activation+2.80e-02

Pos logit contributions
ator+0.001
-+0.001
ail+0.001
yl+0.001
 carefully+0.001
Neg logit contributions
 Leave-0.001
irms-0.001
Leave-0.001
ses-0.001
vious-0.001
Token.
Token ID13
Feature activation+7.38e-04

Pos logit contributions
 Leave+0.078
irms+0.072
ses+0.070
 drown+0.069
vious+0.068
Neg logit contributions
ator-0.029
ail-0.026
--0.025
yl-0.023
lf-0.022
Token of
Token ID286
Feature activation-1.95e-02

Pos logit contributions
 Leave+0.195
ses+0.184
irms+0.184
Ha+0.178
 drown+0.177
Neg logit contributions
--0.047
ator-0.038
ail-0.034
 carefully-0.033
yl-0.029
Token Tom
Token ID4186
Feature activation-7.04e-03

Pos logit contributions
ator+0.033
-+0.033
ail+0.032
yl+0.030
 carefully+0.029
Neg logit contributions
 Leave-0.040
irms-0.037
ses-0.035
Leave-0.035
Ha-0.035
Token.
Token ID13
Feature activation-3.22e-03

Pos logit contributions
ator+0.008
ail+0.007
-+0.007
yl+0.006
ades+0.006
Neg logit contributions
 Leave-0.019
irms-0.018
ses-0.017
vious-0.017
 drown-0.017
Token It
Token ID632
Feature activation-6.14e-03

Pos logit contributions
-+0.006
 carefully+0.006
yl+0.005
"+0.005
ep+0.005
Neg logit contributions
 Leave-0.005
 drown-0.005
irms-0.005
ses-0.005
Leave-0.005
Token his
Token ID465
Feature activation+2.09e-02

Pos logit contributions
-+0.012
 carefully+0.012
ail+0.010
ep+0.010
yl+0.010
Neg logit contributions
 Leave-0.011
 Pip-0.010
irms-0.010
Leave-0.009
 Woo-0.009
Tokenses
Token ID8448
Feature activation+2.03e-01

Pos logit contributions
 Leave+0.028
Leave+0.023
vious+0.023
Ha+0.022
irms+0.022
Neg logit contributions
--0.054
ail-0.050
 carefully-0.049
ator-0.048
 new-0.047
Token and
Token ID290
Feature activation-8.41e-04

Pos logit contributions
 Leave+0.587
irms+0.548
ses+0.535
Ha+0.527
Leave+0.527
Neg logit contributions
--0.180
ator-0.173
ail-0.158
 carefully-0.142
yl-0.139
Token scratches
Token ID37627
Feature activation+7.91e-02

Pos logit contributions
-+0.002
 carefully+0.001
 new+0.001
ep+0.001
yl+0.001
Neg logit contributions
 Leave-0.001
vious-0.001
rified-0.001
irms-0.001
Leave-0.001
Token.
Token ID13
Feature activation-2.84e-03

Pos logit contributions
 Leave+0.204
irms+0.189
ses+0.183
Leave+0.180
Ha+0.180
Neg logit contributions
ator-0.097
--0.093
ail-0.090
yl-0.082
 carefully-0.079
Token It
Token ID632
Feature activation-6.13e-03

Pos logit contributions
-+0.005
 carefully+0.005
yl+0.005
ep+0.005
ator+0.005
Neg logit contributions
 Leave-0.005
 drown-0.004
irms-0.004
vious-0.004
ses-0.004
Token hurts
Token ID20406
Feature activation+1.48e-01

Pos logit contributions
-+0.013
 carefully+0.012
ep+0.011
ail+0.011
yl+0.011
Neg logit contributions
 Leave-0.010
 Pip-0.009
irms-0.009
Leave-0.009
 Woo-0.008
Token it
Token ID340
Feature activation-2.19e-02

Pos logit contributions
 Leave+0.005
irms+0.005
ses+0.005
Ha+0.005
Leave+0.005
Neg logit contributions
--0.003
ator-0.003
ail-0.003
yl-0.003
 carefully-0.003
Token.
Token ID13
Feature activation-5.39e-03

Pos logit contributions
ator+0.032
ail+0.029
lf+0.027
ades+0.027
yl+0.025
Neg logit contributions
 Leave-0.053
ses-0.048
irms-0.048
 drown-0.046
 Pip-0.046
Token The
Token ID383
Feature activation+7.46e-03

Pos logit contributions
-+0.010
 carefully+0.010
ator+0.009
ail+0.009
yl+0.009
Neg logit contributions
 Leave-0.010
irms-0.009
vious-0.009
Leave-0.008
ses-0.008
Token cat
Token ID3797
Feature activation-3.32e-03

Pos logit contributions
 Leave+0.016
irms+0.014
ses+0.013
Leave+0.013
Ha+0.013
Neg logit contributions
--0.013
ator-0.013
ail-0.012
yl-0.012
 carefully-0.011
Token his
Token ID465
Feature activation+2.17e-02

Pos logit contributions
-+0.007
ator+0.006
ail+0.006
 carefully+0.006
yl+0.006
Neg logit contributions
 Leave-0.006
irms-0.005
ses-0.005
Ha-0.005
Leave-0.005
Tokenses
Token ID8448
Feature activation+2.03e-01

Pos logit contributions
 Leave+0.020
irms+0.015
Leave+0.014
Ha+0.014
vious+0.014
Neg logit contributions
--0.063
ator-0.061
ail-0.061
yl-0.059
 carefully-0.058
Token and
Token ID290
Feature activation-3.56e-03

Pos logit contributions
 Leave+0.577
irms+0.536
ses+0.524
Leave+0.517
Ha+0.515
Neg logit contributions
ator-0.192
--0.183
ail-0.173
yl-0.153
lf-0.147
Token scratches
Token ID37627
Feature activation+7.74e-02

Pos logit contributions
-+0.007
 carefully+0.006
 new+0.006
ep+0.006
yl+0.005
Neg logit contributions
 Leave-0.006
vious-0.006
rified-0.005
irms-0.005
Leave-0.005
Token him
Token ID683
Feature activation+3.14e-02

Pos logit contributions
 Leave+0.160
irms+0.145
ses+0.140
Leave+0.137
Ha+0.137
Neg logit contributions
ator-0.133
--0.131
ail-0.126
yl-0.119
 carefully-0.117
Token.
Token ID13
Feature activation-4.65e-03

Pos logit contributions
 Leave+0.079
irms+0.071
ses+0.070
 drown+0.069
 bleed+0.068
Neg logit contributions
ator-0.046
ail-0.039
lf-0.038
yl-0.037
ades-0.037
Token Ben
Token ID3932
Feature activation+3.12e-02

Pos logit contributions
-+0.007
 carefully+0.007
An+0.007
 new+0.006
"+0.006
Neg logit contributions
 Leave-0.009
vious-0.009
irms-0.009
Leave-0.008
ses-0.008
Token like
Token ID588
Feature activation+1.08e-02

Pos logit contributions
-+0.036
 carefully+0.034
yl+0.032
a+0.031
ator+0.031
Neg logit contributions
 Leave-0.033
irms-0.031
ses-0.029
Ha-0.028
Leave-0.027
Token that
Token ID326
Feature activation+4.75e-04

Pos logit contributions
 Leave+0.027
irms+0.027
ses+0.025
Ha+0.025
vious+0.024
Neg logit contributions
--0.014
ator-0.012
 carefully-0.012
ail-0.011
yl-0.011
Token.
Token ID13
Feature activation-4.86e-03

Pos logit contributions
 Leave+0.001
irms+0.001
ses+0.001
Ha+0.001
Leave+0.001
Neg logit contributions
--0.000
ator-0.000
 carefully-0.000
yl-0.000
ail-0.000
Token It
Token ID632
Feature activation-7.44e-03

Pos logit contributions
-+0.009
ator+0.009
ail+0.009
yl+0.009
 carefully+0.009
Neg logit contributions
 Leave-0.008
irms-0.008
ses-0.007
vious-0.007
 drown-0.007
Token his
Token ID465
Feature activation+2.92e-02

Pos logit contributions
-+0.012
 carefully+0.011
ail+0.010
ator+0.010
ep+0.010
Neg logit contributions
 Leave-0.016
irms-0.015
 Pip-0.015
ses-0.014
Leave-0.014
Tokenses
Token ID8448
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.059
irms+0.054
vious+0.052
Ha+0.052
Leave+0.052
Neg logit contributions
--0.054
ail-0.050
ator-0.050
 carefully-0.047
yl-0.047
Token and
Token ID290
Feature activation-6.53e-03

Pos logit contributions
 Leave+0.605
irms+0.586
ses+0.561
 drown+0.560
Ha+0.557
Neg logit contributions
--0.171
 carefully-0.130
ator-0.115
ail-0.114
ep-0.099
Token fl
Token ID781
Feature activation-7.92e-03

Pos logit contributions
-+0.011
 carefully+0.010
ep+0.009
 new+0.009
yl+0.008
Neg logit contributions
 Leave-0.013
vious-0.012
irms-0.012
rified-0.012
Whe-0.011
Tokenaps
Token ID1686
Feature activation+2.65e-02

Pos logit contributions
-+0.012
ail+0.011
ator+0.011
yl+0.010
ades+0.010
Neg logit contributions
 Leave-0.018
irms-0.017
Ha-0.017
Leave-0.016
vious-0.016
Token its
Token ID663
Feature activation+3.50e-02

Pos logit contributions
 Leave+0.063
irms+0.058
ses+0.057
Ha+0.056
 drown+0.055
Neg logit contributions
--0.038
 carefully-0.035
ator-0.033
ail-0.032
ep-0.031
Token wings
Token ID12098
Feature activation+1.39e-01

Pos logit contributions
 Leave+0.063
irms+0.057
Ha+0.054
vious+0.054
Leave+0.054
Neg logit contributions
--0.073
 carefully-0.067
ail-0.066
ator-0.064
ep-0.063
Token and
Token ID290
Feature activation-2.40e-03

Pos logit contributions
 Leave+0.193
irms+0.182
ses+0.181
Ha+0.176
 drown+0.175
Neg logit contributions
--0.053
ator-0.044
ail-0.040
 carefully-0.039
yl-0.034
Token angry
Token ID7954
Feature activation+4.80e-02

Pos logit contributions
 carefully+0.004
-+0.004
ep+0.003
 new+0.003
 patiently+0.003
Neg logit contributions
 Leave-0.005
Ha-0.004
irms-0.004
 Pip-0.004
vious-0.004
Token.
Token ID13
Feature activation-3.00e-03

Pos logit contributions
 Leave+0.159
irms+0.150
ses+0.149
Leave+0.147
Ha+0.147
Neg logit contributions
--0.029
ator-0.014
 carefully-0.014
ail-0.010
yl-0.009
Token It
Token ID632
Feature activation-5.33e-03

Pos logit contributions
-+0.004
An+0.004
 carefully+0.004
"+0.004
 new+0.004
Neg logit contributions
 Leave-0.006
ses-0.006
irms-0.006
 drown-0.006
Leave-0.006
Token his
Token ID465
Feature activation+2.09e-02

Pos logit contributions
-+0.011
 carefully+0.010
ep+0.009
ail+0.009
a+0.009
Neg logit contributions
 Leave-0.009
 Pip-0.008
irms-0.008
Leave-0.007
 Woo-0.007
Tokenses
Token ID8448
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.030
vious+0.025
Ha+0.025
Leave+0.025
irms+0.024
Neg logit contributions
--0.052
ail-0.047
 carefully-0.046
ator-0.046
yl-0.045
Token and
Token ID290
Feature activation-2.21e-03

Pos logit contributions
 Leave+0.595
irms+0.563
ses+0.548
Ha+0.539
 drown+0.539
Neg logit contributions
--0.176
ator-0.140
 carefully-0.134
ail-0.131
yl-0.118
Token fl
Token ID781
Feature activation-1.34e-02

Pos logit contributions
-+0.004
 carefully+0.004
 new+0.004
ep+0.003
 next+0.003
Neg logit contributions
 Leave-0.004
vious-0.004
rified-0.004
 lately-0.003
 Stop-0.003
Tokenaps
Token ID1686
Feature activation+3.05e-02

Pos logit contributions
-+0.019
ail+0.018
ator+0.018
yl+0.016
ades+0.016
Neg logit contributions
 Leave-0.031
irms-0.029
Ha-0.029
Leave-0.029
vious-0.028
Token its
Token ID663
Feature activation+3.12e-02

Pos logit contributions
 Leave+0.073
irms+0.068
ses+0.068
Ha+0.067
 drown+0.066
Neg logit contributions
--0.042
 carefully-0.039
ep-0.033
ator-0.032
a-0.032
Token wings
Token ID12098
Feature activation+1.40e-01

Pos logit contributions
 Leave+0.053
irms+0.047
Leave+0.045
vious+0.045
Ha+0.044
Neg logit contributions
--0.068
 carefully-0.061
ep-0.059
ail-0.058
a-0.057
Token\n
Token ID198
Feature activation-5.89e-04

Pos logit contributions
-+0.005
ator+0.004
yl+0.004
ail+0.004
 carefully+0.004
Neg logit contributions
 Leave-0.005
irms-0.004
ses-0.004
Leave-0.004
 drown-0.004
Token\n
Token ID198
Feature activation-3.07e-04

Pos logit contributions
ator+0.001
-+0.001
ail+0.001
lf+0.001
 creation+0.001
Neg logit contributions
 Leave-0.001
Ha-0.001
irms-0.001
Leave-0.001
 drown-0.001
TokenThe
Token ID464
Feature activation-3.21e-02

Pos logit contributions
ator+0.000
ail+0.000
-+0.000
 creation+0.000
yl+0.000
Neg logit contributions
 Leave-0.001
Leave-0.001
Ha-0.001
irms-0.001
vious-0.001
Token snake
Token ID17522
Feature activation-9.37e-03

Pos logit contributions
ator+0.059
-+0.058
ail+0.057
yl+0.053
 carefully+0.051
Neg logit contributions
 Leave-0.062
irms-0.057
ses-0.056
Ha-0.054
Leave-0.053
Token his
Token ID465
Feature activation+2.20e-02

Pos logit contributions
-+0.019
 carefully+0.017
ail+0.017
ator+0.017
ep+0.016
Neg logit contributions
 Leave-0.017
irms-0.015
Leave-0.014
Ha-0.014
ses-0.014
Tokenses
Token ID8448
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.034
irms+0.029
Leave+0.028
Ha+0.028
vious+0.028
Neg logit contributions
--0.051
ator-0.050
ail-0.049
 carefully-0.046
yl-0.046
Token.
Token ID13
Feature activation-1.94e-03

Pos logit contributions
 Leave+0.573
irms+0.532
ses+0.519
Leave+0.512
Ha+0.511
Neg logit contributions
ator-0.196
--0.188
ail-0.177
yl-0.157
 carefully-0.152
Token L
Token ID406
Feature activation-1.88e-02

Pos logit contributions
-+0.004
yl+0.003
ator+0.003
 carefully+0.003
ail+0.003
Neg logit contributions
 Leave-0.003
irms-0.003
Leave-0.003
 drown-0.003
ses-0.003
Tokenila
Token ID10102
Feature activation-2.13e-02

Pos logit contributions
ator+0.040
-+0.037
ail+0.037
 carefully+0.036
lf+0.035
Neg logit contributions
 Leave-0.032
vious-0.027
irms-0.027
Ha-0.026
ses-0.025
Token and
Token ID290
Feature activation+1.16e-03

Pos logit contributions
-+0.018
 carefully+0.014
ail+0.014
ator+0.014
ep+0.011
Neg logit contributions
 Leave-0.063
irms-0.059
Leave-0.058
Ha-0.058
ses-0.058
Token Tom
Token ID4186
Feature activation-4.74e-03

Pos logit contributions
 Leave+0.002
vious+0.002
Leave+0.002
 drown+0.002
irms+0.002
Neg logit contributions
--0.002
 carefully-0.002
ma-0.002
ator-0.002
yl-0.002
Token is
Token ID318
Feature activation+2.01e-02

Pos logit contributions
ator+0.000
-+0.000
ail+0.000
yl+0.000
 carefully+0.000
Neg logit contributions
 Leave-0.000
irms-0.000
ses-0.000
Leave-0.000
Ha-0.000
Token afraid
Token ID7787
Feature activation+3.92e-02

Pos logit contributions
 Leave+0.045
irms+0.042
ses+0.041
Leave+0.040
Ha+0.039
Neg logit contributions
ator-0.031
--0.030
ail-0.029
yl-0.027
lf-0.026
Token.
Token ID13
Feature activation-1.70e-03

Pos logit contributions
 Leave+0.113
irms+0.105
ses+0.103
Ha+0.101
Leave+0.101
Neg logit contributions
--0.036
ator-0.035
ail-0.032
yl-0.028
 carefully-0.028
Token It
Token ID632
Feature activation-5.16e-03

Pos logit contributions
-+0.003
 carefully+0.003
An+0.002
"+0.002
ator+0.002
Neg logit contributions
 Leave-0.003
irms-0.003
 drown-0.003
vious-0.003
ses-0.003
Token his
Token ID465
Feature activation+2.09e-02

Pos logit contributions
-+0.011
 carefully+0.010
ator+0.010
ail+0.010
yl+0.010
Neg logit contributions
 Leave-0.008
irms-0.007
ses-0.007
 Pip-0.007
Leave-0.007
Tokenses
Token ID8448
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.017
irms+0.013
ses+0.012
Leave+0.011
Ha+0.011
Neg logit contributions
ator-0.062
--0.062
ail-0.060
yl-0.058
 carefully-0.058
Token and
Token ID290
Feature activation-1.22e-03

Pos logit contributions
 Leave+0.570
irms+0.530
ses+0.517
Leave+0.509
Ha+0.509
Neg logit contributions
ator-0.194
--0.192
ail-0.177
yl-0.158
 carefully-0.155
Token scratches
Token ID37627
Feature activation+8.08e-02

Pos logit contributions
-+0.002
 carefully+0.002
 new+0.002
ep+0.002
 patiently+0.002
Neg logit contributions
 Leave-0.002
vious-0.002
rified-0.002
irms-0.002
Ha-0.002
Token Tom
Token ID4186
Feature activation-2.46e-03

Pos logit contributions
 Leave+0.190
irms+0.175
ses+0.169
Leave+0.167
Ha+0.166
Neg logit contributions
ator-0.117
--0.112
ail-0.110
yl-0.102
lf-0.099
Token's
Token ID338
Feature activation+8.13e-03

Pos logit contributions
ator+0.003
ail+0.003
-+0.003
yl+0.002
lf+0.002
Neg logit contributions
 Leave-0.007
irms-0.006
ses-0.006
Leave-0.006
 drown-0.006
Token hand
Token ID1021
Feature activation+8.17e-02

Pos logit contributions
 Leave+0.016
irms+0.014
ses+0.014
Leave+0.013
Ha+0.013
Neg logit contributions
ator-0.015
--0.015
ail-0.014
yl-0.013
lf-0.013
Token is
Token ID318
Feature activation+2.08e-02

Pos logit contributions
ator+0.050
-+0.048
ail+0.047
yl+0.045
lf+0.044
Neg logit contributions
 Leave-0.066
irms-0.059
ses-0.057
vious-0.056
Ha-0.056
Token mean
Token ID1612
Feature activation+5.56e-02

Pos logit contributions
 Leave+0.048
irms+0.043
ses+0.042
Leave+0.041
Ha+0.041
Neg logit contributions
ator-0.031
--0.031
ail-0.029
yl-0.027
 carefully-0.027
Token.
Token ID13
Feature activation-2.63e-03

Pos logit contributions
 Leave+0.163
irms+0.152
ses+0.149
Leave+0.146
Ha+0.146
Neg logit contributions
--0.047
ator-0.047
ail-0.042
yl-0.037
 carefully-0.037
Token He
Token ID679
Feature activation+3.17e-02

Pos logit contributions
-+0.004
 carefully+0.004
"+0.004
 next+0.004
An+0.004
Neg logit contributions
 Leave-0.005
 drown-0.005
ses-0.005
vious-0.005
Leave-0.005
Token his
Token ID465
Feature activation+2.12e-02

Pos logit contributions
 Leave+0.058
 Pip+0.053
Leave+0.052
 drown+0.050
 Sorry+0.049
Neg logit contributions
 carefully-0.059
--0.059
ep-0.052
 patiently-0.050
a-0.050
Tokenses
Token ID8448
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.043
Leave+0.037
Ha+0.036
irms+0.036
vious+0.036
Neg logit contributions
--0.040
ator-0.037
ail-0.037
 carefully-0.035
yl-0.034
Token and
Token ID290
Feature activation-1.54e-03

Pos logit contributions
 Leave+0.594
irms+0.558
ses+0.546
Ha+0.536
Leave+0.535
Neg logit contributions
--0.173
ator-0.158
ail-0.144
 carefully-0.134
yl-0.126
Token runs
Token ID4539
Feature activation+6.81e-02

Pos logit contributions
-+0.003
 carefully+0.003
 new+0.002
ep+0.002
 patiently+0.002
Neg logit contributions
 Leave-0.003
rified-0.003
vious-0.003
Leave-0.002
bye-0.002
Token away
Token ID1497
Feature activation+2.86e-02

Pos logit contributions
 Leave+0.169
irms+0.156
ses+0.152
Leave+0.149
Ha+0.149
Neg logit contributions
ator-0.089
--0.088
ail-0.083
yl-0.077
 carefully-0.075
Token.
Token ID13
Feature activation-1.95e-03

Pos logit contributions
 Leave+0.064
irms+0.063
 drown+0.061
 pain+0.060
ses+0.060
Neg logit contributions
ator-0.041
ail-0.038
ades-0.035
lf-0.033
yl-0.033
Token He
Token ID679
Feature activation+3.20e-02

Pos logit contributions
-+0.002
 new+0.002
 carefully+0.002
An+0.002
 next+0.002
Neg logit contributions
 Leave-0.004
vious-0.004
awks-0.004
ses-0.004
irms-0.004
Token grabbed
Token ID13176
Feature activation-1.68e-02

Pos logit contributions
 carefully+0.007
-+0.007
ep+0.007
ail+0.007
yl+0.006
Neg logit contributions
 Leave-0.009
irms-0.008
ses-0.008
awks-0.008
vious-0.007
Token the
Token ID262
Feature activation+4.60e-03

Pos logit contributions
-+0.021
 carefully+0.019
ator+0.019
ail+0.017
yl+0.017
Neg logit contributions
 Leave-0.043
irms-0.039
Leave-0.038
vious-0.038
ses-0.038
Token snake
Token ID17522
Feature activation-1.37e-02

Pos logit contributions
 Leave+0.010
irms+0.009
Leave+0.008
vious+0.008
Ha+0.008
Neg logit contributions
--0.008
ator-0.007
ail-0.007
 carefully-0.007
yl-0.006
Token with
Token ID351
Feature activation+2.28e-02

Pos logit contributions
ator+0.016
ail+0.014
yl+0.013
-+0.013
lf+0.013
Neg logit contributions
 Leave-0.037
irms-0.034
ses-0.033
 drown-0.033
vious-0.032
Token its
Token ID663
Feature activation+3.15e-02

Pos logit contributions
 Leave+0.054
irms+0.051
ses+0.049
vious+0.049
Leave+0.048
Neg logit contributions
--0.031
 carefully-0.028
ator-0.027
ep-0.026
ail-0.025
Token tal
Token ID3305
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.054
irms+0.049
Ha+0.045
Leave+0.045
vious+0.045
Neg logit contributions
--0.066
ail-0.063
ator-0.062
 carefully-0.061
yl-0.058
Tokenons
Token ID684
Feature activation-5.76e-02

Pos logit contributions
 Leave+0.477
irms+0.422
Leave+0.410
vious+0.404
awks+0.397
Neg logit contributions
ator-0.305
ail-0.259
--0.255
yl-0.248
 carefully-0.240
Token.
Token ID13
Feature activation-9.56e-03

Pos logit contributions
ator+0.062
ail+0.053
-+0.053
yl+0.051
lf+0.048
Neg logit contributions
 Leave-0.158
irms-0.145
ses-0.142
 drown-0.141
vious-0.140
Token The
Token ID383
Feature activation+9.40e-03

Pos logit contributions
 carefully+0.015
-+0.015
ator+0.015
ail+0.014
An+0.013
Neg logit contributions
 Leave-0.019
irms-0.018
vious-0.017
Leave-0.017
ses-0.017
Token hawk
Token ID48710
Feature activation+1.09e-01

Pos logit contributions
 Leave+0.018
irms+0.016
vious+0.015
Ha+0.015
Leave+0.015
Neg logit contributions
--0.018
ator-0.018
ail-0.017
 carefully-0.016
yl-0.016
Token flew
Token ID13112
Feature activation+6.81e-02

Pos logit contributions
 Leave+0.217
irms+0.199
ses+0.191
Ha+0.185
Leave+0.185
Neg logit contributions
--0.194
ator-0.190
ail-0.183
 carefully-0.175
yl-0.173
Token with
Token ID351
Feature activation+2.67e-02

Pos logit contributions
ator+0.046
ail+0.043
-+0.041
yl+0.040
lf+0.039
Neg logit contributions
 Leave-0.092
irms-0.083
vious-0.082
ses-0.082
Leave-0.081
Token the
Token ID262
Feature activation+6.24e-03

Pos logit contributions
 Leave+0.065
irms+0.060
ses+0.058
Ha+0.057
vious+0.057
Neg logit contributions
--0.036
 carefully-0.034
ator-0.034
ail-0.031
yl-0.030
Token pipe
Token ID12656
Feature activation-1.76e-02

Pos logit contributions
 Leave+0.014
irms+0.013
ses+0.012
Leave+0.012
Ha+0.012
Neg logit contributions
ator-0.010
--0.009
ail-0.009
yl-0.008
lf-0.008
Token in
Token ID287
Feature activation-6.32e-03

Pos logit contributions
ator+0.020
-+0.018
ail+0.017
yl+0.016
lf+0.015
Neg logit contributions
 Leave-0.047
irms-0.043
ses-0.043
Leave-0.042
vious-0.042
Token its
Token ID663
Feature activation+3.55e-02

Pos logit contributions
ator+0.011
-+0.011
ail+0.011
yl+0.010
 carefully+0.010
Neg logit contributions
 Leave-0.013
irms-0.012
ses-0.011
Leave-0.011
Ha-0.011
Token tal
Token ID3305
Feature activation+2.02e-01

Pos logit contributions
 Leave+0.072
irms+0.065
vious+0.062
Leave+0.061
Ha+0.061
Neg logit contributions
--0.064
ator-0.061
ail-0.060
 carefully-0.058
yl-0.056
Tokenons
Token ID684
Feature activation-5.92e-02

Pos logit contributions
 Leave+0.493
irms+0.442
Leave+0.431
vious+0.428
Ha+0.419
Neg logit contributions
ator-0.276
--0.246
ail-0.243
yl-0.230
 carefully-0.223
Token.
Token ID13
Feature activation-1.38e-02

Pos logit contributions
ator+0.060
-+0.055
ail+0.053
yl+0.048
lf+0.046
Neg logit contributions
 Leave-0.164
irms-0.150
ses-0.148
Leave-0.146
vious-0.146
Token The
Token ID383
Feature activation+1.00e-02

Pos logit contributions
ator+0.024
-+0.024
ail+0.023
 carefully+0.022
yl+0.022
Neg logit contributions
 Leave-0.027
irms-0.025
ses-0.024
Leave-0.023
vious-0.023
Token pipe
Token ID12656
Feature activation-1.81e-02

Pos logit contributions
 Leave+0.021
irms+0.019
ses+0.019
Ha+0.018
Leave+0.018
Neg logit contributions
ator-0.017
--0.017
ail-0.016
yl-0.015
lf-0.015
Token was
Token ID373
Feature activation+6.79e-03

Pos logit contributions
ator+0.026
-+0.025
ail+0.024
yl+0.022
 carefully+0.022
Neg logit contributions
 Leave-0.043
irms-0.039
ses-0.038
Leave-0.037
Ha-0.037
Token.
Token ID13
Feature activation-9.93e-04

Pos logit contributions
ator+0.029
ail+0.027
-+0.024
yl+0.023
ades+0.022
Neg logit contributions
 Leave-0.052
ses-0.048
irms-0.046
vious-0.045
 Pip-0.045
Token It
Token ID632
Feature activation-4.55e-03

Pos logit contributions
ator+0.002
-+0.002
ail+0.002
yl+0.002
lf+0.002
Neg logit contributions
 Leave-0.002
irms-0.001
ses-0.001
Ha-0.001
Leave-0.001
Token says
Token ID1139
Feature activation+1.33e-02

Pos logit contributions
ator+0.010
ail+0.009
-+0.009
lf+0.009
ades+0.009
Neg logit contributions
 Leave-0.008
ses-0.007
irms-0.006
Ha-0.006
vious-0.006
Token "
Token ID366
Feature activation-7.53e-02

Pos logit contributions
 Leave+0.030
 Pip+0.026
ses+0.026
 Are+0.026
 Wally+0.024
Neg logit contributions
ator-0.021
ail-0.019
ep-0.018
pack-0.018
--0.018
TokenFr
Token ID6732
Feature activation-8.29e-02

Pos logit contributions
ator+0.145
-+0.142
ail+0.140
 carefully+0.129
lf+0.129
Neg logit contributions
 Leave-0.142
irms-0.124
Ha-0.120
ses-0.120
Leave-0.119
Tokenidge
Token ID3130
Feature activation-2.18e-01

Pos logit contributions
-+0.120
ail+0.116
ator+0.115
 carefully+0.113
lf+0.111
Neg logit contributions
 Leave-0.171
vious-0.162
ses-0.157
irms-0.154
Ha-0.151
Token".
Token ID1911
Feature activation+4.43e-02

Pos logit contributions
ator+0.329
ail+0.303
Years+0.303
iced+0.298
lf+0.295
Neg logit contributions
 Leave-0.475
ses-0.420
vious-0.407
 Pip-0.397
 Stop-0.387
Token She
Token ID1375
Feature activation-7.53e-03

Pos logit contributions
 Leave+0.088
irms+0.080
ses+0.077
Leave+0.076
vious+0.075
Neg logit contributions
--0.078
ator-0.078
ail-0.075
 carefully-0.072
yl-0.071
Token does
Token ID857
Feature activation-2.62e-02

Pos logit contributions
ator+0.015
-+0.015
ail+0.014
yl+0.014
 carefully+0.013
Neg logit contributions
 Leave-0.013
irms-0.012
ses-0.012
Ha-0.011
Leave-0.011
Token not
Token ID407
Feature activation-1.80e-02

Pos logit contributions
ator+0.052
ail+0.051
-+0.046
ma+0.045
sy+0.044
Neg logit contributions
 Leave-0.047
Leave-0.040
 drown-0.039
ses-0.038
 betrayed-0.038
Token know
Token ID760
Feature activation-4.43e-02

Pos logit contributions
ator+0.042
-+0.042
ail+0.041
yl+0.038
lf+0.038
Neg logit contributions
 Leave-0.026
irms-0.022
ses-0.021
vious-0.021
Ha-0.021
Token tell
Token ID1560
Feature activation-6.54e-02

Pos logit contributions
 Leave+0.013
irms+0.012
ses+0.011
Leave+0.011
vious+0.011
Neg logit contributions
ator-0.009
ail-0.009
--0.009
yl-0.008
lf-0.008
Token Daddy
Token ID26926
Feature activation-5.62e-02

Pos logit contributions
ator+0.117
-+0.113
ail+0.112
yl+0.106
lf+0.102
Neg logit contributions
 Leave-0.130
irms-0.115
ses-0.110
vious-0.110
Ha-0.109
Token's
Token ID338
Feature activation-7.83e-04

Pos logit contributions
ator+0.098
ail+0.088
yl+0.085
-+0.082
ep+0.082
Neg logit contributions
 Leave-0.114
 Pip-0.099
 Wally-0.097
irms-0.097
vious-0.095
Token three
Token ID1115
Feature activation-7.33e-02

Pos logit contributions
ator+0.001
-+0.001
ail+0.001
yl+0.001
ep+0.001
Neg logit contributions
 Leave-0.002
ses-0.002
irms-0.002
vious-0.002
Ha-0.001
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.095
ail+0.080
pack+0.079
yl+0.071
iced+0.070
Neg logit contributions
 Leave-0.174
vious-0.158
irms-0.151
 pain-0.149
 Wally-0.147
Tokenyear
Token ID1941
Feature activation-1.97e-01

Pos logit contributions
ator+0.159
ail+0.142
lf+0.132
-+0.125
aby+0.124
Neg logit contributions
 Leave-0.267
vious-0.260
Ha-0.241
irms-0.231
Leave-0.229
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.123
ail+0.086
pack+0.085
Kate+0.081
An+0.076
Neg logit contributions
 Leave-0.595
vious-0.574
irms-0.547
ses-0.533
awks-0.529
Tokenold
Token ID727
Feature activation-7.73e-02

Pos logit contributions
ator+0.233
ail+0.212
lf+0.202
aby+0.198
iced+0.196
Neg logit contributions
 Leave-0.195
vious-0.185
Ha-0.163
irms-0.159
haw-0.157
Token a
Token ID257
Feature activation-1.20e-02

Pos logit contributions
ator+0.117
ail+0.105
ades+0.102
-+0.101
yl+0.099
Neg logit contributions
 Leave-0.175
irms-0.155
vious-0.151
 Pip-0.151
Ha-0.148
Token story
Token ID1621
Feature activation-6.42e-02

Pos logit contributions
ator+0.019
ail+0.017
-+0.015
ep+0.015
yl+0.015
Neg logit contributions
 Leave-0.027
 Are-0.024
ses-0.024
 Sorry-0.024
Ha-0.023
Token about
Token ID546
Feature activation-8.97e-02

Pos logit contributions
ail+0.101
ator+0.095
yl+0.083
ades+0.082
iced+0.081
Neg logit contributions
 Leave-0.123
vious-0.113
 Pip-0.111
 Wally-0.107
 Thank-0.104
Token belonged
Token ID19611
Feature activation+1.47e-02

Pos logit contributions
ator+0.012
ail+0.010
ades+0.010
pack+0.010
-+0.010
Neg logit contributions
 Leave-0.013
 Are-0.013
Ha-0.012
 Pip-0.012
Leave-0.012
Token to
Token ID284
Feature activation+3.07e-03

Pos logit contributions
 Leave+0.032
vious+0.028
 Pip+0.028
 Woo+0.027
Leave+0.027
Neg logit contributions
ator-0.023
lf-0.022
ves-0.021
ail-0.020
ep-0.019
Token a
Token ID257
Feature activation-1.17e-02

Pos logit contributions
 Leave+0.006
irms+0.005
ses+0.005
vious+0.005
Leave+0.005
Neg logit contributions
ator-0.006
--0.006
ail-0.005
 carefully-0.005
yl-0.005
Token three
Token ID1115
Feature activation-7.45e-02

Pos logit contributions
ator+0.019
ail+0.017
-+0.016
ades+0.016
ep+0.015
Neg logit contributions
 Leave-0.025
 Are-0.024
 drown-0.023
Ha-0.023
ses-0.023
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.082
ail+0.075
pack+0.072
iced+0.068
 patiently+0.065
Neg logit contributions
 Leave-0.195
 pain-0.180
vious-0.179
irms-0.174
Leave-0.172
Tokenyear
Token ID1941
Feature activation-1.97e-01

Pos logit contributions
ator+0.161
ail+0.152
 creation+0.134
lf+0.132
iced+0.130
Neg logit contributions
 Leave-0.271
vious-0.262
Ha-0.243
Leave-0.239
irms-0.238
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.114
pack+0.087
iced+0.085
ail+0.084
ades+0.074
Neg logit contributions
 Leave-0.600
vious-0.580
irms-0.561
 pain-0.557
ses-0.555
Tokenold
Token ID727
Feature activation-7.55e-02

Pos logit contributions
ator+0.245
ail+0.230
 creation+0.219
iced+0.215
aby+0.212
Neg logit contributions
 Leave-0.185
vious-0.174
Ha-0.147
irms-0.147
awks-0.142
Token girl
Token ID2576
Feature activation+2.67e-02

Pos logit contributions
ator+0.125
ail+0.116
-+0.113
ades+0.108
lf+0.108
Neg logit contributions
 Leave-0.163
irms-0.145
ses-0.142
vious-0.139
Ha-0.139
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
 Leave+0.062
Leave+0.052
irms+0.051
 Pip+0.051
 Are+0.051
Neg logit contributions
ator-0.043
ail-0.037
ades-0.037
lf-0.035
--0.034
Token She
Token ID1375
Feature activation-1.75e-02

Pos logit contributions
ator+0.027
-+0.027
ail+0.026
yl+0.025
 carefully+0.025
Neg logit contributions
 Leave-0.023
irms-0.021
ses-0.020
Leave-0.020
Ha-0.019
Token a
Token ID257
Feature activation-9.02e-03

Pos logit contributions
 Leave+0.016
irms+0.014
ses+0.014
Leave+0.014
Ha+0.014
Neg logit contributions
--0.008
ator-0.008
ail-0.007
 carefully-0.007
yl-0.007
Token very
Token ID845
Feature activation-6.20e-02

Pos logit contributions
ator+0.015
ail+0.014
ades+0.013
-+0.013
eways+0.012
Neg logit contributions
 Leave-0.019
irms-0.017
ses-0.017
Ha-0.017
Leave-0.017
Token adventurous
Token ID39770
Feature activation-3.07e-02

Pos logit contributions
ator+0.100
ail+0.092
-+0.090
yl+0.086
ades+0.085
Neg logit contributions
 Leave-0.132
irms-0.122
ses-0.118
vious-0.118
Ha-0.116
Token three
Token ID1115
Feature activation-7.18e-02

Pos logit contributions
ator+0.050
-+0.049
ail+0.048
yl+0.044
 carefully+0.044
Neg logit contributions
 Leave-0.065
irms-0.059
ses-0.057
Ha-0.056
Leave-0.056
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.088
ail+0.082
pack+0.081
iced+0.078
yl+0.068
Neg logit contributions
 Leave-0.176
vious-0.162
irms-0.156
 pain-0.155
Leave-0.151
Tokenyear
Token ID1941
Feature activation-1.97e-01

Pos logit contributions
ator+0.180
ail+0.161
 creation+0.160
-+0.154
iced+0.152
Neg logit contributions
 Leave-0.245
vious-0.227
Leave-0.202
Ha-0.201
haw-0.201
Token-
Token ID12
Feature activation-1.16e-01

Pos logit contributions
ator+0.095
ail+0.064
pack+0.057
iced+0.054
ep+0.052
Neg logit contributions
 Leave-0.625
vious-0.591
ses-0.579
irms-0.579
awks-0.561
Tokenold
Token ID727
Feature activation-7.74e-02

Pos logit contributions
ator+0.237
ail+0.214
 creation+0.212
iced+0.208
aby+0.205
Neg logit contributions
 Leave-0.189
vious-0.167
haw-0.143
Ha-0.141
Leave-0.139
Token.
Token ID13
Feature activation-1.61e-02

Pos logit contributions
ator+0.093
ail+0.085
-+0.083
yl+0.075
ades+0.075
Neg logit contributions
 Leave-0.202
irms-0.184
ses-0.180
vious-0.177
Ha-0.177
Token She
Token ID1375
Feature activation-1.94e-02

Pos logit contributions
ator+0.033
-+0.033
ail+0.031
 carefully+0.030
yl+0.030
Neg logit contributions
 Leave-0.027
irms-0.024
ses-0.023
 drown-0.023
Leave-0.023
Token was
Token ID373
Feature activation+4.96e-03

Pos logit contributions
ator+0.029
-+0.028
ail+0.027
yl+0.025
 carefully+0.025
Neg logit contributions
 Leave-0.045
irms-0.041
ses-0.040
Leave-0.039
Ha-0.039
TokenJohn
Token ID7554
Feature activation-1.05e-01

Pos logit contributions
ator+0.016
ail+0.014
-+0.013
 creation+0.012
yl+0.012
Neg logit contributions
 Leave-0.029
Leave-0.026
Ha-0.025
irms-0.025
 Pip-0.024
Token,
Token ID11
Feature activation-6.94e-04

Pos logit contributions
ator+0.195
ail+0.176
-+0.174
yl+0.167
ades+0.165
Neg logit contributions
 Leave-0.209
irms-0.180
vious-0.176
Ha-0.176
ses-0.174
Token a
Token ID257
Feature activation-7.65e-03

Pos logit contributions
ator+0.001
-+0.001
ail+0.001
yl+0.001
 carefully+0.001
Neg logit contributions
 Leave-0.001
irms-0.001
ses-0.001
Ha-0.001
Leave-0.001
Token three
Token ID1115
Feature activation-7.15e-02

Pos logit contributions
ator+0.013
ail+0.011
-+0.011
ades+0.011
 carefully+0.010
Neg logit contributions
 Leave-0.016
ses-0.014
irms-0.014
 drown-0.014
Ha-0.014
Token-
Token ID12
Feature activation-1.16e-01

Pos logit contributions
ator+0.073
ail+0.061
pack+0.055
iced+0.049
 carefully+0.048
Neg logit contributions
 Leave-0.199
vious-0.184
irms-0.180
 pain-0.179
Leave-0.178
Tokenyear
Token ID1941
Feature activation-1.97e-01

Pos logit contributions
ator+0.155
ail+0.145
lf+0.123
-+0.121
 creation+0.120
Neg logit contributions
 Leave-0.288
vious-0.277
Ha-0.262
irms-0.256
Leave-0.252
Token-
Token ID12
Feature activation-1.16e-01

Pos logit contributions
ator+0.112
pack+0.079
ail+0.077
An+0.072
iced+0.069
Neg logit contributions
 Leave-0.600
vious-0.591
irms-0.565
ses-0.558
 pain-0.554
Tokenold
Token ID727
Feature activation-7.75e-02

Pos logit contributions
ator+0.241
ail+0.224
aby+0.209
 creation+0.205
iced+0.201
Neg logit contributions
 Leave-0.198
vious-0.193
Ha-0.169
irms-0.163
haw-0.155
Token,
Token ID11
Feature activation-2.01e-03

Pos logit contributions
ator+0.132
-+0.126
ail+0.124
yl+0.116
 carefully+0.115
Neg logit contributions
 Leave-0.163
irms-0.146
ses-0.141
Ha-0.139
Leave-0.139
Token was
Token ID373
Feature activation+5.20e-03

Pos logit contributions
-+0.003
ator+0.003
 carefully+0.003
ail+0.003
ep+0.003
Neg logit contributions
 Leave-0.004
irms-0.004
ses-0.004
 drown-0.004
vious-0.004
Token sitting
Token ID5586
Feature activation+2.76e-03

Pos logit contributions
 Leave+0.011
irms+0.010
ses+0.010
Leave+0.009
vious+0.009
Neg logit contributions
ator-0.009
ail-0.008
--0.008
yl-0.008
ades-0.007
Token was
Token ID373
Feature activation+7.05e-03

Pos logit contributions
-+0.122
ator+0.111
ail+0.107
 carefully+0.104
yl+0.095
Neg logit contributions
 Leave-0.273
irms-0.262
ses-0.254
Leave-0.250
 drown-0.247
Token a
Token ID257
Feature activation-1.13e-02

Pos logit contributions
 Leave+0.017
irms+0.015
ses+0.015
Leave+0.015
Ha+0.015
Neg logit contributions
ator-0.010
--0.010
ail-0.009
yl-0.009
 carefully-0.008
Token polite
Token ID23507
Feature activation+3.89e-03

Pos logit contributions
ator+0.018
ail+0.017
ades+0.016
-+0.016
eways+0.015
Neg logit contributions
 Leave-0.024
irms-0.021
ses-0.021
Ha-0.021
Leave-0.021
Token three
Token ID1115
Feature activation-7.38e-02

Pos logit contributions
 Leave+0.009
irms+0.008
ses+0.008
Ha+0.008
Leave+0.008
Neg logit contributions
ator-0.006
--0.006
ail-0.006
yl-0.005
 carefully-0.005
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.090
ail+0.085
pack+0.079
iced+0.076
aby+0.069
Neg logit contributions
 Leave-0.183
vious-0.169
 pain-0.164
irms-0.161
Leave-0.158
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.181
ail+0.161
 creation+0.150
lf+0.148
iced+0.144
Neg logit contributions
 Leave-0.253
vious-0.238
Ha-0.214
haw-0.209
Leave-0.208
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.109
ail+0.080
pack+0.072
iced+0.072
An+0.071
Neg logit contributions
 Leave-0.604
vious-0.578
irms-0.557
ses-0.557
 pain-0.546
Tokenold
Token ID727
Feature activation-7.62e-02

Pos logit contributions
ator+0.238
ail+0.215
 creation+0.206
aby+0.204
iced+0.203
Neg logit contributions
 Leave-0.191
vious-0.174
haw-0.148
Ha-0.146
 Pip-0.143
Token.
Token ID13
Feature activation-1.62e-02

Pos logit contributions
ator+0.094
ail+0.087
-+0.083
ades+0.078
yl+0.077
Neg logit contributions
 Leave-0.196
irms-0.178
ses-0.173
vious-0.172
Ha-0.171
Token Whenever
Token ID21326
Feature activation+6.16e-02

Pos logit contributions
ator+0.038
-+0.038
ail+0.037
yl+0.036
 carefully+0.035
Neg logit contributions
 Leave-0.023
irms-0.020
ses-0.018
Ha-0.018
Leave-0.018
Token somebody
Token ID8276
Feature activation+2.63e-02

Pos logit contributions
 Leave+0.129
irms+0.121
ses+0.118
awks+0.112
 drown+0.110
Neg logit contributions
--0.106
ator-0.104
lf-0.096
 carefully-0.095
ail-0.094
Token real
Token ID1103
Feature activation-4.67e-03

Pos logit contributions
 Leave+0.110
irms+0.101
ses+0.099
Leave+0.097
Ha+0.097
Neg logit contributions
ator-0.050
--0.049
ail-0.046
yl-0.042
 carefully-0.041
Token that
Token ID326
Feature activation-4.87e-03

Pos logit contributions
ator+0.006
ail+0.005
-+0.005
yl+0.005
ep+0.005
Neg logit contributions
 Leave-0.012
irms-0.011
ses-0.011
vious-0.011
Leave-0.010
Token the
Token ID262
Feature activation+9.34e-04

Pos logit contributions
-+0.008
ator+0.008
ail+0.007
 carefully+0.007
lf+0.007
Neg logit contributions
 Leave-0.010
irms-0.010
ses-0.009
Leave-0.009
Ha-0.009
Token 3
Token ID513
Feature activation-7.06e-02

Pos logit contributions
 Leave+0.002
irms+0.002
ses+0.002
Leave+0.002
Ha+0.002
Neg logit contributions
ator-0.001
--0.001
ail-0.001
yl-0.001
 carefully-0.001
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.077
ail+0.073
pack+0.072
iced+0.066
 carefully+0.064
Neg logit contributions
 Leave-0.169
vious-0.162
 pain-0.154
irms-0.149
Leave-0.148
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.151
ail+0.136
-+0.128
lf+0.126
iced+0.120
Neg logit contributions
 Leave-0.272
vious-0.261
Ha-0.241
Leave-0.234
haw-0.233
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.082
ail+0.059
pack+0.051
iced+0.041
An+0.040
Neg logit contributions
 Leave-0.634
vious-0.598
ses-0.597
irms-0.589
awks-0.577
Tokenold
Token ID727
Feature activation-7.72e-02

Pos logit contributions
ator+0.210
ail+0.193
lf+0.185
-+0.181
iced+0.179
Neg logit contributions
 Leave-0.214
vious-0.199
Ha-0.178
haw-0.174
irms-0.170
Token boy
Token ID2933
Feature activation+4.46e-02

Pos logit contributions
ator+0.113
-+0.111
ail+0.106
yl+0.099
 carefully+0.097
Neg logit contributions
 Leave-0.178
irms-0.163
ses-0.158
Leave-0.155
Ha-0.155
Token could
Token ID714
Feature activation-3.78e-03

Pos logit contributions
 Leave+0.095
irms+0.084
ses+0.082
vious+0.081
Leave+0.081
Neg logit contributions
ator-0.074
--0.070
ail-0.069
yl-0.065
lf-0.063
Token feel
Token ID1254
Feature activation+4.53e-02

Pos logit contributions
ator+0.007
-+0.007
ail+0.007
yl+0.006
 carefully+0.006
Neg logit contributions
 Leave-0.007
irms-0.007
ses-0.006
Leave-0.006
Ha-0.006
Token there
Token ID612
Feature activation+7.94e-03

Pos logit contributions
ator+0.000
-+0.000
ail+0.000
ades+0.000
yl+0.000
Neg logit contributions
 Leave-0.000
irms-0.000
ses-0.000
Ha-0.000
Leave-0.000
Token was
Token ID373
Feature activation+7.56e-03

Pos logit contributions
 Leave+0.023
irms+0.022
ses+0.021
Leave+0.021
vious+0.021
Neg logit contributions
--0.007
ator-0.007
ail-0.006
yl-0.005
 carefully-0.005
Token a
Token ID257
Feature activation-8.24e-03

Pos logit contributions
 Leave+0.021
irms+0.019
ses+0.019
 drown+0.018
Leave+0.018
Neg logit contributions
--0.008
ator-0.008
 carefully-0.007
ail-0.007
yl-0.007
Token three
Token ID1115
Feature activation-7.18e-02

Pos logit contributions
ator+0.015
ail+0.013
cards+0.013
ades+0.012
eways+0.012
Neg logit contributions
 Leave-0.016
 Pip-0.015
 Are-0.015
Ha-0.015
 Woo-0.015
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.082
pack+0.076
iced+0.070
ail+0.069
cards+0.065
Neg logit contributions
 Leave-0.176
 pain-0.164
vious-0.164
 ow-0.163
irms-0.161
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.159
ail+0.145
-+0.135
 creation+0.135
iced+0.132
Neg logit contributions
 Leave-0.257
vious-0.245
Ha-0.227
Leave-0.220
irms-0.217
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.097
ail+0.065
pack+0.057
iced+0.055
An+0.049
Neg logit contributions
 Leave-0.628
vious-0.593
ses-0.590
irms-0.585
awks-0.572
Tokenold
Token ID727
Feature activation-7.85e-02

Pos logit contributions
ator+0.220
ail+0.201
 creation+0.196
iced+0.192
-+0.187
Neg logit contributions
 Leave-0.196
vious-0.181
Ha-0.160
irms-0.154
haw-0.153
Token girl
Token ID2576
Feature activation+2.48e-02

Pos logit contributions
ator+0.146
ail+0.136
ades+0.132
iced+0.131
lf+0.124
Neg logit contributions
 Leave-0.155
irms-0.134
ses-0.131
Ha-0.130
vious-0.129
Token named
Token ID3706
Feature activation+2.11e-02

Pos logit contributions
 Leave+0.035
 Pip+0.025
irms+0.025
Leave+0.024
ses+0.024
Neg logit contributions
ator-0.063
ades-0.058
ail-0.058
iced-0.055
 creation-0.054
Token Katie
Token ID24721
Feature activation-7.94e-02

Pos logit contributions
 Leave+0.020
 Pip+0.016
irms+0.016
Ha+0.016
 Woo+0.015
Neg logit contributions
ator-0.058
ades-0.057
--0.056
ep-0.056
ves-0.054
TokenMark
Token ID9704
Feature activation-1.07e-01

Pos logit contributions
 Leave+0.060
irms+0.052
Ha+0.050
Leave+0.049
ses+0.049
Neg logit contributions
ator-0.058
--0.057
ail-0.055
yl-0.052
 carefully-0.052
Token was
Token ID373
Feature activation+6.28e-03

Pos logit contributions
-+0.120
ator+0.110
ail+0.107
 carefully+0.099
yl+0.095
Neg logit contributions
 Leave-0.289
irms-0.271
ses-0.265
Leave-0.260
 drown-0.258
Token a
Token ID257
Feature activation-1.03e-02

Pos logit contributions
 Leave+0.016
irms+0.015
ses+0.014
Ha+0.014
Leave+0.014
Neg logit contributions
--0.008
ator-0.008
ail-0.007
 carefully-0.007
yl-0.007
Token three
Token ID1115
Feature activation-7.34e-02

Pos logit contributions
ator+0.017
ail+0.016
ades+0.015
-+0.015
lf+0.014
Neg logit contributions
 Leave-0.022
irms-0.020
ses-0.019
Ha-0.019
Leave-0.019
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.090
ail+0.083
pack+0.079
iced+0.072
aby+0.070
Neg logit contributions
 Leave-0.181
vious-0.170
irms-0.163
 pain-0.162
Leave-0.158
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.170
ail+0.153
-+0.144
lf+0.141
 creation+0.140
Neg logit contributions
 Leave-0.261
vious-0.248
Leave-0.226
Ha-0.224
irms-0.220
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.104
ail+0.073
pack+0.067
An+0.062
iced+0.061
Neg logit contributions
 Leave-0.615
vious-0.590
irms-0.569
ses-0.564
awks-0.556
Tokenold
Token ID727
Feature activation-7.75e-02

Pos logit contributions
ator+0.237
ail+0.216
aby+0.207
 creation+0.206
lf+0.205
Neg logit contributions
 Leave-0.193
vious-0.177
Ha-0.150
irms-0.150
Leave-0.148
Token boy
Token ID2933
Feature activation+4.33e-02

Pos logit contributions
ator+0.110
ail+0.102
-+0.098
ades+0.092
yl+0.092
Neg logit contributions
 Leave-0.186
irms-0.168
ses-0.163
vious-0.161
Leave-0.161
Token who
Token ID508
Feature activation-5.80e-02

Pos logit contributions
 Leave+0.114
irms+0.102
ses+0.099
Leave+0.098
vious+0.098
Neg logit contributions
ator-0.053
ail-0.048
--0.045
ades-0.042
yl-0.042
Token was
Token ID373
Feature activation+5.54e-03

Pos logit contributions
-+0.087
ator+0.084
ail+0.080
 carefully+0.076
yl+0.075
Neg logit contributions
 Leave-0.132
irms-0.123
ses-0.119
Leave-0.117
Ha-0.116
Token day
Token ID1110
Feature activation-4.68e-03

Pos logit contributions
ator+0.060
ail+0.056
-+0.052
ades+0.052
yl+0.051
Neg logit contributions
 Leave-0.058
irms-0.053
vious-0.051
ses-0.051
Ha-0.050
Token,
Token ID11
Feature activation+3.57e-04

Pos logit contributions
-+0.005
 carefully+0.004
ator+0.004
yl+0.004
ail+0.004
Neg logit contributions
 Leave-0.014
irms-0.013
ses-0.013
awks-0.012
Leave-0.012
Token a
Token ID257
Feature activation-9.72e-03

Pos logit contributions
 Leave+0.001
irms+0.001
ses+0.001
vious+0.001
Leave+0.001
Neg logit contributions
--0.000
ator-0.000
 carefully-0.000
ail-0.000
yl-0.000
Token 3
Token ID513
Feature activation-7.07e-02

Pos logit contributions
ator+0.014
ail+0.013
-+0.013
ep+0.012
ades+0.012
Neg logit contributions
 Leave-0.022
Ha-0.020
ses-0.020
irms-0.020
 Pip-0.020
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.073
ail+0.065
iced+0.058
pack+0.058
ades+0.057
Neg logit contributions
 Leave-0.189
vious-0.175
 pain-0.172
Ha-0.166
Leave-0.165
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.163
ail+0.149
-+0.133
 creation+0.133
iced+0.131
Neg logit contributions
 Leave-0.263
vious-0.254
Ha-0.237
irms-0.225
Leave-0.224
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.093
ail+0.057
pack+0.055
iced+0.047
An+0.045
Neg logit contributions
 Leave-0.637
vious-0.604
ses-0.593
irms-0.592
awks-0.578
Tokenold
Token ID727
Feature activation-7.70e-02

Pos logit contributions
ator+0.243
ail+0.222
 creation+0.209
iced+0.208
aby+0.207
Neg logit contributions
 Leave-0.182
vious-0.171
Ha-0.151
irms-0.141
haw-0.137
Token child
Token ID1200
Feature activation-5.14e-02

Pos logit contributions
ator+0.128
-+0.125
ail+0.121
yl+0.113
 carefully+0.112
Neg logit contributions
 Leave-0.163
irms-0.147
ses-0.143
Ha-0.140
Leave-0.140
Token ventured
Token ID44716
Feature activation+1.04e-02

Pos logit contributions
ator+0.097
ail+0.090
-+0.090
yl+0.084
ades+0.083
Neg logit contributions
 Leave-0.101
irms-0.086
Leave-0.084
Ha-0.083
vious-0.083
Token into
Token ID656
Feature activation+6.29e-02

Pos logit contributions
 Leave+0.023
irms+0.022
vious+0.021
Leave+0.021
 Pip+0.020
Neg logit contributions
ator-0.016
--0.015
ail-0.015
lf-0.014
yl-0.014
Token was
Token ID373
Feature activation+5.32e-03

Pos logit contributions
-+0.078
ator+0.073
ail+0.069
 carefully+0.064
yl+0.062
Neg logit contributions
 Leave-0.197
irms-0.186
ses-0.180
Leave-0.178
 drown-0.176
Token a
Token ID257
Feature activation-8.22e-03

Pos logit contributions
 Leave+0.012
irms+0.011
ses+0.011
Ha+0.011
Leave+0.011
Neg logit contributions
--0.008
ator-0.007
ail-0.007
 carefully-0.007
yl-0.007
Token curious
Token ID11040
Feature activation-4.42e-02

Pos logit contributions
ator+0.014
ail+0.013
ades+0.013
-+0.012
eways+0.012
Neg logit contributions
 Leave-0.017
irms-0.015
ses-0.015
Ha-0.015
Leave-0.014
Token three
Token ID1115
Feature activation-7.09e-02

Pos logit contributions
ator+0.070
ail+0.067
-+0.066
yl+0.061
ades+0.060
Neg logit contributions
 Leave-0.096
irms-0.087
ses-0.086
vious-0.084
Ha-0.083
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.084
ail+0.074
pack+0.074
iced+0.071
aby+0.064
Neg logit contributions
 Leave-0.180
vious-0.171
 pain-0.163
irms-0.162
Leave-0.156
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.211
ail+0.190
 creation+0.186
iced+0.180
-+0.179
Neg logit contributions
 Leave-0.221
vious-0.214
Leave-0.186
Ha-0.185
irms-0.178
Token-
Token ID12
Feature activation-1.15e-01

Pos logit contributions
ator+0.112
pack+0.081
ail+0.077
iced+0.077
An+0.073
Neg logit contributions
 Leave-0.603
vious-0.586
irms-0.560
ses-0.560
 pain-0.550
Tokenold
Token ID727
Feature activation-7.95e-02

Pos logit contributions
ator+0.254
ail+0.227
 creation+0.225
aby+0.223
iced+0.221
Neg logit contributions
 Leave-0.176
vious-0.166
Ha-0.134
irms-0.133
Leave-0.132
Token.
Token ID13
Feature activation-1.81e-02

Pos logit contributions
ator+0.106
ail+0.097
-+0.094
ades+0.090
lf+0.087
Neg logit contributions
 Leave-0.198
irms-0.178
ses-0.174
vious-0.173
Ha-0.172
Token She
Token ID1375
Feature activation-2.16e-02

Pos logit contributions
ator+0.040
-+0.040
ail+0.039
yl+0.037
 carefully+0.036
Neg logit contributions
 Leave-0.029
irms-0.025
ses-0.024
Ha-0.023
Leave-0.023
Token saw
Token ID2497
Feature activation-1.87e-02

Pos logit contributions
ator+0.034
-+0.034
ail+0.032
yl+0.030
 carefully+0.030
Neg logit contributions
 Leave-0.047
irms-0.043
ses-0.042
Leave-0.041
Ha-0.041
Token day
Token ID1110
Feature activation-3.87e-03

Pos logit contributions
ator+0.049
-+0.048
ail+0.046
yl+0.043
 carefully+0.043
Neg logit contributions
 Leave-0.067
irms-0.061
ses-0.059
Leave-0.058
Ha-0.058
Token,
Token ID11
Feature activation+1.77e-03

Pos logit contributions
-+0.005
ator+0.004
ail+0.004
yl+0.004
 carefully+0.004
Neg logit contributions
 Leave-0.010
irms-0.010
ses-0.009
Leave-0.009
vious-0.009
Token a
Token ID257
Feature activation-7.62e-03

Pos logit contributions
 Leave+0.004
irms+0.004
ses+0.004
vious+0.004
Leave+0.004
Neg logit contributions
--0.002
ator-0.002
ail-0.002
 carefully-0.002
yl-0.002
Token three
Token ID1115
Feature activation-7.09e-02

Pos logit contributions
ator+0.012
ail+0.011
-+0.010
ades+0.010
cards+0.010
Neg logit contributions
 Leave-0.017
Ha-0.015
irms-0.015
 Pip-0.015
ses-0.015
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.072
pack+0.063
ail+0.059
iced+0.058
 patiently+0.053
Neg logit contributions
 Leave-0.190
vious-0.175
 pain-0.172
irms-0.170
 ow-0.167
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.153
ail+0.139
-+0.127
 creation+0.125
lf+0.123
Neg logit contributions
 Leave-0.270
vious-0.255
Ha-0.238
Leave-0.231
irms-0.228
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.083
ail+0.050
pack+0.042
iced+0.039
An+0.034
Neg logit contributions
 Leave-0.645
vious-0.607
ses-0.597
irms-0.595
awks-0.581
Tokenold
Token ID727
Feature activation-7.75e-02

Pos logit contributions
ator+0.226
ail+0.207
 creation+0.197
iced+0.195
aby+0.194
Neg logit contributions
 Leave-0.197
vious-0.180
Ha-0.161
haw-0.155
Leave-0.152
Token boy
Token ID2933
Feature activation+4.50e-02

Pos logit contributions
ator+0.133
-+0.127
ail+0.126
 carefully+0.116
yl+0.116
Neg logit contributions
 Leave-0.162
irms-0.144
ses-0.140
Ha-0.138
Leave-0.137
Token was
Token ID373
Feature activation+8.27e-03

Pos logit contributions
 Leave+0.085
irms+0.076
ses+0.073
Leave+0.071
Ha+0.071
Neg logit contributions
--0.085
ator-0.085
ail-0.082
yl-0.078
 carefully-0.077
Token very
Token ID845
Feature activation-6.12e-02

Pos logit contributions
 Leave+0.017
irms+0.015
ses+0.015
Leave+0.014
Ha+0.014
Neg logit contributions
ator-0.015
--0.014
ail-0.014
yl-0.013
lf-0.013
Token on
Token ID319
Feature activation-1.90e-02

Pos logit contributions
ator+0.058
ail+0.052
-+0.049
ades+0.049
lf+0.048
Neg logit contributions
 Leave-0.092
 Pip-0.080
Leave-0.080
Ha-0.078
irms-0.078
Token,
Token ID11
Feature activation+2.06e-03

Pos logit contributions
ator+0.027
-+0.026
ail+0.025
yl+0.023
 carefully+0.023
Neg logit contributions
 Leave-0.045
irms-0.041
ses-0.040
Leave-0.039
Ha-0.039
Token the
Token ID262
Feature activation+5.14e-03

Pos logit contributions
 Leave+0.004
irms+0.004
ses+0.004
Ha+0.004
Leave+0.004
Neg logit contributions
ator-0.003
--0.003
ail-0.003
 carefully-0.003
yl-0.003
Token three
Token ID1115
Feature activation-6.74e-02

Pos logit contributions
 Leave+0.011
irms+0.010
ses+0.010
vious+0.010
Leave+0.010
Neg logit contributions
ator-0.007
--0.007
ail-0.007
yl-0.007
ades-0.007
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.073
pack+0.068
aby+0.058
ail+0.057
her+0.057
Neg logit contributions
 Leave-0.168
vious-0.158
 pain-0.142
Leave-0.142
irms-0.140
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.155
ail+0.136
lf+0.129
-+0.126
aby+0.124
Neg logit contributions
 Leave-0.266
vious-0.249
Ha-0.238
Leave-0.229
haw-0.226
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.089
pack+0.057
ail+0.056
An+0.048
lf+0.046
Neg logit contributions
 Leave-0.636
vious-0.595
ses-0.581
irms-0.576
Leave-0.568
Tokenold
Token ID727
Feature activation-7.93e-02

Pos logit contributions
ator+0.218
ail+0.193
lf+0.190
aby+0.190
 creation+0.181
Neg logit contributions
 Leave-0.207
vious-0.187
Ha-0.172
haw-0.167
Leave-0.165
Token started
Token ID2067
Feature activation-2.64e-02

Pos logit contributions
ator+0.121
-+0.112
ail+0.111
yl+0.104
ades+0.103
Neg logit contributions
 Leave-0.180
irms-0.159
ses-0.155
vious-0.155
Ha-0.155
Token testing
Token ID4856
Feature activation-7.83e-02

Pos logit contributions
ator+0.038
ail+0.034
lf+0.032
-+0.032
yl+0.031
Neg logit contributions
 Leave-0.063
irms-0.056
Leave-0.055
ses-0.054
 Pip-0.054
Token and
Token ID290
Feature activation-1.00e-02

Pos logit contributions
ator+0.122
ail+0.116
yl+0.108
-+0.107
lf+0.104
Neg logit contributions
 Leave-0.166
irms-0.158
 Pip-0.146
Leave-0.146
ses-0.146
Token,
Token ID11
Feature activation+1.21e-03

Pos logit contributions
-+0.002
ator+0.002
ail+0.002
yl+0.002
 carefully+0.002
Neg logit contributions
 Leave-0.005
irms-0.004
ses-0.004
Leave-0.004
Ha-0.004
Token a
Token ID257
Feature activation-1.21e-02

Pos logit contributions
 Leave+0.003
irms+0.002
ses+0.002
Leave+0.002
Ha+0.002
Neg logit contributions
ator-0.002
--0.002
ail-0.002
yl-0.002
ades-0.002
Token little
Token ID1310
Feature activation+1.93e-02

Pos logit contributions
ator+0.020
ail+0.018
ades+0.016
eways+0.016
-+0.016
Neg logit contributions
 Leave-0.026
Ha-0.023
ses-0.023
 pain-0.023
 Are-0.023
Token three
Token ID1115
Feature activation-7.38e-02

Pos logit contributions
 Leave+0.045
irms+0.041
ses+0.040
Ha+0.039
Leave+0.039
Neg logit contributions
--0.029
ator-0.027
ail-0.026
yl-0.025
 carefully-0.024
Token-
Token ID12
Feature activation-1.12e-01

Pos logit contributions
ator+0.063
pack+0.050
ail+0.047
iced+0.043
ades+0.039
Neg logit contributions
 Leave-0.219
vious-0.203
 pain-0.197
Leave-0.194
irms-0.192
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.167
ail+0.150
 creation+0.140
iced+0.139
lf+0.136
Neg logit contributions
 Leave-0.259
vious-0.252
Ha-0.233
Leave-0.223
irms-0.222
Token-
Token ID12
Feature activation-1.12e-01

Pos logit contributions
ator+0.106
pack+0.073
ail+0.069
iced+0.067
ades+0.058
Neg logit contributions
 Leave-0.625
vious-0.597
irms-0.579
ses-0.578
 pain-0.569
Tokenold
Token ID727
Feature activation-7.56e-02

Pos logit contributions
ator+0.250
ail+0.225
 creation+0.222
iced+0.219
lf+0.212
Neg logit contributions
 Leave-0.177
vious-0.169
Ha-0.144
irms-0.137
Leave-0.132
Token boy
Token ID2933
Feature activation+4.56e-02

Pos logit contributions
ator+0.127
-+0.121
ail+0.119
yl+0.110
 carefully+0.110
Neg logit contributions
 Leave-0.161
irms-0.143
ses-0.139
Ha-0.138
Leave-0.137
Token named
Token ID3706
Feature activation+2.16e-02

Pos logit contributions
 Leave+0.089
irms+0.077
ses+0.074
Leave+0.074
Ha+0.074
Neg logit contributions
ator-0.086
--0.082
ail-0.081
yl-0.075
 carefully-0.075
Token John
Token ID1757
Feature activation-8.57e-02

Pos logit contributions
 Leave+0.034
irms+0.030
ses+0.028
Leave+0.028
Ha+0.027
Neg logit contributions
ator-0.048
--0.047
ail-0.046
yl-0.044
 carefully-0.044
Token was
Token ID373
Feature activation+7.24e-03

Pos logit contributions
-+0.134
ator+0.120
ail+0.118
 carefully+0.116
ma+0.107
Neg logit contributions
 Leave-0.261
irms-0.249
ses-0.243
Leave-0.238
 drown-0.235
Token a
Token ID257
Feature activation-1.07e-02

Pos logit contributions
 Leave+0.017
irms+0.015
ses+0.015
Leave+0.015
Ha+0.014
Neg logit contributions
ator-0.011
--0.010
ail-0.010
yl-0.009
lf-0.009
Token curious
Token ID11040
Feature activation-4.49e-02

Pos logit contributions
ator+0.018
ail+0.017
ades+0.016
-+0.016
lf+0.015
Neg logit contributions
 Leave-0.022
irms-0.020
ses-0.020
Ha-0.020
Leave-0.019
Token three
Token ID1115
Feature activation-7.37e-02

Pos logit contributions
ator+0.062
-+0.059
ail+0.058
yl+0.053
 carefully+0.052
Neg logit contributions
 Leave-0.107
irms-0.098
ses-0.096
Ha-0.094
vious-0.094
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.094
ail+0.087
pack+0.083
iced+0.079
aby+0.074
Neg logit contributions
 Leave-0.182
vious-0.168
 pain-0.160
irms-0.160
Leave-0.158
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.180
ail+0.163
-+0.153
lf+0.152
 creation+0.152
Neg logit contributions
 Leave-0.250
vious-0.238
Ha-0.212
Leave-0.212
irms-0.210
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.106
ail+0.077
pack+0.069
iced+0.066
An+0.060
Neg logit contributions
 Leave-0.618
vious-0.593
irms-0.576
ses-0.565
awks-0.559
Tokenold
Token ID727
Feature activation-7.72e-02

Pos logit contributions
ator+0.233
ail+0.212
 creation+0.204
lf+0.202
iced+0.200
Neg logit contributions
 Leave-0.195
vious-0.181
irms-0.155
Ha-0.153
Leave-0.152
Token boy
Token ID2933
Feature activation+4.41e-02

Pos logit contributions
ator+0.097
ail+0.089
-+0.086
ades+0.080
yl+0.080
Neg logit contributions
 Leave-0.197
irms-0.179
ses-0.174
vious-0.173
Leave-0.172
Token.
Token ID13
Feature activation-1.36e-02

Pos logit contributions
 Leave+0.120
irms+0.109
ses+0.107
Leave+0.105
vious+0.105
Neg logit contributions
ator-0.048
--0.044
ail-0.043
yl-0.038
lf-0.038
Token Everywhere
Token ID48201
Feature activation+7.88e-02

Pos logit contributions
-+0.034
ator+0.034
ail+0.034
yl+0.032
lf+0.031
Neg logit contributions
 Leave-0.018
irms-0.015
Ha-0.014
ses-0.014
vious-0.013
Token presents
Token ID10969
Feature activation-3.86e-02

Pos logit contributions
 Leave+0.057
irms+0.052
ses+0.051
Ha+0.050
Leave+0.050
Neg logit contributions
--0.029
ator-0.028
ail-0.026
 carefully-0.025
yl-0.024
Token!
Token ID0
Feature activation+4.41e-03

Pos logit contributions
ator+0.056
ail+0.055
Years+0.049
An+0.048
ep+0.048
Neg logit contributions
 Leave-0.080
ses-0.074
 Pip-0.072
irms-0.070
 Wally-0.069
Token The
Token ID383
Feature activation+5.05e-03

Pos logit contributions
 Leave+0.008
irms+0.007
ses+0.007
Leave+0.007
vious+0.007
Neg logit contributions
ator-0.008
--0.008
ail-0.008
 carefully-0.008
yl-0.008
Token 3
Token ID513
Feature activation-7.03e-02

Pos logit contributions
 Leave+0.011
irms+0.010
 pain+0.010
ses+0.010
Ha+0.010
Neg logit contributions
ail-0.008
ator-0.007
--0.007
yl-0.007
ri-0.007
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.081
ail+0.073
pack+0.072
 patiently+0.072
iced+0.070
Neg logit contributions
 Leave-0.162
vious-0.152
 pain-0.148
haw-0.146
Ha-0.141
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.171
ail+0.154
 creation+0.150
-+0.140
iced+0.139
Neg logit contributions
 Leave-0.237
vious-0.222
haw-0.213
Ha-0.210
 betrayed-0.191
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.103
pack+0.081
ail+0.069
iced+0.064
An+0.064
Neg logit contributions
 Leave-0.601
ses-0.575
vious-0.571
irms-0.561
awks-0.555
Tokenold
Token ID727
Feature activation-7.60e-02

Pos logit contributions
ator+0.238
ail+0.217
 creation+0.210
iced+0.204
aby+0.203
Neg logit contributions
 Leave-0.175
vious-0.157
haw-0.146
Ha-0.142
 betrayed-0.130
Token was
Token ID373
Feature activation+6.33e-03

Pos logit contributions
ator+0.130
ail+0.120
-+0.116
ades+0.113
yl+0.110
Neg logit contributions
 Leave-0.158
Ha-0.137
irms-0.136
vious-0.136
ses-0.136
Token so
Token ID523
Feature activation+4.09e-02

Pos logit contributions
 Leave+0.014
ses+0.013
irms+0.012
 Pip+0.012
Leave+0.012
Neg logit contributions
ator-0.010
ail-0.009
--0.009
ades-0.008
yl-0.008
Token happy
Token ID3772
Feature activation-6.30e-03

Pos logit contributions
 Leave+0.101
irms+0.091
ses+0.089
Ha+0.088
vious+0.087
Neg logit contributions
ator-0.055
--0.053
ail-0.051
yl-0.047
lf-0.045
Token presents
Token ID10969
Feature activation-3.86e-02

Pos logit contributions
 Leave+0.057
irms+0.052
ses+0.051
Ha+0.050
Leave+0.050
Neg logit contributions
--0.029
ator-0.028
ail-0.026
 carefully-0.025
yl-0.024
Token!
Token ID0
Feature activation+4.41e-03

Pos logit contributions
ator+0.056
ail+0.055
Years+0.049
An+0.048
ep+0.048
Neg logit contributions
 Leave-0.080
ses-0.074
 Pip-0.072
irms-0.070
 Wally-0.069
Token The
Token ID383
Feature activation+5.05e-03

Pos logit contributions
 Leave+0.008
irms+0.007
ses+0.007
Leave+0.007
vious+0.007
Neg logit contributions
ator-0.008
--0.008
ail-0.008
 carefully-0.008
yl-0.008
Token 3
Token ID513
Feature activation-7.03e-02

Pos logit contributions
 Leave+0.011
irms+0.010
 pain+0.010
ses+0.010
Ha+0.010
Neg logit contributions
ail-0.008
ator-0.007
--0.007
yl-0.007
ri-0.007
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.081
ail+0.073
pack+0.072
 patiently+0.072
iced+0.070
Neg logit contributions
 Leave-0.162
vious-0.152
 pain-0.148
haw-0.146
Ha-0.141
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.171
ail+0.154
 creation+0.150
-+0.140
iced+0.139
Neg logit contributions
 Leave-0.237
vious-0.222
haw-0.213
Ha-0.210
 betrayed-0.191
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.103
pack+0.081
ail+0.069
iced+0.064
An+0.064
Neg logit contributions
 Leave-0.601
ses-0.575
vious-0.571
irms-0.561
awks-0.555
Tokenold
Token ID727
Feature activation-7.60e-02

Pos logit contributions
ator+0.238
ail+0.217
 creation+0.210
iced+0.204
aby+0.203
Neg logit contributions
 Leave-0.175
vious-0.157
haw-0.146
Ha-0.142
 betrayed-0.130
Token was
Token ID373
Feature activation+6.33e-03

Pos logit contributions
ator+0.130
ail+0.120
-+0.116
ades+0.113
yl+0.110
Neg logit contributions
 Leave-0.158
Ha-0.137
irms-0.136
vious-0.136
ses-0.136
Token so
Token ID523
Feature activation+4.09e-02

Pos logit contributions
 Leave+0.014
ses+0.013
irms+0.012
 Pip+0.012
Leave+0.012
Neg logit contributions
ator-0.010
ail-0.009
--0.009
ades-0.008
yl-0.008
Token happy
Token ID3772
Feature activation-6.30e-03

Pos logit contributions
 Leave+0.101
irms+0.091
ses+0.089
Ha+0.088
vious+0.087
Neg logit contributions
ator-0.055
--0.053
ail-0.051
yl-0.047
lf-0.045
Token and
Token ID290
Feature activation-9.07e-03

Pos logit contributions
ator+0.015
ail+0.014
ades+0.014
eways+0.013
iced+0.013
Neg logit contributions
 Leave-0.019
 Pip-0.016
 Are-0.016
ses-0.015
 Woo-0.015
Token their
Token ID511
Feature activation-2.43e-02

Pos logit contributions
-+0.011
ator+0.009
 carefully+0.009
lf+0.009
yl+0.009
Neg logit contributions
 Leave-0.024
irms-0.023
awks-0.022
ses-0.022
vious-0.022
Token little
Token ID1310
Feature activation+1.69e-02

Pos logit contributions
ator+0.034
-+0.033
ail+0.031
yl+0.030
lf+0.029
Neg logit contributions
 Leave-0.057
irms-0.052
ses-0.052
Ha-0.050
 drown-0.050
Token three
Token ID1115
Feature activation-7.07e-02

Pos logit contributions
 Leave+0.039
irms+0.035
Leave+0.034
ses+0.034
Ha+0.033
Neg logit contributions
--0.026
ator-0.025
ail-0.024
 carefully-0.023
yl-0.023
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.076
pack+0.062
ail+0.057
 patiently+0.055
iced+0.054
Neg logit contributions
 Leave-0.192
irms-0.171
vious-0.171
 pain-0.170
 tongues-0.167
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.144
ail+0.131
-+0.125
lf+0.122
iced+0.117
Neg logit contributions
 Leave-0.274
vious-0.255
Ha-0.241
irms-0.240
Leave-0.236
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.094
ail+0.058
pack+0.052
lf+0.052
An+0.051
Neg logit contributions
 Leave-0.630
vious-0.596
irms-0.590
ses-0.589
awks-0.572
Tokenold
Token ID727
Feature activation-7.83e-02

Pos logit contributions
ator+0.213
ail+0.195
lf+0.188
-+0.185
iced+0.185
Neg logit contributions
 Leave-0.206
vious-0.188
irms-0.171
Ha-0.169
haw-0.165
Token daughter
Token ID4957
Feature activation-6.03e-02

Pos logit contributions
ator+0.141
ail+0.129
-+0.126
lf+0.125
ades+0.122
Neg logit contributions
 Leave-0.159
irms-0.140
ses-0.135
vious-0.134
Ha-0.133
Token.
Token ID13
Feature activation-1.57e-02

Pos logit contributions
ator+0.091
ail+0.081
-+0.080
lf+0.076
ades+0.076
Neg logit contributions
 Leave-0.142
irms-0.123
ses-0.122
 Pip-0.119
Ha-0.117
Token They
Token ID1119
Feature activation+6.97e-03

Pos logit contributions
ator+0.032
-+0.031
ail+0.030
 carefully+0.029
yl+0.028
Neg logit contributions
 Leave-0.028
irms-0.025
ses-0.024
Leave-0.023
vious-0.023
Token,
Token ID11
Feature activation-1.15e-04

Pos logit contributions
ator+0.006
-+0.005
ail+0.005
yl+0.005
lf+0.005
Neg logit contributions
 Leave-0.008
irms-0.008
ses-0.007
Ha-0.007
Leave-0.007
Token a
Token ID257
Feature activation-8.18e-03

Pos logit contributions
ator+0.000
-+0.000
ail+0.000
yl+0.000
lf+0.000
Neg logit contributions
 Leave-0.000
irms-0.000
ses-0.000
Ha-0.000
Leave-0.000
Token small
Token ID1402
Feature activation+1.74e-02

Pos logit contributions
ator+0.012
ail+0.011
-+0.011
ep+0.010
pack+0.010
Neg logit contributions
 Leave-0.018
ses-0.017
irms-0.016
Ha-0.016
Leave-0.016
Token three
Token ID1115
Feature activation-7.06e-02

Pos logit contributions
 Leave+0.039
irms+0.035
ses+0.034
Leave+0.033
Ha+0.033
Neg logit contributions
--0.028
ator-0.027
ail-0.026
yl-0.025
 carefully-0.024
Token-
Token ID12
Feature activation-1.14e-01

Pos logit contributions
ator+0.072
pack+0.062
ail+0.060
iced+0.053
 patiently+0.050
Neg logit contributions
 Leave-0.191
vious-0.180
 pain-0.172
Leave-0.171
irms-0.169
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.154
ail+0.144
lf+0.128
-+0.122
iced+0.119
Neg logit contributions
 Leave-0.280
vious-0.267
Ha-0.256
Leave-0.244
irms-0.240
Token old
Token ID1468
Feature activation+2.84e-02

Pos logit contributions
ator+0.104
pack+0.075
ail+0.071
iced+0.064
An+0.063
Neg logit contributions
 Leave-0.612
vious-0.588
irms-0.566
ses-0.566
 pain-0.564
Token child
Token ID1200
Feature activation-5.11e-02

Pos logit contributions
 Leave+0.063
irms+0.057
ses+0.056
Leave+0.055
Ha+0.055
Neg logit contributions
ator-0.043
ail-0.040
--0.039
ades-0.036
iced-0.036
Token spotted
Token ID13489
Feature activation-2.62e-02

Pos logit contributions
ator+0.095
ail+0.086
-+0.084
ades+0.081
iced+0.079
Neg logit contributions
 Leave-0.102
Leave-0.084
vious-0.081
Ha-0.081
 Pip-0.081
Token the
Token ID262
Feature activation+1.33e-03

Pos logit contributions
ator+0.044
ail+0.042
-+0.042
lf+0.038
yl+0.038
Neg logit contributions
 Leave-0.053
 Pip-0.047
irms-0.047
ses-0.046
Leave-0.045
Token boot
Token ID6297
Feature activation+1.68e-02

Pos logit contributions
 Leave+0.002
irms+0.002
ses+0.002
vious+0.002
Ha+0.002
Neg logit contributions
ator-0.002
--0.002
ail-0.002
lf-0.002
ades-0.002
Token kids
Token ID3988
Feature activation+3.72e-02

Pos logit contributions
ator+0.048
ail+0.043
-+0.042
yl+0.041
pack+0.040
Neg logit contributions
 Leave-0.067
irms-0.059
vious-0.057
Leave-0.057
ses-0.056
Token,
Token ID11
Feature activation+4.31e-04

Pos logit contributions
 Leave+0.085
irms+0.075
ses+0.074
vious+0.072
Leave+0.072
Neg logit contributions
ator-0.057
--0.053
ail-0.053
yl-0.048
lf-0.048
Token a
Token ID257
Feature activation-1.01e-02

Pos logit contributions
 Leave+0.001
irms+0.001
ses+0.001
Leave+0.001
Ha+0.001
Neg logit contributions
--0.001
ator-0.001
ail-0.001
yl-0.001
 carefully-0.001
Token 3
Token ID513
Feature activation-7.04e-02

Pos logit contributions
ator+0.018
ail+0.016
-+0.016
ades+0.015
pack+0.015
Neg logit contributions
 Leave-0.020
ses-0.018
Ha-0.018
 Are-0.018
 drown-0.018
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.068
ail+0.062
pack+0.056
 patiently+0.056
iced+0.055
Neg logit contributions
 Leave-0.191
 pain-0.171
vious-0.170
irms-0.166
Leave-0.165
Tokenyear
Token ID1941
Feature activation-1.96e-01

Pos logit contributions
ator+0.162
ail+0.143
 creation+0.135
iced+0.133
lf+0.133
Neg logit contributions
 Leave-0.270
vious-0.253
Ha-0.232
haw-0.229
irms-0.227
Token-
Token ID12
Feature activation-1.13e-01

Pos logit contributions
ator+0.102
pack+0.077
ail+0.067
iced+0.066
An+0.066
Neg logit contributions
 Leave-0.615
vious-0.586
ses-0.577
irms-0.570
awks-0.562
Tokenold
Token ID727
Feature activation-7.61e-02

Pos logit contributions
ator+0.244
ail+0.219
 creation+0.217
iced+0.215
lf+0.208
Neg logit contributions
 Leave-0.183
vious-0.168
haw-0.149
Ha-0.142
irms-0.138
Token and
Token ID290
Feature activation-8.59e-03

Pos logit contributions
ator+0.136
ail+0.126
-+0.124
lf+0.119
ades+0.117
Neg logit contributions
 Leave-0.156
irms-0.137
ses-0.134
vious-0.132
Ha-0.132
Token their
Token ID511
Feature activation-2.53e-02

Pos logit contributions
ator+0.013
-+0.013
ail+0.012
yl+0.011
 carefully+0.011
Neg logit contributions
 Leave-0.020
irms-0.018
ses-0.017
Leave-0.017
Ha-0.017
Token older
Token ID4697
Feature activation+8.81e-03

Pos logit contributions
ator+0.038
-+0.036
ail+0.035
yl+0.033
lf+0.032
Neg logit contributions
 Leave-0.058
ses-0.053
irms-0.053
 drown-0.050
Ha-0.050
Token the
Token ID262
Feature activation+5.05e-03

Pos logit contributions
 Leave+0.003
irms+0.003
ses+0.003
Leave+0.003
Ha+0.003
Neg logit contributions
ator-0.002
--0.002
ail-0.002
yl-0.002
 carefully-0.002
Token rabbit
Token ID22746
Feature activation+1.32e-02

Pos logit contributions
 Leave+0.010
irms+0.009
ses+0.009
Leave+0.008
Ha+0.008
Neg logit contributions
ator-0.009
--0.009
ail-0.009
yl-0.008
 carefully-0.008
Token realised
Token ID19169
Feature activation-6.10e-02

Pos logit contributions
 Leave+0.029
irms+0.027
ses+0.026
Leave+0.025
Ha+0.025
Neg logit contributions
--0.022
ator-0.021
ail-0.020
 carefully-0.019
yl-0.019
Token she
Token ID673
Feature activation-1.52e-02

Pos logit contributions
ator+0.107
-+0.100
ail+0.100
yl+0.092
ades+0.091
Neg logit contributions
 Leave-0.123
irms-0.108
 Pip-0.106
Leave-0.106
ses-0.105
Token had
Token ID550
Feature activation+6.28e-03

Pos logit contributions
ator+0.023
-+0.022
ail+0.021
yl+0.020
 carefully+0.019
Neg logit contributions
 Leave-0.035
irms-0.032
ses-0.031
vious-0.031
Leave-0.030
Token no
Token ID645
Feature activation+1.21e-01

Pos logit contributions
 Leave+0.014
irms+0.012
ses+0.012
Leave+0.012
Ha+0.012
Neg logit contributions
--0.010
ator-0.010
ail-0.010
yl-0.009
 carefully-0.009
Token food
Token ID2057
Feature activation+1.16e-03

Pos logit contributions
 Leave+0.301
irms+0.270
Ha+0.267
Leave+0.266
ses+0.258
Neg logit contributions
--0.173
ator-0.154
 carefully-0.148
ail-0.147
yl-0.138
Token to
Token ID284
Feature activation+2.82e-03

Pos logit contributions
 Leave+0.003
irms+0.003
ses+0.003
Leave+0.003
Ha+0.003
Neg logit contributions
ator-0.001
--0.001
ail-0.001
yl-0.001
 carefully-0.001
Token eat
Token ID4483
Feature activation-1.78e-02

Pos logit contributions
 Leave+0.005
irms+0.005
ses+0.005
Ha+0.004
vious+0.004
Neg logit contributions
ator-0.006
--0.006
ail-0.005
 carefully-0.005
lf-0.005
Token.
Token ID13
Feature activation-1.24e-02

Pos logit contributions
ator+0.019
ail+0.018
-+0.017
yl+0.016
lf+0.015
Neg logit contributions
 Leave-0.047
irms-0.044
ses-0.042
 Pip-0.042
Leave-0.042
Token She
Token ID1375
Feature activation-1.67e-02

Pos logit contributions
-+0.025
ator+0.024
ail+0.024
 carefully+0.023
yl+0.022
Neg logit contributions
 Leave-0.022
irms-0.020
ses-0.019
Leave-0.019
 drown-0.019
Token,
Token ID11
Feature activation+1.09e-02

Pos logit contributions
 Leave+0.024
irms+0.022
ses+0.021
 Pip+0.020
Ha+0.020
Neg logit contributions
ator-0.016
--0.015
ail-0.015
yl-0.014
lf-0.014
Token it
Token ID340
Feature activation-1.87e-02

Pos logit contributions
 Leave+0.024
irms+0.024
ses+0.022
awks+0.022
vious+0.022
Neg logit contributions
--0.017
ator-0.015
yl-0.015
ail-0.015
lf-0.015
Token's
Token ID338
Feature activation+1.07e-02

Pos logit contributions
ator+0.029
-+0.028
ail+0.027
yl+0.026
lf+0.025
Neg logit contributions
 Leave-0.042
irms-0.038
ses-0.037
vious-0.036
Ha-0.036
Token so
Token ID523
Feature activation+4.81e-02

Pos logit contributions
 Leave+0.027
irms+0.024
Leave+0.024
ses+0.023
Ha+0.023
Neg logit contributions
--0.014
ator-0.014
ail-0.013
yl-0.012
 carefully-0.012
Token pretty
Token ID2495
Feature activation+1.53e-02

Pos logit contributions
 Leave+0.133
irms+0.123
ses+0.120
Leave+0.118
Ha+0.118
Neg logit contributions
--0.050
ator-0.049
ail-0.046
yl-0.042
 carefully-0.042
Token,"
Token ID553
Feature activation+1.39e-01

Pos logit contributions
 Leave+0.032
irms+0.029
ses+0.028
Ha+0.028
vious+0.028
Neg logit contributions
ator-0.026
--0.025
ail-0.024
yl-0.023
lf-0.022
Token Max
Token ID5436
Feature activation+1.43e-02

Pos logit contributions
 Leave+0.247
irms+0.217
vious+0.214
 drown+0.210
ses+0.206
Neg logit contributions
--0.274
 carefully-0.272
lf-0.260
ator-0.258
ail-0.257
Token said
Token ID531
Feature activation-1.33e-02

Pos logit contributions
 Leave+0.029
irms+0.027
ses+0.026
Leave+0.025
Ha+0.025
Neg logit contributions
--0.025
ator-0.025
ail-0.024
yl-0.023
 carefully-0.022
Token.
Token ID13
Feature activation-3.29e-03

Pos logit contributions
ator+0.017
ail+0.015
-+0.015
lf+0.014
yl+0.014
Neg logit contributions
 Leave-0.033
irms-0.031
ses-0.030
vious-0.030
Ha-0.029
Token "
Token ID366
Feature activation-7.17e-02

Pos logit contributions
ator+0.006
ail+0.005
-+0.005
yl+0.005
ma+0.005
Neg logit contributions
 Leave-0.007
Ha-0.007
 Are-0.006
irms-0.006
 Pip-0.006
TokenCan
Token ID6090
Feature activation+1.00e-02

Pos logit contributions
ator+0.131
ail+0.130
-+0.127
lf+0.115
yl+0.115
Neg logit contributions
 Leave-0.144
Leave-0.126
irms-0.125
Ha-0.125
ses-0.120
Token so
Token ID523
Feature activation+4.45e-02

Pos logit contributions
 Leave+0.028
irms+0.026
ses+0.025
Leave+0.024
Ha+0.024
Neg logit contributions
ator-0.014
--0.014
ail-0.013
yl-0.012
 carefully-0.012
Token pretty
Token ID2495
Feature activation+1.29e-02

Pos logit contributions
 Leave+0.116
irms+0.107
ses+0.104
Leave+0.103
Ha+0.102
Neg logit contributions
ator-0.053
--0.053
ail-0.050
yl-0.046
 carefully-0.045
Token with
Token ID351
Feature activation+2.56e-02

Pos logit contributions
 Leave+0.032
irms+0.029
ses+0.029
vious+0.028
 Pip+0.028
Neg logit contributions
ator-0.017
ail-0.016
--0.014
yl-0.014
lf-0.013
Token its
Token ID663
Feature activation+3.06e-02

Pos logit contributions
 Leave+0.065
irms+0.061
ses+0.058
 drown+0.058
Ha+0.058
Neg logit contributions
--0.033
 carefully-0.029
ator-0.029
yl-0.027
 new-0.026
Token orange
Token ID10912
Feature activation-1.71e-03

Pos logit contributions
 Leave+0.063
irms+0.058
Ha+0.055
Leave+0.054
ses+0.054
Neg logit contributions
--0.053
ator-0.052
ail-0.050
 carefully-0.047
yl-0.047
Token wings
Token ID12098
Feature activation+1.40e-01

Pos logit contributions
-+0.003
ator+0.003
ail+0.003
 carefully+0.003
yl+0.003
Neg logit contributions
 Leave-0.003
irms-0.003
Ha-0.003
Leave-0.003
ses-0.003
Token and
Token ID290
Feature activation-4.03e-03

Pos logit contributions
 Leave+0.387
irms+0.358
ses+0.350
Leave+0.345
Ha+0.344
Neg logit contributions
ator-0.144
--0.139
ail-0.130
yl-0.117
 carefully-0.113
Token long
Token ID890
Feature activation+4.69e-02

Pos logit contributions
-+0.006
 carefully+0.005
ator+0.005
lf+0.005
ail+0.005
Neg logit contributions
 Leave-0.010
irms-0.009
ses-0.009
vious-0.009
 drown-0.009
Token be
Token ID307
Feature activation+4.84e-02

Pos logit contributions
 Leave+0.081
irms+0.073
Ha+0.068
ses+0.067
vious+0.067
Neg logit contributions
--0.098
ator-0.092
ail-0.090
 carefully-0.090
yl-0.086
Tokenak
Token ID461
Feature activation+1.18e-03

Pos logit contributions
 Leave+0.096
vious+0.087
irms+0.086
 Pip+0.085
awks+0.084
Neg logit contributions
ator-0.084
ail-0.074
lf-0.071
ades-0.067
pack-0.067
Token.
Token ID13
Feature activation-9.72e-03

Pos logit contributions
 Leave+0.003
irms+0.003
ses+0.003
vious+0.003
Leave+0.003
Neg logit contributions
ator-0.001
ail-0.001
--0.001
lf-0.001
yl-0.001
Token want
Token ID765
Feature activation+3.77e-02

Pos logit contributions
-+0.044
ator+0.042
 carefully+0.041
yl+0.041
ail+0.040
Neg logit contributions
 Leave-0.029
ses-0.025
irms-0.024
Leave-0.023
Ha-0.023
Token to
Token ID284
Feature activation+6.75e-03

Pos logit contributions
 Leave+0.113
irms+0.107
ses+0.104
Leave+0.103
Ha+0.101
Neg logit contributions
--0.031
ator-0.028
ail-0.025
 carefully-0.025
yl-0.024
Token hurt
Token ID5938
Feature activation+1.13e-01

Pos logit contributions
 Leave+0.014
irms+0.014
ses+0.013
Ha+0.013
vious+0.013
Neg logit contributions
--0.012
ator-0.011
 carefully-0.011
yl-0.010
ail-0.010
Token Lily
Token ID20037
Feature activation+1.96e-02

Pos logit contributions
 Leave+0.221
irms+0.200
ses+0.194
vious+0.188
Ha+0.187
Neg logit contributions
--0.209
ator-0.208
ail-0.196
yl-0.190
 carefully-0.188
Token's
Token ID338
Feature activation+8.07e-03

Pos logit contributions
 Leave+0.054
irms+0.049
ses+0.048
Leave+0.047
vious+0.047
Neg logit contributions
ator-0.021
--0.020
ail-0.019
yl-0.017
lf-0.017
Token feelings
Token ID7666
Feature activation+1.20e-01

Pos logit contributions
 Leave+0.017
irms+0.016
ses+0.016
Leave+0.015
 drown+0.015
Neg logit contributions
ator-0.013
--0.013
ail-0.013
yl-0.012
 carefully-0.012
Token,
Token ID11
Feature activation+8.56e-03

Pos logit contributions
 Leave+0.340
irms+0.316
ses+0.307
Leave+0.304
vious+0.303
Neg logit contributions
ator-0.116
--0.108
ail-0.105
yl-0.094
lf-0.088
Token but
Token ID475
Feature activation+2.95e-02

Pos logit contributions
 Leave+0.022
irms+0.021
ses+0.020
Ha+0.019
 drown+0.019
Neg logit contributions
--0.010
 carefully-0.010
ator-0.008
yl-0.008
ail-0.008
Token he
Token ID339
Feature activation+2.95e-02

Pos logit contributions
 Leave+0.056
irms+0.050
ses+0.049
Leave+0.047
Ha+0.047
Neg logit contributions
--0.056
ator-0.056
ail-0.053
 carefully-0.051
yl-0.051
Token did
Token ID750
Feature activation-4.68e-02

Pos logit contributions
 Leave+0.058
irms+0.052
ses+0.052
Ha+0.051
vious+0.050
Neg logit contributions
ator-0.054
--0.050
ail-0.050
yl-0.046
ades-0.046
Token not
Token ID407
Feature activation-1.82e-02

Pos logit contributions
ator+0.085
ail+0.082
-+0.075
yl+0.074
ades+0.072
Neg logit contributions
 Leave-0.093
irms-0.085
 drown-0.082
 pain-0.081
Leave-0.081
Token It
Token ID632
Feature activation-4.94e-03

Pos logit contributions
ator+0.015
-+0.015
ail+0.014
 carefully+0.014
An+0.014
Neg logit contributions
irms-0.012
 Leave-0.012
vious-0.012
 drown-0.012
Leave-0.011
Token might
Token ID1244
Feature activation-2.44e-02

Pos logit contributions
ator+0.007
-+0.007
ail+0.007
yl+0.006
lf+0.006
Neg logit contributions
 Leave-0.011
irms-0.010
ses-0.010
Ha-0.010
Leave-0.010
Token get
Token ID651
Feature activation-9.74e-03

Pos logit contributions
-+0.035
 carefully+0.033
ator+0.032
yl+0.030
ep+0.030
Neg logit contributions
 Leave-0.058
irms-0.057
rified-0.051
Ha-0.051
vious-0.051
Token cold
Token ID4692
Feature activation+7.17e-02

Pos logit contributions
-+0.016
 carefully+0.015
ator+0.014
ail+0.014
ep+0.014
Neg logit contributions
 Leave-0.022
irms-0.020
Ha-0.019
ses-0.019
Leave-0.019
Token outside
Token ID2354
Feature activation-6.86e-02

Pos logit contributions
 Leave+0.214
ses+0.201
irms+0.196
Ha+0.195
Leave+0.193
Neg logit contributions
--0.069
ator-0.056
 carefully-0.051
ail-0.044
ma-0.042
Token,"
Token ID553
Feature activation+1.32e-01

Pos logit contributions
ator+0.091
-+0.090
ail+0.085
yl+0.078
 carefully+0.077
Neg logit contributions
 Leave-0.168
irms-0.154
ses-0.150
Leave-0.147
Ha-0.147
Token her
Token ID607
Feature activation-2.10e-02

Pos logit contributions
 Leave+0.328
irms+0.300
ses+0.296
vious+0.295
 drown+0.287
Neg logit contributions
--0.173
 carefully-0.169
ator-0.162
yl-0.156
ail-0.155
Token dad
Token ID9955
Feature activation+2.05e-03

Pos logit contributions
ator+0.037
-+0.034
ail+0.034
ades+0.031
yl+0.031
Neg logit contributions
 Leave-0.041
ses-0.039
irms-0.039
 Pip-0.036
Ha-0.035
Token said
Token ID531
Feature activation-1.92e-02

Pos logit contributions
 Leave+0.002
vious+0.002
 Pip+0.002
irms+0.002
Ha+0.002
Neg logit contributions
ator-0.005
ail-0.005
eways-0.005
ades-0.005
iced-0.004
Token.
Token ID13
Feature activation-8.38e-03

Pos logit contributions
ator+0.033
yl+0.028
ades+0.027
lf+0.027
ail+0.027
Neg logit contributions
 Leave-0.041
irms-0.036
 betrayed-0.034
 pain-0.034
ses-0.034
Token
Token ID198
Feature activation-3.96e-03

Pos logit contributions
ator+0.014
-+0.013
ail+0.013
yl+0.012
lf+0.012
Neg logit contributions
 Leave-0.018
irms-0.016
Ha-0.016
ses-0.016
Leave-0.016
Token there
Token ID612
Feature activation+1.17e-02

Pos logit contributions
ator+0.016
lf+0.014
ail+0.014
is+0.013
-+0.013
Neg logit contributions
 Leave-0.019
irms-0.018
 Pip-0.017
vious-0.017
 Wally-0.017
Token,
Token ID11
Feature activation+4.42e-03

Pos logit contributions
 Leave+0.025
irms+0.022
 Pip+0.022
ses+0.021
Leave+0.021
Neg logit contributions
ator-0.018
--0.017
ail-0.017
ep-0.016
yl-0.016
Token Joe
Token ID5689
Feature activation-8.23e-02

Pos logit contributions
 Leave+0.010
irms+0.009
ses+0.009
Leave+0.009
Ha+0.009
Neg logit contributions
ator-0.007
--0.007
ail-0.006
yl-0.006
 carefully-0.006
Token mentioned
Token ID4750
Feature activation-8.06e-02

Pos logit contributions
ator+0.137
-+0.127
ail+0.125
yl+0.119
ades+0.114
Neg logit contributions
 Leave-0.176
irms-0.155
vious-0.152
Ha-0.150
Leave-0.150
Token that
Token ID326
Feature activation-3.24e-03

Pos logit contributions
ator+0.123
-+0.117
ail+0.115
yl+0.106
lf+0.103
Neg logit contributions
 Leave-0.182
irms-0.166
ses-0.160
Leave-0.158
vious-0.157
Token he
Token ID339
Feature activation+2.79e-02

Pos logit contributions
ator+0.005
-+0.005
ail+0.005
yl+0.005
 carefully+0.005
Neg logit contributions
 Leave-0.007
irms-0.006
ses-0.006
Ha-0.006
Leave-0.006
Token had
Token ID550
Feature activation+6.88e-03

Pos logit contributions
 Leave+0.066
irms+0.061
ses+0.059
Leave+0.058
Ha+0.058
Neg logit contributions
--0.039
ator-0.039
ail-0.037
 carefully-0.035
yl-0.034
Token never
Token ID1239
Feature activation-2.27e-02

Pos logit contributions
 Leave+0.017
irms+0.016
ses+0.015
Leave+0.015
Ha+0.015
Neg logit contributions
ator-0.009
--0.009
ail-0.008
yl-0.008
 carefully-0.008
Token been
Token ID587
Feature activation-2.39e-02

Pos logit contributions
-+0.042
ator+0.039
ail+0.038
 carefully+0.038
yl+0.037
Neg logit contributions
 Leave-0.045
irms-0.041
ses-0.040
Ha-0.037
Leave-0.037
Token to
Token ID284
Feature activation+2.89e-03

Pos logit contributions
ator+0.039
ail+0.036
lf+0.033
-+0.033
ades+0.032
Neg logit contributions
 Leave-0.053
irms-0.047
vious-0.047
Leave-0.046
 drown-0.045
Token a
Token ID257
Feature activation-7.11e-03

Pos logit contributions
 Leave+0.007
irms+0.006
ses+0.006
Ha+0.006
awks+0.006
Neg logit contributions
--0.004
ator-0.004
 carefully-0.004
yl-0.004
lf-0.004
Token
Token ID198
Feature activation-4.35e-03

Pos logit contributions
ator+0.012
-+0.011
ail+0.011
yl+0.010
lf+0.010
Neg logit contributions
 Leave-0.020
irms-0.017
ses-0.017
Ha-0.017
 Pip-0.017
Token
Token ID198
Feature activation-4.21e-03

Pos logit contributions
ator+0.009
-+0.009
yl+0.008
cards+0.008
ail+0.008
Neg logit contributions
 Leave-0.007
Ha-0.006
 Woo-0.006
 Are-0.006
Leave-0.006
Token"
Token ID1
Feature activation-3.80e-02

Pos logit contributions
ator+0.007
ail+0.007
yl+0.006
-+0.006
 creation+0.006
Neg logit contributions
 Leave-0.009
Leave-0.009
Ha-0.008
 Are-0.008
 Woo-0.007
TokenYeah
Token ID10995
Feature activation-4.86e-02

Pos logit contributions
ator+0.076
-+0.075
ail+0.072
yl+0.069
 carefully+0.068
Neg logit contributions
 Leave-0.068
irms-0.061
ses-0.058
Leave-0.057
Ha-0.057
Token,
Token ID11
Feature activation+6.67e-03

Pos logit contributions
-+0.063
ator+0.061
ail+0.058
 carefully+0.053
yl+0.053
Neg logit contributions
 Leave-0.122
irms-0.114
ses-0.111
Ha-0.109
Leave-0.108
Token I
Token ID314
Feature activation+8.31e-02

Pos logit contributions
 Leave+0.013
irms+0.011
ses+0.011
Leave+0.011
Ha+0.011
Neg logit contributions
ator-0.012
--0.012
ail-0.012
yl-0.011
lf-0.011
Token'm
Token ID1101
Feature activation+1.16e-03

Pos logit contributions
 Leave+0.150
irms+0.135
ses+0.128
Leave+0.126
Ha+0.125
Neg logit contributions
--0.165
ator-0.163
ail-0.158
yl-0.151
 carefully-0.150
Token the
Token ID262
Feature activation+3.17e-03

Pos logit contributions
 Leave+0.002
irms+0.002
ses+0.002
 drown+0.002
Leave+0.002
Neg logit contributions
ator-0.002
--0.002
ail-0.002
lf-0.002
ades-0.002
Token fastest
Token ID14162
Feature activation+7.00e-02

Pos logit contributions
 Leave+0.007
irms+0.007
ses+0.007
Leave+0.006
Ha+0.006
Neg logit contributions
ator-0.005
--0.005
ail-0.004
yl-0.004
 carefully-0.004
Token in
Token ID287
Feature activation-9.63e-03

Pos logit contributions
 Leave+0.166
irms+0.153
ses+0.149
Ha+0.145
Leave+0.145
Neg logit contributions
--0.099
ator-0.099
ail-0.092
yl-0.086
 carefully-0.085
Token the
Token ID262
Feature activation+2.95e-03

Pos logit contributions
ator+0.014
-+0.014
ail+0.013
yl+0.012
 carefully+0.012
Neg logit contributions
 Leave-0.023
irms-0.021
ses-0.020
Ha-0.020
Leave-0.020
Token new
Token ID649
Feature activation-4.02e-02

Pos logit contributions
ator+0.012
ail+0.010
-+0.010
lf+0.009
ep+0.009
Neg logit contributions
 Leave-0.021
ses-0.020
irms-0.019
 drown-0.019
 Are-0.019
Token jewelry
Token ID22634
Feature activation+3.30e-02

Pos logit contributions
ator+0.057
ail+0.053
-+0.053
ades+0.048
 carefully+0.048
Neg logit contributions
 Leave-0.093
ses-0.085
irms-0.084
Ha-0.082
 Pip-0.081
Token in
Token ID287
Feature activation-7.75e-03

Pos logit contributions
 Leave+0.075
ses+0.070
vious+0.067
irms+0.066
 Pip+0.066
Neg logit contributions
ator-0.046
ail-0.042
ades-0.040
iced-0.038
--0.037
Token her
Token ID607
Feature activation-2.02e-02

Pos logit contributions
ator+0.012
-+0.011
ail+0.011
yl+0.010
 carefully+0.010
Neg logit contributions
 Leave-0.018
irms-0.016
ses-0.016
Ha-0.016
Leave-0.015
Token mom
Token ID1995
Feature activation-1.85e-02

Pos logit contributions
ator+0.024
-+0.024
ail+0.022
yl+0.021
 carefully+0.020
Neg logit contributions
 Leave-0.052
irms-0.048
ses-0.047
Ha-0.046
Leave-0.046
Token's
Token ID338
Feature activation+1.06e-02

Pos logit contributions
ator+0.026
ail+0.023
-+0.023
ades+0.022
yl+0.021
Neg logit contributions
 Leave-0.045
irms-0.040
ses-0.038
 Pip-0.038
vious-0.038
Token drawer
Token ID33451
Feature activation-2.62e-02

Pos logit contributions
 Leave+0.026
ses+0.024
irms+0.024
awks+0.022
 Pip+0.022
Neg logit contributions
ator-0.014
--0.013
ail-0.012
ades-0.012
iced-0.012
Token.
Token ID13
Feature activation-1.16e-03

Pos logit contributions
ator+0.031
yl+0.026
ail+0.026
-+0.025
ades+0.024
Neg logit contributions
 Leave-0.067
ses-0.061
irms-0.061
vious-0.060
Leave-0.057
Token It
Token ID632
Feature activation-2.35e-04

Pos logit contributions
-+0.002
ator+0.002
ail+0.002
 carefully+0.002
yl+0.002
Neg logit contributions
 Leave-0.002
irms-0.002
Leave-0.002
 drown-0.002
vious-0.002
Token is
Token ID318
Feature activation+1.99e-02

Pos logit contributions
ator+0.000
-+0.000
ail+0.000
ades+0.000
lf+0.000
Neg logit contributions
 Leave-0.000
ses-0.000
irms-0.000
Ha-0.000
vious-0.000
Token a
Token ID257
Feature activation-8.67e-03

Pos logit contributions
 Leave+0.043
ses+0.041
irms+0.040
 Pip+0.039
 pain+0.038
Neg logit contributions
ator-0.031
ail-0.029
--0.028
ades-0.026
yl-0.025
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
 Leave+0.065
vious+0.062
 pain+0.058
Ha+0.058
irms+0.058
Neg logit contributions
ator-0.045
ail-0.044
aby-0.040
ep-0.039
yl-0.038
Token He
Token ID679
Feature activation+2.90e-02

Pos logit contributions
ator+0.030
-+0.029
ail+0.028
 carefully+0.028
lf+0.027
Neg logit contributions
 Leave-0.021
irms-0.019
ses-0.018
Leave-0.018
 drown-0.018
Token took
Token ID1718
Feature activation-1.07e-02

Pos logit contributions
 Leave+0.058
irms+0.051
Ha+0.050
ses+0.050
vious+0.049
Neg logit contributions
ator-0.052
--0.048
ail-0.047
yl-0.045
ades-0.044
Token a
Token ID257
Feature activation-9.63e-03

Pos logit contributions
ator+0.018
ail+0.017
-+0.016
lf+0.016
yl+0.016
Neg logit contributions
 Leave-0.022
irms-0.020
Ha-0.019
vious-0.019
 Pip-0.019
Token big
Token ID1263
Feature activation+1.50e-03

Pos logit contributions
ator+0.022
ep+0.017
cards+0.017
ail+0.017
lf+0.017
Neg logit contributions
 Leave-0.016
Ha-0.013
 drown-0.013
ses-0.013
 Pip-0.013
Token bite
Token ID13197
Feature activation+4.52e-02

Pos logit contributions
 Leave+0.002
irms+0.002
ses+0.002
Ha+0.002
Leave+0.002
Neg logit contributions
ator-0.003
--0.003
ail-0.003
yl-0.003
 carefully-0.003
Token,
Token ID11
Feature activation+1.31e-03

Pos logit contributions
 Leave+0.122
irms+0.113
ses+0.110
Ha+0.108
Leave+0.108
Neg logit contributions
ator-0.049
--0.047
ail-0.045
yl-0.041
lf-0.039
Token and
Token ID290
Feature activation-7.77e-03

Pos logit contributions
 Leave+0.003
irms+0.003
ses+0.003
 drown+0.003
vious+0.003
Neg logit contributions
--0.002
 carefully-0.002
ator-0.002
ail-0.002
lf-0.001
Token it
Token ID340
Feature activation-2.40e-02

Pos logit contributions
-+0.012
 carefully+0.011
lf+0.010
ator+0.010
ail+0.010
Neg logit contributions
 Leave-0.019
irms-0.017
ses-0.017
Leave-0.016
 drown-0.016
Token was
Token ID373
Feature activation+8.96e-03

Pos logit contributions
ator+0.033
-+0.032
ail+0.031
yl+0.028
 carefully+0.028
Neg logit contributions
 Leave-0.058
irms-0.053
ses-0.052
Leave-0.051
Ha-0.051
Token so
Token ID523
Feature activation+4.31e-02

Pos logit contributions
 Leave+0.023
irms+0.021
ses+0.020
Leave+0.020
Ha+0.020
Neg logit contributions
--0.011
ator-0.011
ail-0.010
 carefully-0.010
lf-0.010
Token of
Token ID286
Feature activation-1.75e-02

Pos logit contributions
ator+0.020
ail+0.019
ades+0.018
ep+0.018
pack+0.017
Neg logit contributions
 Leave-0.017
 pain-0.015
Leave-0.015
vious-0.014
irms-0.014
Token the
Token ID262
Feature activation+3.31e-03

Pos logit contributions
ator+0.025
-+0.025
ail+0.023
yl+0.022
 carefully+0.022
Neg logit contributions
 Leave-0.041
irms-0.038
ses-0.037
Leave-0.036
Ha-0.036
Token hole
Token ID7604
Feature activation+1.04e-01

Pos logit contributions
 Leave+0.007
irms+0.007
vious+0.006
ses+0.006
Leave+0.006
Neg logit contributions
--0.005
ator-0.005
ail-0.005
 carefully-0.005
lf-0.005
Token and
Token ID290
Feature activation-3.72e-03

Pos logit contributions
 Leave+0.227
ses+0.209
vious+0.207
irms+0.205
Leave+0.199
Neg logit contributions
ator-0.154
ail-0.138
yl-0.130
lf-0.124
ades-0.123
Token found
Token ID1043
Feature activation-2.17e-02

Pos logit contributions
ator+0.007
-+0.006
ail+0.006
yl+0.006
 carefully+0.006
Neg logit contributions
 Leave-0.008
irms-0.007
ses-0.007
Leave-0.006
Ha-0.006
Token herself
Token ID5223
Feature activation+7.00e-03

Pos logit contributions
-+0.031
ator+0.031
ail+0.029
 carefully+0.028
yl+0.028
Neg logit contributions
 Leave-0.051
irms-0.047
Leave-0.045
ses-0.045
Ha-0.044
Token in
Token ID287
Feature activation-1.08e-02

Pos logit contributions
 Leave+0.016
 drown+0.015
ses+0.015
irms+0.014
 pain+0.014
Neg logit contributions
ator-0.010
yl-0.009
ail-0.009
--0.008
is-0.008
Token a
Token ID257
Feature activation-1.03e-02

Pos logit contributions
ator+0.017
-+0.016
ail+0.016
yl+0.014
ades+0.014
Neg logit contributions
 Leave-0.024
irms-0.022
ses-0.021
Leave-0.021
Ha-0.021
Token beautiful
Token ID4950
Feature activation-1.51e-02

Pos logit contributions
ator+0.016
ail+0.015
-+0.014
yl+0.013
ades+0.013
Neg logit contributions
 Leave-0.023
irms-0.021
Leave-0.021
 drown-0.021
Ha-0.021
Token garden
Token ID11376
Feature activation-5.44e-02

Pos logit contributions
ator+0.021
-+0.020
ail+0.020
yl+0.018
 carefully+0.018
Neg logit contributions
 Leave-0.035
irms-0.032
ses-0.032
Ha-0.031
Leave-0.031
Token.
Token ID13
Feature activation-8.57e-03

Pos logit contributions
ator+0.066
ail+0.061
-+0.055
yl+0.053
ep+0.052
Neg logit contributions
 Leave-0.138
ses-0.126
irms-0.126
vious-0.121
 Pip-0.120
Token let
Token ID1309
Feature activation+4.58e-02

Pos logit contributions
lf+0.055
ator+0.054
ves+0.053
is+0.053
ades+0.053
Neg logit contributions
 Leave-0.072
 Are-0.059
 Stop-0.057
 drown-0.057
irms-0.056
Token's
Token ID338
Feature activation+2.63e-03

Pos logit contributions
 Leave+0.117
irms+0.107
ses+0.104
Ha+0.103
Leave+0.103
Neg logit contributions
ator-0.057
--0.055
ail-0.053
yl-0.049
 carefully-0.047
Token go
Token ID467
Feature activation+1.51e-02

Pos logit contributions
 Leave+0.005
irms+0.005
ses+0.005
Ha+0.004
Leave+0.004
Neg logit contributions
ator-0.005
--0.005
ail-0.004
 carefully-0.004
yl-0.004
Token for
Token ID329
Feature activation-2.28e-03

Pos logit contributions
 Leave+0.033
 Woo+0.027
 drown+0.027
irms+0.027
 Pip+0.026
Neg logit contributions
ator-0.026
lf-0.023
ail-0.022
yl-0.022
ades-0.022
Token a
Token ID257
Feature activation-8.90e-03

Pos logit contributions
ator+0.003
-+0.003
ail+0.003
yl+0.003
 carefully+0.003
Neg logit contributions
 Leave-0.006
irms-0.005
ses-0.005
Leave-0.005
Ha-0.005
Token march
Token ID9960
Feature activation-2.10e-02

Pos logit contributions
ator+0.021
ail+0.018
kins+0.018
cards+0.018
ep+0.018
Neg logit contributions
 Leave-0.014
 drown-0.012
 Sorry-0.011
 Woo-0.011
 Wally-0.010
Token!"
Token ID2474
Feature activation+7.89e-02

Pos logit contributions
ator+0.029
ail+0.028
-+0.026
yl+0.025
lf+0.025
Neg logit contributions
 Leave-0.050
irms-0.045
vious-0.044
Leave-0.043
ses-0.043
Token Sally
Token ID25737
Feature activation-1.60e-02

Pos logit contributions
 Leave+0.133
irms+0.122
ses+0.117
Leave+0.112
 drown+0.112
Neg logit contributions
--0.165
ator-0.160
ail-0.157
 carefully-0.151
yl-0.149
Token was
Token ID373
Feature activation+9.49e-03

Pos logit contributions
ator+0.029
-+0.024
ades+0.024
yl+0.024
cards+0.024
Neg logit contributions
 Leave-0.031
ses-0.027
Ha-0.027
irms-0.026
vious-0.026
Token excited
Token ID6568
Feature activation-3.46e-02

Pos logit contributions
 Leave+0.022
irms+0.020
ses+0.019
Leave+0.019
 drown+0.018
Neg logit contributions
ator-0.015
ail-0.014
--0.014
yl-0.013
ades-0.013
Token and
Token ID290
Feature activation-6.72e-03

Pos logit contributions
ator+0.049
ail+0.044
ades+0.041
ep+0.041
yl+0.040
Neg logit contributions
 Leave-0.081
irms-0.074
vious-0.073
 Pip-0.071
Leave-0.069
Token bit
Token ID1643
Feature activation+7.63e-02

Pos logit contributions
ator+0.007
-+0.007
ail+0.006
yl+0.006
 carefully+0.006
Neg logit contributions
 Leave-0.011
irms-0.010
ses-0.010
Leave-0.010
Ha-0.010
Token scared
Token ID12008
Feature activation+6.16e-02

Pos logit contributions
 Leave+0.148
ses+0.135
irms+0.129
Ha+0.126
Leave+0.124
Neg logit contributions
--0.150
 carefully-0.137
ail-0.131
ator-0.127
yl-0.124
Token because
Token ID780
Feature activation-3.73e-02

Pos logit contributions
 Leave+0.170
ses+0.167
irms+0.164
Ha+0.158
 drown+0.155
Neg logit contributions
--0.069
ator-0.057
 carefully-0.053
ail-0.048
yl-0.047
Token he
Token ID339
Feature activation+2.83e-02

Pos logit contributions
ator+0.075
-+0.074
ail+0.073
yl+0.068
 carefully+0.066
Neg logit contributions
 Leave-0.065
irms-0.057
Leave-0.054
ses-0.054
vious-0.054
Token had
Token ID550
Feature activation+7.39e-03

Pos logit contributions
 Leave+0.063
irms+0.057
ses+0.056
Ha+0.054
Leave+0.054
Neg logit contributions
ator-0.045
--0.043
ail-0.042
yl-0.039
 carefully-0.038
Token never
Token ID1239
Feature activation-2.23e-02

Pos logit contributions
 Leave+0.017
irms+0.016
ses+0.015
Ha+0.015
Leave+0.015
Neg logit contributions
--0.011
ator-0.010
ail-0.010
 carefully-0.010
yl-0.009
Token been
Token ID587
Feature activation-2.46e-02

Pos logit contributions
-+0.046
ail+0.042
 carefully+0.041
ator+0.041
yl+0.041
Neg logit contributions
 Leave-0.039
irms-0.036
ses-0.034
Ha-0.032
Leave-0.032
Token to
Token ID284
Feature activation+4.18e-03

Pos logit contributions
ator+0.041
ail+0.035
lf+0.035
ades+0.034
iced+0.033
Neg logit contributions
 Leave-0.053
vious-0.048
irms-0.047
 Pip-0.046
Leave-0.046
Token the
Token ID262
Feature activation+4.67e-03

Pos logit contributions
 Leave+0.010
irms+0.009
ses+0.009
Ha+0.008
awks+0.008
Neg logit contributions
--0.006
ator-0.006
 carefully-0.006
ail-0.005
yl-0.005
Token hall
Token ID6899
Feature activation-8.91e-02

Pos logit contributions
 Leave+0.010
irms+0.009
vious+0.009
ses+0.009
Ha+0.009
Neg logit contributions
ator-0.007
--0.007
ail-0.007
lf-0.006
ep-0.006
Token before
Token ID878
Feature activation+4.94e-02

Pos logit contributions
ator+0.153
ail+0.146
-+0.139
yl+0.138
ep+0.134
Neg logit contributions
 Leave-0.177
vious-0.162
irms-0.152
Leave-0.150
ses-0.144
Token and
Token ID290
Feature activation-2.22e-03

Pos logit contributions
ator+0.012
ail+0.011
-+0.011
yl+0.010
lf+0.010
Neg logit contributions
 Leave-0.025
irms-0.023
ses-0.022
vious-0.022
Leave-0.022
Token said
Token ID531
Feature activation-1.51e-02

Pos logit contributions
-+0.004
ator+0.004
ail+0.003
 carefully+0.003
yl+0.003
Neg logit contributions
 Leave-0.005
irms-0.004
Leave-0.004
ses-0.004
vious-0.004
Token to
Token ID284
Feature activation+9.17e-03

Pos logit contributions
ator+0.021
-+0.020
ail+0.019
yl+0.018
 carefully+0.017
Neg logit contributions
 Leave-0.037
irms-0.033
ses-0.032
Leave-0.032
Ha-0.032
Token his
Token ID465
Feature activation+2.13e-02

Pos logit contributions
 Leave+0.019
irms+0.017
ses+0.016
Ha+0.016
Leave+0.016
Neg logit contributions
ator-0.016
--0.016
ail-0.015
yl-0.014
 carefully-0.014
Token sister
Token ID6621
Feature activation+5.32e-02

Pos logit contributions
 Leave+0.027
irms+0.027
ses+0.025
 pain+0.025
vious+0.024
Neg logit contributions
ator-0.051
yl-0.046
ail-0.046
ep-0.045
ades-0.045
Token:
Token ID25
Feature activation-5.66e-02

Pos logit contributions
 Leave+0.111
 Pip+0.097
irms+0.096
ses+0.095
 Wally+0.094
Neg logit contributions
ator-0.093
ep-0.081
ail-0.080
--0.079
iced-0.077
Token "
Token ID366
Feature activation-7.35e-02

Pos logit contributions
ator+0.077
ail+0.072
-+0.071
yl+0.062
ep+0.062
Neg logit contributions
 Leave-0.141
 Are-0.124
irms-0.124
Leave-0.123
Ha-0.123
TokenY
Token ID56
Feature activation-5.68e-02

Pos logit contributions
ator+0.141
-+0.140
ail+0.132
yl+0.128
 carefully+0.128
Neg logit contributions
 Leave-0.137
irms-0.125
ses-0.123
 drown-0.116
Leave-0.113
Tokenay
Token ID323
Feature activation-5.63e-02

Pos logit contributions
ator+0.097
-+0.090
ail+0.088
yl+0.087
lf+0.086
Neg logit contributions
 Leave-0.120
irms-0.108
vious-0.104
ses-0.101
Ha-0.101
Token,
Token ID11
Feature activation+7.67e-03

Pos logit contributions
ator+0.079
ail+0.071
yl+0.068
 carefully+0.068
-+0.067
Neg logit contributions
 Leave-0.134
vious-0.116
irms-0.116
 Pip-0.112
Leave-0.112
Token I
Token ID314
Feature activation+8.37e-02

Pos logit contributions
 Leave+0.015
irms+0.014
ses+0.014
Ha+0.013
Leave+0.013
Neg logit contributions
--0.014
ator-0.014
ail-0.013
 carefully-0.012
yl-0.012
Tokenthey
Token ID9930
Feature activation-4.83e-02

Pos logit contributions
ator+0.122
-+0.122
ail+0.120
yl+0.110
lf+0.107
Neg logit contributions
 Leave-0.156
Leave-0.139
Ha-0.137
irms-0.136
vious-0.133
Token like
Token ID588
Feature activation+8.43e-03

Pos logit contributions
-+0.098
ator+0.096
 carefully+0.096
ail+0.093
yl+0.088
Neg logit contributions
 Leave-0.082
irms-0.078
ses-0.073
Ha-0.071
Leave-0.070
Token to
Token ID284
Feature activation+6.45e-03

Pos logit contributions
 Leave+0.023
irms+0.022
ses+0.021
Ha+0.021
vious+0.020
Neg logit contributions
--0.009
ator-0.009
ail-0.008
 carefully-0.008
yl-0.007
Token eat
Token ID4483
Feature activation-1.43e-02

Pos logit contributions
 Leave+0.013
irms+0.012
ses+0.012
Ha+0.012
vious+0.011
Neg logit contributions
--0.011
ator-0.011
 carefully-0.011
ail-0.010
yl-0.010
Token carrots
Token ID34397
Feature activation+5.28e-02

Pos logit contributions
ator+0.021
-+0.020
ail+0.019
yl+0.018
 carefully+0.018
Neg logit contributions
 Leave-0.033
irms-0.031
ses-0.030
Leave-0.029
Ha-0.029
Token and
Token ID290
Feature activation-5.33e-03

Pos logit contributions
 Leave+0.130
irms+0.120
ses+0.116
vious+0.116
 Pip+0.114
Neg logit contributions
ator-0.067
--0.063
ail-0.062
yl-0.059
lf-0.055
Token they
Token ID484
Feature activation+2.11e-04

Pos logit contributions
 carefully+0.009
-+0.009
ator+0.008
ail+0.008
lf+0.008
Neg logit contributions
 Leave-0.011
irms-0.011
vious-0.011
Ha-0.010
ses-0.010
Token like
Token ID588
Feature activation+8.91e-03

Pos logit contributions
 Leave+0.000
irms+0.000
ses+0.000
Ha+0.000
Leave+0.000
Neg logit contributions
--0.000
ator-0.000
 carefully-0.000
ail-0.000
ep-0.000
Token to
Token ID284
Feature activation+6.75e-03

Pos logit contributions
 Leave+0.022
irms+0.021
ses+0.020
Ha+0.019
Leave+0.019
Neg logit contributions
--0.012
ator-0.011
ail-0.011
 carefully-0.010
yl-0.010
Token hop
Token ID1725
Feature activation+4.43e-02

Pos logit contributions
 Leave+0.013
irms+0.012
ses+0.012
Ha+0.011
vious+0.011
Neg logit contributions
 carefully-0.012
--0.012
ator-0.011
ail-0.011
 patiently-0.011
Token around
Token ID1088
Feature activation-4.76e-03

Pos logit contributions
 Leave+0.115
irms+0.106
ses+0.104
Leave+0.102
Ha+0.102
Neg logit contributions
ator-0.052
--0.052
ail-0.048
yl-0.044
 carefully-0.044
TokenThe
Token ID464
Feature activation-3.71e-02

Pos logit contributions
-+0.016
ator+0.015
ail+0.015
yl+0.014
 creation+0.014
Neg logit contributions
 Leave-0.020
Leave-0.018
Ha-0.018
Bad-0.016
 Wally-0.016
Token end
Token ID886
Feature activation+2.95e-02

Pos logit contributions
ator+0.065
-+0.065
ail+0.064
yl+0.062
ades+0.059
Neg logit contributions
 Leave-0.072
irms-0.065
Leave-0.064
vious-0.063
ses-0.063
Token.
Token ID13
Feature activation-1.23e-02

Pos logit contributions
 Leave+0.057
irms+0.051
 Are+0.050
 pain+0.050
ses+0.050
Neg logit contributions
ator-0.048
ail-0.046
pack-0.042
ep-0.041
is-0.041
Token<|endoftext|>
Token ID50256
Feature activation+3.30e-02

Pos logit contributions
ator+0.017
-+0.016
ail+0.015
yl+0.014
 carefully+0.014
Neg logit contributions
 Leave-0.030
irms-0.027
ses-0.027
Leave-0.026
Ha-0.026
TokenA
Token ID32
Feature activation-3.06e-02

Pos logit contributions
 Leave+0.062
irms+0.054
Ha+0.052
Leave+0.051
ses+0.051
Neg logit contributions
ator-0.063
--0.062
ail-0.061
yl-0.057
 carefully-0.057
Tokenunt
Token ID2797
Feature activation-6.28e-02

Pos logit contributions
ator+0.060
ail+0.056
cards+0.053
ades+0.051
iced+0.050
Neg logit contributions
 Leave-0.051
Ha-0.050
 Pip-0.047
 Are-0.046
 Wally-0.046
Token S
Token ID311
Feature activation+1.33e-02

Pos logit contributions
ades+0.095
ator+0.093
lf+0.088
ep+0.085
ves+0.085
Neg logit contributions
 Leave-0.129
 Pip-0.113
vious-0.113
 Woo-0.112
ses-0.106
Tokenari
Token ID2743
Feature activation-1.11e-01

Pos logit contributions
 Leave+0.027
vious+0.024
 Pip+0.024
Ha+0.022
irms+0.022
Neg logit contributions
ator-0.021
--0.020
lf-0.020
 carefully-0.019
An-0.019
Token and
Token ID290
Feature activation-6.55e-03

Pos logit contributions
ator+0.164
-+0.154
ail+0.153
yl+0.140
ades+0.139
Neg logit contributions
 Leave-0.254
irms-0.223
Ha-0.220
ses-0.219
vious-0.218
Token her
Token ID607
Feature activation-2.50e-02

Pos logit contributions
-+0.011
yl+0.010
 carefully+0.010
ator+0.010
ail+0.009
Neg logit contributions
 Leave-0.014
irms-0.013
ses-0.013
 drown-0.012
vious-0.012
Token son
Token ID3367
Feature activation-4.16e-02

Pos logit contributions
ator+0.042
ail+0.038
-+0.038
lf+0.038
ep+0.037
Neg logit contributions
 Leave-0.053
ses-0.049
irms-0.046
Ha-0.045
 Pip-0.044
Token.
Token ID13
Feature activation-9.36e-03

Pos logit contributions
ator+0.041
-+0.039
ail+0.039
yl+0.037
lf+0.036
Neg logit contributions
 Leave-0.046
irms-0.041
ses-0.039
 drown-0.039
vious-0.039
Token␣
Token ID220
Feature activation+5.11e-03

Pos logit contributions
-+0.017
ator+0.017
ail+0.017
 carefully+0.016
yl+0.016
Neg logit contributions
 Leave-0.018
irms-0.016
ses-0.016
Leave-0.015
 drown-0.015
Token↵
Token ID198
Feature activation-5.84e-03

Pos logit contributions
 Leave+0.010
irms+0.008
 Pip+0.008
Ha+0.008
ses+0.008
Neg logit contributions
ator-0.010
--0.009
lf-0.009
yl-0.009
ail-0.009
Token↵
Token ID198
Feature activation-5.98e-03

Pos logit contributions
ator+0.011
-+0.010
 creation+0.010
ail+0.010
lf+0.010
Neg logit contributions
 Leave-0.012
Ha-0.010
Leave-0.010
irms-0.010
 Pip-0.010
TokenShe
Token ID3347
Feature activation-1.74e-02

Pos logit contributions
ator+0.011
ail+0.009
-+0.009
 creation+0.009
yl+0.008
Neg logit contributions
 Leave-0.014
Leave-0.013
Ha-0.012
vious-0.011
irms-0.011
Token prayed
Token ID32168
Feature activation-1.12e-01

Pos logit contributions
ator+0.038
ail+0.034
-+0.033
ades+0.032
yl+0.032
Neg logit contributions
 Leave-0.029
irms-0.025
Ha-0.025
Leave-0.025
vious-0.025
Token for
Token ID329
Feature activation-1.17e-03

Pos logit contributions
ator+0.173
ail+0.149
yl+0.142
ades+0.142
lf+0.138
Neg logit contributions
 Leave-0.245
irms-0.225
ses-0.223
vious-0.216
Leave-0.216
Token help
Token ID1037
Feature activation-3.13e-02

Pos logit contributions
ator+0.002
ail+0.002
ades+0.002
iced+0.002
ve+0.002
Neg logit contributions
 Leave-0.002
 pain-0.002
 Pip-0.002
irms-0.002
Leave-0.002
Token.
Token ID13
Feature activation-8.46e-03

Pos logit contributions
ator+0.053
ades+0.050
ail+0.046
yl+0.044
ep+0.043
Neg logit contributions
 Leave-0.064
 pain-0.059
 Pip-0.058
irms-0.055
Leave-0.054
Token Suddenly
Token ID24975
Feature activation+3.80e-02

Pos logit contributions
ator+0.019
-+0.019
ail+0.018
yl+0.018
lf+0.018
Neg logit contributions
 Leave-0.013
irms-0.011
ses-0.011
Ha-0.010
Leave-0.010
Token,
Token ID11
Feature activation+4.62e-03

Pos logit contributions
 Leave+0.085
irms+0.076
ses+0.074
 Pip+0.074
Leave+0.073
Neg logit contributions
ator-0.061
--0.054
ail-0.053
ades-0.050
yl-0.049
Token and
Token ID290
Feature activation-2.35e-03

Pos logit contributions
ator+0.046
ades+0.042
ail+0.040
yl+0.040
eways+0.040
Neg logit contributions
 Leave-0.034
 Pip-0.029
vious-0.029
irms-0.028
Ha-0.026
Token dad
Token ID9955
Feature activation+4.27e-03

Pos logit contributions
ator+0.003
-+0.003
ail+0.003
yl+0.003
 carefully+0.003
Neg logit contributions
 Leave-0.006
irms-0.005
ses-0.005
Leave-0.005
Ha-0.005
Token watched
Token ID7342
Feature activation-7.63e-02

Pos logit contributions
 Leave+0.006
vious+0.005
 Pip+0.005
irms+0.005
Leave+0.005
Neg logit contributions
ator-0.010
ail-0.009
--0.009
yl-0.008
ades-0.008
Token to
Token ID284
Feature activation+9.51e-03

Pos logit contributions
ator+0.135
ail+0.132
yl+0.121
a+0.114
ep+0.114
Neg logit contributions
 Leave-0.155
Leave-0.140
irms-0.136
 pain-0.130
vious-0.129
Token make
Token ID787
Feature activation-2.41e-02

Pos logit contributions
 Leave+0.019
irms+0.017
ses+0.017
vious+0.016
Leave+0.016
Neg logit contributions
ator-0.016
--0.016
 carefully-0.016
yl-0.015
ail-0.015
Token sure
Token ID1654
Feature activation-1.36e-01

Pos logit contributions
ator+0.044
ail+0.041
lf+0.039
yl+0.038
An+0.037
Neg logit contributions
 Leave-0.045
irms-0.043
vious-0.041
awks-0.041
 Pip-0.040
Token she
Token ID673
Feature activation-9.95e-03

Pos logit contributions
ator+0.220
-+0.212
ail+0.208
yl+0.195
ep+0.187
Neg logit contributions
 Leave-0.292
irms-0.262
ses-0.253
Leave-0.252
Ha-0.250
Token was
Token ID373
Feature activation+1.28e-02

Pos logit contributions
ator+0.015
ail+0.014
-+0.014
yl+0.013
ades+0.013
Neg logit contributions
 Leave-0.022
irms-0.020
ses-0.020
vious-0.020
Ha-0.019
Token safe
Token ID3338
Feature activation+3.12e-02

Pos logit contributions
 Leave+0.024
 Pip+0.022
irms+0.022
 drown+0.022
vious+0.022
Neg logit contributions
ator-0.022
ail-0.021
yl-0.021
--0.020
ades-0.020
Token.
Token ID13
Feature activation-7.20e-03

Pos logit contributions
 Leave+0.075
irms+0.070
 Pip+0.067
vious+0.067
ses+0.067
Neg logit contributions
ail-0.038
ator-0.037
yl-0.034
ades-0.033
--0.033
Token↵
Token ID198
Feature activation-3.40e-03

Pos logit contributions
ator+0.015
-+0.015
ail+0.014
 carefully+0.014
yl+0.014
Neg logit contributions
 Leave-0.012
irms-0.011
ses-0.011
Leave-0.010
vious-0.010
Token."
Token ID526
Feature activation+6.84e-02

Pos logit contributions
ator+0.024
-+0.023
ail+0.023
yl+0.021
 carefully+0.020
Neg logit contributions
 Leave-0.043
irms-0.039
ses-0.038
Ha-0.037
Leave-0.037
Token So
Token ID1406
Feature activation+4.97e-02

Pos logit contributions
 Leave+0.112
irms+0.104
ses+0.095
Ha+0.094
 Are+0.093
Neg logit contributions
ator-0.141
--0.140
ail-0.132
yl-0.130
ep-0.128
Token the
Token ID262
Feature activation+1.93e-03

Pos logit contributions
 Leave+0.114
irms+0.104
ses+0.101
Ha+0.099
vious+0.098
Neg logit contributions
ator-0.073
--0.069
ail-0.068
yl-0.062
ep-0.061
Token four
Token ID1440
Feature activation-7.00e-02

Pos logit contributions
 Leave+0.004
Ha+0.004
irms+0.004
ses+0.004
Leave+0.004
Neg logit contributions
ator-0.003
--0.003
ail-0.003
ades-0.002
yl-0.002
Token friends
Token ID2460
Feature activation+4.38e-02

Pos logit contributions
ator+0.119
pack+0.103
ail+0.102
aby+0.097
iced+0.096
Neg logit contributions
 Leave-0.150
vious-0.134
Ha-0.130
Leave-0.128
irms-0.127
Token discussed
Token ID6693
Feature activation-1.16e-01

Pos logit contributions
 Leave+0.079
irms+0.069
vious+0.069
Ha+0.066
ses+0.066
Neg logit contributions
ator-0.088
--0.080
ail-0.080
lf-0.076
yl-0.075
Token what
Token ID644
Feature activation-4.87e-02

Pos logit contributions
ator+0.226
ail+0.221
yl+0.207
ve+0.202
-+0.194
Neg logit contributions
 Leave-0.201
irms-0.194
 Wally-0.181
ses-0.178
 Pip-0.176
Token game
Token ID983
Feature activation-1.70e-02

Pos logit contributions
ator+0.086
ail+0.082
-+0.077
yl+0.073
ep+0.071
Neg logit contributions
 Leave-0.100
Ha-0.086
 Pip-0.086
irms-0.085
 Wally-0.084
Token to
Token ID284
Feature activation+9.63e-03

Pos logit contributions
ator+0.023
-+0.022
ail+0.021
yl+0.020
lf+0.019
Neg logit contributions
 Leave-0.041
irms-0.037
ses-0.037
Ha-0.036
vious-0.036
Token play
Token ID711
Feature activation+6.95e-03

Pos logit contributions
 Leave+0.017
irms+0.015
ses+0.014
Leave+0.014
Ha+0.014
Neg logit contributions
ator-0.020
--0.019
ail-0.019
yl-0.018
 carefully-0.017
Token.
Token ID13
Feature activation-5.14e-03

Pos logit contributions
 Leave+0.016
irms+0.015
Ha+0.014
Leave+0.014
vious+0.014
Neg logit contributions
ail-0.010
ator-0.010
yl-0.009
--0.008
ma-0.008
Token safe
Token ID3338
Feature activation+2.50e-02

Pos logit contributions
 Leave+0.009
irms+0.008
Leave+0.008
ses+0.008
 drown+0.008
Neg logit contributions
ator-0.005
--0.005
ail-0.005
yl-0.005
lf-0.004
Token.
Token ID13
Feature activation-1.36e-02

Pos logit contributions
 Leave+0.069
irms+0.064
ses+0.062
Leave+0.062
 drown+0.061
Neg logit contributions
ator-0.024
ail-0.023
--0.023
yl-0.021
lf-0.019
Token␣
Token ID220
Feature activation+2.82e-03

Pos logit contributions
ator+0.024
-+0.024
ail+0.022
lf+0.021
yl+0.021
Neg logit contributions
 Leave-0.028
irms-0.025
Ha-0.024
ses-0.024
 Are-0.023
Token↵
Token ID198
Feature activation-8.08e-03

Pos logit contributions
 Leave+0.006
irms+0.005
Ha+0.005
ses+0.005
 drown+0.005
Neg logit contributions
ator-0.005
--0.005
lf-0.004
yl-0.004
ail-0.004
Token↵
Token ID198
Feature activation-8.37e-03

Pos logit contributions
ator+0.015
-+0.015
ail+0.014
 creation+0.014
 recipe+0.013
Neg logit contributions
 Leave-0.015
Ha-0.013
Leave-0.013
 Are-0.012
irms-0.012
TokenJo
Token ID9908
Feature activation-1.73e-01

Pos logit contributions
ator+0.012
-+0.012
ail+0.012
 creation+0.011
 recipe+0.010
Neg logit contributions
 Leave-0.019
Leave-0.018
Ha-0.018
 Are-0.016
Got-0.016
Token and
Token ID290
Feature activation-8.07e-03

Pos logit contributions
ator+0.237
ades+0.213
ail+0.212
-+0.210
yl+0.203
Neg logit contributions
 Leave-0.408
vious-0.358
ses-0.350
irms-0.350
Ha-0.350
Token Jeff
Token ID5502
Feature activation-7.78e-02

Pos logit contributions
ator+0.014
-+0.014
ail+0.014
ades+0.013
yl+0.013
Neg logit contributions
 Leave-0.016
irms-0.014
ses-0.014
Ha-0.013
Leave-0.013
Token were
Token ID547
Feature activation+7.61e-03

Pos logit contributions
-+0.135
ator+0.133
ail+0.128
 carefully+0.124
yl+0.121
Neg logit contributions
 Leave-0.158
irms-0.145
ses-0.140
Leave-0.135
Ha-0.135
Token relieved
Token ID22598
Feature activation+1.42e-02

Pos logit contributions
 Leave+0.019
irms+0.017
ses+0.016
Leave+0.016
Ha+0.016
Neg logit contributions
ator-0.010
--0.010
ail-0.010
yl-0.009
lf-0.009
Token.
Token ID13
Feature activation-1.37e-02

Pos logit contributions
 Leave+0.033
irms+0.029
ses+0.029
Leave+0.028
 Pip+0.028
Neg logit contributions
ator-0.021
ail-0.019
--0.019
lf-0.018
ep-0.018
Token smiled
Token ID13541
Feature activation-1.29e-02

Pos logit contributions
-+0.008
 carefully+0.008
lf+0.007
ail+0.007
ator+0.007
Neg logit contributions
 Leave-0.012
irms-0.011
vious-0.011
ses-0.010
Leave-0.010
Token.
Token ID13
Feature activation-9.21e-03

Pos logit contributions
ator+0.017
ail+0.015
lf+0.014
yl+0.014
ep+0.013
Neg logit contributions
 Leave-0.032
irms-0.030
vious-0.028
ses-0.028
 Pip-0.028
Token He
Token ID679
Feature activation+3.08e-02

Pos logit contributions
ator+0.017
-+0.017
ail+0.016
 carefully+0.016
yl+0.015
Neg logit contributions
 Leave-0.017
irms-0.016
vious-0.015
ses-0.015
Leave-0.015
Token couldn
Token ID3521
Feature activation+4.63e-03

Pos logit contributions
 Leave+0.075
irms+0.069
ses+0.066
Leave+0.065
 drown+0.065
Neg logit contributions
--0.043
ator-0.041
ail-0.040
 carefully-0.038
yl-0.037
Token't
Token ID470
Feature activation-3.87e-02

Pos logit contributions
 Leave+0.010
irms+0.009
ses+0.009
Leave+0.008
Ha+0.008
Neg logit contributions
ator-0.008
--0.008
ail-0.007
 carefully-0.007
yl-0.007
Token believe
Token ID1975
Feature activation-1.19e-01

Pos logit contributions
 carefully+0.070
-+0.068
ator+0.066
ail+0.062
 patiently+0.060
Neg logit contributions
 Leave-0.078
irms-0.073
ses-0.068
awks-0.065
Ha-0.064
Token he
Token ID339
Feature activation+2.63e-02

Pos logit contributions
ator+0.194
ail+0.188
-+0.188
yl+0.172
ep+0.168
Neg logit contributions
 Leave-0.252
irms-0.225
Leave-0.218
ses-0.215
Ha-0.213
Token had
Token ID550
Feature activation+8.43e-03

Pos logit contributions
 Leave+0.066
irms+0.061
ses+0.059
Ha+0.059
Leave+0.059
Neg logit contributions
ator-0.034
--0.031
ail-0.031
yl-0.028
ades-0.027
Token eaten
Token ID17065
Feature activation+1.75e-02

Pos logit contributions
 Leave+0.017
irms+0.015
ses+0.015
Leave+0.014
Ha+0.014
Neg logit contributions
--0.015
ator-0.015
ail-0.014
 carefully-0.014
yl-0.014
Token something
Token ID1223
Feature activation+7.15e-03

Pos logit contributions
 Leave+0.047
ses+0.043
irms+0.042
Leave+0.041
Ha+0.041
Neg logit contributions
--0.022
ator-0.019
 carefully-0.019
ail-0.018
ma-0.016
Token so
Token ID523
Feature activation+4.33e-02

Pos logit contributions
 Leave+0.017
irms+0.016
ses+0.016
Leave+0.015
Ha+0.015
Neg logit contributions
ator-0.010
--0.009
ail-0.009
yl-0.008
lf-0.008