TOP ACTIVATIONS
MAX = 0.078

soft t eddy bear named Mr . Bear . She loved
said " del iver to Mr . Smith ". They wondered
was happy that she and Mr . Sn ugg les were
tree with orange fruits hanging from its branches . She picked
and moons . Lily and Mr . Sn ugg les were
N orman and his tutor Mr . Johnson were in the
in awe the cliff was so
and butterflies . Sally and Mr . Bunny played in the
were thrilled they had defeated the
They saw the dog lick Mr . Lee 's face and
fluffy that everyone called him Mr . Velvet . Every day
. All the animals loved Mr . Owl because he was
intelligent t eddy bear named Mr . H ugs . Lily
beautiful rainbow . Lucy and Mr . Wh isk ers were
girl called Emily came to Mr Smith 's vegetable garden .
named Lily came to visit Mr . John . She saw
and enjoyed playing chess with Mr . John .
They saw some vines hanging from the tree . The vines
was a stuffed bunny named Mr . Fl oppy . One
the necklace and the magn ifying glass and started to examine

MIN ACTIVATIONS
MIN = -0.074

mom asked her to help decor ate the kitchen for a
mom asked her to help decor ate a cake . Lily
Joe , " Let 's decor ate the room for a
Jane , " Let 's decor ate the diamond ourselves !"
day , she decided to decor ate her room . She
Tom and Lily wanted to decor ate their house for the
's mom asked her to decor ate the living room for
So , she decided to decor ate it with all the
girl asked her mom to decor ate . Her mom said
" Let 's decor ate the Christmas tree ,"
" Let 's decor ate the sand ," said
tired . She wanted to decor ate the room to make
So , she decided to decor ate it with some pretty
day , they decided to decor ate their garden with lots
Jill , " Let 's decor ate it ! This will
One day she decided to decor ate the sink in her
Today they had decided to decor ate the house .
said , " Let 's decor ate him !"
their day and decided to decor ate the living room .
boy named Tim wanted to decor ate his room . He

INTERVAL 0.040 - 0.078
CONTAINS 0.515%

they saw a big whale on the shore . It was
frogs ! They were jumping everywhere and making lots of noise
what her mom my said when she got hurt .
on the head . The dog y el ped and ran
were in the park playing when they saw the rocket .

INTERVAL 0.002 - 0.040
CONTAINS 43.651%

. She did not know it was glue . She wanted
she finally remembered , it was already dark outside . Lily
OK , oct opus ," said the duck . " I
The rain didn 't spoil their trip after all . <|endoftext|>
want to build a house with the boxes .

INTERVAL -0.036 - 0.002
CONTAINS 54.818%

. He remembered that his dad told him that this button
" Yes , thank you ," Lily and Sam
fit team . They continued to play and have fun in
He was taken to the hospital , but it was too
" Max , you are in trouble . You need

INTERVAL -0.074 - -0.036
CONTAINS 1.016%

, there was a little girl named Lily . She loved
! At the end of the day , the
One day Jenna had an idea . She asked her
and had a lot of fun . Tim was happy to
up , it heard a voice and asked , " Who

Token soft
Token ID2705
Feature activation-8.89e-05

Pos logit contributions
 drown+0.034
rified+0.034
 trigger+0.032
 Ow+0.031
 drowned+0.030
Neg logit contributions
up-0.034
old-0.028
 oblig-0.027
Sarah-0.027
ate-0.027
Token t
Token ID256
Feature activation-8.51e-03

Pos logit contributions
up+0.000
Sarah+0.000
 oblig+0.000
aver+0.000
ate+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 trigger-0.000
 Ow-0.000
 fists-0.000
Tokeneddy
Token ID21874
Feature activation+1.25e-02

Pos logit contributions
up+0.011
 oblig+0.009
Sarah+0.009
 kind+0.009
aver+0.008
Neg logit contributions
rified-0.018
 drown-0.017
 trigger-0.016
 Ow-0.016
 screams-0.016
Token bear
Token ID6842
Feature activation+1.23e-02

Pos logit contributions
 drown+0.025
rified+0.024
 Ow+0.022
 screams+0.022
 fists+0.022
Neg logit contributions
up-0.018
 kind-0.016
Sarah-0.015
aver-0.015
oe-0.014
Token named
Token ID3706
Feature activation+6.91e-03

Pos logit contributions
rified+0.026
 drown+0.025
 Ow+0.023
 trigger+0.023
 screams+0.022
Neg logit contributions
up-0.017
Sarah-0.013
 kind-0.013
oe-0.013
aver-0.013
Token Mr
Token ID1770
Feature activation+7.75e-02

Pos logit contributions
rified+0.010
 drown+0.009
 Ow+0.008
 trigger+0.008
 screams+0.008
Neg logit contributions
up-0.014
oe-0.012
Sarah-0.012
 oblig-0.012
aver-0.012
Token.
Token ID13
Feature activation+2.99e-02

Pos logit contributions
rified+0.142
 drown+0.136
 Ow+0.127
 trigger+0.123
 screams+0.119
Neg logit contributions
up-0.127
Sarah-0.107
aver-0.102
 oblig-0.102
 kind-0.098
Token Bear
Token ID14732
Feature activation+7.44e-03

Pos logit contributions
rified+0.041
 drown+0.039
 trigger+0.034
 screams+0.033
 Ow+0.033
Neg logit contributions
up-0.061
Sarah-0.054
 kind-0.053
oe-0.053
 oblig-0.052
Token.
Token ID13
Feature activation-2.73e-04

Pos logit contributions
rified+0.016
 drown+0.015
 trigger+0.014
 Ow+0.014
 screams+0.014
Neg logit contributions
up-0.010
 kind-0.008
Sarah-0.008
oe-0.008
aver-0.007
Token She
Token ID1375
Feature activation-1.03e-02

Pos logit contributions
up+0.000
aver+0.000
Sarah+0.000
oe+0.000
 oblig+0.000
Neg logit contributions
rified-0.001
 drown-0.000
 Ow-0.000
 screams-0.000
 trigger-0.000
Token loved
Token ID6151
Feature activation-7.27e-04

Pos logit contributions
up+0.016
Sarah+0.013
oe+0.013
 oblig+0.013
 kind+0.013
Neg logit contributions
rified-0.020
 drown-0.019
 Ow-0.017
 trigger-0.017
 screams-0.017
Token said
Token ID531
Feature activation+9.19e-03

Pos logit contributions
rified+0.028
 drown+0.027
 screams+0.025
 Ow+0.025
 trigger+0.025
Neg logit contributions
up-0.025
Sarah-0.021
 oblig-0.021
oe-0.021
aver-0.020
Token "
Token ID366
Feature activation+3.13e-02

Pos logit contributions
rified+0.016
 drown+0.015
 Ow+0.015
 trigger+0.014
 fists+0.014
Neg logit contributions
up-0.015
Sarah-0.013
oe-0.012
 oblig-0.012
aver-0.012
Tokendel
Token ID12381
Feature activation+1.99e-03

Pos logit contributions
rified+0.058
 drown+0.053
 Ow+0.053
 trigger+0.051
 drowned+0.048
Neg logit contributions
up-0.048
 kind-0.041
aver-0.041
 oblig-0.039
 generous-0.039
Tokeniver
Token ID1428
Feature activation+4.64e-03

Pos logit contributions
rified+0.003
 drown+0.003
 trigger+0.003
 Ow+0.003
 fists+0.003
Neg logit contributions
up-0.003
 kind-0.003
 oblig-0.003
Sarah-0.003
 generous-0.003
Token to
Token ID284
Feature activation+1.07e-02

Pos logit contributions
rified+0.008
 drown+0.008
 trigger+0.007
 Ow+0.007
 screams+0.007
Neg logit contributions
up-0.008
Sarah-0.006
 kind-0.006
oe-0.006
 oblig-0.006
Token Mr
Token ID1770
Feature activation+7.64e-02

Pos logit contributions
rified+0.022
 drown+0.021
 Ow+0.020
 trigger+0.019
 screams+0.019
Neg logit contributions
up-0.015
Sarah-0.012
oe-0.012
 oblig-0.011
aver-0.011
Token.
Token ID13
Feature activation+2.02e-02

Pos logit contributions
rified+0.134
 drown+0.128
 Ow+0.123
 trigger+0.117
 screams+0.113
Neg logit contributions
up-0.127
Sarah-0.113
 oblig-0.107
aver-0.104
pron-0.101
Token Smith
Token ID4176
Feature activation+8.02e-03

Pos logit contributions
rified+0.030
 drown+0.028
 Ow+0.025
 trigger+0.025
 screams+0.024
Neg logit contributions
up-0.040
Sarah-0.035
 kind-0.034
 oblig-0.034
oe-0.033
Token".
Token ID1911
Feature activation+2.96e-03

Pos logit contributions
rified+0.015
 drown+0.014
 trigger+0.013
 Ow+0.013
 screams+0.013
Neg logit contributions
up-0.013
 kind-0.010
Sarah-0.010
oe-0.010
aver-0.010
Token They
Token ID1119
Feature activation+2.23e-03

Pos logit contributions
rified+0.006
 drown+0.006
 trigger+0.006
 Ow+0.006
 screams+0.006
Neg logit contributions
up-0.004
Sarah-0.003
 kind-0.003
oe-0.003
 oblig-0.003
Token wondered
Token ID14028
Feature activation-4.49e-04

Pos logit contributions
rified+0.004
 drown+0.004
 trigger+0.003
 Ow+0.003
 drowning+0.003
Neg logit contributions
up-0.004
 kind-0.004
Sarah-0.004
aver-0.003
 oblig-0.003
Token was
Token ID373
Feature activation+9.67e-03

Pos logit contributions
rified+0.009
 drown+0.009
 trigger+0.008
 Ow+0.008
 fists+0.008
Neg logit contributions
up-0.008
 kind-0.007
aver-0.007
Sarah-0.007
 generous-0.006
Token happy
Token ID3772
Feature activation-6.01e-05

Pos logit contributions
rified+0.016
 drown+0.016
 drowned+0.015
 screams+0.014
 drowning+0.014
Neg logit contributions
up-0.017
Sarah-0.014
oe-0.014
 oblig-0.014
ate-0.013
Token that
Token ID326
Feature activation+1.28e-02

Pos logit contributions
up+0.000
Sarah+0.000
 kind+0.000
oe+0.000
 oblig+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 trigger-0.000
 Ow-0.000
 screams-0.000
Token she
Token ID673
Feature activation+1.59e-03

Pos logit contributions
rified+0.033
 drown+0.032
 trigger+0.030
 Ow+0.030
 screams+0.029
Neg logit contributions
up-0.012
 kind-0.009
Sarah-0.008
oe-0.008
 oblig-0.008
Token and
Token ID290
Feature activation+8.50e-03

Pos logit contributions
rified+0.003
 drown+0.003
 Ow+0.003
 trigger+0.003
 fists+0.003
Neg logit contributions
up-0.002
 kind-0.002
Sarah-0.002
aver-0.002
oe-0.002
Token Mr
Token ID1770
Feature activation+7.56e-02

Pos logit contributions
rified+0.022
 drown+0.022
 trigger+0.022
 Ow+0.021
 fists+0.020
Neg logit contributions
up-0.007
 kind-0.006
 generous-0.005
 patient-0.005
Sarah-0.004
Token.
Token ID13
Feature activation+2.60e-02

Pos logit contributions
rified+0.157
 drown+0.151
 trigger+0.139
 Ow+0.138
 screams+0.136
Neg logit contributions
up-0.106
Sarah-0.085
 kind-0.083
oe-0.082
 oblig-0.081
Token Sn
Token ID5489
Feature activation+4.29e-02

Pos logit contributions
rified+0.041
 screams+0.035
 drown+0.034
 trigger+0.034
 tongues+0.033
Neg logit contributions
up-0.046
oe-0.043
 kind-0.040
Sarah-0.040
 oblig-0.040
Tokenugg
Token ID6837
Feature activation+1.81e-02

Pos logit contributions
rified+0.091
 drown+0.088
 trigger+0.081
 Ow+0.081
 screams+0.080
Neg logit contributions
up-0.059
Sarah-0.046
oe-0.046
 kind-0.044
 oblig-0.044
Tokenles
Token ID829
Feature activation-5.33e-03

Pos logit contributions
rified+0.033
 drown+0.033
 Ow+0.029
 fists+0.028
 screams+0.028
Neg logit contributions
up-0.031
oe-0.025
 kind-0.024
ate-0.024
Sarah-0.024
Token were
Token ID547
Feature activation-4.35e-03

Pos logit contributions
up+0.009
 kind+0.007
Sarah+0.007
 oblig+0.007
oe+0.007
Neg logit contributions
rified-0.010
 drown-0.009
 Ow-0.009
 trigger-0.009
 fists-0.008
Token tree
Token ID5509
Feature activation-1.10e-02

Pos logit contributions
rified+0.023
 drown+0.022
 trigger+0.021
 Ow+0.020
 screams+0.020
Neg logit contributions
up-0.019
ate-0.015
Sarah-0.015
 oblig-0.015
aver-0.015
Token with
Token ID351
Feature activation+2.20e-02

Pos logit contributions
up+0.018
 oblig+0.016
Sarah+0.016
aver+0.015
ate+0.014
Neg logit contributions
rified-0.019
 drown-0.018
 screams-0.017
 drowning-0.016
 trigger-0.016
Token orange
Token ID10912
Feature activation-3.69e-03

Pos logit contributions
rified+0.050
 drown+0.048
 trigger+0.044
 Ow+0.043
 screams+0.043
Neg logit contributions
up-0.027
 kind-0.022
Sarah-0.021
 generous-0.019
aver-0.019
Token fruits
Token ID15921
Feature activation+1.52e-02

Pos logit contributions
up+0.007
Sarah+0.005
 kind+0.005
oe+0.005
aver+0.005
Neg logit contributions
rified-0.006
 drown-0.006
 trigger-0.005
 Ow-0.005
 screams-0.005
Token hanging
Token ID10938
Feature activation-5.59e-03

Pos logit contributions
rified+0.034
 drown+0.032
 Ow+0.030
 trigger+0.029
 fists+0.029
Neg logit contributions
up-0.020
 kind-0.016
oe-0.016
Sarah-0.015
 generous-0.014
Token from
Token ID422
Feature activation+7.55e-02

Pos logit contributions
up+0.010
Sarah+0.008
 oblig+0.008
 kind+0.008
aver+0.008
Neg logit contributions
rified-0.010
 drown-0.009
 trigger-0.008
 Ow-0.008
 screams-0.008
Token its
Token ID663
Feature activation+2.06e-02

Pos logit contributions
rified+0.167
 drown+0.159
 trigger+0.145
 screams+0.144
 Ow+0.142
Neg logit contributions
up-0.085
oe-0.078
 kind-0.076
Sarah-0.074
ate-0.073
Token branches
Token ID13737
Feature activation+1.83e-02

Pos logit contributions
rified+0.033
 drown+0.032
 trigger+0.028
 Ow+0.028
 screams+0.027
Neg logit contributions
up-0.038
Sarah-0.033
 kind-0.032
oe-0.032
 oblig-0.032
Token.
Token ID13
Feature activation-7.21e-04

Pos logit contributions
rified+0.041
 drown+0.040
 Ow+0.038
 trigger+0.037
 fists+0.036
Neg logit contributions
up-0.022
oe-0.017
 kind-0.017
Sarah-0.017
aver-0.016
Token She
Token ID1375
Feature activation-1.74e-02

Pos logit contributions
up+0.001
aver+0.001
 oblig+0.001
Sarah+0.001
oe+0.001
Neg logit contributions
rified-0.002
 drown-0.001
 Ow-0.001
 trigger-0.001
 screams-0.001
Token picked
Token ID6497
Feature activation-1.76e-03

Pos logit contributions
up+0.032
wn+0.027
ag+0.027
 oblig+0.027
cluded+0.027
Neg logit contributions
rified-0.024
 screams-0.023
 drowned-0.023
 couldn-0.023
 cries-0.021
Token and
Token ID290
Feature activation+1.14e-02

Pos logit contributions
up+0.010
 kind+0.008
oe+0.008
Sarah+0.008
 oblig+0.007
Neg logit contributions
rified-0.014
 drown-0.014
 Ow-0.013
 trigger-0.013
 screams-0.012
Token moons
Token ID31093
Feature activation+2.62e-03

Pos logit contributions
rified+0.022
 drown+0.021
 trigger+0.019
 Ow+0.019
 fists+0.018
Neg logit contributions
up-0.017
 kind-0.015
Sarah-0.014
 oblig-0.014
 generous-0.013
Token.
Token ID13
Feature activation-7.51e-03

Pos logit contributions
rified+0.005
 drown+0.005
 Ow+0.005
 trigger+0.005
 screams+0.005
Neg logit contributions
up-0.004
Sarah-0.003
 kind-0.003
oe-0.003
 oblig-0.003
Token Lily
Token ID20037
Feature activation-1.27e-03

Pos logit contributions
up+0.010
 oblig+0.007
Sarah+0.007
 kind+0.007
oe+0.007
Neg logit contributions
rified-0.016
 drown-0.016
 Ow-0.015
 trigger-0.015
 screams-0.014
Token and
Token ID290
Feature activation+9.58e-03

Pos logit contributions
up+0.002
Sarah+0.002
 kind+0.002
oe+0.001
aver+0.001
Neg logit contributions
rified-0.003
 drown-0.002
 trigger-0.002
 Ow-0.002
 fists-0.002
Token Mr
Token ID1770
Feature activation+7.53e-02

Pos logit contributions
rified+0.017
 drown+0.016
 trigger+0.014
 screams+0.014
 fists+0.014
Neg logit contributions
up-0.016
 kind-0.014
 oblig-0.013
Sarah-0.013
 generous-0.013
Token.
Token ID13
Feature activation+2.53e-02

Pos logit contributions
rified+0.166
 drown+0.160
 trigger+0.148
 Ow+0.146
 screams+0.145
Neg logit contributions
up-0.094
 kind-0.074
oe-0.073
Sarah-0.070
 oblig-0.068
Token Sn
Token ID5489
Feature activation+4.62e-02

Pos logit contributions
rified+0.045
 screams+0.040
heit+0.038
 drown+0.038
lling+0.037
Neg logit contributions
oe-0.034
 kind-0.034
up-0.033
 handy-0.033
ate-0.032
Tokenugg
Token ID6837
Feature activation+1.69e-02

Pos logit contributions
rified+0.095
 drown+0.091
 Ow+0.084
 trigger+0.083
 screams+0.082
Neg logit contributions
up-0.065
Sarah-0.054
 kind-0.052
 oblig-0.051
aver-0.051
Tokenles
Token ID829
Feature activation+3.60e-04

Pos logit contributions
rified+0.030
 drown+0.029
 Ow+0.026
 screams+0.025
 trigger+0.025
Neg logit contributions
up-0.030
oe-0.024
 kind-0.023
Sarah-0.023
ate-0.023
Token were
Token ID547
Feature activation+8.36e-03

Pos logit contributions
rified+0.001
 drown+0.001
 trigger+0.001
 Ow+0.001
 screams+0.001
Neg logit contributions
up-0.001
Sarah-0.001
 kind-0.001
oe-0.001
 oblig-0.001
TokenN
Token ID45
Feature activation-1.01e-02

Pos logit contributions
rified+0.038
 drown+0.036
 trigger+0.033
 Ow+0.033
 screams+0.032
Neg logit contributions
up-0.027
Sarah-0.022
 kind-0.021
oe-0.021
 oblig-0.021
Tokenorman
Token ID26183
Feature activation-9.95e-03

Pos logit contributions
up+0.013
Sarah+0.010
oe+0.010
 kind+0.009
aver+0.009
Neg logit contributions
rified-0.023
 drown-0.022
 trigger-0.020
 Ow-0.020
 screams-0.020
Token and
Token ID290
Feature activation+7.91e-03

Pos logit contributions
up+0.016
Sarah+0.014
 oblig+0.013
 kind+0.013
oe+0.013
Neg logit contributions
rified-0.018
 drown-0.017
 Ow-0.015
 screams-0.015
 drowned-0.015
Token his
Token ID465
Feature activation+1.52e-02

Pos logit contributions
rified+0.015
 drown+0.014
 Ow+0.013
 trigger+0.013
 screams+0.013
Neg logit contributions
up-0.012
Sarah-0.010
oe-0.010
 kind-0.010
aver-0.010
Token tutor
Token ID48680
Feature activation+2.34e-02

Pos logit contributions
rified+0.030
 drown+0.028
 trigger+0.026
 Ow+0.025
 drowned+0.025
Neg logit contributions
up-0.024
 kind-0.020
Sarah-0.019
 oblig-0.019
oe-0.019
Token Mr
Token ID1770
Feature activation+7.50e-02

Pos logit contributions
rified+0.042
 drown+0.040
 trigger+0.037
 Ow+0.037
 screams+0.036
Neg logit contributions
up-0.039
Sarah-0.032
 kind-0.032
oe-0.031
 oblig-0.031
Token.
Token ID13
Feature activation+2.46e-02

Pos logit contributions
rified+0.159
 drown+0.151
 trigger+0.140
 Ow+0.139
 screams+0.136
Neg logit contributions
up-0.101
oe-0.080
 kind-0.080
Sarah-0.079
 oblig-0.078
Token Johnson
Token ID5030
Feature activation+1.51e-02

Pos logit contributions
rified+0.038
 drown+0.037
 screams+0.034
 trigger+0.033
 fists+0.032
Neg logit contributions
up-0.042
oe-0.040
Sarah-0.039
 oblig-0.038
 kind-0.037
Token were
Token ID547
Feature activation+3.13e-04

Pos logit contributions
rified+0.028
 drown+0.027
 Ow+0.024
 trigger+0.024
 screams+0.024
Neg logit contributions
up-0.024
Sarah-0.020
 kind-0.020
oe-0.020
 oblig-0.020
Token in
Token ID287
Feature activation+1.25e-02

Pos logit contributions
rified+0.001
 drown+0.000
 Ow+0.000
 drowning+0.000
 screams+0.000
Neg logit contributions
up-0.001
Sarah-0.000
 oblig-0.000
aid-0.000
aver-0.000
Token the
Token ID262
Feature activation-7.26e-03

Pos logit contributions
rified+0.029
 drown+0.027
 trigger+0.026
 Ow+0.025
 screams+0.025
Neg logit contributions
up-0.014
 kind-0.012
 oblig-0.011
Sarah-0.011
 generous-0.011
Token in
Token ID287
Feature activation+9.37e-03

Pos logit contributions
 drown+0.017
rified+0.017
 trigger+0.016
 Ow+0.016
 tongues+0.016
Neg logit contributions
up-0.010
 kind-0.009
 nice-0.008
 generous-0.008
 wider-0.008
Token awe
Token ID25030
Feature activation+2.35e-03

Pos logit contributions
rified+0.019
 drown+0.019
 trigger+0.018
 drowned+0.017
 Ow+0.017
Neg logit contributions
 kind-0.012
up-0.012
 nice-0.011
 generous-0.011
 five-0.011
Token
Token ID6184
Feature activation-1.20e-02

Pos logit contributions
rified+0.005
 drown+0.005
 trigger+0.004
 Ow+0.004
 screams+0.004
Neg logit contributions
up-0.003
Sarah-0.003
 kind-0.002
oe-0.002
 oblig-0.002
Token
Token ID95
Feature activation-1.18e-02

Pos logit contributions
up+0.021
Sarah+0.018
oe+0.017
 kind+0.017
 oblig+0.017
Neg logit contributions
rified-0.021
 drown-0.019
 trigger-0.018
 Ow-0.018
 screams-0.017
Token
Token ID26391
Feature activation+2.43e-02

Pos logit contributions
up+0.020
Sarah+0.016
 kind+0.016
oe+0.016
aver+0.016
Neg logit contributions
rified-0.021
 drown-0.020
 Ow-0.018
 screams-0.018
 trigger-0.018
Token
Token ID447
Feature activation+7.46e-02

Pos logit contributions
rified+0.048
 trigger+0.047
 drown+0.045
 fists+0.041
 Ow+0.041
Neg logit contributions
up-0.037
Sarah-0.031
oe-0.031
 kind-0.031
ari-0.031
Token
Token ID250
Feature activation+8.24e-03

Pos logit contributions
rified+0.137
 drown+0.136
 trigger+0.128
 fists+0.124
 screams+0.119
Neg logit contributions
up-0.113
oe-0.105
 kind-0.100
Sarah-0.099
ate-0.096
Token the
Token ID262
Feature activation-5.35e-03

Pos logit contributions
rified+0.015
 drown+0.014
 trigger+0.013
 Ow+0.013
 screams+0.013
Neg logit contributions
up-0.013
Sarah-0.011
 kind-0.011
oe-0.011
 oblig-0.011
Token cliff
Token ID19516
Feature activation-2.74e-03

Pos logit contributions
 kind+0.008
up+0.008
 wise+0.007
 nice+0.007
 patient+0.007
Neg logit contributions
rified-0.010
 trigger-0.009
 drown-0.009
ishly-0.009
 tongues-0.009
Token was
Token ID373
Feature activation+7.00e-03

Pos logit contributions
up+0.005
 kind+0.004
oe+0.004
Sarah+0.004
aver+0.004
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.004
 trigger-0.004
 fists-0.004
Token so
Token ID523
Feature activation+4.64e-03

Pos logit contributions
rified+0.014
 drown+0.013
 trigger+0.012
 Ow+0.012
 screams+0.012
Neg logit contributions
up-0.011
Sarah-0.009
 kind-0.009
oe-0.009
aver-0.008
Token and
Token ID290
Feature activation+7.83e-03

Pos logit contributions
up+0.033
Sarah+0.027
 oblig+0.026
 kind+0.026
oe+0.026
Neg logit contributions
rified-0.040
 drown-0.038
 trigger-0.035
 Ow-0.034
 screams-0.034
Token butterflies
Token ID44571
Feature activation+3.21e-03

Pos logit contributions
rified+0.013
 drown+0.012
 Ow+0.011
 trigger+0.011
 screams+0.011
Neg logit contributions
up-0.014
Sarah-0.012
oe-0.012
 oblig-0.012
aver-0.012
Token.
Token ID13
Feature activation-1.12e-03

Pos logit contributions
rified+0.007
 drown+0.006
 Ow+0.006
 trigger+0.006
 fists+0.006
Neg logit contributions
up-0.004
Sarah-0.004
 kind-0.004
oe-0.003
aver-0.003
Token Sally
Token ID25737
Feature activation+8.80e-03

Pos logit contributions
up+0.002
Sarah+0.001
 oblig+0.001
 kind+0.001
oe+0.001
Neg logit contributions
rified-0.002
 drown-0.002
 Ow-0.002
 trigger-0.002
 screams-0.002
Token and
Token ID290
Feature activation+1.66e-02

Pos logit contributions
rified+0.017
 drown+0.016
 trigger+0.015
 Ow+0.015
 screams+0.014
Neg logit contributions
up-0.014
Sarah-0.011
 kind-0.011
oe-0.011
 oblig-0.011
Token Mr
Token ID1770
Feature activation+7.46e-02

Pos logit contributions
rified+0.033
 drown+0.032
 trigger+0.030
 screams+0.029
 drowned+0.028
Neg logit contributions
up-0.022
 kind-0.020
Sarah-0.020
 generous-0.018
 oblig-0.018
Token.
Token ID13
Feature activation+3.54e-02

Pos logit contributions
rified+0.158
 drown+0.152
 trigger+0.140
 Ow+0.139
 screams+0.138
Neg logit contributions
up-0.099
 kind-0.079
oe-0.077
Sarah-0.077
 oblig-0.075
Token Bunny
Token ID28683
Feature activation+2.94e-02

Pos logit contributions
rified+0.060
 drown+0.054
 screams+0.054
 trigger+0.051
 fists+0.050
Neg logit contributions
up-0.052
 kind-0.052
oe-0.050
 generous-0.049
 oblig-0.049
Token played
Token ID2826
Feature activation+1.61e-03

Pos logit contributions
rified+0.057
 drown+0.055
 trigger+0.051
 Ow+0.051
 fists+0.049
Neg logit contributions
up-0.045
 kind-0.037
Sarah-0.036
oe-0.036
 oblig-0.035
Token in
Token ID287
Feature activation+1.16e-02

Pos logit contributions
rified+0.003
 drown+0.003
 Ow+0.003
 screams+0.003
 trigger+0.002
Neg logit contributions
up-0.003
Sarah-0.002
oe-0.002
aver-0.002
old-0.002
Token the
Token ID262
Feature activation-1.34e-02

Pos logit contributions
rified+0.026
 drown+0.024
 trigger+0.023
 Ow+0.022
 fists+0.022
Neg logit contributions
up-0.014
 kind-0.014
Sarah-0.012
 generous-0.012
oe-0.012
Token were
Token ID547
Feature activation+1.04e-02

Pos logit contributions
rified+0.013
 drown+0.012
 Ow+0.012
 trigger+0.012
 fists+0.010
Neg logit contributions
up-0.012
 kind-0.010
 oblig-0.010
Sarah-0.010
oe-0.010
Token thrilled
Token ID20536
Feature activation+4.75e-03

Pos logit contributions
rified+0.018
 drown+0.017
 trigger+0.016
 Ow+0.016
 screams+0.015
Neg logit contributions
up-0.018
Sarah-0.015
 kind-0.015
oe-0.015
 oblig-0.015
Token
Token ID6184
Feature activation-1.49e-03

Pos logit contributions
rified+0.010
 drown+0.009
 Ow+0.008
 trigger+0.008
 screams+0.008
Neg logit contributions
up-0.007
Sarah-0.006
aver-0.005
 oblig-0.005
 kind-0.005
Token
Token ID95
Feature activation-7.67e-03

Pos logit contributions
up+0.002
aver+0.002
 kind+0.002
Sarah+0.002
pron+0.002
Neg logit contributions
rified-0.003
 drown-0.003
ishly-0.002
lves-0.002
 trigger-0.002
Token
Token ID26391
Feature activation+1.98e-02

Pos logit contributions
up+0.013
 kind+0.011
aver+0.011
Sarah+0.011
oe+0.010
Neg logit contributions
rified-0.013
 drown-0.012
 drowned-0.011
 Ow-0.011
 banks-0.011
Token
Token ID447
Feature activation+7.44e-02

Pos logit contributions
rified+0.038
 drown+0.035
 trigger+0.035
 Ow+0.033
 fists+0.032
Neg logit contributions
up-0.032
Sarah-0.027
oe-0.026
 kind-0.026
 oblig-0.025
Token
Token ID250
Feature activation+1.37e-02

Pos logit contributions
rified+0.113
 drown+0.109
 trigger+0.098
 fists+0.095
 screams+0.094
Neg logit contributions
up-0.142
oe-0.125
Sarah-0.125
 kind-0.124
 oblig-0.123
Token they
Token ID484
Feature activation+5.96e-03

Pos logit contributions
rified+0.024
 drown+0.020
 Ow+0.020
ting+0.019
 trigger+0.019
Neg logit contributions
up-0.024
 kind-0.019
aver-0.019
 oblig-0.019
old-0.019
Token had
Token ID550
Feature activation+5.34e-04

Pos logit contributions
rified+0.009
 drown+0.009
 Ow+0.008
 trigger+0.008
 screams+0.008
Neg logit contributions
up-0.011
 kind-0.010
Sarah-0.010
oe-0.009
 oblig-0.009
Token defeated
Token ID9772
Feature activation-4.42e-03

Pos logit contributions
rified+0.001
 drown+0.001
 Ow+0.001
 trigger+0.001
 screams+0.001
Neg logit contributions
up-0.001
Sarah-0.001
 kind-0.001
oe-0.001
 oblig-0.001
Token the
Token ID262
Feature activation+1.57e-03

Pos logit contributions
up+0.007
 kind+0.006
Sarah+0.005
oe+0.005
ate+0.005
Neg logit contributions
rified-0.009
 drown-0.008
 trigger-0.008
 Ow-0.007
 drowned-0.007
Token They
Token ID1119
Feature activation-4.71e-03

Pos logit contributions
up+0.015
Sarah+0.013
 kind+0.013
oe+0.011
Her+0.011
Neg logit contributions
rified-0.030
 drown-0.028
 trigger-0.027
 fists-0.026
 screams-0.026
Token saw
Token ID2497
Feature activation-5.42e-03

Pos logit contributions
up+0.007
 kind+0.006
aver+0.005
Sarah+0.005
 oblig+0.005
Neg logit contributions
rified-0.010
 drown-0.010
 trigger-0.009
 Ow-0.009
 fists-0.009
Token the
Token ID262
Feature activation+3.61e-03

Pos logit contributions
up+0.008
 kind+0.007
old+0.006
 nice+0.006
 generous+0.006
Neg logit contributions
rified-0.011
 drown-0.011
 fists-0.010
 trigger-0.010
 Ow-0.009
Token dog
Token ID3290
Feature activation+4.53e-02

Pos logit contributions
rified+0.007
heit+0.007
 drown+0.006
 Ow+0.006
urbed+0.006
Neg logit contributions
 kind-0.006
 generous-0.005
 nice-0.005
up-0.005
 lady-0.005
Token lick
Token ID36728
Feature activation+2.40e-02

Pos logit contributions
rified+0.094
 drown+0.087
 Ow+0.086
 trigger+0.084
 fists+0.081
Neg logit contributions
up-0.063
 kind-0.054
oe-0.049
Sarah-0.048
 generous-0.048
Token Mr
Token ID1770
Feature activation+7.44e-02

Pos logit contributions
 drown+0.051
rified+0.049
 Ow+0.045
 fists+0.045
 trigger+0.044
Neg logit contributions
up-0.033
 kind-0.026
Sarah-0.025
oe-0.025
ot-0.024
Token.
Token ID13
Feature activation+2.19e-02

Pos logit contributions
rified+0.147
 drown+0.141
 Ow+0.130
 trigger+0.129
 screams+0.127
Neg logit contributions
up-0.111
Sarah-0.093
 oblig-0.088
 kind-0.087
aver-0.087
Token Lee
Token ID5741
Feature activation+2.20e-02

Pos logit contributions
rified+0.041
lling+0.039
 drown+0.039
lled+0.038
uses+0.038
Neg logit contributions
 kind-0.025
oe-0.025
Sarah-0.024
up-0.024
 oblig-0.024
Token's
Token ID338
Feature activation+7.54e-03

Pos logit contributions
rified+0.046
 drown+0.042
 Ow+0.041
 trigger+0.040
 fists+0.038
Neg logit contributions
up-0.032
 kind-0.026
oe-0.024
Sarah-0.024
 generous-0.024
Token face
Token ID1986
Feature activation+7.47e-03

Pos logit contributions
rified+0.014
 drown+0.013
 Ow+0.012
ting+0.011
 trigger+0.011
Neg logit contributions
up-0.012
 kind-0.011
 generous-0.010
 nice-0.010
 patient-0.010
Token and
Token ID290
Feature activation-3.59e-03

Pos logit contributions
rified+0.015
 drown+0.014
 Ow+0.013
 trigger+0.013
 screams+0.013
Neg logit contributions
up-0.011
Sarah-0.009
 kind-0.009
oe-0.009
 oblig-0.009
Token fluffy
Token ID39145
Feature activation+2.28e-02

Pos logit contributions
rified+0.028
 drown+0.027
 trigger+0.025
 Ow+0.025
 screams+0.024
Neg logit contributions
up-0.020
Sarah-0.016
oe-0.015
 oblig-0.015
ate-0.015
Token that
Token ID326
Feature activation+2.75e-02

Pos logit contributions
 drown+0.058
rified+0.054
 fists+0.053
 trigger+0.053
 Ow+0.052
Neg logit contributions
up-0.021
 kind-0.020
--0.017
 generous-0.015
aver-0.015
Token everyone
Token ID2506
Feature activation+3.73e-03

Pos logit contributions
rified+0.062
 drown+0.059
 trigger+0.056
 Ow+0.055
 fists+0.054
Neg logit contributions
up-0.033
 kind-0.031
 generous-0.027
Sarah-0.026
aver-0.025
Token called
Token ID1444
Feature activation+3.78e-02

Pos logit contributions
rified+0.008
 drown+0.007
 trigger+0.007
 Ow+0.007
 fists+0.007
Neg logit contributions
up-0.005
 kind-0.005
Sarah-0.004
oe-0.004
aver-0.004
Token him
Token ID683
Feature activation+3.36e-02

Pos logit contributions
rified+0.067
 drown+0.064
 drowned+0.059
 trigger+0.058
 fists+0.057
Neg logit contributions
up-0.058
 kind-0.056
 nice-0.051
 generous-0.050
Sarah-0.050
Token Mr
Token ID1770
Feature activation+7.41e-02

Pos logit contributions
rified+0.072
 drown+0.070
 trigger+0.066
 Ow+0.063
 fists+0.063
Neg logit contributions
up-0.044
 kind-0.038
 generous-0.033
Sarah-0.033
aver-0.033
Token.
Token ID13
Feature activation+3.70e-02

Pos logit contributions
rified+0.149
 drown+0.142
 Ow+0.131
 trigger+0.130
 screams+0.127
Neg logit contributions
up-0.109
Sarah-0.090
 kind-0.085
 oblig-0.085
oe-0.084
Token Velvet
Token ID36271
Feature activation+2.17e-02

Pos logit contributions
rified+0.063
 drown+0.061
 screams+0.059
 trigger+0.056
 drowned+0.054
Neg logit contributions
up-0.053
Sarah-0.052
oe-0.051
 kind-0.051
 oblig-0.049
Token.
Token ID13
Feature activation+1.82e-02

Pos logit contributions
rified+0.047
 drown+0.045
 trigger+0.044
 Ow+0.042
 fists+0.041
Neg logit contributions
up-0.030
 kind-0.025
Sarah-0.022
 generous-0.022
aver-0.021
Token Every
Token ID3887
Feature activation+2.39e-02

Pos logit contributions
rified+0.040
 drown+0.039
 trigger+0.036
 screams+0.035
 drowned+0.035
Neg logit contributions
up-0.022
Sarah-0.018
 kind-0.018
oe-0.017
 oblig-0.017
Token day
Token ID1110
Feature activation-1.17e-03

Pos logit contributions
rified+0.050
 drown+0.047
 Ow+0.044
 screams+0.043
 drowned+0.043
Neg logit contributions
up-0.034
Sarah-0.028
oe-0.026
 oblig-0.026
aver-0.025
Token.
Token ID13
Feature activation+1.66e-03

Pos logit contributions
up+0.013
Sarah+0.011
 kind+0.010
oe+0.010
 oblig+0.010
Neg logit contributions
rified-0.020
 drown-0.019
 trigger-0.017
 Ow-0.017
 screams-0.017
Token All
Token ID1439
Feature activation-1.72e-02

Pos logit contributions
rified+0.003
 drown+0.003
 Ow+0.003
 trigger+0.003
 screams+0.003
Neg logit contributions
up-0.002
Sarah-0.002
oe-0.002
aver-0.002
 oblig-0.002
Token the
Token ID262
Feature activation+1.04e-02

Pos logit contributions
up+0.026
 kind+0.022
Sarah+0.021
oe+0.021
aver+0.021
Neg logit contributions
rified-0.033
 drown-0.032
 trigger-0.030
 Ow-0.029
 screams-0.029
Token animals
Token ID4695
Feature activation+1.74e-02

Pos logit contributions
rified+0.021
 drown+0.020
 trigger+0.019
ishly+0.018
 horror+0.018
Neg logit contributions
up-0.015
 kind-0.014
 generous-0.013
 patient-0.012
 nice-0.012
Token loved
Token ID6151
Feature activation+2.23e-02

Pos logit contributions
rified+0.036
 drown+0.033
 trigger+0.033
 Ow+0.033
 fists+0.032
Neg logit contributions
up-0.025
 kind-0.023
 generous-0.020
aver-0.020
oe-0.019
Token Mr
Token ID1770
Feature activation+7.41e-02

Pos logit contributions
rified+0.059
 drown+0.054
 trigger+0.053
 horror+0.050
 fists+0.049
Neg logit contributions
 kind-0.019
up-0.018
 generous-0.016
 nice-0.015
Sarah-0.015
Token.
Token ID13
Feature activation+3.40e-02

Pos logit contributions
rified+0.153
 drown+0.146
 trigger+0.135
 Ow+0.134
 screams+0.132
Neg logit contributions
up-0.104
Sarah-0.084
 kind-0.082
oe-0.081
 oblig-0.080
Token Owl
Token ID37007
Feature activation+1.15e-02

Pos logit contributions
rified+0.059
 drown+0.056
 screams+0.051
 trigger+0.050
 tongues+0.049
Neg logit contributions
up-0.054
 kind-0.048
Sarah-0.048
 oblig-0.047
oe-0.047
Token because
Token ID780
Feature activation+1.28e-02

Pos logit contributions
rified+0.026
 drown+0.025
 trigger+0.024
 Ow+0.023
 fists+0.022
Neg logit contributions
up-0.014
 kind-0.012
Sarah-0.010
aver-0.010
 generous-0.010
Token he
Token ID339
Feature activation+3.23e-02

Pos logit contributions
rified+0.025
 drown+0.024
 trigger+0.022
 Ow+0.022
 screams+0.021
Neg logit contributions
up-0.019
Sarah-0.016
 kind-0.016
oe-0.015
 oblig-0.015
Token was
Token ID373
Feature activation+1.51e-02

Pos logit contributions
rified+0.068
 drown+0.064
 trigger+0.063
 Ow+0.060
 fists+0.060
Neg logit contributions
up-0.045
 kind-0.040
Sarah-0.037
aver-0.036
 generous-0.035
Token intelligent
Token ID12661
Feature activation+9.46e-03

Pos logit contributions
rified+0.018
 drown+0.017
 Ow+0.016
 trigger+0.016
 screams+0.016
Neg logit contributions
up-0.017
 oblig-0.014
old-0.014
ate-0.014
oe-0.013
Token t
Token ID256
Feature activation-7.65e-03

Pos logit contributions
rified+0.015
 drown+0.014
 trigger+0.013
 Ow+0.013
 screams+0.012
Neg logit contributions
up-0.018
Sarah-0.016
 oblig-0.015
ate-0.015
oe-0.015
Tokeneddy
Token ID21874
Feature activation+4.46e-03

Pos logit contributions
up+0.009
Sarah+0.008
 oblig+0.008
 kind+0.007
aver+0.007
Neg logit contributions
rified-0.017
 drown-0.016
 Ow-0.015
 trigger-0.015
 screams-0.014
Token bear
Token ID6842
Feature activation+3.02e-03

Pos logit contributions
 drown+0.009
rified+0.009
 Ow+0.008
 fists+0.008
 trigger+0.008
Neg logit contributions
up-0.006
 kind-0.006
Sarah-0.005
aver-0.005
 generous-0.005
Token named
Token ID3706
Feature activation+1.30e-02

Pos logit contributions
rified+0.006
 drown+0.006
 trigger+0.006
 Ow+0.006
 screams+0.006
Neg logit contributions
up-0.004
Sarah-0.003
 kind-0.003
oe-0.003
aver-0.003
Token Mr
Token ID1770
Feature activation+7.40e-02

Pos logit contributions
rified+0.020
 drown+0.019
 trigger+0.017
 screams+0.017
 Ow+0.017
Neg logit contributions
up-0.024
Sarah-0.021
 kind-0.021
 oblig-0.020
oe-0.020
Token.
Token ID13
Feature activation+3.27e-02

Pos logit contributions
rified+0.136
 drown+0.132
 Ow+0.123
 trigger+0.118
 screams+0.116
Neg logit contributions
up-0.119
Sarah-0.101
aver-0.097
 oblig-0.095
 kind-0.092
Token H
Token ID367
Feature activation+1.02e-02

Pos logit contributions
rified+0.047
 drown+0.044
 trigger+0.039
 screams+0.039
 Ow+0.038
Neg logit contributions
up-0.064
Sarah-0.058
 kind-0.057
oe-0.056
 oblig-0.055
Tokenugs
Token ID10339
Feature activation+1.01e-02

Pos logit contributions
rified+0.021
 drown+0.020
 Ow+0.019
 screams+0.018
 trigger+0.018
Neg logit contributions
up-0.013
 kind-0.011
 generous-0.011
Sarah-0.011
 oblig-0.010
Token.
Token ID13
Feature activation+4.05e-03

Pos logit contributions
rified+0.021
 drown+0.020
 trigger+0.019
 Ow+0.019
 screams+0.018
Neg logit contributions
up-0.014
 kind-0.011
Sarah-0.011
oe-0.011
 oblig-0.010
Token Lily
Token ID20037
Feature activation-1.93e-03

Pos logit contributions
rified+0.007
 drown+0.007
 Ow+0.007
 trigger+0.006
 horror+0.006
Neg logit contributions
up-0.007
aver-0.006
 oblig-0.006
oe-0.005
Sarah-0.005
Token beautiful
Token ID4950
Feature activation-1.12e-02

Pos logit contributions
rified+0.055
 drown+0.053
 trigger+0.048
 Ow+0.048
 screams+0.047
Neg logit contributions
up-0.040
Sarah-0.032
 oblig-0.031
oe-0.031
aver-0.031
Token rainbow
Token ID27223
Feature activation-3.16e-02

Pos logit contributions
up+0.022
 oblig+0.018
old+0.018
Sarah+0.018
aver+0.017
Neg logit contributions
rified-0.017
 drown-0.015
 Ow-0.015
 trigger-0.015
 screams-0.015
Token.
Token ID13
Feature activation+2.09e-03

Pos logit contributions
up+0.052
Sarah+0.044
 oblig+0.044
 kind+0.042
aver+0.042
Neg logit contributions
rified-0.057
 drown-0.053
 trigger-0.049
 screams-0.048
 Ow-0.047
Token Lucy
Token ID22162
Feature activation+1.44e-02

Pos logit contributions
rified+0.004
 drown+0.004
 trigger+0.004
 Ow+0.004
 screams+0.003
Neg logit contributions
up-0.003
Sarah-0.003
 kind-0.003
oe-0.003
 oblig-0.003
Token and
Token ID290
Feature activation+1.14e-02

Pos logit contributions
rified+0.030
 drown+0.029
 trigger+0.027
 Ow+0.027
 fists+0.026
Neg logit contributions
up-0.021
 kind-0.017
Sarah-0.017
oe-0.016
aver-0.016
Token Mr
Token ID1770
Feature activation+7.36e-02

Pos logit contributions
rified+0.020
 drown+0.019
 trigger+0.017
 screams+0.017
 Ow+0.016
Neg logit contributions
up-0.020
Sarah-0.017
 kind-0.017
 oblig-0.016
 generous-0.016
Token.
Token ID13
Feature activation+3.52e-02

Pos logit contributions
rified+0.155
 drown+0.149
 trigger+0.137
 Ow+0.137
 screams+0.135
Neg logit contributions
up-0.100
 kind-0.079
Sarah-0.078
oe-0.078
 oblig-0.075
Token Wh
Token ID854
Feature activation+2.90e-02

Pos logit contributions
rified+0.068
 drown+0.060
 screams+0.059
 trigger+0.056
der+0.055
Neg logit contributions
up-0.045
oe-0.044
 kind-0.043
Sarah-0.042
 generous-0.040
Tokenisk
Token ID1984
Feature activation+1.84e-02

Pos logit contributions
rified+0.057
 drown+0.054
 Ow+0.050
 trigger+0.049
 drowned+0.049
Neg logit contributions
up-0.043
Sarah-0.036
 oblig-0.035
 kind-0.035
aver-0.034
Tokeners
Token ID364
Feature activation+2.96e-02

Pos logit contributions
rified+0.042
 drown+0.039
 drowned+0.037
 trigger+0.037
ting+0.036
Neg logit contributions
up-0.019
pron-0.016
 oblig-0.015
Sarah-0.015
 kind-0.015
Token were
Token ID547
Feature activation+1.12e-02

Pos logit contributions
rified+0.044
 drown+0.041
 Ow+0.038
 trigger+0.037
 fists+0.034
Neg logit contributions
up-0.060
 kind-0.054
Sarah-0.052
 oblig-0.052
oe-0.051
Token girl
Token ID2576
Feature activation-3.29e-02

Pos logit contributions
rified+0.038
 drown+0.035
 trigger+0.033
 screams+0.031
 Ow+0.031
Neg logit contributions
up-0.033
Sarah-0.028
ate-0.027
 oblig-0.027
aver-0.027
Token called
Token ID1444
Feature activation+1.46e-02

Pos logit contributions
up+0.046
pron+0.039
 oblig+0.038
oe+0.038
ate+0.038
Neg logit contributions
rified-0.065
 drown-0.059
 drowned-0.057
 screams-0.056
ting-0.056
Token Emily
Token ID17608
Feature activation-6.07e-03

Pos logit contributions
rified+0.028
 drown+0.026
 Ow+0.024
 trigger+0.024
 screams+0.023
Neg logit contributions
up-0.023
Sarah-0.019
aver-0.019
 oblig-0.019
 kind-0.018
Token came
Token ID1625
Feature activation-2.05e-03

Pos logit contributions
up+0.008
Sarah+0.007
 kind+0.007
oe+0.006
 oblig+0.006
Neg logit contributions
rified-0.013
 drown-0.012
 trigger-0.011
 Ow-0.011
 screams-0.011
Token to
Token ID284
Feature activation+2.27e-02

Pos logit contributions
up+0.003
pron+0.003
aver+0.003
Sarah+0.003
br+0.003
Neg logit contributions
 drown-0.003
rified-0.003
ting-0.003
 screams-0.003
 drowning-0.003
Token Mr
Token ID1770
Feature activation+7.36e-02

Pos logit contributions
rified+0.050
 drown+0.049
 Ow+0.045
 trigger+0.045
 screams+0.044
Neg logit contributions
up-0.029
Sarah-0.022
aver-0.022
oe-0.021
 kind-0.020
Token Smith
Token ID4176
Feature activation+1.40e-02

Pos logit contributions
rified+0.132
 drown+0.125
 Ow+0.117
 screams+0.113
 trigger+0.112
Neg logit contributions
up-0.122
Sarah-0.103
aver-0.103
 oblig-0.098
 kind-0.095
Token's
Token ID338
Feature activation+1.51e-02

Pos logit contributions
rified+0.029
 drown+0.027
 trigger+0.025
 Ow+0.025
 screams+0.025
Neg logit contributions
up-0.020
Sarah-0.016
 kind-0.016
oe-0.016
 oblig-0.015
Token vegetable
Token ID20236
Feature activation+1.72e-02

Pos logit contributions
rified+0.034
 drown+0.031
 trigger+0.030
 Ow+0.029
 drowned+0.029
Neg logit contributions
up-0.019
 kind-0.017
 generous-0.016
oe-0.015
 patient-0.014
Token garden
Token ID11376
Feature activation-2.00e-02

Pos logit contributions
rified+0.038
 drown+0.036
 Ow+0.034
 trigger+0.034
 fists+0.033
Neg logit contributions
up-0.022
 kind-0.018
 generous-0.016
aver-0.016
Sarah-0.016
Token.
Token ID13
Feature activation+1.99e-03

Pos logit contributions
up+0.029
Sarah+0.023
 kind+0.022
 oblig+0.022
oe+0.022
Neg logit contributions
rified-0.041
 drown-0.039
 trigger-0.036
 Ow-0.036
 screams-0.035
Token named
Token ID3706
Feature activation-9.74e-03

Pos logit contributions
up+0.046
pron+0.040
 oblig+0.039
ate+0.039
Sarah+0.038
Neg logit contributions
rified-0.067
 drown-0.060
 screams-0.059
 drowned-0.058
ting-0.057
Token Lily
Token ID20037
Feature activation-1.51e-02

Pos logit contributions
up+0.014
aver+0.011
 oblig+0.011
Sarah+0.011
 kind+0.011
Neg logit contributions
rified-0.020
 drown-0.018
 Ow-0.017
 trigger-0.017
 screams-0.017
Token came
Token ID1625
Feature activation-7.47e-05

Pos logit contributions
up+0.020
Sarah+0.017
 oblig+0.016
 kind+0.016
oe+0.016
Neg logit contributions
rified-0.032
 drown-0.030
 Ow-0.028
 trigger-0.028
 screams-0.028
Token to
Token ID284
Feature activation+2.30e-02

Pos logit contributions
up+0.000
pron+0.000
aver+0.000
br+0.000
 oblig+0.000
Neg logit contributions
 drown-0.000
rified-0.000
 screams-0.000
ting-0.000
 fists-0.000
Token visit
Token ID3187
Feature activation+2.20e-02

Pos logit contributions
rified+0.044
 drown+0.044
 Ow+0.041
 trigger+0.039
 screams+0.039
Neg logit contributions
up-0.035
aver-0.029
Sarah-0.026
oe-0.026
 kind-0.025
Token Mr
Token ID1770
Feature activation+7.32e-02

Pos logit contributions
rified+0.047
 drown+0.045
 trigger+0.042
 Ow+0.041
 screams+0.041
Neg logit contributions
up-0.029
Sarah-0.024
 kind-0.023
oe-0.023
 oblig-0.023
Token.
Token ID13
Feature activation+2.72e-02

Pos logit contributions
rified+0.127
 drown+0.119
 Ow+0.113
 trigger+0.107
 screams+0.107
Neg logit contributions
up-0.125
Sarah-0.109
aver-0.107
 oblig-0.103
 kind-0.099
Token John
Token ID1757
Feature activation+1.13e-02

Pos logit contributions
rified+0.040
 drown+0.038
 trigger+0.034
 Ow+0.033
 screams+0.033
Neg logit contributions
up-0.054
Sarah-0.047
 kind-0.046
oe-0.046
 oblig-0.045
Token.
Token ID13
Feature activation+9.66e-03

Pos logit contributions
rified+0.022
 drown+0.021
 trigger+0.019
 Ow+0.019
 screams+0.019
Neg logit contributions
up-0.017
Sarah-0.014
aver-0.013
 oblig-0.013
oe-0.013
Token She
Token ID1375
Feature activation-1.29e-02

Pos logit contributions
rified+0.020
 drown+0.019
 Ow+0.018
 trigger+0.017
 screams+0.017
Neg logit contributions
up-0.014
aver-0.011
ate-0.010
 oblig-0.010
Sarah-0.010
Token saw
Token ID2497
Feature activation+2.43e-05

Pos logit contributions
up+0.023
Sarah+0.019
 oblig+0.019
aver+0.019
oe+0.019
Neg logit contributions
rified-0.021
 drown-0.020
 screams-0.018
 drowned-0.018
 Ow-0.018
Token and
Token ID290
Feature activation-6.74e-03

Pos logit contributions
up+0.015
Sarah+0.012
 kind+0.012
oe+0.011
 oblig+0.011
Neg logit contributions
rified-0.021
 drown-0.020
 trigger-0.018
 Ow-0.018
 screams-0.018
Token enjoyed
Token ID8359
Feature activation-1.13e-02

Pos logit contributions
up+0.011
aver+0.009
oe+0.009
Sarah+0.009
 oblig+0.009
Neg logit contributions
rified-0.012
 drown-0.012
 screams-0.011
 drowned-0.011
 trigger-0.011
Token playing
Token ID2712
Feature activation-2.37e-02

Pos logit contributions
up+0.018
Sarah+0.015
 kind+0.014
oe+0.014
aver+0.014
Neg logit contributions
rified-0.022
 drown-0.021
 Ow-0.019
 trigger-0.019
 screams-0.018
Token chess
Token ID19780
Feature activation-8.22e-03

Pos logit contributions
up+0.036
aver+0.030
oe+0.029
 oblig+0.028
ari+0.028
Neg logit contributions
rified-0.045
 drown-0.041
 trigger-0.039
 screams-0.038
ting-0.038
Token with
Token ID351
Feature activation+1.33e-03

Pos logit contributions
up+0.012
Sarah+0.011
aver+0.010
 oblig+0.010
oe+0.010
Neg logit contributions
rified-0.016
 drown-0.014
 trigger-0.013
 screams-0.013
der-0.013
Token Mr
Token ID1770
Feature activation+7.29e-02

Pos logit contributions
rified+0.003
 drown+0.003
 trigger+0.003
 screams+0.003
 Ow+0.003
Neg logit contributions
up-0.002
Sarah-0.001
 kind-0.001
oe-0.001
 oblig-0.001
Token.
Token ID13
Feature activation+3.09e-02

Pos logit contributions
rified+0.129
 drown+0.120
 Ow+0.113
 trigger+0.109
 screams+0.109
Neg logit contributions
up-0.123
Sarah-0.104
aver-0.102
 oblig-0.100
pron-0.095
Token John
Token ID1757
Feature activation+1.12e-02

Pos logit contributions
rified+0.050
 drown+0.047
 screams+0.043
 trigger+0.043
 fists+0.041
Neg logit contributions
up-0.053
Sarah-0.050
oe-0.048
 kind-0.047
 oblig-0.047
Token.
Token ID13
Feature activation+9.54e-03

Pos logit contributions
rified+0.023
 drown+0.022
 Ow+0.020
 trigger+0.020
 fists+0.019
Neg logit contributions
up-0.016
Sarah-0.013
 kind-0.013
oe-0.013
 oblig-0.012
Token\n
Token ID198
Feature activation-5.84e-03

Pos logit contributions
rified+0.020
 drown+0.020
 trigger+0.018
 screams+0.018
 fists+0.018
Neg logit contributions
up-0.012
Sarah-0.011
oe-0.010
 kind-0.010
 oblig-0.010
Token\n
Token ID198
Feature activation-9.66e-03

Pos logit contributions
up+0.007
Sarah+0.006
 kind+0.005
oe+0.005
 oblig+0.005
Neg logit contributions
rified-0.014
 drown-0.013
 trigger-0.012
 fists-0.012
 Ow-0.012
Token They
Token ID1119
Feature activation+2.49e-02

Pos logit contributions
rified+0.014
 drown+0.013
 drowned+0.012
 screams+0.012
 trigger+0.012
Neg logit contributions
up-0.008
oe-0.007
 kind-0.007
Sarah-0.007
ate-0.006
Token saw
Token ID2497
Feature activation+9.39e-03

Pos logit contributions
rified+0.051
 fault+0.049
 drown+0.048
 Ow+0.047
der+0.047
Neg logit contributions
 kind-0.028
ot-0.027
up-0.026
 opened-0.026
 called-0.025
Token some
Token ID617
Feature activation+9.92e-03

Pos logit contributions
rified+0.022
 drown+0.021
 Ow+0.019
 trigger+0.019
 drowned+0.019
Neg logit contributions
up-0.011
 kind-0.009
 nice-0.008
 generous-0.008
Sarah-0.007
Token vines
Token ID44439
Feature activation+2.80e-02

Pos logit contributions
rified+0.017
 drown+0.016
 Ow+0.015
 trigger+0.014
 screams+0.014
Neg logit contributions
up-0.018
 kind-0.015
Sarah-0.015
oe-0.014
aver-0.014
Token hanging
Token ID10938
Feature activation-4.83e-03

Pos logit contributions
rified+0.059
 drown+0.056
 Ow+0.054
 trigger+0.051
 fists+0.049
Neg logit contributions
up-0.038
 kind-0.032
oe-0.030
Sarah-0.029
 generous-0.028
Token from
Token ID422
Feature activation+7.29e-02

Pos logit contributions
up+0.008
Sarah+0.007
 kind+0.007
 oblig+0.007
oe+0.007
Neg logit contributions
rified-0.008
 drown-0.008
 trigger-0.007
 Ow-0.007
 screams-0.007
Token the
Token ID262
Feature activation+2.90e-02

Pos logit contributions
rified+0.177
 drown+0.169
 screams+0.156
 trigger+0.152
 fists+0.151
Neg logit contributions
 five-0.062
oe-0.059
 kind-0.057
up-0.055
ate-0.055
Token tree
Token ID5509
Feature activation+2.22e-02

Pos logit contributions
rified+0.049
 drown+0.047
 Ow+0.042
 drowned+0.041
 risking+0.040
Neg logit contributions
up-0.048
 kind-0.045
oe-0.043
 nice-0.041
 generous-0.041
Token.
Token ID13
Feature activation+3.18e-03

Pos logit contributions
rified+0.047
 drown+0.045
 Ow+0.042
 trigger+0.042
 screams+0.040
Neg logit contributions
up-0.030
 kind-0.024
Sarah-0.024
oe-0.024
aver-0.022
Token The
Token ID383
Feature activation+5.52e-04

Pos logit contributions
rified+0.007
 drown+0.007
 trigger+0.006
 screams+0.006
 drowned+0.006
Neg logit contributions
up-0.004
Sarah-0.003
 kind-0.003
oe-0.003
 oblig-0.003
Token vines
Token ID44439
Feature activation+4.20e-02

Pos logit contributions
rified+0.001
 drown+0.001
 trigger+0.001
 Ow+0.001
 screams+0.001
Neg logit contributions
up-0.001
Sarah-0.001
 oblig-0.001
oe-0.001
aver-0.001
Token was
Token ID373
Feature activation+6.88e-03

Pos logit contributions
rified+0.015
 drown+0.015
 Ow+0.014
 trigger+0.013
 screams+0.013
Neg logit contributions
up-0.014
Sarah-0.012
 oblig-0.011
oe-0.011
aver-0.011
Token a
Token ID257
Feature activation+2.24e-02

Pos logit contributions
 ow+0.008
 spilled+0.008
 everywhere+0.008
 spilling+0.008
 drown+0.008
Neg logit contributions
up-0.013
reetings-0.012
ari-0.012
 oblig-0.012
immer-0.011
Token stuffed
Token ID22259
Feature activation+6.96e-03

Pos logit contributions
 drown+0.038
rified+0.038
 trigger+0.035
 Ow+0.035
 drowned+0.034
Neg logit contributions
up-0.038
old-0.031
Sarah-0.030
ate-0.030
 oblig-0.030
Token bunny
Token ID44915
Feature activation+1.33e-02

Pos logit contributions
 drown+0.013
rified+0.013
 fists+0.012
 screams+0.012
 trigger+0.012
Neg logit contributions
up-0.011
 kind-0.009
Sarah-0.009
aver-0.008
oe-0.008
Token named
Token ID3706
Feature activation-1.26e-03

Pos logit contributions
rified+0.029
 drown+0.028
 Ow+0.026
 trigger+0.026
 screams+0.026
Neg logit contributions
up-0.017
 kind-0.014
Sarah-0.013
aver-0.013
oe-0.013
Token Mr
Token ID1770
Feature activation+7.28e-02

Pos logit contributions
up+0.003
aver+0.002
 oblig+0.002
oe+0.002
Sarah+0.002
Neg logit contributions
rified-0.002
 drown-0.002
 Ow-0.002
 trigger-0.001
 screams-0.001
Token.
Token ID13
Feature activation+3.32e-02

Pos logit contributions
rified+0.135
 drown+0.131
 Ow+0.121
 trigger+0.118
 drowned+0.116
Neg logit contributions
up-0.116
Sarah-0.097
aver-0.093
 oblig-0.093
 kind-0.090
Token Fl
Token ID1610
Feature activation+3.91e-02

Pos logit contributions
rified+0.048
 drown+0.044
 screams+0.041
 trigger+0.040
 fists+0.039
Neg logit contributions
up-0.062
oe-0.058
Sarah-0.057
 kind-0.056
 oblig-0.055
Tokenoppy
Token ID26696
Feature activation+1.15e-02

Pos logit contributions
rified+0.069
 drown+0.065
 trigger+0.060
 Ow+0.059
 screams+0.058
Neg logit contributions
up-0.067
Sarah-0.056
oe-0.056
 kind-0.055
 oblig-0.054
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
rified+0.025
 drown+0.024
 trigger+0.022
 Ow+0.022
 screams+0.021
Neg logit contributions
up-0.015
 kind-0.012
Sarah-0.012
oe-0.012
aver-0.011
Token One
Token ID1881
Feature activation-9.05e-03

Pos logit contributions
rified+0.024
 drown+0.023
 Ow+0.021
 trigger+0.021
 screams+0.021
Neg logit contributions
up-0.017
Sarah-0.014
 kind-0.014
oe-0.013
 oblig-0.013
Token the
Token ID262
Feature activation-1.27e-02

Pos logit contributions
up+0.047
 oblig+0.038
Sarah+0.037
aver+0.037
oe+0.036
Neg logit contributions
rified-0.079
 drown-0.074
 Ow-0.071
 screams-0.069
 fists-0.068
Token necklace
Token ID38987
Feature activation-1.36e-02

Pos logit contributions
up+0.018
Sarah+0.015
 oblig+0.015
oe+0.014
 kind+0.014
Neg logit contributions
rified-0.026
 drown-0.025
 trigger-0.023
 Ow-0.023
 screams-0.022
Token and
Token ID290
Feature activation-4.13e-03

Pos logit contributions
up+0.024
Sarah+0.021
 oblig+0.021
aver+0.020
 kind+0.020
Neg logit contributions
rified-0.022
 drown-0.020
 trigger-0.019
 Ow-0.018
 screams-0.018
Token the
Token ID262
Feature activation+1.12e-02

Pos logit contributions
up+0.006
Sarah+0.005
 kind+0.004
oe+0.004
 oblig+0.004
Neg logit contributions
rified-0.009
 drown-0.008
 trigger-0.008
 Ow-0.008
 screams-0.007
Token magn
Token ID7842
Feature activation+4.55e-02

Pos logit contributions
rified+0.022
 drown+0.021
 Ow+0.019
 trigger+0.019
 screams+0.019
Neg logit contributions
up-0.017
 kind-0.013
Sarah-0.013
oe-0.013
aver-0.013
Tokenifying
Token ID4035
Feature activation+7.28e-02

Pos logit contributions
 drown+0.040
rified+0.038
 screams+0.030
 Ow+0.029
 drowned+0.029
Neg logit contributions
up-0.117
oe-0.106
Sarah-0.102
aver-0.100
 oblig-0.099
Token glass
Token ID5405
Feature activation-6.78e-05

Pos logit contributions
rified+0.128
 drown+0.121
 trigger+0.110
 Ow+0.110
 screams+0.107
Neg logit contributions
up-0.125
Sarah-0.105
 kind-0.102
oe-0.102
 oblig-0.101
Token and
Token ID290
Feature activation+8.47e-04

Pos logit contributions
up+0.000
Sarah+0.000
 oblig+0.000
 kind+0.000
aver+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 trigger-0.000
 Ow-0.000
 screams-0.000
Token started
Token ID2067
Feature activation-2.76e-02

Pos logit contributions
rified+0.002
 drown+0.002
 trigger+0.002
 Ow+0.002
 screams+0.001
Neg logit contributions
up-0.001
Sarah-0.001
 kind-0.001
oe-0.001
 oblig-0.001
Token to
Token ID284
Feature activation-1.04e-02

Pos logit contributions
up+0.034
Sarah+0.027
aver+0.026
 oblig+0.025
oe+0.024
Neg logit contributions
 drown-0.061
rified-0.060
 screams-0.056
 Ow-0.055
 fists-0.054
Token examine
Token ID10716
Feature activation-1.49e-02

Pos logit contributions
up+0.016
Sarah+0.014
oe+0.013
aver+0.013
ari+0.012
Neg logit contributions
 drown-0.020
rified-0.019
 screams-0.018
 Ow-0.018
 trigger-0.018
Token mom
Token ID1995
Feature activation-3.72e-02

Pos logit contributions
up+0.012
Sarah+0.010
 oblig+0.010
aver+0.010
oe+0.009
Neg logit contributions
rified-0.011
 drown-0.011
 Ow-0.010
 screams-0.010
 trigger-0.010
Token asked
Token ID1965
Feature activation+5.52e-03

Pos logit contributions
up+0.048
 oblig+0.047
ag+0.042
oe+0.040
br+0.040
Neg logit contributions
rified-0.068
 drowned-0.062
 Ow-0.062
 shouldn-0.061
 couldn-0.060
Token her
Token ID607
Feature activation-2.14e-02

Pos logit contributions
rified+0.011
 Ow+0.011
 drown+0.010
 screams+0.010
ting+0.010
Neg logit contributions
up-0.008
aver-0.007
 oblig-0.006
oe-0.006
ari-0.006
Token to
Token ID284
Feature activation-2.56e-02

Pos logit contributions
aver+0.035
up+0.034
wn+0.034
eh+0.034
cluded+0.033
Neg logit contributions
 Ow-0.027
 drown-0.027
rified-0.027
der-0.026
 cries-0.026
Token help
Token ID1037
Feature activation-6.99e-03

Pos logit contributions
aver+0.042
up+0.042
eh+0.041
ari+0.041
reetings+0.040
Neg logit contributions
 drown-0.032
 Ow-0.030
 arrest-0.029
 weren-0.029
 trigger-0.028
Token decor
Token ID11705
Feature activation-7.35e-02

Pos logit contributions
ari+0.010
aver+0.010
up+0.010
reetings+0.009
Sarah+0.009
Neg logit contributions
 drown-0.012
 Ow-0.012
rified-0.011
 spilling-0.011
 fists-0.010
Tokenate
Token ID378
Feature activation-4.72e-02

Pos logit contributions
aver+0.061
Sarah+0.055
up+0.054
 oblig+0.053
old+0.048
Neg logit contributions
rified-0.152
 Ow-0.148
ting-0.145
 banks-0.145
 screams-0.143
Token the
Token ID262
Feature activation-4.03e-03

Pos logit contributions
up+0.078
aver+0.070
Sarah+0.069
 oblig+0.069
oe+0.064
Neg logit contributions
rified-0.077
 drown-0.073
 Ow-0.070
 screams-0.067
 fists-0.066
Token kitchen
Token ID9592
Feature activation-2.09e-02

Pos logit contributions
up+0.007
Sarah+0.006
 oblig+0.006
aver+0.006
old+0.006
Neg logit contributions
rified-0.006
 drown-0.006
 Ow-0.006
 screams-0.006
 trigger-0.005
Token for
Token ID329
Feature activation+1.37e-02

Pos logit contributions
aver+0.035
Sarah+0.034
up+0.034
 oblig+0.032
old+0.031
Neg logit contributions
rified-0.029
 banks-0.028
 drown-0.028
 screams-0.026
der-0.026
Token a
Token ID257
Feature activation-1.54e-03

Pos logit contributions
 drown+0.023
 Ow+0.023
rified+0.022
 drowned+0.022
 fists+0.021
Neg logit contributions
up-0.021
aver-0.019
ari-0.018
old-0.017
Sarah-0.017
Token mom
Token ID1995
Feature activation-3.03e-02

Pos logit contributions
up+0.003
Sarah+0.003
 oblig+0.003
oe+0.003
aver+0.003
Neg logit contributions
rified-0.004
 drown-0.003
 Ow-0.003
 trigger-0.003
 screams-0.003
Token asked
Token ID1965
Feature activation+4.88e-03

Pos logit contributions
up+0.044
 oblig+0.038
Sarah+0.037
oe+0.036
ag+0.035
Neg logit contributions
rified-0.056
 drown-0.052
 drowned-0.049
 Ow-0.048
 screams-0.048
Token her
Token ID607
Feature activation-1.79e-02

Pos logit contributions
rified+0.011
 drown+0.010
 Ow+0.010
 screams+0.009
 fists+0.009
Neg logit contributions
up-0.006
aver-0.005
Sarah-0.005
oe-0.005
 oblig-0.005
Token to
Token ID284
Feature activation-1.87e-02

Pos logit contributions
up+0.026
aver+0.024
Sarah+0.023
oe+0.023
 oblig+0.022
Neg logit contributions
rified-0.030
 drown-0.029
 Ow-0.028
 screams-0.027
der-0.027
Token help
Token ID1037
Feature activation-5.75e-03

Pos logit contributions
up+0.031
ari+0.027
aver+0.026
oe+0.026
Sarah+0.026
Neg logit contributions
 drown-0.032
 Ow-0.028
rified-0.028
 trigger-0.026
 fists-0.026
Token decor
Token ID11705
Feature activation-7.34e-02

Pos logit contributions
up+0.008
aver+0.007
ari+0.007
Sarah+0.007
 oblig+0.006
Neg logit contributions
 drown-0.010
rified-0.010
 Ow-0.010
 fists-0.010
 trigger-0.009
Tokenate
Token ID378
Feature activation-4.33e-02

Pos logit contributions
aver+0.058
Sarah+0.054
 oblig+0.053
up+0.052
ari+0.048
Neg logit contributions
rified-0.156
 Ow-0.150
ting-0.149
 banks-0.146
ible-0.144
Token a
Token ID257
Feature activation+1.07e-02

Pos logit contributions
up+0.072
Sarah+0.063
aver+0.062
 oblig+0.062
 kind+0.060
Neg logit contributions
rified-0.074
 drown-0.069
 Ow-0.064
 trigger-0.063
 screams-0.062
Token cake
Token ID12187
Feature activation-3.98e-02

Pos logit contributions
 drown+0.018
rified+0.018
 trigger+0.017
 Ow+0.016
 screams+0.016
Neg logit contributions
up-0.017
 oblig-0.014
Sarah-0.014
aver-0.014
old-0.013
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
up+0.066
aver+0.060
Sarah+0.060
 oblig+0.058
old+0.057
Neg logit contributions
rified-0.057
 drown-0.054
 banks-0.052
der-0.051
 trigger-0.051
Token Lily
Token ID20037
Feature activation-3.66e-03

Pos logit contributions
rified+0.024
 Ow+0.023
 drown+0.022
 screams+0.021
 trigger+0.020
Neg logit contributions
up-0.020
aver-0.017
 oblig-0.016
 kind-0.016
ate-0.015
Token Joe
Token ID5689
Feature activation+6.94e-04

Pos logit contributions
up+0.009
Sarah+0.007
 kind+0.007
oe+0.007
aver+0.007
Neg logit contributions
rified-0.007
 drown-0.007
 Ow-0.006
 trigger-0.006
 screams-0.006
Token,
Token ID11
Feature activation+2.44e-02

Pos logit contributions
rified+0.001
 drown+0.001
 Ow+0.001
 trigger+0.001
 screams+0.001
Neg logit contributions
up-0.001
Sarah-0.001
 oblig-0.001
oe-0.001
aver-0.001
Token "
Token ID366
Feature activation-5.07e-04

Pos logit contributions
rified+0.049
 drown+0.046
 Ow+0.043
 trigger+0.043
 screams+0.042
Neg logit contributions
up-0.037
Sarah-0.029
oe-0.028
 oblig-0.028
 kind-0.027
TokenLet
Token ID5756
Feature activation-6.86e-04

Pos logit contributions
up+0.001
Sarah+0.001
 kind+0.001
oe+0.001
 oblig+0.001
Neg logit contributions
rified-0.001
 drown-0.001
 Ow-0.001
 trigger-0.001
 screams-0.001
Token's
Token ID338
Feature activation-5.03e-02

Pos logit contributions
up+0.001
Sarah+0.001
 kind+0.001
oe+0.001
 oblig+0.001
Neg logit contributions
rified-0.001
 drown-0.001
 trigger-0.001
 Ow-0.001
 screams-0.001
Token decor
Token ID11705
Feature activation-7.30e-02

Pos logit contributions
up+0.058
Sarah+0.044
 kind+0.043
oe+0.042
 oblig+0.042
Neg logit contributions
rified-0.117
 drown-0.112
 trigger-0.104
 Ow-0.104
 screams-0.103
Tokenate
Token ID378
Feature activation-4.38e-02

Pos logit contributions
up+0.052
aver+0.046
Sarah+0.043
 oblig+0.043
old+0.040
Neg logit contributions
rified-0.174
 Ow-0.163
 drown-0.159
ting-0.159
 screams-0.156
Token the
Token ID262
Feature activation+4.86e-03

Pos logit contributions
up+0.074
Sarah+0.063
aver+0.061
 oblig+0.061
oe+0.060
Neg logit contributions
rified-0.077
 drown-0.072
 Ow-0.067
 trigger-0.066
 screams-0.065
Token room
Token ID2119
Feature activation-2.88e-02

Pos logit contributions
rified+0.008
 drown+0.008
 Ow+0.007
 trigger+0.007
 screams+0.007
Neg logit contributions
up-0.008
Sarah-0.007
 oblig-0.007
aver-0.007
oe-0.007
Token for
Token ID329
Feature activation+9.33e-03

Pos logit contributions
up+0.049
Sarah+0.042
aver+0.041
 oblig+0.041
 kind+0.039
Neg logit contributions
rified-0.049
 drown-0.046
 Ow-0.041
 trigger-0.041
 screams-0.041
Token a
Token ID257
Feature activation+3.49e-03

Pos logit contributions
rified+0.018
 drown+0.017
 trigger+0.015
 Ow+0.015
 screams+0.015
Neg logit contributions
up-0.015
Sarah-0.012
 kind-0.012
oe-0.012
 oblig-0.012
Token Jane
Token ID12091
Feature activation-1.18e-02

Pos logit contributions
up+0.007
aver+0.005
br+0.005
 kind+0.005
pron+0.005
Neg logit contributions
rified-0.010
 drown-0.009
 Ow-0.009
 screams-0.009
 horror-0.009
Token,
Token ID11
Feature activation+1.10e-02

Pos logit contributions
up+0.016
Sarah+0.013
 oblig+0.012
aver+0.012
oe+0.012
Neg logit contributions
rified-0.025
 drown-0.024
 trigger-0.022
 Ow-0.022
 screams-0.022
Token "
Token ID366
Feature activation-4.52e-03

Pos logit contributions
rified+0.020
 drown+0.018
 Ow+0.017
 trigger+0.016
 screams+0.016
Neg logit contributions
up-0.019
ate-0.015
 oblig-0.015
aver-0.015
oe-0.015
TokenLet
Token ID5756
Feature activation+1.47e-03

Pos logit contributions
up+0.009
 oblig+0.008
aver+0.008
 kind+0.008
oe+0.008
Neg logit contributions
rified-0.007
 drown-0.006
 Ow-0.005
 trigger-0.005
 drowned-0.005
Token's
Token ID338
Feature activation-5.59e-02

Pos logit contributions
rified+0.003
 drown+0.002
 trigger+0.002
 Ow+0.002
 screams+0.002
Neg logit contributions
up-0.003
Sarah-0.002
 kind-0.002
oe-0.002
 oblig-0.002
Token decor
Token ID11705
Feature activation-7.27e-02

Pos logit contributions
up+0.076
Sarah+0.063
aver+0.060
oe+0.059
 oblig+0.059
Neg logit contributions
rified-0.114
 drown-0.112
 Ow-0.103
 trigger-0.103
 screams-0.100
Tokenate
Token ID378
Feature activation-4.54e-02

Pos logit contributions
up+0.051
aver+0.049
Sarah+0.045
 oblig+0.044
old+0.040
Neg logit contributions
rified-0.172
 Ow-0.162
 drown-0.158
ting-0.158
 screams-0.154
Token the
Token ID262
Feature activation+4.93e-03

Pos logit contributions
up+0.079
Sarah+0.066
 kind+0.065
oe+0.064
 oblig+0.063
Neg logit contributions
rified-0.080
 drown-0.075
 trigger-0.068
 Ow-0.068
 screams-0.066
Token diamond
Token ID15291
Feature activation-8.16e-03

Pos logit contributions
rified+0.009
 drown+0.009
 trigger+0.008
 Ow+0.008
 screams+0.008
Neg logit contributions
up-0.008
 kind-0.007
Sarah-0.006
oe-0.006
 generous-0.006
Token ourselves
Token ID6731
Feature activation+4.20e-03

Pos logit contributions
up+0.013
Sarah+0.011
aver+0.010
 oblig+0.010
 kind+0.010
Neg logit contributions
rified-0.015
 drown-0.014
 trigger-0.013
 Ow-0.013
 screams-0.013
Token!"
Token ID2474
Feature activation-1.23e-02

Pos logit contributions
rified+0.008
 drown+0.008
 trigger+0.007
 Ow+0.007
 screams+0.007
Neg logit contributions
up-0.007
Sarah-0.005
 kind-0.005
oe-0.005
 oblig-0.005
Token day
Token ID1110
Feature activation+7.75e-03

Pos logit contributions
up+0.016
 oblig+0.013
Sarah+0.013
ate+0.013
oe+0.012
Neg logit contributions
rified-0.020
 drown-0.018
 Ow-0.017
 screams-0.017
 drowned-0.016
Token,
Token ID11
Feature activation+9.62e-03

Pos logit contributions
rified+0.015
 drown+0.014
 Ow+0.013
 screams+0.012
 trigger+0.012
Neg logit contributions
up-0.012
Sarah-0.010
aver-0.010
ate-0.010
 oblig-0.010
Token she
Token ID673
Feature activation-1.80e-02

Pos logit contributions
rified+0.021
 drown+0.019
 Ow+0.019
 trigger+0.018
 screams+0.018
Neg logit contributions
up-0.013
aver-0.010
ate-0.010
ari-0.010
 oblig-0.010
Token decided
Token ID3066
Feature activation-6.94e-04

Pos logit contributions
up+0.027
Sarah+0.023
 kind+0.022
oe+0.022
 oblig+0.022
Neg logit contributions
rified-0.035
 drown-0.033
 Ow-0.031
 trigger-0.031
 screams-0.030
Token to
Token ID284
Feature activation-2.12e-02

Pos logit contributions
up+0.001
aver+0.001
oe+0.001
 oblig+0.001
Sarah+0.001
Neg logit contributions
rified-0.001
 drown-0.001
 screams-0.001
 Ow-0.001
 drowned-0.001
Token decor
Token ID11705
Feature activation-7.27e-02

Pos logit contributions
up+0.030
oe+0.026
Sarah+0.025
aver+0.025
ari+0.023
Neg logit contributions
 drown-0.042
rified-0.040
 Ow-0.038
 screams-0.037
 drowned-0.036
Tokenate
Token ID378
Feature activation-4.39e-02

Pos logit contributions
up+0.052
 oblig+0.045
aver+0.045
oe+0.045
Sarah+0.044
Neg logit contributions
rified-0.174
 Ow-0.164
 drown-0.158
 screams-0.158
ting-0.156
Token her
Token ID607
Feature activation+1.38e-03

Pos logit contributions
up+0.056
Sarah+0.046
 oblig+0.045
aver+0.045
oe+0.044
Neg logit contributions
rified-0.094
 drown-0.088
 Ow-0.084
 screams-0.083
 trigger-0.082
Token room
Token ID2119
Feature activation-2.92e-02

Pos logit contributions
rified+0.003
 drown+0.003
 trigger+0.003
 Ow+0.003
 screams+0.003
Neg logit contributions
up-0.002
Sarah-0.001
 kind-0.001
oe-0.001
 oblig-0.001
Token.
Token ID13
Feature activation+7.33e-03

Pos logit contributions
up+0.047
 oblig+0.044
Sarah+0.044
aver+0.043
old+0.038
Neg logit contributions
rified-0.046
 drown-0.040
der-0.040
 banks-0.039
 fists-0.038
Token She
Token ID1375
Feature activation-1.97e-02

Pos logit contributions
rified+0.012
 Ow+0.012
 drown+0.011
 screams+0.010
 cries+0.010
Neg logit contributions
up-0.013
aver-0.012
 nicer-0.011
 oblig-0.011
oe-0.010
TokenTom
Token ID13787
Feature activation-8.35e-03

Pos logit contributions
rified+0.014
 drown+0.013
 Ow+0.012
 screams+0.012
 trigger+0.011
Neg logit contributions
up-0.020
Sarah-0.017
aver-0.017
 oblig-0.017
pron-0.016
Token and
Token ID290
Feature activation+5.91e-03

Pos logit contributions
up+0.011
Sarah+0.009
 oblig+0.009
aver+0.009
oe+0.009
Neg logit contributions
rified-0.017
 drown-0.016
 Ow-0.015
 screams-0.015
 drowned-0.015
Token Lily
Token ID20037
Feature activation-2.90e-03

Pos logit contributions
rified+0.010
 drown+0.010
 Ow+0.010
 screams+0.009
 trigger+0.009
Neg logit contributions
up-0.010
oe-0.008
pron-0.008
aver-0.008
ari-0.008
Token wanted
Token ID2227
Feature activation+1.13e-02

Pos logit contributions
up+0.004
 kind+0.003
Sarah+0.003
oe+0.003
ate+0.003
Neg logit contributions
rified-0.006
 drown-0.006
 trigger-0.006
 fists-0.006
 Ow-0.006
Token to
Token ID284
Feature activation-3.20e-02

Pos logit contributions
rified+0.023
 drown+0.023
 Ow+0.022
 screams+0.021
 drowned+0.021
Neg logit contributions
up-0.015
aver-0.013
Sarah-0.012
oe-0.012
 oblig-0.011
Token decor
Token ID11705
Feature activation-7.23e-02

Pos logit contributions
up+0.045
aver+0.040
Sarah+0.038
oe+0.037
eh+0.035
Neg logit contributions
 drown-0.062
rified-0.057
 Ow-0.056
 screams-0.054
 tongues-0.053
Tokenate
Token ID378
Feature activation-4.43e-02

Pos logit contributions
up+0.058
aver+0.055
Sarah+0.052
 oblig+0.049
eh+0.046
Neg logit contributions
rified-0.168
 drown-0.156
 Ow-0.156
 screams-0.153
ting-0.152
Token their
Token ID511
Feature activation-1.48e-04

Pos logit contributions
up+0.073
Sarah+0.063
aver+0.061
 oblig+0.061
oe+0.059
Neg logit contributions
rified-0.078
 drown-0.072
 Ow-0.068
 screams-0.066
 trigger-0.065
Token house
Token ID2156
Feature activation-2.96e-02

Pos logit contributions
up+0.000
Sarah+0.000
 oblig+0.000
aver+0.000
 kind+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 Ow-0.000
 trigger-0.000
 screams-0.000
Token for
Token ID329
Feature activation+4.25e-03

Pos logit contributions
up+0.055
aver+0.053
Sarah+0.052
 oblig+0.051
eh+0.049
Neg logit contributions
rified-0.037
 drown-0.034
 banks-0.033
 fists-0.030
 trigger-0.030
Token the
Token ID262
Feature activation-2.63e-03

Pos logit contributions
rified+0.007
 drown+0.007
 Ow+0.007
 drowned+0.006
 screams+0.006
Neg logit contributions
up-0.007
aver-0.006
Sarah-0.006
 oblig-0.005
ari-0.005
Token's
Token ID338
Feature activation+2.24e-03

Pos logit contributions
up+0.020
Sarah+0.016
 kind+0.016
oe+0.015
aver+0.015
Neg logit contributions
rified-0.027
 drown-0.026
 trigger-0.024
 Ow-0.024
 fists-0.023
Token mom
Token ID1995
Feature activation-3.31e-02

Pos logit contributions
rified+0.004
 drown+0.004
 trigger+0.004
 Ow+0.004
 fists+0.004
Neg logit contributions
up-0.003
Sarah-0.003
 oblig-0.003
oe-0.003
aver-0.003
Token asked
Token ID1965
Feature activation+4.36e-03

Pos logit contributions
up+0.048
 oblig+0.043
oe+0.040
Sarah+0.039
ate+0.039
Neg logit contributions
rified-0.060
 drown-0.056
 drowned-0.054
 Ow-0.052
 screams-0.052
Token her
Token ID607
Feature activation-1.83e-02

Pos logit contributions
rified+0.009
 Ow+0.009
 drown+0.009
ting+0.008
 screams+0.008
Neg logit contributions
up-0.006
aver-0.005
oe-0.005
 oblig-0.004
Sarah-0.004
Token to
Token ID284
Feature activation-1.91e-02

Pos logit contributions
up+0.027
aver+0.026
oe+0.025
Sarah+0.024
 oblig+0.024
Neg logit contributions
rified-0.028
 drown-0.028
 Ow-0.027
 screams-0.026
 fault-0.025
Token decor
Token ID11705
Feature activation-7.22e-02

Pos logit contributions
up+0.030
aver+0.028
ari+0.028
eh+0.028
br+0.027
Neg logit contributions
 drown-0.031
 Ow-0.025
 trigger-0.024
 arrest-0.024
rified-0.024
Tokenate
Token ID378
Feature activation-4.14e-02

Pos logit contributions
 oblig+0.057
aver+0.057
up+0.053
Sarah+0.051
 obliged+0.050
Neg logit contributions
rified-0.147
ting-0.140
 Ow-0.140
icer-0.140
uses-0.136
Token the
Token ID262
Feature activation+4.80e-03

Pos logit contributions
up+0.061
 oblig+0.053
aver+0.053
Sarah+0.052
 kind+0.049
Neg logit contributions
rified-0.077
 drown-0.070
 Ow-0.069
 trigger-0.067
 screams-0.066
Token living
Token ID2877
Feature activation-1.71e-02

Pos logit contributions
rified+0.009
 drown+0.008
 Ow+0.008
 trigger+0.008
 screams+0.007
Neg logit contributions
up-0.008
Sarah-0.007
 kind-0.006
oe-0.006
aver-0.006
Token room
Token ID2119
Feature activation-3.19e-02

Pos logit contributions
up+0.025
Sarah+0.021
 oblig+0.021
aver+0.020
 kind+0.020
Neg logit contributions
rified-0.033
 drown-0.032
 trigger-0.029
 Ow-0.029
 screams-0.029
Token for
Token ID329
Feature activation+1.36e-02

Pos logit contributions
up+0.052
aver+0.049
 oblig+0.048
Sarah+0.047
aid+0.043
Neg logit contributions
rified-0.049
der-0.042
 drown-0.042
 banks-0.040
 weren-0.040
Token So
Token ID1406
Feature activation-5.27e-03

Pos logit contributions
rified+0.008
 drown+0.007
 Ow+0.007
 screams+0.007
 fists+0.007
Neg logit contributions
up-0.004
aver-0.003
 oblig-0.003
oe-0.003
Sarah-0.003
Token,
Token ID11
Feature activation-8.15e-03

Pos logit contributions
up+0.008
aver+0.007
 oblig+0.007
Sarah+0.006
ate+0.006
Neg logit contributions
rified-0.010
 drown-0.009
 Ow-0.009
 screams-0.009
 trigger-0.009
Token she
Token ID673
Feature activation-2.36e-02

Pos logit contributions
up+0.009
aver+0.007
 oblig+0.007
Sarah+0.007
oe+0.006
Neg logit contributions
rified-0.019
 drown-0.018
 Ow-0.017
 trigger-0.017
 screams-0.017
Token decided
Token ID3066
Feature activation-2.78e-03

Pos logit contributions
up+0.039
 oblig+0.039
oe+0.038
Sarah+0.038
gary+0.036
Neg logit contributions
rified-0.034
 drowned-0.032
 drown-0.031
 screams-0.030
 trigger-0.029
Token to
Token ID284
Feature activation-2.88e-02

Pos logit contributions
up+0.003
aver+0.003
 oblig+0.003
oe+0.003
Sarah+0.003
Neg logit contributions
rified-0.006
 drown-0.006
 screams-0.005
 Ow-0.005
 drowned-0.005
Token decor
Token ID11705
Feature activation-7.18e-02

Pos logit contributions
up+0.041
Sarah+0.036
aver+0.034
oe+0.034
 oblig+0.033
Neg logit contributions
 drown-0.055
rified-0.054
 Ow-0.050
 trigger-0.050
 screams-0.048
Tokenate
Token ID378
Feature activation-4.21e-02

Pos logit contributions
up+0.049
aver+0.044
 oblig+0.042
oe+0.039
Sarah+0.039
Neg logit contributions
rified-0.176
 Ow-0.165
ting-0.158
 screams-0.158
 drown-0.156
Token it
Token ID340
Feature activation-1.35e-02

Pos logit contributions
up+0.072
Sarah+0.061
 kind+0.060
oe+0.059
 oblig+0.058
Neg logit contributions
rified-0.074
 drown-0.071
 trigger-0.064
 Ow-0.064
 screams-0.062
Token with
Token ID351
Feature activation+1.05e-02

Pos logit contributions
up+0.020
aver+0.019
Sarah+0.018
 oblig+0.018
old+0.016
Neg logit contributions
rified-0.023
 drown-0.023
 banks-0.021
 screams-0.021
 trigger-0.021
Token all
Token ID477
Feature activation-4.51e-02

Pos logit contributions
rified+0.019
 drown+0.018
 Ow+0.018
 trigger+0.017
 tongues+0.017
Neg logit contributions
up-0.015
aver-0.013
Sarah-0.013
oe-0.013
 oblig-0.013
Token the
Token ID262
Feature activation-3.76e-03

Pos logit contributions
up+0.060
 oblig+0.054
aver+0.052
ari+0.049
Sarah+0.049
Neg logit contributions
rified-0.085
 drown-0.078
 Ow-0.077
 screams-0.076
 fists-0.073
Token girl
Token ID2576
Feature activation-2.03e-02

Pos logit contributions
rified+0.053
 Ow+0.043
 trigger+0.042
 drown+0.042
 screams+0.041
Neg logit contributions
up-0.048
Sarah-0.045
 oblig-0.045
lic-0.042
aver-0.042
Token asked
Token ID1965
Feature activation+1.86e-02

Pos logit contributions
up+0.033
Sarah+0.030
 oblig+0.029
ate+0.028
oe+0.028
Neg logit contributions
rified-0.032
 drown-0.030
 drowned-0.030
 screams-0.029
 cries-0.028
Token her
Token ID607
Feature activation-1.06e-03

Pos logit contributions
rified+0.034
 Ow+0.032
 screams+0.029
ting+0.029
 horror+0.028
Neg logit contributions
up-0.031
aver-0.027
pron-0.026
oe-0.025
lic-0.024
Token mom
Token ID1995
Feature activation-3.31e-02

Pos logit contributions
up+0.002
Sarah+0.002
oe+0.002
aver+0.002
 oblig+0.002
Neg logit contributions
rified-0.002
 drown-0.001
 screams-0.001
 Ow-0.001
 horror-0.001
Token to
Token ID284
Feature activation-1.38e-02

Pos logit contributions
up+0.056
aver+0.052
 oblig+0.051
Sarah+0.050
oe+0.047
Neg logit contributions
rified-0.051
 Ow-0.045
ting-0.045
 everywhere-0.045
 drown-0.043
Token decor
Token ID11705
Feature activation-7.18e-02

Pos logit contributions
up+0.025
aver+0.023
eh+0.022
ari+0.022
oe+0.022
Neg logit contributions
 drown-0.018
rified-0.016
 Ow-0.016
 trigger-0.016
 weren-0.015
Tokenate
Token ID378
Feature activation-3.92e-02

Pos logit contributions
aver+0.058
up+0.058
Sarah+0.051
 oblig+0.051
old+0.048
Neg logit contributions
rified-0.155
 Ow-0.148
ting-0.143
icer-0.141
 banks-0.140
Token.
Token ID13
Feature activation+7.92e-03

Pos logit contributions
up+0.061
aver+0.052
Sarah+0.052
 oblig+0.051
oe+0.049
Neg logit contributions
rified-0.071
 drown-0.065
 Ow-0.063
 screams-0.061
 trigger-0.061
Token Her
Token ID2332
Feature activation-4.85e-03

Pos logit contributions
rified+0.013
 mouths+0.012
 Ow+0.012
 horror+0.012
 tongues+0.012
Neg logit contributions
aver-0.014
up-0.014
eh-0.012
 oblig-0.012
ak-0.011
Token mom
Token ID1995
Feature activation-2.26e-02

Pos logit contributions
up+0.009
 oblig+0.008
Sarah+0.008
aver+0.008
old+0.008
Neg logit contributions
rified-0.007
 drown-0.007
 fists-0.006
 Ow-0.006
 screams-0.006
Token said
Token ID531
Feature activation-2.52e-03

Pos logit contributions
up+0.040
 oblig+0.038
cluded+0.037
wn+0.037
ag+0.037
Neg logit contributions
rified-0.029
 couldn-0.025
 weren-0.025
 screams-0.023
 drowned-0.023
Token
Token ID198
Feature activation+5.27e-04

Pos logit contributions
up+0.005
Sarah+0.004
 kind+0.004
oe+0.004
 oblig+0.004
Neg logit contributions
rified-0.007
 drown-0.007
 Ow-0.006
 trigger-0.006
 screams-0.006
Token
Token ID198
Feature activation-7.60e-04

Pos logit contributions
rified+0.001
 drown+0.001
 Ow+0.001
 screams+0.001
 trigger+0.001
Neg logit contributions
up-0.001
 kind-0.001
Sarah-0.001
 oblig-0.001
aver-0.001
Token"
Token ID1
Feature activation-1.08e-02

Pos logit contributions
up+0.001
 oblig+0.001
oe+0.001
aver+0.001
 kind+0.001
Neg logit contributions
rified-0.001
 drown-0.001
 Ow-0.001
 trigger-0.001
 drowned-0.001
TokenLet
Token ID5756
Feature activation+1.39e-03

Pos logit contributions
up+0.025
old+0.022
ag+0.022
oe+0.022
 oblig+0.022
Neg logit contributions
rified-0.012
 Ow-0.010
ting-0.009
 trigger-0.009
 drown-0.008
Token's
Token ID338
Feature activation-5.52e-02

Pos logit contributions
rified+0.002
 drown+0.002
 Ow+0.002
 trigger+0.002
 screams+0.002
Neg logit contributions
up-0.002
Sarah-0.002
 kind-0.002
oe-0.002
 oblig-0.002
Token decor
Token ID11705
Feature activation-7.17e-02

Pos logit contributions
up+0.077
Sarah+0.065
aver+0.061
oe+0.061
 kind+0.058
Neg logit contributions
 drown-0.110
rified-0.109
 Ow-0.100
 trigger-0.099
 screams-0.096
Tokenate
Token ID378
Feature activation-4.90e-02

Pos logit contributions
up+0.051
aver+0.049
Sarah+0.045
old+0.042
 oblig+0.042
Neg logit contributions
rified-0.167
 Ow-0.158
ting-0.154
 drown-0.153
 banks-0.151
Token the
Token ID262
Feature activation-1.34e-03

Pos logit contributions
up+0.085
Sarah+0.073
aver+0.071
 oblig+0.070
oe+0.070
Neg logit contributions
rified-0.082
 drown-0.077
 Ow-0.071
 trigger-0.070
 screams-0.070
Token Christmas
Token ID6786
Feature activation-7.71e-03

Pos logit contributions
up+0.002
Sarah+0.002
 oblig+0.002
oe+0.002
aver+0.002
Neg logit contributions
rified-0.003
 drown-0.002
 Ow-0.002
 trigger-0.002
 screams-0.002
Token tree
Token ID5509
Feature activation-1.12e-02

Pos logit contributions
up+0.014
aver+0.013
Sarah+0.013
 oblig+0.013
Her+0.013
Neg logit contributions
rified-0.010
 banks-0.009
 drown-0.009
 screams-0.009
 iceberg-0.009
Token,"
Token ID553
Feature activation+9.13e-03

Pos logit contributions
up+0.021
Sarah+0.018
aver+0.018
 oblig+0.017
 kind+0.017
Neg logit contributions
rified-0.017
 drown-0.016
 screams-0.014
 Ow-0.014
 banks-0.014
Token
Token ID198
Feature activation+5.52e-03

Pos logit contributions
up+0.003
Sarah+0.002
 kind+0.002
 oblig+0.002
oe+0.002
Neg logit contributions
rified-0.005
 drown-0.004
 Ow-0.004
 trigger-0.004
 screams-0.004
Token
Token ID198
Feature activation-2.72e-04

Pos logit contributions
rified+0.010
 drown+0.009
 Ow+0.009
 screams+0.008
 trigger+0.008
Neg logit contributions
up-0.009
aver-0.007
 oblig-0.007
pron-0.007
 kind-0.007
Token"
Token ID1
Feature activation-1.06e-02

Pos logit contributions
up+0.000
 oblig+0.000
oe+0.000
pron+0.000
 writer+0.000
Neg logit contributions
rified-0.001
 drown-0.001
 Ow-0.001
 trigger-0.001
 screams-0.001
TokenLet
Token ID5756
Feature activation+2.85e-03

Pos logit contributions
up+0.017
Sarah+0.014
 kind+0.014
oe+0.014
 oblig+0.014
Neg logit contributions
rified-0.020
 drown-0.019
 trigger-0.017
 Ow-0.017
 screams-0.017
Token's
Token ID338
Feature activation-4.60e-02

Pos logit contributions
rified+0.005
 drown+0.005
 trigger+0.005
 Ow+0.005
 fists+0.005
Neg logit contributions
up-0.005
Sarah-0.004
 kind-0.004
oe-0.004
 oblig-0.004
Token decor
Token ID11705
Feature activation-7.12e-02

Pos logit contributions
up+0.056
Sarah+0.044
 kind+0.042
oe+0.042
 oblig+0.041
Neg logit contributions
rified-0.103
 drown-0.100
 Ow-0.092
 trigger-0.092
 screams-0.090
Tokenate
Token ID378
Feature activation-4.53e-02

Pos logit contributions
up+0.051
Sarah+0.043
 oblig+0.042
old+0.041
aver+0.040
Neg logit contributions
rified-0.170
 Ow-0.160
 drown-0.158
 banks-0.155
 screams-0.154
Token the
Token ID262
Feature activation+9.99e-04

Pos logit contributions
up+0.081
Sarah+0.070
aver+0.069
 oblig+0.068
 kind+0.066
Neg logit contributions
rified-0.073
 drown-0.069
 Ow-0.064
 screams-0.062
 trigger-0.062
Token sand
Token ID6450
Feature activation-1.30e-02

Pos logit contributions
rified+0.002
 drown+0.002
 trigger+0.001
 Ow+0.001
 fists+0.001
Neg logit contributions
up-0.002
 kind-0.002
oe-0.001
ate-0.001
Sarah-0.001
Token,"
Token ID553
Feature activation+1.00e-02

Pos logit contributions
up+0.023
Sarah+0.021
aver+0.020
 oblig+0.020
 kind+0.020
Neg logit contributions
rified-0.020
 drown-0.019
 screams-0.017
 Ow-0.017
 trigger-0.017
Token said
Token ID531
Feature activation+8.41e-03

Pos logit contributions
rified+0.019
 drown+0.018
 Ow+0.016
 trigger+0.016
 screams+0.016
Neg logit contributions
up-0.016
Sarah-0.013
oe-0.013
 oblig-0.013
aver-0.012
Token tired
Token ID10032
Feature activation+6.82e-03

Pos logit contributions
up+0.019
Sarah+0.016
 oblig+0.015
aver+0.015
oe+0.014
Neg logit contributions
rified-0.017
 drown-0.016
 drowning-0.015
 Ow-0.014
 fists-0.014
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
rified+0.013
 drown+0.013
 Ow+0.012
 trigger+0.012
 screams+0.012
Neg logit contributions
up-0.010
Sarah-0.008
 kind-0.008
oe-0.008
 oblig-0.008
Token She
Token ID1375
Feature activation-2.82e-02

Pos logit contributions
rified+0.024
 drown+0.022
 Ow+0.022
 tongues+0.020
 screams+0.020
Neg logit contributions
up-0.017
 oblig-0.013
aver-0.013
Sarah-0.012
 kind-0.012
Token wanted
Token ID2227
Feature activation+6.37e-03

Pos logit contributions
up+0.047
Sarah+0.040
oe+0.038
 oblig+0.038
 kind+0.038
Neg logit contributions
rified-0.050
 drown-0.048
 Ow-0.043
 screams-0.043
 trigger-0.043
Token to
Token ID284
Feature activation-3.17e-02

Pos logit contributions
rified+0.013
 drown+0.013
 Ow+0.012
 screams+0.012
 drowned+0.011
Neg logit contributions
up-0.009
aver-0.007
Sarah-0.007
oe-0.007
 oblig-0.007
Token decor
Token ID11705
Feature activation-7.11e-02

Pos logit contributions
up+0.046
Sarah+0.038
oe+0.037
aver+0.037
 kind+0.034
Neg logit contributions
 drown-0.062
rified-0.061
 Ow-0.056
 screams-0.055
 trigger-0.054
Tokenate
Token ID378
Feature activation-4.54e-02

Pos logit contributions
up+0.069
aver+0.063
Sarah+0.059
 oblig+0.057
old+0.054
Neg logit contributions
rified-0.156
 Ow-0.145
 drown-0.144
 screams-0.139
ting-0.139
Token the
Token ID262
Feature activation-1.48e-03

Pos logit contributions
up+0.069
Sarah+0.057
aver+0.056
 oblig+0.055
oe+0.055
Neg logit contributions
rified-0.087
 drown-0.082
 Ow-0.076
 trigger-0.075
 screams-0.074
Token room
Token ID2119
Feature activation-3.01e-02

Pos logit contributions
up+0.002
Sarah+0.002
 oblig+0.002
aver+0.002
oe+0.002
Neg logit contributions
rified-0.003
 drown-0.003
 Ow-0.002
 trigger-0.002
 screams-0.002
Token to
Token ID284
Feature activation-6.62e-04

Pos logit contributions
up+0.050
aver+0.045
Sarah+0.045
 oblig+0.043
old+0.040
Neg logit contributions
rified-0.049
 drown-0.045
 screams-0.041
der-0.040
 fists-0.040
Token make
Token ID787
Feature activation-1.72e-02

Pos logit contributions
up+0.001
aver+0.001
ari+0.001
Sarah+0.001
oe+0.001
Neg logit contributions
 drown-0.001
rified-0.001
 aren-0.001
 Ow-0.001
 screams-0.001
Token So
Token ID1406
Feature activation-6.05e-03

Pos logit contributions
up+0.013
 oblig+0.010
 kind+0.010
oe+0.010
Sarah+0.010
Neg logit contributions
rified-0.020
 drown-0.018
 Ow-0.018
 screams-0.017
 trigger-0.017
Token,
Token ID11
Feature activation-1.17e-02

Pos logit contributions
up+0.007
Sarah+0.006
 kind+0.005
oe+0.005
 oblig+0.005
Neg logit contributions
rified-0.014
 drown-0.013
 Ow-0.012
 trigger-0.012
 screams-0.012
Token she
Token ID673
Feature activation-2.16e-02

Pos logit contributions
up+0.013
Sarah+0.009
 kind+0.009
oe+0.009
 oblig+0.009
Neg logit contributions
rified-0.028
 drown-0.027
 Ow-0.025
 trigger-0.025
 screams-0.025
Token decided
Token ID3066
Feature activation-3.99e-03

Pos logit contributions
up+0.033
Sarah+0.029
oe+0.028
 oblig+0.028
 kind+0.026
Neg logit contributions
rified-0.039
 drown-0.037
 drowned-0.034
 screams-0.034
 Ow-0.033
Token to
Token ID284
Feature activation-2.77e-02

Pos logit contributions
up+0.005
aver+0.004
Sarah+0.004
 oblig+0.004
oe+0.003
Neg logit contributions
rified-0.009
 drown-0.009
 screams-0.008
 Ow-0.008
 trigger-0.008
Token decor
Token ID11705
Feature activation-7.11e-02

Pos logit contributions
up+0.039
Sarah+0.032
oe+0.031
aver+0.031
 kind+0.030
Neg logit contributions
rified-0.055
 drown-0.055
 Ow-0.049
 trigger-0.049
 screams-0.048
Tokenate
Token ID378
Feature activation-3.90e-02

Pos logit contributions
up+0.045
aver+0.040
 oblig+0.039
Sarah+0.037
oe+0.034
Neg logit contributions
rified-0.181
 Ow-0.167
 screams-0.161
 drown-0.161
ting-0.160
Token it
Token ID340
Feature activation-7.98e-03

Pos logit contributions
up+0.061
Sarah+0.050
 kind+0.049
oe+0.048
 oblig+0.048
Neg logit contributions
rified-0.075
 drown-0.072
 trigger-0.065
 Ow-0.065
 screams-0.064
Token with
Token ID351
Feature activation+9.88e-03

Pos logit contributions
up+0.011
aver+0.010
Sarah+0.010
 oblig+0.010
old+0.009
Neg logit contributions
rified-0.015
 drown-0.014
 screams-0.013
 fists-0.013
 Ow-0.013
Token some
Token ID617
Feature activation-1.17e-02

Pos logit contributions
rified+0.018
 drown+0.018
 Ow+0.017
 trigger+0.016
 screams+0.016
Neg logit contributions
up-0.015
Sarah-0.013
aver-0.012
oe-0.012
 oblig-0.012
Token pretty
Token ID2495
Feature activation-2.80e-02

Pos logit contributions
up+0.017
Sarah+0.015
oe+0.014
aver+0.014
 oblig+0.014
Neg logit contributions
rified-0.021
 drown-0.021
 screams-0.020
 trigger-0.019
 tongues-0.019
Token day
Token ID1110
Feature activation-2.50e-03

Pos logit contributions
up+0.037
Sarah+0.032
 oblig+0.031
Ros+0.031
ate+0.030
Neg logit contributions
rified-0.028
 drown-0.027
 screams-0.025
 Ow-0.023
 tongues-0.023
Token,
Token ID11
Feature activation-3.25e-03

Pos logit contributions
up+0.003
Sarah+0.003
 kind+0.002
oe+0.002
 oblig+0.002
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.005
 trigger-0.005
 screams-0.005
Token they
Token ID484
Feature activation-5.79e-03

Pos logit contributions
up+0.005
oe+0.004
 kind+0.004
Sarah+0.004
ot+0.004
Neg logit contributions
 drown-0.006
rified-0.006
 trigger-0.006
 screams-0.006
 drowned-0.006
Token decided
Token ID3066
Feature activation-1.82e-04

Pos logit contributions
up+0.008
 kind+0.007
Sarah+0.007
oe+0.007
aver+0.007
Neg logit contributions
rified-0.012
 drown-0.011
 trigger-0.011
 screams-0.010
 Ow-0.010
Token to
Token ID284
Feature activation-1.84e-02

Pos logit contributions
up+0.000
Sarah+0.000
aver+0.000
 oblig+0.000
oe+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 trigger-0.000
 Ow-0.000
 screams-0.000
Token decor
Token ID11705
Feature activation-7.09e-02

Pos logit contributions
up+0.025
Sarah+0.021
aver+0.020
oe+0.020
 kind+0.019
Neg logit contributions
 drown-0.037
rified-0.037
 Ow-0.034
 trigger-0.033
 screams-0.033
Tokenate
Token ID378
Feature activation-4.08e-02

Pos logit contributions
up+0.047
aver+0.043
 oblig+0.042
Sarah+0.041
oe+0.035
Neg logit contributions
rified-0.171
 Ow-0.161
 drown-0.156
 banks-0.156
 screams-0.155
Token their
Token ID511
Feature activation+1.30e-02

Pos logit contributions
up+0.063
 kind+0.051
Sarah+0.051
oe+0.050
ate+0.049
Neg logit contributions
rified-0.081
 drown-0.078
 trigger-0.071
 Ow-0.070
 drowned-0.069
Token garden
Token ID11376
Feature activation-2.84e-02

Pos logit contributions
rified+0.027
 drown+0.026
 trigger+0.024
 Ow+0.024
 screams+0.023
Neg logit contributions
up-0.018
 kind-0.014
Sarah-0.014
oe-0.014
 oblig-0.014
Token with
Token ID351
Feature activation+8.15e-03

Pos logit contributions
up+0.046
Sarah+0.043
 oblig+0.043
aver+0.042
oe+0.038
Neg logit contributions
rified-0.043
 banks-0.042
 drown-0.041
 tongues-0.038
 screams-0.038
Token lots
Token ID6041
Feature activation-2.25e-02

Pos logit contributions
rified+0.015
 drown+0.014
 Ow+0.013
 trigger+0.013
 screams+0.013
Neg logit contributions
up-0.013
Sarah-0.011
 oblig-0.010
oe-0.010
aver-0.010
Token Jill
Token ID22772
Feature activation-1.16e-02

Pos logit contributions
rified+0.001
 drown+0.001
 Ow+0.001
 screams+0.001
 fists+0.001
Neg logit contributions
up-0.001
aver-0.001
 kind-0.001
oe-0.001
Sarah-0.001
Token,
Token ID11
Feature activation+1.58e-02

Pos logit contributions
up+0.014
Sarah+0.011
 kind+0.011
oe+0.011
 oblig+0.010
Neg logit contributions
rified-0.026
 drown-0.025
 trigger-0.023
 Ow-0.023
 screams-0.023
Token "
Token ID366
Feature activation-1.73e-03

Pos logit contributions
rified+0.033
 drown+0.031
 trigger+0.029
 Ow+0.029
 screams+0.028
Neg logit contributions
up-0.022
Sarah-0.018
 kind-0.017
oe-0.017
 oblig-0.017
TokenLet
Token ID5756
Feature activation+2.88e-03

Pos logit contributions
up+0.003
 oblig+0.003
 kind+0.003
 generous+0.003
aver+0.003
Neg logit contributions
rified-0.003
 Ow-0.002
 drown-0.002
ting-0.002
 trigger-0.002
Token's
Token ID338
Feature activation-4.61e-02

Pos logit contributions
rified+0.005
 drown+0.005
 Ow+0.004
 trigger+0.004
 screams+0.004
Neg logit contributions
up-0.005
Sarah-0.004
aver-0.004
oe-0.004
 oblig-0.004
Token decor
Token ID11705
Feature activation-7.08e-02

Pos logit contributions
up+0.061
Sarah+0.049
oe+0.047
aver+0.047
 kind+0.046
Neg logit contributions
rified-0.097
 drown-0.095
 trigger-0.087
 Ow-0.087
 screams-0.085
Tokenate
Token ID378
Feature activation-3.77e-02

Pos logit contributions
aver+0.054
up+0.052
 oblig+0.049
Sarah+0.048
old+0.046
Neg logit contributions
rified-0.159
ting-0.149
 Ow-0.149
 banks-0.144
uses-0.142
Token it
Token ID340
Feature activation-6.84e-03

Pos logit contributions
up+0.067
Sarah+0.057
 kind+0.056
 oblig+0.055
oe+0.055
Neg logit contributions
rified-0.063
 drown-0.059
 Ow-0.054
 trigger-0.054
 screams-0.052
Token!
Token ID0
Feature activation-1.15e-02

Pos logit contributions
up+0.011
aver+0.010
Sarah+0.010
 oblig+0.010
old+0.009
Neg logit contributions
rified-0.011
 drown-0.011
 Ow-0.010
 banks-0.010
 trigger-0.010
Token This
Token ID770
Feature activation-9.03e-03

Pos logit contributions
up+0.024
eh+0.021
aver+0.020
oe+0.020
 oblig+0.020
Neg logit contributions
rified-0.016
 Ow-0.014
 drown-0.014
 trigger-0.013
 horror-0.012
Token will
Token ID481
Feature activation+5.14e-03

Pos logit contributions
up+0.016
Sarah+0.015
 oblig+0.015
aver+0.015
old+0.014
Neg logit contributions
rified-0.013
 drown-0.012
 Ow-0.011
 screams-0.011
 trigger-0.011
Token One
Token ID1881
Feature activation-1.77e-02

Pos logit contributions
rified+0.013
 drown+0.013
 Ow+0.012
 screams+0.012
 fists+0.011
Neg logit contributions
up-0.009
aver-0.007
Sarah-0.007
oe-0.007
 oblig-0.007
Token day
Token ID1110
Feature activation+7.66e-03

Pos logit contributions
up+0.032
 oblig+0.028
Sarah+0.027
lic+0.026
ate+0.025
Neg logit contributions
rified-0.025
 drown-0.023
 Ow-0.023
 couldn-0.021
 screams-0.021
Token she
Token ID673
Feature activation-2.45e-02

Pos logit contributions
rified+0.017
 drown+0.016
 Ow+0.015
 trigger+0.015
 screams+0.015
Neg logit contributions
up-0.009
Sarah-0.007
 kind-0.007
oe-0.007
 oblig-0.007
Token decided
Token ID3066
Feature activation-6.32e-03

Pos logit contributions
up+0.036
Sarah+0.031
 oblig+0.031
oe+0.029
aver+0.029
Neg logit contributions
rified-0.045
 drown-0.043
 drowned-0.040
 Ow-0.040
 screams-0.039
Token to
Token ID284
Feature activation-3.29e-02

Pos logit contributions
up+0.008
aver+0.008
 oblig+0.007
oe+0.007
Sarah+0.007
Neg logit contributions
rified-0.012
 drown-0.012
 screams-0.011
 Ow-0.011
 risking-0.011
Token decor
Token ID11705
Feature activation-7.08e-02

Pos logit contributions
up+0.048
Sarah+0.045
aver+0.045
oe+0.042
eh+0.042
Neg logit contributions
 drown-0.057
rified-0.053
 Ow-0.053
 trigger-0.052
 screams-0.051
Tokenate
Token ID378
Feature activation-4.32e-02

Pos logit contributions
up+0.053
aver+0.052
 oblig+0.050
Sarah+0.049
old+0.046
Neg logit contributions
rified-0.158
 Ow-0.153
ting-0.146
 banks-0.144
 screams-0.142
Token the
Token ID262
Feature activation-6.18e-04

Pos logit contributions
up+0.049
Sarah+0.039
 oblig+0.038
aver+0.038
oe+0.037
Neg logit contributions
rified-0.098
 drown-0.093
 Ow-0.089
 screams-0.087
 trigger-0.087
Token sink
Token ID14595
Feature activation-1.55e-02

Pos logit contributions
up+0.001
Sarah+0.001
 oblig+0.001
aver+0.001
oe+0.001
Neg logit contributions
rified-0.001
 drown-0.001
 Ow-0.001
 trigger-0.001
 screams-0.001
Token in
Token ID287
Feature activation+8.83e-03

Pos logit contributions
up+0.022
Sarah+0.022
 oblig+0.021
aver+0.021
Her+0.018
Neg logit contributions
rified-0.026
 drown-0.025
 banks-0.023
 screams-0.023
der-0.023
Token her
Token ID607
Feature activation-5.13e-03

Pos logit contributions
rified+0.017
 drown+0.016
 Ow+0.015
 screams+0.015
 fists+0.015
Neg logit contributions
up-0.013
aver-0.011
 oblig-0.011
Sarah-0.010
oe-0.010
Token Today
Token ID6288
Feature activation+9.96e-03

Pos logit contributions
rified+0.009
 drown+0.008
 Ow+0.008
 horror+0.008
 tongues+0.008
Neg logit contributions
up-0.006
aver-0.005
oe-0.005
 oblig-0.005
 kind-0.005
Token they
Token ID484
Feature activation-5.61e-03

Pos logit contributions
rified+0.020
 drown+0.019
 Ow+0.017
 trigger+0.017
 screams+0.017
Neg logit contributions
up-0.015
Sarah-0.012
 kind-0.012
 oblig-0.012
aver-0.012
Token had
Token ID550
Feature activation-1.23e-02

Pos logit contributions
up+0.010
Sarah+0.009
 kind+0.008
oe+0.008
 oblig+0.008
Neg logit contributions
rified-0.009
 drown-0.009
 trigger-0.008
 Ow-0.008
 screams-0.008
Token decided
Token ID3066
Feature activation-4.79e-03

Pos logit contributions
up+0.018
Sarah+0.014
 kind+0.014
oe+0.014
 oblig+0.014
Neg logit contributions
rified-0.025
 drown-0.024
 Ow-0.022
 trigger-0.022
 screams-0.022
Token to
Token ID284
Feature activation-3.31e-02

Pos logit contributions
up+0.006
aver+0.004
Sarah+0.004
 oblig+0.004
oe+0.004
Neg logit contributions
rified-0.011
 drown-0.010
 Ow-0.009
 trigger-0.009
 screams-0.009
Token decor
Token ID11705
Feature activation-7.07e-02

Pos logit contributions
up+0.048
Sarah+0.044
aver+0.044
eh+0.042
oe+0.041
Neg logit contributions
 drown-0.059
rified-0.055
 Ow-0.053
 trigger-0.052
 banks-0.051
Tokenate
Token ID378
Feature activation-4.22e-02

Pos logit contributions
up+0.047
aver+0.044
 oblig+0.039
Sarah+0.038
eh+0.034
Neg logit contributions
rified-0.174
 Ow-0.162
 drown-0.160
 screams-0.157
 banks-0.156
Token the
Token ID262
Feature activation+6.63e-03

Pos logit contributions
up+0.065
 kind+0.054
Sarah+0.053
oe+0.052
ate+0.051
Neg logit contributions
rified-0.083
 drown-0.080
 trigger-0.073
 Ow-0.072
 drowned-0.071
Token house
Token ID2156
Feature activation-2.64e-02

Pos logit contributions
rified+0.013
 drown+0.012
 trigger+0.011
 Ow+0.011
 screams+0.011
Neg logit contributions
up-0.010
Sarah-0.008
 kind-0.008
oe-0.008
 oblig-0.008
Token.
Token ID13
Feature activation+3.29e-03

Pos logit contributions
up+0.040
aver+0.036
Sarah+0.035
 oblig+0.035
eh+0.032
Neg logit contributions
rified-0.046
 drown-0.043
 trigger-0.039
 screams-0.039
 fists-0.039
Token
Token ID198
Feature activation-1.14e-03

Pos logit contributions
rified+0.006
 drown+0.005
 Ow+0.005
 screams+0.005
 trigger+0.005
Neg logit contributions
up-0.006
aver-0.005
 kind-0.005
 oblig-0.005
Sarah-0.005
Token said
Token ID531
Feature activation-1.39e-02

Pos logit contributions
up+0.035
Sarah+0.030
oe+0.030
 oblig+0.030
 kind+0.029
Neg logit contributions
rified-0.036
 drown-0.034
 screams-0.031
 drowned-0.031
 Ow-0.031
Token,
Token ID11
Feature activation+4.88e-03

Pos logit contributions
up+0.021
Sarah+0.017
aver+0.017
ate+0.016
oe+0.016
Neg logit contributions
rified-0.027
 drown-0.026
 Ow-0.024
 screams-0.024
 trigger-0.024
Token "
Token ID366
Feature activation-5.06e-03

Pos logit contributions
rified+0.009
 drown+0.008
 Ow+0.008
 trigger+0.008
 screams+0.008
Neg logit contributions
up-0.008
aver-0.006
ate-0.006
Sarah-0.006
 oblig-0.006
TokenLet
Token ID5756
Feature activation+1.66e-03

Pos logit contributions
up+0.009
 oblig+0.007
 kind+0.007
oe+0.007
ook+0.007
Neg logit contributions
rified-0.009
 drown-0.008
 Ow-0.008
 trigger-0.007
 drowned-0.007
Token's
Token ID338
Feature activation-5.19e-02

Pos logit contributions
rified+0.003
 drown+0.003
 Ow+0.002
 trigger+0.002
 screams+0.002
Neg logit contributions
up-0.003
Sarah-0.002
 kind-0.002
oe-0.002
 oblig-0.002
Token decor
Token ID11705
Feature activation-7.05e-02

Pos logit contributions
up+0.065
Sarah+0.050
 kind+0.050
 oblig+0.048
oe+0.048
Neg logit contributions
rified-0.117
 drown-0.110
 trigger-0.103
 Ow-0.103
 screams-0.101
Tokenate
Token ID378
Feature activation-3.70e-02

Pos logit contributions
up+0.055
aver+0.050
 oblig+0.046
Sarah+0.044
old+0.042
Neg logit contributions
rified-0.167
 Ow-0.155
 drown-0.151
ting-0.150
 screams-0.149
Token him
Token ID683
Feature activation+1.38e-02

Pos logit contributions
up+0.063
Sarah+0.053
 kind+0.052
oe+0.051
 oblig+0.050
Neg logit contributions
rified-0.066
 drown-0.063
 trigger-0.057
 Ow-0.057
 screams-0.055
Token!"
Token ID2474
Feature activation-5.62e-03

Pos logit contributions
rified+0.027
 drown+0.025
 Ow+0.023
 trigger+0.023
 screams+0.023
Neg logit contributions
up-0.021
Sarah-0.017
 kind-0.017
 oblig-0.017
oe-0.017
Token
Token ID220
Feature activation-2.80e-03

Pos logit contributions
up+0.008
 kind+0.007
aver+0.006
Sarah+0.006
 oblig+0.006
Neg logit contributions
rified-0.011
 drown-0.011
 Ow-0.010
 trigger-0.010
 screams-0.010
Token
Token ID198
Feature activation-7.83e-03

Pos logit contributions
up+0.005
 kind+0.004
aver+0.004
 oblig+0.004
Sarah+0.004
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.005
 screams-0.004
 trigger-0.004
Token their
Token ID511
Feature activation+1.52e-02

Pos logit contributions
rified+0.016
 drown+0.015
 trigger+0.014
 Ow+0.014
 screams+0.014
Neg logit contributions
up-0.012
 kind-0.010
Sarah-0.010
aver-0.009
 generous-0.009
Token day
Token ID1110
Feature activation-2.20e-03

Pos logit contributions
rified+0.032
 drown+0.029
 trigger+0.027
 Ow+0.027
ting+0.026
Neg logit contributions
up-0.022
 kind-0.019
 generous-0.018
oe-0.018
Sarah-0.017
Token and
Token ID290
Feature activation-2.36e-03

Pos logit contributions
up+0.003
Sarah+0.002
 kind+0.002
oe+0.002
 oblig+0.002
Neg logit contributions
rified-0.005
 drown-0.004
 Ow-0.004
 trigger-0.004
 screams-0.004
Token decided
Token ID3066
Feature activation-5.58e-03

Pos logit contributions
up+0.003
Sarah+0.003
 kind+0.003
oe+0.003
 oblig+0.003
Neg logit contributions
rified-0.005
 drown-0.005
 trigger-0.004
 Ow-0.004
 screams-0.004
Token to
Token ID284
Feature activation-2.60e-02

Pos logit contributions
up+0.007
aver+0.005
oe+0.005
Sarah+0.005
 oblig+0.005
Neg logit contributions
rified-0.012
 drown-0.012
 trigger-0.011
 Ow-0.011
 screams-0.011
Token decor
Token ID11705
Feature activation-7.04e-02

Pos logit contributions
up+0.037
Sarah+0.032
aver+0.031
oe+0.031
ari+0.029
Neg logit contributions
 drown-0.050
rified-0.048
 Ow-0.045
 trigger-0.045
 screams-0.044
Tokenate
Token ID378
Feature activation-4.17e-02

Pos logit contributions
up+0.046
aver+0.041
 oblig+0.039
Sarah+0.038
oe+0.034
Neg logit contributions
rified-0.173
 Ow-0.161
 drown-0.160
 screams-0.157
 banks-0.156
Token the
Token ID262
Feature activation+9.68e-03

Pos logit contributions
up+0.070
Sarah+0.058
 kind+0.057
oe+0.056
 oblig+0.055
Neg logit contributions
rified-0.076
 drown-0.072
 trigger-0.065
 Ow-0.065
 screams-0.064
Token living
Token ID2877
Feature activation-1.84e-02

Pos logit contributions
rified+0.018
 drown+0.017
 trigger+0.015
 Ow+0.015
 screams+0.015
Neg logit contributions
up-0.016
Sarah-0.013
 oblig-0.013
 kind-0.013
oe-0.013
Token room
Token ID2119
Feature activation-3.15e-02

Pos logit contributions
up+0.026
Sarah+0.022
 oblig+0.021
aver+0.021
 kind+0.020
Neg logit contributions
rified-0.037
 drown-0.034
 Ow-0.032
 trigger-0.031
 screams-0.031
Token.
Token ID13
Feature activation+6.77e-03

Pos logit contributions
up+0.053
Sarah+0.050
aver+0.048
 oblig+0.047
eh+0.044
Neg logit contributions
rified-0.046
 drown-0.045
 screams-0.040
 banks-0.040
 trigger-0.040
Token boy
Token ID2933
Feature activation-3.35e-02

Pos logit contributions
up+0.002
Sarah+0.002
 oblig+0.002
ate+0.002
Her+0.002
Neg logit contributions
rified-0.002
 drown-0.002
 Ow-0.002
 trigger-0.002
 screams-0.002
Token named
Token ID3706
Feature activation-1.56e-02

Pos logit contributions
up+0.048
 wider+0.039
Sarah+0.038
pron+0.037
 oblig+0.037
Neg logit contributions
 drown-0.063
rified-0.062
 drowned-0.060
 Ow-0.059
 tongues-0.057
Token Tim
Token ID5045
Feature activation-1.33e-02

Pos logit contributions
up+0.028
Sarah+0.024
 kind+0.024
oe+0.023
 oblig+0.023
Neg logit contributions
rified-0.027
 drown-0.025
 trigger-0.023
 Ow-0.022
 screams-0.022
Token wanted
Token ID2227
Feature activation+4.73e-03

Pos logit contributions
up+0.015
Sarah+0.011
 kind+0.011
oe+0.011
 oblig+0.010
Neg logit contributions
rified-0.031
 drown-0.030
 Ow-0.028
 trigger-0.028
 screams-0.028
Token to
Token ID284
Feature activation-2.79e-02

Pos logit contributions
 drown+0.009
 Ow+0.008
rified+0.008
 screams+0.008
 drowned+0.008
Neg logit contributions
up-0.007
aver-0.006
oe-0.006
Sarah-0.006
 oblig-0.006
Token decor
Token ID11705
Feature activation-7.04e-02

Pos logit contributions
up+0.044
aver+0.042
oe+0.041
Sarah+0.039
ari+0.039
Neg logit contributions
 drown-0.049
 arrest-0.041
 Ow-0.041
 screams-0.040
 drowned-0.039
Tokenate
Token ID378
Feature activation-4.56e-02

Pos logit contributions
up+0.058
aver+0.053
Sarah+0.053
 oblig+0.051
oe+0.047
Neg logit contributions
rified-0.157
 Ow-0.150
 drown-0.149
 screams-0.145
ting-0.142
Token his
Token ID465
Feature activation+1.07e-02

Pos logit contributions
up+0.072
Sarah+0.061
 oblig+0.060
oe+0.059
aver+0.058
Neg logit contributions
rified-0.084
 drown-0.079
 Ow-0.075
 screams-0.073
 trigger-0.072
Token room
Token ID2119
Feature activation-3.35e-02

Pos logit contributions
rified+0.020
 drown+0.019
 Ow+0.018
 screams+0.017
 trigger+0.017
Neg logit contributions
up-0.017
Sarah-0.014
 oblig-0.014
aver-0.014
oe-0.013
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
Sarah+0.065
up+0.064
 oblig+0.062
aver+0.060
reetings+0.058
Neg logit contributions
rified-0.038
 banks-0.034
 drown-0.034
der-0.033
 everywhere-0.032
Token He
Token ID679
Feature activation-9.00e-03

Pos logit contributions
 Ow+0.020
rified+0.019
 drown+0.018
 fists+0.017
 trigger+0.017
Neg logit contributions
up-0.024
aver-0.020
Sarah-0.020
 writer-0.020
 oblig-0.020
Token they
Token ID484
Feature activation-7.30e-03

Pos logit contributions
rified+0.004
 drown+0.004
 trigger+0.004
 Ow+0.004
 screams+0.004
Neg logit contributions
up-0.003
Sarah-0.002
oe-0.002
 kind-0.002
 oblig-0.002
Token saw
Token ID2497
Feature activation-5.35e-03

Pos logit contributions
up+0.009
 kind+0.007
Sarah+0.006
 oblig+0.006
oe+0.006
Neg logit contributions
rified-0.017
 drown-0.016
 trigger-0.015
 Ow-0.015
 screams-0.015
Token a
Token ID257
Feature activation+2.47e-02

Pos logit contributions
up+0.005
 kind+0.004
Sarah+0.003
aver+0.003
 generous+0.003
Neg logit contributions
rified-0.014
 drown-0.013
 trigger-0.013
 fists-0.012
 tongues-0.012
Token big
Token ID1263
Feature activation+1.29e-02

Pos logit contributions
rified+0.050
 drown+0.048
 trigger+0.045
 Ow+0.044
 screams+0.044
Neg logit contributions
up-0.035
Sarah-0.028
 oblig-0.027
oe-0.027
aver-0.027
Token whale
Token ID22206
Feature activation-2.33e-03

Pos logit contributions
rified+0.023
 drown+0.021
 Ow+0.019
 trigger+0.019
 screams+0.019
Neg logit contributions
up-0.022
Sarah-0.019
 kind-0.019
oe-0.018
 oblig-0.018
Token on
Token ID319
Feature activation+4.95e-02

Pos logit contributions
up+0.003
 kind+0.003
aver+0.002
oe+0.002
Sarah+0.002
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.004
 trigger-0.004
 fists-0.004
Token the
Token ID262
Feature activation+1.62e-02

Pos logit contributions
rified+0.119
 drown+0.112
 drowned+0.104
 trigger+0.103
 fists+0.103
Neg logit contributions
up-0.048
 kind-0.043
ate-0.042
 five-0.039
Sarah-0.038
Token shore
Token ID15191
Feature activation-2.54e-03

Pos logit contributions
rified+0.029
 drown+0.024
 Ow+0.024
 fists+0.022
 drowned+0.021
Neg logit contributions
up-0.029
 kind-0.026
ate-0.024
oe-0.024
 generous-0.024
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
up+0.003
 kind+0.003
oe+0.002
aver+0.002
Sarah+0.002
Neg logit contributions
rified-0.006
 drown-0.005
 Ow-0.005
 trigger-0.005
 fists-0.005
Token It
Token ID632
Feature activation+1.90e-03

Pos logit contributions
rified+0.021
 drown+0.021
 trigger+0.019
 drowned+0.019
 fists+0.019
Neg logit contributions
up-0.014
Sarah-0.012
 kind-0.011
oe-0.011
 oblig-0.011
Token was
Token ID373
Feature activation+1.51e-02

Pos logit contributions
rified+0.004
 drown+0.004
 trigger+0.004
 Ow+0.004
 fists+0.003
Neg logit contributions
up-0.003
 kind-0.002
aver-0.002
oe-0.002
Sarah-0.002
Token frogs
Token ID37475
Feature activation+9.29e-03

Pos logit contributions
rified+0.006
 drown+0.006
 trigger+0.005
 Ow+0.005
 screams+0.005
Neg logit contributions
up-0.006
Sarah-0.006
 kind-0.005
oe-0.005
 oblig-0.005
Token!
Token ID0
Feature activation-1.43e-03

Pos logit contributions
rified+0.019
 drown+0.018
 Ow+0.018
 trigger+0.017
 fists+0.016
Neg logit contributions
up-0.013
 kind-0.011
aver-0.010
Sarah-0.010
oe-0.010
Token They
Token ID1119
Feature activation+1.55e-02

Pos logit contributions
up+0.002
 kind+0.002
Sarah+0.002
oe+0.002
 generous+0.002
Neg logit contributions
 drown-0.003
rified-0.003
 trigger-0.002
 drowned-0.002
 screams-0.002
Token were
Token ID547
Feature activation+1.75e-02

Pos logit contributions
rified+0.026
 drown+0.026
 fault+0.025
 Ow+0.025
 trigger+0.024
Neg logit contributions
 kind-0.026
up-0.026
Sarah-0.024
aver-0.023
 generous-0.023
Token jumping
Token ID14284
Feature activation+1.29e-02

Pos logit contributions
rified+0.032
 drown+0.029
 trigger+0.028
 Ow+0.028
 drowned+0.025
Neg logit contributions
up-0.028
 kind-0.027
aver-0.025
 generous-0.025
Sarah-0.025
Token everywhere
Token ID8347
Feature activation+4.60e-02

Pos logit contributions
 drown+0.026
rified+0.026
 Ow+0.024
 horror+0.023
 drowned+0.023
Neg logit contributions
up-0.018
oe-0.015
 kind-0.015
Sarah-0.015
--0.014
Token and
Token ID290
Feature activation+2.57e-02

Pos logit contributions
rified+0.106
 drown+0.102
 Ow+0.097
 trigger+0.096
 tongues+0.091
Neg logit contributions
up-0.050
 kind-0.046
Sarah-0.045
oe-0.042
 generous-0.041
Token making
Token ID1642
Feature activation-2.71e-03

Pos logit contributions
rified+0.050
 Ow+0.047
 drown+0.046
 trigger+0.045
 fault+0.041
Neg logit contributions
up-0.035
 kind-0.033
Sarah-0.031
 generous-0.029
 patient-0.028
Token lots
Token ID6041
Feature activation-8.45e-03

Pos logit contributions
up+0.004
 kind+0.003
 nice+0.003
 generous+0.003
Sarah+0.003
Neg logit contributions
rified-0.006
 drown-0.005
 trigger-0.005
 Ow-0.005
ifying-0.005
Token of
Token ID286
Feature activation-2.49e-02

Pos logit contributions
up+0.005
Sarah+0.003
 kind+0.002
oe+0.002
 oblig+0.002
Neg logit contributions
rified-0.024
 drown-0.024
 Ow-0.022
 trigger-0.022
 screams-0.022
Token noise
Token ID7838
Feature activation+7.42e-03

Pos logit contributions
up+0.034
 kind+0.032
 nice+0.028
 generous+0.028
 patient+0.026
Neg logit contributions
rified-0.053
 drown-0.049
 trigger-0.048
 Ow-0.046
ifying-0.045
Token what
Token ID644
Feature activation+6.74e-03

Pos logit contributions
up+0.012
Sarah+0.010
 kind+0.010
oe+0.009
 oblig+0.009
Neg logit contributions
rified-0.019
 drown-0.019
 trigger-0.017
 Ow-0.017
 screams-0.017
Token her
Token ID607
Feature activation-4.71e-03

Pos logit contributions
rified+0.012
 drown+0.012
 Ow+0.011
 screams+0.011
 risking+0.011
Neg logit contributions
up-0.010
oe-0.009
ari-0.008
aver-0.008
ate-0.008
Token mom
Token ID1995
Feature activation-9.13e-03

Pos logit contributions
up+0.007
 kind+0.006
Sarah+0.005
oe+0.005
 oblig+0.005
Neg logit contributions
rified-0.010
 drown-0.009
 trigger-0.009
 Ow-0.008
 drowned-0.008
Tokenmy
Token ID1820
Feature activation-1.37e-04

Pos logit contributions
up+0.014
Sarah+0.012
 oblig+0.012
oe+0.012
aver+0.011
Neg logit contributions
rified-0.017
 drown-0.016
 trigger-0.014
 Ow-0.014
 screams-0.014
Token said
Token ID531
Feature activation+8.67e-03

Pos logit contributions
up+0.000
Sarah+0.000
 kind+0.000
oe+0.000
 oblig+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 trigger-0.000
 Ow-0.000
 screams-0.000
Token when
Token ID618
Feature activation+4.20e-02

Pos logit contributions
 drown+0.021
rified+0.020
 Ow+0.019
 trigger+0.019
 drowned+0.018
Neg logit contributions
up-0.009
 kind-0.008
Sarah-0.007
 generous-0.006
 of-0.006
Token she
Token ID673
Feature activation-9.33e-03

Pos logit contributions
rified+0.091
 drown+0.087
 screams+0.082
 Ow+0.080
 fists+0.080
Neg logit contributions
up-0.053
aver-0.042
oe-0.040
 oblig-0.039
ate-0.039
Token got
Token ID1392
Feature activation-1.08e-02

Pos logit contributions
up+0.016
Sarah+0.014
oe+0.013
 oblig+0.013
 kind+0.013
Neg logit contributions
rified-0.016
 drown-0.015
 Ow-0.014
 trigger-0.014
 screams-0.014
Token hurt
Token ID5938
Feature activation+1.00e-02

Pos logit contributions
up+0.017
Sarah+0.014
aver+0.014
 oblig+0.014
oe+0.014
Neg logit contributions
rified-0.020
 drown-0.020
 Ow-0.018
 trigger-0.018
 screams-0.017
Token.
Token ID13
Feature activation-1.76e-03

Pos logit contributions
rified+0.022
 drown+0.021
 trigger+0.020
 Ow+0.019
 screams+0.019
Neg logit contributions
up-0.013
Sarah-0.011
 kind-0.010
oe-0.010
 oblig-0.010
Token
Token ID220
Feature activation-1.32e-03

Pos logit contributions
up+0.002
Sarah+0.002
aver+0.002
 kind+0.002
 oblig+0.002
Neg logit contributions
rified-0.004
 drown-0.004
 Ow-0.003
 screams-0.003
 trigger-0.003
Token on
Token ID319
Feature activation+3.69e-02

Pos logit contributions
rified+0.115
 drown+0.107
 Ow+0.105
 trigger+0.101
 tongues+0.098
Neg logit contributions
up-0.057
 kind-0.048
--0.044
oe-0.044
house-0.042
Token the
Token ID262
Feature activation+1.18e-02

Pos logit contributions
 drown+0.086
rified+0.085
 drowned+0.076
 risking+0.075
idents+0.075
Neg logit contributions
up-0.037
 kind-0.034
L-0.033
Sarah-0.033
ot-0.032
Token head
Token ID1182
Feature activation+2.40e-02

Pos logit contributions
rified+0.022
 drown+0.020
 Ow+0.018
 risking+0.017
 drowned+0.017
Neg logit contributions
up-0.019
 kind-0.017
 nice-0.016
 generous-0.016
 mail-0.016
Token.
Token ID13
Feature activation-3.82e-04

Pos logit contributions
rified+0.053
 drown+0.052
 Ow+0.048
 trigger+0.047
 drowned+0.046
Neg logit contributions
up-0.031
 kind-0.024
oe-0.023
Sarah-0.023
aver-0.023
Token The
Token ID383
Feature activation-2.65e-03

Pos logit contributions
up+0.000
 kind+0.000
Sarah+0.000
oe+0.000
 oblig+0.000
Neg logit contributions
rified-0.001
 drown-0.001
 trigger-0.001
 drowned-0.001
 fists-0.001
Token dog
Token ID3290
Feature activation+4.38e-02

Pos logit contributions
up+0.004
 kind+0.004
 generous+0.003
aver+0.003
 nice+0.003
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.004
 drowned-0.004
 risking-0.004
Token y
Token ID331
Feature activation+1.81e-02

Pos logit contributions
rified+0.095
 fault+0.092
 drown+0.091
 afloat+0.089
 trigger+0.088
Neg logit contributions
up-0.054
 kind-0.050
--0.044
L-0.041
 nice-0.040
Tokenel
Token ID417
Feature activation-1.01e-03

Pos logit contributions
rified+0.032
 drown+0.030
 trigger+0.027
 Ow+0.027
 screams+0.027
Neg logit contributions
up-0.031
Sarah-0.026
oe-0.026
 kind-0.025
 oblig-0.025
Tokenped
Token ID9124
Feature activation+3.91e-02

Pos logit contributions
up+0.002
ot+0.001
aver+0.001
-+0.001
 kind+0.001
Neg logit contributions
rified-0.002
 drown-0.002
 horror-0.002
lves-0.002
idents-0.002
Token and
Token ID290
Feature activation+9.67e-03

Pos logit contributions
rified+0.084
 drown+0.084
 trigger+0.076
 Ow+0.075
 tongues+0.074
Neg logit contributions
up-0.047
 kind-0.043
oe-0.038
 generous-0.038
Sarah-0.037
Token ran
Token ID4966
Feature activation+1.22e-02

Pos logit contributions
rified+0.020
 drown+0.018
 Ow+0.017
 trigger+0.017
 fault+0.016
Neg logit contributions
up-0.014
 kind-0.013
aver-0.011
 generous-0.011
Sarah-0.011
Token were
Token ID547
Feature activation-5.27e-03

Pos logit contributions
rified+0.044
 drown+0.042
 trigger+0.040
 fists+0.038
 Ow+0.037
Neg logit contributions
up-0.046
 kind-0.041
Sarah-0.040
aver-0.038
 oblig-0.038
Token in
Token ID287
Feature activation+1.40e-02

Pos logit contributions
up+0.010
Sarah+0.008
 oblig+0.008
pron+0.008
oe+0.008
Neg logit contributions
rified-0.008
 drown-0.008
 screams-0.007
 drowning-0.007
 Ow-0.007
Token the
Token ID262
Feature activation-9.60e-03

Pos logit contributions
rified+0.033
 drown+0.030
 trigger+0.029
 Ow+0.028
 drowned+0.028
Neg logit contributions
up-0.015
 kind-0.015
Sarah-0.013
 generous-0.013
 oblig-0.012
Token park
Token ID3952
Feature activation-2.75e-02

Pos logit contributions
up+0.014
 kind+0.012
Sarah+0.012
oe+0.011
 oblig+0.011
Neg logit contributions
rified-0.019
 drown-0.018
 trigger-0.017
 Ow-0.017
 screams-0.016
Token playing
Token ID2712
Feature activation-5.78e-03

Pos logit contributions
up+0.046
 oblig+0.038
Sarah+0.038
aver+0.037
pron+0.036
Neg logit contributions
rified-0.047
 drown-0.045
 screams-0.040
 banks-0.039
 drowned-0.039
Token when
Token ID618
Feature activation+4.19e-02

Pos logit contributions
up+0.007
Sarah+0.006
 kind+0.005
oe+0.005
 oblig+0.005
Neg logit contributions
rified-0.013
 drown-0.013
 trigger-0.012
 Ow-0.012
 screams-0.011
Token they
Token ID484
Feature activation+1.57e-02

Pos logit contributions
rified+0.077
 drown+0.073
 trigger+0.066
 Ow+0.066
 screams+0.065
Neg logit contributions
up-0.069
Sarah-0.058
 kind-0.056
oe-0.056
 oblig-0.055
Token saw
Token ID2497
Feature activation-1.98e-03

Pos logit contributions
rified+0.034
 drown+0.034
 trigger+0.032
 fists+0.032
 Ow+0.032
Neg logit contributions
up-0.019
 kind-0.015
Sarah-0.015
oe-0.015
aver-0.015
Token the
Token ID262
Feature activation-1.94e-04

Pos logit contributions
up+0.002
 kind+0.002
Sarah+0.001
ate+0.001
 nice+0.001
Neg logit contributions
 drown-0.005
rified-0.005
 trigger-0.005
 fists-0.004
 drowned-0.004
Token rocket
Token ID10701
Feature activation+5.67e-03

Pos logit contributions
up+0.000
 kind+0.000
 generous+0.000
 nice+0.000
 patient+0.000
Neg logit contributions
rified-0.000
 drown-0.000
 Ow-0.000
 tongues-0.000
ishly-0.000
Token.
Token ID13
Feature activation+4.83e-03

Pos logit contributions
rified+0.012
 drown+0.011
 Ow+0.010
 trigger+0.010
 screams+0.010
Neg logit contributions
up-0.008
Sarah-0.007
 kind-0.006
oe-0.006
 oblig-0.006
Token.
Token ID13
Feature activation-1.88e-02

Pos logit contributions
rified+0.022
 drown+0.022
 trigger+0.020
 Ow+0.020
 drowned+0.019
Neg logit contributions
up-0.014
 kind-0.011
Sarah-0.011
oe-0.011
 oblig-0.010
Token She
Token ID1375
Feature activation-9.39e-03

Pos logit contributions
up+0.018
Sarah+0.013
 kind+0.012
oe+0.012
 oblig+0.012
Neg logit contributions
rified-0.048
 drown-0.046
 trigger-0.043
 Ow-0.043
 screams-0.042
Token did
Token ID750
Feature activation-9.73e-03

Pos logit contributions
up+0.016
 kind+0.014
Sarah+0.014
aver+0.013
 generous+0.013
Neg logit contributions
rified-0.017
 drown-0.016
 trigger-0.015
 Ow-0.014
 fists-0.014
Token not
Token ID407
Feature activation-1.76e-02

Pos logit contributions
up+0.015
Sarah+0.012
 kind+0.012
oe+0.012
 oblig+0.012
Neg logit contributions
rified-0.019
 drown-0.018
 trigger-0.017
 Ow-0.017
 fists-0.016
Token know
Token ID760
Feature activation-2.14e-02

Pos logit contributions
up+0.033
oe+0.031
pron+0.030
ari+0.030
ate+0.029
Neg logit contributions
 drown-0.025
rified-0.024
 screams-0.022
 Ow-0.022
 trigger-0.021
Token it
Token ID340
Feature activation+5.14e-03

Pos logit contributions
up+0.027
 kind+0.023
Sarah+0.022
Her+0.020
 of+0.020
Neg logit contributions
rified-0.047
 drown-0.045
 trigger-0.043
 drowned-0.041
 fists-0.041
Token was
Token ID373
Feature activation+1.62e-02

Pos logit contributions
 drown+0.012
rified+0.012
 trigger+0.011
 afloat+0.011
 fists+0.011
Neg logit contributions
up-0.006
Sarah-0.005
 kind-0.005
oe-0.004
L-0.004
Token glue
Token ID22749
Feature activation-5.33e-03

Pos logit contributions
rified+0.033
 drown+0.032
 Ow+0.029
 trigger+0.029
 screams+0.028
Neg logit contributions
up-0.023
Sarah-0.019
 kind-0.018
oe-0.018
 oblig-0.018
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
up+0.007
 kind+0.005
oe+0.005
Sarah+0.005
-+0.005
Neg logit contributions
rified-0.012
 drown-0.011
 Ow-0.011
 screams-0.010
 trigger-0.010
Token She
Token ID1375
Feature activation-8.25e-03

Pos logit contributions
up+0.011
Sarah+0.008
 kind+0.008
oe+0.007
 oblig+0.007
Neg logit contributions
rified-0.029
 drown-0.028
 trigger-0.026
 Ow-0.026
 screams-0.025
Token wanted
Token ID2227
Feature activation+4.17e-03

Pos logit contributions
up+0.014
 kind+0.012
Sarah+0.012
aver+0.011
oe+0.011
Neg logit contributions
rified-0.015
 drown-0.014
 trigger-0.013
 Ow-0.013
 fists-0.012
Token she
Token ID673
Feature activation-1.64e-02

Pos logit contributions
rified+0.055
 drown+0.053
 trigger+0.049
 Ow+0.048
 screams+0.048
Neg logit contributions
up-0.029
Sarah-0.024
oe-0.023
 kind-0.023
 oblig-0.021
Token finally
Token ID3443
Feature activation-1.11e-02

Pos logit contributions
up+0.026
Sarah+0.022
aver+0.022
ate+0.021
 kind+0.021
Neg logit contributions
rified-0.030
 drown-0.029
 drowned-0.027
 Ow-0.026
 screams-0.026
Token remembered
Token ID12086
Feature activation-1.44e-02

Pos logit contributions
up+0.017
Sarah+0.014
 kind+0.014
aver+0.013
ate+0.013
Neg logit contributions
rified-0.021
 drown-0.020
 drowned-0.018
 Ow-0.018
 screams-0.018
Token,
Token ID11
Feature activation-3.15e-03

Pos logit contributions
up+0.017
Sarah+0.013
 kind+0.013
oe+0.012
 oblig+0.012
Neg logit contributions
rified-0.033
 drown-0.032
 trigger-0.030
 Ow-0.029
 screams-0.029
Token it
Token ID340
Feature activation-9.45e-03

Pos logit contributions
up+0.003
Sarah+0.002
 kind+0.002
oe+0.002
 oblig+0.002
Neg logit contributions
rified-0.008
 drown-0.008
 trigger-0.007
 Ow-0.007
 screams-0.007
Token was
Token ID373
Feature activation+4.43e-03

Pos logit contributions
up+0.015
Sarah+0.013
 kind+0.013
oe+0.012
aver+0.012
Neg logit contributions
rified-0.018
 drown-0.017
 trigger-0.015
 Ow-0.015
 fists-0.015
Token already
Token ID1541
Feature activation-1.31e-02

Pos logit contributions
rified+0.008
 drown+0.008
 Ow+0.008
 drowned+0.007
 drowning+0.007
Neg logit contributions
up-0.007
ari-0.006
 oblig-0.006
old-0.005
Sarah-0.005
Token dark
Token ID3223
Feature activation+9.83e-05

Pos logit contributions
up+0.024
old+0.022
 oblig+0.021
ari+0.020
ook+0.019
Neg logit contributions
rified-0.019
 spilling-0.018
 drown-0.017
 drowned-0.017
 slipping-0.017
Token outside
Token ID2354
Feature activation+9.74e-04

Pos logit contributions
rified+0.000
 drown+0.000
 Ow+0.000
 screams+0.000
ting+0.000
Neg logit contributions
up-0.000
 oblig-0.000
Sarah-0.000
aver-0.000
 kind-0.000
Token.
Token ID13
Feature activation+1.70e-03

Pos logit contributions
rified+0.002
 drown+0.002
 trigger+0.002
 Ow+0.002
 fists+0.002
Neg logit contributions
up-0.001
 kind-0.001
Sarah-0.001
oe-0.001
 generous-0.001
Token Lily
Token ID20037
Feature activation-5.12e-03

Pos logit contributions
rified+0.003
 Ow+0.003
 fists+0.003
 cries+0.003
 trigger+0.003
Neg logit contributions
up-0.003
 oblig-0.002
aver-0.002
oe-0.002
ate-0.002
Token OK
Token ID7477
Feature activation+4.46e-03

Pos logit contributions
up+0.015
 oblig+0.012
Sarah+0.012
ari+0.012
old+0.012
Neg logit contributions
 drown-0.010
rified-0.010
 Ow-0.010
 drowned-0.009
 fists-0.009
Token,
Token ID11
Feature activation-8.94e-03

Pos logit contributions
rified+0.006
 drown+0.006
 Ow+0.006
 screams+0.006
 cries+0.005
Neg logit contributions
up-0.009
ate-0.008
Sarah-0.008
wn-0.008
aver-0.008
Token oct
Token ID19318
Feature activation+2.51e-02

Pos logit contributions
 kind+0.013
 generous+0.012
up+0.011
 nice+0.011
 wise+0.011
Neg logit contributions
rified-0.018
 drown-0.015
 drowned-0.015
 fists-0.015
 tongues-0.015
Tokenopus
Token ID25790
Feature activation-7.79e-03

Pos logit contributions
rified+0.039
 drown+0.036
 fists+0.033
 trigger+0.033
 Ow+0.033
Neg logit contributions
up-0.049
oe-0.041
Sarah-0.041
aver-0.041
 kind-0.040
Token,"
Token ID553
Feature activation+5.57e-03

Pos logit contributions
up+0.011
Sarah+0.009
 kind+0.009
oe+0.009
 oblig+0.009
Neg logit contributions
rified-0.016
 drown-0.015
 Ow-0.014
 trigger-0.014
 screams-0.013
Token said
Token ID531
Feature activation+1.10e-02

Pos logit contributions
rified+0.011
 drown+0.010
 Ow+0.009
 trigger+0.009
 screams+0.009
Neg logit contributions
up-0.009
Sarah-0.007
oe-0.007
 kind-0.007
 oblig-0.007
Token the
Token ID262
Feature activation+4.20e-04

Pos logit contributions
rified+0.019
 drown+0.017
 trigger+0.016
 Ow+0.016
 screams+0.015
Neg logit contributions
up-0.020
Sarah-0.017
aver-0.016
 kind-0.016
oe-0.016
Token duck
Token ID22045
Feature activation+1.92e-02

Pos logit contributions
rified+0.001
 drown+0.001
 fists+0.001
 risking+0.001
 Ow+0.001
Neg logit contributions
up-0.001
 kind-0.001
 generous-0.001
aver-0.001
 wise-0.001
Token.
Token ID13
Feature activation-2.06e-02

Pos logit contributions
rified+0.034
 drown+0.032
 trigger+0.030
 drowned+0.029
 screams+0.029
Neg logit contributions
up-0.031
Sarah-0.027
 oblig-0.026
ate-0.026
oe-0.025
Token "
Token ID366
Feature activation+9.54e-03

Pos logit contributions
up+0.029
Sarah+0.023
 kind+0.023
oe+0.023
 oblig+0.022
Neg logit contributions
rified-0.043
 drown-0.041
 Ow-0.037
 trigger-0.037
 screams-0.037
TokenI
Token ID40
Feature activation-6.60e-03

Pos logit contributions
rified+0.019
 drown+0.018
 trigger+0.017
 Ow+0.017
 fists+0.017
Neg logit contributions
up-0.014
Sarah-0.012
 kind-0.011
oe-0.011
aver-0.011
Token The
Token ID383
Feature activation-6.25e-03

Pos logit contributions
up+0.008
Sarah+0.007
 kind+0.007
oe+0.006
 oblig+0.006
Neg logit contributions
rified-0.014
 drown-0.014
 trigger-0.013
 Ow-0.012
 screams-0.012
Token rain
Token ID6290
Feature activation+5.70e-04

Pos logit contributions
up+0.008
 kind+0.007
oe+0.006
Sarah+0.006
aver+0.006
Neg logit contributions
rified-0.014
 drown-0.013
 Ow-0.012
 trigger-0.012
 fists-0.012
Token didn
Token ID1422
Feature activation+3.84e-02

Pos logit contributions
rified+0.001
 drown+0.001
 Ow+0.001
 trigger+0.001
 fists+0.001
Neg logit contributions
up-0.001
oe-0.001
Sarah-0.001
 kind-0.001
aver-0.001
Token't
Token ID470
Feature activation+2.39e-03

Pos logit contributions
rified+0.063
 drown+0.059
 trigger+0.053
 Ow+0.052
 screams+0.051
Neg logit contributions
up-0.071
Sarah-0.061
 kind-0.060
oe-0.059
 oblig-0.059
Token spoil
Token ID20851
Feature activation+9.51e-03

Pos logit contributions
 drown+0.004
rified+0.004
 trigger+0.004
 drowned+0.004
 Ow+0.004
Neg logit contributions
up-0.004
Sarah-0.003
aver-0.003
ate-0.003
oe-0.003
Token their
Token ID511
Feature activation+4.74e-03

Pos logit contributions
rified+0.021
 drown+0.021
 trigger+0.019
 screams+0.019
 Ow+0.018
Neg logit contributions
up-0.012
 kind-0.010
Sarah-0.009
oe-0.009
ot-0.009
Token trip
Token ID5296
Feature activation+4.44e-03

Pos logit contributions
rified+0.011
 trigger+0.009
 drown+0.009
 Ow+0.009
ting+0.009
Neg logit contributions
up-0.006
 kind-0.006
oe-0.005
 nice-0.005
 generous-0.005
Token after
Token ID706
Feature activation+3.74e-02

Pos logit contributions
rified+0.009
 drown+0.008
 trigger+0.008
 Ow+0.007
 screams+0.007
Neg logit contributions
up-0.007
Sarah-0.006
 kind-0.005
 oblig-0.005
aver-0.005
Token all
Token ID477
Feature activation-2.60e-02

Pos logit contributions
rified+0.070
 drown+0.065
 trigger+0.061
 Ow+0.058
 screams+0.058
Neg logit contributions
up-0.061
Sarah-0.052
oe-0.052
 kind-0.051
 oblig-0.048
Token.
Token ID13
Feature activation-7.61e-03

Pos logit contributions
up+0.031
 kind+0.024
oe+0.024
Sarah+0.024
 generous+0.021
Neg logit contributions
rified-0.061
 drown-0.057
 Ow-0.055
 trigger-0.055
 fists-0.052
Token<|endoftext|>
Token ID50256
Feature activation+2.55e-02

Pos logit contributions
up+0.011
Sarah+0.008
 oblig+0.008
 kind+0.008
oe+0.008
Neg logit contributions
rified-0.016
 drown-0.015
 Ow-0.014
 trigger-0.014
 screams-0.014
Token want
Token ID765
Feature activation+1.51e-03

Pos logit contributions
rified+0.012
 drown+0.011
 Ow+0.010
 trigger+0.010
 drowned+0.010
Neg logit contributions
up-0.008
 kind-0.007
 oblig-0.006
Sarah-0.006
 generous-0.006
Token to
Token ID284
Feature activation-2.06e-02

Pos logit contributions
rified+0.004
 drown+0.003
 Ow+0.003
 trigger+0.003
 screams+0.003
Neg logit contributions
up-0.002
Sarah-0.001
aver-0.001
 oblig-0.001
oe-0.001
Token build
Token ID1382
Feature activation-1.98e-02

Pos logit contributions
up+0.031
Sarah+0.026
aver+0.026
oe+0.025
pron+0.024
Neg logit contributions
 drown-0.039
rified-0.037
 Ow-0.035
 screams-0.034
 trigger-0.034
Token a
Token ID257
Feature activation+9.50e-03

Pos logit contributions
up+0.028
Sarah+0.023
 kind+0.022
oe+0.022
 oblig+0.021
Neg logit contributions
rified-0.041
 drown-0.039
 trigger-0.036
 Ow-0.036
 screams-0.035
Token house
Token ID2156
Feature activation-4.09e-02

Pos logit contributions
 drown+0.016
rified+0.016
 trigger+0.015
 Ow+0.014
 screams+0.014
Neg logit contributions
up-0.016
Sarah-0.013
 oblig-0.013
aver-0.013
oe-0.013
Token with
Token ID351
Feature activation+4.70e-03

Pos logit contributions
up+0.063
Sarah+0.056
 oblig+0.054
aver+0.054
Her+0.051
Neg logit contributions
 drown-0.070
rified-0.069
 screams-0.062
 trigger-0.062
 banks-0.060
Token the
Token ID262
Feature activation+6.71e-03

Pos logit contributions
rified+0.009
 drown+0.009
 trigger+0.008
 Ow+0.008
 screams+0.008
Neg logit contributions
up-0.007
 kind-0.006
Sarah-0.006
oe-0.006
aver-0.006
Token boxes
Token ID10559
Feature activation+1.67e-03

Pos logit contributions
rified+0.012
 drown+0.011
 Ow+0.010
 trigger+0.010
 screams+0.010
Neg logit contributions
up-0.012
Sarah-0.010
 kind-0.010
oe-0.009
aver-0.009
Token.
Token ID13
Feature activation+8.36e-03

Pos logit contributions
rified+0.003
 drown+0.003
 Ow+0.003
 trigger+0.003
 screams+0.003
Neg logit contributions
up-0.002
Sarah-0.002
 kind-0.002
oe-0.002
 oblig-0.002
Token
Token ID198
Feature activation+3.42e-03

Pos logit contributions
rified+0.018
 drown+0.017
 trigger+0.016
 Ow+0.016
 screams+0.016
Neg logit contributions
up-0.011
Sarah-0.009
oe-0.009
 kind-0.008
 oblig-0.008
Token
Token ID198
Feature activation-7.76e-04

Pos logit contributions
rified+0.005
 drown+0.005
 Ow+0.005
 screams+0.005
 drowned+0.005
Neg logit contributions
up-0.006
 writer-0.005
 oblig-0.005
aver-0.005
 generous-0.005
Token.
Token ID13
Feature activation-4.39e-03

Pos logit contributions
rified+0.034
 drown+0.032
 Ow+0.030
 trigger+0.029
 horror+0.028
Neg logit contributions
up-0.016
oe-0.012
 kind-0.012
Sarah-0.011
--0.011
Token He
Token ID679
Feature activation-2.68e-03

Pos logit contributions
up+0.007
 kind+0.006
 oblig+0.006
aver+0.006
Sarah+0.006
Neg logit contributions
rified-0.008
 drown-0.008
 Ow-0.007
 trigger-0.007
 screams-0.007
Token remembered
Token ID12086
Feature activation-1.32e-02

Pos logit contributions
up+0.004
Sarah+0.004
 kind+0.004
oe+0.004
 oblig+0.004
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.004
 trigger-0.004
 screams-0.004
Token that
Token ID326
Feature activation-5.79e-03

Pos logit contributions
up+0.019
 kind+0.016
Sarah+0.016
 oblig+0.015
aver+0.014
Neg logit contributions
rified-0.027
 drown-0.026
 trigger-0.024
 Ow-0.024
 screams-0.023
Token his
Token ID465
Feature activation+8.63e-03

Pos logit contributions
up+0.009
Sarah+0.008
 kind+0.008
oe+0.007
 oblig+0.007
Neg logit contributions
rified-0.011
 drown-0.010
 trigger-0.010
 Ow-0.010
 screams-0.009
Token dad
Token ID9955
Feature activation-2.18e-03

Pos logit contributions
rified+0.019
 drown+0.017
 trigger+0.016
 Ow+0.016
ting+0.015
Neg logit contributions
up-0.012
 kind-0.010
oe-0.010
 generous-0.009
Sarah-0.009
Token told
Token ID1297
Feature activation+1.29e-02

Pos logit contributions
up+0.004
Sarah+0.003
oe+0.003
 oblig+0.003
 kind+0.003
Neg logit contributions
rified-0.004
 drown-0.004
 trigger-0.003
 Ow-0.003
 screams-0.003
Token him
Token ID683
Feature activation+2.79e-03

Pos logit contributions
rified+0.018
 drown+0.016
 Ow+0.015
 trigger+0.015
 screams+0.014
Neg logit contributions
up-0.027
Sarah-0.023
oe-0.023
aver-0.023
 kind-0.023
Token that
Token ID326
Feature activation+6.88e-03

Pos logit contributions
rified+0.006
 drown+0.005
 trigger+0.005
 Ow+0.005
 screams+0.005
Neg logit contributions
up-0.004
Sarah-0.003
oe-0.003
 oblig-0.003
 kind-0.003
Token this
Token ID428
Feature activation-6.89e-03

Pos logit contributions
rified+0.013
 drown+0.012
 trigger+0.011
 Ow+0.011
 screams+0.011
Neg logit contributions
up-0.011
 kind-0.009
Sarah-0.009
 oblig-0.009
oe-0.009
Token button
Token ID4936
Feature activation+7.10e-03

Pos logit contributions
up+0.011
Sarah+0.009
 oblig+0.009
 kind+0.009
oe+0.009
Neg logit contributions
rified-0.013
 drown-0.012
 trigger-0.011
 Ow-0.011
 screams-0.011
Token
Token ID198
Feature activation-3.25e-03

Pos logit contributions
up+0.017
oe+0.013
aver+0.013
Sarah+0.013
 oblig+0.013
Neg logit contributions
rified-0.018
 drown-0.017
 Ow-0.017
 trigger-0.016
 screams-0.016
Token
Token ID198
Feature activation-2.51e-03

Pos logit contributions
up+0.006
pron+0.005
 writer+0.005
wn+0.005
aver+0.005
Neg logit contributions
 drown-0.004
 Ow-0.004
rified-0.004
 screams-0.004
 drowned-0.004
Token"
Token ID1
Feature activation-3.12e-03

Pos logit contributions
pron+0.003
up+0.003
 oblig+0.002
 visitor+0.002
 writer+0.002
Neg logit contributions
rified-0.005
 drown-0.005
 drowned-0.005
 trigger-0.005
 arrest-0.005
TokenYes
Token ID5297
Feature activation+4.85e-04

Pos logit contributions
up+0.005
aver+0.004
Sarah+0.004
ate+0.004
oe+0.004
Neg logit contributions
rified-0.006
 drown-0.006
 Ow-0.005
 trigger-0.005
 screams-0.005
Token,
Token ID11
Feature activation-4.37e-03

Pos logit contributions
rified+0.001
 drown+0.001
 trigger+0.001
 Ow+0.001
 screams+0.001
Neg logit contributions
up-0.001
Sarah-0.000
 kind-0.000
oe-0.000
 oblig-0.000
Token thank
Token ID5875
Feature activation-1.76e-02

Pos logit contributions
up+0.007
oe+0.005
Sarah+0.005
ate+0.005
 oblig+0.005
Neg logit contributions
rified-0.009
 drown-0.008
 trigger-0.008
 Ow-0.007
 screams-0.007
Token you
Token ID345
Feature activation-8.89e-03

Pos logit contributions
up+0.032
aver+0.028
Sarah+0.028
pron+0.028
ate+0.027
Neg logit contributions
rified-0.028
 drown-0.026
 Ow-0.024
 screams-0.024
 trigger-0.023
Token,"
Token ID553
Feature activation+9.14e-03

Pos logit contributions
up+0.013
 kind+0.010
Sarah+0.010
oe+0.010
 oblig+0.010
Neg logit contributions
rified-0.018
 drown-0.017
 trigger-0.016
 Ow-0.016
 fists-0.015
Token Lily
Token ID20037
Feature activation-1.91e-03

Pos logit contributions
rified+0.015
 drown+0.014
 screams+0.013
 trigger+0.013
 Ow+0.013
Neg logit contributions
up-0.016
aver-0.013
oe-0.013
pron-0.013
ate-0.013
Token and
Token ID290
Feature activation+1.20e-03

Pos logit contributions
up+0.003
Sarah+0.002
oe+0.002
 kind+0.002
 oblig+0.002
Neg logit contributions
rified-0.004
 drown-0.003
 Ow-0.003
 trigger-0.003
 screams-0.003
Token Sam
Token ID3409
Feature activation+2.21e-02

Pos logit contributions
rified+0.003
 drown+0.002
 trigger+0.002
 Ow+0.002
 drowned+0.002
Neg logit contributions
up-0.002
Sarah-0.001
 kind-0.001
 oblig-0.001
oe-0.001
Token fit
Token ID4197
Feature activation-2.35e-02

Pos logit contributions
up+0.010
oe+0.009
Sarah+0.008
ate+0.008
 oblig+0.008
Neg logit contributions
 drown-0.014
rified-0.014
 drowned-0.013
 screams-0.013
 shouldn-0.012
Token team
Token ID1074
Feature activation-9.94e-03

Pos logit contributions
up+0.034
Sarah+0.028
 kind+0.027
oe+0.027
 oblig+0.027
Neg logit contributions
rified-0.047
 drown-0.045
 Ow-0.042
 trigger-0.042
 screams-0.041
Token.
Token ID13
Feature activation-3.12e-03

Pos logit contributions
up+0.013
Sarah+0.010
 kind+0.010
oe+0.010
 oblig+0.010
Neg logit contributions
rified-0.022
 drown-0.021
 Ow-0.019
 trigger-0.019
 screams-0.019
Token They
Token ID1119
Feature activation+3.86e-03

Pos logit contributions
up+0.004
Sarah+0.003
 kind+0.003
ate+0.003
L+0.003
Neg logit contributions
rified-0.007
 drown-0.007
 trigger-0.006
 screams-0.006
 fists-0.006
Token continued
Token ID3767
Feature activation-1.35e-02

Pos logit contributions
rified+0.008
 drown+0.007
 Ow+0.007
 trigger+0.007
 fists+0.007
Neg logit contributions
up-0.006
 kind-0.005
aver-0.005
Sarah-0.005
 oblig-0.005
Token to
Token ID284
Feature activation-1.97e-03

Pos logit contributions
up+0.015
Sarah+0.012
 kind+0.011
oe+0.011
 oblig+0.011
Neg logit contributions
rified-0.032
 drown-0.031
 trigger-0.028
 Ow-0.028
 screams-0.028
Token play
Token ID711
Feature activation-8.12e-03

Pos logit contributions
up+0.002
Sarah+0.002
 kind+0.002
 oblig+0.002
oe+0.002
Neg logit contributions
rified-0.005
 drown-0.004
 trigger-0.004
 Ow-0.004
 screams-0.004
Token and
Token ID290
Feature activation-3.01e-03

Pos logit contributions
up+0.011
 kind+0.009
Sarah+0.009
 oblig+0.008
oe+0.008
Neg logit contributions
rified-0.018
 drown-0.017
 Ow-0.016
 trigger-0.016
 screams-0.015
Token have
Token ID423
Feature activation-1.02e-02

Pos logit contributions
up+0.004
 kind+0.003
Sarah+0.003
 oblig+0.003
oe+0.003
Neg logit contributions
rified-0.006
 drown-0.006
 Ow-0.006
 trigger-0.006
 screams-0.005
Token fun
Token ID1257
Feature activation-4.11e-02

Pos logit contributions
up+0.013
Sarah+0.010
oe+0.010
 oblig+0.009
 kind+0.009
Neg logit contributions
rified-0.023
 drown-0.022
 Ow-0.020
 trigger-0.020
 screams-0.020
Token in
Token ID287
Feature activation+1.40e-02

Pos logit contributions
up+0.037
 kind+0.032
-+0.029
oe+0.027
house+0.026
Neg logit contributions
rified-0.107
 drown-0.099
 trigger-0.099
 Ow-0.098
 fists-0.093
Token He
Token ID679
Feature activation+3.69e-03

Pos logit contributions
up+0.001
Sarah+0.001
 kind+0.001
 oblig+0.001
oe+0.001
Neg logit contributions
rified-0.002
 drown-0.002
 trigger-0.002
 screams-0.001
 Ow-0.001
Token was
Token ID373
Feature activation+5.78e-03

Pos logit contributions
rified+0.007
 drown+0.007
 trigger+0.007
 Ow+0.007
 fault+0.006
Neg logit contributions
up-0.005
 kind-0.005
aver-0.004
--0.004
Sarah-0.004
Token taken
Token ID2077
Feature activation-7.04e-03

Pos logit contributions
rified+0.010
 drown+0.009
 drowned+0.008
 screams+0.008
 fists+0.008
Neg logit contributions
up-0.011
Sarah-0.009
oe-0.009
 oblig-0.009
ate-0.008
Token to
Token ID284
Feature activation-5.69e-03

Pos logit contributions
up+0.011
 kind+0.009
Sarah+0.009
oe+0.009
aver+0.008
Neg logit contributions
rified-0.014
 drown-0.013
 trigger-0.012
 Ow-0.012
 screams-0.012
Token the
Token ID262
Feature activation-1.01e-02

Pos logit contributions
up+0.007
 kind+0.006
Sarah+0.006
 oblig+0.006
ate+0.006
Neg logit contributions
rified-0.013
 trigger-0.011
 drown-0.011
 tongues-0.011
 Ow-0.011
Token hospital
Token ID4436
Feature activation-2.78e-02

Pos logit contributions
up+0.015
 kind+0.014
 generous+0.013
 patient+0.012
 nice+0.012
Neg logit contributions
rified-0.020
 drown-0.019
 Ow-0.018
 tongues-0.017
 trigger-0.017
Token,
Token ID11
Feature activation+1.21e-02

Pos logit contributions
up+0.036
 kind+0.029
Sarah+0.028
oe+0.028
 oblig+0.027
Neg logit contributions
rified-0.060
 drown-0.058
 Ow-0.055
 trigger-0.055
 drowned-0.052
Token but
Token ID475
Feature activation+1.81e-02

Pos logit contributions
rified+0.024
 trigger+0.023
 Ow+0.023
 drown+0.022
 tongues+0.021
Neg logit contributions
 kind-0.016
up-0.016
Sarah-0.015
 generous-0.014
 patient-0.014
Token it
Token ID340
Feature activation-4.37e-04

Pos logit contributions
rified+0.040
 drown+0.038
 trigger+0.038
 Ow+0.037
 tongues+0.036
Neg logit contributions
up-0.020
 kind-0.020
 generous-0.018
 patient-0.017
Sarah-0.017
Token was
Token ID373
Feature activation+1.92e-02

Pos logit contributions
up+0.001
 kind+0.001
aver+0.000
Sarah+0.000
oe+0.000
Neg logit contributions
 drown-0.001
rified-0.001
 trigger-0.001
 Ow-0.001
 tongues-0.001
Token too
Token ID1165
Feature activation+1.80e-02

Pos logit contributions
rified+0.036
 drown+0.035
 Ow+0.031
 drowned+0.031
 trigger+0.031
Neg logit contributions
up-0.031
Sarah-0.025
 oblig-0.025
oe-0.024
ate-0.024
Token\n
Token ID198
Feature activation-9.32e-03

Pos logit contributions
up+0.005
Sarah+0.004
 kind+0.004
oe+0.004
 oblig+0.004
Neg logit contributions
rified-0.009
 drown-0.008
 trigger-0.008
 Ow-0.008
 screams-0.008
Token"
Token ID1
Feature activation-4.72e-03

Pos logit contributions
up+0.009
 oblig+0.007
aver+0.006
oe+0.006
ate+0.006
Neg logit contributions
rified-0.022
 drown-0.021
 screams-0.020
 Ow-0.020
 drowned-0.020
TokenMax
Token ID11518
Feature activation+1.29e-02

Pos logit contributions
up+0.008
 oblig+0.006
 kind+0.006
oe+0.006
aver+0.006
Neg logit contributions
rified-0.009
 drown-0.008
 trigger-0.008
 Ow-0.008
 screams-0.008
Token,
Token ID11
Feature activation-5.57e-04

Pos logit contributions
rified+0.029
 drown+0.028
 trigger+0.026
 Ow+0.026
 screams+0.025
Neg logit contributions
up-0.016
Sarah-0.012
 kind-0.012
oe-0.012
aver-0.011
Token you
Token ID345
Feature activation+1.63e-02

Pos logit contributions
up+0.001
 kind+0.001
Sarah+0.001
 oblig+0.001
aver+0.001
Neg logit contributions
rified-0.001
 drown-0.001
 trigger-0.001
 Ow-0.001
 screams-0.001
Token are
Token ID389
Feature activation-1.96e-03

Pos logit contributions
rified+0.029
 trigger+0.026
 drown+0.026
 tongues+0.025
der+0.025
Neg logit contributions
up-0.030
 kind-0.027
 generous-0.023
 nice-0.022
 of-0.022
Token in
Token ID287
Feature activation+1.80e-02

Pos logit contributions
up+0.003
 kind+0.003
Sarah+0.003
aver+0.002
oe+0.002
Neg logit contributions
rified-0.004
 drown-0.004
 trigger-0.003
 Ow-0.003
 screams-0.003
Token trouble
Token ID5876
Feature activation+1.34e-02

Pos logit contributions
rified+0.037
 drown+0.032
 trigger+0.032
 Ow+0.030
 tongues+0.030
Neg logit contributions
up-0.027
 kind-0.024
oe-0.022
Sarah-0.022
 generous-0.022
Token.
Token ID13
Feature activation-2.00e-02

Pos logit contributions
rified+0.029
 trigger+0.027
 drown+0.027
 Ow+0.026
 tongues+0.025
Neg logit contributions
up-0.018
 kind-0.016
oe-0.015
--0.014
Sarah-0.014
Token You
Token ID921
Feature activation+2.00e-02

Pos logit contributions
up+0.035
Sarah+0.030
 kind+0.029
aver+0.028
oe+0.028
Neg logit contributions
rified-0.034
 drown-0.032
 trigger-0.030
 drowned-0.028
 Ow-0.028
Token need
Token ID761
Feature activation-1.76e-02

Pos logit contributions
rified+0.042
 trigger+0.037
der+0.037
 tongues+0.036
 fists+0.036
Neg logit contributions
up-0.031
 kind-0.026
 of-0.023
 unlocked-0.023
 borrow-0.022
Token,
Token ID11
Feature activation-2.71e-03

Pos logit contributions
up+0.030
Sarah+0.023
 kind+0.023
oe+0.022
 oblig+0.022
Neg logit contributions
rified-0.056
 drown-0.054
 trigger-0.050
 Ow-0.050
 screams-0.049
Token there
Token ID612
Feature activation+2.89e-03

Pos logit contributions
up+0.005
Sarah+0.004
 oblig+0.004
aver+0.003
oe+0.003
Neg logit contributions
rified-0.005
 drown-0.005
 Ow-0.005
 trigger-0.004
 screams-0.004
Token was
Token ID373
Feature activation+6.26e-03

Pos logit contributions
rified+0.007
 drown+0.007
 trigger+0.007
 fists+0.006
 tongues+0.006
Neg logit contributions
up-0.003
 kind-0.003
Sarah-0.002
oe-0.002
ate-0.002
Token a
Token ID257
Feature activation+1.50e-02

Pos logit contributions
rified+0.014
 drown+0.013
 Ow+0.012
 trigger+0.012
 screams+0.012
Neg logit contributions
up-0.008
Sarah-0.006
 kind-0.006
oe-0.006
 oblig-0.006
Token little
Token ID1310
Feature activation+8.69e-03

Pos logit contributions
rified+0.024
 drown+0.024
 trigger+0.022
 Ow+0.022
 screams+0.021
Neg logit contributions
up-0.027
Sarah-0.022
 oblig-0.021
ate-0.021
old-0.021
Token girl
Token ID2576
Feature activation-4.26e-02

Pos logit contributions
rified+0.016
 drown+0.015
 trigger+0.014
 Ow+0.014
 screams+0.013
Neg logit contributions
up-0.014
Sarah-0.012
 oblig-0.012
ate-0.012
oe-0.011
Token named
Token ID3706
Feature activation-2.32e-02

Pos logit contributions
up+0.074
 oblig+0.064
lic+0.060
wn+0.060
reetings+0.060
Neg logit contributions
rified-0.062
 drown-0.061
 drowned-0.057
 horror-0.056
 cries-0.056
Token Lily
Token ID20037
Feature activation-3.06e-02

Pos logit contributions
up+0.040
aver+0.034
ate+0.034
 oblig+0.033
 writer+0.031
Neg logit contributions
rified-0.039
 Ow-0.038
 drown-0.036
 screams-0.033
 fists-0.032
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
up+0.046
Sarah+0.037
 oblig+0.037
oe+0.037
aver+0.036
Neg logit contributions
rified-0.060
 drown-0.057
 Ow-0.052
 trigger-0.052
 screams-0.051
Token She
Token ID1375
Feature activation-1.70e-02

Pos logit contributions
rified+0.021
 Ow+0.020
 drown+0.018
 horror+0.017
 tongues+0.017
Neg logit contributions
up-0.018
aver-0.015
 oblig-0.015
old-0.013
ate-0.013
Token loved
Token ID6151
Feature activation+3.91e-04

Pos logit contributions
up+0.031
 oblig+0.026
Sarah+0.025
oe+0.025
ate+0.024
Neg logit contributions
rified-0.027
 drown-0.026
 drowned-0.024
 screams-0.024
 Ow-0.024
Token!
Token ID0
Feature activation-8.52e-03

Pos logit contributions
up+0.011
Sarah+0.009
 kind+0.009
oe+0.008
 oblig+0.008
Neg logit contributions
rified-0.016
 drown-0.016
 trigger-0.015
 Ow-0.014
 screams-0.014
Token\n
Token ID198
Feature activation-6.02e-03

Pos logit contributions
up+0.009
Sarah+0.008
oe+0.007
 kind+0.007
 oblig+0.006
Neg logit contributions
rified-0.020
 drown-0.020
 trigger-0.019
 fists-0.018
 screams-0.018
Token\n
Token ID198
Feature activation-5.38e-03

Pos logit contributions
up+0.008
Sarah+0.007
 kind+0.006
oe+0.006
 oblig+0.006
Neg logit contributions
rified-0.013
 drown-0.012
 trigger-0.012
 Ow-0.011
 fists-0.011
TokenAt
Token ID2953
Feature activation-1.02e-02

Pos logit contributions
up+0.008
oe+0.006
 oblig+0.006
 kind+0.006
aver+0.006
Neg logit contributions
rified-0.011
 drown-0.010
 Ow-0.009
 trigger-0.009
 drowned-0.009
Token the
Token ID262
Feature activation-2.82e-02

Pos logit contributions
up+0.016
Sarah+0.014
 kind+0.014
oe+0.014
 oblig+0.013
Neg logit contributions
rified-0.019
 drown-0.018
 trigger-0.016
 Ow-0.016
 screams-0.016
Token end
Token ID886
Feature activation-3.63e-02

Pos logit contributions
up+0.043
Sarah+0.035
 kind+0.034
oe+0.034
 oblig+0.034
Neg logit contributions
rified-0.056
 drown-0.053
 trigger-0.049
 Ow-0.049
 screams-0.048
Token of
Token ID286
Feature activation-1.58e-02

Pos logit contributions
up+0.050
wn+0.049
cluded+0.043
ital+0.043
eh+0.042
Neg logit contributions
rified-0.062
 drown-0.062
 horror-0.061
 screams-0.058
 mouths-0.057
Token the
Token ID262
Feature activation-2.41e-02

Pos logit contributions
up+0.029
Sarah+0.024
 kind+0.023
aver+0.023
oe+0.023
Neg logit contributions
rified-0.027
 drown-0.026
 Ow-0.023
 trigger-0.023
 screams-0.023
Token day
Token ID1110
Feature activation-4.86e-03

Pos logit contributions
up+0.032
Sarah+0.024
aver+0.023
 oblig+0.023
ate+0.023
Neg logit contributions
rified-0.052
 drown-0.051
 screams-0.047
 trigger-0.046
 Ow-0.046
Token,
Token ID11
Feature activation+6.40e-03

Pos logit contributions
up+0.007
Sarah+0.005
aver+0.005
 kind+0.005
 oblig+0.005
Neg logit contributions
rified-0.010
 drown-0.010
 Ow-0.009
 screams-0.009
 trigger-0.009
Token the
Token ID262
Feature activation+1.60e-03

Pos logit contributions
rified+0.015
 drown+0.015
 trigger+0.014
 Ow+0.014
 screams+0.014
Neg logit contributions
 kind-0.006
 patient-0.006
 generous-0.006
 wise-0.005
 nice-0.005
Token\n
Token ID198
Feature activation-1.59e-03

Pos logit contributions
rified+0.006
 drown+0.005
 Ow+0.005
 screams+0.005
 trigger+0.005
Neg logit contributions
up-0.004
aver-0.003
 kind-0.003
Sarah-0.003
 oblig-0.003
TokenOne
Token ID3198
Feature activation-1.74e-02

Pos logit contributions
up+0.002
 oblig+0.001
oe+0.001
aver+0.001
pron+0.001
Neg logit contributions
rified-0.004
 drown-0.003
 Ow-0.003
 trigger-0.003
 screams-0.003
Token day
Token ID1110
Feature activation-1.91e-03

Pos logit contributions
up+0.030
ate+0.026
Sarah+0.026
 oblig+0.025
aid+0.025
Neg logit contributions
rified-0.028
 drown-0.025
 screams-0.023
 horror-0.022
 drowning-0.022
Token Jenna
Token ID45377
Feature activation-1.60e-02

Pos logit contributions
up+0.003
aver+0.003
aid+0.003
 oblig+0.003
ag+0.003
Neg logit contributions
rified-0.003
 trigger-0.003
 Ow-0.003
ting-0.003
 fists-0.002
Token had
Token ID550
Feature activation+1.16e-03

Pos logit contributions
up+0.024
 kind+0.018
Sarah+0.018
oe+0.017
aver+0.017
Neg logit contributions
rified-0.033
 drown-0.033
 trigger-0.031
 Ow-0.030
 fists-0.029
Token an
Token ID281
Feature activation-4.84e-02

Pos logit contributions
rified+0.002
 drown+0.002
 Ow+0.002
 drowned+0.002
 screams+0.002
Neg logit contributions
up-0.002
Sarah-0.001
aver-0.001
oe-0.001
 oblig-0.001
Token idea
Token ID2126
Feature activation-3.78e-02

Pos logit contributions
up+0.067
Sarah+0.055
oe+0.053
 oblig+0.050
aver+0.049
Neg logit contributions
rified-0.100
 drown-0.095
 screams-0.088
 Ow-0.088
 trigger-0.088
Token.
Token ID13
Feature activation+2.47e-03

Pos logit contributions
up+0.058
Sarah+0.048
oe+0.047
 oblig+0.047
aver+0.046
Neg logit contributions
rified-0.072
 drown-0.068
 screams-0.062
 Ow-0.062
 trigger-0.062
Token She
Token ID1375
Feature activation-2.32e-02

Pos logit contributions
rified+0.004
 fists+0.004
 Ow+0.004
der+0.004
 trigger+0.004
Neg logit contributions
aver-0.004
up-0.004
eh-0.004
pron-0.004
 oblig-0.004
Token asked
Token ID1965
Feature activation+7.02e-03

Pos logit contributions
up+0.040
Sarah+0.035
 oblig+0.035
oe+0.034
ate+0.033
Neg logit contributions
rified-0.038
 drown-0.034
 drowned-0.032
 screams-0.032
 trigger-0.031
Token her
Token ID607
Feature activation-9.11e-03

Pos logit contributions
rified+0.013
 Ow+0.012
 fists+0.012
ting+0.012
 screams+0.011
Neg logit contributions
up-0.010
aver-0.009
oe-0.009
pron-0.009
eh-0.008
Token and
Token ID290
Feature activation+2.33e-03

Pos logit contributions
up+0.007
 kind+0.006
-+0.006
oe+0.005
 generous+0.005
Neg logit contributions
 drown-0.018
rified-0.018
 Ow-0.016
 trigger-0.016
 screams-0.016
Token had
Token ID550
Feature activation-2.58e-03

Pos logit contributions
rified+0.005
 drown+0.005
 Ow+0.005
 trigger+0.004
 screams+0.004
Neg logit contributions
up-0.003
Sarah-0.003
 kind-0.003
aver-0.002
oe-0.002
Token a
Token ID257
Feature activation-1.53e-02

Pos logit contributions
up+0.004
Sarah+0.003
 kind+0.003
oe+0.003
 oblig+0.003
Neg logit contributions
rified-0.005
 drown-0.005
 trigger-0.004
 Ow-0.004
 screams-0.004
Token lot
Token ID1256
Feature activation-4.50e-02

Pos logit contributions
up+0.026
 kind+0.023
Sarah+0.023
oe+0.022
 oblig+0.021
Neg logit contributions
rified-0.027
 drown-0.025
 trigger-0.023
 Ow-0.023
 screams-0.023
Token of
Token ID286
Feature activation-3.42e-02

Pos logit contributions
pron+0.063
 oblig+0.058
daughter+0.057
 writer+0.056
very+0.055
Neg logit contributions
der-0.073
rified-0.071
 drowned-0.068
 drown-0.067
 screams-0.063
Token fun
Token ID1257
Feature activation-5.36e-02

Pos logit contributions
up+0.040
Sarah+0.031
oe+0.029
aver+0.029
 oblig+0.028
Neg logit contributions
rified-0.078
 drown-0.076
 Ow-0.071
 screams-0.070
 trigger-0.070
Token.
Token ID13
Feature activation-4.06e-03

Pos logit contributions
up+0.050
 kind+0.042
-+0.040
Sarah+0.039
ate+0.037
Neg logit contributions
rified-0.139
 drown-0.128
 trigger-0.123
 Ow-0.122
 fists-0.117
Token Tim
Token ID5045
Feature activation+2.30e-03

Pos logit contributions
up+0.005
Sarah+0.005
oe+0.004
 kind+0.004
L+0.004
Neg logit contributions
rified-0.009
 drown-0.009
 trigger-0.008
 drowned-0.008
 risking-0.008
Token was
Token ID373
Feature activation+4.58e-03

Pos logit contributions
rified+0.005
 drown+0.005
 trigger+0.004
 Ow+0.004
 fists+0.004
Neg logit contributions
up-0.003
 kind-0.003
Sarah-0.003
aver-0.003
 generous-0.003
Token happy
Token ID3772
Feature activation-4.47e-03

Pos logit contributions
rified+0.008
 drown+0.008
 drowned+0.007
 screams+0.007
 Ow+0.007
Neg logit contributions
up-0.008
Sarah-0.006
oe-0.006
 oblig-0.006
aver-0.006
Token to
Token ID284
Feature activation-4.01e-03

Pos logit contributions
up+0.006
 kind+0.005
Sarah+0.005
 generous+0.004
oe+0.004
Neg logit contributions
rified-0.010
 drown-0.009
 trigger-0.009
 Ow-0.009
 fists-0.009
Token up
Token ID510
Feature activation+8.92e-03

Pos logit contributions
rified+0.018
 drown+0.017
 trigger+0.015
 drowned+0.015
 tongues+0.015
Neg logit contributions
up-0.012
ate-0.010
oe-0.010
 kind-0.009
Sarah-0.009
Token,
Token ID11
Feature activation+1.50e-03

Pos logit contributions
rified+0.021
 drown+0.020
 trigger+0.019
 Ow+0.018
 screams+0.018
Neg logit contributions
up-0.010
Sarah-0.008
 kind-0.008
oe-0.008
 oblig-0.008
Token it
Token ID340
Feature activation-8.75e-03

Pos logit contributions
 drown+0.003
rified+0.003
 trigger+0.003
 tongues+0.003
 drowned+0.003
Neg logit contributions
 kind-0.002
up-0.002
 patient-0.001
Sarah-0.001
 generous-0.001
Token heard
Token ID2982
Feature activation+1.92e-02

Pos logit contributions
up+0.014
Sarah+0.012
 kind+0.012
oe+0.011
aver+0.011
Neg logit contributions
rified-0.016
 drown-0.015
 trigger-0.014
 Ow-0.014
 fists-0.014
Token a
Token ID257
Feature activation+6.06e-03

Pos logit contributions
rified+0.041
 drown+0.039
 Ow+0.037
 screams+0.036
 trigger+0.036
Neg logit contributions
up-0.026
Sarah-0.021
oe-0.020
 oblig-0.020
aver-0.020
Token voice
Token ID3809
Feature activation-3.96e-02

Pos logit contributions
rified+0.011
 drown+0.011
 screams+0.010
 trigger+0.010
 Ow+0.010
Neg logit contributions
up-0.010
Sarah-0.008
oe-0.008
aver-0.008
 oblig-0.008
Token and
Token ID290
Feature activation+7.16e-03

Pos logit contributions
up+0.068
Sarah+0.059
oe+0.058
pron+0.057
ate+0.056
Neg logit contributions
rified-0.067
 drown-0.064
 screams-0.060
 trigger-0.056
ting-0.056
Token asked
Token ID1965
Feature activation+1.86e-02

Pos logit contributions
rified+0.013
 drown+0.012
 screams+0.011
 Ow+0.011
 trigger+0.011
Neg logit contributions
up-0.012
Sarah-0.010
oe-0.010
 oblig-0.009
aver-0.009
Token,
Token ID11
Feature activation+1.09e-02

Pos logit contributions
rified+0.037
 drown+0.034
 Ow+0.032
 trigger+0.031
 screams+0.031
Neg logit contributions
up-0.029
Sarah-0.023
aver-0.023
oe-0.023
 oblig-0.022
Token "
Token ID366
Feature activation+1.59e-02

Pos logit contributions
rified+0.020
 drown+0.019
 Ow+0.018
 trigger+0.018
 screams+0.017
Neg logit contributions
up-0.018
Sarah-0.014
oe-0.014
aver-0.014
 oblig-0.014
TokenWho
Token ID8241
Feature activation+1.90e-02

Pos logit contributions
rified+0.028
 drown+0.025
 Ow+0.023
 trigger+0.023
 screams+0.022
Neg logit contributions
up-0.028
oe-0.023
 kind-0.023
aver-0.023
Sarah-0.023