TOP ACTIVATIONS
MAX = 4.667

dad were ignorant . Ign or ant meant they did not
away from home . Ign or ance is not always bliss
others with respect . Ign or ance can lead to misunderstanding
ron and vanilla and lic or ice and star an ise
put it down . Ign or ant children like you don
kind and thoughtful . Ign or ance never leads to kindness
a mask and a sn or kel . He said they
i . B amb i ran as fast as
determined to be the vict or . But ,
for air , he cl amb ered out of the p
j olly deer named B amb i . B amb i
of goggles and a sn or kel . They said they
was trying to shoot B amb i . B
He was a true vict or . From then
is always best . Ign or ant people will always be
. B amb i knew he had to
in a trap . B amb i tried to help Th
a loyal fish named Anch or . He wanted to find
the kids had their t amb our ines and drums ,
want to help . B amb i felt sad and scared

MIN ACTIVATIONS
MIN = -5.598

her mom my watched in amaz ement as the machine flew
. He was wearing his brand new icy blue jacket as
ly . Jill nodded in amaz ement .
watched the k ite in amaz ement , and Little Joe
bird looked at her in amaz ement , and said :
. He watched them in amaz ement and even laughed a
The boy watched her in amaz ement . The
. His eyes widened in amaz ement . Suddenly
looked at the bug in amaz ement . She was so
watch the big elephant in amaz ement . The
up at the tree in amaz ement , wondering how it
friends stopped and watched in amaz ement . The dog too
's eyes spark led in amaz ement . The palace garden
stared at the hook in amaz ement . She was so
. To her amaz ement , the clothes came
as the woman watched in amaz ement . She saw the
Tim my just stared in amaz ement at the t eddy
eddy bear and watched in amaz ement as it shrunk an
little girl looked up in amaz ement and said , "
water . He gasped in amaz ement as he saw the

INTERVAL 2.101 - 4.667
CONTAINS 0.080%

Mary . She was just three years old . One day
around and saw a shiny can on the ground . It
but he was too dizz y . He tried again ,
He looked for another pe b ble and found one
on the rock with a marker . They stand

INTERVAL -0.465 - 2.101
CONTAINS 81.565%

always give her a safe feeling . <|endoftext|> Tim and Lucy
He wanted to know what she had been doing all day
Once upon a time , there was a little girl named
tired ness come over her . With a big yawn she
would spend lots of time around the rail , sitting and

INTERVAL -3.032 - -0.465
CONTAINS 18.352%

L ily shook her head . She did not want
mum and said " you 're okay ". She smiled and
. They decided not to punish the cat . Instead ,
<|endoftext|> L ily and Tom were friends . They liked to
, it 's time for bed . You need to sleep

INTERVAL -5.598 - -3.032
CONTAINS 0.004%

Ben and Lily for any injuries and gave them hugs and
the cricket , it looked brand new . Tim my felt
Her little sister watched in amaz ement . She climbed up
wings . He watched in amaz ement as it flew away
it often and watched in amaz ement whenever the door opened
Token dad
Token ID 9955
Feature activation -1.27e-01

Pos logit contributions
 gust+0.011
run+0.011
 brand+0.010
ement+0.010
 anymore+0.010
Neg logit contributions
ont-0.006
P-0.005
fl-0.005
Bel-0.005
 Gard-0.005
Token were
Token ID 547
Feature activation -7.00e-01

Pos logit contributions
run+0.006
 gust+0.006
 brand+0.005
ement+0.005
izing+0.005
Neg logit contributions
ont-0.005
 two-0.005
 gard-0.005
 moral-0.005
 pet-0.005
Token ignorant
Token ID 20109
Feature activation +8.89e-01

Pos logit contributions
run+0.032
ement+0.030
 gust+0.030
 brand+0.028
 anymore+0.027
Neg logit contributions
ont-0.029
 gard-0.027
P-0.026
Bel-0.025
 cor-0.025
Token.
Token ID 13
Feature activation -3.24e-01

Pos logit contributions
ont+0.019
 two+0.017
 moral+0.017
 next+0.016
 adult+0.016
Neg logit contributions
 gust-0.055
 brand-0.054
ement-0.053
run-0.052
oped-0.052
Token Ign
Token ID 16583
Feature activation +9.55e-01

Pos logit contributions
run+0.014
 gust+0.014
ement+0.014
 anymore+0.013
 brand+0.013
Neg logit contributions
ont-0.014
P-0.013
 gard-0.012
Bel-0.012
rev-0.012
Tokenor
Token ID 273
Feature activation +4.67e+00

Pos logit contributions
ont+0.026
P+0.022
rev+0.020
Bel+0.020
 gard+0.020
Neg logit contributions
 gust-0.056
run-0.055
ement-0.055
 brand-0.053
 anymore-0.051
Tokenant
Token ID 415
Feature activation +3.04e+00

Pos logit contributions
ont+0.175
 gard+0.173
P+0.169
Bel+0.165
 moral+0.159
Neg logit contributions
run-0.217
ement-0.209
 gust-0.200
 anymore-0.198
 brand-0.189
Token meant
Token ID 4001
Feature activation +3.54e-01

Pos logit contributions
ont+0.118
P+0.106
 gard+0.106
Bel+0.102
rev+0.101
Neg logit contributions
run-0.143
ement-0.140
 gust-0.139
 brand-0.131
 anymore-0.129
Token they
Token ID 484
Feature activation +1.15e+00

Pos logit contributions
ont+0.014
 gard+0.013
 cor+0.012
Bel+0.012
P+0.012
Neg logit contributions
ement-0.016
 gust-0.016
run-0.016
 brand-0.014
 anymore-0.014
Token did
Token ID 750
Feature activation +1.28e+00

Pos logit contributions
ont+0.047
 gard+0.042
P+0.042
Bel+0.040
rev+0.040
Neg logit contributions
ement-0.052
 gust-0.052
run-0.051
 brand-0.048
 anymore-0.045
Token not
Token ID 407
Feature activation +3.49e-01

Pos logit contributions
ont+0.047
P+0.042
 gard+0.042
Bel+0.040
rev+0.039
Neg logit contributions
run-0.063
ement-0.062
 gust-0.062
 brand-0.058
 anymore-0.057
Token away
Token ID 1497
Feature activation +1.37e+00

Pos logit contributions
run+0.032
ement+0.032
 gust+0.032
 brand+0.030
 anymore+0.028
Neg logit contributions
ont-0.025
 gard-0.022
P-0.022
Bel-0.021
rev-0.021
Token from
Token ID 422
Feature activation +6.98e-02

Pos logit contributions
ont+0.049
 gard+0.044
P+0.043
Bel+0.042
 cor+0.041
Neg logit contributions
 gust-0.067
ement-0.067
run-0.067
 brand-0.063
 anymore-0.059
Token home
Token ID 1363
Feature activation +1.23e+00

Pos logit contributions
ont+0.003
 gard+0.002
P+0.002
Bel+0.002
rev+0.002
Neg logit contributions
run-0.003
ement-0.003
 gust-0.003
 brand-0.003
 anymore-0.003
Token.
Token ID 13
Feature activation +9.56e-02

Pos logit contributions
ont+0.045
 gard+0.041
P+0.040
Bel+0.039
rev+0.038
Neg logit contributions
 gust-0.059
run-0.059
ement-0.058
 brand-0.055
 anymore-0.053
Token Ign
Token ID 16583
Feature activation +8.19e-01

Pos logit contributions
ont+0.004
P+0.004
 gard+0.004
Bel+0.004
rev+0.004
Neg logit contributions
run-0.004
ement-0.004
 gust-0.004
 anymore-0.004
 brand-0.003
Tokenor
Token ID 273
Feature activation +4.65e+00

Pos logit contributions
ont+0.039
P+0.036
Bel+0.035
 gard+0.035
rev+0.034
Neg logit contributions
 gust-0.031
run-0.030
ement-0.030
 brand-0.029
 importantly-0.026
Tokenance
Token ID 590
Feature activation +1.30e+00

Pos logit contributions
ont+0.164
 gard+0.161
P+0.159
Bel+0.150
rev+0.150
Neg logit contributions
run-0.233
ement-0.220
 gust-0.210
 anymore-0.206
 brand-0.196
Token is
Token ID 318
Feature activation +1.43e+00

Pos logit contributions
ont+0.051
 gard+0.047
P+0.046
Bel+0.046
 moral+0.045
Neg logit contributions
 gust-0.058
run-0.057
ement-0.057
 brand-0.055
 anymore-0.051
Token not
Token ID 407
Feature activation +7.14e-01

Pos logit contributions
ont+0.055
 gard+0.051
 moral+0.050
P+0.050
 adult+0.050
Neg logit contributions
ement-0.066
 gust-0.064
run-0.063
 brand-0.058
 anymore-0.058
Token always
Token ID 1464
Feature activation +1.29e+00

Pos logit contributions
ont+0.028
 gard+0.026
 cor+0.025
 adult+0.025
Bel+0.024
Neg logit contributions
ement-0.034
run-0.032
 gust-0.031
 brand-0.030
oped-0.028
Token bliss
Token ID 30533
Feature activation +1.28e-01

Pos logit contributions
ont+0.045
 gard+0.043
 moral+0.042
 two+0.042
 cor+0.041
Neg logit contributions
ement-0.066
 gust-0.061
run-0.061
oped-0.058
 brand-0.056
Token others
Token ID 1854
Feature activation +6.58e-01

Pos logit contributions
run+0.002
 gust+0.002
ement+0.002
 brand+0.002
 anymore+0.002
Neg logit contributions
ont-0.002
P-0.001
 gard-0.001
Bel-0.001
rev-0.001
Token with
Token ID 351
Feature activation +9.91e-01

Pos logit contributions
ont+0.022
P+0.019
 gard+0.019
 moral+0.019
 two+0.019
Neg logit contributions
 gust-0.035
ement-0.033
run-0.033
 brand-0.031
 anymore-0.029
Token respect
Token ID 2461
Feature activation -6.58e-02

Pos logit contributions
ont+0.033
P+0.030
 gard+0.030
Bel+0.029
rev+0.028
Neg logit contributions
run-0.051
 gust-0.050
ement-0.050
 brand-0.047
 anymore-0.046
Token.
Token ID 13
Feature activation -3.60e-01

Pos logit contributions
 gust+0.003
run+0.003
ement+0.003
 brand+0.003
oped+0.003
Neg logit contributions
ont-0.002
P-0.002
 gard-0.002
 moral-0.002
 adult-0.002
Token Ign
Token ID 16583
Feature activation +6.30e-01

Pos logit contributions
run+0.017
ement+0.015
 gust+0.015
 anymore+0.014
 brand+0.014
Neg logit contributions
ont-0.015
 gard-0.014
P-0.014
 cor-0.013
Bel-0.013
Tokenor
Token ID 273
Feature activation +4.51e+00

Pos logit contributions
ont+0.020
P+0.018
 gard+0.017
Bel+0.017
rev+0.017
Neg logit contributions
run-0.033
 gust-0.033
ement-0.033
 brand-0.031
 anymore-0.031
Tokenance
Token ID 590
Feature activation +1.07e+00

Pos logit contributions
ont+0.158
 gard+0.158
P+0.152
Bel+0.149
rev+0.143
Neg logit contributions
run-0.223
ement-0.215
 gust-0.199
 anymore-0.197
 brand-0.192
Token can
Token ID 460
Feature activation +1.96e+00

Pos logit contributions
ont+0.040
P+0.037
 gard+0.036
 moral+0.036
Bel+0.035
Neg logit contributions
 gust-0.050
run-0.049
ement-0.048
 brand-0.046
 anymore-0.044
Token lead
Token ID 1085
Feature activation -1.66e-01

Pos logit contributions
ont+0.062
 gard+0.055
P+0.054
Bel+0.052
rev+0.051
Neg logit contributions
run-0.106
ement-0.104
 gust-0.104
 brand-0.099
 anymore-0.097
Token to
Token ID 284
Feature activation +1.03e+00

Pos logit contributions
run+0.008
 anymore+0.007
 gust+0.007
ement+0.007
 brand+0.007
Neg logit contributions
ont-0.006
 gard-0.006
P-0.006
rev-0.006
 Gard-0.006
Token misunderstanding
Token ID 29789
Feature activation -1.55e-01

Pos logit contributions
ont+0.034
 gard+0.031
 moral+0.030
P+0.030
 adult+0.029
Neg logit contributions
ement-0.054
run-0.053
 gust-0.053
 brand-0.049
 anymore-0.047
Tokenron
Token ID 1313
Feature activation +1.69e+00

Pos logit contributions
ont+0.047
P+0.042
 gard+0.042
Bel+0.041
rev+0.040
Neg logit contributions
run-0.055
ement-0.054
 gust-0.054
 brand-0.051
 anymore-0.050
Token and
Token ID 290
Feature activation +2.50e-01

Pos logit contributions
ont+0.062
P+0.055
 gard+0.055
Bel+0.053
rev+0.052
Neg logit contributions
run-0.083
ement-0.081
 gust-0.080
 brand-0.076
 anymore-0.075
Token vanilla
Token ID 16858
Feature activation +5.14e-01

Pos logit contributions
ont+0.009
P+0.008
 gard+0.008
Bel+0.008
rev+0.008
Neg logit contributions
run-0.012
 gust-0.012
ement-0.012
 brand-0.011
 anymore-0.011
Token and
Token ID 290
Feature activation +5.11e-01

Pos logit contributions
ont+0.017
P+0.015
 gard+0.015
Bel+0.015
rev+0.015
Neg logit contributions
run-0.026
ement-0.026
 gust-0.026
 brand-0.025
 anymore-0.024
Token lic
Token ID 3476
Feature activation +8.83e-01

Pos logit contributions
ont+0.018
P+0.016
 gard+0.016
Bel+0.016
rev+0.015
Neg logit contributions
run-0.026
 gust-0.025
ement-0.025
 brand-0.024
 anymore-0.023
Tokenor
Token ID 273
Feature activation +4.33e+00

Pos logit contributions
ont+0.024
 gard+0.022
P+0.020
Bel+0.019
 moral+0.019
Neg logit contributions
run-0.053
ement-0.051
 gust-0.048
 anymore-0.047
 brand-0.047
Tokenice
Token ID 501
Feature activation +2.06e+00

Pos logit contributions
ont+0.165
 gard+0.152
P+0.150
Bel+0.145
rev+0.143
Neg logit contributions
run-0.208
ement-0.201
 gust-0.194
 brand-0.189
 anymore-0.185
Token and
Token ID 290
Feature activation +8.89e-01

Pos logit contributions
ont+0.075
P+0.067
 gard+0.067
Bel+0.065
rev+0.064
Neg logit contributions
run-0.101
ement-0.099
 gust-0.098
 brand-0.093
 anymore-0.092
Token star
Token ID 3491
Feature activation +1.56e+00

Pos logit contributions
ont+0.031
 gard+0.028
P+0.028
Bel+0.027
rev+0.026
Neg logit contributions
run-0.045
ement-0.044
 gust-0.044
 brand-0.041
 anymore-0.041
Token an
Token ID 281
Feature activation +1.01e-01

Pos logit contributions
ont+0.060
P+0.054
 gard+0.054
Bel+0.052
 cor+0.051
Neg logit contributions
run-0.072
 gust-0.072
ement-0.071
 brand-0.067
 anymore-0.066
Tokenise
Token ID 786
Feature activation +1.51e+00

Pos logit contributions
ont+0.005
P+0.004
 gard+0.004
Bel+0.004
rev+0.004
Neg logit contributions
run-0.004
ement-0.004
 gust-0.004
 brand-0.004
 anymore-0.004
Token put
Token ID 1234
Feature activation +3.92e-01

Pos logit contributions
ont+0.013
P+0.012
rev+0.012
fl+0.011
Bel+0.011
Neg logit contributions
run-0.013
 anymore-0.013
 gust-0.013
 badly-0.012
ement-0.012
Token it
Token ID 340
Feature activation +4.18e-02

Pos logit contributions
ont+0.019
P+0.017
 gard+0.017
Bel+0.016
rev+0.016
Neg logit contributions
run-0.015
ement-0.015
 gust-0.015
 brand-0.013
 anymore-0.013
Token down
Token ID 866
Feature activation -8.83e-01

Pos logit contributions
ont+0.002
 gard+0.002
P+0.002
rev+0.002
Bel+0.002
Neg logit contributions
run-0.002
 gust-0.002
ement-0.002
 brand-0.002
 anymore-0.002
Token.
Token ID 13
Feature activation +4.20e-02

Pos logit contributions
run+0.044
 gust+0.044
ement+0.043
 brand+0.041
 anymore+0.040
Neg logit contributions
ont-0.031
P-0.028
 gard-0.028
Bel-0.026
 moral-0.026
Token Ign
Token ID 16583
Feature activation +6.64e-01

Pos logit contributions
ont+0.002
P+0.002
 gard+0.002
Bel+0.002
 cor+0.002
Neg logit contributions
run-0.002
 gust-0.001
ement-0.001
 brand-0.001
 anymore-0.001
Tokenor
Token ID 273
Feature activation +4.32e+00

Pos logit contributions
ont+0.019
P+0.017
Bel+0.016
rev+0.016
 gard+0.015
Neg logit contributions
 gust-0.037
ement-0.037
run-0.037
 brand-0.036
 importantly-0.034
Tokenant
Token ID 415
Feature activation +2.42e+00

Pos logit contributions
ont+0.148
 gard+0.142
P+0.138
Bel+0.134
rev+0.132
Neg logit contributions
run-0.223
ement-0.210
 gust-0.204
 anymore-0.197
 brand-0.193
Token children
Token ID 1751
Feature activation +1.39e+00

Pos logit contributions
ont+0.099
 gard+0.090
P+0.090
Bel+0.087
rev+0.086
Neg logit contributions
run-0.109
ement-0.105
 gust-0.105
 anymore-0.098
 brand-0.097
Token like
Token ID 588
Feature activation +1.31e-01

Pos logit contributions
ont+0.056
 gard+0.050
P+0.049
Bel+0.048
 adult+0.048
Neg logit contributions
 gust-0.064
ement-0.063
run-0.059
 brand-0.059
 anymore-0.053
Token you
Token ID 345
Feature activation +1.49e-01

Pos logit contributions
ont+0.005
P+0.005
 gard+0.005
Bel+0.005
rev+0.005
Neg logit contributions
run-0.006
 gust-0.006
ement-0.006
 brand-0.006
 anymore-0.005
Token don
Token ID 836
Feature activation +3.05e-01

Pos logit contributions
ont+0.006
 gard+0.005
P+0.005
Bel+0.005
 moral+0.005
Neg logit contributions
ement-0.007
 gust-0.007
run-0.007
 brand-0.007
 anymore-0.006
Token kind
Token ID 1611
Feature activation -3.22e-01

Pos logit contributions
ement+0.011
 gust+0.011
run+0.011
 anymore+0.010
 brand+0.010
Neg logit contributions
ont-0.009
P-0.008
 gard-0.008
Bel-0.007
 cor-0.007
Token and
Token ID 290
Feature activation +8.71e-01

Pos logit contributions
run+0.017
ement+0.016
 gust+0.016
 anymore+0.016
 brand+0.016
Neg logit contributions
ont-0.010
P-0.010
 gard-0.009
Bel-0.009
rev-0.009
Token thoughtful
Token ID 22677
Feature activation +3.45e-01

Pos logit contributions
 gard+0.028
ont+0.027
 moral+0.026
P+0.024
 adult+0.024
Neg logit contributions
ement-0.048
 gust-0.045
run-0.045
 anymore-0.043
oped-0.041
Token.
Token ID 13
Feature activation -2.67e-01

Pos logit contributions
ont+0.012
P+0.011
 gard+0.011
Bel+0.010
rev+0.010
Neg logit contributions
run-0.018
ement-0.017
 gust-0.017
 brand-0.016
 anymore-0.016
Token Ign
Token ID 16583
Feature activation +4.68e-01

Pos logit contributions
run+0.010
ement+0.009
 gust+0.009
 brand+0.008
 importantly+0.008
Neg logit contributions
ont-0.014
P-0.012
 gard-0.012
 cor-0.012
Bel-0.012
Tokenor
Token ID 273
Feature activation +4.31e+00

Pos logit contributions
ont+0.019
P+0.017
 gard+0.016
Bel+0.016
rev+0.016
Neg logit contributions
 gust-0.021
run-0.021
ement-0.021
 brand-0.020
 anymore-0.019
Tokenance
Token ID 590
Feature activation +1.01e+00

Pos logit contributions
ont+0.160
 gard+0.151
P+0.149
Bel+0.144
rev+0.143
Neg logit contributions
run-0.208
ement-0.199
 gust-0.193
 anymore-0.186
 brand-0.181
Token never
Token ID 1239
Feature activation +5.46e-01

Pos logit contributions
ont+0.038
P+0.035
 gard+0.035
Bel+0.034
 moral+0.033
Neg logit contributions
 gust-0.046
run-0.046
ement-0.046
 brand-0.044
 anymore-0.042
Token leads
Token ID 5983
Feature activation +4.56e-01

Pos logit contributions
ont+0.018
 gard+0.016
Bel+0.015
 moral+0.015
P+0.015
Neg logit contributions
ement-0.030
run-0.028
 gust-0.028
 brand-0.027
 anymore-0.025
Token to
Token ID 284
Feature activation +9.97e-01

Pos logit contributions
ont+0.016
P+0.015
 gard+0.014
rev+0.014
Bel+0.014
Neg logit contributions
run-0.024
 gust-0.022
ement-0.022
 anymore-0.022
 brand-0.021
Token kindness
Token ID 23887
Feature activation -1.35e-01

Pos logit contributions
ont+0.035
 gard+0.032
P+0.031
Bel+0.031
 moral+0.030
Neg logit contributions
run-0.049
ement-0.049
 gust-0.049
 brand-0.046
 anymore-0.044
Token a
Token ID 257
Feature activation +5.00e-02

Pos logit contributions
ont+0.010
P+0.009
 gard+0.009
Bel+0.008
rev+0.008
Neg logit contributions
run-0.014
 gust-0.013
 brand-0.013
ement-0.013
 anymore-0.012
Token mask
Token ID 9335
Feature activation +1.66e-01

Pos logit contributions
ont+0.002
P+0.002
 gard+0.002
Bel+0.002
rev+0.002
Neg logit contributions
run-0.002
 gust-0.002
ement-0.002
 brand-0.002
 anymore-0.002
Token and
Token ID 290
Feature activation +1.68e-01

Pos logit contributions
ont+0.005
 gard+0.005
P+0.004
Bel+0.004
rev+0.004
Neg logit contributions
 gust-0.009
run-0.009
ement-0.009
 brand-0.008
 anymore-0.008
Token a
Token ID 257
Feature activation -1.26e-01

Pos logit contributions
ont+0.005
P+0.005
 gard+0.005
rev+0.005
Bel+0.005
Neg logit contributions
run-0.009
 gust-0.009
 brand-0.008
ement-0.008
 anymore-0.008
Token sn
Token ID 3013
Feature activation +2.68e-01

Pos logit contributions
run+0.006
 gust+0.006
ement+0.006
 brand+0.006
 anymore+0.006
Neg logit contributions
ont-0.005
P-0.004
 gard-0.004
Bel-0.004
rev-0.004
Tokenor
Token ID 273
Feature activation +4.19e+00

Pos logit contributions
ont+0.010
P+0.009
 gard+0.009
Bel+0.009
rev+0.009
Neg logit contributions
run-0.013
 gust-0.013
ement-0.013
 brand-0.012
 anymore-0.012
Tokenkel
Token ID 7750
Feature activation +1.65e+00

Pos logit contributions
P+0.140
 Gard+0.139
 gard+0.138
ont+0.138
rev+0.136
Neg logit contributions
run-0.217
ement-0.199
 gust-0.193
 brand-0.191
 anymore-0.190
Token.
Token ID 13
Feature activation +3.28e-01

Pos logit contributions
ont+0.056
P+0.050
 gard+0.050
Bel+0.048
rev+0.047
Neg logit contributions
run-0.086
ement-0.084
 gust-0.083
 brand-0.079
 anymore-0.078
Token He
Token ID 679
Feature activation -1.24e-01

Pos logit contributions
ont+0.013
P+0.011
 gard+0.011
Bel+0.011
rev+0.011
Neg logit contributions
run-0.015
ement-0.015
 gust-0.015
 brand-0.014
 anymore-0.014
Token said
Token ID 531
Feature activation -1.90e-01

Pos logit contributions
run+0.005
 brand+0.004
ement+0.004
 gust+0.004
 importantly+0.004
Neg logit contributions
ont-0.006
 gard-0.005
P-0.005
 Gard-0.005
 adult-0.005
Token they
Token ID 484
Feature activation +6.24e-01

Pos logit contributions
run+0.007
 gust+0.007
ement+0.006
 brand+0.006
 anymore+0.006
Neg logit contributions
ont-0.010
P-0.009
 gard-0.009
Bel-0.009
rev-0.009
Tokeni
Token ID 72
Feature activation +6.64e-01

Pos logit contributions
ont+0.158
 gard+0.156
P+0.156
Bel+0.147
 moral+0.143
Neg logit contributions
ement-0.186
run-0.184
 gust-0.167
 anymore-0.164
 brand-0.160
Token.
Token ID 13
Feature activation +3.02e-01

Pos logit contributions
ont+0.024
P+0.021
 gard+0.021
Bel+0.021
rev+0.020
Neg logit contributions
run-0.033
ement-0.032
 gust-0.032
 brand-0.030
 anymore-0.030
Token \n
Token ID 198
Feature activation +1.95e-01

Pos logit contributions
ont+0.011
P+0.009
 gard+0.009
Bel+0.009
 moral+0.009
Neg logit contributions
run-0.016
 gust-0.015
ement-0.015
 brand-0.014
 anymore-0.014
Token \n
Token ID 198
Feature activation +7.88e-02

Pos logit contributions
ont+0.007
 gard+0.006
P+0.006
Bel+0.006
 Gard+0.006
Neg logit contributions
run-0.010
ement-0.009
 gust-0.009
 brand-0.009
 anymore-0.009
TokenB
Token ID 33
Feature activation +2.17e+00

Pos logit contributions
ont+0.002
 gard+0.002
Bel+0.002
 Gard+0.002
rev+0.002
Neg logit contributions
run-0.005
 gust-0.004
 brand-0.004
ement-0.004
izing-0.004
Tokenamb
Token ID 4131
Feature activation +4.17e+00

Pos logit contributions
ont+0.059
 moral+0.055
 gard+0.054
 parent+0.047
 Gard+0.046
Neg logit contributions
run-0.135
ement-0.120
oped-0.116
 haven-0.112
 badly-0.112
Tokeni
Token ID 72
Feature activation +2.21e-01

Pos logit contributions
ont+0.179
 gard+0.174
P+0.173
 Gard+0.162
rev+0.161
Neg logit contributions
ement-0.180
run-0.172
 gust-0.152
 anymore-0.148
ited-0.146
Token ran
Token ID 4966
Feature activation -2.72e-01

Pos logit contributions
ont+0.010
 gard+0.009
P+0.009
rev+0.009
 Gard+0.009
Neg logit contributions
run-0.009
 gust-0.008
ement-0.008
oped-0.008
 brand-0.007
Token as
Token ID 355
Feature activation -1.06e+00

Pos logit contributions
 anymore+0.009
run+0.009
 gust+0.009
 brand+0.009
 badly+0.008
Neg logit contributions
ont-0.014
P-0.013
 gard-0.013
 Gard-0.013
rev-0.012
Token fast
Token ID 3049
Feature activation -1.25e-01

Pos logit contributions
run+0.041
 gust+0.040
ement+0.039
 brand+0.036
 anymore+0.036
Neg logit contributions
ont-0.051
P-0.046
 gard-0.046
Bel-0.045
rev-0.045
Token as
Token ID 355
Feature activation -9.26e-01

Pos logit contributions
ement+0.003
run+0.003
 gust+0.003
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.007
 gard-0.007
P-0.007
Bel-0.006
 cor-0.006
Token determined
Token ID 5295
Feature activation +6.73e-01

Pos logit contributions
ont+0.027
 gard+0.025
 two+0.024
P+0.023
 cor+0.023
Neg logit contributions
ement-0.034
 gust-0.032
run-0.030
 anymore-0.029
izing-0.029
Token to
Token ID 284
Feature activation +2.67e-01

Pos logit contributions
ont+0.020
P+0.017
 gard+0.017
Bel+0.016
rev+0.016
Neg logit contributions
run-0.038
ement-0.037
 gust-0.037
 anymore-0.036
 brand-0.036
Token be
Token ID 307
Feature activation -2.40e-01

Pos logit contributions
ont+0.011
P+0.010
 gard+0.010
Bel+0.010
rev+0.010
Neg logit contributions
run-0.011
ement-0.011
 gust-0.011
 brand-0.010
 anymore-0.010
Token the
Token ID 262
Feature activation +1.38e-01

Pos logit contributions
ement+0.011
run+0.011
 gust+0.011
 brand+0.010
 anymore+0.010
Neg logit contributions
ont-0.009
P-0.008
 gard-0.008
Bel-0.008
rev-0.008
Token vict
Token ID 2210
Feature activation +6.20e-01

Pos logit contributions
ont+0.006
P+0.005
rev+0.005
resp+0.005
Bel+0.005
Neg logit contributions
 gust-0.005
 anymore-0.005
 brand-0.005
ement-0.005
run-0.005
Tokenor
Token ID 273
Feature activation +4.16e+00

Pos logit contributions
ont+0.016
 gard+0.015
Bel+0.014
P+0.014
 Gard+0.014
Neg logit contributions
run-0.037
ement-0.036
 gust-0.034
 anymore-0.034
 brand-0.033
Token.
Token ID 13
Feature activation +3.38e-01

Pos logit contributions
ont+0.174
P+0.166
 gard+0.164
Bel+0.161
rev+0.155
Neg logit contributions
run-0.182
ement-0.175
 anymore-0.167
 gust-0.167
 brand-0.158
Token \n
Token ID 198
Feature activation +3.09e-01

Pos logit contributions
ont+0.011
P+0.010
 gard+0.010
Bel+0.009
rev+0.009
Neg logit contributions
run-0.018
ement-0.018
 gust-0.018
 brand-0.017
 anymore-0.016
Token \n
Token ID 198
Feature activation +1.97e-01

Pos logit contributions
ont+0.009
P+0.008
 gard+0.008
Bel+0.008
rev+0.008
Neg logit contributions
run-0.017
ement-0.017
 gust-0.017
 brand-0.016
 anymore-0.016
TokenBut
Token ID 1537
Feature activation +2.99e-01

Pos logit contributions
ont+0.008
 gard+0.007
P+0.007
Bel+0.007
 Gard+0.006
Neg logit contributions
run-0.009
 gust-0.009
ement-0.009
 brand-0.009
 anymore-0.008
Token,
Token ID 11
Feature activation +1.32e-01

Pos logit contributions
ont+0.011
 moral+0.011
 two+0.011
P+0.010
 parent+0.010
Neg logit contributions
ement-0.014
run-0.014
 gust-0.013
 brand-0.013
istered-0.012
Token for
Token ID 329
Feature activation -6.96e-01

Pos logit contributions
run+0.025
ement+0.025
 gust+0.024
 brand+0.023
 importantly+0.022
Neg logit contributions
ont-0.018
 gard-0.016
P-0.016
Bel-0.016
 cor-0.016
Token air
Token ID 1633
Feature activation -2.49e-01

Pos logit contributions
run+0.035
ement+0.034
 gust+0.033
 brand+0.032
 anymore+0.031
Neg logit contributions
ont-0.025
P-0.022
 gard-0.022
Bel-0.021
 moral-0.021
Token,
Token ID 11
Feature activation -7.35e-04

Pos logit contributions
 gust+0.013
ement+0.013
run+0.012
 brand+0.012
 importantly+0.011
Neg logit contributions
ont-0.008
fl-0.007
 gard-0.007
 cor-0.007
Bel-0.007
Token he
Token ID 339
Feature activation -3.36e-02

Pos logit contributions
run+0.000
ement+0.000
 gust+0.000
 brand+0.000
 anymore+0.000
Neg logit contributions
ont-0.000
 gard-0.000
 two-0.000
 moral-0.000
P-0.000
Token cl
Token ID 537
Feature activation +1.09e+00

Pos logit contributions
run+0.001
 gust+0.001
ement+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.001
P-0.001
 gard-0.001
Bel-0.001
rev-0.001
Tokenamb
Token ID 4131
Feature activation +4.15e+00

Pos logit contributions
ont+0.051
P+0.049
 gard+0.047
Bel+0.047
amb+0.045
Neg logit contributions
 gust-0.041
ement-0.039
run-0.038
 brand-0.038
 anymore-0.037
Tokenered
Token ID 1068
Feature activation +3.47e-01

Pos logit contributions
ont+0.176
P+0.158
 gard+0.158
Bel+0.154
rev+0.151
Neg logit contributions
run-0.180
 gust-0.178
ement-0.175
 brand-0.167
 anymore-0.163
Token out
Token ID 503
Feature activation +6.01e-01

Pos logit contributions
ont+0.013
 gard+0.012
P+0.012
Bel+0.012
rev+0.011
Neg logit contributions
run-0.016
ement-0.016
 gust-0.016
 brand-0.015
 anymore-0.015
Token of
Token ID 286
Feature activation +3.47e-02

Pos logit contributions
ont+0.022
 gard+0.021
 cor+0.021
 moral+0.020
Bel+0.020
Neg logit contributions
ement-0.027
 gust-0.027
run-0.026
 brand-0.025
 anymore-0.024
Token the
Token ID 262
Feature activation +8.55e-01

Pos logit contributions
ont+0.001
 gard+0.001
P+0.001
 adult+0.001
 moral+0.001
Neg logit contributions
ement-0.002
run-0.002
 gust-0.002
 brand-0.002
ended-0.002
Token p
Token ID 279
Feature activation +1.20e+00

Pos logit contributions
ont+0.028
P+0.024
rev+0.024
Bel+0.024
 gard+0.023
Neg logit contributions
 gust-0.046
run-0.045
ement-0.045
 brand-0.043
 anymore-0.043
Token j
Token ID 474
Feature activation +6.33e-01

Pos logit contributions
 gust+0.014
run+0.014
ement+0.013
 brand+0.013
 anymore+0.013
Neg logit contributions
ont-0.009
P-0.007
Bel-0.007
rev-0.007
fl-0.007
Tokenolly
Token ID 5098
Feature activation +1.87e+00

Pos logit contributions
ont+0.031
 gard+0.029
Bel+0.029
rev+0.029
P+0.028
Neg logit contributions
 gust-0.022
run-0.022
ement-0.022
 anymore-0.020
 brand-0.020
Token deer
Token ID 20096
Feature activation -4.53e-01

Pos logit contributions
ont+0.068
Bel+0.065
talk+0.064
rev+0.062
fl+0.062
Neg logit contributions
 gust-0.089
 brand-0.076
run-0.074
ement-0.073
 anymore-0.072
Token named
Token ID 3706
Feature activation -4.01e-01

Pos logit contributions
run+0.022
ement+0.021
 gust+0.021
 brand+0.020
 anymore+0.020
Neg logit contributions
ont-0.017
P-0.015
 gard-0.015
Bel-0.015
rev-0.015
Token B
Token ID 347
Feature activation +1.84e+00

Pos logit contributions
run+0.013
 gust+0.013
 anymore+0.013
ement+0.012
 brand+0.012
Neg logit contributions
ont-0.021
 gard-0.020
P-0.019
 moral-0.019
Bel-0.019
Tokenamb
Token ID 4131
Feature activation +4.13e+00

Pos logit contributions
ont+0.067
 moral+0.063
 gard+0.062
 cor+0.061
P+0.059
Neg logit contributions
run-0.088
ement-0.084
 gust-0.079
 anymore-0.079
 brand-0.077
Tokeni
Token ID 72
Feature activation +1.10e+00

Pos logit contributions
ont+0.162
 gard+0.153
P+0.151
Bel+0.145
rev+0.143
Neg logit contributions
ement-0.192
run-0.185
 gust-0.175
 anymore-0.170
 brand-0.166
Token.
Token ID 13
Feature activation +1.69e-01

Pos logit contributions
ont+0.057
 gard+0.054
 adult+0.054
 moral+0.053
P+0.053
Neg logit contributions
run-0.034
 anymore-0.030
ement-0.028
 badly-0.027
 brand-0.026
Token B
Token ID 347
Feature activation +1.20e+00

Pos logit contributions
ont+0.004
P+0.003
 gard+0.003
Bel+0.003
rev+0.003
Neg logit contributions
run-0.010
ement-0.010
 gust-0.010
 brand-0.010
 anymore-0.010
Tokenamb
Token ID 4131
Feature activation +3.84e+00

Pos logit contributions
ont+0.030
 moral+0.029
 gard+0.026
 cor+0.024
P+0.024
Neg logit contributions
ement-0.070
run-0.069
 haven-0.063
izing-0.063
unch-0.063
Tokeni
Token ID 72
Feature activation +1.22e-01

Pos logit contributions
ont+0.162
P+0.158
 gard+0.157
 moral+0.147
Bel+0.146
Neg logit contributions
ement-0.170
run-0.157
 anymore-0.140
 gust-0.138
 brand-0.137
Token of
Token ID 286
Feature activation -1.02e-01

Pos logit contributions
 gust+0.030
run+0.028
ement+0.028
 brand+0.027
 anymore+0.025
Neg logit contributions
ont-0.032
 gard-0.029
P-0.029
Bel-0.028
rev-0.028
Token goggles
Token ID 42762
Feature activation +6.08e-01

Pos logit contributions
run+0.005
 gust+0.005
ement+0.005
 brand+0.005
 importantly+0.004
Neg logit contributions
ont-0.004
rev-0.003
 Gard-0.003
Bel-0.003
 gard-0.003
Token and
Token ID 290
Feature activation +1.37e-01

Pos logit contributions
ont+0.017
 gard+0.015
P+0.014
rev+0.014
Bel+0.014
Neg logit contributions
 gust-0.035
run-0.034
ement-0.034
 brand-0.032
 anymore-0.031
Token a
Token ID 257
Feature activation +6.38e-02

Pos logit contributions
ont+0.005
P+0.004
Bel+0.004
 gard+0.004
rev+0.004
Neg logit contributions
run-0.007
 gust-0.007
 brand-0.006
ement-0.006
 importantly-0.006
Token sn
Token ID 3013
Feature activation +2.03e-01

Pos logit contributions
ont+0.002
P+0.002
 gard+0.002
Bel+0.002
rev+0.002
Neg logit contributions
run-0.003
 gust-0.003
ement-0.003
 brand-0.003
 anymore-0.003
Tokenor
Token ID 273
Feature activation +4.12e+00

Pos logit contributions
ont+0.008
P+0.007
 gard+0.007
Bel+0.007
rev+0.007
Neg logit contributions
run-0.009
ement-0.009
 gust-0.009
 brand-0.009
 anymore-0.008
Tokenkel
Token ID 7750
Feature activation +1.70e+00

Pos logit contributions
 gard+0.130
 Gard+0.129
P+0.127
ont+0.126
Bel+0.124
Neg logit contributions
run-0.222
ement-0.204
 brand-0.197
 anymore-0.196
 gust-0.192
Token.
Token ID 13
Feature activation -4.45e-01

Pos logit contributions
ont+0.063
P+0.058
 gard+0.057
Bel+0.055
rev+0.054
Neg logit contributions
run-0.083
ement-0.080
 gust-0.079
 brand-0.076
 anymore-0.076
Token They
Token ID 1119
Feature activation +5.34e-03

Pos logit contributions
run+0.021
ement+0.020
 gust+0.020
 brand+0.019
 anymore+0.019
Neg logit contributions
ont-0.018
P-0.016
 gard-0.016
Bel-0.015
rev-0.015
Token said
Token ID 531
Feature activation -2.94e-01

Pos logit contributions
ont+0.000
P+0.000
 gard+0.000
Bel+0.000
 Gard+0.000
Neg logit contributions
run-0.000
 gust-0.000
ement-0.000
 brand-0.000
 anymore-0.000
Token they
Token ID 484
Feature activation +5.55e-01

Pos logit contributions
run+0.011
 gust+0.011
ement+0.011
 brand+0.010
 anymore+0.010
Neg logit contributions
ont-0.014
P-0.013
 gard-0.013
rev-0.013
Bel-0.013
Token was
Token ID 373
Feature activation +4.75e-01

Pos logit contributions
ont+0.013
P+0.012
 gard+0.012
Bel+0.012
rev+0.012
Neg logit contributions
run-0.014
ement-0.014
 gust-0.014
 brand-0.013
 anymore-0.012
Token trying
Token ID 2111
Feature activation +1.96e+00

Pos logit contributions
ont+0.016
 gard+0.014
 cor+0.014
 two+0.014
P+0.014
Neg logit contributions
ement-0.025
 gust-0.024
run-0.023
 anymore-0.022
ited-0.022
Token to
Token ID 284
Feature activation +4.30e-01

Pos logit contributions
ont+0.062
P+0.056
 gard+0.056
rev+0.053
Bel+0.051
Neg logit contributions
run-0.109
 gust-0.102
 brand-0.101
 anymore-0.101
ement-0.101
Token shoot
Token ID 2686
Feature activation +3.47e-01

Pos logit contributions
ont+0.016
P+0.014
 gard+0.014
rev+0.014
Bel+0.013
Neg logit contributions
run-0.021
 gust-0.021
ement-0.019
 brand-0.019
 anymore-0.019
Token B
Token ID 347
Feature activation +1.75e+00

Pos logit contributions
ont+0.016
 gard+0.015
 Gard+0.014
rev+0.014
P+0.014
Neg logit contributions
run-0.015
 anymore-0.013
 gust-0.013
ement-0.013
 badly-0.012
Tokenamb
Token ID4131
Feature activation+4.05e+00

Pos logit contributions
ont+0.057
 gard+0.052
 moral+0.052
P+0.050
 Gard+0.048
Neg logit contributions
run-0.094
ement-0.087
 anymore-0.085
 gust-0.083
izing-0.081
Tokeni
Token ID72
Feature activation+6.64e-01

Pos logit contributions
ont+0.158
 gard+0.156
P+0.156
Bel+0.147
 moral+0.143
Neg logit contributions
ement-0.186
run-0.184
 gust-0.167
 anymore-0.164
 brand-0.160
Token.
Token ID13
Feature activation+3.02e-01

Pos logit contributions
ont+0.024
P+0.021
 gard+0.021
Bel+0.021
rev+0.020
Neg logit contributions
run-0.033
ement-0.032
 gust-0.032
 brand-0.030
 anymore-0.030
Token
Token ID198
Feature activation+1.95e-01

Pos logit contributions
ont+0.011
P+0.009
 gard+0.009
Bel+0.009
 moral+0.009
Neg logit contributions
run-0.016
 gust-0.015
ement-0.015
 brand-0.014
 anymore-0.014
Token
Token ID198
Feature activation+7.88e-02

Pos logit contributions
ont+0.007
 gard+0.006
P+0.006
Bel+0.006
 Gard+0.006
Neg logit contributions
run-0.010
ement-0.009
 gust-0.009
 brand-0.009
 anymore-0.009
TokenB
Token ID33
Feature activation+2.17e+00

Pos logit contributions
ont+0.002
 gard+0.002
Bel+0.002
 Gard+0.002
rev+0.002
Neg logit contributions
run-0.005
 gust-0.004
 brand-0.004
ement-0.004
izing-0.004
Token He
Token ID679
Feature activation+1.48e-01

Pos logit contributions
run+0.000
ement+0.000
 gust+0.000
 brand+0.000
 anymore+0.000
Neg logit contributions
ont-0.001
P-0.001
 gard-0.001
Bel-0.000
 cor-0.000
Token was
Token ID373
Feature activation-1.01e-01

Pos logit contributions
ont+0.006
P+0.005
 gard+0.005
Bel+0.005
rev+0.005
Neg logit contributions
run-0.007
ement-0.007
 gust-0.007
 brand-0.007
 anymore-0.006
Token a
Token ID257
Feature activation-6.37e-02

Pos logit contributions
ement+0.005
 gust+0.005
run+0.004
izing+0.004
 anymore+0.004
Neg logit contributions
ont-0.004
 two-0.004
 gard-0.004
P-0.003
 moral-0.003
Token true
Token ID2081
Feature activation+2.23e-01

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
 brand+0.002
 anymore+0.002
Neg logit contributions
ont-0.003
 gard-0.002
P-0.002
Bel-0.002
rev-0.002
Token vict
Token ID2210
Feature activation+4.48e-01

Pos logit contributions
ont+0.008
P+0.007
 gard+0.007
Bel+0.007
rev+0.007
Neg logit contributions
run-0.011
ement-0.011
 gust-0.011
 brand-0.010
 anymore-0.010
Tokenor
Token ID273
Feature activation+3.99e+00

Pos logit contributions
ont+0.010
P+0.009
 gard+0.008
Bel+0.008
rev+0.008
Neg logit contributions
 gust-0.028
run-0.028
ement-0.028
 brand-0.027
 anymore-0.026
Token.
Token ID13
Feature activation-5.44e-02

Pos logit contributions
ont+0.158
 gard+0.142
P+0.139
Bel+0.135
rev+0.135
Neg logit contributions
run-0.181
 gust-0.181
ement-0.177
 brand-0.170
 anymore-0.161
Token
Token ID198
Feature activation+1.53e-01

Pos logit contributions
run+0.002
ement+0.002
 gust+0.002
 brand+0.002
 anymore+0.002
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
Bel-0.002
 cor-0.002
Token
Token ID198
Feature activation+1.14e-01

Pos logit contributions
ont+0.005
 gard+0.005
P+0.005
Bel+0.005
 Gard+0.004
Neg logit contributions
run-0.008
ement-0.008
 gust-0.008
 brand-0.007
 anymore-0.007
TokenFrom
Token ID4863
Feature activation-2.43e-01

Pos logit contributions
ont+0.005
 gard+0.004
 cor+0.004
 Gard+0.004
fl+0.004
Neg logit contributions
run-0.005
ement-0.005
 brand-0.005
 gust-0.005
ended-0.004
Token then
Token ID788
Feature activation-5.02e-02

Pos logit contributions
run+0.009
 gust+0.009
ement+0.008
 brand+0.008
 anymore+0.008
Neg logit contributions
ont-0.012
P-0.011
 gard-0.011
Bel-0.011
rev-0.011
Token is
Token ID318
Feature activation+9.01e-01

Pos logit contributions
run+0.023
ement+0.023
 gust+0.022
 brand+0.021
oped+0.021
Neg logit contributions
ont-0.012
fl-0.012
 gard-0.011
 moral-0.011
Bel-0.010
Token always
Token ID1464
Feature activation+1.07e+00

Pos logit contributions
ont+0.029
 two+0.028
 gard+0.028
 moral+0.027
 next+0.026
Neg logit contributions
run-0.047
ement-0.047
 gust-0.044
oped-0.041
hole-0.041
Token best
Token ID1266
Feature activation+7.78e-01

Pos logit contributions
 two+0.036
 next+0.034
ont+0.034
 gard+0.033
P+0.032
Neg logit contributions
ement-0.056
run-0.054
 gust-0.052
oped-0.051
unch-0.051
Token.
Token ID13
Feature activation+8.89e-01

Pos logit contributions
ont+0.024
 gard+0.021
P+0.021
Bel+0.020
rev+0.019
Neg logit contributions
run-0.043
ement-0.042
 gust-0.042
 brand-0.040
 anymore-0.040
Token Ign
Token ID16583
Feature activation+8.42e-01

Pos logit contributions
ont+0.033
P+0.030
 gard+0.030
Bel+0.028
rev+0.028
Neg logit contributions
run-0.043
 gust-0.042
ement-0.042
 brand-0.040
 anymore-0.039
Tokenor
Token ID273
Feature activation+3.98e+00

Pos logit contributions
ont+0.032
P+0.029
Bel+0.027
 gard+0.027
rev+0.027
Neg logit contributions
 gust-0.040
ement-0.039
run-0.039
 brand-0.038
 anymore-0.036
Tokenant
Token ID415
Feature activation+2.01e+00

Pos logit contributions
ont+0.144
 gard+0.138
P+0.135
Bel+0.130
rev+0.129
Neg logit contributions
run-0.193
ement-0.187
 gust-0.182
 anymore-0.174
 brand-0.171
Token people
Token ID661
Feature activation+8.44e-01

Pos logit contributions
ont+0.078
P+0.070
 gard+0.070
Bel+0.068
rev+0.067
Neg logit contributions
run-0.094
ement-0.092
 gust-0.092
 brand-0.086
 anymore-0.085
Token will
Token ID481
Feature activation+1.64e+00

Pos logit contributions
ont+0.034
 gard+0.031
Bel+0.031
P+0.030
 moral+0.029
Neg logit contributions
 gust-0.037
ement-0.037
run-0.037
 brand-0.035
 anymore-0.032
Token always
Token ID1464
Feature activation+2.79e-01

Pos logit contributions
ont+0.057
P+0.051
 gard+0.049
rev+0.049
Bel+0.048
Neg logit contributions
run-0.082
 gust-0.081
ement-0.080
 brand-0.077
 anymore-0.077
Token be
Token ID307
Feature activation-4.01e-02

Pos logit contributions
ont+0.007
 gard+0.007
Bel+0.006
 moral+0.006
P+0.006
Neg logit contributions
ement-0.016
run-0.016
 gust-0.016
 brand-0.015
 anymore-0.014
Token.
Token ID13
Feature activation+6.15e-01

Pos logit contributions
ont+0.004
rev+0.004
 gard+0.003
 moral+0.003
P+0.003
Neg logit contributions
 gust-0.008
ement-0.008
run-0.007
 brand-0.007
 importantly-0.007
Token
Token ID220
Feature activation-7.21e-01

Pos logit contributions
ont+0.026
 gard+0.023
P+0.023
Bel+0.022
 moral+0.022
Neg logit contributions
run-0.028
 gust-0.027
ement-0.027
 brand-0.026
 anymore-0.025
Token
Token ID198
Feature activation+4.25e-01

Pos logit contributions
 gust+0.044
ement+0.043
run+0.042
 brand+0.041
 anymore+0.040
Neg logit contributions
ont-0.017
P-0.015
rev-0.014
Bel-0.014
 gard-0.013
Token
Token ID198
Feature activation+2.56e-01

Pos logit contributions
ont+0.015
 gard+0.014
P+0.013
Bel+0.013
 Gard+0.013
Neg logit contributions
run-0.022
ement-0.020
 gust-0.020
 brand-0.019
 anymore-0.019
TokenB
Token ID33
Feature activation+2.19e+00

Pos logit contributions
ont+0.009
 gard+0.008
 Gard+0.007
 cor+0.007
rev+0.007
Neg logit contributions
run-0.014
 brand-0.013
 gust-0.013
 badly-0.013
izing-0.012
Tokenamb
Token ID4131
Feature activation+3.94e+00

Pos logit contributions
ont+0.064
 moral+0.061
 gard+0.060
 parent+0.052
 Gard+0.052
Neg logit contributions
run-0.133
ement-0.115
oped-0.114
 haven-0.109
unch-0.109
Tokeni
Token ID72
Feature activation+1.33e-01

Pos logit contributions
ont+0.162
 gard+0.157
P+0.157
Bel+0.147
 Gard+0.146
Neg logit contributions
ement-0.173
run-0.170
 gust-0.153
 anymore-0.147
 brand-0.145
Token knew
Token ID2993
Feature activation+2.20e-01

Pos logit contributions
ont+0.006
 gard+0.005
rev+0.005
P+0.005
 Gard+0.005
Neg logit contributions
run-0.005
 gust-0.005
ement-0.005
oped-0.005
 brand-0.005
Token he
Token ID339
Feature activation-3.84e-01

Pos logit contributions
ont+0.012
P+0.011
 gard+0.011
Bel+0.011
rev+0.011
Neg logit contributions
run-0.007
 gust-0.007
ement-0.007
 brand-0.006
 anymore-0.006
Token had
Token ID550
Feature activation-5.01e-02

Pos logit contributions
run+0.017
ement+0.017
 gust+0.017
 brand+0.016
 anymore+0.015
Neg logit contributions
ont-0.016
P-0.014
 gard-0.014
rev-0.014
 moral-0.014
Token to
Token ID284
Feature activation-5.67e-02

Pos logit contributions
run+0.003
 gust+0.003
 brand+0.003
ement+0.003
 anymore+0.003
Neg logit contributions
ont-0.002
 gard-0.001
P-0.001
Bel-0.001
rev-0.001
Token in
Token ID287
Feature activation+8.35e-01

Pos logit contributions
ement+0.011
 gust+0.010
run+0.010
 brand+0.010
ended+0.010
Neg logit contributions
ont-0.006
P-0.005
 gard-0.005
 two-0.005
Bel-0.005
Token a
Token ID257
Feature activation-4.02e-01

Pos logit contributions
ont+0.024
P+0.021
 gard+0.021
Bel+0.020
rev+0.019
Neg logit contributions
run-0.047
ement-0.047
 gust-0.046
 brand-0.044
 anymore-0.043
Token trap
Token ID12840
Feature activation-1.42e+00

Pos logit contributions
ement+0.025
run+0.024
ended+0.022
 importantly+0.022
izing+0.021
Neg logit contributions
P-0.011
ont-0.010
 gard-0.010
 parent-0.009
Bel-0.009
Token.
Token ID13
Feature activation+4.51e-01

Pos logit contributions
ement+0.075
run+0.074
 gust+0.073
 brand+0.070
 importantly+0.066
Neg logit contributions
ont-0.047
P-0.043
 gard-0.041
Bel-0.039
rev-0.037
Token B
Token ID347
Feature activation+1.43e+00

Pos logit contributions
ont+0.020
 gard+0.017
P+0.017
Bel+0.017
 moral+0.016
Neg logit contributions
run-0.020
 gust-0.019
ement-0.019
 brand-0.018
 anymore-0.017
Tokenamb
Token ID4131
Feature activation+3.93e+00

Pos logit contributions
ont+0.037
 moral+0.036
 gard+0.034
 cor+0.031
 two+0.030
Neg logit contributions
run-0.083
ement-0.082
unch-0.076
izing-0.076
 haven-0.075
Tokeni
Token ID72
Feature activation+3.64e-01

Pos logit contributions
ont+0.158
 gard+0.154
P+0.151
Bel+0.143
 moral+0.141
Neg logit contributions
ement-0.178
run-0.176
 gust-0.159
 anymore-0.154
 brand-0.153
Token tried
Token ID3088
Feature activation+3.35e-01

Pos logit contributions
ont+0.017
rev+0.015
 gard+0.015
P+0.014
Bel+0.014
Neg logit contributions
run-0.015
 gust-0.014
ement-0.014
 brand-0.013
 anymore-0.012
Token to
Token ID284
Feature activation+3.62e-02

Pos logit contributions
ont+0.014
P+0.013
 gard+0.012
rev+0.012
Bel+0.011
Neg logit contributions
run-0.016
 anymore-0.014
 brand-0.014
 gust-0.014
 badly-0.014
Token help
Token ID1037
Feature activation+5.07e-01

Pos logit contributions
ont+0.002
rev+0.001
P+0.001
fl+0.001
Bel+0.001
Neg logit contributions
run-0.002
 gust-0.002
 brand-0.001
 anymore-0.001
ement-0.001
Token Th
Token ID536
Feature activation+8.95e-01

Pos logit contributions
ont+0.021
 gard+0.018
P+0.018
Bel+0.018
rev+0.017
Neg logit contributions
run-0.023
 gust-0.023
ement-0.022
 brand-0.021
 anymore-0.021
Token a
Token ID257
Feature activation-2.36e-01

Pos logit contributions
ont+0.014
 gard+0.013
rev+0.012
 adult+0.012
 Gard+0.012
Neg logit contributions
 gust-0.020
run-0.019
ement-0.019
 brand-0.019
 anymore-0.018
Token loyal
Token ID9112
Feature activation+4.44e-01

Pos logit contributions
 gust+0.013
run+0.012
ement+0.012
 brand+0.012
 anymore+0.011
Neg logit contributions
ont-0.008
rev-0.007
Bel-0.007
P-0.007
fl-0.006
Token fish
Token ID5916
Feature activation+1.38e+00

Pos logit contributions
ont+0.017
P+0.015
Bel+0.015
rev+0.014
 moral+0.014
Neg logit contributions
 gust-0.022
run-0.019
 anymore-0.019
 brand-0.018
ement-0.018
Token named
Token ID3706
Feature activation-5.18e-01

Pos logit contributions
ont+0.057
 gard+0.053
rev+0.053
Bel+0.052
 adult+0.051
Neg logit contributions
run-0.058
ement-0.054
 anymore-0.054
 gust-0.053
 brand-0.051
Token Anch
Token ID29253
Feature activation+1.61e+00

Pos logit contributions
run+0.021
 gust+0.021
 anymore+0.020
 brand+0.020
ement+0.019
Neg logit contributions
ont-0.022
 gard-0.022
P-0.021
Bel-0.021
 moral-0.020
Tokenor
Token ID273
Feature activation+3.86e+00

Pos logit contributions
 gard+0.052
ont+0.045
 Gard+0.045
Bel+0.044
P+0.043
Neg logit contributions
run-0.089
ement-0.087
 anymore-0.079
 gust-0.077
hole-0.076
Token.
Token ID13
Feature activation+4.67e-01

Pos logit contributions
Bel+0.185
 gard+0.184
P+0.178
 Gard+0.176
 moral+0.173
Neg logit contributions
run-0.128
 anymore-0.126
ement-0.119
 badly-0.111
oped-0.099
Token He
Token ID679
Feature activation-1.56e-01

Pos logit contributions
ont+0.013
 gard+0.012
P+0.012
Bel+0.011
 moral+0.010
Neg logit contributions
run-0.027
ement-0.026
 gust-0.026
 brand-0.025
 anymore-0.024
Token wanted
Token ID2227
Feature activation+6.86e-01

Pos logit contributions
 brand+0.005
run+0.005
 gust+0.004
 anymore+0.004
ement+0.004
Neg logit contributions
ont-0.008
 gard-0.008
 adult-0.008
 Gard-0.007
Bel-0.007
Token to
Token ID284
Feature activation-7.18e-02

Pos logit contributions
P+0.034
ont+0.034
 Gard+0.034
rev+0.033
 gard+0.032
Neg logit contributions
run-0.021
 anymore-0.020
 brand-0.020
 gust-0.019
 yet-0.017
Token find
Token ID1064
Feature activation+5.03e-01

Pos logit contributions
 gust+0.003
run+0.003
 brand+0.002
 anymore+0.002
 haven+0.002
Neg logit contributions
ont-0.003
rev-0.003
fl-0.003
P-0.003
Bel-0.003
Token the
Token ID262
Feature activation+2.03e-01

Pos logit contributions
run+0.053
ement+0.053
 gust+0.052
 brand+0.050
 anymore+0.048
Neg logit contributions
ont-0.033
P-0.030
 gard-0.029
Bel-0.028
 cor-0.028
Token kids
Token ID3988
Feature activation-9.25e-03

Pos logit contributions
ont+0.008
 gard+0.007
Bel+0.007
rev+0.007
P+0.007
Neg logit contributions
 gust-0.010
ement-0.010
run-0.009
 brand-0.009
 anymore-0.009
Token had
Token ID550
Feature activation-1.60e-01

Pos logit contributions
 gust+0.000
ement+0.000
run+0.000
 brand+0.000
izing+0.000
Neg logit contributions
ont-0.000
 gard-0.000
P-0.000
Bel-0.000
 cor-0.000
Token their
Token ID511
Feature activation-6.74e-02

Pos logit contributions
ement+0.008
run+0.008
 gust+0.008
 brand+0.007
 anymore+0.007
Neg logit contributions
ont-0.006
P-0.005
 gard-0.005
Bel-0.005
rev-0.005
Token t
Token ID256
Feature activation+1.53e-02

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
Bel-0.002
rev-0.002
Tokenamb
Token ID4131
Feature activation+3.85e+00

Pos logit contributions
ont+0.001
 gard+0.000
P+0.000
Bel+0.000
rev+0.000
Neg logit contributions
run-0.001
ement-0.001
 gust-0.001
 brand-0.001
 anymore-0.001
Tokenour
Token ID454
Feature activation+1.34e+00

Pos logit contributions
ont+0.153
P+0.139
 gard+0.139
Bel+0.134
rev+0.132
Neg logit contributions
run-0.177
ement-0.173
 gust-0.172
 brand-0.162
 anymore-0.160
Tokenines
Token ID1127
Feature activation+6.46e-01

Pos logit contributions
ont+0.046
P+0.041
 gard+0.040
Bel+0.039
rev+0.038
Neg logit contributions
run-0.067
 gust-0.067
ement-0.066
 brand-0.063
 anymore-0.062
Token and
Token ID290
Feature activation+3.25e-01

Pos logit contributions
ont+0.019
P+0.017
 gard+0.016
 two+0.016
 moral+0.016
Neg logit contributions
 gust-0.034
run-0.034
ement-0.033
 brand-0.031
istered-0.031
Token drums
Token ID24730
Feature activation+2.03e-02

Pos logit contributions
ont+0.012
 gard+0.011
 two+0.010
 cor+0.010
fl+0.010
Neg logit contributions
ement-0.016
 gust-0.016
run-0.015
 anymore-0.014
izing-0.014
Token,
Token ID11
Feature activation-3.86e-01

Pos logit contributions
ont+0.001
 gard+0.001
fl+0.001
 two+0.001
Bel+0.001
Neg logit contributions
 gust-0.001
ement-0.001
run-0.001
 brand-0.001
 importantly-0.001
Token want
Token ID765
Feature activation+1.96e+00

Pos logit contributions
ont+0.058
P+0.053
 gard+0.051
rev+0.051
Bel+0.051
Neg logit contributions
run-0.065
 gust-0.064
ement-0.061
 anymore-0.060
 brand-0.059
Token to
Token ID284
Feature activation+9.04e-01

Pos logit contributions
ont+0.081
 gard+0.075
P+0.074
Bel+0.070
 Gard+0.069
Neg logit contributions
run-0.093
 anymore-0.090
 brand-0.084
 gust-0.083
ement-0.080
Token help
Token ID1037
Feature activation+9.73e-01

Pos logit contributions
ont+0.034
P+0.031
 gard+0.031
Bel+0.030
rev+0.029
Neg logit contributions
run-0.043
 gust-0.042
ement-0.042
 anymore-0.040
 brand-0.040
Token.
Token ID13
Feature activation+3.50e-01

Pos logit contributions
ont+0.038
P+0.035
 gard+0.035
Bel+0.033
rev+0.033
Neg logit contributions
run-0.045
ement-0.044
 gust-0.044
 brand-0.042
 anymore-0.041
Token B
Token ID347
Feature activation+1.35e+00

Pos logit contributions
ont+0.017
P+0.015
 gard+0.015
Bel+0.014
 moral+0.014
Neg logit contributions
run-0.014
ement-0.014
 gust-0.013
 brand-0.013
 anymore-0.012
Tokenamb
Token ID4131
Feature activation+3.84e+00

Pos logit contributions
ont+0.041
 moral+0.036
 gard+0.036
P+0.033
 phone+0.032
Neg logit contributions
run-0.072
ement-0.071
unch-0.066
izing-0.065
ited-0.063
Tokeni
Token ID72
Feature activation+1.36e-01

Pos logit contributions
ont+0.174
 gard+0.171
P+0.170
Bel+0.159
 Gard+0.158
Neg logit contributions
ement-0.156
run-0.149
ited-0.125
izing-0.124
ittered-0.121
Token felt
Token ID2936
Feature activation-2.68e-01

Pos logit contributions
ont+0.006
 gard+0.005
rev+0.005
P+0.005
Bel+0.005
Neg logit contributions
run-0.006
ement-0.005
 gust-0.005
 anymore-0.005
 badly-0.005
Token sad
Token ID6507
Feature activation-5.55e-01

Pos logit contributions
run+0.011
ement+0.011
 gust+0.010
hole+0.010
 brand+0.009
Neg logit contributions
ont-0.011
 moral-0.011
 gard-0.011
P-0.011
Bel-0.011
Token and
Token ID290
Feature activation+6.23e-01

Pos logit contributions
run+0.034
ement+0.034
 gust+0.033
ifying+0.032
ited+0.032
Neg logit contributions
ont-0.012
 moral-0.011
 gard-0.011
 two-0.010
 adult-0.010
Token scared
Token ID12008
Feature activation-3.01e-01

Pos logit contributions
ont+0.022
 gard+0.021
P+0.021
 moral+0.020
Bel+0.020
Neg logit contributions
run-0.031
ement-0.031
 gust-0.029
 brand-0.028
 anymore-0.028
Token her
Token ID607
Feature activation-3.80e-01

Pos logit contributions
ont+0.021
 gard+0.019
P+0.019
Bel+0.018
 two+0.018
Neg logit contributions
ement-0.030
run-0.027
 gust-0.027
 brand-0.026
istered-0.025
Token mom
Token ID1995
Feature activation+1.02e-01

Pos logit contributions
run+0.021
ement+0.021
 gust+0.021
 brand+0.020
 anymore+0.019
Neg logit contributions
ont-0.011
 gard-0.010
P-0.009
 moral-0.009
Bel-0.009
Tokenmy
Token ID1820
Feature activation+3.68e-01

Pos logit contributions
ont+0.003
 gard+0.003
 two+0.003
P+0.003
 moral+0.003
Neg logit contributions
 gust-0.006
ement-0.006
run-0.005
 brand-0.005
izing-0.005
Token watched
Token ID7342
Feature activation-6.53e-02

Pos logit contributions
ont+0.011
 gard+0.010
P+0.010
rev+0.009
fl+0.009
Neg logit contributions
 gust-0.020
ement-0.020
run-0.020
 brand-0.018
 anymore-0.018
Token in
Token ID287
Feature activation+5.24e-01

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
 moral-0.002
 adult-0.002
Token amaz
Token ID40642
Feature activation-4.98e+00

Pos logit contributions
 two+0.013
 moral+0.012
 gard+0.011
 next+0.011
 adult+0.011
Neg logit contributions
run-0.030
 gust-0.030
ement-0.029
oped-0.029
ended-0.029
Tokenement
Token ID972
Feature activation-1.61e+00

Pos logit contributions
run+0.085
 gust+0.084
 haven+0.075
istered+0.071
 exhausted+0.069
Neg logit contributions
ont-0.337
P-0.312
 adult-0.299
 two-0.297
ent-0.295
Token as
Token ID355
Feature activation-9.75e-01

Pos logit contributions
 gust+0.078
run+0.078
ement+0.076
 brand+0.073
 anymore+0.068
Neg logit contributions
ont-0.055
 gard-0.050
fl-0.050
Bel-0.047
P-0.047
Token the
Token ID262
Feature activation+8.89e-02

Pos logit contributions
run+0.045
ement+0.045
 gust+0.045
 brand+0.041
 anymore+0.041
Neg logit contributions
ont-0.035
 gard-0.034
 two-0.032
Bel-0.032
P-0.032
Token machine
Token ID4572
Feature activation-8.98e-03

Pos logit contributions
ont+0.003
P+0.003
 gard+0.003
Bel+0.003
rev+0.003
Neg logit contributions
 gust-0.005
ement-0.004
run-0.004
 brand-0.004
 anymore-0.004
Token flew
Token ID13112
Feature activation-7.79e-01

Pos logit contributions
run+0.000
 gust+0.000
ement+0.000
 brand+0.000
 anymore+0.000
Neg logit contributions
ont-0.000
 gard-0.000
P-0.000
Bel-0.000
fl-0.000
Token.
Token ID13
Feature activation+9.07e-02

Pos logit contributions
ont+0.016
 gard+0.014
P+0.013
rev+0.013
 adult+0.013
Neg logit contributions
 gust-0.028
ement-0.027
run-0.027
 brand-0.025
izing-0.025
Token He
Token ID679
Feature activation+7.55e-02

Pos logit contributions
ont+0.003
rev+0.003
 moral+0.003
 adult+0.003
 two+0.003
Neg logit contributions
 gust-0.004
run-0.004
ement-0.004
istered-0.004
 brand-0.004
Token was
Token ID373
Feature activation-6.31e-02

Pos logit contributions
ont+0.003
P+0.003
 gard+0.003
Bel+0.003
rev+0.003
Neg logit contributions
run-0.004
 gust-0.003
ement-0.003
 brand-0.003
 anymore-0.003
Token wearing
Token ID5762
Feature activation+4.40e-01

Pos logit contributions
ement+0.003
run+0.003
 gust+0.003
 anymore+0.003
 brand+0.003
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
 two-0.002
rev-0.002
Token his
Token ID465
Feature activation-7.64e-01

Pos logit contributions
ont+0.012
P+0.010
 gard+0.010
Bel+0.009
rev+0.009
Neg logit contributions
run-0.026
ement-0.026
 gust-0.025
 anymore-0.024
 brand-0.024
Token brand
Token ID4508
Feature activation-4.73e+00

Pos logit contributions
run+0.037
ement+0.037
 gust+0.037
 brand+0.034
 anymore+0.034
Neg logit contributions
ont-0.028
P-0.025
 gard-0.025
Bel-0.024
rev-0.024
Token new
Token ID649
Feature activation-1.55e+00

Pos logit contributions
run+0.217
ement+0.214
 gust+0.210
 brand+0.203
 anymore+0.196
Neg logit contributions
ont-0.186
 gard-0.171
P-0.170
Bel-0.166
rev-0.162
Token icy
Token ID30284
Feature activation-6.49e-01

Pos logit contributions
 brand+0.073
ement+0.071
run+0.071
 gust+0.068
 anymore+0.064
Neg logit contributions
ont-0.056
Bel-0.055
 Gard-0.055
 gard-0.055
rev-0.055
Token blue
Token ID4171
Feature activation-2.11e-01

Pos logit contributions
run+0.030
 brand+0.030
 gust+0.030
ement+0.030
 anymore+0.027
Neg logit contributions
Bel-0.024
 Gard-0.022
rev-0.022
ont-0.022
 gard-0.022
Token jacket
Token ID15224
Feature activation-6.41e-01

Pos logit contributions
run+0.010
ement+0.010
 brand+0.010
 gust+0.010
 anymore+0.010
Neg logit contributions
ont-0.007
Bel-0.007
 gard-0.007
rev-0.007
 Gard-0.007
Token as
Token ID355
Feature activation-6.87e-01

Pos logit contributions
 gust+0.037
run+0.036
ement+0.035
 importantly+0.032
istered+0.032
Neg logit contributions
ont-0.019
rev-0.015
P-0.015
 two-0.015
fl-0.015
Tokenly
Token ID306
Feature activation+5.73e-01

Pos logit contributions
 gust+0.017
run+0.016
 brand+0.015
ement+0.015
 exhausted+0.015
Neg logit contributions
ont-0.013
fl-0.012
P-0.011
 cor-0.010
ent-0.010
Token.
Token ID13
Feature activation-4.01e-01

Pos logit contributions
ont+0.018
 gard+0.016
fl+0.016
 cor+0.016
 two+0.015
Neg logit contributions
 gust-0.030
run-0.030
ement-0.029
 brand-0.027
ended-0.027
Token Jill
Token ID22772
Feature activation-2.72e-01

Pos logit contributions
run+0.016
ement+0.016
 gust+0.016
 brand+0.015
 anymore+0.015
Neg logit contributions
ont-0.018
P-0.016
 gard-0.016
Bel-0.016
rev-0.016
Token nodded
Token ID14464
Feature activation-1.51e+00

Pos logit contributions
ement+0.011
run+0.011
 gust+0.011
 brand+0.010
 anymore+0.010
Neg logit contributions
ont-0.012
P-0.011
 gard-0.011
rev-0.010
 Gard-0.010
Token in
Token ID287
Feature activation+1.66e-01

Pos logit contributions
run+0.086
 gust+0.085
ement+0.083
 brand+0.079
 anymore+0.078
Neg logit contributions
ont-0.041
 gard-0.037
Bel-0.035
P-0.035
 moral-0.035
Token amaz
Token ID40642
Feature activation-4.70e+00

Pos logit contributions
ont+0.007
 gard+0.007
P+0.007
Bel+0.007
 moral+0.007
Neg logit contributions
run-0.007
ement-0.006
 gust-0.006
 brand-0.006
 anymore-0.006
Tokenement
Token ID972
Feature activation-2.22e+00

Pos logit contributions
run+0.077
 gust+0.077
 exhausted+0.063
 haven+0.060
istered+0.059
Neg logit contributions
ont-0.329
P-0.303
 adult-0.294
ent-0.291
 gard-0.289
Token.
Token ID13
Feature activation-3.81e-01

Pos logit contributions
 gust+0.116
run+0.116
ement+0.109
 brand+0.108
oped+0.101
Neg logit contributions
ont-0.067
 gard-0.061
fl-0.059
 two-0.057
Bel-0.057
Token
Token ID220
Feature activation-1.10e+00

Pos logit contributions
run+0.017
ement+0.016
 gust+0.016
 brand+0.015
 anymore+0.015
Neg logit contributions
ont-0.016
P-0.015
 gard-0.015
Bel-0.014
rev-0.014
Token
Token ID198
Feature activation+8.49e-02

Pos logit contributions
run+0.060
ement+0.059
 gust+0.059
 brand+0.056
 anymore+0.055
Neg logit contributions
ont-0.034
P-0.030
 gard-0.030
Bel-0.029
rev-0.028
Token
Token ID198
Feature activation-1.79e-02

Pos logit contributions
ont+0.004
 gard+0.003
 cor+0.003
fl+0.003
 Gard+0.003
Neg logit contributions
run-0.004
ement-0.004
 gust-0.003
 brand-0.003
 anymore-0.003
Token watched
Token ID7342
Feature activation-5.13e-02

Pos logit contributions
ont+0.009
 gard+0.009
Bel+0.009
P+0.009
 cor+0.008
Neg logit contributions
run-0.009
ement-0.009
 gust-0.009
 brand-0.009
 anymore-0.008
Token the
Token ID262
Feature activation+1.56e-01

Pos logit contributions
run+0.003
ement+0.002
istered+0.002
 gust+0.002
 brand+0.002
Neg logit contributions
 two-0.002
ont-0.002
 gard-0.002
 cor-0.002
fl-0.002
Token k
Token ID479
Feature activation-5.34e-01

Pos logit contributions
ont+0.004
 gard+0.004
 moral+0.004
P+0.004
Bel+0.003
Neg logit contributions
run-0.009
ement-0.009
 gust-0.009
 anymore-0.008
 brand-0.008
Tokenite
Token ID578
Feature activation-1.20e+00

Pos logit contributions
 brand+0.017
run+0.017
 gust+0.017
ement+0.017
 anymore+0.016
Neg logit contributions
P-0.026
Bel-0.025
 gard-0.024
fl-0.024
ont-0.023
Token in
Token ID287
Feature activation+5.15e-02

Pos logit contributions
run+0.066
ement+0.065
 gust+0.061
 importantly+0.061
istered+0.061
Neg logit contributions
fl-0.032
Bel-0.032
 gard-0.031
ont-0.030
 two-0.030
Token amaz
Token ID40642
Feature activation-4.70e+00

Pos logit contributions
 two+0.002
 moral+0.002
ont+0.002
 gard+0.002
Bel+0.002
Neg logit contributions
run-0.002
ement-0.002
ended-0.002
 gust-0.002
istered-0.002
Tokenement
Token ID972
Feature activation-1.96e+00

Pos logit contributions
run+0.092
 haven+0.080
 gust+0.079
istered+0.079
 exhausted+0.073
Neg logit contributions
ont-0.305
P-0.283
ent-0.280
 adult-0.276
 two-0.273
Token,
Token ID11
Feature activation-3.60e-01

Pos logit contributions
run+0.100
 gust+0.099
ement+0.096
 brand+0.095
istered+0.088
Neg logit contributions
ont-0.059
fl-0.056
 gard-0.053
 two-0.053
Bel-0.052
Token and
Token ID290
Feature activation-1.03e-01

Pos logit contributions
ement+0.018
run+0.018
 gust+0.017
 anymore+0.017
istered+0.016
Neg logit contributions
 gard-0.011
Bel-0.011
 cor-0.010
 two-0.010
fl-0.010
Token Little
Token ID7703
Feature activation+2.58e-01

Pos logit contributions
ement+0.005
run+0.005
 gust+0.005
 anymore+0.005
istered+0.005
Neg logit contributions
 gard-0.003
ont-0.003
fl-0.003
 two-0.003
Bel-0.003
Token Joe
Token ID5689
Feature activation+3.11e-01

Pos logit contributions
ont+0.011
 gard+0.010
P+0.010
Bel+0.009
rev+0.009
Neg logit contributions
run-0.012
 gust-0.011
ement-0.011
 brand-0.011
 anymore-0.010
Token bird
Token ID6512
Feature activation-9.83e-01

Pos logit contributions
ont+0.008
P+0.007
 gard+0.007
Bel+0.007
rev+0.006
Neg logit contributions
run-0.010
 gust-0.010
ement-0.010
 brand-0.010
 anymore-0.009
Token looked
Token ID3114
Feature activation+1.68e-01

Pos logit contributions
run+0.050
ement+0.050
 gust+0.049
 brand+0.047
 anymore+0.047
Neg logit contributions
ont-0.033
Bel-0.029
 gard-0.028
P-0.028
 cor-0.027
Token at
Token ID379
Feature activation-4.47e-01

Pos logit contributions
ont+0.005
Bel+0.005
 gard+0.005
 cor+0.005
 two+0.004
Neg logit contributions
run-0.009
ement-0.008
 gust-0.008
ended-0.008
unch-0.008
Token her
Token ID607
Feature activation-2.15e-01

Pos logit contributions
run+0.027
 gust+0.026
ement+0.025
 brand+0.025
istered+0.025
Neg logit contributions
ont-0.012
 gard-0.010
 two-0.010
Bel-0.010
 adult-0.009
Token in
Token ID287
Feature activation+4.38e-01

Pos logit contributions
run+0.012
 gust+0.011
ement+0.011
 brand+0.010
ended+0.010
Neg logit contributions
ont-0.006
 gard-0.006
Bel-0.006
 moral-0.006
fl-0.005
Token amaz
Token ID40642
Feature activation-4.66e+00

Pos logit contributions
ont+0.019
 gard+0.019
Bel+0.018
P+0.018
 cor+0.017
Neg logit contributions
run-0.018
ement-0.017
 gust-0.017
 brand-0.016
 anymore-0.016
Tokenement
Token ID972
Feature activation-1.93e+00

Pos logit contributions
run+0.089
 haven+0.087
istered+0.084
 exhausted+0.080
 gust+0.080
Neg logit contributions
ont-0.305
P-0.275
 adult-0.270
ent-0.268
 gard-0.268
Token,
Token ID11
Feature activation-4.07e-01

Pos logit contributions
run+0.111
 gust+0.109
ement+0.104
 brand+0.102
istered+0.100
Neg logit contributions
ont-0.048
fl-0.043
 gard-0.043
 two-0.041
Bel-0.041
Token and
Token ID290
Feature activation-9.01e-02

Pos logit contributions
ement+0.021
run+0.021
 gust+0.020
 anymore+0.020
istered+0.019
Neg logit contributions
 gard-0.012
Bel-0.012
ont-0.012
 moral-0.011
 cor-0.011
Token said
Token ID531
Feature activation-1.86e-01

Pos logit contributions
ement+0.004
run+0.004
 gust+0.004
 anymore+0.004
 brand+0.004
Neg logit contributions
 gard-0.003
Bel-0.003
ont-0.003
 two-0.003
 cor-0.003
Token:
Token ID25
Feature activation+5.12e-01

Pos logit contributions
run+0.008
ement+0.008
 gust+0.008
 brand+0.007
 anymore+0.007
Neg logit contributions
ont-0.008
 gard-0.007
P-0.007
Bel-0.007
 moral-0.007
Token.
Token ID13
Feature activation+1.01e+00

Pos logit contributions
 gust+0.015
ement+0.014
run+0.014
 brand+0.014
ended+0.013
Neg logit contributions
ont-0.008
 gard-0.007
fl-0.007
Bel-0.007
P-0.007
Token He
Token ID679
Feature activation+2.24e-01

Pos logit contributions
ont+0.043
P+0.039
 gard+0.039
Bel+0.037
rev+0.037
Neg logit contributions
run-0.044
 gust-0.042
ement-0.042
 brand-0.039
 anymore-0.039
Token watched
Token ID7342
Feature activation+3.86e-01

Pos logit contributions
ont+0.009
P+0.009
 gard+0.008
rev+0.008
 Gard+0.008
Neg logit contributions
run-0.010
 gust-0.010
ement-0.009
 brand-0.009
 anymore-0.009
Token them
Token ID606
Feature activation+4.71e-01

Pos logit contributions
ont+0.017
P+0.016
 gard+0.015
rev+0.015
Bel+0.015
Neg logit contributions
run-0.017
 gust-0.017
ement-0.016
 anymore-0.015
 brand-0.015
Token in
Token ID287
Feature activation+2.86e-01

Pos logit contributions
ont+0.015
 gard+0.014
Bel+0.013
P+0.013
 moral+0.013
Neg logit contributions
run-0.025
ement-0.025
 gust-0.024
 brand-0.023
 anymore-0.022
Token amaz
Token ID40642
Feature activation-4.62e+00

Pos logit contributions
 two+0.010
ont+0.010
 moral+0.010
 gard+0.010
 cor+0.010
Neg logit contributions
run-0.013
ement-0.013
ended-0.013
 gust-0.013
oped-0.012
Tokenement
Token ID972
Feature activation-1.86e+00

Pos logit contributions
 gust+0.083
run+0.079
 haven+0.074
 exhausted+0.068
istered+0.065
Neg logit contributions
ont-0.313
P-0.295
ent-0.281
 adult-0.278
 two-0.274
Token and
Token ID290
Feature activation-1.75e-01

Pos logit contributions
 gust+0.084
run+0.083
ement+0.081
 brand+0.078
 anymore+0.071
Neg logit contributions
ont-0.072
 gard-0.066
fl-0.064
Bel-0.064
 cor-0.063
Token even
Token ID772
Feature activation+3.35e-01

Pos logit contributions
ement+0.009
run+0.009
 gust+0.008
 anymore+0.008
ended+0.008
Neg logit contributions
 gard-0.006
 cor-0.006
ont-0.006
fl-0.005
Bel-0.005
Token laughed
Token ID13818
Feature activation+3.58e-02

Pos logit contributions
ont+0.011
P+0.010
Bel+0.010
 cor+0.009
 gard+0.009
Neg logit contributions
ement-0.018
run-0.017
 brand-0.017
istered-0.017
 gust-0.017
Token a
Token ID257
Feature activation+3.43e-02

Pos logit contributions
ont+0.001
 gard+0.001
Bel+0.001
 cor+0.001
fl+0.001
Neg logit contributions
run-0.002
 gust-0.002
ement-0.002
 brand-0.002
hole-0.002
Token The
Token ID383
Feature activation+1.24e-01

Pos logit contributions
ont+0.010
P+0.009
 moral+0.009
 gard+0.009
Bel+0.009
Neg logit contributions
run-0.008
 gust-0.008
ement-0.007
 brand-0.007
 anymore-0.007
Token boy
Token ID2933
Feature activation-4.59e-01

Pos logit contributions
ont+0.004
P+0.003
rev+0.003
 gard+0.003
Bel+0.003
Neg logit contributions
 gust-0.007
ement-0.007
run-0.007
 brand-0.007
 anymore-0.007
Token watched
Token ID7342
Feature activation+5.79e-02

Pos logit contributions
run+0.022
ement+0.022
 gust+0.022
 brand+0.020
 anymore+0.020
Neg logit contributions
ont-0.017
P-0.015
 gard-0.015
Bel-0.015
rev-0.015
Token her
Token ID607
Feature activation-3.39e-01

Pos logit contributions
ont+0.003
P+0.002
 gard+0.002
Bel+0.002
rev+0.002
Neg logit contributions
run-0.002
 gust-0.002
ement-0.002
 brand-0.002
 anymore-0.002
Token in
Token ID287
Feature activation+1.67e-01

Pos logit contributions
run+0.018
ement+0.018
 gust+0.018
 brand+0.017
 anymore+0.016
Neg logit contributions
ont-0.011
 gard-0.010
Bel-0.010
P-0.009
 moral-0.009
Token amaz
Token ID40642
Feature activation-4.61e+00

Pos logit contributions
 two+0.006
 gard+0.006
 moral+0.006
ont+0.006
 cor+0.006
Neg logit contributions
ement-0.008
run-0.008
ended-0.008
 brand-0.007
izing-0.007
Tokenement
Token ID972
Feature activation-2.00e+00

Pos logit contributions
run+0.070
 gust+0.058
 haven+0.055
istered+0.053
 exhausted+0.048
Neg logit contributions
ont-0.338
P-0.309
 gard-0.297
 adult-0.295
ent-0.290
Token.
Token ID13
Feature activation-1.07e-01

Pos logit contributions
run+0.106
 gust+0.105
ement+0.102
 brand+0.100
istered+0.093
Neg logit contributions
ont-0.061
 gard-0.052
fl-0.052
Bel-0.050
 two-0.049
Token\n
Token ID198
Feature activation+2.04e-01

Pos logit contributions
run+0.004
 gust+0.003
ement+0.003
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.006
P-0.006
 moral-0.005
Bel-0.005
 gard-0.005
Token\n
Token ID198
Feature activation+1.24e-01

Pos logit contributions
ont+0.008
 gard+0.007
P+0.007
rev+0.006
 Gard+0.006
Neg logit contributions
run-0.010
 gust-0.009
ement-0.009
 brand-0.009
 anymore-0.009
TokenThe
Token ID464
Feature activation+5.17e-02

Pos logit contributions
ont+0.006
rev+0.006
fl+0.005
 gard+0.005
 Gard+0.005
Neg logit contributions
 gust-0.004
run-0.004
izing-0.004
he-0.004
ended-0.004
Token.
Token ID13
Feature activation+2.27e-01

Pos logit contributions
run+0.089
istered+0.084
ended+0.081
 gust+0.081
oped+0.079
Neg logit contributions
ont-0.030
fl-0.027
 two-0.025
 next-0.024
 cor-0.024
Token His
Token ID2399
Feature activation-7.09e-01

Pos logit contributions
ont+0.011
 gard+0.010
P+0.010
Bel+0.009
 cor+0.009
Neg logit contributions
run-0.009
 gust-0.009
ement-0.009
 brand-0.008
 anymore-0.008
Token eyes
Token ID2951
Feature activation-2.27e+00

Pos logit contributions
run+0.037
 gust+0.037
ement+0.037
 brand+0.035
 anymore+0.034
Neg logit contributions
ont-0.024
P-0.022
 gard-0.021
Bel-0.020
rev-0.020
Token widened
Token ID36405
Feature activation-2.14e+00

Pos logit contributions
run+0.115
ement+0.105
 gust+0.104
 anymore+0.101
 importantly+0.098
Neg logit contributions
 two-0.070
ont-0.069
Bel-0.069
 gard-0.068
 cor-0.067
Token in
Token ID287
Feature activation+5.82e-02

Pos logit contributions
run+0.112
 gust+0.107
ement+0.105
 brand+0.098
ended+0.096
Neg logit contributions
ont-0.071
 gard-0.066
P-0.063
 cor-0.063
Bel-0.063
Token amaz
Token ID40642
Feature activation-4.59e+00

Pos logit contributions
ont+0.002
 gard+0.002
 cor+0.002
 two+0.002
 moral+0.002
Neg logit contributions
run-0.003
ement-0.002
 gust-0.002
 anymore-0.002
ended-0.002
Tokenement
Token ID972
Feature activation-2.14e+00

Pos logit contributions
run+0.085
 haven+0.070
 gust+0.068
 exhausted+0.063
istered+0.061
Neg logit contributions
ont-0.318
P-0.293
ent-0.282
 gard-0.280
 adult-0.278
Token.
Token ID13
Feature activation+1.25e-01

Pos logit contributions
run+0.101
 gust+0.099
ement+0.097
 brand+0.093
 anymore+0.087
Neg logit contributions
ont-0.079
 gard-0.073
fl-0.070
Bel-0.069
 cor-0.068
Token\n
Token ID198
Feature activation+2.31e-01

Pos logit contributions
ont+0.006
 gard+0.005
P+0.005
Bel+0.005
 cor+0.005
Neg logit contributions
run-0.005
 gust-0.005
ement-0.005
 brand-0.005
 anymore-0.005
Token\n
Token ID198
Feature activation+1.60e-01

Pos logit contributions
ont+0.009
 gard+0.008
 Gard+0.008
P+0.008
 cor+0.007
Neg logit contributions
run-0.011
 gust-0.010
ement-0.010
 brand-0.010
 anymore-0.010
TokenSuddenly
Token ID38582
Feature activation-5.12e-01

Pos logit contributions
ont+0.007
 gard+0.006
 Gard+0.006
fl+0.006
 cor+0.006
Neg logit contributions
run-0.007
 gust-0.006
 brand-0.006
ement-0.006
ended-0.006
Token looked
Token ID3114
Feature activation+2.68e-01

Pos logit contributions
ont+0.063
P+0.057
rev+0.056
 gard+0.056
 Gard+0.054
Neg logit contributions
run-0.067
ement-0.065
 gust-0.065
 brand-0.061
 anymore-0.058
Token at
Token ID379
Feature activation-3.32e-01

Pos logit contributions
 gard+0.008
ont+0.008
Bel+0.008
 cor+0.008
 moral+0.008
Neg logit contributions
run-0.013
ement-0.013
 gust-0.013
ended-0.012
 importantly-0.012
Token the
Token ID262
Feature activation+1.92e-01

Pos logit contributions
run+0.019
 gust+0.019
istered+0.019
ement+0.019
ended+0.018
Neg logit contributions
ont-0.008
 two-0.008
 adult-0.007
 gard-0.007
 moral-0.007
Token bug
Token ID5434
Feature activation+5.65e-01

Pos logit contributions
ont+0.006
 gard+0.005
P+0.005
Bel+0.005
 moral+0.005
Neg logit contributions
run-0.011
ement-0.010
 gust-0.010
 brand-0.010
 anymore-0.010
Token in
Token ID287
Feature activation+3.75e-01

Pos logit contributions
ont+0.012
 gard+0.011
Bel+0.011
 two+0.010
 moral+0.010
Neg logit contributions
run-0.034
 gust-0.033
ement-0.033
 brand-0.032
istered-0.031
Token amaz
Token ID40642
Feature activation-4.57e+00

Pos logit contributions
 gard+0.014
ont+0.013
 two+0.013
Bel+0.013
 moral+0.013
Neg logit contributions
run-0.018
ement-0.017
 brand-0.016
 gust-0.016
ended-0.016
Tokenement
Token ID972
Feature activation-1.91e+00

Pos logit contributions
run+0.091
 haven+0.081
istered+0.081
 exhausted+0.080
 gust+0.080
Neg logit contributions
ont-0.296
P-0.276
 adult-0.266
ent-0.265
 gard-0.263
Token.
Token ID13
Feature activation-4.54e-01

Pos logit contributions
run+0.110
 gust+0.108
 brand+0.103
ement+0.100
istered+0.099
Neg logit contributions
ont-0.043
fl-0.042
 two-0.041
 gard-0.040
 next-0.039
Token She
Token ID1375
Feature activation-1.01e-01

Pos logit contributions
run+0.018
 gust+0.017
ement+0.017
 brand+0.016
 anymore+0.016
Neg logit contributions
ont-0.020
 gard-0.019
P-0.019
Bel-0.019
rev-0.018
Token was
Token ID373
Feature activation+2.06e-02

Pos logit contributions
run+0.005
ement+0.005
 gust+0.005
 anymore+0.004
 brand+0.004
Neg logit contributions
ont-0.004
P-0.003
 gard-0.003
Bel-0.003
 moral-0.003
Token so
Token ID523
Feature activation-5.26e-01

Pos logit contributions
 gard+0.001
 two+0.001
ont+0.001
Bel+0.001
 cor+0.001
Neg logit contributions
run-0.001
 gust-0.001
ement-0.001
 anymore-0.001
izing-0.001
Token watch
Token ID2342
Feature activation-3.30e-02

Pos logit contributions
ement+0.000
run+0.000
 gust+0.000
 brand+0.000
 anymore+0.000
Neg logit contributions
ont-0.000
 gard-0.000
Bel-0.000
P-0.000
 cor-0.000
Token the
Token ID262
Feature activation+1.39e-01

Pos logit contributions
run+0.002
ement+0.002
 gust+0.002
 brand+0.002
 anymore+0.001
Neg logit contributions
ont-0.001
 gard-0.001
P-0.001
Bel-0.001
 cor-0.001
Token big
Token ID1263
Feature activation-1.61e-01

Pos logit contributions
ont+0.004
P+0.003
rev+0.003
Bel+0.003
 gard+0.003
Neg logit contributions
 gust-0.008
ement-0.008
run-0.008
 anymore-0.007
 brand-0.007
Token elephant
Token ID20950
Feature activation+4.86e-01

Pos logit contributions
run+0.009
ement+0.009
 gust+0.009
 brand+0.008
 anymore+0.008
Neg logit contributions
ont-0.005
 gard-0.004
P-0.004
Bel-0.004
rev-0.004
Token in
Token ID287
Feature activation+3.34e-01

Pos logit contributions
ont+0.014
fl+0.013
 gard+0.013
Bel+0.013
 cor+0.013
Neg logit contributions
run-0.026
ement-0.026
 gust-0.026
 brand-0.025
 importantly-0.024
Token amaz
Token ID40642
Feature activation-4.57e+00

Pos logit contributions
 two+0.008
ont+0.008
 cor+0.007
 moral+0.007
 gard+0.007
Neg logit contributions
run-0.020
ement-0.020
ended-0.019
 gust-0.019
istered-0.019
Tokenement
Token ID972
Feature activation-1.83e+00

Pos logit contributions
run+0.082
 gust+0.065
 haven+0.062
istered+0.061
 exhausted+0.057
Neg logit contributions
ont-0.325
P-0.308
ent-0.286
 gard-0.285
 adult-0.283
Token.
Token ID13
Feature activation-8.04e-02

Pos logit contributions
 gust+0.096
run+0.096
ement+0.093
 brand+0.092
 importantly+0.083
Neg logit contributions
ont-0.059
 gard-0.051
fl-0.050
Bel-0.049
P-0.049
Token\n
Token ID198
Feature activation+1.40e-01

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
 anymore+0.003
 brand+0.003
Neg logit contributions
ont-0.004
P-0.003
 gard-0.003
Bel-0.003
rev-0.003
Token\n
Token ID198
Feature activation+8.38e-02

Pos logit contributions
ont+0.005
 gard+0.005
P+0.005
 Gard+0.004
rev+0.004
Neg logit contributions
run-0.007
ement-0.006
 gust-0.006
 brand-0.006
 anymore-0.006
TokenThe
Token ID464
Feature activation+1.70e-01

Pos logit contributions
ont+0.004
 gard+0.003
rev+0.003
 Gard+0.003
fl+0.003
Neg logit contributions
run-0.003
 gust-0.003
ement-0.003
 brand-0.003
 anymore-0.003
Token up
Token ID510
Feature activation+6.25e-01

Pos logit contributions
ont+0.004
 gard+0.004
Bel+0.004
 cor+0.004
 moral+0.003
Neg logit contributions
run-0.005
 gust-0.005
ement-0.005
ended-0.005
oped-0.005
Token at
Token ID379
Feature activation-6.05e-01

Pos logit contributions
ont+0.020
 gard+0.020
Bel+0.019
P+0.019
 moral+0.018
Neg logit contributions
run-0.033
 gust-0.031
ement-0.030
 brand-0.029
 anymore-0.028
Token the
Token ID262
Feature activation+3.64e-01

Pos logit contributions
run+0.035
 gust+0.034
ement+0.033
istered+0.032
ended+0.032
Neg logit contributions
ont-0.016
 two-0.014
 moral-0.014
 parent-0.014
 gard-0.013
Token tree
Token ID5509
Feature activation-9.40e-01

Pos logit contributions
 gard+0.010
 moral+0.010
ont+0.010
 two+0.009
Bel+0.009
Neg logit contributions
run-0.021
ement-0.019
 gust-0.019
 brand-0.019
oped-0.018
Token in
Token ID287
Feature activation+3.55e-01

Pos logit contributions
run+0.061
 gust+0.061
ement+0.057
istered+0.057
 aren+0.056
Neg logit contributions
 two-0.015
fl-0.013
 gard-0.013
ont-0.013
 next-0.013
Token amaz
Token ID40642
Feature activation-4.56e+00

Pos logit contributions
 two+0.016
 moral+0.016
 gard+0.016
ont+0.016
Bel+0.016
Neg logit contributions
run-0.014
ement-0.013
 gust-0.013
 brand-0.012
ended-0.012
Tokenement
Token ID972
Feature activation-1.84e+00

Pos logit contributions
run+0.088
 gust+0.079
istered+0.076
 exhausted+0.073
 haven+0.073
Neg logit contributions
ont-0.305
P-0.282
 adult-0.274
 gard-0.270
ent-0.269
Token,
Token ID11
Feature activation-6.06e-01

Pos logit contributions
run+0.098
 gust+0.097
 brand+0.091
ement+0.090
istered+0.085
Neg logit contributions
ont-0.054
fl-0.051
 gard-0.049
 two-0.048
Bel-0.047
Token wondering
Token ID11263
Feature activation+4.05e-01

Pos logit contributions
run+0.028
ement+0.028
 gust+0.028
 anymore+0.026
istered+0.024
Neg logit contributions
 gard-0.022
ont-0.021
Bel-0.021
 cor-0.020
 moral-0.019
Token how
Token ID703
Feature activation+4.01e-01

Pos logit contributions
 gard+0.017
ont+0.017
 cor+0.016
 two+0.016
 moral+0.016
Neg logit contributions
 gust-0.017
run-0.016
ement-0.016
oped-0.015
ited-0.015
Token it
Token ID340
Feature activation+9.93e-01

Pos logit contributions
ont+0.014
 gard+0.014
Bel+0.014
 moral+0.013
P+0.013
Neg logit contributions
run-0.019
ement-0.019
 gust-0.018
 brand-0.017
 anymore-0.017
Token friends
Token ID2460
Feature activation+1.06e-01

Pos logit contributions
ont+0.002
 gard+0.002
P+0.002
Bel+0.001
 moral+0.001
Neg logit contributions
run-0.006
ement-0.006
 gust-0.006
 brand-0.006
 anymore-0.006
Token stopped
Token ID5025
Feature activation-1.39e+00

Pos logit contributions
ont+0.005
 gard+0.004
P+0.004
Bel+0.004
rev+0.004
Neg logit contributions
run-0.005
ement-0.004
 gust-0.004
 brand-0.004
 anymore-0.004
Token and
Token ID290
Feature activation-6.11e-01

Pos logit contributions
run+0.063
ement+0.062
 gust+0.062
 brand+0.058
 anymore+0.057
Neg logit contributions
ont-0.056
P-0.050
 gard-0.050
Bel-0.049
rev-0.048
Token watched
Token ID7342
Feature activation-1.24e-01

Pos logit contributions
run+0.028
ement+0.028
 gust+0.028
 brand+0.026
 anymore+0.026
Neg logit contributions
ont-0.024
 gard-0.022
P-0.022
Bel-0.021
rev-0.021
Token in
Token ID287
Feature activation+4.80e-01

Pos logit contributions
run+0.005
ement+0.005
 gust+0.005
 brand+0.004
 anymore+0.004
Neg logit contributions
ont-0.006
 gard-0.005
P-0.005
Bel-0.005
rev-0.005
Token amaz
Token ID40642
Feature activation-4.55e+00

Pos logit contributions
 two+0.016
 moral+0.015
 names+0.014
ont+0.014
 gard+0.014
Neg logit contributions
ended-0.023
run-0.023
ement-0.023
 gust-0.023
oped-0.022
Tokenement
Token ID972
Feature activation-1.65e+00

Pos logit contributions
run+0.079
 gust+0.078
 haven+0.073
 exhausted+0.067
istered+0.064
Neg logit contributions
ont-0.311
P-0.285
 adult-0.274
ent-0.274
 two-0.271
Token.
Token ID13
Feature activation-2.51e-01

Pos logit contributions
 gust+0.079
run+0.078
ement+0.077
 brand+0.074
ited+0.068
Neg logit contributions
ont-0.058
 gard-0.052
fl-0.051
Bel-0.049
 cor-0.049
Token The
Token ID383
Feature activation-3.31e-01

Pos logit contributions
run+0.011
ement+0.011
 gust+0.011
 brand+0.010
 anymore+0.010
Neg logit contributions
ont-0.010
 gard-0.009
P-0.009
Bel-0.009
rev-0.009
Token dog
Token ID3290
Feature activation-1.24e+00

Pos logit contributions
 gust+0.018
ement+0.017
run+0.017
 brand+0.017
 anymore+0.016
Neg logit contributions
ont-0.011
P-0.010
rev-0.010
Bel-0.010
 gard-0.009
Token too
Token ID1165
Feature activation-4.39e-01

Pos logit contributions
run+0.058
ement+0.057
 gust+0.057
 brand+0.054
 anymore+0.053
Neg logit contributions
ont-0.047
P-0.042
 gard-0.042
Bel-0.041
rev-0.040
Token's
Token ID338
Feature activation+3.88e-01

Pos logit contributions
run+0.045
ement+0.043
 gust+0.043
 brand+0.042
 anymore+0.039
Neg logit contributions
ont-0.051
P-0.046
 gard-0.045
 moral-0.043
Bel-0.043
Token eyes
Token ID2951
Feature activation-1.94e+00

Pos logit contributions
ont+0.011
P+0.010
Bel+0.009
 gard+0.009
rev+0.009
Neg logit contributions
ement-0.023
run-0.023
 gust-0.022
 brand-0.022
 anymore-0.021
Token spark
Token ID9009
Feature activation-1.53e-01

Pos logit contributions
run+0.105
ement+0.093
 gust+0.092
 importantly+0.089
 anymore+0.087
Neg logit contributions
 two-0.057
ont-0.056
Bel-0.056
 gard-0.055
 moral-0.054
Tokenled
Token ID992
Feature activation+6.73e-02

Pos logit contributions
run+0.006
 gust+0.006
 brand+0.006
 haven+0.006
ement+0.006
Neg logit contributions
ont-0.007
P-0.006
fl-0.006
Bel-0.006
 cor-0.006
Token in
Token ID287
Feature activation+2.60e-01

Pos logit contributions
 gard+0.002
ont+0.002
 two+0.002
P+0.002
Bel+0.002
Neg logit contributions
run-0.004
 gust-0.004
ement-0.004
ended-0.004
 brand-0.003
Token amaz
Token ID40642
Feature activation-4.51e+00

Pos logit contributions
ont+0.008
 gard+0.007
 two+0.007
 cor+0.007
 moral+0.007
Neg logit contributions
run-0.015
ement-0.014
 gust-0.014
 brand-0.013
ended-0.013
Tokenement
Token ID972
Feature activation-2.01e+00

Pos logit contributions
run+0.077
 gust+0.064
 haven+0.058
 exhausted+0.057
istered+0.048
Neg logit contributions
ont-0.317
P-0.298
 gard-0.285
ent-0.281
 adult-0.279
Token.
Token ID13
Feature activation-1.31e-01

Pos logit contributions
run+0.100
 gust+0.099
ement+0.096
 brand+0.092
 anymore+0.088
Neg logit contributions
ont-0.071
 gard-0.064
P-0.061
Bel-0.060
fl-0.060
Token The
Token ID383
Feature activation-1.59e-01

Pos logit contributions
run+0.005
ement+0.005
 gust+0.005
 brand+0.005
 anymore+0.005
Neg logit contributions
ont-0.006
 gard-0.006
P-0.005
Bel-0.005
rev-0.005
Token palace
Token ID20562
Feature activation-8.17e-01

Pos logit contributions
ement+0.007
 gust+0.006
 brand+0.006
 anymore+0.006
run+0.006
Neg logit contributions
ont-0.007
P-0.007
rev-0.006
Bel-0.006
resp-0.006
Token garden
Token ID11376
Feature activation-1.73e-01

Pos logit contributions
run+0.046
 gust+0.043
ement+0.042
 brand+0.041
 anymore+0.041
Neg logit contributions
ont-0.025
 gard-0.022
P-0.021
Bel-0.021
 moral-0.021
Token stared
Token ID18484
Feature activation-1.16e-01

Pos logit contributions
run+0.013
 gust+0.013
ement+0.013
 brand+0.012
 anymore+0.012
Neg logit contributions
ont-0.014
 gard-0.012
P-0.012
rev-0.012
Bel-0.012
Token at
Token ID379
Feature activation-2.19e-01

Pos logit contributions
run+0.004
ement+0.004
 gust+0.004
 brand+0.004
 anymore+0.004
Neg logit contributions
ont-0.005
P-0.005
 gard-0.005
Bel-0.005
rev-0.005
Token the
Token ID262
Feature activation+1.34e-01

Pos logit contributions
run+0.012
ement+0.012
 gust+0.012
 brand+0.011
istered+0.011
Neg logit contributions
ont-0.007
 gard-0.006
 adult-0.006
 two-0.006
Bel-0.006
Token hook
Token ID8011
Feature activation+1.27e-01

Pos logit contributions
ont+0.004
 gard+0.004
P+0.004
rev+0.004
Bel+0.004
Neg logit contributions
run-0.007
ement-0.007
 gust-0.007
 anymore-0.007
 brand-0.007
Token in
Token ID287
Feature activation+4.13e-01

Pos logit contributions
ont+0.003
 gard+0.003
 two+0.003
 moral+0.002
 cor+0.002
Neg logit contributions
run-0.008
 gust-0.008
ement-0.007
 brand-0.007
istered-0.007
Token amaz
Token ID40642
Feature activation-4.50e+00

Pos logit contributions
 gard+0.014
 two+0.014
ont+0.014
 moral+0.014
Bel+0.013
Neg logit contributions
run-0.021
ement-0.020
 gust-0.019
ended-0.018
oped-0.018
Tokenement
Token ID972
Feature activation-1.85e+00

Pos logit contributions
run+0.079
 gust+0.065
istered+0.061
 haven+0.060
 exhausted+0.059
Neg logit contributions
ont-0.321
P-0.295
 adult-0.281
 gard-0.280
ent-0.276
Token.
Token ID13
Feature activation-3.84e-02

Pos logit contributions
run+0.093
 gust+0.091
ement+0.088
 brand+0.087
 anymore+0.079
Neg logit contributions
ont-0.064
 gard-0.057
fl-0.056
Bel-0.054
 cor-0.053
Token She
Token ID1375
Feature activation-6.36e-02

Pos logit contributions
run+0.001
 gust+0.001
ement+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.002
P-0.002
 gard-0.002
 cor-0.002
 moral-0.002
Token was
Token ID373
Feature activation+9.86e-03

Pos logit contributions
run+0.003
 gust+0.003
ement+0.003
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.003
 gard-0.002
P-0.002
Bel-0.002
rev-0.002
Token so
Token ID523
Feature activation-4.12e-01

Pos logit contributions
 gard+0.000
ont+0.000
 two+0.000
 cor+0.000
Bel+0.000
Neg logit contributions
ement-0.000
run-0.000
 gust-0.000
 anymore-0.000
oped-0.000
Token.
Token ID13
Feature activation+9.09e-01

Pos logit contributions
ont+0.001
 gard+0.001
P+0.001
Bel+0.001
rev+0.001
Neg logit contributions
run-0.001
 gust-0.001
ement-0.001
 brand-0.001
 anymore-0.001
Token\n
Token ID198
Feature activation+2.27e-01

Pos logit contributions
ont+0.040
P+0.036
 gard+0.036
Bel+0.035
rev+0.035
Neg logit contributions
run-0.038
 gust-0.037
ement-0.037
 brand-0.034
 anymore-0.034
Token\n
Token ID198
Feature activation+3.45e-02

Pos logit contributions
ont+0.009
 gard+0.008
fl+0.008
rev+0.008
 Gard+0.008
Neg logit contributions
run-0.011
 gust-0.010
ement-0.010
 brand-0.009
 anymore-0.009
TokenTo
Token ID2514
Feature activation+2.15e-01

Pos logit contributions
ont+0.002
fl+0.001
 gard+0.001
rev+0.001
 cor+0.001
Neg logit contributions
run-0.002
 gust-0.001
ement-0.001
ended-0.001
 importantly-0.001
Token her
Token ID607
Feature activation-5.03e-01

Pos logit contributions
ont+0.008
P+0.007
 gard+0.007
Bel+0.007
rev+0.007
Neg logit contributions
run-0.010
 gust-0.010
ement-0.010
 brand-0.009
 anymore-0.009
Token amaz
Token ID40642
Feature activation-4.50e+00

Pos logit contributions
run+0.028
ement+0.028
 gust+0.028
 brand+0.026
 anymore+0.026
Neg logit contributions
ont-0.015
 gard-0.013
P-0.013
Bel-0.012
 moral-0.012
Tokenement
Token ID972
Feature activation-2.06e+00

Pos logit contributions
run+0.069
 gust+0.065
istered+0.062
 haven+0.054
 exhausted+0.051
Neg logit contributions
ont-0.317
P-0.288
 adult-0.284
 moral-0.281
 next-0.275
Token,
Token ID11
Feature activation-1.99e-01

Pos logit contributions
 gust+0.098
run+0.098
ement+0.096
 brand+0.093
istered+0.087
Neg logit contributions
ont-0.073
 gard-0.067
fl-0.063
Bel-0.062
 two-0.062
Token the
Token ID262
Feature activation-2.21e-01

Pos logit contributions
run+0.009
ement+0.009
 gust+0.009
 brand+0.008
 anymore+0.008
Neg logit contributions
ont-0.008
 gard-0.007
P-0.007
Bel-0.007
rev-0.007
Token clothes
Token ID8242
Feature activation+5.21e-02

Pos logit contributions
 gust+0.012
ement+0.011
 brand+0.011
run+0.011
 anymore+0.010
Neg logit contributions
ont-0.008
P-0.007
Bel-0.007
rev-0.006
 gard-0.006
Token came
Token ID1625
Feature activation-5.77e-01

Pos logit contributions
ont+0.002
 gard+0.002
 two+0.002
 moral+0.002
Bel+0.002
Neg logit contributions
run-0.002
 gust-0.002
ement-0.002
 brand-0.002
 importantly-0.002
Token as
Token ID355
Feature activation-8.98e-01

Pos logit contributions
run+0.040
ement+0.040
 gust+0.040
 brand+0.037
 anymore+0.036
Neg logit contributions
ont-0.034
P-0.031
 gard-0.031
Bel-0.030
rev-0.029
Token the
Token ID262
Feature activation+4.07e-01

Pos logit contributions
run+0.039
ement+0.039
 gust+0.037
 brand+0.036
 anymore+0.035
Neg logit contributions
ont-0.036
 gard-0.035
P-0.034
Bel-0.034
 cor-0.033
Token woman
Token ID2415
Feature activation-9.26e-01

Pos logit contributions
ont+0.013
P+0.011
rev+0.011
Bel+0.011
 gard+0.011
Neg logit contributions
 gust-0.022
run-0.022
ement-0.021
 brand-0.021
 anymore-0.020
Token watched
Token ID7342
Feature activation+1.49e-01

Pos logit contributions
ement+0.047
run+0.046
 gust+0.046
 brand+0.044
 anymore+0.043
Neg logit contributions
ont-0.030
 gard-0.028
Bel-0.027
P-0.027
 cor-0.026
Token in
Token ID287
Feature activation+7.32e-01

Pos logit contributions
ont+0.006
 gard+0.006
P+0.005
Bel+0.005
 cor+0.005
Neg logit contributions
run-0.007
ement-0.006
 gust-0.006
 brand-0.006
 anymore-0.006
Token amaz
Token ID40642
Feature activation-4.45e+00

Pos logit contributions
 two+0.025
 moral+0.024
 gard+0.024
ont+0.023
 cor+0.023
Neg logit contributions
ement-0.035
ended-0.034
run-0.034
 gust-0.033
istered-0.033
Tokenement
Token ID972
Feature activation-1.55e+00

Pos logit contributions
 haven+0.076
run+0.075
 gust+0.073
 exhausted+0.067
istered+0.066
Neg logit contributions
ont-0.296
P-0.280
 adult-0.268
ent-0.267
 gard-0.263
Token.
Token ID13
Feature activation-1.73e-01

Pos logit contributions
 gust+0.067
run+0.067
ement+0.066
 brand+0.063
 anymore+0.058
Neg logit contributions
ont-0.060
 gard-0.056
fl-0.055
Bel-0.054
 cor-0.053
Token She
Token ID1375
Feature activation-7.68e-02

Pos logit contributions
run+0.007
ement+0.007
 gust+0.007
 brand+0.006
 anymore+0.006
Neg logit contributions
ont-0.008
 gard-0.007
P-0.007
Bel-0.007
rev-0.007
Token saw
Token ID2497
Feature activation-2.00e-01

Pos logit contributions
run+0.004
ement+0.004
 gust+0.004
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.003
P-0.003
 gard-0.003
Bel-0.003
rev-0.002
Token the
Token ID262
Feature activation-6.99e-02

Pos logit contributions
run+0.010
 gust+0.010
ement+0.010
 anymore+0.010
 brand+0.010
Neg logit contributions
ont-0.006
 gard-0.006
Bel-0.006
 cor-0.006
 two-0.005
TokenTim
Token ID14967
Feature activation-5.26e-02

Pos logit contributions
run+0.003
 gust+0.003
ement+0.003
 brand+0.003
 anymore+0.003
Neg logit contributions
ont-0.003
 gard-0.002
rev-0.002
 Gard-0.002
fl-0.002
Tokenmy
Token ID1820
Feature activation+1.42e-01

Pos logit contributions
run+0.002
ement+0.002
 gust+0.002
 brand+0.002
 anymore+0.002
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
Bel-0.002
rev-0.002
Token just
Token ID655
Feature activation+2.79e-01

Pos logit contributions
ont+0.006
P+0.005
 gard+0.005
Bel+0.005
 Gard+0.005
Neg logit contributions
run-0.006
ement-0.006
 gust-0.006
 brand-0.006
 anymore-0.006
Token stared
Token ID18484
Feature activation-3.19e-01

Pos logit contributions
ont+0.011
 gard+0.010
P+0.009
Bel+0.009
rev+0.009
Neg logit contributions
ement-0.013
run-0.013
 gust-0.013
 brand-0.012
 anymore-0.012
Token in
Token ID287
Feature activation+6.18e-01

Pos logit contributions
run+0.013
 gust+0.013
ement+0.013
 brand+0.012
 anymore+0.011
Neg logit contributions
ont-0.014
 gard-0.013
P-0.013
Bel-0.013
rev-0.012
Token amaz
Token ID40642
Feature activation-4.45e+00

Pos logit contributions
 two+0.022
 gard+0.022
 moral+0.022
ont+0.021
 cor+0.021
Neg logit contributions
run-0.028
ement-0.027
 gust-0.026
ended-0.026
oped-0.026
Tokenement
Token ID972
Feature activation-1.90e+00

Pos logit contributions
run+0.066
 gust+0.066
 haven+0.064
 exhausted+0.058
istered+0.057
Neg logit contributions
ont-0.307
P-0.290
 adult-0.280
ent-0.275
 gard-0.272
Token at
Token ID379
Feature activation-6.82e-01

Pos logit contributions
 gust+0.092
run+0.092
ement+0.090
 brand+0.086
 anymore+0.082
Neg logit contributions
ont-0.068
 gard-0.062
P-0.059
Bel-0.058
fl-0.057
Token the
Token ID262
Feature activation-3.25e-01

Pos logit contributions
 gust+0.036
ement+0.035
run+0.035
 brand+0.033
 anymore+0.032
Neg logit contributions
ont-0.022
 gard-0.019
P-0.019
 adult-0.019
rev-0.019
Token t
Token ID256
Feature activation+1.49e-01

Pos logit contributions
run+0.017
ement+0.016
 gust+0.016
 brand+0.015
 anymore+0.015
Neg logit contributions
ont-0.011
 gard-0.010
P-0.010
Bel-0.010
 moral-0.010
Tokeneddy
Token ID21874
Feature activation+1.53e+00

Pos logit contributions
ont+0.005
 gard+0.005
P+0.005
Bel+0.005
 moral+0.005
Neg logit contributions
run-0.008
ement-0.007
 gust-0.007
 brand-0.007
 anymore-0.007
Tokeneddy
Token ID21874
Feature activation+1.43e+00

Pos logit contributions
run+0.006
ement+0.006
 gust+0.005
 brand+0.005
 anymore+0.005
Neg logit contributions
ont-0.005
 gard-0.005
 moral-0.004
P-0.004
Bel-0.004
Token bear
Token ID6842
Feature activation-3.39e-01

Pos logit contributions
ont+0.037
Bel+0.035
 gard+0.034
P+0.034
rev+0.032
Neg logit contributions
run-0.081
 gust-0.079
ement-0.079
 brand-0.078
 anymore-0.077
Token and
Token ID290
Feature activation-4.30e-01

Pos logit contributions
 gust+0.019
ement+0.019
run+0.019
 brand+0.018
 anymore+0.017
Neg logit contributions
ont-0.010
 gard-0.008
P-0.008
Bel-0.008
 cor-0.008
Token watched
Token ID7342
Feature activation+4.27e-01

Pos logit contributions
run+0.021
ement+0.021
 gust+0.021
 brand+0.020
 anymore+0.019
Neg logit contributions
ont-0.015
 gard-0.014
P-0.014
Bel-0.013
rev-0.013
Token in
Token ID287
Feature activation+8.22e-01

Pos logit contributions
ont+0.017
 gard+0.016
P+0.015
Bel+0.015
 moral+0.015
Neg logit contributions
run-0.019
ement-0.019
 gust-0.018
 brand-0.017
 anymore-0.017
Token amaz
Token ID40642
Feature activation-4.43e+00

Pos logit contributions
 two+0.025
 moral+0.023
 gard+0.022
 pet+0.022
 cor+0.021
Neg logit contributions
ement-0.043
 gust-0.043
run-0.042
ended-0.042
oped-0.042
Tokenement
Token ID972
Feature activation-1.45e+00

Pos logit contributions
 gust+0.081
run+0.079
 exhausted+0.071
 haven+0.070
istered+0.062
Neg logit contributions
ont-0.298
P-0.277
ent-0.263
 adult-0.262
 two-0.260
Token as
Token ID355
Feature activation-7.59e-01

Pos logit contributions
 gust+0.071
run+0.071
ement+0.070
 brand+0.067
 anymore+0.062
Neg logit contributions
ont-0.050
 gard-0.045
fl-0.044
Bel-0.043
 cor-0.042
Token it
Token ID340
Feature activation+5.66e-01

Pos logit contributions
run+0.036
ement+0.036
 gust+0.035
 brand+0.032
 anymore+0.032
Neg logit contributions
ont-0.027
 gard-0.027
 two-0.025
 cor-0.025
Bel-0.025
Token shrunk
Token ID38900
Feature activation+2.91e-01

Pos logit contributions
ont+0.019
 gard+0.017
Bel+0.017
P+0.017
fl+0.017
Neg logit contributions
ement-0.029
run-0.028
 gust-0.028
 brand-0.027
 anymore-0.026
Token an
Token ID281
Feature activation-3.56e-01

Pos logit contributions
ont+0.011
fl+0.009
 gard+0.009
 cor+0.009
P+0.009
Neg logit contributions
 gust-0.014
run-0.014
ement-0.014
 brand-0.013
ended-0.013
Token little
Token ID1310
Feature activation-4.43e-01

Pos logit contributions
 gust+0.027
ement+0.026
 brand+0.026
run+0.025
 anymore+0.024
Neg logit contributions
ont-0.015
rev-0.013
P-0.013
Bel-0.013
 Gard-0.012
Token girl
Token ID2576
Feature activation-4.74e-01

Pos logit contributions
 gust+0.023
run+0.022
ement+0.022
 brand+0.021
 anymore+0.021
Neg logit contributions
ont-0.015
P-0.013
rev-0.013
Bel-0.013
 gard-0.012
Token looked
Token ID3114
Feature activation+2.67e-01

Pos logit contributions
run+0.021
 gust+0.021
ement+0.020
 brand+0.019
 anymore+0.019
Neg logit contributions
ont-0.020
P-0.018
 gard-0.017
rev-0.017
Bel-0.017
Token up
Token ID510
Feature activation+4.93e-01

Pos logit contributions
ont+0.009
 gard+0.009
Bel+0.008
P+0.008
 moral+0.008
Neg logit contributions
run-0.013
ement-0.013
 gust-0.013
ended-0.012
 anymore-0.012
Token in
Token ID287
Feature activation+6.15e-01

Pos logit contributions
ont+0.017
 gard+0.016
P+0.016
Bel+0.015
 moral+0.015
Neg logit contributions
run-0.025
ement-0.024
 gust-0.024
 brand-0.023
 anymore-0.022
Token amaz
Token ID40642
Feature activation-4.43e+00

Pos logit contributions
 gard+0.017
ont+0.016
 two+0.016
 moral+0.015
Bel+0.015
Neg logit contributions
run-0.035
ement-0.033
 gust-0.032
ended-0.032
istered-0.031
Tokenement
Token ID972
Feature activation-1.80e+00

Pos logit contributions
run+0.080
 gust+0.072
 haven+0.070
 exhausted+0.065
istered+0.064
Neg logit contributions
ont-0.295
P-0.283
 gard-0.266
 adult-0.265
 two-0.264
Token and
Token ID290
Feature activation-4.55e-01

Pos logit contributions
run+0.091
 gust+0.089
ement+0.087
 brand+0.084
 anymore+0.078
Neg logit contributions
ont-0.059
 gard-0.056
fl-0.053
Bel-0.052
 two-0.050
Token said
Token ID531
Feature activation-5.00e-01

Pos logit contributions
run+0.024
ement+0.023
unch+0.022
 gust+0.022
 anymore+0.022
Neg logit contributions
 gard-0.014
 two-0.014
Bel-0.014
 next-0.013
fl-0.013
Token,
Token ID11
Feature activation-3.59e-02

Pos logit contributions
run+0.022
ement+0.021
 gust+0.021
 brand+0.020
 anymore+0.019
Neg logit contributions
ont-0.021
 gard-0.020
P-0.019
Bel-0.019
 moral-0.018
Token "
Token ID366
Feature activation-3.67e-01

Pos logit contributions
run+0.001
ement+0.001
 gust+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.002
 gard-0.001
P-0.001
Bel-0.001
rev-0.001
Token water
Token ID1660
Feature activation+3.36e-01

Pos logit contributions
 gust+0.013
ement+0.013
run+0.013
 brand+0.013
 anymore+0.012
Neg logit contributions
ont-0.007
P-0.006
Bel-0.006
 gard-0.006
rev-0.006
Token.
Token ID13
Feature activation+1.12e-01

Pos logit contributions
ont+0.010
 gard+0.010
 cor+0.009
P+0.009
fl+0.009
Neg logit contributions
 gust-0.018
run-0.018
ement-0.017
 brand-0.017
 importantly-0.016
Token He
Token ID679
Feature activation-2.17e-01

Pos logit contributions
ont+0.005
 gard+0.004
P+0.004
Bel+0.004
rev+0.004
Neg logit contributions
ement-0.005
 gust-0.005
run-0.005
 brand-0.004
 anymore-0.004
Token gasped
Token ID45236
Feature activation-4.00e-01

Pos logit contributions
run+0.010
 gust+0.009
ement+0.009
 brand+0.009
 anymore+0.009
Neg logit contributions
ont-0.009
P-0.008
 gard-0.008
Bel-0.008
rev-0.008
Token in
Token ID287
Feature activation+7.61e-01

Pos logit contributions
run+0.020
 gust+0.020
ement+0.019
 brand+0.019
 anymore+0.018
Neg logit contributions
ont-0.013
 gard-0.012
 cor-0.012
Bel-0.012
P-0.011
Token amaz
Token ID40642
Feature activation-4.42e+00

Pos logit contributions
ont+0.040
 gard+0.039
Bel+0.038
P+0.038
 cor+0.038
Neg logit contributions
run-0.024
ement-0.023
 gust-0.023
 anymore-0.022
 brand-0.021
Tokenement
Token ID972
Feature activation-1.92e+00

Pos logit contributions
run+0.089
 haven+0.084
 gust+0.083
istered+0.079
 exhausted+0.077
Neg logit contributions
ont-0.282
P-0.267
ent-0.255
 adult-0.255
 two-0.252
Token as
Token ID355
Feature activation-9.32e-01

Pos logit contributions
run+0.084
 gust+0.083
ement+0.080
 brand+0.078
 anymore+0.073
Neg logit contributions
ont-0.073
 gard-0.070
 cor-0.067
Bel-0.067
fl-0.065
Token he
Token ID339
Feature activation+2.78e-01

Pos logit contributions
run+0.042
ement+0.040
 gust+0.039
 brand+0.037
istered+0.037
Neg logit contributions
ont-0.036
 gard-0.035
 two-0.034
 moral-0.034
 cor-0.034
Token saw
Token ID2497
Feature activation-8.38e-02

Pos logit contributions
ont+0.011
P+0.010
 gard+0.010
Bel+0.010
rev+0.010
Neg logit contributions
run-0.013
ement-0.012
 gust-0.012
 anymore-0.012
 brand-0.012
Token the
Token ID262
Feature activation-1.38e-01

Pos logit contributions
run+0.004
 gust+0.004
ement+0.004
 anymore+0.004
 brand+0.004
Neg logit contributions
ont-0.003
 gard-0.002
 cor-0.002
 two-0.002
P-0.002
Token Mary
Token ID5335
Feature activation-4.14e-01

Pos logit contributions
run+0.022
 gust+0.022
ement+0.022
 brand+0.020
 anymore+0.020
Neg logit contributions
ont-0.021
P-0.019
 gard-0.019
Bel-0.019
rev-0.018
Token.
Token ID13
Feature activation-1.00e-01

Pos logit contributions
run+0.018
ement+0.017
 gust+0.017
 anymore+0.016
 brand+0.016
Neg logit contributions
ont-0.017
P-0.016
 gard-0.016
Bel-0.015
rev-0.015
Token She
Token ID1375
Feature activation-3.84e-01

Pos logit contributions
run+0.005
ement+0.004
 gust+0.004
 brand+0.004
 anymore+0.004
Neg logit contributions
ont-0.004
 gard-0.004
Bel-0.004
P-0.004
rev-0.004
Token was
Token ID373
Feature activation-1.67e-01

Pos logit contributions
 gust+0.011
 brand+0.010
run+0.009
 haven+0.009
 importantly+0.009
Neg logit contributions
 adult-0.020
ont-0.020
 gard-0.019
Bel-0.019
 Gard-0.019
Token just
Token ID655
Feature activation+9.76e-01

Pos logit contributions
run+0.008
ement+0.008
 gust+0.007
 brand+0.007
 anymore+0.007
Neg logit contributions
ont-0.007
 gard-0.006
P-0.006
Bel-0.006
rev-0.006
Token three
Token ID1115
Feature activation+2.12e+00

Pos logit contributions
ont+0.031
 gard+0.029
Bel+0.026
P+0.026
fl+0.025
Neg logit contributions
ement-0.052
run-0.052
 gust-0.051
 brand-0.048
 anymore-0.048
Token years
Token ID812
Feature activation-6.25e-01

Pos logit contributions
ont+0.095
P+0.087
 gard+0.087
Bel+0.084
rev+0.083
Neg logit contributions
run-0.086
 gust-0.084
ement-0.084
 brand-0.079
 anymore-0.076
Token old
Token ID1468
Feature activation+4.16e-01

Pos logit contributions
 gust+0.025
run+0.024
ement+0.024
 brand+0.023
 anymore+0.022
Neg logit contributions
ont-0.030
P-0.028
Bel-0.027
rev-0.026
fl-0.026
Token.
Token ID13
Feature activation+1.31e-01

Pos logit contributions
ont+0.015
P+0.014
 gard+0.013
Bel+0.013
rev+0.013
Neg logit contributions
run-0.020
ement-0.020
 gust-0.020
 brand-0.019
 anymore-0.019
Token One
Token ID1881
Feature activation-4.81e-01

Pos logit contributions
ont+0.005
 gard+0.005
P+0.005
Bel+0.005
rev+0.005
Neg logit contributions
run-0.006
 gust-0.006
ement-0.006
 brand-0.005
 anymore-0.005
Token day
Token ID1110
Feature activation+8.52e-01

Pos logit contributions
 gust+0.022
run+0.022
ement+0.021
 brand+0.020
 anymore+0.020
Neg logit contributions
ont-0.018
P-0.017
Bel-0.017
 gard-0.017
rev-0.017
Token around
Token ID1088
Feature activation-4.69e-02

Pos logit contributions
ont+0.007
 gard+0.006
Bel+0.006
P+0.006
 cor+0.006
Neg logit contributions
run-0.008
 gust-0.007
ement-0.007
ended-0.007
 brand-0.007
Token and
Token ID290
Feature activation-5.02e-01

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
 brand+0.003
ended+0.002
Neg logit contributions
ont-0.001
 gard-0.001
 moral-0.001
Bel-0.001
 adult-0.001
Token saw
Token ID2497
Feature activation-4.47e-02

Pos logit contributions
run+0.030
ement+0.029
unch+0.028
 gust+0.028
ended+0.027
Neg logit contributions
 two-0.012
Bel-0.011
 gard-0.011
ont-0.011
 father-0.011
Token a
Token ID257
Feature activation+2.64e-01

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
ited+0.003
 anymore+0.003
Neg logit contributions
ont-0.001
 two-0.001
 gard-0.001
 cor-0.001
Bel-0.001
Token shiny
Token ID22441
Feature activation+3.83e-01

Pos logit contributions
 gard+0.008
ont+0.008
 cor+0.007
P+0.007
 moral+0.007
Neg logit contributions
run-0.015
ement-0.014
 anymore-0.013
 gust-0.013
ended-0.013
Token can
Token ID460
Feature activation+2.22e+00

Pos logit contributions
ont+0.013
 gard+0.012
P+0.011
 moral+0.011
 adult+0.011
Neg logit contributions
run-0.020
ement-0.019
 gust-0.019
 brand-0.018
 anymore-0.018
Token on
Token ID319
Feature activation+2.30e-01

Pos logit contributions
ont+0.088
 moral+0.079
Bel+0.077
P+0.077
 gard+0.077
Neg logit contributions
ement-0.102
 gust-0.100
run-0.096
 brand-0.094
izing-0.090
Token the
Token ID262
Feature activation-2.34e-01

Pos logit contributions
ont+0.008
 gard+0.007
P+0.007
Bel+0.006
rev+0.006
Neg logit contributions
run-0.012
ement-0.012
 gust-0.012
 brand-0.011
 anymore-0.011
Token ground
Token ID2323
Feature activation-3.84e-01

Pos logit contributions
 gust+0.011
 brand+0.011
 anymore+0.011
ement+0.010
run+0.010
Neg logit contributions
ont-0.009
P-0.008
Bel-0.008
rev-0.008
 gard-0.007
Token.
Token ID13
Feature activation-4.39e-01

Pos logit contributions
 gust+0.024
ement+0.023
run+0.023
 brand+0.023
istered+0.022
Neg logit contributions
ont-0.008
 two-0.007
 next-0.007
 gard-0.006
fl-0.006
Token It
Token ID632
Feature activation+3.12e-01

Pos logit contributions
ement+0.018
run+0.018
 gust+0.017
 brand+0.016
 anymore+0.016
Neg logit contributions
ont-0.019
 gard-0.018
P-0.018
Bel-0.017
rev-0.017
Token but
Token ID475
Feature activation-2.56e-01

Pos logit contributions
ement+0.012
run+0.012
 gust+0.011
ended+0.011
 brand+0.011
Neg logit contributions
 gard-0.012
ont-0.011
 moral-0.011
 two-0.011
Bel-0.011
Token he
Token ID339
Feature activation-6.03e-01

Pos logit contributions
ement+0.011
run+0.011
 gust+0.011
 brand+0.010
 anymore+0.010
Neg logit contributions
ont-0.010
 gard-0.009
P-0.009
 two-0.009
 cor-0.009
Token was
Token ID373
Feature activation-1.94e-01

Pos logit contributions
run+0.026
ement+0.025
 gust+0.025
 brand+0.024
 anymore+0.023
Neg logit contributions
ont-0.026
P-0.024
 gard-0.024
Bel-0.023
rev-0.023
Token too
Token ID1165
Feature activation+8.36e-01

Pos logit contributions
ement+0.008
run+0.008
 gust+0.007
 brand+0.007
izing+0.007
Neg logit contributions
ont-0.008
 two-0.008
 gard-0.008
 moral-0.008
P-0.008
Token dizz
Token ID29527
Feature activation+2.93e-01

Pos logit contributions
ont+0.039
P+0.036
 gard+0.036
Bel+0.035
rev+0.035
Neg logit contributions
run-0.033
 gust-0.032
ement-0.031
 brand-0.029
 anymore-0.029
Tokeny
Token ID88
Feature activation+2.10e+00

Pos logit contributions
ont+0.009
P+0.008
rev+0.007
fl+0.007
 gard+0.007
Neg logit contributions
 brand-0.016
ement-0.016
 gust-0.016
run-0.015
 importantly-0.014
Token.
Token ID13
Feature activation+2.56e-01

Pos logit contributions
ont+0.061
 gard+0.049
fl+0.049
 adult+0.048
 two+0.047
Neg logit contributions
 gust-0.120
ement-0.118
run-0.112
 brand-0.111
 importantly-0.104
Token He
Token ID679
Feature activation+6.11e-01

Pos logit contributions
ont+0.012
P+0.011
 gard+0.011
Bel+0.011
rev+0.010
Neg logit contributions
run-0.010
ement-0.010
 gust-0.010
 brand-0.009
 anymore-0.009
Token tried
Token ID3088
Feature activation+6.85e-01

Pos logit contributions
ont+0.026
 gard+0.025
P+0.024
Bel+0.024
rev+0.023
Neg logit contributions
run-0.026
 gust-0.025
ement-0.025
 brand-0.023
 anymore-0.023
Token again
Token ID757
Feature activation-9.65e-03

Pos logit contributions
ont+0.026
P+0.024
 gard+0.023
rev+0.023
Bel+0.023
Neg logit contributions
run-0.033
 gust-0.032
ement-0.032
 anymore-0.031
 brand-0.030
Token,
Token ID11
Feature activation-1.06e-01

Pos logit contributions
ement+0.001
 gust+0.001
run+0.001
 brand+0.001
istered+0.001
Neg logit contributions
ont-0.000
 gard-0.000
 two-0.000
 adult-0.000
Bel-0.000
Token\n
Token ID198
Feature activation+1.95e-01

Pos logit contributions
ont+0.011
 gard+0.010
P+0.010
Bel+0.010
rev+0.009
Neg logit contributions
run-0.019
 gust-0.018
ement-0.018
 brand-0.017
 anymore-0.017
TokenHe
Token ID1544
Feature activation-1.98e-01

Pos logit contributions
ont+0.008
 gard+0.007
rev+0.007
Bel+0.007
P+0.007
Neg logit contributions
run-0.009
 gust-0.008
ement-0.008
 brand-0.008
 anymore-0.008
Token looked
Token ID3114
Feature activation+2.94e-01

Pos logit contributions
run+0.008
 gust+0.008
ement+0.007
 brand+0.007
 anymore+0.007
Neg logit contributions
ont-0.009
 gard-0.008
P-0.008
 Gard-0.008
fl-0.008
Token for
Token ID329
Feature activation-9.54e-02

Pos logit contributions
ont+0.011
 gard+0.010
 cor+0.010
P+0.010
Bel+0.010
Neg logit contributions
run-0.014
ement-0.013
 gust-0.013
ended-0.012
izing-0.012
Token another
Token ID1194
Feature activation+1.70e-01

Pos logit contributions
run+0.005
 gust+0.005
ement+0.005
 brand+0.005
 anymore+0.004
Neg logit contributions
ont-0.003
P-0.003
 gard-0.003
Bel-0.003
rev-0.003
Token pe
Token ID613
Feature activation+2.23e+00

Pos logit contributions
ont+0.005
P+0.005
 gard+0.005
Bel+0.005
rev+0.005
Neg logit contributions
run-0.009
 gust-0.009
ement-0.009
 brand-0.008
 anymore-0.008
Tokenb
Token ID65
Feature activation+2.57e+00

Pos logit contributions
ont+0.088
P+0.078
 gard+0.078
Bel+0.076
rev+0.073
Neg logit contributions
run-0.104
 gust-0.103
ement-0.102
 brand-0.096
 anymore-0.095
Tokenble
Token ID903
Feature activation+1.66e+00

Pos logit contributions
ont+0.092
P+0.087
 gard+0.086
Bel+0.084
rev+0.083
Neg logit contributions
run-0.126
ement-0.123
 gust-0.117
 brand-0.113
 anymore-0.113
Token and
Token ID290
Feature activation+5.51e-01

Pos logit contributions
ont+0.049
 gard+0.042
P+0.041
 two+0.040
 adult+0.039
Neg logit contributions
 gust-0.092
ement-0.091
run-0.089
 brand-0.088
istered-0.083
Token found
Token ID1043
Feature activation+5.65e-01

Pos logit contributions
ont+0.020
 gard+0.019
P+0.019
 two+0.018
 cor+0.018
Neg logit contributions
ement-0.027
run-0.026
 gust-0.025
 anymore-0.024
izing-0.024
Token one
Token ID530
Feature activation+3.88e-01

Pos logit contributions
ont+0.017
 gard+0.015
P+0.015
Bel+0.014
 cor+0.014
Neg logit contributions
 gust-0.031
run-0.031
ement-0.031
 brand-0.029
 anymore-0.029
Token on
Token ID319
Feature activation-2.18e-01

Pos logit contributions
run+0.032
ement+0.031
 gust+0.031
 brand+0.030
 anymore+0.030
Neg logit contributions
ont-0.019
P-0.017
 gard-0.017
Bel-0.016
rev-0.016
Token the
Token ID262
Feature activation-5.24e-01

Pos logit contributions
run+0.012
 gust+0.012
ement+0.012
 brand+0.011
 anymore+0.011
Neg logit contributions
ont-0.007
 gard-0.006
P-0.006
Bel-0.006
rev-0.006
Token rock
Token ID3881
Feature activation+3.15e-02

Pos logit contributions
 gust+0.027
run+0.027
ement+0.026
 brand+0.025
 anymore+0.025
Neg logit contributions
ont-0.018
P-0.016
Bel-0.015
 gard-0.015
rev-0.015
Token with
Token ID351
Feature activation+3.70e-01

Pos logit contributions
ont+0.001
 gard+0.001
 moral+0.001
fl+0.001
 two+0.001
Neg logit contributions
 gust-0.002
run-0.002
ement-0.002
 brand-0.002
 importantly-0.002
Token a
Token ID257
Feature activation+1.11e-01

Pos logit contributions
ont+0.009
 two+0.008
 cor+0.008
 gard+0.008
Bel+0.008
Neg logit contributions
ement-0.022
run-0.021
 gust-0.021
 brand-0.020
izing-0.020
Token marker
Token ID18364
Feature activation+2.13e+00

Pos logit contributions
ont+0.003
 gard+0.003
P+0.003
 cor+0.003
Bel+0.003
Neg logit contributions
ement-0.006
run-0.006
 gust-0.006
 anymore-0.006
izing-0.005
Token.
Token ID13
Feature activation+2.23e-01

Pos logit contributions
ont+0.056
fl+0.043
 two+0.043
 gard+0.043
 parent+0.039
Neg logit contributions
 gust-0.131
run-0.127
ement-0.124
 brand-0.118
ited-0.117
Token\n
Token ID198
Feature activation+2.83e-01

Pos logit contributions
ont+0.007
P+0.006
 gard+0.006
Bel+0.006
rev+0.006
Neg logit contributions
ement-0.012
 gust-0.012
run-0.012
 brand-0.011
 anymore-0.011
Token\n
Token ID198
Feature activation+1.16e-01

Pos logit contributions
ont+0.009
 gard+0.008
P+0.008
Bel+0.008
rev+0.007
Neg logit contributions
run-0.015
ement-0.015
 gust-0.015
 brand-0.014
 anymore-0.014
TokenThey
Token ID2990
Feature activation+1.38e-01

Pos logit contributions
ont+0.005
 gard+0.004
rev+0.004
Bel+0.004
fl+0.004
Neg logit contributions
run-0.005
 gust-0.005
ement-0.005
 brand-0.005
 anymore-0.005
Token stand
Token ID1302
Feature activation+7.28e-02

Pos logit contributions
ont+0.006
P+0.005
rev+0.005
 gard+0.005
Bel+0.005
Neg logit contributions
 gust-0.006
run-0.006
ement-0.006
 anymore-0.005
 brand-0.005
Token always
Token ID1464
Feature activation-4.14e-01

Pos logit contributions
ont+0.002
P+0.002
 gard+0.002
Bel+0.002
rev+0.002
Neg logit contributions
run-0.003
 gust-0.003
ement-0.003
 brand-0.003
 anymore-0.003
Token give
Token ID1577
Feature activation+2.45e-01

Pos logit contributions
run+0.024
ement+0.024
 gust+0.024
 brand+0.023
 anymore+0.022
Neg logit contributions
ont-0.011
 gard-0.010
P-0.010
Bel-0.010
 moral-0.009
Token her
Token ID607
Feature activation-2.98e-02

Pos logit contributions
ont+0.009
 gard+0.008
P+0.008
rev+0.008
 Gard+0.008
Neg logit contributions
run-0.012
ement-0.011
 anymore-0.011
 gust-0.011
 brand-0.011
Token a
Token ID257
Feature activation+1.06e-01

Pos logit contributions
run+0.001
ement+0.001
 gust+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.001
P-0.001
 gard-0.001
rev-0.001
Bel-0.001
Token safe
Token ID3338
Feature activation-1.06e+00

Pos logit contributions
ont+0.004
P+0.004
rev+0.004
 gard+0.004
 Gard+0.004
Neg logit contributions
run-0.005
 brand-0.004
 gust-0.004
ement-0.004
 anymore-0.004
Token feeling
Token ID4203
Feature activation-3.93e-01

Pos logit contributions
run+0.055
 gust+0.054
ement+0.054
 brand+0.051
 anymore+0.051
Neg logit contributions
ont-0.036
P-0.032
 gard-0.032
Bel-0.031
rev-0.030
Token.
Token ID13
Feature activation+2.48e-01

Pos logit contributions
run+0.019
 gust+0.019
ement+0.018
 brand+0.018
 anymore+0.016
Neg logit contributions
ont-0.014
 gard-0.013
P-0.013
Bel-0.012
 moral-0.012
Token<|endoftext|>
Token ID50256
Feature activation+3.70e-01

Pos logit contributions
ont+0.010
P+0.009
 gard+0.009
rev+0.009
 cor+0.008
Neg logit contributions
run-0.011
ement-0.011
 gust-0.011
 anymore-0.010
 brand-0.010
TokenTim
Token ID14967
Feature activation+6.37e-01

Pos logit contributions
ont+0.010
P+0.009
 gard+0.008
Bel+0.008
rev+0.008
Neg logit contributions
run-0.022
 gust-0.021
ement-0.021
 brand-0.021
 anymore-0.020
Token and
Token ID290
Feature activation-4.42e-01

Pos logit contributions
ont+0.025
P+0.024
 gard+0.024
Bel+0.023
rev+0.022
Neg logit contributions
run-0.028
ement-0.027
 gust-0.027
 anymore-0.025
 brand-0.025
Token Lucy
Token ID22162
Feature activation-7.41e-01

Pos logit contributions
 gust+0.018
run+0.018
ement+0.016
 brand+0.016
 anymore+0.015
Neg logit contributions
ont-0.020
P-0.018
rev-0.017
 gard-0.017
resp-0.017
Token He
Token ID679
Feature activation-3.16e-01

Pos logit contributions
run+0.001
 gust+0.001
ement+0.001
izing+0.001
 brand+0.001
Neg logit contributions
ont-0.001
rev-0.001
 gard-0.001
P-0.001
 moral-0.001
Token wanted
Token ID2227
Feature activation+4.43e-01

Pos logit contributions
run+0.013
ement+0.013
 gust+0.013
 brand+0.012
 anymore+0.011
Neg logit contributions
ont-0.014
P-0.013
 gard-0.012
Bel-0.012
 Gard-0.012
Token to
Token ID284
Feature activation-3.07e-01

Pos logit contributions
ont+0.015
P+0.015
 gard+0.015
 Gard+0.014
rev+0.014
Neg logit contributions
run-0.022
 anymore-0.021
 brand-0.020
 gust-0.020
ement-0.020
Token know
Token ID760
Feature activation-1.78e-01

Pos logit contributions
run+0.014
 gust+0.013
ement+0.013
 anymore+0.013
 brand+0.013
Neg logit contributions
ont-0.012
P-0.011
rev-0.011
Bel-0.011
 gard-0.010
Token what
Token ID644
Feature activation-2.09e-01

Pos logit contributions
 gust+0.008
run+0.007
ement+0.007
istered+0.007
ited+0.007
Neg logit contributions
ont-0.007
 two-0.007
 adult-0.007
 moral-0.007
 gard-0.007
Token she
Token ID673
Feature activation-3.58e-01

Pos logit contributions
run+0.010
 gust+0.010
ement+0.010
 brand+0.009
 anymore+0.009
Neg logit contributions
ont-0.008
P-0.007
 gard-0.007
Bel-0.007
rev-0.007
Token had
Token ID550
Feature activation+1.62e-01

Pos logit contributions
 gust+0.019
run+0.019
ement+0.018
 brand+0.017
 anymore+0.017
Neg logit contributions
ont-0.012
rev-0.011
 moral-0.010
 gard-0.010
P-0.010
Token been
Token ID587
Feature activation+1.71e-01

Pos logit contributions
ont+0.007
P+0.006
 gard+0.006
Bel+0.006
rev+0.006
Neg logit contributions
 gust-0.008
run-0.007
ement-0.007
 brand-0.007
 anymore-0.007
Token doing
Token ID1804
Feature activation+9.82e-01

Pos logit contributions
ont+0.007
 gard+0.007
P+0.007
Bel+0.006
rev+0.006
Neg logit contributions
run-0.007
 gust-0.007
ement-0.007
 brand-0.007
 anymore-0.007
Token all
Token ID477
Feature activation+1.92e-01

Pos logit contributions
ont+0.035
P+0.030
 gard+0.030
rev+0.029
 moral+0.029
Neg logit contributions
 gust-0.050
run-0.049
ement-0.048
 brand-0.045
ended-0.043
Token day
Token ID1110
Feature activation+7.12e-01

Pos logit contributions
ont+0.008
 gard+0.007
P+0.007
Bel+0.007
 moral+0.007
Neg logit contributions
 gust-0.009
ement-0.009
run-0.009
 brand-0.008
 anymore-0.008
TokenOnce
Token ID7454
Feature activation+1.21e-01

Pos logit contributions
ont+0.017
 gard+0.015
P+0.015
Bel+0.014
rev+0.014
Neg logit contributions
run-0.029
ement-0.028
 gust-0.028
 brand-0.027
 anymore-0.026
Token upon
Token ID2402
Feature activation+2.79e-01

Pos logit contributions
ont+0.004
 moral+0.004
 parent+0.004
P+0.004
 two+0.004
Neg logit contributions
ement-0.006
run-0.006
 gust-0.006
 brand-0.005
izing-0.005
Token a
Token ID257
Feature activation-4.27e-01

Pos logit contributions
ont+0.006
P+0.005
Bel+0.005
 gard+0.005
 moral+0.005
Neg logit contributions
ement-0.018
run-0.018
 gust-0.017
 brand-0.017
 anymore-0.016
Token time
Token ID640
Feature activation-3.63e-01

Pos logit contributions
run+0.024
ement+0.023
 gust+0.023
 brand+0.022
 anymore+0.022
Neg logit contributions
ont-0.013
P-0.011
 gard-0.011
Bel-0.011
rev-0.010
Token,
Token ID11
Feature activation+3.05e-02

Pos logit contributions
run+0.017
ement+0.017
 gust+0.017
 brand+0.016
izing+0.015
Neg logit contributions
ont-0.014
P-0.012
 gard-0.012
Bel-0.012
 moral-0.012
Token there
Token ID612
Feature activation-3.80e-01

Pos logit contributions
 two+0.001
P+0.001
ont+0.001
 moral+0.001
 next+0.001
Neg logit contributions
ement-0.002
run-0.002
 brand-0.002
izing-0.001
 gust-0.001
Token was
Token ID373
Feature activation+2.89e-01

Pos logit contributions
run+0.027
 gust+0.026
ement+0.026
 brand+0.026
ifying+0.025
Neg logit contributions
ont-0.006
 two-0.004
P-0.004
fl-0.004
rev-0.004
Token a
Token ID257
Feature activation-1.96e-01

Pos logit contributions
ont+0.010
 gard+0.009
 adult+0.009
rev+0.009
 Gard+0.009
Neg logit contributions
 gust-0.014
ement-0.014
 brand-0.013
run-0.013
 anymore-0.013
Token little
Token ID1310
Feature activation+3.77e-03

Pos logit contributions
 gust+0.011
run+0.010
ement+0.010
 brand+0.010
 anymore+0.009
Neg logit contributions
ont-0.006
P-0.005
Bel-0.005
rev-0.005
 gard-0.005
Token girl
Token ID2576
Feature activation+3.45e-01

Pos logit contributions
ont+0.000
Bel+0.000
P+0.000
rev+0.000
 gard+0.000
Neg logit contributions
 gust-0.000
run-0.000
ement-0.000
 brand-0.000
 anymore-0.000
Token named
Token ID3706
Feature activation-4.00e-01

Pos logit contributions
ont+0.015
rev+0.013
Bel+0.013
 gard+0.013
resp+0.013
Neg logit contributions
run-0.014
 gust-0.014
ement-0.013
 anymore-0.013
 brand-0.013
Token tired
Token ID10032
Feature activation-5.86e-01

Pos logit contributions
run+0.037
ement+0.035
 gust+0.034
oped+0.032
istered+0.032
Neg logit contributions
 gard-0.018
Bel-0.017
ont-0.017
 cor-0.017
 two-0.017
Tokenness
Token ID1108
Feature activation-3.85e-01

Pos logit contributions
run+0.035
ement+0.033
 brand+0.031
 gust+0.031
istered+0.031
Neg logit contributions
 gard-0.016
ont-0.014
 adult-0.013
 two-0.013
fl-0.013
Token come
Token ID1282
Feature activation+2.99e-01

Pos logit contributions
run+0.022
ement+0.021
 gust+0.020
 brand+0.020
istered+0.020
Neg logit contributions
 gard-0.009
ont-0.009
 two-0.009
Bel-0.009
fl-0.009
Token over
Token ID625
Feature activation+7.00e-01

Pos logit contributions
ont+0.009
 gard+0.009
Bel+0.008
fl+0.008
P+0.008
Neg logit contributions
ement-0.016
run-0.016
 gust-0.016
 brand-0.014
 anymore-0.014
Token her
Token ID607
Feature activation+2.87e-01

Pos logit contributions
 gard+0.017
ont+0.016
Bel+0.015
fl+0.015
 cor+0.015
Neg logit contributions
run-0.041
 gust-0.040
ement-0.040
 brand-0.038
istered-0.037
Token.
Token ID13
Feature activation+4.14e-01

Pos logit contributions
ont+0.009
 gard+0.008
fl+0.008
 parent+0.008
 moral+0.007
Neg logit contributions
run-0.015
ement-0.015
 gust-0.015
 brand-0.014
ended-0.014
Token With
Token ID2080
Feature activation-2.47e-01

Pos logit contributions
ont+0.015
 gard+0.015
P+0.014
Bel+0.014
rev+0.014
Neg logit contributions
run-0.019
 gust-0.019
ement-0.019
 brand-0.018
 anymore-0.017
Token a
Token ID257
Feature activation+4.66e-01

Pos logit contributions
run+0.015
ement+0.015
 anymore+0.014
ended+0.014
 gust+0.014
Neg logit contributions
 two-0.006
 gard-0.005
Bel-0.005
 moral-0.005
ont-0.005
Token big
Token ID1263
Feature activation+1.92e-01

Pos logit contributions
ont+0.017
 gard+0.017
P+0.016
Bel+0.016
 moral+0.016
Neg logit contributions
run-0.022
ement-0.022
 anymore-0.021
 gust-0.020
izing-0.019
Token yawn
Token ID40908
Feature activation+4.59e-01

Pos logit contributions
ont+0.008
 gard+0.008
 moral+0.007
Bel+0.007
P+0.007
Neg logit contributions
run-0.008
ement-0.008
 anymore-0.007
 brand-0.007
izing-0.007
Token she
Token ID673
Feature activation+2.43e-01

Pos logit contributions
ont+0.017
 gard+0.016
fl+0.015
P+0.015
Bel+0.015
Neg logit contributions
run-0.022
ement-0.021
 gust-0.020
 brand-0.020
 anymore-0.019
Token would
Token ID561
Feature activation+4.06e-01

Pos logit contributions
ont+0.013
P+0.012
rev+0.012
 gard+0.012
 Gard+0.012
Neg logit contributions
run-0.010
ement-0.010
 gust-0.009
 anymore-0.009
 brand-0.009
Token spend
Token ID4341
Feature activation-8.86e-01

Pos logit contributions
ont+0.015
P+0.014
 gard+0.013
rev+0.013
Bel+0.013
Neg logit contributions
run-0.020
 gust-0.019
ement-0.019
 brand-0.018
 anymore-0.018
Token lots
Token ID6041
Feature activation-1.75e+00

Pos logit contributions
ement+0.050
run+0.050
 gust+0.049
istered+0.047
ended+0.047
Neg logit contributions
ont-0.025
P-0.023
Bel-0.020
 cor-0.020
 two-0.020
Token of
Token ID286
Feature activation-1.78e-02

Pos logit contributions
run+0.059
ement+0.059
 gust+0.058
 brand+0.054
 anymore+0.052
Neg logit contributions
ont-0.089
P-0.083
 gard-0.082
Bel-0.081
 moral-0.078
Token time
Token ID640
Feature activation-1.30e-01

Pos logit contributions
run+0.001
 gust+0.001
ement+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.001
 gard-0.001
P-0.001
rev-0.001
Bel-0.001
Token around
Token ID1088
Feature activation+5.00e-02

Pos logit contributions
ement+0.006
 gust+0.006
run+0.006
 brand+0.006
 anymore+0.006
Neg logit contributions
ont-0.005
P-0.004
 gard-0.004
Bel-0.004
 cor-0.004
Token the
Token ID262
Feature activation-2.29e-02

Pos logit contributions
ont+0.002
 gard+0.001
P+0.001
 cor+0.001
Bel+0.001
Neg logit contributions
run-0.003
ement-0.003
 gust-0.003
 brand-0.002
istered-0.002
Token rail
Token ID6787
Feature activation-8.04e-01

Pos logit contributions
run+0.001
ement+0.001
 gust+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.001
P-0.001
 gard-0.001
Bel-0.001
rev-0.001
Token,
Token ID11
Feature activation-2.59e-01

Pos logit contributions
 gust+0.041
run+0.041
ement+0.040
 brand+0.038
istered+0.037
Neg logit contributions
ont-0.026
 gard-0.023
P-0.023
Bel-0.023
fl-0.022
Token sitting
Token ID5586
Feature activation-5.62e-01

Pos logit contributions
ement+0.012
run+0.012
 gust+0.012
 anymore+0.011
 brand+0.011
Neg logit contributions
ont-0.010
 gard-0.009
Bel-0.009
P-0.009
 cor-0.009
Token and
Token ID290
Feature activation+1.14e-01

Pos logit contributions
run+0.028
 gust+0.028
ement+0.027
 brand+0.026
 anymore+0.026
Neg logit contributions
ont-0.019
P-0.019
 gard-0.018
Bel-0.017
 moral-0.017
Token\n
Token ID198
Feature activation-8.20e-02

Pos logit contributions
run+0.001
ement+0.001
 gust+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.001
P-0.001
 gard-0.001
Bel-0.001
rev-0.001
TokenL
Token ID43
Feature activation-3.59e-02

Pos logit contributions
run+0.005
 gust+0.005
ement+0.005
 brand+0.004
 anymore+0.004
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
rev-0.002
Bel-0.002
Tokenily
Token ID813
Feature activation+1.27e-01

Pos logit contributions
run+0.002
ement+0.002
 gust+0.002
 anymore+0.002
 brand+0.002
Neg logit contributions
ont-0.001
 cor-0.001
 gard-0.001
Bel-0.001
P-0.001
Token shook
Token ID14682
Feature activation-1.07e+00

Pos logit contributions
ont+0.006
P+0.006
 gard+0.005
rev+0.005
 adult+0.005
Neg logit contributions
run-0.005
ement-0.005
 gust-0.005
 brand-0.004
ited-0.004
Token her
Token ID607
Feature activation-4.92e-01

Pos logit contributions
run+0.068
ement+0.067
 gust+0.065
 brand+0.065
ended+0.064
Neg logit contributions
ont-0.021
 moral-0.019
 gard-0.018
 two-0.018
 adult-0.017
Token head
Token ID1182
Feature activation-8.80e-01

Pos logit contributions
 gust+0.027
run+0.027
ement+0.026
 anymore+0.025
 brand+0.025
Neg logit contributions
ont-0.015
P-0.015
Bel-0.014
rev-0.013
 gard-0.013
Token.
Token ID13
Feature activation-8.20e-01

Pos logit contributions
 gust+0.052
run+0.052
ement+0.050
 importantly+0.049
 brand+0.049
Neg logit contributions
ont-0.021
 two-0.019
 gard-0.019
 moral-0.018
 next-0.017
Token She
Token ID1375
Feature activation-2.03e-01

Pos logit contributions
ement+0.038
run+0.038
 gust+0.036
 brand+0.034
ited+0.034
Neg logit contributions
 gard-0.030
ont-0.030
 Gard-0.027
Bel-0.027
rev-0.027
Token did
Token ID750
Feature activation+2.53e-01

Pos logit contributions
run+0.008
 gust+0.008
ement+0.008
 brand+0.008
 anymore+0.007
Neg logit contributions
ont-0.009
P-0.008
 gard-0.008
Bel-0.008
rev-0.008
Token not
Token ID407
Feature activation-3.58e-01

Pos logit contributions
ont+0.005
 moral+0.005
 gard+0.004
Bel+0.004
 two+0.004
Neg logit contributions
ement-0.016
run-0.016
 gust-0.016
 brand-0.015
 importantly-0.014
Token want
Token ID765
Feature activation+1.15e+00

Pos logit contributions
run+0.012
 gust+0.012
ement+0.011
 anymore+0.011
 brand+0.011
Neg logit contributions
ont-0.018
P-0.017
 gard-0.017
Bel-0.017
rev-0.017
Token mum
Token ID25682
Feature activation+1.79e-01

Pos logit contributions
ont+0.004
P+0.004
Bel+0.004
resp+0.003
Pr+0.003
Neg logit contributions
 brand-0.006
ement-0.006
run-0.006
 gust-0.006
 anymore-0.006
Token and
Token ID290
Feature activation+2.25e-01

Pos logit contributions
ont+0.006
P+0.005
 gard+0.005
Bel+0.005
rev+0.005
Neg logit contributions
run-0.010
ement-0.009
 gust-0.009
 brand-0.009
 anymore-0.009
Token said
Token ID531
Feature activation-4.08e-01

Pos logit contributions
ont+0.009
P+0.008
 gard+0.008
 moral+0.008
Bel+0.008
Neg logit contributions
run-0.011
 gust-0.010
ement-0.010
 brand-0.010
 anymore-0.009
Token "
Token ID366
Feature activation+4.64e-02

Pos logit contributions
run+0.017
ement+0.016
 gust+0.016
 brand+0.015
 anymore+0.015
Neg logit contributions
ont-0.018
P-0.017
 gard-0.017
Bel-0.016
rev-0.016
Tokenyou
Token ID5832
Feature activation+6.90e-01

Pos logit contributions
ont+0.002
 gard+0.002
 Gard+0.002
resp+0.001
rev+0.001
Neg logit contributions
run-0.002
 gust-0.002
ement-0.002
 brand-0.002
 anymore-0.002
Token're
Token ID821
Feature activation-5.23e-01

Pos logit contributions
ont+0.034
P+0.032
 gard+0.031
Bel+0.031
rev+0.030
Neg logit contributions
run-0.025
 gust-0.024
ement-0.024
 brand-0.022
 anymore-0.022
Token okay
Token ID8788
Feature activation-1.72e+00

Pos logit contributions
run+0.020
ement+0.020
 gust+0.020
 brand+0.018
 anymore+0.018
Neg logit contributions
ont-0.025
P-0.023
 gard-0.023
Bel-0.022
rev-0.022
Token".
Token ID1911
Feature activation-1.24e-02

Pos logit contributions
run+0.071
 gust+0.069
ement+0.069
 brand+0.064
 anymore+0.064
Neg logit contributions
ont-0.077
P-0.071
 gard-0.070
Bel-0.069
rev-0.067
Token She
Token ID1375
Feature activation-3.20e-01

Pos logit contributions
run+0.001
 gust+0.001
ement+0.001
 brand+0.001
 anymore+0.001
Neg logit contributions
ont-0.000
 gard-0.000
P-0.000
Bel-0.000
 cor-0.000
Token smiled
Token ID13541
Feature activation-5.10e-01

Pos logit contributions
run+0.013
 gust+0.012
 brand+0.012
ement+0.011
 badly+0.010
Neg logit contributions
ont-0.015
P-0.014
 gard-0.013
fl-0.013
 Gard-0.013
Token and
Token ID290
Feature activation-4.84e-01

Pos logit contributions
run+0.032
 gust+0.031
ement+0.031
 anymore+0.029
 brand+0.029
Neg logit contributions
ont-0.011
 gard-0.010
Bel-0.010
P-0.009
 parent-0.009
Token.
Token ID13
Feature activation-1.77e-01

Pos logit contributions
ont+0.004
 gard+0.003
Bel+0.003
 parent+0.003
 two+0.003
Neg logit contributions
ement-0.009
 gust-0.009
run-0.008
izing-0.008
 importantly-0.008
Token They
Token ID1119
Feature activation+1.46e-01

Pos logit contributions
run+0.008
ement+0.008
 gust+0.008
 brand+0.007
 anymore+0.007
Neg logit contributions
ont-0.007
P-0.006
 gard-0.006
Bel-0.006
rev-0.006
Token decided
Token ID3066
Feature activation+4.91e-01

Pos logit contributions
ont+0.006
 gard+0.005
P+0.005
Bel+0.005
 moral+0.005
Neg logit contributions
run-0.007
ement-0.007
 gust-0.006
 anymore-0.006
 brand-0.006
Token not
Token ID407
Feature activation-1.09e-01

Pos logit contributions
ont+0.014
P+0.012
 gard+0.012
Bel+0.012
rev+0.011
Neg logit contributions
run-0.028
ement-0.028
 gust-0.028
 brand-0.026
 anymore-0.026
Token to
Token ID284
Feature activation+4.74e-01

Pos logit contributions
ement+0.007
run+0.007
 gust+0.007
 brand+0.007
 anymore+0.006
Neg logit contributions
ont-0.002
 gard-0.002
Bel-0.002
P-0.002
 moral-0.002
Token punish
Token ID6878
Feature activation-6.18e-01

Pos logit contributions
ont+0.019
P+0.018
 gard+0.017
Bel+0.017
rev+0.017
Neg logit contributions
run-0.021
 gust-0.021
 anymore-0.020
ement-0.020
 brand-0.020
Token the
Token ID262
Feature activation+8.33e-01

Pos logit contributions
run+0.031
ement+0.030
 gust+0.030
 brand+0.028
 anymore+0.027
Neg logit contributions
ont-0.022
 gard-0.020
P-0.020
Bel-0.019
rev-0.019
Token cat
Token ID3797
Feature activation+3.44e-02

Pos logit contributions
ont+0.025
P+0.022
Bel+0.021
 gard+0.021
rev+0.021
Neg logit contributions
 gust-0.047
run-0.046
ement-0.045
 brand-0.044
 anymore-0.044
Token.
Token ID13
Feature activation+1.82e-01

Pos logit contributions
ont+0.001
 gard+0.001
 moral+0.001
Bel+0.001
 adult+0.001
Neg logit contributions
ement-0.002
 gust-0.002
run-0.002
 brand-0.002
 importantly-0.002
Token Instead
Token ID5455
Feature activation-1.29e+00

Pos logit contributions
ont+0.008
P+0.007
 gard+0.007
Bel+0.007
rev+0.007
Neg logit contributions
run-0.008
ement-0.008
 gust-0.008
 brand-0.007
 anymore-0.007
Token,
Token ID11
Feature activation+6.16e-01

Pos logit contributions
ement+0.065
run+0.062
 brand+0.062
 gust+0.061
ited+0.061
Neg logit contributions
 gard-0.040
 two-0.040
ont-0.040
 adult-0.040
 parent-0.039
Token<|endoftext|>
Token ID50256
Feature activation+4.37e-01

Pos logit contributions
ement+0.009
run+0.009
 gust+0.009
 brand+0.008
 anymore+0.008
Neg logit contributions
ont-0.005
P-0.005
 gard-0.005
rev-0.004
Bel-0.004
TokenL
Token ID43
Feature activation+5.17e-01

Pos logit contributions
ont+0.014
 gard+0.013
P+0.012
Bel+0.012
rev+0.012
Neg logit contributions
run-0.023
ement-0.023
 gust-0.022
 anymore-0.021
 brand-0.021
Tokenily
Token ID813
Feature activation+9.57e-01

Pos logit contributions
ont+0.016
 gard+0.015
Bel+0.014
P+0.014
fl+0.014
Neg logit contributions
run-0.028
ement-0.027
 gust-0.027
 anymore-0.026
 brand-0.025
Token and
Token ID290
Feature activation-2.48e-01

Pos logit contributions
ont+0.039
P+0.036
 adult+0.035
Bel+0.035
 gard+0.035
Neg logit contributions
run-0.039
ement-0.038
 gust-0.038
 anymore-0.037
 brand-0.036
Token Tom
Token ID4186
Feature activation-7.67e-01

Pos logit contributions
run+0.012
 gust+0.012
ement+0.012
 brand+0.011
 anymore+0.011
Neg logit contributions
ont-0.009
P-0.008
 gard-0.008
Bel-0.008
rev-0.008
Token were
Token ID547
Feature activation-7.90e-01

Pos logit contributions
run+0.031
 gust+0.030
ement+0.030
 anymore+0.028
 brand+0.028
Neg logit contributions
ont-0.034
P-0.031
 gard-0.031
Bel-0.030
rev-0.029
Token friends
Token ID2460
Feature activation+9.22e-01

Pos logit contributions
run+0.035
ement+0.034
 gust+0.033
 brand+0.031
 anymore+0.031
Neg logit contributions
ont-0.033
P-0.031
 gard-0.030
Bel-0.029
rev-0.029
Token.
Token ID13
Feature activation-1.07e-01

Pos logit contributions
ont+0.024
P+0.022
 gard+0.021
 two+0.021
 parent+0.021
Neg logit contributions
 gust-0.052
run-0.052
ement-0.049
 brand-0.048
apy-0.046
Token They
Token ID1119
Feature activation+3.19e-03

Pos logit contributions
ement+0.006
run+0.006
 gust+0.006
 brand+0.006
izing+0.005
Neg logit contributions
ont-0.003
P-0.003
 moral-0.003
 gard-0.003
rev-0.003
Token liked
Token ID8288
Feature activation-6.69e-01

Pos logit contributions
ont+0.000
 gard+0.000
P+0.000
Bel+0.000
rev+0.000
Neg logit contributions
run-0.000
ement-0.000
 gust-0.000
 brand-0.000
 anymore-0.000
Token to
Token ID284
Feature activation-6.91e-01

Pos logit contributions
run+0.034
 gust+0.034
 brand+0.033
 anymore+0.032
ement+0.032
Neg logit contributions
ont-0.023
P-0.021
 gard-0.020
rev-0.020
Bel-0.020
Token,
Token ID11
Feature activation-8.86e-02

Pos logit contributions
run+0.020
ement+0.020
 gust+0.019
 brand+0.019
 anymore+0.017
Neg logit contributions
ont-0.016
 gard-0.015
P-0.014
 cor-0.014
Bel-0.014
Token it
Token ID340
Feature activation-6.37e-01

Pos logit contributions
run+0.004
ement+0.004
 gust+0.004
 brand+0.004
 anymore+0.003
Neg logit contributions
ont-0.004
 gard-0.003
P-0.003
Bel-0.003
rev-0.003
Token's
Token ID338
Feature activation-3.16e-01

Pos logit contributions
run+0.036
ement+0.035
 gust+0.035
 anymore+0.033
 brand+0.033
Neg logit contributions
ont-0.018
P-0.016
 gard-0.016
rev-0.016
Bel-0.016
Token time
Token ID640
Feature activation-1.15e+00

Pos logit contributions
run+0.013
ement+0.013
 gust+0.013
 brand+0.012
 anymore+0.011
Neg logit contributions
ont-0.014
P-0.013
 gard-0.013
Bel-0.012
 moral-0.012
Token for
Token ID329
Feature activation-1.30e+00

Pos logit contributions
ement+0.065
run+0.064
 gust+0.064
 brand+0.061
 anymore+0.059
Neg logit contributions
ont-0.033
 gard-0.029
P-0.028
Bel-0.027
 moral-0.026
Token bed
Token ID3996
Feature activation-1.47e+00

Pos logit contributions
 gust+0.063
run+0.061
ement+0.059
 brand+0.058
 anymore+0.058
Neg logit contributions
ont-0.050
Bel-0.044
P-0.044
rev-0.043
 gard-0.043
Token.
Token ID13
Feature activation-4.68e-01

Pos logit contributions
run+0.073
 gust+0.073
ement+0.071
 brand+0.067
 importantly+0.063
Neg logit contributions
ont-0.052
P-0.046
 gard-0.045
rev-0.043
 two-0.042
Token You
Token ID921
Feature activation+3.61e-01

Pos logit contributions
run+0.018
 gust+0.017
ement+0.017
 brand+0.016
 anymore+0.016
Neg logit contributions
ont-0.023
P-0.021
 gard-0.021
Bel-0.020
rev-0.020
Token need
Token ID761
Feature activation-1.90e-01

Pos logit contributions
ont+0.017
 gard+0.015
P+0.015
Bel+0.015
 moral+0.015
Neg logit contributions
run-0.014
ement-0.014
 gust-0.014
 brand-0.013
 anymore-0.012
Token to
Token ID284
Feature activation+2.52e-01

Pos logit contributions
run+0.009
 gust+0.009
 brand+0.009
 anymore+0.009
ement+0.008
Neg logit contributions
ont-0.007
rev-0.006
P-0.006
Bel-0.006
 gard-0.006
Token sleep
Token ID3993
Feature activation-9.42e-01

Pos logit contributions
ont+0.010
P+0.009
rev+0.009
Bel+0.009
 gard+0.009
Neg logit contributions
 gust-0.011
run-0.011
 anymore-0.011
ement-0.011
 brand-0.010
Token Ben
Token ID3932
Feature activation+3.56e-01

Pos logit contributions
 gust+0.021
ement+0.021
run+0.020
 brand+0.020
 importantly+0.018
Neg logit contributions
ont-0.014
 gard-0.012
 adult-0.012
P-0.012
 two-0.012
Token and
Token ID290
Feature activation+2.74e-01

Pos logit contributions
ont+0.007
 two+0.005
 parent+0.005
Bel+0.005
 adult+0.005
Neg logit contributions
 gust-0.024
ement-0.024
run-0.022
 importantly-0.022
 brand-0.022
Token Lily
Token ID20037
Feature activation-2.50e-01

Pos logit contributions
ont+0.010
 gard+0.009
P+0.009
Bel+0.008
rev+0.008
Neg logit contributions
ement-0.014
run-0.014
 gust-0.013
 anymore-0.013
 brand-0.013
Token for
Token ID329
Feature activation-7.69e-01

Pos logit contributions
 gust+0.017
ement+0.017
 importantly+0.016
run+0.016
 brand+0.016
Neg logit contributions
 two-0.004
ont-0.004
 next-0.003
 parent-0.003
 pet-0.003
Token any
Token ID597
Feature activation-6.88e-01

Pos logit contributions
ement+0.050
run+0.047
 gust+0.047
 anymore+0.044
oped+0.044
Neg logit contributions
ont-0.017
 two-0.016
 gard-0.015
 adult-0.015
 moral-0.014
Token injuries
Token ID6821
Feature activation-3.06e+00

Pos logit contributions
ement+0.034
run+0.033
 gust+0.033
 brand+0.031
oped+0.031
Neg logit contributions
ont-0.023
 adult-0.022
 gard-0.021
 moral-0.021
 two-0.020
Token and
Token ID290
Feature activation+2.43e-01

Pos logit contributions
 gust+0.195
 brand+0.188
ement+0.184
run+0.182
 importantly+0.176
Neg logit contributions
ont-0.067
 two-0.056
 next-0.050
ent-0.049
 gard-0.047
Token gave
Token ID2921
Feature activation+3.65e-01

Pos logit contributions
ont+0.009
 gard+0.008
P+0.008
 cor+0.008
 two+0.007
Neg logit contributions
ement-0.012
run-0.012
 gust-0.012
 brand-0.011
 anymore-0.011
Token them
Token ID606
Feature activation+3.56e-01

Pos logit contributions
ont+0.014
P+0.013
 gard+0.013
Bel+0.012
rev+0.012
Neg logit contributions
run-0.017
 gust-0.017
ement-0.017
 brand-0.016
 anymore-0.015
Token hugs
Token ID40657
Feature activation+8.72e-02

Pos logit contributions
ont+0.015
rev+0.014
 gard+0.014
P+0.014
 Gard+0.014
Neg logit contributions
run-0.016
 anymore-0.015
 brand-0.014
 gust-0.013
ement-0.013
Token and
Token ID290
Feature activation-1.04e-01

Pos logit contributions
ont+0.003
P+0.003
 gard+0.003
Bel+0.003
rev+0.003
Neg logit contributions
run-0.004
 gust-0.004
ement-0.004
 brand-0.004
 anymore-0.004
Token the
Token ID262
Feature activation-6.64e-01

Pos logit contributions
run+0.018
 gust+0.018
ement+0.018
 brand+0.017
ended+0.016
Neg logit contributions
ont-0.010
P-0.009
rev-0.008
 gard-0.008
Bel-0.008
Token cricket
Token ID18836
Feature activation-5.64e-01

Pos logit contributions
 gust+0.035
ement+0.034
run+0.033
 brand+0.033
 anymore+0.032
Neg logit contributions
ont-0.022
 gard-0.020
P-0.020
Bel-0.019
rev-0.019
Token,
Token ID11
Feature activation-4.50e-01

Pos logit contributions
run+0.028
 gust+0.028
ement+0.028
 brand+0.026
 anymore+0.025
Neg logit contributions
ont-0.020
P-0.018
 gard-0.018
Bel-0.017
 moral-0.016
Token it
Token ID340
Feature activation-1.49e-01

Pos logit contributions
run+0.021
ement+0.021
 gust+0.019
 brand+0.018
istered+0.018
Neg logit contributions
ont-0.017
 gard-0.017
 two-0.016
P-0.016
 moral-0.016
Token looked
Token ID3114
Feature activation-4.03e-01

Pos logit contributions
run+0.007
 gust+0.007
ement+0.007
 brand+0.006
 anymore+0.006
Neg logit contributions
ont-0.006
P-0.005
 gard-0.005
Bel-0.005
rev-0.005
Token brand
Token ID4508
Feature activation-3.66e+00

Pos logit contributions
run+0.022
oped+0.021
ement+0.020
ended+0.020
unch+0.019
Neg logit contributions
 two-0.011
 moral-0.011
ont-0.011
 gard-0.011
P-0.010
Token new
Token ID649
Feature activation-1.03e+00

Pos logit contributions
run+0.210
 gust+0.202
oped+0.200
ifying+0.195
hole+0.192
Neg logit contributions
ont-0.103
 lesson-0.094
 moral-0.090
P-0.086
 adult-0.080
Token.
Token ID13
Feature activation-4.81e-01

Pos logit contributions
 gust+0.062
run+0.062
ement+0.060
 brand+0.056
 anymore+0.055
Neg logit contributions
ont-0.027
 gard-0.023
P-0.023
Bel-0.021
rev-0.021
Token Tim
Token ID5045
Feature activation-4.97e-02

Pos logit contributions
run+0.019
ement+0.019
 gust+0.019
 anymore+0.017
 brand+0.017
Neg logit contributions
ont-0.021
 gard-0.020
P-0.019
rev-0.019
Bel-0.019
Tokenmy
Token ID1820
Feature activation+3.75e-01

Pos logit contributions
run+0.002
ement+0.002
 gust+0.002
 brand+0.002
 anymore+0.002
Neg logit contributions
ont-0.002
 gard-0.002
P-0.002
Bel-0.002
resp-0.002
Token felt
Token ID2936
Feature activation-4.32e-01

Pos logit contributions
ont+0.016
 gard+0.015
Bel+0.015
P+0.014
resp+0.014
Neg logit contributions
run-0.015
 brand-0.015
 gust-0.015
ement-0.015
ended-0.013
TokenHer
Token ID9360
Feature activation-3.84e-01

Pos logit contributions
ont+0.015
rev+0.013
 Gard+0.013
 gard+0.013
fl+0.013
Neg logit contributions
 gust-0.010
run-0.010
 brand-0.009
he-0.009
izing-0.009
Token little
Token ID1310
Feature activation-3.37e-01

Pos logit contributions
 gust+0.021
run+0.021
 brand+0.021
ement+0.020
 anymore+0.020
Neg logit contributions
ont-0.011
P-0.010
rev-0.010
Bel-0.010
 gard-0.009
Token sister
Token ID6621
Feature activation+7.24e-01

Pos logit contributions
 gust+0.019
run+0.019
ement+0.018
 anymore+0.017
 brand+0.017
Neg logit contributions
ont-0.010
P-0.009
rev-0.009
Bel-0.009
 gard-0.009
Token watched
Token ID7342
Feature activation-6.13e-02

Pos logit contributions
ont+0.031
P+0.029
Bel+0.027
 gard+0.027
rev+0.027
Neg logit contributions
run-0.031
 gust-0.030
ement-0.030
 anymore-0.028
 brand-0.028
Token in
Token ID287
Feature activation+3.22e-01

Pos logit contributions
run+0.003
ement+0.003
 gust+0.003
 brand+0.003
istered+0.003
Neg logit contributions
ont-0.002
 gard-0.002
 moral-0.001
 adult-0.001
P-0.001
Token amaz
Token ID40642
Feature activation-4.24e+00

Pos logit contributions
 two+0.008
 moral+0.007
 parent+0.007
 next+0.006
 pet+0.006
Neg logit contributions
ended-0.019
ement-0.019
izing-0.019
run-0.019
oped-0.019
Tokenement
Token ID972
Feature activation-1.80e+00

Pos logit contributions
run+0.070
 gust+0.055
 haven+0.046
 exhausted+0.044
istered+0.039
Neg logit contributions
ont-0.306
P-0.291
 gard-0.273
 adult-0.271
ent-0.267
Token.
Token ID13
Feature activation-5.53e-01

Pos logit contributions
run+0.081
 gust+0.078
ement+0.078
 brand+0.076
 anymore+0.068
Neg logit contributions
ont-0.071
 gard-0.066
Bel-0.063
P-0.063
fl-0.062
Token She
Token ID1375
Feature activation-2.95e-01

Pos logit contributions
run+0.022
ement+0.021
 gust+0.021
 brand+0.020
 anymore+0.019
Neg logit contributions
ont-0.025
 gard-0.023
P-0.023
Bel-0.023
rev-0.022
Token climbed
Token ID19952
Feature activation-1.04e+00

Pos logit contributions
run+0.013
 gust+0.013
ement+0.013
 brand+0.012
 anymore+0.012
Neg logit contributions
ont-0.012
P-0.011
 gard-0.011
Bel-0.010
rev-0.010
Token up
Token ID510
Feature activation+4.18e-01

Pos logit contributions
run+0.040
 gust+0.039
ement+0.039
 brand+0.038
 anymore+0.037
Neg logit contributions
ont-0.049
P-0.044
 gard-0.044
rev-0.043
resp-0.042
Token wings
Token ID12098
Feature activation-3.46e-01

Pos logit contributions
ont+0.012
rev+0.011
P+0.010
 gard+0.010
Bel+0.010
Neg logit contributions
 gust-0.014
ement-0.014
run-0.013
 anymore-0.013
 brand-0.013
Token.
Token ID13
Feature activation+7.38e-02

Pos logit contributions
 gust+0.021
run+0.021
ement+0.021
 brand+0.020
 importantly+0.019
Neg logit contributions
ont-0.008
fl-0.007
 two-0.007
 cor-0.006
 gard-0.006
Token He
Token ID679
Feature activation+1.85e-01

Pos logit contributions
ont+0.004
P+0.004
 gard+0.003
Bel+0.003
rev+0.003
Neg logit contributions
run-0.003
 gust-0.003
ement-0.003
 brand-0.002
 anymore-0.002
Token watched
Token ID7342
Feature activation+3.58e-01

Pos logit contributions
ont+0.009
rev+0.008
 Gard+0.008
 gard+0.008
P+0.008
Neg logit contributions
run-0.007
 gust-0.007
 brand-0.006
ement-0.006
 badly-0.006
Token in
Token ID287
Feature activation+5.92e-01

Pos logit contributions
ont+0.016
P+0.015
 gard+0.015
Bel+0.014
rev+0.014
Neg logit contributions
run-0.014
ement-0.014
 gust-0.014
 brand-0.013
 anymore-0.013
Token amaz
Token ID40642
Feature activation-3.93e+00

Pos logit contributions
 two+0.027
 gard+0.026
 moral+0.026
ont+0.026
 cor+0.025
Neg logit contributions
run-0.022
ement-0.022
ended-0.021
istered-0.021
oped-0.020
Tokenement
Token ID972
Feature activation-1.50e+00

Pos logit contributions
run+0.056
 gust+0.047
 haven+0.035
 exhausted+0.034
istered+0.032
Neg logit contributions
ont-0.288
P-0.270
 gard-0.258
 adult-0.257
rev-0.254
Token as
Token ID355
Feature activation-7.95e-01

Pos logit contributions
run+0.058
 gust+0.057
ement+0.056
 brand+0.054
 anymore+0.049
Neg logit contributions
ont-0.068
 gard-0.063
Bel-0.061
P-0.060
 cor-0.060
Token it
Token ID340
Feature activation+1.22e+00

Pos logit contributions
run+0.036
ement+0.035
 gust+0.034
 brand+0.032
 anymore+0.031
Neg logit contributions
ont-0.032
 gard-0.030
 moral-0.029
Bel-0.029
 cor-0.029
Token flew
Token ID13112
Feature activation-4.58e-01

Pos logit contributions
ont+0.047
P+0.043
 gard+0.042
Bel+0.042
rev+0.040
Neg logit contributions
run-0.057
ement-0.056
 gust-0.055
 brand-0.053
 anymore-0.052
Token away
Token ID1497
Feature activation+1.39e+00

Pos logit contributions
run+0.021
ement+0.021
 gust+0.020
 brand+0.019
 anymore+0.019
Neg logit contributions
ont-0.018
P-0.017
 gard-0.017
Bel-0.016
rev-0.016
Token it
Token ID340
Feature activation-4.19e-01

Pos logit contributions
run+0.028
ement+0.028
 gust+0.028
 brand+0.026
 anymore+0.026
Neg logit contributions
ont-0.021
P-0.019
 gard-0.019
Bel-0.018
rev-0.018
Token often
Token ID1690
Feature activation-3.63e-01

Pos logit contributions
run+0.020
 gust+0.020
ement+0.019
 brand+0.018
 anymore+0.017
Neg logit contributions
ont-0.016
 gard-0.015
P-0.015
Bel-0.014
 moral-0.014
Token and
Token ID290
Feature activation-3.00e-01

Pos logit contributions
 gust+0.023
ement+0.023
run+0.022
 importantly+0.021
ended+0.021
Neg logit contributions
ont-0.008
 gard-0.008
 two-0.007
 next-0.007
fl-0.007
Token watched
Token ID7342
Feature activation+1.64e-01

Pos logit contributions
ement+0.016
run+0.015
 gust+0.015
 brand+0.014
 anymore+0.014
Neg logit contributions
ont-0.010
 gard-0.009
Bel-0.009
P-0.009
rev-0.009
Token in
Token ID287
Feature activation+4.21e-01

Pos logit contributions
ont+0.006
 gard+0.005
 cor+0.005
 adult+0.005
 moral+0.005
Neg logit contributions
run-0.008
ement-0.008
 gust-0.008
 brand-0.007
ited-0.007
Token amaz
Token ID40642
Feature activation-4.36e+00

Pos logit contributions
 two+0.012
 moral+0.012
 parent+0.011
 parents+0.010
ont+0.010
Neg logit contributions
oped-0.023
ement-0.023
ended-0.023
 gust-0.023
unch-0.023
Tokenement
Token ID972
Feature activation-1.71e+00

Pos logit contributions
run+0.079
 gust+0.069
 haven+0.065
 exhausted+0.060
istered+0.058
Neg logit contributions
ont-0.297
P-0.282
 adult-0.270
ent-0.267
 two-0.264
Token whenever
Token ID8797
Feature activation-5.80e-01

Pos logit contributions
 gust+0.083
run+0.083
ement+0.082
 brand+0.078
 anymore+0.072
Neg logit contributions
ont-0.060
 gard-0.053
Bel-0.052
fl-0.051
 cor-0.051
Token the
Token ID262
Feature activation-8.35e-02

Pos logit contributions
run+0.019
 gust+0.019
ement+0.019
 brand+0.017
 anymore+0.017
Neg logit contributions
ont-0.030
P-0.028
 gard-0.028
Bel-0.027
rev-0.027
Token door
Token ID3420
Feature activation-1.07e+00

Pos logit contributions
run+0.004
ement+0.004
 gust+0.004
 brand+0.004
 anymore+0.004
Neg logit contributions
ont-0.003
 gard-0.003
P-0.003
Bel-0.003
rev-0.003
Token opened
Token ID4721
Feature activation-4.11e-01

Pos logit contributions
 gust+0.063
run+0.063
 brand+0.059
ement+0.059
istered+0.058
Neg logit contributions
ont-0.025
 two-0.022
 moral-0.022
 gard-0.022
 parent-0.020