TOP ACTIVATIONS
MAX = 0.135

canned and jar red and bottled and vacuum - se aled
the store and buy some bottled water . Lily was happy
" G reetings , brave rabbit . I
friendly voice , " G reetings , little one . Why
friendly voice . " G reetings Jack ! Are you here
gen ie . " G reetings , kind one !" said
spoke up . " G reetings , le opard ," it
and said , " G reetings Sir . Do you speak
dog replied , " G reetings , little one . Don
A ww , okay . ”
He said , " E ww , this is the worst
at it said " ew ww !" One
She said , " E ww , disgusting !" She threw
be bitter , like e ww . Do you want to
and said , " E ww , Bruno . You are
. " A ww , it is so cute
and said , " A ww www , do I have
!" " A ww !" Lily says . "
. " E ww !" she screamed .
the ground . " E ww , what 's that ?"

MIN ACTIVATIONS
MIN = -0.125

: Sharing is caring . Pride goes before a fall .
the tree to its once former beauty . He was happy
. One day , Wh isk ers woke up from his
Ch ir py and Wh isk ers knew that poison was
day , Spot and Wh isk ers went to the park
" We are the weapon creators ! We can stop the
Sally told him , Wh isk ers always wanted to help
. One day , Wh isk ers saw a bird in
. One day , Wh isk ers saw a bird outside
. So Wh isk ers continued to eat and
ink . Mr . Wh isk ers went to the store
, " I am Wh isk ers . I am too
ch ubby cat named Wh isk ers . Tim my loved
bark ed , but Wh isk ers did not care .
a naughty cat named Wh isk ers . Wh isk ers
Lucy and Mr . Wh isk ers went for a walk
flew down and scared Wh isk ers away .
Wh isk ers . Wh isk ers loved to play with
loved to play with Wh isk ers , but one day
outside the door . Wh isk ers wanted to catch the

INTERVAL 0.070 - 0.135
CONTAINS 0.070%

' tick - t ock ', and they would both be
looks very nice , sweet heart ," she said .
. It was so y ummy ! M olly
O w ! That hurt . ”
make him the sp ic iest food she could think of

INTERVAL 0.005 - 0.070
CONTAINS 29.481%

, and they ran in the grass . At
finished her crack ers and went back inside to play with
made a new friend and shared her gift . They ran
sorry , I didn 't mean to kick it so hard
infant wanted to play with the ball , but it was

INTERVAL -0.060 - 0.005
CONTAINS 70.406%

found a nice , warm spot to hide from the cold
my friend ? ” Bob smiled and said
was always anxious because he didn 't know where he belonged
there , but Sam is faster . He passes Ben and
her favorite song and put it into her pocket .

INTERVAL -0.125 - -0.060
CONTAINS 0.044%

bright fish they all seemed to
" Look , Mitt ens ," said Lily .
. They make neck l aces , brace lets , and
saw a letter from the bank . The letter said he
touched things that are not theirs again . They played in
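
The INTERVAL / CONTAINS headers above read like a four-bin histogram over the feature's activations. The following is a minimal sketch of how such shares could be computed, not the dashboard's actual code. The edge values are copied from the headers. The function name `interval_shares`, the half-open binning rule, and the choice to ignore out-of-range values are assumptions made for illustration.

```python
from bisect import bisect_right

# Interval edges copied from the INTERVAL headers, lowest to highest.
EDGES = [-0.125, -0.060, 0.005, 0.070, 0.135]


def interval_shares(activations, edges=EDGES):
    """Return the percentage of activations in each interval, lowest bin first.

    Assumptions (hypothetical, not taken from the dashboard): bins are
    half-open [lo, hi), the top bin also includes the maximum edge, and
    values outside the overall range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        if a < edges[0] or a > edges[-1]:
            continue  # outside the plotted range
        # bisect_right returns the number of edges <= a; subtract 1 for the bin index.
        idx = min(bisect_right(edges, a) - 1, len(counts) - 1)
        counts[idx] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# A handful of activation values taken from the per-token records in this dump.
sample = [0.135, 0.124, 0.00975, -0.004, -0.0512, -0.125]
shares = interval_shares(sample)
```

As a sanity check on the reported figures, the four CONTAINS values (0.070%, 29.481%, 70.406%, 0.044%) sum to 100.001%, which is consistent with each share being rounded to three decimals.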
Token canned
Token ID32530
Feature activation-5.21e-03

Pos logit contributions
 contain+0.000
 banks+0.000
 tum+0.000
 allowance+0.000
ured+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
hem-0.000
asc-0.000
Token and
Token ID290
Feature activation-6.42e-04

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.008
 symp+0.008
tted+0.008
Neg logit contributions
ww-0.008
w-0.008
o-0.008
ins-0.007
sets-0.007
Token jar
Token ID17379
Feature activation-1.80e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 tum+0.001
 allowance+0.001
ured+0.001
Neg logit contributions
w-0.001
ww-0.001
o-0.001
hem-0.001
asc-0.001
Tokenred
Token ID445
Feature activation+1.95e-02

Pos logit contributions
 contain+0.028
 banks+0.026
 allowance+0.026
 symp+0.026
 tum+0.025
Neg logit contributions
ww-0.029
w-0.029
o-0.029
sets-0.026
ins-0.026
Token and
Token ID290
Feature activation-5.63e-04

Pos logit contributions
ww+0.032
o+0.031
w+0.031
sets+0.028
ins+0.028
Neg logit contributions
 contain-0.032
 banks-0.029
 allowance-0.028
 tum-0.028
 symp-0.027
Token bottled
Token ID34123
Feature activation+1.35e-01

Pos logit contributions
 contain+0.001
 banks+0.001
 tum+0.001
 allowance+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
asc-0.001
Token and
Token ID290
Feature activation-4.69e-04

Pos logit contributions
ww+0.225
w+0.218
o+0.206
hem+0.190
sets+0.185
Neg logit contributions
 contain-0.217
 banks-0.211
 allowance-0.200
ured-0.199
 tum-0.199
Token vacuum
Token ID17076
Feature activation-5.11e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 tum+0.001
 allowance+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
asc-0.001
Token-
Token ID12
Feature activation+4.69e-02

Pos logit contributions
 contain+0.088
 banks+0.083
 allowance+0.081
ured+0.079
 tum+0.079
Neg logit contributions
ww-0.078
w-0.076
o-0.075
sets-0.066
hem-0.066
Tokense
Token ID325
Feature activation+2.49e-02

Pos logit contributions
ww+0.077
w+0.076
o+0.074
sets+0.067
hem+0.067
Neg logit contributions
 contain-0.075
 banks-0.071
 allowance-0.068
ured-0.067
 symp-0.066
Tokenaled
Token ID3021
Feature activation-3.29e-02

Pos logit contributions
w+0.041
ww+0.041
o+0.040
ins+0.036
hem+0.035
Neg logit contributions
 contain-0.040
 banks-0.036
 tum-0.036
 allowance-0.035
 symp-0.035
Token the
Token ID262
Feature activation+1.11e-02

Pos logit contributions
 contain+0.010
tted+0.010
ured+0.010
 widened+0.010
 symp+0.009
Neg logit contributions
 my-0.007
o-0.007
 yourselves-0.007
 yourself-0.007
 your-0.006
Token store
Token ID3650
Feature activation+1.82e-03

Pos logit contributions
ww+0.013
o+0.013
w+0.013
hem+0.012
 yourselves+0.012
Neg logit contributions
 contain-0.022
 symp-0.020
 allowance-0.020
ured-0.020
 widened-0.020
Token and
Token ID290
Feature activation+1.15e-03

Pos logit contributions
o+0.003
w+0.002
ww+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.003
 banks-0.003
 symp-0.003
ured-0.003
 widened-0.003
Token buy
Token ID2822
Feature activation-2.97e-02

Pos logit contributions
o+0.002
ww+0.002
w+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.001
 banks-0.001
 symp-0.001
 allowance-0.001
ured-0.001
Token some
Token ID617
Feature activation-1.28e-02

Pos logit contributions
 contain+0.052
 banks+0.048
tted+0.047
ured+0.047
 symp+0.047
Neg logit contributions
ww-0.045
w-0.044
o-0.043
ries-0.039
 my-0.039
Token bottled
Token ID34123
Feature activation+1.24e-01

Pos logit contributions
 contain+0.024
 banks+0.023
 symp+0.022
 allowance+0.022
ured+0.022
Neg logit contributions
ww-0.018
w-0.017
o-0.017
hem-0.015
sets-0.015
Token water
Token ID1660
Feature activation+9.75e-03

Pos logit contributions
ww+0.201
w+0.194
o+0.185
hem+0.169
sets+0.166
Neg logit contributions
 banks-0.202
 contain-0.200
 allowance-0.192
 tum-0.189
ured-0.183
Token.
Token ID13
Feature activation-4.00e-03

Pos logit contributions
ww+0.018
w+0.018
o+0.017
hem+0.016
sets+0.016
Neg logit contributions
 contain-0.013
 banks-0.013
 allowance-0.012
 tum-0.012
ured-0.012
Token Lily
Token ID20037
Feature activation+5.85e-03

Pos logit contributions
 contain+0.007
 banks+0.006
 allowance+0.006
ured+0.006
tted+0.006
Neg logit contributions
ww-0.006
o-0.006
w-0.006
hem-0.005
sets-0.005
Token was
Token ID373
Feature activation-6.67e-04

Pos logit contributions
o+0.008
w+0.008
ww+0.008
my+0.007
very+0.007
Neg logit contributions
 contain-0.011
 allowance-0.010
 banks-0.010
ids-0.010
ured-0.010
Token happy
Token ID3772
Feature activation-3.73e-03

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
isc+0.001
 symp+0.001
Neg logit contributions
o-0.001
 my-0.001
ww-0.001
w-0.001
ep-0.001
Token
Token ID220
Feature activation-1.46e-03

Pos logit contributions
 contain+0.009
 banks+0.009
 allowance+0.008
ured+0.008
 symp+0.008
Neg logit contributions
ww-0.009
w-0.009
o-0.009
sets-0.008
hem-0.008
Token
Token ID198
Feature activation+1.49e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
ured+0.003
 symp+0.002
Neg logit contributions
o-0.002
ww-0.002
w-0.002
hem-0.002
sets-0.002
Token
Token ID198
Feature activation+1.24e-03

Pos logit contributions
ww+0.002
w+0.002
o+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.003
 banks-0.003
 allowance-0.002
 symp-0.002
ured-0.002
Token"
Token ID1
Feature activation+1.89e-03

Pos logit contributions
ww+0.002
w+0.002
o+0.002
hem+0.001
sets+0.001
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
 tum-0.002
 symp-0.002
TokenG
Token ID38
Feature activation-5.03e-04

Pos logit contributions
ww+0.003
w+0.003
o+0.002
ries+0.002
hem+0.002
Neg logit contributions
 contain-0.003
tted-0.003
 widened-0.003
 symp-0.003
 banks-0.003
Tokenreetings
Token ID46648
Feature activation+1.21e-01

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 symp+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Token,
Token ID11
Feature activation+1.71e-03

Pos logit contributions
w+0.205
ww+0.203
o+0.195
hem+0.178
sets+0.175
Neg logit contributions
 contain-0.182
 banks-0.176
 allowance-0.169
 tum-0.168
 symp-0.167
Token brave
Token ID14802
Feature activation-2.97e-03

Pos logit contributions
w+0.003
ww+0.003
o+0.003
ries+0.003
sets+0.003
Neg logit contributions
 contain-0.002
 banks-0.002
 tum-0.002
 symp-0.002
ured-0.002
Token rabbit
Token ID22746
Feature activation+1.48e-03

Pos logit contributions
 banks+0.005
 contain+0.005
 tum+0.004
 allowance+0.004
 symp+0.004
Neg logit contributions
w-0.005
ww-0.005
o-0.004
sets-0.004
 yourselves-0.004
Token.
Token ID13
Feature activation-5.43e-03

Pos logit contributions
ww+0.002
w+0.002
sets+0.002
o+0.002
hem+0.002
Neg logit contributions
 contain-0.002
 banks-0.002
 tum-0.002
 bank-0.002
 aid-0.002
Token I
Token ID314
Feature activation+9.57e-03

Pos logit contributions
 contain+0.010
 banks+0.010
tted+0.010
ured+0.010
 tum+0.010
Neg logit contributions
w-0.007
ww-0.007
o-0.006
ries-0.006
ins-0.006
Token friendly
Token ID8030
Feature activation+2.28e-03

Pos logit contributions
 contain+0.012
 banks+0.011
 tum+0.011
 symp+0.011
 allowance+0.011
Neg logit contributions
w-0.011
ww-0.011
o-0.011
sets-0.010
hem-0.010
Token voice
Token ID3809
Feature activation+7.30e-03

Pos logit contributions
w+0.004
ww+0.004
o+0.004
ins+0.004
ries+0.004
Neg logit contributions
 contain-0.003
 tum-0.003
 banks-0.003
 symp-0.003
 aid-0.003
Token,
Token ID11
Feature activation+1.88e-03

Pos logit contributions
ww+0.015
w+0.015
o+0.014
sets+0.013
 yourselves+0.013
Neg logit contributions
 contain-0.009
ured-0.008
 banks-0.008
 symp-0.008
 tum-0.008
Token "
Token ID366
Feature activation-4.25e-03

Pos logit contributions
w+0.004
ww+0.004
o+0.003
hem+0.003
loo+0.003
Neg logit contributions
 symp-0.002
 contain-0.002
ured-0.002
 banks-0.002
 widened-0.002
TokenG
Token ID38
Feature activation-1.05e-03

Pos logit contributions
 contain+0.007
+0.006
tted+0.006
 symp+0.006
ured+0.006
Neg logit contributions
ww-0.007
ries-0.007
w-0.006
loo-0.006
imi-0.006
Tokenreetings
Token ID46648
Feature activation+1.21e-01

Pos logit contributions
 contain+0.002
 banks+0.002
tted+0.002
ured+0.002
 symp+0.002
Neg logit contributions
ww-0.002
w-0.001
o-0.001
sets-0.001
hem-0.001
Token,
Token ID11
Feature activation+2.07e-03

Pos logit contributions
w+0.216
ww+0.213
o+0.205
hem+0.188
iren+0.183
Neg logit contributions
 contain-0.171
 banks-0.164
 allowance-0.159
 tum-0.156
ured-0.155
Token little
Token ID1310
Feature activation+9.78e-03

Pos logit contributions
w+0.004
ww+0.004
o+0.003
rob+0.003
ries+0.003
Neg logit contributions
 contain-0.003
 banks-0.003
ured-0.003
 tum-0.003
 allowance-0.003
Token one
Token ID530
Feature activation-1.56e-02

Pos logit contributions
o+0.014
w+0.014
ww+0.014
hem+0.012
iren+0.012
Neg logit contributions
 contain-0.016
 banks-0.016
 allowance-0.016
 tum-0.015
 symp-0.015
Token.
Token ID13
Feature activation-2.29e-03

Pos logit contributions
 contain+0.022
+0.020
 believe+0.019
 banks+0.017
 tum+0.017
Neg logit contributions
w-0.028
rob-0.027
ucks-0.027
ww-0.027
ries-0.026
Token Why
Token ID4162
Feature activation+1.54e-02

Pos logit contributions
 contain+0.004
 banks+0.004
ured+0.004
+0.004
tted+0.004
Neg logit contributions
w-0.003
ww-0.003
ries-0.003
ins-0.003
 cream-0.003
Token friendly
Token ID8030
Feature activation+1.97e-03

Pos logit contributions
 contain+0.013
 banks+0.012
 tum+0.012
 symp+0.012
 allowance+0.012
Neg logit contributions
w-0.014
ww-0.014
o-0.013
sets-0.012
ries-0.012
Token voice
Token ID3809
Feature activation+7.35e-03

Pos logit contributions
w+0.004
ww+0.004
o+0.003
ins+0.003
sets+0.003
Neg logit contributions
 contain-0.003
 tum-0.003
 banks-0.003
 aid-0.003
 symp-0.003
Token.
Token ID13
Feature activation-5.73e-03

Pos logit contributions
ww+0.013
w+0.013
o+0.013
sets+0.012
hem+0.012
Neg logit contributions
 contain-0.011
 banks-0.010
 allowance-0.010
 tum-0.009
 symp-0.009
Token "
Token ID366
Feature activation-5.02e-03

Pos logit contributions
 contain+0.008
 tum+0.007
 banks+0.007
 symp+0.007
ured+0.007
Neg logit contributions
w-0.012
ww-0.011
o-0.010
rob-0.010
ries-0.010
TokenG
Token ID38
Feature activation-5.66e-04

Pos logit contributions
 contain+0.009
tted+0.008
 symp+0.008
ured+0.008
+0.008
Neg logit contributions
ww-0.007
w-0.007
ries-0.007
o-0.006
sets-0.006
Tokenreetings
Token ID46648
Feature activation+1.21e-01

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
tted+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Token Jack
Token ID3619
Feature activation+2.08e-02

Pos logit contributions
w+0.220
ww+0.217
o+0.208
hem+0.193
iren+0.189
Neg logit contributions
 contain-0.169
 banks-0.158
 allowance-0.155
 tum-0.152
 symp-0.151
Token!
Token ID0
Feature activation+1.44e-02

Pos logit contributions
ww+0.033
w+0.033
hem+0.030
o+0.030
Look+0.030
Neg logit contributions
 contain-0.032
 banks-0.030
 allowance-0.029
 symp-0.029
 tum-0.028
Token Are
Token ID4231
Feature activation+6.26e-02

Pos logit contributions
w+0.020
ww+0.019
ries+0.018
ins+0.017
 cream+0.017
Neg logit contributions
 contain-0.027
tted-0.025
-0.025
 tum-0.024
 aid-0.024
Token you
Token ID345
Feature activation-2.05e-02

Pos logit contributions
ww+0.057
w+0.050
hem+0.048
o+0.048
sets+0.046
Neg logit contributions
 contain-0.147
 allowance-0.135
 symp-0.133
ured-0.133
 tum-0.132
Token here
Token ID994
Feature activation-4.42e-02

Pos logit contributions
 contain+0.036
 banks+0.034
 allowance+0.033
ured+0.032
 tum+0.032
Neg logit contributions
ww-0.032
w-0.031
o-0.029
hem-0.027
rob-0.026
Token gen
Token ID2429
Feature activation+4.21e-02

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
 tum+0.011
 symp+0.011
Neg logit contributions
ww-0.011
w-0.011
o-0.011
sets-0.010
hem-0.010
Tokenie
Token ID494
Feature activation+2.84e-02

Pos logit contributions
ww+0.047
w+0.041
o+0.041
sets+0.038
 yourselves+0.036
Neg logit contributions
 contain-0.090
 symp-0.083
 banks-0.082
ured-0.082
 allowance-0.081
Token.
Token ID13
Feature activation-4.08e-03

Pos logit contributions
ww+0.050
w+0.049
o+0.048
sets+0.043
hem+0.043
Neg logit contributions
 contain-0.043
 banks-0.040
 allowance-0.039
ured-0.038
 tum-0.038
Token "
Token ID366
Feature activation-4.99e-03

Pos logit contributions
 contain+0.005
 banks+0.004
 allowance+0.004
 tum+0.004
ured+0.004
Neg logit contributions
w-0.009
ww-0.009
o-0.008
sets-0.008
hem-0.008
TokenG
Token ID38
Feature activation-9.93e-04

Pos logit contributions
 contain+0.009
 symp+0.009
tted+0.009
 widened+0.009
 banks+0.008
Neg logit contributions
ww-0.006
o-0.006
sets-0.006
w-0.006
hem-0.006
Tokenreetings
Token ID46648
Feature activation+1.20e-01

Pos logit contributions
 contain+0.002
 banks+0.001
 symp+0.001
 allowance+0.001
tted+0.001
Neg logit contributions
ww-0.002
w-0.002
o-0.001
sets-0.001
hem-0.001
Token,
Token ID11
Feature activation+1.45e-03

Pos logit contributions
w+0.212
ww+0.210
o+0.203
hem+0.187
iren+0.183
Neg logit contributions
 contain-0.174
 banks-0.162
 tum-0.159
 allowance-0.159
 symp-0.157
Token kind
Token ID1611
Feature activation-3.48e-03

Pos logit contributions
w+0.003
ww+0.002
o+0.002
rob+0.002
ins+0.002
Neg logit contributions
 contain-0.002
 tum-0.002
 symp-0.002
 allowance-0.002
-0.002
Token one
Token ID530
Feature activation-1.57e-02

Pos logit contributions
 contain+0.005
 tum+0.005
 banks+0.005
 symp+0.005
 allowance+0.005
Neg logit contributions
w-0.006
ww-0.006
o-0.006
rob-0.005
sets-0.005
Token!"
Token ID2474
Feature activation+1.34e-02

Pos logit contributions
 contain+0.018
 tum+0.016
 believe+0.016
+0.015
 allowance+0.014
Neg logit contributions
irens-0.031
rob-0.031
w-0.031
ww-0.030
Look-0.030
Token said
Token ID531
Feature activation-1.37e-03

Pos logit contributions
ww+0.022
w+0.021
o+0.020
asc+0.019
ins+0.018
Neg logit contributions
 contain-0.023
 banks-0.021
 tum-0.021
 allowance-0.020
 symp-0.020
Token spoke
Token ID5158
Feature activation+1.11e-02

Pos logit contributions
ww+0.009
w+0.009
o+0.008
hem+0.007
sets+0.007
Neg logit contributions
 contain-0.014
 banks-0.013
 allowance-0.013
 tum-0.013
 symp-0.013
Token up
Token ID510
Feature activation-2.83e-02

Pos logit contributions
ww+0.018
o+0.018
w+0.018
sets+0.016
hem+0.016
Neg logit contributions
 contain-0.018
 banks-0.017
 allowance-0.017
 tum-0.016
tted-0.016
Token.
Token ID13
Feature activation-2.77e-03

Pos logit contributions
 contain+0.042
 banks+0.039
 allowance+0.038
ured+0.037
 symp+0.037
Neg logit contributions
ww-0.050
w-0.049
o-0.048
sets-0.044
hem-0.043
Token "
Token ID366
Feature activation-4.80e-03

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
 symp+0.004
 tum+0.004
Neg logit contributions
ww-0.005
w-0.005
o-0.004
sets-0.004
hem-0.004
TokenG
Token ID38
Feature activation-6.29e-04

Pos logit contributions
 contain+0.009
tted+0.008
 symp+0.008
 aid+0.008
+0.008
Neg logit contributions
ww-0.007
w-0.006
o-0.006
sets-0.006
ries-0.006
Tokenreetings
Token ID46648
Feature activation+1.20e-01

Pos logit contributions
 contain+0.001
 banks+0.001
tted+0.001
 allowance+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Token,
Token ID11
Feature activation+2.14e-03

Pos logit contributions
w+0.214
ww+0.211
o+0.205
hem+0.189
iren+0.185
Neg logit contributions
 contain-0.165
 banks-0.159
 allowance-0.155
 tum-0.153
ured-0.152
Token le
Token ID443
Feature activation-1.54e-02

Pos logit contributions
w+0.004
ww+0.004
o+0.004
sets+0.004
hem+0.004
Neg logit contributions
 contain-0.003
 banks-0.003
 tum-0.003
 symp-0.003
 allowance-0.003
Tokenopard
Token ID15478
Feature activation+1.12e-02

Pos logit contributions
 banks+0.025
tted+0.024
 contain+0.024
ured+0.024
 bank+0.023
Neg logit contributions
ww-0.023
w-0.021
sets-0.021
very-0.020
o-0.020
Token,"
Token ID553
Feature activation+1.43e-02

Pos logit contributions
ww+0.019
w+0.019
sets+0.017
o+0.017
hem+0.017
Neg logit contributions
 contain-0.016
 banks-0.016
 tum-0.015
ured-0.015
 symp-0.015
Token it
Token ID340
Feature activation-2.22e-04

Pos logit contributions
w+0.023
ww+0.023
o+0.022
asc+0.020
ins+0.020
Neg logit contributions
 contain-0.024
 banks-0.022
 symp-0.022
ured-0.021
 tum-0.021
Token and
Token ID290
Feature activation-7.65e-04

Pos logit contributions
ww+0.001
o+0.001
w+0.001
sets+0.001
hem+0.001
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
tted-0.002
ured-0.002
Token said
Token ID531
Feature activation-2.27e-03

Pos logit contributions
 contain+0.001
 widened+0.001
ured+0.001
 tum+0.001
 banks+0.001
Neg logit contributions
w-0.001
ww-0.001
 yourselves-0.001
o-0.001
asc-0.001
Token,
Token ID11
Feature activation-7.12e-04

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
tted+0.004
 symp+0.004
Neg logit contributions
ww-0.003
o-0.003
w-0.003
hem-0.003
sets-0.003
Token "
Token ID366
Feature activation-6.61e-03

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 symp+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
sets-0.001
TokenG
Token ID38
Feature activation-2.38e-03

Pos logit contributions
 contain+0.012
 widened+0.010
+0.010
 symp+0.010
ured+0.010
Neg logit contributions
ww-0.009
w-0.009
loo-0.009
imi-0.009
o-0.009
Tokenreetings
Token ID46648
Feature activation+1.19e-01

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
ured+0.004
 symp+0.004
Neg logit contributions
ww-0.004
w-0.003
o-0.003
sets-0.003
hem-0.003
Token Sir
Token ID7361
Feature activation+3.54e-03

Pos logit contributions
w+0.224
ww+0.220
o+0.213
hem+0.194
sets+0.190
Neg logit contributions
 contain-0.157
 banks-0.149
 allowance-0.145
 tum-0.143
ured-0.141
Token.
Token ID13
Feature activation-7.02e-03

Pos logit contributions
ww+0.006
w+0.006
o+0.005
hem+0.005
sets+0.005
Neg logit contributions
 contain-0.006
 banks-0.005
 allowance-0.005
ured-0.005
 tum-0.005
Token Do
Token ID2141
Feature activation+3.83e-02

Pos logit contributions
 contain+0.012
 allowance+0.011
 banks+0.011
 symp+0.011
tted+0.011
Neg logit contributions
w-0.011
ww-0.011
ins-0.010
ries-0.009
 cream-0.009
Token you
Token ID345
Feature activation-2.11e-02

Pos logit contributions
ww+0.023
w+0.022
o+0.021
hem+0.016
sets+0.016
Neg logit contributions
 contain-0.101
 banks-0.096
 allowance-0.095
ured-0.093
 tum-0.093
Token speak
Token ID2740
Feature activation+1.06e-02

Pos logit contributions
 banks+0.029
 contain+0.027
 symp+0.027
ured+0.027
 allowance+0.026
Neg logit contributions
o-0.039
w-0.038
ww-0.038
my-0.033
 yourself-0.033
Token dog
Token ID3290
Feature activation-5.86e-03

Pos logit contributions
 contain+0.057
 banks+0.055
 allowance+0.054
 tum+0.053
 aid+0.053
Neg logit contributions
w-0.052
ww-0.051
o-0.050
sets-0.048
ries-0.047
Token replied
Token ID8712
Feature activation+7.20e-03

Pos logit contributions
 contain+0.009
 banks+0.009
 symp+0.008
 tum+0.008
ured+0.008
Neg logit contributions
ww-0.010
w-0.010
o-0.009
sets-0.009
hem-0.009
Token,
Token ID11
Feature activation-1.56e-04

Pos logit contributions
o+0.011
ww+0.011
w+0.010
hem+0.009
 my+0.009
Neg logit contributions
 contain-0.013
 banks-0.012
 allowance-0.012
tted-0.011
 symp-0.011
Token "
Token ID366
Feature activation-6.59e-03

Pos logit contributions
 contain+0.000
 banks+0.000
ured+0.000
 symp+0.000
 tum+0.000
Neg logit contributions
w-0.000
ww-0.000
o-0.000
hem-0.000
rob-0.000
TokenG
Token ID38
Feature activation-3.37e-03

Pos logit contributions
 contain+0.011
+0.010
tted+0.010
 symp+0.010
 banks+0.010
Neg logit contributions
ries-0.010
loo-0.010
ww-0.010
ons-0.009
sets-0.009
Tokenreetings
Token ID46648
Feature activation+1.14e-01

Pos logit contributions
 contain+0.005
 banks+0.005
tted+0.005
 allowance+0.005
ured+0.005
Neg logit contributions
ww-0.005
w-0.005
o-0.005
sets-0.005
hem-0.004
Token,
Token ID11
Feature activation+8.83e-04

Pos logit contributions
w+0.200
ww+0.196
o+0.186
hem+0.175
sets+0.172
Neg logit contributions
 contain-0.160
 banks-0.155
 tum-0.151
 allowance-0.149
 symp-0.148
Token little
Token ID1310
Feature activation+9.78e-03

Pos logit contributions
w+0.002
ww+0.001
o+0.001
sets+0.001
ries+0.001
Neg logit contributions
 contain-0.001
 tum-0.001
 banks-0.001
 aid-0.001
ured-0.001
Token one
Token ID530
Feature activation-1.63e-02

Pos logit contributions
w+0.014
ww+0.014
o+0.013
sets+0.013
ries+0.013
Neg logit contributions
 contain-0.016
 banks-0.016
 tum-0.015
 aid-0.015
 allowance-0.015
Token.
Token ID13
Feature activation-3.88e-03

Pos logit contributions
 contain+0.021
+0.019
 believe+0.018
 tum+0.018
 banks+0.017
Neg logit contributions
w-0.030
irens-0.029
ries-0.029
ucks-0.029
sets-0.029
Token Don
Token ID2094
Feature activation-1.45e-02

Pos logit contributions
 contain+0.007
 banks+0.006
tted+0.006
 allowance+0.006
 tum+0.006
Neg logit contributions
w-0.006
ww-0.005
ries-0.005
ins-0.005
sets-0.005
Token
Token ID95
Feature activation+2.70e-02

Pos logit contributions
ww+0.004
w+0.004
o+0.004
www+0.004
very+0.004
Neg logit contributions
 contain-0.004
 banks-0.004
 allowance-0.004
 tum-0.003
 aid-0.003
Token
Token ID26391
Feature activation-3.14e-02

Pos logit contributions
ww+0.046
w+0.045
o+0.045
hem+0.040
sets+0.040
Neg logit contributions
 contain-0.042
 banks-0.039
 allowance-0.038
 tum-0.037
 symp-0.037
Token
Token ID129
Feature activation+5.28e-03

Pos logit contributions
 tum+0.045
 contain+0.045
 banks+0.044
 allowance+0.043
 symp+0.043
Neg logit contributions
ww-0.051
w-0.051
o-0.048
hem-0.047
sets-0.046
Token
Token ID241
Feature activation+4.35e-03

Pos logit contributions
ww+0.010
w+0.010
o+0.010
 my+0.009
 yourselves+0.009
Neg logit contributions
 contain-0.007
 allowance-0.007
 tum-0.006
 banks-0.006
tted-0.006
TokenA
Token ID32
Feature activation-9.79e-03

Pos logit contributions
ww+0.006
w+0.005
o+0.005
sets+0.005
hem+0.005
Neg logit contributions
 contain-0.008
 banks-0.008
 allowance-0.008
 symp-0.008
 tum-0.008
Tokenww
Token ID1383
Feature activation+1.12e-01

Pos logit contributions
 contain+0.014
 tum+0.014
 banks+0.013
tted+0.013
 allowance+0.013
Neg logit contributions
w-0.017
o-0.016
sets-0.016
ries-0.016
ww-0.016
Token,
Token ID11
Feature activation+2.77e-03

Pos logit contributions
o+0.207
sets+0.193
w+0.193
Look+0.190
ww+0.189
Neg logit contributions
 contain-0.142
 tum-0.134
 banks-0.132
 allowance-0.131
ured-0.128
Token okay
Token ID8788
Feature activation-1.85e-02

Pos logit contributions
w+0.004
ww+0.004
o+0.004
sets+0.003
hem+0.003
Neg logit contributions
 contain-0.005
 tum-0.005
 allowance-0.005
 banks-0.005
 symp-0.005
Token.
Token ID13
Feature activation-4.51e-03

Pos logit contributions
 contain+0.033
 tum+0.032
 aid+0.032
 symp+0.032
 resist+0.032
Neg logit contributions
rob-0.023
w-0.023
ww-0.021
ons-0.021
sets-0.021
Tokenâ
Token ID22940
Feature activation+8.74e-03

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.008
 tum+0.008
 symp+0.008
Neg logit contributions
w-0.006
ww-0.006
o-0.006
sets-0.005
hem-0.005
Token
Token ID26391
Feature activation-3.12e-02

Pos logit contributions
ww+0.017
o+0.017
w+0.017
ins+0.016
hem+0.016
Neg logit contributions
 contain-0.011
 banks-0.010
tted-0.009
ured-0.009
 symp-0.009
Token He
Token ID679
Feature activation+3.79e-04

Pos logit contributions
w+0.006
ww+0.006
o+0.006
sets+0.005
ins+0.005
Neg logit contributions
 tum-0.005
 contain-0.005
 banks-0.005
 allowance-0.005
 widened-0.005
Token said
Token ID531
Feature activation-8.84e-05

Pos logit contributions
ww+0.001
w+0.001
sets+0.001
o+0.001
hem+0.001
Neg logit contributions
 contain-0.001
 tum-0.001
 widened-0.001
 banks-0.001
 symp-0.001
Token,
Token ID11
Feature activation+2.47e-03

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
ured+0.000
 symp+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
hem-0.000
sets-0.000
Token "
Token ID366
Feature activation-2.73e-03

Pos logit contributions
ww+0.004
w+0.004
o+0.004
hem+0.004
sets+0.004
Neg logit contributions
 contain-0.004
 banks-0.004
 tum-0.004
 allowance-0.003
 symp-0.003
TokenE
Token ID36
Feature activation+2.40e-02

Pos logit contributions
 contain+0.004
 widened+0.004
 symp+0.004
+0.004
 banks+0.004
Neg logit contributions
o-0.004
ww-0.004
ning-0.004
sets-0.004
loo-0.004
Tokenww
Token ID1383
Feature activation+1.11e-01

Pos logit contributions
sets+0.019
o+0.019
Look+0.018
ning+0.017
ww+0.016
Neg logit contributions
 contain-0.052
 banks-0.049
ured-0.048
 allowance-0.048
tted-0.048
Token,
Token ID11
Feature activation+2.49e-03

Pos logit contributions
o+0.230
ww+0.223
Look+0.223
w+0.222
asc+0.216
Neg logit contributions
 tum-0.106
 allowance-0.105
 contain-0.101
 banks-0.098
 symp-0.088
Token this
Token ID428
Feature activation-4.42e-02

Pos logit contributions
w+0.004
o+0.004
ww+0.004
sets+0.003
hem+0.003
Neg logit contributions
 contain-0.004
 tum-0.004
 allowance-0.004
 symp-0.004
 banks-0.004
Token is
Token ID318
Feature activation+2.43e-02

Pos logit contributions
 tum+0.079
 allowance+0.072
 contain+0.071
 banks+0.070
 bank+0.070
Neg logit contributions
o-0.060
ww-0.056
w-0.055
irens-0.054
iren-0.054
Token the
Token ID262
Feature activation+1.37e-02

Pos logit contributions
w+0.043
ww+0.042
rob+0.040
o+0.040
hem+0.039
Neg logit contributions
 tum-0.035
 contain-0.034
 allowance-0.032
 symp-0.031
 banks-0.031
Token worst
Token ID5290
Feature activation-1.49e-02

Pos logit contributions
o+0.021
ww+0.021
w+0.020
hem+0.019
ins+0.018
Neg logit contributions
 contain-0.023
 tum-0.023
 banks-0.022
 allowance-0.022
 bank-0.021
Token at
Token ID379
Feature activation+2.62e-02

Pos logit contributions
 contain+0.011
 banks+0.010
 allowance+0.010
tted+0.009
 symp+0.009
Neg logit contributions
ww-0.015
w-0.015
o-0.014
sets-0.013
hem-0.013
Token it
Token ID340
Feature activation-2.27e-03

Pos logit contributions
ww+0.051
w+0.048
o+0.046
hem+0.043
very+0.043
Neg logit contributions
 contain-0.037
 banks-0.035
 allowance-0.033
 tum-0.033
 bank-0.031
Token said
Token ID531
Feature activation-8.50e-04

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
ured+0.003
 symp+0.003
Neg logit contributions
ww-0.004
w-0.004
o-0.004
hem-0.004
sets-0.004
Token "
Token ID366
Feature activation-5.12e-03

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
tted+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Tokenew
Token ID413
Feature activation+2.83e-02

Pos logit contributions
 contain+0.009
tted+0.008
 banks+0.008
ured+0.008
 allowance+0.008
Neg logit contributions
ww-0.008
w-0.007
o-0.007
hem-0.007
loo-0.007
Tokenww
Token ID1383
Feature activation+1.10e-01

Pos logit contributions
ww+0.040
o+0.040
w+0.039
hem+0.036
sets+0.036
Neg logit contributions
 contain-0.048
 banks-0.047
 tum-0.045
ured-0.045
 symp-0.044
Token!"
Token ID2474
Feature activation+1.45e-02

Pos logit contributions
o+0.152
w+0.145
ww+0.144
hem+0.133
Look+0.131
Neg logit contributions
 contain-0.190
 banks-0.190
 tum-0.182
ured-0.182
 allowance-0.180
Token
Token ID220
Feature activation-2.35e-03

Pos logit contributions
ww+0.029
w+0.028
o+0.028
hem+0.026
sets+0.025
Neg logit contributions
 contain-0.019
 banks-0.017
 allowance-0.016
 tum-0.016
ured-0.016
Token
Token ID198
Feature activation-1.64e-03

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
ured+0.004
 symp+0.004
Neg logit contributions
o-0.003
ww-0.003
w-0.003
hem-0.003
sets-0.003
Token
Token ID198
Feature activation-1.64e-03

Pos logit contributions
 contain+0.002
 banks+0.002
tted+0.002
 symp+0.002
 allowance+0.002
Neg logit contributions
ww-0.003
w-0.003
o-0.003
ries-0.003
sets-0.003
TokenOne
Token ID3198
Feature activation+1.57e-02

Pos logit contributions
 contain+0.002
 banks+0.002
tted+0.002
ured+0.002
 allowance+0.002
Neg logit contributions
w-0.003
ww-0.003
o-0.003
ries-0.003
sets-0.003
Token She
Token ID1375
Feature activation-4.27e-03

Pos logit contributions
w+0.002
ww+0.002
o+0.002
rob+0.002
sets+0.002
Neg logit contributions
 tum-0.002
 contain-0.002
 symp-0.002
 banks-0.002
 allowance-0.002
Token said
Token ID531
Feature activation-3.83e-04

Pos logit contributions
 widened+0.006
 contain+0.006
 tum+0.006
 banks+0.006
 symp+0.006
Neg logit contributions
ww-0.008
w-0.007
sets-0.007
loo-0.007
o-0.007
Token,
Token ID11
Feature activation+1.23e-03

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
ured+0.001
 symp+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Token "
Token ID366
Feature activation-2.97e-03

Pos logit contributions
ww+0.002
w+0.002
o+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.002
 banks-0.002
 tum-0.002
 symp-0.002
 allowance-0.002
TokenE
Token ID36
Feature activation+2.34e-02

Pos logit contributions
 contain+0.006
 banks+0.005
 symp+0.005
 widened+0.005
ured+0.005
Neg logit contributions
o-0.004
ww-0.004
loo-0.003
ins-0.003
hem-0.003
Tokenww
Token ID1383
Feature activation+1.09e-01

Pos logit contributions
sets+0.009
o+0.008
ning+0.007
Look+0.006
hem+0.006
Neg logit contributions
 contain-0.060
 banks-0.058
 symp-0.058
 allowance-0.057
 tum-0.057
Token,
Token ID11
Feature activation+1.25e-03

Pos logit contributions
o+0.221
Look+0.215
ww+0.214
w+0.211
hem+0.209
Neg logit contributions
 tum-0.106
 allowance-0.105
 contain-0.104
 banks-0.100
 symp-0.093
Token disgusting
Token ID23374
Feature activation-1.17e-02

Pos logit contributions
ww+0.002
w+0.002
o+0.002
sets+0.002
hem+0.002
Neg logit contributions
 contain-0.002
 tum-0.002
 banks-0.002
 symp-0.002
 allowance-0.002
Token!"
Token ID2474
Feature activation+1.43e-02

Pos logit contributions
 tum+0.018
 banks+0.017
 bank+0.016
ured+0.016
 aid+0.016
Neg logit contributions
o-0.018
w-0.017
ww-0.017
iren-0.017
very-0.016
Token She
Token ID1375
Feature activation-4.31e-03

Pos logit contributions
ww+0.023
w+0.023
o+0.021
sets+0.020
asc+0.020
Neg logit contributions
 contain-0.024
 tum-0.023
 banks-0.023
 allowance-0.022
 symp-0.022
Token threw
Token ID9617
Feature activation-5.53e-03

Pos logit contributions
 contain+0.007
 tum+0.006
 banks+0.006
 widened+0.006
 allowance+0.006
Neg logit contributions
ww-0.007
w-0.007
o-0.007
sets-0.007
hem-0.006
Token be
Token ID307
Feature activation-6.89e-03

Pos logit contributions
ww+0.040
w+0.039
o+0.038
hem+0.034
sets+0.034
Neg logit contributions
 contain-0.035
 banks-0.034
 allowance-0.033
 symp-0.033
ured-0.032
Token bitter
Token ID12922
Feature activation-1.76e-02

Pos logit contributions
 contain+0.011
 allowance+0.010
 tum+0.010
 banks+0.010
ured+0.009
Neg logit contributions
ww-0.011
w-0.011
o-0.011
sets-0.010
hem-0.010
Token,
Token ID11
Feature activation+2.90e-03

Pos logit contributions
 contain+0.023
 banks+0.022
 allowance+0.021
 tum+0.020
ured+0.020
Neg logit contributions
ww-0.034
w-0.034
o-0.033
sets-0.030
hem-0.030
Token like
Token ID588
Feature activation-2.98e-02

Pos logit contributions
ww+0.006
w+0.005
o+0.005
hem+0.005
sets+0.005
Neg logit contributions
 contain-0.004
 banks-0.004
 allowance-0.003
 tum-0.003
 symp-0.003
Token e
Token ID304
Feature activation-1.93e-02

Pos logit contributions
 contain+0.051
 allowance+0.051
 tum+0.050
 banks+0.048
ured+0.047
Neg logit contributions
ww-0.044
o-0.044
w-0.043
sets-0.039
hem-0.039
Tokenww
Token ID1383
Feature activation+1.09e-01

Pos logit contributions
 symp+0.034
 banks+0.034
 contain+0.034
ured+0.033
 allowance+0.033
Neg logit contributions
ww-0.024
o-0.023
Look-0.021
hem-0.020
sets-0.020
Token.
Token ID13
Feature activation+4.88e-03

Pos logit contributions
o+0.183
ww+0.178
w+0.178
asc+0.163
Look+0.163
Neg logit contributions
 contain-0.158
 banks-0.157
 allowance-0.155
 tum-0.152
ured-0.144
Token Do
Token ID2141
Feature activation+3.89e-02

Pos logit contributions
ww+0.007
w+0.007
o+0.007
sets+0.006
hem+0.006
Neg logit contributions
 contain-0.009
 allowance-0.008
 banks-0.008
 tum-0.008
 symp-0.008
Token you
Token ID345
Feature activation-1.61e-02

Pos logit contributions
ww+0.029
o+0.029
w+0.027
sets+0.025
hem+0.024
Neg logit contributions
 contain-0.093
 allowance-0.089
 tum-0.088
 banks-0.087
 symp-0.086
Token want
Token ID765
Feature activation+4.25e-03

Pos logit contributions
 banks+0.021
ured+0.019
 bank+0.018
 contain+0.018
 allowance+0.018
Neg logit contributions
w-0.034
o-0.032
ww-0.031
?"-0.029
my-0.029
Token to
Token ID284
Feature activation-1.61e-03

Pos logit contributions
ww+0.008
w+0.008
o+0.008
sets+0.008
ins+0.007
Neg logit contributions
 contain-0.006
 tum-0.005
 allowance-0.005
 banks-0.005
 aid-0.004
Token and
Token ID290
Feature activation+4.88e-03

Pos logit contributions
 contain+0.018
 banks+0.017
 allowance+0.017
tted+0.016
 symp+0.016
Neg logit contributions
o-0.016
w-0.016
ww-0.016
sets-0.014
hem-0.014
Token said
Token ID531
Feature activation+4.11e-04

Pos logit contributions
ww+0.008
w+0.008
o+0.007
hem+0.007
sets+0.007
Neg logit contributions
 contain-0.008
 banks-0.008
 tum-0.008
 allowance-0.007
ured-0.007
Token,
Token ID11
Feature activation+1.93e-03

Pos logit contributions
ww+0.001
w+0.001
o+0.001
hem+0.001
sets+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 allowance-0.001
tted-0.001
ured-0.001
Token "
Token ID366
Feature activation-3.65e-03

Pos logit contributions
ww+0.003
w+0.003
o+0.003
sets+0.002
hem+0.002
Neg logit contributions
 contain-0.003
 banks-0.003
 allowance-0.003
ured-0.003
tted-0.003
TokenE
Token ID36
Feature activation+2.25e-02

Pos logit contributions
 contain+0.007
 banks+0.006
 allowance+0.006
 symp+0.006
ured+0.006
Neg logit contributions
ww-0.005
o-0.005
w-0.005
sets-0.004
hem-0.004
Tokenww
Token ID1383
Feature activation+1.08e-01

Pos logit contributions
o+0.025
ww+0.025
sets+0.024
w+0.023
Look+0.022
Neg logit contributions
 contain-0.043
 banks-0.042
 allowance-0.041
ured-0.040
 symp-0.040
Token,
Token ID11
Feature activation+1.85e-03

Pos logit contributions
o+0.212
ww+0.212
w+0.208
Look+0.200
sets+0.196
Neg logit contributions
 banks-0.117
 allowance-0.115
 contain-0.114
 tum-0.112
ured-0.103
Token Bruno
Token ID31045
Feature activation-3.10e-02

Pos logit contributions
ww+0.003
w+0.002
o+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.003
 banks-0.003
 allowance-0.003
tted-0.003
ured-0.003
Token.
Token ID13
Feature activation+1.32e-03

Pos logit contributions
 contain+0.038
 banks+0.038
 allowance+0.036
 tum+0.036
ured+0.035
Neg logit contributions
ww-0.059
w-0.058
o-0.056
sets-0.054
hem-0.053
Token You
Token ID921
Feature activation-1.33e-02

Pos logit contributions
ww+0.002
w+0.002
o+0.002
sets+0.001
hem+0.001
Neg logit contributions
 contain-0.003
 banks-0.003
 allowance-0.002
 tum-0.002
 symp-0.002
Token are
Token ID389
Feature activation-1.14e-02

Pos logit contributions
 contain+0.020
 symp+0.020
ured+0.019
 banks+0.019
 aid+0.019
Neg logit contributions
w-0.023
ww-0.022
o-0.022
 blows-0.019
 my-0.019
Token.
Token ID13
Feature activation+4.88e-03

Pos logit contributions
 contain+0.004
 banks+0.004
 aid+0.004
 bank+0.004
 tum+0.004
Neg logit contributions
ww-0.008
w-0.008
sets-0.008
hem-0.007
o-0.007
Token
Token ID198
Feature activation+3.31e-03

Pos logit contributions
w+0.010
ww+0.010
o+0.009
sets+0.009
hem+0.009
Neg logit contributions
 contain-0.006
 tum-0.006
 allowance-0.005
 symp-0.005
 aid-0.005
Token
Token ID198
Feature activation+3.21e-03

Pos logit contributions
ww+0.006
w+0.005
o+0.005
sets+0.005
hem+0.005
Neg logit contributions
 contain-0.005
 banks-0.005
 allowance-0.005
 tum-0.005
 symp-0.005
Token"
Token ID1
Feature activation+2.97e-03

Pos logit contributions
ww+0.005
o+0.005
w+0.005
sets+0.005
ries+0.004
Neg logit contributions
 contain-0.005
 banks-0.005
tted-0.005
 symp-0.005
ured-0.005
TokenA
Token ID32
Feature activation-1.15e-02

Pos logit contributions
o+0.004
w+0.004
ries+0.004
ins+0.004
ons+0.003
Neg logit contributions
 contain-0.006
 widened-0.005
tted-0.005
 banks-0.005
 symp-0.005
Tokenww
Token ID1383
Feature activation+1.06e-01

Pos logit contributions
 tum+0.013
+0.011
 contain+0.011
 allowance+0.011
 aid+0.011
Neg logit contributions
ries-0.024
o-0.024
w-0.022
 yourselves-0.022
asc-0.022
Token,
Token ID11
Feature activation+2.48e-03

Pos logit contributions
o+0.233
Look+0.231
sets+0.225
mill+0.223
 Fighting+0.219
Neg logit contributions
 tum-0.086
 banks-0.080
 contain-0.078
 allowance-0.076
ids-0.066
Token it
Token ID340
Feature activation-1.91e-03

Pos logit contributions
w+0.004
o+0.004
ww+0.004
sets+0.004
hem+0.003
Neg logit contributions
 contain-0.004
 tum-0.004
 banks-0.004
 allowance-0.004
ured-0.004
Token is
Token ID318
Feature activation+2.41e-02

Pos logit contributions
 banks+0.004
 contain+0.003
 bank+0.003
 tum+0.003
+0.003
Neg logit contributions
ww-0.002
sets-0.002
w-0.002
o-0.002
hem-0.002
Token so
Token ID523
Feature activation+1.40e-02

Pos logit contributions
w+0.042
ww+0.042
o+0.040
hem+0.039
sets+0.039
Neg logit contributions
 tum-0.033
 banks-0.033
 contain-0.033
ured-0.031
 allowance-0.031
Token cute
Token ID13779
Feature activation+6.86e-03

Pos logit contributions
sets+0.030
ries+0.029
ning+0.029
mill+0.028
rob+0.028
Neg logit contributions
 tum-0.017
 contain-0.013
 banks-0.013
-0.013
 allowance-0.013
Token and
Token ID290
Feature activation+2.86e-04

Pos logit contributions
 contain+0.033
 banks+0.031
 allowance+0.030
tted+0.029
ured+0.029
Neg logit contributions
ww-0.037
w-0.037
o-0.036
hem-0.032
sets-0.032
Token said
Token ID531
Feature activation-7.46e-04

Pos logit contributions
ww+0.000
w+0.000
o+0.000
sets+0.000
hem+0.000
Neg logit contributions
 contain-0.000
 banks-0.000
 tum-0.000
ured-0.000
 symp-0.000
Token,
Token ID11
Feature activation-1.39e-04

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
tted+0.001
 symp+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
 my-0.001
hem-0.001
Token "
Token ID366
Feature activation-5.55e-03

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
 tum+0.000
ured+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
hem-0.000
sets-0.000
TokenA
Token ID32
Feature activation-1.29e-02

Pos logit contributions
 contain+0.010
 banks+0.010
 symp+0.010
ured+0.010
 allowance+0.010
Neg logit contributions
ww-0.007
o-0.007
w-0.007
sets-0.006
hem-0.006
Tokenww
Token ID1383
Feature activation+1.05e-01

Pos logit contributions
 contain+0.019
 banks+0.018
 tum+0.017
 allowance+0.017
 symp+0.017
Neg logit contributions
w-0.022
o-0.022
ww-0.022
sets-0.020
ries-0.020
Tokenwww
Token ID2503
Feature activation+4.11e-02

Pos logit contributions
o+0.197
w+0.185
Look+0.184
ww+0.181
hem+0.179
Neg logit contributions
 contain-0.131
 tum-0.127
 banks-0.123
 allowance-0.120
ured-0.116
Token,
Token ID11
Feature activation+1.35e-04

Pos logit contributions
ww+0.065
w+0.063
o+0.063
hem+0.058
sets+0.057
Neg logit contributions
 contain-0.063
 banks-0.062
 tum-0.061
 allowance-0.060
ured-0.059
Token do
Token ID466
Feature activation+3.91e-02

Pos logit contributions
ww+0.000
w+0.000
o+0.000
sets+0.000
hem+0.000
Neg logit contributions
 contain-0.000
 banks-0.000
 allowance-0.000
 tum-0.000
ured-0.000
Token I
Token ID314
Feature activation+6.86e-03

Pos logit contributions
ww+0.041
w+0.040
o+0.040
sets+0.034
hem+0.033
Neg logit contributions
 contain-0.085
 banks-0.082
 allowance-0.080
 tum-0.079
ured-0.078
Token have
Token ID423
Feature activation+1.54e-02

Pos logit contributions
o+0.012
w+0.012
ww+0.011
hem+0.010
my+0.010
Neg logit contributions
 contain-0.011
 banks-0.010
ured-0.010
 symp-0.010
 allowance-0.009
Token!"
Token ID2474
Feature activation+1.35e-02

Pos logit contributions
 contain+0.008
 aid+0.007
 believe+0.007
+0.007
 tum+0.007
Neg logit contributions
sets-0.008
irens-0.007
ww-0.007
ries-0.007
rob-0.007
Token
Token ID198
Feature activation+4.07e-03

Pos logit contributions
ww+0.023
w+0.023
o+0.022
sets+0.021
hem+0.020
Neg logit contributions
 contain-0.021
 banks-0.019
 tum-0.019
 allowance-0.019
 symp-0.019
Token
Token ID198
Feature activation+3.97e-03

Pos logit contributions
ww+0.007
w+0.007
o+0.007
sets+0.007
ries+0.006
Neg logit contributions
 contain-0.006
 banks-0.005
tted-0.005
 tum-0.005
 symp-0.005
Token"
Token ID1
Feature activation+3.38e-03

Pos logit contributions
ww+0.006
w+0.006
o+0.005
sets+0.005
ries+0.005
Neg logit contributions
 contain-0.007
 banks-0.007
tted-0.007
 allowance-0.007
 symp-0.007
TokenA
Token ID32
Feature activation-1.33e-02

Pos logit contributions
w+0.005
o+0.005
ries+0.005
ww+0.005
ins+0.004
Neg logit contributions
 contain-0.006
 widened-0.006
tted-0.006
 symp-0.005
 banks-0.005
Tokenww
Token ID1383
Feature activation+1.05e-01

Pos logit contributions
 tum+0.015
 aid+0.014
+0.013
 contain+0.013
 allowance+0.013
Neg logit contributions
ries-0.029
o-0.027
w-0.026
 yourselves-0.026
ins-0.025
Token!"
Token ID2474
Feature activation+1.32e-02

Pos logit contributions
o+0.227
sets+0.219
mill+0.217
Look+0.217
hem+0.211
Neg logit contributions
 tum-0.096
 contain-0.087
 banks-0.083
 allowance-0.082
ids-0.075
Token Lily
Token ID20037
Feature activation+8.64e-03

Pos logit contributions
ww+0.018
w+0.017
o+0.017
hem+0.015
sets+0.015
Neg logit contributions
 contain-0.025
 banks-0.024
 allowance-0.023
ured-0.023
 symp-0.023
Token says
Token ID1139
Feature activation+1.39e-02

Pos logit contributions
ww+0.010
w+0.010
o+0.008
loo+0.008
hem+0.007
Neg logit contributions
 contain-0.019
 banks-0.018
 widened-0.017
 symp-0.017
 aid-0.017
Token.
Token ID13
Feature activation+4.08e-03

Pos logit contributions
o+0.022
w+0.021
ww+0.020
sets+0.018
hem+0.018
Neg logit contributions
 banks-0.023
 contain-0.023
 allowance-0.022
 bank-0.021
tted-0.020
Token "
Token ID366
Feature activation-2.71e-03

Pos logit contributions
w+0.008
ww+0.008
o+0.008
rob+0.007
sets+0.007
Neg logit contributions
 contain-0.005
 symp-0.005
 tum-0.005
 aid-0.005
tted-0.005
Token.
Token ID13
Feature activation-6.08e-03

Pos logit contributions
 contain+0.027
 banks+0.025
 allowance+0.025
 tum+0.024
ured+0.024
Neg logit contributions
ww-0.031
w-0.031
o-0.031
sets-0.028
hem-0.028
Token
Token ID198
Feature activation-4.68e-04

Pos logit contributions
 contain+0.010
 banks+0.010
 allowance+0.010
 symp+0.009
 tum+0.009
Neg logit contributions
ww-0.010
w-0.009
o-0.009
sets-0.008
hem-0.008
Token
Token ID198
Feature activation-4.56e-04

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
ured+0.001
 tum+0.001
Neg logit contributions
o-0.001
ww-0.001
w-0.001
hem-0.001
very-0.001
Token"
Token ID1
Feature activation+1.43e-03

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
very-0.001
TokenE
Token ID36
Feature activation+1.98e-02

Pos logit contributions
o+0.002
ww+0.002
w+0.002
loo+0.002
hem+0.002
Neg logit contributions
 contain-0.003
 widened-0.003
 banks-0.003
tted-0.003
 symp-0.002
Tokenww
Token ID1383
Feature activation+1.03e-01

Pos logit contributions
o+0.009
ww+0.008
w+0.007
sets+0.007
hem+0.006
Neg logit contributions
 contain-0.053
 banks-0.051
 allowance-0.050
 symp-0.050
ured-0.050
Token!"
Token ID2474
Feature activation+1.24e-02

Pos logit contributions
o+0.188
w+0.180
ww+0.180
Look+0.177
hem+0.175
Neg logit contributions
 contain-0.128
 banks-0.128
 tum-0.127
 allowance-0.124
ured-0.118
Token she
Token ID673
Feature activation-3.57e-03

Pos logit contributions
ww+0.014
w+0.014
o+0.013
sets+0.011
hem+0.011
Neg logit contributions
 contain-0.026
 banks-0.025
 allowance-0.024
ured-0.024
 symp-0.024
Token screamed
Token ID25421
Feature activation+1.54e-02

Pos logit contributions
 contain+0.008
 banks+0.007
 allowance+0.007
 symp+0.007
ured+0.007
Neg logit contributions
ww-0.004
w-0.004
o-0.004
hem-0.003
sets-0.003
Token.
Token ID13
Feature activation-4.47e-03

Pos logit contributions
o+0.018
 my+0.015
my+0.015
 forth+0.015
ning+0.014
Neg logit contributions
 contain-0.030
 banks-0.030
tted-0.029
 allowance-0.028
isc-0.026
Token
Token ID198
Feature activation-5.52e-04

Pos logit contributions
 contain+0.007
 banks+0.006
 allowance+0.006
 symp+0.006
 tum+0.006
Neg logit contributions
ww-0.008
w-0.008
o-0.007
sets-0.007
hem-0.007
Token the
Token ID262
Feature activation+7.06e-03

Pos logit contributions
ww+0.006
w+0.006
o+0.006
sets+0.005
hem+0.005
Neg logit contributions
 contain-0.007
 banks-0.006
 allowance-0.006
 tum-0.006
ured-0.006
Token ground
Token ID2323
Feature activation-5.76e-04

Pos logit contributions
ww+0.010
w+0.010
o+0.009
 my+0.009
sets+0.009
Neg logit contributions
 contain-0.013
 allowance-0.012
 symp-0.012
tted-0.012
 widened-0.011
Token.
Token ID13
Feature activation-5.39e-03

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 symp+0.001
 tum+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
sets-0.001
Token "
Token ID366
Feature activation-5.91e-03

Pos logit contributions
 contain+0.009
 banks+0.008
 tum+0.008
 symp+0.008
 allowance+0.008
Neg logit contributions
w-0.009
ww-0.009
o-0.008
hem-0.007
rob-0.007
TokenE
Token ID36
Feature activation+1.86e-02

Pos logit contributions
 contain+0.012
 banks+0.011
 symp+0.011
tted+0.011
ured+0.011
Neg logit contributions
ww-0.007
o-0.007
w-0.007
sets-0.006
hem-0.006
Tokenww
Token ID1383
Feature activation+1.03e-01

Pos logit contributions
o+0.007
sets+0.006
ww+0.005
hem+0.005
iren+0.004
Neg logit contributions
 contain-0.050
 banks-0.048
 allowance-0.048
 symp-0.047
ured-0.047
Token,
Token ID11
Feature activation+6.16e-04

Pos logit contributions
o+0.196
Look+0.187
w+0.186
ww+0.185
hem+0.181
Neg logit contributions
 banks-0.120
 contain-0.118
 tum-0.117
 allowance-0.117
ured-0.108
Token what
Token ID644
Feature activation+3.66e-02

Pos logit contributions
ww+0.001
w+0.001
o+0.001
sets+0.001
hem+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 allowance-0.001
 tum-0.001
 symp-0.001
Token's
Token ID338
Feature activation+6.87e-04

Pos logit contributions
ww+0.067
o+0.065
w+0.064
hem+0.063
rob+0.063
Neg logit contributions
 contain-0.049
 tum-0.045
 allowance-0.045
-0.043
 banks-0.041
Token that
Token ID326
Feature activation-1.35e-03

Pos logit contributions
o+0.001
ww+0.001
w+0.001
hem+0.001
sets+0.001
Neg logit contributions
 contain-0.001
 allowance-0.001
 banks-0.001
 tum-0.001
ured-0.001
Token?"
Token ID1701
Feature activation+2.16e-02

Pos logit contributions
 tum+0.002
 banks+0.002
 contain+0.002
 bank+0.002
 allowance+0.002
Neg logit contributions
o-0.002
w-0.002
ww-0.002
ins-0.002
irens-0.002
Token:
Token ID25
Feature activation+3.03e-02

Pos logit contributions
ww+0.047
w+0.046
o+0.044
sets+0.043
ries+0.041
Neg logit contributions
 contain-0.031
 banks-0.028
 tum-0.028
 allowance-0.027
ured-0.026
Token Sharing
Token ID36644
Feature activation+2.60e-03

Pos logit contributions
ww+0.043
w+0.042
o+0.039
rob+0.037
sets+0.036
Neg logit contributions
 contain-0.055
 allowance-0.053
 banks-0.053
 tum-0.052
 symp-0.052
Token is
Token ID318
Feature activation+2.45e-02

Pos logit contributions
ww+0.004
o+0.004
w+0.004
rob+0.004
www+0.004
Neg logit contributions
-0.004
 allowance-0.004
 contain-0.004
 tum-0.004
 banks-0.004
Token caring
Token ID18088
Feature activation-5.21e-03

Pos logit contributions
ww+0.039
w+0.039
o+0.038
sets+0.034
ries+0.034
Neg logit contributions
 contain-0.040
 allowance-0.037
 banks-0.037
ured-0.037
 tum-0.036
Token.
Token ID13
Feature activation+3.17e-03

Pos logit contributions
 contain+0.007
 banks+0.006
 allowance+0.006
 tum+0.006
ured+0.006
Neg logit contributions
ww-0.010
w-0.010
o-0.010
sets-0.009
hem-0.009
Token Pride
Token ID21735
Feature activation-1.25e-01

Pos logit contributions
ww+0.004
w+0.004
o+0.004
sets+0.004
hem+0.004
Neg logit contributions
 contain-0.006
 banks-0.006
 allowance-0.005
 tum-0.005
 symp-0.005
Token goes
Token ID2925
Feature activation+5.77e-03

Pos logit contributions
 contain+0.197
 banks+0.185
 allowance+0.180
tted+0.174
 tum+0.173
Neg logit contributions
ww-0.214
w-0.205
o-0.202
sets-0.183
rob-0.182
Token before
Token ID878
Feature activation-3.55e-02

Pos logit contributions
ww+0.010
w+0.010
o+0.010
sets+0.009
hem+0.009
Neg logit contributions
 contain-0.008
 banks-0.008
 allowance-0.008
ured-0.008
tted-0.008
Token a
Token ID257
Feature activation-7.24e-03

Pos logit contributions
 allowance+0.065
 banks+0.064
 contain+0.064
 tum+0.062
ured+0.060
Neg logit contributions
o-0.050
ww-0.048
w-0.047
asc-0.042
rob-0.041
Token fall
Token ID2121
Feature activation-1.52e-02

Pos logit contributions
 contain+0.011
 banks+0.010
 tum+0.010
 allowance+0.010
 aid+0.010
Neg logit contributions
w-0.013
ww-0.012
o-0.012
sets-0.011
ries-0.011
Token.
Token ID13
Feature activation+2.89e-03

Pos logit contributions
 contain+0.021
 banks+0.020
 allowance+0.019
 tum+0.018
ured+0.018
Neg logit contributions
ww-0.028
w-0.028
o-0.027
hem-0.025
sets-0.025
Token the
Token ID262
Feature activation+4.95e-03

Pos logit contributions
ww+0.008
w+0.008
o+0.008
hem+0.007
sets+0.007
Neg logit contributions
 contain-0.008
 banks-0.008
 allowance-0.007
ured-0.007
 tum-0.007
Token tree
Token ID5509
Feature activation+1.18e-02

Pos logit contributions
ww+0.007
w+0.007
o+0.007
hem+0.006
sets+0.006
Neg logit contributions
 contain-0.009
 banks-0.009
 allowance-0.009
 tum-0.009
 symp-0.008
Token to
Token ID284
Feature activation-8.08e-03

Pos logit contributions
ww+0.019
o+0.019
w+0.019
sets+0.017
hem+0.016
Neg logit contributions
 contain-0.019
 banks-0.018
 allowance-0.017
ured-0.017
 symp-0.017
Token its
Token ID663
Feature activation-5.29e-03

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
ured+0.010
 tum+0.010
Neg logit contributions
ww-0.015
w-0.015
o-0.014
sets-0.013
hem-0.013
Token once
Token ID1752
Feature activation-4.29e-03

Pos logit contributions
 contain+0.009
 banks+0.008
tted+0.008
 symp+0.008
 allowance+0.008
Neg logit contributions
ww-0.009
w-0.008
o-0.008
sets-0.007
hem-0.007
Token former
Token ID1966
Feature activation-1.14e-01

Pos logit contributions
 contain+0.007
 banks+0.006
 allowance+0.006
ured+0.006
tted+0.006
Neg logit contributions
ww-0.007
o-0.007
w-0.007
sets-0.006
hem-0.006
Token beauty
Token ID8737
Feature activation-5.61e-03

Pos logit contributions
 contain+0.195
tted+0.175
 symp+0.173
 allowance+0.169
ured+0.168
Neg logit contributions
ww-0.175
w-0.165
o-0.160
sets-0.153
 my-0.151
Token.
Token ID13
Feature activation-9.52e-03

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.007
ured+0.007
 tum+0.007
Neg logit contributions
ww-0.010
w-0.010
o-0.010
sets-0.009
hem-0.009
Token He
Token ID679
Feature activation-3.19e-03

Pos logit contributions
 contain+0.016
 banks+0.015
 allowance+0.014
 tum+0.014
 symp+0.014
Neg logit contributions
ww-0.015
w-0.015
o-0.015
sets-0.013
hem-0.013
Token was
Token ID373
Feature activation-1.18e-03

Pos logit contributions
 contain+0.005
 banks+0.005
 allowance+0.004
ured+0.004
 tum+0.004
Neg logit contributions
ww-0.006
w-0.005
o-0.005
sets-0.005
hem-0.005
Token happy
Token ID3772
Feature activation-3.74e-03

Pos logit contributions
 contain+0.002
 banks+0.002
 allowance+0.002
 symp+0.002
 tum+0.002
Neg logit contributions
o-0.002
ww-0.002
w-0.002
 my-0.002
sets-0.002
Token.
Token ID13
Feature activation-8.26e-03

Pos logit contributions
 contain+0.003
 banks+0.002
 allowance+0.002
 tum+0.002
 symp+0.002
Neg logit contributions
ww-0.003
w-0.003
o-0.003
hem-0.002
sets-0.002
Token One
Token ID1881
Feature activation-7.78e-03

Pos logit contributions
 contain+0.010
 banks+0.010
 allowance+0.010
 tum+0.010
 symp+0.009
Neg logit contributions
w-0.017
ww-0.016
o-0.015
hem-0.014
ins-0.014
Token day
Token ID1110
Feature activation-5.57e-03

Pos logit contributions
 contain+0.014
 tum+0.013
 banks+0.013
 allowance+0.013
 symp+0.012
Neg logit contributions
w-0.011
ww-0.011
rob-0.010
ins-0.010
 yourselves-0.010
Token,
Token ID11
Feature activation-9.65e-05

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.008
 tum+0.007
ured+0.007
Neg logit contributions
ww-0.010
w-0.010
o-0.009
sets-0.009
hem-0.009
Token Wh
Token ID854
Feature activation-6.21e-02

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
 tum+0.000
ured+0.000
Neg logit contributions
w-0.000
ww-0.000
o-0.000
ins-0.000
hem-0.000
Tokenisk
Token ID1984
Feature activation-1.12e-01

Pos logit contributions
 contain+0.082
 banks+0.072
 allowance+0.070
ured+0.069
 tum+0.069
Neg logit contributions
o-0.124
w-0.124
ww-0.123
hem-0.110
sets-0.108
Tokeners
Token ID364
Feature activation-8.57e-03

Pos logit contributions
 contain+0.179
 banks+0.170
tted+0.164
 allowance+0.162
ured+0.162
Neg logit contributions
ww-0.184
w-0.176
o-0.172
sets-0.157
hem-0.155
Token woke
Token ID19092
Feature activation+3.19e-02

Pos logit contributions
 contain+0.014
 banks+0.013
 allowance+0.013
ured+0.013
 symp+0.013
Neg logit contributions
w-0.013
ww-0.013
o-0.013
sets-0.011
hem-0.011
Token up
Token ID510
Feature activation-3.01e-02

Pos logit contributions
ww+0.038
w+0.037
o+0.037
sets+0.032
hem+0.031
Neg logit contributions
 contain-0.066
 banks-0.062
 allowance-0.061
tted-0.060
 tum-0.060
Token from
Token ID422
Feature activation-5.19e-02

Pos logit contributions
 contain+0.049
 banks+0.047
 allowance+0.045
ured+0.044
 symp+0.044
Neg logit contributions
ww-0.048
w-0.047
o-0.047
sets-0.042
hem-0.041
Token his
Token ID465
Feature activation+5.47e-03

Pos logit contributions
 contain+0.078
 banks+0.075
 allowance+0.075
 aid+0.071
 tum+0.071
Neg logit contributions
ww-0.097
w-0.092
o-0.088
ins-0.088
sets-0.081
Token Ch
Token ID609
Feature activation-1.89e-02

Pos logit contributions
w+0.035
ww+0.035
o+0.032
sets+0.031
hem+0.030
Neg logit contributions
 contain-0.028
 banks-0.027
 allowance-0.026
 tum-0.026
 aid-0.026
Tokenir
Token ID343
Feature activation-2.13e-02

Pos logit contributions
 contain+0.030
 banks+0.028
 allowance+0.027
 tum+0.026
ured+0.026
Neg logit contributions
ww-0.032
w-0.032
o-0.031
hem-0.028
sets-0.027
Tokenpy
Token ID9078
Feature activation+2.92e-02

Pos logit contributions
 contain+0.038
 banks+0.037
ured+0.036
 symp+0.035
tted+0.034
Neg logit contributions
ww-0.030
w-0.027
o-0.026
sets-0.025
hem-0.025
Token and
Token ID290
Feature activation-9.76e-04

Pos logit contributions
ww+0.049
w+0.046
o+0.044
hem+0.042
sets+0.042
Neg logit contributions
 contain-0.046
 banks-0.045
 symp-0.042
 widened-0.042
ured-0.041
Token Wh
Token ID854
Feature activation-6.06e-02

Pos logit contributions
 banks+0.001
 contain+0.001
 tum+0.001
 bank+0.001
 allowance+0.001
Neg logit contributions
w-0.002
ww-0.002
asc-0.001
loo-0.001
o-0.001
Tokenisk
Token ID1984
Feature activation-1.12e-01

Pos logit contributions
 contain+0.076
 banks+0.075
 symp+0.070
ured+0.068
tted+0.068
Neg logit contributions
ww-0.119
w-0.112
o-0.106
sets-0.104
hem-0.103
Tokeners
Token ID364
Feature activation-1.03e-02

Pos logit contributions
 contain+0.176
 banks+0.169
tted+0.162
ured+0.161
 allowance+0.158
Neg logit contributions
ww-0.185
w-0.176
o-0.172
sets-0.158
hem-0.156
Token knew
Token ID2993
Feature activation-4.41e-03

Pos logit contributions
 contain+0.016
 allowance+0.015
 banks+0.014
 tum+0.014
ured+0.014
Neg logit contributions
w-0.017
ww-0.017
o-0.017
sets-0.015
hem-0.015
Token that
Token ID326
Feature activation-4.11e-03

Pos logit contributions
 contain+0.008
 banks+0.007
 allowance+0.007
ured+0.007
 tum+0.007
Neg logit contributions
ww-0.007
w-0.007
o-0.006
hem-0.006
sets-0.006
Token poison
Token ID8764
Feature activation-1.99e-02

Pos logit contributions
 contain+0.007
 banks+0.007
 allowance+0.007
 tum+0.007
 symp+0.006
Neg logit contributions
ww-0.006
w-0.006
o-0.006
hem-0.005
sets-0.005
Token was
Token ID373
Feature activation-2.22e-03

Pos logit contributions
 contain+0.031
 banks+0.030
 allowance+0.029
ured+0.028
 tum+0.028
Neg logit contributions
ww-0.033
w-0.032
o-0.032
hem-0.029
sets-0.028
Token day
Token ID1110
Feature activation-2.05e-03

Pos logit contributions
 tum+0.026
 contain+0.024
 banks+0.024
 aid+0.023
 allowance+0.022
Neg logit contributions
ww-0.026
w-0.026
hem-0.026
ins-0.025
o-0.025
Token,
Token ID11
Feature activation+2.82e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.002
 tum+0.002
ured+0.002
Neg logit contributions
w-0.004
ww-0.004
o-0.004
sets-0.003
hem-0.003
Token Spot
Token ID15899
Feature activation-3.03e-02

Pos logit contributions
w+0.005
ww+0.004
o+0.004
ins+0.004
hem+0.004
Neg logit contributions
 contain-0.005
 allowance-0.004
 banks-0.004
ured-0.004
 tum-0.004
Token and
Token ID290
Feature activation+4.06e-03

Pos logit contributions
 contain+0.045
 banks+0.044
 allowance+0.041
 tum+0.041
 symp+0.041
Neg logit contributions
ww-0.053
w-0.052
o-0.049
sets-0.046
hem-0.046
Token Wh
Token ID854
Feature activation-6.20e-02

Pos logit contributions
w+0.006
ww+0.005
o+0.005
sets+0.005
ins+0.005
Neg logit contributions
 contain-0.008
 banks-0.007
 tum-0.007
 allowance-0.007
 aid-0.007
Tokenisk
Token ID1984
Feature activation-1.10e-01

Pos logit contributions
 contain+0.082
 allowance+0.073
 banks+0.071
 tum+0.071
ured+0.069
Neg logit contributions
o-0.123
w-0.121
ww-0.119
hem-0.108
sets-0.108
Tokeners
Token ID364
Feature activation-6.99e-03

Pos logit contributions
 contain+0.184
 banks+0.175
 allowance+0.167
ured+0.166
tted+0.165
Neg logit contributions
ww-0.175
w-0.169
o-0.166
sets-0.149
hem-0.148
Token went
Token ID1816
Feature activation+1.48e-02

Pos logit contributions
 allowance+0.012
ured+0.011
 contain+0.011
 symp+0.010
 aid+0.010
Neg logit contributions
o-0.011
w-0.010
my-0.009
sets-0.009
ww-0.008
Token to
Token ID284
Feature activation-6.57e-03

Pos logit contributions
ww+0.029
w+0.029
ries+0.025
o+0.025
ins+0.025
Neg logit contributions
 contain-0.020
 banks-0.019
tted-0.017
 allowance-0.017
ured-0.017
Token the
Token ID262
Feature activation+6.51e-03

Pos logit contributions
 contain+0.011
 banks+0.011
ured+0.011
 symp+0.010
tted+0.010
Neg logit contributions
o-0.010
ww-0.010
w-0.009
hem-0.008
sets-0.008
Token park
Token ID3952
Feature activation+1.08e-02

Pos logit contributions
w+0.009
ww+0.009
o+0.009
ins+0.008
ries+0.008
Neg logit contributions
 banks-0.012
 contain-0.012
 bank-0.011
 tum-0.011
 allowance-0.011
Token "
Token ID366
Feature activation-3.75e-03

Pos logit contributions
ww+0.003
w+0.003
o+0.003
hem+0.003
sets+0.003
Neg logit contributions
 contain-0.003
 banks-0.003
 allowance-0.003
ured-0.003
 symp-0.003
TokenWe
Token ID1135
Feature activation-6.58e-03

Pos logit contributions
 contain+0.006
 banks+0.006
tted+0.006
ured+0.006
 widened+0.006
Neg logit contributions
o-0.006
ww-0.005
ries-0.005
w-0.005
hem-0.005
Token are
Token ID389
Feature activation-1.17e-02

Pos logit contributions
 contain+0.009
 banks+0.009
 allowance+0.008
 symp+0.008
ured+0.008
Neg logit contributions
ww-0.012
w-0.012
o-0.012
hem-0.011
sets-0.011
Token the
Token ID262
Feature activation+1.20e-02

Pos logit contributions
 contain+0.021
 banks+0.020
 allowance+0.020
ured+0.020
 tum+0.019
Neg logit contributions
ww-0.017
w-0.017
o-0.016
hem-0.014
sets-0.014
Token weapon
Token ID4282
Feature activation+4.86e-02

Pos logit contributions
ww+0.019
w+0.019
o+0.019
hem+0.016
sets+0.016
Neg logit contributions
 contain-0.020
 banks-0.019
 allowance-0.018
 tum-0.018
 bank-0.017
Token creators
Token ID16294
Feature activation-1.10e-01

Pos logit contributions
w+0.083
ww+0.083
Look+0.076
o+0.075
ries+0.074
Neg logit contributions
 contain-0.071
 banks-0.068
 allowance-0.066
tted-0.065
ured-0.064
Token!
Token ID0
Feature activation+1.56e-02

Pos logit contributions
 contain+0.187
 banks+0.178
ured+0.169
 aid+0.168
 allowance+0.166
Neg logit contributions
ww-0.172
w-0.164
o-0.156
hem-0.146
sets-0.145
Token We
Token ID775
Feature activation+9.95e-03

Pos logit contributions
ww+0.020
w+0.020
o+0.018
ries+0.017
sets+0.017
Neg logit contributions
 contain-0.030
 banks-0.029
 allowance-0.028
tted-0.028
ured-0.028
Token can
Token ID460
Feature activation+5.50e-03

Pos logit contributions
w+0.018
ww+0.018
o+0.017
hem+0.015
sets+0.015
Neg logit contributions
 contain-0.014
 banks-0.013
 symp-0.013
 allowance-0.013
 tum-0.013
Token stop
Token ID2245
Feature activation+4.83e-03

Pos logit contributions
ww+0.010
w+0.010
o+0.010
hem+0.009
sets+0.009
Neg logit contributions
 contain-0.008
 banks-0.007
 allowance-0.007
 symp-0.007
 tum-0.007
Token the
Token ID262
Feature activation+1.20e-02

Pos logit contributions
ww+0.008
w+0.007
o+0.007
www+0.006
loo+0.006
Neg logit contributions
 banks-0.008
 contain-0.008
 allowance-0.008
ured-0.008
 symp-0.007
Token Sally
Token ID25737
Feature activation-9.54e-03

Pos logit contributions
ww+0.007
o+0.007
w+0.007
hem+0.006
asc+0.006
Neg logit contributions
 contain-0.007
 banks-0.007
 allowance-0.007
 tum-0.006
ured-0.006
Token told
Token ID1297
Feature activation+3.94e-03

Pos logit contributions
 contain+0.014
 banks+0.013
 allowance+0.012
 tum+0.012
ured+0.012
Neg logit contributions
ww-0.017
w-0.017
o-0.017
hem-0.015
sets-0.015
Token him
Token ID683
Feature activation+7.80e-04

Pos logit contributions
ww+0.006
w+0.006
o+0.006
sets+0.005
hem+0.005
Neg logit contributions
 contain-0.007
 banks-0.006
 allowance-0.006
 symp-0.006
ured-0.006
Token,
Token ID11
Feature activation-2.94e-03

Pos logit contributions
ww+0.001
w+0.001
o+0.001
sets+0.001
hem+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 allowance-0.001
 symp-0.001
ured-0.001
Token Wh
Token ID854
Feature activation-6.19e-02

Pos logit contributions
 contain+0.005
 banks+0.005
 allowance+0.005
 tum+0.005
 symp+0.004
Neg logit contributions
w-0.005
ww-0.005
o-0.004
hem-0.004
ins-0.004
Tokenisk
Token ID1984
Feature activation-1.10e-01

Pos logit contributions
 contain+0.063
 banks+0.060
 allowance+0.056
 symp+0.055
ured+0.054
Neg logit contributions
ww-0.137
w-0.133
o-0.130
sets-0.122
hem-0.121
Tokeners
Token ID364
Feature activation-8.07e-03

Pos logit contributions
 contain+0.171
 banks+0.162
tted+0.160
ured+0.157
 allowance+0.155
Neg logit contributions
ww-0.184
w-0.175
o-0.171
sets-0.159
hem-0.156
Token always
Token ID1464
Feature activation+2.43e-02

Pos logit contributions
 contain+0.013
 banks+0.012
 allowance+0.011
 symp+0.011
ured+0.011
Neg logit contributions
ww-0.014
w-0.013
o-0.013
sets-0.012
hem-0.012
Token wanted
Token ID2227
Feature activation-3.25e-03

Pos logit contributions
ww+0.045
w+0.041
loo+0.041
rob+0.040
ries+0.040
Neg logit contributions
 contain-0.033
 allowance-0.030
 banks-0.030
 aid-0.029
 resist-0.029
Token to
Token ID284
Feature activation-8.74e-03

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
 tum+0.004
ured+0.004
Neg logit contributions
ww-0.006
w-0.006
o-0.006
sets-0.005
hem-0.005
Token help
Token ID1037
Feature activation-2.78e-02

Pos logit contributions
 contain+0.011
 banks+0.011
 allowance+0.010
 tum+0.010
ured+0.010
Neg logit contributions
ww-0.017
w-0.017
o-0.016
sets-0.015
hem-0.015
Token.
Token ID13
Feature activation-6.59e-03

Pos logit contributions
ww+0.042
w+0.041
o+0.038
hem+0.037
asc+0.036
Neg logit contributions
 contain-0.022
 banks-0.022
 tum-0.020
 bank-0.019
ured-0.018
Token One
Token ID1881
Feature activation-1.05e-02

Pos logit contributions
 contain+0.009
 banks+0.009
 tum+0.008
 allowance+0.008
 symp+0.008
Neg logit contributions
w-0.013
ww-0.012
o-0.011
sets-0.011
hem-0.011
Token day
Token ID1110
Feature activation-5.81e-03

Pos logit contributions
 contain+0.019
 tum+0.018
 banks+0.017
 allowance+0.017
 symp+0.016
Neg logit contributions
w-0.015
ww-0.014
rob-0.014
o-0.014
sets-0.014
Token,
Token ID11
Feature activation-7.98e-04

Pos logit contributions
 contain+0.009
 banks+0.009
 allowance+0.008
 symp+0.008
ured+0.008
Neg logit contributions
ww-0.010
w-0.010
o-0.009
hem-0.008
sets-0.008
Token Wh
Token ID854
Feature activation-6.16e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
sets-0.001
Tokenisk
Token ID1984
Feature activation-1.09e-01

Pos logit contributions
 contain+0.079
 banks+0.070
 allowance+0.067
ured+0.066
 tum+0.066
Neg logit contributions
w-0.125
ww-0.125
o-0.124
hem-0.111
sets-0.110
Tokeners
Token ID364
Feature activation-6.75e-03

Pos logit contributions
 contain+0.166
 banks+0.158
tted+0.155
ured+0.151
 allowance+0.147
Neg logit contributions
ww-0.188
w-0.177
o-0.173
sets-0.160
hem-0.158
Token saw
Token ID2497
Feature activation-1.14e-02

Pos logit contributions
 contain+0.011
 allowance+0.011
 banks+0.010
ured+0.010
 symp+0.010
Neg logit contributions
w-0.010
o-0.010
ww-0.010
sets-0.009
hem-0.009
Token a
Token ID257
Feature activation-7.47e-03

Pos logit contributions
 contain+0.016
 banks+0.015
 tum+0.015
 allowance+0.014
ured+0.014
Neg logit contributions
ww-0.021
w-0.021
o-0.020
sets-0.018
hem-0.018
Token bird
Token ID6512
Feature activation-1.52e-03

Pos logit contributions
 contain+0.011
 tum+0.011
 banks+0.010
 allowance+0.010
 bank+0.010
Neg logit contributions
w-0.013
o-0.013
ww-0.012
ins-0.012
ries-0.012
Token in
Token ID287
Feature activation-5.91e-03

Pos logit contributions
 contain+0.003
 banks+0.002
 symp+0.002
 bank+0.002
 tum+0.002
Neg logit contributions
ww-0.002
w-0.002
www-0.002
hem-0.002
sets-0.002
Token.
Token ID13
Feature activation-6.58e-03

Pos logit contributions
 contain+0.010
 banks+0.010
tted+0.010
 allowance+0.010
 symp+0.009
Neg logit contributions
ww-0.010
w-0.010
o-0.010
very-0.009
sets-0.009
Token One
Token ID1881
Feature activation-1.54e-02

Pos logit contributions
 contain+0.007
 banks+0.007
 allowance+0.007
 tum+0.007
 symp+0.007
Neg logit contributions
w-0.014
ww-0.014
o-0.013
hem-0.012
ins-0.012
Token day
Token ID1110
Feature activation-7.10e-03

Pos logit contributions
 contain+0.028
 tum+0.027
 allowance+0.026
 banks+0.026
 symp+0.025
Neg logit contributions
w-0.021
ww-0.020
 yourselves-0.019
ries-0.019
sets-0.019
Token,
Token ID11
Feature activation-2.06e-03

Pos logit contributions
 contain+0.011
 banks+0.011
 allowance+0.010
 symp+0.010
ured+0.010
Neg logit contributions
ww-0.012
w-0.012
o-0.012
hem-0.010
sets-0.010
Token Wh
Token ID854
Feature activation-5.93e-02

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
 tum+0.003
ured+0.003
Neg logit contributions
ww-0.003
w-0.003
o-0.003
hem-0.003
sets-0.003
Tokenisk
Token ID1984
Feature activation-1.08e-01

Pos logit contributions
 contain+0.082
 banks+0.071
 tum+0.068
 allowance+0.068
ured+0.068
Neg logit contributions
o-0.117
w-0.117
ww-0.116
hem-0.104
ins-0.102
Tokeners
Token ID364
Feature activation-5.87e-03

Pos logit contributions
 contain+0.171
 banks+0.162
tted+0.157
ured+0.155
 allowance+0.154
Neg logit contributions
ww-0.179
w-0.170
o-0.167
sets-0.153
hem-0.151
Token saw
Token ID2497
Feature activation-9.98e-03

Pos logit contributions
 contain+0.011
 allowance+0.010
 banks+0.010
ured+0.009
 symp+0.009
Neg logit contributions
o-0.009
w-0.008
ww-0.008
hem-0.007
sets-0.007
Token a
Token ID257
Feature activation-7.85e-03

Pos logit contributions
 contain+0.014
 banks+0.014
 tum+0.014
 allowance+0.013
ured+0.013
Neg logit contributions
ww-0.018
w-0.017
o-0.017
sets-0.016
very-0.015
Token bird
Token ID6512
Feature activation-7.10e-04

Pos logit contributions
 contain+0.011
 tum+0.011
 allowance+0.011
 banks+0.011
 bank+0.010
Neg logit contributions
w-0.014
o-0.013
ww-0.013
ries-0.012
ins-0.012
Token outside
Token ID2354
Feature activation-4.09e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 symp+0.001
 tum+0.001
 bank+0.001
Neg logit contributions
ww-0.001
w-0.001
sets-0.001
o-0.001
www-0.001
Token.
Token ID13
Feature activation-7.18e-03

Pos logit contributions
 contain+0.016
 allowance+0.015
 banks+0.015
tted+0.015
ured+0.014
Neg logit contributions
o-0.017
w-0.016
ww-0.016
sets-0.014
hem-0.014
Token\n
Token ID198
Feature activation+3.25e-03

Pos logit contributions
 contain+0.011
 banks+0.011
 allowance+0.010
 symp+0.010
 tum+0.010
Neg logit contributions
ww-0.012
w-0.012
o-0.011
sets-0.010
hem-0.010
Token\n
Token ID198
Feature activation+3.09e-03

Pos logit contributions
ww+0.005
w+0.005
o+0.005
sets+0.005
ries+0.005
Neg logit contributions
 contain-0.005
 banks-0.005
 allowance-0.005
tted-0.005
 symp-0.004
TokenSo
Token ID2396
Feature activation+3.10e-03

Pos logit contributions
ww+0.005
w+0.005
o+0.004
sets+0.004
ries+0.004
Neg logit contributions
 contain-0.005
 banks-0.005
tted-0.005
 allowance-0.005
ured-0.005
Token Wh
Token ID854
Feature activation-5.98e-02

Pos logit contributions
ww+0.005
w+0.005
o+0.005
hem+0.005
sets+0.005
Neg logit contributions
 contain-0.005
 banks-0.005
 allowance-0.004
 tum-0.004
ured-0.004
Tokenisk
Token ID1984
Feature activation-1.08e-01

Pos logit contributions
 contain+0.063
 banks+0.059
 allowance+0.056
 symp+0.054
ured+0.054
Neg logit contributions
ww-0.131
w-0.128
o-0.125
sets-0.117
hem-0.116
Tokeners
Token ID364
Feature activation-7.33e-03

Pos logit contributions
 contain+0.162
 banks+0.154
tted+0.153
ured+0.149
 allowance+0.143
Neg logit contributions
ww-0.186
w-0.176
o-0.172
sets-0.161
hem-0.157
Token continued
Token ID3767
Feature activation+2.27e-02

Pos logit contributions
 contain+0.013
 allowance+0.013
 banks+0.012
 tum+0.012
tted+0.012
Neg logit contributions
o-0.011
w-0.010
ww-0.010
sets-0.008
man-0.008
Token to
Token ID284
Feature activation-8.39e-03

Pos logit contributions
w+0.044
ww+0.044
o+0.039
 yourselves+0.038
ins+0.038
Neg logit contributions
 banks-0.030
 contain-0.030
 allowance-0.029
 bank-0.028
ured-0.028
Token eat
Token ID4483
Feature activation-2.13e-03

Pos logit contributions
 contain+0.011
 banks+0.011
 allowance+0.010
ured+0.010
 symp+0.010
Neg logit contributions
ww-0.016
w-0.016
o-0.016
hem-0.014
ins-0.014
Token and
Token ID290
Feature activation-1.86e-03

Pos logit contributions
 contain+0.004
tted+0.004
nes+0.004
ured+0.004
 banks+0.004
Neg logit contributions
o-0.003
ww-0.003
w-0.002
ries-0.002
hem-0.002
Token ink
Token ID16882
Feature activation-5.52e-03

Pos logit contributions
ww+0.010
w+0.010
o+0.010
sets+0.009
ins+0.009
Neg logit contributions
 contain-0.007
 banks-0.007
 allowance-0.007
 tum-0.007
 aid-0.007
Token.
Token ID13
Feature activation-6.91e-04

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.008
ured+0.007
 tum+0.007
Neg logit contributions
ww-0.010
w-0.009
o-0.009
hem-0.008
sets-0.008
Token Mr
Token ID1770
Feature activation-3.73e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
 symp+0.001
Neg logit contributions
w-0.001
ww-0.001
o-0.001
sets-0.001
hem-0.001
Token.
Token ID13
Feature activation-6.98e-04

Pos logit contributions
 contain+0.051
 banks+0.048
 allowance+0.046
 tum+0.045
 symp+0.045
Neg logit contributions
ww-0.070
w-0.069
o-0.067
sets-0.062
hem-0.061
Token Wh
Token ID854
Feature activation-5.96e-02

Pos logit contributions
 banks+0.001
 allowance+0.001
 contain+0.001
 tum+0.001
 aid+0.001
Neg logit contributions
w-0.001
ww-0.001
o-0.001
ins-0.001
hem-0.001
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 contain+0.070
 banks+0.065
 allowance+0.062
ured+0.060
 symp+0.060
Neg logit contributions
ww-0.124
w-0.122
o-0.120
sets-0.111
hem-0.110
Tokeners
Token ID364
Feature activation-7.13e-03

Pos logit contributions
 contain+0.169
 banks+0.161
tted+0.156
ured+0.154
 allowance+0.152
Neg logit contributions
ww-0.178
w-0.170
o-0.167
sets-0.154
hem-0.150
Token went
Token ID1816
Feature activation+1.31e-02

Pos logit contributions
 contain+0.012
 allowance+0.011
 banks+0.011
ured+0.011
 symp+0.011
Neg logit contributions
o-0.010
w-0.010
ww-0.010
sets-0.009
hem-0.009
Token to
Token ID284
Feature activation-7.26e-03

Pos logit contributions
ww+0.026
w+0.025
o+0.023
sets+0.022
ins+0.022
Neg logit contributions
 contain-0.018
 banks-0.017
ured-0.016
 allowance-0.016
tted-0.016
Token the
Token ID262
Feature activation+7.67e-03

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
 symp+0.010
ured+0.010
Neg logit contributions
ww-0.012
o-0.012
w-0.011
hem-0.010
sets-0.010
Token store
Token ID3650
Feature activation-2.16e-03

Pos logit contributions
ww+0.011
w+0.011
o+0.011
ins+0.010
ries+0.010
Neg logit contributions
 contain-0.013
 banks-0.013
 bank-0.013
 tum-0.013
 allowance-0.012
Token,
Token ID11
Feature activation-6.18e-04

Pos logit contributions
ww+0.000
w+0.000
o+0.000
hem+0.000
sets+0.000
Neg logit contributions
 contain-0.000
 banks-0.000
 allowance-0.000
tted-0.000
 symp-0.000
Token "
Token ID366
Feature activation-5.84e-03

Pos logit contributions
 contain+0.001
 symp+0.001
ured+0.001
 banks+0.001
 tum+0.001
Neg logit contributions
w-0.001
ww-0.001
o-0.001
hem-0.001
rob-0.001
TokenI
Token ID40
Feature activation+1.41e-03

Pos logit contributions
 contain+0.011
tted+0.010
 symp+0.010
+0.010
 aid+0.010
Neg logit contributions
ww-0.008
o-0.007
w-0.007
ries-0.007
ins-0.007
Token am
Token ID716
Feature activation+1.83e-02

Pos logit contributions
ww+0.003
w+0.003
o+0.003
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
ured-0.002
 symp-0.002
Token Wh
Token ID854
Feature activation-5.90e-02

Pos logit contributions
ww+0.031
w+0.030
o+0.028
sets+0.026
hem+0.026
Neg logit contributions
 contain-0.030
 banks-0.027
 allowance-0.027
 tum-0.027
ured-0.027
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 contain+0.065
 banks+0.060
 allowance+0.057
ured+0.056
 symp+0.055
Neg logit contributions
ww-0.127
w-0.124
o-0.122
sets-0.113
hem-0.113
Tokeners
Token ID364
Feature activation-6.54e-03

Pos logit contributions
 contain+0.165
tted+0.154
 banks+0.150
ured+0.149
 allowance+0.145
Neg logit contributions
ww-0.182
w-0.173
o-0.171
sets-0.161
hem-0.157
Token.
Token ID13
Feature activation-5.15e-03

Pos logit contributions
 contain+0.011
 banks+0.010
 allowance+0.010
 tum+0.010
ured+0.010
Neg logit contributions
ww-0.010
w-0.010
o-0.010
hem-0.009
sets-0.009
Token I
Token ID314
Feature activation+8.27e-03

Pos logit contributions
 contain+0.010
 banks+0.009
tted+0.009
 allowance+0.009
 tum+0.009
Neg logit contributions
w-0.007
ww-0.007
ins-0.006
ries-0.006
o-0.006
Token am
Token ID716
Feature activation+1.75e-02

Pos logit contributions
ww+0.014
w+0.014
o+0.014
hem+0.013
sets+0.012
Neg logit contributions
 contain-0.012
 banks-0.012
 allowance-0.011
 symp-0.011
ured-0.011
Token too
Token ID1165
Feature activation+2.69e-02

Pos logit contributions
ww+0.031
w+0.031
o+0.028
sets+0.027
hem+0.027
Neg logit contributions
 contain-0.027
 allowance-0.025
 banks-0.024
 tum-0.024
ured-0.024
Token ch
Token ID442
Feature activation+6.36e-03

Pos logit contributions
 tum+0.009
 allowance+0.008
 contain+0.008
 banks+0.008
 bank+0.008
Neg logit contributions
w-0.010
o-0.010
ww-0.009
ries-0.009
ins-0.009
Tokenubby
Token ID38393
Feature activation+1.73e-02

Pos logit contributions
ww+0.009
w+0.008
o+0.008
sets+0.007
hem+0.007
Neg logit contributions
 contain-0.012
 banks-0.011
 allowance-0.011
 tum-0.011
 symp-0.011
Token cat
Token ID3797
Feature activation-1.43e-02

Pos logit contributions
w+0.028
ww+0.028
o+0.027
hem+0.025
sets+0.025
Neg logit contributions
 contain-0.027
 tum-0.027
 banks-0.027
 allowance-0.026
 bank-0.025
Token named
Token ID3706
Feature activation+2.26e-05

Pos logit contributions
 contain+0.019
 banks+0.018
 tum+0.018
 allowance+0.017
 bank+0.017
Neg logit contributions
ww-0.027
w-0.026
o-0.024
sets-0.024
hem-0.023
Token Wh
Token ID854
Feature activation-5.84e-02

Pos logit contributions
w+0.000
ww+0.000
o+0.000
ins+0.000
hem+0.000
Neg logit contributions
 contain-0.000
 banks-0.000
 allowance-0.000
 tum-0.000
ured-0.000
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 banks+0.053
 contain+0.052
 symp+0.047
tted+0.046
 aid+0.046
Neg logit contributions
ww-0.137
w-0.130
o-0.121
hem-0.119
www-0.119
Tokeners
Token ID364
Feature activation-5.57e-03

Pos logit contributions
tted+0.147
 contain+0.146
 banks+0.141
ured+0.136
 allowance+0.130
Neg logit contributions
ww-0.197
w-0.183
o-0.174
sets-0.168
very-0.165
Token.
Token ID13
Feature activation-1.00e-02

Pos logit contributions
 contain+0.010
 allowance+0.009
 banks+0.009
 tum+0.009
 symp+0.009
Neg logit contributions
o-0.009
w-0.008
ww-0.007
sets-0.007
my-0.007
Token Tim
Token ID5045
Feature activation+8.53e-03

Pos logit contributions
 contain+0.015
 banks+0.015
 allowance+0.014
 tum+0.014
 symp+0.014
Neg logit contributions
w-0.017
ww-0.017
o-0.015
hem-0.015
ins-0.015
Tokenmy
Token ID1820
Feature activation+2.19e-02

Pos logit contributions
ww+0.008
w+0.007
 Sorry+0.006
ries+0.006
 yourselves+0.006
Neg logit contributions
 banks-0.019
 symp-0.018
 bank-0.018
 contain-0.018
 allowance-0.018
Token loved
Token ID6151
Feature activation+4.41e-04

Pos logit contributions
ww+0.040
w+0.039
o+0.036
hem+0.035
sets+0.035
Neg logit contributions
 contain-0.030
 banks-0.030
 allowance-0.028
 tum-0.028
 symp-0.027
Token bark
Token ID21405
Feature activation-6.15e-03

Pos logit contributions
ww+0.002
w+0.002
o+0.002
loo+0.001
ins+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 widened-0.001
 tum-0.001
 bank-0.001
Tokened
Token ID276
Feature activation+3.78e-02

Pos logit contributions
 contain+0.013
 allowance+0.013
 banks+0.012
 aid+0.012
 tum+0.012
Neg logit contributions
o-0.007
ww-0.007
w-0.006
ning-0.006
ins-0.005
Token,
Token ID11
Feature activation-1.23e-03

Pos logit contributions
o+0.058
ww+0.057
w+0.054
 my+0.051
sets+0.050
Neg logit contributions
 contain-0.064
 allowance-0.059
 banks-0.059
tted-0.057
 tum-0.054
Token but
Token ID475
Feature activation-2.26e-02

Pos logit contributions
 contain+0.002
 banks+0.001
ured+0.001
 symp+0.001
 widened+0.001
Neg logit contributions
ww-0.003
w-0.003
o-0.002
loo-0.002
hem-0.002
Token Wh
Token ID854
Feature activation-5.93e-02

Pos logit contributions
 contain+0.041
 banks+0.039
 allowance+0.039
 tum+0.039
ured+0.038
Neg logit contributions
ww-0.032
w-0.031
o-0.029
ins-0.027
hem-0.026
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 banks+0.050
 contain+0.048
 symp+0.047
tted+0.046
ured+0.044
Neg logit contributions
ww-0.141
w-0.134
o-0.126
sets-0.124
hem-0.123
Tokeners
Token ID364
Feature activation-5.66e-03

Pos logit contributions
 contain+0.147
tted+0.145
 banks+0.141
ured+0.138
tered+0.129
Neg logit contributions
ww-0.197
w-0.183
o-0.179
sets-0.168
hem-0.165
Token did
Token ID750
Feature activation+1.40e-02

Pos logit contributions
 contain+0.009
 allowance+0.008
 banks+0.008
tted+0.008
ured+0.008
Neg logit contributions
w-0.010
o-0.009
ww-0.009
sets-0.008
hem-0.008
Token not
Token ID407
Feature activation+8.87e-03

Pos logit contributions
ww+0.022
w+0.022
o+0.022
hem+0.019
sets+0.019
Neg logit contributions
 contain-0.023
 banks-0.022
 allowance-0.021
 symp-0.021
 tum-0.021
Token care
Token ID1337
Feature activation-1.01e-02

Pos logit contributions
ww+0.018
w+0.018
o+0.018
hem+0.016
sets+0.016
Neg logit contributions
 contain-0.011
 banks-0.010
 allowance-0.009
 symp-0.009
 tum-0.009
Token.
Token ID13
Feature activation-2.59e-03

Pos logit contributions
 contain+0.016
 banks+0.015
 allowance+0.015
 tum+0.014
tted+0.014
Neg logit contributions
o-0.017
ww-0.016
w-0.016
hem-0.015
ins-0.015
Token a
Token ID257
Feature activation-7.25e-03

Pos logit contributions
 contain+0.005
 banks+0.004
 allowance+0.004
ured+0.004
 symp+0.004
Neg logit contributions
ww-0.005
w-0.004
o-0.004
sets-0.004
hem-0.004
Token naughty
Token ID45128
Feature activation-1.84e-02

Pos logit contributions
 contain+0.009
 banks+0.008
 tum+0.008
 allowance+0.008
 symp+0.008
Neg logit contributions
w-0.014
ww-0.014
o-0.014
sets-0.013
ries-0.012
Token cat
Token ID3797
Feature activation-1.42e-02

Pos logit contributions
 tum+0.031
 contain+0.030
 banks+0.029
 symp+0.029
 allowance+0.029
Neg logit contributions
w-0.028
ww-0.026
o-0.026
sets-0.025
ries-0.024
Token named
Token ID3706
Feature activation-3.34e-04

Pos logit contributions
 contain+0.019
 banks+0.018
 tum+0.017
 allowance+0.017
 symp+0.017
Neg logit contributions
ww-0.027
w-0.026
o-0.024
sets-0.024
hem-0.023
Token Wh
Token ID854
Feature activation-5.91e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
ured+0.001
Neg logit contributions
ww-0.000
w-0.000
o-0.000
sets-0.000
hem-0.000
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 contain+0.070
 banks+0.064
 allowance+0.062
ured+0.060
 tum+0.060
Neg logit contributions
ww-0.122
w-0.121
o-0.119
sets-0.109
hem-0.109
Tokeners
Token ID364
Feature activation-4.01e-03

Pos logit contributions
 contain+0.153
tted+0.147
 banks+0.146
ured+0.139
 allowance+0.138
Neg logit contributions
ww-0.190
w-0.177
o-0.170
sets-0.163
very-0.158
Token.
Token ID13
Feature activation-7.19e-03

Pos logit contributions
 contain+0.009
 allowance+0.008
ured+0.008
tted+0.008
 banks+0.008
Neg logit contributions
o-0.006
w-0.005
my-0.004
man-0.004
?-0.004
Token Wh
Token ID854
Feature activation-5.83e-02

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
ured+0.011
 symp+0.011
Neg logit contributions
ww-0.011
w-0.011
o-0.011
sets-0.010
hem-0.010
Tokenisk
Token ID1984
Feature activation-1.06e-01

Pos logit contributions
 contain+0.064
 banks+0.059
 allowance+0.056
 symp+0.054
ured+0.054
Neg logit contributions
ww-0.126
w-0.123
o-0.121
sets-0.112
hem-0.112
Tokeners
Token ID364
Feature activation-4.09e-03

Pos logit contributions
 contain+0.149
tted+0.146
 banks+0.144
ured+0.135
 allowance+0.133
Neg logit contributions
ww-0.191
w-0.177
o-0.171
sets-0.165
very-0.160
Token Lucy
Token ID22162
Feature activation-1.32e-02

Pos logit contributions
ww+0.002
w+0.002
o+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
ured-0.002
 tum-0.002
Token and
Token ID290
Feature activation+9.79e-04

Pos logit contributions
 contain+0.021
 banks+0.020
 allowance+0.019
ured+0.019
 symp+0.019
Neg logit contributions
ww-0.022
w-0.022
o-0.021
hem-0.019
sets-0.019
Token Mr
Token ID1770
Feature activation-3.84e-02

Pos logit contributions
ww+0.001
w+0.001
o+0.001
hem+0.001
ins+0.001
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
 tum-0.002
 symp-0.002
Token.
Token ID13
Feature activation-9.36e-04

Pos logit contributions
 contain+0.057
 banks+0.052
 allowance+0.051
ured+0.050
 tum+0.049
Neg logit contributions
ww-0.068
o-0.067
w-0.067
sets-0.060
hem-0.060
Token Wh
Token ID854
Feature activation-5.96e-02

Pos logit contributions
 banks+0.001
 contain+0.001
 allowance+0.001
 symp+0.001
ured+0.001
Neg logit contributions
w-0.002
ww-0.002
o-0.001
ins-0.001
hem-0.001
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 contain+0.069
 banks+0.063
 allowance+0.061
 tum+0.059
ured+0.058
Neg logit contributions
ww-0.125
w-0.124
o-0.122
sets-0.112
hem-0.112
Tokeners
Token ID364
Feature activation-4.26e-03

Pos logit contributions
 contain+0.163
 banks+0.154
tted+0.151
ured+0.148
 allowance+0.145
Neg logit contributions
ww-0.183
w-0.174
o-0.169
sets-0.155
hem-0.154
Token went
Token ID1816
Feature activation+1.37e-02

Pos logit contributions
 allowance+0.006
 contain+0.006
 banks+0.006
ured+0.006
 symp+0.006
Neg logit contributions
o-0.007
w-0.007
ww-0.006
my-0.006
sets-0.006
Token for
Token ID329
Feature activation+4.70e-03

Pos logit contributions
ww+0.029
w+0.029
www+0.025
o+0.025
ries+0.025
Neg logit contributions
 contain-0.017
 banks-0.016
 allowance-0.015
 bank-0.014
tted-0.014
Token a
Token ID257
Feature activation-7.42e-03

Pos logit contributions
ww+0.006
o+0.005
w+0.005
 my+0.005
hem+0.005
Neg logit contributions
 contain-0.009
 banks-0.009
tted-0.009
 symp-0.009
ured-0.008
Token walk
Token ID2513
Feature activation+2.62e-03

Pos logit contributions
 contain+0.009
 banks+0.009
ured+0.009
tted+0.008
 symp+0.008
Neg logit contributions
ww-0.015
o-0.014
w-0.014
very-0.013
hem-0.013
Token flew
Token ID13112
Feature activation+7.39e-03

Pos logit contributions
ww+0.006
w+0.006
o+0.006
hem+0.005
sets+0.005
Neg logit contributions
 contain-0.005
 banks-0.004
 allowance-0.004
 symp-0.004
ured-0.004
Token down
Token ID866
Feature activation+1.59e-03

Pos logit contributions
ww+0.015
w+0.014
o+0.013
ins+0.013
rob+0.013
Neg logit contributions
 contain-0.010
 banks-0.009
ured-0.009
tted-0.008
 symp-0.008
Token and
Token ID290
Feature activation-1.67e-03

Pos logit contributions
ww+0.003
w+0.003
o+0.003
sets+0.003
ins+0.003
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
tted-0.002
ured-0.002
Token scared
Token ID12008
Feature activation+8.90e-03

Pos logit contributions
 banks+0.002
 contain+0.002
 tum+0.002
 bank+0.002
ured+0.002
Neg logit contributions
w-0.003
ww-0.003
ries-0.003
loo-0.003
ins-0.003
Token Wh
Token ID854
Feature activation-5.91e-02

Pos logit contributions
ww+0.015
w+0.015
o+0.015
sets+0.013
hem+0.013
Neg logit contributions
 contain-0.014
 banks-0.013
 tum-0.012
ured-0.012
 allowance-0.012
Tokenisk
Token ID1984
Feature activation-1.06e-01

Pos logit contributions
 contain+0.071
 banks+0.064
 allowance+0.062
 tum+0.061
 symp+0.059
Neg logit contributions
ww-0.122
w-0.121
o-0.121
hem-0.110
sets-0.109
Tokeners
Token ID364
Feature activation-4.52e-03

Pos logit contributions
 contain+0.176
 banks+0.166
 allowance+0.160
tted+0.159
ured+0.159
Neg logit contributions
ww-0.168
w-0.163
o-0.159
sets-0.144
hem-0.143
Token away
Token ID1497
Feature activation-2.41e-03

Pos logit contributions
 contain+0.007
 allowance+0.007
 symp+0.007
 banks+0.007
 tum+0.007
Neg logit contributions
o-0.007
w-0.006
ww-0.006
sets-0.006
ning-0.005
Token.
Token ID13
Feature activation-8.62e-03

Pos logit contributions
 contain+0.004
 banks+0.003
 allowance+0.003
 symp+0.003
 tum+0.003
Neg logit contributions
ww-0.004
w-0.004
o-0.004
sets-0.004
hem-0.004
Token\n
Token ID198
Feature activation+8.12e-04

Pos logit contributions
 contain+0.013
 banks+0.013
 allowance+0.012
 tum+0.012
 symp+0.012
Neg logit contributions
w-0.015
ww-0.015
o-0.014
sets-0.013
hem-0.013
Token\n
Token ID198
Feature activation+6.98e-04

Pos logit contributions
ww+0.001
w+0.001
o+0.001
sets+0.001
ries+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 allowance-0.001
tted-0.001
 symp-0.001
Token Wh
Token ID854
Feature activation-5.91e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
ured+0.001
Neg logit contributions
ww-0.000
w-0.000
o-0.000
sets-0.000
hem-0.000
Tokenisk
Token ID1984
Feature activation-1.07e-01

Pos logit contributions
 contain+0.070
 banks+0.064
 allowance+0.062
ured+0.060
 tum+0.060
Neg logit contributions
ww-0.122
w-0.121
o-0.119
sets-0.109
hem-0.109
Tokeners
Token ID364
Feature activation-4.01e-03

Pos logit contributions
 contain+0.153
tted+0.147
 banks+0.146
ured+0.139
 allowance+0.138
Neg logit contributions
ww-0.190
w-0.177
o-0.170
sets-0.163
very-0.158
Token.
Token ID13
Feature activation-7.19e-03

Pos logit contributions
 contain+0.009
 allowance+0.008
ured+0.008
tted+0.008
 banks+0.008
Neg logit contributions
o-0.006
w-0.005
my-0.004
man-0.004
?-0.004
Token Wh
Token ID854
Feature activation-5.83e-02

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
ured+0.011
 symp+0.011
Neg logit contributions
ww-0.011
w-0.011
o-0.011
sets-0.010
hem-0.010
Tokenisk
Token ID1984
Feature activation-1.06e-01

Pos logit contributions
 contain+0.064
 banks+0.059
 allowance+0.056
 symp+0.054
ured+0.054
Neg logit contributions
ww-0.126
w-0.123
o-0.121
sets-0.112
hem-0.112
Tokeners
Token ID364
Feature activation-4.09e-03

Pos logit contributions
 contain+0.149
tted+0.146
 banks+0.144
ured+0.135
 allowance+0.133
Neg logit contributions
ww-0.191
w-0.177
o-0.171
sets-0.165
very-0.160
Token loved
Token ID6151
Feature activation-2.07e-03

Pos logit contributions
 contain+0.007
 banks+0.006
 allowance+0.006
ured+0.006
 symp+0.006
Neg logit contributions
w-0.006
ww-0.006
o-0.006
hem-0.005
sets-0.005
Token to
Token ID284
Feature activation-7.06e-03

Pos logit contributions
 tum+0.002
 allowance+0.002
 banks+0.002
 bank+0.001
 contain+0.001
Neg logit contributions
w-0.005
ww-0.005
very-0.004
hem-0.004
o-0.004
Token play
Token ID711
Feature activation+1.36e-02

Pos logit contributions
 contain+0.010
 banks+0.009
 allowance+0.009
 tum+0.009
ured+0.009
Neg logit contributions
ww-0.013
w-0.013
o-0.013
hem-0.012
sets-0.012
Token with
Token ID351
Feature activation+1.14e-02

Pos logit contributions
ww+0.019
w+0.019
o+0.018
sets+0.016
hem+0.016
Neg logit contributions
 contain-0.025
 banks-0.024
 allowance-0.023
ured-0.023
 symp-0.023
Token loved
Token ID6151
Feature activation+4.41e-04

Pos logit contributions
ww+0.040
w+0.039
o+0.036
hem+0.035
sets+0.035
Neg logit contributions
 contain-0.030
 banks-0.030
 allowance-0.028
 tum-0.028
 symp-0.027
Token to
Token ID284
Feature activation-7.25e-03

Pos logit contributions
w+0.001
ww+0.001
 yourselves+0.001
very+0.001
www+0.001
Neg logit contributions
 tum-0.000
 banks-0.000
 allowance-0.000
 bank-0.000
 contain-0.000
Token play
Token ID711
Feature activation+1.55e-02

Pos logit contributions
 contain+0.011
 banks+0.010
 allowance+0.010
 tum+0.010
ured+0.009
Neg logit contributions
ww-0.013
w-0.013
o-0.012
hem-0.011
sets-0.011
Token with
Token ID351
Feature activation+1.15e-02

Pos logit contributions
ww+0.021
w+0.021
o+0.020
hem+0.018
sets+0.017
Neg logit contributions
 contain-0.030
 banks-0.028
 allowance-0.027
 symp-0.027
ured-0.027
Token Wh
Token ID854
Feature activation-5.75e-02

Pos logit contributions
ww+0.016
o+0.016
w+0.016
hem+0.014
sets+0.014
Neg logit contributions
 contain-0.022
 banks-0.020
ured-0.019
 symp-0.019
tted-0.019
Tokenisk
Token ID1984
Feature activation-1.06e-01

Pos logit contributions
 banks+0.049
 contain+0.049
 symp+0.044
tted+0.044
 allowance+0.044
Neg logit contributions
ww-0.134
w-0.130
o-0.123
sets-0.121
hem-0.119
Tokeners
Token ID364
Feature activation-4.33e-03

Pos logit contributions
 contain+0.153
tted+0.150
 banks+0.146
ured+0.138
 allowance+0.137
Neg logit contributions
ww-0.184
w-0.174
o-0.166
sets-0.162
 blows-0.157
Token,
Token ID11
Feature activation-5.75e-03

Pos logit contributions
 contain+0.008
tted+0.008
 symp+0.008
ured+0.008
 banks+0.008
Neg logit contributions
o-0.006
w-0.005
ww-0.005
sets-0.005
my-0.005
Token but
Token ID475
Feature activation-2.38e-02

Pos logit contributions
 contain+0.009
 banks+0.008
ured+0.008
tted+0.008
 allowance+0.008
Neg logit contributions
o-0.010
ww-0.010
w-0.010
sets-0.009
 my-0.009
Token one
Token ID530
Feature activation-1.66e-02

Pos logit contributions
 contain+0.040
 banks+0.039
 allowance+0.038
 tum+0.037
 symp+0.037
Neg logit contributions
ww-0.036
w-0.035
o-0.034
hem-0.031
sets-0.031
Token day
Token ID1110
Feature activation-8.62e-03

Pos logit contributions
 tum+0.030
 contain+0.028
 allowance+0.027
 aid+0.026
 symp+0.025
Neg logit contributions
 yourselves-0.024
ww-0.023
w-0.023
 yourself-0.023
sets-0.023
Token outside
Token ID2354
Feature activation-4.09e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 symp+0.001
 tum+0.001
 bank+0.001
Neg logit contributions
ww-0.001
w-0.001
sets-0.001
o-0.001
www-0.001
Token the
Token ID262
Feature activation+7.23e-03

Pos logit contributions
 contain+0.064
 banks+0.059
 tum+0.057
 symp+0.057
 aid+0.055
Neg logit contributions
ww-0.074
w-0.070
o-0.064
hem-0.062
rob-0.061
Token door
Token ID3420
Feature activation+3.02e-02

Pos logit contributions
w+0.011
ww+0.011
o+0.010
ins+0.009
ries+0.009
Neg logit contributions
 banks-0.013
 tum-0.012
 bank-0.012
 contain-0.012
 aid-0.012
Token.
Token ID13
Feature activation-4.65e-03

Pos logit contributions
ww+0.062
w+0.060
o+0.057
hem+0.054
sets+0.054
Neg logit contributions
 contain-0.039
 banks-0.035
 tum-0.033
 allowance-0.033
 symp-0.033
Token Wh
Token ID854
Feature activation-5.78e-02

Pos logit contributions
 contain+0.006
 banks+0.006
 tum+0.006
 allowance+0.006
 symp+0.006
Neg logit contributions
w-0.009
ww-0.009
o-0.008
hem-0.008
ins-0.008
Tokenisk
Token ID1984
Feature activation-1.06e-01

Pos logit contributions
 contain+0.069
 banks+0.062
 allowance+0.060
 tum+0.058
ured+0.058
Neg logit contributions
ww-0.121
w-0.120
o-0.118
hem-0.108
sets-0.107
Tokeners
Token ID364
Feature activation-4.23e-03

Pos logit contributions
 contain+0.152
tted+0.147
 banks+0.144
ured+0.140
 allowance+0.133
Neg logit contributions
ww-0.188
w-0.176
o-0.172
sets-0.163
hem-0.157
Token wanted
Token ID2227
Feature activation-3.09e-03

Pos logit contributions
 contain+0.007
 banks+0.007
 allowance+0.007
ured+0.006
 tum+0.006
Neg logit contributions
ww-0.007
w-0.006
o-0.006
sets-0.006
hem-0.006
Token to
Token ID284
Feature activation-6.96e-03

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
 tum+0.004
ured+0.004
Neg logit contributions
ww-0.006
w-0.006
o-0.005
sets-0.005
hem-0.005
Token catch
Token ID4929
Feature activation+8.32e-05

Pos logit contributions
 contain+0.009
 banks+0.009
 allowance+0.008
 symp+0.008
ured+0.008
Neg logit contributions
ww-0.013
w-0.013
o-0.013
hem-0.012
sets-0.012
Token the
Token ID262
Feature activation+8.15e-03

Pos logit contributions
ww+0.000
w+0.000
o+0.000
hem+0.000
sets+0.000
Neg logit contributions
 contain-0.000
 banks-0.000
 allowance-0.000
ured-0.000
 symp-0.000
Token '
Token ID705
Feature activation+9.40e-03

Pos logit contributions
ww+0.030
w+0.030
o+0.028
hem+0.026
sets+0.026
Neg logit contributions
 contain-0.026
 banks-0.025
 allowance-0.024
ured-0.023
 symp-0.023
Tokentick
Token ID42298
Feature activation-8.92e-03

Pos logit contributions
ww+0.014
imi+0.013
hem+0.013
w+0.013
loo+0.013
Neg logit contributions
 contain-0.015
tted-0.014
 allowance-0.014
 symp-0.013
 banks-0.013
Token-
Token ID12
Feature activation+4.80e-02

Pos logit contributions
 contain+0.016
 banks+0.015
 allowance+0.015
ured+0.015
 symp+0.015
Neg logit contributions
ww-0.012
w-0.012
o-0.012
sets-0.010
hem-0.010
Tokent
Token ID83
Feature activation+1.56e-02

Pos logit contributions
ww+0.089
w+0.086
o+0.085
sets+0.081
hem+0.079
Neg logit contributions
 contain-0.066
 banks-0.060
 allowance-0.058
tted-0.058
ured-0.057
Tokenock
Token ID735
Feature activation+2.51e-02

Pos logit contributions
ww+0.024
w+0.024
o+0.024
sets+0.022
hem+0.021
Neg logit contributions
 contain-0.027
 banks-0.024
ured-0.023
 allowance-0.023
 symp-0.022
Token',
Token ID3256
Feature activation+7.16e-02

Pos logit contributions
ww+0.046
w+0.045
o+0.045
hem+0.041
sets+0.041
Neg logit contributions
 contain-0.035
 banks-0.032
ured-0.031
 allowance-0.031
 symp-0.030
Token and
Token ID290
Feature activation+2.67e-03

Pos logit contributions
ww+0.126
w+0.124
o+0.122
hem+0.111
rob+0.109
Neg logit contributions
 contain-0.109
 banks-0.102
ured-0.099
 tum-0.098
 symp-0.096
Token they
Token ID484
Feature activation-8.58e-03

Pos logit contributions
ww+0.005
w+0.005
o+0.005
hem+0.004
sets+0.004
Neg logit contributions
 contain-0.004
 banks-0.004
 allowance-0.004
ured-0.004
 tum-0.003
Token would
Token ID561
Feature activation-2.95e-03

Pos logit contributions
 contain+0.012
 banks+0.012
 allowance+0.011
ured+0.011
 symp+0.011
Neg logit contributions
w-0.015
o-0.015
ww-0.015
sets-0.013
hem-0.013
Token both
Token ID1111
Feature activation+4.63e-02

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.003
ured+0.003
 tum+0.003
Neg logit contributions
ww-0.006
w-0.006
o-0.006
sets-0.005
hem-0.005
Token be
Token ID307
Feature activation-8.07e-03

Pos logit contributions
ww+0.091
w+0.090
o+0.088
hem+0.081
sets+0.081
Neg logit contributions
 contain-0.060
 banks-0.055
 allowance-0.053
 tum-0.051
ured-0.051
Token looks
Token ID3073
Feature activation+2.52e-02

Pos logit contributions
+0.008
 banks+0.007
 contain+0.006
 seem+0.006
 believe+0.006
Neg logit contributions
w-0.024
ww-0.024
loo-0.023
irens-0.022
 yourselves-0.022
Token very
Token ID845
Feature activation-1.96e-03

Pos logit contributions
ww+0.044
w+0.042
o+0.042
 my+0.038
hem+0.037
Neg logit contributions
 contain-0.038
 banks-0.036
 allowance-0.034
tted-0.034
 symp-0.033
Token nice
Token ID3621
Feature activation+9.74e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 tum+0.003
 allowance+0.003
ured+0.003
Neg logit contributions
w-0.003
ww-0.003
o-0.003
sets-0.003
hem-0.003
Token,
Token ID11
Feature activation+1.27e-03

Pos logit contributions
w+0.019
ww+0.019
o+0.019
Look+0.018
asc+0.018
Neg logit contributions
 contain-0.012
 allowance-0.011
 banks-0.011
 tum-0.010
ured-0.010
Token sweet
Token ID6029
Feature activation+2.10e-02

Pos logit contributions
w+0.002
o+0.002
ww+0.002
rob+0.002
hem+0.002
Neg logit contributions
 contain-0.002
 tum-0.002
 allowance-0.002
 banks-0.002
 symp-0.002
Tokenheart
Token ID11499
Feature activation+9.26e-02

Pos logit contributions
w+0.027
o+0.024
ww+0.023
Look+0.023
rob+0.022
Neg logit contributions
 contain-0.039
 tum-0.038
 banks-0.037
 symp-0.036
 allowance-0.035
Token,"
Token ID553
Feature activation+1.32e-02

Pos logit contributions
ww+0.170
w+0.170
o+0.164
sets+0.154
hem+0.153
Neg logit contributions
 contain-0.124
 banks-0.115
 allowance-0.114
 tum-0.111
ured-0.110
Token she
Token ID673
Feature activation-1.16e-03

Pos logit contributions
w+0.026
rob+0.025
asc+0.025
ww+0.025
o+0.024
Neg logit contributions
 tum-0.019
-0.017
 contain-0.016
 allowance-0.016
 symp-0.015
Token said
Token ID531
Feature activation+3.38e-04

Pos logit contributions
 contain+0.002
 banks+0.002
 tum+0.002
 symp+0.002
ured+0.002
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
 yourselves-0.001
Token.
Token ID13
Feature activation-6.20e-03

Pos logit contributions
o+0.000
ww+0.000
w+0.000
very+0.000
sets+0.000
Neg logit contributions
 banks-0.001
 contain-0.001
 allowance-0.001
tted-0.001
 bank-0.001
Token
Token ID198
Feature activation-4.33e-04

Pos logit contributions
 contain+0.009
 tum+0.008
 banks+0.008
 allowance+0.008
 symp+0.008
Neg logit contributions
w-0.011
ww-0.011
o-0.010
sets-0.010
hem-0.010
Token.
Token ID13
Feature activation-5.01e-03

Pos logit contributions
 contain+0.002
 banks+0.002
 allowance+0.002
ured+0.002
tted+0.002
Neg logit contributions
ww-0.002
w-0.002
o-0.002
sets-0.002
hem-0.002
Token It
Token ID632
Feature activation-1.27e-02

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.008
ured+0.008
 tum+0.008
Neg logit contributions
ww-0.008
w-0.008
o-0.008
sets-0.007
hem-0.007
Token was
Token ID373
Feature activation-1.22e-03

Pos logit contributions
 contain+0.019
 banks+0.018
 tum+0.017
 widened+0.017
 symp+0.017
Neg logit contributions
ww-0.022
w-0.022
o-0.021
hem-0.019
 yourselves-0.019
Token so
Token ID523
Feature activation+1.36e-02

Pos logit contributions
 contain+0.002
 banks+0.002
 allowance+0.002
 tum+0.002
ured+0.002
Neg logit contributions
ww-0.002
w-0.002
o-0.002
sets-0.002
hem-0.002
Token y
Token ID331
Feature activation+1.88e-02

Pos logit contributions
w+0.023
o+0.023
ww+0.023
sets+0.022
rob+0.022
Neg logit contributions
 contain-0.020
 tum-0.019
 banks-0.018
 allowance-0.018
 symp-0.018
Tokenummy
Token ID13513
Feature activation+7.53e-02

Pos logit contributions
ww+0.021
w+0.021
o+0.021
sets+0.018
hem+0.017
Neg logit contributions
 contain-0.039
 banks-0.037
ured-0.036
 tum-0.036
 symp-0.036
Token!
Token ID0
Feature activation+1.36e-02

Pos logit contributions
w+0.128
ww+0.124
o+0.122
hem+0.111
sets+0.111
Neg logit contributions
 contain-0.113
 banks-0.107
ured-0.106
 allowance-0.104
 tum-0.104
Token
Token ID198
Feature activation-2.76e-04

Pos logit contributions
ww+0.018
w+0.018
o+0.017
hem+0.015
sets+0.015
Neg logit contributions
 contain-0.026
 banks-0.025
 allowance-0.024
ured-0.024
 symp-0.024
Token
Token ID198
Feature activation-4.68e-04

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
 symp+0.000
 tum+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
sets-0.000
hem-0.000
TokenM
Token ID44
Feature activation+1.49e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
ured+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
hem-0.001
very-0.001
Tokenolly
Token ID5098
Feature activation+1.45e-02

Pos logit contributions
ww+0.023
w+0.023
o+0.022
hem+0.020
sets+0.019
Neg logit contributions
 contain-0.026
 banks-0.024
 allowance-0.023
ured-0.023
 symp-0.023
Token
Token ID95
Feature activation+2.58e-02

Pos logit contributions
ww+0.004
o+0.004
w+0.004
www+0.004
very+0.004
Neg logit contributions
 contain-0.004
 allowance-0.003
 tum-0.003
 banks-0.003
 symp-0.003
Token
Token ID26391
Feature activation-3.83e-02

Pos logit contributions
ww+0.045
o+0.044
w+0.044
hem+0.039
sets+0.038
Neg logit contributions
 contain-0.039
 allowance-0.036
 banks-0.036
 tum-0.035
 symp-0.034
Token
Token ID129
Feature activation+9.60e-04

Pos logit contributions
 contain+0.058
 banks+0.055
 allowance+0.053
 tum+0.053
tted+0.052
Neg logit contributions
ww-0.065
w-0.064
o-0.061
hem-0.057
sets-0.056
Token
Token ID241
Feature activation+1.20e-03

Pos logit contributions
ww+0.002
o+0.002
w+0.002
 yourselves+0.002
 my+0.002
Neg logit contributions
 contain-0.001
 allowance-0.001
 tum-0.001
 widened-0.001
 symp-0.001
TokenO
Token ID46
Feature activation+1.46e-02

Pos logit contributions
ww+0.002
w+0.002
o+0.001
sets+0.001
ries+0.001
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
 symp-0.002
tted-0.002
Tokenw
Token ID86
Feature activation+7.02e-02

Pos logit contributions
sets+0.019
ww+0.019
Look+0.018
 my+0.018
hem+0.017
Neg logit contributions
 banks-0.024
tted-0.024
 contain-0.024
 symp-0.024
 allowance-0.022
Token!
Token ID0
Feature activation+1.45e-02

Pos logit contributions
ww+0.112
w+0.108
o+0.106
sets+0.098
hem+0.098
Neg logit contributions
 contain-0.110
 banks-0.106
 tum-0.103
ured-0.103
 symp-0.102
Token That
Token ID1320
Feature activation+9.33e-03

Pos logit contributions
w+0.024
ww+0.023
o+0.021
sets+0.020
hem+0.020
Neg logit contributions
 contain-0.023
 allowance-0.022
 banks-0.022
tted-0.021
 symp-0.021
Token hurt
Token ID5938
Feature activation-1.04e-02

Pos logit contributions
ww+0.018
o+0.018
w+0.018
ins+0.017
ries+0.017
Neg logit contributions
 contain-0.011
 banks-0.011
 tum-0.011
 allowance-0.010
 bank-0.010
Token.
Token ID13
Feature activation-4.02e-03

Pos logit contributions
 contain+0.016
 banks+0.016
 tum+0.015
 allowance+0.015
tted+0.015
Neg logit contributions
ww-0.016
w-0.016
o-0.016
sets-0.015
hem-0.015
Tokenâ
Token ID22940
Feature activation+8.74e-03

Pos logit contributions
 contain+0.007
 banks+0.007
 allowance+0.007
 symp+0.007
 tum+0.007
Neg logit contributions
w-0.006
ww-0.006
o-0.005
sets-0.005
hem-0.005
Token make
Token ID787
Feature activation+3.69e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
 symp+0.003
ured+0.003
Neg logit contributions
ww-0.006
w-0.006
o-0.006
hem-0.005
sets-0.005
Token him
Token ID683
Feature activation+8.93e-04

Pos logit contributions
ww+0.006
w+0.006
o+0.006
asc+0.005
sets+0.005
Neg logit contributions
 banks-0.005
 contain-0.005
 tum-0.005
 allowance-0.005
ured-0.005
Token the
Token ID262
Feature activation+1.13e-02

Pos logit contributions
o+0.001
ww+0.001
w+0.001
sets+0.001
 my+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
ured-0.001
 symp-0.001
tted-0.001
Token sp
Token ID599
Feature activation+1.35e-02

Pos logit contributions
ww+0.017
w+0.017
o+0.017
hem+0.015
sets+0.015
Neg logit contributions
 contain-0.019
 banks-0.019
 tum-0.018
 allowance-0.018
 bank-0.017
Tokenic
Token ID291
Feature activation+5.00e-02

Pos logit contributions
ww+0.021
w+0.020
o+0.020
sets+0.019
Look+0.018
Neg logit contributions
 contain-0.023
 banks-0.022
 tum-0.021
 allowance-0.021
ured-0.020
Tokeniest
Token ID6386
Feature activation+7.94e-02

Pos logit contributions
ww+0.065
w+0.063
o+0.062
sets+0.055
hem+0.054
Neg logit contributions
 contain-0.098
 banks-0.092
 allowance-0.090
ured-0.088
 tum-0.088
Token food
Token ID2057
Feature activation-5.60e-03

Pos logit contributions
ww+0.120
w+0.119
o+0.114
sets+0.105
hem+0.100
Neg logit contributions
 contain-0.137
 banks-0.132
 tum-0.131
 allowance-0.129
 symp-0.123
Token she
Token ID673
Feature activation-2.56e-03

Pos logit contributions
 contain+0.010
 banks+0.009
 tum+0.009
 allowance+0.009
ured+0.008
Neg logit contributions
ww-0.009
w-0.009
o-0.008
hem-0.008
sets-0.007
Token could
Token ID714
Feature activation+1.07e-02

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
 tum+0.003
tted+0.003
Neg logit contributions
ww-0.005
w-0.005
o-0.005
sets-0.005
hem-0.005
Token think
Token ID892
Feature activation+8.35e-03

Pos logit contributions
o+0.019
w+0.019
ww+0.018
hem+0.017
sets+0.016
Neg logit contributions
 contain-0.015
 banks-0.014
 symp-0.014
 allowance-0.014
ured-0.013
Token of
Token ID286
Feature activation+7.75e-03

Pos logit contributions
ww+0.014
w+0.014
o+0.013
hem+0.012
asc+0.012
Neg logit contributions
 contain-0.013
 allowance-0.012
 tum-0.012
 banks-0.012
ured-0.011
Token,
Token ID11
Feature activation-2.36e-03

Pos logit contributions
ww+0.015
w+0.015
o+0.014
sets+0.013
hem+0.013
Neg logit contributions
 contain-0.011
 banks-0.011
 tum-0.010
 allowance-0.010
tted-0.010
Token and
Token ID290
Feature activation-9.85e-04

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.003
ured+0.003
 tum+0.003
Neg logit contributions
ww-0.004
w-0.004
o-0.004
sets-0.003
hem-0.003
Token they
Token ID484
Feature activation-8.61e-03

Pos logit contributions
 contain+0.002
 banks+0.002
 allowance+0.001
 symp+0.001
ured+0.001
Neg logit contributions
ww-0.002
w-0.002
o-0.002
sets-0.001
hem-0.001
Token ran
Token ID4966
Feature activation-2.10e-03

Pos logit contributions
 contain+0.015
ured+0.014
 allowance+0.014
 aid+0.014
 banks+0.014
Neg logit contributions
o-0.013
w-0.013
ww-0.012
ep-0.010
my-0.010
Token in
Token ID287
Feature activation-5.29e-03

Pos logit contributions
 contain+0.004
 allowance+0.003
 symp+0.003
 banks+0.003
 tum+0.003
Neg logit contributions
o-0.003
ww-0.003
w-0.003
 my-0.003
hem-0.003
Token the
Token ID262
Feature activation+5.56e-03

Pos logit contributions
 contain+0.010
ured+0.010
 banks+0.009
tted+0.009
isc+0.009
Neg logit contributions
o-0.006
ww-0.006
 my-0.006
w-0.006
hem-0.005
Token grass
Token ID8701
Feature activation+1.56e-02

Pos logit contributions
w+0.009
ww+0.009
o+0.008
ries+0.008
ins+0.008
Neg logit contributions
 tum-0.009
 banks-0.009
 contain-0.009
tted-0.008
 bank-0.008
Token.
Token ID13
Feature activation-4.06e-03

Pos logit contributions
ww+0.028
w+0.027
o+0.025
hem+0.024
www+0.023
Neg logit contributions
 contain-0.024
 banks-0.023
 tum-0.021
 allowance-0.021
 symp-0.021
Token
Token ID198
Feature activation-7.74e-04

Pos logit contributions
 contain+0.006
 banks+0.006
 allowance+0.005
 tum+0.005
ured+0.005
Neg logit contributions
w-0.007
ww-0.007
o-0.007
hem-0.006
sets-0.006
Token
Token ID198
Feature activation-7.76e-04

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 symp+0.001
 tum+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
TokenAt
Token ID2953
Feature activation-3.80e-05

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
ured+0.001
tted+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Token finished
Token ID5201
Feature activation-3.54e-02

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
ured+0.011
 symp+0.011
Neg logit contributions
ww-0.014
w-0.014
o-0.014
hem-0.012
sets-0.012
Token her
Token ID607
Feature activation+4.78e-03

Pos logit contributions
 banks+0.054
 symp+0.054
 contain+0.053
 tum+0.052
ured+0.052
Neg logit contributions
ww-0.058
w-0.056
o-0.053
hem-0.050
loo-0.050
Token crack
Token ID8469
Feature activation-7.85e-03

Pos logit contributions
ww+0.008
w+0.008
o+0.008
sets+0.007
hem+0.007
Neg logit contributions
 contain-0.007
 banks-0.007
 allowance-0.007
ured-0.007
 symp-0.006
Tokeners
Token ID364
Feature activation+3.12e-03

Pos logit contributions
 contain+0.013
 banks+0.012
 allowance+0.012
 tum+0.012
ured+0.012
Neg logit contributions
ww-0.012
w-0.012
o-0.012
hem-0.011
sets-0.010
Token and
Token ID290
Feature activation+8.25e-04

Pos logit contributions
ww+0.005
o+0.004
w+0.004
sets+0.004
hem+0.004
Neg logit contributions
 contain-0.005
 banks-0.005
 allowance-0.005
tted-0.005
ured-0.005
Token went
Token ID1816
Feature activation+1.53e-02

Pos logit contributions
ww+0.001
w+0.001
o+0.001
iren+0.001
hem+0.001
Neg logit contributions
 contain-0.001
 tum-0.001
 banks-0.001
 allowance-0.001
 widened-0.001
Token back
Token ID736
Feature activation-1.39e-02

Pos logit contributions
w+0.032
ww+0.032
o+0.030
sets+0.029
ries+0.028
Neg logit contributions
 contain-0.018
 banks-0.017
 tum-0.016
ured-0.016
tted-0.015
Token inside
Token ID2641
Feature activation-2.78e-02

Pos logit contributions
 contain+0.018
 banks+0.017
 allowance+0.016
 tum+0.016
 aid+0.016
Neg logit contributions
ww-0.028
w-0.028
o-0.024
ins-0.024
www-0.024
Token to
Token ID284
Feature activation-3.88e-03

Pos logit contributions
 contain+0.047
 banks+0.045
 allowance+0.043
tted+0.043
ured+0.043
Neg logit contributions
o-0.041
ww-0.041
w-0.041
hem-0.036
sets-0.036
Token play
Token ID711
Feature activation+1.22e-02

Pos logit contributions
 contain+0.005
 tum+0.004
 banks+0.004
 symp+0.004
 allowance+0.004
Neg logit contributions
ww-0.008
w-0.008
o-0.008
ins-0.007
hem-0.007
Token with
Token ID351
Feature activation+1.39e-02

Pos logit contributions
ww+0.018
o+0.018
w+0.018
sets+0.016
 my+0.016
Neg logit contributions
 contain-0.021
 allowance-0.020
 banks-0.020
ured-0.019
tted-0.018
Token made
Token ID925
Feature activation-4.97e-03

Pos logit contributions
 contain+0.012
 banks+0.011
 allowance+0.011
ured+0.010
 symp+0.010
Neg logit contributions
ww-0.012
w-0.011
o-0.011
sets-0.010
hem-0.010
Token a
Token ID257
Feature activation-8.23e-03

Pos logit contributions
 contain+0.008
 banks+0.008
 allowance+0.007
 tum+0.007
ured+0.007
Neg logit contributions
ww-0.008
w-0.008
o-0.008
sets-0.007
hem-0.007
Token new
Token ID649
Feature activation+3.00e-02

Pos logit contributions
 contain+0.015
 banks+0.014
 allowance+0.014
 tum+0.013
 symp+0.013
Neg logit contributions
ww-0.012
w-0.012
o-0.012
sets-0.010
hem-0.010
Token friend
Token ID1545
Feature activation-5.34e-03

Pos logit contributions
w+0.045
ww+0.044
o+0.043
hem+0.038
sets+0.037
Neg logit contributions
 contain-0.053
 banks-0.051
 allowance-0.050
 tum-0.049
 symp-0.048
Token and
Token ID290
Feature activation+3.77e-03

Pos logit contributions
 contain+0.007
 banks+0.006
 tum+0.006
 allowance+0.006
 symp+0.006
Neg logit contributions
ww-0.011
w-0.010
o-0.010
hem-0.009
sets-0.009
Token shared
Token ID4888
Feature activation+5.31e-03

Pos logit contributions
ww+0.006
w+0.006
o+0.006
hem+0.005
sets+0.005
Neg logit contributions
 contain-0.006
 allowance-0.006
 tum-0.006
 banks-0.006
ured-0.005
Token her
Token ID607
Feature activation+4.56e-03

Pos logit contributions
ww+0.009
w+0.009
o+0.009
asc+0.008
loo+0.008
Neg logit contributions
 contain-0.008
 allowance-0.007
 banks-0.007
 tum-0.007
 symp-0.007
Token gift
Token ID6979
Feature activation+9.69e-04

Pos logit contributions
ww+0.007
w+0.007
o+0.007
sets+0.006
hem+0.006
Neg logit contributions
 contain-0.008
 banks-0.007
ured-0.007
 symp-0.007
 allowance-0.007
Token.
Token ID13
Feature activation+6.16e-04

Pos logit contributions
o+0.001
ww+0.001
w+0.001
sets+0.001
hem+0.001
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
 symp-0.002
tted-0.002
Token They
Token ID1119
Feature activation-8.27e-03

Pos logit contributions
w+0.001
ww+0.001
o+0.001
ries+0.001
ins+0.001
Neg logit contributions
 tum-0.001
 contain-0.001
 allowance-0.001
 banks-0.001
 symp-0.001
Token ran
Token ID4966
Feature activation-1.19e-03

Pos logit contributions
 aid+0.015
ured+0.015
 contain+0.015
 banks+0.014
 allowance+0.014
Neg logit contributions
o-0.012
w-0.012
ww-0.011
my-0.010
sets-0.009
Token sorry
Token ID7926
Feature activation+2.12e-02

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
 tum+0.000
ured+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
sets-0.000
hem-0.000
Token,
Token ID11
Feature activation-1.04e-03

Pos logit contributions
w+0.038
ww+0.037
o+0.036
sets+0.033
rob+0.033
Neg logit contributions
 contain-0.030
 tum-0.029
 banks-0.028
 symp-0.028
 allowance-0.027
Token I
Token ID314
Feature activation+9.62e-03

Pos logit contributions
 contain+0.002
 tum+0.002
 banks+0.002
 allowance+0.002
 symp+0.002
Neg logit contributions
w-0.001
ww-0.001
o-0.001
sets-0.001
hem-0.001
Token didn
Token ID1422
Feature activation-1.93e-02

Pos logit contributions
w+0.017
o+0.017
ww+0.016
hem+0.014
 blows+0.014
Neg logit contributions
 contain-0.016
 banks-0.015
ured-0.014
 symp-0.014
 allowance-0.014
Token't
Token ID470
Feature activation+2.61e-02

Pos logit contributions
 contain+0.035
 banks+0.033
 allowance+0.032
ured+0.032
 symp+0.032
Neg logit contributions
ww-0.028
w-0.027
o-0.027
hem-0.023
sets-0.023
Token mean
Token ID1612
Feature activation+2.87e-02

Pos logit contributions
ww+0.039
w+0.039
o+0.038
sets+0.035
ries+0.035
Neg logit contributions
 contain-0.045
 banks-0.042
 allowance-0.040
 tum-0.040
tted-0.040
Token to
Token ID284
Feature activation-3.55e-03

Pos logit contributions
ww+0.059
w+0.058
o+0.058
rob+0.057
sets+0.055
Neg logit contributions
 tum-0.029
 contain-0.029
 banks-0.026
 realize-0.026
 allowance-0.026
Token kick
Token ID4829
Feature activation-4.07e-03

Pos logit contributions
 contain+0.005
 banks+0.004
 tum+0.004
 allowance+0.004
 bank+0.004
Neg logit contributions
ww-0.007
w-0.007
o-0.007
sets-0.006
ins-0.006
Token it
Token ID340
Feature activation-1.09e-03

Pos logit contributions
 contain+0.007
 banks+0.007
 allowance+0.007
 tum+0.007
 symp+0.007
Neg logit contributions
ww-0.006
o-0.006
w-0.006
sets-0.005
hem-0.005
Token so
Token ID523
Feature activation+1.36e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
ured+0.001
Neg logit contributions
ww-0.002
w-0.002
o-0.002
sets-0.002
hem-0.002
Token hard
Token ID1327
Feature activation-4.55e-03

Pos logit contributions
ww+0.020
o+0.020
w+0.019
sets+0.018
hem+0.018
Neg logit contributions
 contain-0.024
 banks-0.022
 allowance-0.022
 tum-0.022
 symp-0.021
Token infant
Token ID11212
Feature activation-2.35e-02

Pos logit contributions
 contain+0.027
 tum+0.026
 banks+0.026
 allowance+0.026
 symp+0.024
Neg logit contributions
w-0.037
ww-0.036
o-0.036
ins-0.032
hem-0.032
Token wanted
Token ID2227
Feature activation-2.20e-03

Pos logit contributions
 tum+0.026
 banks+0.024
+0.023
 allowance+0.023
 bank+0.023
Neg logit contributions
ww-0.049
w-0.047
 yourselves-0.047
rob-0.047
iren-0.046
Token to
Token ID284
Feature activation-3.78e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
 tum+0.003
 bank+0.002
Neg logit contributions
ww-0.004
w-0.004
o-0.004
sets-0.004
ins-0.004
Token play
Token ID711
Feature activation+1.13e-02

Pos logit contributions
 contain+0.006
 banks+0.006
 symp+0.006
 allowance+0.006
ured+0.006
Neg logit contributions
ww-0.006
o-0.006
w-0.006
hem-0.005
 my-0.005
Token with
Token ID351
Feature activation+1.27e-02

Pos logit contributions
ww+0.014
w+0.014
o+0.014
hem+0.012
sets+0.012
Neg logit contributions
 contain-0.022
 banks-0.021
 allowance-0.021
ured-0.020
 symp-0.020
Token the
Token ID262
Feature activation+1.00e-02

Pos logit contributions
ww+0.021
w+0.020
o+0.020
sets+0.018
hem+0.017
Neg logit contributions
 contain-0.021
 banks-0.020
 allowance-0.019
 tum-0.019
ured-0.019
Token ball
Token ID2613
Feature activation+6.07e-03

Pos logit contributions
ww+0.015
o+0.015
w+0.015
ins+0.014
hem+0.013
Neg logit contributions
 contain-0.017
 banks-0.017
 tum-0.016
 allowance-0.016
tted-0.015
Token,
Token ID11
Feature activation+6.39e-04

Pos logit contributions
ww+0.011
w+0.011
o+0.011
hem+0.010
sets+0.010
Neg logit contributions
 contain-0.008
 banks-0.008
 allowance-0.008
 tum-0.007
 symp-0.007
Token but
Token ID475
Feature activation-2.15e-02

Pos logit contributions
ww+0.001
w+0.001
o+0.001
hem+0.001
sets+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 tum-0.001
 allowance-0.001
 symp-0.001
Token it
Token ID340
Feature activation-2.54e-03

Pos logit contributions
 contain+0.032
 tum+0.030
 banks+0.030
 allowance+0.030
 symp+0.028
Neg logit contributions
ww-0.039
w-0.037
o-0.036
rob-0.035
hem-0.034
Token was
Token ID373
Feature activation-1.74e-03

Pos logit contributions
 contain+0.004
 banks+0.004
 allowance+0.004
ured+0.004
 symp+0.004
Neg logit contributions
ww-0.004
w-0.004
o-0.004
hem-0.003
sets-0.003
Token found
Token ID1043
Feature activation-1.07e-02

Pos logit contributions
 contain+0.009
 banks+0.009
 allowance+0.009
 symp+0.009
ured+0.008
Neg logit contributions
ww-0.010
o-0.010
w-0.010
sets-0.009
hem-0.008
Token a
Token ID257
Feature activation-7.66e-03

Pos logit contributions
 contain+0.016
 banks+0.015
 allowance+0.014
ured+0.014
 tum+0.014
Neg logit contributions
ww-0.019
w-0.019
o-0.018
sets-0.016
hem-0.016
Token nice
Token ID3621
Feature activation+1.09e-02

Pos logit contributions
 contain+0.011
 tum+0.011
 banks+0.010
 allowance+0.010
 aid+0.010
Neg logit contributions
w-0.014
o-0.013
ww-0.013
ins-0.013
ries-0.013
Token,
Token ID11
Feature activation+1.56e-03

Pos logit contributions
w+0.021
o+0.020
ww+0.020
ins+0.019
ries+0.019
Neg logit contributions
 contain-0.015
 tum-0.015
 aid-0.014
 banks-0.013
 allowance-0.013
Token warm
Token ID5814
Feature activation+3.43e-03

Pos logit contributions
ww+0.002
w+0.002
o+0.002
hem+0.002
sets+0.002
Neg logit contributions
 contain-0.003
 banks-0.002
 allowance-0.002
 tum-0.002
ured-0.002
Token spot
Token ID4136
Feature activation-5.74e-03

Pos logit contributions
w+0.007
ww+0.006
o+0.006
ins+0.006
hem+0.006
Neg logit contributions
 contain-0.005
 tum-0.005
 banks-0.004
 allowance-0.004
 aid-0.004
Token to
Token ID284
Feature activation-3.99e-03

Pos logit contributions
 contain+0.009
 banks+0.008
 allowance+0.008
ured+0.008
 tum+0.008
Neg logit contributions
ww-0.010
w-0.010
o-0.009
hem-0.009
sets-0.009
Token hide
Token ID7808
Feature activation+5.10e-04

Pos logit contributions
 contain+0.006
 banks+0.005
 allowance+0.005
 symp+0.005
ured+0.005
Neg logit contributions
ww-0.007
o-0.007
w-0.007
hem-0.006
 my-0.006
Token from
Token ID422
Feature activation-4.68e-02

Pos logit contributions
w+0.001
o+0.001
ww+0.001
 yourselves+0.001
 my+0.001
Neg logit contributions
 contain-0.001
 tum-0.001
 allowance-0.001
 banks-0.001
tted-0.001
Token the
Token ID262
Feature activation+1.04e-02

Pos logit contributions
 contain+0.072
 banks+0.070
 allowance+0.067
ured+0.064
 bank+0.063
Neg logit contributions
ww-0.084
w-0.081
o-0.078
sets-0.071
hem-0.071
Token cold
Token ID4692
Feature activation-2.86e-03

Pos logit contributions
ww+0.017
w+0.016
o+0.016
ins+0.014
hem+0.014
Neg logit contributions
 contain-0.017
 banks-0.017
 tum-0.016
 allowance-0.016
 symp-0.015
Token my
Token ID616
Feature activation-7.28e-03

Pos logit contributions
 contain+0.014
ured+0.013
tted+0.013
 aid+0.013
 tum+0.013
Neg logit contributions
w-0.010
ww-0.010
o-0.009
sets-0.009
ries-0.009
Token friend
Token ID1545
Feature activation-6.17e-03

Pos logit contributions
 contain+0.014
 banks+0.014
ured+0.013
 symp+0.013
 widened+0.013
Neg logit contributions
ww-0.009
o-0.009
w-0.009
sets-0.008
hem-0.007
Token?
Token ID30
Feature activation+3.61e-02

Pos logit contributions
 contain+0.012
 tum+0.012
 banks+0.011
 aid+0.011
 allowance+0.011
Neg logit contributions
ww-0.007
w-0.007
hem-0.007
ries-0.006
rob-0.006
Tokenâ
Token ID22940
Feature activation+8.47e-03

Pos logit contributions
w+0.054
ww+0.052
o+0.047
hem+0.044
ins+0.044
Neg logit contributions
 contain-0.065
 banks-0.060
 allowance-0.059
tted-0.059
 symp-0.058
Token
Token ID26391
Feature activation-3.23e-02

Pos logit contributions
ww+0.017
w+0.017
o+0.016
ins+0.016
hem+0.016
Neg logit contributions
 contain-0.010
 banks-0.009
tted-0.009
 symp-0.009
ured-0.009
Token
Token ID198
Feature activation-1.24e-03

Pos logit contributions
 contain+0.049
 banks+0.048
 tum+0.047
 allowance+0.046
 symp+0.046
Neg logit contributions
ww-0.053
w-0.053
o-0.048
hem-0.046
sets-0.046
Token
Token ID198
Feature activation-1.19e-03

Pos logit contributions
 contain+0.002
 banks+0.002
 allowance+0.002
ured+0.002
 tum+0.002
Neg logit contributions
ww-0.002
w-0.002
o-0.002
hem-0.002
sets-0.002
TokenBob
Token ID18861
Feature activation+2.30e-03

Pos logit contributions
 contain+0.002
 banks+0.002
tted+0.002
 symp+0.002
 allowance+0.002
Neg logit contributions
w-0.002
ww-0.002
o-0.002
sets-0.001
hem-0.001
Token smiled
Token ID13541
Feature activation+3.42e-03

Pos logit contributions
ww+0.004
w+0.003
o+0.003
sets+0.003
hem+0.003
Neg logit contributions
 contain-0.004
 banks-0.004
 allowance-0.004
 symp-0.004
 tum-0.004
Token and
Token ID290
Feature activation+1.94e-03

Pos logit contributions
ww+0.006
w+0.006
o+0.006
hem+0.005
sets+0.005
Neg logit contributions
 contain-0.005
 banks-0.005
 symp-0.005
 tum-0.005
 allowance-0.005
Token said
Token ID531
Feature activation-2.81e-04

Pos logit contributions
w+0.003
ww+0.003
ries+0.003
o+0.003
hem+0.003
Neg logit contributions
 contain-0.003
 tum-0.003
 banks-0.003
ured-0.003
 bank-0.003
Token was
Token ID373
Feature activation-1.88e-03

Pos logit contributions
ww+0.007
w+0.006
sets+0.006
 yourselves+0.006
ries+0.006
Neg logit contributions
 contain-0.004
 banks-0.004
 tum-0.004
 allowance-0.004
 bank-0.004
Token always
Token ID1464
Feature activation+2.04e-02

Pos logit contributions
 contain+0.002
 tum+0.002
 banks+0.002
 allowance+0.002
ured+0.002
Neg logit contributions
ww-0.004
w-0.004
sets-0.004
ries-0.003
hem-0.003
Token anxious
Token ID18116
Feature activation-8.43e-03

Pos logit contributions
ww+0.042
rob+0.040
w+0.040
hem+0.039
loo+0.038
Neg logit contributions
 contain-0.022
 banks-0.021
 tum-0.020
 bank-0.018
ured-0.018
Token because
Token ID780
Feature activation-2.07e-02

Pos logit contributions
 contain+0.010
 banks+0.009
 tum+0.009
 allowance+0.009
 symp+0.009
Neg logit contributions
ww-0.018
w-0.017
o-0.016
rob-0.015
sets-0.015
Token he
Token ID339
Feature activation+1.54e-03

Pos logit contributions
 contain+0.031
 tum+0.029
 allowance+0.028
 banks+0.027
 even+0.025
Neg logit contributions
ww-0.037
w-0.035
o-0.034
rob-0.033
sets-0.032
Token didn
Token ID1422
Feature activation-2.01e-02

Pos logit contributions
ww+0.003
w+0.003
o+0.003
sets+0.002
hem+0.002
Neg logit contributions
 contain-0.002
 banks-0.002
 allowance-0.002
 tum-0.002
ured-0.002
Token't
Token ID470
Feature activation+2.49e-02

Pos logit contributions
 contain+0.042
 banks+0.040
 symp+0.039
 tum+0.039
ured+0.039
Neg logit contributions
o-0.023
ww-0.023
w-0.023
hem-0.020
sets-0.018
Token know
Token ID760
Feature activation-2.23e-03

Pos logit contributions
ww+0.054
w+0.053
o+0.051
sets+0.049
ries+0.049
Neg logit contributions
 contain-0.028
 banks-0.025
 allowance-0.023
 tum-0.023
tted-0.022
Token where
Token ID810
Feature activation-1.24e-02

Pos logit contributions
 contain+0.004
 banks+0.003
 allowance+0.003
 tum+0.003
 symp+0.003
Neg logit contributions
ww-0.004
w-0.004
o-0.004
hem-0.003
very-0.003
Token he
Token ID339
Feature activation+1.67e-03

Pos logit contributions
 contain+0.019
 tum+0.018
 banks+0.018
 allowance+0.017
 symp+0.016
Neg logit contributions
ww-0.022
w-0.021
o-0.020
sets-0.019
hem-0.019
Token belonged
Token ID19611
Feature activation-1.76e-02

Pos logit contributions
o+0.002
w+0.002
ww+0.002
 blows+0.002
hem+0.002
Neg logit contributions
 contain-0.003
ured-0.003
 banks-0.003
 stret-0.003
 allowance-0.003
Token there
Token ID612
Feature activation-8.16e-03

Pos logit contributions
ww+0.016
w+0.015
o+0.015
sets+0.014
hem+0.014
Neg logit contributions
 contain-0.018
 banks-0.016
 allowance-0.016
 tum-0.016
ured-0.016
Token,
Token ID11
Feature activation+4.09e-04

Pos logit contributions
 contain+0.011
 banks+0.010
 tum+0.010
 allowance+0.010
 symp+0.009
Neg logit contributions
ww-0.016
w-0.015
o-0.015
hem-0.014
sets-0.014
Token but
Token ID475
Feature activation-2.16e-02

Pos logit contributions
ww+0.001
w+0.001
o+0.001
hem+0.001
sets+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 allowance-0.000
ured-0.000
 symp-0.000
Token Sam
Token ID3409
Feature activation+2.20e-02

Pos logit contributions
 contain+0.037
 allowance+0.034
 banks+0.034
 tum+0.034
ured+0.033
Neg logit contributions
ww-0.034
w-0.033
o-0.031
sets-0.030
hem-0.029
Token is
Token ID318
Feature activation+2.41e-02

Pos logit contributions
ww+0.035
w+0.035
o+0.034
sets+0.031
hem+0.030
Neg logit contributions
 contain-0.036
 banks-0.034
 allowance-0.033
 symp-0.032
ured-0.032
Token faster
Token ID5443
Feature activation-3.69e-03

Pos logit contributions
ww+0.045
w+0.044
o+0.041
sets+0.040
hem+0.039
Neg logit contributions
 contain-0.035
ured-0.031
 banks-0.031
 allowance-0.031
 tum-0.031
Token.
Token ID13
Feature activation+1.09e-03

Pos logit contributions
 contain+0.005
 banks+0.005
 allowance+0.005
ured+0.004
 tum+0.004
Neg logit contributions
ww-0.007
w-0.007
o-0.007
sets-0.006
hem-0.006
Token He
Token ID679
Feature activation-5.27e-04

Pos logit contributions
w+0.002
ww+0.002
sets+0.002
ries+0.002
rob+0.002
Neg logit contributions
 tum-0.002
 banks-0.001
 symp-0.001
 contain-0.001
 aid-0.001
Token passes
Token ID8318
Feature activation-2.84e-02

Pos logit contributions
 contain+0.001
 banks+0.001
 allowance+0.001
 tum+0.001
 symp+0.001
Neg logit contributions
ww-0.001
w-0.001
o-0.001
sets-0.001
hem-0.001
Token Ben
Token ID3932
Feature activation+3.85e-03

Pos logit contributions
 contain+0.040
 banks+0.037
 allowance+0.036
tted+0.035
 tum+0.035
Neg logit contributions
ww-0.054
w-0.048
sets-0.046
rob-0.045
www-0.045
Token and
Token ID290
Feature activation+4.35e-03

Pos logit contributions
ww+0.007
w+0.007
o+0.006
sets+0.006
hem+0.006
Neg logit contributions
 contain-0.005
 banks-0.005
 allowance-0.005
ured-0.005
 tum-0.005
Token her
Token ID607
Feature activation+3.81e-03

Pos logit contributions
 contain+0.039
 allowance+0.036
 tum+0.035
 banks+0.035
 aid+0.034
Neg logit contributions
ww-0.064
w-0.063
o-0.058
rob-0.055
www-0.054
Token favorite
Token ID4004
Feature activation-9.40e-03

Pos logit contributions
ww+0.007
w+0.007
o+0.006
hem+0.006
ins+0.006
Neg logit contributions
 contain-0.006
 banks-0.005
 allowance-0.005
 tum-0.005
ured-0.005
Token song
Token ID3496
Feature activation+2.29e-02

Pos logit contributions
 contain+0.015
 tum+0.015
 banks+0.015
 allowance+0.015
 aid+0.014
Neg logit contributions
ww-0.015
w-0.015
o-0.015
 yourselves-0.013
asc-0.013
Token and
Token ID290
Feature activation+7.44e-04

Pos logit contributions
ww+0.044
w+0.044
o+0.041
hem+0.038
 yourselves+0.038
Neg logit contributions
 contain-0.032
 tum-0.028
 banks-0.028
 allowance-0.027
ured-0.027
Token put
Token ID1234
Feature activation+6.29e-03

Pos logit contributions
w+0.001
ww+0.001
o+0.001
loo+0.001
asc+0.001
Neg logit contributions
 contain-0.001
 allowance-0.001
 tum-0.001
 banks-0.001
 widened-0.001
Token it
Token ID340
Feature activation-2.08e-03

Pos logit contributions
ww+0.011
o+0.011
w+0.010
hem+0.010
loo+0.010
Neg logit contributions
 contain-0.009
 allowance-0.008
 tum-0.008
ured-0.008
 symp-0.008
Token into
Token ID656
Feature activation-1.09e-02

Pos logit contributions
 contain+0.004
 banks+0.003
 allowance+0.003
ured+0.003
 tum+0.003
Neg logit contributions
ww-0.003
w-0.003
o-0.003
hem-0.003
ins-0.003
Token her
Token ID607
Feature activation+3.82e-03

Pos logit contributions
 contain+0.019
 allowance+0.018
 banks+0.018
 tum+0.017
 symp+0.017
Neg logit contributions
ww-0.017
w-0.017
o-0.017
sets-0.014
hem-0.014
Token pocket
Token ID10000
Feature activation-3.01e-02

Pos logit contributions
ww+0.006
w+0.006
o+0.006
sets+0.005
hem+0.005
Neg logit contributions
 contain-0.006
 banks-0.006
 allowance-0.006
ured-0.006
 symp-0.006
Token.
Token ID13
Feature activation-4.37e-03

Pos logit contributions
 contain+0.049
 banks+0.046
 symp+0.046
ured+0.045
 allowance+0.045
Neg logit contributions
w-0.045
o-0.044
ww-0.044
sets-0.039
 my-0.039
Token\n
Token ID198
Feature activation+1.00e-04

Pos logit contributions
 contain+0.006
 banks+0.006
 allowance+0.006
 tum+0.006
ured+0.006
Neg logit contributions
w-0.008
ww-0.008
o-0.007
ins-0.007
hem-0.007
Token bright
Token ID6016
Feature activation+3.47e-03

Pos logit contributions
ww+0.011
w+0.011
o+0.011
hem+0.009
sets+0.009
Neg logit contributions
 contain-0.013
 banks-0.013
 tum-0.012
 allowance-0.012
ured-0.011
Token fish
Token ID5916
Feature activation-7.48e-03

Pos logit contributions
w+0.005
ww+0.005
o+0.005
hem+0.004
asc+0.004
Neg logit contributions
 banks-0.006
 contain-0.006
 tum-0.006
 widened-0.006
 allowance-0.006
Token
Token ID6184
Feature activation-2.56e-03

Pos logit contributions
 banks+0.010
 contain+0.010
 tum+0.009
 bank+0.009
ured+0.008
Neg logit contributions
ww-0.014
w-0.014
 yourselves-0.013
Look-0.013
rob-0.012
Token
Token ID95
Feature activation+2.30e-02

Pos logit contributions
 contain+0.004
 allowance+0.004
 tum+0.003
 symp+0.003
 aid+0.003
Neg logit contributions
ww-0.004
o-0.004
w-0.004
www-0.004
very-0.004
Token
Token ID26391
Feature activation-3.33e-02

Pos logit contributions
ww+0.039
o+0.039
w+0.038
 yourself+0.033
very+0.033
Neg logit contributions
 contain-0.035
 allowance-0.035
 tum-0.033
 symp-0.032
 banks-0.031
Token
Token ID447
Feature activation-7.43e-02

Pos logit contributions
 contain+0.053
 banks+0.052
 tum+0.050
tted+0.050
 allowance+0.049
Neg logit contributions
ww-0.052
w-0.051
o-0.048
hem-0.045
sets-0.045
Token
Token ID250
Feature activation+3.93e-02

Pos logit contributions
 symp+0.100
 banks+0.099
 contain+0.097
tted+0.094
 tum+0.092
Neg logit contributions
ww-0.131
w-0.126
o-0.126
hem-0.124
ins-0.123
Token they
Token ID484
Feature activation-7.88e-03

Pos logit contributions
ww+0.056
w+0.053
hem+0.049
o+0.049
sets+0.049
Neg logit contributions
 contain-0.070
tted-0.066
 banks-0.065
 symp-0.062
 allowance-0.061
Token all
Token ID477
Feature activation+3.05e-02

Pos logit contributions
 contain+0.010
 banks+0.009
 allowance+0.008
ured+0.008
tted+0.008
Neg logit contributions
ww-0.016
w-0.015
o-0.015
hem-0.014
sets-0.014
Token seemed
Token ID3947
Feature activation-1.83e-02

Pos logit contributions
ww+0.056
w+0.055
o+0.054
hem+0.051
sets+0.050
Neg logit contributions
 contain-0.045
 banks-0.041
 tum-0.037
ured-0.037
 allowance-0.037
Token to
Token ID284
Feature activation-3.17e-03

Pos logit contributions
 contain+0.025
 banks+0.023
 allowance+0.022
 tum+0.022
ured+0.022
Neg logit contributions
w-0.034
ww-0.034
o-0.033
sets-0.031
hem-0.031
Token\n
Token ID198
Feature activation+1.83e-03

Pos logit contributions
 contain+0.009
 banks+0.008
 allowance+0.008
tted+0.008
 tum+0.008
Neg logit contributions
ww-0.012
w-0.011
o-0.010
sets-0.010
www-0.010
Token\n
Token ID198
Feature activation+1.73e-03

Pos logit contributions
ww+0.003
w+0.003
sets+0.003
ries+0.003
o+0.003
Neg logit contributions
 contain-0.002
 banks-0.002
tted-0.002
 symp-0.002
 allowance-0.002
Token"
Token ID1
Feature activation-6.92e-05

Pos logit contributions
ww+0.003
w+0.003
o+0.003
sets+0.003
ries+0.002
Neg logit contributions
 contain-0.003
 banks-0.003
 allowance-0.002
ured-0.002
tted-0.002
TokenLook
Token ID8567
Feature activation+6.35e-02

Pos logit contributions
 contain+0.000
 widened+0.000
tted+0.000
 symp+0.000
 banks+0.000
Neg logit contributions
w-0.000
o-0.000
ries-0.000
ww-0.000
loo-0.000
Token,
Token ID11
Feature activation-9.39e-04

Pos logit contributions
ww+0.100
w+0.098
o+0.097
 my+0.085
ries+0.084
Neg logit contributions
 contain-0.111
 banks-0.105
tted-0.101
 allowance-0.100
ured-0.097
Token Mitt
Token ID16627
Feature activation-6.59e-02

Pos logit contributions
 contain+0.001
 tum+0.001
 banks+0.001
 allowance+0.001
 symp+0.001
Neg logit contributions
w-0.001
ww-0.001
o-0.001
sets-0.001
ins-0.001
Tokenens
Token ID641
Feature activation-5.70e-02

Pos logit contributions
 tum+0.072
 contain+0.066
isc+0.066
 banks+0.066
 symp+0.065
Neg logit contributions
ww-0.120
 yourselves-0.119
o-0.118
w-0.117
sets-0.117
Token,"
Token ID553
Feature activation+1.29e-02

Pos logit contributions
 tum+0.080
 banks+0.077
 aid+0.074
tted+0.073
 symp+0.072
Neg logit contributions
ww-0.092
w-0.090
hem-0.088
sets-0.087
ins-0.086
Token said
Token ID531
Feature activation-2.32e-03

Pos logit contributions
w+0.022
ww+0.022
sets+0.021
asc+0.021
o+0.021
Neg logit contributions
 tum-0.020
 contain-0.017
 symp-0.017
 allowance-0.016
 banks-0.016
Token Lily
Token ID20037
Feature activation+3.62e-03

Pos logit contributions
 contain+0.005
 banks+0.004
 allowance+0.004
tted+0.004
ured+0.004
Neg logit contributions
o-0.003
ww-0.003
w-0.003
hem-0.002
 my-0.002
Token.
Token ID13
Feature activation-3.92e-03

Pos logit contributions
o+0.006
ww+0.006
w+0.006
sets+0.005
hem+0.005
Neg logit contributions
 contain-0.006
 banks-0.005
 allowance-0.005
ured-0.005
tted-0.005
Token.
Token ID13
Feature activation-2.25e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
 tum+0.002
ured+0.002
Neg logit contributions
ww-0.005
w-0.005
o-0.004
hem-0.004
rob-0.004
Token They
Token ID1119
Feature activation-8.38e-03

Pos logit contributions
 contain+0.003
 banks+0.003
 allowance+0.003
 tum+0.003
 symp+0.003
Neg logit contributions
ww-0.004
w-0.004
o-0.004
sets-0.003
hem-0.003
Token make
Token ID787
Feature activation+3.38e-03

Pos logit contributions
 contain+0.011
 banks+0.011
ured+0.010
 symp+0.010
 allowance+0.010
Neg logit contributions
w-0.015
o-0.015
ww-0.015
sets-0.013
hem-0.013
Token neck
Token ID7393
Feature activation-3.96e-02

Pos logit contributions
ww+0.005
w+0.005
o+0.005
sets+0.004
hem+0.004
Neg logit contributions
 contain-0.006
 banks-0.006
 allowance-0.005
 tum-0.005
ured-0.005
Tokenl
Token ID75
Feature activation-3.34e-02

Pos logit contributions
 contain+0.066
 banks+0.061
 allowance+0.060
 symp+0.058
ured+0.058
Neg logit contributions
ww-0.062
w-0.062
o-0.061
hem-0.054
sets-0.054
Tokenaces
Token ID2114
Feature activation-6.39e-02

Pos logit contributions
 contain+0.041
 banks+0.038
tted+0.036
 tum+0.036
 allowance+0.035
Neg logit contributions
ww-0.068
w-0.066
o-0.061
hem-0.060
Look-0.060
Token,
Token ID11
Feature activation-3.35e-05

Pos logit contributions
 contain+0.100
 banks+0.095
 allowance+0.092
 symp+0.090
 tum+0.089
Neg logit contributions
ww-0.106
w-0.104
o-0.103
sets-0.093
hem-0.092
Token brace
Token ID22581
Feature activation-2.83e-02

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
 tum+0.000
 widened+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
hem-0.000
asc-0.000
Tokenlets
Token ID5289
Feature activation+3.15e-02

Pos logit contributions
 contain+0.049
 banks+0.047
 allowance+0.045
ured+0.044
 symp+0.044
Neg logit contributions
ww-0.043
w-0.042
o-0.041
sets-0.036
hem-0.036
Token,
Token ID11
Feature activation-1.42e-04

Pos logit contributions
ww+0.062
w+0.060
o+0.057
hem+0.053
sets+0.053
Neg logit contributions
 contain-0.042
 banks-0.039
 allowance-0.037
ured-0.037
 tum-0.036
Token and
Token ID290
Feature activation+4.50e-04

Pos logit contributions
 contain+0.000
 banks+0.000
 allowance+0.000
 tum+0.000
 widened+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
hem-0.000
www-0.000
Token saw
Token ID2497
Feature activation-9.78e-03

Pos logit contributions
 contain+0.000
 allowance+0.000
 banks+0.000
 widened+0.000
ured+0.000
Neg logit contributions
ww-0.000
w-0.000
o-0.000
sets-0.000
hem-0.000
Token a
Token ID257
Feature activation-6.61e-03

Pos logit contributions
 contain+0.013
 banks+0.013
 tum+0.013
 allowance+0.012
ured+0.012
Neg logit contributions
ww-0.019
w-0.018
o-0.018
sets-0.016
very-0.016
Token letter
Token ID3850
Feature activation+2.65e-02

Pos logit contributions
 contain+0.010
 tum+0.010
 allowance+0.009
 banks+0.009
 aid+0.009
Neg logit contributions
w-0.011
o-0.011
ww-0.011
sets-0.010
ries-0.010
Token from
Token ID422
Feature activation-5.02e-02

Pos logit contributions
ww+0.052
w+0.051
ins+0.047
o+0.047
rob+0.045
Neg logit contributions
 contain-0.037
ured-0.032
 aid-0.031
 tum-0.031
tted-0.031
Token the
Token ID262
Feature activation+8.59e-03

Pos logit contributions
 contain+0.062
 allowance+0.062
 banks+0.059
 bank+0.057
 aid+0.057
Neg logit contributions
ww-0.104
w-0.100
o-0.095
ins-0.093
sets-0.088
Token bank
Token ID3331
Feature activation-6.08e-02

Pos logit contributions
ww+0.012
w+0.012
o+0.012
hem+0.010
sets+0.010
Neg logit contributions
 contain-0.016
 banks-0.015
 allowance-0.015
 tum-0.014
ured-0.014
Token.
Token ID13
Feature activation-4.53e-03

Pos logit contributions
 contain+0.094
 banks+0.088
 symp+0.086
 allowance+0.086
ured+0.084
Neg logit contributions
o-0.099
w-0.099
ww-0.097
hem-0.087
ins-0.086
Token The
Token ID383
Feature activation-2.14e-02

Pos logit contributions
 contain+0.007
 banks+0.006
 tum+0.006
 symp+0.006
 allowance+0.006
Neg logit contributions
w-0.008
ww-0.008
o-0.007
sets-0.007
ins-0.007
Token letter
Token ID3850
Feature activation+2.60e-02

Pos logit contributions
 contain+0.039
 tum+0.038
 allowance+0.038
 banks+0.037
 bank+0.036
Neg logit contributions
o-0.031
ww-0.030
w-0.029
sets-0.026
ins-0.026
Token said
Token ID531
Feature activation-2.96e-03

Pos logit contributions
ww+0.048
w+0.047
o+0.044
 yourselves+0.043
ins+0.042
Neg logit contributions
 contain-0.037
 tum-0.034
 widened-0.034
 allowance-0.034
 banks-0.033
Token he
Token ID339
Feature activation+3.42e-03

Pos logit contributions
 contain+0.005
 banks+0.005
 allowance+0.004
 symp+0.004
ured+0.004
Neg logit contributions
ww-0.005
w-0.005
o-0.005
sets-0.004
hem-0.004
Token touched
Token ID12615
Feature activation-1.41e-03

Pos logit contributions
w+0.011
ww+0.011
o+0.011
hem+0.009
sets+0.009
Neg logit contributions
 contain-0.010
 banks-0.010
 symp-0.010
ured-0.009
tted-0.009
Token things
Token ID1243
Feature activation+4.13e-02

Pos logit contributions
 contain+0.002
 banks+0.002
 allowance+0.002
 tum+0.002
 symp+0.002
Neg logit contributions
ww-0.002
w-0.002
o-0.002
hem-0.002
sets-0.002
Token that
Token ID326
Feature activation+6.50e-04

Pos logit contributions
w+0.053
o+0.052
ww+0.051
 my+0.047
hem+0.046
Neg logit contributions
 banks-0.074
 contain-0.074
 allowance-0.071
 tum-0.070
 symp-0.070
Token are
Token ID389
Feature activation-1.28e-02

Pos logit contributions
ww+0.001
w+0.001
o+0.001
sets+0.001
hem+0.001
Neg logit contributions
 contain-0.001
 banks-0.001
 allowance-0.001
ured-0.001
 symp-0.001
Token not
Token ID407
Feature activation+1.14e-02

Pos logit contributions
 contain+0.020
 banks+0.019
 allowance+0.018
ured+0.018
 symp+0.018
Neg logit contributions
ww-0.021
w-0.021
o-0.021
sets-0.019
hem-0.019
Token theirs
Token ID22021
Feature activation-6.28e-02

Pos logit contributions
ww+0.022
w+0.022
o+0.021
hem+0.020
very+0.019
Neg logit contributions
 contain-0.015
 allowance-0.014
 banks-0.014
 tum-0.013
ured-0.013
Token again
Token ID757
Feature activation-1.70e-02

Pos logit contributions
 banks+0.117
 contain+0.116
 symp+0.114
 stret+0.112
ured+0.109
Neg logit contributions
o-0.075
w-0.068
my-0.064
 my-0.064
?-0.063
Token.
Token ID13
Feature activation+2.55e-03

Pos logit contributions
 contain+0.025
 banks+0.023
 allowance+0.022
ured+0.022
 symp+0.022
Neg logit contributions
ww-0.030
w-0.030
o-0.030
sets-0.027
hem-0.027
Token They
Token ID1119
Feature activation-7.19e-03

Pos logit contributions
ww+0.004
o+0.004
w+0.004
hem+0.003
sets+0.003
Neg logit contributions
 contain-0.005
 banks-0.004
 allowance-0.004
ured-0.004
 symp-0.004
Token played
Token ID2826
Feature activation-9.50e-03

Pos logit contributions
andel+0.014
ured+0.014
uster+0.014
 contain+0.013
 stret+0.013
Neg logit contributions
w-0.009
o-0.009
my-0.008
 says-0.007
Look-0.007
Token in
Token ID287
Feature activation-4.44e-03

Pos logit contributions
 contain+0.019
 banks+0.018
 allowance+0.018
 symp+0.018
 tum+0.017
Neg logit contributions
w-0.011
o-0.010
ww-0.010
 yourselves-0.010
 yourself-0.009