TOP ACTIVATIONS
MAX = 0.176

thought this was a great idea and they got to work
to take turns being the seeker and the h ider .
told him " Don 't worry , I 'll come back
told her " Don 't worry , sweet heart . We
they could prevent having to worry about not being able to
what he sees on the second page . He sees a
wanted was not a good idea . From then on ,
can 't see for a second , you can still do
saying , " Don 't worry , the dog won 't
That is such a great idea ! How can I help
say , " Don 't worry . I 've got your
said , " Don 't worry , I have an idea
" Nothing for you to worry about , Jimmy . These
to our friends to prevent bad things from happening . And
made Rex freeze for a second . Rex tried
down the slide for the second time . <|endoftext|> Tom my
rescue . " Don 't worry little friend , I will
say , " Don 't worry , I can help you
favorite costume without having to worry about it breaking again .
said : " Don 't worry , Pet a , I

MIN ACTIVATIONS
MIN = -0.171

bird saw the roses and per ked up . It flew
ro ost ing on their per ches . They were all
as high as your tree per ch .' All the animals
hand , and the bird per ked up . She said
his warm blanket , and kne el down to pray .
some nuts and seeds and stro ked its fur .
." Max per ked up and w agged
would only eat one mint per day ! The
to a nearby tree and crou ched down .
cave , but the hero per se vered . He searched
" It 's one dollar per cup . But I have
started baking the cake as per the advice of the elderly
butterfly 's wings and gently stro ked them . The butterfly
M ia 's mom crou ched down and said ,
D ott ie per ked up and happily nodded
." F ido 's ears per ked up and he looked
away , and they all kne eled in respect for their
, roll over , and kne el ." "
hand . She smiled and stro ked the horse 's nose
eyes . Jill and Mike crou ched down to pet the

INTERVAL 0.089 - 0.176
CONTAINS 1.075%

, " You did a great job , Lily ! Now
providing her with such an awesome gift . <|endoftext|> M ummy
He had such a fun day , learning about l izards
Billy . " Are you okay ?" " I
<|endoftext|> S ara was a stubborn girl . She did not

INTERVAL 0.003 - 0.089
CONTAINS 50.687%

it 's much better now ." From that
with her mom to catch y ummy treats .
but now she was no longer scared . In fact ,
proud of himself . He realized that even though he was
" I am here ". The man was

INTERVAL -0.084 - 0.003
CONTAINS 43.047%

thank you for finding my sack . I left it in
uddled on the shore . O ll ie felt warm and
and did all of his work with dedication . He watered
. They fix the car with some tape , and take
the store and came back with some fruit and crack ers

INTERVAL -0.171 - -0.084
CONTAINS 5.190%

on the swings . She took a deep breath and walked
. They drew a rainbow and a heart . They wrote
She picks up the stick and throws it at the big
. He took the pumpkin and put it on a table
. She fl apped her wings and soared high in the

Token thought
Token ID1807
Feature activation+4.91e-02

Pos logit contributions
lder+0.146
 property+0.145
 policy+0.143
 client+0.143
 endings+0.143
Neg logit contributions
 crashing-0.102
 shallow-0.096
 sun-0.096
 floating-0.091
 fresh-0.090
Token this
Token ID428
Feature activation+9.30e-02

Pos logit contributions
lder+0.085
 property+0.085
 client+0.085
 means+0.084
 fault+0.082
Neg logit contributions
 shallow-0.066
Ho-0.062
 crashing-0.062
wered-0.061
 sun-0.060
Token was
Token ID373
Feature activation+8.06e-03

Pos logit contributions
lder+0.131
 property+0.129
 client+0.128
grand+0.123
keeping+0.123
Neg logit contributions
 shallow-0.156
 sun-0.152
 crashing-0.150
Ho-0.149
wered-0.147
Token a
Token ID257
Feature activation+4.20e-02

Pos logit contributions
 means+0.016
 property+0.016
 client+0.016
lder+0.015
 fault+0.015
Neg logit contributions
 shallow-0.009
Ho-0.008
 crashing-0.008
wered-0.008
add-0.008
Token great
Token ID1049
Feature activation+1.20e-01

Pos logit contributions
lder+0.079
keeping+0.078
grand+0.077
 means+0.077
rums+0.077
Neg logit contributions
 shallow-0.053
 sun-0.051
 crashing-0.050
Ho-0.049
 fresh-0.048
Token idea
Token ID2126
Feature activation+1.62e-01

Pos logit contributions
lder+0.159
grand+0.156
 client+0.152
 property+0.152
rums+0.151
Neg logit contributions
 sun-0.217
 shallow-0.212
 crashing-0.207
 fresh-0.201
 spring-0.200
Token and
Token ID290
Feature activation-8.97e-02

Pos logit contributions
lder+0.355
 client+0.344
 property+0.343
 depends+0.331
 fault+0.326
Neg logit contributions
 crashing-0.148
 sun-0.148
ores-0.136
 shallow-0.135
 floating-0.130
Token they
Token ID484
Feature activation-2.95e-03

Pos logit contributions
 shallow+0.090
 crashing+0.083
 sun+0.081
ores+0.080
Ho+0.080
Neg logit contributions
 property-0.188
lder-0.187
 client-0.186
 fault-0.182
grand-0.181
Token got
Token ID1392
Feature activation-5.31e-02

Pos logit contributions
 shallow+0.004
Ho+0.004
wered+0.004
ores+0.004
 crashing+0.004
Neg logit contributions
 means-0.005
lder-0.005
 client-0.005
 property-0.005
 depends-0.005
Token to
Token ID284
Feature activation+4.12e-02

Pos logit contributions
 shallow+0.064
 crashing+0.060
Ho+0.059
wered+0.058
 sun+0.058
Neg logit contributions
lder-0.100
 property-0.100
 client-0.099
 means-0.098
grand-0.097
Token work
Token ID670
Feature activation-1.05e-02

Pos logit contributions
lder+0.071
 property+0.071
 client+0.070
 means+0.070
grand+0.069
Neg logit contributions
 shallow-0.056
 crashing-0.053
Ho-0.052
wered-0.052
 sun-0.052
Token to
Token ID284
Feature activation+5.85e-02

Pos logit contributions
 client+0.096
lder+0.095
 property+0.095
grand+0.093
 means+0.092
Neg logit contributions
 shallow-0.062
 crashing-0.060
 sun-0.058
ores-0.057
 fresh-0.056
Token take
Token ID1011
Feature activation-8.80e-02

Pos logit contributions
lder+0.088
 client+0.087
 property+0.087
 means+0.085
 fault+0.082
Neg logit contributions
 shallow-0.093
Ho-0.091
 crashing-0.086
add-0.085
ores-0.085
Token turns
Token ID4962
Feature activation+2.86e-02

Pos logit contributions
 shallow+0.130
 sun+0.126
 crashing+0.122
Ho+0.122
ores+0.122
Neg logit contributions
lder-0.143
 property-0.142
 client-0.140
grand-0.139
 policy-0.136
Token being
Token ID852
Feature activation-3.21e-02

Pos logit contributions
lder+0.055
 property+0.055
 client+0.055
 means+0.054
grand+0.053
Neg logit contributions
 shallow-0.033
 crashing-0.031
Ho-0.031
 sun-0.030
wered-0.030
Token the
Token ID262
Feature activation-2.50e-03

Pos logit contributions
 shallow+0.035
 sun+0.033
 crashing+0.032
ores+0.031
 fresh+0.031
Neg logit contributions
lder-0.066
grand-0.065
 client-0.065
 property-0.065
 policy-0.064
Token seeker
Token ID45993
Feature activation+1.61e-01

Pos logit contributions
 shallow+0.004
 sun+0.004
 crashing+0.003
Ho+0.003
 fresh+0.003
Neg logit contributions
lder-0.004
keeping-0.004
 property-0.004
grand-0.004
 means-0.004
Token and
Token ID290
Feature activation-8.08e-02

Pos logit contributions
 property+0.365
lder+0.365
 client+0.363
 means+0.358
 depends+0.357
Neg logit contributions
 shallow-0.125
 crashing-0.120
 sun-0.119
ores-0.111
Ho-0.107
Token the
Token ID262
Feature activation-1.94e-03

Pos logit contributions
 shallow+0.093
 sun+0.090
 crashing+0.089
ores+0.085
 fresh+0.084
Neg logit contributions
 property-0.159
lder-0.158
grand-0.152
 means-0.151
rums-0.151
Token h
Token ID289
Feature activation-2.05e-02

Pos logit contributions
 sun+0.003
 shallow+0.003
 crashing+0.003
 fresh+0.003
 spring+0.003
Neg logit contributions
lder-0.003
grand-0.003
 means-0.003
keeping-0.003
 property-0.003
Tokenider
Token ID1304
Feature activation+3.47e-02

Pos logit contributions
 shallow+0.033
Ho+0.031
 crashing+0.031
 sun+0.030
wered+0.030
Neg logit contributions
lder-0.031
 client-0.031
 property-0.030
 fault-0.030
 means-0.030
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
 client+0.073
lder+0.072
grand+0.072
 Painter+0.067
 property+0.067
Neg logit contributions
 shallow-0.037
 crashing-0.036
 sun-0.036
ores-0.033
 fresh-0.033
Token told
Token ID1297
Feature activation+3.70e-02

Pos logit contributions
 property+0.074
 policy+0.074
keeping+0.073
 client+0.073
lder+0.073
Neg logit contributions
 shallow-0.055
 crashing-0.052
 sun-0.051
Ho-0.051
 fresh-0.049
Token him
Token ID683
Feature activation+1.87e-02

Pos logit contributions
 means+0.056
 behaviour+0.056
 fault+0.056
 client+0.055
 property+0.055
Neg logit contributions
Ho-0.058
 shallow-0.056
hes-0.053
 crashing-0.051
 south-0.051
Token "
Token ID366
Feature activation+2.62e-02

Pos logit contributions
lder+0.043
 property+0.041
 client+0.040
grand+0.040
keeping+0.040
Neg logit contributions
 shallow-0.016
 sun-0.015
ores-0.015
 crashing-0.014
 fresh-0.014
TokenDon
Token ID3987
Feature activation+4.91e-02

Pos logit contributions
 means+0.042
 property+0.041
lder+0.040
grand+0.039
keeping+0.039
Neg logit contributions
 shallow-0.040
wered-0.037
 camps-0.036
ores-0.036
 crashing-0.036
Token't
Token ID470
Feature activation+4.67e-02

Pos logit contributions
lder+0.072
 property+0.071
 client+0.071
 means+0.070
grand+0.068
Neg logit contributions
 shallow-0.079
Ho-0.075
 crashing-0.075
wered-0.074
 sun-0.074
Token worry
Token ID5490
Feature activation+1.60e-01

Pos logit contributions
lder+0.034
 client+0.033
 means+0.032
 property+0.030
grand+0.030
Neg logit contributions
 crashing-0.111
 shallow-0.110
 sun-0.108
Ho-0.106
wered-0.106
Token,
Token ID11
Feature activation+3.37e-02

Pos logit contributions
grand+0.289
lder+0.288
 client+0.287
keeping+0.287
 property+0.279
Neg logit contributions
 shallow-0.209
 sun-0.208
 crashing-0.208
ores-0.197
 fresh-0.189
Token I
Token ID314
Feature activation+1.65e-02

Pos logit contributions
 property+0.045
 means+0.045
 policy+0.043
 client+0.043
lder+0.043
Neg logit contributions
 shallow-0.057
Ho-0.057
wered-0.055
add-0.055
hes-0.054
Token'll
Token ID1183
Feature activation+4.13e-02

Pos logit contributions
lder+0.020
 client+0.020
keeping+0.019
 property+0.019
grand+0.018
Neg logit contributions
 shallow-0.031
 crashing-0.030
ores-0.030
 sun-0.029
add-0.029
Token come
Token ID1282
Feature activation-6.78e-02

Pos logit contributions
 property+0.064
lder+0.063
 client+0.063
 means+0.062
 fault+0.060
Neg logit contributions
 shallow-0.064
Ho-0.062
wered-0.059
 crashing-0.059
add-0.059
Token back
Token ID736
Feature activation+2.46e-02

Pos logit contributions
 shallow+0.097
wered+0.096
Ho+0.091
hes+0.088
add+0.088
Neg logit contributions
 property-0.112
 client-0.108
 means-0.108
keeping-0.105
 fault-0.105
Token told
Token ID1297
Feature activation+4.01e-02

Pos logit contributions
lder+0.082
 client+0.081
 property+0.081
 means+0.080
grand+0.078
Neg logit contributions
 shallow-0.091
Ho-0.086
 crashing-0.086
wered-0.085
 sun-0.084
Token her
Token ID607
Feature activation+4.04e-02

Pos logit contributions
 property+0.094
 client+0.092
 fault+0.092
grand+0.091
lder+0.091
Neg logit contributions
wered-0.029
 sun-0.029
 fresh-0.027
 shallow-0.027
ores-0.026
Token "
Token ID366
Feature activation+2.27e-02

Pos logit contributions
 depends+0.089
lder+0.087
 property+0.086
keeping+0.086
grand+0.084
Neg logit contributions
 sun-0.034
 all-0.032
ores-0.031
hes-0.029
 fresh-0.029
TokenDon
Token ID3987
Feature activation+4.54e-02

Pos logit contributions
lder+0.039
 means+0.039
 property+0.038
 client+0.038
grand+0.037
Neg logit contributions
 shallow-0.031
 crashing-0.029
wered-0.029
Ho-0.029
ores-0.029
Token't
Token ID470
Feature activation+4.52e-02

Pos logit contributions
 property+0.074
 client+0.072
lder+0.071
 fault+0.071
 means+0.069
Neg logit contributions
 shallow-0.065
 crashing-0.063
Ho-0.063
wered-0.063
 sun-0.062
Token worry
Token ID5490
Feature activation+1.60e-01

Pos logit contributions
lder+0.033
 client+0.032
 means+0.031
 fault+0.031
grand+0.030
Neg logit contributions
 crashing-0.107
 sun-0.106
 shallow-0.104
 fresh-0.102
wered-0.102
Token,
Token ID11
Feature activation+3.33e-02

Pos logit contributions
keeping+0.321
 property+0.314
grand+0.314
 depends+0.306
 client+0.304
Neg logit contributions
 sun-0.182
 fresh-0.170
ores-0.159
 crashing-0.157
 all-0.154
Token sweet
Token ID6029
Feature activation+4.42e-02

Pos logit contributions
lder+0.052
 client+0.052
 property+0.052
grand+0.050
 fault+0.050
Neg logit contributions
 shallow-0.050
 sun-0.049
 crashing-0.048
 fresh-0.047
Ho-0.047
Tokenheart
Token ID11499
Feature activation+1.04e-01

Pos logit contributions
 client+0.074
 property+0.073
 fault+0.072
keeping+0.071
lder+0.071
Neg logit contributions
 sun-0.062
 shallow-0.060
 fresh-0.058
Ho-0.058
add-0.058
Token.
Token ID13
Feature activation+3.62e-02

Pos logit contributions
 client+0.240
grand+0.234
 fault+0.233
 depends+0.233
keeping+0.230
Neg logit contributions
 sun-0.068
 until-0.062
ores-0.057
 all-0.057
 as-0.057
Token We
Token ID775
Feature activation+9.15e-03

Pos logit contributions
 property+0.048
lder+0.048
 client+0.047
 means+0.046
 fault+0.046
Neg logit contributions
 shallow-0.063
 crashing-0.060
Ho-0.060
 sun-0.060
ores-0.059
Token they
Token ID484
Feature activation-7.30e-03

Pos logit contributions
 client+0.134
lder+0.134
 property+0.129
grand+0.128
 privacy+0.123
Neg logit contributions
 sun-0.066
 shallow-0.065
 crashing-0.064
ores-0.062
 floating-0.057
Token could
Token ID714
Feature activation+4.59e-02

Pos logit contributions
 shallow+0.011
 crashing+0.010
Ho+0.010
ores+0.010
 sun+0.010
Neg logit contributions
 property-0.012
 client-0.012
 fault-0.012
lder-0.012
 means-0.011
Token prevent
Token ID2948
Feature activation-1.89e-03

Pos logit contributions
 property+0.082
lder+0.081
 client+0.081
 means+0.081
 fault+0.079
Neg logit contributions
 shallow-0.059
 crashing-0.056
wered-0.056
Ho-0.055
 sun-0.055
Token having
Token ID1719
Feature activation+5.92e-03

Pos logit contributions
 crashing+0.002
 shallow+0.002
 sun+0.002
Ho+0.002
ores+0.002
Neg logit contributions
lder-0.004
grand-0.004
 property-0.004
 client-0.004
 policy-0.004
Token to
Token ID284
Feature activation+6.38e-02

Pos logit contributions
lder+0.013
 client+0.013
grand+0.012
 property+0.012
 means+0.012
Neg logit contributions
 shallow-0.006
 sun-0.006
 fresh-0.006
 crashing-0.005
ores-0.005
Token worry
Token ID5490
Feature activation+1.59e-01

Pos logit contributions
lder+0.109
 property+0.109
 client+0.109
 means+0.107
grand+0.105
Neg logit contributions
 shallow-0.086
Ho-0.082
 crashing-0.081
wered-0.079
 sun-0.079
Token about
Token ID546
Feature activation+1.35e-02

Pos logit contributions
lder+0.290
 property+0.288
 client+0.288
 means+0.283
grand+0.278
Neg logit contributions
 shallow-0.200
 crashing-0.189
Ho-0.186
wered-0.183
 sun-0.183
Token not
Token ID407
Feature activation+3.64e-02

Pos logit contributions
lder+0.029
 client+0.028
 property+0.028
 means+0.028
grand+0.028
Neg logit contributions
 shallow-0.013
 crashing-0.012
ores-0.012
 sun-0.012
Ho-0.012
Token being
Token ID852
Feature activation-2.09e-02

Pos logit contributions
lder+0.065
 client+0.064
 property+0.064
 means+0.063
grand+0.062
Neg logit contributions
 shallow-0.048
 crashing-0.045
Ho-0.044
wered-0.044
 sun-0.044
Token able
Token ID1498
Feature activation+9.40e-02

Pos logit contributions
 shallow+0.027
 crashing+0.025
ores+0.025
hes+0.025
 sun+0.024
Neg logit contributions
lder-0.039
grand-0.039
 policy-0.038
 client-0.038
 property-0.037
Token to
Token ID284
Feature activation+6.02e-02

Pos logit contributions
lder+0.172
 client+0.163
grand+0.162
 property+0.159
 means+0.158
Neg logit contributions
 crashing-0.127
 shallow-0.126
 sun-0.120
 fresh-0.117
ores-0.115
Token what
Token ID644
Feature activation+1.91e-02

Pos logit contributions
 shallow+0.071
Ho+0.065
 crashing+0.065
wered+0.063
add+0.063
Neg logit contributions
 means-0.107
 client-0.107
lder-0.106
 property-0.105
grand-0.102
Token he
Token ID339
Feature activation-5.49e-03

Pos logit contributions
keeping+0.036
 property+0.036
lder+0.035
 fault+0.034
idents+0.032
Neg logit contributions
 sun-0.027
hes-0.023
 fresh-0.023
 spring-0.022
 far-0.021
Token sees
Token ID7224
Feature activation-4.43e-03

Pos logit contributions
Ho+0.009
 sun+0.009
hes+0.009
 shallow+0.009
wered+0.009
Neg logit contributions
keeping-0.007
 policy-0.007
 property-0.007
 fault-0.007
lder-0.007
Token on
Token ID319
Feature activation+1.38e-02

Pos logit contributions
 shallow+0.005
 sun+0.005
Ho+0.005
 crashing+0.004
hes+0.004
Neg logit contributions
lder-0.009
 client-0.008
 property-0.008
 fault-0.008
 policy-0.008
Token the
Token ID262
Feature activation+5.94e-03

Pos logit contributions
lder+0.032
 property+0.031
 client+0.031
 fault+0.031
grand+0.030
Neg logit contributions
 shallow-0.011
 sun-0.010
wered-0.010
Ho-0.010
 crashing-0.010
Token second
Token ID1218
Feature activation+1.59e-01

Pos logit contributions
rums+0.009
 means+0.009
 fault+0.009
lder+0.009
keeping+0.009
Neg logit contributions
 sun-0.010
 shallow-0.009
Ho-0.009
 fresh-0.009
wered-0.009
Token page
Token ID2443
Feature activation+2.25e-02

Pos logit contributions
 fault+0.256
rums+0.244
lder+0.243
 property+0.242
 disl+0.241
Neg logit contributions
 sun-0.263
 shallow-0.224
 spring-0.223
wered-0.218
Ho-0.216
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
lder+0.044
 property+0.043
 client+0.042
 fault+0.041
 means+0.041
Neg logit contributions
 shallow-0.026
 sun-0.025
 crashing-0.024
ores-0.023
Ho-0.023
Token He
Token ID679
Feature activation-3.99e-02

Pos logit contributions
 means+0.029
grand+0.027
 fault+0.027
 policy+0.026
 client+0.026
Neg logit contributions
Ho-0.028
 shallow-0.028
 crashing-0.026
 drifting-0.026
add-0.026
Token sees
Token ID7224
Feature activation-1.17e-02

Pos logit contributions
 shallow+0.076
 sun+0.074
Ho+0.073
wered+0.073
 crashing+0.072
Neg logit contributions
keeping-0.047
 property-0.046
lder-0.046
 policy-0.045
 client-0.043
Token a
Token ID257
Feature activation+2.88e-02

Pos logit contributions
 shallow+0.012
 sun+0.011
 crashing+0.010
Ho+0.010
wered+0.010
Neg logit contributions
lder-0.025
 property-0.024
 client-0.024
 means-0.024
 policy-0.024
Token wanted
Token ID2227
Feature activation+4.83e-02

Pos logit contributions
keeping+0.013
 policy+0.012
 outage+0.012
spe+0.011
 sibling+0.011
Neg logit contributions
 j-0.013
 spring-0.012
 sun-0.012
hes-0.012
add-0.012
Token was
Token ID373
Feature activation+1.09e-02

Pos logit contributions
lder+0.106
 client+0.103
grand+0.098
 property+0.096
 Painter+0.096
Neg logit contributions
 crashing-0.045
 sun-0.045
 shallow-0.043
 them-0.041
 spring-0.041
Token not
Token ID407
Feature activation+3.46e-02

Pos logit contributions
lder+0.023
 sibling+0.023
 client+0.023
 Painter+0.022
keeping+0.022
Neg logit contributions
 fresh-0.010
 crashing-0.010
 floating-0.010
 sun-0.009
 spring-0.009
Token a
Token ID257
Feature activation+4.80e-02

Pos logit contributions
lder+0.064
 client+0.061
grand+0.060
rd+0.059
 Painter+0.058
Neg logit contributions
 crashing-0.044
 shallow-0.043
 sun-0.042
 fresh-0.041
wered-0.040
Token good
Token ID922
Feature activation+1.21e-01

Pos logit contributions
rums+0.097
grand+0.096
keeping+0.096
rd+0.095
lder+0.089
Neg logit contributions
 sun-0.054
 fresh-0.052
 far-0.050
 shallow-0.050
 j-0.050
Token idea
Token ID2126
Feature activation+1.59e-01

Pos logit contributions
lder+0.162
grand+0.155
rums+0.152
rd+0.149
 property+0.145
Neg logit contributions
 sun-0.214
 crashing-0.209
 shallow-0.205
 fresh-0.197
 spring-0.197
Token.
Token ID13
Feature activation+1.70e-02

Pos logit contributions
lder+0.355
 client+0.336
 property+0.326
grand+0.324
 depends+0.322
Neg logit contributions
 crashing-0.148
 sun-0.147
ores-0.139
 shallow-0.138
 floating-0.130
Token From
Token ID3574
Feature activation-4.23e-02

Pos logit contributions
lder+0.035
 property+0.035
 client+0.035
 means+0.034
 fault+0.034
Neg logit contributions
 shallow-0.017
Ho-0.016
 crashing-0.016
 sun-0.016
wered-0.015
Token then
Token ID788
Feature activation-5.02e-03

Pos logit contributions
 crashing+0.065
 shallow+0.065
Ho+0.064
add+0.062
wered+0.061
Neg logit contributions
 means-0.066
 property-0.065
 client-0.063
lder-0.063
 privacy-0.060
Token on
Token ID319
Feature activation+1.58e-02

Pos logit contributions
 shallow+0.006
ores+0.005
wered+0.005
Ho+0.005
 sunscreen+0.005
Neg logit contributions
 means-0.010
 policy-0.009
 client-0.009
lder-0.009
 property-0.009
Token,
Token ID11
Feature activation-4.39e-03

Pos logit contributions
lder+0.029
 property+0.029
keeping+0.029
grand+0.028
 fault+0.028
Neg logit contributions
 sun-0.020
 shallow-0.020
 crashing-0.019
Ho-0.019
 spring-0.018
Token can
Token ID460
Feature activation+5.79e-02

Pos logit contributions
lder+0.034
 client+0.034
 property+0.034
grand+0.033
 policy+0.033
Neg logit contributions
 shallow-0.043
 crashing-0.041
Ho-0.041
 sun-0.041
wered-0.040
Token't
Token ID470
Feature activation+3.55e-02

Pos logit contributions
lder+0.094
 property+0.094
 client+0.094
 means+0.093
 fault+0.091
Neg logit contributions
 shallow-0.084
Ho-0.079
 crashing-0.079
wered-0.077
ores-0.077
Token see
Token ID766
Feature activation-6.85e-03

Pos logit contributions
 property+0.055
lder+0.055
 client+0.054
 means+0.054
grand+0.052
Neg logit contributions
 shallow-0.055
Ho-0.052
 crashing-0.051
wered-0.050
ores-0.050
Token for
Token ID329
Feature activation+4.14e-02

Pos logit contributions
 shallow+0.009
 crashing+0.009
 sun+0.008
Ho+0.008
ores+0.008
Neg logit contributions
lder-0.012
 client-0.012
 property-0.012
grand-0.012
 means-0.011
Token a
Token ID257
Feature activation+4.43e-02

Pos logit contributions
lder+0.074
 client+0.072
 property+0.072
 means+0.072
grand+0.072
Neg logit contributions
 shallow-0.054
 crashing-0.052
 sun-0.051
Ho-0.050
wered-0.050
Token second
Token ID1218
Feature activation+1.58e-01

Pos logit contributions
lder+0.082
grand+0.082
 property+0.082
keeping+0.081
 means+0.081
Neg logit contributions
 crashing-0.057
 sun-0.056
 shallow-0.055
ch-0.052
 fresh-0.051
Token,
Token ID11
Feature activation-6.11e-03

Pos logit contributions
lder+0.326
 property+0.303
grand+0.302
 client+0.299
keeping+0.299
Neg logit contributions
 sun-0.180
 crashing-0.172
 shallow-0.169
 spring-0.161
ch-0.158
Token you
Token ID345
Feature activation+3.91e-02

Pos logit contributions
 shallow+0.009
Ho+0.008
 crashing+0.008
wered+0.008
ores+0.008
Neg logit contributions
 property-0.010
lder-0.010
 client-0.010
 means-0.010
 fault-0.009
Token can
Token ID460
Feature activation+6.52e-02

Pos logit contributions
lder+0.058
 client+0.056
keeping+0.056
 property+0.055
grand+0.054
Neg logit contributions
 shallow-0.062
 crashing-0.060
 sun-0.059
ores-0.059
Ho-0.058
Token still
Token ID991
Feature activation+2.41e-02

Pos logit contributions
lder+0.110
 property+0.110
 client+0.109
 means+0.108
grand+0.107
Neg logit contributions
 shallow-0.090
 crashing-0.086
Ho-0.085
wered-0.085
 sun-0.084
Token do
Token ID466
Feature activation+5.33e-02

Pos logit contributions
lder+0.038
 client+0.038
 property+0.038
 means+0.037
grand+0.036
Neg logit contributions
 shallow-0.036
 crashing-0.034
Ho-0.034
wered-0.033
 sun-0.033
Token saying
Token ID2282
Feature activation+1.11e-02

Pos logit contributions
add+0.025
 camps+0.024
wered+0.024
Ho+0.024
ores+0.023
Neg logit contributions
 means-0.022
 property-0.020
 worry-0.020
lder-0.020
 depends-0.019
Token,
Token ID11
Feature activation-1.16e-02

Pos logit contributions
 property+0.020
keeping+0.020
 client+0.020
lder+0.019
grand+0.019
Neg logit contributions
 shallow-0.014
 crashing-0.014
 sun-0.014
Ho-0.014
ores-0.013
Token "
Token ID366
Feature activation+1.80e-02

Pos logit contributions
 shallow+0.021
wered+0.020
 camps+0.020
add+0.020
Ho+0.019
Neg logit contributions
lder-0.017
 means-0.015
 depends-0.013
 client-0.013
 property-0.013
TokenDon
Token ID3987
Feature activation+4.48e-02

Pos logit contributions
 means+0.031
lder+0.031
grand+0.031
 property+0.031
 client+0.030
Neg logit contributions
 shallow-0.025
 crashing-0.024
ores-0.023
 camps-0.023
add-0.023
Token't
Token ID470
Feature activation+4.26e-02

Pos logit contributions
lder+0.067
 property+0.066
 client+0.066
 means+0.065
 fault+0.064
Neg logit contributions
 shallow-0.071
 crashing-0.068
Ho-0.068
wered-0.067
 sun-0.066
Token worry
Token ID5490
Feature activation+1.58e-01

Pos logit contributions
lder+0.040
 fault+0.039
 client+0.038
 means+0.038
keeping+0.036
Neg logit contributions
 crashing-0.093
 sun-0.092
 shallow-0.089
 spring-0.089
 fresh-0.088
Token,
Token ID11
Feature activation+3.44e-02

Pos logit contributions
keeping+0.306
grand+0.299
 client+0.297
lder+0.296
 property+0.294
Neg logit contributions
 sun-0.197
 shallow-0.181
 crashing-0.180
 fresh-0.179
ores-0.173
Token the
Token ID262
Feature activation-1.10e-02

Pos logit contributions
 property+0.061
 means+0.061
lder+0.061
 client+0.061
 fault+0.059
Neg logit contributions
 shallow-0.044
Ho-0.042
wered-0.042
 crashing-0.041
add-0.040
Token dog
Token ID3290
Feature activation+3.74e-02

Pos logit contributions
 shallow+0.019
Ho+0.018
 crashing+0.018
wered+0.018
ores+0.018
Neg logit contributions
lder-0.015
 property-0.015
 client-0.015
 means-0.015
grand-0.015
Token won
Token ID1839
Feature activation+5.26e-02

Pos logit contributions
keeping+0.046
 client+0.045
lder+0.043
 Painter+0.043
 property+0.042
Neg logit contributions
 shallow-0.068
 crashing-0.066
ch-0.066
Ho-0.065
 sun-0.065
Token't
Token ID470
Feature activation+4.71e-02

Pos logit contributions
 property+0.091
 fault+0.089
 client+0.089
lder+0.088
 policy+0.087
Neg logit contributions
 shallow-0.069
 crashing-0.069
Ho-0.069
 sun-0.068
wered-0.067
TokenThat
Token ID2504
Feature activation+6.01e-02

Pos logit contributions
lder+0.028
 property+0.028
 client+0.028
 means+0.028
 fault+0.027
Neg logit contributions
 shallow-0.021
Ho-0.020
 crashing-0.020
wered-0.020
 sun-0.020
Token is
Token ID318
Feature activation+1.72e-02

Pos logit contributions
 client+0.091
keeping+0.089
rums+0.086
grand+0.084
lder+0.084
Neg logit contributions
 shallow-0.106
 sun-0.103
 crashing-0.097
 fresh-0.095
 spring-0.095
Token such
Token ID884
Feature activation+6.75e-02

Pos logit contributions
 property+0.030
 means+0.030
lder+0.030
 client+0.029
 fault+0.029
Neg logit contributions
 shallow-0.023
Ho-0.022
 crashing-0.021
wered-0.021
add-0.021
Token a
Token ID257
Feature activation+4.48e-02

Pos logit contributions
lder+0.156
 client+0.152
grand+0.151
 property+0.149
 means+0.149
Neg logit contributions
 shallow-0.058
 sun-0.053
wered-0.051
 crashing-0.050
 fresh-0.050
Token great
Token ID1049
Feature activation+1.15e-01

Pos logit contributions
lder+0.080
 means+0.079
 client+0.079
 property+0.079
grand+0.077
Neg logit contributions
 shallow-0.059
 crashing-0.055
 sun-0.054
Ho-0.054
wered-0.053
Token idea
Token ID2126
Feature activation+1.58e-01

Pos logit contributions
lder+0.118
 property+0.118
 client+0.117
 means+0.115
grand+0.111
Neg logit contributions
 shallow-0.237
 crashing-0.228
Ho-0.228
wered-0.225
 sun-0.224
Token!
Token ID0
Feature activation+4.16e-02

Pos logit contributions
 client+0.265
 property+0.255
lder+0.251
 privacy+0.247
 depends+0.246
Neg logit contributions
 sun-0.228
 crashing-0.221
ores-0.212
 shallow-0.203
ch-0.200
Token How
Token ID1374
Feature activation+1.11e-01

Pos logit contributions
lder+0.052
 property+0.051
 client+0.051
 means+0.050
 fault+0.049
Neg logit contributions
 shallow-0.076
 crashing-0.073
Ho-0.073
wered-0.072
 sun-0.071
Token can
Token ID460
Feature activation+7.13e-02

Pos logit contributions
keeping+0.206
 client+0.199
lder+0.196
 property+0.193
rums+0.190
Neg logit contributions
 shallow-0.148
 sun-0.141
ores-0.140
 crashing-0.137
wered-0.136
Token I
Token ID314
Feature activation+2.60e-02

Pos logit contributions
 fault+0.128
 property+0.126
lder+0.121
 client+0.120
 virtue+0.116
Neg logit contributions
 sun-0.093
 shallow-0.092
wered-0.090
 crashing-0.089
Ho-0.085
Token help
Token ID1037
Feature activation+4.43e-02

Pos logit contributions
lder+0.041
 property+0.041
 client+0.040
 means+0.040
grand+0.039
Neg logit contributions
 shallow-0.039
Ho-0.037
 crashing-0.037
wered-0.036
ores-0.036
Token say
Token ID910
Feature activation-1.05e-02

Pos logit contributions
lder+0.027
 property+0.027
 client+0.027
 means+0.026
grand+0.026
Neg logit contributions
 shallow-0.021
Ho-0.019
 crashing-0.019
wered-0.019
 sun-0.019
Token,
Token ID11
Feature activation-1.01e-02

Pos logit contributions
 crashing+0.013
 shallow+0.013
 sun+0.013
ores+0.012
Ho+0.012
Neg logit contributions
 property-0.019
 client-0.019
lder-0.019
grand-0.018
 means-0.018
Token "
Token ID366
Feature activation+2.29e-02

Pos logit contributions
add+0.018
 shallow+0.018
 camps+0.018
wered+0.018
 sunscreen+0.017
Neg logit contributions
 means-0.013
lder-0.013
 property-0.011
 depends-0.011
 worry-0.011
TokenDon
Token ID3987
Feature activation+4.67e-02

Pos logit contributions
lder+0.040
 means+0.039
 property+0.039
grand+0.039
keeping+0.038
Neg logit contributions
 shallow-0.032
ores-0.029
 crashing-0.029
 sun-0.028
add-0.028
Token't
Token ID470
Feature activation+4.38e-02

Pos logit contributions
 property+0.077
 client+0.076
 fault+0.075
lder+0.074
 means+0.072
Neg logit contributions
 shallow-0.066
wered-0.065
 crashing-0.064
 sun-0.064
Ho-0.064
Token worry
Token ID5490
Feature activation+1.57e-01

Pos logit contributions
 fault+0.043
 client+0.042
lder+0.042
 means+0.042
 property+0.040
Neg logit contributions
 crashing-0.092
 sun-0.091
 shallow-0.091
wered-0.089
 fresh-0.087
Token.
Token ID13
Feature activation+4.82e-02

Pos logit contributions
 client+0.307
grand+0.304
keeping+0.298
 property+0.292
 depends+0.291
Neg logit contributions
 sun-0.207
 shallow-0.187
ores-0.183
 fresh-0.182
 crashing-0.179
Token I
Token ID314
Feature activation+1.76e-02

Pos logit contributions
 property+0.078
 client+0.075
 fault+0.074
lder+0.074
 policy+0.073
Neg logit contributions
 sun-0.068
 shallow-0.068
Ho-0.067
 crashing-0.067
ores-0.066
Token've
Token ID1053
Feature activation-6.76e-03

Pos logit contributions
lder+0.021
 client+0.021
 property+0.021
keeping+0.020
grand+0.020
Neg logit contributions
 shallow-0.033
 crashing-0.032
ores-0.031
add-0.031
 sun-0.031
Token got
Token ID1392
Feature activation-4.73e-02

Pos logit contributions
 shallow+0.011
 crashing+0.010
 sun+0.010
wered+0.010
Ho+0.010
Neg logit contributions
 client-0.010
lder-0.010
 property-0.010
 means-0.010
 policy-0.010
Token your
Token ID534
Feature activation+2.70e-02

Pos logit contributions
 shallow+0.056
 crashing+0.052
Ho+0.051
 sun+0.050
wered+0.050
Neg logit contributions
lder-0.091
 property-0.090
 client-0.090
 means-0.089
grand-0.087
Token said
Token ID531
Feature activation+1.28e-02

Pos logit contributions
 shallow+0.146
 sun+0.138
 fresh+0.135
 crashing+0.133
Ho+0.132
Neg logit contributions
lder-0.177
 property-0.177
 client-0.175
 fault-0.171
 policy-0.167
Token,
Token ID11
Feature activation-9.11e-03

Pos logit contributions
 property+0.025
 client+0.025
keeping+0.024
 means+0.024
 fault+0.024
Neg logit contributions
 crashing-0.014
 shallow-0.014
 sun-0.014
Ho-0.013
 fresh-0.013
Token "
Token ID366
Feature activation+1.47e-02

Pos logit contributions
 shallow+0.015
wered+0.014
add+0.014
 camps+0.014
Ho+0.014
Neg logit contributions
lder-0.014
 means-0.013
 client-0.012
 Painter-0.012
 fault-0.012
TokenDon
Token ID3987
Feature activation+4.29e-02

Pos logit contributions
lder+0.026
 means+0.026
grand+0.025
 property+0.025
 client+0.025
Neg logit contributions
 shallow-0.020
 crashing-0.019
wered-0.018
add-0.018
ores-0.018
Token't
Token ID470
Feature activation+4.10e-02

Pos logit contributions
lder+0.063
 property+0.063
 client+0.063
 means+0.062
grand+0.060
Neg logit contributions
 shallow-0.069
 crashing-0.065
Ho-0.065
wered-0.064
 sun-0.064
Token worry
Token ID5490
Feature activation+1.57e-01

Pos logit contributions
 client+0.034
 means+0.033
lder+0.033
 fault+0.033
keeping+0.032
Neg logit contributions
 crashing-0.092
 sun-0.092
 fresh-0.090
 shallow-0.090
Ho-0.089
Token,
Token ID11
Feature activation+3.22e-02

Pos logit contributions
keeping+0.313
 client+0.303
 property+0.301
grand+0.301
lder+0.295
Neg logit contributions
 sun-0.195
 crashing-0.182
 shallow-0.172
ch-0.172
 fresh-0.171
Token I
Token ID314
Feature activation+1.80e-02

Pos logit contributions
 property+0.053
lder+0.053
 means+0.052
 client+0.052
 fault+0.051
Neg logit contributions
 shallow-0.046
Ho-0.044
 crashing-0.043
wered-0.043
ores-0.042
Token have
Token ID423
Feature activation-2.42e-02

Pos logit contributions
lder+0.026
keeping+0.026
 client+0.025
rums+0.024
 property+0.024
Neg logit contributions
 shallow-0.029
 crashing-0.028
 sun-0.028
 spring-0.028
ores-0.028
Token an
Token ID281
Feature activation+6.05e-02

Pos logit contributions
 shallow+0.024
 sun+0.024
 fresh+0.023
 sunscreen+0.023
add+0.020
Neg logit contributions
lder-0.053
 client-0.053
 property-0.051
 policy-0.050
keeping-0.050
Token idea
Token ID2126
Feature activation+1.42e-01

Pos logit contributions
lder+0.079
keeping+0.077
 client+0.075
 means+0.074
grand+0.073
Neg logit contributions
 shallow-0.108
 sun-0.107
wered-0.106
add-0.104
 fresh-0.101
Token "
Token ID366
Feature activation+1.25e-02

Pos logit contributions
 shallow+0.017
add+0.016
wered+0.016
 camps+0.016
Ho+0.016
Neg logit contributions
lder-0.014
 means-0.012
 Painter-0.011
 client-0.011
 worry-0.010
TokenNothing
Token ID18465
Feature activation+1.59e-02

Pos logit contributions
lder+0.024
 property+0.024
 means+0.024
 client+0.024
grand+0.023
Neg logit contributions
 shallow-0.015
 crashing-0.014
Ho-0.014
wered-0.013
 sun-0.013
Token for
Token ID329
Feature activation+4.16e-02

Pos logit contributions
lder+0.030
 client+0.029
keeping+0.028
grand+0.028
 Painter+0.027
Neg logit contributions
 shallow-0.021
 crashing-0.020
 sun-0.020
 fresh-0.020
ch-0.019
Token you
Token ID345
Feature activation+5.85e-02

Pos logit contributions
lder+0.067
 property+0.066
 client+0.066
 means+0.065
 fault+0.063
Neg logit contributions
 shallow-0.062
Ho-0.058
 crashing-0.058
wered-0.057
ores-0.056
Token to
Token ID284
Feature activation+6.87e-02

Pos logit contributions
lder+0.120
 client+0.113
 depends+0.113
grand+0.111
 Painter+0.111
Neg logit contributions
 crashing-0.063
 shallow-0.061
 sun-0.061
 spring-0.056
 fresh-0.055
Token worry
Token ID5490
Feature activation+1.57e-01

Pos logit contributions
 property+0.121
lder+0.121
 client+0.120
 means+0.120
grand+0.116
Neg logit contributions
 shallow-0.090
Ho-0.085
 crashing-0.084
wered-0.082
ores-0.082
Token about
Token ID546
Feature activation+1.06e-02

Pos logit contributions
 client+0.280
lder+0.276
 property+0.270
grand+0.267
 depends+0.266
Neg logit contributions
 shallow-0.205
 sun-0.199
 crashing-0.197
ores-0.194
 fresh-0.187
Token,
Token ID11
Feature activation+1.09e-02

Pos logit contributions
lder+0.020
 client+0.020
grand+0.020
 property+0.019
 policy+0.019
Neg logit contributions
 shallow-0.012
 crashing-0.012
 sun-0.011
ores-0.011
wered-0.011
Token Jimmy
Token ID12963
Feature activation+6.24e-02

Pos logit contributions
lder+0.018
 client+0.018
 property+0.017
grand+0.017
 means+0.017
Neg logit contributions
 shallow-0.016
 crashing-0.015
 sun-0.015
Ho-0.015
ores-0.015
Token.
Token ID13
Feature activation+2.85e-02

Pos logit contributions
 tactic+0.138
 client+0.137
 endings+0.136
rums+0.134
anish+0.134
Neg logit contributions
ores-0.045
 shallow-0.044
 sun-0.044
hes-0.041
 fresh-0.041
Token These
Token ID2312
Feature activation+3.25e-02

Pos logit contributions
lder+0.036
 property+0.036
 client+0.036
 means+0.036
grand+0.035
Neg logit contributions
 shallow-0.051
 crashing-0.049
Ho-0.049
wered-0.048
 sun-0.048
Token to
Token ID284
Feature activation+4.45e-02

Pos logit contributions
 shallow+0.001
 crashing+0.001
 sun+0.001
ores+0.001
 fresh+0.001
Neg logit contributions
lder-0.002
 client-0.002
grand-0.002
 property-0.002
keeping-0.002
Token our
Token ID674
Feature activation+3.38e-02

Pos logit contributions
lder+0.070
 client+0.068
 means+0.068
 property+0.067
grand+0.067
Neg logit contributions
 shallow-0.068
 crashing-0.065
 sun-0.064
Ho-0.064
wered-0.063
Token friends
Token ID2460
Feature activation+3.68e-02

Pos logit contributions
lder+0.049
 means+0.048
grand+0.047
keeping+0.047
 Painter+0.046
Neg logit contributions
 sun-0.055
 shallow-0.054
 crashing-0.054
add-0.053
 fresh-0.053
Token to
Token ID284
Feature activation+5.94e-02

Pos logit contributions
lder+0.084
 client+0.080
grand+0.080
keeping+0.079
rums+0.079
Neg logit contributions
 sun-0.027
 crashing-0.026
 shallow-0.026
 fresh-0.026
 all-0.025
Token prevent
Token ID2948
Feature activation+6.64e-03

Pos logit contributions
 property+0.095
 client+0.093
lder+0.093
 means+0.091
 fault+0.089
Neg logit contributions
 shallow-0.089
Ho-0.085
 crashing-0.084
wered-0.082
ores-0.082
Token bad
Token ID2089
Feature activation+1.57e-01

Pos logit contributions
lder+0.013
grand+0.013
 client+0.012
 policy+0.012
 property+0.012
Neg logit contributions
 crashing-0.007
 shallow-0.007
 sun-0.007
Ho-0.007
ores-0.007
Token things
Token ID1243
Feature activation+5.09e-02

Pos logit contributions
lder+0.219
grand+0.218
 Painter+0.213
 client+0.212
 fault+0.202
Neg logit contributions
 sun-0.263
 crashing-0.260
 shallow-0.249
 fresh-0.247
Ho-0.244
Token from
Token ID422
Feature activation+8.10e-03

Pos logit contributions
 property+0.093
lder+0.092
 client+0.092
 means+0.092
 fault+0.089
Neg logit contributions
 shallow-0.064
Ho-0.060
 crashing-0.060
wered-0.059
 sun-0.058
Token happening
Token ID5836
Feature activation+2.12e-02

Pos logit contributions
 property+0.012
lder+0.011
 client+0.011
 means+0.011
 fault+0.011
Neg logit contributions
 shallow-0.013
Ho-0.013
wered-0.012
 crashing-0.012
ores-0.012
Token.
Token ID13
Feature activation+2.92e-02

Pos logit contributions
lder+0.041
 client+0.041
grand+0.040
 property+0.038
 fault+0.038
Neg logit contributions
 shallow-0.026
 crashing-0.025
 sun-0.023
ores-0.023
Ho-0.023
Token And
Token ID843
Feature activation-3.49e-02

Pos logit contributions
lder+0.054
 property+0.053
 client+0.053
 means+0.052
 fault+0.051
Neg logit contributions
 shallow-0.036
Ho-0.034
 crashing-0.034
 sun-0.033
wered-0.033
Token made
Token ID925
Feature activation-1.09e-02

Pos logit contributions
 shallow+0.021
Ho+0.020
 crashing+0.020
wered+0.020
 sun+0.019
Neg logit contributions
lder-0.020
 means-0.020
 property-0.020
 client-0.020
 fault-0.019
Token Rex
Token ID17853
Feature activation+1.89e-02

Pos logit contributions
 shallow+0.016
 sunscreen+0.015
eh+0.015
Ho+0.015
wered+0.015
Neg logit contributions
 means-0.017
 wishes-0.016
rums-0.015
 client-0.015
 fault-0.014
Token freeze
Token ID16611
Feature activation-1.46e-03

Pos logit contributions
 client+0.035
lder+0.035
 property+0.035
 policy+0.034
grand+0.034
Neg logit contributions
 shallow-0.024
 crashing-0.024
 fresh-0.022
 sun-0.021
ores-0.021
Token for
Token ID329
Feature activation+3.80e-02

Pos logit contributions
 shallow+0.002
 crashing+0.002
Ho+0.002
 sun+0.002
ores+0.002
Neg logit contributions
lder-0.003
 client-0.003
 property-0.003
grand-0.003
 means-0.003
Token a
Token ID257
Feature activation+4.43e-02

Pos logit contributions
 client+0.063
 property+0.062
rums+0.061
 means+0.061
 endings+0.059
Neg logit contributions
 shallow-0.052
Ho-0.047
wered-0.045
ores-0.045
add-0.045
Token second
Token ID1218
Feature activation+1.57e-01

Pos logit contributions
lder+0.087
 property+0.087
 client+0.087
 means+0.085
 fault+0.084
Neg logit contributions
 shallow-0.049
Ho-0.046
 crashing-0.045
wered-0.045
ores-0.044
Token.
Token ID13
Feature activation+8.62e-03

Pos logit contributions
lder+0.275
 client+0.272
 property+0.272
 means+0.270
grand+0.264
Neg logit contributions
 shallow-0.209
 crashing-0.198
Ho-0.195
 sun-0.194
wered-0.191
Token\n
Token ID198
Feature activation-5.71e-02

Pos logit contributions
 means+0.016
 client+0.015
lder+0.015
 fault+0.014
rums+0.014
Neg logit contributions
 shallow-0.011
wered-0.011
 crashing-0.010
 fresh-0.010
add-0.010
Token\n
Token ID198
Feature activation-6.10e-02

Pos logit contributions
 shallow+0.071
 crashing+0.066
wered+0.066
 sun+0.065
 sunscreen+0.063
Neg logit contributions
 means-0.104
lder-0.104
grand-0.101
 client-0.101
rums-0.099
TokenRex
Token ID47389
Feature activation-1.16e-02

Pos logit contributions
 shallow+0.068
wered+0.064
ores+0.064
 sun+0.064
add+0.063
Neg logit contributions
 means-0.117
 client-0.117
lder-0.116
 property-0.114
grand-0.111
Token tried
Token ID3088
Feature activation-1.38e-02

Pos logit contributions
 shallow+0.014
Ho+0.013
 crashing+0.013
 sun+0.013
ch+0.013
Neg logit contributions
 policy-0.022
 client-0.022
 property-0.021
 fault-0.021
grand-0.021
Token down
Token ID866
Feature activation-1.01e-02

Pos logit contributions
 means+0.023
 property+0.023
 fault+0.023
 client+0.022
keeping+0.022
Neg logit contributions
Ho-0.017
 shallow-0.017
wered-0.016
ores-0.015
add-0.015
Token the
Token ID262
Feature activation-2.04e-03

Pos logit contributions
Ho+0.014
 shallow+0.013
add+0.013
wered+0.013
hes+0.012
Neg logit contributions
 means-0.018
keeping-0.017
 policy-0.017
 fault-0.016
 property-0.016
Token slide
Token ID10649
Feature activation-7.41e-02

Pos logit contributions
 shallow+0.003
Ho+0.003
ores+0.003
 crashing+0.003
wered+0.003
Neg logit contributions
lder-0.004
 property-0.004
 client-0.003
 means-0.003
 fault-0.003
Token for
Token ID329
Feature activation+3.73e-02

Pos logit contributions
 shallow+0.104
Ho+0.099
wered+0.097
 crashing+0.097
ores+0.095
Neg logit contributions
lder-0.123
 property-0.122
 means-0.122
 client-0.121
 fault-0.118
Token the
Token ID262
Feature activation+2.26e-03

Pos logit contributions
 client+0.064
lder+0.063
keeping+0.063
 means+0.062
 property+0.062
Neg logit contributions
 shallow-0.048
Ho-0.046
 crashing-0.045
add-0.043
 camps-0.042
Token second
Token ID1218
Feature activation+1.57e-01

Pos logit contributions
lder+0.004
 means+0.004
 property+0.004
 client+0.004
grand+0.004
Neg logit contributions
 shallow-0.003
 crashing-0.003
 sun-0.003
Ho-0.003
wered-0.003
Token time
Token ID640
Feature activation+5.99e-02

Pos logit contributions
lder+0.280
 property+0.279
 client+0.275
 means+0.274
grand+0.269
Neg logit contributions
 shallow-0.205
 sun-0.201
 crashing-0.198
Ho-0.182
 fresh-0.181
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
lder+0.107
 client+0.105
 property+0.105
grand+0.102
 means+0.102
Neg logit contributions
 shallow-0.078
 crashing-0.075
 sun-0.074
ores-0.072
Ho-0.072
Token<|endoftext|>
Token ID50256
Feature activation+6.61e-02

Pos logit contributions
lder+0.020
 means+0.020
 fault+0.019
 client+0.019
keeping+0.018
Neg logit contributions
 shallow-0.017
Ho-0.017
wered-0.017
 crashing-0.017
 sunscreen-0.016
TokenTom
Token ID13787
Feature activation+4.40e-02

Pos logit contributions
 means+0.122
lder+0.120
 client+0.117
keeping+0.117
 property+0.116
Neg logit contributions
 shallow-0.081
 crashing-0.076
wered-0.076
Ho-0.075
ores-0.073
Tokenmy
Token ID1820
Feature activation+4.27e-02

Pos logit contributions
 client+0.086
 property+0.085
lder+0.085
grand+0.083
 policy+0.082
Neg logit contributions
 shallow-0.050
 crashing-0.047
 sun-0.046
Ho-0.046
ores-0.044
Token rescue
Token ID9992
Feature activation+8.09e-03

Pos logit contributions
lder+0.046
 means+0.044
grand+0.043
rums+0.042
 policy+0.040
Neg logit contributions
 shallow-0.042
 sun-0.040
 camps-0.038
 crashing-0.038
 fresh-0.037
Token.
Token ID13
Feature activation+2.20e-02

Pos logit contributions
lder+0.017
 client+0.016
grand+0.016
 fault+0.016
 property+0.016
Neg logit contributions
 shallow-0.008
 crashing-0.008
 sun-0.008
ores-0.007
 fresh-0.007
Token "
Token ID366
Feature activation+1.83e-02

Pos logit contributions
lder+0.041
 means+0.041
grand+0.039
 client+0.039
 property+0.039
Neg logit contributions
 shallow-0.027
wered-0.027
 crashing-0.026
Ho-0.026
add-0.025
TokenDon
Token ID3987
Feature activation+4.60e-02

Pos logit contributions
lder+0.026
grand+0.025
 worry+0.025
 property+0.024
keeping+0.023
Neg logit contributions
 shallow-0.030
 camps-0.030
ores-0.030
addle-0.030
add-0.029
Token't
Token ID470
Feature activation+4.13e-02

Pos logit contributions
 property+0.069
lder+0.069
 client+0.069
 means+0.067
 fault+0.067
Neg logit contributions
 shallow-0.072
Ho-0.068
 crashing-0.068
wered-0.067
 sun-0.067
Token worry
Token ID5490
Feature activation+1.57e-01

Pos logit contributions
 client+0.034
 fault+0.034
 means+0.033
lder+0.031
grand+0.031
Neg logit contributions
 crashing-0.094
 sun-0.093
Ho-0.092
 fresh-0.091
 shallow-0.091
Token little
Token ID1310
Feature activation+6.23e-02

Pos logit contributions
 client+0.285
keeping+0.280
 property+0.275
grand+0.270
lder+0.263
Neg logit contributions
 sun-0.205
 fresh-0.195
 shallow-0.193
ch-0.190
 crashing-0.190
Token friend
Token ID1545
Feature activation+9.04e-02

Pos logit contributions
lder+0.113
 property+0.111
 client+0.111
keeping+0.110
grand+0.109
Neg logit contributions
 shallow-0.079
 sun-0.077
Ho-0.075
 crashing-0.075
ch-0.073
Token,
Token ID11
Feature activation+2.97e-02

Pos logit contributions
 client+0.160
 fault+0.155
keeping+0.151
 property+0.148
rums+0.147
Neg logit contributions
ch-0.115
 farther-0.113
 fresh-0.110
 crashing-0.109
 higher-0.108
Token I
Token ID314
Feature activation+1.77e-02

Pos logit contributions
 property+0.044
 means+0.044
lder+0.043
 policy+0.042
 client+0.042
Neg logit contributions
 shallow-0.047
Ho-0.045
wered-0.045
add-0.044
 crashing-0.044
Token will
Token ID481
Feature activation+4.39e-02

Pos logit contributions
keeping+0.022
 client+0.022
lder+0.022
 property+0.021
grand+0.021
Neg logit contributions
 shallow-0.031
add-0.031
 crashing-0.031
ores-0.031
 sun-0.031
Token say
Token ID910
Feature activation-1.49e-02

Pos logit contributions
lder+0.018
 property+0.017
 client+0.017
 fault+0.017
keeping+0.017
Neg logit contributions
 shallow-0.014
 crashing-0.014
Ho-0.013
 sun-0.013
wered-0.013
Token,
Token ID11
Feature activation-9.15e-03

Pos logit contributions
 crashing+0.019
 shallow+0.018
 sun+0.017
ores+0.017
 fresh+0.017
Neg logit contributions
 property-0.028
lder-0.027
grand-0.027
keeping-0.027
 client-0.027
Token "
Token ID366
Feature activation+2.28e-02

Pos logit contributions
 shallow+0.016
add+0.016
wered+0.015
 camps+0.015
 sunscreen+0.015
Neg logit contributions
 means-0.013
lder-0.012
 property-0.011
 depends-0.011
 client-0.011
TokenDon
Token ID3987
Feature activation+4.56e-02

Pos logit contributions
lder+0.039
 means+0.038
 property+0.038
grand+0.037
keeping+0.037
Neg logit contributions
 shallow-0.033
ores-0.030
 camps-0.030
add-0.029
wered-0.029
Token't
Token ID470
Feature activation+4.32e-02

Pos logit contributions
 property+0.080
 fault+0.078
 client+0.077
keeping+0.076
lder+0.076
Neg logit contributions
 crashing-0.059
wered-0.058
 sun-0.058
Ho-0.058
 shallow-0.058
Token worry
Token ID5490
Feature activation+1.57e-01

Pos logit contributions
 means+0.036
lder+0.036
 fault+0.035
 client+0.035
grand+0.033
Neg logit contributions
 sun-0.098
 crashing-0.098
 fresh-0.094
 spring-0.093
 shallow-0.093
Token,
Token ID11
Feature activation+3.41e-02

Pos logit contributions
keeping+0.315
grand+0.301
 depends+0.299
 property+0.298
lder+0.295
Neg logit contributions
 sun-0.187
 crashing-0.172
 fresh-0.162
ores-0.161
ch-0.156
Token I
Token ID314
Feature activation+1.99e-02

Pos logit contributions
lder+0.054
 client+0.053
 property+0.053
 fault+0.051
 means+0.051
Neg logit contributions
 shallow-0.051
 crashing-0.050
 sun-0.049
wered-0.048
Ho-0.047
Token can
Token ID460
Feature activation+7.45e-02

Pos logit contributions
keeping+0.029
lder+0.029
 client+0.029
 property+0.028
grand+0.028
Neg logit contributions
 sun-0.031
 shallow-0.031
 spring-0.031
ores-0.031
 crashing-0.031
Token help
Token ID1037
Feature activation+4.76e-02

Pos logit contributions
 client+0.127
 property+0.127
lder+0.125
keeping+0.124
 fault+0.124
Neg logit contributions
 shallow-0.098
 sun-0.096
wered-0.096
 crashing-0.095
ores-0.093
Token you
Token ID345
Feature activation+6.32e-02

Pos logit contributions
 fault+0.080
lder+0.080
 client+0.079
 property+0.078
grand+0.077
Neg logit contributions
 crashing-0.065
 shallow-0.065
 sun-0.065
ores-0.062
 fresh-0.062
Token favorite
Token ID4004
Feature activation+7.97e-02

Pos logit contributions
 property+0.035
 client+0.035
lder+0.034
 means+0.034
grand+0.033
Neg logit contributions
 shallow-0.049
 crashing-0.047
Ho-0.047
wered-0.046
ores-0.046
Token costume
Token ID16569
Feature activation-1.90e-02

Pos logit contributions
lder+0.133
 client+0.128
 property+0.127
 fault+0.126
 means+0.124
Neg logit contributions
 sun-0.119
 shallow-0.117
 spring-0.108
 fresh-0.108
Ho-0.108
Token without
Token ID1231
Feature activation+1.44e-02

Pos logit contributions
 shallow+0.022
 crashing+0.021
 sun+0.021
Ho+0.021
wered+0.020
Neg logit contributions
lder-0.036
 property-0.036
 client-0.036
 means-0.035
grand-0.034
Token having
Token ID1719
Feature activation-1.39e-03

Pos logit contributions
 property+0.023
 client+0.023
lder+0.023
 means+0.023
 fault+0.022
Neg logit contributions
 shallow-0.021
Ho-0.020
wered-0.020
 crashing-0.020
ores-0.020
Token to
Token ID284
Feature activation+5.32e-02

Pos logit contributions
 shallow+0.001
 sun+0.001
 crashing+0.001
 fresh+0.001
ores+0.001
Neg logit contributions
lder-0.003
 client-0.003
 property-0.003
 means-0.003
grand-0.003
Token worry
Token ID5490
Feature activation+1.56e-01

Pos logit contributions
lder+0.086
 property+0.085
 client+0.085
 means+0.084
 fault+0.082
Neg logit contributions
 shallow-0.077
 crashing-0.074
Ho-0.073
wered-0.072
 sun-0.072
Token about
Token ID546
Feature activation+1.24e-02

Pos logit contributions
lder+0.284
 property+0.282
 client+0.282
 means+0.279
grand+0.273
Neg logit contributions
 shallow-0.197
 crashing-0.185
Ho-0.184
wered-0.180
 sun-0.179
Token it
Token ID340
Feature activation+2.11e-02

Pos logit contributions
 property+0.023
lder+0.023
 client+0.023
 means+0.022
 fault+0.022
Neg logit contributions
 shallow-0.015
Ho-0.014
 crashing-0.014
wered-0.014
 sun-0.014
Token breaking
Token ID7163
Feature activation-3.72e-03

Pos logit contributions
lder+0.041
 client+0.041
 property+0.039
grand+0.039
 means+0.038
Neg logit contributions
 shallow-0.024
 crashing-0.023
 sun-0.023
 fresh-0.023
ores-0.022
Token again
Token ID757
Feature activation+3.72e-02

Pos logit contributions
 shallow+0.004
 sun+0.004
 crashing+0.004
 fresh+0.004
 sunscreen+0.004
Neg logit contributions
lder-0.008
 client-0.007
grand-0.007
 depends-0.007
 property-0.007
Token.
Token ID13
Feature activation+1.83e-02

Pos logit contributions
lder+0.081
 client+0.077
grand+0.075
 depends+0.074
 Painter+0.074
Neg logit contributions
 shallow-0.037
 sun-0.036
 fresh-0.034
 crashing-0.033
 spring-0.032
Token said
Token ID531
Feature activation+1.82e-02

Pos logit contributions
 shallow+0.161
Ho+0.153
 crashing+0.153
wered+0.150
 sun+0.149
Neg logit contributions
lder-0.145
 client-0.143
 property-0.143
 means-0.142
grand-0.137
Token:
Token ID25
Feature activation+2.25e-02

Pos logit contributions
keeping+0.041
 property+0.041
 fault+0.039
 depends+0.039
grand+0.038
Neg logit contributions
 sun-0.017
 them-0.016
 crashing-0.016
 fresh-0.015
 floating-0.014
Token "
Token ID366
Feature activation+2.18e-02

Pos logit contributions
lder+0.038
 means+0.036
 client+0.036
 Painter+0.034
 property+0.034
Neg logit contributions
 shallow-0.033
add-0.031
wered-0.030
Ho-0.030
ores-0.030
TokenDon
Token ID3987
Feature activation+4.58e-02

Pos logit contributions
lder+0.040
 property+0.040
 means+0.040
 client+0.039
grand+0.039
Neg logit contributions
 shallow-0.029
ores-0.027
 crashing-0.026
wered-0.025
 sun-0.025
Token't
Token ID470
Feature activation+4.29e-02

Pos logit contributions
lder+0.068
 property+0.068
 client+0.067
 means+0.067
 fault+0.065
Neg logit contributions
 shallow-0.072
Ho-0.069
 crashing-0.069
wered-0.068
 sun-0.067
Token worry
Token ID5490
Feature activation+1.56e-01

Pos logit contributions
 fault+0.044
 means+0.042
lder+0.041
 client+0.039
 depends+0.039
Neg logit contributions
 sun-0.090
 crashing-0.087
wered-0.086
Ho-0.086
 fresh-0.086
Token,
Token ID11
Feature activation+3.42e-02

Pos logit contributions
keeping+0.267
 property+0.264
grand+0.263
 client+0.259
lder+0.259
Neg logit contributions
 sun-0.209
Ho-0.204
 crashing-0.204
 shallow-0.190
 fresh-0.189
Token Pet
Token ID4767
Feature activation-4.35e-02

Pos logit contributions
 property+0.056
 means+0.056
lder+0.056
 client+0.055
grand+0.053
Neg logit contributions
 shallow-0.053
ores-0.049
hes-0.049
 crashing-0.047
wered-0.047
Tokena
Token ID64
Feature activation+1.28e-02

Pos logit contributions
 shallow+0.070
Ho+0.069
 crashing+0.065
 sun+0.065
wered+0.064
Neg logit contributions
 property-0.064
lder-0.063
 client-0.063
 fault-0.062
grand-0.062
Token,
Token ID11
Feature activation+2.22e-02

Pos logit contributions
keeping+0.027
 client+0.026
grand+0.026
rums+0.026
 policy+0.026
Neg logit contributions
 spring-0.011
 sun-0.011
ores-0.011
 fresh-0.010
 shallow-0.010
Token I
Token ID314
Feature activation+1.27e-02

Pos logit contributions
 property+0.034
lder+0.034
 client+0.033
 means+0.033
grand+0.032
Neg logit contributions
 shallow-0.035
Ho-0.033
 crashing-0.033
wered-0.032
ores-0.032
Token bird
Token ID6512
Feature activation-3.36e-02

Pos logit contributions
 shallow+0.087
Ho+0.084
ores+0.083
 crashing+0.083
wered+0.081
Neg logit contributions
 property-0.085
 client-0.084
lder-0.083
 fault-0.082
 means-0.081
Token saw
Token ID2497
Feature activation-1.24e-02

Pos logit contributions
 shallow+0.053
 crashing+0.050
wered+0.050
Ho+0.050
ores+0.049
Neg logit contributions
lder-0.051
 means-0.051
 property-0.050
 client-0.049
grand-0.049
Token the
Token ID262
Feature activation-1.47e-02

Pos logit contributions
 shallow+0.015
Ho+0.015
 crashing+0.015
wered+0.014
ores+0.013
Neg logit contributions
 means-0.024
lder-0.023
 client-0.023
 property-0.022
 fault-0.022
Token roses
Token ID42152
Feature activation-5.78e-02

Pos logit contributions
 shallow+0.018
Ho+0.017
ores+0.017
wered+0.017
 crashing+0.017
Neg logit contributions
 property-0.028
 client-0.027
 fault-0.027
lder-0.027
 means-0.027
Token and
Token ID290
Feature activation-1.14e-01

Pos logit contributions
Ho+0.080
 shallow+0.075
wered+0.075
add+0.073
inn+0.071
Neg logit contributions
 means-0.098
 property-0.097
lder-0.095
keeping-0.092
 worry-0.090
Token per
Token ID583
Feature activation-1.71e-01

Pos logit contributions
 shallow+0.128
Ho+0.120
 crashing+0.118
wered+0.117
ores+0.117
Neg logit contributions
 means-0.222
lder-0.222
 property-0.219
 client-0.219
grand-0.213
Tokenked
Token ID9091
Feature activation-6.93e-02

Pos logit contributions
 shallow+0.315
wered+0.309
ores+0.268
 camps+0.268
 sunscreen+0.265
Neg logit contributions
 means-0.228
lder-0.218
vered-0.201
ometer-0.198
eware-0.196
Token up
Token ID510
Feature activation-5.89e-02

Pos logit contributions
 shallow+0.085
Ho+0.081
wered+0.079
add+0.077
 crashing+0.076
Neg logit contributions
 means-0.128
 property-0.126
lder-0.125
 client-0.123
 fault-0.123
Token.
Token ID13
Feature activation+2.55e-03

Pos logit contributions
Ho+0.112
 shallow+0.110
add+0.108
wered+0.107
inn+0.106
Neg logit contributions
 means-0.062
 wishes-0.055
 feelings-0.053
 fault-0.052
 worry-0.052
Token It
Token ID632
Feature activation-3.62e-02

Pos logit contributions
 means+0.004
 client+0.004
 fault+0.004
lder+0.004
grand+0.004
Neg logit contributions
 shallow-0.004
 crashing-0.004
Ho-0.004
wered-0.004
 camps-0.004
Token flew
Token ID13112
Feature activation-1.30e-01

Pos logit contributions
 shallow+0.061
 crashing+0.060
wered+0.060
ores+0.060
add+0.059
Neg logit contributions
 means-0.051
lder-0.049
 depends-0.046
 client-0.044
 property-0.044
Token ro
Token ID686
Feature activation-9.64e-02

Pos logit contributions
Ho+0.022
wered+0.021
 shallow+0.021
add+0.020
ores+0.019
Neg logit contributions
 fault-0.032
 means-0.032
 property-0.031
lder-0.031
 client-0.030
Tokenost
Token ID455
Feature activation-6.46e-02

Pos logit contributions
 sun+0.146
wered+0.144
 shallow+0.143
 sunscreen+0.136
 camps+0.136
Neg logit contributions
lder-0.148
 property-0.143
 means-0.139
 virtue-0.135
 decisions-0.134
Tokening
Token ID278
Feature activation-5.09e-02

Pos logit contributions
 shallow+0.085
 crashing+0.082
Ho+0.079
ores+0.078
 sun+0.078
Neg logit contributions
 client-0.114
lder-0.113
 property-0.112
grand-0.110
 fault-0.109
Token on
Token ID319
Feature activation+1.20e-02

Pos logit contributions
 shallow+0.054
 crashing+0.050
Ho+0.050
wered+0.049
 sun+0.049
Neg logit contributions
lder-0.102
 property-0.101
 client-0.101
 means-0.100
grand-0.098
Token their
Token ID511
Feature activation+2.72e-02

Pos logit contributions
lder+0.025
 property+0.025
 client+0.025
 means+0.025
 fault+0.025
Neg logit contributions
 shallow-0.012
Ho-0.011
 crashing-0.011
wered-0.010
 sun-0.010
Token per
Token ID583
Feature activation-1.71e-01

Pos logit contributions
 means+0.052
lder+0.050
 Painter+0.049
 policy+0.049
 client+0.049
Neg logit contributions
 sun-0.034
 shallow-0.034
 crashing-0.033
 sunscreen-0.032
 fresh-0.031
Tokenches
Token ID2052
Feature activation-4.18e-02

Pos logit contributions
 shallow+0.135
wered+0.135
 sunscreen+0.115
ores+0.107
add+0.104
Neg logit contributions
 means-0.393
 property-0.376
lder-0.375
 fault-0.374
 client-0.367
Token.
Token ID13
Feature activation+1.38e-02

Pos logit contributions
 shallow+0.050
 crashing+0.048
 sun+0.046
Ho+0.046
wered+0.045
Neg logit contributions
lder-0.079
 client-0.079
 property-0.078
grand-0.077
 means-0.076
Token They
Token ID1119
Feature activation-5.96e-02

Pos logit contributions
 means+0.025
 client+0.024
lder+0.024
grand+0.024
keeping+0.023
Neg logit contributions
 shallow-0.018
wered-0.017
 crashing-0.017
Ho-0.017
add-0.016
Token were
Token ID547
Feature activation-2.33e-02

Pos logit contributions
 shallow+0.083
Ho+0.080
wered+0.078
 crashing+0.078
ores+0.077
Neg logit contributions
 means-0.099
lder-0.098
 client-0.097
 property-0.096
grand-0.093
Token all
Token ID477
Feature activation-1.37e-02

Pos logit contributions
Ho+0.034
 shallow+0.032
wered+0.032
add+0.031
Exc+0.030
Neg logit contributions
 means-0.036
 fault-0.036
 property-0.036
lder-0.035
 client-0.035
Token as
Token ID355
Feature activation-3.49e-02

Pos logit contributions
 shallow+0.005
 crashing+0.005
 sun+0.004
ores+0.004
 fresh+0.004
Neg logit contributions
lder-0.007
 client-0.007
grand-0.007
 property-0.007
keeping-0.007
Token high
Token ID1029
Feature activation-1.74e-02

Pos logit contributions
 shallow+0.043
 crashing+0.040
Ho+0.039
 sun+0.039
ores+0.039
Neg logit contributions
lder-0.065
 client-0.065
 property-0.064
 means-0.063
grand-0.063
Token as
Token ID355
Feature activation-4.27e-02

Pos logit contributions
 shallow+0.019
 crashing+0.017
Ho+0.017
 sun+0.017
wered+0.017
Neg logit contributions
 client-0.035
lder-0.035
 property-0.035
grand-0.034
 means-0.034
Token your
Token ID534
Feature activation+7.92e-03

Pos logit contributions
 shallow+0.061
Ho+0.059
wered+0.059
 crashing+0.058
add+0.058
Neg logit contributions
 property-0.068
 means-0.068
lder-0.068
 client-0.067
 policy-0.065
Token tree
Token ID5509
Feature activation-3.39e-02

Pos logit contributions
lder+0.014
 means+0.014
 property+0.013
 client+0.013
grand+0.013
Neg logit contributions
 shallow-0.011
 sun-0.010
 crashing-0.010
Ho-0.010
 fresh-0.010
Token per
Token ID583
Feature activation-1.70e-01

Pos logit contributions
 shallow+0.053
 crashing+0.051
 sun+0.050
ch+0.049
ores+0.048
Neg logit contributions
 client-0.052
lder-0.050
 policy-0.049
 property-0.049
grand-0.049
Tokench
Token ID354
Feature activation-9.18e-02

Pos logit contributions
 shallow+0.146
wered+0.133
ores+0.122
 sunscreen+0.122
hes+0.114
Neg logit contributions
 means-0.389
 property-0.373
lder-0.369
 fault-0.364
 client-0.364
Token.'
Token ID2637
Feature activation+3.75e-02

Pos logit contributions
 shallow+0.131
Ho+0.124
wered+0.121
 crashing+0.120
 sun+0.119
Neg logit contributions
lder-0.152
 property-0.150
 means-0.150
 client-0.147
 fault-0.146
Token All
Token ID1439
Feature activation-5.52e-02

Pos logit contributions
lder+0.072
 client+0.071
 means+0.071
 property+0.071
grand+0.070
Neg logit contributions
 shallow-0.044
 crashing-0.041
Ho-0.041
wered-0.040
 sun-0.039
Token the
Token ID262
Feature activation-2.44e-02

Pos logit contributions
 shallow+0.046
 crashing+0.041
Ho+0.040
 sun+0.040
wered+0.039
Neg logit contributions
lder-0.125
 property-0.125
 client-0.125
 means-0.122
 fault-0.121
Token animals
Token ID4695
Feature activation-4.77e-02

Pos logit contributions
 shallow+0.032
 crashing+0.030
Ho+0.030
wered+0.030
ores+0.029
Neg logit contributions
lder-0.043
 property-0.043
 client-0.043
 means-0.042
grand-0.041
Token hand
Token ID1021
Feature activation-2.51e-02

Pos logit contributions
lder+0.023
 means+0.023
 property+0.023
 client+0.022
grand+0.022
Neg logit contributions
 shallow-0.017
 sun-0.016
 crashing-0.016
Ho-0.016
wered-0.015
Token,
Token ID11
Feature activation-2.00e-02

Pos logit contributions
 shallow+0.038
wered+0.038
add+0.037
Ho+0.037
ores+0.035
Neg logit contributions
 means-0.039
lder-0.037
 property-0.037
 client-0.036
keeping-0.036
Token and
Token ID290
Feature activation-1.08e-01

Pos logit contributions
add+0.036
wered+0.033
 camps+0.032
arl+0.032
addle+0.031
Neg logit contributions
 means-0.028
 worry-0.027
lder-0.026
 property-0.025
 refused-0.024
Token the
Token ID262
Feature activation-3.08e-02

Pos logit contributions
 shallow+0.138
wered+0.132
add+0.132
Ho+0.131
 crashing+0.130
Neg logit contributions
 means-0.193
lder-0.190
 client-0.183
 property-0.181
grand-0.178
Token bird
Token ID6512
Feature activation-3.27e-02

Pos logit contributions
 shallow+0.041
 sun+0.040
 crashing+0.038
Ho+0.038
wered+0.037
Neg logit contributions
 client-0.055
lder-0.054
 means-0.054
 property-0.054
grand-0.053
Token per
Token ID583
Feature activation-1.70e-01

Pos logit contributions
 shallow+0.042
Ho+0.040
 crashing+0.040
wered+0.039
 sun+0.039
Neg logit contributions
lder-0.058
 property-0.058
 client-0.058
 means-0.057
grand-0.056
Tokenked
Token ID9091
Feature activation-6.80e-02

Pos logit contributions
 shallow+0.300
wered+0.291
ores+0.281
 sunscreen+0.273
 camps+0.270
Neg logit contributions
 means-0.231
lder-0.219
vered-0.211
eware-0.204
se-0.199
Token up
Token ID510
Feature activation-5.79e-02

Pos logit contributions
 shallow+0.085
Ho+0.081
add+0.079
wered+0.078
ores+0.077
Neg logit contributions
 means-0.123
 property-0.122
lder-0.120
 client-0.119
 fault-0.118
Token.
Token ID13
Feature activation+1.39e-02

Pos logit contributions
Ho+0.093
hes+0.093
 shallow+0.091
wered+0.089
add+0.089
Neg logit contributions
 means-0.080
lder-0.073
 worry-0.072
,'-0.070
 property-0.070
Token She
Token ID1375
Feature activation-4.10e-02

Pos logit contributions
 means+0.023
lder+0.022
 client+0.022
 depends+0.020
 fault+0.020
Neg logit contributions
 shallow-0.020
 crashing-0.019
wered-0.019
Ho-0.019
add-0.019
Token said
Token ID531
Feature activation+1.01e-02

Pos logit contributions
 shallow+0.064
wered+0.062
 crashing+0.062
Ho+0.061
ores+0.061
Neg logit contributions
lder-0.062
 means-0.061
 client-0.059
 property-0.059
grand-0.057
Token his
Token ID465
Feature activation+6.91e-03

Pos logit contributions
 shallow+0.009
Ho+0.009
wered+0.009
 crashing+0.009
add+0.009
Neg logit contributions
 means-0.019
 property-0.019
 client-0.019
 fault-0.019
lder-0.018
Token warm
Token ID5814
Feature activation+3.01e-03

Pos logit contributions
 property+0.012
 client+0.012
lder+0.012
 fault+0.012
grand+0.012
Neg logit contributions
 shallow-0.009
 crashing-0.009
ores-0.009
wered-0.009
Ho-0.008
Token blanket
Token ID18447
Feature activation-2.20e-02

Pos logit contributions
 property+0.005
keeping+0.005
 behaviour+0.005
 policy+0.005
lder+0.005
Neg logit contributions
 shallow-0.004
Ho-0.004
wered-0.004
ores-0.004
hes-0.004
Token,
Token ID11
Feature activation-1.79e-02

Pos logit contributions
 shallow+0.026
Ho+0.025
wered+0.024
 crashing+0.024
add+0.024
Neg logit contributions
 property-0.041
 means-0.041
lder-0.041
 client-0.041
 fault-0.040
Token and
Token ID290
Feature activation-1.08e-01

Pos logit contributions
add+0.026
wered+0.024
inn+0.024
Ho+0.023
arl+0.023
Neg logit contributions
 property-0.030
 means-0.029
 worry-0.029
grand-0.028
 privacy-0.028
Token kne
Token ID24813
Feature activation-1.69e-01

Pos logit contributions
 shallow+0.122
Ho+0.117
wered+0.115
add+0.114
 crashing+0.113
Neg logit contributions
 means-0.208
 property-0.206
 client-0.205
lder-0.203
grand-0.201
Tokenel
Token ID417
Feature activation-4.92e-02

Pos logit contributions
wered+0.281
 shallow+0.280
 sun+0.272
ores+0.269
 crashing+0.264
Neg logit contributions
 means-0.231
lder-0.221
rums-0.220
 property-0.220
keeping-0.218
Token down
Token ID866
Feature activation-3.25e-02

Pos logit contributions
 shallow+0.060
wered+0.057
 crashing+0.056
ores+0.055
add+0.055
Neg logit contributions
 means-0.092
 property-0.091
lder-0.088
 client-0.087
keeping-0.087
Token to
Token ID284
Feature activation+3.58e-02

Pos logit contributions
Ho+0.040
 shallow+0.040
add+0.039
wered+0.038
 crashing+0.037
Neg logit contributions
 means-0.059
lder-0.056
 property-0.056
 client-0.055
 policy-0.054
Token pray
Token ID12472
Feature activation-4.05e-02

Pos logit contributions
 property+0.054
 prove+0.054
 means+0.054
 client+0.053
 privacy+0.052
Neg logit contributions
add-0.053
Ho-0.053
 shallow-0.051
inn-0.049
 crashing-0.048
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
 shallow+0.047
 crashing+0.046
 sun+0.044
ores+0.043
 fresh+0.043
Neg logit contributions
lder-0.079
 client-0.078
 property-0.076
grand-0.075
 means-0.074
Token some
Token ID617
Feature activation-1.13e-02

Pos logit contributions
 crashing+0.059
 shallow+0.059
wered+0.057
Ho+0.056
add+0.054
Neg logit contributions
 property-0.075
lder-0.074
grand-0.074
 means-0.073
 client-0.073
Token nuts
Token ID14380
Feature activation-8.56e-02

Pos logit contributions
wered+0.013
Ho+0.013
 shallow+0.013
 crashing+0.013
ores+0.012
Neg logit contributions
 means-0.021
 property-0.021
 fault-0.021
 policy-0.021
 privacy-0.020
Token and
Token ID290
Feature activation-1.30e-01

Pos logit contributions
wered+0.105
 shallow+0.104
Ho+0.103
 crashing+0.098
add+0.095
Neg logit contributions
 means-0.157
 property-0.156
 fault-0.155
keeping-0.152
lder-0.152
Token seeds
Token ID11904
Feature activation-8.55e-02

Pos logit contributions
 shallow+0.141
 crashing+0.141
Ho+0.137
wered+0.136
add+0.132
Neg logit contributions
 means-0.257
 client-0.249
 property-0.248
lder-0.246
grand-0.246
Token and
Token ID290
Feature activation-1.27e-01

Pos logit contributions
wered+0.142
 shallow+0.139
Ho+0.138
 crashing+0.129
add+0.128
Neg logit contributions
 means-0.126
 fault-0.123
keeping-0.120
 property-0.117
lder-0.114
Token stro
Token ID9660
Feature activation-1.68e-01

Pos logit contributions
 shallow+0.137
 crashing+0.132
Ho+0.132
wered+0.130
add+0.126
Neg logit contributions
 means-0.251
 property-0.248
 client-0.246
lder-0.245
grand-0.243
Tokenked
Token ID9091
Feature activation-7.53e-02

Pos logit contributions
 shallow+0.248
wered+0.230
ores+0.218
 sun+0.210
hes+0.209
Neg logit contributions
lder-0.281
 means-0.280
 property-0.278
keeping-0.274
 client-0.269
Token its
Token ID663
Feature activation-3.52e-02

Pos logit contributions
 shallow+0.083
Ho+0.077
wered+0.076
 crashing+0.076
ores+0.073
Neg logit contributions
 property-0.148
 means-0.148
lder-0.147
 client-0.145
 fault-0.144
Token fur
Token ID9230
Feature activation-7.07e-02

Pos logit contributions
 shallow+0.040
 crashing+0.037
Ho+0.037
wered+0.036
 sun+0.036
Neg logit contributions
lder-0.068
 property-0.068
 client-0.068
 means-0.067
 fault-0.066
Token.
Token ID13
Feature activation+4.48e-04

Pos logit contributions
 shallow+0.108
wered+0.104
Ho+0.103
 crashing+0.101
add+0.100
Neg logit contributions
 means-0.109
 property-0.108
lder-0.107
 client-0.106
 fault-0.105
Token
Token ID198
Feature activation-6.90e-02

Pos logit contributions
 means+0.001
 fault+0.001
 client+0.001
lder+0.001
grand+0.001
Neg logit contributions
 shallow-0.001
wered-0.001
Ho-0.001
 crashing-0.001
 farther-0.001
Token."
Token ID526
Feature activation+3.22e-02

Pos logit contributions
 client+0.161
lder+0.158
 property+0.158
 fault+0.151
 means+0.150
Neg logit contributions
 shallow-0.149
 sun-0.144
 crashing-0.143
ores-0.140
hes-0.137
Token
Token ID220
Feature activation+1.25e-02

Pos logit contributions
 means+0.056
lder+0.055
 client+0.055
 property+0.053
 fault+0.053
Neg logit contributions
 shallow-0.045
wered-0.043
 crashing-0.042
add-0.041
Ho-0.040
Token
Token ID198
Feature activation-5.12e-02

Pos logit contributions
 means+0.025
lder+0.024
 client+0.024
 fault+0.024
 property+0.024
Neg logit contributions
 shallow-0.014
 crashing-0.013
wered-0.013
Ho-0.013
 sun-0.012
Token
Token ID198
Feature activation-5.44e-02

Pos logit contributions
 shallow+0.071
wered+0.067
 crashing+0.066
add+0.064
ores+0.063
Neg logit contributions
lder-0.088
 means-0.087
grand-0.086
rums-0.084
 client-0.084
TokenMax
Token ID11518
Feature activation+2.51e-02

Pos logit contributions
 shallow+0.081
add+0.080
ores+0.077
wered+0.076
 crashing+0.075
Neg logit contributions
lder-0.089
 client-0.086
 property-0.085
 means-0.084
grand-0.084
Token per
Token ID583
Feature activation-1.68e-01

Pos logit contributions
 property+0.044
lder+0.044
 client+0.044
 means+0.043
 fault+0.042
Neg logit contributions
 shallow-0.033
Ho-0.031
 crashing-0.031
wered-0.031
 sun-0.030
Tokenked
Token ID9091
Feature activation-6.50e-02

Pos logit contributions
 shallow+0.264
wered+0.263
 sunscreen+0.238
 camps+0.233
 drifting+0.229
Neg logit contributions
 means-0.271
lder-0.257
 client-0.236
rums-0.236
ometer-0.234
Token up
Token ID510
Feature activation-5.58e-02

Pos logit contributions
 shallow+0.071
Ho+0.068
wered+0.066
add+0.065
 crashing+0.064
Neg logit contributions
 means-0.129
 client-0.125
 property-0.125
lder-0.125
 fault-0.123
Token and
Token ID290
Feature activation-1.15e-01

Pos logit contributions
Ho+0.088
wered+0.085
 shallow+0.084
hes+0.084
add+0.083
Neg logit contributions
 means-0.079
 worry-0.074
lder-0.068
 client-0.068
 behaviour-0.067
Token w
Token ID266
Feature activation-6.04e-02

Pos logit contributions
wered+0.172
add+0.168
 crashing+0.167
Ho+0.167
 shallow+0.166
Neg logit contributions
 means-0.188
lder-0.181
 depends-0.174
 property-0.172
 client-0.170
Tokenagged
Token ID14655
Feature activation-8.47e-02

Pos logit contributions
 shallow+0.046
 crashing+0.044
Ho+0.043
 sun+0.041
 south+0.038
Neg logit contributions
lder-0.142
rums-0.141
 means-0.138
 fault-0.138
 property-0.137
Token would
Token ID561
Feature activation+4.36e-03

Pos logit contributions
 shallow+0.034
 crashing+0.032
Ho+0.032
 sun+0.032
ores+0.032
Neg logit contributions
 property-0.036
lder-0.036
 client-0.036
 fault-0.034
 policy-0.034
Token only
Token ID691
Feature activation+2.19e-03

Pos logit contributions
lder+0.007
 property+0.007
 client+0.007
 means+0.007
 fault+0.007
Neg logit contributions
 shallow-0.006
Ho-0.006
 crashing-0.006
wered-0.006
ores-0.006
Token eat
Token ID4483
Feature activation-3.88e-02

Pos logit contributions
lder+0.004
 property+0.004
 client+0.004
 means+0.004
 fault+0.004
Neg logit contributions
 shallow-0.003
 crashing-0.003
 sun-0.003
wered-0.002
Ho-0.002
Token one
Token ID530
Feature activation+2.20e-02

Pos logit contributions
wered+0.050
 shallow+0.049
Ho+0.048
 crashing+0.048
Tweet+0.046
Neg logit contributions
 means-0.068
 property-0.065
 fault-0.065
 decisions-0.063
 problems-0.061
Token mint
Token ID21043
Feature activation-4.48e-02

Pos logit contributions
lder+0.042
 property+0.042
 client+0.042
 means+0.042
 fault+0.041
Neg logit contributions
 shallow-0.026
Ho-0.024
 crashing-0.024
wered-0.023
ores-0.023
Token per
Token ID583
Feature activation-1.67e-01

Pos logit contributions
 shallow+0.059
Ho+0.055
 crashing+0.055
 sun+0.054
ores+0.053
Neg logit contributions
lder-0.079
 client-0.079
 property-0.079
 means-0.077
grand-0.076
Token day
Token ID1110
Feature activation+7.40e-02

Pos logit contributions
wered+0.255
 shallow+0.248
 sunscreen+0.219
ores+0.219
 camps+0.218
Neg logit contributions
 means-0.277
 fault-0.251
lder-0.249
 property-0.245
 virtue-0.241
Token!
Token ID0
Feature activation+2.74e-02

Pos logit contributions
lder+0.128
 means+0.127
 property+0.127
 client+0.125
keeping+0.123
Neg logit contributions
 shallow-0.098
Ho-0.094
wered-0.093
 crashing-0.091
add-0.090
Token
Token ID198
Feature activation-4.96e-02

Pos logit contributions
 means+0.053
 client+0.053
 fault+0.052
lder+0.051
grand+0.051
Neg logit contributions
 shallow-0.031
 crashing-0.029
wered-0.029
Ho-0.029
 sun-0.027
Token
Token ID198
Feature activation-5.50e-02

Pos logit contributions
 shallow+0.062
 crashing+0.060
 sun+0.057
wered+0.057
 sunscreen+0.057
Neg logit contributions
grand-0.087
lder-0.087
 means-0.086
 client-0.084
rums-0.083
TokenThe
Token ID464
Feature activation-4.94e-02

Pos logit contributions
 sun+0.060
 shallow+0.060
 crashing+0.058
 drifting+0.057
ores+0.057
Neg logit contributions
grand-0.104
lder-0.103
 property-0.102
 client-0.101
 virtue-0.101
Token to
Token ID284
Feature activation+5.02e-02

Pos logit contributions
Ho+0.133
wered+0.130
add+0.124
 shallow+0.122
hes+0.115
Neg logit contributions
keeping-0.087
 property-0.075
 client-0.071
 outage-0.067
 policy-0.066
Token a
Token ID257
Feature activation+2.57e-02

Pos logit contributions
lder+0.104
 property+0.104
 client+0.104
 means+0.103
 fault+0.101
Neg logit contributions
 shallow-0.050
Ho-0.047
 crashing-0.046
wered-0.045
 sun-0.045
Token nearby
Token ID6716
Feature activation+8.13e-02

Pos logit contributions
 means+0.056
rums+0.055
 fault+0.055
lder+0.055
 client+0.054
Neg logit contributions
 shallow-0.027
 sun-0.024
 crashing-0.024
 fresh-0.023
Ho-0.022
Token tree
Token ID5509
Feature activation-2.12e-02

Pos logit contributions
lder+0.148
 property+0.147
 client+0.147
 means+0.145
grand+0.142
Neg logit contributions
 shallow-0.101
Ho-0.096
 crashing-0.095
wered-0.093
ores-0.092
Token and
Token ID290
Feature activation-1.10e-01

Pos logit contributions
Ho+0.027
 shallow+0.027
add+0.026
wered+0.026
 crashing+0.026
Neg logit contributions
 property-0.037
 means-0.037
lder-0.037
keeping-0.036
 client-0.036
Token crou
Token ID33838
Feature activation-1.66e-01

Pos logit contributions
Ho+0.143
 crashing+0.141
 shallow+0.141
add+0.137
wered+0.137
Neg logit contributions
 means-0.189
lder-0.187
 client-0.185
 property-0.185
grand-0.184
Tokenched
Token ID1740
Feature activation-7.84e-02

Pos logit contributions
wered+0.324
 camps+0.324
 sun+0.313
 shallow+0.311
 sunscreen+0.310
Neg logit contributions
vered-0.160
 means-0.157
 happens-0.143
',"-0.140
',-0.131
Token down
Token ID866
Feature activation-3.27e-02

Pos logit contributions
 shallow+0.092
wered+0.086
Ho+0.086
 crashing+0.082
ores+0.080
Neg logit contributions
 means-0.150
 property-0.147
lder-0.145
keeping-0.144
 client-0.144
Token.
Token ID13
Feature activation+1.39e-02

Pos logit contributions
Ho+0.058
 shallow+0.056
wered+0.054
add+0.054
 sunscreen+0.052
Neg logit contributions
 means-0.044
 property-0.040
 fault-0.039
 policy-0.039
keeping-0.039
Token
Token ID198
Feature activation-5.66e-02

Pos logit contributions
 means+0.022
lder+0.021
 client+0.021
keeping+0.021
 fault+0.020
Neg logit contributions
Ho-0.020
wered-0.020
 shallow-0.020
 crashing-0.019
add-0.018
Token
Token ID198
Feature activation-6.02e-02

Pos logit contributions
 shallow+0.076
wered+0.073
 crashing+0.072
Ho+0.070
 sun+0.069
Neg logit contributions
lder-0.098
grand-0.097
 means-0.094
keeping-0.091
 policy-0.091
Token cave
Token ID11527
Feature activation-2.50e-03

Pos logit contributions
 shallow+0.032
ores+0.032
 crashing+0.031
add+0.031
Ho+0.031
Neg logit contributions
 property-0.032
 client-0.031
lder-0.030
 means-0.030
 policy-0.029
Token,
Token ID11
Feature activation-1.75e-02

Pos logit contributions
 shallow+0.003
 crashing+0.003
 sun+0.003
ores+0.003
 fresh+0.003
Neg logit contributions
lder-0.005
 client-0.005
 property-0.004
grand-0.004
 fault-0.004
Token but
Token ID475
Feature activation+2.02e-03

Pos logit contributions
add+0.022
wered+0.022
Ho+0.021
 shallow+0.021
hes+0.020
Neg logit contributions
 property-0.031
 means-0.031
 fault-0.030
 client-0.029
keeping-0.029
Token the
Token ID262
Feature activation-2.82e-02

Pos logit contributions
lder+0.004
 property+0.004
 client+0.004
 means+0.004
grand+0.004
Neg logit contributions
 shallow-0.002
 crashing-0.002
Ho-0.002
wered-0.002
 sun-0.002
Token hero
Token ID4293
Feature activation+1.45e-02

Pos logit contributions
 shallow+0.043
 crashing+0.041
ores+0.041
Ho+0.041
wered+0.040
Neg logit contributions
 client-0.044
 property-0.044
lder-0.043
 means-0.043
 policy-0.041
Token per
Token ID583
Feature activation-1.66e-01

Pos logit contributions
 property+0.023
 fault+0.023
 client+0.023
lder+0.022
 policy+0.022
Neg logit contributions
 shallow-0.021
 crashing-0.020
Ho-0.020
 sun-0.020
add-0.019
Tokense
Token ID325
Feature activation-1.10e-02

Pos logit contributions
 shallow+0.290
 sunscreen+0.278
wered+0.278
ores+0.274
 sun+0.260
Neg logit contributions
 means-0.231
vered-0.216
lder-0.214
 property-0.205
uable-0.202
Tokenvered
Token ID21917
Feature activation+2.89e-02

Pos logit contributions
 shallow+0.025
Ho+0.024
 crashing+0.024
add+0.023
 sun+0.023
Neg logit contributions
lder-0.009
 client-0.009
 property-0.009
 fault-0.009
grand-0.009
Token.
Token ID13
Feature activation+1.98e-02

Pos logit contributions
lder+0.053
 client+0.053
 property+0.052
 means+0.052
grand+0.051
Neg logit contributions
 shallow-0.036
 crashing-0.034
Ho-0.034
 sun-0.033
wered-0.033
Token He
Token ID679
Feature activation-3.38e-02

Pos logit contributions
 means+0.034
 client+0.033
lder+0.033
 depends+0.032
keeping+0.032
Neg logit contributions
 shallow-0.027
 crashing-0.026
wered-0.026
 sunscreen-0.026
Ho-0.025
Token searched
Token ID16499
Feature activation-7.84e-02

Pos logit contributions
 shallow+0.053
 crashing+0.052
wered+0.052
Ho+0.052
ores+0.051
Neg logit contributions
 means-0.050
lder-0.049
 property-0.049
 client-0.048
grand-0.046
Token"
Token ID1
Feature activation+2.81e-02

Pos logit contributions
 shallow+0.073
 crashing+0.068
wered+0.067
Ho+0.067
ores+0.066
Neg logit contributions
lder-0.127
 client-0.126
 property-0.125
 means-0.125
grand-0.122
TokenIt
Token ID1026
Feature activation-2.28e-02

Pos logit contributions
lder+0.046
 means+0.044
grand+0.044
 client+0.044
 Painter+0.041
Neg logit contributions
 shallow-0.042
ores-0.040
 crashing-0.040
 camps-0.040
add-0.040
Token's
Token ID338
Feature activation+2.86e-02

Pos logit contributions
 sun+0.039
 shallow+0.039
ch+0.039
hes+0.038
 spring+0.038
Neg logit contributions
keeping-0.033
 policy-0.032
 client-0.032
 privacy-0.030
 endings-0.030
Token one
Token ID530
Feature activation+1.25e-02

Pos logit contributions
 property+0.047
 means+0.047
 fault+0.047
 client+0.045
 policy+0.045
Neg logit contributions
 shallow-0.039
Ho-0.039
 crashing-0.038
add-0.038
wered-0.037
Token dollar
Token ID8872
Feature activation+4.08e-03

Pos logit contributions
lder+0.024
 client+0.023
 property+0.023
keeping+0.023
 fault+0.023
Neg logit contributions
 shallow-0.015
 sun-0.014
 crashing-0.014
ores-0.014
Ho-0.014
Token per
Token ID583
Feature activation-1.66e-01

Pos logit contributions
lder+0.007
 client+0.007
 property+0.007
 fault+0.007
keeping+0.007
Neg logit contributions
 shallow-0.005
 sun-0.005
Ho-0.005
 crashing-0.005
ores-0.005
Token cup
Token ID6508
Feature activation-2.61e-02

Pos logit contributions
 shallow+0.204
wered+0.194
 camps+0.166
 refreshed+0.162
 sunscreen+0.161
Neg logit contributions
 means-0.333
 fault-0.305
',"-0.302
ometer-0.301
 client-0.300
Token.
Token ID13
Feature activation+1.00e-02

Pos logit contributions
 shallow+0.032
Ho+0.031
 crashing+0.031
wered+0.030
add+0.029
Neg logit contributions
 means-0.048
lder-0.048
 property-0.048
 client-0.047
grand-0.047
Token But
Token ID887
Feature activation+1.84e-02

Pos logit contributions
 means+0.013
 client+0.013
grand+0.012
lder+0.012
 property+0.012
Neg logit contributions
 shallow-0.019
 crashing-0.018
wered-0.017
add-0.017
Ho-0.017
Token I
Token ID314
Feature activation+1.03e-02

Pos logit contributions
 means+0.026
 property+0.025
 client+0.024
lder+0.024
 policy+0.024
Neg logit contributions
 shallow-0.032
Ho-0.030
 crashing-0.030
wered-0.030
ores-0.029
Token have
Token ID423
Feature activation-2.94e-02

Pos logit contributions
keeping+0.017
 property+0.016
lder+0.016
rums+0.016
 client+0.016
Neg logit contributions
 shallow-0.016
ores-0.015
Ho-0.014
hes-0.014
 sun-0.014
Token started
Token ID2067
Feature activation-2.05e-02

Pos logit contributions
 shallow+0.078
 crashing+0.075
Ho+0.074
wered+0.074
ores+0.072
Neg logit contributions
lder-0.090
 means-0.086
 client-0.085
grand-0.084
 property-0.083
Token baking
Token ID16871
Feature activation-4.86e-02

Pos logit contributions
 shallow+0.030
Ho+0.029
wered+0.028
 crashing+0.028
ores+0.027
Neg logit contributions
lder-0.033
 means-0.033
 client-0.032
 property-0.032
 fault-0.032
Token the
Token ID262
Feature activation-8.16e-03

Pos logit contributions
 crashing+0.064
 shallow+0.063
Ho+0.063
wered+0.061
add+0.058
Neg logit contributions
 means-0.085
 property-0.081
lder-0.081
 fault-0.080
 decisions-0.079
Token cake
Token ID12187
Feature activation-9.29e-03

Pos logit contributions
 shallow+0.010
 sun+0.010
 fresh+0.010
wered+0.009
 crashing+0.009
Neg logit contributions
 client-0.015
 means-0.015
 property-0.015
lder-0.015
rums-0.014
Token as
Token ID355
Feature activation-3.65e-02

Pos logit contributions
 shallow+0.011
Ho+0.011
wered+0.011
 crashing+0.011
add+0.010
Neg logit contributions
 means-0.017
lder-0.017
 property-0.017
 client-0.017
grand-0.017
Token per
Token ID583
Feature activation-1.66e-01

Pos logit contributions
 shallow+0.043
Ho+0.043
wered+0.042
 crashing+0.041
add+0.039
Neg logit contributions
 means-0.067
 client-0.066
 fault-0.066
lder-0.066
 policy-0.066
Token the
Token ID262
Feature activation-2.26e-02

Pos logit contributions
wered+0.261
 shallow+0.258
 sunscreen+0.222
 crashing+0.219
 drifting+0.212
Neg logit contributions
 means-0.269
lder-0.247
ometer-0.240
uable-0.234
 client-0.231
Token advice
Token ID5608
Feature activation+3.35e-02

Pos logit contributions
 shallow+0.031
 sun+0.030
 crashing+0.029
Ho+0.029
wered+0.028
Neg logit contributions
lder-0.038
 property-0.038
 client-0.038
 means-0.038
grand-0.037
Token of
Token ID286
Feature activation-3.95e-02

Pos logit contributions
lder+0.066
 property+0.066
 client+0.065
 means+0.065
 fault+0.064
Neg logit contributions
 shallow-0.037
Ho-0.034
 crashing-0.034
wered-0.034
 sun-0.033
Token the
Token ID262
Feature activation-1.70e-02

Pos logit contributions
Ho+0.043
 shallow+0.042
 crashing+0.039
add+0.039
ch+0.038
Neg logit contributions
 property-0.079
 means-0.077
 client-0.077
 fault-0.076
lder-0.076
Token elderly
Token ID13830
Feature activation+3.28e-02

Pos logit contributions
 shallow+0.024
 sun+0.024
 crashing+0.023
 fresh+0.022
Ho+0.022
Neg logit contributions
 means-0.028
lder-0.028
 client-0.028
 property-0.028
 fault-0.027
Token butterfly
Token ID35113
Feature activation-2.86e-02

Pos logit contributions
 shallow+0.021
 crashing+0.019
 sun+0.019
Ho+0.019
wered+0.018
Neg logit contributions
lder-0.028
 client-0.028
 means-0.027
 property-0.027
grand-0.027
Token's
Token ID338
Feature activation+2.41e-02

Pos logit contributions
 shallow+0.031
 crashing+0.028
 sun+0.028
Ho+0.027
ores+0.027
Neg logit contributions
 client-0.057
lder-0.057
 property-0.057
 policy-0.055
 means-0.055
Token wings
Token ID12098
Feature activation-8.00e-02

Pos logit contributions
lder+0.041
 client+0.040
 means+0.040
 property+0.039
grand+0.039
Neg logit contributions
 shallow-0.034
 sun-0.033
 crashing-0.032
Ho-0.031
wered-0.031
Token and
Token ID290
Feature activation-1.15e-01

Pos logit contributions
 shallow+0.090
Ho+0.086
wered+0.085
 crashing+0.083
add+0.082
Neg logit contributions
 property-0.154
 means-0.154
lder-0.153
 fault-0.151
 client-0.150
Token gently
Token ID15165
Feature activation-4.31e-02

Pos logit contributions
 shallow+0.143
Ho+0.136
wered+0.135
 crashing+0.135
add+0.134
Neg logit contributions
 means-0.214
lder-0.209
 property-0.208
 client-0.208
 fault-0.201
Token stro
Token ID9660
Feature activation-1.66e-01

Pos logit contributions
wered+0.054
 shallow+0.054
add+0.053
Ho+0.051
ores+0.048
Neg logit contributions
 means-0.081
lder-0.078
 fault-0.074
 client-0.074
 property-0.074
Tokenked
Token ID9091
Feature activation-7.42e-02

Pos logit contributions
 shallow+0.225
wered+0.205
ores+0.200
 sun+0.196
 camps+0.192
Neg logit contributions
lder-0.294
 means-0.292
 property-0.289
 client-0.286
keeping-0.285
Token them
Token ID606
Feature activation-5.35e-02

Pos logit contributions
 shallow+0.086
Ho+0.080
 crashing+0.079
wered+0.078
 sun+0.077
Neg logit contributions
lder-0.142
 property-0.141
 client-0.141
 means-0.140
 fault-0.137
Token.
Token ID13
Feature activation+8.45e-03

Pos logit contributions
 shallow+0.073
wered+0.069
Ho+0.067
add+0.067
 crashing+0.066
Neg logit contributions
 means-0.093
lder-0.092
 property-0.090
 client-0.089
 fault-0.087
Token The
Token ID383
Feature activation-5.43e-02

Pos logit contributions
 means+0.014
lder+0.013
 fault+0.013
 client+0.013
grand+0.013
Neg logit contributions
 shallow-0.013
wered-0.012
Ho-0.012
add-0.012
 crashing-0.012
Token butterfly
Token ID35113
Feature activation-4.09e-02

Pos logit contributions
 shallow+0.076
Ho+0.072
 crashing+0.072
ores+0.071
wered+0.071
Neg logit contributions
 client-0.090
 property-0.090
lder-0.090
 means-0.088
 fault-0.086
Token
Token ID198
Feature activation-5.53e-02

Pos logit contributions
 shallow+0.073
 crashing+0.069
wered+0.065
 camps+0.065
Ho+0.064
Neg logit contributions
lder-0.089
grand-0.086
 means-0.084
rums-0.083
keeping-0.082
TokenM
Token ID44
Feature activation+9.89e-03

Pos logit contributions
 shallow+0.066
 crashing+0.062
 sun+0.060
 camps+0.060
 drifting+0.058
Neg logit contributions
lder-0.109
grand-0.103
rd-0.099
rums-0.099
 means-0.098
Tokenia
Token ID544
Feature activation-7.08e-03

Pos logit contributions
 property+0.018
lder+0.018
keeping+0.018
 client+0.018
 means+0.018
Neg logit contributions
 shallow-0.012
add-0.012
 sun-0.012
wered-0.012
ores-0.012
Token's
Token ID338
Feature activation+2.56e-02

Pos logit contributions
 shallow+0.010
 sun+0.009
 crashing+0.009
Ho+0.009
add+0.009
Neg logit contributions
 property-0.012
 client-0.012
 policy-0.012
 fault-0.012
keeping-0.012
Token mom
Token ID1995
Feature activation+4.79e-02

Pos logit contributions
 means+0.049
lder+0.048
 property+0.048
grand+0.048
keeping+0.047
Neg logit contributions
 sun-0.031
 shallow-0.031
 crashing-0.029
 fresh-0.028
wered-0.028
Token crou
Token ID33838
Feature activation-1.66e-01

Pos logit contributions
lder+0.077
 property+0.076
 client+0.076
 means+0.075
grand+0.074
Neg logit contributions
 shallow-0.071
 crashing-0.067
Ho-0.067
wered-0.065
 sun-0.065
Tokenched
Token ID1740
Feature activation-7.51e-02

Pos logit contributions
 camps+0.289
 sun+0.282
 shallow+0.281
wered+0.280
Ho+0.275
Neg logit contributions
 means-0.217
vered-0.201
lder-0.201
rd-0.189
 depends-0.183
Token down
Token ID866
Feature activation-2.77e-02

Pos logit contributions
 shallow+0.081
Ho+0.075
 crashing+0.074
wered+0.073
ores+0.071
Neg logit contributions
lder-0.150
 property-0.150
 means-0.150
 client-0.149
 fault-0.145
Token and
Token ID290
Feature activation-1.10e-01

Pos logit contributions
 shallow+0.034
Ho+0.034
wered+0.032
add+0.031
 crashing+0.030
Neg logit contributions
 means-0.052
 policy-0.049
 client-0.049
lder-0.048
 property-0.048
Token said
Token ID531
Feature activation+8.54e-03

Pos logit contributions
Ho+0.167
wered+0.166
 shallow+0.163
 crashing+0.162
add+0.159
Neg logit contributions
 means-0.173
lder-0.165
 client-0.159
 depends-0.158
 property-0.157
Token,
Token ID11
Feature activation-1.15e-02

Pos logit contributions
 property+0.017
 client+0.016
keeping+0.016
grand+0.016
 depends+0.016
Neg logit contributions
 sun-0.010
 crashing-0.010
 fresh-0.010
 shallow-0.009
ores-0.009
Token
Token ID198
Feature activation-5.06e-02

Pos logit contributions
 means+0.036
lder+0.035
 fault+0.035
 client+0.035
 property+0.035
Neg logit contributions
 shallow-0.020
 crashing-0.018
Ho-0.018
wered-0.018
 sun-0.017
Token
Token ID198
Feature activation-5.39e-02

Pos logit contributions
 shallow+0.070
wered+0.068
 crashing+0.067
ores+0.066
 camps+0.062
Neg logit contributions
lder-0.086
 means-0.086
grand-0.085
 depends-0.080
 client-0.079
TokenD
Token ID35
Feature activation-1.01e-02

Pos logit contributions
ores+0.064
 shallow+0.063
wered+0.062
 crashing+0.061
 drifting+0.059
Neg logit contributions
lder-0.103
grand-0.100
 means-0.098
 property-0.093
 client-0.093
Tokenott
Token ID1252
Feature activation-1.19e-02

Pos logit contributions
 shallow+0.017
Ho+0.017
 sun+0.016
add+0.016
 crashing+0.016
Neg logit contributions
 client-0.014
lder-0.014
 property-0.014
 means-0.013
grand-0.013
Tokenie
Token ID494
Feature activation-4.80e-04

Pos logit contributions
add+0.013
eh+0.012
inn+0.012
hes+0.012
aver+0.012
Neg logit contributions
 client-0.023
 policy-0.023
idents-0.022
 fault-0.022
 endings-0.022
Token per
Token ID583
Feature activation-1.66e-01

Pos logit contributions
 shallow+0.001
 sun+0.001
Ho+0.001
 crashing+0.001
add+0.001
Neg logit contributions
 fault-0.001
 policy-0.001
 property-0.001
 client-0.001
keeping-0.001
Tokenked
Token ID9091
Feature activation-6.55e-02

Pos logit contributions
 shallow+0.378
wered+0.364
 sunscreen+0.354
 camps+0.353
ores+0.349
Neg logit contributions
 means-0.152
lder-0.137
 client-0.128
vered-0.125
rums-0.124
Token up
Token ID510
Feature activation-5.60e-02

Pos logit contributions
 shallow+0.068
Ho+0.064
 crashing+0.061
wered+0.061
ores+0.060
Neg logit contributions
 client-0.132
 means-0.132
 property-0.132
lder-0.131
 fault-0.128
Token and
Token ID290
Feature activation-1.14e-01

Pos logit contributions
Ho+0.079
 shallow+0.078
hes+0.078
 crashing+0.074
ores+0.074
Neg logit contributions
 means-0.090
 worry-0.085
 client-0.085
lder-0.084
keeping-0.083
Token happily
Token ID18177
Feature activation-2.79e-02

Pos logit contributions
 shallow+0.165
 crashing+0.155
Ho+0.154
 sun+0.152
wered+0.151
Neg logit contributions
 property-0.186
lder-0.186
 client-0.186
 fault-0.180
 means-0.180
Token nodded
Token ID14464
Feature activation-4.71e-02

Pos logit contributions
 shallow+0.041
wered+0.040
Ho+0.040
 crashing+0.040
ores+0.039
Neg logit contributions
lder-0.045
 means-0.044
 property-0.042
 client-0.042
grand-0.041
Token."
Token ID526
Feature activation+3.35e-02

Pos logit contributions
lder+0.055
 client+0.055
 property+0.053
 fault+0.052
grand+0.052
Neg logit contributions
 shallow-0.069
 crashing-0.066
 sun-0.065
Ho-0.065
ores-0.064
Token F
Token ID376
Feature activation+1.83e-02

Pos logit contributions
 means+0.059
 client+0.055
lder+0.055
 fault+0.055
 property+0.054
Neg logit contributions
 shallow-0.047
wered-0.044
ores-0.042
 crashing-0.041
hes-0.041
Tokenido
Token ID17305
Feature activation+1.47e-02

Pos logit contributions
 property+0.032
lder+0.032
 client+0.032
 means+0.032
grand+0.031
Neg logit contributions
 shallow-0.025
 crashing-0.023
 sun-0.023
Ho-0.023
wered-0.023
Token's
Token ID338
Feature activation+2.91e-02

Pos logit contributions
 property+0.025
 policy+0.025
 client+0.025
 fault+0.025
keeping+0.025
Neg logit contributions
 shallow-0.019
 sun-0.018
Ho-0.018
 crashing-0.018
ores-0.017
Token ears
Token ID11368
Feature activation-3.07e-02

Pos logit contributions
 property+0.047
 client+0.047
 fault+0.046
lder+0.044
 privacy+0.043
Neg logit contributions
 shallow-0.046
ores-0.043
Ho-0.043
wered-0.042
 crashing-0.041
Token per
Token ID583
Feature activation-1.65e-01

Pos logit contributions
 shallow+0.047
Ho+0.045
wered+0.044
ores+0.044
 crashing+0.043
Neg logit contributions
 means-0.048
lder-0.047
 property-0.046
 client-0.045
 fault-0.044
Tokenked
Token ID9091
Feature activation-6.25e-02

Pos logit contributions
 shallow+0.287
wered+0.280
ores+0.258
 sunscreen+0.251
 camps+0.246
Neg logit contributions
 means-0.250
lder-0.224
vered-0.216
 client-0.215
 property-0.214
Token up
Token ID510
Feature activation-5.27e-02

Pos logit contributions
 shallow+0.072
Ho+0.068
wered+0.066
add+0.065
ores+0.065
Neg logit contributions
 means-0.119
 property-0.118
 client-0.117
lder-0.116
 fault-0.115
Token and
Token ID290
Feature activation-1.17e-01

Pos logit contributions
hes+0.081
Ho+0.079
 shallow+0.079
wered+0.077
add+0.077
Neg logit contributions
 means-0.077
 worry-0.075
 behaviour-0.069
 fault-0.069
 property-0.068
Token he
Token ID339
Feature activation-3.40e-02

Pos logit contributions
 shallow+0.156
Ho+0.151
wered+0.150
 crashing+0.147
add+0.146
Neg logit contributions
 means-0.202
lder-0.199
 property-0.197
 client-0.195
 depends-0.190
Token looked
Token ID3114
Feature activation-5.92e-02

Pos logit contributions
 shallow+0.051
 crashing+0.049
Ho+0.049
wered+0.048
 sun+0.047
Neg logit contributions
lder-0.053
 property-0.053
 client-0.053
 means-0.053
grand-0.051
Token away
Token ID1497
Feature activation+7.57e-02

Pos logit contributions
 shallow+0.209
hes+0.208
wered+0.206
add+0.205
Ho+0.205
Neg logit contributions
 means-0.143
 property-0.138
keeping-0.136
 fault-0.127
 decisions-0.127
Token,
Token ID11
Feature activation-1.44e-02

Pos logit contributions
 property+0.131
lder+0.131
 client+0.130
 means+0.130
 fault+0.127
Neg logit contributions
 shallow-0.100
Ho-0.095
wered-0.093
 crashing-0.093
ores-0.091
Token and
Token ID290
Feature activation-1.05e-01

Pos logit contributions
add+0.024
wered+0.023
Ho+0.022
inn+0.021
 shallow+0.021
Neg logit contributions
 property-0.021
 depends-0.020
 worry-0.020
keeping-0.020
 fault-0.020
Token they
Token ID484
Feature activation-2.59e-02

Pos logit contributions
 shallow+0.107
 crashing+0.098
Ho+0.098
wered+0.095
 sun+0.095
Neg logit contributions
lder-0.218
 property-0.217
 client-0.216
 means-0.214
 fault-0.210
Token all
Token ID477
Feature activation-2.22e-02

Pos logit contributions
 shallow+0.039
wered+0.037
ores+0.037
Ho+0.036
 crashing+0.035
Neg logit contributions
lder-0.040
 means-0.039
 client-0.038
 property-0.038
grand-0.038
Token kne
Token ID24813
Feature activation-1.65e-01

Pos logit contributions
 shallow+0.032
Ho+0.031
wered+0.031
 crashing+0.030
ores+0.030
Neg logit contributions
 means-0.035
lder-0.035
 property-0.034
 client-0.034
grand-0.034
Tokeneled
Token ID18449
Feature activation-4.11e-02

Pos logit contributions
 shallow+0.256
wered+0.251
ores+0.243
 sun+0.241
 crashing+0.240
Neg logit contributions
 means-0.244
 property-0.244
lder-0.243
keeping-0.242
rums-0.231
Token in
Token ID287
Feature activation-2.54e-02

Pos logit contributions
 shallow+0.056
wered+0.054
Ho+0.052
 crashing+0.052
ores+0.051
Neg logit contributions
 property-0.071
 means-0.070
keeping-0.069
lder-0.069
 client-0.068
Token respect
Token ID2461
Feature activation+2.31e-04

Pos logit contributions
Ho+0.033
add+0.032
 shallow+0.032
hes+0.031
ores+0.031
Neg logit contributions
 property-0.044
 means-0.044
 policy-0.043
keeping-0.043
 client-0.043
Token for
Token ID329
Feature activation+3.99e-02

Pos logit contributions
lder+0.000
 client+0.000
grand+0.000
 property+0.000
 means+0.000
Neg logit contributions
 shallow-0.000
 crashing-0.000
 sun-0.000
Ho-0.000
wered-0.000
Token their
Token ID511
Feature activation+3.63e-02

Pos logit contributions
 property+0.075
 client+0.075
lder+0.074
 means+0.073
keeping+0.073
Neg logit contributions
 shallow-0.048
Ho-0.044
add-0.043
ores-0.043
wered-0.043
Token,
Token ID11
Feature activation-1.49e-02

Pos logit contributions
wered+0.108
 shallow+0.108
add+0.105
 crashing+0.105
Ho+0.103
Neg logit contributions
 property-0.164
 means-0.156
lder-0.154
 client-0.154
 privacy-0.151
Token roll
Token ID4836
Feature activation-1.06e-01

Pos logit contributions
 shallow+0.021
add+0.020
wered+0.020
inn+0.020
Ho+0.019
Neg logit contributions
 property-0.026
 means-0.025
 worry-0.025
 fault-0.024
 privacy-0.024
Token over
Token ID625
Feature activation-2.06e-02

Pos logit contributions
 shallow+0.122
Ho+0.114
 crashing+0.113
 sun+0.111
ores+0.107
Neg logit contributions
lder-0.208
 client-0.206
 property-0.202
grand-0.199
 fault-0.199
Token,
Token ID11
Feature activation-1.52e-02

Pos logit contributions
 shallow+0.024
wered+0.024
Ho+0.023
 crashing+0.023
ores+0.022
Neg logit contributions
 property-0.039
 means-0.038
 client-0.038
lder-0.037
 fault-0.037
Token and
Token ID290
Feature activation-1.09e-01

Pos logit contributions
wered+0.023
add+0.022
inn+0.021
 shallow+0.021
Ho+0.020
Neg logit contributions
 means-0.026
 property-0.025
 fault-0.024
 depends-0.023
 privacy-0.023
Token kne
Token ID24813
Feature activation-1.64e-01

Pos logit contributions
 shallow+0.152
add+0.148
wered+0.147
Ho+0.145
ores+0.142
Neg logit contributions
 property-0.182
 means-0.182
 client-0.178
 depends-0.173
grand-0.172
Tokenel
Token ID417
Feature activation-4.15e-02

Pos logit contributions
 shallow+0.274
wered+0.269
 sun+0.257
 crashing+0.254
ores+0.254
Neg logit contributions
 means-0.232
 property-0.230
lder-0.226
 client-0.219
keeping-0.219
Token."
Token ID526
Feature activation+2.23e-02

Pos logit contributions
 shallow+0.046
 crashing+0.042
wered+0.042
Ho+0.042
ores+0.041
Neg logit contributions
 property-0.082
 means-0.082
lder-0.081
 client-0.081
 fault-0.079
Token\n
Token ID198
Feature activation-6.47e-02

Pos logit contributions
 client+0.043
 means+0.042
lder+0.041
grand+0.041
 fault+0.041
Neg logit contributions
 shallow-0.027
wered-0.025
 crashing-0.025
Ho-0.024
ores-0.023
Token\n
Token ID198
Feature activation-6.73e-02

Pos logit contributions
 shallow+0.094
wered+0.093
 crashing+0.089
 drifting+0.085
ores+0.083
Neg logit contributions
 means-0.105
lder-0.102
grand-0.102
rums-0.099
 client-0.096
Token"
Token ID1
Feature activation+2.41e-02

Pos logit contributions
wered+0.091
ores+0.088
 drifting+0.087
 shallow+0.087
add+0.085
Neg logit contributions
lder-0.115
grand-0.115
 means-0.112
 property-0.112
 client-0.111
Token hand
Token ID1021
Feature activation-2.42e-02

Pos logit contributions
 means+0.050
lder+0.048
 property+0.047
 client+0.047
grand+0.047
Neg logit contributions
 sun-0.035
 shallow-0.032
 crashing-0.032
 fresh-0.031
Ho-0.031
Token.
Token ID13
Feature activation+6.41e-03

Pos logit contributions
 shallow+0.033
 crashing+0.032
Ho+0.031
 sun+0.031
wered+0.031
Neg logit contributions
lder-0.041
 property-0.041
 client-0.041
 means-0.040
grand-0.039
Token She
Token ID1375
Feature activation-4.36e-02

Pos logit contributions
 means+0.010
lder+0.010
 fault+0.010
 client+0.009
grand+0.009
Neg logit contributions
 shallow-0.010
wered-0.010
Ho-0.009
 crashing-0.009
 camps-0.009
Token smiled
Token ID13541
Feature activation-7.01e-02

Pos logit contributions
 shallow+0.070
 crashing+0.067
Ho+0.066
wered+0.066
ores+0.065
Neg logit contributions
lder-0.065
 client-0.064
 means-0.064
 property-0.064
grand-0.061
Token and
Token ID290
Feature activation-1.25e-01

Pos logit contributions
 shallow+0.079
wered+0.076
add+0.073
Ho+0.073
 crashing+0.069
Neg logit contributions
 means-0.139
lder-0.135
 property-0.134
 fault-0.133
 client-0.133
Token stro
Token ID9660
Feature activation-1.63e-01

Pos logit contributions
wered+0.201
 shallow+0.196
add+0.193
Ho+0.193
 crashing+0.192
Neg logit contributions
 means-0.188
lder-0.187
 depends-0.172
 client-0.172
grand-0.169
Tokenked
Token ID9091
Feature activation-7.22e-02

Pos logit contributions
 shallow+0.250
wered+0.228
ores+0.222
 camps+0.216
 sun+0.209
Neg logit contributions
lder-0.272
 means-0.261
 property-0.252
rums-0.251
keeping-0.250
Token the
Token ID262
Feature activation-3.11e-02

Pos logit contributions
 shallow+0.088
Ho+0.083
wered+0.081
add+0.079
 crashing+0.079
Neg logit contributions
 means-0.134
lder-0.132
 property-0.131
 client-0.131
 fault-0.129
Token horse
Token ID8223
Feature activation-3.23e-02

Pos logit contributions
 shallow+0.039
ores+0.037
Ho+0.036
wered+0.036
 crashing+0.036
Neg logit contributions
 property-0.057
lder-0.057
 client-0.056
 means-0.055
 fault-0.054
Token's
Token ID338
Feature activation+8.95e-03

Pos logit contributions
 shallow+0.046
Ho+0.043
 crashing+0.043
wered+0.042
ores+0.042
Neg logit contributions
lder-0.054
 property-0.053
 client-0.053
 means-0.053
 fault-0.051
Token nose
Token ID9686
Feature activation-4.86e-02

Pos logit contributions
 property+0.014
lder+0.014
 client+0.014
 means+0.014
 fault+0.013
Neg logit contributions
 shallow-0.014
wered-0.013
add-0.013
ores-0.013
Ho-0.012
Token eyes
Token ID2951
Feature activation-5.15e-02

Pos logit contributions
lder+0.057
 property+0.057
 client+0.057
 means+0.056
grand+0.055
Neg logit contributions
 shallow-0.042
 crashing-0.040
Ho-0.039
 sun-0.039
wered-0.038
Token.
Token ID13
Feature activation+1.72e-02

Pos logit contributions
 shallow+0.074
Ho+0.072
wered+0.071
ores+0.070
add+0.070
Neg logit contributions
lder-0.081
 means-0.080
 property-0.079
 worry-0.075
 fault-0.075
Token Jill
Token ID22772
Feature activation+4.65e-02

Pos logit contributions
 means+0.029
lder+0.029
 property+0.028
keeping+0.028
grand+0.028
Neg logit contributions
 shallow-0.023
Ho-0.023
wered-0.022
 crashing-0.022
ch-0.021
Token and
Token ID290
Feature activation-9.32e-02

Pos logit contributions
 property+0.084
 client+0.083
 policy+0.082
 fault+0.082
lder+0.080
Neg logit contributions
 shallow-0.059
 crashing-0.056
Ho-0.056
 sun-0.056
wered-0.053
Token Mike
Token ID4995
Feature activation+1.01e-02

Pos logit contributions
 shallow+0.153
 crashing+0.146
Ho+0.146
wered+0.144
 sun+0.143
Neg logit contributions
lder-0.134
 property-0.133
 client-0.133
 means-0.131
 fault-0.128
Token crou
Token ID33838
Feature activation-1.63e-01

Pos logit contributions
 property+0.017
 client+0.016
lder+0.016
 means+0.016
 fault+0.016
Neg logit contributions
 shallow-0.015
 crashing-0.014
Ho-0.014
wered-0.013
 sun-0.013
Tokenched
Token ID1740
Feature activation-7.43e-02

Pos logit contributions
 camps+0.325
wered+0.317
 sun+0.315
 shallow+0.310
 sunscreen+0.310
Neg logit contributions
vered-0.179
 means-0.163
lder-0.147
rd-0.142
 behaviour-0.135
Token down
Token ID866
Feature activation-3.10e-02

Pos logit contributions
 shallow+0.079
Ho+0.073
wered+0.073
 crashing+0.073
ores+0.070
Neg logit contributions
 property-0.149
 means-0.148
lder-0.148
 client-0.148
 fault-0.144
Token to
Token ID284
Feature activation+3.63e-02

Pos logit contributions
Ho+0.042
wered+0.041
 shallow+0.041
hes+0.039
eh+0.038
Neg logit contributions
 means-0.054
 fault-0.050
 policy-0.050
 property-0.049
 client-0.049
Token pet
Token ID4273
Feature activation-5.53e-02

Pos logit contributions
 means+0.058
 property+0.058
 prove+0.056
 worry+0.056
 client+0.056
Neg logit contributions
Ho-0.051
hes-0.051
 shallow-0.050
add-0.048
ores-0.048
Token the
Token ID262
Feature activation-3.14e-03

Pos logit contributions
 shallow+0.067
Ho+0.063
wered+0.062
add+0.061
 crashing+0.060
Neg logit contributions
 means-0.104
 property-0.101
lder-0.100
 client-0.100
keeping-0.098
Token,
Token ID11
Feature activation-1.30e-02

Pos logit contributions
 property+0.016
keeping+0.015
 client+0.015
grand+0.015
 privacy+0.015
Neg logit contributions
 crashing-0.010
 sun-0.010
 shallow-0.010
ores-0.010
 fresh-0.009
Token "
Token ID366
Feature activation+8.73e-03

Pos logit contributions
 shallow+0.024
wered+0.023
add+0.023
 camps+0.023
Ho+0.022
Neg logit contributions
lder-0.018
 means-0.016
 Painter-0.015
 client-0.014
 depends-0.014
TokenYou
Token ID1639
Feature activation+3.06e-02

Pos logit contributions
 means+0.015
lder+0.014
grand+0.014
 property+0.014
 depends+0.014
Neg logit contributions
 shallow-0.012
add-0.012
wered-0.012
 crashing-0.012
 camps-0.012
Token did
Token ID750
Feature activation+5.57e-02

Pos logit contributions
keeping+0.056
 policy+0.054
 client+0.053
lder+0.052
 property+0.052
Neg logit contributions
 sun-0.043
 shallow-0.040
 all-0.039
 fresh-0.038
hes-0.038
Token a
Token ID257
Feature activation+3.41e-02

Pos logit contributions
lder+0.149
grand+0.148
keeping+0.146
 client+0.145
 Painter+0.141
Neg logit contributions
 fresh-0.032
 spring-0.032
 sun-0.031
 shallow-0.028
 all-0.027
Token great
Token ID1049
Feature activation+1.10e-01

Pos logit contributions
keeping+0.086
grand+0.079
rums+0.077
 policy+0.074
 property+0.074
Neg logit contributions
 fresh-0.033
 sun-0.032
 shallow-0.032
 spring-0.030
 higher-0.030
Token job
Token ID1693
Feature activation+1.22e-01

Pos logit contributions
keeping+0.119
 property+0.107
rums+0.101
 purposes+0.101
 privacy+0.098
Neg logit contributions
 sun-0.239
 spring-0.224
ores-0.222
ch-0.221
 fresh-0.221
Token,
Token ID11
Feature activation+4.28e-03

Pos logit contributions
lder+0.233
 client+0.232
 property+0.227
grand+0.226
 depends+0.220
Neg logit contributions
 crashing-0.143
 sun-0.141
 shallow-0.135
ores-0.133
 fresh-0.129
Token Lily
Token ID20037
Feature activation+3.96e-02

Pos logit contributions
 means+0.008
 property+0.008
 client+0.008
 fault+0.008
 policy+0.008
Neg logit contributions
 shallow-0.005
Ho-0.005
wered-0.004
 crashing-0.004
ores-0.004
Token!
Token ID0
Feature activation+3.28e-02

Pos logit contributions
 client+0.100
 endings+0.097
lder+0.095
 property+0.094
 meanings+0.094
Neg logit contributions
 sun-0.023
 fresh-0.020
ores-0.018
 crashing-0.018
 all-0.017
Token Now
Token ID2735
Feature activation+2.30e-02

Pos logit contributions
 means+0.040
lder+0.040
 client+0.040
 property+0.039
grand+0.038
Neg logit contributions
 shallow-0.061
Ho-0.058
 crashing-0.058
wered-0.058
 sun-0.056
Token providing
Token ID4955
Feature activation+7.76e-03

Pos logit contributions
lder+0.071
 client+0.070
keeping+0.070
 property+0.069
 means+0.068
Neg logit contributions
 shallow-0.065
Ho-0.062
add-0.059
 camps-0.059
ores-0.058
Token her
Token ID607
Feature activation+3.90e-02

Pos logit contributions
lder+0.016
 property+0.016
 client+0.015
 means+0.015
 fault+0.015
Neg logit contributions
 shallow-0.008
 crashing-0.008
Ho-0.008
wered-0.007
 sun-0.007
Token with
Token ID351
Feature activation-3.45e-02

Pos logit contributions
lder+0.089
rums+0.086
 client+0.085
 means+0.085
 policy+0.084
Neg logit contributions
 shallow-0.033
 sun-0.032
 fresh-0.032
 crashing-0.029
 sunscreen-0.028
Token such
Token ID884
Feature activation+5.96e-02

Pos logit contributions
 shallow+0.041
Ho+0.040
 crashing+0.039
add+0.038
wered+0.038
Neg logit contributions
 property-0.064
 client-0.063
 means-0.063
lder-0.062
 fault-0.062
Token an
Token ID281
Feature activation+5.55e-02

Pos logit contributions
lder+0.126
 client+0.124
 property+0.124
 means+0.123
grand+0.122
Neg logit contributions
 shallow-0.059
 sun-0.053
 crashing-0.053
Ho-0.053
wered-0.052
Token awesome
Token ID7427
Feature activation+9.62e-02

Pos logit contributions
lder+0.084
 client+0.084
 property+0.084
 means+0.083
 fault+0.081
Neg logit contributions
 shallow-0.086
 crashing-0.081
Ho-0.081
 sun-0.081
wered-0.081
Token gift
Token ID6979
Feature activation+7.65e-02

Pos logit contributions
 property+0.123
 client+0.122
lder+0.122
 means+0.122
grand+0.117
Neg logit contributions
 shallow-0.170
 sun-0.167
 crashing-0.164
Ho-0.161
wered-0.159
Token.
Token ID13
Feature activation+1.96e-02

Pos logit contributions
lder+0.149
 property+0.144
 client+0.143
 means+0.141
rums+0.139
Neg logit contributions
 shallow-0.086
 sun-0.086
 crashing-0.085
ores-0.080
 fresh-0.078
Token<|endoftext|>
Token ID50256
Feature activation+6.95e-02

Pos logit contributions
 means+0.036
lder+0.035
 client+0.035
grand+0.035
keeping+0.034
Neg logit contributions
 shallow-0.026
 crashing-0.024
wered-0.024
add-0.023
Ho-0.023
TokenM
Token ID44
Feature activation+5.45e-02

Pos logit contributions
 means+0.130
lder+0.129
 client+0.126
keeping+0.125
 property+0.125
Neg logit contributions
 shallow-0.084
 crashing-0.079
wered-0.078
Ho-0.076
ores-0.075
Tokenummy
Token ID13513
Feature activation+6.03e-02

Pos logit contributions
 property+0.090
 means+0.087
 client+0.086
lder+0.086
 fault+0.085
Neg logit contributions
 shallow-0.079
 sun-0.077
Ho-0.076
add-0.075
wered-0.075
Token He
Token ID679
Feature activation-4.17e-02

Pos logit contributions
lder+0.028
 means+0.028
 client+0.028
 property+0.027
grand+0.027
Neg logit contributions
 shallow-0.018
Ho-0.017
wered-0.016
add-0.016
 crashing-0.016
Token had
Token ID550
Feature activation-1.51e-03

Pos logit contributions
 shallow+0.070
wered+0.068
 crashing+0.068
Ho+0.067
ores+0.066
Neg logit contributions
lder-0.058
 means-0.057
 client-0.057
 property-0.056
grand-0.055
Token such
Token ID884
Feature activation+5.69e-02

Pos logit contributions
 shallow+0.002
Ho+0.002
 crashing+0.002
wered+0.002
ores+0.002
Neg logit contributions
lder-0.003
 property-0.003
 client-0.003
 means-0.003
 fault-0.003
Token a
Token ID257
Feature activation+2.55e-02

Pos logit contributions
 property+0.111
 client+0.111
lder+0.110
 means+0.110
 fault+0.108
Neg logit contributions
 shallow-0.063
 crashing-0.059
Ho-0.059
wered-0.057
ores-0.056
Token fun
Token ID1257
Feature activation+8.14e-02

Pos logit contributions
lder+0.039
 client+0.039
 property+0.038
 fault+0.038
 policy+0.036
Neg logit contributions
wered-0.038
 shallow-0.038
Ho-0.038
 crashing-0.037
add-0.036
Token day
Token ID1110
Feature activation+8.95e-02

Pos logit contributions
lder+0.125
 property+0.124
 client+0.122
 means+0.121
 fault+0.120
Neg logit contributions
 shallow-0.126
Ho-0.123
wered-0.121
 crashing-0.120
add-0.116
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
 property+0.143
keeping+0.142
rd+0.141
grand+0.140
lder+0.139
Neg logit contributions
wered-0.131
 shallow-0.127
Ho-0.125
add-0.121
Tweet-0.115
Token learning
Token ID4673
Feature activation+7.74e-03

Pos logit contributions
wered+0.019
add+0.019
Ho+0.017
arl+0.017
 shallow+0.017
Neg logit contributions
 fault-0.020
 property-0.020
grand-0.020
 policy-0.019
 worry-0.019
Token about
Token ID546
Feature activation+2.73e-04

Pos logit contributions
 property+0.014
 means+0.014
 client+0.014
lder+0.013
 fault+0.013
Neg logit contributions
 shallow-0.010
Ho-0.010
wered-0.009
 crashing-0.009
add-0.009
Token l
Token ID300
Feature activation-6.97e-02

Pos logit contributions
 property+0.001
 means+0.001
 client+0.001
 fault+0.001
lder+0.000
Neg logit contributions
 shallow-0.000
Ho-0.000
add-0.000
wered-0.000
 crashing-0.000
Tokenizards
Token ID14124
Feature activation-3.98e-02

Pos logit contributions
Ho+0.114
 sun+0.110
 sunscreen+0.109
wered+0.108
 crashing+0.107
Neg logit contributions
 fault-0.092
 property-0.091
 client-0.091
keeping-0.087
rd-0.087
Token Billy
Token ID15890
Feature activation+3.45e-02

Pos logit contributions
lder+0.040
 client+0.040
 means+0.040
 property+0.039
grand+0.039
Neg logit contributions
 shallow-0.025
Ho-0.024
 crashing-0.023
wered-0.023
ores-0.022
Token.
Token ID13
Feature activation+1.41e-02

Pos logit contributions
 property+0.071
lder+0.069
 client+0.068
 policy+0.068
 endings+0.068
Neg logit contributions
 crashing-0.037
 shallow-0.036
 sun-0.035
Ho-0.033
 farther-0.032
Token "
Token ID366
Feature activation+9.66e-03

Pos logit contributions
lder+0.027
 client+0.027
 property+0.027
 means+0.027
 fault+0.026
Neg logit contributions
 shallow-0.016
Ho-0.015
 crashing-0.015
wered-0.015
ores-0.015
TokenAre
Token ID8491
Feature activation+1.15e-02

Pos logit contributions
lder+0.017
 property+0.016
grand+0.016
 client+0.016
 means+0.016
Neg logit contributions
 shallow-0.013
add-0.012
 crashing-0.012
 camps-0.012
ores-0.012
Token you
Token ID345
Feature activation+4.88e-02

Pos logit contributions
lder+0.014
 property+0.014
 client+0.014
 means+0.014
 fault+0.014
Neg logit contributions
 shallow-0.021
 crashing-0.020
Ho-0.020
wered-0.020
 sun-0.019
Token okay
Token ID8788
Feature activation+9.52e-02

Pos logit contributions
lder+0.085
 property+0.085
 client+0.084
 means+0.083
grand+0.082
Neg logit contributions
 shallow-0.066
 crashing-0.062
Ho-0.062
wered-0.061
 sun-0.061
Token?"
Token ID1701
Feature activation+2.88e-02

Pos logit contributions
keeping+0.145
 property+0.143
grand+0.141
 client+0.138
lder+0.136
Neg logit contributions
 sun-0.150
 crashing-0.148
 shallow-0.148
ores-0.143
 spring-0.138
Token\n
Token ID198
Feature activation-6.05e-02

Pos logit contributions
lder+0.060
 means+0.060
 client+0.060
 property+0.059
grand+0.059
Neg logit contributions
 shallow-0.029
 crashing-0.027
Ho-0.027
wered-0.026
ores-0.025
Token\n
Token ID198
Feature activation-6.27e-02

Pos logit contributions
 shallow+0.081
wered+0.080
 crashing+0.079
 sunscreen+0.075
 drifting+0.074
Neg logit contributions
grand-0.104
lder-0.103
 means-0.100
rums-0.100
 policy-0.100
Token"
Token ID1
Feature activation+2.96e-02

Pos logit contributions
 shallow+0.078
 crashing+0.075
wered+0.075
ores+0.072
Ho+0.072
Neg logit contributions
lder-0.116
 property-0.115
grand-0.112
 client-0.112
 means-0.112
TokenI
Token ID40
Feature activation-1.38e-03

Pos logit contributions
lder+0.048
 client+0.048
 means+0.048
grand+0.047
 property+0.047
Neg logit contributions
 shallow-0.042
 crashing-0.040
Ho-0.040
wered-0.039
ores-0.039
Token<|endoftext|>
Token ID50256
Feature activation+6.85e-02

Pos logit contributions
 property+0.067
lder+0.065
 fault+0.064
 client+0.063
 means+0.063
Neg logit contributions
 sun-0.019
 shallow-0.018
Ho-0.018
 crashing-0.017
ores-0.016
TokenS
Token ID50
Feature activation+2.21e-02

Pos logit contributions
 means+0.128
lder+0.128
 property+0.127
 client+0.126
grand+0.123
Neg logit contributions
 shallow-0.081
 crashing-0.077
Ho-0.076
wered-0.075
ores-0.073
Tokenara
Token ID3301
Feature activation+2.16e-02

Pos logit contributions
lder+0.032
 client+0.030
 property+0.030
 means+0.030
grand+0.030
Neg logit contributions
 shallow-0.038
 crashing-0.037
Ho-0.035
ores-0.035
wered-0.034
Token was
Token ID373
Feature activation+1.70e-02

Pos logit contributions
 client+0.051
 property+0.050
 fault+0.050
 decisions+0.049
lder+0.049
Neg logit contributions
 shallow-0.015
 sun-0.014
Ho-0.014
ores-0.013
hes-0.013
Token a
Token ID257
Feature activation+4.23e-02

Pos logit contributions
lder+0.024
 fault+0.024
 client+0.024
 means+0.023
rd+0.022
Neg logit contributions
Ho-0.029
 shallow-0.026
 crashing-0.025
wered-0.025
Tweet-0.025
Token stubborn
Token ID24444
Feature activation+1.16e-01

Pos logit contributions
lder+0.076
 client+0.075
 property+0.071
 means+0.071
grand+0.069
Neg logit contributions
 shallow-0.056
 crashing-0.053
wered-0.053
Ho-0.053
ores-0.052
Token girl
Token ID2576
Feature activation+9.05e-02

Pos logit contributions
lder+0.154
 property+0.154
 client+0.152
 means+0.150
grand+0.147
Neg logit contributions
 shallow-0.201
 crashing-0.192
Ho-0.192
 sun-0.190
wered-0.189
Token.
Token ID13
Feature activation+5.31e-02

Pos logit contributions
lder+0.183
 fault+0.182
 property+0.181
 client+0.178
keeping+0.177
Neg logit contributions
 sun-0.092
 shallow-0.091
 crashing-0.088
Ho-0.085
ores-0.082
Token She
Token ID1375
Feature activation-2.81e-02

Pos logit contributions
lder+0.087
 means+0.086
 client+0.083
grand+0.082
 property+0.081
Neg logit contributions
 shallow-0.079
 crashing-0.075
Ho-0.074
wered-0.070
add-0.070
Token did
Token ID750
Feature activation+5.18e-02

Pos logit contributions
 shallow+0.038
 sun+0.036
Ho+0.035
wered+0.035
hes+0.035
Neg logit contributions
keeping-0.048
 property-0.047
 policy-0.047
 fault-0.047
 client-0.047
Token not
Token ID407
Feature activation+1.20e-02

Pos logit contributions
lder+0.086
 property+0.086
 client+0.086
 means+0.085
grand+0.083
Neg logit contributions
 shallow-0.073
Ho-0.070
 crashing-0.069
wered-0.067
 sun-0.067
Token it
Token ID340
Feature activation+1.14e-04

Pos logit contributions
 property+0.021
 means+0.021
lder+0.021
 client+0.021
 fault+0.020
Neg logit contributions
 shallow-0.024
Ho-0.023
wered-0.022
 crashing-0.022
add-0.022
Token's
Token ID338
Feature activation+3.42e-02

Pos logit contributions
keeping+0.000
 client+0.000
grand+0.000
 policy+0.000
 property+0.000
Neg logit contributions
 spring-0.000
 sun-0.000
hes-0.000
 shallow-0.000
ch-0.000
Token much
Token ID881
Feature activation+2.44e-02

Pos logit contributions
 client+0.072
lder+0.071
grand+0.071
keeping+0.069
 property+0.068
Neg logit contributions
 shallow-0.038
 sun-0.036
 fresh-0.035
ores-0.035
 crashing-0.033
Token better
Token ID1365
Feature activation+3.77e-02

Pos logit contributions
 client+0.054
keeping+0.054
lder+0.053
grand+0.052
 property+0.051
Neg logit contributions
 shallow-0.026
 fresh-0.025
 sun-0.025
ores-0.022
 higher-0.021
Token now
Token ID783
Feature activation+8.21e-02

Pos logit contributions
lder+0.077
 client+0.074
 Painter+0.074
grand+0.073
rums+0.073
Neg logit contributions
 sun-0.041
 fresh-0.038
ores-0.038
 spring-0.037
 until-0.036
Token."
Token ID526
Feature activation+3.86e-02

Pos logit contributions
 client+0.157
lder+0.151
grand+0.148
 property+0.147
roc+0.146
Neg logit contributions
 sun-0.097
 fresh-0.092
ores-0.091
 crashing-0.091
 shallow-0.089
Token
Token ID220
Feature activation+1.05e-02

Pos logit contributions
 means+0.065
grand+0.063
lder+0.063
keeping+0.063
 client+0.061
Neg logit contributions
 shallow-0.055
wered-0.055
Ho-0.051
 crashing-0.051
add-0.050
Token\n
Token ID198
Feature activation-4.86e-02

Pos logit contributions
 means+0.020
 fault+0.020
lder+0.020
 client+0.020
grand+0.020
Neg logit contributions
 shallow-0.012
Ho-0.011
wered-0.011
 crashing-0.011
 sun-0.010
Token\n
Token ID198
Feature activation-5.22e-02

Pos logit contributions
 shallow+0.062
 crashing+0.059
wered+0.059
ores+0.055
 drifting+0.055
Neg logit contributions
lder-0.088
grand-0.087
 means-0.083
keeping-0.082
 client-0.082
TokenFrom
Token ID4863
Feature activation-4.36e-02

Pos logit contributions
 shallow+0.056
 crashing+0.053
wered+0.053
 sun+0.052
ores+0.051
Neg logit contributions
lder-0.104
grand-0.103
 client-0.098
 property-0.098
rd-0.097
Token that
Token ID326
Feature activation+3.14e-02

Pos logit contributions
 shallow+0.059
 crashing+0.059
Ho+0.058
wered+0.056
add+0.056
Neg logit contributions
 means-0.074
 property-0.074
lder-0.073
 client-0.070
grand-0.070
Token with
Token ID351
Feature activation-3.69e-02

Pos logit contributions
 shallow+0.007
 crashing+0.007
Ho+0.007
 sun+0.007
wered+0.007
Neg logit contributions
lder-0.010
 property-0.010
 client-0.010
 means-0.010
 fault-0.010
Token her
Token ID607
Feature activation+2.12e-02

Pos logit contributions
 shallow+0.046
Ho+0.044
 crashing+0.044
wered+0.041
add+0.041
Neg logit contributions
 client-0.068
lder-0.067
 property-0.067
 means-0.067
grand-0.066
Token mom
Token ID1995
Feature activation+7.06e-02

Pos logit contributions
lder+0.041
 property+0.040
 means+0.040
rums+0.039
keeping+0.039
Neg logit contributions
 sun-0.027
 shallow-0.025
 fresh-0.024
 sunscreen-0.023
 crashing-0.023
Token to
Token ID284
Feature activation+6.65e-02

Pos logit contributions
lder+0.130
 client+0.129
 property+0.129
 means+0.126
 fault+0.125
Neg logit contributions
 shallow-0.087
 crashing-0.082
 sun-0.082
Ho-0.082
ores-0.079
Token catch
Token ID4929
Feature activation-2.75e-02

Pos logit contributions
 worry+0.084
 client+0.077
 prove+0.076
 wish+0.076
grand+0.075
Neg logit contributions
 shallow-0.122
inn-0.119
Ho-0.118
add-0.117
avers-0.113
Token y
Token ID331
Feature activation+5.38e-02

Pos logit contributions
 shallow+0.028
 crashing+0.026
Ho+0.026
wered+0.025
 sun+0.025
Neg logit contributions
lder-0.056
 property-0.056
 client-0.056
 means-0.055
grand-0.054
Tokenummy
Token ID13513
Feature activation+7.76e-02

Pos logit contributions
lder+0.064
 property+0.062
 client+0.061
 policy+0.060
 means+0.059
Neg logit contributions
 shallow-0.102
 crashing-0.098
Ho-0.098
wered-0.096
hes-0.095
Token treats
Token ID18432
Feature activation+9.47e-03

Pos logit contributions
 client+0.160
lder+0.158
 fault+0.157
 property+0.157
grand+0.155
Neg logit contributions
 sun-0.084
 shallow-0.081
 fresh-0.075
 sunscreen-0.074
ch-0.073
Token.
Token ID13
Feature activation+3.82e-02

Pos logit contributions
lder+0.017
 client+0.017
 property+0.017
 means+0.016
 fault+0.016
Neg logit contributions
 shallow-0.012
 crashing-0.012
 sun-0.011
ores-0.011
Ho-0.011
Token
Token ID220
Feature activation+2.23e-02

Pos logit contributions
 property+0.080
lder+0.080
 client+0.080
 means+0.079
 fault+0.077
Neg logit contributions
 shallow-0.037
Ho-0.034
 crashing-0.034
 sun-0.034
wered-0.033
Token\n
Token ID198
Feature activation-3.74e-02

Pos logit contributions
grand+0.040
 means+0.040
lder+0.040
 fault+0.039
 client+0.039
Neg logit contributions
 shallow-0.029
 crashing-0.028
Ho-0.027
wered-0.026
 sun-0.025
Token but
Token ID475
Feature activation+8.53e-03

Pos logit contributions
 shallow+0.010
add+0.010
Ho+0.010
wered+0.010
ores+0.010
Neg logit contributions
 property-0.015
 means-0.015
 client-0.014
 fault-0.014
lder-0.014
Token now
Token ID783
Feature activation+6.96e-02

Pos logit contributions
lder+0.018
 property+0.018
 client+0.018
 fault+0.018
grand+0.017
Neg logit contributions
 shallow-0.008
 sun-0.008
 crashing-0.008
wered-0.008
Ho-0.007
Token she
Token ID673
Feature activation-1.20e-02

Pos logit contributions
lder+0.160
 property+0.158
 client+0.157
 fault+0.156
grand+0.155
Neg logit contributions
 sun-0.057
 shallow-0.054
 spring-0.049
 fresh-0.049
 crashing-0.048
Token was
Token ID373
Feature activation-5.81e-03

Pos logit contributions
 shallow+0.018
 sun+0.017
 crashing+0.017
Ho+0.017
add+0.016
Neg logit contributions
 property-0.019
 client-0.018
lder-0.018
keeping-0.018
 policy-0.018
Token no
Token ID645
Feature activation+5.01e-02

Pos logit contributions
Ho+0.008
 shallow+0.008
wered+0.007
 crashing+0.007
 camps+0.007
Neg logit contributions
 client-0.010
lder-0.010
 property-0.010
 means-0.010
 fault-0.010
Token longer
Token ID2392
Feature activation+3.69e-02

Pos logit contributions
lder+0.126
 property+0.125
 client+0.123
grand+0.122
 means+0.122
Neg logit contributions
 shallow-0.031
 crashing-0.027
 sun-0.026
Ho-0.026
wered-0.025
Token scared
Token ID12008
Feature activation+7.76e-02

Pos logit contributions
lder+0.071
grand+0.069
 client+0.068
 property+0.067
 policy+0.067
Neg logit contributions
 shallow-0.046
 crashing-0.043
 sun-0.042
 fresh-0.040
ores-0.039
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
lder+0.146
 property+0.143
 client+0.143
 means+0.141
grand+0.140
Neg logit contributions
 shallow-0.094
 crashing-0.090
 sun-0.087
Ho-0.087
wered-0.085
Token In
Token ID554
Feature activation-3.40e-02

Pos logit contributions
lder+0.042
 property+0.042
 client+0.042
 means+0.042
 fault+0.041
Neg logit contributions
 shallow-0.023
Ho-0.022
 crashing-0.022
 sun-0.021
wered-0.021
Token fact
Token ID1109
Feature activation+7.19e-02

Pos logit contributions
 shallow+0.035
Ho+0.033
 crashing+0.033
ores+0.031
wered+0.031
Neg logit contributions
 client-0.069
lder-0.069
 property-0.069
 means-0.069
 fault-0.068
Token,
Token ID11
Feature activation-5.63e-03

Pos logit contributions
keeping+0.129
 property+0.128
 client+0.126
lder+0.123
grand+0.122
Neg logit contributions
 shallow-0.093
add-0.092
 sun-0.090
Ho-0.088
 fresh-0.087
Token proud
Token ID6613
Feature activation+5.94e-02

Pos logit contributions
 shallow+0.118
Ho+0.111
 crashing+0.110
wered+0.109
ores+0.106
Neg logit contributions
lder-0.187
 property-0.185
 client-0.185
 means-0.184
grand-0.180
Token of
Token ID286
Feature activation-3.78e-02

Pos logit contributions
 property+0.114
 client+0.114
lder+0.114
 means+0.114
 policy+0.111
Neg logit contributions
 shallow-0.068
Ho-0.064
wered-0.063
 crashing-0.063
ores-0.061
Token himself
Token ID2241
Feature activation+2.59e-04

Pos logit contributions
 shallow+0.041
Ho+0.038
 crashing+0.037
wered+0.036
 sun+0.036
Neg logit contributions
 property-0.076
 client-0.075
 means-0.075
lder-0.074
 policy-0.074
Token.
Token ID13
Feature activation+1.68e-02

Pos logit contributions
lder+0.001
 property+0.000
 client+0.000
grand+0.000
 means+0.000
Neg logit contributions
 shallow-0.000
 crashing-0.000
 sun-0.000
ores-0.000
Ho-0.000
Token He
Token ID679
Feature activation-3.90e-02

Pos logit contributions
 means+0.031
 client+0.030
lder+0.029
grand+0.029
 fault+0.028
Neg logit contributions
 shallow-0.022
wered-0.021
 crashing-0.021
ores-0.020
 sunscreen-0.020
Token realized
Token ID6939
Feature activation+2.63e-02

Pos logit contributions
 shallow+0.064
wered+0.062
ores+0.061
Ho+0.061
 crashing+0.061
Neg logit contributions
 means-0.055
lder-0.055
 client-0.054
 property-0.053
 depends-0.052
Token that
Token ID326
Feature activation+3.62e-02

Pos logit contributions
 means+0.042
 client+0.042
 property+0.042
lder+0.041
grand+0.040
Neg logit contributions
 shallow-0.039
Ho-0.037
 crashing-0.037
wered-0.037
add-0.036
Token even
Token ID772
Feature activation-2.85e-02

Pos logit contributions
 property+0.069
 client+0.069
lder+0.069
 means+0.069
grand+0.067
Neg logit contributions
 shallow-0.042
Ho-0.039
 crashing-0.039
wered-0.038
ores-0.038
Token though
Token ID996
Feature activation+2.51e-02

Pos logit contributions
 shallow+0.037
 crashing+0.035
Ho+0.034
wered+0.034
ores+0.033
Neg logit contributions
 property-0.051
 client-0.051
 means-0.050
lder-0.050
grand-0.049
Token he
Token ID339
Feature activation-9.29e-03

Pos logit contributions
lder+0.054
 property+0.053
 client+0.053
 means+0.052
grand+0.052
Neg logit contributions
 shallow-0.024
 crashing-0.022
Ho-0.022
 sun-0.021
wered-0.021
Token was
Token ID373
Feature activation-3.25e-03

Pos logit contributions
 shallow+0.015
 sun+0.014
Ho+0.013
 crashing+0.013
 spring+0.013
Neg logit contributions
keeping-0.014
lder-0.014
 property-0.013
 client-0.013
 policy-0.013
Token "
Token ID366
Feature activation+1.49e-02

Pos logit contributions
 shallow+0.019
add+0.018
wered+0.018
Ho+0.018
inn+0.017
Neg logit contributions
 means-0.016
lder-0.016
 property-0.014
 client-0.014
 depends-0.014
TokenI
Token ID40
Feature activation-9.71e-05

Pos logit contributions
 means+0.026
grand+0.026
lder+0.026
 client+0.025
 property+0.025
Neg logit contributions
 shallow-0.021
ores-0.019
wered-0.018
 sun-0.018
 crashing-0.018
Token am
Token ID716
Feature activation+3.06e-02

Pos logit contributions
 spring+0.000
ores+0.000
 shallow+0.000
ch+0.000
 crashing+0.000
Neg logit contributions
keeping-0.000
 property-0.000
rums-0.000
lder-0.000
 client-0.000
Token here
Token ID994
Feature activation+6.46e-02

Pos logit contributions
lder+0.061
 property+0.060
 client+0.060
 means+0.060
 fault+0.059
Neg logit contributions
 shallow-0.033
Ho-0.031
 crashing-0.031
wered-0.030
 sun-0.030
Token".
Token ID1911
Feature activation+4.51e-02

Pos logit contributions
lder+0.121
 client+0.120
rums+0.118
keeping+0.117
 property+0.116
Neg logit contributions
 crashing-0.078
 sun-0.078
 shallow-0.077
 fresh-0.074
ores-0.073
Token
Token ID220
Feature activation+1.33e-02

Pos logit contributions
lder+0.084
 means+0.083
 client+0.083
 property+0.083
grand+0.081
Neg logit contributions
 shallow-0.055
 crashing-0.051
Ho-0.051
wered-0.050
ores-0.049
Token\n
Token ID198
Feature activation-5.34e-02

Pos logit contributions
 means+0.026
 fault+0.025
 client+0.025
lder+0.025
 property+0.025
Neg logit contributions
 shallow-0.015
 crashing-0.014
Ho-0.014
wered-0.014
 sun-0.013
Token\n
Token ID198
Feature activation-5.60e-02

Pos logit contributions
 shallow+0.062
 crashing+0.058
ores+0.055
wered+0.055
Ho+0.054
Neg logit contributions
lder-0.104
 means-0.102
grand-0.100
 client-0.100
 property-0.098
TokenThe
Token ID464
Feature activation-4.75e-02

Pos logit contributions
 shallow+0.054
ores+0.053
add+0.051
 crashing+0.050
 sun+0.049
Neg logit contributions
lder-0.120
 property-0.114
 client-0.114
 means-0.114
grand-0.113
Token man
Token ID582
Feature activation+1.63e-02

Pos logit contributions
 shallow+0.077
 crashing+0.074
Ho+0.073
ores+0.072
wered+0.072
Neg logit contributions
lder-0.070
 property-0.070
 client-0.070
 means-0.069
grand-0.066
Token was
Token ID373
Feature activation-1.25e-02

Pos logit contributions
 fault+0.027
 property+0.027
 client+0.027
 policy+0.027
lder+0.026
Neg logit contributions
 shallow-0.023
Ho-0.022
 crashing-0.022
 sun-0.021
wered-0.021
Token thank
Token ID5875
Feature activation+7.38e-03

Pos logit contributions
 means+0.009
 client+0.009
lder+0.008
 property+0.008
grand+0.008
Neg logit contributions
 shallow-0.010
Ho-0.009
wered-0.009
ores-0.009
 crashing-0.009
Token you
Token ID345
Feature activation+4.20e-02

Pos logit contributions
lder+0.009
 property+0.009
 client+0.009
 means+0.009
grand+0.008
Neg logit contributions
 shallow-0.014
 crashing-0.013
Ho-0.013
wered-0.013
ores-0.013
Token for
Token ID329
Feature activation+3.78e-02

Pos logit contributions
lder+0.082
 client+0.082
grand+0.080
 means+0.079
 property+0.079
Neg logit contributions
 sun-0.048
 shallow-0.048
 crashing-0.048
ores-0.044
 fresh-0.043
Token finding
Token ID4917
Feature activation-9.25e-03

Pos logit contributions
 property+0.063
lder+0.063
 client+0.061
keeping+0.060
 means+0.060
Neg logit contributions
 shallow-0.053
Ho-0.052
add-0.049
 crashing-0.049
ores-0.048
Token my
Token ID616
Feature activation+4.64e-02

Pos logit contributions
 shallow+0.008
 crashing+0.007
Ho+0.007
wered+0.007
ores+0.007
Neg logit contributions
 property-0.021
 means-0.020
 client-0.020
lder-0.020
 fault-0.020
Token sack
Token ID23704
Feature activation+6.14e-04

Pos logit contributions
lder+0.084
 means+0.080
rums+0.079
 client+0.077
grand+0.077
Neg logit contributions
 sun-0.062
 shallow-0.059
 fresh-0.058
 sunscreen-0.057
 crashing-0.054
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
lder+0.001
 client+0.001
 property+0.001
 means+0.001
rums+0.001
Neg logit contributions
 shallow-0.001
 sun-0.001
 crashing-0.001
ores-0.001
 fresh-0.001
Token I
Token ID314
Feature activation+5.95e-03

Pos logit contributions
 means+0.015
 client+0.015
grand+0.015
lder+0.014
 property+0.014
Neg logit contributions
 shallow-0.025
 crashing-0.024
wered-0.023
add-0.023
Ho-0.023
Token left
Token ID1364
Feature activation-2.24e-02

Pos logit contributions
 client+0.009
lder+0.009
 property+0.008
keeping+0.008
rums+0.008
Neg logit contributions
 shallow-0.010
ores-0.009
Ho-0.009
add-0.009
 sun-0.009
Token it
Token ID340
Feature activation+2.36e-03

Pos logit contributions
 shallow+0.034
Ho+0.034
add+0.033
wered+0.033
 crashing+0.032
Neg logit contributions
 property-0.034
 means-0.034
keeping-0.034
 client-0.033
 policy-0.033
Token in
Token ID287
Feature activation-7.25e-03

Pos logit contributions
 property+0.004
lder+0.004
 means+0.004
 client+0.004
keeping+0.004
Neg logit contributions
wered-0.003
Ho-0.003
 shallow-0.003
 crashing-0.003
add-0.003
Tokenuddled
Token ID32745
Feature activation-9.47e-02

Pos logit contributions
wered+0.174
 shallow+0.169
 crashing+0.162
 south+0.157
Ho+0.156
Neg logit contributions
lder-0.122
 property-0.117
 fault-0.117
 endings-0.114
idents-0.111
Token on
Token ID319
Feature activation+3.53e-03

Pos logit contributions
 shallow+0.116
wered+0.110
Ho+0.109
add+0.106
 crashing+0.105
Neg logit contributions
 property-0.177
keeping-0.173
 means-0.173
lder-0.172
 fault-0.168
Token the
Token ID262
Feature activation-7.57e-03

Pos logit contributions
 property+0.006
 fault+0.006
 means+0.006
lder+0.006
keeping+0.006
Neg logit contributions
Ho-0.004
 shallow-0.004
wered-0.004
add-0.004
hes-0.004
Token shore
Token ID15191
Feature activation-5.32e-02

Pos logit contributions
ores+0.009
add+0.009
wered+0.009
hes+0.009
Ho+0.009
Neg logit contributions
 property-0.014
lder-0.013
keeping-0.013
 client-0.013
 fault-0.013
Token.
Token ID13
Feature activation+9.68e-03

Pos logit contributions
 shallow+0.075
Ho+0.073
wered+0.072
 crashing+0.070
add+0.069
Neg logit contributions
 property-0.088
lder-0.086
 means-0.086
 client-0.085
 fault-0.084
Token O
Token ID440
Feature activation-2.62e-02

Pos logit contributions
 fault+0.013
 means+0.013
 depends+0.011
keeping+0.011
 wishes+0.011
Neg logit contributions
 farther-0.017
wered-0.017
eh-0.016
Ho-0.016
 higher-0.016
Tokenll
Token ID297
Feature activation+1.66e-02

Pos logit contributions
 shallow+0.047
 crashing+0.045
Ho+0.044
 sun+0.044
wered+0.044
Neg logit contributions
lder-0.035
 property-0.034
 means-0.033
 client-0.033
 fault-0.033
Tokenie
Token ID494
Feature activation+2.41e-05

Pos logit contributions
 property+0.022
lder+0.022
 means+0.022
 client+0.022
 fault+0.021
Neg logit contributions
 shallow-0.028
Ho-0.027
wered-0.027
ores-0.027
 crashing-0.027
Token felt
Token ID2936
Feature activation-5.41e-02

Pos logit contributions
 property+0.000
 client+0.000
 policy+0.000
lder+0.000
grand+0.000
Neg logit contributions
 shallow-0.000
 sun-0.000
 crashing-0.000
Ho-0.000
ores-0.000
Token warm
Token ID5814
Feature activation-7.79e-03

Pos logit contributions
Ho+0.079
add+0.078
wered+0.076
 camps+0.073
 crashing+0.073
Neg logit contributions
 means-0.081
 property-0.081
 wishes-0.080
 fault-0.078
 decisions-0.078
Token and
Token ID290
Feature activation-1.16e-01

Pos logit contributions
wered+0.010
 shallow+0.010
Ho+0.010
add+0.010
ores+0.009
Neg logit contributions
keeping-0.014
 property-0.014
 means-0.014
lder-0.013
 fault-0.013
Token and
Token ID290
Feature activation-1.05e-01

Pos logit contributions
lder+0.011
 means+0.011
 property+0.011
 client+0.011
grand+0.011
Neg logit contributions
 shallow-0.006
Ho-0.006
 crashing-0.006
wered-0.006
add-0.005
Token did
Token ID750
Feature activation+4.59e-02

Pos logit contributions
 shallow+0.141
Ho+0.133
 crashing+0.133
wered+0.131
 sun+0.129
Neg logit contributions
lder-0.182
 property-0.181
 client-0.180
 means-0.180
grand-0.175
Token all
Token ID477
Feature activation-1.75e-02

Pos logit contributions
 property+0.085
lder+0.085
 client+0.084
 means+0.084
 fault+0.082
Neg logit contributions
 shallow-0.056
Ho-0.053
 crashing-0.052
wered-0.051
 sun-0.050
Token of
Token ID286
Feature activation-4.30e-02

Pos logit contributions
 shallow+0.018
 crashing+0.016
 sun+0.016
wered+0.016
ores+0.016
Neg logit contributions
lder-0.037
 client-0.037
 property-0.037
 fault-0.036
 means-0.035
Token his
Token ID465
Feature activation+8.42e-03

Pos logit contributions
 shallow+0.044
Ho+0.043
ores+0.042
add+0.041
 crashing+0.041
Neg logit contributions
 property-0.086
 means-0.085
 fault-0.084
 client-0.084
lder-0.083
Token work
Token ID670
Feature activation-2.15e-02

Pos logit contributions
 client+0.014
grand+0.014
 fault+0.014
keeping+0.014
lder+0.014
Neg logit contributions
 sun-0.012
 fresh-0.011
 shallow-0.011
 sunscreen-0.011
Ho-0.011
Token with
Token ID351
Feature activation-4.94e-02

Pos logit contributions
 shallow+0.027
 crashing+0.026
 sun+0.025
ores+0.025
Ho+0.024
Neg logit contributions
lder-0.040
 client-0.039
 property-0.039
 means-0.038
grand-0.038
Token dedication
Token ID22445
Feature activation+2.37e-02

Pos logit contributions
Ho+0.051
 shallow+0.051
 crashing+0.047
add+0.047
wered+0.046
Neg logit contributions
 property-0.099
 means-0.099
 client-0.099
 fault-0.098
lder-0.096
Token.
Token ID13
Feature activation+2.06e-02

Pos logit contributions
lder+0.043
 client+0.042
 property+0.042
 means+0.041
grand+0.041
Neg logit contributions
 shallow-0.030
 crashing-0.029
 sun-0.028
Ho-0.028
ores-0.027
Token He
Token ID679
Feature activation-3.54e-02

Pos logit contributions
 means+0.038
 client+0.037
 fault+0.037
lder+0.036
 property+0.036
Neg logit contributions
 shallow-0.026
Ho-0.025
 crashing-0.025
wered-0.024
 sunscreen-0.023
Token watered
Token ID48024
Feature activation-7.23e-02

Pos logit contributions
 shallow+0.053
 crashing+0.050
Ho+0.050
 sun+0.050
wered+0.049
Neg logit contributions
 property-0.056
lder-0.055
 client-0.055
 fault-0.054
 policy-0.054
Token.
Token ID13
Feature activation+5.47e-03

Pos logit contributions
lder+0.027
 client+0.026
 fault+0.026
grand+0.026
 property+0.025
Neg logit contributions
 shallow-0.014
 sun-0.013
ores-0.012
Ho-0.012
 fresh-0.012
Token They
Token ID1119
Feature activation-5.91e-02

Pos logit contributions
 means+0.010
 client+0.010
 fault+0.010
 property+0.010
lder+0.009
Neg logit contributions
 shallow-0.007
 crashing-0.007
Ho-0.007
wered-0.007
add-0.006
Token fix
Token ID4259
Feature activation-5.56e-02

Pos logit contributions
 shallow+0.093
 crashing+0.089
Ho+0.089
wered+0.087
 sun+0.086
Neg logit contributions
lder-0.088
 property-0.088
 client-0.088
 means-0.087
 fault-0.084
Token the
Token ID262
Feature activation-1.59e-02

Pos logit contributions
 shallow+0.049
 crashing+0.043
 sun+0.043
Ho+0.042
ores+0.041
Neg logit contributions
lder-0.126
 client-0.123
 property-0.123
grand-0.122
 means-0.120
Token car
Token ID1097
Feature activation-5.82e-02

Pos logit contributions
 shallow+0.021
Ho+0.020
 crashing+0.020
wered+0.019
ores+0.019
Neg logit contributions
lder-0.028
 property-0.028
 client-0.028
 means-0.028
 fault-0.027
Token with
Token ID351
Feature activation-4.20e-02

Pos logit contributions
 shallow+0.058
 crashing+0.054
Ho+0.053
 sun+0.052
ores+0.051
Neg logit contributions
lder-0.122
 client-0.120
 property-0.120
 means-0.118
grand-0.117
Token some
Token ID617
Feature activation-2.43e-03

Pos logit contributions
 shallow+0.047
Ho+0.046
 crashing+0.044
add+0.043
wered+0.043
Neg logit contributions
 property-0.081
 means-0.081
 fault-0.080
 client-0.080
lder-0.077
Token tape
Token ID9154
Feature activation-7.54e-02

Pos logit contributions
 shallow+0.003
Ho+0.003
wered+0.003
 crashing+0.003
ores+0.003
Neg logit contributions
 property-0.004
 means-0.004
 client-0.004
 fault-0.004
lder-0.004
Token,
Token ID11
Feature activation-3.02e-02

Pos logit contributions
 shallow+0.070
 crashing+0.067
 sun+0.066
ores+0.060
 float+0.059
Neg logit contributions
lder-0.167
 client-0.162
 property-0.159
grand-0.158
 privacy-0.153
Token and
Token ID290
Feature activation-1.19e-01

Pos logit contributions
add+0.046
Ho+0.043
 shallow+0.042
inn+0.041
 camps+0.040
Neg logit contributions
 means-0.051
 fault-0.051
 feelings-0.049
 property-0.048
 privacy-0.047
Token take
Token ID1011
Feature activation-1.02e-01

Pos logit contributions
 shallow+0.139
Ho+0.130
 crashing+0.130
wered+0.127
 sun+0.125
Neg logit contributions
lder-0.226
 property-0.225
 client-0.225
 means-0.223
grand-0.218
Token the
Token ID262
Feature activation-5.80e-03

Pos logit contributions
 client+0.089
 means+0.089
lder+0.088
 property+0.088
keeping+0.086
Neg logit contributions
Ho-0.058
 shallow-0.056
 crashing-0.055
add-0.054
ores-0.051
Token store
Token ID3650
Feature activation-1.20e-02

Pos logit contributions
 shallow+0.006
 crashing+0.005
Ho+0.005
wered+0.005
 sun+0.005
Neg logit contributions
lder-0.012
 property-0.012
 client-0.012
 means-0.012
grand-0.012
Token and
Token ID290
Feature activation-1.02e-01

Pos logit contributions
 shallow+0.012
Ho+0.011
 crashing+0.011
wered+0.011
 sun+0.011
Neg logit contributions
lder-0.025
 property-0.025
 client-0.025
 means-0.025
grand-0.024
Token came
Token ID1625
Feature activation-7.25e-02

Pos logit contributions
 crashing+0.152
Ho+0.149
 shallow+0.149
wered+0.147
add+0.142
Neg logit contributions
lder-0.162
 means-0.159
 client-0.156
grand-0.155
 property-0.152
Token back
Token ID736
Feature activation+1.58e-02

Pos logit contributions
wered+0.109
 shallow+0.105
Ho+0.102
ores+0.100
add+0.100
Neg logit contributions
 property-0.112
keeping-0.108
 client-0.106
 policy-0.104
 fault-0.104
Token with
Token ID351
Feature activation-4.12e-02

Pos logit contributions
 means+0.026
 client+0.026
 property+0.026
lder+0.025
 policy+0.025
Neg logit contributions
 shallow-0.023
Ho-0.022
wered-0.022
 crashing-0.021
add-0.021
Token some
Token ID617
Feature activation-1.69e-03

Pos logit contributions
 shallow+0.047
Ho+0.047
 crashing+0.046
add+0.043
wered+0.043
Neg logit contributions
 means-0.078
 client-0.078
 property-0.078
lder-0.077
 fault-0.076
Token fruit
Token ID8234
Feature activation-3.23e-02

Pos logit contributions
 shallow+0.002
Ho+0.002
 crashing+0.002
wered+0.002
ores+0.002
Neg logit contributions
 means-0.003
 property-0.003
lder-0.003
 client-0.003
 policy-0.003
Token and
Token ID290
Feature activation-1.12e-01

Pos logit contributions
 shallow+0.044
 crashing+0.041
Ho+0.041
 sun+0.041
wered+0.040
Neg logit contributions
lder-0.055
 property-0.055
 client-0.055
 means-0.054
grand-0.053
Token crack
Token ID8469
Feature activation-8.77e-02

Pos logit contributions
 shallow+0.129
 crashing+0.119
Ho+0.119
 sun+0.116
wered+0.116
Neg logit contributions
lder-0.218
 property-0.217
 client-0.217
 means-0.214
 fault-0.211
Tokeners
Token ID364
Feature activation-5.96e-02

Pos logit contributions
 shallow+0.115
 crashing+0.112
Ho+0.112
wered+0.110
 sun+0.105
Neg logit contributions
 means-0.153
lder-0.151
 property-0.149
 client-0.146
 fault-0.146
Token on
Token ID319
Feature activation+1.54e-02

Pos logit contributions
 sun+0.002
 crashing+0.002
 shallow+0.001
 spring+0.001
ores+0.001
Neg logit contributions
lder-0.004
 client-0.004
keeping-0.004
 property-0.003
grand-0.003
Token the
Token ID262
Feature activation-1.77e-03

Pos logit contributions
 client+0.027
lder+0.027
 property+0.027
 means+0.027
 fault+0.026
Neg logit contributions
 shallow-0.020
Ho-0.019
 crashing-0.019
wered-0.018
ores-0.018
Token swings
Token ID26728
Feature activation-1.05e-01

Pos logit contributions
 shallow+0.002
 crashing+0.002
 sun+0.002
Ho+0.002
wered+0.002
Neg logit contributions
lder-0.003
 client-0.003
 property-0.003
 means-0.003
grand-0.003
Token.
Token ID13
Feature activation+1.59e-02

Pos logit contributions
 shallow+0.123
 crashing+0.115
Ho+0.115
wered+0.112
 sun+0.111
Neg logit contributions
lder-0.197
 property-0.197
 client-0.196
 means-0.194
grand-0.190
Token She
Token ID1375
Feature activation-3.81e-02

Pos logit contributions
 property+0.035
 policy+0.035
grand+0.035
lder+0.035
 client+0.034
Neg logit contributions
 sun-0.014
 shallow-0.013
wered-0.012
 crashing-0.012
add-0.012
Token took
Token ID1718
Feature activation-8.60e-02

Pos logit contributions
 shallow+0.063
 crashing+0.061
Ho+0.061
wered+0.060
ores+0.059
Neg logit contributions
lder-0.054
 property-0.053
 client-0.053
 means-0.053
grand-0.051
Token a
Token ID257
Feature activation+2.27e-02

Pos logit contributions
 shallow+0.106
Ho+0.098
 crashing+0.095
wered+0.093
ores+0.091
Neg logit contributions
 means-0.163
 client-0.162
lder-0.157
 fault-0.156
 property-0.154
Token deep
Token ID2769
Feature activation+2.11e-02

Pos logit contributions
lder+0.046
 client+0.046
 means+0.046
 property+0.046
grand+0.045
Neg logit contributions
 shallow-0.023
Ho-0.022
wered-0.021
ores-0.021
 crashing-0.021
Token breath
Token ID8033
Feature activation-3.96e-02

Pos logit contributions
 means+0.034
lder+0.033
 property+0.032
 policy+0.031
 client+0.031
Neg logit contributions
Ho-0.030
 shallow-0.030
 crashing-0.029
ores-0.029
hes-0.029
Token and
Token ID290
Feature activation-1.14e-01

Pos logit contributions
Ho+0.053
wered+0.053
 shallow+0.052
add+0.050
ores+0.047
Neg logit contributions
 means-0.068
 fault-0.066
lder-0.064
 property-0.064
 policy-0.063
Token walked
Token ID6807
Feature activation-1.20e-01

Pos logit contributions
Ho+0.168
 shallow+0.166
 crashing+0.165
wered+0.164
add+0.161
Neg logit contributions
 means-0.183
lder-0.174
 property-0.169
 client-0.167
grand-0.162
Token.
Token ID13
Feature activation-1.34e-03

Pos logit contributions
 shallow+0.118
wered+0.114
Ho+0.113
add+0.110
ores+0.105
Neg logit contributions
 means-0.098
lder-0.094
 property-0.092
 client-0.092
 fault-0.090
Token They
Token ID1119
Feature activation-6.17e-02

Pos logit contributions
 shallow+0.002
eh+0.002
Ho+0.002
wered+0.002
 drifting+0.002
Neg logit contributions
 means-0.002
 fault-0.002
 client-0.002
 endings-0.002
 property-0.002
Token drew
Token ID9859
Feature activation-8.79e-02

Pos logit contributions
 shallow+0.095
Ho+0.091
 crashing+0.090
ores+0.087
wered+0.087
Neg logit contributions
lder-0.094
 means-0.094
 client-0.094
 property-0.092
grand-0.090
Token a
Token ID257
Feature activation+1.02e-02

Pos logit contributions
 shallow+0.122
Ho+0.117
 crashing+0.113
wered+0.113
add+0.109
Neg logit contributions
 means-0.151
 property-0.143
 fault-0.142
 client-0.141
lder-0.140
Token rainbow
Token ID27223
Feature activation-5.43e-02

Pos logit contributions
 client+0.017
lder+0.017
 property+0.017
 means+0.016
grand+0.016
Neg logit contributions
 shallow-0.014
wered-0.014
ores-0.014
add-0.014
Ho-0.013
Token and
Token ID290
Feature activation-1.23e-01

Pos logit contributions
 shallow+0.075
Ho+0.071
wered+0.071
add+0.068
 crashing+0.067
Neg logit contributions
 means-0.094
lder-0.089
 client-0.088
 property-0.088
 fault-0.087
Token a
Token ID257
Feature activation+6.15e-03

Pos logit contributions
 shallow+0.133
 sun+0.123
 crashing+0.123
Ho+0.122
wered+0.120
Neg logit contributions
lder-0.248
 property-0.247
 client-0.245
 means-0.239
 fault-0.239
Token heart
Token ID2612
Feature activation-2.85e-02

Pos logit contributions
 client+0.011
lder+0.011
 property+0.011
 means+0.011
 fault+0.011
Neg logit contributions
 shallow-0.008
wered-0.007
ores-0.007
Ho-0.007
add-0.007
Token.
Token ID13
Feature activation-1.25e-03

Pos logit contributions
 shallow+0.040
 crashing+0.038
Ho+0.038
wered+0.037
 sun+0.037
Neg logit contributions
lder-0.048
 property-0.047
 client-0.047
 means-0.047
grand-0.046
Token They
Token ID1119
Feature activation-6.24e-02

Pos logit contributions
 shallow+0.002
eh+0.002
Ho+0.002
 farther+0.002
wered+0.002
Neg logit contributions
 means-0.002
 fault-0.002
 client-0.002
grand-0.002
 endings-0.002
Token wrote
Token ID2630
Feature activation-7.91e-02

Pos logit contributions
 shallow+0.094
Ho+0.089
 crashing+0.089
wered+0.087
ores+0.086
Neg logit contributions
lder-0.098
 client-0.097
 means-0.097
 property-0.097
grand-0.094
Token She
Token ID1375
Feature activation-4.54e-02

Pos logit contributions
 means+0.007
lder+0.006
 client+0.006
 Painter+0.006
 depends+0.006
Neg logit contributions
 shallow-0.006
wered-0.006
 crashing-0.006
Ho-0.006
add-0.006
Token picks
Token ID11103
Feature activation-1.04e-01

Pos logit contributions
wered+0.093
 shallow+0.093
Ho+0.092
 crashing+0.092
ores+0.090
Neg logit contributions
 means-0.048
 client-0.043
 depends-0.043
lder-0.043
 property-0.043
Token up
Token ID510
Feature activation-4.71e-02

Pos logit contributions
Ho+0.161
 shallow+0.160
 crashing+0.160
wered+0.158
ores+0.149
Neg logit contributions
 means-0.170
 fault-0.153
lder-0.145
 client-0.143
 depends-0.141
Token the
Token ID262
Feature activation-1.76e-02

Pos logit contributions
 crashing+0.073
Ho+0.072
add+0.066
 shallow+0.066
wered+0.065
Neg logit contributions
 means-0.077
 fault-0.070
 client-0.068
 problems-0.067
 property-0.065
Token stick
Token ID4859
Feature activation-7.20e-02

Pos logit contributions
ores+0.022
Ho+0.021
wered+0.020
add+0.020
 crashing+0.020
Neg logit contributions
 property-0.032
 means-0.032
 fault-0.032
 depends-0.031
 client-0.031
Token and
Token ID290
Feature activation-1.27e-01

Pos logit contributions
Ho+0.081
 shallow+0.079
wered+0.077
add+0.076
ores+0.072
Neg logit contributions
 means-0.144
 property-0.137
 fault-0.137
lder-0.136
 client-0.135
Token throws
Token ID12542
Feature activation-9.02e-02

Pos logit contributions
Ho+0.219
wered+0.218
add+0.218
 crashing+0.216
 shallow+0.214
Neg logit contributions
 means-0.184
 depends-0.163
 client-0.162
 property-0.162
lder-0.158
Token it
Token ID340
Feature activation-1.52e-02

Pos logit contributions
Ho+0.145
 shallow+0.144
add+0.143
wered+0.141
 crashing+0.139
Neg logit contributions
 means-0.134
 fault-0.122
 property-0.120
 depends-0.116
 client-0.116
Token at
Token ID379
Feature activation+1.05e-02

Pos logit contributions
Ho+0.022
wered+0.021
 shallow+0.021
ores+0.020
add+0.020
Neg logit contributions
 means-0.025
 fault-0.024
 property-0.023
lder-0.023
 depends-0.022
Token the
Token ID262
Feature activation-3.71e-03

Pos logit contributions
 means+0.018
 client+0.017
 fault+0.017
lder+0.017
 property+0.017
Neg logit contributions
 crashing-0.014
Ho-0.013
 shallow-0.013
add-0.013
hes-0.012
Token big
Token ID1263
Feature activation+4.56e-02

Pos logit contributions
Ho+0.004
ores+0.004
wered+0.004
Tweet+0.004
add+0.004
Neg logit contributions
 property-0.007
 fault-0.007
 client-0.007
 means-0.007
 depends-0.007
Token.
Token ID13
Feature activation+9.17e-03

Pos logit contributions
 shallow+0.008
 crashing+0.007
 sun+0.007
ores+0.007
 fresh+0.006
Neg logit contributions
lder-0.015
 client-0.014
grand-0.014
 property-0.014
 depends-0.014
Token He
Token ID679
Feature activation-4.03e-02

Pos logit contributions
lder+0.018
 client+0.018
 means+0.018
 property+0.018
grand+0.017
Neg logit contributions
 shallow-0.010
 crashing-0.010
Ho-0.010
wered-0.009
ores-0.009
Token took
Token ID1718
Feature activation-9.23e-02

Pos logit contributions
 shallow+0.061
 crashing+0.058
Ho+0.058
wered+0.057
ores+0.057
Neg logit contributions
lder-0.063
 client-0.062
 property-0.062
 means-0.062
grand-0.060
Token the
Token ID262
Feature activation-1.82e-02

Pos logit contributions
 shallow+0.114
Ho+0.113
 crashing+0.112
wered+0.108
add+0.103
Neg logit contributions
 means-0.171
 client-0.159
lder-0.158
 fault-0.156
 decisions-0.154
Token pumpkin
Token ID30089
Feature activation-4.57e-02

Pos logit contributions
 shallow+0.028
 sun+0.027
 fresh+0.026
 crashing+0.025
wered+0.025
Neg logit contributions
 client-0.029
keeping-0.029
 property-0.029
rums-0.028
grand-0.028
Token and
Token ID290
Feature activation-1.19e-01

Pos logit contributions
 shallow+0.060
Ho+0.058
 crashing+0.057
wered+0.057
add+0.055
Neg logit contributions
 property-0.080
 means-0.079
lder-0.079
 client-0.079
 fault-0.077
Token put
Token ID1234
Feature activation-1.29e-01

Pos logit contributions
 shallow+0.145
Ho+0.144
 crashing+0.142
wered+0.141
add+0.138
Neg logit contributions
 means-0.220
lder-0.214
 client-0.210
 property-0.209
grand-0.203
Token it
Token ID340
Feature activation-1.80e-02

Pos logit contributions
Ho+0.254
Tweet+0.251
wered+0.248
add+0.247
 crashing+0.246
Neg logit contributions
 means-0.144
 fault-0.111
 depends-0.110
keeping-0.107
lder-0.107
Token on
Token ID319
Feature activation-6.69e-03

Pos logit contributions
Ho+0.029
wered+0.029
 shallow+0.027
 crashing+0.026
ores+0.026
Neg logit contributions
 property-0.022
 means-0.022
 fault-0.022
 mattered-0.021
keeping-0.021
Token a
Token ID257
Feature activation+2.21e-02

Pos logit contributions
add+0.008
 shallow+0.008
Ho+0.008
 crashing+0.007
wered+0.007
Neg logit contributions
 means-0.012
 client-0.012
lder-0.012
 fault-0.012
 property-0.012
Token table
Token ID3084
Feature activation-1.16e-02

Pos logit contributions
lder+0.038
 property+0.038
 client+0.037
 means+0.037
grand+0.036
Neg logit contributions
 shallow-0.030
Ho-0.028
 crashing-0.028
wered-0.028
ores-0.027
Token.
Token ID13
Feature activation+2.78e-02

Pos logit contributions
 client+0.008
grand+0.008
lder+0.008
 policy+0.008
 property+0.008
Neg logit contributions
 shallow-0.003
 sun-0.003
 crashing-0.003
Ho-0.003
 higher-0.003
Token She
Token ID1375
Feature activation-3.52e-02

Pos logit contributions
lder+0.054
 property+0.054
 client+0.054
 means+0.053
grand+0.052
Neg logit contributions
 shallow-0.032
 crashing-0.030
Ho-0.030
wered-0.029
ores-0.028
Token fl
Token ID781
Feature activation-8.95e-02

Pos logit contributions
 shallow+0.049
ores+0.046
 crashing+0.046
wered+0.046
Ho+0.045
Neg logit contributions
lder-0.061
 means-0.060
 client-0.058
 property-0.058
grand-0.056
Tokenapped
Token ID6320
Feature activation-8.31e-02

Pos logit contributions
 shallow+0.141
 fresh+0.133
 sun+0.133
Ho+0.133
 camps+0.130
Neg logit contributions
lder-0.131
 means-0.127
 fault-0.126
 client-0.125
 property-0.125
Token her
Token ID607
Feature activation-1.30e-03

Pos logit contributions
 shallow+0.126
add+0.126
Ho+0.125
ores+0.117
 camps+0.117
Neg logit contributions
 means-0.127
 property-0.117
 privacy-0.117
 worry-0.116
lder-0.115
Token wings
Token ID12098
Feature activation-9.00e-02

Pos logit contributions
 shallow+0.002
ores+0.002
wered+0.002
Ho+0.002
 crashing+0.002
Neg logit contributions
lder-0.002
 property-0.002
 client-0.002
 fault-0.002
 means-0.002
Token and
Token ID290
Feature activation-1.16e-01

Pos logit contributions
Ho+0.132
wered+0.130
add+0.129
 shallow+0.126
ores+0.126
Neg logit contributions
 means-0.137
 fault-0.130
lder-0.127
 worry-0.124
 property-0.122
Token soared
Token ID35292
Feature activation-1.35e-01

Pos logit contributions
 shallow+0.127
add+0.124
wered+0.122
ores+0.121
Ho+0.121
Neg logit contributions
 means-0.230
lder-0.224
 property-0.218
 client-0.216
 depends-0.214
Token high
Token ID1029
Feature activation-5.42e-03

Pos logit contributions
add+0.169
hes+0.168
 shallow+0.165
ores+0.162
wered+0.160
Neg logit contributions
 means-0.226
 property-0.215
keeping-0.211
lder-0.209
",-0.205
Token in
Token ID287
Feature activation-1.82e-02

Pos logit contributions
 shallow+0.006
wered+0.006
Ho+0.006
 crashing+0.006
ores+0.006
Neg logit contributions
lder-0.010
 means-0.010
 property-0.010
 client-0.010
keeping-0.010
Token the
Token ID262
Feature activation-1.40e-02

Pos logit contributions
Ho+0.023
add+0.022
 shallow+0.022
hes+0.022
ores+0.021
Neg logit contributions
 means-0.032
keeping-0.032
 property-0.031
lder-0.030
 worry-0.030