TOP ACTIVATIONS
MAX = 10.360

book to carefully brush all of the horse â
The shop keeper gathered all of the items and told Min
the cat had wiped all of the paw prints off the
quickly opened it and all of her favourite things were inside
came earlier and picked all of the cher ries . â
too . She took all of the stones home and made
her mom my took all of the pills . They went
huge space and put all of his worries in it .
treats . They took all of the strawberries inside and made
andy smiled and ran all of the way home . She
ran off to show all of their friends their turkey .
gigg led and ate all of the yogurt .
Then , they wiped all of the surfaces to make sure
her mom my shared all of their cher ries , making
, and showed them all of the unique things the ocean
Cl aire gathered all of her courage and slowly placed
kitten , showing it all of the new places he had
Instead , he took all of the little boy 's toys
dinner ! She ate all of her broccoli and felt strong
sc run ched up all of the cloth in her hand

MIN ACTIVATIONS
MIN = -10.189

. He blinked his long eyel ashes and kept hopping .
theater and Larry felt his eyel ids become heavy . He
long until Chloe felt her eyel ids getting heavy . She
the black mascara on her eyel ashes . She makes a
THE BE LL WHEN YOU FE EL LIKE IT ".
She put some on her eyel ids and cheeks . It
. Jill hopped for what seemed like miles until she spotted
asked him , " What did you do with your chain
walked and walked for what seemed like forever , until he
They rode around for what seemed like hours . When it
asked him , " What happened , Tim my ? Why
o at field for what seemed like forever .
They laughed together for what seemed like a long time .
Ben asked Tim what he was doing , and Tim replied
old brother , " What did you do ?"
had been searching for what seemed like hours , when all
my asked , " What did you find down there ,
watched the rainbow for what seemed like forever .
and asked , " What happened , Tim my ?"
came by . " What happened , B inky ?" asked

INTERVAL 5.223 to 10.360
CONTAINS 2.679%

. Then they made them into different things . They made
He loved jam . He opened the jar
try his best . The race started and
its new song . They quickly realized the
and clean you up ." And that 's how

INTERVAL 0.086 to 5.223
CONTAINS 46.734%

He had learned his lesson and didn 't speed around so
the exercise , everyone looked at their creations . They couldn
finished , the little girl put her jacket back on and
and take a nap . The boy did as he was
. She felt so relieved that she remembered and so happy

INTERVAL -5.052 to 0.086
CONTAINS 48.878%

went closer and closer until he could see what was causing
started running along the track . M ilo was
up and felt much better . The end . <|endoftext|> Once
to the rabbit , I have
Tom and Jen are still sad . They miss their old

INTERVAL -10.189 to -5.052
CONTAINS 1.709%

or killed by the truck . You should never go near
fairy replied , I am giving
please . For give us . For give us . For
You only wanted the ring . But the ring is not
Lily . She loved to look up at the stars in
Token book
Token ID1492
Feature activation+6.85e-01

Pos logit contributions
ric+0.019
ries+0.017
els+0.017
asted+0.017
aches+0.017
Neg logit contributions
 Please-0.015
 unfair-0.015
 Give-0.015
 trouble-0.014
 guilty-0.014
Token to
Token ID284
Feature activation+3.97e+00

Pos logit contributions
ric+0.015
ries+0.014
els+0.014
asted+0.014
aches+0.013
Neg logit contributions
 trouble-0.011
 Please-0.011
 unfair-0.010
 impossible-0.010
 guilty-0.010
Token carefully
Token ID7773
Feature activation+1.12e+00

Pos logit contributions
ries+0.087
ric+0.085
hat+0.080
ants+0.079
ishes+0.078
Neg logit contributions
 trouble-0.066
 mean-0.063
 stupid-0.060
 impossible-0.059
 unfair-0.059
Token brush
Token ID14093
Feature activation+2.91e+00

Pos logit contributions
ric+0.024
ries+0.024
els+0.022
hat+0.022
asted+0.021
Neg logit contributions
 trouble-0.018
 Please-0.017
 impossible-0.017
 guilty-0.017
 mean-0.016
Token all
Token ID477
Feature activation+6.04e+00

Pos logit contributions
ric+0.050
ries+0.046
asted+0.045
aches+0.044
els+0.043
Neg logit contributions
 trouble-0.058
 guilty-0.054
 Please-0.054
 unfair-0.054
 impossible-0.054
Token of
Token ID286
Feature activation+1.04e+01

Pos logit contributions
ric+0.108
ants+0.106
ries+0.103
hat+0.103
illas+0.103
Neg logit contributions
 trouble-0.114
 wrong-0.113
 impossible-0.107
 mean-0.106
 guilty-0.105
Token the
Token ID262
Feature activation+2.93e+00

Pos logit contributions
ric+0.181
hat+0.174
ries+0.166
illas+0.166
ites+0.164
Neg logit contributions
 trouble-0.219
 stupid-0.189
 guilty-0.187
 mean-0.187
 wrong-0.186
Token horse
Token ID8223
Feature activation+2.42e+00

Pos logit contributions
ric+0.056
ries+0.055
els+0.053
aches+0.051
asted+0.050
Neg logit contributions
 Please-0.051
 unfair-0.051
 trouble-0.050
 guilty-0.048
 Give-0.048
Tokenâ
Token ID22940
Feature activation-2.82e+00

Pos logit contributions
ric+0.054
ries+0.052
els+0.050
asted+0.049
aches+0.049
Neg logit contributions
 trouble-0.035
 Please-0.035
 guilty-0.034
 unfair-0.033
 impossible-0.033
Token
Token ID26391
Feature activation-3.93e+00

Pos logit contributions
 trouble+0.047
 Please+0.046
 unfair+0.046
 guilty+0.044
 mean+0.043
Neg logit contributions
ric-0.058
ries-0.055
asted-0.054
els-0.053
aches-0.053
Token
Token ID8151
Feature activation-1.04e+00

Pos logit contributions
 Please+0.072
 trouble+0.071
 unfair+0.066
 guilty+0.066
 Give+0.065
Neg logit contributions
ries-0.073
ric-0.073
els-0.070
asted-0.067
aches-0.067
TokenThe
Token ID464
Feature activation+1.77e+00

Pos logit contributions
ric+0.113
ries+0.110
packs+0.101
els+0.099
aches+0.097
Neg logit contributions
 Please-0.092
-0.078
 Give-0.078
 trouble-0.077
 brave-0.075
Token shop
Token ID6128
Feature activation+7.45e-01

Pos logit contributions
ric+0.030
ries+0.030
els+0.028
aches+0.027
asted+0.027
Neg logit contributions
 trouble-0.035
 Please-0.035
 unfair-0.035
 guilty-0.034
 impossible-0.033
Tokenkeeper
Token ID13884
Feature activation+2.77e+00

Pos logit contributions
ric+0.016
ries+0.015
els+0.014
aches+0.014
asted+0.014
Neg logit contributions
 Please-0.012
 trouble-0.012
 unfair-0.012
 guilty-0.011
 mean-0.011
Token gathered
Token ID9272
Feature activation+3.48e+00

Pos logit contributions
ric+0.064
ries+0.057
aches+0.054
els+0.053
packs+0.053
Neg logit contributions
 Please-0.041
 guilty-0.040
 mean-0.040
 Give-0.039
 unfair-0.039
Token all
Token ID477
Feature activation+5.48e+00

Pos logit contributions
ric+0.064
ries+0.059
aches+0.057
els+0.056
asted+0.055
Neg logit contributions
 trouble-0.068
 Please-0.065
 unfair-0.065
 guilty-0.065
 impossible-0.063
Token of
Token ID286
Feature activation+1.03e+01

Pos logit contributions
ric+0.112
illas+0.100
aches+0.099
ants+0.099
ppers+0.091
Neg logit contributions
 trouble-0.099
 guilty-0.091
 right-0.088
 unfair-0.087
 stupid-0.087
Token the
Token ID262
Feature activation+2.58e+00

Pos logit contributions
ric+0.212
aches+0.206
hat+0.201
ries+0.198
view+0.198
Neg logit contributions
 trouble-0.174
 stupid-0.152
 guilty-0.149
 impossible-0.145
 unfair-0.143
Token items
Token ID3709
Feature activation+3.47e+00

Pos logit contributions
ric+0.045
ries+0.043
els+0.040
asted+0.040
aches+0.039
Neg logit contributions
 trouble-0.051
 unfair-0.050
 Please-0.050
 guilty-0.049
 mean-0.048
Token and
Token ID290
Feature activation+2.54e+00

Pos logit contributions
ric+0.076
ries+0.069
asted+0.068
els+0.066
aches+0.066
Neg logit contributions
 trouble-0.053
 guilty-0.052
 impossible-0.052
 Please-0.051
 unfair-0.051
Token told
Token ID1297
Feature activation+1.81e+00

Pos logit contributions
ric+0.055
ries+0.046
els+0.046
aches+0.045
ants+0.045
Neg logit contributions
 trouble-0.043
 Please-0.041
 guilty-0.041
 unfair-0.040
 stupid-0.039
Token Min
Token ID1855
Feature activation+1.57e+00

Pos logit contributions
ric+0.033
ries+0.030
aches+0.029
hat+0.029
illas+0.029
Neg logit contributions
 trouble-0.036
 Please-0.034
 sorry-0.034
 guilty-0.034
 unfair-0.034
Token the
Token ID262
Feature activation+3.23e-01

Pos logit contributions
ric+0.020
ries+0.018
els+0.018
hat+0.017
asted+0.017
Neg logit contributions
 trouble-0.022
 Please-0.021
 guilty-0.020
 sorry-0.020
 mean-0.020
Token cat
Token ID3797
Feature activation+1.16e+00

Pos logit contributions
els+0.005
ries+0.005
ric+0.005
asted+0.004
hat+0.004
Neg logit contributions
 Please-0.006
 unfair-0.006
 Come-0.006
 Goodbye-0.006
 Give-0.006
Token had
Token ID550
Feature activation-1.98e+00

Pos logit contributions
ric+0.021
ries+0.021
els+0.020
aches+0.020
asted+0.019
Neg logit contributions
 trouble-0.021
 Please-0.021
 unfair-0.021
 guilty-0.020
 impossible-0.020
Token wiped
Token ID21122
Feature activation+3.32e+00

Pos logit contributions
 trouble+0.031
 Please+0.031
 unfair+0.031
 guilty+0.029
 impossible+0.029
Neg logit contributions
ric-0.042
ries-0.041
els-0.039
aches-0.038
asted-0.038
Token all
Token ID477
Feature activation+5.75e+00

Pos logit contributions
ric+0.057
ries+0.055
els+0.052
aches+0.051
asted+0.050
Neg logit contributions
 trouble-0.068
 Please-0.065
 unfair-0.065
 guilty-0.065
 impossible-0.063
Token of
Token ID286
Feature activation+1.02e+01

Pos logit contributions
ric+0.100
ants+0.095
ries+0.089
els+0.089
hat+0.087
Neg logit contributions
 trouble-0.121
 guilty-0.115
 wrong-0.113
 impossible-0.111
 unfair-0.111
Token the
Token ID262
Feature activation+2.80e+00

Pos logit contributions
ric+0.194
hat+0.191
ries+0.180
illas+0.177
ites+0.176
Neg logit contributions
 trouble-0.204
 stupid-0.181
 guilty-0.175
 mean-0.174
 wrong-0.172
Token paw
Token ID41827
Feature activation+1.08e+00

Pos logit contributions
ries+0.049
ric+0.047
els+0.047
aches+0.044
asted+0.043
Neg logit contributions
 Please-0.055
 unfair-0.052
 Give-0.050
 impossible-0.050
 trouble-0.050
Token prints
Token ID20842
Feature activation+2.32e+00

Pos logit contributions
ric+0.022
ries+0.021
els+0.020
asted+0.020
aches+0.020
Neg logit contributions
 trouble-0.018
 Please-0.017
 unfair-0.017
 guilty-0.017
 impossible-0.017
Token off
Token ID572
Feature activation+4.41e+00

Pos logit contributions
ric+0.045
ries+0.044
els+0.042
aches+0.041
asted+0.041
Neg logit contributions
 trouble-0.040
 Please-0.040
 unfair-0.039
 guilty-0.038
 impossible-0.037
Token the
Token ID262
Feature activation+2.11e+00

Pos logit contributions
ric+0.080
ries+0.073
asted+0.070
els+0.070
aches+0.069
Neg logit contributions
 trouble-0.088
 Please-0.086
 guilty-0.083
 unfair-0.083
 impossible-0.083
Token quickly
Token ID2952
Feature activation-1.25e-01

Pos logit contributions
ric+0.027
ries+0.026
els+0.025
hat+0.024
ants+0.023
Neg logit contributions
 Please-0.022
 trouble-0.022
 unfair-0.022
 guilty-0.022
 mean-0.021
Token opened
Token ID4721
Feature activation+2.61e+00

Pos logit contributions
 trouble+0.002
 guilty+0.002
 mean+0.002
 unfair+0.002
 Please+0.002
Neg logit contributions
ric-0.003
ries-0.002
els-0.002
ants-0.002
illas-0.002
Token it
Token ID340
Feature activation+1.11e+00

Pos logit contributions
ric+0.056
illas+0.050
aches+0.049
asted+0.049
bel+0.048
Neg logit contributions
 trouble-0.045
 wrong-0.039
 guilty-0.038
 Please-0.038
 right-0.038
Token and
Token ID290
Feature activation+2.68e+00

Pos logit contributions
ric+0.025
ries+0.024
els+0.023
asted+0.023
aches+0.023
Neg logit contributions
 trouble-0.016
 Please-0.016
 unfair-0.016
 impossible-0.016
 guilty-0.015
Token all
Token ID477
Feature activation+5.13e+00

Pos logit contributions
ric+0.065
ries+0.057
els+0.057
ants+0.055
aches+0.053
Neg logit contributions
 Please-0.039
 guilty-0.039
 trouble-0.038
 mean-0.036
 unfair-0.035
Token of
Token ID286
Feature activation+1.02e+01

Pos logit contributions
ric+0.112
ants+0.108
hat+0.103
illas+0.102
ries+0.101
Neg logit contributions
 guilty-0.078
 mean-0.074
 trouble-0.073
 scared-0.071
 stupid-0.070
Token her
Token ID607
Feature activation+1.18e+00

Pos logit contributions
ric+0.200
hat+0.182
ants+0.178
ries+0.178
bel+0.177
Neg logit contributions
 trouble-0.204
 unfair-0.174
 guilty-0.172
 stupid-0.171
 wrong-0.170
Token favourite
Token ID12507
Feature activation+2.25e+00

Pos logit contributions
ric+0.024
ries+0.023
asted+0.022
els+0.022
aches+0.021
Neg logit contributions
 trouble-0.021
 guilty-0.019
 unfair-0.019
 Please-0.019
 stupid-0.019
Token things
Token ID1243
Feature activation+2.63e-01

Pos logit contributions
ric+0.047
ries+0.044
asted+0.042
els+0.042
aches+0.042
Neg logit contributions
 trouble-0.038
 unfair-0.035
 Please-0.035
 guilty-0.035
 mean-0.034
Token were
Token ID547
Feature activation-3.70e+00

Pos logit contributions
ric+0.005
ries+0.005
els+0.005
asted+0.005
aches+0.005
Neg logit contributions
 trouble-0.005
 unfair-0.005
 Please-0.004
 guilty-0.004
 impossible-0.004
Token inside
Token ID2641
Feature activation+4.17e+00

Pos logit contributions
 trouble+0.060
 unfair+0.058
 impossible+0.057
 Please+0.057
 guilty+0.057
Neg logit contributions
ric-0.078
ries-0.074
els-0.071
asted-0.071
hat-0.069
Token came
Token ID1625
Feature activation-2.06e+00

Pos logit contributions
 trouble+0.008
 Please+0.008
 unfair+0.008
 guilty+0.007
 impossible+0.007
Neg logit contributions
ric-0.012
ries-0.012
els-0.011
aches-0.011
asted-0.011
Token earlier
Token ID2961
Feature activation-7.50e-01

Pos logit contributions
 trouble+0.033
 unfair+0.033
 Please+0.032
 guilty+0.031
 impossible+0.030
Neg logit contributions
ric-0.043
ries-0.042
els-0.040
aches-0.039
hat-0.039
Token and
Token ID290
Feature activation+1.42e+00

Pos logit contributions
 trouble+0.012
 unfair+0.012
 Please+0.012
 guilty+0.012
 impossible+0.012
Neg logit contributions
ric-0.015
ries-0.015
els-0.014
aches-0.014
asted-0.013
Token picked
Token ID6497
Feature activation+2.52e+00

Pos logit contributions
ric+0.027
ries+0.027
els+0.025
aches+0.025
asted+0.025
Neg logit contributions
 trouble-0.025
 Please-0.025
 unfair-0.025
 guilty-0.024
 impossible-0.023
Token all
Token ID477
Feature activation+5.90e+00

Pos logit contributions
ric+0.051
ries+0.048
els+0.046
asted+0.045
aches+0.045
Neg logit contributions
 trouble-0.043
 Please-0.042
 unfair-0.042
 guilty-0.041
 Give-0.040
Token of
Token ID286
Feature activation+1.02e+01

Pos logit contributions
ric+0.108
illas+0.102
hat+0.101
els+0.097
bel+0.097
Neg logit contributions
 trouble-0.109
 Give-0.102
 mean-0.101
 wrong-0.101
 unfair-0.100
Token the
Token ID262
Feature activation+2.02e+00

Pos logit contributions
bel+0.199
hat+0.197
ric+0.196
packs+0.186
illas+0.182
Neg logit contributions
 trouble-0.185
 stupid-0.167
 wrong-0.166
 unfair-0.165
 mean-0.161
Token cher
Token ID21382
Feature activation+3.34e+00

Pos logit contributions
ric+0.036
hat+0.033
asted+0.033
ries+0.032
els+0.032
Neg logit contributions
 trouble-0.041
 guilty-0.039
 unfair-0.039
 mean-0.039
 stupid-0.038
Tokenries
Token ID1678
Feature activation+3.60e+00

Pos logit contributions
asted+0.045
wered+0.034
bel+0.033
oted+0.033
ric+0.031
Neg logit contributions
 wrong-0.073
 trouble-0.073
 unfair-0.065
 guilty-0.065
 problem-0.064
Token.
Token ID13
Feature activation-2.71e+00

Pos logit contributions
ric+0.081
ries+0.075
asted+0.075
els+0.074
hat+0.072
Neg logit contributions
 trouble-0.052
 Please-0.052
 guilty-0.050
 unfair-0.050
 wrong-0.049
Tokenâ
Token ID22940
Feature activation-2.76e+00

Pos logit contributions
 trouble+0.042
 unfair+0.042
 Please+0.040
 mean+0.040
 guilty+0.040
Neg logit contributions
ric-0.056
ries-0.055
els-0.051
asted-0.051
hat-0.050
Token too
Token ID1165
Feature activation-1.12e+00

Pos logit contributions
ric+0.038
ries+0.037
els+0.035
asted+0.035
aches+0.034
Neg logit contributions
 trouble-0.031
 Please-0.031
 unfair-0.031
 guilty-0.030
 impossible-0.030
Token.
Token ID13
Feature activation-1.02e+00

Pos logit contributions
 Please+0.020
 unfair+0.020
 trouble+0.020
 guilty+0.019
 Give+0.019
Neg logit contributions
ries-0.021
ric-0.020
els-0.019
aches-0.019
hat-0.019
Token She
Token ID1375
Feature activation+1.89e+00

Pos logit contributions
 Please+0.017
 trouble+0.016
 guilty+0.015
 unfair+0.015
 Give+0.015
Neg logit contributions
ric-0.022
ries-0.021
aches-0.021
els-0.021
asted-0.020
Token took
Token ID1718
Feature activation+2.33e+00

Pos logit contributions
ric+0.038
ries+0.036
els+0.033
hat+0.032
ants+0.032
Neg logit contributions
 trouble-0.032
 Please-0.032
 unfair-0.031
 guilty-0.031
 mean-0.031
Token all
Token ID477
Feature activation+4.91e+00

Pos logit contributions
ric+0.046
ries+0.043
els+0.041
aches+0.040
asted+0.040
Neg logit contributions
 trouble-0.042
 Please-0.041
 unfair-0.040
 guilty-0.040
 impossible-0.039
Token of
Token ID286
Feature activation+1.02e+01

Pos logit contributions
ric+0.102
illas+0.090
ants+0.086
aches+0.082
ries+0.081
Neg logit contributions
 trouble-0.094
 guilty-0.082
 stupid-0.080
 right-0.079
 impossible-0.078
Token the
Token ID262
Feature activation+2.63e+00

Pos logit contributions
ric+0.208
bel+0.184
hat+0.182
illas+0.182
ants+0.175
Neg logit contributions
 trouble-0.202
 stupid-0.174
 guilty-0.173
 wrong-0.166
 unfair-0.165
Token stones
Token ID14966
Feature activation+2.85e+00

Pos logit contributions
ric+0.042
ries+0.042
els+0.039
aches+0.038
asted+0.037
Neg logit contributions
 Please-0.055
 trouble-0.054
 unfair-0.054
 guilty-0.052
 impossible-0.051
Token home
Token ID1363
Feature activation+1.42e+00

Pos logit contributions
ric+0.067
ries+0.062
asted+0.061
els+0.060
aches+0.058
Neg logit contributions
 trouble-0.039
 impossible-0.038
 guilty-0.038
 Please-0.037
 mean-0.036
Token and
Token ID290
Feature activation+2.68e+00

Pos logit contributions
ric+0.032
ries+0.030
els+0.029
asted+0.029
aches+0.028
Neg logit contributions
 trouble-0.021
 Please-0.020
 guilty-0.020
 unfair-0.020
 impossible-0.020
Token made
Token ID925
Feature activation+2.42e+00

Pos logit contributions
ric+0.054
ries+0.048
els+0.047
ants+0.046
hat+0.045
Neg logit contributions
 trouble-0.049
 Please-0.046
 guilty-0.046
 unfair-0.046
 mean-0.044
Token her
Token ID607
Feature activation-1.41e+00

Pos logit contributions
ric+0.045
ries+0.043
els+0.041
hat+0.039
aches+0.039
Neg logit contributions
 trouble-0.049
 Please-0.049
 unfair-0.048
 guilty-0.047
 mean-0.046
Token mom
Token ID1995
Feature activation+8.82e-01

Pos logit contributions
 trouble+0.026
 Please+0.026
 unfair+0.026
 guilty+0.025
 impossible+0.024
Neg logit contributions
ric-0.026
ries-0.025
els-0.024
aches-0.023
asted-0.023
Tokenmy
Token ID1820
Feature activation+6.57e-01

Pos logit contributions
ric+0.017
ries+0.017
aches+0.017
els+0.017
ants+0.016
Neg logit contributions
 Please-0.014
 guilty-0.014
 mean-0.014
 trouble-0.013
 ok-0.013
Token took
Token ID1718
Feature activation+2.53e+00

Pos logit contributions
ric+0.013
ries+0.013
aches+0.012
els+0.012
ants+0.012
Neg logit contributions
 guilty-0.011
 Please-0.010
 mean-0.010
 trouble-0.010
 unfair-0.010
Token all
Token ID477
Feature activation+6.12e+00

Pos logit contributions
ric+0.043
ries+0.040
els+0.038
aches+0.037
asted+0.036
Neg logit contributions
 trouble-0.053
 Please-0.051
 unfair-0.051
 guilty-0.050
 mean-0.049
Token of
Token ID286
Feature activation+1.01e+01

Pos logit contributions
ric+0.116
ants+0.106
ries+0.101
illas+0.101
aches+0.100
Neg logit contributions
 trouble-0.122
 guilty-0.109
 wrong-0.108
 mean-0.105
 stupid-0.104
Token the
Token ID262
Feature activation+3.22e+00

Pos logit contributions
ric+0.186
hat+0.165
ries+0.160
illas+0.158
ants+0.157
Neg logit contributions
 trouble-0.223
 sorry-0.187
 guilty-0.186
 stupid-0.185
 unfair-0.184
Token pills
Token ID19521
Feature activation+1.44e+00

Pos logit contributions
ries+0.058
ric+0.057
els+0.054
aches+0.053
asted+0.052
Neg logit contributions
 Please-0.061
 unfair-0.059
 trouble-0.058
 impossible-0.057
 Give-0.056
Token.
Token ID13
Feature activation-1.82e+00

Pos logit contributions
ric+0.030
ries+0.029
els+0.028
asted+0.027
aches+0.027
Neg logit contributions
 trouble-0.023
 Please-0.023
 unfair-0.023
 guilty-0.022
 impossible-0.022
Token They
Token ID1119
Feature activation+1.77e+00

Pos logit contributions
 Please+0.028
 trouble+0.026
 Give+0.026
 unfair+0.025
 guilty+0.025
Neg logit contributions
ries-0.040
ric-0.040
aches-0.039
els-0.039
hat-0.037
Token went
Token ID1816
Feature activation-2.12e+00

Pos logit contributions
ric+0.041
ries+0.039
els+0.037
ants+0.036
cks+0.035
Neg logit contributions
 guilty-0.026
 mean-0.026
 unfair-0.024
 scared-0.024
 ashamed-0.023
Token huge
Token ID3236
Feature activation+1.61e+00

Pos logit contributions
ric+0.029
ries+0.027
els+0.026
asted+0.024
hat+0.024
Neg logit contributions
 unfair-0.026
 Please-0.025
 trouble-0.025
 guilty-0.024
 Give-0.024
Token space
Token ID2272
Feature activation+2.24e+00

Pos logit contributions
ric+0.029
ries+0.028
els+0.027
asted+0.025
hat+0.025
Neg logit contributions
 unfair-0.030
 Please-0.029
 trouble-0.029
 guilty-0.028
 impossible-0.028
Token and
Token ID290
Feature activation+2.19e+00

Pos logit contributions
ric+0.048
ries+0.047
els+0.045
aches+0.044
asted+0.044
Neg logit contributions
 trouble-0.035
 Please-0.035
 guilty-0.033
 unfair-0.033
 impossible-0.033
Token put
Token ID1234
Feature activation+3.92e+00

Pos logit contributions
ric+0.045
ries+0.043
els+0.042
ants+0.039
aches+0.038
Neg logit contributions
 Please-0.039
 trouble-0.039
 guilty-0.038
 mean-0.037
 unfair-0.037
Token all
Token ID477
Feature activation+5.46e+00

Pos logit contributions
ric+0.075
ries+0.070
els+0.067
aches+0.066
ants+0.065
Neg logit contributions
 trouble-0.073
 Please-0.070
 guilty-0.070
 unfair-0.069
 sorry-0.067
Token of
Token ID286
Feature activation+1.01e+01

Pos logit contributions
ric+0.107
ants+0.100
ries+0.099
aches+0.098
els+0.097
Neg logit contributions
 trouble-0.100
 guilty-0.093
 Please-0.089
 stupid-0.089
 mean-0.088
Token his
Token ID465
Feature activation+7.09e-01

Pos logit contributions
ric+0.183
hat+0.178
ries+0.176
aches+0.167
els+0.165
Neg logit contributions
 trouble-0.205
 guilty-0.184
 stupid-0.181
 sorry-0.180
 mean-0.177
Token worries
Token ID18572
Feature activation-1.15e+00

Pos logit contributions
ric+0.016
ries+0.015
els+0.015
aches+0.014
asted+0.014
Neg logit contributions
 trouble-0.011
 Please-0.011
 unfair-0.010
 guilty-0.010
 impossible-0.010
Token in
Token ID287
Feature activation+5.84e+00

Pos logit contributions
 trouble+0.019
 Please+0.019
 unfair+0.019
 guilty+0.018
 impossible+0.018
Neg logit contributions
ric-0.023
ries-0.023
els-0.022
asted-0.021
aches-0.021
Token it
Token ID340
Feature activation+5.50e-01

Pos logit contributions
ric+0.097
ries+0.096
aches+0.090
els+0.089
ants+0.086
Neg logit contributions
 trouble-0.124
 Please-0.117
 guilty-0.115
 unfair-0.115
 impossible-0.114
Token.
Token ID13
Feature activation-1.07e+00

Pos logit contributions
ric+0.011
ries+0.011
els+0.010
aches+0.010
asted+0.010
Neg logit contributions
 trouble-0.009
 Please-0.009
 unfair-0.009
 guilty-0.009
 impossible-0.008
Token treats
Token ID18432
Feature activation-6.33e-02

Pos logit contributions
ric+0.006
ries+0.006
els+0.006
asted+0.005
aches+0.005
Neg logit contributions
 trouble-0.006
 Please-0.006
 unfair-0.006
 guilty-0.006
 mean-0.006
Token.
Token ID13
Feature activation-1.35e+00

Pos logit contributions
 trouble+0.001
 Please+0.001
 unfair+0.001
 guilty+0.001
 impossible+0.001
Neg logit contributions
ric-0.001
ries-0.001
els-0.001
aches-0.001
asted-0.001
Token They
Token ID1119
Feature activation+2.05e+00

Pos logit contributions
 Please+0.021
 trouble+0.021
 unfair+0.020
 guilty+0.020
 Give+0.019
Neg logit contributions
ries-0.029
ric-0.028
aches-0.028
els-0.027
hat-0.026
Token took
Token ID1718
Feature activation+2.19e+00

Pos logit contributions
ries+0.044
ric+0.042
els+0.041
ants+0.039
hat+0.037
Neg logit contributions
 guilty-0.033
 mean-0.033
 unfair-0.030
 stupid-0.030
 Please-0.030
Token all
Token ID477
Feature activation+4.88e+00

Pos logit contributions
ric+0.046
ries+0.043
els+0.041
aches+0.041
hat+0.040
Neg logit contributions
 trouble-0.037
 unfair-0.035
 Please-0.035
 guilty-0.035
 mean-0.034
Token of
Token ID286
Feature activation+1.00e+01

Pos logit contributions
ric+0.093
ants+0.083
aches+0.083
illas+0.082
ries+0.081
Neg logit contributions
 trouble-0.092
 guilty-0.080
 stupid-0.078
 right-0.077
 wrong-0.076
Token the
Token ID262
Feature activation+2.59e+00

Pos logit contributions
ric+0.169
bel+0.162
aches+0.157
ries+0.154
hat+0.153
Neg logit contributions
 trouble-0.209
 guilty-0.191
 stupid-0.189
 unfair-0.184
 mean-0.183
Token strawberries
Token ID36973
Feature activation+2.90e+00

Pos logit contributions
ric+0.051
ries+0.048
els+0.046
asted+0.046
hat+0.045
Neg logit contributions
 trouble-0.046
 unfair-0.045
 Please-0.045
 guilty-0.044
 mean-0.043
Token inside
Token ID2641
Feature activation+6.37e+00

Pos logit contributions
ric+0.071
els+0.064
asted+0.063
ries+0.063
aches+0.060
Neg logit contributions
 trouble-0.035
 guilty-0.035
 impossible-0.034
 Please-0.034
 right-0.033
Token and
Token ID290
Feature activation+2.74e+00

Pos logit contributions
ric+0.141
ries+0.133
aches+0.131
els+0.131
hat+0.126
Neg logit contributions
 trouble-0.100
 guilty-0.096
 Please-0.091
 stupid-0.090
 impossible-0.089
Token made
Token ID925
Feature activation+2.73e+00

Pos logit contributions
ric+0.055
ries+0.051
els+0.050
ants+0.048
hat+0.047
Neg logit contributions
 trouble-0.048
 guilty-0.048
 Please-0.047
 unfair-0.046
 mean-0.045
Tokenandy
Token ID10757
Feature activation+9.13e-01

Pos logit contributions
ric+0.008
ries+0.007
els+0.006
aches+0.006
asted+0.006
Neg logit contributions
 trouble-0.023
 Please-0.023
 unfair-0.023
 guilty-0.022
 impossible-0.022
Token smiled
Token ID13541
Feature activation-2.24e+00

Pos logit contributions
ric+0.019
ries+0.018
els+0.017
aches+0.016
hat+0.016
Neg logit contributions
 Please-0.015
 unfair-0.015
 trouble-0.015
 guilty-0.015
 Give-0.014
Token and
Token ID290
Feature activation+2.58e+00

Pos logit contributions
 trouble+0.044
 unfair+0.044
 Please+0.043
 Give+0.042
 liar+0.041
Neg logit contributions
ric-0.036
hat-0.036
ries-0.036
els-0.034
asted-0.033
Token ran
Token ID4966
Feature activation-1.48e+00

Pos logit contributions
ric+0.064
ries+0.059
els+0.058
ants+0.055
aches+0.054
Neg logit contributions
 Please-0.037
 guilty-0.036
 trouble-0.034
 sorry-0.034
 unfair-0.034
Token all
Token ID477
Feature activation+5.46e+00

Pos logit contributions
 unfair+0.027
 Please+0.027
 trouble+0.027
 guilty+0.025
 impossible+0.025
Neg logit contributions
ries-0.026
ric-0.026
hat-0.025
asted-0.025
els-0.025
Token of
Token ID286
Feature activation+1.00e+01

Pos logit contributions
ric+0.112
ries+0.110
ants+0.108
aches+0.104
els+0.104
Neg logit contributions
 trouble-0.087
 guilty-0.085
 mean-0.083
 impossible-0.082
 stupid-0.081
Token the
Token ID262
Feature activation+3.15e+00

Pos logit contributions
ric+0.204
ants+0.190
hat+0.186
pp+0.180
illas+0.179
Neg logit contributions
 trouble-0.206
 mean-0.169
 guilty-0.167
 stupid-0.167
 sorry-0.161
Token way
Token ID835
Feature activation-8.17e-01

Pos logit contributions
ric+0.063
ries+0.058
els+0.056
asted+0.056
ants+0.055
Neg logit contributions
 trouble-0.058
 unfair-0.056
 Please-0.055
 guilty-0.055
 mean-0.054
Token home
Token ID1363
Feature activation+2.11e+00

Pos logit contributions
 unfair+0.013
 trouble+0.013
 Please+0.013
 guilty+0.012
 Give+0.012
Neg logit contributions
ric-0.017
ries-0.017
els-0.016
aches-0.016
hat-0.016
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
ric+0.043
ries+0.042
els+0.040
aches+0.040
asted+0.039
Neg logit contributions
 trouble-0.035
 Please-0.034
 unfair-0.034
 guilty-0.033
 impossible-0.032
Token She
Token ID1375
Feature activation+1.55e+00

Pos logit contributions
 trouble+0.027
 unfair+0.027
 Please+0.027
 guilty+0.026
 impossible+0.025
Neg logit contributions
ric-0.037
ries-0.036
els-0.035
asted-0.034
aches-0.034
Token ran
Token ID4966
Feature activation-1.02e+00

Pos logit contributions
ric+0.033
ries+0.031
cks+0.027
els+0.027
aches+0.027
Neg logit contributions
 mean-0.023
 Please-0.022
 guilty-0.022
 unfair-0.021
 stupid-0.021
Token off
Token ID572
Feature activation+4.13e+00

Pos logit contributions
 Please+0.019
 unfair+0.019
 trouble+0.018
 guilty+0.018
 impossible+0.017
Neg logit contributions
ries-0.018
ric-0.018
hat-0.017
els-0.017
asted-0.017
Token to
Token ID284
Feature activation+5.14e+00

Pos logit contributions
ric+0.109
ries+0.103
els+0.101
aches+0.100
asted+0.097
Neg logit contributions
 trouble-0.051
 guilty-0.046
 Please-0.046
 sorry-0.043
 mean-0.042
Token show
Token ID905
Feature activation+1.44e+00

Pos logit contributions
ries+0.109
ric+0.103
ants+0.103
hat+0.102
els+0.100
Neg logit contributions
 trouble-0.092
 sorry-0.084
 guilty-0.082
 mean-0.080
 stupid-0.078
Token all
Token ID477
Feature activation+6.25e+00

Pos logit contributions
ric+0.018
ries+0.016
els+0.014
hat+0.014
aches+0.014
Neg logit contributions
 trouble-0.037
 unfair-0.036
 guilty-0.036
 Please-0.035
 sorry-0.035
Token of
Token ID286
Feature activation+9.98e+00

Pos logit contributions
ric+0.108
illas+0.107
ries+0.102
hat+0.102
ants+0.100
Neg logit contributions
 trouble-0.129
 unfair-0.116
 guilty-0.116
 mean-0.116
 wrong-0.114
Token their
Token ID511
Feature activation+2.56e+00

Pos logit contributions
ric+0.168
hat+0.158
pp+0.153
illas+0.151
bel+0.149
Neg logit contributions
 trouble-0.217
 unfair-0.201
 guilty-0.200
 sorry-0.198
 stupid-0.197
Token friends
Token ID2460
Feature activation+2.84e+00

Pos logit contributions
ric+0.053
ries+0.050
hat+0.047
aches+0.047
asted+0.047
Neg logit contributions
 trouble-0.045
 unfair-0.042
 guilty-0.041
 mean-0.040
 Please-0.040
Token their
Token ID511
Feature activation+9.10e-01

Pos logit contributions
ric+0.064
ries+0.062
els+0.059
aches+0.059
asted+0.057
Neg logit contributions
 trouble-0.041
 guilty-0.040
 unfair-0.038
 sorry-0.038
 wrong-0.037
Token turkey
Token ID26876
Feature activation+2.15e+00

Pos logit contributions
ric+0.018
ries+0.018
els+0.017
aches+0.017
asted+0.017
Neg logit contributions
 trouble-0.015
 unfair-0.015
 Please-0.015
 guilty-0.014
 impossible-0.014
Token.
Token ID13
Feature activation-1.63e+00

Pos logit contributions
ric+0.045
ries+0.044
els+0.042
aches+0.041
asted+0.041
Neg logit contributions
 trouble-0.034
 Please-0.033
 guilty-0.033
 unfair-0.033
 impossible-0.031
Token gigg
Token ID30442
Feature activation+3.50e-01

Pos logit contributions
ric+0.044
ries+0.042
els+0.040
aches+0.037
hat+0.036
Neg logit contributions
 Please-0.039
 unfair-0.038
 trouble-0.038
 guilty-0.037
 mean-0.037
Tokenled
Token ID992
Feature activation+1.26e+00

Pos logit contributions
ric+0.006
ries+0.006
aches+0.005
els+0.005
hat+0.005
Neg logit contributions
 unfair-0.007
 Please-0.007
 guilty-0.007
 trouble-0.007
 Give-0.007
Token and
Token ID290
Feature activation+2.51e+00

Pos logit contributions
ries+0.021
ric+0.021
els+0.020
hat+0.020
asted+0.020
Neg logit contributions
 trouble-0.024
 unfair-0.024
 Please-0.024
 impossible-0.023
 guilty-0.023
Token ate
Token ID15063
Feature activation+3.52e+00

Pos logit contributions
ric+0.052
ries+0.050
els+0.049
ants+0.046
aches+0.045
Neg logit contributions
 Please-0.043
 guilty-0.042
 trouble-0.042
 unfair-0.041
 sorry-0.040
Token all
Token ID477
Feature activation+5.86e+00

Pos logit contributions
ric+0.071
illas+0.062
els+0.061
ries+0.060
hat+0.059
Neg logit contributions
 trouble-0.067
 guilty-0.065
 sorry-0.064
 wrong-0.063
 unfair-0.062
Token of
Token ID286
Feature activation+9.96e+00

Pos logit contributions
ric+0.116
illas+0.105
ants+0.102
els+0.099
ries+0.097
Neg logit contributions
 trouble-0.110
 guilty-0.102
 unfair-0.102
 wrong-0.101
 stupid-0.099
Token the
Token ID262
Feature activation+3.05e+00

Pos logit contributions
bel+0.219
packs+0.205
illas+0.201
ric+0.200
hat+0.193
Neg logit contributions
 trouble-0.170
 right-0.142
 stupid-0.139
 truth-0.135
 bad-0.134
Token yogurt
Token ID32132
Feature activation+1.15e+00

Pos logit contributions
ric+0.058
ries+0.056
els+0.054
aches+0.052
asted+0.052
Neg logit contributions
 trouble-0.055
 Please-0.054
 unfair-0.054
 guilty-0.052
 impossible-0.051
Token.
Token ID13
Feature activation-5.28e-01

Pos logit contributions
ric+0.026
ries+0.023
els+0.023
asted+0.023
hat+0.021
Neg logit contributions
 trouble-0.017
 Please-0.017
 guilty-0.017
 unfair-0.017
 impossible-0.017
Token\n
Token ID198
Feature activation+6.31e+00

Pos logit contributions
 Please+0.009
 Give+0.008
 trouble+0.008
 guilty+0.008
 Can+0.008
Neg logit contributions
ries-0.011
ric-0.011
els-0.011
aches-0.011
hat-0.010
Token\n
Token ID198
Feature activation+5.50e+00

Pos logit contributions
ries+0.176
els+0.172
ric+0.169
ishes+0.169
illas+0.162
Neg logit contributions
 Please-0.060
 Give-0.058
 sorry-0.055
 trouble-0.054
 You-0.052
Token Then
Token ID3244
Feature activation-2.86e-01

Pos logit contributions
 trouble+0.030
 Please+0.030
 unfair+0.030
 guilty+0.029
 impossible+0.028
Neg logit contributions
ric-0.037
ries-0.036
els-0.034
aches-0.033
asted-0.033
Token,
Token ID11
Feature activation+1.34e+00

Pos logit contributions
 Please+0.005
 unfair+0.005
 trouble+0.005
 guilty+0.005
 impossible+0.005
Neg logit contributions
ric-0.005
ries-0.005
els-0.005
aches-0.005
asted-0.005
Token they
Token ID484
Feature activation+1.06e+00

Pos logit contributions
ric+0.025
ries+0.022
els+0.022
aches+0.021
hat+0.021
Neg logit contributions
 trouble-0.027
 Please-0.026
 guilty-0.024
 unfair-0.024
 mean-0.024
Token wiped
Token ID21122
Feature activation+3.97e+00

Pos logit contributions
ric+0.025
ries+0.024
els+0.022
ants+0.021
cks+0.021
Neg logit contributions
 mean-0.016
 guilty-0.015
 trouble-0.015
 Please-0.014
 unfair-0.014
Token all
Token ID477
Feature activation+6.08e+00

Pos logit contributions
ric+0.064
ries+0.060
aches+0.058
els+0.055
ants+0.054
Neg logit contributions
 trouble-0.088
 guilty-0.082
 unfair-0.081
 Please-0.081
 impossible-0.080
Token of
Token ID286
Feature activation+9.95e+00

Pos logit contributions
ric+0.119
ants+0.107
illas+0.104
ries+0.104
els+0.100
Neg logit contributions
 trouble-0.113
 guilty-0.107
 wrong-0.106
 stupid-0.105
 right-0.103
Token the
Token ID262
Feature activation+2.89e+00

Pos logit contributions
ric+0.188
hat+0.187
bel+0.173
pp+0.172
ries+0.169
Neg logit contributions
 trouble-0.197
 stupid-0.173
 bad-0.168
 foolish-0.168
 guilty-0.166
Token surfaces
Token ID16649
Feature activation+3.04e+00

Pos logit contributions
ric+0.051
ries+0.049
els+0.047
aches+0.046
asted+0.046
Neg logit contributions
 trouble-0.057
 Please-0.055
 unfair-0.055
 guilty-0.054
 mean-0.053
Token to
Token ID284
Feature activation+4.17e+00

Pos logit contributions
ric+0.061
ries+0.059
els+0.056
aches+0.055
asted+0.055
Neg logit contributions
 trouble-0.051
 Please-0.051
 unfair-0.050
 guilty-0.049
 impossible-0.048
Token make
Token ID787
Feature activation+2.25e+00

Pos logit contributions
ries+0.096
ric+0.091
els+0.090
ants+0.090
ites+0.088
Neg logit contributions
 trouble-0.062
 stupid-0.059
 sorry-0.059
 mean-0.059
 guilty-0.058
Token sure
Token ID1654
Feature activation+5.08e-01

Pos logit contributions
ric+0.040
ries+0.038
els+0.037
hat+0.036
aches+0.035
Neg logit contributions
 trouble-0.045
 unfair-0.043
 guilty-0.042
 Please-0.041
 impossible-0.041
Token her
Token ID607
Feature activation-2.11e-01

Pos logit contributions
ric+0.052
ries+0.050
aches+0.048
ants+0.047
els+0.046
Neg logit contributions
 trouble-0.060
 guilty-0.057
 Please-0.057
 sorry-0.057
 unfair-0.057
Token mom
Token ID1995
Feature activation+2.94e+00

Pos logit contributions
 trouble+0.004
 Please+0.004
 guilty+0.004
 unfair+0.004
 sorry+0.004
Neg logit contributions
ric-0.004
ries-0.004
els-0.004
asted-0.004
aches-0.004
Tokenmy
Token ID1820
Feature activation+1.72e+00

Pos logit contributions
ric+0.066
cks+0.065
ants+0.064
aches+0.063
irs+0.062
Neg logit contributions
 mean-0.038
 guilty-0.036
 right-0.035
 ok-0.035
 okay-0.035
Token shared
Token ID4888
Feature activation+3.15e+00

Pos logit contributions
ric+0.039
ries+0.034
els+0.034
ants+0.033
cks+0.033
Neg logit contributions
 guilty-0.026
 mean-0.025
 trouble-0.025
 unfair-0.024
 Please-0.024
Token all
Token ID477
Feature activation+6.42e+00

Pos logit contributions
ric+0.075
ants+0.066
illas+0.062
field+0.060
irs+0.060
Neg logit contributions
 trouble-0.055
 right-0.049
 wrong-0.048
 mean-0.047
 unfair-0.047
Token of
Token ID286
Feature activation+9.88e+00

Pos logit contributions
ric+0.122
ants+0.107
hat+0.104
illas+0.104
els+0.102
Neg logit contributions
 trouble-0.126
 wrong-0.119
 stupid-0.114
 mean-0.114
 unfair-0.113
Token their
Token ID511
Feature activation+1.97e+00

Pos logit contributions
ric+0.179
hat+0.174
bel+0.173
pp+0.169
ants+0.168
Neg logit contributions
 trouble-0.199
 bad-0.169
 unfair-0.169
 wrong-0.169
 stupid-0.167
Token cher
Token ID21382
Feature activation+3.65e+00

Pos logit contributions
ric+0.040
hat+0.036
ants+0.034
els+0.034
asted+0.033
Neg logit contributions
 trouble-0.036
 guilty-0.034
 unfair-0.033
 stupid-0.032
 foolish-0.032
Tokenries
Token ID1678
Feature activation+3.94e+00

Pos logit contributions
asted+0.050
wered+0.043
ric+0.042
growth+0.039
 unison+0.036
Neg logit contributions
 trouble-0.077
 wrong-0.071
 unfair-0.067
 foolish-0.066
 guilty-0.066
Token,
Token ID11
Feature activation+5.75e-02

Pos logit contributions
ric+0.096
asted+0.084
els+0.083
ries+0.081
ants+0.080
Neg logit contributions
 trouble-0.053
 guilty-0.051
 wrong-0.050
 Please-0.049
 right-0.048
Token making
Token ID1642
Feature activation+2.74e+00

Pos logit contributions
ric+0.001
els+0.001
ries+0.001
ants+0.001
irs+0.001
Neg logit contributions
 guilty-0.001
 trouble-0.001
 Please-0.001
 sorry-0.001
 unfair-0.001
Token,
Token ID11
Feature activation+1.78e-01

Pos logit contributions
ric+0.017
ries+0.017
els+0.016
aches+0.015
asted+0.015
Neg logit contributions
 trouble-0.013
 Please-0.013
 unfair-0.013
 guilty-0.012
 impossible-0.012
Token and
Token ID290
Feature activation+1.69e+00

Pos logit contributions
ric+0.004
ries+0.004
els+0.004
aches+0.003
asted+0.003
Neg logit contributions
 trouble-0.003
 Please-0.003
 unfair-0.003
 guilty-0.003
 impossible-0.003
Token showed
Token ID3751
Feature activation+1.77e+00

Pos logit contributions
ric+0.029
ries+0.028
els+0.027
aches+0.026
hat+0.025
Neg logit contributions
 trouble-0.034
 Please-0.034
 unfair-0.033
 guilty-0.033
 impossible-0.032
Token them
Token ID606
Feature activation+2.82e+00

Pos logit contributions
ric+0.022
ries+0.021
els+0.020
aches+0.019
asted+0.019
Neg logit contributions
 trouble-0.043
 Please-0.043
 unfair-0.042
 guilty-0.041
 impossible-0.041
Token all
Token ID477
Feature activation+5.28e+00

Pos logit contributions
ries+0.047
ric+0.046
els+0.044
hat+0.042
asted+0.041
Neg logit contributions
 trouble-0.057
 guilty-0.054
 unfair-0.054
 stupid-0.054
 sorry-0.054
Token of
Token ID286
Feature activation+9.87e+00

Pos logit contributions
ries+0.086
ric+0.085
ants+0.083
illas+0.081
hat+0.081
Neg logit contributions
 trouble-0.110
 wrong-0.102
 guilty-0.100
 unfair-0.100
 mean-0.100
Token the
Token ID262
Feature activation+3.15e+00

Pos logit contributions
ric+0.158
ries+0.152
hat+0.144
aches+0.137
els+0.134
Neg logit contributions
 trouble-0.224
 unfair-0.209
 mean-0.207
 stupid-0.205
 guilty-0.205
Token unique
Token ID3748
Feature activation+7.56e-01

Pos logit contributions
ric+0.057
ries+0.055
els+0.054
aches+0.053
asted+0.051
Neg logit contributions
 Please-0.059
 unfair-0.056
 Give-0.056
 trouble-0.054
 sorry-0.054
Token things
Token ID1243
Feature activation+8.09e-01

Pos logit contributions
ric+0.012
ries+0.012
els+0.012
aches+0.011
asted+0.011
Neg logit contributions
 Please-0.015
 unfair-0.015
 trouble-0.015
 Give-0.015
 guilty-0.014
Token the
Token ID262
Feature activation+1.88e+00

Pos logit contributions
ric+0.015
ries+0.014
els+0.013
asted+0.013
aches+0.013
Neg logit contributions
 trouble-0.015
 Please-0.015
 unfair-0.015
 guilty-0.015
 impossible-0.014
Token ocean
Token ID9151
Feature activation+3.90e-01

Pos logit contributions
ric+0.033
ries+0.031
els+0.029
aches+0.029
asted+0.029
Neg logit contributions
 Please-0.036
 Give-0.035
 unfair-0.035
 trouble-0.034
 guilty-0.033
Token\n
Token ID198
Feature activation+4.71e+00

Pos logit contributions
ries+0.123
ric+0.122
els+0.117
aches+0.107
asted+0.105
Neg logit contributions
 Please-0.093
 trouble-0.089
 Give-0.085
 unfair-0.085
 guilty-0.084
TokenCl
Token ID2601
Feature activation+6.51e-02

Pos logit contributions
ric+0.080
ries+0.078
els+0.073
hat+0.063
aches+0.063
Neg logit contributions
 Please-0.106
 unfair-0.100
 trouble-0.099
 Give-0.096
 mean-0.095
Tokenaire
Token ID7626
Feature activation+2.53e+00

Pos logit contributions
ric+0.001
ries+0.001
els+0.001
aches+0.001
asted+0.001
Neg logit contributions
 Please-0.001
 trouble-0.001
 unfair-0.001
 guilty-0.001
 Give-0.001
Token gathered
Token ID9272
Feature activation+2.72e+00

Pos logit contributions
ric+0.052
ries+0.052
els+0.048
aches+0.046
asted+0.046
Neg logit contributions
 Please-0.041
 trouble-0.040
 unfair-0.039
 guilty-0.038
 mean-0.038
Token all
Token ID477
Feature activation+5.52e+00

Pos logit contributions
ric+0.045
ries+0.041
els+0.038
aches+0.038
asted+0.037
Neg logit contributions
 trouble-0.059
 Please-0.057
 unfair-0.057
 guilty-0.056
 impossible-0.054
Token of
Token ID286
Feature activation+9.85e+00

Pos logit contributions
ric+0.111
illas+0.099
ants+0.098
aches+0.094
ries+0.093
Neg logit contributions
 trouble-0.104
 guilty-0.093
 stupid-0.091
 unfair-0.091
 wrong-0.091
Token her
Token ID607
Feature activation+1.57e+00

Pos logit contributions
ric+0.210
bel+0.184
illas+0.183
pp+0.180
hat+0.180
Neg logit contributions
 trouble-0.190
 unfair-0.159
 guilty-0.158
 stupid-0.157
 wrong-0.154
Token courage
Token ID11917
Feature activation-4.28e-01

Pos logit contributions
ric+0.032
ries+0.030
aches+0.029
asted+0.029
els+0.029
Neg logit contributions
 trouble-0.027
 unfair-0.026
 guilty-0.026
 Please-0.026
 mean-0.025
Token and
Token ID290
Feature activation+3.00e+00

Pos logit contributions
 trouble+0.007
 Please+0.007
 unfair+0.007
 guilty+0.007
 impossible+0.007
Neg logit contributions
ric-0.009
ries-0.008
els-0.008
aches-0.008
asted-0.008
Token slowly
Token ID6364
Feature activation+1.88e+00

Pos logit contributions
ric+0.059
ries+0.055
els+0.052
aches+0.050
ants+0.050
Neg logit contributions
 trouble-0.056
 Please-0.053
 unfair-0.053
 guilty-0.053
 mean-0.051
Token placed
Token ID4624
Feature activation+4.73e+00

Pos logit contributions
ric+0.040
ries+0.036
els+0.034
ants+0.033
aches+0.033
Neg logit contributions
 trouble-0.032
 guilty-0.031
 mean-0.030
 impossible-0.030
 Please-0.029
Token kitten
Token ID42143
Feature activation+6.04e-02

Pos logit contributions
ric+0.028
ries+0.027
els+0.026
hat+0.025
aches+0.025
Neg logit contributions
 Please-0.029
 unfair-0.028
 trouble-0.028
 Give-0.027
 impossible-0.026
Token,
Token ID11
Feature activation+7.25e-01

Pos logit contributions
ric+0.001
ries+0.001
els+0.001
aches+0.001
asted+0.001
Neg logit contributions
 trouble-0.001
 Please-0.001
 unfair-0.001
 guilty-0.001
 impossible-0.001
Token showing
Token ID4478
Feature activation+6.94e-01

Pos logit contributions
ries+0.016
ric+0.016
els+0.015
aches+0.015
ants+0.014
Neg logit contributions
 guilty-0.011
 Please-0.011
 trouble-0.011
 unfair-0.011
 mean-0.011
Token it
Token ID340
Feature activation+1.58e+00

Pos logit contributions
ric+0.012
ries+0.011
els+0.011
aches+0.011
asted+0.010
Neg logit contributions
 Please-0.014
 trouble-0.014
 unfair-0.013
 guilty-0.013
 impossible-0.013
Token all
Token ID477
Feature activation+5.10e+00

Pos logit contributions
ric+0.036
ries+0.036
els+0.034
asted+0.033
aches+0.033
Neg logit contributions
 trouble-0.022
 unfair-0.021
 Please-0.021
 guilty-0.021
 impossible-0.020
Token of
Token ID286
Feature activation+9.84e+00

Pos logit contributions
ric+0.092
ries+0.091
ants+0.087
els+0.086
illas+0.084
Neg logit contributions
 trouble-0.098
 wrong-0.092
 unfair-0.091
 guilty-0.090
 impossible-0.090
Token the
Token ID262
Feature activation+3.43e+00

Pos logit contributions
ric+0.199
ries+0.197
hat+0.183
els+0.179
illas+0.179
Neg logit contributions
 trouble-0.178
 unfair-0.157
 mean-0.156
 stupid-0.155
 guilty-0.152
Token new
Token ID649
Feature activation+1.55e+00

Pos logit contributions
ric+0.058
ries+0.057
els+0.056
aches+0.052
asted+0.051
Neg logit contributions
 Please-0.067
 unfair-0.064
 Give-0.062
 trouble-0.061
 sorry-0.061
Token places
Token ID4113
Feature activation+2.34e+00

Pos logit contributions
ric+0.028
ries+0.027
els+0.027
aches+0.025
hat+0.025
Neg logit contributions
 Please-0.029
 unfair-0.028
 trouble-0.027
 impossible-0.027
 guilty-0.027
Token he
Token ID339
Feature activation-1.34e-01

Pos logit contributions
ric+0.049
ries+0.048
els+0.046
asted+0.045
aches+0.045
Neg logit contributions
 trouble-0.037
 Please-0.037
 unfair-0.036
 guilty-0.035
 impossible-0.034
Token had
Token ID550
Feature activation-3.07e+00

Pos logit contributions
 trouble+0.002
 Please+0.002
 unfair+0.002
 guilty+0.002
 impossible+0.002
Neg logit contributions
ric-0.003
ries-0.003
els-0.003
aches-0.002
asted-0.002
Token Instead
Token ID5455
Feature activation+1.30e+00

Pos logit contributions
 Please+0.028
 trouble+0.028
 unfair+0.027
 guilty+0.027
 impossible+0.026
Neg logit contributions
ric-0.031
ries-0.030
els-0.029
aches-0.029
asted-0.028
Token,
Token ID11
Feature activation+1.12e+00

Pos logit contributions
ric+0.028
ries+0.027
els+0.026
aches+0.026
asted+0.026
Neg logit contributions
 trouble-0.020
 Please-0.020
 unfair-0.020
 guilty-0.019
 impossible-0.019
Token he
Token ID339
Feature activation+1.00e+00

Pos logit contributions
ric+0.020
ries+0.018
els+0.018
hat+0.018
aches+0.017
Neg logit contributions
 trouble-0.024
 Please-0.023
 guilty-0.022
 unfair-0.022
 sorry-0.021
Token took
Token ID1718
Feature activation+2.08e+00

Pos logit contributions
ric+0.021
ries+0.020
els+0.019
aches+0.018
asted+0.018
Neg logit contributions
 Please-0.017
 trouble-0.017
 unfair-0.016
 guilty-0.016
 mean-0.016
Token all
Token ID477
Feature activation+4.89e+00

Pos logit contributions
ric+0.030
ries+0.028
els+0.026
aches+0.026
asted+0.025
Neg logit contributions
 trouble-0.048
 Please-0.047
 unfair-0.047
 guilty-0.046
 impossible-0.045
Token of
Token ID286
Feature activation+9.84e+00

Pos logit contributions
ric+0.092
ants+0.082
aches+0.077
ries+0.077
illas+0.076
Neg logit contributions
 trouble-0.103
 guilty-0.091
 wrong-0.087
 unfair-0.086
 mean-0.086
Token the
Token ID262
Feature activation+2.78e+00

Pos logit contributions
ric+0.193
hat+0.179
bel+0.177
pp+0.173
ants+0.170
Neg logit contributions
 trouble-0.201
 guilty-0.169
 stupid-0.166
 sorry-0.164
 mean-0.162
Token little
Token ID1310
Feature activation+1.80e+00

Pos logit contributions
ries+0.052
els+0.050
ric+0.048
aches+0.045
hat+0.045
Neg logit contributions
 Please-0.051
 impossible-0.048
 unfair-0.048
 trouble-0.046
 Give-0.045
Token boy
Token ID2933
Feature activation+1.09e+00

Pos logit contributions
ries+0.037
els+0.034
 pump+0.031
packs+0.031
ric+0.031
Neg logit contributions
 unfair-0.028
 impossible-0.028
 Please-0.027
 Goodbye-0.027
 Give-0.026
Token's
Token ID338
Feature activation+2.76e-01

Pos logit contributions
ric+0.025
ries+0.024
els+0.023
aches+0.023
asted+0.023
Neg logit contributions
 trouble-0.016
 Please-0.015
 unfair-0.015
 guilty-0.015
 impossible-0.014
Token toys
Token ID14958
Feature activation+2.37e+00

Pos logit contributions
ric+0.005
ries+0.005
els+0.005
aches+0.005
asted+0.005
Neg logit contributions
 Please-0.005
 unfair-0.005
 trouble-0.005
 impossible-0.005
 guilty-0.004
Token dinner
Token ID8073
Feature activation+1.11e-01

Pos logit contributions
ric+0.112
els+0.101
ries+0.098
hat+0.098
packs+0.094
Neg logit contributions
 trouble-0.096
 sorry-0.089
 Please-0.087
 unfair-0.087
 guilty-0.085
Token!
Token ID0
Feature activation-9.18e-01

Pos logit contributions
ric+0.002
ries+0.002
els+0.002
aches+0.002
asted+0.002
Neg logit contributions
 trouble-0.002
 Please-0.002
 unfair-0.002
 guilty-0.002
 impossible-0.002
Token She
Token ID1375
Feature activation+1.78e+00

Pos logit contributions
 Please+0.015
 trouble+0.015
 unfair+0.015
 guilty+0.014
 Give+0.014
Neg logit contributions
ric-0.019
ries-0.019
els-0.018
aches-0.018
asted-0.017
Token ate
Token ID15063
Feature activation+2.88e+00

Pos logit contributions
ric+0.035
ries+0.034
els+0.032
aches+0.031
hat+0.031
Neg logit contributions
 trouble-0.031
 Please-0.031
 unfair-0.031
 guilty-0.030
 mean-0.029
Token all
Token ID477
Feature activation+5.30e+00

Pos logit contributions
ric+0.054
ries+0.050
els+0.049
aches+0.047
hat+0.047
Neg logit contributions
 trouble-0.055
 Please-0.053
 unfair-0.053
 guilty-0.052
 mean-0.051
Token of
Token ID286
Feature activation+9.81e+00

Pos logit contributions
ric+0.098
illas+0.085
els+0.084
ants+0.083
ries+0.081
Neg logit contributions
 trouble-0.110
 unfair-0.101
 guilty-0.101
 wrong-0.101
 stupid-0.099
Token her
Token ID607
Feature activation+1.25e+00

Pos logit contributions
ric+0.192
bel+0.176
els+0.168
illas+0.167
hat+0.167
Neg logit contributions
 trouble-0.199
 unfair-0.179
 wrong-0.177
 sorry-0.175
 guilty-0.172
Token broccoli
Token ID44653
Feature activation+2.23e+00

Pos logit contributions
ric+0.026
ries+0.026
els+0.025
aches+0.024
asted+0.024
Neg logit contributions
 Please-0.020
 trouble-0.019
 unfair-0.019
 guilty-0.018
 impossible-0.018
Token and
Token ID290
Feature activation+2.96e+00

Pos logit contributions
ric+0.049
ries+0.046
els+0.045
asted+0.044
aches+0.043
Neg logit contributions
 trouble-0.034
 Please-0.033
 guilty-0.033
 unfair-0.033
 impossible-0.032
Token felt
Token ID2936
Feature activation-2.93e+00

Pos logit contributions
ric+0.060
els+0.055
ries+0.055
ants+0.052
hat+0.051
Neg logit contributions
 trouble-0.053
 Please-0.051
 guilty-0.051
 unfair-0.050
 mean-0.048
Token strong
Token ID1913
Feature activation-2.33e+00

Pos logit contributions
 Please+0.046
 trouble+0.044
 Give+0.043
 unfair+0.043
 impossible+0.041
Neg logit contributions
ric-0.057
aches-0.057
asted-0.057
ries-0.057
hat-0.053
Token sc
Token ID629
Feature activation-3.37e+00

Pos logit contributions
ric+0.028
ries+0.028
els+0.026
aches+0.026
asted+0.025
Neg logit contributions
 trouble-0.022
 Please-0.022
 unfair-0.022
 guilty-0.021
 impossible-0.021
Tokenrun
Token ID5143
Feature activation+3.52e+00

Pos logit contributions
 Please+0.037
 Give+0.035
 trouble+0.035
 guilty+0.035
 unfair+0.034
Neg logit contributions
ric-0.088
ries-0.085
els-0.081
asted-0.080
aches-0.079
Tokenched
Token ID1740
Feature activation+2.53e+00

Pos logit contributions
ric+0.083
ries+0.078
aches+0.074
field+0.073
els+0.073
Neg logit contributions
 trouble-0.044
 Give-0.044
 wrong-0.041
 Please-0.041
 mean-0.040
Token up
Token ID510
Feature activation+3.59e+00

Pos logit contributions
ric+0.040
ries+0.036
aches+0.035
asted+0.035
hat+0.034
Neg logit contributions
 trouble-0.054
 Please-0.052
 guilty-0.052
 unfair-0.051
 mean-0.051
Token all
Token ID477
Feature activation+4.82e+00

Pos logit contributions
ric+0.072
ries+0.065
els+0.063
aches+0.062
asted+0.062
Neg logit contributions
 trouble-0.067
 Please-0.063
 unfair-0.062
 guilty-0.062
 mean-0.061
Token of
Token ID286
Feature activation+9.81e+00

Pos logit contributions
ric+0.103
ants+0.093
illas+0.090
hat+0.090
asted+0.090
Neg logit contributions
 trouble-0.085
 mean-0.079
 wrong-0.079
 guilty-0.077
 stupid-0.076
Token the
Token ID262
Feature activation+2.52e+00

Pos logit contributions
pp+0.222
hat+0.221
ric+0.216
ants+0.214
ites+0.214
Neg logit contributions
 trouble-0.164
 right-0.134
 stupid-0.129
 bad-0.122
 guilty-0.121
Token cloth
Token ID16270
Feature activation+1.11e+00

Pos logit contributions
ric+0.048
ries+0.046
els+0.044
asted+0.043
aches+0.043
Neg logit contributions
 trouble-0.046
 Please-0.045
 unfair-0.045
 guilty-0.044
 mean-0.043
Token in
Token ID287
Feature activation+6.08e+00

Pos logit contributions
ric+0.023
ries+0.023
els+0.022
asted+0.021
aches+0.021
Neg logit contributions
 trouble-0.018
 Please-0.017
 unfair-0.017
 guilty-0.017
 impossible-0.017
Token her
Token ID607
Feature activation+9.73e-02

Pos logit contributions
ric+0.114
ants+0.102
ries+0.102
aches+0.100
hat+0.097
Neg logit contributions
 trouble-0.139
 mean-0.120
 Please-0.119
 guilty-0.118
 stupid-0.117
Token hand
Token ID1021
Feature activation+4.88e-01

Pos logit contributions
ric+0.002
ries+0.002
els+0.002
asted+0.002
aches+0.002
Neg logit contributions
 trouble-0.002
 Please-0.002
 guilty-0.002
 unfair-0.002
 mean-0.002
Token.
Token ID13
Feature activation-1.09e+00

Pos logit contributions
 unfair+0.004
 trouble+0.004
 Please+0.004
 sorry+0.004
 guilty+0.004
Neg logit contributions
ries-0.003
els-0.003
ric-0.003
aches-0.003
hat-0.003
Token He
Token ID679
Feature activation+7.14e-01

Pos logit contributions
 Please+0.019
 trouble+0.019
 unfair+0.018
 guilty+0.018
 Give+0.018
Neg logit contributions
ries-0.021
ric-0.021
els-0.020
aches-0.020
asted-0.019
Token blinked
Token ID40360
Feature activation-7.62e-01

Pos logit contributions
ric+0.015
ries+0.014
els+0.014
aches+0.013
hat+0.013
Neg logit contributions
 Please-0.012
 trouble-0.012
 unfair-0.012
 guilty-0.011
 mean-0.011
Token his
Token ID465
Feature activation-2.20e+00

Pos logit contributions
 trouble+0.014
 Please+0.014
 unfair+0.014
 guilty+0.013
 impossible+0.013
Neg logit contributions
ric-0.014
ries-0.014
els-0.013
aches-0.013
asted-0.013
Token long
Token ID890
Feature activation-5.61e-01

Pos logit contributions
 trouble+0.038
 Please+0.037
 unfair+0.037
 guilty+0.036
 impossible+0.035
Neg logit contributions
ric-0.044
ries-0.043
els-0.041
aches-0.040
asted-0.039
Token eyel
Token ID29708
Feature activation-1.02e+01

Pos logit contributions
 trouble+0.010
 unfair+0.010
 Please+0.010
 guilty+0.010
 impossible+0.009
Neg logit contributions
ric-0.010
ries-0.010
els-0.010
aches-0.010
asted-0.010
Tokenashes
Token ID7465
Feature activation+1.05e+00

Pos logit contributions
 unfair+0.233
 trouble+0.231
 Please+0.224
 wrong+0.216
 mean+0.215
Neg logit contributions
ric-0.159
els-0.137
asted-0.131
ants-0.126
ries-0.124
Token and
Token ID290
Feature activation+2.02e+00

Pos logit contributions
ric+0.020
ries+0.020
els+0.019
aches+0.018
asted+0.018
Neg logit contributions
 unfair-0.018
 trouble-0.018
 Please-0.018
 guilty-0.017
 impossible-0.017
Token kept
Token ID4030
Feature activation-3.78e-01

Pos logit contributions
ric+0.042
ries+0.041
els+0.039
hat+0.037
ants+0.037
Neg logit contributions
 Please-0.035
 trouble-0.034
 guilty-0.034
 unfair-0.034
 mean-0.033
Token hopping
Token ID47153
Feature activation+1.29e-01

Pos logit contributions
 trouble+0.006
 Please+0.005
 unfair+0.005
 guilty+0.005
 mean+0.005
Neg logit contributions
ric-0.009
ries-0.008
els-0.008
aches-0.008
hat-0.008
Token.
Token ID13
Feature activation-5.53e-01

Pos logit contributions
ric+0.002
ries+0.002
aches+0.002
els+0.002
hat+0.002
Neg logit contributions
 unfair-0.003
 trouble-0.003
 Please-0.003
 guilty-0.002
 Give-0.002
Token theater
Token ID13766
Feature activation+1.15e+00

Pos logit contributions
ric+0.036
ries+0.033
asted+0.032
els+0.032
aches+0.030
Neg logit contributions
 unfair-0.044
 Please-0.044
 trouble-0.043
 impossible-0.042
 Give-0.042
Token and
Token ID290
Feature activation+1.71e+00

Pos logit contributions
ric+0.023
ries+0.023
els+0.022
aches+0.021
asted+0.021
Neg logit contributions
 trouble-0.019
 Please-0.019
 unfair-0.019
 guilty-0.018
 impossible-0.018
Token Larry
Token ID13633
Feature activation+8.37e-01

Pos logit contributions
ric+0.032
ries+0.031
els+0.030
hat+0.029
aches+0.029
Neg logit contributions
 trouble-0.031
 Please-0.031
 unfair-0.031
 guilty-0.030
 impossible-0.029
Token felt
Token ID2936
Feature activation-3.80e+00

Pos logit contributions
ric+0.016
ries+0.016
els+0.015
aches+0.015
asted+0.015
Neg logit contributions
 trouble-0.014
 Please-0.014
 unfair-0.014
 guilty-0.013
 impossible-0.013
Token his
Token ID465
Feature activation-2.25e+00

Pos logit contributions
 Give+0.053
 Please+0.053
 unfair+0.049
 trouble+0.047
 Come+0.046
Neg logit contributions
ric-0.081
asted-0.074
aches-0.074
ries-0.072
els-0.068
Token eyel
Token ID29708
Feature activation-1.02e+01

Pos logit contributions
 unfair+0.045
 Give+0.044
 Please+0.044
 trouble+0.041
 impossible+0.041
Neg logit contributions
ric-0.038
asted-0.031
ries-0.031
hat-0.031
aches-0.030
Tokenids
Token ID2340
Feature activation+7.91e-01

Pos logit contributions
 unfair+0.230
 trouble+0.224
 Please+0.219
 wrong+0.216
 mean+0.210
Neg logit contributions
ric-0.165
asted-0.136
els-0.136
ants-0.130
packs-0.126
Token become
Token ID1716
Feature activation-2.40e+00

Pos logit contributions
ric+0.016
ries+0.016
els+0.015
aches+0.015
asted+0.015
Neg logit contributions
 trouble-0.013
 Please-0.013
 unfair-0.013
 guilty-0.012
 impossible-0.012
Token heavy
Token ID4334
Feature activation-2.11e+00

Pos logit contributions
 Please+0.036
 unfair+0.036
 Give+0.036
 trouble+0.035
 guilty+0.033
Neg logit contributions
ric-0.051
ries-0.050
els-0.048
aches-0.048
hat-0.047
Token.
Token ID13
Feature activation-1.34e+00

Pos logit contributions
 unfair+0.050
 Please+0.047
 Give+0.047
 Goodbye+0.046
 liar+0.046
Neg logit contributions
ries-0.026
ric-0.025
els-0.024
packs-0.024
hat-0.024
Token He
Token ID679
Feature activation+1.33e+00

Pos logit contributions
 trouble+0.018
 unfair+0.017
 Please+0.017
 guilty+0.016
 mean+0.016
Neg logit contributions
ric-0.031
ries-0.031
els-0.030
asted-0.029
aches-0.029
Token long
Token ID890
Feature activation-1.52e+00

Pos logit contributions
 Please+0.034
 Give+0.033
 unfair+0.033
 trouble+0.033
 guilty+0.031
Neg logit contributions
ric-0.050
ries-0.049
aches-0.047
els-0.045
hat-0.045
Token until
Token ID1566
Feature activation+1.81e+00

Pos logit contributions
 unfair+0.029
 Please+0.028
 trouble+0.027
 guilty+0.026
 Give+0.026
Neg logit contributions
ries-0.026
ric-0.026
els-0.024
aches-0.024
hat-0.023
Token Chloe
Token ID29476
Feature activation-1.14e+00

Pos logit contributions
ric+0.037
ries+0.035
els+0.034
aches+0.033
hat+0.032
Neg logit contributions
 trouble-0.032
 Please-0.032
 unfair-0.030
 guilty-0.030
 impossible-0.029
Token felt
Token ID2936
Feature activation-3.36e+00

Pos logit contributions
 unfair+0.022
 trouble+0.021
 Please+0.021
 impossible+0.020
 guilty+0.020
Neg logit contributions
ric-0.020
ries-0.019
asted-0.018
aches-0.018
els-0.018
Token her
Token ID607
Feature activation-1.71e+00

Pos logit contributions
 Please+0.047
 Give+0.047
 unfair+0.044
 trouble+0.044
 liar+0.041
Neg logit contributions
ric-0.071
aches-0.069
ries-0.068
asted-0.068
hat-0.065
Token eyel
Token ID29708
Feature activation-9.73e+00

Pos logit contributions
 unfair+0.033
 Give+0.031
 Please+0.031
 trouble+0.030
 mean+0.029
Neg logit contributions
ric-0.029
ries-0.026
hat-0.026
aches-0.026
asted-0.025
Tokenids
Token ID2340
Feature activation+8.25e-01

Pos logit contributions
 unfair+0.211
 trouble+0.203
 Please+0.200
 wrong+0.197
 mean+0.195
Neg logit contributions
ric-0.161
asted-0.139
els-0.135
ries-0.128
ants-0.128
Token getting
Token ID1972
Feature activation-1.53e+00

Pos logit contributions
ric+0.017
ries+0.017
els+0.016
asted+0.015
aches+0.015
Neg logit contributions
 trouble-0.013
 Please-0.013
 unfair-0.013
 guilty-0.013
 impossible-0.012
Token heavy
Token ID4334
Feature activation-6.26e-02

Pos logit contributions
 Please+0.023
 trouble+0.023
 unfair+0.023
 Give+0.022
 guilty+0.021
Neg logit contributions
ries-0.033
ric-0.033
els-0.031
aches-0.030
asted-0.030
Token.
Token ID13
Feature activation-1.17e+00

Pos logit contributions
 unfair+0.001
 Please+0.001
 Give+0.001
 liar+0.001
 guilty+0.001
Neg logit contributions
ries-0.001
els-0.001
ric-0.001
hat-0.001
packs-0.001
Token She
Token ID1375
Feature activation+2.19e+00

Pos logit contributions
 Please+0.021
 trouble+0.020
 unfair+0.020
 guilty+0.019
 Give+0.019
Neg logit contributions
ric-0.023
ries-0.023
els-0.022
aches-0.021
asted-0.021
Token the
Token ID262
Feature activation+2.08e+00

Pos logit contributions
ric+0.063
ries+0.059
aches+0.058
ants+0.057
asted+0.057
Neg logit contributions
 trouble-0.046
 Please-0.045
 guilty-0.042
 sorry-0.042
 mean-0.042
Token black
Token ID2042
Feature activation+6.31e-01

Pos logit contributions
ric+0.038
ries+0.037
els+0.035
aches+0.034
asted+0.034
Neg logit contributions
 trouble-0.039
 Please-0.038
 unfair-0.038
 guilty-0.037
 mean-0.036
Token mascara
Token ID48430
Feature activation+1.67e-01

Pos logit contributions
ric+0.012
ries+0.012
els+0.011
aches+0.011
asted+0.010
Neg logit contributions
 trouble-0.011
 unfair-0.011
 Please-0.011
 guilty-0.011
 impossible-0.011
Token on
Token ID319
Feature activation+6.11e+00

Pos logit contributions
ric+0.004
ries+0.003
els+0.003
asted+0.003
aches+0.003
Neg logit contributions
 trouble-0.003
 Please-0.003
 unfair-0.003
 guilty-0.003
 impossible-0.002
Token her
Token ID607
Feature activation-7.43e-01

Pos logit contributions
ric+0.119
illas+0.103
ries+0.101
ites+0.098
packs+0.096
Neg logit contributions
 Please-0.121
 trouble-0.119
 mean-0.118
 unfair-0.114
 stupid-0.113
Token eyel
Token ID29708
Feature activation-9.55e+00

Pos logit contributions
 trouble+0.012
 Please+0.012
 unfair+0.012
 guilty+0.012
 mean+0.011
Neg logit contributions
ric-0.015
ries-0.015
els-0.014
aches-0.014
asted-0.014
Tokenashes
Token ID7465
Feature activation+2.60e+00

Pos logit contributions
 unfair+0.188
 trouble+0.186
 Please+0.180
 wrong+0.174
 mean+0.171
Neg logit contributions
ric-0.175
els-0.157
asted-0.153
ries-0.149
ants-0.149
Token.
Token ID13
Feature activation-3.07e+00

Pos logit contributions
ric+0.054
ries+0.053
els+0.050
aches+0.049
asted+0.049
Neg logit contributions
 trouble-0.042
 Please-0.041
 unfair-0.040
 guilty-0.039
 mean-0.038
Token She
Token ID1375
Feature activation+1.86e+00

Pos logit contributions
 Please+0.049
 trouble+0.048
 unfair+0.046
 Give+0.046
 sorry+0.045
Neg logit contributions
ric-0.066
ries-0.065
els-0.063
aches-0.062
asted-0.060
Token makes
Token ID1838
Feature activation+1.80e+00

Pos logit contributions
ric+0.042
ries+0.041
els+0.040
aches+0.039
asted+0.039
Neg logit contributions
 trouble-0.026
 Please-0.026
 unfair-0.025
 guilty-0.024
 impossible-0.024
Token a
Token ID257
Feature activation-2.48e-01

Pos logit contributions
ric+0.032
ries+0.032
els+0.030
hat+0.029
aches+0.029
Neg logit contributions
 trouble-0.034
 unfair-0.033
 Please-0.033
 guilty-0.032
 mean-0.032
Token THE
Token ID3336
Feature activation+4.92e-03

Pos logit contributions
 Please+0.051
 trouble+0.050
 unfair+0.049
 guilty+0.048
 Give+0.048
Neg logit contributions
ric-0.053
ries-0.052
els-0.050
asted-0.049
aches-0.049
Token BE
Token ID9348
Feature activation-4.86e+00

Pos logit contributions
ric+0.000
ries+0.000
els+0.000
asted+0.000
aches+0.000
Neg logit contributions
 Please-0.000
 trouble-0.000
 unfair-0.000
 Give-0.000
 guilty-0.000
TokenLL
Token ID3069
Feature activation-5.57e+00

Pos logit contributions
 trouble+0.083
 unfair+0.080
 Please+0.079
 guilty+0.078
 impossible+0.077
Neg logit contributions
ric-0.096
ries-0.093
els-0.089
aches-0.088
asted-0.087
Token WHEN
Token ID42099
Feature activation-6.02e+00

Pos logit contributions
 trouble+0.109
 guilty+0.104
 unfair+0.104
 impossible+0.103
 fault+0.098
Neg logit contributions
ric-0.090
ries-0.086
hat-0.083
els-0.082
aches-0.082
Token YOU
Token ID7013
Feature activation-5.25e+00

Pos logit contributions
 guilty+0.090
 trouble+0.089
 unfair+0.089
 Please+0.087
 impossible+0.085
Neg logit contributions
ries-0.130
els-0.126
ric-0.125
aches-0.119
ants-0.114
Token FE
Token ID18630
Feature activation-9.23e+00

Pos logit contributions
 trouble+0.098
 Please+0.097
 unfair+0.096
 guilty+0.093
 impossible+0.091
Neg logit contributions
ric-0.096
ries-0.092
els-0.089
aches-0.087
asted-0.086
TokenEL
Token ID3698
Feature activation-4.19e+00

Pos logit contributions
 trouble+0.175
 Please+0.160
 guilty+0.158
 unfair+0.154
 mean+0.148
Neg logit contributions
ric-0.173
els-0.157
aches-0.155
ries-0.155
asted-0.154
Token LIKE
Token ID34178
Feature activation-3.31e+00

Pos logit contributions
 Please+0.087
 trouble+0.086
 unfair+0.086
 guilty+0.084
 sorry+0.083
Neg logit contributions
ries-0.065
ric-0.064
els-0.061
aches-0.059
hat-0.057
Token IT
Token ID7283
Feature activation-1.43e+00

Pos logit contributions
 trouble+0.053
 Please+0.052
 unfair+0.052
 guilty+0.050
 impossible+0.049
Neg logit contributions
ric-0.069
ries-0.067
els-0.064
aches-0.063
asted-0.062
Token".
Token ID1911
Feature activation+1.11e-01

Pos logit contributions
 trouble+0.021
 Please+0.021
 unfair+0.021
 guilty+0.020
 impossible+0.020
Neg logit contributions
ric-0.030
ries-0.030
els-0.029
aches-0.029
hat-0.028
Token\n
Token ID198
Feature activation+5.70e+00

Pos logit contributions
ric+0.002
ries+0.002
els+0.002
aches+0.002
asted+0.002
Neg logit contributions
 trouble-0.002
 Please-0.002
 unfair-0.002
 guilty-0.002
 impossible-0.002
Token She
Token ID1375
Feature activation+1.31e+00

Pos logit contributions
 Please+0.016
 trouble+0.015
 Give+0.015
 unfair+0.014
 guilty+0.014
Neg logit contributions
ric-0.019
ries-0.019
els-0.018
aches-0.018
cking-0.017
Token put
Token ID1234
Feature activation+3.27e+00

Pos logit contributions
ric+0.027
ries+0.026
els+0.024
aches+0.024
hat+0.023
Neg logit contributions
 Please-0.022
 unfair-0.022
 trouble-0.021
 mean-0.021
 guilty-0.021
Token some
Token ID617
Feature activation+2.50e+00

Pos logit contributions
ric+0.071
ries+0.062
ants+0.061
asted+0.061
aches+0.060
Neg logit contributions
 trouble-0.056
 guilty-0.053
 mean-0.053
 unfair-0.052
 Please-0.052
Token on
Token ID319
Feature activation+5.27e+00

Pos logit contributions
ric+0.049
ries+0.045
els+0.043
asted+0.042
aches+0.042
Neg logit contributions
 trouble-0.046
 Please-0.045
 unfair-0.045
 guilty-0.044
 mean-0.043
Token her
Token ID607
Feature activation-1.03e+00

Pos logit contributions
ric+0.086
ries+0.077
aches+0.071
asted+0.071
els+0.070
Neg logit contributions
 trouble-0.116
 Please-0.112
 unfair-0.111
 mean-0.110
 guilty-0.110
Token eyel
Token ID29708
Feature activation-9.16e+00

Pos logit contributions
 trouble+0.019
 Please+0.019
 unfair+0.019
 guilty+0.018
 impossible+0.018
Neg logit contributions
ric-0.019
ries-0.019
els-0.018
aches-0.017
asted-0.017
Tokenids
Token ID2340
Feature activation+1.03e+00

Pos logit contributions
 trouble+0.195
 unfair+0.194
 Please+0.189
 wrong+0.184
 mean+0.182
Neg logit contributions
ric-0.152
els-0.138
ries-0.133
asted-0.132
ants-0.128
Token and
Token ID290
Feature activation+2.18e+00

Pos logit contributions
ric+0.024
ries+0.023
asted+0.022
els+0.022
aches+0.022
Neg logit contributions
 trouble-0.014
 Please-0.014
 guilty-0.014
 impossible-0.014
 mean-0.013
Token cheeks
Token ID25839
Feature activation+1.56e-01

Pos logit contributions
ric+0.049
ide+0.042
ries+0.041
illas+0.039
ants+0.039
Neg logit contributions
 Please-0.038
 trouble-0.038
 mean-0.037
 guilty-0.036
 unfair-0.036
Token.
Token ID13
Feature activation-8.74e-01

Pos logit contributions
ric+0.004
ries+0.003
els+0.003
asted+0.003
aches+0.003
Neg logit contributions
 Please-0.002
 trouble-0.002
 guilty-0.002
 unfair-0.002
 mean-0.002
Token It
Token ID632
Feature activation+9.45e-01

Pos logit contributions
 Please+0.013
 trouble+0.013
 Give+0.013
 unfair+0.012
 guilty+0.012
Neg logit contributions
ric-0.019
ries-0.019
els-0.018
aches-0.018
cking-0.017
Token.
Token ID13
Feature activation-7.71e-01

Pos logit contributions
 Please+0.050
 unfair+0.050
 trouble+0.049
 guilty+0.048
 Give+0.047
Neg logit contributions
ric-0.057
ries-0.056
els-0.053
aches-0.052
hat-0.051
Token Jill
Token ID22772
Feature activation+8.96e-01

Pos logit contributions
 trouble+0.015
 Please+0.015
 unfair+0.014
 guilty+0.014
 impossible+0.014
Neg logit contributions
ric-0.014
ries-0.014
els-0.013
aches-0.012
asted-0.012
Token hopped
Token ID45230
Feature activation-1.53e+00

Pos logit contributions
ric+0.020
ries+0.020
els+0.019
asted+0.018
irs+0.018
Neg logit contributions
 Please-0.013
 mean-0.013
 guilty-0.012
 trouble-0.012
 unfair-0.012
Token for
Token ID329
Feature activation+3.91e+00

Pos logit contributions
 unfair+0.035
 trouble+0.034
 Please+0.033
 liar+0.033
 fault+0.032
Neg logit contributions
hat-0.021
At-0.019
 every-0.019
ries-0.018
aches-0.018
Token what
Token ID644
Feature activation-1.37e+00

Pos logit contributions
ric+0.085
ries+0.084
els+0.080
aches+0.078
asted+0.078
Neg logit contributions
 trouble-0.059
 Please-0.058
 unfair-0.058
 guilty-0.055
 impossible-0.054
Token seemed
Token ID3947
Feature activation-9.09e+00

Pos logit contributions
 Please+0.027
 unfair+0.026
 trouble+0.025
 Give+0.025
 guilty+0.025
Neg logit contributions
ries-0.023
ric-0.022
aches-0.022
els-0.021
asted-0.021
Token like
Token ID588
Feature activation-5.18e-01

Pos logit contributions
 Please+0.148
 Give+0.148
 trouble+0.133
 liar+0.131
 Promise+0.130
Neg logit contributions
ric-0.171
aches-0.162
 every-0.151
asted-0.151
bel-0.149
Token miles
Token ID4608
Feature activation-5.27e-01

Pos logit contributions
 unfair+0.009
 Please+0.009
 trouble+0.009
 guilty+0.008
 Give+0.008
Neg logit contributions
ric-0.010
ries-0.010
els-0.009
aches-0.009
asted-0.009
Token until
Token ID1566
Feature activation+2.04e+00

Pos logit contributions
 Please+0.009
 trouble+0.009
 unfair+0.009
 guilty+0.008
 Give+0.008
Neg logit contributions
ric-0.010
ries-0.010
els-0.010
aches-0.009
hat-0.009
Token she
Token ID673
Feature activation-1.09e+00

Pos logit contributions
ric+0.032
ries+0.032
els+0.030
asted+0.029
aches+0.029
Neg logit contributions
 Please-0.042
 unfair-0.041
 trouble-0.041
 guilty-0.040
 Give-0.039
Token spotted
Token ID13489
Feature activation+1.67e+00

Pos logit contributions
 trouble+0.018
 Please+0.018
 guilty+0.017
 unfair+0.017
 mean+0.017
Neg logit contributions
ric-0.022
ries-0.022
els-0.021
ants-0.020
aches-0.020
Token asked
Token ID1965
Feature activation+1.05e+00

Pos logit contributions
ric+0.015
ries+0.014
els+0.013
aches+0.013
asted+0.013
Neg logit contributions
 trouble-0.015
 Please-0.015
 unfair-0.014
 guilty-0.014
 mean-0.014
Token him
Token ID683
Feature activation-1.64e+00

Pos logit contributions
ries+0.016
ric+0.015
els+0.014
asted+0.014
aches+0.014
Neg logit contributions
 unfair-0.022
 trouble-0.022
 Please-0.022
 impossible-0.022
 guilty-0.022
Token,
Token ID11
Feature activation-1.01e+00

Pos logit contributions
 trouble+0.027
 Please+0.027
 unfair+0.026
 guilty+0.025
 impossible+0.025
Neg logit contributions
ric-0.032
ries-0.032
els-0.031
aches-0.031
asted-0.030
Token "
Token ID366
Feature activation+1.43e+00

Pos logit contributions
 trouble+0.015
 unfair+0.015
 Please+0.014
 guilty+0.014
 impossible+0.014
Neg logit contributions
ric-0.022
ries-0.022
els-0.021
asted-0.020
aches-0.020
TokenWhat
Token ID2061
Feature activation-4.44e+00

Pos logit contributions
ric+0.026
ries+0.025
els+0.024
aches+0.024
asted+0.023
Neg logit contributions
 Please-0.027
 trouble-0.027
 unfair-0.026
 guilty-0.025
 Give-0.025
Token did
Token ID750
Feature activation-8.91e+00

Pos logit contributions
 impossible+0.079
 Please+0.078
 guilty+0.078
 unfair+0.074
 trouble+0.074
Neg logit contributions
ries-0.075
ric-0.074
aches-0.072
els-0.072
packs-0.071
Token you
Token ID345
Feature activation-5.16e+00

Pos logit contributions
 trouble+0.125
 Please+0.123
 unfair+0.123
 guilty+0.119
 impossible+0.117
Neg logit contributions
ric-0.198
ries-0.197
els-0.188
aches-0.186
asted-0.182
Token do
Token ID466
Feature activation-5.89e+00

Pos logit contributions
 trouble+0.081
 Please+0.080
 unfair+0.079
 guilty+0.076
 impossible+0.074
Neg logit contributions
ric-0.109
ries-0.105
els-0.102
aches-0.101
asted-0.100
Token with
Token ID351
Feature activation+8.56e-02

Pos logit contributions
 trouble+0.121
 guilty+0.121
 unfair+0.120
 Please+0.115
 Goodbye+0.114
Neg logit contributions
aches-0.080
ries-0.078
hat-0.076
ric-0.075
els-0.074
Token your
Token ID534
Feature activation-2.80e+00

Pos logit contributions
ric+0.002
asted+0.002
hat+0.002
ries+0.002
bel+0.002
Neg logit contributions
 trouble-0.001
 mean-0.001
 stupid-0.001
 wrong-0.001
 unfair-0.001
Token chain
Token ID6333
Feature activation-2.31e+00

Pos logit contributions
 Please+0.051
 trouble+0.051
 unfair+0.051
 guilty+0.049
 impossible+0.048
Neg logit contributions
ric-0.052
ries-0.051
els-0.048
aches-0.047
asted-0.046
Token walked
Token ID6807
Feature activation-1.27e+00

Pos logit contributions
ric+0.011
ries+0.011
els+0.011
aches+0.010
asted+0.010
Neg logit contributions
 trouble-0.014
 Please-0.014
 unfair-0.014
 guilty-0.014
 impossible-0.014
Token and
Token ID290
Feature activation+2.22e+00

Pos logit contributions
 unfair+0.025
 Please+0.025
 trouble+0.024
 guilty+0.023
 Give+0.023
Neg logit contributions
ries-0.021
ric-0.021
hat-0.021
els-0.020
aches-0.020
Token walked
Token ID6807
Feature activation-9.14e-01

Pos logit contributions
ric+0.043
ries+0.038
els+0.037
ants+0.036
aches+0.036
Neg logit contributions
 Please-0.042
 trouble-0.041
 guilty-0.041
 mean-0.040
 unfair-0.040
Token for
Token ID329
Feature activation+4.25e+00

Pos logit contributions
 unfair+0.021
 Please+0.020
 trouble+0.020
 guilty+0.019
 Give+0.019
Neg logit contributions
ric-0.013
ries-0.012
hat-0.012
aches-0.012
els-0.012
Token what
Token ID644
Feature activation-1.43e+00

Pos logit contributions
ric+0.095
ries+0.093
els+0.089
aches+0.087
asted+0.086
Neg logit contributions
 trouble-0.063
 Please-0.062
 unfair-0.061
 guilty-0.058
 impossible-0.057
Token seemed
Token ID3947
Feature activation-8.89e+00

Pos logit contributions
 Please+0.025
 unfair+0.025
 trouble+0.024
 guilty+0.024
 impossible+0.024
Neg logit contributions
ries-0.026
ric-0.026
aches-0.025
els-0.025
hat-0.024
Token like
Token ID588
Feature activation-7.70e-01

Pos logit contributions
 Give+0.164
 Please+0.162
 Promise+0.148
 Come+0.145
 trouble+0.143
Neg logit contributions
ric-0.142
aches-0.141
 every-0.134
packs-0.125
 how-0.124
Token forever
Token ID8097
Feature activation-1.56e+00

Pos logit contributions
 unfair+0.014
 Please+0.014
 trouble+0.013
 guilty+0.013
 Give+0.013
Neg logit contributions
ric-0.013
ries-0.013
els-0.013
aches-0.013
asted-0.012
Token,
Token ID11
Feature activation-2.35e-01

Pos logit contributions
 trouble+0.025
 Please+0.025
 unfair+0.024
 guilty+0.024
 impossible+0.023
Neg logit contributions
ric-0.032
ries-0.032
els-0.030
aches-0.030
asted-0.029
Token until
Token ID1566
Feature activation+2.25e+00

Pos logit contributions
 unfair+0.004
 Please+0.004
 Give+0.004
 trouble+0.004
 happens+0.004
Neg logit contributions
ric-0.005
ries-0.004
els-0.004
asted-0.004
aches-0.004
Token he
Token ID339
Feature activation-1.17e+00

Pos logit contributions
ric+0.046
ries+0.045
els+0.043
aches+0.042
asted+0.042
Neg logit contributions
 Please-0.036
 trouble-0.036
 unfair-0.036
 guilty-0.034
 impossible-0.034
TokenThey
Token ID2990
Feature activation+1.17e+00

Pos logit contributions
ries+0.079
ric+0.078
els+0.071
ishes+0.066
aches+0.065
Neg logit contributions
 Please-0.110
 Give-0.105
 trouble-0.104
 unfair-0.104
 mean-0.103
Token rode
Token ID22075
Feature activation+1.03e+00

Pos logit contributions
ries+0.026
ric+0.024
els+0.023
ants+0.022
aches+0.021
Neg logit contributions
 mean-0.019
 guilty-0.018
 stupid-0.017
 impossible-0.017
 unfair-0.017
Token around
Token ID1088
Feature activation+5.42e+00

Pos logit contributions
ric+0.017
ries+0.017
els+0.016
aches+0.016
asted+0.015
Neg logit contributions
 trouble-0.020
 Please-0.020
 unfair-0.020
 guilty-0.020
 impossible-0.019
Token for
Token ID329
Feature activation+4.57e+00

Pos logit contributions
ric+0.078
ries+0.077
els+0.071
aches+0.071
hat+0.068
Neg logit contributions
 trouble-0.125
 Please-0.121
 unfair-0.118
 guilty-0.118
 impossible-0.117
Token what
Token ID644
Feature activation-4.73e-01

Pos logit contributions
ric+0.093
ries+0.091
els+0.087
aches+0.085
hat+0.083
Neg logit contributions
 trouble-0.077
 Please-0.075
 unfair-0.075
 guilty-0.072
 impossible-0.071
Token seemed
Token ID3947
Feature activation-8.88e+00

Pos logit contributions
 trouble+0.009
 Please+0.009
 unfair+0.008
 guilty+0.008
 impossible+0.008
Neg logit contributions
ric-0.009
ries-0.009
els-0.008
aches-0.008
asted-0.008
Token like
Token ID588
Feature activation-1.29e-01

Pos logit contributions
 Give+0.169
 Please+0.160
 Come+0.148
 Promise+0.147
 liar+0.145
Neg logit contributions
ric-0.137
 every-0.130
aches-0.125
 how-0.124
 their-0.124
Token hours
Token ID2250
Feature activation+1.11e+00

Pos logit contributions
 Please+0.002
 unfair+0.002
 trouble+0.002
 guilty+0.002
 Give+0.002
Neg logit contributions
ric-0.003
ries-0.002
els-0.002
asted-0.002
aches-0.002
Token.
Token ID13
Feature activation-1.51e+00

Pos logit contributions
ric+0.023
ries+0.022
els+0.022
aches+0.021
asted+0.021
Neg logit contributions
 trouble-0.018
 Please-0.017
 unfair-0.017
 guilty-0.017
 impossible-0.016
Token When
Token ID1649
Feature activation+9.52e-01

Pos logit contributions
 trouble+0.024
 Please+0.024
 unfair+0.023
 guilty+0.023
 impossible+0.022
Neg logit contributions
ric-0.032
ries-0.031
els-0.030
aches-0.030
asted-0.029
Token it
Token ID340
Feature activation-5.62e-01

Pos logit contributions
ric+0.017
ries+0.016
els+0.015
aches+0.015
hat+0.014
Neg logit contributions
 trouble-0.020
 Please-0.019
 unfair-0.019
 guilty-0.018
 impossible-0.018
Token asked
Token ID1965
Feature activation+9.91e-01

Pos logit contributions
ric+0.028
ries+0.026
aches+0.025
els+0.025
asted+0.025
Neg logit contributions
 trouble-0.022
 Please-0.022
 unfair-0.022
 mean-0.022
 guilty-0.022
Token him
Token ID683
Feature activation-1.65e+00

Pos logit contributions
ries+0.016
ric+0.016
els+0.015
aches+0.015
asted+0.014
Neg logit contributions
 Please-0.020
 unfair-0.020
 trouble-0.020
 impossible-0.019
 guilty-0.019
Token,
Token ID11
Feature activation-1.23e+00

Pos logit contributions
 trouble+0.033
 Please+0.032
 unfair+0.032
 guilty+0.031
 impossible+0.030
Neg logit contributions
ries-0.027
ric-0.027
els-0.026
aches-0.026
hat-0.025
Token "
Token ID366
Feature activation+1.17e+00

Pos logit contributions
 trouble+0.019
 unfair+0.018
 Please+0.018
 impossible+0.017
 guilty+0.017
Neg logit contributions
ric-0.026
ries-0.026
els-0.025
asted-0.024
aches-0.024
TokenWhat
Token ID2061
Feature activation-4.45e+00

Pos logit contributions
ric+0.020
ries+0.019
els+0.018
aches+0.018
asted+0.018
Neg logit contributions
 trouble-0.024
 Please-0.024
 unfair-0.023
 guilty-0.022
 mean-0.022
Token happened
Token ID3022
Feature activation-8.87e+00

Pos logit contributions
 impossible+0.084
 Please+0.083
 guilty+0.080
 unfair+0.079
 trouble+0.077
Neg logit contributions
ries-0.072
packs-0.069
ric-0.068
els-0.068
ons-0.067
Token,
Token ID11
Feature activation-2.87e+00

Pos logit contributions
 unfair+0.192
 consequences+0.185
 impossible+0.183
 liar+0.181
 fault+0.181
Neg logit contributions
ries-0.109
ie-0.104
hat-0.102
ric-0.101
bel-0.095
Token Tim
Token ID5045
Feature activation-1.32e+00

Pos logit contributions
 trouble+0.053
 Please+0.053
 unfair+0.052
 guilty+0.051
 impossible+0.051
Neg logit contributions
ries-0.051
ric-0.050
els-0.047
aches-0.046
asted-0.045
Tokenmy
Token ID1820
Feature activation-3.60e+00

Pos logit contributions
 trouble+0.031
 Please+0.031
 unfair+0.030
 guilty+0.030
 impossible+0.029
Neg logit contributions
ric-0.018
ries-0.017
els-0.016
aches-0.015
asted-0.015
Token?
Token ID30
Feature activation-5.70e+00

Pos logit contributions
 trouble+0.069
 unfair+0.068
 impossible+0.068
 Please+0.067
 guilty+0.067
Neg logit contributions
ries-0.061
ric-0.059
hat-0.057
els-0.056
aches-0.054
Token Why
Token ID4162
Feature activation-4.11e+00

Pos logit contributions
 trouble+0.072
 unfair+0.066
 impossible+0.064
 consequences+0.062
 guilty+0.059
Neg logit contributions
ric-0.143
ries-0.133
els-0.130
ants-0.125
illas-0.124
Token o
Token ID267
Feature activation+8.58e-01

Pos logit contributions
ric+0.042
ries+0.041
els+0.038
aches+0.038
asted+0.037
Neg logit contributions
 trouble-0.058
 Please-0.057
 unfair-0.057
 guilty-0.055
 impossible-0.055
Tokenat
Token ID265
Feature activation+2.79e+00

Pos logit contributions
nam+0.015
ries+0.014
els+0.013
ck+0.012
irs+0.012
Neg logit contributions
 Come-0.013
umpy-0.013
 Just-0.013
 trouble-0.013
ANG-0.013
Token field
Token ID2214
Feature activation+4.49e+00

Pos logit contributions
ric+0.057
ries+0.055
aches+0.054
asted+0.053
els+0.052
Neg logit contributions
 trouble-0.042
 impossible-0.040
 guilty-0.039
 stupid-0.039
 wrong-0.038
Token for
Token ID329
Feature activation+5.01e+00

Pos logit contributions
ries+0.096
ric+0.095
els+0.091
aches+0.088
asted+0.086
Neg logit contributions
 trouble-0.070
 impossible-0.066
 Please-0.066
 guilty-0.065
 unfair-0.064
Token what
Token ID644
Feature activation-4.39e-01

Pos logit contributions
ric+0.098
ries+0.095
hat+0.092
els+0.092
ants+0.090
Neg logit contributions
 trouble-0.100
 unfair-0.086
 guilty-0.085
 stupid-0.085
 impossible-0.084
Token seemed
Token ID3947
Feature activation-8.76e+00

Pos logit contributions
 trouble+0.008
 Please+0.008
 unfair+0.008
 guilty+0.008
 impossible+0.008
Neg logit contributions
ric-0.008
ries-0.008
els-0.007
aches-0.007
asted-0.007
Token like
Token ID588
Feature activation-5.31e-02

Pos logit contributions
 Give+0.149
 Please+0.143
 Come+0.131
 Promise+0.130
 trouble+0.130
Neg logit contributions
ric-0.162
aches-0.145
bel-0.142
 every-0.141
asted-0.138
Token forever
Token ID8097
Feature activation-8.01e-01

Pos logit contributions
 Please+0.001
 trouble+0.001
 unfair+0.001
 guilty+0.001
 impossible+0.001
Neg logit contributions
ric-0.001
ries-0.001
els-0.001
aches-0.001
asted-0.001
Token.
Token ID13
Feature activation-3.15e-01

Pos logit contributions
 trouble+0.012
 Please+0.012
 unfair+0.012
 guilty+0.011
 impossible+0.011
Neg logit contributions
ric-0.017
ries-0.017
els-0.016
aches-0.016
asted-0.016
Token\n
Token ID198
Feature activation+6.23e+00

Pos logit contributions
 Please+0.005
 trouble+0.005
 Give+0.005
 guilty+0.005
 unfair+0.005
Neg logit contributions
ries-0.006
els-0.006
ric-0.006
aches-0.006
hat-0.006
Token\n
Token ID198
Feature activation+5.25e+00

Pos logit contributions
ries+0.153
els+0.147
ishes+0.139
ric+0.139
pp+0.132
Neg logit contributions
 trouble-0.086
 Give-0.079
 Please-0.077
 sorry-0.076
 guilty-0.075
Token They
Token ID1119
Feature activation+1.02e+00

Pos logit contributions
 trouble+0.017
 Please+0.017
 unfair+0.017
 guilty+0.016
 impossible+0.016
Neg logit contributions
ric-0.022
ries-0.021
els-0.020
aches-0.020
asted-0.020
Token laughed
Token ID13818
Feature activation-2.06e+00

Pos logit contributions
ries+0.023
ric+0.022
els+0.021
ants+0.020
aches+0.019
Neg logit contributions
 mean-0.017
 guilty-0.016
 unfair-0.016
 stupid-0.015
 Please-0.015
Token together
Token ID1978
Feature activation+2.15e+00

Pos logit contributions
 trouble+0.046
 unfair+0.046
 liar+0.045
 Please+0.045
 Give+0.043
Neg logit contributions
hat-0.026
ries-0.026
ric-0.025
asted-0.024
els-0.024
Token for
Token ID329
Feature activation+4.29e+00

Pos logit contributions
ric+0.047
ries+0.046
els+0.044
aches+0.043
hat+0.042
Neg logit contributions
 trouble-0.033
 Please-0.032
 unfair-0.032
 guilty-0.032
 impossible-0.031
Token what
Token ID644
Feature activation-1.02e+00

Pos logit contributions
ric+0.094
ries+0.092
els+0.089
aches+0.086
hat+0.086
Neg logit contributions
 trouble-0.069
 unfair-0.065
 Please-0.065
 guilty-0.063
 mean-0.062
Token seemed
Token ID3947
Feature activation-8.68e+00

Pos logit contributions
 Please+0.017
 unfair+0.017
 trouble+0.017
 impossible+0.016
 guilty+0.016
Neg logit contributions
ries-0.019
ric-0.019
aches-0.018
els-0.018
asted-0.017
Token like
Token ID588
Feature activation-3.18e-01

Pos logit contributions
 Give+0.157
 Please+0.156
 Promise+0.141
 liar+0.139
 Come+0.137
Neg logit contributions
ric-0.142
aches-0.128
 every-0.127
 their-0.126
 how-0.123
Token a
Token ID257
Feature activation-2.75e+00

Pos logit contributions
 Please+0.006
 unfair+0.006
 trouble+0.005
 guilty+0.005
 Give+0.005
Neg logit contributions
ries-0.006
ric-0.006
els-0.005
asted-0.005
aches-0.005
Token long
Token ID890
Feature activation-9.04e-01

Pos logit contributions
 Give+0.049
 trouble+0.049
 unfair+0.048
 Please+0.048
 consequences+0.047
Neg logit contributions
ric-0.042
hat-0.040
asted-0.038
aches-0.038
field-0.037
Token time
Token ID640
Feature activation-1.55e+00

Pos logit contributions
 unfair+0.015
 trouble+0.015
 Please+0.015
 guilty+0.014
 impossible+0.014
Neg logit contributions
ric-0.017
asted-0.016
ries-0.016
hat-0.016
aches-0.016
Token.
Token ID13
Feature activation-2.70e-01

Pos logit contributions
 unfair+0.028
 Please+0.028
 trouble+0.027
 Give+0.026
 guilty+0.026
Neg logit contributions
ric-0.029
ries-0.028
els-0.026
asted-0.026
aches-0.025
Token Ben
Token ID3932
Feature activation-1.80e+00

Pos logit contributions
 trouble+0.045
 Please+0.045
 unfair+0.044
 guilty+0.043
 impossible+0.042
Neg logit contributions
ric-0.051
ries-0.050
els-0.047
asted-0.047
hat-0.046
Token asked
Token ID1965
Feature activation+4.83e-01

Pos logit contributions
 trouble+0.038
 Please+0.037
 unfair+0.037
 impossible+0.037
 guilty+0.035
Neg logit contributions
ric-0.026
ries-0.025
hat-0.025
asted-0.025
els-0.024
Token Tim
Token ID5045
Feature activation-9.29e-01

Pos logit contributions
ries+0.009
ric+0.008
els+0.008
asted+0.008
aches+0.008
Neg logit contributions
 Please-0.009
 unfair-0.009
 trouble-0.009
 impossible-0.009
 guilty-0.009
Token what
Token ID644
Feature activation-3.08e+00

Pos logit contributions
 mean+0.013
 unfair+0.013
 trouble+0.013
 guilty+0.013
 Please+0.013
Neg logit contributions
ric-0.020
ries-0.020
els-0.019
asted-0.019
aches-0.019
Token he
Token ID339
Feature activation-5.06e+00

Pos logit contributions
 Please+0.056
 trouble+0.055
 unfair+0.055
 guilty+0.054
 impossible+0.054
Neg logit contributions
ries-0.054
ric-0.054
els-0.051
aches-0.050
hat-0.050
Token was
Token ID373
Feature activation-8.67e+00

Pos logit contributions
 Please+0.096
 trouble+0.095
 unfair+0.093
 impossible+0.091
 sorry+0.090
Neg logit contributions
ric-0.089
ries-0.086
els-0.083
hat-0.081
aches-0.080
Token doing
Token ID1804
Feature activation-5.19e+00

Pos logit contributions
 Please+0.151
 trouble+0.142
 Give+0.141
 unfair+0.141
 Can+0.134
Neg logit contributions
ric-0.172
ries-0.168
aches-0.165
els-0.160
hat-0.153
Token,
Token ID11
Feature activation-7.63e-01

Pos logit contributions
 unfair+0.115
 trouble+0.111
 guilty+0.111
 Please+0.109
 impossible+0.107
Neg logit contributions
ries-0.073
ric-0.069
els-0.068
hat-0.068
aches-0.065
Token and
Token ID290
Feature activation+6.24e-01

Pos logit contributions
 Please+0.011
 trouble+0.011
 unfair+0.011
 guilty+0.011
 mean+0.010
Neg logit contributions
ric-0.017
ries-0.016
els-0.016
aches-0.015
asted-0.015
Token Tim
Token ID5045
Feature activation-4.37e-02

Pos logit contributions
ric+0.015
els+0.014
ries+0.014
hat+0.013
aches+0.013
Neg logit contributions
 unfair-0.009
 Please-0.008
 trouble-0.008
 mean-0.008
 guilty-0.008
Token replied
Token ID8712
Feature activation-3.66e+00

Pos logit contributions
 Please+0.001
 trouble+0.001
 unfair+0.001
 impossible+0.001
 guilty+0.001
Neg logit contributions
ric-0.001
ries-0.001
els-0.001
aches-0.001
asted-0.001
Token old
Token ID1468
Feature activation+1.77e-01

Pos logit contributions
 trouble+0.014
 Please+0.014
 unfair+0.014
 guilty+0.013
 Give+0.013
Neg logit contributions
ries-0.026
ric-0.026
els-0.026
aches-0.025
hat-0.024
Token brother
Token ID3956
Feature activation-1.24e+00

Pos logit contributions
ries+0.003
els+0.003
ric+0.003
hat+0.003
aches+0.003
Neg logit contributions
 impossible-0.003
 unfair-0.003
 Please-0.003
 trouble-0.003
 guilty-0.003
Token,
Token ID11
Feature activation-2.94e-01

Pos logit contributions
 trouble+0.021
 Please+0.021
 unfair+0.020
 guilty+0.019
 impossible+0.019
Neg logit contributions
ric-0.024
ries-0.024
els-0.023
hat-0.022
aches-0.022
Token "
Token ID366
Feature activation+1.53e+00

Pos logit contributions
 trouble+0.005
 Please+0.005
 unfair+0.005
 guilty+0.005
 impossible+0.005
Neg logit contributions
ric-0.006
ries-0.006
els-0.006
aches-0.005
asted-0.005
TokenWhat
Token ID2061
Feature activation-3.97e+00

Pos logit contributions
ric+0.029
ries+0.028
aches+0.027
els+0.027
ants+0.025
Neg logit contributions
 Please-0.029
 Give-0.027
 trouble-0.027
 unfair-0.027
 mean-0.026
Token did
Token ID750
Feature activation-8.65e+00

Pos logit contributions
 impossible+0.068
 guilty+0.068
 Please+0.067
 unfair+0.066
 trouble+0.065
Neg logit contributions
ries-0.069
aches-0.068
ric-0.066
els-0.064
packs-0.064
Token you
Token ID345
Feature activation-4.97e+00

Pos logit contributions
 trouble+0.122
 unfair+0.121
 guilty+0.120
 Please+0.118
 impossible+0.117
Neg logit contributions
ries-0.188
ric-0.182
aches-0.180
els-0.175
asted-0.170
Token do
Token ID466
Feature activation-5.46e+00

Pos logit contributions
 trouble+0.089
 Please+0.089
 unfair+0.085
 Give+0.083
 Goodbye+0.081
Neg logit contributions
ric-0.089
asted-0.086
aches-0.084
els-0.079
ries-0.079
Token?"
Token ID1701
Feature activation+1.77e+00

Pos logit contributions
 trouble+0.115
 guilty+0.114
 unfair+0.114
 impossible+0.108
 fault+0.107
Neg logit contributions
ries-0.071
aches-0.071
hat-0.067
els-0.064
asted-0.064
Token\n
Token ID198
Feature activation+5.14e+00

Pos logit contributions
ric+0.038
ries+0.036
els+0.035
aches+0.034
hat+0.033
Neg logit contributions
 Please-0.029
 trouble-0.029
 unfair-0.028
 guilty-0.027
 Give-0.027
Token\n
Token ID198
Feature activation+4.34e+00

Pos logit contributions
ric+0.101
ries+0.098
els+0.093
aches+0.088
asted+0.086
Neg logit contributions
 Please-0.096
 trouble-0.094
 unfair-0.090
 Give-0.089
 mean-0.088
Token had
Token ID550
Feature activation-2.27e+00

Pos logit contributions
ries+0.026
ric+0.026
els+0.024
aches+0.024
ants+0.023
Neg logit contributions
 mean-0.016
 guilty-0.016
 trouble-0.015
 Please-0.015
 unfair-0.015
Token been
Token ID587
Feature activation-5.21e+00

Pos logit contributions
 Please+0.036
 unfair+0.035
 trouble+0.035
 guilty+0.034
 Give+0.033
Neg logit contributions
ric-0.047
ries-0.046
aches-0.043
asted-0.043
els-0.043
Token searching
Token ID10342
Feature activation-2.21e+00

Pos logit contributions
 Please+0.084
 unfair+0.083
 trouble+0.083
 Give+0.079
 guilty+0.078
Neg logit contributions
ric-0.108
ries-0.106
els-0.101
aches-0.099
asted-0.098
Token for
Token ID329
Feature activation+3.30e+00

Pos logit contributions
 Please+0.037
 unfair+0.036
 trouble+0.035
 Give+0.034
 guilty+0.033
Neg logit contributions
ric-0.044
ries-0.044
els-0.042
asted-0.041
aches-0.041
Token what
Token ID644
Feature activation-8.03e-01

Pos logit contributions
ric+0.063
ries+0.063
els+0.060
asted+0.059
aches+0.058
Neg logit contributions
 Please-0.058
 unfair-0.056
 trouble-0.056
 guilty-0.054
 Give-0.054
Token seemed
Token ID3947
Feature activation-8.65e+00

Pos logit contributions
 Please+0.017
 unfair+0.016
 Give+0.016
 trouble+0.016
 guilty+0.016
Neg logit contributions
ries-0.012
ric-0.011
aches-0.011
els-0.011
asted-0.011
Token like
Token ID588
Feature activation+4.51e-03

Pos logit contributions
 Give+0.149
 Please+0.147
 liar+0.133
 trouble+0.130
 Promise+0.130
Neg logit contributions
ric-0.162
aches-0.146
asted-0.140
ries-0.138
 every-0.138
Token hours
Token ID2250
Feature activation+1.17e+00

Pos logit contributions
ric+0.000
ries+0.000
els+0.000
asted+0.000
aches+0.000
Neg logit contributions
 unfair-0.000
 Please-0.000
 trouble-0.000
 guilty-0.000
 Give-0.000
Token,
Token ID11
Feature activation+4.60e-01

Pos logit contributions
ric+0.024
ries+0.023
els+0.022
aches+0.022
asted+0.021
Neg logit contributions
 trouble-0.019
 Please-0.019
 unfair-0.019
 guilty-0.018
 impossible-0.018
Token when
Token ID618
Feature activation-3.02e-02

Pos logit contributions
ric+0.009
ries+0.009
asted+0.009
els+0.008
aches+0.008
Neg logit contributions
 Please-0.008
 unfair-0.007
 trouble-0.007
 Give-0.007
 impossible-0.007
Token all
Token ID477
Feature activation+1.88e+00

Pos logit contributions
 Give+0.001
 Please+0.001
 unfair+0.001
 liar+0.001
 guilty+0.001
Neg logit contributions
ries-0.000
asted-0.000
ric-0.000
els-0.000
packs-0.000
Tokenmy
Token ID1820
Feature activation-8.00e-01

Pos logit contributions
 trouble+0.003
 unfair+0.003
 Please+0.003
 mean+0.003
 guilty+0.003
Neg logit contributions
ric-0.002
ries-0.002
els-0.002
aches-0.002
asted-0.002
Token asked
Token ID1965
Feature activation+2.00e+00

Pos logit contributions
 trouble+0.014
 Please+0.014
 unfair+0.013
 guilty+0.013
 mean+0.013
Neg logit contributions
ric-0.016
ries-0.015
els-0.015
aches-0.014
asted-0.014
Token,
Token ID11
Feature activation-2.01e-01

Pos logit contributions
ries+0.026
ie+0.024
els+0.024
aches+0.023
asted+0.023
Neg logit contributions
 Please-0.046
 unfair-0.046
 trouble-0.045
 crank-0.045
 impossible-0.044
Token "
Token ID366
Feature activation+1.59e+00

Pos logit contributions
 trouble+0.003
 unfair+0.003
 Please+0.003
 impossible+0.003
 Give+0.003
Neg logit contributions
ric-0.004
ries-0.004
els-0.004
asted-0.004
hat-0.004
TokenWhat
Token ID2061
Feature activation-4.19e+00

Pos logit contributions
ric+0.028
ries+0.027
els+0.026
aches+0.026
asted+0.025
Neg logit contributions
 Please-0.031
 trouble-0.031
 unfair-0.030
 mean-0.029
 guilty-0.029
Token did
Token ID750
Feature activation-8.65e+00

Pos logit contributions
 Please+0.083
 impossible+0.081
 unfair+0.079
 guilty+0.078
 sorry+0.076
Neg logit contributions
aches-0.064
ries-0.063
ric-0.063
packs-0.059
els-0.058
Token you
Token ID345
Feature activation-5.06e+00

Pos logit contributions
 trouble+0.128
 unfair+0.128
 Please+0.127
 guilty+0.125
 impossible+0.124
Neg logit contributions
ries-0.179
ric-0.176
aches-0.172
els-0.167
asted-0.161
Token find
Token ID1064
Feature activation-1.81e+00

Pos logit contributions
 Please+0.094
 unfair+0.091
 trouble+0.091
 Give+0.087
 guilty+0.085
Neg logit contributions
ric-0.094
aches-0.087
asted-0.086
ries-0.082
els-0.081
Token down
Token ID866
Feature activation+2.16e-01

Pos logit contributions
 trouble+0.029
 Please+0.028
 unfair+0.028
 guilty+0.027
 impossible+0.026
Neg logit contributions
ric-0.038
ries-0.037
els-0.036
aches-0.035
asted-0.035
Token there
Token ID612
Feature activation-1.20e+00

Pos logit contributions
ric+0.005
ries+0.005
els+0.005
asted+0.004
ants+0.004
Neg logit contributions
 trouble-0.003
 Please-0.003
 mean-0.003
 unfair-0.003
 guilty-0.003
Token,
Token ID11
Feature activation-2.55e+00

Pos logit contributions
 trouble+0.018
 Please+0.018
 unfair+0.017
 guilty+0.017
 mean+0.017
Neg logit contributions
ric-0.026
ries-0.026
els-0.025
asted-0.025
aches-0.024
Token watched
Token ID7342
Feature activation+2.35e-01

Pos logit contributions
ric+0.036
ries+0.036
els+0.034
ants+0.032
aches+0.032
Neg logit contributions
 Please-0.037
 trouble-0.037
 guilty-0.036
 unfair-0.035
 mean-0.035
Token the
Token ID262
Feature activation+9.04e-01

Pos logit contributions
ric+0.004
ries+0.004
els+0.004
aches+0.004
illas+0.004
Neg logit contributions
 trouble-0.005
 Please-0.004
 sorry-0.004
 guilty-0.004
 impossible-0.004
Token rainbow
Token ID27223
Feature activation+2.24e+00

Pos logit contributions
ric+0.016
ries+0.016
els+0.015
aches+0.015
hat+0.014
Neg logit contributions
 trouble-0.017
 Please-0.017
 unfair-0.017
 guilty-0.017
 mean-0.017
Token for
Token ID329
Feature activation+5.65e+00

Pos logit contributions
ric+0.047
ries+0.044
asted+0.041
els+0.041
aches+0.041
Neg logit contributions
 impossible-0.035
 mean-0.033
 trouble-0.033
 Please-0.032
 wrong-0.032
Token what
Token ID644
Feature activation-9.42e-01

Pos logit contributions
ric+0.132
ries+0.126
els+0.121
hat+0.118
aches+0.118
Neg logit contributions
 trouble-0.085
 Please-0.080
 unfair-0.080
 guilty-0.076
 impossible-0.076
Token seemed
Token ID3947
Feature activation-8.63e+00

Pos logit contributions
 Please+0.017
 trouble+0.017
 unfair+0.017
 guilty+0.017
 Give+0.016
Neg logit contributions
ries-0.017
ric-0.016
els-0.016
aches-0.015
asted-0.015
Token like
Token ID588
Feature activation+1.40e-01

Pos logit contributions
 Give+0.140
 Please+0.134
 trouble+0.128
 Promise+0.123
 Come+0.123
Neg logit contributions
ric-0.168
aches-0.155
ries-0.148
asted-0.146
bel-0.145
Token forever
Token ID8097
Feature activation-9.47e-01

Pos logit contributions
ric+0.003
ries+0.003
els+0.002
aches+0.002
asted+0.002
Neg logit contributions
 trouble-0.003
 Please-0.003
 unfair-0.002
 guilty-0.002
 impossible-0.002
Token.
Token ID13
Feature activation-5.87e-01

Pos logit contributions
 trouble+0.014
 Please+0.014
 unfair+0.014
 guilty+0.013
 impossible+0.013
Neg logit contributions
ric-0.021
ries-0.020
els-0.020
asted-0.019
aches-0.019
Token
Token ID198
Feature activation+5.69e+00

Pos logit contributions
 Please+0.011
 trouble+0.011
 unfair+0.011
 guilty+0.011
 Give+0.011
Neg logit contributions
ric-0.010
ries-0.010
els-0.010
aches-0.010
asted-0.009
Token
Token ID198
Feature activation+4.74e+00

Pos logit contributions
ries+0.135
ric+0.132
els+0.128
aches+0.119
ishes+0.117
Neg logit contributions
 trouble-0.081
 Please-0.078
 Give-0.074
 guilty-0.073
 mean-0.071
Token and
Token ID290
Feature activation+2.35e+00

Pos logit contributions
 unfair+0.028
 Please+0.027
 impossible+0.027
 liar+0.027
 trouble+0.026
Neg logit contributions
ries-0.021
els-0.020
hat-0.019
ric-0.019
asted-0.018
Token asked
Token ID1965
Feature activation+1.52e+00

Pos logit contributions
ric+0.060
els+0.055
asted+0.054
aches+0.054
ries+0.054
Neg logit contributions
 guilty-0.033
 Please-0.032
 mean-0.032
 sorry-0.030
 foolish-0.029
Token,
Token ID11
Feature activation-6.94e-01

Pos logit contributions
ries+0.026
ric+0.025
els+0.024
aches+0.023
asted+0.022
Neg logit contributions
 Please-0.030
 unfair-0.030
 trouble-0.030
 guilty-0.029
 impossible-0.029
Token "
Token ID366
Feature activation+1.62e+00

Pos logit contributions
 trouble+0.010
 Please+0.010
 unfair+0.010
 guilty+0.009
 impossible+0.009
Neg logit contributions
ric-0.016
ries-0.015
els-0.015
aches-0.014
asted-0.014
TokenWhat
Token ID2061
Feature activation-4.22e+00

Pos logit contributions
ric+0.029
ries+0.027
els+0.027
aches+0.026
hat+0.026
Neg logit contributions
 trouble-0.031
 Please-0.031
 unfair-0.030
 mean-0.030
 guilty-0.029
Token happened
Token ID3022
Feature activation-8.62e+00

Pos logit contributions
 impossible+0.076
 Please+0.074
 unfair+0.071
 guilty+0.069
 trouble+0.068
Neg logit contributions
ries-0.075
ric-0.073
aches-0.069
els-0.069
ons-0.067
Token,
Token ID11
Feature activation-2.87e+00

Pos logit contributions
 unfair+0.192
 Give+0.183
 impossible+0.182
 consequences+0.182
 liar+0.180
Neg logit contributions
ie-0.098
ries-0.094
ric-0.089
hat-0.088
bel-0.083
Token Tim
Token ID5045
Feature activation-1.08e+00

Pos logit contributions
 trouble+0.052
 Please+0.052
 unfair+0.051
 impossible+0.049
 guilty+0.049
Neg logit contributions
ries-0.052
ric-0.052
els-0.049
aches-0.048
asted-0.047
Tokenmy
Token ID1820
Feature activation-3.51e+00

Pos logit contributions
 trouble+0.026
 Please+0.026
 unfair+0.025
 guilty+0.025
 impossible+0.025
Neg logit contributions
ric-0.014
ries-0.013
els-0.012
aches-0.012
hat-0.012
Token?"
Token ID1701
Feature activation+1.21e+00

Pos logit contributions
 trouble+0.067
 impossible+0.067
 unfair+0.067
 Please+0.066
 guilty+0.065
Neg logit contributions
ries-0.059
ric-0.057
els-0.054
hat-0.054
ie-0.052
Token
Token ID198
Feature activation+4.99e+00

Pos logit contributions
ric+0.023
ries+0.023
els+0.022
aches+0.021
asted+0.021
Neg logit contributions
 trouble-0.021
 Please-0.021
 unfair-0.021
 impossible-0.020
 guilty-0.020
Token came
Token ID1625
Feature activation-2.07e+00

Pos logit contributions
 trouble+0.042
 Please+0.042
 unfair+0.042
 guilty+0.040
 impossible+0.039
Neg logit contributions
ric-0.043
ries-0.043
els-0.041
aches-0.039
hat-0.039
Token by
Token ID416
Feature activation+4.20e+00

Pos logit contributions
 unfair+0.041
 Please+0.040
 trouble+0.039
 guilty+0.038
 liar+0.038
Neg logit contributions
ric-0.036
ries-0.035
els-0.034
hat-0.033
packs-0.031
Token.
Token ID13
Feature activation-3.47e+00

Pos logit contributions
ric+0.089
ries+0.088
els+0.084
aches+0.083
asted+0.082
Neg logit contributions
 trouble-0.067
 Please-0.065
 unfair-0.063
 guilty-0.062
 mean-0.060
Token "
Token ID366
Feature activation+1.19e+00

Pos logit contributions
 trouble+0.065
 unfair+0.063
 Please+0.063
 impossible+0.061
 consequences+0.059
Neg logit contributions
ric-0.062
ries-0.059
hat-0.056
els-0.055
asted-0.053
TokenWhat
Token ID2061
Feature activation-4.14e+00

Pos logit contributions
ric+0.026
ries+0.026
els+0.025
aches+0.025
ants+0.023
Neg logit contributions
 Please-0.020
 trouble-0.018
 Give-0.018
 unfair-0.018
 mean-0.018
Token happened
Token ID3022
Feature activation-8.59e+00

Pos logit contributions
 Please+0.079
 impossible+0.078
 unfair+0.076
 guilty+0.074
 trouble+0.071
Neg logit contributions
ric-0.065
ries-0.065
aches-0.062
els-0.060
packs-0.060
Token,
Token ID11
Feature activation-2.64e+00

Pos logit contributions
 unfair+0.189
 liar+0.184
 Promise+0.181
 consequences+0.181
 fault+0.180
Neg logit contributions
ie-0.096
ries-0.096
ric-0.092
hat-0.084
els-0.078
Token B
Token ID347
Feature activation-4.93e+00

Pos logit contributions
 trouble+0.048
 Please+0.048
 unfair+0.047
 guilty+0.045
 impossible+0.045
Neg logit contributions
ric-0.049
ries-0.048
els-0.045
aches-0.044
asted-0.044
Tokeninky
Token ID29246
Feature activation-6.00e-01

Pos logit contributions
 trouble+0.103
 unfair+0.101
 Please+0.100
 guilty+0.097
 impossible+0.095
Neg logit contributions
ric-0.081
ries-0.078
hat-0.076
els-0.074
ants-0.070
Token?"
Token ID1701
Feature activation+1.52e+00

Pos logit contributions
 unfair+0.011
 Please+0.011
 impossible+0.011
 trouble+0.011
 crank+0.011
Neg logit contributions
ie-0.010
ries-0.010
ric-0.009
aches-0.009
els-0.009
Token asked
Token ID1965
Feature activation+2.21e+00

Pos logit contributions
ric+0.027
ries+0.027
els+0.025
aches+0.024
hat+0.024
Neg logit contributions
 trouble-0.028
 unfair-0.028
 Please-0.028
 impossible-0.027
 guilty-0.027
Token.
Token ID13
Feature activation-2.23e+00

Pos logit contributions
ric+0.009
ries+0.009
els+0.009
aches+0.008
asted+0.008
Neg logit contributions
 trouble-0.007
 Please-0.007
 unfair-0.007
 guilty-0.006
 mean-0.006
Token Then
Token ID3244
Feature activation-7.04e-01

Pos logit contributions
 Please+0.038
 trouble+0.038
 unfair+0.036
 guilty+0.035
 Give+0.035
Neg logit contributions
ric-0.046
ries-0.046
els-0.043
aches-0.043
hat-0.041
Token they
Token ID484
Feature activation+1.81e+00

Pos logit contributions
 trouble+0.016
 Please+0.016
 unfair+0.016
 guilty+0.016
 impossible+0.016
Neg logit contributions
ric-0.009
ries-0.009
els-0.008
aches-0.008
asted-0.008
Token made
Token ID925
Feature activation+2.07e+00

Pos logit contributions
ric+0.043
ries+0.040
els+0.036
ants+0.036
aches+0.036
Neg logit contributions
 mean-0.028
 Please-0.027
 unfair-0.025
 guilty-0.025
 trouble-0.024
Token them
Token ID606
Feature activation+1.12e+00

Pos logit contributions
ric+0.039
ries+0.038
els+0.037
asted+0.036
aches+0.035
Neg logit contributions
 Please-0.037
 unfair-0.036
 trouble-0.036
 guilty-0.035
 impossible-0.035
Token into
Token ID656
Feature activation+6.37e+00

Pos logit contributions
ric+0.023
ries+0.023
els+0.022
ants+0.021
asted+0.020
Neg logit contributions
 trouble-0.018
 mean-0.017
 guilty-0.017
 sorry-0.017
 stupid-0.017
Token different
Token ID1180
Feature activation+2.56e+00

Pos logit contributions
ric+0.133
ries+0.132
hat+0.125
aches+0.125
ants+0.121
Neg logit contributions
 trouble-0.119
 mean-0.104
 unfair-0.102
 stupid-0.102
 guilty-0.101
Token things
Token ID1243
Feature activation-7.04e-02

Pos logit contributions
ric+0.044
ries+0.042
aches+0.040
asted+0.039
hat+0.039
Neg logit contributions
 trouble-0.052
 Please-0.050
 unfair-0.050
 impossible-0.048
 guilty-0.048
Token.
Token ID13
Feature activation-2.66e+00

Pos logit contributions
 trouble+0.001
 Please+0.001
 unfair+0.001
 guilty+0.001
 impossible+0.001
Neg logit contributions
ric-0.001
ries-0.001
els-0.001
aches-0.001
hat-0.001
Token They
Token ID1119
Feature activation+2.38e+00

Pos logit contributions
 Please+0.045
 trouble+0.044
 unfair+0.043
 guilty+0.041
 Give+0.041
Neg logit contributions
ric-0.056
ries-0.055
aches-0.053
els-0.052
hat-0.050
Token made
Token ID925
Feature activation+1.96e+00

Pos logit contributions
ric+0.058
ries+0.057
ants+0.052
els+0.051
hat+0.049
Neg logit contributions
 mean-0.035
 unfair-0.031
 guilty-0.031
 trouble-0.030
 Please-0.030
Token He
Token ID679
Feature activation+7.49e-01

Pos logit contributions
 unfair+0.024
 trouble+0.024
 mean+0.023
 impossible+0.023
 Please+0.023
Neg logit contributions
ric-0.034
ries-0.033
els-0.031
asted-0.030
hat-0.030
Token loved
Token ID6151
Feature activation-1.90e+00

Pos logit contributions
ric+0.014
ries+0.014
els+0.013
aches+0.013
asted+0.013
Neg logit contributions
 Please-0.013
 trouble-0.013
 unfair-0.013
 guilty-0.013
 impossible-0.012
Token jam
Token ID16853
Feature activation-9.98e-01

Pos logit contributions
 trouble+0.032
 unfair+0.030
 mean+0.030
 stupid+0.029
 guilty+0.029
Neg logit contributions
ric-0.039
ries-0.036
hat-0.036
els-0.036
illas-0.035
Token.
Token ID13
Feature activation-9.78e-01

Pos logit contributions
 Please+0.017
 trouble+0.017
 unfair+0.017
 guilty+0.016
 Give+0.016
Neg logit contributions
ries-0.019
ric-0.019
aches-0.018
els-0.018
hat-0.018
Token
Token ID220
Feature activation-8.76e-01

Pos logit contributions
 trouble+0.016
 Please+0.016
 unfair+0.016
 guilty+0.015
 impossible+0.015
Neg logit contributions
ric-0.020
ries-0.020
els-0.019
aches-0.018
asted-0.018
Token
Token ID198
Feature activation+5.49e+00

Pos logit contributions
 trouble+0.013
 Please+0.013
 Give+0.012
 guilty+0.012
 You+0.012
Neg logit contributions
ries-0.019
ric-0.019
els-0.018
asted-0.018
aches-0.018
Token
Token ID198
Feature activation+4.27e+00

Pos logit contributions
ries+0.123
ric+0.119
els+0.117
asted+0.108
aches+0.107
Neg logit contributions
 trouble-0.083
 Please-0.082
 Give-0.079
 mean-0.076
 guilty-0.076
TokenHe
Token ID1544
Feature activation+7.48e-01

Pos logit contributions
ries+0.068
ric+0.067
els+0.062
aches+0.057
asted+0.056
Neg logit contributions
 Please-0.095
 unfair-0.092
 trouble-0.092
 mean-0.090
 Give-0.089
Token opened
Token ID4721
Feature activation+2.27e+00

Pos logit contributions
ric+0.014
ries+0.013
els+0.013
aches+0.012
hat+0.012
Neg logit contributions
 trouble-0.014
 Please-0.014
 unfair-0.014
 guilty-0.013
 mean-0.013
Token the
Token ID262
Feature activation+8.76e-01

Pos logit contributions
ric+0.047
illas+0.044
asted+0.041
aches+0.041
bel+0.041
Neg logit contributions
 trouble-0.038
 wrong-0.034
 right-0.034
 stupid-0.032
 mean-0.032
Token jar
Token ID17379
Feature activation+2.41e+00

Pos logit contributions
ric+0.015
ries+0.015
els+0.014
aches+0.014
hat+0.013
Neg logit contributions
 Please-0.017
 unfair-0.017
 trouble-0.016
 impossible-0.016
 Give-0.016
Token try
Token ID1949
Feature activation-2.37e+00

Pos logit contributions
ric+0.032
ries+0.031
aches+0.031
asted+0.030
els+0.030
Neg logit contributions
 unfair-0.031
 Please-0.031
 trouble-0.031
 guilty-0.030
 impossible-0.029
Token his
Token ID465
Feature activation-1.61e+00

Pos logit contributions
 trouble+0.043
 Please+0.042
 unfair+0.042
 guilty+0.040
 impossible+0.040
Neg logit contributions
ric-0.045
ries-0.044
els-0.042
aches-0.041
asted-0.040
Token best
Token ID1266
Feature activation-2.61e+00

Pos logit contributions
 Please+0.026
 Give+0.024
 unfair+0.024
 guilty+0.023
 trouble+0.023
Neg logit contributions
ric-0.032
ries-0.032
hat-0.032
els-0.032
aches-0.031
Token.
Token ID13
Feature activation-1.27e+00

Pos logit contributions
 unfair+0.051
 Please+0.051
 trouble+0.050
 guilty+0.050
 Give+0.047
Neg logit contributions
hat-0.043
ries-0.041
ric-0.041
els-0.040
aches-0.039
Token
Token ID220
Feature activation-1.13e+00

Pos logit contributions
 unfair+0.020
 trouble+0.020
 mean+0.019
 guilty+0.019
 impossible+0.019
Neg logit contributions
ric-0.026
ries-0.025
hat-0.024
asted-0.024
els-0.023
Token
Token ID198
Feature activation+5.47e+00

Pos logit contributions
 Please+0.019
 trouble+0.019
 Give+0.018
 unfair+0.018
 guilty+0.018
Neg logit contributions
ries-0.022
ric-0.022
els-0.021
asted-0.021
aches-0.021
Token
Token ID198
Feature activation+4.36e+00

Pos logit contributions
ries+0.115
ric+0.110
els+0.108
asted+0.099
aches+0.098
Neg logit contributions
 Please-0.095
 trouble-0.092
 Give-0.089
 unfair-0.087
 guilty-0.086
TokenThe
Token ID464
Feature activation+6.16e-01

Pos logit contributions
ries+0.073
ric+0.070
els+0.066
aches+0.060
asted+0.058
Neg logit contributions
 Please-0.097
 trouble-0.092
 unfair-0.091
 Give-0.090
 mean-0.089
Token race
Token ID3234
Feature activation+6.65e-01

Pos logit contributions
ric+0.010
asted+0.009
ries+0.009
hat+0.009
aches+0.009
Neg logit contributions
 trouble-0.011
 Please-0.011
 unfair-0.011
 Give-0.011
 fault-0.011
Token started
Token ID2067
Feature activation-3.30e+00

Pos logit contributions
ries+0.011
ric+0.010
els+0.010
hat+0.010
asted+0.010
Neg logit contributions
 unfair-0.013
 trouble-0.013
 Please-0.013
 guilty-0.012
 sorry-0.012
Token and
Token ID290
Feature activation+2.78e+00

Pos logit contributions
 Please+0.056
 unfair+0.056
 trouble+0.055
 guilty+0.053
 Give+0.052
Neg logit contributions
ric-0.065
ries-0.063
els-0.060
hat-0.058
asted-0.058
Token its
Token ID663
Feature activation-6.10e-01

Pos logit contributions
hat+0.080
aches+0.073
ries+0.072
packs+0.070
bel+0.070
Neg logit contributions
 sorry-0.086
 trouble-0.085
 stupid-0.083
 impossible-0.083
 guilty-0.082
Token new
Token ID649
Feature activation-7.91e-01

Pos logit contributions
 trouble+0.010
 Please+0.010
 unfair+0.010
 guilty+0.010
 impossible+0.010
Neg logit contributions
ric-0.012
ries-0.012
els-0.011
aches-0.011
asted-0.011
Token song
Token ID3496
Feature activation+3.99e-01

Pos logit contributions
 Please+0.014
 unfair+0.014
 trouble+0.014
 guilty+0.014
 impossible+0.014
Neg logit contributions
ric-0.015
ries-0.014
els-0.013
aches-0.013
hat-0.013
Token.
Token ID13
Feature activation-1.00e+00

Pos logit contributions
ric+0.009
ries+0.009
els+0.008
aches+0.008
asted+0.008
Neg logit contributions
 trouble-0.006
 Please-0.006
 unfair-0.006
 impossible-0.006
 guilty-0.006
Token
Token ID220
Feature activation-7.12e-01

Pos logit contributions
 Please+0.017
 trouble+0.016
 unfair+0.016
 guilty+0.016
 Give+0.015
Neg logit contributions
ries-0.020
ric-0.020
aches-0.019
els-0.019
hat-0.019
Token
Token ID198
Feature activation+6.00e+00

Pos logit contributions
 trouble+0.012
 Please+0.012
 Give+0.011
 sorry+0.011
 guilty+0.011
Neg logit contributions
ries-0.015
els-0.014
ric-0.014
aches-0.014
asted-0.013
Token
Token ID198
Feature activation+5.10e+00

Pos logit contributions
ries+0.138
els+0.130
ric+0.129
aches+0.120
ishes+0.119
Neg logit contributions
 Please-0.092
 trouble-0.088
 Give-0.088
 unfair-0.083
 guilty-0.082
TokenThey
Token ID2990
Feature activation+1.70e+00

Pos logit contributions
ries+0.083
ric+0.076
els+0.072
ishes+0.067
packs+0.066
Neg logit contributions
 Please-0.119
 unfair-0.114
 Give-0.112
 mean-0.111
 trouble-0.110
Token quickly
Token ID2952
Feature activation-1.23e-01

Pos logit contributions
ries+0.038
ric+0.035
els+0.034
ants+0.033
aches+0.032
Neg logit contributions
 mean-0.028
 guilty-0.027
 Please-0.026
 sorry-0.025
 impossible-0.025
Token realized
Token ID6939
Feature activation-1.21e+00

Pos logit contributions
 mean+0.002
 guilty+0.002
 trouble+0.002
 Please+0.002
 unfair+0.002
Neg logit contributions
ries-0.002
ric-0.002
els-0.002
ants-0.002
aches-0.002
Token the
Token ID262
Feature activation+1.33e-01

Pos logit contributions
 trouble+0.021
 unfair+0.020
 Please+0.020
 guilty+0.020
 sorry+0.020
Neg logit contributions
ric-0.024
ries-0.023
els-0.023
hat-0.022
asted-0.022
Token and
Token ID290
Feature activation+2.24e+00

Pos logit contributions
ric+0.101
ries+0.099
els+0.095
aches+0.093
asted+0.092
Neg logit contributions
 trouble-0.081
 Please-0.080
 unfair-0.078
 guilty-0.076
 impossible-0.075
Token clean
Token ID3424
Feature activation+1.98e+00

Pos logit contributions
ric+0.039
ries+0.037
aches+0.035
asted+0.035
els+0.035
Neg logit contributions
 trouble-0.043
 unfair-0.043
 Please-0.042
 impossible-0.041
 guilty-0.041
Token you
Token ID345
Feature activation-7.75e-01

Pos logit contributions
ries+0.038
ric+0.038
els+0.036
aches+0.035
asted+0.034
Neg logit contributions
 Please-0.035
 unfair-0.035
 trouble-0.034
 guilty-0.033
 impossible-0.033
Token up
Token ID510
Feature activation+2.77e+00

Pos logit contributions
 trouble+0.012
 Please+0.012
 unfair+0.012
 guilty+0.012
 impossible+0.011
Neg logit contributions
ric-0.016
ries-0.016
els-0.015
aches-0.015
asted-0.015
Token."
Token ID526
Feature activation+1.09e+00

Pos logit contributions
ric+0.059
ries+0.058
els+0.055
aches+0.054
asted+0.054
Neg logit contributions
 trouble-0.043
 Please-0.043
 unfair-0.042
 guilty-0.040
 impossible-0.040
Token
Token ID198
Feature activation+5.41e+00

Pos logit contributions
ric+0.023
ries+0.022
els+0.022
aches+0.021
hat+0.020
Neg logit contributions
 Please-0.019
 trouble-0.018
 guilty-0.017
 Give-0.017
 sorry-0.017
Token
Token ID198
Feature activation+4.56e+00

Pos logit contributions
ries+0.125
ric+0.120
els+0.118
aches+0.109
ishes+0.108
Neg logit contributions
 trouble-0.082
 Please-0.081
 Give-0.077
 sorry-0.073
 guilty-0.073
TokenAnd
Token ID1870
Feature activation-5.98e-01

Pos logit contributions
ries+0.085
ric+0.083
els+0.078
ishes+0.069
hat+0.066
Neg logit contributions
 Please-0.095
 Give-0.088
 trouble-0.088
 unfair-0.086
 mean-0.086
Token that
Token ID326
Feature activation+6.85e-01

Pos logit contributions
 Please+0.010
 trouble+0.010
 unfair+0.010
 guilty+0.010
 impossible+0.010
Neg logit contributions
ric-0.012
ries-0.012
els-0.011
aches-0.011
asted-0.011
Token's
Token ID338
Feature activation-5.31e-01

Pos logit contributions
ric+0.015
ries+0.015
els+0.014
aches+0.014
asted+0.014
Neg logit contributions
 Please-0.010
 trouble-0.010
 unfair-0.010
 guilty-0.010
 impossible-0.009
Token how
Token ID703
Feature activation-1.79e+00

Pos logit contributions
 unfair+0.010
 trouble+0.010
 Please+0.010
 sorry+0.010
 impossible+0.010
Neg logit contributions
ries-0.010
ric-0.009
els-0.009
hat-0.009
aches-0.009
Token He
Token ID679
Feature activation+1.85e+00

Pos logit contributions
 trouble+0.020
 Please+0.020
 unfair+0.019
 guilty+0.018
 impossible+0.018
Neg logit contributions
ric-0.034
ries-0.034
els-0.032
aches-0.032
asted-0.032
Token had
Token ID550
Feature activation-1.19e+00

Pos logit contributions
ric+0.037
ries+0.037
els+0.035
aches+0.033
hat+0.033
Neg logit contributions
 trouble-0.032
 Please-0.031
 unfair-0.031
 guilty-0.031
 mean-0.030
Token learned
Token ID4499
Feature activation-1.47e+00

Pos logit contributions
 Please+0.020
 unfair+0.019
 Give+0.019
 trouble+0.019
 guilty+0.018
Neg logit contributions
ric-0.023
ries-0.022
aches-0.022
asted-0.022
hat-0.021
Token his
Token ID465
Feature activation-5.32e-01

Pos logit contributions
 Please+0.028
 unfair+0.027
 trouble+0.027
 guilty+0.026
 Give+0.026
Neg logit contributions
ric-0.026
ries-0.025
aches-0.024
els-0.024
asted-0.023
Token lesson
Token ID11483
Feature activation-3.51e-01

Pos logit contributions
 Please+0.008
 Give+0.007
 Come+0.006
 Can+0.006
 unfair+0.006
Neg logit contributions
ric-0.012
aches-0.011
asted-0.011
hat-0.011
ries-0.011
Token and
Token ID290
Feature activation+3.22e+00

Pos logit contributions
 Please+0.007
 unfair+0.007
 sorry+0.007
 trouble+0.007
 guilty+0.007
Neg logit contributions
ric-0.005
ries-0.005
els-0.005
aches-0.004
hat-0.004
Token didn
Token ID1422
Feature activation-4.40e+00

Pos logit contributions
ric+0.059
ries+0.057
aches+0.054
asted+0.054
els+0.054
Neg logit contributions
 Please-0.058
 unfair-0.058
 trouble-0.057
 guilty-0.055
 impossible-0.054
Token't
Token ID470
Feature activation-2.48e-01

Pos logit contributions
 Please+0.064
 trouble+0.064
 unfair+0.064
 guilty+0.062
 impossible+0.059
Neg logit contributions
ric-0.096
ries-0.092
els-0.088
aches-0.088
asted-0.087
Token speed
Token ID2866
Feature activation-4.90e-01

Pos logit contributions
 trouble+0.004
 Please+0.004
 unfair+0.004
 guilty+0.004
 Give+0.004
Neg logit contributions
ric-0.005
ries-0.005
aches-0.004
asted-0.004
els-0.004
Token around
Token ID1088
Feature activation+3.46e+00

Pos logit contributions
 Please+0.009
 trouble+0.009
 unfair+0.008
 Give+0.008
 guilty+0.008
Neg logit contributions
ric-0.009
ries-0.009
els-0.008
hat-0.008
aches-0.008
Token so
Token ID523
Feature activation+4.89e-01

Pos logit contributions
ric+0.054
ries+0.053
els+0.050
aches+0.048
asted+0.047
Neg logit contributions
 Please-0.073
 trouble-0.073
 unfair-0.073
 guilty-0.070
 impossible-0.069
Token the
Token ID262
Feature activation+2.84e+00

Pos logit contributions
ants+0.147
ric+0.147
aches+0.139
ries+0.139
els+0.133
Neg logit contributions
 trouble-0.177
 wrong-0.148
 sorry-0.147
 mean-0.147
 guilty-0.146
Token exercise
Token ID5517
Feature activation+3.34e-01

Pos logit contributions
ric+0.062
ries+0.061
els+0.058
aches+0.056
asted+0.056
Neg logit contributions
 trouble-0.044
 unfair-0.043
 Please-0.043
 guilty-0.041
 impossible-0.041
Token,
Token ID11
Feature activation+7.18e-01

Pos logit contributions
ric+0.007
ries+0.007
els+0.006
aches+0.006
asted+0.006
Neg logit contributions
 trouble-0.005
 Please-0.005
 unfair-0.005
 guilty-0.005
 impossible-0.005
Token everyone
Token ID2506
Feature activation+2.10e+00

Pos logit contributions
ric+0.013
ries+0.012
els+0.012
aches+0.011
hat+0.011
Neg logit contributions
 trouble-0.014
 Please-0.014
 guilty-0.014
 unfair-0.014
 sorry-0.013
Token looked
Token ID3114
Feature activation-3.79e+00

Pos logit contributions
ric+0.044
ries+0.043
els+0.041
hat+0.040
asted+0.039
Neg logit contributions
 Please-0.034
 trouble-0.034
 guilty-0.034
 unfair-0.033
 mean-0.032
Token at
Token ID379
Feature activation+4.86e+00

Pos logit contributions
 Please+0.062
 trouble+0.060
 unfair+0.059
 Give+0.059
 liar+0.054
Neg logit contributions
ric-0.075
ries-0.074
els-0.071
asted-0.070
aches-0.069
Token their
Token ID511
Feature activation-1.70e-01

Pos logit contributions
ric+0.064
aches+0.062
hat+0.059
ries+0.059
asted+0.056
Neg logit contributions
 trouble-0.116
 unfair-0.111
 guilty-0.109
 stupid-0.108
 wrong-0.108
Token creations
Token ID28443
Feature activation+2.07e+00

Pos logit contributions
 Please+0.003
 trouble+0.003
 unfair+0.003
 guilty+0.003
 Give+0.003
Neg logit contributions
ric-0.003
ries-0.003
els-0.003
aches-0.003
asted-0.003
Token.
Token ID13
Feature activation-1.86e+00

Pos logit contributions
ric+0.048
ries+0.046
els+0.044
asted+0.043
ants+0.043
Neg logit contributions
 trouble-0.029
 guilty-0.028
 impossible-0.028
 unfair-0.027
 Please-0.027
Token They
Token ID1119
Feature activation+2.12e+00

Pos logit contributions
 trouble+0.029
 Please+0.029
 unfair+0.028
 guilty+0.027
 impossible+0.027
Neg logit contributions
ric-0.040
ries-0.039
els-0.037
aches-0.037
asted-0.036
Token couldn
Token ID3521
Feature activation-3.59e+00

Pos logit contributions
ries+0.047
ric+0.046
els+0.044
ants+0.042
hat+0.041
Neg logit contributions
 mean-0.034
 guilty-0.034
 unfair-0.032
 stupid-0.031
 ashamed-0.030
Token finished
Token ID5201
Feature activation+1.21e+00

Pos logit contributions
 trouble+0.023
 Please+0.023
 unfair+0.023
 guilty+0.022
 impossible+0.022
Neg logit contributions
ric-0.033
ries-0.033
els-0.031
aches-0.031
asted-0.031
Token,
Token ID11
Feature activation+1.02e-01

Pos logit contributions
ric+0.023
ries+0.021
els+0.021
hat+0.020
asted+0.020
Neg logit contributions
 trouble-0.023
 Please-0.022
 guilty-0.021
 unfair-0.021
 sorry-0.021
Token the
Token ID262
Feature activation+6.61e-01

Pos logit contributions
ric+0.002
ries+0.002
els+0.002
aches+0.002
hat+0.002
Neg logit contributions
 Please-0.002
 trouble-0.002
 guilty-0.002
 unfair-0.002
 sorry-0.002
Token little
Token ID1310
Feature activation+5.35e-01

Pos logit contributions
ric+0.013
ries+0.013
els+0.012
aches+0.012
asted+0.012
Neg logit contributions
 trouble-0.011
 Please-0.011
 unfair-0.011
 guilty-0.010
 Give-0.010
Token girl
Token ID2576
Feature activation+8.64e-01

Pos logit contributions
ric+0.012
ries+0.012
els+0.011
aches+0.011
asted+0.011
Neg logit contributions
 trouble-0.008
 unfair-0.008
 Please-0.008
 impossible-0.007
 guilty-0.007
Token put
Token ID1234
Feature activation+3.84e+00

Pos logit contributions
ric+0.018
ries+0.017
els+0.017
hat+0.016
aches+0.016
Neg logit contributions
 Please-0.014
 trouble-0.014
 guilty-0.014
 unfair-0.014
 mean-0.013
Token her
Token ID607
Feature activation+5.26e-01

Pos logit contributions
ric+0.076
ants+0.065
aches+0.063
asted+0.062
ries+0.062
Neg logit contributions
 trouble-0.076
 guilty-0.071
 Please-0.070
 wrong-0.069
 sorry-0.069
Token jacket
Token ID15224
Feature activation-3.94e-01

Pos logit contributions
ric+0.010
ries+0.010
els+0.010
asted+0.009
aches+0.009
Neg logit contributions
 trouble-0.009
 Please-0.009
 guilty-0.009
 unfair-0.009
 sorry-0.008
Token back
Token ID736
Feature activation+1.77e+00

Pos logit contributions
 Please+0.006
 trouble+0.006
 guilty+0.006
 unfair+0.006
 mean+0.006
Neg logit contributions
ric-0.008
ries-0.008
asted-0.008
els-0.008
aches-0.007
Token on
Token ID319
Feature activation+5.97e+00

Pos logit contributions
ric+0.039
asted+0.036
ries+0.035
ants+0.035
aches+0.035
Neg logit contributions
 trouble-0.026
 Please-0.026
 wrong-0.026
 guilty-0.025
 sorry-0.025
Token and
Token ID290
Feature activation+2.80e+00

Pos logit contributions
ric+0.110
ries+0.101
els+0.095
aches+0.095
ants+0.091
Neg logit contributions
 trouble-0.124
 Please-0.118
 guilty-0.112
 unfair-0.111
 impossible-0.110
Token and
Token ID290
Feature activation+2.62e+00

Pos logit contributions
ric+0.023
ries+0.021
els+0.021
asted+0.020
ants+0.019
Neg logit contributions
 trouble-0.019
 unfair-0.019
 guilty-0.019
 Please-0.019
 mean-0.018
Token take
Token ID1011
Feature activation+1.66e+00

Pos logit contributions
ric+0.050
ries+0.048
els+0.047
hat+0.045
ants+0.044
Neg logit contributions
 Please-0.048
 trouble-0.048
 unfair-0.046
 guilty-0.045
 mean-0.044
Token a
Token ID257
Feature activation+6.22e-01

Pos logit contributions
ries+0.031
ric+0.031
els+0.030
asted+0.029
aches+0.029
Neg logit contributions
 Please-0.029
 unfair-0.029
 trouble-0.028
 guilty-0.027
 impossible-0.027
Token nap
Token ID25422
Feature activation+6.02e-01

Pos logit contributions
ric+0.011
ries+0.011
els+0.011
aches+0.010
asted+0.010
Neg logit contributions
 trouble-0.012
 Please-0.011
 unfair-0.011
 guilty-0.011
 impossible-0.011
Token.
Token ID13
Feature activation-1.17e+00

Pos logit contributions
ric+0.011
ries+0.010
els+0.010
aches+0.010
asted+0.010
Neg logit contributions
 unfair-0.011
 trouble-0.011
 Please-0.011
 guilty-0.011
 Give-0.011
Token The
Token ID383
Feature activation+1.72e+00

Pos logit contributions
 Please+0.016
 trouble+0.015
 unfair+0.015
 Give+0.015
 guilty+0.015
Neg logit contributions
ric-0.028
ries-0.027
els-0.027
aches-0.027
asted-0.025
Token boy
Token ID2933
Feature activation+1.13e+00

Pos logit contributions
ries+0.032
ric+0.032
els+0.030
asted+0.030
aches+0.030
Neg logit contributions
 Please-0.030
 impossible-0.028
 unfair-0.028
 trouble-0.028
 Give-0.027
Token did
Token ID750
Feature activation-3.19e+00

Pos logit contributions
ric+0.021
ries+0.021
els+0.020
aches+0.019
asted+0.019
Neg logit contributions
 trouble-0.020
 Please-0.020
 unfair-0.020
 guilty-0.019
 impossible-0.019
Token as
Token ID355
Feature activation+7.22e-01

Pos logit contributions
 Please+0.051
 trouble+0.050
 unfair+0.050
 guilty+0.048
 impossible+0.047
Neg logit contributions
ric-0.067
ries-0.065
els-0.062
aches-0.062
asted-0.061
Token he
Token ID339
Feature activation-2.68e-02

Pos logit contributions
ric+0.013
ries+0.013
aches+0.012
els+0.012
asted+0.012
Neg logit contributions
 Please-0.014
 unfair-0.013
 trouble-0.013
 Give-0.013
 impossible-0.012
Token was
Token ID373
Feature activation-2.82e+00

Pos logit contributions
 trouble+0.000
 Please+0.000
 unfair+0.000
 guilty+0.000
 mean+0.000
Neg logit contributions
ric-0.001
ries-0.001
els-0.001
ants-0.000
aches-0.000
Token.
Token ID13
Feature activation-2.37e+00

Pos logit contributions
 trouble+0.027
 Please+0.025
 unfair+0.025
 guilty+0.024
 mean+0.024
Neg logit contributions
ric-0.043
ries-0.040
els-0.040
asted-0.038
hat-0.038
Token She
Token ID1375
Feature activation+1.58e+00

Pos logit contributions
 Please+0.037
 trouble+0.036
 unfair+0.036
 guilty+0.034
 Give+0.034
Neg logit contributions
ric-0.051
ries-0.050
els-0.048
aches-0.047
asted-0.046
Token felt
Token ID2936
Feature activation-3.16e+00

Pos logit contributions
ric+0.031
ries+0.031
els+0.029
aches+0.028
hat+0.028
Neg logit contributions
 trouble-0.027
 Please-0.027
 unfair-0.027
 guilty-0.026
 mean-0.026
Token so
Token ID523
Feature activation-1.26e+00

Pos logit contributions
 Please+0.039
 Give+0.038
ANG+0.035
 Come+0.034
 fault+0.033
Neg logit contributions
aches-0.069
asted-0.069
ric-0.068
ries-0.068
els-0.065
Token relieved
Token ID22598
Feature activation-6.29e-01

Pos logit contributions
 Please+0.020
 Give+0.019
 Can+0.018
 trouble+0.018
 robber+0.017
Neg logit contributions
aches-0.025
asted-0.023
ric-0.022
ries-0.022
els-0.021
Token that
Token ID326
Feature activation+1.97e-01

Pos logit contributions
 trouble+0.011
 Please+0.011
 unfair+0.011
 guilty+0.010
 impossible+0.010
Neg logit contributions
ric-0.012
ries-0.012
els-0.011
aches-0.011
asted-0.011
Token she
Token ID673
Feature activation+2.32e-01

Pos logit contributions
ric+0.004
ries+0.004
els+0.003
hat+0.003
asted+0.003
Neg logit contributions
 trouble-0.004
 Please-0.004
 unfair-0.004
 guilty-0.004
 mean-0.004
Token remembered
Token ID12086
Feature activation-1.15e+00

Pos logit contributions
ric+0.004
ries+0.004
aches+0.004
els+0.004
asted+0.004
Neg logit contributions
 trouble-0.005
 unfair-0.004
 impossible-0.004
 Please-0.004
 Give-0.004
Token and
Token ID290
Feature activation+2.36e+00

Pos logit contributions
 trouble+0.023
 unfair+0.022
 guilty+0.022
 Please+0.022
 mean+0.021
Neg logit contributions
ric-0.021
ries-0.019
els-0.019
hat-0.018
ants-0.018
Token so
Token ID523
Feature activation+7.94e-01

Pos logit contributions
ric+0.044
ries+0.042
els+0.041
hat+0.039
aches+0.039
Neg logit contributions
 trouble-0.044
 Please-0.044
 unfair-0.043
 guilty-0.042
 mean-0.041
Token happy
Token ID3772
Feature activation-3.23e-01

Pos logit contributions
ric+0.019
ries+0.018
els+0.017
hat+0.017
asted+0.017
Neg logit contributions
 guilty-0.012
 sorry-0.012
 trouble-0.012
 Please-0.011
 unfair-0.011
Token went
Token ID1816
Feature activation-2.04e+00

Pos logit contributions
ric+0.021
ries+0.021
els+0.020
aches+0.019
asted+0.019
Neg logit contributions
 trouble-0.020
 Please-0.020
 unfair-0.020
 guilty-0.019
 impossible-0.019
Token closer
Token ID5699
Feature activation+1.56e+00

Pos logit contributions
 unfair+0.039
 Please+0.038
 trouble+0.038
 liar+0.035
 guilty+0.035
Neg logit contributions
ric-0.034
ries-0.034
hat-0.033
aches-0.033
els-0.033
Token and
Token ID290
Feature activation+2.02e+00

Pos logit contributions
ric+0.032
ries+0.032
els+0.030
aches+0.030
hat+0.029
Neg logit contributions
 unfair-0.025
 Please-0.025
 trouble-0.025
 guilty-0.024
 impossible-0.023
Token closer
Token ID5699
Feature activation+1.48e+00

Pos logit contributions
ric+0.043
ries+0.042
els+0.040
aches+0.039
asted+0.038
Neg logit contributions
 Please-0.033
 trouble-0.032
 unfair-0.032
 guilty-0.031
 impossible-0.030
Token until
Token ID1566
Feature activation+2.17e+00

Pos logit contributions
ric+0.030
ries+0.029
els+0.028
aches+0.027
hat+0.027
Neg logit contributions
 trouble-0.024
 unfair-0.024
 Please-0.024
 guilty-0.023
 Give-0.022
Token he
Token ID339
Feature activation-8.65e-01

Pos logit contributions
ric+0.043
ries+0.043
els+0.041
aches+0.040
asted+0.040
Neg logit contributions
 unfair-0.036
 trouble-0.035
 Please-0.035
 guilty-0.034
 Give-0.033
Token could
Token ID714
Feature activation-2.82e+00

Pos logit contributions
 trouble+0.015
 Please+0.015
 unfair+0.015
 guilty+0.014
 impossible+0.014
Neg logit contributions
ric-0.017
ries-0.016
els-0.016
aches-0.015
asted-0.015
Token see
Token ID766
Feature activation+1.37e+00

Pos logit contributions
 trouble+0.050
 Please+0.050
 unfair+0.050
 guilty+0.048
 impossible+0.047
Neg logit contributions
ric-0.054
ries-0.052
els-0.050
aches-0.049
asted-0.048
Token what
Token ID644
Feature activation-8.48e-01

Pos logit contributions
ric+0.024
ries+0.023
els+0.022
hat+0.022
asted+0.022
Neg logit contributions
 trouble-0.027
 unfair-0.026
 guilty-0.026
 Please-0.026
 mean-0.025
Token was
Token ID373
Feature activation-5.76e+00

Pos logit contributions
 Please+0.015
 trouble+0.015
 unfair+0.015
 guilty+0.014
 impossible+0.014
Neg logit contributions
ric-0.016
ries-0.015
els-0.015
aches-0.014
asted-0.014
Token causing
Token ID6666
Feature activation-1.90e+00

Pos logit contributions
 Please+0.106
 trouble+0.101
 unfair+0.101
 Give+0.101
 guilty+0.097
Neg logit contributions
ric-0.103
ries-0.103
els-0.098
aches-0.096
asted-0.090
Token started
Token ID2067
Feature activation-2.38e+00

Pos logit contributions
ric+0.048
ries+0.048
els+0.045
asted+0.044
aches+0.044
Neg logit contributions
 Please-0.056
 unfair-0.055
 trouble-0.055
 impossible-0.052
 guilty-0.052
Token running
Token ID2491
Feature activation-1.76e+00

Pos logit contributions
 trouble+0.033
 Please+0.031
 unfair+0.031
 guilty+0.030
 impossible+0.029
Neg logit contributions
ric-0.055
ries-0.055
els-0.053
aches-0.052
asted-0.051
Token along
Token ID1863
Feature activation+3.40e+00

Pos logit contributions
 unfair+0.039
 Please+0.038
 trouble+0.037
 liar+0.036
 Promise+0.036
Neg logit contributions
ric-0.024
hat-0.023
ries-0.023
els-0.022
aches-0.022
Token the
Token ID262
Feature activation+2.43e+00

Pos logit contributions
ric+0.052
ries+0.051
els+0.048
aches+0.046
asted+0.046
Neg logit contributions
 unfair-0.071
 Please-0.071
 trouble-0.070
 guilty-0.068
 Give-0.067
Token track
Token ID2610
Feature activation+1.75e+00

Pos logit contributions
ric+0.036
asted+0.033
ries+0.033
hat+0.032
els+0.032
Neg logit contributions
 Please-0.053
 unfair-0.052
 trouble-0.052
 sorry-0.051
 guilty-0.051
Token.
Token ID13
Feature activation-5.22e-01

Pos logit contributions
ric+0.031
ries+0.030
els+0.029
aches+0.028
hat+0.028
Neg logit contributions
 unfair-0.033
 trouble-0.033
 Please-0.032
 guilty-0.031
 impossible-0.030
Token
Token ID198
Feature activation+5.54e+00

Pos logit contributions
 Please+0.009
 trouble+0.009
 Give+0.009
 guilty+0.008
 Come+0.008
Neg logit contributions
ries-0.011
ric-0.010
els-0.010
aches-0.010
asted-0.009
Token
Token ID198
Feature activation+4.53e+00

Pos logit contributions
ries+0.137
ric+0.132
els+0.130
aches+0.121
ishes+0.120
Neg logit contributions
 Please-0.075
 trouble-0.074
 Give-0.071
 guilty-0.067
 mean-0.066
TokenM
Token ID44
Feature activation+6.15e-01

Pos logit contributions
ries+0.081
ric+0.081
els+0.075
aches+0.065
ishes+0.065
Neg logit contributions
 Please-0.101
 Give-0.092
 mean-0.090
 unfair-0.090
 trouble-0.088
Tokenilo
Token ID18526
Feature activation-8.77e-01

Pos logit contributions
ric+0.009
ries+0.009
els+0.009
aches+0.008
asted+0.008
Neg logit contributions
 Please-0.013
 trouble-0.013
 unfair-0.013
 guilty-0.013
 impossible-0.013
Token was
Token ID373
Feature activation-3.86e+00

Pos logit contributions
 trouble+0.017
 unfair+0.017
 Please+0.017
 impossible+0.016
 guilty+0.016
Neg logit contributions
ric-0.014
ries-0.014
els-0.013
aches-0.013
hat-0.013
Token up
Token ID510
Feature activation+4.73e+00

Pos logit contributions
ric+0.106
ries+0.101
els+0.100
aches+0.094
ants+0.094
Neg logit contributions
 trouble-0.057
 unfair-0.057
 guilty-0.055
 impossible-0.055
 wrong-0.055
Token and
Token ID290
Feature activation+2.95e+00

Pos logit contributions
ric+0.121
ries+0.107
els+0.106
asted+0.105
hat+0.103
Neg logit contributions
 trouble-0.064
 guilty-0.058
 wrong-0.057
 Please-0.056
 stupid-0.056
Token felt
Token ID2936
Feature activation-2.64e+00

Pos logit contributions
ric+0.058
ries+0.055
els+0.054
aches+0.051
hat+0.051
Neg logit contributions
 trouble-0.053
 Please-0.052
 unfair-0.051
 guilty-0.051
 mean-0.049
Token much
Token ID881
Feature activation-2.13e-01

Pos logit contributions
 Please+0.041
 trouble+0.039
 unfair+0.038
 Give+0.038
 impossible+0.036
Neg logit contributions
ric-0.054
ries-0.052
aches-0.052
asted-0.052
els-0.049
Token better
Token ID1365
Feature activation-1.87e+00

Pos logit contributions
 Please+0.003
 Give+0.003
 unfair+0.003
 liar+0.003
 robber+0.003
Neg logit contributions
aches-0.004
ries-0.004
asted-0.004
ric-0.004
hat-0.004
Token.
Token ID13
Feature activation-6.52e-01

Pos logit contributions
 unfair+0.037
 Please+0.036
 Give+0.035
 trouble+0.034
 impossible+0.034
Neg logit contributions
ric-0.031
ries-0.030
aches-0.028
hat-0.028
els-0.028
Token The
Token ID383
Feature activation+1.68e+00

Pos logit contributions
 Please+0.012
 trouble+0.012
 unfair+0.012
 guilty+0.011
 Give+0.011
Neg logit contributions
ric-0.012
ries-0.012
els-0.011
aches-0.011
asted-0.011
Token end
Token ID886
Feature activation-1.04e+00

Pos logit contributions
ric+0.028
ries+0.027
els+0.025
aches+0.025
asted+0.025
Neg logit contributions
 Please-0.034
 trouble-0.034
 unfair-0.034
 guilty-0.033
 impossible-0.033
Token.
Token ID13
Feature activation-3.69e-01

Pos logit contributions
 trouble+0.018
 Please+0.018
 unfair+0.018
 guilty+0.017
 impossible+0.017
Neg logit contributions
ric-0.020
ries-0.020
els-0.019
aches-0.019
asted-0.018
Token<|endoftext|>
Token ID50256
Feature activation+1.68e+00

Pos logit contributions
 trouble+0.006
 Please+0.006
 guilty+0.005
 Give+0.005
 You+0.005
Neg logit contributions
ries-0.009
els-0.009
aches-0.008
ric-0.008
ants-0.008
TokenOnce
Token ID7454
Feature activation+1.59e+00

Pos logit contributions
ric+0.028
ries+0.027
els+0.026
aches+0.025
asted+0.025
Neg logit contributions
 trouble-0.034
 Please-0.034
 unfair-0.033
 guilty-0.032
 impossible-0.032
Token to
Token ID284
Feature activation+4.25e+00

Pos logit contributions
 trouble+0.118
 unfair+0.111
 impossible+0.109
 crank+0.108
 consequences+0.106
Neg logit contributions
aches-0.080
ries-0.078
ants-0.075
ric-0.073
bel-0.070
Token the
Token ID262
Feature activation-1.93e-01

Pos logit contributions
ries+0.082
hat+0.081
ric+0.080
els+0.079
bel+0.079
Neg logit contributions
 trouble-0.079
 sorry-0.075
 unfair-0.075
 mean-0.075
 stupid-0.073
Token rabbit
Token ID22746
Feature activation-2.39e-01

Pos logit contributions
 Please+0.004
 trouble+0.004
 Give+0.004
 unfair+0.004
 guilty+0.003
Neg logit contributions
ric-0.003
hat-0.003
aches-0.003
ries-0.003
asted-0.003
Token,
Token ID11
Feature activation-7.93e-01

Pos logit contributions
 trouble+0.004
 Please+0.004
 unfair+0.004
 guilty+0.004
 impossible+0.004
Neg logit contributions
ric-0.005
ries-0.004
els-0.004
aches-0.004
asted-0.004
Token
Token ID6184
Feature activation-5.31e+00

Pos logit contributions
 trouble+0.015
 unfair+0.014
 impossible+0.013
 consequences+0.013
 Please+0.013
Neg logit contributions
ric-0.014
ries-0.013
asted-0.012
els-0.012
hat-0.012
Token
Token ID95
Feature activation-3.12e+00

Pos logit contributions
 trouble+0.102
 unfair+0.098
 Please+0.098
 guilty+0.095
 impossible+0.094
Neg logit contributions
ric-0.095
ries-0.090
els-0.087
aches-0.086
asted-0.085
Token
Token ID26391
Feature activation-5.47e+00

Pos logit contributions
 trouble+0.059
 unfair+0.057
 guilty+0.056
 Please+0.056
 impossible+0.054
Neg logit contributions
ric-0.057
asted-0.053
ries-0.051
els-0.050
hat-0.049
Token
Token ID129
Feature activation-1.93e+00

Pos logit contributions
 Please+0.112
 trouble+0.111
 unfair+0.109
 guilty+0.106
 mean+0.104
Neg logit contributions
ric-0.089
ries-0.088
els-0.083
aches-0.080
asted-0.080
Token
Token ID241
Feature activation-1.07e-01

Pos logit contributions
 trouble+0.036
 Please+0.035
 unfair+0.034
 guilty+0.034
 Give+0.032
Neg logit contributions
ric-0.036
ries-0.033
asted-0.033
aches-0.033
els-0.033
TokenI
Token ID40
Feature activation-1.90e+00

Pos logit contributions
 unfair+0.002
 trouble+0.002
 impossible+0.002
 guilty+0.002
 sorry+0.002
Neg logit contributions
ric-0.002
asted-0.002
hat-0.002
ries-0.002
aches-0.002
Token have
Token ID423
Feature activation-3.43e+00

Pos logit contributions
 unfair+0.030
 Please+0.030
 trouble+0.030
 impossible+0.028
 consequences+0.028
Neg logit contributions
ric-0.036
aches-0.036
asted-0.035
hat-0.034
bel-0.033
Token Tom
Token ID4186
Feature activation+9.05e-01

Pos logit contributions
 trouble+0.002
 Please+0.002
 unfair+0.002
 guilty+0.001
 sorry+0.001
Neg logit contributions
ric-0.001
ries-0.001
els-0.001
hat-0.001
aches-0.001
Token and
Token ID290
Feature activation+2.20e+00

Pos logit contributions
els+0.010
ries+0.010
ric+0.010
hat+0.009
aches+0.009
Neg logit contributions
 Please-0.022
 impossible-0.022
 unfair-0.022
 Can-0.020
 trouble-0.020
Token Jen
Token ID13364
Feature activation+2.00e+00

Pos logit contributions
ric+0.047
ries+0.047
hat+0.045
els+0.044
ants+0.044
Neg logit contributions
 trouble-0.035
 mean-0.034
 unfair-0.033
 stupid-0.032
 Give-0.032
Token are
Token ID389
Feature activation-1.85e+00

Pos logit contributions
ric+0.035
ries+0.033
aches+0.032
els+0.032
asted+0.031
Neg logit contributions
 Please-0.038
 trouble-0.038
 unfair-0.038
 impossible-0.036
 Give-0.036
Token still
Token ID991
Feature activation-2.03e+00

Pos logit contributions
 Please+0.026
 Give+0.026
 trouble+0.025
 unfair+0.024
 impossible+0.024
Neg logit contributions
ric-0.040
ries-0.039
aches-0.039
els-0.038
asted-0.036
Token sad
Token ID6507
Feature activation-1.75e+00

Pos logit contributions
 Please+0.024
 Give+0.022
 unfair+0.021
 impossible+0.021
 trouble+0.020
Neg logit contributions
ric-0.051
ries-0.049
els-0.047
aches-0.046
hat-0.045
Token.
Token ID13
Feature activation-3.91e+00

Pos logit contributions
 unfair+0.034
 Please+0.033
 impossible+0.033
 trouble+0.032
 Give+0.032
Neg logit contributions
ries-0.030
ric-0.029
els-0.028
aches-0.026
hat-0.025
Token They
Token ID1119
Feature activation+2.00e+00

Pos logit contributions
 unfair+0.074
 trouble+0.073
 impossible+0.072
 Please+0.072
 guilty+0.071
Neg logit contributions
ric-0.069
ries-0.065
els-0.062
hat-0.060
asted-0.060
Token miss
Token ID2051
Feature activation-2.16e-01

Pos logit contributions
ric+0.037
aches+0.035
asted+0.034
els+0.032
ries+0.031
Neg logit contributions
 Please-0.035
 trouble-0.034
 unfair-0.033
 impossible-0.033
 Give-0.033
Token their
Token ID511
Feature activation+1.51e-01

Pos logit contributions
 trouble+0.005
 sorry+0.005
 unfair+0.005
 mean+0.005
 guilty+0.005
Neg logit contributions
ric-0.003
hat-0.002
aches-0.002
ries-0.002
ants-0.002
Token old
Token ID1468
Feature activation-1.79e-01

Pos logit contributions
ric+0.003
ries+0.003
els+0.003
aches+0.003
asted+0.003
Neg logit contributions
 Please-0.003
 trouble-0.003
 unfair-0.003
 guilty-0.003
 impossible-0.003
Token or
Token ID393
Feature activation-3.42e-01

Pos logit contributions
 unfair+0.002
 impossible+0.002
 guilty+0.002
 sorry+0.002
 Please+0.002
Neg logit contributions
ries-0.002
els-0.002
ric-0.002
ie-0.002
hat-0.002
Token killed
Token ID2923
Feature activation+2.14e+00

Pos logit contributions
 trouble+0.005
 Please+0.004
 unfair+0.004
 guilty+0.004
 impossible+0.004
Neg logit contributions
ric-0.008
ries-0.008
els-0.008
aches-0.007
asted-0.007
Token by
Token ID416
Feature activation+5.57e+00

Pos logit contributions
ries+0.041
ric+0.040
els+0.039
aches+0.037
asted+0.036
Neg logit contributions
 unfair-0.037
 Please-0.037
 trouble-0.037
 guilty-0.036
 impossible-0.036
Token the
Token ID262
Feature activation+2.25e+00

Pos logit contributions
hat+0.101
ric+0.096
ries+0.093
bel+0.092
packs+0.091
Neg logit contributions
 trouble-0.111
 unfair-0.103
 sorry-0.102
 mean-0.101
 stupid-0.100
Token truck
Token ID7779
Feature activation+1.36e+00

Pos logit contributions
ric+0.042
ries+0.040
els+0.038
asted+0.037
aches+0.037
Neg logit contributions
 Please-0.039
 unfair-0.039
 impossible-0.038
 Give-0.038
 trouble-0.036
Token.
Token ID13
Feature activation-5.25e+00

Pos logit contributions
ric+0.025
ries+0.025
els+0.024
aches+0.023
hat+0.022
Neg logit contributions
 unfair-0.024
 trouble-0.023
 Please-0.023
 guilty-0.023
 impossible-0.022
Token You
Token ID921
Feature activation-1.49e+00

Pos logit contributions
 impossible+0.058
 unfair+0.055
 wrong+0.051
 guilty+0.051
 trouble+0.050
Neg logit contributions
ric-0.135
ries-0.126
els-0.124
asted-0.123
ants-0.118
Token should
Token ID815
Feature activation-8.40e-01

Pos logit contributions
 trouble+0.024
 Please+0.024
 unfair+0.023
 guilty+0.023
 mean+0.022
Neg logit contributions
ric-0.031
ries-0.031
els-0.029
asted-0.029
aches-0.028
Token never
Token ID1239
Feature activation+7.97e-01

Pos logit contributions
 Please+0.014
 trouble+0.014
 unfair+0.013
 guilty+0.013
 impossible+0.013
Neg logit contributions
ric-0.017
ries-0.016
aches-0.016
els-0.016
asted-0.016
Token go
Token ID467
Feature activation-8.32e-01

Pos logit contributions
ries+0.015
ric+0.015
els+0.014
ants+0.013
hat+0.013
Neg logit contributions
 trouble-0.015
 Please-0.015
 unfair-0.014
 mean-0.014
 guilty-0.014
Token near
Token ID1474
Feature activation+4.81e+00

Pos logit contributions
 Please+0.014
 unfair+0.014
 trouble+0.014
 impossible+0.013
 guilty+0.013
Neg logit contributions
ric-0.016
ries-0.016
els-0.015
aches-0.015
hat-0.014
Token fairy
Token ID25607
Feature activation+6.99e-01

Pos logit contributions
ric+0.006
ries+0.006
aches+0.006
els+0.006
asted+0.006
Neg logit contributions
 trouble-0.009
 unfair-0.009
 Please-0.008
 Give-0.008
 impossible-0.008
Token replied
Token ID8712
Feature activation-3.20e+00

Pos logit contributions
ric+0.015
ries+0.014
els+0.014
asted+0.013
aches+0.013
Neg logit contributions
 Please-0.011
 trouble-0.011
 unfair-0.011
 mean-0.011
 guilty-0.011
Token,
Token ID11
Feature activation-8.06e-01

Pos logit contributions
 trouble+0.050
 Please+0.049
 unfair+0.049
 guilty+0.047
 impossible+0.046
Neg logit contributions
ric-0.068
ries-0.067
els-0.064
aches-0.062
asted-0.062
Token
Token ID6184
Feature activation-5.14e+00

Pos logit contributions
 Please+0.011
 trouble+0.011
 unfair+0.011
 guilty+0.010
 impossible+0.010
Neg logit contributions
ric-0.019
ries-0.018
els-0.018
aches-0.017
asted-0.017
Token
Token ID95
Feature activation-3.14e+00

Pos logit contributions
 trouble+0.104
 unfair+0.099
 guilty+0.097
 Please+0.095
 mean+0.095
Neg logit contributions
ric-0.087
ries-0.084
els-0.080
aches-0.079
asted-0.078
Token
Token ID26391
Feature activation-5.18e+00

Pos logit contributions
 trouble+0.059
 unfair+0.057
 guilty+0.057
 mean+0.055
 sorry+0.055
Neg logit contributions
ric-0.057
asted-0.052
ries-0.050
els-0.050
aches-0.050
Token
Token ID129
Feature activation-1.73e+00

Pos logit contributions
 trouble+0.095
 Please+0.094
 unfair+0.094
 guilty+0.091
 impossible+0.089
Neg logit contributions
ric-0.095
ries-0.093
els-0.088
aches-0.086
asted-0.085
Token
Token ID241
Feature activation-4.05e-01

Pos logit contributions
 trouble+0.032
 Please+0.032
 unfair+0.031
 guilty+0.030
 mean+0.030
Neg logit contributions
ric-0.032
ries-0.030
els-0.029
asted-0.029
aches-0.029
TokenI
Token ID40
Feature activation-1.47e+00

Pos logit contributions
 trouble+0.008
 unfair+0.008
 Please+0.007
 guilty+0.007
 impossible+0.007
Neg logit contributions
ric-0.007
ries-0.007
asted-0.006
hat-0.006
els-0.006
Token am
Token ID716
Feature activation-4.63e+00

Pos logit contributions
 trouble+0.023
 Please+0.022
 unfair+0.022
 guilty+0.021
 impossible+0.021
Neg logit contributions
ric-0.031
ries-0.030
els-0.029
aches-0.029
asted-0.029
Token giving
Token ID3501
Feature activation-6.10e-01

Pos logit contributions
 trouble+0.072
 Please+0.072
 unfair+0.072
 guilty+0.068
 Give+0.067
Neg logit contributions
ric-0.096
ries-0.094
els-0.090
aches-0.089
asted-0.088
Token please
Token ID3387
Feature activation-9.65e-01

Pos logit contributions
 trouble+0.035
 unfair+0.034
 impossible+0.032
 guilty+0.032
 Please+0.030
Neg logit contributions
ric-0.061
ries-0.058
asted-0.057
els-0.055
aches-0.055
Token.
Token ID13
Feature activation-6.18e+00

Pos logit contributions
 trouble+0.016
 unfair+0.015
 impossible+0.015
 guilty+0.015
 Please+0.014
Neg logit contributions
ric-0.020
ries-0.019
els-0.018
asted-0.018
aches-0.018
Token For
Token ID1114
Feature activation+2.07e+00

Pos logit contributions
 unfair+0.070
 impossible+0.070
 trouble+0.063
 wrong+0.062
 guilty+0.062
Neg logit contributions
ric-0.149
asted-0.145
illas-0.136
ries-0.136
packs-0.133
Tokengive
Token ID26535
Feature activation-4.64e-01

Pos logit contributions
ric+0.044
ries+0.043
els+0.041
aches+0.040
hat+0.039
Neg logit contributions
 trouble-0.033
 Please-0.033
 unfair-0.032
 guilty-0.031
 mean-0.030
Token us
Token ID514
Feature activation+1.76e+00

Pos logit contributions
 guilty+0.009
 unfair+0.009
 impossible+0.009
 trouble+0.008
 Please+0.008
Neg logit contributions
ries-0.008
ric-0.008
els-0.008
hat-0.007
asted-0.007
Token.
Token ID13
Feature activation-5.42e+00

Pos logit contributions
ric+0.039
ries+0.039
els+0.037
aches+0.036
asted+0.036
Neg logit contributions
 trouble-0.025
 Please-0.025
 unfair-0.025
 guilty-0.024
 impossible-0.023
Token For
Token ID1114
Feature activation+2.69e+00

Pos logit contributions
 impossible+0.064
 unfair+0.062
 trouble+0.059
 wrong+0.058
 guilty+0.057
Neg logit contributions
ric-0.130
asted-0.126
ries-0.120
els-0.117
illas-0.116
Tokengive
Token ID26535
Feature activation-4.24e-01

Pos logit contributions
ric+0.060
ries+0.058
els+0.056
hat+0.055
aches+0.055
Neg logit contributions
 trouble-0.041
 Please-0.041
 unfair-0.039
 mean-0.037
 sorry-0.037
Token us
Token ID514
Feature activation+1.82e+00

Pos logit contributions
 unfair+0.008
 guilty+0.008
 trouble+0.008
 Please+0.008
 impossible+0.007
Neg logit contributions
ries-0.008
ric-0.008
els-0.007
hat-0.007
asted-0.007
Token.
Token ID13
Feature activation-4.97e+00

Pos logit contributions
ric+0.040
ries+0.039
els+0.037
aches+0.037
asted+0.036
Neg logit contributions
 trouble-0.027
 Please-0.027
 unfair-0.027
 guilty-0.025
 impossible-0.025
Token For
Token ID1114
Feature activation+2.91e+00

Pos logit contributions
 impossible+0.059
 unfair+0.057
 trouble+0.056
 guilty+0.053
 wrong+0.053
Neg logit contributions
ric-0.122
asted-0.115
ries-0.115
els-0.111
illas-0.107
Token You
Token ID921
Feature activation-1.80e+00

Pos logit contributions
 impossible+0.044
 unfair+0.044
 trouble+0.041
 guilty+0.040
 wrong+0.037
Neg logit contributions
ric-0.162
ries-0.158
els-0.156
asted-0.152
ants-0.148
Token only
Token ID691
Feature activation+7.62e-02

Pos logit contributions
 trouble+0.029
 Please+0.029
 unfair+0.028
 mean+0.027
 guilty+0.027
Neg logit contributions
ric-0.038
ries-0.037
els-0.035
asted-0.034
hat-0.034
Token wanted
Token ID2227
Feature activation-2.53e+00

Pos logit contributions
ric+0.002
ries+0.002
els+0.002
aches+0.001
asted+0.001
Neg logit contributions
 trouble-0.001
 Please-0.001
 unfair-0.001
 guilty-0.001
 impossible-0.001
Token the
Token ID262
Feature activation+8.36e-01

Pos logit contributions
 trouble+0.035
 Please+0.035
 unfair+0.034
 guilty+0.033
 impossible+0.032
Neg logit contributions
ric-0.058
ries-0.057
els-0.055
aches-0.054
asted-0.053
Token ring
Token ID5858
Feature activation+8.79e-01

Pos logit contributions
ric+0.011
ries+0.011
aches+0.010
els+0.010
hat+0.009
Neg logit contributions
 Please-0.020
 unfair-0.019
 trouble-0.019
 impossible-0.019
 Give-0.018
Token.
Token ID13
Feature activation-5.31e+00

Pos logit contributions
ric+0.020
ries+0.019
els+0.018
asted+0.018
aches+0.018
Neg logit contributions
 Please-0.013
 trouble-0.013
 unfair-0.012
 guilty-0.012
 mean-0.012
Token But
Token ID887
Feature activation-5.47e-01

Pos logit contributions
 impossible+0.050
 unfair+0.050
 guilty+0.047
 trouble+0.046
 wrong+0.045
Neg logit contributions
ric-0.142
els-0.138
ries-0.138
asted-0.134
ants-0.130
Token the
Token ID262
Feature activation-8.08e-02

Pos logit contributions
 Please+0.007
 trouble+0.007
 unfair+0.006
 guilty+0.006
 mean+0.006
Neg logit contributions
ric-0.014
ries-0.013
els-0.013
hat-0.013
asted-0.013
Token ring
Token ID5858
Feature activation+1.53e+00

Pos logit contributions
 Please+0.002
 unfair+0.002
 trouble+0.002
 impossible+0.001
 guilty+0.001
Neg logit contributions
ric-0.001
ries-0.001
aches-0.001
els-0.001
asted-0.001
Token is
Token ID318
Feature activation-2.61e+00

Pos logit contributions
ric+0.036
ries+0.035
els+0.034
asted+0.033
aches+0.033
Neg logit contributions
 trouble-0.020
 Please-0.020
 unfair-0.020
 guilty-0.019
 mean-0.018
Token not
Token ID407
Feature activation-3.40e+00

Pos logit contributions
 trouble+0.032
 Please+0.032
 unfair+0.031
 guilty+0.030
 impossible+0.029
Neg logit contributions
ric-0.063
ries-0.062
els-0.060
aches-0.059
asted-0.058
Token Lily
Token ID20037
Feature activation-2.02e+00

Pos logit contributions
 trouble+0.041
 Please+0.041
 unfair+0.041
 guilty+0.039
 impossible+0.038
Neg logit contributions
ric-0.055
ries-0.054
els-0.052
aches-0.051
asted-0.050
Token.
Token ID13
Feature activation-3.32e+00

Pos logit contributions
 Please+0.024
 mean+0.024
 trouble+0.024
 Give+0.023
 guilty+0.023
Neg logit contributions
ries-0.050
ric-0.049
aches-0.047
els-0.047
ants-0.046
Token She
Token ID1375
Feature activation-1.18e-01

Pos logit contributions
 trouble+0.058
 unfair+0.056
 Please+0.056
 guilty+0.055
 impossible+0.054
Neg logit contributions
ric-0.062
ries-0.062
els-0.058
hat-0.057
asted-0.057
Token loved
Token ID6151
Feature activation-3.40e+00

Pos logit contributions
 trouble+0.003
 impossible+0.002
 Please+0.002
 unfair+0.002
 guilty+0.002
Neg logit contributions
ric-0.002
aches-0.002
ries-0.002
hat-0.002
asted-0.002
Token to
Token ID284
Feature activation-6.86e-01

Pos logit contributions
 trouble+0.039
 Please+0.037
 unfair+0.036
 mean+0.034
 guilty+0.033
Neg logit contributions
ric-0.090
els-0.085
ries-0.084
aches-0.083
ants-0.082
Token look
Token ID804
Feature activation-5.16e+00

Pos logit contributions
 mean+0.012
 trouble+0.012
 Please+0.012
 unfair+0.012
 sorry+0.011
Neg logit contributions
ric-0.014
ries-0.014
els-0.013
ants-0.012
illas-0.012
Token up
Token ID510
Feature activation-1.57e-03

Pos logit contributions
 trouble+0.074
 Please+0.074
 unfair+0.073
 guilty+0.071
 mean+0.069
Neg logit contributions
ric-0.118
ries-0.113
els-0.108
asted-0.107
aches-0.107
Token at
Token ID379
Feature activation+3.50e+00

Pos logit contributions
 trouble+0.000
 Please+0.000
 guilty+0.000
 mean+0.000
 stupid+0.000
Neg logit contributions
ric-0.000
asted-0.000
aches-0.000
ries-0.000
els-0.000
Token the
Token ID262
Feature activation-7.90e-01

Pos logit contributions
ric+0.063
ants+0.060
hat+0.059
aches+0.058
illas+0.057
Neg logit contributions
 trouble-0.076
 unfair-0.065
 mean-0.065
 wrong-0.064
 stupid-0.064
Token stars
Token ID5788
Feature activation+1.69e+00

Pos logit contributions
 trouble+0.014
 Please+0.014
 unfair+0.014
 mean+0.013
 guilty+0.013
Neg logit contributions
ric-0.015
ries-0.015
els-0.014
asted-0.014
aches-0.014
Token in
Token ID287
Feature activation+5.31e+00

Pos logit contributions
ric+0.037
ries+0.034
asted+0.034
aches+0.034
els+0.033
Neg logit contributions
 Please-0.025
 trouble-0.024
 mean-0.024
 impossible-0.024
 guilty-0.024