TOP ACTIVATIONS
MAX = 5.708

him . But to no avail , they were too scared
deep paws , to no avail . He finally thought of
? Why are you pan icking ?" " My
times , but to no avail . Jane 's
Her mom saw her pan icking and asked , " What
with something else . K icking makes everyone sad ."
out , but to no avail . He felt sad and
with fragile things . P icking up an important map is
to draw with my cr ay ons ." <|endoftext|> Once upon
the floor , making v room and be ep sounds .
with their books and cr ay ons . They wished Mom
the right thing by pan icking . It 's ok to
. Ben has many cr ay ons , but Mia has
scissors , glue and cr ay ons . They make their
They draw with the cr ay ons . They use the
draw on paper with cr ay ons and stickers . They
K icking is fun ! It makes
, screaming , and pan icking . Sally was so frightened
makes v room - v room sounds . "
" Why are you pan icking ?" " I

MIN ACTIVATIONS
MIN = -5.647

happy that she learned about cher ries that day . She
would come and let the cher ries fall to the ground
must n 't touch the cher ries !" But Jack was
was filled with special pink cher ries . The
night it grows even more cher ries ." The
on he always polished his cher ries before eating them .
and soon everyone was taking turns to spin the trophy .
. You 've had enough cher ries for today . You
to be careful when picking cher ries so that the tree
ugly . There were no cher ries ! L
she had picked too many cher ries and the tree was
full of y ummy red cher ries . The bird ie
had been right about the cher ries . From that day
came back to pick more cher ries . She was so
. Little Tommy wanted some cher ries from the tree ,
The bird ie ate many cher ries and felt happy .
. Then both friends took turns spinning the tire together .
up and grabbed as many cher ries as he could .
pick a few big red cher ries , then rode away
still happy to have her cher ries to eat . She

INTERVAL 2.870 - 5.708
CONTAINS 0.202%

. M ummy paid for the cake .
They both g iggle and play with the clown
a giant smile to the world . One day
Pete " P ete was touched by his dad
the corner . When M ummy saw Sarah , she screamed

INTERVAL 0.031 - 2.870
CONTAINS 46.491%

was scared , but the car led him down the road
who liked to play in the garden . They liked to
, she would take a break to enjoy a bit of
friends . He would often fold his paws and wait patiently
of different colors and shapes . They can make houses ,

INTERVAL -2.808 - 0.031
CONTAINS 53.215%

ons to make a pretty picture . Later that
bear . We can share ," L ila says .
hard to keep us safe . The little girl smiled and
sink . She looked over and saw that a little frog
The daughter played with her toys and the mom watched her

INTERVAL -5.647 - -2.808
CONTAINS 0.092%

on , Lily shared her clay creations with others and felt
Billy have fun with the clay . He apologized to Billy
friends agreed that they were proud . They felt content and
They take the ice cre ams and lick them .
M ia feels a tug on her pole too .

Token him
Token ID683
Feature activation+1.44e+00

Pos logit contributions
 p+0.048
 st+0.047
 r+0.046
 hurt+0.042
 pet+0.041
Neg logit contributions
ancer-0.129
ancing-0.119
icking-0.118
ver-0.117
ummy-0.115
Token.
Token ID13
Feature activation-9.05e-01

Pos logit contributions
ancer+0.210
ancing+0.196
icking+0.192
ummy+0.191
ver+0.191
Neg logit contributions
 p-0.040
 st-0.039
 r-0.037
 hurt-0.032
 pet-0.030
Token But
Token ID887
Feature activation-1.11e+00

Pos logit contributions
 p+0.052
 r+0.047
 st+0.045
 pet+0.045
 le+0.044
Neg logit contributions
ancer-0.102
icking-0.099
ancing-0.094
otion-0.089
ummy-0.088
Token to
Token ID284
Feature activation-1.08e+00

Pos logit contributions
 p+0.055
 r+0.055
 st+0.054
 hurt+0.050
 sw+0.046
Neg logit contributions
ancer-0.141
ancing-0.133
ummy-0.129
unch-0.125
apping-0.124
Token no
Token ID645
Feature activation+8.39e-01

Pos logit contributions
 p+0.050
 r+0.049
 st+0.046
 hurt+0.044
 sc+0.043
Neg logit contributions
ancer-0.143
icking-0.131
ancing-0.131
ummy-0.129
ver-0.128
Token avail
Token ID29107
Feature activation+5.23e+00

Pos logit contributions
ancer+0.088
ancing+0.081
icking+0.079
ver+0.079
ummy+0.077
Neg logit contributions
 p-0.057
 st-0.056
 r-0.056
 hurt-0.052
 pet-0.051
Token,
Token ID11
Feature activation+2.83e-01

Pos logit contributions
ancer+0.693
ancing+0.638
ummy+0.624
icking+0.623
ver+0.605
Neg logit contributions
 p-0.211
 st-0.200
 r-0.199
 hurt-0.181
 pet-0.175
Token they
Token ID484
Feature activation-1.04e+00

Pos logit contributions
ancer+0.038
ancing+0.036
ummy+0.035
icking+0.034
apping+0.034
Neg logit contributions
 p-0.012
 st-0.012
 r-0.012
 hurt-0.011
 wished-0.010
Token were
Token ID547
Feature activation-7.70e-01

Pos logit contributions
 p+0.064
 st+0.063
 r+0.062
 hurt+0.058
 pet+0.056
Neg logit contributions
ancer-0.116
ancing-0.106
icking-0.105
ver-0.103
ummy-0.103
Token too
Token ID1165
Feature activation-7.09e-02

Pos logit contributions
 p+0.046
 st+0.045
 r+0.044
 scratches+0.041
 trash+0.041
Neg logit contributions
ancer-0.085
ancing-0.078
icking-0.078
ver-0.077
ummy-0.074
Token scared
Token ID12008
Feature activation-1.52e+00

Pos logit contributions
 p+0.003
 st+0.003
 r+0.003
 scratches+0.003
 hurt+0.003
Neg logit contributions
ancer-0.009
ver-0.008
icking-0.008
ancing-0.008
ummy-0.008
Token deep
Token ID2769
Feature activation+5.11e-01

Pos logit contributions
 p+0.063
 st+0.062
 r+0.062
 hurt+0.058
 trash+0.057
Neg logit contributions
ancer-0.082
ancing-0.074
icking-0.073
ver-0.073
ummy-0.071
Token paws
Token ID49193
Feature activation+1.88e-01

Pos logit contributions
ancer+0.049
ancing+0.046
icking+0.045
ver+0.044
ummy+0.044
Neg logit contributions
 p-0.036
 st-0.036
 r-0.035
 hurt-0.034
 pet-0.033
Token,
Token ID11
Feature activation+4.22e-01

Pos logit contributions
ancer+0.026
ancing+0.024
ummy+0.024
imb+0.023
ete+0.023
Neg logit contributions
 p-0.007
 st-0.006
 r-0.006
 hurt-0.006
 scattered-0.005
Token to
Token ID284
Feature activation-8.46e-01

Pos logit contributions
ancer+0.042
ancing+0.037
ver+0.037
icking+0.036
ummy+0.036
Neg logit contributions
 p-0.033
 st-0.032
 r-0.032
 hurt-0.030
 scratches-0.030
Token no
Token ID645
Feature activation+9.40e-01

Pos logit contributions
 p+0.048
 r+0.046
 st+0.046
 hurt+0.043
 pet+0.042
Neg logit contributions
ancer-0.101
ancing-0.092
icking-0.091
ver-0.091
ummy-0.090
Token avail
Token ID29107
Feature activation+5.19e+00

Pos logit contributions
ancer+0.098
ancing+0.089
icking+0.088
ver+0.087
ummy+0.086
Neg logit contributions
 p-0.063
 st-0.063
 r-0.062
 hurt-0.058
 scratches-0.057
Token.
Token ID13
Feature activation-4.41e-01

Pos logit contributions
ancer+0.738
ancing+0.689
icking+0.682
ver+0.675
ummy+0.670
Neg logit contributions
 p-0.152
 st-0.146
 r-0.143
 hurt-0.122
 pet-0.116
Token He
Token ID679
Feature activation-4.55e-02

Pos logit contributions
 p+0.020
 st+0.018
 r+0.018
 pet+0.017
 che+0.016
Neg logit contributions
ancer-0.055
icking-0.054
ancing-0.053
ummy-0.049
ying-0.048
Token finally
Token ID3443
Feature activation+1.71e+00

Pos logit contributions
 p+0.002
 st+0.002
 r+0.002
 trash+0.002
 scratches+0.002
Neg logit contributions
ancer-0.005
ver-0.005
icking-0.005
ancing-0.005
ummy-0.005
Token thought
Token ID1807
Feature activation-1.22e+00

Pos logit contributions
ver+0.191
ancer+0.187
icking+0.183
ancing+0.178
ete+0.174
Neg logit contributions
 st-0.085
 p-0.084
 trash-0.082
 r-0.078
 scratches-0.078
Token of
Token ID286
Feature activation+5.51e-01

Pos logit contributions
 r+0.065
 bad+0.061
 p+0.061
 sharp+0.060
 st+0.059
Neg logit contributions
ancer-0.152
imb-0.138
ancing-0.137
icking-0.133
cess-0.132
Token?
Token ID30
Feature activation+6.44e-01

Pos logit contributions
 p+0.086
 st+0.085
 r+0.085
 hurt+0.080
 pet+0.079
Neg logit contributions
ancer-0.087
ancing-0.078
icking-0.077
ver-0.076
ummy-0.074
Token Why
Token ID4162
Feature activation-9.33e-01

Pos logit contributions
ancer+0.059
ancing+0.056
icking+0.054
ummy+0.052
ete+0.052
Neg logit contributions
 p-0.052
 st-0.051
 r-0.051
 hurt-0.049
 wished-0.047
Token are
Token ID389
Feature activation-6.05e-02

Pos logit contributions
 p+0.049
 r+0.047
 st+0.047
 hurt+0.045
 trash+0.043
Neg logit contributions
ancer-0.113
ancing-0.106
ummy-0.103
icking-0.102
ver-0.100
Token you
Token ID345
Feature activation+7.11e-02

Pos logit contributions
 p+0.005
 st+0.005
 r+0.005
 hurt+0.005
 trash+0.005
Neg logit contributions
ancer-0.005
ancing-0.005
icking-0.005
ver-0.005
ummy-0.004
Token pan
Token ID3425
Feature activation-1.39e+00

Pos logit contributions
ancer+0.009
ancing+0.008
ete+0.008
ver+0.008
ummy+0.008
Neg logit contributions
 p-0.004
 st-0.004
 hurt-0.004
 r-0.004
 sc-0.003
Tokenicking
Token ID7958
Feature activation+5.18e+00

Pos logit contributions
 hurt+0.123
 p+0.120
 st+0.118
 r+0.117
 sc+0.112
Neg logit contributions
ancer-0.126
ancing-0.113
ete-0.112
ver-0.110
istent-0.105
Token?"
Token ID1701
Feature activation-9.48e-02

Pos logit contributions
ancer+0.573
ancing+0.518
ver+0.506
icking+0.498
ete+0.492
Neg logit contributions
 p-0.350
 r-0.340
 st-0.338
 hurt-0.331
 trash-0.308
Token[newline]
Token ID198
Feature activation-1.71e-01

Pos logit contributions
 st+0.007
 p+0.006
 r+0.006
 scratches+0.006
 trash+0.006
Neg logit contributions
ancer-0.009
ummy-0.008
ancing-0.008
icking-0.008
ver-0.008
Token[newline]
Token ID198
Feature activation-2.26e-01

Pos logit contributions
 p+0.011
 st+0.010
 r+0.010
 hurt+0.010
 scratches+0.009
Neg logit contributions
ancer-0.019
ancing-0.017
ver-0.017
icking-0.017
ummy-0.017
Token"
Token ID1
Feature activation-1.20e+00

Pos logit contributions
 st+0.015
 scratches+0.014
 scattered+0.014
 p+0.014
 sw+0.013
Neg logit contributions
ummy-0.020
ancer-0.020
apping-0.019
ress-0.019
enny-0.019
TokenMy
Token ID3666
Feature activation-2.62e-02

Pos logit contributions
 p+0.079
 st+0.078
 r+0.077
 scratches+0.073
 trash+0.071
Neg logit contributions
ancer-0.125
ver-0.112
icking-0.111
ancing-0.109
ummy-0.109
Token times
Token ID1661
Feature activation+3.41e-01

Pos logit contributions
 p+0.001
 st+0.001
 r+0.001
P+0.001
 wished+0.001
Neg logit contributions
ancer-0.001
ancing-0.001
icking-0.001
ver-0.001
ummy-0.001
Token,
Token ID11
Feature activation+2.65e-01

Pos logit contributions
ancer+0.046
ancing+0.040
imb+0.040
ete+0.040
ummy+0.039
Neg logit contributions
 p-0.014
 r-0.014
 st-0.014
 sc-0.013
 hurt-0.013
Token but
Token ID475
Feature activation+7.79e-01

Pos logit contributions
ancer+0.025
ancing+0.022
icking+0.022
ver+0.021
ummy+0.021
Neg logit contributions
 p-0.022
 st-0.022
 r-0.022
 hurt-0.020
 scratches-0.020
Token to
Token ID284
Feature activation-1.14e+00

Pos logit contributions
ancer+0.094
ancing+0.086
icking+0.083
ummy+0.082
apping+0.080
Neg logit contributions
 p-0.044
 r-0.044
 st-0.043
 hurt-0.041
 wished-0.038
Token no
Token ID645
Feature activation+7.86e-01

Pos logit contributions
 p+0.069
 r+0.064
 st+0.064
 hurt+0.064
 sc+0.060
Neg logit contributions
ancer-0.135
ancing-0.123
ummy-0.122
apping-0.121
icking-0.121
Token avail
Token ID29107
Feature activation+5.18e+00

Pos logit contributions
ancer+0.065
ancing+0.057
icking+0.057
ver+0.056
ummy+0.055
Neg logit contributions
 p-0.070
 st-0.069
 r-0.069
 hurt-0.065
 scratches-0.065
Token.
Token ID13
Feature activation-7.48e-01

Pos logit contributions
ancer+0.721
ancing+0.671
icking+0.664
ver+0.656
ummy+0.652
Neg logit contributions
 p-0.169
 st-0.162
 r-0.159
 hurt-0.139
 pet-0.133
Token[space]
Token ID220
Feature activation-2.52e-01

Pos logit contributions
 p+0.051
 r+0.048
 st+0.045
 pet+0.043
 trash+0.042
Neg logit contributions
ancer-0.084
ancing-0.076
icking-0.075
ver-0.070
istent-0.067
Token[newline][newline]
Token ID628
Feature activation-6.69e-01

Pos logit contributions
 p+0.005
 r+0.005
 fun+0.003
.+0.003
 st+0.003
Neg logit contributions
ancer-0.037
icking-0.034
ancing-0.034
istent-0.033
ummy-0.033
Token Jane
Token ID12091
Feature activation+5.41e-02

Pos logit contributions
 st+0.045
 p+0.044
 r+0.043
 scratches+0.040
 hurt+0.040
Neg logit contributions
ancer-0.068
ver-0.063
ummy-0.062
icking-0.062
ancing-0.061
Token's
Token ID338
Feature activation+1.34e+00

Pos logit contributions
ancer+0.006
icking+0.006
ver+0.006
ancing+0.006
ummy+0.005
Neg logit contributions
 p-0.003
 st-0.003
 r-0.003
 trash-0.003
 scratches-0.003
TokenHer
Token ID9360
Feature activation-8.26e-01

Pos logit contributions
 st+0.005
 scratches+0.005
 p+0.005
 scattered+0.005
 sw+0.005
Neg logit contributions
ancer-0.006
ummy-0.006
apping-0.006
ver-0.006
ancing-0.005
Token mom
Token ID1995
Feature activation+1.67e-01

Pos logit contributions
 st+0.077
sm+0.077
 p+0.076
 r+0.075
 scattered+0.075
Neg logit contributions
ummy-0.051
ancer-0.051
icking-0.049
ama-0.049
ver-0.042
Token saw
Token ID2497
Feature activation-7.10e-01

Pos logit contributions
ancer+0.021
icking+0.020
ancing+0.019
ver+0.019
ummy+0.019
Neg logit contributions
 p-0.007
 st-0.007
 r-0.007
 trash-0.006
 scratches-0.006
Token her
Token ID607
Feature activation-4.16e-01

Pos logit contributions
 st+0.039
 p+0.038
 r+0.037
 pet+0.036
 hurt+0.036
Neg logit contributions
ancer-0.077
ummy-0.072
ancing-0.071
icking-0.070
ver-0.068
Token pan
Token ID3425
Feature activation-1.34e+00

Pos logit contributions
 p+0.024
 st+0.023
 r+0.023
 hurt+0.021
 pet+0.021
Neg logit contributions
ancer-0.049
ancing-0.045
ver-0.044
icking-0.044
ummy-0.043
Tokenicking
Token ID7958
Feature activation+5.15e+00

Pos logit contributions
 p+0.068
 st+0.067
 r+0.065
 hurt+0.062
 sc+0.058
Neg logit contributions
ancer-0.178
ver-0.160
ancing-0.158
ete-0.156
imb-0.149
Token and
Token ID290
Feature activation+3.20e-01

Pos logit contributions
ancer+0.565
ancing+0.528
icking+0.527
ver+0.513
ummy+0.510
Neg logit contributions
 p-0.295
 st-0.292
 r-0.287
 hurt-0.266
 scratches-0.265
Token asked
Token ID1965
Feature activation-3.96e-01

Pos logit contributions
ancer+0.034
icking+0.032
ress+0.032
ver+0.031
ummy+0.030
Neg logit contributions
 scratches-0.015
 trash-0.015
 st-0.015
 p-0.014
P-0.014
Token,
Token ID11
Feature activation+4.56e-01

Pos logit contributions
 p+0.021
 r+0.021
 st+0.020
 hurt+0.018
 trash+0.018
Neg logit contributions
ancer-0.048
ancing-0.045
icking-0.044
ver-0.043
ummy-0.042
Token "
Token ID366
Feature activation+8.35e-01

Pos logit contributions
ancer+0.039
ancing+0.035
icking+0.034
ver+0.034
ummy+0.033
Neg logit contributions
 p-0.040
 st-0.039
 r-0.039
 hurt-0.037
 pet-0.036
TokenWhat
Token ID2061
Feature activation+1.92e-01

Pos logit contributions
ancer+0.090
icking+0.080
ancing+0.079
ver+0.078
ummy+0.078
Neg logit contributions
 st-0.050
 p-0.050
 r-0.049
 scratches-0.046
 hurt-0.046
Token with
Token ID351
Feature activation-3.86e-01

Pos logit contributions
 p+0.012
 r+0.012
 st+0.012
 hurt+0.011
 pet+0.011
Neg logit contributions
ancer-0.025
ancing-0.023
icking-0.022
ver-0.022
ummy-0.022
Token something
Token ID1223
Feature activation-3.51e-01

Pos logit contributions
 p+0.022
 st+0.022
 r+0.021
 hurt+0.020
 trash+0.020
Neg logit contributions
ancer-0.045
ancing-0.041
icking-0.041
ver-0.041
ummy-0.039
Token else
Token ID2073
Feature activation-1.43e+00

Pos logit contributions
 st+0.021
 p+0.020
 r+0.020
 scratches+0.019
 pet+0.018
Neg logit contributions
ancer-0.037
icking-0.035
ummy-0.034
ver-0.034
ancing-0.034
Token.
Token ID13
Feature activation+3.73e-01

Pos logit contributions
 p+0.039
 st+0.037
 r+0.036
 hurt+0.032
 pet+0.028
Neg logit contributions
ancer-0.210
ancing-0.198
icking-0.194
ver-0.191
ummy-0.188
Token K
Token ID509
Feature activation+6.34e-01

Pos logit contributions
ancer+0.035
icking+0.033
ancing+0.033
ver+0.030
ummy+0.029
Neg logit contributions
 p-0.029
 r-0.028
 st-0.028
 pet-0.027
 trash-0.027
Tokenicking
Token ID7958
Feature activation+5.14e+00

Pos logit contributions
ancer+0.049
ver+0.044
ancing+0.043
ress+0.040
istent+0.040
Neg logit contributions
 p-0.057
 le-0.054
 r-0.053
 st-0.053
 pet-0.052
Token makes
Token ID1838
Feature activation-4.93e-01

Pos logit contributions
ancer+0.657
ver+0.601
ancing+0.598
icking+0.591
ummy+0.571
Neg logit contributions
 p-0.268
 r-0.250
 st-0.246
 hurt-0.229
 trash-0.224
Token everyone
Token ID2506
Feature activation+3.11e-02

Pos logit contributions
 p+0.027
 r+0.026
 st+0.025
 hurt+0.024
 pet+0.023
Neg logit contributions
ancer-0.060
ancing-0.055
ver-0.054
icking-0.054
ummy-0.052
Token sad
Token ID6507
Feature activation+2.89e-01

Pos logit contributions
ancer+0.004
icking+0.003
ancing+0.003
ver+0.003
ummy+0.003
Neg logit contributions
 st-0.002
 p-0.002
 r-0.002
 scratches-0.001
 pet-0.001
Token."
Token ID526
Feature activation-8.10e-01

Pos logit contributions
ancer+0.036
ancing+0.033
icking+0.033
ver+0.032
ummy+0.032
Neg logit contributions
 p-0.014
 st-0.014
 r-0.013
 hurt-0.012
 pet-0.012
Token[newline]
Token ID198
Feature activation+4.52e-02

Pos logit contributions
 p+0.052
 st+0.049
 r+0.048
 pet+0.046
 hurt+0.045
Neg logit contributions
ancer-0.091
ancing-0.087
icking-0.085
ver-0.083
ummy-0.079
Token out
Token ID503
Feature activation+5.86e-01

Pos logit contributions
ancer+0.186
ver+0.169
ancing+0.168
icking+0.164
ummy+0.161
Neg logit contributions
 p-0.079
 r-0.074
 st-0.073
 hurt-0.069
 pet-0.066
Token,
Token ID11
Feature activation+5.12e-02

Pos logit contributions
ancer+0.068
ancing+0.062
icking+0.061
ver+0.061
ummy+0.060
Neg logit contributions
 p-0.034
 st-0.033
 r-0.033
 hurt-0.031
 scratches-0.029
Token but
Token ID475
Feature activation+6.59e-01

Pos logit contributions
ancer+0.005
ancing+0.004
ver+0.004
icking+0.004
ummy+0.004
Neg logit contributions
 p-0.004
 st-0.004
 r-0.004
 hurt-0.004
 pet-0.004
Token to
Token ID284
Feature activation-1.19e+00

Pos logit contributions
ancer+0.082
ancing+0.076
ummy+0.075
icking+0.072
apping+0.071
Neg logit contributions
 p-0.035
 r-0.035
 st-0.034
 hurt-0.033
 wished-0.030
Token no
Token ID645
Feature activation+7.39e-01

Pos logit contributions
 p+0.072
 r+0.067
 hurt+0.066
 st+0.066
 sc+0.063
Neg logit contributions
ancer-0.142
ummy-0.132
apping-0.128
ancing-0.127
icking-0.125
Token avail
Token ID29107
Feature activation+5.13e+00

Pos logit contributions
ancer+0.065
ancing+0.058
icking+0.057
ver+0.056
ummy+0.056
Neg logit contributions
 p-0.061
 st-0.061
 r-0.060
 hurt-0.057
 scratches-0.057
Token.
Token ID13
Feature activation-8.47e-01

Pos logit contributions
ancer+0.734
ancing+0.684
icking+0.675
ummy+0.667
ver+0.665
Neg logit contributions
 p-0.147
 st-0.140
 r-0.138
 hurt-0.119
 pet-0.111
Token He
Token ID679
Feature activation-2.49e-01

Pos logit contributions
 p+0.051
 r+0.046
 st+0.046
 scratches+0.044
 pet+0.044
Neg logit contributions
ancer-0.101
ancing-0.090
icking-0.090
ummy-0.084
otion-0.081
Token felt
Token ID2936
Feature activation-1.08e+00

Pos logit contributions
 p+0.012
 st+0.012
 r+0.012
 trash+0.012
 scratches+0.011
Neg logit contributions
ancer-0.028
icking-0.026
ver-0.026
ancing-0.026
ete-0.025
Token sad
Token ID6507
Feature activation-1.41e-01

Pos logit contributions
 p+0.066
 st+0.065
 r+0.063
 scratches+0.060
 trash+0.060
Neg logit contributions
ancer-0.108
ancing-0.100
ver-0.100
icking-0.100
ete-0.096
Token and
Token ID290
Feature activation-2.31e-01

Pos logit contributions
 p+0.008
 r+0.008
 st+0.008
 hurt+0.007
 scratches+0.007
Neg logit contributions
ancer-0.017
ancing-0.015
ver-0.015
icking-0.015
ummy-0.015
Token with
Token ID351
Feature activation-6.42e-01

Pos logit contributions
 r+0.059
 hurt+0.053
 p+0.050
 st+0.050
 bad+0.046
Neg logit contributions
ancer-0.209
ver-0.181
imb-0.181
ete-0.181
ancing-0.180
Token fragile
Token ID21049
Feature activation-7.54e-01

Pos logit contributions
 p+0.032
 st+0.032
 r+0.032
 hurt+0.029
 trash+0.029
Neg logit contributions
ancer-0.080
ancing-0.073
ver-0.072
icking-0.071
ummy-0.070
Token things
Token ID1243
Feature activation-1.63e+00

Pos logit contributions
 p+0.037
 st+0.036
 r+0.036
 hurt+0.033
 scratches+0.031
Neg logit contributions
ancer-0.092
ancing-0.085
icking-0.084
ver-0.083
ummy-0.082
Token.
Token ID13
Feature activation+5.27e-02

Pos logit contributions
 p+0.063
 st+0.062
 r+0.061
 hurt+0.056
 trash+0.052
Neg logit contributions
ancer-0.219
ancing-0.202
icking-0.198
ummy-0.196
ver-0.195
Token P
Token ID350
Feature activation+6.34e-01

Pos logit contributions
ancer+0.006
icking+0.006
ancing+0.006
ying+0.006
ummy+0.006
Neg logit contributions
 p-0.002
 pet-0.002
 st-0.002
 r-0.002
 wished-0.002
Tokenicking
Token ID7958
Feature activation+5.09e+00

Pos logit contributions
ancer+0.051
ver+0.045
ancing+0.041
icking+0.041
bars+0.039
Neg logit contributions
 p-0.056
 r-0.055
 st-0.053
 pet-0.051
 trash-0.051
Token up
Token ID510
Feature activation+4.21e-01

Pos logit contributions
ancer+0.623
ver+0.558
ancing+0.557
icking+0.539
unch+0.539
Neg logit contributions
 p-0.277
 r-0.276
 st-0.269
 hurt-0.253
 trash-0.248
Token an
Token ID281
Feature activation+9.09e-01

Pos logit contributions
ancer+0.049
ancing+0.045
ver+0.044
icking+0.044
ummy+0.043
Neg logit contributions
 st-0.024
 p-0.024
 r-0.024
 hurt-0.022
 trash-0.021
Token important
Token ID1593
Feature activation-1.68e-01

Pos logit contributions
ancer+0.102
ancing+0.093
ver+0.092
icking+0.092
ummy+0.088
Neg logit contributions
 p-0.055
 r-0.054
 st-0.054
 hurt-0.049
 trash-0.048
Token map
Token ID3975
Feature activation+1.19e+00

Pos logit contributions
 p+0.010
 st+0.010
 r+0.010
 hurt+0.009
 pet+0.009
Neg logit contributions
ancer-0.019
ancing-0.017
icking-0.017
ver-0.017
ummy-0.017
Token is
Token ID318
Feature activation+1.06e-01

Pos logit contributions
ancer+0.146
ancing+0.135
icking+0.131
ummy+0.129
ver+0.128
Neg logit contributions
 p-0.059
 st-0.058
 r-0.058
 hurt-0.052
 pet-0.051
Token to
Token ID284
Feature activation-3.73e-01

Pos logit contributions
 p+0.047
 st+0.045
 r+0.045
 hurt+0.042
 pet+0.041
Neg logit contributions
ancer-0.077
ancing-0.070
ver-0.070
icking-0.070
ete-0.065
Token draw
Token ID3197
Feature activation+1.35e-01

Pos logit contributions
 p+0.020
 st+0.019
 r+0.019
 hurt+0.017
 pet+0.017
Neg logit contributions
ancer-0.045
ancing-0.041
icking-0.041
ver-0.040
ummy-0.040
Token with
Token ID351
Feature activation-1.17e-01

Pos logit contributions
ancer+0.017
icking+0.016
ancing+0.016
ver+0.016
ete+0.015
Neg logit contributions
 p-0.007
 r-0.006
 st-0.006
 hurt-0.006
 fun-0.005
Token my
Token ID616
Feature activation+1.27e+00

Pos logit contributions
 st+0.007
 p+0.007
 r+0.007
 hurt+0.006
 scratches+0.006
Neg logit contributions
ancer-0.012
ummy-0.011
ancing-0.011
icking-0.011
ver-0.011
Token cr
Token ID1067
Feature activation+1.72e-01

Pos logit contributions
ancer+0.142
icking+0.129
ancing+0.128
ummy+0.128
ver+0.125
Neg logit contributions
 p-0.070
 st-0.070
 r-0.070
 hurt-0.064
 scratches-0.063
Tokenay
Token ID323
Feature activation+5.03e+00

Pos logit contributions
ancer+0.012
ver+0.010
icking+0.010
ancing+0.010
ete+0.009
Neg logit contributions
 p-0.017
 r-0.017
 st-0.016
 hurt-0.015
 pr-0.015
Tokenons
Token ID684
Feature activation+4.97e-01

Pos logit contributions
ancer+0.467
ancing+0.420
icking+0.415
ver+0.410
ummy+0.397
Neg logit contributions
 p-0.398
 st-0.393
 r-0.390
 hurt-0.367
 scratches-0.361
Token."
Token ID526
Feature activation-4.47e-01

Pos logit contributions
ancer+0.062
ancing+0.058
icking+0.057
ver+0.057
ummy+0.055
Neg logit contributions
 p-0.023
 st-0.022
 r-0.022
 hurt-0.020
 pet-0.019
Token<|endoftext|>
Token ID50256
Feature activation+2.60e-01

Pos logit contributions
 p+0.026
 st+0.025
 r+0.025
 pet+0.023
 hurt+0.023
Neg logit contributions
ancer-0.052
ancing-0.049
icking-0.048
ver-0.047
ummy-0.045
TokenOnce
Token ID7454
Feature activation-6.31e-01

Pos logit contributions
ancer+0.031
ancing+0.029
icking+0.029
ver+0.028
ummy+0.028
Neg logit contributions
 p-0.014
 st-0.013
 r-0.013
 hurt-0.012
 pet-0.012
Token upon
Token ID2402
Feature activation+1.94e+00

Pos logit contributions
 r+0.056
 p+0.056
 st+0.055
 hurt+0.053
 bad+0.052
Neg logit contributions
ancer-0.050
ancing-0.048
icking-0.045
istent-0.043
ver-0.042
Token the
Token ID262
Feature activation+1.15e+00

Pos logit contributions
ancer+0.256
ancing+0.237
icking+0.236
ver+0.232
ummy+0.225
Neg logit contributions
 p-0.072
 st-0.068
 r-0.068
 hurt-0.058
 trash-0.056
Token floor
Token ID4314
Feature activation+6.65e-01

Pos logit contributions
ancer+0.106
ancing+0.096
icking+0.095
ummy+0.094
ver+0.090
Neg logit contributions
 st-0.082
 p-0.081
 r-0.080
 hurt-0.077
 scratches-0.077
Token,
Token ID11
Feature activation+1.27e+00

Pos logit contributions
ancer+0.096
ancing+0.089
icking+0.087
ver+0.087
ummy+0.086
Neg logit contributions
 p-0.021
 r-0.019
 st-0.019
 hurt-0.016
 trash-0.014
Token making
Token ID1642
Feature activation+1.21e+00

Pos logit contributions
ancer+0.155
ver+0.142
ancing+0.141
ummy+0.139
icking+0.139
Neg logit contributions
 p-0.075
 r-0.071
 st-0.071
 hurt-0.063
 trash-0.062
Token v
Token ID410
Feature activation-1.19e-01

Pos logit contributions
ancer+0.150
ancing+0.139
icking+0.138
ver+0.137
ummy+0.135
Neg logit contributions
 p-0.055
 st-0.054
 r-0.053
 hurt-0.048
 scratches-0.047
Tokenroom
Token ID3823
Feature activation+5.02e+00

Pos logit contributions
 p+0.014
 r+0.014
 st+0.013
 hurt+0.013
 scratches+0.013
Neg logit contributions
ancer-0.006
ancing-0.005
ete-0.005
icking-0.005
ver-0.005
Token and
Token ID290
Feature activation+7.52e-01

Pos logit contributions
ancer+0.630
ancing+0.582
icking+0.577
ver+0.574
ummy+0.563
Neg logit contributions
 p-0.234
 st-0.227
 r-0.225
 hurt-0.202
 scratches-0.195
Token be
Token ID307
Feature activation-3.10e-01

Pos logit contributions
ancer+0.074
ancing+0.069
icking+0.064
ver+0.064
ummy+0.063
Neg logit contributions
 st-0.044
P-0.044
 pet-0.042
sm-0.041
 p-0.041
Tokenep
Token ID538
Feature activation-5.00e-01

Pos logit contributions
 scratches+0.015
 st+0.015
 pet+0.015
 p+0.015
 trash+0.014
Neg logit contributions
ancer-0.033
icking-0.032
apping-0.032
ancing-0.031
ver-0.030
Token sounds
Token ID5238
Feature activation-3.80e-01

Pos logit contributions
 p+0.020
 r+0.020
 st+0.019
 hurt+0.018
 s+0.016
Neg logit contributions
ancer-0.067
ancing-0.061
ummy-0.060
icking-0.060
ete-0.059
Token.
Token ID13
Feature activation+4.43e-01

Pos logit contributions
 st+0.017
 p+0.017
 scratches+0.017
 r+0.016
 trash+0.016
Neg logit contributions
ancer-0.045
icking-0.042
ancing-0.042
ver-0.042
ummy-0.042
Token with
Token ID351
Feature activation-4.04e-01

Pos logit contributions
ancer+0.083
ancing+0.078
ver+0.075
icking+0.075
ummy+0.073
Neg logit contributions
 p-0.023
 st-0.022
 r-0.022
 hurt-0.019
 pet-0.018
Token their
Token ID511
Feature activation-1.40e+00

Pos logit contributions
 p+0.018
 st+0.018
 r+0.017
 hurt+0.015
 scratches+0.015
Neg logit contributions
ancer-0.050
icking-0.046
ancing-0.046
ver-0.046
ummy-0.045
Token books
Token ID3835
Feature activation+8.26e-01

Pos logit contributions
 r+0.088
 st+0.088
 p+0.087
P+0.083
 hurt+0.082
Neg logit contributions
ancer-0.135
ummy-0.125
icking-0.125
ancing-0.119
ver-0.116
Token and
Token ID290
Feature activation+6.43e-01

Pos logit contributions
ancer+0.117
ancing+0.112
icking+0.106
imb+0.106
ete+0.105
Neg logit contributions
 p-0.027
 st-0.026
 r-0.024
 hurt-0.022
 scattered-0.019
Token cr
Token ID1067
Feature activation+1.48e-01

Pos logit contributions
ancer+0.075
icking+0.070
ancing+0.069
ver+0.068
ummy+0.067
Neg logit contributions
 p-0.030
 st-0.030
 r-0.030
 hurt-0.026
 scratches-0.026
Tokenay
Token ID323
Feature activation+5.02e+00

Pos logit contributions
ancer+0.010
ver+0.008
icking+0.008
ancing+0.008
imb+0.007
Neg logit contributions
 p-0.015
 r-0.015
 st-0.014
 scratches-0.014
 s-0.014
Tokenons
Token ID684
Feature activation+5.53e-01

Pos logit contributions
ancer+0.356
ancing+0.322
ummy+0.317
icking+0.313
ver+0.311
Neg logit contributions
 p-0.467
 st-0.463
 r-0.449
 pet-0.447
 hurt-0.446
Token.
Token ID13
Feature activation+1.02e-01

Pos logit contributions
ancer+0.071
icking+0.067
ver+0.067
ancing+0.066
ummy+0.064
Neg logit contributions
 p-0.021
 r-0.021
 st-0.021
 scratches-0.019
 trash-0.019
Token They
Token ID1119
Feature activation-4.58e-01

Pos logit contributions
ancer+0.012
icking+0.012
ancing+0.012
ver+0.011
ummy+0.011
Neg logit contributions
 p-0.006
 r-0.005
 st-0.005
 pet-0.005
 wished-0.005
Token wished
Token ID16555
Feature activation-1.11e+00

Pos logit contributions
 r+0.021
 p+0.021
 trash+0.021
 st+0.021
 scratches+0.020
Neg logit contributions
ancer-0.051
ver-0.050
icking-0.047
ancing-0.044
ete-0.043
Token Mom
Token ID11254
Feature activation+3.35e-01

Pos logit contributions
 r+0.048
 p+0.047
 st+0.045
 hurt+0.041
 s+0.040
Neg logit contributions
ancer-0.145
ancing-0.138
ver-0.133
icking-0.133
imb-0.132
Token the
Token ID262
Feature activation+5.22e-02

Pos logit contributions
 p+0.030
 st+0.030
 r+0.029
 trash+0.027
 scratches+0.027
Neg logit contributions
ancer-0.049
ver-0.047
ancing-0.047
icking-0.046
ummy-0.044
Token right
Token ID826
Feature activation+5.55e-01

Pos logit contributions
ancer+0.006
ancing+0.006
icking+0.006
ver+0.006
ummy+0.005
Neg logit contributions
 p-0.003
 st-0.003
 r-0.003
 hurt-0.003
 trash-0.002
Token thing
Token ID1517
Feature activation-1.46e+00

Pos logit contributions
ancer+0.048
icking+0.047
ancing+0.046
ummy+0.046
ver+0.044
Neg logit contributions
 st-0.037
 p-0.037
 r-0.036
 scratches-0.035
P-0.035
Token by
Token ID416
Feature activation-1.03e+00

Pos logit contributions
 p+0.071
 st+0.069
 r+0.067
 hurt+0.064
 pet+0.058
Neg logit contributions
ancer-0.182
ancing-0.170
icking-0.163
ete-0.161
ver-0.161
Token pan
Token ID3425
Feature activation-1.41e+00

Pos logit contributions
 p+0.074
 r+0.073
 st+0.073
 hurt+0.068
 pet+0.066
Neg logit contributions
ancer-0.107
ver-0.095
ancing-0.095
icking-0.094
ummy-0.091
Tokenicking
Token ID7958
Feature activation+5.00e+00

Pos logit contributions
 p+0.062
 st+0.058
 r+0.057
 hurt+0.054
 sc+0.048
Neg logit contributions
ancer-0.186
ancing-0.170
ver-0.169
imb-0.165
ete-0.165
Token.
Token ID13
Feature activation-2.06e-02

Pos logit contributions
ancer+0.623
ancing+0.561
ver+0.559
icking+0.553
ummy+0.539
Neg logit contributions
 p-0.262
 r-0.256
 st-0.254
 hurt-0.236
 trash-0.222
Token It
Token ID632
Feature activation+4.20e-01

Pos logit contributions
 p+0.002
 r+0.001
 st+0.001
 pet+0.001
 wished+0.001
Neg logit contributions
ancer-0.002
icking-0.002
ancing-0.002
ver-0.002
ete-0.002
Token's
Token ID338
Feature activation+1.61e+00

Pos logit contributions
ancer+0.044
icking+0.041
ver+0.040
ancing+0.040
ummy+0.038
Neg logit contributions
 p-0.026
 st-0.026
 r-0.025
 scratches-0.024
 pet-0.024
Token ok
Token ID12876
Feature activation+2.46e+00

Pos logit contributions
ancer+0.170
icking+0.160
ancing+0.159
ver+0.157
ummy+0.150
Neg logit contributions
 p-0.096
 st-0.094
 r-0.092
 scratches-0.088
 pet-0.087
Token to
Token ID284
Feature activation-9.64e-01

Pos logit contributions
ancer+0.337
ancing+0.317
ver+0.313
ete+0.310
icking+0.307
Neg logit contributions
 r-0.096
 p-0.093
 st-0.088
 hurt-0.082
 pet-0.076
Token.
Token ID13
Feature activation-4.12e-02

Pos logit contributions
 p+0.007
 st+0.007
 r+0.007
 scratches+0.006
 hurt+0.006
Neg logit contributions
ancer-0.022
ancing-0.021
icking-0.021
ver-0.021
ummy-0.020
Token Ben
Token ID3932
Feature activation+8.43e-01

Pos logit contributions
 p+0.003
 r+0.002
 st+0.002
 pet+0.002
 s+0.002
Neg logit contributions
icking-0.005
ancer-0.005
ancing-0.004
ver-0.004
ummy-0.004
Token has
Token ID468
Feature activation-3.87e-01

Pos logit contributions
ancer+0.105
ancing+0.098
icking+0.097
ver+0.095
ummy+0.094
Neg logit contributions
 p-0.041
 st-0.039
 r-0.039
 hurt-0.035
 pet-0.034
Token many
Token ID867
Feature activation-8.94e-01

Pos logit contributions
 p+0.019
 st+0.019
 r+0.019
sm+0.017
 pr+0.016
Neg logit contributions
ancer-0.042
ver-0.039
ancing-0.037
icking-0.037
ete-0.036
Token cr
Token ID1067
Feature activation-1.39e-01

Pos logit contributions
 st+0.045
 r+0.045
 hurt+0.044
 p+0.044
 wished+0.043
Neg logit contributions
ancer-0.092
ummy-0.088
icking-0.083
ancing-0.082
ver-0.081
Tokenay
Token ID323
Feature activation+4.99e+00

Pos logit contributions
 p+0.014
 r+0.014
 st+0.014
 scratches+0.014
 pr+0.013
Neg logit contributions
ancer-0.009
ver-0.007
icking-0.007
ancing-0.007
ete-0.006
Tokenons
Token ID684
Feature activation+4.73e-01

Pos logit contributions
ancer+0.374
ancing+0.343
ummy+0.336
icking+0.333
ver+0.328
Neg logit contributions
 p-0.456
 st-0.450
 r-0.437
 wished-0.435
 hurt-0.430
Token,
Token ID11
Feature activation+1.40e+00

Pos logit contributions
ancer+0.057
ancing+0.053
icking+0.052
ver+0.052
ummy+0.050
Neg logit contributions
 p-0.024
 st-0.024
 r-0.023
 hurt-0.022
 pet-0.021
Token but
Token ID475
Feature activation+1.76e+00

Pos logit contributions
ancer+0.138
ancing+0.126
icking+0.124
ver+0.122
ummy+0.120
Neg logit contributions
 p-0.110
 st-0.107
 r-0.106
 hurt-0.099
 pet-0.099
Token Mia
Token ID32189
Feature activation+2.71e-01

Pos logit contributions
ancer+0.218
ancing+0.208
icking+0.203
ver+0.198
ummy+0.196
Neg logit contributions
 p-0.090
 r-0.089
 st-0.087
 hurt-0.080
 trash-0.078
Token has
Token ID468
Feature activation-1.81e-01

Pos logit contributions
ancer+0.032
ancing+0.030
icking+0.029
ver+0.029
ummy+0.029
Neg logit contributions
 p-0.014
 st-0.014
 r-0.014
 hurt-0.012
 scratches-0.012
Token scissors
Token ID41786
Feature activation-9.86e-01

Pos logit contributions
ancer+0.164
ancing+0.149
icking+0.148
ver+0.147
ummy+0.145
Neg logit contributions
 p-0.069
 st-0.068
 r-0.067
 hurt-0.063
 wished-0.062
Token,
Token ID11
Feature activation+1.44e+00

Pos logit contributions
 p+0.057
 st+0.056
 r+0.056
 hurt+0.051
 scratches+0.049
Neg logit contributions
ancer-0.114
ancing-0.104
icking-0.102
ver-0.101
ummy-0.101
Token glue
Token ID22749
Feature activation-1.14e+00

Pos logit contributions
ancer+0.165
ancing+0.151
icking+0.150
ver+0.148
ummy+0.145
Neg logit contributions
 p-0.074
 st-0.072
 r-0.071
 hurt-0.067
 pet-0.066
Token and
Token ID290
Feature activation+9.09e-01

Pos logit contributions
 p+0.068
 st+0.067
 r+0.066
 hurt+0.062
 pet+0.060
Neg logit contributions
ancer-0.126
ancing-0.116
icking-0.114
ver-0.113
ummy-0.111
Token cr
Token ID1067
Feature activation-1.67e-02

Pos logit contributions
ancer+0.089
ancing+0.077
ver+0.077
icking+0.077
bars+0.076
Neg logit contributions
sm-0.049
 p-0.049
 st-0.048
 pet-0.046
 r-0.045
Tokenay
Token ID323
Feature activation+4.97e+00

Pos logit contributions
 r+0.002
 p+0.002
 st+0.002
 scratches+0.002
 hurt+0.002
Neg logit contributions
ancer-0.001
icking-0.001
ver-0.001
ancing-0.001
ete-0.001
Tokenons
Token ID684
Feature activation+5.73e-01

Pos logit contributions
ancer+0.337
ancing+0.315
ummy+0.310
icking+0.297
ver+0.294
Neg logit contributions
 p-0.465
 st-0.457
 wished-0.455
 pet-0.451
 hurt-0.441
Token.
Token ID13
Feature activation+5.90e-01

Pos logit contributions
ancer+0.068
icking+0.065
ancing+0.064
ver+0.064
ummy+0.063
Neg logit contributions
 p-0.026
 st-0.025
 r-0.024
 wished-0.024
 scratches-0.023
Token They
Token ID1119
Feature activation-1.95e-01

Pos logit contributions
icking+0.072
ancer+0.072
ancing+0.071
ver+0.066
istent+0.065
Neg logit contributions
 p-0.028
 r-0.026
 st-0.025
 pet-0.024
 trash-0.023
Token make
Token ID787
Feature activation+7.46e-01

Pos logit contributions
sm+0.012
 p+0.012
 st+0.012
P+0.012
B+0.012
Neg logit contributions
ancer-0.018
ver-0.017
icking-0.016
ancing-0.015
bars-0.015
Token their
Token ID511
Feature activation-1.58e+00

Pos logit contributions
ancer+0.090
ancing+0.084
icking+0.083
ver+0.081
ummy+0.081
Neg logit contributions
 st-0.034
 p-0.034
 r-0.032
 hurt-0.030
 pet-0.029
Token They
Token ID1119
Feature activation-2.29e-01

Pos logit contributions
ancer+0.056
icking+0.053
ancing+0.052
ver+0.050
istent+0.048
Neg logit contributions
 p-0.023
 r-0.022
 st-0.021
 pet-0.020
 s-0.020
Token draw
Token ID3197
Feature activation+2.11e-01

Pos logit contributions
 st+0.013
 p+0.013
sm+0.013
 r+0.013
P+0.013
Neg logit contributions
ancer-0.023
ver-0.021
icking-0.021
ancing-0.021
ummy-0.019
Token with
Token ID351
Feature activation-1.29e-01

Pos logit contributions
ancer+0.028
ancing+0.026
icking+0.026
ver+0.025
ummy+0.025
Neg logit contributions
 p-0.009
 r-0.009
 st-0.009
 hurt-0.008
 trash-0.007
Token the
Token ID262
Feature activation+6.87e-01

Pos logit contributions
 st+0.006
 p+0.006
 r+0.006
P+0.006
 wished+0.006
Neg logit contributions
ancer-0.014
icking-0.013
ancing-0.013
ummy-0.013
ver-0.013
Token cr
Token ID1067
Feature activation+1.50e-01

Pos logit contributions
ancer+0.077
ancing+0.069
icking+0.068
ummy+0.068
ver+0.067
Neg logit contributions
 p-0.037
 r-0.036
 st-0.036
 wished-0.034
 hurt-0.033
Tokenay
Token ID323
Feature activation+4.97e+00

Pos logit contributions
ancer+0.010
icking+0.008
ver+0.008
ancing+0.008
imb+0.007
Neg logit contributions
 r-0.016
 p-0.016
 st-0.015
 scratches-0.015
 hurt-0.015
Tokenons
Token ID684
Feature activation+6.76e-01

Pos logit contributions
ancer+0.452
ancing+0.407
icking+0.401
ver+0.397
ummy+0.387
Neg logit contributions
 p-0.400
 st-0.395
 r-0.391
 hurt-0.370
 scratches-0.365
Token.
Token ID13
Feature activation+4.91e-01

Pos logit contributions
ancer+0.089
ancing+0.083
icking+0.083
ver+0.082
ummy+0.081
Neg logit contributions
 p-0.025
 st-0.025
 r-0.024
 scratches-0.022
 hurt-0.021
Token They
Token ID1119
Feature activation-1.87e-01

Pos logit contributions
ancer+0.060
icking+0.058
ancing+0.057
ver+0.054
istent+0.053
Neg logit contributions
 p-0.026
 r-0.025
 st-0.024
 pet-0.023
 fun-0.022
Token use
Token ID779
Feature activation+9.01e-01

Pos logit contributions
sm+0.012
 st+0.012
 p+0.012
P+0.012
 r+0.012
Neg logit contributions
ancer-0.017
ver-0.016
ancing-0.015
icking-0.015
ummy-0.014
Token the
Token ID262
Feature activation+6.76e-01

Pos logit contributions
ancer+0.103
ancing+0.097
icking+0.097
ver+0.096
ummy+0.095
Neg logit contributions
 p-0.043
 st-0.042
 r-0.040
 hurt-0.039
 wished-0.038
Token draw
Token ID3197
Feature activation+1.33e-01

Pos logit contributions
 p+0.009
 st+0.009
 r+0.009
 hurt+0.008
 pet+0.008
Neg logit contributions
ancer-0.018
ancing-0.016
icking-0.016
ver-0.016
ummy-0.016
Token on
Token ID319
Feature activation+1.02e+00

Pos logit contributions
ancer+0.016
ancing+0.015
icking+0.015
ver+0.015
ummy+0.015
Neg logit contributions
 p-0.007
 st-0.007
 r-0.006
 hurt-0.006
 trash-0.006
Token paper
Token ID3348
Feature activation-4.98e-01

Pos logit contributions
ancer+0.122
ancing+0.112
icking+0.111
ver+0.110
ummy+0.110
Neg logit contributions
 st-0.049
 p-0.049
 r-0.048
 hurt-0.045
 scratches-0.043
Token with
Token ID351
Feature activation-9.59e-02

Pos logit contributions
 p+0.027
 st+0.026
 r+0.026
 hurt+0.023
 pet+0.023
Neg logit contributions
ancer-0.060
ancing-0.055
icking-0.055
ver-0.054
ummy-0.053
Token cr
Token ID1067
Feature activation-4.09e-02

Pos logit contributions
 st+0.004
 p+0.004
 r+0.004
 wished+0.004
 hurt+0.004
Neg logit contributions
ancer-0.011
ummy-0.010
ancing-0.010
icking-0.010
ver-0.009
Tokenay
Token ID323
Feature activation+4.97e+00

Pos logit contributions
 p+0.005
 r+0.005
 st+0.004
 scratches+0.004
 hurt+0.004
Neg logit contributions
ancer-0.002
icking-0.002
ver-0.002
ancing-0.002
ete-0.002
Tokenons
Token ID684
Feature activation+5.59e-01

Pos logit contributions
ancer+0.372
ancing+0.342
ummy+0.337
ver+0.325
icking+0.322
Neg logit contributions
 p-0.448
 st-0.443
 wished-0.430
 r-0.428
 hurt-0.428
Token and
Token ID290
Feature activation+7.38e-01

Pos logit contributions
ancer+0.068
ancing+0.063
icking+0.062
ver+0.062
ummy+0.061
Neg logit contributions
 p-0.028
 st-0.027
 r-0.027
 hurt-0.024
 scratches-0.024
Token stickers
Token ID28568
Feature activation-8.64e-01

Pos logit contributions
ancer+0.075
ancing+0.067
ummy+0.066
icking+0.064
ver+0.064
Neg logit contributions
sm-0.039
 p-0.039
 st-0.039
 r-0.037
 hurt-0.036
Token.
Token ID13
Feature activation+5.01e-01

Pos logit contributions
 p+0.028
 st+0.028
 r+0.027
 hurt+0.023
 pet+0.022
Neg logit contributions
ancer-0.120
ancing-0.112
icking-0.111
ver-0.110
ummy-0.108
Token They
Token ID1119
Feature activation-2.50e-01

Pos logit contributions
icking+0.061
ancing+0.059
ancer+0.059
ver+0.055
istent+0.054
Neg logit contributions
 p-0.026
 r-0.024
 st-0.023
 pet-0.022
 fun-0.021
Token
Token ID95
Feature activation+5.18e-01

Pos logit contributions
ancer+0.069
ancing+0.060
icking+0.060
ver+0.059
ummy+0.057
Neg logit contributions
 p-0.073
 st-0.072
 r-0.071
 hurt-0.068
 pet-0.068
Token
Token ID26391
Feature activation+1.18e+00

Pos logit contributions
ancer+0.044
ancing+0.039
icking+0.038
ver+0.038
ummy+0.037
Neg logit contributions
 p-0.045
 st-0.045
 r-0.045
 hurt-0.042
 pet-0.042
Token
Token ID129
Feature activation-5.91e-01

Pos logit contributions
ancer+0.122
ancing+0.111
icking+0.108
ummy+0.104
ver+0.103
Neg logit contributions
 p-0.081
 st-0.081
 r-0.080
 hurt-0.076
 pet-0.074
Token
Token ID241
Feature activation-1.50e-01

Pos logit contributions
 p+0.047
 st+0.046
 r+0.045
 hurt+0.043
 trash+0.042
Neg logit contributions
ancer-0.054
ancing-0.050
icking-0.049
ver-0.048
ummy-0.047
TokenK
Token ID42
Feature activation+1.31e-01

Pos logit contributions
 p+0.010
 st+0.010
 r+0.010
 hurt+0.009
 scratches+0.009
Neg logit contributions
ancer-0.016
ancing-0.014
icking-0.014
ver-0.014
ummy-0.014
Tokenicking
Token ID7958
Feature activation+4.95e+00

Pos logit contributions
ancer+0.012
ancing+0.011
ver+0.011
icking+0.011
apping+0.010
Neg logit contributions
 p-0.010
 st-0.010
 r-0.010
 che-0.009
 pr-0.009
Token is
Token ID318
Feature activation-1.36e-01

Pos logit contributions
ancer+0.640
ver+0.587
ancing+0.565
icking+0.558
ummy+0.551
Neg logit contributions
 p-0.263
 r-0.249
 st-0.247
 bad-0.226
 hurt-0.224
Token fun
Token ID1257
Feature activation-1.43e+00

Pos logit contributions
 p+0.006
 st+0.005
 r+0.005
 hurt+0.005
 pet+0.005
Neg logit contributions
ancer-0.018
ancing-0.017
icking-0.016
ummy-0.016
ver-0.016
Token!
Token ID0
Feature activation+1.32e+00

Pos logit contributions
 p+0.077
 st+0.076
 r+0.075
 hurt+0.069
 trash+0.068
Neg logit contributions
ancer-0.165
ancing-0.152
icking-0.151
ver-0.149
ummy-0.147
Token It
Token ID632
Feature activation+4.78e-01

Pos logit contributions
icking+0.159
ancer+0.151
ancing+0.151
ver+0.150
otion+0.146
Neg logit contributions
 p-0.069
 r-0.059
 st-0.059
 pet-0.058
 le-0.053
Token makes
Token ID1838
Feature activation-6.83e-01

Pos logit contributions
ancer+0.053
ver+0.049
icking+0.048
ancing+0.048
ummy+0.045
Neg logit contributions
 st-0.024
 p-0.024
 scratches-0.024
 trash-0.023
 r-0.023
Token,
Token ID11
Feature activation+2.71e-01

Pos logit contributions
ancer+0.061
ancing+0.057
icking+0.056
ver+0.056
apping+0.054
Neg logit contributions
 st-0.027
 p-0.026
 r-0.026
 scratches-0.026
P-0.025
Token screaming
Token ID14788
Feature activation-7.40e-02

Pos logit contributions
ancer+0.027
ancing+0.024
icking+0.024
ver+0.024
ummy+0.023
Neg logit contributions
 p-0.019
 st-0.019
 r-0.019
 trash-0.018
 scratches-0.017
Token,
Token ID11
Feature activation+2.85e-01

Pos logit contributions
 st+0.004
 p+0.004
 r+0.004
 scratches+0.003
 trash+0.003
Neg logit contributions
ancer-0.008
ancing-0.008
icking-0.008
ver-0.008
ummy-0.008
Token and
Token ID290
Feature activation-9.06e-03

Pos logit contributions
ancer+0.028
ancing+0.026
icking+0.026
ver+0.025
ummy+0.024
Neg logit contributions
 st-0.018
 p-0.018
 r-0.018
 trash-0.018
 scratches-0.017
Token pan
Token ID3425
Feature activation-1.69e+00

Pos logit contributions
 st+0.001
 p+0.001
 r+0.001
 scratches+0.001
 hurt+0.001
Neg logit contributions
ancer-0.001
icking-0.001
ancing-0.001
ver-0.001
ummy-0.001
Tokenicking
Token ID7958
Feature activation+4.93e+00

Pos logit contributions
 p+0.089
 st+0.086
 r+0.086
 hurt+0.079
 pet+0.074
Neg logit contributions
ancer-0.205
ancing-0.187
ver-0.184
ummy-0.183
icking-0.181
Token.
Token ID13
Feature activation-4.82e-01

Pos logit contributions
ancer+0.596
ancing+0.557
icking+0.552
ver+0.545
ummy+0.536
Neg logit contributions
 p-0.238
 st-0.234
 r-0.229
 hurt-0.209
 scratches-0.205
Token Sally
Token ID25737
Feature activation+5.97e-01

Pos logit contributions
 p+0.040
 r+0.039
 st+0.039
 pet+0.037
 trash+0.036
Neg logit contributions
ancer-0.045
ancing-0.040
icking-0.039
ver-0.038
ummy-0.036
Token was
Token ID373
Feature activation-2.23e-01

Pos logit contributions
ancer+0.075
ancing+0.069
icking+0.069
ver+0.068
ummy+0.067
Neg logit contributions
 p-0.028
 st-0.027
 r-0.027
 hurt-0.024
 pet-0.024
Token so
Token ID523
Feature activation+1.03e+00

Pos logit contributions
 p+0.011
 st+0.011
 r+0.011
 hurt+0.010
 pet+0.010
Neg logit contributions
ancer-0.028
ancing-0.026
icking-0.025
ummy-0.025
ver-0.025
Token frightened
Token ID24776
Feature activation-1.41e+00

Pos logit contributions
ancer+0.129
ver+0.120
icking+0.119
ancing+0.119
apping+0.115
Neg logit contributions
 p-0.044
 st-0.043
 r-0.042
 scratches-0.038
 trash-0.038
Token makes
Token ID1838
Feature activation-4.74e-01

Pos logit contributions
ancer+0.061
ver+0.056
ancing+0.056
icking+0.056
ummy+0.053
Neg logit contributions
 st-0.042
 p-0.041
 r-0.041
 hurt-0.040
 trash-0.039
Token v
Token ID410
Feature activation+2.80e-01

Pos logit contributions
 p+0.026
 st+0.025
 r+0.025
 hurt+0.023
 scratches+0.022
Neg logit contributions
ancer-0.056
ancing-0.051
icking-0.051
ver-0.050
ummy-0.049
Tokenroom
Token ID3823
Feature activation+5.16e+00

Pos logit contributions
ancer+0.017
ancing+0.014
icking+0.013
ver+0.013
ete+0.013
Neg logit contributions
 p-0.031
 r-0.030
 st-0.030
 hurt-0.029
 scratches-0.028
Token-
Token ID12
Feature activation+7.62e-01

Pos logit contributions
ancer+0.634
ancing+0.588
icking+0.582
ver+0.577
ummy+0.568
Neg logit contributions
 p-0.243
 st-0.240
 r-0.235
 hurt-0.214
 scratches-0.209
Tokenv
Token ID85
Feature activation+1.23e-01

Pos logit contributions
ancer+0.075
ancing+0.067
ver+0.067
icking+0.066
ress+0.065
Neg logit contributions
 p-0.053
 st-0.052
 r-0.051
 trash-0.049
 scratches-0.049
Tokenroom
Token ID3823
Feature activation+4.92e+00

Pos logit contributions
ancer+0.007
ancing+0.005
ver+0.005
icking+0.005
ete+0.005
Neg logit contributions
 p-0.015
 r-0.014
 st-0.014
 hurt-0.014
 sc-0.013
Token sounds
Token ID5238
Feature activation-4.31e-01

Pos logit contributions
ancer+0.597
ancing+0.558
icking+0.552
ver+0.544
ummy+0.537
Neg logit contributions
 st-0.225
 p-0.225
 r-0.216
 pet-0.199
 scratches-0.199
Token.
Token ID13
Feature activation+3.33e-01

Pos logit contributions
 st+0.021
 p+0.021
 scratches+0.020
 r+0.020
 pet+0.019
Neg logit contributions
ancer-0.050
ancing-0.047
icking-0.046
ver-0.046
apping-0.045
Token\n
Token ID198
Feature activation+1.20e-01

Pos logit contributions
ancer+0.036
icking+0.035
ancing+0.034
ver+0.033
ummy+0.031
Neg logit contributions
 p-0.023
 r-0.022
 st-0.021
 pet-0.019
 s-0.019
Token\n
Token ID198
Feature activation-2.47e-02

Pos logit contributions
ancer+0.016
ancing+0.015
icking+0.015
ver+0.014
ummy+0.014
Neg logit contributions
 p-0.005
 r-0.005
 st-0.005
 hurt-0.004
 pet-0.004
Token"
Token ID1
Feature activation-9.83e-01

Pos logit contributions
 p+0.001
 st+0.001
 r+0.001
 hurt+0.001
 scratches+0.001
Neg logit contributions
ancer-0.003
ancing-0.003
ver-0.003
icking-0.003
ummy-0.003
Token "
Token ID366
Feature activation+6.65e-01

Pos logit contributions
 p+0.034
 r+0.032
 st+0.032
 hurt+0.030
 pet+0.030
Neg logit contributions
ancer-0.044
ancing-0.040
icking-0.040
ver-0.037
ummy-0.037
TokenWhy
Token ID5195
Feature activation-8.42e-01

Pos logit contributions
ancer+0.073
ancing+0.065
ver+0.064
icking+0.064
ummy+0.062
Neg logit contributions
 p-0.041
 st-0.041
 r-0.040
 scratches-0.038
 trash-0.037
Token are
Token ID389
Feature activation-1.69e-01

Pos logit contributions
 r+0.047
 hurt+0.047
 p+0.047
 st+0.045
 bad+0.043
Neg logit contributions
ancer-0.098
ancing-0.096
ummy-0.094
apping-0.090
icking-0.089
Token you
Token ID345
Feature activation-1.12e-01

Pos logit contributions
 p+0.014
 st+0.014
 r+0.014
 hurt+0.013
 trash+0.013
Neg logit contributions
ancer-0.015
ancing-0.014
icking-0.013
ver-0.013
ummy-0.013
Token pan
Token ID3425
Feature activation-1.63e+00

Pos logit contributions
 p+0.006
 st+0.006
 r+0.006
 hurt+0.006
 sc+0.005
Neg logit contributions
ancer-0.014
ancing-0.013
ver-0.013
ete-0.012
ummy-0.012
Tokenicking
Token ID7958
Feature activation+4.92e+00

Pos logit contributions
 hurt+0.144
 p+0.136
 st+0.136
 r+0.133
 sc+0.130
Neg logit contributions
ancer-0.145
ancing-0.133
ete-0.129
ver-0.129
istent-0.128
Token?"
Token ID1701
Feature activation-1.17e-01

Pos logit contributions
ancer+0.551
ancing+0.510
ver+0.487
ete+0.480
ress+0.479
Neg logit contributions
 p-0.329
 hurt-0.327
 r-0.323
 st-0.318
 bad-0.303
Token\n
Token ID198
Feature activation-1.24e-01

Pos logit contributions
 p+0.008
 st+0.008
 r+0.008
 hurt+0.007
 scratches+0.007
Neg logit contributions
ancer-0.012
ancing-0.011
icking-0.011
ver-0.011
ummy-0.011
Token\n
Token ID198
Feature activation-1.88e-01

Pos logit contributions
 p+0.006
 r+0.006
 st+0.006
 hurt+0.005
 pet+0.005
Neg logit contributions
ancer-0.016
ancing-0.015
icking-0.014
ver-0.014
ummy-0.014
Token"
Token ID1
Feature activation-1.27e+00

Pos logit contributions
 st+0.015
 r+0.014
 p+0.014
 scratches+0.013
 trash+0.013
Neg logit contributions
ancer-0.015
ver-0.015
ancing-0.015
ress-0.015
ummy-0.015
TokenI
Token ID40
Feature activation+6.12e-02

Pos logit contributions
 p+0.090
 st+0.089
 r+0.088
 scratches+0.085
 trash+0.083
Neg logit contributions
ancer-0.122
ver-0.110
icking-0.104
ancing-0.102
ummy-0.101
Token happy
Token ID3772
Feature activation-2.04e+00

Pos logit contributions
 p+0.061
 st+0.060
 r+0.059
 scratches+0.055
 hurt+0.054
Neg logit contributions
ancer-0.120
ancing-0.110
icking-0.110
ver-0.109
ummy-0.106
Token that
Token ID326
Feature activation-9.33e-01

Pos logit contributions
 p+0.107
 st+0.105
 r+0.104
 hurt+0.095
 pet+0.093
Neg logit contributions
ancer-0.244
ancing-0.225
icking-0.223
ver-0.221
ummy-0.217
Token she
Token ID673
Feature activation-1.41e+00

Pos logit contributions
 p+0.061
 r+0.060
 st+0.057
 fun+0.054
 hurt+0.054
Neg logit contributions
ancer-0.105
ancing-0.098
icking-0.095
ver-0.093
ete-0.090
Token learned
Token ID4499
Feature activation-6.55e-01

Pos logit contributions
 p+0.058
 st+0.057
 r+0.056
 hurt+0.051
 wished+0.048
Neg logit contributions
ancer-0.187
ancing-0.176
icking-0.172
ummy-0.171
ver-0.169
Token about
Token ID546
Feature activation+2.93e-01

Pos logit contributions
 p+0.035
 st+0.034
 r+0.034
 hurt+0.031
 pet+0.030
Neg logit contributions
ancer-0.078
ancing-0.072
icking-0.071
ver-0.070
ummy-0.069
Token cher
Token ID21382
Feature activation-4.60e+00

Pos logit contributions
ancer+0.036
ancing+0.033
icking+0.032
ver+0.032
ummy+0.031
Neg logit contributions
 p-0.016
 st-0.015
 r-0.015
 hurt-0.014
 pet-0.014
Tokenries
Token ID1678
Feature activation-1.17e+00

Pos logit contributions
 p+0.433
 st+0.428
 r+0.424
 hurt+0.405
 scratches+0.398
Neg logit contributions
ancer-0.356
ancing-0.315
icking-0.307
ver-0.306
ummy-0.294
Token that
Token ID326
Feature activation-4.83e-01

Pos logit contributions
 p+0.052
 st+0.049
 r+0.048
 hurt+0.045
 pet+0.042
Neg logit contributions
ancer-0.154
ancing-0.144
icking-0.139
ete-0.137
ver-0.136
Token day
Token ID1110
Feature activation-1.30e+00

Pos logit contributions
 p+0.027
 r+0.026
 st+0.026
 hurt+0.024
 pet+0.023
Neg logit contributions
ancer-0.059
ancing-0.055
icking-0.054
apping-0.052
ete-0.052
Token.
Token ID13
Feature activation-2.26e-01

Pos logit contributions
 st+0.032
 r+0.031
 p+0.030
 hurt+0.026
 pet+0.021
Neg logit contributions
ancer-0.195
ancing-0.182
icking-0.179
ver-0.175
ete-0.173
Token She
Token ID1375
Feature activation-9.98e-02

Pos logit contributions
 p+0.012
 r+0.011
 st+0.011
 pet+0.011
 wished+0.010
Neg logit contributions
ancer-0.027
icking-0.026
ancing-0.026
ver-0.025
apping-0.023
Token would
Token ID561
Feature activation+6.55e-01

Pos logit contributions
 p+0.030
 r+0.029
 st+0.029
 scratches+0.028
 trash+0.028
Neg logit contributions
ancer-0.063
ver-0.058
icking-0.058
ancing-0.058
ummy-0.054
Token come
Token ID1282
Feature activation+1.35e+00

Pos logit contributions
ancer+0.072
ancing+0.066
icking+0.066
ver+0.065
ummy+0.064
Neg logit contributions
 p-0.040
 st-0.039
 r-0.038
 hurt-0.036
 scratches-0.035
Token and
Token ID290
Feature activation-4.63e-01

Pos logit contributions
ancer+0.181
ancing+0.164
ver+0.162
ete+0.162
ummy+0.161
Neg logit contributions
 pet-0.057
 p-0.056
 st-0.056
 r-0.055
 hurt-0.049
Token let
Token ID1309
Feature activation-6.48e-01

Pos logit contributions
 p+0.028
 st+0.027
 r+0.027
 scratches+0.025
 trash+0.025
Neg logit contributions
ancer-0.050
ancing-0.046
ver-0.046
icking-0.046
ummy-0.044
Token the
Token ID262
Feature activation-3.12e-01

Pos logit contributions
 r+0.032
 p+0.032
 st+0.032
 pet+0.028
 hurt+0.028
Neg logit contributions
ancer-0.083
ancing-0.076
icking-0.074
ver-0.074
apping-0.074
Token cher
Token ID21382
Feature activation-4.43e+00

Pos logit contributions
 p+0.017
 st+0.017
 r+0.017
 hurt+0.015
 scratches+0.015
Neg logit contributions
ancer-0.036
ancing-0.033
icking-0.033
ver-0.032
ummy-0.032
Tokenries
Token ID1678
Feature activation-1.02e+00

Pos logit contributions
 p+0.411
 st+0.408
 r+0.400
 hurt+0.382
 pet+0.376
Neg logit contributions
ancer-0.351
ancing-0.311
ver-0.302
icking-0.302
ummy-0.290
Token fall
Token ID2121
Feature activation+7.86e-02

Pos logit contributions
 p+0.058
 st+0.057
 r+0.056
 hurt+0.053
 pet+0.051
Neg logit contributions
ancer-0.119
ancing-0.109
icking-0.107
ver-0.106
ummy-0.106
Token to
Token ID284
Feature activation-1.14e+00

Pos logit contributions
ancer+0.010
ancing+0.009
icking+0.009
ver+0.009
ummy+0.008
Neg logit contributions
 p-0.004
 st-0.004
 r-0.004
 hurt-0.004
 pet-0.003
Token the
Token ID262
Feature activation-1.00e-01

Pos logit contributions
 p+0.062
 st+0.061
 r+0.060
 hurt+0.055
 scratches+0.055
Neg logit contributions
ancer-0.130
ancing-0.120
icking-0.119
ver-0.118
ummy-0.116
Token ground
Token ID2323
Feature activation+2.57e-01

Pos logit contributions
 st+0.006
 p+0.006
 r+0.006
 scratches+0.005
 hurt+0.005
Neg logit contributions
ancer-0.011
icking-0.010
ancing-0.010
ummy-0.010
ver-0.010
Token must
Token ID1276
Feature activation+2.57e+00

Pos logit contributions
ancer+0.108
ancing+0.101
ummy+0.097
icking+0.097
ver+0.094
Neg logit contributions
 p-0.065
 st-0.065
 r-0.063
 hurt-0.062
 pet-0.058
Tokenn
Token ID77
Feature activation-9.52e-01

Pos logit contributions
ancer+0.261
ancing+0.244
icking+0.241
ver+0.237
ummy+0.229
Neg logit contributions
 st-0.167
 p-0.167
 r-0.165
 trash-0.153
 scratches-0.152
Token't
Token ID470
Feature activation+4.15e-01

Pos logit contributions
 r+0.079
 hurt+0.078
 p+0.076
 st+0.076
 sc+0.073
Neg logit contributions
ancer-0.083
ummy-0.074
ete-0.072
ancing-0.069
icking-0.068
Token touch
Token ID3638
Feature activation-1.10e+00

Pos logit contributions
ancer+0.046
ancing+0.042
icking+0.042
ver+0.041
ummy+0.040
Neg logit contributions
 p-0.025
 st-0.025
 r-0.024
 scratches-0.022
 trash-0.022
Token the
Token ID262
Feature activation-5.58e-02

Pos logit contributions
 p+0.066
 st+0.063
 r+0.062
 hurt+0.058
 trash+0.057
Neg logit contributions
ancer-0.127
ancing-0.116
ver-0.115
icking-0.115
ummy-0.110
Token cher
Token ID21382
Feature activation-4.42e+00

Pos logit contributions
 p+0.003
 st+0.003
 r+0.003
 hurt+0.003
 scratches+0.003
Neg logit contributions
ancer-0.006
ancing-0.006
icking-0.006
ummy-0.006
ver-0.006
Tokenries
Token ID1678
Feature activation-8.64e-01

Pos logit contributions
 p+0.421
 st+0.417
 r+0.413
 hurt+0.394
 scratches+0.389
Neg logit contributions
ancer-0.338
ancing-0.297
icking-0.292
ver-0.289
ummy-0.279
Token!"
Token ID2474
Feature activation+5.85e-02

Pos logit contributions
 p+0.042
 st+0.039
 hurt+0.037
 r+0.037
 sc+0.032
Neg logit contributions
ancer-0.112
ancing-0.103
ete-0.099
ver-0.098
icking-0.097
Token But
Token ID887
Feature activation-8.98e-01

Pos logit contributions
ancer+0.007
ancing+0.007
icking+0.007
ver+0.006
ummy+0.006
Neg logit contributions
 p-0.003
 st-0.003
 r-0.003
 pet-0.003
 hurt-0.003
Token Jack
Token ID3619
Feature activation+1.31e+00

Pos logit contributions
 p+0.053
 st+0.049
 r+0.049
 hurt+0.046
 pet+0.044
Neg logit contributions
ancer-0.107
ancing-0.100
ummy-0.095
unch-0.094
icking-0.093
Token was
Token ID373
Feature activation-4.28e-01

Pos logit contributions
ancer+0.160
ancing+0.147
icking+0.144
ummy+0.142
ver+0.142
Neg logit contributions
 p-0.068
 st-0.067
 r-0.065
 hurt-0.061
 pet-0.058
Token was
Token ID373
Feature activation-3.99e-01

Pos logit contributions
ancer+0.014
ancing+0.013
icking+0.013
ver+0.013
ummy+0.012
Neg logit contributions
 p-0.005
 st-0.005
 r-0.005
 hurt-0.004
 pet-0.004
Token filled
Token ID5901
Feature activation+2.23e+00

Pos logit contributions
 p+0.022
 st+0.021
 r+0.021
 trash+0.020
 scratches+0.020
Neg logit contributions
ancer-0.043
icking-0.041
ver-0.040
ancing-0.040
ummy-0.040
Token with
Token ID351
Feature activation-1.05e+00

Pos logit contributions
ancer+0.271
ancing+0.244
icking+0.240
ver+0.240
ummy+0.230
Neg logit contributions
 p-0.126
 st-0.121
 r-0.120
 pet-0.110
 hurt-0.109
Token special
Token ID2041
Feature activation+1.01e-01

Pos logit contributions
 st+0.053
 p+0.053
 r+0.052
 hurt+0.048
 scratches+0.046
Neg logit contributions
ancer-0.121
ancing-0.112
icking-0.112
ummy-0.110
ver-0.109
Token pink
Token ID11398
Feature activation-1.01e-01

Pos logit contributions
ancer+0.011
ancing+0.010
ummy+0.010
icking+0.010
ver+0.010
Neg logit contributions
 st-0.005
 p-0.005
 r-0.005
 hurt-0.005
 wished-0.005
Token cher
Token ID21382
Feature activation-4.42e+00

Pos logit contributions
 r+0.005
 st+0.005
 hurt+0.005
 p+0.005
 wished+0.004
Neg logit contributions
ancer-0.012
ummy-0.011
ancing-0.011
icking-0.010
ver-0.010
Tokenries
Token ID1678
Feature activation-1.14e+00

Pos logit contributions
 p+0.428
 st+0.424
 r+0.422
 hurt+0.403
 scratches+0.400
Neg logit contributions
ancer-0.329
ancing-0.288
icking-0.285
ver-0.278
ummy-0.274
Token.
Token ID13
Feature activation-8.91e-01

Pos logit contributions
 p+0.039
 st+0.037
 r+0.036
 hurt+0.032
 pet+0.030
Neg logit contributions
ancer-0.158
ancing-0.148
icking-0.145
ver-0.145
ummy-0.141
Token\n
Token ID198
Feature activation-6.99e-01

Pos logit contributions
 p+0.057
 r+0.051
 st+0.051
 pet+0.050
 s+0.046
Neg logit contributions
ancer-0.098
icking-0.096
ancing-0.095
ver-0.090
istent-0.083
Token\n
Token ID198
Feature activation-6.72e-01

Pos logit contributions
 p+0.031
 r+0.030
 st+0.029
 pet+0.027
 hurt+0.025
Neg logit contributions
ancer-0.091
ancing-0.086
icking-0.085
ver-0.081
ummy-0.080
TokenThe
Token ID464
Feature activation-4.00e-01

Pos logit contributions
 p+0.039
 r+0.038
 st+0.038
 pet+0.034
 hurt+0.034
Neg logit contributions
ancer-0.077
ancing-0.071
icking-0.071
ver-0.069
ummy-0.068
Token night
Token ID1755
Feature activation-2.42e-01

Pos logit contributions
ancer+0.242
ancing+0.224
icking+0.221
ver+0.219
ummy+0.214
Neg logit contributions
 p-0.094
 st-0.092
 r-0.090
 hurt-0.082
 pet-0.079
Token it
Token ID340
Feature activation-1.97e-01

Pos logit contributions
 r+0.020
 p+0.019
 st+0.019
 pet+0.017
 hurt+0.017
Neg logit contributions
ancer-0.023
icking-0.022
ancing-0.021
ver-0.021
unch-0.020
Token grows
Token ID13676
Feature activation-6.03e-01

Pos logit contributions
 p+0.012
 st+0.012
 r+0.011
 hurt+0.011
 scratches+0.011
Neg logit contributions
ancer-0.022
ancing-0.020
icking-0.020
ver-0.019
ummy-0.019
Token even
Token ID772
Feature activation+3.42e-02

Pos logit contributions
 p+0.035
 st+0.035
 r+0.034
 hurt+0.031
 trash+0.030
Neg logit contributions
ancer-0.071
ancing-0.065
ver-0.064
icking-0.064
ete-0.062
Token more
Token ID517
Feature activation-1.41e-01

Pos logit contributions
ancer+0.004
icking+0.004
ver+0.003
ancing+0.003
ummy+0.003
Neg logit contributions
 p-0.002
 st-0.002
 r-0.002
 scratches-0.002
 pet-0.002
Token cher
Token ID21382
Feature activation-4.40e+00

Pos logit contributions
 p+0.006
 st+0.006
 r+0.006
 scratches+0.006
 trash+0.006
Neg logit contributions
ancer-0.016
ummy-0.016
ver-0.015
icking-0.015
ancing-0.015
Tokenries
Token ID1678
Feature activation-9.86e-01

Pos logit contributions
 st+0.388
 p+0.388
 r+0.376
 hurt+0.360
 scattered+0.351
Neg logit contributions
ancer-0.364
ancing-0.327
ver-0.320
icking-0.315
ete-0.304
Token."
Token ID526
Feature activation-1.01e+00

Pos logit contributions
 p+0.055
 st+0.054
 r+0.052
 hurt+0.049
 pet+0.046
Neg logit contributions
ancer-0.118
ancing-0.110
icking-0.107
ver-0.104
ete-0.104
Token\n
Token ID198
Feature activation-5.99e-01

Pos logit contributions
 p+0.053
 st+0.048
 pet+0.047
 r+0.046
 wished+0.041
Neg logit contributions
ancer-0.125
icking-0.119
ancing-0.119
ver-0.112
ummy-0.108
Token\n
Token ID198
Feature activation-6.71e-01

Pos logit contributions
 p+0.029
 st+0.027
 r+0.027
 pet+0.025
 hurt+0.025
Neg logit contributions
ancer-0.076
ancing-0.071
icking-0.071
ver-0.067
ummy-0.066
TokenThe
Token ID464
Feature activation-3.08e-01

Pos logit contributions
 p+0.040
 st+0.039
 r+0.038
 hurt+0.035
 pet+0.035
Neg logit contributions
ancer-0.076
ancing-0.070
icking-0.069
ver-0.068
ummy-0.067
Token on
Token ID319
Feature activation+3.35e-01

Pos logit contributions
 st+0.009
 p+0.009
 r+0.009
 hurt+0.008
 scratches+0.007
Neg logit contributions
ancer-0.030
ancing-0.028
icking-0.028
ver-0.028
ummy-0.027
Token he
Token ID339
Feature activation-1.05e+00

Pos logit contributions
ancer+0.042
ancing+0.039
icking+0.038
ver+0.037
apping+0.036
Neg logit contributions
 p-0.018
 st-0.017
 r-0.017
 hurt-0.016
 bad-0.014
Token always
Token ID1464
Feature activation-6.24e-01

Pos logit contributions
 p+0.050
 st+0.049
 r+0.048
 trash+0.045
 scratches+0.044
Neg logit contributions
ancer-0.127
ancing-0.117
ver-0.117
icking-0.117
ummy-0.113
Token polished
Token ID23895
Feature activation-2.01e+00

Pos logit contributions
 p+0.027
 st+0.026
 r+0.026
 trash+0.025
 scratches+0.024
Neg logit contributions
ancer-0.075
ancing-0.070
icking-0.070
ver-0.069
ummy-0.066
Token his
Token ID465
Feature activation-1.06e+00

Pos logit contributions
 p+0.087
 st+0.083
 r+0.080
 hurt+0.070
 trash+0.070
Neg logit contributions
ancer-0.271
ancing-0.247
icking-0.246
ver-0.246
apping-0.238
Token cher
Token ID21382
Feature activation-4.40e+00

Pos logit contributions
 p+0.057
 st+0.057
 r+0.056
 hurt+0.052
 scratches+0.051
Neg logit contributions
ancer-0.119
icking-0.112
ancing-0.111
ummy-0.110
ver-0.108
Tokenries
Token ID1678
Feature activation-8.46e-01

Pos logit contributions
 p+0.394
 st+0.389
 r+0.378
 hurt+0.362
 pr+0.356
Neg logit contributions
ancer-0.369
ancing-0.325
ver-0.319
icking-0.309
ete-0.299
Token before
Token ID878
Feature activation+6.74e-01

Pos logit contributions
 p+0.048
 st+0.045
 r+0.045
 hurt+0.041
 pr+0.040
Neg logit contributions
ancer-0.103
ancing-0.091
ver-0.088
otion-0.087
icking-0.087
Token eating
Token ID6600
Feature activation+3.35e-01

Pos logit contributions
ancer+0.075
ancing+0.068
icking+0.067
ver+0.067
ummy+0.065
Neg logit contributions
 p-0.042
 st-0.041
 r-0.041
 hurt-0.038
 pet-0.037
Token them
Token ID606
Feature activation-5.85e-01

Pos logit contributions
ancer+0.040
ancing+0.037
icking+0.036
ver+0.036
ummy+0.035
Neg logit contributions
 p-0.017
 st-0.017
 r-0.017
 hurt-0.015
 scratches-0.015
Token.
Token ID13
Feature activation-7.27e-01

Pos logit contributions
 p+0.021
 st+0.020
 r+0.020
 hurt+0.017
 pet+0.016
Neg logit contributions
ancer-0.082
ancing-0.076
ver-0.074
icking-0.074
apping-0.072
Token and
Token ID290
Feature activation-6.71e-01

Pos logit contributions
 p+0.011
 st+0.010
 r+0.010
 hurt+0.009
 pet+0.009
Neg logit contributions
ancer-0.027
ancing-0.025
icking-0.025
ver-0.025
ummy-0.024
Token soon
Token ID2582
Feature activation+5.17e-01

Pos logit contributions
 p+0.035
 st+0.034
 r+0.033
 hurt+0.031
 scratches+0.030
Neg logit contributions
ancer-0.080
ancing-0.074
icking-0.073
ver-0.072
ummy-0.071
Token everyone
Token ID2506
Feature activation-8.31e-01

Pos logit contributions
ancer+0.060
ancing+0.055
icking+0.054
ver+0.053
ummy+0.053
Neg logit contributions
 p-0.029
 st-0.029
 r-0.029
 hurt-0.026
 pet-0.026
Token was
Token ID373
Feature activation-8.49e-01

Pos logit contributions
 p+0.043
 st+0.043
 r+0.042
 hurt+0.038
 scratches+0.038
Neg logit contributions
ancer-0.098
ancing-0.090
icking-0.089
ver-0.089
ummy-0.087
Token taking
Token ID2263
Feature activation+2.49e-01

Pos logit contributions
 p+0.050
 st+0.049
 r+0.049
 hurt+0.046
 pet+0.044
Neg logit contributions
ancer-0.097
ancing-0.089
icking-0.087
ummy-0.086
ver-0.086
Token turns
Token ID4962
Feature activation-4.39e+00

Pos logit contributions
ancer+0.031
ancing+0.028
icking+0.028
ver+0.028
ummy+0.027
Neg logit contributions
 p-0.012
 st-0.012
 r-0.012
 hurt-0.011
 pet-0.011
Token to
Token ID284
Feature activation-1.20e+00

Pos logit contributions
 p+0.250
 st+0.245
 r+0.242
 hurt+0.223
 scratches+0.220
Neg logit contributions
ancer-0.498
ancing-0.458
icking-0.454
ver-0.450
ummy-0.441
Token spin
Token ID7906
Feature activation-4.36e-01

Pos logit contributions
 p+0.068
 r+0.064
 st+0.064
 hurt+0.062
 pet+0.062
Neg logit contributions
ancer-0.147
icking-0.134
ancing-0.133
ummy-0.132
ver-0.131
Token the
Token ID262
Feature activation-9.09e-02

Pos logit contributions
 p+0.023
 st+0.023
 r+0.023
 hurt+0.021
 pet+0.020
Neg logit contributions
ancer-0.051
ancing-0.047
icking-0.047
ver-0.047
ummy-0.046
Token trophy
Token ID16383
Feature activation-6.75e-01

Pos logit contributions
 p+0.006
 st+0.005
 r+0.005
 hurt+0.005
 pet+0.005
Neg logit contributions
ancer-0.010
ancing-0.009
icking-0.009
ver-0.009
ummy-0.009
Token.
Token ID13
Feature activation-4.05e-01

Pos logit contributions
 p+0.030
 st+0.029
 r+0.028
 hurt+0.025
 pet+0.024
Neg logit contributions
ancer-0.087
ancing-0.080
icking-0.080
ver-0.079
ummy-0.078
Token.
Token ID13
Feature activation+1.50e-01

Pos logit contributions
 p+0.033
 st+0.032
 r+0.032
 hurt+0.029
 pet+0.026
Neg logit contributions
ancer-0.127
ancing-0.118
icking-0.116
ver-0.116
ummy-0.114
Token You
Token ID921
Feature activation+1.13e+00

Pos logit contributions
ancer+0.012
icking+0.011
ancing+0.011
ver+0.011
ete+0.011
Neg logit contributions
 p-0.013
 r-0.012
 st-0.012
 trash-0.011
 pet-0.011
Token've
Token ID1053
Feature activation+2.16e+00

Pos logit contributions
ancer+0.126
ancing+0.119
ummy+0.114
icking+0.114
ver+0.111
Neg logit contributions
 st-0.070
 p-0.070
 r-0.068
 hurt-0.066
 pet-0.062
Token had
Token ID550
Feature activation-7.38e-02

Pos logit contributions
ancer+0.244
ver+0.224
icking+0.223
ummy+0.215
ancing+0.215
Neg logit contributions
 p-0.100
 st-0.095
 r-0.092
 scratches-0.089
 pet-0.087
Token enough
Token ID1576
Feature activation+1.77e+00

Pos logit contributions
 p+0.004
 r+0.004
 st+0.003
 wished+0.003
 scratches+0.003
Neg logit contributions
ancer-0.008
icking-0.007
ummy-0.007
ver-0.007
ancing-0.007
Token cher
Token ID21382
Feature activation-4.37e+00

Pos logit contributions
ancer+0.190
ummy+0.182
icking+0.178
ancing+0.174
ver+0.169
Neg logit contributions
 wished-0.091
 r-0.090
 st-0.089
 p-0.086
 hurt-0.086
Tokenries
Token ID1678
Feature activation-7.88e-01

Pos logit contributions
 p+0.414
 st+0.410
 r+0.406
 hurt+0.387
 scratches+0.382
Neg logit contributions
ancer-0.336
ancing-0.296
icking-0.290
ver-0.289
ummy-0.278
Token for
Token ID329
Feature activation+1.91e+00

Pos logit contributions
 p+0.036
 st+0.036
 r+0.034
 hurt+0.032
 pet+0.030
Neg logit contributions
ancer-0.100
ancing-0.094
icking-0.091
ver-0.090
ete-0.089
Token today
Token ID1909
Feature activation+8.45e-01

Pos logit contributions
ancer+0.219
ancing+0.200
ver+0.197
icking+0.197
apping+0.185
Neg logit contributions
 p-0.124
 st-0.120
 r-0.116
 hurt-0.109
 pet-0.106
Token.
Token ID13
Feature activation+3.20e-01

Pos logit contributions
ancer+0.130
ancing+0.124
icking+0.119
ver+0.118
ete+0.116
Neg logit contributions
 p-0.023
 st-0.019
 r-0.019
.-0.016
 hurt-0.014
Token You
Token ID921
Feature activation+1.14e+00

Pos logit contributions
icking+0.030
ancer+0.029
ancing+0.029
ver+0.027
ete+0.026
Neg logit contributions
 p-0.025
 r-0.023
 st-0.023
 pet-0.022
 le-0.022
Token to
Token ID284
Feature activation-1.11e+00

Pos logit contributions
ancer+0.006
ancing+0.005
icking+0.005
ver+0.005
ummy+0.005
Neg logit contributions
 p-0.002
 st-0.002
 r-0.002
 hurt-0.002
 pet-0.002
Token be
Token ID307
Feature activation-1.04e+00

Pos logit contributions
 p+0.066
 st+0.065
 r+0.065
 trash+0.060
 scratches+0.060
Neg logit contributions
ancer-0.120
ancing-0.111
icking-0.110
ver-0.108
ummy-0.106
Token careful
Token ID8161
Feature activation-1.79e+00

Pos logit contributions
 p+0.054
 r+0.054
 st+0.054
 scratches+0.053
 trash+0.053
Neg logit contributions
ancer-0.104
icking-0.103
ver-0.100
ancing-0.100
apping-0.096
Token when
Token ID618
Feature activation-8.91e-01

Pos logit contributions
 r+0.081
 hurt+0.075
 p+0.074
 st+0.071
 bad+0.063
Neg logit contributions
ancer-0.235
ete-0.206
ver-0.205
ancing-0.202
imb-0.201
Token picking
Token ID10868
Feature activation-7.03e-01

Pos logit contributions
 p+0.058
 st+0.058
 r+0.057
 hurt+0.053
 pet+0.052
Neg logit contributions
ancer-0.096
ancing-0.088
icking-0.087
ver-0.087
ummy-0.083
Token cher
Token ID21382
Feature activation-4.36e+00

Pos logit contributions
 st+0.034
 r+0.034
 p+0.034
 hurt+0.031
 scratches+0.031
Neg logit contributions
ancer-0.082
ancing-0.077
icking-0.076
ummy-0.075
ver-0.075
Tokenries
Token ID1678
Feature activation-7.11e-01

Pos logit contributions
 p+0.402
 st+0.397
 r+0.391
 hurt+0.371
 pet+0.366
Neg logit contributions
ancer-0.350
ancing-0.309
ver-0.302
icking-0.301
ummy-0.287
Token so
Token ID523
Feature activation+1.09e+00

Pos logit contributions
 p+0.028
 st+0.026
 r+0.025
 hurt+0.023
 pet+0.021
Neg logit contributions
ancer-0.097
ancing-0.090
icking-0.087
ver-0.087
ete-0.086
Token that
Token ID326
Feature activation-4.36e-01

Pos logit contributions
ancer+0.125
ancing+0.115
icking+0.113
ver+0.112
ummy+0.109
Neg logit contributions
 p-0.063
 st-0.062
 r-0.062
 hurt-0.057
 pet-0.055
Token the
Token ID262
Feature activation+8.26e-02

Pos logit contributions
 p+0.024
 r+0.024
 st+0.024
 hurt+0.022
 trash+0.021
Neg logit contributions
ancer-0.052
ancing-0.048
icking-0.047
ver-0.047
ummy-0.046
Token tree
Token ID5509
Feature activation-9.78e-01

Pos logit contributions
ancer+0.009
ummy+0.008
icking+0.008
ver+0.008
ancing+0.008
Neg logit contributions
 st-0.004
 p-0.004
 r-0.004
 scratches-0.004
 hurt-0.004
Token ugly
Token ID13400
Feature activation+4.00e-01

Pos logit contributions
 st+0.034
 p+0.034
 r+0.033
 scratches+0.032
 trash+0.030
Neg logit contributions
ancer-0.084
icking-0.079
ancing-0.079
ummy-0.078
ver-0.077
Token.
Token ID13
Feature activation-2.71e-01

Pos logit contributions
ancer+0.050
ancing+0.047
icking+0.047
ver+0.046
ummy+0.046
Neg logit contributions
 p-0.017
 st-0.017
 r-0.016
 hurt-0.015
 scratches-0.015
Token There
Token ID1318
Feature activation+2.19e-01

Pos logit contributions
 p+0.019
 r+0.018
 st+0.018
 pet+0.017
 trash+0.016
Neg logit contributions
icking-0.029
ancer-0.029
ancing-0.028
ver-0.025
ying-0.025
Token were
Token ID547
Feature activation-6.99e-01

Pos logit contributions
ancer+0.034
ancing+0.031
apping+0.029
cess+0.029
istent+0.029
Neg logit contributions
 r-0.007
 p-0.007
 st-0.006
 hurt-0.006
 scattered-0.005
Token no
Token ID645
Feature activation+8.28e-01

Pos logit contributions
 p+0.037
 st+0.037
 r+0.036
 hurt+0.033
 scratches+0.033
Neg logit contributions
ancer-0.082
ancing-0.076
icking-0.075
ver-0.074
ummy-0.073
Token cher
Token ID21382
Feature activation-4.35e+00

Pos logit contributions
ancer+0.099
ancing+0.091
icking+0.091
ummy+0.089
ver+0.089
Neg logit contributions
 p-0.041
 st-0.041
 r-0.040
 hurt-0.036
 scratches-0.036
Tokenries
Token ID1678
Feature activation-8.08e-01

Pos logit contributions
 p+0.422
 st+0.418
 r+0.416
 hurt+0.397
 pet+0.393
Neg logit contributions
ancer-0.324
ancing-0.284
icking-0.281
ver-0.274
ummy-0.268
Token!
Token ID0
Feature activation+1.22e+00

Pos logit contributions
 p+0.041
 st+0.040
 r+0.039
 hurt+0.037
 scattered+0.035
Neg logit contributions
ancer-0.101
ancing-0.093
ver-0.090
icking-0.090
ummy-0.088
Token
Token ID198
Feature activation-3.08e-01

Pos logit contributions
icking+0.136
ancer+0.135
ancing+0.134
ver+0.119
uing+0.116
Neg logit contributions
 p-0.078
 st-0.071
 r-0.069
 pet-0.068
 trash-0.065
Token
Token ID198
Feature activation-3.72e-01

Pos logit contributions
 p+0.014
 r+0.013
 st+0.013
 hurt+0.012
 pet+0.012
Neg logit contributions
ancer-0.040
ancing-0.038
icking-0.038
ver-0.035
ummy-0.035
TokenL
Token ID43
Feature activation-2.00e+00

Pos logit contributions
 p+0.021
 st+0.021
 r+0.021
 hurt+0.019
 pet+0.018
Neg logit contributions
ancer-0.043
ancing-0.040
icking-0.040
ver-0.039
ummy-0.038
Token she
Token ID673
Feature activation-9.61e-01

Pos logit contributions
 p+0.033
 r+0.032
 st+0.031
 hurt+0.029
 pr+0.027
Neg logit contributions
ancer-0.075
ancing-0.071
ver-0.067
ete-0.067
icking-0.067
Token had
Token ID550
Feature activation-6.19e-01

Pos logit contributions
 p+0.049
 st+0.048
 r+0.047
 hurt+0.043
 scratches+0.043
Neg logit contributions
ancer-0.115
ancing-0.105
icking-0.105
ver-0.105
ummy-0.102
Token picked
Token ID6497
Feature activation-2.33e+00

Pos logit contributions
 p+0.032
 st+0.030
 r+0.029
 trash+0.028
P+0.028
Neg logit contributions
ancer-0.067
ver-0.064
icking-0.064
ancing-0.061
ummy-0.059
Token too
Token ID1165
Feature activation-7.80e-02

Pos logit contributions
 p+0.139
 r+0.134
 st+0.134
 trash+0.125
 hurt+0.123
Neg logit contributions
ancer-0.278
ver-0.248
ancing-0.247
icking-0.245
apping-0.243
Token many
Token ID867
Feature activation-1.69e+00

Pos logit contributions
 p+0.003
 st+0.003
 r+0.003
 hurt+0.003
 trash+0.003
Neg logit contributions
ancer-0.010
ancing-0.009
icking-0.009
ver-0.009
ummy-0.009
Token cher
Token ID21382
Feature activation-4.34e+00

Pos logit contributions
 st+0.082
 r+0.080
 p+0.080
P+0.075
 hurt+0.075
Neg logit contributions
ancer-0.186
ummy-0.175
icking-0.174
ancing-0.173
ver-0.172
Tokenries
Token ID1678
Feature activation-8.74e-01

Pos logit contributions
 p+0.418
 st+0.414
 r+0.411
 hurt+0.393
 scratches+0.388
Neg logit contributions
ancer-0.325
ancing-0.286
icking-0.282
ver-0.277
ummy-0.270
Token and
Token ID290
Feature activation+6.52e-02

Pos logit contributions
 p+0.039
 st+0.037
 r+0.036
 hurt+0.034
 scattered+0.031
Neg logit contributions
ancer-0.116
ancing-0.106
icking-0.102
ver-0.101
ete-0.101
Token the
Token ID262
Feature activation+4.23e-02

Pos logit contributions
ancer+0.008
ancing+0.007
icking+0.007
ver+0.007
ummy+0.007
Neg logit contributions
 p-0.003
 st-0.003
 r-0.003
 hurt-0.003
 trash-0.003
Token tree
Token ID5509
Feature activation-8.58e-01

Pos logit contributions
ancer+0.005
icking+0.004
ancing+0.004
ummy+0.004
ver+0.004
Neg logit contributions
 p-0.002
 r-0.002
 st-0.002
 hurt-0.002
 scratches-0.002
Token was
Token ID373
Feature activation-3.34e-01

Pos logit contributions
 p+0.039
 st+0.038
 r+0.037
 hurt+0.034
 pet+0.032
Neg logit contributions
ancer-0.109
ancing-0.101
icking-0.099
ver-0.099
ummy-0.097
Token full
Token ID1336
Feature activation+7.00e-01

Pos logit contributions
 p+0.025
 st+0.025
 r+0.025
 trash+0.024
 scratches+0.024
Neg logit contributions
ancer-0.037
icking-0.035
ancing-0.035
ver-0.034
ummy-0.034
Token of
Token ID286
Feature activation+5.21e-01

Pos logit contributions
ancer+0.069
ancing+0.061
ver+0.059
icking+0.058
ete+0.055
Neg logit contributions
 p-0.056
 r-0.054
 st-0.054
 pet-0.051
 hurt-0.050
Token y
Token ID331
Feature activation-5.07e-01

Pos logit contributions
ancer+0.057
ummy+0.056
icking+0.054
ancing+0.053
unch+0.052
Neg logit contributions
P-0.022
 st-0.022
 r-0.022
 p-0.020
 hurt-0.019
Tokenummy
Token ID13513
Feature activation+3.99e+00

Pos logit contributions
 p+0.080
 st+0.080
 r+0.079
 hurt+0.077
 scratches+0.077
Neg logit contributions
ancer-0.006
ancing-0.002
icking-0.001
ummy-0.001
ver-0.000
Token red
Token ID2266
Feature activation-3.92e-01

Pos logit contributions
ummy+0.404
ancer+0.376
icking+0.358
ancing+0.354
unch+0.354
Neg logit contributions
 wished-0.225
P-0.219
 st-0.214
 hurt-0.204
 r-0.204
Token cher
Token ID21382
Feature activation-4.33e+00

Pos logit contributions
 wished+0.020
 st+0.019
 r+0.019
P+0.019
 hurt+0.018
Neg logit contributions
ummy-0.042
ancer-0.040
ancing-0.039
icking-0.038
unch-0.038
Tokenries
Token ID1678
Feature activation-8.12e-01

Pos logit contributions
 r+0.450
 p+0.449
 st+0.444
 wished+0.435
 scratches+0.431
Neg logit contributions
ancer-0.280
icking-0.249
ummy-0.244
ancing-0.242
ver-0.226
Token.
Token ID13
Feature activation-5.53e-01

Pos logit contributions
 p+0.034
 st+0.033
 r+0.032
 hurt+0.029
 pet+0.028
Neg logit contributions
ancer-0.106
ancing-0.098
icking-0.097
ver-0.096
ummy-0.095
Token The
Token ID383
Feature activation-4.44e-01

Pos logit contributions
 p+0.037
 st+0.036
 r+0.035
 pet+0.033
 wished+0.031
Neg logit contributions
ancer-0.060
icking-0.055
ancing-0.055
ver-0.055
ummy-0.051
Token bird
Token ID6512
Feature activation-1.34e+00

Pos logit contributions
 scratches+0.027
 st+0.027
 r+0.027
 trash+0.026
P+0.026
Neg logit contributions
ancer-0.042
ummy-0.040
icking-0.039
ama-0.037
ver-0.036
Tokenie
Token ID494
Feature activation+2.54e+00

Pos logit contributions
 trash+0.147
 scratches+0.146
 st+0.140
 r+0.138
 fun+0.138
Neg logit contributions
ancer-0.067
icking-0.067
ummy-0.058
ver-0.058
ancing-0.055
Token had
Token ID550
Feature activation-5.41e-01

Pos logit contributions
ancer+0.030
ancing+0.027
icking+0.027
ver+0.027
ummy+0.027
Neg logit contributions
 p-0.010
 st-0.010
 r-0.010
 hurt-0.009
 scratches-0.009
Token been
Token ID587
Feature activation-1.59e-01

Pos logit contributions
 p+0.026
 st+0.024
 r+0.024
 trash+0.022
 scratches+0.022
Neg logit contributions
ancer-0.060
icking-0.058
ver-0.057
ancing-0.055
ete-0.053
Token right
Token ID826
Feature activation+4.79e-01

Pos logit contributions
 p+0.008
 r+0.008
 st+0.008
 scratches+0.008
 trash+0.008
Neg logit contributions
ancer-0.017
icking-0.016
ver-0.016
ancing-0.016
apping-0.015
Token about
Token ID546
Feature activation+5.03e-01

Pos logit contributions
ancer+0.057
ancing+0.052
icking+0.052
ver+0.051
ummy+0.051
Neg logit contributions
 p-0.025
 st-0.025
 r-0.025
 hurt-0.022
 scratches-0.022
Token the
Token ID262
Feature activation+6.85e-02

Pos logit contributions
ancer+0.062
ancing+0.057
ver+0.056
icking+0.056
ummy+0.054
Neg logit contributions
 p-0.026
 st-0.025
 r-0.025
 hurt-0.023
 pet-0.022
Token cher
Token ID21382
Feature activation-4.32e+00

Pos logit contributions
ancer+0.007
ummy+0.007
icking+0.007
ancing+0.007
ver+0.007
Neg logit contributions
 r-0.004
 st-0.004
 p-0.003
 wished-0.003
 hurt-0.003
Tokenries
Token ID1678
Feature activation-7.22e-01

Pos logit contributions
 p+0.405
 st+0.400
 r+0.395
 hurt+0.377
 pet+0.370
Neg logit contributions
ancer-0.338
ancing-0.300
ver-0.291
icking-0.291
ummy-0.278
Token.
Token ID13
Feature activation-5.19e-01

Pos logit contributions
 p+0.031
 st+0.030
 r+0.029
 hurt+0.028
 scattered+0.026
Neg logit contributions
ancer-0.097
ancing-0.090
ummy-0.085
ver-0.085
icking-0.084
Token From
Token ID3574
Feature activation+6.75e-01

Pos logit contributions
 p+0.032
 r+0.030
 wished+0.027
 trash+0.027
 st+0.027
Neg logit contributions
ancer-0.062
icking-0.054
ancing-0.054
ver-0.053
unch-0.049
Token that
Token ID326
Feature activation-5.67e-01

Pos logit contributions
ancer+0.080
ancing+0.073
ver+0.072
icking+0.072
ummy+0.070
Neg logit contributions
 p-0.038
 st-0.036
 r-0.036
 hurt-0.034
 pet-0.033
Token day
Token ID1110
Feature activation-1.27e+00

Pos logit contributions
 p+0.027
 st+0.027
 r+0.026
 hurt+0.024
 scratches+0.023
Neg logit contributions
ancer-0.070
ancing-0.065
icking-0.064
ver-0.064
ummy-0.063
Token came
Token ID1625
Feature activation-9.20e-01

Pos logit contributions
 p+0.006
 st+0.006
 r+0.006
 scratches+0.005
 trash+0.005
Neg logit contributions
ancer-0.014
icking-0.013
ver-0.013
ancing-0.013
ummy-0.013
Token back
Token ID736
Feature activation+1.04e+00

Pos logit contributions
 p+0.044
 st+0.043
 r+0.042
 hurt+0.040
 pet+0.039
Neg logit contributions
ancer-0.115
ancing-0.106
icking-0.105
ver-0.104
ummy-0.103
Token to
Token ID284
Feature activation-1.14e+00

Pos logit contributions
ancer+0.127
ancing+0.117
icking+0.115
ver+0.115
ummy+0.112
Neg logit contributions
 p-0.053
 st-0.052
 r-0.051
 pet-0.047
 hurt-0.047
Token pick
Token ID2298
Feature activation-9.13e-01

Pos logit contributions
 p+0.048
 st+0.047
 r+0.046
 hurt+0.041
 pet+0.040
Neg logit contributions
ancer-0.148
ancing-0.138
icking-0.137
ver-0.136
ummy-0.133
Token more
Token ID517
Feature activation+1.32e-02

Pos logit contributions
 p+0.043
 st+0.041
 r+0.041
 hurt+0.037
 pet+0.036
Neg logit contributions
ancer-0.115
ancing-0.107
icking-0.106
ver-0.105
ummy-0.103
Token cher
Token ID21382
Feature activation-4.31e+00

Pos logit contributions
ancer+0.002
ummy+0.001
icking+0.001
ancing+0.001
ver+0.001
Neg logit contributions
 st-0.001
 r-0.001
 p-0.001
 hurt-0.001
 wished-0.001
Tokenries
Token ID1678
Feature activation-7.04e-01

Pos logit contributions
 p+0.426
 r+0.422
 st+0.422
 hurt+0.402
 scratches+0.399
Neg logit contributions
ancer-0.312
ancing-0.272
icking-0.272
ver-0.262
ummy-0.259
Token.
Token ID13
Feature activation-4.19e-01

Pos logit contributions
 p+0.024
 st+0.023
 r+0.022
 hurt+0.020
 pet+0.019
Neg logit contributions
ancer-0.098
ancing-0.091
ver-0.089
icking-0.089
ummy-0.088
Token She
Token ID1375
Feature activation-6.47e-02

Pos logit contributions
 p+0.026
 r+0.024
 pet+0.024
 st+0.024
 wished+0.021
Neg logit contributions
ancer-0.047
icking-0.045
ancing-0.044
ver-0.042
apping-0.039
Token was
Token ID373
Feature activation-3.73e-01

Pos logit contributions
 p+0.003
 st+0.003
 r+0.003
 trash+0.003
 scratches+0.003
Neg logit contributions
ancer-0.008
icking-0.007
ver-0.007
ancing-0.007
ummy-0.007
Token so
Token ID523
Feature activation+1.04e+00

Pos logit contributions
 p+0.021
 st+0.021
 r+0.020
 hurt+0.019
 pet+0.019
Neg logit contributions
ancer-0.044
ancing-0.040
icking-0.039
ver-0.039
ummy-0.039
Token.
Token ID13
Feature activation-5.10e-01

Pos logit contributions
 p+0.049
 r+0.047
 st+0.047
 scratches+0.043
 trash+0.042
Neg logit contributions
ancer-0.164
icking-0.154
ancing-0.153
ver-0.152
ummy-0.151
Token Little
Token ID7703
Feature activation+1.80e+00

Pos logit contributions
 p+0.030
 r+0.030
 pet+0.028
 st+0.028
 s+0.025
Neg logit contributions
ancer-0.062
icking-0.060
ancing-0.058
ver-0.054
ete-0.052
Token Tommy
Token ID19919
Feature activation+3.15e-01

Pos logit contributions
ancer+0.169
ancing+0.153
ver+0.150
icking+0.148
apping+0.143
Neg logit contributions
 p-0.149
 r-0.144
 st-0.143
 hurt-0.135
 pet-0.135
Token wanted
Token ID2227
Feature activation-9.75e-01

Pos logit contributions
ancer+0.029
ver+0.029
icking+0.028
ummy+0.028
ancing+0.026
Neg logit contributions
 st-0.019
 scratches-0.018
 p-0.018
 r-0.017
 trash-0.017
Token some
Token ID617
Feature activation-9.87e-02

Pos logit contributions
 p+0.046
 st+0.045
 r+0.044
 hurt+0.040
 scratches+0.039
Neg logit contributions
ancer-0.122
ancing-0.113
icking-0.112
ver-0.111
ummy-0.109
Token cher
Token ID21382
Feature activation-4.31e+00

Pos logit contributions
 r+0.004
 p+0.004
 st+0.004
 wished+0.004
 scattered+0.004
Neg logit contributions
ancer-0.012
ummy-0.011
icking-0.011
ancing-0.011
ver-0.011
Tokenries
Token ID1678
Feature activation-6.81e-01

Pos logit contributions
 p+0.417
 st+0.412
 r+0.411
 hurt+0.391
 scratches+0.389
Neg logit contributions
ancer-0.321
ancing-0.280
icking-0.279
ver-0.272
ummy-0.268
Token from
Token ID422
Feature activation-1.90e-01

Pos logit contributions
 p+0.026
 st+0.025
 r+0.023
 hurt+0.023
 pet+0.021
Neg logit contributions
ancer-0.093
ancing-0.088
icking-0.084
ver-0.083
ummy-0.082
Token the
Token ID262
Feature activation+7.88e-03

Pos logit contributions
 p+0.009
 st+0.009
 r+0.009
 scratches+0.008
 hurt+0.008
Neg logit contributions
ancer-0.023
ancing-0.021
icking-0.021
ummy-0.021
ver-0.020
Token tree
Token ID5509
Feature activation-1.24e+00

Pos logit contributions
ancer+0.001
ancing+0.001
icking+0.001
ver+0.001
ummy+0.001
Neg logit contributions
 p-0.000
 st-0.000
 r-0.000
 scratches-0.000
 hurt-0.000
Token,
Token ID11
Feature activation+3.71e-01

Pos logit contributions
 st+0.026
 p+0.025
 r+0.022
 hurt+0.019
 pet+0.018
Neg logit contributions
ancer-0.191
ancing-0.182
ete-0.171
ummy-0.171
ver-0.171
Token The
Token ID383
Feature activation-4.59e-01

Pos logit contributions
ancer+0.107
icking+0.098
ancing+0.098
ver+0.096
ummy+0.093
Neg logit contributions
 p-0.064
 st-0.062
 r-0.061
 pet-0.057
 hurt-0.056
Token bird
Token ID6512
Feature activation-1.37e+00

Pos logit contributions
 scratches+0.029
 st+0.028
 r+0.028
 trash+0.028
 p+0.027
Neg logit contributions
ancer-0.043
ummy-0.041
icking-0.040
ama-0.038
ver-0.037
Tokenie
Token ID494
Feature activation+2.58e+00

Pos logit contributions
 scratches+0.141
 trash+0.141
 st+0.139
 p+0.138
 r+0.137
Neg logit contributions
ancer-0.082
icking-0.077
ver-0.070
ummy-0.069
ancing-0.068
Token ate
Token ID15063
Feature activation-5.67e-01

Pos logit contributions
ancer+0.314
icking+0.292
ancing+0.290
ver+0.289
ummy+0.285
Neg logit contributions
 p-0.115
 st-0.114
 r-0.112
 trash-0.105
 scratches-0.103
Token many
Token ID867
Feature activation-1.56e+00

Pos logit contributions
 st+0.033
 r+0.032
 p+0.032
 scratches+0.031
 wished+0.031
Neg logit contributions
ancer-0.058
ummy-0.055
ancing-0.055
icking-0.055
ver-0.054
Token cher
Token ID21382
Feature activation-4.31e+00

Pos logit contributions
 wished+0.078
 st+0.076
 r+0.074
 hurt+0.072
 sw+0.071
Neg logit contributions
ummy-0.163
unch-0.146
icking-0.146
ancer-0.146
ver-0.141
Tokenries
Token ID1678
Feature activation-7.27e-01

Pos logit contributions
 p+0.432
 r+0.431
 st+0.429
 hurt+0.412
 trash+0.408
Neg logit contributions
ancer-0.299
icking-0.263
ancing-0.261
ummy-0.254
ver-0.250
Token and
Token ID290
Feature activation-4.98e-02

Pos logit contributions
 p+0.041
 st+0.040
 r+0.039
 hurt+0.036
 pet+0.036
Neg logit contributions
ancer-0.085
ancing-0.078
icking-0.076
ver-0.076
ummy-0.074
Token felt
Token ID2936
Feature activation-8.17e-01

Pos logit contributions
 st+0.002
 p+0.002
 trash+0.002
 r+0.002
 scratches+0.002
Neg logit contributions
ancer-0.006
icking-0.005
ver-0.005
ummy-0.005
ancing-0.005
Token happy
Token ID3772
Feature activation-1.20e+00

Pos logit contributions
 st+0.049
 p+0.049
 scratches+0.048
 r+0.047
 trash+0.044
Neg logit contributions
ancer-0.081
ver-0.077
icking-0.076
ummy-0.076
ancing-0.076
Token.
Token ID13
Feature activation-6.57e-01

Pos logit contributions
 st+0.064
 p+0.062
 r+0.061
 trash+0.060
 scratches+0.059
Neg logit contributions
ancer-0.137
ancing-0.128
ummy-0.127
icking-0.127
ver-0.124
Token.
Token ID13
Feature activation-5.29e-02

Pos logit contributions
 p+0.003
 r+0.002
 st+0.002
 s+0.002
 hurt+0.002
Neg logit contributions
ancer-0.015
icking-0.014
ancing-0.014
ummy-0.014
istent-0.014
Token Then
Token ID3244
Feature activation-2.38e-01

Pos logit contributions
 p+0.004
 r+0.003
 st+0.003
 pet+0.003
 wished+0.003
Neg logit contributions
ancer-0.006
icking-0.006
ancing-0.005
ver-0.005
ummy-0.005
Token both
Token ID1111
Feature activation-2.81e-01

Pos logit contributions
 p+0.014
 st+0.014
 r+0.014
 hurt+0.013
 pet+0.012
Neg logit contributions
ancer-0.027
ancing-0.025
icking-0.024
ver-0.024
ummy-0.024
Token friends
Token ID2460
Feature activation-7.02e-01

Pos logit contributions
 p+0.014
 st+0.014
 r+0.013
 hurt+0.012
 scratches+0.012
Neg logit contributions
ancer-0.034
ancing-0.031
icking-0.031
ver-0.031
ummy-0.030
Token took
Token ID1718
Feature activation-1.32e+00

Pos logit contributions
 p+0.031
 st+0.030
 r+0.030
 hurt+0.027
 pet+0.026
Neg logit contributions
ancer-0.089
ancing-0.083
icking-0.082
ver-0.081
ummy-0.081
Token turns
Token ID4962
Feature activation-4.30e+00

Pos logit contributions
 p+0.063
 st+0.062
 r+0.061
 hurt+0.055
 scratches+0.054
Neg logit contributions
ancer-0.162
ancing-0.150
icking-0.148
ver-0.147
ummy-0.145
Token spinning
Token ID19493
Feature activation-2.25e-01

Pos logit contributions
 p+0.285
 st+0.280
 r+0.275
 trash+0.262
 scratches+0.259
Neg logit contributions
ancer-0.437
ancing-0.402
icking-0.401
ver-0.395
ummy-0.383
Token the
Token ID262
Feature activation-9.04e-03

Pos logit contributions
 st+0.013
 p+0.013
 r+0.013
 hurt+0.012
 scratches+0.012
Neg logit contributions
ancer-0.025
ancing-0.023
ver-0.023
icking-0.022
ummy-0.022
Token tire
Token ID15867
Feature activation-7.70e-01

Pos logit contributions
 p+0.001
 st+0.001
 r+0.001
 hurt+0.000
 scratches+0.000
Neg logit contributions
ancer-0.001
ancing-0.001
icking-0.001
ver-0.001
ummy-0.001
Token together
Token ID1978
Feature activation-6.76e-01

Pos logit contributions
 p+0.038
 st+0.037
 r+0.036
 hurt+0.033
 pet+0.031
Neg logit contributions
ancer-0.096
ancing-0.088
icking-0.087
ver-0.086
ummy-0.085
Token.
Token ID13
Feature activation-2.69e-01

Pos logit contributions
 p+0.026
 st+0.024
 r+0.024
 hurt+0.021
 pet+0.020
Neg logit contributions
ancer-0.094
ancing-0.085
icking-0.085
ver-0.084
ummy-0.083
Token up
Token ID510
Feature activation+1.20e-01

Pos logit contributions
 p+0.073
 r+0.070
 st+0.070
 pet+0.064
 hurt+0.063
Neg logit contributions
ancer-0.180
ancing-0.162
ver-0.160
icking-0.160
ummy-0.160
Token and
Token ID290
Feature activation-3.64e-01

Pos logit contributions
ancer+0.014
ancing+0.013
icking+0.013
ver+0.013
ummy+0.012
Neg logit contributions
 p-0.007
 st-0.006
 r-0.006
 hurt-0.006
 trash-0.006
Token grabbed
Token ID13176
Feature activation-2.10e+00

Pos logit contributions
 scratches+0.015
 bad+0.015
 r+0.015
 st+0.015
 trash+0.015
Neg logit contributions
ancer-0.041
ver-0.040
icking-0.039
ancing-0.039
ete-0.038
Token as
Token ID355
Feature activation-1.24e+00

Pos logit contributions
 p+0.100
 st+0.099
 r+0.099
 hurt+0.091
 wished+0.089
Neg logit contributions
ancer-0.247
ancing-0.232
icking-0.230
ver-0.229
ummy-0.224
Token many
Token ID867
Feature activation-1.69e+00

Pos logit contributions
 p+0.046
 st+0.045
 r+0.044
 hurt+0.041
 pet+0.037
Neg logit contributions
ancer-0.170
ancing-0.161
icking-0.158
ummy-0.153
ver-0.151
Token cher
Token ID21382
Feature activation-4.30e+00

Pos logit contributions
P+0.079
 wished+0.077
 st+0.076
 r+0.076
 hurt+0.074
Neg logit contributions
ummy-0.176
ancer-0.169
icking-0.166
ver-0.166
ancing-0.160
Tokenries
Token ID1678
Feature activation-7.80e-01

Pos logit contributions
 p+0.441
 r+0.440
 st+0.435
 hurt+0.424
 wished+0.423
Neg logit contributions
ancer-0.286
icking-0.254
ancing-0.252
ummy-0.242
ver-0.238
Token as
Token ID355
Feature activation-1.17e+00

Pos logit contributions
 p+0.038
 st+0.037
 r+0.037
 hurt+0.034
 pet+0.033
Neg logit contributions
ancer-0.095
ancing-0.088
icking-0.087
ver-0.087
ummy-0.085
Token he
Token ID339
Feature activation-8.95e-01

Pos logit contributions
 p+0.059
 st+0.058
 r+0.057
 hurt+0.053
 pet+0.052
Neg logit contributions
ancer-0.144
ancing-0.136
icking-0.132
istent-0.128
ummy-0.127
Token could
Token ID714
Feature activation+1.19e+00

Pos logit contributions
 p+0.055
 st+0.054
 r+0.054
 trash+0.051
 hurt+0.051
Neg logit contributions
ancer-0.096
ver-0.087
ancing-0.087
icking-0.087
ummy-0.084
Token.
Token ID13
Feature activation-6.58e-01

Pos logit contributions
ancer+0.162
ancing+0.147
icking+0.145
ver+0.144
ummy+0.141
Neg logit contributions
 p-0.048
 r-0.045
 st-0.045
 hurt-0.041
 pet-0.041
Token pick
Token ID2298
Feature activation-8.62e-01

Pos logit contributions
 p+0.063
 st+0.062
 r+0.061
 hurt+0.056
 scratches+0.055
Neg logit contributions
ancer-0.127
ancing-0.117
icking-0.115
ver-0.114
ummy-0.112
Token a
Token ID257
Feature activation+7.96e-02

Pos logit contributions
 p+0.043
 st+0.042
 r+0.041
 pet+0.037
 hurt+0.037
Neg logit contributions
ancer-0.107
ancing-0.099
icking-0.098
ver-0.098
ummy-0.095
Token few
Token ID1178
Feature activation-9.82e-01

Pos logit contributions
ancer+0.010
ancing+0.009
icking+0.009
ver+0.009
ummy+0.009
Neg logit contributions
 p-0.003
 st-0.003
 r-0.003
 hurt-0.003
 scratches-0.003
Token big
Token ID1263
Feature activation-2.26e-01

Pos logit contributions
 r+0.046
 wished+0.046
 hurt+0.045
 st+0.045
P+0.043
Neg logit contributions
ummy-0.097
ancer-0.096
icking-0.094
ancing-0.094
ver-0.090
Token red
Token ID2266
Feature activation-3.70e-01

Pos logit contributions
 r+0.012
 st+0.012
 p+0.012
 hurt+0.012
P+0.012
Neg logit contributions
ancer-0.023
ummy-0.022
ancing-0.022
icking-0.022
ver-0.020
Token cher
Token ID21382
Feature activation-4.29e+00

Pos logit contributions
 p+0.021
 st+0.021
 r+0.020
 hurt+0.019
 scratches+0.018
Neg logit contributions
ancer-0.042
ancing-0.039
icking-0.038
ver-0.038
ummy-0.038
Tokenries
Token ID1678
Feature activation-8.02e-01

Pos logit contributions
 p+0.429
 r+0.428
 st+0.425
 hurt+0.408
 scratches+0.407
Neg logit contributions
ancer-0.304
icking-0.265
ancing-0.264
ummy-0.254
ver-0.250
Token,
Token ID11
Feature activation+3.68e-01

Pos logit contributions
 p+0.035
 st+0.034
 r+0.033
 hurt+0.030
 pet+0.029
Neg logit contributions
ancer-0.106
ancing-0.097
ver-0.095
icking-0.095
ummy-0.093
Token then
Token ID788
Feature activation-3.01e-02

Pos logit contributions
ancer+0.040
ancing+0.036
icking+0.036
ver+0.036
ummy+0.035
Neg logit contributions
 p-0.023
 st-0.022
 r-0.022
 hurt-0.021
 scratches-0.020
Token rode
Token ID22075
Feature activation-1.55e+00

Pos logit contributions
 p+0.002
 st+0.001
 r+0.001
 hurt+0.001
 pet+0.001
Neg logit contributions
ancer-0.004
ancing-0.003
icking-0.003
ver-0.003
ummy-0.003
Token away
Token ID1497
Feature activation-8.79e-01

Pos logit contributions
 p+0.077
 st+0.076
 r+0.075
 hurt+0.068
 scratches+0.068
Neg logit contributions
ancer-0.185
ancing-0.173
icking-0.170
ver-0.170
ummy-0.166
Token still
Token ID991
Feature activation+1.61e-02

Pos logit contributions
 st+0.024
 p+0.024
 scratches+0.024
 r+0.023
 trash+0.023
Neg logit contributions
ancer-0.037
ver-0.036
icking-0.036
ancing-0.034
apping-0.034
Token happy
Token ID3772
Feature activation-1.38e+00

Pos logit contributions
ancer+0.002
icking+0.002
ver+0.002
ancing+0.002
apping+0.002
Neg logit contributions
 p-0.001
 st-0.001
 r-0.001
 scratches-0.001
 trash-0.001
Token to
Token ID284
Feature activation-9.35e-01

Pos logit contributions
 st+0.067
 p+0.066
 r+0.064
 trash+0.061
 scratches+0.059
Neg logit contributions
ancer-0.163
ancing-0.152
icking-0.152
ummy-0.152
ver-0.147
Token have
Token ID423
Feature activation-3.88e-01

Pos logit contributions
 st+0.061
 p+0.061
 r+0.060
 trash+0.056
 scratches+0.056
Neg logit contributions
ancer-0.096
ancing-0.089
icking-0.088
ver-0.087
ummy-0.085
Token her
Token ID607
Feature activation-7.56e-01

Pos logit contributions
 st+0.020
 p+0.020
 r+0.020
 scratches+0.018
 trash+0.018
Neg logit contributions
ancer-0.038
icking-0.037
ancing-0.037
ver-0.037
ete-0.036
Token cher
Token ID21382
Feature activation-4.29e+00

Pos logit contributions
 st+0.046
 p+0.046
 r+0.046
 scratches+0.042
 hurt+0.042
Neg logit contributions
ancer-0.079
icking-0.073
ver-0.072
ancing-0.072
ummy-0.071
Tokenries
Token ID1678
Feature activation-7.56e-01

Pos logit contributions
 p+0.424
 r+0.422
 st+0.420
 hurt+0.400
 trash+0.398
Neg logit contributions
ancer-0.309
icking-0.272
ancing-0.269
ver-0.260
ummy-0.257
Token to
Token ID284
Feature activation-8.43e-01

Pos logit contributions
 p+0.024
 st+0.022
 r+0.021
 hurt+0.019
 pet+0.018
Neg logit contributions
ancer-0.108
ancing-0.101
icking-0.098
ver-0.098
ummy-0.097
Token eat
Token ID4483
Feature activation+3.26e-01

Pos logit contributions
 p+0.050
 st+0.049
 r+0.048
 hurt+0.045
 scratches+0.044
Neg logit contributions
ancer-0.095
ancing-0.087
icking-0.086
ver-0.085
ummy-0.084
Token.
Token ID13
Feature activation-3.38e-01

Pos logit contributions
ancer+0.046
ancing+0.043
icking+0.042
ver+0.042
ummy+0.041
Neg logit contributions
 p-0.011
 st-0.011
 r-0.010
 hurt-0.009
 trash-0.009
Token She
Token ID1375
Feature activation-1.07e-01

Pos logit contributions
 p+0.022
 r+0.021
 st+0.021
 pet+0.020
 hurt+0.019
Neg logit contributions
ancer-0.037
ancing-0.033
icking-0.033
ver-0.033
ummy-0.032
Token.
Token ID13
Feature activation-5.07e-01

Pos logit contributions
 p+0.035
 r+0.034
 st+0.033
 hurt+0.030
 pet+0.029
Neg logit contributions
ancer-0.112
ver-0.103
ancing-0.103
icking-0.101
ete-0.096
Token
Token ID220
Feature activation-2.11e-01

Pos logit contributions
 p+0.029
 r+0.028
 st+0.025
 pet+0.024
 s+0.023
Neg logit contributions
ancer-0.059
icking-0.056
ancing-0.055
ver-0.054
ress-0.050
Token
Token ID198
Feature activation-3.05e-01

Pos logit contributions
 r+0.004
 p+0.004
.+0.003
 fun+0.003
 st+0.002
Neg logit contributions
ancer-0.030
icking-0.028
ancing-0.028
istent-0.027
ress-0.027
Token
Token ID198
Feature activation-3.49e-01

Pos logit contributions
 p+0.013
 r+0.012
 st+0.011
 hurt+0.010
P+0.010
Neg logit contributions
ancer-0.040
ancing-0.038
icking-0.038
ver-0.036
istent-0.035
TokenM
Token ID44
Feature activation-2.11e+00

Pos logit contributions
 p+0.019
 st+0.018
 r+0.018
 hurt+0.016
 scratches+0.016
Neg logit contributions
ancer-0.042
ancing-0.038
icking-0.038
ver-0.038
ummy-0.037
Tokenummy
Token ID13513
Feature activation+3.80e+00

Pos logit contributions
 p+0.296
 r+0.294
 st+0.285
 pr+0.278
 hurt+0.274
Neg logit contributions
ancer-0.072
ancing-0.053
ver-0.050
icking-0.036
cess-0.027
Token paid
Token ID3432
Feature activation-3.76e-01

Pos logit contributions
ancer+0.484
ancing+0.449
icking+0.445
ver+0.442
ummy+0.434
Neg logit contributions
 p-0.165
 st-0.162
 r-0.159
 hurt-0.143
 scratches-0.139
Token for
Token ID329
Feature activation+1.54e+00

Pos logit contributions
 p+0.024
 st+0.023
 r+0.023
 hurt+0.021
 pet+0.021
Neg logit contributions
ancer-0.042
ancing-0.038
ver-0.037
icking-0.037
ummy-0.036
Token the
Token ID262
Feature activation-1.56e-01

Pos logit contributions
ancer+0.211
ancing+0.197
ver+0.191
icking+0.191
apping+0.188
Neg logit contributions
 p-0.067
 r-0.063
 st-0.062
 hurt-0.055
 fun-0.055
Token cake
Token ID12187
Feature activation-1.04e+00

Pos logit contributions
 st+0.009
 p+0.009
 r+0.008
 hurt+0.008
 scratches+0.008
Neg logit contributions
ancer-0.017
icking-0.016
ummy-0.016
ver-0.016
ancing-0.016
Token.
Token ID13
Feature activation-6.09e-01

Pos logit contributions
 p+0.046
 r+0.041
 st+0.041
 sc+0.036
 pet+0.036
Neg logit contributions
ancer-0.140
ancing-0.128
ver-0.123
istent-0.122
icking-0.122
Token
Token ID198
Feature activation+4.62e-01

Pos logit contributions
ancer+0.010
ancing+0.009
icking+0.009
ver+0.009
ummy+0.009
Neg logit contributions
 p-0.005
 st-0.005
 r-0.005
 hurt-0.004
 pet-0.004
Token
Token ID198
Feature activation+3.99e-01

Pos logit contributions
ancer+0.053
ancing+0.049
icking+0.049
ver+0.048
ummy+0.047
Neg logit contributions
 p-0.026
 st-0.026
 r-0.025
 hurt-0.023
 pet-0.023
TokenThey
Token ID2990
Feature activation-7.61e-01

Pos logit contributions
ancer+0.040
ancing+0.037
ver+0.036
ummy+0.036
icking+0.036
Neg logit contributions
 st-0.027
 p-0.027
 r-0.026
 hurt-0.025
 scratches-0.025
Token both
Token ID1111
Feature activation+3.08e-01

Pos logit contributions
 r+0.044
 st+0.043
 p+0.043
 trash+0.043
 scratches+0.042
Neg logit contributions
ancer-0.079
ancing-0.073
ver-0.073
icking-0.072
ete-0.069
Token g
Token ID308
Feature activation-1.36e+00

Pos logit contributions
ancer+0.028
ancing+0.026
ver+0.026
icking+0.025
ete+0.024
Neg logit contributions
P-0.021
 st-0.020
 r-0.020
 scratches-0.019
 trash-0.019
Tokeniggle
Token ID24082
Feature activation+3.05e+00

Pos logit contributions
 p+0.159
 r+0.156
 st+0.154
 pet+0.149
 scratches+0.148
Neg logit contributions
ancer-0.078
ver-0.064
ancing-0.060
icking-0.058
ress-0.053
Token and
Token ID290
Feature activation+4.60e-01

Pos logit contributions
ancer+0.311
icking+0.290
ancing+0.288
ver+0.282
ummy+0.276
Neg logit contributions
 p-0.192
 st-0.190
 r-0.185
 trash-0.177
 hurt-0.177
Token play
Token ID711
Feature activation-3.68e-02

Pos logit contributions
ancer+0.046
ancing+0.042
ver+0.041
icking+0.041
ete+0.038
Neg logit contributions
 r-0.028
 st-0.028
 p-0.027
 trash-0.027
P-0.026
Token with
Token ID351
Feature activation-4.73e-01

Pos logit contributions
 st+0.002
 p+0.002
 r+0.002
 trash+0.002
 scratches+0.002
Neg logit contributions
ancer-0.004
ver-0.003
ancing-0.003
icking-0.003
apping-0.003
Token the
Token ID262
Feature activation+3.09e-01

Pos logit contributions
 st+0.022
 p+0.022
 r+0.022
 hurt+0.019
 scratches+0.019
Neg logit contributions
ancer-0.056
icking-0.052
ancing-0.051
ver-0.051
ummy-0.051
Token clown
Token ID25573
Feature activation-1.43e+00

Pos logit contributions
ancer+0.035
ummy+0.030
icking+0.029
ancing+0.029
ver+0.028
Neg logit contributions
 r-0.016
 st-0.016
 wished-0.015
 scratches-0.015
 p-0.015
Token a
Token ID257
Feature activation-4.70e-02

Pos logit contributions
ancer+0.001
ancing+0.001
icking+0.001
ver+0.001
ummy+0.001
Neg logit contributions
 p-0.001
 st-0.001
 r-0.001
 pet-0.000
 hurt-0.000
Token giant
Token ID6175
Feature activation-2.08e-01

Pos logit contributions
 p+0.002
 r+0.002
 st+0.002
 scratches+0.002
 wished+0.002
Neg logit contributions
ancer-0.006
ver-0.005
ancing-0.005
icking-0.005
ummy-0.005
Token smile
Token ID8212
Feature activation-6.36e-01

Pos logit contributions
 wished+0.012
 scratches+0.011
 st+0.011
 scattered+0.011
 r+0.011
Neg logit contributions
ancer-0.021
icking-0.019
ummy-0.019
ancing-0.019
ress-0.018
Token to
Token ID284
Feature activation-1.14e+00

Pos logit contributions
 p+0.024
 st+0.023
 r+0.023
 hurt+0.020
 scratches+0.019
Neg logit contributions
ancer-0.085
ancing-0.079
icking-0.079
ver-0.078
ummy-0.077
Token the
Token ID262
Feature activation-1.55e-01

Pos logit contributions
 p+0.066
 r+0.063
 st+0.062
 hurt+0.061
 pet+0.060
Neg logit contributions
ancer-0.136
icking-0.124
ancing-0.122
ver-0.120
ummy-0.117
Token world
Token ID995
Feature activation+3.06e+00

Pos logit contributions
 st+0.009
 p+0.009
 r+0.009
 scratches+0.008
 hurt+0.008
Neg logit contributions
ancer-0.017
icking-0.016
ancing-0.016
ummy-0.016
ver-0.015
Token.
Token ID13
Feature activation-2.83e-01

Pos logit contributions
ancer+0.405
ancing+0.378
icking+0.374
ver+0.372
ummy+0.365
Neg logit contributions
 p-0.112
 st-0.111
 r-0.107
 scratches-0.096
 hurt-0.096
Token
Token ID198
Feature activation-2.17e-01

Pos logit contributions
 p+0.010
 r+0.010
 fun+0.009
 pet+0.009
 st+0.008
Neg logit contributions
ancer-0.039
icking-0.038
ancing-0.036
ver-0.034
apping-0.033
Token
Token ID198
Feature activation-3.48e-01

Pos logit contributions
 p+0.007
 r+0.007
 st+0.007
 pet+0.007
 hurt+0.006
Neg logit contributions
ancer-0.029
icking-0.029
ancing-0.029
istent-0.026
ummy-0.026
TokenOne
Token ID3198
Feature activation-1.09e-01

Pos logit contributions
 p+0.020
 st+0.020
 r+0.019
 hurt+0.018
 trash+0.018
Neg logit contributions
ancer-0.039
ver-0.036
ancing-0.036
icking-0.035
ummy-0.035
Token day
Token ID1110
Feature activation-1.42e+00

Pos logit contributions
 p+0.006
 st+0.006
 scratches+0.006
 r+0.006
 trash+0.006
Neg logit contributions
ancer-0.012
icking-0.011
ummy-0.011
ancing-0.011
ver-0.011
Token Pete
Token ID17542
Feature activation+2.86e+00

Pos logit contributions
ancer+0.152
ancing+0.146
icking+0.140
ver+0.139
istent+0.135
Neg logit contributions
 p-0.104
 r-0.098
 st-0.095
 pet-0.094
 bad-0.092
Token"
Token ID1
Feature activation-9.30e-01

Pos logit contributions
ancer+0.358
ancing+0.332
icking+0.326
ver+0.324
ummy+0.318
Neg logit contributions
 p-0.135
 st-0.133
 r-0.131
 hurt-0.119
 pet-0.115
Token
Token ID198
Feature activation+4.84e-01

Pos logit contributions
 p+0.061
 st+0.059
 r+0.057
P+0.056
 fun+0.055
Neg logit contributions
ancing-0.097
ancer-0.096
icking-0.092
ver-0.090
ummy-0.087
Token
Token ID198
Feature activation+1.94e-01

Pos logit contributions
ancer+0.058
icking+0.053
ancing+0.053
ver+0.052
ummy+0.050
Neg logit contributions
 p-0.026
 r-0.025
 st-0.025
 hurt-0.023
 pet-0.023
TokenP
Token ID47
Feature activation-3.49e+00

Pos logit contributions
ancer+0.025
ancing+0.024
icking+0.023
ver+0.023
ummy+0.023
Neg logit contributions
 p-0.008
 st-0.008
 r-0.008
 hurt-0.007
 pet-0.007
Tokenete
Token ID14471
Feature activation+3.58e+00

Pos logit contributions
 p+0.419
 st+0.413
 r+0.410
P+0.405
 che+0.401
Neg logit contributions
ancer-0.183
ancing-0.125
cess-0.108
icking-0.106
otion-0.105
Token was
Token ID373
Feature activation-1.88e-01

Pos logit contributions
ancer+0.418
ancing+0.381
icking+0.377
ver+0.374
ummy+0.367
Neg logit contributions
 p-0.203
 st-0.199
 r-0.197
 hurt-0.180
 pet-0.177
Token touched
Token ID12615
Feature activation-1.34e+00

Pos logit contributions
 p+0.012
 r+0.012
 st+0.012
 hurt+0.011
 pet+0.011
Neg logit contributions
ancer-0.021
ancing-0.019
icking-0.019
ummy-0.018
ver-0.018
Token by
Token ID416
Feature activation-1.03e+00

Pos logit contributions
 p+0.068
 st+0.065
 r+0.062
 pet+0.059
 hurt+0.058
Neg logit contributions
ancer-0.169
ancing-0.154
ver-0.152
icking-0.149
ummy-0.145
Token his
Token ID465
Feature activation-1.14e+00

Pos logit contributions
 p+0.056
 st+0.051
 r+0.050
 pet+0.048
 hurt+0.047
Neg logit contributions
ancer-0.132
ver-0.120
ancing-0.118
icking-0.117
ummy-0.113
Token dad
Token ID9955
Feature activation-3.62e-01

Pos logit contributions
 p+0.085
 st+0.083
 r+0.082
 hurt+0.078
 pet+0.077
Neg logit contributions
ancer-0.113
ancing-0.101
ver-0.100
icking-0.100
ummy-0.096
Token the
Token ID262
Feature activation-5.20e-03

Pos logit contributions
 st+0.003
 r+0.003
 p+0.003
 hurt+0.003
P+0.003
Neg logit contributions
ancer-0.008
icking-0.008
ancing-0.008
ummy-0.008
ver-0.007
Token corner
Token ID5228
Feature activation+1.72e+00

Pos logit contributions
 st+0.000
 scratches+0.000
sm+0.000
 hurt+0.000
 p+0.000
Neg logit contributions
ancer-0.000
icking-0.000
ummy-0.000
ancing-0.000
ama-0.000
Token.
Token ID13
Feature activation-6.99e-01

Pos logit contributions
ancer+0.212
ancing+0.194
ver+0.189
icking+0.189
ummy+0.184
Neg logit contributions
 p-0.089
 st-0.086
 r-0.086
 hurt-0.077
 pet-0.075
Token When
Token ID1649
Feature activation+4.60e-01

Pos logit contributions
 p+0.051
 r+0.049
 st+0.046
 pet+0.044
 s+0.044
Neg logit contributions
ancer-0.076
icking-0.070
ancing-0.069
ver-0.068
apping-0.062
Token M
Token ID337
Feature activation+3.75e-01

Pos logit contributions
ancer+0.055
ancing+0.052
icking+0.051
ver+0.050
ummy+0.048
Neg logit contributions
 p-0.026
 r-0.025
 st-0.025
 hurt-0.022
 pet-0.022
Tokenummy
Token ID13513
Feature activation+3.96e+00

Pos logit contributions
ancer+0.016
ver+0.013
ancing+0.010
ete+0.009
cess+0.007
Neg logit contributions
 p-0.049
 r-0.048
 sc-0.046
 st-0.045
 pr-0.045
Token saw
Token ID2497
Feature activation-9.55e-01

Pos logit contributions
ancer+0.533
ancing+0.498
icking+0.486
ver+0.484
ummy+0.481
Neg logit contributions
 p-0.159
 st-0.154
 r-0.153
 hurt-0.130
 pet-0.128
Token Sarah
Token ID10490
Feature activation+9.26e-01

Pos logit contributions
 p+0.052
 r+0.050
 st+0.049
 hurt+0.045
 trash+0.044
Neg logit contributions
ancer-0.118
ancing-0.107
ver-0.105
icking-0.104
apping-0.102
Token,
Token ID11
Feature activation+8.64e-02

Pos logit contributions
ancer+0.111
ancing+0.100
ver+0.098
icking+0.097
ummy+0.097
Neg logit contributions
 p-0.053
 st-0.052
 r-0.052
 hurt-0.048
 pet-0.047
Token she
Token ID673
Feature activation-8.30e-01

Pos logit contributions
ancer+0.010
ancing+0.010
icking+0.009
ver+0.009
ummy+0.009
Neg logit contributions
 p-0.005
 st-0.005
 r-0.005
 hurt-0.004
 pet-0.004
Token screamed
Token ID25421
Feature activation-8.52e-01

Pos logit contributions
 p+0.038
 st+0.038
 r+0.037
 trash+0.035
 scratches+0.034
Neg logit contributions
ancer-0.099
icking-0.093
ver-0.092
ancing-0.092
ummy-0.089
Token was
Token ID373
Feature activation-5.48e-02

Pos logit contributions
ancer+0.106
ver+0.101
icking+0.099
ancing+0.096
apping+0.094
Neg logit contributions
 p-0.049
 st-0.049
 trash-0.049
 scratches-0.048
 r-0.047
Token scared
Token ID12008
Feature activation-1.42e+00

Pos logit contributions
 p+0.003
 st+0.003
 r+0.003
 trash+0.003
 scratches+0.003
Neg logit contributions
ancer-0.006
ancing-0.006
icking-0.006
ver-0.006
ummy-0.006
Token,
Token ID11
Feature activation+5.10e-01

Pos logit contributions
 p+0.082
 r+0.079
 st+0.078
 hurt+0.073
 pet+0.070
Neg logit contributions
ancer-0.166
ancing-0.153
ver-0.151
icking-0.151
ummy-0.145
Token but
Token ID475
Feature activation+9.96e-01

Pos logit contributions
ancer+0.045
ancing+0.040
icking+0.039
ver+0.039
ummy+0.038
Neg logit contributions
 p-0.044
 st-0.044
 r-0.043
 hurt-0.042
 pet-0.040
Token the
Token ID262
Feature activation+1.19e-01

Pos logit contributions
ancer+0.104
icking+0.095
ver+0.095
ancing+0.093
ummy+0.089
Neg logit contributions
 st-0.064
 p-0.064
 r-0.062
 scratches-0.061
 trash-0.060
Token car
Token ID1097
Feature activation+1.95e-01

Pos logit contributions
ancer+0.011
ver+0.010
icking+0.010
ancing+0.009
ummy+0.009
Neg logit contributions
sm-0.008
 st-0.008
 p-0.008
 r-0.008
P-0.008
Token led
Token ID2957
Feature activation+1.29e+00

Pos logit contributions
ancer+0.023
ver+0.022
icking+0.021
ancing+0.021
ummy+0.020
Neg logit contributions
 p-0.009
 st-0.009
 r-0.009
 trash-0.009
 scratches-0.008
Token him
Token ID683
Feature activation+1.59e+00

Pos logit contributions
ancer+0.128
ancing+0.116
icking+0.115
ver+0.114
ummy+0.111
Neg logit contributions
 p-0.094
 st-0.093
 r-0.091
 hurt-0.086
 scratches-0.085
Token down
Token ID866
Feature activation+1.76e-01

Pos logit contributions
ancer+0.172
ver+0.164
icking+0.163
ancing+0.162
apping+0.146
Neg logit contributions
 p-0.092
 r-0.088
 st-0.087
P-0.083
 trash-0.083
Token the
Token ID262
Feature activation+1.30e-01

Pos logit contributions
ancer+0.024
ancing+0.022
icking+0.022
ver+0.021
ummy+0.021
Neg logit contributions
 p-0.007
 st-0.007
 r-0.007
 hurt-0.006
 pet-0.006
Token road
Token ID2975
Feature activation+6.21e-01

Pos logit contributions
ancer+0.012
icking+0.010
ver+0.010
ancing+0.010
ummy+0.010
Neg logit contributions
 st-0.010
 p-0.010
 r-0.009
 scratches-0.009
 pet-0.009
Token who
Token ID508
Feature activation-1.14e+00

Pos logit contributions
 p+0.008
 st+0.008
 r+0.007
 hurt+0.006
 scratches+0.006
Neg logit contributions
ancer-0.028
ancing-0.026
icking-0.026
ver-0.025
ummy-0.025
Token liked
Token ID8288
Feature activation-1.78e+00

Pos logit contributions
 st+0.054
 p+0.053
 r+0.051
 scratches+0.051
 trash+0.049
Neg logit contributions
ancer-0.138
ver-0.126
icking-0.122
ancing-0.122
ummy-0.118
Token to
Token ID284
Feature activation-8.11e-01

Pos logit contributions
 p+0.093
 st+0.092
 r+0.090
 scratches+0.085
 hurt+0.084
Neg logit contributions
ancer-0.204
ancing-0.185
icking-0.185
ummy-0.185
ver-0.181
Token play
Token ID711
Feature activation-1.68e-01

Pos logit contributions
 p+0.032
 st+0.031
 r+0.031
 hurt+0.027
 pet+0.027
Neg logit contributions
ancer-0.108
ancing-0.100
icking-0.099
ver-0.098
ummy-0.097
Token in
Token ID287
Feature activation+3.00e-01

Pos logit contributions
 scratches+0.011
 wished+0.011
 r+0.010
 st+0.010
 hurt+0.010
Neg logit contributions
ancer-0.016
ancing-0.015
icking-0.015
apping-0.014
ver-0.014
Token the
Token ID262
Feature activation+3.48e-01

Pos logit contributions
ancer+0.035
ummy+0.032
ancing+0.032
icking+0.031
ver+0.031
Neg logit contributions
 st-0.015
 p-0.014
 r-0.014
 scratches-0.014
 hurt-0.013
Token garden
Token ID11376
Feature activation-3.10e-01

Pos logit contributions
ancer+0.032
ummy+0.032
icking+0.030
ancing+0.029
imb+0.029
Neg logit contributions
 scratches-0.023
 st-0.022
 wished-0.022
 hurt-0.021
sm-0.021
Token.
Token ID13
Feature activation-1.26e-01

Pos logit contributions
 p+0.009
 st+0.009
 r+0.008
 hurt+0.007
 pet+0.007
Neg logit contributions
ancer-0.044
ancing-0.041
icking-0.041
ver-0.041
ummy-0.040
Token They
Token ID1119
Feature activation-5.57e-01

Pos logit contributions
 p+0.007
 r+0.007
 pet+0.007
 st+0.006
 trash+0.006
Neg logit contributions
ancer-0.015
icking-0.014
ancing-0.014
ver-0.014
ress-0.013
Token liked
Token ID8288
Feature activation-1.75e+00

Pos logit contributions
 p+0.024
 st+0.024
 scratches+0.024
 r+0.024
 trash+0.023
Neg logit contributions
ancer-0.068
ver-0.063
icking-0.061
ancing-0.060
ete-0.058
Token to
Token ID284
Feature activation-8.23e-01

Pos logit contributions
 p+0.092
 st+0.091
 r+0.088
 hurt+0.084
 scratches+0.084
Neg logit contributions
ancer-0.199
ancing-0.182
icking-0.182
ummy-0.180
ver-0.178
Token,
Token ID11
Feature activation+5.69e-02

Pos logit contributions
ancer+0.346
ancing+0.317
icking+0.313
ver+0.309
ete+0.298
Neg logit contributions
 p-0.154
 r-0.152
 st-0.148
 hurt-0.132
 scratches-0.125
Token she
Token ID673
Feature activation-1.01e+00

Pos logit contributions
ancer+0.007
ancing+0.007
icking+0.007
ver+0.007
ummy+0.006
Neg logit contributions
 p-0.003
 st-0.003
 r-0.003
 hurt-0.002
 pet-0.002
Token would
Token ID561
Feature activation+7.95e-01

Pos logit contributions
 p+0.048
 st+0.047
 r+0.046
 trash+0.044
 scratches+0.043
Neg logit contributions
ancer-0.120
ver-0.112
icking-0.111
ancing-0.111
ummy-0.108
Token take
Token ID1011
Feature activation+2.74e-02

Pos logit contributions
ancer+0.092
ancing+0.083
icking+0.083
ver+0.081
ummy+0.080
Neg logit contributions
 p-0.048
 r-0.047
 st-0.046
 hurt-0.043
 pet-0.042
Token a
Token ID257
Feature activation-2.56e-02

Pos logit contributions
ancer+0.004
ancing+0.003
icking+0.003
ver+0.003
ummy+0.003
Neg logit contributions
 p-0.001
 st-0.001
 r-0.001
 hurt-0.001
 scratches-0.001
Token break
Token ID2270
Feature activation+4.55e-01

Pos logit contributions
 p+0.001
 st+0.001
 r+0.001
 scratches+0.001
 wished+0.001
Neg logit contributions
ancer-0.003
ver-0.003
ancing-0.003
icking-0.003
ummy-0.003
Token to
Token ID284
Feature activation-1.14e+00

Pos logit contributions
ancer+0.056
ancing+0.051
ver+0.050
icking+0.050
ummy+0.048
Neg logit contributions
 p-0.024
 r-0.023
 st-0.023
 hurt-0.021
 trash-0.020
Token enjoy
Token ID2883
Feature activation-5.89e-01

Pos logit contributions
 p+0.060
 r+0.057
 st+0.055
 hurt+0.054
 pet+0.050
Neg logit contributions
ancer-0.147
ancing-0.134
ver-0.132
icking-0.132
ummy-0.127
Token a
Token ID257
Feature activation+3.44e-02

Pos logit contributions
 p+0.026
 st+0.025
 r+0.025
 hurt+0.022
 pet+0.021
Neg logit contributions
ancer-0.076
ancing-0.070
icking-0.070
ver-0.069
ummy-0.068
Token bit
Token ID1643
Feature activation-2.20e+00

Pos logit contributions
ancer+0.004
ummy+0.004
icking+0.004
ancing+0.004
ver+0.004
Neg logit contributions
 st-0.002
 p-0.002
 r-0.002
 scratches-0.002
 wished-0.002
Token of
Token ID286
Feature activation+5.89e-01

Pos logit contributions
 p+0.156
 st+0.155
 r+0.152
 hurt+0.143
 scratches+0.142
Neg logit contributions
ancer-0.218
ancing-0.197
icking-0.196
ummy-0.193
ver-0.192
Token friends
Token ID2460
Feature activation-2.86e-01

Pos logit contributions
ancer+0.027
ancing+0.024
ver+0.024
icking+0.024
ummy+0.024
Neg logit contributions
 p-0.011
 st-0.011
 r-0.010
 hurt-0.010
 pet-0.009
Token.
Token ID13
Feature activation-2.76e-01

Pos logit contributions
 st+0.010
 p+0.010
 r+0.010
 hurt+0.009
 pet+0.009
Neg logit contributions
ancer-0.040
ancing-0.037
ummy-0.036
icking-0.036
ver-0.036
Token He
Token ID679
Feature activation+1.59e-01

Pos logit contributions
 p+0.016
 r+0.016
 st+0.015
 pet+0.014
 che+0.014
Neg logit contributions
ancer-0.031
icking-0.030
ancing-0.029
ver-0.028
apping-0.027
Token would
Token ID561
Feature activation+1.06e+00

Pos logit contributions
ancer+0.018
ver+0.017
icking+0.017
ancing+0.017
ete+0.016
Neg logit contributions
 p-0.008
 trash-0.008
 st-0.008
 r-0.007
 scratches-0.007
Token often
Token ID1690
Feature activation+7.22e-01

Pos logit contributions
ancer+0.123
ancing+0.112
icking+0.111
ver+0.110
ummy+0.109
Neg logit contributions
 p-0.061
 st-0.059
 r-0.058
 hurt-0.054
 pet-0.054
Token fold
Token ID5591
Feature activation+4.94e-02

Pos logit contributions
ancer+0.080
ancing+0.073
icking+0.072
ver+0.071
ummy+0.070
Neg logit contributions
 p-0.044
 st-0.044
 r-0.043
 hurt-0.040
 pet-0.039
Token his
Token ID465
Feature activation-6.83e-01

Pos logit contributions
ancer+0.007
ancing+0.006
icking+0.006
ver+0.006
ummy+0.006
Neg logit contributions
 p-0.002
 st-0.002
 r-0.002
 hurt-0.002
 trash-0.002
Token paws
Token ID49193
Feature activation+2.04e-01

Pos logit contributions
 p+0.041
 r+0.040
 st+0.039
 wished+0.037
 scattered+0.037
Neg logit contributions
ancer-0.069
ancing-0.064
icking-0.064
ver-0.062
ummy-0.061
Token and
Token ID290
Feature activation+1.43e-01

Pos logit contributions
ancer+0.026
ancing+0.024
ummy+0.023
ver+0.023
icking+0.022
Neg logit contributions
 p-0.010
 st-0.010
 r-0.009
 hurt-0.009
 scattered-0.008
Token wait
Token ID4043
Feature activation-6.40e-01

Pos logit contributions
ancer+0.016
ancing+0.014
icking+0.014
ver+0.014
ummy+0.014
Neg logit contributions
 p-0.008
 st-0.008
 r-0.008
 trash-0.007
 scratches-0.007
Token patiently
Token ID37220
Feature activation+1.61e-01

Pos logit contributions
 p+0.043
 st+0.043
 r+0.042
 hurt+0.040
 scratches+0.039
Neg logit contributions
ancer-0.066
ancing-0.060
icking-0.060
ver-0.059
ummy-0.058
Token of
Token ID286
Feature activation+9.67e-01

Pos logit contributions
 p+0.078
 st+0.078
 r+0.076
 hurt+0.069
 scratches+0.068
Neg logit contributions
ancer-0.196
ancing-0.181
icking-0.180
ver-0.179
ummy-0.177
Token different
Token ID1180
Feature activation-9.77e-01

Pos logit contributions
ancer+0.122
ummy+0.113
ver+0.111
ancing+0.109
icking+0.107
Neg logit contributions
 st-0.035
 wished-0.032
 r-0.032
 p-0.031
 hurt-0.029
Token colors
Token ID7577
Feature activation-1.02e+00

Pos logit contributions
 st+0.056
 p+0.053
 pet+0.053
 hurt+0.051
 r+0.051
Neg logit contributions
ancer-0.105
ummy-0.099
icking-0.096
ancing-0.096
ver-0.095
Token and
Token ID290
Feature activation+8.21e-01

Pos logit contributions
 p+0.032
 r+0.030
 st+0.030
 hurt+0.025
 trash+0.023
Neg logit contributions
ancer-0.145
ancing-0.135
icking-0.134
ver-0.133
ummy-0.129
Token shapes
Token ID15268
Feature activation-7.97e-01

Pos logit contributions
ancer+0.070
ancing+0.063
ver+0.061
ummy+0.061
icking+0.059
Neg logit contributions
 st-0.063
P-0.062
 p-0.061
 r-0.059
sm-0.058
Token.
Token ID13
Feature activation+3.44e-01

Pos logit contributions
 st+0.035
 p+0.033
 wished+0.033
 r+0.032
 scratches+0.032
Neg logit contributions
ancer-0.094
ancing-0.090
ummy-0.090
icking-0.088
ver-0.088
Token They
Token ID1119
Feature activation-2.98e-01

Pos logit contributions
ancer+0.038
ancing+0.035
icking+0.035
ver+0.034
ummy+0.032
Neg logit contributions
 p-0.023
 r-0.022
 st-0.021
 pet-0.020
 trash-0.020
Token can
Token ID460
Feature activation-3.37e-01

Pos logit contributions
 st+0.022
sm+0.022
 p+0.021
 scratches+0.021
P+0.021
Neg logit contributions
ancer-0.026
ver-0.023
ancing-0.022
icking-0.022
ummy-0.021
Token make
Token ID787
Feature activation+7.18e-01

Pos logit contributions
 p+0.021
 r+0.020
 st+0.020
 hurt+0.019
 pet+0.018
Neg logit contributions
ancer-0.038
ancing-0.035
icking-0.035
ver-0.034
ummy-0.034
Token houses
Token ID7777
Feature activation-3.87e-01

Pos logit contributions
ancer+0.086
ancing+0.079
icking+0.078
ver+0.078
ummy+0.076
Neg logit contributions
 p-0.039
 st-0.037
 r-0.037
 hurt-0.034
 trash-0.033
Token,
Token ID11
Feature activation+1.24e+00

Pos logit contributions
 p+0.022
 st+0.021
 r+0.021
 hurt+0.020
 scratches+0.019
Neg logit contributions
ancer-0.044
ancing-0.041
icking-0.040
ver-0.040
ummy-0.039
Tokenons
Token ID684
Feature activation+2.35e-01

Pos logit contributions
ancer+0.333
ancing+0.302
ver+0.299
icking+0.295
ummy+0.294
Neg logit contributions
 p-0.411
 st-0.400
 r-0.387
 trash-0.381
 wished-0.379
Token to
Token ID284
Feature activation-9.14e-01

Pos logit contributions
ancer+0.026
icking+0.025
ancing+0.025
ver+0.025
ummy+0.023
Neg logit contributions
 p-0.013
 st-0.013
 r-0.013
 trash-0.012
 scratches-0.012
Token make
Token ID787
Feature activation+5.66e-01

Pos logit contributions
 p+0.053
 st+0.052
 r+0.052
 hurt+0.047
 trash+0.047
Neg logit contributions
ancer-0.101
ancing-0.094
icking-0.093
ver-0.092
ummy-0.090
Token a
Token ID257
Feature activation+2.61e-01

Pos logit contributions
ancer+0.070
ancing+0.065
icking+0.064
ver+0.064
ummy+0.063
Neg logit contributions
 p-0.027
 st-0.026
 r-0.026
 hurt-0.024
 scratches-0.023
Token pretty
Token ID2495
Feature activation-9.53e-01

Pos logit contributions
ancer+0.028
ummy+0.026
icking+0.026
ancing+0.025
ver+0.025
Neg logit contributions
 p-0.015
 st-0.015
 r-0.014
 scratches-0.013
 hurt-0.013
Token picture
Token ID4286
Feature activation-4.28e-01

Pos logit contributions
 st+0.055
 p+0.054
 r+0.053
 scratches+0.051
 hurt+0.050
Neg logit contributions
ancer-0.101
ummy-0.095
ancing-0.094
icking-0.094
ver-0.090
Token.
Token ID13
Feature activation-3.94e-01

Pos logit contributions
 p+0.018
 st+0.017
 r+0.017
 hurt+0.015
 pet+0.015
Neg logit contributions
ancer-0.056
ancing-0.052
icking-0.051
ver-0.051
ummy-0.050
Token
Token ID198
Feature activation-1.65e-01

Pos logit contributions
 p+0.023
 r+0.022
 st+0.022
 pet+0.020
 hurt+0.020
Neg logit contributions
ancer-0.045
ancing-0.042
icking-0.042
ver-0.041
ummy-0.039
Token
Token ID198
Feature activation-2.52e-01

Pos logit contributions
 p+0.007
 r+0.007
 st+0.007
 hurt+0.007
 pet+0.006
Neg logit contributions
ancer-0.021
ancing-0.020
icking-0.020
ver-0.019
ummy-0.018
TokenLater
Token ID18602
Feature activation+8.97e-01

Pos logit contributions
 p+0.019
 st+0.019
 r+0.018
 scratches+0.018
 hurt+0.017
Neg logit contributions
ancer-0.024
ancing-0.022
ver-0.022
icking-0.021
ummy-0.021
Token that
Token ID326
Feature activation-4.71e-01

Pos logit contributions
ancer+0.102
icking+0.095
ummy+0.094
ancing+0.094
ver+0.092
Neg logit contributions
 p-0.049
 st-0.048
 r-0.047
 scratches-0.045
 trash-0.044
Token bear
Token ID6842
Feature activation+4.91e-01

Pos logit contributions
ancer+0.240
ancing+0.221
icking+0.219
ver+0.216
ummy+0.213
Neg logit contributions
 p-0.111
 st-0.109
 r-0.108
 hurt-0.099
 scratches-0.098
Token.
Token ID13
Feature activation-1.62e-01

Pos logit contributions
ancer+0.060
ancing+0.056
icking+0.054
ver+0.054
ummy+0.053
Neg logit contributions
 p-0.026
 st-0.025
 r-0.025
 hurt-0.022
 pet-0.022
Token We
Token ID775
Feature activation+3.69e-01

Pos logit contributions
 p+0.011
 st+0.010
 r+0.010
 pet+0.010
 s+0.009
Neg logit contributions
ancer-0.017
icking-0.017
ancing-0.016
ver-0.016
ummy-0.015
Token can
Token ID460
Feature activation-1.73e-01

Pos logit contributions
ancer+0.040
icking+0.036
ver+0.036
ancing+0.036
ummy+0.034
Neg logit contributions
 p-0.022
 st-0.022
 r-0.022
 scratches-0.021
 trash-0.020
Token share
Token ID2648
Feature activation-5.78e-02

Pos logit contributions
 p+0.010
 r+0.010
 st+0.010
 hurt+0.009
 pet+0.009
Neg logit contributions
ancer-0.020
ancing-0.019
ver-0.018
icking-0.018
ummy-0.018
Token,"
Token ID553
Feature activation-2.09e+00

Pos logit contributions
 st+0.003
 r+0.003
 p+0.003
 scratches+0.003
 hurt+0.003
Neg logit contributions
ancer-0.006
ummy-0.006
ancing-0.006
icking-0.006
ver-0.006
Token L
Token ID406
Feature activation-4.68e-01

Pos logit contributions
 p+0.142
 st+0.141
 r+0.140
 hurt+0.130
 scratches+0.129
Neg logit contributions
ancer-0.211
ancing-0.189
icking-0.188
ver-0.188
ummy-0.186
Tokenila
Token ID10102
Feature activation-3.48e-01

Pos logit contributions
 p+0.033
 st+0.031
 r+0.031
 hurt+0.029
 sc+0.029
Neg logit contributions
ancer-0.047
ancing-0.044
icking-0.043
ummy-0.042
ver-0.042
Token says
Token ID1139
Feature activation+4.37e-02

Pos logit contributions
 st+0.020
 p+0.019
 r+0.019
 scratches+0.018
 trash+0.018
Neg logit contributions
ancer-0.038
icking-0.035
ver-0.035
ancing-0.035
ummy-0.034
Token.
Token ID13
Feature activation+5.09e-01

Pos logit contributions
ancer+0.006
ancing+0.006
icking+0.006
ver+0.006
ummy+0.006
Neg logit contributions
 p-0.001
 st-0.001
 r-0.001
 hurt-0.001
 pet-0.001
Token
Token ID198
Feature activation+3.26e-01

Pos logit contributions
ancer+0.060
icking+0.057
ancing+0.056
ver+0.054
ummy+0.053
Neg logit contributions
 p-0.029
 st-0.028
 r-0.027
 pet-0.026
 hurt-0.024
Token hard
Token ID1327
Feature activation-1.40e+00

Pos logit contributions
 p+0.035
 st+0.035
 r+0.035
 hurt+0.032
 scratches+0.030
Neg logit contributions
ancer-0.087
ancing-0.080
icking-0.078
ummy-0.077
ver-0.076
Token to
Token ID284
Feature activation-9.07e-01

Pos logit contributions
 p+0.072
 st+0.072
 r+0.071
 hurt+0.064
 scratches+0.063
Neg logit contributions
ancer-0.169
ancing-0.156
icking-0.154
ver-0.152
ummy-0.149
Token keep
Token ID1394
Feature activation+5.56e-02

Pos logit contributions
 p+0.049
 st+0.048
 r+0.047
 hurt+0.046
 pet+0.043
Neg logit contributions
ancer-0.111
ancing-0.102
icking-0.100
ummy-0.100
ver-0.099
Token us
Token ID514
Feature activation+1.38e+00

Pos logit contributions
ancer+0.007
ancing+0.006
ver+0.006
icking+0.006
ummy+0.006
Neg logit contributions
 st-0.003
 p-0.003
 r-0.003
 hurt-0.003
 trash-0.003
Token safe
Token ID3338
Feature activation+6.24e-01

Pos logit contributions
ancer+0.174
ancing+0.159
ummy+0.155
ver+0.154
icking+0.153
Neg logit contributions
 p-0.068
 st-0.066
 r-0.066
 hurt-0.062
 trash-0.058
Token.
Token ID13
Feature activation-5.53e-01

Pos logit contributions
ancer+0.087
ancing+0.082
ver+0.079
icking+0.079
ummy+0.078
Neg logit contributions
 p-0.022
 st-0.021
 r-0.021
 hurt-0.019
 trash-0.017
Token The
Token ID383
Feature activation-3.86e-01

Pos logit contributions
 p+0.036
 r+0.034
 st+0.033
 pet+0.031
 s+0.030
Neg logit contributions
ancer-0.060
ancing-0.058
icking-0.057
ver-0.056
otion-0.051
Token little
Token ID1310
Feature activation-2.95e-01

Pos logit contributions
 p+0.022
 st+0.021
 r+0.021
 hurt+0.019
 pet+0.019
Neg logit contributions
ancer-0.045
ancing-0.042
icking-0.041
ver-0.041
ummy-0.040
Token girl
Token ID2576
Feature activation-2.81e-03

Pos logit contributions
 p+0.017
 st+0.016
 r+0.016
 hurt+0.015
 pet+0.015
Neg logit contributions
ancer-0.034
ancing-0.032
icking-0.031
ver-0.031
ummy-0.030
Token smiled
Token ID13541
Feature activation-7.71e-01

Pos logit contributions
 p+0.000
 st+0.000
 r+0.000
 hurt+0.000
 scratches+0.000
Neg logit contributions
ancer-0.000
ancing-0.000
icking-0.000
ver-0.000
ummy-0.000
Token and
Token ID290
Feature activation+9.85e-02

Pos logit contributions
 p+0.042
 st+0.041
 r+0.041
 hurt+0.037
 pet+0.037
Neg logit contributions
ancer-0.092
ancing-0.084
icking-0.083
ver-0.083
ummy-0.081
Token sink
Token ID14595
Feature activation-4.12e-01

Pos logit contributions
ancer+0.015
ancing+0.013
icking+0.013
ummy+0.013
ver+0.013
Neg logit contributions
 p-0.010
 st-0.010
 r-0.010
 scratches-0.010
 hurt-0.010
Token.
Token ID13
Feature activation-4.03e-01

Pos logit contributions
 p+0.012
 st+0.011
 r+0.011
 hurt+0.009
 pet+0.008
Neg logit contributions
ancer-0.060
ancing-0.056
ver-0.055
icking-0.055
ummy-0.054
Token She
Token ID1375
Feature activation+9.68e-02

Pos logit contributions
 p+0.029
 st+0.027
 r+0.027
 pet+0.025
 s+0.024
Neg logit contributions
ancer-0.045
icking-0.043
ancing-0.042
ver-0.039
ummy-0.037
Token looked
Token ID3114
Feature activation-8.61e-01

Pos logit contributions
ancer+0.012
ancing+0.011
icking+0.011
ver+0.011
ummy+0.011
Neg logit contributions
 p-0.004
 st-0.004
 r-0.004
 trash-0.004
 scratches-0.004
Token over
Token ID625
Feature activation-3.95e-01

Pos logit contributions
P+0.063
 st+0.062
 scratches+0.062
 p+0.060
 r+0.059
Neg logit contributions
ancer-0.070
icking-0.069
ancing-0.066
ver-0.064
ummy-0.063
Token and
Token ID290
Feature activation-1.67e-01

Pos logit contributions
 st+0.023
 p+0.023
 r+0.023
 hurt+0.022
 trash+0.021
Neg logit contributions
ancer-0.041
ummy-0.038
ancing-0.038
icking-0.038
ver-0.037
Token saw
Token ID2497
Feature activation-6.86e-01

Pos logit contributions
 st+0.006
 p+0.005
 r+0.005
 scratches+0.005
 trash+0.005
Neg logit contributions
ancer-0.022
icking-0.020
ancing-0.020
ver-0.020
ete-0.020
Token that
Token ID326
Feature activation-3.70e-01

Pos logit contributions
 p+0.030
 st+0.029
 r+0.029
 hurt+0.026
 scratches+0.025
Neg logit contributions
ancer-0.088
ancing-0.082
icking-0.081
ver-0.080
ummy-0.079
Token a
Token ID257
Feature activation+2.99e-01

Pos logit contributions
 p+0.018
 r+0.017
 st+0.017
 hurt+0.016
 pet+0.015
Neg logit contributions
ancer-0.048
ancing-0.045
icking-0.044
ver-0.043
ummy-0.043
Token little
Token ID1310
Feature activation-1.65e-01

Pos logit contributions
ancer+0.036
ancing+0.033
icking+0.033
ver+0.033
ummy+0.033
Neg logit contributions
 p-0.014
 st-0.014
 r-0.014
 hurt-0.013
 scratches-0.012
Token frog
Token ID21264
Feature activation-4.39e-01

Pos logit contributions
 p+0.009
 st+0.009
 r+0.009
 hurt+0.008
 pet+0.008
Neg logit contributions
ancer-0.019
ancing-0.018
ver-0.018
icking-0.018
ummy-0.017
Token The
Token ID383
Feature activation-2.38e-01

Pos logit contributions
 p+0.036
 r+0.036
 pet+0.034
 st+0.033
 s+0.033
Neg logit contributions
ancer-0.061
icking-0.058
ancing-0.057
ver-0.057
ete-0.053
Token daughter
Token ID4957
Feature activation+9.91e-01

Pos logit contributions
 p+0.017
 st+0.017
 r+0.017
 hurt+0.016
 pet+0.015
Neg logit contributions
ancer-0.024
ancing-0.023
ver-0.022
icking-0.022
ummy-0.021
Token played
Token ID2826
Feature activation-5.40e-01

Pos logit contributions
ancer+0.143
ancing+0.138
ummy+0.132
icking+0.131
ver+0.129
Neg logit contributions
 p-0.033
 r-0.031
 st-0.031
 pet-0.025
 wished-0.025
Token with
Token ID351
Feature activation-8.64e-01

Pos logit contributions
 p+0.027
 st+0.026
 r+0.026
 hurt+0.023
 pet+0.023
Neg logit contributions
ancer-0.069
icking-0.062
ancing-0.062
ver-0.061
ummy-0.061
Token her
Token ID607
Feature activation-4.49e-01

Pos logit contributions
 p+0.034
 st+0.034
 r+0.033
 hurt+0.030
 scratches+0.029
Neg logit contributions
ancer-0.111
ancing-0.103
icking-0.103
ver-0.102
ummy-0.101
Token toys
Token ID14958
Feature activation-5.34e-01

Pos logit contributions
 r+0.023
 p+0.023
 st+0.023
 hurt+0.021
 scratches+0.021
Neg logit contributions
ancer-0.052
icking-0.048
ummy-0.047
ver-0.047
ancing-0.047
Token and
Token ID290
Feature activation+8.67e-03

Pos logit contributions
 p+0.029
 st+0.028
 r+0.028
 hurt+0.025
 pet+0.025
Neg logit contributions
ancer-0.064
ancing-0.059
icking-0.058
ver-0.057
ummy-0.057
Token the
Token ID262
Feature activation+3.17e-01

Pos logit contributions
ancer+0.001
ver+0.001
icking+0.001
ancing+0.001
ummy+0.001
Neg logit contributions
 st-0.000
 p-0.000
 r-0.000
 scratches-0.000
 hurt-0.000
Token mom
Token ID1995
Feature activation+2.27e-01

Pos logit contributions
ancer+0.034
ancing+0.031
icking+0.031
ver+0.030
ummy+0.030
Neg logit contributions
 p-0.020
 st-0.020
 r-0.020
 hurt-0.018
 scratches-0.018
Token watched
Token ID7342
Feature activation-8.85e-01

Pos logit contributions
ancer+0.030
ancing+0.028
icking+0.027
ver+0.027
ummy+0.027
Neg logit contributions
 p-0.009
 st-0.009
 r-0.009
 hurt-0.008
 pet-0.008
Token her
Token ID607
Feature activation-3.94e-01

Pos logit contributions
 p+0.038
 r+0.038
 st+0.037
 hurt+0.031
 pet+0.031
Neg logit contributions
ancer-0.119
ver-0.110
ancing-0.109
icking-0.107
ete-0.105
Token on
Token ID319
Feature activation+7.84e-01

Pos logit contributions
 st+0.038
 r+0.036
 p+0.036
 hurt+0.032
 pet+0.028
Neg logit contributions
ancer-0.131
ancing-0.122
icking-0.120
ummy-0.119
ver-0.118
Token,
Token ID11
Feature activation+5.72e-01

Pos logit contributions
ancer+0.101
ancing+0.093
ver+0.090
icking+0.089
apping+0.087
Neg logit contributions
 p-0.041
 st-0.040
 r-0.039
 hurt-0.034
 bad-0.032
Token Lily
Token ID20037
Feature activation-7.98e-01

Pos logit contributions
ancer+0.067
ancing+0.063
icking+0.061
ver+0.061
ummy+0.059
Neg logit contributions
 p-0.032
 r-0.031
 st-0.031
 hurt-0.028
 pet-0.028
Token shared
Token ID4888
Feature activation+6.17e-01

Pos logit contributions
 p+0.038
 st+0.037
 r+0.036
 trash+0.034
 scratches+0.034
Neg logit contributions
ancer-0.097
icking-0.089
ancing-0.089
ver-0.089
ummy-0.087
Token her
Token ID607
Feature activation-3.50e-01

Pos logit contributions
ancer+0.080
ancing+0.075
icking+0.074
ver+0.073
ummy+0.073
Neg logit contributions
 p-0.023
 st-0.023
 r-0.023
 hurt-0.020
 scratches-0.020
Token clay
Token ID21558
Feature activation-2.99e+00

Pos logit contributions
 st+0.019
 p+0.019
 r+0.018
 wished+0.018
 hurt+0.018
Neg logit contributions
ancer-0.037
icking-0.035
ummy-0.035
ancing-0.034
ver-0.033
Token creations
Token ID28443
Feature activation-1.36e+00

Pos logit contributions
 p+0.186
 st+0.185
 r+0.183
 hurt+0.171
 scratches+0.170
Neg logit contributions
ancer-0.317
ancing-0.292
icking-0.291
ver-0.286
ummy-0.286
Token with
Token ID351
Feature activation-5.07e-01

Pos logit contributions
 p+0.069
 r+0.066
 st+0.066
 pet+0.061
 hurt+0.060
Neg logit contributions
ancer-0.170
ancing-0.154
icking-0.151
ver-0.149
ummy-0.144
Token others
Token ID1854
Feature activation-4.58e-01

Pos logit contributions
 p+0.021
 st+0.020
 r+0.020
 hurt+0.018
 scratches+0.017
Neg logit contributions
ancer-0.065
ancing-0.060
icking-0.060
ver-0.060
ummy-0.059
Token and
Token ID290
Feature activation+4.23e-01

Pos logit contributions
 p+0.024
 st+0.023
 r+0.023
 hurt+0.021
 pet+0.020
Neg logit contributions
ancer-0.055
ancing-0.051
icking-0.050
ver-0.050
ummy-0.049
Token felt
Token ID2936
Feature activation-5.26e-01

Pos logit contributions
ancer+0.047
icking+0.043
ancing+0.043
ver+0.043
ummy+0.042
Neg logit contributions
 p-0.024
 st-0.023
 r-0.023
 trash-0.021
 scratches-0.021
Token Billy
Token ID15890
Feature activation+1.07e+00

Pos logit contributions
ancer+0.197
ancing+0.177
ver+0.177
icking+0.175
ummy+0.173
Neg logit contributions
 p-0.088
 r-0.085
 st-0.085
 hurt-0.083
 pet-0.080
Token have
Token ID423
Feature activation-2.30e-01

Pos logit contributions
ancer+0.125
ancing+0.115
icking+0.113
ver+0.113
ummy+0.111
Neg logit contributions
 p-0.060
 st-0.059
 r-0.058
 hurt-0.054
 pet-0.053
Token fun
Token ID1257
Feature activation-1.42e+00

Pos logit contributions
 st+0.011
 r+0.011
 p+0.011
P+0.010
 scratches+0.010
Neg logit contributions
ancer-0.024
icking-0.023
ver-0.023
ummy-0.023
ancing-0.023
Token with
Token ID351
Feature activation-7.33e-01

Pos logit contributions
 p+0.059
 st+0.058
 r+0.057
 hurt+0.051
 scratches+0.050
Neg logit contributions
ancer-0.181
ancing-0.168
icking-0.167
ver-0.165
ummy-0.163
Token the
Token ID262
Feature activation+1.82e-01

Pos logit contributions
 p+0.039
 st+0.038
 r+0.037
 hurt+0.035
 scratches+0.034
Neg logit contributions
ancer-0.088
ancing-0.081
icking-0.080
ver-0.079
ummy-0.078
Token clay
Token ID21558
Feature activation-3.16e+00

Pos logit contributions
ancer+0.021
ancing+0.019
icking+0.019
ver+0.019
ummy+0.019
Neg logit contributions
 p-0.010
 st-0.010
 r-0.009
 hurt-0.009
 scratches-0.008
Token.
Token ID13
Feature activation+1.79e-01

Pos logit contributions
 p+0.124
 st+0.123
 r+0.121
 scratches+0.107
 hurt+0.106
Neg logit contributions
ancer-0.409
ancing-0.379
icking-0.378
ver-0.374
ummy-0.372
Token He
Token ID679
Feature activation+3.16e-01

Pos logit contributions
ancer+0.021
icking+0.021
ancing+0.020
ver+0.019
apping+0.018
Neg logit contributions
 p-0.010
 r-0.009
 st-0.009
 pet-0.009
 wished-0.008
Token apologized
Token ID22893
Feature activation+4.91e-01

Pos logit contributions
ancer+0.034
ver+0.032
icking+0.032
ancing+0.031
ete+0.030
Neg logit contributions
 st-0.016
 p-0.015
 r-0.015
 trash-0.015
sm-0.014
Token to
Token ID284
Feature activation-7.86e-01

Pos logit contributions
ancer+0.059
ancing+0.055
icking+0.054
ver+0.054
ummy+0.053
Neg logit contributions
 p-0.025
 st-0.025
 r-0.024
 hurt-0.022
 scratches-0.022
Token Billy
Token ID15890
Feature activation+1.26e+00

Pos logit contributions
 p+0.051
 st+0.048
 r+0.048
 sc+0.046
 hurt+0.046
Neg logit contributions
ancer-0.088
ver-0.082
ancing-0.082
icking-0.080
apping-0.078
Token friends
Token ID2460
Feature activation-5.67e-01

Pos logit contributions
 p+0.016
 st+0.016
 r+0.016
 scratches+0.015
 trash+0.015
Neg logit contributions
ancer-0.032
icking-0.030
ancing-0.029
ummy-0.029
ver-0.029
Token agreed
Token ID4987
Feature activation+4.59e-01

Pos logit contributions
 p+0.024
 st+0.024
 r+0.023
 hurt+0.020
 pet+0.020
Neg logit contributions
ancer-0.075
ancing-0.070
ummy-0.069
icking-0.068
ver-0.067
Token that
Token ID326
Feature activation-7.94e-01

Pos logit contributions
ancer+0.059
ancing+0.053
icking+0.052
ver+0.052
ummy+0.051
Neg logit contributions
 p-0.022
 r-0.022
 st-0.022
 hurt-0.019
 pet-0.019
Token they
Token ID484
Feature activation-1.35e+00

Pos logit contributions
 r+0.042
 st+0.040
 p+0.040
 hurt+0.037
 wished+0.036
Neg logit contributions
ancer-0.098
ancing-0.091
ummy-0.088
icking-0.087
ver-0.087
Token were
Token ID547
Feature activation-1.05e+00

Pos logit contributions
 p+0.081
 st+0.079
 r+0.078
 scratches+0.073
 trash+0.073
Neg logit contributions
ancer-0.149
ver-0.137
ancing-0.136
icking-0.136
ummy-0.129
Token proud
Token ID6613
Feature activation-2.85e+00

Pos logit contributions
 p+0.067
 st+0.066
 r+0.065
 scratches+0.060
 trash+0.060
Neg logit contributions
ancer-0.113
ancing-0.104
icking-0.103
ver-0.102
ummy-0.099
Token.
Token ID13
Feature activation-1.02e+00

Pos logit contributions
 p+0.219
 st+0.214
 r+0.214
 hurt+0.200
 pet+0.198
Neg logit contributions
ancer-0.274
ancing-0.248
icking-0.242
ver-0.242
ummy-0.230
Token They
Token ID1119
Feature activation-1.52e+00

Pos logit contributions
 p+0.049
 st+0.048
 pet+0.047
 r+0.045
 s+0.041
Neg logit contributions
ancer-0.124
icking-0.121
ancing-0.111
ummy-0.110
ver-0.109
Token felt
Token ID2936
Feature activation-1.31e+00

Pos logit contributions
 p+0.069
 st+0.067
 r+0.066
 hurt+0.059
 pet+0.058
Neg logit contributions
ancer-0.193
ancing-0.179
icking-0.176
ver-0.175
ummy-0.173
Token content
Token ID2695
Feature activation+3.72e-02

Pos logit contributions
 p+0.081
 st+0.081
 r+0.079
 scratches+0.076
 trash+0.074
Neg logit contributions
ancer-0.135
ver-0.127
ancing-0.126
icking-0.125
ete-0.120
Token and
Token ID290
Feature activation-3.71e-01

Pos logit contributions
ancer+0.004
ancing+0.004
icking+0.004
ver+0.004
ummy+0.004
Neg logit contributions
 p-0.002
 st-0.002
 r-0.002
 hurt-0.002
 trash-0.002
Token\n
Token ID198
Feature activation+6.01e-01

Pos logit contributions
ancer+0.080
icking+0.074
ancing+0.074
ver+0.072
ummy+0.069
Neg logit contributions
 p-0.035
 r-0.033
 st-0.033
 hurt-0.030
 pet-0.030
TokenThey
Token ID2990
Feature activation-7.63e-01

Pos logit contributions
ancer+0.051
ancing+0.049
ummy+0.048
apping+0.048
ver+0.047
Neg logit contributions
 st-0.047
 p-0.044
 r-0.044
 hurt-0.044
 wished-0.042
Token take
Token ID1011
Feature activation+2.00e-01

Pos logit contributions
 st+0.044
 p+0.044
 r+0.043
 hurt+0.040
 scratches+0.040
Neg logit contributions
ancer-0.084
ancing-0.078
icking-0.077
ver-0.076
ummy-0.074
Token the
Token ID262
Feature activation+3.64e-01

Pos logit contributions
ancer+0.024
ancing+0.023
icking+0.023
ver+0.022
ummy+0.022
Neg logit contributions
 p-0.008
 st-0.008
 r-0.008
 wished-0.008
 hurt-0.008
Token ice
Token ID4771
Feature activation-1.14e+00

Pos logit contributions
ancer+0.039
icking+0.037
ummy+0.037
ancing+0.035
ver+0.034
Neg logit contributions
 wished-0.018
 r-0.018
 st-0.018
sm-0.018
P-0.017
Token cre
Token ID1126
Feature activation-2.97e+00

Pos logit contributions
 st+0.054
 wished+0.053
 r+0.053
 p+0.051
 scratches+0.050
Neg logit contributions
ancer-0.136
icking-0.128
ancing-0.126
ummy-0.124
ver-0.122
Tokenams
Token ID4105
Feature activation-1.40e-01

Pos logit contributions
 p+0.231
 st+0.228
 r+0.226
 hurt+0.215
 pet+0.211
Neg logit contributions
ancer-0.278
ancing-0.251
icking-0.249
ver-0.245
ummy-0.241
Token and
Token ID290
Feature activation+6.14e-01

Pos logit contributions
 p+0.008
 st+0.008
 r+0.007
 hurt+0.007
 pet+0.007
Neg logit contributions
ancer-0.016
ancing-0.015
ver-0.015
icking-0.015
ummy-0.014
Token lick
Token ID36728
Feature activation-1.84e+00

Pos logit contributions
ancer+0.065
ancing+0.060
icking+0.060
ver+0.059
ummy+0.057
Neg logit contributions
 r-0.033
 st-0.032
 p-0.032
sm-0.030
P-0.030
Token them
Token ID606
Feature activation-3.89e-01

Pos logit contributions
 p+0.060
 st+0.057
 r+0.057
 pet+0.048
 trash+0.048
Neg logit contributions
ancer-0.267
ancing-0.243
ver-0.241
icking-0.241
ummy-0.236
Token.
Token ID13
Feature activation+4.17e-01

Pos logit contributions
 st+0.018
 p+0.018
 wished+0.018
 r+0.018
 scratches+0.017
Neg logit contributions
ancer-0.044
icking-0.043
ancing-0.043
ummy-0.041
ver-0.041
Token\n
Token ID198
Feature activation+3.74e-01

Pos logit contributions
ancer+0.049
ancing+0.045
icking+0.044
ver+0.044
ummy+0.044
Neg logit contributions
 p-0.027
 st-0.027
 r-0.026
 hurt-0.024
 scratches-0.024
TokenM
Token ID44
Feature activation-1.53e+00

Pos logit contributions
ancer+0.032
ummy+0.029
ete+0.029
icking+0.029
ancing+0.028
Neg logit contributions
 st-0.030
 p-0.030
 r-0.029
 scratches-0.028
 hurt-0.028
Tokenia
Token ID544
Feature activation+2.38e-01

Pos logit contributions
 p+0.146
 r+0.145
 st+0.143
 scratches+0.136
 hurt+0.134
Neg logit contributions
ancer-0.118
ancing-0.103
ver-0.103
icking-0.095
istent-0.092
Token feels
Token ID5300
Feature activation-7.73e-01

Pos logit contributions
ancer+0.025
ancing+0.024
icking+0.023
ver+0.023
ummy+0.023
Neg logit contributions
 st-0.014
 p-0.014
 r-0.014
 hurt-0.013
 trash-0.013
Token a
Token ID257
Feature activation+3.15e-01

Pos logit contributions
 st+0.040
 p+0.040
 r+0.038
 scratches+0.034
 scattered+0.034
Neg logit contributions
ancer-0.085
ver-0.079
ancing-0.079
icking-0.078
ummy-0.076
Token tug
Token ID27762
Feature activation-3.31e+00

Pos logit contributions
ancer+0.033
ummy+0.031
ver+0.031
ancing+0.030
icking+0.029
Neg logit contributions
 p-0.016
 st-0.015
 r-0.015
 scratches-0.015
 scattered-0.014
Token on
Token ID319
Feature activation+7.42e-01

Pos logit contributions
 p+0.169
 st+0.166
 r+0.163
 trash+0.154
 scratches+0.152
Neg logit contributions
ancer-0.391
icking-0.361
ancing-0.361
ver-0.352
ummy-0.351
Token her
Token ID607
Feature activation-3.68e-01

Pos logit contributions
ancer+0.085
ancing+0.083
ummy+0.082
icking+0.081
ete+0.078
Neg logit contributions
 st-0.033
 p-0.032
 r-0.031
 wished-0.030
 hurt-0.029
Token pole
Token ID16825
Feature activation-7.94e-01

Pos logit contributions
 p+0.027
 st+0.026
 r+0.026
 wished+0.024
 scratches+0.024
Neg logit contributions
ancer-0.036
ancing-0.033
icking-0.033
ver-0.032
ummy-0.032
Token too
Token ID1165
Feature activation+2.87e-01

Pos logit contributions
 p+0.024
 st+0.023
 r+0.023
 hurt+0.019
 pet+0.018
Neg logit contributions
ancer-0.113
ancing-0.105
ver-0.104
icking-0.104
ummy-0.102
Token.
Token ID13
Feature activation+1.15e-01

Pos logit contributions
ancer+0.038
ancing+0.035
icking+0.035
ver+0.035
ummy+0.035
Neg logit contributions
 p-0.011
 st-0.010
 r-0.010
 trash-0.009
 scratches-0.009