TOP ACTIVATIONS
MAX = 6.488

away in her pig gy bank every week . This way
blanket before it was torn S ue decided
the grass and looked up at the big , blue
Ben start to argue about whose doll is better . They
the novel and the people who made it . It is
their toys with other kids who do not have any .
and Ben are just kids who were curious . He tells
They started to argue about whose turn it was .
silly and not like real hipp os . One
Ben started to argue about whose doll was better . They
was an adorable little girl whose name was Alice . Alice
give jam to the kids who say yes . They open
" It doesn 't matter whose barrel is bigger . What
and Sam were squirrel s who liked to find nuts in
are bigger than it ?" So they all
the temple and the people who came there . They said
bed wearing his favourite py j amas , feeling so proud
can be lions and tigers who drink from the fountain ."
It might be slippery or icky or have snakes ."
and made long and cur ving lines . She was very

MIN ACTIVATIONS
MIN = -6.824

That was the setup that fore shadow ed the bad ending
L ily had fore shadow ed the danger with
wrong . She saw the mess and the tears and said
there and wait . The fore shadow ing of the dull
the sky before they walked fore shadow ed the down pour
happened . She saw the mess and the broken v ase
Let 's clean up the mess and have some fun !"
He felt a great satisfaction and decided to never forget
. He felt a sudden relief that he had helped the
door . She saw the mess . She saw the cr
end . M oral value : The story teaches
kitchen . He sees the mess . He sees Tom with
bathroom . She sees the mess and the ring . She
She felt a wave of relief as their day continued .
. She wiped up the mess and made everything nice and
happened . She saw the mess and said , " Oh
quickly , and saw the mess . They were worried ,
quickly . She sees the mess and Tom 's tears .
. They cleaned up the mess together and even made some
. They look at the mess of blocks .

INTERVAL 3.160 - 6.488
CONTAINS 1.909%

asked herself why it felt so quiet and empty .
to make the boo - bo o go away . Tim
Ben said . He was scared and serious .
soon . <|endoftext|> Once , there was a reliable start .
Once upon a time , there was a little boy named

INTERVAL -0.168 - 3.160
CONTAINS 50.993%

have to be gentle . The piano is not a toy
a time , there was a little girl named Lily .
they saw a big poster with a picture of a lion
a time , there was a little boy who was so
for her act of courage , and she was even more

INTERVAL -3.496 - -0.168
CONTAINS 46.339%

The little girl said , " My name is Chloe
were sharing lots of c udd les and love . Now
remind her before they left . They had a fun day
playing outside . One day , Sammy and his best friend
playing with it . From then on , she

INTERVAL -6.824 - -3.496
CONTAINS 0.758%

best of friends . The end . <|endoftext|> M olly and
happy . It liked to clean plates and cups . The
So they paid for the gift , hopped back on the
ed every year and brought happiness to everyone who saw it
Her eyes lit up with joy when she saw it up
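
The four INTERVAL ranges above appear to be equal-width bins between MIN and MAX. Each one spans 3.328, which is (6.488 - (-6.824)) / 4. The CONTAINS shares add up to 1.909 + 50.993 + 46.339 + 0.758 = 99.999%, so they cover all activations up to rounding. The minimal Python sketch below assumes that binning scheme and half-open bins with the maximum folded into the top bin; the dump does not say how boundary values are assigned. The function names are hypothetical and are not taken from any visualization tool. The sketch rebuilds the edges and places a per-token activation in its interval:

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 evenly spaced bin edges from lo to hi."""
    width = (hi - lo) / n_bins
    # Pin the last edge to hi exactly so the maximum value is not lost to rounding.
    return [lo + i * width for i in range(n_bins)] + [hi]


def bin_index(x: float, edges: list[float]) -> int:
    """Return the index of the half-open bin [edges[i], edges[i+1]) holding x.

    The maximum value itself is folded into the last bin (an assumption).
    """
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    if x == edges[-1]:
        return len(edges) - 2
    raise ValueError(f"{x} lies outside [{edges[0]}, {edges[-1]}]")


# MIN and MAX as reported in this dump; 4 intervals as listed above.
edges = equal_width_edges(-6.824, 6.488, 4)
print([round(e, 3) for e in edges])

# Bin 3 is the top interval (3.160 - 6.488); bin 0 is the bottom one (-6.824 - -3.496).
print(bin_index(6.16, edges))   # -> 3  (activation shown for the token " who")
print(bin_index(-2.59, edges))  # -> 1  (activation shown for the token " torn")
```

In the dump the intervals are listed from highest to lowest, so bin 3 here corresponds to the first INTERVAL section and bin 0 to the last one.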
Token away
Token ID1497
Feature activation-1.36e+00

Pos logit contributions
quet+0.046
ful+0.039
ment+0.037
bly+0.037
 ray+0.037
Neg logit contributions
 disagreed-0.044
 challenged-0.042
rolling-0.041
 begged-0.041
 argued-0.040
Token in
Token ID287
Feature activation+1.89e+00

Pos logit contributions
quet+0.050
aid+0.046
Happy+0.041
 bless+0.041
bly+0.040
Neg logit contributions
 disagreed-0.039
 One-0.037
 upon-0.037
rolling-0.036
 challenged-0.036
Token her
Token ID607
Feature activation-9.98e-02

Pos logit contributions
 One+0.066
 challenged+0.062
 disagreed+0.062
 favorites+0.062
 age+0.062
Neg logit contributions
quet-0.061
aid-0.053
paste-0.052
ful-0.052
cake-0.052
Token pig
Token ID12967
Feature activation+4.13e+00

Pos logit contributions
quet+0.004
ful+0.004
bly+0.004
aid+0.004
cake+0.004
Neg logit contributions
rolling-0.003
 favorites-0.003
 One-0.002
 challenged-0.002
 disagreed-0.002
Tokengy
Token ID1360
Feature activation+3.69e+00

Pos logit contributions
rolling+0.127
 challenged+0.124
 disagreed+0.119
 favorites+0.117
 Whoever+0.116
Neg logit contributions
quet-0.147
ful-0.143
bly-0.126
ment-0.124
kin-0.122
Tokenbank
Token ID17796
Feature activation+6.49e+00

Pos logit contributions
rolling+0.069
 challenged+0.065
 disagreed+0.061
 Whoever+0.060
 One+0.059
Neg logit contributions
quet-0.176
ful-0.174
bly-0.158
ment-0.156
kin-0.155
Token every
Token ID790
Feature activation+3.03e+00

Pos logit contributions
rolling+0.250
 disagreed+0.246
 challenged+0.246
 favorites+0.236
 One+0.236
Neg logit contributions
quet-0.180
ful-0.160
 fairy-0.140
bly-0.140
aid-0.140
Token week
Token ID1285
Feature activation-1.19e+00

Pos logit contributions
rolling+0.095
 challenged+0.094
 disagreed+0.090
 Whoever+0.089
 One+0.088
Neg logit contributions
quet-0.106
ful-0.104
bly-0.090
ment-0.089
aid-0.087
Token.
Token ID13
Feature activation-2.71e-01

Pos logit contributions
quet+0.037
ful+0.035
aid+0.034
bly+0.034
web+0.030
Neg logit contributions
 disagreed-0.041
rolling-0.040
 challenged-0.040
 begged-0.039
 One-0.038
Token This
Token ID770
Feature activation+4.02e-01

Pos logit contributions
ful+0.010
quet+0.010
ment+0.009
 fairy+0.008
 Fairy+0.008
Neg logit contributions
rolling-0.008
 challenged-0.008
 disagreed-0.008
 padd-0.007
 argued-0.007
Token way
Token ID835
Feature activation-5.69e-02

Pos logit contributions
 challenged+0.012
rolling+0.012
 disagreed+0.011
 Whoever+0.011
 One+0.011
Neg logit contributions
quet-0.015
ful-0.015
bly-0.013
ment-0.013
aid-0.012
Token blanket
Token ID18447
Feature activation-1.21e-01

Pos logit contributions
rolling+0.014
 favorites+0.014
 disagreed+0.014
 challenged+0.013
 One+0.013
Neg logit contributions
quet-0.018
ful-0.017
bly-0.015
ment-0.015
aid-0.015
Token before
Token ID878
Feature activation+2.05e+00

Pos logit contributions
quet+0.004
aid+0.003
ful+0.003
Happy+0.003
ment+0.003
Neg logit contributions
 disagreed-0.004
rolling-0.004
 One-0.004
 challenged-0.004
 begged-0.003
Token it
Token ID340
Feature activation+1.41e+00

Pos logit contributions
rolling+0.069
 One+0.067
 challenged+0.066
 disagreed+0.065
 begged+0.063
Neg logit contributions
quet-0.066
ful-0.064
aid-0.061
bly-0.057
kin-0.057
Token was
Token ID373
Feature activation+1.97e+00

Pos logit contributions
 disagreed+0.035
 challenged+0.034
 One+0.034
rolling+0.034
 wanted+0.032
Neg logit contributions
quet-0.061
ful-0.057
bly-0.052
igsaw-0.051
aid-0.051
Token torn
Token ID12445
Feature activation-2.59e+00

Pos logit contributions
 disagreed+0.061
rolling+0.061
 One+0.060
 challenged+0.060
 begged+0.059
Neg logit contributions
quet-0.071
ful-0.063
ment-0.059
aid-0.056
cake-0.056
Token
Token ID220
Feature activation+6.29e+00

Pos logit contributions
quet+0.100
aid+0.088
 bless+0.084
Happy+0.080
 relief+0.080
Neg logit contributions
 One-0.069
 disagreed-0.066
 exception-0.065
rolling-0.065
inters-0.064
Token
Token ID198
Feature activation-3.43e-01

Pos logit contributions
rolling+0.208
 challenged+0.198
 Whoever+0.194
 disagreed+0.190
 One+0.188
Neg logit contributions
quet-0.209
ful-0.206
bly-0.175
 fairy-0.175
ment-0.175
Token
Token ID198
Feature activation-6.88e-01

Pos logit contributions
quet+0.013
ful+0.013
bly+0.011
ment+0.011
 fairy+0.011
Neg logit contributions
rolling-0.010
 challenged-0.009
 Whoever-0.009
 disagreed-0.009
 favorites-0.009
TokenS
Token ID50
Feature activation-1.92e+00

Pos logit contributions
quet+0.018
ful+0.018
ment+0.015
kin+0.015
bly+0.015
Neg logit contributions
rolling-0.027
 challenged-0.027
 Whoever-0.026
 One-0.026
 disagreed-0.025
Tokenue
Token ID518
Feature activation-1.15e+00

Pos logit contributions
ful+0.064
quet+0.063
bly+0.057
kin+0.054
 fairy+0.053
Neg logit contributions
rolling-0.063
 disagreed-0.061
 challenged-0.060
 Whoever-0.058
 One-0.058
Token decided
Token ID3066
Feature activation+2.88e-01

Pos logit contributions
quet+0.050
ful+0.047
bly+0.044
kin+0.043
ment+0.043
Neg logit contributions
rolling-0.027
 challenged-0.027
 disagreed-0.027
 begged-0.025
 Whoever-0.024
Token the
Token ID262
Feature activation-5.03e-01

Pos logit contributions
quet+0.010
ful+0.009
aid+0.009
bly+0.009
paste+0.008
Neg logit contributions
 One-0.010
 padd-0.009
 challenged-0.009
 favorites-0.009
rolling-0.009
Token grass
Token ID8701
Feature activation+7.65e-01

Pos logit contributions
quet+0.021
ful+0.019
ment+0.019
aid+0.019
kin+0.019
Neg logit contributions
rolling-0.013
 banks-0.012
 One-0.012
 padd-0.012
 challenged-0.011
Token and
Token ID290
Feature activation+2.72e-01

Pos logit contributions
rolling+0.029
 challenged+0.028
 disagreed+0.027
 One+0.027
 Whoever+0.027
Neg logit contributions
quet-0.022
ful-0.021
aid-0.018
 fairy-0.018
bly-0.018
Token looked
Token ID3114
Feature activation+1.28e+00

Pos logit contributions
rolling+0.008
 challenged+0.008
 Whoever+0.008
 disagreed+0.008
 favorites+0.008
Neg logit contributions
ful-0.010
quet-0.010
ment-0.008
 fairy-0.008
bly-0.008
Token up
Token ID510
Feature activation+4.08e+00

Pos logit contributions
rolling+0.042
 challenged+0.041
 Whoever+0.041
 disagreed+0.039
Whoever+0.039
Neg logit contributions
ful-0.043
quet-0.040
 fairy-0.036
bly-0.036
ment-0.036
Token
Token ID220
Feature activation+6.27e+00

Pos logit contributions
rolling+0.116
 challenged+0.112
 Whoever+0.108
 disagreed+0.107
 One+0.104
Neg logit contributions
quet-0.155
ful-0.153
bly-0.133
ment-0.132
kin-0.132
Token at
Token ID379
Feature activation+1.60e+00

Pos logit contributions
 challenged+0.195
 disagreed+0.188
rolling+0.188
 Whoever+0.183
 argued+0.182
Neg logit contributions
ful-0.224
quet-0.216
ment-0.190
kin-0.187
bly-0.186
Token the
Token ID262
Feature activation-2.25e-01

Pos logit contributions
rolling+0.074
 challenged+0.072
 Whoever+0.071
 disagreed+0.070
 favorites+0.068
Neg logit contributions
ful-0.033
quet-0.032
ment-0.025
 fairy-0.024
bly-0.024
Token big
Token ID1263
Feature activation-2.44e+00

Pos logit contributions
quet+0.009
ful+0.009
aid+0.008
ment+0.008
kin+0.008
Neg logit contributions
rolling-0.006
 One-0.006
 favorites-0.006
 challenged-0.006
 padd-0.005
Token,
Token ID11
Feature activation+1.16e-01

Pos logit contributions
quet+0.099
ful+0.097
aid+0.092
bly+0.092
kin+0.086
Neg logit contributions
 One-0.064
 disagreed-0.061
rolling-0.061
 challenged-0.061
 asks-0.060
Token blue
Token ID4171
Feature activation+2.49e+00

Pos logit contributions
rolling+0.003
 challenged+0.003
 Whoever+0.003
 disagreed+0.003
 One+0.003
Neg logit contributions
quet-0.004
ful-0.004
ment-0.004
bly-0.004
kin-0.004
Token Ben
Token ID3932
Feature activation+5.40e-01

Pos logit contributions
rolling+0.003
 challenged+0.003
 Whoever+0.002
 disagreed+0.002
 favorites+0.002
Neg logit contributions
ful-0.002
quet-0.002
 fairy-0.002
ment-0.002
bly-0.002
Token start
Token ID923
Feature activation-3.52e-01

Pos logit contributions
rolling+0.016
 challenged+0.015
 disagreed+0.015
 Whoever+0.015
 One+0.014
Neg logit contributions
quet-0.021
ful-0.020
ment-0.018
bly-0.017
kin-0.017
Token to
Token ID284
Feature activation+3.51e-01

Pos logit contributions
quet+0.012
ful+0.012
bly+0.010
ment+0.010
kin+0.010
Neg logit contributions
rolling-0.011
 challenged-0.011
 Whoever-0.011
 disagreed-0.011
 favorites-0.010
Token argue
Token ID7267
Feature activation+2.23e+00

Pos logit contributions
rolling+0.009
 challenged+0.009
 Whoever+0.009
 favorites+0.009
 disagreed+0.008
Neg logit contributions
ful-0.014
quet-0.014
 fairy-0.012
bly-0.012
kin-0.012
Token about
Token ID546
Feature activation+1.17e-01

Pos logit contributions
 One+0.077
 Whoever+0.076
outed+0.074
 challenged+0.074
 disagreed+0.074
Neg logit contributions
quet-0.072
ful-0.063
 fairy-0.060
 queens-0.058
aid-0.057
Token whose
Token ID3025
Feature activation+6.20e+00

Pos logit contributions
 challenged+0.004
 One+0.004
 favorites+0.004
 Whoever+0.004
rolling+0.004
Neg logit contributions
quet-0.004
ful-0.003
bly-0.003
cake-0.003
kin-0.003
Token doll
Token ID3654
Feature activation+1.91e+00

Pos logit contributions
rolling+0.199
 challenged+0.194
 favorites+0.193
 One+0.192
 Whoever+0.190
Neg logit contributions
quet-0.211
ful-0.207
bly-0.185
aid-0.173
ment-0.173
Token is
Token ID318
Feature activation+6.43e-01

Pos logit contributions
 disagreed+0.046
 challenged+0.045
rolling+0.042
 argued+0.041
 Whoever+0.041
Neg logit contributions
quet-0.084
ful-0.081
bly-0.075
ment-0.074
igsaw-0.070
Token better
Token ID1365
Feature activation-8.78e-01

Pos logit contributions
 challenged+0.019
 disagreed+0.019
 begged+0.017
rolling+0.017
 One+0.017
Neg logit contributions
quet-0.025
ful-0.022
ment-0.021
igsaw-0.021
bly-0.020
Token.
Token ID13
Feature activation-5.68e-01

Pos logit contributions
quet+0.033
bly+0.027
igsaw+0.027
cake+0.026
 ray+0.026
Neg logit contributions
 disagreed-0.028
 favorites-0.026
 challenged-0.026
 age-0.026
 wanted-0.026
Token They
Token ID1119
Feature activation+5.98e-01

Pos logit contributions
ful+0.017
ment+0.015
quet+0.014
 from+0.014
 fairy+0.014
Neg logit contributions
rolling-0.020
 challenged-0.020
 disagreed-0.019
 exception-0.018
 argued-0.018
Token the
Token ID262
Feature activation+1.07e+00

Pos logit contributions
quet+0.069
bly+0.058
ful+0.057
igsaw+0.054
paste+0.054
Neg logit contributions
 asks-0.052
 favorites-0.051
 One-0.051
 challenged-0.051
 disagreed-0.049
Token novel
Token ID5337
Feature activation-1.79e-01

Pos logit contributions
 challenged+0.036
rolling+0.036
 favorites+0.036
 disagreed+0.035
 One+0.035
Neg logit contributions
quet-0.034
ful-0.034
bly-0.030
ment-0.029
kin-0.028
Token and
Token ID290
Feature activation+1.02e+00

Pos logit contributions
quet+0.006
bly+0.006
ful+0.006
aid+0.006
sm+0.006
Neg logit contributions
 One-0.005
 Whoever-0.005
 disagreed-0.005
 favorite-0.005
 unsure-0.004
Token the
Token ID262
Feature activation+1.25e+00

Pos logit contributions
 asks+0.032
 challenged+0.032
 begged+0.031
 disagreed+0.030
 One+0.030
Neg logit contributions
quet-0.038
ful-0.034
bly-0.033
kin-0.031
sm-0.030
Token people
Token ID661
Feature activation-7.71e-01

Pos logit contributions
 favorites+0.039
 challenged+0.039
 disagreed+0.038
 One+0.037
rolling+0.037
Neg logit contributions
quet-0.044
ful-0.044
bly-0.039
ment-0.037
aid-0.035
Token who
Token ID508
Feature activation+6.16e+00

Pos logit contributions
quet+0.027
ful+0.024
ment+0.022
igsaw+0.022
bly+0.022
Neg logit contributions
 challenged-0.024
 Whoever-0.024
 disagreed-0.024
 asks-0.023
rolling-0.023
Token made
Token ID925
Feature activation+1.19e+00

Pos logit contributions
 challenged+0.179
rolling+0.177
 disagreed+0.174
 Whoever+0.170
 favorites+0.167
Neg logit contributions
quet-0.231
ful-0.222
bly-0.196
ment-0.193
kin-0.193
Token it
Token ID340
Feature activation+1.07e+00

Pos logit contributions
 challenged+0.034
 Whoever+0.034
 One+0.033
rolling+0.033
 favorites+0.033
Neg logit contributions
quet-0.045
ful-0.041
kin-0.038
bly-0.037
ment-0.037
Token.
Token ID13
Feature activation-2.37e-01

Pos logit contributions
 One+0.034
rolling+0.033
 challenged+0.033
 exception+0.032
 disagreed+0.032
Neg logit contributions
quet-0.036
ful-0.033
bly-0.030
kin-0.030
igsaw-0.029
Token It
Token ID632
Feature activation+9.32e-01

Pos logit contributions
quet+0.009
ful+0.009
kin+0.008
bly+0.008
ment+0.008
Neg logit contributions
rolling-0.006
 Whoever-0.006
 challenged-0.006
 One-0.006
 favorites-0.006
Token is
Token ID318
Feature activation+3.75e-01

Pos logit contributions
rolling+0.030
 challenged+0.029
 Whoever+0.028
 disagreed+0.028
 favorites+0.027
Neg logit contributions
quet-0.031
ful-0.031
bly-0.026
ment-0.026
 fairy-0.026
Token their
Token ID511
Feature activation+2.18e-01

Pos logit contributions
quet+0.041
bly+0.033
ross+0.033
igsaw+0.033
 Fairy+0.033
Neg logit contributions
 One-0.024
 favorites-0.023
 disagreed-0.023
 favorite-0.022
 wanted-0.022
Token toys
Token ID14958
Feature activation-9.96e-01

Pos logit contributions
rolling+0.007
 challenged+0.007
 favorites+0.006
 Whoever+0.006
 disagreed+0.006
Neg logit contributions
ful-0.008
quet-0.008
bly-0.007
ment-0.007
 fairy-0.006
Token with
Token ID351
Feature activation+2.16e+00

Pos logit contributions
quet+0.044
 Fairy+0.038
 ray+0.038
bly+0.037
 ow+0.037
Neg logit contributions
 disagreed-0.022
 favorite-0.021
 bank-0.021
 banks-0.020
 favorites-0.020
Token other
Token ID584
Feature activation+1.46e+00

Pos logit contributions
 favorites+0.060
 One+0.058
 disagreed+0.057
 challenged+0.057
 argued+0.055
Neg logit contributions
quet-0.091
ful-0.073
cake-0.072
 ray-0.071
bly-0.070
Token kids
Token ID3988
Feature activation+6.13e-01

Pos logit contributions
rolling+0.042
 challenged+0.042
 favorites+0.041
 disagreed+0.040
 Whoever+0.039
Neg logit contributions
quet-0.056
ful-0.054
bly-0.048
kin-0.047
ment-0.047
Token who
Token ID508
Feature activation+6.16e+00

Pos logit contributions
 disagreed+0.019
 challenged+0.018
 favorites+0.018
 begged+0.018
 argued+0.018
Neg logit contributions
quet-0.022
ful-0.019
 ray-0.019
bly-0.018
ment-0.018
Token do
Token ID466
Feature activation-1.93e-01

Pos logit contributions
rolling+0.184
 challenged+0.181
 Whoever+0.174
 disagreed+0.173
 favorites+0.168
Neg logit contributions
quet-0.225
ful-0.220
ment-0.190
bly-0.190
kin-0.188
Token not
Token ID407
Feature activation+3.50e-01

Pos logit contributions
quet+0.008
ful+0.007
 ray+0.006
ment+0.006
igsaw+0.006
Neg logit contributions
 challenged-0.005
 disagreed-0.005
 One-0.005
 Whoever-0.005
 favorites-0.005
Token have
Token ID423
Feature activation-1.03e+00

Pos logit contributions
 challenged+0.010
 disagreed+0.010
 begged+0.010
 argued+0.009
 Whoever+0.009
Neg logit contributions
quet-0.013
ful-0.012
ment-0.012
web-0.012
bly-0.012
Token any
Token ID597
Feature activation+1.54e+00

Pos logit contributions
quet+0.033
ful+0.032
bly+0.027
kin+0.027
ment+0.027
Neg logit contributions
rolling-0.035
 challenged-0.035
 disagreed-0.034
 Whoever-0.034
 favorites-0.033
Token.
Token ID13
Feature activation-1.39e-01

Pos logit contributions
 challenged+0.044
 favorites+0.043
 One+0.043
 asks+0.042
 disagreed+0.042
Neg logit contributions
quet-0.059
ful-0.058
bly-0.051
kin-0.050
ment-0.049
Token and
Token ID290
Feature activation+7.46e-01

Pos logit contributions
 challenged+0.012
 disagreed+0.011
rolling+0.011
 Whoever+0.011
 asks+0.011
Neg logit contributions
quet-0.013
ful-0.011
bly-0.010
ment-0.010
aid-0.010
Token Ben
Token ID3932
Feature activation+9.47e-01

Pos logit contributions
rolling+0.024
 challenged+0.023
 Whoever+0.023
 disagreed+0.022
 One+0.022
Neg logit contributions
quet-0.026
ful-0.025
bly-0.022
ment-0.022
kin-0.022
Token are
Token ID389
Feature activation-1.15e+00

Pos logit contributions
rolling+0.031
 challenged+0.031
 disagreed+0.030
 Whoever+0.030
 One+0.029
Neg logit contributions
quet-0.032
ful-0.031
ment-0.027
bly-0.027
kin-0.026
Token just
Token ID655
Feature activation+2.64e+00

Pos logit contributions
quet+0.048
ful+0.042
ment+0.042
igsaw+0.040
cake+0.040
Neg logit contributions
 challenged-0.030
rolling-0.030
 begged-0.028
 padd-0.028
 disagreed-0.028
Token kids
Token ID3988
Feature activation+1.25e+00

Pos logit contributions
rolling+0.079
 challenged+0.075
 Whoever+0.074
 disagreed+0.072
 One+0.071
Neg logit contributions
ful-0.097
quet-0.095
bly-0.083
ment-0.082
 fairy-0.082
Token who
Token ID508
Feature activation+6.14e+00

Pos logit contributions
rolling+0.047
 challenged+0.047
 disagreed+0.046
 Whoever+0.045
 begged+0.044
Neg logit contributions
quet-0.036
ful-0.033
ment-0.030
bly-0.029
kin-0.029
Token were
Token ID547
Feature activation+4.96e-01

Pos logit contributions
 challenged+0.172
rolling+0.169
 disagreed+0.165
 Whoever+0.163
 favorites+0.157
Neg logit contributions
quet-0.239
ful-0.229
bly-0.202
ment-0.201
kin-0.199
Token curious
Token ID11040
Feature activation+4.79e+00

Pos logit contributions
rolling+0.015
 challenged+0.015
 Whoever+0.014
 disagreed+0.014
 favorites+0.014
Neg logit contributions
quet-0.018
ful-0.017
ment-0.015
kin-0.015
bly-0.015
Token.
Token ID13
Feature activation-5.20e-01

Pos logit contributions
 One+0.158
 disagreed+0.156
 challenged+0.155
 asks+0.151
 begged+0.150
Neg logit contributions
quet-0.168
bly-0.145
ful-0.144
kin-0.138
aid-0.132
Token He
Token ID679
Feature activation+1.04e+00

Pos logit contributions
ful+0.020
quet+0.019
ment+0.017
 fairy+0.017
bly+0.017
Neg logit contributions
rolling-0.015
 challenged-0.014
 disagreed-0.014
 Whoever-0.013
 favorites-0.013
Token tells
Token ID4952
Feature activation-3.38e-01

Pos logit contributions
rolling+0.032
inters+0.028
 favorites+0.027
 disagreed+0.027
 exception+0.027
Neg logit contributions
ful-0.034
 fairy-0.034
 fore-0.033
quet-0.033
 from-0.032
Token They
Token ID1119
Feature activation+3.74e-01

Pos logit contributions
ful+0.003
 from+0.003
cake+0.003
quet+0.003
 Amy+0.002
Neg logit contributions
rolling-0.007
 exception-0.007
 padd-0.007
 inc-0.006
 challenged-0.006
Token started
Token ID2067
Feature activation+1.97e+00

Pos logit contributions
rolling+0.014
 favorites+0.012
 Int+0.012
 exception+0.012
inters+0.012
Neg logit contributions
ful-0.012
 fairy-0.010
 from-0.010
quet-0.010
kin-0.010
Token to
Token ID284
Feature activation+9.22e-02

Pos logit contributions
rolling+0.079
 favorites+0.078
outed+0.077
 Whoever+0.075
Whoever+0.075
Neg logit contributions
ful-0.051
 from-0.039
quet-0.038
 accidents-0.036
 fairy-0.034
Token argue
Token ID7267
Feature activation+1.83e+00

Pos logit contributions
rolling+0.002
outed+0.002
 favorites+0.002
 challenged+0.002
 Whoever+0.002
Neg logit contributions
ful-0.003
quet-0.003
 fairy-0.003
 ray-0.003
 from-0.003
Token about
Token ID546
Feature activation+8.27e-03

Pos logit contributions
rolling+0.066
 challenged+0.066
 Whoever+0.064
 disagreed+0.063
 One+0.063
Neg logit contributions
quet-0.055
ful-0.053
 fairy-0.045
ment-0.044
bly-0.044
Token whose
Token ID3025
Feature activation+6.13e+00

Pos logit contributions
rolling+0.000
 challenged+0.000
 disagreed+0.000
 Whoever+0.000
 favorites+0.000
Neg logit contributions
ful-0.000
quet-0.000
 fairy-0.000
ment-0.000
bly-0.000
Token turn
Token ID1210
Feature activation-2.81e-01

Pos logit contributions
rolling+0.239
outed+0.224
 disagreed+0.223
 challenged+0.222
 Int+0.221
Neg logit contributions
ful-0.170
 fairy-0.152
 gift-0.151
 from-0.149
quet-0.142
Token it
Token ID340
Feature activation+8.49e-02

Pos logit contributions
quet+0.010
ful+0.009
bly+0.008
ment+0.008
 Fairy+0.008
Neg logit contributions
 disagreed-0.009
 challenged-0.009
 Whoever-0.009
rolling-0.009
 argued-0.009
Token was
Token ID373
Feature activation+1.21e+00

Pos logit contributions
rolling+0.003
 challenged+0.003
 disagreed+0.003
 Whoever+0.003
 One+0.003
Neg logit contributions
quet-0.003
ful-0.002
ment-0.002
bly-0.002
kin-0.002
Token.
Token ID13
Feature activation-2.36e-01

Pos logit contributions
 challenged+0.046
rolling+0.045
 disagreed+0.044
 Whoever+0.043
 One+0.043
Neg logit contributions
quet-0.036
ful-0.034
ment-0.030
bly-0.028
 fairy-0.028
Token
Token ID198
Feature activation-6.63e-01

Pos logit contributions
ful+0.005
quet+0.004
 from+0.004
cake+0.004
smart+0.003
Neg logit contributions
 exception-0.010
rolling-0.010
 padd-0.010
 challenged-0.009
 inc-0.009
Token silly
Token ID14397
Feature activation+1.15e+00

Pos logit contributions
quet+0.012
ful+0.012
ment+0.010
bly+0.010
 fairy+0.010
Neg logit contributions
rolling-0.008
 challenged-0.008
 disagreed-0.007
 Whoever-0.007
 begged-0.007
Token and
Token ID290
Feature activation+1.88e-01

Pos logit contributions
 disagreed+0.040
 challenged+0.039
 argued+0.038
 favorites+0.038
 begged+0.037
Neg logit contributions
quet-0.037
ful-0.033
bly-0.032
kin-0.031
aid-0.031
Token not
Token ID407
Feature activation+2.56e-01

Pos logit contributions
 disagreed+0.005
 challenged+0.005
 begged+0.005
 argued+0.005
rolling+0.005
Neg logit contributions
quet-0.008
ful-0.007
ment-0.007
bly-0.006
aid-0.006
Token like
Token ID588
Feature activation+1.34e+00

Pos logit contributions
 disagreed+0.008
 challenged+0.008
rolling+0.008
 argued+0.008
 begged+0.008
Neg logit contributions
quet-0.009
ful-0.008
ment-0.008
bly-0.008
kin-0.007
Token real
Token ID1103
Feature activation+4.95e-01

Pos logit contributions
 disagreed+0.041
 challenged+0.040
 Whoever+0.040
 One+0.040
 argued+0.039
Neg logit contributions
quet-0.048
ful-0.043
bly-0.042
kin-0.041
aid-0.040
Token hipp
Token ID18568
Feature activation+6.12e+00

Pos logit contributions
 challenged+0.014
 favorites+0.014
rolling+0.014
 disagreed+0.014
 One+0.014
Neg logit contributions
ful-0.018
quet-0.018
bly-0.017
kin-0.017
paste-0.015
Tokenos
Token ID418
Feature activation+2.58e+00

Pos logit contributions
rolling+0.128
 disagreed+0.127
 challenged+0.126
 favorites+0.119
 Whoever+0.117
Neg logit contributions
quet-0.270
ful-0.261
kin-0.246
bly-0.245
ment-0.243
Token.
Token ID13
Feature activation-1.14e-01

Pos logit contributions
 disagreed+0.086
 One+0.084
 challenged+0.083
 Whoever+0.079
 argued+0.079
Neg logit contributions
quet-0.084
ful-0.077
ment-0.075
kin-0.073
 Tooth-0.072
Token
Token ID198
Feature activation-5.66e-01

Pos logit contributions
ful+0.004
quet+0.004
ment+0.003
 fairy+0.003
bly+0.003
Neg logit contributions
rolling-0.004
 challenged-0.003
 disagreed-0.003
 padd-0.003
 exception-0.003
Token
Token ID198
Feature activation-8.45e-01

Pos logit contributions
ful+0.016
quet+0.014
sm+0.013
aid+0.012
 fairy+0.012
Neg logit contributions
rolling-0.022
 disagreed-0.022
 argued-0.022
 challenged-0.022
 favorites-0.021
TokenOne
Token ID3198
Feature activation+2.30e+00

Pos logit contributions
ful+0.024
quet+0.024
bly+0.020
 fairy+0.019
kin+0.019
Neg logit contributions
rolling-0.032
 challenged-0.031
 disagreed-0.030
 favorites-0.030
 Whoever-0.030
Token Ben
Token ID3932
Feature activation-6.78e-02

Pos logit contributions
quet+0.010
ful+0.010
bly+0.008
ment+0.008
 fairy+0.008
Neg logit contributions
rolling-0.009
 challenged-0.009
 disagreed-0.009
 Whoever-0.009
 favorites-0.008
Token started
Token ID2067
Feature activation+1.91e+00

Pos logit contributions
ful+0.002
quet+0.002
 fairy+0.002
bly+0.002
kin+0.002
Neg logit contributions
rolling-0.002
 exception-0.002
 challenged-0.002
 favorites-0.002
inters-0.002
Token to
Token ID284
Feature activation+1.81e-01

Pos logit contributions
rolling+0.070
 Whoever+0.068
 challenged+0.068
 disagreed+0.066
 favorites+0.066
Neg logit contributions
ful-0.056
quet-0.052
ment-0.044
 from-0.044
 fairy-0.043
Token argue
Token ID7267
Feature activation+1.84e+00

Pos logit contributions
rolling+0.005
 challenged+0.005
 Whoever+0.005
outed+0.005
 favorites+0.005
Neg logit contributions
ful-0.006
quet-0.006
 fairy-0.006
 ray-0.005
bly-0.005
Token about
Token ID546
Feature activation-1.43e-01

Pos logit contributions
 One+0.065
 Whoever+0.065
 challenged+0.065
rolling+0.064
 disagreed+0.064
Neg logit contributions
quet-0.057
ful-0.051
 fairy-0.047
aid-0.044
bly-0.044
Token whose
Token ID3025
Feature activation+6.12e+00

Pos logit contributions
quet+0.004
ful+0.004
kin+0.003
bly+0.003
 fairy+0.003
Neg logit contributions
rolling-0.005
 challenged-0.005
 Whoever-0.005
 disagreed-0.005
 One-0.005
Token doll
Token ID3654
Feature activation+1.73e+00

Pos logit contributions
rolling+0.216
 challenged+0.209
 Whoever+0.203
 disagreed+0.201
 favorites+0.194
Neg logit contributions
quet-0.191
ful-0.189
 fairy-0.159
ment-0.158
bly-0.157
Token was
Token ID373
Feature activation+1.11e+00

Pos logit contributions
 challenged+0.040
 disagreed+0.040
rolling+0.040
 Whoever+0.038
 favorites+0.037
Neg logit contributions
quet-0.075
ful-0.073
ment-0.066
bly-0.065
kin-0.063
Token better
Token ID1365
Feature activation-1.25e+00

Pos logit contributions
 challenged+0.033
 disagreed+0.033
 begged+0.030
rolling+0.030
 argued+0.030
Neg logit contributions
quet-0.043
ful-0.038
ment-0.037
igsaw-0.035
aid-0.034
Token.
Token ID13
Feature activation-7.01e-01

Pos logit contributions
quet+0.043
igsaw+0.035
bly+0.034
 ray+0.034
ful+0.034
Neg logit contributions
 disagreed-0.043
 challenged-0.042
 favorites-0.041
 begged-0.041
 age-0.040
Token They
Token ID1119
Feature activation+4.94e-02

Pos logit contributions
ful+0.017
 from+0.014
 fairy+0.013
 Lily+0.013
 lav+0.012
Neg logit contributions
rolling-0.028
 exception-0.026
 disagreed-0.026
 challenged-0.025
 argued-0.025
Token was
Token ID373
Feature activation+4.26e-01

Pos logit contributions
rolling+0.144
 challenged+0.135
 Whoever+0.134
inters+0.129
 favorites+0.128
Neg logit contributions
ful-0.132
quet-0.125
kin-0.112
 fairy-0.111
ment-0.107
Token an
Token ID281
Feature activation+3.97e-01

Pos logit contributions
rolling+0.018
 challenged+0.017
 Whoever+0.017
 disagreed+0.017
 One+0.016
Neg logit contributions
quet-0.011
ful-0.010
ment-0.008
bly-0.008
 fairy-0.008
Token adorable
Token ID29012
Feature activation+8.00e-01

Pos logit contributions
rolling+0.012
 challenged+0.012
 Whoever+0.012
 disagreed+0.012
 One+0.012
Neg logit contributions
quet-0.014
ful-0.013
ment-0.012
bly-0.012
kin-0.011
Token little
Token ID1310
Feature activation+5.33e-01

Pos logit contributions
rolling+0.021
 challenged+0.021
 One+0.020
 Whoever+0.020
 padd+0.019
Neg logit contributions
quet-0.031
ful-0.030
bly-0.028
ment-0.028
aid-0.028
Token girl
Token ID2576
Feature activation+5.43e-01

Pos logit contributions
 One+0.012
 challenged+0.012
 padd+0.012
rolling+0.012
 disagreed+0.011
Neg logit contributions
quet-0.022
ful-0.021
ment-0.020
bly-0.020
 fore-0.019
Token whose
Token ID3025
Feature activation+6.10e+00

Pos logit contributions
 One+0.012
 challenged+0.012
rolling+0.011
 asks+0.010
 wanted+0.010
Neg logit contributions
quet-0.025
ment-0.021
paste-0.021
bly-0.021
igsaw-0.021
Token name
Token ID1438
Feature activation+1.81e+00

Pos logit contributions
rolling+0.126
 challenged+0.118
 Whoever+0.112
 disagreed+0.112
 favorites+0.111
Neg logit contributions
quet-0.281
ful-0.278
bly-0.251
ment-0.246
kin-0.246
Token was
Token ID373
Feature activation+1.35e+00

Pos logit contributions
 One+0.031
 challenged+0.031
 disagreed+0.031
 wanted+0.030
 pointed+0.027
Neg logit contributions
quet-0.088
ful-0.086
bly-0.080
igsaw-0.080
 queens-0.080
Token Alice
Token ID14862
Feature activation+2.04e+00

Pos logit contributions
 One+0.048
rolling+0.048
 challenged+0.047
 disagreed+0.046
 Whoever+0.045
Neg logit contributions
quet-0.043
ful-0.038
ment-0.035
aid-0.035
bly-0.034
Token.
Token ID13
Feature activation-3.21e-01

Pos logit contributions
 One+0.073
 challenged+0.070
 disagreed+0.070
rolling+0.069
 favorites+0.068
Neg logit contributions
quet-0.064
ful-0.056
bly-0.054
ment-0.052
igsaw-0.051
Token Alice
Token ID14862
Feature activation+1.67e-01

Pos logit contributions
quet+0.009
ful+0.009
ment+0.007
cake+0.007
 fairy+0.007
Neg logit contributions
rolling-0.012
 challenged-0.012
 padd-0.011
 Whoever-0.011
 disagreed-0.011
Token give
Token ID1577
Feature activation+1.53e-01

Pos logit contributions
rolling+0.022
 challenged+0.021
 Whoever+0.020
 favorites+0.020
 disagreed+0.020
Neg logit contributions
ful-0.019
quet-0.019
 fairy-0.017
kin-0.016
aid-0.016
Token jam
Token ID16853
Feature activation-2.43e+00

Pos logit contributions
rolling+0.005
 challenged+0.005
 Whoever+0.005
 disagreed+0.005
 favorites+0.005
Neg logit contributions
quet-0.005
ful-0.005
bly-0.004
ment-0.004
kin-0.004
Token to
Token ID284
Feature activation+6.97e-01

Pos logit contributions
quet+0.099
 relief+0.094
igsaw+0.088
web+0.084
 satisfaction+0.083
Neg logit contributions
 favorite-0.058
 upon-0.054
 One-0.053
 favorites-0.051
 disagreed-0.048
Token the
Token ID262
Feature activation+8.03e-01

Pos logit contributions
 challenged+0.023
rolling+0.022
 Whoever+0.022
 One+0.022
 disagreed+0.022
Neg logit contributions
quet-0.024
ful-0.022
ment-0.020
bly-0.020
kin-0.020
Token kids
Token ID3988
Feature activation-1.70e-01

Pos logit contributions
rolling+0.024
 challenged+0.024
 Whoever+0.023
 disagreed+0.023
 favorites+0.022
Neg logit contributions
quet-0.029
ful-0.029
bly-0.025
ment-0.025
kin-0.024
Token who
Token ID508
Feature activation+6.10e+00

Pos logit contributions
quet+0.005
ful+0.004
bly+0.004
web+0.004
ment+0.004
Neg logit contributions
 challenged-0.006
rolling-0.006
 disagreed-0.006
 Whoever-0.006
 One-0.006
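Every record in this dump has the same layout: a `Token` line, a `Token ID` line, a `Feature activation` line, a blank line, then two labelled lists of `<token><signed value>` logit contributions. The IDs suggest that each label is glued directly to its value. For example, 508 is shown as `Token who` and 1026 as `TokenIt`, so a space after `Token` (or at the start of a logit line) is part of the token text. The sketch below loads records under exactly that assumption. `TokenRecord`, `parse_records`, and the 3.0 threshold are hypothetical illustrations, not part of any dashboard tool's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TokenRecord:
    token: str        # token text with any leading space preserved, e.g. " who"
    token_id: int
    activation: float
    pos_logits: list = field(default_factory=list)  # [(token, value), ...]
    neg_logits: list = field(default_factory=list)


# Assumed logit-line shape: "<token><signed decimal>", e.g. " One+0.031" or "quet-0.088".
LOGIT_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


def parse_records(text: str) -> list:
    """Parse dump text into TokenRecord objects, skipping a partial record at the start."""
    records = []
    current = None
    section = None
    for line in text.splitlines():
        if line.startswith("Token ID"):
            if current is not None:
                current.token_id = int(line[len("Token ID"):])
        elif line.startswith("Token"):
            # Label and token are concatenated: "Token who" -> " who", "TokenIt" -> "It".
            current = TokenRecord(token=line[len("Token"):], token_id=-1, activation=0.0)
            records.append(current)
            section = None
        elif current is None:
            continue  # still inside a record that began before this excerpt
        elif line.startswith("Feature activation"):
            current.activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            section = current.pos_logits
        elif line == "Neg logit contributions":
            section = current.neg_logits
        elif section is not None:
            match = LOGIT_RE.match(line)
            if match:
                section.append((match.group(1), float(match.group(2))))
    return records


# Two records copied (and truncated to one logit each) from the dump.
sample = """Token who
Token ID508
Feature activation+6.10e+00

Pos logit contributions
quet+0.005
Neg logit contributions
 challenged-0.006
TokenIt
Token ID1026
Feature activation-6.35e-01

Pos logit contributions
rolling+0.042
Neg logit contributions
quet-0.041
"""

records = parse_records(sample)
# Illustrative cut-off for "strongly active" positions.
strong = [r.token for r in records if r.activation > 3.0]
```

Keeping the leading space in `token` matters here: `" who"` and `"who"` are different vocabulary items, and the dump distinguishes them only by that space.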
Token say
Token ID910
Feature activation+1.17e+00

Pos logit contributions
 challenged+0.167
 disagreed+0.162
 asks+0.161
 Whoever+0.159
 favorites+0.158
Neg logit contributions
quet-0.240
ful-0.234
bly-0.210
ment-0.205
web-0.197
Token yes
Token ID3763
Feature activation+2.93e+00

Pos logit contributions
 One+0.037
 disagreed+0.034
 challenged+0.034
 asks+0.033
 begged+0.033
Neg logit contributions
quet-0.043
web-0.037
ful-0.037
igsaw-0.037
bly-0.036
Token.
Token ID13
Feature activation-4.31e-01

Pos logit contributions
 challenged+0.112
 disagreed+0.111
 One+0.110
rolling+0.109
 Whoever+0.108
Neg logit contributions
quet-0.083
ful-0.078
bly-0.072
ment-0.067
kin-0.067
Token They
Token ID1119
Feature activation+3.49e-01

Pos logit contributions
ful+0.016
quet+0.016
 fairy+0.014
ment+0.014
bly+0.014
Neg logit contributions
rolling-0.012
 challenged-0.012
 disagreed-0.011
 exception-0.011
 Whoever-0.011
Token open
Token ID1280
Feature activation-1.59e+00

Pos logit contributions
rolling+0.010
 challenged+0.010
 Whoever+0.010
 disagreed+0.010
 One+0.009
Neg logit contributions
quet-0.013
ful-0.013
ment-0.011
bly-0.011
kin-0.011
Token "
Token ID366
Feature activation+1.25e+00

Pos logit contributions
quet+0.020
ful+0.020
ment+0.018
kin+0.018
bly+0.018
Neg logit contributions
rolling-0.009
 challenged-0.009
 Whoever-0.009
 disagreed-0.008
 One-0.008
TokenIt
Token ID1026
Feature activation-6.35e-01

Pos logit contributions
rolling+0.042
 challenged+0.041
 Whoever+0.039
 disagreed+0.039
 favorites+0.038
Neg logit contributions
quet-0.041
ful-0.041
bly-0.034
ment-0.034
 fairy-0.034
Token doesn
Token ID1595
Feature activation+7.74e-01

Pos logit contributions
ful+0.026
quet+0.025
 fairy+0.023
bly+0.023
 Fairy+0.022
Neg logit contributions
rolling-0.016
 favorites-0.014
 challenged-0.014
 Whoever-0.014
 disagreed-0.014
Token't
Token ID470
Feature activation+6.32e-01

Pos logit contributions
rolling+0.022
 Whoever+0.021
 challenged+0.021
 One+0.020
 disagreed+0.019
Neg logit contributions
quet-0.030
ful-0.029
ment-0.027
aid-0.026
kin-0.026
Token matter
Token ID2300
Feature activation-3.76e-01

Pos logit contributions
 challenged+0.019
rolling+0.019
 disagreed+0.018
 Whoever+0.018
 One+0.018
Neg logit contributions
quet-0.023
ful-0.022
ment-0.020
bly-0.020
kin-0.019
Token whose
Token ID3025
Feature activation+6.10e+00

Pos logit contributions
quet+0.011
ful+0.010
kin+0.009
bly+0.009
aid+0.009
Neg logit contributions
 challenged-0.014
rolling-0.014
 Whoever-0.014
 One-0.014
 disagreed-0.014
Token barrel
Token ID9036
Feature activation+2.20e+00

Pos logit contributions
rolling+0.220
 challenged+0.210
 Whoever+0.206
 disagreed+0.201
outed+0.198
Neg logit contributions
ful-0.187
quet-0.182
 fairy-0.160
ment-0.155
kin-0.150
Token is
Token ID318
Feature activation+4.28e-01

Pos logit contributions
 disagreed+0.065
 challenged+0.065
rolling+0.062
 Whoever+0.062
 asks+0.060
Neg logit contributions
quet-0.083
ful-0.079
bly-0.071
ment-0.071
aid-0.071
Token bigger
Token ID5749
Feature activation+1.95e+00

Pos logit contributions
 challenged+0.015
rolling+0.015
 disagreed+0.014
 Whoever+0.014
 One+0.014
Neg logit contributions
quet-0.014
ful-0.013
ment-0.012
aid-0.011
bly-0.011
Token.
Token ID13
Feature activation-1.96e-02

Pos logit contributions
 challenged+0.064
 disagreed+0.063
 Whoever+0.062
 favorites+0.061
 asks+0.060
Neg logit contributions
quet-0.068
ful-0.058
aid-0.057
bly-0.056
kin-0.055
Token What
Token ID1867
Feature activation-4.71e-01

Pos logit contributions
ful+0.001
quet+0.001
bly+0.001
ment+0.001
 Fairy+0.001
Neg logit contributions
rolling-0.001
 disagreed-0.001
 challenged-0.001
 argued-0.001
 favorites-0.001
Token and
Token ID290
Feature activation+7.35e-02

Pos logit contributions
 challenged+0.024
rolling+0.024
 disagreed+0.023
 One+0.022
 favorites+0.022
Neg logit contributions
quet-0.034
ful-0.032
bly-0.029
ment-0.028
kin-0.028
Token Sam
Token ID3409
Feature activation+8.99e-01

Pos logit contributions
 challenged+0.003
rolling+0.003
 One+0.003
 disagreed+0.002
 Whoever+0.002
Neg logit contributions
quet-0.002
ful-0.002
bly-0.002
web-0.002
kin-0.002
Token were
Token ID547
Feature activation-3.56e-01

Pos logit contributions
rolling+0.022
 challenged+0.021
 Whoever+0.020
 disagreed+0.020
 favorites+0.019
Neg logit contributions
quet-0.037
ful-0.037
bly-0.033
ment-0.032
kin-0.032
Token squirrel
Token ID33039
Feature activation+1.85e+00

Pos logit contributions
ful+0.011
quet+0.010
bly+0.009
kin+0.009
sm+0.009
Neg logit contributions
 challenged-0.012
 Whoever-0.012
rolling-0.012
Whoever-0.012
 inc-0.012
Tokens
Token ID82
Feature activation+3.93e+00

Pos logit contributions
rolling+0.073
 challenged+0.072
 Whoever+0.070
 disagreed+0.069
 favorites+0.068
Neg logit contributions
ful-0.050
quet-0.049
 fairy-0.039
bly-0.039
ment-0.038
Token who
Token ID508
Feature activation+6.09e+00

Pos logit contributions
rolling+0.159
 challenged+0.154
 Whoever+0.150
 disagreed+0.149
 favorites+0.147
Neg logit contributions
quet-0.103
ful-0.100
bly-0.081
ment-0.081
kin-0.080
Token liked
Token ID8288
Feature activation+2.72e+00

Pos logit contributions
rolling+0.113
 challenged+0.112
 disagreed+0.104
 Whoever+0.101
 One+0.100
Neg logit contributions
quet-0.293
ful-0.289
bly-0.260
ment-0.258
kin-0.256
Token to
Token ID284
Feature activation+1.38e+00

Pos logit contributions
rolling+0.080
 challenged+0.080
 disagreed+0.076
 Whoever+0.075
 One+0.075
Neg logit contributions
quet-0.100
ful-0.097
bly-0.087
kin-0.086
ment-0.084
Token find
Token ID1064
Feature activation-3.34e-02

Pos logit contributions
rolling+0.039
 challenged+0.038
 Whoever+0.037
 disagreed+0.036
 favorites+0.035
Neg logit contributions
quet-0.053
ful-0.052
bly-0.045
ment-0.045
 fairy-0.045
Token nuts
Token ID14380
Feature activation+1.56e+00

Pos logit contributions
quet+0.001
ful+0.001
bly+0.001
ment+0.001
kin+0.001
Neg logit contributions
rolling-0.001
 challenged-0.001
 Whoever-0.001
 disagreed-0.001
 favorites-0.001
Token in
Token ID287
Feature activation+2.59e+00

Pos logit contributions
 challenged+0.051
 One+0.050
 disagreed+0.050
 favorites+0.049
rolling+0.048
Neg logit contributions
quet-0.049
ful-0.043
bly-0.043
igsaw-0.042
 relief-0.041
Token are
Token ID389
Feature activation-8.08e-01

Pos logit contributions
rolling+0.035
 Int+0.032
 challenged+0.031
outed+0.031
 Whoever+0.031
Neg logit contributions
ful-0.031
quet-0.028
 fairy-0.025
 from-0.025
bly-0.024
Token bigger
Token ID5749
Feature activation+2.12e+00

Pos logit contributions
quet+0.028
ful+0.027
ment+0.024
kin+0.023
bly+0.023
Neg logit contributions
rolling-0.026
 challenged-0.026
 disagreed-0.025
 Whoever-0.024
 favorites-0.024
Token than
Token ID621
Feature activation+2.48e-01

Pos logit contributions
rolling+0.081
 challenged+0.079
 Whoever+0.077
 disagreed+0.076
 favorites+0.075
Neg logit contributions
quet-0.059
ful-0.059
bly-0.048
ment-0.048
 fairy-0.047
Token it
Token ID340
Feature activation+1.87e+00

Pos logit contributions
 challenged+0.009
rolling+0.009
 Whoever+0.009
 disagreed+0.009
 One+0.009
Neg logit contributions
quet-0.008
ful-0.007
kin-0.006
ment-0.006
bly-0.006
Token?"
Token ID1701
Feature activation+3.03e+00

Pos logit contributions
rolling+0.061
 challenged+0.060
 disagreed+0.058
 Whoever+0.058
 One+0.056
Neg logit contributions
quet-0.063
ful-0.061
ment-0.053
bly-0.052
kin-0.052
Token
Token ID220
Feature activation+6.09e+00

Pos logit contributions
rolling+0.074
 challenged+0.065
inters+0.062
 exception+0.061
 favorites+0.061
Neg logit contributions
ful-0.125
quet-0.125
ment-0.115
cake-0.112
aid-0.111
Token
Token ID198
Feature activation+6.37e-01

Pos logit contributions
 disagreed+0.230
 argued+0.227
 challenged+0.227
rolling+0.223
 favorites+0.218
Neg logit contributions
ful-0.171
quet-0.150
 from-0.139
ment-0.137
bly-0.127
Token
Token ID198
Feature activation+5.00e-02

Pos logit contributions
rolling+0.023
 disagreed+0.022
 favorites+0.022
 challenged+0.022
inters+0.022
Neg logit contributions
ful-0.018
quet-0.018
aid-0.016
ment-0.015
bly-0.015
TokenSo
Token ID2396
Feature activation+1.17e+00

Pos logit contributions
rolling+0.002
 challenged+0.002
 disagreed+0.002
 favorites+0.002
 Whoever+0.002
Neg logit contributions
quet-0.001
ful-0.001
bly-0.001
aid-0.001
kin-0.001
Token they
Token ID484
Feature activation+5.21e-01

Pos logit contributions
 challenged+0.038
 disagreed+0.038
 Whoever+0.038
 One+0.037
 begged+0.037
Neg logit contributions
quet-0.042
ful-0.039
bly-0.035
web-0.035
aid-0.034
Token all
Token ID477
Feature activation+1.19e+00

Pos logit contributions
rolling+0.015
 favorites+0.014
 challenged+0.014
inters+0.014
 Whoever+0.014
Neg logit contributions
ful-0.020
quet-0.019
bly-0.017
 fairy-0.016
aid-0.016
Token the
Token ID262
Feature activation+7.34e-01

Pos logit contributions
quet+0.074
bly+0.060
igsaw+0.059
aid+0.057
cake+0.057
Neg logit contributions
 asks-0.047
 favorites-0.046
 challenged-0.045
 One-0.045
 disagreed-0.044
Token temple
Token ID12505
Feature activation-2.20e+00

Pos logit contributions
rolling+0.028
 challenged+0.027
 Whoever+0.026
 disagreed+0.025
 One+0.025
Neg logit contributions
quet-0.021
ful-0.021
 fairy-0.018
ment-0.018
sm-0.017
Token and
Token ID290
Feature activation+8.76e-01

Pos logit contributions
quet+0.081
bly+0.074
ful+0.074
aid+0.071
Happy+0.068
Neg logit contributions
 disagreed-0.062
 Whoever-0.060
 One-0.059
 challenged-0.056
 unsure-0.056
Token the
Token ID262
Feature activation+9.08e-01

Pos logit contributions
 challenged+0.028
 disagreed+0.027
 begged+0.027
 argued+0.026
 asks+0.026
Neg logit contributions
quet-0.032
ful-0.027
bly-0.027
kin-0.026
igsaw-0.026
Token people
Token ID661
Feature activation-1.01e+00

Pos logit contributions
rolling+0.030
 challenged+0.030
 disagreed+0.029
 Whoever+0.029
 favorites+0.028
Neg logit contributions
quet-0.030
ful-0.029
bly-0.025
ment-0.024
kin-0.024
Token who
Token ID508
Feature activation+6.08e+00

Pos logit contributions
quet+0.036
ful+0.032
ment+0.029
igsaw+0.029
bly+0.029
Neg logit contributions
 challenged-0.031
 disagreed-0.031
rolling-0.031
 Whoever-0.030
 begged-0.029
Token came
Token ID1625
Feature activation+7.83e-01

Pos logit contributions
rolling+0.148
 challenged+0.146
 disagreed+0.140
 Whoever+0.138
 favorites+0.131
Neg logit contributions
quet-0.255
ful-0.249
bly-0.221
ment-0.220
kin-0.218
Token there
Token ID612
Feature activation+4.42e+00

Pos logit contributions
rolling+0.025
 challenged+0.024
 Whoever+0.023
 disagreed+0.023
 One+0.022
Neg logit contributions
quet-0.028
ful-0.026
kin-0.024
bly-0.023
ment-0.023
Token.
Token ID13
Feature activation-3.61e-01

Pos logit contributions
 disagreed+0.155
 challenged+0.154
 One+0.150
rolling+0.149
 Whoever+0.145
Neg logit contributions
quet-0.144
ful-0.125
bly-0.119
aid-0.114
ment-0.113
Token They
Token ID1119
Feature activation+5.90e-01

Pos logit contributions
ful+0.013
quet+0.012
ment+0.011
 fairy+0.011
 from+0.010
Neg logit contributions
rolling-0.012
 challenged-0.011
 disagreed-0.011
 argued-0.010
 padd-0.010
Token said
Token ID531
Feature activation+1.07e+00

Pos logit contributions
rolling+0.017
 Whoever+0.014
 favorites+0.014
 challenged+0.014
inters+0.014
Neg logit contributions
ful-0.024
quet-0.023
 fairy-0.021
aid-0.021
 from-0.020
Token bed
Token ID3996
Feature activation-3.91e+00

Pos logit contributions
quet+0.096
ful+0.087
cake+0.085
web+0.081
aid+0.081
Neg logit contributions
 padd-0.094
 challenged-0.090
rolling-0.089
 One-0.088
 begged-0.088
Token wearing
Token ID5762
Feature activation+1.85e+00

Pos logit contributions
quet+0.128
aid+0.114
ful+0.109
 queens+0.106
 ray+0.105
Neg logit contributions
rolling-0.127
 disagreed-0.127
 challenged-0.123
 begged-0.119
 One-0.118
Token his
Token ID465
Feature activation+1.38e+00

Pos logit contributions
 One+0.057
 challenged+0.057
 favorites+0.056
rolling+0.055
 padd+0.054
Neg logit contributions
quet-0.069
ful-0.064
cake-0.054
aid-0.054
ment-0.053
Token favourite
Token ID12507
Feature activation+2.57e+00

Pos logit contributions
rolling+0.034
 challenged+0.032
 favorites+0.032
 disagreed+0.031
 One+0.030
Neg logit contributions
quet-0.058
ful-0.058
ment-0.051
bly-0.050
aid-0.049
Token py
Token ID12972
Feature activation+7.15e-02

Pos logit contributions
rolling+0.084
 challenged+0.081
 favorites+0.079
 disagreed+0.078
 One+0.078
Neg logit contributions
ful-0.087
quet-0.086
bly-0.073
ment-0.073
kin-0.070
Tokenj
Token ID73
Feature activation+6.07e+00

Pos logit contributions
rolling+0.001
 challenged+0.001
 Whoever+0.001
 disagreed+0.001
 One+0.001
Neg logit contributions
quet-0.003
ful-0.003
ment-0.003
bly-0.003
 fairy-0.003
Tokenamas
Token ID17485
Feature activation+1.03e-01

Pos logit contributions
rolling+0.178
 challenged+0.167
 Whoever+0.161
 disagreed+0.161
 One+0.157
Neg logit contributions
quet-0.227
ful-0.227
 fairy-0.197
bly-0.196
ment-0.196
Token,
Token ID11
Feature activation+7.74e-01

Pos logit contributions
rolling+0.004
 challenged+0.004
 disagreed+0.004
 One+0.003
 Whoever+0.003
Neg logit contributions
quet-0.003
ful-0.003
aid-0.003
ment-0.003
 queens-0.003
Token feeling
Token ID4203
Feature activation-5.95e-01

Pos logit contributions
rolling+0.025
 challenged+0.025
 Whoever+0.024
 disagreed+0.024
 padd+0.024
Neg logit contributions
quet-0.027
ful-0.025
bly-0.022
ment-0.022
kin-0.022
Token so
Token ID523
Feature activation+2.56e+00

Pos logit contributions
quet+0.023
ful+0.019
cake+0.018
igsaw+0.018
web+0.018
Neg logit contributions
 challenged-0.017
 intrigued-0.017
 padd-0.017
rolling-0.017
 One-0.017
Token proud
Token ID6613
Feature activation+7.99e-01

Pos logit contributions
rolling+0.083
 challenged+0.078
 disagreed+0.076
 One+0.076
 Whoever+0.074
Neg logit contributions
quet-0.091
ful-0.085
 fairy-0.074
web-0.073
aid-0.072
Token can
Token ID460
Feature activation-5.62e-01

Pos logit contributions
rolling+0.008
 Int+0.008
 challenged+0.007
outed+0.007
 exception+0.007
Neg logit contributions
ful-0.006
quet-0.006
 fairy-0.005
kin-0.005
aid-0.005
Token be
Token ID307
Feature activation-1.50e+00

Pos logit contributions
quet+0.019
ful+0.019
bly+0.016
 fairy+0.016
kin+0.016
Neg logit contributions
rolling-0.018
 challenged-0.017
 disagreed-0.017
 Whoever-0.017
 favorites-0.016
Token lions
Token ID31420
Feature activation+1.77e+00

Pos logit contributions
quet+0.053
ful+0.049
cake+0.046
igsaw+0.042
ment+0.042
Neg logit contributions
 challenged-0.049
rolling-0.048
 padd-0.048
 favorites-0.047
 disagreed-0.045
Token and
Token ID290
Feature activation+2.91e-01

Pos logit contributions
 challenged+0.057
 Whoever+0.057
 disagreed+0.054
 favorites+0.053
 padd+0.053
Neg logit contributions
quet-0.058
ful-0.054
ment-0.052
aid-0.052
bly-0.051
Token tigers
Token ID43114
Feature activation+4.42e+00

Pos logit contributions
rolling+0.009
 challenged+0.008
 Whoever+0.008
 disagreed+0.008
 One+0.008
Neg logit contributions
quet-0.011
ful-0.010
kin-0.009
bly-0.009
ment-0.009
Token who
Token ID508
Feature activation+6.04e+00

Pos logit contributions
 challenged+0.154
 Whoever+0.153
rolling+0.147
 disagreed+0.147
 One+0.146
Neg logit contributions
quet-0.137
ful-0.126
ment-0.119
bly-0.118
kin-0.113
Token drink
Token ID4144
Feature activation+7.41e-01

Pos logit contributions
rolling+0.173
 challenged+0.171
 Whoever+0.165
 disagreed+0.162
 favorites+0.158
Neg logit contributions
quet-0.227
ful-0.222
ment-0.194
bly-0.193
kin-0.192
Token from
Token ID422
Feature activation-6.51e-01

Pos logit contributions
 One+0.022
 Whoever+0.020
 padd+0.020
 challenged+0.019
 banks+0.019
Neg logit contributions
quet-0.028
igsaw-0.024
kin-0.023
ful-0.023
web-0.023
Token the
Token ID262
Feature activation+4.27e-01

Pos logit contributions
quet+0.021
ful+0.019
kin+0.017
aid+0.016
ment+0.016
Neg logit contributions
 challenged-0.023
rolling-0.022
 Whoever-0.022
 padd-0.022
 One-0.022
Token fountain
Token ID29420
Feature activation+1.16e+00

Pos logit contributions
rolling+0.013
 challenged+0.012
 disagreed+0.012
 favorites+0.012
 Whoever+0.012
Neg logit contributions
ful-0.016
quet-0.015
ment-0.014
bly-0.013
aid-0.013
Token."
Token ID526
Feature activation+1.76e+00

Pos logit contributions
 challenged+0.035
 disagreed+0.034
 One+0.034
 Whoever+0.034
 favorites+0.034
Neg logit contributions
quet-0.041
ful-0.037
aid-0.037
bly-0.035
 forgiveness-0.035
TokenIt
Token ID1026
Feature activation-2.73e-01

Pos logit contributions
rolling+0.057
 challenged+0.054
 disagreed+0.052
 favorites+0.052
 exception+0.051
Neg logit contributions
ful-0.046
quet-0.043
bly-0.037
aid-0.037
ment-0.037
Token might
Token ID1244
Feature activation+3.05e+00

Pos logit contributions
ful+0.008
 fairy+0.007
quet+0.007
 from+0.007
web+0.007
Neg logit contributions
rolling-0.011
 favorites-0.009
 challenged-0.009
 exception-0.009
 Whoever-0.009
Token be
Token ID307
Feature activation-1.22e+00

Pos logit contributions
 One+0.082
 challenged+0.080
 disagreed+0.078
rolling+0.076
 quarrel+0.075
Neg logit contributions
quet-0.126
ful-0.116
ment-0.114
bly-0.109
aid-0.106
Token slippery
Token ID32911
Feature activation+1.38e+00

Pos logit contributions
quet+0.044
ful+0.042
ment+0.036
bly+0.036
kin+0.036
Neg logit contributions
rolling-0.038
 challenged-0.038
 disagreed-0.036
 Whoever-0.036
 One-0.035
Token or
Token ID393
Feature activation+1.91e-01

Pos logit contributions
 One+0.046
 challenged+0.046
 Whoever+0.044
inters+0.043
 disagreed+0.043
Neg logit contributions
quet-0.047
ful-0.040
ment-0.039
aid-0.039
kin-0.038
Token
Token ID220
Feature activation+6.04e+00

Pos logit contributions
rolling+0.006
 challenged+0.006
 disagreed+0.006
 Whoever+0.006
 One+0.006
Neg logit contributions
quet-0.007
ful-0.006
ment-0.006
bly-0.006
kin-0.005
Tokenicky
Token ID17479
Feature activation+2.94e+00

Pos logit contributions
 challenged+0.211
rolling+0.209
 disagreed+0.204
 Whoever+0.200
 argued+0.199
Neg logit contributions
ful-0.190
quet-0.182
ment-0.156
kin-0.154
bly-0.152
Token or
Token ID393
Feature activation+7.73e-01

Pos logit contributions
 challenged+0.105
 disagreed+0.105
 Whoever+0.101
rolling+0.101
 begged+0.100
Neg logit contributions
quet-0.091
ful-0.079
bly-0.073
aid-0.070
 fairy-0.070
Token have
Token ID423
Feature activation-1.69e+00

Pos logit contributions
 challenged+0.024
rolling+0.024
 disagreed+0.024
 Whoever+0.023
 One+0.023
Neg logit contributions
quet-0.028
ful-0.025
ment-0.022
bly-0.022
kin-0.022
Token snakes
Token ID26042
Feature activation+7.63e-01

Pos logit contributions
quet+0.054
ful+0.052
bly+0.045
kin+0.045
ment+0.044
Neg logit contributions
rolling-0.058
 challenged-0.057
 Whoever-0.055
 disagreed-0.055
 One-0.055
Token."
Token ID526
Feature activation+2.32e+00

Pos logit contributions
 challenged+0.026
 disagreed+0.025
 begged+0.025
 Whoever+0.025
 asks+0.024
Neg logit contributions
quet-0.025
ful-0.022
bly-0.021
ment-0.021
igsaw-0.020
Token and
Token ID290
Feature activation-9.68e-03

Pos logit contributions
 forgiveness+0.079
quet+0.078
ful+0.078
sm+0.076
bly+0.075
Neg logit contributions
 One-0.068
 lined-0.064
 Whoever-0.063
 Suddenly-0.062
 pad-0.059
Token made
Token ID925
Feature activation+3.69e-01

Pos logit contributions
quet+0.000
ful+0.000
kin+0.000
bly+0.000
ment+0.000
Neg logit contributions
rolling-0.000
 challenged-0.000
 disagreed-0.000
 Whoever-0.000
 begged-0.000
Token long
Token ID890
Feature activation-1.80e-01

Pos logit contributions
rolling+0.013
 Whoever+0.013
 challenged+0.013
 disagreed+0.012
 favorites+0.012
Neg logit contributions
quet-0.011
ful-0.011
kin-0.009
bly-0.009
ment-0.009
Token and
Token ID290
Feature activation-3.63e-01

Pos logit contributions
ful+0.007
quet+0.007
bly+0.006
kin+0.006
aid+0.006
Neg logit contributions
rolling-0.005
 One-0.005
 inc-0.005
 favorites-0.005
 disagreed-0.005
Token cur
Token ID1090
Feature activation+1.98e-01

Pos logit contributions
quet+0.013
ful+0.013
ment+0.011
bly+0.011
kin+0.011
Neg logit contributions
rolling-0.011
 challenged-0.011
 disagreed-0.011
 Whoever-0.010
 begged-0.010
Tokenving
Token ID1075
Feature activation+6.03e+00

Pos logit contributions
rolling+0.007
 favorites+0.007
 One+0.007
 challenged+0.007
 Whoever+0.006
Neg logit contributions
quet-0.006
ful-0.006
bly-0.006
aid-0.005
kin-0.005
Token lines
Token ID3951
Feature activation-3.80e-01

Pos logit contributions
rolling+0.200
 disagreed+0.197
 One+0.195
 Whoever+0.194
 challenged+0.191
Neg logit contributions
quet-0.201
ful-0.186
aid-0.167
bly-0.166
kin-0.164
Token.
Token ID13
Feature activation-4.26e-01

Pos logit contributions
quet+0.012
 Tooth+0.012
 forgiveness+0.012
ful+0.011
 relief+0.011
Neg logit contributions
rolling-0.011
 lined-0.011
 inc-0.011
 One-0.011
 Whoever-0.011
Token She
Token ID1375
Feature activation-1.56e+00

Pos logit contributions
quet+0.014
ful+0.014
ment+0.012
cake+0.012
 fairy+0.012
Neg logit contributions
 challenged-0.014
rolling-0.014
 disagreed-0.013
 padd-0.013
 argued-0.012
Token was
Token ID373
Feature activation+6.95e-02

Pos logit contributions
ful+0.054
quet+0.052
 fairy+0.047
kin+0.046
web+0.045
Neg logit contributions
rolling-0.050
 challenged-0.046
 Whoever-0.045
 One-0.044
 exception-0.044
Token very
Token ID845
Feature activation+6.31e-01

Pos logit contributions
rolling+0.002
 challenged+0.002
 Whoever+0.002
 disagreed+0.002
 favorites+0.002
Neg logit contributions
quet-0.002
ful-0.002
bly-0.002
ment-0.002
 fairy-0.002
Token That
Token ID1320
Feature activation+8.70e-01

Pos logit contributions
ful+0.015
quet+0.015
ment+0.013
cake+0.013
 from+0.013
Neg logit contributions
rolling-0.013
 challenged-0.013
 exception-0.012
inters-0.012
 favorites-0.012
Token was
Token ID373
Feature activation+3.85e-01

Pos logit contributions
rolling+0.029
 challenged+0.026
 Whoever+0.026
 favorites+0.025
 disagreed+0.025
Neg logit contributions
ful-0.030
quet-0.029
 fairy-0.026
kin-0.025
ment-0.025
Token the
Token ID262
Feature activation+7.55e-01

Pos logit contributions
 challenged+0.013
 disagreed+0.013
 One+0.012
 begged+0.012
 padd+0.012
Neg logit contributions
quet-0.014
ful-0.011
kin-0.011
ment-0.011
aid-0.011
Token setup
Token ID9058
Feature activation-3.99e+00

Pos logit contributions
rolling+0.026
 challenged+0.025
 Whoever+0.025
 disagreed+0.024
 favorites+0.024
Neg logit contributions
quet-0.024
ful-0.024
 fairy-0.020
bly-0.020
kin-0.020
Token that
Token ID326
Feature activation-1.27e+00

Pos logit contributions
quet+0.141
ful+0.121
aid+0.117
bly+0.116
 queens+0.116
Neg logit contributions
 challenged-0.129
 disagreed-0.127
rolling-0.123
 begged-0.119
 Whoever-0.119
Token fore
Token ID1674
Feature activation-6.82e+00

Pos logit contributions
quet+0.047
ful+0.045
bly+0.040
aid+0.040
kin+0.039
Neg logit contributions
 challenged-0.038
rolling-0.037
 disagreed-0.037
 Whoever-0.036
 One-0.035
Tokenshadow
Token ID19106
Feature activation+1.66e+00

Pos logit contributions
quet+0.260
ful+0.258
bly+0.219
aid+0.211
sm+0.209
Neg logit contributions
 favorites-0.208
 One-0.188
 favourites-0.186
rolling-0.185
 asks-0.183
Tokened
Token ID276
Feature activation+5.49e-01

Pos logit contributions
 challenged+0.049
rolling+0.049
 disagreed+0.047
outed+0.046
 Whoever+0.045
Neg logit contributions
quet-0.062
ful-0.055
kin-0.053
 fairy-0.051
bly-0.050
Token the
Token ID262
Feature activation-7.55e-01

Pos logit contributions
 challenged+0.017
 disagreed+0.017
rolling+0.016
 One+0.016
 Whoever+0.016
Neg logit contributions
quet-0.021
ful-0.018
kin-0.017
aid-0.017
bly-0.017
Token bad
Token ID2089
Feature activation-4.36e+00

Pos logit contributions
quet+0.023
ful+0.023
 fairy+0.020
kin+0.019
ment+0.019
Neg logit contributions
rolling-0.027
 challenged-0.026
 Whoever-0.025
 disagreed-0.025
 favorites-0.024
Token ending
Token ID7464
Feature activation-4.62e+00

Pos logit contributions
quet+0.127
ful+0.125
bly+0.104
ment+0.102
kin+0.101
Neg logit contributions
rolling-0.163
 challenged-0.160
 Whoever-0.155
 disagreed-0.154
 One-0.151
Token
Token ID198
Feature activation-6.84e-01

Pos logit contributions
ful+0.010
quet+0.010
 fairy+0.008
ment+0.008
cake+0.008
Neg logit contributions
rolling-0.010
 challenged-0.009
 disagreed-0.009
 exception-0.009
 padd-0.009
Token
Token ID198
Feature activation-1.01e+00

Pos logit contributions
ful+0.023
quet+0.022
sm+0.019
bly+0.019
 fairy+0.019
Neg logit contributions
rolling-0.023
 challenged-0.023
 disagreed-0.022
 argued-0.022
 favorites-0.021
TokenL
Token ID43
Feature activation-3.21e+00

Pos logit contributions
quet+0.026
ful+0.026
ment+0.021
bly+0.021
 fairy+0.021
Neg logit contributions
rolling-0.041
 challenged-0.040
 Whoever-0.039
 disagreed-0.038
 One-0.038
Tokenily
Token ID813
Feature activation-2.17e+00

Pos logit contributions
quet+0.080
ful+0.073
ment+0.065
 fairy+0.065
 Tooth+0.064
Neg logit contributions
 Whoever-0.130
 challenged-0.129
 disagreed-0.127
Whoever-0.125
outed-0.121
Token had
Token ID550
Feature activation-2.15e+00

Pos logit contributions
quet+0.072
ful+0.072
 fairy+0.060
bly+0.060
ment+0.060
Neg logit contributions
rolling-0.072
 challenged-0.069
 Whoever-0.067
 disagreed-0.066
 favorites-0.065
Token fore
Token ID1674
Feature activation-6.72e+00

Pos logit contributions
ful+0.063
quet+0.055
 fore+0.052
 fairy+0.050
 from+0.048
Neg logit contributions
rolling-0.081
 Whoever-0.075
 favorites-0.074
 challenged-0.074
Whoever-0.072
Tokenshadow
Token ID19106
Feature activation+1.12e+00

Pos logit contributions
quet+0.250
ful+0.249
bly+0.211
aid+0.193
sm+0.190
Neg logit contributions
 favorites-0.221
 One-0.206
 asks-0.202
 favourites-0.194
inters-0.192
Tokened
Token ID276
Feature activation+4.14e-01

Pos logit contributions
rolling+0.032
 challenged+0.032
outed+0.031
 disagreed+0.030
 Whoever+0.029
Neg logit contributions
quet-0.043
kin-0.038
ful-0.037
paste-0.035
 Tooth-0.035
Token the
Token ID262
Feature activation-8.47e-01

Pos logit contributions
 challenged+0.013
 disagreed+0.013
rolling+0.013
 padd+0.012
 One+0.012
Neg logit contributions
quet-0.016
aid-0.012
cake-0.012
kin-0.012
ful-0.012
Token danger
Token ID3514
Feature activation-1.67e+00

Pos logit contributions
quet+0.029
ful+0.029
bly+0.024
ment+0.024
 fairy+0.024
Neg logit contributions
rolling-0.028
 challenged-0.027
 Whoever-0.026
 disagreed-0.026
 favorites-0.025
Token with
Token ID351
Feature activation+1.10e+00

Pos logit contributions
quet+0.058
aid+0.051
ful+0.050
bly+0.048
ment+0.047
Neg logit contributions
 challenged-0.054
 disagreed-0.053
 Whoever-0.052
rolling-0.051
 wanted-0.049
Token wrong
Token ID2642
Feature activation+3.29e-01

Pos logit contributions
 challenged+0.006
 disagreed+0.006
rolling+0.006
 exception+0.005
 asks+0.005
Neg logit contributions
quet-0.007
ful-0.006
aid-0.006
ment-0.006
kin-0.005
Token.
Token ID13
Feature activation-6.07e-01

Pos logit contributions
 challenged+0.012
 disagreed+0.012
 One+0.011
rolling+0.011
 begged+0.011
Neg logit contributions
quet-0.010
ful-0.009
aid-0.009
bly-0.008
kin-0.008
Token She
Token ID1375
Feature activation-1.56e+00

Pos logit contributions
ful+0.022
quet+0.022
 fairy+0.019
ment+0.019
bly+0.018
Neg logit contributions
rolling-0.018
 challenged-0.017
 disagreed-0.016
 favorites-0.016
 Whoever-0.016
Token saw
Token ID2497
Feature activation-8.90e-01

Pos logit contributions
ful+0.064
quet+0.063
 fairy+0.058
ment+0.056
kin+0.055
Neg logit contributions
rolling-0.039
 challenged-0.035
 Whoever-0.035
 favorites-0.034
inters-0.033
Token the
Token ID262
Feature activation-5.12e-01

Pos logit contributions
quet+0.029
kin+0.023
ful+0.023
bly+0.022
ment+0.021
Neg logit contributions
 challenged-0.031
 disagreed-0.031
 One-0.030
 asks-0.030
 padd-0.029
Token mess
Token ID2085
Feature activation-6.67e+00

Pos logit contributions
quet+0.018
ful+0.017
bly+0.015
ment+0.015
aid+0.014
Neg logit contributions
 challenged-0.016
 disagreed-0.016
rolling-0.015
 favorites-0.015
 padd-0.015
Token and
Token ID290
Feature activation-4.72e-01

Pos logit contributions
quet+0.262
 Tooth+0.244
aid+0.237
 relief+0.237
 bless+0.236
Neg logit contributions
 disagreed-0.161
 wanted-0.154
 Whoever-0.151
 lined-0.148
 argued-0.140
Token the
Token ID262
Feature activation+1.17e-01

Pos logit contributions
quet+0.021
ful+0.019
bly+0.018
igsaw+0.017
kin+0.017
Neg logit contributions
 disagreed-0.012
 challenged-0.012
 begged-0.011
 argued-0.011
 wanted-0.011
Token tears
Token ID10953
Feature activation-3.78e+00

Pos logit contributions
 disagreed+0.003
 challenged+0.003
rolling+0.003
 favorites+0.003
 padd+0.003
Neg logit contributions
quet-0.004
ful-0.004
bly-0.004
ment-0.003
aid-0.003
Token and
Token ID290
Feature activation+2.43e-01

Pos logit contributions
quet+0.138
igsaw+0.126
 Tooth+0.124
 queens+0.119
ment+0.115
Neg logit contributions
 disagreed-0.117
 argued-0.111
 wanted-0.111
 challenged-0.109
 begged-0.105
Token said
Token ID531
Feature activation+1.33e+00

Pos logit contributions
 disagreed+0.007
 challenged+0.007
 begged+0.007
 argued+0.007
 asks+0.007
Neg logit contributions
quet-0.010
ful-0.009
bly-0.009
igsaw-0.008
kin-0.008
Token there
Token ID612
Feature activation+3.21e+00

Pos logit contributions
rolling+0.002
 challenged+0.002
 disagreed+0.002
 padd+0.001
outed+0.001
Neg logit contributions
quet-0.002
ful-0.002
aid-0.001
igsaw-0.001
cake-0.001
Token and
Token ID290
Feature activation-7.68e-01

Pos logit contributions
rolling+0.119
 disagreed+0.116
 challenged+0.116
 One+0.115
 Whoever+0.113
Neg logit contributions
quet-0.093
ful-0.086
bly-0.077
aid-0.077
ment-0.077
Token wait
Token ID4043
Feature activation-1.37e+00

Pos logit contributions
quet+0.030
ful+0.029
bly+0.026
kin+0.025
ment+0.025
Neg logit contributions
rolling-0.022
 challenged-0.021
 disagreed-0.021
 begged-0.020
 Whoever-0.020
Token.
Token ID13
Feature activation-4.92e-01

Pos logit contributions
quet+0.047
ful+0.043
aid+0.040
bly+0.038
 queens+0.037
Neg logit contributions
rolling-0.045
 One-0.043
 challenged-0.042
 Whoever-0.042
 disagreed-0.042
Token The
Token ID383
Feature activation-6.04e-02

Pos logit contributions
ful+0.018
quet+0.018
ment+0.016
 fairy+0.015
cake+0.015
Neg logit contributions
 challenged-0.014
rolling-0.014
 disagreed-0.013
 padd-0.013
 exception-0.012
Token fore
Token ID1674
Feature activation-6.49e+00

Pos logit contributions
quet+0.002
ful+0.002
bly+0.002
ment+0.002
paste+0.002
Neg logit contributions
rolling-0.002
 challenged-0.002
 disagreed-0.002
 One-0.002
 Whoever-0.002
Tokenshadow
Token ID19106
Feature activation+1.13e+00

Pos logit contributions
quet+0.254
ful+0.243
bly+0.217
web+0.206
aid+0.204
Neg logit contributions
 favorites-0.205
 One-0.189
 favourites-0.185
inters-0.181
 challenged-0.179
Tokening
Token ID278
Feature activation+1.20e-01

Pos logit contributions
rolling+0.034
 challenged+0.033
outed+0.032
 disagreed+0.032
 Whoever+0.031
Neg logit contributions
quet-0.041
ful-0.037
kin-0.036
bly-0.034
aid-0.034
Token of
Token ID286
Feature activation-3.17e+00

Pos logit contributions
 challenged+0.004
 disagreed+0.004
 begged+0.004
rolling+0.004
 inc+0.004
Neg logit contributions
quet-0.004
ful-0.004
aid-0.003
 queens-0.003
cake-0.003
Token the
Token ID262
Feature activation-7.95e-01

Pos logit contributions
quet+0.122
paste+0.103
aid+0.102
cake+0.102
ful+0.096
Neg logit contributions
 One-0.093
 disagreed-0.091
 challenged-0.085
 struggled-0.084
 begged-0.083
Token dull
Token ID19222
Feature activation-2.21e+00

Pos logit contributions
ful+0.027
quet+0.027
bly+0.023
ment+0.022
kin+0.022
Neg logit contributions
rolling-0.026
 challenged-0.026
 disagreed-0.025
 Whoever-0.024
 One-0.024
Token the
Token ID262
Feature activation-2.29e-02

Pos logit contributions
 One+0.024
rolling+0.023
 challenged+0.023
 disagreed+0.023
 asks+0.023
Neg logit contributions
quet-0.020
ful-0.018
aid-0.017
paste-0.016
bly-0.016
Token sky
Token ID6766
Feature activation-1.94e-02

Pos logit contributions
quet+0.001
ful+0.001
bly+0.001
ment+0.001
aid+0.001
Neg logit contributions
rolling-0.001
 One-0.001
 challenged-0.001
 favorites-0.001
 padd-0.001
Token before
Token ID878
Feature activation+1.49e+00

Pos logit contributions
quet+0.001
ful+0.001
aid+0.000
bly+0.000
paste+0.000
Neg logit contributions
rolling-0.001
 challenged-0.001
 disagreed-0.001
 One-0.001
 begged-0.001
Token they
Token ID484
Feature activation+1.46e+00

Pos logit contributions
rolling+0.051
 One+0.050
 padd+0.049
 challenged+0.049
 Whoever+0.048
Neg logit contributions
quet-0.049
ful-0.047
kin-0.043
aid-0.042
bly-0.041
Token walked
Token ID6807
Feature activation+1.66e+00

Pos logit contributions
rolling+0.041
 challenged+0.039
 Whoever+0.038
 disagreed+0.038
 One+0.036
Neg logit contributions
quet-0.056
ful-0.055
bly-0.048
ment-0.048
kin-0.048
Token fore
Token ID1674
Feature activation-6.40e+00

Pos logit contributions
rolling+0.062
 challenged+0.060
 Whoever+0.058
 disagreed+0.058
 One+0.056
Neg logit contributions
quet-0.049
ful-0.048
bly-0.040
ment-0.040
kin-0.039
Tokenshadow
Token ID19106
Feature activation+1.79e+00

Pos logit contributions
ful+0.216
quet+0.212
 from+0.184
bly+0.182
 queens+0.161
Neg logit contributions
 favorites-0.220
 favourites-0.199
 One-0.198
inters-0.187
 disagreed-0.187
Tokened
Token ID276
Feature activation+6.39e-01

Pos logit contributions
rolling+0.054
 challenged+0.052
outed+0.051
 disagreed+0.051
 Whoever+0.050
Neg logit contributions
quet-0.066
ful-0.059
kin-0.057
 fairy-0.054
aid-0.054
Token the
Token ID262
Feature activation-6.21e-01

Pos logit contributions
 challenged+0.021
 disagreed+0.021
rolling+0.020
 Whoever+0.020
 One+0.020
Neg logit contributions
quet-0.022
ful-0.019
aid-0.019
bly-0.018
web-0.018
Token down
Token ID866
Feature activation+2.66e+00

Pos logit contributions
ful+0.021
quet+0.021
bly+0.018
ment+0.018
kin+0.017
Neg logit contributions
rolling-0.020
 challenged-0.020
 disagreed-0.019
 Whoever-0.019
 One-0.019
Tokenpour
Token ID48681
Feature activation-5.38e+00

Pos logit contributions
 challenged+0.088
 One+0.088
 disagreed+0.087
rolling+0.085
 upon+0.082
Neg logit contributions
quet-0.088
aid-0.077
ful-0.076
bly-0.074
paste-0.073
Token happened
Token ID3022
Feature activation+9.04e-01

Pos logit contributions
 challenged+0.010
rolling+0.010
 disagreed+0.010
 One+0.009
 Whoever+0.009
Neg logit contributions
quet-0.015
ful-0.015
bly-0.013
ment-0.013
kin-0.013
Token.
Token ID13
Feature activation-4.73e-01

Pos logit contributions
rolling+0.043
 challenged+0.041
 Whoever+0.041
 favorites+0.040
 disagreed+0.040
Neg logit contributions
ful-0.018
quet-0.017
ment-0.013
bly-0.012
 fairy-0.012
Token She
Token ID1375
Feature activation-1.40e+00

Pos logit contributions
ful+0.014
quet+0.013
ment+0.011
 fairy+0.011
cake+0.011
Neg logit contributions
rolling-0.018
 challenged-0.017
 Whoever-0.016
 favorites-0.016
 disagreed-0.016
Token saw
Token ID2497
Feature activation-5.46e-01

Pos logit contributions
quet+0.071
ful+0.070
bly+0.063
ment+0.063
kin+0.063
Neg logit contributions
rolling-0.022
 challenged-0.021
 Whoever-0.020
 disagreed-0.019
 favorites-0.018
Token the
Token ID262
Feature activation-3.26e-01

Pos logit contributions
quet+0.020
kin+0.017
bly+0.016
ful+0.016
ross+0.015
Neg logit contributions
 Whoever-0.017
 asks-0.017
 challenged-0.017
 padd-0.017
 One-0.016
Token mess
Token ID2085
Feature activation-6.30e+00

Pos logit contributions
quet+0.011
ful+0.011
ment+0.009
bly+0.009
kin+0.009
Neg logit contributions
rolling-0.011
 challenged-0.010
 disagreed-0.010
 Whoever-0.010
 favorites-0.010
Token and
Token ID290
Feature activation-3.83e-01

Pos logit contributions
quet+0.269
 Tooth+0.263
 bless+0.255
ment+0.250
 Fairy+0.247
Neg logit contributions
 Whoever-0.130
 disagreed-0.127
 lined-0.123
 wanted-0.118
 argued-0.111
Token the
Token ID262
Feature activation+1.65e-01

Pos logit contributions
quet+0.014
ful+0.012
kin+0.011
bly+0.011
igsaw+0.011
Neg logit contributions
 challenged-0.013
 disagreed-0.013
 begged-0.013
 asks-0.012
rolling-0.012
Token broken
Token ID5445
Feature activation-2.89e+00

Pos logit contributions
rolling+0.005
 challenged+0.005
 disagreed+0.005
 Whoever+0.005
 favorites+0.005
Neg logit contributions
quet-0.006
ful-0.006
ment-0.005
bly-0.005
aid-0.005
Token v
Token ID410
Feature activation-2.29e-01

Pos logit contributions
quet+0.103
ful+0.100
aid+0.087
bly+0.087
ment+0.084
Neg logit contributions
rolling-0.087
 challenged-0.086
 disagreed-0.084
 Whoever-0.080
 One-0.080
Tokenase
Token ID589
Feature activation-9.93e-01

Pos logit contributions
quet+0.006
ful+0.006
 relief+0.006
 fairy+0.006
 Fairy+0.005
Neg logit contributions
rolling-0.008
outed-0.008
inters-0.008
 challenged-0.008
 Whoever-0.008
Token Let
Token ID3914
Feature activation+9.90e-01

Pos logit contributions
quet+0.019
ful+0.019
kin+0.016
bly+0.016
ment+0.016
Neg logit contributions
rolling-0.017
 Whoever-0.016
 challenged-0.016
 One-0.016
 disagreed-0.015
Token's
Token ID338
Feature activation+9.09e-01

Pos logit contributions
rolling+0.022
 challenged+0.020
 Whoever+0.020
 disagreed+0.019
 favorites+0.019
Neg logit contributions
quet-0.044
ful-0.044
bly-0.039
 fairy-0.039
ment-0.039
Token clean
Token ID3424
Feature activation-4.66e+00

Pos logit contributions
rolling+0.024
 One+0.023
 challenged+0.023
 Whoever+0.023
 disagreed+0.022
Neg logit contributions
quet-0.037
ful-0.035
ment-0.034
sm-0.032
bly-0.031
Token up
Token ID510
Feature activation+2.97e+00

Pos logit contributions
quet+0.219
ful+0.193
aid+0.187
kin+0.185
cake+0.185
Neg logit contributions
 Whoever-0.099
 padd-0.098
 challenged-0.094
 asks-0.091
 favorites-0.090
Token the
Token ID262
Feature activation-1.38e+00

Pos logit contributions
 padd+0.087
 Whoever+0.084
 One+0.077
 asks+0.076
 challenged+0.075
Neg logit contributions
quet-0.117
 ray-0.099
ative-0.097
kin-0.095
aid-0.092
Token mess
Token ID2085
Feature activation-6.29e+00

Pos logit contributions
ful+0.045
quet+0.045
bly+0.039
ment+0.038
kin+0.037
Neg logit contributions
rolling-0.046
 challenged-0.045
 disagreed-0.044
 Whoever-0.044
 favorites-0.043
Token and
Token ID290
Feature activation-4.40e-01

Pos logit contributions
quet+0.258
igsaw+0.232
 relief+0.228
ment+0.223
aid+0.221
Neg logit contributions
 Whoever-0.151
 disagreed-0.139
 wants-0.133
 challenged-0.130
 wanted-0.130
Token have
Token ID423
Feature activation-2.63e+00

Pos logit contributions
quet+0.018
ful+0.017
ment+0.015
kin+0.015
bly+0.015
Neg logit contributions
rolling-0.012
 Whoever-0.012
 challenged-0.012
 begged-0.011
 disagreed-0.011
Token some
Token ID617
Feature activation+4.63e-01

Pos logit contributions
quet+0.085
ful+0.081
kin+0.071
ment+0.070
bly+0.069
Neg logit contributions
 challenged-0.089
rolling-0.088
 disagreed-0.087
 Whoever-0.086
 One-0.086
Token fun
Token ID1257
Feature activation-1.58e+00

Pos logit contributions
 favorites+0.017
 One+0.017
 disagreed+0.016
 padd+0.016
 begged+0.016
Neg logit contributions
quet-0.014
ful-0.013
 Fairy-0.011
bly-0.011
web-0.011
Token!"
Token ID2474
Feature activation+1.50e+00

Pos logit contributions
quet+0.053
ful+0.049
bly+0.044
 fairy+0.043
kin+0.042
Neg logit contributions
 favorites-0.051
 challenged-0.050
 disagreed-0.050
 begged-0.048
 Whoever-0.048
Token
Token ID198
Feature activation-8.50e-01

Pos logit contributions
ful+0.018
quet+0.017
sm+0.014
ment+0.014
aid+0.014
Neg logit contributions
rolling-0.019
 challenged-0.018
 disagreed-0.018
 favorites-0.018
 Whoever-0.017
TokenHe
Token ID1544
Feature activation-1.27e-01

Pos logit contributions
quet+0.025
ful+0.025
ment+0.021
 fairy+0.021
kin+0.021
Neg logit contributions
rolling-0.031
 challenged-0.030
 Whoever-0.030
 One-0.029
 disagreed-0.029
Token felt
Token ID2936
Feature activation-1.20e+00

Pos logit contributions
quet+0.005
ful+0.005
bly+0.004
ment+0.004
kin+0.004
Neg logit contributions
 challenged-0.003
rolling-0.003
 disagreed-0.003
 Whoever-0.003
 One-0.003
Token a
Token ID257
Feature activation+1.29e-01

Pos logit contributions
quet+0.043
ful+0.037
web+0.035
kin+0.034
cake+0.033
Neg logit contributions
 challenged-0.040
 One-0.039
 intrigued-0.039
 hesitant-0.038
rolling-0.037
Token great
Token ID1049
Feature activation-2.45e+00

Pos logit contributions
 padd+0.003
rolling+0.003
 challenged+0.003
 bet+0.003
 Whoever+0.003
Neg logit contributions
quet-0.005
ful-0.005
ions-0.004
ers-0.004
keeping-0.004
Token satisfaction
Token ID14676
Feature activation-6.28e+00

Pos logit contributions
quet+0.086
ful+0.082
bly+0.074
 queens+0.073
 yourself+0.071
Neg logit contributions
 challenged-0.076
rolling-0.074
 One-0.074
 padd-0.073
 disagreed-0.073
Token and
Token ID290
Feature activation-2.90e-01

Pos logit contributions
quet+0.228
igsaw+0.211
aid+0.204
Happy+0.201
 yourself+0.195
Neg logit contributions
 padd-0.185
 challenged-0.167
 disagreed-0.167
 wanted-0.164
 age-0.163
Token decided
Token ID3066
Feature activation+2.91e-01

Pos logit contributions
quet+0.013
ful+0.011
igsaw+0.011
bly+0.010
web+0.010
Neg logit contributions
 challenged-0.008
 wanted-0.007
 disagreed-0.007
 padd-0.007
 begged-0.007
Token to
Token ID284
Feature activation-1.58e-01

Pos logit contributions
 challenged+0.009
rolling+0.008
 Whoever+0.008
 disagreed+0.008
 One+0.008
Neg logit contributions
quet-0.011
ful-0.010
kin-0.010
bly-0.010
web-0.009
Token never
Token ID1239
Feature activation-2.41e+00

Pos logit contributions
quet+0.006
ful+0.006
sm+0.006
paste+0.006
ment+0.006
Neg logit contributions
 One-0.004
 challenged-0.004
rolling-0.004
 padd-0.004
 begged-0.004
Token forget
Token ID6044
Feature activation-3.41e+00

Pos logit contributions
quet+0.083
ful+0.081
bly+0.071
ment+0.070
kin+0.070
Neg logit contributions
rolling-0.078
 challenged-0.076
 disagreed-0.073
 Whoever-0.073
 One-0.071
Token.
Token ID13
Feature activation-7.36e-01

Pos logit contributions
quet+0.069
ful+0.066
kin+0.059
aid+0.057
bly+0.057
Neg logit contributions
rolling-0.071
 challenged-0.068
 Whoever-0.065
 One-0.065
 padd-0.064
Token He
Token ID679
Feature activation+1.43e-01

Pos logit contributions
ful+0.025
quet+0.024
ment+0.022
 fairy+0.021
 Fairy+0.021
Neg logit contributions
 disagreed-0.022
rolling-0.021
 challenged-0.021
 argued-0.020
 exception-0.020
Token felt
Token ID2936
Feature activation-1.71e+00

Pos logit contributions
rolling+0.004
 Whoever+0.004
inters+0.004
 challenged+0.004
 favorites+0.004
Neg logit contributions
ful-0.005
quet-0.005
 fairy-0.005
ment-0.005
kin-0.005
Token a
Token ID257
Feature activation-2.93e-01

Pos logit contributions
quet+0.068
ful+0.059
kin+0.054
web+0.054
cake+0.053
Neg logit contributions
 challenged-0.050
rolling-0.049
 intrigued-0.049
 One-0.048
 padd-0.048
Token sudden
Token ID4802
Feature activation+4.96e-01

Pos logit contributions
quet+0.010
ful+0.010
ment+0.009
aid+0.008
bly+0.008
Neg logit contributions
rolling-0.009
 challenged-0.009
 padd-0.009
 Whoever-0.009
 disagreed-0.009
Token relief
Token ID8259
Feature activation-6.26e+00

Pos logit contributions
 padd+0.015
 challenged+0.015
rolling+0.015
 One+0.015
 inc+0.015
Neg logit contributions
quet-0.018
ful-0.016
kin-0.015
igsaw-0.015
bly-0.015
Token that
Token ID326
Feature activation-3.45e-01

Pos logit contributions
quet+0.222
igsaw+0.216
aid+0.198
 queens+0.197
kin+0.188
Neg logit contributions
 padd-0.190
 challenged-0.186
 inc-0.183
rolling-0.180
 wanted-0.177
Token he
Token ID339
Feature activation+1.52e+00

Pos logit contributions
quet+0.015
ful+0.013
bly+0.012
aid+0.012
igsaw+0.012
Neg logit contributions
 challenged-0.009
 disagreed-0.009
 One-0.009
 begged-0.008
 wanted-0.008
Token had
Token ID550
Feature activation-1.68e+00

Pos logit contributions
rolling+0.035
 challenged+0.034
 disagreed+0.033
 Whoever+0.031
 begged+0.031
Neg logit contributions
quet-0.067
ful-0.065
bly-0.058
ment-0.058
kin-0.057
Token helped
Token ID4193
Feature activation+1.20e+00

Pos logit contributions
quet+0.068
ful+0.064
bly+0.059
kin+0.058
ment+0.057
Neg logit contributions
 challenged-0.046
rolling-0.044
 disagreed-0.044
 begged-0.042
 padd-0.041
Token the
Token ID262
Feature activation-1.82e-02

Pos logit contributions
 One+0.039
 challenged+0.039
 begged+0.038
rolling+0.038
 dare+0.038
Neg logit contributions
quet-0.039
bly-0.034
ful-0.033
kin-0.033
web-0.032
Token door
Token ID3420
Feature activation-3.58e+00

Pos logit contributions
quet+0.044
ful+0.043
bly+0.038
ment+0.037
aid+0.036
Neg logit contributions
rolling-0.035
 favorites-0.035
 One-0.035
 Whoever-0.034
 challenged-0.034
Token.
Token ID13
Feature activation-4.75e-01

Pos logit contributions
quet+0.126
ful+0.119
bly+0.116
kin+0.114
aid+0.111
Neg logit contributions
 One-0.106
 Whoever-0.105
rolling-0.102
 disagreed-0.101
 begged-0.099
Token She
Token ID1375
Feature activation-1.52e+00

Pos logit contributions
ful+0.016
quet+0.016
ment+0.014
 fairy+0.014
bly+0.013
Neg logit contributions
rolling-0.015
 challenged-0.015
 disagreed-0.014
 Whoever-0.014
 favorites-0.014
Token saw
Token ID2497
Feature activation-6.76e-01

Pos logit contributions
ful+0.064
quet+0.064
 fairy+0.057
ment+0.056
kin+0.056
Neg logit contributions
rolling-0.036
 challenged-0.034
 Whoever-0.033
 disagreed-0.032
 favorites-0.031
Token the
Token ID262
Feature activation-1.58e-01

Pos logit contributions
quet+0.025
kin+0.022
bly+0.020
ful+0.020
ross+0.019
Neg logit contributions
 challenged-0.020
 Whoever-0.020
 disagreed-0.020
 asks-0.020
 One-0.020
Token mess
Token ID2085
Feature activation-6.24e+00

Pos logit contributions
quet+0.005
ful+0.005
bly+0.005
ment+0.005
aid+0.004
Neg logit contributions
rolling-0.005
 favorites-0.005
 challenged-0.005
 disagreed-0.005
 Whoever-0.005
Token.
Token ID13
Feature activation-8.53e-01

Pos logit contributions
quet+0.255
aid+0.245
 Tooth+0.244
 bless+0.241
 relief+0.234
Neg logit contributions
 disagreed-0.134
 lined-0.134
 Whoever-0.128
 wanted-0.127
 pointed-0.117
Token She
Token ID1375
Feature activation-1.37e+00

Pos logit contributions
quet+0.033
ful+0.032
bly+0.029
kin+0.028
aid+0.027
Neg logit contributions
 One-0.024
rolling-0.024
 Whoever-0.023
 challenged-0.023
 Int-0.022
Token saw
Token ID2497
Feature activation-6.78e-01

Pos logit contributions
quet+0.061
ful+0.059
bly+0.054
ment+0.053
kin+0.053
Neg logit contributions
 challenged-0.031
rolling-0.030
 disagreed-0.029
 Whoever-0.028
 begged-0.028
Token the
Token ID262
Feature activation-2.76e-01

Pos logit contributions
quet+0.026
kin+0.023
bly+0.021
ross+0.021
ful+0.021
Neg logit contributions
 Whoever-0.019
 asks-0.019
 disagreed-0.019
 One-0.019
 challenged-0.019
Token cr
Token ID1067
Feature activation-2.23e+00

Pos logit contributions
quet+0.010
ful+0.010
bly+0.008
ment+0.008
aid+0.008
Neg logit contributions
 disagreed-0.008
 favorites-0.008
rolling-0.008
 challenged-0.008
 padd-0.008
Token end
Token ID886
Feature activation-4.77e+00

Pos logit contributions
rolling+0.009
 challenged+0.009
 Whoever+0.009
 disagreed+0.008
 favorites+0.008
Neg logit contributions
quet-0.014
ful-0.014
bly-0.012
ment-0.012
 fairy-0.012
Token.
Token ID13
Feature activation-4.29e-01

Pos logit contributions
quet+0.153
ful+0.136
bly+0.126
kin+0.121
aid+0.120
Neg logit contributions
 One-0.165
 challenged-0.163
 disagreed-0.162
rolling-0.162
 asks-0.159
Token
Token ID198
Feature activation-4.26e-01

Pos logit contributions
ful+0.016
quet+0.016
 fairy+0.014
ment+0.014
bly+0.014
Neg logit contributions
rolling-0.012
 challenged-0.011
 disagreed-0.011
 Whoever-0.011
 favorites-0.011
Token
Token ID198
Feature activation-6.07e-01

Pos logit contributions
ful+0.014
quet+0.013
sm+0.011
 fairy+0.011
ment+0.011
Neg logit contributions
rolling-0.015
 challenged-0.015
 disagreed-0.015
 favorites-0.014
 Whoever-0.014
TokenM
Token ID44
Feature activation-4.38e-01

Pos logit contributions
quet+0.017
ful+0.017
bly+0.014
kin+0.014
ment+0.014
Neg logit contributions
rolling-0.023
 challenged-0.023
 disagreed-0.022
 Whoever-0.022
 favorites-0.022
Tokenoral
Token ID6864
Feature activation-6.23e+00

Pos logit contributions
ful+0.013
quet+0.012
bly+0.011
ment+0.010
 fairy+0.010
Neg logit contributions
rolling-0.016
 disagreed-0.015
outed-0.015
 challenged-0.015
 One-0.015
Token value
Token ID1988
Feature activation-3.55e+00

Pos logit contributions
quet+0.172
ful+0.155
bly+0.150
kin+0.147
paste+0.138
Neg logit contributions
 One-0.248
 challenged-0.228
 asks-0.224
 dare-0.217
 disagreed-0.216
Token:
Token ID25
Feature activation+3.12e-01

Pos logit contributions
quet+0.132
bly+0.113
aid+0.109
paste+0.108
ful+0.107
Neg logit contributions
 One-0.105
 challenged-0.104
 asks-0.102
 upon-0.100
 disagreed-0.100
Token The
Token ID383
Feature activation-1.16e-01

Pos logit contributions
rolling+0.011
 challenged+0.010
 disagreed+0.010
 favorites+0.010
 Whoever+0.010
Neg logit contributions
ful-0.010
quet-0.010
 fairy-0.008
ment-0.008
 from-0.008
Token story
Token ID1621
Feature activation-4.42e+00

Pos logit contributions
quet+0.004
ful+0.004
bly+0.003
ment+0.003
paste+0.003
Neg logit contributions
 challenged-0.004
rolling-0.004
 One-0.004
 favorites-0.004
 disagreed-0.004
Token teaches
Token ID17324
Feature activation-2.45e+00

Pos logit contributions
quet+0.162
ful+0.160
bly+0.142
ment+0.141
aid+0.135
Neg logit contributions
 Whoever-0.129
 challenged-0.128
 disagreed-0.123
 asks-0.121
rolling-0.121
Token kitchen
Token ID9592
Feature activation-2.33e+00

Pos logit contributions
quet+0.032
ful+0.031
bly+0.028
kin+0.028
ment+0.028
Neg logit contributions
rolling-0.016
 asks-0.016
 One-0.016
 challenged-0.016
 bank-0.015
Token.
Token ID13
Feature activation-8.67e-01

Pos logit contributions
quet+0.078
ful+0.070
bly+0.067
aid+0.066
kin+0.064
Neg logit contributions
 challenged-0.077
rolling-0.076
 disagreed-0.076
 One-0.074
 Whoever-0.074
Token He
Token ID679
Feature activation+6.06e-01

Pos logit contributions
quet+0.034
ful+0.034
ment+0.030
 fairy+0.030
bly+0.030
Neg logit contributions
rolling-0.023
 challenged-0.022
 Whoever-0.021
 disagreed-0.021
 favorites-0.020
Token sees
Token ID7224
Feature activation-1.09e+00

Pos logit contributions
rolling+0.018
 challenged+0.016
 Whoever+0.016
 disagreed+0.016
 favorites+0.015
Neg logit contributions
ful-0.021
quet-0.020
 fairy-0.020
sm-0.019
 fore-0.019
Token the
Token ID262
Feature activation-3.33e-01

Pos logit contributions
quet+0.044
kin+0.036
ful+0.035
bly+0.035
web+0.033
Neg logit contributions
 asks-0.032
 challenged-0.031
 One-0.031
 padd-0.031
 Whoever-0.030
Token mess
Token ID2085
Feature activation-6.23e+00

Pos logit contributions
quet+0.011
ful+0.011
bly+0.009
ment+0.009
aid+0.009
Neg logit contributions
 challenged-0.011
rolling-0.011
 disagreed-0.010
 favorites-0.010
 padd-0.010
Token.
Token ID13
Feature activation-9.75e-01

Pos logit contributions
quet+0.249
 Tooth+0.231
igsaw+0.227
aid+0.227
 relief+0.226
Neg logit contributions
 Whoever-0.151
 lined-0.142
 wanted-0.140
 wants-0.135
 disagreed-0.135
Token He
Token ID679
Feature activation+6.90e-01

Pos logit contributions
quet+0.043
ful+0.042
bly+0.038
kin+0.037
 fairy+0.037
Neg logit contributions
rolling-0.022
 challenged-0.021
 Whoever-0.021
 One-0.021
 disagreed-0.020
Token sees
Token ID7224
Feature activation-1.13e+00

Pos logit contributions
rolling+0.020
 challenged+0.019
 Whoever+0.018
 disagreed+0.018
 favorites+0.018
Neg logit contributions
ful-0.025
quet-0.025
 fairy-0.023
ment-0.022
sm-0.022
Token Tom
Token ID4186
Feature activation-1.51e-01

Pos logit contributions
quet+0.047
kin+0.038
ful+0.037
bly+0.037
ross+0.035
Neg logit contributions
 asks-0.032
 padd-0.031
 One-0.031
 challenged-0.030
 Whoever-0.030
Token with
Token ID351
Feature activation+7.93e-01

Pos logit contributions
quet+0.005
ful+0.005
ment+0.005
kin+0.005
bly+0.004
Neg logit contributions
rolling-0.005
 challenged-0.005
 Whoever-0.005
 disagreed-0.005
 argued-0.004
Token bathroom
Token ID12436
Feature activation-1.01e+00

Pos logit contributions
quet+0.007
ful+0.006
ment+0.006
bly+0.006
aid+0.006
Neg logit contributions
 bank-0.004
rolling-0.004
 challenged-0.004
 padd-0.004
 banks-0.004
Token.
Token ID13
Feature activation-3.43e-01

Pos logit contributions
quet+0.026
ful+0.024
bly+0.021
ment+0.020
kin+0.020
Neg logit contributions
rolling-0.041
 challenged-0.041
 disagreed-0.040
 Whoever-0.039
 One-0.039
Token She
Token ID1375
Feature activation-1.18e+00

Pos logit contributions
quet+0.011
ful+0.011
ment+0.009
bly+0.009
 fairy+0.009
Neg logit contributions
rolling-0.011
 challenged-0.011
 Whoever-0.011
 disagreed-0.011
 favorites-0.010
Token sees
Token ID7224
Feature activation-1.08e+00

Pos logit contributions
ful+0.046
quet+0.045
 fairy+0.043
sm+0.041
ment+0.041
Neg logit contributions
rolling-0.030
 challenged-0.028
 disagreed-0.027
 Whoever-0.027
 favorites-0.027
Token the
Token ID262
Feature activation-9.48e-02

Pos logit contributions
quet+0.042
kin+0.034
bly+0.032
ful+0.032
ross+0.032
Neg logit contributions
 padd-0.032
 asks-0.032
 challenged-0.031
 Whoever-0.031
 One-0.031
Token mess
Token ID2085
Feature activation-6.23e+00

Pos logit contributions
quet+0.003
ful+0.003
ment+0.003
bly+0.003
aid+0.003
Neg logit contributions
rolling-0.003
 challenged-0.003
 disagreed-0.003
 padd-0.003
 favorites-0.003
Token and
Token ID290
Feature activation-1.43e-01

Pos logit contributions
quet+0.260
 Tooth+0.254
igsaw+0.244
 relief+0.242
ment+0.242
Neg logit contributions
 Whoever-0.134
 disagreed-0.120
 lined-0.120
 wanted-0.118
 pointed-0.110
Token the
Token ID262
Feature activation+4.30e-01

Pos logit contributions
quet+0.005
ful+0.004
kin+0.004
bly+0.004
ment+0.004
Neg logit contributions
 challenged-0.005
rolling-0.005
 Whoever-0.005
 asks-0.005
 disagreed-0.005
Token ring
Token ID5858
Feature activation+9.17e-01

Pos logit contributions
rolling+0.013
 challenged+0.013
 disagreed+0.012
 padd+0.012
 favorites+0.012
Neg logit contributions
quet-0.016
ful-0.015
ment-0.013
bly-0.013
aid-0.012
Token.
Token ID13
Feature activation-5.92e-01

Pos logit contributions
 disagreed+0.029
 Whoever+0.029
rolling+0.029
 challenged+0.028
outed+0.028
Neg logit contributions
quet-0.030
ful-0.028
 queens-0.026
 Tooth-0.026
bly-0.025
Token She
Token ID1375
Feature activation-9.32e-01

Pos logit contributions
quet+0.021
ful+0.020
bly+0.018
kin+0.018
ment+0.017
Neg logit contributions
rolling-0.019
 challenged-0.018
 Whoever-0.018
 One-0.017
 disagreed-0.017
Token She
Token ID1375
Feature activation-1.53e+00

Pos logit contributions
ful+0.008
quet+0.008
ment+0.007
 fairy+0.007
cake+0.006
Neg logit contributions
rolling-0.006
 challenged-0.006
 disagreed-0.006
inters-0.006
 favorites-0.006
Token felt
Token ID2936
Feature activation-1.54e+00

Pos logit contributions
ful+0.062
quet+0.062
 fairy+0.054
kin+0.054
ment+0.054
Neg logit contributions
rolling-0.039
 challenged-0.037
 Whoever-0.036
 favorites-0.034
 disagreed-0.034
Token a
Token ID257
Feature activation-2.04e-01

Pos logit contributions
quet+0.065
ful+0.058
web+0.054
ment+0.053
kin+0.052
Neg logit contributions
 challenged-0.039
 One-0.039
 disagreed-0.039
rolling-0.038
 intrigued-0.038
Token wave
Token ID6769
Feature activation+3.49e-01

Pos logit contributions
quet+0.008
ful+0.008
ment+0.007
web+0.007
ers+0.007
Neg logit contributions
rolling-0.006
 padd-0.005
 disagreed-0.005
 One-0.005
 intrigued-0.005
Token of
Token ID286
Feature activation-2.80e+00

Pos logit contributions
 challenged+0.012
 disagreed+0.011
rolling+0.011
 Whoever+0.011
 begged+0.011
Neg logit contributions
quet-0.011
ment-0.009
ful-0.009
igsaw-0.009
 queens-0.009
Token relief
Token ID8259
Feature activation-6.23e+00

Pos logit contributions
quet+0.098
ful+0.087
web+0.078
cake+0.078
paste+0.075
Neg logit contributions
 padd-0.095
 begged-0.092
 asks-0.088
 challenged-0.087
 One-0.087
Token as
Token ID355
Feature activation+3.20e+00

Pos logit contributions
quet+0.231
aid+0.219
igsaw+0.218
bly+0.199
 queens+0.197
Neg logit contributions
 padd-0.167
 disagreed-0.165
 challenged-0.165
 begged-0.161
 wanted-0.158
Token their
Token ID511
Feature activation-2.58e-01

Pos logit contributions
rolling+0.079
 challenged+0.076
 intrigued+0.076
 disagreed+0.076
 upon+0.076
Neg logit contributions
quet-0.129
cake-0.106
ful-0.106
aid-0.104
igsaw-0.100
Token day
Token ID1110
Feature activation-1.04e+00

Pos logit contributions
quet+0.009
ful+0.009
bly+0.008
web+0.008
ment+0.008
Neg logit contributions
rolling-0.008
 disagreed-0.007
 favorites-0.007
 challenged-0.007
 padd-0.007
Token continued
Token ID3767
Feature activation+2.91e-01

Pos logit contributions
quet+0.040
ful+0.037
igsaw+0.036
aid+0.035
bly+0.035
Neg logit contributions
 disagreed-0.029
 challenged-0.029
 wanted-0.028
 begged-0.028
 inc-0.027
Token.
Token ID13
Feature activation-7.61e-01

Pos logit contributions
rolling+0.009
 challenged+0.008
 One+0.008
 padd+0.008
 begged+0.008
Neg logit contributions
quet-0.011
aid-0.010
ful-0.010
kin-0.009
cake-0.009
Token.
Token ID13
Feature activation-6.21e-01

Pos logit contributions
rolling+0.054
 disagreed+0.054
 challenged+0.053
 Whoever+0.052
 One+0.052
Neg logit contributions
quet-0.041
ful-0.037
bly-0.035
aid-0.033
kin-0.033
Token She
Token ID1375
Feature activation-1.74e+00

Pos logit contributions
quet+0.023
ful+0.023
ment+0.020
bly+0.019
 fairy+0.019
Neg logit contributions
rolling-0.018
 challenged-0.018
 Whoever-0.017
 disagreed-0.017
 favorites-0.016
Token wiped
Token ID21122
Feature activation-9.45e-01

Pos logit contributions
ful+0.066
quet+0.065
bly+0.057
 fairy+0.056
kin+0.056
Neg logit contributions
rolling-0.050
 challenged-0.048
 Whoever-0.047
 disagreed-0.046
 favorites-0.045
Token up
Token ID510
Feature activation+2.58e+00

Pos logit contributions
quet+0.034
ful+0.029
kin+0.027
 queens+0.026
cake+0.025
Neg logit contributions
 One-0.032
 Whoever-0.031
 Suddenly-0.031
 disagreed-0.030
 asks-0.030
Token the
Token ID262
Feature activation-1.22e+00

Pos logit contributions
 padd+0.064
 One+0.063
 inc+0.063
 argued+0.061
 knees+0.060
Neg logit contributions
quet-0.108
ful-0.094
ross-0.091
 bless-0.090
ative-0.090
Token mess
Token ID2085
Feature activation-6.23e+00

Pos logit contributions
ful+0.043
quet+0.043
bly+0.037
ment+0.036
ative+0.035
Neg logit contributions
 disagreed-0.036
rolling-0.036
 challenged-0.035
 favorites-0.035
outed-0.034
Token and
Token ID290
Feature activation-6.79e-01

Pos logit contributions
quet+0.252
 Tooth+0.244
 bless+0.243
 queens+0.228
aid+0.226
Neg logit contributions
 Whoever-0.125
 lined-0.122
 disagreed-0.117
 wanted-0.107
 argued-0.101
Token made
Token ID925
Feature activation-1.58e-01

Pos logit contributions
quet+0.030
ful+0.026
igsaw+0.025
bly+0.024
ative+0.023
Neg logit contributions
 challenged-0.017
 disagreed-0.017
 padd-0.017
 begged-0.016
 argued-0.016
Token everything
Token ID2279
Feature activation-1.75e+00

Pos logit contributions
quet+0.005
ful+0.005
kin+0.004
bly+0.004
 fairy+0.004
Neg logit contributions
rolling-0.005
 Whoever-0.005
 challenged-0.005
 disagreed-0.005
 padd-0.005
Token nice
Token ID3621
Feature activation-1.54e+00

Pos logit contributions
quet+0.063
ful+0.056
coon+0.055
 Fairy+0.054
 Tooth+0.054
Neg logit contributions
 disagreed-0.050
 favorite-0.049
 unsure-0.047
 wanted-0.046
rolling-0.046
Token and
Token ID290
Feature activation-3.18e-01

Pos logit contributions
quet+0.064
aid+0.060
bly+0.058
 bless+0.057
 Tooth+0.056
Neg logit contributions
 disagreed-0.040
 wanted-0.037
 argued-0.034
 might-0.033
 refused-0.033
Token happened
Token ID3022
Feature activation+1.21e+00

Pos logit contributions
 challenged+0.016
 disagreed+0.015
 One+0.015
 Whoever+0.014
rolling+0.014
Neg logit contributions
quet-0.029
ful-0.028
ment-0.026
bly-0.026
aid-0.024
Token.
Token ID13
Feature activation-8.23e-01

Pos logit contributions
rolling+0.060
 challenged+0.058
 Whoever+0.058
 favorites+0.057
Whoever+0.057
Neg logit contributions
ful-0.022
quet-0.018
 from-0.014
ment-0.013
 fairy-0.013
Token She
Token ID1375
Feature activation-1.93e+00

Pos logit contributions
ful+0.020
quet+0.020
 fairy+0.017
cake+0.016
paste+0.015
Neg logit contributions
rolling-0.033
 challenged-0.032
 padd-0.031
 favorites-0.031
 disagreed-0.031
Token saw
Token ID2497
Feature activation-5.47e-01

Pos logit contributions
ful+0.078
 fairy+0.072
quet+0.071
sm+0.067
 from+0.067
Neg logit contributions
rolling-0.048
 favorites-0.047
inters-0.047
Whoever-0.045
 Whoever-0.045
Token the
Token ID262
Feature activation-4.29e-01

Pos logit contributions
quet+0.022
kin+0.020
ment+0.020
ful+0.019
bly+0.019
Neg logit contributions
 challenged-0.015
 Whoever-0.015
 One-0.015
 asks-0.014
 Int-0.014
Token mess
Token ID2085
Feature activation-6.19e+00

Pos logit contributions
quet+0.014
ful+0.014
bly+0.012
ment+0.012
 fairy+0.012
Neg logit contributions
rolling-0.015
 challenged-0.014
 Whoever-0.014
 disagreed-0.014
 favorites-0.013
Token and
Token ID290
Feature activation-5.65e-01

Pos logit contributions
quet+0.275
ment+0.261
 relief+0.257
aid+0.250
 Tooth+0.250
Neg logit contributions
 Whoever-0.125
 lined-0.116
 disagreed-0.115
 wanted-0.111
 challenged-0.102
Token said
Token ID531
Feature activation+1.21e+00

Pos logit contributions
quet+0.026
ful+0.026
bly+0.023
ment+0.023
kin+0.023
Neg logit contributions
rolling-0.011
 challenged-0.011
 Whoever-0.010
 disagreed-0.010
 favorites-0.009
Token,
Token ID11
Feature activation-9.77e-01

Pos logit contributions
 disagreed+0.043
 challenged+0.042
rolling+0.042
 One+0.041
 begged+0.040
Neg logit contributions
quet-0.039
ful-0.033
bly-0.032
kin-0.032
web-0.031
Token "
Token ID366
Feature activation+9.02e-01

Pos logit contributions
ful+0.039
quet+0.038
 fairy+0.034
bly+0.033
ment+0.033
Neg logit contributions
rolling-0.025
 challenged-0.024
 favorites-0.024
 disagreed-0.023
 Whoever-0.023
TokenOh
Token ID5812
Feature activation-1.71e+00

Pos logit contributions
rolling+0.035
 challenged+0.035
 Whoever+0.034
 disagreed+0.034
 One+0.033
Neg logit contributions
quet-0.025
ful-0.024
ment-0.020
kin-0.020
bly-0.019
Token quickly
Token ID2952
Feature activation-1.33e+00

Pos logit contributions
quet+0.022
ful+0.022
kin+0.019
bly+0.018
ment+0.018
Neg logit contributions
rolling-0.025
 challenged-0.025
 Whoever-0.024
 disagreed-0.024
 One-0.023
Token,
Token ID11
Feature activation-8.01e-01

Pos logit contributions
quet+0.046
ful+0.042
kin+0.040
bly+0.039
web+0.038
Neg logit contributions
 challenged-0.044
rolling-0.044
 disagreed-0.042
 begged-0.041
 padd-0.041
Token and
Token ID290
Feature activation-1.41e+00

Pos logit contributions
ful+0.026
quet+0.026
 fairy+0.022
bly+0.022
ment+0.022
Neg logit contributions
rolling-0.027
 challenged-0.026
 Whoever-0.025
 disagreed-0.025
 favorites-0.025
Token saw
Token ID2497
Feature activation-4.01e-01

Pos logit contributions
quet+0.050
ful+0.046
kin+0.041
bly+0.040
ment+0.040
Neg logit contributions
 challenged-0.045
rolling-0.045
 disagreed-0.043
 Whoever-0.042
 begged-0.042
Token the
Token ID262
Feature activation-2.96e-01

Pos logit contributions
quet+0.012
ful+0.010
kin+0.009
bly+0.008
igsaw+0.008
Neg logit contributions
 challenged-0.015
 disagreed-0.015
 padd-0.015
 One-0.014
 Whoever-0.014
Token mess
Token ID2085
Feature activation-6.18e+00

Pos logit contributions
ful+0.010
quet+0.010
bly+0.009
ment+0.009
aid+0.008
Neg logit contributions
rolling-0.009
 challenged-0.009
 disagreed-0.009
 favorites-0.009
 padd-0.009
Token.
Token ID13
Feature activation-6.26e-01

Pos logit contributions
quet+0.215
aid+0.200
 Tooth+0.188
ment+0.184
 relief+0.182
Neg logit contributions
 disagreed-0.185
 wanted-0.174
 lined-0.171
 challenged-0.171
 Whoever-0.169
Token They
Token ID1119
Feature activation+6.29e-02

Pos logit contributions
quet+0.022
ful+0.022
ment+0.019
 fairy+0.019
bly+0.019
Neg logit contributions
rolling-0.019
 challenged-0.018
 disagreed-0.018
 Whoever-0.017
 favorites-0.017
Token were
Token ID547
Feature activation-1.50e+00

Pos logit contributions
rolling+0.002
 challenged+0.002
 Whoever+0.002
 favorites+0.002
 disagreed+0.001
Neg logit contributions
ful-0.002
quet-0.002
 fairy-0.002
bly-0.002
kin-0.002
Token worried
Token ID7960
Feature activation+9.31e-01

Pos logit contributions
quet+0.061
ful+0.058
ment+0.053
 fairy+0.051
kin+0.051
Neg logit contributions
rolling-0.041
 challenged-0.039
 disagreed-0.038
 Whoever-0.037
 begged-0.036
Token,
Token ID11
Feature activation-2.62e-01

Pos logit contributions
 One+0.028
 disagreed+0.027
 Whoever+0.027
 challenged+0.026
 argued+0.025
Neg logit contributions
quet-0.036
kin-0.031
aid-0.030
bly-0.030
 fairy-0.029
Token quickly
Token ID2952
Feature activation-1.53e+00

Pos logit contributions
quet+0.117
bly+0.098
kin+0.098
ful+0.097
igsaw+0.096
Neg logit contributions
 One-0.056
 begged-0.055
 challenged-0.054
 asks-0.054
 disagreed-0.053
Token.
Token ID13
Feature activation-1.10e+00

Pos logit contributions
quet+0.057
bly+0.052
kin+0.050
ful+0.048
 Fairy+0.045
Neg logit contributions
 One-0.048
 begged-0.047
 challenged-0.047
 asks-0.046
 disagreed-0.045
Token She
Token ID1375
Feature activation-1.43e+00

Pos logit contributions
quet+0.037
ful+0.037
bly+0.031
kin+0.031
ment+0.031
Neg logit contributions
rolling-0.037
 challenged-0.035
 Whoever-0.035
 disagreed-0.034
 One-0.034
Token sees
Token ID7224
Feature activation-9.74e-01

Pos logit contributions
ful+0.049
quet+0.049
 fairy+0.044
sm+0.042
ment+0.042
Neg logit contributions
rolling-0.045
 challenged-0.042
 Whoever-0.041
 disagreed-0.041
 favorites-0.040
Token the
Token ID262
Feature activation-1.06e-01

Pos logit contributions
quet+0.045
kin+0.039
bly+0.038
ful+0.038
 Fairy+0.037
Neg logit contributions
 asks-0.023
 Whoever-0.022
 One-0.022
 challenged-0.021
 padd-0.021
Token mess
Token ID2085
Feature activation-6.18e+00

Pos logit contributions
quet+0.003
ful+0.003
ment+0.003
bly+0.003
kin+0.003
Neg logit contributions
rolling-0.004
 challenged-0.004
 disagreed-0.004
 Whoever-0.004
 favorites-0.003
Token and
Token ID290
Feature activation-3.75e-01

Pos logit contributions
quet+0.250
igsaw+0.237
 Tooth+0.234
 Fairy+0.228
 relief+0.221
Neg logit contributions
 Whoever-0.142
 wants-0.133
 wanted-0.131
 asks-0.128
outed-0.125
Token Tom
Token ID4186
Feature activation+3.64e-01

Pos logit contributions
quet+0.014
kin+0.012
ful+0.012
bly+0.011
igsaw+0.011
Neg logit contributions
 asks-0.013
 padd-0.012
 challenged-0.012
 wants-0.012
 begged-0.012
Token's
Token ID338
Feature activation+1.55e+00

Pos logit contributions
rolling+0.012
 challenged+0.012
 disagreed+0.012
 Whoever+0.012
 begged+0.011
Neg logit contributions
quet-0.012
ful-0.012
kin-0.010
bly-0.010
ment-0.010
Token tears
Token ID10953
Feature activation-3.16e+00

Pos logit contributions
rolling+0.048
 challenged+0.046
 disagreed+0.045
 Whoever+0.044
 favorites+0.044
Neg logit contributions
quet-0.056
ful-0.054
bly-0.047
ment-0.046
 fairy-0.045
Token.
Token ID13
Feature activation-8.69e-01

Pos logit contributions
quet+0.136
igsaw+0.122
 queens+0.112
 Tooth+0.110
 Fairy+0.110
Neg logit contributions
 argued-0.076
 One-0.076
 disagreed-0.076
 padd-0.075
 challenged-0.075
Token.
Token ID13
Feature activation-3.12e-01

Pos logit contributions
 disagreed+0.015
 challenged+0.014
 wanted+0.014
 begged+0.013
 argued+0.013
Neg logit contributions
quet-0.018
bly-0.015
aid-0.015
ful-0.014
kin-0.014
Token They
Token ID1119
Feature activation+1.44e-01

Pos logit contributions
quet+0.011
ful+0.011
 fairy+0.009
ment+0.009
cake+0.009
Neg logit contributions
rolling-0.010
 challenged-0.009
 disagreed-0.009
 Whoever-0.009
 favorites-0.009
Token cleaned
Token ID20750
Feature activation-3.70e+00

Pos logit contributions
 disagreed+0.004
 challenged+0.004
rolling+0.004
 argued+0.003
 begged+0.003
Neg logit contributions
quet-0.006
ful-0.006
ment-0.005
bly-0.005
kin-0.005
Token up
Token ID510
Feature activation+2.82e+00

Pos logit contributions
quet+0.186
ful+0.167
kin+0.161
aid+0.156
 fairy+0.155
Neg logit contributions
 Whoever-0.070
 disagreed-0.066
 favorites-0.065
 challenged-0.064
 Suddenly-0.063
Token the
Token ID262
Feature activation-1.37e+00

Pos logit contributions
 disagreed+0.079
 favorites+0.074
 argued+0.073
 favorite+0.072
 padd+0.070
Neg logit contributions
quet-0.116
bly-0.094
ful-0.093
web-0.093
 ray-0.089
Token mess
Token ID2085
Feature activation-6.17e+00

Pos logit contributions
quet+0.049
ful+0.049
ment+0.042
bly+0.042
kin+0.039
Neg logit contributions
rolling-0.042
 challenged-0.041
 disagreed-0.040
 favorites-0.040
 Whoever-0.039
Token together
Token ID1978
Feature activation+7.84e-01

Pos logit contributions
quet+0.265
 Tooth+0.248
aid+0.232
 queens+0.228
 bless+0.225
Neg logit contributions
 disagreed-0.134
 Whoever-0.133
 argued-0.118
 lined-0.117
 refused-0.112
Token and
Token ID290
Feature activation-1.50e-01

Pos logit contributions
 disagreed+0.023
 challenged+0.020
 argued+0.020
 wanted+0.020
 begged+0.019
Neg logit contributions
quet-0.031
bly-0.028
aid-0.027
 queens-0.027
ful-0.026
Token even
Token ID772
Feature activation+2.26e-02

Pos logit contributions
quet+0.007
ful+0.006
bly+0.006
igsaw+0.005
kin+0.005
Neg logit contributions
 disagreed-0.004
 challenged-0.004
 begged-0.004
 argued-0.004
 wanted-0.003
Token made
Token ID925
Feature activation+3.33e-01

Pos logit contributions
 challenged+0.001
 disagreed+0.001
 begged+0.001
 argued+0.001
 asks+0.000
Neg logit contributions
quet-0.001
ful-0.001
bly-0.001
ment-0.001
igsaw-0.001
Token some
Token ID617
Feature activation+3.02e-01

Pos logit contributions
rolling+0.011
 challenged+0.011
 Whoever+0.011
 disagreed+0.010
 favorites+0.010
Neg logit contributions
quet-0.011
ful-0.010
kin-0.009
 fairy-0.009
bly-0.009
Token.
Token ID13
Feature activation+7.33e-01

Pos logit contributions
rolling+0.019
 challenged+0.017
 Whoever+0.017
Whoever+0.017
rolled+0.016
Neg logit contributions
ful-0.010
 from-0.009
 fairy-0.008
quet-0.008
 ray-0.008
Token They
Token ID1119
Feature activation+4.20e-01

Pos logit contributions
rolling+0.021
 challenged+0.021
 disagreed+0.020
 favorites+0.019
 Whoever+0.019
Neg logit contributions
ful-0.027
quet-0.026
 fairy-0.023
ment-0.023
bly-0.022
Token look
Token ID804
Feature activation-1.89e+00

Pos logit contributions
rolling+0.011
 challenged+0.011
 Whoever+0.010
 disagreed+0.010
 favorites+0.010
Neg logit contributions
ful-0.017
quet-0.017
 fairy-0.014
bly-0.014
kin-0.014
Token at
Token ID379
Feature activation+1.47e+00

Pos logit contributions
ful+0.065
quet+0.061
 fairy+0.055
bly+0.055
 from+0.054
Neg logit contributions
rolling-0.063
 challenged-0.058
 Whoever-0.057
 disagreed-0.055
Whoever-0.054
Token the
Token ID262
Feature activation-4.34e-01

Pos logit contributions
 challenged+0.049
 Whoever+0.048
rolling+0.048
 One+0.048
 favorites+0.047
Neg logit contributions
quet-0.050
ful-0.045
kin-0.041
bly-0.040
paste-0.040
Token mess
Token ID2085
Feature activation-6.17e+00

Pos logit contributions
quet+0.015
ful+0.015
bly+0.013
ment+0.013
aid+0.012
Neg logit contributions
rolling-0.013
 challenged-0.013
 favorites-0.013
 disagreed-0.013
 One-0.013
Token of
Token ID286
Feature activation-3.21e+00

Pos logit contributions
quet+0.231
aid+0.211
 Tooth+0.205
igsaw+0.200
 relief+0.199
Neg logit contributions
 disagreed-0.169
 Whoever-0.164
 wanted-0.155
 lined-0.155
 argued-0.152
Token blocks
Token ID7021
Feature activation-9.61e-01

Pos logit contributions
quet+0.117
ful+0.099
ative+0.094
paste+0.093
ment+0.093
Neg logit contributions
 disagreed-0.100
 padd-0.098
 argued-0.098
 favorites-0.098
 begged-0.097
Token.
Token ID13
Feature activation-4.45e-01

Pos logit contributions
quet+0.033
 Fairy+0.029
 Tooth+0.029
bly+0.028
aid+0.028
Neg logit contributions
 disagreed-0.031
 Whoever-0.028
 lined-0.028
 argued-0.028
 challenged-0.027
Token
Token ID198
Feature activation-5.87e-01

Pos logit contributions
ful+0.017
quet+0.017
ment+0.015
 fairy+0.015
bly+0.015
Neg logit contributions
rolling-0.012
 challenged-0.012
 disagreed-0.011
 Whoever-0.011
 favorites-0.011
Token
Token ID198
Feature activation-7.98e-01

Pos logit contributions
ful+0.020
quet+0.018
sm+0.016
ment+0.016
aid+0.016
Neg logit contributions
rolling-0.020
 disagreed-0.020
 challenged-0.020
 favorites-0.019
 argued-0.019
Token asked
Token ID1965
Feature activation+2.28e+00

Pos logit contributions
ful+0.062
 fairy+0.059
quet+0.054
web+0.054
kin+0.053
Neg logit contributions
rolling-0.040
 Whoever-0.038
 exception-0.036
Whoever-0.036
inters-0.035
Token herself
Token ID5223
Feature activation-2.01e+00

Pos logit contributions
rolling+0.096
 challenged+0.096
 One+0.093
 disagreed+0.093
 Whoever+0.091
Neg logit contributions
quet-0.056
ful-0.051
bly-0.044
kin-0.043
aid-0.040
Token why
Token ID1521
Feature activation+1.76e+00

Pos logit contributions
quet+0.063
ful+0.060
bly+0.053
kin+0.051
ment+0.051
Neg logit contributions
rolling-0.071
 challenged-0.069
 Whoever-0.067
 disagreed-0.066
 One-0.066
Token it
Token ID340
Feature activation+1.41e+00

Pos logit contributions
 disagreed+0.049
 One+0.049
 challenged+0.049
rolling+0.049
 inc+0.046
Neg logit contributions
quet-0.067
ful-0.060
bly-0.058
aid-0.056
cake-0.055
Token felt
Token ID2936
Feature activation+8.72e-01

Pos logit contributions
rolling+0.044
 challenged+0.041
 Whoever+0.041
 disagreed+0.039
 favorites+0.039
Neg logit contributions
quet-0.050
ful-0.049
 fairy-0.043
kin-0.042
bly-0.042
Token so
Token ID523
Feature activation+3.20e+00

Pos logit contributions
 challenged+0.026
 disagreed+0.026
rolling+0.025
 One+0.025
 intrigued+0.024
Neg logit contributions
quet-0.034
ful-0.030
ment-0.027
bly-0.027
aid-0.027
Token quiet
Token ID5897
Feature activation+2.74e-01

Pos logit contributions
rolling+0.098
 One+0.094
 disagreed+0.093
 challenged+0.093
 Whoever+0.088
Neg logit contributions
quet-0.120
ful-0.112
bly-0.098
aid-0.097
ment-0.096
Token and
Token ID290
Feature activation-1.55e-01

Pos logit contributions
 disagreed+0.008
 challenged+0.008
 One+0.008
 wanted+0.008
 Whoever+0.007
Neg logit contributions
quet-0.010
aid-0.009
bly-0.008
ful-0.008
igsaw-0.008
Token empty
Token ID6565
Feature activation-9.40e-01

Pos logit contributions
quet+0.005
ful+0.005
bly+0.004
ment+0.004
 fairy+0.004
Neg logit contributions
rolling-0.005
 challenged-0.005
 Whoever-0.005
 disagreed-0.005
 favorites-0.005
Token.
Token ID13
Feature activation-2.14e-01

Pos logit contributions
quet+0.033
aid+0.032
ful+0.030
bly+0.030
kin+0.027
Neg logit contributions
 One-0.028
 disagreed-0.027
 challenged-0.027
 exception-0.026
 wanted-0.026
Token
Token ID198
Feature activation-7.44e-01

Pos logit contributions
ful+0.006
ment+0.005
quet+0.005
 Fairy+0.005
 fairy+0.005
Neg logit contributions
 exception-0.007
rolling-0.007
ounded-0.007
 padd-0.007
inters-0.007
Token to
Token ID284
Feature activation+9.23e-01

Pos logit contributions
quet+0.057
 relief+0.051
igsaw+0.050
 queens+0.050
 forgiveness+0.050
Neg logit contributions
 padd-0.041
 favorite-0.040
inters-0.039
 lined-0.039
 disagreed-0.039
Token make
Token ID787
Feature activation+3.50e-01

Pos logit contributions
 One+0.023
 begged+0.022
 padd+0.022
 beg+0.021
 challenged+0.021
Neg logit contributions
quet-0.038
igsaw-0.035
ative-0.034
ful-0.034
ment-0.034
Token the
Token ID262
Feature activation+5.20e-01

Pos logit contributions
 challenged+0.010
 Whoever+0.010
 padd+0.010
rolling+0.010
 One+0.009
Neg logit contributions
quet-0.013
ful-0.012
kin-0.011
 ray-0.011
web-0.011
Token boo
Token ID22513
Feature activation-6.27e-01

Pos logit contributions
 challenged+0.015
rolling+0.015
 favorites+0.015
 One+0.015
 disagreed+0.015
Neg logit contributions
quet-0.019
ful-0.019
bly-0.016
ment-0.016
kin-0.015
Token-
Token ID12
Feature activation-2.36e-01

Pos logit contributions
quet+0.017
ful+0.015
igsaw+0.015
 fore+0.013
 lav+0.013
Neg logit contributions
rolling-0.023
 challenged-0.022
 exception-0.022
 padd-0.021
 disagreed-0.021
Tokenbo
Token ID2127
Feature activation+3.96e+00

Pos logit contributions
quet+0.008
ful+0.007
kin+0.007
igsaw+0.006
 ray+0.006
Neg logit contributions
rolling-0.008
 challenged-0.008
 disagreed-0.007
 Whoever-0.007
inters-0.007
Tokeno
Token ID78
Feature activation+3.68e+00

Pos logit contributions
rolling+0.088
 challenged+0.085
 disagreed+0.084
 Whoever+0.080
 favorites+0.080
Neg logit contributions
quet-0.173
ful-0.164
bly-0.152
 fairy-0.149
kin-0.148
Token go
Token ID467
Feature activation-5.88e-01

Pos logit contributions
rolling+0.105
 padd+0.102
 disagreed+0.102
 challenged+0.102
 One+0.101
Neg logit contributions
quet-0.128
ful-0.114
igsaw-0.111
 queens-0.105
kin-0.103
Token away
Token ID1497
Feature activation-1.25e+00

Pos logit contributions
quet+0.019
ful+0.017
kin+0.016
ment+0.016
 queens+0.015
Neg logit contributions
rolling-0.021
 One-0.020
 challenged-0.020
 Whoever-0.020
 exception-0.019
Token.
Token ID13
Feature activation-3.22e-01

Pos logit contributions
quet+0.047
aid+0.044
kin+0.039
bly+0.039
igsaw+0.039
Neg logit contributions
 One-0.035
 exception-0.033
 padd-0.033
 challenged-0.032
rolling-0.031
Token Tim
Token ID5045
Feature activation+6.23e-01

Pos logit contributions
ful+0.007
quet+0.007
cake+0.007
 from+0.006
 accidents+0.006
Neg logit contributions
rolling-0.013
 challenged-0.012
 favorites-0.012
 disagreed-0.012
Whoever-0.012
Token Ben
Token ID3932
Feature activation+7.97e-01

Pos logit contributions
rolling+0.090
 challenged+0.085
 favorites+0.082
 Whoever+0.082
 disagreed+0.081
Neg logit contributions
ful-0.099
quet-0.095
 fairy-0.085
ment-0.085
bly-0.082
Token said
Token ID531
Feature activation+2.16e+00

Pos logit contributions
rolling+0.025
 favorites+0.021
inters+0.021
rolled+0.020
 Whoever+0.020
Neg logit contributions
ful-0.030
 fairy-0.028
 from-0.027
quet-0.026
 fore-0.025
Token.
Token ID13
Feature activation-2.73e-01

Pos logit contributions
rolling+0.073
 One+0.073
 challenged+0.073
 disagreed+0.071
 Whoever+0.070
Neg logit contributions
quet-0.070
 queens-0.058
ful-0.056
web-0.055
igsaw-0.054
Token He
Token ID679
Feature activation+1.10e+00

Pos logit contributions
quet+0.012
ful+0.012
bly+0.010
kin+0.010
ment+0.010
Neg logit contributions
rolling-0.006
 challenged-0.006
 Whoever-0.006
 One-0.006
 disagreed-0.006
Token was
Token ID373
Feature activation+4.52e-01

Pos logit contributions
rolling+0.033
 favorites+0.029
 Whoever+0.029
inters+0.028
 exception+0.028
Neg logit contributions
ful-0.041
quet-0.040
 fairy-0.037
kin-0.036
aid-0.036
Token scared
Token ID12008
Feature activation+3.66e+00

Pos logit contributions
rolling+0.014
 Whoever+0.014
 challenged+0.013
 favorites+0.013
 disagreed+0.013
Neg logit contributions
ful-0.016
quet-0.015
bly-0.014
 fairy-0.013
kin-0.013
Token and
Token ID290
Feature activation-4.34e-01

Pos logit contributions
 One+0.111
 Whoever+0.101
 begged+0.100
 challenged+0.100
 asks+0.099
Neg logit contributions
quet-0.148
kin-0.129
igsaw-0.121
bly-0.120
 Fairy-0.115
Token serious
Token ID2726
Feature activation+1.36e-01

Pos logit contributions
ful+0.016
quet+0.015
bly+0.014
kin+0.014
 fairy+0.014
Neg logit contributions
rolling-0.013
 favorites-0.012
 Whoever-0.012
 challenged-0.012
Whoever-0.011
Token.
Token ID13
Feature activation-5.33e-01

Pos logit contributions
 disagreed+0.004
 challenged+0.004
 One+0.004
 begged+0.004
 argued+0.004
Neg logit contributions
quet-0.005
 queens-0.004
bly-0.004
igsaw-0.004
aid-0.004
Token
Token ID198
Feature activation-5.36e-01

Pos logit contributions
ful+0.020
quet+0.020
ment+0.018
bly+0.018
kin+0.018
Neg logit contributions
rolling-0.015
 challenged-0.014
 disagreed-0.014
 Whoever-0.014
 favorites-0.013
Token
Token ID198
Feature activation-8.25e-01

Pos logit contributions
ful+0.019
quet+0.018
ment+0.016
 fairy+0.015
bly+0.015
Neg logit contributions
rolling-0.017
 challenged-0.017
 disagreed-0.017
 favorites-0.016
 argued-0.016
Token soon
Token ID2582
Feature activation-8.69e-01

Pos logit contributions
quet+0.054
ful+0.046
aid+0.046
bly+0.045
kin+0.045
Neg logit contributions
 challenged-0.049
 One-0.048
rolling-0.047
 padd-0.046
 disagreed-0.045
Token.
Token ID13
Feature activation-5.53e-01

Pos logit contributions
quet+0.030
bly+0.027
ful+0.027
aid+0.026
kin+0.026
Neg logit contributions
 challenged-0.029
 One-0.028
 disagreed-0.027
 begged-0.027
 asks-0.027
Token<|endoftext|>
Token ID50256
Feature activation+3.24e-01

Pos logit contributions
ful+0.020
quet+0.020
ment+0.018
 fairy+0.017
 Fairy+0.017
Neg logit contributions
rolling-0.016
 challenged-0.016
 disagreed-0.015
 Whoever-0.014
 argued-0.014
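Every record above uses the same fixed layout: a `Token` line (the token text, with any leading space kept), a `Token ID` line, a `Feature activation` line, and then the listed positive and negative logit contributions. Below is a minimal Python sketch, assuming exactly that layout, that parses such records and ranks tokens by activation. `TokenRecord`, `parse_records`, and `most_negative` are hypothetical names for illustration only. They are not part of any dashboard tool.

```python
import re
from dataclasses import dataclass, field

# Assumed layout: a logit entry is a token string followed directly by a
# signed decimal, e.g. " challenged-0.025". The greedy (.*) leaves the
# trailing signed number to group 2.
_ENTRY_RE = re.compile(r"^(.*)([+-]\d+\.\d+)$")
_ID_RE = re.compile(r"^Token ID(\d+)$")


@dataclass
class TokenRecord:
    token: str                                      # token text, leading space kept
    token_id: int
    activation: float
    pos_logits: list = field(default_factory=list)  # (token, value) pairs
    neg_logits: list = field(default_factory=list)


def parse_records(text):
    """Parse per-token records laid out as in the dump (assumed format)."""
    lines = text.split("\n")
    id_rows = [i for i, line in enumerate(lines) if _ID_RE.match(line)]
    records = []
    for n, k in enumerate(id_rows):
        rec = TokenRecord(
            token=lines[k - 1][len("Token"):],            # line before "Token ID"
            token_id=int(_ID_RE.match(lines[k]).group(1)),
            activation=float(lines[k + 1][len("Feature activation"):]),
        )
        # The body ends at the next record's token line (or the end of text).
        end = id_rows[n + 1] - 1 if n + 1 < len(id_rows) else len(lines)
        bucket = None
        for line in lines[k + 2:end]:
            if line == "Pos logit contributions":
                bucket = rec.pos_logits
            elif line == "Neg logit contributions":
                bucket = rec.neg_logits
            elif bucket is not None:
                m = _ENTRY_RE.match(line)
                if m:
                    bucket.append((m.group(1), float(m.group(2))))
        records.append(rec)
    return records


def most_negative(records, k=3):
    """Return the k records with the lowest feature activation."""
    return sorted(records, key=lambda r: r.activation)[:k]


# Two records copied from the dump, used as a small usage example.
SAMPLE = """Token the
Token ID262
Feature activation-2.96e-01

Pos logit contributions
quet+0.012
ful+0.010
Neg logit contributions
 challenged-0.015
 disagreed-0.015
Token mess
Token ID2085
Feature activation-6.18e+00

Pos logit contributions
ful+0.010
Neg logit contributions
rolling-0.009"""

if __name__ == "__main__":
    for r in most_negative(parse_records(SAMPLE), k=1):
        print(repr(r.token), r.token_id, r.activation)  # prints ' mess' 2085 -6.18
```

Within this section, the most negative activations belong to the ` mess` tokens (about -6.17 to -6.18). This matches the "mess" contexts in the MIN ACTIVATIONS examples.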
TokenOnce
Token ID7454
Feature activation+6.92e-01

Pos logit contributions
rolling+0.008
 challenged+0.007
 Whoever+0.007
 One+0.007
 disagreed+0.007
Neg logit contributions
quet-0.014
ful-0.014
bly-0.012
ment-0.012
kin-0.012
Token,
Token ID11
Feature activation-3.77e-01

Pos logit contributions
 challenged+0.015
rolling+0.015
Whoever+0.015
 Whoever+0.015
 argued+0.014
Neg logit contributions
ful-0.031
kin-0.028
 fairy-0.027
 from-0.027
quet-0.026
Token there
Token ID612
Feature activation+4.13e+00

Pos logit contributions
ful+0.012
kin+0.010
quet+0.010
 from+0.009
bly+0.009
Neg logit contributions
rolling-0.014
 challenged-0.014
inters-0.013
 Whoever-0.013
 disagreed-0.013
Token was
Token ID373
Feature activation+3.41e-01

Pos logit contributions
rolling+0.158
 Whoever+0.146
 challenged+0.146
inters+0.144
 exception+0.140
Neg logit contributions
ful-0.122
quet-0.110
kin-0.101
 fairy-0.098
 from-0.096
Token a
Token ID257
Feature activation+2.88e-01

Pos logit contributions
rolling+0.013
 challenged+0.013
 Whoever+0.013
 One+0.013
 favorites+0.013
Neg logit contributions
quet-0.009
ful-0.009
ment-0.007
bly-0.007
 fairy-0.007
Token reliable
Token ID9314
Feature activation-4.65e-01

Pos logit contributions
rolling+0.009
 challenged+0.009
 Whoever+0.009
 disagreed+0.008
 One+0.008
Neg logit contributions
quet-0.010
ful-0.010
ment-0.009
bly-0.008
kin-0.008
Token start
Token ID923
Feature activation-5.01e-01

Pos logit contributions
quet+0.018
ful+0.018
bly+0.016
ment+0.016
paste+0.015
Neg logit contributions
 challenged-0.012
rolling-0.012
 One-0.012
 favorites-0.012
 padd-0.012
Token.
Token ID13
Feature activation+5.86e-02

Pos logit contributions
quet+0.016
ful+0.016
bly+0.014
aid+0.014
ment+0.014
Neg logit contributions
rolling-0.017
 challenged-0.016
 One-0.016
 Whoever-0.015
 disagreed-0.015
TokenOnce
Token ID7454
Feature activation+7.38e-01

Pos logit contributions
rolling+0.014
 challenged+0.014
 Whoever+0.013
 One+0.013
 disagreed+0.013
Neg logit contributions
quet-0.024
ful-0.024
bly-0.021
ment-0.021
kin-0.021
Token upon
Token ID2402
Feature activation+4.01e-01

Pos logit contributions
 challenged+0.016
Whoever+0.016
rolling+0.016
 Whoever+0.016
 argued+0.015
Neg logit contributions
ful-0.034
kin-0.031
 fairy-0.030
 from-0.029
quet-0.029
Token a
Token ID257
Feature activation+4.44e-02

Pos logit contributions
rolling+0.015
 challenged+0.015
 One+0.015
 Whoever+0.015
 disagreed+0.015
Neg logit contributions
quet-0.011
ful-0.011
bly-0.009
ment-0.009
kin-0.009
Token time
Token ID640
Feature activation-9.25e-01

Pos logit contributions
rolling+0.002
 challenged+0.002
 Whoever+0.002
 disagreed+0.002
Whoever+0.001
Neg logit contributions
ful-0.001
quet-0.001
kin-0.001
sm-0.001
 fairy-0.001
Token,
Token ID11
Feature activation-4.44e-01

Pos logit contributions
quet+0.020
ful+0.020
bly+0.015
ment+0.015
kin+0.015
Neg logit contributions
rolling-0.041
 challenged-0.040
 Whoever-0.040
 disagreed-0.039
 favorites-0.039
Token there
Token ID612
Feature activation+3.69e+00

Pos logit contributions
ful+0.014
quet+0.012
kin+0.011
bly+0.011
web+0.011
Neg logit contributions
rolling-0.017
 challenged-0.016
 disagreed-0.016
 Whoever-0.016
 favorites-0.015
Token was
Token ID373
Feature activation+3.98e-02

Pos logit contributions
rolling+0.127
 challenged+0.120
 Whoever+0.118
 disagreed+0.114
inters+0.113
Neg logit contributions
ful-0.121
quet-0.115
kin-0.102
 fairy-0.101
ment-0.099
Token a
Token ID257
Feature activation+1.20e-01

Pos logit contributions
rolling+0.001
 challenged+0.001
 One+0.001
 Whoever+0.001
 favorites+0.001
Neg logit contributions
quet-0.001
ful-0.001
ment-0.001
 fairy-0.001
bly-0.001
Token little
Token ID1310
Feature activation+5.09e-02

Pos logit contributions
rolling+0.004
 challenged+0.004
 Whoever+0.004
 disagreed+0.003
 One+0.003
Neg logit contributions
quet-0.004
ful-0.004
ment-0.004
bly-0.004
aid-0.003
Token boy
Token ID2933
Feature activation+1.42e+00

Pos logit contributions
 challenged+0.001
 padd+0.001
 disagreed+0.001
 One+0.001
rolling+0.001
Neg logit contributions
quet-0.002
ful-0.002
ment-0.002
bly-0.002
paste-0.002
Token named
Token ID3706
Feature activation+2.04e+00

Pos logit contributions
 challenged+0.035
 One+0.034
rolling+0.033
 disagreed+0.033
 Whoever+0.032
Neg logit contributions
quet-0.060
ful-0.054
ment-0.051
bly-0.051
paste-0.050
Token have
Token ID423
Feature activation-2.91e+00

Pos logit contributions
ful+0.047
quet+0.046
 fairy+0.041
kin+0.040
 ray+0.039
Neg logit contributions
rolling-0.050
 challenged-0.046
 exception-0.044
outed-0.044
 favorites-0.044
Token to
Token ID284
Feature activation+3.27e-03

Pos logit contributions
quet+0.100
ful+0.098
bly+0.084
kin+0.084
ment+0.084
Neg logit contributions
rolling-0.093
 challenged-0.091
 Whoever-0.088
 disagreed-0.088
 favorites-0.085
Token be
Token ID307
Feature activation-1.89e+00

Pos logit contributions
 challenged+0.000
rolling+0.000
 begged+0.000
 One+0.000
 disagreed+0.000
Neg logit contributions
quet-0.000
ful-0.000
ment-0.000
sm-0.000
web-0.000
Token gentle
Token ID10296
Feature activation+3.02e-01

Pos logit contributions
quet+0.068
ful+0.063
ment+0.056
kin+0.055
 fairy+0.055
Neg logit contributions
 challenged-0.061
rolling-0.059
 disagreed-0.058
 begged-0.056
 Whoever-0.055
Token.
Token ID13
Feature activation-6.28e-01

Pos logit contributions
 challenged+0.010
 disagreed+0.009
 One+0.009
 begged+0.009
rolling+0.009
Neg logit contributions
quet-0.010
ful-0.009
kin-0.009
aid-0.008
bly-0.008
Token The
Token ID383
Feature activation+2.57e-01

Pos logit contributions
quet+0.025
ful+0.024
bly+0.021
kin+0.021
ment+0.021
Neg logit contributions
rolling-0.017
 Whoever-0.017
 challenged-0.016
 One-0.016
 favorites-0.015
Token piano
Token ID19132
Feature activation-5.10e-01

Pos logit contributions
rolling+0.008
 challenged+0.008
 One+0.008
 favorites+0.008
 disagreed+0.008
Neg logit contributions
quet-0.009
ful-0.009
ment-0.008
bly-0.008
aid-0.007
Token is
Token ID318
Feature activation-2.20e-01

Pos logit contributions
ful+0.020
quet+0.020
bly+0.017
ment+0.017
kin+0.016
Neg logit contributions
 challenged-0.014
 disagreed-0.014
 Whoever-0.013
rolling-0.013
 One-0.013
Token not
Token ID407
Feature activation-5.64e-01

Pos logit contributions
quet+0.008
ful+0.007
ment+0.007
igsaw+0.007
aid+0.007
Neg logit contributions
 challenged-0.007
 disagreed-0.006
 One-0.006
rolling-0.006
 Whoever-0.006
Token a
Token ID257
Feature activation+1.25e+00

Pos logit contributions
quet+0.019
ful+0.017
ment+0.016
cake+0.016
bly+0.016
Neg logit contributions
 challenged-0.019
 disagreed-0.018
inters-0.017
 One-0.017
 begged-0.017
Token toy
Token ID13373
Feature activation+1.35e+00

Pos logit contributions
 challenged+0.039
 Whoever+0.038
rolling+0.037
 One+0.037
 age+0.037
Neg logit contributions
quet-0.044
ful-0.042
ment-0.039
bly-0.037
aid-0.036
Token a
Token ID257
Feature activation+3.30e-01

Pos logit contributions
rolling+0.007
 challenged+0.007
 Whoever+0.007
 disagreed+0.007
 favorites+0.007
Neg logit contributions
quet-0.004
ful-0.004
bly-0.003
ment-0.003
 fairy-0.003
Token time
Token ID640
Feature activation-6.38e-01

Pos logit contributions
rolling+0.013
 challenged+0.013
 Whoever+0.012
 disagreed+0.012
Whoever+0.012
Neg logit contributions
ful-0.008
quet-0.008
sm-0.007
kin-0.007
 fore-0.006
Token,
Token ID11
Feature activation-4.25e-01

Pos logit contributions
ful+0.011
quet+0.010
kin+0.008
 fairy+0.007
ment+0.007
Neg logit contributions
rolling-0.031
 challenged-0.031
 Whoever-0.030
Whoever-0.030
 disagreed-0.030
Token there
Token ID612
Feature activation+3.76e+00

Pos logit contributions
ful+0.014
quet+0.013
kin+0.012
 fairy+0.011
bly+0.011
Neg logit contributions
rolling-0.015
 challenged-0.014
 disagreed-0.014
 Whoever-0.014
 favorites-0.014
Token was
Token ID373
Feature activation-4.91e-04

Pos logit contributions
rolling+0.148
inters+0.139
 challenged+0.138
 Whoever+0.138
 exception+0.135
Neg logit contributions
ful-0.105
quet-0.093
kin-0.088
 fairy-0.084
 from-0.083
Token a
Token ID257
Feature activation+3.43e-01

Pos logit contributions
quet+0.000
ful+0.000
ment+0.000
bly+0.000
 fairy+0.000
Neg logit contributions
rolling-0.000
 challenged-0.000
 One-0.000
 Whoever-0.000
 favorites-0.000
Token little
Token ID1310
Feature activation+1.87e-01

Pos logit contributions
rolling+0.011
 challenged+0.011
 Whoever+0.010
 disagreed+0.010
 One+0.010
Neg logit contributions
quet-0.012
ful-0.012
ment-0.010
bly-0.010
aid-0.010
Token girl
Token ID2576
Feature activation+7.92e-03

Pos logit contributions
 challenged+0.004
 padd+0.004
rolling+0.004
 One+0.004
 disagreed+0.004
Neg logit contributions
quet-0.008
ful-0.008
ment-0.007
bly-0.007
paste-0.007
Token named
Token ID3706
Feature activation+2.11e+00

Pos logit contributions
 One+0.000
 challenged+0.000
rolling+0.000
 begged+0.000
 disagreed+0.000
Neg logit contributions
quet-0.000
ful-0.000
bly-0.000
ment-0.000
paste-0.000
Token Lily
Token ID20037
Feature activation-9.17e-01

Pos logit contributions
 One+0.080
rolling+0.079
 Whoever+0.077
 challenged+0.077
 disagreed+0.076
Neg logit contributions
quet-0.061
ful-0.057
ment-0.053
bly-0.051
kin-0.049
Token.
Token ID13
Feature activation-4.29e-01

Pos logit contributions
quet+0.021
ful+0.021
bly+0.016
ment+0.016
 fairy+0.016
Neg logit contributions
rolling-0.040
 challenged-0.039
 Whoever-0.038
 disagreed-0.038
 favorites-0.037
Token they
Token ID484
Feature activation+3.64e-01

Pos logit contributions
quet+0.035
ful+0.035
bly+0.030
ment+0.030
kin+0.029
Neg logit contributions
rolling-0.032
 challenged-0.031
 Whoever-0.030
 disagreed-0.030
 favorites-0.029
Token saw
Token ID2497
Feature activation-2.02e-01

Pos logit contributions
rolling+0.010
 Whoever+0.010
 exception+0.010
inters+0.010
 Int+0.009
Neg logit contributions
ful-0.014
 from-0.012
 fairy-0.011
quet-0.011
 queens-0.011
Token a
Token ID257
Feature activation+8.23e-02

Pos logit contributions
quet+0.005
ful+0.005
bly+0.004
ment+0.004
kin+0.004
Neg logit contributions
rolling-0.008
 challenged-0.008
 Whoever-0.008
 disagreed-0.008
 One-0.008
Token big
Token ID1263
Feature activation-1.82e+00

Pos logit contributions
rolling+0.003
 challenged+0.003
 disagreed+0.003
 Whoever+0.003
 padd+0.002
Neg logit contributions
ful-0.003
quet-0.003
ment-0.002
bly-0.002
aid-0.002
Token poster
Token ID11968
Feature activation-7.60e-01

Pos logit contributions
quet+0.071
ful+0.070
ment+0.066
bly+0.065
aid+0.064
Neg logit contributions
 challenged-0.047
rolling-0.047
 disagreed-0.047
 One-0.045
 padd-0.045
Token with
Token ID351
Feature activation+1.40e+00

Pos logit contributions
quet+0.017
ful+0.017
ment+0.014
kin+0.014
bly+0.014
Neg logit contributions
 challenged-0.032
rolling-0.032
 Whoever-0.031
 disagreed-0.030
 One-0.030
Token a
Token ID257
Feature activation+7.87e-01

Pos logit contributions
rolling+0.049
 challenged+0.049
 Whoever+0.048
 disagreed+0.047
 One+0.046
Neg logit contributions
quet-0.043
ful-0.042
kin-0.036
bly-0.036
ment-0.035
Token picture
Token ID4286
Feature activation-1.59e+00

Pos logit contributions
rolling+0.030
 challenged+0.029
 Whoever+0.028
 disagreed+0.028
 favorites+0.028
Neg logit contributions
quet-0.022
ful-0.021
 fairy-0.018
kin-0.017
 ray-0.017
Token of
Token ID286
Feature activation-1.29e+00

Pos logit contributions
quet+0.040
ful+0.040
 relief+0.039
 forgiveness+0.038
bly+0.038
Neg logit contributions
 challenged-0.058
 One-0.057
 Whoever-0.056
 lined-0.055
rolling-0.054
Token a
Token ID257
Feature activation+6.91e-01

Pos logit contributions
quet+0.040
ful+0.038
kin+0.034
bly+0.033
ment+0.033
Neg logit contributions
 challenged-0.044
rolling-0.044
 Whoever-0.044
 One-0.044
 favorites-0.043
Token lion
Token ID18744
Feature activation+2.10e+00

Pos logit contributions
rolling+0.023
 challenged+0.023
 Whoever+0.022
 disagreed+0.022
 One+0.021
Neg logit contributions
ful-0.022
quet-0.022
ment-0.019
bly-0.019
kin-0.018
Token a
Token ID257
Feature activation+1.65e-01

Pos logit contributions
rolling+0.021
 challenged+0.020
 One+0.020
 Whoever+0.020
 disagreed+0.020
Neg logit contributions
quet-0.015
ful-0.015
bly-0.012
ment-0.012
kin-0.012
Token time
Token ID640
Feature activation-7.00e-01

Pos logit contributions
rolling+0.006
 challenged+0.006
 Whoever+0.006
Whoever+0.006
 disagreed+0.006
Neg logit contributions
ful-0.004
quet-0.004
kin-0.004
sm-0.004
 fairy-0.004
Token,
Token ID11
Feature activation-1.66e-01

Pos logit contributions
ful+0.014
quet+0.014
kin+0.010
 fairy+0.010
bly+0.010
Neg logit contributions
rolling-0.032
 challenged-0.032
 Whoever-0.031
 disagreed-0.030
 favorites-0.030
Token there
Token ID612
Feature activation+3.86e+00

Pos logit contributions
ful+0.005
quet+0.005
kin+0.004
bly+0.004
web+0.004
Neg logit contributions
rolling-0.006
 challenged-0.006
 disagreed-0.006
 Whoever-0.006
 favorites-0.006
Token was
Token ID373
Feature activation+2.14e-01

Pos logit contributions
rolling+0.132
 challenged+0.124
 Whoever+0.123
inters+0.118
 disagreed+0.118
Neg logit contributions
ful-0.127
quet-0.122
kin-0.108
 fairy-0.106
bly-0.104
Token a
Token ID257
Feature activation+1.76e-01

Pos logit contributions
rolling+0.009
 challenged+0.008
 Whoever+0.008
 One+0.008
 disagreed+0.008
Neg logit contributions
quet-0.006
ful-0.005
ment-0.005
bly-0.004
 fairy-0.004
Token little
Token ID1310
Feature activation+1.76e-01

Pos logit contributions
rolling+0.006
 challenged+0.005
 Whoever+0.005
 disagreed+0.005
 One+0.005
Neg logit contributions
quet-0.006
ful-0.006
ment-0.005
bly-0.005
aid-0.005
Token boy
Token ID2933
Feature activation+1.48e+00

Pos logit contributions
 challenged+0.004
 padd+0.004
 disagreed+0.004
rolling+0.004
 Whoever+0.004
Neg logit contributions
quet-0.007
ful-0.007
ment-0.007
bly-0.006
paste-0.006
Token who
Token ID508
Feature activation+4.57e+00

Pos logit contributions
 challenged+0.039
rolling+0.038
 One+0.037
 disagreed+0.036
 Whoever+0.036
Neg logit contributions
quet-0.061
ful-0.055
ment-0.052
bly-0.051
paste-0.051
Token was
Token ID373
Feature activation+7.41e-01

Pos logit contributions
rolling+0.090
 challenged+0.088
 disagreed+0.081
 One+0.080
 Whoever+0.079
Neg logit contributions
quet-0.219
ful-0.211
ment-0.191
bly-0.188
 fairy-0.187
Token so
Token ID523
Feature activation+2.64e+00

Pos logit contributions
rolling+0.030
 Whoever+0.030
 disagreed+0.029
 challenged+0.029
Whoever+0.029
Neg logit contributions
ful-0.019
quet-0.016
bly-0.015
 fairy-0.015
 from-0.014
Token for
Token ID329
Feature activation-2.75e+00

Pos logit contributions
quet+0.063
bly+0.053
kin+0.053
aid+0.051
ful+0.050
Neg logit contributions
 disagreed-0.051
 challenged-0.051
 begged-0.048
 One-0.047
rolling-0.047
Token her
Token ID607
Feature activation-3.50e-01

Pos logit contributions
quet+0.100
cake+0.090
kin+0.085
paste+0.084
ful+0.083
Neg logit contributions
 age-0.082
 challenged-0.082
 begged-0.081
 One-0.080
 padd-0.079
Token act
Token ID719
Feature activation+3.34e-01

Pos logit contributions
quet+0.012
ful+0.012
bly+0.011
ment+0.010
aid+0.010
Neg logit contributions
rolling-0.011
 disagreed-0.010
 challenged-0.010
 favorites-0.010
 age-0.010
Token of
Token ID286
Feature activation-2.05e+00

Pos logit contributions
rolling+0.011
 disagreed+0.011
 challenged+0.011
 intrigued+0.010
 hesitant+0.010
Neg logit contributions
quet-0.010
ful-0.009
mm-0.009
bly-0.009
aid-0.009
Token courage
Token ID11917
Feature activation+2.56e-03

Pos logit contributions
quet+0.076
ful+0.069
cake+0.065
bly+0.063
kin+0.060
Neg logit contributions
 favorites-0.062
 challenged-0.062
 age-0.061
 One-0.061
 padd-0.060
Token,
Token ID11
Feature activation+3.36e-02

Pos logit contributions
 disagreed+0.000
rolling+0.000
 challenged+0.000
 One+0.000
 age+0.000
Neg logit contributions
quet-0.000
ful-0.000
bly-0.000
aid-0.000
kin-0.000
Token and
Token ID290
Feature activation-4.35e-01

Pos logit contributions
 challenged+0.001
rolling+0.001
 Whoever+0.001
 disagreed+0.001
 One+0.001
Neg logit contributions
quet-0.001
ful-0.001
kin-0.001
bly-0.001
ment-0.001
Token she
Token ID673
Feature activation+4.10e-01

Pos logit contributions
quet+0.016
ful+0.014
kin+0.013
bly+0.013
cake+0.013
Neg logit contributions
 challenged-0.014
rolling-0.013
 disagreed-0.013
 begged-0.013
 intrigued-0.012
Token was
Token ID373
Feature activation+5.93e-01

Pos logit contributions
rolling+0.011
 challenged+0.010
 disagreed+0.010
 Whoever+0.010
 One+0.009
Neg logit contributions
quet-0.017
ful-0.016
bly-0.014
ment-0.014
kin-0.014
Token even
Token ID772
Feature activation-5.99e-01

Pos logit contributions
rolling+0.018
 challenged+0.018
 disagreed+0.017
 begged+0.017
 intrigued+0.017
Neg logit contributions
quet-0.022
ful-0.020
ment-0.018
kin-0.018
paste-0.018
Token more
Token ID517
Feature activation-1.11e+00

Pos logit contributions
quet+0.017
ful+0.016
kin+0.013
bly+0.013
ment+0.013
Neg logit contributions
 challenged-0.024
rolling-0.023
 disagreed-0.023
 begged-0.022
 hesitant-0.022
Token
Token ID198
Feature activation-1.37e-01

Pos logit contributions
rolling+0.008
 challenged+0.007
 Whoever+0.007
 One+0.007
 Int+0.007
Neg logit contributions
quet-0.015
ful-0.015
kin-0.014
bly-0.014
ment-0.013
TokenThe
Token ID464
Feature activation+2.56e-01

Pos logit contributions
ful+0.004
quet+0.004
kin+0.004
 ray+0.004
ment+0.004
Neg logit contributions
rolling-0.004
 Whoever-0.004
 challenged-0.004
 One-0.004
 begged-0.004
Token little
Token ID1310
Feature activation+6.63e-01

Pos logit contributions
 challenged+0.007
rolling+0.007
 disagreed+0.007
 One+0.007
 Whoever+0.007
Neg logit contributions
quet-0.010
ful-0.009
bly-0.008
ment-0.008
paste-0.008
Token girl
Token ID2576
Feature activation+2.52e-01

Pos logit contributions
rolling+0.022
 challenged+0.022
 disagreed+0.021
 Whoever+0.021
 One+0.021
Neg logit contributions
quet-0.022
ful-0.021
bly-0.018
ment-0.018
kin-0.018
Token said
Token ID531
Feature activation+1.31e+00

Pos logit contributions
rolling+0.007
 challenged+0.006
 Whoever+0.006
 favorites+0.006
inters+0.006
Neg logit contributions
ful-0.010
quet-0.010
 fairy-0.009
ment-0.009
kin-0.008
Token,
Token ID11
Feature activation-6.83e-01

Pos logit contributions
rolling+0.038
 challenged+0.038
 One+0.038
 begged+0.038
 disagreed+0.037
Neg logit contributions
quet-0.047
igsaw-0.040
web-0.040
 queens-0.040
 Tooth-0.039
Token "
Token ID366
Feature activation+1.11e+00

Pos logit contributions
quet+0.033
ful+0.032
kin+0.030
ment+0.029
bly+0.029
Neg logit contributions
rolling-0.013
 challenged-0.012
 Whoever-0.012
 One-0.011
 disagreed-0.011
TokenMy
Token ID3666
Feature activation+2.34e+00

Pos logit contributions
rolling+0.039
 challenged+0.037
 favorites+0.036
 disagreed+0.036
 Whoever+0.035
Neg logit contributions
quet-0.035
ful-0.035
bly-0.030
 fairy-0.029
aid-0.029
Token name
Token ID1438
Feature activation+7.44e-01

Pos logit contributions
rolling+0.064
 challenged+0.061
 Whoever+0.058
 disagreed+0.058
 favorites+0.058
Neg logit contributions
quet-0.093
ful-0.091
bly-0.081
ment-0.079
kin-0.078
Token is
Token ID318
Feature activation+2.15e-02

Pos logit contributions
 wants+0.015
 wanted+0.015
 might+0.014
 disagreed+0.014
 challenged+0.014
Neg logit contributions
quet-0.035
ful-0.034
igsaw-0.033
ment-0.031
bly-0.031
Token Chloe
Token ID29476
Feature activation+1.58e+00

Pos logit contributions
 challenged+0.001
 One+0.001
rolling+0.001
 Whoever+0.001
 disagreed+0.001
Neg logit contributions
quet-0.001
ful-0.001
ment-0.001
aid-0.001
kin-0.001
Token were
Token ID547
Feature activation-9.74e-01

Pos logit contributions
rolling+0.002
 challenged+0.002
 Whoever+0.002
 disagreed+0.002
 favorites+0.002
Neg logit contributions
quet-0.003
ful-0.003
bly-0.003
ment-0.003
kin-0.003
Token sharing
Token ID7373
Feature activation-2.32e+00

Pos logit contributions
quet+0.038
ful+0.034
ment+0.032
aid+0.031
paste+0.031
Neg logit contributions
rolling-0.029
 challenged-0.028
 disagreed-0.027
 padd-0.027
 Whoever-0.026
Token lots
Token ID6041
Feature activation-1.36e+00

Pos logit contributions
quet+0.098
bly+0.084
ful+0.083
arl+0.078
igsaw+0.078
Neg logit contributions
 challenged-0.060
 disagreed-0.059
 favorite-0.058
 favorites-0.058
 wanted-0.055
Token of
Token ID286
Feature activation-1.92e+00

Pos logit contributions
bly+0.050
quet+0.050
kin+0.050
aby+0.049
 Tooth+0.049
Neg logit contributions
 favorite-0.029
 Whoever-0.028
 favourite-0.027
 especially-0.027
 exception-0.026
Token c
Token ID269
Feature activation-1.65e+00

Pos logit contributions
quet+0.075
ful+0.074
bly+0.065
kin+0.062
paste+0.060
Neg logit contributions
 favorites-0.058
 One-0.054
 Whoever-0.053
 challenged-0.052
 disagreed-0.052
Tokenudd
Token ID4185
Feature activation-2.17e+00

Pos logit contributions
ful+0.081
quet+0.077
 fairy+0.076
kin+0.073
 Fairy+0.072
Neg logit contributions
rolling-0.033
ashing-0.028
 favorites-0.028
inters-0.027
outed-0.025
Tokenles
Token ID829
Feature activation+2.21e+00

Pos logit contributions
ful+0.099
quet+0.099
 fairy+0.089
web+0.086
 Fairy+0.086
Neg logit contributions
 exception-0.043
 challenged-0.042
ashing-0.042
 lined-0.041
inters-0.041
Token and
Token ID290
Feature activation-1.01e-01

Pos logit contributions
 padd+0.048
 disagreed+0.047
 challenged+0.047
 age+0.046
 favorites+0.045
Neg logit contributions
quet-0.091
aid-0.083
 Tooth-0.081
bly-0.081
ful-0.079
Token love
Token ID1842
Feature activation-2.32e+00

Pos logit contributions
quet+0.004
ful+0.004
bly+0.004
kin+0.003
web+0.003
Neg logit contributions
 challenged-0.003
 disagreed-0.003
 padd-0.003
 argued-0.003
 wanted-0.003
Token.
Token ID13
Feature activation-2.95e-01

Pos logit contributions
quet+0.095
aid+0.084
bly+0.080
kin+0.080
cake+0.077
Neg logit contributions
 disagreed-0.060
 favorites-0.060
 wanted-0.059
 age-0.059
 One-0.059
Token Now
Token ID2735
Feature activation-4.65e+00

Pos logit contributions
ful+0.010
quet+0.010
ment+0.009
bly+0.008
sm+0.008
Neg logit contributions
rolling-0.009
 challenged-0.009
 disagreed-0.009
 exception-0.008
 begged-0.008
Token remind
Token ID7101
Feature activation-1.31e+00

Pos logit contributions
quet+0.000
ful+0.000
ment+0.000
sm+0.000
paste+0.000
Neg logit contributions
rolling-0.000
 challenged-0.000
 padd-0.000
 Whoever-0.000
 begged-0.000
Token her
Token ID607
Feature activation-6.12e-01

Pos logit contributions
quet+0.050
 ray+0.044
ful+0.044
kin+0.044
web+0.041
Neg logit contributions
 Whoever-0.037
 challenged-0.036
 One-0.036
 begged-0.036
 padd-0.035
Token before
Token ID878
Feature activation+9.88e-01

Pos logit contributions
quet+0.024
ful+0.022
bly+0.022
kin+0.020
 ray+0.020
Neg logit contributions
rolling-0.016
 challenged-0.016
 disagreed-0.016
 begged-0.015
 One-0.015
Token they
Token ID484
Feature activation+1.43e+00

Pos logit contributions
rolling+0.030
 challenged+0.028
 disagreed+0.028
 padd+0.028
 begged+0.027
Neg logit contributions
quet-0.037
ful-0.035
aid-0.032
kin-0.032
web-0.032
Token left
Token ID1364
Feature activation-1.48e+00

Pos logit contributions
 disagreed+0.032
 challenged+0.032
rolling+0.031
 argued+0.030
 begged+0.030
Neg logit contributions
quet-0.065
ful-0.061
bly-0.056
ment-0.056
igsaw-0.056
Token.
Token ID13
Feature activation-3.81e-01

Pos logit contributions
quet+0.051
aid+0.045
ful+0.044
 queens+0.042
cake+0.042
Neg logit contributions
 One-0.045
 padd-0.044
 begged-0.044
 disagreed-0.043
rolling-0.043
Token They
Token ID1119
Feature activation+1.15e-02

Pos logit contributions
quet+0.015
ful+0.015
ment+0.013
 fairy+0.012
bly+0.012
Neg logit contributions
rolling-0.010
 challenged-0.010
 disagreed-0.010
 Whoever-0.009
 favorites-0.009
Token had
Token ID550
Feature activation-2.09e+00

Pos logit contributions
rolling+0.000
 challenged+0.000
 Whoever+0.000
 disagreed+0.000
 favorites+0.000
Neg logit contributions
quet-0.000
ful-0.000
ment-0.000
bly-0.000
kin-0.000
Token a
Token ID257
Feature activation-2.62e-01

Pos logit contributions
quet+0.072
ful+0.069
ment+0.061
kin+0.061
bly+0.060
Neg logit contributions
rolling-0.068
 challenged-0.067
 disagreed-0.065
 Whoever-0.064
 One-0.063
Token fun
Token ID1257
Feature activation-2.50e+00

Pos logit contributions
quet+0.009
ful+0.009
ment+0.007
bly+0.007
kin+0.007
Neg logit contributions
rolling-0.009
 challenged-0.009
 Whoever-0.008
 disagreed-0.008
 favorites-0.008
Token day
Token ID1110
Feature activation-2.00e+00

Pos logit contributions
quet+0.088
ful+0.087
bly+0.075
ment+0.074
 fairy+0.072
Neg logit contributions
rolling-0.079
 challenged-0.077
 Whoever-0.074
 disagreed-0.074
 One-0.073
Token playing
Token ID2712
Feature activation+1.65e+00

Pos logit contributions
rolling+0.022
 challenged+0.021
 Whoever+0.021
 disagreed+0.021
 favorites+0.020
Neg logit contributions
ful-0.021
quet-0.021
ment-0.018
 fairy-0.018
bly-0.018
Token outside
Token ID2354
Feature activation+8.85e-01

Pos logit contributions
rolling+0.067
 challenged+0.067
 Whoever+0.065
 One+0.063
 disagreed+0.063
Neg logit contributions
quet-0.042
ful-0.042
ment-0.034
bly-0.033
kin-0.032
Token.
Token ID13
Feature activation-6.52e-01

Pos logit contributions
rolling+0.035
 challenged+0.035
 One+0.034
 Whoever+0.034
 disagreed+0.033
Neg logit contributions
quet-0.024
ful-0.022
aid-0.019
ment-0.019
kin-0.019
Token One
Token ID1881
Feature activation+2.22e+00

Pos logit contributions
ful+0.021
igsaw+0.019
 Tooth+0.019
 from+0.019
 ink+0.018
Neg logit contributions
 disagreed-0.019
 argued-0.019
outed-0.018
ounded-0.018
rolling-0.018
Token day
Token ID1110
Feature activation-2.08e+00

Pos logit contributions
rolling+0.072
 challenged+0.070
 disagreed+0.068
 Whoever+0.067
 One+0.067
Neg logit contributions
quet-0.076
ful-0.074
bly-0.064
ment-0.063
 fairy-0.062
Token,
Token ID11
Feature activation-1.25e+00

Pos logit contributions
quet+0.050
ful+0.049
bly+0.038
ment+0.038
 fairy+0.038
Neg logit contributions
rolling-0.089
 challenged-0.087
 Whoever-0.085
 disagreed-0.084
 One-0.083
Token Sammy
Token ID44067
Feature activation-4.70e-01

Pos logit contributions
ful+0.043
quet+0.041
 fairy+0.036
bly+0.036
kin+0.036
Neg logit contributions
rolling-0.040
 challenged-0.038
 Whoever-0.038
 disagreed-0.037
 favorites-0.037
Token and
Token ID290
Feature activation-2.95e-01

Pos logit contributions
ful+0.019
quet+0.018
 fairy+0.017
 from+0.016
kin+0.016
Neg logit contributions
rolling-0.012
 Whoever-0.011
 favorites-0.011
 exception-0.010
 challenged-0.010
Token his
Token ID465
Feature activation+1.04e+00

Pos logit contributions
quet+0.007
ful+0.007
ment+0.006
 fairy+0.006
bly+0.006
Neg logit contributions
rolling-0.012
 challenged-0.012
 Whoever-0.012
 disagreed-0.012
 favorites-0.011
Token best
Token ID1266
Feature activation+1.17e+00

Pos logit contributions
rolling+0.029
 challenged+0.027
 favorites+0.026
 One+0.025
 disagreed+0.025
Neg logit contributions
quet-0.042
ful-0.041
ment-0.036
bly-0.036
kin-0.035
Token friend
Token ID1545
Feature activation+1.02e+00

Pos logit contributions
rolling+0.033
 challenged+0.031
 Whoever+0.031
 disagreed+0.030
 Int+0.029
Neg logit contributions
ful-0.046
quet-0.044
 fairy-0.039
ment-0.039
kin-0.038
Token playing
Token ID2712
Feature activation+2.03e+00

Pos logit contributions
quet+0.011
ful+0.010
ment+0.009
bly+0.009
kin+0.009
Neg logit contributions
rolling-0.013
 challenged-0.012
 Whoever-0.012
 disagreed-0.012
 favorites-0.012
Token with
Token ID351
Feature activation+1.07e+00

Pos logit contributions
 One+0.056
 challenged+0.055
 Whoever+0.055
 hesitant+0.053
 unsure+0.053
Neg logit contributions
ful-0.070
quet-0.068
ment-0.066
aid-0.062
 bless-0.059
Token it
Token ID340
Feature activation+4.55e-01

Pos logit contributions
 One+0.029
 disagreed+0.029
 favorites+0.029
 Whoever+0.028
 challenged+0.028
Neg logit contributions
quet-0.041
ful-0.036
cake-0.034
ment-0.034
bly-0.033
Token.
Token ID13
Feature activation-6.47e-01

Pos logit contributions
 One+0.013
 disagreed+0.013
 unsure+0.012
 challenged+0.012
 favorite+0.012
Neg logit contributions
quet-0.016
aid-0.015
ment-0.015
ful-0.015
bly-0.015
Token
Token ID198
Feature activation-8.33e-01

Pos logit contributions
quet+0.025
ful+0.025
bly+0.022
kin+0.021
ment+0.021
Neg logit contributions
rolling-0.018
 challenged-0.018
 Whoever-0.017
 One-0.017
 disagreed-0.017
Token
Token ID198
Feature activation-1.14e+00

Pos logit contributions
ful+0.028
quet+0.027
sm+0.023
 fairy+0.023
ment+0.023
Neg logit contributions
 challenged-0.027
rolling-0.027
 disagreed-0.027
 favorites-0.026
 argued-0.026
TokenFrom
Token ID4863
Feature activation-5.01e+00

Pos logit contributions
ful+0.035
quet+0.034
ment+0.031
kin+0.029
bly+0.029
Neg logit contributions
rolling-0.040
 challenged-0.039
 Whoever-0.039
Whoever-0.037
 disagreed-0.037
Token then
Token ID788
Feature activation-1.25e+00

Pos logit contributions
quet+0.162
ful+0.139
paste+0.134
cake+0.128
kin+0.128
Neg logit contributions
 challenged-0.174
 padd-0.168
 disagreed-0.166
rolling-0.165
 asks-0.164
Token on
Token ID319
Feature activation-6.45e-01

Pos logit contributions
quet+0.034
ful+0.032
 fairy+0.027
ment+0.026
bly+0.026
Neg logit contributions
rolling-0.049
 challenged-0.049
 Whoever-0.048
 One-0.048
 disagreed-0.047
Token,
Token ID11
Feature activation-1.15e+00

Pos logit contributions
quet+0.027
aid+0.026
ful+0.025
bly+0.024
 queens+0.023
Neg logit contributions
 One-0.019
 Suddenly-0.017
 disagreed-0.016
rolling-0.015
 challenged-0.015
Token she
Token ID673
Feature activation-2.63e-01

Pos logit contributions
quet+0.041
ful+0.040
bly+0.038
aid+0.036
web+0.034
Neg logit contributions
 One-0.037
 favorites-0.035
 disagreed-0.035
 Suddenly-0.034
 challenged-0.034
Token best
Token ID1266
Feature activation+1.39e+00

Pos logit contributions
rolling+0.015
 challenged+0.014
 Whoever+0.014
 disagreed+0.014
 favorites+0.013
Neg logit contributions
quet-0.015
ful-0.015
bly-0.012
ment-0.012
kin-0.012
Token of
Token ID286
Feature activation-1.76e+00

Pos logit contributions
 padd+0.049
rolling+0.049
 challenged+0.049
 favorites+0.048
 exception+0.047
Neg logit contributions
quet-0.042
ful-0.039
aid-0.037
bly-0.036
kin-0.035
Token friends
Token ID2460
Feature activation-2.89e-01

Pos logit contributions
quet+0.070
ful+0.068
cake+0.060
ment+0.060
 fairy+0.060
Neg logit contributions
 challenged-0.047
rolling-0.047
 disagreed-0.045
 Whoever-0.045
 favorites-0.044
Token.
Token ID13
Feature activation-4.31e-01

Pos logit contributions
quet+0.010
aid+0.009
bly+0.009
ful+0.009
 Tooth+0.009
Neg logit contributions
 disagreed-0.009
 challenged-0.009
 Whoever-0.009
 One-0.009
 padd-0.008
Token The
Token ID383
Feature activation-1.90e-01

Pos logit contributions
ful+0.015
quet+0.015
ment+0.013
 fairy+0.013
sm+0.013
Neg logit contributions
rolling-0.013
 challenged-0.013
 disagreed-0.012
 Whoever-0.012
 exception-0.012
Token end
Token ID886
Feature activation-3.57e+00

Pos logit contributions
ful+0.006
quet+0.006
ment+0.005
bly+0.004
aid+0.004
Neg logit contributions
 challenged-0.007
rolling-0.007
 One-0.007
 favorites-0.007
 disagreed-0.007
Token.
Token ID13
Feature activation-4.85e-01

Pos logit contributions
quet+0.132
bly+0.120
kin+0.115
ful+0.113
 queens+0.110
Neg logit contributions
 One-0.110
 Whoever-0.099
 challenged-0.096
 favorites-0.094
 disagreed-0.093
Token<|endoftext|>
Token ID50256
Feature activation+3.46e-01

Pos logit contributions
quet+0.021
ful+0.020
bly+0.019
kin+0.018
ment+0.018
Neg logit contributions
rolling-0.011
 challenged-0.011
 One-0.011
 Whoever-0.011
 disagreed-0.010
TokenM
Token ID44
Feature activation+1.04e+00

Pos logit contributions
rolling+0.008
 challenged+0.008
 Whoever+0.008
 One+0.008
 disagreed+0.007
Neg logit contributions
quet-0.015
ful-0.015
bly-0.013
ment-0.013
kin-0.013
Tokenolly
Token ID5098
Feature activation-1.13e+00

Pos logit contributions
rolling+0.026
 One+0.025
 disagreed+0.025
 challenged+0.025
 Whoever+0.024
Neg logit contributions
quet-0.042
ful-0.042
bly-0.038
ment-0.037
 fairy-0.037
Token and
Token ID290
Feature activation-3.48e-01

Pos logit contributions
quet+0.038
ful+0.038
bly+0.032
ment+0.032
kin+0.032
Neg logit contributions
rolling-0.037
 challenged-0.036
 disagreed-0.035
 Whoever-0.035
 One-0.034
Token happy
Token ID3772
Feature activation-1.01e+00

Pos logit contributions
rolling+0.004
 challenged+0.004
 Whoever+0.004
 disagreed+0.004
 favorites+0.004
Neg logit contributions
ful-0.004
quet-0.004
bly-0.003
kin-0.003
 fairy-0.003
Token.
Token ID13
Feature activation-8.59e-01

Pos logit contributions
quet+0.037
aid+0.034
bly+0.034
ful+0.032
kin+0.032
Neg logit contributions
 challenged-0.029
 wanted-0.028
 disagreed-0.028
 begged-0.027
 One-0.027
Token It
Token ID632
Feature activation-2.61e-01

Pos logit contributions
ful+0.030
quet+0.030
 fairy+0.026
ment+0.025
 Fairy+0.025
Neg logit contributions
rolling-0.026
 challenged-0.025
 disagreed-0.025
inters-0.024
 favorites-0.023
Token liked
Token ID8288
Feature activation+9.46e-01

Pos logit contributions
quet+0.011
ful+0.010
ment+0.009
bly+0.009
kin+0.009
Neg logit contributions
 challenged-0.007
rolling-0.007
 disagreed-0.006
 Whoever-0.006
 One-0.006
Token to
Token ID284
Feature activation+3.43e-01

Pos logit contributions
 challenged+0.021
 Whoever+0.020
 One+0.020
 favorites+0.019
rolling+0.019
Neg logit contributions
quet-0.044
kin-0.038
aid-0.038
bly-0.038
ful-0.038
Token clean
Token ID3424
Feature activation-4.06e+00

Pos logit contributions
 One+0.010
rolling+0.010
 Whoever+0.010
 challenged+0.010
 begged+0.009
Neg logit contributions
quet-0.013
ful-0.012
ment-0.011
sm-0.011
paste-0.011
Token plates
Token ID13375
Feature activation+1.80e-01

Pos logit contributions
quet+0.163
aid+0.145
ful+0.143
cake+0.131
kin+0.130
Neg logit contributions
 One-0.112
 Whoever-0.111
 challenged-0.110
 exception-0.108
 favorites-0.107
Token and
Token ID290
Feature activation+1.72e-01

Pos logit contributions
 disagreed+0.004
 One+0.004
 lined+0.004
 challenged+0.004
 favorites+0.004
Neg logit contributions
quet-0.007
ment-0.006
sm-0.006
bly-0.006
ful-0.006
Token cups
Token ID14180
Feature activation+3.59e-01

Pos logit contributions
 challenged+0.006
rolling+0.006
 disagreed+0.005
 begged+0.005
 One+0.005
Neg logit contributions
quet-0.006
ful-0.006
sm-0.005
ment-0.005
ative-0.005
Token.
Token ID13
Feature activation-3.65e-01

Pos logit contributions
 disagreed+0.009
 lined+0.009
 One+0.008
 favorites+0.008
 wanted+0.008
Neg logit contributions
quet-0.014
 relief-0.012
igsaw-0.012
ment-0.012
 Tooth-0.012
Token The
Token ID383
Feature activation+4.45e-01

Pos logit contributions
ful+0.013
quet+0.012
 fairy+0.011
ment+0.011
bly+0.011
Neg logit contributions
rolling-0.012
 disagreed-0.011
 challenged-0.011
 argued-0.010
 padd-0.010
Token So
Token ID1406
Feature activation+6.63e-01

Pos logit contributions
quet+0.020
ful+0.019
bly+0.017
kin+0.017
ment+0.017
Neg logit contributions
rolling-0.012
 challenged-0.012
 Whoever-0.012
 One-0.011
 disagreed-0.011
Token they
Token ID484
Feature activation+8.48e-02

Pos logit contributions
 Whoever+0.020
 One+0.020
 disagreed+0.020
 challenged+0.020
 begged+0.020
Neg logit contributions
quet-0.024
ful-0.021
web-0.020
sm-0.020
aid-0.020
Token paid
Token ID3432
Feature activation-2.37e+00

Pos logit contributions
rolling+0.002
 challenged+0.002
 Whoever+0.002
 disagreed+0.002
 padd+0.002
Neg logit contributions
quet-0.003
ful-0.003
ment-0.003
bly-0.003
kin-0.003
Token for
Token ID329
Feature activation-2.79e+00

Pos logit contributions
quet+0.090
web+0.077
kin+0.073
cake+0.072
bly+0.071
Neg logit contributions
 Whoever-0.071
 padd-0.071
rolling-0.068
 challenged-0.066
 begged-0.066
Token the
Token ID262
Feature activation-2.56e-02

Pos logit contributions
quet+0.099
web+0.090
ful+0.088
kin+0.088
cake+0.087
Neg logit contributions
 Whoever-0.085
 begged-0.084
 padd-0.082
 challenged-0.081
 One-0.080
Token gift
Token ID6979
Feature activation-4.10e+00

Pos logit contributions
ful+0.001
quet+0.001
ment+0.001
smart+0.001
bly+0.001
Neg logit contributions
 favorites-0.001
 age-0.001
 favourite-0.001
 One-0.001
 favorite-0.001
Token,
Token ID11
Feature activation-1.55e-01

Pos logit contributions
quet+0.132
aid+0.125
bly+0.125
ment+0.118
ful+0.117
Neg logit contributions
 disagreed-0.124
 Whoever-0.123
 One-0.119
 challenged-0.118
 begged-0.117
Token hopped
Token ID45230
Feature activation+6.98e-01

Pos logit contributions
quet+0.005
ful+0.005
kin+0.004
bly+0.004
web+0.004
Neg logit contributions
rolling-0.005
 Whoever-0.005
 challenged-0.005
 disagreed-0.005
 padd-0.005
Token back
Token ID736
Feature activation-3.00e+00

Pos logit contributions
 Whoever+0.023
rolling+0.022
 challenged+0.022
Whoever+0.021
 padd+0.021
Neg logit contributions
quet-0.023
ful-0.022
 forgiveness-0.020
aid-0.020
web-0.020
Token on
Token ID319
Feature activation-7.50e-01

Pos logit contributions
quet+0.117
ful+0.111
kin+0.104
bly+0.103
aid+0.101
Neg logit contributions
rolling-0.088
 challenged-0.080
 Whoever-0.079
 favorites-0.079
 padd-0.077
Token the
Token ID262
Feature activation-8.72e-01

Pos logit contributions
quet+0.026
aid+0.023
ful+0.023
paste+0.022
web+0.022
Neg logit contributions
 Whoever-0.025
 One-0.025
 padd-0.024
rolling-0.023
 favorites-0.023
Tokened
Token ID276
Feature activation+3.69e-01

Pos logit contributions
quet+0.108
igsaw+0.107
 Tooth+0.107
 dessert+0.102
 forgiveness+0.102
Neg logit contributions
rolling-0.055
 wanted-0.054
 intrigued-0.051
 spotted-0.051
 disagreed-0.050
Token every
Token ID790
Feature activation+2.37e+00

Pos logit contributions
 favorites+0.010
 challenged+0.010
 disagreed+0.010
rolling+0.010
 One+0.010
Neg logit contributions
quet-0.014
ful-0.013
kin-0.012
paste-0.012
aid-0.012
Token year
Token ID614
Feature activation-9.12e-01

Pos logit contributions
rolling+0.069
 challenged+0.069
 disagreed+0.067
 Whoever+0.066
 One+0.066
Neg logit contributions
quet-0.086
ful-0.084
bly-0.074
paste-0.073
ment-0.073
Token and
Token ID290
Feature activation-1.66e-01

Pos logit contributions
quet+0.031
ful+0.028
bly+0.027
aid+0.026
kin+0.026
Neg logit contributions
 disagreed-0.030
rolling-0.030
 challenged-0.029
 wanted-0.028
 One-0.028
Token brought
Token ID3181
Feature activation+5.37e-01

Pos logit contributions
quet+0.007
ful+0.006
bly+0.006
igsaw+0.005
kin+0.005
Neg logit contributions
 challenged-0.005
 disagreed-0.005
rolling-0.004
 padd-0.004
 wanted-0.004
Token happiness
Token ID12157
Feature activation-3.78e+00

Pos logit contributions
 challenged+0.015
 favorites+0.015
 One+0.014
rolling+0.014
 Whoever+0.014
Neg logit contributions
quet-0.020
ful-0.018
kin-0.018
bly-0.017
paste-0.017
Token to
Token ID284
Feature activation+1.85e-01

Pos logit contributions
quet+0.151
kin+0.147
bly+0.144
arl+0.141
igsaw+0.141
Neg logit contributions
 favorites-0.077
 favourite-0.075
rolling-0.075
 challenged-0.071
 favorite-0.071
Token everyone
Token ID2506
Feature activation-7.95e-01

Pos logit contributions
 One+0.006
 challenged+0.006
 asks+0.006
rolling+0.005
 favorites+0.005
Neg logit contributions
quet-0.006
paste-0.006
aid-0.006
ful-0.006
cake-0.006
Token who
Token ID508
Feature activation+5.26e+00

Pos logit contributions
quet+0.025
ful+0.023
aid+0.022
kin+0.022
bly+0.021
Neg logit contributions
rolling-0.026
 challenged-0.026
 disagreed-0.025
 intrigued-0.025
 wanted-0.025
Token saw
Token ID2497
Feature activation+1.06e+00

Pos logit contributions
 challenged+0.121
rolling+0.119
 disagreed+0.116
 Whoever+0.108
 favorites+0.106
Neg logit contributions
quet-0.230
ful-0.222
bly-0.201
kin-0.199
ment-0.197
Token it
Token ID340
Feature activation+6.98e-01

Pos logit contributions
 challenged+0.032
 disagreed+0.032
 favorites+0.031
 wanted+0.031
 asks+0.030
Neg logit contributions
quet-0.038
igsaw-0.033
ful-0.032
kin-0.032
paste-0.031
Token Her
Token ID2332
Feature activation-8.23e-02

Pos logit contributions
ful+0.008
quet+0.008
ment+0.007
 fairy+0.007
bly+0.007
Neg logit contributions
rolling-0.006
 challenged-0.006
 disagreed-0.006
 favorites-0.005
 Whoever-0.005
Token eyes
Token ID2951
Feature activation-2.23e+00

Pos logit contributions
quet+0.003
ful+0.003
bly+0.003
ment+0.003
 fairy+0.003
Neg logit contributions
rolling-0.002
 challenged-0.002
 Whoever-0.002
 disagreed-0.002
 favorites-0.002
Token lit
Token ID6578
Feature activation-3.54e-01

Pos logit contributions
quet+0.082
ful+0.080
igsaw+0.072
 queens+0.070
kin+0.069
Neg logit contributions
rolling-0.068
 challenged-0.063
 disagreed-0.062
 padd-0.061
 inc-0.059
Token up
Token ID510
Feature activation+3.63e+00

Pos logit contributions
quet+0.016
ful+0.015
kin+0.014
aid+0.014
paste+0.014
Neg logit contributions
rolling-0.008
 challenged-0.008
 Whoever-0.007
 One-0.007
 padd-0.007
Token with
Token ID351
Feature activation+9.74e-01

Pos logit contributions
rolling+0.100
 disagreed+0.098
 One+0.098
 upon+0.098
 intrigued+0.095
Neg logit contributions
quet-0.140
aid-0.133
paste-0.126
ful-0.119
kin-0.117
Token joy
Token ID8716
Feature activation-3.51e+00

Pos logit contributions
 challenged+0.030
rolling+0.030
 padd+0.030
 disagreed+0.029
 intrigued+0.029
Neg logit contributions
quet-0.035
ful-0.031
web-0.030
kin-0.029
paste-0.029
Token when
Token ID618
Feature activation+2.22e+00

Pos logit contributions
quet+0.130
aid+0.121
 queens+0.113
igsaw+0.111
kin+0.111
Neg logit contributions
 disagreed-0.099
rolling-0.098
 challenged-0.093
 lined-0.089
 wanted-0.089
Token she
Token ID673
Feature activation+6.15e-01

Pos logit contributions
rolling+0.065
 challenged+0.065
 One+0.063
 disagreed+0.062
 padd+0.061
Neg logit contributions
quet-0.085
ful-0.077
aid-0.075
bly-0.072
cake-0.071
Token saw
Token ID2497
Feature activation+7.36e-01

Pos logit contributions
rolling+0.013
 Whoever+0.012
 challenged+0.012
 One+0.012
 favorites+0.011
Neg logit contributions
ful-0.027
quet-0.027
 fairy-0.024
kin-0.024
ment-0.024
Token it
Token ID340
Feature activation+5.43e-01

Pos logit contributions
 challenged+0.025
 disagreed+0.024
 padd+0.024
 favorites+0.024
rolling+0.023
Neg logit contributions
quet-0.025
ful-0.020
kin-0.019
bly-0.019
cake-0.019
Token up
Token ID510
Feature activation+4.30e+00

Pos logit contributions
rolling+0.018
 challenged+0.018
 disagreed+0.017
 Whoever+0.016
 One+0.016
Neg logit contributions
quet-0.019
ful-0.018
aid-0.016
ment-0.015
bly-0.015