TOP ACTIVATIONS
MAX = 6.212

. " Stop that right now ! What are
. " Stop that !" the mill er says
. " Stop that ! You are making too
. " Stop that noise !" she says .
. " Stop that , Tom ," Dad said
. " Stop that , Tim and Kim ,"
. " Stop that !" the man yelled .
. " Stop that noise !" a voice said
. " Stop that !" said Mom . "
. " Stop that right now !" she said
She says , " Stop that ! You are not nice
. " Stop that !" she shouted . "
. " Stop that , Lily ," her mom
. " Stop that right now !" she said
and said , " Stop that noise ! You 're hurting
. " Stop that , silly cat . The
You can 't go near that dog . It might be
" Look at that man !" Tim says .
They said , " Stop that ! You are mean !
. " Stop that ! Stop rocking ! You

MIN ACTIVATIONS
MIN = -8.111

, but it was all fore shadow ed by the kind
, and all the other av oc ados were very happy
there and wait . The fore shadow ing of the dull
Everyday he would take the av oc ados and scatter them
Grand ma 's words fore shadow ed a happy future
That was the setup that fore shadow ed the bad ending
tree was sad that the av oc ados were being taken
sleep so that no more mis chie vous mice could come
As the farmer scattered the av oc ados , he noticed
happy that it produced more av oc ados than ever before
the tree knew that the av oc ados would be back
Tim saw some other kids mis beh aving and running around
It was a big j igsaw puzzle and she wanted to
know , her mum was fore shadow ing ... the next
?" The little boy smiled mis chie v ously and said
so happy and brought the av oc ados home to share
would take out a j igsaw puzzle and try to solve
baby . He was a mis chie vous little baby and
It turned out that the av oc ados were bad and
and the princess were over joy ed , and they soon

INTERVAL 2.631 - 6.212
CONTAINS 1.372%

Yes , it ’ s so bright ! â
. Come on , let 's go play ."
. " O w , ow , ow !"
at him . Tim screamed and pulled back his hand .
. " Yes , we did , Tom .

INTERVAL -0.949 - 2.631
CONTAINS 80.051%

in a pile of leaves . She ran away and forgot
loneliness . Fl appy continued searching for a place
. Tim 's mom said , " Today , you will
to God . She asked God to help her to have
was white and fluffy . The boat loved to sail in

INTERVAL -4.530 - -0.949
CONTAINS 18.567%

happy and loved being a helper in the kitchen . <|endoftext|>
and started to move ! It was very scary . The
asked Lily if he could borrow her comb to comb his
hope so too !" She picked up the leaf and looked
to his room and found two books . One was blue

INTERVAL -8.111 - -4.530
CONTAINS 0.010%

Sam and Fran ran over to the peanuts
play together ." Da isy smiled and hugged Jack
moments together . Now whenever Heidi sees someone kneeling , she
. This week Lib by wanted to go to
wide the universe was and how silly and small she felt
Token.
Token ID13
Feature activation+8.27e-01

Pos logit contributions
opl+0.011
ats+0.010
 snowball+0.010
?"+0.010
ossible+0.009
Neg logit contributions
 Grac-0.013
nam-0.013
 Claire-0.013
rily-0.012
 Katy-0.012
Token
Token ID198
Feature activation+9.59e-01

Pos logit contributions
opl+0.017
?"+0.015
ats+0.015
 snowball+0.014
?+0.014
Neg logit contributions
 Grac-0.032
nam-0.031
 Claire-0.030
rily-0.030
 Katy-0.029
Token
Token ID198
Feature activation+6.87e-01

Pos logit contributions
opl+0.027
ats+0.024
 snowball+0.023
ah+0.023
at+0.022
Neg logit contributions
 Grac-0.029
 Claire-0.026
nam-0.026
rily-0.026
 Katy-0.025
Token"
Token ID1
Feature activation-2.41e+00

Pos logit contributions
opl+0.018
 snowball+0.016
ats+0.016
?"+0.015
?+0.015
Neg logit contributions
 Grac-0.021
nam-0.021
ve-0.020
 Claire-0.020
rily-0.020
TokenStop
Token ID19485
Feature activation+8.08e-01

Pos logit contributions
nam+0.092
 Grac+0.091
 Katy+0.087
rily+0.087
 Claire+0.083
Neg logit contributions
?"-0.047
 snowball-0.047
opl-0.045
?-0.044
aw-0.042
Token that
Token ID326
Feature activation+6.21e+00

Pos logit contributions
opl+0.019
?"+0.018
 snowball+0.017
ats+0.017
?+0.016
Neg logit contributions
 Grac-0.028
nam-0.028
rily-0.026
 Claire-0.026
 Katy-0.026
Token right
Token ID826
Feature activation-1.81e-01

Pos logit contributions
opl+0.135
ats+0.116
?"+0.111
 snowball+0.110
ossible+0.108
Neg logit contributions
 Grac-0.227
rily-0.213
 Claire-0.209
ve-0.209
nam-0.208
Token now
Token ID783
Feature activation+3.09e+00

Pos logit contributions
 Grac+0.006
nam+0.006
 Claire+0.006
 Katy+0.006
rily+0.006
Neg logit contributions
opl-0.004
ats-0.004
 snowball-0.003
ossible-0.003
ch-0.003
Token!
Token ID0
Feature activation+2.34e+00

Pos logit contributions
opl+0.069
 snowball+0.059
ats+0.059
?"+0.056
o+0.056
Neg logit contributions
 Grac-0.111
nam-0.106
 Claire-0.105
 Katy-0.105
rily-0.105
Token What
Token ID1867
Feature activation-2.46e+00

Pos logit contributions
opl+0.061
?"+0.056
ats+0.055
 snowball+0.054
?+0.051
Neg logit contributions
 Grac-0.077
nam-0.074
 Claire-0.071
rily-0.070
 Katy-0.070
Token are
Token ID389
Feature activation-3.45e-01

Pos logit contributions
 Grac+0.083
nam+0.083
 Katy+0.079
 Claire+0.079
 brace+0.075
Neg logit contributions
opl-0.061
 snowball-0.051
ats-0.048
ch-0.046
 microwave-0.045
Token.
Token ID13
Feature activation+6.71e-01

Pos logit contributions
opl+0.003
?"+0.003
 snowball+0.003
ats+0.003
ch+0.003
Neg logit contributions
nam-0.004
 Grac-0.004
rily-0.004
 Claire-0.003
 Katy-0.003
Token
Token ID198
Feature activation+1.08e+00

Pos logit contributions
ats+0.017
opl+0.017
?"+0.015
 snowball+0.015
at+0.015
Neg logit contributions
 Grac-0.022
nam-0.021
 Claire-0.021
 Katy-0.020
rily-0.020
Token
Token ID198
Feature activation+7.13e-01

Pos logit contributions
opl+0.032
ats+0.031
at+0.030
ah+0.029
?+0.028
Neg logit contributions
 Grac-0.027
 Claire-0.025
ve-0.024
 Katy-0.024
nam-0.024
Token"
Token ID1
Feature activation-2.40e+00

Pos logit contributions
opl+0.021
ats+0.020
?"+0.019
at+0.019
 snowball+0.018
Neg logit contributions
nam-0.017
Gr-0.017
 Grac-0.016
 Claire-0.016
ve-0.016
TokenStop
Token ID19485
Feature activation+6.96e-01

Pos logit contributions
 Grac+0.091
nam+0.090
rily+0.087
 Katy+0.084
 Claire+0.083
Neg logit contributions
opl-0.050
?"-0.047
 snowball-0.046
?-0.044
o-0.042
Token that
Token ID326
Feature activation+6.08e+00

Pos logit contributions
opl+0.017
?"+0.014
ats+0.014
 snowball+0.014
ossible+0.013
Neg logit contributions
 Grac-0.025
nam-0.024
 Claire-0.023
rily-0.023
 Katy-0.022
Token!"
Token ID2474
Feature activation-3.15e-01

Pos logit contributions
opl+0.139
ats+0.122
o+0.116
ossible+0.115
 snowball+0.113
Neg logit contributions
 Grac-0.201
ve-0.197
rily-0.190
nam-0.184
rier-0.182
Token the
Token ID262
Feature activation+1.41e+00

Pos logit contributions
nam+0.011
 Grac+0.011
 Claire+0.011
 Katy+0.010
rily+0.010
Neg logit contributions
opl-0.008
ats-0.007
?"-0.006
 snowball-0.006
o-0.006
Token mill
Token ID3939
Feature activation+6.20e-01

Pos logit contributions
opl+0.035
?"+0.032
 snowball+0.030
ats+0.030
?+0.030
Neg logit contributions
 Grac-0.048
nam-0.048
rily-0.045
 Claire-0.045
 Katy-0.044
Tokener
Token ID263
Feature activation+1.76e+00

Pos logit contributions
opl+0.015
?"+0.013
 snowball+0.012
ats+0.012
?+0.012
Neg logit contributions
 Grac-0.021
nam-0.021
rily-0.020
 Claire-0.020
edly-0.019
Token says
Token ID1139
Feature activation-7.60e-02

Pos logit contributions
opl+0.039
 snowball+0.033
ats+0.032
?"+0.032
ossible+0.029
Neg logit contributions
 Grac-0.064
nam-0.062
rily-0.061
 Claire-0.059
ve-0.059
Token.
Token ID13
Feature activation+5.67e-01

Pos logit contributions
opl+0.034
ats+0.032
 snowball+0.030
ossible+0.030
at+0.029
Neg logit contributions
 Grac-0.027
 Claire-0.026
edly-0.026
 Katy-0.025
rily-0.025
Token
Token ID198
Feature activation+5.35e-01

Pos logit contributions
opl+0.011
?"+0.010
 snowball+0.009
?+0.009
ats+0.009
Neg logit contributions
 Grac-0.023
nam-0.022
rily-0.021
 Claire-0.021
 Katy-0.021
Token
Token ID198
Feature activation+4.02e-01

Pos logit contributions
opl+0.014
ats+0.013
 snowball+0.013
ah+0.012
at+0.012
Neg logit contributions
 Grac-0.017
nam-0.016
 Claire-0.016
 Katy-0.015
rily-0.015
Token"
Token ID1
Feature activation-2.28e+00

Pos logit contributions
opl+0.009
 snowball+0.008
?"+0.008
ats+0.008
?+0.007
Neg logit contributions
 Grac-0.014
nam-0.014
 Claire-0.013
 Katy-0.013
rily-0.013
TokenStop
Token ID19485
Feature activation+7.58e-01

Pos logit contributions
 Grac+0.090
rily+0.089
nam+0.089
edly+0.083
 Katy+0.083
Neg logit contributions
?"-0.041
?-0.038
opl-0.038
 snowball-0.038
ch-0.036
Token that
Token ID326
Feature activation+6.08e+00

Pos logit contributions
opl+0.018
?"+0.016
ats+0.015
 snowball+0.015
?+0.014
Neg logit contributions
 Grac-0.027
nam-0.026
 Claire-0.025
rily-0.025
 Katy-0.025
Token!
Token ID0
Feature activation+2.49e+00

Pos logit contributions
opl+0.126
ats+0.116
ossible+0.106
 snowball+0.106
o+0.105
Neg logit contributions
 Grac-0.214
ve-0.206
rily-0.205
nam-0.203
 Katy-0.202
Token You
Token ID921
Feature activation+9.86e-01

Pos logit contributions
opl+0.066
?"+0.060
ats+0.060
 snowball+0.059
?+0.056
Neg logit contributions
 Grac-0.081
nam-0.078
 Claire-0.075
rily-0.073
 Katy-0.073
Token are
Token ID389
Feature activation+7.59e-01

Pos logit contributions
opl+0.026
 snowball+0.023
?"+0.021
ats+0.021
?+0.020
Neg logit contributions
 Grac-0.032
nam-0.031
ve-0.030
 Claire-0.028
 Katy-0.028
Token making
Token ID1642
Feature activation+2.05e+00

Pos logit contributions
opl+0.018
?"+0.017
?+0.016
 snowball+0.016
ats+0.015
Neg logit contributions
 Grac-0.027
nam-0.026
rily-0.025
 Claire-0.025
edly-0.024
Token too
Token ID1165
Feature activation+2.23e+00

Pos logit contributions
opl+0.055
?"+0.051
ats+0.050
 snowball+0.048
?+0.047
Neg logit contributions
 Grac-0.065
nam-0.064
 Claire-0.060
rily-0.060
 Katy-0.060
Token.
Token ID13
Feature activation+5.58e-01

Pos logit contributions
opl+0.033
ats+0.032
 snowball+0.030
at+0.030
ossible+0.029
Neg logit contributions
 Grac-0.023
nam-0.022
edly-0.022
 Claire-0.021
 brace-0.021
Token
Token ID198
Feature activation+6.37e-01

Pos logit contributions
opl+0.016
?"+0.015
 snowball+0.015
ats+0.015
?+0.014
Neg logit contributions
 Grac-0.016
nam-0.016
rily-0.015
 Claire-0.015
 Katy-0.015
Token
Token ID198
Feature activation+3.30e-01

Pos logit contributions
opl+0.018
 snowball+0.016
ats+0.016
ah+0.015
at+0.015
Neg logit contributions
 Grac-0.020
nam-0.018
 Claire-0.018
ve-0.017
 Katy-0.017
Token"
Token ID1
Feature activation-2.61e+00

Pos logit contributions
opl+0.009
 snowball+0.008
?"+0.008
ats+0.008
o+0.007
Neg logit contributions
 Grac-0.010
nam-0.010
 Claire-0.009
ve-0.009
 Katy-0.009
TokenStop
Token ID19485
Feature activation+6.27e-01

Pos logit contributions
rily+0.109
 Grac+0.104
nam+0.102
 Katy+0.101
edly+0.099
Neg logit contributions
?"-0.043
?-0.041
 snowball-0.039
ch-0.038
 Rex-0.034
Token that
Token ID326
Feature activation+6.02e+00

Pos logit contributions
opl+0.016
?"+0.014
ats+0.014
 snowball+0.014
?+0.013
Neg logit contributions
 Grac-0.021
nam-0.021
 Claire-0.020
rily-0.020
 Katy-0.019
Token noise
Token ID7838
Feature activation+2.41e+00

Pos logit contributions
opl+0.132
ats+0.115
 snowball+0.109
o+0.107
?"+0.105
Neg logit contributions
 Grac-0.217
ve-0.201
rily-0.200
nam-0.199
 Claire-0.198
Token!"
Token ID2474
Feature activation-5.26e-01

Pos logit contributions
opl+0.075
ossible+0.062
 snowball+0.061
 Arctic+0.060
 microwave+0.059
Neg logit contributions
ve-0.062
 Grac-0.062
 Katy-0.059
 Claire-0.059
rily-0.056
Token she
Token ID673
Feature activation+2.42e+00

Pos logit contributions
 Grac+0.013
nam+0.013
rily+0.012
 Claire+0.011
 Katy+0.011
Neg logit contributions
opl-0.018
?"-0.018
?-0.017
 snowball-0.016
ats-0.016
Token says
Token ID1139
Feature activation+3.85e-02

Pos logit contributions
opl+0.051
ats+0.038
 snowball+0.038
?"+0.037
loo+0.037
Neg logit contributions
 Grac-0.094
nam-0.090
 brace-0.087
 Claire-0.087
ve-0.086
Token.
Token ID13
Feature activation+1.15e+00

Pos logit contributions
?"+0.000
?+0.000
o+0.000
opl+0.000
ats+0.000
Neg logit contributions
 Grac-0.002
nam-0.002
 Claire-0.002
 Katy-0.002
rily-0.002
Token.
Token ID13
Feature activation-1.74e-01

Pos logit contributions
opl+0.028
?"+0.025
ats+0.024
 snowball+0.024
?+0.023
Neg logit contributions
nam-0.041
 Grac-0.040
rily-0.038
 Claire-0.038
 Katy-0.037
Token
Token ID198
Feature activation+4.74e-01

Pos logit contributions
 Grac+0.006
nam+0.006
 Claire+0.006
rily+0.006
 Katy+0.005
Neg logit contributions
opl-0.004
ats-0.004
?"-0.004
?-0.004
 snowball-0.003
Token
Token ID198
Feature activation+3.05e-01

Pos logit contributions
opl+0.016
ats+0.014
at+0.014
ah+0.013
 snowball+0.013
Neg logit contributions
 Grac-0.013
 Claire-0.011
Gr-0.010
ve-0.010
 Katy-0.010
Token"
Token ID1
Feature activation-2.47e+00

Pos logit contributions
opl+0.009
ats+0.008
?"+0.008
 snowball+0.008
at+0.007
Neg logit contributions
 Grac-0.009
nam-0.009
 Claire-0.008
Gr-0.008
ve-0.008
TokenStop
Token ID19485
Feature activation+5.91e-01

Pos logit contributions
 Grac+0.074
nam+0.074
 Claire+0.069
rily+0.066
edly+0.065
Neg logit contributions
opl-0.074
?"-0.069
ats-0.067
?-0.063
 snowball-0.061
Token that
Token ID326
Feature activation+6.01e+00

Pos logit contributions
opl+0.016
ats+0.014
ossible+0.014
 snowball+0.013
?"+0.013
Neg logit contributions
 Grac-0.019
nam-0.019
 Claire-0.017
rily-0.017
 Katy-0.017
Token,
Token ID11
Feature activation+2.15e+00

Pos logit contributions
opl+0.129
ats+0.112
o+0.109
 snowball+0.105
?"+0.103
Neg logit contributions
 Grac-0.221
rily-0.209
nam-0.209
 Claire-0.202
ve-0.201
Token Tom
Token ID4186
Feature activation+1.51e+00

Pos logit contributions
opl+0.066
?"+0.061
ats+0.059
 snowball+0.058
?+0.058
Neg logit contributions
 Grac-0.061
nam-0.060
rily-0.056
 Claire-0.056
 Katy-0.055
Token,"
Token ID553
Feature activation-1.14e+00

Pos logit contributions
opl+0.046
ossible+0.039
 snowball+0.039
haw+0.037
ats+0.036
Neg logit contributions
 Grac-0.038
rily-0.037
 brace-0.036
ve-0.036
 Katy-0.036
Token Dad
Token ID17415
Feature activation+9.98e-01

Pos logit contributions
 Grac+0.035
nam+0.034
 Claire+0.032
 Katy+0.031
rily+0.030
Neg logit contributions
opl-0.035
ats-0.031
?"-0.030
 snowball-0.029
o-0.028
Token said
Token ID531
Feature activation+3.83e-01

Pos logit contributions
opl+0.034
 snowball+0.027
?"+0.027
ats+0.026
ossible+0.025
Neg logit contributions
 Grac-0.029
nam-0.027
ve-0.025
 Claire-0.025
 Katy-0.025
Token.
Token ID13
Feature activation+3.65e-01

Pos logit contributions
nam+0.023
 Grac+0.022
rily+0.021
 Claire+0.020
 Katy+0.020
Neg logit contributions
opl-0.019
?"-0.017
 snowball-0.016
ats-0.016
ossible-0.016
Token
Token ID198
Feature activation+1.18e+00

Pos logit contributions
opl+0.010
ats+0.009
?"+0.009
 snowball+0.008
?+0.008
Neg logit contributions
 Grac-0.012
nam-0.011
 Claire-0.011
 Katy-0.011
rily-0.010
Token
Token ID198
Feature activation+8.52e-01

Pos logit contributions
opl+0.038
ats+0.033
at+0.033
 snowball+0.032
ah+0.032
Neg logit contributions
 Grac-0.032
 Claire-0.028
ve-0.028
 Katy-0.027
Gr-0.027
Token"
Token ID1
Feature activation-2.39e+00

Pos logit contributions
opl+0.024
?"+0.020
 snowball+0.020
ats+0.020
 microwave+0.019
Neg logit contributions
nam-0.027
 Grac-0.027
ve-0.025
Gr-0.024
 Claire-0.024
TokenStop
Token ID19485
Feature activation+4.90e-01

Pos logit contributions
 Grac+0.078
nam+0.077
 Claire+0.073
rily+0.072
 Katy+0.071
Neg logit contributions
opl-0.064
?"-0.059
ats-0.056
?-0.055
 snowball-0.054
Token that
Token ID326
Feature activation+5.99e+00

Pos logit contributions
opl+0.013
ats+0.011
ossible+0.011
 snowball+0.010
?"+0.010
Neg logit contributions
 Grac-0.016
nam-0.016
 Claire-0.015
 Katy-0.015
rily-0.014
Token,
Token ID11
Feature activation+2.12e+00

Pos logit contributions
opl+0.139
ats+0.124
ossible+0.116
o+0.116
aster+0.114
Neg logit contributions
 Grac-0.195
ve-0.188
rily-0.184
 Claire-0.179
 brace-0.178
Token Tim
Token ID5045
Feature activation+2.18e+00

Pos logit contributions
opl+0.058
?"+0.050
ats+0.050
 snowball+0.048
o+0.048
Neg logit contributions
 Grac-0.069
nam-0.068
 Claire-0.064
rily-0.063
 Katy-0.062
Token and
Token ID290
Feature activation-2.42e-01

Pos logit contributions
opl+0.058
 snowball+0.050
ossible+0.048
ats+0.047
ch+0.047
Neg logit contributions
 Grac-0.062
ve-0.062
rily-0.061
 Katy-0.061
 Claire-0.060
Token Kim
Token ID6502
Feature activation+6.31e-01

Pos logit contributions
nam+0.008
 Grac+0.008
 brace+0.007
 Katy+0.007
 rib+0.007
Neg logit contributions
opl-0.006
?"-0.006
 snowball-0.006
ats-0.005
?-0.005
Token,"
Token ID553
Feature activation-1.26e+00

Pos logit contributions
opl+0.016
 snowball+0.015
 microwave+0.013
ossible+0.013
iren+0.013
Neg logit contributions
 Grac-0.019
ve-0.018
 Claire-0.018
 Katy-0.018
nam-0.018
Token.
Token ID13
Feature activation+4.95e-01

Pos logit contributions
opl+0.048
?"+0.044
 snowball+0.041
ats+0.041
?+0.040
Neg logit contributions
 Grac-0.078
nam-0.077
rily-0.074
 Claire-0.074
 Katy-0.072
Token
Token ID198
Feature activation+6.17e-01

Pos logit contributions
opl+0.011
?"+0.009
ats+0.009
 snowball+0.009
?+0.009
Neg logit contributions
 Grac-0.019
nam-0.018
 Claire-0.018
rily-0.017
 Katy-0.017
Token
Token ID198
Feature activation+3.11e-01

Pos logit contributions
opl+0.018
ats+0.016
 snowball+0.015
ah+0.015
at+0.015
Neg logit contributions
 Grac-0.018
 Claire-0.017
 Katy-0.016
nam-0.016
rily-0.016
Token"
Token ID1
Feature activation-2.51e+00

Pos logit contributions
opl+0.009
 snowball+0.007
?"+0.007
ats+0.007
?+0.007
Neg logit contributions
 Grac-0.010
nam-0.009
 Claire-0.009
rily-0.009
ve-0.009
TokenStop
Token ID19485
Feature activation+5.27e-01

Pos logit contributions
 Grac+0.089
nam+0.088
rily+0.085
 Katy+0.083
 Claire+0.082
Neg logit contributions
opl-0.057
?"-0.054
 snowball-0.051
?-0.049
o-0.047
Token that
Token ID326
Feature activation+5.99e+00

Pos logit contributions
opl+0.012
?"+0.011
 snowball+0.010
ats+0.010
?+0.010
Neg logit contributions
 Grac-0.019
nam-0.019
rily-0.018
 Claire-0.018
 Katy-0.018
Token!"
Token ID2474
Feature activation-3.02e-01

Pos logit contributions
opl+0.122
ats+0.108
 snowball+0.101
o+0.099
ossible+0.098
Neg logit contributions
 Grac-0.217
rily-0.209
ve-0.206
 Claire-0.204
nam-0.204
Token the
Token ID262
Feature activation+1.11e+00

Pos logit contributions
 Grac+0.010
nam+0.010
rily+0.009
 Claire+0.009
 Katy+0.009
Neg logit contributions
?"-0.008
opl-0.008
?-0.007
 snowball-0.007
ats-0.007
Token man
Token ID582
Feature activation-5.44e-01

Pos logit contributions
opl+0.025
?"+0.025
?+0.022
 snowball+0.022
ats+0.022
Neg logit contributions
 Grac-0.040
nam-0.039
rily-0.038
 Claire-0.037
 Katy-0.037
Token yelled
Token ID22760
Feature activation+7.84e-01

Pos logit contributions
 Grac+0.018
nam+0.018
rily+0.017
 Claire+0.017
 Katy+0.017
Neg logit contributions
opl-0.014
?"-0.013
ats-0.012
 snowball-0.012
?-0.012
Token.
Token ID13
Feature activation+8.97e-01

Pos logit contributions
?"+0.010
?+0.009
 snowball+0.008
opl+0.007
 Rex+0.007
Neg logit contributions
nam-0.036
 Grac-0.035
 Katy-0.033
 Claire-0.033
rily-0.033
Token.
Token ID13
Feature activation+7.79e-01

Pos logit contributions
opl+0.032
ats+0.028
 snowball+0.028
?"+0.028
ossible+0.027
Neg logit contributions
 Grac-0.039
nam-0.038
rily-0.036
 Claire-0.036
edly-0.036
Token
Token ID198
Feature activation+1.02e+00

Pos logit contributions
opl+0.019
?"+0.018
?+0.016
 snowball+0.016
ats+0.016
Neg logit contributions
 Grac-0.027
nam-0.027
rily-0.026
 Claire-0.025
 Katy-0.025
Token
Token ID198
Feature activation+6.77e-01

Pos logit contributions
opl+0.027
 snowball+0.024
ats+0.024
?"+0.023
 microwave+0.022
Neg logit contributions
 Grac-0.032
nam-0.030
 Claire-0.030
 Katy-0.029
ve-0.028
Token"
Token ID1
Feature activation-2.30e+00

Pos logit contributions
opl+0.022
 snowball+0.020
?"+0.019
ats+0.019
?+0.018
Neg logit contributions
nam-0.018
 Grac-0.018
 Claire-0.017
ve-0.016
edly-0.016
TokenStop
Token ID19485
Feature activation+5.38e-01

Pos logit contributions
 Grac+0.091
nam+0.089
rily+0.088
 Katy+0.085
 Claire+0.083
Neg logit contributions
?"-0.042
opl-0.042
?-0.039
 snowball-0.039
ats-0.036
Token that
Token ID326
Feature activation+5.97e+00

Pos logit contributions
opl+0.013
?"+0.012
ats+0.011
 snowball+0.011
?+0.011
Neg logit contributions
 Grac-0.019
nam-0.018
 Claire-0.017
rily-0.017
 Katy-0.017
Token noise
Token ID7838
Feature activation+2.45e+00

Pos logit contributions
opl+0.128
ats+0.110
o+0.109
 snowball+0.108
?"+0.107
Neg logit contributions
 Grac-0.217
nam-0.209
rily-0.204
 Claire-0.203
 Katy-0.203
Token!"
Token ID2474
Feature activation-3.36e-01

Pos logit contributions
opl+0.071
 snowball+0.057
ossible+0.057
ats+0.055
 microwave+0.054
Neg logit contributions
 Grac-0.071
 Claire-0.069
ve-0.068
 Katy-0.068
rily-0.067
Token a
Token ID257
Feature activation+1.26e+00

Pos logit contributions
 Grac+0.010
nam+0.010
rily+0.009
 Claire+0.009
 Katy+0.009
Neg logit contributions
opl-0.010
?"-0.010
 snowball-0.009
?-0.009
ats-0.009
Token voice
Token ID3809
Feature activation+1.90e+00

Pos logit contributions
?"+0.031
opl+0.030
 snowball+0.029
?+0.029
ch+0.027
Neg logit contributions
nam-0.041
 Grac-0.041
rily-0.040
 Claire-0.038
 Katy-0.038
Token said
Token ID531
Feature activation+3.25e-01

Pos logit contributions
opl+0.046
?"+0.042
 snowball+0.040
ats+0.040
?+0.039
Neg logit contributions
 Grac-0.065
nam-0.064
rily-0.061
 Claire-0.061
 Katy-0.060
Token.
Token ID13
Feature activation+1.91e-01

Pos logit contributions
opl+0.023
?"+0.021
ats+0.020
 snowball+0.020
?+0.019
Neg logit contributions
 Grac-0.038
nam-0.037
rily-0.035
 Claire-0.035
 Katy-0.035
Token
Token ID198
Feature activation+5.98e-01

Pos logit contributions
opl+0.004
?"+0.004
 snowball+0.003
?+0.003
ats+0.003
Neg logit contributions
nam-0.007
 Grac-0.007
rily-0.007
 Katy-0.007
 Claire-0.007
Token
Token ID198
Feature activation+1.57e-01

Pos logit contributions
opl+0.018
ats+0.015
 snowball+0.015
?"+0.014
at+0.014
Neg logit contributions
 Grac-0.019
 Claire-0.017
 Katy-0.016
ve-0.016
rily-0.016
Token"
Token ID1
Feature activation-2.73e+00

Pos logit contributions
opl+0.004
?"+0.003
 snowball+0.003
ats+0.003
?+0.003
Neg logit contributions
 Grac-0.005
nam-0.005
 Claire-0.005
rily-0.005
 Katy-0.005
TokenStop
Token ID19485
Feature activation+2.13e-01

Pos logit contributions
nam+0.094
 Grac+0.093
rily+0.092
 Katy+0.087
ve+0.086
Neg logit contributions
opl-0.062
?"-0.060
 snowball-0.059
?-0.056
o-0.055
Token that
Token ID326
Feature activation+5.95e+00

Pos logit contributions
opl+0.006
ats+0.005
?"+0.005
 snowball+0.005
ossible+0.005
Neg logit contributions
 Grac-0.007
nam-0.007
 Claire-0.006
rily-0.006
 Katy-0.006
Token!"
Token ID2474
Feature activation-7.58e-01

Pos logit contributions
opl+0.140
ats+0.114
 snowball+0.110
?"+0.108
ossible+0.108
Neg logit contributions
 Grac-0.212
ve-0.194
rily-0.194
 Claire-0.187
 Katy-0.185
Token said
Token ID531
Feature activation+9.14e-01

Pos logit contributions
nam+0.025
 Grac+0.025
rily+0.024
 Katy+0.023
 Claire+0.023
Neg logit contributions
?"-0.018
opl-0.017
?-0.017
 snowball-0.016
ats-0.016
Token Mom
Token ID11254
Feature activation+2.20e+00

Pos logit contributions
?"+0.021
?+0.019
 Rex+0.019
ossible+0.017
 snowball+0.017
Neg logit contributions
nam-0.032
ve-0.031
rily-0.030
 brace-0.030
chie-0.030
Token.
Token ID13
Feature activation+2.90e-01

Pos logit contributions
opl+0.052
?"+0.046
ats+0.043
 snowball+0.043
?+0.041
Neg logit contributions
 Grac-0.079
nam-0.078
rily-0.074
 Claire-0.073
 Katy-0.072
Token "
Token ID366
Feature activation-8.24e-01

Pos logit contributions
?"+0.007
?+0.006
opl+0.006
 snowball+0.006
 Rex+0.006
Neg logit contributions
nam-0.010
 Grac-0.010
rily-0.010
ve-0.009
 brace-0.009
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
opl+0.023
 snowball+0.021
?"+0.020
ats+0.020
ossible+0.020
Neg logit contributions
 Grac-0.031
nam-0.031
 Claire-0.029
rily-0.029
 Katy-0.029
Token
Token ID198
Feature activation+3.81e-01

Pos logit contributions
nam+0.004
 Grac+0.004
rily+0.003
 brace+0.003
 Katy+0.003
Neg logit contributions
?"-0.002
opl-0.002
?-0.002
 snowball-0.002
o-0.002
Token
Token ID198
Feature activation+1.98e-01

Pos logit contributions
opl+0.010
 snowball+0.008
ats+0.008
?"+0.008
?+0.008
Neg logit contributions
 Grac-0.013
 Claire-0.012
 Katy-0.012
nam-0.012
rily-0.012
Token"
Token ID1
Feature activation-2.54e+00

Pos logit contributions
opl+0.004
 snowball+0.004
?"+0.004
ats+0.004
o+0.003
Neg logit contributions
 Grac-0.007
nam-0.007
 Claire-0.007
 Katy-0.007
rily-0.007
TokenStop
Token ID19485
Feature activation+2.94e-01

Pos logit contributions
 Grac+0.090
nam+0.089
rily+0.088
 Katy+0.083
edly+0.080
Neg logit contributions
?"-0.055
?-0.051
opl-0.051
 snowball-0.050
ch-0.047
Token that
Token ID326
Feature activation+5.91e+00

Pos logit contributions
opl+0.008
?"+0.007
ats+0.007
 snowball+0.007
?+0.006
Neg logit contributions
 Grac-0.010
nam-0.010
 Claire-0.009
rily-0.009
 Katy-0.009
Token right
Token ID826
Feature activation-5.94e-01

Pos logit contributions
opl+0.126
ats+0.114
 snowball+0.107
ossible+0.106
?"+0.105
Neg logit contributions
 Grac-0.209
nam-0.198
rily-0.197
 Claire-0.197
ve-0.196
Token now
Token ID783
Feature activation+2.62e+00

Pos logit contributions
 Grac+0.025
nam+0.024
 Claire+0.024
rily+0.024
 Katy+0.024
Neg logit contributions
opl-0.010
ats-0.008
 snowball-0.007
ossible-0.007
ch-0.007
Token!"
Token ID2474
Feature activation-4.04e-01

Pos logit contributions
opl+0.057
 snowball+0.049
ats+0.048
?"+0.047
o+0.046
Neg logit contributions
 Grac-0.096
nam-0.092
 Claire-0.091
 Katy-0.090
rily-0.090
Token she
Token ID673
Feature activation+2.51e+00

Pos logit contributions
nam+0.012
 Grac+0.012
rily+0.011
ve+0.011
chie+0.010
Neg logit contributions
?"-0.012
opl-0.011
?-0.011
 snowball-0.010
ossible-0.010
Token said
Token ID531
Feature activation+6.49e-01

Pos logit contributions
?"+0.044
?+0.037
opl+0.037
o+0.036
 snowball+0.036
Neg logit contributions
nam-0.102
 Grac-0.100
rily-0.099
 Claire-0.095
 Katy-0.094
Token She
Token ID1375
Feature activation+1.72e+00

Pos logit contributions
opl+0.003
?"+0.003
?+0.003
 snowball+0.003
o+0.002
Neg logit contributions
nam-0.003
 Grac-0.003
rily-0.003
 Claire-0.003
ve-0.003
Token says
Token ID1139
Feature activation-1.76e-01

Pos logit contributions
opl+0.033
 snowball+0.027
?"+0.027
ats+0.026
o+0.025
Neg logit contributions
 Grac-0.067
nam-0.065
 brace-0.063
 Claire-0.062
ve-0.062
Token,
Token ID11
Feature activation+7.46e-01

Pos logit contributions
nam+0.008
 Grac+0.007
rily+0.007
 Claire+0.007
 Katy+0.007
Neg logit contributions
?"-0.003
opl-0.003
?-0.002
ats-0.002
 snowball-0.002
Token "
Token ID366
Feature activation-9.27e-01

Pos logit contributions
opl+0.021
?"+0.020
?+0.019
 snowball+0.019
o+0.018
Neg logit contributions
 Grac-0.022
nam-0.022
rily-0.021
 Claire-0.021
 Katy-0.021
TokenStop
Token ID19485
Feature activation+4.43e-01

Pos logit contributions
 Grac+0.034
rily+0.034
nam+0.033
 Claire+0.032
 Katy+0.032
Neg logit contributions
opl-0.019
?"-0.019
?-0.018
 snowball-0.017
ossible-0.017
Token that
Token ID326
Feature activation+5.89e+00

Pos logit contributions
opl+0.011
?"+0.010
 snowball+0.010
ats+0.010
?+0.009
Neg logit contributions
 Grac-0.015
nam-0.015
rily-0.014
 Claire-0.014
 Katy-0.014
Token!
Token ID0
Feature activation+2.17e+00

Pos logit contributions
opl+0.134
ats+0.116
?"+0.114
 snowball+0.113
o+0.109
Neg logit contributions
 Grac-0.208
nam-0.200
rily-0.196
 Claire-0.195
 Katy-0.193
Token You
Token ID921
Feature activation+8.00e-01

Pos logit contributions
opl+0.053
?"+0.050
 snowball+0.047
ats+0.047
?+0.045
Neg logit contributions
 Grac-0.074
nam-0.073
 Claire-0.069
rily-0.069
 Katy-0.068
Token are
Token ID389
Feature activation+5.29e-01

Pos logit contributions
opl+0.022
 snowball+0.019
?"+0.017
ats+0.017
 microwave+0.017
Neg logit contributions
 Grac-0.026
nam-0.025
ve-0.024
 Claire-0.023
 Katy-0.022
Token not
Token ID407
Feature activation-4.28e-01

Pos logit contributions
?"+0.011
opl+0.010
?+0.010
o+0.009
 snowball+0.009
Neg logit contributions
 Grac-0.020
rily-0.020
nam-0.020
edly-0.019
 Claire-0.019
Token nice
Token ID3621
Feature activation+7.43e-01

Pos logit contributions
 Grac+0.016
rily+0.016
nam+0.015
edly+0.015
 Claire+0.015
Neg logit contributions
?"-0.010
opl-0.009
?-0.009
o-0.008
 snowball-0.008
Token.
Token ID13
Feature activation+3.49e-01

Pos logit contributions
opl+0.027
ats+0.025
at+0.024
 snowball+0.023
ossible+0.023
Neg logit contributions
 Grac-0.020
edly-0.019
 Claire-0.018
nam-0.018
rily-0.018
Token
Token ID198
Feature activation+4.02e-01

Pos logit contributions
opl+0.010
?"+0.009
 snowball+0.009
?+0.009
ats+0.009
Neg logit contributions
 Grac-0.011
nam-0.011
rily-0.010
 Claire-0.010
 Katy-0.010
Token
Token ID198
Feature activation+4.99e-02

Pos logit contributions
opl+0.010
 snowball+0.009
ats+0.009
?"+0.009
 microwave+0.008
Neg logit contributions
 Grac-0.013
nam-0.012
 Claire-0.012
 Katy-0.012
ve-0.012
Token"
Token ID1
Feature activation-2.71e+00

Pos logit contributions
opl+0.001
 snowball+0.001
?"+0.001
ats+0.001
o+0.001
Neg logit contributions
 Grac-0.002
nam-0.002
ve-0.001
 Claire-0.001
 Katy-0.001
TokenStop
Token ID19485
Feature activation+3.53e-01

Pos logit contributions
rily+0.104
 Grac+0.102
nam+0.100
 Katy+0.099
 Claire+0.096
Neg logit contributions
?"-0.052
 snowball-0.048
opl-0.048
?-0.048
ch-0.047
Token that
Token ID326
Feature activation+5.86e+00

Pos logit contributions
opl+0.009
ats+0.008
?"+0.007
 snowball+0.007
ossible+0.007
Neg logit contributions
 Grac-0.012
nam-0.012
 Claire-0.011
rily-0.011
 Katy-0.011
Token!"
Token ID2474
Feature activation-7.22e-01

Pos logit contributions
opl+0.124
ats+0.109
ossible+0.104
 snowball+0.100
o+0.099
Neg logit contributions
 Grac-0.209
rily-0.201
ve-0.198
 Katy-0.194
 Claire-0.193
Token she
Token ID673
Feature activation+2.33e+00

Pos logit contributions
 Grac+0.021
nam+0.020
rily+0.020
 Claire+0.019
 Katy+0.019
Neg logit contributions
opl-0.021
?"-0.021
?-0.020
 snowball-0.019
ats-0.019
Token shouted
Token ID17293
Feature activation+9.25e-01

Pos logit contributions
?"+0.047
opl+0.045
 snowball+0.042
?+0.041
ats+0.041
Neg logit contributions
 Grac-0.088
nam-0.087
rily-0.084
 Claire-0.083
 Katy-0.082
Token.
Token ID13
Feature activation+9.21e-01

Pos logit contributions
?"+0.009
?+0.009
 snowball+0.008
opl+0.007
 Rex+0.007
Neg logit contributions
nam-0.043
 Grac-0.043
 Claire-0.041
rily-0.041
 Katy-0.041
Token "
Token ID366
Feature activation-6.58e-01

Pos logit contributions
?"+0.023
opl+0.023
?+0.021
 snowball+0.020
o+0.020
Neg logit contributions
nam-0.030
 Grac-0.030
rily-0.029
 Claire-0.027
 Katy-0.027
Token.
Token ID13
Feature activation+3.45e-01

Pos logit contributions
opl+0.035
?"+0.033
?+0.031
ats+0.031
 snowball+0.030
Neg logit contributions
 Grac-0.067
nam-0.066
 Claire-0.063
rily-0.063
 Katy-0.062
Token
Token ID198
Feature activation+5.07e-01

Pos logit contributions
opl+0.007
?"+0.006
?+0.006
o+0.006
 snowball+0.005
Neg logit contributions
nam-0.013
 Grac-0.013
rily-0.013
 Claire-0.012
 Katy-0.012
Token
Token ID198
Feature activation+9.69e-02

Pos logit contributions
opl+0.012
 snowball+0.011
?"+0.011
ats+0.011
?+0.010
Neg logit contributions
 Grac-0.018
nam-0.016
 Claire-0.016
 Katy-0.016
rily-0.016
Token"
Token ID1
Feature activation-2.77e+00

Pos logit contributions
opl+0.003
?"+0.003
 snowball+0.003
ats+0.003
?+0.003
Neg logit contributions
 Grac-0.003
nam-0.003
 Claire-0.002
rily-0.002
 Katy-0.002
TokenStop
Token ID19485
Feature activation+2.51e-01

Pos logit contributions
 Grac+0.103
nam+0.101
rily+0.099
 Katy+0.096
 Claire+0.095
Neg logit contributions
opl-0.056
?"-0.055
 snowball-0.052
?-0.050
ch-0.049
Token that
Token ID326
Feature activation+5.84e+00

Pos logit contributions
opl+0.006
?"+0.006
ats+0.006
 snowball+0.006
?+0.005
Neg logit contributions
 Grac-0.008
nam-0.008
 Claire-0.008
rily-0.008
 Katy-0.008
Token,
Token ID11
Feature activation+2.15e+00

Pos logit contributions
opl+0.119
?"+0.104
ats+0.102
 snowball+0.099
o+0.095
Neg logit contributions
 Grac-0.222
nam-0.215
rily-0.209
 Claire-0.207
 Katy-0.204
Token Lily
Token ID20037
Feature activation+1.57e+00

Pos logit contributions
?"+0.060
opl+0.060
?+0.056
 snowball+0.055
ats+0.053
Neg logit contributions
 Grac-0.064
nam-0.064
rily-0.060
 Claire-0.059
 Katy-0.059
Token,"
Token ID553
Feature activation-1.12e+00

Pos logit contributions
opl+0.039
 snowball+0.034
ats+0.033
ossible+0.032
?"+0.031
Neg logit contributions
 Grac-0.052
nam-0.050
 Katy-0.049
 Claire-0.049
rily-0.048
Token her
Token ID607
Feature activation-4.83e-01

Pos logit contributions
 Grac+0.024
nam+0.024
 Claire+0.021
rily+0.021
 Katy+0.021
Neg logit contributions
opl-0.042
?"-0.040
 snowball-0.039
ats-0.039
?-0.038
Token mom
Token ID1995
Feature activation+1.57e+00

Pos logit contributions
 Grac+0.017
nam+0.017
rily+0.016
 Claire+0.016
 Katy+0.016
Neg logit contributions
opl-0.011
?"-0.011
 snowball-0.010
?-0.010
ats-0.010
Token.
Token ID13
Feature activation+3.27e-01

Pos logit contributions
opl+0.008
?"+0.007
ats+0.007
 snowball+0.007
ossible+0.006
Neg logit contributions
 Grac-0.012
nam-0.012
 Claire-0.011
rily-0.011
 Katy-0.011
Token
Token ID198
Feature activation+5.79e-01

Pos logit contributions
?"+0.008
opl+0.008
?+0.007
o+0.007
ossible+0.006
Neg logit contributions
nam-0.011
 Grac-0.011
rily-0.011
 Claire-0.010
 brace-0.010
Token
Token ID198
Feature activation+4.68e-01

Pos logit contributions
opl+0.013
 snowball+0.012
ats+0.012
?"+0.012
?+0.011
Neg logit contributions
 Grac-0.020
nam-0.019
 Claire-0.019
 Katy-0.019
rily-0.018
Token"
Token ID1
Feature activation-2.36e+00

Pos logit contributions
opl+0.010
?"+0.009
 snowball+0.009
ats+0.008
?+0.008
Neg logit contributions
 Grac-0.018
nam-0.017
 Claire-0.016
 Katy-0.016
rily-0.016
TokenStop
Token ID19485
Feature activation+7.02e-01

Pos logit contributions
rily+0.094
nam+0.089
 Grac+0.088
edly+0.084
 Katy+0.084
Neg logit contributions
?"-0.043
?-0.041
 snowball-0.038
 course-0.037
ch-0.037
Token that
Token ID326
Feature activation+5.82e+00

Pos logit contributions
opl+0.017
?"+0.016
 snowball+0.015
ats+0.015
?+0.015
Neg logit contributions
 Grac-0.024
nam-0.024
rily-0.023
 Claire-0.022
 Katy-0.022
Token right
Token ID826
Feature activation-4.03e-01

Pos logit contributions
opl+0.127
ats+0.111
ossible+0.105
o+0.104
 snowball+0.104
Neg logit contributions
 Grac-0.208
rily-0.196
nam-0.194
ve-0.193
 Claire-0.192
Token now
Token ID783
Feature activation+2.86e+00

Pos logit contributions
 Grac+0.015
nam+0.015
 Claire+0.014
rily+0.014
 Katy+0.014
Neg logit contributions
opl-0.009
ats-0.007
 snowball-0.007
ossible-0.007
o-0.006
Token!"
Token ID2474
Feature activation-3.83e-01

Pos logit contributions
opl+0.060
?"+0.051
 snowball+0.051
ats+0.051
o+0.049
Neg logit contributions
 Grac-0.107
nam-0.104
rily-0.101
 Claire-0.101
 Katy-0.100
Token she
Token ID673
Feature activation+2.72e+00

Pos logit contributions
nam+0.009
 Grac+0.009
rily+0.008
 Claire+0.008
edly+0.008
Neg logit contributions
?"-0.013
opl-0.013
?-0.013
 snowball-0.012
ossible-0.012
Token said
Token ID531
Feature activation+6.30e-01

Pos logit contributions
?"+0.044
o+0.039
?+0.037
 snowball+0.036
ats+0.036
Neg logit contributions
nam-0.112
rily-0.112
 Grac-0.112
 Claire-0.108
 Katy-0.106
Token and
Token ID290
Feature activation+5.76e-01

Pos logit contributions
 Grac+0.013
nam+0.013
rily+0.012
 Claire+0.012
 Katy+0.012
Neg logit contributions
opl-0.008
?"-0.008
 snowball-0.007
ats-0.007
?-0.007
Token said
Token ID531
Feature activation+1.83e-01

Pos logit contributions
?"+0.013
o+0.012
 snowball+0.012
opl+0.012
?+0.012
Neg logit contributions
 Grac-0.020
nam-0.019
rily-0.019
 Claire-0.018
 Katy-0.018
Token,
Token ID11
Feature activation+6.78e-01

Pos logit contributions
?"+0.003
?+0.002
o+0.002
opl+0.002
 snowball+0.002
Neg logit contributions
nam-0.008
rily-0.008
 Grac-0.008
 Claire-0.008
edly-0.008
Token "
Token ID366
Feature activation-1.62e+00

Pos logit contributions
?"+0.016
opl+0.016
?+0.015
o+0.015
 snowball+0.015
Neg logit contributions
nam-0.023
 Grac-0.022
rily-0.022
 Claire-0.021
 Katy-0.021
TokenStop
Token ID19485
Feature activation-1.09e-01

Pos logit contributions
 Grac+0.054
nam+0.053
rily+0.051
 Katy+0.050
 Claire+0.050
Neg logit contributions
opl-0.041
?"-0.039
 snowball-0.037
?-0.036
o-0.036
Token that
Token ID326
Feature activation+5.70e+00

Pos logit contributions
 Grac+0.004
nam+0.004
 Claire+0.003
rily+0.003
 Katy+0.003
Neg logit contributions
opl-0.003
ats-0.002
?"-0.002
 snowball-0.002
ossible-0.002
Token noise
Token ID7838
Feature activation+1.72e+00

Pos logit contributions
opl+0.140
ats+0.132
ossible+0.123
o+0.120
 snowball+0.120
Neg logit contributions
 Grac-0.182
rily-0.172
ve-0.168
nam-0.168
 Claire-0.167
Token!
Token ID0
Feature activation+1.68e+00

Pos logit contributions
opl+0.051
 snowball+0.044
ossible+0.044
 microwave+0.043
 sled+0.040
Neg logit contributions
 Grac-0.048
 Claire-0.045
ve-0.045
nam-0.044
 Katy-0.044
Token You
Token ID921
Feature activation-1.39e-01

Pos logit contributions
opl+0.049
ats+0.046
 snowball+0.044
?"+0.043
?+0.041
Neg logit contributions
 Grac-0.052
nam-0.050
 Claire-0.047
 Katy-0.045
rily-0.044
Token're
Token ID821
Feature activation+1.03e+00

Pos logit contributions
 Grac+0.005
nam+0.004
ve+0.004
 Claire+0.004
rily+0.004
Neg logit contributions
opl-0.004
 snowball-0.003
ats-0.003
?"-0.003
ossible-0.003
Token hurting
Token ID21530
Feature activation+2.02e+00

Pos logit contributions
opl+0.025
?"+0.025
?+0.023
 snowball+0.022
o+0.022
Neg logit contributions
 Grac-0.035
rily-0.034
nam-0.034
 Claire-0.032
edly-0.032
Token.
Token ID13
Feature activation-1.29e-01

Pos logit contributions
opl+0.034
?"+0.031
ats+0.030
ossible+0.029
 snowball+0.028
Neg logit contributions
 Grac-0.038
rily-0.037
nam-0.036
 Claire-0.036
 brace-0.035
Token
Token ID198
Feature activation+5.18e-01

Pos logit contributions
 Grac+0.004
 Claire+0.004
nam+0.004
rily+0.004
 Katy+0.004
Neg logit contributions
opl-0.004
ats-0.004
at-0.003
?"-0.003
 snowball-0.003
Token
Token ID198
Feature activation+2.71e-01

Pos logit contributions
opl+0.016
ats+0.015
 snowball+0.014
at+0.014
ah+0.014
Neg logit contributions
 Grac-0.014
 Claire-0.013
ve-0.012
 Katy-0.012
Gr-0.012
Token"
Token ID1
Feature activation-2.56e+00

Pos logit contributions
opl+0.009
ats+0.007
?"+0.007
 snowball+0.007
at+0.007
Neg logit contributions
 Grac-0.007
nam-0.007
Gr-0.007
ve-0.007
 Claire-0.007
TokenStop
Token ID19485
Feature activation+3.22e-01

Pos logit contributions
 Grac+0.078
nam+0.077
 Claire+0.073
rily+0.072
edly+0.069
Neg logit contributions
opl-0.076
?"-0.069
ats-0.067
?-0.064
 snowball-0.062
Token that
Token ID326
Feature activation+5.69e+00

Pos logit contributions
opl+0.008
ats+0.007
ossible+0.007
?"+0.007
 snowball+0.007
Neg logit contributions
 Grac-0.011
nam-0.011
 Claire-0.010
rily-0.010
 Katy-0.010
Token,
Token ID11
Feature activation+1.92e+00

Pos logit contributions
opl+0.136
ats+0.122
ossible+0.113
 fires+0.108
o+0.108
Neg logit contributions
 Grac-0.189
rily-0.179
ve-0.178
 Claire-0.172
 brace-0.168
Token silly
Token ID14397
Feature activation+3.00e+00

Pos logit contributions
opl+0.051
?"+0.045
ats+0.045
 snowball+0.043
o+0.042
Neg logit contributions
 Grac-0.063
nam-0.061
 Claire-0.059
rily-0.059
 Katy-0.057
Token cat
Token ID3797
Feature activation+2.33e+00

Pos logit contributions
opl+0.082
ossible+0.069
?"+0.067
ats+0.066
o+0.064
Neg logit contributions
 Grac-0.094
rily-0.088
 Claire-0.087
 Katy-0.085
ve-0.084
Token.
Token ID13
Feature activation+1.13e+00

Pos logit contributions
opl+0.064
ossible+0.053
ats+0.050
 snowball+0.049
 microwave+0.049
Neg logit contributions
 Grac-0.067
rily-0.066
ve-0.066
 Claire-0.065
 brace-0.063
Token The
Token ID383
Feature activation+1.85e+00

Pos logit contributions
opl+0.032
ats+0.030
?"+0.029
 snowball+0.029
?+0.027
Neg logit contributions
 Grac-0.035
nam-0.033
 Claire-0.032
rily-0.031
 Katy-0.030
TokenYou
Token ID1639
Feature activation+4.20e-01

Pos logit contributions
 Grac+0.032
nam+0.031
rily+0.030
 Claire+0.030
 Katy+0.030
Neg logit contributions
opl-0.025
?"-0.024
 snowball-0.023
?-0.022
ats-0.022
Token can
Token ID460
Feature activation-1.30e+00

Pos logit contributions
opl+0.010
?"+0.009
 snowball+0.008
ats+0.008
?+0.008
Neg logit contributions
 Grac-0.015
nam-0.015
 Claire-0.014
 Katy-0.014
rily-0.014
Token't
Token ID470
Feature activation+1.38e+00

Pos logit contributions
 Grac+0.030
 Claire+0.027
ve+0.026
nam+0.025
 Ivy+0.024
Neg logit contributions
opl-0.045
 snowball-0.042
 fires-0.041
 microwave-0.041
aw-0.040
Token go
Token ID467
Feature activation+8.49e-01

Pos logit contributions
opl+0.037
 snowball+0.032
 microwave+0.031
ats+0.030
?"+0.030
Neg logit contributions
 Grac-0.045
nam-0.043
 Claire-0.042
 Katy-0.040
ve-0.040
Token near
Token ID1474
Feature activation+2.56e+00

Pos logit contributions
opl+0.027
 snowball+0.023
ossible+0.022
ats+0.022
 microwave+0.022
Neg logit contributions
 Grac-0.024
nam-0.023
 Claire-0.021
rily-0.021
ve-0.021
Token that
Token ID326
Feature activation+5.64e+00

Pos logit contributions
opl+0.076
?"+0.069
ats+0.068
 snowball+0.067
o+0.066
Neg logit contributions
 Grac-0.073
nam-0.073
 Claire-0.068
rily-0.068
 Katy-0.067
Token dog
Token ID3290
Feature activation+1.63e+00

Pos logit contributions
opl+0.133
?"+0.120
ats+0.118
 snowball+0.118
o+0.114
Neg logit contributions
 Grac-0.191
nam-0.187
rily-0.179
 Claire-0.178
 Katy-0.178
Token.
Token ID13
Feature activation+9.36e-01

Pos logit contributions
opl+0.045
 snowball+0.038
ossible+0.037
 microwave+0.035
iren+0.035
Neg logit contributions
 Grac-0.049
 Claire-0.048
nam-0.047
rily-0.046
ve-0.045
Token It
Token ID632
Feature activation-1.79e-01

Pos logit contributions
opl+0.026
ats+0.025
?"+0.024
 snowball+0.023
?+0.022
Neg logit contributions
 Grac-0.029
nam-0.028
 Claire-0.027
rily-0.026
 Katy-0.026
Token might
Token ID1244
Feature activation+5.93e-01

Pos logit contributions
 Grac+0.007
nam+0.006
 Claire+0.006
rily+0.006
ve+0.006
Neg logit contributions
opl-0.004
?"-0.003
ats-0.003
ossible-0.003
 snowball-0.003
Token be
Token ID307
Feature activation-5.63e-01

Pos logit contributions
opl+0.016
?"+0.015
ats+0.014
 snowball+0.014
?+0.014
Neg logit contributions
 Grac-0.019
nam-0.018
 Claire-0.017
rily-0.017
 Katy-0.017
Token
Token ID198
Feature activation+7.16e-01

Pos logit contributions
opl+0.008
ats+0.007
?"+0.007
 snowball+0.007
?+0.007
Neg logit contributions
 Grac-0.013
nam-0.013
 Claire-0.012
 Katy-0.012
rily-0.012
Token
Token ID198
Feature activation+2.74e-01

Pos logit contributions
opl+0.022
ats+0.019
ah+0.019
at+0.019
 snowball+0.018
Neg logit contributions
 Grac-0.019
ve-0.017
 Claire-0.017
 Katy-0.017
rily-0.016
Token"
Token ID1
Feature activation-2.59e+00

Pos logit contributions
opl+0.007
ats+0.006
?"+0.006
 snowball+0.006
?+0.006
Neg logit contributions
 Grac-0.009
nam-0.009
ve-0.008
 Claire-0.008
 Katy-0.008
TokenLook
Token ID8567
Feature activation+1.66e-01

Pos logit contributions
 Grac+0.086
nam+0.085
rily+0.082
 Katy+0.080
 Claire+0.079
Neg logit contributions
opl-0.065
?"-0.062
 snowball-0.059
?-0.057
o-0.057
Token at
Token ID379
Feature activation+1.80e+00

Pos logit contributions
opl+0.004
 snowball+0.003
?"+0.003
ats+0.003
?+0.003
Neg logit contributions
 Grac-0.006
nam-0.006
rily-0.006
 Claire-0.006
 Katy-0.006
Token that
Token ID326
Feature activation+5.62e+00

Pos logit contributions
opl+0.037
?"+0.033
 snowball+0.032
ats+0.031
?+0.030
Neg logit contributions
 Grac-0.069
nam-0.068
 Claire-0.065
rily-0.064
 Katy-0.064
Token man
Token ID582
Feature activation+2.72e-01

Pos logit contributions
opl+0.122
ats+0.116
 snowball+0.107
ossible+0.105
o+0.104
Neg logit contributions
 Grac-0.194
nam-0.193
 Claire-0.187
 Katy-0.182
rily-0.182
Token!"
Token ID2474
Feature activation-6.23e-01

Pos logit contributions
opl+0.006
 snowball+0.005
ats+0.005
?"+0.004
ossible+0.004
Neg logit contributions
 Grac-0.010
nam-0.010
 Claire-0.009
rily-0.009
 Katy-0.009
Token Tim
Token ID5045
Feature activation+2.01e-01

Pos logit contributions
 Grac+0.021
nam+0.021
rily+0.020
 Claire+0.019
 Katy+0.019
Neg logit contributions
opl-0.016
?"-0.015
 snowball-0.014
?-0.014
ats-0.014
Token says
Token ID1139
Feature activation+1.74e-01

Pos logit contributions
opl+0.005
ats+0.004
aster+0.004
loo+0.004
ch+0.004
Neg logit contributions
 Grac-0.007
 brace-0.007
 Claire-0.007
 Katy-0.007
nam-0.006
Token.
Token ID13
Feature activation+8.02e-01

Pos logit contributions
?"+0.002
?+0.002
o+0.002
opl+0.002
 snowball+0.002
Neg logit contributions
 Grac-0.008
nam-0.008
 Claire-0.007
 Katy-0.007
rily-0.007
Token They
Token ID1119
Feature activation+5.39e-01

Pos logit contributions
opl+0.001
?"+0.001
 snowball+0.001
ats+0.001
?+0.001
Neg logit contributions
 Grac-0.001
nam-0.001
rily-0.001
 Claire-0.001
 Katy-0.001
Token said
Token ID531
Feature activation+3.93e-01

Pos logit contributions
opl+0.015
?"+0.014
?+0.013
 snowball+0.013
ats+0.013
Neg logit contributions
nam-0.017
 Grac-0.017
 Claire-0.015
rily-0.015
 Katy-0.015
Token,
Token ID11
Feature activation+5.32e-01

Pos logit contributions
opl+0.010
?"+0.009
 snowball+0.009
?+0.009
ats+0.008
Neg logit contributions
 Grac-0.013
nam-0.013
rily-0.012
 Claire-0.012
 Katy-0.012
Token "
Token ID366
Feature activation-1.08e+00

Pos logit contributions
opl+0.017
?"+0.016
ats+0.016
 snowball+0.015
?+0.015
Neg logit contributions
 Grac-0.014
nam-0.014
 Claire-0.013
rily-0.013
 Katy-0.012
TokenStop
Token ID19485
Feature activation+1.08e-01

Pos logit contributions
 Grac+0.033
nam+0.033
rily+0.030
 Claire+0.030
 Katy+0.030
Neg logit contributions
opl-0.030
?"-0.029
ats-0.027
?-0.027
 snowball-0.027
Token that
Token ID326
Feature activation+5.61e+00

Pos logit contributions
opl+0.003
ats+0.002
ossible+0.002
 snowball+0.002
?"+0.002
Neg logit contributions
 Grac-0.004
nam-0.003
 Katy-0.003
 Claire-0.003
rily-0.003
Token!
Token ID0
Feature activation+1.97e+00

Pos logit contributions
opl+0.137
ats+0.119
aster+0.111
ossible+0.110
o+0.108
Neg logit contributions
 Grac-0.187
rily-0.178
ve-0.174
nam-0.169
 Claire-0.168
Token You
Token ID921
Feature activation+4.82e-01

Pos logit contributions
opl+0.055
ats+0.051
?"+0.050
 snowball+0.049
?+0.047
Neg logit contributions
 Grac-0.062
nam-0.059
 Claire-0.056
rily-0.055
 Katy-0.054
Token are
Token ID389
Feature activation+2.84e-01

Pos logit contributions
opl+0.013
 snowball+0.011
ats+0.010
?"+0.010
 microwave+0.010
Neg logit contributions
 Grac-0.016
nam-0.015
ve-0.015
chie-0.014
 Claire-0.014
Token mean
Token ID1612
Feature activation+7.43e-01

Pos logit contributions
opl+0.007
?"+0.006
?+0.006
 snowball+0.006
ats+0.006
Neg logit contributions
 Grac-0.010
nam-0.010
rily-0.009
 Claire-0.009
 Katy-0.009
Token!
Token ID0
Feature activation+1.86e+00

Pos logit contributions
opl+0.023
aster+0.020
 snowball+0.019
ossible+0.019
 microwave+0.019
Neg logit contributions
 Grac-0.019
ve-0.019
 Katy-0.018
ately-0.018
rily-0.018
Token.
Token ID13
Feature activation-1.08e-01

Pos logit contributions
opl+0.018
?"+0.017
 snowball+0.016
ats+0.016
?+0.016
Neg logit contributions
 Grac-0.028
nam-0.028
rily-0.027
 Claire-0.026
 Katy-0.026
Token
Token ID198
Feature activation+2.57e-01

Pos logit contributions
nam+0.004
rily+0.003
 Grac+0.003
ve+0.003
 Claire+0.003
Neg logit contributions
?"-0.003
opl-0.003
?-0.003
 Rex-0.002
 snowball-0.002
Token
Token ID198
Feature activation-3.01e-02

Pos logit contributions
opl+0.005
?"+0.005
 snowball+0.005
ats+0.005
?+0.004
Neg logit contributions
 Grac-0.010
nam-0.009
 Claire-0.009
rily-0.009
 Katy-0.009
Token"
Token ID1
Feature activation-2.90e+00

Pos logit contributions
 Grac+0.001
nam+0.001
rily+0.001
 Claire+0.001
 Katy+0.001
Neg logit contributions
opl-0.001
?"-0.001
 snowball-0.000
ats-0.000
?-0.000
TokenStop
Token ID19485
Feature activation+9.77e-02

Pos logit contributions
rily+0.110
 Grac+0.104
nam+0.104
 Katy+0.099
ately+0.099
Neg logit contributions
?"-0.059
?-0.054
opl-0.053
ch-0.052
 snowball-0.052
Token that
Token ID326
Feature activation+5.59e+00

Pos logit contributions
opl+0.003
ats+0.002
ossible+0.002
 snowball+0.002
?"+0.002
Neg logit contributions
 Grac-0.003
nam-0.003
 Claire-0.003
 Katy-0.003
rily-0.003
Token!
Token ID0
Feature activation+1.61e+00

Pos logit contributions
opl+0.126
ats+0.113
ossible+0.104
o+0.103
?"+0.102
Neg logit contributions
 Grac-0.200
rily-0.187
nam-0.185
 Claire-0.184
 Katy-0.180
Token Stop
Token ID13707
Feature activation+8.29e-01

Pos logit contributions
opl+0.040
?"+0.037
 snowball+0.035
ats+0.034
?+0.034
Neg logit contributions
 Grac-0.055
nam-0.054
rily-0.052
 Claire-0.051
 Katy-0.050
Token rocking
Token ID38059
Feature activation+2.26e+00

Pos logit contributions
opl+0.023
ossible+0.021
ats+0.021
 snowball+0.019
etch+0.019
Neg logit contributions
 Grac-0.026
nam-0.025
 Claire-0.023
 brace-0.023
 Katy-0.022
Token!
Token ID0
Feature activation+1.43e+00

Pos logit contributions
opl+0.053
ats+0.047
 snowball+0.045
ossible+0.044
?"+0.044
Neg logit contributions
 Grac-0.077
nam-0.077
rily-0.073
 Claire-0.072
 Katy-0.070
Token You
Token ID921
Feature activation+3.49e-01

Pos logit contributions
opl+0.039
ats+0.036
?"+0.036
 snowball+0.034
?+0.033
Neg logit contributions
 Grac-0.046
nam-0.044
 Claire-0.042
 Katy-0.040
rily-0.040
Token,
Token ID11
Feature activation-3.63e-01

Pos logit contributions
 Grac+0.022
nam+0.021
 Claire+0.020
rily+0.020
 Katy+0.020
Neg logit contributions
opl-0.013
?"-0.012
 snowball-0.011
ats-0.011
?-0.011
Token but
Token ID475
Feature activation+1.86e-01

Pos logit contributions
 Grac+0.011
nam+0.011
 Claire+0.011
 Katy+0.011
rily+0.011
Neg logit contributions
opl-0.010
?"-0.009
ats-0.009
 snowball-0.009
?-0.008
Token it
Token ID340
Feature activation-1.61e+00

Pos logit contributions
opl+0.004
?"+0.004
?+0.003
 snowball+0.003
o+0.003
Neg logit contributions
 Grac-0.007
nam-0.007
rily-0.007
 Claire-0.007
 Katy-0.006
Token was
Token ID373
Feature activation-3.93e-01

Pos logit contributions
 Grac+0.055
nam+0.054
rily+0.051
 Claire+0.051
 Katy+0.050
Neg logit contributions
opl-0.039
?"-0.037
 snowball-0.034
ats-0.034
?-0.034
Token all
Token ID477
Feature activation-1.34e+00

Pos logit contributions
 Grac+0.012
nam+0.012
 Claire+0.011
 Katy+0.011
rily+0.011
Neg logit contributions
opl-0.011
ats-0.011
?"-0.010
iren-0.010
 snowball-0.010
Token fore
Token ID1674
Feature activation-7.83e+00

Pos logit contributions
 Grac+0.043
nam+0.042
 Claire+0.040
rily+0.039
 Katy+0.039
Neg logit contributions
opl-0.036
?"-0.033
ats-0.032
 snowball-0.031
?-0.030
Tokenshadow
Token ID19106
Feature activation-2.78e+00

Pos logit contributions
 Grac+0.214
 Claire+0.202
nam+0.199
 Katy+0.187
rily+0.185
Neg logit contributions
opl-0.245
?"-0.234
ats-0.232
o-0.227
ch-0.226
Tokened
Token ID276
Feature activation-1.03e+00

Pos logit contributions
nam+0.103
 Grac+0.093
rily+0.090
 Claire+0.087
 Katy+0.085
Neg logit contributions
o-0.068
aw-0.066
opl-0.066
ats-0.064
ch-0.064
Token by
Token ID416
Feature activation+2.29e-02

Pos logit contributions
 Grac+0.035
nam+0.035
rily+0.033
 Claire+0.032
 Katy+0.032
Neg logit contributions
opl-0.025
?"-0.024
?-0.022
 snowball-0.022
ats-0.022
Token the
Token ID262
Feature activation+1.18e+00

Pos logit contributions
opl+0.001
?"+0.001
?+0.001
 snowball+0.000
o+0.000
Neg logit contributions
nam-0.001
 Grac-0.001
rily-0.001
 Claire-0.001
 Katy-0.001
Token kind
Token ID1611
Feature activation-1.49e+00

Pos logit contributions
opl+0.031
?"+0.029
ats+0.027
 snowball+0.027
?+0.026
Neg logit contributions
 Grac-0.038
nam-0.038
 Claire-0.036
rily-0.035
 Katy-0.035
Token,
Token ID11
Feature activation+3.39e-01

Pos logit contributions
opl+0.018
?"+0.017
 snowball+0.016
ats+0.016
?+0.015
Neg logit contributions
 Grac-0.029
nam-0.028
rily-0.027
 Claire-0.027
 Katy-0.027
Token and
Token ID290
Feature activation+2.81e-01

Pos logit contributions
opl+0.009
?"+0.008
ats+0.008
 snowball+0.008
?+0.008
Neg logit contributions
 Grac-0.011
nam-0.011
 Claire-0.010
rily-0.010
 Katy-0.010
Token all
Token ID477
Feature activation-5.57e-01

Pos logit contributions
?"+0.007
opl+0.007
?+0.007
o+0.007
 snowball+0.007
Neg logit contributions
 Grac-0.009
nam-0.008
rily-0.008
ve-0.008
chie-0.008
Token the
Token ID262
Feature activation-2.31e+00

Pos logit contributions
 Grac+0.018
nam+0.017
 Claire+0.017
rily+0.017
 Katy+0.016
Neg logit contributions
opl-0.015
?"-0.014
 snowball-0.013
ats-0.013
?-0.013
Token other
Token ID584
Feature activation-3.14e+00

Pos logit contributions
 Grac+0.072
nam+0.070
rily+0.070
 Claire+0.065
 Katy+0.065
Neg logit contributions
opl-0.063
?"-0.060
 snowball-0.058
?-0.056
ats-0.055
Token av
Token ID1196
Feature activation-6.98e+00

Pos logit contributions
 Grac+0.104
nam+0.103
rily+0.101
edly+0.095
 Claire+0.094
Neg logit contributions
opl-0.080
?"-0.077
?-0.073
 snowball-0.071
ats-0.070
Tokenoc
Token ID420
Feature activation-4.80e-01

Pos logit contributions
 Grac+0.331
nam+0.328
rily+0.315
 Claire+0.315
 Katy+0.311
Neg logit contributions
opl-0.080
?"-0.068
ats-0.057
 snowball-0.057
?-0.055
Tokenados
Token ID22484
Feature activation-3.80e+00

Pos logit contributions
 Grac+0.010
nam+0.009
 Claire+0.008
rily+0.008
 Katy+0.008
Neg logit contributions
opl-0.020
?"-0.018
ats-0.018
o-0.017
aw-0.017
Token were
Token ID547
Feature activation-2.11e-01

Pos logit contributions
 Grac+0.127
nam+0.126
rily+0.119
 Claire+0.117
 Katy+0.116
Neg logit contributions
opl-0.094
?"-0.092
?-0.085
ats-0.084
o-0.082
Token very
Token ID845
Feature activation-6.53e-01

Pos logit contributions
 Grac+0.007
nam+0.007
rily+0.006
 Claire+0.006
edly+0.006
Neg logit contributions
opl-0.006
?"-0.005
?-0.005
 snowball-0.005
ats-0.005
Token happy
Token ID3772
Feature activation-1.05e+00

Pos logit contributions
 Grac+0.021
nam+0.020
rily+0.019
ately+0.019
 Katy+0.019
Neg logit contributions
?"-0.017
opl-0.017
?-0.015
ats-0.015
o-0.015
Token there
Token ID612
Feature activation-2.02e+00

Pos logit contributions
 Grac+0.000
nam+0.000
 Claire+0.000
 Katy+0.000
 brace+0.000
Neg logit contributions
?"-0.000
opl-0.000
?-0.000
 snowball-0.000
ats-0.000
Token and
Token ID290
Feature activation+4.97e-01

Pos logit contributions
 Grac+0.078
nam+0.077
rily+0.072
 Claire+0.072
 Katy+0.071
Neg logit contributions
opl-0.039
?"-0.038
?-0.035
 snowball-0.033
ats-0.033
Token wait
Token ID4043
Feature activation-1.87e-01

Pos logit contributions
?"+0.011
opl+0.011
?+0.010
 snowball+0.010
ats+0.010
Neg logit contributions
 Grac-0.017
rily-0.017
nam-0.017
 Claire-0.016
ve-0.016
Token.
Token ID13
Feature activation-2.78e-01

Pos logit contributions
nam+0.007
 Grac+0.007
rily+0.007
 Claire+0.007
 Katy+0.007
Neg logit contributions
?"-0.004
opl-0.004
?-0.003
 snowball-0.003
ats-0.003
Token The
Token ID383
Feature activation+8.96e-01

Pos logit contributions
nam+0.010
 Grac+0.010
rily+0.010
 brace+0.009
 diamonds+0.009
Neg logit contributions
opl-0.006
?"-0.006
?-0.005
o-0.005
ossible-0.005
Token fore
Token ID1674
Feature activation-6.79e+00

Pos logit contributions
?"+0.023
opl+0.022
?+0.021
 snowball+0.020
 truck+0.020
Neg logit contributions
 Grac-0.030
nam-0.029
rily-0.027
 brace-0.027
 Katy-0.026
Tokenshadow
Token ID19106
Feature activation-2.51e+00

Pos logit contributions
 Grac+0.171
nam+0.165
 Claire+0.156
rily+0.152
 Katy+0.150
Neg logit contributions
opl-0.227
?"-0.219
ats-0.208
 snowball-0.206
?-0.205
Tokening
Token ID278
Feature activation+3.39e-01

Pos logit contributions
nam+0.086
rily+0.074
 Grac+0.073
 Claire+0.069
 Katy+0.067
Neg logit contributions
o-0.071
aw-0.069
ch-0.067
ats-0.067
?"-0.066
Token of
Token ID286
Feature activation-1.57e-01

Pos logit contributions
opl+0.008
?"+0.008
?+0.007
ats+0.007
 snowball+0.007
Neg logit contributions
 Grac-0.012
nam-0.012
rily-0.011
 Claire-0.011
 Katy-0.011
Token the
Token ID262
Feature activation+6.23e-01

Pos logit contributions
nam+0.005
 Grac+0.005
rily+0.005
 Claire+0.004
ve+0.004
Neg logit contributions
opl-0.004
?"-0.004
?-0.004
 snowball-0.004
ats-0.004
Token dull
Token ID19222
Feature activation-2.57e-01

Pos logit contributions
?"+0.016
opl+0.016
 snowball+0.015
?+0.015
 truck+0.014
Neg logit contributions
 Grac-0.020
nam-0.020
rily-0.019
 brace-0.018
 Katy-0.017
Token Everyday
Token ID48154
Feature activation+8.99e-01

Pos logit contributions
opl+0.014
?"+0.014
?+0.013
ats+0.013
 snowball+0.013
Neg logit contributions
 Grac-0.018
nam-0.018
rily-0.017
 Claire-0.017
 Katy-0.016
Token he
Token ID339
Feature activation+6.30e-01

Pos logit contributions
opl+0.023
?"+0.022
 snowball+0.020
?+0.020
ats+0.020
Neg logit contributions
 Grac-0.029
nam-0.029
rily-0.027
 Claire-0.027
 Katy-0.027
Token would
Token ID561
Feature activation-2.77e-01

Pos logit contributions
opl+0.018
 snowball+0.016
ats+0.015
iren+0.015
?"+0.015
Neg logit contributions
 Grac-0.019
 Claire-0.018
nam-0.018
 Katy-0.017
rily-0.017
Token take
Token ID1011
Feature activation-1.50e+00

Pos logit contributions
 Grac+0.009
nam+0.009
 Claire+0.008
 Katy+0.008
rily+0.008
Neg logit contributions
opl-0.007
 snowball-0.006
ats-0.006
?"-0.006
o-0.006
Token the
Token ID262
Feature activation-1.30e+00

Pos logit contributions
 Grac+0.051
nam+0.051
rily+0.048
 Claire+0.047
 Katy+0.047
Neg logit contributions
opl-0.037
?"-0.035
 snowball-0.032
ats-0.032
?-0.032
Token av
Token ID1196
Feature activation-6.74e+00

Pos logit contributions
 Grac+0.040
nam+0.040
 Claire+0.037
rily+0.036
 Katy+0.036
Neg logit contributions
opl-0.036
?"-0.034
ats-0.032
 snowball-0.031
?-0.031
Tokenoc
Token ID420
Feature activation-5.26e-01

Pos logit contributions
 Grac+0.309
nam+0.307
rily+0.294
 Claire+0.293
 Katy+0.290
Neg logit contributions
opl-0.085
?"-0.075
 snowball-0.064
ats-0.063
?-0.062
Tokenados
Token ID22484
Feature activation-3.69e+00

Pos logit contributions
 Grac+0.010
nam+0.010
 Claire+0.009
rily+0.008
 Katy+0.008
Neg logit contributions
opl-0.022
?"-0.020
ats-0.020
?-0.019
o-0.019
Token and
Token ID290
Feature activation-1.13e+00

Pos logit contributions
 Grac+0.118
nam+0.117
rily+0.110
 Claire+0.109
 Katy+0.108
Neg logit contributions
opl-0.097
?"-0.092
?-0.086
ats-0.085
 snowball-0.085
Token scatter
Token ID41058
Feature activation-4.59e-01

Pos logit contributions
 Grac+0.036
rily+0.035
nam+0.034
edly+0.033
ve+0.033
Neg logit contributions
?"-0.028
opl-0.028
?-0.026
 snowball-0.025
ats-0.024
Token them
Token ID606
Feature activation-1.83e+00

Pos logit contributions
 Grac+0.015
nam+0.015
rily+0.014
 Claire+0.013
 Katy+0.013
Neg logit contributions
opl-0.012
?"-0.012
?-0.011
 snowball-0.011
ats-0.011
Token
Token ID198
Feature activation-3.51e-01

Pos logit contributions
 Grac+0.001
ve+0.001
 Katy+0.001
 Claire+0.001
rily+0.001
Neg logit contributions
opl-0.001
?"-0.001
ats-0.001
 snowball-0.001
?-0.001
TokenGrand
Token ID23581
Feature activation-1.01e+00

Pos logit contributions
 Grac+0.010
nam+0.009
 Katy+0.009
 Claire+0.009
rily+0.009
Neg logit contributions
opl-0.011
?"-0.010
ats-0.010
 snowball-0.009
?-0.009
Tokenma
Token ID2611
Feature activation-8.72e-01

Pos logit contributions
 Grac+0.042
nam+0.041
rily+0.039
 Claire+0.039
 Katy+0.039
Neg logit contributions
opl-0.017
?"-0.016
 snowball-0.015
ats-0.014
?-0.013
Token's
Token ID338
Feature activation-2.84e-01

Pos logit contributions
 Grac+0.029
nam+0.028
 Claire+0.027
rily+0.027
 Katy+0.026
Neg logit contributions
opl-0.023
?"-0.021
 snowball-0.020
ats-0.019
?-0.019
Token words
Token ID2456
Feature activation-1.54e+00

Pos logit contributions
 Grac+0.007
nam+0.007
 Claire+0.007
 Katy+0.007
rily+0.007
Neg logit contributions
opl-0.009
?"-0.009
ats-0.009
 snowball-0.008
?-0.008
Token fore
Token ID1674
Feature activation-6.59e+00

Pos logit contributions
 Grac+0.053
nam+0.053
rily+0.050
 Claire+0.049
 Katy+0.048
Neg logit contributions
opl-0.037
?"-0.035
?-0.032
ats-0.032
 snowball-0.031
Tokenshadow
Token ID19106
Feature activation-2.70e+00

Pos logit contributions
 Grac+0.172
nam+0.161
 Claire+0.159
 Katy+0.150
rily+0.150
Neg logit contributions
opl-0.213
?"-0.205
ats-0.200
o-0.196
?-0.193
Tokened
Token ID276
Feature activation-1.01e+00

Pos logit contributions
nam+0.096
rily+0.085
 Grac+0.085
 Claire+0.079
 brace+0.078
Neg logit contributions
o-0.070
aw-0.068
ats-0.067
ch-0.065
?-0.064
Token a
Token ID257
Feature activation+1.17e+00

Pos logit contributions
nam+0.034
 Grac+0.034
rily+0.032
 Claire+0.031
 Katy+0.031
Neg logit contributions
opl-0.025
?"-0.024
?-0.022
ats-0.022
 snowball-0.022
Token happy
Token ID3772
Feature activation-1.35e+00

Pos logit contributions
opl+0.035
?"+0.033
ats+0.032
?+0.031
 snowball+0.031
Neg logit contributions
 Grac-0.034
nam-0.033
 Claire-0.031
 Katy-0.031
rily-0.030
Token future
Token ID2003
Feature activation-1.41e+00

Pos logit contributions
 Grac+0.042
nam+0.041
rily+0.039
 Claire+0.039
 Katy+0.038
Neg logit contributions
opl-0.037
?"-0.035
 snowball-0.033
ats-0.033
?-0.033
Token That
Token ID1320
Feature activation-2.05e+00

Pos logit contributions
opl+0.007
?"+0.007
 snowball+0.006
ats+0.006
?+0.006
Neg logit contributions
 Grac-0.011
nam-0.011
rily-0.010
 Claire-0.010
 Katy-0.010
Token was
Token ID373
Feature activation-2.02e-01

Pos logit contributions
 Grac+0.067
nam+0.065
 Claire+0.064
 Katy+0.064
rily+0.062
Neg logit contributions
opl-0.053
?"-0.049
ats-0.046
 snowball-0.046
?-0.045
Token the
Token ID262
Feature activation-1.41e-01

Pos logit contributions
 Grac+0.006
nam+0.006
 Claire+0.006
rily+0.006
 Katy+0.006
Neg logit contributions
opl-0.006
?"-0.005
ats-0.005
 snowball-0.005
?-0.005
Token setup
Token ID9058
Feature activation-1.92e+00

Pos logit contributions
 Grac+0.004
nam+0.004
 Claire+0.004
 Katy+0.004
rily+0.004
Neg logit contributions
opl-0.004
?"-0.004
ats-0.004
o-0.004
?-0.003
Token that
Token ID326
Feature activation+2.73e+00

Pos logit contributions
 Grac+0.067
nam+0.065
rily+0.063
 Claire+0.063
 Katy+0.063
Neg logit contributions
opl-0.045
?"-0.041
 snowball-0.039
ats-0.037
?-0.036
Token fore
Token ID1674
Feature activation-6.58e+00

Pos logit contributions
opl+0.073
?"+0.067
 snowball+0.066
ats+0.065
?+0.061
Neg logit contributions
 Grac-0.087
nam-0.085
 Katy-0.082
 Claire-0.082
rily-0.079
Tokenshadow
Token ID19106
Feature activation-2.64e+00

Pos logit contributions
 Grac+0.163
nam+0.158
 Claire+0.148
rily+0.145
 Katy+0.142
Neg logit contributions
opl-0.225
?"-0.215
ats-0.208
 snowball-0.203
?-0.203
Tokened
Token ID276
Feature activation-1.00e+00

Pos logit contributions
nam+0.077
 Grac+0.068
rily+0.066
 Claire+0.061
 brace+0.059
Neg logit contributions
o-0.085
aw-0.083
opl-0.083
ats-0.082
?-0.080
Token the
Token ID262
Feature activation+4.93e-01

Pos logit contributions
 Grac+0.035
nam+0.034
rily+0.032
 Claire+0.032
 Katy+0.032
Neg logit contributions
opl-0.024
?"-0.023
ats-0.021
?-0.021
 snowball-0.021
Token bad
Token ID2089
Feature activation-1.04e+00

Pos logit contributions
opl+0.014
?"+0.013
ats+0.012
o+0.012
?+0.012
Neg logit contributions
 Grac-0.015
nam-0.014
 Claire-0.014
ve-0.014
 Katy-0.013
Token ending
Token ID7464
Feature activation-2.52e+00

Pos logit contributions
 Grac+0.032
nam+0.031
 Claire+0.030
 Katy+0.030
rily+0.029
Neg logit contributions
opl-0.028
?"-0.026
ats-0.025
 snowball-0.024
?-0.024
Token tree
Token ID5509
Feature activation-7.59e-01

Pos logit contributions
 Grac+0.002
rily+0.002
nam+0.002
 Claire+0.002
edly+0.002
Neg logit contributions
opl-0.002
?"-0.002
 truck-0.002
?-0.002
 snowball-0.002
Token was
Token ID373
Feature activation-4.41e-01

Pos logit contributions
 Grac+0.025
nam+0.024
rily+0.023
 Claire+0.023
 Katy+0.022
Neg logit contributions
opl-0.020
?"-0.019
 snowball-0.017
ats-0.017
?-0.017
Token sad
Token ID6507
Feature activation+8.91e-02

Pos logit contributions
 Grac+0.013
nam+0.013
 Claire+0.012
rily+0.012
 Katy+0.012
Neg logit contributions
opl-0.013
?"-0.012
ats-0.012
 snowball-0.011
?-0.011
Token that
Token ID326
Feature activation+2.14e+00

Pos logit contributions
opl+0.002
?"+0.002
ats+0.002
 snowball+0.002
?+0.002
Neg logit contributions
 Grac-0.003
nam-0.003
rily-0.003
 Claire-0.003
 Katy-0.003
Token the
Token ID262
Feature activation-8.30e-01

Pos logit contributions
opl+0.048
?"+0.047
?+0.044
 snowball+0.042
ats+0.040
Neg logit contributions
 Grac-0.075
nam-0.073
rily-0.072
ve-0.069
edly-0.068
Token av
Token ID1196
Feature activation-6.52e+00

Pos logit contributions
 Grac+0.027
rily+0.027
nam+0.026
 Claire+0.025
edly+0.024
Neg logit contributions
opl-0.021
?"-0.020
?-0.019
 snowball-0.019
 truck-0.018
Tokenoc
Token ID420
Feature activation-3.49e-01

Pos logit contributions
 Grac+0.330
nam+0.325
 Claire+0.315
rily+0.312
 Katy+0.308
Neg logit contributions
opl-0.065
?"-0.044
ats-0.038
 snowball-0.036
o-0.035
Tokenados
Token ID22484
Feature activation-3.29e+00

Pos logit contributions
 Grac+0.007
nam+0.007
 Claire+0.006
rily+0.006
 Katy+0.006
Neg logit contributions
opl-0.015
?"-0.013
ats-0.013
o-0.012
 truck-0.012
Token were
Token ID547
Feature activation-3.69e-01

Pos logit contributions
 Grac+0.100
nam+0.099
rily+0.093
 Claire+0.092
 Katy+0.091
Neg logit contributions
opl-0.092
?"-0.088
?-0.082
ats-0.082
 snowball-0.081
Token being
Token ID852
Feature activation-9.12e-01

Pos logit contributions
 Grac+0.013
nam+0.012
rily+0.012
edly+0.011
 Claire+0.011
Neg logit contributions
?"-0.009
opl-0.009
?-0.008
 snowball-0.008
ats-0.007
Token taken
Token ID2077
Feature activation-8.90e-01

Pos logit contributions
 Grac+0.030
rily+0.029
nam+0.029
edly+0.028
 Katy+0.028
Neg logit contributions
?"-0.022
?-0.020
opl-0.020
o-0.020
ats-0.019
Token sleep
Token ID3993
Feature activation+1.85e+00

Pos logit contributions
opl+0.000
?"+0.000
?+0.000
 snowball+0.000
ats+0.000
Neg logit contributions
 Grac-0.000
nam-0.000
rily-0.000
 Claire-0.000
 Katy-0.000
Token so
Token ID523
Feature activation+1.30e+00

Pos logit contributions
opl+0.036
?"+0.036
?+0.033
 snowball+0.031
ats+0.030
Neg logit contributions
 Grac-0.071
nam-0.071
 Claire-0.066
rily-0.066
 Katy-0.066
Token that
Token ID326
Feature activation+2.15e+00

Pos logit contributions
?"+0.026
opl+0.025
?+0.023
 snowball+0.021
ats+0.021
Neg logit contributions
 Grac-0.049
nam-0.048
rily-0.046
 Claire-0.046
 Katy-0.046
Token no
Token ID645
Feature activation+1.01e+00

Pos logit contributions
opl+0.055
?"+0.050
ats+0.049
 snowball+0.048
o+0.046
Neg logit contributions
 Grac-0.072
nam-0.071
 Claire-0.067
 Katy-0.066
rily-0.066
Token more
Token ID517
Feature activation-7.65e-01

Pos logit contributions
opl+0.029
ats+0.027
?"+0.026
ossible+0.025
o+0.025
Neg logit contributions
 Grac-0.031
nam-0.030
 Claire-0.029
 Katy-0.028
rily-0.028
Token mis
Token ID2984
Feature activation-6.50e+00

Pos logit contributions
 Grac+0.024
nam+0.024
 Claire+0.023
rily+0.022
 Katy+0.022
Neg logit contributions
opl-0.020
?"-0.019
ats-0.018
 snowball-0.018
o-0.018
Tokenchie
Token ID3043
Feature activation-3.36e+00

Pos logit contributions
 Grac+0.102
nam+0.093
 Claire+0.091
rily+0.088
 Katy+0.088
Neg logit contributions
opl-0.278
?"-0.267
o-0.261
ats-0.260
?-0.256
Tokenvous
Token ID31222
Feature activation-6.92e-01

Pos logit contributions
 Grac+0.116
nam+0.116
 Claire+0.111
 Katy+0.109
rily+0.109
Neg logit contributions
opl-0.075
?"-0.069
 snowball-0.067
ats-0.064
?-0.063
Token mice
Token ID10693
Feature activation-1.27e+00

Pos logit contributions
 Grac+0.023
nam+0.022
 Claire+0.021
rily+0.021
 Katy+0.021
Neg logit contributions
opl-0.018
?"-0.017
ats-0.016
 snowball-0.016
?-0.015
Token could
Token ID714
Feature activation-5.16e-01

Pos logit contributions
 Grac+0.041
nam+0.040
 Claire+0.038
rily+0.038
 Katy+0.037
Neg logit contributions
opl-0.033
?"-0.030
 snowball-0.030
ats-0.029
?-0.028
Token come
Token ID1282
Feature activation-1.77e+00

Pos logit contributions
 Grac+0.017
nam+0.017
rily+0.016
 Claire+0.016
 Katy+0.016
Neg logit contributions
opl-0.013
?"-0.012
 snowball-0.011
?-0.011
ats-0.011
Token As
Token ID1081
Feature activation+5.68e-01

Pos logit contributions
nam+0.008
 Grac+0.008
rily+0.008
 brace+0.008
 Claire+0.007
Neg logit contributions
opl-0.006
?"-0.006
?-0.006
 snowball-0.006
o-0.006
Token the
Token ID262
Feature activation-4.65e-01

Pos logit contributions
?"+0.012
opl+0.012
?+0.011
 snowball+0.011
o+0.010
Neg logit contributions
nam-0.020
 Grac-0.020
rily-0.020
edly-0.019
 Claire-0.019
Token farmer
Token ID18739
Feature activation-1.81e+00

Pos logit contributions
 Grac+0.015
nam+0.015
rily+0.015
 Claire+0.014
edly+0.014
Neg logit contributions
opl-0.012
?"-0.011
 snowball-0.010
?-0.010
ats-0.010
Token scattered
Token ID16830
Feature activation+8.93e-01

Pos logit contributions
 Grac+0.059
nam+0.059
rily+0.055
 Claire+0.055
 Katy+0.054
Neg logit contributions
opl-0.046
?"-0.044
 snowball-0.041
ats-0.041
?-0.041
Token the
Token ID262
Feature activation-1.57e+00

Pos logit contributions
opl+0.022
?"+0.021
 snowball+0.020
?+0.020
ats+0.020
Neg logit contributions
 Grac-0.029
nam-0.029
rily-0.028
 Claire-0.027
 Katy-0.027
Token av
Token ID1196
Feature activation-6.38e+00

Pos logit contributions
 Grac+0.040
nam+0.037
rily+0.037
edly+0.036
 Claire+0.035
Neg logit contributions
opl-0.053
?"-0.052
?-0.049
 snowball-0.048
 truck-0.047
Tokenoc
Token ID420
Feature activation-3.87e-01

Pos logit contributions
 Grac+0.312
nam+0.311
 Claire+0.299
rily+0.297
 Katy+0.293
Neg logit contributions
opl-0.068
?"-0.051
ats-0.045
 snowball-0.044
o-0.041
Tokenados
Token ID22484
Feature activation-3.52e+00

Pos logit contributions
 Grac+0.008
nam+0.008
 Claire+0.007
 Katy+0.007
rily+0.006
Neg logit contributions
opl-0.016
?"-0.014
ats-0.014
 truck-0.013
o-0.013
Token,
Token ID11
Feature activation-9.79e-01

Pos logit contributions
nam+0.121
 Grac+0.121
rily+0.114
 Claire+0.113
 Katy+0.111
Neg logit contributions
opl-0.082
?"-0.081
?-0.077
ats-0.073
o-0.071
Token he
Token ID339
Feature activation+5.47e-01

Pos logit contributions
 Grac+0.037
nam+0.037
rily+0.037
edly+0.035
 brace+0.035
Neg logit contributions
?"-0.017
opl-0.016
 snowball-0.016
?-0.016
 Rex-0.015
Token noticed
Token ID6810
Feature activation+1.15e+00

Pos logit contributions
opl+0.015
?"+0.015
 snowball+0.014
ats+0.014
?+0.014
Neg logit contributions
 Grac-0.017
nam-0.017
rily-0.016
 Claire-0.015
 Katy-0.015
Token happy
Token ID3772
Feature activation-1.75e+00

Pos logit contributions
 Grac+0.005
nam+0.005
rily+0.005
 Claire+0.004
edly+0.004
Neg logit contributions
opl-0.005
?"-0.005
?-0.004
o-0.004
ossible-0.004
Token that
Token ID326
Feature activation+1.62e+00

Pos logit contributions
 Grac+0.060
nam+0.059
rily+0.057
 Claire+0.056
 Katy+0.056
Neg logit contributions
opl-0.042
?"-0.039
 snowball-0.037
ats-0.037
?-0.036
Token it
Token ID340
Feature activation-2.01e+00

Pos logit contributions
?"+0.038
opl+0.038
?+0.036
o+0.031
 snowball+0.030
Neg logit contributions
 Grac-0.055
rily-0.054
nam-0.054
edly-0.053
ve-0.052
Token produced
Token ID4635
Feature activation-1.12e+00

Pos logit contributions
nam+0.076
 Grac+0.071
rily+0.068
 Katy+0.065
 Claire+0.065
Neg logit contributions
?"-0.044
opl-0.044
?-0.041
o-0.039
ats-0.037
Token more
Token ID517
Feature activation-1.48e+00

Pos logit contributions
 Grac+0.040
nam+0.040
rily+0.037
ve+0.037
edly+0.037
Neg logit contributions
?"-0.025
?-0.023
opl-0.023
aw-0.020
 snowball-0.020
Token av
Token ID1196
Feature activation-6.34e+00

Pos logit contributions
 Grac+0.039
nam+0.037
edly+0.034
rily+0.034
 Claire+0.033
Neg logit contributions
?"-0.049
opl-0.049
?-0.046
ats-0.042
 snowball-0.042
Tokenoc
Token ID420
Feature activation-3.05e-01

Pos logit contributions
 Grac+0.326
nam+0.324
 Claire+0.314
rily+0.307
 Katy+0.305
Neg logit contributions
opl-0.057
?"-0.035
ats-0.032
o-0.030
 snowball-0.027
Tokenados
Token ID22484
Feature activation-3.39e+00

Pos logit contributions
 Grac+0.007
nam+0.007
 Claire+0.006
 Katy+0.005
Gr+0.005
Neg logit contributions
opl-0.012
ats-0.011
?"-0.011
aw-0.010
o-0.010
Token than
Token ID621
Feature activation-2.75e-01

Pos logit contributions
nam+0.140
 Grac+0.138
rily+0.129
 Claire+0.127
edly+0.125
Neg logit contributions
?"-0.058
opl-0.058
?-0.057
o-0.047
ats-0.046
Token ever
Token ID1683
Feature activation-4.63e-01

Pos logit contributions
 Grac+0.010
edly+0.009
rily+0.009
ve+0.009
Gr+0.009
Neg logit contributions
?"-0.006
?-0.006
opl-0.006
 fires-0.005
 course-0.005
Token before
Token ID878
Feature activation-1.09e+00

Pos logit contributions
 Grac+0.017
nam+0.016
rily+0.015
 Claire+0.015
edly+0.014
Neg logit contributions
?"-0.011
opl-0.011
?-0.010
ats-0.009
o-0.009
Token the
Token ID262
Feature activation-2.67e-01

Pos logit contributions
opl+0.025
?"+0.024
?+0.022
 snowball+0.022
ats+0.021
Neg logit contributions
nam-0.046
 Grac-0.046
rily-0.044
 Claire-0.043
 Katy-0.042
Token tree
Token ID5509
Feature activation-1.30e+00

Pos logit contributions
 Grac+0.009
rily+0.009
nam+0.008
 Claire+0.008
edly+0.008
Neg logit contributions
?"-0.007
opl-0.007
?-0.006
 truck-0.006
 snowball-0.006
Token knew
Token ID2993
Feature activation-1.08e+00

Pos logit contributions
 Grac+0.041
nam+0.041
rily+0.038
 Claire+0.038
 Katy+0.037
Neg logit contributions
opl-0.035
?"-0.033
 snowball-0.031
ats-0.031
?-0.031
Token that
Token ID326
Feature activation+3.46e+00

Pos logit contributions
 Grac+0.040
nam+0.040
rily+0.038
 Claire+0.038
 Katy+0.037
Neg logit contributions
opl-0.023
?"-0.021
?-0.020
 snowball-0.019
ats-0.019
Token the
Token ID262
Feature activation-4.98e-02

Pos logit contributions
opl+0.085
?"+0.084
?+0.079
 snowball+0.074
ats+0.072
Neg logit contributions
 Grac-0.114
nam-0.112
rily-0.110
 Claire-0.105
ve-0.105
Token av
Token ID1196
Feature activation-6.24e+00

Pos logit contributions
 Grac+0.001
rily+0.001
nam+0.001
edly+0.001
ately+0.001
Neg logit contributions
?"-0.001
opl-0.001
?-0.001
 truck-0.001
 snowball-0.001
Tokenoc
Token ID420
Feature activation-1.65e-01

Pos logit contributions
 Grac+0.314
nam+0.311
 Claire+0.301
rily+0.298
 Katy+0.295
Neg logit contributions
opl-0.061
?"-0.044
ats-0.036
 snowball-0.034
o-0.034
Tokenados
Token ID22484
Feature activation-3.35e+00

Pos logit contributions
 Grac+0.004
nam+0.003
 Claire+0.003
 Katy+0.003
rily+0.003
Neg logit contributions
opl-0.007
?"-0.006
o-0.006
ats-0.006
?-0.006
Token would
Token ID561
Feature activation-2.58e-01

Pos logit contributions
 Grac+0.112
nam+0.112
rily+0.105
 Claire+0.103
 Katy+0.102
Neg logit contributions
?"-0.081
opl-0.081
?-0.076
ats-0.073
o-0.071
Token be
Token ID307
Feature activation-4.29e-01

Pos logit contributions
nam+0.009
 Grac+0.009
rily+0.008
 Claire+0.008
 Katy+0.008
Neg logit contributions
opl-0.006
?"-0.006
?-0.006
 snowball-0.005
ats-0.005
Token back
Token ID736
Feature activation-1.64e+00

Pos logit contributions
 Grac+0.014
nam+0.014
rily+0.013
edly+0.013
 Claire+0.013
Neg logit contributions
?"-0.011
opl-0.010
?-0.010
 snowball-0.009
ats-0.009
Token Tim
Token ID5045
Feature activation-5.61e-01

Pos logit contributions
 Grac+0.008
rily+0.008
nam+0.008
edly+0.008
 brace+0.008
Neg logit contributions
?"-0.004
 snowball-0.004
opl-0.004
?-0.004
 Rex-0.003
Token saw
Token ID2497
Feature activation+1.13e+00

Pos logit contributions
 Grac+0.015
 Claire+0.014
 Katy+0.014
nam+0.014
ve+0.014
Neg logit contributions
opl-0.017
 snowball-0.015
?"-0.015
ossible-0.014
etch-0.014
Token some
Token ID617
Feature activation-1.06e+00

Pos logit contributions
opl+0.036
?"+0.032
ats+0.032
ossible+0.031
?+0.030
Neg logit contributions
nam-0.032
 Grac-0.031
 Claire-0.029
 Katy-0.029
rily-0.028
Token other
Token ID584
Feature activation-1.45e+00

Pos logit contributions
 Grac+0.032
nam+0.032
 Claire+0.029
rily+0.029
 Katy+0.029
Neg logit contributions
opl-0.030
?"-0.029
ats-0.027
 snowball-0.027
?-0.026
Token kids
Token ID3988
Feature activation-7.45e-01

Pos logit contributions
 Grac+0.044
nam+0.043
 Claire+0.041
rily+0.040
 Katy+0.040
Neg logit contributions
opl-0.041
?"-0.038
 snowball-0.036
ats-0.036
?-0.036
Token mis
Token ID2984
Feature activation-6.19e+00

Pos logit contributions
 Grac+0.024
nam+0.023
 Claire+0.022
rily+0.022
 Katy+0.022
Neg logit contributions
opl-0.020
?"-0.018
 snowball-0.018
ats-0.017
?-0.017
Tokenbeh
Token ID20709
Feature activation-3.55e+00

Pos logit contributions
 Grac+0.141
nam+0.133
rily+0.129
 Claire+0.128
 Katy+0.126
Neg logit contributions
opl-0.217
ats-0.208
?"-0.207
o-0.207
 snowball-0.201
Tokenaving
Token ID2703
Feature activation+5.02e-01

Pos logit contributions
 Grac+0.098
nam+0.093
 Claire+0.086
chie+0.085
 Katy+0.085
Neg logit contributions
opl-0.107
?"-0.106
 snowball-0.103
?-0.100
 snow-0.094
Token and
Token ID290
Feature activation+7.32e-01

Pos logit contributions
opl+0.012
?"+0.011
 snowball+0.010
ats+0.010
?+0.010
Neg logit contributions
 Grac-0.018
nam-0.017
rily-0.016
 Claire-0.016
 Katy-0.016
Token running
Token ID2491
Feature activation+1.49e-01

Pos logit contributions
opl+0.018
?"+0.017
?+0.016
ats+0.016
o+0.016
Neg logit contributions
 Grac-0.024
rily-0.024
nam-0.024
 Claire-0.022
 Katy-0.022
Token around
Token ID1088
Feature activation-2.22e-02

Pos logit contributions
opl+0.003
?"+0.003
 snowball+0.003
?+0.003
ats+0.003
Neg logit contributions
 Grac-0.005
nam-0.005
rily-0.005
 Katy-0.005
 Claire-0.005
Token It
Token ID632
Feature activation-1.71e+00

Pos logit contributions
nam+0.029
 Grac+0.028
rily+0.027
ve+0.026
 Katy+0.026
Neg logit contributions
opl-0.017
?"-0.016
?-0.015
 snowball-0.015
ossible-0.014
Token was
Token ID373
Feature activation-5.09e-01

Pos logit contributions
nam+0.064
 Grac+0.061
rily+0.061
 Katy+0.059
 Claire+0.058
Neg logit contributions
opl-0.035
?"-0.035
?-0.032
o-0.031
 snowball-0.031
Token a
Token ID257
Feature activation-1.81e+00

Pos logit contributions
 Grac+0.014
nam+0.014
 Claire+0.013
rily+0.013
 Katy+0.013
Neg logit contributions
opl-0.016
?"-0.015
ats-0.014
 snowball-0.014
?-0.014
Token big
Token ID1263
Feature activation+8.52e-01

Pos logit contributions
 Grac+0.050
nam+0.049
 Claire+0.045
 Katy+0.044
rily+0.044
Neg logit contributions
opl-0.057
?"-0.053
ats-0.052
 snowball-0.050
?-0.049
Token j
Token ID474
Feature activation+3.76e-01

Pos logit contributions
opl+0.023
ats+0.022
?"+0.020
o+0.020
at+0.019
Neg logit contributions
 Grac-0.027
nam-0.026
 Claire-0.025
 Katy-0.025
 brace-0.023
Tokenigsaw
Token ID45636
Feature activation-6.14e+00

Pos logit contributions
opl+0.012
?"+0.011
 snowball+0.011
ats+0.011
?+0.010
Neg logit contributions
 Grac-0.010
nam-0.009
 Claire-0.009
rily-0.009
 Katy-0.009
Token puzzle
Token ID15027
Feature activation-1.34e+00

Pos logit contributions
nam+0.197
 Grac+0.192
rily+0.184
 Claire+0.175
ately+0.175
Neg logit contributions
?"-0.163
opl-0.160
?-0.153
 snowball-0.146
ats-0.137
Token and
Token ID290
Feature activation-1.23e-01

Pos logit contributions
 Grac+0.045
nam+0.045
 Claire+0.043
 Katy+0.042
rily+0.042
Neg logit contributions
opl-0.033
?"-0.029
 snowball-0.029
ats-0.028
ossible-0.027
Token she
Token ID673
Feature activation-4.79e-02

Pos logit contributions
rily+0.003
 Grac+0.003
nam+0.003
ve+0.003
chie+0.003
Neg logit contributions
?"-0.004
opl-0.003
?-0.003
 snowball-0.003
o-0.003
Token wanted
Token ID2227
Feature activation-1.13e+00

Pos logit contributions
 Grac+0.002
nam+0.002
rily+0.001
 Claire+0.001
 Katy+0.001
Neg logit contributions
opl-0.001
?"-0.001
 snowball-0.001
ats-0.001
?-0.001
Token to
Token ID284
Feature activation+1.23e-01

Pos logit contributions
 Grac+0.039
nam+0.038
rily+0.038
 Claire+0.036
edly+0.036
Neg logit contributions
?"-0.026
opl-0.025
?-0.024
 snowball-0.022
ats-0.022
Token know
Token ID760
Feature activation+3.25e-01

Pos logit contributions
opl+0.039
?"+0.036
 snowball+0.035
ats+0.034
?+0.033
Neg logit contributions
 Grac-0.046
nam-0.044
 Claire-0.042
rily-0.042
 Katy-0.041
Token,
Token ID11
Feature activation+2.78e-01

Pos logit contributions
opl+0.007
?"+0.006
?+0.006
ats+0.006
 snowball+0.006
Neg logit contributions
 Grac-0.012
nam-0.012
rily-0.012
 Claire-0.011
 Katy-0.011
Token her
Token ID607
Feature activation-1.37e+00

Pos logit contributions
opl+0.007
?"+0.007
?+0.006
 snowball+0.006
ats+0.006
Neg logit contributions
 Grac-0.009
nam-0.009
rily-0.008
 Claire-0.008
 Katy-0.008
Token mum
Token ID25682
Feature activation+1.26e+00

Pos logit contributions
 Grac+0.037
nam+0.037
 Claire+0.034
rily+0.034
 Katy+0.034
Neg logit contributions
opl-0.043
?"-0.040
ats-0.039
 snowball-0.038
o-0.038
Token was
Token ID373
Feature activation+1.11e+00

Pos logit contributions
opl+0.030
?"+0.028
 snowball+0.026
ats+0.026
?+0.026
Neg logit contributions
 Grac-0.043
nam-0.043
rily-0.041
 Claire-0.040
 Katy-0.040
Token fore
Token ID1674
Feature activation-6.08e+00

Pos logit contributions
opl+0.031
?"+0.029
 snowball+0.028
ats+0.027
?+0.027
Neg logit contributions
 Grac-0.034
nam-0.034
rily-0.032
 Claire-0.032
 Katy-0.031
Tokenshadow
Token ID19106
Feature activation-2.62e+00

Pos logit contributions
 Grac+0.151
nam+0.143
 Claire+0.139
rily+0.134
 Katy+0.133
Neg logit contributions
opl-0.200
?"-0.196
ats-0.188
?-0.185
o-0.184
Tokening
Token ID278
Feature activation+1.75e+00

Pos logit contributions
nam+0.101
 Grac+0.091
rily+0.088
 Claire+0.084
 diamonds+0.083
Neg logit contributions
aw-0.059
o-0.059
?-0.055
ats-0.055
opl-0.054
Token...
Token ID986
Feature activation-8.89e-01

Pos logit contributions
opl+0.044
?"+0.042
 snowball+0.039
ats+0.039
?+0.039
Neg logit contributions
 Grac-0.058
nam-0.057
rily-0.054
 Claire-0.054
 Katy-0.053
Tokenthe
Token ID1169
Feature activation+9.76e-02

Pos logit contributions
 Grac+0.025
nam+0.024
rily+0.024
 brace+0.022
 Claire+0.022
Neg logit contributions
opl-0.026
?"-0.025
o-0.024
?-0.024
ats-0.023
Token next
Token ID1306
Feature activation-1.77e+00

Pos logit contributions
opl+0.002
?"+0.002
o+0.002
ats+0.002
ossible+0.002
Neg logit contributions
 Grac-0.003
nam-0.003
 Claire-0.003
ve-0.003
 Katy-0.003
Token?"
Token ID1701
Feature activation-5.30e-01

Pos logit contributions
 Grac+0.001
 Claire+0.001
nam+0.001
 Katy+0.001
rily+0.001
Neg logit contributions
opl-0.000
 snowball-0.000
ats-0.000
ossible-0.000
haw-0.000
Token The
Token ID383
Feature activation+1.76e+00

Pos logit contributions
nam+0.020
 Grac+0.020
rily+0.020
 Claire+0.019
 Katy+0.019
Neg logit contributions
opl-0.010
?"-0.010
?-0.009
 snowball-0.009
ats-0.008
Token little
Token ID1310
Feature activation-2.51e+00

Pos logit contributions
opl+0.055
?"+0.051
ats+0.049
 snowball+0.049
?+0.048
Neg logit contributions
 Grac-0.049
nam-0.048
 Claire-0.045
rily-0.045
 Katy-0.044
Token boy
Token ID2933
Feature activation-1.30e+00

Pos logit contributions
nam+0.058
 Grac+0.057
rily+0.054
 Katy+0.051
ve+0.051
Neg logit contributions
?"-0.086
 snowball-0.083
opl-0.083
?-0.082
o-0.078
Token smiled
Token ID13541
Feature activation-1.46e+00

Pos logit contributions
 Grac+0.043
nam+0.043
rily+0.040
 Claire+0.040
 Katy+0.040
Neg logit contributions
opl-0.033
?"-0.031
 snowball-0.029
ats-0.029
?-0.028
Token mis
Token ID2984
Feature activation-6.05e+00

Pos logit contributions
nam+0.065
 Grac+0.064
 Claire+0.059
ve+0.059
 Katy+0.059
Neg logit contributions
?"-0.021
?-0.020
opl-0.018
 snowball-0.017
o-0.016
Tokenchie
Token ID3043
Feature activation-1.97e+00

Pos logit contributions
 Grac+0.117
nam+0.113
 Claire+0.109
 Katy+0.106
rily+0.104
Neg logit contributions
opl-0.236
o-0.226
?"-0.223
ats-0.222
 snowball-0.216
Tokenv
Token ID85
Feature activation+1.54e+00

Pos logit contributions
 Grac+0.064
nam+0.063
 Claire+0.062
 Katy+0.061
rily+0.061
Neg logit contributions
opl-0.047
?"-0.043
 snowball-0.042
ats-0.040
?-0.040
Tokenously
Token ID3481
Feature activation-7.14e-01

Pos logit contributions
opl+0.048
?"+0.046
ats+0.043
?+0.043
 snowball+0.043
Neg logit contributions
nam-0.043
 Grac-0.043
 Claire-0.039
rily-0.039
 Katy-0.038
Token and
Token ID290
Feature activation-6.20e-03

Pos logit contributions
 Grac+0.029
nam+0.029
 Claire+0.027
rily+0.027
 Katy+0.027
Neg logit contributions
opl-0.013
?"-0.012
?-0.011
 snowball-0.010
ats-0.010
Token said
Token ID531
Feature activation-4.19e-02

Pos logit contributions
nam+0.000
 Grac+0.000
rily+0.000
ve+0.000
chie+0.000
Neg logit contributions
?"-0.000
o-0.000
?-0.000
 Rex-0.000
 named-0.000
Token so
Token ID523
Feature activation+2.31e-01

Pos logit contributions
opl+0.021
?"+0.020
ats+0.019
 snowball+0.019
?+0.018
Neg logit contributions
 Grac-0.020
nam-0.020
 Claire-0.019
rily-0.018
 Katy-0.018
Token happy
Token ID3772
Feature activation-1.31e+00

Pos logit contributions
opl+0.007
?"+0.007
?+0.006
o+0.006
 snowball+0.006
Neg logit contributions
 Grac-0.007
nam-0.007
rily-0.006
 Katy-0.006
 Claire-0.006
Token and
Token ID290
Feature activation+8.32e-02

Pos logit contributions
nam+0.052
 Grac+0.051
rily+0.049
 Claire+0.048
 Katy+0.047
Neg logit contributions
opl-0.024
?"-0.024
?-0.022
o-0.021
 snowball-0.021
Token brought
Token ID3181
Feature activation-1.15e+00

Pos logit contributions
?"+0.002
o+0.002
 named+0.002
opl+0.002
 called+0.002
Neg logit contributions
 Grac-0.003
rily-0.003
chie-0.003
nam-0.003
rier-0.003
Token the
Token ID262
Feature activation-1.29e+00

Pos logit contributions
edly+0.047
ately+0.046
nam+0.045
 Grac+0.044
ve+0.044
Neg logit contributions
?"-0.021
?-0.017
ossible-0.016
ats-0.016
opl-0.016
Token av
Token ID1196
Feature activation-6.04e+00

Pos logit contributions
 Grac+0.036
edly+0.035
rily+0.034
ately+0.033
nam+0.030
Neg logit contributions
?"-0.042
opl-0.041
?-0.038
 truck-0.038
ossible-0.036
Tokenoc
Token ID420
Feature activation-1.29e-01

Pos logit contributions
 Grac+0.292
nam+0.289
 Claire+0.278
rily+0.278
 Katy+0.274
Neg logit contributions
opl-0.065
?"-0.053
ats-0.045
 snowball-0.044
o-0.041
Tokenados
Token ID22484
Feature activation-3.37e+00

Pos logit contributions
 Grac+0.003
nam+0.003
 Claire+0.002
 Katy+0.002
rily+0.002
Neg logit contributions
opl-0.005
?"-0.005
ats-0.005
 truck-0.004
o-0.004
Token home
Token ID1363
Feature activation-1.64e+00

Pos logit contributions
 Grac+0.123
nam+0.120
rily+0.113
 Katy+0.111
 Claire+0.111
Neg logit contributions
?"-0.071
?-0.068
opl-0.067
o-0.066
ossible-0.063
Token to
Token ID284
Feature activation+2.39e-01

Pos logit contributions
 Grac+0.072
nam+0.070
rily+0.065
 Katy+0.065
 Claire+0.065
Neg logit contributions
?"-0.022
?-0.020
o-0.020
opl-0.019
ats-0.018
Token share
Token ID2648
Feature activation-1.39e+00

Pos logit contributions
?"+0.006
opl+0.005
?+0.005
aster+0.005
ossible+0.004
Neg logit contributions
 Grac-0.009
nam-0.008
rily-0.008
edly-0.008
ve-0.008
Token would
Token ID561
Feature activation-3.59e-01

Pos logit contributions
 Grac+0.005
nam+0.005
 Claire+0.004
 brace+0.004
 Katy+0.004
Neg logit contributions
opl-0.004
?"-0.003
 snowball-0.003
ats-0.003
?-0.003
Token take
Token ID1011
Feature activation-1.19e+00

Pos logit contributions
nam+0.011
 Grac+0.011
 Claire+0.011
rily+0.010
 Katy+0.010
Neg logit contributions
opl-0.010
?"-0.009
 snowball-0.009
ats-0.009
o-0.008
Token out
Token ID503
Feature activation-1.04e-01

Pos logit contributions
 Grac+0.045
nam+0.045
rily+0.044
edly+0.043
ately+0.042
Neg logit contributions
opl-0.023
?"-0.023
 snowball-0.021
ats-0.021
?-0.020
Token a
Token ID257
Feature activation-5.91e-01

Pos logit contributions
 Grac+0.004
nam+0.004
rily+0.003
 Claire+0.003
 Katy+0.003
Neg logit contributions
opl-0.002
?"-0.002
?-0.002
 snowball-0.002
ats-0.002
Token j
Token ID474
Feature activation+5.45e-01

Pos logit contributions
 Grac+0.018
rily+0.018
nam+0.018
edly+0.018
 Claire+0.017
Neg logit contributions
?"-0.015
opl-0.015
 snowball-0.014
?-0.014
ats-0.013
Tokenigsaw
Token ID45636
Feature activation-6.00e+00

Pos logit contributions
opl+0.018
?"+0.018
 snowball+0.017
o+0.016
 Rex+0.016
Neg logit contributions
 Grac-0.013
nam-0.012
rily-0.011
 Claire-0.011
 Katy-0.011
Token puzzle
Token ID15027
Feature activation-1.04e+00

Pos logit contributions
nam+0.176
 Grac+0.174
rily+0.165
 Claire+0.160
edly+0.154
Neg logit contributions
?"-0.174
opl-0.173
?-0.164
 snowball-0.156
ats-0.155
Token and
Token ID290
Feature activation+3.64e-01

Pos logit contributions
nam+0.031
 Grac+0.031
 Katy+0.029
rily+0.029
 Claire+0.029
Neg logit contributions
opl-0.030
 snowball-0.026
?"-0.026
ats-0.026
haw-0.025
Token try
Token ID1949
Feature activation-3.20e-01

Pos logit contributions
?"+0.008
opl+0.008
?+0.007
 snowball+0.007
ats+0.007
Neg logit contributions
 Grac-0.013
rily-0.012
nam-0.012
 Claire-0.012
edly-0.012
Token to
Token ID284
Feature activation+1.79e-01

Pos logit contributions
 Grac+0.011
nam+0.011
 Claire+0.011
rily+0.010
edly+0.010
Neg logit contributions
?"-0.008
opl-0.008
?-0.007
 snowball-0.007
ats-0.007
Token solve
Token ID8494
Feature activation-1.97e+00

Pos logit contributions
?"+0.005
?+0.004
opl+0.004
aster+0.004
 snowball+0.003
Neg logit contributions
 Grac-0.006
rily-0.006
edly-0.006
nam-0.006
 Claire-0.006
Token baby
Token ID5156
Feature activation-7.09e-02

Pos logit contributions
?"+0.008
 snowball+0.008
?+0.008
opl+0.008
o+0.007
Neg logit contributions
nam-0.007
rily-0.007
 Grac-0.007
edly-0.007
 Katy-0.006
Token.
Token ID13
Feature activation-7.50e-01

Pos logit contributions
 Grac+0.003
nam+0.003
 Claire+0.002
rily+0.002
 Katy+0.002
Neg logit contributions
opl-0.002
?"-0.001
ats-0.001
 snowball-0.001
?-0.001
Token He
Token ID679
Feature activation-6.89e-01

Pos logit contributions
 Grac+0.022
nam+0.022
 Claire+0.021
rily+0.021
 Katy+0.020
Neg logit contributions
opl-0.022
?"-0.020
ats-0.019
 snowball-0.019
?-0.019
Token was
Token ID373
Feature activation-3.34e-01

Pos logit contributions
 Grac+0.022
nam+0.022
 Claire+0.021
rily+0.020
 Katy+0.020
Neg logit contributions
opl-0.018
?"-0.016
ats-0.016
 snowball-0.015
?-0.015
Token a
Token ID257
Feature activation-2.41e+00

Pos logit contributions
 Grac+0.011
nam+0.011
rily+0.010
 Claire+0.010
 Katy+0.010
Neg logit contributions
opl-0.009
?"-0.008
 snowball-0.008
ats-0.008
?-0.008
Token mis
Token ID2984
Feature activation-6.00e+00

Pos logit contributions
 Grac+0.070
nam+0.070
rily+0.067
 Claire+0.064
 Katy+0.064
Neg logit contributions
opl-0.068
?"-0.067
 snowball-0.064
?-0.063
o-0.061
Tokenchie
Token ID3043
Feature activation-2.34e+00

Pos logit contributions
 Grac+0.078
nam+0.075
 Claire+0.073
 Katy+0.072
rily+0.069
Neg logit contributions
opl-0.253
o-0.249
ats-0.240
?"-0.239
aver-0.237
Tokenvous
Token ID31222
Feature activation-1.61e+00

Pos logit contributions
 Grac+0.080
nam+0.078
 Claire+0.075
 Katy+0.074
ve+0.073
Neg logit contributions
opl-0.053
?"-0.049
 snowball-0.047
?-0.046
ats-0.046
Token little
Token ID1310
Feature activation-1.90e+00

Pos logit contributions
nam+0.051
 Grac+0.050
rily+0.047
 Katy+0.045
 Claire+0.045
Neg logit contributions
opl-0.043
?"-0.043
 snowball-0.042
?-0.040
o-0.038
Token baby
Token ID5156
Feature activation-6.74e-01

Pos logit contributions
 Grac+0.055
nam+0.055
rily+0.052
 Claire+0.051
 Katy+0.050
Neg logit contributions
opl-0.055
?"-0.053
 snowball-0.050
?-0.050
ats-0.049
Token and
Token ID290
Feature activation-2.39e-01

Pos logit contributions
 Grac+0.023
nam+0.023
 Claire+0.022
rily+0.021
 Katy+0.021
Neg logit contributions
opl-0.016
ats-0.014
?"-0.014
ossible-0.014
 snowball-0.013
Token It
Token ID632
Feature activation-1.18e+00

Pos logit contributions
opl+0.013
?"+0.012
ats+0.011
 snowball+0.011
?+0.011
Neg logit contributions
 Grac-0.016
nam-0.016
 Claire-0.015
rily-0.015
 Katy-0.015
Token turned
Token ID2900
Feature activation-1.75e+00

Pos logit contributions
 Grac+0.037
nam+0.035
rily+0.034
 Claire+0.034
 Katy+0.032
Neg logit contributions
opl-0.034
?"-0.031
ats-0.030
 snowball-0.029
?-0.029
Token out
Token ID503
Feature activation-2.35e+00

Pos logit contributions
nam+0.052
 Grac+0.051
 Claire+0.048
rily+0.048
 Katy+0.047
Neg logit contributions
opl-0.049
?"-0.048
 snowball-0.046
?-0.045
ats-0.044
Token that
Token ID326
Feature activation+1.80e+00

Pos logit contributions
nam+0.088
 Grac+0.086
 Claire+0.081
rily+0.081
 Katy+0.079
Neg logit contributions
?"-0.048
opl-0.048
?-0.045
 snowball-0.044
ats-0.041
Token the
Token ID262
Feature activation-7.42e-01

Pos logit contributions
opl+0.048
?"+0.048
?+0.044
 snowball+0.044
ats+0.042
Neg logit contributions
 Grac-0.056
nam-0.055
rily-0.053
 Claire-0.051
 Katy-0.050
Token av
Token ID1196
Feature activation-5.91e+00

Pos logit contributions
 Grac+0.023
nam+0.023
rily+0.022
 Claire+0.021
 Katy+0.021
Neg logit contributions
opl-0.020
?"-0.019
 snowball-0.018
?-0.018
ats-0.017
Tokenoc
Token ID420
Feature activation-2.01e-01

Pos logit contributions
 Grac+0.272
nam+0.270
rily+0.259
 Claire+0.257
 Katy+0.255
Neg logit contributions
opl-0.072
?"-0.065
 snowball-0.054
?-0.054
ats-0.053
Tokenados
Token ID22484
Feature activation-2.97e+00

Pos logit contributions
 Grac+0.004
nam+0.004
 Claire+0.004
 Katy+0.003
rily+0.003
Neg logit contributions
opl-0.008
?"-0.007
ats-0.007
o-0.007
 truck-0.007
Token were
Token ID547
Feature activation-3.55e-01

Pos logit contributions
 Grac+0.089
nam+0.088
rily+0.083
 Claire+0.082
 Katy+0.081
Neg logit contributions
opl-0.084
?"-0.080
 snowball-0.075
ats-0.074
?-0.074
Token bad
Token ID2089
Feature activation-4.14e-01

Pos logit contributions
 Grac+0.011
nam+0.011
rily+0.010
 Claire+0.010
 Katy+0.010
Neg logit contributions
opl-0.010
?"-0.009
 snowball-0.008
ats-0.008
?-0.008
Token and
Token ID290
Feature activation-1.27e-01

Pos logit contributions
nam+0.016
 Grac+0.016
rily+0.015
 Claire+0.015
 Katy+0.014
Neg logit contributions
opl-0.009
?"-0.008
?-0.007
 snowball-0.007
ats-0.007
Token and
Token ID290
Feature activation-3.79e+00

Pos logit contributions
nam+0.085
 Grac+0.081
rily+0.077
 Katy+0.075
 Claire+0.075
Neg logit contributions
?"-0.052
opl-0.052
o-0.048
?-0.047
ats-0.046
Token the
Token ID262
Feature activation-1.88e+00

Pos logit contributions
ve+0.115
rily+0.111
edly+0.107
rier+0.102
nam+0.101
Neg logit contributions
?"-0.099
 Rex-0.098
opl-0.094
?-0.094
ossible-0.091
Token princess
Token ID21752
Feature activation-3.78e+00

Pos logit contributions
rily+0.053
nam+0.050
ately+0.047
ve+0.045
 Katy+0.044
Neg logit contributions
?"-0.052
 course-0.049
?-0.049
 snow-0.048
 truck-0.048
Token were
Token ID547
Feature activation-9.23e-02

Pos logit contributions
nam+0.131
 Grac+0.124
rily+0.120
 Katy+0.115
 Claire+0.114
Neg logit contributions
?"-0.088
ats-0.087
opl-0.087
o-0.085
?-0.080
Token over
Token ID625
Feature activation-2.34e+00

Pos logit contributions
nam+0.003
 Grac+0.003
rily+0.003
edly+0.003
 Katy+0.003
Neg logit contributions
opl-0.002
?"-0.002
 snowball-0.002
ats-0.002
o-0.002
Tokenjoy
Token ID2633
Feature activation-5.90e+00

Pos logit contributions
nam+0.059
rily+0.053
 Grac+0.053
 brace+0.047
chie+0.046
Neg logit contributions
?-0.079
?"-0.078
opl-0.077
ossible-0.076
 snowball-0.074
Tokened
Token ID276
Feature activation-9.35e-01

Pos logit contributions
nam+0.136
 Grac+0.129
rily+0.113
 brace+0.104
 Claire+0.104
Neg logit contributions
opl-0.222
o-0.220
aw-0.211
ats-0.208
?"-0.206
Token,
Token ID11
Feature activation-4.42e-01

Pos logit contributions
nam+0.042
 Grac+0.039
rily+0.037
 Claire+0.036
 Katy+0.036
Neg logit contributions
opl-0.014
?-0.014
?"-0.013
o-0.011
ats-0.011
Token and
Token ID290
Feature activation-7.44e-01

Pos logit contributions
nam+0.016
 Grac+0.016
rily+0.016
 Claire+0.015
edly+0.015
Neg logit contributions
opl-0.009
?"-0.008
 snowball-0.008
o-0.008
?-0.008
Token they
Token ID484
Feature activation-8.43e-01

Pos logit contributions
nam+0.025
ve+0.024
rily+0.023
chie+0.023
 Grac+0.023
Neg logit contributions
o-0.016
 Rex-0.015
 named-0.015
?"-0.014
opl-0.014
Token soon
Token ID2582
Feature activation-6.28e-01

Pos logit contributions
nam+0.027
 Grac+0.027
rily+0.026
rier+0.025
 Katy+0.025
Neg logit contributions
?"-0.020
opl-0.019
o-0.019
ossible-0.019
ats-0.018
TokenYes
Token ID5297
Feature activation+1.05e+00

Pos logit contributions
 Grac+0.002
nam+0.002
 Claire+0.002
rily+0.002
 Katy+0.002
Neg logit contributions
opl-0.002
?"-0.002
ats-0.002
 snowball-0.002
?-0.002
Token,
Token ID11
Feature activation+2.02e+00

Pos logit contributions
opl+0.024
 snowball+0.020
?"+0.020
ats+0.019
o+0.019
Neg logit contributions
 Grac-0.039
nam-0.037
 Claire-0.037
rily-0.037
 Katy-0.036
Token it
Token ID340
Feature activation+7.36e-01

Pos logit contributions
opl+0.057
?"+0.053
ats+0.050
 snowball+0.050
?+0.049
Neg logit contributions
 Grac-0.062
nam-0.061
rily-0.057
 Claire-0.057
 Katy-0.056
Tokenâ
Token ID22940
Feature activation+2.61e+00

Pos logit contributions
opl+0.019
 snowball+0.016
ats+0.016
ossible+0.015
?"+0.015
Neg logit contributions
 Grac-0.025
 Claire-0.023
rily-0.022
ve-0.022
 Katy-0.022
Token€
Token ID26391
Feature activation-1.10e+00

Pos logit contributions
 snowball+0.070
opl+0.068
?"+0.063
ats+0.062
 sled+0.061
Neg logit contributions
 Grac-0.074
 Claire-0.073
 Katy-0.072
nam-0.069
ve-0.068
Token™
Token ID8151
Feature activation+2.90e+00

Pos logit contributions
 Grac+0.034
nam+0.033
 Katy+0.033
 Claire+0.032
ve+0.032
Neg logit contributions
opl-0.028
 snowball-0.025
ats-0.024
?"-0.024
 microwave-0.023
Tokens
Token ID82
Feature activation+1.51e+00

Pos logit contributions
opl+0.079
 snowball+0.073
?"+0.070
 fires+0.067
ats+0.065
Neg logit contributions
 Grac-0.088
ve-0.082
 Claire-0.081
nam-0.081
 Katy-0.080
Token so
Token ID523
Feature activation+7.42e-01

Pos logit contributions
opl+0.034
 snowball+0.030
?"+0.030
ats+0.029
o+0.028
Neg logit contributions
 Grac-0.054
nam-0.052
 Claire-0.051
rily-0.050
 Katy-0.050
Token bright
Token ID6016
Feature activation+6.45e-01

Pos logit contributions
opl+0.020
?"+0.018
 snowball+0.017
ats+0.017
?+0.017
Neg logit contributions
 Grac-0.024
nam-0.024
rily-0.022
 Claire-0.022
 Katy-0.022
Token!
Token ID0
Feature activation+1.06e+00

Pos logit contributions
opl+0.015
ats+0.013
 snowball+0.013
?"+0.012
ossible+0.012
Neg logit contributions
 Grac-0.023
rily-0.022
 Claire-0.022
nam-0.022
 Katy-0.021
Tokenâ
Token ID22940
Feature activation+2.41e+00

Pos logit contributions
opl+0.027
ats+0.025
?"+0.024
 snowball+0.023
?+0.022
Neg logit contributions
 Grac-0.036
nam-0.034
 Claire-0.034
 Katy-0.033
rily-0.033
Token.
Token ID13
Feature activation+1.42e+00

Pos logit contributions
opl+0.048
 snowball+0.041
ossible+0.040
 microwave+0.040
ats+0.040
Neg logit contributions
rily-0.045
 Grac-0.045
nam-0.045
 Claire-0.044
ve-0.042
Token Come
Token ID7911
Feature activation+3.06e-01

Pos logit contributions
ats+0.041
opl+0.040
at+0.035
?"+0.035
 snowball+0.034
Neg logit contributions
 Grac-0.042
 Claire-0.039
nam-0.037
edly-0.037
rily-0.036
Token on
Token ID319
Feature activation+1.85e+00

Pos logit contributions
opl+0.010
ats+0.008
?"+0.008
 snowball+0.008
ossible+0.008
Neg logit contributions
 Grac-0.009
ve-0.008
rily-0.008
 Claire-0.008
nam-0.007
Token,
Token ID11
Feature activation+2.16e+00

Pos logit contributions
opl+0.056
ats+0.047
aw+0.046
at+0.044
ossible+0.044
Neg logit contributions
 Grac-0.055
 Claire-0.051
 Katy-0.049
rily-0.048
 Grace-0.045
Token let
Token ID1309
Feature activation+3.60e-01

Pos logit contributions
opl+0.068
ats+0.057
?"+0.054
ossible+0.053
?+0.051
Neg logit contributions
 Grac-0.065
 Claire-0.060
nam-0.058
rily-0.056
 rib-0.055
Token's
Token ID338
Feature activation+3.35e+00

Pos logit contributions
opl+0.011
iren+0.009
?"+0.009
loo+0.009
aver+0.008
Neg logit contributions
 Grac-0.010
 Claire-0.009
 Katy-0.009
rily-0.009
ve-0.009
Token go
Token ID467
Feature activation+1.24e+00

Pos logit contributions
opl+0.094
iren+0.080
ats+0.078
aw+0.074
o+0.074
Neg logit contributions
 Grac-0.103
 Claire-0.099
nam-0.095
 brace-0.091
 Katy-0.091
Token play
Token ID711
Feature activation+1.43e+00

Pos logit contributions
opl+0.037
 snowball+0.031
ats+0.030
aw+0.029
?"+0.029
Neg logit contributions
 Grac-0.037
 Claire-0.033
rily-0.033
ve-0.033
nam-0.032
Token."
Token ID526
Feature activation+1.19e-01

Pos logit contributions
opl+0.041
ats+0.036
o+0.034
?"+0.033
ossible+0.033
Neg logit contributions
 Grac-0.044
rily-0.042
 Claire-0.042
nam-0.041
edly-0.039
Token
Token ID198
Feature activation+8.98e-01

Pos logit contributions
opl+0.004
at+0.003
ats+0.003
?"+0.003
aver+0.003
Neg logit contributions
 Grac-0.003
nam-0.003
edly-0.003
 Claire-0.003
 Katy-0.003
Token
Token ID198
Feature activation+6.24e-01

Pos logit contributions
opl+0.031
at+0.028
ats+0.028
ah+0.026
 microwave+0.025
Neg logit contributions
 Grac-0.019
ve-0.017
 Claire-0.017
rily-0.016
Gr-0.016
Token.
Token ID13
Feature activation+6.33e-01

Pos logit contributions
opl+0.027
?"+0.024
ats+0.023
 snowball+0.022
?+0.022
Neg logit contributions
 Grac-0.047
nam-0.046
rily-0.044
 Claire-0.044
 Katy-0.043
Token
Token ID198
Feature activation+9.07e-01

Pos logit contributions
opl+0.015
ats+0.014
?"+0.013
 snowball+0.013
?+0.012
Neg logit contributions
 Grac-0.022
nam-0.021
 Katy-0.021
 Claire-0.021
rily-0.020
Token
Token ID198
Feature activation+6.61e-01

Pos logit contributions
opl+0.026
ats+0.022
 snowball+0.022
at+0.021
?"+0.021
Neg logit contributions
 Grac-0.028
 Katy-0.025
 Claire-0.025
nam-0.025
ve-0.025
Token"
Token ID1
Feature activation-2.07e+00

Pos logit contributions
opl+0.017
 snowball+0.015
?"+0.014
ats+0.014
at+0.013
Neg logit contributions
nam-0.021
 Grac-0.021
 Katy-0.020
ve-0.020
 Claire-0.020
TokenO
Token ID46
Feature activation+1.37e+00

Pos logit contributions
 Grac+0.066
nam+0.065
rily+0.062
 Katy+0.061
 Claire+0.061
Neg logit contributions
opl-0.054
?"-0.052
 snowball-0.050
?-0.049
ats-0.047
Tokenw
Token ID86
Feature activation+4.04e+00

Pos logit contributions
opl+0.023
 snowball+0.020
?"+0.019
 microwave+0.016
 fires+0.016
Neg logit contributions
ve-0.055
 Grac-0.053
nam-0.052
ately-0.051
edly-0.050
Token,
Token ID11
Feature activation+2.15e+00

Pos logit contributions
opl+0.093
 snowball+0.075
?"+0.072
ats+0.071
ossible+0.071
Neg logit contributions
 Grac-0.149
nam-0.139
rily-0.137
ve-0.135
 Claire-0.134
Token ow
Token ID12334
Feature activation+3.87e+00

Pos logit contributions
opl+0.064
?"+0.058
ats+0.057
 snowball+0.055
?+0.054
Neg logit contributions
 Grac-0.064
nam-0.062
 Claire-0.058
rily-0.058
 Katy-0.057
Token,
Token ID11
Feature activation+1.65e+00

Pos logit contributions
opl+0.099
 snowball+0.081
ats+0.076
ossible+0.076
 microwave+0.076
Neg logit contributions
 Grac-0.127
ve-0.122
nam-0.119
 Claire-0.113
rily-0.112
Token ow
Token ID12334
Feature activation+3.60e+00

Pos logit contributions
opl+0.050
?"+0.045
ats+0.045
 snowball+0.042
?+0.042
Neg logit contributions
 Grac-0.049
nam-0.048
rily-0.044
 Claire-0.044
 Katy-0.043
Token!"
Token ID2474
Feature activation-1.58e-01

Pos logit contributions
opl+0.087
ossible+0.069
ats+0.069
 snowball+0.069
?"+0.065
Neg logit contributions
 Grac-0.122
ve-0.117
nam-0.115
rily-0.111
 Claire-0.111
Token at
Token ID379
Feature activation+1.46e+00

Pos logit contributions
?"+0.020
 snowball+0.019
opl+0.019
?+0.019
o+0.017
Neg logit contributions
 Grac-0.032
nam-0.032
 Claire-0.030
 Katy-0.030
rily-0.028
Token him
Token ID683
Feature activation+3.09e+00

Pos logit contributions
?"+0.026
 snowball+0.025
opl+0.025
?+0.024
 Rex+0.023
Neg logit contributions
 Grac-0.058
edly-0.056
nam-0.055
ve-0.055
rily-0.055
Token.
Token ID13
Feature activation+3.57e-01

Pos logit contributions
opl+0.082
?"+0.074
ats+0.072
 snowball+0.069
ossible+0.069
Neg logit contributions
rily-0.098
 Grac-0.096
nam-0.095
edly-0.089
 brace-0.088
Token Tim
Token ID5045
Feature activation+1.63e-01

Pos logit contributions
opl+0.009
?"+0.008
ats+0.008
 snowball+0.008
?+0.008
Neg logit contributions
 Grac-0.012
nam-0.012
rily-0.011
 Claire-0.011
 Katy-0.011
Token screamed
Token ID25421
Feature activation+2.02e+00

Pos logit contributions
opl+0.005
?"+0.004
?+0.004
ats+0.004
ossible+0.004
Neg logit contributions
 Grac-0.005
nam-0.005
rily-0.005
 Claire-0.004
ve-0.004
Token and
Token ID290
Feature activation+2.95e+00

Pos logit contributions
opl+0.049
?"+0.044
ats+0.042
 snowball+0.039
?+0.039
Neg logit contributions
 Grac-0.071
nam-0.069
rily-0.068
 Claire-0.066
 Katy-0.064
Token pulled
Token ID5954
Feature activation+1.72e+00

Pos logit contributions
opl+0.082
?"+0.078
 snowball+0.073
ats+0.072
?+0.072
Neg logit contributions
 Grac-0.091
nam-0.090
rily-0.084
 Claire-0.084
 Katy-0.083
Token back
Token ID736
Feature activation-2.24e-01

Pos logit contributions
opl+0.050
?"+0.045
ats+0.044
ossible+0.040
?+0.040
Neg logit contributions
 Grac-0.053
nam-0.050
rily-0.049
 Claire-0.048
 Katy-0.047
Token his
Token ID465
Feature activation+3.63e-03

Pos logit contributions
 Grac+0.007
nam+0.007
rily+0.007
 Claire+0.007
 Katy+0.007
Neg logit contributions
opl-0.006
?"-0.005
ats-0.005
 snowball-0.005
ossible-0.005
Token hand
Token ID1021
Feature activation+1.19e+00

Pos logit contributions
opl+0.000
?"+0.000
ats+0.000
 snowball+0.000
?+0.000
Neg logit contributions
 Grac-0.000
nam-0.000
rily-0.000
 Claire-0.000
 Katy-0.000
Token.
Token ID13
Feature activation+3.17e-01

Pos logit contributions
opl+0.032
?"+0.029
ats+0.027
ossible+0.026
 snowball+0.025
Neg logit contributions
 Grac-0.038
nam-0.038
rily-0.038
 Claire-0.035
 brace-0.035
Token.
Token ID13
Feature activation+6.34e-01

Pos logit contributions
?"+0.003
?+0.003
o+0.002
 named+0.001
 truck+0.001
Neg logit contributions
nam-0.014
 Grac-0.013
 diamonds-0.013
 gems-0.013
 rib-0.013
Token
Token ID198
Feature activation+4.91e-01

Pos logit contributions
opl+0.013
?"+0.012
?+0.011
 snowball+0.011
o+0.011
Neg logit contributions
nam-0.024
 Grac-0.024
rily-0.023
 Claire-0.022
ve-0.022
Token
Token ID198
Feature activation+5.33e-01

Pos logit contributions
opl+0.011
?"+0.010
ats+0.010
 snowball+0.010
?+0.009
Neg logit contributions
 Grac-0.018
nam-0.016
 Claire-0.016
 Katy-0.016
rily-0.016
Token"
Token ID1
Feature activation-2.11e+00

Pos logit contributions
opl+0.012
?"+0.011
ats+0.011
 snowball+0.011
?+0.010
Neg logit contributions
 Grac-0.019
nam-0.019
 Claire-0.018
rily-0.018
 Katy-0.018
TokenYes
Token ID5297
Feature activation+2.03e+00

Pos logit contributions
rily+0.073
 Grac+0.073
nam+0.072
 Katy+0.068
ately+0.067
Neg logit contributions
opl-0.044
?"-0.043
?-0.041
 snowball-0.040
o-0.039
Token,
Token ID11
Feature activation+2.81e+00

Pos logit contributions
opl+0.039
?"+0.035
 snowball+0.033
ats+0.032
?+0.031
Neg logit contributions
 Grac-0.081
nam-0.079
rily-0.076
 Claire-0.076
 Katy-0.075
Token we
Token ID356
Feature activation+6.53e-01

Pos logit contributions
opl+0.074
?"+0.073
?+0.068
 snowball+0.067
ats+0.065
Neg logit contributions
 Grac-0.089
nam-0.088
rily-0.084
 Claire-0.082
 Katy-0.080
Token did
Token ID750
Feature activation-4.09e-02

Pos logit contributions
opl+0.017
 snowball+0.014
iren+0.013
ats+0.013
o+0.013
Neg logit contributions
 Grac-0.022
nam-0.021
 Katy-0.021
 Claire-0.021
ve-0.020
Token,
Token ID11
Feature activation+2.49e+00

Pos logit contributions
 Grac+0.001
nam+0.001
rily+0.001
 Claire+0.001
 Katy+0.001
Neg logit contributions
opl-0.001
?"-0.001
 snowball-0.001
ats-0.001
?-0.001
Token Tom
Token ID4186
Feature activation+1.08e+00

Pos logit contributions
?"+0.055
opl+0.054
?+0.051
 snowball+0.049
ats+0.045
Neg logit contributions
 Grac-0.089
nam-0.089
rily-0.086
 Claire-0.083
edly-0.083
Token.
Token ID13
Feature activation+1.44e+00

Pos logit contributions
opl+0.027
 snowball+0.023
ats+0.023
?"+0.022
ch+0.021
Neg logit contributions
 Grac-0.036
nam-0.034
 Claire-0.034
rily-0.034
 Katy-0.034
Token in
Token ID287
Feature activation-6.83e-01

Pos logit contributions
 Grac+0.001
nam+0.001
 Katy+0.001
 Claire+0.001
rily+0.001
Neg logit contributions
?"-0.000
opl-0.000
?-0.000
o-0.000
aw-0.000
Token a
Token ID257
Feature activation+6.33e-01

Pos logit contributions
rily+0.025
ve+0.025
 Grac+0.025
nam+0.025
ately+0.024
Neg logit contributions
 snowball-0.013
?"-0.013
?-0.013
 snow-0.012
opl-0.012
Token pile
Token ID14540
Feature activation+1.19e+00

Pos logit contributions
opl+0.017
?"+0.016
ats+0.015
 snowball+0.015
?+0.015
Neg logit contributions
 Grac-0.020
nam-0.020
 Claire-0.019
rily-0.019
 Katy-0.018
Token of
Token ID286
Feature activation-1.23e+00

Pos logit contributions
opl+0.022
?"+0.020
 snowball+0.018
ats+0.018
?+0.018
Neg logit contributions
 Grac-0.047
nam-0.047
rily-0.045
 Claire-0.045
 Katy-0.044
Token leaves
Token ID5667
Feature activation+1.46e+00

Pos logit contributions
 Grac+0.039
rily+0.037
nam+0.037
 Claire+0.036
 Katy+0.035
Neg logit contributions
opl-0.033
?"-0.032
 snowball-0.030
?-0.030
ats-0.029
Token.
Token ID13
Feature activation-6.62e-01

Pos logit contributions
opl+0.032
?"+0.030
 snowball+0.028
ats+0.027
?+0.027
Neg logit contributions
 Grac-0.053
nam-0.053
rily-0.050
 Claire-0.050
 Katy-0.049
Token She
Token ID1375
Feature activation+3.95e-01

Pos logit contributions
nam+0.022
rily+0.022
 Grac+0.021
chie+0.021
ve+0.020
Neg logit contributions
?"-0.015
 Sometimes-0.014
opl-0.014
 snowball-0.013
?-0.013
Token ran
Token ID4966
Feature activation-2.39e-01

Pos logit contributions
opl+0.010
?"+0.010
 snowball+0.009
ats+0.009
?+0.009
Neg logit contributions
 Grac-0.013
nam-0.013
rily-0.012
 Claire-0.012
 Katy-0.012
Token away
Token ID1497
Feature activation-1.50e+00

Pos logit contributions
nam+0.009
 Grac+0.008
rily+0.008
 Claire+0.008
 Katy+0.008
Neg logit contributions
opl-0.005
?"-0.005
 snowball-0.005
ats-0.005
o-0.005
Token and
Token ID290
Feature activation-1.96e-01

Pos logit contributions
nam+0.069
 Grac+0.067
rily+0.067
 Katy+0.064
sel+0.064
Neg logit contributions
?"-0.015
?-0.015
,-0.010
 truck-0.009
 about-0.009
Token forgot
Token ID16453
Feature activation+6.04e-04

Pos logit contributions
rily+0.007
 Grac+0.007
nam+0.007
rier+0.006
ve+0.006
Neg logit contributions
?"-0.004
opl-0.004
o-0.004
 Rex-0.004
?-0.004
Token loneliness
Token ID37378
Feature activation-7.93e-01

Pos logit contributions
 Grac+0.053
nam+0.052
rily+0.049
 Claire+0.048
 Katy+0.047
Neg logit contributions
opl-0.046
?"-0.043
 snowball-0.041
?-0.040
ats-0.040
Token.
Token ID13
Feature activation+1.32e-02

Pos logit contributions
 Grac+0.031
nam+0.031
rily+0.029
 Claire+0.029
 Katy+0.029
Neg logit contributions
opl-0.015
?"-0.014
?-0.013
 snowball-0.013
ats-0.012
Token
Token ID198
Feature activation-3.45e-01

Pos logit contributions
opl+0.000
?"+0.000
ats+0.000
 snowball+0.000
?+0.000
Neg logit contributions
 Grac-0.000
nam-0.000
 Claire-0.000
 Katy-0.000
rily-0.000
Token
Token ID198
Feature activation-5.79e-01

Pos logit contributions
 Grac+0.010
ve+0.009
 Claire+0.009
Gr+0.009
 Katy+0.008
Neg logit contributions
opl-0.011
ats-0.009
 microwave-0.009
?"-0.009
 fires-0.009
TokenFl
Token ID7414
Feature activation-1.19e+00

Pos logit contributions
 Grac+0.015
Gr+0.014
nam+0.014
ve+0.013
 Claire+0.013
Neg logit contributions
opl-0.020
?"-0.018
ats-0.018
?-0.017
 microwave-0.017
Tokenappy
Token ID7774
Feature activation-7.72e-02

Pos logit contributions
nam+0.031
 Grac+0.029
 Claire+0.028
rily+0.028
 Katy+0.027
Neg logit contributions
opl-0.040
o-0.040
?"-0.040
ats-0.039
?-0.039
Token continued
Token ID3767
Feature activation-4.33e-01

Pos logit contributions
 Grac+0.002
nam+0.002
 Claire+0.002
rily+0.002
 Katy+0.002
Neg logit contributions
opl-0.002
?"-0.002
 snowball-0.002
ats-0.002
?-0.002
Token searching
Token ID10342
Feature activation-2.19e+00

Pos logit contributions
 Grac+0.014
nam+0.014
rily+0.013
 Claire+0.013
 Katy+0.013
Neg logit contributions
opl-0.012
?"-0.011
 snowball-0.010
ats-0.010
?-0.010
Token for
Token ID329
Feature activation-9.08e-01

Pos logit contributions
 Grac+0.080
nam+0.079
rily+0.075
 Claire+0.074
 Katy+0.073
Neg logit contributions
opl-0.048
?"-0.045
 snowball-0.042
?-0.042
ats-0.041
Token a
Token ID257
Feature activation+6.81e-01

Pos logit contributions
 Grac+0.032
nam+0.031
rily+0.031
 Claire+0.029
 Katy+0.029
Neg logit contributions
opl-0.021
?"-0.020
 snowball-0.019
?-0.018
ats-0.018
Token place
Token ID1295
Feature activation-7.09e-01

Pos logit contributions
 snowball+0.017
?"+0.017
opl+0.017
?+0.016
ch+0.016
Neg logit contributions
 Grac-0.021
nam-0.021
rily-0.021
edly-0.020
 Claire-0.019
Token.
Token ID13
Feature activation+3.28e-02

Pos logit contributions
opl+0.076
ossible+0.065
ats+0.064
 snowball+0.063
oof+0.061
Neg logit contributions
rily-0.067
 Claire-0.065
 Grac-0.064
nam-0.064
 brace-0.059
Token Tim
Token ID5045
Feature activation-9.73e-01

Pos logit contributions
opl+0.001
ats+0.001
?"+0.001
 fires+0.001
at+0.001
Neg logit contributions
 Grac-0.001
 Claire-0.001
 Katy-0.001
nam-0.001
 Emmy-0.001
Token's
Token ID338
Feature activation-4.98e-01

Pos logit contributions
 Grac+0.025
 Claire+0.024
nam+0.023
 Katy+0.023
ve+0.022
Neg logit contributions
opl-0.032
iren-0.027
 snowball-0.026
ch-0.026
etch-0.025
Token mom
Token ID1995
Feature activation+1.54e+00

Pos logit contributions
nam+0.012
 Grac+0.011
 Claire+0.010
 Katy+0.010
rily+0.009
Neg logit contributions
opl-0.019
?"-0.016
ats-0.016
ch-0.016
iren-0.016
Token said
Token ID531
Feature activation-1.79e-02

Pos logit contributions
opl+0.052
 snowball+0.044
 fires+0.041
ch+0.040
iren+0.039
Neg logit contributions
 Grac-0.043
 Claire-0.040
 Katy-0.040
ve-0.038
 Emmy-0.037
Token,
Token ID11
Feature activation+2.86e-01

Pos logit contributions
 Grac+0.001
nam+0.001
rily+0.001
 Claire+0.001
 Katy+0.001
Neg logit contributions
opl-0.000
?"-0.000
 snowball-0.000
?-0.000
ats-0.000
Token "
Token ID366
Feature activation-1.53e+00

Pos logit contributions
opl+0.009
?"+0.008
ats+0.008
 snowball+0.008
?+0.008
Neg logit contributions
 Grac-0.008
nam-0.008
rily-0.007
 Claire-0.007
 Katy-0.007
TokenToday
Token ID8888
Feature activation+5.76e-01

Pos logit contributions
 Grac+0.040
nam+0.039
 Claire+0.037
 Katy+0.037
rily+0.036
Neg logit contributions
opl-0.050
?"-0.048
ats-0.045
?-0.044
 snowball-0.043
Token,
Token ID11
Feature activation+6.75e-01

Pos logit contributions
opl+0.016
 snowball+0.013
ats+0.013
iren+0.012
aver+0.012
Neg logit contributions
 Grac-0.018
 Claire-0.018
 Katy-0.018
nam-0.017
 Nancy-0.017
Token you
Token ID345
Feature activation-3.90e-01

Pos logit contributions
opl+0.023
ats+0.020
o+0.019
?"+0.018
 snowball+0.018
Neg logit contributions
 Grac-0.018
nam-0.018
 Claire-0.017
 Katy-0.016
 diamonds-0.016
Token will
Token ID481
Feature activation-1.49e-01

Pos logit contributions
 Grac+0.013
 Claire+0.012
nam+0.012
ve+0.012
 Katy+0.012
Neg logit contributions
opl-0.011
 snowball-0.008
iren-0.008
ats-0.008
o-0.008
Token to
Token ID284
Feature activation-2.48e-01

Pos logit contributions
nam+0.042
 Grac+0.042
rily+0.040
 Claire+0.039
 Katy+0.039
Neg logit contributions
opl-0.022
?"-0.021
?-0.019
ats-0.019
 snowball-0.019
Token God
Token ID1793
Feature activation-2.11e-01

Pos logit contributions
 Grac+0.009
rily+0.009
nam+0.009
edly+0.009
ve+0.009
Neg logit contributions
?"-0.006
?-0.005
opl-0.005
aster-0.004
ossible-0.004
Token.
Token ID13
Feature activation-4.73e-01

Pos logit contributions
 Grac+0.008
nam+0.008
rily+0.007
 Claire+0.007
 Katy+0.007
Neg logit contributions
opl-0.004
?"-0.004
 snowball-0.004
ats-0.004
?-0.004
Token She
Token ID1375
Feature activation-3.90e-02

Pos logit contributions
 Grac+0.016
nam+0.016
rily+0.015
 Claire+0.015
ve+0.015
Neg logit contributions
opl-0.011
?"-0.011
?-0.010
 snowball-0.010
o-0.010
Token asked
Token ID1965
Feature activation-1.01e+00

Pos logit contributions
 Grac+0.001
nam+0.001
 Claire+0.001
rily+0.001
 Katy+0.001
Neg logit contributions
opl-0.001
?"-0.001
 snowball-0.001
ats-0.001
?-0.001
Token God
Token ID1793
Feature activation+1.41e-02

Pos logit contributions
 Grac+0.037
nam+0.036
rily+0.035
edly+0.034
chie+0.034
Neg logit contributions
?"-0.023
?-0.020
opl-0.020
 fires-0.017
ats-0.017
Token to
Token ID284
Feature activation-1.07e-01

Pos logit contributions
opl+0.000
?"+0.000
 snowball+0.000
ats+0.000
?+0.000
Neg logit contributions
 Grac-0.000
nam-0.000
rily-0.000
 Claire-0.000
 Katy-0.000
Token help
Token ID1037
Feature activation+4.33e-01

Pos logit contributions
 Grac+0.004
nam+0.004
 Claire+0.003
rily+0.003
 Katy+0.003
Neg logit contributions
opl-0.003
?"-0.002
 snowball-0.002
ats-0.002
?-0.002
Token her
Token ID607
Feature activation-2.04e-01

Pos logit contributions
?"+0.011
?+0.010
opl+0.010
ats+0.009
aster+0.009
Neg logit contributions
nam-0.014
 Grac-0.014
rily-0.013
edly-0.013
 Claire-0.013
Token to
Token ID284
Feature activation+4.58e-01

Pos logit contributions
 Grac+0.007
nam+0.007
 Claire+0.006
rily+0.006
 Katy+0.006
Neg logit contributions
opl-0.005
 snowball-0.004
?"-0.004
ats-0.004
o-0.004
Token have
Token ID423
Feature activation-6.34e-02

Pos logit contributions
opl+0.011
?"+0.011
?+0.010
 snowball+0.010
ats+0.010
Neg logit contributions
 Grac-0.016
nam-0.015
rily-0.015
 Claire-0.014
 Katy-0.014
Token was
Token ID373
Feature activation-8.62e-01

Pos logit contributions
opl+0.015
 snowball+0.013
ats+0.013
?"+0.013
?+0.012
Neg logit contributions
 Grac-0.017
nam-0.017
 Claire-0.016
rily-0.016
 Katy-0.015
Token white
Token ID2330
Feature activation+6.60e-01

Pos logit contributions
nam+0.023
 Grac+0.022
rily+0.021
 diamonds+0.019
 Claire+0.019
Neg logit contributions
opl-0.030
ats-0.026
ossible-0.025
 snowball-0.024
?"-0.024
Token and
Token ID290
Feature activation-6.61e-01

Pos logit contributions
opl+0.021
ats+0.019
ossible+0.018
haw+0.018
aster+0.017
Neg logit contributions
nam-0.017
rily-0.017
 diamonds-0.017
 Grac-0.017
edly-0.016
Token fluffy
Token ID39145
Feature activation+5.24e-01

Pos logit contributions
 Grac+0.018
nam+0.017
rily+0.016
 Claire+0.016
 Katy+0.016
Neg logit contributions
opl-0.021
?"-0.020
 snowball-0.019
?-0.019
ats-0.019
Token.
Token ID13
Feature activation-9.36e-01

Pos logit contributions
opl+0.013
ats+0.011
?"+0.011
ossible+0.011
?+0.010
Neg logit contributions
 Grac-0.018
nam-0.018
rily-0.017
 Claire-0.017
edly-0.016
Token The
Token ID383
Feature activation+5.58e-01

Pos logit contributions
 Grac+0.025
nam+0.025
 Claire+0.024
 Katy+0.023
rily+0.022
Neg logit contributions
opl-0.031
ats-0.030
 snowball-0.027
?"-0.027
at-0.026
Token boat
Token ID8848
Feature activation-3.69e-01

Pos logit contributions
opl+0.018
ats+0.016
o+0.016
?"+0.015
?+0.015
Neg logit contributions
nam-0.015
 Grac-0.015
rily-0.015
 Claire-0.014
 diamonds-0.014
Token loved
Token ID6151
Feature activation+2.39e-01

Pos logit contributions
 Grac+0.009
nam+0.009
 Claire+0.008
rily+0.008
ve+0.008
Neg logit contributions
opl-0.013
 snowball-0.011
ats-0.011
aster-0.011
iren-0.011
Token to
Token ID284
Feature activation-8.10e-01

Pos logit contributions
opl+0.008
ats+0.008
?"+0.007
?+0.007
 snowball+0.007
Neg logit contributions
nam-0.006
 Grac-0.006
 Claire-0.005
 diamonds-0.005
 brace-0.005
Token sail
Token ID14936
Feature activation+5.64e-01

Pos logit contributions
 Grac+0.017
nam+0.016
 rib+0.016
rily+0.015
 Claire+0.014
Neg logit contributions
opl-0.030
ats-0.027
aw-0.027
o-0.027
iren-0.025
Token in
Token ID287
Feature activation-1.59e+00

Pos logit contributions
opl+0.017
ats+0.015
 snowball+0.015
?"+0.014
ossible+0.014
Neg logit contributions
nam-0.016
 Grac-0.015
rily-0.015
 Claire-0.015
 Katy-0.015
Token happy
Token ID3772
Feature activation-1.59e+00

Pos logit contributions
opl+0.017
?"+0.016
 snowball+0.015
?+0.015
ats+0.015
Neg logit contributions
 Grac-0.017
nam-0.017
rily-0.016
 Claire-0.016
 Katy-0.016
Token and
Token ID290
Feature activation+3.74e-01

Pos logit contributions
nam+0.065
 Grac+0.063
rily+0.061
 Claire+0.060
 Katy+0.059
Neg logit contributions
opl-0.028
?"-0.028
?-0.025
 snowball-0.024
o-0.023
Token loved
Token ID6151
Feature activation+8.34e-01

Pos logit contributions
?"+0.009
opl+0.008
?+0.008
o+0.008
ossible+0.007
Neg logit contributions
rily-0.013
 Grac-0.012
nam-0.012
chie-0.012
rier-0.012
Token being
Token ID852
Feature activation-2.41e-01

Pos logit contributions
opl+0.022
?"+0.021
 snowball+0.020
?+0.020
ats+0.019
Neg logit contributions
 Grac-0.027
nam-0.027
rily-0.026
 Claire-0.025
 Katy-0.024
Token a
Token ID257
Feature activation-6.76e-01

Pos logit contributions
 Grac+0.007
rily+0.007
nam+0.007
ately+0.007
edly+0.007
Neg logit contributions
?"-0.006
opl-0.006
?-0.006
o-0.006
 snowball-0.006
Token helper
Token ID31904
Feature activation-1.07e+00

Pos logit contributions
nam+0.021
 Grac+0.021
rily+0.020
 Claire+0.019
 Katy+0.019
Neg logit contributions
opl-0.018
?"-0.017
 snowball-0.017
?-0.016
ats-0.016
Token in
Token ID287
Feature activation-4.70e-01

Pos logit contributions
nam+0.042
 Grac+0.041
rily+0.039
 Claire+0.039
 Katy+0.038
Neg logit contributions
opl-0.021
?"-0.020
?-0.018
 snowball-0.017
ats-0.017
Token the
Token ID262
Feature activation+5.76e-01

Pos logit contributions
 Grac+0.013
nam+0.013
rily+0.012
 Claire+0.012
 Katy+0.012
Neg logit contributions
opl-0.014
?"-0.014
 snowball-0.013
ats-0.013
?-0.013
Token kitchen
Token ID9592
Feature activation-1.39e-01

Pos logit contributions
opl+0.013
?"+0.013
ats+0.012
 snowball+0.011
o+0.011
Neg logit contributions
 Grac-0.020
nam-0.020
 Claire-0.019
 Katy-0.019
rily-0.018
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
 Grac+0.005
nam+0.005
rily+0.005
 Claire+0.005
 Katy+0.005
Neg logit contributions
opl-0.003
?"-0.003
?-0.002
ats-0.002
 snowball-0.002
Token<|endoftext|>
Token ID50256
Feature activation+1.20e+00

Pos logit contributions
 Grac+0.000
nam+0.000
rily+0.000
 Claire+0.000
 Katy+0.000
Neg logit contributions
opl-0.000
?"-0.000
 snowball-0.000
ats-0.000
?-0.000
Token and
Token ID290
Feature activation+6.85e-01

Pos logit contributions
opl+0.000
?"+0.000
ats+0.000
 snowball+0.000
?+0.000
Neg logit contributions
 Grac-0.000
nam-0.000
rily-0.000
 Claire-0.000
 Katy-0.000
Token started
Token ID2067
Feature activation+1.22e-01

Pos logit contributions
opl+0.017
?"+0.017
?+0.016
 snowball+0.016
ats+0.015
Neg logit contributions
 Grac-0.022
nam-0.022
rily-0.021
 Claire-0.021
 Katy-0.020
Token to
Token ID284
Feature activation-6.26e-02

Pos logit contributions
opl+0.003
?"+0.003
 snowball+0.003
?+0.003
ats+0.003
Neg logit contributions
 Grac-0.004
nam-0.004
rily-0.004
 Claire-0.004
 Katy-0.004
Token move
Token ID1445
Feature activation+1.07e+00

Pos logit contributions
 Grac+0.002
nam+0.002
rily+0.002
 Claire+0.002
 Katy+0.002
Neg logit contributions
opl-0.002
?"-0.002
 snowball-0.002
?-0.001
ats-0.001
Token!
Token ID0
Feature activation-1.98e-01

Pos logit contributions
opl+0.024
?"+0.023
 snowball+0.021
?+0.021
ats+0.021
Neg logit contributions
 Grac-0.039
nam-0.038
rily-0.036
 Claire-0.036
 Katy-0.036
Token It
Token ID632
Feature activation-1.45e+00

Pos logit contributions
 Grac+0.006
nam+0.006
rily+0.006
 Claire+0.006
 Katy+0.006
Neg logit contributions
opl-0.005
?"-0.005
 snowball-0.005
ats-0.005
?-0.005
Token was
Token ID373
Feature activation-2.76e-01

Pos logit contributions
 Grac+0.047
nam+0.046
rily+0.043
 Claire+0.043
 Katy+0.042
Neg logit contributions
opl-0.039
?"-0.036
ats-0.034
?-0.033
 snowball-0.033
Token very
Token ID845
Feature activation-1.33e+00

Pos logit contributions
 Grac+0.008
nam+0.008
 Claire+0.007
rily+0.007
 Katy+0.007
Neg logit contributions
opl-0.008
?"-0.008
ats-0.007
?-0.007
 snowball-0.007
Token scary
Token ID14343
Feature activation+1.00e+00

Pos logit contributions
 Grac+0.042
nam+0.042
rily+0.039
 Claire+0.039
 Katy+0.038
Neg logit contributions
opl-0.036
?"-0.034
ats-0.032
 snowball-0.032
?-0.031
Token.
Token ID13
Feature activation-2.25e-01

Pos logit contributions
opl+0.024
?"+0.021
ats+0.021
ossible+0.020
 snowball+0.020
Neg logit contributions
 Grac-0.035
nam-0.033
rily-0.033
 Claire-0.032
 Katy-0.032
Token The
Token ID383
Feature activation+9.56e-01

Pos logit contributions
 Grac+0.007
nam+0.007
rily+0.007
 Claire+0.007
 Katy+0.007
Neg logit contributions
opl-0.006
?"-0.006
 snowball-0.005
ats-0.005
?-0.005
Token asked
Token ID1965
Feature activation-1.09e+00

Pos logit contributions
 Grac+0.038
nam+0.037
 Claire+0.035
rily+0.035
 Katy+0.034
Neg logit contributions
opl-0.030
?"-0.028
 snowball-0.026
ats-0.026
?-0.026
Token Lily
Token ID20037
Feature activation+7.44e-01

Pos logit contributions
 Grac+0.032
nam+0.031
 Claire+0.029
rily+0.029
 Katy+0.029
Neg logit contributions
opl-0.033
?"-0.030
 snowball-0.029
ats-0.029
?-0.028
Token if
Token ID611
Feature activation-8.69e-01

Pos logit contributions
opl+0.018
 snowball+0.016
?"+0.015
ats+0.015
ch+0.014
Neg logit contributions
 Grac-0.026
nam-0.026
 Claire-0.025
 Katy-0.024
rily-0.024
Token he
Token ID339
Feature activation+1.37e+00

Pos logit contributions
rily+0.031
edly+0.029
nam+0.028
 Grac+0.028
ve+0.028
Neg logit contributions
?"-0.021
?-0.020
 Rex-0.019
o-0.018
ch-0.017
Token could
Token ID714
Feature activation+7.43e-04

Pos logit contributions
opl+0.040
?"+0.036
 snowball+0.035
ats+0.034
?+0.033
Neg logit contributions
 Grac-0.041
nam-0.040
 Claire-0.038
rily-0.037
 Katy-0.037
Token borrow
Token ID8804
Feature activation-9.86e-01

Pos logit contributions
opl+0.000
 snowball+0.000
 microwave+0.000
ats+0.000
aw+0.000
Neg logit contributions
 Grac-0.000
nam-0.000
 Claire-0.000
 brace-0.000
ve-0.000
Token her
Token ID607
Feature activation-7.78e-01

Pos logit contributions
 Grac+0.032
nam+0.032
rily+0.031
 Claire+0.030
 Katy+0.030
Neg logit contributions
opl-0.025
?"-0.024
 snowball-0.023
?-0.022
ats-0.022
Token comb
Token ID1974
Feature activation+9.65e-01

Pos logit contributions
 Grac+0.024
nam+0.023
rily+0.022
 Claire+0.022
 Katy+0.021
Neg logit contributions
opl-0.022
?"-0.020
ats-0.019
 snowball-0.019
?-0.019
Token to
Token ID284
Feature activation+3.40e-02

Pos logit contributions
opl+0.023
?"+0.021
 snowball+0.020
ats+0.020
?+0.019
Neg logit contributions
 Grac-0.033
nam-0.033
rily-0.031
 Claire-0.031
 Katy-0.030
Token comb
Token ID1974
Feature activation+1.01e+00

Pos logit contributions
opl+0.001
?"+0.001
?+0.001
 snowball+0.001
ats+0.001
Neg logit contributions
 Grac-0.001
nam-0.001
rily-0.001
 Claire-0.001
 Katy-0.001
Token his
Token ID465
Feature activation-7.01e-01

Pos logit contributions
opl+0.030
?"+0.029
?+0.028
ats+0.027
 snowball+0.027
Neg logit contributions
 Grac-0.028
nam-0.028
 Claire-0.026
rily-0.026
 Katy-0.026
Token hope
Token ID2911
Feature activation-8.81e-01

Pos logit contributions
 Grac+0.003
ve+0.003
 Claire+0.002
 Katy+0.002
nam+0.002
Neg logit contributions
opl-0.003
 snowball-0.003
iren-0.003
 microwave-0.003
?"-0.003
Token so
Token ID523
Feature activation+2.62e-01

Pos logit contributions
 Grac+0.029
nam+0.029
rily+0.027
 Claire+0.027
 Katy+0.026
Neg logit contributions
opl-0.022
?"-0.022
?-0.020
 snowball-0.020
ats-0.020
Token too
Token ID1165
Feature activation+6.21e-01

Pos logit contributions
?"+0.006
opl+0.006
?+0.006
 snowball+0.005
o+0.005
Neg logit contributions
 Grac-0.009
nam-0.009
rily-0.009
 Claire-0.009
 Katy-0.009
Token!"
Token ID2474
Feature activation-9.52e-01

Pos logit contributions
opl+0.015
ats+0.013
 snowball+0.012
ossible+0.012
o+0.011
Neg logit contributions
 Grac-0.022
nam-0.022
 Claire-0.021
rily-0.021
 Katy-0.021
Token She
Token ID1375
Feature activation+1.03e+00

Pos logit contributions
nam+0.033
 Grac+0.033
rily+0.031
 Claire+0.031
 Katy+0.030
Neg logit contributions
opl-0.022
?"-0.021
?-0.020
ats-0.019
 snowball-0.019
Token picked
Token ID6497
Feature activation-1.02e+00

Pos logit contributions
opl+0.026
?"+0.026
 snowball+0.024
ats+0.024
?+0.023
Neg logit contributions
nam-0.033
 Grac-0.033
rily-0.031
 Claire-0.031
 Katy-0.031
Token up
Token ID510
Feature activation-1.06e+00

Pos logit contributions
nam+0.038
 Grac+0.036
edly+0.036
chie+0.036
ve+0.034
Neg logit contributions
?"-0.021
 snowball-0.020
opl-0.020
ats-0.019
?-0.019
Token the
Token ID262
Feature activation-6.82e-01

Pos logit contributions
ve+0.038
rily+0.037
chie+0.037
 Grac+0.037
nam+0.036
Neg logit contributions
 snowball-0.023
?"-0.022
 snow-0.021
 Rex-0.021
?-0.020
Token leaf
Token ID12835
Feature activation+7.31e-01

Pos logit contributions
nam+0.022
 Grac+0.022
rily+0.021
 Katy+0.021
 Claire+0.021
Neg logit contributions
?"-0.017
opl-0.017
 snowball-0.016
?-0.016
ats-0.015
Token and
Token ID290
Feature activation-9.97e-02

Pos logit contributions
opl+0.012
?"+0.012
 snowball+0.011
?+0.011
o+0.011
Neg logit contributions
nam-0.030
 Grac-0.030
 Claire-0.028
 Katy-0.028
rily-0.028
Token looked
Token ID3114
Feature activation-6.79e-01

Pos logit contributions
chie+0.004
rily+0.004
 Grac+0.004
rier+0.003
nam+0.003
Neg logit contributions
?"-0.002
 snowball-0.002
o-0.002
 Rex-0.002
?-0.002
Token to
Token ID284
Feature activation-2.78e-01

Pos logit contributions
opl+0.010
?"+0.010
 snowball+0.009
ats+0.009
?+0.009
Neg logit contributions
 Grac-0.011
nam-0.011
rily-0.010
 Claire-0.010
 Katy-0.010
Token his
Token ID465
Feature activation+6.43e-01

Pos logit contributions
 Grac+0.009
nam+0.009
rily+0.009
edly+0.008
 Claire+0.008
Neg logit contributions
opl-0.007
?"-0.007
?-0.006
 snowball-0.006
ats-0.006
Token room
Token ID2119
Feature activation+2.12e+00

Pos logit contributions
opl+0.017
?"+0.016
?+0.015
 snowball+0.015
ats+0.014
Neg logit contributions
 Grac-0.021
nam-0.020
rily-0.019
 Claire-0.019
 Katy-0.019
Token and
Token ID290
Feature activation+1.02e+00

Pos logit contributions
opl+0.047
?"+0.044
 snowball+0.040
ats+0.040
?+0.040
Neg logit contributions
 Grac-0.077
nam-0.076
rily-0.072
 Claire-0.072
 Katy-0.071
Token found
Token ID1043
Feature activation+6.82e-02

Pos logit contributions
opl+0.026
?"+0.026
?+0.024
 snowball+0.023
ats+0.023
Neg logit contributions
 Grac-0.033
nam-0.032
rily-0.031
 Claire-0.030
 Katy-0.030
Token two
Token ID734
Feature activation-1.91e+00

Pos logit contributions
opl+0.002
?"+0.002
 snowball+0.002
ats+0.002
?+0.002
Neg logit contributions
 Grac-0.002
nam-0.002
rily-0.002
 Claire-0.002
 Katy-0.002
Token books
Token ID3835
Feature activation-1.35e+00

Pos logit contributions
 Grac+0.058
nam+0.056
rily+0.054
 Claire+0.054
edly+0.054
Neg logit contributions
opl-0.050
 snowball-0.049
?"-0.049
?-0.047
ats-0.046
Token.
Token ID13
Feature activation-4.27e-01

Pos logit contributions
 Grac+0.053
nam+0.051
 Claire+0.048
rily+0.048
 Katy+0.047
Neg logit contributions
?"-0.027
opl-0.026
?-0.025
ats-0.023
o-0.022
Token One
Token ID1881
Feature activation+1.23e+00

Pos logit contributions
 Grac+0.015
nam+0.014
rily+0.014
 Claire+0.014
 Katy+0.013
Neg logit contributions
opl-0.011
?"-0.010
ats-0.009
 snowball-0.009
?-0.009
Token was
Token ID373
Feature activation+4.58e-01

Pos logit contributions
opl+0.029
?"+0.025
ats+0.024
 snowball+0.024
?+0.023
Neg logit contributions
 Grac-0.044
nam-0.044
 Claire-0.042
rily-0.041
 Katy-0.041
Token blue
Token ID4171
Feature activation+1.65e+00

Pos logit contributions
opl+0.013
?"+0.012
 snowball+0.011
ats+0.011
?+0.011
Neg logit contributions
 Grac-0.014
nam-0.014
rily-0.013
 Claire-0.013
 Katy-0.013
Token
Token ID220
Feature activation+5.04e-01

Pos logit contributions
nam+0.032
 Grac+0.032
rily+0.031
 Claire+0.029
chie+0.028
Neg logit contributions
opl-0.016
?"-0.015
?-0.014
ossible-0.014
o-0.013
Token
Token ID198
Feature activation-1.10e-01

Pos logit contributions
opl+0.011
?"+0.010
 snowball+0.009
ats+0.009
?+0.009
Neg logit contributions
 Grac-0.019
nam-0.018
rily-0.018
 Claire-0.018
 Katy-0.017
Token
Token ID198
Feature activation-4.14e-01

Pos logit contributions
 Grac+0.004
nam+0.004
 Claire+0.004
rily+0.004
 Katy+0.004
Neg logit contributions
opl-0.002
?"-0.002
 snowball-0.002
ats-0.002
?-0.002
TokenSam
Token ID16305
Feature activation-1.04e+00

Pos logit contributions
 Grac+0.012
nam+0.012
rily+0.012
 Claire+0.011
 Katy+0.011
Neg logit contributions
opl-0.012
?"-0.012
?-0.011
ats-0.011
ossible-0.011
Token and
Token ID290
Feature activation-1.93e+00

Pos logit contributions
 Grac+0.038
nam+0.038
rily+0.036
 Claire+0.035
 Katy+0.035
Neg logit contributions
opl-0.023
?"-0.021
 snowball-0.020
ats-0.019
?-0.019
Token Fran
Token ID12323
Feature activation-4.62e+00

Pos logit contributions
rily+0.059
 Grac+0.051
ve+0.048
edly+0.046
nam+0.046
Neg logit contributions
?"-0.056
 course-0.053
?-0.053
 Rex-0.051
ossible-0.051
Token ran
Token ID4966
Feature activation-2.93e-01

Pos logit contributions
 Grac+0.167
nam+0.164
rily+0.159
rier+0.152
keeping+0.151
Neg logit contributions
?"-0.098
ats-0.096
ossible-0.093
o-0.089
?-0.087
Token over
Token ID625
Feature activation-1.55e+00

Pos logit contributions
 Grac+0.010
nam+0.010
rily+0.010
 Claire+0.009
 Katy+0.009
Neg logit contributions
opl-0.007
?"-0.007
 snowball-0.006
ats-0.006
o-0.006
Token to
Token ID284
Feature activation+1.52e-01

Pos logit contributions
nam+0.057
 Grac+0.056
rily+0.054
 Claire+0.052
 Katy+0.051
Neg logit contributions
?"-0.033
 snowball-0.032
opl-0.031
?-0.031
o-0.030
Token the
Token ID262
Feature activation+7.46e-01

Pos logit contributions
?"+0.003
?+0.003
 Rex+0.003
 truck+0.002
 snowball+0.002
Neg logit contributions
edly-0.006
rily-0.006
nam-0.006
 Grac-0.006
rier-0.006
Token peanuts
Token ID45612
Feature activation-1.31e-01

Pos logit contributions
?"+0.017
opl+0.016
?+0.015
 truck+0.015
 snowball+0.015
Neg logit contributions
 Grac-0.027
nam-0.027
rily-0.027
edly-0.026
 Claire-0.025
Token play
Token ID711
Feature activation+1.55e-01

Pos logit contributions
opl+0.027
?"+0.026
?+0.024
 snowball+0.023
ats+0.023
Neg logit contributions
 Grac-0.044
nam-0.043
rily-0.042
 Claire-0.041
 Katy-0.041
Token together
Token ID1978
Feature activation-5.54e-01

Pos logit contributions
opl+0.004
?"+0.004
?+0.003
 snowball+0.003
ats+0.003
Neg logit contributions
 Grac-0.005
nam-0.005
rily-0.005
 Claire-0.005
 Katy-0.005
Token."
Token ID526
Feature activation-8.89e-01

Pos logit contributions
 Grac+0.021
nam+0.020
 Claire+0.020
rily+0.020
 Katy+0.019
Neg logit contributions
opl-0.012
?"-0.010
 snowball-0.010
ats-0.010
?-0.009
Token
Token ID198
Feature activation-5.61e-01

Pos logit contributions
nam+0.040
 Grac+0.037
rily+0.037
chie+0.036
 brace+0.035
Neg logit contributions
?"-0.013
opl-0.012
?-0.012
o-0.011
ossible-0.011
Token
Token ID198
Feature activation-7.36e-01

Pos logit contributions
 Grac+0.022
nam+0.022
rily+0.021
 Claire+0.021
 Katy+0.020
Neg logit contributions
opl-0.011
?"-0.010
 snowball-0.009
ats-0.009
?-0.009
TokenDa
Token ID26531
Feature activation-4.55e+00

Pos logit contributions
nam+0.024
 Grac+0.023
rily+0.022
 diamonds+0.021
ately+0.021
Neg logit contributions
?"-0.018
opl-0.018
 snowball-0.017
?-0.017
o-0.017
Tokenisy
Token ID13560
Feature activation+4.57e-01

Pos logit contributions
nam+0.154
 Grac+0.135
 diamonds+0.122
 rib+0.121
 Erin+0.118
Neg logit contributions
?"-0.121
opl-0.115
o-0.114
?-0.110
ats-0.109
Token smiled
Token ID13541
Feature activation-1.59e+00

Pos logit contributions
opl+0.013
?"+0.012
 snowball+0.011
ats+0.011
?+0.011
Neg logit contributions
 Grac-0.014
nam-0.013
 Claire-0.013
rily-0.013
 Katy-0.013
Token and
Token ID290
Feature activation+5.48e-01

Pos logit contributions
nam+0.070
 Grac+0.069
 Claire+0.064
 Katy+0.064
rily+0.063
Neg logit contributions
?"-0.024
opl-0.022
?-0.022
 snowball-0.021
o-0.018
Token hugged
Token ID41130
Feature activation-5.02e-01

Pos logit contributions
?"+0.008
o+0.008
 Rex+0.007
?+0.007
opl+0.007
Neg logit contributions
nam-0.023
rily-0.022
 Grac-0.022
chie-0.022
rier-0.021
Token Jack
Token ID3619
Feature activation-7.46e-01

Pos logit contributions
nam+0.016
chie+0.015
ve+0.015
 Grac+0.015
rily+0.014
Neg logit contributions
?"-0.013
?-0.013
 snowball-0.012
 Rex-0.012
opl-0.012
Token moments
Token ID7188
Feature activation-3.43e+00

Pos logit contributions
 Grac+0.057
nam+0.055
 Claire+0.052
 Katy+0.051
edly+0.050
Neg logit contributions
opl-0.058
?"-0.055
?-0.051
 snowball-0.050
o-0.050
Token together
Token ID1978
Feature activation-1.21e+00

Pos logit contributions
 Grac+0.118
nam+0.118
rily+0.113
 Claire+0.110
 Katy+0.108
Neg logit contributions
opl-0.082
?"-0.077
?-0.073
ats-0.071
 snowball-0.069
Token.
Token ID13
Feature activation-8.59e-01

Pos logit contributions
nam+0.048
 Grac+0.047
rily+0.045
 Claire+0.044
 Katy+0.044
Neg logit contributions
opl-0.023
?"-0.022
?-0.020
ats-0.019
 snowball-0.019
Token Now
Token ID2735
Feature activation-5.55e-01

Pos logit contributions
 Grac+0.028
nam+0.028
rily+0.026
 Claire+0.025
 Katy+0.025
Neg logit contributions
opl-0.023
?"-0.021
 snowball-0.020
?-0.020
ats-0.020
Token whenever
Token ID8797
Feature activation-1.60e+00

Pos logit contributions
nam+0.020
 Grac+0.019
rily+0.018
chie+0.018
edly+0.018
Neg logit contributions
?"-0.012
opl-0.012
?-0.011
ats-0.011
ossible-0.010
Token Heidi
Token ID45518
Feature activation-4.69e+00

Pos logit contributions
 Grac+0.047
nam+0.043
rily+0.043
ve+0.040
chie+0.040
Neg logit contributions
?"-0.049
?-0.046
opl-0.046
 snowball-0.042
ats-0.042
Token sees
Token ID7224
Feature activation+7.18e-01

Pos logit contributions
 Grac+0.168
nam+0.158
 Claire+0.158
rily+0.155
 Katy+0.154
Neg logit contributions
opl-0.108
?"-0.095
 snowball-0.092
ats-0.090
?-0.085
Token someone
Token ID2130
Feature activation-1.75e+00

Pos logit contributions
opl+0.020
?"+0.019
?+0.018
 snowball+0.018
ats+0.018
Neg logit contributions
 Grac-0.022
nam-0.022
rily-0.020
 Claire-0.020
 Katy-0.020
Token kneeling
Token ID42687
Feature activation+6.54e-01

Pos logit contributions
 Grac+0.062
nam+0.062
rily+0.059
 Claire+0.058
 Katy+0.057
Neg logit contributions
opl-0.040
?"-0.039
?-0.036
ats-0.035
 snowball-0.034
Token,
Token ID11
Feature activation-2.16e-01

Pos logit contributions
opl+0.016
?"+0.016
?+0.015
ats+0.014
 snowball+0.014
Neg logit contributions
 Grac-0.022
nam-0.022
rily-0.021
 Claire-0.020
 Katy-0.020
Token she
Token ID673
Feature activation+6.28e-01

Pos logit contributions
 Grac+0.006
nam+0.006
rily+0.006
ve+0.005
edly+0.005
Neg logit contributions
?"-0.006
opl-0.006
?-0.006
 snowball-0.006
ats-0.006
Token.
Token ID13
Feature activation-4.31e-01

Pos logit contributions
opl+0.004
?"+0.004
 snowball+0.004
ats+0.004
?+0.004
Neg logit contributions
 Grac-0.007
nam-0.007
rily-0.007
 Claire-0.007
 Katy-0.007
Token
Token ID198
Feature activation-5.87e-01

Pos logit contributions
 Grac+0.013
nam+0.013
rily+0.012
chie+0.011
ve+0.011
Neg logit contributions
opl-0.013
?"-0.012
?-0.011
ossible-0.011
o-0.011
Token
Token ID198
Feature activation-8.61e-01

Pos logit contributions
 Grac+0.021
nam+0.021
 Claire+0.020
rily+0.020
 Katy+0.020
Neg logit contributions
opl-0.013
?"-0.011
 snowball-0.011
ats-0.011
?-0.011
TokenThis
Token ID1212
Feature activation-1.65e+00

Pos logit contributions
 Grac+0.036
nam+0.036
rily+0.034
 Claire+0.034
 Katy+0.034
Neg logit contributions
opl-0.014
?"-0.013
 snowball-0.012
ats-0.011
?-0.011
Token week
Token ID1285
Feature activation-3.10e+00

Pos logit contributions
 Katy+0.053
 Claire+0.053
 Grac+0.052
nam+0.051
rily+0.050
Neg logit contributions
opl-0.040
 snowball-0.037
o-0.036
?"-0.035
ats-0.034
Token Lib
Token ID7980
Feature activation-4.75e+00

Pos logit contributions
 Grac+0.128
nam+0.128
rily+0.122
edly+0.116
chie+0.115
Neg logit contributions
?"-0.052
opl-0.051
?-0.046
ats-0.043
o-0.041
Tokenby
Token ID1525
Feature activation-1.57e+00

Pos logit contributions
 Grac+0.131
nam+0.130
rily+0.119
 Claire+0.119
 Katy+0.116
Neg logit contributions
opl-0.148
?"-0.142
ats-0.134
?-0.133
 snowball-0.133
Token wanted
Token ID2227
Feature activation-9.47e-01

Pos logit contributions
 Grac+0.049
nam+0.047
 Claire+0.046
rily+0.046
 Katy+0.046
Neg logit contributions
opl-0.042
?"-0.037
 snowball-0.037
?-0.035
ats-0.034
Token to
Token ID284
Feature activation-1.88e-01

Pos logit contributions
 Grac+0.031
nam+0.031
rily+0.029
 Claire+0.028
 Katy+0.028
Neg logit contributions
opl-0.024
?"-0.023
?-0.022
ats-0.021
 snowball-0.021
Token go
Token ID467
Feature activation-4.89e-02

Pos logit contributions
 Grac+0.007
nam+0.007
 Claire+0.006
rily+0.006
 Katy+0.006
Neg logit contributions
opl-0.004
?"-0.004
 snowball-0.004
ats-0.004
o-0.004
Token to
Token ID284
Feature activation+4.11e-01

Pos logit contributions
 Grac+0.002
nam+0.002
rily+0.002
 Claire+0.002
 Katy+0.002
Neg logit contributions
opl-0.001
?"-0.001
 snowball-0.001
ats-0.001
?-0.001
Token wide
Token ID3094
Feature activation-6.15e-01

Pos logit contributions
opl+0.010
?"+0.010
 snowball+0.009
?+0.009
ats+0.009
Neg logit contributions
 Grac-0.012
rily-0.012
nam-0.011
 Claire-0.011
 Katy-0.011
Token the
Token ID262
Feature activation-8.86e-01

Pos logit contributions
 Grac+0.019
nam+0.019
rily+0.017
 Claire+0.017
 Katy+0.017
Neg logit contributions
opl-0.017
?"-0.016
?-0.015
 snowball-0.015
ats-0.015
Token universe
Token ID6881
Feature activation-2.78e+00

Pos logit contributions
 Grac+0.029
nam+0.028
rily+0.028
 Claire+0.027
 Katy+0.027
Neg logit contributions
opl-0.022
?"-0.021
 snowball-0.020
?-0.020
ats-0.019
Token was
Token ID373
Feature activation-8.20e-01

Pos logit contributions
nam+0.103
 Grac+0.102
rily+0.098
 Claire+0.094
 Katy+0.094
Neg logit contributions
opl-0.059
?"-0.058
?-0.055
ats-0.052
 snowball-0.051
Token and
Token ID290
Feature activation-8.40e-01

Pos logit contributions
 Grac+0.027
nam+0.026
 Claire+0.025
rily+0.024
 Katy+0.024
Neg logit contributions
opl-0.021
?"-0.019
ats-0.019
 snowball-0.019
ossible-0.018
Token how
Token ID703
Feature activation-5.17e+00

Pos logit contributions
 Grac+0.025
nam+0.025
rily+0.024
 Claire+0.023
 Katy+0.023
Neg logit contributions
opl-0.023
?"-0.023
?-0.021
 snowball-0.021
ats-0.021
Token silly
Token ID14397
Feature activation-1.27e+00

Pos logit contributions
rily+0.172
 Grac+0.171
nam+0.163
edly+0.162
 Claire+0.159
Neg logit contributions
?"-0.129
?-0.119
opl-0.113
o-0.111
ch-0.109
Token and
Token ID290
Feature activation-3.75e-01

Pos logit contributions
nam+0.034
 Grac+0.034
rily+0.032
 Claire+0.030
 Katy+0.030
Neg logit contributions
opl-0.040
?"-0.039
?-0.037
 snowball-0.036
ats-0.036
Token small
Token ID1402
Feature activation-1.28e+00

Pos logit contributions
 Grac+0.012
rily+0.012
nam+0.011
ve+0.011
 Claire+0.011
Neg logit contributions
opl-0.010
?"-0.009
?-0.009
 snowball-0.008
ats-0.008
Token she
Token ID673
Feature activation-4.96e-01

Pos logit contributions
nam+0.040
 Grac+0.039
rily+0.038
 Claire+0.036
ve+0.036
Neg logit contributions
opl-0.034
?"-0.033
?-0.032
 snowball-0.030
ats-0.030
Token felt
Token ID2936
Feature activation-4.28e-01

Pos logit contributions
nam+0.016
 Grac+0.016
rily+0.015
 Claire+0.015
 Katy+0.015
Neg logit contributions
opl-0.013
?"-0.012
?-0.011
ats-0.011
 snowball-0.011