TOP ACTIVATIONS
MAX = 0.096

worked hard . The grass ho pper was spoiled and didn
" Is it a grass ho pper ?" she guesses .
It is not a grass ho pper or a cricket .
and pointed at the grass ho pper . He said ,
an ant and a grass ho pper . The ant was
busy working when the grass ho pper saw a switch next
introduced to a smooth grass ho pper called Jim . Ted
a great he ave - ho , they gained the avocado
lady bug and a grass ho pper . One
and saw a huge grass ho pper , who had been
and picked up a grass ho pper ! Both friends were
ried ly , the grass ho pper rushed over and tried
." The grass ho pper sighed and decided to
?" The grass ho pper said , " I
. It was a grass ho pper ! Tom was so
in circles with the grass ho pper . They jumped and
Gr ass ho pper , let
so happy that the grass ho pper had said yes .
Then Baby saw some grass ho ppers in the garden .
and squ ished the grass ho ppers with her fingers to

MIN ACTIVATIONS
MIN = -0.104

time playing with her or nam ents and making her special
and play with your or nam ents . Maybe you can
tree with lights and or nam ents . They also saw
the rest of the or nam ents onto the tree .
he especially liked the or nam ents he kept around .
a box of shiny or nam ents that spark led in
many colors . Some or nam ents were round , some
to look at the or nam ents on the tree .
can we have some or nam ents for our snow man
All of these lovely or nam ents reminded him of the
park filled with colourful or nam ents . Jimmy looked out
she had with her or nam ents . <|endoftext|> Once upon
to me . The or nam ents were very important to
never played with the or nam ents again . They still
He quickly hung colorful or nam ents and tw ink ling
and lots of shiny or nam ents . L ila wanted
to help sort the or nam ents . She picked up
take one of the or nam ents outside for a walk
at all the different or nam ents and adm iring the
who liked to collect or nam ents . Everywhere she went

INTERVAL 0.046 - 0.096
CONTAINS 0.092%

But we have to take turns and share . First ,
was so relieved , he quickly ran past the trap and
up and down . She quickly put on her best dress
ped up the apples au ce and then went racing around
he said , " Hi Fl uffy , do you want

INTERVAL -0.004 - 0.046
CONTAINS 63.061%

The pond spark led and the ducks qu acked . Anna
a good time in the mountains . He also hoped they
house was getting very angry . He shouted out of his
his owner were out walking , the sky above them turned
and began to cry . His friends woke up

INTERVAL -0.054 - -0.004
CONTAINS 36.812%

him . She said , " Tom my , are you
smiled and enjoyed his time on the bench . He looked
in a line . They say " m arch , march
Yes , the egg will be warm . It will be
" But you will get sick , Lily ," Sam

INTERVAL -0.104 - -0.054
CONTAINS 0.035%

" It 's okay , kids ," he said
and then on a book case . She reached for the
't do that again , okay ?" Sara says .
says , " Ben is okay . He touched the oven
" Yes , we 're okay , sweet ies . Don
Token worked
Token ID3111
Feature activation+2.16e-03

Pos logit contributions
amas+0.019
 subtract+0.018
 avail+0.018
 magnet+0.017
 Tears+0.017
Neg logit contributions
 ostr-0.017
 wo-0.016
ho-0.015
aa-0.015
 loves-0.014
Token hard
Token ID1327
Feature activation+2.86e-02

Pos logit contributions
 wo+0.002
 ostr+0.002
ho+0.002
 loves+0.002
aa+0.001
Neg logit contributions
amas-0.005
 avail-0.005
 subtract-0.005
-0.005
 magnet-0.005
Token.
Token ID13
Feature activation+1.48e-03

Pos logit contributions
 ostr+0.033
 wo+0.032
ho+0.029
aa+0.027
 loves+0.027
Neg logit contributions
amas-0.063
 avail-0.061
 subtract-0.060
 magnet-0.058
-0.058
Token The
Token ID383
Feature activation+1.58e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.002
 avail-0.002
 subtract-0.002
 magnet-0.002
smanship-0.002
Token grass
Token ID8701
Feature activation+2.51e-02

Pos logit contributions
 ostr+0.026
 wo+0.025
ho+0.023
aa+0.023
 loves+0.021
Neg logit contributions
amas-0.028
 subtract-0.026
 avail-0.026
 magnet-0.026
-0.025
Tokenho
Token ID8873
Feature activation+9.57e-02

Pos logit contributions
 ostr+0.035
ho+0.033
 wo+0.032
 loves+0.029
aa+0.028
Neg logit contributions
amas-0.052
 avail-0.052
 subtract-0.051
 magnet-0.047
ulum-0.047
Tokenpper
Token ID2848
Feature activation+2.18e-02

Pos logit contributions
 ostr+0.190
 wo+0.181
ho+0.170
aa+0.166
 loves+0.160
Neg logit contributions
amas-0.141
 subtract-0.130
 avail-0.129
 magnet-0.126
-0.124
Token was
Token ID373
Feature activation+1.49e-02

Pos logit contributions
 ostr+0.033
 wo+0.032
ho+0.030
 loves+0.028
aa+0.028
Neg logit contributions
amas-0.042
 avail-0.041
 subtract-0.039
-0.037
 magnet-0.037
Token spoiled
Token ID34781
Feature activation+6.27e-03

Pos logit contributions
 ostr+0.024
 wo+0.022
ho+0.021
aa+0.020
 loves+0.019
Neg logit contributions
amas-0.028
 subtract-0.026
 avail-0.026
 magnet-0.025
 Tears-0.025
Token and
Token ID290
Feature activation-4.86e-03

Pos logit contributions
 ostr+0.008
 wo+0.008
ho+0.007
aa+0.007
 loves+0.007
Neg logit contributions
amas-0.013
 avail-0.013
 subtract-0.013
ulum-0.012
-0.012
Token didn
Token ID1422
Feature activation+5.17e-03

Pos logit contributions
amas+0.009
 avail+0.009
 subtract+0.008
+0.008
ulum+0.008
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token "
Token ID366
Feature activation-1.42e-02

Pos logit contributions
amas+0.010
 subtract+0.009
 avail+0.009
+0.009
 Tears+0.009
Neg logit contributions
 ostr-0.013
 wo-0.013
ho-0.012
aa-0.012
 loves-0.011
TokenIs
Token ID3792
Feature activation+9.06e-03

Pos logit contributions
amas+0.024
 avail+0.022
 magnet+0.022
 Tears+0.021
 subtract+0.021
Neg logit contributions
 ostr-0.026
 wo-0.024
ho-0.024
aa-0.023
 loves-0.022
Token it
Token ID340
Feature activation-2.01e-03

Pos logit contributions
 ostr+0.018
aa+0.018
ho+0.017
 wo+0.017
arl+0.017
Neg logit contributions
 avail-0.010
 subtract-0.010
 boulder-0.010
lf-0.010
on-0.010
Token a
Token ID257
Feature activation-6.90e-03

Pos logit contributions
amas+0.004
 avail+0.003
 subtract+0.003
 magnet+0.003
oon+0.003
Neg logit contributions
 ostr-0.003
ho-0.003
 wo-0.003
aa-0.003
 loves-0.003
Token grass
Token ID8701
Feature activation+2.15e-02

Pos logit contributions
amas+0.011
 magnet+0.011
 avail+0.011
 subtract+0.011
 boulder+0.011
Neg logit contributions
 ostr-0.011
 wo-0.011
ho-0.010
aa-0.010
 loves-0.010
Tokenho
Token ID8873
Feature activation+9.41e-02

Pos logit contributions
 ostr+0.034
 wo+0.032
ho+0.029
aa+0.029
 loves+0.029
Neg logit contributions
amas-0.039
 avail-0.036
 magnet-0.036
 subtract-0.036
-0.036
Tokenpper
Token ID2848
Feature activation+2.39e-02

Pos logit contributions
 ostr+0.162
 wo+0.152
ho+0.138
aa+0.134
 loves+0.133
Neg logit contributions
amas-0.164
 avail-0.150
 subtract-0.149
 magnet-0.148
-0.146
Token?"
Token ID1701
Feature activation-1.53e-03

Pos logit contributions
 ostr+0.036
 wo+0.034
ho+0.032
aa+0.031
 loves+0.029
Neg logit contributions
amas-0.045
 avail-0.042
 magnet-0.042
 subtract-0.042
oon-0.041
Token she
Token ID673
Feature activation-8.23e-03

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.002
 magnet+0.002
+0.002
Neg logit contributions
 ostr-0.003
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token guesses
Token ID44774
Feature activation-6.88e-03

Pos logit contributions
amas+0.017
 avail+0.016
 subtract+0.016
 magnet+0.015
 Tears+0.015
Neg logit contributions
 ostr-0.012
 wo-0.011
ho-0.010
aa-0.010
 loves-0.009
Token.
Token ID13
Feature activation-5.03e-03

Pos logit contributions
amas+0.015
 avail+0.014
 subtract+0.014
+0.014
 magnet+0.014
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.007
 loves-0.007
TokenIt
Token ID1026
Feature activation-2.43e-03

Pos logit contributions
amas+0.002
 avail+0.002
 subtract+0.002
 magnet+0.002
+0.002
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token is
Token ID318
Feature activation+1.93e-02

Pos logit contributions
amas+0.004
 magnet+0.004
oon+0.004
 subtract+0.004
 Tears+0.004
Neg logit contributions
 ostr-0.004
ho-0.004
 wo-0.004
aa-0.004
arl-0.003
Token not
Token ID407
Feature activation-1.09e-02

Pos logit contributions
ho+0.033
 ostr+0.032
aa+0.030
 wo+0.029
 loves+0.028
Neg logit contributions
amas-0.031
 avail-0.031
 magnet-0.030
 Tears-0.029
 subtract-0.029
Token a
Token ID257
Feature activation-7.88e-03

Pos logit contributions
amas+0.018
 magnet+0.017
 avail+0.017
 subtract+0.017
 Tears+0.016
Neg logit contributions
ho-0.019
 ostr-0.018
 wo-0.017
aa-0.016
 loves-0.015
Token grass
Token ID8701
Feature activation+2.04e-02

Pos logit contributions
 magnet+0.011
 avail+0.010
 boulder+0.010
 basketball+0.010
amas+0.010
Neg logit contributions
 ostr-0.014
 wo-0.014
aa-0.013
ho-0.013
la-0.013
Tokenho
Token ID8873
Feature activation+9.37e-02

Pos logit contributions
 ostr+0.031
 wo+0.030
aa+0.027
 loves+0.027
ho+0.026
Neg logit contributions
amas-0.034
 magnet-0.034
-0.033
 boulder-0.033
 avail-0.033
Tokenpper
Token ID2848
Feature activation+2.02e-02

Pos logit contributions
 ostr+0.170
 wo+0.162
 loves+0.144
ho+0.140
aa+0.137
Neg logit contributions
amas-0.151
-0.139
on-0.139
iece-0.139
 avail-0.138
Token or
Token ID393
Feature activation-1.96e-02

Pos logit contributions
 ostr+0.031
 wo+0.029
aa+0.027
ho+0.027
 loves+0.025
Neg logit contributions
amas-0.036
 magnet-0.034
 avail-0.034
 boulder-0.033
 subtract-0.032
Token a
Token ID257
Feature activation-8.49e-03

Pos logit contributions
amas+0.034
 subtract+0.033
 avail+0.033
 magnet+0.032
 Tears+0.031
Neg logit contributions
 ostr-0.032
 wo-0.030
ho-0.029
aa-0.028
 loves-0.026
Token cricket
Token ID18836
Feature activation+1.45e-02

Pos logit contributions
 magnet+0.013
 avail+0.012
amas+0.012
 subtract+0.012
 boulder+0.012
Neg logit contributions
 ostr-0.014
 wo-0.014
aa-0.013
ho-0.013
 loves-0.013
Token.
Token ID13
Feature activation-5.98e-03

Pos logit contributions
 ostr+0.020
 wo+0.019
ho+0.017
aa+0.017
 loves+0.016
Neg logit contributions
amas-0.028
 magnet-0.027
 avail-0.027
 subtract-0.027
oon-0.026
Token and
Token ID290
Feature activation-3.61e-03

Pos logit contributions
 avail+0.051
amas+0.051
 subtract+0.049
ulum+0.047
+0.047
Neg logit contributions
 ostr-0.028
ho-0.028
 wo-0.027
 for-0.023
 loves-0.023
Token pointed
Token ID6235
Feature activation-1.45e-03

Pos logit contributions
amas+0.008
 avail+0.007
+0.007
smanship+0.007
icycle+0.007
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
 loves-0.004
 sw-0.004
Token at
Token ID379
Feature activation-3.22e-02

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
 magnet+0.003
 Tears+0.003
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token the
Token ID262
Feature activation-8.75e-04

Pos logit contributions
amas+0.064
 subtract+0.059
 avail+0.059
+0.056
ulum+0.056
Neg logit contributions
 ostr-0.050
 wo-0.046
ho-0.044
aa-0.041
 loves-0.040
Token grass
Token ID8701
Feature activation+2.13e-02

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
+0.001
ulum+0.001
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.001
 loves-0.001
aa-0.001
Tokenho
Token ID8873
Feature activation+9.34e-02

Pos logit contributions
 ostr+0.029
 wo+0.027
ho+0.026
aa+0.024
 loves+0.023
Neg logit contributions
amas-0.044
 avail-0.042
 subtract-0.042
 magnet-0.040
-0.040
Tokenpper
Token ID2848
Feature activation+2.30e-02

Pos logit contributions
 ostr+0.163
 wo+0.155
ho+0.133
aa+0.133
 loves+0.131
Neg logit contributions
amas-0.156
iece-0.145
 avail-0.144
on-0.144
-0.142
Token.
Token ID13
Feature activation-2.57e-03

Pos logit contributions
 ostr+0.035
 wo+0.033
aa+0.030
ho+0.030
 loves+0.028
Neg logit contributions
amas-0.043
 subtract-0.041
 magnet-0.040
 avail-0.040
 Tears-0.040
Token He
Token ID679
Feature activation-5.73e-03

Pos logit contributions
 avail+0.004
amas+0.004
smanship+0.003
lf+0.003
 subtract+0.003
Neg logit contributions
 wo-0.005
 ostr-0.005
ho-0.005
aa-0.005
 loves-0.004
Token said
Token ID531
Feature activation-6.33e-03

Pos logit contributions
amas+0.012
 avail+0.011
 subtract+0.011
+0.011
 magnet+0.010
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
 loves-0.007
Token,
Token ID11
Feature activation-1.59e-03

Pos logit contributions
 magnet+0.010
 subtract+0.010
on+0.009
 Tears+0.009
amas+0.009
Neg logit contributions
 ostr-0.011
aa-0.010
 wo-0.010
ho-0.009
 loves-0.009
Token an
Token ID281
Feature activation-1.77e-02

Pos logit contributions
 ostr+0.025
 wo+0.024
ho+0.023
aa+0.022
 loves+0.021
Neg logit contributions
amas-0.029
 avail-0.027
 subtract-0.027
 magnet-0.027
-0.026
Token ant
Token ID1885
Feature activation+2.24e-02

Pos logit contributions
amas+0.033
 subtract+0.031
 avail+0.031
 magnet+0.031
ulum+0.030
Neg logit contributions
 ostr-0.027
 wo-0.026
ho-0.025
aa-0.024
 loves-0.022
Token and
Token ID290
Feature activation-3.66e-03

Pos logit contributions
 ostr+0.030
 wo+0.028
ho+0.026
aa+0.025
 loves+0.023
Neg logit contributions
amas-0.045
 subtract-0.043
 magnet-0.043
 avail-0.043
 Tears-0.043
Token a
Token ID257
Feature activation-9.87e-03

Pos logit contributions
amas+0.005
 avail+0.005
 subtract+0.005
 magnet+0.005
 Tears+0.005
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.007
aa-0.007
 loves-0.006
Token grass
Token ID8701
Feature activation+1.95e-02

Pos logit contributions
amas+0.017
 magnet+0.017
+0.016
 avail+0.016
 subtract+0.016
Neg logit contributions
 ostr-0.015
 wo-0.015
ho-0.014
aa-0.013
 loves-0.013
Tokenho
Token ID8873
Feature activation+9.28e-02

Pos logit contributions
 ostr+0.022
 wo+0.021
aa+0.018
 loves+0.017
ho+0.016
Neg logit contributions
amas-0.042
-0.041
 Tears-0.041
 magnet-0.040
oon-0.039
Tokenpper
Token ID2848
Feature activation+2.38e-02

Pos logit contributions
 ostr+0.167
 wo+0.165
 loves+0.143
aa+0.137
ho+0.135
Neg logit contributions
amas-0.143
on-0.135
-0.132
pen-0.130
iece-0.129
Token.
Token ID13
Feature activation+2.06e-03

Pos logit contributions
 ostr+0.041
 wo+0.040
aa+0.039
ho+0.038
arl+0.037
Neg logit contributions
 Tears-0.034
 magnet-0.033
on-0.031
-0.031
things-0.031
Token The
Token ID383
Feature activation+2.03e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.003
 loves+0.003
aa+0.002
Neg logit contributions
amas-0.004
 avail-0.004
 subtract-0.004
smanship-0.004
lf-0.004
Token ant
Token ID1885
Feature activation+2.31e-02

Pos logit contributions
 wo+0.038
 ostr+0.036
aa+0.034
ho+0.034
la+0.033
Neg logit contributions
amas-0.027
 magnet-0.027
 boulder-0.027
 avail-0.026
 Tears-0.026
Token was
Token ID373
Feature activation+1.58e-02

Pos logit contributions
 ostr+0.033
 wo+0.032
aa+0.030
ho+0.029
arl+0.026
Neg logit contributions
amas-0.041
 magnet-0.041
 Tears-0.040
lf-0.040
oon-0.039
Token busy
Token ID8179
Feature activation+5.13e-03

Pos logit contributions
 ostr+0.023
 wo+0.022
ho+0.020
aa+0.020
 loves+0.019
Neg logit contributions
amas-0.027
 avail-0.026
 subtract-0.025
 magnet-0.024
-0.024
Token working
Token ID1762
Feature activation-3.38e-03

Pos logit contributions
 ostr+0.008
 wo+0.007
ho+0.007
 loves+0.007
aa+0.007
Neg logit contributions
amas-0.010
 avail-0.009
 subtract-0.009
-0.009
 magnet-0.009
Token when
Token ID618
Feature activation+2.09e-02

Pos logit contributions
amas+0.007
 avail+0.007
 subtract+0.007
+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.003
 loves-0.003
 for-0.003
Token the
Token ID262
Feature activation-2.94e-03

Pos logit contributions
 ostr+0.032
 wo+0.029
ho+0.028
aa+0.026
 loves+0.025
Neg logit contributions
amas-0.042
 subtract-0.039
oon-0.037
 magnet-0.037
icycle-0.037
Token grass
Token ID8701
Feature activation+2.05e-02

Pos logit contributions
amas+0.005
 avail+0.005
 subtract+0.005
 magnet+0.005
+0.004
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.005
aa-0.004
 loves-0.004
Tokenho
Token ID8873
Feature activation+9.24e-02

Pos logit contributions
 ostr+0.032
ho+0.030
 wo+0.029
aa+0.027
 loves+0.025
Neg logit contributions
amas-0.039
 subtract-0.038
 avail-0.038
 magnet-0.035
ulum-0.035
Tokenpper
Token ID2848
Feature activation+1.92e-02

Pos logit contributions
 ostr+0.173
 wo+0.165
ho+0.154
aa+0.150
 loves+0.146
Neg logit contributions
amas-0.146
 avail-0.135
 subtract-0.135
 magnet-0.131
-0.130
Token saw
Token ID2497
Feature activation-1.10e-02

Pos logit contributions
 ostr+0.030
 wo+0.028
ho+0.026
aa+0.025
 loves+0.024
Neg logit contributions
amas-0.036
 avail-0.034
 subtract-0.034
 magnet-0.033
-0.032
Token a
Token ID257
Feature activation-7.89e-03

Pos logit contributions
amas+0.018
 avail+0.017
 subtract+0.017
 magnet+0.017
 Tears+0.016
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.017
aa-0.017
 loves-0.016
Token switch
Token ID5078
Feature activation-1.92e-02

Pos logit contributions
amas+0.015
 subtract+0.014
 avail+0.014
+0.013
 magnet+0.013
Neg logit contributions
 ostr-0.012
 wo-0.012
ho-0.011
aa-0.010
 loves-0.010
Token next
Token ID1306
Feature activation+1.08e-02

Pos logit contributions
amas+0.038
 avail+0.036
 subtract+0.036
 magnet+0.035
+0.034
Neg logit contributions
 ostr-0.028
 wo-0.026
ho-0.024
aa-0.023
 loves-0.022
Token introduced
Token ID5495
Feature activation+5.73e-03

Pos logit contributions
 ostr+0.028
 wo+0.026
ho+0.026
aa+0.024
 loves+0.023
Neg logit contributions
amas-0.022
 avail-0.020
 subtract-0.020
 magnet-0.020
 Tears-0.020
Token to
Token ID284
Feature activation+7.19e-03

Pos logit contributions
 ostr+0.008
 wo+0.008
ho+0.007
aa+0.007
 loves+0.007
Neg logit contributions
amas-0.012
-0.011
 avail-0.011
 magnet-0.010
 subtract-0.010
Token a
Token ID257
Feature activation-9.72e-03

Pos logit contributions
 ostr+0.011
 wo+0.011
ho+0.010
aa+0.010
 loves+0.010
Neg logit contributions
amas-0.013
 subtract-0.012
 avail-0.012
-0.012
 magnet-0.012
Token smooth
Token ID7209
Feature activation-1.06e-03

Pos logit contributions
amas+0.017
 magnet+0.017
 avail+0.017
 subtract+0.017
 boulder+0.016
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.013
aa-0.013
 loves-0.012
Token grass
Token ID8701
Feature activation+1.75e-02

Pos logit contributions
amas+0.002
 subtract+0.002
 avail+0.001
 Tears+0.001
+0.001
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Tokenho
Token ID8873
Feature activation+9.04e-02

Pos logit contributions
 ostr+0.025
 wo+0.024
aa+0.022
ho+0.021
 loves+0.021
Neg logit contributions
amas-0.034
 magnet-0.032
 subtract-0.032
 avail-0.031
-0.031
Tokenpper
Token ID2848
Feature activation+2.16e-02

Pos logit contributions
 ostr+0.150
 wo+0.142
 loves+0.125
ho+0.125
aa+0.122
Neg logit contributions
amas-0.161
 avail-0.147
 magnet-0.144
-0.144
oon-0.143
Token called
Token ID1444
Feature activation+1.60e-02

Pos logit contributions
 ostr+0.034
 wo+0.032
ho+0.030
aa+0.030
 loves+0.027
Neg logit contributions
amas-0.039
 magnet-0.037
 subtract-0.036
 avail-0.036
-0.035
Token Jim
Token ID5395
Feature activation+1.13e-02

Pos logit contributions
 ostr+0.030
 wo+0.028
ho+0.027
aa+0.027
 loves+0.026
Neg logit contributions
amas-0.024
 subtract-0.022
 magnet-0.022
 avail-0.022
 Tears-0.022
Token.
Token ID13
Feature activation+1.07e-03

Pos logit contributions
 ostr+0.023
ho+0.020
 wo+0.020
aa+0.020
rav+0.019
Neg logit contributions
 magnet-0.013
pen-0.013
on-0.012
amas-0.012
 chess-0.012
Token Ted
Token ID11396
Feature activation+1.18e-02

Pos logit contributions
 wo+0.002
 loves+0.001
 ostr+0.001
ho+0.001
:+0.001
Neg logit contributions
amas-0.002
 avail-0.002
inge-0.002
ivering-0.002
smanship-0.002
Token a
Token ID257
Feature activation-6.24e-03

Pos logit contributions
amas+0.017
 avail+0.016
 subtract+0.016
 magnet+0.015
+0.015
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.012
aa-0.011
 loves-0.011
Token great
Token ID1049
Feature activation-7.27e-03

Pos logit contributions
amas+0.012
 avail+0.011
 subtract+0.011
ulum+0.011
roads+0.011
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.008
aa-0.008
 loves-0.008
Token he
Token ID339
Feature activation-1.22e-02

Pos logit contributions
amas+0.015
roads+0.014
iece+0.014
 subtract+0.014
iers+0.013
Neg logit contributions
 wo-0.010
 ostr-0.010
 loves-0.009
ho-0.008
 for-0.008
Tokenave
Token ID1015
Feature activation-4.98e-03

Pos logit contributions
amas+0.022
 avail+0.021
 subtract+0.021
 magnet+0.020
+0.020
Neg logit contributions
 ostr-0.020
 wo-0.019
ho-0.017
aa-0.017
 loves-0.016
Token-
Token ID12
Feature activation-2.42e-03

Pos logit contributions
amas+0.012
ulum+0.011
+0.011
icycle+0.011
 avail+0.011
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
 for-0.004
 loves-0.004
Tokenho
Token ID8873
Feature activation+9.03e-02

Pos logit contributions
amas+0.004
 subtract+0.004
 avail+0.004
 magnet+0.004
+0.004
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Token,
Token ID11
Feature activation+1.52e-04

Pos logit contributions
 ostr+0.153
 wo+0.145
ho+0.133
aa+0.129
 loves+0.127
Neg logit contributions
amas-0.157
 avail-0.148
 subtract-0.147
 magnet-0.143
-0.143
Token they
Token ID484
Feature activation+7.26e-03

Pos logit contributions
 ostr+0.000
 wo+0.000
ho+0.000
 loves+0.000
aa+0.000
Neg logit contributions
amas-0.000
 avail-0.000
ulum-0.000
icycle-0.000
smanship-0.000
Token gained
Token ID8618
Feature activation+3.11e-03

Pos logit contributions
 ostr+0.012
 wo+0.011
ho+0.011
aa+0.010
 loves+0.010
Neg logit contributions
amas-0.013
 subtract-0.012
 avail-0.012
 magnet-0.012
-0.012
Token the
Token ID262
Feature activation-3.87e-03

Pos logit contributions
 ostr+0.005
 wo+0.004
ho+0.004
 loves+0.004
aa+0.004
Neg logit contributions
amas-0.006
 avail-0.006
ulum-0.006
 subtract-0.006
-0.006
Token avocado
Token ID40377
Feature activation+3.76e-03

Pos logit contributions
amas+0.007
 avail+0.007
ulum+0.006
oon+0.006
 subtract+0.006
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.006
 loves-0.005
aa-0.005
Token lady
Token ID10846
Feature activation+5.11e-03

Pos logit contributions
 magnet+0.016
amas+0.016
 avail+0.015
+0.015
 boulder+0.015
Neg logit contributions
 ostr-0.021
 wo-0.019
aa-0.019
ho-0.019
 loves-0.018
Tokenbug
Token ID25456
Feature activation+6.64e-03

Pos logit contributions
 ostr+0.008
 wo+0.008
aa+0.007
ho+0.007
 loves+0.007
Neg logit contributions
amas-0.008
 magnet-0.008
 subtract-0.008
oon-0.008
on-0.008
Token and
Token ID290
Feature activation-7.09e-03

Pos logit contributions
 ostr+0.010
 wo+0.010
ho+0.009
aa+0.009
 loves+0.008
Neg logit contributions
amas-0.012
 magnet-0.011
 subtract-0.011
-0.011
 avail-0.011
Token a
Token ID257
Feature activation-1.10e-02

Pos logit contributions
amas+0.013
 avail+0.013
 subtract+0.012
 magnet+0.012
+0.012
Neg logit contributions
 ostr-0.011
 wo-0.010
ho-0.010
aa-0.009
 loves-0.009
Token grass
Token ID8701
Feature activation+1.95e-02

Pos logit contributions
amas+0.017
 magnet+0.016
 subtract+0.016
 avail+0.016
+0.015
Neg logit contributions
 ostr-0.020
 wo-0.018
ho-0.018
aa-0.018
 loves-0.017
Tokenho
Token ID8873
Feature activation+9.01e-02

Pos logit contributions
 ostr+0.024
 wo+0.022
aa+0.021
 loves+0.020
rav+0.019
Neg logit contributions
-0.039
amas-0.038
 magnet-0.037
oon-0.036
 Tears-0.036
Tokenpper
Token ID2848
Feature activation+2.06e-02

Pos logit contributions
 ostr+0.168
 wo+0.161
 loves+0.151
 lives+0.140
aa+0.137
Neg logit contributions
amas-0.129
on-0.125
-0.123
iece-0.121
pen-0.120
Token.
Token ID13
Feature activation-3.84e-03

Pos logit contributions
 ostr+0.035
aa+0.033
 wo+0.032
ho+0.031
rav+0.029
Neg logit contributions
amas-0.031
 magnet-0.030
oon-0.030
on-0.029
 Tears-0.029
Token\n
Token ID198
Feature activation+4.29e-03

Pos logit contributions
amas+0.007
 avail+0.006
 subtract+0.006
 magnet+0.006
lf+0.006
Neg logit contributions
 ostr-0.007
 wo-0.006
aa-0.006
ho-0.006
 loves-0.006
Token\n
Token ID198
Feature activation+4.18e-03

Pos logit contributions
 ostr+0.009
 wo+0.008
ho+0.008
aa+0.007
 loves+0.007
Neg logit contributions
amas-0.006
-0.005
oon-0.005
 Tears-0.005
things-0.005
TokenOne
Token ID3198
Feature activation+2.52e-02

Pos logit contributions
 ostr+0.007
 wo+0.006
ho+0.006
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.008
 avail-0.007
 subtract-0.007
 magnet-0.007
-0.007
Token and
Token ID290
Feature activation-6.02e-03

Pos logit contributions
amas+0.005
 avail+0.005
 subtract+0.005
 magnet+0.005
+0.005
Neg logit contributions
 ostr-0.004
 wo-0.003
ho-0.003
aa-0.003
 loves-0.003
Token saw
Token ID2497
Feature activation-1.24e-02

Pos logit contributions
amas+0.014
 avail+0.013
+0.013
 subtract+0.013
 magnet+0.012
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.006
 loves-0.006
aa-0.006
Token a
Token ID257
Feature activation-5.22e-03

Pos logit contributions
amas+0.026
 avail+0.024
 subtract+0.024
+0.023
 magnet+0.023
Neg logit contributions
 ostr-0.018
 wo-0.016
ho-0.015
aa-0.014
 loves-0.014
Token huge
Token ID3236
Feature activation+2.04e-02

Pos logit contributions
amas+0.011
 subtract+0.010
 avail+0.010
iece+0.010
roads+0.010
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.006
aa-0.006
 loves-0.006
Token grass
Token ID8701
Feature activation+1.72e-02

Pos logit contributions
 ostr+0.035
 wo+0.031
ho+0.028
 loves+0.026
aa+0.026
Neg logit contributions
amas-0.041
 subtract-0.038
iece-0.037
 avail-0.037
things-0.036
Tokenho
Token ID8873
Feature activation+9.00e-02

Pos logit contributions
 ostr+0.017
ho+0.016
 wo+0.015
aa+0.013
 loves+0.011
Neg logit contributions
amas-0.043
 avail-0.042
 subtract-0.042
 magnet-0.039
ulum-0.039
Tokenpper
Token ID2848
Feature activation+2.11e-02

Pos logit contributions
 ostr+0.153
 wo+0.145
 loves+0.133
aa+0.128
ho+0.126
Neg logit contributions
amas-0.155
-0.143
 avail-0.141
oon-0.139
on-0.139
Token,
Token ID11
Feature activation+2.32e-03

Pos logit contributions
 ostr+0.031
 wo+0.029
ho+0.027
aa+0.025
 loves+0.025
Neg logit contributions
amas-0.042
 avail-0.040
 subtract-0.040
-0.038
ulum-0.038
Token who
Token ID508
Feature activation+5.06e-04

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.003
 loves+0.003
aa+0.003
Neg logit contributions
amas-0.005
 avail-0.004
-0.004
 subtract-0.004
smanship-0.004
Token had
Token ID550
Feature activation+1.06e-02

Pos logit contributions
 ostr+0.001
ho+0.001
 wo+0.001
 loves+0.001
aa+0.001
Neg logit contributions
amas-0.001
 avail-0.001
 subtract-0.001
iece-0.001
ulum-0.001
Token been
Token ID587
Feature activation-1.41e-02

Pos logit contributions
 ostr+0.016
 wo+0.015
ho+0.014
aa+0.014
 loves+0.013
Neg logit contributions
amas-0.019
 subtract-0.019
 avail-0.018
 magnet-0.018
 Tears-0.018
Token and
Token ID290
Feature activation-5.94e-03

Pos logit contributions
 ostr+0.008
 wo+0.007
ho+0.007
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.015
 avail-0.014
 subtract-0.014
-0.013
ulum-0.013
Token picked
Token ID6497
Feature activation+4.53e-03

Pos logit contributions
amas+0.014
 avail+0.014
+0.013
ulum+0.012
smanship+0.012
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
 loves-0.006
aa-0.006
Token up
Token ID510
Feature activation+4.79e-03

Pos logit contributions
 ostr+0.007
 wo+0.007
ho+0.007
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.008
 subtract-0.008
 avail-0.008
 magnet-0.007
-0.007
Token a
Token ID257
Feature activation-8.06e-03

Pos logit contributions
 ostr+0.006
 wo+0.006
ho+0.005
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.011
 subtract-0.010
 avail-0.010
oon-0.010
-0.010
Token grass
Token ID8701
Feature activation+1.71e-02

Pos logit contributions
amas+0.015
 avail+0.014
 subtract+0.014
+0.013
 Tears+0.013
Neg logit contributions
 ostr-0.013
 wo-0.013
ho-0.012
aa-0.011
 loves-0.011
Tokenho
Token ID8873
Feature activation+8.96e-02

Pos logit contributions
 ostr+0.022
 wo+0.021
ho+0.019
aa+0.018
 loves+0.017
Neg logit contributions
amas-0.036
 avail-0.035
 subtract-0.035
 magnet-0.034
-0.033
Tokenpper
Token ID2848
Feature activation+2.21e-02

Pos logit contributions
 ostr+0.150
 wo+0.141
ho+0.125
 loves+0.125
aa+0.124
Neg logit contributions
amas-0.157
 avail-0.144
-0.143
 subtract-0.142
 magnet-0.142
Token!
Token ID0
Feature activation+8.41e-03

Pos logit contributions
 ostr+0.032
 wo+0.030
ho+0.028
aa+0.027
 loves+0.026
Neg logit contributions
amas-0.044
 avail-0.042
 subtract-0.042
-0.040
 magnet-0.040
Token Both
Token ID5747
Feature activation+1.06e-02

Pos logit contributions
 ostr+0.013
 wo+0.013
ho+0.012
 loves+0.011
aa+0.011
Neg logit contributions
 avail-0.015
amas-0.015
 subtract-0.015
smanship-0.014
lf-0.014
Token friends
Token ID2460
Feature activation-1.43e-02

Pos logit contributions
 ostr+0.018
 wo+0.017
ho+0.016
aa+0.016
 loves+0.015
Neg logit contributions
amas-0.018
 avail-0.017
 subtract-0.017
 magnet-0.017
 Tears-0.017
Token were
Token ID547
Feature activation+2.57e-03

Pos logit contributions
amas+0.027
 avail+0.026
 subtract+0.025
 magnet+0.024
+0.024
Neg logit contributions
 ostr-0.022
 wo-0.021
ho-0.020
aa-0.019
 loves-0.019
Tokenried
Token ID2228
Feature activation+1.71e-02

Pos logit contributions
amas+0.006
+0.006
ulum+0.006
 Tears+0.006
 boulder+0.005
Neg logit contributions
 ostr-0.010
 wo-0.010
aa-0.008
 loves-0.008
 ost-0.008
Tokenly
Token ID306
Feature activation+3.38e-02

Pos logit contributions
 ostr+0.030
 wo+0.028
aa+0.026
ho+0.026
 loves+0.026
Neg logit contributions
amas-0.028
 avail-0.026
 boulder-0.026
 Tears-0.026
 magnet-0.026
Token,
Token ID11
Feature activation-2.75e-03

Pos logit contributions
 ostr+0.054
 wo+0.048
ho+0.045
aa+0.045
 loves+0.042
Neg logit contributions
amas-0.063
 subtract-0.058
 Tears-0.058
 boulder-0.057
-0.057
Token the
Token ID262
Feature activation-4.79e-03

Pos logit contributions
amas+0.005
 avail+0.005
 subtract+0.005
 magnet+0.005
ulum+0.005
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Token grass
Token ID8701
Feature activation+1.83e-02

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
 magnet+0.007
+0.007
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.007
 loves-0.007
Tokenho
Token ID8873
Feature activation+8.91e-02

Pos logit contributions
 ostr+0.023
 wo+0.021
ho+0.019
aa+0.019
 loves+0.018
Neg logit contributions
amas-0.040
 avail-0.038
 subtract-0.038
 magnet-0.037
-0.036
Tokenpper
Token ID2848
Feature activation+1.75e-02

Pos logit contributions
 ostr+0.144
 wo+0.137
ho+0.123
aa+0.122
 loves+0.119
Neg logit contributions
amas-0.163
 avail-0.150
 subtract-0.150
 magnet-0.148
-0.148
Token rushed
Token ID13999
Feature activation+1.58e-02

Pos logit contributions
 ostr+0.028
 wo+0.027
ho+0.025
aa+0.024
 loves+0.023
Neg logit contributions
amas-0.032
 subtract-0.030
 avail-0.029
 magnet-0.029
-0.029
Token over
Token ID625
Feature activation+5.01e-03

Pos logit contributions
 ostr+0.025
 wo+0.024
aa+0.021
ho+0.021
 loves+0.021
Neg logit contributions
amas-0.029
 subtract-0.027
 avail-0.027
 magnet-0.026
 Tears-0.026
Token and
Token ID290
Feature activation-8.40e-03

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.004
aa+0.003
 loves+0.003
Neg logit contributions
amas-0.013
 avail-0.012
 subtract-0.012
 magnet-0.012
-0.012
Token tried
Token ID3088
Feature activation+2.78e-04

Pos logit contributions
amas+0.015
 avail+0.015
 subtract+0.014
+0.014
 magnet+0.014
Neg logit contributions
 ostr-0.014
 wo-0.013
ho-0.013
aa-0.012
 loves-0.012
Token."
Token ID526
Feature activation+7.89e-03

Pos logit contributions
amas+0.040
 avail+0.038
 subtract+0.038
 magnet+0.037
+0.036
Neg logit contributions
 ostr-0.025
 wo-0.023
ho-0.021
aa-0.021
 loves-0.020
Token\n
Token ID198
Feature activation+3.63e-03

Pos logit contributions
 wo+0.009
 ostr+0.009
ho+0.008
aa+0.007
 loves+0.007
Neg logit contributions
 avail-0.018
lf-0.018
amas-0.018
 subtract-0.017
smanship-0.017
Token\n
Token ID198
Feature activation+3.58e-03

Pos logit contributions
 ostr+0.006
 wo+0.006
ho+0.005
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.007
 subtract-0.006
 avail-0.006
-0.006
 magnet-0.006
TokenThe
Token ID464
Feature activation+1.58e-02

Pos logit contributions
 ostr+0.007
 wo+0.006
ho+0.006
aa+0.006
 loves+0.005
Neg logit contributions
amas-0.006
 avail-0.006
 magnet-0.005
ulum-0.005
 subtract-0.005
Token grass
Token ID8701
Feature activation+1.67e-02

Pos logit contributions
 ostr+0.029
 wo+0.029
ho+0.027
aa+0.027
 loves+0.026
Neg logit contributions
amas-0.023
 magnet-0.022
 subtract-0.022
 avail-0.022
 boulder-0.022
Tokenho
Token ID8873
Feature activation+8.87e-02

Pos logit contributions
 ostr+0.014
ho+0.013
 wo+0.012
aa+0.009
 loves+0.008
Neg logit contributions
amas-0.045
 avail-0.044
 subtract-0.044
ulum-0.042
 magnet-0.041
Tokenpper
Token ID2848
Feature activation+1.75e-02

Pos logit contributions
 ostr+0.130
 wo+0.121
ho+0.114
aa+0.109
 loves+0.104
Neg logit contributions
amas-0.175
 avail-0.167
 subtract-0.166
 magnet-0.161
 Tears-0.159
Token sighed
Token ID21893
Feature activation-1.11e-02

Pos logit contributions
 ostr+0.026
 wo+0.025
ho+0.023
 loves+0.022
aa+0.021
Neg logit contributions
 avail-0.036
amas-0.035
 subtract-0.032
ulum-0.032
-0.032
Token and
Token ID290
Feature activation-8.20e-03

Pos logit contributions
amas+0.024
 avail+0.023
 subtract+0.023
+0.022
 magnet+0.021
Neg logit contributions
 ostr-0.014
 wo-0.014
ho-0.012
 loves-0.012
aa-0.011
Token decided
Token ID3066
Feature activation-2.58e-03

Pos logit contributions
amas+0.016
 avail+0.015
 subtract+0.015
+0.015
 magnet+0.015
Neg logit contributions
 ostr-0.012
 wo-0.012
ho-0.011
aa-0.010
 loves-0.010
Token to
Token ID284
Feature activation+7.70e-03

Pos logit contributions
amas+0.006
 subtract+0.006
 avail+0.006
 magnet+0.006
 Tears+0.006
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token?"
Token ID1701
Feature activation-2.97e-04

Pos logit contributions
amas+0.042
 avail+0.041
 subtract+0.040
 Tears+0.039
iece+0.039
Neg logit contributions
 wo-0.021
 ostr-0.020
 loves-0.018
ho-0.017
 for-0.017
Token
Token ID198
Feature activation+4.96e-03

Pos logit contributions
amas+0.001
 avail+0.001
smanship+0.001
icycle+0.001
lf+0.001
Neg logit contributions
 ostr-0.000
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Token
Token ID198
Feature activation+5.11e-03

Pos logit contributions
 ostr+0.007
 wo+0.007
ho+0.006
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.010
 avail-0.010
 subtract-0.009
 magnet-0.009
ulum-0.009
TokenThe
Token ID464
Feature activation+1.42e-02

Pos logit contributions
 ostr+0.009
 wo+0.008
aa+0.008
ho+0.008
 loves+0.007
Neg logit contributions
amas-0.009
 avail-0.009
smanship-0.008
 magnet-0.008
ulum-0.008
Token grass
Token ID8701
Feature activation+1.61e-02

Pos logit contributions
 ostr+0.026
 wo+0.026
ho+0.024
aa+0.024
 loves+0.023
Neg logit contributions
amas-0.022
 avail-0.020
 subtract-0.020
 magnet-0.020
 boulder-0.020
Tokenho
Token ID8873
Feature activation+8.83e-02

Pos logit contributions
 ostr+0.011
ho+0.011
 wo+0.009
aa+0.007
 loves+0.005
Neg logit contributions
amas-0.046
 avail-0.045
 subtract-0.044
ulum-0.042
 equations-0.042
Tokenpper
Token ID2848
Feature activation+1.63e-02

Pos logit contributions
 ostr+0.130
 wo+0.123
ho+0.114
aa+0.110
 loves+0.105
Neg logit contributions
amas-0.173
 avail-0.164
 subtract-0.164
 magnet-0.159
-0.157
Token said
Token ID531
Feature activation-5.36e-03

Pos logit contributions
 ostr+0.025
 wo+0.024
ho+0.023
 loves+0.021
aa+0.021
Neg logit contributions
 avail-0.032
amas-0.032
 subtract-0.029
-0.029
ulum-0.028
Token,
Token ID11
Feature activation-2.84e-03

Pos logit contributions
amas+0.010
 subtract+0.010
 magnet+0.010
 avail+0.010
 Tears+0.010
Neg logit contributions
 ostr-0.008
 wo-0.007
aa-0.007
ho-0.007
 loves-0.006
Token "
Token ID366
Feature activation-1.84e-02

Pos logit contributions
amas+0.005
 subtract+0.004
 Tears+0.004
+0.004
 magnet+0.004
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
aa-0.004
 loves-0.004
TokenI
Token ID40
Feature activation-9.81e-03

Pos logit contributions
amas+0.031
 avail+0.030
 magnet+0.028
 subtract+0.027
ulum+0.027
Neg logit contributions
 ostr-0.034
ho-0.032
 wo-0.031
aa-0.030
 loves-0.028
Token.
Token ID13
Feature activation-5.42e-03

Pos logit contributions
amas+0.034
 avail+0.033
+0.032
 magnet+0.031
 subtract+0.031
Neg logit contributions
 ostr-0.015
 wo-0.013
ho-0.013
 loves-0.012
aa-0.011
Token It
Token ID632
Feature activation-2.43e-03

Pos logit contributions
smanship+0.011
 avail+0.011
ivering+0.010
oon+0.010
inge+0.010
Neg logit contributions
 wo-0.009
 ostr-0.007
 loves-0.007
:-0.007
ho-0.007
Token was
Token ID373
Feature activation+1.20e-02

Pos logit contributions
amas+0.006
 avail+0.006
 subtract+0.006
+0.006
ados+0.006
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
 would-0.002
 loves-0.002
Token a
Token ID257
Feature activation-1.14e-02

Pos logit contributions
 ostr+0.017
 wo+0.015
ho+0.014
aa+0.013
 loves+0.012
Neg logit contributions
amas-0.026
 avail-0.024
 subtract-0.024
-0.023
 magnet-0.023
Token grass
Token ID8701
Feature activation+1.52e-02

Pos logit contributions
amas+0.022
 subtract+0.020
 avail+0.020
iece+0.019
+0.019
Neg logit contributions
 ostr-0.018
 wo-0.017
ho-0.016
aa-0.015
 loves-0.015
Tokenho
Token ID8873
Feature activation+8.79e-02

Pos logit contributions
 ostr+0.022
 wo+0.020
ho+0.020
aa+0.017
 loves+0.016
Neg logit contributions
amas-0.032
 avail-0.030
 subtract-0.030
 magnet-0.028
 Tears-0.028
Tokenpper
Token ID2848
Feature activation+1.82e-02

Pos logit contributions
 ostr+0.147
 wo+0.141
 loves+0.126
ho+0.124
aa+0.122
Neg logit contributions
amas-0.155
-0.140
 avail-0.140
 subtract-0.139
iece-0.138
Token!
Token ID0
Feature activation+3.77e-03

Pos logit contributions
 ostr+0.030
 wo+0.029
ho+0.027
aa+0.026
 loves+0.025
Neg logit contributions
amas-0.032
 subtract-0.031
 avail-0.031
 magnet-0.029
-0.029
Token Tom
Token ID4186
Feature activation-1.79e-02

Pos logit contributions
 wo+0.005
 ostr+0.004
 loves+0.004
aa+0.004
ho+0.004
Neg logit contributions
 avail-0.008
smanship-0.008
oon-0.008
amas-0.008
inge-0.008
Token was
Token ID373
Feature activation+1.17e-02

Pos logit contributions
amas+0.035
 avail+0.033
 subtract+0.032
 magnet+0.032
+0.031
Neg logit contributions
 ostr-0.027
 wo-0.026
ho-0.024
aa-0.023
 loves-0.022
Token so
Token ID523
Feature activation-1.16e-02

Pos logit contributions
 ostr+0.019
 wo+0.018
ho+0.016
aa+0.015
 loves+0.015
Neg logit contributions
amas-0.023
 avail-0.022
 subtract-0.021
 magnet-0.020
-0.020
Token in
Token ID287
Feature activation-1.05e-02

Pos logit contributions
amas+0.007
 avail+0.007
 subtract+0.007
 magnet+0.007
+0.007
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
aa-0.004
 loves-0.004
Token circles
Token ID13332
Feature activation-1.61e-03

Pos logit contributions
amas+0.017
+0.015
 avail+0.015
 subtract+0.015
 magnet+0.015
Neg logit contributions
 ostr-0.020
 wo-0.019
ho-0.017
 loves-0.017
aa-0.017
Token with
Token ID351
Feature activation-7.91e-03

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
 magnet+0.003
+0.003
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token the
Token ID262
Feature activation-2.33e-03

Pos logit contributions
amas+0.014
 avail+0.013
 subtract+0.013
 magnet+0.013
+0.012
Neg logit contributions
 ostr-0.013
 wo-0.013
ho-0.012
aa-0.012
 loves-0.011
Token grass
Token ID8701
Feature activation+1.69e-02

Pos logit contributions
amas+0.004
 subtract+0.003
 avail+0.003
 magnet+0.003
 Tears+0.003
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Tokenho
Token ID8873
Feature activation+8.77e-02

Pos logit contributions
 ostr+0.025
 wo+0.023
ho+0.021
aa+0.021
 loves+0.020
Neg logit contributions
amas-0.032
 subtract-0.031
 avail-0.030
 magnet-0.030
-0.030
Tokenpper
Token ID2848
Feature activation+2.05e-02

Pos logit contributions
 ostr+0.152
 wo+0.142
ho+0.126
 loves+0.125
aa+0.122
Neg logit contributions
amas-0.152
 avail-0.139
-0.138
 subtract-0.137
iece-0.136
Token.
Token ID13
Feature activation+1.57e-03

Pos logit contributions
 ostr+0.032
 wo+0.030
ho+0.028
aa+0.028
 loves+0.026
Neg logit contributions
amas-0.037
 subtract-0.035
 magnet-0.035
 Tears-0.034
 avail-0.034
Token They
Token ID1119
Feature activation+4.40e-03

Pos logit contributions
 wo+0.003
 ostr+0.003
aa+0.003
 loves+0.003
ho+0.003
Neg logit contributions
 avail-0.002
amas-0.002
smanship-0.002
oon-0.002
 magnet-0.002
Token jumped
Token ID11687
Feature activation+1.34e-02

Pos logit contributions
 ostr+0.008
 wo+0.008
aa+0.007
ho+0.007
 loves+0.007
Neg logit contributions
 subtract-0.006
amas-0.006
 avail-0.006
 magnet-0.006
 Tears-0.006
Token and
Token ID290
Feature activation-6.25e-03

Pos logit contributions
 ostr+0.024
 wo+0.021
ho+0.021
aa+0.021
 loves+0.020
Neg logit contributions
amas-0.022
 subtract-0.020
 boulder-0.020
 magnet-0.020
 Tears-0.020
Token
Token ID26391
Feature activation-1.62e-02

Pos logit contributions
 avail+0.009
amas+0.009
 magnet+0.009
 subtract+0.008
smanship+0.008
Neg logit contributions
 ostr-0.010
 wo-0.009
 loves-0.008
aa-0.008
ho-0.008
Token
Token ID129
Feature activation-1.20e-02

Pos logit contributions
 avail+0.029
amas+0.027
 subtract+0.027
 magnet+0.027
smanship+0.026
Neg logit contributions
 wo-0.026
 ostr-0.026
ho-0.024
 loves-0.023
aa-0.023
Token
Token ID241
Feature activation+9.37e-04

Pos logit contributions
amas+0.017
 subtract+0.016
ulum+0.016
 avail+0.015
 Tears+0.015
Neg logit contributions
 ostr-0.024
 wo-0.023
ho-0.022
aa-0.021
 loves-0.020
TokenGr
Token ID8642
Feature activation+7.45e-03

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.001
 avail-0.001
 subtract-0.001
 magnet-0.001
-0.001
Tokenass
Token ID562
Feature activation+2.76e-03

Pos logit contributions
 ostr+0.015
 wo+0.014
ho+0.013
aa+0.013
 loves+0.012
Neg logit contributions
amas-0.011
 subtract-0.010
 avail-0.010
 magnet-0.010
 Tears-0.010
Tokenho
Token ID8873
Feature activation+8.76e-02

Pos logit contributions
 ostr+0.004
 wo+0.004
aa+0.004
ho+0.003
 loves+0.003
Neg logit contributions
 magnet-0.005
on-0.005
lf-0.005
oon-0.005
 boulder-0.005
Tokenpper
Token ID2848
Feature activation+1.82e-02

Pos logit contributions
 ostr+0.138
 wo+0.130
ho+0.119
aa+0.116
 loves+0.112
Neg logit contributions
amas-0.163
 avail-0.153
 subtract-0.153
 magnet-0.149
-0.149
Token,
Token ID11
Feature activation-1.40e-03

Pos logit contributions
 ostr+0.030
 wo+0.029
ho+0.027
aa+0.026
 loves+0.025
Neg logit contributions
amas-0.033
 avail-0.031
 subtract-0.031
 magnet-0.029
-0.029
Token let
Token ID1309
Feature activation-1.28e-02

Pos logit contributions
amas+0.003
 avail+0.002
 subtract+0.002
 magnet+0.002
+0.002
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Tokenâ
Token ID22940
Feature activation-9.75e-03

Pos logit contributions
amas+0.021
 avail+0.020
 magnet+0.019
 subtract+0.019
 boulder+0.019
Neg logit contributions
 ostr-0.022
 wo-0.021
ho-0.021
 loves-0.019
aa-0.019
Token
Token ID26391
Feature activation-1.69e-02

Pos logit contributions
amas+0.017
 avail+0.016
 magnet+0.016
 subtract+0.016
 Tears+0.015
Neg logit contributions
 ostr-0.016
 wo-0.015
aa-0.014
ho-0.014
 loves-0.014
Token so
Token ID523
Feature activation-1.02e-02

Pos logit contributions
 ostr+0.025
 wo+0.024
ho+0.024
aa+0.022
 loves+0.021
Neg logit contributions
amas-0.022
 subtract-0.021
 avail-0.020
 Tears-0.020
-0.019
Token happy
Token ID3772
Feature activation-1.92e-02

Pos logit contributions
amas+0.014
 subtract+0.014
 Tears+0.013
 avail+0.013
+0.013
Neg logit contributions
 ostr-0.020
 wo-0.019
ho-0.019
aa-0.017
 loves-0.016
Token that
Token ID326
Feature activation+2.01e-02

Pos logit contributions
amas+0.039
 avail+0.037
 subtract+0.037
 magnet+0.036
+0.035
Neg logit contributions
 ostr-0.027
 wo-0.026
ho-0.024
aa-0.023
 loves-0.022
Token the
Token ID262
Feature activation-4.27e-03

Pos logit contributions
 ostr+0.046
 wo+0.044
ho+0.042
aa+0.041
 loves+0.040
Neg logit contributions
amas-0.025
 subtract-0.022
 avail-0.021
-0.021
 magnet-0.020
Token grass
Token ID8701
Feature activation+1.52e-02

Pos logit contributions
amas+0.008
 subtract+0.007
 avail+0.007
 magnet+0.007
+0.007
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
aa-0.006
 loves-0.006
Tokenho
Token ID8873
Feature activation+8.74e-02

Pos logit contributions
 ostr+0.023
ho+0.021
 wo+0.021
aa+0.019
 loves+0.018
Neg logit contributions
amas-0.030
 avail-0.030
 subtract-0.029
ulum-0.027
 magnet-0.027
Tokenpper
Token ID2848
Feature activation+1.89e-02

Pos logit contributions
 ostr+0.144
 wo+0.136
ho+0.125
aa+0.122
 loves+0.118
Neg logit contributions
amas-0.157
 avail-0.146
 subtract-0.146
 magnet-0.143
-0.143
Token had
Token ID550
Feature activation+8.48e-03

Pos logit contributions
 ostr+0.028
 wo+0.027
ho+0.025
aa+0.024
 loves+0.023
Neg logit contributions
amas-0.037
 avail-0.035
 subtract-0.035
 magnet-0.034
-0.034
Token said
Token ID531
Feature activation-5.47e-03

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.013
aa+0.012
 loves+0.012
Neg logit contributions
amas-0.015
 subtract-0.014
 avail-0.014
 Tears-0.014
 magnet-0.013
Token yes
Token ID3763
Feature activation-1.68e-02

Pos logit contributions
amas+0.009
 subtract+0.009
 magnet+0.009
 avail+0.009
 Tears+0.009
Neg logit contributions
 ostr-0.009
ho-0.008
 wo-0.008
aa-0.007
 loves-0.007
Token.
Token ID13
Feature activation-1.30e-04

Pos logit contributions
amas+0.024
 subtract+0.023
 avail+0.023
 Tears+0.021
 boulder+0.021
Neg logit contributions
 ostr-0.033
ho-0.030
 wo-0.028
aa-0.028
 loves-0.027
TokenThen
Token ID6423
Feature activation+1.14e-02

Pos logit contributions
 ostr+0.012
 wo+0.012
ho+0.011
aa+0.011
 loves+0.010
Neg logit contributions
amas-0.008
 avail-0.008
 subtract-0.008
 magnet-0.007
-0.007
Token Baby
Token ID14801
Feature activation+2.05e-02

Pos logit contributions
 ostr+0.015
 wo+0.014
ho+0.013
aa+0.013
 loves+0.012
Neg logit contributions
amas-0.024
 subtract-0.023
 avail-0.022
 magnet-0.022
ulum-0.021
Token saw
Token ID2497
Feature activation-1.35e-02

Pos logit contributions
 ostr+0.034
 wo+0.032
ho+0.030
aa+0.029
 loves+0.028
Neg logit contributions
amas-0.036
 subtract-0.034
 avail-0.034
 magnet-0.033
 Tears-0.032
Token some
Token ID617
Feature activation+8.46e-03

Pos logit contributions
amas+0.021
 avail+0.020
 Tears+0.020
 subtract+0.020
 magnet+0.020
Neg logit contributions
 ostr-0.024
 wo-0.024
ho-0.023
aa-0.022
 loves-0.021
Token grass
Token ID8701
Feature activation+1.55e-02

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.012
aa+0.012
 loves+0.011
Neg logit contributions
amas-0.016
 avail-0.015
 subtract-0.015
-0.014
ulum-0.014
Tokenho
Token ID8873
Feature activation+8.73e-02

Pos logit contributions
 ostr+0.021
 wo+0.020
aa+0.017
ho+0.017
 loves+0.017
Neg logit contributions
amas-0.032
 subtract-0.030
 avail-0.030
 magnet-0.030
-0.030
Tokenppers
Token ID11799
Feature activation+3.24e-03

Pos logit contributions
 ostr+0.141
 wo+0.133
ho+0.120
 loves+0.117
aa+0.116
Neg logit contributions
amas-0.161
 avail-0.150
 subtract-0.148
-0.146
 magnet-0.144
Token in
Token ID287
Feature activation-1.20e-02

Pos logit contributions
 ostr+0.005
 wo+0.005
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.006
 avail-0.006
 subtract-0.006
 magnet-0.006
-0.006
Token the
Token ID262
Feature activation-3.64e-03

Pos logit contributions
amas+0.031
 subtract+0.028
iece+0.027
oon+0.027
ulum+0.027
Neg logit contributions
 ostr-0.013
 wo-0.010
ho-0.010
 their-0.009
aa-0.008
Token garden
Token ID11376
Feature activation-5.84e-03

Pos logit contributions
amas+0.005
 avail+0.005
 subtract+0.005
+0.005
 magnet+0.005
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token.
Token ID13
Feature activation-5.78e-03

Pos logit contributions
amas+0.013
 avail+0.013
 subtract+0.012
 magnet+0.012
ulum+0.012
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
 loves-0.006
aa-0.005
Token and
Token ID290
Feature activation-6.66e-03

Pos logit contributions
 ostr+0.024
 wo+0.023
ho+0.021
aa+0.020
 loves+0.020
Neg logit contributions
amas-0.033
 avail-0.031
 subtract-0.031
-0.030
 magnet-0.030
Token squ
Token ID2809
Feature activation+7.35e-03

Pos logit contributions
amas+0.013
 avail+0.012
 subtract+0.012
 magnet+0.012
ulum+0.011
Neg logit contributions
 ostr-0.010
 wo-0.010
ho-0.009
aa-0.009
 loves-0.009
Tokenished
Token ID1348
Feature activation-2.89e-03

Pos logit contributions
 ostr+0.013
 wo+0.013
aa+0.012
ho+0.012
 loves+0.011
Neg logit contributions
amas-0.012
 avail-0.011
 subtract-0.011
 magnet-0.011
lf-0.011
Token the
Token ID262
Feature activation-2.24e-03

Pos logit contributions
amas+0.006
 subtract+0.005
 avail+0.005
 magnet+0.005
+0.005
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.003
Token grass
Token ID8701
Feature activation+1.66e-02

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
 magnet+0.003
+0.003
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Tokenho
Token ID8873
Feature activation+8.72e-02

Pos logit contributions
 ostr+0.025
 wo+0.023
ho+0.021
aa+0.021
 loves+0.020
Neg logit contributions
amas-0.032
 subtract-0.030
 avail-0.030
 magnet-0.030
-0.029
Tokenppers
Token ID11799
Feature activation+5.10e-03

Pos logit contributions
 ostr+0.147
 wo+0.138
 loves+0.121
ho+0.121
aa+0.119
Neg logit contributions
amas-0.156
-0.143
 avail-0.142
 subtract-0.138
iece-0.138
Token with
Token ID351
Feature activation-8.36e-03

Pos logit contributions
 ostr+0.008
 wo+0.007
ho+0.007
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.010
 avail-0.009
 subtract-0.009
 magnet-0.009
-0.009
Token her
Token ID607
Feature activation+2.35e-03

Pos logit contributions
amas+0.017
 subtract+0.016
 avail+0.016
 magnet+0.015
ulum+0.015
Neg logit contributions
 ostr-0.012
 wo-0.011
ho-0.011
aa-0.010
 loves-0.010
Token fingers
Token ID9353
Feature activation-4.24e-03

Pos logit contributions
 ostr+0.005
 wo+0.005
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.003
 magnet-0.003
 subtract-0.003
 avail-0.003
-0.003
Token to
Token ID284
Feature activation+7.42e-03

Pos logit contributions
amas+0.008
 subtract+0.008
 avail+0.008
 magnet+0.008
 Tears+0.008
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.005
aa-0.005
 loves-0.005
Token time
Token ID640
Feature activation+9.07e-03

Pos logit contributions
amas+0.009
 avail+0.009
 subtract+0.009
 magnet+0.009
+0.008
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token playing
Token ID2712
Feature activation-2.28e-03

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.012
aa+0.012
 loves+0.011
Neg logit contributions
amas-0.017
 avail-0.016
 subtract-0.016
 magnet-0.016
-0.015
Token with
Token ID351
Feature activation-1.13e-02

Pos logit contributions
amas+0.004
 subtract+0.004
 avail+0.004
 Tears+0.004
ulum+0.004
Neg logit contributions
 ostr-0.003
 wo-0.003
 for-0.003
ho-0.003
 loves-0.003
Token her
Token ID607
Feature activation+3.79e-03

Pos logit contributions
amas+0.027
 subtract+0.024
 avail+0.024
ulum+0.024
+0.024
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.010
 loves-0.010
aa-0.010
Token or
Token ID393
Feature activation-1.81e-02

Pos logit contributions
 ostr+0.007
 wo+0.006
ho+0.006
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.006
 avail-0.006
 subtract-0.006
 magnet-0.006
-0.006
Tokennam
Token ID7402
Feature activation-9.26e-02

Pos logit contributions
amas+0.035
+0.032
 subtract+0.032
 avail+0.032
oon+0.031
Neg logit contributions
 ostr-0.028
 wo-0.026
ho-0.025
 loves-0.023
aa-0.023
Tokenents
Token ID658
Feature activation-6.44e-03

Pos logit contributions
amas+0.133
 avail+0.133
 boulder+0.130
+0.127
 magnet+0.124
Neg logit contributions
 ostr-0.172
 wo-0.162
ho-0.161
aa-0.157
 loves-0.149
Token and
Token ID290
Feature activation-5.60e-03

Pos logit contributions
amas+0.012
 subtract+0.012
 avail+0.012
 magnet+0.011
+0.011
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.008
aa-0.008
 loves-0.008
Token making
Token ID1642
Feature activation+3.82e-03

Pos logit contributions
amas+0.010
 avail+0.009
+0.009
 subtract+0.009
 magnet+0.009
Neg logit contributions
 ostr-0.009
 wo-0.009
ho-0.009
aa-0.008
 loves-0.008
Token her
Token ID607
Feature activation+2.66e-03

Pos logit contributions
 ostr+0.006
 wo+0.005
ho+0.005
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.007
 avail-0.007
 subtract-0.007
 magnet-0.007
-0.007
Token special
Token ID2041
Feature activation-3.56e-04

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.005
 avail-0.005
 subtract-0.005
 magnet-0.004
-0.004
Token and
Token ID290
Feature activation+1.11e-03

Pos logit contributions
 ostr+0.019
 wo+0.018
ho+0.017
aa+0.016
 loves+0.015
Neg logit contributions
amas-0.024
 avail-0.022
 subtract-0.022
 magnet-0.021
-0.021
Token play
Token ID711
Feature activation-1.26e-02

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.001
Neg logit contributions
amas-0.002
 avail-0.002
 subtract-0.002
 magnet-0.002
 Tears-0.002
Token with
Token ID351
Feature activation-4.57e-03

Pos logit contributions
 subtract+0.022
amas+0.022
 avail+0.022
 Tears+0.021
+0.020
Neg logit contributions
 ostr-0.020
 wo-0.020
 loves-0.018
ho-0.017
 for-0.016
Token your
Token ID534
Feature activation+1.27e-02

Pos logit contributions
amas+0.008
 subtract+0.008
 avail+0.008
+0.008
 magnet+0.008
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
aa-0.006
 loves-0.006
Token or
Token ID393
Feature activation-1.28e-02

Pos logit contributions
 ostr+0.024
 wo+0.023
ho+0.022
aa+0.022
 loves+0.020
Neg logit contributions
amas-0.019
 magnet-0.018
 avail-0.018
 subtract-0.018
-0.017
Tokennam
Token ID7402
Feature activation-9.26e-02

Pos logit contributions
amas+0.025
 subtract+0.024
 avail+0.024
+0.023
 magnet+0.023
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.017
aa-0.016
 loves-0.015
Tokenents
Token ID658
Feature activation-1.02e-03

Pos logit contributions
 avail+0.132
amas+0.131
 boulder+0.127
+0.125
 subtract+0.122
Neg logit contributions
 ostr-0.176
 wo-0.165
ho-0.163
aa-0.161
 loves-0.152
Token.
Token ID13
Feature activation-6.16e-04

Pos logit contributions
amas+0.002
 subtract+0.002
 magnet+0.002
 avail+0.002
oon+0.002
Neg logit contributions
 ostr-0.002
 wo-0.002
aa-0.001
ho-0.001
 loves-0.001
Token Maybe
Token ID6674
Feature activation-7.32e-03

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
smanship+0.001
 boulder+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
 loves-0.001
aa-0.001
Token you
Token ID345
Feature activation+2.01e-02

Pos logit contributions
 avail+0.012
amas+0.012
 subtract+0.012
 magnet+0.012
lf+0.011
Neg logit contributions
 ostr-0.012
 wo-0.012
ho-0.011
aa-0.011
 loves-0.010
Token can
Token ID460
Feature activation+8.54e-03

Pos logit contributions
 ostr+0.022
 wo+0.020
ho+0.018
aa+0.017
 loves+0.015
Neg logit contributions
amas-0.047
 magnet-0.045
 subtract-0.045
 avail-0.045
-0.044
Token tree
Token ID5509
Feature activation-3.02e-03

Pos logit contributions
 ostr+0.013
 wo+0.012
ho+0.011
 loves+0.010
aa+0.010
Neg logit contributions
amas-0.012
 subtract-0.011
iece-0.011
 avail-0.011
 Tears-0.011
Token with
Token ID351
Feature activation-8.08e-03

Pos logit contributions
amas+0.006
 avail+0.006
 subtract+0.006
 magnet+0.006
+0.006
Neg logit contributions
 ostr-0.004
ho-0.004
 wo-0.004
 loves-0.003
aa-0.003
Token lights
Token ID7588
Feature activation+1.76e-02

Pos logit contributions
amas+0.015
 avail+0.014
 subtract+0.014
 magnet+0.014
+0.014
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.011
aa-0.011
 loves-0.011
Token and
Token ID290
Feature activation-5.49e-03

Pos logit contributions
 ostr+0.027
 wo+0.026
ho+0.024
aa+0.024
 loves+0.022
Neg logit contributions
amas-0.033
 subtract-0.031
 avail-0.030
 magnet-0.030
 Tears-0.030
Token or
Token ID393
Feature activation-1.72e-02

Pos logit contributions
amas+0.010
 avail+0.009
+0.009
 subtract+0.009
 magnet+0.009
Neg logit contributions
 ostr-0.009
 wo-0.009
ho-0.008
 loves-0.008
aa-0.008
Tokennam
Token ID7402
Feature activation-9.09e-02

Pos logit contributions
amas+0.031
 avail+0.028
+0.028
 subtract+0.028
 magnet+0.027
Neg logit contributions
 ostr-0.030
 wo-0.028
ho-0.026
aa-0.024
 loves-0.024
Tokenents
Token ID658
Feature activation-4.49e-03

Pos logit contributions
amas+0.127
 avail+0.124
+0.120
 boulder+0.119
 subtract+0.119
Neg logit contributions
 ostr-0.178
 wo-0.167
ho-0.164
aa-0.161
 loves-0.151
Token.
Token ID13
Feature activation-4.02e-03

Pos logit contributions
amas+0.008
 subtract+0.008
 avail+0.007
 magnet+0.007
iece+0.007
Neg logit contributions
 ostr-0.007
 wo-0.007
aa-0.006
ho-0.006
 loves-0.006
Token They
Token ID1119
Feature activation+3.36e-03

Pos logit contributions
 avail+0.008
amas+0.008
 subtract+0.008
smanship+0.008
 magnet+0.008
Neg logit contributions
 ostr-0.006
 wo-0.005
ho-0.005
aa-0.004
 loves-0.004
Token also
Token ID635
Feature activation+4.03e-03

Pos logit contributions
 ostr+0.005
 wo+0.005
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.007
 avail-0.006
 subtract-0.006
-0.006
 magnet-0.006
Token saw
Token ID2497
Feature activation-1.07e-02

Pos logit contributions
 ostr+0.007
 wo+0.006
aa+0.006
ho+0.006
 loves+0.005
Neg logit contributions
 subtract-0.007
amas-0.007
 avail-0.007
 magnet-0.007
 Tears-0.006
Token the
Token ID262
Feature activation-1.58e-03

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.001
Neg logit contributions
amas-0.003
 avail-0.003
 subtract-0.003
 magnet-0.002
-0.002
Token rest
Token ID1334
Feature activation-7.14e-03

Pos logit contributions
amas+0.002
 avail+0.002
 subtract+0.002
+0.002
 Tears+0.002
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.003
 loves-0.003
Token of
Token ID286
Feature activation-1.84e-02

Pos logit contributions
amas+0.013
 subtract+0.012
 avail+0.012
 magnet+0.011
+0.011
Neg logit contributions
 ostr-0.012
 wo-0.011
ho-0.010
aa-0.010
 loves-0.010
Token the
Token ID262
Feature activation-5.83e-04

Pos logit contributions
amas+0.041
 subtract+0.037
 avail+0.037
oon+0.036
 Tears+0.036
Neg logit contributions
 ostr-0.024
 wo-0.022
ho-0.020
aa-0.019
 loves-0.018
Token or
Token ID393
Feature activation-1.39e-02

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
 Tears+0.001
+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
 loves-0.001
aa-0.001
Tokennam
Token ID7402
Feature activation-8.99e-02

Pos logit contributions
amas+0.027
+0.025
oon+0.024
 avail+0.024
 subtract+0.024
Neg logit contributions
 ostr-0.022
ho-0.020
 wo-0.020
 loves-0.018
aa-0.017
Tokenents
Token ID658
Feature activation-2.75e-03

Pos logit contributions
 avail+0.137
amas+0.133
 boulder+0.130
+0.127
 magnet+0.126
Neg logit contributions
 ostr-0.162
ho-0.154
aa-0.150
 wo-0.149
 loves-0.138
Token onto
Token ID4291
Feature activation+1.84e-03

Pos logit contributions
amas+0.005
 subtract+0.005
 avail+0.005
 magnet+0.005
+0.005
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.003
Token the
Token ID262
Feature activation-3.95e-03

Pos logit contributions
 ostr+0.003
 wo+0.002
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.004
 subtract-0.004
 avail-0.004
 magnet-0.003
 Tears-0.003
Token tree
Token ID5509
Feature activation-6.05e-03

Pos logit contributions
amas+0.005
 avail+0.005
 subtract+0.005
+0.004
 magnet+0.004
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.008
aa-0.007
 loves-0.007
Token.
Token ID13
Feature activation+1.68e-03

Pos logit contributions
amas+0.012
 avail+0.012
 subtract+0.012
 magnet+0.011
+0.011
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
 loves-0.007
Token he
Token ID339
Feature activation-1.13e-02

Pos logit contributions
amas+0.010
+0.009
 avail+0.009
oon+0.009
pen+0.009
Neg logit contributions
 ostr-0.007
 loves-0.007
 wo-0.007
ho-0.006
 would-0.006
Token especially
Token ID2592
Feature activation-2.18e-03

Pos logit contributions
amas+0.023
 subtract+0.022
 avail+0.022
 magnet+0.022
+0.021
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.013
aa-0.013
 loves-0.012
Token liked
Token ID8288
Feature activation-9.05e-04

Pos logit contributions
amas+0.005
 avail+0.004
 subtract+0.004
+0.004
 magnet+0.004
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.002
 loves-0.002
Token the
Token ID262
Feature activation-4.39e-03

Pos logit contributions
amas+0.002
 subtract+0.002
 magnet+0.002
 avail+0.002
 Tears+0.002
Neg logit contributions
 ostr-0.001
 wo-0.001
aa-0.001
ho-0.001
 loves-0.001
Token or
Token ID393
Feature activation-1.89e-02

Pos logit contributions
amas+0.008
+0.008
 avail+0.008
 subtract+0.008
oon+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
 loves-0.006
ho-0.006
aa-0.005
Tokennam
Token ID7402
Feature activation-8.99e-02

Pos logit contributions
amas+0.039
+0.036
oon+0.035
 avail+0.034
icycle+0.033
Neg logit contributions
 ostr-0.029
 wo-0.027
 loves-0.025
ho-0.024
 would-0.022
Tokenents
Token ID658
Feature activation-7.12e-03

Pos logit contributions
 avail+0.132
amas+0.129
 boulder+0.128
+0.125
 subtract+0.123
Neg logit contributions
 ostr-0.166
ho-0.153
 wo-0.152
aa-0.152
 loves-0.142
Token he
Token ID339
Feature activation-1.27e-02

Pos logit contributions
amas+0.014
 avail+0.013
 subtract+0.013
 magnet+0.013
+0.013
Neg logit contributions
 ostr-0.010
 wo-0.010
ho-0.009
aa-0.009
 loves-0.008
Token kept
Token ID4030
Feature activation-9.11e-03

Pos logit contributions
amas+0.025
 subtract+0.024
 avail+0.024
 magnet+0.023
+0.023
Neg logit contributions
 ostr-0.018
 wo-0.017
ho-0.016
aa-0.016
 loves-0.014
Token around
Token ID1088
Feature activation+2.45e-04

Pos logit contributions
amas+0.017
 avail+0.016
 subtract+0.016
 magnet+0.016
+0.016
Neg logit contributions
 ostr-0.014
 wo-0.013
ho-0.012
aa-0.012
 loves-0.011
Token.
Token ID13
Feature activation+2.91e-03

Pos logit contributions
 ostr+0.000
 wo+0.000
ho+0.000
 loves+0.000
aa+0.000
Neg logit contributions
amas-0.001
 avail-0.000
 subtract-0.000
-0.000
ulum-0.000
Token a
Token ID257
Feature activation-6.97e-03

Pos logit contributions
 ostr+0.019
 wo+0.018
ho+0.018
aa+0.017
 loves+0.016
Neg logit contributions
amas-0.021
 subtract-0.021
 avail-0.020
 magnet-0.019
 Tears-0.019
Token box
Token ID3091
Feature activation-2.45e-02

Pos logit contributions
amas+0.013
 magnet+0.013
 avail+0.013
 subtract+0.013
+0.013
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.009
aa-0.009
 loves-0.008
Token of
Token ID286
Feature activation-2.30e-02

Pos logit contributions
amas+0.046
 avail+0.046
 subtract+0.044
ulum+0.042
+0.042
Neg logit contributions
 ostr-0.037
 wo-0.036
ho-0.033
aa-0.032
 loves-0.031
Token shiny
Token ID22441
Feature activation-2.97e-03

Pos logit contributions
amas+0.038
 avail+0.035
 subtract+0.035
+0.034
 magnet+0.033
Neg logit contributions
 ostr-0.042
 wo-0.040
ho-0.037
aa-0.036
 loves-0.035
Token or
Token ID393
Feature activation-1.56e-02

Pos logit contributions
amas+0.005
 avail+0.004
 subtract+0.004
+0.004
 Tears+0.004
Neg logit contributions
 ostr-0.006
 wo-0.005
ho-0.005
 loves-0.005
aa-0.005
Tokennam
Token ID7402
Feature activation-8.92e-02

Pos logit contributions
amas+0.024
+0.022
 avail+0.021
oon+0.021
 boulder+0.021
Neg logit contributions
 ostr-0.031
 wo-0.029
ho-0.027
 loves-0.027
aa-0.025
Tokenents
Token ID658
Feature activation-4.74e-03

Pos logit contributions
 avail+0.135
amas+0.131
 boulder+0.130
+0.127
roads+0.125
Neg logit contributions
 ostr-0.161
ho-0.149
 wo-0.149
aa-0.146
 loves-0.137
Token that
Token ID326
Feature activation+2.45e-02

Pos logit contributions
amas+0.009
 subtract+0.009
 avail+0.008
 magnet+0.008
+0.008
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
aa-0.006
 loves-0.006
Token spark
Token ID9009
Feature activation-5.10e-03

Pos logit contributions
 ostr+0.040
 wo+0.037
ho+0.036
 loves+0.033
aa+0.033
Neg logit contributions
amas-0.045
 avail-0.043
 subtract-0.041
ulum-0.041
 boulder-0.040
Tokenled
Token ID992
Feature activation+1.36e-02

Pos logit contributions
amas+0.007
 subtract+0.007
 avail+0.007
 magnet+0.007
oon+0.006
Neg logit contributions
 ostr-0.010
 wo-0.010
ho-0.009
aa-0.009
 loves-0.008
Token in
Token ID287
Feature activation-1.30e-02

Pos logit contributions
 ostr+0.019
 wo+0.018
ho+0.017
aa+0.016
 loves+0.015
Neg logit contributions
amas-0.028
 avail-0.026
 subtract-0.026
 magnet-0.025
-0.025
Token many
Token ID867
Feature activation+5.82e-03

Pos logit contributions
 ostr+0.008
 wo+0.007
ho+0.007
aa+0.007
 loves+0.006
Neg logit contributions
amas-0.010
 subtract-0.009
 avail-0.009
 magnet-0.009
 Tears-0.009
Token colors
Token ID7577
Feature activation-4.81e-03

Pos logit contributions
 ostr+0.010
 wo+0.010
ho+0.009
 loves+0.009
aa+0.009
Neg logit contributions
amas-0.010
 avail-0.009
 subtract-0.009
-0.009
ulum-0.009
Token.
Token ID13
Feature activation-5.03e-03

Pos logit contributions
amas+0.009
 subtract+0.008
 magnet+0.008
 avail+0.008
 Tears+0.008
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token Some
Token ID2773
Feature activation+2.33e-02

Pos logit contributions
amas+0.009
 avail+0.009
 subtract+0.009
 boulder+0.008
 magnet+0.008
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
 loves-0.007
Token or
Token ID393
Feature activation-2.11e-02

Pos logit contributions
 ostr+0.037
 wo+0.035
ho+0.032
aa+0.031
 loves+0.031
Neg logit contributions
amas-0.044
 avail-0.042
 subtract-0.041
ulum-0.040
-0.040
Tokennam
Token ID7402
Feature activation-8.88e-02

Pos logit contributions
amas+0.027
 avail+0.024
 subtract+0.024
+0.023
 magnet+0.023
Neg logit contributions
 ostr-0.046
 wo-0.044
ho-0.042
aa-0.041
 loves-0.040
Tokenents
Token ID658
Feature activation-8.69e-03

Pos logit contributions
amas+0.125
 avail+0.121
+0.116
 boulder+0.116
 subtract+0.114
Neg logit contributions
 ostr-0.174
 wo-0.164
ho-0.161
aa-0.158
 loves-0.149
Token were
Token ID547
Feature activation-4.45e-03

Pos logit contributions
amas+0.015
 subtract+0.015
 avail+0.015
 magnet+0.014
+0.014
Neg logit contributions
 ostr-0.014
 wo-0.013
aa-0.012
ho-0.012
 loves-0.011
Token round
Token ID2835
Feature activation-9.07e-04

Pos logit contributions
 subtract+0.007
amas+0.007
 magnet+0.007
 avail+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.007
 wo-0.007
aa-0.007
ho-0.007
 loves-0.006
Token,
Token ID11
Feature activation-5.18e-03

Pos logit contributions
amas+0.002
 subtract+0.002
 avail+0.002
 magnet+0.002
+0.002
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token some
Token ID617
Feature activation+5.70e-03

Pos logit contributions
amas+0.011
 avail+0.010
icycle+0.010
 boulder+0.010
oon+0.010
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
 loves-0.006
 for-0.006
Token to
Token ID284
Feature activation+8.56e-03

Pos logit contributions
 ostr+0.000
 wo+0.000
ho+0.000
 loves+0.000
aa+0.000
Neg logit contributions
 subtract-0.000
 magnet-0.000
amas-0.000
 basketball-0.000
 magnets-0.000
Token look
Token ID804
Feature activation-4.72e-03

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.012
aa+0.012
 loves+0.011
Neg logit contributions
amas-0.016
 avail-0.015
 subtract-0.015
 Tears-0.014
-0.014
Token at
Token ID379
Feature activation-3.27e-02

Pos logit contributions
amas+0.008
 magnet+0.008
 avail+0.007
 basketball+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.008
ho-0.008
 wo-0.007
aa-0.007
 ost-0.006
Token the
Token ID262
Feature activation-5.82e-03

Pos logit contributions
amas+0.063
 subtract+0.060
 avail+0.060
+0.058
 magnet+0.057
Neg logit contributions
 ostr-0.049
 wo-0.046
ho-0.043
aa-0.041
 loves-0.040
Token or
Token ID393
Feature activation-1.99e-02

Pos logit contributions
amas+0.010
 avail+0.009
 subtract+0.009
 magnet+0.009
+0.009
Neg logit contributions
 ostr-0.010
 wo-0.010
ho-0.009
aa-0.009
 loves-0.009
Tokennam
Token ID7402
Feature activation-8.87e-02

Pos logit contributions
amas+0.027
+0.024
oon+0.024
 subtract+0.023
 avail+0.022
Neg logit contributions
 ostr-0.043
ho-0.041
 wo-0.041
 loves-0.038
aa-0.037
Tokenents
Token ID658
Feature activation-6.91e-03

Pos logit contributions
 avail+0.131
amas+0.129
 boulder+0.127
+0.126
 magnet+0.121
Neg logit contributions
 ostr-0.162
ho-0.152
 wo-0.148
aa-0.148
 loves-0.138
Token on
Token ID319
Feature activation-2.01e-02

Pos logit contributions
amas+0.014
 avail+0.013
 subtract+0.013
 magnet+0.013
+0.013
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.009
aa-0.008
 loves-0.008
Token the
Token ID262
Feature activation-7.35e-03

Pos logit contributions
amas+0.042
 subtract+0.040
 avail+0.039
oon+0.038
+0.038
Neg logit contributions
 ostr-0.027
 wo-0.025
ho-0.023
 loves-0.021
aa-0.021
Token tree
Token ID5509
Feature activation-8.56e-03

Pos logit contributions
amas+0.008
 magnet+0.007
 subtract+0.007
 avail+0.007
 Tears+0.006
Neg logit contributions
 ostr-0.017
 wo-0.017
aa-0.016
ho-0.016
 loves-0.016
Token.
Token ID13
Feature activation-3.66e-03

Pos logit contributions
amas+0.018
 avail+0.018
 subtract+0.017
+0.017
 magnet+0.017
Neg logit contributions
 ostr-0.011
 wo-0.010
ho-0.010
aa-0.009
 loves-0.009
Token can
Token ID460
Feature activation+3.71e-03

Pos logit contributions
 magnet+0.007
amas+0.007
 subtract+0.007
 boulder+0.006
 avail+0.006
Neg logit contributions
 ostr-0.007
aa-0.007
ho-0.007
 wo-0.007
 loves-0.006
Token we
Token ID356
Feature activation+1.94e-03

Pos logit contributions
 ostr+0.006
 wo+0.005
ho+0.005
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.007
 avail-0.007
 boulder-0.007
iece-0.007
 Tears-0.007
Token have
Token ID423
Feature activation-1.04e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.003
aa+0.003
 loves+0.002
Neg logit contributions
amas-0.004
 avail-0.003
 subtract-0.003
 Tears-0.003
 magnet-0.003
Token some
Token ID617
Feature activation+1.00e-02

Pos logit contributions
 subtract+0.015
amas+0.015
 avail+0.014
 magnet+0.014
+0.013
Neg logit contributions
ho-0.019
 ostr-0.019
aa-0.019
 wo-0.019
 loves-0.017
Token or
Token ID393
Feature activation-1.87e-02

Pos logit contributions
 ostr+0.017
 wo+0.016
ho+0.015
aa+0.015
 loves+0.014
Neg logit contributions
amas-0.017
 subtract-0.016
 avail-0.016
 magnet-0.016
-0.016
Tokennam
Token ID7402
Feature activation-8.82e-02

Pos logit contributions
amas+0.026
 avail+0.024
 subtract+0.024
 magnet+0.023
+0.023
Neg logit contributions
 ostr-0.038
 wo-0.037
ho-0.035
aa-0.034
 loves-0.033
Tokenents
Token ID658
Feature activation-7.30e-03

Pos logit contributions
amas+0.125
 avail+0.124
 boulder+0.119
+0.119
 subtract+0.116
Neg logit contributions
 ostr-0.169
 wo-0.158
ho-0.156
aa-0.153
 loves-0.143
Token for
Token ID329
Feature activation+6.15e-03

Pos logit contributions
amas+0.014
 subtract+0.013
 avail+0.013
 magnet+0.013
+0.013
Neg logit contributions
 ostr-0.011
 wo-0.010
ho-0.009
aa-0.009
 loves-0.009
Token our
Token ID674
Feature activation+1.38e-02

Pos logit contributions
 ostr+0.009
 wo+0.009
ho+0.009
aa+0.008
 loves+0.008
Neg logit contributions
amas-0.012
 subtract-0.011
 avail-0.011
 magnet-0.011
-0.011
Token snow
Token ID6729
Feature activation-5.02e-03

Pos logit contributions
 ostr+0.022
aa+0.022
 wo+0.021
ho+0.020
la+0.018
Neg logit contributions
 magnet-0.023
amas-0.023
 subtract-0.022
lf-0.022
 avail-0.021
Tokenman
Token ID805
Feature activation+7.93e-03

Pos logit contributions
amas+0.008
 avail+0.008
 magnet+0.008
 subtract+0.008
+0.007
Neg logit contributions
 ostr-0.009
 wo-0.009
ho-0.008
aa-0.008
 loves-0.008
Token All
Token ID1439
Feature activation+1.07e-02

Pos logit contributions
 ostr+0.004
 wo+0.004
 loves+0.003
aa+0.003
ho+0.003
Neg logit contributions
amas-0.006
 avail-0.006
 subtract-0.005
iece-0.005
lf-0.005
Token of
Token ID286
Feature activation-2.64e-02

Pos logit contributions
 ostr+0.018
 wo+0.017
ho+0.016
aa+0.015
 loves+0.015
Neg logit contributions
amas-0.018
 subtract-0.017
 avail-0.017
 magnet-0.017
 Tears-0.017
Token these
Token ID777
Feature activation+2.53e-02

Pos logit contributions
amas+0.054
 avail+0.049
 subtract+0.049
ulum+0.048
+0.048
Neg logit contributions
 ostr-0.039
 wo-0.036
ho-0.032
aa-0.031
 loves-0.031
Token lovely
Token ID14081
Feature activation+7.19e-03

Pos logit contributions
 ostr+0.043
 wo+0.040
ho+0.038
aa+0.037
 loves+0.036
Neg logit contributions
amas-0.044
 avail-0.042
 subtract-0.041
 magnet-0.040
-0.040
Token or
Token ID393
Feature activation-1.92e-02

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.012
 loves+0.012
aa+0.012
Neg logit contributions
amas-0.011
 avail-0.011
 subtract-0.010
-0.010
oon-0.010
Tokennam
Token ID7402
Feature activation-8.81e-02

Pos logit contributions
amas+0.027
+0.025
oon+0.024
 avail+0.023
ulum+0.022
Neg logit contributions
 ostr-0.041
 wo-0.038
ho-0.037
 loves-0.036
 would-0.034
Tokenents
Token ID658
Feature activation-8.24e-03

Pos logit contributions
 avail+0.133
amas+0.128
 boulder+0.126
+0.124
 magnet+0.122
Neg logit contributions
 ostr-0.162
ho-0.150
 wo-0.150
aa-0.148
 loves-0.139
Token reminded
Token ID14516
Feature activation-1.09e-02

Pos logit contributions
amas+0.016
 avail+0.015
 subtract+0.015
 magnet+0.014
+0.014
Neg logit contributions
 ostr-0.012
 wo-0.012
ho-0.011
aa-0.010
 loves-0.010
Token him
Token ID683
Feature activation+2.49e-03

Pos logit contributions
amas+0.019
 avail+0.018
 subtract+0.018
 magnet+0.017
 Tears+0.017
Neg logit contributions
 ostr-0.018
 wo-0.017
ho-0.016
aa-0.016
 loves-0.015
Token of
Token ID286
Feature activation-2.59e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.003
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.005
 subtract-0.005
 avail-0.005
 magnet-0.005
 Tears-0.005
Token the
Token ID262
Feature activation-4.53e-03

Pos logit contributions
amas+0.047
 subtract+0.045
 avail+0.045
 magnet+0.044
+0.043
Neg logit contributions
 ostr-0.042
 wo-0.039
ho-0.037
aa-0.036
 loves-0.034
Token park
Token ID3952
Feature activation+5.61e-04

Pos logit contributions
amas+0.014
 avail+0.013
 subtract+0.013
 magnet+0.013
+0.013
Neg logit contributions
 ostr-0.013
 wo-0.013
ho-0.012
aa-0.011
 loves-0.011
Token filled
Token ID5901
Feature activation-7.41e-03

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
aa+0.001
 loves+0.001
Neg logit contributions
amas-0.001
 avail-0.001
 subtract-0.001
 magnet-0.001
-0.001
Token with
Token ID351
Feature activation-1.10e-02

Pos logit contributions
amas+0.010
 subtract+0.010
 avail+0.010
+0.009
 magnet+0.009
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.014
aa-0.013
 loves-0.013
Token colourful
Token ID44799
Feature activation+5.01e-03

Pos logit contributions
amas+0.018
 subtract+0.017
 avail+0.017
 magnet+0.016
 Tears+0.016
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.018
aa-0.017
 loves-0.016
Token or
Token ID393
Feature activation-2.06e-02

Pos logit contributions
 ostr+0.009
 wo+0.009
ho+0.008
aa+0.008
 loves+0.008
Neg logit contributions
amas-0.008
 avail-0.008
 subtract-0.008
-0.008
 Tears-0.007
Tokennam
Token ID7402
Feature activation-8.72e-02

Pos logit contributions
amas+0.039
+0.037
oon+0.036
 avail+0.035
 subtract+0.034
Neg logit contributions
 ostr-0.035
 wo-0.032
ho-0.028
 loves-0.028
aa-0.027
Tokenents
Token ID658
Feature activation-1.09e-02

Pos logit contributions
 avail+0.126
amas+0.124
+0.119
 boulder+0.118
 subtract+0.117
Neg logit contributions
 ostr-0.166
 wo-0.155
ho-0.152
aa-0.151
 loves-0.140
Token.
Token ID13
Feature activation+3.23e-04

Pos logit contributions
amas+0.020
 subtract+0.019
 magnet+0.019
 avail+0.019
+0.018
Neg logit contributions
 ostr-0.017
 wo-0.016
ho-0.014
aa-0.014
 loves-0.014
Token Jimmy
Token ID12963
Feature activation+1.05e-02

Pos logit contributions
 wo+0.001
 ostr+0.001
 loves+0.001
aa+0.001
ho+0.001
Neg logit contributions
 avail-0.000
amas-0.000
 subtract-0.000
oon-0.000
iece-0.000
Token looked
Token ID3114
Feature activation-1.80e-03

Pos logit contributions
 ostr+0.017
 wo+0.017
ho+0.016
aa+0.015
 loves+0.014
Neg logit contributions
amas-0.018
 subtract-0.017
 magnet-0.016
lf-0.016
 boulder-0.016
Token out
Token ID503
Feature activation-6.99e-05

Pos logit contributions
amas+0.003
 magnet+0.003
 avail+0.003
 subtract+0.003
 Tears+0.003
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.003
 loves-0.002
Token she
Token ID673
Feature activation-8.95e-03

Pos logit contributions
amas+0.027
 avail+0.025
 subtract+0.025
+0.025
 magnet+0.024
Neg logit contributions
 ostr-0.025
 wo-0.024
ho-0.022
 loves-0.021
aa-0.021
Token had
Token ID550
Feature activation+9.02e-03

Pos logit contributions
amas+0.018
 avail+0.017
 subtract+0.017
 magnet+0.017
+0.016
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.011
aa-0.011
 loves-0.010
Token with
Token ID351
Feature activation-1.16e-02

Pos logit contributions
 ostr+0.014
 wo+0.014
ho+0.013
aa+0.012
 loves+0.012
Neg logit contributions
amas-0.017
 avail-0.016
 subtract-0.016
 magnet-0.015
-0.015
Token her
Token ID607
Feature activation+1.94e-03

Pos logit contributions
amas+0.024
 avail+0.022
 subtract+0.022
 magnet+0.022
+0.021
Neg logit contributions
 ostr-0.016
 wo-0.015
ho-0.014
aa-0.014
 loves-0.013
Token or
Token ID393
Feature activation-1.86e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
aa+0.003
ho+0.003
 loves+0.003
Neg logit contributions
amas-0.003
 magnet-0.003
 subtract-0.003
 avail-0.003
-0.003
Tokennam
Token ID7402
Feature activation-8.68e-02

Pos logit contributions
amas+0.035
 avail+0.032
 subtract+0.032
+0.032
 magnet+0.031
Neg logit contributions
 ostr-0.029
 wo-0.028
ho-0.026
aa-0.024
 loves-0.024
Tokenents
Token ID658
Feature activation-7.94e-03

Pos logit contributions
 avail+0.131
amas+0.128
 boulder+0.124
+0.122
roads+0.122
Neg logit contributions
 ostr-0.156
 wo-0.148
ho-0.146
aa-0.144
 loves-0.135
Token.
Token ID13
Feature activation+1.11e-03

Pos logit contributions
amas+0.016
 subtract+0.015
 avail+0.015
 magnet+0.014
+0.014
Neg logit contributions
 ostr-0.012
 wo-0.011
ho-0.010
aa-0.010
 loves-0.009
Token<|endoftext|>
Token ID50256
Feature activation-8.24e-03

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
 loves+0.001
aa+0.001
Neg logit contributions
amas-0.002
 avail-0.002
 subtract-0.002
 magnet-0.002
 boulder-0.002
TokenOnce
Token ID7454
Feature activation+7.08e-04

Pos logit contributions
amas+0.017
 subtract+0.016
 avail+0.016
 magnet+0.015
ulum+0.015
Neg logit contributions
 ostr-0.011
 wo-0.011
ho-0.010
aa-0.009
 loves-0.009
Token upon
Token ID2402
Feature activation-3.46e-02

Pos logit contributions
 ostr+0.002
 wo+0.001
aa+0.001
 ll+0.001
ho+0.001
Neg logit contributions
 Tears-0.001
 magnet-0.001
on-0.001
lf-0.001
amas-0.001
Token to
Token ID284
Feature activation+1.58e-02

Pos logit contributions
 ostr+0.005
ho+0.004
 wo+0.004
aa+0.004
 ost+0.003
Neg logit contributions
amas-0.007
 subtract-0.007
 Tears-0.007
 magnet-0.006
things-0.006
Token me
Token ID502
Feature activation+2.07e-02

Pos logit contributions
 ostr+0.024
 wo+0.023
ho+0.022
aa+0.021
 loves+0.019
Neg logit contributions
 avail-0.029
 subtract-0.028
amas-0.028
 magnet-0.026
lf-0.026
Token.
Token ID13
Feature activation-1.33e-03

Pos logit contributions
 ostr+0.033
 wo+0.031
ho+0.029
aa+0.028
 loves+0.026
Neg logit contributions
amas-0.039
 subtract-0.036
 avail-0.036
 magnet-0.035
 Tears-0.035
Token The
Token ID383
Feature activation+1.70e-02

Pos logit contributions
amas+0.002
 avail+0.002
smanship+0.002
lf+0.002
 boulder+0.002
Neg logit contributions
 ostr-0.002
 wo-0.002
 loves-0.002
aa-0.002
ho-0.002
Token or
Token ID393
Feature activation-1.44e-02

Pos logit contributions
 ostr+0.029
 wo+0.028
ho+0.026
aa+0.026
 loves+0.024
Neg logit contributions
amas-0.028
 magnet-0.027
 subtract-0.027
 avail-0.026
 Tears-0.025
Tokennam
Token ID7402
Feature activation-8.67e-02

Pos logit contributions
amas+0.030
 avail+0.027
+0.027
 subtract+0.027
oon+0.026
Neg logit contributions
 ostr-0.021
 wo-0.020
ho-0.018
aa-0.017
 loves-0.017
Tokenents
Token ID658
Feature activation+1.27e-04

Pos logit contributions
 avail+0.128
amas+0.127
 boulder+0.123
+0.122
 magnet+0.118
Neg logit contributions
 ostr-0.163
ho-0.152
 wo-0.151
aa-0.150
 loves-0.141
Token were
Token ID547
Feature activation-4.10e-04

Pos logit contributions
 ostr+0.000
 wo+0.000
ho+0.000
aa+0.000
 loves+0.000
Neg logit contributions
amas-0.000
 subtract-0.000
 avail-0.000
 magnet-0.000
-0.000
Token very
Token ID845
Feature activation-4.23e-03

Pos logit contributions
amas+0.001
 subtract+0.001
 magnet+0.001
 avail+0.001
 Tears+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token important
Token ID1593
Feature activation+2.03e-02

Pos logit contributions
amas+0.007
 subtract+0.007
 magnet+0.006
 avail+0.006
 Tears+0.006
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token to
Token ID284
Feature activation+1.21e-02

Pos logit contributions
 ostr+0.036
 wo+0.032
ho+0.032
aa+0.032
 loves+0.027
Neg logit contributions
amas-0.033
 subtract-0.032
 magnet-0.030
 Tears-0.030
 avail-0.030
Token never
Token ID1239
Feature activation+2.79e-02

Pos logit contributions
 ostr+0.000
 wo+0.000
ho+0.000
aa+0.000
 loves+0.000
Neg logit contributions
amas-0.001
 avail-0.001
 subtract-0.001
 magnet-0.001
-0.001
Token played
Token ID2826
Feature activation-3.11e-03

Pos logit contributions
 ostr+0.046
 wo+0.044
ho+0.040
aa+0.040
 loves+0.038
Neg logit contributions
 subtract-0.047
amas-0.046
 avail-0.045
 magnet-0.045
 Tears-0.044
Token with
Token ID351
Feature activation-2.79e-03

Pos logit contributions
 avail+0.006
amas+0.006
 subtract+0.006
 boulder+0.005
 Tears+0.005
Neg logit contributions
 wo-0.004
 ostr-0.004
 for-0.004
 loves-0.004
ho-0.004
Token the
Token ID262
Feature activation+1.50e-03

Pos logit contributions
amas+0.005
 magnet+0.005
 avail+0.005
 subtract+0.005
 Tears+0.005
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.003
Token or
Token ID393
Feature activation-1.16e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.003
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.002
 subtract-0.002
 magnet-0.002
 avail-0.002
-0.002
Tokennam
Token ID7402
Feature activation-8.62e-02

Pos logit contributions
amas+0.021
 avail+0.020
 subtract+0.020
 magnet+0.019
+0.019
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.016
aa-0.016
 loves-0.015
Tokenents
Token ID658
Feature activation+2.27e-03

Pos logit contributions
 avail+0.125
amas+0.123
 boulder+0.119
+0.116
 subtract+0.115
Neg logit contributions
 ostr-0.164
ho-0.153
 wo-0.153
aa-0.150
 loves-0.141
Token again
Token ID757
Feature activation-4.83e-03

Pos logit contributions
 ostr+0.005
 wo+0.005
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.003
 avail-0.003
 subtract-0.003
 magnet-0.003
-0.003
Token.
Token ID13
Feature activation+1.06e-03

Pos logit contributions
amas+0.011
 subtract+0.010
 avail+0.010
 magnet+0.010
 Tears+0.010
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.005
aa-0.005
 loves-0.005
Token They
Token ID1119
Feature activation+7.49e-03

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.001
Neg logit contributions
amas-0.002
 subtract-0.002
 avail-0.002
 magnet-0.002
-0.002
Token still
Token ID991
Feature activation-2.75e-03

Pos logit contributions
 ostr+0.012
 wo+0.012
aa+0.012
ho+0.011
 loves+0.010
Neg logit contributions
 subtract-0.012
 magnet-0.012
amas-0.012
-0.011
 avail-0.011
Token He
Token ID679
Feature activation-4.82e-03

Pos logit contributions
 avail+0.002
smanship+0.002
inge+0.002
ivering+0.002
lf+0.001
Neg logit contributions
 wo-0.002
 loves-0.001
 to-0.001
 ostr-0.001
:-0.001
Token quickly
Token ID2952
Feature activation+5.08e-02

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
 magnet+0.008
+0.008
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
 loves-0.007
Token hung
Token ID9174
Feature activation-8.10e-03

Pos logit contributions
 ostr+0.087
 wo+0.082
ho+0.077
aa+0.075
 loves+0.072
Neg logit contributions
amas-0.088
 avail-0.083
 subtract-0.082
 magnet-0.080
-0.079
Token colorful
Token ID20239
Feature activation+8.52e-03

Pos logit contributions
amas+0.016
 subtract+0.016
 avail+0.016
+0.015
ulum+0.015
Neg logit contributions
 ostr-0.012
 wo-0.011
ho-0.010
aa-0.009
 loves-0.009
Token or
Token ID393
Feature activation-1.73e-02

Pos logit contributions
 ostr+0.017
 wo+0.016
ho+0.015
aa+0.014
 loves+0.014
Neg logit contributions
amas-0.013
 avail-0.012
 subtract-0.012
-0.012
 Tears-0.011
Tokennam
Token ID7402
Feature activation-8.57e-02

Pos logit contributions
amas+0.023
+0.022
 avail+0.020
oon+0.020
 boulder+0.020
Neg logit contributions
 ostr-0.036
 wo-0.035
ho-0.033
 loves-0.033
aa-0.030
Tokenents
Token ID658
Feature activation-6.19e-03

Pos logit contributions
 avail+0.131
amas+0.128
 boulder+0.126
+0.126
roads+0.124
Neg logit contributions
 ostr-0.151
ho-0.142
 wo-0.139
aa-0.139
la-0.127
Token and
Token ID290
Feature activation-5.11e-03

Pos logit contributions
amas+0.011
 subtract+0.011
 avail+0.011
 magnet+0.010
+0.010
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.009
aa-0.008
 loves-0.008
Token tw
Token ID665
Feature activation-1.61e-02

Pos logit contributions
amas+0.010
 avail+0.009
+0.009
ulum+0.008
oon+0.008
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
 loves-0.007
aa-0.007
Tokenink
Token ID676
Feature activation+1.93e-02

Pos logit contributions
amas+0.021
 subtract+0.019
 avail+0.019
+0.018
 magnet+0.018
Neg logit contributions
 ostr-0.035
 wo-0.034
ho-0.032
aa-0.031
 loves-0.030
Tokenling
Token ID1359
Feature activation+9.46e-03

Pos logit contributions
 ostr+0.040
 wo+0.038
ho+0.036
aa+0.035
 loves+0.034
Neg logit contributions
amas-0.027
 subtract-0.023
oon-0.023
iece-0.023
 magnet-0.023
Token and
Token ID290
Feature activation-6.74e-03

Pos logit contributions
amas+0.007
 avail+0.007
 subtract+0.007
 magnet+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
aa-0.004
 loves-0.004
Token lots
Token ID6041
Feature activation+1.31e-02

Pos logit contributions
amas+0.012
 avail+0.011
+0.011
 subtract+0.011
 magnet+0.010
Neg logit contributions
 ostr-0.012
 wo-0.011
ho-0.010
 loves-0.010
aa-0.010
Token of
Token ID286
Feature activation-2.55e-02

Pos logit contributions
 ostr+0.022
 wo+0.021
ho+0.020
aa+0.019
 loves+0.018
Neg logit contributions
amas-0.023
 avail-0.021
 subtract-0.021
 magnet-0.021
-0.020
Token shiny
Token ID22441
Feature activation-6.36e-03

Pos logit contributions
amas+0.045
 avail+0.042
 subtract+0.041
+0.040
 magnet+0.040
Neg logit contributions
 ostr-0.044
 wo-0.042
ho-0.039
aa-0.037
 loves-0.037
Token or
Token ID393
Feature activation-2.03e-02

Pos logit contributions
amas+0.009
 avail+0.008
 subtract+0.008
+0.008
 Tears+0.008
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.012
 loves-0.011
aa-0.011
Tokennam
Token ID7402
Feature activation-8.53e-02

Pos logit contributions
amas+0.038
+0.035
 avail+0.034
 subtract+0.034
oon+0.033
Neg logit contributions
 ostr-0.034
 wo-0.032
ho-0.029
 loves-0.028
aa-0.027
Tokenents
Token ID658
Feature activation-1.06e-02

Pos logit contributions
amas+0.127
 avail+0.122
+0.119
 boulder+0.117
 subtract+0.117
Neg logit contributions
 ostr-0.161
ho-0.151
 wo-0.149
aa-0.145
la-0.135
Token.
Token ID13
Feature activation-9.08e-04

Pos logit contributions
amas+0.020
 subtract+0.019
 avail+0.019
 magnet+0.019
+0.018
Neg logit contributions
 ostr-0.016
 wo-0.015
ho-0.014
aa-0.014
 loves-0.013
Token L
Token ID406
Feature activation+7.71e-03

Pos logit contributions
amas+0.002
 avail+0.002
 subtract+0.002
 magnet+0.002
 boulder+0.002
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Tokenila
Token ID10102
Feature activation+2.06e-03

Pos logit contributions
 ostr+0.011
 wo+0.010
ho+0.010
 loves+0.009
aa+0.008
Neg logit contributions
amas-0.015
-0.014
oon-0.014
 subtract-0.014
 avail-0.014
Token wanted
Token ID2227
Feature activation+3.72e-03

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.003
aa+0.003
 loves+0.003
Neg logit contributions
amas-0.004
 subtract-0.004
 avail-0.004
 magnet-0.003
 Tears-0.003
Token to
Token ID284
Feature activation+1.29e-02

Pos logit contributions
amas+0.037
 avail+0.035
 subtract+0.034
 magnet+0.034
+0.034
Neg logit contributions
 ostr-0.018
 wo-0.017
ho-0.015
aa-0.014
 loves-0.014
Token help
Token ID1037
Feature activation+8.53e-04

Pos logit contributions
 ostr+0.020
 wo+0.019
ho+0.018
aa+0.017
 loves+0.016
Neg logit contributions
amas-0.024
 avail-0.023
 subtract-0.023
 magnet-0.022
-0.022
Token sort
Token ID3297
Feature activation-1.33e-02

Pos logit contributions
 ostr+0.001
 wo+0.001
aa+0.001
 for+0.001
ho+0.001
Neg logit contributions
amas-0.002
bridge-0.002
-0.002
icycle-0.002
oon-0.002
Token the
Token ID262
Feature activation-3.05e-04

Pos logit contributions
amas+0.033
+0.030
oon+0.030
lf+0.030
ulum+0.030
Neg logit contributions
 ostr-0.013
 wo-0.011
 for-0.011
ho-0.010
aa-0.010
Token or
Token ID393
Feature activation-1.71e-02

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.000
+0.000
oon+0.000
Neg logit contributions
 ostr-0.001
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Tokennam
Token ID7402
Feature activation-8.51e-02

Pos logit contributions
amas+0.031
+0.028
oon+0.028
icycle+0.026
ulum+0.026
Neg logit contributions
 ostr-0.029
ho-0.027
 wo-0.027
 for-0.026
 loves-0.026
Tokenents
Token ID658
Feature activation-4.52e-03

Pos logit contributions
amas+0.130
 avail+0.130
 boulder+0.128
+0.125
 sensation+0.120
Neg logit contributions
 ostr-0.150
ho-0.144
 wo-0.137
aa-0.137
 loves-0.128
Token.
Token ID13
Feature activation-2.96e-05

Pos logit contributions
amas+0.009
 avail+0.009
 subtract+0.009
 magnet+0.009
+0.009
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.005
aa-0.005
 loves-0.005
Token She
Token ID1375
Feature activation-4.61e-03

Pos logit contributions
 avail+0.000
amas+0.000
lf+0.000
 magnet+0.000
oon+0.000
Neg logit contributions
 wo-0.000
 ostr-0.000
ho-0.000
 loves-0.000
aa-0.000
Token picked
Token ID6497
Feature activation+8.80e-03

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
 magnet+0.008
+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token up
Token ID510
Feature activation+7.31e-03

Pos logit contributions
 ostr+0.015
 wo+0.015
ho+0.014
aa+0.014
 loves+0.013
Neg logit contributions
amas-0.014
 avail-0.014
 subtract-0.014
 magnet-0.014
 Tears-0.013
Token take
Token ID1011
Feature activation-1.95e-02

Pos logit contributions
 ostr+0.009
 wo+0.008
 loves+0.008
ho+0.008
aa+0.007
Neg logit contributions
amas-0.015
 avail-0.013
 Tears-0.013
 subtract-0.013
iece-0.013
Token one
Token ID530
Feature activation-1.76e-02

Pos logit contributions
amas+0.038
 avail+0.035
 subtract+0.035
+0.034
 magnet+0.034
Neg logit contributions
 ostr-0.030
 wo-0.028
ho-0.025
aa-0.025
 loves-0.024
Token of
Token ID286
Feature activation-2.63e-02

Pos logit contributions
amas+0.034
 subtract+0.032
 avail+0.032
 Tears+0.031
+0.030
Neg logit contributions
 ostr-0.027
 wo-0.025
ho-0.024
aa-0.023
 loves-0.022
Token the
Token ID262
Feature activation-4.51e-03

Pos logit contributions
amas+0.046
 subtract+0.043
 avail+0.042
+0.041
 magnet+0.041
Neg logit contributions
 ostr-0.045
 wo-0.043
ho-0.040
aa-0.039
 loves-0.037
Token or
Token ID393
Feature activation-2.22e-02

Pos logit contributions
amas+0.008
 avail+0.007
 subtract+0.007
ulum+0.007
oon+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
 loves-0.007
aa-0.007
Tokennam
Token ID7402
Feature activation-8.49e-02

Pos logit contributions
amas+0.037
+0.034
oon+0.032
 avail+0.031
 Tears+0.030
Neg logit contributions
 ostr-0.042
 wo-0.039
ho-0.037
 loves-0.036
aa-0.034
Tokenents
Token ID658
Feature activation-1.08e-02

Pos logit contributions
 avail+0.124
amas+0.123
 boulder+0.118
+0.117
 magnet+0.115
Neg logit contributions
 ostr-0.159
ho-0.149
 wo-0.146
aa-0.145
 loves-0.134
Token outside
Token ID2354
Feature activation+8.56e-03

Pos logit contributions
amas+0.022
 subtract+0.021
 avail+0.021
 magnet+0.020
+0.020
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.013
aa-0.013
 loves-0.012
Token for
Token ID329
Feature activation+3.49e-03

Pos logit contributions
 ostr+0.010
 wo+0.010
ho+0.009
 loves+0.009
aa+0.008
Neg logit contributions
amas-0.019
 avail-0.018
 subtract-0.018
-0.017
ulum-0.017
Token a
Token ID257
Feature activation-9.26e-03

Pos logit contributions
 ostr+0.005
 wo+0.005
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.007
 avail-0.006
 subtract-0.006
oon-0.006
ulum-0.006
Token walk
Token ID2513
Feature activation-8.17e-03

Pos logit contributions
 magnet+0.014
amas+0.014
 subtract+0.013
 avail+0.013
+0.013
Neg logit contributions
 ostr-0.017
 wo-0.016
ho-0.015
aa-0.015
 loves-0.015
Token at
Token ID379
Feature activation-2.71e-02

Pos logit contributions
 ostr+0.030
 wo+0.027
ho+0.025
aa+0.023
 loves+0.021
Neg logit contributions
amas-0.060
 subtract-0.056
 magnet-0.056
 avail-0.056
 Tears-0.056
Token all
Token ID477
Feature activation+2.48e-02

Pos logit contributions
amas+0.053
 avail+0.051
 subtract+0.050
 magnet+0.049
 Tears+0.048
Neg logit contributions
 ostr-0.040
 wo-0.038
ho-0.035
aa-0.034
 loves-0.032
Token the
Token ID262
Feature activation-1.63e-04

Pos logit contributions
 ostr+0.039
 wo+0.037
ho+0.035
aa+0.034
 loves+0.032
Neg logit contributions
amas-0.046
 avail-0.043
 subtract-0.043
 magnet-0.042
 Tears-0.042
Token different
Token ID1180
Feature activation+5.43e-03

Pos logit contributions
amas+0.000
 avail+0.000
 subtract+0.000
+0.000
 magnet+0.000
Neg logit contributions
 ostr-0.000
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Token or
Token ID393
Feature activation-1.59e-02

Pos logit contributions
 ostr+0.011
 wo+0.010
ho+0.010
 loves+0.009
aa+0.009
Neg logit contributions
amas-0.008
 avail-0.007
-0.007
 subtract-0.007
 magnet-0.007
Tokennam
Token ID7402
Feature activation-8.48e-02

Pos logit contributions
amas+0.023
+0.021
oon+0.020
 avail+0.019
 subtract+0.019
Neg logit contributions
 ostr-0.033
 wo-0.031
ho-0.031
 loves-0.029
aa-0.028
Tokenents
Token ID658
Feature activation-2.52e-03

Pos logit contributions
 avail+0.130
amas+0.127
 boulder+0.124
+0.121
 magnet+0.121
Neg logit contributions
 ostr-0.152
ho-0.144
aa-0.140
 wo-0.139
 loves-0.129
Token and
Token ID290
Feature activation-3.88e-03

Pos logit contributions
amas+0.005
 subtract+0.005
 avail+0.005
 magnet+0.004
+0.004
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.003
aa-0.003
 loves-0.003
Token adm
Token ID6178
Feature activation-9.52e-03

Pos logit contributions
amas+0.007
 avail+0.006
+0.006
 subtract+0.006
oon+0.006
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
 loves-0.006
aa-0.006
Tokeniring
Token ID3428
Feature activation-9.92e-03

Pos logit contributions
amas+0.013
 subtract+0.013
 avail+0.013
+0.012
 magnet+0.012
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.018
aa-0.017
 loves-0.017
Token the
Token ID262
Feature activation-2.34e-03

Pos logit contributions
amas+0.020
 avail+0.019
 subtract+0.019
 magnet+0.018
+0.018
Neg logit contributions
 ostr-0.014
 wo-0.013
ho-0.012
aa-0.012
 loves-0.011
Token who
Token ID508
Feature activation+1.79e-03

Pos logit contributions
house+0.027
lf+0.026
 magnet+0.026
on+0.026
 Tears+0.024
Neg logit contributions
 wo-0.035
 ostr-0.034
rav-0.033
ho-0.031
 vows-0.031
Token liked
Token ID8288
Feature activation-3.45e-03

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.002
aa+0.002
la+0.002
Neg logit contributions
 magnet-0.003
amas-0.003
 Tears-0.003
things-0.003
house-0.003
Token to
Token ID284
Feature activation+8.47e-03

Pos logit contributions
amas+0.008
 subtract+0.008
 magnet+0.008
 basketball+0.008
 Tears+0.007
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.003
 loves-0.003
Token collect
Token ID2824
Feature activation-1.43e-02

Pos logit contributions
 ostr+0.012
 wo+0.012
aa+0.010
 loves+0.010
ho+0.010
Neg logit contributions
amas-0.018
 avail-0.016
 Tears-0.016
 subtract-0.016
 boulder-0.015
Token or
Token ID393
Feature activation-1.81e-02

Pos logit contributions
amas+0.026
 avail+0.024
 subtract+0.024
+0.024
ulum+0.023
Neg logit contributions
 ostr-0.025
 wo-0.023
ho-0.021
aa-0.021
 loves-0.020
Tokennam
Token ID7402
Feature activation-8.48e-02

Pos logit contributions
amas+0.023
+0.020
oon+0.020
icycle+0.018
ulum+0.018
Neg logit contributions
 ostr-0.042
ho-0.039
 wo-0.039
 loves-0.036
 would-0.036
Tokenents
Token ID658
Feature activation-6.67e-03

Pos logit contributions
amas+0.118
 avail+0.114
 boulder+0.110
 subtract+0.109
+0.108
Neg logit contributions
 ostr-0.168
 wo-0.157
ho-0.155
aa-0.151
 loves-0.144
Token.
Token ID13
Feature activation-1.77e-03

Pos logit contributions
amas+0.012
 subtract+0.011
 magnet+0.011
 avail+0.011
iece+0.011
Neg logit contributions
 ostr-0.011
 wo-0.010
aa-0.009
ho-0.009
 loves-0.009
Token Everywhere
Token ID48201
Feature activation+1.95e-02

Pos logit contributions
amas+0.003
 avail+0.003
 boulder+0.002
 subtract+0.002
 magnet+0.002
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
 loves-0.003
aa-0.003
Token she
Token ID673
Feature activation-9.66e-03

Pos logit contributions
 ostr+0.043
 wo+0.042
ho+0.040
aa+0.040
arl+0.037
Neg logit contributions
 avail-0.021
amas-0.021
 magnet-0.020
 subtract-0.019
lf-0.019
Token went
Token ID1816
Feature activation+5.42e-04

Pos logit contributions
amas+0.020
 avail+0.019
 subtract+0.019
 magnet+0.018
+0.018
Neg logit contributions
 ostr-0.014
 wo-0.013
ho-0.012
aa-0.011
 loves-0.011
Token But
Token ID887
Feature activation-2.15e-02

Pos logit contributions
amas+0.007
 avail+0.006
 subtract+0.006
 magnet+0.006
+0.006
Neg logit contributions
 ostr-0.009
 wo-0.009
ho-0.008
aa-0.008
 loves-0.008
Token we
Token ID356
Feature activation+3.23e-03

Pos logit contributions
amas+0.039
 avail+0.039
 subtract+0.039
 magnet+0.037
+0.036
Neg logit contributions
 ostr-0.032
 wo-0.031
aa-0.029
ho-0.029
 loves-0.026
Token have
Token ID423
Feature activation-8.46e-03

Pos logit contributions
 ostr+0.005
 wo+0.005
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.006
 avail-0.006
 subtract-0.006
 Tears-0.006
 magnet-0.006
Token to
Token ID284
Feature activation+1.15e-02

Pos logit contributions
 subtract+0.017
amas+0.016
 slices+0.016
 pieces+0.016
 Tears+0.016
Neg logit contributions
 ostr-0.011
ho-0.010
 wo-0.010
aa-0.009
 loves-0.008
Token take
Token ID1011
Feature activation-1.59e-02

Pos logit contributions
 ostr+0.017
 wo+0.016
ho+0.015
aa+0.015
 loves+0.014
Neg logit contributions
amas-0.022
 avail-0.021
 subtract-0.021
 magnet-0.020
-0.020
Token turns
Token ID4962
Feature activation+4.82e-02

Pos logit contributions
amas+0.022
 subtract+0.022
 avail+0.022
 Tears+0.021
 equations+0.020
Neg logit contributions
 ostr-0.031
ho-0.030
 wo-0.028
aa-0.026
 loves-0.025
Token and
Token ID290
Feature activation-6.02e-03

Pos logit contributions
 ostr+0.084
 wo+0.076
ho+0.076
aa+0.068
 loves+0.065
Neg logit contributions
amas-0.080
 subtract-0.078
lf-0.072
oon-0.072
pen-0.070
Token share
Token ID2648
Feature activation-2.06e-02

Pos logit contributions
amas+0.012
 avail+0.011
 subtract+0.011
 magnet+0.011
+0.011
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.008
 loves-0.007
Token.
Token ID13
Feature activation-4.67e-03

Pos logit contributions
amas+0.034
 subtract+0.032
 magnet+0.032
 avail+0.032
lf+0.031
Neg logit contributions
 ostr-0.035
 wo-0.033
ho-0.032
aa-0.032
 loves-0.027
Token First
Token ID3274
Feature activation-1.96e-04

Pos logit contributions
amas+0.007
 avail+0.007
 subtract+0.007
 magnet+0.007
+0.007
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.008
 loves-0.007
Token,
Token ID11
Feature activation-5.07e-03

Pos logit contributions
amas+0.000
 avail+0.000
 subtract+0.000
+0.000
 magnet+0.000
Neg logit contributions
 ostr-0.000
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Token was
Token ID373
Feature activation+1.51e-02

Pos logit contributions
 ostr+0.008
 wo+0.008
ho+0.007
 loves+0.007
aa+0.007
Neg logit contributions
amas-0.011
 avail-0.011
 subtract-0.010
-0.010
ulum-0.010
Token so
Token ID523
Feature activation-8.13e-03

Pos logit contributions
 ostr+0.026
 wo+0.025
ho+0.023
aa+0.022
 loves+0.022
Neg logit contributions
amas-0.026
 avail-0.025
 subtract-0.025
 magnet-0.024
-0.024
Token relieved
Token ID22598
Feature activation-5.99e-03

Pos logit contributions
amas+0.015
 avail+0.014
 subtract+0.013
ulum+0.013
+0.013
Neg logit contributions
 ostr-0.014
 wo-0.013
ho-0.013
 loves-0.012
aa-0.012
Token,
Token ID11
Feature activation+5.11e-04

Pos logit contributions
amas+0.011
 avail+0.011
 subtract+0.011
 Tears+0.011
 magnet+0.011
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.007
 loves-0.007
Token he
Token ID339
Feature activation-1.40e-02

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
 for+0.001
 loves+0.001
Neg logit contributions
amas-0.001
 avail-0.001
 subtract-0.001
smanship-0.001
icycle-0.001
Token quickly
Token ID2952
Feature activation+5.10e-02

Pos logit contributions
amas+0.025
 subtract+0.023
 avail+0.023
 magnet+0.023
 Tears+0.022
Neg logit contributions
 ostr-0.023
 wo-0.022
ho-0.021
aa-0.020
 loves-0.019
Token ran
Token ID4966
Feature activation+2.09e-02

Pos logit contributions
 ostr+0.084
 wo+0.079
ho+0.075
aa+0.073
 loves+0.070
Neg logit contributions
amas-0.087
 subtract-0.084
 avail-0.081
 magnet-0.081
 boulder-0.080
Token past
Token ID1613
Feature activation+5.61e-03

Pos logit contributions
 ostr+0.028
 wo+0.026
ho+0.024
aa+0.023
 loves+0.022
Neg logit contributions
amas-0.044
 avail-0.042
 subtract-0.042
 magnet-0.041
-0.041
Token the
Token ID262
Feature activation-3.19e-03

Pos logit contributions
 ostr+0.007
 wo+0.006
ho+0.006
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.013
 subtract-0.012
 avail-0.012
-0.011
ulum-0.011
Token trap
Token ID12840
Feature activation+1.69e-02

Pos logit contributions
amas+0.005
 subtract+0.005
 avail+0.005
+0.004
 Tears+0.004
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.005
 loves-0.005
aa-0.005
Token and
Token ID290
Feature activation-6.47e-03

Pos logit contributions
 ostr+0.022
 wo+0.022
ho+0.020
aa+0.018
 loves+0.018
Neg logit contributions
amas-0.036
 avail-0.035
 subtract-0.034
ulum-0.033
-0.033
Token up
Token ID510
Feature activation+5.43e-03

Pos logit contributions
 ostr+0.019
 wo+0.019
ho+0.017
aa+0.017
 loves+0.016
Neg logit contributions
amas-0.026
 avail-0.026
 subtract-0.025
-0.024
 magnet-0.024
Token and
Token ID290
Feature activation-6.38e-03

Pos logit contributions
 ostr+0.007
 wo+0.007
ho+0.006
aa+0.006
 loves+0.005
Neg logit contributions
amas-0.011
 subtract-0.011
 avail-0.011
 magnet-0.011
ulum-0.010
Token down
Token ID866
Feature activation-1.52e-02

Pos logit contributions
amas+0.010
 avail+0.009
 subtract+0.009
+0.009
 magnet+0.009
Neg logit contributions
 ostr-0.012
 wo-0.012
ho-0.011
aa-0.011
 loves-0.011
Token.
Token ID13
Feature activation+1.95e-04

Pos logit contributions
amas+0.031
 avail+0.029
 subtract+0.029
 magnet+0.029
+0.028
Neg logit contributions
 ostr-0.021
 wo-0.020
ho-0.019
aa-0.018
 loves-0.017
Token She
Token ID1375
Feature activation-6.26e-03

Pos logit contributions
 wo+0.000
 ostr+0.000
 loves+0.000
ho+0.000
aa+0.000
Neg logit contributions
 avail-0.000
amas-0.000
oon-0.000
 boulder-0.000
roads-0.000
Token quickly
Token ID2952
Feature activation+5.27e-02

Pos logit contributions
amas+0.011
 subtract+0.011
 magnet+0.010
 Tears+0.010
 avail+0.010
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.009
aa-0.009
 loves-0.008
Token put
Token ID1234
Feature activation-5.13e-03

Pos logit contributions
 ostr+0.091
 wo+0.084
ho+0.082
aa+0.078
 loves+0.075
Neg logit contributions
amas-0.085
 subtract-0.084
 avail-0.080
 magnet-0.078
 boulder-0.078
Token on
Token ID319
Feature activation-1.35e-02

Pos logit contributions
amas+0.011
 subtract+0.011
 avail+0.011
ulum+0.010
 boulder+0.010
Neg logit contributions
 wo-0.006
 ostr-0.006
 loves-0.005
aa-0.005
 for-0.005
Token her
Token ID607
Feature activation+2.94e-03

Pos logit contributions
amas+0.030
 subtract+0.029
 avail+0.028
oon+0.028
+0.028
Neg logit contributions
 ostr-0.016
 wo-0.015
 loves-0.013
ho-0.013
aa-0.012
Token best
Token ID1266
Feature activation-1.58e-02

Pos logit contributions
 ostr+0.005
 wo+0.004
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.005
 avail-0.005
 subtract-0.005
 magnet-0.005
-0.005
Token dress
Token ID6576
Feature activation-7.04e-03

Pos logit contributions
amas+0.024
 avail+0.022
 magnet+0.022
 subtract+0.022
+0.021
Neg logit contributions
 ostr-0.030
ho-0.029
 wo-0.028
aa-0.026
 loves-0.025
Tokenped
Token ID9124
Feature activation+4.41e-02

Pos logit contributions
amas+0.003
 subtract+0.003
 avail+0.003
 magnet+0.003
+0.003
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token up
Token ID510
Feature activation+6.46e-03

Pos logit contributions
 ostr+0.069
 wo+0.067
ho+0.062
aa+0.058
 loves+0.058
Neg logit contributions
amas-0.082
 avail-0.078
 subtract-0.077
-0.074
 magnet-0.074
Token the
Token ID262
Feature activation-2.16e-03

Pos logit contributions
 ostr+0.009
 wo+0.009
ho+0.008
aa+0.008
 loves+0.007
Neg logit contributions
amas-0.013
 subtract-0.012
 avail-0.012
ulum-0.012
 magnet-0.012
Token apples
Token ID22514
Feature activation+1.22e-02

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
 magnet+0.003
+0.003
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Tokenau
Token ID559
Feature activation+1.93e-02

Pos logit contributions
 ostr+0.016
ho+0.015
 wo+0.015
aa+0.013
 for+0.012
Neg logit contributions
amas-0.026
 avail-0.026
 subtract-0.025
ulum-0.024
 Tears-0.024
Tokence
Token ID344
Feature activation+4.82e-02

Pos logit contributions
 ostr+0.039
 wo+0.037
ho+0.035
aa+0.034
 loves+0.033
Neg logit contributions
amas-0.028
 avail-0.025
 subtract-0.025
-0.025
 magnet-0.024
Token and
Token ID290
Feature activation-4.44e-03

Pos logit contributions
 ostr+0.083
 wo+0.078
aa+0.071
ho+0.070
 loves+0.068
Neg logit contributions
amas-0.081
 subtract-0.075
 magnet-0.074
-0.074
 Tears-0.072
Token then
Token ID788
Feature activation+1.68e-02

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
+0.008
 magnet+0.007
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
aa-0.006
 loves-0.006
Token went
Token ID1816
Feature activation+4.04e-03

Pos logit contributions
 ostr+0.028
 wo+0.027
ho+0.025
aa+0.024
 loves+0.024
Neg logit contributions
amas-0.030
 avail-0.028
 subtract-0.028
-0.027
 magnet-0.027
Token racing
Token ID11717
Feature activation+2.14e-02

Pos logit contributions
 ostr+0.005
 wo+0.004
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.009
 avail-0.009
 subtract-0.009
 magnet-0.008
-0.008
Token around
Token ID1088
Feature activation-1.60e-03

Pos logit contributions
 ostr+0.031
 wo+0.029
ho+0.027
aa+0.026
 loves+0.025
Neg logit contributions
amas-0.043
 avail-0.040
 subtract-0.040
-0.039
 magnet-0.039
Token he
Token ID339
Feature activation-1.16e-02

Pos logit contributions
 avail+0.005
amas+0.005
 Tears+0.005
 magnet+0.004
lf+0.004
Neg logit contributions
 ostr-0.025
 wo-0.024
ho-0.024
aa-0.023
 loves-0.023
Token said
Token ID531
Feature activation-2.39e-03

Pos logit contributions
amas+0.017
 magnet+0.016
 subtract+0.016
lf+0.016
 boulder+0.015
Neg logit contributions
 ostr-0.020
 wo-0.019
ho-0.018
 loves-0.018
aa-0.018
Token,
Token ID11
Feature activation-1.26e-03

Pos logit contributions
 magnet+0.004
 Tears+0.004
 subtract+0.004
amas+0.004
on+0.004
Neg logit contributions
 ostr-0.004
aa-0.004
 wo-0.003
 loves-0.003
ho-0.003
Token "
Token ID366
Feature activation-1.65e-02

Pos logit contributions
amas+0.002
 subtract+0.002
 Tears+0.002
 magnet+0.002
+0.002
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
TokenHi
Token ID17250
Feature activation-1.99e-02

Pos logit contributions
amas+0.026
 avail+0.024
 subtract+0.024
 magnet+0.023
+0.023
Neg logit contributions
 ostr-0.031
 wo-0.029
ho-0.028
aa-0.027
 loves-0.026
Token Fl
Token ID1610
Feature activation+4.67e-02

Pos logit contributions
amas+0.032
 avail+0.029
 magnet+0.028
 Tears+0.028
+0.027
Neg logit contributions
 ostr-0.033
ho-0.032
 wo-0.029
aa-0.029
la-0.027
Tokenuffy
Token ID15352
Feature activation-6.37e-04

Pos logit contributions
 ostr+0.078
 wo+0.066
ho+0.059
la+0.058
 ost+0.056
Neg logit contributions
amas-0.088
-0.088
oon-0.076
 magnet-0.074
icycle-0.073
Token,
Token ID11
Feature activation-4.97e-03

Pos logit contributions
amas+0.001
 magnet+0.001
+0.001
oon+0.001
 avail+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token do
Token ID466
Feature activation-2.20e-02

Pos logit contributions
 magnet+0.007
 boulder+0.007
 basketball+0.007
 Tears+0.006
amas+0.006
Neg logit contributions
 ostr-0.008
aa-0.008
ho-0.008
la-0.008
by-0.007
Token you
Token ID345
Feature activation+1.64e-02

Pos logit contributions
amas+0.044
 avail+0.041
 subtract+0.041
 magnet+0.040
+0.040
Neg logit contributions
 ostr-0.032
 wo-0.030
ho-0.028
aa-0.027
 loves-0.026
Token want
Token ID765
Feature activation-1.20e-02

Pos logit contributions
 ostr+0.024
 wo+0.021
ho+0.020
aa+0.019
 loves+0.017
Neg logit contributions
amas-0.032
 subtract-0.031
 avail-0.031
 magnet-0.030
 Tears-0.030
Token The
Token ID383
Feature activation+2.53e-02

Pos logit contributions
amas+0.000
 avail+0.000
 subtract+0.000
 magnet+0.000
 boulder+0.000
Neg logit contributions
 ostr-0.000
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Token pond
Token ID16723
Feature activation+4.94e-03

Pos logit contributions
 ostr+0.043
 wo+0.041
ho+0.038
aa+0.037
 loves+0.036
Neg logit contributions
amas-0.044
 avail-0.041
 subtract-0.041
 magnet-0.040
-0.039
Token spark
Token ID9009
Feature activation-2.34e-03

Pos logit contributions
 ostr+0.008
 wo+0.007
ho+0.007
aa+0.006
 loves+0.006
Neg logit contributions
amas-0.009
 avail-0.009
 subtract-0.009
 magnet-0.009
-0.009
Tokenled
Token ID992
Feature activation+1.99e-02

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
oon+0.003
 magnet+0.003
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
aa-0.004
 loves-0.004
Token and
Token ID290
Feature activation-5.07e-04

Pos logit contributions
 ostr+0.032
 wo+0.030
ho+0.028
aa+0.027
 loves+0.026
Neg logit contributions
amas-0.036
 avail-0.034
 subtract-0.034
 magnet-0.033
 Tears-0.033
Token the
Token ID262
Feature activation+2.29e-03

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
 magnet+0.001
+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token ducks
Token ID39694
Feature activation+1.13e-02

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.004
aa+0.004
 loves+0.003
Neg logit contributions
amas-0.004
 avail-0.003
 subtract-0.003
 magnet-0.003
-0.003
Token qu
Token ID627
Feature activation-6.89e-04

Pos logit contributions
 ostr+0.018
 wo+0.017
ho+0.016
aa+0.016
 loves+0.015
Neg logit contributions
amas-0.020
 subtract-0.018
 magnet-0.018
-0.018
 avail-0.018
Tokenacked
Token ID6021
Feature activation+3.12e-02

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
 magnet+0.001
+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token.
Token ID13
Feature activation-5.28e-05

Pos logit contributions
 ostr+0.049
 wo+0.046
ho+0.043
aa+0.042
 loves+0.040
Neg logit contributions
amas-0.058
 avail-0.055
 subtract-0.054
 magnet-0.053
 Tears-0.052
Token Anna
Token ID11735
Feature activation+1.78e-02

Pos logit contributions
amas+0.000
 avail+0.000
 subtract+0.000
 magnet+0.000
 boulder+0.000
Neg logit contributions
 ostr-0.000
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Token a
Token ID257
Feature activation-5.12e-03

Pos logit contributions
 subtract+0.009
 avail+0.009
amas+0.009
 magnet+0.008
 slices+0.008
Neg logit contributions
 ostr-0.010
ho-0.010
aa-0.010
 wo-0.009
 loves-0.008
Token good
Token ID922
Feature activation+6.52e-03

Pos logit contributions
 magnet+0.009
+0.008
 basketball+0.008
 vest+0.008
 avail+0.007
Neg logit contributions
 ostr-0.008
ho-0.008
la-0.007
unn-0.007
 wo-0.007
Token time
Token ID640
Feature activation+1.07e-02

Pos logit contributions
 ostr+0.010
 wo+0.009
ho+0.009
la+0.008
aa+0.008
Neg logit contributions
amas-0.012
 magnet-0.011
 avail-0.011
-0.011
 subtract-0.010
Token in
Token ID287
Feature activation-9.54e-03

Pos logit contributions
 ostr+0.017
 wo+0.016
ho+0.015
aa+0.015
 loves+0.013
Neg logit contributions
amas-0.019
 subtract-0.019
 avail-0.018
 magnet-0.018
-0.018
Token the
Token ID262
Feature activation-9.61e-04

Pos logit contributions
amas+0.018
 avail+0.018
 subtract+0.017
 magnet+0.017
+0.017
Neg logit contributions
 ostr-0.014
 wo-0.014
ho-0.013
aa-0.012
 loves-0.012
Token mountains
Token ID12269
Feature activation+2.24e-02

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
 magnet+0.001
+0.001
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token.
Token ID13
Feature activation-1.02e-03

Pos logit contributions
 ostr+0.035
 wo+0.033
ho+0.030
aa+0.030
 loves+0.028
Neg logit contributions
amas-0.041
 avail-0.039
 subtract-0.039
 magnet-0.038
-0.038
Token He
Token ID679
Feature activation-3.56e-03

Pos logit contributions
amas+0.002
 avail+0.002
 subtract+0.001
 boulder+0.001
lf+0.001
Neg logit contributions
 wo-0.002
 ostr-0.002
 loves-0.002
aa-0.002
ho-0.002
Token also
Token ID635
Feature activation+7.78e-03

Pos logit contributions
amas+0.006
 subtract+0.006
 magnet+0.006
 avail+0.006
 Tears+0.006
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.005
aa-0.005
 loves-0.004
Token hoped
Token ID10719
Feature activation+7.31e-03

Pos logit contributions
 ostr+0.013
 wo+0.012
aa+0.012
ho+0.012
unn+0.010
Neg logit contributions
amas-0.013
 subtract-0.012
 magnet-0.012
 Tears-0.012
 avail-0.012
Token they
Token ID484
Feature activation+1.05e-02

Pos logit contributions
 ostr+0.010
 wo+0.010
ho+0.009
aa+0.009
 loves+0.009
Neg logit contributions
 avail-0.014
amas-0.014
 subtract-0.013
 magnet-0.013
 Tears-0.013
Token house
Token ID2156
Feature activation-1.82e-02

Pos logit contributions
 ostr+0.017
 wo+0.016
ho+0.016
aa+0.015
 loves+0.015
Neg logit contributions
amas-0.011
 avail-0.010
 subtract-0.010
 magnet-0.009
-0.009
Token was
Token ID373
Feature activation+1.61e-02

Pos logit contributions
amas+0.037
 avail+0.036
 subtract+0.035
 magnet+0.034
+0.033
Neg logit contributions
 ostr-0.025
 wo-0.024
ho-0.023
 loves-0.021
aa-0.021
Token getting
Token ID1972
Feature activation+1.03e-02

Pos logit contributions
 ostr+0.026
 wo+0.025
ho+0.023
aa+0.023
 loves+0.021
Neg logit contributions
amas-0.028
 subtract-0.027
 avail-0.027
 Tears-0.026
 magnet-0.026
Token very
Token ID845
Feature activation-5.96e-03

Pos logit contributions
 ostr+0.018
 wo+0.017
ho+0.016
aa+0.015
 loves+0.015
Neg logit contributions
amas-0.018
 avail-0.017
 subtract-0.016
-0.016
 magnet-0.016
Token angry
Token ID7954
Feature activation-1.57e-02

Pos logit contributions
amas+0.009
 avail+0.009
 subtract+0.009
 magnet+0.009
 Tears+0.008
Neg logit contributions
 ostr-0.011
 wo-0.011
ho-0.010
aa-0.010
 loves-0.009
Token.
Token ID13
Feature activation+5.68e-04

Pos logit contributions
amas+0.033
 avail+0.031
 subtract+0.031
+0.030
 magnet+0.030
Neg logit contributions
 ostr-0.021
 wo-0.020
ho-0.018
aa-0.017
 loves-0.017
Token He
Token ID679
Feature activation-7.98e-03

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
 loves+0.001
aa+0.001
Neg logit contributions
 avail-0.001
amas-0.001
 subtract-0.001
lf-0.001
 magnet-0.001
Token shouted
Token ID17293
Feature activation+1.51e-03

Pos logit contributions
amas+0.013
 subtract+0.012
 Tears+0.012
 magnet+0.012
 boulder+0.012
Neg logit contributions
 ostr-0.013
 wo-0.013
ho-0.012
aa-0.012
 loves-0.011
Token out
Token ID503
Feature activation+2.43e-03

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.003
 subtract-0.003
 Tears-0.003
 magnet-0.003
 avail-0.003
Token of
Token ID286
Feature activation-2.49e-02

Pos logit contributions
 ostr+0.003
 wo+0.003
aa+0.003
ho+0.003
 loves+0.003
Neg logit contributions
amas-0.005
 Tears-0.005
 subtract-0.005
 magnet-0.005
 boulder-0.004
Token his
Token ID465
Feature activation-5.62e-03

Pos logit contributions
amas+0.046
 subtract+0.043
 avail+0.042
+0.041
 magnet+0.041
Neg logit contributions
 ostr-0.040
 wo-0.038
ho-0.035
aa-0.034
 loves-0.033
Token his
Token ID465
Feature activation-9.64e-03

Pos logit contributions
amas+0.017
 avail+0.016
 subtract+0.016
 magnet+0.016
 Tears+0.015
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.011
aa-0.011
 loves-0.010
Token owner
Token ID4870
Feature activation-1.83e-02

Pos logit contributions
amas+0.013
 avail+0.012
 magnet+0.012
+0.011
 Tears+0.011
Neg logit contributions
 ostr-0.019
ho-0.018
 wo-0.017
la-0.017
unn-0.015
Token were
Token ID547
Feature activation+5.78e-04

Pos logit contributions
amas+0.034
 avail+0.032
 subtract+0.032
 magnet+0.032
+0.031
Neg logit contributions
 ostr-0.028
 wo-0.027
ho-0.025
aa-0.024
 loves-0.023
Token out
Token ID503
Feature activation+1.19e-03

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
aa+0.001
 loves+0.001
Neg logit contributions
amas-0.001
 subtract-0.001
 avail-0.001
 Tears-0.001
-0.001
Token walking
Token ID6155
Feature activation+1.07e-02

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.002
-0.002
 Tears-0.002
 subtract-0.002
 magnet-0.002
Token,
Token ID11
Feature activation-6.83e-04

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.012
aa+0.012
 loves+0.011
Neg logit contributions
amas-0.023
 subtract-0.022
 avail-0.022
 magnet-0.021
-0.021
Token the
Token ID262
Feature activation-5.25e-03

Pos logit contributions
amas+0.001
 subtract+0.001
 avail+0.001
icycle+0.001
+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token sky
Token ID6766
Feature activation-1.16e-02

Pos logit contributions
amas+0.010
 subtract+0.010
 avail+0.010
 magnet+0.010
+0.009
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token above
Token ID2029
Feature activation+1.24e-02

Pos logit contributions
amas+0.021
 avail+0.019
 subtract+0.019
 magnet+0.019
+0.018
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.017
aa-0.016
 loves-0.016
Token them
Token ID606
Feature activation+1.93e-02

Pos logit contributions
 ostr+0.020
 wo+0.019
ho+0.018
aa+0.017
 loves+0.016
Neg logit contributions
amas-0.022
 avail-0.021
 Tears-0.020
 subtract-0.020
 magnet-0.020
Token turned
Token ID2900
Feature activation+1.53e-02

Pos logit contributions
 ostr+0.032
 wo+0.030
ho+0.027
aa+0.027
 loves+0.026
Neg logit contributions
amas-0.033
 subtract-0.031
 Tears-0.031
-0.030
 avail-0.030
Token and
Token ID290
Feature activation-6.93e-03

Pos logit contributions
amas+0.021
 avail+0.021
 subtract+0.020
 magnet+0.019
+0.019
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.013
aa-0.012
 loves-0.011
Token began
Token ID2540
Feature activation-2.08e-03

Pos logit contributions
amas+0.011
 avail+0.011
 subtract+0.011
+0.010
 magnet+0.010
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.012
aa-0.011
 loves-0.011
Token to
Token ID284
Feature activation+8.23e-03

Pos logit contributions
amas+0.005
 subtract+0.005
 avail+0.005
 Tears+0.005
 magnet+0.005
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.001
aa-0.001
 loves-0.001
Token cry
Token ID3960
Feature activation-4.27e-03

Pos logit contributions
 ostr+0.015
 wo+0.014
ho+0.013
aa+0.012
 loves+0.012
Neg logit contributions
amas-0.014
 avail-0.013
 subtract-0.013
-0.013
 magnet-0.012
Token.
Token ID13
Feature activation-1.55e-03

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
 magnet+0.008
 Tears+0.008
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.005
aa-0.005
 loves-0.005
Token\n
Token ID198
Feature activation+6.27e-03

Pos logit contributions
amas+0.003
 avail+0.003
lf+0.003
 subtract+0.003
smanship+0.003
Neg logit contributions
 wo-0.002
 ostr-0.002
ho-0.002
aa-0.002
 loves-0.002
Token\n
Token ID198
Feature activation+6.09e-03

Pos logit contributions
 ostr+0.011
 wo+0.010
ho+0.009
aa+0.009
 loves+0.009
Neg logit contributions
amas-0.011
 subtract-0.010
 avail-0.010
-0.010
 Tears-0.010
TokenHis
Token ID6653
Feature activation-1.61e-02

Pos logit contributions
 ostr+0.012
 wo+0.011
ho+0.011
aa+0.011
 loves+0.010
Neg logit contributions
amas-0.009
 avail-0.009
 subtract-0.008
 magnet-0.008
ulum-0.008
Token friends
Token ID2460
Feature activation-1.96e-02

Pos logit contributions
 avail+0.024
amas+0.023
 boulder+0.023
 magnet+0.023
 Tears+0.023
Neg logit contributions
 ostr-0.030
 wo-0.028
ho-0.027
aa-0.026
 loves-0.024
Token woke
Token ID19092
Feature activation+1.48e-02

Pos logit contributions
amas+0.034
 subtract+0.032
 avail+0.032
 magnet+0.031
 Tears+0.031
Neg logit contributions
 ostr-0.033
 wo-0.031
ho-0.029
aa-0.028
 loves-0.027
Token up
Token ID510
Feature activation+4.27e-03

Pos logit contributions
 ostr+0.027
 wo+0.027
ho+0.025
aa+0.023
 loves+0.023
Neg logit contributions
amas-0.024
 avail-0.024
ulum-0.021
 Tears-0.021
 boulder-0.021
Token him
Token ID683
Feature activation+6.95e-03

Pos logit contributions
 ostr+0.019
 wo+0.018
ho+0.017
aa+0.016
 loves+0.016
Neg logit contributions
amas-0.020
 avail-0.019
 subtract-0.019
 magnet-0.019
 Tears-0.018
Token.
Token ID13
Feature activation-3.32e-03

Pos logit contributions
 ostr+0.011
 wo+0.010
ho+0.010
aa+0.009
 loves+0.009
Neg logit contributions
amas-0.013
 avail-0.012
 subtract-0.012
 magnet-0.012
-0.012
Token She
Token ID1375
Feature activation-3.79e-03

Pos logit contributions
amas+0.005
 avail+0.004
 subtract+0.004
 magnet+0.004
oon+0.004
Neg logit contributions
 ostr-0.007
 wo-0.007
ho-0.006
aa-0.006
 loves-0.006
Token said
Token ID531
Feature activation-5.94e-04

Pos logit contributions
amas+0.006
 subtract+0.006
 magnet+0.006
 avail+0.006
+0.006
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.006
aa-0.005
 loves-0.005
Token,
Token ID11
Feature activation-2.33e-03

Pos logit contributions
amas+0.001
 magnet+0.001
 subtract+0.001
 Tears+0.001
 avail+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
aa-0.001
ho-0.001
 loves-0.001
Token "
Token ID366
Feature activation-1.28e-02

Pos logit contributions
amas+0.004
 subtract+0.004
 magnet+0.004
 avail+0.004
+0.004
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.003
aa-0.003
 loves-0.003
TokenTom
Token ID13787
Feature activation-1.04e-02

Pos logit contributions
amas+0.021
 avail+0.020
 Tears+0.019
ulum+0.019
 magnet+0.019
Neg logit contributions
 ostr-0.023
 wo-0.022
ho-0.021
aa-0.021
 loves-0.019
Tokenmy
Token ID1820
Feature activation-7.99e-03

Pos logit contributions
amas+0.016
 subtract+0.015
 magnet+0.014
 avail+0.014
 Tears+0.014
Neg logit contributions
 ostr-0.020
 wo-0.019
ho-0.018
aa-0.017
 loves-0.016
Token,
Token ID11
Feature activation-3.07e-03

Pos logit contributions
amas+0.016
 subtract+0.016
 avail+0.015
 magnet+0.015
+0.015
Neg logit contributions
 ostr-0.011
 wo-0.010
ho-0.010
aa-0.009
 loves-0.009
Token are
Token ID389
Feature activation+1.10e-02

Pos logit contributions
amas+0.006
 subtract+0.005
 avail+0.005
 magnet+0.005
 Tears+0.005
Neg logit contributions
 ostr-0.005
 wo-0.005
ho-0.004
aa-0.004
 loves-0.004
Token you
Token ID345
Feature activation+1.76e-02

Pos logit contributions
 ostr+0.019
 wo+0.017
ho+0.017
aa+0.016
 loves+0.015
Neg logit contributions
amas-0.019
 subtract-0.019
 avail-0.018
 magnet-0.018
 Tears-0.018
Token smiled
Token ID13541
Feature activation-1.46e-04

Pos logit contributions
amas+0.004
 subtract+0.004
 magnet+0.004
 avail+0.004
+0.004
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.004
 loves-0.003
Token and
Token ID290
Feature activation-2.08e-03

Pos logit contributions
amas+0.000
 avail+0.000
 subtract+0.000
 magnet+0.000
 Tears+0.000
Neg logit contributions
 ostr-0.000
 wo-0.000
ho-0.000
aa-0.000
 loves-0.000
Token enjoyed
Token ID8359
Feature activation-7.09e-03

Pos logit contributions
amas+0.004
 avail+0.004
 subtract+0.004
 magnet+0.004
+0.004
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.003
 loves-0.003
Token his
Token ID465
Feature activation-4.96e-03

Pos logit contributions
amas+0.012
 avail+0.011
 subtract+0.011
 magnet+0.010
+0.010
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.012
aa-0.011
 loves-0.011
Token time
Token ID640
Feature activation+1.28e-02

Pos logit contributions
amas+0.009
 subtract+0.008
 avail+0.008
 magnet+0.008
+0.008
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.007
 loves-0.006
Token on
Token ID319
Feature activation-1.29e-02

Pos logit contributions
 ostr+0.020
 wo+0.019
ho+0.018
aa+0.017
 loves+0.017
Neg logit contributions
amas-0.024
 avail-0.023
 subtract-0.022
ulum-0.022
 magnet-0.022
Token the
Token ID262
Feature activation-2.32e-03

Pos logit contributions
amas+0.024
 subtract+0.023
 avail+0.022
oon+0.022
iece+0.022
Neg logit contributions
 ostr-0.020
 wo-0.019
ho-0.017
 loves-0.016
aa-0.016
Token bench
Token ID7624
Feature activation+5.35e-03

Pos logit contributions
amas+0.003
 magnet+0.003
 boulder+0.003
 subtract+0.003
 avail+0.003
Neg logit contributions
 ostr-0.004
 wo-0.004
aa-0.004
ho-0.004
 loves-0.004
Token.
Token ID13
Feature activation-8.04e-04

Pos logit contributions
 ostr+0.007
 wo+0.006
ho+0.006
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.012
 avail-0.011
 subtract-0.011
 magnet-0.011
ulum-0.010
Token He
Token ID679
Feature activation-7.61e-03

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
 magnet+0.001
smanship+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
aa-0.001
 loves-0.001
ho-0.001
Token looked
Token ID3114
Feature activation+2.34e-03

Pos logit contributions
amas+0.013
 subtract+0.013
 magnet+0.012
 avail+0.012
+0.012
Neg logit contributions
 ostr-0.012
 wo-0.012
ho-0.011
aa-0.011
 loves-0.010
Token in
Token ID287
Feature activation-1.12e-02

Pos logit contributions
amas+0.007
 avail+0.007
 subtract+0.007
 magnet+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.006
 wo-0.005
aa-0.004
ho-0.004
 loves-0.004
Token a
Token ID257
Feature activation-8.53e-03

Pos logit contributions
amas+0.026
 subtract+0.024
 avail+0.023
ulum+0.023
+0.023
Neg logit contributions
 ostr-0.013
 wo-0.012
ho-0.011
aa-0.010
 loves-0.010
Token line
Token ID1627
Feature activation-1.38e-02

Pos logit contributions
amas+0.015
 subtract+0.014
 avail+0.014
+0.013
 Tears+0.013
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.013
aa-0.013
 loves-0.012
Token.
Token ID13
Feature activation-6.87e-03

Pos logit contributions
amas+0.027
 subtract+0.026
 avail+0.025
 magnet+0.025
 Tears+0.024
Neg logit contributions
 ostr-0.021
 wo-0.019
ho-0.018
aa-0.017
 loves-0.016
Token They
Token ID1119
Feature activation+2.69e-03

Pos logit contributions
amas+0.012
 avail+0.012
 subtract+0.011
 magnet+0.011
+0.011
Neg logit contributions
 ostr-0.011
 wo-0.011
ho-0.010
aa-0.010
 loves-0.010
Token say
Token ID910
Feature activation-1.29e-02

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.005
 avail-0.005
 subtract-0.004
 magnet-0.004
-0.004
Token "
Token ID366
Feature activation-1.30e-02

Pos logit contributions
 subtract+0.020
 basketball+0.019
 magnet+0.019
 Tears+0.019
 towers+0.018
Neg logit contributions
 ostr-0.022
ho-0.020
aa-0.020
ind-0.019
 loves-0.018
Tokenm
Token ID76
Feature activation-1.25e-02

Pos logit contributions
amas+0.020
 avail+0.019
 magnet+0.018
 subtract+0.018
 Tears+0.018
Neg logit contributions
 ostr-0.024
 wo-0.023
ho-0.023
aa-0.022
 loves-0.021
Tokenarch
Token ID998
Feature activation+3.48e-03

Pos logit contributions
amas+0.023
 subtract+0.021
 Tears+0.021
+0.021
 avail+0.020
Neg logit contributions
 ostr-0.020
ho-0.018
 wo-0.018
aa-0.017
 loves-0.016
Token,
Token ID11
Feature activation-4.65e-03

Pos logit contributions
 ostr+0.006
 wo+0.006
ho+0.006
 loves+0.005
unn+0.005
Neg logit contributions
amas-0.006
 boulder-0.005
 avail-0.005
-0.005
 magnet-0.005
Token march
Token ID9960
Feature activation-1.39e-03

Pos logit contributions
amas+0.008
 subtract+0.008
 avail+0.008
 magnet+0.007
+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.007
 loves-0.007
TokenYes
Token ID5297
Feature activation-1.51e-02

Pos logit contributions
amas+0.002
 avail+0.002
 subtract+0.002
 magnet+0.002
 Tears+0.002
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.003
 loves-0.003
Token,
Token ID11
Feature activation-9.08e-04

Pos logit contributions
amas+0.027
 avail+0.025
 subtract+0.024
 magnet+0.024
+0.023
Neg logit contributions
 ostr-0.026
ho-0.023
 wo-0.023
 loves-0.021
aa-0.021
Token the
Token ID262
Feature activation-1.18e-03

Pos logit contributions
amas+0.002
 avail+0.001
 subtract+0.001
 magnet+0.001
 Tears+0.001
Neg logit contributions
 ostr-0.002
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token egg
Token ID5935
Feature activation-4.86e-03

Pos logit contributions
amas+0.002
 avail+0.002
 magnet+0.002
 boulder+0.002
 Tears+0.002
Neg logit contributions
 wo-0.002
 ostr-0.002
aa-0.002
ho-0.002
 loves-0.002
Token will
Token ID481
Feature activation+7.61e-03

Pos logit contributions
amas+0.010
 magnet+0.010
oon+0.010
 subtract+0.010
 Tears+0.010
Neg logit contributions
 ostr-0.006
 wo-0.005
ho-0.005
aa-0.005
la-0.004
Token be
Token ID307
Feature activation-1.85e-02

Pos logit contributions
 ostr+0.014
 wo+0.013
ho+0.012
aa+0.012
la+0.011
Neg logit contributions
amas-0.012
-0.012
 avail-0.011
 magnet-0.011
 subtract-0.011
Token warm
Token ID5814
Feature activation-4.73e-03

Pos logit contributions
amas+0.030
 avail+0.029
 magnet+0.027
 Tears+0.027
 subtract+0.026
Neg logit contributions
 ostr-0.033
 wo-0.032
ho-0.032
aa-0.029
 loves-0.027
Token.
Token ID13
Feature activation+8.06e-06

Pos logit contributions
amas+0.010
 subtract+0.009
 avail+0.009
 magnet+0.009
+0.009
Neg logit contributions
 ostr-0.006
 wo-0.006
ho-0.006
aa-0.005
 loves-0.005
Token It
Token ID632
Feature activation+2.69e-03

Feature not active on prev token; no predictions were affected.
Token will
Token ID481
Feature activation+7.66e-03

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.003
aa+0.002
arl+0.002
Neg logit contributions
amas-0.006
 magnet-0.006
 subtract-0.006
 Tears-0.006
-0.006
Token be
Token ID307
Feature activation-1.82e-02

Pos logit contributions
 ostr+0.014
 wo+0.013
aa+0.012
ho+0.012
 loves+0.011
Neg logit contributions
amas-0.012
 avail-0.012
-0.011
 magnet-0.011
 subtract-0.011
Token
Token ID198
Feature activation+7.33e-04

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
aa+0.001
la+0.001
Neg logit contributions
amas-0.000
 Tears-0.000
 subtract-0.000
oon-0.000
 magnet-0.000
Token"
Token ID1
Feature activation-1.19e-03

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
aa+0.001
 loves+0.001
Neg logit contributions
amas-0.001
 avail-0.001
 subtract-0.001
 magnet-0.001
-0.001
TokenBut
Token ID1537
Feature activation-2.10e-02

Pos logit contributions
amas+0.002
 avail+0.002
 magnet+0.002
 subtract+0.002
 boulder+0.002
Neg logit contributions
 ostr-0.002
 wo-0.002
ho-0.002
aa-0.002
 loves-0.002
Token you
Token ID345
Feature activation+1.60e-02

Pos logit contributions
amas+0.037
 avail+0.035
 subtract+0.035
 magnet+0.034
+0.033
Neg logit contributions
 ostr-0.035
 wo-0.033
ho-0.031
aa-0.030
 loves-0.029
Token will
Token ID481
Feature activation+5.21e-03

Pos logit contributions
 ostr+0.023
 wo+0.021
ho+0.019
aa+0.019
 loves+0.018
Neg logit contributions
amas-0.033
 avail-0.032
 subtract-0.031
 magnet-0.030
 Tears-0.030
Token get
Token ID651
Feature activation-1.12e-02

Pos logit contributions
 ostr+0.009
 wo+0.009
ho+0.008
aa+0.008
 loves+0.008
Neg logit contributions
 avail-0.008
amas-0.008
-0.008
 subtract-0.007
 seek-0.007
Token sick
Token ID6639
Feature activation-2.05e-03

Pos logit contributions
amas+0.016
 avail+0.015
 magnet+0.015
 subtract+0.015
 Tears+0.015
Neg logit contributions
 ostr-0.022
 wo-0.021
ho-0.020
 loves-0.018
rav-0.018
Token,
Token ID11
Feature activation-5.88e-03

Pos logit contributions
amas+0.004
 avail+0.004
 subtract+0.004
 magnet+0.004
+0.004
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.002
aa-0.002
 loves-0.002
Token Lily
Token ID20037
Feature activation-1.87e-03

Pos logit contributions
amas+0.011
 avail+0.011
 subtract+0.011
+0.011
icycle+0.010
Neg logit contributions
 ostr-0.009
 wo-0.009
ho-0.008
aa-0.008
 loves-0.007
Token,"
Token ID553
Feature activation-6.53e-03

Pos logit contributions
amas+0.003
 magnet+0.003
 avail+0.003
oon+0.003
 subtract+0.003
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.003
aa-0.003
rav-0.002
Token Sam
Token ID3409
Feature activation-8.97e-03

Pos logit contributions
amas+0.012
+0.011
oon+0.011
 avail+0.011
 subtract+0.011
Neg logit contributions
 ostr-0.011
 wo-0.011
ho-0.010
 loves-0.009
aa-0.009
Token
Token ID198
Feature activation+1.87e-03

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
lf+0.003
smanship+0.003
Neg logit contributions
 wo-0.005
 ostr-0.005
 loves-0.005
aa-0.005
ho-0.005
Token
Token ID198
Feature activation+2.09e-03

Pos logit contributions
 ostr+0.004
 wo+0.003
ho+0.003
aa+0.003
 loves+0.003
Neg logit contributions
amas-0.003
 subtract-0.003
 Tears-0.003
 magnet-0.003
oon-0.003
Token"
Token ID1
Feature activation+1.18e-03

Pos logit contributions
 ostr+0.004
 wo+0.003
ho+0.003
aa+0.003
 loves+0.003
Neg logit contributions
amas-0.004
 avail-0.003
 subtract-0.003
 magnet-0.003
-0.003
TokenIt
Token ID1026
Feature activation+5.62e-04

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.002
 avail-0.002
 Tears-0.002
ulum-0.002
 boulder-0.002
Token's
Token ID338
Feature activation+4.26e-03

Pos logit contributions
 ostr+0.001
 wo+0.001
ho+0.001
aa+0.001
la+0.001
Neg logit contributions
amas-0.001
 magnet-0.001
 subtract-0.001
 Tears-0.001
oon-0.001
Token okay
Token ID8788
Feature activation-5.79e-02

Pos logit contributions
 ostr+0.008
 wo+0.007
ho+0.007
aa+0.007
 loves+0.006
Neg logit contributions
amas-0.007
 avail-0.007
 subtract-0.006
 magnet-0.006
 Tears-0.006
Token,
Token ID11
Feature activation-4.54e-03

Pos logit contributions
 avail+0.090
amas+0.089
 magnet+0.089
lf+0.084
 subtract+0.082
Neg logit contributions
 ostr-0.101
 wo-0.091
ho-0.089
aa-0.087
la-0.082
Token kids
Token ID3988
Feature activation-4.79e-03

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
 magnet+0.007
+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.006
 loves-0.006
Token,"
Token ID553
Feature activation-4.26e-03

Pos logit contributions
amas+0.008
 magnet+0.008
 subtract+0.008
pen+0.007
 avail+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.007
 loves-0.006
Token he
Token ID339
Feature activation-8.60e-03

Pos logit contributions
amas+0.007
 avail+0.006
 subtract+0.006
+0.006
ulum+0.006
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
 loves-0.007
Token said
Token ID531
Feature activation-3.28e-03

Pos logit contributions
 subtract+0.014
amas+0.014
 magnet+0.013
 Tears+0.013
 avail+0.013
Neg logit contributions
 ostr-0.014
ho-0.013
 wo-0.013
aa-0.012
 loves-0.012
Token and
Token ID290
Feature activation-3.57e-03

Pos logit contributions
 avail+0.004
amas+0.004
+0.003
 subtract+0.003
smanship+0.003
Neg logit contributions
 ostr-0.003
 wo-0.003
ho-0.002
 loves-0.002
aa-0.002
Token then
Token ID788
Feature activation+1.74e-02

Pos logit contributions
amas+0.007
 avail+0.006
 subtract+0.006
+0.006
 magnet+0.006
Neg logit contributions
 ostr-0.006
 wo-0.005
ho-0.005
 loves-0.005
aa-0.005
Token on
Token ID319
Feature activation-1.69e-02

Pos logit contributions
 ostr+0.029
 wo+0.028
ho+0.025
aa+0.025
 loves+0.024
Neg logit contributions
amas-0.031
 avail-0.029
 subtract-0.029
-0.029
 magnet-0.028
Token a
Token ID257
Feature activation-1.40e-02

Pos logit contributions
amas+0.039
 subtract+0.038
iece+0.037
wheel+0.036
 avail+0.036
Neg logit contributions
 ostr-0.017
 wo-0.016
 their-0.014
ho-0.014
 loves-0.014
Token book
Token ID1492
Feature activation-5.22e-03

Pos logit contributions
amas+0.023
 avail+0.021
 subtract+0.021
 magnet+0.021
+0.020
Neg logit contributions
 ostr-0.025
 wo-0.024
ho-0.023
aa-0.022
 loves-0.021
Tokencase
Token ID7442
Feature activation-5.66e-02

Pos logit contributions
amas+0.010
 avail+0.009
 subtract+0.009
 magnet+0.009
+0.009
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
 loves-0.007
Token.
Token ID13
Feature activation-1.57e-03

Pos logit contributions
amas+0.106
 subtract+0.101
 avail+0.100
 magnet+0.098
+0.097
Neg logit contributions
 ostr-0.086
 wo-0.081
aa-0.075
ho-0.074
 loves-0.069
Token She
Token ID1375
Feature activation-9.15e-03

Pos logit contributions
 avail+0.003
smanship+0.003
amas+0.003
iece+0.002
 subtract+0.002
Neg logit contributions
 wo-0.003
 ostr-0.003
aa-0.002
ho-0.002
 loves-0.002
Token reached
Token ID4251
Feature activation+1.20e-02

Pos logit contributions
amas+0.016
 subtract+0.016
 avail+0.015
 magnet+0.015
 Tears+0.015
Neg logit contributions
 ostr-0.015
 wo-0.014
ho-0.013
aa-0.013
 loves-0.012
Token for
Token ID329
Feature activation+5.28e-03

Pos logit contributions
 ostr+0.018
 wo+0.017
ho+0.016
aa+0.015
 loves+0.014
Neg logit contributions
amas-0.024
 avail-0.022
 subtract-0.022
 Tears-0.022
 magnet-0.022
Token the
Token ID262
Feature activation-7.12e-03

Pos logit contributions
 ostr+0.006
 wo+0.006
ho+0.005
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.012
oon-0.011
 magnet-0.011
ulum-0.011
icycle-0.011
Token't
Token ID470
Feature activation+2.55e-03

Pos logit contributions
amas+0.029
 avail+0.028
 subtract+0.028
 magnet+0.027
 Tears+0.026
Neg logit contributions
 ostr-0.032
 wo-0.031
ho-0.029
aa-0.029
 loves-0.028
Token do
Token ID466
Feature activation-1.82e-02

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.004
aa+0.004
 loves+0.003
Neg logit contributions
amas-0.005
 subtract-0.004
 avail-0.004
-0.004
 Tears-0.004
Token that
Token ID326
Feature activation+2.35e-02

Pos logit contributions
amas+0.032
 avail+0.032
 subtract+0.031
 magnet+0.031
 Tears+0.030
Neg logit contributions
 ostr-0.030
 wo-0.028
ho-0.027
aa-0.026
 loves-0.024
Token again
Token ID757
Feature activation-1.44e-02

Pos logit contributions
 ostr+0.040
 wo+0.037
ho+0.036
aa+0.034
 loves+0.031
Neg logit contributions
amas-0.041
 avail-0.040
 subtract-0.038
 magnet-0.038
ulum-0.037
Token,
Token ID11
Feature activation-5.56e-03

Pos logit contributions
amas+0.025
 avail+0.025
 subtract+0.024
 magnet+0.023
ulum+0.023
Neg logit contributions
 ostr-0.024
 wo-0.022
ho-0.022
aa-0.020
la-0.019
Token okay
Token ID8788
Feature activation-6.04e-02

Pos logit contributions
amas+0.010
 avail+0.010
 subtract+0.010
+0.009
 magnet+0.009
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.008
 loves-0.007
Token?"
Token ID1701
Feature activation-4.69e-03

Pos logit contributions
amas+0.095
 avail+0.093
 magnet+0.092
ulum+0.087
 subtract+0.086
Neg logit contributions
 ostr-0.112
 wo-0.104
ho-0.099
aa-0.094
la-0.093
Token Sara
Token ID24799
Feature activation+6.51e-03

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.008
 magnet+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.008
 wo-0.007
ho-0.007
aa-0.007
 loves-0.006
Token says
Token ID1139
Feature activation-3.79e-03

Pos logit contributions
 ostr+0.007
 wo+0.006
ho+0.005
aa+0.005
 loves+0.005
Neg logit contributions
amas-0.016
 avail-0.015
 subtract-0.015
-0.015
 magnet-0.015
Token.
Token ID13
Feature activation-2.84e-03

Pos logit contributions
amas+0.007
 subtract+0.007
 avail+0.007
 magnet+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.006
 wo-0.005
ho-0.005
aa-0.005
 loves-0.005
Token
Token ID198
Feature activation+3.73e-04

Pos logit contributions
amas+0.005
 subtract+0.005
 avail+0.005
 magnet+0.005
+0.005
Neg logit contributions
 ostr-0.005
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Token says
Token ID1139
Feature activation-4.05e-03

Pos logit contributions
amas+0.022
 subtract+0.022
 magnet+0.021
 Tears+0.021
 boulder+0.020
Neg logit contributions
 ostr-0.014
ho-0.013
 wo-0.012
aa-0.012
 loves-0.010
Token,
Token ID11
Feature activation-5.61e-03

Pos logit contributions
+0.006
on+0.005
amas+0.005
 Tears+0.005
 magnet+0.005
Neg logit contributions
 ostr-0.008
aa-0.007
 wo-0.007
ho-0.006
 loves-0.006
Token "
Token ID366
Feature activation-1.21e-02

Pos logit contributions
amas+0.008
+0.008
 subtract+0.008
 magnet+0.007
 Tears+0.007
Neg logit contributions
 ostr-0.011
aa-0.010
 wo-0.010
ho-0.010
 loves-0.009
TokenBen
Token ID11696
Feature activation+1.57e-03

Pos logit contributions
amas+0.019
 avail+0.018
 subtract+0.017
 magnet+0.017
 Tears+0.016
Neg logit contributions
 ostr-0.023
 wo-0.022
ho-0.021
aa-0.021
 loves-0.020
Token is
Token ID318
Feature activation+1.86e-02

Pos logit contributions
 ostr+0.002
 wo+0.002
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.003
 subtract-0.003
 magnet-0.003
 avail-0.003
-0.003
Token okay
Token ID8788
Feature activation-6.09e-02

Pos logit contributions
 ostr+0.029
 wo+0.027
ho+0.025
aa+0.024
 loves+0.023
Neg logit contributions
amas-0.036
 avail-0.034
 subtract-0.034
 magnet-0.033
-0.032
Token.
Token ID13
Feature activation-5.12e-03

Pos logit contributions
amas+0.113
 avail+0.108
 subtract+0.106
 magnet+0.103
 Tears+0.102
Neg logit contributions
 ostr-0.096
 wo-0.089
ho-0.085
aa-0.080
 loves-0.077
Token He
Token ID679
Feature activation-5.56e-03

Pos logit contributions
amas+0.008
 avail+0.008
 subtract+0.007
+0.007
lf+0.007
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.009
 loves-0.009
aa-0.009
Token touched
Token ID12615
Feature activation-1.26e-02

Pos logit contributions
amas+0.011
 magnet+0.010
 subtract+0.010
+0.010
 Tears+0.010
Neg logit contributions
 ostr-0.008
 wo-0.008
ho-0.007
aa-0.007
la-0.006
Token the
Token ID262
Feature activation-2.32e-03

Pos logit contributions
amas+0.024
 avail+0.023
 subtract+0.023
 magnet+0.022
 Tears+0.022
Neg logit contributions
 ostr-0.019
 wo-0.018
ho-0.017
aa-0.016
 loves-0.015
Token oven
Token ID14361
Feature activation+1.99e-02

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
+0.003
 Tears+0.003
Neg logit contributions
 ostr-0.005
 wo-0.004
ho-0.004
aa-0.004
 loves-0.004
Token"
Token ID1
Feature activation-4.30e-04

Pos logit contributions
 ostr+0.004
 wo+0.004
ho+0.004
aa+0.004
 loves+0.004
Neg logit contributions
amas-0.006
 avail-0.006
 subtract-0.006
 magnet-0.005
-0.005
TokenYes
Token ID5297
Feature activation-1.60e-02

Pos logit contributions
amas+0.001
 avail+0.001
 subtract+0.001
 magnet+0.001
 Tears+0.001
Neg logit contributions
 ostr-0.001
 wo-0.001
ho-0.001
aa-0.001
 loves-0.001
Token,
Token ID11
Feature activation-5.52e-03

Pos logit contributions
amas+0.025
 avail+0.024
lf+0.023
 magnet+0.023
+0.023
Neg logit contributions
 ostr-0.028
ho-0.026
 wo-0.026
aa-0.025
 loves-0.024
Token we
Token ID356
Feature activation+1.96e-03

Pos logit contributions
amas+0.009
 subtract+0.009
 magnet+0.009
 avail+0.009
lf+0.009
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.008
 loves-0.008
Token're
Token ID821
Feature activation-5.76e-03

Pos logit contributions
 ostr+0.003
 wo+0.003
ho+0.002
aa+0.002
 loves+0.002
Neg logit contributions
amas-0.004
 subtract-0.004
 magnet-0.004
 avail-0.004
-0.004
Token okay
Token ID8788
Feature activation-6.10e-02

Pos logit contributions
amas+0.010
 avail+0.009
 subtract+0.009
 magnet+0.009
 Tears+0.009
Neg logit contributions
 ostr-0.010
 wo-0.009
ho-0.009
aa-0.009
 loves-0.008
Token,
Token ID11
Feature activation-5.29e-03

Pos logit contributions
amas+0.105
 avail+0.100
 subtract+0.095
 magnet+0.095
 Tears+0.095
Neg logit contributions
 ostr-0.103
 wo-0.096
ho-0.092
aa-0.087
 loves-0.082
Token sweet
Token ID6029
Feature activation+8.14e-03

Pos logit contributions
amas+0.009
 avail+0.009
 subtract+0.009
 magnet+0.008
 Tears+0.008
Neg logit contributions
 ostr-0.009
 wo-0.008
ho-0.008
aa-0.008
 loves-0.007
Tokenies
Token ID444
Feature activation-2.28e-03

Pos logit contributions
 ostr+0.013
 wo+0.013
ho+0.012
 loves+0.011
aa+0.010
Neg logit contributions
amas-0.014
 avail-0.014
lf-0.013
 magnet-0.013
 subtract-0.013
Token.
Token ID13
Feature activation-2.06e-03

Pos logit contributions
amas+0.004
 magnet+0.003
oon+0.003
 subtract+0.003
smanship+0.003
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.003
 loves-0.003
Token Don
Token ID2094
Feature activation-9.05e-03

Pos logit contributions
amas+0.003
 avail+0.003
 subtract+0.003
 magnet+0.003
+0.003
Neg logit contributions
 ostr-0.004
 wo-0.004
ho-0.004
aa-0.003
 loves-0.003