TOP ACTIVATIONS
MAX = 0.123

to me . The or nam ents were very important to
more careful . The or nam ents are not toys .
. It made the b unn ies smile and cheer .
never played with the or nam ents again . They still
on top . The b unn ies wanted to have a
unn ies . The b unn ies lived in a cozy
was ready , the b unn ies hopped over to the
want to see the b unn ies . They are the
to look at the or nam ents on the tree .
the rest of the or nam ents onto the tree .
Everyday he would take the av oc ados and scatter them
the animals , the ac rob ats and the clown s
hopped over to the b unn ies . They were so
on it . The or nam ents were shiny and colorful
time playing with her or nam ents and making her special
to help sort the or nam ents . She picked up
and play with your or nam ents . Maybe you can
he especially liked the or nam ents he kept around .
She learned that the or nam ents were not for playing
the house saw the b unn ies up in the nest

MIN ACTIVATIONS
MIN = -0.093

or read a book or dance . But Lily
her leg and could not dance . She said that she
floor or on a soft mat . They are careful and
Ben smile and hug their dolls . They say , "
looked very happy and not boring anymore .
one more cartoon before bed time . <|endoftext|> Once upon a
like it hadn 't r ained in a long time .
tree pose and a warrior pose . They were fun ,
to get away un har med . From that day onwards
dust her shelves before bed time . <|endoftext|> Once upon a
and didn 't want to dance anymore . Her
because it had not r ained for many days .
be healthy and un har med . Suddenly , her wish
to hug and kiss their dolls . L ily
and didn 't want to dance . She felt like she
shouted to no one in particular . Suddenly , a voice
and didn 't want to dance anymore . Em
and r ained and r ained . The water came down
was destroyed and could not spin anymore . Tim was very
broke and she couldn 't dance anymore . She felt sad

INTERVAL 0.069 - 0.123
CONTAINS 0.025%

. Here you go . Enjoy !" M um
He went on the fer ris wheel , the roller coaster
to see the rh in oc eros . We don 't
!" Ben says . " P ill ows are fun !"
park and saw that the other children were playing a game

INTERVAL 0.015 - 0.069
CONTAINS 16.995%

cookies . But don 't eat too many , or you
So she sliced the pepper into pieces and passed them
ily smiled . She liked to teach Max new things .
sunshine . One day he thought it would be fun to
. The dragon was happy to see Sally . Sally was

INTERVAL -0.039 - 0.015
CONTAINS 81.633%

. I won 't do it again . I just wanted
a regular house with her mom and dad . One day
to listen to his parents and would often run away from
of light in the hall . She followed it and found
?" Lily asked . " I don 't know

INTERVAL -0.093 - -0.039
CONTAINS 1.347%

revealing his plan and motive at the end of the last
player and puts a new CD in . She plays the
passed away , and his beautiful paintings were lost forever .
shiny and had a long cord . " Wow
from her . At the park , Lily met
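
The four INTERVAL bounds above are consistent with splitting the range from MIN (-0.093) to MAX (0.123) into four equal-width bins of 0.054 each, with each CONTAINS figure giving the share of token activations that fall in that bin. The source does not state how the bins were chosen, so this reading is an inference from the numbers. The sketch below shows one way such bounds and percentages could be computed. The function names `equal_width_intervals` and `interval_percentages` are hypothetical and are not taken from the tool that produced this dump.

```python
def equal_width_intervals(lo, hi, n_bins=4):
    """Return n_bins (low, high) pairs of equal width covering [lo, hi]."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


def interval_percentages(activations, n_bins=4):
    """Percentage of activations in each equal-width bin, lowest bin first."""
    lo, hi = min(activations), max(activations)
    if hi == lo:
        # Degenerate case: every value is identical, so all fall in one bin.
        return [100.0] + [0.0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp so the maximum value lands in the top bin, not past it.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [100.0 * c / len(activations) for c in counts]


# Bounds implied by the MIN and MAX reported above (assumed equal-width bins),
# printed from the highest interval down to match the listing order.
for low, high in reversed(equal_width_intervals(-0.093, 0.123)):
    print(f"INTERVAL {low:.3f} - {high:.3f}")
```

Running the loop reproduces the four interval bounds listed above. The percentages themselves cannot be recomputed from this dump, because only a handful of sample sequences per interval are shown rather than the full set of activations.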
Token to
Token ID284
Feature activation-9.82e-03

Pos logit contributions
ll+0.019
opus+0.018
arter+0.018
 sidelines+0.018
ye+0.018
Neg logit contributions
 her-0.006
 and-0.005
stick-0.005
 enough-0.004
 them-0.004
Token me
Token ID502
Feature activation-1.97e-02

Pos logit contributions
 and+0.011
 her+0.009
 as+0.008
.+0.008
,+0.008
Neg logit contributions
 sidelines-0.019
arter-0.019
opus-0.018
ye-0.018
prise-0.017
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
 her+0.011
 them+0.010
 and+0.009
 its+0.009
 enough+0.008
Neg logit contributions
ll-0.041
cher-0.041
 cheek-0.040
arter-0.040
uch-0.040
Token The
Token ID383
Feature activation+2.27e-02

Pos logit contributions
opus+0.027
arter+0.027
 sidelines+0.026
ye+0.025
uch+0.024
Neg logit contributions
 and-0.026
 her-0.023
.-0.021
 as-0.021
 how-0.021
Token or
Token ID393
Feature activation+1.69e-02

Pos logit contributions
 sidelines+0.034
opus+0.034
arter+0.034
ye+0.031
uch+0.031
Neg logit contributions
 and-0.035
 her-0.033
 as-0.030
 how-0.030
 them-0.030
Tokennam
Token ID7402
Feature activation+1.23e-01

Pos logit contributions
opus+0.040
 sidelines+0.040
arter+0.039
ye+0.038
uch+0.037
Neg logit contributions
 and-0.012
 her-0.010
.-0.008
 as-0.008
 them-0.008
Tokenents
Token ID658
Feature activation+1.88e-02

Pos logit contributions
arter+0.203
opus+0.203
 sidelines+0.202
ye+0.189
uch+0.187
Neg logit contributions
 and-0.177
 her-0.160
.-0.146
 as-0.145
 how-0.143
Token were
Token ID547
Feature activation-5.17e-03

Pos logit contributions
uch+0.026
ll+0.025
arter+0.024
 sidelines+0.024
opus+0.024
Neg logit contributions
 her-0.024
ine-0.024
 and-0.024
 enough-0.023
 one-0.023
Token very
Token ID845
Feature activation+1.89e-02

Pos logit contributions
 and+0.007
 her+0.007
,+0.006
 them+0.006
.+0.006
Neg logit contributions
arter-0.008
 sidelines-0.007
 cheesy-0.007
ye-0.007
opus-0.007
Token important
Token ID1593
Feature activation-2.07e-02

Pos logit contributions
opus+0.029
 sidelines+0.029
arter+0.028
umm+0.027
atmeal+0.026
Neg logit contributions
 and-0.031
 her-0.029
.-0.027
 as-0.026
,-0.026
Token to
Token ID284
Feature activation-1.03e-02

Pos logit contributions
c+0.014
 her+0.013
 them+0.011
stick+0.011
 its+0.011
Neg logit contributions
ider-0.041
ll-0.041
ve-0.040
 cooperative-0.039
opus-0.038
Token more
Token ID517
Feature activation+3.14e-03

Pos logit contributions
prise+0.002
 sidelines+0.002
arter+0.002
 cheesy+0.002
cher+0.002
Neg logit contributions
 and-0.005
 her-0.004
 or-0.004
 them-0.004
.-0.004
Token careful
Token ID8161
Feature activation+5.19e-02

Pos logit contributions
arter+0.004
opus+0.004
 sidelines+0.004
ye+0.003
prise+0.003
Neg logit contributions
 and-0.006
 her-0.006
 them-0.005
 how-0.005
 as-0.005
Token.
Token ID13
Feature activation+2.04e-02

Pos logit contributions
 cheek+0.105
ll+0.104
uch+0.103
ve+0.102
arter+0.102
Neg logit contributions
 her-0.032
 its-0.026
 them-0.025
 their-0.023
 and-0.023
Token The
Token ID383
Feature activation+2.13e-02

Pos logit contributions
opus+0.033
 sidelines+0.033
arter+0.032
ve+0.031
ye+0.031
Neg logit contributions
 and-0.031
 her-0.027
.-0.025
 how-0.025
 as-0.025
Token or
Token ID393
Feature activation+2.11e-02

Pos logit contributions
opus+0.032
 sidelines+0.032
arter+0.032
ye+0.029
uch+0.029
Neg logit contributions
 and-0.034
 her-0.031
.-0.028
 as-0.028
 how-0.028
Tokennam
Token ID7402
Feature activation+1.22e-01

Pos logit contributions
opus+0.040
arter+0.040
 sidelines+0.040
ye+0.038
uch+0.037
Neg logit contributions
 and-0.025
 her-0.022
.-0.019
 as-0.019
 them-0.019
Tokenents
Token ID658
Feature activation+2.37e-02

Pos logit contributions
arter+0.204
 sidelines+0.204
opus+0.204
ye+0.191
uch+0.188
Neg logit contributions
 and-0.174
 her-0.156
.-0.143
 as-0.141
 how-0.140
Token are
Token ID389
Feature activation-5.74e-03

Pos logit contributions
uch+0.035
ll+0.035
opus+0.034
ve+0.033
cher+0.033
Neg logit contributions
 her-0.029
 and-0.028
ine-0.027
 enough-0.026
 one-0.026
Token not
Token ID407
Feature activation+1.70e-02

Pos logit contributions
 and+0.008
 her+0.007
.+0.006
 them+0.006
,+0.006
Neg logit contributions
arter-0.009
 sidelines-0.009
opus-0.008
ye-0.008
 cheesy-0.008
Token toys
Token ID14958
Feature activation-1.76e-02

Pos logit contributions
arter+0.023
ye+0.022
ve+0.021
 cheesy+0.021
 sidelines+0.021
Neg logit contributions
 and-0.025
 her-0.023
 them-0.022
-0.021
 Every-0.020
Token.
Token ID13
Feature activation+1.54e-02

Pos logit contributions
 her+0.009
 and+0.007
ine+0.006
 its+0.006
 them+0.006
Neg logit contributions
opus-0.040
arter-0.039
uch-0.039
ll-0.039
 sidelines-0.039
Token.
Token ID13
Feature activation-1.04e-03

Pos logit contributions
 her+0.001
 Every+0.001
 One+0.001
ine+0.001
 Whenever+0.001
Neg logit contributions
ll-0.004
beans-0.004
arter-0.004
 hamb-0.004
opus-0.004
Token It
Token ID632
Feature activation+1.26e-02

Pos logit contributions
 and+0.001
 her+0.001
 enough+0.001
 as+0.001
 or+0.001
Neg logit contributions
arter-0.002
opus-0.002
 sidelines-0.002
leaf-0.002
 cheesy-0.002
Token made
Token ID925
Feature activation+6.47e-03

Pos logit contributions
opus+0.023
arter+0.022
ll+0.022
uch+0.022
atmeal+0.021
Neg logit contributions
 and-0.014
 her-0.013
 as-0.012
 them-0.012
.-0.012
Token the
Token ID262
Feature activation+3.07e-02

Pos logit contributions
 sidelines+0.014
arter+0.014
opus+0.013
 cheesy+0.013
leaf+0.013
Neg logit contributions
 and-0.005
 her-0.004
 or-0.004
.-0.003
stick-0.003
Token b
Token ID275
Feature activation+7.10e-02

Pos logit contributions
arter+0.053
 sidelines+0.049
opus+0.048
prise+0.048
leaf+0.048
Neg logit contributions
 and-0.048
 her-0.041
 one-0.040
.-0.039
,-0.038
Tokenunn
Token ID20935
Feature activation+1.20e-01

Pos logit contributions
 sidelines+0.077
opus+0.074
cher+0.074
arter+0.072
vell+0.071
Neg logit contributions
 and-0.146
 as-0.130
.-0.127
 her-0.124
,-0.122
Tokenies
Token ID444
Feature activation+1.00e-02

Pos logit contributions
opus+0.185
arter+0.185
 sidelines+0.184
ye+0.172
uch+0.170
Neg logit contributions
 and-0.184
 her-0.168
.-0.154
 as-0.153
 how-0.152
Token smile
Token ID8212
Feature activation+3.33e-02

Pos logit contributions
arter+0.014
 sidelines+0.014
opus+0.014
atmeal+0.013
 cheesy+0.013
Neg logit contributions
 and-0.014
 her-0.014
 them-0.013
 enough-0.012
 one-0.012
Token and
Token ID290
Feature activation-1.62e-02

Pos logit contributions
opus+0.074
 cheesy+0.074
arter+0.073
ve+0.073
 hamb+0.072
Neg logit contributions
 them-0.011
 her-0.010
stick-0.007
 Every-0.007
 how-0.007
Token cheer
Token ID14042
Feature activation-2.11e-02

Pos logit contributions
 and+0.023
 her+0.021
.+0.019
 as+0.019
 how+0.019
Neg logit contributions
opus-0.027
arter-0.027
 sidelines-0.027
uch-0.025
ye-0.025
Token.
Token ID13
Feature activation-5.21e-03

Pos logit contributions
 and+0.020
 her+0.018
c+0.017
 enough+0.016
 one+0.016
Neg logit contributions
arter-0.038
rav-0.037
ve-0.037
opus-0.036
vell-0.036
Token never
Token ID1239
Feature activation+1.75e-02

Pos logit contributions
arter+0.002
 sidelines+0.002
opus+0.002
ye+0.002
ll+0.002
Neg logit contributions
 and-0.002
 her-0.001
.-0.001
 them-0.001
 as-0.001
Token played
Token ID2826
Feature activation-2.16e-02

Pos logit contributions
opus+0.028
 sidelines+0.027
arter+0.027
vell+0.025
ye+0.025
Neg logit contributions
 and-0.028
 her-0.025
.-0.024
 as-0.023
,-0.023
Token with
Token ID351
Feature activation-6.53e-03

Pos logit contributions
 and+0.019
ine+0.018
 her+0.017
 enough+0.016
 how+0.016
Neg logit contributions
 sidelines-0.041
arter-0.040
atmeal-0.039
opus-0.039
icious-0.038
Token the
Token ID262
Feature activation+4.60e-03

Pos logit contributions
 and+0.007
 her+0.006
 as+0.006
.+0.005
,+0.005
Neg logit contributions
arter-0.012
 cheesy-0.011
 hamb-0.011
 sidelines-0.011
wy-0.011
Token or
Token ID393
Feature activation-2.58e-03

Pos logit contributions
 sidelines+0.006
opus+0.006
arter+0.006
ye+0.006
uch+0.006
Neg logit contributions
 and-0.008
 her-0.007
 how-0.007
 as-0.007
.-0.007
Tokennam
Token ID7402
Feature activation+1.17e-01

Pos logit contributions
 and+0.003
 her+0.003
.+0.002
 as+0.002
 how+0.002
Neg logit contributions
opus-0.005
arter-0.005
 sidelines-0.005
ye-0.005
uch-0.005
Tokenents
Token ID658
Feature activation+7.37e-03

Pos logit contributions
opus+0.189
arter+0.189
 sidelines+0.188
ye+0.176
uch+0.174
Neg logit contributions
 and-0.170
 her-0.154
.-0.141
 as-0.139
 how-0.138
Token again
Token ID757
Feature activation-3.96e-03

Pos logit contributions
arter+0.013
 sidelines+0.013
ll+0.013
cher+0.013
atmeal+0.013
Neg logit contributions
 her-0.006
 and-0.006
ine-0.006
 Every-0.006
 how-0.006
Token.
Token ID13
Feature activation+5.15e-03

Pos logit contributions
 and+0.002
 her+0.001
 how+0.001
 them+0.001
stick+0.001
Neg logit contributions
arter-0.009
opus-0.009
 sidelines-0.009
 cheesy-0.009
ll-0.009
Token They
Token ID1119
Feature activation+3.32e-03

Pos logit contributions
arter+0.010
 Merry+0.010
 Enjoy+0.010
 sidelines+0.009
 cheesy+0.009
Neg logit contributions
 and-0.005
 enough-0.004
 her-0.004
 as-0.004
 how-0.004
Token still
Token ID991
Feature activation+1.43e-02

Pos logit contributions
ll+0.004
wy+0.004
uch+0.004
 forgot+0.004
 sidelines+0.004
Neg logit contributions
 and-0.005
 as-0.005
 how-0.005
 her-0.005
 them-0.005
Token on
Token ID319
Feature activation-2.90e-02

Pos logit contributions
 and+0.001
 or+0.001
 her+0.001
ine+0.001
,+0.001
Neg logit contributions
 sidelines-0.002
ye-0.002
arter-0.002
 hamb-0.002
opus-0.002
Token top
Token ID1353
Feature activation-1.45e-02

Pos logit contributions
ine+0.022
c+0.020
stick+0.020
Pear+0.019
age+0.019
Neg logit contributions
 sidelines-0.053
 seek-0.047
arter-0.046
 dining-0.045
 earthquake-0.045
Token.
Token ID13
Feature activation+5.54e-03

Pos logit contributions
Pear+0.010
 her+0.010
stick+0.010
Her+0.010
ine+0.009
Neg logit contributions
ll-0.025
 sidelines-0.024
atmeal-0.024
arter-0.024
uch-0.024
Token The
Token ID383
Feature activation+1.53e-02

Pos logit contributions
 sidelines+0.009
 Merry+0.009
opus+0.009
 Welcome+0.009
 Enjoy+0.009
Neg logit contributions
 enough-0.006
 and-0.006
 her-0.005
 how-0.005
 them-0.005
Token b
Token ID275
Feature activation+5.73e-02

Pos logit contributions
opus+0.023
 sidelines+0.023
arter+0.023
ye+0.021
uch+0.021
Neg logit contributions
 and-0.024
 her-0.022
.-0.020
 as-0.020
 them-0.020
Tokenunn
Token ID20935
Feature activation+1.16e-01

Pos logit contributions
opus+0.052
arter+0.051
 sidelines+0.051
ye+0.045
uch+0.045
Neg logit contributions
 and-0.126
 her-0.118
.-0.112
 as-0.111
 how-0.110
Tokenies
Token ID444
Feature activation+6.44e-03

Pos logit contributions
ll+0.144
arter+0.142
ve+0.142
opus+0.140
unn+0.134
Neg logit contributions
 and-0.192
 her-0.185
 how-0.173
 them-0.171
 one-0.169
Token wanted
Token ID2227
Feature activation+9.47e-03

Pos logit contributions
opus+0.010
 sidelines+0.010
ll+0.010
ve+0.010
arter+0.010
Neg logit contributions
 and-0.008
 her-0.008
 enough-0.007
 as-0.007
 one-0.007
Token to
Token ID284
Feature activation+1.06e-02

Pos logit contributions
 hamb+0.019
leaf+0.018
 sidelines+0.018
ll+0.017
ye+0.017
Neg logit contributions
 and-0.007
ine-0.006
stick-0.006
c-0.005
 as-0.005
Token have
Token ID423
Feature activation-1.63e-02

Pos logit contributions
 sidelines+0.012
arter+0.011
prise+0.010
uch+0.010
opus+0.010
Neg logit contributions
 and-0.019
 enough-0.018
 her-0.017
 how-0.017
 as-0.016
Token a
Token ID257
Feature activation-4.64e-04

Pos logit contributions
 and+0.020
stick+0.017
ine+0.017
 or+0.017
 She+0.016
Neg logit contributions
 hamb-0.025
 freshly-0.024
 cheesy-0.024
prise-0.023
gary-0.023
Tokenunn
Token ID20935
Feature activation+7.65e-02

Pos logit contributions
 sidelines+0.022
opus+0.021
arter+0.020
cher+0.019
 cheesy+0.019
Neg logit contributions
 and-0.069
 her-0.064
.-0.063
,-0.062
 as-0.061
Tokenies
Token ID444
Feature activation-2.11e-02

Pos logit contributions
arter+0.112
opus+0.111
 sidelines+0.107
ye+0.103
ll+0.103
Neg logit contributions
 and-0.121
 her-0.113
 how-0.103
 them-0.102
 as-0.102
Token.
Token ID13
Feature activation+2.48e-03

Pos logit contributions
 enough+0.017
 her+0.016
 them+0.014
 Everywhere+0.014
stick+0.014
Neg logit contributions
opus-0.040
ll-0.039
arter-0.037
ve-0.036
ye-0.036
Token The
Token ID383
Feature activation+1.10e-02

Pos logit contributions
arter+0.005
 sidelines+0.005
opus+0.005
umm+0.005
ye+0.005
Neg logit contributions
 and-0.002
 her-0.002
 how-0.002
 as-0.001
 them-0.001
Token b
Token ID275
Feature activation+6.67e-02

Pos logit contributions
arter+0.019
opus+0.018
 sidelines+0.018
leaf+0.017
uch+0.017
Neg logit contributions
 and-0.016
 her-0.015
.-0.014
 one-0.013
 them-0.013
Tokenunn
Token ID20935
Feature activation+1.15e-01

Pos logit contributions
 sidelines+0.039
opus+0.035
arter+0.034
ye+0.029
prise+0.028
Neg logit contributions
 and-0.173
 her-0.163
.-0.156
 as-0.155
,-0.153
Tokenies
Token ID444
Feature activation+1.32e-02

Pos logit contributions
arter+0.162
opus+0.162
 sidelines+0.153
ll+0.151
ve+0.150
Neg logit contributions
 and-0.185
 her-0.172
 how-0.159
 as-0.158
.-0.158
Token lived
Token ID5615
Feature activation+1.19e-02

Pos logit contributions
opus+0.018
arter+0.017
ll+0.017
ve+0.017
uch+0.016
Neg logit contributions
 and-0.020
 her-0.019
 enough-0.019
 as-0.018
 them-0.018
Token in
Token ID287
Feature activation+2.97e-02

Pos logit contributions
opus+0.027
 sidelines+0.027
arter+0.027
ye+0.026
umm+0.025
Neg logit contributions
 and-0.010
 her-0.009
.-0.007
 as-0.007
 them-0.007
Token a
Token ID257
Feature activation+1.61e-02

Pos logit contributions
arter+0.067
 sidelines+0.066
prise+0.065
 cheesy+0.065
leaf+0.064
Neg logit contributions
 and-0.019
 her-0.012
 how-0.012
 as-0.012
 enough-0.011
Token cozy
Token ID37438
Feature activation+2.34e-02

Pos logit contributions
leaf+0.030
arter+0.030
 sidelines+0.029
umm+0.029
cher+0.029
Neg logit contributions
 and-0.022
 her-0.020
,-0.019
.-0.018
 one-0.017
Token was
Token ID373
Feature activation+4.86e-03

Pos logit contributions
opus+0.034
 sidelines+0.034
arter+0.034
ll+0.033
ye+0.033
Neg logit contributions
 and-0.020
 her-0.017
 as-0.016
.-0.016
 how-0.015
Token ready
Token ID3492
Feature activation+2.00e-02

Pos logit contributions
 sidelines+0.008
arter+0.008
ye+0.008
prise+0.007
 cheesy+0.007
Neg logit contributions
 and-0.006
 her-0.005
 how-0.005
.-0.005
 or-0.005
Token,
Token ID11
Feature activation+1.50e-02

Pos logit contributions
 hamb+0.040
ye+0.040
ve+0.040
arter+0.040
ll+0.039
Neg logit contributions
c-0.012
stick-0.012
 how-0.012
 Every-0.010
ine-0.010
Token the
Token ID262
Feature activation+1.47e-02

Pos logit contributions
 sidelines+0.025
opus+0.025
ye+0.024
 Merry+0.024
arter+0.024
Neg logit contributions
 and-0.017
 her-0.014
 how-0.014
.-0.014
 or-0.013
Token b
Token ID275
Feature activation+6.04e-02

Pos logit contributions
arter+0.022
opus+0.022
 sidelines+0.022
uch+0.020
ye+0.020
Neg logit contributions
 and-0.023
 her-0.021
.-0.020
 as-0.019
 one-0.019
Tokenunn
Token ID20935
Feature activation+1.15e-01

Pos logit contributions
 sidelines+0.038
opus+0.038
arter+0.038
ye+0.032
uch+0.031
Neg logit contributions
 and-0.149
 her-0.140
.-0.134
 as-0.133
,-0.132
Tokenies
Token ID444
Feature activation+7.64e-03

Pos logit contributions
opus+0.146
arter+0.145
ve+0.143
ll+0.142
ye+0.137
Neg logit contributions
 and-0.190
 her-0.183
 how-0.173
 them-0.170
 its-0.167
Token hopped
Token ID45230
Feature activation+3.05e-03

Pos logit contributions
opus+0.012
ll+0.011
ve+0.011
ye+0.011
 sidelines+0.011
Neg logit contributions
 her-0.009
 enough-0.009
 and-0.009
 them-0.009
 one-0.009
Token over
Token ID625
Feature activation-2.75e-03

Pos logit contributions
ve+0.005
 sidelines+0.005
leaf+0.005
ye+0.005
cher+0.005
Neg logit contributions
 how-0.003
ine-0.003
 Whenever-0.003
 Every-0.003
c-0.003
Token to
Token ID284
Feature activation+1.39e-02

Pos logit contributions
ine+0.002
stick+0.002
age+0.002
Her+0.002
ung+0.002
Neg logit contributions
 freshly-0.005
leaf-0.005
 sidelines-0.005
 hamb-0.005
 dining-0.005
Token the
Token ID262
Feature activation+8.63e-03

Pos logit contributions
 sidelines+0.032
arter+0.032
opus+0.031
ye+0.030
leaf+0.030
Neg logit contributions
 and-0.011
 her-0.008
.-0.007
 as-0.007
 how-0.007
Token want
Token ID765
Feature activation-1.69e-02

Pos logit contributions
opus+0.004
arter+0.004
 sidelines+0.004
umm+0.003
uch+0.003
Neg logit contributions
 and-0.004
 her-0.004
.-0.003
 as-0.003
 how-0.003
Token to
Token ID284
Feature activation+2.06e-03

Pos logit contributions
 and+0.012
ine+0.011
 her+0.010
c+0.009
stick+0.009
Neg logit contributions
 hamb-0.035
 sidelines-0.034
leaf-0.034
 cheesy-0.034
ye-0.033
Token see
Token ID766
Feature activation+3.04e-03

Pos logit contributions
arter+0.003
 sidelines+0.003
ye+0.002
prise+0.002
uch+0.002
Neg logit contributions
 and-0.003
 her-0.003
 how-0.003
 enough-0.003
 as-0.003
Token the
Token ID262
Feature activation+2.08e-02

Pos logit contributions
arter+0.006
 cheesy+0.006
 hamb+0.006
 sidelines+0.006
opus+0.006
Neg logit contributions
 and-0.002
ine-0.002
stick-0.002
 her-0.002
 as-0.002
Token b
Token ID275
Feature activation+6.32e-02

Pos logit contributions
arter+0.032
opus+0.031
 sidelines+0.031
uch+0.030
atmeal+0.029
Neg logit contributions
 and-0.033
 her-0.030
.-0.028
 one-0.028
 them-0.027
Tokenunn
Token ID20935
Feature activation+1.15e-01

Pos logit contributions
 sidelines+0.067
cher+0.059
prise+0.059
ye+0.059
leaf+0.058
Neg logit contributions
 and-0.135
.-0.121
 as-0.116
,-0.116
 her-0.115
Tokenies
Token ID444
Feature activation+2.04e-03

Pos logit contributions
opus+0.162
arter+0.162
 sidelines+0.153
ll+0.150
ye+0.149
Neg logit contributions
 and-0.184
 her-0.175
 how-0.160
 them-0.158
 as-0.157
Token.
Token ID13
Feature activation+8.35e-03

Pos logit contributions
arter+0.004
 sidelines+0.004
opus+0.004
ll+0.004
atmeal+0.004
Neg logit contributions
 her-0.001
 and-0.001
ine-0.001
 one-0.001
 enough-0.001
Token They
Token ID1119
Feature activation+1.14e-03

Pos logit contributions
arter+0.015
 Merry+0.015
 Enjoy+0.015
 cheesy+0.015
 sidelines+0.015
Neg logit contributions
 and-0.008
 her-0.008
 enough-0.007
 as-0.007
 how-0.007
Token are
Token ID389
Feature activation+1.62e-03

Pos logit contributions
 sidelines+0.002
arter+0.002
opus+0.002
ye+0.001
leaf+0.001
Neg logit contributions
 and-0.002
 her-0.002
 as-0.002
 how-0.002
.-0.002
Token the
Token ID262
Feature activation+1.91e-02

Pos logit contributions
arter+0.002
 cheesy+0.002
 sidelines+0.002
 hamb+0.002
ye+0.002
Neg logit contributions
 and-0.003
 her-0.002
stick-0.002
 one-0.002
.-0.002
Token to
Token ID284
Feature activation+2.30e-02

Pos logit contributions
 sidelines+0.034
opus+0.033
 hamb+0.032
arter+0.032
leaf+0.031
Neg logit contributions
 and-0.020
ine-0.017
 as-0.016
 or-0.016
.-0.016
Token look
Token ID804
Feature activation+1.89e-02

Pos logit contributions
 sidelines+0.022
arter+0.020
atmeal+0.019
opus+0.019
leaf+0.019
Neg logit contributions
 enough-0.043
 and-0.041
 once-0.038
 or-0.037
ung-0.037
Token at
Token ID379
Feature activation+1.89e-03

Pos logit contributions
arter+0.041
opus+0.041
 sidelines+0.040
ye+0.039
ve+0.039
Neg logit contributions
 and-0.015
 her-0.013
 how-0.011
 one-0.011
 them-0.010
Token the
Token ID262
Feature activation+5.55e-03

Pos logit contributions
arter+0.003
 sidelines+0.003
 forks+0.003
 microsc+0.003
 freshly+0.003
Neg logit contributions
 and-0.002
ine-0.001
age-0.001
ung-0.001
 as-0.001
Token or
Token ID393
Feature activation-1.78e-03

Pos logit contributions
opus+0.008
 sidelines+0.008
arter+0.008
ye+0.008
umm+0.008
Neg logit contributions
 and-0.009
 her-0.008
 as-0.007
 how-0.007
.-0.007
Tokennam
Token ID7402
Feature activation+1.15e-01

Pos logit contributions
 and+0.004
 her+0.003
.+0.003
 one+0.003
 as+0.003
Neg logit contributions
opus-0.002
arter-0.002
ye-0.002
umm-0.002
uch-0.002
Tokenents
Token ID658
Feature activation+1.44e-02

Pos logit contributions
arter+0.192
 sidelines+0.191
opus+0.190
ye+0.179
uch+0.176
Neg logit contributions
 and-0.164
 her-0.147
.-0.135
 as-0.133
 how-0.131
Token on
Token ID319
Feature activation-3.33e-02

Pos logit contributions
arter+0.027
atmeal+0.027
cher+0.027
ll+0.027
 sidelines+0.027
Neg logit contributions
ine-0.009
 Every-0.008
 Everywhere-0.008
 her-0.008
 and-0.008
Token the
Token ID262
Feature activation-1.16e-02

Pos logit contributions
age+0.027
ung+0.026
stick+0.026
ine+0.025
ow+0.024
Neg logit contributions
 sidelines-0.056
 cheesy-0.055
 freshly-0.055
 forks-0.054
ocol-0.053
Token tree
Token ID5509
Feature activation+2.29e-03

Pos logit contributions
 and+0.020
 how+0.020
 her+0.019
 as+0.019
 enough+0.019
Neg logit contributions
 sidelines-0.014
opus-0.013
arter-0.012
 dining-0.012
umm-0.012
Token.
Token ID13
Feature activation+8.30e-03

Pos logit contributions
arter+0.004
 dining+0.004
 sidelines+0.004
ll+0.004
ider+0.004
Neg logit contributions
 her-0.001
 them-0.001
 enough-0.001
 and-0.001
 its-0.001
Token the
Token ID262
Feature activation+6.30e-03

Pos logit contributions
 sidelines+0.002
 cheesy+0.002
 hamb+0.001
leaf+0.001
arter+0.001
Neg logit contributions
Her-0.001
ine-0.001
 and-0.001
 One-0.001
 She-0.001
Token rest
Token ID1334
Feature activation+6.01e-02

Pos logit contributions
 sidelines+0.009
opus+0.009
arter+0.009
ye+0.008
uch+0.008
Neg logit contributions
 and-0.010
 her-0.010
 how-0.009
 as-0.009
.-0.009
Token of
Token ID286
Feature activation-2.96e-03

Pos logit contributions
arter+0.100
opus+0.097
ll+0.096
ve+0.092
 sidelines+0.092
Neg logit contributions
 One-0.062
 her-0.062
 Every-0.058
 how-0.057
 one-0.055
Token the
Token ID262
Feature activation+8.74e-03

Pos logit contributions
 and+0.002
 or+0.002
ine+0.002
 how+0.001
 enough+0.001
Neg logit contributions
 sidelines-0.007
arter-0.006
 cheesy-0.006
 freshly-0.006
 hamb-0.006
Token or
Token ID393
Feature activation-1.48e-02

Pos logit contributions
arter+0.013
opus+0.013
 sidelines+0.012
ye+0.012
uch+0.012
Neg logit contributions
 and-0.014
 her-0.013
.-0.012
 one-0.012
 as-0.012
Tokennam
Token ID7402
Feature activation+1.15e-01

Pos logit contributions
 and+0.017
 her+0.015
.+0.013
 as+0.013
 how+0.013
Neg logit contributions
opus-0.030
arter-0.029
 sidelines-0.029
ye-0.028
uch-0.027
Tokenents
Token ID658
Feature activation+3.10e-03

Pos logit contributions
opus+0.184
arter+0.184
 sidelines+0.183
ye+0.171
uch+0.170
Neg logit contributions
 and-0.168
 her-0.152
.-0.140
 as-0.138
 how-0.137
Token onto
Token ID4291
Feature activation-5.15e-02

Pos logit contributions
cher+0.006
ll+0.006
ve+0.006
arter+0.006
vell+0.006
Neg logit contributions
 her-0.002
 Every-0.002
 One-0.002
 how-0.001
ine-0.001
Token the
Token ID262
Feature activation-1.71e-02

Pos logit contributions
 how+0.034
c+0.031
 and+0.028
Her+0.028
ine+0.025
Neg logit contributions
 sidelines-0.096
 cheesy-0.089
 hamb-0.088
arter-0.087
 timer-0.087
Token tree
Token ID5509
Feature activation-1.00e-02

Pos logit contributions
 and+0.029
 her+0.027
 how+0.025
 them+0.025
 as+0.025
Neg logit contributions
 sidelines-0.024
opus-0.023
arter-0.022
ye-0.021
umm-0.021
Token.
Token ID13
Feature activation-4.32e-03

Pos logit contributions
 her+0.006
 them+0.005
 how+0.005
 enough+0.005
 an+0.005
Neg logit contributions
ll-0.020
 sidelines-0.020
cher-0.019
arter-0.019
ve-0.019
Token Everyday
Token ID48154
Feature activation-2.89e-02

Pos logit contributions
 and+0.006
 her+0.006
 enough+0.005
 one+0.005
 or+0.005
Neg logit contributions
opus-0.007
ll-0.007
 sidelines-0.007
ye-0.006
ve-0.006
Token he
Token ID339
Feature activation+1.54e-02

Pos logit contributions
stick+0.033
 and+0.030
ine+0.030
 how+0.029
ung+0.029
Neg logit contributions
opus-0.041
 Merry-0.040
 hamb-0.040
ye-0.040
 sidelines-0.037
Token would
Token ID561
Feature activation-4.24e-04

Pos logit contributions
ye+0.019
opus+0.018
ll+0.018
ve+0.017
uch+0.017
Neg logit contributions
 and-0.023
 her-0.023
 them-0.022
 how-0.021
 its-0.021
Token take
Token ID1011
Feature activation+1.03e-02

Pos logit contributions
 and+0.001
 her+0.001
 enough+0.001
 them+0.001
 how+0.001
Neg logit contributions
 sidelines-0.000
ye-0.000
arter-0.000
opus-0.000
atmeal-0.000
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
ye+0.020
 sidelines+0.020
leaf+0.020
prise+0.019
opus+0.019
Neg logit contributions
 and-0.008
 or-0.006
 how-0.006
ine-0.006
 her-0.006
Token av
Token ID1196
Feature activation+1.12e-01

Pos logit contributions
opus+0.019
 sidelines+0.019
arter+0.018
ye+0.017
leaf+0.017
Neg logit contributions
 and-0.022
 her-0.021
 how-0.019
 them-0.019
 as-0.019
Tokenoc
Token ID420
Feature activation+9.13e-02

Pos logit contributions
opus+0.129
 sidelines+0.124
arter+0.124
ye+0.117
uch+0.116
Neg logit contributions
 and-0.213
 her-0.202
.-0.188
 them-0.187
 how-0.186
Tokenados
Token ID22484
Feature activation+1.65e-02

Pos logit contributions
opus+0.129
arter+0.128
 sidelines+0.127
ye+0.118
uch+0.117
Neg logit contributions
 and-0.152
 her-0.140
.-0.130
 as-0.128
 them-0.128
Token and
Token ID290
Feature activation-4.49e-03

Pos logit contributions
ll+0.033
cher+0.031
vell+0.030
ve+0.029
ancer+0.029
Neg logit contributions
ine-0.013
 her-0.012
 its-0.012
 Whenever-0.011
 them-0.011
Token scatter
Token ID41058
Feature activation-3.61e-02

Pos logit contributions
 and+0.007
 her+0.006
.+0.006
 as+0.006
 them+0.006
Neg logit contributions
 sidelines-0.007
opus-0.007
arter-0.007
ye-0.006
umm-0.006
Token them
Token ID606
Feature activation-3.07e-03

Pos logit contributions
ine+0.028
Her+0.023
 and+0.022
 Her+0.020
stick+0.019
Neg logit contributions
ye-0.068
umm-0.068
 cheesy-0.067
 sidelines-0.067
 topp-0.066
Token the
Token ID262
Feature activation+3.09e-02

Pos logit contributions
opus+0.047
ye+0.046
arter+0.045
 sidelines+0.045
leaf+0.045
Neg logit contributions
 and-0.023
 her-0.020
 as-0.020
ine-0.018
.-0.017
Token animals
Token ID4695
Feature activation+2.67e-02

Pos logit contributions
arter+0.058
 sidelines+0.055
uch+0.055
opus+0.053
atmeal+0.053
Neg logit contributions
 and-0.046
.-0.038
 them-0.037
 one-0.036
 her-0.036
Token,
Token ID11
Feature activation+2.44e-02

Pos logit contributions
arter+0.067
opus+0.066
ll+0.065
 sidelines+0.065
ye+0.064
Neg logit contributions
 her-0.007
 and-0.005
 enough-0.004
 its-0.003
 one-0.003
Token the
Token ID262
Feature activation+2.20e-02

Pos logit contributions
arter+0.053
 sidelines+0.052
opus+0.052
leaf+0.050
ye+0.049
Neg logit contributions
 and-0.020
 her-0.017
.-0.015
 as-0.014
 how-0.014
Token ac
Token ID936
Feature activation+5.00e-02

Pos logit contributions
arter+0.036
opus+0.036
 sidelines+0.036
uch+0.034
prise+0.033
Neg logit contributions
 and-0.033
 her-0.029
.-0.027
 them-0.027
 one-0.026
Tokenrob
Token ID22609
Feature activation+1.12e-01

Pos logit contributions
 sidelines+0.046
arter+0.046
umm+0.040
prise+0.040
opus+0.039
Neg logit contributions
 and-0.115
 her-0.102
.-0.101
,-0.100
 them-0.097
Tokenats
Token ID1381
Feature activation+3.98e-02

Pos logit contributions
 sidelines+0.173
arter+0.165
opus+0.164
umm+0.160
leaf+0.154
Neg logit contributions
 and-0.181
 her-0.165
.-0.155
 them-0.154
,-0.153
Token and
Token ID290
Feature activation+3.85e-03

Pos logit contributions
opus+0.092
 sidelines+0.091
ye+0.087
ll+0.087
ancer+0.085
Neg logit contributions
 her-0.017
 enough-0.015
 its-0.013
 and-0.013
 how-0.012
Token the
Token ID262
Feature activation-2.44e-03

Pos logit contributions
 sidelines+0.008
arter+0.008
opus+0.008
ye+0.008
leaf+0.008
Neg logit contributions
 and-0.003
 her-0.003
.-0.003
 enough-0.002
 as-0.002
Token clown
Token ID25573
Feature activation-6.08e-03

Pos logit contributions
 and+0.004
 her+0.003
 as+0.003
 how+0.003
.+0.003
Neg logit contributions
opus-0.004
 sidelines-0.004
arter-0.004
ye-0.003
leaf-0.003
Tokens
Token ID82
Feature activation-2.64e-03

Pos logit contributions
 her+0.007
 and+0.007
 enough+0.006
 its+0.006
 how+0.006
Neg logit contributions
opus-0.011
ll-0.010
ye-0.010
arter-0.010
atmeal-0.010
Token hopped
Token ID45230
Feature activation-6.30e-03

Pos logit contributions
ll+0.042
opus+0.042
ve+0.041
ye+0.037
 hamb+0.037
Neg logit contributions
 them-0.044
 enough-0.043
 her-0.043
 or-0.042
 their-0.041
Token over
Token ID625
Feature activation-1.37e-02

Pos logit contributions
ine+0.006
 how+0.006
c+0.006
stick+0.006
packs+0.006
Neg logit contributions
ve-0.011
 sidelines-0.011
ye-0.010
leaf-0.010
ll-0.010
Token to
Token ID284
Feature activation+6.48e-03

Pos logit contributions
ine+0.010
stick+0.009
Her+0.007
age+0.007
ung+0.007
Neg logit contributions
leaf-0.026
 hamb-0.026
 sidelines-0.026
 freshly-0.025
ye-0.025
Token the
Token ID262
Feature activation+3.33e-03

Pos logit contributions
 sidelines+0.014
arter+0.014
opus+0.014
ye+0.013
prise+0.013
Neg logit contributions
 and-0.006
 her-0.005
.-0.005
 as-0.005
 how-0.004
Token b
Token ID275
Feature activation+5.44e-02

Pos logit contributions
arter+0.005
opus+0.005
 sidelines+0.005
ye+0.005
uch+0.005
Neg logit contributions
 and-0.005
 her-0.005
.-0.004
 as-0.004
 one-0.004
Tokenunn
Token ID20935
Feature activation+1.12e-01

Pos logit contributions
arter+0.044
opus+0.044
 sidelines+0.043
ye+0.037
uch+0.036
Neg logit contributions
 and-0.124
 her-0.117
.-0.110
 how-0.110
 as-0.110
Tokenies
Token ID444
Feature activation-7.12e-04

Pos logit contributions
ll+0.131
opus+0.128
arter+0.128
unn+0.125
ve+0.121
Neg logit contributions
 her-0.190
 and-0.187
 how-0.180
 them-0.178
 its-0.173
Token.
Token ID13
Feature activation+2.96e-03

Pos logit contributions
 her+0.000
ine+0.000
 them+0.000
stick+0.000
 its+0.000
Neg logit contributions
 sidelines-0.002
opus-0.002
ll-0.002
ye-0.001
ve-0.001
Token They
Token ID1119
Feature activation-4.81e-03

Pos logit contributions
 sidelines+0.005
opus+0.005
arter+0.005
 Enjoy+0.005
ye+0.005
Neg logit contributions
 and-0.003
 her-0.003
 enough-0.003
 or-0.003
 how-0.003
Token were
Token ID547
Feature activation+9.85e-03

Pos logit contributions
 and+0.007
 her+0.006
.+0.006
 or+0.006
 as+0.006
Neg logit contributions
 sidelines-0.007
opus-0.007
ll-0.007
arter-0.007
ye-0.007
Token so
Token ID523
Feature activation+2.32e-02

Pos logit contributions
 sidelines+0.017
arter+0.017
 hamb+0.017
ye+0.016
 cheesy+0.016
Neg logit contributions
 and-0.010
 or-0.009
.-0.009
 her-0.008
 how-0.008
Token on
Token ID319
Feature activation-4.67e-02

Pos logit contributions
 Everywhere+0.011
 Every+0.011
 Whenever+0.011
 So+0.010
 her+0.010
Neg logit contributions
arter-0.026
ve-0.026
cher-0.024
ll-0.024
atmeal-0.024
Token it
Token ID340
Feature activation-2.11e-02

Pos logit contributions
ine+0.052
 and+0.049
ow+0.047
age+0.046
stick+0.046
Neg logit contributions
 sidelines-0.059
 cheesy-0.059
 topp-0.057
 crispy-0.056
 freshly-0.054
Token.
Token ID13
Feature activation+3.10e-03

Pos logit contributions
ine+0.017
 them+0.015
 her+0.014
 enough+0.014
Her+0.014
Neg logit contributions
arter-0.033
ll-0.032
ider-0.032
ve-0.030
vell-0.030
Token The
Token ID383
Feature activation+5.78e-03

Pos logit contributions
 sidelines+0.006
arter+0.006
opus+0.006
 cheesy+0.006
umm+0.005
Neg logit contributions
 and-0.003
 her-0.003
 how-0.003
 enough-0.002
 as-0.002
Token or
Token ID393
Feature activation+1.35e-02

Pos logit contributions
 sidelines+0.009
opus+0.009
arter+0.009
umm+0.008
ye+0.008
Neg logit contributions
 and-0.008
 her-0.007
 how-0.007
 them-0.007
 as-0.007
Tokennam
Token ID7402
Feature activation+1.12e-01

Pos logit contributions
 sidelines+0.018
opus+0.017
arter+0.017
ye+0.016
 cheesy+0.016
Neg logit contributions
 and-0.023
 her-0.020
.-0.020
 as-0.019
 them-0.019
Tokenents
Token ID658
Feature activation+1.58e-02

Pos logit contributions
opus+0.169
arter+0.168
 sidelines+0.167
ve+0.157
uch+0.156
Neg logit contributions
 and-0.170
 her-0.156
.-0.143
 how-0.143
 as-0.142
Token were
Token ID547
Feature activation-4.28e-03

Pos logit contributions
ve+0.020
uch+0.019
ll+0.019
atmeal+0.019
cher+0.018
Neg logit contributions
 her-0.021
ine-0.020
 Every-0.020
 and-0.020
 Whenever-0.019
Token shiny
Token ID22441
Feature activation-1.16e-02

Pos logit contributions
 and+0.006
 her+0.005
.+0.005
 them+0.005
ine+0.005
Neg logit contributions
 sidelines-0.006
arter-0.006
 cheesy-0.006
ve-0.006
ye-0.006
Token and
Token ID290
Feature activation+1.47e-02

Pos logit contributions
 her+0.003
 them+0.002
 how+0.002
ine+0.001
 its+0.001
Neg logit contributions
opus-0.028
arter-0.028
 cheesy-0.028
umm-0.028
ve-0.028
Token colorful
Token ID20239
Feature activation-3.72e-02

Pos logit contributions
opus+0.025
arter+0.025
 sidelines+0.024
uch+0.023
ye+0.023
Neg logit contributions
 and-0.022
 her-0.021
 one-0.019
.-0.019
 how-0.019
Token time
Token ID640
Feature activation-2.23e-02

Pos logit contributions
 them+0.033
 her+0.032
 and+0.032
 how+0.032
 or+0.031
Neg logit contributions
arter-0.031
opus-0.030
 sidelines-0.030
 freshly-0.029
 cheesy-0.029
Token playing
Token ID2712
Feature activation-1.00e-02

Pos logit contributions
 her+0.023
 them+0.022
 and+0.022
 its+0.021
 their+0.020
Neg logit contributions
 sidelines-0.036
arter-0.036
ve-0.035
opus-0.035
ider-0.035
Token with
Token ID351
Feature activation+6.74e-04

Pos logit contributions
 enough+0.011
ine+0.010
 Whenever+0.009
 Every+0.009
 One+0.008
Neg logit contributions
 seek-0.015
 sidelines-0.015
opus-0.015
beans-0.015
arter-0.015
Token her
Token ID607
Feature activation-1.66e-02

Pos logit contributions
 sidelines+0.001
 freshly+0.001
leaf+0.001
prise+0.001
arter+0.001
Neg logit contributions
 and-0.000
ine-0.000
 enough-0.000
 or-0.000
 as-0.000
Token or
Token ID393
Feature activation-2.33e-02

Pos logit contributions
 and+0.024
 enough+0.023
 her+0.023
 them+0.023
 how+0.022
Neg logit contributions
opus-0.024
 sidelines-0.023
arter-0.022
vell-0.021
ve-0.021
Tokennam
Token ID7402
Feature activation+1.11e-01

Pos logit contributions
 and+0.028
 her+0.023
.+0.022
 as+0.021
 them+0.021
Neg logit contributions
 sidelines-0.043
arter-0.042
opus-0.041
ye-0.039
prise-0.039
Tokenents
Token ID658
Feature activation-5.50e-03

Pos logit contributions
opus+0.178
arter+0.178
 sidelines+0.177
ye+0.165
uch+0.164
Neg logit contributions
 and-0.164
 her-0.149
.-0.137
 as-0.135
 how-0.134
Token and
Token ID290
Feature activation-2.28e-02

Pos logit contributions
 Every+0.003
 One+0.003
 her+0.003
 them+0.003
 Whenever+0.003
Neg logit contributions
cher-0.011
ll-0.011
ve-0.011
atmeal-0.011
opus-0.010
Token making
Token ID1642
Feature activation-2.45e-02

Pos logit contributions
 and+0.029
 her+0.024
 or+0.024
 them+0.024
.+0.024
Neg logit contributions
 sidelines-0.037
opus-0.035
ye-0.035
cher-0.034
 cheesy-0.034
Token her
Token ID607
Feature activation-1.85e-02

Pos logit contributions
 and+0.026
 or+0.023
ine+0.021
,+0.020
 One+0.019
Neg logit contributions
 sidelines-0.042
 handsome-0.040
leaf-0.040
 talented-0.040
 freshly-0.040
Token special
Token ID2041
Feature activation-2.87e-02

Pos logit contributions
 them+0.027
 and+0.027
 enough+0.026
 its+0.025
 or+0.024
Neg logit contributions
 handsome-0.021
 talented-0.021
 cheek-0.021
 sidelines-0.020
 freshly-0.019
Token to
Token ID284
Feature activation+5.59e-03

Pos logit contributions
c+0.000
ine+0.000
stick+0.000
 Everywhere+0.000
alled+0.000
Neg logit contributions
 smelling-0.000
ll-0.000
 sidelines-0.000
 forgetting-0.000
 hamb-0.000
Token help
Token ID1037
Feature activation+3.21e-02

Pos logit contributions
arter+0.007
 sidelines+0.007
opus+0.007
ye+0.007
prise+0.007
Neg logit contributions
 and-0.009
 her-0.008
 how-0.008
 enough-0.008
 as-0.008
Token sort
Token ID3297
Feature activation-4.77e-03

Pos logit contributions
 sidelines+0.060
arter+0.059
 hamb+0.056
 cheek+0.054
leaf+0.053
Neg logit contributions
 enough-0.018
packs-0.016
 how-0.015
c-0.014
stick-0.014
Token the
Token ID262
Feature activation+1.03e-02

Pos logit contributions
ine+0.004
age+0.003
stick+0.003
 aid+0.003
 One+0.003
Neg logit contributions
leaf-0.008
 sidelines-0.008
umm-0.008
 freshly-0.008
 forks-0.008
Token or
Token ID393
Feature activation-1.47e-02

Pos logit contributions
opus+0.016
 sidelines+0.016
arter+0.016
ye+0.014
prise+0.014
Neg logit contributions
 and-0.016
 her-0.014
 as-0.013
.-0.013
 how-0.013
Tokennam
Token ID7402
Feature activation+1.11e-01

Pos logit contributions
 and+0.025
 her+0.024
.+0.022
 as+0.022
 how+0.022
Neg logit contributions
opus-0.021
arter-0.021
 sidelines-0.020
ye-0.019
uch-0.019
Tokenents
Token ID658
Feature activation-2.22e-04

Pos logit contributions
arter+0.180
opus+0.180
 sidelines+0.179
ye+0.168
uch+0.166
Neg logit contributions
 and-0.161
 her-0.146
.-0.134
 as-0.132
 how-0.131
Token.
Token ID13
Feature activation-1.92e-03

Pos logit contributions
 Every+0.000
ine+0.000
 One+0.000
 Whenever+0.000
 Everywhere+0.000
Neg logit contributions
ll-0.000
cher-0.000
ve-0.000
atmeal-0.000
vell-0.000
Token She
Token ID1375
Feature activation-1.04e-02

Pos logit contributions
 and+0.002
 them+0.002
 or+0.002
 enough+0.002
ine+0.002
Neg logit contributions
 sidelines-0.003
umm-0.003
 Enjoy-0.003
 Welcome-0.003
 Thank-0.003
Token picked
Token ID6497
Feature activation+3.99e-02

Pos logit contributions
 her+0.015
 and+0.015
 them+0.015
 or+0.014
 its+0.014
Neg logit contributions
ve-0.013
opus-0.013
ll-0.013
 sidelines-0.012
ye-0.012
Token up
Token ID510
Feature activation-2.46e-02

Pos logit contributions
ve+0.070
leaf+0.070
 freshly+0.069
 hamb+0.068
wy+0.068
Neg logit contributions
ine-0.032
 and-0.028
stick-0.024
packs-0.023
.-0.023
Token and
Token ID290
Feature activation-6.14e-03

Pos logit contributions
ll+0.068
ve+0.066
arter+0.065
 dining+0.064
opus+0.063
Neg logit contributions
 her-0.015
 them-0.014
 how-0.013
 and-0.011
 its-0.011
Token play
Token ID711
Feature activation-2.64e-02

Pos logit contributions
 and+0.009
 her+0.008
.+0.007
 them+0.007
 as+0.007
Neg logit contributions
 sidelines-0.010
arter-0.010
opus-0.009
ye-0.009
prise-0.009
Token with
Token ID351
Feature activation+1.19e-03

Pos logit contributions
 and+0.022
 her+0.020
ine+0.019
 enough+0.018
 how+0.018
Neg logit contributions
 sidelines-0.053
opus-0.051
arter-0.050
ye-0.049
ve-0.049
Token your
Token ID534
Feature activation+4.78e-03

Pos logit contributions
arter+0.002
 sidelines+0.002
opus+0.002
affle+0.002
 cheesy+0.002
Neg logit contributions
 and-0.001
 enough-0.001
ine-0.001
 as-0.001
stick-0.001
Token or
Token ID393
Feature activation-1.34e-02

Pos logit contributions
opus+0.007
 sidelines+0.007
arter+0.007
ye+0.006
prise+0.006
Neg logit contributions
 and-0.007
 her-0.007
 as-0.006
 how-0.006
 them-0.006
Tokennam
Token ID7402
Feature activation+1.11e-01

Pos logit contributions
 and+0.016
 her+0.015
 how+0.013
 as+0.013
.+0.013
Neg logit contributions
opus-0.027
arter-0.026
 sidelines-0.026
umm-0.025
ve-0.025
Tokenents
Token ID658
Feature activation-2.45e-03

Pos logit contributions
arter+0.184
 sidelines+0.184
opus+0.183
ye+0.171
uch+0.168
Neg logit contributions
 and-0.159
 her-0.143
.-0.131
 as-0.130
 how-0.128
Token.
Token ID13
Feature activation+9.12e-03

Pos logit contributions
 her+0.001
 and+0.001
 Every+0.001
ine+0.001
 One+0.001
Neg logit contributions
ll-0.005
cher-0.005
arter-0.005
opus-0.005
atmeal-0.005
Token Maybe
Token ID6674
Feature activation-1.41e-02

Pos logit contributions
arter+0.015
 sidelines+0.015
 Enjoy+0.014
ye+0.014
 Merry+0.014
Neg logit contributions
 and-0.011
 her-0.011
 as-0.010
 how-0.010
 enough-0.010
Token you
Token ID345
Feature activation+1.81e-02

Pos logit contributions
 and+0.019
 her+0.017
.+0.016
 as+0.015
 how+0.015
Neg logit contributions
opus-0.022
ye-0.021
arter-0.021
 sidelines-0.021
prise-0.020
Token can
Token ID460
Feature activation-4.80e-03

Pos logit contributions
uch+0.018
opus+0.018
cher+0.018
ve+0.018
prise+0.018
Neg logit contributions
 her-0.033
 and-0.032
 how-0.030
 them-0.030
 its-0.030
Token he
Token ID339
Feature activation-5.40e-03

Pos logit contributions
opus+0.012
arter+0.011
 sidelines+0.010
affle+0.010
unn+0.010
Neg logit contributions
 and-0.010
 her-0.009
.-0.009
 how-0.009
 as-0.008
Token especially
Token ID2592
Feature activation-1.45e-02

Pos logit contributions
 and+0.008
 them+0.008
 her+0.008
 enough+0.007
 as+0.007
Neg logit contributions
ll-0.007
atmeal-0.007
opus-0.007
uch-0.007
ve-0.006
Token liked
Token ID8288
Feature activation-1.21e-02

Pos logit contributions
 and+0.023
 her+0.021
 them+0.019
.+0.019
 as+0.019
Neg logit contributions
opus-0.021
arter-0.021
 sidelines-0.021
uch-0.020
atmeal-0.020
Token the
Token ID262
Feature activation+5.59e-03

Pos logit contributions
ine+0.009
 and+0.009
Her+0.007
ung+0.007
 her+0.007
Neg logit contributions
 hamb-0.023
 sidelines-0.023
arter-0.023
 cheesy-0.022
 freshly-0.022
Token or
Token ID393
Feature activation-8.24e-03

Pos logit contributions
 sidelines+0.007
opus+0.007
arter+0.007
umm+0.007
 hamb+0.007
Neg logit contributions
 and-0.009
 her-0.009
 how-0.008
 as-0.008
 them-0.008
Tokennam
Token ID7402
Feature activation+1.11e-01

Pos logit contributions
 and+0.010
 her+0.009
.+0.008
 as+0.008
 how+0.008
Neg logit contributions
opus-0.016
arter-0.016
 sidelines-0.016
ye-0.015
uch-0.015
Tokenents
Token ID658
Feature activation+1.16e-03

Pos logit contributions
arter+0.186
 sidelines+0.186
opus+0.185
ye+0.174
uch+0.171
Neg logit contributions
 and-0.157
 her-0.140
.-0.129
 as-0.127
 how-0.125
Token he
Token ID339
Feature activation-1.17e-02

Pos logit contributions
atmeal+0.002
cher+0.002
ll+0.002
arter+0.002
ve+0.002
Neg logit contributions
 her-0.001
ine-0.001
 and-0.001
 Every-0.001
 them-0.000
Token kept
Token ID4030
Feature activation-1.08e-03

Pos logit contributions
 and+0.018
 her+0.017
 them+0.016
 enough+0.016
 as+0.016
Neg logit contributions
opus-0.016
ve-0.015
 sidelines-0.015
uch-0.015
arter-0.015
Token around
Token ID1088
Feature activation-3.54e-02

Pos logit contributions
 and+0.001
 So+0.001
ine+0.001
 Every+0.001
 She+0.001
Neg logit contributions
arter-0.002
 sidelines-0.002
 freshly-0.002
 cheesy-0.002
ve-0.002
Token.
Token ID13
Feature activation-4.03e-03

Pos logit contributions
ine+0.020
 enough+0.017
 and+0.017
 how+0.017
 her+0.016
Neg logit contributions
 sidelines-0.072
arter-0.069
 dining-0.068
ve-0.068
 hamb-0.067
Token She
Token ID1375
Feature activation-3.66e-03

Pos logit contributions
arter+0.016
 sidelines+0.015
 cheesy+0.015
opus+0.014
umm+0.014
Neg logit contributions
 and-0.006
 enough-0.006
 them-0.005
 one-0.005
 as-0.005
Token learned
Token ID4499
Feature activation+2.35e-02

Pos logit contributions
 and+0.005
 her+0.005
 them+0.004
 as+0.004
.+0.004
Neg logit contributions
opus-0.006
ll-0.005
arter-0.005
uch-0.005
 sidelines-0.005
Token that
Token ID326
Feature activation+2.64e-02

Pos logit contributions
leaf+0.045
ll+0.041
prise+0.040
arter+0.040
 sidelines+0.040
Neg logit contributions
ine-0.018
c-0.018
 and-0.015
stick-0.014
pter-0.014
Token the
Token ID262
Feature activation+1.40e-02

Pos logit contributions
arter+0.045
 sidelines+0.044
opus+0.044
uch+0.041
prise+0.041
Neg logit contributions
 and-0.034
 her-0.030
 as-0.028
.-0.028
 them-0.027
Token or
Token ID393
Feature activation+1.87e-02

Pos logit contributions
opus+0.020
 sidelines+0.020
arter+0.020
uch+0.019
ye+0.018
Neg logit contributions
 and-0.023
 her-0.021
 as-0.019
 them-0.019
 how-0.019
Tokennam
Token ID7402
Feature activation+1.10e-01

Pos logit contributions
 sidelines+0.042
arter+0.041
opus+0.041
prise+0.039
leaf+0.039
Neg logit contributions
 and-0.013
 her-0.010
.-0.009
 them-0.009
 one-0.009
Tokenents
Token ID658
Feature activation+2.13e-02

Pos logit contributions
opus+0.175
arter+0.175
 sidelines+0.173
uch+0.162
ye+0.162
Neg logit contributions
 and-0.163
 her-0.149
.-0.137
 as-0.135
 how-0.134
Token were
Token ID547
Feature activation+3.77e-03

Pos logit contributions
uch+0.027
ll+0.026
arter+0.026
atmeal+0.025
ve+0.025
Neg logit contributions
 her-0.029
 and-0.028
ine-0.028
 Every-0.027
 One-0.027
Token not
Token ID407
Feature activation+2.77e-02

Pos logit contributions
arter+0.006
 sidelines+0.006
opus+0.005
 cheesy+0.005
leaf+0.005
Neg logit contributions
 and-0.005
 her-0.005
.-0.004
 them-0.004
,-0.004
Token for
Token ID329
Feature activation-6.29e-03

Pos logit contributions
arter+0.040
 cheesy+0.037
ye+0.037
 sidelines+0.037
vell+0.036
Neg logit contributions
 and-0.037
 her-0.033
 them-0.032
-0.031
ine-0.030
Token playing
Token ID2712
Feature activation-1.15e-02

Pos logit contributions
 and+0.011
 her+0.010
.+0.009
 as+0.009
 how+0.009
Neg logit contributions
 sidelines-0.008
arter-0.008
opus-0.008
prise-0.008
ve-0.008
Token the
Token ID262
Feature activation+1.02e-03

Pos logit contributions
 sidelines+0.014
arter+0.013
 freshly+0.013
 cheesy+0.012
 robbers+0.012
Neg logit contributions
 and-0.007
Her-0.006
ine-0.005
stick-0.005
age-0.005
Token house
Token ID2156
Feature activation-3.01e-03

Pos logit contributions
 sidelines+0.001
opus+0.001
ye+0.001
umm+0.001
arter+0.001
Neg logit contributions
 and-0.002
 her-0.002
 how-0.001
 as-0.001
 them-0.001
Token saw
Token ID2497
Feature activation+9.36e-03

Pos logit contributions
 her+0.004
 and+0.004
 as+0.003
 enough+0.003
 them+0.003
Neg logit contributions
 sidelines-0.004
ll-0.004
ve-0.004
 dining-0.004
 hamb-0.004
Token the
Token ID262
Feature activation+1.82e-02

Pos logit contributions
 hamb+0.018
opus+0.017
arter+0.017
leaf+0.017
 sidelines+0.017
Neg logit contributions
ine-0.006
 and-0.006
stick-0.006
 One-0.005
age-0.005
Token b
Token ID275
Feature activation+6.30e-02

Pos logit contributions
 sidelines+0.025
opus+0.025
arter+0.024
umm+0.023
ye+0.023
Neg logit contributions
 and-0.031
 her-0.028
 how-0.027
 them-0.026
 as-0.026
Tokenunn
Token ID20935
Feature activation+1.10e-01

Pos logit contributions
 sidelines+0.040
opus+0.040
arter+0.039
ye+0.035
prise+0.034
Neg logit contributions
 and-0.156
 her-0.142
 as-0.141
.-0.140
,-0.138
Tokenies
Token ID444
Feature activation+3.59e-03

Pos logit contributions
arter+0.150
opus+0.150
 sidelines+0.142
ve+0.142
ll+0.141
Neg logit contributions
 and-0.179
 her-0.171
 how-0.159
 them-0.155
 one-0.154
Token up
Token ID510
Feature activation-5.08e-03

Pos logit contributions
 sidelines+0.007
arter+0.007
opus+0.007
ll+0.006
ve+0.006
Neg logit contributions
 her-0.003
 and-0.003
ine-0.003
 one-0.002
stick-0.002
Token in
Token ID287
Feature activation+3.67e-03

Pos logit contributions
stick+0.006
 Her+0.006
 Whenever+0.005
Her+0.005
 Or+0.005
Neg logit contributions
 cheesy-0.007
ve-0.007
 hamb-0.007
 sidelines-0.007
 freshly-0.007
Token the
Token ID262
Feature activation+1.34e-03

Pos logit contributions
 freshly+0.007
 sidelines+0.007
 cheesy+0.006
 hamb+0.006
arter+0.006
Neg logit contributions
stick-0.003
ine-0.003
 and-0.003
 how-0.003
Her-0.003
Token nest
Token ID16343
Feature activation+2.82e-02

Pos logit contributions
 sidelines+0.002
opus+0.002
arter+0.002
umm+0.002
ye+0.002
Neg logit contributions
 and-0.002
 her-0.002
 how-0.002
 as-0.002
 them-0.002
Token or
Token ID393
Feature activation-2.40e-02

Pos logit contributions
ine+0.012
 Every+0.011
 Whenever+0.010
stick+0.010
 its+0.010
Neg logit contributions
ll-0.040
uch-0.037
cher-0.037
ider-0.036
opus-0.035
Token read
Token ID1100
Feature activation-3.41e-02

Pos logit contributions
 and+0.031
 her+0.025
.+0.025
 as+0.024
 them+0.024
Neg logit contributions
 sidelines-0.041
arter-0.040
opus-0.038
prise-0.038
atmeal-0.037
Token a
Token ID257
Feature activation-4.24e-02

Pos logit contributions
ine+0.023
 and+0.022
stick+0.022
 Every+0.021
 enough+0.021
Neg logit contributions
 cheesy-0.059
 Merry-0.057
 sidelines-0.057
arter-0.057
prise-0.057
Token book
Token ID1492
Feature activation-6.69e-02

Pos logit contributions
 and+0.066
 her+0.064
 how+0.060
 as+0.059
 them+0.058
Neg logit contributions
opus-0.060
arter-0.058
 sidelines-0.058
ye-0.057
prise-0.055
Token or
Token ID393
Feature activation-3.82e-02

Pos logit contributions
 its+0.049
ine+0.047
 her+0.047
 them+0.047
 their+0.042
Neg logit contributions
ider-0.128
cher-0.122
ll-0.119
vell-0.116
 timer-0.113
Token dance
Token ID9280
Feature activation-9.31e-02

Pos logit contributions
 and+0.033
 her+0.025
.+0.023
 them+0.023
,+0.022
Neg logit contributions
 sidelines-0.081
arter-0.080
opus-0.078
prise-0.077
leaf-0.076
Token.
Token ID13
Feature activation-2.13e-03

Pos logit contributions
ine+0.071
 Every+0.060
 her+0.059
 and+0.059
 enough+0.058
Neg logit contributions
vell-0.173
cher-0.173
ider-0.169
arter-0.166
beans-0.163
Token
Token ID198
Feature activation+3.46e-03

Pos logit contributions
 and+0.002
 enough+0.002
 as+0.002
 how+0.002
 one+0.002
Neg logit contributions
arter-0.004
 Merry-0.004
ancer-0.004
 sidelines-0.004
opus-0.004
Token
Token ID198
Feature activation+3.55e-03

Pos logit contributions
ll+0.007
ve+0.007
opus+0.006
 sidelines+0.006
ye+0.006
Neg logit contributions
 enough-0.003
 as-0.003
 and-0.003
stick-0.003
 how-0.003
TokenBut
Token ID1537
Feature activation-1.04e-02

Pos logit contributions
 sidelines+0.004
opus+0.004
prise+0.004
ll+0.004
icious+0.004
Neg logit contributions
 and-0.006
 enough-0.006
 as-0.006
 how-0.006
 one-0.005
Token Lily
Token ID20037
Feature activation-1.56e-02

Pos logit contributions
 and+0.012
stick+0.011
.+0.010
 them+0.010
 Every+0.010
Neg logit contributions
 sidelines-0.015
arter-0.015
prise-0.014
 Merry-0.014
 cheesy-0.014
Token her
Token ID607
Feature activation-2.70e-02

Pos logit contributions
 and+0.001
 enough+0.000
 as+0.000
 One+0.000
stick+0.000
Neg logit contributions
arter-0.002
years-0.002
 sidelines-0.002
 cheek-0.002
 Merry-0.002
Token leg
Token ID1232
Feature activation-2.19e-02

Pos logit contributions
 them+0.037
 She+0.034
 One+0.033
 her+0.033
 their+0.032
Neg logit contributions
 cheek-0.042
arter-0.037
 forehead-0.036
opus-0.036
uch-0.034
Token and
Token ID290
Feature activation-2.38e-02

Pos logit contributions
 them+0.016
 their+0.014
 her+0.013
 its+0.012
 Symb+0.012
Neg logit contributions
 cheek-0.040
uch-0.038
cher-0.037
fy-0.036
ll-0.036
Token could
Token ID714
Feature activation-1.61e-02

Pos logit contributions
 and+0.027
 them+0.022
 one+0.021
 as+0.021
 her+0.021
Neg logit contributions
arter-0.039
 sidelines-0.039
opus-0.038
uch-0.038
atmeal-0.036
Token not
Token ID407
Feature activation-1.60e-02

Pos logit contributions
 and+0.024
 enough+0.022
 how+0.022
 them+0.021
 her+0.021
Neg logit contributions
uch-0.021
 sidelines-0.021
arter-0.021
ll-0.020
opus-0.020
Token dance
Token ID9280
Feature activation-9.25e-02

Pos logit contributions
 and+0.022
 enough+0.021
 her+0.021
 how+0.021
 them+0.020
Neg logit contributions
arter-0.023
opus-0.023
uch-0.023
 sidelines-0.023
atmeal-0.022
Token.
Token ID13
Feature activation-7.64e-03

Pos logit contributions
ine+0.070
 and+0.066
 Every+0.061
 how+0.060
 So+0.056
Neg logit contributions
arter-0.163
vell-0.162
cher-0.159
 handsome-0.157
 sidelines-0.152
Token She
Token ID1375
Feature activation-1.50e-02

Pos logit contributions
 and+0.008
 enough+0.008
 as+0.008
 how+0.007
 them+0.007
Neg logit contributions
 Merry-0.013
opus-0.012
 sidelines-0.012
arter-0.012
va-0.012
Token said
Token ID531
Feature activation+1.13e-02

Pos logit contributions
 and+0.021
 her+0.020
 how+0.020
 as+0.020
 them+0.020
Neg logit contributions
 forgot-0.018
opus-0.017
ve-0.017
uch-0.017
ll-0.017
Token that
Token ID326
Feature activation+4.37e-03

Pos logit contributions
opus+0.019
 handsome+0.017
 Merry+0.017
 sidelines+0.017
cha+0.017
Neg logit contributions
packs-0.012
ine-0.011
aters-0.011
stick-0.011
 and-0.011
Token she
Token ID673
Feature activation-1.39e-02

Pos logit contributions
opus+0.008
arter+0.008
 sidelines+0.008
ancer+0.008
ll+0.008
Neg logit contributions
 and-0.004
 as-0.003
 them-0.003
.-0.003
ine-0.003
Token floor
Token ID4314
Feature activation-2.18e-02

Pos logit contributions
opus+0.008
arter+0.008
 sidelines+0.008
prise+0.008
uch+0.008
Neg logit contributions
 and-0.009
 her-0.008
.-0.007
 them-0.007
,-0.007
Token or
Token ID393
Feature activation-1.46e-02

Pos logit contributions
 her+0.010
ine+0.008
stick+0.008
 its+0.008
 Every+0.008
Neg logit contributions
ider-0.046
arter-0.046
ll-0.046
 sidelines-0.044
 cheesy-0.044
Token on
Token ID319
Feature activation-4.10e-02

Pos logit contributions
 and+0.013
 her+0.011
.+0.009
 as+0.009
 them+0.009
Neg logit contributions
opus-0.034
arter-0.033
 sidelines-0.033
umm-0.032
ye-0.032
Token a
Token ID257
Feature activation-1.63e-02

Pos logit contributions
 and+0.029
ine+0.027
 as+0.026
 She+0.026
 her+0.026
Neg logit contributions
 sidelines-0.075
arter-0.073
 cheesy-0.073
 rubble-0.069
 topp-0.067
Token soft
Token ID2705
Feature activation-2.72e-02

Pos logit contributions
 and+0.026
 her+0.024
.+0.022
 as+0.022
 them+0.022
Neg logit contributions
opus-0.024
arter-0.024
 sidelines-0.024
ye-0.022
uch-0.022
Token mat
Token ID2603
Feature activation-8.99e-02

Pos logit contributions
 enough+0.044
 how+0.041
 and+0.041
 her+0.041
 as+0.040
Neg logit contributions
 sidelines-0.034
opus-0.031
arter-0.030
ye-0.029
 cheek-0.029
Token.
Token ID13
Feature activation+5.78e-03

Pos logit contributions
 her+0.063
ine+0.058
 its+0.054
 how+0.051
Her+0.050
Neg logit contributions
ider-0.174
 sidelines-0.172
 timer-0.169
arter-0.168
cher-0.167
Token They
Token ID1119
Feature activation-4.37e-04

Pos logit contributions
arter+0.012
 cheesy+0.011
 sidelines+0.011
umm+0.011
opus+0.011
Neg logit contributions
 and-0.005
 her-0.005
 as-0.004
 how-0.004
 enough-0.004
Token are
Token ID389
Feature activation+1.84e-03

Pos logit contributions
 and+0.001
 her+0.001
.+0.001
 as+0.001
 how+0.001
Neg logit contributions
 sidelines-0.001
arter-0.001
opus-0.001
ye-0.001
uch-0.001
Token careful
Token ID8161
Feature activation+5.00e-02

Pos logit contributions
arter+0.002
 sidelines+0.002
 cheesy+0.002
opus+0.002
ye+0.002
Neg logit contributions
 and-0.003
 her-0.003
 them-0.003
 one-0.003
.-0.003
Token and
Token ID290
Feature activation+2.15e-02

Pos logit contributions
ll+0.097
arter+0.095
uch+0.094
 cheek+0.093
opus+0.092
Neg logit contributions
 her-0.038
 enough-0.036
 and-0.030
 its-0.030
 how-0.030
Token Ben
Token ID3932
Feature activation-1.47e-02

Pos logit contributions
 and+0.009
 her+0.007
.+0.007
 as+0.007
 them+0.007
Neg logit contributions
 sidelines-0.012
arter-0.012
opus-0.012
ye-0.011
atmeal-0.011
Token smile
Token ID8212
Feature activation-6.65e-03

Pos logit contributions
 and+0.023
 her+0.022
 how+0.020
 as+0.020
.+0.020
Neg logit contributions
 sidelines-0.021
opus-0.020
arter-0.020
ye-0.019
prise-0.019
Token and
Token ID290
Feature activation-2.14e-02

Pos logit contributions
 her+0.001
 and+0.001
 them+0.000
 how-0.000
 one-0.000
Neg logit contributions
arter-0.017
 sidelines-0.017
opus-0.017
ye-0.017
ll-0.017
Token hug
Token ID16225
Feature activation-4.26e-03

Pos logit contributions
 and+0.035
 her+0.032
.+0.030
 as+0.030
,+0.029
Neg logit contributions
opus-0.032
arter-0.032
 sidelines-0.031
uch-0.029
ye-0.029
Token their
Token ID511
Feature activation-3.47e-02

Pos logit contributions
stick+0.002
 enough+0.002
 and+0.002
c+0.002
 how+0.002
Neg logit contributions
 Merry-0.008
arter-0.008
 cheesy-0.008
 hamb-0.008
 cheek-0.008
Token dolls
Token ID36062
Feature activation-8.96e-02

Pos logit contributions
 and+0.038
 her+0.033
.+0.031
 one+0.027
 as+0.027
Neg logit contributions
opus-0.076
arter-0.075
 sidelines-0.073
ye-0.072
umm-0.072
Token.
Token ID13
Feature activation+7.71e-03

Pos logit contributions
 her+0.032
 and+0.030
 them+0.028
 how+0.024
ine+0.021
Neg logit contributions
 sidelines-0.206
ll-0.205
 cheek-0.203
arter-0.203
leaf-0.199
Token They
Token ID1119
Feature activation-1.61e-03

Pos logit contributions
arter+0.015
 cheesy+0.014
 Merry+0.014
 sidelines+0.014
ye+0.014
Neg logit contributions
 and-0.007
 her-0.007
 how-0.006
stick-0.006
 enough-0.006
Token say
Token ID910
Feature activation+5.73e-03

Pos logit contributions
 and+0.003
 her+0.002
 how+0.002
 them+0.002
 as+0.002
Neg logit contributions
 sidelines-0.002
opus-0.002
arter-0.002
ye-0.002
uch-0.002
Token,
Token ID11
Feature activation-2.29e-03

Pos logit contributions
ye+0.013
rav+0.012
opus+0.012
arter+0.012
prise+0.012
Neg logit contributions
 and-0.004
ine-0.003
stick-0.003
c-0.003
 her-0.003
Token "
Token ID366
Feature activation-2.40e-04

Pos logit contributions
 and+0.002
 her+0.002
.+0.002
 as+0.001
 them+0.001
Neg logit contributions
opus-0.005
 sidelines-0.005
arter-0.005
ye-0.005
uch-0.005
Token looked
Token ID3114
Feature activation+2.67e-03

Pos logit contributions
 and+0.003
 her+0.003
 how+0.002
.+0.002
 as+0.002
Neg logit contributions
arter-0.003
 sidelines-0.003
opus-0.003
ye-0.003
prise-0.003
Token very
Token ID845
Feature activation+1.09e-02

Pos logit contributions
 sidelines+0.005
 cheesy+0.005
arter+0.004
 handsome+0.004
 hamb+0.004
Neg logit contributions
 and-0.002
stick-0.002
ine-0.002
 her-0.002
 how-0.002
Token happy
Token ID3772
Feature activation-4.85e-03

Pos logit contributions
opus+0.017
arter+0.017
 sidelines+0.017
uch+0.016
ye+0.016
Neg logit contributions
 and-0.017
 her-0.015
.-0.014
 as-0.014
 them-0.014
Token and
Token ID290
Feature activation-4.36e-03

Pos logit contributions
ine+0.003
c+0.003
 Every+0.003
 One+0.003
 Everywhere+0.003
Neg logit contributions
 hamb-0.009
opus-0.009
 cheek-0.009
arter-0.009
 waiter-0.009
Token not
Token ID407
Feature activation+1.78e-04

Pos logit contributions
 and+0.006
 her+0.006
.+0.005
 as+0.005
 how+0.005
Neg logit contributions
opus-0.008
arter-0.008
 sidelines-0.007
uch-0.007
ye-0.007
Token boring
Token ID14262
Feature activation-8.93e-02

Pos logit contributions
arter+0.000
 sidelines+0.000
opus+0.000
ye+0.000
prise+0.000
Neg logit contributions
 and-0.000
 her-0.000
 them-0.000
.-0.000
 how-0.000
Token anymore
Token ID7471
Feature activation-5.32e-02

Pos logit contributions
 and+0.054
c+0.053
 its+0.051
ine+0.051
 her+0.048
Neg logit contributions
arter-0.181
 sidelines-0.178
opus-0.172
ll-0.172
umm-0.171
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
 and+0.025
 enough+0.021
 her+0.021
stick+0.021
ine+0.019
Neg logit contributions
arter-0.116
opus-0.113
ll-0.110
 sidelines-0.108
ve-0.108
Token
Token ID220
Feature activation+3.86e-02

Pos logit contributions
 and+0.010
 enough+0.009
 her+0.009
 as+0.009
 how+0.008
Neg logit contributions
arter-0.019
opus-0.019
 Merry-0.019
 sidelines-0.019
 Enjoy-0.018
Token
Token ID198
Feature activation-3.71e-03

Pos logit contributions
arter+0.083
opus+0.083
ll+0.083
 sidelines+0.080
ve+0.080
Neg logit contributions
 and-0.028
 enough-0.024
 her-0.024
 as-0.023
 one-0.021
Token
Token ID198
Feature activation-3.73e-03

Pos logit contributions
 and+0.003
 enough+0.003
 as+0.002
 or+0.002
 her+0.002
Neg logit contributions
opus-0.008
 sidelines-0.008
ll-0.008
ye-0.008
ve-0.008
Token one
Token ID530
Feature activation+1.60e-03

Pos logit contributions
ine+0.004
packs+0.004
c+0.004
 enough+0.004
So+0.004
Neg logit contributions
 sidelines-0.011
 Merry-0.011
arter-0.011
 cheesy-0.011
 cheek-0.011
Token more
Token ID517
Feature activation-3.28e-03

Pos logit contributions
arter+0.003
opus+0.003
 sidelines+0.003
prise+0.003
umm+0.003
Neg logit contributions
 and-0.002
 her-0.002
 them-0.002
 its-0.001
 enough-0.001
Token cartoon
Token ID16251
Feature activation-4.67e-02

Pos logit contributions
 and+0.005
 them+0.005
 her+0.005
 its+0.005
 how+0.005
Neg logit contributions
 sidelines-0.004
arter-0.004
opus-0.004
 cheesy-0.004
leaf-0.004
Token before
Token ID878
Feature activation-3.22e-02

Pos logit contributions
 them+0.032
 Every+0.031
 and+0.031
 its+0.030
 One+0.029
Neg logit contributions
opus-0.091
cher-0.090
 sidelines-0.090
prise-0.089
atmeal-0.088
Token bed
Token ID3996
Feature activation-3.13e-02

Pos logit contributions
 and+0.035
 how+0.031
stick+0.030
c+0.029
 enough+0.029
Neg logit contributions
 sidelines-0.057
arter-0.055
opus-0.051
 freshly-0.050
atmeal-0.050
Tokentime
Token ID2435
Feature activation-8.72e-02

Pos logit contributions
 and+0.016
 her+0.012
 them+0.011
 its+0.011
 how+0.011
Neg logit contributions
arter-0.073
opus-0.073
 sidelines-0.072
atmeal-0.071
umm-0.070
Token.
Token ID13
Feature activation-3.23e-03

Pos logit contributions
 and+0.043
c+0.041
 them+0.041
 how+0.037
 its+0.036
Neg logit contributions
 Merry-0.179
arter-0.179
umm-0.178
 cheek-0.176
 hamb-0.175
Token<|endoftext|>
Token ID50256
Feature activation+1.67e-02

Pos logit contributions
 and+0.002
 her+0.002
.+0.001
 as+0.001
 them+0.001
Neg logit contributions
arter-0.008
opus-0.008
 sidelines-0.008
ye-0.007
uch-0.007
TokenOnce
Token ID7454
Feature activation-5.11e-03

Pos logit contributions
opus+0.040
arter+0.039
 sidelines+0.039
ye+0.038
ll+0.038
Neg logit contributions
 and-0.009
 her-0.007
 as-0.006
 how-0.006
 enough-0.006
Token upon
Token ID2402
Feature activation-3.03e-03

Pos logit contributions
 and+0.008
 her+0.007
 as+0.007
 how+0.007
.+0.007
Neg logit contributions
arter-0.008
opus-0.008
 sidelines-0.007
ye-0.007
ll-0.007
Token a
Token ID257
Feature activation-9.55e-03

Pos logit contributions
 and+0.001
 her+0.001
 as+0.001
.+0.001
 them+0.001
Neg logit contributions
opus-0.008
arter-0.008
 sidelines-0.008
uch-0.008
ye-0.008
Token like
Token ID588
Feature activation-2.33e-02

Pos logit contributions
arter+0.055
 sidelines+0.055
opus+0.053
 cheesy+0.052
ye+0.051
Neg logit contributions
 and-0.038
 her-0.035
.-0.031
 how-0.031
 them-0.031
Token it
Token ID340
Feature activation+2.77e-02

Pos logit contributions
 and+0.019
ine+0.016
 her+0.015
.+0.015
stick+0.014
Neg logit contributions
 sidelines-0.048
opus-0.046
ye-0.046
arter-0.046
 hamb-0.046
Token hadn
Token ID8020
Feature activation+1.26e-02

Pos logit contributions
opus+0.048
arter+0.047
uch+0.046
ll+0.046
ve+0.045
Neg logit contributions
 and-0.030
 her-0.029
 them-0.027
 as-0.026
 how-0.026
Token't
Token ID470
Feature activation+2.19e-02

Pos logit contributions
opus+0.015
uch+0.014
arter+0.014
ve+0.014
 sidelines+0.014
Neg logit contributions
 and-0.022
 her-0.021
 how-0.021
 as-0.020
 enough-0.020
Token r
Token ID374
Feature activation-4.06e-02

Pos logit contributions
arter+0.032
ye+0.031
 sidelines+0.031
 hamb+0.031
atmeal+0.031
Neg logit contributions
 and-0.030
 her-0.027
 how-0.025
 them-0.025
,-0.025
Tokenained
Token ID1328
Feature activation-8.72e-02

Pos logit contributions
 and+0.050
 her+0.045
.+0.040
 as+0.040
 how+0.039
Neg logit contributions
opus-0.075
arter-0.075
 sidelines-0.075
ye-0.071
uch-0.070
Token in
Token ID287
Feature activation-5.57e-03

Pos logit contributions
 and+0.064
 her+0.063
 how+0.058
ine+0.058
stick+0.053
Neg logit contributions
arter-0.168
 sidelines-0.167
ancer-0.163
ll-0.161
uch-0.161
Token a
Token ID257
Feature activation-1.72e-02

Pos logit contributions
stick+0.004
Her+0.004
ine+0.004
 and+0.004
c+0.003
Neg logit contributions
arter-0.010
 hamb-0.010
 freshly-0.010
 sidelines-0.010
 cheesy-0.010
Token long
Token ID890
Feature activation-6.85e-03

Pos logit contributions
 and+0.026
.+0.023
,+0.022
 one+0.022
 them+0.021
Neg logit contributions
arter-0.029
opus-0.028
leaf-0.028
atmeal-0.027
 sidelines-0.027
Token time
Token ID640
Feature activation-3.29e-02

Pos logit contributions
 her+0.009
 how+0.008
 and+0.008
 them+0.007
 as+0.007
Neg logit contributions
 sidelines-0.012
ye-0.011
opus-0.011
arter-0.011
uch-0.011
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
 her+0.020
Her+0.016
 its+0.014
stick+0.014
 them+0.014
Neg logit contributions
arter-0.068
ll-0.067
 sidelines-0.065
ve-0.065
ider-0.064
Token tree
Token ID5509
Feature activation-2.40e-03

Pos logit contributions
 and+0.015
 her+0.014
 them+0.013
 how+0.013
.+0.013
Neg logit contributions
opus-0.013
 sidelines-0.013
arter-0.013
ye-0.012
uch-0.011
Token pose
Token ID12705
Feature activation-6.56e-02

Pos logit contributions
 them+0.002
 their+0.002
Her+0.002
 her+0.002
 themselves+0.002
Neg logit contributions
ider-0.004
arter-0.004
opus-0.004
 sidelines-0.004
ll-0.004
Token and
Token ID290
Feature activation-9.42e-03

Pos logit contributions
 Every+0.046
c+0.043
 One+0.040
 Whenever+0.039
 them+0.039
Neg logit contributions
ider-0.118
cher-0.116
ve-0.114
beans-0.112
opus-0.107
Token a
Token ID257
Feature activation-2.79e-02

Pos logit contributions
 and+0.010
 them+0.008
 her+0.008
.+0.008
 or+0.007
Neg logit contributions
 sidelines-0.018
opus-0.017
cher-0.016
leaf-0.016
ye-0.016
Token warrior
Token ID16491
Feature activation-2.88e-02

Pos logit contributions
 and+0.047
 her+0.044
 them+0.041
 how+0.041
 as+0.040
Neg logit contributions
 sidelines-0.038
opus-0.038
arter-0.037
ye-0.035
prise-0.034
Token pose
Token ID12705
Feature activation-8.68e-02

Pos logit contributions
 them+0.028
 Every+0.026
 her+0.025
 their+0.025
 Everywhere+0.025
Neg logit contributions
cher-0.054
opus-0.051
ve-0.049
ye-0.049
ider-0.048
Token.
Token ID13
Feature activation-4.75e-03

Pos logit contributions
 Every+0.055
c+0.053
 One+0.049
 Whenever+0.049
 nests+0.048
Neg logit contributions
ider-0.163
cher-0.163
ve-0.158
beans-0.152
opus-0.143
Token They
Token ID1119
Feature activation-6.63e-03

Pos logit contributions
 and+0.004
 enough+0.004
 them+0.004
 or+0.004
 how+0.003
Neg logit contributions
 sidelines-0.009
umm-0.009
arter-0.008
 Seek-0.008
opus-0.008
Token were
Token ID547
Feature activation+9.72e-03

Pos logit contributions
 and+0.010
 her+0.009
 or+0.008
 enough+0.008
 how+0.008
Neg logit contributions
 sidelines-0.009
opus-0.009
ve-0.009
ye-0.009
ll-0.009
Token fun
Token ID1257
Feature activation-5.78e-03

Pos logit contributions
 sidelines+0.016
 Seek+0.015
arter+0.014
ve+0.014
ye+0.014
Neg logit contributions
 and-0.011
 them-0.010
 or-0.010
 her-0.010
.-0.010
Token,
Token ID11
Feature activation+2.31e-02

Pos logit contributions
 her+0.004
 them+0.004
 Every+0.004
 its+0.004
stick+0.004
Neg logit contributions
opus-0.012
ve-0.011
ll-0.011
ye-0.011
 sidelines-0.011
Token to
Token ID284
Feature activation+2.29e-03

Pos logit contributions
ll+0.038
ye+0.036
 sidelines+0.036
 hamb+0.035
ve+0.035
Neg logit contributions
c-0.015
 Every-0.014
stick-0.014
ine-0.014
 and-0.013
Token get
Token ID651
Feature activation-6.82e-03

Pos logit contributions
 sidelines+0.003
arter+0.003
opus+0.003
uch+0.003
ye+0.003
Neg logit contributions
 and-0.004
 her-0.003
 how-0.003
.-0.003
 them-0.003
Token away
Token ID1497
Feature activation+4.87e-03

Pos logit contributions
 and+0.007
ine+0.005
 or+0.005
 her+0.005
 how+0.004
Neg logit contributions
 sidelines-0.013
leaf-0.013
cher-0.013
ye-0.012
prise-0.012
Token un
Token ID555
Feature activation-2.39e-02

Pos logit contributions
arter+0.009
 sidelines+0.009
 hamb+0.009
vell+0.008
cher+0.008
Neg logit contributions
ine-0.004
c-0.003
 One-0.003
 them-0.003
 Every-0.003
Tokenhar
Token ID9869
Feature activation+3.49e-03

Pos logit contributions
 and+0.047
 her+0.044
.+0.041
 as+0.041
 how+0.041
Neg logit contributions
opus-0.027
 sidelines-0.027
arter-0.027
uch-0.024
ye-0.024
Tokenmed
Token ID1150
Feature activation-8.57e-02

Pos logit contributions
opus+0.005
arter+0.005
 sidelines+0.005
ye+0.005
uch+0.005
Neg logit contributions
 and-0.005
 her-0.005
.-0.005
 as-0.005
 how-0.005
Token.
Token ID13
Feature activation-5.91e-03

Pos logit contributions
ine+0.054
 how+0.050
c+0.046
 her+0.046
 Every+0.046
Neg logit contributions
ve-0.167
cher-0.161
vell-0.160
 sidelines-0.157
opus-0.151
Token From
Token ID3574
Feature activation+8.22e-03

Pos logit contributions
 and+0.005
 her+0.004
.+0.003
 as+0.003
 them+0.003
Neg logit contributions
opus-0.013
 sidelines-0.013
arter-0.013
ye-0.013
umm-0.013
Token that
Token ID326
Feature activation+1.56e-02

Pos logit contributions
opus+0.015
 sidelines+0.015
ll+0.014
arter+0.014
ye+0.014
Neg logit contributions
 and-0.008
 her-0.007
 or-0.007
 them-0.007
ine-0.006
Token day
Token ID1110
Feature activation+2.00e-03

Pos logit contributions
opus+0.023
 sidelines+0.022
arter+0.022
ye+0.022
uch+0.021
Neg logit contributions
 and-0.024
 her-0.023
 them-0.022
 how-0.021
 as-0.021
Token onwards
Token ID31719
Feature activation-1.90e-02

Pos logit contributions
 sidelines+0.005
arter+0.005
opus+0.005
umm+0.004
leaf+0.004
Neg logit contributions
 and-0.002
 her-0.001
.-0.001
,-0.001
 as-0.001
Token dust
Token ID8977
Feature activation-3.86e-02

Pos logit contributions
 and+0.022
 her+0.020
.+0.018
 as+0.018
 how+0.018
Neg logit contributions
opus-0.026
arter-0.026
 sidelines-0.025
ye-0.024
uch-0.024
Token her
Token ID607
Feature activation-2.41e-02

Pos logit contributions
 One+0.030
 Every+0.026
+0.025
 and+0.024
ine+0.022
Neg logit contributions
cher-0.073
vell-0.070
ve-0.069
atmeal-0.069
 sidelines-0.069
Token shelves
Token ID20726
Feature activation-3.89e-02

Pos logit contributions
 and+0.037
 her+0.035
 them+0.033
 how+0.033
 enough+0.032
Neg logit contributions
 sidelines-0.034
arter-0.033
opus-0.033
atmeal-0.031
uch-0.031
Token before
Token ID878
Feature activation-3.09e-02

Pos logit contributions
 and+0.020
 them+0.018
 her+0.018
 how+0.017
ine+0.016
Neg logit contributions
 sidelines-0.084
atmeal-0.082
ve-0.082
ll-0.081
vell-0.081
Token bed
Token ID3996
Feature activation-2.84e-02

Pos logit contributions
 and+0.042
 how+0.035
ine+0.035
 her+0.034
 or+0.034
Neg logit contributions
 sidelines-0.049
arter-0.045
 freshly-0.043
atmeal-0.043
 dining-0.042
Tokentime
Token ID2435
Feature activation-8.56e-02

Pos logit contributions
 and+0.013
 her+0.010
 how+0.010
 its+0.009
 them+0.009
Neg logit contributions
 sidelines-0.066
arter-0.066
atmeal-0.065
opus-0.065
ve-0.063
Token.
Token ID13
Feature activation-5.17e-03

Pos logit contributions
 and+0.044
c+0.039
 how+0.039
 them+0.037
 its+0.037
Neg logit contributions
umm-0.175
 Merry-0.173
 sidelines-0.170
 cheek-0.166
 dining-0.165
Token<|endoftext|>
Token ID50256
Feature activation+2.32e-02

Pos logit contributions
 and+0.004
 enough+0.003
 as+0.003
 them+0.003
 or+0.003
Neg logit contributions
arter-0.011
opus-0.011
 Merry-0.011
 sidelines-0.011
umm-0.011
TokenOnce
Token ID7454
Feature activation-5.71e-04

Pos logit contributions
opus+0.049
arter+0.047
ye+0.047
ve+0.047
ll+0.047
Neg logit contributions
 and-0.016
 as-0.015
 her-0.015
 how-0.015
 enough-0.014
Token upon
Token ID2402
Feature activation-3.36e-03

Pos logit contributions
 and+0.001
 her+0.001
 as+0.001
 how+0.001
 enough+0.001
Neg logit contributions
arter-0.001
opus-0.001
ll-0.001
ve-0.001
 sidelines-0.001
Token a
Token ID257
Feature activation-6.49e-03

Pos logit contributions
 and+0.002
 her+0.001
 them+0.001
 how+0.001
,+0.001
Neg logit contributions
opus-0.008
arter-0.008
 sidelines-0.008
prise-0.008
uch-0.008
Token and
Token ID290
Feature activation-4.00e-03

Pos logit contributions
c+0.023
stick+0.021
 them+0.019
ine+0.018
 her+0.017
Neg logit contributions
arter-0.061
ll-0.058
opus-0.055
 sidelines-0.054
umm-0.054
Token didn
Token ID1422
Feature activation-3.51e-02

Pos logit contributions
 and+0.006
 them+0.005
 her+0.005
.+0.005
 or+0.005
Neg logit contributions
 sidelines-0.006
arter-0.006
opus-0.006
ye-0.005
uch-0.005
Token't
Token ID470
Feature activation+4.30e-03

Pos logit contributions
 and+0.069
 how+0.064
 her+0.063
 enough+0.061
 one+0.060
Neg logit contributions
uch-0.033
 sidelines-0.032
opus-0.032
arter-0.032
ve-0.031
Token want
Token ID765
Feature activation-4.82e-02

Pos logit contributions
opus+0.007
 sidelines+0.007
arter+0.007
ye+0.007
uch+0.007
Neg logit contributions
 and-0.006
 her-0.006
.-0.005
 as-0.005
 how-0.005
Token to
Token ID284
Feature activation-2.47e-02

Pos logit contributions
 and+0.038
ine+0.027
 as+0.026
c+0.025
 or+0.025
Neg logit contributions
arter-0.096
 sidelines-0.096
leaf-0.093
 hamb-0.090
 Merry-0.088
Token dance
Token ID9280
Feature activation-8.51e-02

Pos logit contributions
 and+0.037
 her+0.034
.+0.031
 how+0.031
 as+0.031
Neg logit contributions
arter-0.039
opus-0.039
 sidelines-0.039
uch-0.036
ye-0.036
Token anymore
Token ID7471
Feature activation-4.64e-02

Pos logit contributions
ine+0.092
 Every+0.074
 Whenever+0.070
c+0.070
 its+0.070
Neg logit contributions
cher-0.131
vell-0.128
 handsome-0.128
ider-0.126
fy-0.124
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
alled+0.039
 Everywhere+0.039
stick+0.038
 them+0.037
 Every+0.037
Neg logit contributions
ll-0.071
ve-0.069
 seek-0.068
arter-0.068
cher-0.066
Token\n
Token ID198
Feature activation-6.52e-03

Pos logit contributions
 enough+0.011
 and+0.011
 or+0.011
 how+0.011
 as+0.010
Neg logit contributions
ll-0.016
 sidelines-0.016
opus-0.016
 Merry-0.015
va-0.015
Token\n
Token ID198
Feature activation-7.24e-03

Pos logit contributions
 enough+0.005
 and+0.005
 as+0.005
 how+0.005
 or+0.005
Neg logit contributions
ll-0.013
ve-0.013
opus-0.013
prise-0.013
ye-0.013
TokenHer
Token ID9360
Feature activation-1.17e-02

Pos logit contributions
 and+0.011
 enough+0.011
 as+0.010
 or+0.010
 how+0.010
Neg logit contributions
opus-0.010
ve-0.010
 sidelines-0.009
ll-0.009
ye-0.009
Token because
Token ID780
Feature activation+5.90e-03

Pos logit contributions
arter+0.053
opus+0.053
umm+0.053
 sidelines+0.052
ll+0.051
Neg logit contributions
 her-0.003
 and-0.002
 how-0.001
 them-0.001
 its+0.001
Token it
Token ID340
Feature activation+2.13e-02

Pos logit contributions
opus+0.014
arter+0.014
 sidelines+0.014
ye+0.014
uch+0.014
Neg logit contributions
 and-0.004
 her-0.003
.-0.003
 as-0.002
 how-0.002
Token had
Token ID550
Feature activation+1.48e-02

Pos logit contributions
opus+0.038
arter+0.038
 sidelines+0.038
uch+0.036
ye+0.036
Neg logit contributions
 and-0.027
 her-0.024
.-0.022
 as-0.022
 them-0.022
Token not
Token ID407
Feature activation+1.99e-02

Pos logit contributions
 sidelines+0.023
arter+0.022
 hamb+0.022
ye+0.021
leaf+0.021
Neg logit contributions
 and-0.020
 her-0.017
,-0.016
 as-0.016
.-0.016
Token r
Token ID374
Feature activation-3.99e-02

Pos logit contributions
arter+0.029
atmeal+0.027
 sidelines+0.027
 cheesy+0.027
 hamb+0.027
Neg logit contributions
 and-0.029
 her-0.025
 them-0.025
,-0.024
 or-0.023
Tokenained
Token ID1328
Feature activation-8.50e-02

Pos logit contributions
 and+0.058
 her+0.053
.+0.048
 as+0.048
 how+0.048
Neg logit contributions
arter-0.065
opus-0.064
 sidelines-0.063
uch-0.060
umm-0.060
Token for
Token ID329
Feature activation-1.11e-02

Pos logit contributions
ine+0.056
Her+0.051
c+0.048
stick+0.048
 and+0.048
Neg logit contributions
arter-0.158
 sidelines-0.150
uch-0.149
ancer-0.148
ll-0.148
Token many
Token ID867
Feature activation+1.06e-02

Pos logit contributions
 and+0.009
 her+0.007
 how+0.006
.+0.006
 as+0.005
Neg logit contributions
 sidelines-0.025
arter-0.025
opus-0.024
ye-0.023
prise-0.023
Token days
Token ID1528
Feature activation-8.41e-03

Pos logit contributions
arter+0.016
opus+0.016
 sidelines+0.016
ye+0.015
uch+0.015
Neg logit contributions
 and-0.016
 her-0.015
.-0.013
 as-0.013
 how-0.013
Token.
Token ID13
Feature activation-6.41e-03

Pos logit contributions
 her+0.004
 how+0.004
 its+0.004
 them+0.003
 Every+0.003
Neg logit contributions
ll-0.018
ve-0.018
 sidelines-0.017
arter-0.017
uch-0.017
Token\n
Token ID198
Feature activation+1.36e-03

Pos logit contributions
 and+0.007
 her+0.006
 enough+0.006
 how+0.006
 as+0.006
Neg logit contributions
 sidelines-0.012
opus-0.011
 Merry-0.011
arter-0.011
ye-0.011
Token be
Token ID307
Feature activation+1.18e-02

Pos logit contributions
arter+0.001
 sidelines+0.001
atmeal+0.001
uch+0.001
prise+0.001
Neg logit contributions
 and-0.001
 her-0.001
 enough-0.001
 them-0.001
 how-0.001
Token healthy
Token ID5448
Feature activation+2.46e-02

Pos logit contributions
 cheesy+0.017
wy+0.016
 hamb+0.015
 freshly+0.015
arter+0.015
Neg logit contributions
 and-0.016
stick-0.015
 She-0.015
 Every-0.014
 her-0.014
Token and
Token ID290
Feature activation-6.49e-03

Pos logit contributions
umm+0.053
 cheesy+0.051
ll+0.050
 hamb+0.050
arter+0.050
Neg logit contributions
Her-0.013
 her-0.012
 Her-0.010
 its-0.010
 them-0.010
Token un
Token ID555
Feature activation-3.42e-02

Pos logit contributions
 and+0.010
 her+0.010
 as+0.010
 how+0.009
.+0.009
Neg logit contributions
opus-0.013
arter-0.012
 sidelines-0.011
affle-0.011
unn-0.011
Tokenhar
Token ID9869
Feature activation+3.89e-04

Pos logit contributions
 and+0.062
 her+0.057
.+0.054
 as+0.053
 how+0.053
Neg logit contributions
opus-0.044
arter-0.044
 sidelines-0.044
ye-0.040
uch-0.039
Tokenmed
Token ID1150
Feature activation-8.49e-02

Pos logit contributions
arter+0.001
 sidelines+0.001
opus+0.001
ye+0.001
prise+0.001
Neg logit contributions
 and-0.001
 her-0.001
.-0.000
 as-0.000
 how-0.000
Token.
Token ID13
Feature activation-7.06e-03

Pos logit contributions
ine+0.047
 how+0.047
 and+0.040
 One+0.038
 Every+0.038
Neg logit contributions
ve-0.165
cher-0.163
vell-0.157
 sidelines-0.157
ye-0.156
Token Suddenly
Token ID24975
Feature activation-2.31e-02

Pos logit contributions
 and+0.007
 them+0.006
 her+0.006
 as+0.006
 how+0.005
Neg logit contributions
 Merry-0.013
rav-0.013
 Enjoy-0.013
 sidelines-0.013
opus-0.013
Token,
Token ID11
Feature activation+7.34e-03

Pos logit contributions
stick+0.017
 Every+0.014
 and+0.014
 how+0.014
 them+0.012
Neg logit contributions
ve-0.044
 Merry-0.044
rav-0.044
opus-0.043
ye-0.043
Token her
Token ID607
Feature activation+6.09e-03

Pos logit contributions
 sidelines+0.013
 Merry+0.012
ve+0.012
opus+0.012
uch+0.012
Neg logit contributions
 and-0.007
stick-0.006
 them-0.006
 how-0.006
.-0.006
Token wish
Token ID4601
Feature activation+1.29e-02

Pos logit contributions
arter+0.010
opus+0.010
 sidelines+0.010
umm+0.009
 cheesy+0.009
Neg logit contributions
 and-0.008
 her-0.007
 them-0.007
 how-0.007
 as-0.006
Token to
Token ID284
Feature activation-9.55e-03

Pos logit contributions
 and+0.019
 her+0.013
stick+0.013
ine+0.013
 or+0.012
Neg logit contributions
leaf-0.057
arter-0.057
opus-0.057
 hamb-0.056
 cheesy-0.056
Token hug
Token ID16225
Feature activation+4.66e-04

Pos logit contributions
 and+0.015
 enough+0.014
 her+0.014
 how+0.013
 as+0.013
Neg logit contributions
 sidelines-0.012
arter-0.012
uch-0.011
prise-0.011
ll-0.011
Token and
Token ID290
Feature activation-2.61e-02

Pos logit contributions
 Merry+0.001
arter+0.001
 cheesy+0.001
wy+0.001
 hamb+0.001
Neg logit contributions
 enough-0.000
 Every-0.000
 Everywhere-0.000
stick-0.000
 and-0.000
Token kiss
Token ID9245
Feature activation-3.55e-02

Pos logit contributions
 and+0.042
 her+0.038
 them+0.036
.+0.036
 as+0.036
Neg logit contributions
 sidelines-0.036
arter-0.035
opus-0.035
prise-0.033
ye-0.033
Token their
Token ID511
Feature activation-3.44e-02

Pos logit contributions
 and+0.015
 enough+0.013
 Every+0.013
 how+0.010
 Everywhere+0.010
Neg logit contributions
 Merry-0.076
cher-0.074
arter-0.073
rav-0.073
wy-0.072
Token dolls
Token ID36062
Feature activation-8.47e-02

Pos logit contributions
 and+0.056
 her+0.051
 enough+0.049
 as+0.048
 how+0.047
Neg logit contributions
 sidelines-0.047
arter-0.046
opus-0.046
ve-0.043
ye-0.043
Token.
Token ID13
Feature activation+2.40e-03

Pos logit contributions
 her+0.038
 them+0.034
 Every+0.033
 Whenever+0.033
 Everywhere+0.032
Neg logit contributions
 cheek-0.182
ll-0.178
cher-0.176
atmeal-0.174
 cheesy-0.172
Token\n
Token ID198
Feature activation+1.11e-02

Pos logit contributions
arter+0.005
 cheesy+0.004
 sidelines+0.004
ye+0.004
opus+0.004
Neg logit contributions
 and-0.002
 enough-0.002
 her-0.002
stick-0.002
 as-0.002
Token\n
Token ID198
Feature activation+1.12e-02

Pos logit contributions
opus+0.024
 sidelines+0.024
arter+0.024
ye+0.024
ll+0.024
Neg logit contributions
 and-0.007
 enough-0.007
 as-0.006
 her-0.006
 one-0.006
TokenL
Token ID43
Feature activation+6.97e-03

Pos logit contributions
arter+0.022
opus+0.022
 sidelines+0.022
uch+0.021
ye+0.021
Neg logit contributions
 and-0.013
 her-0.011
.-0.010
 as-0.010
 them-0.010
Tokenily
Token ID813
Feature activation-4.12e-03

Pos logit contributions
arter+0.009
opus+0.009
 sidelines+0.009
prise+0.008
ve+0.008
Neg logit contributions
 and-0.012
 her-0.011
 as-0.010
 one-0.010
 how-0.010
Token and
Token ID290
Feature activation-7.14e-03

Pos logit contributions
c+0.021
stick+0.018
ine+0.018
 them+0.017
 its+0.016
Neg logit contributions
arter-0.077
ll-0.074
umm-0.073
 sidelines-0.073
ancer-0.072
Token didn
Token ID1422
Feature activation-3.99e-02

Pos logit contributions
 and+0.011
 her+0.010
.+0.009
 as+0.009
 them+0.009
Neg logit contributions
 sidelines-0.011
arter-0.011
opus-0.011
uch-0.010
ye-0.010
Token't
Token ID470
Feature activation+4.17e-04

Pos logit contributions
 and+0.081
 how+0.074
 her+0.073
 enough+0.072
 as+0.071
Neg logit contributions
uch-0.036
 sidelines-0.035
opus-0.035
arter-0.033
ve-0.031
Token want
Token ID765
Feature activation-5.12e-02

Pos logit contributions
opus+0.001
arter+0.001
 sidelines+0.001
ye+0.001
uch+0.001
Neg logit contributions
 and-0.001
 her-0.001
.-0.001
 as-0.001
 how-0.001
Token to
Token ID284
Feature activation-2.44e-02

Pos logit contributions
 and+0.038
c+0.032
ine+0.030
stick+0.028
 as+0.025
Neg logit contributions
 sidelines-0.103
leaf-0.102
arter-0.099
 Merry-0.098
opus-0.096
Token dance
Token ID9280
Feature activation-8.43e-02

Pos logit contributions
 and+0.042
 her+0.037
 enough+0.036
 how+0.035
 as+0.035
Neg logit contributions
 sidelines-0.032
arter-0.031
opus-0.029
uch-0.028
prise-0.028
Token.
Token ID13
Feature activation-1.20e-02

Pos logit contributions
ine+0.080
 its+0.064
 how+0.064
c+0.064
 Every+0.064
Neg logit contributions
vell-0.137
 handsome-0.136
cher-0.131
ider-0.130
arter-0.129
Token She
Token ID1375
Feature activation-2.00e-02

Pos logit contributions
 and+0.012
 them+0.011
 how+0.011
 their+0.010
c+0.010
Neg logit contributions
 Merry-0.021
 sidelines-0.021
opus-0.020
 dining-0.020
umm-0.020
Token felt
Token ID2936
Feature activation-1.08e-02

Pos logit contributions
 and+0.029
 her+0.027
 them+0.026
 as+0.026
 how+0.026
Neg logit contributions
opus-0.028
 sidelines-0.027
ve-0.026
arter-0.026
uch-0.026
Token like
Token ID588
Feature activation-4.01e-02

Pos logit contributions
 and+0.016
ine+0.015
packs+0.014
stick+0.014
c+0.013
Neg logit contributions
 sidelines-0.015
arter-0.014
 handsome-0.013
 freshly-0.013
leaf-0.013
Token she
Token ID673
Feature activation-1.11e-02

Pos logit contributions
 and+0.039
ine+0.034
c+0.033
stick+0.031
.+0.031
Neg logit contributions
 sidelines-0.076
opus-0.071
 talented-0.070
leaf-0.070
arter-0.069
Token shouted
Token ID17293
Feature activation-5.50e-03

Pos logit contributions
ll+0.067
opus+0.066
ve+0.065
cher+0.063
uch+0.061
Neg logit contributions
 them-0.069
 her-0.067
 their-0.065
 and-0.062
 its-0.061
Token to
Token ID284
Feature activation-1.67e-02

Pos logit contributions
 them+0.002
 how+0.002
c+0.002
 and+0.002
ine+0.002
Neg logit contributions
arter-0.011
 cheek-0.011
ve-0.011
 sidelines-0.011
opus-0.011
Token no
Token ID645
Feature activation-1.74e-02

Pos logit contributions
 and+0.010
 her+0.007
.+0.006
 as+0.006
 enough+0.006
Neg logit contributions
 sidelines-0.041
arter-0.041
opus-0.041
leaf-0.039
ye-0.039
Token one
Token ID530
Feature activation-2.70e-02

Pos logit contributions
 and+0.017
 her+0.015
 how+0.013
 them+0.013
 as+0.013
Neg logit contributions
arter-0.036
opus-0.036
 sidelines-0.036
ye-0.034
uch-0.033
Token in
Token ID287
Feature activation-2.13e-02

Pos logit contributions
 them+0.016
stick+0.016
 and+0.014
 her+0.014
ine+0.013
Neg logit contributions
arter-0.056
opus-0.054
ye-0.052
ll-0.052
uch-0.051
Token particular
Token ID1948
Feature activation-8.42e-02

Pos logit contributions
 and+0.018
 as+0.012
 how+0.012
stick+0.012
ine+0.011
Neg logit contributions
arter-0.041
prise-0.039
ye-0.038
 traitor-0.038
 sidelines-0.038
Token.
Token ID13
Feature activation-7.34e-03

Pos logit contributions
 them+0.070
 her+0.069
 their+0.064
 and+0.062
 themselves+0.060
Neg logit contributions
opus-0.160
arter-0.154
 sidelines-0.150
uch-0.147
prise-0.145
Token Suddenly
Token ID24975
Feature activation-3.11e-02

Pos logit contributions
 enough+0.008
 and+0.008
 them+0.008
 or+0.007
 their+0.007
Neg logit contributions
 sidelines-0.012
opus-0.011
 cheek-0.011
 Enjoy-0.011
 Merry-0.011
Token,
Token ID11
Feature activation-2.33e-03

Pos logit contributions
 and+0.018
stick+0.018
 how+0.017
 Every+0.017
 them+0.017
Neg logit contributions
opus-0.063
ve-0.062
ye-0.061
 hamb-0.059
 Merry-0.058
Token a
Token ID257
Feature activation-2.42e-03

Pos logit contributions
 and+0.002
 them+0.002
stick+0.001
ine+0.001
 how+0.001
Neg logit contributions
opus-0.005
ve-0.004
ancer-0.004
 sidelines-0.004
 hamb-0.004
Token voice
Token ID3809
Feature activation-2.72e-02

Pos logit contributions
 and+0.004
,+0.003
 her+0.003
.+0.003
 one+0.003
Neg logit contributions
arter-0.004
ll-0.004
wy-0.004
atmeal-0.004
uch-0.004
Token and
Token ID290
Feature activation-1.04e-02

Pos logit contributions
ine+0.014
packs+0.014
c+0.014
stick+0.012
 her+0.010
Neg logit contributions
arter-0.070
opus-0.068
ll-0.068
ve-0.067
 sidelines-0.067
Token didn
Token ID1422
Feature activation-3.71e-02

Pos logit contributions
 and+0.015
 her+0.013
.+0.012
 them+0.012
 as+0.012
Neg logit contributions
 sidelines-0.017
arter-0.017
opus-0.017
ye-0.016
uch-0.016
Token't
Token ID470
Feature activation-5.57e-04

Pos logit contributions
 and+0.068
 her+0.063
 how+0.060
 one+0.059
 as+0.059
Neg logit contributions
 sidelines-0.042
opus-0.042
arter-0.042
uch-0.041
ve-0.038
Token want
Token ID765
Feature activation-5.00e-02

Pos logit contributions
 and+0.001
 her+0.001
.+0.001
 as+0.001
,+0.001
Neg logit contributions
opus-0.001
 sidelines-0.001
arter-0.001
umm-0.001
uch-0.001
Token to
Token ID284
Feature activation-2.38e-02

Pos logit contributions
 and+0.037
ine+0.028
 as+0.025
c+0.023
 or+0.023
Neg logit contributions
 sidelines-0.103
arter-0.103
leaf-0.101
 cheesy-0.097
prise-0.096
Token dance
Token ID9280
Feature activation-8.39e-02

Pos logit contributions
 and+0.039
 her+0.036
.+0.033
 as+0.032
 how+0.032
Neg logit contributions
opus-0.036
arter-0.035
 sidelines-0.035
ye-0.033
uch-0.032
Token anymore
Token ID7471
Feature activation-5.01e-02

Pos logit contributions
ine+0.109
 Every+0.093
 how+0.087
c+0.086
 and+0.085
Neg logit contributions
cher-0.114
 handsome-0.111
vell-0.108
ider-0.105
arter-0.104
Token.
Token ID13
Feature activation-1.14e-02

Pos logit contributions
 Everywhere+0.032
 them+0.032
stick+0.032
 Every+0.032
c+0.031
Neg logit contributions
ll-0.089
ve-0.089
arter-0.088
opus-0.086
 forgetting-0.084
Token\n
Token ID198
Feature activation-4.72e-03

Pos logit contributions
 and+0.007
 her+0.005
 as+0.004
 how+0.004
.+0.004
Neg logit contributions
 sidelines-0.028
opus-0.028
arter-0.028
ye-0.026
umm-0.026
Token\n
Token ID198
Feature activation-5.04e-03

Pos logit contributions
 and+0.002
 her+0.002
 as+0.001
.+0.001
 how+0.001
Neg logit contributions
opus-0.012
arter-0.012
 sidelines-0.012
ye-0.012
uch-0.012
TokenEm
Token ID10161
Feature activation+1.56e-02

Pos logit contributions
 and+0.007
 her+0.006
.+0.005
 how+0.005
 them+0.005
Neg logit contributions
arter-0.010
opus-0.009
 sidelines-0.009
umm-0.009
atmeal-0.009
Token and
Token ID290
Feature activation+1.01e-02

Pos logit contributions
ine+0.037
 enough+0.033
 her+0.029
c+0.029
packs+0.028
Neg logit contributions
 sidelines-0.105
ve-0.099
 concession-0.098
uch-0.097
ll-0.096
Token r
Token ID374
Feature activation-2.82e-02

Pos logit contributions
 sidelines+0.017
arter+0.016
uch+0.016
 cheesy+0.016
opus+0.015
Neg logit contributions
 and-0.013
ine-0.010
 her-0.010
.-0.010
 or-0.010
Tokenained
Token ID1328
Feature activation-7.50e-02

Pos logit contributions
 and+0.038
 her+0.035
 one+0.031
 how+0.031
 as+0.031
Neg logit contributions
arter-0.049
opus-0.048
 sidelines-0.048
uch-0.046
ll-0.046
Token and
Token ID290
Feature activation-1.12e-03

Pos logit contributions
ine+0.060
 enough+0.053
 god+0.051
c+0.050
packs+0.049
Neg logit contributions
 sidelines-0.133
ve-0.126
 rubble-0.123
free-0.123
 concession-0.118
Token r
Token ID374
Feature activation-3.66e-02

Pos logit contributions
 and+0.002
ine+0.001
 her+0.001
 or+0.001
.+0.001
Neg logit contributions
 sidelines-0.002
arter-0.002
uch-0.001
 cheesy-0.001
 hamb-0.001
Tokenained
Token ID1328
Feature activation-8.35e-02

Pos logit contributions
 and+0.051
 her+0.047
 one+0.042
 how+0.042
 them+0.042
Neg logit contributions
arter-0.061
 sidelines-0.060
opus-0.060
uch-0.058
ll-0.058
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
ine+0.066
 enough+0.058
 god+0.058
c+0.056
packs+0.053
Neg logit contributions
 sidelines-0.147
ve-0.140
free-0.139
 rubble-0.137
uch-0.129
Token The
Token ID383
Feature activation+1.15e-02

Pos logit contributions
 and+0.020
 enough+0.018
 how+0.017
 her+0.017
 as+0.016
Neg logit contributions
 sidelines-0.023
opus-0.022
arter-0.022
 Merry-0.021
umm-0.021
Token water
Token ID1660
Feature activation+3.41e-02

Pos logit contributions
opus+0.018
arter+0.018
 sidelines+0.018
ye+0.017
uch+0.016
Neg logit contributions
 and-0.017
 her-0.016
.-0.015
 as-0.014
 how-0.014
Token came
Token ID1625
Feature activation+4.14e-02

Pos logit contributions
opus+0.048
uch+0.048
 sidelines+0.045
ll+0.045
atmeal+0.043
Neg logit contributions
 her-0.046
 how-0.044
 enough-0.043
 and-0.042
 as-0.042
Token down
Token ID866
Feature activation+2.03e-02

Pos logit contributions
ve+0.069
ll+0.065
uch+0.065
 sidelines+0.064
arter+0.064
Neg logit contributions
 how-0.039
 her-0.038
 and-0.034
 Every-0.034
 Whenever-0.032
Token was
Token ID373
Feature activation+3.99e-03

Pos logit contributions
opus+0.038
uch+0.036
 sidelines+0.035
arter+0.034
ye+0.033
Neg logit contributions
 her-0.031
 and-0.030
 them-0.029
 how-0.028
 its-0.027
Token destroyed
Token ID6572
Feature activation+1.69e-02

Pos logit contributions
arter+0.005
 sidelines+0.005
uch+0.005
 cheesy+0.005
opus+0.005
Neg logit contributions
 and-0.006
 her-0.005
 them-0.005
ine-0.005
.-0.005
Token and
Token ID290
Feature activation+5.31e-03

Pos logit contributions
arter+0.038
 cheesy+0.036
umm+0.036
opus+0.035
 sidelines+0.035
Neg logit contributions
 her-0.008
Her-0.007
ine-0.006
 Her-0.006
stick-0.006
Token could
Token ID714
Feature activation-8.77e-03

Pos logit contributions
arter+0.011
opus+0.010
 sidelines+0.010
ye+0.010
uch+0.010
Neg logit contributions
 and-0.006
 her-0.005
.-0.005
 as-0.004
 how-0.004
Token not
Token ID407
Feature activation-1.52e-02

Pos logit contributions
 and+0.011
 her+0.010
 them+0.009
.+0.009
 enough+0.009
Neg logit contributions
 sidelines-0.015
arter-0.015
opus-0.015
uch-0.014
atmeal-0.014
Token spin
Token ID7906
Feature activation-8.34e-02

Pos logit contributions
 and+0.024
 her+0.022
 how+0.021
 enough+0.021
 them+0.020
Neg logit contributions
arter-0.022
 sidelines-0.022
opus-0.021
uch-0.020
atmeal-0.020
Token anymore
Token ID7471
Feature activation-5.21e-02

Pos logit contributions
 and+0.091
 her+0.088
ine+0.087
Her+0.084
 Her+0.081
Neg logit contributions
arter-0.128
vell-0.119
 sidelines-0.118
uch-0.117
opus-0.115
Token.
Token ID13
Feature activation-1.31e-02

Pos logit contributions
 her+0.033
 them+0.029
 enough+0.027
 Every+0.026
 and+0.026
Neg logit contributions
arter-0.107
opus-0.104
ll-0.100
ve-0.099
cher-0.098
Token Tim
Token ID5045
Feature activation+1.78e-02

Pos logit contributions
 and+0.016
 her+0.015
.+0.013
 as+0.013
 how+0.013
Neg logit contributions
opus-0.024
arter-0.024
 sidelines-0.024
ye-0.023
uch-0.022
Token was
Token ID373
Feature activation-7.92e-03

Pos logit contributions
opus+0.029
uch+0.029
ll+0.028
ve+0.027
 sidelines+0.027
Neg logit contributions
 her-0.021
 them-0.020
 and-0.020
 as-0.019
 their-0.019
Token very
Token ID845
Feature activation+2.14e-02

Pos logit contributions
 and+0.012
 her+0.011
 them+0.011
ine+0.010
.+0.010
Neg logit contributions
 sidelines-0.010
arter-0.010
uch-0.010
 traitor-0.009
opus-0.009
Token broke
Token ID6265
Feature activation+1.45e-02

Pos logit contributions
Her+0.012
 Every+0.010
 Whenever+0.010
 Everywhere+0.010
 One+0.010
Neg logit contributions
cher-0.030
 timer-0.029
 cheek-0.029
uch-0.028
ve-0.028
Token and
Token ID290
Feature activation-5.77e-03

Pos logit contributions
arter+0.027
 cheesy+0.026
 sidelines+0.026
 freshly+0.026
umm+0.025
Neg logit contributions
Her-0.009
 Her-0.008
 One-0.008
 Every-0.008
 Whenever-0.008
Token she
Token ID673
Feature activation-2.57e-02

Pos logit contributions
 and+0.008
 her+0.007
 as+0.007
 them+0.007
.+0.007
Neg logit contributions
 sidelines-0.008
arter-0.008
cher-0.008
prise-0.008
 cheesy-0.008
Token couldn
Token ID3521
Feature activation-1.47e-02

Pos logit contributions
 and+0.039
 her+0.037
 them+0.035
 as+0.035
 one+0.034
Neg logit contributions
ve-0.033
opus-0.033
 sidelines-0.032
ll-0.032
prise-0.032
Token't
Token ID470
Feature activation+9.06e-03

Pos logit contributions
 and+0.029
 how+0.028
 as+0.027
 enough+0.027
 her+0.027
Neg logit contributions
uch-0.013
opus-0.012
ve-0.012
 sidelines-0.012
arter-0.011
Token dance
Token ID9280
Feature activation-8.33e-02

Pos logit contributions
arter+0.012
 sidelines+0.012
opus+0.011
prise+0.011
uch+0.011
Neg logit contributions
 and-0.015
 her-0.014
 enough-0.013
 them-0.013
 how-0.013
Token anymore
Token ID7471
Feature activation-4.17e-02

Pos logit contributions
ine+0.113
 and+0.100
 Every+0.098
c+0.096
 how+0.093
Neg logit contributions
arter-0.109
vell-0.108
 sidelines-0.104
 handsome-0.104
cher-0.101
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
 enough+0.021
 and+0.019
 them+0.019
 her+0.018
 Everywhere+0.018
Neg logit contributions
arter-0.087
opus-0.086
ll-0.084
ve-0.084
 sidelines-0.084
Token She
Token ID1375
Feature activation-1.20e-02

Pos logit contributions
 and+0.010
 enough+0.010
 them+0.009
 as+0.009
 how+0.009
Neg logit contributions
 sidelines-0.018
opus-0.018
 Merry-0.017
arter-0.017
 Bye-0.017
Token felt
Token ID2936
Feature activation-5.92e-03

Pos logit contributions
 her+0.017
 and+0.017
 them+0.016
ings+0.016
 as+0.016
Neg logit contributions
 forgot-0.014
ll-0.014
opus-0.013
 sidelines-0.013
ve-0.013
Token sad
Token ID6507
Feature activation-3.53e-02

Pos logit contributions
 and+0.008
packs+0.008
ine+0.008
ings+0.008
stick+0.008
Neg logit contributions
 sidelines-0.008
 handsome-0.007
arter-0.007
 talented-0.007
 cheesy-0.007
Token.
Token ID13
Feature activation-1.88e-03

Pos logit contributions
 and+0.010
 her+0.009
 them+0.008
 how+0.008
 as+0.007
Neg logit contributions
opus-0.016
arter-0.016
ye-0.016
 sidelines-0.016
ve-0.015
Token Here
Token ID3423
Feature activation+1.65e-02

Pos logit contributions
 and+0.003
 her+0.003
.+0.003
 as+0.003
 how+0.003
Neg logit contributions
arter-0.003
opus-0.002
 sidelines-0.002
uch-0.002
ye-0.002
Token you
Token ID345
Feature activation+8.55e-04

Pos logit contributions
opus+0.019
ye+0.019
uch+0.019
prise+0.018
ve+0.017
Neg logit contributions
 and-0.027
stick-0.026
 her-0.025
 as-0.024
 them-0.024
Token go
Token ID467
Feature activation-9.95e-03

Pos logit contributions
opus+0.001
arter+0.001
 sidelines+0.001
ye+0.001
prise+0.001
Neg logit contributions
 and-0.001
 her-0.001
 them-0.001
 how-0.001
 as-0.001
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
 and+0.011
 her+0.011
 how+0.009
 its+0.009
 as+0.009
Neg logit contributions
opus-0.018
 sidelines-0.018
ye-0.018
arter-0.017
ve-0.017
Token Enjoy
Token ID18179
Feature activation+7.08e-02

Pos logit contributions
arter+0.014
opus+0.014
 sidelines+0.014
uch+0.013
ye+0.013
Neg logit contributions
 and-0.022
 her-0.020
 as-0.019
.-0.019
 how-0.019
Token!"
Token ID2474
Feature activation-1.18e-03

Pos logit contributions
opus+0.118
 sidelines+0.117
arter+0.116
ye+0.113
atmeal+0.111
Neg logit contributions
 and-0.090
 her-0.082
 as-0.074
.-0.074
 how-0.073
Token\n
Token ID198
Feature activation-4.83e-03

Pos logit contributions
 enough+0.001
 or+0.001
 them+0.001
 one+0.001
 her+0.001
Neg logit contributions
 sidelines-0.002
leaf-0.002
ll-0.002
ve-0.002
itt-0.002
Token\n
Token ID198
Feature activation-3.02e-03

Pos logit contributions
 enough+0.004
 as+0.004
 and+0.004
 her+0.003
 one+0.003
Neg logit contributions
opus-0.010
ll-0.010
ve-0.010
 sidelines-0.010
ye-0.010
TokenM
Token ID44
Feature activation-8.68e-03

Pos logit contributions
 and+0.006
 her+0.005
 enough+0.005
 as+0.005
 one+0.005
Neg logit contributions
opus-0.003
 sidelines-0.003
prise-0.003
ye-0.003
icious-0.003
Tokenum
Token ID388
Feature activation-2.54e-03

Pos logit contributions
 and+0.014
 her+0.013
.+0.012
 as+0.012
 them+0.012
Neg logit contributions
opus-0.013
arter-0.013
 sidelines-0.013
ye-0.012
uch-0.012
Token He
Token ID679
Feature activation+1.53e-02

Pos logit contributions
 sidelines+0.008
arter+0.008
umm+0.008
ye+0.008
opus+0.008
Neg logit contributions
 her-0.006
 and-0.006
 them-0.005
 how-0.005
,-0.005
Token went
Token ID1816
Feature activation+1.21e-02

Pos logit contributions
opus+0.022
ye+0.022
 sidelines+0.021
ll+0.021
ve+0.020
Neg logit contributions
 and-0.023
 her-0.022
.-0.020
 them-0.020
 as-0.020
Token on
Token ID319
Feature activation-7.90e-03

Pos logit contributions
 sidelines+0.027
opus+0.026
ye+0.026
ll+0.026
ve+0.026
Neg logit contributions
 and-0.008
 her-0.008
 how-0.006
 its-0.006
 them-0.006
Token the
Token ID262
Feature activation+1.84e-02

Pos logit contributions
stick+0.005
 and+0.005
ine+0.005
 how+0.005
 her+0.005
Neg logit contributions
 sidelines-0.016
ye-0.015
 Seek-0.014
 hamb-0.014
 robbers-0.014
Token fer
Token ID11354
Feature activation+7.18e-02

Pos logit contributions
arter+0.033
opus+0.031
prise+0.030
umm+0.029
 sidelines+0.029
Neg logit contributions
 and-0.027
 her-0.025
 one-0.023
.-0.023
 as-0.022
Tokenris
Token ID2442
Feature activation+7.75e-02

Pos logit contributions
opus+0.089
arter+0.086
 sidelines+0.085
ye+0.082
uch+0.081
Neg logit contributions
 and-0.133
 her-0.127
.-0.116
 as-0.115
 them-0.115
Token wheel
Token ID7825
Feature activation+3.12e-02

Pos logit contributions
opus+0.109
 sidelines+0.108
ye+0.107
rav+0.095
uch+0.095
Neg logit contributions
 her-0.121
 and-0.113
 how-0.104
 them-0.103
 its-0.101
Token,
Token ID11
Feature activation+2.02e-02

Pos logit contributions
ll+0.068
opus+0.064
ancer+0.063
 sidelines+0.062
cher+0.062
Neg logit contributions
 her-0.019
 its-0.017
 them-0.017
 their-0.015
Her-0.014
Token the
Token ID262
Feature activation+6.15e-03

Pos logit contributions
 sidelines+0.041
opus+0.040
arter+0.039
ye+0.039
leaf+0.038
Neg logit contributions
 and-0.020
 her-0.018
.-0.016
 them-0.015
 how-0.015
Token roller
Token ID24471
Feature activation+1.14e-02

Pos logit contributions
arter+0.011
opus+0.010
prise+0.009
uch+0.009
 sidelines+0.009
Neg logit contributions
 and-0.010
 her-0.008
 one-0.008
.-0.008
 as-0.008
Token coaster
Token ID42450
Feature activation+2.43e-02

Pos logit contributions
opus+0.011
 sidelines+0.010
uch+0.010
atmeal+0.009
cher+0.009
Neg logit contributions
 her-0.023
 and-0.021
 their-0.021
 its-0.021
 them-0.021
Token to
Token ID284
Feature activation+2.71e-02

Pos logit contributions
arter+0.050
prise+0.049
 sidelines+0.048
ll+0.048
ye+0.048
Neg logit contributions
 and-0.021
 her-0.021
 how-0.018
ine-0.018
 them-0.017
Token see
Token ID766
Feature activation+1.02e-02

Pos logit contributions
 sidelines+0.059
arter+0.059
opus+0.057
prise+0.056
ye+0.056
Neg logit contributions
 and-0.023
 her-0.019
.-0.016
 as-0.016
 how-0.016
Token the
Token ID262
Feature activation+1.67e-02

Pos logit contributions
arter+0.020
opus+0.020
prise+0.020
 sidelines+0.020
ye+0.019
Neg logit contributions
 and-0.008
 her-0.007
ine-0.007
stick-0.006
 as-0.006
Token rh
Token ID9529
Feature activation+6.24e-02

Pos logit contributions
arter+0.028
atmeal+0.028
opus+0.027
 sidelines+0.027
cher+0.026
Neg logit contributions
 and-0.026
 one-0.022
.-0.022
 her-0.020
,-0.020
Tokenin
Token ID259
Feature activation+4.31e-02

Pos logit contributions
 sidelines+0.107
arter+0.106
umm+0.095
 cheesy+0.095
leaf+0.095
Neg logit contributions
 and-0.097
.-0.082
 them-0.077
 or-0.075
,-0.075
Tokenoc
Token ID420
Feature activation+8.72e-02

Pos logit contributions
 sidelines+0.060
arter+0.059
opus+0.057
leaf+0.054
umm+0.053
Neg logit contributions
 and-0.078
.-0.067
 her-0.067
,-0.065
 as-0.065
Tokeneros
Token ID27498
Feature activation+2.04e-02

Pos logit contributions
arter+0.129
 sidelines+0.129
prise+0.125
inations+0.123
ye+0.121
Neg logit contributions
 and-0.142
.-0.121
,-0.119
 how-0.117
 her-0.117
Token.
Token ID13
Feature activation+1.44e-02

Pos logit contributions
arter+0.045
opus+0.044
ll+0.043
 sidelines+0.043
uch+0.043
Neg logit contributions
 her-0.013
 and-0.013
 them-0.010
 how-0.009
ine-0.009
Token We
Token ID775
Feature activation-6.56e-03

Pos logit contributions
arter+0.023
opus+0.022
 sidelines+0.022
ye+0.022
 cheesy+0.021
Neg logit contributions
 and-0.020
 her-0.019
 them-0.017
 how-0.017
 as-0.017
Token don
Token ID836
Feature activation+1.63e-02

Pos logit contributions
 and+0.011
 her+0.010
 how+0.010
 as+0.010
 them+0.009
Neg logit contributions
arter-0.009
opus-0.008
uch-0.008
prise-0.008
ve-0.008
Token't
Token ID470
Feature activation+1.64e-02

Pos logit contributions
opus+0.023
arter+0.023
 sidelines+0.023
ye+0.021
uch+0.021
Neg logit contributions
 and-0.027
 her-0.025
.-0.023
 as-0.023
 how-0.023
Token!"
Token ID2474
Feature activation-6.99e-04

Pos logit contributions
 her+0.015
 them+0.015
 Everywhere+0.014
 how+0.014
 and+0.014
Neg logit contributions
opus-0.013
arter-0.012
ll-0.012
ye-0.012
uch-0.012
Token Ben
Token ID3932
Feature activation-1.57e-02

Pos logit contributions
stick+0.001
 and+0.001
 how+0.001
 enough+0.001
c+0.001
Neg logit contributions
 sidelines-0.001
arter-0.001
opus-0.001
 cheesy-0.001
 hamb-0.001
Token says
Token ID1139
Feature activation-4.42e-03

Pos logit contributions
 its+0.026
 her+0.026
stick+0.025
 whenever+0.025
 as+0.025
Neg logit contributions
 sidelines-0.014
leaf-0.013
opus-0.013
 cheek-0.012
 cheesy-0.012
Token.
Token ID13
Feature activation+3.87e-03

Pos logit contributions
 and+0.002
 her+0.001
 how+0.001
 enough+0.001
 them+0.001
Neg logit contributions
arter-0.011
 sidelines-0.011
opus-0.011
 cheek-0.011
ve-0.011
Token "
Token ID366
Feature activation-7.94e-04

Pos logit contributions
arter+0.007
opus+0.006
 sidelines+0.006
ye+0.006
 waiter+0.006
Neg logit contributions
 her-0.004
 enough-0.004
 and-0.004
ine-0.004
 how-0.004
TokenP
Token ID47
Feature activation+6.97e-02

Pos logit contributions
 and+0.001
 her+0.001
 as+0.001
.+0.001
 how+0.001
Neg logit contributions
opus-0.001
 sidelines-0.001
arter-0.001
uch-0.001
ye-0.001
Tokenill
Token ID359
Feature activation-3.61e-03

Pos logit contributions
opus+0.086
arter+0.085
 sidelines+0.081
prise+0.081
uch+0.080
Neg logit contributions
 and-0.128
 her-0.120
 as-0.111
 them-0.110
 one-0.110
Tokenows
Token ID1666
Feature activation-5.69e-03

Pos logit contributions
 her+0.004
 and+0.004
 them+0.004
 as+0.004
 one+0.004
Neg logit contributions
opus-0.006
arter-0.006
uch-0.006
ll-0.006
ve-0.006
Token are
Token ID389
Feature activation+9.24e-04

Pos logit contributions
ine+0.008
 and+0.008
 her+0.008
stick+0.008
 one+0.008
Neg logit contributions
arter-0.007
 sidelines-0.007
opus-0.006
uch-0.006
 cheesy-0.006
Token fun
Token ID1257
Feature activation-2.61e-03

Pos logit contributions
arter+0.001
 sidelines+0.001
opus+0.001
ye+0.001
uch+0.001
Neg logit contributions
 and-0.001
 her-0.001
.-0.001
 them-0.001
 how-0.001
Token!"
Token ID2474
Feature activation+9.55e-03

Pos logit contributions
 her+0.003
 how+0.002
 and+0.002
 Every+0.002
 them+0.002
Neg logit contributions
arter-0.004
opus-0.004
 sidelines-0.004
ye-0.004
ve-0.004
Token park
Token ID3952
Feature activation+8.43e-03

Pos logit contributions
arter+0.008
opus+0.008
 sidelines+0.007
uch+0.007
prise+0.007
Neg logit contributions
 and-0.007
 her-0.006
.-0.006
 one-0.006
,-0.006
Token and
Token ID290
Feature activation-8.02e-03

Pos logit contributions
 sidelines+0.017
ll+0.016
 dining+0.016
 cheek+0.016
 hamb+0.016
Neg logit contributions
 enough-0.004
 them-0.004
 Whenever-0.004
stick-0.004
ine-0.004
Token saw
Token ID2497
Feature activation+1.16e-03

Pos logit contributions
 and+0.012
 her+0.010
.+0.010
 them+0.010
 how+0.010
Neg logit contributions
 sidelines-0.012
opus-0.012
arter-0.011
leaf-0.011
ll-0.011
Token that
Token ID326
Feature activation+2.57e-02

Pos logit contributions
 sidelines+0.002
leaf+0.002
opus+0.002
 concession+0.002
 hamb+0.002
Neg logit contributions
 and-0.001
ine-0.001
 as-0.001
 or-0.001
.-0.001
Token the
Token ID262
Feature activation+2.36e-02

Pos logit contributions
 sidelines+0.050
opus+0.048
leaf+0.047
 talented+0.047
 queens+0.047
Neg logit contributions
 and-0.020
ine-0.019
stick-0.015
 as-0.015
 or-0.014
Token other
Token ID584
Feature activation+7.45e-02

Pos logit contributions
arter+0.036
opus+0.036
 sidelines+0.036
ye+0.033
uch+0.033
Neg logit contributions
 and-0.037
 her-0.034
.-0.031
 as-0.031
 one-0.031
Token children
Token ID1751
Feature activation+3.04e-02

Pos logit contributions
opus+0.084
 sidelines+0.084
arter+0.080
umm+0.074
uch+0.074
Neg logit contributions
 and-0.130
 her-0.127
 its-0.119
 as-0.118
 how-0.117
Token were
Token ID547
Feature activation-1.05e-02

Pos logit contributions
opus+0.048
 sidelines+0.048
arter+0.047
ll+0.046
ve+0.045
Neg logit contributions
 and-0.038
 her-0.036
 enough-0.034
 its-0.033
 as-0.033
Token playing
Token ID2712
Feature activation-2.66e-03

Pos logit contributions
 and+0.017
 her+0.016
 how+0.015
 as+0.015
 its+0.015
Neg logit contributions
 sidelines-0.013
arter-0.013
opus-0.012
ve-0.012
ll-0.012
Token a
Token ID257
Feature activation+1.43e-02

Pos logit contributions
ine+0.002
 enough+0.002
 Every+0.002
 Whenever+0.002
stick+0.002
Neg logit contributions
 sidelines-0.005
cher-0.005
ve-0.004
 seek-0.004
prise-0.004
Token game
Token ID983
Feature activation-2.69e-02

Pos logit contributions
arter+0.022
umm+0.021
 sidelines+0.021
wy+0.020
atmeal+0.020
Neg logit contributions
 and-0.024
 her-0.021
.-0.020
 one-0.020
,-0.020
Token cookies
Token ID14746
Feature activation-6.15e-03

Pos logit contributions
 sidelines+0.002
opus+0.002
prise+0.002
arter+0.002
ye+0.002
Neg logit contributions
 and-0.003
 her-0.003
 its-0.003
 how-0.003
 them-0.003
Token.
Token ID13
Feature activation+2.66e-02

Pos logit contributions
 its+0.004
 her+0.004
 Whenever+0.004
ine+0.004
 Every+0.004
Neg logit contributions
ll-0.012
prise-0.012
cher-0.012
uch-0.012
wy-0.012
Token But
Token ID887
Feature activation+6.09e-03

Pos logit contributions
opus+0.044
arter+0.043
 sidelines+0.043
ye+0.040
uch+0.040
Neg logit contributions
 and-0.039
 her-0.035
.-0.032
 as-0.032
 how-0.032
Token don
Token ID836
Feature activation+2.48e-02

Pos logit contributions
opus+0.011
 sidelines+0.011
arter+0.011
leaf+0.011
vell+0.010
Neg logit contributions
 and-0.009
 her-0.008
 one-0.008
 how-0.008
,-0.008
Token't
Token ID470
Feature activation+2.51e-02

Pos logit contributions
opus+0.034
arter+0.034
 sidelines+0.034
uch+0.032
ye+0.031
Neg logit contributions
 and-0.041
 her-0.038
.-0.035
 as-0.035
 how-0.035
Token eat
Token ID4483
Feature activation+2.07e-02

Pos logit contributions
opus+0.034
arter+0.034
 sidelines+0.033
ye+0.031
leaf+0.031
Neg logit contributions
 and-0.045
 her-0.041
.-0.039
 as-0.038
 how-0.038
Token too
Token ID1165
Feature activation-1.91e-02

Pos logit contributions
 cheesy+0.033
uch+0.032
wy+0.032
atmeal+0.032
arter+0.031
Neg logit contributions
 and-0.024
stick-0.023
 her-0.021
ine-0.020
 how-0.020
Token many
Token ID867
Feature activation+1.13e-02

Pos logit contributions
 and+0.025
 her+0.022
 how+0.020
 as+0.020
 them+0.020
Neg logit contributions
arter-0.033
opus-0.033
 sidelines-0.033
uch-0.032
ye-0.032
Token,
Token ID11
Feature activation+5.18e-03

Pos logit contributions
arter+0.014
opus+0.014
 cheesy+0.014
 sidelines+0.014
atmeal+0.013
Neg logit contributions
 and-0.018
 her-0.017
stick-0.016
 its-0.016
 as-0.015
Token or
Token ID393
Feature activation-6.07e-04

Pos logit contributions
arter+0.008
 sidelines+0.008
opus+0.008
 cheesy+0.008
ye+0.008
Neg logit contributions
 and-0.006
 her-0.006
.-0.006
 how-0.006
 them-0.005
Token you
Token ID345
Feature activation+1.83e-03

Pos logit contributions
 and+0.001
 her+0.001
.+0.001
 as+0.001
 how+0.001
Neg logit contributions
opus-0.001
arter-0.001
 sidelines-0.001
ye-0.001
uch-0.001
Token\n
Token ID198
Feature activation+1.44e-02

Pos logit contributions
opus+0.026
 sidelines+0.026
ll+0.025
ye+0.025
prise+0.025
Neg logit contributions
 enough-0.022
 as-0.020
 and-0.020
 or-0.019
 one-0.018
TokenSo
Token ID2396
Feature activation+7.78e-03

Pos logit contributions
opus+0.021
prise+0.021
 sidelines+0.021
icious+0.020
ye+0.020
Neg logit contributions
 and-0.023
 enough-0.022
 as-0.021
,-0.021
 or-0.020
Token she
Token ID673
Feature activation-1.73e-03

Pos logit contributions
 Merry+0.014
ye+0.014
ll+0.013
opus+0.013
prise+0.013
Neg logit contributions
stick-0.006
 and-0.006
 how-0.006
 them-0.005
ine-0.005
Token sliced
Token ID26790
Feature activation+3.25e-02

Pos logit contributions
 her+0.003
 them+0.002
 and+0.002
 one+0.002
 its+0.002
Neg logit contributions
ye-0.002
opus-0.002
ll-0.002
leaf-0.002
cher-0.002
Token the
Token ID262
Feature activation+1.33e-02

Pos logit contributions
 cheesy+0.058
 hamb+0.058
umm+0.057
ve+0.057
wy+0.057
Neg logit contributions
 and-0.022
 or-0.022
ine-0.021
packs-0.020
stick-0.019
Token pepper
Token ID13385
Feature activation+5.71e-02

Pos logit contributions
opus+0.015
arter+0.015
 sidelines+0.015
ye+0.014
uch+0.014
Neg logit contributions
 and-0.026
 her-0.024
.-0.023
 as-0.022
 how-0.022
Token into
Token ID656
Feature activation+3.21e-03

Pos logit contributions
opus+0.114
ll+0.112
umm+0.112
 hamb+0.110
atmeal+0.109
Neg logit contributions
ine-0.036
 her-0.036
 Every-0.035
 One-0.034
 Whenever-0.032
Token pieces
Token ID5207
Feature activation-1.28e-03

Pos logit contributions
 sidelines+0.006
arter+0.006
 cheesy+0.006
opus+0.006
ve+0.006
Neg logit contributions
 and-0.003
 or-0.003
 as-0.003
 how-0.003
 her-0.003
Token and
Token ID290
Feature activation-1.51e-03

Pos logit contributions
 One+0.001
 Whenever+0.001
ine+0.001
 Every+0.001
 how+0.001
Neg logit contributions
ll-0.002
ve-0.002
opus-0.002
beans-0.002
is-0.002
Token passed
Token ID3804
Feature activation-1.84e-02

Pos logit contributions
 and+0.002
 her+0.002
.+0.002
 them+0.002
 or+0.002
Neg logit contributions
 sidelines-0.002
opus-0.002
arter-0.002
ye-0.002
atmeal-0.002
Token them
Token ID606
Feature activation+3.08e-03

Pos logit contributions
ine+0.015
 or+0.012
 One+0.012
 and+0.012
 Every+0.011
Neg logit contributions
ve-0.033
leaf-0.031
 queens-0.030
 sidelines-0.030
umm-0.030
Tokenily
Token ID813
Feature activation-1.54e-02

Pos logit contributions
 and+0.007
 her+0.006
.+0.006
 as+0.006
 how+0.006
Neg logit contributions
opus-0.007
 sidelines-0.007
arter-0.007
ye-0.007
uch-0.007
Token smiled
Token ID13541
Feature activation+1.82e-02

Pos logit contributions
 her+0.022
 them+0.022
 its+0.022
 their+0.021
 as+0.021
Neg logit contributions
opus-0.021
ll-0.020
ve-0.019
 hamb-0.018
leton-0.018
Token.
Token ID13
Feature activation+5.55e-03

Pos logit contributions
 handsome+0.036
arter+0.035
 cheesy+0.035
 cheek+0.035
vell+0.034
Neg logit contributions
stick-0.011
ine-0.010
 them-0.009
c-0.009
ion-0.007
Token She
Token ID1375
Feature activation-5.96e-03

Pos logit contributions
 sidelines+0.009
arter+0.009
ancer+0.009
opus+0.009
 Welcome+0.008
Neg logit contributions
 enough-0.006
ine-0.005
 and-0.005
 one-0.005
c-0.005
Token liked
Token ID8288
Feature activation+6.54e-03

Pos logit contributions
 and+0.008
 her+0.007
.+0.006
 how+0.006
,+0.006
Neg logit contributions
 sidelines-0.011
arter-0.011
opus-0.011
umm-0.010
uch-0.010
Token to
Token ID284
Feature activation+2.10e-02

Pos logit contributions
arter+0.007
 hamb+0.007
 cheesy+0.007
 sidelines+0.007
opus+0.007
Neg logit contributions
 and-0.011
ine-0.010
 as-0.010
stick-0.010
 her-0.010
Token teach
Token ID4545
Feature activation+1.14e-02

Pos logit contributions
arter+0.026
 sidelines+0.026
opus+0.025
ye+0.023
atmeal+0.023
Neg logit contributions
 and-0.038
 her-0.035
 as-0.033
 how-0.033
 them-0.033
Token Max
Token ID5436
Feature activation+4.97e-03

Pos logit contributions
 sidelines+0.014
arter+0.012
icated+0.012
leaf+0.012
 Seek+0.012
Neg logit contributions
ine-0.016
 and-0.015
c-0.015
 enough-0.015
age-0.014
Token new
Token ID649
Feature activation-1.46e-02

Pos logit contributions
affles+0.008
 cheesy+0.008
beans+0.008
leaf+0.008
affle+0.008
Neg logit contributions
stick-0.003
mith-0.003
 or-0.003
bul-0.003
ine-0.003
Token things
Token ID1243
Feature activation-1.40e-02

Pos logit contributions
 and+0.025
 her+0.025
 as+0.024
 or+0.024
 enough+0.024
Neg logit contributions
opus-0.015
 hamb-0.015
arter-0.015
ye-0.015
 cheesy-0.014
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
ine+0.012
stick+0.011
c+0.009
Pear+0.008
ulously+0.008
Neg logit contributions
ll-0.026
vell-0.024
leaf-0.024
arter-0.023
cher-0.023
Token sunshine
Token ID34488
Feature activation+1.35e-02

Pos logit contributions
arter+0.023
opus+0.023
 sidelines+0.022
ve+0.021
uch+0.021
Neg logit contributions
 and-0.017
 her-0.015
.-0.014
 one-0.013
 them-0.013
Token.
Token ID13
Feature activation+5.98e-03

Pos logit contributions
atmeal+0.028
opus+0.028
umm+0.027
 sidelines+0.027
arter+0.027
Neg logit contributions
 how-0.007
 her-0.007
ine-0.006
 its-0.006
 enough-0.006
Token One
Token ID1881
Feature activation-7.46e-03

Pos logit contributions
arter+0.013
 sidelines+0.013
opus+0.013
ye+0.012
atmeal+0.012
Neg logit contributions
 and-0.004
 her-0.004
 enough-0.004
 as-0.003
 how-0.003
Token day
Token ID1110
Feature activation+5.91e-03

Pos logit contributions
 and+0.010
 her+0.009
.+0.008
,+0.008
 one+0.008
Neg logit contributions
 sidelines-0.014
opus-0.014
arter-0.014
cher-0.014
vell-0.014
Token he
Token ID339
Feature activation+1.90e-02

Pos logit contributions
 sidelines+0.018
arter+0.018
opus+0.018
umm+0.017
leaf+0.017
Neg logit contributions
 and-0.001
 her-0.000
,-0.000
.+0.000
 as+0.000
Token thought
Token ID1807
Feature activation+3.60e-02

Pos logit contributions
opus+0.030
arter+0.029
 sidelines+0.029
ye+0.028
uch+0.028
Neg logit contributions
 and-0.026
 her-0.024
 how-0.023
 as-0.023
.-0.022
Token it
Token ID340
Feature activation-1.22e-03

Pos logit contributions
opus+0.064
arter+0.063
ye+0.062
 sidelines+0.060
atmeal+0.059
Neg logit contributions
 and-0.037
 her-0.035
stick-0.032
 them-0.032
.-0.031
Token would
Token ID561
Feature activation-1.13e-02

Pos logit contributions
 and+0.001
 her+0.001
.+0.001
 as+0.001
 how+0.001
Neg logit contributions
opus-0.003
arter-0.003
 sidelines-0.003
ye-0.002
uch-0.002
Token be
Token ID307
Feature activation+1.29e-02

Pos logit contributions
 and+0.015
 her+0.014
 enough+0.013
 how+0.013
 as+0.013
Neg logit contributions
 sidelines-0.018
arter-0.018
opus-0.017
uch-0.017
ye-0.016
Token fun
Token ID1257
Feature activation-6.75e-03

Pos logit contributions
 cheesy+0.016
arter+0.015
wy+0.015
 hamb+0.015
atmeal+0.015
Neg logit contributions
 and-0.020
stick-0.019
 how-0.018
 her-0.017
 them-0.017
Token to
Token ID284
Feature activation+2.26e-02

Pos logit contributions
stick+0.005
 its+0.004
 her+0.004
ine+0.004
Her+0.004
Neg logit contributions
ll-0.013
ve-0.012
inations-0.012
ye-0.012
opus-0.012
Token.
Token ID13
Feature activation-3.58e-03

Pos logit contributions
 them+0.005
 her+0.005
stick+0.004
Her+0.004
 Everywhere+0.004
Neg logit contributions
ll-0.016
arter-0.016
opus-0.015
cher-0.015
ye-0.014
Token The
Token ID383
Feature activation+9.15e-04

Pos logit contributions
 and+0.003
 her+0.002
 them+0.002
 how+0.002
 enough+0.002
Neg logit contributions
 sidelines-0.008
arter-0.008
umm-0.008
opus-0.007
leaf-0.007
Token dragon
Token ID10441
Feature activation+1.45e-02

Pos logit contributions
icious+0.002
 sidelines+0.002
gary+0.002
arter+0.002
leaf+0.002
Neg logit contributions
 and-0.001
 her-0.001
 one-0.001
.-0.001
,-0.001
Token was
Token ID373
Feature activation+2.76e-03

Pos logit contributions
opus+0.020
ll+0.020
ye+0.018
uch+0.018
ve+0.018
Neg logit contributions
 her-0.021
 them-0.020
 and-0.019
 how-0.019
 enough-0.019
Token happy
Token ID3772
Feature activation+7.09e-03

Pos logit contributions
arter+0.004
 sidelines+0.004
 cheesy+0.004
leaf+0.003
ye+0.003
Neg logit contributions
 and-0.004
 her-0.004
 how-0.004
.-0.004
 them-0.004
Token to
Token ID284
Feature activation+1.72e-02

Pos logit contributions
ll+0.012
arter+0.012
 joking+0.011
 smelling+0.011
 forgetting+0.011
Neg logit contributions
ine-0.007
c-0.006
stick-0.006
Her-0.006
alled-0.006
Token see
Token ID766
Feature activation+1.76e-02

Pos logit contributions
arter+0.026
 sidelines+0.026
opus+0.025
ye+0.024
uch+0.024
Neg logit contributions
 and-0.026
 her-0.023
 how-0.022
 as-0.022
.-0.022
Token Sally
Token ID25737
Feature activation+3.17e-02

Pos logit contributions
arter+0.038
 sidelines+0.037
leaf+0.037
prise+0.036
opus+0.035
Neg logit contributions
 and-0.012
ine-0.010
 as-0.008
 enough-0.008
stick-0.007
Token.
Token ID13
Feature activation+4.76e-03

Pos logit contributions
 cheek+0.064
arter+0.063
ll+0.063
 sidelines+0.063
opus+0.061
Neg logit contributions
 them-0.016
ine-0.014
 enough-0.013
 her-0.013
 their-0.012
Token Sally
Token ID25737
Feature activation+2.06e-02

Pos logit contributions
 sidelines+0.010
umm+0.010
arter+0.010
opus+0.009
ye+0.009
Neg logit contributions
 and-0.004
 her-0.004
 them-0.003
 how-0.003
 enough-0.003
Token was
Token ID373
Feature activation-1.35e-03

Pos logit contributions
opus+0.029
ll+0.029
ve+0.026
ye+0.026
uch+0.025
Neg logit contributions
 her-0.029
 them-0.028
 enough-0.027
 its-0.027
 as-0.027
Token.
Token ID13
Feature activation+2.75e-03

Pos logit contributions
 and+0.002
 her+0.002
 them+0.002
 its+0.001
 how+0.001
Neg logit contributions
opus-0.011
arter-0.011
ll-0.011
 sidelines-0.011
ye-0.011
Token I
Token ID314
Feature activation+5.95e-03

Pos logit contributions
 sidelines+0.003
opus+0.003
arter+0.003
ve+0.003
uch+0.003
Neg logit contributions
 and-0.006
 her-0.005
.-0.005
,-0.005
 how-0.005
Token won
Token ID1839
Feature activation+1.49e-02

Pos logit contributions
opus+0.009
arter+0.009
 sidelines+0.009
uch+0.008
ye+0.008
Neg logit contributions
 and-0.009
 her-0.009
 as-0.008
.-0.008
 how-0.008
Token't
Token ID470
Feature activation+1.44e-02

Pos logit contributions
opus+0.020
 sidelines+0.020
arter+0.019
uch+0.018
ve+0.018
Neg logit contributions
 and-0.025
 her-0.023
 as-0.022
.-0.022
 how-0.022
Token do
Token ID466
Feature activation-4.60e-02

Pos logit contributions
opus+0.023
arter+0.023
 sidelines+0.022
umm+0.021
ye+0.021
Neg logit contributions
 and-0.024
 her-0.021
.-0.020
,-0.020
 as-0.020
Token it
Token ID340
Feature activation-5.06e-03

Pos logit contributions
 and+0.045
 her+0.033
stick+0.033
ine+0.032
c+0.032
Neg logit contributions
 sidelines-0.088
opus-0.087
arter-0.085
 talented-0.084
uch-0.084
Token again
Token ID757
Feature activation+4.13e-03

Pos logit contributions
 and+0.004
 her+0.004
 its+0.003
 how+0.003
 them+0.003
Neg logit contributions
uch-0.010
opus-0.010
ll-0.010
arter-0.010
ve-0.010
Token.
Token ID13
Feature activation+2.60e-03

Pos logit contributions
arter+0.009
opus+0.009
 sidelines+0.009
uch+0.008
ye+0.008
Neg logit contributions
 and-0.004
 her-0.003
 how-0.003
 them-0.003
 as-0.003
Token I
Token ID314
Feature activation+4.60e-03

Pos logit contributions
 sidelines+0.003
opus+0.003
arter+0.003
ve+0.003
uch+0.003
Neg logit contributions
 and-0.005
 her-0.005
.-0.004
 how-0.004
,-0.004
Token just
Token ID655
Feature activation+2.37e-02

Pos logit contributions
opus+0.006
arter+0.006
uch+0.006
ll+0.006
 sidelines+0.006
Neg logit contributions
 and-0.007
 her-0.007
 how-0.007
 them-0.007
 as-0.007
Token wanted
Token ID2227
Feature activation+3.43e-03

Pos logit contributions
opus+0.039
 sidelines+0.038
arter+0.038
umm+0.036
vell+0.035
Neg logit contributions
 and-0.038
 her-0.034
.-0.032
 how-0.031
 as-0.031
Token a
Token ID257
Feature activation+1.72e-02

Pos logit contributions
arter+0.069
 sidelines+0.069
opus+0.068
prise+0.066
leaf+0.066
Neg logit contributions
 and-0.013
 her-0.008
.-0.007
 how-0.006
 them-0.006
Token regular
Token ID3218
Feature activation-9.87e-04

Pos logit contributions
leaf+0.038
cher+0.037
umm+0.036
 sidelines+0.035
vell+0.035
Neg logit contributions
 and-0.021
 her-0.020
,-0.018
.-0.017
 their-0.016
Token house
Token ID2156
Feature activation-1.24e-02

Pos logit contributions
 and+0.001
 her+0.001
 how+0.001
 them+0.001
 as+0.001
Neg logit contributions
opus-0.002
arter-0.001
 sidelines-0.001
ye-0.001
prise-0.001
Token with
Token ID351
Feature activation+1.98e-07

Pos logit contributions
 and+0.008
 her+0.008
 them+0.008
 enough+0.007
 how+0.006
Neg logit contributions
arter-0.026
 sidelines-0.026
opus-0.026
uch-0.025
prise-0.025
Token her
Token ID607
Feature activation-2.24e-02

Feature not active on prev token;
no predictions were affected.
Token mom
Token ID1995
Feature activation-3.26e-02

Pos logit contributions
 and+0.028
 her+0.025
.+0.022
 as+0.022
 how+0.022
Neg logit contributions
opus-0.042
 sidelines-0.042
arter-0.042
ye-0.039
uch-0.039
Token and
Token ID290
Feature activation-1.82e-02

Pos logit contributions
 its+0.024
stick+0.024
 itself+0.024
 as+0.022
 how+0.021
Neg logit contributions
ll-0.062
atmeal-0.057
arter-0.057
wy-0.056
 hamb-0.056
Token dad
Token ID9955
Feature activation-2.33e-02

Pos logit contributions
 and+0.021
 her+0.020
 how+0.016
.+0.016
,+0.016
Neg logit contributions
opus-0.037
 sidelines-0.037
arter-0.037
umm-0.035
ye-0.035
Token.
Token ID13
Feature activation-9.16e-04

Pos logit contributions
 and+0.009
 her+0.009
 them+0.008
stick+0.008
 its+0.008
Neg logit contributions
ll-0.052
arter-0.051
prise-0.051
 hamb-0.049
 cheesy-0.049
Token One
Token ID1881
Feature activation-1.09e-02

Pos logit contributions
 and+0.001
 her+0.001
 as+0.001
 how+0.001
 enough+0.001
Neg logit contributions
arter-0.002
 sidelines-0.002
opus-0.002
umm-0.002
 cheesy-0.002
Token day
Token ID1110
Feature activation+9.37e-05

Pos logit contributions
 and+0.013
 her+0.011
.+0.010
,+0.010
 one+0.010
Neg logit contributions
 sidelines-0.026
cher-0.025
vell-0.024
opus-0.024
agna-0.024
Token to
Token ID284
Feature activation-1.53e-02

Pos logit contributions
 and+0.058
ine+0.055
c+0.048
stick+0.045
 her+0.044
Neg logit contributions
arter-0.114
 sidelines-0.113
opus-0.107
 hamb-0.104
 cheesy-0.103
Token listen
Token ID6004
Feature activation+1.02e-03

Pos logit contributions
 and+0.027
 enough+0.024
 how+0.023
 her+0.023
 as+0.023
Neg logit contributions
 sidelines-0.018
arter-0.018
prise-0.016
uch-0.015
ye-0.015
Token to
Token ID284
Feature activation-1.54e-02

Pos logit contributions
ll+0.002
 cheesy+0.002
arter+0.002
 sidelines+0.002
ye+0.002
Neg logit contributions
stick-0.001
 her-0.001
c-0.001
ine-0.001
 and-0.001
Token his
Token ID465
Feature activation-1.15e-02

Pos logit contributions
 and+0.013
 enough+0.010
 as+0.009
 her+0.009
,+0.009
Neg logit contributions
 sidelines-0.032
arter-0.032
opus-0.030
ye-0.030
leaf-0.029
Token parents
Token ID3397
Feature activation-2.58e-02

Pos logit contributions
 her+0.016
 and+0.016
 them+0.015
 how+0.015
 its+0.014
Neg logit contributions
arter-0.016
opus-0.016
 cheek-0.015
 sidelines-0.015
ye-0.015
Token and
Token ID290
Feature activation-3.35e-02

Pos logit contributions
 her+0.010
 its+0.009
 and+0.007
 them+0.007
 how+0.006
Neg logit contributions
ll-0.060
 sidelines-0.059
opus-0.059
ve-0.058
cher-0.058
Token would
Token ID561
Feature activation-2.96e-02

Pos logit contributions
 and+0.051
 her+0.046
 as+0.044
 them+0.044
 enough+0.043
Neg logit contributions
 sidelines-0.046
arter-0.045
opus-0.044
ye-0.043
prise-0.043
Token often
Token ID1690
Feature activation-3.13e-02

Pos logit contributions
 and+0.045
 her+0.043
 enough+0.041
 how+0.040
 as+0.040
Neg logit contributions
 sidelines-0.040
arter-0.037
ll-0.036
opus-0.035
ye-0.035
Token run
Token ID1057
Feature activation-1.86e-02

Pos logit contributions
 and+0.050
 her+0.047
 how+0.044
 them+0.044
 enough+0.043
Neg logit contributions
 sidelines-0.042
arter-0.040
opus-0.039
ye-0.038
prise-0.038
Token away
Token ID1497
Feature activation-3.36e-03

Pos logit contributions
ine+0.017
 how+0.017
 Every+0.017
 Her+0.016
 her+0.016
Neg logit contributions
 sidelines-0.031
ve-0.030
cher-0.029
leaf-0.028
fy-0.027
Token from
Token ID422
Feature activation-7.10e-03

Pos logit contributions
ine+0.002
stick+0.002
c+0.002
 Every+0.002
 her+0.002
Neg logit contributions
arter-0.007
 sidelines-0.007
ve-0.007
ll-0.006
 cheek-0.006
Token of
Token ID286
Feature activation-9.09e-03

Pos logit contributions
arter+0.030
 sidelines+0.029
leaf+0.027
opus+0.027
 cheesy+0.027
Neg logit contributions
 her-0.022
 and-0.022
 how-0.022
ine-0.022
stick-0.021
Token light
Token ID1657
Feature activation-1.70e-02

Pos logit contributions
 and+0.013
 as+0.011
 her+0.011
 how+0.011
.+0.011
Neg logit contributions
 sidelines-0.014
arter-0.013
leaf-0.013
 cheesy-0.012
opus-0.012
Token in
Token ID287
Feature activation-7.42e-03

Pos logit contributions
 One+0.009
 Whenever+0.009
+0.008
 Every+0.008
 Everywhere+0.008
Neg logit contributions
arter-0.034
opus-0.034
umm-0.032
 cheek-0.032
 cheesy-0.032
Token the
Token ID262
Feature activation-8.12e-03

Pos logit contributions
ine+0.005
stick+0.005
age+0.005
 as+0.004
 how+0.004
Neg logit contributions
arter-0.013
 freshly-0.013
prise-0.013
 sidelines-0.012
 cheesy-0.012
Token hall
Token ID6899
Feature activation-4.05e-02

Pos logit contributions
 and+0.012
 her+0.011
.+0.010
,+0.010
 one+0.010
Neg logit contributions
opus-0.014
arter-0.014
 sidelines-0.013
ye-0.013
ve-0.012
Token.
Token ID13
Feature activation+8.42e-03

Pos logit contributions
 her+0.018
 them+0.014
 how+0.013
 its+0.013
 and+0.011
Neg logit contributions
 sidelines-0.094
ider-0.090
opus-0.090
arter-0.090
 timer-0.088
Token She
Token ID1375
Feature activation-4.09e-03

Pos logit contributions
arter+0.014
 sidelines+0.014
 Merry+0.014
umm+0.013
 Enjoy+0.013
Neg logit contributions
 enough-0.008
 and-0.008
ine-0.008
 as-0.008
 them-0.008
Token followed
Token ID3940
Feature activation+1.97e-02

Pos logit contributions
 and+0.006
 her+0.005
 as+0.005
.+0.005
 them+0.005
Neg logit contributions
opus-0.006
 sidelines-0.006
arter-0.006
ye-0.006
uch-0.006
Token it
Token ID340
Feature activation+1.46e-02

Pos logit contributions
 sidelines+0.041
leaf+0.038
 Merry+0.038
ve+0.037
 cheek+0.037
Neg logit contributions
c-0.011
 and-0.010
ine-0.010
 how-0.010
 enough-0.010
Token and
Token ID290
Feature activation+7.30e-03

Pos logit contributions
ll+0.026
 cheek+0.026
 microscopic+0.025
 sidelines+0.025
ve+0.025
Neg logit contributions
ine-0.008
 them-0.008
aw-0.008
c-0.007
 Every-0.006
Token found
Token ID1043
Feature activation+2.05e-02

Pos logit contributions
 sidelines+0.012
opus+0.012
arter+0.012
ye+0.011
prise+0.011
Neg logit contributions
 and-0.010
 her-0.009
.-0.009
 them-0.008
 as-0.008
Token?"
Token ID1701
Feature activation-2.41e-02

Pos logit contributions
opus+0.005
arter+0.005
uch+0.005
 sidelines+0.005
ye+0.005
Neg logit contributions
 and-0.004
 her-0.004
ine-0.003
 them-0.003
 enough-0.003
Token Lily
Token ID20037
Feature activation-3.48e-02

Pos logit contributions
stick+0.033
 and+0.032
 how+0.031
ine+0.031
c+0.030
Neg logit contributions
 sidelines-0.033
 cheesy-0.029
 Merry-0.029
 hamb-0.028
leaf-0.028
Token asked
Token ID1965
Feature activation-1.83e-02

Pos logit contributions
 them+0.045
 her+0.044
 how+0.044
 their+0.044
 its+0.044
Neg logit contributions
opus-0.049
ll-0.046
ve-0.044
 hamb-0.042
cher-0.042
Token.
Token ID13
Feature activation-7.41e-03

Pos logit contributions
 and+0.008
 how+0.007
 her+0.006
ine+0.006
stick+0.006
Neg logit contributions
ve-0.039
cher-0.039
 sidelines-0.039
 cheek-0.038
opus-0.038
Token
Token ID198
Feature activation-6.90e-04

Pos logit contributions
 enough+0.009
ine+0.008
 how+0.008
c+0.007
stick+0.007
Neg logit contributions
 sidelines-0.011
 Welcome-0.011
opus-0.011
ancer-0.011
ve-0.011
Token
Token ID198
Feature activation+7.05e-04

Pos logit contributions
 and+0.000
 enough+0.000
 her+0.000
 as+0.000
 how+0.000
Neg logit contributions
opus-0.002
 sidelines-0.001
ye-0.001
ll-0.001
ve-0.001
Token"
Token ID1
Feature activation-1.24e-02

Pos logit contributions
opus+0.001
 sidelines+0.001
ye+0.001
uch+0.001
prise+0.001
Neg logit contributions
 and-0.001
 her-0.001
 how-0.001
 as-0.001
 enough-0.001
TokenI
Token ID40
Feature activation+8.25e-03

Pos logit contributions
 and+0.024
 enough+0.023
,+0.022
 her+0.022
 as+0.022
Neg logit contributions
opus-0.015
leaf-0.013
uch-0.013
Sorry-0.012
ider-0.012
Token don
Token ID836
Feature activation+1.13e-02

Pos logit contributions
opus+0.009
arter+0.009
uch+0.009
ve+0.009
ll+0.009
Neg logit contributions
 and-0.015
 her-0.015
 how-0.014
 them-0.014
 as-0.013
Token't
Token ID470
Feature activation+1.12e-02

Pos logit contributions
opus+0.015
 sidelines+0.014
arter+0.014
uch+0.013
ve+0.013
Neg logit contributions
 and-0.020
 her-0.019
 how-0.017
 as-0.017
 one-0.017
Token know
Token ID760
Feature activation-1.72e-02

Pos logit contributions
arter+0.016
opus+0.015
ye+0.015
leaf+0.014
 sidelines+0.014
Neg logit contributions
 and-0.017
 her-0.017
 how-0.016
 them-0.015
 as-0.015
Token revealing
Token ID13477
Feature activation-2.97e-03

Pos logit contributions
arter+0.008
 sidelines+0.008
opus+0.007
ye+0.007
 cheesy+0.007
Neg logit contributions
 and-0.006
 her-0.005
.-0.005
 them-0.004
 as-0.004
Token his
Token ID465
Feature activation+1.01e-02

Pos logit contributions
 and+0.002
Her+0.002
stick+0.002
ine+0.002
 She+0.002
Neg logit contributions
arter-0.005
 cheesy-0.005
 hamb-0.005
ye-0.005
 sidelines-0.005
Token plan
Token ID1410
Feature activation-1.84e-02

Pos logit contributions
 sidelines+0.013
arter+0.013
ye+0.013
opus+0.012
prise+0.012
Neg logit contributions
 and-0.016
 her-0.016
 how-0.014
 them-0.014
 as-0.014
Token and
Token ID290
Feature activation+6.12e-03

Pos logit contributions
 her+0.008
c+0.007
stick+0.007
 Every+0.007
 and+0.007
Neg logit contributions
arter-0.037
cher-0.036
 sidelines-0.036
ye-0.036
ve-0.035
Token motive
Token ID20289
Feature activation+9.04e-03

Pos logit contributions
opus+0.010
arter+0.010
 sidelines+0.010
uch+0.009
atmeal+0.009
Neg logit contributions
 and-0.009
 her-0.009
.-0.008
 how-0.008
 as-0.008
Token at
Token ID379
Feature activation-4.83e-02

Pos logit contributions
opus+0.017
ve+0.017
arter+0.016
 sidelines+0.016
ye+0.016
Neg logit contributions
 and-0.009
 her-0.008
stick-0.008
 how-0.007
 She-0.007
Token the
Token ID262
Feature activation-2.48e-02

Pos logit contributions
 and+0.039
 her+0.029
.+0.026
,+0.025
 as+0.025
Neg logit contributions
arter-0.105
 sidelines-0.105
opus-0.097
ye-0.097
prise-0.096
Token end
Token ID886
Feature activation-1.85e-02

Pos logit contributions
 and+0.042
 her+0.039
 how+0.037
 as+0.036
 them+0.036
Neg logit contributions
 sidelines-0.034
arter-0.033
opus-0.032
ye-0.031
umm-0.030
Token of
Token ID286
Feature activation-2.87e-02

Pos logit contributions
stick+0.011
 her+0.010
 and+0.009
 enough+0.009
 Every+0.009
Neg logit contributions
arter-0.038
opus-0.035
ye-0.035
ider-0.034
 pounds-0.034
Token the
Token ID262
Feature activation-1.68e-02

Pos logit contributions
 and+0.021
 as+0.014
 enough+0.014
,+0.014
 or+0.012
Neg logit contributions
arter-0.062
 sidelines-0.061
 robbers-0.054
opus-0.054
 cheesy-0.054
Token last
Token ID938
Feature activation+7.28e-03

Pos logit contributions
 and+0.028
 her+0.026
 how+0.024
 as+0.024
 them+0.024
Neg logit contributions
 sidelines-0.024
arter-0.023
opus-0.023
ye-0.022
uch-0.021
Token player
Token ID2137
Feature activation-1.71e-04

Pos logit contributions
 enough+0.009
 her+0.009
ine+0.008
 them+0.008
 its+0.008
Neg logit contributions
opus-0.024
washer-0.024
cher-0.024
 cheek-0.023
 waiter-0.023
Token and
Token ID290
Feature activation-3.89e-03

Pos logit contributions
 her+0.000
 them+0.000
 Every+0.000
Her+0.000
 Everywhere+0.000
Neg logit contributions
cher-0.000
ider-0.000
 cheek-0.000
opus-0.000
beans-0.000
Token puts
Token ID7584
Feature activation-2.86e-03

Pos logit contributions
 and+0.004
 her+0.003
.+0.003
 them+0.003
 as+0.003
Neg logit contributions
 sidelines-0.008
arter-0.007
opus-0.007
leaf-0.007
ye-0.007
Token a
Token ID257
Feature activation-1.72e-03

Pos logit contributions
 and+0.002
ine+0.002
 or+0.002
,+0.002
 how+0.002
Neg logit contributions
 sidelines-0.006
leaf-0.006
cher-0.006
wy-0.005
ye-0.005
Token new
Token ID649
Feature activation-2.66e-02

Pos logit contributions
 and+0.002
 her+0.002
.+0.002
 as+0.002
 how+0.002
Neg logit contributions
opus-0.003
 sidelines-0.003
arter-0.003
ye-0.003
prise-0.003
Token CD
Token ID6458
Feature activation-5.72e-02

Pos logit contributions
 her+0.035
 and+0.034
 enough+0.034
 them+0.034
 how+0.033
Neg logit contributions
opus-0.038
 sidelines-0.037
leaf-0.037
 cheek-0.037
prise-0.033
Token in
Token ID287
Feature activation-4.78e-03

Pos logit contributions
ine+0.052
 her+0.048
 them+0.046
 Whenever+0.045
 One+0.043
Neg logit contributions
cher-0.110
ider-0.109
opus-0.101
leaf-0.101
 cheek-0.100
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
 and+0.003
 how+0.003
 as+0.003
 enough+0.003
 or+0.002
Neg logit contributions
 sidelines-0.009
arter-0.009
prise-0.008
 cheesy-0.008
 freshly-0.008
Token She
Token ID1375
Feature activation+1.88e-04

Pos logit contributions
arter+0.020
 sidelines+0.020
 Enjoy+0.019
 Merry+0.019
 Thank+0.019
Neg logit contributions
ine-0.011
 and-0.010
stick-0.010
 enough-0.010
 as-0.010
Token plays
Token ID5341
Feature activation-2.85e-02

Pos logit contributions
opus+0.000
leaf+0.000
ye+0.000
arter+0.000
 sidelines+0.000
Neg logit contributions
 and-0.000
 her-0.000
 as-0.000
 its-0.000
 them-0.000
Token the
Token ID262
Feature activation+2.06e-02

Pos logit contributions
 enough+0.021
ine+0.021
 and+0.018
c+0.018
 her+0.017
Neg logit contributions
 sidelines-0.057
opus-0.052
cher-0.051
leaf-0.051
 talented-0.051
Token passed
Token ID3804
Feature activation+3.59e-04

Pos logit contributions
 her+0.001
 and+0.001
 them+0.001
 how+0.001
 as+0.001
Neg logit contributions
opus-0.002
 sidelines-0.002
uch-0.002
ll-0.002
ye-0.002
Token away
Token ID1497
Feature activation+1.66e-02

Pos logit contributions
 sidelines+0.001
ve+0.001
ye+0.001
wy+0.001
leaf+0.001
Neg logit contributions
ine-0.000
 and-0.000
 her-0.000
 how-0.000
Her-0.000
Token,
Token ID11
Feature activation+8.51e-03

Pos logit contributions
arter+0.033
ve+0.033
 sidelines+0.033
ll+0.033
ider+0.031
Neg logit contributions
ine-0.011
 them-0.009
c-0.009
 her-0.009
 Every-0.008
Token and
Token ID290
Feature activation+5.68e-03

Pos logit contributions
 sidelines+0.016
arter+0.015
 cheek+0.015
ll+0.015
opus+0.015
Neg logit contributions
 her-0.008
 and-0.007
 enough-0.007
 them-0.007
 how-0.007
Token his
Token ID465
Feature activation-3.52e-03

Pos logit contributions
 sidelines+0.013
arter+0.013
opus+0.013
ye+0.013
uch+0.013
Neg logit contributions
 and-0.004
 her-0.003
.-0.003
 them-0.002
 as-0.002
Token beautiful
Token ID4950
Feature activation-4.18e-02

Pos logit contributions
 and+0.005
 her+0.005
.+0.004
 one+0.004
,+0.004
Neg logit contributions
opus-0.006
arter-0.006
 sidelines-0.006
ye-0.005
prise-0.005
Token paintings
Token ID21641
Feature activation-2.82e-02

Pos logit contributions
 and+0.074
 her+0.071
 them+0.066
 how+0.066
 as+0.066
Neg logit contributions
arter-0.050
 sidelines-0.050
opus-0.049
umm-0.045
atmeal-0.044
Token were
Token ID547
Feature activation-1.10e-02

Pos logit contributions
 her+0.037
ine+0.036
 them+0.035
 Every+0.035
 One+0.035
Neg logit contributions
uch-0.038
ve-0.037
ll-0.037
atmeal-0.036
cher-0.036
Token lost
Token ID2626
Feature activation+3.04e-03

Pos logit contributions
 and+0.016
ine+0.016
 She+0.015
 her+0.015
 So+0.015
Neg logit contributions
arter-0.013
 sidelines-0.012
ll-0.012
vell-0.012
ve-0.012
Token forever
Token ID8097
Feature activation+9.21e-03

Pos logit contributions
arter+0.006
vell+0.006
ve+0.006
cher+0.006
 sidelines+0.006
Neg logit contributions
 her-0.002
ine-0.002
 Her-0.002
Her-0.002
 Every-0.002
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
arter+0.021
 sidelines+0.019
ve+0.019
 cheesy+0.019
ll+0.019
Neg logit contributions
 her-0.005
 them-0.004
 Every-0.004
 She-0.004
 how-0.004
Token shiny
Token ID22441
Feature activation+4.29e-03

Pos logit contributions
arter+0.029
 cheesy+0.028
 hamb+0.027
 sidelines+0.026
 freshly+0.026
Neg logit contributions
 and-0.026
 So-0.024
.-0.023
stick-0.023
 whenever-0.023
Token and
Token ID290
Feature activation+3.18e-02

Pos logit contributions
opus+0.011
arter+0.011
umm+0.011
 cheesy+0.011
 sidelines+0.011
Neg logit contributions
 her-0.001
 them-0.000
 and-0.000
 how-0.000
 their+0.000
Token had
Token ID550
Feature activation+7.21e-03

Pos logit contributions
opus+0.053
arter+0.052
 sidelines+0.052
atmeal+0.049
umm+0.049
Neg logit contributions
 and-0.050
 her-0.046
.-0.042
 one-0.041
 how-0.041
Token a
Token ID257
Feature activation-4.26e-03

Pos logit contributions
leaf+0.013
ye+0.013
 sidelines+0.013
arter+0.013
opus+0.012
Neg logit contributions
 and-0.007
 her-0.006
,-0.006
 as-0.005
 how-0.005
Token long
Token ID890
Feature activation-3.89e-03

Pos logit contributions
 and+0.007
 her+0.006
.+0.006
 as+0.006
 them+0.006
Neg logit contributions
arter-0.006
opus-0.006
 sidelines-0.006
ye-0.006
uch-0.006
Token cord
Token ID15050
Feature activation-4.66e-02

Pos logit contributions
 her+0.006
 and+0.005
 how+0.005
 them+0.005
 enough+0.005
Neg logit contributions
 sidelines-0.005
opus-0.005
arter-0.005
ve-0.005
uch-0.005
Token.
Token ID13
Feature activation+3.04e-03

Pos logit contributions
 her+0.036
 and+0.032
 them+0.032
 how+0.029
 their+0.027
Neg logit contributions
opus-0.097
arter-0.096
ve-0.093
 sidelines-0.092
ider-0.092
Token
Token ID198
Feature activation+4.11e-03

Pos logit contributions
arter+0.004
 sidelines+0.004
 cheesy+0.004
opus+0.004
 hamb+0.004
Neg logit contributions
 and-0.004
 her-0.004
 enough-0.004
stick-0.004
 as-0.004
Token
Token ID198
Feature activation+3.71e-03

Pos logit contributions
ll+0.009
opus+0.009
ve+0.009
ye+0.008
prise+0.008
Neg logit contributions
 and-0.003
 as-0.003
 enough-0.003
 her-0.003
 how-0.002
Token"
Token ID1
Feature activation-4.01e-03

Pos logit contributions
opus+0.006
 sidelines+0.006
arter+0.005
ye+0.005
prise+0.005
Neg logit contributions
 and-0.005
 her-0.005
 as-0.005
 how-0.004
 enough-0.004
TokenWow
Token ID22017
Feature activation+3.49e-03

Pos logit contributions
 and+0.007
 her+0.006
.+0.006
 as+0.006
 how+0.006
Neg logit contributions
opus-0.005
 sidelines-0.005
arter-0.005
ye-0.005
uch-0.005
Token from
Token ID422
Feature activation-3.07e-03

Pos logit contributions
 them+0.002
ine+0.002
 Whenever+0.002
 Every+0.002
c+0.002
Neg logit contributions
arter-0.007
ll-0.007
vell-0.007
ve-0.007
 cheek-0.007
Token her
Token ID607
Feature activation-3.37e-02

Pos logit contributions
 and+0.002
 how+0.001
,+0.001
 her+0.001
 as+0.001
Neg logit contributions
 sidelines-0.007
arter-0.007
prise-0.007
opus-0.007
 cheesy-0.007
Token.
Token ID13
Feature activation-3.73e-03

Pos logit contributions
 them+0.039
 her+0.039
 and+0.039
 how+0.037
 its+0.035
Neg logit contributions
opus-0.055
 sidelines-0.055
arter-0.055
ll-0.051
 cheek-0.051
Token
Token ID198
Feature activation-1.85e-03

Pos logit contributions
 and+0.004
 enough+0.004
 her+0.003
 as+0.003
 or+0.003
Neg logit contributions
 sidelines-0.006
opus-0.006
ll-0.006
 Merry-0.006
arter-0.006
Token
Token ID198
Feature activation-1.38e-03

Pos logit contributions
 and+0.001
 enough+0.001
 as+0.001
 her+0.001
 how+0.001
Neg logit contributions
opus-0.004
ll-0.004
ve-0.004
 sidelines-0.004
ye-0.004
TokenAt
Token ID2953
Feature activation-3.91e-02

Pos logit contributions
 and+0.002
 her+0.002
 as+0.002
 enough+0.002
 how+0.002
Neg logit contributions
 sidelines-0.002
opus-0.002
arter-0.002
ve-0.002
prise-0.002
Token the
Token ID262
Feature activation+8.81e-03

Pos logit contributions
 and+0.033
 or+0.025
.+0.025
ine+0.024
 her+0.024
Neg logit contributions
 sidelines-0.085
arter-0.079
opus-0.077
prise-0.076
ll-0.075
Token park
Token ID3952
Feature activation+1.30e-02

Pos logit contributions
 sidelines+0.015
opus+0.015
arter+0.015
uch+0.014
ye+0.014
Neg logit contributions
 and-0.012
 her-0.011
 them-0.010
.-0.010
 how-0.010
Token,
Token ID11
Feature activation+1.79e-02

Pos logit contributions
 sidelines+0.025
ll+0.025
opus+0.024
rav+0.024
arter+0.024
Neg logit contributions
stick-0.007
ine-0.007
 how-0.007
 them-0.007
 and-0.006
Token Lily
Token ID20037
Feature activation-1.88e-03

Pos logit contributions
 sidelines+0.029
opus+0.028
arter+0.028
ancer+0.027
prise+0.027
Neg logit contributions
 and-0.022
.-0.019
 her-0.018
 enough-0.018
 how-0.018
Token met
Token ID1138
Feature activation+9.85e-03

Pos logit contributions
 her+0.003
 them+0.003
 how+0.002
 as+0.002
 its+0.002
Neg logit contributions
opus-0.003
ll-0.003
 sidelines-0.002
uch-0.002
atmeal-0.002