TOP ACTIVATIONS
MAX = 0.141

the tied salad in its be ak and happily flew away
the needle out of its be ak . After a few
Tim my up in his be ak . Tim my was
had got stuck in its be ak ! The
the flower away in its be ak ! Lily was upset
pe b ble in their be aks . And they both
bird a kiss on the be ak . They lived happily
was better to use its be ak to pick up food
a lake and filled his be ak with water . Then
It must have weighed at least two hundred pounds ! W
of the honey from the be eh ive .
happily grabbed it in its be ak and flew away .
holding the carrot in its be ak . Benny and Lily
held the film in their be aks and flew together .
with the ornament in its be ak . But then ,
right for her with his be ak open and his wings
with the worm in its be ak . She couldn 't
be ak to break the be eh ive and let the
pe b ble in his be ak , he saw his
cr ay on in its be ak , and together they

MIN ACTIVATIONS
MIN = -0.182

something new ! She thought av oc ados were the most
something new ! She thought av oc ados were the most
, sauce , cheese and topp ings . She showed them
. They were called or nam ents and they were very
of cheese and sauce and topp ings . The people who
, scar ves , and swe aters . He is very
tree with lights and or nam ents . They also saw
girl who didn 't like av oc ados . She always
. Mom says they are av oc ados . Sara wants
time there was a little av ac ado . It was
and lots of shiny or nam ents . L ila wanted
many colors . Some or nam ents were round , some
, there was a loud av ac ado . It was
swings and slides and sand boxes . Lily and Ben wanted
away . She learned that av oc ados are good for
the shiny things were or nam ents and they were very
ongo didn 't like wet av oc ados , so he
and shared everything , including av oc ados . They were
- it was bigger and ju ic ier than any he
on , whenever they wanted av oc ados , they would

INTERVAL 0.060 - 0.141
CONTAINS 1.343%

Jack set off on his journey alone , but he was
was the right thing to do . So he shared his
questions . She wanted to know what others were doing ,
parents took her to the hospital , but it was too
was safe and went to find some cheese to eat .

INTERVAL -0.021 - 0.060
CONTAINS 80.648%

" Look , Tom , I found the light ! We
to be something that was charming . The next
with her daddy . It was the best pool ever !
is reading a book . " Dad , can
some delicious snacks ! All because it took a short ,

INTERVAL -0.102 - -0.021
CONTAINS 17.926%

happy . <|endoftext|> S ara and Ben were playing in the
day , he met a new friend named Billy . Billy
clever cat named Kitty . Kitty liked to play with her
them away . Lily felt sorry for the girl . She
do !" His mom my smiled and said , " Maybe

INTERVAL -0.182 - -0.102
CONTAINS 0.082%

, a kind man named Mr . Johnson heard Lily 's
” said Sally , looking up at Peter . “
be very quiet and sneak up on them ." Max listened
, sheep , pigs , chickens , ducks , and more
of places . John came up with one spot , but
Token the
Token ID262
Feature activation+6.95e-03

Pos logit contributions
Aud+0.168
 Heidi+0.155
 Scarlett+0.153
ifying+0.153
 blonde+0.152
Neg logit contributions
 oct-0.098
ands-0.085
orse-0.083
rim-0.080
ulu-0.080
Token tied
Token ID8165
Feature activation+1.73e-02

Pos logit contributions
ands+0.011
 oct+0.011
orse+0.010
rr+0.010
rim+0.010
Neg logit contributions
Aud-0.012
 Sharon-0.011
 Monica-0.011
 Heidi-0.011
 Kathy-0.011
Token salad
Token ID20698
Feature activation-2.69e-02

Pos logit contributions
ands+0.026
 oct+0.025
rim+0.024
ulu+0.023
orse+0.023
Neg logit contributions
Aud-0.031
 Heidi-0.029
 Sharon-0.029
 blonde-0.029
 Stephen-0.028
Token in
Token ID287
Feature activation+2.14e-03

Pos logit contributions
Aud+0.052
 blonde+0.050
 Sharon+0.050
 Heidi+0.050
 Monica+0.049
Neg logit contributions
ands-0.036
 oct-0.034
orse-0.031
rim-0.031
ulu-0.031
Token its
Token ID663
Feature activation+2.73e-02

Pos logit contributions
ands+0.004
rim+0.004
orse+0.004
 oct+0.004
rr+0.003
Neg logit contributions
Aud-0.003
 Heidi-0.003
 Sharon-0.003
 Alice-0.003
rified-0.003
Token be
Token ID307
Feature activation+1.39e-01

Pos logit contributions
ands+0.053
rim+0.050
orse+0.049
 oct+0.048
rr+0.047
Neg logit contributions
Aud-0.037
 Heidi-0.037
rified-0.035
 Sharon-0.035
 Alice-0.035
Tokenak
Token ID461
Feature activation+8.42e-02

Pos logit contributions
ands+0.160
rim+0.156
 oct+0.152
ulu+0.133
orse+0.130
Neg logit contributions
Aud-0.290
 blonde-0.285
 Heidi-0.278
 redd-0.275
 Alice-0.271
Token and
Token ID290
Feature activation+6.75e-04

Pos logit contributions
ands+0.101
rim+0.093
 oct+0.092
orse+0.089
rr+0.085
Neg logit contributions
Aud-0.174
 Heidi-0.173
 Sharon-0.169
 Stephen-0.166
 Monica-0.165
Token happily
Token ID18177
Feature activation-3.56e-03

Pos logit contributions
ands+0.001
orse+0.001
 oct+0.001
rim+0.001
 seek+0.001
Neg logit contributions
Aud-0.001
rified-0.001
 Heidi-0.001
ifying-0.001
 Sharon-0.001
Token flew
Token ID13112
Feature activation+6.41e-03

Pos logit contributions
Aud+0.004
 Heidi+0.004
 Alice+0.004
 Sharon+0.004
 redd+0.004
Neg logit contributions
ands-0.007
rim-0.007
 oct-0.006
orse-0.006
rr-0.006
Token away
Token ID1497
Feature activation-2.35e-03

Pos logit contributions
ands+0.010
 oct+0.010
ulu+0.009
rim+0.009
 seek+0.009
Neg logit contributions
 Heidi-0.010
 blonde-0.010
Aud-0.010
cases-0.010
 Sharon-0.010
Token the
Token ID262
Feature activation+1.31e-02

Pos logit contributions
ands+0.022
rim+0.021
 oct+0.020
orse+0.020
ulu+0.019
Neg logit contributions
Aud-0.021
 Heidi-0.021
 Sharon-0.020
 Alice-0.020
 blonde-0.020
Token needle
Token ID17598
Feature activation+1.22e-02

Pos logit contributions
rim+0.025
ands+0.025
orse+0.024
rr+0.023
 oct+0.023
Neg logit contributions
 redd-0.017
Aud-0.017
 blonde-0.017
 Heidi-0.016
 Scarlett-0.016
Token out
Token ID503
Feature activation-5.10e-02

Pos logit contributions
ands+0.018
rim+0.017
orse+0.017
 oct+0.017
rr+0.016
Neg logit contributions
Aud-0.022
 Heidi-0.022
 Sharon-0.021
 Stephen-0.021
 blonde-0.021
Token of
Token ID286
Feature activation-7.77e-03

Pos logit contributions
ifying+0.087
rified+0.084
 Alice+0.081
 Heidi+0.081
 blonde+0.081
Neg logit contributions
ands-0.079
rim-0.079
orse-0.071
 oct-0.069
rr-0.067
Token its
Token ID663
Feature activation+1.06e-02

Pos logit contributions
 Heidi+0.011
Aud+0.011
 Sharon+0.011
 Alice+0.011
 blonde+0.011
Neg logit contributions
ands-0.014
rim-0.013
orse-0.013
 oct-0.013
rr-0.012
Token be
Token ID307
Feature activation+1.38e-01

Pos logit contributions
ands+0.019
rim+0.019
 oct+0.017
orse+0.017
ulu+0.017
Neg logit contributions
 blonde-0.015
Aud-0.015
ifying-0.015
 Heidi-0.015
 redd-0.015
Tokenak
Token ID461
Feature activation+7.94e-02

Pos logit contributions
 oct+0.183
rim+0.174
ands+0.169
ulu+0.155
loo+0.152
Neg logit contributions
Aud-0.251
 blonde-0.249
 sho-0.233
Scar-0.232
 redd-0.229
Token.
Token ID13
Feature activation+6.17e-03

Pos logit contributions
ands+0.098
rim+0.088
 oct+0.087
orse+0.085
rr+0.081
Neg logit contributions
Aud-0.164
 Heidi-0.163
 Sharon-0.159
 Stephen-0.157
 Alice-0.157
Token After
Token ID2293
Feature activation-1.99e-02

Pos logit contributions
ands+0.011
 oct+0.011
orse+0.010
rr+0.010
 myself+0.010
Neg logit contributions
Aud-0.009
ifying-0.009
 Heidi-0.008
 Monica-0.008
 Stephen-0.008
Token a
Token ID257
Feature activation-1.73e-02

Pos logit contributions
 Stephen+0.021
 Heidi+0.021
 Alice+0.021
 blonde+0.021
 Michelle+0.020
Neg logit contributions
rim-0.045
ands-0.042
rr-0.039
ulu-0.038
asis-0.038
Token few
Token ID1178
Feature activation+2.17e-02

Pos logit contributions
 Heidi+0.019
Aud+0.019
 �+0.018
 Sharon+0.018
 Alice+0.018
Neg logit contributions
ands-0.039
rim-0.038
 oct-0.033
rr-0.033
ulu-0.033
Token Tim
Token ID5045
Feature activation+5.51e-02

Pos logit contributions
Aud+0.007
 blonde+0.007
 Heidi+0.007
 Scarlett+0.007
ifying+0.007
Neg logit contributions
 oct-0.006
ands-0.006
rim-0.005
orse-0.005
rr-0.005
Tokenmy
Token ID1820
Feature activation+5.82e-02

Pos logit contributions
ands+0.100
rim+0.096
 oct+0.091
orse+0.091
rr+0.089
Neg logit contributions
 Heidi-0.080
Aud-0.080
 Sharon-0.077
 Alice-0.076
 blonde-0.076
Token up
Token ID510
Feature activation-6.87e-02

Pos logit contributions
ands+0.114
rim+0.114
 oct+0.113
orse+0.108
rr+0.108
Neg logit contributions
Aud-0.072
 blonde-0.072
 Heidi-0.071
 Sharon-0.069
 Stephen-0.067
Token in
Token ID287
Feature activation+5.11e-03

Pos logit contributions
Aud+0.129
 Heidi+0.127
 Sharon+0.122
rified+0.121
 redd+0.120
Neg logit contributions
ands-0.099
rim-0.094
 oct-0.092
orse-0.087
ulu-0.083
Token his
Token ID465
Feature activation+2.36e-02

Pos logit contributions
ands+0.009
rim+0.008
 oct+0.008
orse+0.008
rr+0.008
Neg logit contributions
Aud-0.008
 Heidi-0.008
 Sharon-0.008
rified-0.008
 Alice-0.008
Token be
Token ID307
Feature activation+1.38e-01

Pos logit contributions
ands+0.049
rim+0.047
ulu+0.044
rr+0.043
orse+0.043
Neg logit contributions
ifying-0.028
 blonde-0.028
Aud-0.027
 Heidi-0.027
 redd-0.027
Tokenak
Token ID461
Feature activation+8.44e-02

Pos logit contributions
 oct+0.189
ands+0.188
ulu+0.167
rim+0.164
 ig+0.153
Neg logit contributions
Aud-0.248
 blonde-0.234
 Heidi-0.223
 Alice-0.220
se-0.219
Token.
Token ID13
Feature activation+4.63e-03

Pos logit contributions
ands+0.090
rim+0.084
 oct+0.081
orse+0.076
rr+0.075
Neg logit contributions
Aud-0.188
 Heidi-0.187
 Sharon-0.183
 Stephen-0.180
 blonde-0.180
Token Tim
Token ID5045
Feature activation+9.52e-03

Pos logit contributions
ands+0.007
orse+0.007
rim+0.006
 oct+0.006
ulu+0.006
Neg logit contributions
Aud-0.008
 Heidi-0.008
ifying-0.008
 redd-0.008
 blonde-0.008
Tokenmy
Token ID1820
Feature activation+1.90e-03

Pos logit contributions
ands+0.020
rim+0.019
orse+0.018
 oct+0.018
inations+0.017
Neg logit contributions
 Heidi-0.011
rified-0.011
Aud-0.011
 Alice-0.011
 Sharon-0.011
Token was
Token ID373
Feature activation+3.35e-03

Pos logit contributions
ands+0.003
rim+0.003
orse+0.003
 oct+0.003
rr+0.003
Neg logit contributions
Aud-0.003
 Heidi-0.003
 redd-0.003
ifying-0.003
 Sharon-0.003
Token had
Token ID550
Feature activation+2.36e-02

Pos logit contributions
Aud+0.015
 Heidi+0.015
 Sharon+0.014
 Alice+0.014
 Stephen+0.014
Neg logit contributions
ands-0.017
rim-0.016
orse-0.015
 oct-0.015
rr-0.014
Token got
Token ID1392
Feature activation+2.22e-02

Pos logit contributions
ands+0.040
rim+0.038
orse+0.037
 oct+0.036
rr+0.035
Neg logit contributions
Aud-0.037
 Heidi-0.037
 Sharon-0.035
 Alice-0.035
 Stephen-0.035
Token stuck
Token ID7819
Feature activation+7.16e-02

Pos logit contributions
ands+0.036
rim+0.034
 oct+0.033
orse+0.032
ulu+0.031
Neg logit contributions
Aud-0.038
 Heidi-0.037
 Alice-0.036
 Sharon-0.036
 Stephen-0.036
Token in
Token ID287
Feature activation+2.26e-03

Pos logit contributions
ands+0.113
rim+0.104
 oct+0.102
orse+0.099
rr+0.098
Neg logit contributions
Aud-0.124
 Heidi-0.123
 Sharon-0.119
 Stephen-0.117
 Alice-0.117
Token its
Token ID663
Feature activation+1.99e-02

Pos logit contributions
ands+0.004
rim+0.004
orse+0.004
 oct+0.004
rr+0.004
Neg logit contributions
 Heidi-0.003
Aud-0.003
 Alice-0.003
 Sharon-0.003
 blonde-0.003
Token be
Token ID307
Feature activation+1.37e-01

Pos logit contributions
ands+0.036
rim+0.036
orse+0.033
ulu+0.033
inations+0.033
Neg logit contributions
 blonde-0.028
 Heidi-0.028
 redd-0.027
Aud-0.027
 Sharon-0.027
Tokenak
Token ID461
Feature activation+8.11e-02

Pos logit contributions
rim+0.160
ands+0.158
 oct+0.154
orse+0.136
ulu+0.136
Neg logit contributions
Aud-0.275
 blonde-0.271
 Heidi-0.259
ifying-0.259
se-0.258
Token!
Token ID0
Feature activation+7.10e-03

Pos logit contributions
ands+0.095
rim+0.086
 oct+0.085
orse+0.079
rr+0.079
Neg logit contributions
Aud-0.173
 Heidi-0.172
 Sharon-0.169
 Stephen-0.167
 Alice-0.165
Token
Token ID198
Feature activation+1.61e-02

Pos logit contributions
ands+0.013
orse+0.013
 oct+0.013
rim+0.012
rr+0.012
Neg logit contributions
Aud-0.010
ifying-0.009
 redd-0.009
 Heidi-0.009
 blonde-0.009
Token
Token ID198
Feature activation+2.10e-02

Pos logit contributions
ands+0.034
rim+0.032
 oct+0.031
orse+0.030
inations+0.030
Neg logit contributions
 Heidi-0.020
Aud-0.020
 Alice-0.019
 Stephen-0.019
 Sharon-0.018
TokenThe
Token ID464
Feature activation+7.74e-03

Pos logit contributions
ands+0.039
rim+0.039
 oct+0.037
orse+0.037
inations+0.036
Neg logit contributions
Aud-0.030
 Heidi-0.028
 Michelle-0.027
Scar-0.027
 Sharon-0.027
Token the
Token ID262
Feature activation+1.44e-02

Pos logit contributions
cases+0.012
Aud+0.012
 blonde+0.012
ifying+0.012
 Heidi+0.011
Neg logit contributions
ands-0.009
 oct-0.009
ulu-0.008
 myself-0.008
rim-0.008
Token flower
Token ID15061
Feature activation+1.28e-02

Pos logit contributions
ands+0.021
 oct+0.020
rim+0.019
orse+0.019
rr+0.018
Neg logit contributions
Aud-0.026
 Heidi-0.026
 Stephen-0.025
 Sharon-0.025
 Monica-0.025
Token away
Token ID1497
Feature activation+1.15e-02

Pos logit contributions
ands+0.017
rim+0.017
 oct+0.016
orse+0.016
ulu+0.015
Neg logit contributions
Aud-0.024
 Heidi-0.024
 blonde-0.024
 Stephen-0.023
 Sharon-0.023
Token in
Token ID287
Feature activation+6.70e-03

Pos logit contributions
ands+0.012
rim+0.011
 oct+0.011
orse+0.010
ulu+0.010
Neg logit contributions
Aud-0.026
 Heidi-0.025
 Stephen-0.024
 Sharon-0.024
 blonde-0.024
Token its
Token ID663
Feature activation+2.59e-02

Pos logit contributions
ands+0.012
rim+0.012
 oct+0.011
orse+0.011
rr+0.011
Neg logit contributions
Aud-0.010
 Heidi-0.009
rified-0.009
 Sharon-0.009
 redd-0.009
Token be
Token ID307
Feature activation+1.37e-01

Pos logit contributions
ands+0.046
rim+0.044
orse+0.042
 oct+0.042
rr+0.040
Neg logit contributions
 Heidi-0.039
Aud-0.039
 Sharon-0.037
 blonde-0.037
 Alice-0.037
Tokenak
Token ID461
Feature activation+8.16e-02

Pos logit contributions
ands+0.160
 oct+0.151
rim+0.146
ulu+0.131
rr+0.128
Neg logit contributions
Aud-0.285
 blonde-0.276
 Heidi-0.274
 Alice-0.266
 redd-0.264
Token!
Token ID0
Feature activation+3.08e-03

Pos logit contributions
ands+0.082
rim+0.075
 oct+0.072
orse+0.070
rr+0.067
Neg logit contributions
Aud-0.185
 Heidi-0.184
 Sharon-0.181
 Stephen-0.179
 Monica-0.177
Token Lily
Token ID20037
Feature activation+6.63e-03

Pos logit contributions
ulu+0.004
ands+0.004
orse+0.004
 oct+0.004
 myself+0.004
Neg logit contributions
Aud-0.005
ifying-0.005
 redd-0.005
string-0.005
 Stephen-0.005
Token was
Token ID373
Feature activation-3.52e-03

Pos logit contributions
ands+0.009
rim+0.009
orse+0.009
 oct+0.009
rr+0.008
Neg logit contributions
Aud-0.012
 Heidi-0.012
 redd-0.012
ifying-0.011
 Sharon-0.011
Token upset
Token ID9247
Feature activation+3.88e-03

Pos logit contributions
Aud+0.007
 redd+0.006
 Heidi+0.006
ifying+0.006
rified+0.006
Neg logit contributions
ands-0.004
rim-0.004
orse-0.004
 oct-0.004
ulu-0.004
Token pe
Token ID613
Feature activation+6.90e-02

Pos logit contributions
ands+0.078
rim+0.073
orse+0.070
 oct+0.069
rr+0.067
Neg logit contributions
Aud-0.084
 Heidi-0.084
 Sharon-0.081
 Alice-0.081
 blonde-0.081
Tokenb
Token ID65
Feature activation+6.44e-02

Pos logit contributions
ands+0.065
rim+0.059
 oct+0.057
 seek+0.052
 hour+0.051
Neg logit contributions
 redd-0.151
 Tab-0.146
 blonde-0.145
Aud-0.144
se-0.143
Tokenble
Token ID903
Feature activation+4.88e-02

Pos logit contributions
ands+0.075
rim+0.067
 hour+0.062
 oct+0.060
 seek+0.060
Neg logit contributions
se-0.128
ifying-0.126
als-0.126
 Tab-0.125
 redd-0.125
Token in
Token ID287
Feature activation-4.89e-04

Pos logit contributions
ands+0.078
rim+0.073
orse+0.070
 oct+0.069
rr+0.067
Neg logit contributions
Aud-0.082
 Heidi-0.081
 Sharon-0.079
 Alice-0.078
 Stephen-0.078
Token their
Token ID511
Feature activation+2.79e-02

Pos logit contributions
Aud+0.001
 Heidi+0.001
 Sharon+0.001
 Alice+0.001
rified+0.001
Neg logit contributions
ands-0.001
rim-0.001
orse-0.001
 oct-0.001
rr-0.001
Token be
Token ID307
Feature activation+1.36e-01

Pos logit contributions
ands+0.043
rim+0.040
 oct+0.038
orse+0.038
rr+0.037
Neg logit contributions
 Heidi-0.049
Aud-0.048
 Sharon-0.047
rified-0.047
 Alice-0.047
Tokenaks
Token ID4730
Feature activation+2.36e-02

Pos logit contributions
ands+0.163
 oct+0.151
rim+0.148
ulu+0.134
rr+0.128
Neg logit contributions
Aud-0.276
 blonde-0.271
 Alice-0.260
 Heidi-0.258
ifying-0.256
Token.
Token ID13
Feature activation+8.06e-03

Pos logit contributions
ands+0.030
rim+0.027
 oct+0.027
orse+0.026
rr+0.025
Neg logit contributions
 Heidi-0.048
Aud-0.048
 Sharon-0.046
 redd-0.046
 blonde-0.045
Token And
Token ID843
Feature activation+2.52e-02

Pos logit contributions
orse+0.010
ands+0.010
rim+0.009
rr+0.009
 oct+0.008
Neg logit contributions
 redd-0.015
rified-0.015
 blonde-0.015
Aud-0.014
 wrists-0.014
Token they
Token ID484
Feature activation-2.28e-02

Pos logit contributions
ands+0.032
 oct+0.031
orse+0.031
rim+0.030
 seek+0.029
Neg logit contributions
Aud-0.048
ifying-0.046
 blonde-0.046
rified-0.045
 redd-0.045
Token both
Token ID1111
Feature activation+1.30e-02

Pos logit contributions
 Heidi+0.036
Aud+0.036
 Sharon+0.034
 blonde+0.034
 Alice+0.034
Neg logit contributions
ands-0.039
rim-0.036
orse-0.035
 oct-0.035
ulu-0.034
Token bird
Token ID6512
Feature activation+4.17e-02

Pos logit contributions
ands+0.070
rim+0.066
orse+0.060
inations+0.060
ulu+0.059
Neg logit contributions
 redd-0.058
 Heidi-0.058
Aud-0.057
 blonde-0.056
rified-0.056
Token a
Token ID257
Feature activation-1.58e-02

Pos logit contributions
ands+0.072
rim+0.069
orse+0.065
 oct+0.065
rr+0.063
Neg logit contributions
Aud-0.064
 Heidi-0.064
 Alice-0.061
 Sharon-0.061
 Stephen-0.061
Token kiss
Token ID9245
Feature activation-3.59e-02

Pos logit contributions
Aud+0.025
 Heidi+0.025
 redd+0.024
 blonde+0.024
 Alice+0.024
Neg logit contributions
ands-0.027
rim-0.026
orse-0.024
 oct-0.024
rr-0.023
Token on
Token ID319
Feature activation-4.14e-02

Pos logit contributions
Aud+0.079
 Heidi+0.079
 blonde+0.078
 redd+0.078
 Sharon+0.077
Neg logit contributions
 oct-0.034
ands-0.034
orse-0.033
rim-0.031
ulu-0.029
Token the
Token ID262
Feature activation+2.89e-02

Pos logit contributions
Aud+0.063
 Heidi+0.061
rified+0.059
 redd+0.058
 Sharon+0.058
Neg logit contributions
ands-0.074
rim-0.070
orse-0.069
 oct-0.069
rr-0.064
Token be
Token ID307
Feature activation+1.36e-01

Pos logit contributions
ands+0.051
rim+0.050
ulu+0.046
orse+0.045
rr+0.045
Neg logit contributions
 Heidi-0.043
Aud-0.042
 redd-0.042
 Alice-0.041
 blonde-0.041
Tokenak
Token ID461
Feature activation+7.66e-02

Pos logit contributions
ands+0.174
 oct+0.167
rim+0.151
ulu+0.142
rr+0.138
Neg logit contributions
Aud-0.260
 Heidi-0.251
 blonde-0.250
 Alice-0.245
 Michelle-0.242
Token.
Token ID13
Feature activation+3.85e-03

Pos logit contributions
ands+0.095
rim+0.088
orse+0.084
 oct+0.084
rr+0.079
Neg logit contributions
 Heidi-0.155
Aud-0.155
 Sharon-0.151
 Stephen-0.150
 Alice-0.149
Token They
Token ID1119
Feature activation+1.31e-02

Pos logit contributions
orse+0.007
ands+0.007
 oct+0.007
rim+0.007
rr+0.007
Neg logit contributions
Aud-0.005
ifying-0.005
 Heidi-0.005
 redd-0.005
 blonde-0.005
Token lived
Token ID5615
Feature activation-1.68e-02

Pos logit contributions
ands+0.022
orse+0.021
 oct+0.021
rim+0.021
ulu+0.020
Neg logit contributions
 Heidi-0.021
Aud-0.021
 Sharon-0.020
 blonde-0.020
rified-0.019
Token happily
Token ID18177
Feature activation+1.36e-02

Pos logit contributions
Aud+0.022
 Heidi+0.021
 blonde+0.021
ifying+0.021
 Sharon+0.021
Neg logit contributions
ands-0.033
rim-0.031
orse-0.031
 oct-0.031
rr-0.030
Token was
Token ID373
Feature activation+1.05e-02

Pos logit contributions
Aud+0.064
 redd+0.062
 Heidi+0.062
ifying+0.062
 Tab+0.061
Neg logit contributions
 oct-0.041
orse-0.040
ands-0.040
 seek-0.037
ulu-0.036
Token better
Token ID1365
Feature activation-5.89e-02

Pos logit contributions
ands+0.014
rim+0.013
orse+0.013
 oct+0.013
ulu+0.012
Neg logit contributions
Aud-0.020
rified-0.019
 Heidi-0.019
ifying-0.019
 Sharon-0.019
Token to
Token ID284
Feature activation+1.89e-02

Pos logit contributions
Aud+0.059
 Heidi+0.059
 Alice+0.055
 Sharon+0.054
 Stephen+0.054
Neg logit contributions
ands-0.132
rim-0.131
 oct-0.122
ulu-0.122
rr-0.121
Token use
Token ID779
Feature activation+5.07e-02

Pos logit contributions
ands+0.027
rim+0.025
 oct+0.024
orse+0.023
rr+0.023
Neg logit contributions
 Heidi-0.035
Aud-0.035
 Sharon-0.034
 Alice-0.034
 blonde-0.034
Token its
Token ID663
Feature activation+1.17e-02

Pos logit contributions
ands+0.082
rim+0.075
 oct+0.074
orse+0.073
inations+0.068
Neg logit contributions
Aud-0.086
 Heidi-0.084
rified-0.083
 Sharon-0.082
 redd-0.081
Token be
Token ID307
Feature activation+1.35e-01

Pos logit contributions
ands+0.021
rim+0.020
orse+0.019
 oct+0.019
rr+0.019
Neg logit contributions
Aud-0.017
 Heidi-0.017
 blonde-0.016
 Sharon-0.016
 Alice-0.016
Tokenak
Token ID461
Feature activation+8.05e-02

Pos logit contributions
ands+0.171
rim+0.163
 oct+0.160
orse+0.148
rr+0.145
Neg logit contributions
Aud-0.271
 Heidi-0.263
 blonde-0.260
 Alice-0.255
 redd-0.253
Token to
Token ID284
Feature activation+2.20e-02

Pos logit contributions
ands+0.135
rim+0.125
 oct+0.123
orse+0.120
rr+0.116
Neg logit contributions
 Heidi-0.131
Aud-0.129
 Sharon-0.126
 Stephen-0.124
 Alice-0.124
Token pick
Token ID2298
Feature activation+2.24e-02

Pos logit contributions
ands+0.031
rim+0.030
 oct+0.027
orse+0.027
rr+0.027
Neg logit contributions
 Heidi-0.040
Aud-0.039
 Sharon-0.039
 Alice-0.039
 Stephen-0.038
Token up
Token ID510
Feature activation-6.91e-02

Pos logit contributions
 oct+0.033
ands+0.031
rim+0.030
 himself+0.030
 myself+0.028
Neg logit contributions
Aud-0.040
 blonde-0.039
ifying-0.039
 Sharon-0.036
cases-0.036
Token food
Token ID2057
Feature activation-3.70e-02

Pos logit contributions
Aud+0.134
 Heidi+0.129
rified+0.127
 Sharon+0.126
 Scarlett+0.124
Neg logit contributions
ands-0.094
 oct-0.090
rim-0.086
orse-0.084
rr-0.076
Token a
Token ID257
Feature activation-1.83e-02

Pos logit contributions
ands+0.037
rim+0.036
orse+0.034
inations+0.033
rr+0.032
Neg logit contributions
 Heidi-0.021
 Alice-0.021
 Sharon-0.021
 Stephen-0.020
 Monica-0.020
Token lake
Token ID13546
Feature activation+2.65e-03

Pos logit contributions
 Heidi+0.031
Aud+0.031
rified+0.030
 Sharon+0.030
 Alice+0.030
Neg logit contributions
ands-0.029
rim-0.027
 oct-0.026
orse-0.026
rr-0.025
Token and
Token ID290
Feature activation-2.00e-02

Pos logit contributions
ands+0.005
rim+0.004
orse+0.004
inations+0.004
rr+0.004
Neg logit contributions
 Heidi-0.004
 Alice-0.004
rified-0.004
 Sharon-0.004
 redd-0.004
Token filled
Token ID5901
Feature activation+2.33e-02

Pos logit contributions
Aud+0.032
 Heidi+0.031
 blonde+0.030
ifying+0.030
 Sharon+0.030
Neg logit contributions
ands-0.033
orse-0.031
 oct-0.031
rim-0.031
ulu-0.030
Token his
Token ID465
Feature activation+1.96e-02

Pos logit contributions
ands+0.045
rim+0.042
orse+0.040
rr+0.039
 oct+0.039
Neg logit contributions
 Heidi-0.033
Aud-0.032
 Alice-0.031
 Sharon-0.031
 Stephen-0.031
Token be
Token ID307
Feature activation+1.35e-01

Pos logit contributions
ands+0.036
rim+0.034
orse+0.032
 oct+0.032
ulu+0.032
Neg logit contributions
 Heidi-0.028
Aud-0.027
ifying-0.027
 Alice-0.027
 redd-0.027
Tokenak
Token ID461
Feature activation+7.84e-02

Pos logit contributions
 oct+0.194
ands+0.189
rim+0.167
ulu+0.165
orse+0.156
Neg logit contributions
Aud-0.235
 blonde-0.221
 Heidi-0.216
 Alice-0.215
 redd-0.214
Token with
Token ID351
Feature activation+4.44e-03

Pos logit contributions
ands+0.103
rim+0.095
 oct+0.091
orse+0.089
rr+0.086
Neg logit contributions
Aud-0.156
 Heidi-0.156
 Sharon-0.152
 Monica-0.150
 Stephen-0.150
Token water
Token ID1660
Feature activation+2.10e-02

Pos logit contributions
ands+0.005
 oct+0.005
rim+0.005
orse+0.004
rr+0.004
Neg logit contributions
Aud-0.010
 Heidi-0.009
 Sharon-0.009
 Scarlett-0.009
rified-0.009
Token.
Token ID13
Feature activation-2.33e-03

Pos logit contributions
ands+0.023
rim+0.021
 oct+0.020
orse+0.019
rr+0.018
Neg logit contributions
 Heidi-0.048
Aud-0.048
 Sharon-0.046
 blonde-0.046
 Alice-0.046
Token Then
Token ID3244
Feature activation-9.56e-03

Pos logit contributions
Aud+0.004
ifying+0.003
 Heidi+0.003
Scar+0.003
 Michelle+0.003
Neg logit contributions
ands-0.004
 oct-0.004
orse-0.004
rr-0.004
 ig-0.003
Token It
Token ID632
Feature activation-5.43e-04

Pos logit contributions
ands+0.012
 oct+0.012
orse+0.011
rim+0.011
ulu+0.010
Neg logit contributions
Aud-0.013
 Heidi-0.012
 Michelle-0.011
ifying-0.011
 Alice-0.011
Token must
Token ID1276
Feature activation+1.58e-03

Pos logit contributions
Aud+0.001
 Heidi+0.001
 Sharon+0.001
 Alice+0.001
 Stephen+0.001
Neg logit contributions
ands-0.001
rim-0.001
 oct-0.001
orse-0.001
ulu-0.001
Token have
Token ID423
Feature activation+3.90e-02

Pos logit contributions
ands+0.002
rim+0.002
 oct+0.002
orse+0.002
rr+0.002
Neg logit contributions
 Heidi-0.003
 Alice-0.003
 Sharon-0.003
 Michelle-0.003
Aud-0.003
Token weighed
Token ID20261
Feature activation-2.88e-03

Pos logit contributions
ands+0.049
rim+0.044
 oct+0.043
orse+0.043
rr+0.040
Neg logit contributions
Aud-0.079
 Heidi-0.078
ifying-0.076
 Sharon-0.076
rified-0.076
Token at
Token ID379
Feature activation+1.97e-02

Pos logit contributions
Aud+0.006
 Heidi+0.005
 Sharon+0.005
 Alice+0.005
 Michelle+0.005
Neg logit contributions
ands-0.004
 oct-0.004
rim-0.004
orse-0.004
rr-0.003
Token least
Token ID1551
Feature activation+1.35e-01

Pos logit contributions
ands+0.036
rim+0.035
orse+0.033
rr+0.031
 oct+0.031
Neg logit contributions
 Heidi-0.029
 Alice-0.028
 Stephen-0.028
 Sharon-0.028
 Monica-0.027
Token two
Token ID734
Feature activation+1.38e-02

Pos logit contributions
ands+0.212
rim+0.202
 oct+0.187
orse+0.187
rr+0.182
Neg logit contributions
 Heidi-0.232
Aud-0.228
 Alice-0.224
 Sharon-0.223
 Stephen-0.222
Token hundred
Token ID3470
Feature activation+3.29e-02

Pos logit contributions
ands+0.021
rim+0.019
orse+0.019
 oct+0.019
rr+0.018
Neg logit contributions
Aud-0.025
 Heidi-0.025
 Sharon-0.024
 Alice-0.024
 Stephen-0.024
Token pounds
Token ID8059
Feature activation-6.28e-02

Pos logit contributions
ands+0.048
rim+0.044
orse+0.043
 oct+0.043
rr+0.041
Neg logit contributions
Aud-0.061
 Heidi-0.061
 Sharon-0.059
 Alice-0.058
 Stephen-0.058
Token!
Token ID0
Feature activation+1.09e-02

Pos logit contributions
Aud+0.138
 Heidi+0.135
 Sharon+0.134
 blonde+0.134
 redd+0.132
Neg logit contributions
ands-0.070
 oct-0.062
orse-0.060
rim-0.059
rr-0.057
Token W
Token ID370
Feature activation+3.14e-02

Pos logit contributions
ands+0.020
 oct+0.019
rim+0.019
orse+0.019
ulu+0.018
Neg logit contributions
Aud-0.016
 Heidi-0.015
 blonde-0.015
 redd-0.014
 Alice-0.014
Token of
Token ID286
Feature activation-7.39e-03

Pos logit contributions
ands+0.012
rim+0.011
orse+0.011
 oct+0.010
rr+0.010
Neg logit contributions
 Heidi-0.013
Aud-0.013
 blonde-0.013
 Alice-0.013
 Sharon-0.013
Token the
Token ID262
Feature activation+1.35e-02

Pos logit contributions
 Heidi+0.012
Aud+0.012
 Alice+0.011
 Sharon+0.011
 Stephen+0.011
Neg logit contributions
ands-0.012
rim-0.012
orse-0.011
 oct-0.011
rr-0.011
Token honey
Token ID12498
Feature activation-1.59e-02

Pos logit contributions
ands+0.021
rim+0.019
 oct+0.019
orse+0.018
rr+0.018
Neg logit contributions
Aud-0.024
 Heidi-0.023
 Sharon-0.022
 Alice-0.022
 Stephen-0.022
Token from
Token ID422
Feature activation-2.25e-02

Pos logit contributions
Aud+0.031
 Heidi+0.031
 Sharon+0.030
 Stephen+0.030
 blonde+0.030
Neg logit contributions
ands-0.021
rim-0.019
 oct-0.018
orse-0.018
ulu-0.017
Token the
Token ID262
Feature activation+1.21e-02

Pos logit contributions
 Heidi+0.035
Aud+0.034
 Alice+0.034
 Sharon+0.034
 Stephen+0.033
Neg logit contributions
ands-0.039
rim-0.037
orse-0.036
 oct-0.035
rr-0.034
Token be
Token ID307
Feature activation+1.35e-01

Pos logit contributions
ands+0.020
rim+0.018
 oct+0.018
orse+0.017
ulu+0.017
Neg logit contributions
Aud-0.020
 Heidi-0.020
 Stephen-0.019
 Sharon-0.019
 Alice-0.019
Tokeneh
Token ID17231
Feature activation+1.66e-02

Pos logit contributions
ands+0.145
rim+0.134
 oct+0.131
orse+0.119
rr+0.119
Neg logit contributions
Aud-0.292
 Heidi-0.288
 blonde-0.285
 Alice-0.280
 redd-0.276
Tokenive
Token ID425
Feature activation+2.83e-02

Pos logit contributions
ands+0.031
rim+0.030
orse+0.029
 oct+0.028
rr+0.028
Neg logit contributions
 Heidi-0.023
Aud-0.023
 Sharon-0.022
 Alice-0.022
 Stephen-0.022
Token.
Token ID13
Feature activation+6.63e-03

Pos logit contributions
ands+0.036
rim+0.034
orse+0.032
 oct+0.032
ulu+0.030
Neg logit contributions
Aud-0.057
 Heidi-0.056
 Sharon-0.055
 Stephen-0.055
 Alice-0.054
Token
Token ID198
Feature activation+1.99e-02

Pos logit contributions
ands+0.011
orse+0.011
rim+0.011
 oct+0.011
ulu+0.011
Neg logit contributions
Aud-0.010
 redd-0.009
ifying-0.009
 Heidi-0.009
rified-0.009
Token
Token ID198
Feature activation+2.24e-02

Pos logit contributions
ands+0.041
rim+0.038
 oct+0.037
orse+0.035
rr+0.035
Neg logit contributions
 Heidi-0.026
Aud-0.025
 Alice-0.025
 Stephen-0.024
 Sharon-0.024
Token happily
Token ID18177
Feature activation+6.82e-03

Pos logit contributions
Aud+0.039
 Heidi+0.038
 Sharon+0.037
 Alice+0.037
ifying+0.037
Neg logit contributions
ands-0.040
rim-0.039
orse-0.038
 oct-0.037
rr-0.036
Token grabbed
Token ID13176
Feature activation+1.33e-02

Pos logit contributions
ands+0.013
rim+0.012
orse+0.012
 oct+0.012
rr+0.011
Neg logit contributions
Aud-0.009
 Heidi-0.009
 Sharon-0.009
 Alice-0.009
 blonde-0.009
Token it
Token ID340
Feature activation-3.92e-03

Pos logit contributions
 oct+0.017
ands+0.015
orse+0.015
 himself+0.014
 myself+0.014
Neg logit contributions
Aud-0.028
cases-0.026
 blonde-0.025
ifying-0.025
 Scarlett-0.025
Token in
Token ID287
Feature activation-4.84e-04

Pos logit contributions
Aud+0.008
 Heidi+0.007
 Sharon+0.007
 Stephen+0.007
 blonde+0.007
Neg logit contributions
ands-0.005
 oct-0.005
rim-0.005
orse-0.005
rr-0.005
Token its
Token ID663
Feature activation+2.25e-02

Pos logit contributions
Aud+0.001
 Heidi+0.001
 Sharon+0.001
 Alice+0.001
rified+0.001
Neg logit contributions
ands-0.001
rim-0.001
orse-0.001
 oct-0.001
rr-0.001
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.044
rim+0.042
orse+0.042
 oct+0.041
rr+0.040
Neg logit contributions
Aud-0.029
 Heidi-0.028
rified-0.027
 Sharon-0.027
 Alice-0.027
Tokenak
Token ID461
Feature activation+7.95e-02

Pos logit contributions
ands+0.159
 oct+0.151
rim+0.150
rr+0.130
ulu+0.129
Neg logit contributions
Aud-0.275
 blonde-0.269
 Heidi-0.268
 redd-0.260
 Alice-0.257
Token and
Token ID290
Feature activation+3.96e-03

Pos logit contributions
ands+0.093
 oct+0.088
rim+0.086
orse+0.084
rr+0.078
Neg logit contributions
Aud-0.169
 Heidi-0.164
 Sharon-0.161
 Stephen-0.158
 Monica-0.157
Token flew
Token ID13112
Feature activation+7.16e-04

Pos logit contributions
ands+0.006
orse+0.006
 oct+0.006
rim+0.006
 seek+0.006
Neg logit contributions
Aud-0.007
 Heidi-0.006
ifying-0.006
rified-0.006
 Sharon-0.006
Token away
Token ID1497
Feature activation-7.14e-03

Pos logit contributions
ands+0.001
 oct+0.001
ulu+0.001
 seek+0.001
 himself+0.001
Neg logit contributions
 blonde-0.001
Aud-0.001
 Heidi-0.001
 Michelle-0.001
 Sharon-0.001
Token.
Token ID13
Feature activation-9.16e-04

Pos logit contributions
 Heidi+0.013
Aud+0.013
 Alice+0.013
 Sharon+0.012
rified+0.012
Neg logit contributions
ands-0.011
rim-0.010
orse-0.009
 oct-0.009
rr-0.009
Token holding
Token ID4769
Feature activation+6.62e-03

Pos logit contributions
Aud+0.011
ifying+0.011
 Heidi+0.011
rified+0.011
 Alice+0.011
Neg logit contributions
ands-0.016
rim-0.014
rr-0.014
 oct-0.014
inations-0.013
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
 oct+0.009
orse+0.008
ands+0.008
 himself+0.007
 myself+0.007
Neg logit contributions
Aud-0.013
cases-0.013
 Heidi-0.012
Scar-0.012
 blonde-0.012
Token carrot
Token ID40562
Feature activation-1.05e-02

Pos logit contributions
ands+0.036
rim+0.035
 oct+0.034
orse+0.034
rr+0.032
Neg logit contributions
Aud-0.030
 Heidi-0.029
 Sharon-0.028
 Alice-0.028
 Stephen-0.028
Token in
Token ID287
Feature activation-3.71e-03

Pos logit contributions
Aud+0.021
 Heidi+0.021
 Sharon+0.020
 Alice+0.020
 Stephen+0.020
Neg logit contributions
ands-0.014
rim-0.013
orse-0.012
 oct-0.012
rr-0.011
Token its
Token ID663
Feature activation+1.84e-02

Pos logit contributions
Aud+0.005
 Heidi+0.005
rified+0.005
 Sharon+0.005
 blonde+0.005
Neg logit contributions
ands-0.007
rim-0.006
 oct-0.006
orse-0.006
rr-0.006
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.041
rim+0.040
 oct+0.037
orse+0.037
ulu+0.037
Neg logit contributions
 Heidi-0.020
Aud-0.019
 blonde-0.019
 redd-0.019
 Sharon-0.019
Tokenak
Token ID461
Feature activation+7.88e-02

Pos logit contributions
 oct+0.188
ands+0.183
rim+0.174
ulu+0.162
 event+0.161
Neg logit contributions
Aud-0.235
 blonde-0.233
 Heidi-0.215
 redd-0.214
Scar-0.212
Token.
Token ID13
Feature activation+5.22e-03

Pos logit contributions
ands+0.081
rim+0.075
 oct+0.073
orse+0.072
rr+0.067
Neg logit contributions
Aud-0.177
 Heidi-0.176
 Sharon-0.172
 Alice-0.169
 Stephen-0.169
Token Benny
Token ID44275
Feature activation-6.50e-03

Pos logit contributions
orse+0.008
 oct+0.008
ands+0.008
rim+0.007
ulu+0.007
Neg logit contributions
Aud-0.009
 blonde-0.009
 Heidi-0.008
 Stephen-0.008
 Alice-0.008
Token and
Token ID290
Feature activation-3.11e-02

Pos logit contributions
Aud+0.012
 Heidi+0.012
ifying+0.011
 redd+0.011
 blonde+0.011
Neg logit contributions
ands-0.009
orse-0.009
rim-0.009
 oct-0.009
rr-0.008
Token Lily
Token ID20037
Feature activation+1.07e-03

Pos logit contributions
Aud+0.048
 Heidi+0.047
 Sharon+0.045
 Alice+0.045
 Stephen+0.045
Neg logit contributions
ands-0.055
rim-0.052
orse-0.050
 oct-0.050
ulu-0.048
Token held
Token ID2714
Feature activation-1.03e-02

Pos logit contributions
ands+0.021
orse+0.020
 oct+0.020
rim+0.019
ulu+0.019
Neg logit contributions
 Heidi-0.024
Aud-0.024
ifying-0.023
 Alice-0.023
 Sharon-0.023
Token the
Token ID262
Feature activation+8.40e-03

Pos logit contributions
Aud+0.021
Scar+0.020
 Heidi+0.020
 blonde+0.020
 Scarlett+0.019
Neg logit contributions
 oct-0.013
ands-0.012
orse-0.012
 himself-0.010
rr-0.010
Token film
Token ID2646
Feature activation-9.05e-03

Pos logit contributions
ands+0.014
rim+0.013
 oct+0.012
orse+0.012
rr+0.012
Neg logit contributions
 Heidi-0.014
Aud-0.014
 Sharon-0.014
 Alice-0.013
 blonde-0.013
Token in
Token ID287
Feature activation-4.25e-03

Pos logit contributions
Aud+0.017
 Heidi+0.016
 Sharon+0.016
 blonde+0.016
ifying+0.015
Neg logit contributions
ands-0.013
 oct-0.012
orse-0.012
rim-0.012
rr-0.011
Token their
Token ID511
Feature activation+1.51e-02

Pos logit contributions
Aud+0.007
 Heidi+0.007
 Sharon+0.006
 Alice+0.006
rified+0.006
Neg logit contributions
ands-0.007
rim-0.007
orse-0.007
 oct-0.007
rr-0.007
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.032
rim+0.030
orse+0.030
 oct+0.030
rr+0.029
Neg logit contributions
 Heidi-0.018
Aud-0.017
 Sharon-0.017
rified-0.017
 Alice-0.017
Tokenaks
Token ID4730
Feature activation+1.80e-02

Pos logit contributions
 oct+0.178
rim+0.174
ands+0.174
ulu+0.153
 ig+0.149
Neg logit contributions
Aud-0.256
 blonde-0.243
 Heidi-0.229
Scar-0.228
 Alice-0.228
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
ands+0.023
 oct+0.022
orse+0.021
rim+0.020
rr+0.019
Neg logit contributions
 Heidi-0.037
Aud-0.037
 Sharon-0.035
 redd-0.034
 Michelle-0.034
Token flew
Token ID13112
Feature activation+1.85e-03

Pos logit contributions
ands+0.016
orse+0.016
 oct+0.015
rim+0.015
 seek+0.014
Neg logit contributions
 Heidi-0.019
Aud-0.019
rified-0.019
ifying-0.019
 blonde-0.018
Token together
Token ID1978
Feature activation+1.04e-02

Pos logit contributions
ands+0.003
 oct+0.003
orse+0.003
ulu+0.003
 seek+0.003
Neg logit contributions
 Heidi-0.003
 blonde-0.003
Aud-0.003
 Michelle-0.003
 Sharon-0.003
Token.
Token ID13
Feature activation+5.54e-03

Pos logit contributions
ands+0.011
orse+0.011
 oct+0.010
rr+0.009
rim+0.009
Neg logit contributions
Aud-0.023
 Heidi-0.023
rified-0.022
 Sharon-0.022
 Michelle-0.022
Token with
Token ID351
Feature activation-4.84e-03

Pos logit contributions
ands+0.071
rim+0.067
orse+0.063
 oct+0.062
rr+0.061
Neg logit contributions
 Heidi-0.068
Aud-0.067
 Alice-0.065
 Sharon-0.065
 blonde-0.065
Token the
Token ID262
Feature activation+6.11e-03

Pos logit contributions
Aud+0.008
 Heidi+0.008
 Sharon+0.008
rified+0.008
ifying+0.008
Neg logit contributions
ands-0.008
rim-0.007
 oct-0.007
orse-0.007
ulu-0.007
Token ornament
Token ID45618
Feature activation-1.79e-02

Pos logit contributions
ands+0.010
rim+0.009
 oct+0.009
orse+0.009
rr+0.008
Neg logit contributions
Aud-0.010
 Heidi-0.010
 Sharon-0.010
 Alice-0.010
 Stephen-0.010
Token in
Token ID287
Feature activation-3.51e-03

Pos logit contributions
Aud+0.035
 Heidi+0.034
 Sharon+0.033
 Stephen+0.033
rified+0.033
Neg logit contributions
ands-0.024
rim-0.022
orse-0.022
 oct-0.022
rr-0.020
Token its
Token ID663
Feature activation+2.18e-02

Pos logit contributions
Aud+0.005
 Heidi+0.005
rified+0.005
 Sharon+0.005
 redd+0.005
Neg logit contributions
ands-0.006
rim-0.006
 oct-0.006
orse-0.006
rr-0.006
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.039
rim+0.037
 oct+0.035
ulu+0.034
orse+0.034
Neg logit contributions
 Heidi-0.033
Aud-0.033
 blonde-0.033
 redd-0.032
 Alice-0.032
Tokenak
Token ID461
Feature activation+8.30e-02

Pos logit contributions
 oct+0.171
ands+0.168
rim+0.163
ulu+0.148
rr+0.137
Neg logit contributions
Aud-0.261
 blonde-0.257
 Heidi-0.237
 Alice-0.235
 redd-0.233
Token.
Token ID13
Feature activation-3.99e-03

Pos logit contributions
ands+0.100
rim+0.092
 oct+0.087
orse+0.087
rr+0.083
Neg logit contributions
Aud-0.172
 Heidi-0.172
 Sharon-0.167
 Stephen-0.166
 Alice-0.165
Token But
Token ID887
Feature activation+3.81e-02

Pos logit contributions
ifying+0.005
Aud+0.005
 Heidi+0.005
 redd+0.005
Scar+0.005
Neg logit contributions
ands-0.007
orse-0.007
 oct-0.007
ulu-0.007
rim-0.006
Token then
Token ID788
Feature activation+8.65e-03

Pos logit contributions
ands+0.058
orse+0.055
rim+0.055
 oct+0.054
ulu+0.052
Neg logit contributions
Aud-0.066
 Heidi-0.064
ifying-0.062
 blonde-0.062
 Sharon-0.062
Token,
Token ID11
Feature activation-1.28e-02

Pos logit contributions
 oct+0.009
orse+0.009
ands+0.008
 seek+0.008
oy+0.008
Neg logit contributions
ifying-0.019
Aud-0.019
 redd-0.018
 Heidi-0.017
licks-0.017
Token right
Token ID826
Feature activation-1.02e-02

Pos logit contributions
Aud+0.035
 Heidi+0.035
 Stephen+0.034
 Sharon+0.034
ifying+0.034
Neg logit contributions
 oct-0.031
ands-0.030
ulu-0.029
rim-0.028
orse-0.028
Token for
Token ID329
Feature activation+2.56e-02

Pos logit contributions
Aud+0.016
 Heidi+0.015
 Michelle+0.014
 redd+0.014
 Sharon+0.014
Neg logit contributions
ands-0.018
 oct-0.017
rim-0.016
ulu-0.016
rr-0.015
Token her
Token ID607
Feature activation+1.31e-02

Pos logit contributions
rim+0.047
ands+0.047
orse+0.044
rr+0.043
 oct+0.042
Neg logit contributions
 Alice-0.037
 Heidi-0.036
 blonde-0.036
Aud-0.036
 Sharon-0.036
Token with
Token ID351
Feature activation+9.09e-03

Pos logit contributions
ands+0.021
rim+0.020
orse+0.019
 oct+0.018
rr+0.018
Neg logit contributions
Aud-0.022
 Heidi-0.022
 Alice-0.021
 blonde-0.021
 Sharon-0.021
Token his
Token ID465
Feature activation+2.50e-03

Pos logit contributions
ands+0.014
 oct+0.013
rim+0.013
orse+0.012
ulu+0.012
Neg logit contributions
 Heidi-0.015
Aud-0.015
 redd-0.015
rified-0.015
 Sharon-0.014
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.004
rim+0.004
orse+0.004
 oct+0.004
ulu+0.004
Neg logit contributions
Aud-0.004
 blonde-0.004
 Heidi-0.004
 Alice-0.004
ifying-0.004
Tokenak
Token ID461
Feature activation+7.57e-02

Pos logit contributions
ands+0.163
 oct+0.146
rim+0.146
ulu+0.131
orse+0.125
Neg logit contributions
Aud-0.272
 blonde-0.265
ifying-0.256
 Heidi-0.255
 Alice-0.254
Token open
Token ID1280
Feature activation+3.93e-02

Pos logit contributions
ands+0.093
rim+0.087
 oct+0.085
orse+0.083
rr+0.080
Neg logit contributions
 Heidi-0.152
Aud-0.152
 Sharon-0.148
 Stephen-0.146
 redd-0.145
Token and
Token ID290
Feature activation+1.99e-02

Pos logit contributions
ands+0.043
 oct+0.041
rim+0.038
 ig+0.035
rr+0.035
Neg logit contributions
Aud-0.085
 Heidi-0.084
 Sharon-0.081
 Stephen-0.080
 Michelle-0.080
Token his
Token ID465
Feature activation-2.31e-02

Pos logit contributions
ands+0.030
 oct+0.029
rim+0.028
orse+0.028
ulu+0.027
Neg logit contributions
 Heidi-0.033
Aud-0.033
rified-0.033
ifying-0.032
 Stephen-0.032
Token wings
Token ID12098
Feature activation+8.28e-03

Pos logit contributions
 Heidi+0.032
Aud+0.032
 Sharon+0.031
 Alice+0.031
rified+0.031
Neg logit contributions
ands-0.043
rim-0.041
orse-0.040
 oct-0.040
rr-0.039
Token with
Token ID351
Feature activation-1.30e-03

Pos logit contributions
Aud+0.001
 Heidi+0.001
 Sharon+0.001
 Stephen+0.001
 redd+0.001
Neg logit contributions
ands-0.001
rim-0.001
 oct-0.001
orse-0.001
rr-0.001
Token the
Token ID262
Feature activation+5.23e-03

Pos logit contributions
Aud+0.002
ifying+0.002
 redd+0.002
cases+0.002
 Scarlett+0.002
Neg logit contributions
ands-0.002
 oct-0.002
rim-0.002
orse-0.002
ulu-0.002
Token worm
Token ID23849
Feature activation+2.04e-02

Pos logit contributions
ands+0.009
rim+0.008
orse+0.008
 oct+0.008
rr+0.008
Neg logit contributions
 Heidi-0.008
Aud-0.008
 redd-0.008
rified-0.008
 Sharon-0.008
Token in
Token ID287
Feature activation-3.88e-03

Pos logit contributions
ands+0.026
orse+0.025
rim+0.025
 oct+0.024
ulu+0.022
Neg logit contributions
Aud-0.041
 Heidi-0.040
 Sharon-0.039
 Stephen-0.039
 redd-0.039
Token its
Token ID663
Feature activation+2.02e-02

Pos logit contributions
Aud+0.006
 redd+0.006
 Heidi+0.006
rified+0.006
cases+0.006
Neg logit contributions
ands-0.006
rim-0.006
 oct-0.006
orse-0.006
ulu-0.006
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.035
rim+0.033
 oct+0.031
ulu+0.030
rr+0.030
Neg logit contributions
 Heidi-0.032
Aud-0.031
 blonde-0.031
ifying-0.030
 Monica-0.030
Tokenak
Token ID461
Feature activation+8.12e-02

Pos logit contributions
ands+0.172
 oct+0.170
rim+0.155
ulu+0.143
rr+0.136
Neg logit contributions
Aud-0.263
 blonde-0.256
 Heidi-0.244
 Alice-0.238
 Michelle-0.236
Token.
Token ID13
Feature activation+2.34e-03

Pos logit contributions
ands+0.088
rim+0.084
orse+0.081
 oct+0.081
rr+0.075
Neg logit contributions
Aud-0.176
 Heidi-0.175
 Sharon-0.171
 Stephen-0.168
 Alice-0.167
Token She
Token ID1375
Feature activation+1.08e-02

Pos logit contributions
ands+0.004
orse+0.004
 oct+0.004
rr+0.004
rim+0.004
Neg logit contributions
Aud-0.003
 redd-0.003
 Heidi-0.003
ifying-0.003
rified-0.003
Token couldn
Token ID3521
Feature activation-1.90e-02

Pos logit contributions
ands+0.018
rim+0.017
orse+0.017
 oct+0.017
ulu+0.016
Neg logit contributions
Aud-0.017
 Heidi-0.017
ifying-0.017
 redd-0.016
 Sharon-0.016
Token't
Token ID470
Feature activation+4.97e-02

Pos logit contributions
Aud+0.033
 blonde+0.033
 Heidi+0.033
 redd+0.033
 Sharon+0.032
Neg logit contributions
ands-0.028
rim-0.027
orse-0.026
 oct-0.025
ulu-0.025
Token be
Token ID307
Feature activation+1.14e-01

Pos logit contributions
Aud+0.001
 Heidi+0.001
 Alice+0.001
 blonde+0.001
 Sharon+0.001
Neg logit contributions
ands-0.001
rim-0.001
orse-0.001
 oct-0.001
rr-0.001
Tokenak
Token ID461
Feature activation+6.61e-02

Pos logit contributions
ands+0.149
 oct+0.147
rim+0.144
orse+0.132
rr+0.123
Neg logit contributions
Aud-0.230
 Heidi-0.220
 blonde-0.214
 Alice-0.210
 Audrey-0.207
Token to
Token ID284
Feature activation+9.50e-03

Pos logit contributions
ands+0.110
 oct+0.100
rim+0.099
rr+0.096
orse+0.096
Neg logit contributions
 Heidi-0.107
Aud-0.106
 Sharon-0.105
 Stephen-0.103
ifying-0.102
Token break
Token ID2270
Feature activation+3.91e-02

Pos logit contributions
ands+0.013
rim+0.012
orse+0.012
 oct+0.011
rr+0.011
Neg logit contributions
 Heidi-0.018
Aud-0.018
 Alice-0.017
 Sharon-0.017
 blonde-0.017
Token the
Token ID262
Feature activation+1.22e-02

Pos logit contributions
ands+0.056
 oct+0.054
rim+0.054
ulu+0.051
 myself+0.046
Neg logit contributions
Aud-0.073
 blonde-0.069
 Heidi-0.069
ifying-0.069
 Stephen-0.068
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.022
rim+0.021
orse+0.021
 oct+0.020
rr+0.019
Neg logit contributions
 Heidi-0.018
Aud-0.018
 redd-0.018
 blonde-0.017
 Sharon-0.017
Tokeneh
Token ID17231
Feature activation+1.29e-02

Pos logit contributions
ands+0.160
rim+0.151
 oct+0.148
orse+0.134
rr+0.132
Neg logit contributions
Aud-0.279
 Heidi-0.270
 blonde-0.265
 Alice-0.261
 redd-0.258
Tokenive
Token ID425
Feature activation+3.15e-02

Pos logit contributions
ands+0.025
rim+0.024
orse+0.023
 oct+0.023
rr+0.022
Neg logit contributions
Aud-0.018
 Heidi-0.017
rified-0.017
ifying-0.016
 redd-0.016
Token and
Token ID290
Feature activation-4.46e-03

Pos logit contributions
ands+0.044
rim+0.041
orse+0.039
 oct+0.039
ulu+0.037
Neg logit contributions
 Heidi-0.059
Aud-0.059
 Sharon-0.058
 Stephen-0.058
 blonde-0.057
Token let
Token ID1309
Feature activation-4.07e-02

Pos logit contributions
Aud+0.009
rified+0.009
 Heidi+0.009
 blonde+0.009
ifying+0.009
Neg logit contributions
ands-0.005
orse-0.005
rim-0.005
 oct-0.005
ulu-0.005
Token the
Token ID262
Feature activation+1.22e-03

Pos logit contributions
ifying+0.078
 blonde+0.077
Aud+0.072
 wrists+0.070
 redd+0.069
Neg logit contributions
ulu-0.053
rim-0.052
orse-0.052
 oct-0.051
ands-0.049
Token pe
Token ID613
Feature activation+7.38e-02

Pos logit contributions
ands+0.010
rim+0.010
inations+0.009
orse+0.009
 seek+0.008
Neg logit contributions
 redd-0.008
 Tab-0.008
 blonde-0.007
se-0.007
 sho-0.007
Tokenb
Token ID65
Feature activation+6.67e-02

Pos logit contributions
ands+0.068
rim+0.061
 oct+0.059
 seek+0.053
 hour+0.051
Neg logit contributions
 redd-0.166
Aud-0.161
licks-0.160
se-0.160
 blonde-0.159
Tokenble
Token ID903
Feature activation+4.92e-02

Pos logit contributions
ands+0.057
rim+0.050
 oct+0.041
orse+0.040
inations+0.040
Neg logit contributions
se-0.158
 Heidi-0.158
rified-0.157
 redd-0.156
ifying-0.156
Token in
Token ID287
Feature activation-3.01e-03

Pos logit contributions
ands+0.067
rim+0.063
orse+0.061
 oct+0.060
rr+0.057
Neg logit contributions
Aud-0.094
 Heidi-0.093
 Sharon-0.090
 Alice-0.090
 Stephen-0.089
Token his
Token ID465
Feature activation+1.89e-02

Pos logit contributions
Aud+0.006
cases+0.006
ifying+0.006
rified+0.006
 Heidi+0.006
Neg logit contributions
ands-0.004
 oct-0.004
orse-0.003
rim-0.003
rr-0.003
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.040
rim+0.039
ulu+0.036
inations+0.036
 oct+0.035
Neg logit contributions
Aud-0.023
 blonde-0.023
 Heidi-0.022
 Monica-0.021
 Alice-0.021
Tokenak
Token ID461
Feature activation+8.09e-02

Pos logit contributions
ands+0.189
 oct+0.187
rim+0.182
 prank+0.162
ulu+0.159
Neg logit contributions
Aud-0.242
 blonde-0.238
Scar-0.213
 Heidi-0.212
se-0.212
Token,
Token ID11
Feature activation+5.28e-03

Pos logit contributions
ands+0.102
rim+0.096
orse+0.093
 oct+0.093
rr+0.088
Neg logit contributions
Aud-0.163
 Heidi-0.161
 Sharon-0.156
 Stephen-0.155
 Alice-0.155
Token he
Token ID339
Feature activation-1.03e-02

Pos logit contributions
 himself+0.003
 oct+0.003
orse+0.003
 seek+0.003
 sound+0.003
Neg logit contributions
Aud-0.014
ifying-0.013
 wrists-0.013
se-0.013
licks-0.013
Token saw
Token ID2497
Feature activation+1.26e-02

Pos logit contributions
 Heidi+0.016
Aud+0.016
 Alice+0.015
 Sharon+0.015
 Stephen+0.015
Neg logit contributions
ands-0.018
rim-0.017
orse-0.017
 oct-0.016
rr-0.016
Token his
Token ID465
Feature activation-2.41e-02

Pos logit contributions
ands+0.020
rim+0.019
 oct+0.018
orse+0.018
rr+0.017
Neg logit contributions
Aud-0.021
 Heidi-0.021
 Sharon-0.020
 Alice-0.020
 blonde-0.020
Token cr
Token ID1067
Feature activation-1.22e-02

Pos logit contributions
ands+0.020
rim+0.020
orse+0.018
inations+0.017
ulu+0.017
Neg logit contributions
 redd-0.015
 blonde-0.014
ifying-0.014
rified-0.013
Aud-0.013
Tokenay
Token ID323
Feature activation+4.47e-02

Pos logit contributions
ifying+0.011
Aud+0.010
se+0.010
string+0.010
licks+0.010
Neg logit contributions
ands-0.028
rim-0.027
 oct-0.027
orse-0.027
 seek-0.026
Tokenon
Token ID261
Feature activation+1.54e-02

Pos logit contributions
ands+0.078
rim+0.075
 oct+0.071
orse+0.069
 seek+0.067
Neg logit contributions
Aud-0.065
 Heidi-0.063
ifying-0.063
 Alice-0.062
se-0.062
Token in
Token ID287
Feature activation-3.41e-03

Pos logit contributions
ands+0.028
orse+0.026
rim+0.025
 oct+0.025
rr+0.024
Neg logit contributions
Aud-0.023
 Heidi-0.023
 Sharon-0.022
 Stephen-0.022
rified-0.021
Token its
Token ID663
Feature activation+1.95e-02

Pos logit contributions
Aud+0.005
 Heidi+0.005
rified+0.005
 Sharon+0.005
 Stephen+0.005
Neg logit contributions
ands-0.006
rim-0.006
 oct-0.005
orse-0.005
rr-0.005
Token be
Token ID307
Feature activation+1.34e-01

Pos logit contributions
ands+0.038
rim+0.037
 oct+0.034
orse+0.034
ulu+0.034
Neg logit contributions
Aud-0.026
 Heidi-0.025
 blonde-0.025
ifying-0.025
 redd-0.025
Tokenak
Token ID461
Feature activation+7.78e-02

Pos logit contributions
ands+0.152
rim+0.143
 oct+0.141
ulu+0.123
rr+0.123
Neg logit contributions
Aud-0.282
 blonde-0.278
 Heidi-0.267
 Alice-0.263
 redd-0.262
Token,
Token ID11
Feature activation+8.08e-03

Pos logit contributions
ands+0.084
 oct+0.076
rim+0.076
orse+0.074
rr+0.069
Neg logit contributions
Aud-0.168
 Heidi-0.168
 Sharon-0.163
 Stephen-0.161
 redd-0.160
Token and
Token ID290
Feature activation-2.52e-02

Pos logit contributions
orse+0.010
ands+0.009
 oct+0.009
rim+0.009
inations+0.008
Neg logit contributions
rified-0.016
Aud-0.016
 Heidi-0.016
 Sharon-0.015
ifying-0.015
Token together
Token ID1978
Feature activation-1.53e-02

Pos logit contributions
Aud+0.044
 Heidi+0.044
rified+0.043
ifying+0.043
 blonde+0.042
Neg logit contributions
orse-0.035
 oct-0.035
ands-0.033
rim-0.033
ulu-0.032
Token they
Token ID484
Feature activation-5.80e-03

Pos logit contributions
 redd+0.030
rified+0.030
Aud+0.029
 Heidi+0.029
ifying+0.029
Neg logit contributions
orse-0.019
ands-0.019
 oct-0.018
ulu-0.017
rim-0.017
Token something
Token ID1223
Feature activation-3.02e-02

Pos logit contributions
ands+0.006
 oct+0.005
 himself+0.005
 myself+0.005
 adventure+0.005
Neg logit contributions
Aud-0.012
rified-0.012
ifying-0.012
 blonde-0.012
 redd-0.012
Token new
Token ID649
Feature activation-4.22e-02

Pos logit contributions
Aud+0.062
 Heidi+0.060
rified+0.059
ifying+0.059
 blonde+0.059
Neg logit contributions
ands-0.036
rim-0.035
 oct-0.035
orse-0.031
ulu-0.031
Token!
Token ID0
Feature activation+1.07e-02

Pos logit contributions
 Heidi+0.073
Aud+0.071
 Alice+0.071
 Stephen+0.070
 Sharon+0.070
Neg logit contributions
ands-0.067
rim-0.064
orse-0.060
 oct-0.058
rr-0.057
Token She
Token ID1375
Feature activation+1.37e-02

Pos logit contributions
ands+0.020
 oct+0.019
orse+0.019
ulu+0.019
rim+0.019
Neg logit contributions
Aud-0.014
 redd-0.013
 Heidi-0.013
 blonde-0.013
ifying-0.013
Token thought
Token ID1807
Feature activation+2.42e-02

Pos logit contributions
ands+0.023
rim+0.022
orse+0.021
 oct+0.021
ulu+0.020
Neg logit contributions
Aud-0.022
 Heidi-0.022
rified-0.021
 Sharon-0.021
ifying-0.021
Token av
Token ID1196
Feature activation-1.82e-01

Pos logit contributions
ands+0.035
orse+0.032
 oct+0.032
ulu+0.031
inations+0.030
Neg logit contributions
Aud-0.045
rified-0.044
ifying-0.044
 redd-0.042
se-0.042
Tokenoc
Token ID420
Feature activation-5.36e-02

Pos logit contributions
 Heidi+0.374
Aud+0.365
 Alice+0.364
 Sharon+0.364
 Stephen+0.356
Neg logit contributions
ands-0.228
rim-0.212
orse-0.196
rr-0.194
 oct-0.193
Tokenados
Token ID22484
Feature activation-1.19e-01

Pos logit contributions
ifying+0.066
Aud+0.065
 blonde+0.061
 Scarlett+0.059
 Heidi+0.059
Neg logit contributions
ands-0.115
rim-0.110
 oct-0.107
rr-0.102
 myself-0.100
Token were
Token ID547
Feature activation-1.47e-02

Pos logit contributions
Aud+0.169
 Heidi+0.168
 Sharon+0.161
 Alice+0.158
 Stephen+0.158
Neg logit contributions
ands-0.221
rim-0.210
 oct-0.202
orse-0.201
ulu-0.195
Token the
Token ID262
Feature activation-1.27e-02

Pos logit contributions
Aud+0.030
rified+0.030
 Heidi+0.029
ifying+0.029
 Sharon+0.029
Neg logit contributions
ands-0.018
 oct-0.017
rim-0.017
ulu-0.015
orse-0.015
Token most
Token ID749
Feature activation+6.45e-03

Pos logit contributions
Aud+0.023
 Heidi+0.022
ifying+0.021
 Sharon+0.021
 Alice+0.021
Neg logit contributions
ands-0.019
 oct-0.018
rim-0.017
rr-0.016
ulu-0.016
Token something
Token ID1223
Feature activation-2.58e-02

Pos logit contributions
ands+0.009
 oct+0.009
 himself+0.008
 myself+0.008
 adventure+0.007
Neg logit contributions
Aud-0.019
rified-0.019
ifying-0.019
 blonde-0.018
 redd-0.018
Token new
Token ID649
Feature activation-3.93e-02

Pos logit contributions
Aud+0.052
 Heidi+0.051
rified+0.050
ifying+0.050
 blonde+0.050
Neg logit contributions
ands-0.032
rim-0.030
 oct-0.030
orse-0.028
ulu-0.027
Token!
Token ID0
Feature activation+1.18e-02

Pos logit contributions
 Heidi+0.069
Aud+0.068
 Alice+0.067
 Sharon+0.066
 Stephen+0.066
Neg logit contributions
ands-0.061
rim-0.058
orse-0.054
 oct-0.053
rr-0.052
Token She
Token ID1375
Feature activation+1.34e-02

Pos logit contributions
ands+0.023
 oct+0.022
orse+0.022
ulu+0.022
rim+0.021
Neg logit contributions
Aud-0.014
 redd-0.014
 blonde-0.014
ifying-0.014
 Heidi-0.014
Token thought
Token ID1807
Feature activation+2.57e-02

Pos logit contributions
ands+0.022
rim+0.021
orse+0.021
 oct+0.021
ulu+0.019
Neg logit contributions
Aud-0.022
 Heidi-0.022
rified-0.021
ifying-0.021
 Sharon-0.021
Token av
Token ID1196
Feature activation-1.82e-01

Pos logit contributions
ands+0.036
 oct+0.033
orse+0.033
ulu+0.031
inations+0.031
Neg logit contributions
Aud-0.048
ifying-0.048
rified-0.048
 redd-0.045
 Heidi-0.045
Tokenoc
Token ID420
Feature activation-4.55e-02

Pos logit contributions
 Heidi+0.374
 Sharon+0.365
 Alice+0.365
Aud+0.364
 Stephen+0.356
Neg logit contributions
ands-0.226
rim-0.209
orse-0.193
rr-0.192
 oct-0.188
Tokenados
Token ID22484
Feature activation-1.17e-01

Pos logit contributions
ifying+0.056
Aud+0.055
 blonde+0.052
 Heidi+0.051
 Scarlett+0.050
Neg logit contributions
ands-0.098
rim-0.093
 oct-0.090
rr-0.088
 myself-0.084
Token were
Token ID547
Feature activation-1.42e-02

Pos logit contributions
Aud+0.167
 Heidi+0.166
 Sharon+0.159
 Alice+0.156
 Stephen+0.156
Neg logit contributions
ands-0.216
rim-0.205
 oct-0.198
orse-0.197
ulu-0.190
Token the
Token ID262
Feature activation-1.20e-02

Pos logit contributions
Aud+0.029
rified+0.029
 Heidi+0.028
ifying+0.028
 Sharon+0.028
Neg logit contributions
ands-0.017
 oct-0.016
rim-0.016
ulu-0.015
orse-0.014
Token most
Token ID749
Feature activation+7.98e-03

Pos logit contributions
Aud+0.022
 Heidi+0.021
 Sharon+0.020
ifying+0.020
 Alice+0.020
Neg logit contributions
ands-0.018
 oct-0.017
rim-0.016
rr-0.015
ulu-0.015
Token,
Token ID11
Feature activation+3.90e-03

Pos logit contributions
 Heidi+0.081
Aud+0.078
 Alice+0.078
 redd+0.076
 Sharon+0.076
Neg logit contributions
ands-0.096
rim-0.092
orse-0.088
inations-0.086
rr-0.083
Token sauce
Token ID10746
Feature activation-6.60e-02

Pos logit contributions
ands+0.007
rim+0.007
orse+0.006
 oct+0.006
rr+0.006
Neg logit contributions
Aud-0.006
 Heidi-0.006
 Sharon-0.005
 Alice-0.005
 Stephen-0.005
Token,
Token ID11
Feature activation-6.76e-03

Pos logit contributions
 Heidi+0.106
Aud+0.104
 Alice+0.102
 Sharon+0.100
 Stephen+0.100
Neg logit contributions
ands-0.108
rim-0.103
orse-0.098
 oct-0.095
inations-0.094
Token cheese
Token ID9891
Feature activation-7.83e-02

Pos logit contributions
 Heidi+0.010
Aud+0.010
 Alice+0.010
 Sharon+0.010
 blonde+0.010
Neg logit contributions
ands-0.012
rim-0.011
orse-0.011
 oct-0.011
rr-0.011
Token and
Token ID290
Feature activation-2.05e-02

Pos logit contributions
 Heidi+0.128
Aud+0.127
 Alice+0.123
 Sharon+0.122
 Stephen+0.122
Neg logit contributions
ands-0.126
rim-0.120
orse-0.113
 oct-0.111
inations-0.109
Token topp
Token ID23126
Feature activation-1.79e-01

Pos logit contributions
Aud+0.032
 Heidi+0.031
rified+0.030
 Sharon+0.030
 Stephen+0.030
Neg logit contributions
ands-0.036
rim-0.033
 oct-0.033
orse-0.033
rr-0.031
Tokenings
Token ID654
Feature activation-8.32e-02

Pos logit contributions
Aud+0.253
 Heidi+0.249
 Sharon+0.240
 Alice+0.237
 Stephen+0.235
Neg logit contributions
ands-0.340
rim-0.328
orse-0.313
rr-0.312
 oct-0.311
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
Aud+0.187
 Heidi+0.181
 redd+0.181
 blonde+0.180
 Sharon+0.177
Neg logit contributions
ands-0.089
 oct-0.077
rim-0.074
rr-0.069
ulu-0.066
Token She
Token ID1375
Feature activation+1.08e-03

Pos logit contributions
Aud+0.023
 Heidi+0.022
 Stephen+0.022
se+0.021
rified+0.021
Neg logit contributions
ands-0.021
 oct-0.020
ulu-0.019
orse-0.018
rim-0.018
Token showed
Token ID3751
Feature activation-2.85e-02

Pos logit contributions
ands+0.002
rim+0.002
orse+0.002
 oct+0.002
rr+0.002
Neg logit contributions
Aud-0.002
 Heidi-0.002
 Sharon-0.002
 Alice-0.002
 Stephen-0.002
Token them
Token ID606
Feature activation-9.16e-03

Pos logit contributions
Aud+0.058
 Heidi+0.055
 blonde+0.055
ifying+0.054
 redd+0.053
Neg logit contributions
 oct-0.036
ands-0.035
ulu-0.032
rim-0.031
rr-0.029
Token.
Token ID13
Feature activation-5.50e-03

Pos logit contributions
ands+0.035
rim+0.034
orse+0.031
 oct+0.031
rr+0.030
Neg logit contributions
 Heidi-0.049
Aud-0.049
 Alice-0.048
 Sharon-0.048
 Stephen-0.048
Token They
Token ID1119
Feature activation+9.04e-03

Pos logit contributions
Aud+0.007
 Heidi+0.007
rified+0.007
 Stephen+0.007
se+0.007
Neg logit contributions
ands-0.010
orse-0.010
ulu-0.010
rim-0.010
 oct-0.010
Token were
Token ID547
Feature activation-1.36e-02

Pos logit contributions
ands+0.016
rim+0.015
 oct+0.015
orse+0.015
ulu+0.014
Neg logit contributions
 Heidi-0.014
Aud-0.014
 Sharon-0.013
rified-0.013
 Stephen-0.013
Token called
Token ID1444
Feature activation-4.59e-03

Pos logit contributions
rified+0.027
Aud+0.027
ifying+0.027
 Heidi+0.026
 Sharon+0.025
Neg logit contributions
ands-0.017
 oct-0.016
rim-0.016
ulu-0.015
orse-0.015
Token or
Token ID393
Feature activation-5.70e-02

Pos logit contributions
rified+0.008
Aud+0.008
 Heidi+0.008
 redd+0.008
ifying+0.008
Neg logit contributions
 oct-0.007
ands-0.007
rr-0.006
 ig-0.006
orse-0.006
Tokennam
Token ID7402
Feature activation-1.76e-01

Pos logit contributions
 Heidi+0.047
Aud+0.047
rified+0.046
 Sharon+0.041
 Stephen+0.041
Neg logit contributions
ands-0.139
 oct-0.135
orse-0.132
rr-0.130
rim-0.129
Tokenents
Token ID658
Feature activation-1.92e-02

Pos logit contributions
 Heidi+0.228
Aud+0.225
rified+0.214
 Sharon+0.214
 Alice+0.214
Neg logit contributions
ands-0.350
rim-0.334
orse-0.322
 oct-0.318
rr-0.317
Token and
Token ID290
Feature activation-3.72e-03

Pos logit contributions
Aud+0.036
 Heidi+0.035
 blonde+0.034
 Sharon+0.034
rified+0.034
Neg logit contributions
ands-0.027
rim-0.025
 oct-0.025
orse-0.024
ulu-0.024
Token they
Token ID484
Feature activation-3.05e-02

Pos logit contributions
 Heidi+0.006
Aud+0.006
rified+0.006
 Stephen+0.006
 Sharon+0.006
Neg logit contributions
ands-0.006
orse-0.006
 oct-0.006
rim-0.006
rr-0.005
Token were
Token ID547
Feature activation-1.44e-02

Pos logit contributions
 Heidi+0.045
Aud+0.045
 Sharon+0.042
rified+0.042
 Alice+0.042
Neg logit contributions
ands-0.055
rim-0.052
orse-0.051
 oct-0.050
rr-0.048
Token very
Token ID845
Feature activation-3.83e-03

Pos logit contributions
rified+0.029
Aud+0.028
 Heidi+0.028
ifying+0.027
 Michelle+0.026
Neg logit contributions
ands-0.019
 oct-0.017
orse-0.017
rim-0.017
ulu-0.016
Token of
Token ID286
Feature activation-3.20e-02

Pos logit contributions
rim+0.001
orse+0.001
ands+0.001
 oct+0.001
rr+0.001
Neg logit contributions
 Heidi-0.001
Aud-0.001
ifying-0.001
 Alice-0.001
 Sharon-0.001
Token cheese
Token ID9891
Feature activation-5.47e-02

Pos logit contributions
 blonde+0.047
 Alice+0.046
 Heidi+0.045
 Stephen+0.044
 Sharon+0.044
Neg logit contributions
rim-0.054
orse-0.053
ands-0.052
ulu-0.052
 oct-0.049
Token and
Token ID290
Feature activation+3.66e-03

Pos logit contributions
Aud+0.088
 Heidi+0.088
 Alice+0.085
ifying+0.084
 Stephen+0.084
Neg logit contributions
ands-0.088
rim-0.085
orse-0.081
 oct-0.079
ulu-0.077
Token sauce
Token ID10746
Feature activation-6.92e-02

Pos logit contributions
ands+0.006
 oct+0.006
rim+0.006
orse+0.006
rr+0.005
Neg logit contributions
Aud-0.006
 Heidi-0.006
rified-0.006
 Sharon-0.006
 Stephen-0.005
Token and
Token ID290
Feature activation-1.30e-02

Pos logit contributions
Aud+0.123
 Heidi+0.123
 Alice+0.119
 Sharon+0.118
 Stephen+0.118
Neg logit contributions
ands-0.101
rim-0.097
orse-0.092
 oct-0.090
rr-0.086
Token topp
Token ID23126
Feature activation-1.71e-01

Pos logit contributions
Aud+0.021
 Heidi+0.020
rified+0.020
 Sharon+0.020
 Stephen+0.019
Neg logit contributions
ands-0.022
rim-0.021
 oct-0.021
orse-0.020
rr-0.019
Tokenings
Token ID654
Feature activation-7.94e-02

Pos logit contributions
Aud+0.244
 Heidi+0.241
 Sharon+0.232
rified+0.229
 Alice+0.228
Neg logit contributions
ands-0.323
rim-0.310
rr-0.297
orse-0.296
 oct-0.293
Token.
Token ID13
Feature activation-4.92e-03

Pos logit contributions
 redd+0.170
Aud+0.169
 blonde+0.169
 Heidi+0.167
rified+0.163
Neg logit contributions
ands-0.094
rim-0.080
rr-0.079
 oct-0.078
orse-0.072
Token The
Token ID383
Feature activation+5.75e-03

Pos logit contributions
Aud+0.006
 Heidi+0.006
rified+0.006
 blonde+0.006
 redd+0.006
Neg logit contributions
ands-0.009
rim-0.009
orse-0.009
 oct-0.009
rr-0.009
Token people
Token ID661
Feature activation-3.26e-02

Pos logit contributions
ands+0.010
rim+0.010
orse+0.009
ulu+0.009
rr+0.009
Neg logit contributions
 redd-0.008
 Heidi-0.008
ifying-0.008
 Stephen-0.008
 Sharon-0.008
Token who
Token ID508
Feature activation-2.05e-02

Pos logit contributions
Aud+0.055
 Heidi+0.054
 Sharon+0.052
 blonde+0.052
rified+0.052
Neg logit contributions
ands-0.053
rim-0.050
orse-0.048
 oct-0.047
rr-0.046
Token,
Token ID11
Feature activation-1.12e-02

Pos logit contributions
Aud+0.177
 Heidi+0.176
 redd+0.171
 Sharon+0.170
rified+0.170
Neg logit contributions
ands-0.110
rim-0.100
orse-0.098
 oct-0.097
rr-0.093
Token scar
Token ID10153
Feature activation-4.56e-02

Pos logit contributions
 Heidi+0.018
Aud+0.018
rified+0.017
 Sharon+0.017
 Alice+0.017
Neg logit contributions
ands-0.019
rim-0.018
 oct-0.018
orse-0.017
rr-0.016
Tokenves
Token ID1158
Feature activation-1.18e-01

Pos logit contributions
Aud+0.068
 Heidi+0.067
 Alice+0.065
 Stephen+0.065
 Sharon+0.064
Neg logit contributions
ands-0.078
rim-0.075
 oct-0.074
orse-0.072
ulu-0.070
Token,
Token ID11
Feature activation-1.30e-02

Pos logit contributions
 Heidi+0.220
Aud+0.219
 Sharon+0.212
 redd+0.211
rified+0.211
Neg logit contributions
ands-0.169
rim-0.156
orse-0.150
 oct-0.149
rr-0.143
Token and
Token ID290
Feature activation-3.72e-02

Pos logit contributions
 Heidi+0.022
Aud+0.021
rified+0.021
 Sharon+0.021
 Alice+0.021
Neg logit contributions
ands-0.021
rim-0.020
 oct-0.019
orse-0.019
rr-0.018
Token swe
Token ID3490
Feature activation-1.71e-01

Pos logit contributions
 Heidi+0.061
rified+0.060
Aud+0.060
 Sharon+0.058
 Alice+0.057
Neg logit contributions
ands-0.062
 oct-0.056
orse-0.056
rim-0.055
rr-0.053
Tokenaters
Token ID8605
Feature activation-8.15e-02

Pos logit contributions
 Heidi+0.170
Aud+0.167
 Sharon+0.158
 Alice+0.157
rified+0.155
Neg logit contributions
ands-0.394
rim-0.378
orse-0.363
 oct-0.361
rr-0.359
Token.
Token ID13
Feature activation-1.98e-02

Pos logit contributions
Aud+0.149
 Heidi+0.149
 Alice+0.145
ifying+0.145
 Sharon+0.144
Neg logit contributions
ands-0.114
rim-0.109
orse-0.103
 oct-0.101
rr-0.097
Token He
Token ID679
Feature activation+1.26e-03

Pos logit contributions
Aud+0.031
rified+0.029
se+0.029
 Heidi+0.029
ifying+0.029
Neg logit contributions
ands-0.034
 oct-0.031
orse-0.031
rim-0.031
ulu-0.030
Token is
Token ID318
Feature activation-4.13e-03

Pos logit contributions
ands+0.002
 oct+0.002
rim+0.002
orse+0.002
rr+0.002
Neg logit contributions
 Heidi-0.002
rified-0.002
Aud-0.002
ifying-0.002
 Michelle-0.002
Token very
Token ID845
Feature activation-5.48e-03

Pos logit contributions
Aud+0.007
 Heidi+0.007
 Sharon+0.007
rified+0.007
 Alice+0.007
Neg logit contributions
ands-0.006
rim-0.006
orse-0.006
 oct-0.006
rr-0.005
Token tree
Token ID5509
Feature activation-1.60e-02

Pos logit contributions
Aud+0.025
 Heidi+0.025
rified+0.024
ifying+0.024
 Sharon+0.024
Neg logit contributions
ands-0.021
 oct-0.021
rim-0.020
orse-0.020
rr-0.019
Token with
Token ID351
Feature activation-2.23e-02

Pos logit contributions
 Heidi+0.029
Aud+0.028
 Alice+0.028
 Sharon+0.028
ifying+0.027
Neg logit contributions
ands-0.024
rim-0.022
orse-0.021
 oct-0.021
rr-0.020
Token lights
Token ID7588
Feature activation-3.24e-02

Pos logit contributions
Aud+0.041
 Scarlett+0.040
 Heidi+0.039
rified+0.039
Scar+0.038
Neg logit contributions
 oct-0.032
ands-0.031
 ig-0.030
rim-0.028
ulu-0.027
Token and
Token ID290
Feature activation-1.81e-02

Pos logit contributions
Aud+0.067
 Heidi+0.067
 Sharon+0.064
 redd+0.064
 Scarlett+0.064
Neg logit contributions
ands-0.039
 oct-0.036
rim-0.035
orse-0.035
rr-0.033
Token or
Token ID393
Feature activation-5.61e-02

Pos logit contributions
Aud+0.029
 Heidi+0.029
rified+0.028
 Sharon+0.028
 Stephen+0.028
Neg logit contributions
ands-0.030
 oct-0.027
rim-0.027
orse-0.027
rr-0.026
Tokennam
Token ID7402
Feature activation-1.65e-01

Pos logit contributions
Aud+0.102
 Heidi+0.099
rified+0.098
 Sharon+0.094
 Michelle+0.093
Neg logit contributions
ands-0.081
 oct-0.078
orse-0.073
rim-0.070
rr-0.070
Tokenents
Token ID658
Feature activation+7.41e-04

Pos logit contributions
 Heidi+0.215
Aud+0.212
 Sharon+0.203
rified+0.202
 Alice+0.202
Neg logit contributions
ands-0.329
rim-0.313
orse-0.303
 oct-0.298
rr-0.297
Token.
Token ID13
Feature activation-1.22e-02

Pos logit contributions
ands+0.001
rim+0.001
orse+0.001
 oct+0.001
ulu+0.001
Neg logit contributions
Aud-0.002
 Heidi-0.002
 Sharon-0.001
 blonde-0.001
 Stephen-0.001
Token They
Token ID1119
Feature activation+4.96e-03

Pos logit contributions
Aud+0.019
se+0.018
 redd+0.017
ifying+0.017
string+0.017
Neg logit contributions
 oct-0.020
ands-0.020
orse-0.020
ulu-0.019
rr-0.018
Token also
Token ID635
Feature activation+3.23e-02

Pos logit contributions
ands+0.008
 oct+0.007
orse+0.007
rr+0.007
ulu+0.007
Neg logit contributions
Aud-0.008
 Heidi-0.008
ifying-0.008
rified-0.008
 Michelle-0.008
Token saw
Token ID2497
Feature activation+8.51e-03

Pos logit contributions
ands+0.061
rim+0.059
orse+0.055
 oct+0.055
ulu+0.053
Neg logit contributions
Aud-0.045
 Heidi-0.045
 blonde-0.044
 Sharon-0.044
 Alice-0.043
Token girl
Token ID2576
Feature activation+3.91e-03

Pos logit contributions
ands+0.012
rim+0.011
orse+0.011
inations+0.010
rr+0.010
Neg logit contributions
 Heidi-0.009
Aud-0.009
 Sharon-0.009
 redd-0.009
 Alice-0.009
Token who
Token ID508
Feature activation-2.54e-02

Pos logit contributions
ands+0.007
inations+0.006
rim+0.006
rr+0.006
ulu+0.006
Neg logit contributions
 Alice-0.006
 blonde-0.006
 Sharon-0.006
 Monica-0.006
 Heidi-0.006
Token didn
Token ID1422
Feature activation-2.75e-02

Pos logit contributions
 Heidi+0.036
Aud+0.036
 blonde+0.035
 Alice+0.035
 Sharon+0.035
Neg logit contributions
ands-0.047
rim-0.045
orse-0.043
 oct-0.042
rr-0.041
Token't
Token ID470
Feature activation+4.01e-02

Pos logit contributions
 Heidi+0.034
Aud+0.034
 Alice+0.034
ifying+0.033
 Sharon+0.033
Neg logit contributions
ands-0.055
rim-0.053
 oct-0.051
orse-0.050
rr-0.048
Token like
Token ID588
Feature activation+5.71e-03

Pos logit contributions
rim+0.080
ands+0.077
rr+0.072
 oct+0.070
inations+0.068
Neg logit contributions
 blonde-0.050
 Alice-0.047
 redd-0.046
 It-0.046
 Sharon-0.046
Token av
Token ID1196
Feature activation-1.62e-01

Pos logit contributions
ands+0.010
 oct+0.009
rim+0.009
orse+0.009
ulu+0.008
Neg logit contributions
Aud-0.009
 Heidi-0.009
rified-0.009
ifying-0.009
 Sharon-0.009
Tokenoc
Token ID420
Feature activation-3.70e-02

Pos logit contributions
 Heidi+0.306
 Sharon+0.300
Aud+0.299
 Alice+0.299
ifying+0.294
Neg logit contributions
ands-0.227
rim-0.212
orse-0.204
 oct-0.198
rr-0.196
Tokenados
Token ID22484
Feature activation-8.79e-02

Pos logit contributions
ifying+0.045
Aud+0.043
 Scarlett+0.040
 Monica+0.040
 Sharon+0.040
Neg logit contributions
ands-0.081
rim-0.076
 oct-0.075
rr-0.072
orse-0.070
Token.
Token ID13
Feature activation-1.09e-02

Pos logit contributions
 Heidi+0.173
Aud+0.173
 Alice+0.168
 Sharon+0.167
 Stephen+0.166
Neg logit contributions
ands-0.115
rim-0.107
orse-0.101
 oct-0.099
rr-0.096
Token She
Token ID1375
Feature activation+1.02e-02

Pos logit contributions
ifying+0.016
Aud+0.015
rified+0.014
 Stephen+0.014
cases+0.014
Neg logit contributions
ands-0.019
ulu-0.018
 hour-0.018
 oct-0.017
 myself-0.017
Token always
Token ID1464
Feature activation+4.15e-02

Pos logit contributions
ands+0.018
rim+0.017
orse+0.017
 oct+0.017
rr+0.016
Neg logit contributions
Aud-0.015
 Heidi-0.015
 Sharon-0.014
 Alice-0.014
 Stephen-0.014
Token.
Token ID13
Feature activation-2.85e-03

Pos logit contributions
Aud+0.202
rified+0.197
 Sharon+0.197
 Heidi+0.197
 blonde+0.194
Neg logit contributions
ands-0.126
 oct-0.116
rim-0.113
orse-0.112
ulu-0.106
Token Mom
Token ID11254
Feature activation-2.72e-02

Pos logit contributions
Aud+0.005
ifying+0.005
severe+0.005
se+0.005
Charl+0.005
Neg logit contributions
 oct-0.004
ulu-0.003
 o-0.003
 be-0.003
ia-0.003
Token says
Token ID1139
Feature activation+2.21e-02

Pos logit contributions
ifying+0.046
rified+0.044
 blonde+0.042
 redd+0.042
Aud+0.042
Neg logit contributions
 oct-0.043
ands-0.043
rr-0.043
rim-0.043
ulu-0.041
Token they
Token ID484
Feature activation-3.25e-02

Pos logit contributions
 oct+0.018
orse+0.018
ulu+0.016
ands+0.014
rr+0.013
Neg logit contributions
ifying-0.055
rified-0.054
Aud-0.053
se-0.052
severe-0.052
Token are
Token ID389
Feature activation-1.39e-02

Pos logit contributions
 Heidi+0.060
Aud+0.059
rified+0.056
 Sharon+0.055
 Audrey+0.055
Neg logit contributions
ands-0.048
 oct-0.044
inations-0.043
rr-0.042
orse-0.042
Token av
Token ID1196
Feature activation-1.61e-01

Pos logit contributions
rified+0.027
Aud+0.026
se+0.026
ifying+0.025
 Heidi+0.025
Neg logit contributions
ands-0.020
 oct-0.018
rim-0.018
ulu-0.018
orse-0.017
Tokenoc
Token ID420
Feature activation-3.53e-02

Pos logit contributions
 Heidi+0.325
 Sharon+0.317
 Alice+0.315
rified+0.314
Aud+0.313
Neg logit contributions
ands-0.210
rim-0.195
orse-0.184
rr-0.180
 oct-0.179
Tokenados
Token ID22484
Feature activation-1.14e-01

Pos logit contributions
ifying+0.040
Aud+0.039
 Heidi+0.037
 redd+0.037
 blonde+0.037
Neg logit contributions
ands-0.079
rim-0.075
 oct-0.072
rr-0.071
orse-0.068
Token.
Token ID13
Feature activation-7.78e-03

Pos logit contributions
Aud+0.217
 Heidi+0.217
 Alice+0.211
 Sharon+0.210
 Stephen+0.209
Neg logit contributions
ands-0.154
rim-0.145
orse-0.137
 oct-0.134
rr-0.129
Token Sara
Token ID24799
Feature activation-2.63e-03

Pos logit contributions
string+0.016
severe+0.016
Aud+0.016
se+0.015
ifying+0.015
Neg logit contributions
ulu-0.007
 oct-0.007
 be-0.006
ila-0.006
 o-0.006
Token wants
Token ID3382
Feature activation-1.55e-02

Pos logit contributions
ifying+0.005
Aud+0.004
 Heidi+0.004
rified+0.004
 Monica+0.004
Neg logit contributions
ands-0.004
 oct-0.004
rr-0.004
orse-0.004
rim-0.004
Token time
Token ID640
Feature activation-7.21e-03

Pos logit contributions
Aud+0.054
 Heidi+0.053
 Alice+0.052
 blonde+0.052
 Sharon+0.049
Neg logit contributions
ands-0.062
rim-0.057
 oct-0.055
orse-0.054
ulu-0.052
Token there
Token ID612
Feature activation-3.52e-02

Pos logit contributions
 Heidi+0.012
Aud+0.012
 Alice+0.012
 Stephen+0.012
 Sharon+0.012
Neg logit contributions
ands-0.012
rim-0.011
orse-0.010
 oct-0.010
inations-0.010
Token was
Token ID373
Feature activation-8.40e-03

Pos logit contributions
 Heidi+0.054
Aud+0.054
 Alice+0.052
 Sharon+0.052
 Stephen+0.052
Neg logit contributions
ands-0.061
rim-0.058
orse-0.055
 oct-0.055
rr-0.053
Token a
Token ID257
Feature activation-3.12e-02

Pos logit contributions
 Heidi+0.011
 blonde+0.011
 Alice+0.011
Aud+0.010
 Michelle+0.010
Neg logit contributions
ands-0.018
rim-0.017
orse-0.016
inations-0.015
rr-0.015
Token little
Token ID1310
Feature activation+2.35e-03

Pos logit contributions
 Heidi+0.049
Aud+0.049
 Sharon+0.048
 Alice+0.048
 redd+0.048
Neg logit contributions
ands-0.054
rim-0.050
orse-0.046
rr-0.046
inations-0.046
Token av
Token ID1196
Feature activation-1.60e-01

Pos logit contributions
ands+0.004
rim+0.004
orse+0.004
 oct+0.004
rr+0.004
Neg logit contributions
 Heidi-0.004
Aud-0.004
 Sharon-0.003
 Alice-0.003
 Stephen-0.003
Tokenac
Token ID330
Feature activation-6.35e-02

Pos logit contributions
Aud+0.271
 Heidi+0.268
 Sharon+0.257
 Stephen+0.257
 redd+0.257
Neg logit contributions
ands-0.253
rim-0.237
orse-0.224
 oct-0.223
ulu-0.218
Tokenado
Token ID4533
Feature activation+2.05e-02

Pos logit contributions
rified+0.125
 Heidi+0.124
 Alice+0.124
 Tab+0.124
 Sharon+0.124
Neg logit contributions
ands-0.075
orse-0.067
 seek-0.066
inations-0.062
 oct-0.060
Token.
Token ID13
Feature activation-1.66e-02

Pos logit contributions
ands+0.038
inations+0.033
rim+0.032
rr+0.030
 myself+0.029
Neg logit contributions
rified-0.030
 blonde-0.029
 Sharon-0.029
ifying-0.029
 Alice-0.028
Token It
Token ID632
Feature activation+7.69e-04

Pos logit contributions
Aud+0.020
 Heidi+0.020
 Sharon+0.019
rified+0.018
 redd+0.018
Neg logit contributions
ands-0.034
rim-0.033
 oct-0.032
orse-0.032
rr-0.031
Token was
Token ID373
Feature activation-1.82e-03

Pos logit contributions
ands+0.002
rr+0.002
inations+0.002
rim+0.002
orse+0.002
Neg logit contributions
 blonde-0.001
 Heidi-0.001
 Alice-0.001
 redd-0.001
rified-0.001
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
Aud+0.128
 Heidi+0.128
 Sharon+0.124
 Alice+0.121
rified+0.121
Neg logit contributions
ands-0.090
 oct-0.082
rim-0.081
orse-0.081
ulu-0.075
Token lots
Token ID6041
Feature activation-5.94e-02

Pos logit contributions
Aud+0.023
 Heidi+0.023
rified+0.023
 Sharon+0.022
ifying+0.022
Neg logit contributions
ands-0.021
 oct-0.020
orse-0.020
rim-0.019
ulu-0.018
Token of
Token ID286
Feature activation-5.69e-02

Pos logit contributions
Aud+0.125
 Heidi+0.125
 Alice+0.122
 Sharon+0.121
 blonde+0.121
Neg logit contributions
ands-0.069
rim-0.064
orse-0.060
 oct-0.058
rr-0.056
Token shiny
Token ID22441
Feature activation-9.12e-03

Pos logit contributions
 Heidi+0.095
Aud+0.094
 blonde+0.094
 Alice+0.093
 Sharon+0.092
Neg logit contributions
ands-0.090
rim-0.085
orse-0.081
 oct-0.079
rr-0.078
Token or
Token ID393
Feature activation-1.96e-03

Pos logit contributions
Aud+0.015
 Heidi+0.015
 Sharon+0.014
 Alice+0.014
rified+0.014
Neg logit contributions
ands-0.015
rim-0.014
 oct-0.014
orse-0.014
rr-0.013
Tokennam
Token ID7402
Feature activation-1.59e-01

Pos logit contributions
Aud+0.004
 Heidi+0.003
rified+0.003
 Sharon+0.003
 Michelle+0.003
Neg logit contributions
ands-0.003
 oct-0.003
rim-0.003
orse-0.003
rr-0.002
Tokenents
Token ID658
Feature activation-8.44e-04

Pos logit contributions
Aud+0.181
 Heidi+0.179
ifying+0.172
 Sharon+0.172
 blonde+0.171
Neg logit contributions
ands-0.337
rim-0.324
orse-0.312
 oct-0.311
ulu-0.300
Token.
Token ID13
Feature activation+3.94e-03

Pos logit contributions
Aud+0.002
 Heidi+0.002
 Sharon+0.002
 Stephen+0.002
 Alice+0.002
Neg logit contributions
ands-0.001
rim-0.001
orse-0.001
 oct-0.001
ulu-0.001
Token L
Token ID406
Feature activation+1.84e-02

Pos logit contributions
ands+0.005
ulu+0.005
orse+0.005
rim+0.005
 oct+0.005
Neg logit contributions
 Heidi-0.007
Aud-0.007
ifying-0.007
 Stephen-0.007
 redd-0.007
Tokenila
Token ID10102
Feature activation+5.68e-02

Pos logit contributions
ands+0.015
rim+0.013
 oct+0.011
orse+0.011
rr+0.010
Neg logit contributions
Aud-0.046
 Heidi-0.046
 Sharon-0.045
 Alice-0.045
rified-0.044
Token wanted
Token ID2227
Feature activation-9.83e-03

Pos logit contributions
ands+0.095
rim+0.090
orse+0.089
 oct+0.088
rr+0.084
Neg logit contributions
 Heidi-0.090
Aud-0.090
ifying-0.086
 Sharon-0.085
 redd-0.085
Token many
Token ID867
Feature activation+2.41e-02

Pos logit contributions
ands+0.004
 oct+0.004
 ig+0.003
orse+0.003
rr+0.003
Neg logit contributions
rified-0.006
Aud-0.006
 Heidi-0.006
 Michelle-0.005
ifying-0.005
Token colors
Token ID7577
Feature activation-9.84e-03

Pos logit contributions
ands+0.036
rim+0.031
 oct+0.030
inations+0.029
orse+0.029
Neg logit contributions
Aud-0.045
 Heidi-0.044
rified-0.044
ifying-0.043
 Scarlett-0.042
Token.
Token ID13
Feature activation-7.10e-03

Pos logit contributions
 Heidi+0.020
Aud+0.020
 Sharon+0.019
 Alice+0.019
rified+0.019
Neg logit contributions
ands-0.012
rim-0.011
orse-0.011
 oct-0.011
rr-0.010
Token Some
Token ID2773
Feature activation-2.81e-03

Pos logit contributions
Aud+0.008
 Heidi+0.008
rified+0.008
 Sharon+0.008
 Stephen+0.008
Neg logit contributions
ands-0.015
rim-0.014
orse-0.014
 oct-0.014
ulu-0.014
Token or
Token ID393
Feature activation-3.92e-02

Pos logit contributions
Aud+0.005
 Heidi+0.005
ifying+0.004
rified+0.004
 Sharon+0.004
Neg logit contributions
ands-0.005
rim-0.004
orse-0.004
 oct-0.004
rr-0.004
Tokennam
Token ID7402
Feature activation-1.58e-01

Pos logit contributions
Aud+0.057
 Heidi+0.055
rified+0.053
 Audrey+0.050
 Monica+0.050
Neg logit contributions
ands-0.073
 oct-0.072
rr-0.065
orse-0.064
 ig-0.063
Tokenents
Token ID658
Feature activation-1.73e-03

Pos logit contributions
 Heidi+0.200
Aud+0.198
 Sharon+0.189
rified+0.189
 Alice+0.188
Neg logit contributions
ands-0.319
rim-0.304
orse-0.294
 oct-0.289
rr-0.287
Token were
Token ID547
Feature activation-9.94e-03

Pos logit contributions
Aud+0.002
 Heidi+0.002
 Sharon+0.002
 Alice+0.002
 Stephen+0.002
Neg logit contributions
ands-0.003
rim-0.003
orse-0.003
 oct-0.003
rr-0.003
Token round
Token ID2835
Feature activation-2.27e-02

Pos logit contributions
rified+0.020
Aud+0.019
 Heidi+0.019
 Audrey+0.018
 Monica+0.017
Neg logit contributions
ands-0.014
rim-0.012
rr-0.012
orse-0.012
 oct-0.012
Token,
Token ID11
Feature activation+2.34e-02

Pos logit contributions
Aud+0.045
 Heidi+0.044
 Sharon+0.042
 Stephen+0.042
 Monica+0.042
Neg logit contributions
ands-0.032
 oct-0.029
rim-0.028
rr-0.027
orse-0.026
Token some
Token ID617
Feature activation-3.38e-02

Pos logit contributions
ands+0.037
 oct+0.034
rim+0.034
orse+0.033
ulu+0.032
Neg logit contributions
Aud-0.040
 Heidi-0.039
rified-0.038
 Stephen-0.038
 Sharon-0.038
Token,
Token ID11
Feature activation-2.36e-02

Pos logit contributions
 Heidi+0.012
Aud+0.012
 Alice+0.012
 Stephen+0.012
 Sharon+0.012
Neg logit contributions
ands-0.011
rim-0.010
orse-0.010
 oct-0.010
rr-0.009
Token there
Token ID612
Feature activation-3.73e-02

Pos logit contributions
Aud+0.036
 Heidi+0.036
 Sharon+0.035
 Alice+0.035
 Stephen+0.034
Neg logit contributions
ands-0.041
rim-0.039
orse-0.038
 oct-0.037
rr-0.036
Token was
Token ID373
Feature activation-9.81e-03

Pos logit contributions
Aud+0.063
 Heidi+0.061
 Sharon+0.059
ifying+0.059
 redd+0.059
Neg logit contributions
ands-0.058
rim-0.056
orse-0.055
 oct-0.054
ulu-0.052
Token a
Token ID257
Feature activation-3.11e-02

Pos logit contributions
 Heidi+0.014
Aud+0.014
 Alice+0.014
 blonde+0.013
 redd+0.013
Neg logit contributions
ands-0.019
rim-0.018
orse-0.017
 oct-0.016
rr-0.016
Token loud
Token ID7812
Feature activation-2.41e-02

Pos logit contributions
 Heidi+0.050
Aud+0.050
 Sharon+0.048
 redd+0.048
 Alice+0.048
Neg logit contributions
ands-0.053
rim-0.049
orse-0.045
rr-0.045
inations-0.045
Token av
Token ID1196
Feature activation-1.58e-01

Pos logit contributions
 Heidi+0.036
 Alice+0.036
Aud+0.036
 Sharon+0.035
 Stephen+0.035
Neg logit contributions
ands-0.043
rim-0.040
inations-0.037
ulu-0.037
orse-0.037
Tokenac
Token ID330
Feature activation-6.40e-02

Pos logit contributions
Aud+0.267
 Heidi+0.262
 Stephen+0.253
 Sharon+0.252
 redd+0.251
Neg logit contributions
ands-0.250
rim-0.235
orse-0.222
 oct-0.221
ulu-0.217
Tokenado
Token ID4533
Feature activation+1.86e-02

Pos logit contributions
rified+0.095
se+0.093
Aud+0.093
 Heidi+0.092
 Tab+0.092
Neg logit contributions
ands-0.102
orse-0.096
 seek-0.095
inations-0.092
ulu-0.090
Token.
Token ID13
Feature activation-1.65e-02

Pos logit contributions
ands+0.033
inations+0.029
rim+0.028
rr+0.026
orse+0.025
Neg logit contributions
rified-0.028
 Sharon-0.027
ifying-0.027
 Alice-0.027
 blonde-0.027
Token It
Token ID632
Feature activation-1.51e-04

Pos logit contributions
 Heidi+0.017
Aud+0.017
 Alice+0.016
 Sharon+0.016
 blonde+0.016
Neg logit contributions
ands-0.037
rim-0.036
orse-0.034
 oct-0.034
rr-0.033
Token was
Token ID373
Feature activation-3.02e-03

Pos logit contributions
 Alice+0.000
 Heidi+0.000
rified+0.000
Aud+0.000
 blonde+0.000
Neg logit contributions
ands-0.000
rim-0.000
rr-0.000
inations-0.000
orse-0.000
Token swings
Token ID26728
Feature activation-2.45e-02

Pos logit contributions
ands+0.027
rim+0.025
orse+0.024
 oct+0.024
rr+0.023
Neg logit contributions
Aud-0.025
 Heidi-0.025
 Sharon-0.024
 Alice-0.024
 blonde-0.024
Token and
Token ID290
Feature activation+1.48e-02

Pos logit contributions
Aud+0.046
 Heidi+0.046
 Sharon+0.045
rified+0.044
 redd+0.044
Neg logit contributions
ands-0.034
rim-0.032
 oct-0.030
orse-0.030
rr-0.029
Token slides
Token ID19392
Feature activation-3.33e-02

Pos logit contributions
ands+0.027
rim+0.026
 oct+0.025
orse+0.025
ulu+0.024
Neg logit contributions
Aud-0.021
 Heidi-0.021
 Sharon-0.020
rified-0.020
ifying-0.020
Token and
Token ID290
Feature activation+4.60e-03

Pos logit contributions
 Heidi+0.076
Aud+0.075
 Michelle+0.075
 Sharon+0.074
rified+0.073
Neg logit contributions
ands-0.033
 oct-0.029
rim-0.028
orse-0.027
rr-0.026
Token sand
Token ID6450
Feature activation-3.36e-02

Pos logit contributions
ands+0.008
rim+0.008
 oct+0.008
orse+0.008
rr+0.007
Neg logit contributions
Aud-0.007
 Heidi-0.007
 Sharon-0.006
ifying-0.006
rified-0.006
Tokenboxes
Token ID29305
Feature activation-1.57e-01

Pos logit contributions
 Heidi+0.066
Aud+0.065
 Sharon+0.064
rified+0.063
 Alice+0.063
Neg logit contributions
ands-0.046
rim-0.042
 oct-0.040
orse-0.039
rr-0.038
Token.
Token ID13
Feature activation-7.39e-03

Pos logit contributions
Aud+0.338
 Heidi+0.336
 Sharon+0.325
 redd+0.324
 blonde+0.320
Neg logit contributions
ands-0.173
rim-0.160
orse-0.156
 oct-0.155
rr-0.145
Token Lily
Token ID20037
Feature activation-4.69e-04

Pos logit contributions
Aud+0.014
string+0.013
se+0.013
 Heidi+0.013
 Stephen+0.013
Neg logit contributions
ulu-0.010
orse-0.010
 oct-0.009
ands-0.009
rim-0.009
Token and
Token ID290
Feature activation-4.87e-02

Pos logit contributions
ifying+0.001
Aud+0.001
 Heidi+0.001
rified+0.001
 redd+0.001
Neg logit contributions
orse-0.001
 oct-0.001
ands-0.000
rim-0.000
 seek-0.000
Token Ben
Token ID3932
Feature activation-1.01e-02

Pos logit contributions
ifying+0.081
Aud+0.076
rified+0.075
 redd+0.075
 blonde+0.073
Neg logit contributions
 oct-0.079
ands-0.078
rim-0.073
orse-0.072
ulu-0.071
Token wanted
Token ID2227
Feature activation-1.20e-02

Pos logit contributions
Aud+0.019
 Heidi+0.019
ifying+0.019
 Michelle+0.018
 Sharon+0.018
Neg logit contributions
 oct-0.012
orse-0.012
ands-0.012
ulu-0.011
 seek-0.011
Token away
Token ID1497
Feature activation-1.21e-03

Pos logit contributions
ands+0.041
rim+0.039
 oct+0.037
orse+0.035
ulu+0.035
Neg logit contributions
 Heidi-0.035
Aud-0.034
rified-0.034
 Stephen-0.033
 Sharon-0.033
Token.
Token ID13
Feature activation+4.65e-03

Pos logit contributions
 Heidi+0.002
Aud+0.002
 Alice+0.002
 Sharon+0.002
 blonde+0.002
Neg logit contributions
ands-0.002
rim-0.002
orse-0.002
 oct-0.002
rr-0.002
Token She
Token ID1375
Feature activation+1.05e-02

Pos logit contributions
ands+0.007
orse+0.007
ulu+0.006
 oct+0.006
rim+0.006
Neg logit contributions
ifying-0.008
Aud-0.008
 redd-0.007
rified-0.007
string-0.007
Token learned
Token ID4499
Feature activation-1.05e-02

Pos logit contributions
ands+0.018
rim+0.017
orse+0.017
 oct+0.016
ulu+0.016
Neg logit contributions
Aud-0.017
 Heidi-0.016
 Sharon-0.016
ifying-0.016
 Stephen-0.016
Token that
Token ID326
Feature activation+4.31e-02

Pos logit contributions
Aud+0.022
 Heidi+0.022
rified+0.021
 Sharon+0.021
 blonde+0.021
Neg logit contributions
ands-0.012
rim-0.011
 oct-0.011
orse-0.010
rr-0.010
Token av
Token ID1196
Feature activation-1.57e-01

Pos logit contributions
ands+0.066
rim+0.063
 oct+0.062
orse+0.061
ulu+0.057
Neg logit contributions
Aud-0.076
ifying-0.074
rified-0.072
 Heidi-0.072
 Sharon-0.071
Tokenoc
Token ID420
Feature activation-3.03e-02

Pos logit contributions
 Heidi+0.303
Aud+0.299
 Sharon+0.295
 Alice+0.295
 Stephen+0.291
Neg logit contributions
ands-0.214
rim-0.200
orse-0.187
 oct-0.183
rr-0.182
Tokenados
Token ID22484
Feature activation-9.03e-02

Pos logit contributions
ifying+0.038
Aud+0.037
 blonde+0.035
 Monica+0.034
 Ashley+0.033
Neg logit contributions
ands-0.065
rim-0.062
 oct-0.060
rr-0.058
 myself-0.057
Token are
Token ID389
Feature activation+3.09e-03

Pos logit contributions
Aud+0.122
 Heidi+0.121
 Sharon+0.116
 Alice+0.116
 Stephen+0.115
Neg logit contributions
ands-0.173
rim-0.165
orse-0.159
 oct-0.158
rr-0.154
Token good
Token ID922
Feature activation-4.39e-02

Pos logit contributions
ands+0.004
rim+0.004
 oct+0.003
orse+0.003
ulu+0.003
Neg logit contributions
Aud-0.006
 Heidi-0.006
rified-0.006
 Sharon-0.006
 Stephen-0.006
Token for
Token ID329
Feature activation+1.30e-02

Pos logit contributions
 Heidi+0.064
Aud+0.064
 Alice+0.062
 Sharon+0.061
rified+0.061
Neg logit contributions
ands-0.079
rim-0.076
orse-0.073
 oct-0.072
rr-0.070
Token the
Token ID262
Feature activation-2.02e-02

Pos logit contributions
orse+0.018
 oct+0.017
ulu+0.016
ands+0.016
rr+0.015
Neg logit contributions
ifying-0.029
Aud-0.029
rified-0.028
licks-0.027
se-0.027
Token shiny
Token ID22441
Feature activation+4.53e-03

Pos logit contributions
Aud+0.038
ifying+0.038
 Heidi+0.036
 Monica+0.036
 Sharon+0.036
Neg logit contributions
ands-0.028
 oct-0.026
 tunnel-0.025
 ig-0.025
 seek-0.024
Token things
Token ID1243
Feature activation-6.06e-02

Pos logit contributions
ands+0.006
 oct+0.005
rim+0.005
orse+0.005
 ig+0.005
Neg logit contributions
Aud-0.010
ifying-0.009
 Heidi-0.009
rified-0.009
 Michelle-0.009
Token were
Token ID547
Feature activation-1.54e-02

Pos logit contributions
Aud+0.107
 Heidi+0.106
rified+0.104
 redd+0.102
 Sharon+0.102
Neg logit contributions
ands-0.090
 oct-0.081
rim-0.078
ulu-0.076
orse-0.076
Token or
Token ID393
Feature activation-3.31e-02

Pos logit contributions
 Heidi+0.027
Aud+0.027
 Alice+0.026
 Sharon+0.026
 Stephen+0.026
Neg logit contributions
ands-0.023
rim-0.022
orse-0.021
 oct-0.021
rr-0.020
Tokennam
Token ID7402
Feature activation-1.55e-01

Pos logit contributions
Aud+0.054
 Heidi+0.054
rified+0.052
 Sharon+0.051
 Stephen+0.051
Neg logit contributions
ands-0.054
rim-0.050
 oct-0.050
orse-0.049
ulu-0.047
Tokenents
Token ID658
Feature activation-5.59e-03

Pos logit contributions
Aud+0.181
 Heidi+0.179
 Sharon+0.171
 Alice+0.171
 blonde+0.170
Neg logit contributions
ands-0.324
rim-0.311
orse-0.300
 oct-0.299
rr-0.289
Token and
Token ID290
Feature activation-1.30e-03

Pos logit contributions
Aud+0.010
 Heidi+0.010
 Sharon+0.009
 Alice+0.009
 Stephen+0.009
Neg logit contributions
ands-0.009
rim-0.008
orse-0.008
 oct-0.008
rr-0.007
Token they
Token ID484
Feature activation-2.41e-02

Pos logit contributions
rified+0.002
Aud+0.002
 Heidi+0.002
ifying+0.002
 Stephen+0.002
Neg logit contributions
ands-0.002
 oct-0.002
orse-0.002
rim-0.002
ulu-0.002
Token were
Token ID547
Feature activation-5.86e-03

Pos logit contributions
 Heidi+0.035
Aud+0.035
 Sharon+0.034
rified+0.034
 Alice+0.033
Neg logit contributions
ands-0.044
rim-0.041
orse-0.040
 oct-0.040
rr-0.038
Token very
Token ID845
Feature activation-5.28e-04

Pos logit contributions
rified+0.013
Aud+0.012
ifying+0.012
 Heidi+0.012
se+0.011
Neg logit contributions
ands-0.007
 oct-0.007
orse-0.006
rim-0.006
ulu-0.006
Tokenongo
Token ID25162
Feature activation+2.97e-03

Pos logit contributions
Aud+0.000
 Heidi+0.000
rified+0.000
 Stephen+0.000
 redd+0.000
Neg logit contributions
ands-0.000
rim-0.000
ulu-0.000
orse-0.000
 oct-0.000
Token didn
Token ID1422
Feature activation-3.41e-02

Pos logit contributions
ands+0.006
rim+0.005
orse+0.005
 oct+0.005
rr+0.005
Neg logit contributions
 Heidi-0.004
Aud-0.004
rified-0.004
 Alice-0.004
 Sharon-0.004
Token't
Token ID470
Feature activation+3.50e-02

Pos logit contributions
 Heidi+0.045
Aud+0.044
 Alice+0.043
ifying+0.042
 Sharon+0.042
Neg logit contributions
ands-0.067
rim-0.064
orse-0.061
 oct-0.061
rr-0.059
Token like
Token ID588
Feature activation+1.13e-04

Pos logit contributions
rim+0.067
ands+0.064
 oct+0.059
orse+0.059
inations+0.057
Neg logit contributions
 redd-0.046
 Heidi-0.046
 blonde-0.043
Aud-0.042
 Sharon-0.042
Token wet
Token ID9583
Feature activation+1.17e-02

Pos logit contributions
 oct+0.000
ands+0.000
rim+0.000
orse+0.000
ulu+0.000
Neg logit contributions
Aud-0.000
rified-0.000
ifying-0.000
 Heidi-0.000
 Sharon-0.000
Token av
Token ID1196
Feature activation-1.54e-01

Pos logit contributions
 oct+0.016
ands+0.016
rim+0.015
orse+0.013
rr+0.013
Neg logit contributions
Aud-0.024
 Ashley-0.022
Scar-0.022
 Sharon-0.022
ifying-0.022
Tokenoc
Token ID420
Feature activation-2.65e-02

Pos logit contributions
 Heidi+0.299
 Sharon+0.293
Aud+0.293
 Alice+0.293
 Stephen+0.288
Neg logit contributions
ands-0.210
rim-0.195
orse-0.186
 oct-0.181
rr-0.178
Tokenados
Token ID22484
Feature activation-9.27e-02

Pos logit contributions
ifying+0.033
Aud+0.032
 Ashley+0.030
 Scarlett+0.030
 Monica+0.030
Neg logit contributions
ands-0.057
rim-0.053
 oct-0.052
rr-0.051
orse-0.049
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
Aud+0.180
 Heidi+0.180
 Sharon+0.175
 Alice+0.173
 Stephen+0.173
Neg logit contributions
ands-0.123
rim-0.114
 oct-0.109
orse-0.109
rr-0.102
Token so
Token ID523
Feature activation+4.57e-04

Pos logit contributions
ands+0.016
rim+0.016
 oct+0.015
orse+0.015
ulu+0.014
Neg logit contributions
Aud-0.019
 Heidi-0.019
 Sharon-0.018
 Alice-0.018
 Stephen-0.018
Token he
Token ID339
Feature activation-2.03e-02

Pos logit contributions
ands+0.001
rim+0.001
rr+0.001
inations+0.001
ulu+0.001
Neg logit contributions
 blonde-0.001
 Monica-0.001
 Heidi-0.001
 Alice-0.001
 Sharon-0.001
Token and
Token ID290
Feature activation-2.43e-03

Pos logit contributions
Aud+0.014
 Heidi+0.014
 redd+0.014
ifying+0.014
 blonde+0.014
Neg logit contributions
ands-0.008
 oct-0.007
orse-0.007
rim-0.007
 hour-0.007
Token shared
Token ID4888
Feature activation-3.65e-02

Pos logit contributions
ifying+0.004
Aud+0.004
rified+0.004
 Heidi+0.004
 blonde+0.004
Neg logit contributions
orse-0.004
ands-0.003
 oct-0.003
rim-0.003
 seek-0.003
Token everything
Token ID2279
Feature activation-2.30e-02

Pos logit contributions
rified+0.085
bars+0.082
se+0.081
ifying+0.080
cases+0.078
Neg logit contributions
ands-0.035
rim-0.031
 oct-0.031
orse-0.029
 prank-0.028
Token,
Token ID11
Feature activation+1.54e-02

Pos logit contributions
Aud+0.043
 Heidi+0.043
rified+0.043
 blonde+0.042
 Sharon+0.042
Neg logit contributions
ands-0.031
rim-0.030
orse-0.028
 oct-0.028
ulu-0.027
Token including
Token ID1390
Feature activation-3.38e-02

Pos logit contributions
ands+0.022
rim+0.021
orse+0.021
 oct+0.020
inations+0.019
Neg logit contributions
Aud-0.028
 Heidi-0.028
ifying-0.028
 Sharon-0.027
rified-0.027
Token av
Token ID1196
Feature activation-1.54e-01

Pos logit contributions
Aud+0.061
 Heidi+0.059
rified+0.058
ifying+0.057
 Sharon+0.056
Neg logit contributions
ands-0.051
 oct-0.048
rim-0.047
orse-0.045
ulu-0.044
Tokenoc
Token ID420
Feature activation-2.92e-02

Pos logit contributions
 Heidi+0.311
rified+0.305
 Alice+0.305
 Sharon+0.305
Aud+0.303
Neg logit contributions
ands-0.199
rim-0.186
orse-0.180
rr-0.170
 oct-0.167
Tokenados
Token ID22484
Feature activation-9.38e-02

Pos logit contributions
Aud+0.035
ifying+0.035
 blonde+0.032
 Heidi+0.032
 Scarlett+0.032
Neg logit contributions
ands-0.064
rim-0.062
rr-0.057
 oct-0.057
orse-0.056
Token.
Token ID13
Feature activation+7.30e-04

Pos logit contributions
 Heidi+0.183
Aud+0.183
 Sharon+0.178
rified+0.177
 Alice+0.176
Neg logit contributions
ands-0.124
rim-0.115
orse-0.109
 oct-0.109
rr-0.103
Token They
Token ID1119
Feature activation+1.29e-02

Pos logit contributions
ands+0.001
orse+0.001
rim+0.001
 oct+0.001
ulu+0.001
Neg logit contributions
Aud-0.001
 Heidi-0.001
 Stephen-0.001
 Sharon-0.001
 redd-0.001
Token were
Token ID547
Feature activation-5.65e-03

Pos logit contributions
ands+0.022
 oct+0.021
orse+0.021
rim+0.021
ulu+0.020
Neg logit contributions
Aud-0.020
 Heidi-0.020
ifying-0.020
rified-0.019
 Sharon-0.019
Token -
Token ID532
Feature activation+1.39e-02

Pos logit contributions
 Heidi+0.065
rified+0.064
ifying+0.064
Aud+0.063
 redd+0.063
Neg logit contributions
ands-0.073
rim-0.068
rr-0.064
orse-0.063
inations-0.063
Token it
Token ID340
Feature activation-4.76e-02

Pos logit contributions
 oct+0.014
orse+0.012
 ig+0.011
 himself+0.011
 myself+0.010
Neg logit contributions
Aud-0.033
cases-0.032
se-0.031
rified-0.030
 Sharon-0.030
Token was
Token ID373
Feature activation+5.32e-03

Pos logit contributions
 Heidi+0.070
Aud+0.069
 redd+0.067
 Alice+0.066
rified+0.066
Neg logit contributions
ands-0.088
rim-0.082
orse-0.079
 oct-0.077
rr-0.077
Token bigger
Token ID5749
Feature activation-4.86e-02

Pos logit contributions
ands+0.008
rim+0.008
 oct+0.008
orse+0.008
ulu+0.007
Neg logit contributions
Aud-0.009
 Heidi-0.009
 Sharon-0.009
rified-0.009
 Alice-0.009
Token and
Token ID290
Feature activation+3.74e-02

Pos logit contributions
Aud+0.097
 Heidi+0.096
 Sharon+0.094
 Alice+0.093
rified+0.093
Neg logit contributions
ands-0.063
rim-0.058
 oct-0.056
orse-0.055
rr-0.052
Token ju
Token ID7544
Feature activation-1.53e-01

Pos logit contributions
ands+0.051
 oct+0.049
rim+0.049
orse+0.048
ulu+0.047
Neg logit contributions
Aud-0.069
 Heidi-0.067
rified-0.066
 Stephen-0.066
 Sharon-0.066
Tokenic
Token ID291
Feature activation+3.46e-03

Pos logit contributions
Aud+0.242
rified+0.239
 Heidi+0.238
ifying+0.236
 redd+0.235
Neg logit contributions
ands-0.253
orse-0.237
 oct-0.235
rim-0.228
ulu-0.217
Tokenier
Token ID959
Feature activation-2.19e-02

Pos logit contributions
ands+0.005
rim+0.005
orse+0.005
 oct+0.005
rr+0.005
Neg logit contributions
 Heidi-0.006
rified-0.006
Aud-0.006
ifying-0.006
 Sharon-0.006
Token than
Token ID621
Feature activation-7.26e-03

Pos logit contributions
Aud+0.046
 Heidi+0.046
 Sharon+0.044
 Alice+0.044
rified+0.044
Neg logit contributions
ands-0.026
rim-0.024
 oct-0.023
orse-0.023
rr-0.021
Token any
Token ID597
Feature activation-1.95e-02

Pos logit contributions
Aud+0.013
 Heidi+0.012
 Sharon+0.012
 Monica+0.012
 Audrey+0.012
Neg logit contributions
ands-0.012
 oct-0.011
orse-0.011
rim-0.011
ulu-0.010
Token he
Token ID339
Feature activation+9.80e-03

Pos logit contributions
Aud+0.034
 Heidi+0.034
 Sharon+0.033
 Alice+0.033
 Stephen+0.033
Neg logit contributions
ands-0.030
rim-0.028
orse-0.027
 oct-0.027
rr-0.025
Token on
Token ID319
Feature activation-6.82e-02

Pos logit contributions
ands+0.092
rim+0.087
orse+0.085
 oct+0.084
rr+0.083
Neg logit contributions
Aud-0.049
 Heidi-0.048
 Sharon-0.046
 Alice-0.046
 blonde-0.045
Token,
Token ID11
Feature activation+1.38e-02

Pos logit contributions
Aud+0.154
 redd+0.153
ifying+0.149
rified+0.147
licks+0.146
Neg logit contributions
ands-0.064
rim-0.061
 oct-0.059
orse-0.059
ulu-0.058
Token whenever
Token ID8797
Feature activation-3.26e-02

Pos logit contributions
ands+0.019
ulu+0.019
rim+0.019
 oct+0.019
orse+0.019
Neg logit contributions
Aud-0.025
 Heidi-0.024
ifying-0.024
rified-0.024
 Stephen-0.024
Token they
Token ID484
Feature activation-1.42e-02

Pos logit contributions
Aud+0.056
ifying+0.054
rified+0.053
 redd+0.052
 Heidi+0.052
Neg logit contributions
ands-0.050
 oct-0.048
orse-0.047
rim-0.046
ulu-0.045
Token wanted
Token ID2227
Feature activation-5.31e-04

Pos logit contributions
Aud+0.021
 Heidi+0.021
 Sharon+0.020
 Alice+0.020
 Stephen+0.020
Neg logit contributions
ands-0.025
rim-0.024
 oct-0.024
orse-0.023
rr-0.023
Token av
Token ID1196
Feature activation-1.52e-01

Pos logit contributions
Aud+0.001
rified+0.001
ifying+0.001
 Monica+0.001
se+0.001
Neg logit contributions
ands-0.001
 oct-0.001
rim-0.001
orse-0.001
ulu-0.001
Tokenoc
Token ID420
Feature activation-2.56e-02

Pos logit contributions
 Heidi+0.308
 Sharon+0.306
 Alice+0.305
Aud+0.301
rified+0.300
Neg logit contributions
ands-0.189
rim-0.177
orse-0.167
rr-0.162
 oct-0.156
Tokenados
Token ID22484
Feature activation-8.68e-02

Pos logit contributions
ifying+0.034
Aud+0.033
 blonde+0.032
 Scarlett+0.031
 Ashley+0.031
Neg logit contributions
ands-0.053
rim-0.050
rr-0.048
 oct-0.047
orse-0.045
Token,
Token ID11
Feature activation+1.01e-02

Pos logit contributions
Aud+0.158
 Heidi+0.157
 Sharon+0.153
rified+0.153
 blonde+0.152
Neg logit contributions
ands-0.125
rim-0.117
 oct-0.111
orse-0.111
rr-0.106
Token they
Token ID484
Feature activation-1.77e-02

Pos logit contributions
 oct+0.013
ands+0.013
ulu+0.012
orse+0.012
rim+0.012
Neg logit contributions
Aud-0.019
ifying-0.018
 Heidi-0.018
rified-0.018
se-0.018
Token would
Token ID561
Feature activation-5.15e-02

Pos logit contributions
Aud+0.026
 Heidi+0.025
 Alice+0.024
 blonde+0.023
 Audrey+0.023
Neg logit contributions
ands-0.032
 oct-0.032
orse-0.031
rim-0.031
ulu-0.031
Token Jack
Token ID3619
Feature activation-1.42e-02

Pos logit contributions
Aud+0.007
ifying+0.006
 redd+0.006
se+0.006
 Monica+0.006
Neg logit contributions
ands-0.006
orse-0.006
 oct-0.006
rr-0.006
rim-0.006
Token set
Token ID900
Feature activation-3.81e-04

Pos logit contributions
Aud+0.026
ifying+0.026
 redd+0.025
 Heidi+0.025
 blonde+0.024
Neg logit contributions
orse-0.020
 oct-0.019
rim-0.019
ands-0.019
aver-0.017
Token off
Token ID572
Feature activation-5.04e-02

Pos logit contributions
Aud+0.000
ifying+0.000
 blonde+0.000
rified+0.000
 redd+0.000
Neg logit contributions
ands-0.001
 oct-0.001
orse-0.001
rim-0.001
rr-0.001
Token on
Token ID319
Feature activation-3.40e-02

Pos logit contributions
 Heidi+0.078
Aud+0.078
 Sharon+0.075
 Alice+0.075
 Stephen+0.074
Neg logit contributions
ands-0.087
rim-0.083
orse-0.079
 oct-0.078
rr-0.076
Token his
Token ID465
Feature activation+2.32e-02

Pos logit contributions
Aud+0.067
 redd+0.065
rified+0.064
ifying+0.064
 Heidi+0.064
Neg logit contributions
ands-0.045
 oct-0.041
orse-0.039
rim-0.039
rr-0.037
Token journey
Token ID7002
Feature activation+6.05e-02

Pos logit contributions
rim+0.040
ands+0.039
ulu+0.037
orse+0.035
rr+0.034
Neg logit contributions
 Heidi-0.036
 Stephen-0.036
 blonde-0.036
 Alice-0.036
Aud-0.036
Token alone
Token ID3436
Feature activation+6.59e-02

Pos logit contributions
ands+0.085
rim+0.077
orse+0.076
 oct+0.075
rr+0.071
Neg logit contributions
Aud-0.113
 Heidi-0.110
ifying-0.109
 redd-0.109
 blonde-0.107
Token,
Token ID11
Feature activation+1.75e-02

Pos logit contributions
ands+0.104
rim+0.100
orse+0.094
 oct+0.091
ulu+0.089
Neg logit contributions
 Heidi-0.112
 Alice-0.110
 Sharon-0.109
Aud-0.109
rified-0.109
Token but
Token ID475
Feature activation+2.22e-02

Pos logit contributions
ands+0.021
orse+0.021
 oct+0.021
rim+0.020
 seek+0.018
Neg logit contributions
Aud-0.035
ifying-0.034
 Heidi-0.033
 redd-0.033
se-0.032
Token he
Token ID339
Feature activation-1.44e-02

Pos logit contributions
ands+0.027
orse+0.026
 oct+0.025
rim+0.025
ulu+0.023
Neg logit contributions
Aud-0.045
 Heidi-0.043
 blonde-0.043
 redd-0.042
ifying-0.042
Token was
Token ID373
Feature activation-2.27e-03

Pos logit contributions
Aud+0.024
 Heidi+0.024
ifying+0.023
 redd+0.023
 blonde+0.023
Neg logit contributions
ands-0.023
rim-0.022
orse-0.022
 oct-0.022
ulu-0.021
Token was
Token ID373
Feature activation+1.29e-03

Pos logit contributions
ifying+0.102
Aud+0.102
 blonde+0.094
 redd+0.094
 Heidi+0.093
Neg logit contributions
 oct-0.037
 seek-0.034
 adventure-0.031
 now-0.031
orse-0.030
Token the
Token ID262
Feature activation-4.12e-03

Pos logit contributions
ands+0.002
rim+0.002
 oct+0.001
orse+0.001
ulu+0.001
Neg logit contributions
Aud-0.003
rified-0.003
ifying-0.003
 Sharon-0.002
 blonde-0.002
Token right
Token ID826
Feature activation-2.70e-04

Pos logit contributions
Aud+0.007
 Heidi+0.006
 Sharon+0.006
ifying+0.006
 Alice+0.006
Neg logit contributions
ands-0.007
rim-0.006
 oct-0.006
orse-0.006
rr-0.006
Token thing
Token ID1517
Feature activation-2.68e-02

Pos logit contributions
 Heidi+0.000
Aud+0.000
 Alice+0.000
 Sharon+0.000
 redd+0.000
Neg logit contributions
ands-0.000
rim-0.000
orse-0.000
 oct-0.000
inations-0.000
Token to
Token ID284
Feature activation+9.24e-03

Pos logit contributions
Aud+0.051
 Heidi+0.049
 redd+0.049
 blonde+0.049
 Sharon+0.049
Neg logit contributions
ands-0.035
 oct-0.034
rim-0.033
orse-0.033
ulu-0.031
Token do
Token ID466
Feature activation+6.59e-02

Pos logit contributions
ands+0.015
rim+0.014
 oct+0.013
orse+0.013
rr+0.012
Neg logit contributions
 Heidi-0.016
 Alice-0.015
 blonde-0.015
 Michelle-0.015
Aud-0.015
Token.
Token ID13
Feature activation+5.77e-03

Pos logit contributions
ands+0.099
rim+0.097
orse+0.089
 oct+0.088
ulu+0.086
Neg logit contributions
 Heidi-0.115
 Alice-0.114
Aud-0.111
 Stephen-0.111
 Sharon-0.110
Token So
Token ID1406
Feature activation+5.95e-03

Pos logit contributions
 oct+0.008
orse+0.008
rim+0.007
 ig+0.007
ands+0.007
Neg logit contributions
Aud-0.010
 blonde-0.009
ifying-0.009
 Michelle-0.009
 Heidi-0.009
Token he
Token ID339
Feature activation-6.51e-03

Pos logit contributions
ands+0.007
 oct+0.007
rim+0.007
orse+0.007
ulu+0.006
Neg logit contributions
Aud-0.012
 Heidi-0.012
 blonde-0.012
 redd-0.011
ifying-0.011
Token shared
Token ID4888
Feature activation-4.35e-02

Pos logit contributions
ifying+0.011
 Heidi+0.011
Aud+0.011
 blonde+0.011
rified+0.011
Neg logit contributions
ands-0.010
orse-0.010
 oct-0.010
rim-0.010
ulu-0.009
Token his
Token ID465
Feature activation+2.24e-02

Pos logit contributions
Aud+0.105
rified+0.103
ifying+0.100
 Heidi+0.100
se+0.099
Neg logit contributions
 oct-0.038
ands-0.035
orse-0.035
rim-0.034
ulu-0.028
Token questions
Token ID2683
Feature activation-2.75e-02

Pos logit contributions
ands+0.004
rim+0.004
 oct+0.004
rr+0.004
 hour+0.004
Neg logit contributions
Aud-0.006
ifying-0.006
 redd-0.006
 Heidi-0.005
rified-0.005
Token.
Token ID13
Feature activation-5.01e-03

Pos logit contributions
Aud+0.053
 Heidi+0.053
 redd+0.051
rified+0.051
 Sharon+0.051
Neg logit contributions
ands-0.037
rim-0.033
orse-0.032
 oct-0.032
rr-0.031
Token She
Token ID1375
Feature activation+1.38e-02

Pos logit contributions
Aud+0.006
ifying+0.006
rified+0.005
 redd+0.005
 Heidi+0.005
Neg logit contributions
ands-0.010
 oct-0.010
rim-0.010
rr-0.009
ulu-0.009
Token wanted
Token ID2227
Feature activation-1.02e-02

Pos logit contributions
ands+0.026
rim+0.025
orse+0.023
 oct+0.023
rr+0.023
Neg logit contributions
 Heidi-0.019
Aud-0.019
 Alice-0.019
 Sharon-0.019
 Stephen-0.019
Token to
Token ID284
Feature activation+1.20e-02

Pos logit contributions
Aud+0.016
ifying+0.016
rified+0.015
 Heidi+0.015
 redd+0.015
Neg logit contributions
ands-0.018
 oct-0.017
rim-0.016
orse-0.016
rr-0.015
Token know
Token ID760
Feature activation+6.66e-02

Pos logit contributions
ands+0.019
rim+0.018
orse+0.017
 oct+0.017
ulu+0.017
Neg logit contributions
 Heidi-0.021
 Alice-0.020
 Sharon-0.020
 Stephen-0.020
 Monica-0.020
Token what
Token ID644
Feature activation-7.60e-02

Pos logit contributions
ands+0.075
 oct+0.071
rim+0.069
 ig+0.066
inations+0.065
Neg logit contributions
Aud-0.139
ifying-0.135
se-0.135
rified-0.134
 Heidi-0.130
Token others
Token ID1854
Feature activation-3.34e-02

Pos logit contributions
ifying+0.158
rified+0.150
Aud+0.146
severe+0.141
 redd+0.138
Neg logit contributions
 oct-0.093
ands-0.090
orse-0.088
rim-0.086
 seek-0.083
Token were
Token ID547
Feature activation+2.79e-03

Pos logit contributions
Aud+0.052
 Heidi+0.051
ifying+0.049
 redd+0.049
 Sharon+0.048
Neg logit contributions
ands-0.057
rim-0.055
 oct-0.053
orse-0.053
ulu-0.050
Token doing
Token ID1804
Feature activation+3.94e-02

Pos logit contributions
ands+0.004
rim+0.004
orse+0.004
 oct+0.004
rr+0.004
Neg logit contributions
 Heidi-0.005
Aud-0.005
 Alice-0.005
 Sharon-0.005
 Stephen-0.005
Token,
Token ID11
Feature activation+2.33e-02

Pos logit contributions
ands+0.061
rim+0.058
orse+0.056
 oct+0.054
ulu+0.052
Neg logit contributions
 Heidi-0.068
Aud-0.067
 Alice-0.066
 Sharon-0.065
 Stephen-0.065
Token parents
Token ID3397
Feature activation-2.38e-02

Pos logit contributions
 Heidi+0.004
Aud+0.004
 Sharon+0.004
 Alice+0.004
 blonde+0.004
Neg logit contributions
ands-0.006
rim-0.005
orse-0.005
 oct-0.005
rr-0.005
Token took
Token ID1718
Feature activation-1.07e-02

Pos logit contributions
Aud+0.036
 Heidi+0.036
 redd+0.035
 blonde+0.034
 Sharon+0.034
Neg logit contributions
ands-0.041
rim-0.040
orse-0.038
 oct-0.038
ulu-0.037
Token her
Token ID607
Feature activation+1.05e-02

Pos logit contributions
ifying+0.020
Aud+0.019
 redd+0.019
 blonde+0.019
 Heidi+0.018
Neg logit contributions
ands-0.015
ulu-0.013
rim-0.012
 oct-0.012
rr-0.012
Token to
Token ID284
Feature activation+3.44e-02

Pos logit contributions
ands+0.018
 oct+0.017
rim+0.016
orse+0.016
ulu+0.015
Neg logit contributions
Aud-0.016
 Heidi-0.016
 redd-0.016
 blonde-0.016
 Sharon-0.015
Token the
Token ID262
Feature activation+1.21e-02

Pos logit contributions
ands+0.064
rim+0.062
orse+0.060
 oct+0.058
rr+0.058
Neg logit contributions
 Heidi-0.051
 Alice-0.049
Aud-0.049
 Sharon-0.048
 Monica-0.047
Token hospital
Token ID4436
Feature activation+6.58e-02

Pos logit contributions
ands+0.018
rim+0.017
 oct+0.017
orse+0.016
rr+0.016
Neg logit contributions
Aud-0.021
 Heidi-0.021
 Sharon-0.020
 Alice-0.020
rified-0.020
Token,
Token ID11
Feature activation+4.99e-03

Pos logit contributions
ands+0.099
rim+0.092
orse+0.089
 oct+0.088
rr+0.084
Neg logit contributions
Aud-0.116
 Heidi-0.115
 redd-0.112
rified-0.112
ifying-0.111
Token but
Token ID475
Feature activation+1.71e-02

Pos logit contributions
ands+0.007
rim+0.007
orse+0.006
 oct+0.006
ulu+0.006
Neg logit contributions
Aud-0.009
ifying-0.009
 Heidi-0.009
rified-0.009
 redd-0.009
Token it
Token ID340
Feature activation-4.97e-02

Pos logit contributions
ands+0.026
rim+0.025
ulu+0.025
 oct+0.025
orse+0.024
Neg logit contributions
Aud-0.029
 blonde-0.027
 Heidi-0.027
ifying-0.027
 redd-0.027
Token was
Token ID373
Feature activation-1.84e-03

Pos logit contributions
Aud+0.100
 blonde+0.098
 Heidi+0.097
 Stephen+0.097
ifying+0.096
Neg logit contributions
rim-0.061
ulu-0.059
 oct-0.056
ands-0.055
orse-0.055
Token too
Token ID1165
Feature activation+1.96e-02

Pos logit contributions
Aud+0.004
 Heidi+0.003
rified+0.003
 Stephen+0.003
 Sharon+0.003
Neg logit contributions
ands-0.002
rim-0.002
 oct-0.002
orse-0.002
ulu-0.002
Token was
Token ID373
Feature activation+1.39e-02

Pos logit contributions
ands+0.009
rim+0.008
orse+0.008
 oct+0.008
rr+0.008
Neg logit contributions
Aud-0.005
 Heidi-0.005
rified-0.005
 Sharon-0.005
 Stephen-0.005
Token safe
Token ID3338
Feature activation-1.76e-02

Pos logit contributions
ands+0.025
rim+0.023
 oct+0.022
orse+0.021
rr+0.021
Neg logit contributions
 Heidi-0.021
Aud-0.021
 Alice-0.021
 Stephen-0.021
 Sharon-0.020
Token and
Token ID290
Feature activation+2.71e-02

Pos logit contributions
Aud+0.033
 Heidi+0.033
 Sharon+0.032
 blonde+0.032
 redd+0.032
Neg logit contributions
ands-0.024
rim-0.023
 oct-0.023
orse-0.022
rr-0.021
Token went
Token ID1816
Feature activation+1.29e-02

Pos logit contributions
ands+0.039
rim+0.036
orse+0.034
 oct+0.034
rr+0.033
Neg logit contributions
Aud-0.050
 Heidi-0.050
 Sharon-0.049
 Alice-0.049
 Stephen-0.048
Token to
Token ID284
Feature activation+3.05e-02

Pos logit contributions
ands+0.027
rim+0.026
orse+0.025
 oct+0.025
rr+0.024
Neg logit contributions
Aud-0.015
 Heidi-0.015
 Sharon-0.015
 Alice-0.014
 Stephen-0.014
Token find
Token ID1064
Feature activation+6.58e-02

Pos logit contributions
ands+0.045
rim+0.045
ulu+0.040
orse+0.039
rr+0.038
Neg logit contributions
 Heidi-0.057
 Alice-0.055
 Stephen-0.055
 Sharon-0.054
Aud-0.054
Token some
Token ID617
Feature activation-3.13e-04

Pos logit contributions
ands+0.102
rim+0.097
orse+0.092
 oct+0.091
rr+0.088
Neg logit contributions
 Heidi-0.115
Aud-0.113
 Alice-0.110
 Sharon-0.110
 Stephen-0.109
Token cheese
Token ID9891
Feature activation-3.58e-02

Pos logit contributions
Aud+0.001
 Heidi+0.001
 blonde+0.001
 redd+0.000
 Stephen+0.000
Neg logit contributions
ands-0.000
rim-0.000
ulu-0.000
orse-0.000
rr-0.000
Token to
Token ID284
Feature activation+2.56e-02

Pos logit contributions
Aud+0.062
 Heidi+0.062
 Alice+0.060
 Stephen+0.060
rified+0.060
Neg logit contributions
ands-0.054
rim-0.052
orse-0.048
 oct-0.047
rr-0.047
Token eat
Token ID4483
Feature activation+2.63e-02

Pos logit contributions
ands+0.040
rim+0.039
rr+0.037
ulu+0.037
 oct+0.035
Neg logit contributions
 Heidi-0.040
Aud-0.039
 Stephen-0.039
 Alice-0.039
 Sharon-0.039
Token.
Token ID13
Feature activation+6.72e-03

Pos logit contributions
ands+0.030
rim+0.027
 oct+0.026
orse+0.026
rr+0.023
Neg logit contributions
Aud-0.057
 Heidi-0.056
 Sharon-0.055
 blonde-0.055
 Monica-0.054
Token "
Token ID366
Feature activation-5.97e-02

Pos logit contributions
rified+0.017
 redd+0.017
Aud+0.017
 Heidi+0.017
ifying+0.017
Neg logit contributions
orse-0.022
 oct-0.021
ands-0.020
rr-0.019
 next-0.019
TokenLook
Token ID8567
Feature activation-2.86e-02

Pos logit contributions
Aud+0.078
 Heidi+0.077
 Sharon+0.074
 Alice+0.073
rified+0.073
Neg logit contributions
ands-0.120
rim-0.115
orse-0.111
 oct-0.110
rr-0.107
Token,
Token ID11
Feature activation+4.41e-03

Pos logit contributions
Aud+0.057
 Heidi+0.056
 redd+0.055
 blonde+0.055
 Sharon+0.054
Neg logit contributions
ands-0.035
orse-0.033
 oct-0.033
rim-0.033
rr-0.031
Token Tom
Token ID4186
Feature activation-2.43e-04

Pos logit contributions
ands+0.007
rim+0.007
 oct+0.007
orse+0.007
ulu+0.006
Neg logit contributions
Aud-0.007
 Heidi-0.007
 Sharon-0.007
 blonde-0.007
rified-0.007
Token,
Token ID11
Feature activation-3.92e-03

Pos logit contributions
Aud+0.001
 Heidi+0.001
ifying+0.001
 Sharon+0.000
 redd+0.000
Neg logit contributions
ands-0.000
rim-0.000
orse-0.000
 oct-0.000
rr-0.000
Token I
Token ID314
Feature activation-2.01e-02

Pos logit contributions
Aud+0.006
 Heidi+0.006
 Sharon+0.006
 Alice+0.006
rified+0.006
Neg logit contributions
ands-0.006
rim-0.006
orse-0.006
 oct-0.006
rr-0.006
Token found
Token ID1043
Feature activation+2.20e-02

Pos logit contributions
Aud+0.024
'll+0.024
 won+0.023
se+0.023
 blonde+0.022
Neg logit contributions
ands-0.042
rim-0.040
 oct-0.036
orse-0.036
rr-0.035
Token the
Token ID262
Feature activation-2.89e-03

Pos logit contributions
ands+0.039
rim+0.038
orse+0.036
 oct+0.035
rr+0.034
Neg logit contributions
 Heidi-0.033
Aud-0.032
 Alice-0.032
rified-0.032
 Sharon-0.032
Token light
Token ID1657
Feature activation+5.71e-02

Pos logit contributions
Aud+0.005
 Heidi+0.005
 Sharon+0.005
 Alice+0.005
 blonde+0.005
Neg logit contributions
ands-0.005
rim-0.004
 oct-0.004
orse-0.004
rr-0.004
Token!
Token ID0
Feature activation+1.22e-02

Pos logit contributions
ands+0.086
rim+0.081
orse+0.077
 oct+0.076
rr+0.073
Neg logit contributions
 Heidi-0.100
Aud-0.100
 Alice-0.097
 Sharon-0.097
ifying-0.096
Token We
Token ID775
Feature activation-1.29e-02

Pos logit contributions
ands+0.028
orse+0.027
 oct+0.027
rim+0.027
rr+0.027
Neg logit contributions
Aud-0.011
 Heidi-0.011
 redd-0.010
rified-0.010
 Sharon-0.010
Token to
Token ID284
Feature activation+4.73e-03

Pos logit contributions
ands+0.007
rim+0.007
orse+0.006
 oct+0.006
ulu+0.006
Neg logit contributions
Aud-0.008
 Heidi-0.008
rified-0.008
ifying-0.008
 Sharon-0.008
Token be
Token ID307
Feature activation+1.00e-01

Pos logit contributions
ands+0.007
rim+0.007
orse+0.007
loo+0.006
inations+0.006
Neg logit contributions
 Heidi-0.008
 Sharon-0.007
 Alice-0.007
 Michelle-0.007
 Tab-0.007
Token something
Token ID1223
Feature activation-4.40e-02

Pos logit contributions
ands+0.151
rim+0.142
orse+0.135
 oct+0.134
rr+0.128
Neg logit contributions
Aud-0.178
 Heidi-0.178
 Sharon-0.172
 Alice-0.172
rified-0.171
Token that
Token ID326
Feature activation+2.75e-02

Pos logit contributions
 Heidi+0.081
Aud+0.080
 Alice+0.079
 Sharon+0.078
 redd+0.078
Neg logit contributions
ands-0.063
rim-0.059
orse-0.056
 oct-0.054
rr-0.054
Token was
Token ID373
Feature activation+1.04e-02

Pos logit contributions
ands+0.045
rim+0.043
 oct+0.043
orse+0.042
ulu+0.042
Neg logit contributions
Aud-0.044
ifying-0.042
 Heidi-0.042
rified-0.042
 blonde-0.041
Token charming
Token ID23332
Feature activation+4.23e-03

Pos logit contributions
ands+0.015
rim+0.014
 oct+0.014
orse+0.014
ulu+0.013
Neg logit contributions
Aud-0.019
 Heidi-0.018
rified-0.018
 Sharon-0.018
 blonde-0.018
Token.
Token ID13
Feature activation+1.08e-02

Pos logit contributions
ands+0.005
rim+0.005
 oct+0.005
orse+0.005
ulu+0.004
Neg logit contributions
Aud-0.009
 Heidi-0.008
 Sharon-0.008
rified-0.008
 Stephen-0.008
Token\n
Token ID198
Feature activation+1.89e-02

Pos logit contributions
ulu+0.017
ands+0.017
orse+0.016
rim+0.016
rr+0.015
Neg logit contributions
Aud-0.017
se-0.016
 redd-0.016
string-0.016
ifying-0.016
Token\n
Token ID198
Feature activation+2.20e-02

Pos logit contributions
ands+0.038
rim+0.036
 oct+0.035
rr+0.034
orse+0.033
Neg logit contributions
Aud-0.025
 Heidi-0.025
 Alice-0.024
 Stephen-0.023
 Sharon-0.023
TokenThe
Token ID464
Feature activation+9.82e-03

Pos logit contributions
ands+0.037
rim+0.036
 oct+0.036
orse+0.034
rr+0.032
Neg logit contributions
Aud-0.037
 Heidi-0.034
Scar-0.033
 Alice-0.033
 redd-0.033
Token next
Token ID1306
Feature activation-7.70e-03

Pos logit contributions
ands+0.016
rim+0.014
orse+0.014
inations+0.014
rr+0.013
Neg logit contributions
 Sharon-0.015
 Stephen-0.015
 redd-0.015
 Tab-0.015
Aud-0.015
Token with
Token ID351
Feature activation+9.96e-03

Pos logit contributions
ands+0.069
rim+0.065
orse+0.062
 oct+0.061
rr+0.060
Neg logit contributions
 Heidi-0.081
Aud-0.081
 Alice-0.079
 Sharon-0.078
 Stephen-0.077
Token her
Token ID607
Feature activation+7.28e-03

Pos logit contributions
ands+0.017
rim+0.015
 oct+0.015
orse+0.015
ulu+0.014
Neg logit contributions
Aud-0.016
 Heidi-0.016
 Sharon-0.015
rified-0.015
 Stephen-0.015
Token daddy
Token ID35822
Feature activation+1.58e-02

Pos logit contributions
ands+0.010
rim+0.010
rr+0.009
orse+0.009
 oct+0.008
Neg logit contributions
Aud-0.013
 Heidi-0.013
 Alice-0.013
 blonde-0.013
 Sharon-0.013
Token.
Token ID13
Feature activation+3.12e-03

Pos logit contributions
ands+0.021
rim+0.020
orse+0.019
 oct+0.019
rr+0.018
Neg logit contributions
Aud-0.030
 Heidi-0.030
 redd-0.029
rified-0.029
 Sharon-0.029
Token It
Token ID632
Feature activation+1.17e-03

Pos logit contributions
ands+0.005
orse+0.005
 oct+0.005
ulu+0.005
rr+0.004
Neg logit contributions
Aud-0.005
 redd-0.005
rified-0.005
ifying-0.005
 Heidi-0.005
Token was
Token ID373
Feature activation+1.36e-03

Pos logit contributions
ands+0.002
rim+0.002
orse+0.002
 oct+0.002
rr+0.002
Neg logit contributions
Aud-0.002
 Heidi-0.002
 Sharon-0.002
 Alice-0.002
 Stephen-0.002
Token the
Token ID262
Feature activation-1.70e-02

Pos logit contributions
ands+0.002
rim+0.002
 oct+0.002
orse+0.002
ulu+0.002
Neg logit contributions
Aud-0.002
 Heidi-0.002
rified-0.002
 Sharon-0.002
 Stephen-0.002
Token best
Token ID1266
Feature activation-3.45e-02

Pos logit contributions
Aud+0.024
 Heidi+0.024
 redd+0.023
 Sharon+0.023
 Alice+0.023
Neg logit contributions
ands-0.032
rim-0.030
orse-0.029
 oct-0.028
rr-0.028
Token pool
Token ID5933
Feature activation+5.97e-03

Pos logit contributions
 Alice+0.063
 Heidi+0.060
 Robert+0.059
 Stephen+0.059
 goodbye+0.058
Neg logit contributions
ands-0.051
rim-0.045
orse-0.043
rr-0.042
ulu-0.040
Token ever
Token ID1683
Feature activation+2.81e-02

Pos logit contributions
ands+0.006
rim+0.006
 oct+0.005
orse+0.005
ulu+0.005
Neg logit contributions
Aud-0.013
 Heidi-0.013
 Sharon-0.013
 blonde-0.013
 Stephen-0.013
Token!
Token ID0
Feature activation+5.88e-03

Pos logit contributions
ands+0.045
rim+0.041
orse+0.039
 oct+0.038
rr+0.037
Neg logit contributions
 Alice-0.048
 Heidi-0.048
Aud-0.048
rified-0.046
 Stephen-0.046
Token is
Token ID318
Feature activation-3.18e-03

Pos logit contributions
ands+0.017
 oct+0.016
rim+0.016
orse+0.016
ulu+0.015
Neg logit contributions
 Heidi-0.013
rified-0.013
Aud-0.013
ifying-0.013
 redd-0.013
Token reading
Token ID3555
Feature activation-3.57e-02

Pos logit contributions
Aud+0.006
 Heidi+0.006
rified+0.005
 Sharon+0.005
ifying+0.005
Neg logit contributions
ands-0.005
rim-0.004
orse-0.004
 oct-0.004
rr-0.004
Token a
Token ID257
Feature activation-2.46e-03

Pos logit contributions
Aud+0.072
 Heidi+0.070
rified+0.068
 redd+0.068
 Michelle+0.068
Neg logit contributions
ands-0.044
 oct-0.042
orse-0.037
rim-0.037
rr-0.036
Token book
Token ID1492
Feature activation-6.47e-02

Pos logit contributions
 Heidi+0.004
rified+0.004
Aud+0.003
 Sharon+0.003
 Alice+0.003
Neg logit contributions
ands-0.004
 oct-0.004
rim-0.004
orse-0.004
rr-0.004
Token.
Token ID13
Feature activation-8.78e-03

Pos logit contributions
Aud+0.116
 Heidi+0.116
 Alice+0.113
 Sharon+0.112
 Stephen+0.112
Neg logit contributions
ands-0.096
rim-0.091
orse-0.086
 oct-0.085
rr-0.081
Token\n
Token ID198
Feature activation+1.82e-02

Pos logit contributions
Aud+0.016
se+0.015
rified+0.015
 redd+0.015
string+0.015
Neg logit contributions
ands-0.012
orse-0.012
 oct-0.012
rr-0.011
ulu-0.011
Token\n
Token ID198
Feature activation+2.16e-02

Pos logit contributions
ands+0.033
rim+0.032
 oct+0.030
orse+0.030
rr+0.029
Neg logit contributions
 Heidi-0.027
Aud-0.027
 Alice-0.026
 Sharon-0.026
 Stephen-0.026
Token"
Token ID1
Feature activation-2.14e-03

Pos logit contributions
ands+0.030
rim+0.028
orse+0.027
 oct+0.026
rr+0.025
Neg logit contributions
 Heidi-0.041
Aud-0.041
 blonde-0.040
rified-0.040
 Sharon-0.039
TokenDad
Token ID46270
Feature activation+2.15e-02

Pos logit contributions
Aud+0.003
 Heidi+0.003
 Sharon+0.003
 Alice+0.003
 Stephen+0.003
Neg logit contributions
ands-0.004
rim-0.004
orse-0.004
 oct-0.004
rr-0.004
Token,
Token ID11
Feature activation+1.03e-03

Pos logit contributions
ands+0.020
rr+0.020
 oct+0.019
orse+0.019
rim+0.019
Neg logit contributions
Aud-0.048
ifying-0.047
 redd-0.047
 Heidi-0.046
rified-0.046
Token can
Token ID460
Feature activation-2.40e-02

Pos logit contributions
ands+0.002
 oct+0.002
rim+0.002
orse+0.002
rr+0.001
Neg logit contributions
Aud-0.002
 Heidi-0.002
 blonde-0.002
rified-0.002
 Sharon-0.002
Token some
Token ID617
Feature activation-1.59e-02

Pos logit contributions
Aud+0.006
rified+0.005
 Heidi+0.005
 Audrey+0.005
 Elizabeth+0.005
Neg logit contributions
 oct-0.004
orse-0.003
ands-0.003
rim-0.003
ulu-0.003
Token delicious
Token ID12625
Feature activation-4.43e-02

Pos logit contributions
Aud+0.024
 Heidi+0.024
 Sharon+0.023
rified+0.023
 Stephen+0.023
Neg logit contributions
ands-0.028
rim-0.027
 oct-0.026
orse-0.025
inations-0.025
Token snacks
Token ID28487
Feature activation-6.71e-02

Pos logit contributions
Aud+0.057
 Sharon+0.055
rified+0.055
ifying+0.054
 Scarlett+0.054
Neg logit contributions
ands-0.089
rim-0.086
 oct-0.085
orse-0.081
rr-0.077
Token!
Token ID0
Feature activation+8.05e-03

Pos logit contributions
Aud+0.134
 Heidi+0.132
 blonde+0.131
 Sharon+0.131
rified+0.129
Neg logit contributions
ands-0.086
rim-0.079
 oct-0.077
orse-0.076
ulu-0.071
Token All
Token ID1439
Feature activation-1.50e-03

Pos logit contributions
orse+0.010
ands+0.009
 oct+0.009
ulu+0.009
rim+0.008
Neg logit contributions
 redd-0.016
 blonde-0.016
Aud-0.016
 wrists-0.015
rified-0.015
Token because
Token ID780
Feature activation+4.11e-02

Pos logit contributions
Aud+0.003
 Heidi+0.003
 blonde+0.002
 Sharon+0.002
 redd+0.002
Neg logit contributions
ands-0.002
rim-0.002
 oct-0.002
orse-0.002
ulu-0.002
Token it
Token ID340
Feature activation-4.63e-02

Pos logit contributions
 oct+0.055
ands+0.054
rim+0.052
orse+0.052
ulu+0.050
Neg logit contributions
Aud-0.079
rified-0.076
 Heidi-0.075
se-0.072
 blonde-0.072
Token took
Token ID1718
Feature activation-7.70e-03

Pos logit contributions
Aud+0.081
ifying+0.079
 Heidi+0.078
rified+0.077
 Sharon+0.077
Neg logit contributions
rim-0.068
 oct-0.066
ands-0.066
orse-0.065
ulu-0.064
Token a
Token ID257
Feature activation-4.41e-03

Pos logit contributions
Aud+0.015
bars+0.014
ifying+0.014
 Heidi+0.014
rified+0.014
Neg logit contributions
ands-0.010
rim-0.009
 oct-0.009
orse-0.008
rr-0.008
Token short
Token ID1790
Feature activation+2.26e-02

Pos logit contributions
 Heidi+0.007
Aud+0.007
 Sharon+0.007
rified+0.007
 Alice+0.007
Neg logit contributions
ands-0.007
rim-0.007
orse-0.006
 oct-0.006
rr-0.006
Token,
Token ID11
Feature activation+2.11e-02

Pos logit contributions
ands+0.034
rim+0.032
orse+0.031
 oct+0.030
rr+0.029
Neg logit contributions
Aud-0.040
 Heidi-0.040
 Sharon-0.039
rified-0.038
 Stephen-0.038
Token happy
Token ID3772
Feature activation-9.25e-03

Pos logit contributions
rim+0.104
ands+0.102
 oct+0.099
orse+0.097
ulu+0.093
Neg logit contributions
 Alice-0.086
Aud-0.081
 Sharon-0.080
 Heidi-0.078
 Stephen-0.076
Token.
Token ID13
Feature activation+6.25e-03

Pos logit contributions
 Alice+0.013
Aud+0.012
 Heidi+0.012
 Sharon+0.012
 wrists+0.012
Neg logit contributions
rim-0.017
ands-0.016
aver-0.016
ulu-0.016
orse-0.016
Token<|endoftext|>
Token ID50256
Feature activation-2.91e-02

Pos logit contributions
ands+0.012
rim+0.012
orse+0.012
 oct+0.011
rr+0.011
Neg logit contributions
Aud-0.008
 Heidi-0.008
 Alice-0.008
 Sharon-0.008
 Stephen-0.008
TokenS
Token ID50
Feature activation-4.20e-02

Pos logit contributions
Aud+0.046
 Heidi+0.046
 Sharon+0.044
 redd+0.044
 blonde+0.044
Neg logit contributions
ands-0.050
rim-0.047
 oct-0.045
orse-0.045
rr-0.044
Tokenara
Token ID3301
Feature activation+4.89e-03

Pos logit contributions
Aud+0.058
 Heidi+0.058
 Sharon+0.056
 Alice+0.055
rified+0.055
Neg logit contributions
ands-0.079
rim-0.076
orse-0.073
 oct-0.072
rr-0.070
Token and
Token ID290
Feature activation-4.23e-02

Pos logit contributions
ands+0.008
rim+0.008
orse+0.007
 oct+0.007
rr+0.007
Neg logit contributions
Aud-0.008
 Heidi-0.008
 Sharon-0.008
 Alice-0.007
rified-0.007
Token Ben
Token ID3932
Feature activation-1.66e-02

Pos logit contributions
rified+0.081
ifying+0.080
Aud+0.079
cases+0.079
 redd+0.078
Neg logit contributions
orse-0.052
 oct-0.050
ulu-0.048
ands-0.047
rim-0.043
Token were
Token ID547
Feature activation-2.51e-02

Pos logit contributions
Aud+0.024
 Heidi+0.024
 Sharon+0.023
rified+0.023
ifying+0.023
Neg logit contributions
ands-0.030
rim-0.028
orse-0.027
 oct-0.027
rr-0.026
Token playing
Token ID2712
Feature activation+4.16e-03

Pos logit contributions
Aud+0.042
 Heidi+0.042
 Sharon+0.041
rified+0.040
 Stephen+0.040
Neg logit contributions
ands-0.040
rim-0.038
orse-0.037
 oct-0.036
rr-0.035
Token in
Token ID287
Feature activation+7.39e-03

Pos logit contributions
ands+0.008
rim+0.008
orse+0.008
inations+0.007
 oct+0.007
Neg logit contributions
Aud-0.005
 Heidi-0.005
rified-0.005
 Alice-0.005
 Sharon-0.005
Token the
Token ID262
Feature activation+2.96e-02

Pos logit contributions
rim+0.018
ands+0.018
orse+0.018
inations+0.018
aver+0.017
Neg logit contributions
 Alice-0.008
 Heidi-0.007
 Ashley-0.006
 Sharon-0.006
 Monica-0.006
Token day
Token ID1110
Feature activation+3.64e-02

Pos logit contributions
ands+0.055
inations+0.047
rim+0.046
rr+0.041
orse+0.039
Neg logit contributions
 Heidi-0.048
 Alice-0.046
 redd-0.045
Aud-0.044
 Cathy-0.044
Token,
Token ID11
Feature activation-1.04e-03

Pos logit contributions
ands+0.058
rim+0.054
orse+0.050
 oct+0.050
rr+0.048
Neg logit contributions
 Heidi-0.063
Aud-0.062
 Alice-0.061
 Stephen-0.060
 Sharon-0.060
Token he
Token ID339
Feature activation-5.96e-03

Pos logit contributions
 Heidi+0.002
 Alice+0.002
 Stephen+0.002
 redd+0.002
 Sharon+0.002
Neg logit contributions
ands-0.002
rim-0.002
rr-0.002
inations-0.002
orse-0.002
Token met
Token ID1138
Feature activation+1.78e-02

Pos logit contributions
 Heidi+0.008
Aud+0.008
 Alice+0.008
 Sharon+0.008
 blonde+0.008
Neg logit contributions
ands-0.012
rim-0.011
orse-0.010
 oct-0.010
rr-0.010
Token a
Token ID257
Feature activation-2.97e-02

Pos logit contributions
ands+0.029
rim+0.027
orse+0.026
 oct+0.026
rr+0.025
Neg logit contributions
Aud-0.029
 Heidi-0.029
 Sharon-0.028
 Alice-0.028
 blonde-0.028
Token new
Token ID649
Feature activation-4.35e-02

Pos logit contributions
Aud+0.051
 Heidi+0.051
 Sharon+0.049
 Alice+0.049
rified+0.049
Neg logit contributions
ands-0.046
rim-0.044
orse-0.042
 oct-0.042
rr-0.040
Token friend
Token ID1545
Feature activation+2.83e-02

Pos logit contributions
 Heidi+0.065
 Sharon+0.063
Aud+0.062
 Alice+0.062
 Stephen+0.062
Neg logit contributions
ands-0.081
rim-0.074
inations-0.071
orse-0.071
rr-0.068
Token named
Token ID3706
Feature activation-5.56e-05

Pos logit contributions
ands+0.042
rim+0.036
inations+0.034
orse+0.034
rr+0.032
Neg logit contributions
 Heidi-0.055
 Alice-0.054
 Sharon-0.053
 Stephen-0.053
Aud-0.053
Token Billy
Token ID15890
Feature activation-2.78e-02

Pos logit contributions
Aud+0.000
ifying+0.000
 Heidi+0.000
rified+0.000
 redd+0.000
Neg logit contributions
ands-0.000
 oct-0.000
rim-0.000
orse-0.000
ulu-0.000
Token.
Token ID13
Feature activation-1.00e-02

Pos logit contributions
 Heidi+0.051
Aud+0.050
 Alice+0.049
 Stephen+0.049
 Sharon+0.049
Neg logit contributions
ands-0.042
rim-0.038
orse-0.035
inations-0.035
 oct-0.034
Token Billy
Token ID15890
Feature activation-2.49e-02

Pos logit contributions
rified+0.017
Aud+0.016
ifying+0.016
licks+0.016
 redd+0.016
Neg logit contributions
orse-0.015
 oct-0.015
ands-0.014
rim-0.014
rr-0.013
Token clever
Token ID14169
Feature activation-2.55e-02

Pos logit contributions
 Heidi+0.053
Aud+0.053
 Sharon+0.052
 redd+0.051
 blonde+0.051
Neg logit contributions
ands-0.059
rim-0.054
orse-0.050
rr-0.049
inations-0.049
Token cat
Token ID3797
Feature activation-1.84e-03

Pos logit contributions
Aud+0.041
 Heidi+0.040
 Sharon+0.039
 Alice+0.039
 Stephen+0.039
Neg logit contributions
ands-0.043
rim-0.040
orse-0.039
 oct-0.038
rr-0.037
Token named
Token ID3706
Feature activation-8.91e-03

Pos logit contributions
 Alice+0.003
 Heidi+0.003
 Sharon+0.003
 Stephen+0.003
 Michelle+0.003
Neg logit contributions
ands-0.003
inations-0.003
rim-0.003
rr-0.003
ulu-0.003
Token Kitty
Token ID21168
Feature activation-3.35e-02

Pos logit contributions
rified+0.012
ifying+0.012
 Heidi+0.012
 Stephen+0.012
 Sharon+0.012
Neg logit contributions
ands-0.016
 oct-0.015
orse-0.015
rim-0.015
ulu-0.015
Token.
Token ID13
Feature activation-1.89e-02

Pos logit contributions
 Heidi+0.056
 Alice+0.055
Aud+0.054
 Stephen+0.053
 Sharon+0.053
Neg logit contributions
ands-0.056
rim-0.052
inations-0.048
orse-0.047
rr-0.047
Token Kitty
Token ID21168
Feature activation-3.04e-02

Pos logit contributions
Aud+0.023
 Heidi+0.023
 Sharon+0.022
rified+0.022
ifying+0.022
Neg logit contributions
ands-0.038
rim-0.036
orse-0.035
 oct-0.035
rr-0.034
Token liked
Token ID8288
Feature activation-2.65e-02

Pos logit contributions
 Heidi+0.044
Aud+0.044
 Alice+0.043
 Sharon+0.042
 blonde+0.042
Neg logit contributions
ands-0.056
rim-0.053
orse-0.051
 oct-0.050
rr-0.049
Token to
Token ID284
Feature activation+7.37e-03

Pos logit contributions
rified+0.044
ifying+0.043
 Heidi+0.041
 Sharon+0.040
Aud+0.040
Neg logit contributions
ands-0.044
 oct-0.043
rim-0.040
orse-0.039
rr-0.038
Token play
Token ID711
Feature activation+3.62e-02

Pos logit contributions
rim+0.013
ands+0.012
inations+0.012
ulu+0.012
rr+0.012
Neg logit contributions
 blonde-0.011
Aud-0.011
 Heidi-0.011
 Alice-0.010
 Sharon-0.010
Token with
Token ID351
Feature activation+4.96e-03

Pos logit contributions
ands+0.066
rim+0.065
orse+0.064
 oct+0.061
inations+0.061
Neg logit contributions
Aud-0.052
 Heidi-0.050
 Alice-0.049
 Audrey-0.048
 Stephen-0.047
Token her
Token ID607
Feature activation+6.35e-03

Pos logit contributions
ands+0.009
rim+0.009
 oct+0.009
orse+0.008
rr+0.008
Neg logit contributions
 Heidi-0.007
Aud-0.007
rified-0.007
 Sharon-0.007
ifying-0.007
Token them
Token ID606
Feature activation-2.07e-02

Pos logit contributions
ands+0.019
 oct+0.019
ulu+0.017
 himself+0.016
rim+0.015
Neg logit contributions
Aud-0.043
 Heidi-0.041
 Stephen-0.040
 Scarlett-0.040
cases-0.040
Token away
Token ID1497
Feature activation-3.59e-03

Pos logit contributions
 Heidi+0.034
Aud+0.033
 Sharon+0.033
 Stephen+0.032
 Michelle+0.032
Neg logit contributions
ands-0.034
 oct-0.030
rim-0.030
rr-0.029
ulu-0.029
Token.
Token ID13
Feature activation-3.43e-04

Pos logit contributions
Aud+0.008
 Heidi+0.008
 Sharon+0.008
 Stephen+0.008
 Monica+0.008
Neg logit contributions
ands-0.004
rim-0.003
 oct-0.003
ulu-0.003
rr-0.003
Token Lily
Token ID20037
Feature activation+4.61e-03

Pos logit contributions
Aud+0.001
ifying+0.001
string+0.001
 Stephen+0.001
 Heidi+0.001
Neg logit contributions
ands-0.000
ulu-0.000
 oct-0.000
orse-0.000
rim-0.000
Token felt
Token ID2936
Feature activation+1.19e-02

Pos logit contributions
orse+0.006
 oct+0.006
ands+0.006
rim+0.006
ulu+0.005
Neg logit contributions
Aud-0.009
ifying-0.009
 Heidi-0.009
 redd-0.008
 Stephen-0.008
Token sorry
Token ID7926
Feature activation-4.52e-02

Pos logit contributions
ands+0.022
rim+0.021
orse+0.020
 oct+0.020
rr+0.020
Neg logit contributions
 Heidi-0.017
Aud-0.016
 Sharon-0.016
 Alice-0.016
rified-0.016
Token for
Token ID329
Feature activation-3.09e-03

Pos logit contributions
 Heidi+0.077
Aud+0.075
 Alice+0.075
rified+0.075
 Sharon+0.074
Neg logit contributions
ands-0.072
rim-0.070
orse-0.065
 oct-0.063
rr-0.061
Token the
Token ID262
Feature activation-7.81e-04

Pos logit contributions
 Heidi+0.004
 Alice+0.004
 Sharon+0.004
 Stephen+0.004
 Michelle+0.004
Neg logit contributions
ands-0.006
rim-0.006
orse-0.006
rr-0.005
inations-0.005
Token girl
Token ID2576
Feature activation+3.98e-02

Pos logit contributions
 Heidi+0.001
Aud+0.001
 Alice+0.001
 Sharon+0.001
 redd+0.001
Neg logit contributions
ands-0.001
rim-0.001
orse-0.001
 oct-0.001
rr-0.001
Token.
Token ID13
Feature activation+3.47e-03

Pos logit contributions
ands+0.044
orse+0.044
 oct+0.044
rim+0.042
ulu+0.039
Neg logit contributions
Aud-0.083
 redd-0.080
ifying-0.079
 Heidi-0.079
 blonde-0.078
Token She
Token ID1375
Feature activation+9.69e-03

Pos logit contributions
ands+0.005
 oct+0.005
ulu+0.005
rr+0.005
 myself+0.005
Neg logit contributions
ifying-0.006
Aud-0.005
 redd-0.005
 Stephen-0.005
string-0.005
Token do
Token ID466
Feature activation+7.29e-02

Pos logit contributions
Aud+0.001
 Heidi+0.001
 Alice+0.001
ifying+0.001
 Sharon+0.001
Neg logit contributions
ands-0.001
rim-0.001
orse-0.001
 oct-0.001
ulu-0.001
Token!"
Token ID2474
Feature activation-1.11e-02

Pos logit contributions
ands+0.109
rim+0.108
 oct+0.102
orse+0.100
ulu+0.099
Neg logit contributions
 Alice-0.122
Aud-0.122
 Heidi-0.120
 blonde-0.119
 Sharon-0.119
Token His
Token ID2399
Feature activation-1.07e-02

Pos logit contributions
 Heidi+0.012
Aud+0.012
 Alice+0.012
 Sharon+0.012
 Stephen+0.012
Neg logit contributions
ands-0.024
rim-0.023
orse-0.022
 oct-0.022
rr-0.022
Token mom
Token ID1995
Feature activation-2.64e-02

Pos logit contributions
 Heidi+0.014
 Alice+0.014
Aud+0.013
 Sharon+0.013
 blonde+0.013
Neg logit contributions
ands-0.022
rim-0.021
orse-0.019
 oct-0.019
rr-0.019
Tokenmy
Token ID1820
Feature activation-3.30e-03

Pos logit contributions
Aud+0.037
 Heidi+0.037
 Sharon+0.035
 Alice+0.035
 Stephen+0.035
Neg logit contributions
ands-0.050
rim-0.048
orse-0.046
 oct-0.045
rr-0.044
Token smiled
Token ID13541
Feature activation-5.73e-02

Pos logit contributions
Aud+0.005
 Heidi+0.005
ifying+0.005
 redd+0.005
 blonde+0.005
Neg logit contributions
ands-0.005
rim-0.005
orse-0.005
 oct-0.005
rr-0.005
Token and
Token ID290
Feature activation-8.94e-03

Pos logit contributions
Aud+0.125
 Heidi+0.121
 redd+0.118
 Sharon+0.116
licks+0.116
Neg logit contributions
 oct-0.058
ands-0.055
orse-0.053
rr-0.053
rim-0.053
Token said
Token ID531
Feature activation+3.33e-03

Pos logit contributions
ifying+0.020
 blonde+0.019
Aud+0.019
rified+0.019
 Heidi+0.019
Neg logit contributions
orse-0.009
 oct-0.009
rim-0.008
ands-0.008
ulu-0.008
Token,
Token ID11
Feature activation-1.71e-02

Pos logit contributions
orse+0.003
ands+0.003
 oct+0.003
rim+0.002
rr+0.002
Neg logit contributions
Aud-0.008
ifying-0.008
 redd-0.008
 Monica-0.008
 Heidi-0.008
Token "
Token ID366
Feature activation-5.24e-02

Pos logit contributions
Aud+0.020
rified+0.019
ifying+0.019
 Heidi+0.018
 redd+0.018
Neg logit contributions
orse-0.035
ands-0.034
 oct-0.034
rim-0.034
rr-0.032
TokenMaybe
Token ID13300
Feature activation-7.98e-03

Pos logit contributions
Aud+0.075
 Heidi+0.075
 Sharon+0.072
 Alice+0.071
 Stephen+0.071
Neg logit contributions
ands-0.097
rim-0.092
orse-0.089
 oct-0.088
rr-0.085
Token,
Token ID11
Feature activation-1.10e-02

Pos logit contributions
ands+0.006
orse+0.006
 oct+0.006
rim+0.005
rr+0.005
Neg logit contributions
 redd-0.012
ifying-0.012
Aud-0.012
 Heidi-0.012
rified-0.012
Token a
Token ID257
Feature activation-4.59e-02

Pos logit contributions
Aud+0.021
 redd+0.020
 Heidi+0.020
ifying+0.020
licks+0.020
Neg logit contributions
ands-0.014
 oct-0.013
orse-0.013
ulu-0.013
 seek-0.012
Token kind
Token ID1611
Feature activation-6.58e-02

Pos logit contributions
Aud+0.075
 Heidi+0.075
 Sharon+0.074
 Alice+0.073
 Stephen+0.072
Neg logit contributions
ands-0.077
rim-0.073
orse-0.069
 oct-0.066
rr-0.065
Token man
Token ID582
Feature activation-1.28e-02

Pos logit contributions
 Heidi+0.107
Aud+0.106
 Sharon+0.104
 Alice+0.103
 Stephen+0.102
Neg logit contributions
ands-0.109
rim-0.103
orse-0.098
 oct-0.096
rr-0.093
Token named
Token ID3706
Feature activation-1.06e-02

Pos logit contributions
 Heidi+0.018
Aud+0.018
 Sharon+0.017
 Alice+0.017
 Stephen+0.017
Neg logit contributions
ands-0.024
rim-0.023
orse-0.022
 oct-0.022
rr-0.021
Token Mr
Token ID1770
Feature activation-1.08e-01

Pos logit contributions
ifying+0.014
Aud+0.013
 redd+0.013
 blonde+0.013
 Heidi+0.012
Neg logit contributions
 oct-0.021
ands-0.020
rr-0.019
rim-0.019
ulu-0.019
Token.
Token ID13
Feature activation+1.42e-02

Pos logit contributions
Aud+0.240
 blonde+0.238
 Heidi+0.238
 redd+0.238
ifying+0.234
Neg logit contributions
ands-0.111
 oct-0.096
rim-0.095
rr-0.093
orse-0.093
Token Johnson
Token ID5030
Feature activation-3.81e-02

Pos logit contributions
 oct+0.024
ands+0.023
rr+0.023
 seek+0.022
orse+0.021
Neg logit contributions
Aud-0.020
 redd-0.020
ifying-0.020
 Heidi-0.020
rified-0.019
Token heard
Token ID2982
Feature activation-1.04e-02

Pos logit contributions
ifying+0.067
Aud+0.067
 redd+0.066
 blonde+0.065
 Heidi+0.065
Neg logit contributions
orse-0.055
 oct-0.054
rim-0.054
ands-0.054
ulu-0.051
Token Lily
Token ID20037
Feature activation+2.96e-02

Pos logit contributions
 Heidi+0.015
Aud+0.015
 Alice+0.014
 Sharon+0.014
 Stephen+0.014
Neg logit contributions
ands-0.019
rim-0.018
orse-0.018
 oct-0.017
rr-0.017
Token's
Token ID338
Feature activation+9.05e-03

Pos logit contributions
ands+0.068
rim+0.065
orse+0.063
 oct+0.062
ulu+0.061
Neg logit contributions
 Heidi-0.029
Aud-0.029
 Alice-0.028
 Sharon-0.028
rified-0.027
Token€
Token ID26391
Feature activation-3.90e-03

Pos logit contributions
ands+0.037
rim+0.034
 oct+0.034
orse+0.034
inations+0.033
Neg logit contributions
Aud-0.027
 Heidi-0.026
 Alice-0.026
rified-0.025
 Sharon-0.025
Token said
Token ID531
Feature activation+1.69e-02

Pos logit contributions
 redd+0.006
rified+0.006
licks+0.006
ifying+0.006
string+0.006
Neg logit contributions
orse-0.007
rim-0.006
 seek-0.006
ands-0.006
 oct-0.006
Token Sally
Token ID25737
Feature activation-2.06e-02

Pos logit contributions
 oct+0.023
rim+0.023
orse+0.023
ands+0.022
ulu+0.021
Neg logit contributions
Aud-0.030
 redd-0.030
 Heidi-0.029
rified-0.029
licks-0.029
Token,
Token ID11
Feature activation-1.58e-02

Pos logit contributions
Aud+0.046
 redd+0.045
 Heidi+0.044
rified+0.043
ifying+0.043
Neg logit contributions
 oct-0.019
orse-0.017
rim-0.017
ands-0.016
ulu-0.016
Token looking
Token ID2045
Feature activation-3.08e-03

Pos logit contributions
Aud+0.020
 Heidi+0.020
rified+0.020
 redd+0.019
 Sharon+0.018
Neg logit contributions
 oct-0.030
orse-0.030
rim-0.029
ands-0.029
ulu-0.027
Token up
Token ID510
Feature activation-1.22e-01

Pos logit contributions
 Heidi+0.005
Aud+0.005
 Sharon+0.005
 Stephen+0.005
 Michelle+0.005
Neg logit contributions
 oct-0.005
ands-0.005
rim-0.005
rr-0.004
orse-0.004
Token at
Token ID379
Feature activation-1.20e-02

Pos logit contributions
Aud+0.208
 Heidi+0.204
 Sharon+0.196
 Stephen+0.195
 blonde+0.194
Neg logit contributions
ands-0.187
 oct-0.177
rim-0.177
orse-0.169
rr-0.169
Token Peter
Token ID5613
Feature activation+1.58e-02

Pos logit contributions
 Heidi+0.020
Aud+0.020
 Alice+0.019
 Sharon+0.019
 Stephen+0.019
Neg logit contributions
ands-0.019
rim-0.018
orse-0.017
 oct-0.017
rr-0.017
Token.
Token ID13
Feature activation+1.44e-03

Pos logit contributions
ands+0.022
rim+0.021
 oct+0.020
orse+0.020
rr+0.019
Neg logit contributions
Aud-0.030
 Heidi-0.029
 Sharon-0.029
 redd-0.029
 blonde-0.028
Token �
Token ID6184
Feature activation-4.51e-03

Pos logit contributions
 oct+0.002
orse+0.002
ulu+0.002
ands+0.002
rim+0.002
Neg logit contributions
Aud-0.002
 redd-0.002
rified-0.002
 wrists-0.002
 Heidi-0.002
Token�
Token ID95
Feature activation+6.76e-02

Pos logit contributions
Aud+0.006
 Heidi+0.006
 Sharon+0.005
 Stephen+0.005
 Alice+0.005
Neg logit contributions
ands-0.009
rim-0.009
orse-0.008
 oct-0.008
rr-0.008
Token be
Token ID307
Feature activation+9.96e-02

Pos logit contributions
ands+0.013
rim+0.012
inations+0.012
orse+0.012
loo+0.011
Neg logit contributions
 Heidi-0.016
Aud-0.016
 Sharon-0.016
 Monica-0.015
 Alice-0.015
Token very
Token ID845
Feature activation+1.37e-02

Pos logit contributions
ands+0.116
rim+0.107
orse+0.100
 oct+0.099
rr+0.095
Neg logit contributions
 Heidi-0.209
Aud-0.208
rified-0.206
ifying-0.204
 Sharon-0.204
Token quiet
Token ID5897
Feature activation+1.28e-02

Pos logit contributions
ands+0.023
rim+0.021
orse+0.020
 oct+0.020
rr+0.019
Neg logit contributions
 redd-0.022
Aud-0.021
 Heidi-0.021
 Sharon-0.021
 Alice-0.020
Token and
Token ID290
Feature activation+2.51e-02

Pos logit contributions
ands+0.016
rim+0.015
 oct+0.014
orse+0.014
rr+0.014
Neg logit contributions
Aud-0.026
 Heidi-0.026
 Sharon-0.025
 redd-0.025
 blonde-0.025
Token sneak
Token ID20528
Feature activation-2.45e-02

Pos logit contributions
ands+0.032
rim+0.030
orse+0.029
 oct+0.029
rr+0.027
Neg logit contributions
Aud-0.049
 Heidi-0.049
rified-0.048
 Sharon-0.048
 blonde-0.048
Token up
Token ID510
Feature activation-1.05e-01

Pos logit contributions
 Heidi+0.034
Aud+0.034
 blonde+0.033
 redd+0.032
 Michelle+0.031
Neg logit contributions
ands-0.048
 oct-0.044
rim-0.043
orse-0.042
rr-0.041
Token on
Token ID319
Feature activation-5.16e-02

Pos logit contributions
 Heidi+0.133
Aud+0.131
 Alice+0.130
 Sharon+0.128
ifying+0.127
Neg logit contributions
ands-0.204
rim-0.200
orse-0.192
 oct-0.183
ulu-0.181
Token them
Token ID606
Feature activation-4.63e-03

Pos logit contributions
Aud+0.093
 Heidi+0.092
 redd+0.091
rified+0.090
 blonde+0.089
Neg logit contributions
ands-0.076
rim-0.070
 oct-0.069
orse-0.067
rr-0.064
Token."
Token ID526
Feature activation-1.35e-02

Pos logit contributions
 Heidi+0.010
Aud+0.010
 blonde+0.009
 Sharon+0.009
 redd+0.009
Neg logit contributions
ands-0.005
rim-0.005
 oct-0.005
orse-0.005
rr-0.005
Token Max
Token ID5436
Feature activation-4.00e-02

Pos logit contributions
 redd+0.017
ifying+0.017
Aud+0.015
 Tab+0.015
 Heidi+0.015
Neg logit contributions
ands-0.027
orse-0.025
 oct-0.025
rim-0.025
rr-0.024
Token listened
Token ID16399
Feature activation-5.32e-02

Pos logit contributions
Aud+0.066
ifying+0.066
 Heidi+0.066
 redd+0.064
 blonde+0.063
Neg logit contributions
ands-0.062
rim-0.061
 oct-0.060
orse-0.059
rr-0.056
Token,
Token ID11
Feature activation-8.40e-03

Pos logit contributions
Aud+0.177
 Heidi+0.177
 redd+0.174
rified+0.171
 Sharon+0.171
Neg logit contributions
ands-0.116
rim-0.108
orse-0.106
 oct-0.105
rr-0.099
Token sheep
Token ID15900
Feature activation-6.78e-02

Pos logit contributions
 Heidi+0.014
rified+0.014
Aud+0.014
 Sharon+0.014
 Scarlett+0.014
Neg logit contributions
 oct-0.013
ands-0.013
orse-0.012
rim-0.012
rr-0.011
Token,
Token ID11
Feature activation-1.88e-02

Pos logit contributions
 redd+0.143
 Heidi+0.142
Aud+0.140
rified+0.138
 blonde+0.137
Neg logit contributions
ands-0.077
 oct-0.074
rim-0.073
orse-0.070
rr-0.063
Token pigs
Token ID22333
Feature activation-7.76e-02

Pos logit contributions
 Heidi+0.035
rified+0.034
Aud+0.034
 Sharon+0.034
 redd+0.034
Neg logit contributions
 oct-0.026
ands-0.025
orse-0.024
rr-0.023
rim-0.023
Token,
Token ID11
Feature activation-2.22e-02

Pos logit contributions
Aud+0.157
 Heidi+0.157
 redd+0.156
rified+0.154
ifying+0.152
Neg logit contributions
ands-0.095
rim-0.089
 oct-0.086
orse-0.086
rr-0.080
Token chickens
Token ID25972
Feature activation-1.10e-01

Pos logit contributions
 Heidi+0.044
rified+0.043
Aud+0.042
 Sharon+0.042
 redd+0.042
Neg logit contributions
 oct-0.028
ands-0.028
orse-0.026
rr-0.025
rim-0.024
Token,
Token ID11
Feature activation-2.32e-02

Pos logit contributions
 Heidi+0.223
 redd+0.223
Aud+0.222
rified+0.218
 blonde+0.215
Neg logit contributions
ands-0.135
 oct-0.124
rim-0.124
orse-0.123
rr-0.113
Token ducks
Token ID39694
Feature activation-6.00e-02

Pos logit contributions
 Heidi+0.047
rified+0.046
Aud+0.045
 Sharon+0.045
 redd+0.045
Neg logit contributions
ands-0.029
 oct-0.028
orse-0.026
rr-0.025
rim-0.025
Token,
Token ID11
Feature activation-1.99e-02

Pos logit contributions
Aud+0.115
 Heidi+0.115
 redd+0.112
 Sharon+0.111
rified+0.111
Neg logit contributions
ands-0.081
rim-0.076
 oct-0.074
orse-0.074
rr-0.070
Token and
Token ID290
Feature activation-3.57e-02

Pos logit contributions
 Heidi+0.038
rified+0.038
Aud+0.037
 Sharon+0.037
 redd+0.036
Neg logit contributions
ands-0.027
 oct-0.026
orse-0.024
rr-0.023
rim-0.023
Token more
Token ID517
Feature activation-1.78e-02

Pos logit contributions
 Heidi+0.055
Aud+0.054
rified+0.053
 Sharon+0.052
 Alice+0.051
Neg logit contributions
ands-0.062
 oct-0.059
rim-0.057
orse-0.057
rr-0.054
Token of
Token ID286
Feature activation-2.13e-02

Pos logit contributions
rified+0.004
 blonde+0.004
 Heidi+0.004
 Alice+0.003
Aud+0.003
Neg logit contributions
rim-0.003
ands-0.003
orse-0.003
rr-0.003
 oct-0.003
Token places
Token ID4113
Feature activation-2.09e-02

Pos logit contributions
Aud+0.038
 Heidi+0.038
 Sharon+0.037
ifying+0.037
 Stephen+0.037
Neg logit contributions
ands-0.032
rim-0.030
 oct-0.029
orse-0.028
rr-0.027
Token.
Token ID13
Feature activation+6.33e-03

Pos logit contributions
Aud+0.042
 Heidi+0.041
 redd+0.040
 Sharon+0.040
rified+0.040
Neg logit contributions
ands-0.027
rim-0.024
orse-0.024
 oct-0.024
inations-0.022
Token John
Token ID1757
Feature activation-2.17e-02

Pos logit contributions
 hour+0.010
ands+0.010
rr+0.010
orse+0.010
inations+0.010
Neg logit contributions
Aud-0.009
 redd-0.009
 sho-0.009
ifying-0.009
string-0.008
Token came
Token ID1625
Feature activation-2.40e-02

Pos logit contributions
ifying+0.038
 Heidi+0.037
Aud+0.037
 redd+0.037
 blonde+0.035
Neg logit contributions
orse-0.032
ands-0.031
 oct-0.031
rim-0.030
ulu-0.028
Token up
Token ID510
Feature activation-1.08e-01

Pos logit contributions
Aud+0.037
 redd+0.036
 blonde+0.036
ifying+0.036
 Heidi+0.035
Neg logit contributions
 oct-0.038
ands-0.038
rr-0.038
ulu-0.037
orse-0.036
Token with
Token ID351
Feature activation-8.47e-03

Pos logit contributions
Aud+0.177
 Heidi+0.174
 redd+0.168
rified+0.168
 blonde+0.167
Neg logit contributions
ands-0.177
rim-0.168
 oct-0.165
orse-0.162
rr-0.156
Token one
Token ID530
Feature activation+2.18e-03

Pos logit contributions
Aud+0.016
 Heidi+0.015
rified+0.015
 Sharon+0.015
ifying+0.015
Neg logit contributions
ands-0.011
 oct-0.011
rim-0.010
orse-0.010
ulu-0.010
Token spot
Token ID4136
Feature activation-1.16e-02

Pos logit contributions
ands+0.003
rim+0.003
orse+0.003
rr+0.003
 oct+0.003
Neg logit contributions
 Heidi-0.004
Aud-0.004
rified-0.004
 Alice-0.004
 blonde-0.004
Token,
Token ID11
Feature activation-2.18e-03

Pos logit contributions
Aud+0.023
 Heidi+0.022
 Sharon+0.022
 redd+0.022
 Stephen+0.021
Neg logit contributions
ands-0.014
 oct-0.014
rim-0.014
orse-0.013
ulu-0.013
Token but
Token ID475
Feature activation+1.75e-02

Pos logit contributions
Aud+0.005
ifying+0.005
se+0.004
 Heidi+0.004
rified+0.004
Neg logit contributions
 oct-0.002
orse-0.002
ands-0.002
 seek-0.002
rim-0.002