TOP ACTIVATIONS
MAX = 0.131

put them back in their rightful place . After
the rice back in its rightful place . John was proud
ams and cand ies and fruits and vegetables and letters and
and houses and cars and anything you like . You can
restore it back to its rightful place . Although Sammy was
shovel s and shells and stones . They worked hard and
and rocks and feathers and fruits . They were flexible .
, regardless of age or appearance . <|endoftext|> Once upon a
time to race everywhere and anywhere he could .
like leaves , rocks and bugs . They had a magn
better understanding of right and wrong . From then on ,
in the bath tub or anywhere . We will be good
and carrot and flower and mud . We have lovely friends
play with sticks and missiles anymore . They were bitter and
to return it to its rightful owner if you can find
shells , rocks , and crabs . One day , they
the stolen squares to their rightful place . The
grew leaves and flowers and fruits . It was a magic
, on books , on anything . He thinks writing is
and cats and cars and people . They can hurt us

MIN ACTIVATIONS
MIN = -0.108

The ot ter was be f uddled but curious . He
The old lady became known by many as the Y arn
friendly bee named Buzz saw Sp ide y struggling and asked
sign that said " Sur f Show Today ". She was
" Hi Sp ud !" said Lily .
. His name was Wil f . One day Wil f
" I have my fl ute ," replied Lily .
sign that says " Sur f Lessons for Kids ". She
we can get you a ch ubby cat . But for
day , Lily saw a ch ubby squirrel outside her window
" I am !" replied Sp ud . " I just
" What is a fire work ?" His dad said ,
didn 't understand what " f ierce " meant , but
day , Lily saw a ch ubby chicken walking in her
her pet , " Would n 't it be fun to
he decided to put his headphones on and listen to some
't want to remove her headphones because she loved the music
, " I am a night owl , which means I
But when it is night , L ila does not
" That is a fire work that is not working right

INTERVAL 0.071 to 0.131
CONTAINS 0.067%

be kind and respectful to others . Do you understand ?"
l aces , and ear rings . They are happy .
and couldn 't find it anywhere . She looked under the
wore funny hats and scar ves . They climbed up the
slide , and the sees aw . They laughed and smiled

INTERVAL 0.011 to 0.071
CONTAINS 26.647%

climb up and down the tree , while the bunny liked
the bath tub ! The warm water spilled
put them in a big basket and walked to the church
there was a little girl named Lily . She loved to
and played together in the jungle every day . <|endoftext|> Once

INTERVAL -0.048 to 0.011
CONTAINS 72.613%

" Let 's go get something to drink ." The small
can share it ." Sam said . He put the medal
the young boy often spent time in the courtyard , folding
They were happy . <|endoftext|> Anna and Ben were twins who
large hive . " What 's that ?" asked

INTERVAL -0.108 to -0.048
CONTAINS 0.673%

kids who would appreciate her fl ute and her acting ,
wall . It is almost evening . The sun is going
She got out of bed and took her sheet with
, they see a real train outside their window . It
The squirrel spoke , " Well , why don 't you
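The four INTERVAL bands above appear to be equal-width bins spanning MIN to MAX. Each band is about (0.131 - (-0.108)) / 4 ≈ 0.060 wide, and the CONTAINS percentages sum to 100%. The Python sketch below shows how such bands and shares could be computed. It assumes the bins are equal-width (the dashboard does not say so), and the sample activations are hypothetical.

```python
# Sketch: equal-width activation bands between MIN and MAX, plus the share of
# activations in each band. Assumption: the dashboard's bands are equal-width;
# the sample activations below are hypothetical.
from typing import List


def band_edges(lo: float, hi: float, n_bands: int = 4) -> List[float]:
    """Return n_bands + 1 edges from lo to hi; the last edge is exactly hi."""
    width = (hi - lo) / n_bands
    return [lo + k * width for k in range(n_bands)] + [hi]


def band_shares(acts: List[float], edges: List[float]) -> List[float]:
    """Percentage of activations in each band, lowest band first.

    Bands are half-open [edge_k, edge_k+1), except the top band, which also
    includes hi so that the maximum activation is counted.
    """
    n = len(edges) - 1
    counts = [0] * n
    for a in acts:
        for k in range(n):
            in_band = edges[k] <= a < edges[k + 1]
            if in_band or (k == n - 1 and a == edges[-1]):
                counts[k] += 1
                break
    return [100.0 * c / len(acts) for c in counts]


edges = band_edges(-0.108, 0.131)         # MIN and MAX from the dashboard
print([round(e, 4) for e in edges])

sample = [-0.1, 0.0, 0.005, 0.05, 0.131]  # hypothetical activations
print(band_shares(sample, edges))          # lowest band first

# The dashboard's CONTAINS figures, lowest band first, should total 100%.
dashboard_shares = [0.673, 72.613, 26.647, 0.067]
print(round(sum(dashboard_shares), 3))
```

The computed edges agree with the INTERVAL bounds to within rounding. The dashboard lists its bands from highest to lowest, so its order is the reverse of the list returned here.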
Token put
Token ID1234
Feature activation+2.74e-02

Pos logit contributions
aw+0.046
Kat+0.044
 reality+0.040
 Sandra+0.040
 theirs+0.039
Neg logit contributions
balls-0.043
hetically-0.040
ishly-0.040
 hips-0.039
ider-0.037
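Each hover record follows a fixed plain-text layout: the token text appended directly to "Token" (so " put" keeps its leading space), the token ID, the feature activation, and then five positive and five negative logit contributions. The parser below is a minimal sketch that assumes exactly this layout. It also assumes each contribution row is a token string immediately followed by a signed number. The Record class and parse_record function are illustrative names, not part of any dashboard tool.

```python
# Sketch: parse one hover record from this dump into a structured object.
# Assumption: every record follows the layout used here, and each contribution
# row is a token string immediately followed by a signed decimal number.
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ROW = re.compile(r"^(.*?)([+-]\d+\.\d+(?:e[+-]\d+)?)$")


@dataclass
class Record:                        # hypothetical container for one record
    token: str                       # keeps any leading space, e.g. " put"
    token_id: int
    activation: float
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


def parse_record(lines: List[str]) -> Record:
    rec = Record(
        token=lines[0][len("Token"):],
        token_id=int(lines[1][len("Token ID"):]),
        activation=float(lines[2][len("Feature activation"):]),
    )
    target: Optional[List[Tuple[str, float]]] = None
    for line in lines[3:]:
        if line.startswith("Pos logit"):
            target = rec.pos
        elif line.startswith("Neg logit"):
            target = rec.neg
        elif target is not None:
            m = ROW.match(line)          # blank lines simply fail to match
            if m:
                target.append((m.group(1), float(m.group(2))))
    return rec


put_lines = [
    "Token put", "Token ID1234", "Feature activation+2.74e-02", "",
    "Pos logit contributions",
    "aw+0.046", "Kat+0.044", " reality+0.040", " Sandra+0.040", " theirs+0.039",
    "Neg logit contributions",
    "balls-0.043", "hetically-0.040", "ishly-0.040", " hips-0.039", "ider-0.037",
]
rec = parse_record(put_lines)
print(rec.token_id, rec.activation, rec.pos[0], rec.neg[0])
```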
Token them
Token ID606
Feature activation-6.17e-03

Pos logit contributions
aw+0.040
Kat+0.036
 Sandra+0.031
 reality+0.031
aman+0.031
Neg logit contributions
balls-0.061
hetically-0.058
ishly-0.058
 hips-0.057
ider-0.053
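In every record, the "Pos logit contributions" list is sorted from largest to smallest, and the "Neg logit contributions" list from most negative upward. Each list shows only five entries. The sketch below reproduces that selection with heapq. It assumes the full contribution table is available as a token-to-value mapping, which this dump does not include. The two extra entries are hypothetical.

```python
# Sketch: pick the k most positive and k most negative logit contributions,
# mirroring the sorted Pos/Neg lists in each record. Assumption: the full
# contribution table is available as a token -> value mapping.
import heapq
from typing import Dict, List, Tuple


def top_and_bottom(
    contribs: Dict[str, float], k: int = 5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    pos = heapq.nlargest(k, contribs.items(), key=lambda kv: kv[1])
    neg = heapq.nsmallest(k, contribs.items(), key=lambda kv: kv[1])
    return pos, neg


# Values from the " them" record, plus two hypothetical small entries.
them = {
    "aw": 0.040, "Kat": 0.036, " Sandra": 0.031, " reality": 0.031, "aman": 0.031,
    "balls": -0.061, "hetically": -0.058, "ishly": -0.058, " hips": -0.057, "ider": -0.053,
    " cat": 0.001, " dog": -0.002,  # hypothetical, for illustration only
}
pos, neg = top_and_bottom(them)
print([t for t, _ in pos])
print([t for t, _ in neg])
```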
Token back
Token ID736
Feature activation+2.12e-02

Pos logit contributions
ishly+0.010
hetically+0.010
balls+0.010
 hips+0.009
ider+0.008
Neg logit contributions
aw-0.011
 theirs-0.011
Kat-0.010
 reality-0.010
land-0.010
Token in
Token ID287
Feature activation-1.02e-02

Pos logit contributions
aw+0.046
Kat+0.042
 theirs+0.041
 reality+0.039
land+0.039
Neg logit contributions
balls-0.030
ishly-0.029
hetically-0.028
 hips-0.026
ider-0.024
Token their
Token ID511
Feature activation+2.32e-02

Pos logit contributions
balls+0.018
 hips+0.017
hetically+0.017
ishly+0.017
 first+0.015
Neg logit contributions
aw-0.020
Kat-0.018
 Sandra-0.016
aman-0.016
 reality-0.016
Token rightful
Token ID43133
Feature activation+1.31e-01

Pos logit contributions
aw+0.036
Kat+0.032
 reality+0.030
 pic+0.029
aman+0.029
Neg logit contributions
balls-0.048
hetically-0.046
ishly-0.046
ider-0.042
 hips-0.041
Token place
Token ID1295
Feature activation+4.74e-02

Pos logit contributions
aw+0.244
Kat+0.214
weet+0.214
aman+0.209
David+0.202
Neg logit contributions
 hips-0.217
balls-0.215
 first-0.206
hetically-0.191
ishly-0.188
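The " rightful" record above has the highest activation in this context, 1.31e-01, which matches MAX = 0.131 in the TOP ACTIVATIONS header. The sketch below ranks the tokens of this context by activation. Only the values come from the dump; the helper function is illustrative.

```python
# Activations transcribed from the records for
# "put them back in their rightful place"; the ranking helper is illustrative.
from typing import List, Tuple


def rank_by_activation(pairs: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort (token, activation) pairs from strongest to weakest activation."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)


context = [
    (" put", 2.74e-02),
    (" them", -6.17e-03),
    (" back", 2.12e-02),
    (" in", -1.02e-02),
    (" their", 2.32e-02),
    (" rightful", 1.31e-01),
    (" place", 4.74e-02),
]
ranked = rank_by_activation(context)
for tok, act in ranked:
    print(f"{tok!r:>12} {act:+.3e}")
```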
Token.
Token ID13
Feature activation+5.96e-03

Pos logit contributions
aw+0.088
Kat+0.081
 reality+0.079
 theirs+0.078
land+0.077
Neg logit contributions
balls-0.080
hetically-0.078
ishly-0.076
 hips-0.069
ider-0.068
Token\n
Token ID198
Feature activation+1.81e-02

Pos logit contributions
aw+0.011
Kat+0.010
 Sandra+0.009
aman+0.009
 reality+0.009
Neg logit contributions
balls-0.011
hetically-0.010
ishly-0.010
 hips-0.010
ider-0.010
Token\n
Token ID198
Feature activation+1.50e-03

Pos logit contributions
aw+0.035
Kat+0.031
 reality+0.029
aman+0.029
 Sandra+0.029
Neg logit contributions
balls-0.032
hetically-0.029
 hips-0.029
ishly-0.028
 first-0.027
TokenAfter
Token ID3260
Feature activation-1.96e-02

Pos logit contributions
aw+0.002
Kat+0.002
 theirs+0.002
 Sandra+0.002
aman+0.002
Neg logit contributions
balls-0.003
hetically-0.003
 hips-0.003
ishly-0.003
 first-0.003
Token the
Token ID262
Feature activation-2.39e-02

Pos logit contributions
aw+0.002
Kat+0.002
 Sandra+0.002
 theirs+0.002
aman+0.002
Neg logit contributions
balls-0.003
hetically-0.003
ishly-0.003
 hips-0.003
ider-0.003
Token rice
Token ID11464
Feature activation-1.48e-02

Pos logit contributions
ishly+0.046
balls+0.045
hetically+0.045
 hips+0.044
ider+0.041
Neg logit contributions
aw-0.039
Kat-0.033
 reality-0.032
 theirs-0.032
ige-0.031
Token back
Token ID736
Feature activation+1.68e-02

Pos logit contributions
hetically+0.024
ishly+0.023
 hips+0.022
ider+0.022
balls+0.022
Neg logit contributions
aw-0.027
 theirs-0.025
 island-0.022
 gone-0.022
land-0.022
Token in
Token ID287
Feature activation-4.39e-03

Pos logit contributions
aw+0.038
Kat+0.033
 theirs+0.033
land+0.032
 reality+0.031
Neg logit contributions
balls-0.023
hetically-0.022
ishly-0.022
 hips-0.021
ider-0.019
Token its
Token ID663
Feature activation+3.54e-02

Pos logit contributions
balls+0.008
 hips+0.008
ishly+0.008
hetically+0.008
 first+0.007
Neg logit contributions
aw-0.008
Kat-0.007
aman-0.007
land-0.006
 Sandra-0.006
Token rightful
Token ID43133
Feature activation+1.25e-01

Pos logit contributions
aw+0.066
Kat+0.061
 Sandra+0.055
 reality+0.055
aman+0.055
Neg logit contributions
balls-0.064
hetically-0.060
ishly-0.059
 hips-0.058
ider-0.054
Token place
Token ID1295
Feature activation+3.98e-02

Pos logit contributions
aw+0.244
weet+0.224
Kat+0.218
reating+0.211
aman+0.210
Neg logit contributions
 first-0.186
 hips-0.185
balls-0.181
 arms-0.166
ishly-0.162
Token.
Token ID13
Feature activation+1.08e-03

Pos logit contributions
aw+0.080
Kat+0.074
 theirs+0.070
land+0.070
 reality+0.070
Neg logit contributions
balls-0.063
hetically-0.061
ishly-0.058
 hips-0.054
ider-0.053
Token John
Token ID1757
Feature activation+1.23e-02

Pos logit contributions
aw+0.002
Kat+0.002
 reality+0.002
aman+0.002
 theirs+0.002
Neg logit contributions
balls-0.002
ishly-0.002
 hips-0.002
hetically-0.002
 first-0.001
Token was
Token ID373
Feature activation+2.59e-03

Pos logit contributions
aw+0.021
Kat+0.020
 Sandra+0.017
aman+0.017
 reality+0.017
Neg logit contributions
balls-0.024
hetically-0.023
ishly-0.022
 hips-0.022
ider-0.021
Token proud
Token ID6613
Feature activation-2.53e-02

Pos logit contributions
aw+0.005
Kat+0.005
aman+0.005
oph+0.004
land+0.004
Neg logit contributions
balls-0.004
ishly-0.004
 hips-0.004
hetically-0.004
 first-0.004
Tokenams
Token ID4105
Feature activation+4.37e-02

Pos logit contributions
 reality+0.039
aman+0.037
Kat+0.037
 Sandra+0.036
 CA+0.036
Neg logit contributions
balls-0.050
hetically-0.049
ider-0.048
 hips-0.048
 first-0.047
Token and
Token ID290
Feature activation+3.18e-02

Pos logit contributions
aw+0.099
Kat+0.089
land+0.084
aman+0.082
 theirs+0.082
Neg logit contributions
balls-0.063
ishly-0.060
hetically-0.059
 hips-0.058
ider-0.051
Token cand
Token ID2658
Feature activation+8.61e-02

Pos logit contributions
aw+0.060
Kat+0.057
aman+0.051
 Sandra+0.049
oph+0.049
Neg logit contributions
balls-0.059
 hips-0.053
hetically-0.053
ishly-0.053
 first-0.049
Tokenies
Token ID444
Feature activation+7.95e-02

Pos logit contributions
aw+0.159
Kat+0.152
 Sandra+0.139
aman+0.137
 reality+0.135
Neg logit contributions
balls-0.157
hetically-0.142
 hips-0.140
ishly-0.139
ider-0.129
Token and
Token ID290
Feature activation+4.92e-02

Pos logit contributions
aw+0.178
Kat+0.164
land+0.151
 reality+0.150
 theirs+0.150
Neg logit contributions
balls-0.114
hetically-0.111
ishly-0.107
 hips-0.104
ider-0.094
Token fruits
Token ID15921
Feature activation+1.22e-01

Pos logit contributions
aw+0.088
Kat+0.083
aman+0.073
 Sandra+0.073
 reality+0.072
Neg logit contributions
balls-0.093
ishly-0.087
hetically-0.086
 hips-0.085
ider-0.078
Token and
Token ID290
Feature activation+5.81e-02

Pos logit contributions
aw+0.287
Kat+0.267
 Sandra+0.247
aman+0.245
 reality+0.245
Neg logit contributions
balls-0.163
hetically-0.151
ishly-0.150
 hips-0.146
ider-0.130
Token vegetables
Token ID13701
Feature activation+1.07e-01

Pos logit contributions
aw+0.103
Kat+0.098
aman+0.087
 Sandra+0.086
 reality+0.085
Neg logit contributions
balls-0.110
hetically-0.102
ishly-0.102
 hips-0.100
 first-0.091
Token and
Token ID290
Feature activation+5.87e-02

Pos logit contributions
aw+0.239
Kat+0.221
 theirs+0.205
 reality+0.205
land+0.204
Neg logit contributions
balls-0.152
hetically-0.147
ishly-0.145
 hips-0.140
ider-0.124
Token letters
Token ID7475
Feature activation+8.91e-02

Pos logit contributions
aw+0.103
Kat+0.097
aman+0.086
 Sandra+0.086
 reality+0.085
Neg logit contributions
balls-0.112
hetically-0.104
ishly-0.104
 hips-0.103
ider-0.094
Token and
Token ID290
Feature activation+4.97e-02

Pos logit contributions
aw+0.210
Kat+0.197
 Sandra+0.181
 reality+0.181
aman+0.180
Neg logit contributions
balls-0.117
hetically-0.108
ishly-0.107
 hips-0.104
ider-0.093
Token and
Token ID290
Feature activation+3.51e-04

Pos logit contributions
aw+0.012
Kat+0.011
 reality+0.010
 theirs+0.010
 Sandra+0.010
Neg logit contributions
balls-0.008
ishly-0.008
hetically-0.008
 hips-0.008
ider-0.007
Token houses
Token ID7777
Feature activation+6.74e-02

Pos logit contributions
aw+0.001
Kat+0.001
aman+0.001
 Sandra+0.001
oph+0.000
Neg logit contributions
balls-0.001
hetically-0.001
 hips-0.001
ishly-0.001
 first-0.001
Token and
Token ID290
Feature activation+3.44e-02

Pos logit contributions
aw+0.151
Kat+0.141
 reality+0.130
 Sandra+0.129
aman+0.128
Neg logit contributions
balls-0.096
hetically-0.090
ishly-0.090
 hips-0.086
ider-0.077
Token cars
Token ID5006
Feature activation+5.02e-02

Pos logit contributions
aw+0.062
Kat+0.059
aman+0.055
 Sandra+0.052
oph+0.052
Neg logit contributions
balls-0.064
hetically-0.058
 hips-0.057
ishly-0.057
 first-0.055
Token and
Token ID290
Feature activation+4.61e-02

Pos logit contributions
aw+0.111
Kat+0.100
 theirs+0.092
 reality+0.092
 Sandra+0.091
Neg logit contributions
balls-0.075
hetically-0.071
ishly-0.071
 hips-0.069
ider-0.061
Token anything
Token ID1997
Feature activation+1.21e-01

Pos logit contributions
aw+0.086
Kat+0.081
aman+0.073
 Sandra+0.072
 reality+0.071
Neg logit contributions
balls-0.083
hetically-0.077
ishly-0.077
 hips-0.076
 first-0.069
Token you
Token ID345
Feature activation+3.41e-02

Pos logit contributions
aw+0.229
Kat+0.206
 reality+0.187
 Sandra+0.184
 theirs+0.183
Neg logit contributions
balls-0.215
hetically-0.213
ishly-0.212
 hips-0.205
ider-0.188
Token like
Token ID588
Feature activation+2.65e-02

Pos logit contributions
aw+0.062
Kat+0.054
 reality+0.051
 theirs+0.049
 Sandra+0.049
Neg logit contributions
balls-0.065
hetically-0.063
ishly-0.062
 hips-0.059
ider-0.055
Token.
Token ID13
Feature activation+2.85e-03

Pos logit contributions
aw+0.054
Kat+0.049
 reality+0.045
 theirs+0.045
 Sandra+0.044
Neg logit contributions
balls-0.043
hetically-0.043
ishly-0.042
 hips-0.040
ider-0.036
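In the two list contexts above ("ams and cand ies and fruits and vegetables and letters and" and "and houses and cars and anything you like ."), the list items activate more strongly than the " and" tokens that join them. The sketch below checks this using values transcribed from the records. It is an observation about these two contexts only.

```python
# Activations transcribed from the records for the two list contexts above.
from statistics import mean

list_items = {
    "ams": 4.37e-02, " cand": 8.61e-02, "ies": 7.95e-02, " fruits": 1.22e-01,
    " vegetables": 1.07e-01, " letters": 8.91e-02, " houses": 6.74e-02,
    " cars": 5.02e-02, " anything": 1.21e-01,
}
# The " and" records in order of appearance (the sixth opens the second context).
and_tokens = [3.18e-02, 4.92e-02, 5.81e-02, 5.87e-02, 4.97e-02, 3.51e-04, 3.44e-02, 4.61e-02]

print(f"mean list-item activation: {mean(list_items.values()):.4f}")
print(f"mean ' and' activation:    {mean(and_tokens):.4f}")
print("strongest list item:", repr(max(list_items, key=list_items.get)))
```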
Token You
Token ID921
Feature activation-1.70e-03

Pos logit contributions
aw+0.006
Kat+0.006
 reality+0.006
aman+0.005
 Sandra+0.005
Neg logit contributions
balls-0.004
ishly-0.004
 hips-0.004
hetically-0.004
ider-0.003
Token can
Token ID460
Feature activation+1.66e-03

Pos logit contributions
balls+0.003
hetically+0.003
ishly+0.003
 hips+0.003
ider+0.003
Neg logit contributions
aw-0.003
Kat-0.003
 reality-0.003
 theirs-0.003
 Sandra-0.003
Token restore
Token ID11169
Feature activation+3.54e-02

Pos logit contributions
aw+0.064
Kat+0.060
 Sandra+0.054
 reality+0.053
aman+0.053
Neg logit contributions
balls-0.060
hetically-0.056
ishly-0.056
 hips-0.055
ider-0.050
Token it
Token ID340
Feature activation+3.58e-02

Pos logit contributions
aw+0.060
Kat+0.058
aman+0.051
 Sandra+0.050
 reality+0.048
Neg logit contributions
balls-0.067
ishly-0.062
hetically-0.062
 hips-0.062
 first-0.058
Token back
Token ID736
Feature activation+5.46e-02

Pos logit contributions
aw+0.068
 theirs+0.062
 reality+0.062
Kat+0.060
land+0.060
Neg logit contributions
balls-0.058
ishly-0.056
hetically-0.056
 hips-0.054
ider-0.053
Token to
Token ID284
Feature activation+4.24e-02

Pos logit contributions
aw+0.114
Kat+0.105
 reality+0.099
 theirs+0.098
 Sandra+0.097
Neg logit contributions
balls-0.087
hetically-0.081
ishly-0.081
 hips-0.077
ider-0.074
Token its
Token ID663
Feature activation+3.46e-02

Pos logit contributions
aw+0.090
Kat+0.082
aman+0.075
oph+0.074
land+0.072
Neg logit contributions
balls-0.064
ishly-0.061
hetically-0.060
 help-0.060
 first-0.060
Token rightful
Token ID43133
Feature activation+1.19e-01

Pos logit contributions
aw+0.050
Kat+0.046
 reality+0.045
aman+0.041
land+0.040
Neg logit contributions
balls-0.075
hetically-0.072
ishly-0.071
 hips-0.069
ider-0.066
Token place
Token ID1295
Feature activation+4.94e-02

Pos logit contributions
aw+0.229
Kat+0.208
aman+0.201
weet+0.199
David+0.194
Neg logit contributions
balls-0.194
 hips-0.187
 first-0.181
ishly-0.174
hetically-0.174
Token.
Token ID13
Feature activation-3.10e-03

Pos logit contributions
aw+0.103
Kat+0.096
 reality+0.091
 theirs+0.091
land+0.089
Neg logit contributions
balls-0.076
hetically-0.071
ishly-0.069
 hips-0.068
ider-0.065
Token Although
Token ID4900
Feature activation+3.88e-03

Pos logit contributions
balls+0.006
hetically+0.005
ishly+0.005
 hips+0.005
ider+0.005
Neg logit contributions
aw-0.006
Kat-0.005
 Sandra-0.005
 reality-0.005
aman-0.005
Token Sammy
Token ID44067
Feature activation-1.25e-03

Pos logit contributions
aw+0.008
Kat+0.008
aman+0.007
 Sandra+0.007
 reality+0.007
Neg logit contributions
balls-0.006
ishly-0.006
hetically-0.006
 hips-0.006
 first-0.005
Token was
Token ID373
Feature activation-1.49e-02

Pos logit contributions
balls+0.003
 hips+0.003
ishly+0.003
hetically+0.003
ider+0.002
Neg logit contributions
aw-0.002
Kat-0.002
 reality-0.001
aman-0.001
 Sandra-0.001
Token shovel
Token ID33677
Feature activation+6.96e-02

Pos logit contributions
balls+0.004
ishly+0.004
hetically+0.004
 hips+0.004
ider+0.004
Neg logit contributions
aw-0.005
Kat-0.004
 Sandra-0.004
 theirs-0.004
 reality-0.004
Tokens
Token ID82
Feature activation+1.67e-02

Pos logit contributions
aw+0.164
Kat+0.151
 theirs+0.142
 Sandra+0.142
aman+0.140
Neg logit contributions
balls-0.090
ishly-0.085
hetically-0.084
 hips-0.080
ider-0.077
Token and
Token ID290
Feature activation+2.70e-02

Pos logit contributions
aw+0.035
Kat+0.030
 theirs+0.030
land+0.029
 Sandra+0.028
Neg logit contributions
balls-0.026
ishly-0.026
hetically-0.025
 hips-0.024
ider-0.023
Token shells
Token ID19679
Feature activation+9.35e-02

Pos logit contributions
aw+0.052
Kat+0.048
 Sandra+0.043
 reality+0.043
aman+0.043
Neg logit contributions
balls-0.047
ishly-0.044
hetically-0.044
 hips-0.043
ider-0.040
Token and
Token ID290
Feature activation+4.60e-02

Pos logit contributions
aw+0.195
Kat+0.170
 theirs+0.169
land+0.163
 Sandra+0.162
Neg logit contributions
balls-0.145
hetically-0.142
ishly-0.141
 hips-0.135
ider-0.129
Token stones
Token ID14966
Feature activation+1.16e-01

Pos logit contributions
aw+0.085
Kat+0.079
 Sandra+0.071
 reality+0.071
aman+0.070
Neg logit contributions
balls-0.084
ishly-0.079
hetically-0.079
 hips-0.076
ider-0.071
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
aw+0.243
Kat+0.214
 theirs+0.213
land+0.206
 Sandra+0.201
Neg logit contributions
balls-0.177
hetically-0.176
ishly-0.173
 hips-0.165
ider-0.156
Token They
Token ID1119
Feature activation+7.66e-03

Pos logit contributions
aw+0.027
Kat+0.025
 reality+0.024
aman+0.024
onda+0.023
Neg logit contributions
balls-0.018
 hips-0.016
ishly-0.016
hetically-0.015
ider-0.014
Token worked
Token ID3111
Feature activation-1.90e-02

Pos logit contributions
aw+0.012
Kat+0.011
 Sandra+0.010
 reality+0.010
aman+0.010
Neg logit contributions
balls-0.016
ishly-0.015
hetically-0.015
 hips-0.015
ider-0.014
Token hard
Token ID1327
Feature activation-5.45e-03

Pos logit contributions
balls+0.029
ishly+0.029
hetically+0.028
 hips+0.027
ider+0.025
Neg logit contributions
aw-0.040
Kat-0.037
 theirs-0.034
 reality-0.034
 Sandra-0.033
Token and
Token ID290
Feature activation-1.19e-02

Pos logit contributions
ishly+0.009
balls+0.009
hetically+0.009
 hips+0.009
ider+0.008
Neg logit contributions
aw-0.011
Kat-0.009
land-0.009
 theirs-0.009
 reality-0.009
Token and
Token ID290
Feature activation+2.11e-02

Pos logit contributions
aw+0.070
Kat+0.062
 theirs+0.059
 Sandra+0.059
aman+0.057
Neg logit contributions
balls-0.048
hetically-0.046
ishly-0.045
 hips-0.044
ider-0.041
Token rocks
Token ID12586
Feature activation+6.59e-02

Pos logit contributions
aw+0.043
Kat+0.041
aman+0.038
 reality+0.037
 Sandra+0.036
Neg logit contributions
balls-0.034
hetically-0.031
ishly-0.031
 hips-0.031
 first-0.028
Token and
Token ID290
Feature activation+3.99e-02

Pos logit contributions
aw+0.142
Kat+0.125
 theirs+0.122
 Sandra+0.118
land+0.117
Neg logit contributions
balls-0.100
hetically-0.095
ishly-0.095
 hips-0.093
ider-0.085
Token feathers
Token ID27400
Feature activation+6.08e-02

Pos logit contributions
aw+0.078
Kat+0.074
aman+0.068
 reality+0.066
 Sandra+0.066
Neg logit contributions
balls-0.068
hetically-0.062
ishly-0.062
 hips-0.061
ider-0.055
Token and
Token ID290
Feature activation+4.55e-02

Pos logit contributions
aw+0.143
Kat+0.134
 Sandra+0.123
aman+0.122
 reality+0.122
Neg logit contributions
balls-0.080
hetically-0.074
ishly-0.074
 hips-0.072
ider-0.064
Token fruits
Token ID15921
Feature activation+1.16e-01

Pos logit contributions
aw+0.088
Kat+0.083
aman+0.074
 Sandra+0.074
 reality+0.074
Neg logit contributions
balls-0.079
hetically-0.073
ishly-0.073
 hips-0.072
ider-0.066
Token.
Token ID13
Feature activation+1.75e-02

Pos logit contributions
aw+0.272
Kat+0.254
 Sandra+0.235
aman+0.233
 reality+0.232
Neg logit contributions
balls-0.154
hetically-0.142
ishly-0.142
 hips-0.137
ider-0.122
Token They
Token ID1119
Feature activation+1.18e-02

Pos logit contributions
aw+0.038
Kat+0.036
 reality+0.034
aman+0.034
onda+0.033
Neg logit contributions
balls-0.026
 hips-0.022
ishly-0.022
hetically-0.021
 first-0.021
Token were
Token ID547
Feature activation+9.66e-03

Pos logit contributions
aw+0.021
Kat+0.020
aman+0.018
 reality+0.017
land+0.017
Neg logit contributions
balls-0.022
hetically-0.020
ishly-0.020
 hips-0.020
 first-0.019
Token flexible
Token ID12846
Feature activation-1.23e-02

Pos logit contributions
aw+0.018
Kat+0.017
aman+0.017
oph+0.016
land+0.015
Neg logit contributions
balls-0.018
 hips-0.016
hetically-0.015
ishly-0.015
 first-0.015
Token.
Token ID13
Feature activation+2.86e-03

Pos logit contributions
balls+0.021
hetically+0.021
ishly+0.021
 hips+0.019
 maker+0.017
Neg logit contributions
aw-0.025
Kat-0.020
land-0.020
 reality-0.020
 theirs-0.020
Token,
Token ID11
Feature activation+4.37e-02

Pos logit contributions
aw+0.096
Kat+0.089
 reality+0.082
land+0.080
 Sandra+0.079
Neg logit contributions
balls-0.081
hetically-0.077
ishly-0.075
 hips-0.072
ider-0.068
Token regardless
Token ID7692
Feature activation+9.73e-03

Pos logit contributions
aw+0.086
Kat+0.080
aman+0.072
 Sandra+0.071
 reality+0.070
Neg logit contributions
balls-0.074
ishly-0.069
hetically-0.069
 hips-0.068
 first-0.062
Token of
Token ID286
Feature activation+2.51e-03

Pos logit contributions
aw+0.020
Kat+0.019
 Sandra+0.017
aman+0.017
 reality+0.017
Neg logit contributions
balls-0.015
hetically-0.014
ishly-0.014
 hips-0.014
ider-0.013
Token age
Token ID2479
Feature activation+1.71e-02

Pos logit contributions
aw+0.004
Kat+0.004
 Sandra+0.003
aman+0.003
 reality+0.003
Neg logit contributions
balls-0.005
ishly-0.005
hetically-0.005
 hips-0.005
 first-0.004
Token or
Token ID393
Feature activation+5.27e-02

Pos logit contributions
aw+0.036
Kat+0.033
 reality+0.031
land+0.030
 Sandra+0.030
Neg logit contributions
balls-0.027
hetically-0.026
ishly-0.025
 hips-0.024
ider-0.022
Token appearance
Token ID5585
Feature activation+1.16e-01

Pos logit contributions
aw+0.086
Kat+0.079
 Sandra+0.069
aman+0.069
 reality+0.069
Neg logit contributions
balls-0.107
ishly-0.101
hetically-0.101
 hips-0.099
ider-0.092
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
aw+0.265
Kat+0.249
 reality+0.230
 Sandra+0.228
aman+0.228
Neg logit contributions
balls-0.160
hetically-0.147
ishly-0.146
 hips-0.142
ider-0.128
Token<|endoftext|>
Token ID50256
Feature activation+5.70e-04

Pos logit contributions
aw+0.023
Kat+0.021
 reality+0.019
aman+0.019
 Sandra+0.019
Neg logit contributions
balls-0.020
ishly-0.018
hetically-0.018
 hips-0.018
ider-0.016
TokenOnce
Token ID7454
Feature activation+2.46e-02

Pos logit contributions
aw+0.001
Kat+0.001
 Sandra+0.001
aman+0.001
 reality+0.001
Neg logit contributions
balls-0.001
hetically-0.001
ishly-0.001
 hips-0.001
ider-0.001
Token upon
Token ID2402
Feature activation-2.80e-03

Pos logit contributions
aw+0.046
Kat+0.044
 Sandra+0.039
aman+0.039
 reality+0.039
Neg logit contributions
balls-0.043
ishly-0.041
hetically-0.041
 hips-0.040
ider-0.037
Token a
Token ID257
Feature activation+2.76e-02

Pos logit contributions
balls+0.004
 hips+0.003
 first+0.003
hetically+0.003
 help+0.003
Neg logit contributions
aw-0.006
aman-0.006
Kat-0.006
 Sandra-0.006
oph-0.006
Token time
Token ID640
Feature activation+8.68e-03

Pos logit contributions
Kat+0.045
aw+0.045
aman+0.041
 Sandra+0.039
oph+0.039
Neg logit contributions
balls-0.051
hetically-0.045
ishly-0.045
 hips-0.044
 first-0.043
Token to
Token ID284
Feature activation+1.07e-02

Pos logit contributions
aw+0.017
Kat+0.015
 theirs+0.014
 Sandra+0.014
land+0.014
Neg logit contributions
hetically-0.015
ishly-0.015
balls-0.014
 hips-0.013
ider-0.012
Token race
Token ID3234
Feature activation+1.89e-02

Pos logit contributions
aw+0.022
Kat+0.021
 reality+0.020
 Sandra+0.020
 theirs+0.019
Neg logit contributions
balls-0.017
hetically-0.016
ishly-0.015
 hips-0.015
ider-0.014
Token everywhere
Token ID8347
Feature activation+1.31e-02

Pos logit contributions
aw+0.036
Kat+0.031
 theirs+0.029
 reality+0.029
 Sandra+0.029
Neg logit contributions
balls-0.034
hetically-0.034
ishly-0.033
 hips-0.032
ider-0.031
Token and
Token ID290
Feature activation+3.27e-02

Pos logit contributions
aw+0.026
Kat+0.023
 reality+0.022
 theirs+0.021
 Sandra+0.020
Neg logit contributions
ishly-0.022
hetically-0.021
balls-0.021
 hips-0.020
ider-0.019
Token anywhere
Token ID6609
Feature activation+1.16e-01

Pos logit contributions
aw+0.062
Kat+0.057
 reality+0.053
 Sandra+0.053
 theirs+0.052
Neg logit contributions
balls-0.056
hetically-0.054
ishly-0.053
 hips-0.052
ider-0.050
Token he
Token ID339
Feature activation+1.23e-02

Pos logit contributions
aw+0.299
Kat+0.285
aman+0.271
 Sandra+0.265
oph+0.259
Neg logit contributions
balls-0.132
hetically-0.109
ishly-0.106
 hips-0.106
 first-0.093
Token could
Token ID714
Feature activation+1.70e-02

Pos logit contributions
aw+0.020
Kat+0.018
 reality+0.016
 Sandra+0.015
 spoken+0.015
Neg logit contributions
balls-0.025
ishly-0.024
hetically-0.024
 hips-0.024
ider-0.022
Token.
Token ID13
Feature activation+2.25e-03

Pos logit contributions
aw+0.025
Kat+0.023
 reality+0.021
 theirs+0.021
 Sandra+0.021
Neg logit contributions
balls-0.036
hetically-0.036
ishly-0.034
ider-0.031
 hips-0.031
Token\n
Token ID198
Feature activation+2.14e-02

Pos logit contributions
aw+0.005
Kat+0.005
 Sandra+0.004
aman+0.004
 reality+0.004
Neg logit contributions
balls-0.003
ishly-0.003
hetically-0.003
 hips-0.003
ider-0.003
Token\n
Token ID198
Feature activation+4.75e-03

Pos logit contributions
aw+0.044
Kat+0.040
 reality+0.038
 Sandra+0.037
aman+0.037
Neg logit contributions
balls-0.035
hetically-0.031
 hips-0.031
ishly-0.030
 first-0.029
Token like
Token ID588
Feature activation+3.25e-02

Pos logit contributions
aw+0.032
Kat+0.030
aman+0.028
 reality+0.027
 Sandra+0.027
Neg logit contributions
balls-0.024
 hips-0.022
ishly-0.021
hetically-0.021
 first-0.020
Token leaves
Token ID5667
Feature activation+4.26e-02

Pos logit contributions
aw+0.059
Kat+0.058
aman+0.053
 reality+0.051
oph+0.051
Neg logit contributions
balls-0.058
ishly-0.052
hetically-0.052
 hips-0.051
 first-0.049
Token,
Token ID11
Feature activation+4.44e-02

Pos logit contributions
aw+0.091
Kat+0.077
land+0.076
 theirs+0.075
aman+0.074
Neg logit contributions
balls-0.066
hetically-0.064
ishly-0.062
 hips-0.062
ider-0.057
Token rocks
Token ID12586
Feature activation+7.54e-02

Pos logit contributions
aw+0.087
Kat+0.084
aman+0.075
 reality+0.075
 Sandra+0.074
Neg logit contributions
balls-0.076
hetically-0.069
ishly-0.069
 hips-0.069
 first-0.062
Token and
Token ID290
Feature activation+5.42e-02

Pos logit contributions
aw+0.160
land+0.134
Kat+0.133
 theirs+0.132
aman+0.129
Neg logit contributions
balls-0.116
hetically-0.113
ishly-0.111
 hips-0.109
ider-0.103
Token bugs
Token ID11316
Feature activation+1.15e-01

Pos logit contributions
aw+0.104
Kat+0.100
aman+0.091
 Sandra+0.089
 reality+0.088
Neg logit contributions
balls-0.093
hetically-0.086
ishly-0.086
 hips-0.085
 first-0.077
Token.
Token ID13
Feature activation+3.01e-02

Pos logit contributions
aw+0.247
Kat+0.209
land+0.207
 theirs+0.201
 reality+0.192
Neg logit contributions
hetically-0.179
balls-0.175
ishly-0.172
 hips-0.171
ider-0.153
Token They
Token ID1119
Feature activation+9.51e-03

Pos logit contributions
aw+0.065
Kat+0.063
 reality+0.060
reating+0.056
aman+0.056
Neg logit contributions
balls-0.046
ishly-0.041
 hips-0.040
hetically-0.039
 first-0.038
Token had
Token ID550
Feature activation+3.69e-02

Pos logit contributions
aw+0.015
Kat+0.014
 Sandra+0.012
 reality+0.012
aman+0.012
Neg logit contributions
balls-0.020
hetically-0.019
ishly-0.019
 hips-0.018
ider-0.017
Token a
Token ID257
Feature activation-1.42e-02

Pos logit contributions
aman+0.081
Kat+0.078
aw+0.077
reating+0.075
 reality+0.074
Neg logit contributions
balls-0.052
 first-0.046
 hips-0.046
ishly-0.040
 help-0.040
Token magn
Token ID7842
Feature activation-3.28e-02

Pos logit contributions
balls+0.028
ishly+0.026
hetically+0.026
 hips+0.025
ider+0.024
Neg logit contributions
aw-0.024
Kat-0.023
 Sandra-0.020
aman-0.020
 reality-0.020
Token better
Token ID1365
Feature activation+3.79e-02

Pos logit contributions
aw+0.009
Kat+0.008
 reality+0.007
aman+0.007
 Sandra+0.007
Neg logit contributions
balls-0.012
hetically-0.012
ishly-0.012
 hips-0.011
ider-0.011
Token understanding
Token ID4547
Feature activation+8.44e-03

Pos logit contributions
aw+0.065
Kat+0.060
 Sandra+0.054
 reality+0.053
aman+0.052
Neg logit contributions
balls-0.075
ishly-0.071
hetically-0.071
 hips-0.069
ider-0.065
Token of
Token ID286
Feature activation-1.60e-02

Pos logit contributions
aw+0.017
Kat+0.016
 Sandra+0.015
 reality+0.015
aman+0.015
Neg logit contributions
balls-0.014
hetically-0.013
ishly-0.013
 hips-0.012
ider-0.011
Token right
Token ID826
Feature activation+3.29e-02

Pos logit contributions
balls+0.029
ishly+0.027
hetically+0.027
 hips+0.026
 first+0.024
Neg logit contributions
aw-0.031
Kat-0.028
aman-0.025
 Sandra-0.025
 reality-0.025
Token and
Token ID290
Feature activation+1.49e-02

Pos logit contributions
aw+0.060
Kat+0.055
 Sandra+0.049
 reality+0.049
aman+0.048
Neg logit contributions
balls-0.063
hetically-0.059
ishly-0.059
 hips-0.058
ider-0.053
Token wrong
Token ID2642
Feature activation+1.15e-01

Pos logit contributions
aw+0.026
Kat+0.024
aman+0.021
 Sandra+0.021
 reality+0.021
Neg logit contributions
balls-0.029
ishly-0.027
hetically-0.027
 hips-0.026
 first-0.025
Token.
Token ID13
Feature activation+9.56e-03

Pos logit contributions
aw+0.228
Kat+0.209
 reality+0.195
 Sandra+0.192
 theirs+0.190
Neg logit contributions
balls-0.193
hetically-0.189
ishly-0.184
 hips-0.178
ider-0.170
Token From
Token ID3574
Feature activation-7.18e-03

Pos logit contributions
aw+0.018
Kat+0.016
 Sandra+0.015
 reality+0.015
aman+0.014
Neg logit contributions
balls-0.018
hetically-0.016
ishly-0.016
 hips-0.016
ider-0.015
Token then
Token ID788
Feature activation+1.63e-02

Pos logit contributions
balls+0.013
ishly+0.012
hetically+0.012
 hips+0.011
ider+0.011
Neg logit contributions
aw-0.014
Kat-0.013
 reality-0.012
aman-0.012
 Sandra-0.011
Token on
Token ID319
Feature activation+1.40e-02

Pos logit contributions
aw+0.037
Kat+0.033
 Sandra+0.031
aman+0.031
 reality+0.030
Neg logit contributions
balls-0.023
 hips-0.021
ishly-0.021
hetically-0.021
ider-0.019
Token,
Token ID11
Feature activation+2.06e-02

Pos logit contributions
aw+0.021
 happened+0.018
 reality+0.018
 sharing+0.018
 Sandra+0.018
Neg logit contributions
balls-0.027
ider-0.026
hetically-0.025
ishly-0.024
 hips-0.024
Token in
Token ID287
Feature activation-1.77e-02

Pos logit contributions
balls+0.012
hetically+0.012
ishly+0.012
 hips+0.011
ider+0.010
Neg logit contributions
aw-0.013
Kat-0.012
 reality-0.012
 theirs-0.012
land-0.011
Token the
Token ID262
Feature activation+3.25e-02

Pos logit contributions
balls+0.031
 hips+0.030
 first+0.029
hetically+0.029
ishly+0.029
Neg logit contributions
aw-0.033
Kat-0.030
aman-0.028
 Sandra-0.027
oph-0.026
Token bath
Token ID7837
Feature activation+8.33e-03

Pos logit contributions
aw+0.055
Kat+0.050
 reality+0.046
aman+0.044
 Sandra+0.044
Neg logit contributions
balls-0.064
hetically-0.062
ishly-0.062
 hips-0.060
ider-0.056
Tokentub
Token ID37995
Feature activation+4.82e-03

Pos logit contributions
aw+0.016
Kat+0.014
 reality+0.013
land+0.013
 theirs+0.013
Neg logit contributions
hetically-0.014
balls-0.014
ishly-0.014
 hips-0.013
 shove-0.012
Token or
Token ID393
Feature activation+6.37e-02

Pos logit contributions
aw+0.010
Kat+0.009
 reality+0.008
land+0.008
 theirs+0.008
Neg logit contributions
balls-0.008
hetically-0.008
ishly-0.008
 hips-0.007
ider-0.006
Token anywhere
Token ID6609
Feature activation+1.14e-01

Pos logit contributions
aw+0.113
Kat+0.105
 reality+0.098
 theirs+0.097
 Sandra+0.096
Neg logit contributions
balls-0.116
hetically-0.112
ishly-0.111
 hips-0.106
ider-0.101
Token.
Token ID13
Feature activation+5.19e-03

Pos logit contributions
aw+0.233
Kat+0.217
 reality+0.197
 Sandra+0.197
aman+0.195
Neg logit contributions
balls-0.187
hetically-0.176
ishly-0.176
 hips-0.170
ider-0.155
Token We
Token ID775
Feature activation+2.12e-02

Pos logit contributions
aw+0.011
Kat+0.010
 Sandra+0.009
aman+0.009
 reality+0.009
Neg logit contributions
balls-0.008
ishly-0.008
hetically-0.008
 hips-0.008
ider-0.007
Token will
Token ID481
Feature activation+7.59e-03

Pos logit contributions
aw+0.033
Kat+0.029
 theirs+0.027
 reality+0.027
 spoken+0.026
Neg logit contributions
balls-0.044
hetically-0.044
 hips-0.041
ishly-0.041
ider-0.039
Token be
Token ID307
Feature activation-6.60e-03

Pos logit contributions
aw+0.014
Kat+0.013
 Sandra+0.012
aman+0.012
 reality+0.012
Neg logit contributions
balls-0.014
ishly-0.013
hetically-0.013
 hips-0.012
ider-0.011
Token good
Token ID922
Feature activation-6.41e-04

Pos logit contributions
balls+0.011
 hips+0.010
ishly+0.010
 first+0.009
hetically+0.009
Neg logit contributions
aw-0.013
aman-0.012
Kat-0.012
oph-0.012
land-0.012
Token and
Token ID290
Feature activation+3.76e-02

Pos logit contributions
aw+0.154
Kat+0.128
 theirs+0.123
land+0.121
 Sandra+0.118
Neg logit contributions
hetically-0.132
balls-0.128
ishly-0.127
 hips-0.124
ider-0.118
Token carrot
Token ID40562
Feature activation+6.39e-02

Pos logit contributions
aw+0.063
Kat+0.056
 theirs+0.053
 Sandra+0.052
 reality+0.052
Neg logit contributions
balls-0.072
hetically-0.071
ishly-0.069
 hips-0.068
ider-0.065
Token and
Token ID290
Feature activation+4.12e-02

Pos logit contributions
aw+0.120
 theirs+0.105
land+0.099
Kat+0.096
weet+0.092
Neg logit contributions
hetically-0.110
ishly-0.103
 hips-0.101
balls-0.098
 shove-0.097
Token flower
Token ID15061
Feature activation+8.15e-02

Pos logit contributions
aw+0.068
Kat+0.060
 theirs+0.056
 Sandra+0.056
 reality+0.056
Neg logit contributions
balls-0.080
hetically-0.080
ishly-0.078
 hips-0.076
ider-0.072
Token and
Token ID290
Feature activation+4.41e-02

Pos logit contributions
aw+0.141
 theirs+0.117
Kat+0.110
land+0.110
 pic+0.109
Neg logit contributions
hetically-0.162
ishly-0.153
 hips-0.149
balls-0.144
ider-0.142
Token mud
Token ID17492
Feature activation+1.14e-01

Pos logit contributions
aw+0.075
Kat+0.067
 theirs+0.062
 reality+0.062
 Sandra+0.061
Neg logit contributions
balls-0.085
hetically-0.083
ishly-0.081
 hips-0.079
ider-0.075
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
aw+0.228
Kat+0.186
 theirs+0.185
land+0.179
 Sandra+0.173
Neg logit contributions
hetically-0.190
ishly-0.185
 hips-0.183
balls-0.180
ider-0.167
Token We
Token ID775
Feature activation+1.98e-02

Pos logit contributions
aw+0.038
Kat+0.035
aman+0.032
 reality+0.032
 Sandra+0.032
Neg logit contributions
balls-0.030
ishly-0.028
hetically-0.027
 hips-0.027
ider-0.025
Token have
Token ID423
Feature activation-2.14e-04

Pos logit contributions
aw+0.037
Kat+0.034
 Sandra+0.031
aman+0.031
 reality+0.031
Neg logit contributions
balls-0.036
ishly-0.034
hetically-0.034
 hips-0.033
ider-0.030
Token lovely
Token ID14081
Feature activation-1.75e-03

Pos logit contributions
balls+0.000
 first+0.000
 hips+0.000
 help+0.000
 arms+0.000
Neg logit contributions
aw-0.000
aman-0.000
Kat-0.000
David-0.000
ooks-0.000
Token friends
Token ID2460
Feature activation+8.20e-04

Pos logit contributions
balls+0.004
hetically+0.004
ishly+0.004
 hips+0.004
ider+0.003
Neg logit contributions
aw-0.003
Kat-0.002
 reality-0.002
 Sandra-0.002
 theirs-0.002
Token play
Token ID711
Feature activation-3.31e-02

Pos logit contributions
balls+0.029
ishly+0.027
hetically+0.027
 hips+0.027
 first+0.024
Neg logit contributions
aw-0.042
Kat-0.039
aman-0.036
 Sandra-0.035
 reality-0.035
Token with
Token ID351
Feature activation-3.69e-02

Pos logit contributions
balls+0.048
hetically+0.047
ishly+0.045
 hips+0.044
ider+0.039
Neg logit contributions
aw-0.074
Kat-0.067
 reality-0.062
 theirs-0.062
 Sandra-0.061
Token sticks
Token ID16461
Feature activation+2.26e-02

Pos logit contributions
balls+0.068
 first+0.062
ishly+0.061
 hips+0.060
hetically+0.060
Neg logit contributions
Kat-0.070
aw-0.063
David-0.059
oph-0.058
aman-0.058
Token and
Token ID290
Feature activation+1.82e-02

Pos logit contributions
aw+0.042
Kat+0.037
 theirs+0.034
 reality+0.034
 Sandra+0.034
Neg logit contributions
balls-0.042
hetically-0.040
ishly-0.039
 hips-0.038
ider-0.036
Token missiles
Token ID12798
Feature activation+4.62e-02

Pos logit contributions
aw+0.033
Kat+0.029
 theirs+0.028
 reality+0.028
 Sandra+0.027
Neg logit contributions
balls-0.032
ishly-0.031
hetically-0.031
 hips-0.030
ider-0.029
Token anymore
Token ID7471
Feature activation+1.14e-01

Pos logit contributions
aw+0.088
 theirs+0.075
Kat+0.074
 reality+0.073
land+0.071
Neg logit contributions
hetically-0.080
balls-0.079
 hips-0.077
ishly-0.076
ider-0.069
Token.
Token ID13
Feature activation+1.07e-02

Pos logit contributions
aw+0.271
Kat+0.255
 Sandra+0.235
 reality+0.235
aman+0.233
Neg logit contributions
balls-0.147
hetically-0.136
ishly-0.136
 hips-0.131
ider-0.116
Token They
Token ID1119
Feature activation+5.23e-03

Pos logit contributions
aw+0.027
Kat+0.026
aman+0.026
 reality+0.026
reating+0.026
Neg logit contributions
balls-0.011
 hips-0.009
 first-0.009
ider-0.008
 onto-0.007
Token were
Token ID547
Feature activation+1.08e-02

Pos logit contributions
aw+0.009
Kat+0.009
aman+0.008
 reality+0.008
 Sandra+0.008
Neg logit contributions
balls-0.010
ishly-0.010
hetically-0.009
 hips-0.009
 first-0.009
Token bitter
Token ID12922
Feature activation+1.72e-02

Pos logit contributions
aw+0.021
Kat+0.020
aman+0.019
 Sandra+0.018
 reality+0.018
Neg logit contributions
balls-0.019
ishly-0.017
 hips-0.017
hetically-0.017
 first-0.015
Token and
Token ID290
Feature activation-1.13e-02

Pos logit contributions
aw+0.037
Kat+0.031
land+0.029
 theirs+0.029
 reality+0.029
Neg logit contributions
balls-0.028
ishly-0.027
hetically-0.027
 hips-0.026
ider-0.023
Token to
Token ID284
Feature activation+2.02e-02

Pos logit contributions
aw+0.113
Kat+0.108
aman+0.097
 Sandra+0.097
 reality+0.096
Neg logit contributions
balls-0.103
ishly-0.096
 hips-0.096
hetically-0.091
 first-0.090
Token return
Token ID1441
Feature activation+4.90e-02

Pos logit contributions
aw+0.044
Kat+0.041
aman+0.036
 Sandra+0.036
 reality+0.036
Neg logit contributions
balls-0.030
ishly-0.029
 hips-0.028
hetically-0.028
 first-0.026
Token it
Token ID340
Feature activation+1.57e-02

Pos logit contributions
aw+0.080
Kat+0.074
aman+0.065
 Sandra+0.064
 reality+0.064
Neg logit contributions
balls-0.099
 hips-0.093
ishly-0.093
hetically-0.091
 first-0.087
Token to
Token ID284
Feature activation+2.69e-02

Pos logit contributions
aw+0.028
Kat+0.026
land+0.024
 reality+0.024
 theirs+0.023
Neg logit contributions
balls-0.029
hetically-0.029
ishly-0.027
 hips-0.026
ider-0.024
Token its
Token ID663
Feature activation+2.49e-02

Pos logit contributions
aw+0.049
Kat+0.044
aman+0.039
oph+0.038
 Sandra+0.038
Neg logit contributions
balls-0.049
 first-0.047
 hips-0.046
ishly-0.046
 help-0.045
Token rightful
Token ID43133
Feature activation+1.13e-01

Pos logit contributions
aw+0.034
Kat+0.031
 reality+0.029
aman+0.027
 Sandra+0.026
Neg logit contributions
balls-0.057
hetically-0.055
ishly-0.054
 hips-0.053
ider-0.049
Token owner
Token ID4870
Feature activation+6.29e-03

Pos logit contributions
aw+0.226
Kat+0.205
aman+0.197
weet+0.194
 Sandra+0.190
Neg logit contributions
balls-0.179
 hips-0.172
 first-0.166
ishly-0.162
hetically-0.161
Token if
Token ID611
Feature activation-7.74e-03

Pos logit contributions
aw+0.013
Kat+0.011
land+0.010
 Sandra+0.010
 reality+0.010
Neg logit contributions
balls-0.011
hetically-0.010
ishly-0.010
 hips-0.010
ider-0.009
Token you
Token ID345
Feature activation-4.38e-03

Pos logit contributions
balls+0.014
hetically+0.013
ishly+0.013
 hips+0.013
ider+0.012
Neg logit contributions
aw-0.014
Kat-0.013
 Sandra-0.012
 reality-0.012
aman-0.012
Token can
Token ID460
Feature activation-3.74e-03

Pos logit contributions
balls+0.009
hetically+0.008
ishly+0.008
 hips+0.008
ider+0.007
Neg logit contributions
aw-0.008
Kat-0.007
 reality-0.006
 theirs-0.006
 Sandra-0.006
Token find
Token ID1064
Feature activation+1.28e-02

Pos logit contributions
balls+0.007
hetically+0.007
ishly+0.006
 hips+0.006
ider+0.006
Neg logit contributions
aw-0.006
Kat-0.006
 reality-0.006
aman-0.005
land-0.005
Token shells
Token ID19679
Feature activation+5.83e-02

Pos logit contributions
aw+0.055
Kat+0.052
aman+0.046
 Sandra+0.046
 reality+0.046
Neg logit contributions
balls-0.058
ishly-0.053
hetically-0.053
 hips-0.052
 first-0.048
Token,
Token ID11
Feature activation+4.21e-02

Pos logit contributions
aw+0.130
Kat+0.114
 theirs+0.110
land+0.109
 Sandra+0.109
Neg logit contributions
balls-0.082
hetically-0.081
ishly-0.079
 hips-0.076
ider-0.071
Token rocks
Token ID12586
Feature activation+7.26e-02

Pos logit contributions
aw+0.073
Kat+0.069
aman+0.061
 reality+0.060
 Sandra+0.060
Neg logit contributions
balls-0.082
hetically-0.076
ishly-0.076
 hips-0.075
ider-0.069
Token,
Token ID11
Feature activation+5.44e-02

Pos logit contributions
aw+0.156
 theirs+0.133
Kat+0.132
land+0.131
 Sandra+0.128
Neg logit contributions
balls-0.107
hetically-0.106
ishly-0.103
 hips-0.100
ider-0.094
Token and
Token ID290
Feature activation+5.22e-02

Pos logit contributions
aw+0.110
Kat+0.103
 Sandra+0.093
 reality+0.093
aman+0.093
Neg logit contributions
balls-0.090
hetically-0.084
ishly-0.084
 hips-0.081
ider-0.074
Token crabs
Token ID49172
Feature activation+1.13e-01

Pos logit contributions
aw+0.096
Kat+0.090
aman+0.081
 Sandra+0.080
 reality+0.080
Neg logit contributions
balls-0.095
hetically-0.089
ishly-0.089
 hips-0.087
ider-0.080
Token.
Token ID13
Feature activation+2.58e-02

Pos logit contributions
aw+0.247
Kat+0.214
 theirs+0.208
land+0.206
 Sandra+0.201
Neg logit contributions
hetically-0.169
ishly-0.163
balls-0.162
 hips-0.157
ider-0.143
Token One
Token ID1881
Feature activation-2.15e-02

Pos logit contributions
aw+0.056
Kat+0.054
 reality+0.051
 spoken+0.049
aman+0.048
Neg logit contributions
balls-0.039
 hips-0.035
ishly-0.034
 first-0.033
hetically-0.032
Token day
Token ID1110
Feature activation-3.47e-02

Pos logit contributions
balls+0.035
ishly+0.032
hetically+0.032
 hips+0.031
 first+0.030
Neg logit contributions
aw-0.043
Kat-0.042
 reality-0.037
aman-0.037
 Sandra-0.037
Token,
Token ID11
Feature activation-6.22e-03

Pos logit contributions
balls+0.039
ishly+0.034
hetically+0.033
 hips+0.033
ider+0.028
Neg logit contributions
aw-0.089
Kat-0.085
 reality-0.079
aman-0.079
 Sandra-0.078
Token they
Token ID484
Feature activation-1.41e-02

Pos logit contributions
balls+0.009
 first+0.008
 hips+0.007
ishly+0.007
 onto+0.007
Neg logit contributions
aw-0.015
Kat-0.014
weet-0.013
 reality-0.013
onda-0.013
Token the
Token ID262
Feature activation+8.54e-03

Pos logit contributions
aw+0.104
Kat+0.096
 Sandra+0.086
aman+0.085
 reality+0.084
Neg logit contributions
balls-0.123
hetically-0.116
ishly-0.115
 hips-0.113
ider-0.106
Token stolen
Token ID9909
Feature activation+6.01e-02

Pos logit contributions
aw+0.014
Kat+0.014
 reality+0.013
ige+0.013
 Sandra+0.013
Neg logit contributions
ishly-0.016
hetically-0.015
balls-0.014
 hips-0.014
ider-0.013
Token squares
Token ID24438
Feature activation-1.15e-02

Pos logit contributions
aw+0.120
Kat+0.108
 Sandra+0.099
 reality+0.099
 theirs+0.097
Neg logit contributions
balls-0.102
hetically-0.100
ishly-0.100
 hips-0.095
ider-0.087
Token to
Token ID284
Feature activation+3.48e-02

Pos logit contributions
hetically+0.020
ishly+0.019
 hips+0.019
balls+0.018
ider+0.017
Neg logit contributions
aw-0.022
 theirs-0.019
Kat-0.019
land-0.018
reating-0.017
Token their
Token ID511
Feature activation+1.93e-02

Pos logit contributions
aw+0.064
Kat+0.058
aman+0.052
 reality+0.050
 Sandra+0.050
Neg logit contributions
balls-0.065
ishly-0.062
hetically-0.060
 hips-0.059
 first-0.057
Token rightful
Token ID43133
Feature activation+1.12e-01

Pos logit contributions
aw+0.032
Kat+0.029
 reality+0.028
aman+0.026
 Sandra+0.026
Neg logit contributions
balls-0.036
hetically-0.036
ishly-0.035
 hips-0.035
ider-0.033
Token place
Token ID1295
Feature activation+4.15e-02

Pos logit contributions
aw+0.236
Kat+0.214
aman+0.212
weet+0.207
 reality+0.202
Neg logit contributions
balls-0.167
 hips-0.155
 first-0.149
ishly-0.143
hetically-0.142
Token.
Token ID13
Feature activation+4.06e-03

Pos logit contributions
aw+0.086
Kat+0.080
 theirs+0.077
 reality+0.074
 Sandra+0.074
Neg logit contributions
balls-0.064
hetically-0.060
ishly-0.057
 hips-0.057
ider-0.054
Token\n
Token ID198
Feature activation+1.84e-02

Pos logit contributions
aw+0.007
Kat+0.007
 Sandra+0.006
aman+0.006
 theirs+0.006
Neg logit contributions
balls-0.008
hetically-0.007
ishly-0.007
 hips-0.007
ider-0.007
Token\n
Token ID198
Feature activation+1.01e-03

Pos logit contributions
aw+0.036
Kat+0.033
 reality+0.031
aman+0.031
 Sandra+0.030
Neg logit contributions
balls-0.032
ishly-0.029
hetically-0.029
 hips-0.028
 first-0.026
TokenThe
Token ID464
Feature activation-1.38e-02

Pos logit contributions
aw+0.002
Kat+0.001
 reality+0.001
 Sandra+0.001
aman+0.001
Neg logit contributions
balls-0.002
ishly-0.002
hetically-0.002
 hips-0.002
ider-0.002
Token grew
Token ID6348
Feature activation+8.85e-03

Pos logit contributions
aw+0.024
Kat+0.021
 Sandra+0.019
 reality+0.019
aman+0.019
Neg logit contributions
balls-0.028
ishly-0.027
hetically-0.027
 hips-0.026
ider-0.024
Token leaves
Token ID5667
Feature activation+1.04e-02

Pos logit contributions
aw+0.018
Kat+0.018
aman+0.016
 Sandra+0.016
 reality+0.016
Neg logit contributions
balls-0.014
hetically-0.013
ishly-0.012
 hips-0.012
 first-0.011
Token and
Token ID290
Feature activation+1.48e-02

Pos logit contributions
aw+0.021
 theirs+0.017
Kat+0.017
land+0.017
 Sandra+0.017
Neg logit contributions
balls-0.017
hetically-0.016
 hips-0.016
ishly-0.016
ider-0.015
Token flowers
Token ID12734
Feature activation+6.73e-02

Pos logit contributions
aw+0.029
Kat+0.027
aman+0.025
 Sandra+0.025
 reality+0.025
Neg logit contributions
balls-0.025
hetically-0.023
ishly-0.023
 hips-0.023
ider-0.021
Token and
Token ID290
Feature activation+4.79e-02

Pos logit contributions
aw+0.145
Kat+0.128
 theirs+0.122
 Sandra+0.120
land+0.120
Neg logit contributions
balls-0.101
hetically-0.098
ishly-0.096
 hips-0.093
ider-0.086
Token fruits
Token ID15921
Feature activation+1.12e-01

Pos logit contributions
aw+0.091
Kat+0.088
aman+0.080
 Sandra+0.077
 reality+0.077
Neg logit contributions
balls-0.084
hetically-0.077
ishly-0.077
 hips-0.075
 first-0.069
Token.
Token ID13
Feature activation+2.07e-02

Pos logit contributions
aw+0.265
Kat+0.250
 Sandra+0.231
 reality+0.230
aman+0.230
Neg logit contributions
balls-0.144
hetically-0.131
ishly-0.131
 hips-0.127
ider-0.113
Token It
Token ID632
Feature activation+2.04e-02

Pos logit contributions
aw+0.047
aman+0.046
Kat+0.045
 reality+0.045
reating+0.044
Neg logit contributions
balls-0.026
 hips-0.023
 first-0.020
 onto-0.020
ider-0.019
Token was
Token ID373
Feature activation+1.52e-02

Pos logit contributions
aw+0.035
Kat+0.032
 Sandra+0.028
 reality+0.028
aman+0.027
Neg logit contributions
balls-0.040
ishly-0.038
hetically-0.038
 hips-0.038
ider-0.035
Token a
Token ID257
Feature activation-1.55e-02

Pos logit contributions
aw+0.034
Kat+0.033
aman+0.031
oph+0.031
onda+0.030
Neg logit contributions
balls-0.023
 hips-0.020
 first-0.020
 onto-0.019
ishly-0.018
Token magic
Token ID5536
Feature activation-1.89e-02

Pos logit contributions
balls+0.028
 hips+0.024
 first+0.024
ishly+0.023
hetically+0.023
Neg logit contributions
aw-0.031
Kat-0.028
 Sandra-0.026
 theirs-0.025
aman-0.025
Token,
Token ID11
Feature activation+4.81e-02

Pos logit contributions
aw+0.097
Kat+0.088
 Sandra+0.082
land+0.082
 theirs+0.082
Neg logit contributions
balls-0.060
ishly-0.058
hetically-0.055
 hips-0.053
ider-0.049
Token on
Token ID319
Feature activation+4.43e-02

Pos logit contributions
aw+0.105
Kat+0.094
 Sandra+0.088
 reality+0.086
 theirs+0.085
Neg logit contributions
balls-0.074
ishly-0.070
hetically-0.067
 hips-0.066
ider-0.062
Token books
Token ID3835
Feature activation+5.57e-02

Pos logit contributions
aw+0.084
Kat+0.079
aman+0.072
 Sandra+0.070
 reality+0.069
Neg logit contributions
balls-0.077
hetically-0.072
 hips-0.072
ishly-0.070
 first-0.066
Token,
Token ID11
Feature activation+5.85e-02

Pos logit contributions
aw+0.135
Kat+0.127
 Sandra+0.117
aman+0.117
 reality+0.117
Neg logit contributions
balls-0.070
ishly-0.064
hetically-0.064
 hips-0.061
ider-0.054
Token on
Token ID319
Feature activation+4.91e-02

Pos logit contributions
aw+0.122
Kat+0.106
 Sandra+0.099
land+0.097
 reality+0.096
Neg logit contributions
balls-0.097
ishly-0.093
hetically-0.089
 hips-0.086
ider-0.083
Token anything
Token ID1997
Feature activation+1.11e-01

Pos logit contributions
aw+0.097
Kat+0.091
aman+0.084
 Sandra+0.082
onda+0.080
Neg logit contributions
balls-0.080
 hips-0.076
hetically-0.075
 first-0.071
ishly-0.071
Token.
Token ID13
Feature activation+2.39e-02

Pos logit contributions
aw+0.230
Kat+0.206
 reality+0.195
land+0.191
 theirs+0.191
Neg logit contributions
ishly-0.176
balls-0.172
hetically-0.172
 hips-0.163
ider-0.155
Token He
Token ID679
Feature activation+2.70e-03

Pos logit contributions
aw+0.058
Kat+0.055
aman+0.051
 reality+0.051
 Sandra+0.050
Neg logit contributions
balls-0.030
hetically-0.027
ishly-0.027
 hips-0.026
ider-0.023
Token thinks
Token ID6834
Feature activation-1.99e-02

Pos logit contributions
aw+0.004
Kat+0.003
 reality+0.003
 Sandra+0.003
aman+0.002
Neg logit contributions
balls-0.006
ishly-0.006
hetically-0.006
 hips-0.006
ider-0.006
Token writing
Token ID3597
Feature activation-8.05e-03

Pos logit contributions
balls+0.036
hetically+0.034
ishly+0.034
 hips+0.033
ider+0.030
Neg logit contributions
aw-0.037
Kat-0.034
 Sandra-0.031
aman-0.031
 reality-0.031
Token is
Token ID318
Feature activation+1.32e-02

Pos logit contributions
balls+0.015
ishly+0.014
hetically+0.014
 hips+0.014
ider+0.013
Neg logit contributions
aw-0.015
Kat-0.014
 reality-0.012
 Sandra-0.012
 theirs-0.012
Token and
Token ID290
Feature activation+9.53e-03

Pos logit contributions
aw+0.044
Kat+0.040
 theirs+0.037
 Sandra+0.037
 reality+0.036
Neg logit contributions
balls-0.033
hetically-0.032
ishly-0.031
 hips-0.030
ider-0.027
Token cats
Token ID11875
Feature activation+5.47e-02

Pos logit contributions
aw+0.017
Kat+0.016
 Sandra+0.014
 reality+0.014
aman+0.014
Neg logit contributions
balls-0.018
hetically-0.017
ishly-0.017
 hips-0.016
ider-0.015
Token and
Token ID290
Feature activation+3.32e-02

Pos logit contributions
aw+0.114
Kat+0.101
 theirs+0.094
land+0.093
 reality+0.093
Neg logit contributions
balls-0.087
hetically-0.085
ishly-0.082
 hips-0.080
ider-0.073
Token cars
Token ID5006
Feature activation+4.37e-02

Pos logit contributions
aw+0.059
Kat+0.054
 Sandra+0.049
 reality+0.049
aman+0.048
Neg logit contributions
balls-0.063
hetically-0.059
ishly-0.059
 hips-0.058
ider-0.054
Token and
Token ID290
Feature activation+4.42e-02

Pos logit contributions
aw+0.095
Kat+0.085
 theirs+0.079
 Sandra+0.078
 reality+0.078
Neg logit contributions
balls-0.066
hetically-0.063
ishly-0.062
 hips-0.060
ider-0.054
Token people
Token ID661
Feature activation+1.10e-01

Pos logit contributions
aw+0.080
Kat+0.074
 Sandra+0.066
 reality+0.066
aman+0.065
Neg logit contributions
balls-0.082
hetically-0.078
ishly-0.077
 hips-0.076
ider-0.070
Token.
Token ID13
Feature activation+1.97e-02

Pos logit contributions
aw+0.237
Kat+0.220
 Sandra+0.201
 reality+0.200
aman+0.199
Neg logit contributions
balls-0.167
hetically-0.157
ishly-0.156
 hips-0.151
ider-0.137
Token They
Token ID1119
Feature activation+1.26e-02

Pos logit contributions
aw+0.040
Kat+0.038
aman+0.035
 reality+0.035
 Sandra+0.034
Neg logit contributions
balls-0.032
ishly-0.030
hetically-0.029
 hips-0.029
ider-0.026
Token can
Token ID460
Feature activation+1.04e-02

Pos logit contributions
aw+0.021
Kat+0.018
 theirs+0.017
 reality+0.017
 Sandra+0.016
Neg logit contributions
balls-0.025
hetically-0.024
ishly-0.024
 hips-0.023
ider-0.022
Token hurt
Token ID5938
Feature activation+1.53e-02

Pos logit contributions
aw+0.017
Kat+0.015
 theirs+0.014
Pe+0.014
 Sandra+0.014
Neg logit contributions
hetically-0.019
balls-0.019
 hips-0.018
ider-0.018
ishly-0.018
Token us
Token ID514
Feature activation+1.65e-02

Pos logit contributions
aw+0.026
Kat+0.024
 Sandra+0.021
 reality+0.021
aman+0.021
Neg logit contributions
balls-0.030
ishly-0.028
hetically-0.028
 hips-0.027
ider-0.025
TokenThe
Token ID464
Feature activation-2.37e-02

Pos logit contributions
balls+0.021
hetically+0.019
ishly+0.019
 hips+0.019
ider+0.018
Neg logit contributions
aw-0.015
Kat-0.013
 Sandra-0.012
aman-0.012
 reality-0.012
Token ot
Token ID30972
Feature activation-2.43e-02

Pos logit contributions
balls+0.047
ishly+0.045
 hips+0.045
hetically+0.044
ider+0.040
Neg logit contributions
aw-0.038
 reality-0.037
Kat-0.037
 history-0.032
land-0.031
Tokenter
Token ID353
Feature activation-3.38e-03

Pos logit contributions
 hips+0.051
balls+0.048
hetically+0.047
ishly+0.044
 stretches+0.042
Neg logit contributions
aw-0.042
Kat-0.034
ull-0.032
onda-0.031
 reality-0.030
Token was
Token ID373
Feature activation-7.18e-03

Pos logit contributions
balls+0.007
hetically+0.006
 hips+0.006
ishly+0.006
ider+0.006
Neg logit contributions
aw-0.006
Kat-0.005
aman-0.004
 reality-0.004
 Sandra-0.004
Token be
Token ID307
Feature activation-2.96e-02

Pos logit contributions
balls+0.011
ishly+0.010
 hips+0.010
hetically+0.010
 first+0.010
Neg logit contributions
aw-0.015
Kat-0.014
aman-0.013
oph-0.013
 reality-0.013
Tokenf
Token ID69
Feature activation-1.08e-01

Pos logit contributions
balls+0.056
 hips+0.053
ishly+0.052
hetically+0.052
 first+0.050
Neg logit contributions
aw-0.050
Kat-0.045
aman-0.044
 reality-0.043
oph-0.043
Tokenuddled
Token ID32745
Feature activation-7.42e-03

Pos logit contributions
balls+0.209
ishly+0.191
hetically+0.190
 hips+0.190
ider+0.167
Neg logit contributions
aw-0.203
Kat-0.175
land-0.165
aman-0.159
 reality-0.158
Token but
Token ID475
Feature activation+3.27e-03

Pos logit contributions
balls+0.010
hetically+0.009
ishly+0.009
 hips+0.009
ider+0.009
Neg logit contributions
aw-0.017
Kat-0.015
 Sandra-0.014
 theirs-0.014
 reality-0.014
Token curious
Token ID11040
Feature activation+1.13e-02

Pos logit contributions
aw+0.006
Kat+0.006
aman+0.005
 Sandra+0.005
 reality+0.005
Neg logit contributions
balls-0.006
ishly-0.006
hetically-0.006
 hips-0.006
ider-0.005
Token.
Token ID13
Feature activation-7.41e-03

Pos logit contributions
aw+0.023
Kat+0.020
 reality+0.019
 Sandra+0.018
land+0.018
Neg logit contributions
balls-0.019
hetically-0.018
ishly-0.018
 hips-0.017
ider-0.017
Token He
Token ID679
Feature activation-1.73e-03

Pos logit contributions
balls+0.011
hetically+0.010
ishly+0.010
 hips+0.010
ider+0.009
Neg logit contributions
aw-0.016
Kat-0.015
 Sandra-0.014
 reality-0.014
aman-0.014
TokenThe
Token ID464
Feature activation-1.60e-02

Pos logit contributions
aw+0.003
Kat+0.003
 Sandra+0.003
aman+0.003
 reality+0.003
Neg logit contributions
balls-0.004
ishly-0.004
hetically-0.004
 hips-0.004
 first-0.004
Token old
Token ID1468
Feature activation+3.68e-02

Pos logit contributions
balls+0.031
ishly+0.029
hetically+0.029
 hips+0.029
ider+0.026
Neg logit contributions
aw-0.028
Kat-0.026
 Sandra-0.023
 reality-0.023
aman-0.023
Token lady
Token ID10846
Feature activation-3.24e-03

Pos logit contributions
aw+0.064
Kat+0.060
 Sandra+0.054
aman+0.053
 reality+0.052
Neg logit contributions
balls-0.072
hetically-0.068
ishly-0.067
 hips-0.066
ider-0.061
Token became
Token ID2627
Feature activation-1.85e-02

Pos logit contributions
balls+0.007
 hips+0.006
hetically+0.006
ishly+0.006
ider+0.006
Neg logit contributions
aw-0.005
Kat-0.005
 Sandra-0.004
aman-0.004
 reality-0.004
Token known
Token ID1900
Feature activation+2.20e-02

Pos logit contributions
balls+0.034
hetically+0.033
ishly+0.032
 hips+0.032
ider+0.030
Neg logit contributions
aw-0.034
Kat-0.032
 reality-0.029
 Sandra-0.028
 theirs-0.028
Token by
Token ID416
Feature activation-9.85e-02

Pos logit contributions
aw+0.040
 theirs+0.040
 reality+0.038
Kat+0.037
 Sandra+0.035
Neg logit contributions
hetically-0.038
balls-0.038
ishly-0.035
 hips-0.034
ider-0.031
Token many
Token ID867
Feature activation-3.34e-02

Pos logit contributions
balls+0.192
hetically+0.181
ishly+0.179
 hips+0.177
ider+0.165
Neg logit contributions
aw-0.168
Kat-0.157
 reality-0.141
 Sandra-0.141
aman-0.139
Token as
Token ID355
Feature activation-5.34e-02

Pos logit contributions
balls+0.074
ishly+0.070
hetically+0.070
 hips+0.069
ider+0.063
Neg logit contributions
aw-0.054
Kat-0.044
 reality-0.043
land-0.040
 theirs-0.040
Token the
Token ID262
Feature activation-1.63e-02

Pos logit contributions
balls+0.095
hetically+0.088
ishly+0.087
 hips+0.086
 first+0.081
Neg logit contributions
aw-0.100
Kat-0.092
aman-0.084
 reality-0.081
 Sandra-0.081
Token Y
Token ID575
Feature activation-1.94e-02

Pos logit contributions
balls+0.036
hetically+0.034
 hips+0.034
ishly+0.034
ider+0.031
Neg logit contributions
aw-0.023
Kat-0.021
 reality-0.020
 Sandra-0.019
 history-0.019
Tokenarn
Token ID1501
Feature activation-5.26e-02

Pos logit contributions
balls+0.038
 hips+0.036
hetically+0.035
ishly+0.034
 maker+0.031
Neg logit contributions
aw-0.036
Kat-0.030
aman-0.029
land-0.028
 Sandra-0.028
Token friendly
Token ID8030
Feature activation-1.98e-02

Pos logit contributions
balls+0.011
hetically+0.011
ishly+0.010
 hips+0.010
ider+0.009
Neg logit contributions
aw-0.010
Kat-0.009
 reality-0.009
 Sandra-0.008
aman-0.008
Token bee
Token ID20697
Feature activation-2.01e-02

Pos logit contributions
balls+0.031
hetically+0.029
ishly+0.028
 hips+0.028
ider+0.026
Neg logit contributions
aw-0.042
Kat-0.040
 Sandra-0.037
 reality-0.036
aman-0.036
Token named
Token ID3706
Feature activation+2.15e-03

Pos logit contributions
balls+0.047
 hips+0.044
hetically+0.044
ishly+0.043
ider+0.041
Neg logit contributions
aw-0.029
Kat-0.024
 Sandra-0.021
aman-0.021
 reality-0.020
Token Buzz
Token ID15753
Feature activation-3.66e-02

Pos logit contributions
aw+0.004
Kat+0.004
 Sandra+0.004
 reality+0.004
aman+0.004
Neg logit contributions
balls-0.003
hetically-0.003
ishly-0.003
 hips-0.003
ider-0.003
Token saw
Token ID2497
Feature activation+2.19e-02

Pos logit contributions
balls+0.063
hetically+0.059
ishly+0.059
 hips+0.058
ider+0.053
Neg logit contributions
aw-0.072
Kat-0.067
 Sandra-0.060
 reality-0.060
aman-0.060
Token Sp
Token ID1338
Feature activation-9.83e-02

Pos logit contributions
aw+0.039
Kat+0.034
 Sandra+0.031
 reality+0.031
aman+0.031
Neg logit contributions
balls-0.041
 hips-0.040
hetically-0.040
ishly-0.039
ider-0.039
Tokenide
Token ID485
Feature activation-5.76e-02

Pos logit contributions
balls+0.216
ishly+0.205
hetically+0.205
 hips+0.201
ider+0.188
Neg logit contributions
aw-0.145
Kat-0.133
aman-0.114
 reality-0.114
 Sandra-0.114
Tokeny
Token ID88
Feature activation-9.93e-03

Pos logit contributions
 hips+0.126
balls+0.123
ishly+0.116
ider+0.116
hetically+0.115
Neg logit contributions
aw-0.086
land-0.067
 theirs-0.065
Kat-0.065
ull-0.064
Token struggling
Token ID9648
Feature activation-1.25e-02

Pos logit contributions
balls+0.017
 hips+0.016
hetically+0.016
ishly+0.015
ider+0.015
Neg logit contributions
aw-0.020
Kat-0.017
 reality-0.016
land-0.016
aman-0.016
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
ishly+0.024
hetically+0.023
balls+0.023
ider+0.022
 hips+0.022
Neg logit contributions
aw-0.020
 theirs-0.018
Kat-0.017
 its-0.016
 sharing-0.016
Token asked
Token ID1965
Feature activation+2.42e-02

Pos logit contributions
aw+0.018
Kat+0.017
 Sandra+0.015
 reality+0.015
aman+0.015
Neg logit contributions
balls-0.021
hetically-0.020
 hips-0.020
ishly-0.020
ider-0.018
Token sign
Token ID1051
Feature activation-3.29e-03

Pos logit contributions
balls+0.034
 first+0.028
 hips+0.028
 maker+0.026
hetically+0.026
Neg logit contributions
aw-0.048
Kat-0.046
aman-0.043
 theirs-0.042
 Sandra-0.042
Token that
Token ID326
Feature activation+3.70e-04

Pos logit contributions
balls+0.006
hetically+0.006
ishly+0.006
 hips+0.006
ider+0.005
Neg logit contributions
aw-0.006
Kat-0.005
 Sandra-0.005
 theirs-0.005
land-0.005
Token said
Token ID531
Feature activation+5.60e-03

Pos logit contributions
aw+0.001
Kat+0.001
aman+0.001
 reality+0.001
 Sandra+0.001
Neg logit contributions
balls-0.001
ishly-0.001
hetically-0.001
 hips-0.001
 first-0.001
Token "
Token ID366
Feature activation-2.25e-02

Pos logit contributions
aw+0.012
Kat+0.011
aman+0.010
 reality+0.010
 Sandra+0.010
Neg logit contributions
balls-0.009
ishly-0.008
hetically-0.008
 hips-0.008
ider-0.007
TokenSur
Token ID14214
Feature activation-5.78e-03

Pos logit contributions
balls+0.046
hetically+0.043
ishly+0.043
 hips+0.042
ider+0.040
Neg logit contributions
aw-0.036
Kat-0.033
 reality-0.029
aman-0.029
 Sandra-0.029
Tokenf
Token ID69
Feature activation-9.77e-02

Pos logit contributions
ishly+0.013
balls+0.013
hetically+0.013
 hips+0.012
 maker+0.011
Neg logit contributions
aw-0.009
Kat-0.008
 reality-0.007
land-0.006
David-0.006
Token Show
Token ID5438
Feature activation-2.59e-02

Pos logit contributions
hetically+0.186
balls+0.183
ishly+0.180
 hips+0.173
ider+0.156
Neg logit contributions
aw-0.178
Kat-0.156
 Sandra-0.142
land-0.141
 reality-0.138
Token Today
Token ID6288
Feature activation-2.31e-02

Pos logit contributions
balls+0.049
hetically+0.049
ishly+0.047
 hips+0.045
ider+0.042
Neg logit contributions
aw-0.047
Kat-0.042
 Sandra-0.038
land-0.038
aman-0.037
Token".
Token ID1911
Feature activation-4.52e-03

Pos logit contributions
balls+0.036
hetically+0.035
ishly+0.034
 hips+0.032
ider+0.029
Neg logit contributions
aw-0.049
Kat-0.045
 Sandra-0.041
 reality-0.041
aman-0.041
Token She
Token ID1375
Feature activation-1.08e-02

Pos logit contributions
balls+0.008
hetically+0.008
ishly+0.007
 hips+0.007
ider+0.007
Neg logit contributions
aw-0.009
Kat-0.008
 Sandra-0.007
aman-0.007
 theirs-0.007
Token was
Token ID373
Feature activation-5.16e-03

Pos logit contributions
balls+0.020
 hips+0.018
ishly+0.018
hetically+0.018
 first+0.017
Neg logit contributions
aw-0.019
Kat-0.018
aman-0.017
 reality-0.017
 theirs-0.016
Token
Token ID220
Feature activation+3.75e-03

Pos logit contributions
aw+0.004
Kat+0.004
 Sandra+0.003
weet+0.003
 reality+0.003
Neg logit contributions
balls-0.005
hetically-0.005
ishly-0.005
 hips-0.005
ider-0.005
Token\n
Token ID198
Feature activation+2.35e-02

Pos logit contributions
aw+0.007
Kat+0.007
 reality+0.006
aman+0.006
 Sandra+0.006
Neg logit contributions
balls-0.007
hetically-0.006
ishly-0.006
 hips-0.006
ider-0.006
Token\n
Token ID198
Feature activation+7.83e-03

Pos logit contributions
aw+0.046
Kat+0.042
 reality+0.038
aman+0.038
 Sandra+0.038
Neg logit contributions
balls-0.041
hetically-0.038
ishly-0.037
 hips-0.037
ider-0.034
Token"
Token ID1
Feature activation-3.50e-02

Pos logit contributions
aw+0.009
Kat+0.008
 reality+0.006
Pe+0.006
 Sandra+0.006
Neg logit contributions
balls-0.020
ishly-0.019
hetically-0.019
 hips-0.019
ider-0.017
TokenHi
Token ID17250
Feature activation+1.62e-02

Pos logit contributions
balls+0.070
ishly+0.066
hetically+0.066
 hips+0.064
ider+0.060
Neg logit contributions
aw-0.059
Kat-0.054
aman-0.049
 reality-0.048
 Sandra-0.048
Token Sp
Token ID1338
Feature activation-9.72e-02

Pos logit contributions
aw+0.039
Kat+0.037
 reality+0.034
 Sandra+0.034
aman+0.034
Neg logit contributions
balls-0.020
hetically-0.019
ishly-0.019
 hips-0.018
ider-0.016
Tokenud
Token ID463
Feature activation-1.96e-02

Pos logit contributions
balls+0.208
ishly+0.202
hetically+0.198
 hips+0.195
ider+0.177
Neg logit contributions
aw-0.149
Kat-0.135
aman-0.118
 reality-0.117
 Sandra-0.114
Token!"
Token ID2474
Feature activation-9.64e-03

Pos logit contributions
hetically+0.033
balls+0.032
 hips+0.031
ishly+0.031
ider+0.028
Neg logit contributions
aw-0.036
 Sandra-0.030
Kat-0.029
ull-0.029
 Su-0.028
Token said
Token ID531
Feature activation+7.32e-03

Pos logit contributions
hetically+0.023
balls+0.022
ishly+0.022
 hips+0.022
ider+0.021
Neg logit contributions
aw-0.012
Kat-0.010
 Sandra-0.009
weet-0.008
Ab-0.008
Token Lily
Token ID20037
Feature activation-1.42e-02

Pos logit contributions
aw+0.006
Kat+0.006
 Sandra+0.005
 reality+0.005
 theirs+0.005
Neg logit contributions
hetically-0.019
balls-0.019
ishly-0.019
 hips-0.018
ider-0.018
Token.
Token ID13
Feature activation-2.16e-02

Pos logit contributions
balls+0.028
hetically+0.027
ider+0.025
ishly+0.024
 hips+0.023
Neg logit contributions
aw-0.023
Kat-0.021
 Sandra-0.019
weet-0.018
 spoken-0.018
Token.
Token ID13
Feature activation+3.36e-03

Pos logit contributions
aw+0.028
Kat+0.024
land+0.024
 reality+0.023
 Sandra+0.023
Neg logit contributions
balls-0.020
hetically-0.020
ishly-0.020
 hips-0.019
ider-0.017
Token His
Token ID2399
Feature activation-3.08e-03

Pos logit contributions
aw+0.007
Kat+0.007
 Sandra+0.006
 reality+0.006
aman+0.006
Neg logit contributions
balls-0.005
hetically-0.005
ishly-0.005
 hips-0.004
ider-0.004
Token name
Token ID1438
Feature activation+5.68e-03

Pos logit contributions
ishly+0.006
hetically+0.006
balls+0.006
 hips+0.005
ider+0.005
Neg logit contributions
aw-0.006
Kat-0.005
 Sandra-0.005
 reality-0.005
ige-0.004
Token was
Token ID373
Feature activation+9.10e-03

Pos logit contributions
Kat+0.012
aw+0.011
aman+0.011
 reality+0.010
 Sandra+0.010
Neg logit contributions
balls-0.009
hetically-0.008
ishly-0.008
 hips-0.008
 maker-0.007
Token Wil
Token ID5187
Feature activation-1.02e-02

Pos logit contributions
aw+0.019
Kat+0.018
aman+0.017
 reality+0.016
land+0.016
Neg logit contributions
balls-0.015
ishly-0.013
hetically-0.013
 hips-0.013
 first-0.012
Tokenf
Token ID69
Feature activation-9.64e-02

Pos logit contributions
hetically+0.017
balls+0.016
 hips+0.016
ishly+0.016
 first+0.016
Neg logit contributions
aw-0.020
Kat-0.019
 theirs-0.018
 reality-0.017
ooks-0.017
Token.
Token ID13
Feature activation+2.06e-03

Pos logit contributions
balls+0.179
 hips+0.173
ishly+0.172
hetically+0.171
ider+0.149
Neg logit contributions
aw-0.187
Kat-0.157
 Sandra-0.145
weet-0.143
land-0.143
Token One
Token ID1881
Feature activation-2.91e-02

Pos logit contributions
aw+0.005
Kat+0.004
 reality+0.004
 Sandra+0.004
oph+0.004
Neg logit contributions
balls-0.003
hetically-0.003
ishly-0.003
 hips-0.003
 first-0.002
Token day
Token ID1110
Feature activation-4.30e-02

Pos logit contributions
balls+0.047
hetically+0.044
ishly+0.043
 hips+0.042
 first+0.040
Neg logit contributions
aw-0.059
Kat-0.056
 reality-0.051
aman-0.051
 Sandra-0.050
Token Wil
Token ID5187
Feature activation+1.02e-03

Pos logit contributions
balls+0.047
ishly+0.042
hetically+0.042
 hips+0.040
ider+0.034
Neg logit contributions
aw-0.111
Kat-0.106
 reality-0.098
 Sandra-0.098
aman-0.097
Tokenf
Token ID69
Feature activation-8.81e-02

Pos logit contributions
aw+0.002
Kat+0.002
 reality+0.002
 theirs+0.002
ooks+0.002
Neg logit contributions
hetically-0.002
 first-0.002
balls-0.002
 hips-0.001
ishly-0.001
Token
Token ID198
Feature activation-6.21e-03

Pos logit contributions
aw+0.005
 reality+0.004
aman+0.004
Kat+0.004
 theirs+0.004
Neg logit contributions
balls-0.003
 hips-0.002
hetically-0.002
 first-0.002
ishly-0.002
Token"
Token ID1
Feature activation-4.40e-02

Pos logit contributions
balls+0.012
hetically+0.011
ishly+0.011
 hips+0.011
ider+0.010
Neg logit contributions
aw-0.011
Kat-0.010
 Sandra-0.009
aman-0.009
 reality-0.009
TokenI
Token ID40
Feature activation-5.17e-03

Pos logit contributions
balls+0.085
hetically+0.079
ishly+0.079
 hips+0.076
ider+0.072
Neg logit contributions
aw-0.077
Kat-0.070
 reality-0.064
aman-0.064
 Sandra-0.063
Token have
Token ID423
Feature activation-3.03e-02

Pos logit contributions
balls+0.010
hetically+0.010
ishly+0.009
 hips+0.009
ider+0.008
Neg logit contributions
aw-0.009
Kat-0.008
 reality-0.008
 Sandra-0.007
aman-0.007
Token my
Token ID616
Feature activation-1.83e-02

Pos logit contributions
balls+0.050
 hips+0.045
ishly+0.044
 first+0.043
hetically+0.042
Neg logit contributions
aw-0.058
Kat-0.054
aman-0.053
 reality-0.050
 Sandra-0.050
Token fl
Token ID781
Feature activation-9.50e-02

Pos logit contributions
hetically+0.032
balls+0.031
ishly+0.030
ider+0.028
 hips+0.028
Neg logit contributions
aw-0.034
Kat-0.030
 reality-0.029
land-0.028
aman-0.028
Tokenute
Token ID1133
Feature activation-5.12e-02

Pos logit contributions
hetically+0.230
ishly+0.218
balls+0.212
 hips+0.204
 first+0.191
Neg logit contributions
aw-0.118
Kat-0.097
 reality-0.092
ooks-0.087
land-0.087
Token,"
Token ID553
Feature activation-1.85e-02

Pos logit contributions
hetically+0.104
balls+0.098
 hips+0.090
ishly+0.089
Carl+0.089
Neg logit contributions
aw-0.075
land-0.068
 theirs-0.063
 yours-0.063
 seen-0.061
Token replied
Token ID8712
Feature activation+1.50e-02

Pos logit contributions
balls+0.035
hetically+0.033
ishly+0.033
 hips+0.032
ider+0.031
Neg logit contributions
aw-0.033
Kat-0.030
 Sandra-0.027
 reality-0.027
aman-0.026
Token Lily
Token ID20037
Feature activation-1.51e-02

Pos logit contributions
aw+0.016
Kat+0.014
 Sandra+0.013
 theirs+0.013
 reality+0.012
Neg logit contributions
balls-0.038
hetically-0.037
ider-0.035
 hips-0.035
ishly-0.035
Token.
Token ID13
Feature activation-2.38e-02

Pos logit contributions
balls+0.028
hetically+0.025
ider+0.024
ishly+0.023
 hips+0.022
Neg logit contributions
aw-0.027
Kat-0.025
 Sandra-0.023
 spoken-0.022
weet-0.021
Token sign
Token ID1051
Feature activation+3.57e-04

Pos logit contributions
balls+0.021
 hips+0.018
hetically+0.017
 first+0.017
ishly+0.017
Neg logit contributions
aw-0.030
Kat-0.028
aman-0.026
 theirs-0.026
 Sandra-0.026
Token that
Token ID326
Feature activation+8.55e-04

Pos logit contributions
aw+0.001
Kat+0.001
 Sandra+0.000
 theirs+0.000
land+0.000
Neg logit contributions
balls-0.001
hetically-0.001
ishly-0.001
 hips-0.001
ider-0.001
Token says
Token ID1139
Feature activation-1.68e-03

Pos logit contributions
aw+0.002
Kat+0.001
 Sandra+0.001
aman+0.001
 reality+0.001
Neg logit contributions
balls-0.002
hetically-0.002
ishly-0.001
 hips-0.001
ider-0.001
Token "
Token ID366
Feature activation-2.44e-02

Pos logit contributions
balls+0.003
ishly+0.002
hetically+0.002
 hips+0.002
ider+0.002
Neg logit contributions
aw-0.004
Kat-0.003
aman-0.003
 reality-0.003
 Sandra-0.003
TokenSur
Token ID14214
Feature activation-6.06e-03

Pos logit contributions
balls+0.051
hetically+0.048
ishly+0.048
 hips+0.047
ider+0.044
Neg logit contributions
aw-0.039
Kat-0.035
 Sandra-0.031
 reality-0.031
aman-0.031
Tokenf
Token ID69
Feature activation-9.49e-02

Pos logit contributions
balls+0.013
ishly+0.013
hetically+0.013
 hips+0.012
 maker+0.011
Neg logit contributions
aw-0.009
Kat-0.008
 reality-0.007
land-0.007
 theirs-0.007
Token Lessons
Token ID46885
Feature activation-1.52e-02

Pos logit contributions
hetically+0.182
balls+0.180
ishly+0.177
 hips+0.168
ider+0.152
Neg logit contributions
aw-0.171
Kat-0.150
land-0.138
 Sandra-0.137
 theirs-0.133
Token for
Token ID329
Feature activation-2.72e-02

Pos logit contributions
hetically+0.027
balls+0.026
ishly+0.026
 hips+0.024
ider+0.021
Neg logit contributions
aw-0.030
Kat-0.026
land-0.026
 theirs-0.025
 Sandra-0.024
Token Kids
Token ID17476
Feature activation-1.82e-02

Pos logit contributions
balls+0.053
 hips+0.049
ishly+0.048
hetically+0.048
 first+0.047
Neg logit contributions
aw-0.047
Kat-0.043
aman-0.039
 reality-0.038
 Sandra-0.037
Token".
Token ID1911
Feature activation-3.44e-03

Pos logit contributions
balls+0.033
hetically+0.033
ishly+0.031
 hips+0.029
ider+0.027
Neg logit contributions
aw-0.035
Kat-0.032
land-0.030
 Sandra-0.029
 theirs-0.029
Token She
Token ID1375
Feature activation-1.01e-02

Pos logit contributions
balls+0.006
ishly+0.005
hetically+0.005
 hips+0.005
ider+0.005
Neg logit contributions
aw-0.007
Kat-0.006
 reality-0.006
aman-0.006
 Sandra-0.006
Token we
Token ID356
Feature activation-3.11e-03

Pos logit contributions
balls+0.053
ishly+0.049
hetically+0.047
 hips+0.047
 first+0.043
Neg logit contributions
aw-0.097
Kat-0.093
aman-0.086
 reality-0.084
 Sandra-0.084
Token can
Token ID460
Feature activation-9.41e-03

Pos logit contributions
balls+0.005
hetically+0.005
 hips+0.005
ishly+0.005
ider+0.004
Neg logit contributions
aw-0.006
Kat-0.006
 reality-0.005
 Sandra-0.005
 theirs-0.005
Token get
Token ID651
Feature activation-7.86e-03

Pos logit contributions
balls+0.018
hetically+0.018
ishly+0.016
 hips+0.016
ider+0.016
Neg logit contributions
aw-0.016
Kat-0.015
 reality-0.013
 Sandra-0.013
aman-0.013
Token you
Token ID345
Feature activation-4.20e-02

Pos logit contributions
balls+0.014
ishly+0.013
hetically+0.013
 hips+0.013
 first+0.012
Neg logit contributions
aw-0.014
Kat-0.014
aman-0.012
 reality-0.012
 Sandra-0.012
Token a
Token ID257
Feature activation-3.44e-02

Pos logit contributions
balls+0.069
ishly+0.063
 hips+0.063
hetically+0.061
 first+0.059
Neg logit contributions
aw-0.082
Kat-0.080
aman-0.072
 Sandra-0.071
 reality-0.071
Token ch
Token ID442
Feature activation-9.45e-02

Pos logit contributions
balls+0.058
ishly+0.054
 hips+0.052
hetically+0.052
 first+0.050
Neg logit contributions
aw-0.069
Kat-0.063
 Sandra-0.057
aman-0.056
 theirs-0.056
Tokenubby
Token ID38393
Feature activation-3.85e-02

Pos logit contributions
balls+0.179
hetically+0.168
ishly+0.165
 hips+0.162
ider+0.144
Neg logit contributions
aw-0.182
Kat-0.156
 Sandra-0.144
aman-0.143
 reality-0.140
Token cat
Token ID3797
Feature activation-3.70e-02

Pos logit contributions
balls+0.064
hetically+0.061
ishly+0.059
 hips+0.058
ider+0.053
Neg logit contributions
aw-0.078
Kat-0.071
 Sandra-0.066
aman-0.065
 reality-0.065
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
hetically+0.067
balls+0.063
 hips+0.061
ishly+0.059
ider+0.054
Neg logit contributions
aw-0.072
land-0.060
Kat-0.059
 theirs-0.059
weet-0.056
Token But
Token ID887
Feature activation-2.27e-02

Pos logit contributions
balls+0.023
hetically+0.022
 hips+0.021
ishly+0.021
ider+0.019
Neg logit contributions
aw-0.028
Kat-0.026
 Sandra-0.024
aman-0.023
 reality-0.023
Token for
Token ID329
Feature activation-1.07e-02

Pos logit contributions
balls+0.037
ishly+0.034
hetically+0.034
 hips+0.033
ider+0.030
Neg logit contributions
aw-0.046
Kat-0.044
 Sandra-0.039
aman-0.039
 reality-0.039
Token day
Token ID1110
Feature activation-4.45e-02

Pos logit contributions
balls+0.093
ishly+0.091
hetically+0.088
 hips+0.087
ider+0.079
Neg logit contributions
aw-0.073
Kat-0.066
 Sandra-0.060
aman-0.059
 reality-0.058
Token,
Token ID11
Feature activation-1.54e-02

Pos logit contributions
balls+0.053
hetically+0.049
ishly+0.048
 hips+0.047
ider+0.041
Neg logit contributions
aw-0.110
Kat-0.104
 Sandra-0.096
aman-0.096
 reality-0.096
Token Lily
Token ID20037
Feature activation-1.05e-03

Pos logit contributions
balls+0.032
ishly+0.030
hetically+0.030
 hips+0.029
ider+0.027
Neg logit contributions
aw-0.025
Kat-0.023
aman-0.020
 Sandra-0.020
 reality-0.020
Token saw
Token ID2497
Feature activation+1.60e-02

Pos logit contributions
balls+0.002
ishly+0.002
hetically+0.002
 hips+0.002
ider+0.002
Neg logit contributions
aw-0.002
Kat-0.002
 reality-0.002
aman-0.002
 Sandra-0.002
Token a
Token ID257
Feature activation-2.64e-02

Pos logit contributions
aw+0.036
Kat+0.035
land+0.034
aman+0.033
 reality+0.033
Neg logit contributions
balls-0.021
 first-0.019
ishly-0.018
hetically-0.018
 help-0.017
Token ch
Token ID442
Feature activation-9.43e-02

Pos logit contributions
balls+0.044
hetically+0.039
 hips+0.039
ishly+0.039
 first+0.037
Neg logit contributions
aw-0.054
Kat-0.050
 theirs-0.045
 Sandra-0.045
aman-0.044
Tokenubby
Token ID38393
Feature activation-3.75e-02

Pos logit contributions
balls+0.184
ishly+0.175
 hips+0.172
hetically+0.172
 first+0.150
Neg logit contributions
aw-0.178
Kat-0.146
 Sandra-0.135
 reality-0.134
ooks-0.133
Token squirrel
Token ID33039
Feature activation-1.88e-02

Pos logit contributions
balls+0.061
hetically+0.057
ishly+0.057
 hips+0.056
ider+0.051
Neg logit contributions
aw-0.077
Kat-0.071
 Sandra-0.065
 reality-0.065
aman-0.065
Token outside
Token ID2354
Feature activation-4.29e-02

Pos logit contributions
balls+0.033
 hips+0.033
hetically+0.031
ishly+0.030
ider+0.028
Neg logit contributions
aw-0.038
weet-0.030
Kat-0.030
land-0.028
ellery-0.028
Token her
Token ID607
Feature activation+3.31e-03

Pos logit contributions
balls+0.082
hetically+0.076
ishly+0.076
 hips+0.075
ider+0.069
Neg logit contributions
aw-0.077
Kat-0.071
 Sandra-0.063
 reality-0.063
aman-0.063
Token window
Token ID4324
Feature activation-2.42e-02

Pos logit contributions
aw+0.006
Kat+0.005
aman+0.005
 Sandra+0.004
 reality+0.004
Neg logit contributions
balls-0.007
ishly-0.006
hetically-0.006
 hips-0.006
ider-0.006
Token"
Token ID1
Feature activation-3.61e-02

Pos logit contributions
ishly+0.001
hetically+0.001
balls+0.001
 hips+0.001
 breaths+0.001
Neg logit contributions
aw-0.001
Kat-0.000
Pe-0.000
weet-0.000
Ab-0.000
TokenI
Token ID40
Feature activation-4.08e-03

Pos logit contributions
balls+0.079
hetically+0.075
ishly+0.075
 hips+0.074
ider+0.069
Neg logit contributions
aw-0.054
Kat-0.050
 Sandra-0.043
 reality-0.043
aman-0.042
Token am
Token ID716
Feature activation-3.46e-02

Pos logit contributions
balls+0.007
ishly+0.006
 hips+0.006
hetically+0.006
 first+0.006
Neg logit contributions
aw-0.008
Kat-0.008
aman-0.007
 Sandra-0.007
 reality-0.007
Token!"
Token ID2474
Feature activation-1.45e-02

Pos logit contributions
balls+0.061
ishly+0.056
 hips+0.055
 first+0.053
hetically+0.053
Neg logit contributions
aw-0.069
Kat-0.061
aman-0.058
land-0.056
 reality-0.055
Token replied
Token ID8712
Feature activation+6.21e-03

Pos logit contributions
balls+0.033
hetically+0.033
ider+0.032
 hips+0.032
ishly+0.032
Neg logit contributions
aw-0.019
Kat-0.015
 Sandra-0.014
weet-0.012
oph-0.012
Token Sp
Token ID1338
Feature activation-9.41e-02

Pos logit contributions
aw+0.005
 Sandra+0.004
Kat+0.004
 theirs+0.004
 Su+0.004
Neg logit contributions
hetically-0.016
ider-0.015
balls-0.015
 hips-0.015
ishly-0.015
Tokenud
Token ID463
Feature activation-1.90e-02

Pos logit contributions
balls+0.197
ishly+0.194
hetically+0.187
 hips+0.184
 onto+0.167
Neg logit contributions
aw-0.149
Kat-0.136
 reality-0.117
aman-0.117
land-0.114
Token.
Token ID13
Feature activation-2.42e-02

Pos logit contributions
hetically+0.030
balls+0.029
ishly+0.028
 hips+0.027
 stretches+0.026
Neg logit contributions
aw-0.034
Kat-0.032
 Sandra-0.030
 theirs-0.029
ull-0.029
Token "
Token ID366
Feature activation-4.84e-03

Pos logit contributions
hetically+0.056
ishly+0.052
 hips+0.049
 breath+0.049
balls+0.048
Neg logit contributions
aw-0.024
Kat-0.023
ooks-0.021
weet-0.020
See-0.019
TokenI
Token ID40
Feature activation+3.50e-03

Pos logit contributions
balls+0.011
ishly+0.011
hetically+0.010
 hips+0.010
ider+0.009
Neg logit contributions
aw-0.007
Kat-0.006
 reality-0.005
 Sandra-0.005
weet-0.005
Token just
Token ID655
Feature activation-3.73e-02

Pos logit contributions
aw+0.006
Kat+0.006
 reality+0.005
 Sandra+0.005
aman+0.005
Neg logit contributions
balls-0.007
hetically-0.006
ishly-0.006
 hips-0.006
ider-0.006
Token "
Token ID366
Feature activation-5.21e-03

Pos logit contributions
balls+0.030
hetically+0.028
ishly+0.028
 hips+0.027
ider+0.025
Neg logit contributions
aw-0.039
Kat-0.036
 Sandra-0.033
 reality-0.033
aman-0.033
TokenWhat
Token ID2061
Feature activation+1.98e-02

Pos logit contributions
balls+0.011
hetically+0.010
ishly+0.010
 hips+0.010
 first+0.009
Neg logit contributions
aw-0.008
Kat-0.007
 reality-0.007
 Sandra-0.007
aman-0.007
Token is
Token ID318
Feature activation-2.66e-03

Pos logit contributions
aman+0.043
Kat+0.043
aw+0.042
 reality+0.038
Pe+0.038
Neg logit contributions
balls-0.026
 first-0.025
 help-0.023
 hips-0.022
 maker-0.022
Token a
Token ID257
Feature activation-3.31e-02

Pos logit contributions
balls+0.005
ishly+0.004
 hips+0.004
hetically+0.004
 first+0.004
Neg logit contributions
aw-0.005
Kat-0.005
aman-0.004
 reality-0.004
oph-0.004
Token fire
Token ID2046
Feature activation-6.81e-02

Pos logit contributions
balls+0.057
ishly+0.050
 hips+0.050
hetically+0.049
 first+0.048
Neg logit contributions
aw-0.065
Kat-0.061
 Sandra-0.056
aman-0.055
 reality-0.054
Tokenwork
Token ID1818
Feature activation-9.40e-02

Pos logit contributions
hetically+0.119
balls+0.119
 hips+0.115
ishly+0.113
ider+0.100
Neg logit contributions
aw-0.126
Kat-0.108
 Sandra-0.105
aman-0.105
 theirs-0.102
Token?"
Token ID1701
Feature activation+3.73e-03

Pos logit contributions
balls+0.172
hetically+0.168
ishly+0.165
 hips+0.161
ider+0.146
Neg logit contributions
aw-0.172
Kat-0.157
 reality-0.143
 Sandra-0.141
aman-0.140
Token His
Token ID2399
Feature activation-1.32e-02

Pos logit contributions
aw+0.009
Kat+0.009
 reality+0.008
aman+0.008
land+0.008
Neg logit contributions
balls-0.004
 hips-0.004
ishly-0.004
hetically-0.004
 first-0.004
Token dad
Token ID9955
Feature activation+3.94e-05

Pos logit contributions
balls+0.027
ishly+0.026
hetically+0.026
 hips+0.025
ider+0.024
Neg logit contributions
aw-0.022
Kat-0.020
 Sandra-0.018
aman-0.017
 reality-0.017
Token said
Token ID531
Feature activation-1.20e-02

Pos logit contributions
aw+0.000
Kat+0.000
 Sandra+0.000
aman+0.000
 spoken+0.000
Neg logit contributions
balls-0.000
hetically-0.000
ishly-0.000
 hips-0.000
ider-0.000
Token,
Token ID11
Feature activation-1.88e-02

Pos logit contributions
balls+0.018
hetically+0.018
 hips+0.016
ishly+0.016
ider+0.016
Neg logit contributions
aw-0.025
Kat-0.022
 Sandra-0.021
 theirs-0.021
 spoken-0.020
Token didn
Token ID1422
Feature activation-4.38e-02

Pos logit contributions
balls+0.065
hetically+0.061
ishly+0.059
 hips+0.058
ider+0.056
Neg logit contributions
aw-0.057
Kat-0.051
 Sandra-0.045
aman-0.044
 reality-0.043
Token't
Token ID470
Feature activation-3.48e-02

Pos logit contributions
balls+0.059
hetically+0.057
ishly+0.054
 hips+0.052
ider+0.048
Neg logit contributions
aw-0.097
Kat-0.095
 Sandra-0.087
aman-0.087
 reality-0.086
Token understand
Token ID1833
Feature activation-1.09e-02

Pos logit contributions
balls+0.078
hetically+0.074
ishly+0.072
 hips+0.070
ider+0.067
Neg logit contributions
aw-0.052
Kat-0.049
 Sandra-0.044
 reality-0.042
 theirs-0.042
Token what
Token ID644
Feature activation+5.56e-03

Pos logit contributions
balls+0.023
ishly+0.023
 hips+0.022
hetically+0.022
 first+0.021
Neg logit contributions
aw-0.016
Kat-0.015
 reality-0.012
 Sandra-0.012
aman-0.012
Token "
Token ID366
Feature activation-1.98e-02

Pos logit contributions
aw+0.010
Kat+0.009
aman+0.008
 Sandra+0.008
 reality+0.008
Neg logit contributions
balls-0.010
ishly-0.010
hetically-0.010
 hips-0.009
 first-0.009
Tokenf
Token ID69
Feature activation-9.36e-02

Pos logit contributions
balls+0.039
hetically+0.037
ishly+0.037
 hips+0.036
ider+0.034
Neg logit contributions
aw-0.033
Kat-0.030
 Sandra-0.027
aman-0.027
 reality-0.027
Tokenierce
Token ID9798
Feature activation-3.01e-02

Pos logit contributions
balls+0.182
hetically+0.175
ishly+0.170
 hips+0.166
ider+0.150
Neg logit contributions
aw-0.171
Kat-0.154
 Sandra-0.139
land-0.138
aman-0.135
Token"
Token ID1
Feature activation-3.91e-02

Pos logit contributions
balls+0.056
hetically+0.054
 hips+0.053
ishly+0.051
ider+0.048
Neg logit contributions
aw-0.057
Kat-0.050
 Sandra-0.047
aman-0.046
land-0.045
Token meant
Token ID4001
Feature activation+7.95e-03

Pos logit contributions
balls+0.078
hetically+0.073
ishly+0.073
 hips+0.072
 first+0.067
Neg logit contributions
aw-0.064
Kat-0.059
 reality-0.053
aman-0.052
 Sandra-0.052
Token,
Token ID11
Feature activation-2.24e-02

Pos logit contributions
aw+0.016
Kat+0.015
 Sandra+0.015
land+0.014
aman+0.014
Neg logit contributions
balls-0.013
hetically-0.013
ishly-0.012
 hips-0.011
ider-0.011
Token but
Token ID475
Feature activation-5.53e-03

Pos logit contributions
balls+0.032
ishly+0.030
hetically+0.030
 hips+0.029
ider+0.026
Neg logit contributions
aw-0.050
Kat-0.047
 reality-0.043
 Sandra-0.043
aman-0.043
Token day
Token ID1110
Feature activation-4.07e-02

Pos logit contributions
balls+0.046
hetically+0.042
ishly+0.042
 hips+0.041
 first+0.038
Neg logit contributions
aw-0.056
Kat-0.053
 reality-0.048
aman-0.047
 Sandra-0.047
Token,
Token ID11
Feature activation-8.51e-03

Pos logit contributions
balls+0.046
ishly+0.041
hetically+0.041
 hips+0.040
ider+0.035
Neg logit contributions
aw-0.104
Kat-0.098
 reality-0.091
 Sandra-0.091
aman-0.091
Token Lily
Token ID20037
Feature activation+2.07e-03

Pos logit contributions
balls+0.017
ishly+0.015
hetically+0.015
 hips+0.015
 first+0.014
Neg logit contributions
aw-0.015
Kat-0.014
aman-0.012
 reality-0.012
 Sandra-0.012
Token saw
Token ID2497
Feature activation+1.72e-02

Pos logit contributions
aw+0.004
Kat+0.004
 theirs+0.004
 reality+0.004
aman+0.004
Neg logit contributions
balls-0.003
ishly-0.003
hetically-0.003
 hips-0.003
 first-0.002
Token a
Token ID257
Feature activation-2.50e-02

Pos logit contributions
aw+0.043
land+0.042
Kat+0.041
aman+0.040
 reality+0.040
Neg logit contributions
balls-0.017
 first-0.016
 help-0.015
ishly-0.014
hetically-0.014
Token ch
Token ID442
Feature activation-9.36e-02

Pos logit contributions
balls+0.040
ishly+0.036
hetically+0.036
 hips+0.036
 first+0.034
Neg logit contributions
aw-0.052
Kat-0.048
 Sandra-0.044
 theirs-0.044
aman-0.043
Tokenubby
Token ID38393
Feature activation-3.63e-02

Pos logit contributions
balls+0.171
ishly+0.159
 hips+0.158
hetically+0.158
 first+0.137
Neg logit contributions
aw-0.185
Kat-0.161
 reality-0.150
 Sandra-0.145
ooks-0.144
Token chicken
Token ID9015
Feature activation-2.83e-02

Pos logit contributions
balls+0.054
ishly+0.050
hetically+0.050
 hips+0.049
ider+0.044
Neg logit contributions
aw-0.078
Kat-0.074
 Sandra-0.067
 reality-0.067
aman-0.067
Token walking
Token ID6155
Feature activation-2.28e-02

Pos logit contributions
 hips+0.048
hetically+0.047
balls+0.047
ishly+0.044
ider+0.041
Neg logit contributions
aw-0.059
Kat-0.048
 reality-0.045
 spoken-0.043
 Sandra-0.043
Token in
Token ID287
Feature activation-1.75e-02

Pos logit contributions
balls+0.041
 hips+0.041
ider+0.038
hetically+0.038
ishly+0.037
Neg logit contributions
aw-0.044
 theirs-0.038
Kat-0.037
ull-0.036
weet-0.036
Token her
Token ID607
Feature activation+2.51e-02

Pos logit contributions
balls+0.026
 hips+0.024
hetically+0.024
 first+0.024
ishly+0.024
Neg logit contributions
aw-0.037
Kat-0.033
ige-0.031
aman-0.030
land-0.030
Token her
Token ID607
Feature activation-2.79e-02

Pos logit contributions
aw+0.022
Kat+0.020
 Sandra+0.018
 theirs+0.017
aman+0.017
Neg logit contributions
balls-0.032
hetically-0.031
 hips-0.030
ider-0.030
ishly-0.030
Token pet
Token ID4273
Feature activation-1.72e-02

Pos logit contributions
balls+0.051
hetically+0.050
ishly+0.048
 hips+0.048
ider+0.047
Neg logit contributions
aw-0.051
Kat-0.048
 Sandra-0.044
 reality-0.043
 theirs-0.042
Token,
Token ID11
Feature activation-1.37e-02

Pos logit contributions
balls+0.029
hetically+0.028
 hips+0.027
ider+0.026
ishly+0.025
Neg logit contributions
aw-0.036
Kat-0.030
 Sandra-0.030
aman-0.029
land-0.028
Token "
Token ID366
Feature activation-8.43e-03

Pos logit contributions
balls+0.024
hetically+0.023
ishly+0.022
 hips+0.021
ider+0.020
Neg logit contributions
aw-0.026
Kat-0.025
 Sandra-0.023
aman-0.022
 reality-0.022
TokenWould
Token ID17353
Feature activation+2.37e-02

Pos logit contributions
balls+0.020
hetically+0.019
 hips+0.019
ishly+0.019
ider+0.018
Neg logit contributions
aw-0.011
Kat-0.010
aman-0.008
 Sandra-0.008
 reality-0.008
Tokenn
Token ID77
Feature activation-9.36e-02

Pos logit contributions
aw+0.044
Kat+0.040
 Sandra+0.036
David+0.034
aman+0.034
Neg logit contributions
balls-0.043
hetically-0.043
ishly-0.041
 hips-0.041
ider-0.038
Token't
Token ID470
Feature activation-3.75e-02

Pos logit contributions
balls+0.131
hetically+0.127
 hips+0.124
ider+0.117
ishly+0.116
Neg logit contributions
aw-0.208
Kat-0.187
 Sandra-0.183
 theirs-0.175
weet-0.174
Token it
Token ID340
Feature activation-4.57e-02

Pos logit contributions
balls+0.068
hetically+0.064
ishly+0.063
 hips+0.062
ider+0.058
Neg logit contributions
aw-0.071
Kat-0.066
 Sandra-0.060
 reality-0.059
aman-0.059
Token be
Token ID307
Feature activation-2.30e-02

Pos logit contributions
hetically+0.082
balls+0.081
ishly+0.078
 hips+0.078
ider+0.071
Neg logit contributions
aw-0.085
Kat-0.075
 theirs-0.071
 Sandra-0.071
 reality-0.069
Token fun
Token ID1257
Feature activation-3.76e-02

Pos logit contributions
balls+0.049
hetically+0.047
ishly+0.046
 hips+0.045
ider+0.042
Neg logit contributions
aw-0.036
Kat-0.033
 Sandra-0.029
 reality-0.028
aman-0.028
Token to
Token ID284
Feature activation-6.78e-03

Pos logit contributions
hetically+0.076
balls+0.071
ishly+0.069
 hips+0.066
ider+0.060
Neg logit contributions
aw-0.063
Kat-0.058
land-0.053
reating-0.053
 theirs-0.052
Token he
Token ID339
Feature activation-1.03e-02

Pos logit contributions
hetically+0.016
balls+0.016
ishly+0.016
 hips+0.016
ider+0.015
Neg logit contributions
aw-0.017
Kat-0.016
 Sandra-0.014
 reality-0.014
aman-0.014
Token decided
Token ID3066
Feature activation+3.21e-02

Pos logit contributions
balls+0.020
ishly+0.019
hetically+0.019
 hips+0.019
 first+0.018
Neg logit contributions
aw-0.017
Kat-0.016
aman-0.014
 theirs-0.014
 Sandra-0.014
Token to
Token ID284
Feature activation-3.83e-03

Pos logit contributions
aw+0.079
Kat+0.075
aman+0.070
 reality+0.069
 Sandra+0.068
Neg logit contributions
balls-0.039
ishly-0.034
 hips-0.033
hetically-0.033
 first-0.030
Token put
Token ID1234
Feature activation-1.51e-02

Pos logit contributions
balls+0.007
hetically+0.007
ishly+0.006
 hips+0.006
ider+0.006
Neg logit contributions
aw-0.007
Kat-0.007
 reality-0.006
 Sandra-0.006
aman-0.006
Token his
Token ID465
Feature activation-3.45e-02

Pos logit contributions
ishly+0.030
hetically+0.030
ider+0.029
balls+0.029
 hips+0.027
Neg logit contributions
aw-0.026
 theirs-0.021
Kat-0.021
 Sandra-0.020
 reality-0.019
Token headphones
Token ID22537
Feature activation-9.34e-02

Pos logit contributions
hetically+0.066
ishly+0.064
balls+0.063
ider+0.061
hog+0.057
Neg logit contributions
aw-0.055
Kat-0.049
land-0.048
 reality-0.047
 marriage-0.045
Token on
Token ID319
Feature activation-1.06e-02

Pos logit contributions
balls+0.156
ishly+0.143
hetically+0.143
 hips+0.131
ider+0.126
Neg logit contributions
aw-0.177
land-0.160
Kat-0.159
 theirs-0.154
reating-0.151
Token and
Token ID290
Feature activation+2.37e-03

Pos logit contributions
balls+0.021
hetically+0.020
ishly+0.020
 hips+0.019
ider+0.018
Neg logit contributions
aw-0.018
Kat-0.017
 reality-0.015
aman-0.015
 Sandra-0.015
Token listen
Token ID6004
Feature activation+1.99e-02

Pos logit contributions
aw+0.004
Kat+0.004
 Sandra+0.004
 reality+0.004
 theirs+0.004
Neg logit contributions
balls-0.004
hetically-0.004
ishly-0.004
ider-0.004
 hips-0.004
Token to
Token ID284
Feature activation+2.37e-02

Pos logit contributions
aw+0.038
Kat+0.036
land+0.034
 reality+0.033
 theirs+0.033
Neg logit contributions
hetically-0.033
ishly-0.033
balls-0.033
 hips-0.029
ider-0.028
Token some
Token ID617
Feature activation+1.68e-02

Pos logit contributions
aw+0.042
Kat+0.039
 Sandra+0.034
aman+0.034
 reality+0.034
Neg logit contributions
balls-0.046
hetically-0.043
ishly-0.043
 hips-0.042
ider-0.039
Token't
Token ID470
Feature activation-3.18e-02

Pos logit contributions
balls+0.051
hetically+0.050
ishly+0.049
 hips+0.046
ider+0.042
Neg logit contributions
aw-0.089
Kat-0.087
 Sandra-0.079
aman-0.079
 reality-0.079
Token want
Token ID765
Feature activation-7.46e-03

Pos logit contributions
balls+0.068
hetically+0.065
ishly+0.064
 hips+0.061
ider+0.058
Neg logit contributions
aw-0.049
Kat-0.047
 Sandra-0.042
 reality-0.041
 theirs-0.040
Token to
Token ID284
Feature activation-1.98e-02

Pos logit contributions
balls+0.010
ishly+0.009
hetically+0.009
 hips+0.009
 first+0.008
Neg logit contributions
aw-0.017
Kat-0.016
aman-0.015
 reality-0.015
 Sandra-0.015
Token remove
Token ID4781
Feature activation-5.92e-03

Pos logit contributions
balls+0.032
ishly+0.030
hetically+0.030
 hips+0.029
ider+0.027
Neg logit contributions
aw-0.041
Kat-0.038
 Sandra-0.035
 reality-0.034
aman-0.034
Token her
Token ID607
Feature activation-3.44e-02

Pos logit contributions
balls+0.013
hetically+0.013
ishly+0.013
ider+0.013
 hips+0.012
Neg logit contributions
aw-0.008
Kat-0.007
 theirs-0.007
 Sandra-0.006
 reality-0.006
Token headphones
Token ID22537
Feature activation-9.29e-02

Pos logit contributions
ishly+0.066
hetically+0.066
balls+0.065
ider+0.060
hog+0.058
Neg logit contributions
aw-0.057
land-0.050
Kat-0.048
aman-0.046
 reality-0.045
Token because
Token ID780
Feature activation-1.14e-02

Pos logit contributions
balls+0.174
hetically+0.153
ishly+0.150
ider+0.149
 hips+0.138
Neg logit contributions
aw-0.160
land-0.146
Kat-0.145
reating-0.144
aman-0.142
Token she
Token ID673
Feature activation+1.41e-02

Pos logit contributions
balls+0.022
hetically+0.021
ishly+0.021
 hips+0.020
 first+0.019
Neg logit contributions
aw-0.020
Kat-0.018
aman-0.016
 reality-0.016
 Sandra-0.016
Token loved
Token ID6151
Feature activation+2.83e-03

Pos logit contributions
aw+0.021
Kat+0.019
 reality+0.016
 Sandra+0.016
aman+0.016
Neg logit contributions
balls-0.031
ishly-0.029
hetically-0.029
 hips-0.028
ider-0.027
Token the
Token ID262
Feature activation+9.54e-03

Pos logit contributions
aw+0.005
Kat+0.004
 Sandra+0.004
aman+0.004
 reality+0.004
Neg logit contributions
balls-0.006
hetically-0.005
ishly-0.005
 hips-0.005
ider-0.005
Token music
Token ID2647
Feature activation-3.00e-03

Pos logit contributions
aw+0.014
Kat+0.013
land+0.012
aman+0.012
 reality+0.012
Neg logit contributions
balls-0.021
ishly-0.020
hetically-0.020
 hips-0.019
ider-0.019
Token,
Token ID11
Feature activation-1.67e-02

Pos logit contributions
balls+0.008
hetically+0.008
ishly+0.008
 hips+0.007
ider+0.007
Neg logit contributions
aw-0.016
Kat-0.015
 Sandra-0.014
 reality-0.014
aman-0.013
Token "
Token ID366
Feature activation-1.10e-02

Pos logit contributions
balls+0.030
hetically+0.029
ishly+0.028
 hips+0.027
ider+0.026
Neg logit contributions
aw-0.029
Kat-0.028
 Sandra-0.025
aman-0.025
 theirs-0.024
TokenI
Token ID40
Feature activation+8.33e-03

Pos logit contributions
balls+0.021
hetically+0.019
ishly+0.019
 hips+0.018
ider+0.018
Neg logit contributions
aw-0.019
Kat-0.017
 reality-0.016
 Sandra-0.016
aman-0.016
Token am
Token ID716
Feature activation-2.23e-02

Pos logit contributions
aw+0.015
Kat+0.014
 Sandra+0.013
 reality+0.013
aman+0.013
Neg logit contributions
balls-0.016
hetically-0.015
ishly-0.015
 hips-0.014
ider-0.013
Token a
Token ID257
Feature activation-4.03e-02

Pos logit contributions
balls+0.043
hetically+0.041
ishly+0.040
 hips+0.039
ider+0.037
Neg logit contributions
aw-0.039
Kat-0.036
 Sandra-0.032
 reality-0.032
aman-0.032
Token night
Token ID1755
Feature activation-9.29e-02

Pos logit contributions
balls+0.073
ishly+0.068
hetically+0.068
 hips+0.066
ider+0.061
Neg logit contributions
aw-0.075
Kat-0.070
 Sandra-0.063
aman-0.062
 reality-0.062
Token owl
Token ID39610
Feature activation-3.85e-02

Pos logit contributions
balls+0.144
hetically+0.134
ishly+0.133
 hips+0.128
ider+0.115
Neg logit contributions
aw-0.198
Kat-0.187
 reality-0.172
 Sandra-0.171
aman-0.169
Token,
Token ID11
Feature activation-5.81e-03

Pos logit contributions
balls+0.068
hetically+0.064
 hips+0.062
ishly+0.060
ider+0.054
Neg logit contributions
aw-0.077
Kat-0.067
land-0.064
 reality-0.064
weet-0.063
Token which
Token ID543
Feature activation+1.65e-02

Pos logit contributions
balls+0.009
hetically+0.009
ishly+0.008
 hips+0.008
ider+0.008
Neg logit contributions
aw-0.012
Kat-0.011
 Sandra-0.010
 reality-0.010
aman-0.010
Token means
Token ID1724
Feature activation+3.27e-03

Pos logit contributions
aw+0.030
Kat+0.026
 reality+0.024
 Sandra+0.024
weet+0.023
Neg logit contributions
balls-0.032
hetically-0.030
 hips-0.030
ishly-0.030
ider-0.028
Token I
Token ID314
Feature activation+1.42e-02

Pos logit contributions
aw+0.006
Kat+0.006
 reality+0.005
 Sandra+0.005
aman+0.005
Neg logit contributions
balls-0.006
hetically-0.006
ishly-0.005
 hips-0.005
ider-0.005
Token
Token ID198
Feature activation+7.73e-03

Pos logit contributions
aw+0.061
ige+0.055
Kat+0.054
 reality+0.054
 Sandra+0.054
Neg logit contributions
balls-0.032
 first-0.029
 hips-0.027
hetically-0.026
 help-0.023
TokenBut
Token ID1537
Feature activation-2.36e-02

Pos logit contributions
aw+0.013
Kat+0.012
 Sandra+0.011
 reality+0.011
 theirs+0.011
Neg logit contributions
balls-0.015
hetically-0.014
ishly-0.014
 hips-0.014
ider-0.013
Token when
Token ID618
Feature activation-4.57e-02

Pos logit contributions
balls+0.033
ishly+0.030
hetically+0.030
 hips+0.029
 first+0.026
Neg logit contributions
aw-0.054
Kat-0.052
 reality-0.047
 Sandra-0.046
aman-0.046
Token it
Token ID340
Feature activation-1.76e-02

Pos logit contributions
balls+0.062
 first+0.054
ishly+0.051
 hips+0.051
hetically+0.050
Neg logit contributions
aw-0.110
Kat-0.104
onda-0.093
ige-0.093
aman-0.092
Token is
Token ID318
Feature activation-1.57e-02

Pos logit contributions
ishly+0.039
balls+0.039
hetically+0.038
 hips+0.037
ider+0.033
Neg logit contributions
aw-0.025
Kat-0.019
 happened-0.018
 reality-0.018
 CA-0.018
Token night
Token ID1755
Feature activation-9.25e-02

Pos logit contributions
balls+0.033
 hips+0.031
hetically+0.031
ishly+0.030
 first+0.030
Neg logit contributions
aw-0.025
Kat-0.023
aman-0.019
 Sandra-0.019
 reality-0.019
Token,
Token ID11
Feature activation-1.25e-02

Pos logit contributions
balls+0.140
hetically+0.136
ishly+0.133
 hips+0.132
ider+0.116
Neg logit contributions
aw-0.191
Kat-0.177
aman-0.171
land-0.168
 Sandra-0.167
Token L
Token ID406
Feature activation-5.83e-03

Pos logit contributions
balls+0.022
ishly+0.020
hetically+0.020
 hips+0.020
 first+0.019
Neg logit contributions
aw-0.023
Kat-0.022
aman-0.020
onda-0.019
weet-0.019
Tokenila
Token ID10102
Feature activation-2.68e-03

Pos logit contributions
balls+0.012
 hips+0.011
hetically+0.011
ishly+0.011
ider+0.011
Neg logit contributions
aw-0.008
Kat-0.008
 reality-0.008
aman-0.007
 affordable-0.007
Token does
Token ID857
Feature activation-6.73e-03

Pos logit contributions
balls+0.005
hetically+0.005
ishly+0.005
 hips+0.005
ider+0.004
Neg logit contributions
aw-0.005
Kat-0.004
 Sandra-0.004
 reality-0.004
aman-0.004
Token not
Token ID407
Feature activation-1.18e-03

Pos logit contributions
balls+0.014
hetically+0.013
ishly+0.013
 hips+0.013
ider+0.012
Neg logit contributions
aw-0.011
Kat-0.010
 Sandra-0.009
aman-0.009
 reality-0.009
Token"
Token ID1
Feature activation-4.81e-02

Pos logit contributions
balls+0.015
hetically+0.015
ishly+0.015
 hips+0.014
ider+0.013
Neg logit contributions
aw-0.013
Kat-0.013
 Sandra-0.011
David-0.011
aman-0.011
TokenThat
Token ID2504
Feature activation-1.90e-02

Pos logit contributions
balls+0.110
ishly+0.107
hetically+0.106
 hips+0.105
ider+0.096
Neg logit contributions
aw-0.067
Kat-0.061
 Sandra-0.051
 reality-0.050
aman-0.050
Token is
Token ID318
Feature activation-1.04e-02

Pos logit contributions
balls+0.032
ishly+0.029
hetically+0.029
 hips+0.028
ider+0.026
Neg logit contributions
aw-0.038
Kat-0.036
 Sandra-0.032
aman-0.032
 reality-0.032
Token a
Token ID257
Feature activation-2.90e-02

Pos logit contributions
balls+0.018
ishly+0.016
 hips+0.016
 first+0.015
hetically+0.015
Neg logit contributions
aw-0.021
Kat-0.020
aman-0.018
oph-0.018
 reality-0.018
Token fire
Token ID2046
Feature activation-6.33e-02

Pos logit contributions
balls+0.056
ishly+0.051
 hips+0.050
hetically+0.050
 first+0.047
Neg logit contributions
aw-0.052
Kat-0.047
 Sandra-0.043
aman-0.043
 theirs-0.041
Tokenwork
Token ID1818
Feature activation-9.24e-02

Pos logit contributions
hetically+0.118
 hips+0.117
balls+0.115
ishly+0.111
 stretches+0.100
Neg logit contributions
aw-0.103
ruck-0.088
ull-0.087
land-0.086
Kat-0.085
Token that
Token ID326
Feature activation-1.43e-02

Pos logit contributions
hetically+0.158
balls+0.154
ishly+0.152
 hips+0.148
ider+0.131
Neg logit contributions
aw-0.180
Kat-0.161
 reality-0.156
land-0.150
 marriage-0.146
Token is
Token ID318
Feature activation+7.16e-05

Pos logit contributions
balls+0.028
hetically+0.027
ishly+0.027
 hips+0.026
ider+0.024
Neg logit contributions
aw-0.024
Kat-0.022
 reality-0.020
 Sandra-0.020
aman-0.019
Token not
Token ID407
Feature activation-7.47e-03

Pos logit contributions
aw+0.000
Kat+0.000
aman+0.000
oph+0.000
 Sandra+0.000
Neg logit contributions
balls-0.000
 hips-0.000
 first-0.000
ishly-0.000
hetically-0.000
Token working
Token ID1762
Feature activation-5.26e-02

Pos logit contributions
balls+0.011
 first+0.010
 hips+0.009
 help+0.009
 onto+0.009
Neg logit contributions
aw-0.016
Kat-0.016
oph-0.015
aman-0.015
David-0.014
Token right
Token ID826
Feature activation+7.79e-04

Pos logit contributions
balls+0.081
ishly+0.080
hetically+0.080
 hips+0.074
ider+0.067
Neg logit contributions
aw-0.111
Kat-0.101
 reality-0.095
 theirs-0.095
land-0.094
Token be
Token ID307
Feature activation+8.61e-03

Pos logit contributions
aw+0.091
Kat+0.084
 reality+0.075
aman+0.074
 Sandra+0.074
Neg logit contributions
balls-0.089
ishly-0.084
 hips-0.082
hetically-0.082
 first-0.076
Token kind
Token ID1611
Feature activation-1.12e-02

Pos logit contributions
Kat+0.016
aw+0.016
aman+0.016
oph+0.015
 reality+0.014
Neg logit contributions
balls-0.015
ishly-0.014
 hips-0.014
hetically-0.013
 first-0.013
Token and
Token ID290
Feature activation-1.64e-02

Pos logit contributions
balls+0.020
hetically+0.020
ishly+0.019
 hips+0.018
ider+0.016
Neg logit contributions
aw-0.023
Kat-0.020
 theirs-0.018
land-0.018
 Sandra-0.018
Token respectful
Token ID25923
Feature activation+2.30e-02

Pos logit contributions
balls+0.035
hetically+0.035
ishly+0.035
 hips+0.033
ider+0.031
Neg logit contributions
aw-0.023
Kat-0.021
 theirs-0.019
 Sandra-0.019
 reality-0.019
Token to
Token ID284
Feature activation+3.17e-02

Pos logit contributions
aw+0.043
Kat+0.036
 theirs+0.035
reating+0.034
land+0.034
Neg logit contributions
hetically-0.044
ishly-0.043
balls-0.041
 hips-0.040
ider-0.035
Token others
Token ID1854
Feature activation+7.37e-02

Pos logit contributions
aw+0.058
Kat+0.058
aman+0.054
oph+0.052
 Sandra+0.050
Neg logit contributions
balls-0.056
 first-0.051
ishly-0.050
hetically-0.050
 hips-0.049
Token.
Token ID13
Feature activation+2.35e-02

Pos logit contributions
aw+0.142
Kat+0.125
land+0.117
 reality+0.117
 theirs+0.115
Neg logit contributions
balls-0.130
hetically-0.127
ishly-0.121
 hips-0.117
ider-0.109
Token Do
Token ID2141
Feature activation+5.61e-02

Pos logit contributions
aw+0.054
Kat+0.051
aman+0.048
 reality+0.048
land+0.047
Neg logit contributions
balls-0.031
ishly-0.028
 hips-0.028
hetically-0.027
ider-0.025
Token you
Token ID345
Feature activation-1.85e-02

Pos logit contributions
aw+0.122
Kat+0.119
aman+0.109
 reality+0.108
 Sandra+0.105
Neg logit contributions
balls-0.085
ishly-0.074
 hips-0.073
hetically-0.072
 first-0.071
Token understand
Token ID1833
Feature activation+6.97e-03

Pos logit contributions
balls+0.037
 hips+0.036
ishly+0.035
hetically+0.035
 first+0.034
Neg logit contributions
aw-0.029
Kat-0.027
aman-0.025
 Sandra-0.024
 reality-0.023
Token?"
Token ID1701
Feature activation+1.27e-02

Pos logit contributions
Kat+0.014
aw+0.014
aman+0.012
 reality+0.012
 Sandra+0.012
Neg logit contributions
balls-0.011
 first-0.011
 hips-0.010
ishly-0.010
 help-0.009
Tokenl
Token ID75
Feature activation+5.80e-03

Pos logit contributions
aw+0.127
Kat+0.123
 reality+0.113
aman+0.112
 Sandra+0.112
Neg logit contributions
balls-0.078
ishly-0.069
hetically-0.069
 hips-0.068
ider-0.059
Tokenaces
Token ID2114
Feature activation+2.94e-02

Pos logit contributions
aw+0.014
land+0.011
Kat+0.011
ige+0.011
weet+0.011
Neg logit contributions
balls-0.008
hetically-0.008
ishly-0.008
 hips-0.007
 first-0.007
Token,
Token ID11
Feature activation+4.37e-02

Pos logit contributions
aw+0.062
Kat+0.054
land+0.053
 theirs+0.052
 Sandra+0.050
Neg logit contributions
balls-0.045
hetically-0.044
ishly-0.043
 hips-0.040
ider-0.036
Token and
Token ID290
Feature activation+3.83e-02

Pos logit contributions
aw+0.091
Kat+0.088
aman+0.080
 reality+0.079
 Sandra+0.078
Neg logit contributions
balls-0.069
hetically-0.063
 hips-0.063
ishly-0.063
 first-0.055
Token ear
Token ID1027
Feature activation+6.10e-02

Pos logit contributions
aw+0.067
Kat+0.062
 Sandra+0.055
 reality+0.055
aman+0.055
Neg logit contributions
balls-0.074
hetically-0.069
ishly-0.069
 hips-0.068
ider-0.063
Tokenrings
Token ID33173
Feature activation+7.30e-02

Pos logit contributions
aw+0.106
Kat+0.100
aman+0.088
 Sandra+0.088
 reality+0.088
Neg logit contributions
balls-0.117
ishly-0.109
hetically-0.109
 hips-0.107
ider-0.099
Token.
Token ID13
Feature activation+9.76e-03

Pos logit contributions
aw+0.162
Kat+0.146
land+0.137
 theirs+0.136
 Sandra+0.135
Neg logit contributions
balls-0.104
hetically-0.103
ishly-0.101
 hips-0.096
ider-0.085
Token They
Token ID1119
Feature activation+7.39e-04

Pos logit contributions
aw+0.020
Kat+0.020
aman+0.019
 reality+0.019
onda+0.018
Neg logit contributions
balls-0.015
 hips-0.013
ishly-0.013
hetically-0.013
ider-0.012
Token are
Token ID389
Feature activation-2.90e-03

Pos logit contributions
aw+0.001
Kat+0.001
 Sandra+0.001
 reality+0.001
 theirs+0.001
Neg logit contributions
balls-0.002
hetically-0.001
ishly-0.001
ider-0.001
 hips-0.001
Token happy
Token ID3772
Feature activation-1.60e-02

Pos logit contributions
balls+0.005
 hips+0.005
ishly+0.004
hetically+0.004
 first+0.004
Neg logit contributions
aw-0.006
Kat-0.005
aman-0.005
 reality-0.005
oph-0.005
Token.
Token ID13
Feature activation+3.61e-03

Pos logit contributions
balls+0.023
hetically+0.022
ishly+0.022
 hips+0.021
ider+0.018
Neg logit contributions
aw-0.036
Kat-0.034
 reality-0.031
 Sandra-0.031
 theirs-0.031
Token and
Token ID290
Feature activation+1.82e-02

Pos logit contributions
aw+0.013
Kat+0.010
 reality+0.010
land+0.010
 seen+0.010
Neg logit contributions
balls-0.012
ishly-0.011
hetically-0.011
 hips-0.011
 Carlos-0.010
Token couldn
Token ID3521
Feature activation-1.30e-04

Pos logit contributions
aw+0.035
Kat+0.033
aman+0.030
land+0.029
 Sandra+0.028
Neg logit contributions
balls-0.033
 hips-0.030
hetically-0.030
ishly-0.030
 first-0.028
Token't
Token ID470
Feature activation+2.81e-04

Pos logit contributions
balls+0.000
hetically+0.000
ishly+0.000
 hips+0.000
ider+0.000
Neg logit contributions
aw-0.000
Kat-0.000
 Sandra-0.000
aman-0.000
 reality-0.000
Token find
Token ID1064
Feature activation+9.26e-03

Pos logit contributions
aw+0.001
Kat+0.000
 reality+0.000
 Sandra+0.000
 theirs+0.000
Neg logit contributions
balls-0.001
ishly-0.000
hetically-0.000
 hips-0.000
ider-0.000
Token it
Token ID340
Feature activation-3.33e-03

Pos logit contributions
aw+0.014
Kat+0.013
aman+0.012
land+0.011
David+0.011
Neg logit contributions
balls-0.019
 hips-0.018
ishly-0.018
hetically-0.018
 first-0.018
Token anywhere
Token ID6609
Feature activation+8.01e-02

Pos logit contributions
balls+0.005
ishly+0.005
hetically+0.005
 hips+0.005
 first+0.004
Neg logit contributions
aw-0.007
Kat-0.007
aman-0.006
 Sandra-0.006
 reality-0.006
Token.
Token ID13
Feature activation+4.41e-03

Pos logit contributions
aw+0.208
Kat+0.197
aman+0.186
 Sandra+0.183
 reality+0.180
Neg logit contributions
balls-0.090
 hips-0.076
hetically-0.075
ishly-0.075
ider-0.066
Token She
Token ID1375
Feature activation-4.46e-03

Pos logit contributions
aw+0.009
Kat+0.009
 reality+0.008
 Sandra+0.008
aman+0.008
Neg logit contributions
balls-0.007
hetically-0.007
ishly-0.007
 hips-0.006
ider-0.006
Token looked
Token ID3114
Feature activation+1.37e-02

Pos logit contributions
balls+0.007
ishly+0.007
hetically+0.007
 hips+0.007
 first+0.006
Neg logit contributions
aw-0.009
Kat-0.008
aman-0.008
 Sandra-0.008
 reality-0.008
Token under
Token ID739
Feature activation-4.81e-03

Pos logit contributions
aw+0.028
Kat+0.026
 theirs+0.024
 reality+0.024
 Sandra+0.024
Neg logit contributions
balls-0.022
hetically-0.021
ishly-0.020
 hips-0.019
ider-0.019
Token the
Token ID262
Feature activation+1.07e-02

Pos logit contributions
balls+0.008
hetically+0.008
 hips+0.008
ishly+0.008
 first+0.007
Neg logit contributions
aw-0.009
Kat-0.008
aman-0.008
 reality-0.008
land-0.007
Token wore
Token ID12408
Feature activation+2.90e-03

Pos logit contributions
balls+0.015
hetically+0.014
ishly+0.014
 hips+0.014
ider+0.013
Neg logit contributions
aw-0.011
Kat-0.010
 reality-0.009
 Sandra-0.009
 spoken-0.008
Token funny
Token ID8258
Feature activation+2.04e-02

Pos logit contributions
aw+0.005
Kat+0.005
aman+0.004
 reality+0.004
 Sandra+0.004
Neg logit contributions
balls-0.006
 hips-0.005
ishly-0.005
hetically-0.005
 first-0.005
Token hats
Token ID23910
Feature activation+4.44e-03

Pos logit contributions
aw+0.036
Kat+0.032
 Sandra+0.028
land+0.028
 reality+0.027
Neg logit contributions
balls-0.040
ishly-0.039
hetically-0.039
 hips-0.036
ider-0.035
Token and
Token ID290
Feature activation+6.34e-03

Pos logit contributions
aw+0.008
land+0.007
Kat+0.007
 theirs+0.007
 Sandra+0.007
Neg logit contributions
hetically-0.008
balls-0.008
 hips-0.007
ishly-0.007
ider-0.007
Token scar
Token ID10153
Feature activation+3.59e-02

Pos logit contributions
aw+0.012
Kat+0.011
 Sandra+0.010
 reality+0.010
 theirs+0.010
Neg logit contributions
balls-0.011
hetically-0.011
ishly-0.010
 hips-0.010
ider-0.009
Tokenves
Token ID1158
Feature activation+7.22e-02

Pos logit contributions
aw+0.040
Kat+0.034
 theirs+0.027
land+0.027
 Sandra+0.027
Neg logit contributions
balls-0.090
ishly-0.089
hetically-0.088
 hips-0.086
ider-0.082
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
aw+0.150
 theirs+0.128
land+0.127
Kat+0.126
 Sandra+0.120
Neg logit contributions
hetically-0.112
balls-0.111
ishly-0.105
 hips-0.103
ider-0.100
Token They
Token ID1119
Feature activation+1.89e-03

Pos logit contributions
aw+0.023
Kat+0.021
 Sandra+0.019
 theirs+0.018
aman+0.018
Neg logit contributions
balls-0.024
hetically-0.024
ishly-0.024
 hips-0.022
ider-0.021
Token climbed
Token ID19952
Feature activation+5.64e-03

Pos logit contributions
aw+0.003
Kat+0.003
 reality+0.002
 Sandra+0.002
 theirs+0.002
Neg logit contributions
balls-0.004
hetically-0.004
ishly-0.004
 hips-0.004
ider-0.004
Token up
Token ID510
Feature activation-1.77e-02

Pos logit contributions
aw+0.013
Kat+0.011
 Sandra+0.011
 theirs+0.011
aman+0.011
Neg logit contributions
hetically-0.008
balls-0.008
ishly-0.008
 hips-0.007
ider-0.007
Token the
Token ID262
Feature activation-5.72e-03

Pos logit contributions
balls+0.036
ishly+0.034
hetically+0.034
 hips+0.033
ider+0.032
Neg logit contributions
aw-0.029
Kat-0.027
 Sandra-0.024
 reality-0.024
aman-0.024
Token slide
Token ID10649
Feature activation+4.61e-02

Pos logit contributions
aw+0.057
Kat+0.052
 Sandra+0.047
 reality+0.047
aman+0.046
Neg logit contributions
balls-0.059
ishly-0.056
hetically-0.056
 hips-0.055
ider-0.051
Token,
Token ID11
Feature activation+3.81e-02

Pos logit contributions
aw+0.107
Kat+0.094
aman+0.089
 Sandra+0.088
 theirs+0.087
Neg logit contributions
balls-0.064
 hips-0.059
hetically-0.059
ishly-0.059
ider-0.053
Token and
Token ID290
Feature activation+3.17e-02

Pos logit contributions
aw+0.078
Kat+0.070
 Sandra+0.067
 theirs+0.066
aman+0.065
Neg logit contributions
balls-0.061
hetically-0.057
ishly-0.057
 hips-0.056
ider-0.054
Token the
Token ID262
Feature activation+4.28e-02

Pos logit contributions
aw+0.054
Kat+0.050
 Sandra+0.045
 reality+0.044
 theirs+0.044
Neg logit contributions
balls-0.062
hetically-0.059
ishly-0.058
 hips-0.057
ider-0.054
Token sees
Token ID7224
Feature activation+4.04e-02

Pos logit contributions
aw+0.074
Kat+0.067
 reality+0.061
 Sandra+0.060
aman+0.059
Neg logit contributions
balls-0.082
ishly-0.079
hetically-0.079
 hips-0.077
ider-0.072
Tokenaw
Token ID707
Feature activation+7.82e-02

Pos logit contributions
Kat+0.036
aw+0.030
aman+0.029
 reality+0.027
 Sandra+0.027
Neg logit contributions
balls-0.110
ishly-0.105
hetically-0.103
 hips-0.100
 first-0.099
Token.
Token ID13
Feature activation+7.03e-03

Pos logit contributions
aw+0.184
Kat+0.171
 Sandra+0.158
aman+0.157
 reality+0.157
Neg logit contributions
balls-0.104
hetically-0.096
ishly-0.095
 hips-0.093
ider-0.082
Token They
Token ID1119
Feature activation+2.91e-03

Pos logit contributions
aw+0.016
Kat+0.014
 reality+0.014
aman+0.014
 spoken+0.013
Neg logit contributions
balls-0.010
ishly-0.009
 hips-0.009
hetically-0.008
ider-0.008
Token laughed
Token ID13818
Feature activation-1.20e-02

Pos logit contributions
aw+0.005
Kat+0.004
 Sandra+0.004
 reality+0.004
aman+0.004
Neg logit contributions
balls-0.006
hetically-0.006
ishly-0.006
 hips-0.006
ider-0.005
Token and
Token ID290
Feature activation+3.18e-03

Pos logit contributions
balls+0.021
hetically+0.021
 hips+0.020
ishly+0.020
ider+0.019
Neg logit contributions
aw-0.024
Kat-0.020
 theirs-0.019
 reality-0.018
land-0.018
Token smiled
Token ID13541
Feature activation+6.12e-03

Pos logit contributions
aw+0.006
Kat+0.006
 Sandra+0.005
 reality+0.005
aman+0.005
Neg logit contributions
balls-0.006
hetically-0.005
ishly-0.005
 hips-0.005
ider-0.005
Token climb
Token ID12080
Feature activation+7.47e-03

Pos logit contributions
aw+0.027
Kat+0.025
 theirs+0.024
 Sandra+0.024
 reality+0.024
Neg logit contributions
balls-0.029
hetically-0.028
ider-0.026
ishly-0.026
 hips-0.024
Token up
Token ID510
Feature activation-1.24e-03

Pos logit contributions
aw+0.015
Kat+0.013
 theirs+0.012
 Sandra+0.012
 reality+0.012
Neg logit contributions
balls-0.013
hetically-0.013
ishly-0.012
 hips-0.012
ider-0.012
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
balls+0.002
ishly+0.002
hetically+0.002
 hips+0.002
ider+0.002
Neg logit contributions
aw-0.002
Kat-0.002
 theirs-0.002
 Sandra-0.002
 reality-0.002
Token down
Token ID866
Feature activation+2.69e-02

Pos logit contributions
aw+0.018
 Sandra+0.017
Kat+0.016
 theirs+0.015
 seeing+0.015
Neg logit contributions
balls-0.019
ider-0.018
hetically-0.018
 hips-0.017
ishly-0.017
Token the
Token ID262
Feature activation+3.65e-02

Pos logit contributions
aw+0.048
Kat+0.040
 reality+0.039
 theirs+0.038
 Sandra+0.037
Neg logit contributions
balls-0.051
hetically-0.050
ishly-0.050
 hips-0.049
ider-0.045
Token tree
Token ID5509
Feature activation+3.29e-02

Pos logit contributions
aw+0.056
Kat+0.053
weet+0.048
 desert+0.047
str+0.046
Neg logit contributions
balls-0.075
ishly-0.073
hetically-0.070
ider-0.069
 hips-0.068
Token,
Token ID11
Feature activation+3.80e-02

Pos logit contributions
aw+0.064
land+0.052
Kat+0.050
 reality+0.050
 theirs+0.049
Neg logit contributions
balls-0.054
hetically-0.051
ishly-0.050
 hips-0.049
ider-0.047
Token while
Token ID981
Feature activation+8.07e-03

Pos logit contributions
aw+0.075
Kat+0.067
 theirs+0.065
 Sandra+0.064
 reality+0.064
Neg logit contributions
balls-0.062
hetically-0.060
ider-0.057
 hips-0.057
ishly-0.057
Token the
Token ID262
Feature activation+1.45e-02

Pos logit contributions
aw+0.015
Kat+0.014
aman+0.013
 Sandra+0.013
 reality+0.013
Neg logit contributions
balls-0.015
ishly-0.013
hetically-0.013
 hips-0.013
 first-0.012
Token bunny
Token ID44915
Feature activation+6.50e-03

Pos logit contributions
aw+0.023
Kat+0.019
 reality+0.018
Pe+0.018
aman+0.017
Neg logit contributions
balls-0.030
 hips-0.029
ishly-0.029
hetically-0.029
ider-0.027
Token liked
Token ID8288
Feature activation+4.60e-03

Pos logit contributions
aw+0.011
Kat+0.009
 Sandra+0.008
aman+0.008
land+0.008
Neg logit contributions
balls-0.013
hetically-0.013
 hips-0.013
ishly-0.012
ider-0.012
Token the
Token ID262
Feature activation-5.14e-03

Pos logit contributions
balls+0.039
hetically+0.038
ishly+0.038
 hips+0.037
 first+0.035
Neg logit contributions
aw-0.044
Kat-0.040
land-0.035
 reality-0.035
 Sandra-0.035
Token bath
Token ID7837
Feature activation-2.56e-02

Pos logit contributions
hetically+0.010
balls+0.010
ishly+0.010
 hips+0.010
ider+0.009
Neg logit contributions
aw-0.008
Kat-0.007
 reality-0.007
land-0.006
 desert-0.006
Tokentub
Token ID37995
Feature activation-3.07e-02

Pos logit contributions
hetically+0.048
ishly+0.047
balls+0.046
 hips+0.045
ider+0.041
Neg logit contributions
aw-0.044
Kat-0.039
 reality-0.036
land-0.036
ooks-0.034
Token!
Token ID0
Feature activation+4.58e-03

Pos logit contributions
balls+0.051
hetically+0.049
ishly+0.046
 hips+0.046
ider+0.044
Neg logit contributions
aw-0.063
Kat-0.055
land-0.052
aman-0.051
 Sandra-0.050
Token
Token ID220
Feature activation+4.91e-03

Pos logit contributions
aw+0.008
Kat+0.007
 Sandra+0.006
aman+0.006
 reality+0.006
Neg logit contributions
balls-0.009
hetically-0.009
ishly-0.009
 hips-0.009
ider-0.008
Token
Token ID198
Feature activation+1.41e-02

Pos logit contributions
aw+0.010
Kat+0.009
 reality+0.009
aman+0.009
 Sandra+0.009
Neg logit contributions
balls-0.008
hetically-0.007
ishly-0.007
 hips-0.007
 first-0.006
Token
Token ID198
Feature activation-1.45e-03

Pos logit contributions
aw+0.027
Kat+0.025
 reality+0.023
aman+0.022
 Sandra+0.022
Neg logit contributions
balls-0.025
hetically-0.023
ishly-0.023
 hips-0.023
ider-0.021
TokenThe
Token ID464
Feature activation-1.87e-02

Pos logit contributions
balls+0.004
hetically+0.004
ishly+0.004
 hips+0.004
ider+0.003
Neg logit contributions
aw-0.002
Kat-0.001
 reality-0.001
 Sandra-0.001
aman-0.001
Token warm
Token ID5814
Feature activation+6.77e-03

Pos logit contributions
hetically+0.038
ishly+0.037
balls+0.037
 hips+0.036
ider+0.032
Neg logit contributions
aw-0.028
Kat-0.027
 reality-0.026
Time-0.023
 marriage-0.023
Token water
Token ID1660
Feature activation+2.82e-03

Pos logit contributions
aw+0.010
Kat+0.009
 reality+0.009
 theirs+0.008
aman+0.008
Neg logit contributions
balls-0.014
hetically-0.013
ishly-0.013
 hips-0.013
ider-0.012
Token spilled
Token ID31896
Feature activation+8.39e-03

Pos logit contributions
aw+0.004
Kat+0.004
 Sandra+0.003
aman+0.003
ull+0.003
Neg logit contributions
balls-0.006
hetically-0.006
 hips-0.005
ishly-0.005
ider-0.005
Token put
Token ID1234
Feature activation-2.18e-03

Pos logit contributions
balls+0.001
hetically+0.001
ishly+0.001
 hips+0.001
ider+0.001
Neg logit contributions
aw-0.001
Kat-0.001
 Sandra-0.001
 reality-0.001
aman-0.001
Token them
Token ID606
Feature activation-3.77e-02

Pos logit contributions
balls+0.004
hetically+0.004
ishly+0.004
 hips+0.004
ider+0.004
Neg logit contributions
aw-0.004
Kat-0.003
 Sandra-0.003
 theirs-0.003
aman-0.003
Token in
Token ID287
Feature activation-2.71e-02

Pos logit contributions
balls+0.055
hetically+0.054
ishly+0.054
ider+0.049
 hips+0.048
Neg logit contributions
aw-0.079
Kat-0.072
 theirs-0.071
land-0.066
 Sandra-0.066
Token a
Token ID257
Feature activation+1.60e-02

Pos logit contributions
balls+0.054
hetically+0.051
ishly+0.051
 hips+0.049
ider+0.046
Neg logit contributions
aw-0.046
Kat-0.043
 Sandra-0.038
 reality-0.038
aman-0.038
Token big
Token ID1263
Feature activation+1.39e-02

Pos logit contributions
aw+0.031
Kat+0.029
 reality+0.026
aman+0.026
 Sandra+0.026
Neg logit contributions
balls-0.028
hetically-0.026
ishly-0.026
 hips-0.025
ider-0.024
Token basket
Token ID7988
Feature activation+1.55e-02

Pos logit contributions
aw+0.028
Kat+0.026
aman+0.024
 reality+0.024
 Sandra+0.024
Neg logit contributions
balls-0.024
hetically-0.022
ishly-0.021
ider-0.020
 hips-0.020
Token and
Token ID290
Feature activation+8.19e-03

Pos logit contributions
aw+0.030
Kat+0.027
 Sandra+0.025
aman+0.025
land+0.025
Neg logit contributions
balls-0.027
hetically-0.025
ishly-0.025
 hips-0.024
 breaths-0.022
Token walked
Token ID6807
Feature activation+6.82e-03

Pos logit contributions
aw+0.014
Kat+0.013
 Sandra+0.012
 reality+0.011
aman+0.011
Neg logit contributions
balls-0.016
hetically-0.015
ishly-0.015
 hips-0.015
ider-0.014
Token to
Token ID284
Feature activation+1.62e-02

Pos logit contributions
aw+0.014
Kat+0.014
land+0.012
 Sandra+0.012
 theirs+0.012
Neg logit contributions
hetically-0.011
ishly-0.011
balls-0.011
 hips-0.010
ider-0.009
Token the
Token ID262
Feature activation+8.24e-03

Pos logit contributions
aw+0.032
Kat+0.028
aman+0.025
 reality+0.025
land+0.024
Neg logit contributions
balls-0.029
ishly-0.027
 hips-0.026
hetically-0.026
 first-0.025
Token church
Token ID4928
Feature activation+1.64e-02

Pos logit contributions
aw+0.015
Kat+0.013
 reality+0.012
aman+0.012
 Sandra+0.012
Neg logit contributions
balls-0.016
hetically-0.015
ishly-0.015
 hips-0.015
ider-0.013
Token there
Token ID612
Feature activation-7.37e-03

Pos logit contributions
aw+0.018
Kat+0.017
 Sandra+0.015
 reality+0.015
aman+0.015
Neg logit contributions
balls-0.022
hetically-0.021
ishly-0.021
 hips-0.020
ider-0.019
Token was
Token ID373
Feature activation+2.97e-02

Pos logit contributions
balls+0.009
 first+0.008
hetically+0.008
ishly+0.008
 hips+0.007
Neg logit contributions
aw-0.018
Kat-0.017
 reality-0.016
weet-0.016
reating-0.016
Token a
Token ID257
Feature activation-8.50e-03

Pos logit contributions
aw+0.069
Kat+0.066
aman+0.065
land+0.065
 reality+0.063
Neg logit contributions
balls-0.035
 first-0.031
ishly-0.027
 hips-0.027
 onto-0.027
Token little
Token ID1310
Feature activation-9.72e-03

Pos logit contributions
balls+0.013
hetically+0.012
 hips+0.012
ishly+0.012
 first+0.011
Neg logit contributions
aw-0.018
Kat-0.017
aman-0.016
 theirs-0.015
 Sandra-0.015
Token girl
Token ID2576
Feature activation-2.26e-02

Pos logit contributions
balls+0.018
 hips+0.017
hetically+0.017
ishly+0.017
ider+0.016
Neg logit contributions
aw-0.018
Kat-0.016
 Sandra-0.015
 reality-0.015
aman-0.014
Token named
Token ID3706
Feature activation+1.53e-02

Pos logit contributions
balls+0.051
ishly+0.048
hetically+0.048
 hips+0.047
ider+0.044
Neg logit contributions
aw-0.032
Kat-0.029
 reality-0.025
aman-0.025
 Sandra-0.025
Token Lily
Token ID20037
Feature activation-2.04e-02

Pos logit contributions
aw+0.027
Kat+0.023
 reality+0.021
aman+0.021
 Sandra+0.020
Neg logit contributions
balls-0.031
ishly-0.029
 hips-0.029
hetically-0.029
 first-0.027
Token.
Token ID13
Feature activation+8.00e-03

Pos logit contributions
balls+0.019
hetically+0.017
ishly+0.017
 hips+0.016
 first+0.015
Neg logit contributions
aw-0.054
Kat-0.052
aman-0.049
 reality-0.049
 Sandra-0.048
Token She
Token ID1375
Feature activation-6.95e-03

Pos logit contributions
aw+0.017
Kat+0.016
 Sandra+0.014
 reality+0.014
aman+0.014
Neg logit contributions
balls-0.013
ishly-0.012
hetically-0.012
 hips-0.011
ider-0.010
Token loved
Token ID6151
Feature activation-3.40e-03

Pos logit contributions
balls+0.013
 hips+0.012
hetically+0.012
ishly+0.012
 first+0.011
Neg logit contributions
aw-0.012
Kat-0.012
aman-0.011
 reality-0.011
 Sandra-0.011
Token to
Token ID284
Feature activation+1.72e-02

Pos logit contributions
balls+0.004
hetically+0.003
 hips+0.003
ishly+0.003
 first+0.003
Neg logit contributions
aw-0.008
Kat-0.008
 reality-0.008
aman-0.007
 Sandra-0.007
Token and
Token ID290
Feature activation+1.20e-02

Pos logit contributions
balls+0.019
hetically+0.018
 hips+0.017
ishly+0.017
ider+0.016
Neg logit contributions
aw-0.028
Kat-0.026
 reality-0.023
 theirs-0.023
 Sandra-0.023
Token played
Token ID2826
Feature activation-1.18e-02

Pos logit contributions
aw+0.021
Kat+0.020
aman+0.018
 Sandra+0.017
 reality+0.017
Neg logit contributions
balls-0.023
ishly-0.022
hetically-0.022
 hips-0.021
 first-0.019
Token together
Token ID1978
Feature activation-2.59e-02

Pos logit contributions
hetically+0.021
balls+0.020
 hips+0.020
ishly+0.020
ider+0.019
Neg logit contributions
aw-0.023
Kat-0.019
 theirs-0.018
 reality-0.017
 pic-0.017
Token in
Token ID287
Feature activation+1.04e-03

Pos logit contributions
hetically+0.047
ishly+0.046
 hips+0.044
balls+0.044
ider+0.043
Neg logit contributions
aw-0.048
 reality-0.040
Kat-0.039
 marriage-0.039
 theirs-0.038
Token the
Token ID262
Feature activation+5.04e-02

Pos logit contributions
aw+0.002
Kat+0.002
aman+0.002
oph+0.002
land+0.002
Neg logit contributions
balls-0.002
hetically-0.002
ishly-0.002
 hips-0.002
 first-0.002
Token jungle
Token ID20712
Feature activation+4.38e-02

Pos logit contributions
Kat+0.070
aw+0.069
 Sandra+0.062
 theirs+0.062
 reality+0.060
Neg logit contributions
balls-0.107
ishly-0.103
hetically-0.103
 hips-0.099
 first-0.098
Token every
Token ID790
Feature activation-1.55e-03

Pos logit contributions
aw+0.093
Kat+0.079
 reality+0.074
land+0.073
 theirs+0.072
Neg logit contributions
balls-0.069
hetically-0.068
ishly-0.067
 hips-0.067
ider-0.062
Token day
Token ID1110
Feature activation+6.07e-03

Pos logit contributions
balls+0.003
hetically+0.003
ishly+0.003
 hips+0.002
ider+0.002
Neg logit contributions
aw-0.003
Kat-0.003
 Sandra-0.003
aman-0.002
 reality-0.002
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
aw+0.013
Kat+0.012
 Sandra+0.011
 theirs+0.010
 reality+0.010
Neg logit contributions
balls-0.010
hetically-0.009
ishly-0.009
 hips-0.009
ider-0.008
Token<|endoftext|>
Token ID50256
Feature activation+3.93e-04

Pos logit contributions
aw+0.025
Kat+0.023
 reality+0.022
aman+0.021
 spoken+0.021
Neg logit contributions
balls-0.018
ishly-0.017
hetically-0.016
 hips-0.015
 first-0.014
TokenOnce
Token ID7454
Feature activation+2.44e-02

Pos logit contributions
aw+0.001
Kat+0.001
aman+0.001
 reality+0.001
 Sandra+0.001
Neg logit contributions
balls-0.001
 hips-0.001
hetically-0.001
ishly-0.000
 first-0.000
Token "
Token ID366
Feature activation-8.33e-03

Pos logit contributions
balls+0.014
hetically+0.014
ishly+0.013
 hips+0.013
ider+0.012
Neg logit contributions
aw-0.014
Kat-0.013
 Sandra-0.012
 theirs-0.012
aman-0.012
TokenLet
Token ID5756
Feature activation-8.85e-03

Pos logit contributions
balls+0.017
hetically+0.016
ishly+0.016
 hips+0.015
ider+0.015
Neg logit contributions
aw-0.014
Kat-0.013
 reality-0.011
 Sandra-0.011
aman-0.011
Token's
Token ID338
Feature activation+1.72e-02

Pos logit contributions
balls+0.012
ishly+0.011
hetically+0.011
 hips+0.011
 first+0.010
Neg logit contributions
aw-0.020
Kat-0.019
 reality-0.018
 Sandra-0.017
 theirs-0.017
Token go
Token ID467
Feature activation-7.41e-03

Pos logit contributions
aw+0.029
Kat+0.026
 reality+0.025
 Sandra+0.024
 spoken+0.024
Neg logit contributions
balls-0.034
hetically-0.033
ider-0.031
 hips-0.030
ishly-0.030
Token get
Token ID651
Feature activation-1.80e-02

Pos logit contributions
hetically+0.015
balls+0.014
ishly+0.014
 hips+0.013
ider+0.012
Neg logit contributions
aw-0.012
Kat-0.011
 seen-0.011
 theirs-0.011
land-0.010
Token something
Token ID1223
Feature activation-4.06e-03

Pos logit contributions
balls+0.037
hetically+0.037
ishly+0.037
 hips+0.036
ider+0.034
Neg logit contributions
aw-0.028
Kat-0.025
 theirs-0.023
 Sandra-0.022
 reality-0.022
Token to
Token ID284
Feature activation-5.41e-03

Pos logit contributions
hetically+0.008
balls+0.008
ishly+0.008
 hips+0.008
ider+0.007
Neg logit contributions
aw-0.007
Kat-0.006
 reality-0.006
 theirs-0.006
land-0.005
Token drink
Token ID4144
Feature activation+5.14e-04

Pos logit contributions
balls+0.007
ishly+0.007
 help+0.007
 hips+0.007
 first+0.006
Neg logit contributions
aw-0.012
Kat-0.012
aman-0.010
 Sandra-0.010
 reality-0.010
Token."
Token ID526
Feature activation-1.29e-02

Pos logit contributions
aw+0.001
Kat+0.001
 theirs+0.001
 Sandra+0.001
 reality+0.001
Neg logit contributions
hetically-0.001
balls-0.001
ishly-0.001
 hips-0.001
ider-0.001
Token The
Token ID383
Feature activation-2.38e-02

Pos logit contributions
balls+0.024
hetically+0.024
ishly+0.023
ider+0.022
 hips+0.022
Neg logit contributions
aw-0.024
Kat-0.020
 Sandra-0.019
aman-0.018
land-0.018
Token small
Token ID1402
Feature activation+7.46e-03

Pos logit contributions
ishly+0.048
 hips+0.045
hetically+0.045
balls+0.044
ider+0.042
Neg logit contributions
 reality-0.034
aw-0.033
 known-0.031
 history-0.031
 dentist-0.031
Token can
Token ID460
Feature activation-6.47e-03

Pos logit contributions
balls+0.019
hetically+0.018
ishly+0.018
 hips+0.017
ider+0.016
Neg logit contributions
aw-0.017
Kat-0.015
 reality-0.015
 theirs-0.014
 Hawaii-0.014
Token share
Token ID2648
Feature activation+9.73e-03

Pos logit contributions
balls+0.013
hetically+0.013
ishly+0.012
 hips+0.011
ider+0.011
Neg logit contributions
aw-0.010
Kat-0.010
 reality-0.009
aman-0.009
land-0.008
Token it
Token ID340
Feature activation+8.18e-03

Pos logit contributions
Kat+0.017
aw+0.017
aman+0.016
 reality+0.015
David+0.015
Neg logit contributions
balls-0.017
 hips-0.017
 first-0.016
 onto-0.015
ishly-0.015
Token."
Token ID526
Feature activation-2.83e-03

Pos logit contributions
aw+0.014
 theirs+0.012
Kat+0.012
land+0.011
 yours+0.011
Neg logit contributions
hetically-0.016
ishly-0.016
balls-0.014
 hips-0.014
 shove-0.014
Token Sam
Token ID3409
Feature activation+1.30e-03

Pos logit contributions
balls+0.007
hetically+0.007
ishly+0.006
ider+0.006
 hips+0.006
Neg logit contributions
aw-0.004
Kat-0.003
 Sandra-0.003
aman-0.003
 theirs-0.003
Token said
Token ID531
Feature activation+4.74e-03

Pos logit contributions
aw+0.002
Kat+0.002
 Sandra+0.001
oph+0.001
aman+0.001
Neg logit contributions
balls-0.003
hetically-0.003
ider-0.003
ishly-0.003
 hips-0.003
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
 theirs+0.006
aw+0.006
Kat+0.005
 about+0.005
 its+0.005
Neg logit contributions
hetically-0.010
balls-0.009
ishly-0.009
 Tort-0.009
Carl-0.008
Token He
Token ID679
Feature activation-1.00e-02

Pos logit contributions
hetically+0.030
ishly+0.029
balls+0.028
 hips+0.028
ider+0.026
Neg logit contributions
aw-0.023
Kat-0.021
 Sandra-0.020
 theirs-0.019
ooks-0.018
Token put
Token ID1234
Feature activation-7.84e-03

Pos logit contributions
balls+0.020
hetically+0.019
ishly+0.019
 hips+0.019
ider+0.017
Neg logit contributions
aw-0.017
Kat-0.015
 Sandra-0.013
 reality-0.013
oph-0.013
Token the
Token ID262
Feature activation-3.36e-02

Pos logit contributions
balls+0.016
ishly+0.016
hetically+0.015
 hips+0.015
ider+0.015
Neg logit contributions
aw-0.013
Kat-0.011
 theirs-0.010
 Sandra-0.010
 reality-0.010
Token medal
Token ID18279
Feature activation-1.47e-02

Pos logit contributions
balls+0.068
ishly+0.067
hetically+0.065
 hips+0.062
ider+0.058
Neg logit contributions
aw-0.056
Kat-0.049
 reality-0.046
 theirs-0.043
ige-0.043
Token the
Token ID262
Feature activation+2.18e-02

Pos logit contributions
balls+0.011
hetically+0.011
ishly+0.011
 hips+0.011
ider+0.010
Neg logit contributions
aw-0.010
Kat-0.009
 Sandra-0.008
 reality-0.008
aman-0.008
Token young
Token ID1862
Feature activation+3.57e-02

Pos logit contributions
aw+0.035
Kat+0.032
 reality+0.032
 marriage+0.028
 Sandra+0.027
Neg logit contributions
balls-0.042
hetically-0.042
ishly-0.042
 hips-0.041
ider-0.038
Token boy
Token ID2933
Feature activation+1.21e-02

Pos logit contributions
aw+0.063
Kat+0.060
 reality+0.054
 Sandra+0.052
land+0.050
Neg logit contributions
balls-0.070
 hips-0.066
hetically-0.066
ishly-0.064
ider-0.061
Token often
Token ID1690
Feature activation+1.79e-03

Pos logit contributions
aw+0.020
Kat+0.017
 Sandra+0.015
 reality+0.015
aman+0.015
Neg logit contributions
balls-0.025
hetically-0.024
 hips-0.023
ishly-0.023
ider-0.022
Token spent
Token ID3377
Feature activation-1.24e-02

Pos logit contributions
aw+0.003
Kat+0.003
aman+0.003
 reality+0.003
 Sandra+0.003
Neg logit contributions
balls-0.003
ishly-0.003
hetically-0.003
 hips-0.003
 first-0.003
Token time
Token ID640
Feature activation+1.38e-03

Pos logit contributions
balls+0.020
 hips+0.018
ishly+0.018
hetically+0.018
 first+0.018
Neg logit contributions
aw-0.023
Kat-0.023
aman-0.022
 Sandra-0.021
 reality-0.021
Token in
Token ID287
Feature activation-1.57e-02

Pos logit contributions
aw+0.002
Kat+0.002
 theirs+0.002
 sharing+0.002
 marriage+0.002
Neg logit contributions
hetically-0.003
ishly-0.003
balls-0.002
 hips-0.002
ider-0.002
Token the
Token ID262
Feature activation+2.36e-02

Pos logit contributions
balls+0.032
hetically+0.030
ishly+0.030
 hips+0.030
ider+0.028
Neg logit contributions
aw-0.025
Kat-0.023
 Sandra-0.021
 reality-0.021
aman-0.021
Token courtyard
Token ID34941
Feature activation+8.80e-03

Pos logit contributions
aw+0.035
Kat+0.032
 Sandra+0.028
aman+0.028
 reality+0.028
Neg logit contributions
balls-0.051
ishly-0.049
hetically-0.049
 hips-0.048
ider-0.045
Token,
Token ID11
Feature activation+1.97e-02

Pos logit contributions
aw+0.018
Kat+0.016
 reality+0.015
land+0.015
 Sandra+0.015
Neg logit contributions
balls-0.014
ishly-0.014
hetically-0.014
 hips-0.013
ider-0.012
Token folding
Token ID29909
Feature activation+4.13e-03

Pos logit contributions
aw+0.036
Kat+0.033
 Sandra+0.030
 reality+0.030
aman+0.030
Neg logit contributions
balls-0.036
hetically-0.034
ishly-0.034
 hips-0.033
ider-0.031
Token They
Token ID1119
Feature activation+2.79e-03

Pos logit contributions
aw+0.024
aman+0.024
 reality+0.024
reating+0.023
Kat+0.023
Neg logit contributions
balls-0.010
 hips-0.008
 first-0.007
ider-0.006
ishly-0.006
Token were
Token ID547
Feature activation+8.71e-03

Pos logit contributions
aw+0.005
Kat+0.005
aman+0.005
ooks+0.004
 reality+0.004
Neg logit contributions
balls-0.005
ishly-0.005
hetically-0.005
 hips-0.004
 first-0.004
Token happy
Token ID3772
Feature activation-8.33e-03

Pos logit contributions
aw+0.017
Kat+0.016
aman+0.016
oph+0.015
land+0.015
Neg logit contributions
balls-0.015
 first-0.013
 hips-0.013
ishly-0.013
hetically-0.013
Token.
Token ID13
Feature activation+5.58e-03

Pos logit contributions
balls+0.012
hetically+0.011
ishly+0.011
 hips+0.011
ider+0.009
Neg logit contributions
aw-0.019
Kat-0.018
 reality-0.016
 Sandra-0.016
aman-0.016
Token<|endoftext|>
Token ID50256
Feature activation-5.51e-03

Pos logit contributions
aman+0.014
aw+0.014
 reality+0.014
reating+0.014
Kat+0.013
Neg logit contributions
balls-0.006
 hips-0.005
 first-0.004
ider-0.004
ishly-0.003
TokenAnna
Token ID31160
Feature activation+8.17e-03

Pos logit contributions
balls+0.010
ishly+0.010
hetically+0.010
 hips+0.009
ider+0.009
Neg logit contributions
aw-0.010
Kat-0.009
 Sandra-0.008
 reality-0.008
aman-0.008
Token and
Token ID290
Feature activation+1.46e-02

Pos logit contributions
aw+0.017
Kat+0.016
 reality+0.015
aman+0.015
 theirs+0.015
Neg logit contributions
balls-0.012
ishly-0.011
hetically-0.011
 hips-0.011
 first-0.010
Token Ben
Token ID3932
Feature activation+3.22e-02

Pos logit contributions
aw+0.024
Kat+0.023
 reality+0.021
land+0.020
aman+0.020
Neg logit contributions
balls-0.028
ishly-0.028
hetically-0.027
 hips-0.026
 first-0.025
Token were
Token ID547
Feature activation+1.24e-02

Pos logit contributions
aw+0.060
Kat+0.058
 reality+0.054
aman+0.053
ooks+0.051
Neg logit contributions
balls-0.053
ishly-0.051
hetically-0.049
 hips-0.049
 first-0.046
Token twins
Token ID20345
Feature activation-1.94e-02

Pos logit contributions
aw+0.023
Kat+0.021
 Sandra+0.019
 theirs+0.019
 reality+0.019
Neg logit contributions
balls-0.023
hetically-0.022
ishly-0.022
 hips-0.021
ider-0.019
Token who
Token ID508
Feature activation+1.23e-02

Pos logit contributions
balls+0.032
hetically+0.030
ishly+0.029
 hips+0.028
ider+0.026
Neg logit contributions
aw-0.039
Kat-0.037
 reality-0.033
 Sandra-0.032
aman-0.032
Token large
Token ID1588
Feature activation+1.61e-02

Pos logit contributions
aw+0.055
Kat+0.052
 Sandra+0.048
 theirs+0.047
aman+0.047
Neg logit contributions
balls-0.055
ishly-0.051
hetically-0.050
 hips-0.050
 first-0.046
Token hive
Token ID35881
Feature activation-7.70e-03

Pos logit contributions
aw+0.029
Kat+0.027
 Sandra+0.025
 reality+0.024
aman+0.024
Neg logit contributions
balls-0.029
ishly-0.027
hetically-0.027
 hips-0.027
ider-0.025
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
balls+0.013
hetically+0.013
 hips+0.012
ishly+0.012
ider+0.010
Neg logit contributions
aw-0.016
Kat-0.013
land-0.013
 reality-0.012
weet-0.012
Token
Token ID198
Feature activation+3.18e-02

Pos logit contributions
aw+0.031
Kat+0.029
 reality+0.027
 theirs+0.026
aman+0.026
Neg logit contributions
balls-0.022
ishly-0.021
 hips-0.020
hetically-0.020
 first-0.018
Token
Token ID198
Feature activation+1.28e-02

Pos logit contributions
aw+0.058
Kat+0.051
 reality+0.049
 theirs+0.049
aman+0.048
Neg logit contributions
balls-0.058
 hips-0.054
hetically-0.053
ishly-0.053
 first-0.049
Token"
Token ID1
Feature activation-3.02e-02

Pos logit contributions
aw+0.022
Kat+0.020
 reality+0.019
 theirs+0.019
aman+0.018
Neg logit contributions
balls-0.024
hetically-0.022
 hips-0.022
ishly-0.022
ider-0.021
TokenWhat
Token ID2061
Feature activation+9.84e-03

Pos logit contributions
balls+0.054
ishly+0.051
hetically+0.050
 first+0.048
 hips+0.048
Neg logit contributions
aw-0.052
 theirs-0.047
 reality-0.046
ooks-0.044
Kat-0.043
Token's
Token ID338
Feature activation+1.28e-02

Pos logit contributions
Kat+0.022
aman+0.021
aw+0.021
David+0.020
Pe+0.020
Neg logit contributions
balls-0.012
 first-0.011
 hips-0.010
 help-0.010
ishly-0.010
Token that
Token ID326
Feature activation-2.72e-02

Pos logit contributions
aw+0.027
Kat+0.027
oph+0.024
David+0.024
aman+0.024
Neg logit contributions
balls-0.019
 first-0.018
ishly-0.017
 onto-0.017
 hips-0.016
Token?"
Token ID1701
Feature activation+5.58e-03

Pos logit contributions
balls+0.055
ishly+0.052
hetically+0.052
 hips+0.051
ider+0.047
Neg logit contributions
aw-0.045
Kat-0.041
 Sandra-0.036
 reality-0.036
aman-0.036
Token asked
Token ID1965
Feature activation+1.37e-02

Pos logit contributions
aw+0.009
Kat+0.008
 Sandra+0.008
aman+0.007
 reality+0.007
Neg logit contributions
balls-0.012
hetically-0.011
ishly-0.011
 hips-0.011
ider-0.010
Token kids
Token ID3988
Feature activation+8.88e-03

Pos logit contributions
aw+0.017
Kat+0.015
 reality+0.014
 Sandra+0.014
 theirs+0.013
Neg logit contributions
balls-0.020
hetically-0.019
ishly-0.019
 hips-0.018
ider-0.018
Token who
Token ID508
Feature activation+1.21e-02

Pos logit contributions
aw+0.016
Kat+0.014
land+0.013
 theirs+0.013
 reality+0.013
Neg logit contributions
balls-0.017
hetically-0.016
ishly-0.015
 hips-0.015
ider-0.014
Token would
Token ID561
Feature activation+2.59e-02

Pos logit contributions
aw+0.019
Kat+0.017
 Sandra+0.015
 reality+0.015
aman+0.015
Neg logit contributions
balls-0.026
hetically-0.024
ishly-0.024
 hips-0.024
ider-0.022
Token appreciate
Token ID9144
Feature activation+2.23e-02

Pos logit contributions
aw+0.046
Kat+0.044
 reality+0.040
 Sandra+0.039
aman+0.039
Neg logit contributions
balls-0.048
hetically-0.045
ishly-0.045
 hips-0.044
ider-0.041
Token her
Token ID607
Feature activation-1.85e-03

Pos logit contributions
aw+0.039
Kat+0.036
aman+0.032
 Sandra+0.032
 reality+0.031
Neg logit contributions
balls-0.042
ishly-0.039
 hips-0.039
hetically-0.039
 first-0.036
Token fl
Token ID781
Feature activation-5.86e-02

Pos logit contributions
balls+0.004
hetically+0.004
ishly+0.003
 hips+0.003
ider+0.003
Neg logit contributions
aw-0.003
 reality-0.003
Kat-0.003
 theirs-0.003
land-0.003
Tokenute
Token ID1133
Feature activation-1.77e-02

Pos logit contributions
hetically+0.125
 hips+0.119
ishly+0.117
balls+0.115
 breaths+0.106
Neg logit contributions
aw-0.083
ooks-0.075
Kat-0.072
 reality-0.068
ull-0.067
Token and
Token ID290
Feature activation+1.98e-02

Pos logit contributions
hetically+0.030
balls+0.030
ider+0.027
 hips+0.027
ishly+0.027
Neg logit contributions
aw-0.032
land-0.029
 reality-0.028
 theirs-0.027
Kat-0.027
Token her
Token ID607
Feature activation+3.56e-02

Pos logit contributions
aw+0.033
Kat+0.031
 reality+0.028
 Sandra+0.028
aman+0.027
Neg logit contributions
balls-0.039
hetically-0.037
ishly-0.037
 hips-0.036
ider-0.034
Token acting
Token ID7205
Feature activation+1.45e-02

Pos logit contributions
aw+0.059
 reality+0.053
Kat+0.053
land+0.049
 theirs+0.048
Neg logit contributions
balls-0.069
hetically-0.069
ishly-0.066
 hips-0.062
ider-0.062
Token,
Token ID11
Feature activation+3.49e-02

Pos logit contributions
aw+0.027
Kat+0.024
 reality+0.023
land+0.023
 theirs+0.023
Neg logit contributions
hetically-0.026
balls-0.025
ishly-0.025
 hips-0.024
ider-0.022
Token wall
Token ID3355
Feature activation+1.16e-02

Pos logit contributions
aw+0.030
Kat+0.028
land+0.026
aman+0.025
reating+0.024
Neg logit contributions
balls-0.038
ishly-0.037
hetically-0.035
ider-0.034
 hips-0.033
Token.
Token ID13
Feature activation-4.57e-03

Pos logit contributions
aw+0.024
Kat+0.021
land+0.020
 reality+0.019
 theirs+0.019
Neg logit contributions
balls-0.019
ishly-0.018
hetically-0.017
 hips-0.017
 Carlos-0.015
Token It
Token ID632
Feature activation+6.92e-03

Pos logit contributions
balls+0.007
 hips+0.006
ishly+0.006
 onto+0.006
hetically+0.006
Neg logit contributions
aw-0.009
Kat-0.009
aman-0.009
 reality-0.009
onda-0.009
Token is
Token ID318
Feature activation+1.33e-02

Pos logit contributions
aw+0.012
Kat+0.010
 reality+0.009
 Sandra+0.009
aman+0.009
Neg logit contributions
balls-0.014
ishly-0.013
hetically-0.013
 hips-0.013
ider-0.012
Token almost
Token ID2048
Feature activation-8.81e-03

Pos logit contributions
Kat+0.029
aw+0.029
aman+0.026
onda+0.026
oph+0.026
Neg logit contributions
balls-0.020
 hips-0.017
 first-0.017
 onto-0.016
hetically-0.015
Token evening
Token ID6180
Feature activation-4.92e-02

Pos logit contributions
balls+0.015
 first+0.014
 hips+0.014
 onto+0.014
 breath+0.013
Neg logit contributions
aw-0.016
Kat-0.016
onda-0.014
aman-0.014
Pe-0.014
Token.
Token ID13
Feature activation-5.73e-03

Pos logit contributions
balls+0.065
ishly+0.061
hetically+0.061
 hips+0.058
ider+0.052
Neg logit contributions
aw-0.115
Kat-0.108
 Sandra-0.100
aman-0.099
 reality-0.099
Token The
Token ID383
Feature activation-1.34e-02

Pos logit contributions
balls+0.009
 hips+0.008
ishly+0.007
 onto+0.007
 first+0.007
Neg logit contributions
aw-0.012
Kat-0.012
aman-0.011
 reality-0.011
reating-0.011
Token sun
Token ID4252
Feature activation+7.07e-04

Pos logit contributions
balls+0.029
ishly+0.029
hetically+0.028
 hips+0.027
ider+0.026
Neg logit contributions
aw-0.020
Kat-0.018
 reality-0.016
aman-0.016
 Sandra-0.015
Token is
Token ID318
Feature activation+3.39e-03

Pos logit contributions
aw+0.001
Kat+0.001
aman+0.001
 Sandra+0.001
land+0.001
Neg logit contributions
balls-0.001
ishly-0.001
hetically-0.001
 hips-0.001
ider-0.001
Token going
Token ID1016
Feature activation-7.07e-03

Pos logit contributions
aw+0.007
Kat+0.007
 Sandra+0.006
 reality+0.006
oph+0.006
Neg logit contributions
balls-0.005
 hips-0.005
 first-0.005
hetically-0.004
ishly-0.004
Token
Token ID198
Feature activation-1.26e-02

Pos logit contributions
aw+0.000
 reality+0.000
ige+0.000
 spoken+0.000
ellery+0.000
Neg logit contributions
balls-0.000
 first-0.000
 hips-0.000
hetically-0.000
 Carlos-0.000
TokenShe
Token ID3347
Feature activation-2.53e-02

Pos logit contributions
balls+0.023
hetically+0.021
ishly+0.021
 hips+0.021
ider+0.020
Neg logit contributions
aw-0.022
 reality-0.019
Kat-0.019
 theirs-0.018
 Sandra-0.018
Token got
Token ID1392
Feature activation-7.85e-03

Pos logit contributions
balls+0.049
hetically+0.047
ishly+0.046
 hips+0.045
ider+0.042
Neg logit contributions
aw-0.045
Kat-0.040
 Sandra-0.036
 reality-0.036
aman-0.036
Token out
Token ID503
Feature activation-4.59e-02

Pos logit contributions
balls+0.013
ishly+0.013
hetically+0.012
 hips+0.012
ider+0.012
Neg logit contributions
aw-0.016
Kat-0.014
 theirs-0.014
 Sandra-0.013
 reality-0.013
Token of
Token ID286
Feature activation-2.58e-02

Pos logit contributions
balls+0.096
ishly+0.091
ider+0.087
hetically+0.087
 hips+0.087
Neg logit contributions
aw-0.071
Kat-0.069
 theirs-0.065
 Sandra-0.063
 reality-0.061
Token bed
Token ID3996
Feature activation-5.18e-02

Pos logit contributions
balls+0.049
hetically+0.046
ishly+0.045
 hips+0.045
 first+0.041
Neg logit contributions
aw-0.047
Kat-0.043
 reality-0.038
aman-0.038
 Sandra-0.038
Token and
Token ID290
Feature activation-7.30e-03

Pos logit contributions
balls+0.083
ishly+0.080
hetically+0.079
 hips+0.075
ider+0.069
Neg logit contributions
aw-0.103
Kat-0.094
land-0.087
aman-0.086
 Sandra-0.085
Token took
Token ID1718
Feature activation-6.24e-03

Pos logit contributions
balls+0.013
ishly+0.013
hetically+0.013
 hips+0.012
ider+0.011
Neg logit contributions
aw-0.013
Kat-0.012
 Sandra-0.011
 reality-0.011
aman-0.011
Token her
Token ID607
Feature activation-3.04e-02

Pos logit contributions
balls+0.012
hetically+0.012
ishly+0.012
 hips+0.011
ider+0.011
Neg logit contributions
aw-0.011
Kat-0.010
 Sandra-0.009
 theirs-0.009
 reality-0.009
Token sheet
Token ID9629
Feature activation-2.60e-02

Pos logit contributions
ishly+0.060
hetically+0.059
balls+0.058
ider+0.056
 hips+0.050
Neg logit contributions
aw-0.043
 reality-0.042
Kat-0.041
 theirs-0.041
 seen-0.040
Token with
Token ID351
Feature activation-2.57e-02

Pos logit contributions
balls+0.047
ishly+0.046
hetically+0.045
ider+0.044
 hips+0.044
Neg logit contributions
aw-0.043
Kat-0.040
land-0.039
 theirs-0.037
reating-0.037
Token,
Token ID11
Feature activation-2.00e-02

Pos logit contributions
hetically+0.063
balls+0.062
ishly+0.061
 hips+0.058
ider+0.050
Neg logit contributions
aw-0.110
Kat-0.102
 Sandra-0.095
 reality-0.093
aman-0.093
Token they
Token ID484
Feature activation-2.32e-02

Pos logit contributions
balls+0.037
hetically+0.034
ishly+0.034
 hips+0.034
ider+0.031
Neg logit contributions
aw-0.037
Kat-0.034
 Sandra-0.031
 reality-0.031
aman-0.031
Token see
Token ID766
Feature activation+1.85e-02

Pos logit contributions
balls+0.039
ishly+0.037
 hips+0.037
hetically+0.036
 first+0.035
Neg logit contributions
aw-0.044
Kat-0.041
aman-0.038
land-0.037
 reality-0.037
Token a
Token ID257
Feature activation-2.34e-02

Pos logit contributions
Kat+0.048
aman+0.048
oph+0.047
ellery+0.045
land+0.044
Neg logit contributions
balls-0.017
 first-0.016
 help-0.014
ishly-0.012
 hips-0.012
Token real
Token ID1103
Feature activation-2.37e-02

Pos logit contributions
balls+0.033
 first+0.029
 hips+0.026
 maker+0.026
ishly+0.025
Neg logit contributions
aw-0.053
Kat-0.051
 theirs-0.048
 Sandra-0.047
aman-0.046
Token train
Token ID4512
Feature activation-6.03e-02

Pos logit contributions
balls+0.040
ishly+0.037
hetically+0.036
 hips+0.036
ider+0.033
Neg logit contributions
aw-0.047
Kat-0.045
 Sandra-0.041
aman-0.040
 reality-0.040
Token outside
Token ID2354
Feature activation-5.09e-02

Pos logit contributions
balls+0.114
hetically+0.111
 hips+0.109
ishly+0.106
ider+0.096
Neg logit contributions
aw-0.110
Kat-0.093
aman-0.085
 Sandra-0.084
land-0.084
Token their
Token ID511
Feature activation+5.14e-03

Pos logit contributions
balls+0.077
ishly+0.072
hetically+0.070
 hips+0.069
 first+0.067
Neg logit contributions
aw-0.105
Kat-0.102
aman-0.093
oph-0.091
 reality-0.091
Token window
Token ID4324
Feature activation-3.00e-02

Pos logit contributions
aw+0.010
Kat+0.009
 Sandra+0.009
 reality+0.008
aman+0.008
Neg logit contributions
balls-0.009
ishly-0.008
hetically-0.008
 hips-0.008
 first-0.007
Token.
Token ID13
Feature activation-8.05e-03

Pos logit contributions
balls+0.040
hetically+0.037
ishly+0.037
 hips+0.035
ider+0.031
Neg logit contributions
aw-0.071
Kat-0.066
 Sandra-0.061
aman-0.061
 reality-0.061
Token It
Token ID632
Feature activation-3.29e-03

Pos logit contributions
balls+0.016
hetically+0.015
ishly+0.015
 hips+0.014
ider+0.013
Neg logit contributions
aw-0.014
Kat-0.013
 Sandra-0.012
aman-0.011
 reality-0.011
TokenThe
Token ID464
Feature activation-2.46e-02

Pos logit contributions
balls+0.014
hetically+0.013
ishly+0.013
 hips+0.013
ider+0.012
Neg logit contributions
aw-0.009
Kat-0.008
 Sandra-0.007
aman-0.007
Pe-0.007
Token squirrel
Token ID33039
Feature activation+1.01e-02

Pos logit contributions
balls+0.050
 hips+0.049
ishly+0.047
hetically+0.047
ider+0.045
Neg logit contributions
aw-0.037
Kat-0.033
 reality-0.030
 Sandra-0.030
aman-0.030
Token spoke
Token ID5158
Feature activation-1.36e-04

Pos logit contributions
aw+0.017
Kat+0.015
aman+0.014
 Sandra+0.013
oph+0.013
Neg logit contributions
balls-0.021
 hips-0.019
hetically-0.019
ishly-0.018
ider-0.018
Token,
Token ID11
Feature activation-6.91e-03

Pos logit contributions
balls+0.000
hetically+0.000
 hips+0.000
ider+0.000
ishly+0.000
Neg logit contributions
aw-0.000
Kat-0.000
 theirs-0.000
 Sandra-0.000
land-0.000
Token "
Token ID366
Feature activation-9.45e-03

Pos logit contributions
balls+0.012
hetically+0.012
ishly+0.011
 hips+0.011
ider+0.011
Neg logit contributions
aw-0.012
Kat-0.012
 Sandra-0.011
aman-0.010
 theirs-0.010
TokenWell
Token ID5779
Feature activation-5.75e-02

Pos logit contributions
balls+0.020
ishly+0.019
hetically+0.019
 hips+0.019
ider+0.017
Neg logit contributions
aw-0.015
Kat-0.014
 Sandra-0.012
aman-0.012
 reality-0.012
Token,
Token ID11
Feature activation-2.41e-02

Pos logit contributions
balls+0.090
hetically+0.082
ishly+0.078
 hips+0.078
ider+0.073
Neg logit contributions
aw-0.121
Kat-0.112
 Sandra-0.109
aman-0.108
 reality-0.106
Token why
Token ID1521
Feature activation+1.79e-02

Pos logit contributions
balls+0.047
hetically+0.046
 hips+0.045
ishly+0.044
ider+0.044
Neg logit contributions
aw-0.039
Kat-0.034
 reality-0.034
 Sandra-0.032
aman-0.032
Token don
Token ID836
Feature activation-4.32e-02

Pos logit contributions
aw+0.036
Kat+0.034
 Sandra+0.031
 reality+0.030
aman+0.030
Neg logit contributions
balls-0.030
hetically-0.028
ishly-0.028
 hips-0.027
ider-0.025
Token't
Token ID470
Feature activation-1.21e-02

Pos logit contributions
balls+0.060
hetically+0.059
 hips+0.056
ishly+0.055
ider+0.053
Neg logit contributions
aw-0.092
Kat-0.086
 CA-0.082
 Sandra-0.082
David-0.078
Token you
Token ID345
Feature activation-1.73e-02

Pos logit contributions
balls+0.023
hetically+0.022
ishly+0.021
 hips+0.021
ider+0.020
Neg logit contributions
aw-0.022
Kat-0.021
 Sandra-0.019
 reality-0.019
aman-0.018