TOP ACTIVATIONS
MAX = 29.213

would take the av oc ados and scatter them around his
loved to eat av oc ados . One day , her
named Tim my who loved to eat ice cream . One
loved to eat av oc ados because they were y ummy
would prepare everything for the inspection . Together , they worked
girl named Sue who loved to read . She read big
<|endoftext|> Amy was so excited to go on her trip to
you know that av oc ados are very healthy for you
clever all igator who loved to carry things . He went
t know what to do . She wanted to
<|endoftext|> Ben and Mia liked to play near the harbor .
Jane did not know what to do and felt very scared
boy named George who liked to fly his k ite .
out that the av oc ados were bad and made the
." <|endoftext|> Tom my likes to use his calendar . He
weird avocado was very happy to be back in the tree
mom my then told him to find the prize . Tim
K ait lyn who wanted to play with her silver marker
Tim my was very eager to go to the park and
One day the boy went to the bakery to get a

MIN ACTIVATIONS
MIN = -28.096

though it was just a small piece , his dad said
! <|endoftext|> There was a small girl named Sad ie who
hide . They see a small , wooden house . They
but it was much too small . She was so sad
that the spot was too small for all the animals !
some toys that are too small or too easy for them
much . Tom had a small car that he liked very
down their faces . The small child was confused , why
dress er . It had small beads that spark led in
his photo inside was too small ! His photo had shrunk
it until it was a small dot in the sky .
Lucy . She was very small , only three years old
some of them are very small . Some of them are
But she made them very small and put them in the
Once , there was a small little girl . She was
truck . Sam has a small blue truck . They make
She carefully piled up the small bows and started sorting them
passports . Pass ports are small books with pictures and names
She followed a small path and hoped it would
Soon , the small hero realized how brave he

INTERVAL 14.886 - 29.213
CONTAINS 4.663%

. We don 't want to miss you or our home
starting point and were excited to tell each other about the
my rope ! You took it from my bag ! Give
and they wanted to try it . An
all the ingredients and then they mixed them together in a

INTERVAL 0.559 - 14.886
CONTAINS 39.893%

He was so proud that he learned how to take care
bike . It is blue and rusty . "
is so pretty . Can I play with it ?" Tim
had a yellow bubble wand that made small bubbles . Sara
a bit of effort , he could make a difference .

INTERVAL -13.769 - 0.559
CONTAINS 48.952%

to help her mom . So , she turned off her
One day , John saw a girl in the park .
some paper ," Jill said . She took out a big
air . One morning , the bunny decided
some fruits , Mom ?" Lily asked . "

INTERVAL -28.096 - -13.769
CONTAINS 6.492%

did , he felt a soft breeze against his face .
" You were brave to stay ," said the
of herself for being so brave and performing the show .
the morning . He feels brave and strong . He thinks
day , he found a shiny thing in the park .
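
The four INTERVAL headings above look like equal-width bins spanning MIN to MAX, and the CONTAINS percentages look like the share of tokens in each bin. A minimal Python sketch to check that reading follows. The helper name bin_edges and the four-bin assumption are illustrative and not stated anywhere in the dump; the numbers are copied from the MAX, MIN, INTERVAL and CONTAINS lines.

```python
# Sketch: rebuild the interval boundaries from the reported MIN and MAX,
# assuming four equal-width bins (an assumption, not stated in the dump).

def bin_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

ACT_MIN = -28.096  # from "MIN = -28.096"
ACT_MAX = 29.213   # from "MAX = 29.213"

# Boundaries as printed in the INTERVAL headings, lowest to highest.
reported_edges = [-28.096, -13.769, 0.559, 14.886, 29.213]

# CONTAINS percentages, lowest interval first.
reported_shares = [6.492, 48.952, 39.893, 4.663]

edges = bin_edges(ACT_MIN, ACT_MAX, 4)

for got, want in zip(edges, reported_edges):
    # Inputs are rounded to three decimals, so allow a small tolerance.
    print(f"computed {got:9.4f}   reported {want:9.3f}")

print(f"total share: {sum(reported_shares):.3f}%")
```

If the computed edges match the reported ones to within rounding, the intervals are simply quarters of the activation range. In that case the two middle bins hold most tokens, and the top and bottom bins are the tails from which the examples above are sampled.
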
Token would
Token ID561
Feature activation+1.69e+01

Pos logit contributions
ors+0.478
 to+0.468
but+0.462
fully+0.460
oes+0.438
Neg logit contributions
 cute-0.619
 clever-0.610
 charming-0.602
 small-0.590
 beautiful-0.585
Token take
Token ID1011
Feature activation+9.83e+00

Pos logit contributions
ors+0.532
but+0.524
fully+0.496
oes+0.481
driver+0.476
Neg logit contributions
 cute-0.497
 clever-0.492
 small-0.481
 gentle-0.477
 kind-0.475
Token the
Token ID262
Feature activation+5.41e-01

Pos logit contributions
ors+0.284
but+0.279
fully+0.270
oes+0.260
 to+0.259
Neg logit contributions
 cute-0.307
 clever-0.303
 charming-0.296
 small-0.295
 gentle-0.290
Token av
Token ID1196
Feature activation+2.08e+00

Pos logit contributions
 to+0.019
ors+0.018
fully+0.018
oes+0.018
but+0.018
Neg logit contributions
 cute-0.013
 clever-0.013
 charming-0.012
 gentle-0.012
 small-0.011
Tokenoc
Token ID420
Feature activation-8.91e+00

Pos logit contributions
fully+0.073
but+0.072
 Stop+0.072
ors+0.070
ancing+0.069
Neg logit contributions
 small-0.047
 clever-0.047
 beautiful-0.046
 charming-0.045
 cute-0.045
Tokenados
Token ID22484
Feature activation+2.46e+01

Pos logit contributions
 clever+0.335
 kind+0.328
 small+0.326
 charming+0.322
 humble+0.321
Neg logit contributions
fully-0.152
but-0.150
ges-0.144
t-0.143
licks-0.141
Token and
Token ID290
Feature activation+1.51e+01

Pos logit contributions
 Stop+0.462
ors+0.450
licks+0.411
ges+0.395
uld+0.380
Neg logit contributions
 small-1.031
 big-1.014
 clever-0.988
 beautiful-0.984
 special-0.978
Token scatter
Token ID41058
Feature activation+6.28e+00

Pos logit contributions
ors+0.491
but+0.481
fully+0.465
oes+0.448
or+0.443
Neg logit contributions
 cute-0.428
 clever-0.425
 small-0.417
 charming-0.410
 gentle-0.406
Token them
Token ID606
Feature activation+1.45e+01

Pos logit contributions
but+0.139
ors+0.138
fully+0.129
oes+0.125
driver+0.124
Neg logit contributions
 cute-0.241
 small-0.240
 clever-0.237
 beautiful-0.235
 charming-0.234
Token around
Token ID1088
Feature activation+1.22e+01

Pos logit contributions
ors+0.295
but+0.285
fully+0.271
oes+0.258
driver+0.255
Neg logit contributions
 cute-0.578
 clever-0.572
 small-0.563
 charming-0.561
 beautiful-0.553
Token his
Token ID465
Feature activation+6.70e-01

Pos logit contributions
ors+0.358
but+0.350
fully+0.337
oes+0.329
driver+0.326
Neg logit contributions
 cute-0.374
 clever-0.370
 small-0.363
 charming-0.360
 beautiful-0.354
Token loved
Token ID6151
Feature activation-4.00e+00

Pos logit contributions
 to+0.496
ors+0.472
but+0.464
fully+0.452
to+0.428
Neg logit contributions
 cute-0.435
 clever-0.433
 small-0.427
 charming-0.422
 gentle-0.417
Token to
Token ID284
Feature activation+2.18e+01

Pos logit contributions
 cute+0.195
 pretty+0.190
 beautiful+0.190
 small+0.189
 charming+0.189
Neg logit contributions
but-0.056
ors-0.056
ancing-0.056
or-0.055
driver-0.053
Token eat
Token ID4483
Feature activation+4.30e+00

Pos logit contributions
 to+0.578
ors+0.575
fully+0.557
but+0.539
oes+0.536
Neg logit contributions
 clever-0.687
 cute-0.684
 charming-0.664
 small-0.654
 beautiful-0.639
Token av
Token ID1196
Feature activation+4.39e-01

Pos logit contributions
ors+0.169
but+0.166
fully+0.162
oes+0.159
driver+0.158
Neg logit contributions
 cute-0.093
 clever-0.091
 small-0.088
 charming-0.088
 beautiful-0.086
Tokenoc
Token ID420
Feature activation-1.03e+01

Pos logit contributions
fully+0.014
but+0.014
ors+0.014
 Stop+0.013
driver+0.013
Neg logit contributions
 cute-0.012
 clever-0.012
 small-0.012
 charming-0.012
 beautiful-0.012
Tokenados
Token ID22484
Feature activation+2.37e+01

Pos logit contributions
 clever+0.403
 cute+0.393
 kind+0.390
 charming+0.389
 small+0.384
Neg logit contributions
but-0.181
fully-0.181
t-0.164
ges-0.164
oes-0.163
Token.
Token ID13
Feature activation-5.45e+00

Pos logit contributions
 Stop+0.742
ors+0.691
driver+0.665
ancing+0.648
ges+0.641
Neg logit contributions
 small-0.743
 big-0.731
 pretty-0.724
 beautiful-0.723
 clever-0.717
Token One
Token ID1881
Feature activation-3.91e+00

Pos logit contributions
 cute+0.156
 clever+0.152
 small+0.151
 charming+0.150
 beautiful+0.146
Neg logit contributions
ors-0.172
but-0.168
fully-0.165
driver-0.159
 to-0.157
Token day
Token ID1110
Feature activation-1.10e+00

Pos logit contributions
 cute+0.099
 clever+0.098
 charming+0.093
 gentle+0.092
 small+0.091
Neg logit contributions
but-0.131
 to-0.131
ors-0.129
fully-0.128
oes-0.128
Token,
Token ID11
Feature activation+3.41e+00

Pos logit contributions
 cute+0.039
 small+0.038
 clever+0.038
 charming+0.037
 funny+0.036
Neg logit contributions
ors-0.029
but-0.028
oes-0.028
fully-0.027
licks-0.026
Token her
Token ID607
Feature activation+3.82e+00

Pos logit contributions
ors+0.085
but+0.077
oes+0.077
fully+0.075
driver+0.075
Neg logit contributions
 cute-0.128
 clever-0.125
 small-0.124
 charming-0.123
 beautiful-0.121
Token named
Token ID3706
Feature activation+1.60e+00

Pos logit contributions
 to+0.443
oes+0.398
 him+0.369
but+0.366
 Stop+0.363
Neg logit contributions
 cute-0.725
 clever-0.697
 charming-0.687
 wonderful-0.663
 beautiful-0.660
Token Tim
Token ID5045
Feature activation+2.67e+00

Pos logit contributions
ors+0.042
but+0.040
fully+0.040
 to+0.040
oes+0.037
Neg logit contributions
 cute-0.053
 clever-0.053
 charming-0.052
 gentle-0.051
 small-0.051
Tokenmy
Token ID1820
Feature activation+1.14e+01

Pos logit contributions
 to+0.060
ors+0.060
but+0.059
fully+0.058
oes+0.056
Neg logit contributions
 cute-0.098
 clever-0.098
 charming-0.097
 beautiful-0.095
 small-0.094
Token who
Token ID508
Feature activation+1.10e+01

Pos logit contributions
 to+0.304
ors+0.299
but+0.293
fully+0.289
oes+0.270
Neg logit contributions
 cute-0.374
 clever-0.373
 charming-0.367
 beautiful-0.359
 magical-0.358
Token loved
Token ID6151
Feature activation-1.62e+00

Pos logit contributions
 to+0.313
ors+0.294
 him+0.278
t+0.269
to+0.266
Neg logit contributions
 gentle-0.332
 clever-0.320
 cute-0.316
 charming-0.311
 soft-0.308
Token to
Token ID284
Feature activation+2.36e+01

Pos logit contributions
 cute+0.080
 small+0.078
 clever+0.078
 charming+0.078
 beautiful+0.078
Neg logit contributions
but-0.020
ors-0.019
fully-0.017
or-0.017
driver-0.017
Token eat
Token ID4483
Feature activation+5.82e+00

Pos logit contributions
 to+0.624
ors+0.607
fully+0.595
but+0.570
oes+0.565
Neg logit contributions
 clever-0.748
 cute-0.748
 charming-0.726
 small-0.707
 beautiful-0.698
Token ice
Token ID4771
Feature activation+2.30e+00

Pos logit contributions
ors+0.209
but+0.205
fully+0.201
 to+0.200
oes+0.193
Neg logit contributions
 cute-0.138
 clever-0.137
 charming-0.132
 small-0.130
 gentle-0.128
Token cream
Token ID8566
Feature activation+6.41e+00

Pos logit contributions
 Stop+0.097
ors+0.097
licks+0.094
but+0.094
fully+0.093
Neg logit contributions
 cute-0.040
 clever-0.039
 small-0.039
 cool-0.039
 charming-0.038
Token.
Token ID13
Feature activation-4.34e+00

Pos logit contributions
ors+0.208
 Stop+0.200
licks+0.197
osity+0.196
uld+0.194
Neg logit contributions
 cute-0.174
 cool-0.173
 small-0.172
 big-0.172
 clever-0.170
Token One
Token ID1881
Feature activation-2.45e+00

Pos logit contributions
 cute+0.115
 small+0.113
 charming+0.110
 cool+0.110
 clever+0.108
Neg logit contributions
fully-0.144
but-0.143
ors-0.143
or-0.141
driver-0.136
Token loved
Token ID6151
Feature activation-4.44e+00

Pos logit contributions
 to+0.480
but+0.443
ors+0.440
fully+0.436
to+0.413
Neg logit contributions
 clever-0.423
 cute-0.422
 small-0.415
 charming-0.412
 gentle-0.401
Token to
Token ID284
Feature activation+2.13e+01

Pos logit contributions
 cute+0.220
 beautiful+0.214
 pretty+0.214
 small+0.213
 charming+0.213
Neg logit contributions
ors-0.061
but-0.060
or-0.059
ancing-0.058
driver-0.057
Token eat
Token ID4483
Feature activation+3.80e+00

Pos logit contributions
ors+0.570
 to+0.570
fully+0.551
but+0.541
oes+0.534
Neg logit contributions
 clever-0.669
 cute-0.667
 charming-0.647
 small-0.639
 beautiful-0.624
Token av
Token ID1196
Feature activation+2.18e-01

Pos logit contributions
ors+0.150
but+0.147
fully+0.144
oes+0.141
driver+0.141
Neg logit contributions
 cute-0.081
 clever-0.079
 small-0.077
 charming-0.076
 beautiful-0.074
Tokenoc
Token ID420
Feature activation-1.05e+01

Pos logit contributions
fully+0.007
but+0.007
ors+0.007
 Stop+0.007
driver+0.006
Neg logit contributions
 cute-0.006
 clever-0.006
 small-0.006
 charming-0.006
 beautiful-0.006
Tokenados
Token ID22484
Feature activation+2.34e+01

Pos logit contributions
 clever+0.405
 cute+0.395
 kind+0.393
 charming+0.390
 small+0.384
Neg logit contributions
but-0.193
fully-0.191
t-0.177
oes-0.174
ges-0.174
Token because
Token ID780
Feature activation+9.11e+00

Pos logit contributions
 Stop+0.729
ors+0.683
driver+0.656
ancing+0.637
ges+0.630
Neg logit contributions
 small-0.747
 big-0.735
 pretty-0.727
 beautiful-0.724
 clever-0.720
Token they
Token ID484
Feature activation+1.56e+01

Pos logit contributions
ancing+0.224
ors+0.221
amb+0.207
 Stop+0.201
oes+0.196
Neg logit contributions
 small-0.361
 cute-0.355
 pretty-0.348
 big-0.347
 beautiful-0.347
Token were
Token ID547
Feature activation+1.91e+00

Pos logit contributions
ors+0.422
but+0.413
fully+0.402
 to+0.390
oes+0.381
Neg logit contributions
 cute-0.510
 clever-0.502
 charming-0.492
 small-0.491
 gentle-0.484
Token y
Token ID331
Feature activation-1.90e+01

Pos logit contributions
 to+0.072
ors+0.071
but+0.071
fully+0.069
oes+0.066
Neg logit contributions
 cute-0.038
 clever-0.038
 charming-0.036
 gentle-0.036
 small-0.036
Tokenummy
Token ID13513
Feature activation+8.77e+00

Pos logit contributions
 cute+0.646
 charming+0.623
 pretty+0.620
 small+0.618
 beautiful+0.617
Neg logit contributions
but-0.489
fully-0.482
driver-0.469
rings-0.441
 Stop-0.435
Token would
Token ID561
Feature activation+1.15e+01

Pos logit contributions
ors+0.354
but+0.344
fully+0.334
 to+0.323
oes+0.318
Neg logit contributions
 cute-0.468
 clever-0.462
 charming-0.453
 small-0.451
 gentle-0.443
Token prepare
Token ID8335
Feature activation+5.10e+00

Pos logit contributions
ors+0.359
but+0.348
fully+0.339
oes+0.325
driver+0.324
Neg logit contributions
 cute-0.342
 clever-0.339
 small-0.329
 charming-0.328
 gentle-0.326
Token everything
Token ID2279
Feature activation+1.04e+01

Pos logit contributions
ors+0.147
but+0.143
fully+0.137
oes+0.135
driver+0.131
Neg logit contributions
 cute-0.163
 clever-0.163
 small-0.161
 gentle-0.158
 charming-0.158
Token for
Token ID329
Feature activation+1.26e+01

Pos logit contributions
ors+0.236
but+0.216
licks+0.213
iked+0.213
ges+0.213
Neg logit contributions
 clever-0.403
 cute-0.400
 small-0.399
 beautiful-0.394
 charming-0.390
Token the
Token ID262
Feature activation-3.52e+00

Pos logit contributions
ors+0.261
but+0.252
fully+0.252
 to+0.247
or+0.226
Neg logit contributions
 cute-0.484
 clever-0.475
 charming-0.469
 small-0.462
 beautiful-0.456
Token inspection
Token ID15210
Feature activation+2.34e+01

Pos logit contributions
 cute+0.081
 clever+0.078
 charming+0.077
 gentle+0.074
 small+0.073
Neg logit contributions
 to-0.125
but-0.118
fully-0.116
ors-0.116
oes-0.115
Token.
Token ID13
Feature activation-7.64e+00

Pos logit contributions
ors+0.631
but+0.615
fully+0.596
oes+0.571
 to+0.568
Neg logit contributions
 cute-0.775
 clever-0.766
 charming-0.748
 small-0.747
 gentle-0.734
Token Together
Token ID17083
Feature activation+3.52e+00

Pos logit contributions
 cute+0.256
 clever+0.253
 small+0.253
 gentle+0.253
 kind+0.251
Neg logit contributions
ors-0.204
fully-0.198
but-0.185
or-0.175
iked-0.172
Token,
Token ID11
Feature activation+1.37e+00

Pos logit contributions
but+0.090
ors+0.089
licks+0.084
ges+0.082
fully+0.081
Neg logit contributions
 small-0.127
 cute-0.127
 clever-0.125
 beautiful-0.124
 gentle-0.124
Token they
Token ID484
Feature activation+1.38e+01

Pos logit contributions
ors+0.035
oes+0.031
fully+0.031
but+0.031
licks+0.030
Neg logit contributions
 clever-0.052
 gentle-0.051
 small-0.051
 cute-0.051
 beautiful-0.050
Token worked
Token ID3111
Feature activation-5.21e+00

Pos logit contributions
ors+0.380
but+0.378
fully+0.357
oes+0.337
ges+0.336
Neg logit contributions
 clever-0.464
 cute-0.457
 small-0.449
 gentle-0.446
 kind-0.446
Token girl
Token ID2576
Feature activation+7.43e+00

Pos logit contributions
aver+0.155
 youngest+0.141
ady+0.138
aky+0.128
 clever+0.125
Neg logit contributions
 to-0.514
driver-0.424
ing-0.412
 it-0.409
 we-0.408
Token named
Token ID3706
Feature activation+1.20e-01

Pos logit contributions
ors+0.222
but+0.217
fully+0.211
oes+0.203
 to+0.203
Neg logit contributions
 cute-0.223
 clever-0.220
 charming-0.215
 small-0.214
 gentle-0.210
Token Sue
Token ID26089
Feature activation+4.37e+00

Pos logit contributions
ors+0.003
but+0.003
fully+0.003
driver+0.003
oes+0.003
Neg logit contributions
 cute-0.004
 clever-0.004
 small-0.004
 charming-0.004
 beautiful-0.004
Token who
Token ID508
Feature activation+1.02e+01

Pos logit contributions
ors+0.116
but+0.115
 to+0.113
fully+0.112
oes+0.105
Neg logit contributions
 cute-0.143
 clever-0.143
 charming-0.139
 small-0.137
 gentle-0.136
Token loved
Token ID6151
Feature activation-2.09e+00

Pos logit contributions
 to+0.289
ors+0.270
but+0.265
fully+0.260
 him+0.249
Neg logit contributions
 clever-0.299
 gentle-0.298
 cute-0.296
 charming-0.288
 small-0.286
Token to
Token ID284
Feature activation+2.34e+01

Pos logit contributions
 cute+0.094
 small+0.092
 beautiful+0.091
 charming+0.091
 clever+0.091
Neg logit contributions
but-0.035
ors-0.035
fully-0.032
driver-0.032
or-0.032
Token read
Token ID1100
Feature activation-1.52e-01

Pos logit contributions
 to+0.620
ors+0.616
fully+0.600
but+0.582
oes+0.568
Neg logit contributions
 clever-0.736
 cute-0.733
 charming-0.711
 small-0.698
 gentle-0.684
Token.
Token ID13
Feature activation-3.49e+00

Pos logit contributions
 clever+0.005
 cute+0.005
 charming+0.004
 small+0.004
 gentle+0.004
Neg logit contributions
ors-0.004
 to-0.004
but-0.004
fully-0.004
driver-0.004
Token She
Token ID1375
Feature activation+1.54e+01

Pos logit contributions
 cute+0.098
 small+0.097
 charming+0.094
 cool+0.093
 clever+0.092
Neg logit contributions
ors-0.116
fully-0.111
but-0.110
or-0.109
ting-0.105
Token read
Token ID1100
Feature activation-1.53e+00

Pos logit contributions
 to+0.438
ors+0.418
but+0.409
fully+0.409
driver+0.386
Neg logit contributions
 cute-0.467
 clever-0.458
 charming-0.449
 small-0.441
 gentle-0.440
Token big
Token ID1263
Feature activation-1.84e+01

Pos logit contributions
 cute+0.041
 clever+0.041
 small+0.040
 charming+0.040
 beautiful+0.039
Neg logit contributions
ors-0.052
but-0.051
fully-0.049
oes-0.048
driver-0.047
Token<|endoftext|>
Token ID50256
Feature activation+6.24e+00

Pos logit contributions
 cute+0.218
 clever+0.211
 charming+0.202
 playful+0.195
aver+0.194
Neg logit contributions
 to-0.495
driver-0.432
but-0.430
oes-0.426
to-0.424
TokenAmy
Token ID40797
Feature activation+8.36e+00

Pos logit contributions
but+0.159
ors+0.159
 to+0.157
fully+0.150
ges+0.150
Neg logit contributions
 clever-0.212
 cute-0.211
 charming-0.205
 gentle-0.201
 beautiful-0.199
Token was
Token ID373
Feature activation+5.12e+00

Pos logit contributions
ors+0.153
but+0.148
fully+0.141
 to+0.132
oes+0.132
Neg logit contributions
 cute-0.348
 clever-0.345
 charming-0.338
 small-0.338
 gentle-0.333
Token so
Token ID523
Feature activation+9.73e+00

Pos logit contributions
 to+0.138
but+0.129
ors+0.128
fully+0.126
ges+0.118
Neg logit contributions
 cute-0.161
 clever-0.160
 charming-0.156
 kind-0.152
 wonderful-0.151
Token excited
Token ID6568
Feature activation-1.08e+01

Pos logit contributions
 to+0.359
but+0.336
fully+0.323
to+0.317
t+0.309
Neg logit contributions
 clever-0.186
 cute-0.183
 magical-0.180
 charming-0.180
 gentle-0.176
Token to
Token ID284
Feature activation+2.34e+01

Pos logit contributions
 beautiful+0.363
 cute+0.361
 little+0.360
 big+0.356
 funny+0.354
Neg logit contributions
licks-0.293
driver-0.289
 Stop-0.265
ors-0.262
or-0.262
Token go
Token ID467
Feature activation+9.87e+00

Pos logit contributions
ors+0.712
but+0.698
fully+0.663
oes+0.644
driver+0.642
Neg logit contributions
 cute-0.716
 clever-0.710
 small-0.699
 charming-0.691
 cool-0.685
Token on
Token ID319
Feature activation+1.23e+01

Pos logit contributions
ors+0.243
but+0.229
fully+0.221
licks+0.219
 Stop+0.217
Neg logit contributions
 big-0.398
 cool-0.395
 pretty-0.379
 small-0.378
 beautiful-0.376
Token her
Token ID607
Feature activation+5.18e+00

Pos logit contributions
 to+0.437
ors+0.436
but+0.428
fully+0.418
oes+0.399
Neg logit contributions
 cute-0.294
 clever-0.293
 charming-0.278
 gentle-0.276
 small-0.272
Token trip
Token ID5296
Feature activation+6.15e+00

Pos logit contributions
 to+0.187
ors+0.172
but+0.167
oes+0.162
fully+0.162
Neg logit contributions
 clever-0.118
 cute-0.116
 gentle-0.112
 kind-0.112
 charming-0.109
Token to
Token ID284
Feature activation+2.17e+01

Pos logit contributions
ors+0.200
licks+0.193
 Stop+0.189
ges+0.188
driver+0.185
Neg logit contributions
 small-0.167
 kind-0.165
 big-0.165
 beautiful-0.164
 charming-0.164
Token you
Token ID345
Feature activation+1.44e+01

Pos logit contributions
but+0.170
ors+0.165
fully+0.158
driver+0.145
or+0.144
Neg logit contributions
 cute-0.364
 clever-0.356
 small-0.353
 funny-0.347
 charming-0.346
Token know
Token ID760
Feature activation-2.08e-01

Pos logit contributions
ors+0.457
but+0.447
fully+0.433
driver+0.416
oes+0.413
Neg logit contributions
 cute-0.418
 clever-0.413
 small-0.402
 charming-0.400
 gentle-0.395
Token that
Token ID326
Feature activation+1.22e+01

Pos logit contributions
 cute+0.008
 small+0.008
 clever+0.008
 beautiful+0.008
 charming+0.008
Neg logit contributions
ors-0.005
but-0.005
driver-0.004
fully-0.004
or-0.004
Token av
Token ID1196
Feature activation+2.49e-01

Pos logit contributions
ors+0.278
but+0.270
fully+0.264
 to+0.262
oes+0.249
Neg logit contributions
 cute-0.446
 clever-0.444
 charming-0.435
 small-0.430
 gentle-0.426
Tokenoc
Token ID420
Feature activation-1.07e+01

Pos logit contributions
but+0.008
ors+0.008
fully+0.008
iked+0.008
 Stop+0.008
Neg logit contributions
 cute-0.007
 clever-0.007
 charming-0.006
 small-0.006
 beautiful-0.006
Tokenados
Token ID22484
Feature activation+2.34e+01

Pos logit contributions
 clever+0.411
 kind+0.408
 cute+0.392
 charming+0.391
 humble+0.387
Neg logit contributions
but-0.183
fully-0.182
ges-0.176
oes-0.167
licks-0.163
Token are
Token ID389
Feature activation+6.75e+00

Pos logit contributions
ors+0.670
but+0.666
fully+0.641
driver+0.609
iked+0.609
Neg logit contributions
 cute-0.746
 clever-0.740
 small-0.717
 cool-0.716
 beautiful-0.715
Token very
Token ID845
Feature activation-2.72e+00

Pos logit contributions
ors+0.215
but+0.212
 to+0.210
fully+0.207
oes+0.199
Neg logit contributions
 cute-0.183
 clever-0.179
 charming-0.174
 small-0.172
 gentle-0.170
Token healthy
Token ID5448
Feature activation-1.24e+01

Pos logit contributions
 cute+0.024
 clever+0.023
 charming+0.021
 gentle+0.020
 small+0.020
Neg logit contributions
ors-0.133
 to-0.132
but-0.132
fully-0.128
oes-0.127
Token for
Token ID329
Feature activation+1.30e+01

Pos logit contributions
 kind+0.394
 small+0.384
 cute+0.382
 big+0.379
 clever+0.379
Neg logit contributions
ors-0.352
iked-0.343
but-0.339
rings-0.335
licks-0.334
Token you
Token ID345
Feature activation+1.32e+01

Pos logit contributions
ors+0.325
 to+0.313
but+0.307
fully+0.306
oes+0.284
Neg logit contributions
 cute-0.446
 clever-0.439
 charming-0.433
 small-0.422
 gentle-0.419
Token clever
Token ID14169
Feature activation-1.29e+01

Pos logit contributions
 cute+0.397
 clever+0.384
 charming+0.370
 gentle+0.358
 small+0.357
Neg logit contributions
 to-0.731
fully-0.713
ors-0.702
but-0.684
driver-0.662
Token all
Token ID477
Feature activation+7.89e+00

Pos logit contributions
 cute+0.365
 clever+0.360
 small+0.354
 charming+0.350
 kind+0.349
Neg logit contributions
ors-0.416
but-0.410
fully-0.385
oes-0.381
licks-0.380
Tokenigator
Token ID23823
Feature activation+3.79e+00

Pos logit contributions
ors+0.228
but+0.222
fully+0.213
driver+0.208
oes+0.206
Neg logit contributions
 cute-0.249
 clever-0.246
 small-0.241
 charming-0.241
 gentle-0.237
Token who
Token ID508
Feature activation+1.07e+01

Pos logit contributions
ors+0.111
but+0.106
driver+0.103
fully+0.103
oes+0.101
Neg logit contributions
 clever-0.116
 cute-0.116
 kind-0.112
 small-0.112
 charming-0.111
Token loved
Token ID6151
Feature activation-1.90e+00

Pos logit contributions
ors+0.340
but+0.333
fully+0.325
 to+0.315
oes+0.312
Neg logit contributions
 cute-0.300
 clever-0.296
 charming-0.288
 small-0.287
 gentle-0.281
Token to
Token ID284
Feature activation+2.33e+01

Pos logit contributions
 cute+0.082
 small+0.080
 clever+0.080
 charming+0.079
 beautiful+0.079
Neg logit contributions
ors-0.035
but-0.033
driver-0.032
fully-0.030
or-0.030
Token carry
Token ID3283
Feature activation+5.54e+00

Pos logit contributions
ors+0.653
but+0.633
fully+0.631
 to+0.630
oes+0.597
Neg logit contributions
 cute-0.728
 clever-0.716
 charming-0.700
 small-0.691
 beautiful-0.677
Token things
Token ID1243
Feature activation+1.77e+01

Pos logit contributions
ors+0.152
but+0.151
fully+0.145
oes+0.141
driver+0.138
Neg logit contributions
 cute-0.181
 clever-0.178
 small-0.176
 charming-0.174
 beautiful-0.172
Token.
Token ID13
Feature activation-4.87e+00

Pos logit contributions
 Stop+0.537
licks+0.508
ors+0.503
ges+0.486
iked+0.486
Neg logit contributions
 small-0.566
 big-0.558
 cool-0.555
 beautiful-0.553
 pretty-0.553
Token He
Token ID679
Feature activation+1.12e+01

Pos logit contributions
 cute+0.153
 small+0.150
 clever+0.148
 charming+0.147
 gentle+0.143
Neg logit contributions
ors-0.139
fully-0.134
but-0.134
driver-0.128
or-0.126
Token went
Token ID1816
Feature activation+7.73e-01

Pos logit contributions
ors+0.316
 to+0.313
but+0.305
fully+0.302
driver+0.289
Neg logit contributions
 cute-0.341
 clever-0.329
 charming-0.328
 small-0.319
 beautiful-0.319
Token
Token ID26391
Feature activation+5.21e+00

Pos logit contributions
ors+0.016
but+0.016
oes+0.016
fully+0.014
rings+0.014
Neg logit contributions
 cute-0.035
 small-0.035
 clever-0.035
 cool-0.034
 funny-0.034
Token
Token ID8151
Feature activation+9.43e+00

Pos logit contributions
ors+0.136
but+0.134
fully+0.128
 to+0.125
oes+0.121
Neg logit contributions
 cute-0.176
 clever-0.174
 charming-0.170
 small-0.169
 gentle-0.167
Tokent
Token ID83
Feature activation+1.54e+01

Pos logit contributions
ors+0.199
but+0.184
fully+0.183
driver+0.174
oes+0.171
Neg logit contributions
 cute-0.360
 clever-0.354
 small-0.348
 charming-0.348
 beautiful-0.342
Token know
Token ID760
Feature activation+2.59e+00

Pos logit contributions
ors+0.425
fully+0.421
but+0.418
 to+0.414
oes+0.391
Neg logit contributions
 cute-0.483
 clever-0.480
 charming-0.468
 small-0.462
 gentle-0.455
Token what
Token ID644
Feature activation+1.49e+01

Pos logit contributions
ors+0.060
but+0.059
fully+0.056
oes+0.054
driver+0.053
Neg logit contributions
 cute-0.096
 clever-0.095
 small-0.093
 charming-0.093
 beautiful-0.091
Token to
Token ID284
Feature activation+2.33e+01

Pos logit contributions
ors+0.268
but+0.259
fully+0.245
oes+0.227
driver+0.225
Neg logit contributions
 cute-0.640
 clever-0.633
 small-0.623
 charming-0.621
 beautiful-0.614
Token do
Token ID466
Feature activation+6.92e+00

Pos logit contributions
ors+0.671
but+0.651
 to+0.651
fully+0.648
oes+0.629
Neg logit contributions
 cute-0.694
 clever-0.690
 charming-0.669
 small-0.660
 gentle-0.651
Token.
Token ID13
Feature activation-4.66e+00

Pos logit contributions
ors+0.234
oes+0.228
but+0.228
licks+0.224
fully+0.224
Neg logit contributions
 clever-0.181
 cute-0.179
 small-0.176
 kind-0.175
 beautiful-0.173
Token She
Token ID1375
Feature activation+1.54e+01

Pos logit contributions
 cute+0.169
 small+0.166
 clever+0.165
 charming+0.162
 gentle+0.160
Neg logit contributions
ors-0.111
fully-0.105
but-0.103
driver-0.097
or-0.097
Token wanted
Token ID2227
Feature activation-4.65e+00

Pos logit contributions
ors+0.474
 to+0.457
but+0.453
fully+0.450
driver+0.432
Neg logit contributions
 cute-0.440
 clever-0.430
 charming-0.425
 small-0.415
 beautiful-0.410
Token to
Token ID284
Feature activation+2.09e+01

Pos logit contributions
 small+0.231
 cute+0.229
 clever+0.228
 beautiful+0.226
 big+0.225
Neg logit contributions
ors-0.058
but-0.051
fully-0.048
amb-0.046
ancing-0.045
Token<|endoftext|>
Token ID50256
Feature activation+6.24e+00

Pos logit contributions
 cute+0.009
 clever+0.009
 small+0.008
 charming+0.008
 gentle+0.008
Neg logit contributions
ors-0.010
fully-0.009
 to-0.008
or-0.008
but-0.008
TokenBen
Token ID11696
Feature activation+2.44e+00

Pos logit contributions
but+0.159
ors+0.159
 to+0.157
fully+0.150
ges+0.150
Neg logit contributions
 clever-0.212
 cute-0.211
 charming-0.205
 gentle-0.201
 beautiful-0.199
Token and
Token ID290
Feature activation+1.88e+01

Pos logit contributions
ors+0.043
but+0.042
fully+0.039
driver+0.037
oes+0.036
Neg logit contributions
 cute-0.103
 clever-0.102
 small-0.101
 charming-0.100
 gentle-0.099
Token Mia
Token ID32189
Feature activation-1.01e-02

Pos logit contributions
ors+0.524
but+0.513
fully+0.501
 to+0.489
oes+0.479
Neg logit contributions
 cute-0.604
 clever-0.598
 charming-0.584
 small-0.579
 gentle-0.571
Token liked
Token ID8288
Feature activation-5.55e+00

Pos logit contributions
 clever+0.000
 cute+0.000
 charming+0.000
 magical+0.000
 gentle+0.000
Neg logit contributions
 to-0.000
fully-0.000
ors-0.000
but-0.000
 Stop-0.000
Token to
Token ID284
Feature activation+2.33e+01

Pos logit contributions
 cute+0.312
 clever+0.309
 small+0.306
 charming+0.306
 beautiful+0.303
Neg logit contributions
ors-0.025
but-0.020
fully-0.015
driver-0.009
oes-0.009
Token play
Token ID711
Feature activation+6.87e-02

Pos logit contributions
 to+0.609
ors+0.579
fully+0.573
oes+0.563
but+0.557
Neg logit contributions
 cute-0.749
 clever-0.746
 charming-0.725
 small-0.708
 gentle-0.699
Token near
Token ID1474
Feature activation+7.77e+00

Pos logit contributions
ors+0.001
but+0.001
fully+0.001
 to+0.001
oes+0.001
Neg logit contributions
 cute-0.003
 clever-0.003
 charming-0.003
 small-0.003
 gentle-0.003
Token the
Token ID262
Feature activation-4.40e-01

Pos logit contributions
 to+0.195
ors+0.176
fully+0.161
 him+0.161
ges+0.157
Neg logit contributions
 clever-0.269
 cute-0.260
 charming-0.256
 gentle-0.251
 playful-0.249
Token harbor
Token ID25451
Feature activation+4.62e+00

Pos logit contributions
 clever+0.010
 gentle+0.009
 cute+0.009
 compassionate+0.009
 kind+0.009
Neg logit contributions
 to-0.016
oes-0.014
ors-0.014
ing-0.014
but-0.014
Token.
Token ID13
Feature activation-4.37e+00

Pos logit contributions
ors+0.142
but+0.138
fully+0.134
driver+0.130
oes+0.129
Neg logit contributions
 cute-0.136
 clever-0.134
 small-0.131
 charming-0.131
 beautiful-0.128
TokenJane
Token ID41083
Feature activation+1.26e+01

Pos logit contributions
ors+0.121
but+0.118
fully+0.115
oes+0.111
licks+0.109
Neg logit contributions
 clever-0.158
 cute-0.157
 small-0.157
 kind-0.155
 charming-0.153
Token did
Token ID750
Feature activation+5.10e+00

Pos logit contributions
ors+0.400
but+0.392
fully+0.383
 to+0.372
oes+0.368
Neg logit contributions
 cute-0.355
 clever-0.350
 charming-0.341
 small-0.339
 gentle-0.332
Token not
Token ID407
Feature activation+1.48e+01

Pos logit contributions
ors+0.155
but+0.150
fully+0.146
oes+0.142
driver+0.140
Neg logit contributions
 cute-0.154
 clever-0.152
 small-0.148
 charming-0.148
 beautiful-0.145
Token know
Token ID760
Feature activation+3.78e+00

Pos logit contributions
 to+0.377
but+0.375
fully+0.373
ors+0.372
oes+0.350
Neg logit contributions
 cute-0.486
 clever-0.475
 charming-0.471
 small-0.466
 beautiful-0.456
Token what
Token ID644
Feature activation+1.53e+01

Pos logit contributions
ors+0.083
but+0.079
oes+0.076
fully+0.074
or+0.072
Neg logit contributions
 cute-0.147
 clever-0.146
 small-0.144
 charming-0.142
 beautiful-0.142
Token to
Token ID284
Feature activation+2.32e+01

Pos logit contributions
ors+0.300
but+0.289
fully+0.277
 to+0.261
oes+0.260
Neg logit contributions
 cute-0.622
 clever-0.616
 charming-0.604
 small-0.603
 gentle-0.594
Token do
Token ID466
Feature activation+6.87e+00

Pos logit contributions
ors+0.731
but+0.716
fully+0.701
 to+0.683
oes+0.676
Neg logit contributions
 cute-0.664
 clever-0.655
 charming-0.637
 small-0.634
 gentle-0.621
Token and
Token ID290
Feature activation+1.56e+01

Pos logit contributions
ors+0.249
oes+0.240
licks+0.238
iked+0.237
rings+0.237
Neg logit contributions
 clever-0.163
 cute-0.161
 kind-0.159
 small-0.159
 funny-0.156
Token felt
Token ID2936
Feature activation+9.38e-01

Pos logit contributions
ors+0.475
but+0.461
fully+0.445
oes+0.428
driver+0.427
Neg logit contributions
 cute-0.479
 clever-0.474
 small-0.463
 charming-0.459
 kind-0.454
Token very
Token ID845
Feature activation-1.11e+00

Pos logit contributions
 to+0.037
but+0.036
ors+0.035
fully+0.035
 Stop+0.034
Neg logit contributions
 cute-0.018
 clever-0.017
 charming-0.017
 gentle-0.016
 small-0.016
Token scared
Token ID12008
Feature activation-1.50e+01

Pos logit contributions
 cute+0.017
aver+0.017
 charming+0.016
 gentle+0.015
 lovely+0.015
Neg logit contributions
 to-0.044
t-0.040
but-0.040
ors-0.040
or-0.039
Token boy
Token ID2933
Feature activation+7.60e+00

Pos logit contributions
aver+0.042
 cute+0.038
 wonderful+0.036
 clever+0.034
 gentle+0.034
Neg logit contributions
 to-0.136
but-0.123
ors-0.122
t-0.120
or-0.119
Token named
Token ID3706
Feature activation+6.81e-02

Pos logit contributions
ors+0.244
but+0.238
fully+0.230
oes+0.224
driver+0.221
Neg logit contributions
 cute-0.214
 clever-0.213
 small-0.206
 charming-0.204
 gentle-0.203
Token George
Token ID4502
Feature activation+8.55e+00

Pos logit contributions
ors+0.002
but+0.002
fully+0.002
driver+0.001
oes+0.001
Neg logit contributions
 cute-0.002
 clever-0.002
 small-0.002
 charming-0.002
 beautiful-0.002
Token who
Token ID508
Feature activation+9.63e+00

Pos logit contributions
ors+0.228
 to+0.224
fully+0.223
but+0.223
ges+0.205
Neg logit contributions
 cute-0.277
 clever-0.277
 charming-0.271
 small-0.266
 beautiful-0.265
Token liked
Token ID8288
Feature activation-5.19e+00

Pos logit contributions
 to+0.285
ors+0.280
but+0.263
fully+0.257
to+0.250
Neg logit contributions
 gentle-0.273
 clever-0.271
 cute-0.270
 charming-0.265
 small-0.260
Token to
Token ID284
Feature activation+2.32e+01

Pos logit contributions
 cute+0.273
 small+0.269
 charming+0.266
 beautiful+0.266
 clever+0.266
Neg logit contributions
ors-0.049
but-0.049
or-0.041
fully-0.040
driver-0.039
Token fly
Token ID6129
Feature activation+3.60e+00

Pos logit contributions
ors+0.620
 to+0.609
fully+0.599
but+0.583
driver+0.575
Neg logit contributions
 cute-0.733
 clever-0.728
 charming-0.708
 small-0.693
 beautiful-0.682
Token his
Token ID465
Feature activation+2.40e+00

Pos logit contributions
ors+0.098
but+0.095
fully+0.092
driver+0.088
oes+0.088
Neg logit contributions
 cute-0.121
 clever-0.119
 small-0.117
 charming-0.116
 beautiful-0.115
Token k
Token ID479
Feature activation-7.09e+00

Pos logit contributions
 to+0.086
ors+0.084
fully+0.080
but+0.080
oes+0.078
Neg logit contributions
 clever-0.054
 cute-0.053
 charming-0.051
 kind-0.049
 gentle-0.049
Tokenite
Token ID578
Feature activation+1.37e+01

Pos logit contributions
 cute+0.282
 small+0.277
 clever+0.274
 charming+0.273
 beautiful+0.269
Neg logit contributions
fully-0.142
but-0.138
ors-0.137
driver-0.124
ancing-0.120
Token.
Token ID13
Feature activation-5.07e+00

Pos logit contributions
ors+0.377
but+0.370
fully+0.355
driver+0.344
 Stop+0.344
Neg logit contributions
 cute-0.442
 clever-0.436
 small-0.434
 charming-0.429
 beautiful-0.424
Token out
Token ID503
Feature activation+8.97e+00

Pos logit contributions
 cute+0.386
 small+0.384
 clever+0.376
 beautiful+0.373
 charming+0.372
Neg logit contributions
but-0.165
ors-0.165
oes-0.154
fully-0.148
or-0.143
Token that
Token ID326
Feature activation+1.21e+01

Pos logit contributions
ors+0.213
but+0.191
oes+0.190
or+0.183
rings+0.182
Neg logit contributions
 small-0.363
 cute-0.356
 clever-0.356
 beautiful-0.354
 gentle-0.353
Token the
Token ID262
Feature activation-2.35e+00

Pos logit contributions
ors+0.297
but+0.290
fully+0.287
 to+0.283
oes+0.268
Neg logit contributions
 cute-0.420
 clever-0.417
 charming-0.407
 small-0.401
 gentle-0.398
Token av
Token ID1196
Feature activation-3.58e-02

Pos logit contributions
 cute+0.054
 clever+0.054
 charming+0.052
 gentle+0.050
 funny+0.049
Neg logit contributions
 to-0.078
oes-0.070
fully-0.068
but-0.067
ting-0.066
Tokenoc
Token ID420
Feature activation-1.11e+01

Pos logit contributions
 clever+0.001
 cute+0.001
 small+0.001
 charming+0.001
 gentle+0.001
Neg logit contributions
but-0.001
fully-0.001
ors-0.001
 Stop-0.001
oes-0.001
Tokenados
Token ID22484
Feature activation+2.32e+01

Pos logit contributions
 clever+0.416
 kind+0.414
 charming+0.403
 small+0.401
 cute+0.401
Neg logit contributions
but-0.199
fully-0.194
ges-0.184
licks-0.180
oes-0.178
Token were
Token ID547
Feature activation+1.11e+00

Pos logit contributions
ors+0.716
but+0.705
fully+0.677
oes+0.675
ancing+0.648
Neg logit contributions
 cute-0.678
 clever-0.675
 small-0.659
 charming-0.644
 beautiful-0.641
Token bad
Token ID2089
Feature activation-1.61e+01

Pos logit contributions
 to+0.038
ors+0.037
but+0.036
fully+0.036
driver+0.034
Neg logit contributions
 cute-0.027
 clever-0.027
 charming-0.026
 small-0.025
 beautiful-0.025
Token and
Token ID290
Feature activation+1.33e+01

Pos logit contributions
 cute+0.542
 clever+0.531
 funny+0.530
 small+0.523
 kind+0.521
Neg logit contributions
but-0.461
or-0.421
oes-0.419
ors-0.410
licks-0.401
Token made
Token ID925
Feature activation+1.47e+00

Pos logit contributions
ors+0.432
but+0.422
fully+0.418
 to+0.415
driver+0.398
Neg logit contributions
 cute-0.354
 clever-0.349
 charming-0.341
 small-0.336
 gentle-0.329
Token the
Token ID262
Feature activation-3.00e+00

Pos logit contributions
ors+0.031
 to+0.030
fully+0.030
but+0.029
driver+0.026
Neg logit contributions
 cute-0.056
 clever-0.055
 charming-0.054
 small-0.053
 gentle-0.053
Token."
Token ID526
Feature activation+2.94e+00

Pos logit contributions
 cute+0.280
 clever+0.276
 small+0.273
 charming+0.272
 beautiful+0.267
Neg logit contributions
ors-0.131
but-0.123
fully-0.122
oes-0.112
driver-0.111
Token<|endoftext|>
Token ID50256
Feature activation+3.66e+00

Pos logit contributions
ors+0.087
fully+0.083
but+0.082
driver+0.079
oes+0.078
Neg logit contributions
 cute-0.091
 small-0.089
 clever-0.088
 charming-0.086
 gentle-0.086
TokenTom
Token ID13787
Feature activation+2.34e+00

Pos logit contributions
ors+0.096
but+0.094
 to+0.093
fully+0.090
oes+0.088
Neg logit contributions
 cute-0.122
 clever-0.122
 charming-0.119
 gentle-0.117
 small-0.116
Tokenmy
Token ID1820
Feature activation+9.88e+00

Pos logit contributions
ors+0.058
but+0.057
fully+0.054
driver+0.051
oes+0.050
Neg logit contributions
 cute-0.084
 clever-0.082
 small-0.082
 charming-0.081
 gentle-0.080
Token likes
Token ID7832
Feature activation+7.16e-01

Pos logit contributions
ors+0.277
but+0.276
fully+0.266
oes+0.254
driver+0.253
Neg logit contributions
 cute-0.319
 clever-0.313
 small-0.309
 charming-0.305
 gentle-0.302
Token to
Token ID284
Feature activation+2.32e+01

Pos logit contributions
ors+0.007
but+0.006
fully+0.005
driver+0.005
oes+0.005
Neg logit contributions
 cute-0.037
 clever-0.036
 small-0.036
 charming-0.036
 beautiful-0.036
Token use
Token ID779
Feature activation+7.07e+00

Pos logit contributions
ors+0.624
 to+0.621
fully+0.614
but+0.595
driver+0.594
Neg logit contributions
 cute-0.730
 clever-0.716
 charming-0.702
 small-0.684
 beautiful-0.678
Token his
Token ID465
Feature activation+1.66e+00

Pos logit contributions
ors+0.218
but+0.210
fully+0.207
 to+0.205
driver+0.199
Neg logit contributions
 cute-0.205
 clever-0.204
 charming-0.198
 small-0.195
 gentle-0.193
Token calendar
Token ID11845
Feature activation+1.12e+01

Pos logit contributions
 to+0.060
driver+0.054
ors+0.051
oes+0.050
rings+0.050
Neg logit contributions
 cute-0.035
 clever-0.035
 charming-0.034
 gentle-0.033
 kind-0.032
Token.
Token ID13
Feature activation-5.51e+00

Pos logit contributions
ors+0.277
but+0.266
fully+0.254
 Stop+0.247
ges+0.242
Neg logit contributions
 cute-0.402
 clever-0.400
 small-0.398
 charming-0.393
 beautiful-0.389
Token He
Token ID679
Feature activation+1.10e+01

Pos logit contributions
 small+0.191
 cute+0.189
 cool+0.186
 charming+0.186
 funny+0.185
Neg logit contributions
ors-0.142
but-0.137
or-0.133
fully-0.131
ting-0.128
Token weird
Token ID7650
Feature activation-1.06e+01

Pos logit contributions
 cute+0.182
 small+0.182
 clever+0.178
 charming+0.177
 beautiful+0.176
Neg logit contributions
ors-0.236
fully-0.223
but-0.223
driver-0.221
or-0.213
Token avocado
Token ID40377
Feature activation+1.38e+01

Pos logit contributions
 cute+0.299
 clever+0.288
 small+0.287
 charming+0.286
 beautiful+0.284
Neg logit contributions
ors-0.345
but-0.340
fully-0.320
licks-0.318
or-0.316
Token was
Token ID373
Feature activation+4.41e+00

Pos logit contributions
ors+0.403
but+0.394
fully+0.378
oes+0.377
ancing+0.374
Neg logit contributions
 cute-0.429
 clever-0.425
 small-0.415
 charming-0.412
 beautiful-0.412
Token very
Token ID845
Feature activation+1.32e+00

Pos logit contributions
ors+0.136
 to+0.136
but+0.135
fully+0.132
driver+0.126
Neg logit contributions
 cute-0.121
 clever-0.120
 charming-0.117
 gentle-0.114
 small-0.114
Token happy
Token ID3772
Feature activation-1.31e+01

Pos logit contributions
 to+0.060
but+0.054
t+0.054
ors+0.052
ho+0.052
Neg logit contributions
aver-0.011
 cute-0.011
 charming-0.011
 clever-0.011
 cozy-0.011
Token to
Token ID284
Feature activation+2.32e+01

Pos logit contributions
 cute+0.428
 beautiful+0.422
 kind+0.419
 charming+0.412
 funny+0.411
Neg logit contributions
ors-0.376
or-0.374
driver-0.364
licks-0.362
oes-0.361
Token be
Token ID307
Feature activation+8.83e+00

Pos logit contributions
ors+0.632
but+0.619
fully+0.608
 to+0.597
oes+0.579
Neg logit contributions
 cute-0.753
 clever-0.741
 charming-0.725
 small-0.720
 gentle-0.708
Token back
Token ID736
Feature activation+7.56e+00

Pos logit contributions
 to+0.241
but+0.232
fully+0.225
ors+0.223
ting+0.205
Neg logit contributions
 cute-0.283
 clever-0.271
 small-0.270
 charming-0.265
 funny-0.261
Token in
Token ID287
Feature activation+1.46e+01

Pos logit contributions
ors+0.177
fully+0.162
but+0.161
driver+0.159
licks+0.155
Neg logit contributions
 cute-0.290
 clever-0.289
 charming-0.284
 small-0.284
 kind-0.281
Token the
Token ID262
Feature activation-5.62e-01

Pos logit contributions
 to+0.422
but+0.404
ors+0.403
fully+0.394
ges+0.382
Neg logit contributions
 cute-0.439
 clever-0.434
 charming-0.419
 gentle-0.409
 small-0.408
Token tree
Token ID5509
Feature activation+7.54e-01

Pos logit contributions
 clever+0.013
 cute+0.013
 gentle+0.012
 playful+0.012
 charming+0.012
Neg logit contributions
 to-0.020
oes-0.017
but-0.017
ors-0.017
or-0.016
Token mom
Token ID1995
Feature activation+2.77e+00

Pos logit contributions
 cute+0.130
 clever+0.128
 charming+0.125
 small+0.125
 gentle+0.123
Neg logit contributions
ors-0.122
but-0.119
fully-0.116
 to-0.112
oes-0.111
Tokenmy
Token ID1820
Feature activation+1.22e+01

Pos logit contributions
ors+0.084
but+0.081
fully+0.077
driver+0.075
oes+0.074
Neg logit contributions
 cute-0.084
 clever-0.083
 small-0.081
 kind-0.080
 gentle-0.080
Token then
Token ID788
Feature activation+1.08e+01

Pos logit contributions
ors+0.345
but+0.341
fully+0.321
oes+0.317
driver+0.312
Neg logit contributions
 cute-0.386
 clever-0.380
 small-0.373
 charming-0.366
 gentle-0.366
Token told
Token ID1297
Feature activation-5.46e+00

Pos logit contributions
ors+0.304
but+0.293
fully+0.280
oes+0.275
driver+0.268
Neg logit contributions
 cute-0.354
 clever-0.350
 small-0.343
 charming-0.340
 gentle-0.337
Token him
Token ID683
Feature activation+1.76e+01

Pos logit contributions
 cute+0.196
 clever+0.193
 small+0.190
 charming+0.189
 beautiful+0.187
Neg logit contributions
ors-0.131
but-0.131
oes-0.121
fully-0.121
driver-0.120
Token to
Token ID284
Feature activation+2.31e+01

Pos logit contributions
ors+0.426
but+0.409
oes+0.385
amb+0.376
licks+0.375
Neg logit contributions
 clever-0.680
 cute-0.676
 kind-0.670
 small-0.661
 funny-0.650
Token find
Token ID1064
Feature activation+1.21e+01

Pos logit contributions
ors+0.668
but+0.653
fully+0.637
 to+0.617
oes+0.610
Neg logit contributions
 cute-0.717
 clever-0.708
 charming-0.691
 small-0.687
 beautiful-0.674
Token the
Token ID262
Feature activation-1.96e-02

Pos logit contributions
 to+0.376
ors+0.371
fully+0.365
but+0.357
driver+0.342
Neg logit contributions
 cute-0.343
 clever-0.337
 charming-0.329
 small-0.315
 gentle-0.313
Token prize
Token ID11596
Feature activation+8.34e+00

Pos logit contributions
 cute+0.000
 clever+0.000
 charming+0.000
 gentle+0.000
 small+0.000
Neg logit contributions
 to-0.001
but-0.001
ors-0.001
fully-0.001
oes-0.001
Token.
Token ID13
Feature activation-4.70e+00

Pos logit contributions
ors+0.233
but+0.220
 Stop+0.217
licks+0.210
oes+0.208
Neg logit contributions
 clever-0.273
 cute-0.271
 small-0.271
 kind-0.268
 charming-0.265
Token Tim
Token ID5045
Feature activation-1.61e+00

Pos logit contributions
 cute+0.157
 small+0.154
 clever+0.153
 charming+0.151
 gentle+0.148
Neg logit contributions
ors-0.126
fully-0.118
but-0.116
or-0.113
oes-0.112
Token K
Token ID509
Feature activation-3.69e+00

Pos logit contributions
ors+0.098
but+0.081
 Stop+0.077
oot+0.074
fully+0.072
Neg logit contributions
 small-0.268
 kind-0.265
 clever-0.263
 cute-0.262
 gentle-0.262
Tokenait
Token ID4548
Feature activation+1.02e+01

Pos logit contributions
 cute+0.130
 clever+0.129
 charming+0.126
 small+0.125
 gentle+0.123
Neg logit contributions
ors-0.093
but-0.089
fully-0.088
 to-0.085
oes-0.084
Tokenlyn
Token ID6213
Feature activation+9.71e+00

Pos logit contributions
ors+0.278
but+0.270
fully+0.258
driver+0.255
oes+0.252
Neg logit contributions
 cute-0.326
 clever-0.323
 small-0.317
 charming-0.315
 gentle-0.311
Token who
Token ID508
Feature activation+1.01e+01

Pos logit contributions
ors+0.277
but+0.272
 to+0.269
fully+0.262
or+0.252
Neg logit contributions
 cute-0.297
 clever-0.294
 charming-0.288
 small-0.281
 gentle-0.278
Token wanted
Token ID2227
Feature activation-1.71e+00

Pos logit contributions
ors+0.315
but+0.309
fully+0.302
 to+0.300
or+0.288
Neg logit contributions
 cute-0.281
 clever-0.278
 charming-0.271
 small-0.268
 gentle-0.264
Token to
Token ID284
Feature activation+2.31e+01

Pos logit contributions
 cute+0.083
 clever+0.082
 small+0.082
 charming+0.081
 beautiful+0.081
Neg logit contributions
ors-0.022
but-0.020
fully-0.018
driver-0.017
licks-0.017
Token play
Token ID711
Feature activation-1.37e-01

Pos logit contributions
ors+0.611
but+0.598
 to+0.598
fully+0.586
oes+0.565
Neg logit contributions
 cute-0.750
 clever-0.738
 charming-0.725
 small-0.714
 gentle-0.700
Token with
Token ID351
Feature activation+1.11e+01

Pos logit contributions
 cute+0.005
 clever+0.005
 charming+0.005
 small+0.005
 gentle+0.005
Neg logit contributions
ors-0.003
but-0.003
fully-0.003
oes-0.003
 to-0.003
Token her
Token ID607
Feature activation+5.28e+00

Pos logit contributions
ors+0.274
 to+0.261
but+0.259
fully+0.256
driver+0.236
Neg logit contributions
 cute-0.384
 clever-0.384
 charming-0.375
 small-0.367
 gentle-0.365
Token silver
Token ID8465
Feature activation-7.88e+00

Pos logit contributions
 to+0.204
ors+0.194
but+0.190
fully+0.187
rings+0.183
Neg logit contributions
 clever-0.104
 cute-0.103
 charming-0.100
 gentle-0.096
 kind-0.095
Token marker
Token ID18364
Feature activation+5.04e+00

Pos logit contributions
 cute+0.186
 clever+0.183
 charming+0.177
 small+0.176
 gentle+0.172
Neg logit contributions
ors-0.286
but-0.281
fully-0.275
 to-0.267
oes-0.266
Token Tim
Token ID5045
Feature activation+3.85e+00

Pos logit contributions
ors+0.056
but+0.055
fully+0.053
 to+0.051
oes+0.050
Neg logit contributions
 cute-0.085
 clever-0.084
 charming-0.082
 small-0.082
 gentle-0.081
Tokenmy
Token ID1820
Feature activation+1.27e+01

Pos logit contributions
ors+0.100
but+0.098
fully+0.095
oes+0.090
 to+0.090
Neg logit contributions
 cute-0.131
 clever-0.130
 charming-0.126
 small-0.126
 gentle-0.124
Token was
Token ID373
Feature activation+4.99e+00

Pos logit contributions
ors+0.354
but+0.343
 to+0.342
fully+0.337
driver+0.321
Neg logit contributions
 cute-0.399
 clever-0.393
 charming-0.389
 small-0.382
 beautiful-0.379
Token very
Token ID845
Feature activation+1.70e+00

Pos logit contributions
 to+0.174
but+0.149
fully+0.143
 over+0.142
 him+0.140
Neg logit contributions
 cute-0.114
 clever-0.111
 charming-0.110
 magical-0.106
 pretty-0.105
Token eager
Token ID11069
Feature activation-9.09e+00

Pos logit contributions
 to+0.066
but+0.057
t+0.056
 after+0.055
't+0.054
Neg logit contributions
 magical-0.026
 wond-0.026
 wonderful-0.023
 unison-0.023
 charming-0.023
Token to
Token ID284
Feature activation+2.31e+01

Pos logit contributions
 little+0.397
 big+0.394
 fun+0.391
 kind+0.388
 small+0.384
Neg logit contributions
licks-0.183
ors-0.169
ancing-0.161
or-0.155
but-0.150
Token go
Token ID467
Feature activation+1.00e+01

Pos logit contributions
ors+0.637
but+0.621
fully+0.609
 to+0.609
oes+0.584
Neg logit contributions
 cute-0.743
 clever-0.729
 charming-0.716
 small-0.708
 beautiful-0.696
Token to
Token ID284
Feature activation+2.21e+01

Pos logit contributions
ors+0.246
fully+0.234
licks+0.229
 Stop+0.220
but+0.218
Neg logit contributions
 big-0.395
 cool-0.395
 clever-0.385
 small-0.384
 charming-0.380
Token the
Token ID262
Feature activation-7.25e-01

Pos logit contributions
ors+0.622
 to+0.620
but+0.594
fully+0.582
oes+0.563
Neg logit contributions
 cute-0.684
 clever-0.677
 charming-0.655
 gentle-0.641
 small-0.640
Token park
Token ID3952
Feature activation-2.33e+00

Pos logit contributions
 cute+0.018
 clever+0.018
 gentle+0.017
aver+0.017
 playful+0.017
Neg logit contributions
 to-0.024
but-0.021
reating-0.021
ting-0.020
oes-0.020
Token and
Token ID290
Feature activation+1.49e+01

Pos logit contributions
 small+0.079
 cute+0.079
 clever+0.079
 charming+0.079
 kind+0.079
Neg logit contributions
ors-0.061
licks-0.057
 Stop-0.056
but-0.055
fully-0.055
TokenOne
Token ID3198
Feature activation+2.61e+00

Pos logit contributions
ors+0.056
but+0.055
fully+0.053
 to+0.052
oes+0.051
Neg logit contributions
 cute-0.068
 clever-0.067
 charming-0.066
 small-0.065
 gentle-0.064
Token day
Token ID1110
Feature activation+5.08e+00

Pos logit contributions
 to+0.090
but+0.089
ors+0.088
oes+0.088
fully+0.087
Neg logit contributions
 cute-0.064
 clever-0.063
 charming-0.060
 gentle-0.059
 small-0.058
Token the
Token ID262
Feature activation+2.87e+00

Pos logit contributions
ors+0.135
but+0.131
oes+0.129
fully+0.124
licks+0.118
Neg logit contributions
 cute-0.175
 clever-0.172
 small-0.172
 funny-0.165
 charming-0.165
Token boy
Token ID2933
Feature activation+6.09e+00

Pos logit contributions
 to+0.100
oes+0.094
but+0.092
ors+0.090
driver+0.090
Neg logit contributions
 cute-0.063
 clever-0.061
 charming-0.060
 gentle-0.058
 cozy-0.058
Token went
Token ID1816
Feature activation+3.54e+00

Pos logit contributions
ors+0.198
but+0.197
fully+0.191
oes+0.182
driver+0.182
Neg logit contributions
 cute-0.169
 clever-0.166
 small-0.162
 charming-0.160
 kind-0.158
Token to
Token ID284
Feature activation+2.31e+01

Pos logit contributions
ors+0.080
fully+0.072
but+0.068
ately+0.066
licks+0.065
Neg logit contributions
 big-0.147
 small-0.144
 cool-0.141
 friendly-0.139
 gentle-0.139
Token the
Token ID262
Feature activation-1.26e-01

Pos logit contributions
 to+0.632
ors+0.579
but+0.565
oes+0.559
fully+0.555
Neg logit contributions
 cute-0.753
 clever-0.743
 charming-0.722
 gentle-0.703
 kind-0.701
Token bakery
Token ID38164
Feature activation+5.66e+00

Pos logit contributions
 cute+0.003
 clever+0.003
 gentle+0.003
aver+0.003
 playful+0.003
Neg logit contributions
 to-0.004
but-0.004
oes-0.004
ting-0.003
or-0.003
Token to
Token ID284
Feature activation+2.20e+01

Pos logit contributions
ors+0.165
licks+0.152
fully+0.151
 Stop+0.150
driver+0.149
Neg logit contributions
 small-0.186
 kind-0.182
 big-0.180
 charming-0.178
 beautiful-0.176
Token get
Token ID651
Feature activation+9.53e+00

Pos logit contributions
but+0.465
 to+0.460
ors+0.457
fully+0.443
oes+0.428
Neg logit contributions
 cute-0.833
 clever-0.821
 charming-0.802
 small-0.793
 gentle-0.787
Token a
Token ID257
Feature activation-1.02e+01

Pos logit contributions
 to+0.272
but+0.269
ors+0.268
fully+0.260
oes+0.249
Neg logit contributions
 cute-0.291
 clever-0.288
 charming-0.280
 small-0.273
 gentle-0.273
Token though
Token ID996
Feature activation+4.65e+00

Pos logit contributions
ors+0.102
but+0.097
fully+0.093
oes+0.087
driver+0.086
Neg logit contributions
 cute-0.239
 clever-0.237
 small-0.233
 charming-0.233
 beautiful-0.229
Token it
Token ID340
Feature activation+1.59e+01

Pos logit contributions
ors+0.080
but+0.078
fully+0.072
oes+0.069
driver+0.068
Neg logit contributions
 cute-0.203
 clever-0.200
 small-0.198
 charming-0.197
 beautiful-0.195
Token was
Token ID373
Feature activation-3.55e+00

Pos logit contributions
ors+0.446
but+0.429
 to+0.424
fully+0.415
oes+0.403
Neg logit contributions
 cute-0.497
 clever-0.487
 charming-0.481
 small-0.475
 gentle-0.474
Token just
Token ID655
Feature activation+5.92e+00

Pos logit contributions
 clever+0.057
 cute+0.056
 charming+0.053
 gentle+0.051
 playful+0.050
Neg logit contributions
 to-0.148
fully-0.133
ors-0.129
but-0.128
oes-0.126
Token a
Token ID257
Feature activation-1.47e+01

Pos logit contributions
 to+0.172
fully+0.165
ors+0.162
but+0.159
oes+0.152
Neg logit contributions
 cute-0.173
 clever-0.171
 charming-0.164
 small-0.162
 gentle-0.162
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
aver+0.179
 cute+0.170
 clever+0.168
 gentle+0.157
 charming+0.156
Neg logit contributions
 to-0.663
t-0.564
but-0.562
fully-0.556
 we-0.550
Token piece
Token ID3704
Feature activation-7.26e+00

Pos logit contributions
 cute+0.849
 clever+0.838
 charming+0.816
 small+0.811
 gentle+0.798
Neg logit contributions
ors-0.831
but-0.811
fully-0.793
 to-0.775
oes-0.761
Token,
Token ID11
Feature activation-3.77e-01

Pos logit contributions
 small+0.285
 cute+0.285
 clever+0.284
 kind+0.284
 beautiful+0.279
Neg logit contributions
ors-0.150
but-0.143
 Stop-0.133
to-0.131
licks-0.129
Token his
Token ID465
Feature activation-2.87e+00

Pos logit contributions
 cute+0.016
 clever+0.016
 small+0.015
 beautiful+0.015
 charming+0.015
Neg logit contributions
ors-0.008
oes-0.007
but-0.007
or-0.007
driver-0.007
Token dad
Token ID9955
Feature activation-2.62e+00

Pos logit contributions
 cute+0.064
aver+0.061
 clever+0.061
 charming+0.061
 gentle+0.059
Neg logit contributions
 to-0.102
ors-0.087
fully-0.087
but-0.085
driver-0.083
Token said
Token ID531
Feature activation-1.66e+01

Pos logit contributions
 cute+0.070
 clever+0.069
 charming+0.067
 small+0.067
 gentle+0.065
Neg logit contributions
ors-0.088
but-0.086
fully-0.084
oes-0.081
driver-0.080
Token!
Token ID0
Feature activation-1.63e+00

Pos logit contributions
 cute+0.421
 clever+0.414
 beautiful+0.410
 charming+0.405
 small+0.403
Neg logit contributions
ors-0.511
oes-0.510
but-0.509
licks-0.497
driver-0.493
Token<|endoftext|>
Token ID50256
Feature activation-4.36e+00

Pos logit contributions
 cute+0.051
 clever+0.050
 small+0.049
 charming+0.049
 beautiful+0.048
Neg logit contributions
ors-0.048
but-0.046
fully-0.046
 to-0.043
oes-0.043
TokenThere
Token ID1858
Feature activation+4.80e+00

Pos logit contributions
 cute+0.132
 clever+0.131
 charming+0.127
 small+0.126
 gentle+0.125
Neg logit contributions
ors-0.129
but-0.126
fully-0.123
 to-0.120
oes-0.118
Token was
Token ID373
Feature activation-3.33e+00

Pos logit contributions
ors+0.134
but+0.130
fully+0.127
 to+0.125
or+0.120
Neg logit contributions
 cute-0.153
 clever-0.152
 charming-0.148
 small-0.147
 gentle-0.146
Token a
Token ID257
Feature activation-1.45e+01

Pos logit contributions
 clever+0.074
 cute+0.074
 charming+0.070
 gentle+0.069
 kind+0.068
Neg logit contributions
 to-0.120
but-0.120
fully-0.118
ors-0.117
ges-0.110
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 comm+0.187
aver+0.181
 unexpected+0.166
 pain+0.163
 wond+0.162
Neg logit contributions
 to-0.543
 we-0.495
 after-0.479
 and-0.470
ho-0.469
Token girl
Token ID2576
Feature activation-9.21e-01

Pos logit contributions
 clever+0.670
 cute+0.669
 charming+0.639
 gentle+0.624
 small+0.616
Neg logit contributions
ors-0.980
 to-0.955
but-0.936
fully-0.933
oes-0.914
Token named
Token ID3706
Feature activation-9.05e+00

Pos logit contributions
 cute+0.026
 clever+0.026
 small+0.026
 charming+0.025
 kind+0.025
Neg logit contributions
ors-0.029
but-0.028
fully-0.028
driver-0.027
oes-0.027
Token Sad
Token ID14668
Feature activation-7.43e+00

Pos logit contributions
 cute+0.315
 clever+0.313
 charming+0.307
 small+0.301
 gentle+0.300
Neg logit contributions
ors-0.226
but-0.221
fully-0.215
 to-0.212
oes-0.203
Tokenie
Token ID494
Feature activation+1.05e+01

Pos logit contributions
 clever+0.355
 cute+0.354
 charming+0.345
 small+0.339
 funny+0.339
Neg logit contributions
fully-0.105
ors-0.100
 to-0.083
but-0.081
oes-0.070
Token who
Token ID508
Feature activation+2.85e+00

Pos logit contributions
 to+0.265
ors+0.258
but+0.251
fully+0.247
or+0.221
Neg logit contributions
 cute-0.365
 clever-0.362
 charming-0.353
 funny-0.346
 wonderful-0.343
Token hide
Token ID7808
Feature activation-1.84e+00

Pos logit contributions
 to+0.379
 him+0.320
fully+0.283
oes+0.267
 Stop+0.257
Neg logit contributions
 cute-0.648
 charming-0.624
 clever-0.606
 beautiful-0.601
 wonderful-0.593
Token.
Token ID13
Feature activation-8.69e+00

Pos logit contributions
 small+0.053
 clever+0.051
 cute+0.051
 kind+0.050
 big+0.050
Neg logit contributions
ors-0.057
oes-0.057
but-0.056
to-0.055
osity-0.054
Token They
Token ID1119
Feature activation+9.84e+00

Pos logit contributions
 cute+0.333
 small+0.330
 clever+0.327
 charming+0.320
 funny+0.316
Neg logit contributions
ors-0.205
but-0.189
fully-0.187
or-0.176
oes-0.176
Token see
Token ID766
Feature activation+5.03e+00

Pos logit contributions
ors+0.289
but+0.282
fully+0.275
oes+0.262
 to+0.261
Neg logit contributions
 cute-0.307
 clever-0.304
 small-0.295
 charming-0.295
 gentle-0.290
Token a
Token ID257
Feature activation-1.43e+01

Pos logit contributions
 to+0.163
 him+0.145
fully+0.143
rings+0.141
ors+0.141
Neg logit contributions
 clever-0.132
 charming-0.131
 cute-0.131
 gentle-0.127
 beautiful-0.123
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 clever+0.216
 cute+0.203
 charming+0.202
 youngest+0.201
 adorable+0.195
Neg logit contributions
 to-0.586
driver-0.524
but-0.499
oes-0.498
ing-0.498
Token,
Token ID11
Feature activation+2.76e-01

Pos logit contributions
 cute+0.629
 clever+0.626
 charming+0.601
 gentle+0.575
 beautiful+0.573
Neg logit contributions
ors-1.010
 to-1.002
fully-0.983
but-0.982
driver-0.969
Token wooden
Token ID13510
Feature activation-1.60e+01

Pos logit contributions
 to+0.012
but+0.011
fully+0.011
driver+0.011
rings+0.011
Neg logit contributions
 clever-0.003
 charming-0.003
 cute-0.003
 wonderful-0.003
 lovely-0.003
Token house
Token ID2156
Feature activation+4.19e+00

Pos logit contributions
 cute+0.370
 clever+0.364
 charming+0.353
 small+0.345
 beautiful+0.341
Neg logit contributions
ors-0.584
but-0.570
fully-0.564
 to-0.555
driver-0.550
Token.
Token ID13
Feature activation-8.67e+00

Pos logit contributions
ors+0.135
but+0.129
to+0.127
uld+0.125
or+0.125
Neg logit contributions
 small-0.125
 kind-0.123
 big-0.122
 cute-0.121
 clever-0.121
Token They
Token ID1119
Feature activation+9.88e+00

Pos logit contributions
 cute+0.340
 small+0.336
 clever+0.334
 charming+0.327
 funny+0.322
Neg logit contributions
ors-0.196
but-0.183
fully-0.179
oes-0.168
or-0.168
Token but
Token ID475
Feature activation+9.91e+00

Pos logit contributions
 small+0.003
 gentle+0.003
 clever+0.003
 cute+0.003
 funny+0.003
Neg logit contributions
ors-0.002
or-0.002
iked-0.002
oes-0.002
amb-0.002
Token it
Token ID340
Feature activation+1.62e+01

Pos logit contributions
ors+0.199
oes+0.179
but+0.175
amb+0.166
fully+0.164
Neg logit contributions
 small-0.420
 clever-0.419
 cute-0.417
 beautiful-0.407
 charming-0.404
Token was
Token ID373
Feature activation-3.17e+00

Pos logit contributions
ors+0.456
but+0.444
fully+0.434
 to+0.421
oes+0.412
Neg logit contributions
 cute-0.514
 clever-0.506
 charming-0.497
 small-0.492
 gentle-0.485
Token much
Token ID881
Feature activation-1.15e+01

Pos logit contributions
 cute+0.109
 clever+0.107
 charming+0.106
 small+0.104
 gentle+0.103
Neg logit contributions
ors-0.079
but-0.077
 to-0.076
fully-0.076
driver-0.072
Token too
Token ID1165
Feature activation+3.54e+00

Pos logit contributions
 cute+0.204
 clever+0.198
 charming+0.191
 small+0.182
 gentle+0.180
Neg logit contributions
ors-0.468
 to-0.468
but-0.463
fully-0.456
driver-0.449
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 to+0.150
t+0.137
ors+0.133
ing+0.131
but+0.131
Neg logit contributions
 cute-0.056
 charming-0.054
aver-0.052
 gentle-0.052
 compassionate-0.051
Token.
Token ID13
Feature activation-9.12e+00

Pos logit contributions
 small+0.748
 big+0.732
 cute+0.716
 beautiful+0.710
 clever+0.702
Neg logit contributions
but-0.908
amb-0.859
 Stop-0.857
uld-0.856
oes-0.855
Token She
Token ID1375
Feature activation+1.09e+01

Pos logit contributions
 cute+0.367
 clever+0.363
 small+0.359
 charming+0.357
 beautiful+0.351
Neg logit contributions
ors-0.183
but-0.173
fully-0.169
oes-0.161
driver-0.157
Token was
Token ID373
Feature activation-3.11e+00

Pos logit contributions
ors+0.315
but+0.301
fully+0.301
 to+0.296
driver+0.286
Neg logit contributions
 cute-0.337
 clever-0.330
 charming-0.329
 small-0.319
 beautiful-0.316
Token so
Token ID523
Feature activation+2.56e+00

Pos logit contributions
 cute+0.081
 clever+0.080
 charming+0.078
 small+0.077
 gentle+0.075
Neg logit contributions
ors-0.103
but-0.102
fully-0.100
 to-0.100
driver-0.096
Token sad
Token ID6507
Feature activation-1.55e+01

Pos logit contributions
 to+0.108
but+0.101
ors+0.098
fully+0.097
ges+0.097
Neg logit contributions
 cute-0.039
 charming-0.038
 clever-0.035
 wonderful-0.035
 lovely-0.035
Token that
Token ID326
Feature activation+9.04e+00

Pos logit contributions
 cute+0.048
 clever+0.047
 charming+0.046
 small+0.045
 gentle+0.045
Neg logit contributions
ors-0.049
but-0.049
fully-0.048
 to-0.048
driver-0.046
Token the
Token ID262
Feature activation-5.27e+00

Pos logit contributions
ors+0.228
but+0.221
fully+0.213
oes+0.204
driver+0.202
Neg logit contributions
 cute-0.321
 clever-0.317
 small-0.311
 charming-0.310
 beautiful-0.305
Token spot
Token ID4136
Feature activation+1.65e-02

Pos logit contributions
 cute+0.110
 clever+0.108
 charming+0.103
 gentle+0.100
 small+0.099
Neg logit contributions
 to-0.194
but-0.189
ors-0.188
fully-0.187
oes-0.182
Token was
Token ID373
Feature activation-3.17e+00

Pos logit contributions
ors+0.001
but+0.001
amb+0.001
 Stop+0.001
fully+0.001
Neg logit contributions
 cute-0.000
 clever-0.000
 small-0.000
 kind-0.000
 charming-0.000
Token too
Token ID1165
Feature activation+3.54e+00

Pos logit contributions
 cute+0.082
 clever+0.081
 charming+0.079
 small+0.076
 gentle+0.076
Neg logit contributions
 to-0.104
but-0.100
ors-0.100
fully-0.100
driver-0.094
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 to+0.156
ors+0.144
t+0.142
but+0.140
fully+0.139
Neg logit contributions
 cute-0.048
 clever-0.047
 charming-0.045
 wonderful-0.043
 gentle-0.043
Token for
Token ID329
Feature activation+1.07e+01

Pos logit contributions
 small+0.912
 big+0.887
 kind+0.875
 cute+0.873
 clever+0.870
Neg logit contributions
but-0.793
amb-0.719
ors-0.710
oes-0.709
uld-0.705
Token all
Token ID477
Feature activation-9.38e-01

Pos logit contributions
ors+0.109
fully+0.096
but+0.094
 to+0.090
driver+0.072
Neg logit contributions
 cute-0.534
 clever-0.530
 charming-0.523
 small-0.516
 gentle-0.515
Token the
Token ID262
Feature activation-5.28e+00

Pos logit contributions
 cute+0.031
 small+0.031
 clever+0.031
 beautiful+0.030
 charming+0.030
Neg logit contributions
but-0.026
ors-0.026
fully-0.024
driver-0.024
or-0.024
Token animals
Token ID4695
Feature activation+6.84e-01

Pos logit contributions
 cute+0.089
 clever+0.088
 charming+0.087
 gentle+0.079
 small+0.078
Neg logit contributions
 to-0.213
ors-0.202
fully-0.200
but-0.198
oes-0.195
Token!
Token ID0
Feature activation-1.72e+00

Pos logit contributions
but+0.018
ors+0.018
oes+0.017
fully+0.017
amb+0.016
Neg logit contributions
 clever-0.024
 cute-0.024
 small-0.023
 kind-0.023
 big-0.022
Token some
Token ID617
Feature activation-5.82e+00

Pos logit contributions
 to+0.154
ors+0.143
driver+0.139
fully+0.139
rings+0.138
Neg logit contributions
 clever-0.144
 cute-0.140
 charming-0.139
 gentle-0.136
 playful-0.131
Token toys
Token ID14958
Feature activation+4.33e-03

Pos logit contributions
 clever+0.143
 cute+0.143
 charming+0.138
 gentle+0.133
 small+0.132
Neg logit contributions
ors-0.197
 to-0.193
fully-0.190
but-0.188
driver-0.188
Token that
Token ID326
Feature activation+9.27e+00

Pos logit contributions
amb+0.000
but+0.000
to+0.000
ors+0.000
uld+0.000
Neg logit contributions
 small-0.000
 big-0.000
 cute-0.000
 clever-0.000
 pretty-0.000
Token are
Token ID389
Feature activation+4.45e+00

Pos logit contributions
but+0.274
ors+0.271
fully+0.268
to+0.242
iked+0.241
Neg logit contributions
 cute-0.303
 small-0.300
 clever-0.298
 cool-0.293
 funny-0.291
Token too
Token ID1165
Feature activation+3.75e+00

Pos logit contributions
 to+0.163
ors+0.160
driver+0.157
fully+0.156
but+0.155
Neg logit contributions
 cute-0.095
 clever-0.095
 charming-0.092
 gentle-0.089
 beautiful-0.087
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 to+0.164
ors+0.158
driver+0.151
t+0.148
fully+0.146
Neg logit contributions
 clever-0.055
 charming-0.053
 cute-0.053
 magical-0.051
 gentle-0.051
Token or
Token ID393
Feature activation+1.17e+01

Pos logit contributions
 small+0.883
 big+0.836
 cute+0.836
 kind+0.835
 clever+0.810
Neg logit contributions
but-0.869
ors-0.793
ancing-0.791
to-0.778
uld-0.775
Token too
Token ID1165
Feature activation+3.73e+00

Pos logit contributions
 to+0.474
ors+0.472
fully+0.459
driver+0.459
but+0.457
Neg logit contributions
 clever-0.197
 cute-0.196
 charming-0.190
 beautiful-0.178
 gentle-0.177
Token easy
Token ID2562
Feature activation-1.70e+01

Pos logit contributions
 to+0.150
ors+0.141
driver+0.137
t+0.132
fully+0.131
Neg logit contributions
 charming-0.070
 clever-0.070
 cute-0.069
 wonderful-0.067
 magical-0.066
Token for
Token ID329
Feature activation+1.08e+01

Pos logit contributions
 small+0.348
 big+0.334
 fun+0.329
 kind+0.325
 cute+0.320
Neg logit contributions
but-0.675
amb-0.661
uld-0.653
licks-0.637
 Stop-0.634
Token them
Token ID606
Feature activation+1.10e+01

Pos logit contributions
ors+0.206
 to+0.204
fully+0.188
 him+0.178
ges+0.174
Neg logit contributions
 clever-0.429
 cute-0.427
 charming-0.426
 gentle-0.411
 kind-0.407
Token much
Token ID881
Feature activation-1.14e+01

Pos logit contributions
 cute+0.141
 gentle+0.140
 clever+0.136
 charming+0.135
 magical+0.132
Neg logit contributions
ors-0.172
t-0.171
 to-0.168
but-0.167
fully-0.164
Token.
Token ID13
Feature activation-9.14e+00

Pos logit contributions
 cute+0.269
 clever+0.268
 small+0.261
 charming+0.259
 kind+0.257
Neg logit contributions
ors-0.419
but-0.403
oes-0.403
 Stop-0.390
ancing-0.388
Token Tom
Token ID4186
Feature activation-5.81e+00

Pos logit contributions
 small+0.328
 cute+0.323
 big+0.310
 clever+0.308
 charming+0.308
Neg logit contributions
fully-0.238
ors-0.233
but-0.217
driver-0.207
or-0.206
Token had
Token ID550
Feature activation-7.39e-01

Pos logit contributions
 cute+0.150
 clever+0.148
 charming+0.145
 small+0.143
 beautiful+0.140
Neg logit contributions
ors-0.196
but-0.191
fully-0.187
 to-0.186
driver-0.181
Token a
Token ID257
Feature activation-1.45e+01

Pos logit contributions
 clever+0.017
 cute+0.017
 charming+0.016
 gentle+0.015
 kind+0.015
Neg logit contributions
 to-0.028
but-0.024
ors-0.024
driver-0.024
rings-0.024
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 clever+0.237
aver+0.218
 gentle+0.216
 elderly+0.213
 cute+0.209
Neg logit contributions
 to-0.570
driver-0.517
rings-0.469
t-0.462
oes-0.454
Token car
Token ID1097
Feature activation-1.99e+00

Pos logit contributions
 clever+0.625
 cute+0.622
 charming+0.595
 gentle+0.573
 beautiful+0.573
Neg logit contributions
 to-1.009
ors-1.006
driver-0.984
fully-0.979
but-0.976
Token that
Token ID326
Feature activation+8.99e+00

Pos logit contributions
 small+0.075
 big+0.075
 kind+0.075
 cute+0.074
 clever+0.074
Neg logit contributions
ors-0.046
but-0.043
fully-0.043
uld-0.043
 Stop-0.042
Token he
Token ID339
Feature activation+1.06e+01

Pos logit contributions
ors+0.131
 to+0.129
but+0.128
fully+0.121
oes+0.117
Neg logit contributions
 cute-0.393
 clever-0.391
 charming-0.384
 gentle-0.380
 small-0.379
Token liked
Token ID8288
Feature activation-1.22e+01

Pos logit contributions
ors+0.344
 to+0.342
but+0.327
fully+0.322
driver+0.319
Neg logit contributions
 cute-0.268
 charming-0.261
 clever-0.261
 small-0.259
 beautiful-0.252
Token very
Token ID845
Feature activation-5.38e+00

Pos logit contributions
 cute+0.566
 small+0.565
 funny+0.555
 big+0.554
 beautiful+0.552
Neg logit contributions
ors-0.173
ancing-0.167
but-0.166
 Stop-0.155
or-0.153
Token down
Token ID866
Feature activation+7.78e+00

Pos logit contributions
but+0.005
ors+0.005
oes+0.004
uld+0.004
fully+0.004
Neg logit contributions
 cute-0.011
 small-0.011
 clever-0.011
 beautiful-0.011
 funny-0.011
Token their
Token ID511
Feature activation+1.02e-01

Pos logit contributions
ors+0.222
but+0.218
fully+0.216
 to+0.216
driver+0.203
Neg logit contributions
 cute-0.240
 clever-0.237
 charming-0.231
 gentle-0.224
 small-0.224
Token faces
Token ID6698
Feature activation+6.48e+00

Pos logit contributions
ors+0.003
 to+0.003
fully+0.003
but+0.003
driver+0.003
Neg logit contributions
 clever-0.003
 cute-0.003
 charming-0.003
 gentle-0.003
 kind-0.003
Token.
Token ID13
Feature activation-9.41e+00

Pos logit contributions
ors+0.199
but+0.190
oes+0.188
 Stop+0.185
licks+0.181
Neg logit contributions
 cute-0.192
 small-0.191
 clever-0.187
 beautiful-0.186
 funny-0.183
Token The
Token ID383
Feature activation-8.31e+00

Pos logit contributions
 cute+0.310
 clever+0.306
 small+0.302
 charming+0.299
 beautiful+0.293
Neg logit contributions
ors-0.255
but-0.246
fully-0.242
driver-0.230
oes-0.230
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
aver+0.169
eh+0.136
 cute+0.135
 gentle+0.130
 charming+0.127
Neg logit contributions
 to-0.299
driver-0.261
t-0.259
ting-0.254
but-0.254
Token child
Token ID1200
Feature activation-6.60e+00

Pos logit contributions
 cute+0.709
 clever+0.699
 charming+0.679
 cool+0.663
 wonderful+0.662
Neg logit contributions
 to-0.937
ors-0.893
driver-0.858
fully-0.851
oes-0.840
Token was
Token ID373
Feature activation-3.04e+00

Pos logit contributions
 cute+0.191
 clever+0.188
 charming+0.183
 small+0.183
 gentle+0.179
Neg logit contributions
ors-0.205
but-0.201
fully-0.196
 to-0.189
oes-0.189
Token confused
Token ID10416
Feature activation-1.66e+01

Pos logit contributions
 cute+0.090
 clever+0.088
 charming+0.087
 small+0.084
 pretty+0.083
Neg logit contributions
 to-0.088
but-0.080
fully-0.075
ors-0.074
licks-0.072
Token,
Token ID11
Feature activation-1.63e-01

Pos logit contributions
 small+0.551
 cute+0.543
 funny+0.540
 beautiful+0.538
 little+0.533
Neg logit contributions
ors-0.486
driver-0.471
uld-0.470
licks-0.461
oes-0.449
Token why
Token ID1521
Feature activation+9.32e+00

Pos logit contributions
 cute+0.006
 small+0.006
 clever+0.006
 funny+0.006
 gentle+0.006
Neg logit contributions
ors-0.004
oes-0.004
or-0.004
driver-0.004
licks-0.003
Token dress
Token ID6576
Feature activation-2.37e+00

Pos logit contributions
 cute+0.130
 clever+0.129
 charming+0.123
 gentle+0.119
 kind+0.118
Neg logit contributions
but-0.165
 to-0.164
fully-0.162
ors-0.162
oes-0.153
Tokener
Token ID263
Feature activation+2.28e+00

Pos logit contributions
 small+0.069
 cute+0.068
 beautiful+0.068
 big+0.068
 kind+0.068
Neg logit contributions
ors-0.070
 Stop-0.068
to-0.064
but-0.064
uld-0.064
Token.
Token ID13
Feature activation-9.27e+00

Pos logit contributions
ors+0.069
 Stop+0.067
but+0.066
uld+0.065
licks+0.064
Neg logit contributions
 cute-0.063
 small-0.063
 beautiful-0.062
 pretty-0.062
 charming-0.062
Token It
Token ID632
Feature activation+1.25e+01

Pos logit contributions
 small+0.382
 cute+0.372
 clever+0.363
 charming+0.361
 big+0.361
Neg logit contributions
ors-0.204
fully-0.179
but-0.175
or-0.171
oes-0.171
Token had
Token ID550
Feature activation-5.10e-01

Pos logit contributions
 to+0.291
but+0.269
fully+0.267
ors+0.258
rings+0.255
Neg logit contributions
 clever-0.428
 cute-0.423
 charming-0.417
 gentle-0.415
 playful-0.413
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 clever+0.011
 cute+0.010
 charming+0.010
 gentle+0.010
 kind+0.010
Neg logit contributions
 to-0.019
ors-0.019
but-0.019
fully-0.019
rings-0.018
Token beads
Token ID36116
Feature activation+5.73e+00

Pos logit contributions
 cute+0.683
 clever+0.662
 small+0.655
 charming+0.647
 beautiful+0.637
Neg logit contributions
ors-1.010
but-1.008
fully-0.974
oes-0.953
or-0.932
Token that
Token ID326
Feature activation+9.05e+00

Pos logit contributions
 Stop+0.168
ors+0.152
but+0.151
uld+0.151
to+0.150
Neg logit contributions
 pretty-0.202
 small-0.199
 beautiful-0.198
 cute-0.198
 big-0.197
Token spark
Token ID9009
Feature activation-1.80e+01

Pos logit contributions
but+0.258
 to+0.256
ors+0.254
fully+0.250
rings+0.243
Neg logit contributions
 cute-0.267
 clever-0.264
 charming-0.259
 gentle-0.255
 small-0.251
Tokenled
Token ID992
Feature activation-9.15e+00

Pos logit contributions
 cute+0.572
 clever+0.554
 pretty+0.551
 beautiful+0.549
 small+0.548
Neg logit contributions
but-0.520
ors-0.496
 Stop-0.484
oes-0.483
uld-0.461
Token in
Token ID287
Feature activation+9.42e+00

Pos logit contributions
 cute+0.334
 small+0.330
 beautiful+0.326
 clever+0.324
 pretty+0.322
Neg logit contributions
ors-0.246
but-0.224
oes-0.220
driver-0.220
 Stop-0.219
Token his
Token ID465
Feature activation-2.94e+00

Pos logit contributions
ors+0.226
but+0.222
fully+0.217
 to+0.214
oes+0.204
Neg logit contributions
 cute-0.314
 clever-0.310
 charming-0.303
 small-0.300
 gentle-0.297
Token photo
Token ID4590
Feature activation+8.16e-01

Pos logit contributions
 cute+0.075
 clever+0.073
 gentle+0.072
 charming+0.071
 small+0.070
Neg logit contributions
 to-0.096
ors-0.089
but-0.086
or-0.085
fully-0.085
Token inside
Token ID2641
Feature activation+9.33e+00

Pos logit contributions
ors+0.025
but+0.025
fully+0.024
oes+0.023
driver+0.023
Neg logit contributions
 cute-0.024
 clever-0.024
 small-0.023
 charming-0.023
 beautiful-0.022
Token was
Token ID373
Feature activation-3.26e+00

Pos logit contributions
ors+0.265
but+0.260
fully+0.251
oes+0.243
driver+0.242
Neg logit contributions
 cute-0.298
 clever-0.294
 small-0.289
 charming-0.287
 beautiful-0.283
Token too
Token ID1165
Feature activation+3.48e+00

Pos logit contributions
 cute+0.088
 clever+0.088
 charming+0.085
 gentle+0.083
 small+0.082
Neg logit contributions
 to-0.104
ors-0.099
fully-0.099
but-0.098
driver-0.093
Token small
Token ID1402
Feature activation-2.81e+01

Pos logit contributions
 to+0.156
t+0.140
ors+0.140
ing+0.135
fully+0.134
Neg logit contributions
 clever-0.047
 cute-0.045
 charming-0.044
aver-0.043
 gentle-0.043
Token!
Token ID0
Feature activation-1.90e+00

Pos logit contributions
 small+0.770
 cute+0.754
 big+0.749
 kind+0.739
 clever+0.737
Neg logit contributions
but-0.911
ors-0.830
oes-0.818
uld-0.814
amb-0.809
Token His
Token ID2399
Feature activation-6.29e+00

Pos logit contributions
 cute+0.065
 clever+0.064
 small+0.063
 charming+0.062
 gentle+0.061
Neg logit contributions
ors-0.050
but-0.048
fully-0.047
oes-0.044
driver-0.044
Token photo
Token ID4590
Feature activation+7.31e-01

Pos logit contributions
 cute+0.146
 clever+0.144
 gentle+0.139
 charming+0.139
 small+0.138
Neg logit contributions
 to-0.216
ors-0.205
fully-0.199
but-0.199
or-0.195
Token had
Token ID550
Feature activation-8.14e-01

Pos logit contributions
ors+0.024
but+0.024
oes+0.023
fully+0.023
or+0.022
Neg logit contributions
 cute-0.020
 clever-0.020
 small-0.019
 charming-0.019
 beautiful-0.019
Token shrunk
Token ID38900
Feature activation-7.86e+00

Pos logit contributions
 cute+0.022
 clever+0.022
 charming+0.021
 small+0.021
 gentle+0.020
Neg logit contributions
ors-0.026
but-0.026
fully-0.026
 to-0.026
driver-0.025
Token it
Token ID340
Feature activation+1.62e+01

Pos logit contributions
 small+0.160
 cute+0.159
 funny+0.156
 clever+0.156
 beautiful+0.155
Neg logit contributions
ors-0.144
but-0.139
oes-0.136
driver-0.135
or-0.131
Token until
Token ID1566
Feature activation+6.18e+00

Pos logit contributions
ors+0.559
but+0.536
driver+0.511
fully+0.506
 Stop+0.502
Neg logit contributions
 cute-0.443
 small-0.438
 clever-0.437
 beautiful-0.427
 charming-0.426
Token it
Token ID340
Feature activation+1.62e+01

Pos logit contributions
ors+0.128
but+0.124
fully+0.118
oes+0.112
 to+0.111
Neg logit contributions
 cute-0.246
 clever-0.243
 small-0.239
 charming-0.238
 gentle-0.235
Token was
Token ID373
Feature activation-3.21e+00

Pos logit contributions
 to+0.378
fully+0.349
but+0.338
t+0.336
 again+0.327
Neg logit contributions
 cute-0.581
 clever-0.574
 charming-0.561
 compassionate-0.557
 humble-0.555
Token a
Token ID257
Feature activation-1.46e+01

Pos logit contributions
 clever+0.074
 cute+0.072
 charming+0.069
 kind+0.067
 humble+0.067
Neg logit contributions
 to-0.116
fully-0.107
 off-0.101
 over-0.099
but-0.098
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 clever+0.259
 cute+0.249
aver+0.245
 charming+0.237
 intelligent+0.230
Neg logit contributions
 to-0.547
ho-0.464
 we-0.461
 again-0.457
 it-0.450
Token dot
Token ID16605
Feature activation-1.25e+00

Pos logit contributions
 cute+0.763
 clever+0.761
 charming+0.729
 gentle+0.709
 small+0.706
Neg logit contributions
ors-0.886
 to-0.880
fully-0.862
but-0.857
driver-0.832
Token in
Token ID287
Feature activation+9.21e+00

Pos logit contributions
 cute+0.042
 small+0.042
 clever+0.041
 kind+0.041
 beautiful+0.041
Neg logit contributions
ors-0.032
but-0.032
 Stop-0.029
oes-0.029
licks-0.029
Token the
Token ID262
Feature activation-5.19e+00

Pos logit contributions
 to+0.225
ges+0.197
fully+0.196
ors+0.193
 him+0.193
Neg logit contributions
 clever-0.328
 cute-0.316
 charming-0.308
 compassionate-0.303
 kind-0.303
Token sky
Token ID6766
Feature activation-2.71e+00

Pos logit contributions
 clever+0.179
 cute+0.171
 playful+0.161
 compassionate+0.160
 charming+0.159
Neg logit contributions
 to-0.136
oes-0.118
als-0.111
ting-0.111
ors-0.108
Token.
Token ID13
Feature activation-8.94e+00

Pos logit contributions
 cute+0.077
 clever+0.075
 small+0.074
 charming+0.074
 beautiful+0.072
Neg logit contributions
ors-0.086
but-0.084
fully-0.082
driver-0.079
oes-0.079
Token Lucy
Token ID22162
Feature activation-2.65e+00

Pos logit contributions
 cute+0.374
 clever+0.372
 charming+0.364
 small+0.356
 gentle+0.355
Neg logit contributions
ors-0.294
but-0.287
fully-0.284
 to-0.284
oes-0.271
Token.
Token ID13
Feature activation-9.05e+00

Pos logit contributions
 clever+0.092
 cute+0.090
 charming+0.089
 magical+0.088
 small+0.087
Neg logit contributions
 to-0.069
fully-0.063
ors-0.061
but-0.061
ges-0.057
Token She
Token ID1375
Feature activation+1.09e+01

Pos logit contributions
 cute+0.401
 clever+0.394
 small+0.390
 charming+0.390
 beautiful+0.383
Neg logit contributions
ors-0.150
but-0.142
fully-0.134
driver-0.124
oes-0.124
Token was
Token ID373
Feature activation-3.05e+00

Pos logit contributions
 to+0.325
t+0.280
but+0.276
fully+0.271
to+0.263
Neg logit contributions
 clever-0.299
 gentle-0.298
 small-0.291
 cute-0.291
 charming-0.290
Token very
Token ID845
Feature activation-5.70e+00

Pos logit contributions
 cute+0.066
 clever+0.065
 charming+0.063
 magical+0.063
 gentle+0.063
Neg logit contributions
 to-0.108
but-0.099
fully-0.092
ting-0.092
licks-0.092
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 unison+0.073
 comm+0.068
 magical+0.068
 wond+0.067
 wonderful+0.066
Neg logit contributions
 to-0.240
but-0.217
t-0.212
't-0.206
oes-0.206
Token,
Token ID11
Feature activation-8.81e-02

Pos logit contributions
 cute+1.169
 clever+1.154
 small+1.146
 charming+1.135
 kind+1.119
Neg logit contributions
ors-0.539
but-0.521
fully-0.495
oes-0.470
or-0.462
Token only
Token ID691
Feature activation-2.23e+00

Pos logit contributions
 cute+0.003
 small+0.003
 charming+0.003
 clever+0.003
 beautiful+0.003
Neg logit contributions
ors-0.003
or-0.003
driver-0.003
oes-0.003
but-0.003
Token three
Token ID1115
Feature activation-1.19e+01

Pos logit contributions
 clever+0.052
 cute+0.051
 charming+0.049
 gentle+0.048
 beautiful+0.047
Neg logit contributions
 to-0.081
fully-0.076
ors-0.075
but-0.075
oes-0.073
Token years
Token ID812
Feature activation-2.86e-01

Pos logit contributions
 magical+0.312
 clever+0.311
 charming+0.302
 cute+0.299
 wonderful+0.295
Neg logit contributions
ors-0.380
 to-0.363
ing-0.324
but-0.323
fully-0.321
Token old
Token ID1468
Feature activation-2.02e+01

Pos logit contributions
 cute+0.003
 clever+0.003
 charming+0.003
 funny+0.003
 gentle+0.003
Neg logit contributions
ors-0.014
 to-0.014
but-0.013
fully-0.013
oes-0.013
Token some
Token ID617
Feature activation-5.89e+00

Pos logit contributions
ors+0.392
 to+0.389
but+0.383
fully+0.377
driver+0.372
Neg logit contributions
 cute-0.241
 clever-0.240
 charming-0.233
 small-0.224
 beautiful-0.221
Token of
Token ID286
Feature activation+7.63e+00

Pos logit contributions
 cute+0.191
 clever+0.189
 charming+0.185
 small+0.183
 gentle+0.180
Neg logit contributions
ors-0.158
but-0.153
fully-0.150
 to-0.149
oes-0.143
Token them
Token ID606
Feature activation+1.06e+01

Pos logit contributions
ors+0.125
but+0.117
fully+0.116
 to+0.113
oes+0.104
Neg logit contributions
 cute-0.329
 clever-0.327
 charming-0.321
 small-0.316
 gentle-0.314
Token are
Token ID389
Feature activation+4.39e+00

Pos logit contributions
ors+0.292
but+0.289
fully+0.279
oes+0.264
driver+0.264
Neg logit contributions
 cute-0.350
 clever-0.345
 small-0.338
 charming-0.334
 gentle-0.330
Token very
Token ID845
Feature activation-5.58e+00

Pos logit contributions
 to+0.157
ors+0.150
but+0.147
fully+0.146
driver+0.140
Neg logit contributions
 clever-0.098
 cute-0.098
 charming-0.096
 small-0.090
 beautiful-0.089
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 clever+0.052
 cute+0.049
 charming+0.046
 gentle+0.041
 wonderful+0.041
Neg logit contributions
 to-0.266
ors-0.249
but-0.243
oes-0.239
fully-0.237
Token.
Token ID13
Feature activation-8.57e+00

Pos logit contributions
 cute+0.909
 small+0.901
 clever+0.879
 charming+0.865
 beautiful+0.861
Neg logit contributions
but-0.798
ors-0.784
fully-0.741
or-0.733
oes-0.731
Token Some
Token ID2773
Feature activation-2.55e+00

Pos logit contributions
 cute+0.285
 small+0.278
 clever+0.277
 charming+0.271
 gentle+0.267
Neg logit contributions
ors-0.236
fully-0.227
but-0.226
or-0.210
oes-0.209
Token of
Token ID286
Feature activation+7.65e+00

Pos logit contributions
 cute+0.082
 clever+0.081
 small+0.079
 charming+0.079
 gentle+0.078
Neg logit contributions
ors-0.072
but-0.071
fully-0.068
oes-0.065
 to-0.065
Token them
Token ID606
Feature activation+1.07e+01

Pos logit contributions
ors+0.159
but+0.153
fully+0.148
 to+0.142
oes+0.139
Neg logit contributions
 cute-0.299
 clever-0.297
 charming-0.291
 small-0.289
 gentle-0.286
Token are
Token ID389
Feature activation+4.47e+00

Pos logit contributions
but+0.300
ors+0.299
fully+0.291
oes+0.272
driver+0.272
Neg logit contributions
 cute-0.350
 clever-0.345
 small-0.340
 charming-0.332
 gentle-0.330
Token But
Token ID887
Feature activation+9.50e+00

Pos logit contributions
 cute+0.358
 clever+0.352
 small+0.349
 charming+0.347
 beautiful+0.341
Neg logit contributions
ors-0.190
but-0.179
fully-0.175
oes-0.164
driver-0.162
Token she
Token ID673
Feature activation+1.16e+01

Pos logit contributions
ors+0.267
but+0.240
oes+0.235
driver+0.233
fully+0.232
Neg logit contributions
 cute-0.327
 small-0.326
 clever-0.325
 charming-0.319
 kind-0.315
Token made
Token ID925
Feature activation-8.17e-01

Pos logit contributions
ors+0.378
but+0.368
fully+0.359
 to+0.354
oes+0.346
Neg logit contributions
 cute-0.312
 clever-0.307
 charming-0.301
 small-0.299
 gentle-0.292
Token them
Token ID606
Feature activation+1.09e+01

Pos logit contributions
 clever+0.022
 cute+0.022
 charming+0.021
 gentle+0.021
 kind+0.021
Neg logit contributions
 to-0.026
ors-0.024
fully-0.024
but-0.022
 him-0.022
Token very
Token ID845
Feature activation-5.73e+00

Pos logit contributions
ors+0.345
but+0.336
fully+0.327
oes+0.314
driver+0.313
Neg logit contributions
 cute-0.314
 clever-0.309
 charming-0.301
 small-0.301
 beautiful-0.295
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 clever+0.088
 magical+0.086
 cute+0.085
 charming+0.085
 wonderful+0.084
Neg logit contributions
 to-0.233
ors-0.219
but-0.210
t-0.210
oes-0.207
Token and
Token ID290
Feature activation+1.08e+01

Pos logit contributions
 cute+1.033
 small+1.024
 clever+1.005
 beautiful+1.003
 charming+0.996
Neg logit contributions
ors-0.684
but-0.682
fully-0.622
to-0.618
or-0.616
Token put
Token ID1234
Feature activation-2.97e+00

Pos logit contributions
 to+0.472
fully+0.442
ors+0.438
but+0.434
rings+0.430
Neg logit contributions
 clever-0.133
 cute-0.132
 charming-0.129
 small-0.118
 lovely-0.116
Token them
Token ID606
Feature activation+1.09e+01

Pos logit contributions
 cute+0.105
 clever+0.105
 charming+0.102
 small+0.101
 gentle+0.100
Neg logit contributions
ors-0.071
but-0.068
fully-0.067
 to-0.066
oes-0.063
Token in
Token ID287
Feature activation+9.35e+00

Pos logit contributions
ors+0.193
but+0.186
fully+0.181
 to+0.175
oes+0.165
Neg logit contributions
 cute-0.460
 clever-0.456
 charming-0.448
 small-0.444
 gentle-0.439
Token the
Token ID262
Feature activation-4.88e+00

Pos logit contributions
 to+0.260
fully+0.224
but+0.220
ges+0.217
 him+0.213
Neg logit contributions
 clever-0.298
 cute-0.292
 charming-0.282
aver-0.276
 playful-0.275
TokenOnce
Token ID7454
Feature activation+1.24e+00

Pos logit contributions
 cute+0.136
 clever+0.134
 small+0.132
 charming+0.131
 beautiful+0.128
Neg logit contributions
ors-0.150
but-0.146
fully-0.143
oes-0.137
driver-0.137
Token,
Token ID11
Feature activation-3.96e-01

Pos logit contributions
ors+0.031
but+0.030
fully+0.028
driver+0.028
ting+0.027
Neg logit contributions
 cute-0.045
 small-0.044
 clever-0.044
 charming-0.043
 beautiful-0.042
Token there
Token ID612
Feature activation+3.14e+00

Pos logit contributions
 cute+0.014
 small+0.013
 clever+0.013
 charming+0.013
 kind+0.013
Neg logit contributions
ors-0.011
but-0.011
fully-0.010
driver-0.010
oes-0.010
Token was
Token ID373
Feature activation-3.29e+00

Pos logit contributions
ors+0.091
but+0.090
fully+0.087
 to+0.086
oes+0.083
Neg logit contributions
 cute-0.096
 clever-0.095
 charming-0.092
 small-0.092
 gentle-0.091
Token a
Token ID257
Feature activation-1.46e+01

Pos logit contributions
 clever+0.081
 cute+0.078
 gentle+0.076
 charming+0.075
 kind+0.074
Neg logit contributions
 to-0.109
fully-0.103
but-0.101
 him-0.099
ges-0.095
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 unexpected+0.208
 comm+0.207
 smallest+0.203
ady+0.201
 youngest+0.201
Neg logit contributions
 to-0.550
 we-0.483
 and-0.463
but-0.452
 after-0.451
Token little
Token ID1310
Feature activation-2.26e+01

Pos logit contributions
 clever+0.670
 cute+0.657
 charming+0.628
 gentle+0.618
 wonderful+0.618
Neg logit contributions
 to-0.961
ors-0.927
fully-0.900
oes-0.886
but-0.884
Token girl
Token ID2576
Feature activation-1.03e+00

Pos logit contributions
 clever+0.505
 cute+0.483
 charming+0.471
 wonderful+0.467
 magical+0.466
Neg logit contributions
 to-0.789
ors-0.772
fully-0.746
ing-0.741
oes-0.736
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
 cute+0.026
 clever+0.026
 charming+0.025
 small+0.024
 beautiful+0.024
Neg logit contributions
ors-0.036
but-0.035
 to-0.034
fully-0.034
oes-0.033
Token She
Token ID1375
Feature activation+1.08e+01

Pos logit contributions
 clever+0.454
 cute+0.453
 charming+0.444
 small+0.440
 gentle+0.439
Neg logit contributions
ors-0.110
but-0.108
fully-0.100
 to-0.098
oes-0.093
Token was
Token ID373
Feature activation-3.27e+00

Pos logit contributions
 to+0.302
t+0.256
but+0.240
fully+0.234
't+0.232
Neg logit contributions
 clever-0.306
 gentle-0.306
 youngest-0.305
 charming-0.302
 cute-0.301
Token truck
Token ID7779
Feature activation-3.17e+00

Pos logit contributions
 cute+0.425
 clever+0.414
 small+0.411
 charming+0.406
 gentle+0.395
Neg logit contributions
ors-0.613
but-0.605
fully-0.587
licks-0.560
oes-0.558
Token.
Token ID13
Feature activation-8.90e+00

Pos logit contributions
 kind+0.093
 small+0.092
 big+0.092
 cute+0.092
 clever+0.089
Neg logit contributions
ors-0.096
but-0.093
uld-0.092
fully-0.092
 Stop-0.092
Token Sam
Token ID3409
Feature activation-7.39e+00

Pos logit contributions
 cute+0.326
 clever+0.321
 small+0.318
 charming+0.315
 gentle+0.309
Neg logit contributions
ors-0.213
but-0.206
fully-0.202
or-0.185
driver-0.185
Token has
Token ID468
Feature activation+6.26e+00

Pos logit contributions
 cute+0.202
 clever+0.196
 small+0.194
 charming+0.189
 cool+0.185
Neg logit contributions
but-0.260
fully-0.255
ors-0.247
iked-0.237
oes-0.236
Token a
Token ID257
Feature activation-1.45e+01

Pos logit contributions
 to+0.234
driver+0.208
ors+0.202
rings+0.201
but+0.200
Neg logit contributions
 clever-0.142
 cute-0.140
 charming-0.135
 gentle-0.129
 magical-0.128
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 clever+0.245
 wonderful+0.234
 unexpected+0.233
 wond+0.231
 cute+0.226
Neg logit contributions
 to-0.579
driver-0.530
oes-0.489
t-0.464
rings-0.463
Token blue
Token ID4171
Feature activation-2.04e+01

Pos logit contributions
 clever+0.715
 cute+0.713
 charming+0.683
 beautiful+0.662
 wonderful+0.661
Neg logit contributions
 to-0.936
ors-0.914
driver-0.912
oes-0.879
fully-0.869
Token truck
Token ID7779
Feature activation-3.12e+00

Pos logit contributions
 cute+0.470
 small+0.462
 clever+0.452
 charming+0.448
 kind+0.437
Neg logit contributions
ors-0.761
but-0.752
fully-0.733
licks-0.701
or-0.692
Token.
Token ID13
Feature activation-8.86e+00

Pos logit contributions
 kind+0.089
 small+0.088
 cute+0.088
 big+0.087
 cool+0.085
Neg logit contributions
ors-0.098
but-0.095
uld-0.095
fully-0.093
 Stop-0.093
Token They
Token ID1119
Feature activation+9.24e+00

Pos logit contributions
 cute+0.305
 small+0.300
 clever+0.298
 charming+0.294
 funny+0.289
Neg logit contributions
ors-0.233
fully-0.224
but-0.224
or-0.207
driver-0.204
Token make
Token ID787
Feature activation+3.41e+00

Pos logit contributions
ors+0.252
but+0.240
fully+0.238
 to+0.238
oes+0.233
Neg logit contributions
 cute-0.295
 clever-0.292
 charming-0.287
 small-0.283
 beautiful-0.279
Token She
Token ID1375
Feature activation+1.06e+01

Pos logit contributions
 cute+0.074
 small+0.073
 clever+0.072
 charming+0.071
 beautiful+0.071
Neg logit contributions
ors-0.058
fully-0.055
but-0.053
oes-0.051
driver-0.051
Token carefully
Token ID7773
Feature activation-5.00e+00

Pos logit contributions
ors+0.331
but+0.327
fully+0.319
 to+0.315
driver+0.303
Neg logit contributions
 cute-0.300
 clever-0.294
 charming-0.288
 small-0.285
 gentle-0.279
Token piled
Token ID33654
Feature activation+2.36e+00

Pos logit contributions
 cute+0.142
 clever+0.140
 charming+0.136
 small+0.135
 gentle+0.133
Neg logit contributions
ors-0.159
but-0.156
fully-0.152
 to-0.147
oes-0.146
Token up
Token ID510
Feature activation+4.44e+00

Pos logit contributions
ors+0.049
but+0.049
oes+0.047
 Stop+0.045
fully+0.045
Neg logit contributions
 small-0.096
 cute-0.095
 clever-0.093
 beautiful-0.093
 pretty-0.093
Token the
Token ID262
Feature activation-5.26e+00

Pos logit contributions
ors+0.140
but+0.136
oes+0.133
fully+0.131
 Stop+0.129
Neg logit contributions
 cute-0.132
 small-0.132
 clever-0.129
 beautiful-0.128
 charming-0.127
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 clever+0.119
 cute+0.119
 charming+0.114
 kind+0.110
 gentle+0.110
Neg logit contributions
ors-0.189
but-0.185
 to-0.184
fully-0.182
driver-0.178
Token bows
Token ID37469
Feature activation-9.93e-01

Pos logit contributions
 cute+0.674
 clever+0.655
 small+0.651
 charming+0.642
 beautiful+0.628
Neg logit contributions
ors-1.017
but-1.011
fully-0.978
oes-0.949
 to-0.935
Token and
Token ID290
Feature activation+1.06e+01

Pos logit contributions
 small+0.033
 big+0.033
 beautiful+0.032
 pretty+0.031
 colorful+0.031
Neg logit contributions
 Stop-0.028
ors-0.026
to-0.026
uld-0.025
oes-0.025
Token started
Token ID2067
Feature activation-3.19e+00

Pos logit contributions
ors+0.328
but+0.325
 to+0.324
fully+0.321
driver+0.309
Neg logit contributions
 cute-0.292
 clever-0.289
 charming-0.283
 small-0.274
 kind-0.273
Token sorting
Token ID29407
Feature activation+4.13e+00

Pos logit contributions
 small+0.148
 cute+0.146
 clever+0.145
 beautiful+0.143
 charming+0.143
Neg logit contributions
ors-0.050
but-0.046
fully-0.044
oes-0.043
 Stop-0.040
Token them
Token ID606
Feature activation+1.06e+01

Pos logit contributions
but+0.093
ors+0.093
oes+0.089
fully+0.087
 Stop+0.085
Neg logit contributions
 cute-0.157
 small-0.157
 beautiful-0.154
 clever-0.153
 pretty-0.153
Token passports
Token ID33052
Feature activation+9.62e+00

Pos logit contributions
 to+0.298
rings+0.250
driver+0.249
ors+0.249
but+0.248
Neg logit contributions
 cute-0.150
 clever-0.149
 charming-0.139
 gentle-0.136
 compassionate-0.132
Token.
Token ID13
Feature activation-8.95e+00

Pos logit contributions
ors+0.207
 Stop+0.190
but+0.186
to+0.180
uld+0.177
Neg logit contributions
 cute-0.394
 beautiful-0.393
 small-0.391
 big-0.390
 pretty-0.389
Token Pass
Token ID6251
Feature activation-1.60e+00

Pos logit contributions
 cute+0.298
 clever+0.295
 charming+0.288
 small+0.287
 gentle+0.282
Neg logit contributions
ors-0.241
but-0.235
fully-0.228
 to-0.220
oes-0.218
Tokenports
Token ID3742
Feature activation+5.02e-01

Pos logit contributions
 cute+0.052
 clever+0.051
 charming+0.050
 small+0.050
 gentle+0.049
Neg logit contributions
ors-0.045
but-0.043
fully-0.042
 to-0.041
oes-0.040
Token are
Token ID389
Feature activation+4.41e+00

Pos logit contributions
ors+0.013
but+0.013
fully+0.013
oes+0.012
driver+0.012
Neg logit contributions
 cute-0.017
 clever-0.017
 small-0.017
 charming-0.017
 beautiful-0.016
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 to+0.155
but+0.143
ors+0.141
fully+0.139
driver+0.137
Neg logit contributions
 cute-0.105
 clever-0.103
 charming-0.100
 magical-0.096
 gentle-0.096
Token books
Token ID3835
Feature activation+5.50e+00

Pos logit contributions
 cute+0.732
 clever+0.716
 small+0.698
 charming+0.697
 beautiful+0.680
Neg logit contributions
ors-0.954
but-0.944
fully-0.918
oes-0.885
driver-0.876
Token with
Token ID351
Feature activation+6.13e+00

Pos logit contributions
ors+0.151
 Stop+0.146
to+0.144
but+0.141
uld+0.140
Neg logit contributions
 small-0.188
 big-0.187
 cute-0.183
 beautiful-0.183
 kind-0.183
Token pictures
Token ID5986
Feature activation-7.60e-01

Pos logit contributions
 to+0.207
ors+0.199
rings+0.190
fully+0.184
ges+0.180
Neg logit contributions
 clever-0.154
 cute-0.145
 charming-0.144
 gentle-0.140
 humble-0.137
Token and
Token ID290
Feature activation+1.05e+01

Pos logit contributions
 small+0.026
 big+0.026
 beautiful+0.026
 cute+0.026
 pretty+0.026
Neg logit contributions
ors-0.019
 Stop-0.019
to-0.019
but-0.019
uld-0.018
Token names
Token ID3891
Feature activation-2.75e+00

Pos logit contributions
ors+0.352
 to+0.346
but+0.342
fully+0.339
driver+0.330
Neg logit contributions
 clever-0.262
 cute-0.261
 charming-0.252
 small-0.248
 gentle-0.244
Token\n
Token ID198
Feature activation-6.83e+00

Pos logit contributions
 small+0.338
 cute+0.326
 clever+0.322
 little+0.318
 charming+0.316
Neg logit contributions
ors-0.249
fully-0.234
oes-0.224
reating-0.213
driver-0.213
Token\n
Token ID198
Feature activation-6.79e+00

Pos logit contributions
 cute+0.203
 clever+0.200
 small+0.197
 charming+0.195
 gentle+0.192
Neg logit contributions
ors-0.210
but-0.200
fully-0.199
oes-0.193
driver-0.190
TokenShe
Token ID3347
Feature activation+1.05e+01

Pos logit contributions
 clever+0.191
 cute+0.190
 small+0.187
 gentle+0.186
 charming+0.184
Neg logit contributions
ors-0.228
oes-0.214
fully-0.212
but-0.202
driver-0.201
Token followed
Token ID3940
Feature activation-8.79e+00

Pos logit contributions
ors+0.304
but+0.299
fully+0.294
 to+0.294
driver+0.284
Neg logit contributions
 cute-0.325
 clever-0.314
 charming-0.312
 small-0.305
 beautiful-0.301
Token a
Token ID257
Feature activation-1.44e+01

Pos logit contributions
 small+0.251
 clever+0.242
 cute+0.241
 big+0.239
 beautiful+0.237
Neg logit contributions
ors-0.284
oes-0.282
but-0.277
driver-0.274
licks-0.270
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 cute+0.324
 clever+0.312
 charming+0.311
 kind+0.290
 pretty+0.287
Neg logit contributions
 to-0.520
but-0.505
fully-0.499
ors-0.495
driver-0.479
Token path
Token ID3108
Feature activation+4.30e-02

Pos logit contributions
 cute+0.669
 clever+0.660
 small+0.643
 charming+0.639
 gentle+0.626
Neg logit contributions
ors-1.019
but-1.003
fully-0.978
oes-0.945
 to-0.941
Token and
Token ID290
Feature activation+1.05e+01

Pos logit contributions
ors+0.001
 Stop+0.001
to+0.001
licks+0.001
uld+0.001
Neg logit contributions
 small-0.002
 big-0.002
 gentle-0.002
 kind-0.002
 brave-0.002
Token hoped
Token ID10719
Feature activation-7.04e+00

Pos logit contributions
ors+0.297
but+0.271
oes+0.267
fully+0.264
driver+0.259
Neg logit contributions
 clever-0.355
 cute-0.349
 small-0.348
 gentle-0.343
 charming-0.340
Token it
Token ID340
Feature activation+1.62e+01

Pos logit contributions
 small+0.288
 beautiful+0.284
 little+0.280
 magical+0.279
 clever+0.278
Neg logit contributions
ors-0.148
licks-0.144
oes-0.137
driver-0.136
ays-0.132
Token would
Token ID561
Feature activation+9.36e+00

Pos logit contributions
ors+0.409
but+0.389
fully+0.370
oes+0.358
driver+0.353
Neg logit contributions
 clever-0.580
 cute-0.578
 small-0.570
 charming-0.560
 gentle-0.555
Token\n
Token ID198
Feature activation-6.71e+00

Pos logit contributions
 cute+0.214
 clever+0.212
 small+0.208
 charming+0.206
 beautiful+0.201
Neg logit contributions
ors-0.216
but-0.209
fully-0.206
oes-0.197
 to-0.196
Token\n
Token ID198
Feature activation-6.68e+00

Pos logit contributions
 cute+0.236
 clever+0.230
 charming+0.228
 gentle+0.223
 kind+0.220
Neg logit contributions
but-0.168
ors-0.157
 to-0.154
fully-0.154
to-0.149
TokenSoon
Token ID28093
Feature activation+1.98e+00

Pos logit contributions
 cute+0.213
 clever+0.208
 charming+0.204
 small+0.203
 gentle+0.199
Neg logit contributions
but-0.189
ors-0.187
fully-0.181
 to-0.173
or-0.171
Token,
Token ID11
Feature activation-3.27e-01

Pos logit contributions
but+0.055
ors+0.055
oes+0.054
fully+0.050
driver+0.049
Neg logit contributions
 clever-0.067
 small-0.067
 cute-0.067
 funny-0.065
 wonderful-0.065
Token the
Token ID262
Feature activation-5.34e+00

Pos logit contributions
 cute+0.010
 clever+0.010
 charming+0.010
 small+0.010
 gentle+0.010
Neg logit contributions
ors-0.009
but-0.009
fully-0.009
 to-0.009
oes-0.008
Token small
Token ID1402
Feature activation-2.80e+01

Pos logit contributions
 cute+0.131
 charming+0.121
 clever+0.120
 gentle+0.119
 funny+0.117
Neg logit contributions
 to-0.161
fully-0.147
driver-0.146
ting-0.143
t-0.142
Token hero
Token ID4293
Feature activation-1.15e-01

Pos logit contributions
 cute+0.699
 clever+0.682
 charming+0.661
 gentle+0.644
 funny+0.635
Neg logit contributions
ors-0.954
 to-0.942
fully-0.915
driver-0.900
but-0.881
Token realized
Token ID6939
Feature activation-6.72e+00

Pos logit contributions
 cute+0.003
 clever+0.003
 charming+0.003
 small+0.003
 gentle+0.003
Neg logit contributions
ors-0.004
but-0.003
fully-0.003
 to-0.003
oes-0.003
Token how
Token ID703
Feature activation+8.54e+00

Pos logit contributions
 clever+0.273
 cute+0.273
 small+0.269
 charming+0.267
 beautiful+0.267
Neg logit contributions
ors-0.134
oes-0.130
but-0.127
fully-0.118
licks-0.118
Token brave
Token ID14802
Feature activation-2.28e+01

Pos logit contributions
 to+0.333
ors+0.327
but+0.318
fully+0.316
driver+0.301
Neg logit contributions
 cute-0.167
 clever-0.159
 charming-0.157
 small-0.152
 gentle-0.152
Token he
Token ID339
Feature activation+1.05e+01

Pos logit contributions
 cute+0.879
 small+0.872
 clever+0.857
 little+0.852
 beautiful+0.851
Neg logit contributions
but-0.532
oes-0.504
ancing-0.483
ors-0.482
to-0.453
Token.
Token ID13
Feature activation-6.15e+00

Pos logit contributions
 kind+0.037
 clever+0.035
 brave+0.035
 small+0.034
 bad+0.034
Neg logit contributions
licks-0.045
oes-0.044
rings-0.043
but-0.042
ors-0.042
Token We
Token ID775
Feature activation+1.91e+01

Pos logit contributions
 cute+0.276
 clever+0.273
 small+0.269
 charming+0.268
 gentle+0.265
Neg logit contributions
ors-0.094
but-0.090
fully-0.087
 to-0.078
driver-0.077
Token don
Token ID836
Feature activation+4.14e+00

Pos logit contributions
 to+0.418
ors+0.410
or+0.389
but+0.381
fully+0.376
Neg logit contributions
 cute-0.700
 charming-0.684
 clever-0.683
 small-0.679
 playful-0.662
Token't
Token ID470
Feature activation+1.36e+01

Pos logit contributions
fully+0.059
but+0.058
ors+0.056
oes+0.053
rings+0.051
Neg logit contributions
 cute-0.187
 small-0.186
 clever-0.186
 big-0.184
 beautiful-0.181
Token want
Token ID765
Feature activation+5.44e+00

Pos logit contributions
ors+0.363
but+0.352
fully+0.344
 to+0.334
oes+0.330
Neg logit contributions
 cute-0.451
 clever-0.446
 charming-0.436
 small-0.434
 gentle-0.426
Token to
Token ID284
Feature activation+1.90e+01

Pos logit contributions
ors+0.063
but+0.061
ancing+0.057
fully+0.054
or+0.054
Neg logit contributions
 cute-0.275
 small-0.275
 beautiful-0.270
 funny-0.270
 big-0.269
Token miss
Token ID2051
Feature activation-1.42e+00

Pos logit contributions
ors+0.534
 to+0.519
but+0.517
fully+0.505
oes+0.499
Neg logit contributions
 clever-0.595
 cute-0.594
 charming-0.576
 small-0.569
 gentle-0.562
Token you
Token ID345
Feature activation+1.29e+01

Pos logit contributions
 cute+0.049
 clever+0.048
 small+0.047
 charming+0.047
 beautiful+0.046
Neg logit contributions
but-0.038
ors-0.038
fully-0.036
driver-0.035
oes-0.034
Token or
Token ID393
Feature activation+1.35e+01

Pos logit contributions
ors+0.202
but+0.193
fully+0.181
oes+0.171
driver+0.170
Neg logit contributions
 cute-0.580
 clever-0.573
 small-0.566
 charming-0.564
 gentle-0.557
Token our
Token ID674
Feature activation+8.91e+00

Pos logit contributions
ors+0.362
but+0.355
fully+0.339
driver+0.324
or+0.322
Neg logit contributions
 cute-0.465
 clever-0.456
 small-0.449
 charming-0.446
 gentle-0.440
Token home
Token ID1363
Feature activation+3.37e+00

Pos logit contributions
 to+0.314
ors+0.285
oes+0.270
but+0.264
rings+0.264
Neg logit contributions
aver-0.204
 clever-0.203
 cute-0.194
 gentle-0.192
 small-0.187
Token starting
Token ID3599
Feature activation-1.48e+00

Pos logit contributions
aver+0.000
 clever+0.000
 cute+0.000
 gentle+0.000
 compassionate+0.000
Neg logit contributions
 to-0.000
ors-0.000
oes-0.000
 towards-0.000
or-0.000
Token point
Token ID966
Feature activation-8.69e+00

Pos logit contributions
 cute+0.050
 clever+0.049
 charming+0.048
 small+0.048
 gentle+0.047
Neg logit contributions
ors-0.039
but-0.038
fully-0.037
 to-0.036
oes-0.035
Token and
Token ID290
Feature activation+1.06e+01

Pos logit contributions
 small+0.341
 cute+0.337
 beautiful+0.335
 charming+0.331
 clever+0.328
Neg logit contributions
ors-0.180
but-0.171
 Stop-0.168
driver-0.167
licks-0.158
Token were
Token ID547
Feature activation-1.06e+00

Pos logit contributions
ors+0.327
but+0.315
fully+0.307
driver+0.298
oes+0.292
Neg logit contributions
 cute-0.320
 clever-0.316
 small-0.309
 charming-0.308
 beautiful-0.303
Token excited
Token ID6568
Feature activation-1.70e+01

Pos logit contributions
 cute+0.028
 clever+0.027
 charming+0.026
 small+0.026
 gentle+0.026
Neg logit contributions
 to-0.034
but-0.031
fully-0.030
ors-0.029
licks-0.028
Token to
Token ID284
Feature activation+1.73e+01

Pos logit contributions
 beautiful+0.658
 charming+0.636
 kind+0.635
 cute+0.632
 fun+0.631
Neg logit contributions
driver-0.458
licks-0.420
ors-0.399
 Stop-0.390
ges-0.372
Token tell
Token ID1560
Feature activation-2.30e+00

Pos logit contributions
ors+0.573
but+0.548
driver+0.537
fully+0.524
licks+0.509
Neg logit contributions
 clever-0.515
 cute-0.514
 small-0.503
 cool-0.501
 charming-0.501
Token each
Token ID1123
Feature activation+4.56e+00

Pos logit contributions
 cute+0.077
 clever+0.077
 charming+0.074
 small+0.073
 gentle+0.072
Neg logit contributions
 to-0.059
fully-0.058
ors-0.058
but-0.056
or-0.051
Token other
Token ID584
Feature activation-2.11e+01

Pos logit contributions
 to+0.195
fully+0.181
reating+0.176
 him+0.174
oes+0.172
Neg logit contributions
 clever-0.074
 cute-0.073
 charming-0.070
 cozy-0.067
 beautiful-0.063
Token about
Token ID546
Feature activation+1.21e+01

Pos logit contributions
 cute+0.824
 clever+0.807
 small+0.800
 charming+0.794
 beautiful+0.786
Neg logit contributions
ors-0.466
but-0.455
fully-0.420
driver-0.412
oes-0.410
Token the
Token ID262
Feature activation-4.96e+00

Pos logit contributions
ors+0.327
but+0.319
fully+0.311
 to+0.303
oes+0.295
Neg logit contributions
 cute-0.399
 clever-0.395
 charming-0.385
 small-0.384
 gentle-0.378
Token my
Token ID616
Feature activation+4.74e+00

Pos logit contributions
 to+0.097
ors+0.093
but+0.087
fully+0.087
 Stop+0.083
Neg logit contributions
 clever-0.130
 cute-0.130
 charming-0.128
 small-0.127
 gentle-0.127
Token rope
Token ID17182
Feature activation+2.81e+00

Pos logit contributions
 to+0.176
ors+0.163
fully+0.159
but+0.158
 Stop+0.158
Neg logit contributions
 clever-0.095
 gentle-0.095
 cute-0.092
 charming-0.090
 small-0.089
Token!
Token ID0
Feature activation-1.52e-01

Pos logit contributions
ors+0.077
but+0.077
fully+0.074
oes+0.071
or+0.070
Neg logit contributions
 cute-0.091
 clever-0.090
 small-0.088
 charming-0.087
 kind-0.086
Token You
Token ID921
Feature activation+1.26e+01

Pos logit contributions
 cute+0.006
 small+0.006
 clever+0.006
 kind+0.006
 hairy+0.006
Neg logit contributions
fully-0.003
but-0.003
ors-0.003
iked-0.003
oes-0.003
Token took
Token ID1718
Feature activation-2.50e+00

Pos logit contributions
ors+0.350
but+0.348
fully+0.337
oes+0.320
 to+0.317
Neg logit contributions
 cute-0.410
 clever-0.406
 small-0.394
 charming-0.393
 beautiful-0.386
Token it
Token ID340
Feature activation+1.76e+01

Pos logit contributions
 cute+0.106
 clever+0.105
 charming+0.103
 small+0.103
 gentle+0.101
Neg logit contributions
ors-0.044
but-0.042
fully-0.041
 to-0.039
oes-0.037
Token from
Token ID422
Feature activation+7.92e+00

Pos logit contributions
ors+0.415
but+0.406
iked+0.395
rings+0.394
 Stop+0.390
Neg logit contributions
 big-0.632
 small-0.631
 cute-0.619
 cool-0.615
 clever-0.613
Token my
Token ID616
Feature activation+4.09e+00

Pos logit contributions
ors+0.215
but+0.208
fully+0.205
 to+0.203
oes+0.192
Neg logit contributions
 cute-0.259
 clever-0.256
 charming-0.250
 small-0.247
 gentle-0.243
Token bag
Token ID6131
Feature activation-8.16e-03

Pos logit contributions
 to+0.136
ors+0.129
driver+0.123
but+0.122
fully+0.119
Neg logit contributions
 cute-0.107
 clever-0.102
 charming-0.102
 funny-0.096
 beautiful-0.096
Token!
Token ID0
Feature activation-7.83e-01

Pos logit contributions
 small+0.000
 big+0.000
 cute+0.000
 clever+0.000
 kind+0.000
Neg logit contributions
ors-0.000
but-0.000
ges-0.000
rings-0.000
iked-0.000
Token Give
Token ID13786
Feature activation+6.29e+00

Pos logit contributions
 cute+0.031
 clever+0.030
 kind+0.030
 small+0.030
 gentle+0.029
Neg logit contributions
fully-0.016
ors-0.016
but-0.015
oes-0.014
or-0.014
Token and
Token ID290
Feature activation+1.05e+01

Pos logit contributions
ors+0.081
but+0.080
fully+0.078
driver+0.076
 Stop+0.075
Neg logit contributions
 cute-0.080
 clever-0.078
 small-0.078
 charming-0.077
 beautiful-0.076
Token they
Token ID484
Feature activation+1.24e+01

Pos logit contributions
ors+0.331
but+0.320
fully+0.309
driver+0.303
or+0.299
Neg logit contributions
 cute-0.317
 clever-0.307
 small-0.306
 charming-0.302
 beautiful-0.298
Token wanted
Token ID2227
Feature activation-8.41e+00

Pos logit contributions
 to+0.322
but+0.313
ors+0.312
fully+0.310
or+0.297
Neg logit contributions
 cute-0.397
 clever-0.393
 charming-0.384
 magical-0.379
 small-0.377
Token to
Token ID284
Feature activation+1.75e+01

Pos logit contributions
 cute+0.452
 clever+0.449
 small+0.447
 charming+0.443
 beautiful+0.442
Neg logit contributions
ors-0.064
but-0.051
fully-0.045
licks-0.038
driver-0.037
Token try
Token ID1949
Feature activation+6.55e+00

Pos logit contributions
ors+0.507
but+0.497
fully+0.483
 to+0.470
oes+0.465
Neg logit contributions
 cute-0.540
 clever-0.533
 charming-0.520
 small-0.518
 gentle-0.508
Token it
Token ID340
Feature activation+1.62e+01

Pos logit contributions
ors+0.145
but+0.140
fully+0.136
oes+0.130
driver+0.129
Neg logit contributions
 cute-0.253
 small-0.250
 clever-0.250
 charming-0.246
 beautiful-0.246
Token.
Token ID13
Feature activation-9.19e+00

Pos logit contributions
ors+0.450
but+0.436
fully+0.420
oes+0.405
driver+0.404
Neg logit contributions
 cute-0.529
 clever-0.524
 small-0.516
 charming-0.512
 beautiful-0.504
Token
Token ID220
Feature activation-6.84e+00

Pos logit contributions
 cute+0.312
 small+0.307
 clever+0.306
 charming+0.301
 beautiful+0.296
Neg logit contributions
ors-0.242
but-0.233
fully-0.230
driver-0.220
or-0.214
Token\n
Token ID198
Feature activation-6.34e+00

Pos logit contributions
 cute+0.193
 clever+0.190
 small+0.189
 charming+0.184
 beautiful+0.180
Neg logit contributions
ors-0.217
but-0.207
fully-0.206
driver-0.199
ting-0.197
Token\n
Token ID198
Feature activation-6.34e+00

Pos logit contributions
 cute+0.201
 clever+0.198
 charming+0.193
 small+0.192
 gentle+0.189
Neg logit contributions
ors-0.180
but-0.176
fully-0.171
 to-0.165
oes-0.164
TokenAn
Token ID2025
Feature activation-1.11e+01

Pos logit contributions
 cute+0.179
 clever+0.178
 charming+0.172
 cozy+0.169
 small+0.165
Neg logit contributions
but-0.198
 to-0.186
ors-0.184
fully-0.184
t-0.182
Token all
Token ID477
Feature activation+2.77e+00

Pos logit contributions
ors+0.097
but+0.095
fully+0.090
 Stop+0.089
driver+0.089
Neg logit contributions
 cute-0.098
 small-0.097
 clever-0.096
 charming-0.094
 beautiful-0.094
Token the
Token ID262
Feature activation-1.51e+00

Pos logit contributions
ors+0.100
but+0.100
driver+0.097
 Stop+0.096
fully+0.095
Neg logit contributions
 small-0.069
 cute-0.067
 beautiful-0.066
 clever-0.065
 charming-0.065
Token ingredients
Token ID9391
Feature activation+1.60e+01

Pos logit contributions
 cute+0.036
 clever+0.036
 charming+0.035
 gentle+0.034
 small+0.033
Neg logit contributions
ors-0.051
 to-0.051
but-0.050
fully-0.049
oes-0.048
Token and
Token ID290
Feature activation+1.40e+01

Pos logit contributions
 Stop+0.468
ors+0.465
uld+0.448
amb+0.441
licks+0.428
Neg logit contributions
 small-0.523
 clever-0.505
 beautiful-0.505
 cool-0.503
 big-0.502
Token then
Token ID788
Feature activation+4.93e+00

Pos logit contributions
ors+0.459
but+0.440
fully+0.427
oes+0.423
or+0.413
Neg logit contributions
 clever-0.398
 cute-0.397
 small-0.391
 charming-0.381
 gentle-0.380
Token they
Token ID484
Feature activation+1.55e+01

Pos logit contributions
ors+0.150
but+0.145
fully+0.138
oes+0.138
or+0.132
Neg logit contributions
 clever-0.151
 cute-0.151
 small-0.148
 gentle-0.144
 charming-0.144
Token mixed
Token ID7668
Feature activation-1.91e-01

Pos logit contributions
ors+0.483
but+0.474
fully+0.463
 to+0.449
oes+0.443
Neg logit contributions
 cute-0.448
 clever-0.441
 charming-0.430
 small-0.427
 gentle-0.419
Token them
Token ID606
Feature activation+1.37e+01

Pos logit contributions
 cute+0.007
 small+0.007
 clever+0.007
 beautiful+0.007
 charming+0.007
Neg logit contributions
ors-0.005
but-0.005
fully-0.004
driver-0.004
 Stop-0.004
Token together
Token ID1978
Feature activation+3.30e+00

Pos logit contributions
ors+0.302
but+0.277
 Stop+0.270
oes+0.267
fully+0.267
Neg logit contributions
 cute-0.527
 clever-0.527
 small-0.522
 beautiful-0.514
 charming-0.514
Token in
Token ID287
Feature activation+1.22e+01

Pos logit contributions
ors+0.101
but+0.092
 Stop+0.092
oes+0.091
licks+0.091
Neg logit contributions
 small-0.102
 clever-0.102
 cute-0.101
 beautiful-0.100
 charming-0.099
Token a
Token ID257
Feature activation-1.16e+01

Pos logit contributions
 to+0.364
but+0.337
fully+0.334
ges+0.325
ors+0.325
Neg logit contributions
 cute-0.357
 clever-0.352
 charming-0.341
 gentle-0.329
 playful-0.328
Token He
Token ID679
Feature activation+8.88e+00

Pos logit contributions
 cute+0.008
 clever+0.008
 small+0.008
 charming+0.008
 gentle+0.008
Neg logit contributions
ors-0.008
but-0.008
fully-0.008
oes-0.007
driver-0.007
Token was
Token ID373
Feature activation-1.62e+00

Pos logit contributions
ors+0.276
but+0.270
fully+0.265
 to+0.263
driver+0.251
Neg logit contributions
 cute-0.252
 clever-0.247
 charming-0.242
 small-0.240
 gentle-0.234
Token so
Token ID523
Feature activation+4.19e+00

Pos logit contributions
 cute+0.051
 clever+0.050
 charming+0.049
 small+0.049
 funny+0.048
Neg logit contributions
 to-0.044
but-0.042
fully-0.039
ors-0.039
driver-0.036
Token proud
Token ID6613
Feature activation-1.05e+01

Pos logit contributions
 to+0.164
but+0.158
t+0.149
you+0.145
to+0.145
Neg logit contributions
 cute-0.058
 mist-0.056
aver-0.054
 comm-0.054
 charming-0.054
Token that
Token ID326
Feature activation+1.05e+01

Pos logit contributions
 beautiful+0.392
 kind+0.391
 cute+0.388
 small+0.387
 clever+0.387
Neg logit contributions
ors-0.248
licks-0.242
driver-0.234
ancing-0.233
oes-0.233
Token he
Token ID339
Feature activation+1.20e+01

Pos logit contributions
ors+0.192
but+0.182
fully+0.174
oes+0.164
driver+0.163
Neg logit contributions
 cute-0.451
 clever-0.448
 small-0.440
 charming-0.439
 kind-0.433
Token learned
Token ID4499
Feature activation-4.41e+00

Pos logit contributions
ors+0.354
but+0.347
fully+0.340
 to+0.331
driver+0.320
Neg logit contributions
 cute-0.360
 clever-0.354
 small-0.347
 charming-0.347
 gentle-0.338
Token how
Token ID703
Feature activation+9.73e+00

Pos logit contributions
 cute+0.144
 clever+0.144
 beautiful+0.142
 small+0.141
 charming+0.141
Neg logit contributions
ors-0.127
oes-0.124
but-0.120
ges-0.120
fully-0.116
Token to
Token ID284
Feature activation+1.83e+01

Pos logit contributions
ors+0.079
but+0.067
fully+0.061
licks+0.054
oes+0.054
Neg logit contributions
 cute-0.529
 clever-0.528
 small-0.520
 charming-0.516
 kind-0.515
Token take
Token ID1011
Feature activation+4.24e+00

Pos logit contributions
ors+0.516
but+0.502
fully+0.497
 to+0.491
oes+0.481
Neg logit contributions
 cute-0.576
 clever-0.562
 charming-0.554
 small-0.546
 beautiful-0.534
Token care
Token ID1337
Feature activation+5.94e+00

Pos logit contributions
ors+0.125
 to+0.124
fully+0.121
but+0.119
or+0.114
Neg logit contributions
 cute-0.126
 clever-0.123
 charming-0.121
 small-0.117
 gentle-0.116
Token bike
Token ID7161
Feature activation+1.12e+01

Pos logit contributions
 cute+0.603
 clever+0.601
 charming+0.578
 small+0.570
 gentle+0.566
Neg logit contributions
ors-0.476
 to-0.475
but-0.471
fully-0.463
driver-0.456
Token.
Token ID13
Feature activation-7.44e+00

Pos logit contributions
ors+0.372
fully+0.363
but+0.360
 Stop+0.357
uld+0.351
Neg logit contributions
 small-0.307
 cute-0.305
 big-0.301
 kind-0.299
 charming-0.298
Token It
Token ID632
Feature activation+1.44e+01

Pos logit contributions
 cute+0.261
 small+0.258
 clever+0.256
 charming+0.252
 funny+0.248
Neg logit contributions
ors-0.189
fully-0.182
but-0.181
or-0.166
driver-0.164
Token is
Token ID318
Feature activation+8.60e+00

Pos logit contributions
but+0.463
fully+0.458
ors+0.453
ges+0.427
iked+0.419
Neg logit contributions
 cute-0.433
 small-0.424
 clever-0.419
 charming-0.412
 pretty-0.406
Token blue
Token ID4171
Feature activation-1.85e+01

Pos logit contributions
 to+0.354
driver+0.307
fully+0.307
but+0.305
rings+0.301
Neg logit contributions
 clever-0.153
 cute-0.147
 charming-0.138
 gentle-0.133
 magical-0.132
Token and
Token ID290
Feature activation+1.24e+01

Pos logit contributions
 big+0.592
 small+0.589
 pretty+0.574
 cute+0.568
 soft+0.562
Neg logit contributions
uld-0.541
ors-0.522
or-0.517
to-0.499
but-0.499
Token rusty
Token ID47855
Feature activation-1.10e+01

Pos logit contributions
ors+0.510
 to+0.506
but+0.503
driver+0.495
fully+0.495
Neg logit contributions
 clever-0.202
 cute-0.200
 charming-0.191
 gentle-0.183
 small-0.182
Token.
Token ID13
Feature activation-7.21e+00

Pos logit contributions
 cute+0.309
 clever+0.302
 small+0.299
 charming+0.296
 beautiful+0.290
Neg logit contributions
ors-0.353
but-0.348
fully-0.335
oes-0.324
or-0.321
Token\n
Token ID198
Feature activation-4.55e+00

Pos logit contributions
 cute+0.257
 clever+0.253
 small+0.251
 charming+0.248
 gentle+0.243
Neg logit contributions
ors-0.179
but-0.173
fully-0.171
driver-0.157
or-0.157
Token\n
Token ID198
Feature activation-4.50e+00

Pos logit contributions
 cute+0.136
 clever+0.134
 small+0.133
 charming+0.131
 beautiful+0.128
Neg logit contributions
ors-0.139
but-0.134
fully-0.132
driver-0.126
oes-0.124
Token"
Token ID1
Feature activation+4.63e+00

Pos logit contributions
 cute+0.108
 clever+0.106
 charming+0.103
 small+0.102
 gentle+0.099
Neg logit contributions
ors-0.163
but-0.161
fully-0.156
 to-0.153
oes-0.152
Token is
Token ID318
Feature activation+6.67e+00

Pos logit contributions
but+0.026
ors+0.026
fully+0.025
iked+0.024
ting+0.024
Neg logit contributions
 cute-0.026
 clever-0.025
 cool-0.025
 small-0.025
 charming-0.024
Token so
Token ID523
Feature activation+3.04e+00

Pos logit contributions
 to+0.254
ors+0.228
but+0.227
fully+0.221
oes+0.216
Neg logit contributions
 cute-0.133
 clever-0.133
 charming-0.127
 gentle-0.125
 kind-0.123
Token pretty
Token ID2495
Feature activation-2.03e+01

Pos logit contributions
 to+0.144
but+0.137
ors+0.136
oes+0.134
t+0.133
Neg logit contributions
 magical-0.020
 comm-0.020
 clever-0.020
 wonderful-0.019
 charming-0.019
Token.
Token ID13
Feature activation-8.77e+00

Pos logit contributions
 cute+0.555
 beautiful+0.538
 pretty+0.536
 big+0.530
 charming+0.525
Neg logit contributions
iked-0.727
oes-0.700
or-0.699
uld-0.691
arded-0.679
Token Can
Token ID1680
Feature activation+6.90e+00

Pos logit contributions
 cute+0.332
 small+0.319
 cool+0.317
 kind+0.315
 soft+0.311
Neg logit contributions
ors-0.206
fully-0.203
or-0.202
iked-0.192
but-0.189
Token I
Token ID314
Feature activation+1.18e+01

Pos logit contributions
ously+0.190
fully+0.186
iked+0.177
licks+0.176
ors+0.175
Neg logit contributions
 cute-0.240
 small-0.235
 clever-0.230
 pretty-0.228
 kind-0.228
Token play
Token ID711
Feature activation-4.94e+00

Pos logit contributions
ors+0.407
fully+0.396
but+0.395
ting+0.370
iked+0.369
Neg logit contributions
 cool-0.327
 cute-0.326
 clever-0.325
 small-0.319
 pretty-0.315
Token with
Token ID351
Feature activation+6.46e+00

Pos logit contributions
 clever+0.191
 cute+0.191
 beautiful+0.188
 charming+0.188
 small+0.188
Neg logit contributions
ors-0.115
oes-0.108
iked-0.102
rings-0.102
fully-0.102
Token it
Token ID340
Feature activation+1.64e+01

Pos logit contributions
fully+0.139
ancing+0.138
iked+0.138
but+0.138
oes+0.135
Neg logit contributions
 cute-0.250
 small-0.249
 pretty-0.246
 beautiful-0.243
 cool-0.242
Token?"
Token ID1701
Feature activation-1.61e-03

Pos logit contributions
ors+0.531
but+0.500
ges+0.491
rings+0.487
fully+0.485
Neg logit contributions
 cute-0.479
 small-0.476
 clever-0.475
 charming-0.471
 beautiful-0.465
Token Tim
Token ID5045
Feature activation-5.61e+00

Pos logit contributions
 cute+0.000
 clever+0.000
 charming+0.000
 small+0.000
 gentle+0.000
Neg logit contributions
ors-0.000
but-0.000
fully-0.000
oes-0.000
driver-0.000
Token had
Token ID550
Feature activation+1.97e+00

Pos logit contributions
 cute+0.054
 clever+0.054
 charming+0.053
 small+0.051
 beautiful+0.051
Neg logit contributions
ors-0.063
 to-0.062
but-0.061
fully-0.060
driver-0.056
Token a
Token ID257
Feature activation-1.16e+01

Pos logit contributions
 to+0.075
but+0.065
ors+0.064
fully+0.064
driver+0.063
Neg logit contributions
 clever-0.042
 cute-0.041
 charming-0.041
 kind-0.039
 humble-0.038
Token yellow
Token ID7872
Feature activation-1.57e+01

Pos logit contributions
 clever+0.139
aver+0.130
 elderly+0.129
 cute+0.127
 charming+0.126
Neg logit contributions
 to-0.500
driver-0.426
't-0.408
 it-0.405
but-0.403
Token bubble
Token ID14310
Feature activation-9.50e+00

Pos logit contributions
 cute+0.337
 clever+0.326
 small+0.323
 charming+0.318
 gentle+0.311
Neg logit contributions
ors-0.619
but-0.608
fully-0.589
oes-0.571
driver-0.568
Token wand
Token ID11569
Feature activation+3.65e+00

Pos logit contributions
 cute+0.307
 small+0.303
 clever+0.301
 charming+0.295
 beautiful+0.295
Neg logit contributions
ors-0.273
but-0.264
oes-0.252
 Stop-0.249
fully-0.248
Token that
Token ID326
Feature activation+1.12e+01

Pos logit contributions
ors+0.086
 Stop+0.079
oes+0.078
uld+0.078
but+0.078
Neg logit contributions
 small-0.137
 cute-0.137
 clever-0.135
 beautiful-0.133
 kind-0.133
Token made
Token ID925
Feature activation+1.44e+00

Pos logit contributions
ors+0.238
 to+0.237
but+0.234
fully+0.229
oes+0.217
Neg logit contributions
 cute-0.407
 clever-0.405
 charming-0.399
 gentle-0.391
 small-0.388
Token small
Token ID1402
Feature activation-2.58e+01

Pos logit contributions
ors+0.049
but+0.047
fully+0.047
 to+0.046
oes+0.045
Neg logit contributions
 cute-0.037
 clever-0.037
 charming-0.036
 small-0.035
 gentle-0.035
Token bubbles
Token ID25037
Feature activation-3.21e+00

Pos logit contributions
 cute+0.637
 clever+0.632
 charming+0.611
 small+0.603
 gentle+0.592
Neg logit contributions
ors-0.906
but-0.881
fully-0.869
 to-0.853
oes-0.838
Token.
Token ID13
Feature activation-6.71e+00

Pos logit contributions
 cute+0.097
 small+0.096
 clever+0.095
 beautiful+0.095
 funny+0.094
Neg logit contributions
ors-0.100
but-0.097
 Stop-0.094
or-0.092
oes-0.091
Token Sara
Token ID24799
Feature activation-1.42e+00

Pos logit contributions
 cute+0.234
 clever+0.230
 small+0.226
 charming+0.226
 gentle+0.221
Neg logit contributions
ors-0.171
but-0.165
fully-0.162
 to-0.153
oes-0.153
Token a
Token ID257
Feature activation-1.28e+01

Pos logit contributions
ors+0.267
but+0.255
fully+0.244
driver+0.242
oes+0.239
Neg logit contributions
 cute-0.207
 clever-0.206
 small-0.201
 charming-0.199
 gentle-0.197
Token bit
Token ID1643
Feature activation-8.78e+00

Pos logit contributions
 cute+0.249
 clever+0.236
 charming+0.232
 funny+0.217
 beautiful+0.215
Neg logit contributions
 to-0.510
but-0.475
ors-0.460
or-0.458
driver-0.458
Token of
Token ID286
Feature activation+9.35e+00

Pos logit contributions
 small+0.354
 clever+0.350
 cute+0.349
 gentle+0.345
 kind+0.343
Neg logit contributions
ors-0.192
but-0.182
fully-0.175
driver-0.169
rings-0.159
Token effort
Token ID3626
Feature activation+1.60e+00

Pos logit contributions
ors+0.311
 to+0.310
but+0.305
fully+0.299
driver+0.288
Neg logit contributions
 cute-0.236
 clever-0.230
 charming-0.224
 small-0.218
 beautiful-0.214
Token,
Token ID11
Feature activation+1.32e+00

Pos logit contributions
but+0.043
ors+0.043
licks+0.041
amb+0.041
oes+0.040
Neg logit contributions
 small-0.054
 clever-0.053
 cute-0.052
 big-0.052
 kind-0.051
Token he
Token ID339
Feature activation+1.22e+01

Pos logit contributions
ors+0.035
but+0.033
fully+0.032
oes+0.031
licks+0.031
Neg logit contributions
 clever-0.049
 small-0.048
 cute-0.048
 kind-0.047
 beautiful-0.047
Token could
Token ID714
Feature activation+2.79e+00

Pos logit contributions
ors+0.325
but+0.325
fully+0.316
ges+0.300
oes+0.296
Neg logit contributions
 clever-0.421
 cute-0.408
 small-0.399
 kind-0.399
 charming-0.395
Token make
Token ID787
Feature activation+4.85e+00

Pos logit contributions
ors+0.103
ges+0.097
but+0.097
fully+0.096
licks+0.095
Neg logit contributions
 clever-0.072
 cool-0.070
 kind-0.069
 small-0.068
 cute-0.068
Token a
Token ID257
Feature activation-1.26e+01

Pos logit contributions
ors+0.141
 to+0.140
but+0.134
fully+0.134
or+0.125
Neg logit contributions
 cute-0.143
 clever-0.137
 charming-0.136
 gentle-0.134
 small-0.133
Token difference
Token ID3580
Feature activation+9.90e+00

Pos logit contributions
 cute+0.220
 clever+0.206
 charming+0.197
aver+0.194
 gentle+0.190
Neg logit contributions
 to-0.509
but-0.461
oes-0.445
ors-0.444
fully-0.440
Token.
Token ID13
Feature activation-7.71e+00

Pos logit contributions
ors+0.321
driver+0.298
iked+0.298
uld+0.295
but+0.294
Neg logit contributions
 kind-0.288
 clever-0.281
 cute-0.281
 small-0.279
 charming-0.274
Token to
Token ID284
Feature activation+1.73e+01

Pos logit contributions
 cute+0.043
 clever+0.043
 charming+0.042
 small+0.042
 gentle+0.042
Neg logit contributions
ors-0.002
but-0.002
 to-0.002
fully-0.001
oes-0.000
Token help
Token ID1037
Feature activation-3.69e+00

Pos logit contributions
 to+0.450
fully+0.441
ors+0.440
but+0.427
 Stop+0.410
Neg logit contributions
 cute-0.563
 charming-0.541
 clever-0.538
 small-0.530
 beautiful-0.523
Token her
Token ID607
Feature activation+8.95e-01

Pos logit contributions
 cute+0.120
 clever+0.119
 small+0.116
 charming+0.115
 gentle+0.114
Neg logit contributions
ors-0.103
but-0.100
fully-0.097
oes-0.093
driver-0.093
Token mom
Token ID1995
Feature activation-5.83e+00

Pos logit contributions
 to+0.033
ors+0.031
but+0.031
fully+0.030
driver+0.029
Neg logit contributions
 cute-0.021
 clever-0.020
 charming-0.019
 magical-0.019
 beautiful-0.019
Token.
Token ID13
Feature activation-8.98e+00

Pos logit contributions
 cute+0.164
 clever+0.164
 small+0.161
 kind+0.159
 charming+0.158
Neg logit contributions
ors-0.190
but-0.183
fully-0.179
 Stop-0.173
driver-0.171
Token So
Token ID1406
Feature activation-9.07e-01

Pos logit contributions
 small+0.324
 cute+0.318
 little+0.311
 charming+0.310
 gentle+0.307
Neg logit contributions
ors-0.223
fully-0.211
ges-0.210
or-0.206
driver-0.205
Token,
Token ID11
Feature activation+2.08e-01

Pos logit contributions
 small+0.029
 clever+0.028
 cute+0.028
 kind+0.027
 big+0.027
Neg logit contributions
ors-0.029
oes-0.027
licks-0.027
but-0.026
to-0.026
Token she
Token ID673
Feature activation+1.14e+01

Pos logit contributions
ors+0.005
oes+0.004
driver+0.004
licks+0.004
ancing+0.004
Neg logit contributions
 cute-0.008
 small-0.008
 clever-0.008
 charming-0.008
 funny-0.008
Token turned
Token ID2900
Feature activation-1.19e+01

Pos logit contributions
ors+0.347
but+0.338
 to+0.338
fully+0.337
driver+0.324
Neg logit contributions
 cute-0.332
 clever-0.321
 charming-0.319
 small-0.312
 beautiful-0.310
Token off
Token ID572
Feature activation+1.31e+01

Pos logit contributions
 small+0.489
 big+0.474
 cute+0.471
 clever+0.466
 beautiful+0.465
Neg logit contributions
ors-0.247
oes-0.246
but-0.241
ancing-0.230
 Stop-0.225
Token her
Token ID607
Feature activation+8.36e-01

Pos logit contributions
ors+0.416
but+0.404
fully+0.392
oes+0.384
driver+0.376
Neg logit contributions
 cute-0.380
 clever-0.376
 small-0.368
 charming-0.366
 gentle-0.359
TokenOne
Token ID3198
Feature activation-8.58e+00

Pos logit contributions
 cute+0.128
 clever+0.126
 small+0.123
 charming+0.123
 gentle+0.119
Neg logit contributions
ors-0.231
but-0.222
fully-0.221
driver-0.214
oes-0.214
Token day
Token ID1110
Feature activation-3.37e+00

Pos logit contributions
 cute+0.257
 clever+0.253
 charming+0.237
 magical+0.233
 cozy+0.232
Neg logit contributions
 to-0.254
oes-0.237
but-0.220
fully-0.216
ing-0.213
Token,
Token ID11
Feature activation+5.78e-01

Pos logit contributions
 small+0.111
 cute+0.110
 clever+0.110
 kind+0.107
 charming+0.106
Neg logit contributions
ors-0.098
oes-0.091
but-0.091
licks-0.089
rings-0.086
Token John
Token ID1757
Feature activation+3.61e+00

Pos logit contributions
ors+0.014
but+0.012
oes+0.012
driver+0.012
licks+0.012
Neg logit contributions
 cute-0.022
 clever-0.022
 small-0.022
 charming-0.021
 beautiful-0.021
Token saw
Token ID2497
Feature activation-8.40e+00

Pos logit contributions
ors+0.115
but+0.114
fully+0.111
oes+0.106
driver+0.106
Neg logit contributions
 cute-0.101
 clever-0.100
 small-0.098
 charming-0.097
 gentle-0.095
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 cute+0.230
 clever+0.226
 charming+0.217
 gentle+0.213
 cozy+0.209
Neg logit contributions
 to-0.275
fully-0.253
but-0.252
 him-0.246
ors-0.242
Token girl
Token ID2576
Feature activation-4.63e-01

Pos logit contributions
 clever+0.191
 cute+0.182
aver+0.181
 gentle+0.173
 charming+0.171
Neg logit contributions
 to-0.562
ing-0.496
but-0.479
driver-0.478
oes-0.471
Token in
Token ID287
Feature activation+9.86e+00

Pos logit contributions
 cute+0.014
 clever+0.014
 small+0.014
 charming+0.014
 kind+0.013
Neg logit contributions
ors-0.014
but-0.014
fully-0.013
driver-0.013
oes-0.013
Token the
Token ID262
Feature activation-4.62e+00

Pos logit contributions
 to+0.276
 him+0.243
ges+0.242
ting+0.234
driver+0.232
Neg logit contributions
 clever-0.299
 cute-0.295
 charming-0.287
 gentle-0.280
 wonderful-0.275
Token park
Token ID3952
Feature activation-6.31e+00

Pos logit contributions
 clever+0.130
 cute+0.128
 compassionate+0.118
aver+0.114
 playful+0.114
Neg logit contributions
 to-0.137
but-0.123
oes-0.119
ting-0.118
als-0.115
Token.
Token ID13
Feature activation-9.11e+00

Pos logit contributions
 cute+0.177
 small+0.176
 kind+0.174
 clever+0.174
 charming+0.173
Neg logit contributions
ors-0.207
fully-0.192
driver-0.191
but-0.190
or-0.184
Token some
Token ID617
Feature activation-4.84e+00

Pos logit contributions
but+0.043
ors+0.043
iked+0.041
oes+0.041
fully+0.041
Neg logit contributions
 small-0.043
 cute-0.043
 beautiful-0.042
 clever-0.042
 cool-0.041
Token paper
Token ID3348
Feature activation-1.40e+00

Pos logit contributions
 cute+0.114
 clever+0.112
 small+0.110
 charming+0.109
 beautiful+0.106
Neg logit contributions
ors-0.178
but-0.175
fully-0.170
oes-0.165
driver-0.164
Token,"
Token ID553
Feature activation-2.09e+00

Pos logit contributions
 small+0.048
 big+0.047
 beautiful+0.047
 pretty+0.047
 clever+0.046
Neg logit contributions
 Stop-0.036
ting-0.035
to-0.034
but-0.034
ors-0.034
Token Jill
Token ID22772
Feature activation-1.32e+00

Pos logit contributions
 clever+0.068
 cute+0.068
 charming+0.067
 beautiful+0.065
 small+0.065
Neg logit contributions
but-0.054
 to-0.053
ors-0.053
fully-0.052
 Stop-0.049
Token said
Token ID531
Feature activation-1.53e+01

Pos logit contributions
 cute+0.035
 clever+0.033
 charming+0.033
 beautiful+0.032
 small+0.032
Neg logit contributions
ors-0.045
fully-0.044
but-0.044
 to-0.044
driver-0.041
Token.
Token ID13
Feature activation-8.57e+00

Pos logit contributions
 cute+0.440
 clever+0.437
 small+0.427
 charming+0.420
 gentle+0.418
Neg logit contributions
ors-0.483
but-0.467
fully-0.449
oes-0.445
driver-0.436
Token She
Token ID1375
Feature activation+1.18e+01

Pos logit contributions
 small+0.256
 cute+0.233
 big+0.230
 clever+0.228
 tiny+0.228
Neg logit contributions
ors-0.298
or-0.273
reating-0.266
ancing-0.258
fully-0.258
Token took
Token ID1718
Feature activation-3.00e+00

Pos logit contributions
but+0.342
 to+0.341
ors+0.341
fully+0.339
driver+0.332
Neg logit contributions
 cute-0.345
 charming-0.335
 clever-0.328
 beautiful-0.324
 wonderful-0.321
Token out
Token ID503
Feature activation+6.50e+00

Pos logit contributions
 cute+0.099
 clever+0.098
 charming+0.096
 small+0.095
 gentle+0.094
Neg logit contributions
ors-0.082
but-0.080
fully-0.078
 to-0.075
oes-0.074
Token a
Token ID257
Feature activation-1.37e+01

Pos logit contributions
 to+0.212
ors+0.210
but+0.210
fully+0.209
driver+0.199
Neg logit contributions
 cute-0.172
 clever-0.168
 charming-0.164
 beautiful-0.157
 gentle-0.157
Token big
Token ID1263
Feature activation-2.21e+01

Pos logit contributions
 cute+0.310
 clever+0.304
 charming+0.297
 kind+0.282
 playful+0.279
Neg logit contributions
 to-0.496
driver-0.469
but-0.467
fully-0.467
ors-0.462
Token air
Token ID1633
Feature activation+1.45e+00

Pos logit contributions
 cute+0.016
 small+0.015
 clever+0.015
 charming+0.015
 beautiful+0.015
Neg logit contributions
ors-0.015
but-0.014
fully-0.014
driver-0.014
oes-0.013
Token.
Token ID13
Feature activation-8.84e+00

Pos logit contributions
ors+0.044
but+0.042
 Stop+0.041
driver+0.041
oes+0.040
Neg logit contributions
 cute-0.044
 small-0.043
 beautiful-0.043
 clever-0.043
 charming-0.043
Token
Token ID220
Feature activation-6.53e+00

Pos logit contributions
 cute+0.248
 small+0.246
 clever+0.241
 charming+0.238
 cool+0.231
Neg logit contributions
ors-0.285
fully-0.278
but-0.277
driver-0.264
or-0.260
Token
Token ID198
Feature activation-5.88e+00

Pos logit contributions
 cute+0.184
 clever+0.181
 small+0.181
 charming+0.176
 gentle+0.172
Neg logit contributions
ors-0.208
but-0.199
fully-0.199
driver-0.190
oes-0.188
Token
Token ID198
Feature activation-5.88e+00

Pos logit contributions
 cute+0.189
 clever+0.186
 charming+0.182
 small+0.180
 gentle+0.178
Neg logit contributions
ors-0.164
but-0.161
fully-0.156
 to-0.152
oes-0.150
TokenOne
Token ID3198
Feature activation-8.81e+00

Pos logit contributions
 cute+0.141
 clever+0.139
 charming+0.134
 small+0.132
 beautiful+0.130
Neg logit contributions
ors-0.211
but-0.210
fully-0.204
 to-0.201
oes-0.198
Token morning
Token ID3329
Feature activation-5.29e-01

Pos logit contributions
 cute+0.258
 clever+0.253
 charming+0.240
 magical+0.234
 pretty+0.233
Neg logit contributions
 to-0.264
oes-0.239
fully-0.227
but-0.219
ing-0.216
Token,
Token ID11
Feature activation+2.14e-01

Pos logit contributions
 cute+0.020
 clever+0.020
 small+0.019
 charming+0.019
 beautiful+0.019
Neg logit contributions
ors-0.012
but-0.012
fully-0.011
oes-0.011
driver-0.011
Token the
Token ID262
Feature activation-4.92e+00

Pos logit contributions
ors+0.006
but+0.006
fully+0.006
 to+0.006
oes+0.006
Neg logit contributions
 cute-0.007
 clever-0.007
 charming-0.006
 small-0.006
 gentle-0.006
Token bunny
Token ID44915
Feature activation-5.05e+00

Pos logit contributions
 wond+0.152
 eng+0.149
 wonderful+0.148
aver+0.148
 clever+0.146
Neg logit contributions
 to-0.104
ting-0.102
fully-0.084
ted-0.083
 whenever-0.082
Token decided
Token ID3066
Feature activation-3.28e+00

Pos logit contributions
 cute+0.133
 clever+0.131
 small+0.127
 charming+0.127
 gentle+0.124
Neg logit contributions
ors-0.171
but-0.168
fully-0.164
oes-0.158
driver-0.158
Token some
Token ID617
Feature activation-4.90e+00

Pos logit contributions
ors+0.055
but+0.055
fully+0.054
driver+0.052
oes+0.051
Neg logit contributions
 cute-0.063
 small-0.062
 clever-0.062
 charming-0.061
 beautiful-0.061
Token fruits
Token ID15921
Feature activation+9.14e+00

Pos logit contributions
 small+0.141
 cute+0.140
 pretty+0.139
 beautiful+0.137
 charming+0.136
Neg logit contributions
ors-0.160
fully-0.157
but-0.156
driver-0.152
 Stop-0.150
Token,
Token ID11
Feature activation+1.11e+00

Pos logit contributions
ors+0.343
ously+0.329
iked+0.322
 Stop+0.321
to+0.321
Neg logit contributions
 big-0.248
 pretty-0.240
 small-0.239
 cool-0.234
 beautiful-0.227
Token Mom
Token ID11254
Feature activation-7.79e+00

Pos logit contributions
ors+0.033
but+0.031
fully+0.030
or+0.029
driver+0.029
Neg logit contributions
 cute-0.037
 small-0.036
 clever-0.036
 charming-0.035
 beautiful-0.035
Token?"
Token ID1701
Feature activation+8.72e-01

Pos logit contributions
 kind+0.181
 small+0.180
 cute+0.179
 pretty+0.173
 charming+0.173
Neg logit contributions
but-0.290
ors-0.289
fully-0.287
iked-0.286
 Stop-0.277
Token Lily
Token ID20037
Feature activation-7.82e+00

Pos logit contributions
ors+0.023
but+0.022
 to+0.022
fully+0.022
oes+0.021
Neg logit contributions
 cute-0.029
 clever-0.029
 charming-0.028
 beautiful-0.028
 small-0.028
Token asked
Token ID1965
Feature activation-8.89e+00

Pos logit contributions
 cute+0.220
 clever+0.218
 charming+0.216
 wonderful+0.213
 beautiful+0.211
Neg logit contributions
ors-0.243
 to-0.242
but-0.240
fully-0.234
oes-0.220
Token.
Token ID13
Feature activation-8.01e+00

Pos logit contributions
 cute+0.295
 clever+0.291
 charming+0.291
 beautiful+0.284
 magical+0.281
Neg logit contributions
 to-0.242
fully-0.238
ors-0.235
but-0.233
oes-0.219
Token
Token ID198
Feature activation-5.54e+00

Pos logit contributions
 cute+0.249
 small+0.245
 clever+0.245
 charming+0.238
 gentle+0.235
Neg logit contributions
ors-0.243
but-0.228
fully-0.227
or-0.216
oes-0.215
Token
Token ID198
Feature activation-5.52e+00

Pos logit contributions
 cute+0.178
 clever+0.176
 charming+0.172
 small+0.170
 beautiful+0.168
Neg logit contributions
ors-0.154
but-0.151
fully-0.146
 to-0.143
oes-0.141
Token"
Token ID1
Feature activation+3.54e+00

Pos logit contributions
 cute+0.197
 clever+0.195
 small+0.191
 charming+0.191
 gentle+0.188
Neg logit contributions
ors-0.137
but-0.133
fully-0.129
 to-0.123
oes-0.122
Token did
Token ID750
Feature activation-1.47e+00

Pos logit contributions
ors+0.341
but+0.341
 to+0.340
fully+0.337
or+0.321
Neg logit contributions
 cute-0.423
 clever-0.410
 charming-0.406
 small-0.396
 beautiful-0.391
Token,
Token ID11
Feature activation+2.17e+00

Pos logit contributions
 clever+0.047
 small+0.046
 funny+0.044
 cute+0.044
 brave+0.044
Neg logit contributions
oes-0.046
ors-0.045
licks-0.043
but-0.042
ays-0.041
Token he
Token ID339
Feature activation+1.27e+01

Pos logit contributions
ors+0.047
but+0.044
oes+0.043
fully+0.043
driver+0.040
Neg logit contributions
 clever-0.086
 cute-0.085
 small-0.085
 charming-0.083
 beautiful-0.082
Token felt
Token ID2936
Feature activation-1.76e+00

Pos logit contributions
ors+0.370
but+0.369
 to+0.367
fully+0.359
or+0.343
Neg logit contributions
 cute-0.379
 clever-0.365
 charming-0.363
 small-0.353
 beautiful-0.348
Token a
Token ID257
Feature activation-1.24e+01

Pos logit contributions
 cute+0.046
 clever+0.044
 charming+0.044
 small+0.042
 gentle+0.041
Neg logit contributions
 to-0.057
but-0.056
ors-0.053
fully-0.053
ting-0.051
Token soft
Token ID2705
Feature activation-2.02e+01

Pos logit contributions
 cute+0.211
 charming+0.200
aver+0.192
 clever+0.190
 pretty+0.178
Neg logit contributions
 to-0.483
fully-0.454
but-0.453
t-0.437
ing-0.423
Token breeze
Token ID28633
Feature activation+4.13e+00

Pos logit contributions
 cute+0.528
 clever+0.520
 charming+0.505
 small+0.502
 gentle+0.490
Neg logit contributions
ors-0.684
but-0.671
fully-0.658
 to-0.639
oes-0.632
Token against
Token ID1028
Feature activation+7.09e+00

Pos logit contributions
ors+0.106
but+0.099
oes+0.097
 Stop+0.094
licks+0.092
Neg logit contributions
 cute-0.143
 small-0.142
 clever-0.141
 beautiful-0.139
 gentle-0.139
Token his
Token ID465
Feature activation-1.11e+00

Pos logit contributions
fully+0.175
ors+0.173
 to+0.172
but+0.169
ges+0.158
Neg logit contributions
 cute-0.247
 clever-0.244
 charming-0.239
 gentle-0.233
 kind-0.230
Token face
Token ID1986
Feature activation+2.97e+00

Pos logit contributions
 cute+0.028
 clever+0.028
 charming+0.027
 small+0.026
 kind+0.026
Neg logit contributions
 to-0.037
ors-0.037
fully-0.036
but-0.036
oes-0.034
Token.
Token ID13
Feature activation-7.94e+00

Pos logit contributions
ors+0.081
but+0.076
oes+0.074
fully+0.073
 Stop+0.071
Neg logit contributions
 cute-0.098
 clever-0.097
 small-0.097
 beautiful-0.095
 charming-0.095
Token
Token ID198
Feature activation-6.03e+00

Pos logit contributions
 small+0.235
 cute+0.231
 clever+0.227
 gentle+0.220
 kind+0.217
Neg logit contributions
ors-0.302
fully-0.279
or-0.276
oes-0.274
but-0.268
Token
Token ID198
Feature activation-6.01e+00

Pos logit contributions
 cute+0.198
 clever+0.195
 charming+0.191
 small+0.188
 beautiful+0.186
Neg logit contributions
ors-0.165
but-0.164
fully-0.157
 to-0.153
driver-0.148
Token"
Token ID1
Feature activation+3.37e+00

Pos logit contributions
 cute+0.156
 clever+0.155
 small+0.151
 charming+0.150
 gentle+0.149
Neg logit contributions
ors-0.208
but-0.199
fully-0.198
oes-0.193
 to-0.191
TokenYou
Token ID1639
Feature activation+8.62e+00

Pos logit contributions
but+0.086
fully+0.080
ors+0.079
 to+0.076
oes+0.074
Neg logit contributions
 cute-0.117
 clever-0.116
 small-0.114
 charming-0.113
 gentle-0.110
Token were
Token ID547
Feature activation-5.04e-01

Pos logit contributions
 to+0.112
ors+0.110
fully+0.102
 Stop+0.100
t+0.097
Neg logit contributions
 cute-0.380
 charming-0.376
 small-0.372
 clever-0.370
 lively-0.365
Token brave
Token ID14802
Feature activation-2.24e+01

Pos logit contributions
 cute+0.015
 small+0.014
 charming+0.014
 clever+0.014
 gentle+0.014
Neg logit contributions
 to-0.014
ors-0.012
but-0.012
fully-0.012
 Stop-0.011
Token to
Token ID284
Feature activation+1.76e+01

Pos logit contributions
 cute+0.777
 clever+0.764
 small+0.761
 kind+0.747
 charming+0.743
Neg logit contributions
but-0.592
ors-0.588
oes-0.564
or-0.536
ancing-0.535
Token stay
Token ID2652
Feature activation+5.60e+00

Pos logit contributions
ors+0.519
but+0.507
fully+0.494
 to+0.478
oes+0.473
Neg logit contributions
 cute-0.540
 clever-0.533
 charming-0.520
 small-0.518
 gentle-0.508
Token,"
Token ID553
Feature activation-2.70e+00

Pos logit contributions
ors+0.126
but+0.123
 to+0.120
fully+0.119
or+0.110
Neg logit contributions
 cute-0.205
 clever-0.202
 charming-0.198
 small-0.197
 gentle-0.193
Token said
Token ID531
Feature activation-1.57e+01

Pos logit contributions
 cute+0.079
 clever+0.077
 charming+0.076
 small+0.073
 beautiful+0.072
Neg logit contributions
 to-0.081
but-0.080
fully-0.079
ors-0.075
ting-0.071
Token the
Token ID262
Feature activation-4.73e+00

Pos logit contributions
 cute+0.505
 clever+0.494
 charming+0.487
 beautiful+0.471
 small+0.470
Neg logit contributions
but-0.429
 to-0.428
fully-0.428
ors-0.427
or-0.389
Token of
Token ID286
Feature activation+8.18e+00

Pos logit contributions
 beautiful+0.416
 cute+0.406
 charming+0.402
 small+0.402
 big+0.401
Neg logit contributions
driver-0.300
ors-0.299
licks-0.289
ancing-0.278
or-0.276
Token herself
Token ID5223
Feature activation+3.49e+00

Pos logit contributions
ors+0.217
but+0.203
driver+0.197
oes+0.197
fully+0.197
Neg logit contributions
 cute-0.290
 clever-0.288
 small-0.281
 charming-0.280
 beautiful-0.278
Token for
Token ID329
Feature activation+1.12e+01

Pos logit contributions
ors+0.087
oes+0.083
licks+0.081
but+0.080
 Stop+0.077
Neg logit contributions
 clever-0.132
 cute-0.130
 beautiful-0.127
 charming-0.127
 small-0.126
Token being
Token ID852
Feature activation+8.96e+00

Pos logit contributions
ors+0.297
but+0.291
fully+0.283
 to+0.272
oes+0.267
Neg logit contributions
 cute-0.377
 clever-0.372
 charming-0.364
 small-0.362
 gentle-0.356
Token so
Token ID523
Feature activation+3.11e+00

Pos logit contributions
 to+0.301
but+0.280
fully+0.257
ting+0.252
you+0.252
Neg logit contributions
 cute-0.226
 cozy-0.212
 small-0.212
 charming-0.207
 clever-0.205
Token brave
Token ID14802
Feature activation-2.23e+01

Pos logit contributions
 to+0.145
but+0.141
ing+0.136
t+0.135
 Stop+0.135
Neg logit contributions
 cute-0.025
aver-0.025
 cozy-0.024
 charming-0.023
 magical-0.021
Token and
Token ID290
Feature activation+1.11e+01

Pos logit contributions
 cute+0.751
 clever+0.737
 beautiful+0.736
 charming+0.735
 kind+0.735
Neg logit contributions
but-0.595
ors-0.588
oes-0.580
ancing-0.553
to-0.553
Token performing
Token ID9489
Feature activation-7.78e-01

Pos logit contributions
but+0.428
 to+0.421
ors+0.418
fully+0.416
oes+0.396
Neg logit contributions
 cute-0.226
 clever-0.217
 small-0.213
 charming-0.212
 beautiful-0.201
Token the
Token ID262
Feature activation-4.79e+00

Pos logit contributions
 cute+0.026
 clever+0.026
 charming+0.025
 small+0.025
 gentle+0.024
Neg logit contributions
ors-0.021
but-0.020
fully-0.020
 to-0.019
oes-0.019
Token show
Token ID905
Feature activation-5.16e+00

Pos logit contributions
 cute+0.094
aver+0.089
 cozy+0.086
 clever+0.084
 charming+0.083
Neg logit contributions
 to-0.181
but-0.172
fully-0.172
oes-0.162
you-0.159
Token.
Token ID13
Feature activation-8.72e+00

Pos logit contributions
 cute+0.156
 clever+0.156
 kind+0.153
 charming+0.152
 beautiful+0.152
Neg logit contributions
ors-0.154
but-0.144
driver-0.142
oes-0.139
licks-0.138
Token the
Token ID262
Feature activation-4.58e+00

Pos logit contributions
 to+0.256
ors+0.218
 him+0.217
ges+0.216
fully+0.213
Neg logit contributions
 cute-0.333
 clever-0.330
 charming-0.322
 beautiful-0.310
 gentle-0.304
Token morning
Token ID3329
Feature activation-3.41e-01

Pos logit contributions
 clever+0.112
 cute+0.107
aver+0.104
 charming+0.099
 wonderful+0.097
Neg logit contributions
 to-0.155
ting-0.129
oes-0.129
 Stop-0.128
or-0.128
Token.
Token ID13
Feature activation-8.48e+00

Pos logit contributions
 cute+0.011
 clever+0.011
 charming+0.011
 small+0.011
 gentle+0.011
Neg logit contributions
ors-0.009
but-0.009
fully-0.009
 to-0.009
oes-0.008
Token He
Token ID679
Feature activation+7.96e+00

Pos logit contributions
 cute+0.323
 clever+0.317
 small+0.316
 charming+0.312
 gentle+0.307
Neg logit contributions
ors-0.197
but-0.184
fully-0.181
or-0.169
driver-0.168
Token feels
Token ID5300
Feature activation+6.73e+00

Pos logit contributions
ors+0.240
but+0.234
fully+0.229
oes+0.219
driver+0.218
Neg logit contributions
 cute-0.242
 clever-0.239
 charming-0.232
 small-0.232
 gentle-0.228
Token brave
Token ID14802
Feature activation-2.27e+01

Pos logit contributions
 to+0.255
but+0.231
 him+0.227
fully+0.224
ting+0.223
Neg logit contributions
 cute-0.124
 clever-0.123
 charming-0.118
aver-0.114
 large-0.112
Token and
Token ID290
Feature activation+1.14e+01

Pos logit contributions
 small+0.852
 cute+0.851
 kind+0.841
 clever+0.837
 gentle+0.834
Neg logit contributions
ors-0.546
oes-0.528
but-0.524
or-0.484
to-0.483
Token strong
Token ID1913
Feature activation-2.29e+01

Pos logit contributions
 to+0.539
but+0.526
fully+0.504
 Stop+0.490
ors+0.489
Neg logit contributions
 cute-0.105
 charming-0.093
 small-0.093
 clever-0.092
 beautiful-0.084
Token.
Token ID13
Feature activation-8.63e+00

Pos logit contributions
 cute+0.581
 small+0.575
 clever+0.571
 kind+0.562
 charming+0.556
Neg logit contributions
ors-0.805
oes-0.778
but-0.775
or-0.749
uld-0.748
Token He
Token ID679
Feature activation+7.87e+00

Pos logit contributions
 cute+0.325
 clever+0.320
 small+0.317
 charming+0.314
 gentle+0.309
Neg logit contributions
ors-0.204
but-0.191
fully-0.188
oes-0.176
or-0.175
Token thinks
Token ID6834
Feature activation+8.70e-02

Pos logit contributions
ors+0.233
but+0.228
fully+0.221
 to+0.218
oes+0.213
Neg logit contributions
 cute-0.236
 clever-0.233
 charming-0.227
 small-0.226
 beautiful-0.222
Token day
Token ID1110
Feature activation-2.61e+00

Pos logit contributions
 cute+0.235
 clever+0.230
 charming+0.220
 gentle+0.214
 playful+0.212
Neg logit contributions
 to-0.231
oes-0.211
fully-0.199
t-0.194
ing-0.193
Token,
Token ID11
Feature activation+1.55e+00

Pos logit contributions
 cute+0.092
 clever+0.092
 small+0.091
 funny+0.088
 charming+0.088
Neg logit contributions
ors-0.068
but-0.066
oes-0.063
fully-0.062
licks-0.059
Token he
Token ID339
Feature activation+1.19e+01

Pos logit contributions
ors+0.041
but+0.040
fully+0.038
oes+0.037
driver+0.037
Neg logit contributions
 cute-0.053
 clever-0.053
 small-0.051
 charming-0.051
 beautiful-0.050
Token found
Token ID1043
Feature activation+1.26e-01

Pos logit contributions
ors+0.365
 to+0.360
but+0.357
fully+0.350
driver+0.337
Neg logit contributions
 cute-0.332
 clever-0.326
 charming-0.320
 small-0.312
 gentle-0.311
Token a
Token ID257
Feature activation-1.29e+01

Pos logit contributions
 to+0.004
fully+0.004
but+0.004
ors+0.004
driver+0.004
Neg logit contributions
 cute-0.003
 charming-0.003
 clever-0.003
 gentle-0.003
 playful-0.003
Token shiny
Token ID22441
Feature activation-1.51e+01

Pos logit contributions
 clever+0.204
 cute+0.202
 charming+0.191
 gentle+0.189
aver+0.187
Neg logit contributions
 to-0.530
driver-0.474
but-0.465
fully-0.456
rings-0.448
Token thing
Token ID1517
Feature activation+6.46e+00

Pos logit contributions
 cute+0.340
 clever+0.332
 small+0.325
 charming+0.324
 beautiful+0.316
Neg logit contributions
ors-0.574
but-0.567
fully-0.549
oes-0.537
 to-0.528
Token in
Token ID287
Feature activation+1.04e+01

Pos logit contributions
ors+0.197
uld+0.192
or+0.184
but+0.183
 Stop+0.179
Neg logit contributions
 small-0.201
 cute-0.200
 big-0.199
 funny-0.193
 beautiful-0.192
Token the
Token ID262
Feature activation-3.94e+00

Pos logit contributions
 to+0.260
ors+0.234
driver+0.233
fully+0.231
ges+0.229
Neg logit contributions
 clever-0.357
 cute-0.352
 charming-0.341
 playful-0.335
 gentle-0.334
Token park
Token ID3952
Feature activation-5.87e+00

Pos logit contributions
 clever+0.107
 cute+0.107
 charming+0.099
 playful+0.098
 gentle+0.097
Neg logit contributions
 to-0.126
oes-0.116
but-0.116
or-0.110
 Stop-0.110
Token.
Token ID13
Feature activation-8.30e+00

Pos logit contributions
 cute+0.153
 clever+0.151
 small+0.151
 charming+0.148
 kind+0.148
Neg logit contributions
ors-0.198
but-0.192
fully-0.188
driver-0.181
oes-0.180