TOP ACTIVATIONS
MAX = 28.697

the bed . " Here it is , Lily ," she
of love and care , it continued to grow and bloom
picture of a bird on it . Sara thought it was
a k itty , and it was me owing for help
music and the baby loved it ! But when the show
gl ows when you touch it . It is harmless ,
group of children happened upon it . The children
The bird was sad because it wanted to go back to
was a big chair and it was made from the rock
home , she played with it everywhere and it was fun
was intelligent enough to understand it all . When
butterfly and wanted to collect it , but the butterfly didn
He ran over and discovered it was one of the stolen
the ic icle and watches it grow bigger and bigger .
the hedge and made sure it was kept in good condition
, gray fle a and it was crawling around very quickly
blew into it , but it made a loud noise that
ed in a place where it could make someone feel better
and higher !" She said it over and over again .
. She did not realize it until she was already walking

MIN ACTIVATIONS
MIN = -28.991

ice ! Lucy felt so small and tiny compared to the
home , they cut a small piece of the spicy pepper
stack . He saw the small bird and its feathers had
ground . The seed was small and round . One morning
coming from and noticed a small house . His curiosity grew
. The red fish is small and cute . Lily likes
She carefully piled up the small bows and started sorting them
and the little girl was small . One
and the little girl was small . One
to me !" But the small girl didn 't give him
In the sand was a small box . "
discovered something very , very small . When they went over
investigate and he saw a small rabbit running away . The
the ground . It was small and black with shiny wings
plane . He saw a small bird up in the clouds
. But she was too small . She could not hold
stairs , he met a small cat . The cat said
One day the small boy made a little money
cloth . They cut a small piece to make a new
, but she was too small . She felt very confused

INTERVAL 14.275 - 28.697
CONTAINS 6.645%

remembers that his mom uses it to clean his cuts .
was so proud of herself and the soldier was so happy
country . As they continued walking , they came
Lily 's dad asked her to count all the books on
L ily 's mom said they could come back the next

INTERVAL -0.147 - 14.275
CONTAINS 43.339%

do and felt very worried about him .
me !" Anna does not want to share . She pulls
She asked , " Who are you ?" The fairy replied
the yard . Tom had so much fun playing with his
again . <|endoftext|> Once there was a little child who was

INTERVAL -14.569 - -0.147
CONTAINS 44.842%

were playing in the park . Bob had a new toy
seriously . <|endoftext|> Once upon a time , there was a
't know what to do . Then , Lily
the new toys . <|endoftext|> Once upon a time , there
the rainbow seemed to stay the same distance away .

INTERVAL -28.991 - -14.569
CONTAINS 5.174%

knew he had to be brave . Tim managed to run
notebook safe ." Big Bear thanked the rabbit and
in the sand and ate y ummy snacks . They even
slim y . Soon her blue and yellow fish came swimming
John was so excited . He raced outside to
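
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. This is an inference from the reported numbers, not something the dump states. A minimal sketch that checks the arithmetic, with `equal_width_edges` as a hypothetical helper name introduced only for illustration:

```python
# Assumption: the dashboard splits [MIN, MAX] into 4 equal-width intervals.
# MIN, MAX and the CONTAINS percentages are copied from the dump above.
MIN_ACT = -28.991
MAX_ACT = 28.697

def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 bin edges evenly spaced between lo and hi."""
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]

edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# Share of activations reported for each interval, lowest interval first.
contains_pct = [5.174, 44.842, 43.339, 6.645]
total_pct = sum(contains_pct)

print(edges)
print(total_pct)
```

Under this assumption the computed edges agree with the reported interval boundaries to three decimals, and the four CONTAINS shares add up to 100%.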
Token the
Token ID262
Feature activation+5.60e+00

Pos logit contributions
ors+0.025
driver+0.024
rings+0.024
oes+0.023
licks+0.023
Neg logit contributions
 beautiful-0.046
 charming-0.046
 lovely-0.046
 small-0.046
 clever-0.045
Token bed
Token ID3996
Feature activation+8.54e+00

Pos logit contributions
 to+0.175
driver+0.169
rings+0.168
ho+0.167
oes+0.166
Neg logit contributions
 charming-0.124
 clever-0.123
 cute-0.120
 wonderful-0.120
 lovely-0.119
Token.
Token ID13
Feature activation-1.89e+00

Pos logit contributions
ors+0.194
ted+0.188
rings+0.187
oes+0.186
 Stop+0.186
Neg logit contributions
 beautiful-0.290
 charming-0.288
 small-0.283
 lovely-0.283
 nice-0.283
Token "
Token ID366
Feature activation+4.96e+00

Pos logit contributions
 charming+0.070
 beautiful+0.070
 cute+0.070
 lovely+0.069
 small+0.069
Neg logit contributions
ors-0.035
driver-0.034
ted-0.033
rings-0.033
oes-0.032
TokenHere
Token ID4342
Feature activation+4.36e-01

Pos logit contributions
ted+0.111
driver+0.111
 Stop+0.110
rings+0.110
ors+0.109
Neg logit contributions
 charming-0.164
 beautiful-0.164
 clever-0.162
 wonderful-0.162
 lovely-0.161
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.009
ted+0.009
driver+0.009
oes+0.008
rings+0.008
Neg logit contributions
 beautiful-0.016
 charming-0.016
 wonderful-0.016
 clever-0.016
 lovely-0.016
Token is
Token ID318
Feature activation+1.35e+01

Pos logit contributions
ors+0.569
rings+0.562
ted+0.560
driver+0.540
oes+0.538
Neg logit contributions
 beautiful-0.822
 nice-0.819
 charming-0.813
 small-0.812
 lovely-0.812
Token,
Token ID11
Feature activation+4.36e+00

Pos logit contributions
 to+0.314
fully+0.281
ors+0.281
 him+0.277
driver+0.275
Neg logit contributions
 charming-0.440
 clever-0.435
 cute-0.428
 beautiful-0.423
 wonderful-0.422
Token Lily
Token ID20037
Feature activation-2.02e+00

Pos logit contributions
ors+0.103
ted+0.098
oes+0.096
driver+0.093
rings+0.092
Neg logit contributions
 beautiful-0.156
 charming-0.152
 lovely-0.151
 cute-0.151
 wonderful-0.151
Token,"
Token ID553
Feature activation-1.25e+00

Pos logit contributions
 beautiful+0.056
 charming+0.056
 clever+0.055
 lovely+0.055
 cute+0.054
Neg logit contributions
ted-0.058
ors-0.057
rings-0.056
driver-0.056
oes-0.055
Token she
Token ID673
Feature activation+1.83e+01

Pos logit contributions
 beautiful+0.043
 charming+0.043
 clever+0.043
 small+0.043
 gentle+0.043
Neg logit contributions
ors-0.029
oes-0.027
driver-0.027
rings-0.026
 Stop-0.026
Token of
Token ID286
Feature activation+1.41e+01

Pos logit contributions
 beautiful+0.101
 charming+0.100
 wonderful+0.099
 lovely+0.099
 clever+0.099
Neg logit contributions
ors-0.079
ted-0.077
driver-0.076
rings-0.076
oes-0.074
Token love
Token ID1842
Feature activation+3.21e-01

Pos logit contributions
ors+0.384
 to+0.383
ted+0.379
driver+0.376
 him+0.372
Neg logit contributions
 charming-0.385
 clever-0.380
 beautiful-0.376
 cute-0.374
 wonderful-0.373
Token and
Token ID290
Feature activation+1.95e+01

Pos logit contributions
driver+0.009
rings+0.009
ted+0.009
ors+0.009
oes+0.009
Neg logit contributions
 beautiful-0.010
 lovely-0.009
 wonderful-0.009
 charming-0.009
 nice-0.009
Token care
Token ID1337
Feature activation+1.25e+01

Pos logit contributions
ors+0.652
driver+0.635
ted+0.633
rings+0.628
oes+0.616
Neg logit contributions
 beautiful-0.463
 charming-0.461
 wonderful-0.456
 clever-0.453
 lovely-0.452
Token,
Token ID11
Feature activation+5.59e+00

Pos logit contributions
ted+0.302
rings+0.289
 Stop+0.285
driver+0.283
ho+0.283
Neg logit contributions
 beautiful-0.438
 nice-0.434
 wonderful-0.432
 lovely-0.430
 charming-0.426
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.134
driver+0.126
ted+0.125
oes+0.123
rings+0.119
Neg logit contributions
 beautiful-0.203
 wonderful-0.199
 charming-0.199
 lovely-0.197
 cute-0.197
Token continued
Token ID3767
Feature activation+5.24e+00

Pos logit contributions
ors+0.500
driver+0.496
ted+0.485
rings+0.481
 Stop+0.466
Neg logit contributions
 beautiful-0.915
 charming-0.909
 clever-0.906
 wonderful-0.905
 lovely-0.899
Token to
Token ID284
Feature activation+1.99e+01

Pos logit contributions
ors+0.057
 Stop+0.055
driver+0.054
rings+0.052
ted+0.051
Neg logit contributions
 beautiful-0.252
 small-0.248
 charming-0.247
 lovely-0.246
 nice-0.245
Token grow
Token ID1663
Feature activation-5.29e-01

Pos logit contributions
 him+0.444
 to+0.435
 it+0.417
driver+0.416
 Stop+0.413
Neg logit contributions
 charming-0.645
 clever-0.634
 cute-0.631
 beautiful-0.622
 lovely-0.618
Token and
Token ID290
Feature activation+1.64e+01

Pos logit contributions
 charming+0.016
 clever+0.016
 cute+0.016
 wonderful+0.015
 lovely+0.015
Neg logit contributions
 to-0.013
ors-0.012
ted-0.012
fully-0.012
ting-0.012
Token bloom
Token ID29955
Feature activation-5.11e-01

Pos logit contributions
ors+0.474
driver+0.456
ted+0.450
rings+0.438
 Stop+0.430
Neg logit contributions
 beautiful-0.485
 charming-0.478
 wonderful-0.477
 clever-0.475
 lovely-0.472
Token picture
Token ID4286
Feature activation+9.98e+00

Pos logit contributions
ors+0.151
driver+0.144
ted+0.142
rings+0.139
 Stop+0.136
Neg logit contributions
 charming-0.372
 beautiful-0.371
 wonderful-0.368
 clever-0.367
 lovely-0.367
Token of
Token ID286
Feature activation+1.31e+01

Pos logit contributions
ors+0.243
driver+0.239
 Stop+0.237
ted+0.234
ting+0.228
Neg logit contributions
 beautiful-0.327
 charming-0.318
 lovely-0.316
 wonderful-0.315
 nice-0.314
Token a
Token ID257
Feature activation-6.15e+00

Pos logit contributions
ors+0.309
rings+0.307
 to+0.306
ted+0.306
 him+0.300
Neg logit contributions
 charming-0.409
 clever-0.408
 wonderful-0.399
 beautiful-0.399
 cute-0.399
Token bird
Token ID6512
Feature activation+6.00e+00

Pos logit contributions
 clever+0.124
 charming+0.122
 cute+0.116
 wonderful+0.115
 gentle+0.114
Neg logit contributions
 to-0.207
 we-0.193
 it-0.191
rings-0.190
 they-0.190
Token on
Token ID319
Feature activation+1.15e+01

Pos logit contributions
ors+0.149
driver+0.147
ted+0.143
 Stop+0.142
rings+0.141
Neg logit contributions
 beautiful-0.189
 charming-0.188
 lovely-0.186
 wonderful-0.185
 cute-0.184
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.263
driver+0.261
 to+0.260
ted+0.260
rings+0.253
Neg logit contributions
 charming-0.370
 clever-0.369
 wonderful-0.367
 beautiful-0.362
 cute-0.361
Token.
Token ID13
Feature activation-4.06e+00

Pos logit contributions
ors+0.675
ted+0.663
driver+0.663
rings+0.652
 Stop+0.649
Neg logit contributions
 beautiful-0.713
 charming-0.709
 wonderful-0.698
 lovely-0.698
 clever-0.694
Token Sara
Token ID24799
Feature activation+3.70e+00

Pos logit contributions
 charming+0.145
 beautiful+0.145
 small+0.144
 cute+0.144
 lovely+0.143
Neg logit contributions
ors-0.082
driver-0.079
ted-0.077
rings-0.075
fully-0.075
Token thought
Token ID1807
Feature activation-4.63e-01

Pos logit contributions
ors+0.083
driver+0.082
rings+0.081
ted+0.081
oes+0.078
Neg logit contributions
 charming-0.125
 beautiful-0.124
 clever-0.123
 wonderful-0.123
 lovely-0.123
Token it
Token ID340
Feature activation+2.35e+01

Pos logit contributions
 beautiful+0.020
 wonderful+0.020
 small+0.020
 lovely+0.020
 clever+0.020
Neg logit contributions
ors-0.006
 Stop-0.006
ted-0.006
oes-0.006
but-0.006
Token was
Token ID373
Feature activation+4.98e+00

Pos logit contributions
ors+0.358
ted+0.357
 to+0.346
rings+0.344
driver+0.342
Neg logit contributions
 charming-0.921
 clever-0.909
 beautiful-0.907
 wonderful-0.905
 lovely-0.902
Token a
Token ID257
Feature activation-4.00e+00

Pos logit contributions
ors+0.318
ted+0.311
driver+0.309
rings+0.306
 Stop+0.295
Neg logit contributions
 charming-0.621
 beautiful-0.619
 wonderful-0.614
 clever-0.613
 lovely-0.611
Token k
Token ID479
Feature activation-3.19e+00

Pos logit contributions
 charming+0.058
 clever+0.058
 intelligent+0.054
 impressive+0.054
 cute+0.053
Neg logit contributions
 to-0.150
 it-0.140
driver-0.136
 we-0.135
ho-0.135
Tokenitty
Token ID9760
Feature activation+3.44e+00

Pos logit contributions
 charming+0.111
 beautiful+0.111
 lovely+0.110
 cute+0.109
 small+0.109
Neg logit contributions
ors-0.066
driver-0.062
fully-0.060
rings-0.059
 inspection-0.058
Token,
Token ID11
Feature activation+6.12e+00

Pos logit contributions
ors+0.103
driver+0.101
oes+0.099
rings+0.097
but+0.095
Neg logit contributions
 beautiful-0.089
 charming-0.088
 cute-0.087
 lovely-0.087
 clever-0.086
Token and
Token ID290
Feature activation+1.81e+01

Pos logit contributions
ors+0.164
driver+0.157
ted+0.156
rings+0.154
oes+0.152
Neg logit contributions
 beautiful-0.185
 charming-0.185
 wonderful-0.182
 lovely-0.182
 clever-0.182
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.456
driver+0.442
ted+0.435
rings+0.434
oes+0.426
Neg logit contributions
 charming-0.568
 beautiful-0.568
 clever-0.561
 wonderful-0.561
 lovely-0.559
Token was
Token ID373
Feature activation+6.15e+00

Pos logit contributions
ted+0.530
ors+0.527
rings+0.519
driver+0.511
 to+0.505
Neg logit contributions
 charming-0.834
 wonderful-0.826
 beautiful-0.825
 clever-0.821
 lovely-0.821
Token me
Token ID502
Feature activation+3.62e+00

Pos logit contributions
 to+0.202
 it+0.189
 him+0.187
rings+0.179
 they+0.177
Neg logit contributions
 clever-0.135
 charming-0.131
 humble-0.127
 wonderful-0.126
 lively-0.123
Tokenowing
Token ID7855
Feature activation+2.82e+00

Pos logit contributions
ors+0.073
rings+0.073
driver+0.070
 to+0.070
ted+0.070
Neg logit contributions
 charming-0.129
 beautiful-0.128
 wonderful-0.127
 clever-0.127
 lovely-0.126
Token for
Token ID329
Feature activation+1.26e+01

Pos logit contributions
driver+0.072
ors+0.072
ted+0.070
rings+0.070
oes+0.068
Neg logit contributions
 beautiful-0.089
 charming-0.088
 lovely-0.087
 cute-0.087
 wonderful-0.086
Token help
Token ID1037
Feature activation-4.45e-01

Pos logit contributions
 him+0.293
 it+0.270
 to+0.259
ted+0.230
 they+0.216
Neg logit contributions
 charming-0.454
 clever-0.434
 popular-0.433
 cute-0.432
 beautiful-0.431
Token music
Token ID2647
Feature activation-1.99e-01

Pos logit contributions
ors+0.300
driver+0.292
ted+0.290
rings+0.289
 Stop+0.280
Neg logit contributions
 charming-0.475
 beautiful-0.474
 clever-0.469
 wonderful-0.469
 lovely-0.468
Token and
Token ID290
Feature activation+2.07e+01

Pos logit contributions
 beautiful+0.007
 charming+0.007
 wonderful+0.007
 lovely+0.007
 nice+0.007
Neg logit contributions
ors-0.004
driver-0.004
ted-0.004
rings-0.004
 Stop-0.004
Token the
Token ID262
Feature activation+3.44e+00

Pos logit contributions
ors+0.560
driver+0.547
ted+0.542
rings+0.538
 Stop+0.527
Neg logit contributions
 beautiful-0.607
 charming-0.605
 wonderful-0.599
 clever-0.597
 lovely-0.595
Token baby
Token ID5156
Feature activation-2.04e+00

Pos logit contributions
 to+0.106
rings+0.102
als+0.102
ted+0.101
ting+0.101
Neg logit contributions
 charming-0.075
 clever-0.075
 cute-0.071
 lovely-0.070
 wonderful-0.070
Token loved
Token ID6151
Feature activation-1.74e+00

Pos logit contributions
 charming+0.062
 beautiful+0.062
 wonderful+0.061
 clever+0.061
 lovely+0.061
Neg logit contributions
ors-0.053
driver-0.052
ted-0.051
rings-0.051
 Stop-0.050
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
 beautiful+0.070
 charming+0.069
 wonderful+0.069
 lovely+0.069
 cute+0.068
Neg logit contributions
driver-0.030
ors-0.030
rings-0.029
oes-0.028
ted-0.028
Token!
Token ID0
Feature activation+2.16e+00

Pos logit contributions
ors+0.599
driver+0.581
rings+0.581
 to+0.579
ted+0.576
Neg logit contributions
 charming-0.768
 clever-0.760
 beautiful-0.759
 wonderful-0.752
 cute-0.752
Token But
Token ID887
Feature activation+1.34e+01

Pos logit contributions
ors+0.053
driver+0.053
ted+0.051
fully+0.050
rings+0.049
Neg logit contributions
 charming-0.068
 beautiful-0.068
 lovely-0.067
 wonderful-0.067
 small-0.067
Token when
Token ID618
Feature activation+7.15e+00

Pos logit contributions
ors+0.318
driver+0.314
ted+0.298
licks+0.297
oes+0.291
Neg logit contributions
 beautiful-0.462
 wonderful-0.455
 charming-0.455
 small-0.454
 lovely-0.452
Token the
Token ID262
Feature activation+3.67e-01

Pos logit contributions
ors+0.134
driver+0.132
oes+0.122
rings+0.121
ted+0.121
Neg logit contributions
 beautiful-0.282
 wonderful-0.279
 charming-0.278
 lovely-0.277
 clever-0.276
Token show
Token ID905
Feature activation-2.10e+00

Pos logit contributions
 to+0.011
ted+0.011
ting+0.011
als+0.011
 we+0.011
Neg logit contributions
 clever-0.008
 charming-0.008
 wonderful-0.008
 cozy-0.008
 cute-0.008
Token gl
Token ID1278
Feature activation-1.21e+00

Pos logit contributions
ors+0.150
 to+0.148
rings+0.141
ted+0.140
driver+0.140
Neg logit contributions
 charming-0.196
 wonderful-0.195
 clever-0.194
 beautiful-0.193
 lovely-0.192
Tokenows
Token ID1666
Feature activation+5.94e+00

Pos logit contributions
 beautiful+0.037
 charming+0.037
 wonderful+0.036
 clever+0.036
 lovely+0.036
Neg logit contributions
ors-0.031
driver-0.030
ted-0.030
rings-0.030
 Stop-0.029
Token when
Token ID618
Feature activation+9.97e+00

Pos logit contributions
 Stop+0.124
ors+0.124
driver+0.122
oes+0.116
rings+0.116
Neg logit contributions
 beautiful-0.220
 small-0.214
 charming-0.213
 lovely-0.212
 nice-0.212
Token you
Token ID345
Feature activation+1.68e+01

Pos logit contributions
ors+0.162
driver+0.156
ted+0.149
rings+0.145
oes+0.144
Neg logit contributions
 beautiful-0.407
 charming-0.403
 wonderful-0.402
 clever-0.399
 lovely-0.399
Token touch
Token ID3638
Feature activation+2.39e+00

Pos logit contributions
ors+0.344
ted+0.340
rings+0.337
driver+0.330
 Stop+0.330
Neg logit contributions
 charming-0.592
 beautiful-0.584
 clever-0.580
 wonderful-0.578
 lovely-0.578
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.034
driver+0.033
ted+0.032
rings+0.032
 Stop+0.030
Neg logit contributions
 charming-0.100
 beautiful-0.100
 wonderful-0.099
 clever-0.099
 lovely-0.099
Token.
Token ID13
Feature activation-4.06e+00

Pos logit contributions
ors+0.561
rings+0.543
 to+0.542
ted+0.534
driver+0.533
Neg logit contributions
 charming-0.808
 clever-0.800
 wonderful-0.795
 beautiful-0.793
 lovely-0.787
Token It
Token ID632
Feature activation+1.91e+01

Pos logit contributions
 charming+0.174
 beautiful+0.174
 lovely+0.173
 clever+0.172
 wonderful+0.172
Neg logit contributions
ors-0.053
driver-0.051
ted-0.050
rings-0.047
fully-0.047
Token is
Token ID318
Feature activation+1.29e+01

Pos logit contributions
ors+0.416
driver+0.407
ted+0.401
fully+0.395
but+0.392
Neg logit contributions
 beautiful-0.688
 charming-0.678
 cute-0.677
 clever-0.677
 wonderful-0.671
Token harmless
Token ID23585
Feature activation-1.82e+01

Pos logit contributions
 to+0.447
rings+0.417
 it+0.417
 they+0.407
 him+0.405
Neg logit contributions
 charming-0.251
 clever-0.245
 wonderful-0.239
 lovely-0.237
 lively-0.235
Token,
Token ID11
Feature activation+3.56e+00

Pos logit contributions
 beautiful+0.566
 charming+0.556
 lovely+0.551
 wonderful+0.551
 cute+0.550
Neg logit contributions
ors-0.466
driver-0.463
ted-0.459
rings-0.457
oes-0.447
Token group
Token ID1448
Feature activation-2.97e+00

Pos logit contributions
 him+0.067
 to+0.065
 it+0.062
 they+0.060
 It+0.058
Neg logit contributions
 clever-0.079
 gentle-0.076
 charming-0.075
 compassionate-0.074
 powerful-0.074
Token of
Token ID286
Feature activation+1.31e+01

Pos logit contributions
 charming+0.102
 beautiful+0.101
 wonderful+0.100
 clever+0.100
 lovely+0.100
Neg logit contributions
ors-0.064
driver-0.063
ted-0.062
rings-0.062
 Stop-0.060
Token children
Token ID1751
Feature activation+4.97e+00

Pos logit contributions
 to+0.463
 him+0.450
ted+0.436
 it+0.430
ors+0.426
Neg logit contributions
 charming-0.258
 clever-0.248
 cute-0.247
 lovely-0.244
 beautiful-0.243
Token happened
Token ID3022
Feature activation+3.81e+00

Pos logit contributions
driver+0.131
ors+0.131
ted+0.127
rings+0.126
 Stop+0.123
Neg logit contributions
 beautiful-0.147
 charming-0.147
 wonderful-0.145
 clever-0.145
 lovely-0.145
Token upon
Token ID2402
Feature activation+1.13e+01

Pos logit contributions
ors+0.083
driver+0.082
rings+0.080
ted+0.080
oes+0.077
Neg logit contributions
 beautiful-0.133
 charming-0.132
 wonderful-0.131
 clever-0.131
 lovely-0.130
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
 to+0.366
 him+0.359
ted+0.352
 Stop+0.351
fully+0.346
Neg logit contributions
 charming-0.256
 clever-0.251
 cute-0.248
 wonderful-0.246
 beautiful-0.246
Token.
Token ID13
Feature activation-4.06e+00

Pos logit contributions
ors+0.561
driver+0.546
ted+0.545
rings+0.541
 Stop+0.526
Neg logit contributions
 charming-0.820
 beautiful-0.816
 wonderful-0.809
 clever-0.809
 lovely-0.806
Token
Token ID198
Feature activation-2.17e+00

Pos logit contributions
 charming+0.158
 beautiful+0.157
 lovely+0.156
 wonderful+0.155
 cute+0.155
Neg logit contributions
ors-0.070
driver-0.067
ted-0.067
rings-0.065
oes-0.062
Token
Token ID198
Feature activation-2.47e+00

Pos logit contributions
 charming+0.077
 clever+0.076
 beautiful+0.076
 cute+0.075
 lovely+0.075
Neg logit contributions
 to-0.043
ors-0.041
 Stop-0.041
driver-0.041
but-0.040
TokenThe
Token ID464
Feature activation-5.15e+00

Pos logit contributions
 charming+0.070
 beautiful+0.069
 clever+0.069
 cute+0.068
 lovely+0.068
Neg logit contributions
 Stop-0.066
driver-0.065
but-0.064
ors-0.064
 to-0.064
Token children
Token ID1751
Feature activation+1.64e+00

Pos logit contributions
 cozy+0.107
 charming+0.101
 wonderful+0.101
 cute+0.099
 lovely+0.098
Neg logit contributions
 to-0.162
ho-0.158
als-0.157
 they-0.155
 we-0.155
Token The
Token ID383
Feature activation+7.56e-01

Pos logit contributions
driver+0.459
rings+0.458
ors+0.452
but+0.449
fully+0.442
Neg logit contributions
 clever-0.738
 charming-0.729
 cute-0.727
 beautiful-0.725
 lovely-0.716
Token bird
Token ID6512
Feature activation+7.97e+00

Pos logit contributions
 to+0.023
ho+0.022
driver+0.022
rings+0.021
 him+0.021
Neg logit contributions
 charming-0.017
 clever-0.017
 wonderful-0.017
 cute-0.017
 lovely-0.016
Token was
Token ID373
Feature activation+8.32e+00

Pos logit contributions
ors+0.194
driver+0.193
rings+0.188
ted+0.184
 Stop+0.183
Neg logit contributions
 beautiful-0.258
 charming-0.258
 clever-0.255
 cute-0.253
 wonderful-0.253
Token sad
Token ID6507
Feature activation-9.16e+00

Pos logit contributions
 to+0.288
 it+0.259
 him+0.246
 they+0.241
fully+0.237
Neg logit contributions
 wonderful-0.171
 charming-0.165
 lovely-0.164
 clever-0.163
 great-0.161
Token because
Token ID780
Feature activation+1.08e+01

Pos logit contributions
 nice+0.287
 small+0.286
 beautiful+0.286
 big+0.279
 fun+0.276
Neg logit contributions
ted-0.277
als-0.262
rings-0.260
ors-0.259
 Stop-0.258
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.167
driver+0.161
 Stop+0.154
ted+0.152
rings+0.150
Neg logit contributions
 beautiful-0.464
 charming-0.459
 wonderful-0.455
 small-0.453
 lovely-0.453
Token wanted
Token ID2227
Feature activation+9.14e-02

Pos logit contributions
ors+0.545
driver+0.532
ted+0.527
rings+0.526
 Stop+0.510
Neg logit contributions
 charming-0.849
 beautiful-0.847
 clever-0.839
 wonderful-0.838
 lovely-0.835
Token to
Token ID284
Feature activation+1.99e+01

Pos logit contributions
ors+0.001
driver+0.001
 Stop+0.001
ted+0.001
rings+0.001
Neg logit contributions
 beautiful-0.004
 small-0.004
 wonderful-0.004
 nice-0.004
 charming-0.004
Token go
Token ID467
Feature activation+1.07e+01

Pos logit contributions
ors+0.427
 to+0.427
driver+0.418
rings+0.418
 him+0.418
Neg logit contributions
 charming-0.668
 wonderful-0.658
 beautiful-0.655
 clever-0.655
 lovely-0.652
Token back
Token ID736
Feature activation+6.17e+00

Pos logit contributions
ors+0.288
driver+0.280
 Stop+0.279
ted+0.270
licks+0.268
Neg logit contributions
 beautiful-0.346
 nice-0.341
 charming-0.340
 lovely-0.337
 small-0.335
Token to
Token ID284
Feature activation+1.91e+01

Pos logit contributions
ors+0.113
driver+0.112
ted+0.109
 Stop+0.107
rings+0.105
Neg logit contributions
 beautiful-0.240
 charming-0.238
 small-0.236
 lovely-0.236
 wonderful-0.236
Token was
Token ID373
Feature activation+1.05e+01

Pos logit contributions
ors+0.410
driver+0.401
ted+0.397
rings+0.391
 Stop+0.378
Neg logit contributions
 charming-0.549
 beautiful-0.548
 clever-0.543
 wonderful-0.542
 lovely-0.541
Token a
Token ID257
Feature activation-4.96e+00

Pos logit contributions
 to+0.315
 him+0.291
 it+0.287
 Stop+0.280
 they+0.278
Neg logit contributions
 charming-0.259
 clever-0.252
 cute-0.251
 lovely-0.250
 wonderful-0.250
Token big
Token ID1263
Feature activation-1.70e+01

Pos logit contributions
ady+0.068
aky+0.064
 charming+0.060
aver+0.060
 entire+0.059
Neg logit contributions
 to-0.184
 we-0.177
 they-0.173
 it-0.173
ho-0.172
Token chair
Token ID5118
Feature activation+1.05e+01

Pos logit contributions
 charming+0.493
 beautiful+0.488
 clever+0.488
 wonderful+0.487
 lovely+0.483
Neg logit contributions
driver-0.441
ors-0.438
 to-0.427
ted-0.427
 Stop-0.425
Token and
Token ID290
Feature activation+1.81e+01

Pos logit contributions
ors+0.273
 Stop+0.268
ted+0.267
rings+0.264
 inspection+0.255
Neg logit contributions
 beautiful-0.315
 kind-0.312
 charming-0.312
 nice-0.308
 small-0.307
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.514
driver+0.501
ted+0.500
rings+0.497
 Stop+0.486
Neg logit contributions
 charming-0.508
 beautiful-0.506
 clever-0.500
 wonderful-0.500
 lovely-0.498
Token was
Token ID373
Feature activation+6.15e+00

Pos logit contributions
ors+0.454
driver+0.448
ted+0.446
rings+0.438
 to+0.430
Neg logit contributions
 charming-0.908
 wonderful-0.900
 beautiful-0.899
 clever-0.898
 lovely-0.896
Token made
Token ID925
Feature activation+6.63e+00

Pos logit contributions
 to+0.196
driver+0.185
rings+0.180
 it+0.179
 him+0.179
Neg logit contributions
 charming-0.143
 clever-0.143
 wonderful-0.141
 cute-0.140
 lovely-0.139
Token from
Token ID422
Feature activation+8.85e+00

Pos logit contributions
ors+0.131
driver+0.127
ted+0.126
rings+0.126
 Stop+0.121
Neg logit contributions
 beautiful-0.243
 charming-0.243
 wonderful-0.240
 clever-0.240
 lovely-0.240
Token the
Token ID262
Feature activation+3.67e-01

Pos logit contributions
 to+0.254
 him+0.251
driver+0.249
ors+0.247
rings+0.246
Neg logit contributions
 charming-0.231
 clever-0.228
 cute-0.225
 wonderful-0.224
 lovely-0.223
Token rock
Token ID3881
Feature activation+7.21e+00

Pos logit contributions
 to+0.011
driver+0.011
rings+0.011
 Stop+0.011
 him+0.010
Neg logit contributions
 charming-0.009
 clever-0.009
 cute-0.009
 wonderful-0.008
 beautiful-0.008
Token home
Token ID1363
Feature activation+9.73e+00

Pos logit contributions
 beautiful+0.161
 charming+0.160
 wonderful+0.159
 clever+0.159
 lovely+0.159
Neg logit contributions
ors-0.036
driver-0.033
 Stop-0.030
rings-0.029
fully-0.029
Token,
Token ID11
Feature activation+8.09e+00

Pos logit contributions
ors+0.223
driver+0.220
ted+0.219
 Stop+0.215
rings+0.211
Neg logit contributions
 beautiful-0.330
 charming-0.329
 lovely-0.324
 wonderful-0.323
 clever-0.322
Token she
Token ID673
Feature activation+2.17e+01

Pos logit contributions
ors+0.181
driver+0.174
ted+0.172
rings+0.167
oes+0.167
Neg logit contributions
 beautiful-0.286
 charming-0.284
 wonderful-0.281
 cute-0.281
 lovely-0.280
Token played
Token ID2826
Feature activation+3.23e+00

Pos logit contributions
ors+0.524
driver+0.511
ted+0.508
rings+0.505
oes+0.491
Neg logit contributions
 charming-0.693
 beautiful-0.692
 clever-0.686
 wonderful-0.685
 lovely-0.682
Token with
Token ID351
Feature activation+1.07e+01

Pos logit contributions
ors+0.067
rings+0.066
oes+0.066
driver+0.063
ted+0.063
Neg logit contributions
 beautiful-0.120
 charming-0.119
 clever-0.118
 wonderful-0.118
 lovely-0.118
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ted+0.202
 him+0.197
 to+0.193
ors+0.193
rings+0.193
Neg logit contributions
 charming-0.392
 clever-0.389
 beautiful-0.383
 wonderful-0.383
 lovely-0.382
Token everywhere
Token ID8347
Feature activation+1.33e+01

Pos logit contributions
ors+0.475
ted+0.459
driver+0.458
rings+0.453
fully+0.438
Neg logit contributions
 charming-0.905
 beautiful-0.899
 clever-0.894
 wonderful-0.892
 lovely-0.890
Token and
Token ID290
Feature activation+1.69e+01

Pos logit contributions
ted+0.249
ors+0.243
oes+0.238
driver+0.236
rings+0.236
Neg logit contributions
 beautiful-0.519
 nice-0.510
 lovely-0.510
 charming-0.509
 small-0.508
Token it
Token ID340
Feature activation+2.37e+01

Pos logit contributions
ors+0.447
driver+0.429
oes+0.427
 Stop+0.409
rings+0.395
Neg logit contributions
 beautiful-0.553
 wonderful-0.542
 charming-0.537
 clever-0.534
 lovely-0.531
Token was
Token ID373
Feature activation+5.24e+00

Pos logit contributions
ted+0.412
 to+0.399
ors+0.391
fully+0.373
als+0.373
Neg logit contributions
 charming-0.886
 wonderful-0.878
 lovely-0.875
 clever-0.874
 cute-0.872
Token fun
Token ID1257
Feature activation-1.30e+01

Pos logit contributions
 to+0.146
 it+0.139
 him+0.135
 they+0.126
 wouldn+0.125
Neg logit contributions
 clever-0.138
 charming-0.135
 humble-0.132
 compassionate-0.130
 gentle-0.130
Token was
Token ID373
Feature activation+1.05e+01

Pos logit contributions
ors+0.045
rings+0.042
driver+0.040
 Stop+0.038
 aren+0.037
Neg logit contributions
 small-0.050
 big-0.048
 gentle-0.048
 beautiful-0.047
 large-0.047
Token intelligent
Token ID12661
Feature activation-1.52e+01

Pos logit contributions
 to+0.315
 him+0.291
 it+0.287
 Stop+0.280
 they+0.278
Neg logit contributions
 charming-0.259
 clever-0.252
 cute-0.251
 lovely-0.250
 wonderful-0.250
Token enough
Token ID1576
Feature activation+1.50e+01

Pos logit contributions
 beautiful+0.504
 kind+0.500
 wonderful+0.495
 charming+0.494
 nice+0.493
Neg logit contributions
ors-0.365
rings-0.365
ted-0.356
driver-0.354
oes-0.353
Token to
Token ID284
Feature activation+2.17e+01

Pos logit contributions
ors+0.283
ted+0.272
driver+0.270
rings+0.265
 Stop+0.263
Neg logit contributions
 beautiful-0.573
 charming-0.571
 wonderful-0.568
 clever-0.566
 lovely-0.565
Token understand
Token ID1833
Feature activation+2.41e+00

Pos logit contributions
ors+0.498
driver+0.490
rings+0.484
ted+0.483
 Stop+0.472
Neg logit contributions
 charming-0.707
 beautiful-0.701
 wonderful-0.694
 clever-0.693
 lovely-0.693
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.053
driver+0.052
ted+0.051
rings+0.051
oes+0.050
Neg logit contributions
 beautiful-0.084
 wonderful-0.083
 charming-0.083
 lovely-0.082
 clever-0.082
Token all
Token ID477
Feature activation+4.54e+00

Pos logit contributions
ors+0.595
ted+0.582
driver+0.581
rings+0.575
 Stop+0.561
Neg logit contributions
 beautiful-0.800
 charming-0.798
 wonderful-0.795
 clever-0.792
 lovely-0.788
Token.
Token ID13
Feature activation-4.41e+00

Pos logit contributions
ors+0.104
driver+0.102
ted+0.101
rings+0.101
 Stop+0.098
Neg logit contributions
 charming-0.151
 beautiful-0.151
 wonderful-0.150
 clever-0.149
 lovely-0.149
Token
Token ID198
Feature activation-2.47e+00

Pos logit contributions
 charming+0.142
 beautiful+0.142
 lovely+0.141
 wonderful+0.140
 small+0.140
Neg logit contributions
ors-0.103
driver-0.101
ted-0.101
rings-0.098
fully-0.096
Token
Token ID198
Feature activation-2.72e+00

Pos logit contributions
 charming+0.077
 clever+0.076
 beautiful+0.076
 cute+0.076
 lovely+0.075
Neg logit contributions
 to-0.058
 him-0.058
ors-0.056
but-0.056
driver-0.056
TokenWhen
Token ID2215
Feature activation+1.17e+01

Pos logit contributions
 charming+0.075
 beautiful+0.074
 clever+0.073
 cute+0.073
 lovely+0.073
Neg logit contributions
ors-0.075
driver-0.075
 Stop-0.074
but-0.073
ted-0.073
Token butterfly
Token ID35113
Feature activation+6.98e+00

Pos logit contributions
ors+0.182
driver+0.170
rings+0.166
oes+0.165
licks+0.162
Neg logit contributions
 small-0.210
 beautiful-0.206
 big-0.202
 lovely-0.201
 nice-0.200
Token and
Token ID290
Feature activation+2.07e+01

Pos logit contributions
driver+0.166
ors+0.163
rings+0.155
 Stop+0.153
oes+0.153
Neg logit contributions
 beautiful-0.229
 charming-0.226
 lovely-0.223
 cute-0.223
 wonderful-0.222
Token wanted
Token ID2227
Feature activation+2.26e+00

Pos logit contributions
ors+0.542
driver+0.525
ted+0.509
rings+0.505
 Stop+0.503
Neg logit contributions
 beautiful-0.643
 charming-0.636
 wonderful-0.631
 clever-0.630
 cute-0.627
Token to
Token ID284
Feature activation+2.17e+01

Pos logit contributions
ors+0.019
driver+0.017
 Stop+0.015
ted+0.015
rings+0.014
Neg logit contributions
 beautiful-0.112
 wonderful-0.111
 charming-0.111
 lovely-0.110
 small-0.110
Token collect
Token ID2824
Feature activation+1.22e+01

Pos logit contributions
 to+0.462
rings+0.459
ors+0.457
 Stop+0.447
oes+0.443
Neg logit contributions
 charming-0.731
 clever-0.719
 wonderful-0.717
 beautiful-0.714
 cute-0.714
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ted+0.174
ors+0.173
rings+0.170
 to+0.169
driver+0.159
Neg logit contributions
 charming-0.496
 clever-0.495
 cute-0.487
 wonderful-0.484
 lovely-0.484
Token,
Token ID11
Feature activation+4.72e+00

Pos logit contributions
ors+0.649
driver+0.634
ted+0.632
rings+0.631
 Stop+0.611
Neg logit contributions
 charming-0.734
 beautiful-0.731
 clever-0.724
 wonderful-0.724
 lovely-0.720
Token but
Token ID475
Feature activation+1.46e+01

Pos logit contributions
ors+0.130
ted+0.128
driver+0.121
oes+0.118
 Stop+0.117
Neg logit contributions
 beautiful-0.161
 wonderful-0.154
 charming-0.154
 small-0.154
 gentle-0.153
Token the
Token ID262
Feature activation+6.17e-01

Pos logit contributions
ors+0.280
driver+0.254
oes+0.248
ted+0.246
 Stop+0.235
Neg logit contributions
 small-0.586
 beautiful-0.586
 clever-0.583
 charming-0.578
 gentle-0.573
Token butterfly
Token ID35113
Feature activation+1.74e+00

Pos logit contributions
 to+0.016
als+0.016
 it+0.016
rings+0.016
ho+0.015
Neg logit contributions
 charming-0.016
 wonderful-0.016
 clever-0.016
 cute-0.015
 lovely-0.015
Token didn
Token ID1422
Feature activation+7.20e+00

Pos logit contributions
ors+0.037
driver+0.037
rings+0.036
ted+0.035
 Stop+0.035
Neg logit contributions
 beautiful-0.061
 charming-0.061
 clever-0.060
 wonderful-0.060
 lovely-0.060
Token He
Token ID679
Feature activation+1.90e+01

Pos logit contributions
ors+0.042
rings+0.041
oes+0.041
ted+0.040
 Stop+0.039
Neg logit contributions
 beautiful-0.060
 clever-0.060
 charming-0.060
 wonderful-0.059
 lovely-0.059
Token ran
Token ID4966
Feature activation+1.01e+01

Pos logit contributions
ors+0.451
driver+0.438
rings+0.436
ted+0.435
 Stop+0.422
Neg logit contributions
 charming-0.618
 beautiful-0.617
 clever-0.613
 wonderful-0.612
 lovely-0.608
Token over
Token ID625
Feature activation+1.23e+01

Pos logit contributions
rings+0.240
ors+0.235
driver+0.218
oes+0.217
als+0.212
Neg logit contributions
 nice-0.368
 big-0.363
 small-0.362
 beautiful-0.357
 fun-0.352
Token and
Token ID290
Feature activation+1.87e+01

Pos logit contributions
ors+0.161
rings+0.152
driver+0.137
oes+0.131
ted+0.129
Neg logit contributions
 small-0.563
 beautiful-0.562
 wonderful-0.554
 big-0.552
 charming-0.551
Token discovered
Token ID5071
Feature activation+7.88e+00

Pos logit contributions
ors+0.517
oes+0.502
driver+0.499
rings+0.484
 Stop+0.474
Neg logit contributions
 clever-0.559
 beautiful-0.554
 small-0.551
 gentle-0.550
 charming-0.549
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.197
ted+0.197
driver+0.195
rings+0.193
 to+0.188
Neg logit contributions
 charming-0.239
 beautiful-0.235
 clever-0.235
 wonderful-0.234
 lovely-0.234
Token was
Token ID373
Feature activation+6.15e+00

Pos logit contributions
ors+0.531
driver+0.518
rings+0.515
ted+0.514
 Stop+0.499
Neg logit contributions
 charming-0.862
 beautiful-0.861
 clever-0.853
 wonderful-0.853
 lovely-0.849
Token one
Token ID530
Feature activation-2.51e+00

Pos logit contributions
 to+0.188
 him+0.182
 it+0.174
 they+0.165
fully+0.164
Neg logit contributions
 clever-0.151
 charming-0.151
 gentle-0.146
 cute-0.145
 wonderful-0.145
Token of
Token ID286
Feature activation+9.08e+00

Pos logit contributions
 charming+0.084
 clever+0.083
 beautiful+0.082
 wonderful+0.082
 cute+0.082
Neg logit contributions
ors-0.054
 to-0.053
rings-0.053
driver-0.053
ted-0.052
Token the
Token ID262
Feature activation+3.67e-01

Pos logit contributions
 him+0.195
 to+0.191
ted+0.186
driver+0.184
ors+0.182
Neg logit contributions
 charming-0.307
 clever-0.304
 wonderful-0.299
 lovely-0.299
 beautiful-0.298
Token stolen
Token ID9909
Feature activation-1.71e+00

Pos logit contributions
 to+0.012
rings+0.012
als+0.012
oes+0.012
driver+0.011
Neg logit contributions
 charming-0.007
 clever-0.007
 cute-0.007
 gentle-0.007
 lovely-0.007
Token the
Token ID262
Feature activation+5.60e+00

Pos logit contributions
 charming+0.006
 beautiful+0.006
 clever+0.006
 wonderful+0.006
 lovely+0.006
Neg logit contributions
ors-0.003
ted-0.003
driver-0.003
rings-0.003
 Stop-0.003
Token ic
Token ID14158
Feature activation-1.80e+00

Pos logit contributions
 to+0.175
driver+0.169
rings+0.168
ho+0.167
oes+0.166
Neg logit contributions
 charming-0.124
 clever-0.123
 cute-0.120
 wonderful-0.120
 lovely-0.119
Tokenicle
Token ID1548
Feature activation+1.19e+01

Pos logit contributions
 big+0.062
 beautiful+0.062
 small+0.061
 lovely+0.060
 nice+0.060
Neg logit contributions
 Stop-0.034
 Whoever-0.033
rings-0.033
ted-0.032
 Did-0.032
Token and
Token ID290
Feature activation+1.87e+01

Pos logit contributions
ors+0.294
 Stop+0.290
ted+0.286
rings+0.280
licks+0.273
Neg logit contributions
 beautiful-0.372
 small-0.368
 big-0.366
 nice-0.363
 lovely-0.359
Token watches
Token ID16860
Feature activation+5.43e+00

Pos logit contributions
ors+0.533
driver+0.494
oes+0.491
 Stop+0.485
but+0.484
Neg logit contributions
 beautiful-0.558
 small-0.545
 charming-0.541
 clever-0.541
 wonderful-0.539
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
driver+0.099
ors+0.099
oes+0.096
rings+0.096
ted+0.096
Neg logit contributions
 beautiful-0.208
 charming-0.203
 lovely-0.203
 small-0.202
 wonderful-0.202
Token grow
Token ID1663
Feature activation+1.26e-01

Pos logit contributions
ors+0.619
driver+0.604
ted+0.602
rings+0.599
 Stop+0.584
Neg logit contributions
 charming-0.767
 beautiful-0.765
 wonderful-0.757
 clever-0.756
 lovely-0.754
Token bigger
Token ID5749
Feature activation-1.38e+01

Pos logit contributions
 to+0.003
ors+0.003
ted+0.003
fully+0.003
als+0.003
Neg logit contributions
 charming-0.004
 clever-0.004
 wonderful-0.004
 cute-0.004
 humble-0.004
Token and
Token ID290
Feature activation+1.66e+01

Pos logit contributions
 beautiful+0.422
 charming+0.410
 small+0.408
 lovely+0.406
 wonderful+0.404
Neg logit contributions
ors-0.360
rings-0.354
 Stop-0.352
driver-0.350
ted-0.346
Token bigger
Token ID5749
Feature activation-1.44e+01

Pos logit contributions
 to+0.524
ted+0.513
rings+0.512
oes+0.511
driver+0.504
Neg logit contributions
 charming-0.364
 clever-0.353
 cute-0.352
 wonderful-0.352
 lovely-0.351
Token.
Token ID13
Feature activation-5.22e+00

Pos logit contributions
 beautiful+0.396
 charming+0.384
 small+0.382
 lovely+0.381
 wonderful+0.380
Neg logit contributions
ors-0.416
rings-0.405
 Stop-0.402
driver-0.401
ted-0.400
Token the
Token ID262
Feature activation+5.60e+00

Pos logit contributions
 nice+0.234
 clever+0.232
 small+0.230
 wonderful+0.228
 beautiful+0.228
Neg logit contributions
oes-0.157
rings-0.154
ors-0.149
als-0.140
ted-0.139
Token hedge
Token ID19859
Feature activation+3.01e-01

Pos logit contributions
 to+0.175
driver+0.169
rings+0.168
ho+0.167
oes+0.166
Neg logit contributions
 charming-0.124
 clever-0.123
 cute-0.120
 wonderful-0.120
 lovely-0.119
Token and
Token ID290
Feature activation+1.95e+01

Pos logit contributions
ors+0.007
driver+0.007
rings+0.007
ted+0.007
 Stop+0.007
Neg logit contributions
 beautiful-0.010
 charming-0.010
 small-0.010
 lovely-0.010
 clever-0.010
Token made
Token ID925
Feature activation+8.39e+00

Pos logit contributions
ors+0.523
driver+0.498
oes+0.480
but+0.473
rings+0.466
Neg logit contributions
 beautiful-0.620
 charming-0.606
 small-0.605
 clever-0.602
 wonderful-0.601
Token sure
Token ID1654
Feature activation+1.97e+00

Pos logit contributions
ors+0.194
 to+0.190
ted+0.189
driver+0.187
 him+0.186
Neg logit contributions
 charming-0.266
 clever-0.262
 cute-0.261
 wonderful-0.260
 beautiful-0.260
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.039
ted+0.037
driver+0.036
amb+0.035
licks+0.035
Neg logit contributions
 beautiful-0.081
 wonderful-0.080
 nice-0.079
 small-0.079
 big-0.078
Token was
Token ID373
Feature activation+6.15e+00

Pos logit contributions
ors+0.519
driver+0.506
ted+0.503
rings+0.501
 Stop+0.489
Neg logit contributions
 beautiful-0.877
 charming-0.873
 clever-0.867
 wonderful-0.866
 lovely-0.861
Token kept
Token ID4030
Feature activation+1.03e+01

Pos logit contributions
 to+0.215
 it+0.203
 him+0.196
 they+0.190
 wouldn+0.189
Neg logit contributions
 charming-0.119
 cute-0.115
 clever-0.113
 lovely-0.110
 lively-0.109
Token in
Token ID287
Feature activation+1.20e+01

Pos logit contributions
ors+0.243
driver+0.237
ted+0.236
rings+0.235
 Stop+0.228
Neg logit contributions
 charming-0.338
 beautiful-0.337
 wonderful-0.333
 clever-0.333
 lovely-0.332
Token good
Token ID922
Feature activation-1.26e+01

Pos logit contributions
 to+0.333
 him+0.320
 it+0.313
driver+0.310
ors+0.304
Neg logit contributions
 charming-0.325
 clever-0.321
 cute-0.316
 lovely-0.312
 wonderful-0.310
Token condition
Token ID4006
Feature activation+2.54e+00

Pos logit contributions
 beautiful+0.290
 wonderful+0.282
 charming+0.282
 lovely+0.280
 nice+0.277
Neg logit contributions
rings-0.443
driver-0.436
ors-0.432
but-0.428
oes-0.426
Token,
Token ID11
Feature activation+9.05e+00

Pos logit contributions
ors+0.225
driver+0.218
ted+0.217
rings+0.215
 Stop+0.207
Neg logit contributions
 charming-0.442
 beautiful-0.441
 clever-0.438
 wonderful-0.438
 lovely-0.436
Token gray
Token ID12768
Feature activation-1.52e+01

Pos logit contributions
ors+0.197
ted+0.187
driver+0.184
rings+0.179
oes+0.178
Neg logit contributions
 beautiful-0.329
 charming-0.326
 cute-0.324
 lovely-0.323
 wonderful-0.323
Token fle
Token ID5104
Feature activation+4.42e-01

Pos logit contributions
 beautiful+0.414
 charming+0.405
 lovely+0.401
 small+0.398
 cute+0.397
Neg logit contributions
ors-0.443
rings-0.441
driver-0.441
but-0.438
ted-0.426
Tokena
Token ID64
Feature activation-3.64e+00

Pos logit contributions
ors+0.014
driver+0.014
rings+0.014
ted+0.014
 to+0.014
Neg logit contributions
 charming-0.011
 wonderful-0.010
 beautiful-0.010
 lovely-0.010
 clever-0.010
Token and
Token ID290
Feature activation+1.81e+01

Pos logit contributions
 charming+0.118
 beautiful+0.117
 wonderful+0.117
 clever+0.116
 lovely+0.115
Neg logit contributions
ors-0.084
ted-0.083
rings-0.082
 to-0.081
driver-0.081
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ted+0.526
ors+0.524
rings+0.523
driver+0.520
 to+0.507
Neg logit contributions
 charming-0.474
 beautiful-0.468
 wonderful-0.466
 clever-0.464
 lovely-0.462
Token was
Token ID373
Feature activation+6.15e+00

Pos logit contributions
ors+0.570
ted+0.557
driver+0.552
rings+0.550
 Stop+0.532
Neg logit contributions
 charming-0.811
 beautiful-0.806
 wonderful-0.802
 clever-0.799
 lovely-0.798
Token crawling
Token ID34499
Feature activation+7.07e+00

Pos logit contributions
 to+0.194
 it+0.179
 him+0.178
 they+0.168
rings+0.168
Neg logit contributions
 clever-0.144
 charming-0.143
 wonderful-0.138
 humble-0.136
 lovely-0.136
Token around
Token ID1088
Feature activation+1.10e+01

Pos logit contributions
ors+0.135
rings+0.134
driver+0.133
 Stop+0.128
oes+0.128
Neg logit contributions
 beautiful-0.262
 charming-0.260
 lovely-0.258
 wonderful-0.257
 small-0.256
Token very
Token ID845
Feature activation-2.01e+00

Pos logit contributions
ors+0.275
driver+0.271
rings+0.268
ted+0.267
oes+0.259
Neg logit contributions
 beautiful-0.345
 charming-0.345
 wonderful-0.341
 lovely-0.340
 clever-0.340
Token quickly
Token ID2952
Feature activation-8.52e-01

Pos logit contributions
 charming+0.042
 wonderful+0.041
 clever+0.041
 beautiful+0.041
 lovely+0.041
Neg logit contributions
ors-0.070
 to-0.068
driver-0.068
fully-0.067
rings-0.067
Token blew
Token ID17948
Feature activation+2.08e-01

Pos logit contributions
 to+0.150
driver+0.139
 him+0.129
 it+0.123
t+0.121
Neg logit contributions
 wonderful-0.116
 charming-0.114
 beautiful-0.113
 lovely-0.112
 peaceful-0.112
Token into
Token ID656
Feature activation+1.64e+01

Pos logit contributions
ors+0.004
rings+0.004
driver+0.004
oes+0.004
ted+0.004
Neg logit contributions
 beautiful-0.007
 charming-0.007
 wonderful-0.007
 lovely-0.007
 small-0.007
Token it
Token ID340
Feature activation+2.65e+01

Pos logit contributions
 him+0.399
 to+0.394
driver+0.391
 it+0.387
ted+0.376
Neg logit contributions
 charming-0.503
 clever-0.492
 wonderful-0.485
 lovely-0.483
 beautiful-0.479
Token,
Token ID11
Feature activation+6.12e+00

Pos logit contributions
ors+0.602
ted+0.589
rings+0.586
driver+0.584
 Stop+0.574
Neg logit contributions
 beautiful-0.891
 charming-0.886
 wonderful-0.875
 lovely-0.874
 clever-0.873
Token but
Token ID475
Feature activation+1.58e+01

Pos logit contributions
ors+0.148
oes+0.136
ted+0.134
driver+0.133
rings+0.131
Neg logit contributions
 beautiful-0.218
 small-0.215
 gentle-0.211
 charming-0.211
 cute-0.210
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.280
driver+0.256
oes+0.254
ted+0.247
rings+0.238
Neg logit contributions
 beautiful-0.641
 clever-0.635
 small-0.634
 wonderful-0.633
 charming-0.630
Token made
Token ID925
Feature activation+6.99e+00

Pos logit contributions
ors+0.477
driver+0.463
ted+0.461
rings+0.458
 Stop+0.442
Neg logit contributions
 charming-0.909
 beautiful-0.906
 wonderful-0.899
 clever-0.898
 lovely-0.896
Token a
Token ID257
Feature activation-8.68e+00

Pos logit contributions
 him+0.175
ors+0.170
ted+0.169
 to+0.169
driver+0.166
Neg logit contributions
 charming-0.215
 clever-0.210
 cute-0.208
 beautiful-0.206
 wonderful-0.205
Token loud
Token ID7812
Feature activation-1.90e+01

Pos logit contributions
 charming+0.121
 clever+0.114
 wonderful+0.113
 humble+0.110
 lovely+0.109
Neg logit contributions
 to-0.338
driver-0.322
 it-0.320
 we-0.317
ho-0.315
Token noise
Token ID7838
Feature activation-2.72e+00

Pos logit contributions
 beautiful+0.275
 charming+0.264
 nice+0.262
 cute+0.259
 clever+0.259
Neg logit contributions
rings-0.820
ted-0.802
ors-0.802
oes-0.793
driver-0.781
Token that
Token ID326
Feature activation+1.15e+01

Pos logit contributions
 beautiful+0.068
 nice+0.066
 wonderful+0.065
 lovely+0.065
 charming+0.065
Neg logit contributions
rings-0.089
ors-0.087
oes-0.086
ted-0.085
driver-0.085
Tokened
Token ID276
Feature activation+2.05e+00

Pos logit contributions
 beautiful+0.138
 charming+0.138
 clever+0.137
 lovely+0.136
 wonderful+0.136
Neg logit contributions
ors-0.122
ted-0.121
driver-0.119
rings-0.117
oes-0.115
Token in
Token ID287
Feature activation+1.60e+01

Pos logit contributions
ors+0.041
driver+0.040
rings+0.040
oes+0.039
 Stop+0.037
Neg logit contributions
 beautiful-0.076
 charming-0.075
 small-0.074
 lovely-0.074
 wonderful-0.074
Token a
Token ID257
Feature activation-6.15e+00

Pos logit contributions
ors+0.422
driver+0.412
ted+0.411
rings+0.409
 Stop+0.398
Neg logit contributions
 charming-0.477
 beautiful-0.475
 wonderful-0.470
 clever-0.470
 lovely-0.468
Token place
Token ID1295
Feature activation+7.37e+00

Pos logit contributions
 clever+0.101
 charming+0.100
 cute+0.095
 wonderful+0.095
 intelligent+0.094
Neg logit contributions
 to-0.224
 it-0.208
driver-0.207
 Stop-0.199
 we-0.197
Token where
Token ID810
Feature activation+8.51e+00

Pos logit contributions
ors+0.194
ted+0.185
driver+0.185
rings+0.182
 Stop+0.180
Neg logit contributions
 beautiful-0.235
 charming-0.231
 wonderful-0.229
 nice-0.228
 lovely-0.228
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
ors+0.177
driver+0.168
ted+0.165
rings+0.161
 Stop+0.159
Neg logit contributions
 beautiful-0.317
 charming-0.313
 wonderful-0.312
 lovely-0.310
 small-0.310
Token could
Token ID714
Feature activation+8.47e+00

Pos logit contributions
ors+0.509
driver+0.497
rings+0.487
ted+0.486
 Stop+0.475
Neg logit contributions
 beautiful-0.883
 charming-0.883
 clever-0.874
 wonderful-0.873
 lovely-0.869
Token make
Token ID787
Feature activation+9.77e+00

Pos logit contributions
ors+0.223
ted+0.218
driver+0.216
rings+0.213
 Stop+0.209
Neg logit contributions
 beautiful-0.258
 charming-0.257
 wonderful-0.253
 clever-0.253
 lovely-0.253
Token someone
Token ID2130
Feature activation-8.65e-01

Pos logit contributions
 him+0.261
 to+0.260
ors+0.258
 it+0.249
rings+0.248
Neg logit contributions
 charming-0.272
 clever-0.270
 cute-0.263
 humble-0.259
 lovely-0.259
Token feel
Token ID1254
Feature activation+1.24e+01

Pos logit contributions
 charming+0.023
 beautiful+0.023
 wonderful+0.023
 clever+0.023
 lovely+0.023
Neg logit contributions
ors-0.025
driver-0.025
ted-0.025
rings-0.024
 Stop-0.024
Token better
Token ID1365
Feature activation-5.17e+00

Pos logit contributions
 to+0.382
 it+0.368
 him+0.361
 they+0.351
fully+0.339
Neg logit contributions
 charming-0.291
 clever-0.286
 cute-0.283
 lovely-0.281
 wonderful-0.275
Token and
Token ID290
Feature activation+2.16e+01

Pos logit contributions
ors+0.164
driver+0.160
rings+0.159
ted+0.156
oes+0.156
Neg logit contributions
 beautiful-0.241
 charming-0.241
 wonderful-0.238
 clever-0.237
 lovely-0.237
Token higher
Token ID2440
Feature activation+7.13e+00

Pos logit contributions
ors+0.559
driver+0.544
ted+0.540
rings+0.536
 Stop+0.525
Neg logit contributions
 charming-0.658
 beautiful-0.657
 wonderful-0.649
 clever-0.649
 lovely-0.646
Token!"
Token ID2474
Feature activation+5.20e+00

Pos logit contributions
ors+0.165
driver+0.162
rings+0.162
ted+0.161
 Stop+0.159
Neg logit contributions
 beautiful-0.248
 small-0.243
 charming-0.241
 lovely-0.239
 wonderful-0.238
Token She
Token ID1375
Feature activation+1.91e+01

Pos logit contributions
ors+0.123
driver+0.120
rings+0.117
ted+0.115
oes+0.114
Neg logit contributions
 charming-0.170
 beautiful-0.170
 clever-0.168
 cute-0.167
 lovely-0.167
Token said
Token ID531
Feature activation-5.99e+00

Pos logit contributions
ors+0.481
driver+0.465
rings+0.463
ted+0.461
oes+0.456
Neg logit contributions
 beautiful-0.594
 charming-0.592
 clever-0.590
 wonderful-0.585
 lovely-0.582
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
 beautiful+0.201
 charming+0.200
 clever+0.200
 wonderful+0.198
 lovely+0.198
Neg logit contributions
ors-0.138
ted-0.131
driver-0.131
oes-0.130
rings-0.130
Token over
Token ID625
Feature activation+1.01e+01

Pos logit contributions
ors+0.514
driver+0.498
ted+0.495
rings+0.493
 Stop+0.478
Neg logit contributions
 charming-0.873
 beautiful-0.871
 clever-0.863
 wonderful-0.863
 lovely-0.860
Token and
Token ID290
Feature activation+1.69e+01

Pos logit contributions
ors+0.178
driver+0.171
ted+0.170
rings+0.169
 Stop+0.162
Neg logit contributions
 charming-0.391
 beautiful-0.391
 wonderful-0.387
 clever-0.387
 lovely-0.386
Token over
Token ID625
Feature activation+9.43e+00

Pos logit contributions
ors+0.478
oes+0.448
driver+0.440
 Stop+0.425
t+0.422
Neg logit contributions
 beautiful-0.518
 clever-0.510
 gentle-0.509
 charming-0.508
 small-0.505
Token again
Token ID757
Feature activation+8.80e+00

Pos logit contributions
ors+0.201
rings+0.187
driver+0.186
oes+0.180
ted+0.180
Neg logit contributions
 beautiful-0.352
 wonderful-0.347
 small-0.347
 charming-0.345
 lovely-0.344
Token.
Token ID13
Feature activation-5.22e+00

Pos logit contributions
ors+0.229
driver+0.223
ted+0.223
rings+0.221
 Stop+0.215
Neg logit contributions
 beautiful-0.267
 charming-0.267
 clever-0.264
 wonderful-0.264
 lovely-0.263
Token.
Token ID13
Feature activation+2.71e-01

Pos logit contributions
 charming+0.091
 lovely+0.091
 beautiful+0.090
 wonderful+0.090
 nice+0.090
Neg logit contributions
ors-0.070
driver-0.069
rings-0.068
ted-0.067
but-0.066
Token She
Token ID1375
Feature activation+2.11e+01

Pos logit contributions
ors+0.006
driver+0.005
ted+0.005
fully+0.005
rings+0.005
Neg logit contributions
 charming-0.010
 small-0.010
 cute-0.010
 beautiful-0.010
 lovely-0.009
Token did
Token ID750
Feature activation+6.30e+00

Pos logit contributions
ors+0.536
driver+0.522
rings+0.516
ted+0.514
oes+0.509
Neg logit contributions
 beautiful-0.653
 charming-0.653
 clever-0.648
 wonderful-0.646
 lovely-0.641
Token not
Token ID407
Feature activation+1.48e+01

Pos logit contributions
ors+0.155
ted+0.151
driver+0.151
rings+0.149
 Stop+0.146
Neg logit contributions
 charming-0.200
 beautiful-0.200
 wonderful-0.198
 clever-0.198
 lovely-0.197
Token realize
Token ID6537
Feature activation+1.23e+01

Pos logit contributions
driver+0.323
 to+0.321
ors+0.317
rings+0.316
 Stop+0.314
Neg logit contributions
 charming-0.494
 beautiful-0.487
 clever-0.482
 lovely-0.482
 wonderful-0.482
Token it
Token ID340
Feature activation+2.47e+01

Pos logit contributions
oes+0.214
driver+0.210
ors+0.208
ted+0.203
 Stop+0.193
Neg logit contributions
 beautiful-0.490
 wonderful-0.482
 charming-0.480
 cute-0.479
 lovely-0.478
Token until
Token ID1566
Feature activation+1.01e+01

Pos logit contributions
ors+0.512
driver+0.503
rings+0.501
ted+0.497
 Stop+0.479
Neg logit contributions
 beautiful-0.878
 charming-0.876
 wonderful-0.869
 clever-0.869
 lovely-0.865
Token she
Token ID673
Feature activation+1.91e+01

Pos logit contributions
ors+0.198
driver+0.187
ted+0.180
oes+0.178
rings+0.176
Neg logit contributions
 beautiful-0.389
 charming-0.385
 wonderful-0.384
 lovely-0.382
 clever-0.382
Token was
Token ID373
Feature activation+5.49e+00

Pos logit contributions
ted+0.381
ors+0.378
driver+0.370
rings+0.367
 Stop+0.356
Neg logit contributions
 charming-0.686
 beautiful-0.679
 wonderful-0.675
 lovely-0.674
 clever-0.669
Token already
Token ID1541
Feature activation+9.71e+00

Pos logit contributions
 to+0.156
 it+0.136
 him+0.133
 over+0.131
 they+0.122
Neg logit contributions
 charming-0.148
 wonderful-0.145
 lovely-0.143
 cute-0.141
 beautiful-0.139
Token walking
Token ID6155
Feature activation+1.46e+00

Pos logit contributions
 to+0.239
rings+0.201
 him+0.199
driver+0.195
fully+0.195
Neg logit contributions
 wonderful-0.308
 charming-0.307
 lovely-0.303
 cute-0.301
 clever-0.301
Token ice
Token ID4771
Feature activation-3.08e+00

Pos logit contributions
 him+0.146
 it+0.143
 to+0.143
rings+0.139
ted+0.135
Neg logit contributions
 clever-0.084
 charming-0.083
 cute-0.078
 humble-0.078
 lovely-0.078
Token!
Token ID0
Feature activation-3.30e+00

Pos logit contributions
 beautiful+0.085
 charming+0.085
 lovely+0.084
 wonderful+0.083
 clever+0.083
Neg logit contributions
ors-0.089
ted-0.088
 Stop-0.088
rings-0.088
driver-0.088
Token Lucy
Token ID22162
Feature activation-1.91e+00

Pos logit contributions
 charming+0.127
 beautiful+0.127
 wonderful+0.126
 clever+0.125
 lovely+0.125
Neg logit contributions
ors-0.059
driver-0.057
ted-0.056
rings-0.056
oes-0.053
Token felt
Token ID2936
Feature activation+6.51e-01

Pos logit contributions
 beautiful+0.065
 charming+0.065
 clever+0.065
 wonderful+0.064
 lovely+0.064
Neg logit contributions
ors-0.042
driver-0.041
rings-0.041
ted-0.041
oes-0.040
Token so
Token ID523
Feature activation+7.87e-01

Pos logit contributions
 to+0.017
 him+0.016
ted+0.016
 it+0.016
 Stop+0.016
Neg logit contributions
 charming-0.018
 cute-0.018
 clever-0.018
 lovely-0.018
 beautiful-0.018
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.030
 it+0.030
 him+0.029
 Stop+0.029
 they+0.029
Neg logit contributions
 charming-0.011
 cute-0.010
 lovely-0.010
 wonderful-0.010
 beautiful-0.010
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 beautiful+1.010
 wonderful+0.963
 charming+0.959
 magical+0.957
 clever+0.956
Neg logit contributions
oes-0.737
ors-0.683
 Stop-0.652
rings-0.650
or-0.629
Token tiny
Token ID7009
Feature activation-2.69e+01

Pos logit contributions
 to+0.464
ted+0.456
rings+0.451
 it+0.445
fully+0.445
Neg logit contributions
 charming-0.168
 lovely-0.162
 wonderful-0.161
 cute-0.161
 clever-0.159
Token compared
Token ID3688
Feature activation-1.96e+00

Pos logit contributions
 beautiful+0.907
 charming+0.883
 wonderful+0.879
 lovely+0.877
 clever+0.876
Neg logit contributions
ors-0.647
oes-0.628
rings-0.603
 Stop-0.597
driver-0.595
Token to
Token ID284
Feature activation+1.48e+01

Pos logit contributions
 beautiful+0.089
 small+0.088
 charming+0.087
 wonderful+0.086
 nice+0.086
Neg logit contributions
ors-0.029
driver-0.028
 Stop-0.025
oes-0.024
 Did-0.023
Token the
Token ID262
Feature activation-4.16e+00

Pos logit contributions
 to+0.334
rings+0.333
ted+0.327
ors+0.324
 him+0.324
Neg logit contributions
 charming-0.477
 clever-0.471
 cute-0.468
 beautiful-0.464
 lovely-0.463
Token home
Token ID1363
Feature activation-6.05e-02

Pos logit contributions
ors+0.111
driver+0.105
 Stop+0.104
fully+0.102
rings+0.100
Neg logit contributions
 beautiful-0.102
 small-0.102
 charming-0.102
 clever-0.100
 nice-0.100
Token,
Token ID11
Feature activation-7.40e-01

Pos logit contributions
 small+0.002
 beautiful+0.002
 charming+0.002
 clever+0.002
 nice+0.002
Neg logit contributions
ors-0.002
driver-0.002
ted-0.002
rings-0.002
oes-0.001
Token they
Token ID484
Feature activation+1.48e+01

Pos logit contributions
 charming+0.026
 beautiful+0.026
 wonderful+0.025
 clever+0.025
 lovely+0.025
Neg logit contributions
ors-0.016
driver-0.016
ted-0.016
rings-0.016
oes-0.015
Token cut
Token ID2005
Feature activation-6.85e+00

Pos logit contributions
ors+0.354
driver+0.345
ted+0.343
rings+0.341
 Stop+0.332
Neg logit contributions
 charming-0.477
 beautiful-0.476
 clever-0.471
 wonderful-0.471
 lovely-0.469
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 charming+0.220
 beautiful+0.219
 clever+0.217
 wonderful+0.217
 lovely+0.216
Neg logit contributions
ors-0.165
driver-0.161
ted-0.160
rings-0.159
 Stop-0.155
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 charming+0.245
 clever+0.241
 wonderful+0.240
 lovely+0.229
 cute+0.227
Neg logit contributions
 to-0.496
ho-0.476
driver-0.465
 it-0.459
rings-0.451
Token piece
Token ID3704
Feature activation-7.39e+00

Pos logit contributions
 charming+0.692
 beautiful+0.688
 wonderful+0.679
 clever+0.679
 lovely+0.675
Neg logit contributions
ors-0.932
driver-0.917
ted-0.914
rings-0.910
 Stop-0.891
Token of
Token ID286
Feature activation+4.33e+00

Pos logit contributions
 big+0.255
 beautiful+0.253
 small+0.253
 nice+0.250
 special+0.245
Neg logit contributions
 Stop-0.184
ors-0.179
ted-0.168
 Did-0.168
rings-0.167
Token the
Token ID262
Feature activation-4.15e+00

Pos logit contributions
 to+0.119
 him+0.117
 it+0.116
ted+0.114
 they+0.109
Neg logit contributions
 charming-0.116
 clever-0.116
 wonderful-0.113
 cute-0.112
 lovely-0.111
Token spicy
Token ID26880
Feature activation-1.83e+01

Pos logit contributions
 wonderful+0.118
 charming+0.117
 cute+0.115
 lovely+0.113
 clever+0.113
Neg logit contributions
 to-0.108
oes-0.100
ted-0.099
ho-0.098
 him-0.098
Token pepper
Token ID13385
Feature activation+1.48e+00

Pos logit contributions
 beautiful+0.452
 charming+0.443
 lovely+0.440
 wonderful+0.435
 clever+0.435
Neg logit contributions
ors-0.584
rings-0.581
driver-0.577
 Stop-0.566
ted-0.562
Token stack
Token ID8931
Feature activation-7.20e-01

Pos logit contributions
 compassionate+0.087
 wonderful+0.085
 clever+0.084
 great+0.084
 charming+0.084
Neg logit contributions
 to-0.133
als-0.126
ho-0.125
oes-0.124
 it-0.122
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
 small+0.019
 beautiful+0.018
 big+0.018
 nice+0.018
 kind+0.018
Neg logit contributions
 Stop-0.022
ted-0.022
ors-0.021
rings-0.021
 Did-0.020
Token He
Token ID679
Feature activation+9.25e+00

Pos logit contributions
 small+0.319
 beautiful+0.315
 clever+0.313
 charming+0.313
 lovely+0.309
Neg logit contributions
ors-0.236
driver-0.220
rings-0.218
ted-0.217
ting-0.204
Token saw
Token ID2497
Feature activation-6.88e+00

Pos logit contributions
ors+0.236
driver+0.228
rings+0.226
ted+0.224
oes+0.223
Neg logit contributions
 beautiful-0.292
 clever-0.290
 charming-0.290
 wonderful-0.287
 lovely-0.284
Token the
Token ID262
Feature activation-4.15e+00

Pos logit contributions
 charming+0.196
 beautiful+0.193
 clever+0.193
 wonderful+0.192
 lovely+0.191
Neg logit contributions
ors-0.187
ted-0.186
driver-0.184
rings-0.184
 to-0.178
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 compassionate+0.075
 wonderful+0.074
 charming+0.074
 clever+0.072
 humble+0.071
Neg logit contributions
 to-0.141
rings-0.131
ho-0.130
driver-0.130
oes-0.130
Token bird
Token ID6512
Feature activation-8.26e-01

Pos logit contributions
 charming+0.717
 wonderful+0.713
 clever+0.708
 beautiful+0.705
 lovely+0.700
Neg logit contributions
driver-0.899
ors-0.897
rings-0.880
ted-0.876
 to-0.865
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 beautiful+0.029
 charming+0.028
 cute+0.028
 small+0.028
 lovely+0.028
Neg logit contributions
driver-0.020
ors-0.020
 Stop-0.019
oes-0.019
ted-0.018
Token its
Token ID663
Feature activation-2.16e+00

Pos logit contributions
ors+0.263
driver+0.254
ted+0.247
rings+0.245
oes+0.245
Neg logit contributions
 beautiful-0.418
 charming-0.415
 clever-0.412
 cute-0.410
 wonderful-0.409
Token feathers
Token ID27400
Feature activation+2.70e+00

Pos logit contributions
 wonderful+0.041
 charming+0.040
 clever+0.039
 lovely+0.038
 great+0.038
Neg logit contributions
 to-0.078
ho-0.071
rings-0.071
oes-0.069
als-0.067
Token had
Token ID550
Feature activation+2.96e+00

Pos logit contributions
ors+0.068
 Stop+0.067
driver+0.066
oes+0.065
rings+0.064
Neg logit contributions
 beautiful-0.086
 charming-0.084
 lovely-0.083
 cute-0.083
 clever-0.083
Token ground
Token ID2323
Feature activation+1.10e+00

Pos logit contributions
 wonderful+0.111
 clever+0.110
 charming+0.110
 cute+0.108
 lovely+0.107
Neg logit contributions
ors-0.119
 to-0.118
rings-0.115
oes-0.115
als-0.114
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
ors+0.032
driver+0.031
ted+0.031
rings+0.031
 Stop+0.030
Neg logit contributions
 charming-0.030
 beautiful-0.030
 clever-0.029
 lovely-0.029
 wonderful-0.029
Token The
Token ID383
Feature activation-9.02e+00

Pos logit contributions
 charming+0.359
 beautiful+0.358
 clever+0.353
 lovely+0.353
 cute+0.352
Neg logit contributions
ors-0.182
driver-0.175
ted-0.174
rings-0.173
 Stop-0.163
Token seed
Token ID9403
Feature activation+1.19e+00

Pos logit contributions
 wonderful+0.267
 charming+0.263
 cute+0.259
 lovely+0.257
 clever+0.257
Neg logit contributions
 to-0.221
als-0.203
oes-0.200
 it-0.199
rings-0.199
Token was
Token ID373
Feature activation+7.20e-01

Pos logit contributions
driver+0.028
ors+0.027
 Stop+0.027
rings+0.027
ted+0.026
Neg logit contributions
 beautiful-0.041
 charming-0.041
 clever-0.040
 cute-0.040
 lovely-0.040
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.026
 it+0.023
 they+0.022
ho+0.022
rings+0.022
Neg logit contributions
 wonderful-0.014
 charming-0.014
 clever-0.013
 lovely-0.013
 great-0.013
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 beautiful+1.200
 charming+1.192
 small+1.185
 clever+1.180
 cute+1.177
Neg logit contributions
ors-0.460
driver-0.446
 Stop-0.438
ted-0.437
rings-0.436
Token round
Token ID2835
Feature activation-1.45e+01

Pos logit contributions
 to+0.380
 it+0.373
 they+0.371
rings+0.367
ho+0.364
Neg logit contributions
 charming-0.254
 wonderful-0.254
 lovely-0.246
 clever-0.243
 beautiful-0.243
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 beautiful+0.448
 charming+0.447
 clever+0.441
 lovely+0.440
 cute+0.439
Neg logit contributions
ors-0.370
driver-0.366
rings-0.361
 Stop-0.355
ted-0.355
Token One
Token ID1881
Feature activation-9.31e+00

Pos logit contributions
 charming+0.377
 beautiful+0.376
 wonderful+0.374
 clever+0.374
 lovely+0.372
Neg logit contributions
ors-0.155
driver-0.150
ted-0.150
rings-0.148
 Stop-0.143
Token morning
Token ID3329
Feature activation-2.42e+00

Pos logit contributions
 wonderful+0.267
 clever+0.264
 charming+0.261
 cute+0.260
 great+0.257
Neg logit contributions
 to-0.246
oes-0.224
 they-0.209
 it-0.208
ho-0.195
Token coming
Token ID2406
Feature activation+2.09e+00

Pos logit contributions
 to+0.014
ors+0.014
rings+0.014
oes+0.014
ted+0.013
Neg logit contributions
 charming-0.025
 clever-0.025
 beautiful-0.024
 wonderful-0.024
 cute-0.024
Token from
Token ID422
Feature activation+4.05e+00

Pos logit contributions
driver+0.042
ors+0.042
ted+0.041
rings+0.040
licks+0.038
Neg logit contributions
 beautiful-0.079
 nice-0.077
 small-0.077
 big-0.077
 charming-0.077
Token and
Token ID290
Feature activation+1.18e+01

Pos logit contributions
ors+0.099
ted+0.097
driver+0.096
rings+0.096
 Stop+0.093
Neg logit contributions
 charming-0.128
 beautiful-0.127
 clever-0.126
 wonderful-0.126
 lovely-0.126
Token noticed
Token ID6810
Feature activation-3.47e+00

Pos logit contributions
ors+0.313
driver+0.299
oes+0.289
rings+0.273
or+0.273
Neg logit contributions
 small-0.387
 beautiful-0.386
 clever-0.383
 cute-0.381
 charming-0.378
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 charming+0.117
 clever+0.116
 wonderful+0.115
 lovely+0.114
 beautiful+0.114
Neg logit contributions
ted-0.079
ors-0.077
 to-0.076
 him-0.075
rings-0.073
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.218
aky+0.209
 entire+0.208
 intelligent+0.207
 clever+0.206
Neg logit contributions
 to-0.493
 it-0.485
 they-0.462
 we-0.459
ho-0.442
Token house
Token ID2156
Feature activation+4.95e+00

Pos logit contributions
 charming+0.780
 clever+0.780
 wonderful+0.778
 lovely+0.764
 beautiful+0.763
Neg logit contributions
ors-0.799
driver-0.798
 to-0.794
ted-0.791
rings-0.781
Token.
Token ID13
Feature activation-9.47e+00

Pos logit contributions
ors+0.155
rings+0.152
ted+0.150
 Stop+0.148
oes+0.146
Neg logit contributions
 beautiful-0.132
 charming-0.128
 small-0.126
 big-0.126
 nice-0.126
Token His
Token ID2399
Feature activation-5.42e+00

Pos logit contributions
 small+0.368
 charming+0.365
 clever+0.364
 beautiful+0.362
 cute+0.360
Neg logit contributions
ors-0.174
rings-0.167
driver-0.161
fully-0.159
oes-0.158
Token curiosity
Token ID20136
Feature activation-9.35e+00

Pos logit contributions
eh+0.138
aky+0.138
 charming+0.135
 wonderful+0.135
 lovely+0.132
Neg logit contributions
 to-0.157
 they-0.131
 it-0.129
 him-0.129
ted-0.129
Token grew
Token ID6348
Feature activation-2.96e+00

Pos logit contributions
 beautiful+0.297
 small+0.290
 clever+0.289
 charming+0.288
 cute+0.285
Neg logit contributions
 Stop-0.238
oes-0.237
ors-0.233
rings-0.233
but-0.224
Token.
Token ID13
Feature activation-9.52e+00

Pos logit contributions
 beautiful+0.055
 charming+0.055
 small+0.054
 clever+0.054
 cute+0.054
Neg logit contributions
ors-0.071
driver-0.070
rings-0.070
ted-0.069
 Stop-0.068
Token The
Token ID383
Feature activation-9.03e+00

Pos logit contributions
 small+0.395
 beautiful+0.392
 charming+0.391
 cute+0.388
 lovely+0.387
Neg logit contributions
ors-0.154
fully-0.133
driver-0.133
ted-0.133
oes-0.131
Token red
Token ID2266
Feature activation-1.94e+01

Pos logit contributions
eh+0.330
 wonderful+0.329
ady+0.321
 lovely+0.321
 charming+0.316
Neg logit contributions
 to-0.147
 it-0.138
 they-0.129
 we-0.128
 he-0.127
Token fish
Token ID5916
Feature activation-2.26e+00

Pos logit contributions
 small+0.392
 beautiful+0.377
 charming+0.375
 big+0.372
 cute+0.371
Neg logit contributions
ors-0.715
rings-0.707
but-0.698
driver-0.695
 Stop-0.668
Token is
Token ID318
Feature activation+8.09e+00

Pos logit contributions
 clever+0.073
 small+0.072
 beautiful+0.072
 charming+0.071
 cute+0.071
Neg logit contributions
rings-0.059
ors-0.058
driver-0.057
fully-0.056
but-0.054
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.266
 it+0.245
 him+0.237
 they+0.236
 over+0.233
Neg logit contributions
 wonderful-0.175
 lovely-0.170
 charming-0.169
 entire-0.165
 great-0.164
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 small+1.133
 beautiful+1.120
 charming+1.107
 nice+1.104
 cute+1.101
Neg logit contributions
ors-0.547
rings-0.518
ted-0.511
 Stop-0.508
but-0.506
Token cute
Token ID13779
Feature activation-2.53e+01

Pos logit contributions
driver+0.445
ted+0.444
 to+0.444
rings+0.441
ors+0.441
Neg logit contributions
 charming-0.193
 wonderful-0.191
 beautiful-0.189
 lovely-0.188
 clever-0.184
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 nice+0.620
 beautiful+0.616
 cute+0.615
 small+0.615
 charming+0.612
Neg logit contributions
ors-0.830
driver-0.808
rings-0.803
oes-0.791
ted-0.783
Token Lily
Token ID20037
Feature activation-6.81e+00

Pos logit contributions
 charming+0.389
 beautiful+0.388
 clever+0.384
 lovely+0.384
 cute+0.383
Neg logit contributions
ors-0.149
driver-0.139
ted-0.139
rings-0.137
oes-0.132
Token likes
Token ID7832
Feature activation-5.91e+00

Pos logit contributions
 clever+0.217
 cute+0.215
 small+0.214
 gentle+0.208
 charming+0.207
Neg logit contributions
fully-0.186
rings-0.184
ors-0.180
oes-0.178
but-0.176
Token She
Token ID1375
Feature activation+1.23e+01

Pos logit contributions
 beautiful+0.115
 lovely+0.113
 charming+0.113
 wonderful+0.113
 small+0.112
Neg logit contributions
ors-0.072
driver-0.068
ted-0.067
rings-0.066
ho-0.065
Token carefully
Token ID7773
Feature activation-4.83e+00

Pos logit contributions
ors+0.307
driver+0.298
rings+0.295
ted+0.294
oes+0.293
Neg logit contributions
 beautiful-0.396
 charming-0.393
 clever-0.391
 wonderful-0.388
 lovely-0.386
Token piled
Token ID33654
Feature activation+3.18e+00

Pos logit contributions
 beautiful+0.150
 charming+0.149
 clever+0.148
 wonderful+0.146
 lovely+0.146
Neg logit contributions
ors-0.124
rings-0.120
oes-0.119
 Stop-0.118
driver-0.118
Token up
Token ID510
Feature activation+2.53e+00

Pos logit contributions
 Stop+0.055
ors+0.054
oes+0.054
driver+0.051
rings+0.050
Neg logit contributions
 beautiful-0.130
 small-0.128
 lovely-0.127
 wonderful-0.127
 charming-0.127
Token the
Token ID262
Feature activation-4.15e+00

Pos logit contributions
ors+0.068
 Stop+0.067
oes+0.067
driver+0.065
rings+0.063
Neg logit contributions
 beautiful-0.079
 charming-0.077
 small-0.077
 wonderful-0.077
 lovely-0.077
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 charming+0.097
 clever+0.097
 wonderful+0.096
 beautiful+0.094
 lovely+0.094
Neg logit contributions
rings-0.130
driver-0.130
ors-0.130
ted-0.128
 to-0.127
Token bows
Token ID37469
Feature activation-4.35e-01

Pos logit contributions
 beautiful+0.715
 charming+0.711
 lovely+0.698
 wonderful+0.695
 cute+0.690
Neg logit contributions
ors-0.923
ted-0.903
driver-0.896
rings-0.894
 Stop-0.886
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 beautiful+0.014
 big+0.014
 small+0.014
 special+0.014
 pretty+0.014
Neg logit contributions
 Stop-0.011
 Get-0.011
 Did-0.011
ors-0.010
oes-0.010
Token started
Token ID2067
Feature activation+7.65e-01

Pos logit contributions
ors+0.302
driver+0.294
ted+0.293
rings+0.291
 Stop+0.285
Neg logit contributions
 charming-0.365
 beautiful-0.365
 clever-0.361
 wonderful-0.360
 lovely-0.359
Token sorting
Token ID29407
Feature activation+2.34e+00

Pos logit contributions
ors+0.011
 Stop+0.010
oes+0.010
ted+0.010
driver+0.009
Neg logit contributions
 beautiful-0.033
 small-0.033
 charming-0.033
 clever-0.033
 lovely-0.033
Token them
Token ID606
Feature activation+1.20e+01

Pos logit contributions
ors+0.042
driver+0.042
 Stop+0.042
oes+0.042
ted+0.040
Neg logit contributions
 beautiful-0.092
 wonderful-0.090
 lovely-0.090
 charming-0.090
 small-0.089
Token and
Token ID290
Feature activation+1.18e+01

Pos logit contributions
 charming+1.078
 beautiful+1.077
 clever+1.066
 wonderful+1.063
 lovely+1.063
Neg logit contributions
ors-0.315
rings-0.301
driver-0.300
ted-0.300
 Stop-0.288
Token the
Token ID262
Feature activation-4.19e+00

Pos logit contributions
 they+0.369
 to+0.363
 it+0.346
 we+0.342
 she+0.336
Neg logit contributions
 wonderful-0.261
 charming-0.259
 lovely-0.256
 great-0.254
 beautiful-0.254
Token little
Token ID1310
Feature activation-2.35e+01

Pos logit contributions
 clever+0.087
 wonderful+0.086
 beautiful+0.085
 creative+0.084
eh+0.084
Neg logit contributions
 to-0.129
 they-0.124
 She-0.117
 he-0.117
ted-0.117
Token girl
Token ID2576
Feature activation+4.00e-01

Pos logit contributions
 wonderful+0.551
 charming+0.543
 beautiful+0.541
 clever+0.541
 lovely+0.533
Neg logit contributions
rings-0.744
ted-0.743
 to-0.737
driver-0.733
ors-0.729
Token was
Token ID373
Feature activation+7.20e-01

Pos logit contributions
ors+0.009
ted+0.009
rings+0.009
driver+0.009
 to+0.009
Neg logit contributions
 charming-0.013
 beautiful-0.013
 wonderful-0.013
 lovely-0.013
 clever-0.013
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.023
 it+0.020
 they+0.019
 over+0.019
 she+0.018
Neg logit contributions
 wonderful-0.017
 unexpected-0.017
 painful-0.017
 magical-0.016
 thrill-0.016
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 charming+1.099
 beautiful+1.098
 clever+1.086
 wonderful+1.082
 lovely+1.082
Neg logit contributions
ors-0.540
ted-0.522
driver-0.516
rings-0.512
 Stop-0.498
Token
Token ID220
Feature activation-6.58e+00

Pos logit contributions
 charming+0.303
 beautiful+0.301
 lovely+0.300
 small+0.300
 cute+0.297
Neg logit contributions
driver-0.241
ors-0.239
rings-0.224
ted-0.223
fully-0.222
Token
Token ID198
Feature activation-7.23e+00

Pos logit contributions
 charming+0.209
 beautiful+0.208
 wonderful+0.206
 clever+0.206
 lovely+0.205
Neg logit contributions
ors-0.159
driver-0.155
ted-0.155
rings-0.154
 Stop-0.150
Token
Token ID198
Feature activation-7.26e+00

Pos logit contributions
 charming+0.248
 clever+0.246
 beautiful+0.246
 wonderful+0.243
 cute+0.242
Neg logit contributions
ors-0.153
ted-0.148
 to-0.147
driver-0.147
rings-0.145
TokenOne
Token ID3198
Feature activation-1.40e+01

Pos logit contributions
 clever+0.185
 charming+0.180
 wonderful+0.178
 beautiful+0.178
 cute+0.177
Neg logit contributions
 to-0.207
ors-0.206
but-0.204
ted-0.201
 Stop-0.201
Token and
Token ID290
Feature activation+1.18e+01

Pos logit contributions
 charming+1.078
 beautiful+1.077
 clever+1.066
 wonderful+1.063
 lovely+1.063
Neg logit contributions
ors-0.315
rings-0.301
driver-0.300
ted-0.300
 Stop-0.288
Token the
Token ID262
Feature activation-4.19e+00

Pos logit contributions
 they+0.369
 to+0.363
 it+0.346
 we+0.342
 she+0.336
Neg logit contributions
 wonderful-0.261
 charming-0.259
 lovely-0.256
 great-0.254
 beautiful-0.254
Token little
Token ID1310
Feature activation-2.35e+01

Pos logit contributions
 clever+0.087
 wonderful+0.086
 beautiful+0.085
 creative+0.084
eh+0.084
Neg logit contributions
 to-0.129
 they-0.124
 She-0.117
 he-0.117
ted-0.117
Token girl
Token ID2576
Feature activation+4.00e-01

Pos logit contributions
 wonderful+0.551
 charming+0.543
 beautiful+0.541
 clever+0.541
 lovely+0.533
Neg logit contributions
rings-0.744
ted-0.743
 to-0.737
driver-0.733
ors-0.729
Token was
Token ID373
Feature activation+7.20e-01

Pos logit contributions
ors+0.009
ted+0.009
rings+0.009
driver+0.009
 to+0.009
Neg logit contributions
 charming-0.013
 beautiful-0.013
 wonderful-0.013
 lovely-0.013
 clever-0.013
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.023
 it+0.020
 they+0.019
 over+0.019
 she+0.018
Neg logit contributions
 wonderful-0.017
 unexpected-0.017
 painful-0.017
 magical-0.016
 thrill-0.016
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 charming+1.099
 beautiful+1.098
 clever+1.086
 wonderful+1.082
 lovely+1.082
Neg logit contributions
ors-0.540
ted-0.522
driver-0.516
rings-0.512
 Stop-0.498
Token
Token ID220
Feature activation-6.58e+00

Pos logit contributions
 charming+0.303
 beautiful+0.301
 lovely+0.300
 small+0.300
 cute+0.297
Neg logit contributions
driver-0.241
ors-0.239
rings-0.224
ted-0.223
fully-0.222
Token
Token ID198
Feature activation-7.23e+00

Pos logit contributions
 charming+0.209
 beautiful+0.208
 wonderful+0.206
 clever+0.206
 lovely+0.205
Neg logit contributions
ors-0.159
driver-0.155
ted-0.155
rings-0.154
 Stop-0.150
Token
Token ID198
Feature activation-7.26e+00

Pos logit contributions
 charming+0.248
 clever+0.246
 beautiful+0.246
 wonderful+0.243
 cute+0.242
Neg logit contributions
ors-0.153
ted-0.148
 to-0.147
driver-0.147
rings-0.145
TokenOne
Token ID3198
Feature activation-1.40e+01

Pos logit contributions
 clever+0.185
 charming+0.180
 wonderful+0.178
 beautiful+0.178
 cute+0.177
Neg logit contributions
 to-0.207
ors-0.206
but-0.204
ted-0.201
 Stop-0.201
Token to
Token ID284
Feature activation+1.48e+01

Pos logit contributions
ors+0.409
driver+0.401
ted+0.397
rings+0.395
 Stop+0.383
Neg logit contributions
 charming-0.649
 beautiful-0.645
 clever-0.640
 wonderful-0.640
 lovely-0.637
Token me
Token ID502
Feature activation-1.48e+00

Pos logit contributions
 him+0.317
 to+0.307
ors+0.306
 Stop+0.295
 it+0.295
Neg logit contributions
 charming-0.500
 clever-0.492
 cute-0.492
 wonderful-0.490
 beautiful-0.486
Token!"
Token ID2474
Feature activation-2.42e+00

Pos logit contributions
 charming+0.050
 beautiful+0.050
 wonderful+0.049
 clever+0.049
 lovely+0.049
Neg logit contributions
ors-0.033
driver-0.032
ted-0.032
rings-0.032
 Stop-0.031
Token But
Token ID887
Feature activation+8.33e+00

Pos logit contributions
 beautiful+0.081
 small+0.080
 charming+0.080
 clever+0.080
 gentle+0.079
Neg logit contributions
ors-0.056
rings-0.053
driver-0.052
oes-0.051
ted-0.051
Token the
Token ID262
Feature activation-4.15e+00

Pos logit contributions
ors+0.229
driver+0.218
oes+0.216
rings+0.214
ted+0.211
Neg logit contributions
 beautiful-0.255
 charming-0.253
 clever-0.253
 small-0.250
 wonderful-0.250
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.070
 cozy+0.067
 wonderful+0.065
 lovely+0.063
 beautiful+0.061
Neg logit contributions
 to-0.148
 he-0.145
driver-0.143
 she-0.142
 it-0.136
Token girl
Token ID2576
Feature activation+4.14e-01

Pos logit contributions
 wonderful+0.774
 beautiful+0.762
 charming+0.757
 lovely+0.756
 cute+0.744
Neg logit contributions
driver-0.858
 to-0.837
ted-0.833
ors-0.815
rings-0.812
Token didn
Token ID1422
Feature activation+2.94e+00

Pos logit contributions
ors+0.009
rings+0.009
driver+0.009
ted+0.009
oes+0.009
Neg logit contributions
 charming-0.014
 clever-0.014
 beautiful-0.014
 wonderful-0.014
 lovely-0.014
Token't
Token ID470
Feature activation+8.98e+00

Pos logit contributions
rings+0.048
als+0.044
ted+0.043
ors+0.041
oes+0.041
Neg logit contributions
 big-0.125
 great-0.124
 clever-0.123
 small-0.122
 beautiful-0.122
Token give
Token ID1577
Feature activation+2.30e+00

Pos logit contributions
ors+0.211
ted+0.205
driver+0.204
rings+0.203
 Stop+0.195
Neg logit contributions
 charming-0.297
 beautiful-0.296
 clever-0.294
 wonderful-0.294
 lovely-0.292
Token him
Token ID683
Feature activation+1.21e+01

Pos logit contributions
driver+0.055
ors+0.053
 Stop+0.052
rings+0.051
iked+0.051
Neg logit contributions
 beautiful-0.078
 clever-0.077
 charming-0.077
 wonderful-0.077
 lovely-0.076
Token In
Token ID554
Feature activation+3.55e+00

Pos logit contributions
 small+0.299
 charming+0.296
 cute+0.294
 clever+0.293
 beautiful+0.293
Neg logit contributions
ors-0.248
rings-0.235
driver-0.232
ted-0.232
oes-0.231
Token the
Token ID262
Feature activation-4.19e+00

Pos logit contributions
ors+0.088
driver+0.087
ted+0.086
rings+0.085
 Stop+0.084
Neg logit contributions
 charming-0.110
 beautiful-0.109
 clever-0.108
 wonderful-0.108
 lovely-0.108
Token sand
Token ID6450
Feature activation-1.52e+00

Pos logit contributions
 cute+0.101
 charming+0.101
 clever+0.099
 beautiful+0.097
 wonderful+0.096
Neg logit contributions
 to-0.122
oes-0.116
 Stop-0.116
ted-0.115
 him-0.114
Token was
Token ID373
Feature activation+7.04e-01

Pos logit contributions
 small+0.048
 beautiful+0.047
 charming+0.047
 big+0.046
 nice+0.046
Neg logit contributions
ors-0.038
ted-0.036
 Stop-0.035
rings-0.035
driver-0.035
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 to+0.022
 him+0.021
driver+0.021
ted+0.021
rings+0.020
Neg logit contributions
 charming-0.017
 clever-0.016
 lovely-0.016
 cute-0.016
 wonderful-0.016
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 charming+0.214
 clever+0.204
 wonderful+0.203
 intelligent+0.203
 lovely+0.202
Neg logit contributions
 to-0.494
 they-0.475
 it-0.470
 we-0.466
ho-0.466
Token box
Token ID3091
Feature activation+1.16e+00

Pos logit contributions
 charming+0.741
 beautiful+0.733
 wonderful+0.730
 clever+0.730
 lovely+0.725
Neg logit contributions
ors-0.870
driver-0.865
ted-0.852
rings-0.851
 Stop-0.833
Token.
Token ID13
Feature activation-9.47e+00

Pos logit contributions
ors+0.031
ted+0.030
 Stop+0.030
rings+0.029
driver+0.029
Neg logit contributions
 beautiful-0.035
 charming-0.034
 lovely-0.034
 clever-0.034
 small-0.034
Token
Token ID198
Feature activation-7.23e+00

Pos logit contributions
 charming+0.329
 beautiful+0.328
 lovely+0.324
 cute+0.324
 clever+0.324
Neg logit contributions
ors-0.207
driver-0.199
rings-0.198
ted-0.197
oes-0.192
Token
Token ID198
Feature activation-7.26e+00

Pos logit contributions
 charming+0.250
 beautiful+0.249
 wonderful+0.247
 clever+0.246
 lovely+0.245
Neg logit contributions
ors-0.150
 to-0.148
driver-0.148
 Stop-0.147
ted-0.145
Token"
Token ID1
Feature activation+1.25e+00

Pos logit contributions
 charming+0.213
 beautiful+0.213
 wonderful+0.211
 clever+0.210
 lovely+0.209
Neg logit contributions
ors-0.190
driver-0.188
ted-0.186
 Stop-0.185
rings-0.183
Token discovered
Token ID5071
Feature activation+1.55e+00

Pos logit contributions
ted+0.385
ors+0.384
rings+0.379
driver+0.375
 Stop+0.367
Neg logit contributions
 charming-0.440
 beautiful-0.437
 wonderful-0.434
 clever-0.432
 lovely-0.431
Token something
Token ID1223
Feature activation+3.21e-01

Pos logit contributions
 to+0.048
 him+0.046
ted+0.046
 they+0.046
 it+0.046
Neg logit contributions
 gentle-0.038
 charming-0.038
 clever-0.037
 cute-0.037
 lively-0.036
Token very
Token ID845
Feature activation-6.56e+00

Pos logit contributions
 to+0.014
 they+0.013
 it+0.013
ted+0.013
ors+0.013
Neg logit contributions
 charming-0.003
 clever-0.003
 gentle-0.003
 humble-0.003
 lovely-0.003
Token,
Token ID11
Feature activation-7.22e-01

Pos logit contributions
ady+0.061
 entire+0.057
 youngest+0.057
aky+0.056
 whole+0.054
Neg logit contributions
 to-0.279
 it-0.268
 they-0.268
ho-0.258
 he-0.257
Token very
Token ID845
Feature activation-6.53e+00

Pos logit contributions
 beautiful+0.023
 charming+0.022
 cute+0.022
 lovely+0.021
 wonderful+0.021
Neg logit contributions
ors-0.022
driver-0.021
oes-0.020
or-0.020
rings-0.020
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.055
aky+0.054
 entire+0.054
 whole+0.048
 company+0.047
Neg logit contributions
ho-0.278
 they-0.276
 to-0.273
 it-0.264
 he-0.258
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 beautiful+0.750
 charming+0.738
 cute+0.728
 small+0.726
 lovely+0.726
Neg logit contributions
ors-0.891
but-0.870
driver-0.868
rings-0.864
oes-0.856
Token When
Token ID1649
Feature activation+7.15e+00

Pos logit contributions
 small+0.335
 charming+0.334
 beautiful+0.334
 lovely+0.330
 cute+0.329
Neg logit contributions
ors-0.212
driver-0.204
rings-0.199
ted-0.194
fully-0.192
Token they
Token ID484
Feature activation+1.48e+01

Pos logit contributions
ors+0.081
driver+0.079
rings+0.073
licks+0.068
ted+0.066
Neg logit contributions
 beautiful-0.346
 wonderful-0.344
 charming-0.342
 lovely-0.342
 clever-0.342
Token went
Token ID1816
Feature activation+7.38e-01

Pos logit contributions
ors+0.413
driver+0.388
rings+0.364
fully+0.360
oes+0.357
Neg logit contributions
 clever-0.475
 small-0.472
 wonderful-0.469
 beautiful-0.469
 charming-0.466
Token over
Token ID625
Feature activation+4.65e+00

Pos logit contributions
ors+0.023
bars+0.019
rings+0.019
ately+0.019
driver+0.018
Neg logit contributions
 big-0.025
 nice-0.025
 small-0.025
 beautiful-0.024
 friendly-0.024
Token investigate
Token ID9161
Feature activation+2.84e+00

Pos logit contributions
ors+0.372
driver+0.364
ted+0.361
rings+0.357
but+0.343
Neg logit contributions
 beautiful-0.471
 charming-0.468
 clever-0.466
 wonderful-0.464
 lovely-0.462
Token and
Token ID290
Feature activation+1.18e+01

Pos logit contributions
ors+0.056
driver+0.055
rings+0.054
ted+0.054
 Stop+0.052
Neg logit contributions
 charming-0.104
 beautiful-0.104
 wonderful-0.103
 clever-0.102
 lovely-0.102
Token he
Token ID339
Feature activation+1.36e+01

Pos logit contributions
ors+0.340
driver+0.326
rings+0.299
ted+0.290
ges+0.285
Neg logit contributions
 clever-0.371
 beautiful-0.369
 small-0.367
 wonderful-0.365
 magical-0.364
Token saw
Token ID2497
Feature activation-6.88e+00

Pos logit contributions
ors+0.313
ted+0.312
rings+0.306
driver+0.304
 to+0.297
Neg logit contributions
 charming-0.447
 beautiful-0.444
 wonderful-0.441
 lovely-0.440
 clever-0.437
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 charming+0.184
 lovely+0.179
 clever+0.178
 cute+0.178
 wonderful+0.178
Neg logit contributions
ted-0.202
 him-0.201
 to-0.199
 it-0.191
ting-0.186
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.204
 entire+0.182
aky+0.177
aver+0.167
 charming+0.167
Neg logit contributions
 to-0.530
 it-0.498
ho-0.489
 we-0.470
 they-0.465
Token rabbit
Token ID22746
Feature activation-6.68e+00

Pos logit contributions
 charming+0.731
 wonderful+0.729
 clever+0.723
 beautiful+0.717
 lovely+0.717
Neg logit contributions
driver-0.854
ors-0.850
 to-0.843
ted-0.838
rings-0.834
Token running
Token ID2491
Feature activation+6.99e-01

Pos logit contributions
 beautiful+0.184
 charming+0.182
 cute+0.181
 clever+0.180
 lovely+0.179
Neg logit contributions
ors-0.193
driver-0.192
rings-0.187
oes-0.186
ted-0.185
Token away
Token ID1497
Feature activation+2.31e+00

Pos logit contributions
rings+0.010
ors+0.009
driver+0.009
oes+0.009
ted+0.008
Neg logit contributions
 beautiful-0.030
 charming-0.030
 clever-0.030
 small-0.030
 cute-0.029
Token.
Token ID13
Feature activation-9.50e+00

Pos logit contributions
ors+0.064
rings+0.062
driver+0.062
ted+0.061
 Stop+0.060
Neg logit contributions
 beautiful-0.069
 charming-0.068
 clever-0.067
 wonderful-0.067
 small-0.067
Token The
Token ID383
Feature activation-9.01e+00

Pos logit contributions
 small+0.292
 charming+0.279
 clever+0.278
 cute+0.276
 beautiful+0.275
Neg logit contributions
rings-0.259
ors-0.253
driver-0.247
oes-0.247
fully-0.244
Token the
Token ID262
Feature activation-4.19e+00

Pos logit contributions
 to+0.134
ted+0.130
 him+0.129
ors+0.128
driver+0.127
Neg logit contributions
 charming-0.152
 clever-0.152
 wonderful-0.150
 cute-0.149
 lovely-0.147
Token ground
Token ID2323
Feature activation+1.11e+00

Pos logit contributions
 charming+0.103
 wonderful+0.102
 clever+0.102
 beautiful+0.102
 cute+0.101
Neg logit contributions
ors-0.129
ted-0.127
rings-0.126
driver-0.126
 to-0.124
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
ors+0.037
driver+0.036
 Stop+0.036
ted+0.035
rings+0.035
Neg logit contributions
 beautiful-0.026
 small-0.026
 charming-0.026
 nice-0.025
 lovely-0.025
Token It
Token ID632
Feature activation+1.40e+01

Pos logit contributions
 small+0.346
 big+0.331
 beautiful+0.331
 pretty+0.329
 soft+0.329
Neg logit contributions
ors-0.228
driver-0.215
fully-0.211
oes-0.202
iked-0.199
Token was
Token ID373
Feature activation+7.20e-01

Pos logit contributions
ted+0.243
ors+0.237
driver+0.228
rings+0.227
 to+0.226
Neg logit contributions
 charming-0.530
 wonderful-0.527
 beautiful-0.525
 lovely-0.523
 clever-0.522
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.026
 him+0.024
 it+0.024
ted+0.024
rings+0.023
Neg logit contributions
 wonderful-0.013
 charming-0.013
 clever-0.013
 lovely-0.013
 beautiful-0.012
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 beautiful+1.112
 small+1.099
 charming+1.082
 cute+1.078
 lovely+1.067
Neg logit contributions
oes-0.579
ors-0.572
 Stop-0.569
rings-0.562
but-0.558
Token black
Token ID2042
Feature activation-2.54e+01

Pos logit contributions
 to+0.423
ted+0.422
rings+0.420
driver+0.417
 it+0.415
Neg logit contributions
 wonderful-0.207
 charming-0.206
 clever-0.201
 lovely-0.201
 beautiful-0.198
Token with
Token ID351
Feature activation+4.42e+00

Pos logit contributions
 beautiful+0.892
 small+0.885
 nice+0.879
 charming+0.877
 lovely+0.876
Neg logit contributions
ors-0.544
rings-0.531
oes-0.523
driver-0.505
or-0.504
Token shiny
Token ID22441
Feature activation-1.80e+01

Pos logit contributions
 to+0.178
ted+0.177
als+0.173
 him+0.173
rings+0.173
Neg logit contributions
 charming-0.061
 clever-0.060
 wonderful-0.058
 lively-0.056
 humble-0.056
Token wings
Token ID12098
Feature activation+5.77e-01

Pos logit contributions
 beautiful+0.440
 charming+0.437
 lovely+0.429
 clever+0.428
 cute+0.427
Neg logit contributions
ors-0.575
driver-0.566
rings-0.558
ted-0.557
oes-0.554
Token plane
Token ID6614
Feature activation+1.32e+00

Pos logit contributions
 clever+0.563
 charming+0.557
 wonderful+0.549
 beautiful+0.544
 cute+0.543
Neg logit contributions
ors-0.795
ted-0.781
 to-0.775
driver-0.770
rings-0.767
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
ors+0.035
ted+0.034
rings+0.034
driver+0.034
oes+0.033
Neg logit contributions
 charming-0.039
 beautiful-0.039
 wonderful-0.039
 clever-0.039
 lovely-0.038
Token He
Token ID679
Feature activation+9.25e+00

Pos logit contributions
 charming+0.351
 beautiful+0.351
 small+0.347
 lovely+0.345
 clever+0.343
Neg logit contributions
ors-0.187
driver-0.179
ted-0.177
rings-0.177
oes-0.167
Token saw
Token ID2497
Feature activation-6.88e+00

Pos logit contributions
ors+0.184
ted+0.183
rings+0.177
driver+0.176
 to+0.176
Neg logit contributions
 charming-0.326
 wonderful-0.323
 beautiful-0.322
 lovely-0.322
 clever-0.322
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 clever+0.163
 charming+0.161
 wonderful+0.158
 cute+0.157
 lovely+0.156
Neg logit contributions
 to-0.218
ted-0.215
 him-0.213
 it-0.207
rings-0.205
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.190
aky+0.163
aver+0.155
 clever+0.152
 intelligent+0.150
Neg logit contributions
 to-0.527
 it-0.509
 we-0.507
 they-0.497
ho-0.479
Token bird
Token ID6512
Feature activation-8.26e-01

Pos logit contributions
 clever+0.748
 charming+0.746
 wonderful+0.744
 lovely+0.731
 cute+0.727
Neg logit contributions
driver-0.829
 to-0.823
ors-0.817
rings-0.810
ted-0.798
Token up
Token ID510
Feature activation+2.56e+00

Pos logit contributions
 beautiful+0.028
 charming+0.027
 small+0.027
 lovely+0.027
 cute+0.027
Neg logit contributions
driver-0.021
 Stop-0.020
ors-0.020
rings-0.019
ted-0.019
Token in
Token ID287
Feature activation+7.20e+00

Pos logit contributions
ors+0.060
driver+0.059
rings+0.058
ted+0.058
 Stop+0.057
Neg logit contributions
 beautiful-0.086
 charming-0.085
 lovely-0.084
 clever-0.084
 wonderful-0.083
Token the
Token ID262
Feature activation-4.17e+00

Pos logit contributions
 him+0.159
 to+0.158
 it+0.156
ted+0.145
 they+0.132
Neg logit contributions
 clever-0.247
 charming-0.244
 wonderful-0.239
 cute-0.238
 lovely-0.237
Token clouds
Token ID15114
Feature activation+1.40e+00

Pos logit contributions
 clever+0.148
 compassionate+0.140
 wonderful+0.140
 charming+0.139
 cute+0.139
Neg logit contributions
als-0.088
 to-0.086
ted-0.081
oes-0.079
rings-0.075
Token.
Token ID13
Feature activation-9.52e+00

Pos logit contributions
 small+0.219
 big+0.214
 nice+0.214
 beautiful+0.212
 charming+0.206
Neg logit contributions
ors-0.230
 Stop-0.229
ted-0.222
 Get-0.217
rings-0.216
Token But
Token ID887
Feature activation+8.32e+00

Pos logit contributions
 small+0.377
 charming+0.364
 beautiful+0.363
 clever+0.363
 cute+0.363
Neg logit contributions
ors-0.184
driver-0.158
fully-0.157
ting-0.153
ted-0.152
Token she
Token ID673
Feature activation+1.40e+01

Pos logit contributions
ors+0.244
driver+0.218
oes+0.213
amb+0.200
ted+0.200
Neg logit contributions
 small-0.267
 clever-0.260
 beautiful-0.255
 nice-0.253
 big-0.250
Token was
Token ID373
Feature activation+7.04e-01

Pos logit contributions
ors+0.360
driver+0.347
rings+0.345
ted+0.342
 Stop+0.336
Neg logit contributions
 clever-0.439
 charming-0.438
 beautiful-0.437
 cute-0.431
 wonderful-0.431
Token too
Token ID1165
Feature activation+3.59e+00

Pos logit contributions
 to+0.015
 it+0.013
 over+0.013
 they+0.011
 she+0.011
Neg logit contributions
 wonderful-0.024
 wond-0.024
 lovely-0.023
 charming-0.023
 beautiful-0.023
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.148
 it+0.128
t+0.127
 they+0.126
s+0.125
Neg logit contributions
 wonderful-0.062
 wond-0.058
 magical-0.057
 greatest-0.055
 great-0.053
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 nice+0.643
 small+0.632
 kind+0.618
 big+0.612
 fun+0.608
Neg logit contributions
 Stop-0.970
rings-0.961
ors-0.952
but-0.942
als-0.939
Token She
Token ID1375
Feature activation+1.23e+01

Pos logit contributions
 charming+0.405
 small+0.404
 beautiful+0.404
 clever+0.402
 lovely+0.402
Neg logit contributions
ors-0.135
driver-0.120
ted-0.119
fully-0.115
rings-0.114
Token could
Token ID714
Feature activation+3.05e+00

Pos logit contributions
ors+0.266
ted+0.263
driver+0.261
rings+0.259
 Stop+0.249
Neg logit contributions
 charming-0.423
 beautiful-0.420
 wonderful-0.418
 lovely-0.416
 clever-0.415
Token not
Token ID407
Feature activation+7.99e+00

Pos logit contributions
ors+0.084
ted+0.081
driver+0.079
rings+0.079
 Stop+0.078
Neg logit contributions
 beautiful-0.092
 clever-0.092
 charming-0.092
 cute-0.090
 lovely-0.090
Token hold
Token ID1745
Feature activation-1.91e-01

Pos logit contributions
ors+0.169
driver+0.166
 to+0.164
rings+0.163
oes+0.160
Neg logit contributions
 charming-0.278
 wonderful-0.273
 beautiful-0.273
 lovely-0.271
 cute-0.271
Token stairs
Token ID16046
Feature activation+4.30e+00

Pos logit contributions
 charming+0.122
 beautiful+0.122
 clever+0.120
 small+0.120
 lovely+0.119
Neg logit contributions
ors-0.115
ted-0.114
rings-0.112
driver-0.111
 Stop-0.109
Token,
Token ID11
Feature activation-7.40e-01

Pos logit contributions
ors+0.117
 Stop+0.115
rings+0.112
licks+0.108
driver+0.107
Neg logit contributions
 small-0.134
 big-0.131
 beautiful-0.125
 large-0.124
 nice-0.124
Token he
Token ID339
Feature activation+1.36e+01

Pos logit contributions
 small+0.032
 clever+0.032
 beautiful+0.031
 charming+0.031
 big+0.031
Neg logit contributions
ors-0.012
rings-0.010
driver-0.010
oes-0.010
licks-0.009
Token met
Token ID1138
Feature activation-5.35e+00

Pos logit contributions
ted+0.337
ors+0.334
rings+0.332
 to+0.329
driver+0.328
Neg logit contributions
 charming-0.419
 wonderful-0.416
 beautiful-0.414
 lovely-0.413
 cute-0.410
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 charming+0.150
 clever+0.147
 beautiful+0.147
 magical+0.147
 cute+0.147
Neg logit contributions
 to-0.162
 him-0.154
 it-0.138
ted-0.136
fully-0.131
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.352
 entire+0.304
aky+0.303
wy+0.288
obb+0.282
Neg logit contributions
 to-0.439
 they-0.421
 it-0.406
 we-0.405
 he-0.385
Token cat
Token ID3797
Feature activation-5.03e+00

Pos logit contributions
 wonderful+0.759
 charming+0.746
 clever+0.743
 lovely+0.738
 beautiful+0.737
Neg logit contributions
 to-0.827
driver-0.823
ors-0.800
 Stop-0.793
rings-0.792
Token.
Token ID13
Feature activation-9.47e+00

Pos logit contributions
 charming+0.110
 beautiful+0.110
 kind+0.109
 cute+0.109
 small+0.108
Neg logit contributions
ors-0.169
driver-0.169
rings-0.166
 Stop-0.163
oes-0.163
Token The
Token ID383
Feature activation-8.99e+00

Pos logit contributions
 charming+0.304
 beautiful+0.302
 small+0.301
 clever+0.299
 cute+0.299
Neg logit contributions
ors-0.235
rings-0.225
driver-0.221
fully-0.219
ted-0.218
Token cat
Token ID3797
Feature activation-5.04e+00

Pos logit contributions
 wonderful+0.348
 wond+0.346
ady+0.333
 great+0.326
aky+0.325
Neg logit contributions
 to-0.169
 She-0.123
't-0.121
 she-0.120
oes-0.118
Token said
Token ID531
Feature activation-1.23e+01

Pos logit contributions
 charming+0.173
 beautiful+0.173
 wonderful+0.171
 clever+0.171
 lovely+0.171
Neg logit contributions
ors-0.108
ted-0.106
driver-0.105
rings-0.104
 Stop-0.100
Token
Token ID198
Feature activation-7.27e+00

Pos logit contributions
 small+0.112
 lovely+0.111
 wonderful+0.111
 charming+0.111
 beautiful+0.110
Neg logit contributions
fully-0.068
rings-0.067
ors-0.066
ted-0.065
driver-0.064
Token
Token ID198
Feature activation-7.27e+00

Pos logit contributions
 charming+0.254
 beautiful+0.251
 clever+0.250
 cute+0.249
 lovely+0.247
Neg logit contributions
ors-0.150
 to-0.147
driver-0.144
 him-0.144
ted-0.143
TokenOne
Token ID3198
Feature activation-1.40e+01

Pos logit contributions
 charming+0.176
 beautiful+0.173
 clever+0.172
 cute+0.171
 lovely+0.170
Neg logit contributions
ors-0.230
driver-0.226
ted-0.223
 Stop-0.221
but-0.220
Token day
Token ID1110
Feature activation-5.34e+00

Pos logit contributions
 wonderful+0.407
 charming+0.407
 cute+0.405
 clever+0.403
 beautiful+0.401
Neg logit contributions
 to-0.362
 they-0.321
 he-0.312
oes-0.311
 it-0.303
Token the
Token ID262
Feature activation-4.15e+00

Pos logit contributions
 beautiful+0.185
 clever+0.185
 charming+0.185
 small+0.185
 wonderful+0.184
Neg logit contributions
ors-0.118
rings-0.115
oes-0.114
driver-0.113
but-0.107
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
ady+0.084
eh+0.082
aky+0.080
 wonderful+0.078
 charming+0.076
Neg logit contributions
 to-0.136
 we-0.126
 he-0.126
 him-0.121
 they-0.121
Token boy
Token ID2933
Feature activation+1.31e+00

Pos logit contributions
 wonderful+0.882
 lovely+0.851
 clever+0.847
 charming+0.847
 beautiful+0.843
Neg logit contributions
 to-0.772
driver-0.705
oes-0.702
ted-0.697
ors-0.679
Token made
Token ID925
Feature activation+1.58e+00

Pos logit contributions
rings+0.036
ors+0.036
driver+0.036
oes+0.036
fully+0.035
Neg logit contributions
 clever-0.038
 beautiful-0.038
 charming-0.037
 small-0.037
 kind-0.037
Token a
Token ID257
Feature activation-1.37e+01

Pos logit contributions
 him+0.045
 to+0.045
 it+0.044
ors+0.041
ted+0.040
Neg logit contributions
 charming-0.044
 cute-0.043
 wonderful-0.043
 lovely-0.042
 beautiful-0.042
Token little
Token ID1310
Feature activation-2.35e+01

Pos logit contributions
aver+0.225
aky+0.208
ady+0.200
 charming+0.190
 cute+0.185
Neg logit contributions
 to-0.520
 we-0.470
 they-0.468
 it-0.460
ho-0.449
Token money
Token ID1637
Feature activation-1.29e-01

Pos logit contributions
 charming+0.567
 wonderful+0.562
 clever+0.560
 lovely+0.557
 beautiful+0.556
Neg logit contributions
 to-0.719
driver-0.713
ors-0.689
rings-0.686
ted-0.679
Token cloth
Token ID16270
Feature activation+6.40e+00

Pos logit contributions
 beautiful+0.447
 charming+0.444
 lovely+0.436
 cute+0.432
 wonderful+0.430
Neg logit contributions
ors-0.685
driver-0.658
ted-0.655
oes-0.654
 Stop-0.649
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
 Stop+0.152
 Get+0.132
oes+0.130
ted+0.130
ors+0.129
Neg logit contributions
 beautiful-0.223
 small-0.220
 charming-0.218
 cute-0.215
 lovely-0.214
Token They
Token ID1119
Feature activation+1.15e+01

Pos logit contributions
 small+0.316
 charming+0.308
 beautiful+0.306
 big+0.302
 nice+0.299
Neg logit contributions
ors-0.246
or-0.238
driver-0.232
rings-0.221
fully-0.220
Token cut
Token ID2005
Feature activation-6.85e+00

Pos logit contributions
ors+0.278
ted+0.274
driver+0.273
rings+0.272
oes+0.259
Neg logit contributions
 charming-0.364
 beautiful-0.361
 wonderful-0.359
 clever-0.358
 lovely-0.357
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
 beautiful+0.213
 charming+0.211
 small+0.210
 lovely+0.209
 clever+0.209
Neg logit contributions
ors-0.173
rings-0.168
driver-0.167
 Stop-0.167
oes-0.165
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 clever+0.236
 wonderful+0.229
 charming+0.228
 intelligent+0.214
 lovely+0.213
Neg logit contributions
 to-0.506
driver-0.484
rings-0.482
 it-0.476
als-0.470
Token piece
Token ID3704
Feature activation-7.39e+00

Pos logit contributions
 charming+0.660
 beautiful+0.657
 wonderful+0.648
 clever+0.647
 lovely+0.644
Neg logit contributions
ors-0.964
driver-0.947
ted-0.945
rings-0.941
 Stop-0.922
Token to
Token ID284
Feature activation+1.49e+01

Pos logit contributions
 small+0.253
 big+0.251
 beautiful+0.249
 nice+0.247
 soft+0.240
Neg logit contributions
 Stop-0.198
ors-0.189
 Get-0.178
rings-0.177
ted-0.177
Token make
Token ID787
Feature activation+4.71e+00

Pos logit contributions
 him+0.285
 it+0.283
rings+0.281
 to+0.280
ted+0.277
Neg logit contributions
 charming-0.522
 wonderful-0.510
 clever-0.509
 cute-0.507
 lovely-0.502
Token a
Token ID257
Feature activation-1.38e+01

Pos logit contributions
ors+0.108
ted+0.107
 to+0.107
rings+0.106
 him+0.105
Neg logit contributions
 charming-0.150
 clever-0.148
 wonderful-0.147
 lovely-0.146
 cute-0.146
Token new
Token ID649
Feature activation-1.98e+01

Pos logit contributions
aver+0.220
 clever+0.210
 charming+0.201
aky+0.199
 intelligent+0.198
Neg logit contributions
 to-0.506
 it-0.484
rings-0.468
 we-0.462
 they-0.458
Token,
Token ID11
Feature activation-7.44e-01

Pos logit contributions
ted+0.147
 Stop+0.145
ors+0.145
 Get+0.136
ting+0.133
Neg logit contributions
 small-0.199
 beautiful-0.199
 nice-0.195
 big-0.195
 pretty-0.189
Token but
Token ID475
Feature activation+9.50e+00

Pos logit contributions
 small+0.027
 gentle+0.027
 clever+0.027
 beautiful+0.027
 cute+0.026
Neg logit contributions
ors-0.020
als-0.018
ted-0.018
or-0.017
oes-0.017
Token she
Token ID673
Feature activation+1.40e+01

Pos logit contributions
ors+0.203
amb+0.175
oes+0.174
driver+0.168
ted+0.167
Neg logit contributions
 small-0.382
 clever-0.376
 beautiful-0.369
 big-0.363
 gentle-0.362
Token was
Token ID373
Feature activation+7.04e-01

Pos logit contributions
ors+0.218
driver+0.210
ted+0.208
rings+0.208
 Stop+0.200
Neg logit contributions
 charming-0.571
 beautiful-0.570
 clever-0.569
 wonderful-0.564
 lovely-0.563
Token too
Token ID1165
Feature activation+3.59e+00

Pos logit contributions
 to+0.014
 it+0.011
driver+0.011
't+0.011
rings+0.010
Neg logit contributions
 wonderful-0.027
 lovely-0.026
 charming-0.026
 beautiful-0.026
 great-0.025
Token small
Token ID1402
Feature activation-2.90e+01

Pos logit contributions
 to+0.136
 it+0.126
t+0.124
 they+0.122
driver+0.120
Neg logit contributions
 wonderful-0.067
 magical-0.062
 lovely-0.060
 wond-0.059
 great-0.057
Token.
Token ID13
Feature activation-9.48e+00

Pos logit contributions
 nice+0.836
 beautiful+0.816
 small+0.813
 fun+0.813
 kind+0.810
Neg logit contributions
 Stop-0.776
als-0.741
ted-0.739
ors-0.724
rings-0.711
Token She
Token ID1375
Feature activation+1.23e+01

Pos logit contributions
 small+0.304
 beautiful+0.299
 clever+0.299
 charming+0.299
 lovely+0.296
Neg logit contributions
ors-0.241
driver-0.234
rings-0.222
ted-0.217
ting-0.217
Token felt
Token ID2936
Feature activation+6.68e-01

Pos logit contributions
ors+0.250
ted+0.246
driver+0.245
rings+0.239
 Stop+0.229
Neg logit contributions
 charming-0.442
 beautiful-0.439
 wonderful-0.437
 lovely-0.435
 clever-0.432
Token very
Token ID845
Feature activation-6.55e+00

Pos logit contributions
 to+0.025
 him+0.024
ted+0.024
 it+0.024
ting+0.023
Neg logit contributions
 charming-0.012
 wonderful-0.012
 beautiful-0.011
 lovely-0.011
 clever-0.011
Token confused
Token ID10416
Feature activation-1.63e+01

Pos logit contributions
 wonderful+0.123
 lovely+0.112
 charming+0.110
 beautiful+0.109
 wond+0.103
Neg logit contributions
 to-0.232
 they-0.221
 he-0.221
 it-0.220
 she-0.218
Token remembers
Token ID18140
Feature activation+6.86e+00

Pos logit contributions
ors+0.320
driver+0.313
rings+0.306
ted+0.304
oes+0.304
Neg logit contributions
 beautiful-0.410
 charming-0.409
 clever-0.406
 cute-0.405
 wonderful-0.403
Token that
Token ID326
Feature activation+1.05e+01

Pos logit contributions
ors+0.148
driver+0.145
oes+0.140
ted+0.138
rings+0.137
Neg logit contributions
 beautiful-0.246
 charming-0.244
 small-0.242
 wonderful-0.242
 clever-0.241
Token his
Token ID465
Feature activation+1.82e+00

Pos logit contributions
ors+0.224
driver+0.220
rings+0.219
ted+0.218
 Stop+0.210
Neg logit contributions
 charming-0.358
 beautiful-0.356
 clever-0.354
 wonderful-0.353
 lovely-0.352
Token mom
Token ID1995
Feature activation-1.15e+00

Pos logit contributions
 to+0.055
rings+0.051
 they+0.048
 it+0.048
 we+0.048
Neg logit contributions
 charming-0.044
 wonderful-0.044
 clever-0.044
 lovely-0.043
 cute-0.042
Token uses
Token ID3544
Feature activation+3.36e+00

Pos logit contributions
 beautiful+0.037
 charming+0.037
 clever+0.037
 cute+0.036
 wonderful+0.036
Neg logit contributions
ors-0.029
driver-0.029
 Stop-0.028
fully-0.028
ted-0.028
Token it
Token ID340
Feature activation+2.16e+01

Pos logit contributions
ors+0.081
driver+0.078
 Stop+0.078
rings+0.078
ted+0.077
Neg logit contributions
 beautiful-0.109
 charming-0.108
 wonderful-0.107
 lovely-0.107
 small-0.106
Token to
Token ID284
Feature activation+1.74e+01

Pos logit contributions
ors+0.319
 Stop+0.302
ted+0.301
driver+0.297
rings+0.295
Neg logit contributions
 beautiful-0.900
 charming-0.896
 wonderful-0.891
 clever-0.891
 lovely-0.888
Token clean
Token ID3424
Feature activation-6.16e+00

Pos logit contributions
 to+0.398
rings+0.378
 him+0.375
oes+0.368
 they+0.368
Neg logit contributions
 charming-0.556
 clever-0.545
 cute-0.540
 wonderful-0.539
 beautiful-0.534
Token his
Token ID465
Feature activation+1.13e+00

Pos logit contributions
 beautiful+0.223
 charming+0.219
 lovely+0.218
 small+0.218
 nice+0.216
Neg logit contributions
driver-0.126
rings-0.126
oes-0.125
ted-0.123
 Stop-0.122
Token cuts
Token ID6630
Feature activation+2.94e+00

Pos logit contributions
 to+0.032
ors+0.030
rings+0.030
 it+0.029
 we+0.028
Neg logit contributions
 wonderful-0.030
 clever-0.029
 charming-0.029
 lovely-0.029
 cute-0.028
Token.
Token ID13
Feature activation-7.32e+00

Pos logit contributions
ors+0.066
rings+0.066
ted+0.066
 Stop+0.066
driver+0.064
Neg logit contributions
 beautiful-0.097
 small-0.096
 nice-0.096
 charming-0.095
 lovely-0.095
Token was
Token ID373
Feature activation+4.07e+00

Pos logit contributions
ors+0.423
oes+0.415
rings+0.410
ted+0.409
 Stop+0.405
Neg logit contributions
 beautiful-0.478
 clever-0.477
 wonderful-0.475
 charming-0.474
 lovely-0.467
Token so
Token ID523
Feature activation+3.98e+00

Pos logit contributions
 to+0.129
 it+0.121
but+0.119
 him+0.117
driver+0.116
Neg logit contributions
 charming-0.090
 cute-0.090
 wonderful-0.089
 lovely-0.088
 beautiful-0.088
Token proud
Token ID6613
Feature activation-1.08e+01

Pos logit contributions
 to+0.137
 she+0.131
 it+0.131
but+0.131
 they+0.129
Neg logit contributions
 magical-0.062
 wonderful-0.059
eh-0.058
aver-0.058
 charming-0.058
Token of
Token ID286
Feature activation+7.23e+00

Pos logit contributions
 beautiful+0.375
 charming+0.357
 lovely+0.352
 kind+0.352
 wonderful+0.351
Neg logit contributions
driver-0.284
licks-0.270
ors-0.265
rings-0.264
oes-0.258
Token herself
Token ID5223
Feature activation+6.53e+00

Pos logit contributions
ors+0.149
driver+0.146
ted+0.144
rings+0.143
oes+0.139
Neg logit contributions
 beautiful-0.261
 charming-0.261
 wonderful-0.260
 clever-0.259
 lovely-0.258
Token and
Token ID290
Feature activation+1.45e+01

Pos logit contributions
oes+0.132
ted+0.132
 Stop+0.129
ors+0.129
licks+0.126
Neg logit contributions
 beautiful-0.248
 charming-0.244
 clever-0.244
 wonderful-0.243
 lovely-0.243
Token the
Token ID262
Feature activation-1.58e+00

Pos logit contributions
ors+0.364
driver+0.354
oes+0.344
ted+0.343
rings+0.340
Neg logit contributions
 beautiful-0.480
 charming-0.478
 clever-0.474
 wonderful-0.473
 lovely-0.469
Token soldier
Token ID10686
Feature activation+1.85e+00

Pos logit contributions
 charming+0.031
 cute+0.031
ady+0.030
 cozy+0.030
 clever+0.030
Neg logit contributions
 to-0.048
driver-0.047
rings-0.046
 it-0.046
 we-0.045
Token was
Token ID373
Feature activation+3.09e+00

Pos logit contributions
ors+0.042
driver+0.041
ted+0.040
rings+0.040
oes+0.040
Neg logit contributions
 beautiful-0.062
 charming-0.062
 wonderful-0.062
 clever-0.062
 lovely-0.061
Token so
Token ID523
Feature activation+3.05e+00

Pos logit contributions
 to+0.098
 it+0.092
but+0.090
 him+0.089
 they+0.088
Neg logit contributions
 cute-0.070
 charming-0.069
 small-0.068
 clever-0.068
 lovely-0.067
Token happy
Token ID3772
Feature activation-1.78e+01

Pos logit contributions
 to+0.114
 they+0.110
but+0.109
 it+0.108
 we+0.107
Neg logit contributions
 wonderful-0.041
eh-0.041
 charming-0.040
 magical-0.040
 cute-0.039
Token country
Token ID1499
Feature activation-1.98e+00

Pos logit contributions
ors+0.136
 to+0.134
ted+0.129
 him+0.125
ting+0.123
Neg logit contributions
 clever-0.122
 charming-0.122
 wonderful-0.121
 beautiful-0.120
 lovely-0.119
Token.
Token ID13
Feature activation-8.74e+00

Pos logit contributions
 charming+0.070
 beautiful+0.070
 wonderful+0.070
 clever+0.070
 lovely+0.069
Neg logit contributions
ors-0.040
driver-0.039
ted-0.039
rings-0.039
 Stop-0.038
Token
Token ID198
Feature activation-6.49e+00

Pos logit contributions
 charming+0.269
 beautiful+0.268
 lovely+0.265
 small+0.265
 cute+0.263
Neg logit contributions
ors-0.220
driver-0.218
rings-0.216
ted-0.212
oes-0.211
Token
Token ID198
Feature activation-6.46e+00

Pos logit contributions
 charming+0.206
 beautiful+0.206
 wonderful+0.203
 clever+0.203
 lovely+0.203
Neg logit contributions
ors-0.158
driver-0.154
ted-0.154
rings-0.153
 Stop-0.149
TokenAs
Token ID1722
Feature activation+2.82e+00

Pos logit contributions
 charming+0.172
 clever+0.171
 beautiful+0.170
 wonderful+0.170
 lovely+0.168
Neg logit contributions
ors-0.185
driver-0.180
ted-0.179
 Stop-0.178
but-0.176
Token they
Token ID484
Feature activation+1.56e+01

Pos logit contributions
ors+0.010
ted+0.010
rings+0.009
 to+0.009
driver+0.009
Neg logit contributions
 charming-0.146
 beautiful-0.145
 clever-0.145
 wonderful-0.144
 lovely-0.144
Token continued
Token ID3767
Feature activation+6.86e-01

Pos logit contributions
ted+0.293
ors+0.259
driver+0.251
als+0.248
ting+0.246
Neg logit contributions
 charming-0.608
 beautiful-0.604
 clever-0.596
 lovely-0.596
 cute-0.593
Token walking
Token ID6155
Feature activation-1.92e+00

Pos logit contributions
rings+0.016
 Stop+0.015
ors+0.015
oes+0.015
als+0.015
Neg logit contributions
 small-0.025
 big-0.024
 charming-0.024
 lively-0.024
 nice-0.024
Token,
Token ID11
Feature activation+1.73e-01

Pos logit contributions
 small+0.064
 nice+0.063
 big+0.062
 lively+0.062
 beautiful+0.062
Neg logit contributions
rings-0.049
ors-0.048
driver-0.046
 Stop-0.046
licks-0.045
Token they
Token ID484
Feature activation+1.57e+01

Pos logit contributions
ors+0.003
ted+0.003
driver+0.003
rings+0.003
 Stop+0.002
Neg logit contributions
 charming-0.007
 beautiful-0.007
 clever-0.007
 wonderful-0.007
 lovely-0.007
Token came
Token ID1625
Feature activation+4.50e+00

Pos logit contributions
ors+0.466
driver+0.450
rings+0.436
 Stop+0.436
ted+0.432
Neg logit contributions
 beautiful-0.435
 clever-0.433
 charming-0.432
 small-0.429
 wonderful-0.427
Token Lily
Token ID20037
Feature activation-4.34e+00

Pos logit contributions
ors+0.033
driver+0.031
oes+0.029
ted+0.029
rings+0.028
Neg logit contributions
 beautiful-0.077
 charming-0.076
 lovely-0.076
 wonderful-0.076
 clever-0.075
Token's
Token ID338
Feature activation+5.34e+00

Pos logit contributions
 beautiful+0.136
 charming+0.134
 clever+0.134
 lovely+0.131
 small+0.131
Neg logit contributions
oes-0.111
ors-0.110
rings-0.109
driver-0.109
 Stop-0.109
Token dad
Token ID9955
Feature activation+1.27e+00

Pos logit contributions
 to+0.155
rings+0.137
als+0.129
 we+0.128
driver+0.128
Neg logit contributions
 charming-0.133
 cute-0.133
 wonderful-0.131
 clever-0.130
aky-0.130
Token asked
Token ID1965
Feature activation-4.86e+00

Pos logit contributions
ors+0.034
oes+0.034
rings+0.034
driver+0.034
 Stop+0.033
Neg logit contributions
 clever-0.038
 beautiful-0.037
 charming-0.037
 wonderful-0.037
 lovely-0.036
Token her
Token ID607
Feature activation+3.98e+00

Pos logit contributions
 nice+0.158
 small+0.154
 charming+0.154
 lovely+0.154
 wonderful+0.154
Neg logit contributions
ors-0.121
rings-0.120
driver-0.119
oes-0.118
als-0.116
Token to
Token ID284
Feature activation+1.68e+01

Pos logit contributions
ors+0.052
ted+0.051
 Stop+0.042
ting+0.041
oes+0.041
Neg logit contributions
 beautiful-0.186
 clever-0.185
 small-0.185
 wonderful-0.184
 charming-0.184
Token count
Token ID954
Feature activation+1.50e+00

Pos logit contributions
ors+0.426
ted+0.414
driver+0.410
rings+0.405
 Stop+0.393
Neg logit contributions
 beautiful-0.531
 charming-0.529
 clever-0.524
 wonderful-0.523
 lovely-0.522
Token all
Token ID477
Feature activation+9.17e-01

Pos logit contributions
ors+0.037
ted+0.036
 Stop+0.036
driver+0.035
ting+0.033
Neg logit contributions
 beautiful-0.049
 small-0.049
 charming-0.048
 lovely-0.048
 nice-0.048
Token the
Token ID262
Feature activation-2.44e+00

Pos logit contributions
driver+0.025
ors+0.025
rings+0.024
ted+0.024
 Stop+0.024
Neg logit contributions
 beautiful-0.028
 charming-0.028
 wonderful-0.027
 small-0.027
 lovely-0.027
Token books
Token ID3835
Feature activation+5.86e+00

Pos logit contributions
 charming+0.053
 compassionate+0.052
 humble+0.052
 clever+0.052
 cozy+0.051
Neg logit contributions
 to-0.077
rings-0.074
ors-0.073
oes-0.072
driver-0.071
Token on
Token ID319
Feature activation+6.77e+00

Pos logit contributions
 Stop+0.144
ors+0.144
ted+0.139
 Get+0.130
ting+0.130
Neg logit contributions
 beautiful-0.192
 nice-0.190
 small-0.189
 big-0.188
 charming-0.185
TokenL
Token ID43
Feature activation-1.01e+01

Pos logit contributions
 charming+0.012
 beautiful+0.012
 clever+0.012
 wonderful+0.012
 lovely+0.012
Neg logit contributions
 Stop-0.010
ors-0.010
driver-0.010
but-0.010
ted-0.010
Tokenily
Token ID813
Feature activation-1.90e+00

Pos logit contributions
 charming+0.295
 beautiful+0.293
 clever+0.290
 wonderful+0.290
 cute+0.289
Neg logit contributions
ors-0.276
driver-0.271
rings-0.267
ted-0.267
 Stop-0.264
Token's
Token ID338
Feature activation+8.40e+00

Pos logit contributions
 charming+0.067
 beautiful+0.067
 clever+0.066
 wonderful+0.066
 cute+0.066
Neg logit contributions
ors-0.040
driver-0.039
rings-0.039
oes-0.038
ted-0.038
Token mom
Token ID1995
Feature activation+1.00e+00

Pos logit contributions
 to+0.236
rings+0.205
ho+0.203
driver+0.202
 him+0.197
Neg logit contributions
 wonderful-0.217
 charming-0.216
 lovely-0.210
 lively-0.207
 cute-0.207
Token said
Token ID531
Feature activation-7.52e+00

Pos logit contributions
ors+0.026
driver+0.025
rings+0.025
oes+0.024
but+0.024
Neg logit contributions
 clever-0.031
 beautiful-0.031
 charming-0.031
 wonderful-0.030
 lovely-0.030
Token they
Token ID484
Feature activation+1.93e+01

Pos logit contributions
 nice+0.259
 kind+0.257
 small+0.257
 clever+0.256
 beautiful+0.253
Neg logit contributions
oes-0.190
ors-0.190
licks-0.171
ted-0.170
rings-0.170
Token could
Token ID714
Feature activation+7.31e+00

Pos logit contributions
ors+0.369
oes+0.343
but+0.338
driver+0.335
ted+0.330
Neg logit contributions
 wonderful-0.746
 beautiful-0.745
 clever-0.744
 lovely-0.738
 charming-0.738
Token come
Token ID1282
Feature activation+1.21e+01

Pos logit contributions
ted+0.198
ors+0.196
driver+0.188
rings+0.184
ting+0.184
Neg logit contributions
 beautiful-0.224
 clever-0.222
 charming-0.221
 lovely-0.220
 wonderful-0.220
Token back
Token ID736
Feature activation+5.52e+00

Pos logit contributions
ors+0.228
ted+0.226
driver+0.213
 Stop+0.209
licks+0.200
Neg logit contributions
 beautiful-0.485
 charming-0.478
 lovely-0.475
 small-0.475
 big-0.473
Token the
Token ID262
Feature activation-4.67e-01

Pos logit contributions
ors+0.102
driver+0.099
ted+0.098
rings+0.097
 Stop+0.094
Neg logit contributions
 beautiful-0.210
 charming-0.209
 wonderful-0.207
 clever-0.207
 lovely-0.207
Token next
Token ID1306
Feature activation-8.70e+00

Pos logit contributions
 charming+0.011
 clever+0.011
 cute+0.011
 wonderful+0.011
 lovely+0.010
Neg logit contributions
 to-0.014
 we-0.013
ho-0.013
oes-0.013
 it-0.012
Token do
Token ID466
Feature activation+2.45e+00

Pos logit contributions
ors+0.449
driver+0.441
rings+0.440
ted+0.440
als+0.421
Neg logit contributions
 beautiful-0.446
 charming-0.446
 wonderful-0.442
 lovely-0.442
 clever-0.441
Token and
Token ID290
Feature activation+1.27e+01

Pos logit contributions
rings+0.078
oes+0.075
ted+0.075
als+0.074
licks+0.073
Neg logit contributions
 nice-0.068
 wonderful-0.067
 beautiful-0.066
 clever-0.065
 great-0.065
Token felt
Token ID2936
Feature activation+1.54e+00

Pos logit contributions
ors+0.302
driver+0.290
rings+0.289
oes+0.286
ted+0.281
Neg logit contributions
 clever-0.426
 beautiful-0.424
 wonderful-0.419
 charming-0.418
 kind-0.417
Token very
Token ID845
Feature activation-5.64e+00

Pos logit contributions
 to+0.058
 him+0.057
 it+0.055
ted+0.055
 they+0.054
Neg logit contributions
 charming-0.026
 beautiful-0.025
 lovely-0.025
 wonderful-0.025
 clever-0.025
Token worried
Token ID7960
Feature activation-1.18e+01

Pos logit contributions
 wonderful+0.094
 beautiful+0.093
 charming+0.093
 lovely+0.091
 cute+0.086
Neg logit contributions
 to-0.207
 they-0.200
 him-0.199
 he-0.197
 it-0.196
Token about
Token ID546
Feature activation+9.33e+00

Pos logit contributions
 nice+0.275
 beautiful+0.273
 wonderful+0.268
 lovely+0.264
 charming+0.262
Neg logit contributions
rings-0.400
oes-0.395
als-0.391
ors-0.390
ted-0.388
Token him
Token ID683
Feature activation+1.30e+01

Pos logit contributions
ted+0.180
als+0.177
ors+0.172
rings+0.172
driver+0.171
Neg logit contributions
 beautiful-0.365
 wonderful-0.365
 clever-0.359
 lovely-0.358
 charming-0.358
Token.
Token ID13
Feature activation-8.50e+00

Pos logit contributions
ors+0.342
driver+0.334
ted+0.333
rings+0.331
 Stop+0.323
Neg logit contributions
 charming-0.388
 beautiful-0.386
 wonderful-0.381
 clever-0.381
 lovely-0.380
Token
Token ID220
Feature activation-5.59e+00

Pos logit contributions
 beautiful+0.323
 charming+0.322
 clever+0.321
 small+0.319
 wonderful+0.318
Neg logit contributions
ors-0.152
ted-0.151
rings-0.149
driver-0.149
fully-0.143
Token
Token ID198
Feature activation-6.21e+00

Pos logit contributions
 charming+0.172
 beautiful+0.172
 clever+0.170
 wonderful+0.170
 lovely+0.169
Neg logit contributions
ors-0.140
ted-0.138
driver-0.137
rings-0.136
 Stop-0.132
Token
Token ID198
Feature activation-6.17e+00

Pos logit contributions
 charming+0.200
 beautiful+0.199
 wonderful+0.197
 clever+0.197
 lovely+0.196
Neg logit contributions
ors-0.148
driver-0.144
ted-0.143
rings-0.142
 Stop-0.139
Token me
Token ID502
Feature activation-3.76e-01

Pos logit contributions
 him+0.353
 to+0.317
 it+0.310
 Stop+0.302
ors+0.296
Neg logit contributions
 clever-0.559
 charming-0.555
 wonderful-0.551
 cute-0.544
 beautiful-0.542
Token!"
Token ID2474
Feature activation-1.31e+00

Pos logit contributions
 charming+0.011
 beautiful+0.011
 wonderful+0.011
 clever+0.011
 lovely+0.011
Neg logit contributions
ors-0.010
driver-0.010
ted-0.010
rings-0.010
 Stop-0.009
Token Anna
Token ID11735
Feature activation-3.17e+00

Pos logit contributions
 small+0.038
 charming+0.037
 beautiful+0.037
 cute+0.037
 nice+0.036
Neg logit contributions
ors-0.039
driver-0.037
rings-0.037
ted-0.037
ting-0.036
Token does
Token ID857
Feature activation+7.19e+00

Pos logit contributions
 clever+0.100
 small+0.099
 cute+0.098
 gentle+0.097
 nice+0.095
Neg logit contributions
ors-0.084
fully-0.083
rings-0.081
but-0.080
driver-0.079
Token not
Token ID407
Feature activation+9.18e+00

Pos logit contributions
ors+0.177
ted+0.173
driver+0.172
rings+0.171
oes+0.166
Neg logit contributions
 charming-0.229
 beautiful-0.228
 clever-0.227
 wonderful-0.225
 lovely-0.225
Token want
Token ID765
Feature activation+3.84e+00

Pos logit contributions
ors+0.216
ted+0.210
driver+0.210
rings+0.208
 Stop+0.202
Neg logit contributions
 charming-0.301
 beautiful-0.300
 clever-0.298
 wonderful-0.298
 lovely-0.296
Token to
Token ID284
Feature activation+1.61e+01

Pos logit contributions
ors+0.037
driver+0.034
rings+0.031
but+0.030
ting+0.029
Neg logit contributions
 nice-0.185
 beautiful-0.185
 small-0.184
 lovely-0.183
 wonderful-0.183
Token share
Token ID2648
Feature activation+8.39e+00

Pos logit contributions
ors+0.303
 to+0.294
rings+0.294
driver+0.293
 Stop+0.292
Neg logit contributions
 charming-0.592
 beautiful-0.586
 wonderful-0.586
 clever-0.585
 lovely-0.579
Token.
Token ID13
Feature activation-8.21e+00

Pos logit contributions
ors+0.200
driver+0.196
ted+0.195
rings+0.194
oes+0.188
Neg logit contributions
 charming-0.272
 beautiful-0.272
 clever-0.268
 wonderful-0.268
 lovely-0.268
Token She
Token ID1375
Feature activation+1.36e+01

Pos logit contributions
 small+0.346
 cute+0.342
 charming+0.340
 beautiful+0.339
 clever+0.337
Neg logit contributions
ors-0.138
driver-0.129
rings-0.120
ted-0.117
oes-0.115
Token pulls
Token ID16194
Feature activation+3.82e+00

Pos logit contributions
ors+0.313
driver+0.306
rings+0.301
ted+0.298
fully+0.297
Neg logit contributions
 charming-0.453
 beautiful-0.453
 clever-0.452
 cute-0.449
 wonderful-0.448
Token She
Token ID1375
Feature activation+1.30e+01

Pos logit contributions
 small+0.364
 beautiful+0.362
 charming+0.360
 clever+0.355
 lovely+0.352
Neg logit contributions
ors-0.141
driver-0.135
ted-0.126
rings-0.121
oes-0.119
Token asked
Token ID1965
Feature activation-6.35e+00

Pos logit contributions
ted+0.298
ors+0.297
rings+0.292
driver+0.288
 to+0.285
Neg logit contributions
 charming-0.420
 beautiful-0.415
 wonderful-0.415
 lovely-0.414
 clever-0.411
Token,
Token ID11
Feature activation-6.73e-02

Pos logit contributions
 beautiful+0.192
 charming+0.191
 wonderful+0.190
 clever+0.190
 lovely+0.189
Neg logit contributions
ors-0.168
driver-0.168
rings-0.162
ted-0.161
oes-0.160
Token "
Token ID366
Feature activation-1.26e+00

Pos logit contributions
 charming+0.003
 beautiful+0.003
 clever+0.003
 wonderful+0.003
 lovely+0.003
Neg logit contributions
ors-0.001
driver-0.001
ted-0.001
rings-0.001
oes-0.001
TokenWho
Token ID8241
Feature activation+3.38e+00

Pos logit contributions
 charming+0.041
 beautiful+0.041
 wonderful+0.041
 clever+0.040
 lovely+0.040
Neg logit contributions
ors-0.030
driver-0.029
ted-0.029
rings-0.029
 Stop-0.028
Token are
Token ID389
Feature activation+5.49e+00

Pos logit contributions
ors+0.066
driver+0.064
ted+0.064
rings+0.062
oes+0.061
Neg logit contributions
 beautiful-0.124
 charming-0.124
 wonderful-0.123
 clever-0.123
 lovely-0.122
Token you
Token ID345
Feature activation+1.05e+01

Pos logit contributions
driver+0.085
ors+0.085
ted+0.080
oes+0.078
rings+0.077
Neg logit contributions
 beautiful-0.227
 charming-0.226
 wonderful-0.224
 clever-0.224
 lovely-0.224
Token?"
Token ID1701
Feature activation-2.99e+00

Pos logit contributions
 to+0.205
rings+0.189
ors+0.181
 Stop+0.180
ting+0.180
Neg logit contributions
 wonderful-0.365
 charming-0.365
 gentle-0.358
 clever-0.357
 beautiful-0.357
Token The
Token ID383
Feature activation-8.56e+00

Pos logit contributions
 charming+0.100
 beautiful+0.100
 clever+0.099
 wonderful+0.099
 lovely+0.099
Neg logit contributions
ors-0.068
driver-0.067
ted-0.066
rings-0.065
 Stop-0.063
Token fairy
Token ID25607
Feature activation-6.93e+00

Pos logit contributions
 wonderful+0.268
 great+0.264
ady+0.262
 delicious+0.260
 lively+0.257
Neg logit contributions
als-0.193
rings-0.186
 to-0.182
 she-0.182
 it-0.173
Token replied
Token ID8712
Feature activation-9.38e+00

Pos logit contributions
 charming+0.222
 wonderful+0.221
 lovely+0.219
 clever+0.218
 beautiful+0.218
Neg logit contributions
ted-0.162
rings-0.156
ors-0.155
 to-0.155
fully-0.152
Token the
Token ID262
Feature activation-4.15e+00

Pos logit contributions
rings+0.166
ors+0.164
driver+0.162
ted+0.160
oes+0.158
Neg logit contributions
 beautiful-0.190
 charming-0.188
 clever-0.187
 small-0.186
 wonderful-0.186
Token yard
Token ID12699
Feature activation+1.77e+00

Pos logit contributions
 charming+0.113
 clever+0.111
 wonderful+0.111
 cute+0.110
 lovely+0.108
Neg logit contributions
 to-0.111
ors-0.106
 Stop-0.106
driver-0.105
ting-0.104
Token.
Token ID13
Feature activation-9.47e+00

Pos logit contributions
ors+0.053
ted+0.053
rings+0.052
driver+0.052
 Stop+0.052
Neg logit contributions
 beautiful-0.046
 charming-0.045
 clever-0.045
 small-0.045
 lovely-0.044
Token Tom
Token ID4186
Feature activation-3.10e+00

Pos logit contributions
 beautiful+0.357
 charming+0.357
 clever+0.355
 small+0.352
 wonderful+0.352
Neg logit contributions
ors-0.176
rings-0.170
ted-0.164
driver-0.162
fully-0.159
Token had
Token ID550
Feature activation+2.95e+00

Pos logit contributions
 clever+0.103
 beautiful+0.103
 charming+0.102
 small+0.101
 cute+0.101
Neg logit contributions
rings-0.072
ors-0.071
oes-0.071
driver-0.070
 Stop-0.068
Token so
Token ID523
Feature activation+7.77e-01

Pos logit contributions
 to+0.075
driver+0.069
ted+0.069
 him+0.068
ors+0.067
Neg logit contributions
 charming-0.090
 cute-0.088
 lovely-0.088
 beautiful-0.088
 clever-0.087
Token much
Token ID881
Feature activation-1.46e+01

Pos logit contributions
 him+0.023
 it+0.023
 to+0.023
 they+0.022
t+0.022
Neg logit contributions
 charming-0.018
 magical-0.018
 gentle-0.018
 beautiful-0.017
 compassionate-0.017
Token fun
Token ID1257
Feature activation-1.73e+01

Pos logit contributions
 charming+0.094
 cute+0.087
 clever+0.085
 beautiful+0.081
 lovely+0.080
Neg logit contributions
 to-0.710
ted-0.699
driver-0.697
ors-0.685
 him-0.684
Token playing
Token ID2712
Feature activation-2.46e-01

Pos logit contributions
 beautiful+0.523
 wonderful+0.504
 clever+0.503
 lively+0.501
 fun+0.499
Neg logit contributions
rings-0.523
oes-0.519
als-0.485
licks-0.476
ted-0.476
Token with
Token ID351
Feature activation+4.39e+00

Pos logit contributions
 beautiful+0.009
 charming+0.009
 clever+0.009
 wonderful+0.009
 lovely+0.009
Neg logit contributions
rings-0.005
ors-0.005
oes-0.005
ted-0.005
driver-0.005
Token his
Token ID465
Feature activation-1.25e+00

Pos logit contributions
ors+0.090
ted+0.087
driver+0.087
 to+0.085
rings+0.084
Neg logit contributions
 charming-0.156
 wonderful-0.154
 beautiful-0.154
 lovely-0.154
 clever-0.153
Token again
Token ID757
Feature activation+4.51e+00

Pos logit contributions
 beautiful+0.129
 wonderful+0.128
 nice+0.127
 clever+0.126
 charming+0.126
Neg logit contributions
oes-0.070
ors-0.068
ted-0.067
rings-0.067
licks-0.067
Token.
Token ID13
Feature activation-9.27e+00

Pos logit contributions
ors+0.128
driver+0.125
ted+0.125
rings+0.124
 Stop+0.121
Neg logit contributions
 charming-0.126
 beautiful-0.126
 wonderful-0.124
 clever-0.124
 lovely-0.124
Token<|endoftext|>
Token ID50256
Feature activation-5.76e+00

Pos logit contributions
 charming+0.282
 beautiful+0.282
 wonderful+0.280
 lovely+0.278
 clever+0.278
Neg logit contributions
ors-0.240
driver-0.234
ted-0.233
rings-0.232
fully-0.224
TokenOnce
Token ID7454
Feature activation-1.89e+00

Pos logit contributions
 charming+0.185
 beautiful+0.184
 clever+0.183
 wonderful+0.182
 lovely+0.181
Neg logit contributions
ors-0.136
ted-0.136
driver-0.134
rings-0.131
 Stop-0.131
Token there
Token ID612
Feature activation+2.40e+00

Pos logit contributions
 charming+0.064
 beautiful+0.064
 wonderful+0.063
 lovely+0.063
 small+0.063
Neg logit contributions
ors-0.042
ted-0.042
driver-0.042
 Stop-0.041
rings-0.041
Token was
Token ID373
Feature activation+9.95e-01

Pos logit contributions
ors+0.055
driver+0.055
rings+0.054
ted+0.052
oes+0.051
Neg logit contributions
 charming-0.082
 beautiful-0.082
 wonderful-0.082
 lovely-0.081
 clever-0.080
Token a
Token ID257
Feature activation-1.35e+01

Pos logit contributions
 him+0.031
 to+0.031
ted+0.029
 it+0.029
 they+0.029
Neg logit contributions
 clever-0.023
 charming-0.023
 gentle-0.023
 beautiful-0.022
 magical-0.022
Token little
Token ID1310
Feature activation-2.32e+01

Pos logit contributions
ady+0.249
oken+0.231
 unexpected+0.224
 amazing+0.219
 entire+0.215
Neg logit contributions
 they-0.470
 we-0.464
 to-0.458
 it-0.433
 he-0.423
Token child
Token ID1200
Feature activation-4.96e+00

Pos logit contributions
 wonderful+0.575
 charming+0.564
 clever+0.563
 beautiful+0.560
 lovely+0.551
Neg logit contributions
ors-0.686
ted-0.685
driver-0.680
 to-0.680
rings-0.665
Token who
Token ID508
Feature activation+2.25e+00

Pos logit contributions
 charming+0.127
 beautiful+0.126
 clever+0.125
 wonderful+0.125
 lovely+0.124
Neg logit contributions
ors-0.151
driver-0.149
ted-0.148
rings-0.148
 Stop-0.145
Token was
Token ID373
Feature activation+1.07e+00

Pos logit contributions
 to+0.053
 him+0.052
ted+0.048
 would+0.048
 Did+0.047
Neg logit contributions
 beautiful-0.067
 charming-0.066
 clever-0.065
 wonderful-0.064
 magical-0.064
Token were
Token ID547
Feature activation+3.33e+00

Pos logit contributions
 beautiful+0.047
 charming+0.047
 clever+0.046
 wonderful+0.046
 lovely+0.046
Neg logit contributions
ors-0.034
driver-0.033
rings-0.032
ted-0.032
oes-0.032
Token playing
Token ID2712
Feature activation+1.29e+00

Pos logit contributions
ors+0.074
 to+0.074
ted+0.073
rings+0.072
driver+0.071
Neg logit contributions
 charming-0.109
 beautiful-0.109
 wonderful-0.108
 lovely-0.107
 clever-0.107
Token in
Token ID287
Feature activation+8.64e+00

Pos logit contributions
ors+0.029
rings+0.028
driver+0.028
ted+0.027
oes+0.027
Neg logit contributions
 beautiful-0.045
 charming-0.044
 clever-0.044
 wonderful-0.044
 lovely-0.044
Token the
Token ID262
Feature activation-2.77e+00

Pos logit contributions
 to+0.183
 him+0.179
driver+0.173
ted+0.173
ors+0.172
Neg logit contributions
 charming-0.291
 clever-0.290
 cute-0.286
 beautiful-0.285
 lovely-0.285
Token park
Token ID3952
Feature activation-4.38e+00

Pos logit contributions
 clever+0.074
 compassionate+0.072
 cute+0.070
 intelligent+0.069
 charming+0.068
Neg logit contributions
 to-0.074
 him-0.069
ted-0.068
 it-0.066
ting-0.066
Token.
Token ID13
Feature activation-8.21e+00

Pos logit contributions
 beautiful+0.155
 charming+0.154
 lovely+0.152
 clever+0.152
 wonderful+0.152
Neg logit contributions
ors-0.094
driver-0.093
rings-0.090
ted-0.089
 Stop-0.088
Token Bob
Token ID5811
Feature activation-8.25e-01

Pos logit contributions
 small+0.302
 charming+0.298
 beautiful+0.297
 cute+0.297
 clever+0.296
Neg logit contributions
ors-0.171
rings-0.163
driver-0.163
fully-0.156
ted-0.150
Token had
Token ID550
Feature activation+4.12e+00

Pos logit contributions
 clever+0.025
 beautiful+0.025
 charming+0.025
 cute+0.025
 small+0.025
Neg logit contributions
ors-0.021
driver-0.021
rings-0.021
oes-0.021
 Stop-0.020
Token a
Token ID257
Feature activation-1.26e+01

Pos logit contributions
 to+0.123
 him+0.114
ted+0.114
driver+0.113
rings+0.111
Neg logit contributions
 charming-0.107
 clever-0.104
 lovely-0.104
 beautiful-0.104
 cute-0.104
Token new
Token ID649
Feature activation-1.88e+01

Pos logit contributions
 elderly+0.159
 entire+0.157
aky+0.155
 intelligent+0.151
 charming+0.147
Neg logit contributions
 to-0.487
 they-0.470
 would-0.457
 it-0.453
 we-0.442
Token toy
Token ID13373
Feature activation-8.93e+00

Pos logit contributions
 clever+0.411
 charming+0.406
 cute+0.401
 beautiful+0.399
 wonderful+0.393
Neg logit contributions
 to-0.621
driver-0.595
rings-0.576
fully-0.575
oes-0.571
Token seriously
Token ID6411
Feature activation-3.10e+00

Pos logit contributions
 to+0.129
ors+0.127
 they+0.120
driver+0.119
t+0.118
Neg logit contributions
 charming-0.107
 wonderful-0.105
 lovely-0.104
 clever-0.104
 beautiful-0.104
Token.
Token ID13
Feature activation-8.80e+00

Pos logit contributions
 charming+0.090
 beautiful+0.090
 clever+0.089
 wonderful+0.089
 lovely+0.089
Neg logit contributions
ors-0.084
driver-0.082
ted-0.082
rings-0.081
 Stop-0.079
Token<|endoftext|>
Token ID50256
Feature activation-5.32e+00

Pos logit contributions
 charming+0.262
 beautiful+0.262
 lovely+0.261
 wonderful+0.260
 cute+0.258
Neg logit contributions
driver-0.242
ors-0.238
rings-0.227
ted-0.225
fully-0.224
TokenOnce
Token ID7454
Feature activation-1.41e+00

Pos logit contributions
 charming+0.194
 clever+0.192
 beautiful+0.191
 wonderful+0.190
 cute+0.189
Neg logit contributions
 to-0.104
ted-0.103
 him-0.099
ors-0.099
driver-0.097
Token upon
Token ID2402
Feature activation+5.80e+00

Pos logit contributions
 charming+0.049
 beautiful+0.049
 wonderful+0.048
 lovely+0.048
 clever+0.048
Neg logit contributions
ors-0.030
driver-0.029
ted-0.029
rings-0.029
 Stop-0.028
Token a
Token ID257
Feature activation-1.30e+01

Pos logit contributions
 to+0.183
 him+0.179
 it+0.172
fully+0.167
 they+0.167
Neg logit contributions
 charming-0.136
 clever-0.135
 wonderful-0.134
 beautiful-0.132
 magical-0.132
Token time
Token ID640
Feature activation-6.40e+00

Pos logit contributions
 magical+0.239
 charming+0.237
 powerful+0.235
 wealthy+0.234
 beautiful+0.234
Neg logit contributions
 to-0.463
t-0.430
 him-0.418
fully-0.414
 inspection-0.413
Token,
Token ID11
Feature activation+1.13e-01

Pos logit contributions
 small+0.209
 wonderful+0.207
 lovely+0.207
 charming+0.207
 cute+0.206
Neg logit contributions
ors-0.161
ted-0.160
rings-0.157
ting-0.156
driver-0.154
Token there
Token ID612
Feature activation+2.97e+00

Pos logit contributions
driver+0.002
ors+0.002
ted+0.002
 to+0.002
rings+0.002
Neg logit contributions
 charming-0.004
 beautiful-0.004
 clever-0.004
 wonderful-0.004
 lovely-0.004
Token was
Token ID373
Feature activation+1.59e+00

Pos logit contributions
ors+0.073
driver+0.072
rings+0.072
ted+0.068
oes+0.067
Neg logit contributions
 beautiful-0.098
 charming-0.098
 wonderful-0.098
 lovely-0.098
 cute-0.097
Token a
Token ID257
Feature activation-1.29e+01

Pos logit contributions
 to+0.044
 him+0.043
 it+0.041
 they+0.040
 It+0.039
Neg logit contributions
 clever-0.043
 gentle-0.042
 charming-0.042
 magical-0.041
 powerful-0.041
Token't
Token ID470
Feature activation+9.79e+00

Pos logit contributions
rings+0.055
als+0.051
ted+0.049
ors+0.047
 inspection+0.044
Neg logit contributions
 big-0.162
 small-0.160
 great-0.158
 beautiful-0.158
 clever-0.158
Token know
Token ID760
Feature activation-2.67e+00

Pos logit contributions
ors+0.244
driver+0.239
ted+0.236
rings+0.236
 Stop+0.230
Neg logit contributions
 charming-0.307
 beautiful-0.306
 wonderful-0.303
 clever-0.302
 lovely-0.301
Token what
Token ID644
Feature activation+8.25e+00

Pos logit contributions
 beautiful+0.102
 wonderful+0.102
 charming+0.101
 lovely+0.101
 cute+0.100
Neg logit contributions
ors-0.048
driver-0.048
 Stop-0.047
oes-0.046
rings-0.046
Token to
Token ID284
Feature activation+1.57e+01

Pos logit contributions
ors+0.085
driver+0.080
ted+0.079
rings+0.079
 Stop+0.073
Neg logit contributions
 charming-0.381
 beautiful-0.380
 wonderful-0.378
 clever-0.377
 lovely-0.376
Token do
Token ID466
Feature activation+2.51e+00

Pos logit contributions
ors+0.432
driver+0.425
ted+0.423
rings+0.422
 Stop+0.405
Neg logit contributions
 beautiful-0.459
 charming-0.459
 wonderful-0.455
 clever-0.453
 lovely-0.453
Token.
Token ID13
Feature activation-8.57e+00

Pos logit contributions
rings+0.098
ted+0.095
als+0.094
oes+0.093
licks+0.091
Neg logit contributions
 nice-0.051
 beautiful-0.049
 wonderful-0.049
 clever-0.049
 kind-0.048
Token\n
Token ID198
Feature activation-6.29e+00

Pos logit contributions
 charming+0.271
 beautiful+0.270
 small+0.268
 clever+0.268
 lovely+0.267
Neg logit contributions
ors-0.215
rings-0.207
driver-0.206
ted-0.205
fully-0.199
Token\n
Token ID198
Feature activation-6.25e+00

Pos logit contributions
 charming+0.245
 beautiful+0.240
 clever+0.239
 cute+0.238
 wonderful+0.238
Neg logit contributions
 to-0.109
ors-0.103
 him-0.102
driver-0.100
 they-0.100
TokenThen
Token ID6423
Feature activation-2.76e+00

Pos logit contributions
 charming+0.217
 beautiful+0.213
 wonderful+0.213
 lovely+0.212
 cute+0.210
Neg logit contributions
driver-0.131
 they-0.128
ors-0.124
but-0.124
 to-0.124
Token,
Token ID11
Feature activation+3.21e-01

Pos logit contributions
 beautiful+0.110
 charming+0.110
 wonderful+0.109
 clever+0.109
 lovely+0.109
Neg logit contributions
ors-0.047
driver-0.045
oes-0.044
rings-0.044
ted-0.043
Token Lily
Token ID20037
Feature activation-5.72e+00

Pos logit contributions
ors+0.005
driver+0.005
oes+0.005
rings+0.005
ted+0.005
Neg logit contributions
 beautiful-0.013
 charming-0.013
 wonderful-0.013
 clever-0.013
 lovely-0.013
Token the
Token ID262
Feature activation-3.48e+00

Pos logit contributions
 to+0.110
 him+0.109
ors+0.108
 they+0.105
 it+0.105
Neg logit contributions
 charming-0.173
 clever-0.169
 cute-0.168
 lovely-0.167
 beautiful-0.167
Token new
Token ID649
Feature activation-1.92e+01

Pos logit contributions
 charming+0.072
 clever+0.070
 gentle+0.068
 lovely+0.068
 cute+0.068
Neg logit contributions
 to-0.109
 him-0.105
driver-0.102
rings-0.101
ted-0.101
Token toys
Token ID14958
Feature activation+1.41e+00

Pos logit contributions
 charming+0.419
 clever+0.414
 beautiful+0.411
 lovely+0.406
 wonderful+0.406
Neg logit contributions
 to-0.631
ors-0.630
driver-0.626
rings-0.618
ted-0.616
Token.
Token ID13
Feature activation-8.90e+00

Pos logit contributions
 Stop+0.044
ors+0.043
ted+0.043
oes+0.042
licks+0.041
Neg logit contributions
 beautiful-0.037
 charming-0.036
 wonderful-0.036
 lovely-0.036
 nice-0.036
Token<|endoftext|>
Token ID50256
Feature activation-5.50e+00

Pos logit contributions
 beautiful+0.316
 charming+0.316
 wonderful+0.314
 lovely+0.313
 clever+0.310
Neg logit contributions
ors-0.185
driver-0.185
rings-0.180
ted-0.178
fully-0.172
TokenOnce
Token ID7454
Feature activation-1.62e+00

Pos logit contributions
 charming+0.191
 beautiful+0.189
 clever+0.189
 wonderful+0.188
 lovely+0.186
Neg logit contributions
ted-0.116
 to-0.113
ors-0.113
driver-0.111
 Stop-0.110
Token upon
Token ID2402
Feature activation+5.54e+00

Pos logit contributions
 charming+0.058
 beautiful+0.058
 clever+0.058
 wonderful+0.058
 lovely+0.057
Neg logit contributions
ors-0.032
driver-0.031
ted-0.031
rings-0.031
 Stop-0.030
Token a
Token ID257
Feature activation-1.33e+01

Pos logit contributions
 to+0.162
 him+0.161
 it+0.156
 they+0.147
 It+0.143
Neg logit contributions
 charming-0.140
 clever-0.138
 wonderful-0.137
 magical-0.137
 beautiful-0.134
Token time
Token ID640
Feature activation-6.76e+00

Pos logit contributions
 wealthy+0.268
 magical+0.267
 elderly+0.266
 charming+0.262
 beautiful+0.261
Neg logit contributions
 to-0.441
t-0.412
 they-0.396
 him-0.396
 inspection-0.390
Token,
Token ID11
Feature activation-3.02e-01

Pos logit contributions
 charming+0.241
 beautiful+0.240
 clever+0.240
 wonderful+0.239
 lovely+0.239
Neg logit contributions
ors-0.141
ted-0.138
rings-0.138
driver-0.137
ting-0.130
Token there
Token ID612
Feature activation+2.50e+00

Pos logit contributions
 charming+0.011
 beautiful+0.011
 wonderful+0.011
 clever+0.011
 lovely+0.011
Neg logit contributions
 to-0.005
driver-0.005
fully-0.005
ors-0.005
ted-0.005
Token the
Token ID262
Feature activation-4.01e+00

Pos logit contributions
 small+0.023
 beautiful+0.023
 clever+0.023
 big+0.023
 wonderful+0.022
Neg logit contributions
ors-0.011
oes-0.010
fully-0.010
driver-0.009
amb-0.009
Token rainbow
Token ID27223
Feature activation-9.19e+00

Pos logit contributions
 clever+0.104
 compassionate+0.102
 charming+0.102
aky+0.099
 gentle+0.098
Neg logit contributions
 we-0.108
als-0.108
 to-0.108
 it-0.106
 they-0.104
Token seemed
Token ID3947
Feature activation-4.20e+00

Pos logit contributions
 beautiful+0.291
 clever+0.281
 charming+0.279
 small+0.279
 magical+0.277
Neg logit contributions
 Stop-0.237
ors-0.234
rings-0.232
driver-0.229
licks-0.222
Token to
Token ID284
Feature activation+1.50e+01

Pos logit contributions
 charming+0.192
 beautiful+0.190
 clever+0.190
 wonderful+0.190
 lovely+0.189
Neg logit contributions
ors-0.040
rings-0.038
ted-0.038
driver-0.038
 to-0.037
Token stay
Token ID2652
Feature activation+5.71e+00

Pos logit contributions
 to+0.321
oes+0.301
 they+0.295
 it+0.295
 him+0.292
Neg logit contributions
 charming-0.508
 cute-0.498
 clever-0.497
 humble-0.486
 lovely-0.485
Token the
Token ID262
Feature activation-3.94e+00

Pos logit contributions
 to+0.126
 it+0.114
ors+0.110
 they+0.109
 him+0.106
Neg logit contributions
 charming-0.186
 wonderful-0.184
 clever-0.184
 cute-0.182
 lovely-0.181
Token same
Token ID976
Feature activation-1.21e+01

Pos logit contributions
 clever+0.096
 charming+0.095
 cute+0.095
 compassionate+0.094
 gentle+0.091
Neg logit contributions
 to-0.113
oes-0.112
 we-0.110
ho-0.109
als-0.109
Token distance
Token ID5253
Feature activation-4.50e+00

Pos logit contributions
 charming+0.385
 clever+0.383
 cute+0.380
 wonderful+0.377
 lovely+0.374
Neg logit contributions
 to-0.291
ors-0.278
driver-0.277
fully-0.275
rings-0.269
Token away
Token ID1497
Feature activation+2.56e+00

Pos logit contributions
 beautiful+0.129
 charming+0.126
 lovely+0.125
 small+0.124
 nice+0.124
Neg logit contributions
driver-0.130
ors-0.127
ted-0.125
 Stop-0.125
rings-0.123
Token.
Token ID13
Feature activation-9.21e+00

Pos logit contributions
ors+0.069
driver+0.067
ted+0.067
rings+0.067
 to+0.065
Neg logit contributions
 charming-0.075
 beautiful-0.074
 clever-0.074
 wonderful-0.074
 lovely-0.073
Token\n
Token ID198
Feature activation-6.97e+00

Pos logit contributions
 charming+0.315
 beautiful+0.314
 wonderful+0.311
 clever+0.311
 lovely+0.310
Neg logit contributions
ors-0.202
driver-0.198
ted-0.197
rings-0.195
 Stop-0.190
Token knew
Token ID2993
Feature activation-6.88e+00

Pos logit contributions
ted+0.190
 to+0.184
ors+0.182
driver+0.182
fully+0.174
Neg logit contributions
 charming-0.325
 lovely-0.322
 beautiful-0.321
 wonderful-0.320
 cute-0.316
Token he
Token ID339
Feature activation+1.36e+01

Pos logit contributions
 beautiful+0.274
 wonderful+0.273
 clever+0.270
 charming+0.268
 small+0.267
Neg logit contributions
oes-0.129
ors-0.128
 Stop-0.126
licks-0.123
als-0.123
Token had
Token ID550
Feature activation+2.96e+00

Pos logit contributions
ors+0.319
rings+0.319
but+0.309
 Stop+0.306
oes+0.305
Neg logit contributions
 clever-0.475
 wonderful-0.465
 beautiful-0.463
 cute-0.461
 charming-0.460
Token to
Token ID284
Feature activation+1.48e+01

Pos logit contributions
 to+0.031
driver+0.026
ors+0.026
ted+0.025
rings+0.025
Neg logit contributions
 charming-0.136
 beautiful-0.134
 cute-0.134
 lovely-0.134
 wonderful-0.133
Token be
Token ID307
Feature activation+6.20e+00

Pos logit contributions
ors+0.309
driver+0.309
 to+0.305
 Stop+0.305
 him+0.301
Neg logit contributions
 charming-0.505
 beautiful-0.497
 cute-0.494
 lovely-0.490
 wonderful-0.490
Token brave
Token ID14802
Feature activation-2.36e+01

Pos logit contributions
 to+0.232
 it+0.203
 over+0.203
 him+0.203
 they+0.191
Neg logit contributions
 cute-0.101
 large-0.099
 charming-0.099
 lovely-0.097
 magical-0.097
Token.
Token ID13
Feature activation-9.51e+00

Pos logit contributions
 beautiful+0.877
 clever+0.873
 small+0.872
 kind+0.872
 nice+0.871
Neg logit contributions
rings-0.475
oes-0.465
ors-0.458
but-0.440
ted-0.437
Token Tim
Token ID5045
Feature activation-4.36e+00

Pos logit contributions
 charming+0.327
 small+0.326
 beautiful+0.325
 clever+0.324
 lovely+0.321
Neg logit contributions
rings-0.210
ors-0.201
driver-0.197
ted-0.195
ting-0.193
Token managed
Token ID5257
Feature activation+3.48e+00

Pos logit contributions
 charming+0.143
 beautiful+0.143
 clever+0.142
 wonderful+0.141
 lovely+0.140
Neg logit contributions
ors-0.103
rings-0.100
driver-0.100
ted-0.099
 Stop-0.097
Token to
Token ID284
Feature activation+1.48e+01

Pos logit contributions
als+0.063
rings+0.062
 Stop+0.058
ors+0.057
oes+0.057
Neg logit contributions
 small-0.150
 nice-0.149
 big-0.149
 clever-0.148
 gentle-0.147
Token run
Token ID1057
Feature activation+1.80e+00

Pos logit contributions
driver+0.311
 to+0.309
ors+0.304
 him+0.301
fully+0.298
Neg logit contributions
 charming-0.505
 cute-0.494
 beautiful-0.492
 lovely-0.492
 wonderful-0.491
Token notebook
Token ID20922
Feature activation+2.19e+00

Pos logit contributions
 to+0.043
ors+0.042
rings+0.042
driver+0.042
fully+0.040
Neg logit contributions
 charming-0.040
 wonderful-0.040
 clever-0.040
 cute-0.039
 lovely-0.039
Token safe
Token ID3338
Feature activation-1.21e+01

Pos logit contributions
ted+0.060
ors+0.058
 Stop+0.057
rings+0.055
oes+0.055
Neg logit contributions
 beautiful-0.068
 charming-0.067
 clever-0.066
 wonderful-0.066
 nice-0.066
Token."
Token ID526
Feature activation-8.25e+00

Pos logit contributions
 beautiful+0.398
 nice+0.395
 small+0.392
 charming+0.389
 lovely+0.386
Neg logit contributions
ted-0.290
rings-0.277
driver-0.272
ors-0.271
 Stop-0.269
Token\n
Token ID198
Feature activation-7.28e+00

Pos logit contributions
 small+0.215
 clever+0.210
 beautiful+0.210
 gentle+0.208
 charming+0.207
Neg logit contributions
ors-0.254
rings-0.247
driver-0.245
oes-0.242
ted-0.241
Token\n
Token ID198
Feature activation-7.24e+00

Pos logit contributions
 charming+0.266
 beautiful+0.261
 clever+0.259
 cute+0.259
 lovely+0.259
Neg logit contributions
ors-0.140
 to-0.138
driver-0.137
ted-0.134
 him-0.134
TokenBig
Token ID12804
Feature activation-1.61e+01

Pos logit contributions
 charming+0.189
 beautiful+0.187
 wonderful+0.185
 lovely+0.185
 clever+0.184
Neg logit contributions
ors-0.217
driver-0.215
ted-0.213
rings-0.210
 Stop-0.208
Token Bear
Token ID14732
Feature activation-4.57e+00

Pos logit contributions
 wonderful+0.620
 charming+0.616
 beautiful+0.612
 lovely+0.607
 clever+0.602
Neg logit contributions
ted-0.289
ors-0.288
ting-0.263
 to-0.261
driver-0.260
Token thanked
Token ID26280
Feature activation-6.05e+00

Pos logit contributions
 charming+0.143
 beautiful+0.142
 wonderful+0.141
 lovely+0.141
 cute+0.139
Neg logit contributions
ors-0.111
driver-0.111
ted-0.110
rings-0.106
fully-0.106
Token the
Token ID262
Feature activation-4.16e+00

Pos logit contributions
 beautiful+0.168
 small+0.167
 charming+0.166
 wonderful+0.165
 clever+0.165
Neg logit contributions
ors-0.175
rings-0.173
driver-0.172
oes-0.168
 Stop-0.163
Token rabbit
Token ID22746
Feature activation-6.65e+00

Pos logit contributions
 wonderful+0.133
 charming+0.132
 lovely+0.131
 great+0.131
 beautiful+0.131
Neg logit contributions
 to-0.082
ting-0.079
ted-0.077
 him-0.075
ho-0.075
Token and
Token ID290
Feature activation+1.19e+01

Pos logit contributions
 small+0.250
 gentle+0.248
 beautiful+0.248
 clever+0.247
 charming+0.247
Neg logit contributions
 Stop-0.127
ors-0.123
ted-0.121
driver-0.120
rings-0.118
Token in
Token ID287
Feature activation+8.64e+00

Pos logit contributions
 Stop+0.109
 Did+0.099
 Get+0.095
ted+0.094
ors+0.092
Neg logit contributions
 small-0.161
 big-0.160
 beautiful-0.160
 large-0.154
 pretty-0.153
Token the
Token ID262
Feature activation-2.77e+00

Pos logit contributions
 to+0.184
 it+0.176
 him+0.171
 they+0.165
 inspection+0.160
Neg logit contributions
 clever-0.292
 charming-0.291
 cute-0.286
 compassionate-0.283
 lovely-0.280
Token sand
Token ID6450
Feature activation-1.62e-01

Pos logit contributions
 clever+0.066
 compassionate+0.066
 cute+0.063
 charming+0.063
 wonderful+0.061
Neg logit contributions
als-0.083
 to-0.082
ted-0.081
 him-0.077
 it-0.075
Token and
Token ID290
Feature activation+1.31e+01

Pos logit contributions
 beautiful+0.006
 small+0.006
 charming+0.006
 nice+0.006
 big+0.006
Neg logit contributions
 Stop-0.004
ted-0.003
ors-0.003
driver-0.003
 Did-0.003
Token ate
Token ID15063
Feature activation-8.93e-01

Pos logit contributions
ors+0.395
driver+0.374
 Stop+0.370
oes+0.354
but+0.354
Neg logit contributions
 beautiful-0.384
 small-0.376
 wonderful-0.371
 charming-0.371
 clever-0.370
Token y
Token ID331
Feature activation-2.30e+01

Pos logit contributions
 charming+0.028
 beautiful+0.028
 wonderful+0.028
 clever+0.028
 lovely+0.028
Neg logit contributions
ors-0.022
driver-0.021
ted-0.021
rings-0.021
 Stop-0.021
Tokenummy
Token ID13513
Feature activation+1.44e+00

Pos logit contributions
 beautiful+0.770
 lovely+0.763
 charming+0.760
 wonderful+0.753
 nice+0.745
Neg logit contributions
driver-0.545
rings-0.524
 Stop-0.503
ors-0.498
but-0.494
Token snacks
Token ID28487
Feature activation+5.98e+00

Pos logit contributions
ted+0.044
ors+0.044
rings+0.043
 to+0.043
driver+0.043
Neg logit contributions
 charming-0.035
 clever-0.035
 beautiful-0.035
 wonderful-0.034
 cute-0.034
Token.
Token ID13
Feature activation-8.47e+00

Pos logit contributions
 Stop+0.217
ors+0.206
ted+0.206
 Did+0.205
licks+0.200
Neg logit contributions
 beautiful-0.135
 nice-0.132
 lovely-0.128
 big-0.127
 wonderful-0.127
Token They
Token ID1119
Feature activation+1.25e+01

Pos logit contributions
 beautiful+0.302
 small+0.302
 charming+0.301
 lovely+0.301
 cute+0.299
Neg logit contributions
ors-0.171
driver-0.170
fully-0.160
rings-0.160
oes-0.156
Token even
Token ID772
Feature activation+7.20e+00

Pos logit contributions
ted+0.222
rings+0.215
 to+0.212
ors+0.211
 him+0.209
Neg logit contributions
 charming-0.466
 beautiful-0.456
 lovely-0.455
 clever-0.454
 cute-0.452
Token slim
Token ID18862
Feature activation-1.75e+01

Pos logit contributions
 to+0.022
ted+0.022
driver+0.022
ors+0.021
fully+0.021
Neg logit contributions
 charming-0.009
 beautiful-0.009
 clever-0.009
 wonderful-0.009
 lovely-0.009
Tokeny
Token ID88
Feature activation-2.02e+00

Pos logit contributions
 nice+0.604
 beautiful+0.592
 lovely+0.591
 cute+0.590
 small+0.589
Neg logit contributions
ors-0.383
 Whoever-0.373
rings-0.363
driver-0.351
 Stop-0.344
Token.
Token ID13
Feature activation-9.22e+00

Pos logit contributions
 beautiful+0.043
 nice+0.042
 lovely+0.042
 small+0.042
 big+0.041
Neg logit contributions
oes-0.071
ors-0.071
 Stop-0.070
rings-0.070
 Whoever-0.069
Token Soon
Token ID15894
Feature activation-2.01e+00

Pos logit contributions
 small+0.357
 beautiful+0.352
 charming+0.349
 lovely+0.348
 clever+0.347
Neg logit contributions
ors-0.181
driver-0.167
oes-0.156
fully-0.156
rings-0.155
Token her
Token ID607
Feature activation+2.20e+00

Pos logit contributions
 beautiful+0.076
 charming+0.075
 clever+0.075
 wonderful+0.075
 lovely+0.075
Neg logit contributions
ors-0.042
oes-0.041
driver-0.040
 Stop-0.038
rings-0.037
Token blue
Token ID4171
Feature activation-2.19e+01

Pos logit contributions
 to+0.068
driver+0.062
ted+0.061
 him+0.060
ors+0.059
Neg logit contributions
 wonderful-0.056
 charming-0.054
 lovely-0.053
 beautiful-0.053
 clever-0.052
Token and
Token ID290
Feature activation+1.22e+01

Pos logit contributions
 charming+0.531
 beautiful+0.529
 clever+0.520
 lovely+0.519
 wonderful+0.518
Neg logit contributions
ors-0.695
driver-0.680
ted-0.677
rings-0.677
 Stop-0.665
Token yellow
Token ID7872
Feature activation-1.97e+01

Pos logit contributions
 to+0.469
ted+0.459
driver+0.455
 it+0.453
rings+0.450
Neg logit contributions
 wonderful-0.189
 charming-0.189
 lovely-0.182
 clever-0.181
 beautiful-0.179
Token fish
Token ID5916
Feature activation-1.90e+00

Pos logit contributions
 charming+0.485
 beautiful+0.483
 wonderful+0.477
 clever+0.477
 lovely+0.474
Neg logit contributions
ors-0.619
driver-0.609
ted-0.608
rings-0.604
 Stop-0.591
Token came
Token ID1625
Feature activation+3.91e+00

Pos logit contributions
 charming+0.068
 beautiful+0.068
 clever+0.067
 wonderful+0.067
 lovely+0.067
Neg logit contributions
ors-0.038
driver-0.037
rings-0.037
ted-0.037
 Stop-0.036
Token swimming
Token ID14899
Feature activation-2.29e+00

Pos logit contributions
ors+0.091
driver+0.090
rings+0.088
ted+0.084
 Stop+0.081
Neg logit contributions
 small-0.143
 beautiful-0.142
 big-0.140
 nice-0.139
 cute-0.139
Token\n
Token ID198
Feature activation-6.02e+00

Pos logit contributions
 small+0.053
 lovely+0.052
 wonderful+0.052
 beautiful+0.052
 charming+0.051
Neg logit contributions
ors-0.077
rings-0.077
ho-0.076
driver-0.075
als-0.074
Token\n
Token ID198
Feature activation-5.99e+00

Pos logit contributions
 charming+0.193
 beautiful+0.192
 clever+0.190
 wonderful+0.190
 lovely+0.189
Neg logit contributions
ors-0.146
driver-0.143
ted-0.142
rings-0.140
 Stop-0.138
TokenJohn
Token ID7554
Feature activation+2.92e+00

Pos logit contributions
 wonderful+0.154
 beautiful+0.153
 charming+0.152
 clever+0.151
 lovely+0.151
Neg logit contributions
ors-0.191
rings-0.190
ted-0.187
driver-0.184
oes-0.184
Token was
Token ID373
Feature activation+2.02e+00

Pos logit contributions
ors+0.073
rings+0.071
driver+0.070
ted+0.070
 Stop+0.069
Neg logit contributions
 charming-0.092
 beautiful-0.092
 clever-0.092
 wonderful-0.090
 lovely-0.090
Token so
Token ID523
Feature activation+2.12e+00

Pos logit contributions
 to+0.056
 over+0.051
 it+0.050
 him+0.046
but+0.044
Neg logit contributions
 lovely-0.052
 wonderful-0.052
 charming-0.052
 beautiful-0.051
 pretty-0.051
Token excited
Token ID6568
Feature activation-1.61e+01

Pos logit contributions
 to+0.071
 they+0.067
 him+0.063
 she+0.063
 it+0.063
Neg logit contributions
 magical-0.042
 charming-0.041
 beautiful-0.041
 wonderful-0.040
 lovely-0.040
Token.
Token ID13
Feature activation-8.10e+00

Pos logit contributions
 beautiful+0.417
 fun+0.408
 nice+0.407
 lovely+0.391
 charming+0.389
Neg logit contributions
licks-0.564
 Stop-0.531
 Did-0.531
rings-0.530
driver-0.529
Token He
Token ID679
Feature activation+1.07e+01

Pos logit contributions
 small+0.276
 charming+0.267
 beautiful+0.265
 big+0.262
 nice+0.262
Neg logit contributions
ors-0.190
ted-0.188
rings-0.185
ting-0.182
als-0.174
Token raced
Token ID32897
Feature activation+1.76e-01

Pos logit contributions
ors+0.251
driver+0.245
ted+0.243
rings+0.243
 Stop+0.236
Neg logit contributions
 charming-0.349
 beautiful-0.349
 clever-0.345
 wonderful-0.345
 lovely-0.344
Token outside
Token ID2354
Feature activation+4.00e+00

Pos logit contributions
 Did+0.004
rings+0.004
ors+0.004
oes+0.004
 Stop+0.004
Neg logit contributions
 big-0.007
 beautiful-0.006
 small-0.006
 nice-0.006
 lively-0.006
Token to
Token ID284
Feature activation+1.63e+01

Pos logit contributions
rings+0.096
ors+0.094
als+0.087
ted+0.086
driver+0.085
Neg logit contributions
 small-0.150
 big-0.146
 beautiful-0.143
 lively-0.142
 gentle-0.140