TOP ACTIVATIONS
MAX = 0.712

loved to play hide - and - seek with her friends
played catch and hide - and - seek . Mia was
a game of hide - and - seek . She squ
decided to play hide - and - seek behind them .
and they played hide - and - seek . After a
it ! Sarah be amed with pride , she was
friends were playing hide - and - seek . Tim my
friends were playing hide - and - seek . Sammy was
I got it !" be amed Bob .
hill , played hide - and - seek in a me
Kat ie be amed with joy . " Look
played games like hide - and - seek and tag .
she would play hide - and - seek with her friends
special game of hide - and - seek , so she
be very grateful and be amed in the sunshine .
. They played hide - and - seek until it was
. They played hide - and - seek , tag ,
like tag and hide - and - seek . The robot
played tag and hide - and - seek . Tim my
loved to play hide - and - seek with her friends

MIN ACTIVATIONS
MIN = -0.497

. She understands . She says sorry again . She gives
He smiles back . He says goodbye and walks away .
Tom nods . He says sorry to Mia . He
She feels better . She says she loves Mom my and
smiles and nods . She says , " Thank you ,
She feels sorry . She says , " Yes , Tom
Their mom smiles . She says , " You are welcome
sorry to Mia . He says thank you to Mia .
She hugs him . She says , " I 'm sorry
Mom my smiles . She says she loves Lily and Tom
feels sorry too . He says , " OK , I
them a kiss . She says , " That 's better
bad dog !" the man says . He sounds angry .
you are bad !" Ben says . The dog
Their mom smiles . She says they are beautiful . She
She hugs her doll and says , " Thank you ,
they are beautiful . She says she loves them . She
hugs them back . She says " what are you doing
ch ases Sara . He says , " Give me back
pets the dog . She says , " You are a

INTERVAL 0.409 - 0.712
CONTAINS 1.123%

and dance . One day , Tim my accidentally
touch it . Suddenly he was very frightened .
, Mom put the pizz as in the oven . She
should do . Suddenly he heard someone say ,
her anywhere . He was impressed by how well she hid

INTERVAL 0.107 - 0.409
CONTAINS 29.082%

had and enjoyed her own colorful slide . <|endoftext|> Once upon
joy . <|endoftext|> Once , there was a little girl named
much fun ! The boy then filled up
, he 's not rude like some kids . He 's
did . <|endoftext|> Once upon a time , there was a

INTERVAL -0.195 - 0.107
CONTAINS 56.401%

spider web . The bird thought it would be fun to
Max tried to jump and flap his ears , but he
with his family . One day , he decided to go
. As he walked , he saw lots of animals playing
. There was a beautiful lake full of cool , clear

INTERVAL -0.497 - -0.195
CONTAINS 13.395%

old . One day , they heard a strange sound outside
. She says to Ben , " That was silly .
their seat belts . They tell mom about their adventures .
" Look , mom , I can swing !"
She had a kind face , and John felt comfort ed

Token loved
Token ID6151
Feature activation+2.37e-01

Pos logit contributions
 says+0.139
 Ben+0.135
 cheeks+0.132
 Tom+0.132
 Sam+0.125
Neg logit contributions
and-0.140
amed-0.138
year-0.128
uttering-0.126
Along-0.121
Token to
Token ID284
Feature activation+2.73e-01

Pos logit contributions
and+0.369
amed+0.368
year+0.332
Along+0.312
uttering+0.308
Neg logit contributions
 says-0.595
 cheeks-0.583
 Ben-0.582
 Tom-0.566
 mouths-0.561
Token play
Token ID711
Feature activation-1.76e-01

Pos logit contributions
Along+0.530
and+0.519
amed+0.502
 particularly+0.501
On+0.500
Neg logit contributions
 says-0.461
 Lee-0.458
 mouths-0.450
 Ben-0.430
 Tom-0.414
Token hide
Token ID7808
Feature activation-7.32e-02

Pos logit contributions
 says+0.442
 Ben+0.440
 cheeks+0.432
 Tom+0.429
 Sam+0.414
Neg logit contributions
amed-0.282
and-0.272
year-0.254
aming-0.233
uttering-0.233
Token-
Token ID12
Feature activation+1.90e-01

Pos logit contributions
 Ben+0.127
 cheeks+0.125
 says+0.125
 Tom+0.122
 Sam+0.115
Neg logit contributions
amed-0.171
and-0.163
year-0.159
uttering-0.154
Along-0.151
Tokenand
Token ID392
Feature activation+6.91e-01

Pos logit contributions
amed+0.097
uttering+0.075
aming+0.066
pr+0.064
year+0.057
Neg logit contributions
 Tom-0.633
ens-0.630
 Ben-0.623
 says-0.622
 cheeks-0.617
Token-
Token ID12
Feature activation+1.89e-01

Pos logit contributions
amed+1.188
and+1.182
year+1.071
Along+1.012
uttering+1.003
Neg logit contributions
 says-1.643
 Ben-1.614
 cheeks-1.598
 Tom-1.573
 mouths-1.551
Tokenseek
Token ID36163
Feature activation+2.75e-01

Pos logit contributions
amed+0.374
uttering+0.354
aming+0.351
pr+0.349
year+0.347
Neg logit contributions
 Tom-0.370
 cheeks-0.352
 Ben-0.351
ens-0.351
 says-0.346
Token with
Token ID351
Feature activation+2.13e-01

Pos logit contributions
amed+0.643
year+0.601
and+0.565
aming+0.565
Eventually+0.558
Neg logit contributions
 Ben-0.487
 Tom-0.487
 says-0.460
 cheeks-0.451
 Sam-0.438
Token her
Token ID607
Feature activation-1.05e-01

Pos logit contributions
and+0.471
amed+0.444
year+0.414
Along+0.404
aming+0.387
Neg logit contributions
 says-0.394
 cheeks-0.381
 mouths-0.348
 Lee-0.343
 acts-0.337
Token friends
Token ID2460
Feature activation+9.62e-02

Pos logit contributions
 says+0.208
 Ben+0.202
 cheeks+0.202
 Tom+0.198
 mouths+0.192
Neg logit contributions
amed-0.214
and-0.213
year-0.192
Along-0.190
uttering-0.189
Token played
Token ID2826
Feature activation-5.31e-02

Pos logit contributions
 cheeks+0.299
 mouths+0.289
 Lee+0.287
 Ben+0.279
 says+0.277
Neg logit contributions
amed-0.390
year-0.360
and-0.356
Along-0.350
aming-0.336
Token catch
Token ID4929
Feature activation-1.20e-02

Pos logit contributions
 Ben+0.115
 says+0.114
 Tom+0.111
 cheeks+0.110
 Sam+0.107
Neg logit contributions
amed-0.104
and-0.101
year-0.097
uttering-0.091
aming-0.091
Token and
Token ID290
Feature activation+2.94e-02

Pos logit contributions
 says+0.022
 Ben+0.022
 cheeks+0.022
 Tom+0.021
 mouths+0.021
Neg logit contributions
amed-0.027
and-0.026
year-0.025
Along-0.024
uttering-0.024
Token hide
Token ID7808
Feature activation-7.30e-02

Pos logit contributions
amed+0.059
and+0.058
year+0.053
Along+0.050
sy+0.050
Neg logit contributions
 says-0.057
 cheeks-0.055
 Lee-0.054
 mouths-0.054
 Ben-0.052
Token-
Token ID12
Feature activation+1.86e-01

Pos logit contributions
 says+0.145
 cheeks+0.138
 Ben+0.138
 mouths+0.133
 Tom+0.133
Neg logit contributions
and-0.155
amed-0.154
year-0.142
aming-0.134
Along-0.134
Tokenand
Token ID392
Feature activation+6.91e-01

Pos logit contributions
amed+0.123
uttering+0.103
aming+0.093
year+0.092
pr+0.089
Neg logit contributions
 Tom-0.610
 says-0.602
ens-0.601
 Ben-0.597
 cheeks-0.589
Token-
Token ID12
Feature activation+1.86e-01

Pos logit contributions
and+1.197
amed+1.176
year+1.064
uttering+0.995
Along+0.993
Neg logit contributions
 says-1.651
 Ben-1.620
 cheeks-1.599
 Tom-1.576
 mouths-1.558
Tokenseek
Token ID36163
Feature activation+2.70e-01

Pos logit contributions
amed+0.333
year+0.309
uttering+0.305
and+0.297
aming+0.295
Neg logit contributions
 Tom-0.421
 says-0.415
 Ben-0.413
 cheeks-0.405
ens-0.398
Token.
Token ID13
Feature activation-2.81e-02

Pos logit contributions
amed+0.712
year+0.674
and+0.666
aming+0.643
Along+0.642
Neg logit contributions
 Ben-0.402
 Tom-0.393
 says-0.385
 cheeks-0.381
 Sam-0.354
Token Mia
Token ID32189
Feature activation-3.42e-01

Pos logit contributions
 says+0.037
 cheeks+0.037
 mouths+0.034
ens+0.032
 Ben+0.032
Neg logit contributions
and-0.075
amed-0.074
year-0.070
Along-0.070
Just-0.069
Token was
Token ID373
Feature activation-5.11e-03

Pos logit contributions
 cheeks+0.627
 mouths+0.606
 says+0.604
 Ben+0.593
 Tom+0.579
Neg logit contributions
amed-0.753
and-0.732
Along-0.661
year-0.658
aming-0.655
Token a
Token ID257
Feature activation+1.73e-01

Pos logit contributions
 says+0.426
 cheeks+0.418
 Ben+0.417
 Tom+0.406
 mouths+0.402
Neg logit contributions
amed-0.334
and-0.329
year-0.302
Along-0.286
uttering-0.284
Token game
Token ID983
Feature activation-3.21e-02

Pos logit contributions
amed+0.483
and+0.453
year+0.447
aming+0.433
uttering+0.415
Neg logit contributions
 says-0.243
 Ben-0.240
 Tom-0.226
 cheeks-0.225
 mouths-0.215
Token of
Token ID286
Feature activation+2.16e-01

Pos logit contributions
 Ben+0.058
 says+0.057
 Tom+0.056
 Sam+0.053
 cheeks+0.053
Neg logit contributions
amed-0.075
and-0.071
year-0.070
aming-0.066
uttering-0.066
Token hide
Token ID7808
Feature activation-7.80e-02

Pos logit contributions
amed+0.504
and+0.500
Along+0.460
year+0.453
uttering+0.449
Neg logit contributions
 cheeks-0.366
 says-0.350
 mouths-0.340
 Ben-0.334
ens-0.327
Token-
Token ID12
Feature activation+1.85e-01

Pos logit contributions
 says+0.158
 cheeks+0.154
 Ben+0.154
 Tom+0.149
 mouths+0.148
Neg logit contributions
amed-0.161
and-0.160
year-0.149
aming-0.141
Along-0.140
Tokenand
Token ID392
Feature activation+6.90e-01

Pos logit contributions
amed+0.176
uttering+0.160
aming+0.148
Along+0.144
year+0.139
Neg logit contributions
 Tom-0.531
 says-0.525
 Ben-0.522
ens-0.513
 Sam-0.511
Token-
Token ID12
Feature activation+1.84e-01

Pos logit contributions
and+1.133
amed+1.057
year+0.964
Along+0.890
aming+0.888
Neg logit contributions
 says-1.745
 Ben-1.681
 cheeks-1.676
 mouths-1.670
 Tom-1.636
Tokenseek
Token ID36163
Feature activation+2.76e-01

Pos logit contributions
amed+0.327
uttering+0.309
aming+0.304
year+0.301
pr+0.295
Neg logit contributions
 Tom-0.396
 Ben-0.386
 says-0.379
 Sam-0.370
ens-0.366
Token.
Token ID13
Feature activation-2.24e-02

Pos logit contributions
amed+0.653
and+0.640
year+0.606
Along+0.581
uttering+0.578
Neg logit contributions
 says-0.474
 Ben-0.466
 cheeks-0.460
 Tom-0.450
 mouths-0.437
Token She
Token ID1375
Feature activation-7.78e-02

Pos logit contributions
 says+0.036
 cheeks+0.035
 mouths+0.035
 floats+0.034
 Ben+0.033
Neg logit contributions
and-0.055
amed-0.052
Along-0.049
Just-0.047
year-0.047
Token squ
Token ID2809
Feature activation-5.02e-02

Pos logit contributions
 cheeks+0.127
 says+0.127
 Ben+0.125
 mouths+0.122
 Tom+0.121
Neg logit contributions
amed-0.189
and-0.184
year-0.174
Along-0.169
aming-0.164
Token decided
Token ID3066
Feature activation+1.56e-01

Pos logit contributions
amed+0.061
and+0.059
year+0.058
Along+0.056
Just+0.053
Neg logit contributions
 says-0.046
 cheeks-0.045
 Ben-0.044
 Lee-0.043
 Tom-0.043
Token to
Token ID284
Feature activation+2.71e-01

Pos logit contributions
amed+0.284
and+0.271
year+0.259
aming+0.245
uttering+0.238
Neg logit contributions
 says-0.364
 Ben-0.362
 Tom-0.358
 cheeks-0.352
 Sam-0.343
Token play
Token ID711
Feature activation-1.81e-01

Pos logit contributions
Along+0.605
amed+0.594
and+0.592
Just+0.546
On+0.545
Neg logit contributions
 says-0.425
 mouths-0.423
 cheeks-0.418
 Lee-0.404
 Ben-0.391
Token hide
Token ID7808
Feature activation-6.89e-02

Pos logit contributions
 says+0.394
 Ben+0.393
 Tom+0.382
 cheeks+0.381
 Sam+0.368
Neg logit contributions
amed-0.349
and-0.337
year-0.317
aming-0.300
Along-0.299
Token-
Token ID12
Feature activation+1.91e-01

Pos logit contributions
 Ben+0.126
 says+0.125
 cheeks+0.123
 Tom+0.121
 mouths+0.115
Neg logit contributions
amed-0.155
and-0.148
year-0.141
Along-0.137
uttering-0.137
Tokenand
Token ID392
Feature activation+6.90e-01

Pos logit contributions
amed+0.108
uttering+0.087
aming+0.081
Along+0.066
pr+0.061
Neg logit contributions
 Tom-0.613
 says-0.613
ens-0.613
 Ben-0.597
 cheeks-0.596
Token-
Token ID12
Feature activation+1.90e-01

Pos logit contributions
and+1.186
amed+1.158
year+1.054
Along+0.987
uttering+0.981
Neg logit contributions
 says-1.658
 Ben-1.628
 cheeks-1.608
 Tom-1.584
 mouths-1.576
Tokenseek
Token ID36163
Feature activation+2.75e-01

Pos logit contributions
amed+0.370
uttering+0.348
aming+0.346
year+0.341
ust+0.330
Neg logit contributions
 Tom-0.374
 says-0.360
ens-0.358
 Ben-0.356
 cheeks-0.354
Token behind
Token ID2157
Feature activation+1.53e-01

Pos logit contributions
amed+0.727
year+0.672
and+0.667
aming+0.651
Along+0.648
Neg logit contributions
 Ben-0.408
 Tom-0.400
 says-0.383
 cheeks-0.371
 Sam-0.363
Token them
Token ID606
Feature activation-9.22e-02

Pos logit contributions
and+0.379
amed+0.373
year+0.347
Along+0.340
aming+0.337
Neg logit contributions
 says-0.236
 cheeks-0.234
 mouths-0.214
ens-0.207
 Ben-0.205
Token.
Token ID13
Feature activation-2.77e-02

Pos logit contributions
 says+0.167
 cheeks+0.166
 mouths+0.160
 Ben+0.159
 Tom+0.155
Neg logit contributions
and-0.207
amed-0.197
year-0.181
Along-0.179
uttering-0.178
Token and
Token ID290
Feature activation+2.41e-02

Pos logit contributions
amed+0.205
and+0.200
year+0.192
Along+0.183
uttering+0.183
Neg logit contributions
 says-0.124
 Ben-0.120
 cheeks-0.120
 Tom-0.116
 mouths-0.111
Token they
Token ID484
Feature activation-2.20e-01

Pos logit contributions
amed+0.062
and+0.060
year+0.058
Along+0.055
aming+0.053
Neg logit contributions
 cheeks-0.035
 mouths-0.033
 says-0.032
 Ben-0.030
ens-0.030
Token played
Token ID2826
Feature activation-6.01e-02

Pos logit contributions
 says+0.394
 cheeks+0.392
 Ben+0.380
 mouths+0.376
 Tom+0.369
Neg logit contributions
amed-0.507
and-0.472
year-0.468
Along-0.446
aming-0.437
Token hide
Token ID7808
Feature activation-7.22e-02

Pos logit contributions
 Ben+0.148
 says+0.148
 Tom+0.144
 cheeks+0.143
 Sam+0.139
Neg logit contributions
amed-0.098
and-0.096
year-0.090
Along-0.084
uttering-0.083
Token-
Token ID12
Feature activation+1.89e-01

Pos logit contributions
 Ben+0.124
 says+0.122
 cheeks+0.122
 Tom+0.119
 mouths+0.114
Neg logit contributions
amed-0.171
and-0.164
year-0.158
Along-0.153
uttering-0.152
Tokenand
Token ID392
Feature activation+6.89e-01

Pos logit contributions
amed+0.100
uttering+0.087
Along+0.073
aming+0.068
year+0.061
Neg logit contributions
 Tom-0.615
 says-0.612
ens-0.611
 cheeks-0.603
 Ben-0.600
Token-
Token ID12
Feature activation+1.89e-01

Pos logit contributions
amed+1.247
and+1.212
year+1.127
Along+1.068
uttering+1.061
Neg logit contributions
 says-1.581
 Ben-1.555
 cheeks-1.549
 Tom-1.515
 mouths-1.485
Tokenseek
Token ID36163
Feature activation+2.75e-01

Pos logit contributions
amed+0.341
uttering+0.323
year+0.311
aming+0.306
Along+0.303
Neg logit contributions
 Tom-0.402
 says-0.391
 Ben-0.388
 cheeks-0.387
ens-0.380
Token.
Token ID13
Feature activation-2.76e-02

Pos logit contributions
amed+0.640
year+0.605
and+0.600
aming+0.573
Along+0.571
Neg logit contributions
 Ben-0.488
 Tom-0.480
 says-0.466
 cheeks-0.465
 Sam-0.444
Token After
Token ID2293
Feature activation+1.42e-01

Pos logit contributions
 says+0.040
 cheeks+0.039
 mouths+0.037
 Ben+0.036
 floats+0.034
Neg logit contributions
and-0.072
amed-0.071
year-0.067
Along-0.066
Just-0.064
Token a
Token ID257
Feature activation+1.83e-01

Pos logit contributions
amed+0.276
and+0.271
year+0.255
Along+0.243
aming+0.239
Neg logit contributions
 says-0.302
 cheeks-0.292
 mouths-0.282
 Ben-0.276
ens-0.276
Token it
Token ID340
Feature activation-2.06e-01

Pos logit contributions
and+0.610
amed+0.592
year+0.569
Along+0.540
aming+0.539
Neg logit contributions
 says-0.344
 cheeks-0.331
 Ben-0.322
ens-0.305
 mouths-0.301
Token!
Token ID0
Feature activation+3.57e-02

Pos logit contributions
 says+0.335
 Ben+0.335
 cheeks+0.325
 Tom+0.323
 Sam+0.305
Neg logit contributions
amed-0.509
and-0.492
year-0.481
uttering-0.457
Along-0.455
Token
Token ID220
Feature activation-1.87e-02

Pos logit contributions
and+0.078
amed+0.075
year+0.072
Along+0.070
Just+0.068
Neg logit contributions
 cheeks-0.067
 says-0.065
 mouths-0.063
 Ben-0.061
ens-0.060
Token Sarah
Token ID10490
Feature activation-2.09e-01

Pos logit contributions
 says+0.039
 Ben+0.039
 Tom+0.039
 Sam+0.037
 cheeks+0.035
Neg logit contributions
amed-0.038
and-0.036
year-0.035
aming-0.034
uttering-0.033
Token be
Token ID307
Feature activation-1.90e-01

Pos logit contributions
 Ben+0.374
 cheeks+0.373
 says+0.371
 mouths+0.361
 Tom+0.360
Neg logit contributions
amed-0.467
and-0.445
year-0.419
aming-0.409
Along-0.406
Tokenamed
Token ID2434
Feature activation+6.89e-01

Pos logit contributions
 Tom+0.516
 Ben+0.509
 Lily+0.490
 Sam+0.481
 says+0.472
Neg logit contributions
and-0.244
uttering-0.220
ust-0.219
Along-0.216
to-0.190
Token with
Token ID351
Feature activation+2.04e-01

Pos logit contributions
amed+1.423
and+1.400
year+1.307
Along+1.249
uttering+1.242
Neg logit contributions
 says-1.401
 Ben-1.378
 cheeks-1.364
 Tom-1.337
 mouths-1.302
Token pride
Token ID11293
Feature activation+1.40e-01

Pos logit contributions
amed+0.444
and+0.427
year+0.408
aming+0.390
Along+0.379
Neg logit contributions
 says-0.386
 cheeks-0.383
 Ben-0.371
 Lee-0.366
 mouths-0.364
Token,
Token ID11
Feature activation-2.12e-01

Pos logit contributions
amed+0.369
and+0.365
year+0.345
Along+0.334
uttering+0.331
Neg logit contributions
 says-0.206
 Ben-0.203
 cheeks-0.201
 Tom-0.194
 mouths-0.186
Token she
Token ID673
Feature activation-1.46e-01

Pos logit contributions
 says+0.400
 Ben+0.397
 cheeks+0.389
 Tom+0.387
 Sam+0.368
Neg logit contributions
amed-0.472
and-0.464
year-0.441
uttering-0.416
Along-0.415
Token was
Token ID373
Feature activation-4.24e-03

Pos logit contributions
 says+0.259
 Ben+0.254
 cheeks+0.252
 Tom+0.245
 mouths+0.240
Neg logit contributions
amed-0.336
and-0.331
year-0.311
Along-0.298
aming-0.296
Token friends
Token ID2460
Feature activation+8.74e-02

Pos logit contributions
 says+0.206
 cheeks+0.200
 Ben+0.198
 Tom+0.194
 mouths+0.192
Neg logit contributions
amed-0.193
and-0.190
year-0.173
uttering-0.169
Along-0.168
Token were
Token ID547
Feature activation-9.22e-02

Pos logit contributions
amed+0.206
and+0.198
year+0.189
Along+0.183
aming+0.182
Neg logit contributions
 says-0.151
 cheeks-0.149
 Ben-0.146
 mouths-0.143
 Tom-0.140
Token playing
Token ID2712
Feature activation+4.75e-02

Pos logit contributions
 Ben+0.132
 Tom+0.129
 Sam+0.118
 Tim+0.111
 says+0.110
Neg logit contributions
and-0.263
year-0.256
amed-0.249
Just-0.238
pr-0.238
Token hide
Token ID7808
Feature activation-8.09e-02

Pos logit contributions
amed+0.087
and+0.085
year+0.083
Along+0.074
aming+0.074
Neg logit contributions
 Ben-0.109
 says-0.106
 Tom-0.106
 Sam-0.103
 cheeks-0.102
Token-
Token ID12
Feature activation+1.83e-01

Pos logit contributions
 Ben+0.148
 says+0.146
 cheeks+0.144
 Tom+0.142
 Sam+0.136
Neg logit contributions
amed-0.182
and-0.176
year-0.168
uttering-0.162
Along-0.161
Tokenand
Token ID392
Feature activation+6.87e-01

Pos logit contributions
amed+0.094
uttering+0.069
aming+0.063
pr+0.057
year+0.056
Neg logit contributions
 Tom-0.601
ens-0.600
 Ben-0.591
 says-0.590
 floats-0.586
Token-
Token ID12
Feature activation+1.83e-01

Pos logit contributions
and+1.148
amed+1.093
year+0.992
uttering+0.929
Along+0.925
Neg logit contributions
 says-1.712
 Ben-1.670
 cheeks-1.651
 Tom-1.626
 mouths-1.621
Tokenseek
Token ID36163
Feature activation+2.74e-01

Pos logit contributions
amed+0.351
year+0.326
aming+0.326
uttering+0.325
pr+0.323
Neg logit contributions
 Tom-0.369
 Ben-0.355
ens-0.352
 says-0.349
 cheeks-0.349
Token.
Token ID13
Feature activation-2.81e-02

Pos logit contributions
amed+0.691
year+0.640
and+0.628
aming+0.604
Along+0.601
Neg logit contributions
 Ben-0.450
 Tom-0.450
 Sam-0.418
 says-0.414
 cheeks-0.412
Token Tim
Token ID5045
Feature activation-3.72e-01

Pos logit contributions
 says+0.034
 cheeks+0.034
 Ben+0.032
 mouths+0.031
 Lee+0.029
Neg logit contributions
amed-0.079
and-0.078
Along-0.074
year-0.073
Just-0.072
Tokenmy
Token ID1820
Feature activation+4.33e-02

Pos logit contributions
 says+0.685
 Ben+0.670
 cheeks+0.660
 Tom+0.648
 mouths+0.629
Neg logit contributions
amed-0.843
and-0.832
year-0.785
uttering-0.749
Along-0.749
Token friends
Token ID2460
Feature activation+9.10e-02

Pos logit contributions
 says+0.175
 cheeks+0.170
 Ben+0.168
 Tom+0.165
 mouths+0.162
Neg logit contributions
amed-0.171
and-0.167
year-0.153
uttering-0.148
Along-0.148
Token were
Token ID547
Feature activation-8.16e-02

Pos logit contributions
amed+0.210
and+0.199
year+0.190
Along+0.186
aming+0.183
Neg logit contributions
 says-0.159
 cheeks-0.159
 Ben-0.154
 mouths-0.154
 Tom-0.148
Token playing
Token ID2712
Feature activation+5.00e-02

Pos logit contributions
 Ben+0.130
 Tom+0.125
 Sam+0.115
 says+0.114
 Tim+0.105
Neg logit contributions
and-0.219
year-0.211
amed-0.206
Just-0.197
pr-0.196
Token hide
Token ID7808
Feature activation-7.72e-02

Pos logit contributions
amed+0.091
and+0.090
year+0.086
Along+0.078
aming+0.078
Neg logit contributions
 Ben-0.116
 says-0.112
 Tom-0.111
 Sam-0.108
 cheeks-0.108
Token-
Token ID12
Feature activation+1.89e-01

Pos logit contributions
 Ben+0.140
 says+0.140
 cheeks+0.137
 Tom+0.134
 mouths+0.130
Neg logit contributions
amed-0.174
and-0.170
year-0.162
Along-0.155
uttering-0.154
Tokenand
Token ID392
Feature activation+6.87e-01

Pos logit contributions
amed+0.096
uttering+0.071
pr+0.064
Along+0.063
year+0.060
Neg logit contributions
 Tom-0.618
ens-0.616
 Ben-0.615
 says-0.610
 floats-0.610
Token-
Token ID12
Feature activation+1.89e-01

Pos logit contributions
and+1.172
amed+1.129
year+1.022
uttering+0.963
aming+0.955
Neg logit contributions
 says-1.681
 Ben-1.638
 cheeks-1.620
 Tom-1.600
 mouths-1.589
Tokenseek
Token ID36163
Feature activation+2.72e-01

Pos logit contributions
amed+0.349
year+0.325
uttering+0.323
pr+0.321
aming+0.316
Neg logit contributions
 Tom-0.391
 Ben-0.382
 says-0.371
ens-0.370
 cheeks-0.369
Token.
Token ID13
Feature activation-1.63e-02

Pos logit contributions
amed+0.650
year+0.610
and+0.601
Along+0.571
aming+0.571
Neg logit contributions
 Ben-0.472
 Tom-0.464
 says-0.442
 cheeks-0.441
 Sam-0.436
Token Sammy
Token ID44067
Feature activation-2.52e-01

Pos logit contributions
 says+0.023
 cheeks+0.022
 Ben+0.022
 Tom+0.021
 mouths+0.020
Neg logit contributions
amed-0.044
and-0.043
year-0.041
Along-0.040
uttering-0.040
Token was
Token ID373
Feature activation-2.72e-03

Pos logit contributions
 cheeks+0.456
 mouths+0.449
 says+0.445
 Ben+0.432
 Tom+0.420
Neg logit contributions
amed-0.565
and-0.544
Along-0.501
year-0.494
aming-0.493
TokenI
Token ID40
Feature activation+2.71e-01

Pos logit contributions
amed+0.289
and+0.286
year+0.259
 supposed+0.250
uttering+0.246
Neg logit contributions
 says-0.365
 Ben-0.357
 Tom-0.348
 Sam-0.324
 cheeks-0.321
Token got
Token ID1392
Feature activation-4.72e-02

Pos logit contributions
amed+0.667
and+0.658
year+0.622
Along+0.598
uttering+0.595
Neg logit contributions
 says-0.440
 Ben-0.431
 cheeks-0.428
 Tom-0.414
 mouths-0.406
Token it
Token ID340
Feature activation-2.12e-01

Pos logit contributions
 says+0.090
 Ben+0.090
 Tom+0.087
 cheeks+0.084
 Sam+0.083
Neg logit contributions
amed-0.106
and-0.104
year-0.095
uttering-0.095
Along-0.094
Token!"
Token ID2474
Feature activation-2.48e-02

Pos logit contributions
 says+0.325
 Ben+0.317
 cheeks+0.307
 Tom+0.306
 Sam+0.285
Neg logit contributions
amed-0.547
and-0.536
year-0.523
Along-0.493
uttering-0.492
Token be
Token ID307
Feature activation-1.91e-01

Pos logit contributions
 cheeks+0.029
 mouths+0.028
 trucks+0.027
 says+0.027
 floats+0.027
Neg logit contributions
amed-0.068
and-0.067
year-0.063
Just-0.063
Along-0.062
Tokenamed
Token ID2434
Feature activation+6.87e-01

Pos logit contributions
 Tom+0.422
 Ben+0.420
 says+0.419
 Sam+0.399
 cheeks+0.387
Neg logit contributions
and-0.355
Along-0.339
Days-0.330
ust-0.326
uttering-0.324
Token Bob
Token ID5811
Feature activation-2.29e-01

Pos logit contributions
and+1.420
amed+1.383
year+1.316
aming+1.240
 late+1.218
Neg logit contributions
 mouths-1.340
 cheeks-1.322
 says-1.290
 Ben-1.233
 floats-1.229
Token.
Token ID13
Feature activation-3.41e-02

Pos logit contributions
 says+0.382
 Ben+0.374
 cheeks+0.372
 Tom+0.359
 mouths+0.359
Neg logit contributions
amed-0.548
and-0.540
year-0.505
aming-0.485
Along-0.485
Token
Token ID220
Feature activation-2.32e-02

Pos logit contributions
 says+0.065
 cheeks+0.064
 Ben+0.063
 mouths+0.061
 Tom+0.060
Neg logit contributions
and-0.072
amed-0.072
year-0.067
Along-0.065
uttering-0.063
Token
Token ID198
Feature activation+1.82e-01

Pos logit contributions
 says+0.050
 Ben+0.050
 Tom+0.049
 Sam+0.046
 cheeks+0.045
Neg logit contributions
amed-0.047
and-0.045
year-0.044
aming-0.041
Along-0.041
Token
Token ID198
Feature activation+1.82e-01

Pos logit contributions
amed+0.391
and+0.360
year+0.360
uttering+0.357
aming+0.341
Neg logit contributions
 Ben-0.367
 Tom-0.366
 says-0.366
 Sam-0.342
 cheeks-0.318
Token hill
Token ID12788
Feature activation-2.00e-01

Pos logit contributions
amed+0.468
and+0.443
year+0.434
aming+0.413
uttering+0.410
Neg logit contributions
 says-0.280
 Ben-0.275
 cheeks-0.264
 Tom-0.262
 Sam-0.249
Token,
Token ID11
Feature activation-2.14e-01

Pos logit contributions
 says+0.326
 Tom+0.320
 Ben+0.315
 Sam+0.298
 cheeks+0.284
Neg logit contributions
amed-0.489
and-0.482
year-0.468
Days-0.443
aming-0.442
Token played
Token ID2826
Feature activation-5.94e-02

Pos logit contributions
 Ben+0.365
 says+0.358
 Tom+0.357
 cheeks+0.340
 Sam+0.336
Neg logit contributions
amed-0.517
and-0.500
year-0.488
uttering-0.461
pr-0.458
Token hide
Token ID7808
Feature activation-7.63e-02

Pos logit contributions
 says+0.117
 cheeks+0.115
 Ben+0.114
 mouths+0.111
 Tom+0.110
Neg logit contributions
amed-0.126
and-0.124
year-0.115
Along-0.111
uttering-0.110
Token-
Token ID12
Feature activation+1.86e-01

Pos logit contributions
 says+0.140
 Ben+0.138
 cheeks+0.137
 Tom+0.134
 mouths+0.130
Neg logit contributions
amed-0.171
and-0.168
year-0.159
Along-0.152
uttering-0.151
Tokenand
Token ID392
Feature activation+6.87e-01

Pos logit contributions
amed+0.092
uttering+0.078
aming+0.064
year+0.063
pr+0.060
Neg logit contributions
 Tom-0.613
ens-0.605
 says-0.601
 cheeks-0.600
 Ben-0.594
Token-
Token ID12
Feature activation+1.86e-01

Pos logit contributions
and+1.164
amed+1.148
year+1.032
Along+0.972
uttering+0.965
Neg logit contributions
 says-1.664
 Ben-1.631
 cheeks-1.609
 Tom-1.585
 mouths-1.579
Tokenseek
Token ID36163
Feature activation+2.68e-01

Pos logit contributions
amed+0.344
uttering+0.327
year+0.323
aming+0.318
pr+0.315
Neg logit contributions
 Tom-0.381
 cheeks-0.363
 says-0.360
 Ben-0.358
ens-0.357
Token in
Token ID287
Feature activation+2.96e-01

Pos logit contributions
amed+0.671
and+0.639
year+0.634
aming+0.599
uttering+0.597
Neg logit contributions
 Ben-0.431
 says-0.421
 Tom-0.421
 cheeks-0.414
 Sam-0.391
Token a
Token ID257
Feature activation+1.81e-01

Pos logit contributions
and+0.791
amed+0.785
year+0.742
Along+0.740
aming+0.720
Neg logit contributions
 cheeks-0.408
 says-0.400
 mouths-0.368
ens-0.352
 Ben-0.348
Token me
Token ID502
Feature activation-1.07e-01

Pos logit contributions
amed+0.436
and+0.424
year+0.409
uttering+0.391
aming+0.383
Neg logit contributions
 Ben-0.308
 says-0.307
 Tom-0.293
 cheeks-0.289
 mouths-0.282
Token
Token ID198
Feature activation+1.98e-01

Pos logit contributions
 Ben+0.018
 says+0.018
 Tom+0.017
 Sam+0.016
 cheeks+0.016
Neg logit contributions
amed-0.019
year-0.018
and-0.018
Along-0.017
pr-0.017
Token
Token ID198
Feature activation+2.00e-01

Pos logit contributions
amed+0.426
uttering+0.405
year+0.387
 supposed+0.384
and+0.382
Neg logit contributions
 Ben-0.400
 Tom-0.393
 Sam-0.381
 says-0.379
 Tim-0.347
TokenKat
Token ID25881
Feature activation+4.69e-01

Pos logit contributions
amed+0.339
uttering+0.320
 supposed+0.309
year+0.305
and+0.298
Neg logit contributions
 Ben-0.472
 Tom-0.466
 Sam-0.452
 says-0.440
 Tim-0.416
Tokenie
Token ID494
Feature activation-1.78e-01

Pos logit contributions
amed+1.399
and+1.373
year+1.314
aming+1.275
Along+1.256
Neg logit contributions
 Ben-0.514
 says-0.513
 cheeks-0.504
 Tom-0.483
 mouths-0.476
Token be
Token ID307
Feature activation-1.79e-01

Pos logit contributions
 Ben+0.298
 cheeks+0.297
 says+0.296
 mouths+0.290
 Tom+0.288
Neg logit contributions
amed-0.421
and-0.399
year-0.378
aming-0.369
Along-0.369
Tokenamed
Token ID2434
Feature activation+6.87e-01

Pos logit contributions
 Ben+0.507
 Tom+0.498
 Sam+0.484
 Lily+0.470
 Mia+0.459
Neg logit contributions
and-0.244
uttering-0.195
ust-0.189
if-0.182
Along-0.174
Token with
Token ID351
Feature activation+2.13e-01

Pos logit contributions
amed+1.508
and+1.485
year+1.392
Along+1.331
uttering+1.324
Neg logit contributions
 says-1.305
 Ben-1.280
 cheeks-1.269
 Tom-1.239
 mouths-1.210
Token joy
Token ID8716
Feature activation+5.55e-02

Pos logit contributions
amed+0.464
and+0.448
year+0.427
aming+0.406
Along+0.403
Neg logit contributions
 says-0.409
 cheeks-0.403
 Ben-0.392
 mouths-0.384
 Tom-0.381
Token.
Token ID13
Feature activation-2.35e-02

Pos logit contributions
amed+0.138
and+0.136
year+0.129
Along+0.124
aming+0.123
Neg logit contributions
 says-0.089
 Ben-0.088
 cheeks-0.087
 Tom-0.085
 mouths-0.081
Token "
Token ID366
Feature activation-3.77e-02

Pos logit contributions
 says+0.038
 Ben+0.037
 cheeks+0.036
 Tom+0.036
 mouths+0.034
Neg logit contributions
amed-0.059
and-0.058
year-0.054
aming-0.052
Along-0.052
TokenLook
Token ID8567
Feature activation+1.72e-01

Pos logit contributions
 Ben+0.077
 Tom+0.076
 Sam+0.071
 says+0.070
 Tim+0.067
Neg logit contributions
and-0.071
amed-0.069
uttering-0.065
 supposed-0.064
year-0.063
Token played
Token ID2826
Feature activation-6.75e-02

Pos logit contributions
 says+0.331
 cheeks+0.328
 Ben+0.319
 mouths+0.311
 Tom+0.310
Neg logit contributions
amed-0.383
and-0.370
year-0.354
Along-0.340
aming-0.330
Token games
Token ID1830
Feature activation+2.26e-02

Pos logit contributions
 Ben+0.132
 says+0.129
 Tom+0.128
 Sam+0.124
 cheeks+0.119
Neg logit contributions
amed-0.143
and-0.142
year-0.139
aming-0.131
uttering-0.131
Token like
Token ID588
Feature activation+1.88e-01

Pos logit contributions
amed+0.061
and+0.061
year+0.059
aming+0.056
uttering+0.056
Neg logit contributions
 Ben-0.031
 says-0.031
 Tom-0.030
 cheeks-0.028
 Sam-0.028
Token hide
Token ID7808
Feature activation-8.05e-02

Pos logit contributions
and+0.457
amed+0.449
year+0.422
Along+0.416
uttering+0.403
Neg logit contributions
 cheeks-0.305
 says-0.304
 Ben-0.286
 mouths-0.285
 Tom-0.271
Token-
Token ID12
Feature activation+1.86e-01

Pos logit contributions
 Ben+0.140
 says+0.139
 cheeks+0.137
 Tom+0.135
 mouths+0.129
Neg logit contributions
amed-0.188
and-0.182
year-0.175
uttering-0.168
aming-0.167
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.092
uttering+0.065
year+0.064
aming+0.061
pr+0.061
Neg logit contributions
 Tom-0.636
 says-0.630
 Ben-0.630
ens-0.625
 cheeks-0.622
Token-
Token ID12
Feature activation+1.86e-01

Pos logit contributions
and+1.161
amed+1.113
year+0.996
Along+0.955
uttering+0.935
Neg logit contributions
 says-1.677
 Ben-1.640
 cheeks-1.615
 Tom-1.596
 mouths-1.582
Tokenseek
Token ID36163
Feature activation+2.71e-01

Pos logit contributions
amed+0.346
year+0.322
aming+0.321
pr+0.318
uttering+0.316
Neg logit contributions
 Tom-0.396
 Ben-0.385
 says-0.383
 cheeks-0.381
ens-0.373
Token and
Token ID290
Feature activation+2.83e-02

Pos logit contributions
amed+0.715
year+0.676
and+0.652
aming+0.643
uttering+0.633
Neg logit contributions
 Ben-0.415
 Tom-0.410
 says-0.391
 Sam-0.373
 cheeks-0.368
Token tag
Token ID7621
Feature activation-4.73e-02

Pos logit contributions
and+0.064
amed+0.062
Along+0.057
year+0.057
sy+0.055
Neg logit contributions
 says-0.048
 cheeks-0.046
 mouths-0.044
ens-0.041
 Ben-0.041
Token.
Token ID13
Feature activation-2.78e-02

Pos logit contributions
 says+0.077
 Ben+0.076
 cheeks+0.075
 Tom+0.073
 mouths+0.071
Neg logit contributions
amed-0.116
and-0.115
year-0.108
Along-0.104
uttering-0.103
Token she
Token ID673
Feature activation-1.49e-01

Pos logit contributions
 says+0.279
 cheeks+0.267
 Ben+0.264
 Tom+0.252
 mouths+0.250
Neg logit contributions
amed-0.616
and-0.615
year-0.584
Along-0.565
uttering-0.561
Token would
Token ID561
Feature activation+3.77e-02

Pos logit contributions
 cheeks+0.296
 says+0.291
 Ben+0.286
 Tom+0.281
 mouths+0.281
Neg logit contributions
amed-0.303
and-0.296
year-0.278
Along-0.272
aming-0.264
Token play
Token ID711
Feature activation-1.88e-01

Pos logit contributions
and+0.101
amed+0.101
year+0.095
Along+0.095
aming+0.091
Neg logit contributions
 says-0.050
 Ben-0.048
 cheeks-0.047
 mouths-0.046
 Tom-0.046
Token hide
Token ID7808
Feature activation-8.20e-02

Pos logit contributions
 Ben+0.409
 Tom+0.396
 says+0.389
 Sam+0.378
 cheeks+0.369
Neg logit contributions
amed-0.376
and-0.353
year-0.350
uttering-0.328
aming-0.328
Token-
Token ID12
Feature activation+1.78e-01

Pos logit contributions
 Ben+0.149
 says+0.143
 Tom+0.142
 cheeks+0.142
 Sam+0.134
Neg logit contributions
amed-0.185
and-0.176
year-0.169
uttering-0.166
Along-0.164
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.092
uttering+0.085
aming+0.067
pr+0.062
Along+0.060
Neg logit contributions
 Tom-0.574
ens-0.570
 Ben-0.566
 cheeks-0.565
 says-0.562
Token-
Token ID12
Feature activation+1.78e-01

Pos logit contributions
and+1.180
amed+1.172
year+1.058
Along+0.997
aming+0.988
Neg logit contributions
 says-1.643
 Ben-1.608
 cheeks-1.591
 Tom-1.568
 mouths-1.557
Tokenseek
Token ID36163
Feature activation+2.72e-01

Pos logit contributions
amed+0.340
uttering+0.329
pr+0.320
aming+0.320
ust+0.319
Neg logit contributions
 Tom-0.359
 cheeks-0.348
 Ben-0.348
 says-0.341
ens-0.336
Token with
Token ID351
Feature activation+1.98e-01

Pos logit contributions
amed+0.676
year+0.616
aming+0.587
and+0.586
Eventually+0.578
Neg logit contributions
 Ben-0.488
 Tom-0.479
 Sam-0.432
 says-0.416
 Lily-0.412
Token her
Token ID607
Feature activation-1.13e-01

Pos logit contributions
and+0.461
amed+0.444
year+0.408
Along+0.398
aming+0.395
Neg logit contributions
 says-0.347
 cheeks-0.330
 mouths-0.315
 Lee-0.299
 floats-0.299
Token friends
Token ID2460
Feature activation+8.80e-02

Pos logit contributions
 says+0.230
 cheeks+0.224
 Ben+0.221
 Tom+0.217
 mouths+0.213
Neg logit contributions
amed-0.224
and-0.224
Along-0.199
year-0.198
uttering-0.198
Token special
Token ID2041
Feature activation+3.13e-01

Pos logit contributions
amed+1.096
and+1.060
year+1.020
aming+0.976
uttering+0.957
Neg logit contributions
 says-0.729
 cheeks-0.712
 Ben-0.709
 Tom-0.682
 mouths-0.658
Token game
Token ID983
Feature activation-3.02e-02

Pos logit contributions
and+0.733
amed+0.730
year+0.677
Along+0.662
uttering+0.652
Neg logit contributions
 says-0.534
 Ben-0.523
 cheeks-0.518
 Tom-0.505
 mouths-0.502
Token of
Token ID286
Feature activation+2.22e-01

Pos logit contributions
 says+0.050
 Ben+0.049
 cheeks+0.048
 Tom+0.047
 mouths+0.046
Neg logit contributions
amed-0.074
and-0.072
year-0.068
Along-0.066
aming-0.065
Token hide
Token ID7808
Feature activation-7.27e-02

Pos logit contributions
and+0.488
amed+0.480
year+0.439
Along+0.436
uttering+0.428
Neg logit contributions
 cheeks-0.406
 says-0.402
 mouths-0.386
 Ben-0.376
ens-0.372
Token-
Token ID12
Feature activation+1.88e-01

Pos logit contributions
 says+0.136
 Ben+0.133
 cheeks+0.132
 Tom+0.129
 mouths+0.126
Neg logit contributions
amed-0.162
and-0.159
year-0.149
Along-0.143
uttering-0.143
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.134
uttering+0.120
aming+0.104
Along+0.103
ust+0.095
Neg logit contributions
 Tom-0.589
 Ben-0.583
 says-0.583
ens-0.572
 Sam-0.570
Token-
Token ID12
Feature activation+1.88e-01

Pos logit contributions
and+1.157
amed+1.121
year+1.019
Along+0.947
aming+0.942
Neg logit contributions
 says-1.679
 Ben-1.637
 cheeks-1.620
 mouths-1.603
 Tom-1.600
Tokenseek
Token ID36163
Feature activation+2.70e-01

Pos logit contributions
amed+0.328
uttering+0.303
year+0.297
aming+0.296
and+0.293
Neg logit contributions
 Tom-0.418
 Ben-0.414
 says-0.407
 Sam-0.394
 cheeks-0.391
Token,
Token ID11
Feature activation-2.05e-01

Pos logit contributions
amed+0.702
and+0.665
year+0.655
Along+0.627
aming+0.626
Neg logit contributions
 Ben-0.407
 says-0.402
 Tom-0.392
 cheeks-0.391
 Sam-0.365
Token so
Token ID523
Feature activation+3.42e-01

Pos logit contributions
 says+0.413
 cheeks+0.403
 Ben+0.402
 Tom+0.390
 mouths+0.387
Neg logit contributions
amed-0.424
and-0.422
year-0.389
Along-0.374
aming-0.369
Token she
Token ID673
Feature activation-1.38e-01

Pos logit contributions
amed+0.892
and+0.886
year+0.837
Along+0.823
Just+0.810
Neg logit contributions
 says-0.475
 cheeks-0.451
 mouths-0.441
 Ben-0.436
 floats-0.412
Token be
Token ID307
Feature activation-1.93e-01

Pos logit contributions
amed+0.714
and+0.703
year+0.665
Along+0.647
aming+0.643
Neg logit contributions
 says-0.379
 Ben-0.371
 cheeks-0.370
 Tom-0.352
 mouths-0.351
Token very
Token ID845
Feature activation+4.54e-01

Pos logit contributions
 Tom+0.435
 Sam+0.428
 Ben+0.426
 says+0.416
 Tim+0.385
Neg logit contributions
and-0.374
year-0.342
Along-0.311
uttering-0.305
ust-0.302
Token grateful
Token ID14066
Feature activation+3.79e-01

Pos logit contributions
and+0.813
amed+0.800
year+0.711
Along+0.672
uttering+0.668
Neg logit contributions
 says-1.121
 Ben-1.082
 Tom-1.070
 Sam-1.034
 cheeks-1.028
Token and
Token ID290
Feature activation+1.72e-02

Pos logit contributions
amed+0.817
and+0.817
year+0.754
uttering+0.721
Along+0.720
Neg logit contributions
 says-0.718
 cheeks-0.702
 Ben-0.702
 Tom-0.679
 mouths-0.674
Token be
Token ID307
Feature activation-1.97e-01

Pos logit contributions
amed+0.038
and+0.036
year+0.035
aming+0.033
Along+0.032
Neg logit contributions
 cheeks-0.031
 Ben-0.031
 says-0.031
 Lee-0.030
 mouths-0.030
Tokenamed
Token ID2434
Feature activation+6.86e-01

Pos logit contributions
 Tom+0.477
 Sam+0.472
 Ben+0.471
 says+0.455
 Tim+0.438
Neg logit contributions
and-0.320
Along-0.281
uttering-0.267
ust-0.263
year-0.255
Token in
Token ID287
Feature activation+2.84e-01

Pos logit contributions
amed+1.440
and+1.415
year+1.321
Along+1.269
uttering+1.261
Neg logit contributions
 says-1.373
 Ben-1.350
 cheeks-1.336
 Tom-1.310
 mouths-1.272
Token the
Token ID262
Feature activation-1.89e-01

Pos logit contributions
amed+0.763
and+0.748
year+0.716
aming+0.688
Along+0.688
Neg logit contributions
 says-0.402
 cheeks-0.390
 Ben-0.386
 Tom-0.368
 mouths-0.363
Token sunshine
Token ID34488
Feature activation-4.25e-02

Pos logit contributions
 says+0.285
 Ben+0.278
 cheeks+0.275
 Tom+0.267
 mouths+0.259
Neg logit contributions
amed-0.492
and-0.484
year-0.459
Along-0.442
aming-0.440
Token.
Token ID13
Feature activation-4.46e-02

Pos logit contributions
 says+0.072
 Ben+0.070
 cheeks+0.070
 Tom+0.068
 mouths+0.067
Neg logit contributions
amed-0.099
and-0.099
year-0.092
uttering-0.089
Along-0.089
Token\n
Token ID198
Feature activation+1.77e-01

Pos logit contributions
 cheeks+0.088
 says+0.087
 mouths+0.083
 Ben+0.083
 Tom+0.080
Neg logit contributions
and-0.093
amed-0.092
year-0.086
Along-0.083
Just-0.081
Token.
Token ID13
Feature activation-2.36e-02

Pos logit contributions
 says+0.279
 Ben+0.273
 cheeks+0.271
 Tom+0.264
 mouths+0.258
Neg logit contributions
amed-0.356
and-0.350
year-0.329
Along-0.315
uttering-0.314
Token They
Token ID1119
Feature activation-1.72e-01

Pos logit contributions
 says+0.036
 cheeks+0.036
 Ben+0.034
 mouths+0.034
 Tom+0.033
Neg logit contributions
amed-0.059
and-0.059
year-0.055
Along-0.054
Just-0.053
Token played
Token ID2826
Feature activation-6.24e-02

Pos logit contributions
 says+0.314
 cheeks+0.312
 Ben+0.302
 mouths+0.299
 Lee+0.299
Neg logit contributions
amed-0.386
and-0.357
year-0.351
Along-0.339
aming-0.327
Token hide
Token ID7808
Feature activation-7.56e-02

Pos logit contributions
 says+0.142
 Ben+0.141
 cheeks+0.138
 Tom+0.137
 mouths+0.132
Neg logit contributions
amed-0.114
and-0.111
year-0.104
aming-0.098
Along-0.098
Token-
Token ID12
Feature activation+1.82e-01

Pos logit contributions
 says+0.130
 Ben+0.129
 cheeks+0.127
 Tom+0.125
 mouths+0.120
Neg logit contributions
amed-0.178
and-0.174
year-0.165
Along-0.159
uttering-0.159
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.081
uttering+0.071
pr+0.055
aming+0.055
Along+0.053
Neg logit contributions
 Tom-0.610
 says-0.604
ens-0.603
 cheeks-0.598
 Ben-0.597
Token-
Token ID12
Feature activation+1.82e-01

Pos logit contributions
and+1.179
amed+1.159
year+1.046
aming+0.974
Along+0.973
Neg logit contributions
 says-1.647
 Ben-1.624
 cheeks-1.596
 Tom-1.577
 mouths-1.560
Tokenseek
Token ID36163
Feature activation+2.72e-01

Pos logit contributions
amed+0.324
uttering+0.310
aming+0.299
year+0.299
pr+0.297
Neg logit contributions
 Tom-0.387
 says-0.379
 cheeks-0.377
 Ben-0.373
ens-0.371
Token until
Token ID1566
Feature activation+1.24e-01

Pos logit contributions
amed+0.681
year+0.648
and+0.646
aming+0.615
uttering+0.609
Neg logit contributions
 Ben-0.437
 Tom-0.429
 says-0.419
 cheeks-0.408
 Sam-0.398
Token it
Token ID340
Feature activation-2.13e-01

Pos logit contributions
amed+0.320
 late+0.297
year+0.292
Along+0.289
and+0.289
Neg logit contributions
 says-0.182
 cheeks-0.175
 acts-0.161
 Lee-0.157
 mouths-0.152
Token was
Token ID373
Feature activation-1.98e-03

Pos logit contributions
 says+0.293
 Ben+0.287
 Tom+0.274
 cheeks+0.263
 Sam+0.258
Neg logit contributions
and-0.580
amed-0.562
year-0.559
uttering-0.539
aming-0.518
Token.
Token ID13
Feature activation-2.20e-02

Pos logit contributions
amed+0.247
and+0.243
year+0.230
Along+0.221
uttering+0.220
Neg logit contributions
 says-0.158
 Ben-0.155
 cheeks-0.153
 Tom-0.149
 mouths-0.144
Token They
Token ID1119
Feature activation-1.73e-01

Pos logit contributions
 says+0.037
 cheeks+0.036
 mouths+0.033
 ah+0.033
 acts+0.032
Neg logit contributions
and-0.051
amed-0.050
Along-0.049
uttering-0.046
Just-0.046
Token played
Token ID2826
Feature activation-6.74e-02

Pos logit contributions
 cheeks+0.333
 says+0.330
 mouths+0.312
 Ben+0.301
 Lee+0.301
Neg logit contributions
amed-0.374
year-0.343
and-0.341
Along-0.335
uttering-0.315
Token hide
Token ID7808
Feature activation-7.67e-02

Pos logit contributions
 Ben+0.151
 says+0.149
 Tom+0.147
 Sam+0.143
 cheeks+0.141
Neg logit contributions
amed-0.124
and-0.123
year-0.120
aming-0.111
uttering-0.110
Token-
Token ID12
Feature activation+1.84e-01

Pos logit contributions
 says+0.142
 Ben+0.141
 cheeks+0.139
 Tom+0.136
 mouths+0.131
Neg logit contributions
amed-0.171
and-0.168
year-0.159
uttering-0.152
Along-0.152
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.081
uttering+0.060
pr+0.055
year+0.053
aming+0.052
Neg logit contributions
 Tom-0.626
ens-0.618
 Ben-0.615
 says-0.614
 cheeks-0.610
Token-
Token ID12
Feature activation+1.84e-01

Pos logit contributions
and+1.155
amed+1.124
year+1.006
Along+0.943
uttering+0.941
Neg logit contributions
 says-1.682
 Ben-1.646
 cheeks-1.621
 Tom-1.600
 mouths-1.593
Tokenseek
Token ID36163
Feature activation+2.71e-01

Pos logit contributions
amed+0.340
year+0.323
pr+0.320
uttering+0.318
aming+0.315
Neg logit contributions
 Tom-0.384
 Ben-0.368
 cheeks-0.367
 says-0.365
ens-0.364
Token,
Token ID11
Feature activation-2.10e-01

Pos logit contributions
amed+0.700
year+0.673
and+0.648
aming+0.632
pr+0.623
Neg logit contributions
 Ben-0.422
 Tom-0.420
 says-0.390
 Sam-0.384
 cheeks-0.380
Token tag
Token ID7621
Feature activation-4.65e-02

Pos logit contributions
 says+0.366
 Ben+0.360
 cheeks+0.356
 Tom+0.348
 mouths+0.337
Neg logit contributions
amed-0.494
and-0.487
year-0.459
Along-0.439
uttering-0.439
Token,
Token ID11
Feature activation-2.10e-01

Pos logit contributions
 says+0.066
 cheeks+0.064
 Ben+0.063
 mouths+0.060
 Tom+0.059
Neg logit contributions
amed-0.124
and-0.123
year-0.114
Along-0.112
uttering-0.110
Token like
Token ID588
Feature activation+1.87e-01

Pos logit contributions
 says+0.428
 cheeks+0.417
 Ben+0.416
 Tom+0.402
 mouths+0.399
Neg logit contributions
amed-0.459
and-0.453
year-0.422
Along-0.404
aming-0.400
Token tag
Token ID7621
Feature activation-4.92e-02

Pos logit contributions
and+0.413
amed+0.407
year+0.382
Along+0.365
uttering+0.356
Neg logit contributions
 says-0.344
 cheeks-0.338
 mouths-0.323
 Ben-0.319
ens-0.307
Token and
Token ID290
Feature activation+2.26e-02

Pos logit contributions
 says+0.088
 cheeks+0.087
 Ben+0.084
 mouths+0.083
 Tom+0.079
Neg logit contributions
and-0.109
amed-0.108
year-0.098
Along-0.096
aming-0.094
Token hide
Token ID7808
Feature activation-8.13e-02

Pos logit contributions
and+0.052
amed+0.051
year+0.047
Along+0.047
Just+0.045
Neg logit contributions
 says-0.037
 cheeks-0.036
 Ben-0.035
ens-0.035
 mouths-0.034
Token-
Token ID12
Feature activation+1.80e-01

Pos logit contributions
 Ben+0.156
 says+0.155
 cheeks+0.154
 Tom+0.151
 mouths+0.146
Neg logit contributions
amed-0.175
and-0.171
year-0.160
Along-0.155
uttering-0.154
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.098
uttering+0.096
aming+0.075
Along+0.067
pr+0.061
Neg logit contributions
 Tom-0.578
 cheeks-0.567
ens-0.563
 floats-0.562
 says-0.562
Token-
Token ID12
Feature activation+1.80e-01

Pos logit contributions
amed+1.212
and+1.193
year+1.097
Along+1.034
uttering+1.028
Neg logit contributions
 says-1.602
 Ben-1.576
 cheeks-1.564
 Tom-1.534
 mouths-1.509
Tokenseek
Token ID36163
Feature activation+2.71e-01

Pos logit contributions
amed+0.337
uttering+0.321
aming+0.315
pr+0.301
year+0.299
Neg logit contributions
 Tom-0.367
 cheeks-0.355
 Ben-0.346
 says-0.343
ens-0.341
Token.
Token ID13
Feature activation-2.92e-02

Pos logit contributions
amed+0.727
and+0.689
year+0.677
Along+0.651
aming+0.651
Neg logit contributions
 Ben-0.391
 Tom-0.380
 says-0.375
 cheeks-0.369
 Sam-0.350
Token The
Token ID383
Feature activation-1.41e-01

Pos logit contributions
 says+0.049
 Ben+0.048
 cheeks+0.047
 Tom+0.046
 mouths+0.045
Neg logit contributions
amed-0.071
and-0.070
year-0.066
Along-0.063
uttering-0.063
Token robot
Token ID9379
Feature activation-1.56e-01

Pos logit contributions
 says+0.164
 Ben+0.159
 cheeks+0.157
 Tom+0.150
 mouths+0.145
Neg logit contributions
amed-0.416
and-0.411
year-0.392
Along-0.379
uttering-0.378
Token played
Token ID2826
Feature activation-7.34e-02

Pos logit contributions
 cheeks+0.320
 says+0.320
 mouths+0.302
 Ben+0.299
 Tom+0.287
Neg logit contributions
amed-0.410
and-0.394
year-0.380
Along-0.367
aming-0.355
Token tag
Token ID7621
Feature activation-4.94e-02

Pos logit contributions
 Ben+0.141
 says+0.139
 Tom+0.138
 cheeks+0.131
 Sam+0.130
Neg logit contributions
amed-0.160
and-0.156
year-0.151
uttering-0.144
aming-0.144
Token and
Token ID290
Feature activation+2.17e-02

Pos logit contributions
 says+0.085
 Ben+0.083
 cheeks+0.082
 Tom+0.081
 mouths+0.078
Neg logit contributions
amed-0.118
and-0.116
year-0.110
uttering-0.105
aming-0.105
Token hide
Token ID7808
Feature activation-8.34e-02

Pos logit contributions
amed+0.047
and+0.046
Along+0.042
year+0.042
Just+0.040
Neg logit contributions
 says-0.040
 cheeks-0.038
 mouths-0.037
 Ben-0.036
 Lee-0.036
Token-
Token ID12
Feature activation+1.78e-01

Pos logit contributions
 Ben+0.157
 says+0.156
 cheeks+0.154
 Tom+0.153
 mouths+0.147
Neg logit contributions
amed-0.184
and-0.177
year-0.168
uttering-0.163
Along-0.162
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.097
uttering+0.082
aming+0.072
pr+0.063
Along+0.058
Neg logit contributions
 Tom-0.598
ens-0.589
 Ben-0.580
 cheeks-0.578
 says-0.578
Token-
Token ID12
Feature activation+1.78e-01

Pos logit contributions
and+1.168
amed+1.141
year+1.038
Along+0.970
uttering+0.962
Neg logit contributions
 says-1.664
 Ben-1.631
 cheeks-1.611
 Tom-1.584
 mouths-1.567
Tokenseek
Token ID36163
Feature activation+2.72e-01

Pos logit contributions
amed+0.341
uttering+0.320
aming+0.318
pr+0.311
year+0.309
Neg logit contributions
 Tom-0.373
 Ben-0.354
ens-0.353
 cheeks-0.352
 says-0.345
Token.
Token ID13
Feature activation-3.13e-02

Pos logit contributions
amed+0.780
year+0.720
and+0.711
aming+0.698
uttering+0.686
Neg logit contributions
 Ben-0.371
 Tom-0.367
 says-0.334
 cheeks-0.328
 Sam-0.326
Token Tim
Token ID5045
Feature activation-3.68e-01

Pos logit contributions
 says+0.044
 Ben+0.040
 cheeks+0.039
 Lee+0.039
 acts+0.038
Neg logit contributions
amed-0.080
and-0.079
Along-0.076
year-0.076
Just-0.075
Tokenmy
Token ID1820
Feature activation+4.45e-02

Pos logit contributions
 says+0.683
 Ben+0.669
 cheeks+0.662
 Tom+0.647
 mouths+0.631
Neg logit contributions
amed-0.829
and-0.816
year-0.768
Along-0.734
uttering-0.732
Token loved
Token ID6151
Feature activation+2.26e-01

Pos logit contributions
 says+0.175
 Ben+0.173
 cheeks+0.170
 Tom+0.168
 Sam+0.160
Neg logit contributions
and-0.152
amed-0.152
year-0.139
uttering-0.136
Along-0.132
Token to
Token ID284
Feature activation+2.63e-01

Pos logit contributions
amed+0.350
and+0.346
year+0.313
Along+0.294
uttering+0.291
Neg logit contributions
 says-0.570
 Ben-0.560
 cheeks-0.559
 Tom-0.546
 mouths-0.539
Token play
Token ID711
Feature activation-1.88e-01

Pos logit contributions
and+0.533
Along+0.517
amed+0.491
On+0.480
pr+0.477
Neg logit contributions
 mouths-0.444
 says-0.444
 Lee-0.419
 cheeks-0.404
ens-0.399
Token hide
Token ID7808
Feature activation-8.10e-02

Pos logit contributions
 says+0.453
 Ben+0.448
 cheeks+0.443
 Tom+0.436
 mouths+0.427
Neg logit contributions
amed-0.316
and-0.309
year-0.285
Along-0.267
uttering-0.265
Token-
Token ID12
Feature activation+1.80e-01

Pos logit contributions
 Ben+0.151
 says+0.148
 cheeks+0.148
 Tom+0.144
 Sam+0.136
Neg logit contributions
amed-0.179
and-0.170
year-0.164
uttering-0.159
Along-0.157
Tokenand
Token ID392
Feature activation+6.86e-01

Pos logit contributions
amed+0.102
uttering+0.080
aming+0.072
pr+0.067
year+0.063
Neg logit contributions
 Tom-0.589
ens-0.581
 cheeks-0.580
 says-0.579
 Ben-0.579
Token-
Token ID12
Feature activation+1.80e-01

Pos logit contributions
and+1.165
amed+1.164
year+1.050
Along+0.994
aming+0.983
Neg logit contributions
 says-1.643
 Ben-1.611
 cheeks-1.595
 Tom-1.571
 mouths-1.553
Tokenseek
Token ID36163
Feature activation+2.72e-01

Pos logit contributions
amed+0.359
pr+0.341
uttering+0.339
aming+0.339
year+0.336
Neg logit contributions
 Tom-0.347
 cheeks-0.338
 Ben-0.330
 says-0.325
ens-0.322
Token with
Token ID351
Feature activation+2.00e-01

Pos logit contributions
amed+0.652
year+0.594
and+0.563
aming+0.558
Eventually+0.550
Neg logit contributions
 Ben-0.502
 Tom-0.495
 says-0.460
 cheeks-0.446
 Sam-0.445
Token her
Token ID607
Feature activation-1.11e-01

Pos logit contributions
and+0.423
amed+0.399
Along+0.367
year+0.364
aming+0.351
Neg logit contributions
 says-0.374
 cheeks-0.364
 Lee-0.336
 mouths-0.335
 acts-0.332
Token friends
Token ID2460
Feature activation+9.04e-02

Pos logit contributions
 says+0.225
 cheeks+0.220
 Ben+0.218
 Tom+0.214
 mouths+0.210
Neg logit contributions
amed-0.226
and-0.224
year-0.203
Along-0.199
uttering-0.199
Token.
Token ID13
Feature activation-7.21e-02

Pos logit contributions
 says+0.553
 Ben+0.541
 cheeks+0.538
 Tom+0.522
 mouths+0.511
Neg logit contributions
amed-0.773
and-0.763
year-0.716
Along-0.686
uttering-0.685
Token She
Token ID1375
Feature activation-8.81e-02

Pos logit contributions
 cheeks+0.128
 says+0.123
 mouths+0.121
ens+0.116
 Ben+0.114
Neg logit contributions
and-0.163
amed-0.157
year-0.150
Along-0.145
Just-0.145
Token understands
Token ID14759
Feature activation-5.78e-02

Pos logit contributions
 cheeks+0.094
 Lee+0.083
 mouths+0.083
 Ben+0.080
 Tom+0.077
Neg logit contributions
amed-0.257
and-0.239
Along-0.236
year-0.236
Just-0.234
Token.
Token ID13
Feature activation-6.69e-02

Pos logit contributions
 cheeks+0.099
 says+0.099
 mouths+0.095
 floats+0.094
 Lee+0.089
Neg logit contributions
and-0.128
year-0.118
amed-0.118
pr-0.110
Just-0.109
Token She
Token ID1375
Feature activation-9.48e-02

Pos logit contributions
 cheeks+0.129
 says+0.124
 mouths+0.123
ens+0.115
 floats+0.115
Neg logit contributions
and-0.142
amed-0.138
year-0.130
Along-0.126
Just-0.125
Token says
Token ID1139
Feature activation-4.97e-01

Pos logit contributions
 cheeks+0.089
 Ben+0.079
 mouths+0.078
 Lee+0.077
 Tom+0.075
Neg logit contributions
amed-0.291
and-0.272
year-0.269
Just-0.268
Along-0.267
Token sorry
Token ID7926
Feature activation-7.27e-02

Pos logit contributions
 says+0.674
 cheeks+0.669
 mouths+0.657
 trucks+0.626
ens+0.621
Neg logit contributions
and-1.216
amed-1.206
year-1.156
uttering-1.121
Just-1.117
Token again
Token ID757
Feature activation+1.09e-02

Pos logit contributions
 cheeks+0.168
 says+0.167
 Ben+0.166
 mouths+0.164
 Tom+0.162
Neg logit contributions
and-0.125
amed-0.114
year-0.104
Just-0.100
uttering-0.097
Token.
Token ID13
Feature activation-6.46e-02

Pos logit contributions
and+0.026
amed+0.026
year+0.024
Along+0.023
aming+0.023
Neg logit contributions
 says-0.018
 cheeks-0.018
 Ben-0.018
 Tom-0.017
 mouths-0.017
Token She
Token ID1375
Feature activation-9.53e-02

Pos logit contributions
 cheeks+0.129
 says+0.123
 mouths+0.123
 floats+0.117
ens+0.116
Neg logit contributions
and-0.131
amed-0.127
Just-0.118
year-0.117
Along-0.115
Token gives
Token ID3607
Feature activation-3.11e-01

Pos logit contributions
 cheeks+0.060
 Lee+0.047
 mouths+0.043
 Ben+0.041
 Tom+0.036
Neg logit contributions
amed-0.321
and-0.301
year-0.300
Just-0.298
Along-0.297
Token He
Token ID679
Feature activation-6.55e-02

Pos logit contributions
 cheeks+0.032
 says+0.031
 mouths+0.030
 Ben+0.029
ens+0.029
Neg logit contributions
and-0.045
amed-0.044
year-0.041
Along-0.041
Just-0.040
Token smiles
Token ID21845
Feature activation-3.07e-01

Pos logit contributions
 cheeks+0.107
 Lee+0.101
 Ben+0.101
 mouths+0.099
 Tom+0.098
Neg logit contributions
amed-0.152
and-0.143
year-0.141
Along-0.137
Just-0.135
Token back
Token ID736
Feature activation+1.72e-01

Pos logit contributions
 says+0.509
 Ben+0.505
 Tom+0.487
 cheeks+0.479
 Sam+0.453
Neg logit contributions
and-0.762
amed-0.752
year-0.715
Along-0.693
uttering-0.682
Token.
Token ID13
Feature activation-4.66e-02

Pos logit contributions
amed+0.444
and+0.436
year+0.414
Along+0.401
aming+0.400
Neg logit contributions
 says-0.273
 Ben-0.261
 cheeks-0.258
 Tom-0.254
 mouths-0.241
Token He
Token ID679
Feature activation-8.74e-02

Pos logit contributions
 cheeks+0.096
 says+0.093
 mouths+0.092
ens+0.088
 floats+0.087
Neg logit contributions
and-0.093
amed-0.090
year-0.085
Along-0.082
Just-0.081
Token says
Token ID1139
Feature activation-4.92e-01

Pos logit contributions
 cheeks+0.084
 Lee+0.071
 mouths+0.069
 Ben+0.068
 Tom+0.067
Neg logit contributions
amed-0.264
and-0.248
Along-0.244
year-0.244
Just-0.240
Token goodbye
Token ID24829
Feature activation-5.62e-02

Pos logit contributions
 says+0.636
 cheeks+0.629
 mouths+0.615
ens+0.590
 Ben+0.560
Neg logit contributions
and-1.309
amed-1.291
year-1.255
Just-1.178
pr-1.166
Token and
Token ID290
Feature activation+1.55e-02

Pos logit contributions
 says+0.119
 Ben+0.118
 cheeks+0.117
 Tom+0.116
 Sam+0.111
Neg logit contributions
amed-0.114
and-0.108
year-0.106
aming-0.101
Along-0.100
Token walks
Token ID11114
Feature activation-1.10e-01

Pos logit contributions
amed+0.040
and+0.039
 late+0.038
year+0.037
Along+0.036
Neg logit contributions
 cheeks-0.019
 mouths-0.018
 Lee-0.017
ens-0.016
 Ben-0.016
Token away
Token ID1497
Feature activation+1.36e-01

Pos logit contributions
 Ben+0.225
 says+0.224
 Tom+0.222
 cheeks+0.213
 Sam+0.209
Neg logit contributions
amed-0.222
and-0.218
year-0.206
aming-0.199
Along-0.199
Token.
Token ID13
Feature activation-4.88e-02

Pos logit contributions
amed+0.319
and+0.318
year+0.297
Along+0.285
uttering+0.285
Neg logit contributions
 says-0.233
 Ben-0.228
 cheeks-0.227
 Tom-0.219
 mouths-0.218
Token\n
Token ID198
Feature activation+1.68e-01

Pos logit contributions
amed+0.348
and+0.321
year+0.313
uttering+0.311
aming+0.301
Neg logit contributions
 says-0.345
 Ben-0.344
 Tom-0.338
 Sam-0.322
 cheeks-0.296
TokenTom
Token ID13787
Feature activation-3.31e-02

Pos logit contributions
amed+0.332
and+0.311
year+0.299
uttering+0.295
 supposed+0.280
Neg logit contributions
 Ben-0.360
 says-0.355
 Tom-0.353
 Sam-0.337
 cheeks-0.326
Token nods
Token ID34833
Feature activation-3.35e-01

Pos logit contributions
 cheeks+0.051
 mouths+0.047
 Ben+0.047
 avail+0.044
 Lee+0.044
Neg logit contributions
amed-0.085
and-0.079
year-0.074
Just-0.074
aming-0.074
Token.
Token ID13
Feature activation-5.93e-02

Pos logit contributions
 says+0.502
 Ben+0.495
 Tom+0.475
 cheeks+0.471
 Sam+0.444
Neg logit contributions
amed-0.893
and-0.867
year-0.831
uttering-0.816
Along-0.814
Token He
Token ID679
Feature activation-1.01e-01

Pos logit contributions
 cheeks+0.111
 mouths+0.106
 says+0.102
 floats+0.100
ens+0.099
Neg logit contributions
and-0.129
amed-0.122
year-0.122
 late-0.118
Along-0.116
Token says
Token ID1139
Feature activation-4.92e-01

Pos logit contributions
 cheeks+0.097
 Ben+0.087
 mouths+0.083
 Tom+0.082
 Lee+0.081
Neg logit contributions
amed-0.312
and-0.299
Along-0.290
year-0.290
Just-0.285
Token sorry
Token ID7926
Feature activation-7.15e-02

Pos logit contributions
 says+0.602
 cheeks+0.593
 mouths+0.577
ens+0.549
 Ben+0.544
Neg logit contributions
and-1.350
amed-1.326
year-1.293
Just-1.229
pr-1.216
Token to
Token ID284
Feature activation+2.49e-01

Pos logit contributions
 says+0.161
 cheeks+0.160
 Ben+0.159
 Tom+0.155
 mouths+0.155
Neg logit contributions
and-0.130
amed-0.124
year-0.114
Along-0.106
Just-0.106
Token Mia
Token ID32189
Feature activation-3.59e-01

Pos logit contributions
and+0.598
 late+0.594
year+0.543
 much+0.528
 very+0.527
Neg logit contributions
 cheeks-0.317
 says-0.288
 floats-0.279
 mouths-0.272
 acts-0.259
Token.
Token ID13
Feature activation-6.10e-02

Pos logit contributions
 cheeks+0.671
 says+0.636
 mouths+0.635
 floats+0.635
 Ben+0.627
Neg logit contributions
and-0.786
amed-0.731
year-0.689
Along-0.638
pr-0.636
Token He
Token ID679
Feature activation-1.01e-01

Pos logit contributions
 cheeks+0.105
 says+0.100
 mouths+0.098
 Ben+0.094
ens+0.093
Neg logit contributions
and-0.144
amed-0.139
year-0.134
Just-0.129
Along-0.128
Token She
Token ID1375
Feature activation-9.69e-02

Pos logit contributions
 cheeks+0.094
 says+0.094
 mouths+0.090
 floats+0.088
 trucks+0.087
Neg logit contributions
and-0.114
amed-0.111
 late-0.107
year-0.106
Along-0.105
Token feels
Token ID5300
Feature activation-2.70e-01

Pos logit contributions
 cheeks+0.117
 Lee+0.109
 avail+0.099
 Sam+0.094
 mouths+0.094
Neg logit contributions
amed-0.252
Just-0.239
Along-0.236
and-0.234
aming-0.231
Token better
Token ID1365
Feature activation+1.59e-01

Pos logit contributions
 says+0.617
 Ben+0.613
 cheeks+0.592
 Tom+0.590
 mouths+0.575
Neg logit contributions
amed-0.500
and-0.493
year-0.455
Along-0.442
uttering-0.432
Token.
Token ID13
Feature activation-5.31e-02

Pos logit contributions
amed+0.477
and+0.448
year+0.437
Along+0.431
pr+0.416
Neg logit contributions
 Ben-0.205
 says-0.197
 Tom-0.191
 cheeks-0.184
 Sam-0.168
Token She
Token ID1375
Feature activation-9.70e-02

Pos logit contributions
 cheeks+0.093
 says+0.091
 mouths+0.088
 floats+0.085
 trucks+0.084
Neg logit contributions
and-0.118
amed-0.114
year-0.108
 late-0.107
Just-0.106
Token says
Token ID1139
Feature activation-4.92e-01

Pos logit contributions
 cheeks+0.107
 Lee+0.098
 avail+0.087
 mouths+0.086
 trucks+0.085
Neg logit contributions
amed-0.268
Just-0.252
Along-0.250
and-0.248
aming-0.246
Token she
Token ID673
Feature activation-1.71e-01

Pos logit contributions
 says+0.673
 cheeks+0.651
 mouths+0.628
 trucks+0.627
ens+0.614
Neg logit contributions
and-1.221
amed-1.192
year-1.152
uttering-1.111
Just-1.100
Token loves
Token ID10408
Feature activation-5.73e-02

Pos logit contributions
 cheeks+0.231
 Ben+0.224
 says+0.224
 Tom+0.215
 mouths+0.210
Neg logit contributions
amed-0.464
and-0.455
year-0.435
aming-0.418
Along-0.418
Token Mom
Token ID11254
Feature activation-2.96e-01

Pos logit contributions
 cheeks+0.076
 says+0.076
 floats+0.069
 mouths+0.067
 Lee+0.066
Neg logit contributions
and-0.148
year-0.134
amed-0.133
 late-0.133
aming-0.127
Tokenmy
Token ID1820
Feature activation+3.07e-02

Pos logit contributions
 says+0.528
 Ben+0.517
 cheeks+0.514
 Tom+0.499
 mouths+0.489
Neg logit contributions
amed-0.681
and-0.672
year-0.631
Along-0.603
aming-0.601
Token and
Token ID290
Feature activation+8.71e-03

Pos logit contributions
amed+0.077
and+0.075
year+0.071
Along+0.069
uttering+0.069
Neg logit contributions
 says-0.050
 Ben-0.049
 cheeks-0.048
 Tom-0.047
 mouths-0.045
Token smiles
Token ID21845
Feature activation-2.96e-01

Pos logit contributions
 cheeks+0.240
 mouths+0.237
 Ben+0.232
 Lee+0.226
 Tom+0.224
Neg logit contributions
amed-0.293
and-0.273
year-0.254
Just-0.254
aming-0.254
Token and
Token ID290
Feature activation+9.91e-03

Pos logit contributions
 says+0.500
 Ben+0.496
 cheeks+0.476
 Tom+0.475
 Sam+0.442
Neg logit contributions
and-0.726
amed-0.715
year-0.688
Along-0.665
uttering-0.652
Token nods
Token ID34833
Feature activation-3.32e-01

Pos logit contributions
amed+0.026
and+0.023
 late+0.022
Along+0.022
Just+0.022
Neg logit contributions
 cheeks-0.012
 mouths-0.011
 Lee-0.011
ens-0.010
 trucks-0.010
Token.
Token ID13
Feature activation-5.71e-02

Pos logit contributions
 says+0.534
 Ben+0.519
 cheeks+0.515
 Tom+0.499
 mouths+0.489
Neg logit contributions
amed-0.828
and-0.818
year-0.772
Along-0.739
uttering-0.739
Token She
Token ID1375
Feature activation-9.63e-02

Pos logit contributions
 cheeks+0.103
 mouths+0.101
 says+0.100
 floats+0.095
ens+0.092
Neg logit contributions
and-0.125
amed-0.120
year-0.113
Along-0.111
Just-0.110
Token says
Token ID1139
Feature activation-4.92e-01

Pos logit contributions
 cheeks+0.094
 Lee+0.080
 mouths+0.080
 Ben+0.074
 Tom+0.073
Neg logit contributions
amed-0.292
Along-0.270
and-0.270
Just-0.265
aming-0.265
Token,
Token ID11
Feature activation-2.35e-01

Pos logit contributions
 says+0.667
 cheeks+0.648
 mouths+0.643
ens+0.608
 trucks+0.600
Neg logit contributions
and-1.255
amed-1.252
year-1.183
uttering-1.158
Just-1.132
Token "
Token ID366
Feature activation-5.68e-02

Pos logit contributions
 says+0.369
 cheeks+0.358
 Ben+0.355
 Tom+0.342
 mouths+0.341
Neg logit contributions
amed-0.587
and-0.581
year-0.548
Along-0.529
uttering-0.527
TokenThank
Token ID10449
Feature activation+2.00e-01

Pos logit contributions
 says+0.134
 Ben+0.133
 cheeks+0.129
 Tom+0.129
 Sam+0.124
Neg logit contributions
amed-0.100
and-0.098
year-0.092
Along-0.086
uttering-0.086
Token you
Token ID345
Feature activation-1.48e-01

Pos logit contributions
and+0.525
amed+0.519
year+0.486
aming+0.475
uttering+0.466
Neg logit contributions
 cheeks-0.288
 says-0.285
 mouths-0.275
 Ben-0.270
 Tom-0.260
Token,
Token ID11
Feature activation-2.37e-01

Pos logit contributions
 says+0.170
 cheeks+0.167
 mouths+0.157
 Ben+0.152
 floats+0.148
Neg logit contributions
and-0.424
amed-0.421
year-0.399
Along-0.385
uttering-0.382
Token She
Token ID1375
Feature activation-7.64e-02

Pos logit contributions
 cheeks+0.100
 says+0.098
 mouths+0.096
ens+0.092
 Ben+0.091
Neg logit contributions
and-0.125
amed-0.123
year-0.115
Along-0.112
Just-0.111
Token feels
Token ID5300
Feature activation-2.60e-01

Pos logit contributions
 cheeks+0.118
 mouths+0.112
 Ben+0.112
 Lee+0.111
 Tom+0.109
Neg logit contributions
amed-0.186
and-0.171
year-0.170
Along-0.166
Just-0.166
Token sorry
Token ID7926
Feature activation-6.85e-02

Pos logit contributions
 says+0.515
 Ben+0.514
 cheeks+0.490
 Tom+0.488
 mouths+0.469
Neg logit contributions
and-0.560
amed-0.555
year-0.513
Along-0.509
uttering-0.500
Token.
Token ID13
Feature activation-5.63e-02

Pos logit contributions
 cheeks+0.135
 says+0.134
 mouths+0.132
 Ben+0.131
 floats+0.129
Neg logit contributions
and-0.139
amed-0.133
year-0.122
Just-0.118
 late-0.116
Token She
Token ID1375
Feature activation-9.19e-02

Pos logit contributions
 cheeks+0.107
 mouths+0.103
 says+0.102
 floats+0.097
ens+0.094
Neg logit contributions
and-0.121
amed-0.118
year-0.110
Just-0.108
Along-0.107
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.081
 Lee+0.076
 mouths+0.070
 Ben+0.068
 Tom+0.066
Neg logit contributions
amed-0.280
Just-0.263
year-0.257
and-0.256
Along-0.256
Token,
Token ID11
Feature activation-2.45e-01

Pos logit contributions
 says+0.661
 cheeks+0.658
 mouths+0.645
 floats+0.618
 trucks+0.617
Neg logit contributions
and-1.211
amed-1.211
year-1.142
uttering-1.120
Just-1.120
Token "
Token ID366
Feature activation-5.36e-02

Pos logit contributions
 says+0.419
 cheeks+0.414
 mouths+0.398
 Ben+0.388
 floats+0.381
Neg logit contributions
amed-0.553
and-0.550
year-0.512
Along-0.499
Just-0.499
TokenYes
Token ID5297
Feature activation+1.25e-01

Pos logit contributions
 says+0.141
 Ben+0.139
 cheeks+0.138
 Tom+0.135
 mouths+0.134
Neg logit contributions
amed-0.080
and-0.078
year-0.071
Along-0.066
uttering-0.066
Token,
Token ID11
Feature activation-2.39e-01

Pos logit contributions
 much+0.222
 very+0.208
Just+0.197
 really+0.189
ust+0.189
Neg logit contributions
 acts-0.231
 writes-0.228
 pret-0.225
 draws-0.223
 trucks-0.219
Token Tom
Token ID4186
Feature activation-4.49e-01

Pos logit contributions
 cheeks+0.329
 mouths+0.325
 acts+0.292
 says+0.288
 floats+0.287
Neg logit contributions
and-0.582
 late-0.529
ust-0.522
Along-0.519
 much-0.518
Token Their
Token ID5334
Feature activation-8.36e-02

Pos logit contributions
 cheeks+0.086
 says+0.085
 mouths+0.083
 floats+0.081
 acts+0.079
Neg logit contributions
and-0.103
amed-0.099
Just-0.092
year-0.091
Along-0.090
Token mom
Token ID1995
Feature activation-2.67e-01

Pos logit contributions
 says+0.088
ens+0.083
 cheeks+0.080
 Ben+0.079
 floats+0.079
Neg logit contributions
and-0.232
amed-0.231
year-0.212
Along-0.211
Just-0.207
Token smiles
Token ID21845
Feature activation-3.06e-01

Pos logit contributions
 cheeks+0.404
 mouths+0.389
 Ben+0.368
 Tom+0.350
 floats+0.349
Neg logit contributions
amed-0.675
year-0.636
and-0.633
aming-0.601
pr-0.594
Token.
Token ID13
Feature activation-4.76e-02

Pos logit contributions
 Ben+0.478
 says+0.474
 Tom+0.458
 cheeks+0.441
 Lee+0.415
Neg logit contributions
and-0.804
amed-0.779
year-0.746
Along-0.732
Days-0.718
Token She
Token ID1375
Feature activation-8.97e-02

Pos logit contributions
 cheeks+0.103
 mouths+0.098
 says+0.097
 floats+0.094
 acts+0.091
Neg logit contributions
and-0.090
amed-0.086
year-0.081
Along-0.079
 late-0.078
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.079
 mouths+0.067
 Lee+0.062
 Ben+0.062
 Tom+0.061
Neg logit contributions
amed-0.288
year-0.271
and-0.269
Along-0.264
aming-0.263
Token,
Token ID11
Feature activation-2.36e-01

Pos logit contributions
 cheeks+0.640
 mouths+0.640
 says+0.628
ens+0.581
 trucks+0.573
Neg logit contributions
and-1.286
year-1.226
amed-1.214
Just-1.132
pr-1.126
Token "
Token ID366
Feature activation-5.28e-02

Pos logit contributions
 says+0.387
 cheeks+0.375
 Ben+0.367
 mouths+0.362
 Tom+0.354
Neg logit contributions
amed-0.564
and-0.561
year-0.529
Along-0.511
Just-0.506
TokenYou
Token ID1639
Feature activation+2.09e-01

Pos logit contributions
 Ben+0.123
 says+0.122
 Tom+0.119
 Sam+0.114
ens+0.112
Neg logit contributions
amed-0.094
and-0.088
uttering-0.085
year-0.083
 supposed-0.081
Token are
Token ID389
Feature activation-3.20e-01

Pos logit contributions
amed+0.623
and+0.582
year+0.576
aming+0.562
Along+0.557
Neg logit contributions
 cheeks-0.243
 mouths-0.224
 says-0.217
 Lee-0.211
 Ben-0.207
Token welcome
Token ID7062
Feature activation+6.77e-02

Pos logit contributions
 Ben+0.651
 Tom+0.637
 says+0.610
 Sam+0.592
 Lee+0.561
Neg logit contributions
and-0.718
amed-0.679
year-0.650
uttering-0.640
Along-0.627
Token sorry
Token ID7926
Feature activation-7.15e-02

Pos logit contributions
 says+0.602
 cheeks+0.593
 mouths+0.577
ens+0.549
 Ben+0.544
Neg logit contributions
and-1.350
amed-1.326
year-1.293
Just-1.229
pr-1.216
Token to
Token ID284
Feature activation+2.49e-01

Pos logit contributions
 says+0.161
 cheeks+0.160
 Ben+0.159
 Tom+0.155
 mouths+0.155
Neg logit contributions
and-0.130
amed-0.124
year-0.114
Along-0.106
Just-0.106
Token Mia
Token ID32189
Feature activation-3.59e-01

Pos logit contributions
and+0.598
 late+0.594
year+0.543
 much+0.528
 very+0.527
Neg logit contributions
 cheeks-0.317
 says-0.288
 floats-0.279
 mouths-0.272
 acts-0.259
Token.
Token ID13
Feature activation-6.10e-02

Pos logit contributions
 cheeks+0.671
 says+0.636
 mouths+0.635
 floats+0.635
 Ben+0.627
Neg logit contributions
and-0.786
amed-0.731
year-0.689
Along-0.638
pr-0.636
Token He
Token ID679
Feature activation-1.01e-01

Pos logit contributions
 cheeks+0.105
 says+0.100
 mouths+0.098
 Ben+0.094
ens+0.093
Neg logit contributions
and-0.144
amed-0.139
year-0.134
Just-0.129
Along-0.128
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.078
 Ben+0.060
 Lee+0.059
 mouths+0.059
 Tom+0.057
Neg logit contributions
amed-0.329
and-0.313
Along-0.309
year-0.309
Just-0.303
Token thank
Token ID5875
Feature activation-3.58e-01

Pos logit contributions
 says+0.683
 cheeks+0.680
 mouths+0.656
ens+0.615
 floats+0.601
Neg logit contributions
and-1.228
year-1.198
amed-1.182
 late-1.124
pr-1.108
Token you
Token ID345
Feature activation-1.52e-01

Pos logit contributions
 says+0.531
 cheeks+0.530
 mouths+0.504
 Ben+0.476
 floats+0.473
Neg logit contributions
and-0.941
amed-0.890
year-0.863
Just-0.827
pr-0.822
Token to
Token ID284
Feature activation+2.50e-01

Pos logit contributions
 says+0.300
 Ben+0.295
 cheeks+0.292
 Tom+0.286
 mouths+0.279
Neg logit contributions
amed-0.323
and-0.316
year-0.297
Along-0.283
aming-0.282
Token Mia
Token ID32189
Feature activation-3.60e-01

Pos logit contributions
and+0.549
year+0.498
 very+0.489
 late+0.468
 nearby+0.457
Neg logit contributions
 cheeks-0.396
 floats-0.341
 acts-0.327
 mouths-0.316
 says-0.314
Token.
Token ID13
Feature activation-6.02e-02

Pos logit contributions
 cheeks+0.655
 says+0.635
 mouths+0.616
 Ben+0.609
 floats+0.600
Neg logit contributions
and-0.817
amed-0.773
year-0.731
Along-0.692
pr-0.679
Token She
Token ID1375
Feature activation-9.32e-02

Pos logit contributions
 floats+0.122
 ah+0.120
 cheeks+0.120
 guy+0.116
 mouths+0.115
Neg logit contributions
and-0.088
 late-0.088
Just-0.085
uttering-0.083
amed-0.082
Token hugs
Token ID40657
Feature activation-2.04e-01

Pos logit contributions
 cheeks+0.131
 avail+0.120
 Lee+0.110
 guy+0.109
 Tom+0.107
Neg logit contributions
amed-0.211
Just-0.203
Along-0.200
 supposed-0.192
and-0.192
Token him
Token ID683
Feature activation-2.09e-01

Pos logit contributions
 cheeks+0.348
 says+0.336
 mouths+0.334
 acts+0.325
 floats+0.322
Neg logit contributions
and-0.433
amed-0.389
Just-0.376
 late-0.372
aming-0.368
Token.
Token ID13
Feature activation-5.61e-02

Pos logit contributions
 says+0.337
 Ben+0.331
 cheeks+0.326
 Tom+0.318
 mouths+0.308
Neg logit contributions
amed-0.519
and-0.512
year-0.484
Along-0.466
uttering-0.465
Token She
Token ID1375
Feature activation-9.30e-02

Pos logit contributions
 cheeks+0.104
 mouths+0.097
 says+0.097
 floats+0.095
 acts+0.092
Neg logit contributions
and-0.117
amed-0.110
Just-0.108
uttering-0.107
 late-0.106
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.102
 avail+0.089
 Lee+0.088
 guy+0.081
 Tom+0.079
Neg logit contributions
amed-0.247
Just-0.236
Along-0.229
and-0.225
 prized-0.224
Token,
Token ID11
Feature activation-2.37e-01

Pos logit contributions
 cheeks+0.641
 says+0.628
 mouths+0.611
 floats+0.579
 avail+0.548
Neg logit contributions
and-1.255
amed-1.221
year-1.175
uttering-1.142
pr-1.138
Token "
Token ID366
Feature activation-5.52e-02

Pos logit contributions
 says+0.400
 cheeks+0.392
 Ben+0.376
 mouths+0.376
 Tom+0.363
Neg logit contributions
amed-0.552
and-0.546
year-0.515
Just-0.496
Along-0.495
TokenI
Token ID40
Feature activation+2.49e-01

Pos logit contributions
 says+0.134
 Ben+0.133
 cheeks+0.131
 Tom+0.129
 mouths+0.126
Neg logit contributions
amed-0.094
and-0.092
year-0.085
Along-0.080
uttering-0.079
Token'm
Token ID1101
Feature activation+1.77e-01

Pos logit contributions
amed+0.560
and+0.549
year+0.516
Along+0.495
uttering+0.495
Neg logit contributions
 says-0.489
 Ben-0.469
 Tom-0.457
 cheeks-0.448
 Sam-0.435
Token sorry
Token ID7926
Feature activation-6.70e-02

Pos logit contributions
amed+0.489
and+0.488
year+0.461
Along+0.448
uttering+0.441
Neg logit contributions
 Ben-0.248
 says-0.247
 Tom-0.238
 cheeks-0.230
 Sam-0.221
TokenMom
Token ID29252
Feature activation+1.78e-01

Pos logit contributions
amed+0.317
and+0.310
year+0.291
uttering+0.274
Along+0.268
Neg logit contributions
 says-0.368
 Ben-0.366
 Tom-0.359
 cheeks-0.351
 Sam-0.341
Tokenmy
Token ID1820
Feature activation+3.03e-02

Pos logit contributions
amed+0.315
and+0.299
aking+0.282
aming+0.271
year+0.264
Neg logit contributions
 cheeks-0.402
 avail-0.365
 mouths-0.364
 guy-0.352
 floats-0.340
Token smiles
Token ID21845
Feature activation-3.11e-01

Pos logit contributions
amed+0.078
aking+0.074
and+0.073
aming+0.072
Just+0.072
Neg logit contributions
 cheeks-0.038
 avail-0.035
 mouths-0.033
 guy-0.030
 tongues-0.028
Token.
Token ID13
Feature activation-5.57e-02

Pos logit contributions
 says+0.454
 Ben+0.448
 Tom+0.418
 cheeks+0.386
 Sam+0.375
Neg logit contributions
and-0.864
amed-0.857
year-0.815
Along-0.797
uttering-0.782
Token She
Token ID1375
Feature activation-9.75e-02

Pos logit contributions
 cheeks+0.098
 says+0.093
 mouths+0.092
 floats+0.087
 Ben+0.084
Neg logit contributions
amed-0.124
and-0.124
year-0.118
 late-0.117
Along-0.113
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.066
 mouths+0.047
 Ben+0.039
 Lee+0.039
 Tom+0.036
Neg logit contributions
amed-0.331
and-0.316
year-0.311
Along-0.309
aming-0.306
Token she
Token ID673
Feature activation-1.70e-01

Pos logit contributions
 cheeks+0.721
 mouths+0.719
 says+0.708
 heads+0.687
 trucks+0.658
Neg logit contributions
and-1.162
year-1.060
 much-1.044
 late-1.039
amed-1.017
Token loves
Token ID10408
Feature activation-5.69e-02

Pos logit contributions
 cheeks+0.235
 says+0.232
 Ben+0.231
 Tom+0.221
 mouths+0.217
Neg logit contributions
amed-0.455
and-0.448
year-0.427
Along-0.410
aming-0.408
Token Lily
Token ID20037
Feature activation-3.73e-01

Pos logit contributions
 cheeks+0.075
 says+0.073
 mouths+0.067
 floats+0.066
 Lee+0.061
Neg logit contributions
and-0.155
year-0.139
amed-0.138
 late-0.135
Just-0.130
Token and
Token ID290
Feature activation+7.30e-03

Pos logit contributions
 says+0.623
 Ben+0.610
 Tom+0.588
 cheeks+0.586
 mouths+0.561
Neg logit contributions
amed-0.919
and-0.891
year-0.861
uttering-0.835
Along-0.833
Token Tom
Token ID4186
Feature activation-4.52e-01

Pos logit contributions
 late+0.020
 supposed+0.018
and+0.018
year+0.018
amed+0.017
Neg logit contributions
 cheeks-0.008
 avail-0.006
 mast-0.006
 trigger-0.006
 mouths-0.006
Token feels
Token ID5300
Feature activation-2.61e-01

Pos logit contributions
 cheeks+0.093
ens+0.082
 Lee+0.081
 Ben+0.080
 mouths+0.079
Neg logit contributions
amed-0.248
and-0.230
year-0.229
Along-0.227
Just-0.225
Token sorry
Token ID7926
Feature activation-6.87e-02

Pos logit contributions
 says+0.483
 cheeks+0.480
 Ben+0.468
 Tom+0.457
 mouths+0.453
Neg logit contributions
amed-0.576
and-0.561
year-0.531
aming-0.505
uttering-0.496
Token too
Token ID1165
Feature activation+1.54e-01

Pos logit contributions
 cheeks+0.143
 mouths+0.138
 says+0.137
 floats+0.135
 Ben+0.135
Neg logit contributions
and-0.136
amed-0.124
year-0.115
 late-0.113
Just-0.112
Token.
Token ID13
Feature activation-5.53e-02

Pos logit contributions
and+0.332
amed+0.322
 late+0.304
year+0.302
uttering+0.296
Neg logit contributions
 cheeks-0.276
 says-0.269
 mouths-0.266
 Ben-0.261
 floats-0.254
Token He
Token ID679
Feature activation-9.27e-02

Pos logit contributions
 cheeks+0.113
 mouths+0.108
 says+0.108
 floats+0.103
ens+0.102
Neg logit contributions
and-0.114
amed-0.108
year-0.103
Just-0.100
 late-0.098
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.077
 Lee+0.067
 Ben+0.067
 Tom+0.064
 mouths+0.063
Neg logit contributions
amed-0.294
and-0.276
year-0.275
Just-0.272
Along-0.270
Token,
Token ID11
Feature activation-2.40e-01

Pos logit contributions
 cheeks+0.643
 says+0.627
 mouths+0.607
ens+0.566
 floats+0.557
Neg logit contributions
and-1.300
amed-1.294
year-1.236
Just-1.182
 late-1.181
Token "
Token ID366
Feature activation-5.53e-02

Pos logit contributions
 says+0.401
 cheeks+0.396
 mouths+0.379
 Ben+0.376
 Tom+0.363
Neg logit contributions
amed-0.563
and-0.560
year-0.525
Along-0.508
Just-0.504
TokenOK
Token ID11380
Feature activation+3.15e-02

Pos logit contributions
 says+0.142
 Ben+0.140
 cheeks+0.140
 Tom+0.137
 mouths+0.135
Neg logit contributions
amed-0.086
and-0.084
year-0.076
Along-0.071
aming-0.071
Token,
Token ID11
Feature activation-2.36e-01

Pos logit contributions
and+0.063
amed+0.061
Just+0.060
 late+0.060
 much+0.060
Neg logit contributions
 mouths-0.053
 cheeks-0.052
 floats-0.051
 acts-0.050
 says-0.048
Token I
Token ID314
Feature activation-2.60e-01

Pos logit contributions
 cheeks+0.294
 mouths+0.293
 acts+0.260
 says+0.258
 floats+0.257
Neg logit contributions
and-0.622
amed-0.562
Along-0.556
Just-0.556
 late-0.555
Token them
Token ID606
Feature activation-1.19e-01

Pos logit contributions
 cheeks+0.519
 says+0.458
 mouths+0.449
 floats+0.421
 Lee+0.413
Neg logit contributions
and-0.767
amed-0.736
year-0.686
aming-0.643
 late-0.637
Token a
Token ID257
Feature activation+1.63e-01

Pos logit contributions
 says+0.237
 cheeks+0.232
 Ben+0.231
 Tom+0.224
 mouths+0.221
Neg logit contributions
amed-0.247
and-0.244
year-0.226
Along-0.216
aming-0.215
Token kiss
Token ID9245
Feature activation-2.89e-01

Pos logit contributions
amed+0.518
and+0.512
year+0.490
Along+0.475
aming+0.475
Neg logit contributions
 says-0.153
 Ben-0.146
 cheeks-0.142
 Tom-0.136
 mouths-0.128
Token.
Token ID13
Feature activation-5.09e-02

Pos logit contributions
 says+0.513
 cheeks+0.508
 Ben+0.496
 mouths+0.485
 Tom+0.477
Neg logit contributions
amed-0.667
and-0.654
year-0.614
aming-0.585
uttering-0.576
Token She
Token ID1375
Feature activation-9.59e-02

Pos logit contributions
 cheeks+0.090
 mouths+0.083
 says+0.083
 trucks+0.080
 floats+0.080
Neg logit contributions
and-0.113
amed-0.109
 late-0.105
year-0.104
Just-0.102
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.075
 mouths+0.054
 Lee+0.052
 Ben+0.048
 Tom+0.047
Neg logit contributions
amed-0.310
and-0.291
Along-0.287
year-0.286
aming-0.284
Token,
Token ID11
Feature activation-2.38e-01

Pos logit contributions
 cheeks+0.686
 mouths+0.665
 says+0.646
 trucks+0.632
 avail+0.627
Neg logit contributions
and-1.169
year-1.113
amed-1.104
 late-1.084
 much-1.076
Token "
Token ID366
Feature activation-5.54e-02

Pos logit contributions
 says+0.390
 cheeks+0.382
 Ben+0.370
 mouths+0.366
 Tom+0.357
Neg logit contributions
and-0.567
amed-0.566
year-0.532
Along-0.513
Just-0.509
TokenThat
Token ID2504
Feature activation+2.21e-01

Pos logit contributions
 Ben+0.133
 says+0.133
 Tom+0.129
 cheeks+0.124
ens+0.123
Neg logit contributions
amed-0.096
and-0.092
year-0.087
uttering-0.083
Along-0.081
Token's
Token ID338
Feature activation-4.31e-02

Pos logit contributions
amed+0.517
and+0.488
aming+0.455
year+0.452
Just+0.449
Neg logit contributions
 cheeks-0.376
 Ben-0.364
 mouths-0.353
 Tom-0.352
 says-0.346
Token better
Token ID1365
Feature activation+1.61e-01

Pos logit contributions
 Ben+0.103
 Tom+0.102
 says+0.099
 Sam+0.096
 Lily+0.092
Neg logit contributions
and-0.085
amed-0.080
year-0.076
uttering-0.073
Along-0.072
Token bad
Token ID2089
Feature activation+7.55e-02

Pos logit contributions
 cheeks+0.361
 mouths+0.348
 tongues+0.298
 says+0.277
 draws+0.275
Neg logit contributions
and-0.516
a-0.513
 late-0.498
Along-0.496
ust-0.494
Token dog
Token ID3290
Feature activation-2.90e-01

Pos logit contributions
uttering+0.177
amed+0.177
 late+0.173
Along+0.173
and+0.170
Neg logit contributions
 cheeks-0.108
 mouths-0.108
 says-0.090
 floats-0.085
 tongues-0.083
Token!"
Token ID2474
Feature activation-5.11e-02

Pos logit contributions
 cheeks+0.519
 mouths+0.499
 Ben+0.479
 says+0.479
 Tom+0.449
Neg logit contributions
amed-0.646
and-0.640
year-0.576
Along-0.572
 late-0.572
Token the
Token ID262
Feature activation-1.87e-01

Pos logit contributions
 cheeks+0.043
 mouths+0.041
 acts+0.037
 floats+0.036
 trucks+0.034
Neg logit contributions
 late-0.151
and-0.149
amed-0.149
Just-0.145
year-0.141
Token man
Token ID582
Feature activation-2.45e-01

Pos logit contributions
 says+0.187
 cheeks+0.184
 Ben+0.174
 mouths+0.174
 Tom+0.166
Neg logit contributions
amed-0.563
and-0.545
year-0.525
uttering-0.517
Along-0.516
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.166
 mouths+0.133
 Ben+0.131
 Lee+0.124
ens+0.121
Neg logit contributions
amed-0.793
and-0.791
year-0.755
Just-0.742
Along-0.723
Token.
Token ID13
Feature activation-5.90e-02

Pos logit contributions
 says+0.765
 Ben+0.753
 cheeks+0.744
 Tom+0.725
 mouths+0.694
Neg logit contributions
amed-1.246
and-1.223
year-1.158
Along-1.125
uttering-1.115
Token He
Token ID679
Feature activation-9.92e-02

Pos logit contributions
 cheeks+0.097
 says+0.095
 mouths+0.092
 Ben+0.091
 Tom+0.087
Neg logit contributions
amed-0.141
and-0.141
year-0.133
Along-0.129
Just-0.127
Token sounds
Token ID5238
Feature activation-1.84e-01

Pos logit contributions
 cheeks+0.108
 Ben+0.097
 Tom+0.094
ens+0.092
 Lee+0.092
Neg logit contributions
amed-0.282
and-0.264
Along-0.263
year-0.262
Just-0.260
Token angry
Token ID7954
Feature activation+7.94e-02

Pos logit contributions
 says+0.421
 Ben+0.419
 Tom+0.406
 cheeks+0.398
 Sam+0.387
Neg logit contributions
amed-0.343
and-0.338
year-0.314
uttering-0.306
Along-0.304
Token.
Token ID13
Feature activation-5.87e-02

Pos logit contributions
and+0.173
amed+0.166
year+0.154
Just+0.147
aming+0.146
Neg logit contributions
 cheeks-0.148
 says-0.146
 mouths-0.144
 Ben-0.143
 Lee-0.138
Token you
Token ID345
Feature activation-1.45e-01

Pos logit contributions
 cheeks+0.362
 mouths+0.344
 says+0.326
 floats+0.298
 avail+0.289
Neg logit contributions
and-0.588
amed-0.535
Along-0.520
Just-0.508
year-0.505
Token are
Token ID389
Feature activation-3.18e-01

Pos logit contributions
 cheeks+0.164
 Ben+0.158
 says+0.157
 mouths+0.151
 Tom+0.149
Neg logit contributions
amed-0.422
and-0.414
Along-0.390
year-0.389
aming-0.379
Token bad
Token ID2089
Feature activation+7.65e-02

Pos logit contributions
 Ben+0.711
 Tom+0.696
 says+0.662
 Sam+0.660
 Max+0.600
Neg logit contributions
and-0.671
amed-0.641
year-0.634
uttering-0.593
Along-0.581
Token!"
Token ID2474
Feature activation-5.60e-02

Pos logit contributions
and+0.178
amed+0.175
year+0.161
aming+0.158
uttering+0.158
Neg logit contributions
 cheeks-0.127
 mouths-0.123
 says-0.121
 Ben-0.119
 Tom-0.114
Token Ben
Token ID3932
Feature activation-4.69e-01

Pos logit contributions
 cheeks+0.054
 mouths+0.050
 floats+0.047
 acts+0.046
 trucks+0.044
Neg logit contributions
 late-0.162
and-0.157
Just-0.151
amed-0.149
year-0.146
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.334
 mouths+0.292
 avail+0.234
 Lee+0.229
 floats+0.229
Neg logit contributions
amed-1.508
and-1.506
Just-1.410
Along-1.384
 late-1.376
Token.
Token ID13
Feature activation-5.97e-02

Pos logit contributions
 says+0.749
 Ben+0.736
 cheeks+0.726
 Tom+0.709
 mouths+0.675
Neg logit contributions
amed-1.269
and-1.238
year-1.180
Along-1.149
uttering-1.137
Token\n
Token ID198
Feature activation+1.57e-01

Pos logit contributions
 says+0.114
 Ben+0.112
 cheeks+0.112
 Tom+0.108
 mouths+0.106
Neg logit contributions
amed-0.131
and-0.129
year-0.121
Along-0.115
uttering-0.114
Token\n
Token ID198
Feature activation+1.56e-01

Pos logit contributions
amed+0.333
and+0.310
year+0.305
uttering+0.298
pr+0.293
Neg logit contributions
 says-0.329
 Ben-0.327
 Tom-0.320
 Sam-0.304
 cheeks-0.288
TokenThe
Token ID464
Feature activation+3.40e-01

Pos logit contributions
amed+0.279
and+0.265
uttering+0.259
year+0.259
 supposed+0.251
Neg logit contributions
 Ben-0.373
 says-0.364
 Tom-0.363
 Sam-0.348
 cheeks-0.327
Token dog
Token ID3290
Feature activation-2.88e-01

Pos logit contributions
amed+1.016
and+0.993
year+0.957
Along+0.936
uttering+0.933
Neg logit contributions
 cheeks-0.337
 says-0.334
 mouths-0.312
 Ben-0.311
 Tom-0.300
Token Their
Token ID5334
Feature activation-9.55e-02

Pos logit contributions
 cheeks+0.077
 says+0.075
 Ben+0.070
 mouths+0.070
ens+0.069
Neg logit contributions
and-0.134
amed-0.133
year-0.125
Along-0.122
Just-0.122
Token mom
Token ID1995
Feature activation-2.70e-01

Pos logit contributions
 says+0.101
 Ben+0.100
ens+0.098
 cheeks+0.094
 Tom+0.094
Neg logit contributions
and-0.260
amed-0.256
uttering-0.239
year-0.232
Along-0.231
Token smiles
Token ID21845
Feature activation-3.06e-01

Pos logit contributions
 cheeks+0.320
 Ben+0.256
 avail+0.256
 mouths+0.256
 Sam+0.250
Neg logit contributions
amed-0.764
and-0.734
aming-0.699
year-0.691
pr-0.681
Token.
Token ID13
Feature activation-5.40e-02

Pos logit contributions
 says+0.430
 Ben+0.414
 Tom+0.387
 cheeks+0.352
 comes+0.348
Neg logit contributions
and-0.865
amed-0.857
year-0.816
Along-0.797
uttering-0.784
Token She
Token ID1375
Feature activation-9.47e-02

Pos logit contributions
 cheeks+0.090
 says+0.085
 mouths+0.082
 Ben+0.080
 trucks+0.078
Neg logit contributions
amed-0.126
and-0.125
year-0.117
Along-0.115
uttering-0.114
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
 cheeks+0.063
 Ben+0.045
 mouths+0.045
 Tom+0.041
 Lee+0.040
Neg logit contributions
amed-0.326
and-0.311
year-0.304
Along-0.301
aming-0.299
Token they
Token ID484
Feature activation-2.48e-01

Pos logit contributions
 cheeks+0.669
 mouths+0.627
 says+0.602
 avail+0.562
 trucks+0.561
Neg logit contributions
and-1.234
year-1.167
amed-1.149
 late-1.110
uttering-1.100
Token are
Token ID389
Feature activation-3.18e-01

Pos logit contributions
 says+0.298
 Ben+0.283
 Tom+0.268
 cheeks+0.265
 mouths+0.251
Neg logit contributions
amed-0.733
and-0.725
year-0.688
uttering-0.670
Along-0.667
Token beautiful
Token ID4950
Feature activation+3.19e-01

Pos logit contributions
 Ben+0.743
 says+0.731
 Tom+0.719
 Sam+0.682
 cheeks+0.669
Neg logit contributions
and-0.610
amed-0.568
year-0.526
Along-0.519
Just-0.503
Token.
Token ID13
Feature activation-5.35e-02

Pos logit contributions
and+0.653
amed+0.607
year+0.596
aming+0.550
 late+0.550
Neg logit contributions
 cheeks-0.620
 says-0.593
 mouths-0.579
 Ben-0.572
 Tom-0.568
Token She
Token ID1375
Feature activation-9.52e-02

Pos logit contributions
 cheeks+0.091
 says+0.086
 mouths+0.084
 Ben+0.080
 trucks+0.079
Neg logit contributions
and-0.124
amed-0.121
year-0.115
Along-0.112
Just-0.112
Token She
Token ID1375
Feature activation-9.04e-02

Pos logit contributions
 cheeks+0.090
 mouths+0.087
 says+0.084
 floats+0.082
ens+0.080
Neg logit contributions
and-0.110
amed-0.103
year-0.099
Just-0.097
 late-0.097
Token hugs
Token ID40657
Feature activation-2.01e-01

Pos logit contributions
 cheeks+0.101
 avail+0.089
 Lee+0.088
 mouths+0.086
ens+0.083
Neg logit contributions
amed-0.249
and-0.231
Along-0.229
year-0.228
Just-0.225
Token her
Token ID607
Feature activation-1.22e-01

Pos logit contributions
 mouths+0.361
 floats+0.350
 says+0.347
 cheeks+0.337
 acts+0.323
Neg logit contributions
and-0.429
amed-0.357
year-0.351
Just-0.344
 late-0.344
Token doll
Token ID3654
Feature activation-2.97e-01

Pos logit contributions
 cheeks+0.139
 mouths+0.135
 says+0.131
 Ben+0.127
 floats+0.126
Neg logit contributions
amed-0.325
and-0.307
year-0.292
pr-0.285
 very-0.282
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
 says+0.593
 Ben+0.577
 cheeks+0.574
 mouths+0.556
 Tom+0.548
Neg logit contributions
amed-0.594
and-0.589
year-0.537
Along-0.510
aming-0.504
Token says
Token ID1139
Feature activation-4.91e-01

Pos logit contributions
amed+0.024
 late+0.023
 very+0.023
 much+0.022
 valuable+0.022
Neg logit contributions
 cheeks-0.012
ue-0.012
 Pole-0.012
 guy-0.011
 avail-0.011
Token,
Token ID11
Feature activation-2.41e-01

Pos logit contributions
 says+0.676
 cheeks+0.638
 trucks+0.633
 mouths+0.633
ens+0.624
Neg logit contributions
and-1.216
amed-1.174
year-1.153
 much-1.125
pr-1.091
Token "
Token ID366
Feature activation-5.83e-02

Pos logit contributions
 says+0.394
 cheeks+0.385
 Ben+0.377
 mouths+0.370
 Tom+0.362
Neg logit contributions
and-0.573
amed-0.571
year-0.535
Along-0.517
uttering-0.515
TokenThank
Token ID10449
Feature activation+1.99e-01

Pos logit contributions
 says+0.144
 Ben+0.142
 cheeks+0.140
 Tom+0.139
 mouths+0.133
Neg logit contributions
amed-0.096
and-0.095
year-0.088
uttering-0.082
Along-0.082
Token you
Token ID345
Feature activation-1.48e-01

Pos logit contributions
and+0.518
amed+0.514
year+0.483
aming+0.471
Just+0.461
Neg logit contributions
 cheeks-0.288
 says-0.287
 mouths-0.276
 Ben-0.270
 Tom-0.259
Token,
Token ID11
Feature activation-2.42e-01

Pos logit contributions
 says+0.167
 cheeks+0.160
 Ben+0.153
 mouths+0.149
 Tom+0.143
Neg logit contributions
and-0.435
amed-0.434
year-0.410
Along-0.394
aming-0.393
Token they
Token ID484
Feature activation-2.48e-01

Pos logit contributions
 cheeks+0.669
 mouths+0.627
 says+0.602
 avail+0.562
 trucks+0.561
Neg logit contributions
and-1.234
year-1.167
amed-1.149
 late-1.110
uttering-1.100
Token are
Token ID389
Feature activation-3.18e-01

Pos logit contributions
 says+0.298
 Ben+0.283
 Tom+0.268
 cheeks+0.265
 mouths+0.251
Neg logit contributions
amed-0.733
and-0.725
year-0.688
uttering-0.670
Along-0.667
Token beautiful
Token ID4950
Feature activation+3.19e-01

Pos logit contributions
 Ben+0.743
 says+0.731
 Tom+0.719
 Sam+0.682
 cheeks+0.669
Neg logit contributions
and-0.610
amed-0.568
year-0.526
Along-0.519
Just-0.503
Token.
Token ID13
Feature activation-5.35e-02

Pos logit contributions
and+0.653
amed+0.607
year+0.596
aming+0.550
 late+0.550
Neg logit contributions
 cheeks-0.620
 says-0.593
 mouths-0.579
 Ben-0.572
 Tom-0.568
Token She
Token ID1375
Feature activation-9.52e-02

Pos logit contributions
 cheeks+0.091
 says+0.086
 mouths+0.084
 Ben+0.080
 trucks+0.079
Neg logit contributions
and-0.124
amed-0.121
year-0.115
Along-0.112
Just-0.112
Token says
Token ID1139
Feature activation-4.90e-01

Pos logit contributions
 cheeks+0.053
 mouths+0.034
 Ben+0.033
 Lee+0.032
 Tom+0.031
Neg logit contributions
amed-0.335
and-0.323
year-0.315
Along-0.312
aming-0.307
Token she
Token ID673
Feature activation-1.66e-01

Pos logit contributions
 cheeks+0.675
 mouths+0.629
 says+0.629
 avail+0.579
ens+0.571
Neg logit contributions
and-1.229
year-1.167
amed-1.115
 late-1.101
Just-1.079
Token loves
Token ID10408
Feature activation-5.43e-02

Pos logit contributions
 says+0.225
 Ben+0.220
 cheeks+0.219
 Tom+0.210
 mouths+0.204
Neg logit contributions
amed-0.454
and-0.448
year-0.426
Along-0.411
uttering-0.409
Token them
Token ID606
Feature activation-1.16e-01

Pos logit contributions
 cheeks+0.090
 says+0.083
 mouths+0.079
 Lee+0.076
ens+0.075
Neg logit contributions
and-0.130
year-0.120
amed-0.120
uttering-0.113
 late-0.111
Token.
Token ID13
Feature activation-5.24e-02

Pos logit contributions
 says+0.200
 Ben+0.199
 Tom+0.192
 cheeks+0.191
 mouths+0.183
Neg logit contributions
amed-0.278
and-0.269
year-0.256
Along-0.249
uttering-0.248
Token She
Token ID1375
Feature activation-9.49e-02

Pos logit contributions
 cheeks+0.095
 says+0.089
 mouths+0.088
 trucks+0.083
 floats+0.083
Neg logit contributions
and-0.116
amed-0.112
year-0.107
 late-0.105
Along-0.104
Token hugs
Token ID40657
Feature activation-2.01e-01

Pos logit contributions
amed+0.014
and+0.013
aming+0.013
 late+0.013
Along+0.013
Neg logit contributions
 cheeks-0.008
 mouths-0.007
 guy-0.006
 tongues-0.006
 avail-0.006
Token them
Token ID606
Feature activation-1.17e-01

Pos logit contributions
 says+0.395
 cheeks+0.393
 mouths+0.382
 floats+0.366
 acts+0.349
Neg logit contributions
and-0.410
amed-0.372
year-0.357
 late-0.343
Just-0.333
Token back
Token ID736
Feature activation+1.68e-01

Pos logit contributions
 says+0.197
 cheeks+0.192
 Ben+0.190
 Tom+0.183
 mouths+0.182
Neg logit contributions
amed-0.278
and-0.276
year-0.258
Along-0.246
aming-0.245
Token.
Token ID13
Feature activation-5.53e-02

Pos logit contributions
amed+0.407
and+0.406
year+0.383
Along+0.366
uttering+0.364
Neg logit contributions
 says-0.276
 cheeks-0.271
 Ben-0.269
 Tom-0.258
 mouths-0.256
Token She
Token ID1375
Feature activation-9.61e-02

Pos logit contributions
 cheeks+0.095
 says+0.090
 mouths+0.089
 floats+0.086
 ah+0.084
Neg logit contributions
amed-0.124
and-0.124
year-0.116
 late-0.113
Along-0.113
Token says
Token ID1139
Feature activation-4.90e-01

Pos logit contributions
 cheeks+0.092
 mouths+0.072
 Ben+0.068
 Tom+0.066
 Lee+0.065
Neg logit contributions
amed-0.299
and-0.284
Along-0.276
year-0.275
aming-0.273
Token "
Token ID366
Feature activation-5.79e-02

Pos logit contributions
 cheeks+0.656
 mouths+0.631
 says+0.617
 avail+0.562
 trucks+0.554
Neg logit contributions
and-1.252
amed-1.187
year-1.171
 late-1.118
 much-1.106
Tokenwhat
Token ID10919
Feature activation+2.54e-01

Pos logit contributions
 Ben+0.137
 says+0.136
 Tom+0.133
ens+0.127
 Sam+0.126
Neg logit contributions
amed-0.101
and-0.098
year-0.094
uttering-0.090
Along-0.087
Token are
Token ID389
Feature activation-3.17e-01

Pos logit contributions
amed+0.673
and+0.657
year+0.626
uttering+0.606
Along+0.605
Neg logit contributions
 says-0.392
 Ben-0.384
 Tom-0.371
 cheeks-0.352
 Sam-0.344
Token you
Token ID345
Feature activation-1.47e-01

Pos logit contributions
 cheeks+0.463
 says+0.455
 mouths+0.429
 Ben+0.413
 Tom+0.397
Neg logit contributions
and-0.827
amed-0.814
year-0.777
Along-0.748
uttering-0.746
Token doing
Token ID1804
Feature activation+1.47e-01

Pos logit contributions
 Ben+0.293
 says+0.279
 Tom+0.276
 Sam+0.263
 cheeks+0.255
Neg logit contributions
and-0.344
year-0.320
amed-0.317
Along-0.299
Days-0.294
Token ch
Token ID442
Feature activation-3.69e-02

Pos logit contributions
 cheeks+0.090
 Ben+0.083
 Lee+0.078
 Tom+0.078
 mouths+0.077
Neg logit contributions
amed-0.296
and-0.283
year-0.277
Along-0.273
Just-0.272
Tokenases
Token ID1386
Feature activation-8.54e-02

Pos logit contributions
 cheeks+0.066
 says+0.063
 Lee+0.058
 Tom+0.058
 mouths+0.057
Neg logit contributions
amed-0.085
and-0.084
year-0.079
uttering-0.078
aming-0.075
Token Sara
Token ID24799
Feature activation-3.24e-01

Pos logit contributions
 cheeks+0.104
 mouths+0.093
 says+0.089
 floats+0.085
 ah+0.078
Neg logit contributions
and-0.255
amed-0.233
year-0.227
 late-0.221
Along-0.212
Token.
Token ID13
Feature activation-5.38e-02

Pos logit contributions
 says+0.578
 cheeks+0.566
 Ben+0.564
 Tom+0.544
 mouths+0.537
Neg logit contributions
amed-0.741
and-0.738
year-0.684
Along-0.658
uttering-0.655
Token He
Token ID679
Feature activation-9.62e-02

Pos logit contributions
 cheeks+0.096
 says+0.092
 mouths+0.090
 floats+0.090
 trucks+0.086
Neg logit contributions
and-0.120
amed-0.111
year-0.110
 late-0.109
Just-0.106
Token says
Token ID1139
Feature activation-4.90e-01

Pos logit contributions
 cheeks+0.103
 Lee+0.092
 Ben+0.089
 Tom+0.085
 mouths+0.082
Neg logit contributions
amed-0.274
and-0.259
Just-0.256
Along-0.255
year-0.253
Token,
Token ID11
Feature activation-2.36e-01

Pos logit contributions
 cheeks+0.678
 says+0.659
 mouths+0.627
 floats+0.596
 acts+0.576
Neg logit contributions
and-1.246
amed-1.219
year-1.171
 late-1.135
Just-1.121
Token "
Token ID366
Feature activation-5.56e-02

Pos logit contributions
 says+0.423
 cheeks+0.417
 mouths+0.404
 floats+0.397
 Ben+0.386
Neg logit contributions
and-0.498
amed-0.494
Just-0.465
year-0.463
Along-0.447
TokenGive
Token ID23318
Feature activation-1.33e-01

Pos logit contributions
 says+0.119
 Ben+0.117
 cheeks+0.115
 Tom+0.114
 mouths+0.110
Neg logit contributions
amed-0.111
and-0.108
year-0.102
uttering-0.097
Along-0.096
Token me
Token ID502
Feature activation-1.20e-01

Pos logit contributions
 cheeks+0.229
 mouths+0.214
 says+0.198
 floats+0.193
 Lee+0.188
Neg logit contributions
amed-0.319
and-0.310
Just-0.278
aming-0.276
 late-0.266
Token back
Token ID736
Feature activation+1.71e-01

Pos logit contributions
 cheeks+0.213
 says+0.203
 mouths+0.193
 floats+0.182
 Ben+0.181
Neg logit contributions
and-0.277
amed-0.275
year-0.251
Along-0.242
aming-0.239
Token pets
Token ID17252
Feature activation-1.31e-01

Pos logit contributions
 cheeks+0.535
 mouths+0.486
 avail+0.483
 Ben+0.464
 Tom+0.462
Neg logit contributions
amed-0.742
and-0.731
Along-0.673
Just-0.669
aking-0.660
Token the
Token ID262
Feature activation-1.94e-01

Pos logit contributions
 cheeks+0.208
 mouths+0.190
 says+0.190
 floats+0.171
 trucks+0.168
Neg logit contributions
and-0.311
amed-0.301
year-0.278
uttering-0.274
Along-0.272
Token dog
Token ID3290
Feature activation-2.91e-01

Pos logit contributions
 says+0.193
 cheeks+0.184
 Ben+0.182
 Tom+0.173
 mouths+0.169
Neg logit contributions
amed-0.583
and-0.578
year-0.548
uttering-0.539
Along-0.537
Token.
Token ID13
Feature activation-5.44e-02

Pos logit contributions
 cheeks+0.530
 says+0.502
 Ben+0.497
 mouths+0.489
 Tom+0.485
Neg logit contributions
and-0.647
amed-0.620
year-0.573
Along-0.561
uttering-0.551
Token She
Token ID1375
Feature activation-9.32e-02

Pos logit contributions
 cheeks+0.102
 says+0.094
 mouths+0.093
 trucks+0.091
 floats+0.089
Neg logit contributions
and-0.115
amed-0.106
 late-0.104
year-0.104
Along-0.103
Token says
Token ID1139
Feature activation-4.90e-01

Pos logit contributions
 cheeks+0.115
 Lee+0.097
 avail+0.093
 mouths+0.091
ens+0.090
Neg logit contributions
amed-0.238
Along-0.229
and-0.229
Just-0.224
 truly-0.219
Token,
Token ID11
Feature activation-2.37e-01

Pos logit contributions
 cheeks+0.691
 says+0.671
 mouths+0.649
ens+0.611
 Ben+0.604
Neg logit contributions
and-1.196
amed-1.164
year-1.136
pr-1.070
uttering-1.069
Token "
Token ID366
Feature activation-5.58e-02

Pos logit contributions
 says+0.411
 cheeks+0.410
 mouths+0.393
 Ben+0.381
 floats+0.372
Neg logit contributions
and-0.523
amed-0.516
year-0.490
Along-0.475
Just-0.473
TokenYou
Token ID1639
Feature activation+2.09e-01

Pos logit contributions
 says+0.123
 Ben+0.120
 Tom+0.117
 cheeks+0.115
ens+0.112
Neg logit contributions
amed-0.110
and-0.106
year-0.101
uttering-0.096
Along-0.093
Token are
Token ID389
Feature activation-3.20e-01

Pos logit contributions
amed+0.650
and+0.616
year+0.598
Along+0.593
aming+0.587
Neg logit contributions
 cheeks-0.205
 mouths-0.184
 says-0.175
 Ben-0.173
 Lee-0.171
Token a
Token ID257
Feature activation+1.61e-01

Pos logit contributions
 Ben+0.760
 Tom+0.744
 says+0.735
 Sam+0.709
 cheeks+0.669
Neg logit contributions
and-0.604
amed-0.582
year-0.563
uttering-0.533
Along-0.520
Token and
Token ID290
Feature activation+1.57e-02

Pos logit contributions
 Ben+0.362
 says+0.349
 Tom+0.344
 Sam+0.330
 cheeks+0.326
Neg logit contributions
amed-0.523
and-0.497
year-0.483
aming-0.482
uttering-0.475
Token dance
Token ID9280
Feature activation-1.68e-01

Pos logit contributions
and+0.039
amed+0.038
year+0.035
Along+0.035
Just+0.034
Neg logit contributions
 says-0.022
 cheeks-0.021
 Lee-0.021
 Ben-0.021
ens-0.021
Token.
Token ID13
Feature activation-3.59e-02

Pos logit contributions
 Ben+0.307
 says+0.299
 Tom+0.290
 cheeks+0.282
 Sam+0.277
Neg logit contributions
amed-0.385
and-0.365
aming-0.362
year-0.358
uttering-0.343
Token\n
Token ID198
Feature activation+1.76e-01

Pos logit contributions
 cheeks+0.082
 mouths+0.075
 says+0.075
 ah+0.072
 acts+0.072
Neg logit contributions
and-0.065
amed-0.058
Along-0.058
Just-0.054
 late-0.053
Token\n
Token ID198
Feature activation+1.77e-01

Pos logit contributions
amed+0.426
year+0.383
aming+0.376
and+0.366
uttering+0.363
Neg logit contributions
 Tom-0.333
 Ben-0.332
 says-0.332
 Sam-0.309
 Tim-0.291
TokenOne
Token ID3198
Feature activation+4.21e-01

Pos logit contributions
amed+0.254
and+0.233
year+0.221
uttering+0.202
aming+0.200
Neg logit contributions
 Tom-0.491
 Ben-0.489
 says-0.483
 Sam-0.469
 Tim-0.458
Token day
Token ID1110
Feature activation+8.01e-02

Pos logit contributions
amed+0.965
and+0.933
year+0.874
Along+0.837
aming+0.831
Neg logit contributions
 says-0.786
 Ben-0.786
 Tom-0.763
 cheeks-0.738
 Sam-0.729
Token,
Token ID11
Feature activation-2.21e-01

Pos logit contributions
amed+0.217
and+0.210
year+0.201
Along+0.200
uttering+0.199
Neg logit contributions
 says-0.104
 cheeks-0.101
 mouths-0.094
 Lee-0.088
 Ben-0.088
Token Tim
Token ID5045
Feature activation-3.68e-01

Pos logit contributions
 says+0.207
 cheeks+0.206
 mouths+0.177
 acts+0.177
 Lee+0.156
Neg logit contributions
amed-0.670
and-0.651
Along-0.644
year-0.637
 late-0.637
Tokenmy
Token ID1820
Feature activation+3.96e-02

Pos logit contributions
 says+0.668
 Ben+0.654
 cheeks+0.634
 Tom+0.632
 mouths+0.606
Neg logit contributions
amed-0.844
and-0.834
year-0.791
uttering-0.754
Along-0.751
Token accidentally
Token ID14716
Feature activation+6.56e-03

Pos logit contributions
amed+0.089
and+0.087
Along+0.079
aming+0.079
year+0.079
Neg logit contributions
 cheeks-0.071
 says-0.069
 mouths-0.068
 Ben-0.068
 Tom-0.066
Token touch
Token ID3638
Feature activation-1.62e-01

Pos logit contributions
Along+0.581
and+0.557
amed+0.548
Just+0.519
On+0.514
Neg logit contributions
 mouths-0.458
 cheeks-0.455
 says-0.445
ens-0.440
 Lee-0.430
Token it
Token ID340
Feature activation-2.14e-01

Pos logit contributions
 cheeks+0.220
 says+0.210
 mouths+0.193
 Lee+0.187
 Ben+0.184
Neg logit contributions
and-0.435
amed-0.429
year-0.411
Along-0.393
aming-0.390
Token.
Token ID13
Feature activation-2.92e-02

Pos logit contributions
 says+0.338
 Ben+0.330
 cheeks+0.323
 Tom+0.317
 mouths+0.305
Neg logit contributions
amed-0.546
and-0.532
year-0.509
aming-0.486
Along-0.486
Token\n
Token ID198
Feature activation+1.90e-01

Pos logit contributions
 says+0.051
 cheeks+0.049
 Ben+0.049
 Tom+0.047
 mouths+0.047
Neg logit contributions
amed-0.069
and-0.068
year-0.064
Along-0.062
uttering-0.061
Token\n
Token ID198
Feature activation+1.88e-01

Pos logit contributions
amed+0.495
uttering+0.480
year+0.478
pr+0.465
 supposed+0.459
Neg logit contributions
 Ben-0.309
 Tom-0.296
 says-0.281
 Sam-0.280
 Max-0.240
TokenSuddenly
Token ID38582
Feature activation+4.90e-01

Pos logit contributions
 supposed+0.329
uttering+0.324
year+0.324
amed+0.316
and+0.307
Neg logit contributions
 Ben-0.419
 Tom-0.388
 Sam-0.386
 says-0.358
ens-0.355
Token he
Token ID339
Feature activation-1.76e-01

Pos logit contributions
amed+1.195
Along+1.137
 late+1.116
Just+1.081
and+1.078
Neg logit contributions
 cheeks-0.696
 says-0.694
 mouths-0.680
 floats-0.619
 Lee-0.605
Token was
Token ID373
Feature activation-1.58e-03

Pos logit contributions
 Ben+0.320
 says+0.319
 cheeks+0.304
 Tom+0.304
 Sam+0.298
Neg logit contributions
and-0.413
amed-0.397
year-0.388
uttering-0.367
Along-0.354
Token very
Token ID845
Feature activation+4.55e-01

Pos logit contributions
 Ben+0.003
 Tom+0.003
 Sam+0.003
 says+0.003
 Mia+0.003
Neg logit contributions
and-0.004
year-0.003
ust-0.003
amed-0.003
Days-0.003
Token frightened
Token ID24776
Feature activation+2.56e-01

Pos logit contributions
and+0.883
amed+0.839
year+0.778
Along+0.735
Days+0.714
Neg logit contributions
 Ben-1.067
 says-1.046
 Tom-1.024
 Sam-1.003
 cheeks-0.959
Token.
Token ID13
Feature activation-3.17e-02

Pos logit contributions
and+0.597
amed+0.589
year+0.554
uttering+0.532
aming+0.532
Neg logit contributions
 says-0.449
 cheeks-0.437
 Ben-0.426
 mouths-0.421
 Tom-0.408
Token,
Token ID11
Feature activation-2.23e-01

Pos logit contributions
and+0.527
amed+0.521
 late+0.497
year+0.495
uttering+0.495
Neg logit contributions
 says-0.342
 cheeks-0.319
 mouths-0.307
ens-0.307
 acts-0.298
Token Mom
Token ID11254
Feature activation-2.92e-01

Pos logit contributions
 acts+0.288
 cheeks+0.287
 says+0.283
 comes+0.248
ens+0.231
Neg logit contributions
 late-0.562
amed-0.550
Along-0.524
and-0.522
On-0.518
Token put
Token ID1234
Feature activation-1.40e-01

Pos logit contributions
 cheeks+0.541
 mouths+0.499
 Ben+0.476
 says+0.469
 Lee+0.459
Neg logit contributions
amed-0.675
and-0.631
year-0.606
aming-0.605
Along-0.583
Token the
Token ID262
Feature activation-1.92e-01

Pos logit contributions
 cheeks+0.230
 says+0.221
 Ben+0.201
 Lee+0.198
 mouths+0.197
Neg logit contributions
amed-0.346
and-0.326
year-0.320
aming-0.305
 late-0.297
Token pizz
Token ID38748
Feature activation-1.67e-01

Pos logit contributions
 cheeks+0.239
 says+0.237
 Ben+0.232
 Tom+0.222
 Lee+0.213
Neg logit contributions
amed-0.540
and-0.526
year-0.500
aming-0.490
Along-0.488
Tokenas
Token ID292
Feature activation+4.62e-01

Pos logit contributions
 says+0.453
 Ben+0.451
 Tom+0.441
 mouths+0.430
 cheeks+0.428
Neg logit contributions
amed-0.187
Just-0.171
 supposed-0.169
year-0.168
Along-0.167
Token in
Token ID287
Feature activation+2.84e-01

Pos logit contributions
amed+0.746
and+0.728
year+0.665
aming+0.622
Along+0.621
Neg logit contributions
 says-1.142
 cheeks-1.121
 Ben-1.120
 Tom-1.092
 mouths-1.078
Token the
Token ID262
Feature activation-1.93e-01

Pos logit contributions
amed+0.660
and+0.637
year+0.608
aming+0.598
 late+0.589
Neg logit contributions
 says-0.482
 cheeks-0.480
 floats-0.437
 mouths-0.434
ens-0.428
Token oven
Token ID14361
Feature activation-2.36e-01

Pos logit contributions
 cheeks+0.219
 says+0.216
 Ben+0.214
 Tom+0.207
 Lee+0.198
Neg logit contributions
amed-0.555
and-0.536
year-0.512
aming-0.501
Along-0.498
Token.
Token ID13
Feature activation-3.80e-02

Pos logit contributions
 cheeks+0.433
 says+0.423
 Ben+0.422
 mouths+0.405
ens+0.402
Neg logit contributions
amed-0.515
and-0.507
year-0.463
aming-0.452
Along-0.445
Token She
Token ID1375
Feature activation-8.61e-02

Pos logit contributions
 acts+0.056
 cheeks+0.055
pa+0.053
 avail+0.053
 says+0.052
Neg logit contributions
 late-0.089
amed-0.084
and-0.083
year-0.082
Along-0.080
Token should
Token ID815
Feature activation+1.18e-01

Pos logit contributions
 cheeks+0.252
 Ben+0.241
 mouths+0.239
 says+0.236
 Tom+0.233
Neg logit contributions
amed-0.374
and-0.364
year-0.346
Along-0.332
aming-0.332
Token do
Token ID466
Feature activation-1.39e-01

Pos logit contributions
and+0.229
 truly+0.220
 supposed+0.218
 late+0.217
ust+0.217
Neg logit contributions
 Lee-0.197
 mouths-0.193
ens-0.187
 cheeks-0.186
 says-0.177
Token.
Token ID13
Feature activation-3.24e-02

Pos logit contributions
 says+0.227
 cheeks+0.224
 Ben+0.222
 Tom+0.213
 mouths+0.213
Neg logit contributions
amed-0.336
and-0.335
year-0.312
Along-0.299
aming-0.299
Token\n
Token ID198
Feature activation+1.86e-01

Pos logit contributions
 cheeks+0.058
 says+0.055
 mouths+0.054
 floats+0.052
ens+0.051
Neg logit contributions
and-0.074
amed-0.070
year-0.068
Along-0.066
Just-0.065
Token\n
Token ID198
Feature activation+1.83e-01

Pos logit contributions
amed+0.394
uttering+0.356
and+0.354
year+0.344
aming+0.341
Neg logit contributions
 Ben-0.387
 Tom-0.381
 says-0.381
 Sam-0.365
 Tim-0.331
TokenSuddenly
Token ID38582
Feature activation+4.96e-01

Pos logit contributions
amed+0.317
uttering+0.287
and+0.287
 supposed+0.283
year+0.273
Neg logit contributions
 Ben-0.429
 Tom-0.422
 says-0.408
 Sam-0.408
 Lee-0.374
Token he
Token ID339
Feature activation-1.71e-01

Pos logit contributions
 late+1.038
amed+1.033
Along+0.991
and+0.982
year+0.958
Neg logit contributions
 mouths-0.745
 cheeks-0.743
 says-0.707
 tongues-0.689
 floats-0.685
Token heard
Token ID2982
Feature activation+1.69e-01

Pos logit contributions
 says+0.343
 cheeks+0.339
 Ben+0.337
 Tom+0.329
 mouths+0.324
Neg logit contributions
amed-0.354
and-0.341
year-0.321
Along-0.308
aming-0.307
Token someone
Token ID2130
Feature activation+1.94e-02

Pos logit contributions
and+0.272
amed+0.269
year+0.256
Along+0.231
 late+0.225
Neg logit contributions
 says-0.399
 cheeks-0.396
 mouths-0.394
ens-0.377
 Ben-0.372
Token say
Token ID910
Feature activation-3.47e-01

Pos logit contributions
and+0.047
amed+0.046
year+0.044
Along+0.042
uttering+0.042
Neg logit contributions
 cheeks-0.031
 Ben-0.031
 says-0.031
 mouths-0.030
 Tom-0.030
Token,
Token ID11
Feature activation-2.14e-01

Pos logit contributions
 says+0.434
 cheeks+0.422
 mouths+0.406
 Ben+0.401
 Tom+0.378
Neg logit contributions
amed-0.969
and-0.965
year-0.922
uttering-0.880
pr-0.874
Token her
Token ID607
Feature activation-1.15e-01

Pos logit contributions
 cheeks+0.062
 says+0.057
 floats+0.055
 mouths+0.054
 ah+0.050
Neg logit contributions
and-0.093
amed-0.088
year-0.082
 late-0.076
Along-0.076
Token anywhere
Token ID6609
Feature activation+2.01e-01

Pos logit contributions
 says+0.204
 Ben+0.199
 cheeks+0.198
 Tom+0.192
 mouths+0.189
Neg logit contributions
amed-0.264
and-0.263
year-0.243
Along-0.235
uttering-0.235
Token.
Token ID13
Feature activation-4.17e-02

Pos logit contributions
and+0.455
amed+0.407
year+0.387
Along+0.377
Just+0.370
Neg logit contributions
 cheeks-0.371
 says-0.371
 mouths-0.358
 Ben-0.352
 Lee-0.336
Token He
Token ID679
Feature activation-8.79e-02

Pos logit contributions
 says+0.067
 cheeks+0.065
 floats+0.063
 mouths+0.062
 Ben+0.061
Neg logit contributions
and-0.105
amed-0.098
year-0.094
Along-0.091
Just-0.090
Token was
Token ID373
Feature activation-1.36e-02

Pos logit contributions
 says+0.146
 cheeks+0.145
 Ben+0.144
 Tom+0.138
 mouths+0.137
Neg logit contributions
amed-0.211
and-0.206
year-0.196
Along-0.188
aming-0.185
Token impressed
Token ID12617
Feature activation+4.22e-01

Pos logit contributions
 Ben+0.030
 Tom+0.029
 says+0.027
 Sam+0.027
 Lily+0.026
Neg logit contributions
and-0.030
amed-0.027
Along-0.026
if-0.025
uttering-0.025
Token by
Token ID416
Feature activation+2.69e-01

Pos logit contributions
and+0.947
amed+0.946
year+0.877
Along+0.839
uttering+0.837
Neg logit contributions
 says-0.767
 cheeks-0.762
 Ben-0.750
 mouths-0.727
 Tom-0.723
Token how
Token ID703
Feature activation-9.29e-03

Pos logit contributions
and+0.652
amed+0.622
year+0.622
pr+0.574
Along+0.566
Neg logit contributions
 says-0.405
 floats-0.379
 mouths-0.378
 cheeks-0.375
ens-0.355
Token well
Token ID880
Feature activation+2.43e-01

Pos logit contributions
 says+0.016
 Ben+0.016
 Tom+0.015
 cheeks+0.015
 Sam+0.015
Neg logit contributions
amed-0.022
and-0.022
year-0.020
Along-0.020
uttering-0.020
Token she
Token ID673
Feature activation-1.52e-01

Pos logit contributions
and+0.540
amed+0.534
year+0.505
 late+0.488
pr+0.481
Neg logit contributions
 cheeks-0.392
 mouths-0.389
 says-0.380
 floats-0.341
 Ben-0.339
Token hid
Token ID24519
Feature activation-1.02e-01

Pos logit contributions
 cheeks+0.274
 says+0.273
 Ben+0.269
 mouths+0.262
 Tom+0.259
Neg logit contributions
amed-0.347
and-0.339
year-0.323
Along-0.308
aming-0.306
Token had
Token ID550
Feature activation+4.11e-02

Pos logit contributions
 cheeks+0.011
 Ben+0.011
 mouths+0.011
 says+0.010
 Lee+0.010
Neg logit contributions
and-0.013
amed-0.013
year-0.012
aming-0.011
Along-0.011
Token and
Token ID290
Feature activation+2.35e-02

Pos logit contributions
amed+0.088
year+0.088
and+0.086
Along+0.080
uttering+0.078
Neg logit contributions
 says-0.084
 Ben-0.080
 Tom-0.079
 cheeks-0.076
 Sam-0.075
Token enjoyed
Token ID8359
Feature activation+2.14e-01

Pos logit contributions
amed+0.052
and+0.052
year+0.048
Along+0.045
 supposed+0.045
Neg logit contributions
 says-0.040
 cheeks-0.040
 Ben-0.039
 Lee-0.039
 mouths-0.038
Token her
Token ID607
Feature activation-1.08e-01

Pos logit contributions
amed+0.571
and+0.566
year+0.536
uttering+0.518
Along+0.516
Neg logit contributions
 says-0.301
 cheeks-0.288
 Ben-0.285
 Tom-0.272
 mouths-0.270
Token own
Token ID898
Feature activation+2.88e-02

Pos logit contributions
 says+0.217
 cheeks+0.215
 Ben+0.212
 Tom+0.205
 mouths+0.203
Neg logit contributions
amed-0.232
and-0.223
year-0.212
aming-0.200
Along-0.200
Token colorful
Token ID20239
Feature activation+1.46e-01

Pos logit contributions
amed+0.066
and+0.065
year+0.061
Along+0.058
uttering+0.058
Neg logit contributions
 says-0.052
 Ben-0.051
 cheeks-0.050
 Tom-0.049
 mouths-0.048
Token slide
Token ID10649
Feature activation-2.85e-01

Pos logit contributions
and+0.355
amed+0.349
uttering+0.319
year+0.318
Along+0.316
Neg logit contributions
 says-0.225
 Ben-0.223
 Tom-0.220
 Sam-0.205
ens-0.203
Token.
Token ID13
Feature activation-3.63e-02

Pos logit contributions
 says+0.529
 Ben+0.517
 cheeks+0.516
 Tom+0.497
 mouths+0.493
Neg logit contributions
amed-0.632
and-0.627
year-0.582
uttering-0.556
Along-0.556
Token<|endoftext|>
Token ID50256
Feature activation+2.74e-01

Pos logit contributions
 says+0.081
 cheeks+0.080
 Ben+0.079
 Tom+0.077
 mouths+0.077
Neg logit contributions
amed-0.067
and-0.067
year-0.061
Along-0.058
uttering-0.057
TokenOnce
Token ID7454
Feature activation+3.80e-01

Pos logit contributions
amed+0.586
and+0.554
year+0.545
uttering+0.528
pr+0.512
Neg logit contributions
 Ben-0.569
 Tom-0.561
 says-0.560
 Sam-0.525
 cheeks-0.495
Token upon
Token ID2402
Feature activation+3.14e-01

Pos logit contributions
amed+0.528
and+0.512
year+0.462
Along+0.427
aming+0.426
Neg logit contributions
 says-1.033
 Ben-1.021
 cheeks-1.005
 Tom-0.998
 mouths-0.975
Token joy
Token ID8716
Feature activation+5.30e-02

Pos logit contributions
amed+0.029
and+0.029
year+0.027
aming+0.025
uttering+0.025
Neg logit contributions
 says-0.028
 cheeks-0.028
 Ben-0.028
 Tom-0.027
 mouths-0.027
Token.
Token ID13
Feature activation-3.93e-02

Pos logit contributions
amed+0.126
and+0.124
year+0.117
uttering+0.112
aming+0.111
Neg logit contributions
 says-0.091
 Ben-0.088
 cheeks-0.087
 Tom-0.084
 mouths-0.084
Token<|endoftext|>
Token ID50256
Feature activation+2.73e-01

Pos logit contributions
 says+0.084
 cheeks+0.083
 Ben+0.083
 Tom+0.080
 mouths+0.079
Neg logit contributions
amed-0.076
and-0.075
year-0.070
Along-0.066
uttering-0.066
TokenOnce
Token ID7454
Feature activation+3.77e-01

Pos logit contributions
amed+0.586
and+0.569
year+0.541
uttering+0.517
aming+0.511
Neg logit contributions
 Ben-0.541
 says-0.539
 Tom-0.524
 cheeks-0.500
 Sam-0.496
Token,
Token ID11
Feature activation-2.18e-01

Pos logit contributions
and+0.444
amed+0.427
year+0.380
uttering+0.356
Along+0.349
Neg logit contributions
 cheeks-1.081
 says-1.058
 mouths-1.040
 Ben-1.034
 Lee-1.006
Token there
Token ID612
Feature activation+2.14e-01

Pos logit contributions
 Ben+0.488
 says+0.480
 Tom+0.479
 cheeks+0.457
 Sam+0.456
Neg logit contributions
amed-0.415
and-0.406
year-0.373
Along-0.357
aming-0.356
Token was
Token ID373
Feature activation-8.11e-03

Pos logit contributions
and+0.405
amed+0.398
year+0.352
aming+0.349
Along+0.347
Neg logit contributions
 cheeks-0.458
 mouths-0.432
 says-0.432
 Ben-0.415
 Lee-0.411
Token a
Token ID257
Feature activation+1.74e-01

Pos logit contributions
 Ben+0.016
 Tom+0.016
 says+0.014
 Sam+0.014
 Tim+0.014
Neg logit contributions
and-0.019
amed-0.018
year-0.017
uttering-0.017
pr-0.017
Token little
Token ID1310
Feature activation+4.26e-02

Pos logit contributions
amed+0.381
and+0.377
year+0.357
Along+0.345
uttering+0.341
Neg logit contributions
 says-0.322
 cheeks-0.315
 Ben-0.309
 mouths-0.302
 Tom-0.301
Token girl
Token ID2576
Feature activation-1.50e-01

Pos logit contributions
year+0.091
amed+0.090
uttering+0.090
and+0.089
 late+0.089
Neg logit contributions
 cheeks-0.071
 says-0.070
 mouths-0.068
 Tom-0.062
 begs-0.061
Token named
Token ID3706
Feature activation+1.33e-02

Pos logit contributions
 says+0.251
 Ben+0.246
 cheeks+0.242
 Tom+0.237
 mouths+0.229
Neg logit contributions
amed-0.365
and-0.359
year-0.339
Along-0.326
uttering-0.325
Token much
Token ID881
Feature activation+4.80e-01

Pos logit contributions
amed+0.487
and+0.476
year+0.424
Along+0.400
uttering+0.399
Neg logit contributions
 says-0.881
 Ben-0.870
 Tom-0.853
 cheeks-0.849
 mouths-0.826
Token fun
Token ID1257
Feature activation+2.16e-01

Pos logit contributions
amed+0.938
and+0.890
year+0.865
uttering+0.805
aming+0.782
Neg logit contributions
 says-1.099
 Ben-1.068
 Tom-1.044
 Sam-1.015
 cheeks-1.007
Token!
Token ID0
Feature activation+2.96e-02

Pos logit contributions
amed+0.500
and+0.495
year+0.463
Along+0.445
uttering+0.443
Neg logit contributions
 says-0.385
 Ben-0.376
 cheeks-0.375
 Tom-0.363
 mouths-0.357
Token
Token ID220
Feature activation-2.50e-02

Pos logit contributions
and+0.064
amed+0.059
Along+0.055
year+0.055
Just+0.054
Neg logit contributions
 cheeks-0.057
 says-0.054
 mouths-0.053
 floats-0.053
 Ben-0.051
Token\n
Token ID198
Feature activation+1.86e-01

Pos logit contributions
 Ben+0.050
 Tom+0.050
 says+0.050
 Sam+0.047
 Tim+0.043
Neg logit contributions
amed-0.056
year-0.053
and-0.050
aming-0.050
uttering-0.049
Token\n
Token ID198
Feature activation+1.86e-01

Pos logit contributions
amed+0.490
uttering+0.470
pr+0.456
year+0.453
aming+0.448
Neg logit contributions
 Tom-0.301
 Ben-0.300
 says-0.295
 Sam-0.265
 Tim-0.251
TokenThe
Token ID464
Feature activation+3.61e-01

Pos logit contributions
amed+0.337
uttering+0.335
 supposed+0.297
year+0.281
and+0.277
Neg logit contributions
 Ben-0.405
 Tom-0.400
 says-0.378
 Sam-0.374
 Tim-0.363
Token boy
Token ID2933
Feature activation-2.08e-01

Pos logit contributions
amed+1.062
and+1.050
year+1.005
Along+0.982
aming+0.971
Neg logit contributions
 says-0.401
 cheeks-0.392
 Ben-0.388
 Tom-0.365
 mouths-0.365
Token then
Token ID788
Feature activation-7.80e-02

Pos logit contributions
 cheeks+0.376
 Ben+0.371
 says+0.370
 Lee+0.360
 Tom+0.358
Neg logit contributions
amed-0.470
and-0.453
year-0.424
Along-0.411
aming-0.406
Token filled
Token ID5901
Feature activation+1.53e-01

Pos logit contributions
 cheeks+0.152
 Lee+0.142
 mouths+0.141
 Ben+0.134
 Tom+0.132
Neg logit contributions
amed-0.167
year-0.149
and-0.149
Along-0.145
aming-0.142
Token up
Token ID510
Feature activation+1.31e-01

Pos logit contributions
amed+0.412
and+0.407
year+0.386
Along+0.372
aming+0.370
Neg logit contributions
 says-0.214
 cheeks-0.209
 Ben-0.208
 Tom-0.198
 mouths-0.195
Token,
Token ID11
Feature activation-2.19e-01

Pos logit contributions
 much+0.260
and+0.247
ust+0.245
 very+0.240
Just+0.229
Neg logit contributions
 acts-0.238
 says-0.220
 writes-0.211
 pret-0.206
 mouths-0.206
Token he
Token ID339
Feature activation-1.87e-01

Pos logit contributions
 cheeks+0.282
 says+0.279
 mouths+0.278
 Lee+0.255
ens+0.244
Neg logit contributions
and-0.594
year-0.553
amed-0.537
Along-0.532
ust-0.526
Token's
Token ID338
Feature activation-3.39e-02

Pos logit contributions
 cheeks+0.280
 Ben+0.276
 says+0.273
 mouths+0.264
 Tom+0.264
Neg logit contributions
amed-0.480
and-0.461
year-0.439
aming-0.430
Along-0.425
Token not
Token ID407
Feature activation+1.81e-01

Pos logit contributions
 says+0.092
 cheeks+0.090
 Ben+0.090
 Tom+0.088
 mouths+0.087
Neg logit contributions
amed-0.048
and-0.046
year-0.041
aming-0.038
Along-0.038
Token rude
Token ID22066
Feature activation+1.51e-01

Pos logit contributions
amed+0.345
and+0.331
year+0.316
uttering+0.297
aming+0.297
Neg logit contributions
 Ben-0.409
 says-0.405
 Tom-0.400
 cheeks-0.391
 Sam-0.383
Token like
Token ID588
Feature activation+1.83e-01

Pos logit contributions
and+0.321
amed+0.296
year+0.263
 supposed+0.261
 late+0.259
Neg logit contributions
 cheeks-0.286
 says-0.284
 mouths-0.278
 Ben-0.276
 Tom-0.264
Token some
Token ID617
Feature activation+7.42e-02

Pos logit contributions
and+0.443
amed+0.400
 supposed+0.393
 late+0.391
Along+0.380
Neg logit contributions
 says-0.283
 cheeks-0.274
 mouths-0.263
 Lee-0.249
 floats-0.247
Token kids
Token ID3988
Feature activation-2.59e-01

Pos logit contributions
and+0.163
amed+0.155
year+0.146
aming+0.139
Along+0.138
Neg logit contributions
 says-0.135
 Ben-0.135
 cheeks-0.130
 Lee-0.130
 Tom-0.129
Token.
Token ID13
Feature activation-3.48e-02

Pos logit contributions
 says+0.452
 cheeks+0.450
 Ben+0.441
 mouths+0.428
 Tom+0.421
Neg logit contributions
and-0.596
amed-0.584
year-0.542
Along-0.521
aming-0.519
Token He
Token ID679
Feature activation-8.50e-02

Pos logit contributions
 cheeks+0.055
 says+0.054
 mouths+0.053
 Ben+0.049
ens+0.049
Neg logit contributions
and-0.085
amed-0.081
year-0.079
Just-0.077
Along-0.075
Token's
Token ID338
Feature activation-3.43e-02

Pos logit contributions
 cheeks+0.144
 Ben+0.142
 Tom+0.139
 Lee+0.137
 mouths+0.136
Neg logit contributions
amed-0.194
and-0.184
year-0.175
Along-0.171
sy-0.170
Token did
Token ID750
Feature activation-9.58e-02

Pos logit contributions
 cheeks+0.630
 says+0.630
 Ben+0.613
 mouths+0.599
 Tom+0.594
Neg logit contributions
amed-0.820
and-0.809
year-0.745
Along-0.727
aming-0.720
Token.
Token ID13
Feature activation-2.90e-02

Pos logit contributions
 says+0.153
 cheeks+0.149
 Ben+0.146
 mouths+0.141
 Tom+0.140
Neg logit contributions
amed-0.236
and-0.236
year-0.218
Along-0.213
uttering-0.213
Token<|endoftext|>
Token ID50256
Feature activation+2.76e-01

Pos logit contributions
 says+0.066
 cheeks+0.066
 Ben+0.064
 mouths+0.063
 Tom+0.062
Neg logit contributions
and-0.052
amed-0.052
year-0.047
Along-0.045
uttering-0.044
TokenOnce
Token ID7454
Feature activation+3.80e-01

Pos logit contributions
amed+0.551
and+0.533
year+0.506
uttering+0.487
aming+0.474
Neg logit contributions
 Ben-0.592
 says-0.591
 Tom-0.578
 Sam-0.549
 cheeks-0.548
Token upon
Token ID2402
Feature activation+3.13e-01

Pos logit contributions
and+0.481
amed+0.469
year+0.417
uttering+0.390
Along+0.385
Neg logit contributions
 cheeks-1.057
 says-1.041
 Ben-1.020
 mouths-1.014
 Tom-0.996
Token a
Token ID257
Feature activation+1.76e-01

Pos logit contributions
amed+0.807
uttering+0.732
and+0.731
year+0.720
Days+0.699
Neg logit contributions
 Ben-0.555
 Tom-0.553
 says-0.499
 Sam-0.486
 Lily-0.474
Token time
Token ID640
Feature activation+1.67e-01

Pos logit contributions
amed+0.464
year+0.436
and+0.416
pr+0.394
aming+0.383
Neg logit contributions
 Ben-0.301
 Tom-0.301
 says-0.290
 Sam-0.263
ens-0.258
Token,
Token ID11
Feature activation-2.14e-01

Pos logit contributions
and+0.497
amed+0.481
year+0.455
uttering+0.451
Along+0.450
Neg logit contributions
 says-0.183
 cheeks-0.183
 mouths-0.172
 Ben-0.162
 floats-0.153
Token there
Token ID612
Feature activation+2.17e-01

Pos logit contributions
 Ben+0.462
 Tom+0.454
 says+0.445
 Sam+0.429
 Tim+0.412
Neg logit contributions
amed-0.434
and-0.428
year-0.388
aming-0.375
uttering-0.373
Token was
Token ID373
Feature activation-5.98e-03

Pos logit contributions
and+0.399
amed+0.393
year+0.346
aming+0.341
Along+0.341
Neg logit contributions
 cheeks-0.478
 says-0.445
 mouths-0.444
 Lee-0.425
ens-0.425
Token a
Token ID257
Feature activation+1.76e-01

Pos logit contributions
 Ben+0.012
 Tom+0.012
 Sam+0.011
 says+0.010
 Tim+0.010
Neg logit contributions
and-0.014
amed-0.014
year-0.013
ust-0.012
pr-0.012
Token spider
Token ID19230
Feature activation-1.96e-01

Pos logit contributions
amed+0.489
 late+0.467
and+0.462
Along+0.461
uttering+0.461
Neg logit contributions
 cheeks-0.325
ens-0.311
 says-0.307
 Lee-0.301
 Ben-0.300
Token web
Token ID3992
Feature activation+6.14e-02

Pos logit contributions
 says+0.342
 cheeks+0.340
 Ben+0.339
 Tom+0.327
 mouths+0.325
Neg logit contributions
amed-0.443
and-0.440
year-0.407
Along-0.393
aming-0.392
Token.
Token ID13
Feature activation-2.72e-02

Pos logit contributions
amed+0.150
and+0.148
year+0.140
Along+0.135
uttering+0.134
Neg logit contributions
 says-0.100
 Ben-0.098
 cheeks-0.097
 Tom-0.094
 mouths-0.091
Token The
Token ID383
Feature activation-1.36e-01

Pos logit contributions
 cheeks+0.042
 says+0.039
 mouths+0.039
 floats+0.036
ens+0.036
Neg logit contributions
and-0.068
amed-0.064
year-0.063
Along-0.062
Just-0.062
Token bird
Token ID6512
Feature activation-1.84e-01

Pos logit contributions
 says+0.195
 cheeks+0.185
 Ben+0.182
 Tom+0.181
 mouths+0.176
Neg logit contributions
amed-0.358
and-0.350
year-0.334
Along-0.327
uttering-0.324
Token thought
Token ID1807
Feature activation-4.46e-02

Pos logit contributions
 says+0.295
 Ben+0.291
 cheeks+0.288
 Tom+0.280
 mouths+0.273
Neg logit contributions
amed-0.455
and-0.449
year-0.424
Along-0.407
aming-0.404
Token it
Token ID340
Feature activation-2.19e-01

Pos logit contributions
 cheeks+0.066
 says+0.062
 mouths+0.062
 Lee+0.062
 floats+0.058
Neg logit contributions
and-0.114
year-0.107
amed-0.106
Along-0.099
pr-0.097
Token would
Token ID561
Feature activation+4.31e-02

Pos logit contributions
 says+0.359
 Ben+0.349
 cheeks+0.345
 Tom+0.336
 mouths+0.327
Neg logit contributions
amed-0.540
and-0.533
year-0.503
Along-0.483
uttering-0.483
Token be
Token ID307
Feature activation-1.94e-01

Pos logit contributions
amed+0.118
and+0.118
year+0.110
Along+0.109
Just+0.108
Neg logit contributions
 says-0.054
 cheeks-0.052
 Ben-0.052
 mouths-0.050
 Lee-0.049
Token fun
Token ID1257
Feature activation+2.16e-01

Pos logit contributions
 says+0.446
 Ben+0.444
 Tom+0.432
 Sam+0.423
 cheeks+0.420
Neg logit contributions
and-0.362
amed-0.352
year-0.332
Along-0.314
uttering-0.311
Token to
Token ID284
Feature activation+2.67e-01

Pos logit contributions
amed+0.460
year+0.424
and+0.421
aming+0.398
Along+0.396
Neg logit contributions
 says-0.444
 Ben-0.442
 Tom-0.427
 cheeks-0.422
 Sam-0.413
TokenMax
Token ID11518
Feature activation+1.43e-01

Pos logit contributions
and+0.423
uttering+0.419
amed+0.416
year+0.412
pr+0.383
Neg logit contributions
 Ben-0.322
 Tom-0.322
 says-0.313
 Sam-0.305
 Max-0.305
Token tried
Token ID3088
Feature activation-1.01e-01

Pos logit contributions
amed+0.358
and+0.352
year+0.332
Along+0.321
aming+0.319
Neg logit contributions
 says-0.224
 Ben-0.220
 cheeks-0.219
 Tom-0.212
 mouths-0.208
Token to
Token ID284
Feature activation+2.62e-01

Pos logit contributions
 Ben+0.213
 Tom+0.207
 Sam+0.206
 says+0.201
 Lee+0.184
Neg logit contributions
amed-0.227
year-0.210
aming-0.207
and-0.197
uttering-0.191
Token jump
Token ID4391
Feature activation-1.50e-01

Pos logit contributions
amed+0.709
and+0.699
year+0.661
Along+0.642
aming+0.638
Neg logit contributions
 says-0.363
 Ben-0.353
 cheeks-0.351
 Tom-0.338
 mouths-0.332
Token and
Token ID290
Feature activation+2.00e-02

Pos logit contributions
 Ben+0.252
 Tom+0.247
 says+0.243
 Sam+0.235
 cheeks+0.225
Neg logit contributions
amed-0.371
and-0.360
year-0.351
aming-0.344
uttering-0.334
Token flap
Token ID37699
Feature activation-1.31e-01

Pos logit contributions
and+0.047
amed+0.047
Just+0.044
Along+0.044
pr+0.042
Neg logit contributions
 cheeks-0.031
 mouths-0.030
 Lee-0.030
 says-0.030
 Ben-0.028
Token his
Token ID465
Feature activation-9.28e-02

Pos logit contributions
 says+0.173
 Ben+0.168
 cheeks+0.167
 Tom+0.161
 mouths+0.156
Neg logit contributions
amed-0.362
and-0.358
year-0.340
Along-0.328
uttering-0.327
Token ears
Token ID11368
Feature activation-2.24e-01

Pos logit contributions
 says+0.138
 cheeks+0.137
 Ben+0.133
 mouths+0.128
 Tom+0.126
Neg logit contributions
amed-0.245
and-0.243
year-0.237
aming-0.223
Along-0.222
Token,
Token ID11
Feature activation-2.19e-01

Pos logit contributions
 Ben+0.368
 says+0.362
 Tom+0.356
 cheeks+0.352
 Sam+0.334
Neg logit contributions
amed-0.556
and-0.534
year-0.526
aming-0.503
Along-0.497
Token but
Token ID475
Feature activation-1.72e-02

Pos logit contributions
 says+0.374
 cheeks+0.367
 Ben+0.363
 Tom+0.352
 mouths+0.349
Neg logit contributions
amed-0.512
and-0.511
year-0.471
Along-0.462
uttering-0.454
Token he
Token ID339
Feature activation-1.84e-01

Pos logit contributions
 cheeks+0.025
 says+0.024
 mouths+0.022
 Lee+0.022
 floats+0.022
Neg logit contributions
amed-0.044
and-0.042
year-0.041
 late-0.040
Along-0.040
Token with
Token ID351
Feature activation+1.97e-01

Pos logit contributions
amed+0.666
and+0.643
year+0.616
Along+0.608
Eventually+0.603
Neg logit contributions
 Ben-0.278
 cheeks-0.265
 says-0.259
 Tom-0.258
 Sam-0.248
Token his
Token ID465
Feature activation-9.17e-02

Pos logit contributions
amed+0.430
and+0.427
year+0.398
Along+0.384
uttering+0.383
Neg logit contributions
 says-0.377
 Ben-0.377
 cheeks-0.369
 Tom-0.363
 mouths-0.355
Token family
Token ID1641
Feature activation-4.91e-02

Pos logit contributions
 Ben+0.160
 says+0.160
 cheeks+0.158
 mouths+0.152
 Tom+0.150
Neg logit contributions
amed-0.220
and-0.217
year-0.210
aming-0.195
Along-0.195
Token.
Token ID13
Feature activation-3.29e-02

Pos logit contributions
 says+0.074
 Ben+0.073
 cheeks+0.071
 Tom+0.070
 mouths+0.068
Neg logit contributions
amed-0.128
and-0.125
year-0.119
Along-0.114
uttering-0.114
Token One
Token ID1881
Feature activation-8.92e-02

Pos logit contributions
 cheeks+0.058
 says+0.057
 ah+0.052
 mouths+0.050
 elbow+0.050
Neg logit contributions
and-0.072
Along-0.071
 late-0.069
amed-0.067
On-0.067
Token day
Token ID1110
Feature activation+8.05e-02

Pos logit contributions
 Ben+0.149
 says+0.148
 Tom+0.142
 Sam+0.139
ens+0.137
Neg logit contributions
amed-0.224
and-0.214
year-0.200
uttering-0.195
Along-0.193
Token,
Token ID11
Feature activation-2.18e-01

Pos logit contributions
amed+0.214
and+0.209
year+0.201
Along+0.199
uttering+0.198
Neg logit contributions
 says-0.106
 cheeks-0.100
 mouths-0.093
 Ben-0.091
 Lee-0.087
Token he
Token ID339
Feature activation-1.81e-01

Pos logit contributions
 says+0.354
 cheeks+0.328
 acts+0.307
 mouths+0.295
ens+0.291
Neg logit contributions
amed-0.510
 late-0.502
year-0.491
Along-0.488
and-0.488
Token decided
Token ID3066
Feature activation+1.46e-01

Pos logit contributions
 says+0.309
 Ben+0.307
 Tom+0.289
 cheeks+0.285
 Sam+0.276
Neg logit contributions
amed-0.438
and-0.436
year-0.416
uttering-0.399
pr-0.389
Token to
Token ID284
Feature activation+2.64e-01

Pos logit contributions
amed+0.219
and+0.214
year+0.194
Along+0.181
uttering+0.180
Neg logit contributions
 says-0.380
 Ben-0.374
 cheeks-0.373
 Tom-0.366
 mouths-0.361
Token go
Token ID467
Feature activation-2.16e-01

Pos logit contributions
Along+0.567
and+0.559
amed+0.550
sy+0.528
On+0.507
Neg logit contributions
 Lee-0.432
 says-0.422
 cheeks-0.418
 mouths-0.417
 Tom-0.390
Token.
Token ID13
Feature activation-3.15e-02

Pos logit contributions
amed+0.410
and+0.407
year+0.384
Along+0.369
uttering+0.367
Neg logit contributions
 says-0.268
 Ben-0.261
 cheeks-0.260
 Tom-0.251
 mouths-0.246
Token As
Token ID1081
Feature activation+1.44e-01

Pos logit contributions
 says+0.050
 Ben+0.049
 cheeks+0.049
 Tom+0.047
 mouths+0.046
Neg logit contributions
amed-0.079
and-0.078
year-0.073
Along-0.071
uttering-0.070
Token he
Token ID339
Feature activation-1.76e-01

Pos logit contributions
and+0.404
amed+0.404
year+0.385
Along+0.378
aming+0.368
Neg logit contributions
 says-0.180
 cheeks-0.176
 Ben-0.163
 mouths-0.163
 Tom-0.154
Token walked
Token ID6807
Feature activation-6.05e-02

Pos logit contributions
 says+0.266
 Ben+0.257
 cheeks+0.256
 Tom+0.247
 mouths+0.238
Neg logit contributions
amed-0.456
and-0.452
year-0.425
uttering-0.413
aming-0.407
Token,
Token ID11
Feature activation-2.14e-01

Pos logit contributions
 says+0.091
 Ben+0.090
 Tom+0.087
 cheeks+0.085
 Sam+0.084
Neg logit contributions
and-0.155
amed-0.154
year-0.146
aming-0.141
uttering-0.139
Token he
Token ID339
Feature activation-1.77e-01

Pos logit contributions
 says+0.253
 cheeks+0.240
 Ben+0.232
 mouths+0.225
 Tom+0.220
Neg logit contributions
amed-0.621
and-0.616
year-0.588
Along-0.574
uttering-0.563
Token saw
Token ID2497
Feature activation-1.13e-02

Pos logit contributions
 says+0.303
 Ben+0.299
 Tom+0.283
 cheeks+0.282
 Sam+0.273
Neg logit contributions
and-0.428
amed-0.420
year-0.399
uttering-0.387
aming-0.376
Token lots
Token ID6041
Feature activation+3.78e-01

Pos logit contributions
 says+0.024
 Ben+0.024
 cheeks+0.023
 Tom+0.023
 Sam+0.022
Neg logit contributions
amed-0.023
and-0.022
year-0.021
Along-0.020
uttering-0.020
Token of
Token ID286
Feature activation+2.16e-01

Pos logit contributions
and+0.577
amed+0.569
year+0.523
Along+0.479
aming+0.477
Neg logit contributions
 says-0.956
 cheeks-0.952
 Ben-0.946
 mouths-0.924
 Tom-0.921
Token animals
Token ID4695
Feature activation-8.14e-02

Pos logit contributions
amed+0.514
and+0.506
year+0.477
Along+0.458
uttering+0.456
Neg logit contributions
 says-0.369
 Ben-0.361
 cheeks-0.357
 Tom-0.348
 mouths-0.339
Token playing
Token ID2712
Feature activation+4.96e-02

Pos logit contributions
 says+0.143
 Ben+0.140
 cheeks+0.137
 Tom+0.135
 mouths+0.130
Neg logit contributions
amed-0.192
and-0.187
year-0.177
Along-0.169
aming-0.169
Token.
Token ID13
Feature activation-2.94e-02

Pos logit contributions
amed+0.712
and+0.698
year+0.662
Along+0.634
uttering+0.631
Neg logit contributions
 says-0.488
 Ben-0.474
 cheeks-0.468
 Tom-0.461
 mouths-0.436
Token There
Token ID1318
Feature activation-1.06e-01

Pos logit contributions
 says+0.046
 cheeks+0.045
 Ben+0.043
 mouths+0.042
 Tom+0.042
Neg logit contributions
amed-0.074
and-0.074
year-0.070
Along-0.068
uttering-0.066
Token was
Token ID373
Feature activation-8.79e-03

Pos logit contributions
 cheeks+0.185
 says+0.173
 Ben+0.173
 Lee+0.172
ens+0.169
Neg logit contributions
amed-0.237
and-0.233
year-0.215
Along-0.209
uttering-0.207
Token a
Token ID257
Feature activation+1.73e-01

Pos logit contributions
 Ben+0.018
 Tom+0.018
 says+0.017
 Sam+0.017
 cheeks+0.017
Neg logit contributions
and-0.019
amed-0.018
year-0.017
uttering-0.016
pr-0.016
Token beautiful
Token ID4950
Feature activation+3.24e-01

Pos logit contributions
amed+0.355
and+0.352
year+0.326
uttering+0.302
aming+0.301
Neg logit contributions
 says-0.362
 Ben-0.357
 cheeks-0.354
 Tom-0.343
 mouths-0.340
Token lake
Token ID13546
Feature activation-1.26e-01

Pos logit contributions
amed+0.719
and+0.692
year+0.659
uttering+0.655
Along+0.653
Neg logit contributions
 says-0.573
 Ben-0.569
 cheeks-0.558
 Tom-0.551
 mouths-0.539
Token full
Token ID1336
Feature activation+3.13e-01

Pos logit contributions
 says+0.222
 Ben+0.217
 cheeks+0.216
 Tom+0.208
 mouths+0.206
Neg logit contributions
amed-0.291
and-0.287
year-0.270
uttering-0.261
aming-0.260
Token of
Token ID286
Feature activation+2.11e-01

Pos logit contributions
amed+0.514
and+0.506
year+0.463
Along+0.434
uttering+0.432
Neg logit contributions
 says-0.762
 Ben-0.748
 cheeks-0.747
 Tom-0.730
 mouths-0.720
Token cool
Token ID3608
Feature activation+7.82e-02

Pos logit contributions
amed+0.487
and+0.480
year+0.451
Along+0.432
uttering+0.430
Neg logit contributions
 says-0.379
 Ben-0.371
 cheeks-0.367
 Tom-0.358
 mouths-0.350
Token,
Token ID11
Feature activation-2.17e-01

Pos logit contributions
amed+0.196
and+0.194
aming+0.182
Along+0.181
year+0.180
Neg logit contributions
 Ben-0.114
 Lee-0.111
 Tom-0.110
 says-0.110
 cheeks-0.109
Token clear
Token ID1598
Feature activation+1.68e-01

Pos logit contributions
 says+0.449
 Ben+0.443
 cheeks+0.433
 Tom+0.432
 Sam+0.414
Neg logit contributions
amed-0.444
and-0.436
year-0.408
Along-0.385
uttering-0.384
Token old
Token ID1468
Feature activation+7.99e-03

Pos logit contributions
amed+0.612
and+0.610
year+0.573
uttering+0.547
aming+0.541
Neg logit contributions
 says-0.541
 cheeks-0.531
 Ben-0.531
 Tom-0.512
 mouths-0.510
Token.
Token ID13
Feature activation-3.28e-02

Pos logit contributions
amed+0.016
and+0.016
year+0.015
Just+0.014
aming+0.014
Neg logit contributions
 says-0.015
 Ben-0.015
 cheeks-0.014
 Tom-0.014
 mouths-0.014
Token One
Token ID1881
Feature activation-9.14e-02

Pos logit contributions
 cheeks+0.057
 elbow+0.056
 acts+0.053
 ah+0.052
 says+0.051
Neg logit contributions
and-0.067
 late-0.064
year-0.062
amed-0.062
Along-0.060
Token day
Token ID1110
Feature activation+7.94e-02

Pos logit contributions
 says+0.172
 Ben+0.167
 Tom+0.163
 cheeks+0.161
 Sam+0.157
Neg logit contributions
amed-0.209
and-0.206
year-0.190
Along-0.186
uttering-0.183
Token,
Token ID11
Feature activation-2.19e-01

Pos logit contributions
amed+0.185
and+0.182
year+0.177
Along+0.174
 late+0.171
Neg logit contributions
 cheeks-0.118
 says-0.118
 mouths-0.108
 acts-0.107
ens-0.102
Token they
Token ID484
Feature activation-2.28e-01

Pos logit contributions
 cheeks+0.300
 says+0.289
 acts+0.265
 mouths+0.253
ens+0.240
Neg logit contributions
amed-0.571
year-0.551
and-0.547
 late-0.546
Along-0.531
Token heard
Token ID2982
Feature activation+1.58e-01

Pos logit contributions
 cheeks+0.432
 says+0.422
 Ben+0.405
 mouths+0.405
 Tom+0.396
Neg logit contributions
amed-0.512
and-0.481
year-0.463
aming-0.448
Along-0.447
Token a
Token ID257
Feature activation+1.70e-01

Pos logit contributions
amed+0.289
and+0.283
year+0.262
Along+0.247
aming+0.246
Neg logit contributions
 says-0.357
 Ben-0.351
 cheeks-0.351
 Tom-0.341
 mouths-0.337
Token strange
Token ID6283
Feature activation+1.34e-01

Pos logit contributions
amed+0.344
and+0.340
year+0.318
Along+0.305
uttering+0.302
Neg logit contributions
 says-0.340
 Ben-0.339
 cheeks-0.335
 Tom-0.329
 mouths-0.321
Token sound
Token ID2128
Feature activation+1.85e-02

Pos logit contributions
amed+0.314
and+0.312
year+0.298
uttering+0.289
pr+0.284
Neg logit contributions
 Ben-0.220
 cheeks-0.210
 says-0.210
 Tom-0.209
 mouths-0.208
Token outside
Token ID2354
Feature activation+2.89e-01

Pos logit contributions
amed+0.039
and+0.038
year+0.035
Along+0.034
uttering+0.034
Neg logit contributions
 says-0.037
 Ben-0.037
 cheeks-0.036
 Tom-0.036
 mouths-0.034
Token.
Token ID13
Feature activation-5.28e-02

Pos logit contributions
amed+0.215
and+0.212
year+0.200
Along+0.193
uttering+0.192
Neg logit contributions
 says-0.135
 Ben-0.132
 cheeks-0.130
 Tom-0.127
 mouths-0.123
Token She
Token ID1375
Feature activation-9.04e-02

Pos logit contributions
 mouths+0.112
 cheeks+0.111
 floats+0.111
 trucks+0.110
 avail+0.109
Neg logit contributions
and-0.092
amed-0.083
 late-0.080
year-0.075
Just-0.075
Token says
Token ID1139
Feature activation-4.86e-01

Pos logit contributions
 cheeks+0.085
 Lee+0.074
 Ben+0.074
 mouths+0.073
 Tom+0.069
Neg logit contributions
amed-0.279
and-0.260
year-0.256
Just-0.256
Along-0.255
Token to
Token ID284
Feature activation+2.58e-01

Pos logit contributions
 cheeks+0.773
 says+0.753
 mouths+0.722
 floats+0.700
 trucks+0.678
Neg logit contributions
and-1.118
amed-1.100
year-1.037
uttering-1.022
 late-0.999
Token Ben
Token ID3932
Feature activation-4.69e-01

Pos logit contributions
and+0.540
 late+0.530
 very+0.528
 really+0.517
amed+0.507
Neg logit contributions
 mouths-0.350
 floats-0.349
 cheeks-0.336
 says-0.315
 acts-0.287
Token,
Token ID11
Feature activation-2.33e-01

Pos logit contributions
 cheeks+0.857
 floats+0.791
 mouths+0.787
 says+0.720
 ah+0.719
Neg logit contributions
and-1.039
amed-0.991
 supposed-0.914
year-0.904
Just-0.899
Token "
Token ID366
Feature activation-5.19e-02

Pos logit contributions
 says+0.391
 cheeks+0.387
 mouths+0.370
 Ben+0.365
 floats+0.354
Neg logit contributions
amed-0.538
and-0.538
year-0.504
Just-0.487
Along-0.484
TokenThat
Token ID2504
Feature activation+2.28e-01

Pos logit contributions
 says+0.113
 Ben+0.111
 cheeks+0.110
 Tom+0.108
 mouths+0.106
Neg logit contributions
amed-0.101
and-0.099
year-0.092
Along-0.087
aming-0.087
Token was
Token ID373
Feature activation-2.11e-02

Pos logit contributions
amed+0.610
and+0.592
Just+0.555
Along+0.552
aming+0.550
Neg logit contributions
 cheeks-0.304
 Ben-0.281
 mouths-0.274
 says-0.273
 Tom-0.270
Token silly
Token ID14397
Feature activation+1.20e-01

Pos logit contributions
 Ben+0.044
 says+0.044
 Tom+0.044
 Sam+0.042
 cheeks+0.041
Neg logit contributions
and-0.045
amed-0.044
year-0.041
uttering-0.039
Along-0.039
Token.
Token ID13
Feature activation-5.31e-02

Pos logit contributions
and+0.227
amed+0.194
 late+0.183
aming+0.181
aking+0.179
Neg logit contributions
 mouths-0.250
 cheeks-0.249
 says-0.241
 floats-0.223
 Lee-0.216
Token their
Token ID511
Feature activation-1.59e-01

Pos logit contributions
 says+0.291
 cheeks+0.290
 Ben+0.284
 mouths+0.274
 Tom+0.269
Neg logit contributions
amed-0.240
and-0.227
year-0.209
uttering-0.199
aming-0.195
Token seat
Token ID5852
Feature activation-1.30e-01

Pos logit contributions
 says+0.195
 Ben+0.191
 cheeks+0.182
 Tom+0.181
 Sam+0.169
Neg logit contributions
amed-0.451
and-0.446
year-0.421
Along-0.411
uttering-0.409
Token belts
Token ID28617
Feature activation-2.12e-02

Pos logit contributions
 says+0.236
 Ben+0.231
 cheeks+0.230
 Tom+0.225
 mouths+0.219
Neg logit contributions
amed-0.297
and-0.293
year-0.276
Along-0.265
uttering-0.263
Token.
Token ID13
Feature activation-4.92e-02

Pos logit contributions
 says+0.034
 Ben+0.033
 cheeks+0.033
 Tom+0.032
 mouths+0.031
Neg logit contributions
amed-0.053
and-0.052
year-0.049
Along-0.047
uttering-0.047
Token They
Token ID1119
Feature activation-1.91e-01

Pos logit contributions
 says+0.060
 cheeks+0.058
 Ben+0.054
 mouths+0.052
 Tom+0.051
Neg logit contributions
and-0.139
amed-0.139
year-0.132
Along-0.129
uttering-0.128
Token tell
Token ID1560
Feature activation-2.07e-01

Pos logit contributions
 cheeks+0.268
 mouths+0.251
 Ben+0.244
 Lee+0.240
 says+0.239
Neg logit contributions
amed-0.471
Along-0.457
and-0.450
year-0.438
On-0.425
Token mom
Token ID1995
Feature activation-2.64e-01

Pos logit contributions
 cheeks+0.300
ens+0.291
 says+0.290
 Lee+0.277
 mouths+0.276
Neg logit contributions
and-0.472
amed-0.471
year-0.461
 late-0.433
 much-0.422
Token about
Token ID546
Feature activation+2.24e-01

Pos logit contributions
 cheeks+0.536
 says+0.522
 Ben+0.507
 mouths+0.499
 Tom+0.480
Neg logit contributions
and-0.536
amed-0.508
year-0.484
Along-0.454
uttering-0.445
Token their
Token ID511
Feature activation-1.59e-01

Pos logit contributions
and+0.499
year+0.459
 supposed+0.446
uttering+0.442
amed+0.442
Neg logit contributions
 cheeks-0.358
 says-0.332
ens-0.315
 mouths-0.303
 floats-0.292
Token adventures
Token ID17545
Feature activation+1.09e-01

Pos logit contributions
 says+0.255
 Ben+0.251
 cheeks+0.247
 Tom+0.240
 mouths+0.233
Neg logit contributions
amed-0.390
and-0.388
year-0.364
Along-0.352
uttering-0.351
Token.
Token ID13
Feature activation-4.88e-02

Pos logit contributions
and+0.251
amed+0.245
year+0.231
aming+0.220
uttering+0.218
Neg logit contributions
 says-0.191
 cheeks-0.190
 Ben-0.187
 mouths-0.181
 Tom-0.176
Token\n
Token ID198
Feature activation+1.72e-01

Pos logit contributions
 says+0.074
 cheeks+0.073
 Ben+0.070
 mouths+0.069
 Tom+0.067
Neg logit contributions
and-0.107
amed-0.106
year-0.099
Along-0.096
uttering-0.095
Token\n
Token ID198
Feature activation+1.72e-01

Pos logit contributions
amed+0.380
year+0.343
and+0.342
uttering+0.332
aming+0.331
Neg logit contributions
 says-0.350
 Ben-0.347
 Tom-0.342
 Sam-0.322
 cheeks-0.300
Token"
Token ID1
Feature activation+1.34e-01

Pos logit contributions
amed+0.330
and+0.314
year+0.308
uttering+0.295
 supposed+0.291
Neg logit contributions
 Ben-0.378
 says-0.373
 Tom-0.367
 Sam-0.334
 cheeks-0.330
TokenLook
Token ID8567
Feature activation+1.69e-01

Pos logit contributions
and+0.280
amed+0.275
year+0.264
uttering+0.253
 supposed+0.247
Neg logit contributions
 Ben-0.275
 Tom-0.267
 says-0.262
 cheeks-0.248
 Sam-0.244
Token,
Token ID11
Feature activation-2.26e-01

Pos logit contributions
amed+0.400
and+0.396
pr+0.384
a+0.366
uttering+0.360
Neg logit contributions
 acts-0.234
 cheeks-0.233
 mouths-0.228
 floats-0.225
 says-0.220
Token mom
Token ID1995
Feature activation-2.59e-01

Pos logit contributions
 mouths+0.316
 cheeks+0.312
 acts+0.303
 floats+0.298
 tongues+0.296
Neg logit contributions
and-0.505
 very-0.480
 much-0.475
a-0.473
amed-0.469
Token,
Token ID11
Feature activation-2.26e-01

Pos logit contributions
 cheeks+0.410
 mouths+0.391
 Ben+0.352
 tongues+0.334
 floats+0.331
Neg logit contributions
amed-0.637
and-0.607
pr-0.580
aking-0.571
aming-0.569
Token I
Token ID314
Feature activation-2.50e-01

Pos logit contributions
 cheeks+0.334
 mouths+0.319
 says+0.295
ens+0.292
 tongues+0.287
Neg logit contributions
and-0.538
amed-0.512
pr-0.495
Along-0.488
year-0.486
Token can
Token ID460
Feature activation-1.74e-01

Pos logit contributions
 says+0.404
 Ben+0.394
 Tom+0.380
 cheeks+0.377
 Sam+0.358
Neg logit contributions
amed-0.631
and-0.627
year-0.592
uttering-0.568
Along-0.562
Token swing
Token ID9628
Feature activation-1.81e-01

Pos logit contributions
 mouths+0.256
 cheeks+0.250
 Lee+0.249
ens+0.243
 says+0.242
Neg logit contributions
amed-0.401
and-0.395
Just-0.387
Along-0.384
pr-0.372
Token!"
Token ID2474
Feature activation-4.43e-02

Pos logit contributions
 cheeks+0.345
 says+0.338
 Ben+0.332
 mouths+0.328
 Tom+0.312
Neg logit contributions
amed-0.388
and-0.384
year-0.350
Along-0.342
pr-0.338
Token She
Token ID1375
Feature activation-7.55e-02

Pos logit contributions
 says+0.032
 cheeks+0.032
 mouths+0.030
 Ben+0.029
 floats+0.028
Neg logit contributions
and-0.052
amed-0.051
year-0.049
Along-0.048
Just-0.046
Token had
Token ID550
Feature activation+4.79e-02

Pos logit contributions
 says+0.118
 Ben+0.115
 cheeks+0.114
 Tom+0.111
 mouths+0.107
Neg logit contributions
amed-0.191
and-0.191
year-0.179
uttering-0.173
Along-0.172
Token a
Token ID257
Feature activation+1.77e-01

Pos logit contributions
and+0.100
amed+0.096
uttering+0.093
year+0.092
pr+0.086
Neg logit contributions
 Tom-0.100
 Ben-0.100
 says-0.096
 cheeks-0.094
 Sam-0.093
Token kind
Token ID1611
Feature activation+2.69e-01

Pos logit contributions
amed+0.392
and+0.392
year+0.365
aming+0.345
uttering+0.340
Neg logit contributions
 says-0.340
 cheeks-0.333
 Ben-0.332
 Tom-0.320
 mouths-0.318
Token face
Token ID1986
Feature activation-2.31e-01

Pos logit contributions
amed+0.694
and+0.684
year+0.648
uttering+0.632
Along+0.619
Neg logit contributions
 says-0.382
 Ben-0.373
 mouths-0.362
 cheeks-0.359
 Tom-0.352
Token,
Token ID11
Feature activation-2.10e-01

Pos logit contributions
 says+0.417
 Ben+0.410
 cheeks+0.408
 Tom+0.398
 mouths+0.385
Neg logit contributions
amed-0.525
and-0.519
year-0.486
Along-0.466
uttering-0.463
Token and
Token ID290
Feature activation+2.80e-02

Pos logit contributions
 says+0.401
 Ben+0.396
 cheeks+0.388
 Tom+0.387
 mouths+0.369
Neg logit contributions
amed-0.467
and-0.460
year-0.429
uttering-0.414
Along-0.409
Token John
Token ID1757
Feature activation-1.93e-01

Pos logit contributions
amed+0.071
and+0.069
year+0.068
Along+0.064
aming+0.063
Neg logit contributions
 says-0.041
 cheeks-0.041
 Ben-0.040
 mouths-0.039
ens-0.037
Token felt
Token ID2936
Feature activation-5.99e-02

Pos logit contributions
 says+0.322
 Ben+0.319
 cheeks+0.316
 Tom+0.306
 mouths+0.303
Neg logit contributions
amed-0.463
and-0.449
year-0.428
aming-0.410
Along-0.409
Token comfort
Token ID4467
Feature activation+7.18e-02

Pos logit contributions
 Ben+0.139
 says+0.139
 Tom+0.132
 cheeks+0.132
 mouths+0.128
Neg logit contributions
and-0.109
amed-0.106
uttering-0.096
Along-0.096
year-0.092
Tokened
Token ID276
Feature activation+2.27e-01

Pos logit contributions
amed+0.119
and+0.116
year+0.106
Along+0.100
uttering+0.099
Neg logit contributions
 says-0.176
 Ben-0.173
 cheeks-0.172
 Tom-0.169
 mouths-0.166