TOP ACTIVATIONS
MAX = 0.557

mum put on their swim suits and spl ashed in the
they put on their swim suits and ran to the pool
they put on their swim suits and ran to the pool
Anna and Ben have swim suits and towels .
Sam put on their swim suits and ran to the lake
to change into our swim suits and put on some sunscreen
. They wear purple swim suits . Lily has a purple
They put on their swim suits and jump in the water
her collection of unique swim suits . She was always determined
They put on their swim suits and sun hats and run
They both wore their swim suits and carried a rubber duck
pretended they were wearing swim suits and sunglasses . They read
outside and got their swim suits on . Lily 's mom
other kids wearing colourful swim suits . She was so excited
They packed their swim suits , towels , hats and
." They take their swim suits and towels and leave the
they do not have swim suits . They have an idea
Tim and Sue were very thirsty . They went to Tim
is hot and we are thirsty . And the water is
to drink . She was thirsty because playing outside made her

MIN ACTIVATIONS
MIN = -0.555

apologised . Sam accepted the apology and the friends hugged and
so happy to get an apology from Paul . They shared
was happy and accepted the apology , and they all smiled
. Bunny accepted Teddy 's apology and they continued to play
Jack accepted his apology and they started playing together
Okay , we accept your apology . We can be friends
. You owe Tommy an apology and a new car .
sing a song as an apology . All the other animals
Ed without a word of apology or explanation . Ed felt
to the squirrel as an apology . The squirrel forg ave
The man accepts their apology . He says , "
sorry . I accept your apology . Next time , be
very grateful and accepted his apology . The z
to his friends as an apology . His mom agreed and
and her mom accepted his apology and thanked him . They
than selfish ness and rud eness . <|endoftext|> Tom and Sam
taking in all the whit eness and beauty of nature .
appreciate your honesty and your apology . You can keep the
He could smell the peac eness of the cave . He
very annoyed with his rud eness and complained that he was

INTERVAL 0.279 - 0.557
CONTAINS 2.025%

girl was playing on the swing , she lost her balance
didn 't know what was making the noise , but he
pushing the shadows away and making things look beautiful . He
there was a little boy named Tim my . Tim my
Once upon a time , there was a hipp o named

INTERVAL 0.001 - 0.279
CONTAINS 50.220%

. <|endoftext|> Tim and Sam are brothers . They like to
their eyes and looked up . They saw the plane fly
window and saw her neighbour in the garden . He looked
H ank felt a bit hurt because he didn
's go !" Lily said . They held hands

INTERVAL -0.277 - 0.001
CONTAINS 45.799%

. " I wish for a big ice cream
John was so happy . He signed every day
He feels a shock . He cries out loud . He
- the pig was very grateful . The farmer
One day , Lily 's mom went to the store and

INTERVAL -0.555 - -0.277
CONTAINS 1.955%

sure no one distur bs me while I 'm sleeping .
thanked the frog for saving him . From that day on
sun , planting and harvesting his wheat . Every night he
on , the bird ie knew that even though things may
left the park . He knew that sister 's memory might
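
The four interval bounds listed above are consistent with equal-width bins between MIN = -0.555 and MAX = 0.557 (each interval is 0.278 wide), and the four CONTAINS figures sum to 99.999%. A minimal sketch of that binning follows, assuming the intervals were formed this way. The function name `interval_shares` and its input array are hypothetical, and the tool that produced this listing may bin differently.

```python
import numpy as np

def interval_shares(acts, n_bins=4):
    """Split [min, max] of `acts` into equal-width intervals.

    Returns a list of (low, high, percent_of_values) tuples, one per interval.
    Hypothetical helper for illustration; not taken from the original tool.
    """
    acts = np.asarray(acts, dtype=float)
    # n_bins equal-width intervals need n_bins + 1 edges.
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    # np.histogram treats the last bin as closed, so the maximum is counted.
    counts, _ = np.histogram(acts, bins=edges)
    percents = 100.0 * counts / acts.size
    return [
        (float(edges[i]), float(edges[i + 1]), float(percents[i]))
        for i in range(n_bins)
    ]

# Edges implied by the listing: four equal-width intervals from MIN to MAX.
# Rounded to three decimals these are -0.555, -0.277, 0.001, 0.279, 0.557.
edges = np.round(np.linspace(-0.555, 0.557, 5), 3)
```

Passing the full set of per-token activations to `interval_shares` would give one percentage per interval, comparable to the CONTAINS lines above.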
Token mum
Token ID25682
Feature activation-1.64e-01

Pos logit contributions
eness+0.179
 apology+0.172
heart+0.156
fulness+0.153
 adult+0.150
Neg logit contributions
suits-0.270
-0.244
 thirsty-0.243
mons-0.241
 sandbox-0.229
Token put
Token ID1234
Feature activation+3.42e-02

Pos logit contributions
eness+0.325
 apology+0.306
 adult+0.304
heart+0.300
est+0.295
Neg logit contributions
suits-0.315
-0.298
mons-0.295
 thirsty-0.279
 sandbox-0.262
Token on
Token ID319
Feature activation+2.36e-01

Pos logit contributions
suits+0.065
+0.058
 thirsty+0.056
 sandbox+0.055
mons+0.054
Neg logit contributions
 apology-0.065
eness-0.064
est-0.061
 himself-0.060
 adult-0.059
Token their
Token ID511
Feature activation+1.58e-01

Pos logit contributions
suits+0.565
mons+0.508
+0.506
 thirsty+0.504
aped+0.485
Neg logit contributions
 apology-0.365
eness-0.341
 adult-0.333
 lesson-0.328
 compassion-0.324
Token swim
Token ID9422
Feature activation+1.83e-01

Pos logit contributions
suits+0.269
+0.248
mons+0.246
 thirsty+0.227
aped+0.220
Neg logit contributions
 apology-0.362
eness-0.340
 adult-0.330
est-0.318
 decision-0.316
Tokensuits
Token ID16554
Feature activation+5.57e-01

Pos logit contributions
+0.089
mons+0.079
 thirsty+0.072
suits+0.062
chard+0.042
Neg logit contributions
eness-0.609
 apology-0.609
est-0.587
 adult-0.577
 decision-0.569
Token and
Token ID290
Feature activation-5.16e-02

Pos logit contributions
mons+1.362
+1.342
suits+1.295
 thirsty+1.291
aped+1.272
Neg logit contributions
eness-0.743
est-0.683
 apology-0.682
 adult-0.675
 himself-0.658
Token spl
Token ID4328
Feature activation+3.36e-01

Pos logit contributions
 adult+0.089
 apology+0.088
 himself+0.088
 lesson+0.088
 his+0.086
Neg logit contributions
-0.101
owers-0.090
mons-0.090
suits-0.087
uttering-0.082
Tokenashed
Token ID5263
Feature activation+3.00e-01

Pos logit contributions
+0.401
 thirsty+0.382
suits+0.378
chard+0.353
 sandbox+0.349
Neg logit contributions
eness-0.890
 apology-0.862
heart-0.841
est-0.821
fulness-0.814
Token in
Token ID287
Feature activation+1.73e-01

Pos logit contributions
suits+0.553
+0.540
mons+0.511
 thirsty+0.501
 sandbox+0.464
Neg logit contributions
 apology-0.622
eness-0.616
heart-0.582
est-0.578
 adult-0.563
Token the
Token ID262
Feature activation-1.64e-02

Pos logit contributions
mons+0.460
suits+0.455
+0.425
aped+0.423
owers+0.416
Neg logit contributions
 apology-0.231
 adult-0.213
eness-0.210
 himself-0.209
heart-0.198
Token they
Token ID484
Feature activation+3.46e-01

Pos logit contributions
suits+0.139
+0.121
mons+0.117
coat+0.115
 funnel+0.114
Neg logit contributions
 apology-0.103
heart-0.102
 adult-0.102
 himself-0.102
eness-0.098
Token put
Token ID1234
Feature activation+3.75e-02

Pos logit contributions
suits+0.709
+0.677
mons+0.674
coat+0.607
 thirsty+0.597
Neg logit contributions
 apology-0.647
eness-0.641
 adult-0.614
heart-0.613
est-0.575
Token on
Token ID319
Feature activation+2.41e-01

Pos logit contributions
suits+0.063
+0.059
mons+0.054
 thirsty+0.052
coat+0.050
Neg logit contributions
 apology-0.084
eness-0.082
est-0.078
 adult-0.077
 himself-0.077
Token their
Token ID511
Feature activation+1.67e-01

Pos logit contributions
suits+0.456
+0.427
mons+0.409
 thirsty+0.389
coat+0.385
Neg logit contributions
 apology-0.490
eness-0.469
 adult-0.458
 lesson-0.448
 himself-0.447
Token swim
Token ID9422
Feature activation+1.83e-01

Pos logit contributions
suits+0.295
+0.268
mons+0.265
 thirsty+0.250
aped+0.241
Neg logit contributions
 apology-0.374
eness-0.349
 adult-0.338
 decision-0.329
 lesson-0.328
Tokensuits
Token ID16554
Feature activation+5.52e-01

Pos logit contributions
+0.107
mons+0.093
suits+0.087
 thirsty+0.087
aped+0.058
Neg logit contributions
eness-0.588
 apology-0.587
est-0.566
 adult-0.558
heart-0.552
Token and
Token ID290
Feature activation-3.79e-02

Pos logit contributions
mons+1.375
+1.368
suits+1.349
 thirsty+1.291
aped+1.288
Neg logit contributions
eness-0.686
 adult-0.641
est-0.634
 apology-0.628
heart-0.598
Token ran
Token ID4966
Feature activation+1.60e-01

Pos logit contributions
 apology+0.068
 himself+0.066
 adult+0.065
 lesson+0.064
 his+0.064
Neg logit contributions
-0.076
mons-0.067
owers-0.066
suits-0.066
uttering-0.062
Token to
Token ID284
Feature activation-4.29e-02

Pos logit contributions
+0.384
mons+0.377
suits+0.368
owers+0.358
 thirsty+0.347
Neg logit contributions
 apology-0.232
 himself-0.225
 adult-0.218
eness-0.212
 lesson-0.211
Token the
Token ID262
Feature activation-8.32e-03

Pos logit contributions
 apology+0.072
 adult+0.066
eness+0.063
 himself+0.062
heart+0.061
Neg logit contributions
suits-0.104
mons-0.097
-0.094
aped-0.089
 thirsty-0.089
Token pool
Token ID5933
Feature activation+4.32e-01

Pos logit contributions
 apology+0.021
 adult+0.020
eness+0.019
 decision+0.019
 lesson+0.019
Neg logit contributions
suits-0.012
mons-0.011
-0.011
aped-0.010
 thirsty-0.010
Token they
Token ID484
Feature activation+3.48e-01

Pos logit contributions
suits+0.122
+0.119
coat+0.112
mons+0.110
ashing+0.102
Neg logit contributions
 adult-0.127
heart-0.125
 his-0.124
 himself-0.123
 lesson-0.120
Token put
Token ID1234
Feature activation+4.07e-02

Pos logit contributions
suits+0.689
mons+0.658
+0.657
coat+0.592
aped+0.581
Neg logit contributions
 apology-0.694
eness-0.666
 adult-0.638
heart-0.617
 decision-0.608
Token on
Token ID319
Feature activation+2.41e-01

Pos logit contributions
+0.070
suits+0.070
mons+0.061
coat+0.059
 thirsty+0.058
Neg logit contributions
 apology-0.085
 himself-0.082
eness-0.081
est-0.078
 his-0.078
Token their
Token ID511
Feature activation+1.74e-01

Pos logit contributions
suits+0.483
+0.462
mons+0.437
coat+0.417
 thirsty+0.411
Neg logit contributions
 apology-0.447
 himself-0.429
 adult-0.425
eness-0.419
 lesson-0.419
Token swim
Token ID9422
Feature activation+1.82e-01

Pos logit contributions
+0.274
suits+0.270
mons+0.263
aped+0.240
 thirsty+0.235
Neg logit contributions
 apology-0.410
eness-0.367
 adult-0.366
 lesson-0.365
 decision-0.360
Tokensuits
Token ID16554
Feature activation+5.51e-01

Pos logit contributions
+0.091
mons+0.072
 thirsty+0.067
suits+0.058
aped+0.039
Neg logit contributions
 apology-0.608
eness-0.602
est-0.583
 adult-0.576
 decision-0.569
Token and
Token ID290
Feature activation-3.46e-02

Pos logit contributions
+1.427
mons+1.393
suits+1.346
aped+1.320
 thirsty+1.319
Neg logit contributions
eness-0.654
 adult-0.612
 apology-0.610
est-0.609
 himself-0.570
Token ran
Token ID4966
Feature activation+1.61e-01

Pos logit contributions
 apology+0.067
 himself+0.064
 his+0.063
 lesson+0.061
 adult+0.061
Neg logit contributions
-0.069
mons-0.059
owers-0.059
uttering-0.057
suits-0.055
Token to
Token ID284
Feature activation-4.28e-02

Pos logit contributions
+0.398
mons+0.382
suits+0.374
owers+0.364
 thirsty+0.357
Neg logit contributions
 apology-0.230
 himself-0.214
 adult-0.211
eness-0.210
 lesson-0.207
Token the
Token ID262
Feature activation-1.27e-03

Pos logit contributions
 apology+0.065
 adult+0.060
 himself+0.058
eness+0.053
heart+0.051
Neg logit contributions
suits-0.110
mons-0.105
-0.101
aped-0.097
owers-0.096
Token pool
Token ID5933
Feature activation+4.33e-01

Pos logit contributions
 apology+0.004
 adult+0.003
 moral+0.003
eness+0.003
 decision+0.003
Neg logit contributions
mons-0.001
suits-0.001
-0.001
aped-0.001
owers-0.001
Token Anna
Token ID11735
Feature activation+1.49e-01

Pos logit contributions
suits+0.099
mons+0.088
+0.084
 thirsty+0.084
coat+0.082
Neg logit contributions
 apology-0.077
eness-0.073
 adult-0.069
heart-0.067
fulness-0.066
Token and
Token ID290
Feature activation-2.86e-02

Pos logit contributions
suits+0.348
+0.334
mons+0.330
 thirsty+0.313
coat+0.300
Neg logit contributions
 apology-0.233
eness-0.229
 adult-0.206
heart-0.202
est-0.201
Token Ben
Token ID3932
Feature activation+1.56e-01

Pos logit contributions
 apology+0.069
eness+0.068
 adult+0.064
heart+0.064
fulness+0.062
Neg logit contributions
suits-0.047
-0.041
mons-0.040
 thirsty-0.038
coat-0.036
Token have
Token ID423
Feature activation+1.27e-01

Pos logit contributions
suits+0.335
mons+0.323
+0.320
 thirsty+0.287
aped+0.284
Neg logit contributions
 apology-0.284
eness-0.273
 adult-0.245
 decision-0.239
est-0.238
Token swim
Token ID9422
Feature activation+1.85e-01

Pos logit contributions
+0.290
suits+0.275
mons+0.260
owers+0.247
coat+0.234
Neg logit contributions
 apology-0.213
eness-0.197
 adult-0.195
 lesson-0.195
 himself-0.192
Tokensuits
Token ID16554
Feature activation+5.51e-01

Pos logit contributions
+0.370
mons+0.333
suits+0.313
 sandbox+0.305
 thirsty+0.305
Neg logit contributions
 apology-0.351
eness-0.344
 lesson-0.335
est-0.331
 himself-0.327
Token and
Token ID290
Feature activation-3.13e-02

Pos logit contributions
+1.435
mons+1.397
suits+1.337
aped+1.306
owers+1.301
Neg logit contributions
eness-0.652
 himself-0.641
 apology-0.617
est-0.601
 adult-0.586
Token towels
Token ID38494
Feature activation+2.28e-01

Pos logit contributions
 apology+0.063
 himself+0.059
 lesson+0.056
 his+0.055
 kindness+0.054
Neg logit contributions
-0.060
mons-0.056
suits-0.056
owers-0.052
uttering-0.052
Token.
Token ID13
Feature activation+4.98e-02

Pos logit contributions
suits+0.512
+0.500
mons+0.495
 sandbox+0.458
 thirsty+0.458
Neg logit contributions
eness-0.365
 apology-0.353
fulness-0.335
 himself-0.330
est-0.324
Token
Token ID198
Feature activation+4.99e-02

Pos logit contributions
suits+0.101
mons+0.101
coat+0.097
aped+0.096
 sandbox+0.090
Neg logit contributions
 apology-0.087
 His-0.081
 adult-0.075
 moral-0.075
 himself-0.074
Token
Token ID198
Feature activation+5.04e-02

Pos logit contributions
suits+0.109
mons+0.105
+0.103
 thirsty+0.100
coat+0.096
Neg logit contributions
 apology-0.088
 adult-0.080
eness-0.078
 moral-0.077
 lesson-0.077
Token Sam
Token ID3409
Feature activation+9.19e-02

Pos logit contributions
eness+0.098
 apology+0.093
heart+0.090
fulness+0.088
est+0.087
Neg logit contributions
suits-0.056
 thirsty-0.048
mons-0.048
-0.046
 sandbox-0.045
Token put
Token ID1234
Feature activation+3.66e-02

Pos logit contributions
suits+0.176
mons+0.160
+0.159
 thirsty+0.153
 sandbox+0.142
Neg logit contributions
eness-0.190
 apology-0.189
 adult-0.173
heart-0.173
est-0.169
Token on
Token ID319
Feature activation+2.36e-01

Pos logit contributions
suits+0.059
+0.056
mons+0.053
 thirsty+0.051
coat+0.048
Neg logit contributions
 apology-0.084
eness-0.082
est-0.078
 himself-0.078
 adult-0.076
Token their
Token ID511
Feature activation+1.68e-01

Pos logit contributions
suits+0.434
+0.411
mons+0.397
 thirsty+0.382
coat+0.368
Neg logit contributions
 apology-0.497
eness-0.477
 adult-0.453
 himself-0.446
 lesson-0.443
Token swim
Token ID9422
Feature activation+1.80e-01

Pos logit contributions
suits+0.245
+0.231
mons+0.226
 thirsty+0.206
aped+0.199
Neg logit contributions
 apology-0.421
eness-0.401
 adult-0.387
 decision-0.375
est-0.373
Tokensuits
Token ID16554
Feature activation+5.51e-01

Pos logit contributions
+0.082
mons+0.067
 thirsty+0.062
suits+0.062
 sandbox+0.034
Neg logit contributions
eness-0.609
 apology-0.608
est-0.581
 adult-0.577
 decision-0.567
Token and
Token ID290
Feature activation-3.90e-02

Pos logit contributions
mons+1.310
+1.309
suits+1.276
 thirsty+1.237
aped+1.219
Neg logit contributions
eness-0.779
 apology-0.734
 adult-0.703
est-0.700
 himself-0.697
Token ran
Token ID4966
Feature activation+1.59e-01

Pos logit contributions
 apology+0.067
 lesson+0.066
 himself+0.066
 his+0.066
 adult+0.063
Neg logit contributions
-0.080
owers-0.072
mons-0.071
angers-0.066
suits-0.066
Token to
Token ID284
Feature activation-4.40e-02

Pos logit contributions
+0.378
mons+0.371
suits+0.354
owers+0.351
 thirsty+0.340
Neg logit contributions
 apology-0.234
 himself-0.233
 adult-0.220
eness-0.219
 lesson-0.214
Token the
Token ID262
Feature activation-9.52e-03

Pos logit contributions
 apology+0.068
 adult+0.061
 himself+0.060
eness+0.056
 decision+0.055
Neg logit contributions
suits-0.111
mons-0.108
-0.104
owers-0.100
aped-0.100
Token lake
Token ID13546
Feature activation+1.76e-01

Pos logit contributions
 apology+0.022
 adult+0.021
eness+0.021
 decision+0.020
 moral+0.020
Neg logit contributions
suits-0.015
mons-0.015
-0.015
aped-0.013
coat-0.013
Token to
Token ID284
Feature activation-4.41e-02

Pos logit contributions
eness+0.165
 apology+0.165
heart+0.151
 adult+0.148
est+0.145
Neg logit contributions
suits-0.183
-0.164
mons-0.163
 thirsty-0.160
 sandbox-0.150
Token change
Token ID1487
Feature activation-5.05e-02

Pos logit contributions
 apology+0.081
eness+0.080
 adult+0.074
heart+0.073
 decision+0.071
Neg logit contributions
suits-0.098
mons-0.088
-0.087
 thirsty-0.084
aped-0.081
Token into
Token ID656
Feature activation+1.14e-01

Pos logit contributions
 apology+0.100
eness+0.099
 adult+0.091
heart+0.091
fulness+0.087
Neg logit contributions
suits-0.104
mons-0.091
-0.091
 thirsty-0.090
 sandbox-0.084
Token our
Token ID674
Feature activation+4.46e-02

Pos logit contributions
suits+0.256
+0.233
mons+0.231
 thirsty+0.220
coat+0.216
Neg logit contributions
 apology-0.207
eness-0.204
 adult-0.190
heart-0.185
 decision-0.178
Token swim
Token ID9422
Feature activation+1.82e-01

Pos logit contributions
suits+0.092
+0.082
mons+0.082
 thirsty+0.080
 sandbox+0.075
Neg logit contributions
 apology-0.088
eness-0.087
heart-0.080
 adult-0.080
fulness-0.077
Tokensuits
Token ID16554
Feature activation+5.50e-01

Pos logit contributions
suits+0.243
+0.242
mons+0.233
 thirsty+0.220
aped+0.205
Neg logit contributions
 apology-0.460
eness-0.458
heart-0.427
est-0.427
 adult-0.426
Token and
Token ID290
Feature activation-4.43e-02

Pos logit contributions
suits+1.135
+1.104
mons+1.085
 thirsty+1.030
aped+1.023
Neg logit contributions
eness-1.017
 apology-0.993
heart-0.942
 adult-0.923
est-0.920
Token put
Token ID1234
Feature activation+3.52e-02

Pos logit contributions
 apology+0.094
eness+0.085
 adult+0.084
 himself+0.083
 decision+0.081
Neg logit contributions
suits-0.084
-0.080
mons-0.078
aped-0.070
owers-0.070
Token on
Token ID319
Feature activation+2.35e-01

Pos logit contributions
suits+0.073
+0.068
mons+0.066
 thirsty+0.064
coat+0.062
Neg logit contributions
 apology-0.068
eness-0.066
 adult-0.060
heart-0.060
fulness-0.059
Token some
Token ID617
Feature activation+3.99e-02

Pos logit contributions
suits+0.556
mons+0.508
+0.506
 thirsty+0.481
coat+0.477
Neg logit contributions
 apology-0.392
eness-0.366
 adult-0.350
 himself-0.347
heart-0.332
Token sunscreen
Token ID44221
Feature activation+2.73e-01

Pos logit contributions
suits+0.073
+0.065
mons+0.065
 thirsty+0.063
 sandbox+0.058
Neg logit contributions
 apology-0.087
eness-0.086
heart-0.080
 adult-0.080
est-0.077
Token.
Token ID13
Feature activation+3.08e-02

Pos logit contributions
+0.993
mons+0.974
suits+0.973
aped+0.887
 thirsty+0.879
Neg logit contributions
 apology-0.683
eness-0.669
est-0.654
fulness-0.623
heart-0.619
Token They
Token ID1119
Feature activation+2.39e-01

Pos logit contributions
mons+0.081
suits+0.080
aped+0.079
coat+0.075
+0.073
Neg logit contributions
 apology-0.037
 His-0.035
 moral-0.034
 adult-0.030
 Whenever-0.029
Token wear
Token ID5806
Feature activation+9.97e-02

Pos logit contributions
mons+0.497
suits+0.488
+0.487
aped+0.432
owers+0.415
Neg logit contributions
 apology-0.443
eness-0.404
 adult-0.398
 himself-0.385
 moral-0.382
Token purple
Token ID14032
Feature activation+8.85e-02

Pos logit contributions
+0.215
mons+0.181
suits+0.176
appy+0.167
idd+0.166
Neg logit contributions
 himself-0.179
 his-0.177
 lesson-0.176
 apology-0.174
 words-0.172
Token swim
Token ID9422
Feature activation+1.82e-01

Pos logit contributions
suits+0.139
+0.135
 thirsty+0.126
mons+0.126
aped+0.115
Neg logit contributions
 apology-0.203
eness-0.200
est-0.191
heart-0.186
 adult-0.185
Tokensuits
Token ID16554
Feature activation+5.49e-01

Pos logit contributions
+0.154
mons+0.105
 thirsty+0.099
 sandbox+0.089
 grazing+0.083
Neg logit contributions
est-0.545
 apology-0.533
 lesson-0.533
eness-0.532
 himself-0.519
Token.
Token ID13
Feature activation+3.02e-02

Pos logit contributions
+1.492
mons+1.463
aped+1.404
 thirsty+1.363
suits+1.350
Neg logit contributions
est-0.582
eness-0.560
 himself-0.513
ly-0.501
fulness-0.488
Token Lily
Token ID20037
Feature activation+1.40e-01

Pos logit contributions
suits+0.070
mons+0.070
aped+0.069
coat+0.066
owers+0.063
Neg logit contributions
 apology-0.042
 His-0.041
 moral-0.039
 adult-0.038
 Whenever-0.035
Token has
Token ID468
Feature activation+1.14e-01

Pos logit contributions
suits+0.303
+0.293
mons+0.289
 thirsty+0.269
aped+0.260
Neg logit contributions
 apology-0.248
eness-0.240
 adult-0.225
heart-0.222
est-0.219
Token a
Token ID257
Feature activation+4.91e-02

Pos logit contributions
+0.324
suits+0.292
mons+0.286
owers+0.276
plets+0.261
Neg logit contributions
 himself-0.130
 lesson-0.118
 his-0.115
 words-0.112
 adult-0.101
Token purple
Token ID14032
Feature activation+8.71e-02

Pos logit contributions
mons+0.112
aped+0.105
suits+0.101
owers+0.100
+0.100
Neg logit contributions
 apology-0.074
 lesson-0.074
 moral-0.069
 decision-0.068
 choice-0.066
Token They
Token ID1119
Feature activation+2.29e-01

Pos logit contributions
suits+0.058
mons+0.054
+0.050
aped+0.050
 thirsty+0.048
Neg logit contributions
 apology-0.067
eness-0.065
 himself-0.061
est-0.061
 adult-0.061
Token put
Token ID1234
Feature activation+3.58e-02

Pos logit contributions
suits+0.462
mons+0.459
+0.451
aped+0.395
 thirsty+0.387
Neg logit contributions
 apology-0.427
eness-0.407
 adult-0.392
 himself-0.380
heart-0.379
Token on
Token ID319
Feature activation+2.35e-01

Pos logit contributions
+0.063
suits+0.060
mons+0.053
coat+0.052
 thirsty+0.050
Neg logit contributions
 himself-0.073
 apology-0.073
 his-0.070
est-0.068
eness-0.068
Token their
Token ID511
Feature activation+1.68e-01

Pos logit contributions
+0.446
suits+0.442
mons+0.418
 thirsty+0.384
aped+0.379
Neg logit contributions
 apology-0.463
 himself-0.445
eness-0.429
 lesson-0.425
 adult-0.424
Token swim
Token ID9422
Feature activation+1.80e-01

Pos logit contributions
suits+0.261
mons+0.250
+0.250
aped+0.222
 thirsty+0.221
Neg logit contributions
 apology-0.397
eness-0.374
 adult-0.359
 decision-0.355
est-0.354
Tokensuits
Token ID16554
Feature activation+5.46e-01

Pos logit contributions
+0.079
mons+0.059
 thirsty+0.055
suits+0.032
aped+0.026
Neg logit contributions
 apology-0.608
eness-0.602
est-0.591
 adult-0.577
 decision-0.575
Token and
Token ID290
Feature activation-5.26e-02

Pos logit contributions
+1.406
mons+1.389
 thirsty+1.326
aped+1.317
suits+1.312
Neg logit contributions
eness-0.645
est-0.627
 apology-0.602
 adult-0.598
 himself-0.580
Token jump
Token ID4391
Feature activation+2.22e-01

Pos logit contributions
 apology+0.097
 himself+0.094
 his+0.091
 lesson+0.088
 adult+0.087
Neg logit contributions
-0.104
mons-0.097
suits-0.092
owers-0.091
angers-0.086
Token in
Token ID287
Feature activation+1.74e-01

Pos logit contributions
+0.469
mons+0.460
 thirsty+0.447
 sandbox+0.434
suits+0.431
Neg logit contributions
 himself-0.366
 apology-0.357
est-0.352
eness-0.348
 lesson-0.347
Token the
Token ID262
Feature activation-1.13e-02

Pos logit contributions
mons+0.430
suits+0.425
+0.405
aped+0.396
 thirsty+0.382
Neg logit contributions
 apology-0.260
 himself-0.246
eness-0.244
 adult-0.241
heart-0.234
Token water
Token ID1660
Feature activation+6.60e-02

Pos logit contributions
 apology+0.029
eness+0.028
est+0.028
 adult+0.027
 decision+0.026
Neg logit contributions
mons-0.015
suits-0.015
-0.014
aped-0.012
 thirsty-0.012
Token her
Token ID607
Feature activation-1.09e-01

Pos logit contributions
 apology+0.158
eness+0.156
heart+0.144
 adult+0.144
est+0.138
Neg logit contributions
suits-0.173
mons-0.156
-0.155
 thirsty-0.150
coat-0.142
Token collection
Token ID4947
Feature activation+7.51e-02

Pos logit contributions
 apology+0.220
eness+0.216
 adult+0.200
heart+0.199
 decision+0.195
Neg logit contributions
suits-0.218
mons-0.197
-0.196
 thirsty-0.188
aped-0.179
Token of
Token ID286
Feature activation-2.57e-01

Pos logit contributions
suits+0.154
+0.139
mons+0.135
 thirsty+0.134
coat+0.128
Neg logit contributions
 apology-0.150
eness-0.145
 adult-0.134
heart-0.133
 decision-0.127
Token unique
Token ID3748
Feature activation-7.48e-02

Pos logit contributions
eness+0.613
 apology+0.592
heart+0.553
fulness+0.548
est+0.539
Neg logit contributions
suits-0.432
 thirsty-0.386
 sandbox-0.358
mons-0.355
 cups-0.352
Token swim
Token ID9422
Feature activation+1.74e-01

Pos logit contributions
 apology+0.169
eness+0.163
 adult+0.150
fulness+0.150
heart+0.146
Neg logit contributions
suits-0.139
 thirsty-0.116
mons-0.114
 sandbox-0.113
 cups-0.111
Tokensuits
Token ID16554
Feature activation+5.45e-01

Pos logit contributions
suits+0.320
+0.311
mons+0.307
 thirsty+0.294
 sandbox+0.275
Neg logit contributions
eness-0.360
 apology-0.345
heart-0.331
est-0.328
fulness-0.319
Token.
Token ID13
Feature activation+3.51e-02

Pos logit contributions
suits+1.271
mons+1.220
+1.198
 thirsty+1.155
aped+1.125
Neg logit contributions
eness-0.875
 apology-0.805
heart-0.798
est-0.779
 adult-0.761
Token She
Token ID1375
Feature activation+4.33e-02

Pos logit contributions
suits+0.073
mons+0.065
 thirsty+0.063
+0.063
 sandbox+0.059
Neg logit contributions
 apology-0.068
eness-0.067
heart-0.063
 adult-0.063
fulness-0.060
Token was
Token ID373
Feature activation-1.34e-01

Pos logit contributions
suits+0.086
 thirsty+0.075
+0.073
mons+0.071
 sandbox+0.068
Neg logit contributions
eness-0.093
 apology-0.093
heart-0.084
 adult-0.083
fulness-0.082
Token always
Token ID1464
Feature activation-1.03e-01

Pos logit contributions
 apology+0.237
eness+0.236
heart+0.222
 adult+0.219
est+0.213
Neg logit contributions
suits-0.301
mons-0.276
-0.271
 thirsty-0.244
 sandbox-0.242
Token determined
Token ID5295
Feature activation-3.98e-02

Pos logit contributions
eness+0.200
 apology+0.197
heart+0.196
 himself+0.187
 adult+0.186
Neg logit contributions
suits-0.206
mons-0.194
-0.192
owers-0.166
coat-0.165
Token They
Token ID1119
Feature activation+2.23e-01

Pos logit contributions
suits+0.041
mons+0.041
aped+0.037
+0.036
 thirsty+0.035
Neg logit contributions
 apology-0.050
eness-0.048
est-0.046
 himself-0.046
 adult-0.046
Token put
Token ID1234
Feature activation+3.13e-02

Pos logit contributions
suits+0.446
mons+0.445
+0.442
 thirsty+0.379
aped+0.378
Neg logit contributions
 apology-0.410
eness-0.389
 adult-0.380
est-0.373
 himself-0.370
Token on
Token ID319
Feature activation+2.33e-01

Pos logit contributions
+0.055
suits+0.055
mons+0.048
coat+0.047
plets+0.047
Neg logit contributions
 himself-0.063
 apology-0.061
 his-0.060
est-0.057
 lesson-0.056
Token their
Token ID511
Feature activation+1.63e-01

Pos logit contributions
+0.467
suits+0.446
mons+0.435
aped+0.399
 thirsty+0.395
Neg logit contributions
 himself-0.426
 apology-0.423
 his-0.401
 lesson-0.398
est-0.396
Token swim
Token ID9422
Feature activation+1.77e-01

Pos logit contributions
suits+0.253
mons+0.245
+0.243
 thirsty+0.217
aped+0.217
Neg logit contributions
 apology-0.381
eness-0.356
 adult-0.345
 decision-0.341
 lesson-0.340
Tokensuits
Token ID16554
Feature activation+5.43e-01

Pos logit contributions
+0.085
mons+0.063
 thirsty+0.059
suits+0.036
aped+0.031
Neg logit contributions
 apology-0.596
eness-0.587
est-0.579
 adult-0.563
 lesson-0.562
Token and
Token ID290
Feature activation-5.73e-02

Pos logit contributions
+1.430
mons+1.402
aped+1.342
 thirsty+1.331
suits+1.304
Neg logit contributions
est-0.599
eness-0.592
 himself-0.565
 adult-0.553
 apology-0.550
Token sun
Token ID4252
Feature activation+4.02e-02

Pos logit contributions
 apology+0.114
 himself+0.109
 lesson+0.106
 his+0.104
 adult+0.101
Neg logit contributions
-0.107
mons-0.098
suits-0.093
owers-0.093
angers-0.089
Token hats
Token ID23910
Feature activation+2.40e-01

Pos logit contributions
+0.081
 thirsty+0.081
suits+0.079
chard+0.078
mons+0.075
Neg logit contributions
est-0.065
 himself-0.065
eness-0.063
 his-0.057
fulness-0.054
Token and
Token ID290
Feature activation-5.74e-02

Pos logit contributions
suits+0.565
+0.555
 sandbox+0.550
mons+0.549
 thirsty+0.546
Neg logit contributions
eness-0.344
fulness-0.319
est-0.312
 apology-0.305
 himself-0.302
Token run
Token ID1057
Feature activation+1.68e-01

Pos logit contributions
 apology+0.114
 himself+0.106
 lesson+0.104
 adult+0.101
est+0.099
Neg logit contributions
-0.110
mons-0.101
suits-0.098
owers-0.096
aped-0.093
Token They
Token ID1119
Feature activation+2.20e-01

Pos logit contributions
suits+0.039
mons+0.038
aped+0.034
+0.034
ites+0.033
Neg logit contributions
 His-0.030
 himself-0.029
 adult-0.028
 his-0.028
 apology-0.027
Token both
Token ID1111
Feature activation+1.06e-01

Pos logit contributions
+0.462
mons+0.450
suits+0.423
owers+0.394
angers+0.380
Neg logit contributions
 adult-0.358
heart-0.345
 himself-0.341
est-0.340
 apology-0.336
Token wore
Token ID12408
Feature activation+1.69e-01

Pos logit contributions
+0.224
suits+0.217
mons+0.215
owers+0.189
 thirsty+0.188
Neg logit contributions
 apology-0.182
eness-0.180
heart-0.176
est-0.176
 adult-0.169
Token their
Token ID511
Feature activation+1.54e-01

Pos logit contributions
+0.379
suits+0.349
mons+0.345
aped+0.311
owers+0.307
Neg logit contributions
 his-0.285
 himself-0.285
eness-0.257
est-0.256
 lesson-0.255
Token swim
Token ID9422
Feature activation+1.74e-01

Pos logit contributions
suits+0.273
+0.252
mons+0.252
 thirsty+0.232
aped+0.225
Neg logit contributions
 apology-0.331
eness-0.317
 adult-0.301
est-0.299
 decision-0.294
Tokensuits
Token ID16554
Feature activation+5.42e-01

Pos logit contributions
+0.087
mons+0.060
 thirsty+0.053
 grazing+0.031
etchup+0.028
Neg logit contributions
eness-0.567
 apology-0.566
est-0.564
 lesson-0.547
 decision-0.546
Token and
Token ID290
Feature activation-5.89e-02

Pos logit contributions
mons+1.450
+1.438
aped+1.350
suits+1.338
owers+1.321
Neg logit contributions
est-0.545
eness-0.540
 himself-0.499
 adult-0.494
ly-0.474
Token carried
Token ID5281
Feature activation+1.22e-01

Pos logit contributions
 himself+0.104
 his+0.104
 lesson+0.099
 apology+0.098
est+0.096
Neg logit contributions
-0.120
mons-0.100
owers-0.100
angers-0.095
suits-0.094
Token a
Token ID257
Feature activation+3.18e-02

Pos logit contributions
+0.258
suits+0.247
mons+0.233
 sandbox+0.230
 thirsty+0.229
Neg logit contributions
 apology-0.211
 himself-0.209
eness-0.200
 lesson-0.196
 his-0.196
Token rubber
Token ID14239
Feature activation+2.00e-01

Pos logit contributions
mons+0.067
+0.065
aped+0.063
owers+0.062
suits+0.060
Neg logit contributions
 apology-0.056
 adult-0.053
 decision-0.052
 lesson-0.052
heart-0.047
Token duck
Token ID22045
Feature activation+1.56e-01

Pos logit contributions
suits+0.341
+0.314
 thirsty+0.306
mons+0.303
coat+0.274
Neg logit contributions
 apology-0.441
eness-0.435
 adult-0.416
est-0.402
heart-0.401
Token pretended
Token ID36766
Feature activation+2.65e-01

Pos logit contributions
 apology+0.076
 himself+0.074
 his+0.070
eness+0.068
 adult+0.067
Neg logit contributions
-0.093
mons-0.084
suits-0.084
owers-0.080
uttering-0.074
Token they
Token ID484
Feature activation+3.38e-01

Pos logit contributions
suits+0.537
+0.489
mons+0.469
 thirsty+0.451
coat+0.434
Neg logit contributions
 apology-0.525
eness-0.516
 adult-0.483
heart-0.479
fulness-0.478
Token were
Token ID547
Feature activation+1.03e-01

Pos logit contributions
suits+0.568
+0.496
mons+0.492
 thirsty+0.479
 sandbox+0.439
Neg logit contributions
 apology-0.792
eness-0.789
heart-0.733
 adult-0.731
fulness-0.710
Token wearing
Token ID5762
Feature activation+1.86e-01

Pos logit contributions
suits+0.203
+0.197
mons+0.181
coat+0.166
aped+0.156
Neg logit contributions
 himself-0.205
eness-0.201
 apology-0.194
 adult-0.191
est-0.186
Token swim
Token ID9422
Feature activation+1.74e-01

Pos logit contributions
+0.375
suits+0.375
mons+0.359
aped+0.339
 thirsty+0.331
Neg logit contributions
 apology-0.339
eness-0.336
 himself-0.323
 lesson-0.305
est-0.304
Tokensuits
Token ID16554
Feature activation+5.41e-01

Pos logit contributions
+0.325
mons+0.298
 thirsty+0.285
 sandbox+0.269
aped+0.263
Neg logit contributions
eness-0.353
est-0.340
 apology-0.333
 himself-0.321
 lesson-0.320
Token and
Token ID290
Feature activation-4.41e-02

Pos logit contributions
+1.381
mons+1.353
aped+1.315
 thirsty+1.283
suits+1.274
Neg logit contributions
eness-0.692
est-0.651
 himself-0.594
 apology-0.579
 adult-0.568
Token sunglasses
Token ID36251
Feature activation+6.18e-02

Pos logit contributions
 apology+0.089
 himself+0.086
 lesson+0.083
 words+0.081
 kindness+0.079
Neg logit contributions
-0.084
suits-0.072
mons-0.071
owers-0.068
uttering-0.066
Token.
Token ID13
Feature activation+3.84e-02

Pos logit contributions
suits+0.139
mons+0.134
+0.132
 thirsty+0.132
 sandbox+0.127
Neg logit contributions
eness-0.098
est-0.090
fulness-0.089
 himself-0.086
 apology-0.081
Token They
Token ID1119
Feature activation+2.30e-01

Pos logit contributions
suits+0.073
mons+0.064
 thirsty+0.062
+0.061
aped+0.061
Neg logit contributions
 apology-0.081
 himself-0.075
eness-0.075
 adult-0.073
fulness-0.071
Token read
Token ID1100
Feature activation+9.53e-02

Pos logit contributions
+0.438
suits+0.434
mons+0.431
 thirsty+0.375
owers+0.373
Neg logit contributions
 apology-0.446
eness-0.436
 adult-0.412
 himself-0.402
 moral-0.399
Token outside
Token ID2354
Feature activation+2.58e-01

Pos logit contributions
+0.102
mons+0.101
owers+0.101
suits+0.091
 sandbox+0.091
Neg logit contributions
 himself-0.049
 apology-0.046
 adult-0.046
 lesson-0.045
 his-0.044
Token and
Token ID290
Feature activation-6.01e-02

Pos logit contributions
suits+0.668
mons+0.653
+0.633
owers+0.597
coat+0.593
Neg logit contributions
 apology-0.351
 adult-0.325
 himself-0.313
eness-0.307
est-0.301
Token got
Token ID1392
Feature activation+6.17e-03

Pos logit contributions
 his+0.094
 himself+0.086
 him+0.078
 knew+0.076
 understood+0.075
Neg logit contributions
-0.141
mons-0.132
owers-0.132
suits-0.124
angers-0.120
Token their
Token ID511
Feature activation+1.51e-01

Pos logit contributions
+0.012
suits+0.011
coat+0.010
owers+0.010
mons+0.010
Neg logit contributions
 himself-0.012
 apology-0.012
 his-0.011
 lesson-0.011
 adult-0.011
Token swim
Token ID9422
Feature activation+1.73e-01

Pos logit contributions
+0.247
suits+0.238
mons+0.234
owers+0.213
coat+0.211
Neg logit contributions
 apology-0.333
 lesson-0.309
 adult-0.308
 decision-0.301
eness-0.299
Tokensuits
Token ID16554
Feature activation+5.41e-01

Pos logit contributions
+0.100
mons+0.080
 thirsty+0.062
owers+0.055
chard+0.051
Neg logit contributions
 apology-0.563
eness-0.557
est-0.551
 decision-0.541
 lesson-0.539
Token on
Token ID319
Feature activation+2.28e-01

Pos logit contributions
mons+1.327
+1.321
owers+1.269
suits+1.233
aped+1.226
Neg logit contributions
eness-0.681
 apology-0.667
 himself-0.644
 adult-0.638
est-0.630
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
suits+0.584
mons+0.556
+0.531
aped+0.521
 thirsty+0.516
Neg logit contributions
 apology-0.330
 adult-0.317
 himself-0.308
eness-0.304
heart-0.285
Token Lily
Token ID20037
Feature activation+1.21e-01

Pos logit contributions
suits+0.039
illas+0.038
etchup+0.036
packs+0.036
ites+0.036
Neg logit contributions
 apology-0.029
 His-0.028
 adult-0.028
 himself-0.027
est-0.027
Token's
Token ID338
Feature activation-1.19e-01

Pos logit contributions
suits+0.247
+0.242
mons+0.238
 thirsty+0.219
owers+0.212
Neg logit contributions
 apology-0.213
eness-0.211
 adult-0.204
heart-0.200
est-0.196
Token mom
Token ID1995
Feature activation-1.29e-01

Pos logit contributions
 apology+0.166
eness+0.163
 adult+0.146
heart+0.143
est+0.138
Neg logit contributions
suits-0.311
mons-0.289
-0.289
 thirsty-0.280
aped-0.267
Token other
Token ID584
Feature activation-1.64e-01

Pos logit contributions
suits+0.291
 thirsty+0.262
mons+0.261
+0.252
 sandbox+0.245
Neg logit contributions
eness-0.366
 apology-0.364
heart-0.332
 adult-0.328
 decision-0.326
Token kids
Token ID3988
Feature activation-2.83e-02

Pos logit contributions
 apology+0.270
est+0.268
eness+0.264
heart+0.261
 adult+0.256
Neg logit contributions
-0.383
suits-0.370
mons-0.348
coat-0.329
aped-0.326
Token wearing
Token ID5762
Feature activation+1.86e-01

Pos logit contributions
 himself+0.062
heart+0.059
eness+0.059
 apology+0.059
 decision+0.058
Neg logit contributions
suits-0.047
mons-0.045
-0.045
aped-0.037
 funnel-0.036
Token colourful
Token ID44799
Feature activation+1.31e-01

Pos logit contributions
suits+0.381
+0.354
mons+0.351
aped+0.326
 thirsty+0.320
Neg logit contributions
 apology-0.349
eness-0.346
 himself-0.328
est-0.317
heart-0.317
Token swim
Token ID9422
Feature activation+1.71e-01

Pos logit contributions
suits+0.222
+0.217
mons+0.203
 thirsty+0.196
aped+0.188
Neg logit contributions
eness-0.284
 apology-0.278
est-0.273
heart-0.266
 adult-0.261
Tokensuits
Token ID16554
Feature activation+5.41e-01

Pos logit contributions
+0.299
mons+0.273
 thirsty+0.256
 sandbox+0.254
suits+0.247
Neg logit contributions
eness-0.357
est-0.347
 apology-0.339
heart-0.332
 himself-0.330
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
+1.278
mons+1.266
aped+1.221
suits+1.208
 thirsty+1.161
Neg logit contributions
eness-0.754
est-0.722
heart-0.695
 himself-0.673
 adult-0.651
Token She
Token ID1375
Feature activation+3.56e-02

Pos logit contributions
suits+0.053
 thirsty+0.048
mons+0.048
+0.045
coat+0.044
Neg logit contributions
 apology-0.030
 adult-0.029
heart-0.029
 himself-0.029
eness-0.028
Token was
Token ID373
Feature activation-1.44e-01

Pos logit contributions
suits+0.067
+0.058
mons+0.058
 thirsty+0.058
 sandbox+0.053
Neg logit contributions
 apology-0.078
eness-0.078
heart-0.071
 adult-0.070
fulness-0.069
Token so
Token ID523
Feature activation-5.08e-02

Pos logit contributions
 himself+0.235
 adult+0.222
 apology+0.219
eness+0.206
est+0.205
Neg logit contributions
suits-0.344
-0.324
mons-0.324
owers-0.294
aped-0.268
Token excited
Token ID6568
Feature activation+9.25e-02

Pos logit contributions
 apology+0.070
 himself+0.070
heart+0.064
 adult+0.064
 lesson+0.064
Neg logit contributions
suits-0.124
mons-0.121
-0.120
coat-0.110
 funnel-0.107
Token\n
Token ID198
Feature activation+2.76e-02

Pos logit contributions
suits+0.054
mons+0.050
+0.049
 thirsty+0.048
 sandbox+0.045
Neg logit contributions
 apology-0.055
eness-0.053
 adult-0.050
heart-0.049
fulness-0.049
TokenThey
Token ID2990
Feature activation+1.59e-01

Pos logit contributions
suits+0.059
mons+0.056
+0.055
 thirsty+0.052
coat+0.050
Neg logit contributions
 apology-0.049
eness-0.047
 adult-0.046
heart-0.046
 moral-0.045
Token packed
Token ID11856
Feature activation+9.57e-02

Pos logit contributions
mons+0.304
+0.304
suits+0.302
owers+0.259
 thirsty+0.258
Neg logit contributions
 apology-0.293
eness-0.288
 adult-0.288
heart-0.274
 moral-0.269
Token their
Token ID511
Feature activation+1.60e-01

Pos logit contributions
suits+0.177
+0.173
mons+0.161
 thirsty+0.153
coat+0.150
Neg logit contributions
 apology-0.203
eness-0.193
heart-0.180
 adult-0.180
 lesson-0.179
Token swim
Token ID9422
Feature activation+1.74e-01

Pos logit contributions
suits+0.251
+0.224
mons+0.219
 thirsty+0.212
coat+0.195
Neg logit contributions
 apology-0.391
eness-0.385
 adult-0.361
heart-0.357
 decision-0.350
Tokensuits
Token ID16554
Feature activation+5.41e-01

Pos logit contributions
+0.069
 thirsty+0.040
mons+0.039
 sandbox+0.014
suits+0.014
Neg logit contributions
 apology-0.598
eness-0.591
est-0.577
 adult-0.567
 decision-0.565
Token,
Token ID11
Feature activation+4.55e-02

Pos logit contributions
+1.387
mons+1.371
suits+1.294
aped+1.290
 thirsty+1.288
Neg logit contributions
eness-0.611
est-0.592
 apology-0.565
 adult-0.561
 himself-0.559
Token towels
Token ID38494
Feature activation+2.17e-01

Pos logit contributions
+0.094
mons+0.084
suits+0.082
owers+0.081
aped+0.080
Neg logit contributions
 apology-0.083
 himself-0.078
 adult-0.077
 his-0.073
 lesson-0.073
Token,
Token ID11
Feature activation+4.55e-02

Pos logit contributions
suits+0.495
mons+0.488
+0.483
 thirsty+0.469
 sandbox+0.459
Neg logit contributions
eness-0.326
 apology-0.302
est-0.297
 adult-0.294
fulness-0.291
Token hats
Token ID23910
Feature activation+2.38e-01

Pos logit contributions
+0.113
suits+0.110
mons+0.107
aped+0.101
owers+0.099
Neg logit contributions
 apology-0.060
 adult-0.056
eness-0.055
 himself-0.055
est-0.052
Token and
Token ID290
Feature activation-5.72e-02

Pos logit contributions
suits+0.557
mons+0.541
+0.538
 thirsty+0.527
 sandbox+0.517
Neg logit contributions
eness-0.362
 apology-0.326
est-0.324
fulness-0.312
 adult-0.306
Token."
Token ID526
Feature activation+4.47e-02

Pos logit contributions
+0.243
suits+0.241
mons+0.234
aped+0.220
coat+0.217
Neg logit contributions
 apology-0.190
eness-0.184
est-0.169
heart-0.167
 adult-0.166
Token They
Token ID1119
Feature activation+2.33e-01

Pos logit contributions
suits+0.094
mons+0.087
aped+0.086
coat+0.085
 grazing+0.084
Neg logit contributions
 apology-0.083
eness-0.071
 adult-0.071
 lesson-0.071
 himself-0.071
Token take
Token ID1011
Feature activation-8.87e-02

Pos logit contributions
suits+0.549
mons+0.516
+0.507
 thirsty+0.490
aped+0.472
Neg logit contributions
 apology-0.389
eness-0.367
 adult-0.342
heart-0.328
 decision-0.323
Token their
Token ID511
Feature activation+1.68e-01

Pos logit contributions
 apology+0.175
eness+0.172
 adult+0.159
heart+0.156
est+0.155
Neg logit contributions
suits-0.180
-0.166
mons-0.162
 thirsty-0.159
coat-0.151
Token swim
Token ID9422
Feature activation+1.75e-01

Pos logit contributions
suits+0.271
+0.245
mons+0.241
 thirsty+0.231
aped+0.215
Neg logit contributions
 apology-0.405
eness-0.393
 adult-0.373
heart-0.365
 decision-0.361
Tokensuits
Token ID16554
Feature activation+5.39e-01

Pos logit contributions
+0.131
mons+0.114
suits+0.109
 thirsty+0.109
owers+0.086
Neg logit contributions
 apology-0.545
eness-0.537
est-0.512
 adult-0.510
 decision-0.508
Token and
Token ID290
Feature activation-4.25e-02

Pos logit contributions
+1.302
suits+1.295
mons+1.294
 thirsty+1.227
aped+1.223
Neg logit contributions
eness-0.734
 apology-0.730
 adult-0.675
 himself-0.643
est-0.639
Token towels
Token ID38494
Feature activation+2.18e-01

Pos logit contributions
 apology+0.086
 himself+0.077
 adult+0.075
 lesson+0.074
 his+0.073
Neg logit contributions
-0.081
mons-0.076
suits-0.074
owers-0.074
uttering-0.069
Token and
Token ID290
Feature activation-4.30e-02

Pos logit contributions
suits+0.492
mons+0.472
+0.470
 thirsty+0.452
 sandbox+0.442
Neg logit contributions
eness-0.361
 apology-0.359
fulness-0.323
 adult-0.318
 himself-0.313
Token leave
Token ID2666
Feature activation-1.99e-01

Pos logit contributions
 apology+0.087
 lesson+0.077
 himself+0.077
 adult+0.075
eness+0.074
Neg logit contributions
-0.084
mons-0.077
suits-0.077
owers-0.074
angers-0.070
Token the
Token ID262
Feature activation-7.18e-03

Pos logit contributions
 apology+0.314
 himself+0.304
eness+0.292
 adult+0.281
fulness+0.272
Neg logit contributions
suits-0.465
-0.441
mons-0.431
aped-0.417
coat-0.409
Token they
Token ID484
Feature activation+3.34e-01

Pos logit contributions
suits+0.061
+0.054
mons+0.053
coat+0.049
owers+0.049
Neg logit contributions
 apology-0.061
eness-0.057
 adult-0.057
 himself-0.056
 lesson-0.054
Token do
Token ID466
Feature activation-6.04e-02

Pos logit contributions
suits+0.670
+0.598
mons+0.594
 thirsty+0.586
 sandbox+0.544
Neg logit contributions
eness-0.678
 apology-0.677
heart-0.624
 adult-0.616
fulness-0.600
Token not
Token ID407
Feature activation-3.12e-02

Pos logit contributions
 apology+0.103
eness+0.099
 adult+0.092
 decision+0.090
heart+0.089
Neg logit contributions
suits-0.143
mons-0.129
-0.129
 thirsty-0.124
coat-0.120
Token have
Token ID423
Feature activation+1.13e-01

Pos logit contributions
 apology+0.057
eness+0.056
 adult+0.051
heart+0.051
 decision+0.049
Neg logit contributions
suits-0.070
-0.064
mons-0.063
 thirsty-0.060
 sandbox-0.058
Token swim
Token ID9422
Feature activation+1.75e-01

Pos logit contributions
suits+0.243
+0.220
mons+0.217
 thirsty+0.208
coat+0.200
Neg logit contributions
 apology-0.214
eness-0.211
 adult-0.194
heart-0.192
fulness-0.186
Tokensuits
Token ID16554
Feature activation+5.39e-01

Pos logit contributions
+0.340
suits+0.333
mons+0.325
 sandbox+0.306
 thirsty+0.305
Neg logit contributions
eness-0.332
 apology-0.326
est-0.300
 adult-0.298
 decision-0.298
Token.
Token ID13
Feature activation+2.21e-02

Pos logit contributions
+1.170
suits+1.163
mons+1.108
 thirsty+1.068
aped+1.053
Neg logit contributions
eness-0.908
 apology-0.869
 adult-0.823
est-0.807
heart-0.795
Token They
Token ID1119
Feature activation+2.23e-01

Pos logit contributions
suits+0.041
aped+0.039
 grazing+0.039
coat+0.039
mons+0.038
Neg logit contributions
 apology-0.039
 himself-0.036
eness-0.036
 adult-0.036
 His-0.035
Token have
Token ID423
Feature activation+1.13e-01

Pos logit contributions
suits+0.447
mons+0.408
+0.403
 thirsty+0.375
coat+0.365
Neg logit contributions
 apology-0.450
eness-0.436
 adult-0.408
heart-0.397
 decision-0.396
Token an
Token ID281
Feature activation+1.96e-04

Pos logit contributions
suits+0.242
+0.225
mons+0.220
 thirsty+0.206
 sandbox+0.200
Neg logit contributions
 apology-0.208
eness-0.201
 adult-0.191
 lesson-0.181
heart-0.180
Token idea
Token ID2126
Feature activation-2.42e-01

Pos logit contributions
suits+0.000
 thirsty+0.000
mons+0.000
+0.000
 pool+0.000
Neg logit contributions
eness-0.000
fulness-0.000
 apology-0.000
heart-0.000
erence-0.000
Token Tim
Token ID5045
Feature activation+1.45e-01

Pos logit contributions
suits+0.124
coat+0.100
+0.097
 thirsty+0.096
 funnel+0.096
Neg logit contributions
 lesson-0.098
 himself-0.097
 apology-0.097
 adult-0.095
eness-0.091
Token and
Token ID290
Feature activation-4.17e-02

Pos logit contributions
suits+0.301
 thirsty+0.267
mons+0.263
+0.262
coat+0.247
Neg logit contributions
 apology-0.293
eness-0.287
heart-0.258
fulness-0.258
 adult-0.257
Token Sue
Token ID26089
Feature activation+1.67e-01

Pos logit contributions
eness+0.128
 apology+0.122
heart+0.117
fulness+0.116
 adult+0.113
Neg logit contributions
suits-0.042
mons-0.037
 thirsty-0.035
 sandbox-0.032
-0.031
Token were
Token ID547
Feature activation+1.07e-01

Pos logit contributions
suits+0.336
+0.313
mons+0.310
 thirsty+0.279
 funnel+0.272
Neg logit contributions
eness-0.321
 apology-0.313
 adult-0.298
heart-0.291
est-0.287
Token very
Token ID845
Feature activation+1.91e-02

Pos logit contributions
suits+0.196
+0.171
mons+0.170
owers+0.150
suit+0.138
Neg logit contributions
 himself-0.234
eness-0.226
 adult-0.224
 apology-0.224
 decision-0.219
Token thirsty
Token ID47124
Feature activation+5.09e-01

Pos logit contributions
suits+0.050
+0.048
mons+0.046
owers+0.044
coat+0.043
Neg logit contributions
 apology-0.027
 decision-0.023
 lesson-0.023
 adult-0.023
 himself-0.022
Token.
Token ID13
Feature activation+3.90e-02

Pos logit contributions
suits+1.069
mons+0.955
+0.954
 thirsty+0.898
aped+0.881
Neg logit contributions
eness-0.977
 apology-0.971
heart-0.905
 adult-0.894
est-0.879
Token They
Token ID1119
Feature activation+2.34e-01

Pos logit contributions
suits+0.070
 thirsty+0.058
mons+0.055
+0.054
coat+0.054
Neg logit contributions
 apology-0.085
eness-0.081
 adult-0.079
 himself-0.079
 His-0.076
Token went
Token ID1816
Feature activation+5.51e-02

Pos logit contributions
suits+0.488
+0.451
mons+0.450
 thirsty+0.409
coat+0.406
Neg logit contributions
 apology-0.439
eness-0.433
 adult-0.406
heart-0.405
 decision-0.382
Token to
Token ID284
Feature activation-4.27e-02

Pos logit contributions
suits+0.143
+0.142
mons+0.142
owers+0.138
aped+0.134
Neg logit contributions
 apology-0.063
 adult-0.062
 himself-0.059
 lesson-0.054
 advice-0.054
Token Tim
Token ID5045
Feature activation+1.48e-01

Pos logit contributions
 apology+0.079
 adult+0.071
eness+0.070
 himself+0.067
 decision+0.067
Neg logit contributions
suits-0.099
mons-0.090
-0.087
aped-0.083
coat-0.082
Token is
Token ID318
Feature activation-1.61e-01

Pos logit contributions
 apology+0.145
eness+0.145
 adult+0.133
heart+0.132
fulness+0.125
Neg logit contributions
suits-0.151
 thirsty-0.134
-0.134
mons-0.134
 sandbox-0.124
Token hot
Token ID3024
Feature activation+1.48e-01

Pos logit contributions
eness+0.311
 apology+0.309
heart+0.283
 adult+0.280
est+0.272
Neg logit contributions
suits-0.335
 thirsty-0.301
mons-0.299
-0.298
 sandbox-0.276
Token and
Token ID290
Feature activation-3.45e-02

Pos logit contributions
suits+0.342
+0.319
mons+0.307
 thirsty+0.295
aped+0.292
Neg logit contributions
eness-0.254
heart-0.240
 apology-0.239
est-0.236
 adult-0.224
Token we
Token ID356
Feature activation+2.54e-01

Pos logit contributions
 apology+0.072
eness+0.070
heart+0.066
 adult+0.064
 decision+0.063
Neg logit contributions
suits-0.067
-0.061
mons-0.059
 thirsty-0.054
coat-0.054
Token are
Token ID389
Feature activation+3.56e-02

Pos logit contributions
suits+0.566
+0.514
mons+0.510
 thirsty+0.485
 sandbox+0.470
Neg logit contributions
 apology-0.455
eness-0.446
 adult-0.408
heart-0.407
 decision-0.391
Token thirsty
Token ID47124
Feature activation+5.07e-01

Pos logit contributions
suits+0.076
+0.069
mons+0.067
aped+0.061
coat+0.061
Neg logit contributions
 apology-0.070
eness-0.066
 adult-0.064
heart-0.061
 decision-0.061
Token.
Token ID13
Feature activation+4.53e-02

Pos logit contributions
suits+1.084
+1.013
mons+0.995
aped+0.935
 thirsty+0.917
Neg logit contributions
eness-0.943
 apology-0.935
heart-0.874
est-0.861
 adult-0.854
Token And
Token ID843
Feature activation+2.79e-02

Pos logit contributions
suits+0.098
mons+0.089
coat+0.087
+0.085
aped+0.084
Neg logit contributions
 apology-0.084
eness-0.075
 adult-0.074
heart-0.074
 himself-0.071
Token the
Token ID262
Feature activation+5.41e-03

Pos logit contributions
suits+0.053
+0.047
mons+0.046
coat+0.044
 thirsty+0.042
Neg logit contributions
 apology-0.061
eness-0.057
 adult-0.055
heart-0.055
 himself-0.053
Token water
Token ID1660
Feature activation+7.97e-02

Pos logit contributions
suits+0.010
+0.009
mons+0.009
 thirsty+0.008
aped+0.008
Neg logit contributions
 apology-0.012
eness-0.011
 adult-0.011
heart-0.011
 decision-0.010
Token is
Token ID318
Feature activation-1.51e-01

Pos logit contributions
suits+0.194
+0.177
mons+0.176
 thirsty+0.172
 sandbox+0.163
Neg logit contributions
 apology-0.127
eness-0.126
heart-0.113
 adult-0.112
est-0.107
Token to
Token ID284
Feature activation-4.64e-02

Pos logit contributions
suits+0.149
+0.134
mons+0.133
 thirsty+0.129
 sandbox+0.125
Neg logit contributions
eness-0.123
 apology-0.120
heart-0.112
 adult-0.109
est-0.109
Token drink
Token ID4144
Feature activation+2.43e-01

Pos logit contributions
 apology+0.114
eness+0.111
heart+0.106
 adult+0.105
fulness+0.102
Neg logit contributions
suits-0.076
mons-0.064
-0.063
 thirsty-0.060
aped-0.057
Token.
Token ID13
Feature activation+2.91e-02

Pos logit contributions
suits+0.462
+0.411
mons+0.411
 thirsty+0.404
coat+0.369
Neg logit contributions
 apology-0.516
eness-0.512
 adult-0.474
heart-0.469
 decision-0.457
Token She
Token ID1375
Feature activation+4.38e-02

Pos logit contributions
suits+0.062
mons+0.054
+0.053
 thirsty+0.053
 sandbox+0.050
Neg logit contributions
 apology-0.056
eness-0.054
heart-0.051
 adult-0.051
est-0.049
Token was
Token ID373
Feature activation-1.33e-01

Pos logit contributions
suits+0.084
+0.076
mons+0.075
 thirsty+0.072
 sandbox+0.068
Neg logit contributions
 apology-0.090
eness-0.089
heart-0.084
 adult-0.083
est-0.080
Token thirsty
Token ID47124
Feature activation+5.07e-01

Pos logit contributions
 apology+0.223
eness+0.218
 adult+0.203
heart+0.203
 himself+0.196
Neg logit contributions
suits-0.317
mons-0.286
-0.285
 sandbox-0.257
owers-0.254
Token because
Token ID780
Feature activation+7.03e-03

Pos logit contributions
suits+1.091
mons+0.980
+0.977
 thirsty+0.937
aped+0.898
Neg logit contributions
eness-0.948
 apology-0.944
heart-0.873
 adult-0.861
est-0.842
Token playing
Token ID2712
Feature activation+3.80e-01

Pos logit contributions
suits+0.013
+0.011
mons+0.011
 thirsty+0.011
 sandbox+0.010
Neg logit contributions
 apology-0.015
eness-0.015
heart-0.014
 adult-0.014
est-0.014
Token outside
Token ID2354
Feature activation+2.63e-01

Pos logit contributions
suits+0.760
+0.670
mons+0.666
aped+0.628
 thirsty+0.622
Neg logit contributions
 apology-0.759
eness-0.750
est-0.703
 himself-0.697
heart-0.687
Token made
Token ID925
Feature activation+5.80e-02

Pos logit contributions
suits+0.564
mons+0.505
+0.503
 thirsty+0.474
aped+0.461
Neg logit contributions
eness-0.496
 apology-0.485
heart-0.467
est-0.449
 adult-0.448
Token her
Token ID607
Feature activation-1.02e-01

Pos logit contributions
suits+0.121
mons+0.112
 thirsty+0.111
+0.110
 cups+0.102
Neg logit contributions
 apology-0.109
eness-0.109
 adult-0.099
heart-0.096
 decision-0.095
Token apologised
Token ID46397
Feature activation-3.19e-01

Pos logit contributions
 apology+0.131
eness+0.130
heart+0.120
 adult+0.120
fulness+0.116
Neg logit contributions
suits-0.118
-0.104
mons-0.104
 thirsty-0.102
 sandbox-0.095
Token.
Token ID13
Feature activation+1.86e-02

Pos logit contributions
eness+0.752
 apology+0.740
heart+0.711
 adult+0.690
est+0.677
Neg logit contributions
suits-0.521
mons-0.451
 thirsty-0.446
 together-0.426
-0.408
Token Sam
Token ID3409
Feature activation+6.96e-02

Pos logit contributions
suits+0.040
+0.037
mons+0.036
 thirsty+0.035
aped+0.033
Neg logit contributions
eness-0.035
 apology-0.035
heart-0.032
 adult-0.031
fulness-0.031
Token accepted
Token ID6292
Feature activation-1.54e-01

Pos logit contributions
suits+0.120
 thirsty+0.112
+0.105
aped+0.102
 fet+0.099
Neg logit contributions
eness-0.151
 apology-0.149
fulness-0.148
heart-0.142
 adult-0.133
Token the
Token ID262
Feature activation-2.66e-02

Pos logit contributions
eness+0.319
 apology+0.303
heart+0.293
est+0.293
 decision+0.279
Neg logit contributions
-0.281
suits-0.280
 thirsty-0.269
mons-0.268
 watering-0.248
Token apology
Token ID17029
Feature activation-5.55e-01

Pos logit contributions
eness+0.041
heart+0.037
fulness+0.037
father+0.034
ulously+0.032
Neg logit contributions
 thirsty-0.059
suits-0.059
 pool-0.056
-0.056
 swing-0.055
Token and
Token ID290
Feature activation-6.23e-02

Pos logit contributions
 apology+1.222
eness+1.196
heart+1.159
 adult+1.107
fulness+1.065
Neg logit contributions
suits-0.983
 thirsty-0.864
-0.862
mons-0.847
aped-0.794
Token the
Token ID262
Feature activation-2.66e-02

Pos logit contributions
eness+0.139
 apology+0.137
heart+0.126
 adult+0.124
fulness+0.123
Neg logit contributions
suits-0.109
 thirsty-0.100
mons-0.098
-0.096
aped-0.091
Token friends
Token ID2460
Feature activation-8.76e-02

Pos logit contributions
eness+0.055
fulness+0.052
heart+0.052
 apology+0.050
 adult+0.046
Neg logit contributions
 thirsty-0.047
suits-0.047
-0.045
 pool-0.042
 sandbox-0.042
Token hugged
Token ID41130
Feature activation-5.45e-02

Pos logit contributions
 apology+0.177
eness+0.175
heart+0.164
 adult+0.163
est+0.155
Neg logit contributions
suits-0.175
-0.157
 thirsty-0.155
mons-0.153
 sandbox-0.143
Token and
Token ID290
Feature activation-6.25e-02

Pos logit contributions
 apology+0.065
eness+0.063
 adult+0.057
heart+0.057
est+0.053
Neg logit contributions
suits-0.150
-0.141
mons-0.139
 thirsty-0.136
aped-0.132
Token so
Token ID523
Feature activation-5.04e-02

Pos logit contributions
eness+0.310
 apology+0.303
heart+0.290
 adult+0.276
fulness+0.275
Neg logit contributions
 thirsty-0.249
suits-0.239
-0.221
 sandbox-0.206
mons-0.204
Token happy
Token ID3772
Feature activation-3.17e-03

Pos logit contributions
eness+0.121
heart+0.113
 apology+0.113
 adult+0.110
est+0.109
Neg logit contributions
 thirsty-0.085
suits-0.078
-0.073
mons-0.067
 together-0.066
Token to
Token ID284
Feature activation-5.24e-02

Pos logit contributions
 apology+0.006
eness+0.006
 adult+0.006
heart+0.006
est+0.005
Neg logit contributions
suits-0.006
-0.006
mons-0.006
 thirsty-0.006
 sandbox-0.005
Token get
Token ID651
Feature activation-7.37e-03

Pos logit contributions
eness+0.140
heart+0.134
 apology+0.131
est+0.127
 adult+0.126
Neg logit contributions
-0.065
suits-0.062
 thirsty-0.057
mons-0.052
 swing-0.052
Token an
Token ID281
Feature activation-2.92e-03

Pos logit contributions
eness+0.016
 apology+0.015
heart+0.014
est+0.014
 adult+0.014
Neg logit contributions
suits-0.013
 thirsty-0.013
mons-0.012
 together-0.012
-0.012
Token apology
Token ID17029
Feature activation-5.54e-01

Pos logit contributions
eness+0.005
fulness+0.004
erence+0.004
heart+0.004
 himself+0.004
Neg logit contributions
suits-0.006
 thirsty-0.006
mons-0.005
coat-0.005
 swing-0.005
Token from
Token ID422
Feature activation-1.70e-01

Pos logit contributions
 apology+1.168
eness+1.144
heart+1.065
 adult+1.057
est+1.014
Neg logit contributions
suits-1.058
-0.944
mons-0.928
 thirsty-0.918
coat-0.856
Token Paul
Token ID3362
Feature activation-9.88e-02

Pos logit contributions
 apology+0.327
eness+0.327
heart+0.298
 adult+0.295
est+0.286
Neg logit contributions
suits-0.356
-0.321
mons-0.318
 thirsty-0.312
 sandbox-0.292
Token.
Token ID13
Feature activation+2.81e-02

Pos logit contributions
 apology+0.188
eness+0.178
 adult+0.164
 decision+0.158
heart+0.158
Neg logit contributions
suits-0.214
-0.186
 thirsty-0.184
mons-0.182
coat-0.178
Token They
Token ID1119
Feature activation+2.22e-01

Pos logit contributions
suits+0.048
+0.043
mons+0.039
 thirsty+0.039
aped+0.038
Neg logit contributions
 apology-0.065
eness-0.064
 adult-0.058
heart-0.058
 decision-0.058
Token shared
Token ID4888
Feature activation+1.12e-01

Pos logit contributions
suits+0.414
+0.367
 thirsty+0.359
mons+0.358
 sandbox+0.334
Neg logit contributions
 apology-0.482
eness-0.478
heart-0.444
 adult-0.441
est-0.431
Token was
Token ID373
Feature activation-1.43e-01

Pos logit contributions
suits+0.110
 thirsty+0.108
+0.103
aped+0.093
 together+0.092
Neg logit contributions
 apology-0.131
eness-0.130
fulness-0.126
heart-0.118
 adult-0.116
Token happy
Token ID3772
Feature activation-6.62e-03

Pos logit contributions
eness+0.313
 apology+0.310
heart+0.292
fulness+0.283
est+0.282
Neg logit contributions
 thirsty-0.258
suits-0.246
-0.233
 grazing-0.216
 watering-0.210
Token and
Token ID290
Feature activation-6.00e-02

Pos logit contributions
 apology+0.015
eness+0.014
 adult+0.013
est+0.013
heart+0.013
Neg logit contributions
suits-0.011
-0.011
 thirsty-0.011
mons-0.010
 together-0.010
Token accepted
Token ID6292
Feature activation-1.52e-01

Pos logit contributions
eness+0.119
 apology+0.118
 adult+0.107
heart+0.105
est+0.104
Neg logit contributions
 thirsty-0.117
suits-0.117
-0.110
mons-0.105
aped-0.102
Token the
Token ID262
Feature activation-2.28e-02

Pos logit contributions
eness+0.323
 apology+0.315
est+0.294
heart+0.292
 adult+0.290
Neg logit contributions
suits-0.259
-0.259
 thirsty-0.249
mons-0.246
 watering-0.226
Token apology
Token ID17029
Feature activation-5.53e-01

Pos logit contributions
eness+0.038
heart+0.035
fulness+0.033
erence+0.030
ayed+0.029
Neg logit contributions
suits-0.051
 thirsty-0.050
-0.050
 sandbox-0.048
mons-0.048
Token,
Token ID11
Feature activation+3.80e-02

Pos logit contributions
 apology+1.175
eness+1.150
heart+1.078
 adult+1.067
est+1.019
Neg logit contributions
suits-1.040
-0.926
mons-0.919
 thirsty-0.907
coat-0.841
Token and
Token ID290
Feature activation-5.98e-02

Pos logit contributions
suits+0.072
 thirsty+0.068
+0.068
mons+0.064
 sandbox+0.061
Neg logit contributions
eness-0.078
 apology-0.077
heart-0.071
 adult-0.071
est-0.069
Token they
Token ID484
Feature activation+3.23e-01

Pos logit contributions
eness+0.136
 apology+0.134
 adult+0.123
heart+0.119
est+0.119
Neg logit contributions
suits-0.097
 thirsty-0.095
-0.091
mons-0.087
 together-0.084
Token all
Token ID477
Feature activation-3.62e-02

Pos logit contributions
suits+0.626
+0.557
mons+0.550
 thirsty+0.543
 sandbox+0.503
Neg logit contributions
 apology-0.678
eness-0.671
heart-0.620
 adult-0.617
est-0.597
Token smiled
Token ID13541
Feature activation-1.18e-01

Pos logit contributions
 apology+0.079
eness+0.079
 adult+0.073
heart+0.072
fulness+0.070
Neg logit contributions
suits-0.066
-0.058
 thirsty-0.058
mons-0.057
 sandbox-0.053
Token.
Token ID13
Feature activation+2.87e-02

Pos logit contributions
suits+0.062
 thirsty+0.055
+0.055
mons+0.052
 together+0.047
Neg logit contributions
 apology-0.093
eness-0.087
 adult-0.085
heart-0.085
 decision-0.081
Token Bunny
Token ID28683
Feature activation+6.93e-02

Pos logit contributions
suits+0.061
mons+0.055
+0.053
 thirsty+0.052
 sandbox+0.050
Neg logit contributions
 apology-0.054
eness-0.054
heart-0.049
 adult-0.049
fulness-0.047
Token accepted
Token ID6292
Feature activation-1.50e-01

Pos logit contributions
suits+0.149
+0.134
 thirsty+0.133
mons+0.131
 sandbox+0.123
Neg logit contributions
 apology-0.132
eness-0.131
heart-0.119
 adult-0.118
fulness-0.117
Token Teddy
Token ID29345
Feature activation+2.21e-02

Pos logit contributions
eness+0.313
 apology+0.311
heart+0.292
 decision+0.287
est+0.287
Neg logit contributions
suits-0.280
-0.276
mons-0.260
 thirsty-0.259
 watering-0.236
Token's
Token ID338
Feature activation-1.18e-01

Pos logit contributions
suits+0.041
 thirsty+0.039
 together+0.036
+0.036
mons+0.035
Neg logit contributions
eness-0.045
 apology-0.045
heart-0.044
 adult-0.043
fulness-0.041
Token apology
Token ID17029
Feature activation-5.53e-01

Pos logit contributions
heart+0.220
eness+0.201
erence+0.189
father+0.184
 adult+0.183
Neg logit contributions
-0.239
suits-0.235
 thirsty-0.231
 swing-0.220
coat-0.219
Token and
Token ID290
Feature activation-5.62e-02

Pos logit contributions
 apology+1.176
eness+1.140
heart+1.098
 adult+1.076
fulness+1.021
Neg logit contributions
suits-1.049
-0.940
 thirsty-0.929
mons-0.918
aped-0.851
Token they
Token ID484
Feature activation+3.26e-01

Pos logit contributions
eness+0.138
 apology+0.138
heart+0.128
 adult+0.127
fulness+0.124
Neg logit contributions
suits-0.088
mons-0.077
 thirsty-0.076
-0.076
 sandbox-0.069
Token continued
Token ID3767
Feature activation-1.40e-02

Pos logit contributions
suits+0.604
mons+0.539
+0.536
 thirsty+0.496
coat+0.476
Neg logit contributions
eness-0.688
 apology-0.679
heart-0.624
 adult-0.620
 himself-0.617
Token to
Token ID284
Feature activation-5.30e-02

Pos logit contributions
 apology+0.022
 his+0.021
 lesson+0.021
 himself+0.021
 advice+0.019
Neg logit contributions
suits-0.034
mons-0.029
-0.028
owers-0.028
 sandbox-0.027
Token play
Token ID711
Feature activation+1.70e-01

Pos logit contributions
 apology+0.099
eness+0.092
 himself+0.091
 adult+0.088
 decision+0.086
Neg logit contributions
suits-0.117
mons-0.104
 thirsty-0.093
angers-0.092
suit-0.089
Token\n
Token ID198
Feature activation+2.47e-02

Pos logit contributions
suits+0.126
+0.112
mons+0.110
 thirsty+0.107
aped+0.098
Neg logit contributions
eness-0.161
 apology-0.161
heart-0.150
 adult-0.148
est-0.144
Token\n
Token ID198
Feature activation+2.51e-02

Pos logit contributions
suits+0.043
+0.038
mons+0.037
 thirsty+0.036
 sandbox+0.033
Neg logit contributions
eness-0.057
 apology-0.057
heart-0.053
 adult-0.052
 decision-0.052
TokenJack
Token ID14295
Feature activation-1.72e-02

Pos logit contributions
suits+0.046
+0.042
mons+0.039
 thirsty+0.039
 sandbox+0.037
Neg logit contributions
 apology-0.055
eness-0.054
 decision-0.050
est-0.050
heart-0.049
Token accepted
Token ID6292
Feature activation-1.53e-01

Pos logit contributions
 apology+0.037
eness+0.037
fulness+0.036
heart+0.034
 adult+0.033
Neg logit contributions
suits-0.031
 thirsty-0.027
-0.026
 together-0.025
 sandbox-0.024
Token his
Token ID465
Feature activation-3.71e-01

Pos logit contributions
eness+0.300
 apology+0.292
est+0.273
heart+0.271
 adult+0.263
Neg logit contributions
suits-0.304
-0.293
mons-0.274
 thirsty-0.272
 cups-0.257
Token apology
Token ID17029
Feature activation-5.52e-01

Pos logit contributions
heart+0.537
eness+0.504
fulness+0.432
est+0.404
father+0.374
Neg logit contributions
suits-0.958
-0.926
 swing-0.896
mons-0.862
 thirsty-0.861
Token and
Token ID290
Feature activation-5.94e-02

Pos logit contributions
 apology+1.276
eness+1.243
heart+1.221
 adult+1.177
erence+1.132
Neg logit contributions
suits-0.927
-0.784
 thirsty-0.780
mons-0.774
 together-0.736
Token they
Token ID484
Feature activation+3.23e-01

Pos logit contributions
eness+0.137
 apology+0.135
heart+0.125
 adult+0.124
fulness+0.122
Neg logit contributions
suits-0.100
 thirsty-0.088
mons-0.088
-0.086
 sandbox-0.082
Token started
Token ID2067
Feature activation+2.37e-02

Pos logit contributions
suits+0.599
mons+0.537
+0.537
 thirsty+0.517
 sandbox+0.475
Neg logit contributions
 apology-0.689
eness-0.689
 adult-0.634
heart-0.630
fulness-0.613
Token playing
Token ID2712
Feature activation+3.70e-01

Pos logit contributions
suits+0.044
mons+0.040
+0.038
 thirsty+0.037
owers+0.036
Neg logit contributions
 apology-0.051
eness-0.050
 adult-0.048
 decision-0.047
 himself-0.047
Token together
Token ID1978
Feature activation+3.95e-01

Pos logit contributions
suits+0.579
mons+0.519
+0.504
 thirsty+0.498
coat+0.457
Neg logit contributions
 apology-0.918
eness-0.891
 himself-0.842
 decision-0.838
 lesson-0.824
TokenOkay
Token ID16454
Feature activation-6.94e-02

Pos logit contributions
suits+0.006
+0.006
mons+0.006
 thirsty+0.005
aped+0.005
Neg logit contributions
 apology-0.008
eness-0.008
heart-0.007
 adult-0.007
 decision-0.007
Token,
Token ID11
Feature activation+5.52e-02

Pos logit contributions
 apology+0.159
eness+0.159
heart+0.147
 adult+0.147
fulness+0.142
Neg logit contributions
suits-0.120
-0.105
mons-0.102
 thirsty-0.101
 sandbox-0.092
Token we
Token ID356
Feature activation+2.40e-01

Pos logit contributions
suits+0.078
 thirsty+0.072
mons+0.068
+0.067
 sandbox+0.067
Neg logit contributions
eness-0.141
 apology-0.134
heart-0.127
est-0.126
 adult-0.124
Token accept
Token ID2453
Feature activation-3.11e-01

Pos logit contributions
suits+0.507
+0.452
mons+0.451
 thirsty+0.445
 sandbox+0.419
Neg logit contributions
eness-0.463
 apology-0.455
heart-0.434
 adult-0.413
fulness-0.413
Token your
Token ID534
Feature activation-6.36e-02

Pos logit contributions
eness+0.648
 apology+0.631
heart+0.588
est+0.579
 adult+0.575
Neg logit contributions
suits-0.602
mons-0.546
 thirsty-0.532
-0.524
 sandbox-0.493
Token apology
Token ID17029
Feature activation-5.51e-01

Pos logit contributions
eness+0.049
heart+0.043
 apology+0.038
 adult+0.034
fulness+0.034
Neg logit contributions
suits-0.208
-0.194
mons-0.193
 thirsty-0.190
 sandbox-0.186
Token.
Token ID13
Feature activation+3.66e-02

Pos logit contributions
 apology+1.199
eness+1.190
heart+1.110
 adult+1.100
 decision+1.063
Neg logit contributions
suits-1.024
-0.895
mons-0.895
 thirsty-0.871
coat-0.806
Token We
Token ID775
Feature activation+1.60e-01

Pos logit contributions
suits+0.066
+0.057
mons+0.057
 thirsty+0.057
coat+0.053
Neg logit contributions
 apology-0.083
eness-0.081
heart-0.074
 adult-0.074
fulness-0.073
Token can
Token ID460
Feature activation-6.26e-02

Pos logit contributions
suits+0.347
+0.311
mons+0.309
 thirsty+0.303
 sandbox+0.284
Neg logit contributions
eness-0.298
 apology-0.294
heart-0.278
 adult-0.268
fulness-0.264
Token be
Token ID307
Feature activation-1.63e-01

Pos logit contributions
 apology+0.129
eness+0.129
heart+0.119
 adult+0.118
fulness+0.114
Neg logit contributions
suits-0.123
-0.110
mons-0.109
 thirsty-0.106
 sandbox-0.099
Token friends
Token ID2460
Feature activation-7.82e-02

Pos logit contributions
 apology+0.242
eness+0.237
 adult+0.209
heart+0.208
est+0.200
Neg logit contributions
suits-0.421
-0.397
mons-0.386
 thirsty-0.364
coat-0.357
Token.
Token ID13
Feature activation+3.42e-02

Pos logit contributions
 apology+0.243
eness+0.238
heart+0.222
 adult+0.220
fulness+0.216
Neg logit contributions
suits-0.185
mons-0.163
 thirsty-0.156
-0.156
coat-0.146
Token You
Token ID921
Feature activation+2.62e-02

Pos logit contributions
suits+0.069
+0.065
mons+0.064
 thirsty+0.062
aped+0.057
Neg logit contributions
eness-0.065
 apology-0.064
 decision-0.060
heart-0.059
 adult-0.058
Token owe
Token ID19059
Feature activation-1.54e-01

Pos logit contributions
suits+0.052
mons+0.047
+0.047
 thirsty+0.046
coat+0.042
Neg logit contributions
 apology-0.053
eness-0.053
heart-0.049
 adult-0.048
 decision-0.048
Token Tommy
Token ID19919
Feature activation+1.75e-02

Pos logit contributions
eness+0.295
 apology+0.272
 decision+0.265
est+0.265
heart+0.260
Neg logit contributions
mons-0.279
suits-0.278
 thirsty-0.274
-0.273
 cups-0.269
Token an
Token ID281
Feature activation+6.82e-04

Pos logit contributions
suits+0.031
mons+0.028
 thirsty+0.028
 cups+0.025
coat+0.025
Neg logit contributions
eness-0.037
 apology-0.037
heart-0.035
fulness-0.034
 adult-0.034
Token apology
Token ID17029
Feature activation-5.51e-01

Pos logit contributions
suits+0.002
coat+0.001
 swing+0.001
 thirsty+0.001
 pool+0.001
Neg logit contributions
eness-0.001
heart-0.001
fulness-0.001
erence-0.001
father-0.001
Token and
Token ID290
Feature activation-4.70e-02

Pos logit contributions
 apology+1.252
eness+1.237
heart+1.173
 adult+1.138
 decision+1.122
Neg logit contributions
suits-0.959
mons-0.827
-0.800
 thirsty-0.794
coat-0.759
Token a
Token ID257
Feature activation+4.35e-02

Pos logit contributions
eness+0.103
 apology+0.094
heart+0.093
est+0.091
 decision+0.089
Neg logit contributions
suits-0.083
 thirsty-0.079
mons-0.077
-0.072
 sandbox-0.071
Token new
Token ID649
Feature activation-1.58e-01

Pos logit contributions
suits+0.095
+0.086
 thirsty+0.086
 sandbox+0.082
mons+0.081
Neg logit contributions
eness-0.083
heart-0.077
est-0.073
 apology-0.071
fulness-0.071
Token car
Token ID1097
Feature activation+5.90e-02

Pos logit contributions
eness+0.316
 apology+0.311
heart+0.291
 adult+0.285
fulness+0.279
Neg logit contributions
suits-0.320
-0.286
mons-0.281
 thirsty-0.277
 sandbox-0.264
Token.
Token ID13
Feature activation+3.36e-02

Pos logit contributions
suits+0.106
+0.093
mons+0.093
 thirsty+0.090
coat+0.085
Neg logit contributions
 apology-0.130
eness-0.129
heart-0.122
 adult-0.120
 decision-0.115
Token sing
Token ID1702
Feature activation+2.09e-01

Pos logit contributions
eness+0.105
 apology+0.102
heart+0.096
fulness+0.094
est+0.093
Neg logit contributions
suits-0.112
-0.105
 thirsty-0.101
mons-0.101
 sandbox-0.095
Token a
Token ID257
Feature activation+2.95e-02

Pos logit contributions
suits+0.363
 thirsty+0.308
mons+0.300
+0.291
 together+0.288
Neg logit contributions
 apology-0.468
eness-0.464
 adult-0.444
heart-0.442
 decision-0.427
Token song
Token ID3496
Feature activation-1.81e-01

Pos logit contributions
suits+0.067
 thirsty+0.062
+0.060
 sandbox+0.059
mons+0.058
Neg logit contributions
eness-0.051
 apology-0.046
heart-0.045
 adult-0.043
est-0.043
Token as
Token ID355
Feature activation-9.71e-03

Pos logit contributions
 apology+0.340
eness+0.338
heart+0.308
 adult+0.306
fulness+0.299
Neg logit contributions
suits-0.391
-0.353
mons-0.351
 thirsty-0.343
 sandbox-0.322
Token an
Token ID281
Feature activation-5.21e-03

Pos logit contributions
eness+0.019
 apology+0.019
heart+0.018
 adult+0.017
est+0.017
Neg logit contributions
suits-0.020
mons-0.018
-0.018
 thirsty-0.017
 sandbox-0.016
Token apology
Token ID17029
Feature activation-5.51e-01

Pos logit contributions
eness+0.008
fulness+0.007
heart+0.006
 apology+0.006
erence+0.006
Neg logit contributions
 thirsty-0.011
suits-0.010
 hall-0.010
 swing-0.010
 spraying-0.010
Token.
Token ID13
Feature activation+2.32e-02

Pos logit contributions
eness+1.115
 apology+1.112
 adult+1.015
heart+1.014
fulness+0.990
Neg logit contributions
suits-1.108
-0.997
mons-0.979
 thirsty-0.958
 sandbox-0.900
Token All
Token ID1439
Feature activation-7.80e-02

Pos logit contributions
suits+0.045
+0.040
mons+0.040
 thirsty+0.039
aped+0.036
Neg logit contributions
 apology-0.049
eness-0.048
heart-0.045
 adult-0.045
 decision-0.043
Token the
Token ID262
Feature activation-2.33e-02

Pos logit contributions
eness+0.175
 apology+0.175
 adult+0.156
erence+0.156
 decision+0.154
Neg logit contributions
 thirsty-0.126
suits-0.125
 together-0.112
 pool-0.109
 spraying-0.109
Token other
Token ID584
Feature activation-1.67e-01

Pos logit contributions
eness+0.043
 apology+0.043
heart+0.039
fulness+0.038
 adult+0.037
Neg logit contributions
suits-0.052
mons-0.046
 thirsty-0.046
-0.046
 sandbox-0.043
Token animals
Token ID4695
Feature activation-4.75e-02

Pos logit contributions
 apology+0.297
eness+0.293
heart+0.265
 adult+0.263
fulness+0.255
Neg logit contributions
suits-0.380
-0.342
mons-0.342
 thirsty-0.336
 sandbox-0.315
Token Ed
Token ID1717
Feature activation-3.47e-02

Pos logit contributions
suits+0.500
mons+0.441
+0.441
 thirsty+0.419
aped+0.410
Neg logit contributions
 apology-0.451
eness-0.430
heart-0.400
 decision-0.397
 adult-0.386
Token without
Token ID1231
Feature activation+6.80e-02

Pos logit contributions
 apology+0.066
eness+0.065
heart+0.059
 adult+0.059
fulness+0.057
Neg logit contributions
suits-0.075
-0.066
mons-0.066
 thirsty-0.065
aped-0.061
Token a
Token ID257
Feature activation+3.02e-02

Pos logit contributions
suits+0.143
 thirsty+0.131
+0.131
mons+0.128
 grazing+0.120
Neg logit contributions
 apology-0.126
eness-0.125
heart-0.114
 adult-0.112
est-0.111
Token word
Token ID1573
Feature activation-3.04e-01

Pos logit contributions
suits+0.068
+0.061
 thirsty+0.061
mons+0.059
coat+0.057
Neg logit contributions
eness-0.054
 apology-0.053
heart-0.049
 adult-0.047
fulness-0.047
Token of
Token ID286
Feature activation-2.64e-01

Pos logit contributions
 apology+0.649
eness+0.637
heart+0.594
 adult+0.586
 decision+0.566
Neg logit contributions
suits-0.571
-0.509
mons-0.499
 thirsty-0.496
coat-0.461
Token apology
Token ID17029
Feature activation-5.51e-01

Pos logit contributions
eness+0.492
 apology+0.492
heart+0.465
est+0.447
 decision+0.433
Neg logit contributions
suits-0.511
-0.487
 thirsty-0.469
 grazing-0.457
mons-0.455
Token or
Token ID393
Feature activation-1.63e-02

Pos logit contributions
 apology+1.203
eness+1.171
heart+1.096
 adult+1.088
 decision+1.052
Neg logit contributions
suits-1.012
-0.889
mons-0.887
 thirsty-0.870
coat-0.803
Token explanation
Token ID7468
Feature activation-3.90e-01

Pos logit contributions
eness+0.031
 apology+0.031
heart+0.028
 adult+0.028
est+0.028
Neg logit contributions
suits-0.033
 thirsty-0.031
mons-0.030
-0.029
 sandbox-0.028
Token.
Token ID13
Feature activation+1.91e-02

Pos logit contributions
 apology+0.865
eness+0.832
heart+0.781
 adult+0.770
est+0.755
Neg logit contributions
suits-0.679
mons-0.595
 thirsty-0.584
-0.582
 grazing-0.542
Token Ed
Token ID1717
Feature activation-3.51e-02

Pos logit contributions
suits+0.037
+0.034
 thirsty+0.032
mons+0.032
 funnel+0.030
Neg logit contributions
 apology-0.039
eness-0.038
heart-0.036
est-0.035
 adult-0.035
Token felt
Token ID2936
Feature activation-9.38e-02

Pos logit contributions
 apology+0.069
eness+0.066
fulness+0.062
heart+0.059
 adult+0.058
Neg logit contributions
suits-0.073
 thirsty-0.065
-0.064
mons-0.062
aped-0.062
Token to
Token ID284
Feature activation-4.93e-02

Pos logit contributions
 apology+0.246
eness+0.245
 adult+0.228
heart+0.227
fulness+0.218
Neg logit contributions
suits-0.217
mons-0.195
-0.193
 thirsty-0.191
 sandbox-0.175
Token the
Token ID262
Feature activation-1.86e-02

Pos logit contributions
eness+0.082
 apology+0.075
heart+0.074
est+0.071
 adult+0.068
Neg logit contributions
-0.106
 thirsty-0.104
suits-0.103
mons-0.101
chard-0.097
Token squirrel
Token ID33039
Feature activation+8.26e-03

Pos logit contributions
eness+0.039
 apology+0.038
fulness+0.036
heart+0.036
 adult+0.034
Neg logit contributions
suits-0.035
 thirsty-0.033
-0.033
mons-0.032
 sandbox-0.031
Token as
Token ID355
Feature activation-4.76e-03

Pos logit contributions
suits+0.016
+0.014
mons+0.014
 thirsty+0.014
 sandbox+0.013
Neg logit contributions
 apology-0.018
eness-0.017
heart-0.016
 adult-0.016
fulness-0.016
Token an
Token ID281
Feature activation-1.92e-03

Pos logit contributions
eness+0.009
 apology+0.009
heart+0.009
 adult+0.008
est+0.008
Neg logit contributions
suits-0.010
mons-0.009
 thirsty-0.009
-0.009
aped-0.008
Token apology
Token ID17029
Feature activation-5.50e-01

Pos logit contributions
eness+0.002
fulness+0.002
heart+0.002
erence+0.002
est+0.001
Neg logit contributions
 thirsty-0.005
 swing-0.005
mons-0.005
 watering-0.004
-0.004
Token.
Token ID13
Feature activation+2.90e-02

Pos logit contributions
 apology+1.170
eness+1.161
heart+1.075
 adult+1.068
fulness+1.031
Neg logit contributions
suits-1.044
-0.926
mons-0.923
 thirsty-0.899
 sandbox-0.835
Token The
Token ID383
Feature activation+3.74e-02

Pos logit contributions
suits+0.059
+0.054
mons+0.052
 thirsty+0.052
 sandbox+0.048
Neg logit contributions
eness-0.058
 apology-0.058
heart-0.054
 adult-0.053
fulness-0.051
Token squirrel
Token ID33039
Feature activation+8.05e-03

Pos logit contributions
 thirsty+0.053
suits+0.052
+0.052
 sandbox+0.049
mons+0.048
Neg logit contributions
eness-0.095
fulness-0.091
heart-0.089
 apology-0.087
est-0.081
Token forg
Token ID5784
Feature activation-1.72e-01

Pos logit contributions
suits+0.016
 thirsty+0.014
+0.014
mons+0.014
aped+0.013
Neg logit contributions
eness-0.016
 apology-0.016
fulness-0.015
heart-0.015
 adult-0.015
Tokenave
Token ID1015
Feature activation-2.08e-01

Pos logit contributions
 apology+0.303
 adult+0.292
heart+0.280
eness+0.278
 decision+0.276
Neg logit contributions
suits-0.400
mons-0.353
aped-0.343
coat-0.333
-0.332
Token\n
Token ID198
Feature activation+3.80e-02

Pos logit contributions
suits+0.071
+0.063
mons+0.062
 thirsty+0.061
 sandbox+0.056
Neg logit contributions
 apology-0.082
eness-0.082
heart-0.076
 adult-0.075
est-0.073
TokenThe
Token ID464
Feature activation-3.73e-02

Pos logit contributions
suits+0.073
+0.065
mons+0.064
 thirsty+0.063
coat+0.060
Neg logit contributions
 apology-0.082
eness-0.078
 adult-0.076
 himself-0.072
heart-0.071
Token man
Token ID582
Feature activation-1.61e-01

Pos logit contributions
eness+0.070
heart+0.065
 apology+0.062
fulness+0.061
est+0.058
Neg logit contributions
suits-0.084
 thirsty-0.078
mons-0.076
-0.075
 sandbox-0.075
Token accepts
Token ID18178
Feature activation-1.89e-01

Pos logit contributions
eness+0.292
heart+0.278
 apology+0.275
fulness+0.267
 adult+0.263
Neg logit contributions
suits-0.357
-0.318
 thirsty-0.317
mons-0.312
 sandbox-0.306
Token their
Token ID511
Feature activation+1.66e-01

Pos logit contributions
eness+0.400
 apology+0.384
heart+0.374
est+0.359
 adult+0.355
Neg logit contributions
suits-0.355
-0.321
mons-0.318
 thirsty-0.312
 cups-0.297
Token apology
Token ID17029
Feature activation-5.50e-01

Pos logit contributions
suits+0.459
mons+0.427
+0.427
 thirsty+0.418
 sandbox+0.406
Neg logit contributions
eness-0.213
heart-0.200
fulness-0.177
est-0.171
 apology-0.162
Token.
Token ID13
Feature activation+3.19e-02

Pos logit contributions
 apology+1.240
eness+1.222
heart+1.183
 adult+1.166
fulness+1.110
Neg logit contributions
suits-0.968
-0.822
mons-0.821
 thirsty-0.815
 cups-0.740
Token He
Token ID679
Feature activation-2.59e-01

Pos logit contributions
suits+0.092
+0.086
mons+0.085
 thirsty+0.084
 sandbox+0.080
Neg logit contributions
eness-0.037
 apology-0.037
heart-0.032
 adult-0.031
 decision-0.030
Token says
Token ID1139
Feature activation-1.15e-01

Pos logit contributions
eness+0.515
heart+0.480
 apology+0.469
fulness+0.469
 adult+0.444
Neg logit contributions
suits-0.544
 thirsty-0.485
-0.480
mons-0.480
 sandbox-0.467
Token,
Token ID11
Feature activation+5.29e-02

Pos logit contributions
eness+0.276
 apology+0.272
heart+0.258
 adult+0.252
 decision+0.249
Neg logit contributions
suits-0.186
mons-0.163
-0.162
 thirsty-0.160
 sandbox-0.142
Token "
Token ID366
Feature activation+1.52e-04

Pos logit contributions
suits+0.118
mons+0.105
+0.102
 thirsty+0.102
 sandbox+0.098
Neg logit contributions
 apology-0.097
eness-0.094
 adult-0.088
fulness-0.084
heart-0.083
Token sorry
Token ID7926
Feature activation-4.55e-02

Pos logit contributions
 apology+0.041
eness+0.041
 adult+0.037
heart+0.037
fulness+0.035
Neg logit contributions
suits-0.055
-0.049
mons-0.049
 thirsty-0.048
 sandbox-0.045
Token.
Token ID13
Feature activation+4.09e-02

Pos logit contributions
 apology+0.110
eness+0.109
erence+0.102
 adult+0.101
ulously+0.100
Neg logit contributions
suits-0.072
mons-0.057
 cups-0.054
 thirsty-0.053
 together-0.050
Token I
Token ID314
Feature activation+3.92e-02

Pos logit contributions
suits+0.091
+0.079
mons+0.079
 thirsty+0.078
coat+0.075
Neg logit contributions
 apology-0.078
eness-0.074
 adult-0.068
heart-0.068
fulness-0.067
Token accept
Token ID2453
Feature activation-3.10e-01

Pos logit contributions
suits+0.078
mons+0.069
 thirsty+0.068
+0.068
coat+0.063
Neg logit contributions
eness-0.081
 apology-0.078
heart-0.076
 adult-0.073
fulness-0.073
Token your
Token ID534
Feature activation-5.92e-02

Pos logit contributions
eness+0.607
 apology+0.603
heart+0.550
 adult+0.543
est+0.535
Neg logit contributions
suits-0.639
mons-0.574
-0.572
 thirsty-0.563
 sandbox-0.523
Token apology
Token ID17029
Feature activation-5.50e-01

Pos logit contributions
 apology+0.096
eness+0.093
 adult+0.084
heart+0.083
 decision+0.080
Neg logit contributions
suits-0.145
-0.133
mons-0.132
 thirsty-0.129
 sandbox-0.123
Token.
Token ID13
Feature activation+4.02e-02

Pos logit contributions
 apology+1.157
eness+1.150
heart+1.060
 adult+1.056
fulness+1.022
Neg logit contributions
suits-1.059
-0.941
mons-0.935
 thirsty-0.913
 sandbox-0.850
Token Next
Token ID7406
Feature activation-6.82e-02

Pos logit contributions
suits+0.081
+0.072
mons+0.072
 thirsty+0.070
 sandbox+0.066
Neg logit contributions
 apology-0.082
eness-0.081
heart-0.074
 adult-0.074
fulness-0.072
Token time
Token ID640
Feature activation-7.30e-02

Pos logit contributions
 apology+0.116
eness+0.114
heart+0.103
 adult+0.103
 decision+0.099
Neg logit contributions
suits-0.161
-0.146
mons-0.146
 thirsty-0.142
 sandbox-0.135
Token,
Token ID11
Feature activation+6.28e-02

Pos logit contributions
 apology+0.140
eness+0.135
 adult+0.127
heart+0.124
fulness+0.123
Neg logit contributions
suits-0.154
-0.141
mons-0.141
 thirsty-0.134
 sandbox-0.130
Token be
Token ID307
Feature activation-1.59e-01

Pos logit contributions
suits+0.131
+0.118
mons+0.118
 thirsty+0.116
 sandbox+0.109
Neg logit contributions
eness-0.121
 apology-0.120
heart-0.110
 adult-0.108
est-0.106
Token very
Token ID845
Feature activation+9.67e-03

Pos logit contributions
suits+0.183
 thirsty+0.177
+0.165
mons+0.158
 sandbox+0.154
Neg logit contributions
eness-0.205
 apology-0.205
heart-0.191
fulness-0.183
 adult-0.183
Token grateful
Token ID14066
Feature activation-7.88e-02

Pos logit contributions
suits+0.020
 thirsty+0.019
+0.018
mons+0.018
 sandbox+0.017
Neg logit contributions
eness-0.019
 apology-0.018
heart-0.017
 adult-0.017
est-0.017
Token and
Token ID290
Feature activation-5.19e-02

Pos logit contributions
eness+0.165
 apology+0.165
heart+0.151
 adult+0.151
est+0.147
Neg logit contributions
suits-0.148
mons-0.129
-0.128
 thirsty-0.127
 together-0.121
Token accepted
Token ID6292
Feature activation-1.48e-01

Pos logit contributions
eness+0.100
 apology+0.100
 adult+0.092
heart+0.091
est+0.088
Neg logit contributions
suits-0.108
 thirsty-0.099
-0.096
mons-0.095
 sandbox-0.089
Token his
Token ID465
Feature activation-3.68e-01

Pos logit contributions
eness+0.319
 apology+0.315
heart+0.295
 adult+0.291
est+0.289
Neg logit contributions
suits-0.270
-0.243
mons-0.241
 thirsty-0.237
 sandbox-0.213
Token apology
Token ID17029
Feature activation-5.49e-01

Pos logit contributions
eness+0.498
heart+0.472
fulness+0.422
est+0.402
 apology+0.373
Neg logit contributions
suits-0.960
-0.908
 thirsty-0.904
mons-0.870
 sandbox-0.852
Token.
Token ID13
Feature activation+2.77e-02

Pos logit contributions
 apology+1.197
eness+1.171
heart+1.108
 adult+1.099
est+1.052
Neg logit contributions
suits-1.019
-0.887
mons-0.882
 thirsty-0.867
aped-0.803
Token\n
Token ID198
Feature activation+2.93e-02

Pos logit contributions
suits+0.061
mons+0.055
+0.054
 thirsty+0.054
coat+0.051
Neg logit contributions
 apology-0.051
eness-0.050
 adult-0.046
heart-0.046
fulness-0.044
Token\n
Token ID198
Feature activation+2.93e-02

Pos logit contributions
suits+0.052
+0.046
mons+0.044
 thirsty+0.044
aped+0.040
Neg logit contributions
eness-0.067
 apology-0.065
heart-0.063
est-0.061
 decision-0.060
TokenThe
Token ID464
Feature activation-4.10e-02

Pos logit contributions
suits+0.060
+0.056
mons+0.054
 thirsty+0.053
aped+0.050
Neg logit contributions
eness-0.057
 apology-0.057
heart-0.053
est-0.052
 decision-0.051
Token z
Token ID1976
Feature activation+1.68e-01

Pos logit contributions
eness+0.078
 apology+0.071
heart+0.070
fulness+0.069
est+0.062
Neg logit contributions
suits-0.088
 thirsty-0.080
 sandbox-0.077
mons-0.077
-0.076
Token to
Token ID284
Feature activation-4.74e-02

Pos logit contributions
suits+0.227
+0.202
mons+0.201
 thirsty+0.197
 sandbox+0.183
Neg logit contributions
 apology-0.235
eness-0.233
heart-0.215
 adult-0.214
fulness-0.207
Token his
Token ID465
Feature activation-3.61e-01

Pos logit contributions
eness+0.094
 apology+0.093
heart+0.086
 adult+0.084
est+0.084
Neg logit contributions
suits-0.094
-0.087
mons-0.086
 thirsty-0.085
 sandbox-0.079
Token friends
Token ID2460
Feature activation-7.94e-02

Pos logit contributions
 apology+0.560
eness+0.528
 adult+0.495
 decision+0.473
heart+0.470
Neg logit contributions
suits-0.915
-0.825
mons-0.817
 thirsty-0.797
coat-0.776
Token as
Token ID355
Feature activation-7.00e-04

Pos logit contributions
eness+0.159
 apology+0.159
heart+0.147
 adult+0.146
est+0.141
Neg logit contributions
suits-0.159
mons-0.142
-0.141
 thirsty-0.136
 sandbox-0.129
Token an
Token ID281
Feature activation+1.56e-03

Pos logit contributions
 apology+0.002
eness+0.002
 adult+0.001
heart+0.001
fulness+0.001
Neg logit contributions
suits-0.001
-0.001
mons-0.001
 thirsty-0.001
 sandbox-0.001
Token apology
Token ID17029
Feature activation-5.49e-01

Pos logit contributions
 thirsty+0.004
suits+0.004
mons+0.004
+0.004
 cups+0.003
Neg logit contributions
eness-0.002
fulness-0.002
heart-0.002
erence-0.002
 apology-0.002
Token.
Token ID13
Feature activation+3.01e-02

Pos logit contributions
 apology+1.154
eness+1.147
heart+1.058
 adult+1.054
fulness+1.019
Neg logit contributions
suits-1.056
-0.938
mons-0.931
 thirsty-0.910
 sandbox-0.846
Token His
Token ID2399
Feature activation-3.78e-01

Pos logit contributions
suits+0.072
+0.067
mons+0.066
 thirsty+0.064
aped+0.061
Neg logit contributions
eness-0.050
 apology-0.049
heart-0.044
 adult-0.044
 decision-0.042
Token mom
Token ID1995
Feature activation-1.20e-01

Pos logit contributions
eness+0.760
heart+0.714
fulness+0.691
 apology+0.682
est+0.649
Neg logit contributions
suits-0.780
mons-0.731
-0.728
 thirsty-0.720
 sandbox-0.682
Token agreed
Token ID4987
Feature activation-1.44e-01

Pos logit contributions
eness+0.293
 apology+0.286
fulness+0.278
 decisions+0.276
 decision+0.265
Neg logit contributions
suits-0.172
 together-0.161
 thirsty-0.151
-0.147
mons-0.146
Token and
Token ID290
Feature activation-5.33e-02

Pos logit contributions
eness+0.347
 apology+0.342
heart+0.318
 adult+0.315
est+0.313
Neg logit contributions
suits-0.217
-0.205
 together-0.192
 thirsty-0.190
mons-0.185
Token and
Token ID290
Feature activation-4.54e-02

Pos logit contributions
suits+0.030
 thirsty+0.026
+0.025
mons+0.025
 sandbox+0.024
Neg logit contributions
 apology-0.034
eness-0.034
fulness-0.032
heart-0.031
 adult-0.031
Token her
Token ID607
Feature activation-1.05e-01

Pos logit contributions
 apology+0.111
eness+0.110
heart+0.103
 adult+0.103
fulness+0.100
Neg logit contributions
suits-0.074
-0.064
mons-0.063
 thirsty-0.062
 sandbox-0.056
Token mom
Token ID1995
Feature activation-1.17e-01

Pos logit contributions
eness+0.171
 apology+0.168
heart+0.154
 adult+0.150
fulness+0.146
Neg logit contributions
suits-0.257
-0.233
mons-0.232
 thirsty-0.230
 sandbox-0.218
Token accepted
Token ID6292
Feature activation-1.48e-01

Pos logit contributions
 apology+0.242
eness+0.239
heart+0.221
 adult+0.219
fulness+0.214
Neg logit contributions
suits-0.232
-0.204
mons-0.202
 thirsty-0.201
 sandbox-0.188
Token his
Token ID465
Feature activation-3.60e-01

Pos logit contributions
 apology+0.298
eness+0.297
heart+0.273
 adult+0.271
est+0.263
Neg logit contributions
suits-0.299
-0.269
mons-0.267
 thirsty-0.261
 sandbox-0.243
Token apology
Token ID17029
Feature activation-5.49e-01

Pos logit contributions
eness+0.261
heart+0.245
fulness+0.208
erence+0.159
est+0.151
Neg logit contributions
suits-1.217
mons-1.104
-1.103
 thirsty-1.099
 sandbox-1.082
Token and
Token ID290
Feature activation-4.65e-02

Pos logit contributions
 apology+1.145
eness+1.133
heart+1.069
 adult+1.052
est+1.011
Neg logit contributions
suits-1.063
mons-0.932
-0.931
 thirsty-0.917
coat-0.839
Token thanked
Token ID26280
Feature activation-1.45e-01

Pos logit contributions
 apology+0.094
eness+0.093
heart+0.085
 adult+0.085
fulness+0.082
Neg logit contributions
suits-0.094
-0.084
mons-0.083
 thirsty-0.081
coat-0.076
Token him
Token ID683
Feature activation-3.20e-01

Pos logit contributions
eness+0.223
 apology+0.221
heart+0.198
 adult+0.195
est+0.191
Neg logit contributions
suits-0.354
-0.330
mons-0.326
 thirsty-0.321
 sandbox-0.303
Token.
Token ID13
Feature activation+3.69e-02

Pos logit contributions
eness+0.722
 apology+0.708
heart+0.687
 adult+0.663
est+0.647
Neg logit contributions
suits-0.566
-0.476
mons-0.475
 thirsty-0.475
 cups-0.431
Token They
Token ID1119
Feature activation+2.26e-01

Pos logit contributions
suits+0.074
coat+0.070
mons+0.067
+0.067
 thirsty+0.063
Neg logit contributions
 apology-0.067
 compassion-0.059
 adult-0.059
 moral-0.058
 His-0.058
Token than
Token ID621
Feature activation+5.61e-02

Pos logit contributions
 apology+0.522
eness+0.522
heart+0.490
 adult+0.483
est+0.471
Neg logit contributions
suits-0.392
mons-0.338
-0.333
 thirsty-0.329
aped-0.299
Token selfish
Token ID20363
Feature activation-7.71e-02

Pos logit contributions
suits+0.111
+0.098
mons+0.098
 thirsty+0.096
 sandbox+0.089
Neg logit contributions
 apology-0.116
eness-0.115
heart-0.106
 adult-0.105
fulness-0.102
Tokenness
Token ID1108
Feature activation-2.63e-01

Pos logit contributions
 apology+0.179
eness+0.168
 adult+0.167
heart+0.163
fulness+0.160
Neg logit contributions
suits-0.129
mons-0.106
 thirsty-0.103
 sandbox-0.099
 cups-0.098
Token and
Token ID290
Feature activation-4.55e-02

Pos logit contributions
 apology+0.587
eness+0.571
heart+0.540
 adult+0.538
 decision+0.521
Neg logit contributions
suits-0.470
mons-0.406
 thirsty-0.399
-0.394
coat-0.363
Token rud
Token ID28906
Feature activation-8.40e-02

Pos logit contributions
eness+0.083
 apology+0.082
heart+0.076
 adult+0.075
est+0.073
Neg logit contributions
suits-0.099
 thirsty-0.090
mons-0.089
-0.088
 sandbox-0.083
Tokeneness
Token ID9449
Feature activation-5.47e-01

Pos logit contributions
 apology+0.047
 adult+0.037
 decision+0.034
 lesson+0.029
heart+0.028
Neg logit contributions
suits-0.283
mons-0.265
-0.264
aped-0.258
owers-0.255
Token.
Token ID13
Feature activation+3.66e-02

Pos logit contributions
 apology+1.261
eness+1.235
heart+1.196
 adult+1.174
 decision+1.126
Neg logit contributions
suits-0.936
mons-0.806
-0.782
 thirsty-0.762
aped-0.703
Token<|endoftext|>
Token ID50256
Feature activation-9.10e-02

Pos logit contributions
suits+0.066
+0.060
mons+0.058
 thirsty+0.057
 sandbox+0.052
Neg logit contributions
eness-0.080
 apology-0.079
heart-0.075
 adult-0.073
 decision-0.072
TokenTom
Token ID13787
Feature activation+5.52e-02

Pos logit contributions
eness+0.174
 apology+0.170
 himself+0.169
est+0.168
 adult+0.160
Neg logit contributions
suits-0.171
mons-0.157
 thirsty-0.153
coat-0.149
drops-0.148
Token and
Token ID290
Feature activation-5.90e-02

Pos logit contributions
suits+0.116
 thirsty+0.101
+0.095
mons+0.095
coat+0.094
Neg logit contributions
 apology-0.111
eness-0.110
fulness-0.100
 adult-0.099
heart-0.098
Token Sam
Token ID3409
Feature activation+7.32e-02

Pos logit contributions
eness+0.115
 apology+0.109
heart+0.102
 adult+0.098
 decision+0.098
Neg logit contributions
suits-0.124
 thirsty-0.115
mons-0.110
-0.109
 sandbox-0.107
Token taking
Token ID2263
Feature activation+1.29e-01

Pos logit contributions
suits+0.081
mons+0.074
+0.072
owers+0.064
 funnel+0.063
Neg logit contributions
 apology-0.069
eness-0.066
 himself-0.065
heart-0.065
 compassion-0.064
Token in
Token ID287
Feature activation+1.63e-01

Pos logit contributions
suits+0.292
+0.265
mons+0.263
 thirsty+0.257
 sandbox+0.243
Neg logit contributions
 apology-0.231
eness-0.229
heart-0.209
 adult-0.207
est-0.199
Token all
Token ID477
Feature activation-3.84e-02

Pos logit contributions
suits+0.386
mons+0.360
+0.344
 thirsty+0.331
aped+0.329
Neg logit contributions
 apology-0.280
eness-0.274
heart-0.250
 adult-0.249
 decision-0.244
Token the
Token ID262
Feature activation-2.59e-02

Pos logit contributions
eness+0.080
 apology+0.080
 adult+0.072
fulness+0.069
 decision+0.069
Neg logit contributions
suits-0.072
 thirsty-0.068
mons-0.062
 sandbox-0.062
 cups-0.061
Token whit
Token ID20542
Feature activation+8.50e-03

Pos logit contributions
 apology+0.054
eness+0.053
 adult+0.049
fulness+0.048
heart+0.048
Neg logit contributions
suits-0.052
 thirsty-0.046
mons-0.044
-0.044
 sandbox-0.042
Tokeneness
Token ID9449
Feature activation-5.47e-01

Pos logit contributions
suits+0.023
+0.021
owers+0.021
mons+0.021
 thirsty+0.021
Neg logit contributions
 apology-0.010
 adult-0.009
 lesson-0.008
 moral-0.008
eness-0.008
Token and
Token ID290
Feature activation-6.28e-02

Pos logit contributions
eness+1.017
 apology+0.991
heart+0.919
fulness+0.909
 adult+0.903
Neg logit contributions
suits-1.172
-1.068
mons-1.057
 thirsty-1.014
 sandbox-0.968
Token beauty
Token ID8737
Feature activation-2.36e-01

Pos logit contributions
 apology+0.134
eness+0.133
heart+0.123
 adult+0.122
fulness+0.118
Neg logit contributions
suits-0.119
-0.105
mons-0.105
 thirsty-0.103
 sandbox-0.096
Token of
Token ID286
Feature activation-2.65e-01

Pos logit contributions
eness+0.472
 apology+0.465
heart+0.432
fulness+0.423
est+0.421
Neg logit contributions
suits-0.479
-0.429
mons-0.427
 thirsty-0.414
 sandbox-0.394
Token nature
Token ID3450
Feature activation-4.79e-05

Pos logit contributions
 apology+0.491
eness+0.486
heart+0.445
 adult+0.441
fulness+0.427
Neg logit contributions
suits-0.591
-0.527
mons-0.526
 thirsty-0.507
coat-0.484
Token.
Token ID13
Feature activation+1.79e-02

Pos logit contributions
eness+0.000
 apology+0.000
heart+0.000
est+0.000
fulness+0.000
Neg logit contributions
suits-0.000
mons-0.000
-0.000
 sandbox-0.000
 funnel-0.000
Token appreciate
Token ID9144
Feature activation-1.18e-01

Pos logit contributions
suits+0.068
+0.060
 thirsty+0.059
mons+0.059
coat+0.055
Neg logit contributions
eness-0.070
 apology-0.069
heart-0.066
 adult-0.065
fulness-0.063
Token your
Token ID534
Feature activation-6.64e-02

Pos logit contributions
eness+0.225
 apology+0.222
heart+0.203
 adult+0.199
est+0.198
Neg logit contributions
suits-0.248
mons-0.223
-0.223
 thirsty-0.221
 sandbox-0.206
Token honesty
Token ID24345
Feature activation-3.45e-01

Pos logit contributions
eness+0.095
heart+0.087
fulness+0.081
est+0.077
 apology+0.076
Neg logit contributions
suits-0.168
-0.160
 thirsty-0.156
mons-0.155
 cups-0.147
Token and
Token ID290
Feature activation-4.47e-02

Pos logit contributions
 apology+0.750
eness+0.739
heart+0.703
 adult+0.689
 decision+0.667
Neg logit contributions
suits-0.646
-0.560
mons-0.555
 thirsty-0.542
coat-0.503
Token your
Token ID534
Feature activation-6.57e-02

Pos logit contributions
eness+0.069
heart+0.062
 apology+0.060
est+0.060
 adult+0.057
Neg logit contributions
suits-0.108
 thirsty-0.102
mons-0.101
 sandbox-0.097
 cups-0.094
Token apology
Token ID17029
Feature activation-5.46e-01

Pos logit contributions
eness+0.096
heart+0.088
fulness+0.082
est+0.079
 apology+0.077
Neg logit contributions
suits-0.164
-0.149
mons-0.149
 thirsty-0.149
 sandbox-0.143
Token.
Token ID13
Feature activation+3.91e-02

Pos logit contributions
 apology+1.187
eness+1.178
heart+1.102
 adult+1.091
 decision+1.053
Neg logit contributions
suits-1.013
mons-0.885
-0.884
 thirsty-0.866
coat-0.798
Token You
Token ID921
Feature activation+2.56e-02

Pos logit contributions
suits+0.077
+0.068
mons+0.068
 thirsty+0.066
 sandbox+0.062
Neg logit contributions
 apology-0.081
eness-0.080
heart-0.074
 adult-0.074
 decision-0.071
Token can
Token ID460
Feature activation-6.30e-02

Pos logit contributions
suits+0.046
+0.041
 thirsty+0.041
mons+0.039
 sandbox+0.035
Neg logit contributions
eness-0.056
 apology-0.056
heart-0.055
 adult-0.053
fulness-0.051
Token keep
Token ID1394
Feature activation-1.61e-01

Pos logit contributions
eness+0.130
 apology+0.128
heart+0.122
est+0.118
fulness+0.117
Neg logit contributions
-0.118
suits-0.117
 thirsty-0.105
mons-0.104
coat-0.100
Token the
Token ID262
Feature activation-1.31e-02

Pos logit contributions
 apology+0.293
eness+0.292
heart+0.266
 adult+0.263
est+0.254
Neg logit contributions
suits-0.355
-0.322
mons-0.320
 thirsty-0.315
 sandbox-0.294
Token He
Token ID679
Feature activation-2.70e-01

Pos logit contributions
suits+0.062
+0.056
mons+0.056
 thirsty+0.055
 sandbox+0.053
Neg logit contributions
 apology-0.034
eness-0.033
heart-0.029
 adult-0.029
est-0.028
Token could
Token ID714
Feature activation-3.62e-02

Pos logit contributions
eness+0.573
 apology+0.566
heart+0.510
fulness+0.509
 adult+0.498
Neg logit contributions
suits-0.555
 thirsty-0.482
mons-0.469
-0.465
 sandbox-0.447
Token smell
Token ID8508
Feature activation+3.80e-02

Pos logit contributions
eness+0.087
 apology+0.087
heart+0.081
 adult+0.080
fulness+0.078
Neg logit contributions
suits-0.059
-0.052
mons-0.051
 thirsty-0.050
 sandbox-0.046
Token the
Token ID262
Feature activation-2.29e-02

Pos logit contributions
suits+0.069
 thirsty+0.066
mons+0.061
+0.060
 cups+0.059
Neg logit contributions
 apology-0.081
eness-0.081
 adult-0.075
heart-0.072
 decision-0.070
Token peac
Token ID41961
Feature activation+1.17e-01

Pos logit contributions
 apology+0.050
eness+0.049
 adult+0.046
heart+0.045
fulness+0.044
Neg logit contributions
suits-0.043
 thirsty-0.038
-0.037
mons-0.036
 sandbox-0.035
Tokeneness
Token ID9449
Feature activation-5.46e-01

Pos logit contributions
suits+0.319
+0.305
coat+0.301
owers+0.299
mons+0.296
Neg logit contributions
 apology-0.145
 decision-0.112
 adult-0.109
 lesson-0.109
eness-0.105
Token of
Token ID286
Feature activation-2.61e-01

Pos logit contributions
eness+1.041
 apology+1.025
heart+0.950
 adult+0.938
est+0.922
Neg logit contributions
suits-1.168
-1.044
mons-1.040
 thirsty-0.998
 sandbox-0.948
Token the
Token ID262
Feature activation-2.26e-02

Pos logit contributions
eness+0.544
 apology+0.542
 adult+0.497
heart+0.497
fulness+0.479
Neg logit contributions
suits-0.498
 thirsty-0.450
-0.443
mons-0.440
 sandbox-0.416
Token cave
Token ID11527
Feature activation+3.04e-02

Pos logit contributions
 apology+0.046
eness+0.046
heart+0.043
 adult+0.042
est+0.041
Neg logit contributions
suits-0.045
-0.040
mons-0.040
 thirsty-0.038
coat-0.036
Token.
Token ID13
Feature activation+2.41e-02

Pos logit contributions
suits+0.064
+0.057
mons+0.057
 thirsty+0.055
 sandbox+0.052
Neg logit contributions
eness-0.059
 apology-0.058
heart-0.054
 adult-0.053
est-0.052
Token He
Token ID679
Feature activation-2.70e-01

Pos logit contributions
suits+0.063
+0.058
mons+0.057
 thirsty+0.056
 sandbox+0.053
Neg logit contributions
 apology-0.035
eness-0.035
heart-0.031
 adult-0.031
fulness-0.029
Token very
Token ID845
Feature activation+5.91e-03

Pos logit contributions
suits+0.067
+0.060
mons+0.060
 thirsty+0.058
 sandbox+0.054
Neg logit contributions
 apology-0.071
eness-0.070
heart-0.065
 adult-0.064
fulness-0.062
Token annoyed
Token ID25602
Feature activation-1.92e-02

Pos logit contributions
suits+0.011
+0.010
mons+0.010
 thirsty+0.009
 sandbox+0.009
Neg logit contributions
 apology-0.013
eness-0.013
heart-0.012
 adult-0.012
 decision-0.012
Token with
Token ID351
Feature activation+1.32e-01

Pos logit contributions
 apology+0.047
eness+0.045
 adult+0.041
 decision+0.040
fulness+0.040
Neg logit contributions
suits-0.030
mons-0.026
 thirsty-0.026
-0.025
 together-0.025
Token his
Token ID465
Feature activation-3.75e-01

Pos logit contributions
suits+0.270
+0.257
 thirsty+0.256
mons+0.246
 sandbox+0.237
Neg logit contributions
eness-0.233
 apology-0.229
heart-0.216
fulness-0.205
est-0.204
Token rud
Token ID28906
Feature activation-8.32e-02

Pos logit contributions
eness+0.731
 apology+0.692
heart+0.683
fulness+0.653
 adult+0.631
Neg logit contributions
suits-0.802
-0.719
 thirsty-0.714
mons-0.709
 sandbox-0.662
Tokeneness
Token ID9449
Feature activation-5.45e-01

Pos logit contributions
 apology+0.065
 moral+0.052
 lesson+0.050
 inner+0.043
 persever+0.042
Neg logit contributions
suits-0.247
coat-0.237
-0.237
owers-0.225
mons-0.218
Token and
Token ID290
Feature activation-6.15e-02

Pos logit contributions
 apology+1.121
eness+1.119
heart+1.027
 adult+1.024
fulness+0.991
Neg logit contributions
suits-1.070
-0.953
mons-0.947
 thirsty-0.925
 sandbox-0.863
Token complained
Token ID13832
Feature activation+6.21e-03

Pos logit contributions
eness+0.129
 apology+0.128
heart+0.117
fulness+0.116
 adult+0.115
Neg logit contributions
suits-0.117
 thirsty-0.106
mons-0.104
-0.099
 sandbox-0.098
Token that
Token ID326
Feature activation-6.59e-02

Pos logit contributions
suits+0.011
mons+0.010
 thirsty+0.010
 together+0.009
+0.009
Neg logit contributions
 apology-0.014
eness-0.013
heart-0.012
 adult-0.012
 decision-0.012
Token he
Token ID339
Feature activation-1.60e-01

Pos logit contributions
 apology+0.124
eness+0.123
heart+0.111
 adult+0.110
est+0.107
Neg logit contributions
suits-0.141
mons-0.127
-0.127
 thirsty-0.126
 sandbox-0.118
Token was
Token ID373
Feature activation-1.47e-01

Pos logit contributions
 apology+0.321
eness+0.314
fulness+0.291
heart+0.287
 adult+0.281
Neg logit contributions
suits-0.332
 thirsty-0.297
mons-0.289
-0.282
aped-0.276
Token girl
Token ID2576
Feature activation-6.48e-02

Pos logit contributions
 apology+0.103
heart+0.097
eness+0.097
 adult+0.096
est+0.095
Neg logit contributions
suits-0.182
-0.171
mons-0.164
 thirsty-0.160
 funnel-0.154
Token was
Token ID373
Feature activation-1.50e-01

Pos logit contributions
 apology+0.116
eness+0.115
heart+0.105
 adult+0.104
fulness+0.100
Neg logit contributions
suits-0.145
-0.131
mons-0.131
 thirsty-0.128
 sandbox-0.121
Token playing
Token ID2712
Feature activation+3.69e-01

Pos logit contributions
 adult+0.332
 apology+0.311
 himself+0.300
 lesson+0.298
 decision+0.295
Neg logit contributions
suits-0.275
-0.215
mons-0.201
suit-0.192
 funnel-0.191
Token on
Token ID319
Feature activation+2.24e-01

Pos logit contributions
suits+0.677
+0.554
mons+0.540
 thirsty+0.520
 funnel+0.489
Neg logit contributions
 apology-0.838
eness-0.823
est-0.794
 adult-0.788
 himself-0.784
Token the
Token ID262
Feature activation-2.45e-02

Pos logit contributions
suits+0.662
mons+0.559
suit+0.535
 thirsty+0.535
aped+0.533
Neg logit contributions
 apology-0.260
 himself-0.259
 adult-0.258
 lesson-0.239
 moral-0.228
Token swing
Token ID9628
Feature activation+4.03e-01

Pos logit contributions
 apology+0.072
 adult+0.069
 decision+0.068
est+0.068
 lesson+0.068
Neg logit contributions
suits-0.026
-0.020
mons-0.018
etchup-0.017
 thirsty-0.017
Token,
Token ID11
Feature activation+3.62e-02

Pos logit contributions
suits+0.870
mons+0.754
+0.746
 thirsty+0.733
 sandbox+0.678
Neg logit contributions
eness-0.765
 apology-0.737
est-0.725
heart-0.719
 adult-0.700
Token she
Token ID673
Feature activation+1.16e-01

Pos logit contributions
suits+0.081
+0.068
mons+0.067
coat+0.062
 funnel+0.062
Neg logit contributions
 apology-0.067
eness-0.064
heart-0.064
 adult-0.063
 himself-0.061
Token lost
Token ID2626
Feature activation+1.29e-01

Pos logit contributions
suits+0.220
+0.207
mons+0.200
 thirsty+0.177
 funnel+0.177
Neg logit contributions
eness-0.227
 apology-0.225
heart-0.223
 adult-0.218
est-0.213
Token her
Token ID607
Feature activation-1.20e-01

Pos logit contributions
suits+0.309
+0.282
mons+0.280
 thirsty+0.273
 sandbox+0.258
Neg logit contributions
 apology-0.214
eness-0.213
heart-0.193
 adult-0.192
est-0.183
Token balance
Token ID5236
Feature activation+2.57e-03

Pos logit contributions
 apology+0.243
eness+0.239
 adult+0.221
heart+0.219
 decision+0.214
Neg logit contributions
suits-0.241
-0.214
mons-0.213
 thirsty-0.207
aped-0.194
Token didn
Token ID1422
Feature activation-1.56e-01

Pos logit contributions
eness+0.629
 apology+0.616
fulness+0.594
heart+0.567
est+0.559
Neg logit contributions
suits-0.442
 thirsty-0.395
-0.355
aped-0.352
mons-0.351
Token't
Token ID470
Feature activation+3.06e-02

Pos logit contributions
eness+0.409
heart+0.400
 apology+0.398
est+0.378
 decision+0.369
Neg logit contributions
-0.225
suits-0.198
mons-0.179
 thirsty-0.175
 funnel-0.158
Token know
Token ID760
Feature activation-2.45e-01

Pos logit contributions
suits+0.063
 thirsty+0.058
coat+0.054
mons+0.052
 swing+0.052
Neg logit contributions
eness-0.058
 apology-0.057
heart-0.054
ulously-0.051
 adult-0.051
Token what
Token ID644
Feature activation-1.73e-01

Pos logit contributions
 apology+0.431
eness+0.422
heart+0.375
 decision+0.367
 adult+0.361
Neg logit contributions
suits-0.536
 thirsty-0.496
mons-0.488
-0.485
 sandbox-0.469
Token was
Token ID373
Feature activation-1.38e-01

Pos logit contributions
eness+0.411
 apology+0.388
heart+0.378
 decision+0.358
est+0.354
Neg logit contributions
suits-0.294
mons-0.268
 thirsty-0.264
-0.252
 watering-0.236
Token making
Token ID1642
Feature activation+2.82e-01

Pos logit contributions
 apology+0.278
eness+0.276
heart+0.253
 adult+0.252
fulness+0.244
Neg logit contributions
suits-0.282
-0.252
mons-0.251
 thirsty-0.245
 sandbox-0.229
Token the
Token ID262
Feature activation-1.93e-02

Pos logit contributions
suits+0.492
+0.473
 thirsty+0.457
mons+0.438
 together+0.437
Neg logit contributions
 apology-0.579
eness-0.548
 adult-0.507
heart-0.504
 decision-0.493
Token noise
Token ID7838
Feature activation-1.93e-01

Pos logit contributions
eness+0.045
fulness+0.043
 apology+0.043
heart+0.041
 adult+0.040
Neg logit contributions
 thirsty-0.029
suits-0.029
mons-0.028
-0.028
 sandbox-0.028
Token,
Token ID11
Feature activation+4.43e-02

Pos logit contributions
 apology+0.429
eness+0.411
 adult+0.385
heart+0.383
 decision+0.368
Neg logit contributions
suits-0.346
 thirsty-0.311
-0.306
mons-0.298
coat-0.279
Token but
Token ID475
Feature activation+8.93e-02

Pos logit contributions
suits+0.063
 thirsty+0.062
+0.054
 sandbox+0.053
mons+0.053
Neg logit contributions
 apology-0.111
eness-0.110
heart-0.101
 adult-0.101
fulness-0.098
Token he
Token ID339
Feature activation-1.54e-01

Pos logit contributions
 thirsty+0.171
suits+0.168
+0.155
chard+0.154
 together+0.148
Neg logit contributions
eness-0.185
 apology-0.182
 adult-0.158
heart-0.158
est-0.156
Token pushing
Token ID7796
Feature activation+2.01e-01

Pos logit contributions
suits+0.075
+0.067
mons+0.067
 thirsty+0.062
coat+0.059
Neg logit contributions
 apology-0.085
eness-0.084
heart-0.078
 adult-0.078
est-0.076
Token the
Token ID262
Feature activation-2.00e-02

Pos logit contributions
suits+0.384
mons+0.310
 thirsty+0.307
 cups+0.301
 together+0.296
Neg logit contributions
 apology-0.440
eness-0.412
 adult-0.391
heart-0.385
oral-0.384
Token shadows
Token ID16187
Feature activation+1.39e-02

Pos logit contributions
 apology+0.040
eness+0.039
 adult+0.036
heart+0.036
est+0.035
Neg logit contributions
suits-0.041
-0.038
mons-0.037
 thirsty-0.036
aped-0.034
Token away
Token ID1497
Feature activation-5.55e-02

Pos logit contributions
suits+0.027
mons+0.025
+0.025
 thirsty+0.023
 sandbox+0.022
Neg logit contributions
eness-0.029
 apology-0.028
heart-0.026
fulness-0.026
est-0.026
Token and
Token ID290
Feature activation-6.04e-02

Pos logit contributions
eness+0.110
 apology+0.107
heart+0.100
 adult+0.099
est+0.098
Neg logit contributions
suits-0.114
-0.105
mons-0.104
 thirsty-0.100
coat-0.094
Token making
Token ID1642
Feature activation+2.82e-01

Pos logit contributions
 apology+0.136
eness+0.136
heart+0.127
 adult+0.126
fulness+0.123
Neg logit contributions
suits-0.107
-0.095
mons-0.093
 thirsty-0.088
coat-0.083
Token things
Token ID1243
Feature activation-1.02e-01

Pos logit contributions
suits+0.519
 thirsty+0.480
mons+0.447
 sandbox+0.444
 cups+0.443
Neg logit contributions
 apology-0.579
eness-0.576
 adult-0.530
heart-0.517
fulness-0.502
Token look
Token ID804
Feature activation+5.17e-02

Pos logit contributions
eness+0.194
 apology+0.193
heart+0.176
 adult+0.173
fulness+0.172
Neg logit contributions
suits-0.219
mons-0.197
-0.197
 thirsty-0.187
aped-0.178
Token beautiful
Token ID4950
Feature activation-2.45e-02

Pos logit contributions
suits+0.104
 thirsty+0.093
mons+0.088
+0.088
 sandbox+0.084
Neg logit contributions
 apology-0.106
eness-0.104
 adult-0.096
heart-0.094
fulness-0.090
Token.
Token ID13
Feature activation+1.86e-02

Pos logit contributions
eness+0.047
 apology+0.045
heart+0.043
est+0.043
fulness+0.041
Neg logit contributions
suits-0.052
-0.048
mons-0.047
 thirsty-0.046
aped-0.043
Token He
Token ID679
Feature activation-2.70e-01

Pos logit contributions
suits+0.037
+0.034
mons+0.033
 thirsty+0.033
aped+0.031
Neg logit contributions
eness-0.038
 apology-0.038
heart-0.034
fulness-0.034
 decision-0.033
Token there
Token ID612
Feature activation+3.20e-01

Pos logit contributions
suits+0.063
mons+0.058
+0.057
coat+0.052
angers+0.050
Neg logit contributions
 apology-0.095
 himself-0.095
est-0.094
heart-0.091
 moral-0.091
Token was
Token ID373
Feature activation-1.44e-01

Pos logit contributions
suits+0.661
+0.621
 thirsty+0.589
mons+0.582
 sandbox+0.562
Neg logit contributions
eness-0.642
 apology-0.637
 adult-0.560
heart-0.556
fulness-0.537
Token a
Token ID257
Feature activation+3.34e-02

Pos logit contributions
 apology+0.195
 himself+0.194
est+0.182
 lesson+0.180
eness+0.179
Neg logit contributions
suits-0.375
mons-0.334
-0.328
coat-0.313
drops-0.311
Token little
Token ID1310
Feature activation-1.50e-01

Pos logit contributions
suits+0.098
mons+0.096
+0.092
aped+0.091
packs+0.089
Neg logit contributions
 apology-0.033
 adult-0.030
 moral-0.029
 decision-0.027
heart-0.026
Token boy
Token ID2933
Feature activation-5.68e-02

Pos logit contributions
 apology+0.181
heart+0.178
 adult+0.174
est+0.167
eness+0.162
Neg logit contributions
suits-0.396
-0.383
mons-0.380
aped-0.365
angers-0.346
Token named
Token ID3706
Feature activation+3.13e-01

Pos logit contributions
 apology+0.134
 adult+0.131
heart+0.131
eness+0.130
est+0.126
Neg logit contributions
suits-0.092
-0.080
mons-0.077
owers-0.067
 sandbox-0.066
Token Tim
Token ID5045
Feature activation+1.34e-01

Pos logit contributions
suits+0.615
mons+0.564
+0.555
aped+0.513
 thirsty+0.508
Neg logit contributions
 apology-0.638
 himself-0.603
 adult-0.602
eness-0.592
heart-0.573
Tokenmy
Token ID1820
Feature activation-1.95e-01

Pos logit contributions
suits+0.277
 thirsty+0.245
coat+0.236
 sandbox+0.235
packs+0.230
Neg logit contributions
 apology-0.264
eness-0.259
fulness-0.231
 adult-0.227
heart-0.226
Token.
Token ID13
Feature activation+2.02e-02

Pos logit contributions
 apology+0.394
eness+0.392
heart+0.361
 adult+0.361
est+0.349
Neg logit contributions
suits-0.392
-0.352
mons-0.349
 thirsty-0.336
 sandbox-0.315
Token Tim
Token ID5045
Feature activation+1.35e-01

Pos logit contributions
suits+0.045
mons+0.040
 thirsty+0.038
+0.038
aped+0.037
Neg logit contributions
 apology-0.036
eness-0.035
 himself-0.034
est-0.033
heart-0.032
Tokenmy
Token ID1820
Feature activation-1.95e-01

Pos logit contributions
suits+0.287
 thirsty+0.256
 sandbox+0.246
coat+0.243
 hall+0.233
Neg logit contributions
 apology-0.268
eness-0.259
fulness-0.239
 adult-0.238
heart-0.224
TokenOnce
Token ID7454
Feature activation+3.55e-02

Pos logit contributions
 himself+0.131
eness+0.130
 apology+0.129
est+0.124
fulness+0.120
Neg logit contributions
suits-0.196
mons-0.191
 thirsty-0.184
 sandbox-0.174
coat-0.174
Token upon
Token ID2402
Feature activation+5.40e-02

Pos logit contributions
suits+0.058
+0.050
 thirsty+0.050
mons+0.048
 sandbox+0.044
Neg logit contributions
eness-0.087
 apology-0.085
heart-0.079
 adult-0.078
est-0.076
Token a
Token ID257
Feature activation+3.35e-02

Pos logit contributions
suits+0.104
+0.092
mons+0.091
 thirsty+0.090
 sandbox+0.083
Neg logit contributions
 apology-0.114
eness-0.113
heart-0.104
 adult-0.104
fulness-0.100
Token time
Token ID640
Feature activation-7.34e-02

Pos logit contributions
suits+0.082
mons+0.076
+0.075
 thirsty+0.073
aped+0.070
Neg logit contributions
 apology-0.052
eness-0.051
heart-0.046
 adult-0.046
 decision-0.045
Token,
Token ID11
Feature activation+4.90e-02

Pos logit contributions
 apology+0.113
eness+0.106
heart+0.105
est+0.105
 adult+0.104
Neg logit contributions
suits-0.171
mons-0.160
 thirsty-0.150
-0.150
coat-0.147
Token there
Token ID612
Feature activation+3.20e-01

Pos logit contributions
suits+0.079
mons+0.076
+0.075
coat+0.065
drops+0.063
Neg logit contributions
 himself-0.105
 apology-0.105
est-0.103
 moral-0.102
 adult-0.101
Token was
Token ID373
Feature activation-1.41e-01

Pos logit contributions
suits+0.685
+0.636
 thirsty+0.614
mons+0.605
 sandbox+0.586
Neg logit contributions
eness-0.620
 apology-0.618
heart-0.542
 adult-0.541
fulness-0.519
Token a
Token ID257
Feature activation+3.41e-02

Pos logit contributions
 himself+0.185
 apology+0.184
 moral+0.167
 lesson+0.166
est+0.165
Neg logit contributions
suits-0.379
mons-0.336
-0.329
drops-0.315
coat-0.314
Token hipp
Token ID18568
Feature activation+1.68e-01

Pos logit contributions
suits+0.100
mons+0.097
+0.093
aped+0.091
packs+0.089
Neg logit contributions
 apology-0.034
 adult-0.032
 moral-0.030
 decision-0.028
heart-0.027
Tokeno
Token ID78
Feature activation+1.99e-01

Pos logit contributions
suits+0.273
+0.238
mons+0.227
 thirsty+0.218
coat+0.212
Neg logit contributions
eness-0.393
 apology-0.389
heart-0.376
 adult-0.372
est-0.366
Token named
Token ID3706
Feature activation+3.15e-01

Pos logit contributions
suits+0.306
mons+0.264
+0.263
 thirsty+0.237
 sandbox+0.226
Neg logit contributions
 apology-0.494
eness-0.488
 adult-0.471
heart-0.470
est-0.453
Token.
Token ID13
Feature activation+5.28e-02

Pos logit contributions
 apology+0.124
eness+0.121
heart+0.115
 adult+0.114
 decision+0.111
Neg logit contributions
suits-0.084
mons-0.073
-0.072
 thirsty-0.070
coat-0.063
Token<|endoftext|>
Token ID50256
Feature activation-8.13e-02

Pos logit contributions
suits+0.133
+0.122
mons+0.121
 thirsty+0.119
 sandbox+0.113
Neg logit contributions
eness-0.081
 apology-0.080
heart-0.072
 adult-0.071
est-0.068
TokenTim
Token ID14967
Feature activation+1.04e-01

Pos logit contributions
 himself+0.150
 apology+0.149
eness+0.148
est+0.141
 adult+0.137
Neg logit contributions
suits-0.159
mons-0.148
 thirsty-0.144
coat-0.139
 sandbox-0.135
Token and
Token ID290
Feature activation-4.80e-02

Pos logit contributions
suits+0.227
 thirsty+0.198
coat+0.192
 sandbox+0.190
+0.190
Neg logit contributions
 apology-0.203
eness-0.200
fulness-0.183
 mercy-0.176
 adult-0.174
Token Sam
Token ID3409
Feature activation+8.34e-02

Pos logit contributions
eness+0.075
 apology+0.071
heart+0.063
 adult+0.062
 decision+0.061
Neg logit contributions
suits-0.120
 thirsty-0.114
-0.110
mons-0.109
 sandbox-0.106
Token are
Token ID389
Feature activation+2.52e-02

Pos logit contributions
suits+0.151
+0.133
mons+0.132
 thirsty+0.129
 sandbox+0.119
Neg logit contributions
 apology-0.185
eness-0.184
heart-0.170
 adult-0.169
fulness-0.164
Token brothers
Token ID9397
Feature activation-1.79e-01

Pos logit contributions
suits+0.051
+0.046
mons+0.046
coat+0.042
drops+0.040
Neg logit contributions
 apology-0.050
 adult-0.048
eness-0.047
 himself-0.046
heart-0.045
Token.
Token ID13
Feature activation+3.70e-02

Pos logit contributions
eness+0.340
 apology+0.338
 adult+0.305
heart+0.304
est+0.296
Neg logit contributions
suits-0.380
mons-0.349
-0.334
 thirsty-0.324
aped-0.312
Token They
Token ID1119
Feature activation+2.28e-01

Pos logit contributions
suits+0.065
mons+0.060
+0.058
 thirsty+0.056
coat+0.055
Neg logit contributions
 apology-0.079
eness-0.075
 himself-0.074
est-0.073
 His-0.072
Token like
Token ID588
Feature activation+1.01e-01

Pos logit contributions
suits+0.439
+0.392
mons+0.391
 thirsty+0.372
coat+0.348
Neg logit contributions
 apology-0.480
eness-0.475
heart-0.436
 adult-0.434
est-0.423
Token to
Token ID284
Feature activation-4.27e-02

Pos logit contributions
suits+0.236
+0.217
mons+0.209
 thirsty+0.205
aped+0.195
Neg logit contributions
 apology-0.170
eness-0.163
 adult-0.154
heart-0.147
fulness-0.143
Token their
Token ID511
Feature activation+1.61e-01

Pos logit contributions
suits+0.139
mons+0.128
+0.127
 thirsty+0.122
coat+0.116
Neg logit contributions
 apology-0.128
eness-0.126
 adult-0.116
heart-0.116
est-0.112
Token eyes
Token ID2951
Feature activation-2.42e-01

Pos logit contributions
suits+0.407
 thirsty+0.361
+0.357
 sandbox+0.351
 cups+0.349
Neg logit contributions
eness-0.255
 apology-0.243
heart-0.234
fulness-0.230
 adult-0.217
Token and
Token ID290
Feature activation-5.49e-02

Pos logit contributions
eness+0.417
 apology+0.409
heart+0.373
 adult+0.368
est+0.364
Neg logit contributions
suits-0.551
mons-0.512
-0.501
 thirsty-0.491
 sandbox-0.469
Token looked
Token ID3114
Feature activation+9.24e-02

Pos logit contributions
 apology+0.095
eness+0.092
heart+0.090
 himself+0.088
 adult+0.088
Neg logit contributions
-0.117
mons-0.109
suits-0.109
owers-0.097
coat-0.096
Token up
Token ID510
Feature activation-3.70e-02

Pos logit contributions
suits+0.181
+0.179
mons+0.174
aped+0.154
coat+0.154
Neg logit contributions
 apology-0.184
eness-0.179
heart-0.170
est-0.165
 adult-0.164
Token.
Token ID13
Feature activation+2.81e-02

Pos logit contributions
 apology+0.065
eness+0.065
heart+0.061
 himself+0.060
 adult+0.059
Neg logit contributions
suits-0.081
mons-0.076
-0.075
coat-0.070
 sandbox-0.068
Token They
Token ID1119
Feature activation+2.25e-01

Pos logit contributions
suits+0.053
coat+0.051
mons+0.051
 sandbox+0.049
+0.049
Neg logit contributions
 apology-0.053
 adult-0.047
eness-0.046
 himself-0.046
 His-0.046
Token saw
Token ID2497
Feature activation+1.61e-01

Pos logit contributions
suits+0.433
mons+0.428
+0.422
coat+0.365
 thirsty+0.363
Neg logit contributions
eness-0.433
 apology-0.423
heart-0.406
 adult-0.399
 himself-0.389
Token the
Token ID262
Feature activation-1.55e-02

Pos logit contributions
suits+0.300
 thirsty+0.264
mons+0.263
+0.258
 sandbox+0.245
Neg logit contributions
eness-0.350
 apology-0.350
heart-0.324
 adult-0.319
 decision-0.312
Token plane
Token ID6614
Feature activation+1.26e-01

Pos logit contributions
 apology+0.034
eness+0.033
 adult+0.031
 decision+0.031
heart+0.030
Neg logit contributions
suits-0.029
mons-0.026
-0.026
 thirsty-0.024
aped-0.023
Token fly
Token ID6129
Feature activation+1.71e-01

Pos logit contributions
suits+0.246
mons+0.230
+0.221
 thirsty+0.209
coat+0.194
Neg logit contributions
eness-0.264
 apology-0.255
est-0.240
heart-0.240
 decision-0.237
Token window
Token ID4324
Feature activation+1.87e-01

Pos logit contributions
 apology+0.042
eness+0.041
 adult+0.039
heart+0.038
 decision+0.038
Neg logit contributions
suits-0.028
mons-0.024
-0.024
 thirsty-0.023
aped-0.022
Token and
Token ID290
Feature activation-5.35e-02

Pos logit contributions
suits+0.412
mons+0.371
+0.371
 thirsty+0.362
 sandbox+0.340
Neg logit contributions
eness-0.341
 apology-0.341
heart-0.312
 adult-0.309
est-0.298
Token saw
Token ID2497
Feature activation+1.64e-01

Pos logit contributions
 apology+0.109
eness+0.108
heart+0.106
 adult+0.104
 himself+0.102
Neg logit contributions
suits-0.099
-0.095
mons-0.091
coat-0.081
 thirsty-0.081
Token her
Token ID607
Feature activation-1.10e-01

Pos logit contributions
suits+0.328
mons+0.294
+0.293
 thirsty+0.288
 sandbox+0.270
Neg logit contributions
 apology-0.330
eness-0.329
heart-0.301
 adult-0.299
fulness-0.290
Token neighbour
Token ID12250
Feature activation-1.70e-01

Pos logit contributions
 apology+0.182
eness+0.177
 adult+0.162
 decision+0.160
heart+0.159
Neg logit contributions
suits-0.263
mons-0.238
-0.236
 thirsty-0.230
coat-0.220
Token in
Token ID287
Feature activation+1.68e-01

Pos logit contributions
eness+0.352
 apology+0.351
heart+0.332
 adult+0.330
 decision+0.320
Neg logit contributions
suits-0.330
mons-0.295
-0.286
 thirsty-0.270
aped-0.259
Token the
Token ID262
Feature activation-1.59e-02

Pos logit contributions
suits+0.385
mons+0.352
+0.338
 thirsty+0.333
aped+0.317
Neg logit contributions
 apology-0.298
eness-0.296
 adult-0.271
heart-0.270
 decision-0.260
Token garden
Token ID11376
Feature activation+2.79e-01

Pos logit contributions
 apology+0.037
eness+0.036
 adult+0.036
est+0.035
 decision+0.034
Neg logit contributions
suits-0.026
mons-0.024
-0.023
aped-0.021
 thirsty-0.021
Token.
Token ID13
Feature activation+2.57e-02

Pos logit contributions
suits+0.598
mons+0.536
+0.525
 thirsty+0.503
coat+0.480
Neg logit contributions
eness-0.532
 apology-0.518
heart-0.499
 adult-0.486
 decision-0.474
Token He
Token ID679
Feature activation-2.64e-01

Pos logit contributions
suits+0.057
+0.052
mons+0.051
 thirsty+0.051
 sandbox+0.047
Neg logit contributions
 apology-0.046
eness-0.046
heart-0.042
 adult-0.041
fulness-0.040
Token looked
Token ID3114
Feature activation+9.23e-02

Pos logit contributions
 apology+0.542
eness+0.539
heart+0.479
fulness+0.477
 adult+0.475
Neg logit contributions
suits-0.535
 thirsty-0.472
-0.471
mons-0.461
aped-0.435
Token
Token ID198
Feature activation+2.61e-02

Pos logit contributions
suits+0.043
mons+0.038
 sandbox+0.038
 thirsty+0.038
coat+0.037
Neg logit contributions
 apology-0.037
eness-0.036
 adult-0.034
 himself-0.033
heart-0.032
Token
Token ID198
Feature activation+2.57e-02

Pos logit contributions
suits+0.049
+0.043
mons+0.043
 thirsty+0.042
 sandbox+0.039
Neg logit contributions
 apology-0.056
eness-0.056
 adult-0.052
heart-0.051
fulness-0.050
TokenH
Token ID39
Feature activation+2.02e-02

Pos logit contributions
suits+0.055
mons+0.049
+0.048
 thirsty+0.048
 sandbox+0.046
Neg logit contributions
eness-0.047
 apology-0.046
 adult-0.044
heart-0.043
 himself-0.041
Tokenank
Token ID962
Feature activation+1.55e-01

Pos logit contributions
suits+0.031
+0.027
mons+0.027
 thirsty+0.026
aped+0.023
Neg logit contributions
eness-0.051
 apology-0.051
heart-0.047
 adult-0.047
est-0.046
Token felt
Token ID2936
Feature activation-9.18e-02

Pos logit contributions
suits+0.307
mons+0.265
 thirsty+0.264
+0.259
aped+0.249
Neg logit contributions
 apology-0.324
eness-0.321
fulness-0.286
heart-0.286
 adult-0.286
Token a
Token ID257
Feature activation+3.17e-02

Pos logit contributions
eness+0.211
 apology+0.206
heart+0.194
 adult+0.192
fulness+0.187
Neg logit contributions
 thirsty-0.161
suits-0.155
-0.138
mons-0.135
 watering-0.129
Token bit
Token ID1643
Feature activation+1.29e-01

Pos logit contributions
suits+0.061
+0.055
 thirsty+0.054
mons+0.052
 sandbox+0.049
Neg logit contributions
eness-0.066
 apology-0.066
heart-0.061
 adult-0.060
fulness-0.059
Token hurt
Token ID5938
Feature activation-2.25e-02

Pos logit contributions
suits+0.242
 thirsty+0.226
+0.210
mons+0.207
 watering+0.191
Neg logit contributions
eness-0.273
 apology-0.271
 adult-0.250
heart-0.250
fulness-0.248
Token because
Token ID780
Feature activation-6.10e-03

Pos logit contributions
 apology+0.049
eness+0.046
 adult+0.042
 decision+0.042
heart+0.042
Neg logit contributions
suits-0.042
 thirsty-0.037
mons-0.037
-0.036
coat-0.034
Token he
Token ID339
Feature activation-1.56e-01

Pos logit contributions
eness+0.013
 apology+0.013
heart+0.011
 adult+0.011
est+0.011
Neg logit contributions
suits-0.012
 thirsty-0.011
-0.011
mons-0.011
 sandbox-0.010
Token didn
Token ID1422
Feature activation-1.61e-01

Pos logit contributions
 apology+0.361
eness+0.360
fulness+0.332
heart+0.321
est+0.319
Neg logit contributions
suits-0.290
 thirsty-0.266
-0.250
aped-0.241
mons-0.239
Token's
Token ID338
Feature activation-1.16e-01

Pos logit contributions
eness+0.124
 apology+0.124
fulness+0.109
heart+0.107
 adult+0.106
Neg logit contributions
suits-0.152
 thirsty-0.136
mons-0.136
-0.135
 sandbox-0.126
Token go
Token ID467
Feature activation-8.48e-02

Pos logit contributions
 apology+0.202
eness+0.197
 adult+0.185
heart+0.183
 decision+0.179
Neg logit contributions
suits-0.269
mons-0.243
-0.236
 thirsty-0.230
aped-0.220
Token!"
Token ID2474
Feature activation+5.25e-02

Pos logit contributions
 apology+0.147
 himself+0.140
 adult+0.140
eness+0.138
est+0.137
Neg logit contributions
suits-0.180
mons-0.175
-0.170
 thirsty-0.161
aped-0.159
Token Lily
Token ID20037
Feature activation+1.25e-01

Pos logit contributions
suits+0.084
+0.073
mons+0.072
 thirsty+0.070
 sandbox+0.064
Neg logit contributions
 apology-0.129
eness-0.128
heart-0.119
 adult-0.119
fulness-0.116
Token said
Token ID531
Feature activation-2.98e-02

Pos logit contributions
suits+0.266
+0.238
mons+0.237
 thirsty+0.232
 sandbox+0.217
Neg logit contributions
 apology-0.239
eness-0.238
heart-0.217
 adult-0.216
fulness-0.208
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
 apology+0.057
eness+0.056
heart+0.050
fulness+0.050
 adult+0.049
Neg logit contributions
suits-0.064
mons-0.058
-0.058
 thirsty-0.056
 sandbox-0.055
Token
Token ID198
Feature activation+2.30e-02

Pos logit contributions
suits+0.048
 thirsty+0.043
mons+0.043
 sandbox+0.042
coat+0.041
Neg logit contributions
 apology-0.036
 adult-0.033
eness-0.033
 himself-0.033
 His-0.031
Token
Token ID198
Feature activation+2.36e-02

Pos logit contributions
suits+0.042
+0.037
mons+0.036
 thirsty+0.036
 sandbox+0.033
Neg logit contributions
 apology-0.051
eness-0.051
heart-0.047
 adult-0.047
est-0.045
TokenThey
Token ID2990
Feature activation+1.57e-01

Pos logit contributions
suits+0.042
+0.040
mons+0.039
 thirsty+0.038
coat+0.035
Neg logit contributions
eness-0.050
 apology-0.050
 adult-0.048
heart-0.047
 himself-0.045
Token held
Token ID2714
Feature activation+3.09e-02

Pos logit contributions
suits+0.272
+0.255
mons+0.254
 thirsty+0.239
 funnel+0.220
Neg logit contributions
 apology-0.331
eness-0.331
 adult-0.320
heart-0.310
est-0.304
Token hands
Token ID2832
Feature activation+6.65e-02

Pos logit contributions
suits+0.062
+0.060
mons+0.058
 thirsty+0.057
coat+0.055
Neg logit contributions
 apology-0.060
eness-0.056
est-0.052
 adult-0.052
 decision-0.051
Token.
Token ID13
Feature activation+3.44e-02

Pos logit contributions
suits+0.253
 thirsty+0.220
+0.218
mons+0.217
coat+0.199
Neg logit contributions
 apology-0.310
eness-0.307
 adult-0.285
heart-0.285
fulness-0.276
Token
Token ID198
Feature activation+3.70e-02

Pos logit contributions
suits+0.074
mons+0.065
+0.064
 thirsty+0.063
 sandbox+0.062
Neg logit contributions
 apology-0.064
eness-0.061
 adult-0.059
 himself-0.056
fulness-0.056
Token
Token ID198
Feature activation+3.71e-02

Pos logit contributions
suits+0.064
+0.056
mons+0.056
 thirsty+0.054
 sandbox+0.050
Neg logit contributions
eness-0.085
 apology-0.084
heart-0.079
 adult-0.077
est-0.077
Token"
Token ID1
Feature activation-5.94e-02

Pos logit contributions
suits+0.088
+0.083
mons+0.080
 thirsty+0.080
coat+0.076
Neg logit contributions
 apology-0.060
eness-0.058
 adult-0.054
heart-0.053
 himself-0.051
TokenI
Token ID40
Feature activation+6.18e-02

Pos logit contributions
 apology+0.118
eness+0.113
 moral+0.106
 lesson+0.105
 adult+0.104
Neg logit contributions
suits-0.120
-0.102
 thirsty-0.102
mons-0.099
 grazing-0.098
Token wish
Token ID4601
Feature activation-2.65e-01

Pos logit contributions
suits+0.134
 thirsty+0.120
+0.119
mons+0.116
coat+0.110
Neg logit contributions
eness-0.118
 apology-0.116
heart-0.107
 adult-0.107
fulness-0.102
Token for
Token ID329
Feature activation-8.94e-02

Pos logit contributions
 apology+0.587
eness+0.573
heart+0.538
 adult+0.536
fulness+0.531
Neg logit contributions
suits-0.491
-0.435
mons-0.425
 thirsty-0.410
coat-0.391
Token a
Token ID257
Feature activation+4.02e-02

Pos logit contributions
eness+0.193
 apology+0.191
heart+0.180
 adult+0.175
est+0.173
Neg logit contributions
suits-0.163
 thirsty-0.146
-0.144
mons-0.140
 cups-0.134
Token big
Token ID1263
Feature activation+9.10e-03

Pos logit contributions
suits+0.091
 thirsty+0.081
+0.081
mons+0.079
 sandbox+0.076
Neg logit contributions
eness-0.073
 apology-0.070
heart-0.065
 adult-0.063
fulness-0.061
Token ice
Token ID4771
Feature activation-3.22e-02

Pos logit contributions
suits+0.019
+0.017
 thirsty+0.017
mons+0.017
 sandbox+0.016
Neg logit contributions
eness-0.018
 apology-0.017
heart-0.016
 adult-0.015
fulness-0.015
Token cream
Token ID8566
Feature activation+2.92e-02

Pos logit contributions
 apology+0.060
eness+0.059
heart+0.054
 adult+0.054
est+0.052
Neg logit contributions
suits-0.070
-0.063
mons-0.062
 thirsty-0.061
aped-0.058
Token
Token ID198
Feature activation+2.70e-02

Pos logit contributions
suits+0.130
+0.115
mons+0.113
 thirsty+0.110
aped+0.100
Neg logit contributions
eness-0.171
 apology-0.171
heart-0.159
 adult-0.157
est-0.153
Token
Token ID198
Feature activation+2.69e-02

Pos logit contributions
suits+0.048
+0.042
mons+0.041
 thirsty+0.040
 sandbox+0.037
Neg logit contributions
eness-0.061
 apology-0.061
heart-0.057
 adult-0.056
est-0.055
TokenJohn
Token ID7554
Feature activation-8.25e-02

Pos logit contributions
suits+0.054
+0.049
mons+0.046
 thirsty+0.046
 sandbox+0.045
Neg logit contributions
 apology-0.055
eness-0.054
heart-0.049
est-0.049
 decision-0.048
Token was
Token ID373
Feature activation-1.41e-01

Pos logit contributions
 apology+0.178
eness+0.169
fulness+0.158
 adult+0.157
 decision+0.154
Neg logit contributions
suits-0.162
-0.138
 thirsty-0.136
aped-0.135
mons-0.131
Token so
Token ID523
Feature activation-5.10e-02

Pos logit contributions
 apology+0.287
eness+0.285
heart+0.261
 adult+0.260
fulness+0.253
Neg logit contributions
suits-0.280
-0.251
 thirsty-0.249
mons-0.247
 sandbox-0.228
Token happy
Token ID3772
Feature activation-6.79e-03

Pos logit contributions
eness+0.122
 apology+0.120
 adult+0.111
heart+0.111
fulness+0.108
Neg logit contributions
suits-0.085
 thirsty-0.079
-0.075
mons-0.072
aped-0.068
Token.
Token ID13
Feature activation+2.77e-02

Pos logit contributions
 apology+0.014
eness+0.014
 adult+0.013
heart+0.012
est+0.012
Neg logit contributions
suits-0.013
-0.012
mons-0.012
 thirsty-0.012
 sandbox-0.011
Token He
Token ID679
Feature activation-2.68e-01

Pos logit contributions
suits+0.060
+0.058
 thirsty+0.054
mons+0.051
 sandbox+0.050
Neg logit contributions
eness-0.049
 apology-0.049
 mercy-0.043
heart-0.042
 adult-0.042
Token signed
Token ID4488
Feature activation+5.12e-02

Pos logit contributions
 apology+0.553
eness+0.551
heart+0.503
 adult+0.500
fulness+0.487
Neg logit contributions
suits-0.534
-0.474
 thirsty-0.464
mons-0.462
aped-0.430
Token every
Token ID790
Feature activation+4.55e-03

Pos logit contributions
suits+0.102
 thirsty+0.090
mons+0.087
 together+0.085
 cups+0.082
Neg logit contributions
 apology-0.104
eness-0.103
heart-0.093
 adult-0.089
 decision-0.089
Token day
Token ID1110
Feature activation+1.59e-02

Pos logit contributions
suits+0.008
+0.007
 thirsty+0.007
mons+0.006
 pool+0.006
Neg logit contributions
eness-0.010
 apology-0.010
heart-0.009
 himself-0.009
erence-0.009
Token He
Token ID679
Feature activation-2.64e-01

Pos logit contributions
suits+0.061
mons+0.055
+0.055
 thirsty+0.054
 sandbox+0.051
Neg logit contributions
 apology-0.036
eness-0.035
 adult-0.032
heart-0.031
fulness-0.030
Token feels
Token ID5300
Feature activation-2.02e-01

Pos logit contributions
eness+0.515
 apology+0.506
heart+0.470
 adult+0.465
fulness+0.465
Neg logit contributions
suits-0.553
 thirsty-0.502
-0.496
mons-0.495
 sandbox-0.470
Token a
Token ID257
Feature activation+3.66e-02

Pos logit contributions
eness+0.468
 apology+0.459
heart+0.441
 adult+0.433
fulness+0.419
Neg logit contributions
 thirsty-0.350
suits-0.332
mons-0.294
-0.292
 watering-0.279
Token shock
Token ID6380
Feature activation-5.69e-02

Pos logit contributions
suits+0.068
+0.063
 thirsty+0.063
 watering+0.057
mons+0.056
Neg logit contributions
eness-0.077
 apology-0.076
heart-0.072
 adult-0.070
erence-0.069
Token.
Token ID13
Feature activation+2.35e-02

Pos logit contributions
 apology+0.124
eness+0.123
heart+0.114
 adult+0.114
 decision+0.110
Neg logit contributions
suits-0.105
-0.093
mons-0.092
 thirsty-0.090
 sandbox-0.083
Token He
Token ID679
Feature activation-2.64e-01

Pos logit contributions
suits+0.061
+0.056
mons+0.056
 thirsty+0.055
 sandbox+0.052
Neg logit contributions
 apology-0.034
eness-0.033
heart-0.030
 adult-0.029
fulness-0.028
Token cries
Token ID24691
Feature activation-2.07e-01

Pos logit contributions
eness+0.530
 apology+0.520
heart+0.485
fulness+0.485
 adult+0.481
Neg logit contributions
suits-0.542
 thirsty-0.490
-0.478
mons-0.475
 sandbox-0.459
Token out
Token ID503
Feature activation+8.83e-02

Pos logit contributions
 apology+0.441
eness+0.432
heart+0.407
 adult+0.401
 decision+0.393
Neg logit contributions
suits-0.400
 thirsty-0.348
mons-0.345
-0.344
aped-0.317
Token loud
Token ID7812
Feature activation-9.03e-02

Pos logit contributions
suits+0.169
 thirsty+0.154
+0.149
mons+0.146
 watering+0.143
Neg logit contributions
eness-0.178
 apology-0.177
heart-0.162
 adult-0.156
 decision-0.154
Token.
Token ID13
Feature activation+2.30e-02

Pos logit contributions
 apology+0.203
eness+0.198
 adult+0.188
heart+0.186
 decision+0.182
Neg logit contributions
suits-0.161
mons-0.137
-0.137
 thirsty-0.135
coat-0.125
Token He
Token ID679
Feature activation-2.64e-01

Pos logit contributions
suits+0.046
+0.041
mons+0.041
 thirsty+0.040
 sandbox+0.038
Neg logit contributions
 apology-0.046
eness-0.046
heart-0.042
 adult-0.042
fulness-0.041
Token -
Token ID532
Feature activation-1.64e-01

Pos logit contributions
suits+0.063
+0.056
mons+0.056
 thirsty+0.054
 sandbox+0.052
Neg logit contributions
eness-0.057
 apology-0.056
 adult-0.051
heart-0.051
fulness-0.051
Token the
Token ID262
Feature activation-2.12e-02

Pos logit contributions
 apology+0.321
eness+0.319
heart+0.292
 adult+0.292
fulness+0.281
Neg logit contributions
suits-0.339
-0.303
mons-0.302
 thirsty-0.294
 sandbox-0.276
Token pig
Token ID12967
Feature activation+7.88e-02

Pos logit contributions
 apology+0.044
eness+0.044
heart+0.041
 adult+0.040
fulness+0.039
Neg logit contributions
suits-0.042
-0.037
mons-0.037
 thirsty-0.037
 sandbox-0.034
Token was
Token ID373
Feature activation-1.45e-01

Pos logit contributions
suits+0.167
+0.150
mons+0.149
 thirsty+0.147
coat+0.137
Neg logit contributions
 apology-0.151
eness-0.148
 adult-0.136
heart-0.136
fulness-0.131
Token very
Token ID845
Feature activation+7.49e-03

Pos logit contributions
 apology+0.289
eness+0.287
heart+0.263
 adult+0.262
fulness+0.253
Neg logit contributions
suits-0.296
-0.264
mons-0.263
 thirsty-0.257
 sandbox-0.240
Token grateful
Token ID14066
Feature activation-8.06e-02

Pos logit contributions
suits+0.016
+0.015
mons+0.014
 thirsty+0.014
 sandbox+0.013
Neg logit contributions
eness-0.014
 apology-0.014
heart-0.013
 adult-0.013
fulness-0.012
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
 apology+0.181
eness+0.178
 adult+0.165
heart+0.160
est+0.155
Neg logit contributions
suits-0.144
mons-0.124
 thirsty-0.123
-0.118
 together-0.112
Token
Token ID198
Feature activation+2.44e-02

Pos logit contributions
suits+0.043
mons+0.039
+0.038
 thirsty+0.037
coat+0.035
Neg logit contributions
 apology-0.044
eness-0.043
heart-0.041
 adult-0.040
est-0.039
Token
Token ID198
Feature activation+2.51e-02

Pos logit contributions
suits+0.045
+0.040
mons+0.040
 thirsty+0.039
 sandbox+0.036
Neg logit contributions
 apology-0.053
eness-0.053
heart-0.049
 adult-0.049
fulness-0.047
TokenThe
Token ID464
Feature activation-4.32e-02

Pos logit contributions
suits+0.055
+0.050
mons+0.049
 thirsty+0.049
 sandbox+0.046
Neg logit contributions
 apology-0.047
eness-0.046
heart-0.042
 adult-0.042
est-0.041
Token farmer
Token ID18739
Feature activation-1.23e-01

Pos logit contributions
eness+0.091
 apology+0.089
heart+0.084
fulness+0.082
est+0.077
Neg logit contributions
suits-0.083
 thirsty-0.077
-0.075
mons-0.073
 grazing-0.068
Token One
Token ID1881
Feature activation-2.35e-02

Pos logit contributions
suits+0.067
aped+0.065
mons+0.064
illas+0.060
 grazing+0.060
Neg logit contributions
 apology-0.039
 moral-0.035
 His-0.035
est-0.035
 lesson-0.034
Token day
Token ID1110
Feature activation+2.20e-02

Pos logit contributions
eness+0.054
 apology+0.053
heart+0.048
 adult+0.048
fulness+0.047
Neg logit contributions
suits-0.042
 thirsty-0.037
mons-0.036
-0.036
coat-0.034
Token,
Token ID11
Feature activation+4.58e-02

Pos logit contributions
suits+0.055
+0.050
mons+0.049
 thirsty+0.049
 sandbox+0.049
Neg logit contributions
 apology-0.029
est-0.028
eness-0.027
 adult-0.027
 lesson-0.027
Token Lily
Token ID20037
Feature activation+1.26e-01

Pos logit contributions
suits+0.119
illas+0.099
ites+0.099
unks+0.098
+0.098
Neg logit contributions
 his-0.062
 apology-0.062
 lesson-0.060
 moral-0.057
 matter-0.057
Token's
Token ID338
Feature activation-1.14e-01

Pos logit contributions
suits+0.299
+0.288
mons+0.280
 thirsty+0.263
 funnel+0.256
Neg logit contributions
eness-0.185
 apology-0.184
heart-0.178
 adult-0.177
est-0.172
Token mom
Token ID1995
Feature activation-1.25e-01

Pos logit contributions
 apology+0.177
eness+0.177
heart+0.158
 adult+0.155
fulness+0.150
Neg logit contributions
suits-0.283
-0.259
mons-0.257
 thirsty-0.253
 sandbox-0.240
Token went
Token ID1816
Feature activation+4.24e-02

Pos logit contributions
 apology+0.249
eness+0.247
heart+0.226
 adult+0.224
fulness+0.218
Neg logit contributions
suits-0.258
-0.230
mons-0.229
 thirsty-0.225
coat-0.210
Token to
Token ID284
Feature activation-4.93e-02

Pos logit contributions
suits+0.109
+0.105
mons+0.103
owers+0.099
 sandbox+0.098
Neg logit contributions
 apology-0.056
 adult-0.056
 advice-0.049
 himself-0.048
eness-0.047
Token the
Token ID262
Feature activation-1.57e-02

Pos logit contributions
 apology+0.082
 adult+0.075
eness+0.070
heart+0.068
 himself+0.067
Neg logit contributions
suits-0.123
mons-0.112
-0.108
aped-0.104
 thirsty-0.102
Token store
Token ID3650
Feature activation+9.26e-02

Pos logit contributions
 apology+0.035
 adult+0.033
eness+0.033
heart+0.031
 decision+0.031
Neg logit contributions
suits-0.028
mons-0.025
-0.024
 thirsty-0.024
aped-0.023
Token and
Token ID290
Feature activation-5.45e-02

Pos logit contributions
+0.230
suits+0.228
owers+0.212
 funnel+0.210
 sandbox+0.210
Neg logit contributions
 adult-0.117
 apology-0.113
 himself-0.109
 advice-0.105
 decision-0.104
Token sure
Token ID1654
Feature activation-2.22e-01

Pos logit contributions
eness+0.096
 apology+0.095
 adult+0.086
fulness+0.086
heart+0.085
Neg logit contributions
suits-0.076
 thirsty-0.075
mons-0.073
 together-0.070
-0.069
Token no
Token ID645
Feature activation-5.98e-02

Pos logit contributions
eness+0.522
 apology+0.511
heart+0.473
est+0.461
 adult+0.456
Neg logit contributions
suits-0.350
 thirsty-0.326
mons-0.310
-0.301
 sandbox-0.300
Token one
Token ID530
Feature activation-2.76e-02

Pos logit contributions
eness+0.094
 apology+0.093
heart+0.080
fulness+0.080
 adult+0.080
Neg logit contributions
suits-0.140
 thirsty-0.132
-0.130
mons-0.129
 sandbox-0.126
Token distur
Token ID11239
Feature activation-7.40e-02

Pos logit contributions
eness+0.058
 apology+0.056
heart+0.052
 adult+0.051
fulness+0.049
Neg logit contributions
suits-0.054
 thirsty-0.049
mons-0.048
-0.047
 sandbox-0.044
Tokenbs
Token ID1443
Feature activation+2.49e-01

Pos logit contributions
 apology+0.218
eness+0.218
 adult+0.205
heart+0.205
est+0.201
Neg logit contributions
suits-0.081
mons-0.064
-0.063
 thirsty-0.061
coat-0.051
Token me
Token ID502
Feature activation-2.83e-01

Pos logit contributions
suits+0.453
mons+0.411
+0.409
 thirsty+0.404
 sandbox+0.375
Neg logit contributions
 apology-0.522
eness-0.499
 decision-0.463
 adult-0.463
heart-0.456
Token while
Token ID981
Feature activation+2.13e-01

Pos logit contributions
eness+0.577
 apology+0.577
heart+0.532
 adult+0.526
est+0.513
Neg logit contributions
suits-0.560
-0.502
mons-0.498
 thirsty-0.482
 sandbox-0.453
Token I
Token ID314
Feature activation+3.41e-02

Pos logit contributions
suits+0.456
+0.403
mons+0.403
 thirsty+0.386
coat+0.372
Neg logit contributions
 apology-0.408
eness-0.405
 adult-0.376
heart-0.376
fulness-0.363
Token'm
Token ID1101
Feature activation+6.46e-02

Pos logit contributions
suits+0.066
+0.059
mons+0.058
 thirsty+0.058
coat+0.053
Neg logit contributions
eness-0.071
 apology-0.071
heart-0.066
 adult-0.065
fulness-0.063
Token sleeping
Token ID11029
Feature activation+2.35e-01

Pos logit contributions
suits+0.134
mons+0.119
+0.116
 sandbox+0.108
aped+0.105
Neg logit contributions
 apology-0.131
eness-0.128
 adult-0.122
heart-0.120
fulness-0.115
Token.
Token ID13
Feature activation+3.16e-02

Pos logit contributions
suits+0.470
mons+0.422
+0.419
 thirsty+0.403
 sandbox+0.383
Neg logit contributions
eness-0.477
 apology-0.470
heart-0.439
 adult-0.433
est-0.425
Token thanked
Token ID26280
Feature activation-1.43e-01

Pos logit contributions
 apology+0.087
eness+0.087
heart+0.079
 adult+0.079
fulness+0.076
Neg logit contributions
suits-0.095
-0.085
mons-0.084
 thirsty-0.083
 sandbox-0.077
Token the
Token ID262
Feature activation-1.41e-02

Pos logit contributions
eness+0.317
 apology+0.306
heart+0.285
est+0.283
 decision+0.282
Neg logit contributions
suits-0.238
-0.232
mons-0.217
 thirsty-0.216
 sandbox-0.200
Token frog
Token ID21264
Feature activation+5.60e-02

Pos logit contributions
eness+0.029
 apology+0.028
heart+0.026
fulness+0.026
 adult+0.025
Neg logit contributions
suits-0.027
-0.025
 thirsty-0.025
mons-0.024
 sandbox-0.023
Token for
Token ID329
Feature activation-8.24e-02

Pos logit contributions
suits+0.109
 thirsty+0.095
+0.095
mons+0.094
 sandbox+0.087
Neg logit contributions
eness-0.115
 apology-0.114
 adult-0.106
heart-0.105
fulness-0.101
Token saving
Token ID8914
Feature activation+1.60e-01

Pos logit contributions
eness+0.151
 apology+0.146
heart+0.140
est+0.135
 adult+0.133
Neg logit contributions
suits-0.172
 thirsty-0.162
-0.162
mons-0.159
 grazing-0.151
Token him
Token ID683
Feature activation-3.24e-01

Pos logit contributions
suits+0.339
 thirsty+0.327
 together+0.310
mons+0.310
 cups+0.308
Neg logit contributions
eness-0.272
 apology-0.254
est-0.239
oral-0.235
heart-0.234
Token.
Token ID13
Feature activation+3.52e-02

Pos logit contributions
 apology+0.692
eness+0.687
heart+0.638
 adult+0.635
fulness+0.613
Neg logit contributions
suits-0.610
-0.537
mons-0.534
 thirsty-0.529
 sandbox-0.484
Token From
Token ID3574
Feature activation-1.67e-01

Pos logit contributions
suits+0.078
mons+0.070
+0.069
 thirsty+0.069
coat+0.066
Neg logit contributions
 apology-0.064
eness-0.062
 adult-0.057
heart-0.057
fulness-0.055
Token that
Token ID326
Feature activation-5.96e-02

Pos logit contributions
eness+0.330
 apology+0.328
heart+0.298
 adult+0.295
est+0.289
Neg logit contributions
suits-0.348
-0.310
mons-0.310
 thirsty-0.310
 sandbox-0.286
Token day
Token ID1110
Feature activation+2.15e-02

Pos logit contributions
eness+0.148
 apology+0.143
heart+0.132
 adult+0.131
 mercy+0.127
Neg logit contributions
suits-0.094
 thirsty-0.084
-0.083
mons-0.077
 hall-0.073
Token on
Token ID319
Feature activation+2.33e-01

Pos logit contributions
suits+0.029
mons+0.024
+0.024
 thirsty+0.023
 sandbox+0.022
Neg logit contributions
eness-0.058
 apology-0.057
 adult-0.054
heart-0.054
fulness-0.053
Token sun
Token ID4252
Feature activation+2.96e-02

Pos logit contributions
 apology+0.059
eness+0.058
heart+0.054
 adult+0.054
fulness+0.053
Neg logit contributions
suits-0.033
 thirsty-0.028
-0.028
mons-0.027
 sandbox-0.025
Token,
Token ID11
Feature activation+3.96e-02

Pos logit contributions
suits+0.061
+0.054
mons+0.054
 thirsty+0.053
 sandbox+0.049
Neg logit contributions
eness-0.059
 apology-0.057
heart-0.054
 adult-0.052
est-0.052
Token planting
Token ID26630
Feature activation+3.90e-01

Pos logit contributions
suits+0.075
+0.066
mons+0.066
 thirsty+0.061
coat+0.058
Neg logit contributions
 apology-0.085
eness-0.085
heart-0.079
 adult-0.078
est-0.076
Token and
Token ID290
Feature activation-5.93e-02

Pos logit contributions
suits+0.674
 thirsty+0.606
mons+0.597
 cups+0.552
+0.547
Neg logit contributions
 apology-0.906
eness-0.878
 adult-0.810
heart-0.809
 decision-0.796
Token harvesting
Token ID34102
Feature activation+1.73e-01

Pos logit contributions
 apology+0.147
eness+0.144
heart+0.138
 adult+0.138
fulness+0.133
Neg logit contributions
suits-0.093
-0.081
mons-0.079
 thirsty-0.070
coat-0.069
Token his
Token ID465
Feature activation-3.74e-01

Pos logit contributions
suits+0.332
 thirsty+0.302
mons+0.296
 cups+0.286
+0.277
Neg logit contributions
eness-0.357
 apology-0.353
heart-0.326
 adult-0.318
fulness-0.314
Token wheat
Token ID17135
Feature activation+1.76e-01

Pos logit contributions
 apology+0.731
eness+0.716
 adult+0.663
heart+0.660
 decision+0.643
Neg logit contributions
suits-0.775
-0.706
mons-0.690
 thirsty-0.667
coat-0.640
Token.
Token ID13
Feature activation+2.25e-02

Pos logit contributions
suits+0.367
+0.333
mons+0.322
 thirsty+0.314
aped+0.306
Neg logit contributions
eness-0.335
 apology-0.330
heart-0.304
 adult-0.304
est-0.304
Token Every
Token ID3887
Feature activation-2.14e-01

Pos logit contributions
suits+0.045
+0.041
mons+0.040
 thirsty+0.039
 sandbox+0.037
Neg logit contributions
 apology-0.045
eness-0.045
heart-0.042
 adult-0.041
fulness-0.040
Token night
Token ID1755
Feature activation-3.88e-02

Pos logit contributions
eness+0.523
 apology+0.510
heart+0.464
 adult+0.460
fulness+0.447
Neg logit contributions
suits-0.344
-0.316
 thirsty-0.315
 pool-0.283
mons-0.279
Token he
Token ID339
Feature activation-1.59e-01

Pos logit contributions
 apology+0.083
eness+0.083
heart+0.077
 adult+0.076
fulness+0.074
Neg logit contributions
suits-0.073
-0.065
mons-0.064
 thirsty-0.063
 sandbox-0.058
Token on
Token ID319
Feature activation+2.29e-01

Pos logit contributions
suits+0.025
mons+0.021
+0.021
 thirsty+0.020
 sandbox+0.018
Neg logit contributions
eness-0.051
 apology-0.051
heart-0.048
 adult-0.048
fulness-0.047
Token,
Token ID11
Feature activation+3.96e-02

Pos logit contributions
suits+0.375
+0.345
mons+0.330
 thirsty+0.328
 sandbox+0.298
Neg logit contributions
 apology-0.545
eness-0.541
 adult-0.496
heart-0.495
fulness-0.479
Token the
Token ID262
Feature activation-2.10e-02

Pos logit contributions
suits+0.091
mons+0.080
+0.079
 thirsty+0.076
coat+0.076
Neg logit contributions
 apology-0.070
eness-0.070
heart-0.066
 adult-0.064
est-0.063
Token bird
Token ID6512
Feature activation-4.78e-02

Pos logit contributions
 apology+0.035
eness+0.033
 adult+0.031
heart+0.030
 decision+0.030
Neg logit contributions
suits-0.051
mons-0.046
-0.046
 thirsty-0.044
coat-0.043
Tokenie
Token ID494
Feature activation-3.27e-01

Pos logit contributions
 apology+0.076
eness+0.066
 mercy+0.059
 adult+0.059
 decision+0.058
Neg logit contributions
suits-0.116
 thirsty-0.113
mons-0.107
-0.106
 sandbox-0.105
Token knew
Token ID2993
Feature activation-3.09e-01

Pos logit contributions
 apology+0.675
eness+0.638
 adult+0.586
heart+0.577
 decision+0.570
Neg logit contributions
suits-0.655
 thirsty-0.595
-0.591
mons-0.579
aped-0.563
Token that
Token ID326
Feature activation-6.38e-02

Pos logit contributions
eness+0.725
 apology+0.711
heart+0.636
est+0.635
 decision+0.621
Neg logit contributions
mons-0.476
 thirsty-0.472
-0.454
suits-0.451
 together-0.449
Token even
Token ID772
Feature activation-1.65e-01

Pos logit contributions
eness+0.130
 apology+0.130
heart+0.116
 adult+0.115
 decision+0.112
Neg logit contributions
suits-0.123
 thirsty-0.114
mons-0.113
-0.111
 sandbox-0.105
Token though
Token ID996
Feature activation-7.48e-02

Pos logit contributions
eness+0.360
 apology+0.357
heart+0.321
 adult+0.319
fulness+0.314
Neg logit contributions
suits-0.299
 thirsty-0.283
mons-0.270
-0.254
 sandbox-0.247
Token things
Token ID1243
Feature activation-9.99e-02

Pos logit contributions
 apology+0.149
eness+0.148
heart+0.134
 adult+0.133
est+0.129
Neg logit contributions
suits-0.151
-0.137
mons-0.136
 thirsty-0.135
 sandbox-0.126
Token may
Token ID743
Feature activation-1.73e-01

Pos logit contributions
 apology+0.201
eness+0.200
heart+0.184
 adult+0.183
est+0.177
Neg logit contributions
suits-0.199
-0.180
mons-0.177
 thirsty-0.176
aped-0.161
Token left
Token ID1364
Feature activation-4.92e-03

Pos logit contributions
suits+0.235
+0.208
mons+0.207
 thirsty+0.202
 sandbox+0.187
Neg logit contributions
 apology-0.271
eness-0.270
heart-0.249
 adult-0.248
fulness-0.240
Token the
Token ID262
Feature activation-1.83e-02

Pos logit contributions
 apology+0.010
eness+0.009
heart+0.009
 adult+0.009
fulness+0.008
Neg logit contributions
suits-0.010
mons-0.009
-0.009
 thirsty-0.009
 sandbox-0.009
Token park
Token ID3952
Feature activation+1.96e-01

Pos logit contributions
 apology+0.041
 adult+0.039
eness+0.038
 decision+0.038
 lesson+0.037
Neg logit contributions
suits-0.033
mons-0.029
-0.029
aped-0.027
 thirsty-0.026
Token.
Token ID13
Feature activation+3.33e-02

Pos logit contributions
suits+0.432
mons+0.398
+0.385
 thirsty+0.371
 sandbox+0.356
Neg logit contributions
eness-0.360
 apology-0.344
heart-0.323
fulness-0.322
est-0.321
Token He
Token ID679
Feature activation-2.61e-01

Pos logit contributions
suits+0.064
mons+0.056
+0.056
 thirsty+0.055
coat+0.051
Neg logit contributions
 apology-0.070
eness-0.070
heart-0.064
 adult-0.064
fulness-0.062
Token knew
Token ID2993
Feature activation-3.07e-01

Pos logit contributions
eness+0.553
 apology+0.552
heart+0.500
 adult+0.497
fulness+0.488
Neg logit contributions
suits-0.508
 thirsty-0.450
-0.437
aped-0.412
 sandbox-0.409
Token that
Token ID326
Feature activation-5.88e-02

Pos logit contributions
eness+0.698
 apology+0.675
est+0.633
heart+0.627
 decision+0.602
Neg logit contributions
suits-0.490
-0.472
 thirsty-0.468
mons-0.466
 together-0.435
Token sister
Token ID6621
Feature activation-1.03e-01

Pos logit contributions
eness+0.124
 apology+0.124
 adult+0.112
heart+0.111
est+0.108
Neg logit contributions
suits-0.106
 thirsty-0.099
mons-0.097
-0.096
 sandbox-0.090
Token's
Token ID338
Feature activation-1.10e-01

Pos logit contributions
 apology+0.208
eness+0.201
 adult+0.184
heart+0.183
fulness+0.175
Neg logit contributions
suits-0.207
 thirsty-0.183
-0.182
mons-0.179
coat-0.173
Token memory
Token ID4088
Feature activation-1.52e-01

Pos logit contributions
eness+0.190
 apology+0.181
heart+0.173
est+0.163
 adult+0.161
Neg logit contributions
suits-0.233
 thirsty-0.232
-0.217
mons-0.209
 grazing-0.209
Token might
Token ID1244
Feature activation-3.54e-02

Pos logit contributions
 apology+0.294
eness+0.283
 adult+0.265
heart+0.259
 decision+0.247
Neg logit contributions
suits-0.310
-0.279
mons-0.275
 thirsty-0.271
coat-0.261