TOP ACTIVATIONS
MAX = 3.474

's time to clean up your toys ." Lily did not
relieved . She even shared her delicious snack with the little
, Nut ty always shared her ac orns with her friends
One day , while they were walking , Tim my
I 'm happy to help my reliable friend . Remember ,
happy because he could share his ac orn with his friends
. She wanted to fill her basket with the needles .
One day , while they were walking , they saw
One day , while they were playing in the garden
his cousins played and shared their ac orns together .
it and decided to share her je ep with Max .
, Lily learned to share her candy and her brother learned
on , Sue loved hanging her purse in the tree as
to fly and play with other birds . One
Sally promised to always share her toys with others . <|endoftext|>
, Lily , for sharing your toys with me ." Lily
, " Can you raise your trunk ?" The elephant raised
Lily felt good about sharing her toy . The moral of
One day , while they were walking , Spot saw
. She quickly filled up her basket with blue berries and

MIN ACTIVATIONS
MIN = -3.295

they played in the garden and had lots of fun .
play . They looked around and saw a black board .
One day , they find a big box in the
day , Anna and Ben find a big frog in the
day , Anna and Ben find a big roll of paper
One day , they find a new toy in their
. One day , they find a big box in the
or on paper . You can not draw on other people
One day , they found a big fan in the
One day , they find a new soap in the
day , Tom and Mia find a big box in the
. One day , they find a big box in the
They were always playing together and they were very good at
One day , they find a yellow box in their
. One day , they find a big box in the
were walking in the park and stopped in front of a
my . They would stretch and move their bodies in fun
One day , they find a new rocket in their
One day , they find a new truck in the
mask . They ran around and made loud noises .

INTERVAL 1.782 - 3.474
CONTAINS 0.403%

the train . In the end , the
tree gave nice shade to everyone . Under the tree ,
She thanked him for saving her . She said sorry to
!" Ben shouts and jumps out . He makes Lily laugh
make up . But when she got there , she saw

INTERVAL 0.090 - 1.782
CONTAINS 43.691%

But it seemed that every time they got close , the
's mom opened the door and took out the popcorn .
sick , or hurt your mouth . You have to ask
<|endoftext|> Once upon a time there was a little bear .
at my car , Mia !" Tom says . He holds

INTERVAL -1.602 - 0.090
CONTAINS 55.363%

she got closer , she saw that it was a big
and the things . She sent them to their room .
Then , he went to sleep with the sack beside him
. Instead , it kept growing bigger and bigger until finally
soon as the family arrived at their new home , they

INTERVAL -3.295 - -1.602
CONTAINS 0.543%

and butterflies inside . You should try it , Sam ."
the jar in a big blanket . She wanted to keep
cr ay on . They find a white paper . They
. She had a blue case with her that was filled
Tim and Tom . Tom had a model car , and
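
Note on the INTERVAL blocks above: the four ranges look like equal-width bins spanning MIN to MAX, and the four CONTAINS shares add up to 100%. This is inferred from the printed numbers, not from any documentation of the tool that produced the dump. The sketch below recomputes the bin edges under that assumption; `equal_width_edges` is a hypothetical helper name, and the small mismatch at one edge (-1.603 computed vs -1.602 printed) is presumably due to MIN and MAX being rounded to three decimals.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX exactly as printed in the dump (already rounded to 3 decimals).
edges = equal_width_edges(-3.295, 3.474, 4)

# Interval edges and CONTAINS shares as printed in the dump, lowest bin first.
reported = [-3.295, -1.602, 0.090, 1.782, 3.474]
shares = [0.543, 55.363, 43.691, 0.403]

for computed, printed in zip(edges, reported):
    print(f"computed {computed:+.3f}   printed {printed:+.3f}")
print(f"total share: {sum(shares):.3f}%")
```

Read this way, roughly 99% of token positions fall in the two middle bins, and the example sequences in the outer bins come from the tails of the distribution.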
Token's
Token ID338
Feature activation+7.92e-02

Pos logit contributions
hr+0.814
oc+0.701
EX+0.694
 exc+0.693
AAAA+0.691
Neg logit contributions
abella-0.605
top-0.561
 wrong-0.554
bench-0.518
box-0.460
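
Note on the record layout: each per-token record in this part of the dump has its labels fused to its values (`Token ID338`, `Feature activation+7.92e-02`, `hr+0.814`). The sketch below shows one way to split such a record back into fields. The layout is inferred from the dump itself, `parse_token_record` and the dictionary keys are hypothetical names, and the sample lines are the first record above, copied verbatim (leading spaces are part of the token text).

```python
import re

# A contribution line is a token string followed by a signed decimal, e.g. " exc+0.693".
# The greedy prefix makes the LAST signed number on the line the value.
CONTRIB = re.compile(r"^(.*)([+-]\d+\.\d+)$")


def parse_token_record(lines):
    """Parse one fused per-token record into a dict (layout assumed from the dump)."""
    record = {"token": None, "token_id": None, "activation": None, "pos": [], "neg": []}
    section = None
    for line in lines:
        if not line.strip():
            continue  # blank separator line
        if line.startswith("Token ID"):
            record["token_id"] = int(line[len("Token ID"):])
        elif line.startswith("Feature activation"):
            record["activation"] = float(line[len("Feature activation"):])
        elif line.startswith("Pos logit contributions"):
            section = "pos"
        elif line.startswith("Neg logit contributions"):
            section = "neg"
        elif section is None and line.startswith("Token"):
            record["token"] = line[len("Token"):]  # keep any leading space
        elif section is not None:
            match = CONTRIB.match(line)
            if match:
                record[section].append((match.group(1), float(match.group(2))))
    return record


sample = [
    "Token's", "Token ID338", "Feature activation+7.92e-02", "",
    "Pos logit contributions",
    "hr+0.814", "oc+0.701", "EX+0.694", " exc+0.693", "AAAA+0.691",
    "Neg logit contributions",
    "abella-0.605", "top-0.561", " wrong-0.554", "bench-0.518", "box-0.460",
]
record = parse_token_record(sample)
print(record["token"], record["token_id"], record["activation"])
```

A record whose token is a newline or a space appears as a bare `Token` line; this sketch would store it as an empty string rather than recover the original character.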
Token time
Token ID640
Feature activation-7.54e-01

Pos logit contributions
hr+0.130
oc+0.112
 exc+0.110
 tou+0.110
EX+0.109
Neg logit contributions
abella-0.113
top-0.104
 wrong-0.101
bench-0.098
box-0.087
Token to
Token ID284
Feature activation+1.42e-01

Pos logit contributions
abella+1.138
top+1.062
bench+0.999
 wrong+0.996
box+0.866
Neg logit contributions
hr-1.162
 exc-0.984
 accidents-0.955
EX-0.955
oc-0.947
Token clean
Token ID3424
Feature activation-6.38e-01

Pos logit contributions
hr+0.190
oc+0.158
 exc+0.155
 tou+0.154
AAAA+0.153
Neg logit contributions
abella-0.242
top-0.228
 wrong-0.222
bench-0.217
box-0.196
Token up
Token ID510
Feature activation+1.04e-01

Pos logit contributions
abella+1.073
top+1.019
bench+0.981
 wrong+0.962
box+0.875
Neg logit contributions
hr-0.871
 accidents-0.723
oc-0.717
 exc-0.704
rum-0.685
Token your
Token ID534
Feature activation+3.47e+00

Pos logit contributions
hr+0.130
oc+0.107
 exc+0.105
 tou+0.105
EX+0.104
Neg logit contributions
abella-0.186
top-0.176
 wrong-0.171
bench-0.167
box-0.152
Token toys
Token ID14958
Feature activation-6.60e-01

Pos logit contributions
hr+5.951
AAAA+5.492
Fear+5.474
rum+5.421
RO+5.377
Neg logit contributions
 wrong-5.494
box-4.784
bench-4.505
abella-4.367
top-4.310
Token."
Token ID526
Feature activation+4.89e-01

Pos logit contributions
abella+1.267
top+1.177
 wrong+1.139
bench+1.131
box+1.020
Neg logit contributions
hr-0.754
 exc-0.602
oc-0.599
 accidents-0.585
 tou-0.564
Token Lily
Token ID20037
Feature activation+1.47e-01

Pos logit contributions
hr+0.766
oc+0.655
 tou+0.643
 exc+0.635
AAAA+0.633
Neg logit contributions
abella-0.739
top-0.694
 wrong-0.671
bench-0.653
box-0.580
Token did
Token ID750
Feature activation+1.23e+00

Pos logit contributions
hr+0.220
oc+0.187
 tou+0.184
 exc+0.184
EX+0.182
Neg logit contributions
abella-0.230
top-0.214
 wrong-0.207
bench-0.203
box-0.182
Token not
Token ID407
Feature activation-7.26e-01

Pos logit contributions
hr+2.748
oc+2.587
 tou+2.549
EX+2.513
AAAA+2.508
Neg logit contributions
abella-1.060
 wrong-0.926
top-0.876
bench-0.751
 Luke-0.592
Token relieved
Token ID22598
Feature activation-2.76e-02

Pos logit contributions
hr+1.757
oc+1.521
rum+1.490
AAAA+1.482
EX+1.466
Neg logit contributions
abella-2.019
top-1.878
 wrong-1.871
bench-1.774
 Luke-1.630
Token.
Token ID13
Feature activation-2.34e-01

Pos logit contributions
abella+0.042
top+0.039
 wrong+0.037
bench+0.037
box+0.033
Neg logit contributions
hr-0.043
oc-0.037
 exc-0.036
 tou-0.036
 accidents-0.036
Token She
Token ID1375
Feature activation+1.16e+00

Pos logit contributions
abella+0.397
top+0.373
 wrong+0.363
bench+0.356
box+0.322
Neg logit contributions
hr-0.314
oc-0.264
 exc-0.260
 tou-0.257
 accidents-0.256
Token even
Token ID772
Feature activation+4.13e-01

Pos logit contributions
hr+1.913
rum+1.634
EX+1.598
oc+1.581
 accidents+1.577
Neg logit contributions
abella-1.736
top-1.550
 wrong-1.522
bench-1.429
box-1.285
Token shared
Token ID4888
Feature activation+9.34e-01

Pos logit contributions
hr+0.637
oc+0.541
 exc+0.538
EX+0.537
 tou+0.537
Neg logit contributions
abella-0.626
top-0.579
 wrong-0.563
bench-0.548
box-0.485
Token her
Token ID607
Feature activation+3.36e+00

Pos logit contributions
hr+1.271
AAAA+1.079
oc+1.060
 exc+1.040
Ps+1.039
Neg logit contributions
abella-1.623
top-1.513
 wrong-1.504
bench-1.420
 Luke-1.303
Token delicious
Token ID12625
Feature activation+2.60e+00

Pos logit contributions
hr+5.167
rum+4.503
oc+4.461
AAAA+4.343
Ps+4.241
Neg logit contributions
 wrong-5.796
abella-5.196
top-4.896
bench-4.875
box-4.641
Token snack
Token ID26906
Feature activation-4.01e-01

Pos logit contributions
hr+3.672
AAAA+3.091
 exc+3.035
Fear+2.960
rum+2.958
Neg logit contributions
 wrong-4.218
top-4.137
abella-4.111
bench-4.066
box-3.816
Token with
Token ID351
Feature activation-1.45e-01

Pos logit contributions
abella+0.832
top+0.784
 wrong+0.769
bench+0.754
box+0.690
Neg logit contributions
hr-0.391
oc-0.307
 exc-0.293
 tou-0.290
 accidents-0.287
Token the
Token ID262
Feature activation+1.13e+00

Pos logit contributions
abella+0.277
top+0.263
 wrong+0.255
bench+0.253
box+0.231
Neg logit contributions
hr-0.165
oc-0.131
 exc-0.130
 tou-0.130
 accidents-0.129
Token little
Token ID1310
Feature activation+3.44e-01

Pos logit contributions
hr+1.275
oc+1.011
rum+0.988
AAAA+0.972
EX+0.947
Neg logit contributions
abella-2.189
 wrong-2.186
top-2.101
bench-2.045
box-1.877
Token,
Token ID11
Feature activation+1.81e+00

Pos logit contributions
hr+0.407
oc+0.342
 exc+0.339
 tou+0.336
EX+0.334
Neg logit contributions
abella-0.423
top-0.396
 wrong-0.390
bench-0.375
box-0.338
Token Nut
Token ID11959
Feature activation+5.99e-01

Pos logit contributions
hr+2.903
rum+2.498
Ps+2.331
AAAA+2.327
oc+2.320
Neg logit contributions
abella-2.657
 wrong-2.576
top-2.535
bench-2.259
 Luke-2.106
Tokenty
Token ID774
Feature activation+4.27e-01

Pos logit contributions
hr+1.196
 exc+1.057
Ps+1.044
AAAA+1.043
 tou+1.042
Neg logit contributions
abella-0.642
top-0.566
 wrong-0.560
bench-0.516
box-0.435
Token always
Token ID1464
Feature activation-5.47e-01

Pos logit contributions
hr+0.660
oc+0.554
 tou+0.552
 exc+0.552
EX+0.550
Neg logit contributions
abella-0.642
top-0.602
 wrong-0.586
bench-0.564
box-0.506
Token shared
Token ID4888
Feature activation+1.55e+00

Pos logit contributions
abella+0.879
top+0.815
 wrong+0.774
bench+0.770
box+0.690
Neg logit contributions
hr-0.813
 exc-0.683
oc-0.678
 accidents-0.675
AAAA-0.673
Token her
Token ID607
Feature activation+3.33e+00

Pos logit contributions
hr+2.188
AAAA+1.879
 exc+1.836
Ps+1.830
oc+1.822
Neg logit contributions
abella-2.551
 wrong-2.512
top-2.466
bench-2.268
box-2.127
Token ac
Token ID936
Feature activation+9.99e-01

Pos logit contributions
hr+4.479
Ps+3.784
AAAA+3.777
rum+3.561
packs+3.488
Neg logit contributions
 wrong-6.112
top-5.482
abella-5.208
box-5.118
bench-5.104
Tokenorns
Token ID19942
Feature activation+2.74e-01

Pos logit contributions
hr+1.516
AAAA+1.283
Ps+1.271
 exc+1.268
 tou+1.245
Neg logit contributions
abella-1.506
top-1.462
 wrong-1.427
bench-1.382
box-1.250
Token with
Token ID351
Feature activation+8.85e-02

Pos logit contributions
hr+0.262
oc+0.198
AAAA+0.197
 exc+0.196
 tou+0.195
Neg logit contributions
abella-0.568
top-0.547
 wrong-0.535
bench-0.524
box-0.485
Token her
Token ID607
Feature activation+1.49e+00

Pos logit contributions
hr+0.114
oc+0.094
 exc+0.093
 tou+0.092
EX+0.091
Neg logit contributions
abella-0.156
top-0.148
 wrong-0.144
bench-0.140
box-0.128
Token friends
Token ID2460
Feature activation+1.51e-03

Pos logit contributions
hr+2.281
AAAA+2.031
Ps+1.955
EX+1.907
 tou+1.873
Neg logit contributions
abella-2.238
 wrong-2.230
top-2.155
bench-1.981
box-1.895
Token
Token ID198
Feature activation+1.49e-01

Pos logit contributions
abella+1.061
top+1.008
 wrong+0.983
bench+0.961
box+0.876
Neg logit contributions
hr-0.762
oc-0.651
 exc-0.630
Ps-0.610
EX-0.609
TokenOne
Token ID3198
Feature activation+7.27e-01

Pos logit contributions
hr+0.302
oc+0.267
 tou+0.265
 exc+0.265
 accidents+0.263
Neg logit contributions
abella-0.154
top-0.140
 wrong-0.132
bench-0.127
box-0.105
Token day
Token ID1110
Feature activation+7.94e-01

Pos logit contributions
hr+1.503
oc+1.321
EX+1.319
 accidents+1.319
AAAA+1.315
Neg logit contributions
abella-0.718
top-0.660
 wrong-0.631
bench-0.576
box-0.457
Token,
Token ID11
Feature activation+1.08e+00

Pos logit contributions
hr+1.298
 tou+1.093
EX+1.085
 exc+1.083
oc+1.083
Neg logit contributions
abella-1.140
top-1.071
 wrong-1.063
bench-0.974
box-0.888
Token while
Token ID981
Feature activation+4.37e-01

Pos logit contributions
hr+1.748
oc+1.493
rum+1.476
 tou+1.473
EX+1.450
Neg logit contributions
abella-1.575
top-1.497
 wrong-1.467
bench-1.324
 Luke-1.228
Token they
Token ID484
Feature activation+3.30e+00

Pos logit contributions
hr+0.762
oc+0.661
 tou+0.649
EX+0.646
 exc+0.645
Neg logit contributions
abella-0.576
top-0.535
 wrong-0.519
bench-0.494
box-0.435
Token were
Token ID547
Feature activation-4.57e-01

Pos logit contributions
hr+7.197
rum+6.695
 accidents+6.588
adoes+6.545
 fortunate+6.487
Neg logit contributions
abella-3.213
top-3.092
 wrong-2.269
bench-2.161
case-2.034
Token walking
Token ID6155
Feature activation-7.34e-01

Pos logit contributions
abella+0.726
top+0.687
bench+0.657
 wrong+0.644
box+0.578
Neg logit contributions
hr-0.673
 tou-0.567
 exc-0.566
 accidents-0.558
AAAA-0.557
Token,
Token ID11
Feature activation-1.09e-01

Pos logit contributions
abella+1.305
top+1.200
bench+1.179
 wrong+1.144
box+1.051
Neg logit contributions
hr-0.949
oc-0.808
 exc-0.794
 tou-0.790
AAAA-0.776
Token Tim
Token ID5045
Feature activation-7.51e-02

Pos logit contributions
abella+0.167
top+0.156
 wrong+0.151
bench+0.149
box+0.133
Neg logit contributions
hr-0.164
oc-0.139
 exc-0.138
 tou-0.137
 accidents-0.136
Tokenmy
Token ID1820
Feature activation+4.33e-02

Pos logit contributions
abella+0.096
top+0.088
 wrong+0.084
bench+0.082
box+0.071
Neg logit contributions
hr-0.134
oc-0.117
 exc-0.115
 tou-0.115
EX-0.114
TokenI
Token ID40
Feature activation-6.73e-01

Pos logit contributions
abella+0.074
top+0.069
 wrong+0.067
bench+0.066
box+0.059
Neg logit contributions
hr-0.073
oc-0.062
 exc-0.061
 tou-0.061
EX-0.060
Token'm
Token ID1101
Feature activation-6.93e-02

Pos logit contributions
abella+1.148
top+1.084
bench+1.035
 wrong+1.034
box+0.935
Neg logit contributions
hr-0.906
oc-0.791
 exc-0.779
 tou-0.753
AAAA-0.742
Token happy
Token ID3772
Feature activation-1.73e-01

Pos logit contributions
abella+0.118
top+0.111
 wrong+0.107
bench+0.106
box+0.095
Neg logit contributions
hr-0.095
oc-0.078
 exc-0.078
 tou-0.078
 accidents-0.077
Token to
Token ID284
Feature activation+6.34e-01

Pos logit contributions
abella+0.257
top+0.240
 wrong+0.230
bench+0.229
box+0.201
Neg logit contributions
hr-0.273
oc-0.233
 exc-0.231
 accidents-0.230
 tou-0.229
Token help
Token ID1037
Feature activation+9.19e-01

Pos logit contributions
hr+1.035
 accidents+0.878
oc+0.866
Ps+0.856
EX+0.856
Neg logit contributions
abella-0.895
top-0.846
 wrong-0.826
bench-0.797
box-0.698
Token my
Token ID616
Feature activation+3.30e+00

Pos logit contributions
hr+1.351
AAAA+1.154
Ps+1.144
EX+1.134
oc+1.126
Neg logit contributions
abella-1.432
 wrong-1.379
top-1.362
bench-1.261
box-1.152
Token reliable
Token ID9314
Feature activation+1.75e+00

Pos logit contributions
hr+5.759
AAAA+5.455
Upon+5.132
Ps+5.023
EX+4.860
Neg logit contributions
 wrong-4.851
top-4.383
abella-4.095
box-4.060
bench-3.751
Token friend
Token ID1545
Feature activation+1.47e+00

Pos logit contributions
hr+2.602
AAAA+2.179
Upon+2.135
Ps+2.126
Fear+2.060
Neg logit contributions
 wrong-2.749
abella-2.686
top-2.678
bench-2.518
box-2.424
Token.
Token ID13
Feature activation+1.16e-01

Pos logit contributions
hr+2.288
AAAA+2.067
Ps+2.021
EX+1.966
 accidents+1.928
Neg logit contributions
 wrong-2.156
abella-2.155
top-2.125
bench-1.932
box-1.834
Token Remember
Token ID11436
Feature activation-9.38e-02

Pos logit contributions
hr+0.177
oc+0.149
 exc+0.148
 tou+0.147
EX+0.146
Neg logit contributions
abella-0.178
top-0.166
 wrong-0.161
bench-0.157
box-0.140
Token,
Token ID11
Feature activation+1.36e+00

Pos logit contributions
abella+0.147
top+0.138
 wrong+0.132
bench+0.131
box+0.117
Neg logit contributions
hr-0.140
oc-0.118
 exc-0.117
 tou-0.116
 accidents-0.116
Token happy
Token ID3772
Feature activation-1.06e-01

Pos logit contributions
hr+0.971
oc+0.825
EX+0.807
AAAA+0.798
Ps+0.796
Neg logit contributions
abella-1.008
 wrong-0.949
top-0.948
bench-0.884
 Luke-0.775
Token because
Token ID780
Feature activation-1.08e-01

Pos logit contributions
abella+0.182
top+0.171
 wrong+0.165
bench+0.163
box+0.147
Neg logit contributions
hr-0.144
oc-0.120
 exc-0.119
 accidents-0.118
 tou-0.118
Token he
Token ID339
Feature activation+7.99e-01

Pos logit contributions
abella+0.153
top+0.142
 wrong+0.137
bench+0.134
box+0.119
Neg logit contributions
hr-0.176
oc-0.152
 exc-0.151
 tou-0.150
 accidents-0.149
Token could
Token ID714
Feature activation+2.05e-01

Pos logit contributions
hr+1.416
 accidents+1.217
 exc+1.211
rum+1.204
EX+1.202
Neg logit contributions
abella-1.038
top-0.949
 wrong-0.923
bench-0.871
box-0.761
Token share
Token ID2648
Feature activation+1.89e+00

Pos logit contributions
hr+0.346
oc+0.299
AAAA+0.294
 tou+0.293
 exc+0.292
Neg logit contributions
abella-0.282
top-0.263
 wrong-0.254
bench-0.246
box-0.216
Token his
Token ID465
Feature activation+3.29e+00

Pos logit contributions
hr+2.598
AAAA+2.329
Ps+2.251
EX+2.155
 exc+2.132
Neg logit contributions
 wrong-3.163
abella-3.133
top-3.009
bench-2.822
box-2.605
Token ac
Token ID936
Feature activation+7.84e-01

Pos logit contributions
hr+4.384
rum+3.603
AAAA+3.540
Ps+3.484
Upon+3.355
Neg logit contributions
 wrong-6.126
top-5.370
abella-5.234
box-5.216
bench-5.185
Tokenorn
Token ID1211
Feature activation+7.77e-01

Pos logit contributions
hr+1.126
AAAA+0.948
 exc+0.934
Ps+0.931
 tou+0.919
Neg logit contributions
abella-1.247
top-1.202
 wrong-1.168
bench-1.137
box-1.027
Token with
Token ID351
Feature activation+1.66e-01

Pos logit contributions
hr+0.921
AAAA+0.783
Ps+0.765
EX+0.731
 tou+0.725
Neg logit contributions
abella-1.414
top-1.382
 wrong-1.354
bench-1.311
box-1.217
Token his
Token ID465
Feature activation+1.69e+00

Pos logit contributions
hr+0.189
oc+0.150
 exc+0.148
 tou+0.147
EX+0.146
Neg logit contributions
abella-0.322
top-0.305
 wrong-0.299
bench-0.291
box-0.267
Token friends
Token ID2460
Feature activation-8.03e-02

Pos logit contributions
hr+2.212
AAAA+1.846
EX+1.835
Ps+1.806
Upon+1.760
Neg logit contributions
 wrong-2.930
abella-2.876
top-2.740
bench-2.577
box-2.515
Token.
Token ID13
Feature activation-2.76e-01

Pos logit contributions
abella+0.918
top+0.842
 wrong+0.814
bench+0.813
box+0.737
Neg logit contributions
hr-0.607
oc-0.499
 exc-0.498
 tou-0.492
 accidents-0.487
Token She
Token ID1375
Feature activation-1.82e-01

Pos logit contributions
abella+0.423
top+0.394
 wrong+0.382
bench+0.375
box+0.336
Neg logit contributions
hr-0.416
oc-0.353
 exc-0.352
 tou-0.351
Ps-0.350
Token wanted
Token ID2227
Feature activation-7.98e-01

Pos logit contributions
abella+0.354
top+0.337
 wrong+0.329
bench+0.324
box+0.296
Neg logit contributions
hr-0.198
oc-0.158
 exc-0.158
 tou-0.155
Ps-0.153
Token to
Token ID284
Feature activation+1.88e+00

Pos logit contributions
abella+1.249
top+1.156
bench+1.149
 wrong+1.106
 projector+1.004
Neg logit contributions
hr-1.185
 tou-1.018
 exc-0.999
 accidents-0.997
EX-0.997
Token fill
Token ID6070
Feature activation-3.29e-01

Pos logit contributions
hr+2.673
 accidents+2.410
AAAA+2.308
 fortunate+2.292
oc+2.256
Neg logit contributions
top-3.043
abella-3.010
 wrong-2.773
bench-2.697
 tenth-2.427
Token her
Token ID607
Feature activation+3.25e+00

Pos logit contributions
abella+0.555
top+0.520
 wrong+0.504
bench+0.502
box+0.452
Neg logit contributions
hr-0.449
oc-0.373
 exc-0.370
 tou-0.369
 accidents-0.366
Token basket
Token ID7988
Feature activation-6.88e-01

Pos logit contributions
hr+5.748
AAAA+5.219
Ps+5.156
 accidents+4.798
 fortunate+4.798
Neg logit contributions
 wrong-4.453
top-4.376
box-4.138
abella-4.025
 tenth-3.489
Token with
Token ID351
Feature activation+1.19e+00

Pos logit contributions
abella+1.476
top+1.347
 wrong+1.333
bench+1.319
 versa+1.208
Neg logit contributions
hr-0.648
 exc-0.500
oc-0.488
 tou-0.472
 accidents-0.461
Token the
Token ID262
Feature activation+1.24e+00

Pos logit contributions
hr+1.914
oc+1.645
AAAA+1.621
Ps+1.613
EX+1.562
Neg logit contributions
abella-1.752
top-1.704
 wrong-1.651
bench-1.542
box-1.365
Token needles
Token ID25209
Feature activation-1.69e+00

Pos logit contributions
hr+1.388
AAAA+1.058
rum+1.028
oc+1.023
Ps+1.021
Neg logit contributions
abella-2.440
top-2.353
 wrong-2.347
bench-2.184
box-2.087
Token.
Token ID13
Feature activation-2.70e-01

Pos logit contributions
abella+3.502
top+3.068
bench+3.037
 wrong+2.939
 projector+2.872
Neg logit contributions
hr-1.728
 from-1.513
 exc-1.462
 accidents-1.327
rum-1.320
Token
Token ID198
Feature activation-8.91e-01

Pos logit contributions
abella+1.606
top+1.550
 wrong+1.511
bench+1.481
box+1.362
Neg logit contributions
hr-1.055
oc-0.895
 exc-0.869
Ps-0.854
EX-0.839
TokenOne
Token ID3198
Feature activation+4.51e-01

Pos logit contributions
abella+1.068
top+1.010
 wrong+0.980
bench+0.969
box+0.840
Neg logit contributions
hr-1.611
oc-1.437
Ps-1.430
 exc-1.415
EX-1.403
Token day
Token ID1110
Feature activation+3.26e-01

Pos logit contributions
hr+0.914
oc+0.806
 tou+0.799
AAAA+0.798
EX+0.798
Neg logit contributions
abella-0.467
top-0.427
 wrong-0.409
bench-0.383
box-0.312
Token,
Token ID11
Feature activation+6.65e-01

Pos logit contributions
hr+0.492
oc+0.420
 tou+0.415
 exc+0.414
EX+0.410
Neg logit contributions
abella-0.511
top-0.473
 wrong-0.461
bench-0.443
box-0.400
Token while
Token ID981
Feature activation+2.71e-01

Pos logit contributions
hr+1.084
oc+0.978
 tou+0.949
rum+0.940
AAAA+0.930
Neg logit contributions
abella-0.973
top-0.886
 wrong-0.865
bench-0.816
 Luke-0.737
Token they
Token ID484
Feature activation+3.25e+00

Pos logit contributions
hr+0.447
oc+0.389
 tou+0.380
 exc+0.380
EX+0.378
Neg logit contributions
abella-0.385
top-0.354
 wrong-0.343
bench-0.331
box-0.293
Token were
Token ID547
Feature activation-7.34e-01

Pos logit contributions
hr+7.009
rum+6.470
 accidents+6.355
adoes+6.267
EX+6.204
Neg logit contributions
abella-3.143
top-2.957
 wrong-2.346
bench-2.145
 versa-2.130
Token walking
Token ID6155
Feature activation-8.19e-01

Pos logit contributions
abella+1.190
top+1.135
bench+1.081
 wrong+1.046
box+0.948
Neg logit contributions
hr-1.061
 tou-0.893
 exc-0.883
 accidents-0.878
Ps-0.868
Token,
Token ID11
Feature activation-2.97e-01

Pos logit contributions
abella+1.470
top+1.349
bench+1.334
 wrong+1.286
box+1.195
Neg logit contributions
hr-1.050
oc-0.883
 exc-0.868
 tou-0.867
 accidents-0.854
Token they
Token ID484
Feature activation-5.75e-01

Pos logit contributions
abella+0.466
top+0.443
 wrong+0.427
bench+0.426
box+0.377
Neg logit contributions
hr-0.435
 accidents-0.363
 exc-0.362
oc-0.360
 tou-0.359
Token saw
Token ID2497
Feature activation-1.78e+00

Pos logit contributions
abella+0.918
top+0.869
bench+0.851
 wrong+0.837
box+0.745
Neg logit contributions
hr-0.828
 tou-0.698
Ps-0.690
 exc-0.685
oc-0.685
Token
Token ID198
Feature activation+2.48e-01

Pos logit contributions
abella+1.502
top+1.438
 wrong+1.408
bench+1.376
box+1.261
Neg logit contributions
hr-1.014
oc-0.857
 exc-0.827
Ps-0.813
EX-0.802
TokenOne
Token ID3198
Feature activation+8.86e-01

Pos logit contributions
hr+0.506
oc+0.447
 tou+0.444
 exc+0.443
 accidents+0.442
Neg logit contributions
abella-0.257
top-0.231
 wrong-0.219
bench-0.210
box-0.173
Token day
Token ID1110
Feature activation+1.09e+00

Pos logit contributions
hr+1.874
AAAA+1.658
EX+1.647
oc+1.643
 accidents+1.642
Neg logit contributions
abella-0.838
top-0.767
 wrong-0.735
bench-0.667
 projector-0.538
Token,
Token ID11
Feature activation+1.36e+00

Pos logit contributions
hr+1.862
 exc+1.586
 tou+1.583
oc+1.580
EX+1.580
Neg logit contributions
abella-1.529
 wrong-1.397
top-1.392
bench-1.251
box-1.158
Token while
Token ID981
Feature activation+7.50e-01

Pos logit contributions
hr+2.318
rum+2.069
oc+2.054
 tou+1.967
EX+1.958
Neg logit contributions
abella-1.950
 wrong-1.772
top-1.755
bench-1.573
 Luke-1.478
Token they
Token ID484
Feature activation+3.23e+00

Pos logit contributions
hr+1.354
oc+1.198
rum+1.161
EX+1.158
 tou+1.157
Neg logit contributions
abella-0.957
top-0.864
 wrong-0.847
bench-0.797
box-0.698
Token were
Token ID547
Feature activation-1.31e-01

Pos logit contributions
hr+6.864
rum+6.396
adoes+6.228
 accidents+6.150
 exc+6.142
Neg logit contributions
abella-3.330
top-3.115
 wrong-2.588
bench-2.335
 versa-2.205
Token playing
Token ID2712
Feature activation+1.17e-01

Pos logit contributions
abella+0.198
top+0.186
 wrong+0.178
bench+0.176
box+0.156
Neg logit contributions
hr-0.202
 exc-0.170
oc-0.170
 tou-0.170
 accidents-0.167
Token in
Token ID287
Feature activation+1.12e+00

Pos logit contributions
hr+0.179
oc+0.152
 exc+0.150
 tou+0.150
EX+0.148
Neg logit contributions
abella-0.179
top-0.168
 wrong-0.163
bench-0.159
box-0.142
Token the
Token ID262
Feature activation-8.06e-02

Pos logit contributions
hr+1.537
oc+1.342
 tou+1.280
rum+1.277
AAAA+1.269
Neg logit contributions
abella-1.858
top-1.796
 wrong-1.737
bench-1.715
box-1.541
Token garden
Token ID11376
Feature activation+3.00e-02

Pos logit contributions
abella+0.118
top+0.109
 wrong+0.105
bench+0.103
box+0.091
Neg logit contributions
hr-0.129
oc-0.111
 tou-0.110
 exc-0.110
EX-0.109
Token his
Token ID465
Feature activation+1.14e+00

Pos logit contributions
hr+1.903
oc+1.577
AAAA+1.510
Upon+1.480
rum+1.474
Neg logit contributions
abella-2.897
 wrong-2.698
top-2.674
bench-2.632
 Luke-2.487
Token cousins
Token ID28706
Feature activation-7.42e-01

Pos logit contributions
hr+1.649
AAAA+1.385
oc+1.366
rum+1.357
Ps+1.329
Neg logit contributions
abella-1.853
 wrong-1.755
top-1.732
bench-1.650
box-1.556
Token played
Token ID2826
Feature activation-3.44e-01

Pos logit contributions
abella+1.196
top+1.110
 wrong+1.092
bench+1.069
box+0.952
Neg logit contributions
hr-1.070
oc-0.906
 exc-0.905
 tou-0.899
EX-0.875
Token and
Token ID290
Feature activation-1.13e+00

Pos logit contributions
abella+0.560
top+0.526
 wrong+0.503
bench+0.497
box+0.445
Neg logit contributions
hr-0.494
oc-0.419
 exc-0.410
 tou-0.407
EX-0.404
Token shared
Token ID4888
Feature activation+1.15e+00

Pos logit contributions
abella+1.757
top+1.704
 wrong+1.672
bench+1.620
box+1.469
Neg logit contributions
hr-1.672
oc-1.441
 exc-1.439
 tou-1.428
EX-1.391
Token their
Token ID511
Feature activation+3.22e+00

Pos logit contributions
hr+1.898
AAAA+1.639
 exc+1.625
Ps+1.619
EX+1.605
Neg logit contributions
abella-1.662
top-1.519
 wrong-1.492
bench-1.410
box-1.267
Token ac
Token ID936
Feature activation+5.63e-01

Pos logit contributions
hr+5.623
rum+5.259
AAAA+4.842
Ps+4.694
cess+4.693
Neg logit contributions
 wrong-4.212
top-4.040
abella-4.003
bench-3.644
box-3.464
Tokenorns
Token ID19942
Feature activation-4.85e-01

Pos logit contributions
hr+0.719
AAAA+0.586
 exc+0.582
Ps+0.579
 tou+0.571
Neg logit contributions
abella-0.995
top-0.952
 wrong-0.918
bench-0.907
box-0.827
Token together
Token ID1978
Feature activation-2.83e-01

Pos logit contributions
abella+0.952
top+0.887
 wrong+0.867
bench+0.855
box+0.785
Neg logit contributions
hr-0.539
oc-0.439
 exc-0.424
 tou-0.414
 accidents-0.409
Token.
Token ID13
Feature activation-1.45e-01

Pos logit contributions
abella+0.497
top+0.465
 wrong+0.451
bench+0.445
box+0.402
Neg logit contributions
hr-0.369
oc-0.307
 exc-0.300
 tou-0.295
 accidents-0.293
Token
Token ID220
Feature activation+9.05e-02

Pos logit contributions
abella+0.213
top+0.199
 wrong+0.193
bench+0.189
box+0.167
Neg logit contributions
hr-0.230
oc-0.200
 exc-0.196
 tou-0.195
EX-0.194
Token it
Token ID340
Feature activation+3.87e-01

Pos logit contributions
hr+0.547
oc+0.460
 tou+0.447
EX+0.446
 exc+0.445
Neg logit contributions
abella-0.683
top-0.633
 wrong-0.628
bench-0.603
box-0.550
Token and
Token ID290
Feature activation+2.37e-01

Pos logit contributions
hr+0.489
oc+0.397
 tou+0.393
AAAA+0.390
EX+0.389
Neg logit contributions
abella-0.701
top-0.665
 wrong-0.663
bench-0.632
box-0.580
Token decided
Token ID3066
Feature activation-8.55e-03

Pos logit contributions
hr+0.353
oc+0.301
 exc+0.294
 tou+0.294
EX+0.293
Neg logit contributions
abella-0.377
top-0.351
 wrong-0.342
bench-0.330
box-0.297
Token to
Token ID284
Feature activation-4.76e-01

Pos logit contributions
abella+0.012
top+0.011
 wrong+0.011
bench+0.011
box+0.009
Neg logit contributions
hr-0.014
oc-0.012
 exc-0.012
 tou-0.012
EX-0.012
Token share
Token ID2648
Feature activation+1.12e+00

Pos logit contributions
abella+0.832
top+0.778
 wrong+0.761
bench+0.744
box+0.670
Neg logit contributions
hr-0.628
 exc-0.530
oc-0.518
 tou-0.516
EX-0.505
Token her
Token ID607
Feature activation+3.22e+00

Pos logit contributions
hr+1.609
AAAA+1.385
EX+1.343
Ps+1.340
oc+1.335
Neg logit contributions
abella-1.825
 wrong-1.733
top-1.714
bench-1.617
 Luke-1.484
Token je
Token ID11223
Feature activation+1.17e+00

Pos logit contributions
hr+5.290
Ps+4.796
Upon+4.711
Fear+4.611
AAAA+4.528
Neg logit contributions
 wrong-4.846
top-4.317
box-4.213
bench-4.126
abella-4.108
Tokenep
Token ID538
Feature activation+2.13e-01

Pos logit contributions
hr+1.813
AAAA+1.617
 fortunate+1.605
Ps+1.587
 accidents+1.572
Neg logit contributions
abella-1.662
top-1.585
 wrong-1.560
bench-1.517
box-1.413
Token with
Token ID351
Feature activation-9.48e-01

Pos logit contributions
hr+0.243
oc+0.195
 tou+0.193
AAAA+0.192
 exc+0.192
Neg logit contributions
abella-0.406
top-0.387
 wrong-0.377
bench-0.370
box-0.339
Token Max
Token ID5436
Feature activation-3.43e-01

Pos logit contributions
abella+1.275
top+1.203
bench+1.160
 wrong+1.109
box+1.003
Neg logit contributions
hr-1.575
 accidents-1.411
oc-1.383
 tou-1.366
 exc-1.356
Token.
Token ID13
Feature activation+3.76e-02

Pos logit contributions
abella+0.599
top+0.561
 wrong+0.537
bench+0.536
box+0.482
Neg logit contributions
hr-0.448
oc-0.374
 exc-0.367
 accidents-0.360
 tou-0.358
Token,
Token ID11
Feature activation+9.05e-01

Pos logit contributions
hr+0.289
oc+0.243
 exc+0.240
 tou+0.240
EX+0.238
Neg logit contributions
abella-0.306
top-0.286
 wrong-0.280
bench-0.272
box-0.244
Token Lily
Token ID20037
Feature activation+6.93e-02

Pos logit contributions
hr+1.497
AAAA+1.284
oc+1.278
rum+1.275
 tou+1.275
Neg logit contributions
abella-1.296
 wrong-1.208
top-1.202
bench-1.115
 Luke-1.010
Token learned
Token ID4499
Feature activation+1.11e-01

Pos logit contributions
hr+0.104
oc+0.088
 exc+0.087
 tou+0.086
EX+0.086
Neg logit contributions
abella-0.108
top-0.101
 wrong-0.098
bench-0.095
box-0.085
Token to
Token ID284
Feature activation-1.39e+00

Pos logit contributions
hr+0.197
oc+0.172
 exc+0.170
 tou+0.169
AAAA+0.168
Neg logit contributions
abella-0.145
top-0.134
 wrong-0.129
bench-0.125
box-0.108
Token share
Token ID2648
Feature activation+1.71e+00

Pos logit contributions
abella+2.717
top+2.485
bench+2.402
 wrong+2.373
 projector+2.265
Neg logit contributions
hr-1.628
 exc-1.412
oc-1.377
 tou-1.339
rum-1.267
Token her
Token ID607
Feature activation+3.20e+00

Pos logit contributions
hr+2.188
AAAA+1.897
Ps+1.784
EX+1.743
 exc+1.740
Neg logit contributions
abella-3.056
 wrong-3.035
top-2.939
bench-2.756
box-2.602
Token candy
Token ID18550
Feature activation+8.38e-01

Pos logit contributions
hr+4.664
AAAA+4.457
Ps+4.065
 tou+3.993
Upon+3.991
Neg logit contributions
 wrong-5.444
abella-4.940
box-4.730
top-4.676
bench-4.627
Token and
Token ID290
Feature activation-6.66e-01

Pos logit contributions
hr+0.990
AAAA+0.843
Ps+0.814
 tou+0.793
EX+0.790
Neg logit contributions
abella-1.566
top-1.512
 wrong-1.479
bench-1.433
box-1.333
Token her
Token ID607
Feature activation+1.32e+00

Pos logit contributions
abella+1.013
top+0.937
 wrong+0.881
bench+0.880
box+0.775
Neg logit contributions
hr-1.033
 exc-0.897
 accidents-0.886
 tou-0.883
oc-0.878
Token brother
Token ID3956
Feature activation+1.21e+00

Pos logit contributions
hr+1.924
AAAA+1.644
Ps+1.580
 tou+1.579
Upon+1.571
Neg logit contributions
abella-2.076
 wrong-2.051
top-1.938
bench-1.896
box-1.740
Token learned
Token ID4499
Feature activation+2.60e-01

Pos logit contributions
hr+2.004
AAAA+1.751
EX+1.711
 tou+1.698
Ps+1.698
Neg logit contributions
abella-1.689
 wrong-1.666
top-1.641
bench-1.520
box-1.361
Token on
Token ID319
Feature activation+1.20e-01

Pos logit contributions
abella+0.126
top+0.118
 wrong+0.113
bench+0.111
box+0.099
Neg logit contributions
hr-0.125
oc-0.107
 exc-0.106
 tou-0.105
AAAA-0.104
Token,
Token ID11
Feature activation+1.67e+00

Pos logit contributions
hr+0.177
oc+0.149
 exc+0.147
 tou+0.146
EX+0.145
Neg logit contributions
abella-0.192
top-0.180
 wrong-0.175
bench-0.170
box-0.153
Token Sue
Token ID26089
Feature activation+3.40e-01

Pos logit contributions
hr+2.469
AAAA+2.115
rum+2.088
oc+2.081
EX+2.022
Neg logit contributions
abella-2.675
 wrong-2.564
top-2.560
bench-2.216
 Luke-2.078
Token loved
Token ID6151
Feature activation+8.54e-01

Pos logit contributions
hr+0.682
oc+0.599
 tou+0.595
 exc+0.594
AAAA+0.594
Neg logit contributions
abella-0.360
top-0.323
 wrong-0.309
bench-0.293
box-0.246
Token hanging
Token ID10938
Feature activation+1.40e+00

Pos logit contributions
hr+1.353
oc+1.227
AAAA+1.171
Ps+1.153
rum+1.153
Neg logit contributions
abella-1.247
 wrong-1.149
top-1.147
bench-1.070
box-0.958
Token her
Token ID607
Feature activation+3.16e+00

Pos logit contributions
hr+2.019
oc+1.716
Ps+1.693
AAAA+1.670
rum+1.623
Neg logit contributions
abella-2.273
 wrong-2.245
top-2.218
bench-2.104
box-1.947
Token purse
Token ID27517
Feature activation+4.89e-01

Pos logit contributions
hr+5.378
rum+5.138
oc+4.994
AAAA+4.695
 fortunate+4.545
Neg logit contributions
 wrong-5.104
abella-4.663
top-4.441
bench-4.010
box-3.973
Token in
Token ID287
Feature activation+8.25e-01

Pos logit contributions
hr+0.565
AAAA+0.457
oc+0.452
Ps+0.444
 exc+0.443
Neg logit contributions
abella-0.920
top-0.893
 wrong-0.870
bench-0.846
box-0.783
Token the
Token ID262
Feature activation-1.15e-02

Pos logit contributions
hr+1.120
oc+0.952
AAAA+0.925
Ps+0.921
 tou+0.915
Neg logit contributions
abella-1.386
top-1.342
 wrong-1.300
bench-1.266
box-1.142
Token tree
Token ID5509
Feature activation-3.10e-01

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
bench+0.016
box+0.014
Neg logit contributions
hr-0.017
oc-0.014
 exc-0.014
 tou-0.014
EX-0.014
Token as
Token ID355
Feature activation+5.03e-03

Pos logit contributions
abella+0.592
top+0.549
 wrong+0.537
bench+0.531
box+0.481
Neg logit contributions
hr-0.361
oc-0.293
 exc-0.285
 tou-0.282
EX-0.278
Token to
Token ID284
Feature activation+8.38e-01

Pos logit contributions
abella+0.671
top+0.632
bench+0.604
 wrong+0.596
box+0.535
Neg logit contributions
hr-0.600
 exc-0.509
 accidents-0.509
 tou-0.502
oc-0.498
Token fly
Token ID6129
Feature activation+4.87e-01

Pos logit contributions
hr+1.232
oc+1.064
 accidents+1.031
AAAA+1.030
rum+1.023
Neg logit contributions
abella-1.288
top-1.228
 wrong-1.214
bench-1.157
box-1.037
Token and
Token ID290
Feature activation-7.26e-02

Pos logit contributions
hr+0.742
oc+0.629
 exc+0.616
Ps+0.614
 tou+0.613
Neg logit contributions
abella-0.741
top-0.705
 wrong-0.701
bench-0.659
box-0.588
Token play
Token ID711
Feature activation+1.96e-01

Pos logit contributions
abella+0.119
top+0.111
 wrong+0.107
bench+0.106
box+0.095
Neg logit contributions
hr-0.103
oc-0.086
 exc-0.086
 tou-0.085
EX-0.084
Token with
Token ID351
Feature activation+1.39e+00

Pos logit contributions
hr+0.247
oc+0.200
 exc+0.198
 tou+0.197
EX+0.195
Neg logit contributions
abella-0.353
top-0.334
 wrong-0.328
bench-0.319
box-0.290
Token other
Token ID584
Feature activation+3.14e+00

Pos logit contributions
hr+2.081
EX+1.736
AAAA+1.693
Ps+1.686
oc+1.663
Neg logit contributions
abella-2.195
 wrong-2.156
top-2.100
bench-1.882
 Luke-1.724
Token birds
Token ID10087
Feature activation-4.09e-01

Pos logit contributions
hr+4.415
rum+3.993
EX+3.723
ge+3.722
Fear+3.618
Neg logit contributions
 wrong-5.317
top-4.713
abella-4.655
box-4.123
 contest-4.043
Token.
Token ID13
Feature activation-3.69e-01

Pos logit contributions
abella+0.730
top+0.678
bench+0.652
 wrong+0.646
box+0.590
Neg logit contributions
hr-0.528
oc-0.440
 exc-0.438
 tou-0.433
 accidents-0.423
Token\n
Token ID198
Feature activation-8.24e-01

Pos logit contributions
abella+0.643
top+0.598
 wrong+0.584
bench+0.577
box+0.523
Neg logit contributions
hr-0.492
oc-0.421
 exc-0.412
 tou-0.403
AAAA-0.401
Token\n
Token ID198
Feature activation-4.10e-01

Pos logit contributions
abella+1.494
top+1.420
 wrong+1.374
bench+1.356
box+1.255
Neg logit contributions
hr-1.008
oc-0.861
 exc-0.844
Ps-0.805
AAAA-0.803
TokenOne
Token ID3198
Feature activation+8.33e-01

Pos logit contributions
abella+0.466
top+0.421
 wrong+0.401
bench+0.395
box+0.341
Neg logit contributions
hr-0.794
oc-0.707
 exc-0.703
AAAA-0.694
Ps-0.694
Token Sally
Token ID25737
Feature activation+3.30e-01

Pos logit contributions
hr+2.329
oc+2.112
rum+2.105
AAAA+2.032
Ps+1.945
Neg logit contributions
abella-2.271
top-2.194
 wrong-2.149
bench-1.909
 Luke-1.790
Token promised
Token ID8072
Feature activation-5.91e-01

Pos logit contributions
hr+0.548
oc+0.467
 exc+0.465
AAAA+0.463
 tou+0.463
Neg logit contributions
abella-0.461
top-0.427
 wrong-0.411
bench-0.399
box-0.354
Token to
Token ID284
Feature activation-7.25e-01

Pos logit contributions
abella+0.887
top+0.832
bench+0.803
 wrong+0.793
 projector+0.713
Neg logit contributions
hr-0.923
 accidents-0.777
 tou-0.777
 exc-0.766
EX-0.766
Token always
Token ID1464
Feature activation-1.41e-01

Pos logit contributions
abella+1.056
top+0.972
 wrong+0.940
bench+0.930
box+0.810
Neg logit contributions
hr-1.172
 exc-1.023
 tou-1.023
oc-1.013
EX-0.982
Token share
Token ID2648
Feature activation+1.41e+00

Pos logit contributions
abella+0.243
top+0.228
 wrong+0.220
bench+0.217
box+0.196
Neg logit contributions
hr-0.190
oc-0.157
 exc-0.157
 tou-0.155
EX-0.153
Token her
Token ID607
Feature activation+3.13e+00

Pos logit contributions
hr+1.855
AAAA+1.602
Ps+1.531
oc+1.529
 exc+1.527
Neg logit contributions
abella-2.467
 wrong-2.362
top-2.329
bench-2.175
 Luke-2.025
Token toys
Token ID14958
Feature activation+3.76e-01

Pos logit contributions
hr+4.794
AAAA+4.158
Ps+3.938
 tou+3.877
Upon+3.836
Neg logit contributions
 wrong-5.454
top-4.716
box-4.655
abella-4.625
bench-4.590
Token with
Token ID351
Feature activation+3.94e-01

Pos logit contributions
hr+0.417
AAAA+0.335
oc+0.331
Ps+0.329
 tou+0.327
Neg logit contributions
abella-0.724
top-0.695
 wrong-0.678
bench-0.662
box-0.610
Token others
Token ID1854
Feature activation+4.36e-01

Pos logit contributions
hr+0.496
oc+0.410
AAAA+0.401
 exc+0.400
 tou+0.395
Neg logit contributions
abella-0.709
top-0.669
 wrong-0.656
bench-0.630
box-0.576
Token.
Token ID13
Feature activation+4.81e-01

Pos logit contributions
hr+0.592
AAAA+0.490
oc+0.488
 tou+0.486
 exc+0.484
Neg logit contributions
abella-0.738
top-0.699
 wrong-0.686
bench-0.659
box-0.601
Token<|endoftext|>
Token ID50256
Feature activation-5.12e-01

Pos logit contributions
hr+0.725
 exc+0.602
oc+0.601
AAAA+0.598
 tou+0.597
Neg logit contributions
abella-0.749
top-0.708
 wrong-0.676
bench-0.659
box-0.587
Token,
Token ID11
Feature activation+1.32e+00

Pos logit contributions
abella+1.055
top+1.000
bench+0.965
 wrong+0.941
 versa+0.827
Neg logit contributions
hr-0.965
oc-0.845
 accidents-0.823
 exc-0.808
 tou-0.803
Token Lily
Token ID20037
Feature activation-6.35e-01

Pos logit contributions
hr+2.218
EX+1.882
AAAA+1.846
oc+1.837
rum+1.814
Neg logit contributions
abella-1.889
 wrong-1.769
top-1.658
bench-1.612
box-1.495
Token,
Token ID11
Feature activation+5.81e-01

Pos logit contributions
abella+1.208
top+1.138
bench+1.083
 wrong+1.080
 versa+0.980
Neg logit contributions
hr-0.742
oc-0.634
 exc-0.596
 accidents-0.591
 tou-0.586
Token for
Token ID329
Feature activation+2.47e-02

Pos logit contributions
hr+1.065
oc+0.929
EX+0.915
AAAA+0.913
Ps+0.906
Neg logit contributions
abella-0.713
 wrong-0.647
top-0.646
bench-0.601
box-0.535
Token sharing
Token ID7373
Feature activation+9.82e-01

Pos logit contributions
hr+0.036
oc+0.030
 exc+0.030
 tou+0.030
EX+0.030
Neg logit contributions
abella-0.039
top-0.037
 wrong-0.036
bench-0.035
box-0.031
Token your
Token ID534
Feature activation+3.12e+00

Pos logit contributions
hr+1.202
Ps+0.966
EX+0.966
AAAA+0.964
 tou+0.959
Neg logit contributions
abella-1.800
 wrong-1.747
top-1.709
bench-1.619
box-1.524
Token toys
Token ID14958
Feature activation+5.28e-01

Pos logit contributions
hr+4.795
AAAA+3.882
Upon+3.825
EX+3.822
Ps+3.801
Neg logit contributions
 wrong-5.441
box-4.612
top-4.565
bench-4.541
 projector-4.399
Token with
Token ID351
Feature activation+1.24e-01

Pos logit contributions
hr+0.641
AAAA+0.527
oc+0.523
Ps+0.523
 tou+0.517
Neg logit contributions
abella-0.961
top-0.922
 wrong-0.908
bench-0.875
box-0.810
Token me
Token ID502
Feature activation+3.95e-01

Pos logit contributions
hr+0.179
oc+0.151
 exc+0.149
 tou+0.148
EX+0.147
Neg logit contributions
abella-0.199
top-0.187
 wrong-0.182
bench-0.177
box-0.159
Token."
Token ID526
Feature activation+7.03e-01

Pos logit contributions
hr+0.553
 tou+0.460
AAAA+0.460
oc+0.460
Ps+0.458
Neg logit contributions
abella-0.652
top-0.613
 wrong-0.603
bench-0.578
box-0.530
Token Lily
Token ID20037
Feature activation-1.13e-01

Pos logit contributions
hr+1.067
 tou+0.880
 exc+0.869
oc+0.869
EX+0.867
Neg logit contributions
abella-1.116
top-1.044
 wrong-1.014
bench-0.993
box-0.897
Token,
Token ID11
Feature activation+6.59e-01

Pos logit contributions
hr+0.191
oc+0.159
 exc+0.157
 tou+0.157
AAAA+0.156
Neg logit contributions
abella-0.231
top-0.218
 wrong-0.212
bench-0.206
box-0.187
Token "
Token ID366
Feature activation+1.30e+00

Pos logit contributions
hr+0.974
oc+0.841
 tou+0.817
EX+0.810
AAAA+0.804
Neg logit contributions
abella-1.030
 wrong-0.968
top-0.967
bench-0.906
box-0.821
TokenCan
Token ID6090
Feature activation-6.28e-01

Pos logit contributions
hr+1.723
rum+1.488
 exc+1.456
 fortunate+1.436
 accidents+1.415
Neg logit contributions
abella-2.240
top-2.189
 wrong-2.044
bench-1.968
box-1.813
Token you
Token ID345
Feature activation-3.07e-01

Pos logit contributions
abella+1.020
top+0.954
bench+0.932
 wrong+0.891
box+0.813
Neg logit contributions
hr-0.914
 exc-0.781
 tou-0.772
 accidents-0.766
oc-0.761
Token raise
Token ID5298
Feature activation-4.97e-02

Pos logit contributions
abella+0.455
top+0.418
bench+0.398
 wrong+0.398
box+0.351
Neg logit contributions
hr-0.491
 exc-0.422
oc-0.420
 tou-0.419
EX-0.411
Token your
Token ID534
Feature activation+3.12e+00

Pos logit contributions
abella+0.101
top+0.095
 wrong+0.093
bench+0.092
box+0.084
Neg logit contributions
hr-0.051
oc-0.040
 exc-0.039
 tou-0.039
 accidents-0.039
Token trunk
Token ID21427
Feature activation+9.22e-01

Pos logit contributions
hr+6.115
 accidents+5.246
 fortunate+5.190
AAAA+5.093
Ps+5.016
Neg logit contributions
 wrong-3.927
box-3.592
top-3.527
abella-3.214
bench-3.118
Token?"
Token ID1701
Feature activation-1.58e-01

Pos logit contributions
hr+1.529
Ps+1.315
AAAA+1.315
oc+1.308
 tou+1.297
Neg logit contributions
abella-1.247
top-1.233
 wrong-1.225
bench-1.118
box-1.035
Token The
Token ID383
Feature activation+2.09e+00

Pos logit contributions
abella+0.268
top+0.250
 wrong+0.244
bench+0.239
box+0.217
Neg logit contributions
hr-0.216
oc-0.181
 exc-0.177
 tou-0.176
EX-0.175
Token elephant
Token ID20950
Feature activation-2.67e-01

Pos logit contributions
hr+3.125
rum+2.650
 accidents+2.552
EX+2.516
ment+2.503
Neg logit contributions
 wrong-3.317
top-3.135
abella-3.135
bench-2.821
 tenth-2.727
Token raised
Token ID4376
Feature activation+1.01e+00

Pos logit contributions
abella+0.390
top+0.363
 wrong+0.348
bench+0.346
box+0.305
Neg logit contributions
hr-0.424
oc-0.366
 exc-0.361
 tou-0.357
Ps-0.353
Token Lily
Token ID20037
Feature activation+5.90e-01

Pos logit contributions
hr+0.806
 tou+0.666
oc+0.666
 exc+0.664
EX+0.663
Neg logit contributions
abella-0.826
top-0.769
 wrong-0.745
bench-0.731
box-0.649
Token felt
Token ID2936
Feature activation+1.05e+00

Pos logit contributions
hr+0.896
oc+0.755
EX+0.754
 accidents+0.748
AAAA+0.746
Neg logit contributions
abella-0.918
top-0.847
 wrong-0.820
bench-0.790
box-0.717
Token good
Token ID922
Feature activation-1.74e-01

Pos logit contributions
hr+1.578
oc+1.352
EX+1.308
rum+1.286
AAAA+1.285
Neg logit contributions
abella-1.671
 wrong-1.649
top-1.572
bench-1.471
 Luke-1.300
Token about
Token ID546
Feature activation+7.38e-01

Pos logit contributions
abella+0.315
top+0.298
 wrong+0.287
bench+0.284
box+0.257
Neg logit contributions
hr-0.220
oc-0.180
 exc-0.178
 tou-0.177
 accidents-0.176
Token sharing
Token ID7373
Feature activation+8.88e-01

Pos logit contributions
hr+1.127
oc+0.996
AAAA+0.964
EX+0.955
 tou+0.945
Neg logit contributions
abella-1.125
 wrong-1.045
top-1.032
bench-0.978
 Luke-0.877
Token her
Token ID607
Feature activation+3.12e+00

Pos logit contributions
hr+1.116
AAAA+0.911
oc+0.889
Ps+0.883
 exc+0.882
Neg logit contributions
abella-1.615
 wrong-1.550
top-1.533
bench-1.469
box-1.342
Token toy
Token ID13373
Feature activation+1.05e+00

Pos logit contributions
hr+5.199
AAAA+4.391
EX+4.146
Upon+4.134
Ps+4.109
Neg logit contributions
 wrong-5.075
abella-4.472
top-4.296
box-4.140
bench-3.996
Token.
Token ID13
Feature activation+5.19e-01

Pos logit contributions
hr+1.516
AAAA+1.295
Ps+1.253
EX+1.228
 tou+1.215
Neg logit contributions
abella-1.686
top-1.646
 wrong-1.634
bench-1.512
box-1.463
Token The
Token ID383
Feature activation+6.27e-01

Pos logit contributions
hr+0.790
oc+0.653
 tou+0.650
 exc+0.649
EX+0.646
Neg logit contributions
abella-0.814
top-0.760
 wrong-0.732
bench-0.717
box-0.638
Token moral
Token ID6573
Feature activation-1.44e+00

Pos logit contributions
hr+0.933
oc+0.778
EX+0.765
rum+0.761
 tou+0.755
Neg logit contributions
abella-0.990
 wrong-0.946
top-0.933
bench-0.895
box-0.802
Token of
Token ID286
Feature activation-1.16e-01

Pos logit contributions
abella+3.262
top+3.060
bench+2.970
 wrong+2.937
 versa+2.824
Neg logit contributions
hr-1.165
 tou-0.958
oc-0.913
 exc-0.884
 accidents-0.862
Token\n
Token ID198
Feature activation-4.94e-01

Pos logit contributions
abella+1.670
top+1.593
 wrong+1.570
bench+1.522
box+1.409
Neg logit contributions
hr-1.094
oc-0.949
 exc-0.900
EX-0.867
Ps-0.865
TokenOne
Token ID3198
Feature activation+8.91e-01

Pos logit contributions
abella+0.561
top+0.513
 wrong+0.483
bench+0.478
box+0.414
Neg logit contributions
hr-0.935
oc-0.856
 exc-0.834
 tou-0.829
AAAA-0.828
Token day
Token ID1110
Feature activation+7.16e-01

Pos logit contributions
hr+1.891
 accidents+1.656
EX+1.652
AAAA+1.651
oc+1.642
Neg logit contributions
abella-0.843
top-0.774
 wrong-0.748
bench-0.671
box-0.526
Token,
Token ID11
Feature activation+8.04e-01

Pos logit contributions
hr+1.156
oc+0.957
EX+0.955
 exc+0.954
 tou+0.948
Neg logit contributions
abella-1.052
top-0.979
 wrong-0.979
bench-0.908
box-0.816
Token while
Token ID981
Feature activation+4.58e-01

Pos logit contributions
hr+1.412
oc+1.198
EX+1.171
rum+1.165
 exc+1.163
Neg logit contributions
abella-1.093
top-1.005
 wrong-1.004
bench-0.921
 Luke-0.819
Token they
Token ID484
Feature activation+3.11e+00

Pos logit contributions
hr+0.782
oc+0.670
EX+0.656
 exc+0.654
 tou+0.652
Neg logit contributions
abella-0.626
top-0.578
 wrong-0.569
bench-0.538
box-0.471
Token were
Token ID547
Feature activation+1.60e-02

Pos logit contributions
hr+6.640
rum+6.178
 accidents+5.970
adoes+5.926
 fortunate+5.925
Neg logit contributions
abella-3.280
top-3.006
 wrong-2.564
bench-2.302
 versa-2.136
Token walking
Token ID6155
Feature activation-8.08e-01

Pos logit contributions
hr+0.024
oc+0.021
 exc+0.021
 tou+0.021
EX+0.020
Neg logit contributions
abella-0.024
top-0.023
 wrong-0.022
bench-0.021
box-0.019
Token,
Token ID11
Feature activation-3.28e-01

Pos logit contributions
abella+1.448
top+1.327
bench+1.311
 wrong+1.256
box+1.176
Neg logit contributions
hr-1.032
oc-0.883
 exc-0.864
 tou-0.861
AAAA-0.851
Token Spot
Token ID15899
Feature activation+2.30e-01

Pos logit contributions
abella+0.486
top+0.457
bench+0.438
 wrong+0.435
box+0.389
Neg logit contributions
hr-0.505
 tou-0.433
 exc-0.433
oc-0.432
 accidents-0.432
Token saw
Token ID2497
Feature activation-7.73e-01

Pos logit contributions
hr+0.368
oc+0.315
 exc+0.313
 tou+0.310
 accidents+0.309
Neg logit contributions
abella-0.336
top-0.313
 wrong-0.304
bench-0.291
box-0.260
Token.
Token ID13
Feature activation+1.13e+00

Pos logit contributions
hr+1.300
 tou+1.104
AAAA+1.104
oc+1.099
Ps+1.090
Neg logit contributions
abella-1.443
top-1.402
 wrong-1.349
bench-1.346
box-1.216
Token She
Token ID1375
Feature activation+8.99e-01

Pos logit contributions
hr+1.839
oc+1.548
 tou+1.542
 accidents+1.533
 exc+1.517
Neg logit contributions
abella-1.628
top-1.585
 wrong-1.433
bench-1.419
 Luke-1.284
Token quickly
Token ID2952
Feature activation+1.20e+00

Pos logit contributions
hr+1.324
 accidents+1.122
AAAA+1.115
rum+1.097
 tou+1.097
Neg logit contributions
abella-1.443
top-1.339
 wrong-1.274
bench-1.249
box-1.169
Token filled
Token ID5901
Feature activation+6.36e-01

Pos logit contributions
hr+1.667
 accidents+1.398
 exc+1.371
oc+1.367
AAAA+1.364
Neg logit contributions
abella-2.029
top-1.927
bench-1.845
 wrong-1.837
box-1.678
Token up
Token ID510
Feature activation+1.40e+00

Pos logit contributions
hr+0.852
oc+0.725
AAAA+0.718
 exc+0.714
 tou+0.712
Neg logit contributions
abella-1.079
top-1.024
 wrong-0.984
bench-0.959
box-0.872
Token her
Token ID607
Feature activation+3.11e+00

Pos logit contributions
hr+1.931
AAAA+1.720
oc+1.680
 exc+1.664
Ps+1.656
Neg logit contributions
abella-2.320
top-2.244
 wrong-2.152
bench-2.071
box-1.885
Token basket
Token ID7988
Feature activation-1.70e-01

Pos logit contributions
hr+5.524
AAAA+4.908
rum+4.795
Ps+4.723
Get+4.719
Neg logit contributions
 wrong-4.236
top-4.001
box-3.893
abella-3.783
bench-3.446
Token with
Token ID351
Feature activation+8.56e-01

Pos logit contributions
abella+0.346
top+0.325
 wrong+0.319
bench+0.313
box+0.287
Neg logit contributions
hr-0.173
oc-0.134
 exc-0.132
 tou-0.130
EX-0.128
Token blue
Token ID4171
Feature activation+1.64e+00

Pos logit contributions
hr+1.333
oc+1.131
AAAA+1.122
Ps+1.107
 exc+1.096
Neg logit contributions
abella-1.284
top-1.241
 wrong-1.200
bench-1.153
box-1.035
Tokenberries
Token ID20853
Feature activation-4.42e-01

Pos logit contributions
hr+2.273
 exc+1.850
AAAA+1.847
Ps+1.806
oc+1.786
Neg logit contributions
top-2.776
abella-2.679
 wrong-2.573
bench-2.551
box-2.470
Token and
Token ID290
Feature activation+6.28e-01

Pos logit contributions
abella+0.895
top+0.829
 wrong+0.815
bench+0.802
box+0.729
Neg logit contributions
hr-0.460
oc-0.361
 exc-0.355
 tou-0.355
 accidents-0.352
Token they
Token ID484
Feature activation-1.17e+00

Pos logit contributions
abella+1.841
top+1.789
bench+1.741
 wrong+1.716
box+1.536
Neg logit contributions
hr-1.468
 accidents-1.240
oc-1.237
Ps-1.218
 exc-1.214
Token played
Token ID2826
Feature activation-6.49e-01

Pos logit contributions
abella+1.658
top+1.490
 wrong+1.441
bench+1.420
box+1.284
Neg logit contributions
hr-1.898
 exc-1.781
oc-1.709
 tou-1.693
Ps-1.657
Token in
Token ID287
Feature activation+2.97e-01

Pos logit contributions
abella+1.009
top+0.937
bench+0.886
 wrong+0.883
box+0.764
Neg logit contributions
hr-0.994
oc-0.856
 exc-0.841
AAAA-0.831
 accidents-0.825
Token the
Token ID262
Feature activation-1.18e+00

Pos logit contributions
hr+0.358
oc+0.295
 tou+0.288
 exc+0.286
 accidents+0.284
Neg logit contributions
abella-0.546
top-0.519
 wrong-0.504
bench-0.494
box-0.450
Token garden
Token ID11376
Feature activation-1.55e+00

Pos logit contributions
abella+2.076
bench+1.882
top+1.860
 wrong+1.769
 versa+1.640
Neg logit contributions
hr-1.571
 tou-1.364
Ps-1.333
AAAA-1.329
oc-1.314
Token and
Token ID290
Feature activation-3.29e+00

Pos logit contributions
abella+3.389
bench+2.967
top+2.925
 wrong+2.866
 versa+2.779
Neg logit contributions
hr-1.558
 exc-1.265
oc-1.243
 from-1.173
 accidents-1.079
Token had
Token ID550
Feature activation-1.51e+00

Pos logit contributions
bench+6.219
top+5.889
abella+5.879
 wrong+5.793
box+5.527
Neg logit contributions
hr-4.061
 exc-3.600
 pets-3.563
Ps-3.436
 from-3.431
Token lots
Token ID6041
Feature activation+1.26e+00

Pos logit contributions
abella+2.493
top+2.339
bench+2.330
 wrong+2.180
box+2.019
Neg logit contributions
hr-2.118
 tou-1.776
 exc-1.760
 accidents-1.741
 little-1.732
Token of
Token ID286
Feature activation-4.28e-02

Pos logit contributions
hr+1.678
AAAA+1.416
EX+1.370
Ps+1.361
oc+1.357
Neg logit contributions
abella-2.114
top-2.098
 wrong-2.054
bench-1.972
box-1.754
Token fun
Token ID1257
Feature activation+8.82e-02

Pos logit contributions
abella+0.055
top+0.051
 wrong+0.049
bench+0.048
box+0.041
Neg logit contributions
hr-0.076
oc-0.066
 exc-0.066
 tou-0.065
EX-0.065
Token.
Token ID13
Feature activation-2.07e+00

Pos logit contributions
hr+0.125
oc+0.104
 exc+0.103
 tou+0.103
EX+0.102
Neg logit contributions
abella-0.145
top-0.136
 wrong-0.132
bench-0.129
box-0.116
Token play
Token ID711
Feature activation-4.39e-01

Pos logit contributions
abella+1.332
top+1.235
 wrong+1.214
bench+1.194
box+1.061
Neg logit contributions
hr-1.009
 exc-0.856
 tou-0.838
EX-0.815
oc-0.812
Token.
Token ID13
Feature activation-6.58e-01

Pos logit contributions
abella+0.761
top+0.708
bench+0.676
 wrong+0.676
box+0.605
Neg logit contributions
hr-0.592
oc-0.499
 exc-0.488
AAAA-0.482
 accidents-0.479
Token They
Token ID1119
Feature activation-6.24e-01

Pos logit contributions
abella+1.021
top+0.960
 wrong+0.945
bench+0.914
box+0.825
Neg logit contributions
hr-0.956
 exc-0.810
oc-0.809
Ps-0.806
AAAA-0.797
Token looked
Token ID3114
Feature activation+2.59e-01

Pos logit contributions
abella+1.022
top+0.956
 wrong+0.939
bench+0.915
box+0.821
Neg logit contributions
hr-0.874
 exc-0.754
 tou-0.735
oc-0.731
Ps-0.730
Token around
Token ID1088
Feature activation-2.17e-01

Pos logit contributions
hr+0.432
oc+0.373
 tou+0.367
 exc+0.366
EX+0.364
Neg logit contributions
abella-0.362
top-0.337
 wrong-0.330
bench-0.315
box-0.278
Token and
Token ID290
Feature activation-3.09e+00

Pos logit contributions
abella+0.423
top+0.399
 wrong+0.387
bench+0.385
box+0.352
Neg logit contributions
hr-0.243
oc-0.193
 exc-0.191
 accidents-0.188
AAAA-0.186
Token saw
Token ID2497
Feature activation-1.58e+00

Pos logit contributions
abella+6.428
bench+6.229
top+5.906
box+5.745
 wrong+5.740
Neg logit contributions
hr-3.243
Ps-2.739
 exc-2.665
 mis-2.568
 little-2.554
Token a
Token ID257
Feature activation-5.12e-01

Pos logit contributions
abella+2.665
bench+2.563
top+2.527
 wrong+2.296
box+2.125
Neg logit contributions
hr-2.152
 accidents-1.966
 exc-1.821
 little-1.793
 pets-1.769
Token black
Token ID2042
Feature activation-9.24e-01

Pos logit contributions
abella+0.959
top+0.902
bench+0.866
 wrong+0.860
box+0.776
Neg logit contributions
hr-0.611
 tou-0.502
oc-0.500
 exc-0.494
AAAA-0.486
Tokenboard
Token ID3526
Feature activation-1.81e+00

Pos logit contributions
abella+1.771
top+1.621
bench+1.573
 wrong+1.564
 versa+1.390
Neg logit contributions
hr-1.085
oc-0.951
 tou-0.920
AAAA-0.902
 accidents-0.888
Token.
Token ID13
Feature activation-9.66e-01

Pos logit contributions
abella+3.805
top+3.318
 wrong+3.142
bench+3.128
acey+3.078
Neg logit contributions
hr-1.891
 exc-1.568
 from-1.526
oc-1.525
 tou-1.491
Token\n
Token ID198
Feature activation-1.40e+00

Pos logit contributions
abella+2.709
top+2.606
 wrong+2.554
bench+2.499
box+2.325
Neg logit contributions
hr-1.555
oc-1.366
Ps-1.318
 exc-1.313
AAAA-1.265
TokenOne
Token ID3198
Feature activation-3.32e-01

Pos logit contributions
abella+1.810
top+1.660
 wrong+1.648
bench+1.632
box+1.437
Neg logit contributions
hr-2.386
oc-2.171
Ps-2.136
 exc-2.110
AAAA-2.100
Token day
Token ID1110
Feature activation-1.67e-01

Pos logit contributions
abella+0.407
top+0.368
 wrong+0.345
bench+0.343
box+0.297
Neg logit contributions
hr-0.619
 exc-0.549
oc-0.544
 tou-0.536
 accidents-0.533
Token,
Token ID11
Feature activation-3.34e-01

Pos logit contributions
abella+0.274
top+0.258
 wrong+0.249
bench+0.247
box+0.220
Neg logit contributions
hr-0.234
oc-0.196
 exc-0.194
 tou-0.192
AAAA-0.192
Token they
Token ID484
Feature activation-7.27e-01

Pos logit contributions
abella+0.437
top+0.410
 wrong+0.392
bench+0.391
box+0.338
Neg logit contributions
hr-0.575
 exc-0.498
oc-0.494
 accidents-0.491
 tou-0.488
Token find
Token ID1064
Feature activation-2.87e+00

Pos logit contributions
abella+1.336
top+1.239
bench+1.190
 wrong+1.188
box+1.087
Neg logit contributions
hr-0.892
 exc-0.763
 tou-0.745
Ps-0.736
oc-0.727
Token a
Token ID257
Feature activation-6.24e-01

Pos logit contributions
abella+5.486
bench+5.436
top+5.320
ven+4.664
 versa+4.617
Neg logit contributions
 little-3.001
hr-2.924
 from-2.852
 accidents-2.785
 pets-2.646
Token big
Token ID1263
Feature activation-7.23e-01

Pos logit contributions
abella+1.176
top+1.105
bench+1.055
 wrong+1.031
box+0.940
Neg logit contributions
hr-0.736
 tou-0.613
 exc-0.603
oc-0.589
 accidents-0.586
Token box
Token ID3091
Feature activation-5.05e-01

Pos logit contributions
abella+1.164
top+1.049
bench+0.978
 wrong+0.977
 versa+0.852
Neg logit contributions
hr-1.074
oc-0.936
 tou-0.927
 exc-0.920
 accidents-0.915
Token in
Token ID287
Feature activation+4.10e-01

Pos logit contributions
abella+0.911
top+0.838
 wrong+0.805
bench+0.795
 versa+0.716
Neg logit contributions
hr-0.643
 exc-0.532
oc-0.520
 tou-0.518
EX-0.511
Token the
Token ID262
Feature activation-1.53e+00

Pos logit contributions
hr+0.513
oc+0.425
 tou+0.413
 exc+0.411
AAAA+0.411
Neg logit contributions
abella-0.745
top-0.708
 wrong-0.687
bench-0.676
box-0.613
Token day
Token ID1110
Feature activation-2.06e-01

Pos logit contributions
abella+0.445
top+0.403
 wrong+0.380
bench+0.379
box+0.327
Neg logit contributions
hr-0.679
 exc-0.603
oc-0.595
 tou-0.586
 accidents-0.586
Token,
Token ID11
Feature activation-1.19e-01

Pos logit contributions
abella+0.337
top+0.319
 wrong+0.307
bench+0.305
box+0.272
Neg logit contributions
hr-0.289
oc-0.240
 exc-0.240
 accidents-0.236
AAAA-0.236
Token Anna
Token ID11735
Feature activation+3.61e-01

Pos logit contributions
abella+0.153
top+0.142
 wrong+0.136
bench+0.134
box+0.116
Neg logit contributions
hr-0.209
oc-0.181
 exc-0.181
 tou-0.179
 accidents-0.179
Token and
Token ID290
Feature activation-1.18e+00

Pos logit contributions
hr+0.335
oc+0.246
 tou+0.244
 exc+0.241
 accidents+0.241
Neg logit contributions
abella-0.765
top-0.733
 wrong-0.714
bench-0.703
box-0.652
Token Ben
Token ID3932
Feature activation-4.68e-01

Pos logit contributions
abella+1.408
top+1.307
bench+1.271
 wrong+1.260
box+1.046
Neg logit contributions
hr-2.173
 exc-1.939
 accidents-1.923
 tou-1.879
Ps-1.871
Token find
Token ID1064
Feature activation-2.86e+00

Pos logit contributions
abella+0.931
top+0.872
 wrong+0.848
bench+0.842
box+0.778
Neg logit contributions
hr-0.500
 exc-0.412
 tou-0.396
oc-0.394
Ps-0.392
Token a
Token ID257
Feature activation-1.33e+00

Pos logit contributions
abella+5.461
bench+5.256
top+5.152
ven+4.645
rob+4.558
Neg logit contributions
 little-3.108
 accidents-3.095
 from-3.063
hr-3.033
 exc-2.753
Token big
Token ID1263
Feature activation-1.32e+00

Pos logit contributions
abella+2.540
top+2.354
bench+2.282
 wrong+2.139
 versa+2.025
Neg logit contributions
hr-1.551
 exc-1.335
 accidents-1.306
 tou-1.304
 mis-1.294
Token frog
Token ID21264
Feature activation+1.76e-01

Pos logit contributions
abella+2.239
top+1.978
bench+1.864
 wrong+1.805
liv+1.632
Neg logit contributions
hr-1.890
 accidents-1.679
 exc-1.647
oc-1.643
 tou-1.620
Token in
Token ID287
Feature activation+7.02e-01

Pos logit contributions
hr+0.260
oc+0.219
 exc+0.216
 tou+0.216
EX+0.214
Neg logit contributions
abella-0.277
top-0.261
 wrong-0.254
bench-0.247
box-0.223
Token the
Token ID262
Feature activation-1.05e+00

Pos logit contributions
hr+0.916
oc+0.776
 tou+0.742
EX+0.741
Ps+0.739
Neg logit contributions
abella-1.227
top-1.172
 wrong-1.134
bench-1.115
 Luke-1.014
Token day
Token ID1110
Feature activation+9.51e-02

Pos logit contributions
abella+0.331
top+0.299
 wrong+0.285
bench+0.279
box+0.239
Neg logit contributions
hr-0.521
 exc-0.461
oc-0.460
 tou-0.451
EX-0.446
Token,
Token ID11
Feature activation+1.81e-01

Pos logit contributions
hr+0.139
oc+0.117
 exc+0.116
 tou+0.115
EX+0.114
Neg logit contributions
abella-0.152
top-0.142
 wrong-0.138
bench-0.135
box-0.121
Token Anna
Token ID11735
Feature activation+1.87e-01

Pos logit contributions
hr+0.312
oc+0.272
 exc+0.267
 tou+0.266
EX+0.265
Neg logit contributions
abella-0.243
top-0.224
 wrong-0.216
bench-0.208
box-0.183
Token and
Token ID290
Feature activation-6.59e-01

Pos logit contributions
hr+0.243
oc+0.199
 exc+0.196
 tou+0.196
EX+0.195
Neg logit contributions
abella-0.328
top-0.310
 wrong-0.301
bench-0.295
box-0.268
Token Ben
Token ID3932
Feature activation-3.99e-01

Pos logit contributions
abella+0.827
top+0.797
bench+0.746
 wrong+0.736
box+0.636
Neg logit contributions
hr-1.164
 tou-1.032
 exc-1.029
 accidents-1.018
Ps-1.018
Token find
Token ID1064
Feature activation-2.84e+00

Pos logit contributions
abella+0.681
top+0.635
 wrong+0.614
bench+0.605
box+0.548
Neg logit contributions
hr-0.533
 exc-0.454
 tou-0.447
oc-0.446
Ps-0.440
Token a
Token ID257
Feature activation-1.22e+00

Pos logit contributions
abella+5.456
top+5.402
bench+5.263
ven+4.708
 versa+4.614
Neg logit contributions
 little-2.944
hr-2.870
 from-2.853
 accidents-2.690
 pets-2.651
Token big
Token ID1263
Feature activation-1.31e+00

Pos logit contributions
abella+2.431
top+2.285
bench+2.190
 wrong+2.095
 versa+1.982
Neg logit contributions
hr-1.322
 exc-1.105
 tou-1.103
oc-1.040
 mis-1.036
Token roll
Token ID4836
Feature activation-5.40e-01

Pos logit contributions
abella+2.221
top+1.970
bench+1.834
 wrong+1.804
liv+1.680
Neg logit contributions
hr-1.879
oc-1.673
 tou-1.623
 exc-1.619
 accidents-1.598
Token of
Token ID286
Feature activation-6.69e-01

Pos logit contributions
abella+0.923
top+0.842
 wrong+0.811
bench+0.805
 versa+0.721
Neg logit contributions
hr-0.731
oc-0.622
 exc-0.613
 tou-0.607
 accidents-0.597
Token paper
Token ID3348
Feature activation-8.09e-01

Pos logit contributions
abella+1.002
top+0.918
 wrong+0.864
bench+0.850
box+0.739
Neg logit contributions
hr-1.058
 exc-0.911
oc-0.905
 tou-0.897
 accidents-0.892
Token\n
Token ID198
Feature activation-1.57e+00

Pos logit contributions
abella+3.677
top+3.547
bench+3.519
 wrong+3.509
box+3.277
Neg logit contributions
hr-1.864
Ps-1.623
 exc-1.599
oc-1.598
Upon-1.549
TokenOne
Token ID3198
Feature activation-1.27e-01

Pos logit contributions
abella+2.091
bench+1.956
top+1.929
 wrong+1.920
box+1.749
Neg logit contributions
hr-2.619
Ps-2.345
oc-2.327
 exc-2.318
AAAA-2.276
Token day
Token ID1110
Feature activation-3.11e-01

Pos logit contributions
abella+0.149
top+0.135
 wrong+0.128
bench+0.125
box+0.107
Neg logit contributions
hr-0.242
oc-0.213
 exc-0.213
 tou-0.210
 accidents-0.208
Token,
Token ID11
Feature activation-3.71e-01

Pos logit contributions
abella+0.517
top+0.490
bench+0.471
 wrong+0.471
box+0.420
Neg logit contributions
hr-0.427
oc-0.355
 exc-0.354
AAAA-0.350
 accidents-0.350
Token they
Token ID484
Feature activation-3.59e-01

Pos logit contributions
abella+0.480
top+0.451
bench+0.431
 wrong+0.431
box+0.372
Neg logit contributions
hr-0.641
 exc-0.555
oc-0.548
 accidents-0.548
AAAA-0.544
Token find
Token ID1064
Feature activation-2.83e+00

Pos logit contributions
abella+0.624
top+0.584
 wrong+0.566
bench+0.557
box+0.503
Neg logit contributions
hr-0.473
 exc-0.397
oc-0.391
 tou-0.388
Ps-0.384
Token a
Token ID257
Feature activation-8.93e-01

Pos logit contributions
bench+5.337
abella+5.308
top+5.213
 versa+4.554
ven+4.539
Neg logit contributions
 little-3.024
hr-2.965
 from-2.822
 accidents-2.690
AAAA-2.542
Token new
Token ID649
Feature activation-9.47e-01

Pos logit contributions
abella+1.726
top+1.615
bench+1.542
 wrong+1.490
 versa+1.379
Neg logit contributions
hr-1.020
 tou-0.856
 exc-0.843
oc-0.819
 accidents-0.804
Token toy
Token ID13373
Feature activation-4.93e-01

Pos logit contributions
abella+1.838
top+1.675
bench+1.593
 wrong+1.585
 versa+1.470
Neg logit contributions
hr-1.099
oc-0.925
 exc-0.919
 tou-0.884
EX-0.872
Token in
Token ID287
Feature activation+2.61e-01

Pos logit contributions
abella+0.790
top+0.730
bench+0.697
 wrong+0.688
box+0.602
Neg logit contributions
hr-0.710
oc-0.611
 exc-0.606
 tou-0.603
 accidents-0.593
Token their
Token ID511
Feature activation-2.90e-01

Pos logit contributions
hr+0.324
oc+0.267
 exc+0.260
 tou+0.259
EX+0.258
Neg logit contributions
abella-0.474
top-0.449
 wrong-0.436
bench-0.428
box-0.390
Token.
Token ID13
Feature activation-1.42e+00

Pos logit contributions
abella+0.298
top+0.278
 wrong+0.269
bench+0.264
box+0.240
Neg logit contributions
hr-0.230
oc-0.192
 exc-0.189
 tou-0.188
EX-0.185
Token One
Token ID1881
Feature activation-3.92e-01

Pos logit contributions
abella+2.257
top+2.112
bench+2.110
 wrong+2.080
box+1.933
Neg logit contributions
hr-1.970
 exc-1.780
Ps-1.772
oc-1.747
AAAA-1.687
Token day
Token ID1110
Feature activation-7.27e-02

Pos logit contributions
abella+0.489
top+0.436
 wrong+0.410
bench+0.409
box+0.350
Neg logit contributions
hr-0.724
 exc-0.645
oc-0.644
 tou-0.627
 accidents-0.622
Token,
Token ID11
Feature activation-3.78e-01

Pos logit contributions
abella+0.118
top+0.111
 wrong+0.107
bench+0.105
box+0.094
Neg logit contributions
hr-0.104
oc-0.087
 exc-0.087
 tou-0.086
AAAA-0.085
Token they
Token ID484
Feature activation-5.77e-01

Pos logit contributions
abella+0.486
top+0.455
bench+0.434
 wrong+0.432
box+0.373
Neg logit contributions
hr-0.658
 exc-0.572
 accidents-0.566
oc-0.564
 tou-0.561
Token find
Token ID1064
Feature activation-2.80e+00

Pos logit contributions
abella+1.043
top+0.965
 wrong+0.934
bench+0.918
box+0.837
Neg logit contributions
hr-0.724
 exc-0.622
 tou-0.604
oc-0.602
Ps-0.590
Token a
Token ID257
Feature activation-8.67e-01

Pos logit contributions
abella+5.395
bench+5.230
top+5.181
 versa+4.606
ven+4.507
Neg logit contributions
 little-2.888
hr-2.850
 from-2.817
 accidents-2.642
AAAA-2.611
Token big
Token ID1263
Feature activation-9.16e-01

Pos logit contributions
abella+1.674
top+1.562
bench+1.500
 wrong+1.435
 versa+1.342
Neg logit contributions
hr-0.995
 tou-0.843
 exc-0.824
oc-0.806
Ps-0.789
Token box
Token ID3091
Feature activation-5.16e-01

Pos logit contributions
abella+1.411
top+1.242
bench+1.158
 wrong+1.140
 versa+1.017
Neg logit contributions
hr-1.442
oc-1.294
 tou-1.269
 exc-1.256
 accidents-1.245
Token in
Token ID287
Feature activation+6.45e-01

Pos logit contributions
abella+0.943
top+0.866
 wrong+0.838
bench+0.823
 versa+0.743
Neg logit contributions
hr-0.643
 exc-0.529
oc-0.522
 tou-0.517
EX-0.510
Token the
Token ID262
Feature activation-1.21e+00

Pos logit contributions
hr+0.827
oc+0.693
AAAA+0.673
 tou+0.673
Ps+0.670
Neg logit contributions
abella-1.151
top-1.100
 wrong-1.066
bench-1.050
box-0.945
Token or
Token ID393
Feature activation+5.85e-02

Pos logit contributions
abella+0.589
top+0.552
bench+0.523
 wrong+0.503
box+0.438
Neg logit contributions
hr-0.883
 accidents-0.784
 exc-0.778
 tou-0.761
oc-0.761
Token on
Token ID319
Feature activation+3.90e-01

Pos logit contributions
hr+0.082
oc+0.069
 exc+0.068
 tou+0.067
EX+0.067
Neg logit contributions
abella-0.096
top-0.090
 wrong-0.088
bench-0.086
box-0.077
Token paper
Token ID3348
Feature activation-1.65e-01

Pos logit contributions
hr+0.515
oc+0.426
 tou+0.421
 exc+0.416
EX+0.416
Neg logit contributions
abella-0.672
top-0.638
 wrong-0.625
bench-0.606
box-0.551
Token.
Token ID13
Feature activation+2.47e-01

Pos logit contributions
abella+0.287
top+0.269
 wrong+0.261
bench+0.256
box+0.231
Neg logit contributions
hr-0.215
oc-0.179
 exc-0.176
 tou-0.174
 accidents-0.174
Token You
Token ID921
Feature activation-7.72e-01

Pos logit contributions
hr+0.383
oc+0.323
 tou+0.321
 exc+0.319
 accidents+0.318
Neg logit contributions
abella-0.371
top-0.346
 wrong-0.333
bench-0.325
box-0.290
Token can
Token ID460
Feature activation-2.78e+00

Pos logit contributions
abella+1.239
top+1.145
 wrong+1.087
bench+1.084
box+0.974
Neg logit contributions
hr-1.133
 exc-0.997
oc-0.976
 tou-0.963
AAAA-0.942
Token not
Token ID407
Feature activation-1.13e+00

Pos logit contributions
abella+5.410
bench+5.034
top+4.761
 wrong+4.719
 projector+4.405
Neg logit contributions
hr-3.456
 exc-3.368
 from-2.795
 mis-2.743
 ac-2.689
Token draw
Token ID3197
Feature activation+5.19e-01

Pos logit contributions
abella+2.083
top+1.873
bench+1.839
 wrong+1.813
box+1.656
Neg logit contributions
hr-1.467
 exc-1.270
 tou-1.188
oc-1.168
 accidents-1.153
Token on
Token ID319
Feature activation+3.06e-01

Pos logit contributions
hr+0.767
oc+0.645
 tou+0.637
EX+0.635
 exc+0.635
Neg logit contributions
abella-0.820
top-0.769
 wrong-0.769
bench-0.722
box-0.661
Token other
Token ID584
Feature activation+1.98e-01

Pos logit contributions
hr+0.393
oc+0.324
 exc+0.320
 tou+0.320
EX+0.316
Neg logit contributions
abella-0.539
top-0.509
 wrong-0.497
bench-0.482
box-0.441
Token people
Token ID661
Feature activation+1.48e-01

Pos logit contributions
hr+0.281
oc+0.236
 exc+0.233
 tou+0.233
EX+0.231
Neg logit contributions
abella-0.322
top-0.305
 wrong-0.296
bench-0.288
box-0.261
Token
Token ID198
Feature activation-1.37e+00

Pos logit contributions
abella+2.341
top+2.262
 wrong+2.216
bench+2.180
box+2.007
Neg logit contributions
hr-1.365
oc-1.191
 exc-1.145
Ps-1.104
Upon-1.080
TokenOne
Token ID3198
Feature activation-2.07e-01

Pos logit contributions
abella+1.860
top+1.735
 wrong+1.705
bench+1.701
box+1.504
Neg logit contributions
hr-2.236
oc-2.039
Ps-1.997
 exc-1.975
AAAA-1.961
Token day
Token ID1110
Feature activation-2.38e-01

Pos logit contributions
abella+0.245
top+0.223
 wrong+0.211
bench+0.208
box+0.178
Neg logit contributions
hr-0.391
oc-0.345
 exc-0.344
 tou-0.338
 accidents-0.336
Token,
Token ID11
Feature activation-4.07e-01

Pos logit contributions
abella+0.393
top+0.372
 wrong+0.357
bench+0.356
box+0.317
Neg logit contributions
hr-0.329
oc-0.276
 exc-0.274
AAAA-0.271
 accidents-0.270
Token they
Token ID484
Feature activation-1.01e-01

Pos logit contributions
abella+0.559
top+0.526
bench+0.504
 wrong+0.502
box+0.440
Neg logit contributions
hr-0.670
 exc-0.578
oc-0.573
 accidents-0.571
AAAA-0.566
Token found
Token ID1043
Feature activation-2.77e+00

Pos logit contributions
abella+0.176
top+0.166
 wrong+0.161
bench+0.158
box+0.143
Neg logit contributions
hr-0.132
oc-0.109
 exc-0.108
 tou-0.107
Ps-0.106
Token a
Token ID257
Feature activation-8.09e-01

Pos logit contributions
bench+5.368
abella+5.314
top+5.057
ven+4.601
 Dame+4.591
Neg logit contributions
 from-2.847
hr-2.815
 accidents-2.712
 little-2.694
 pets-2.517
Token big
Token ID1263
Feature activation-1.08e+00

Pos logit contributions
abella+1.540
top+1.444
bench+1.377
 wrong+1.338
 versa+1.221
Neg logit contributions
hr-0.942
 tou-0.780
 exc-0.768
oc-0.766
 accidents-0.750
Token fan
Token ID4336
Feature activation-4.52e-01

Pos logit contributions
abella+1.746
top+1.555
bench+1.440
 wrong+1.423
 versa+1.268
Neg logit contributions
hr-1.627
oc-1.447
 tou-1.418
 accidents-1.416
 exc-1.392
Token in
Token ID287
Feature activation+3.51e-01

Pos logit contributions
abella+0.762
top+0.698
 wrong+0.674
bench+0.668
box+0.592
Neg logit contributions
hr-0.632
oc-0.527
 exc-0.519
 tou-0.517
EX-0.510
Token the
Token ID262
Feature activation-1.71e+00

Pos logit contributions
hr+0.443
oc+0.367
 tou+0.360
 exc+0.358
EX+0.354
Neg logit contributions
abella-0.632
top-0.601
 wrong-0.579
bench-0.572
box-0.519
Token
Token ID198
Feature activation-7.92e-01

Pos logit contributions
abella+1.684
top+1.615
 wrong+1.599
bench+1.571
box+1.453
Neg logit contributions
hr-1.150
 exc-0.983
oc-0.980
Ps-0.945
AAAA-0.944
TokenOne
Token ID3198
Feature activation-5.13e-01

Pos logit contributions
abella+1.045
top+0.972
 wrong+0.965
bench+0.950
box+0.848
Neg logit contributions
hr-1.350
oc-1.191
 exc-1.185
AAAA-1.164
Ps-1.156
Token day
Token ID1110
Feature activation+2.89e-01

Pos logit contributions
abella+0.639
top+0.582
bench+0.549
 wrong+0.549
box+0.479
Neg logit contributions
hr-0.939
 exc-0.835
oc-0.820
 tou-0.804
 accidents-0.801
Token,
Token ID11
Feature activation+4.92e-01

Pos logit contributions
hr+0.434
oc+0.370
 tou+0.363
 exc+0.362
EX+0.361
Neg logit contributions
abella-0.454
top-0.422
 wrong-0.411
bench-0.394
box-0.357
Token they
Token ID484
Feature activation+4.55e-02

Pos logit contributions
hr+0.853
oc+0.756
EX+0.737
 tou+0.734
AAAA+0.728
Neg logit contributions
abella-0.670
top-0.608
 wrong-0.586
bench-0.549
box-0.486
Token find
Token ID1064
Feature activation-2.75e+00

Pos logit contributions
hr+0.064
oc+0.054
 exc+0.053
 tou+0.053
EX+0.052
Neg logit contributions
abella-0.075
top-0.071
 wrong-0.068
bench-0.067
box-0.060
Token a
Token ID257
Feature activation-1.33e+00

Pos logit contributions
bench+5.225
top+5.049
abella+5.044
ven+4.427
 wrong+4.404
Neg logit contributions
 little-3.025
hr-2.935
 from-2.864
 accidents-2.664
AAAA-2.596
Token new
Token ID649
Feature activation-1.30e+00

Pos logit contributions
abella+2.636
top+2.482
bench+2.408
 wrong+2.255
 versa+2.127
Neg logit contributions
hr-1.461
 tou-1.240
 exc-1.206
 accidents-1.163
 mis-1.153
Token soap
Token ID19533
Feature activation-8.84e-01

Pos logit contributions
abella+2.418
top+2.201
bench+2.118
 wrong+2.061
 versa+1.909
Neg logit contributions
hr-1.654
 exc-1.399
oc-1.347
 accidents-1.342
 tou-1.332
Token in
Token ID287
Feature activation+5.23e-01

Pos logit contributions
abella+1.534
top+1.405
bench+1.337
 wrong+1.324
 versa+1.193
Neg logit contributions
hr-1.168
 exc-0.985
 tou-0.978
oc-0.974
 accidents-0.967
Token the
Token ID262
Feature activation-1.78e+00

Pos logit contributions
hr+0.690
oc+0.581
 tou+0.565
Ps+0.562
EX+0.560
Neg logit contributions
abella-0.911
top-0.867
 wrong-0.840
bench-0.826
box-0.748
Token day
Token ID1110
Feature activation+1.77e-01

Pos logit contributions
abella+0.256
top+0.233
 wrong+0.220
bench+0.217
box+0.186
Neg logit contributions
hr-0.409
oc-0.360
 exc-0.360
 tou-0.355
 accidents-0.351
Token,
Token ID11
Feature activation+1.36e-01

Pos logit contributions
hr+0.262
oc+0.223
 tou+0.219
 exc+0.219
EX+0.217
Neg logit contributions
abella-0.281
top-0.263
 wrong-0.255
bench-0.247
box-0.223
Token Tom
Token ID4186
Feature activation+3.29e-01

Pos logit contributions
hr+0.240
oc+0.210
 tou+0.207
 exc+0.207
EX+0.205
Neg logit contributions
abella-0.176
top-0.162
 wrong-0.155
bench-0.150
box-0.131
Token and
Token ID290
Feature activation-1.21e-01

Pos logit contributions
hr+0.432
oc+0.353
 exc+0.347
 tou+0.346
EX+0.346
Neg logit contributions
abella-0.572
top-0.541
 wrong-0.530
bench-0.515
box-0.469
Token Mia
Token ID32189
Feature activation-3.67e-01

Pos logit contributions
abella+0.176
top+0.165
 wrong+0.159
bench+0.156
box+0.138
Neg logit contributions
hr-0.194
oc-0.166
 exc-0.165
 tou-0.163
EX-0.162
Token find
Token ID1064
Feature activation-2.74e+00

Pos logit contributions
abella+0.654
top+0.609
 wrong+0.590
bench+0.586
box+0.533
Neg logit contributions
hr-0.467
 exc-0.391
oc-0.386
 tou-0.386
Ps-0.379
Token a
Token ID257
Feature activation-6.83e-01

Pos logit contributions
abella+5.127
bench+5.118
top+5.053
ven+4.375
 versa+4.316
Neg logit contributions
 little-2.899
 from-2.830
hr-2.822
 accidents-2.671
 tou-2.639
Token big
Token ID1263
Feature activation-7.76e-01

Pos logit contributions
abella+1.327
top+1.251
bench+1.197
 wrong+1.166
box+1.065
Neg logit contributions
hr-0.760
 tou-0.632
 exc-0.621
oc-0.607
 accidents-0.601
Token box
Token ID3091
Feature activation-6.03e-01

Pos logit contributions
abella+1.252
top+1.130
bench+1.050
 wrong+1.050
liv+0.913
Neg logit contributions
hr-1.145
oc-1.005
 tou-0.995
 exc-0.986
 accidents-0.976
Token in
Token ID287
Feature activation+6.43e-01

Pos logit contributions
abella+1.092
top+1.000
 wrong+0.957
bench+0.947
 versa+0.859
Neg logit contributions
hr-0.764
 exc-0.635
oc-0.620
 tou-0.617
EX-0.608
Token the
Token ID262
Feature activation-1.27e+00

Pos logit contributions
hr+0.836
oc+0.701
 tou+0.686
AAAA+0.686
Ps+0.682
Neg logit contributions
abella-1.136
top-1.084
 wrong-1.048
bench-1.033
box-0.933
Token.
Token ID13
Feature activation-1.44e+00

Pos logit contributions
abella+3.187
top+2.878
 wrong+2.754
bench+2.721
liv+2.584
Neg logit contributions
hr-1.537
 from-1.347
oc-1.338
 exc-1.325
 tou-1.222
Token One
Token ID1881
Feature activation-1.82e-01

Pos logit contributions
abella+2.036
top+1.850
bench+1.842
 wrong+1.824
box+1.665
Neg logit contributions
hr-2.269
 exc-2.104
Ps-2.090
oc-2.085
AAAA-1.998
Token day
Token ID1110
Feature activation-2.38e-01

Pos logit contributions
abella+0.215
top+0.195
 wrong+0.185
bench+0.181
box+0.155
Neg logit contributions
hr-0.344
oc-0.304
 exc-0.304
 tou-0.300
EX-0.297
Token,
Token ID11
Feature activation-4.05e-01

Pos logit contributions
abella+0.391
top+0.369
 wrong+0.355
bench+0.353
box+0.314
Neg logit contributions
hr-0.333
oc-0.278
 exc-0.278
AAAA-0.273
 accidents-0.273
Token they
Token ID484
Feature activation-2.36e-01

Pos logit contributions
abella+0.525
top+0.490
bench+0.468
 wrong+0.467
box+0.403
Neg logit contributions
hr-0.704
 exc-0.612
 accidents-0.605
oc-0.603
 tou-0.598
Token find
Token ID1064
Feature activation-2.74e+00

Pos logit contributions
abella+0.409
top+0.383
 wrong+0.372
bench+0.366
box+0.331
Neg logit contributions
hr-0.313
 exc-0.262
oc-0.260
 tou-0.258
Ps-0.254
Token a
Token ID257
Feature activation-7.01e-01

Pos logit contributions
abella+5.128
bench+5.041
top+5.039
 wrong+4.364
ven+4.342
Neg logit contributions
 little-2.871
hr-2.816
 from-2.809
 accidents-2.598
AAAA-2.586
Token big
Token ID1263
Feature activation-8.85e-01

Pos logit contributions
abella+1.313
top+1.232
bench+1.179
 wrong+1.143
box+1.042
Neg logit contributions
hr-0.834
 tou-0.708
 exc-0.691
oc-0.683
Ps-0.672
Token box
Token ID3091
Feature activation-5.37e-01

Pos logit contributions
abella+1.394
top+1.242
bench+1.158
 wrong+1.145
 versa+1.005
Neg logit contributions
hr-1.349
oc-1.201
 tou-1.184
 exc-1.167
 accidents-1.165
Token in
Token ID287
Feature activation+6.46e-01

Pos logit contributions
abella+0.973
top+0.894
 wrong+0.862
bench+0.849
 Luke+0.766
Neg logit contributions
hr-0.678
 exc-0.562
oc-0.550
 tou-0.548
EX-0.539
Token the
Token ID262
Feature activation-1.28e+00

Pos logit contributions
hr+0.831
oc+0.696
AAAA+0.678
 tou+0.677
Ps+0.674
Neg logit contributions
abella-1.149
top-1.096
 wrong-1.061
bench-1.046
box-0.942
Token They
Token ID1119
Feature activation-8.26e-01

Pos logit contributions
abella+3.099
 wrong+2.876
top+2.796
bench+2.639
box+2.475
Neg logit contributions
hr-2.070
oc-1.820
Ps-1.811
AAAA-1.759
 exc-1.753
Token were
Token ID547
Feature activation-1.08e+00

Pos logit contributions
abella+1.343
top+1.254
bench+1.215
 wrong+1.202
box+1.093
Neg logit contributions
hr-1.165
 exc-1.042
 tou-0.992
oc-0.989
Ps-0.982
Token always
Token ID1464
Feature activation-1.31e-01

Pos logit contributions
abella+1.750
top+1.668
bench+1.619
 wrong+1.534
box+1.389
Neg logit contributions
hr-1.580
 exc-1.402
 tou-1.336
 mis-1.259
 fortunate-1.258
Token playing
Token ID2712
Feature activation-7.41e-01

Pos logit contributions
abella+0.210
top+0.197
 wrong+0.189
bench+0.186
box+0.166
Neg logit contributions
hr-0.193
 exc-0.161
oc-0.160
 tou-0.159
 accidents-0.158
Token together
Token ID1978
Feature activation-1.24e+00

Pos logit contributions
abella+1.250
top+1.152
bench+1.103
 wrong+1.090
box+0.958
Neg logit contributions
hr-1.051
oc-0.903
AAAA-0.870
 exc-0.870
 accidents-0.853
Token and
Token ID290
Feature activation-2.74e+00

Pos logit contributions
abella+2.442
top+2.230
bench+2.212
 wrong+2.115
liv+1.969
Neg logit contributions
hr-1.384
 exc-1.111
oc-1.108
 accidents-1.059
 from-1.056
Token they
Token ID484
Feature activation-1.21e+00

Pos logit contributions
abella+5.308
bench+4.961
top+4.925
 wrong+4.601
box+4.472
Neg logit contributions
hr-3.536
 exc-2.976
 accidents-2.784
 mis-2.682
Ps-2.618
Token were
Token ID547
Feature activation-8.20e-01

Pos logit contributions
abella+1.989
top+1.881
bench+1.844
 wrong+1.805
box+1.670
Neg logit contributions
hr-1.663
 exc-1.502
oc-1.401
 tou-1.397
Ps-1.375
Token very
Token ID845
Feature activation-5.13e-01

Pos logit contributions
abella+1.303
top+1.239
bench+1.199
 wrong+1.150
box+1.061
Neg logit contributions
hr-1.223
 exc-1.057
 tou-1.002
 fortunate-0.984
 accidents-0.976
Token good
Token ID922
Feature activation-3.22e-01

Pos logit contributions
abella+1.067
top+1.013
bench+0.982
 wrong+0.979
box+0.918
Neg logit contributions
hr-0.514
 exc-0.418
oc-0.382
 fortunate-0.378
 tou-0.376
Token at
Token ID379
Feature activation-1.31e+00

Pos logit contributions
abella+0.692
top+0.661
 wrong+0.641
bench+0.638
box+0.587
Neg logit contributions
hr-0.297
 exc-0.220
oc-0.218
 tou-0.213
 accidents-0.212
Token
Token ID198
Feature activation-1.46e+00

Pos logit contributions
abella+3.103
top+3.014
bench+2.949
 wrong+2.942
box+2.743
Neg logit contributions
hr-1.677
oc-1.465
Ps-1.430
 exc-1.401
AAAA-1.377
TokenOne
Token ID3198
Feature activation-4.12e-01

Pos logit contributions
abella+1.875
bench+1.750
top+1.738
 wrong+1.702
box+1.561
Neg logit contributions
hr-2.496
Ps-2.246
oc-2.245
AAAA-2.206
 exc-2.196
Token day
Token ID1110
Feature activation-1.69e-01

Pos logit contributions
abella+0.509
top+0.461
 wrong+0.433
bench+0.431
box+0.372
Neg logit contributions
hr-0.764
 exc-0.677
oc-0.671
 accidents-0.655
 tou-0.655
Token,
Token ID11
Feature activation-3.11e-01

Pos logit contributions
abella+0.277
top+0.261
 wrong+0.252
bench+0.250
box+0.223
Neg logit contributions
hr-0.238
oc-0.199
 exc-0.197
AAAA-0.195
 tou-0.195
Token they
Token ID484
Feature activation-6.84e-01

Pos logit contributions
abella+0.411
top+0.385
 wrong+0.367
bench+0.367
box+0.318
Neg logit contributions
hr-0.533
 exc-0.460
oc-0.457
 accidents-0.456
 tou-0.452
Token find
Token ID1064
Feature activation-2.72e+00

Pos logit contributions
abella+1.247
top+1.161
 wrong+1.115
bench+1.113
box+1.012
Neg logit contributions
hr-0.845
 exc-0.716
 tou-0.698
Ps-0.695
AAAA-0.681
Token a
Token ID257
Feature activation-5.54e-01

Pos logit contributions
abella+5.015
bench+5.011
top+4.882
ven+4.257
rob+4.248
Neg logit contributions
hr-2.949
 little-2.929
 from-2.829
 accidents-2.731
AAAA-2.531
Token yellow
Token ID7872
Feature activation-2.51e-01

Pos logit contributions
abella+1.025
top+0.963
bench+0.915
 wrong+0.900
box+0.814
Neg logit contributions
hr-0.672
 tou-0.557
 exc-0.553
oc-0.545
 accidents-0.538
Token box
Token ID3091
Feature activation-4.97e-01

Pos logit contributions
abella+0.419
top+0.389
 wrong+0.376
bench+0.372
box+0.329
Neg logit contributions
hr-0.346
oc-0.295
 tou-0.289
 exc-0.288
 accidents-0.284
Token in
Token ID287
Feature activation+5.21e-01

Pos logit contributions
abella+0.886
top+0.816
 wrong+0.782
bench+0.772
 versa+0.693
Neg logit contributions
hr-0.639
 exc-0.529
oc-0.518
 tou-0.517
EX-0.511
Token their
Token ID511
Feature activation-2.90e-01

Pos logit contributions
hr+0.667
oc+0.558
 tou+0.543
AAAA+0.541
 exc+0.538
Neg logit contributions
abella-0.927
top-0.885
 wrong-0.857
bench-0.841
box-0.762
Token.
Token ID13
Feature activation-1.38e+00

Pos logit contributions
abella+1.470
top+1.353
 wrong+1.293
bench+1.291
box+1.188
Neg logit contributions
hr-0.942
oc-0.785
 exc-0.781
 accidents-0.764
 tou-0.752
Token One
Token ID1881
Feature activation-5.30e-01

Pos logit contributions
abella+1.834
bench+1.711
top+1.710
 wrong+1.679
box+1.538
Neg logit contributions
hr-2.214
 exc-2.032
Ps-2.027
oc-1.992
AAAA-1.941
Token day
Token ID1110
Feature activation-7.29e-02

Pos logit contributions
abella+0.672
top+0.603
bench+0.570
 wrong+0.566
box+0.486
Neg logit contributions
hr-0.963
 exc-0.859
oc-0.854
 tou-0.831
 accidents-0.826
Token,
Token ID11
Feature activation-2.85e-01

Pos logit contributions
abella+0.118
top+0.111
 wrong+0.107
bench+0.106
box+0.094
Neg logit contributions
hr-0.104
oc-0.088
 exc-0.087
 tou-0.086
AAAA-0.086
Token they
Token ID484
Feature activation-4.83e-01

Pos logit contributions
abella+0.368
top+0.343
 wrong+0.327
bench+0.326
box+0.280
Neg logit contributions
hr-0.498
 exc-0.432
oc-0.427
 accidents-0.427
 tou-0.425
Token find
Token ID1064
Feature activation-2.72e+00

Pos logit contributions
abella+0.860
top+0.802
 wrong+0.778
bench+0.767
box+0.697
Neg logit contributions
hr-0.615
 exc-0.522
 tou-0.509
oc-0.508
Ps-0.502
Token a
Token ID257
Feature activation-8.03e-01

Pos logit contributions
bench+5.101
abella+5.012
top+4.964
 wrong+4.370
ven+4.303
Neg logit contributions
 little-2.892
hr-2.874
 from-2.803
 accidents-2.594
 tou-2.575
Token big
Token ID1263
Feature activation-9.78e-01

Pos logit contributions
abella+1.534
top+1.441
bench+1.384
 wrong+1.333
box+1.221
Neg logit contributions
hr-0.930
 tou-0.783
 exc-0.762
oc-0.753
 accidents-0.740
Token box
Token ID3091
Feature activation-5.53e-01

Pos logit contributions
abella+1.530
top+1.358
bench+1.267
 wrong+1.249
 versa+1.095
Neg logit contributions
hr-1.511
oc-1.349
 tou-1.326
 accidents-1.306
 exc-1.305
Token in
Token ID287
Feature activation+7.11e-01

Pos logit contributions
abella+1.019
top+0.937
 wrong+0.906
bench+0.890
 versa+0.804
Neg logit contributions
hr-0.684
 exc-0.562
oc-0.551
 tou-0.548
EX-0.541
Token the
Token ID262
Feature activation-1.39e+00

Pos logit contributions
hr+0.921
oc+0.778
 tou+0.752
AAAA+0.752
Ps+0.752
Neg logit contributions
abella-1.259
top-1.203
 wrong-1.162
bench-1.147
box-1.030
Token were
Token ID547
Feature activation-8.97e-01

Pos logit contributions
abella+1.443
top+1.356
bench+1.320
 wrong+1.299
box+1.193
Neg logit contributions
hr-1.151
 exc-1.029
oc-0.975
 tou-0.973
Ps-0.973
Token walking
Token ID6155
Feature activation-4.90e-01

Pos logit contributions
abella+1.372
top+1.305
bench+1.258
 wrong+1.199
box+1.084
Neg logit contributions
hr-1.384
 exc-1.224
 tou-1.178
oc-1.136
 fortunate-1.126
Token in
Token ID287
Feature activation+3.77e-01

Pos logit contributions
abella+0.813
top+0.748
 wrong+0.729
bench+0.725
box+0.643
Neg logit contributions
hr-0.699
oc-0.596
 exc-0.583
EX-0.574
 tou-0.573
Token the
Token ID262
Feature activation-9.06e-01

Pos logit contributions
hr+0.469
oc+0.388
 tou+0.380
 exc+0.380
EX+0.375
Neg logit contributions
abella-0.678
top-0.646
 wrong-0.627
bench-0.613
box-0.558
Token park
Token ID3952
Feature activation-1.56e+00

Pos logit contributions
abella+1.481
bench+1.350
top+1.346
 wrong+1.283
box+1.173
Neg logit contributions
hr-1.308
 tou-1.139
Ps-1.119
oc-1.114
AAAA-1.102
Token and
Token ID290
Feature activation-2.72e+00

Pos logit contributions
abella+3.400
bench+3.115
top+2.976
 wrong+2.976
acey+2.852
Neg logit contributions
hr-1.577
 exc-1.200
oc-1.167
 from-1.085
EX-1.018
Token stopped
Token ID5025
Feature activation+4.71e-01

Pos logit contributions
bench+4.932
top+4.393
abella+4.391
 wrong+4.246
box+4.069
Neg logit contributions
hr-3.826
 pets-3.076
Ps-3.033
 from-3.027
 mis-2.960
Token in
Token ID287
Feature activation+2.47e-02

Pos logit contributions
hr+0.752
oc+0.656
 tou+0.641
 exc+0.636
EX+0.632
Neg logit contributions
abella-0.670
top-0.640
 wrong-0.624
bench-0.586
box-0.527
Token front
Token ID2166
Feature activation-1.84e-01

Pos logit contributions
hr+0.035
oc+0.030
 exc+0.029
 tou+0.029
EX+0.029
Neg logit contributions
abella-0.040
top-0.038
 wrong-0.037
bench-0.036
box-0.032
Token of
Token ID286
Feature activation-9.00e-01

Pos logit contributions
abella+0.369
top+0.347
 wrong+0.339
bench+0.335
box+0.307
Neg logit contributions
hr-0.194
oc-0.150
 exc-0.149
 tou-0.147
 accidents-0.146
Token a
Token ID257
Feature activation-5.06e-01

Pos logit contributions
abella+1.459
bench+1.384
top+1.379
 wrong+1.371
box+1.221
Neg logit contributions
hr-1.292
oc-1.058
 tou-1.043
 exc-1.029
EX-1.025
Tokenmy
Token ID1820
Feature activation-4.61e-01

Pos logit contributions
abella+1.067
top+0.966
 wrong+0.933
bench+0.921
 versa+0.813
Neg logit contributions
hr-0.961
oc-0.817
 exc-0.803
EX-0.785
 tou-0.784
Token.
Token ID13
Feature activation-8.10e-01

Pos logit contributions
abella+0.898
top+0.838
 wrong+0.812
bench+0.801
box+0.725
Neg logit contributions
hr-0.519
oc-0.419
 exc-0.411
 accidents-0.398
 tou-0.396
Token They
Token ID1119
Feature activation-1.96e+00

Pos logit contributions
abella+1.072
top+0.974
 wrong+0.965
bench+0.943
box+0.808
Neg logit contributions
hr-1.372
 exc-1.221
oc-1.217
Ps-1.214
EX-1.204
Token would
Token ID561
Feature activation-1.83e+00

Pos logit contributions
abella+2.374
top+2.303
 wrong+2.276
bench+2.206
box+1.992
Neg logit contributions
hr-3.431
 exc-3.344
Ps-3.197
 mis-3.121
 tou-3.080
Token stretch
Token ID7539
Feature activation-1.44e+00

Pos logit contributions
abella+3.537
top+3.268
 wrong+3.261
bench+3.240
box+3.004
Neg logit contributions
hr-2.151
 exc-2.017
 mis-1.762
Ps-1.696
oc-1.696
Token and
Token ID290
Feature activation-2.72e+00

Pos logit contributions
abella+2.848
top+2.666
 wrong+2.606
bench+2.591
box+2.337
Neg logit contributions
hr-1.562
 exc-1.275
 from-1.249
oc-1.238
 accidents-1.192
Token move
Token ID1445
Feature activation-8.16e-01

Pos logit contributions
abella+5.875
 wrong+5.303
bench+5.125
top+5.078
box+4.936
Neg logit contributions
hr-2.974
 exc-2.640
 mis-2.507
Ps-2.214
oc-2.187
Token their
Token ID511
Feature activation+2.29e-01

Pos logit contributions
abella+1.409
top+1.311
bench+1.279
 wrong+1.269
box+1.163
Neg logit contributions
hr-1.081
oc-0.899
 exc-0.889
 accidents-0.888
EX-0.869
Token bodies
Token ID5920
Feature activation-1.52e+00

Pos logit contributions
hr+0.327
oc+0.277
 tou+0.271
 exc+0.271
AAAA+0.268
Neg logit contributions
abella-0.372
top-0.351
 wrong-0.342
bench-0.333
box-0.301
Token in
Token ID287
Feature activation+1.72e-01

Pos logit contributions
abella+3.039
top+2.658
 wrong+2.555
bench+2.532
liv+2.391
Neg logit contributions
hr-1.739
 exc-1.489
 from-1.426
rum-1.291
oc-1.280
Token fun
Token ID1257
Feature activation-4.27e-01

Pos logit contributions
hr+0.219
oc+0.181
 tou+0.178
 exc+0.177
EX+0.175
Neg logit contributions
abella-0.306
top-0.290
 wrong-0.282
bench-0.276
box-0.251
Token
Token ID198
Feature activation-1.46e+00

Pos logit contributions
abella+2.678
top+2.600
 wrong+2.553
bench+2.513
box+2.352
Neg logit contributions
hr-1.506
oc-1.314
 exc-1.254
Ps-1.233
AAAA-1.209
TokenOne
Token ID3198
Feature activation-1.52e-01

Pos logit contributions
abella+1.900
top+1.782
bench+1.778
 wrong+1.777
box+1.613
Neg logit contributions
hr-2.456
oc-2.238
Ps-2.186
AAAA-2.153
 exc-2.146
Token day
Token ID1110
Feature activation-3.27e-01

Pos logit contributions
abella+0.178
top+0.162
 wrong+0.153
bench+0.150
box+0.128
Neg logit contributions
hr-0.290
oc-0.256
 exc-0.255
 tou-0.251
 accidents-0.249
Token,
Token ID11
Feature activation-3.65e-01

Pos logit contributions
abella+0.546
top+0.517
bench+0.498
 wrong+0.496
box+0.443
Neg logit contributions
hr-0.448
oc-0.373
 exc-0.370
AAAA-0.367
 accidents-0.366
Token they
Token ID484
Feature activation-1.19e-01

Pos logit contributions
abella+0.498
top+0.470
bench+0.449
 wrong+0.448
box+0.391
Neg logit contributions
hr-0.609
 exc-0.522
oc-0.520
 accidents-0.518
AAAA-0.513
Token find
Token ID1064
Feature activation-2.72e+00

Pos logit contributions
abella+0.200
top+0.188
 wrong+0.182
bench+0.178
box+0.161
Neg logit contributions
hr-0.164
oc-0.137
 exc-0.137
 tou-0.135
AAAA-0.134
Token a
Token ID257
Feature activation-8.54e-01

Pos logit contributions
abella+5.199
top+4.992
bench+4.961
liv+4.363
 wrong+4.359
Neg logit contributions
 little-2.936
hr-2.878
 from-2.766
 accidents-2.660
 pets-2.509
Token new
Token ID649
Feature activation-8.73e-01

Pos logit contributions
abella+1.650
top+1.535
bench+1.454
 wrong+1.420
liv+1.323
Neg logit contributions
hr-0.989
 tou-0.825
 exc-0.825
oc-0.801
 accidents-0.784
Token rocket
Token ID10701
Feature activation+3.51e-01

Pos logit contributions
abella+1.362
top+1.205
 wrong+1.133
bench+1.122
liv+1.005
Neg logit contributions
hr-1.355
oc-1.198
 exc-1.197
 tou-1.160
EX-1.143
Token in
Token ID287
Feature activation+2.80e-02

Pos logit contributions
hr+0.523
oc+0.440
 exc+0.434
 tou+0.433
EX+0.432
Neg logit contributions
abella-0.543
top-0.516
 wrong-0.505
bench-0.488
box-0.443
Token their
Token ID511
Feature activation-9.11e-01

Pos logit contributions
hr+0.034
oc+0.028
 exc+0.028
 tou+0.027
EX+0.027
Neg logit contributions
abella-0.051
top-0.049
 wrong-0.047
bench-0.046
box-0.042
Token
Token ID198
Feature activation+1.81e-01

Pos logit contributions
abella+1.177
top+1.105
 wrong+1.091
bench+1.065
box+0.975
Neg logit contributions
hr-0.835
oc-0.709
 exc-0.704
Ps-0.689
AAAA-0.689
TokenOne
Token ID3198
Feature activation-1.31e-01

Pos logit contributions
hr+0.363
oc+0.320
 tou+0.318
 exc+0.317
 accidents+0.316
Neg logit contributions
abella-0.191
top-0.174
 wrong-0.164
bench-0.158
box-0.132
Token day
Token ID1110
Feature activation+5.31e-01

Pos logit contributions
abella+0.154
top+0.140
 wrong+0.133
bench+0.130
box+0.111
Neg logit contributions
hr-0.249
oc-0.220
 exc-0.219
 tou-0.216
 accidents-0.214
Token,
Token ID11
Feature activation+7.51e-01

Pos logit contributions
hr+0.830
oc+0.711
 tou+0.697
EX+0.695
 exc+0.689
Neg logit contributions
abella-0.798
top-0.746
 wrong-0.733
bench-0.685
box-0.624
Token they
Token ID484
Feature activation-3.74e-01

Pos logit contributions
hr+1.318
oc+1.168
 tou+1.134
EX+1.126
rum+1.120
Neg logit contributions
abella-1.006
top-0.926
 wrong-0.901
bench-0.817
box-0.729
Token find
Token ID1064
Feature activation-2.71e+00

Pos logit contributions
abella+0.647
top+0.605
 wrong+0.585
bench+0.576
box+0.521
Neg logit contributions
hr-0.497
 exc-0.420
oc-0.413
 tou-0.412
AAAA-0.407
Token a
Token ID257
Feature activation-8.06e-01

Pos logit contributions
bench+5.071
abella+4.983
top+4.947
 wrong+4.333
 versa+4.318
Neg logit contributions
 little-2.941
hr-2.922
 from-2.722
 accidents-2.598
 tou-2.543
Token new
Token ID649
Feature activation-1.02e+00

Pos logit contributions
abella+1.541
top+1.448
bench+1.387
 wrong+1.344
 versa+1.235
Neg logit contributions
hr-0.932
 tou-0.776
 exc-0.761
oc-0.759
AAAA-0.732
Token truck
Token ID7779
Feature activation+3.03e-01

Pos logit contributions
abella+1.910
top+1.718
bench+1.652
 wrong+1.642
 versa+1.530
Neg logit contributions
hr-1.266
oc-1.122
 exc-1.064
 tou-1.019
EX-1.018
Token in
Token ID287
Feature activation+6.12e-01

Pos logit contributions
hr+0.443
oc+0.372
 exc+0.367
AAAA+0.366
 tou+0.365
Neg logit contributions
abella-0.480
top-0.455
 wrong-0.443
bench-0.430
box-0.389
Token the
Token ID262
Feature activation-1.27e+00

Pos logit contributions
hr+0.796
oc+0.670
Ps+0.647
EX+0.647
 tou+0.646
Neg logit contributions
abella-1.074
top-1.022
 wrong-0.989
bench-0.975
box-0.883
Token mask
Token ID9335
Feature activation-6.64e-01

Pos logit contributions
abella+0.884
top+0.815
 wrong+0.783
bench+0.774
box+0.671
Neg logit contributions
hr-0.792
oc-0.690
 tou-0.673
 exc-0.670
EX-0.659
Token.
Token ID13
Feature activation-7.07e-01

Pos logit contributions
abella+1.188
top+1.108
 wrong+1.066
bench+1.057
box+0.947
Neg logit contributions
hr-0.823
oc-0.688
 exc-0.678
 tou-0.673
EX-0.659
Token They
Token ID1119
Feature activation-9.46e-01

Pos logit contributions
abella+1.168
top+1.105
 wrong+1.087
bench+1.062
box+0.960
Neg logit contributions
hr-0.943
oc-0.805
Ps-0.801
 exc-0.797
AAAA-0.794
Token ran
Token ID4966
Feature activation-6.80e-01

Pos logit contributions
abella+1.634
top+1.548
 wrong+1.546
bench+1.483
box+1.340
Neg logit contributions
hr-1.215
 exc-1.082
oc-1.034
 tou-1.010
Ps-0.995
Token around
Token ID1088
Feature activation-5.09e-01

Pos logit contributions
abella+1.037
top+0.962
 wrong+0.924
bench+0.917
box+0.820
Neg logit contributions
hr-1.027
oc-0.879
 exc-0.864
 tou-0.861
EX-0.860
Token and
Token ID290
Feature activation-2.71e+00

Pos logit contributions
abella+1.050
top+0.991
 wrong+0.962
bench+0.959
box+0.878
Neg logit contributions
hr-0.522
oc-0.402
 exc-0.391
 accidents-0.386
AAAA-0.378
Token made
Token ID925
Feature activation-5.95e-01

Pos logit contributions
abella+5.085
top+4.851
 wrong+4.727
bench+4.692
box+4.457
Neg logit contributions
hr-3.208
 exc-2.694
 from-2.632
 mis-2.622
 pets-2.594
Token loud
Token ID7812
Feature activation+5.36e-01

Pos logit contributions
abella+1.049
top+1.001
bench+0.965
 wrong+0.961
box+0.870
Neg logit contributions
hr-0.759
oc-0.617
 accidents-0.615
 exc-0.614
 tou-0.598
Token noises
Token ID26782
Feature activation-3.35e-01

Pos logit contributions
hr+0.631
 tou+0.512
 exc+0.509
EX+0.506
AAAA+0.499
Neg logit contributions
abella-0.989
top-0.943
 wrong-0.928
bench-0.898
box-0.817
Token.
Token ID13
Feature activation-1.10e+00

Pos logit contributions
abella+0.591
top+0.551
 wrong+0.534
bench+0.528
box+0.481
Neg logit contributions
hr-0.436
oc-0.361
 exc-0.356
 tou-0.350
 accidents-0.350
Token
Token ID198
Feature activation-1.16e+00

Pos logit contributions
abella+1.926
top+1.846
bench+1.834
 wrong+1.826
box+1.661
Neg logit contributions
hr-1.332
Ps-1.149
AAAA-1.139
oc-1.119
 exc-1.118
Token the
Token ID262
Feature activation+7.20e-01

Pos logit contributions
abella+0.223
top+0.210
 wrong+0.204
bench+0.201
box+0.182
Neg logit contributions
hr-0.163
oc-0.134
 exc-0.131
 accidents-0.130
 tou-0.130
Token train
Token ID4512
Feature activation-3.72e-01

Pos logit contributions
hr+1.239
 exc+1.068
oc+1.067
rum+1.067
 tou+1.063
Neg logit contributions
abella-0.948
top-0.894
 wrong-0.889
bench-0.832
box-0.747
Token.
Token ID13
Feature activation+9.20e-01

Pos logit contributions
abella+0.637
top+0.587
 wrong+0.572
bench+0.562
box+0.506
Neg logit contributions
hr-0.502
oc-0.419
 exc-0.412
 tou-0.402
EX-0.402
Token
Token ID220
Feature activation+4.35e-01

Pos logit contributions
hr+1.767
 tou+1.530
AAAA+1.513
 accidents+1.510
 exc+1.506
Neg logit contributions
abella-1.089
top-1.013
 wrong-0.933
bench-0.894
box-0.752
Token\n
Token ID198
Feature activation+9.94e-01

Pos logit contributions
hr+0.640
oc+0.538
 tou+0.535
 accidents+0.530
 exc+0.528
Neg logit contributions
abella-0.691
top-0.648
 wrong-0.623
bench-0.611
box-0.553
Token\n
Token ID198
Feature activation+1.83e+00

Pos logit contributions
hr+1.595
 accidents+1.361
 tou+1.361
 fortunate+1.325
EX+1.315
Neg logit contributions
abella-1.484
top-1.369
 wrong-1.297
bench-1.295
box-1.128
TokenIn
Token ID818
Feature activation-8.30e-01

Pos logit contributions
hr+2.717
 fortunate+2.337
 tou+2.310
 accidents+2.245
rum+2.225
Neg logit contributions
abella-2.979
top-2.776
 wrong-2.622
bench-2.601
liv-2.296
Token the
Token ID262
Feature activation-1.04e+00

Pos logit contributions
abella+1.708
top+1.618
 wrong+1.555
bench+1.545
box+1.415
Neg logit contributions
hr-0.842
 exc-0.644
 accidents-0.639
oc-0.629
 tou-0.625
Token end
Token ID886
Feature activation+7.39e-01

Pos logit contributions
abella+1.434
top+1.285
bench+1.228
 wrong+1.205
box+1.070
Neg logit contributions
hr-1.761
oc-1.568
 tou-1.506
EX-1.495
 exc-1.493
Token,
Token ID11
Feature activation+1.02e+00

Pos logit contributions
hr+1.159
 tou+1.007
EX+1.001
oc+0.987
AAAA+0.985
Neg logit contributions
abella-1.099
top-1.022
 wrong-0.997
bench-0.954
box-0.862
Token the
Token ID262
Feature activation-1.38e-01

Pos logit contributions
hr+1.614
oc+1.389
rum+1.374
 tou+1.371
EX+1.365
Neg logit contributions
abella-1.522
 wrong-1.436
top-1.436
bench-1.280
 Luke-1.167
Token tree
Token ID5509
Feature activation-4.04e-01

Pos logit contributions
hr+2.891
rum+2.436
oc+2.428
 accidents+2.369
ge+2.364
Neg logit contributions
 wrong-2.102
abella-2.054
top-1.979
bench-1.819
box-1.647
Token gave
Token ID2921
Feature activation+7.22e-01

Pos logit contributions
abella+0.670
top+0.623
 wrong+0.598
bench+0.594
box+0.530
Neg logit contributions
hr-0.564
oc-0.484
 exc-0.481
 tou-0.472
Ps-0.462
Token nice
Token ID3621
Feature activation+2.28e+00

Pos logit contributions
hr+1.025
oc+0.872
AAAA+0.842
 exc+0.837
Ps+0.835
Neg logit contributions
abella-1.187
top-1.120
 wrong-1.110
bench-1.042
box-0.957
Token shade
Token ID17979
Feature activation-1.46e-01

Pos logit contributions
hr+3.514
ge+2.884
AAAA+2.874
oc+2.867
EX+2.815
Neg logit contributions
top-3.332
abella-3.269
 wrong-3.197
bench-3.032
box-2.963
Token to
Token ID284
Feature activation+2.93e-01

Pos logit contributions
abella+0.213
top+0.197
 wrong+0.190
bench+0.186
box+0.164
Neg logit contributions
hr-0.233
oc-0.201
 exc-0.199
 tou-0.199
EX-0.196
Token everyone
Token ID2506
Feature activation+1.84e+00

Pos logit contributions
hr+0.456
oc+0.388
AAAA+0.381
EX+0.380
 accidents+0.379
Neg logit contributions
abella-0.439
top-0.411
 wrong-0.400
bench-0.387
box-0.344
Token.
Token ID13
Feature activation-2.60e-01

Pos logit contributions
hr+3.054
AAAA+2.797
EX+2.698
Ps+2.661
 accidents+2.655
Neg logit contributions
abella-2.460
top-2.460
 wrong-2.382
bench-2.179
box-1.977
Token Under
Token ID4698
Feature activation-3.13e-01

Pos logit contributions
abella+0.412
top+0.383
 wrong+0.373
bench+0.365
box+0.328
Neg logit contributions
hr-0.385
oc-0.328
 exc-0.327
 tou-0.323
AAAA-0.321
Token the
Token ID262
Feature activation+1.53e+00

Pos logit contributions
abella+0.608
top+0.571
 wrong+0.558
bench+0.553
box+0.507
Neg logit contributions
hr-0.346
oc-0.277
 exc-0.275
 tou-0.273
AAAA-0.268
Token tree
Token ID5509
Feature activation-3.90e-01

Pos logit contributions
hr+2.578
rum+2.157
oc+2.154
AAAA+2.113
 accidents+2.109
Neg logit contributions
 wrong-2.124
top-2.091
abella-2.062
bench-1.870
case-1.715
Token,
Token ID11
Feature activation-7.33e-01

Pos logit contributions
abella+0.664
top+0.615
bench+0.594
 wrong+0.594
box+0.528
Neg logit contributions
hr-0.528
oc-0.445
 exc-0.441
 tou-0.435
Ps-0.426
Token She
Token ID1375
Feature activation+3.41e-01

Pos logit contributions
hr+0.682
oc+0.581
 tou+0.565
 exc+0.563
EX+0.562
Neg logit contributions
abella-0.607
top-0.568
 wrong-0.554
bench-0.533
box-0.471
Token thanked
Token ID26280
Feature activation-5.03e-01

Pos logit contributions
hr+0.522
oc+0.439
EX+0.432
 accidents+0.430
 exc+0.429
Neg logit contributions
abella-0.528
top-0.488
 wrong-0.475
bench-0.460
box-0.411
Token him
Token ID683
Feature activation+3.35e-01

Pos logit contributions
abella+0.998
top+0.957
 wrong+0.918
bench+0.917
box+0.849
Neg logit contributions
hr-0.528
 exc-0.427
 tou-0.427
 accidents-0.421
oc-0.408
Token for
Token ID329
Feature activation+1.45e-01

Pos logit contributions
hr+0.394
oc+0.312
AAAA+0.312
 tou+0.308
EX+0.308
Neg logit contributions
abella-0.634
top-0.602
 wrong-0.593
bench-0.573
box-0.529
Token saving
Token ID8914
Feature activation+4.36e-01

Pos logit contributions
hr+0.202
oc+0.171
 exc+0.167
 tou+0.166
EX+0.165
Neg logit contributions
abella-0.241
top-0.226
 wrong-0.221
bench-0.216
box-0.195
Token her
Token ID607
Feature activation+1.86e+00

Pos logit contributions
hr+0.526
oc+0.431
AAAA+0.422
 exc+0.416
 tou+0.414
Neg logit contributions
abella-0.813
top-0.769
 wrong-0.757
bench-0.732
box-0.668
Token.
Token ID13
Feature activation+2.62e-01

Pos logit contributions
hr+2.958
Ps+2.751
AAAA+2.734
oc+2.594
Fear+2.567
Neg logit contributions
 wrong-2.759
top-2.646
abella-2.585
bench-2.405
box-2.210
Token She
Token ID1375
Feature activation+2.52e-01

Pos logit contributions
hr+0.416
oc+0.353
 tou+0.345
 exc+0.345
EX+0.343
Neg logit contributions
abella-0.391
top-0.366
 wrong-0.355
bench-0.345
box-0.306
Token said
Token ID531
Feature activation-7.84e-01

Pos logit contributions
hr+0.389
oc+0.329
EX+0.323
 exc+0.323
 tou+0.322
Neg logit contributions
abella-0.386
top-0.357
 wrong-0.346
bench-0.337
box-0.301
Token sorry
Token ID7926
Feature activation-1.02e+00

Pos logit contributions
abella+1.276
top+1.235
bench+1.157
 wrong+1.119
box+1.033
Neg logit contributions
hr-1.099
 accidents-0.984
 exc-0.969
 tou-0.948
AAAA-0.926
Token to
Token ID284
Feature activation-3.46e-01

Pos logit contributions
abella+1.767
top+1.666
bench+1.610
 wrong+1.536
box+1.438
Neg logit contributions
hr-1.321
 accidents-1.198
 exc-1.153
oc-1.121
 from-1.091
Token!"
Token ID2474
Feature activation-3.63e-01

Pos logit contributions
hr+0.656
 exc+0.543
EX+0.542
 tou+0.538
oc+0.538
Neg logit contributions
abella-0.865
top-0.804
 wrong-0.793
bench-0.758
box-0.695
Token Ben
Token ID3932
Feature activation+5.27e-01

Pos logit contributions
abella+0.385
top+0.353
 wrong+0.331
bench+0.327
box+0.276
Neg logit contributions
hr-0.720
 exc-0.638
oc-0.633
 tou-0.632
AAAA-0.630
Token shouts
Token ID34757
Feature activation+6.07e-02

Pos logit contributions
hr+0.805
oc+0.680
EX+0.679
 tou+0.673
 exc+0.670
Neg logit contributions
abella-0.806
top-0.754
 wrong-0.737
bench-0.703
 Luke-0.630
Token and
Token ID290
Feature activation+5.69e-01

Pos logit contributions
hr+0.085
oc+0.071
 exc+0.070
 tou+0.069
EX+0.069
Neg logit contributions
abella-0.101
top-0.095
 wrong-0.092
bench-0.090
box-0.081
Token jumps
Token ID18045
Feature activation+4.68e-01

Pos logit contributions
hr+0.753
oc+0.631
EX+0.616
 tou+0.604
 accidents+0.597
Neg logit contributions
abella-0.987
top-0.928
 wrong-0.918
bench-0.870
 Luke-0.796
Token out
Token ID503
Feature activation+2.18e+00

Pos logit contributions
hr+0.683
oc+0.573
 exc+0.562
 tou+0.559
EX+0.554
Neg logit contributions
abella-0.757
top-0.712
 wrong-0.701
bench-0.673
box-0.604
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
hr+3.341
oc+2.892
EX+2.858
 tou+2.825
AAAA+2.823
Neg logit contributions
abella-3.338
top-3.222
 wrong-3.185
bench-2.971
 Luke-2.821
Token He
Token ID679
Feature activation+1.09e+00

Pos logit contributions
abella+0.171
top+0.161
 wrong+0.156
bench+0.153
box+0.138
Neg logit contributions
hr-0.139
oc-0.116
 exc-0.116
 tou-0.115
AAAA-0.114
Token makes
Token ID1838
Feature activation-2.04e-01

Pos logit contributions
hr+1.577
EX+1.321
 accidents+1.284
oc+1.275
 tou+1.274
Neg logit contributions
abella-1.770
 wrong-1.645
top-1.637
bench-1.495
 Luke-1.400
Token Lily
Token ID20037
Feature activation+1.99e+00

Pos logit contributions
abella+0.314
top+0.296
 wrong+0.284
bench+0.282
box+0.251
Neg logit contributions
hr-0.306
oc-0.257
 exc-0.256
 accidents-0.254
 tou-0.254
Token laugh
Token ID6487
Feature activation+1.07e+00

Pos logit contributions
hr+3.399
AAAA+3.113
EX+3.085
Ps+3.078
 tou+3.027
Neg logit contributions
top-2.452
 wrong-2.446
abella-2.393
bench-2.209
box-1.982
Token make
Token ID787
Feature activation-7.28e-01

Pos logit contributions
abella+1.380
top+1.295
bench+1.237
 wrong+1.234
box+1.118
Neg logit contributions
hr-1.033
 exc-0.877
 tou-0.860
oc-0.839
 accidents-0.833
Token up
Token ID510
Feature activation+1.10e+00

Pos logit contributions
abella+1.364
top+1.316
bench+1.267
 wrong+1.248
box+1.154
Neg logit contributions
hr-0.845
 accidents-0.681
 exc-0.662
oc-0.649
 tou-0.644
Token.
Token ID13
Feature activation+6.09e-01

Pos logit contributions
hr+1.637
AAAA+1.464
oc+1.445
Ps+1.427
 tou+1.406
Neg logit contributions
abella-1.711
 wrong-1.635
top-1.605
bench-1.523
box-1.346
Token But
Token ID887
Feature activation+4.06e-01

Pos logit contributions
hr+1.000
 tou+0.857
oc+0.856
 exc+0.846
AAAA+0.840
Neg logit contributions
abella-0.879
top-0.824
 wrong-0.791
bench-0.764
box-0.676
Token when
Token ID618
Feature activation+1.83e-01

Pos logit contributions
hr+0.574
oc+0.487
 tou+0.479
 exc+0.477
AAAA+0.475
Neg logit contributions
abella-0.667
top-0.625
 wrong-0.616
bench-0.590
box-0.537
Token she
Token ID673
Feature activation+2.24e+00

Pos logit contributions
hr+0.295
oc+0.254
 exc+0.250
 tou+0.249
AAAA+0.247
Neg logit contributions
abella-0.266
top-0.247
 wrong-0.240
bench-0.233
box-0.206
Token got
Token ID1392
Feature activation-1.46e+00

Pos logit contributions
hr+4.073
rum+3.765
 exc+3.630
 fortunate+3.623
adoes+3.606
Neg logit contributions
abella-2.967
top-2.656
 wrong-2.622
bench-2.346
box-2.106
Token there
Token ID612
Feature activation+1.92e-01

Pos logit contributions
abella+2.726
top+2.564
bench+2.499
 wrong+2.326
 projector+2.249
Neg logit contributions
hr-1.747
 accidents-1.467
 tou-1.414
 exc-1.376
EX-1.362
Token,
Token ID11
Feature activation+1.08e-01

Pos logit contributions
hr+0.283
oc+0.240
 exc+0.236
 tou+0.235
AAAA+0.233
Neg logit contributions
abella-0.304
top-0.286
 wrong-0.278
bench-0.270
box-0.243
Token she
Token ID673
Feature activation-5.08e-01

Pos logit contributions
hr+0.169
oc+0.145
 exc+0.142
 tou+0.142
AAAA+0.141
Neg logit contributions
abella-0.162
top-0.151
 wrong-0.146
bench-0.143
box-0.127
Token saw
Token ID2497
Feature activation-4.05e-01

Pos logit contributions
abella+0.794
top+0.753
bench+0.728
 wrong+0.720
box+0.634
Neg logit contributions
hr-0.743
oc-0.640
 exc-0.621
 tou-0.621
 accidents-0.613
Token But
Token ID887
Feature activation-3.77e-01

Pos logit contributions
hr+0.101
oc+0.086
 exc+0.085
 tou+0.084
EX+0.084
Neg logit contributions
abella-0.100
top-0.094
 wrong-0.090
bench-0.088
box-0.079
Token it
Token ID340
Feature activation-2.05e-01

Pos logit contributions
abella+0.613
top+0.575
bench+0.556
 wrong+0.556
box+0.493
Neg logit contributions
hr-0.538
oc-0.450
 exc-0.443
 accidents-0.441
 tou-0.439
Token seemed
Token ID3947
Feature activation+1.28e-01

Pos logit contributions
abella+0.334
top+0.314
 wrong+0.303
bench+0.301
box+0.269
Neg logit contributions
hr-0.290
oc-0.244
 exc-0.242
 tou-0.240
 accidents-0.237
Token that
Token ID326
Feature activation-3.61e-01

Pos logit contributions
hr+0.205
oc+0.177
 exc+0.174
 tou+0.173
EX+0.172
Neg logit contributions
abella-0.185
top-0.173
 wrong-0.168
bench-0.163
box-0.144
Token every
Token ID790
Feature activation-3.97e-01

Pos logit contributions
abella+0.588
top+0.550
bench+0.528
 wrong+0.527
box+0.472
Neg logit contributions
hr-0.513
 exc-0.428
 accidents-0.427
oc-0.425
 tou-0.423
Token time
Token ID640
Feature activation+1.23e-01

Pos logit contributions
abella+0.536
top+0.488
 wrong+0.466
bench+0.457
box+0.401
Neg logit contributions
hr-0.681
 exc-0.590
oc-0.589
 tou-0.584
AAAA-0.580
Token they
Token ID484
Feature activation-1.75e-01

Pos logit contributions
hr+0.195
oc+0.167
 exc+0.165
 tou+0.164
EX+0.163
Neg logit contributions
abella-0.181
top-0.169
 wrong-0.163
bench-0.159
box-0.141
Token got
Token ID1392
Feature activation+4.08e-02

Pos logit contributions
abella+0.259
top+0.241
 wrong+0.233
bench+0.229
box+0.204
Neg logit contributions
hr-0.275
oc-0.235
 exc-0.234
 tou-0.233
EX-0.230
Token close
Token ID1969
Feature activation+3.00e-01

Pos logit contributions
hr+0.059
oc+0.050
 exc+0.049
 tou+0.049
EX+0.049
Neg logit contributions
abella-0.065
top-0.061
 wrong-0.059
bench-0.058
box-0.052
Token,
Token ID11
Feature activation+4.28e-01

Pos logit contributions
hr+0.449
oc+0.384
 exc+0.377
 tou+0.376
Ps+0.375
Neg logit contributions
abella-0.463
top-0.439
 wrong-0.426
bench-0.411
box-0.370
Token the
Token ID262
Feature activation-1.10e+00

Pos logit contributions
hr+0.564
oc+0.474
 tou+0.463
 exc+0.459
Ps+0.459
Neg logit contributions
abella-0.735
top-0.702
 wrong-0.682
bench-0.660
box-0.601
Token's
Token ID338
Feature activation-5.11e-01

Pos logit contributions
hr+0.618
 tou+0.516
oc+0.514
 exc+0.512
EX+0.512
Neg logit contributions
abella-0.688
top-0.644
 wrong-0.615
bench-0.605
box-0.552
Token mom
Token ID1995
Feature activation-4.05e-02

Pos logit contributions
abella+0.874
top+0.824
 wrong+0.788
bench+0.788
box+0.689
Neg logit contributions
hr-0.691
 exc-0.579
 accidents-0.572
oc-0.572
 tou-0.564
Token opened
Token ID4721
Feature activation-2.36e-01

Pos logit contributions
abella+0.062
top+0.058
 wrong+0.056
bench+0.055
box+0.049
Neg logit contributions
hr-0.062
oc-0.053
 exc-0.052
 tou-0.052
EX-0.052
Token the
Token ID262
Feature activation-9.34e-01

Pos logit contributions
abella+0.446
top+0.421
 wrong+0.411
bench+0.406
box+0.370
Neg logit contributions
hr-0.273
oc-0.218
 exc-0.214
 tou-0.213
EX-0.213
Token door
Token ID3420
Feature activation+3.42e-01

Pos logit contributions
abella+1.054
top+0.938
bench+0.867
 wrong+0.835
 Luke+0.727
Neg logit contributions
hr-1.772
oc-1.610
 accidents-1.561
 tou-1.558
EX-1.550
Token and
Token ID290
Feature activation+1.58e-01

Pos logit contributions
hr+0.379
oc+0.303
 tou+0.299
AAAA+0.298
 exc+0.297
Neg logit contributions
abella-0.663
top-0.640
 wrong-0.617
bench-0.607
box-0.559
Token took
Token ID1718
Feature activation-5.07e-01

Pos logit contributions
hr+0.233
oc+0.197
 exc+0.195
 tou+0.194
EX+0.192
Neg logit contributions
abella-0.249
top-0.232
 wrong-0.224
bench-0.219
box-0.197
Token out
Token ID503
Feature activation+3.28e-01

Pos logit contributions
abella+0.776
top+0.732
 wrong+0.709
bench+0.703
box+0.624
Neg logit contributions
hr-0.754
oc-0.642
 tou-0.637
 exc-0.633
 accidents-0.632
Token the
Token ID262
Feature activation-8.52e-01

Pos logit contributions
hr+0.521
oc+0.450
 tou+0.441
 exc+0.441
AAAA+0.440
Neg logit contributions
abella-0.488
top-0.455
 wrong-0.440
bench-0.425
box-0.379
Token popcorn
Token ID38914
Feature activation-3.79e-01

Pos logit contributions
abella+1.163
top+1.066
bench+0.988
 wrong+0.981
box+0.848
Neg logit contributions
hr-1.447
oc-1.264
 accidents-1.253
 exc-1.250
EX-1.221
Token.
Token ID13
Feature activation+3.02e-01

Pos logit contributions
abella+0.681
top+0.632
 wrong+0.619
bench+0.605
box+0.544
Neg logit contributions
hr-0.478
oc-0.397
 exc-0.387
 tou-0.386
 accidents-0.381
Token sick
Token ID6639
Feature activation+4.31e-01

Pos logit contributions
hr+1.128
oc+0.973
AAAA+0.960
Ps+0.956
 tou+0.951
Neg logit contributions
abella-1.183
top-1.142
 wrong-1.131
bench-1.047
box-0.931
Token,
Token ID11
Feature activation+4.65e-01

Pos logit contributions
hr+0.588
oc+0.488
 tou+0.482
EX+0.481
Ps+0.477
Neg logit contributions
abella-0.717
top-0.685
 wrong-0.675
bench-0.649
box-0.588
Token or
Token ID393
Feature activation-1.47e-01

Pos logit contributions
hr+0.606
oc+0.506
 tou+0.496
EX+0.493
AAAA+0.489
Neg logit contributions
abella-0.810
top-0.766
 wrong-0.756
bench-0.723
box-0.660
Token hurt
Token ID5938
Feature activation-7.67e-02

Pos logit contributions
abella+0.239
top+0.223
 wrong+0.215
bench+0.213
box+0.191
Neg logit contributions
hr-0.208
 exc-0.174
oc-0.174
 tou-0.172
 accidents-0.172
Token your
Token ID534
Feature activation+9.13e-01

Pos logit contributions
abella+0.132
top+0.124
 wrong+0.120
bench+0.119
box+0.107
Neg logit contributions
hr-0.102
oc-0.084
 exc-0.083
 tou-0.083
 accidents-0.082
Token mouth
Token ID5422
Feature activation+5.64e-01

Pos logit contributions
hr+1.168
Ps+0.983
AAAA+0.974
oc+0.963
EX+0.959
Neg logit contributions
abella-1.585
top-1.529
 wrong-1.527
bench-1.432
box-1.333
Token.
Token ID13
Feature activation+7.29e-02

Pos logit contributions
hr+0.818
oc+0.682
 tou+0.677
AAAA+0.676
Ps+0.675
Neg logit contributions
abella-0.892
top-0.858
 wrong-0.848
bench-0.806
box-0.728
Token You
Token ID921
Feature activation-4.07e-01

Pos logit contributions
hr+0.112
oc+0.095
 exc+0.094
 tou+0.094
EX+0.093
Neg logit contributions
abella-0.110
top-0.103
 wrong-0.100
bench-0.097
box-0.087
Token have
Token ID423
Feature activation-7.31e-01

Pos logit contributions
abella+0.633
top+0.589
 wrong+0.563
bench+0.560
box+0.499
Neg logit contributions
hr-0.613
 exc-0.527
oc-0.526
 tou-0.518
EX-0.506
Token to
Token ID284
Feature activation-1.16e+00

Pos logit contributions
abella+1.112
top+1.039
bench+1.001
 wrong+0.964
 projector+0.863
Neg logit contributions
hr-1.117
 exc-0.956
 accidents-0.952
 tou-0.941
AAAA-0.932
Token ask
Token ID1265
Feature activation-7.32e-02

Pos logit contributions
abella+2.161
top+1.998
bench+1.939
 wrong+1.911
box+1.789
Neg logit contributions
hr-1.429
 exc-1.218
 tou-1.166
oc-1.151
rum-1.099
Token<|endoftext|>
Token ID50256
Feature activation-2.16e-01

Pos logit contributions
abella+0.117
top+0.110
 wrong+0.107
bench+0.105
box+0.094
Neg logit contributions
hr-0.098
oc-0.082
 exc-0.081
 tou-0.081
AAAA-0.080
TokenOnce
Token ID7454
Feature activation-2.13e-01

Pos logit contributions
abella+0.396
top+0.374
 wrong+0.364
bench+0.357
box+0.325
Neg logit contributions
hr-0.259
oc-0.213
 tou-0.208
AAAA-0.207
 exc-0.206
Token upon
Token ID2402
Feature activation+3.07e-01

Pos logit contributions
abella+0.488
top+0.468
 wrong+0.456
bench+0.452
box+0.419
Neg logit contributions
hr-0.159
oc-0.110
 exc-0.110
 tou-0.106
 accidents-0.104
Token a
Token ID257
Feature activation-8.16e-01

Pos logit contributions
hr+0.526
oc+0.457
 tou+0.450
EX+0.446
 exc+0.446
Neg logit contributions
abella-0.408
top-0.380
 wrong-0.367
bench-0.354
box-0.309
Token time
Token ID640
Feature activation+3.66e-01

Pos logit contributions
abella+1.246
top+1.157
 wrong+1.102
bench+1.095
box+0.990
Neg logit contributions
hr-1.237
oc-1.099
 exc-1.080
Ps-1.055
AAAA-1.053
Token there
Token ID612
Feature activation+3.84e-01

Pos logit contributions
hr+0.553
oc+0.470
 tou+0.466
EX+0.461
 exc+0.460
Neg logit contributions
abella-0.562
top-0.525
 wrong-0.515
bench-0.494
box-0.449
Token was
Token ID373
Feature activation-3.81e-01

Pos logit contributions
hr+0.544
oc+0.448
 tou+0.446
 exc+0.445
EX+0.443
Neg logit contributions
abella-0.633
top-0.595
 wrong-0.578
bench-0.558
box-0.507
Token a
Token ID257
Feature activation+3.05e-01

Pos logit contributions
abella+0.566
top+0.525
 wrong+0.506
bench+0.502
box+0.444
Neg logit contributions
hr-0.597
 exc-0.515
oc-0.509
 tou-0.505
AAAA-0.505
Token little
Token ID1310
Feature activation-6.71e-01

Pos logit contributions
hr+0.223
oc+0.149
EX+0.146
 accidents+0.143
 tou+0.142
Neg logit contributions
abella-0.705
top-0.675
 wrong-0.664
bench-0.649
box-0.605
Token bear
Token ID6842
Feature activation-1.30e-01

Pos logit contributions
abella+1.530
top+1.432
bench+1.395
 wrong+1.392
box+1.287
Neg logit contributions
hr-0.550
oc-0.425
 tou-0.414
 exc-0.411
 accidents-0.398
Token.
Token ID13
Feature activation-3.64e-02

Pos logit contributions
abella+0.201
top+0.187
 wrong+0.181
bench+0.178
box+0.159
Neg logit contributions
hr-0.196
oc-0.168
 exc-0.165
 tou-0.164
EX-0.162
Token at
Token ID379
Feature activation-7.05e-02

Pos logit contributions
hr+0.701
oc+0.589
 tou+0.586
EX+0.585
AAAA+0.577
Neg logit contributions
abella-0.853
top-0.787
 wrong-0.782
bench-0.730
box-0.683
Token my
Token ID616
Feature activation+8.96e-01

Pos logit contributions
abella+0.141
top+0.135
 wrong+0.131
bench+0.129
box+0.118
Neg logit contributions
hr-0.074
oc-0.058
 exc-0.057
 tou-0.057
 accidents-0.056
Token car
Token ID1097
Feature activation+3.38e-01

Pos logit contributions
hr+1.443
AAAA+1.263
 tou+1.261
oc+1.252
rum+1.246
Neg logit contributions
abella-1.270
top-1.207
 wrong-1.185
bench-1.108
box-1.035
Token,
Token ID11
Feature activation+5.82e-01

Pos logit contributions
hr+0.491
oc+0.410
 tou+0.406
 exc+0.406
AAAA+0.405
Neg logit contributions
abella-0.538
top-0.510
 wrong-0.498
bench-0.482
box-0.439
Token Mia
Token ID32189
Feature activation-1.90e-01

Pos logit contributions
hr+1.106
oc+0.986
EX+0.971
 tou+0.963
AAAA+0.958
Neg logit contributions
abella-0.692
top-0.611
 wrong-0.605
bench-0.549
 Luke-0.494
Token!"
Token ID2474
Feature activation+3.07e-01

Pos logit contributions
abella+0.338
top+0.319
 wrong+0.307
bench+0.303
box+0.274
Neg logit contributions
hr-0.242
oc-0.202
 exc-0.198
 tou-0.195
 accidents-0.192
Token Tom
Token ID4186
Feature activation-7.06e-01

Pos logit contributions
hr+0.595
oc+0.528
 tou+0.522
EX+0.517
 exc+0.515
Neg logit contributions
abella-0.351
top-0.320
 wrong-0.305
bench-0.293
box-0.246
Token says
Token ID1139
Feature activation-4.55e-01

Pos logit contributions
abella+1.201
top+1.118
 wrong+1.072
bench+1.063
box+0.972
Neg logit contributions
hr-0.956
 exc-0.820
oc-0.787
Ps-0.783
 tou-0.772
Token.
Token ID13
Feature activation+8.23e-01

Pos logit contributions
abella+0.813
top+0.751
 wrong+0.725
bench+0.717
box+0.653
Neg logit contributions
hr-0.591
 exc-0.489
oc-0.475
 accidents-0.473
rum-0.461
Token He
Token ID679
Feature activation+4.02e-01

Pos logit contributions
hr+1.288
oc+1.072
 tou+1.065
 accidents+1.060
EX+1.055
Neg logit contributions
abella-1.271
top-1.198
bench-1.140
 wrong-1.139
box-0.996
Token holds
Token ID6622
Feature activation-6.87e-01

Pos logit contributions
hr+0.589
oc+0.497
 tou+0.490
EX+0.487
 accidents+0.482
Neg logit contributions
abella-0.644
top-0.603
 wrong-0.587
bench-0.569
box-0.516
Token she
Token ID673
Feature activation+2.19e+00

Pos logit contributions
hr+0.359
oc+0.309
 exc+0.303
 tou+0.300
EX+0.300
Neg logit contributions
abella-0.332
top-0.308
 wrong-0.298
bench-0.287
box-0.255
Token got
Token ID1392
Feature activation-1.11e+00

Pos logit contributions
hr+4.185
adoes+3.793
rum+3.774
 accidents+3.708
 fortunate+3.683
Neg logit contributions
abella-2.684
top-2.457
 wrong-2.264
bench-1.999
box-1.906
Token closer
Token ID5699
Feature activation+8.81e-03

Pos logit contributions
abella+2.090
top+1.922
bench+1.920
 wrong+1.843
box+1.707
Neg logit contributions
hr-1.354
 tou-1.119
 exc-1.092
Fear-1.081
 accidents-1.080
Token,
Token ID11
Feature activation+4.40e-01

Pos logit contributions
hr+0.013
oc+0.011
 exc+0.010
 tou+0.010
EX+0.010
Neg logit contributions
abella-0.014
top-0.013
 wrong-0.013
bench-0.013
box-0.011
Token she
Token ID673
Feature activation-6.49e-02

Pos logit contributions
hr+0.713
oc+0.622
 exc+0.598
 tou+0.597
EX+0.597
Neg logit contributions
abella-0.639
top-0.589
 wrong-0.576
bench-0.542
box-0.481
Token saw
Token ID2497
Feature activation-7.24e-01

Pos logit contributions
abella+0.099
top+0.092
 wrong+0.089
bench+0.088
box+0.078
Neg logit contributions
hr-0.100
oc-0.085
 exc-0.084
 tou-0.084
AAAA-0.083
Token that
Token ID326
Feature activation-6.03e-02

Pos logit contributions
abella+1.083
top+1.008
bench+0.994
 wrong+0.946
box+0.856
Neg logit contributions
hr-1.129
 accidents-0.976
 exc-0.962
oc-0.959
 tou-0.957
Token it
Token ID340
Feature activation-1.23e-01

Pos logit contributions
abella+0.098
top+0.092
 wrong+0.089
bench+0.088
box+0.079
Neg logit contributions
hr-0.087
oc-0.073
 exc-0.072
 tou-0.072
 accidents-0.071
Token was
Token ID373
Feature activation-1.01e-01

Pos logit contributions
abella+0.211
top+0.198
 wrong+0.192
bench+0.190
box+0.171
Neg logit contributions
hr-0.163
oc-0.136
 exc-0.134
 tou-0.133
 accidents-0.132
Token a
Token ID257
Feature activation-1.38e-01

Pos logit contributions
abella+0.148
top+0.138
 wrong+0.132
bench+0.130
box+0.115
Neg logit contributions
hr-0.161
 exc-0.137
oc-0.137
 tou-0.137
 accidents-0.135
Token big
Token ID1263
Feature activation-6.71e-01

Pos logit contributions
abella+0.259
top+0.245
 wrong+0.237
bench+0.235
box+0.213
Neg logit contributions
hr-0.162
oc-0.131
 exc-0.130
 tou-0.129
AAAA-0.127
Token and
Token ID290
Feature activation+6.01e-01

Pos logit contributions
hr+0.177
oc+0.131
 tou+0.127
 exc+0.127
AAAA+0.127
Neg logit contributions
abella-0.429
top-0.413
 wrong-0.403
bench-0.398
box-0.369
Token the
Token ID262
Feature activation+1.05e-01

Pos logit contributions
hr+0.884
oc+0.750
AAAA+0.733
EX+0.730
Ps+0.726
Neg logit contributions
abella-0.962
top-0.898
 wrong-0.877
bench-0.855
box-0.774
Token things
Token ID1243
Feature activation+3.16e-01

Pos logit contributions
hr+0.149
oc+0.125
 exc+0.123
 tou+0.122
EX+0.122
Neg logit contributions
abella-0.171
top-0.161
 wrong-0.157
bench-0.153
box-0.138
Token.
Token ID13
Feature activation-1.45e-01

Pos logit contributions
hr+0.539
oc+0.470
AAAA+0.457
 tou+0.456
 exc+0.456
Neg logit contributions
abella-0.427
top-0.399
 wrong-0.390
bench-0.377
box-0.329
Token She
Token ID1375
Feature activation-4.64e-01

Pos logit contributions
abella+0.235
top+0.221
 wrong+0.214
bench+0.209
box+0.188
Neg logit contributions
hr-0.209
oc-0.176
 exc-0.175
 tou-0.173
AAAA-0.173
Token sent
Token ID1908
Feature activation-4.45e-01

Pos logit contributions
abella+0.706
top+0.671
 wrong+0.641
bench+0.633
box+0.556
Neg logit contributions
hr-0.699
 exc-0.605
oc-0.604
 tou-0.593
 accidents-0.581
Token them
Token ID606
Feature activation+2.09e+00

Pos logit contributions
abella+0.626
top+0.581
 wrong+0.555
bench+0.554
box+0.479
Neg logit contributions
hr-0.723
 accidents-0.621
oc-0.617
 exc-0.616
 tou-0.615
Token to
Token ID284
Feature activation+3.54e-01

Pos logit contributions
hr+4.011
oc+3.724
AAAA+3.720
Ps+3.603
rum+3.534
Neg logit contributions
 wrong-2.460
abella-2.441
top-2.314
bench-2.051
box-1.950
Token their
Token ID511
Feature activation+5.35e-01

Pos logit contributions
hr+0.493
oc+0.419
AAAA+0.406
 tou+0.405
EX+0.404
Neg logit contributions
abella-0.590
top-0.553
 wrong-0.540
bench-0.524
box-0.472
Token room
Token ID2119
Feature activation+8.58e-01

Pos logit contributions
hr+0.850
oc+0.726
AAAA+0.720
EX+0.716
Ps+0.713
Neg logit contributions
abella-0.786
top-0.733
 wrong-0.726
bench-0.687
box-0.626
Token.
Token ID13
Feature activation-6.56e-03

Pos logit contributions
hr+1.016
AAAA+0.834
Ps+0.823
oc+0.818
 tou+0.817
Neg logit contributions
abella-1.603
top-1.552
 wrong-1.531
bench-1.488
box-1.377
Token Then
Token ID3244
Feature activation-3.88e-02

Pos logit contributions
hr+0.398
oc+0.331
 tou+0.329
 exc+0.328
EX+0.325
Neg logit contributions
abella-0.437
top-0.411
 wrong-0.397
bench-0.388
box-0.348
Token,
Token ID11
Feature activation+8.02e-01

Pos logit contributions
abella+0.063
top+0.059
 wrong+0.057
bench+0.056
box+0.050
Neg logit contributions
hr-0.056
oc-0.048
 exc-0.047
 tou-0.047
 accidents-0.046
Token he
Token ID339
Feature activation-3.78e-02

Pos logit contributions
hr+1.397
oc+1.207
EX+1.195
 tou+1.191
 fortunate+1.188
Neg logit contributions
abella-1.076
top-1.002
 wrong-0.986
bench-0.896
box-0.803
Token went
Token ID1816
Feature activation-1.16e+00

Pos logit contributions
abella+0.060
top+0.056
 wrong+0.055
bench+0.054
box+0.048
Neg logit contributions
hr-0.055
oc-0.047
 exc-0.046
 tou-0.046
AAAA-0.045
Token to
Token ID284
Feature activation+2.42e-01

Pos logit contributions
abella+1.906
top+1.799
bench+1.711
 wrong+1.657
 Luke+1.494
Neg logit contributions
hr-1.590
 tou-1.329
 from-1.324
 accidents-1.324
 exc-1.319
Token sleep
Token ID3993
Feature activation-7.70e-01

Pos logit contributions
hr+0.368
oc+0.313
 tou+0.309
EX+0.307
 exc+0.306
Neg logit contributions
abella-0.374
top-0.349
 wrong-0.340
bench-0.329
box-0.295
Token with
Token ID351
Feature activation-5.97e-01

Pos logit contributions
abella+1.490
top+1.409
bench+1.332
 wrong+1.329
box+1.214
Neg logit contributions
hr-0.872
oc-0.700
 exc-0.687
 accidents-0.679
 tou-0.662
Token the
Token ID262
Feature activation+6.42e-01

Pos logit contributions
abella+1.153
top+1.091
bench+1.057
 wrong+1.042
box+0.954
Neg logit contributions
hr-0.671
 accidents-0.546
 exc-0.541
oc-0.531
 tou-0.528
Token sack
Token ID23704
Feature activation-2.21e-01

Pos logit contributions
hr+0.823
oc+0.675
AAAA+0.644
rum+0.642
Ps+0.632
Neg logit contributions
abella-1.129
 wrong-1.106
top-1.095
bench-1.062
box-0.977
Token beside
Token ID13970
Feature activation+8.03e-01

Pos logit contributions
abella+0.389
top+0.363
 wrong+0.351
bench+0.345
box+0.312
Neg logit contributions
hr-0.284
oc-0.236
 exc-0.234
 tou-0.231
 accidents-0.228
Token him
Token ID683
Feature activation+1.04e+00

Pos logit contributions
hr+1.218
oc+1.023
AAAA+1.015
 fortunate+1.009
 tou+1.008
Neg logit contributions
abella-1.254
top-1.197
 wrong-1.168
bench-1.107
box-1.008
Token.
Token ID13
Feature activation-2.89e-01

Pos logit contributions
hr+0.545
oc+0.460
 exc+0.453
 tou+0.451
AAAA+0.448
Neg logit contributions
abella-0.617
top-0.581
 wrong-0.570
bench-0.547
box-0.494
Token Instead
Token ID5455
Feature activation-4.70e-02

Pos logit contributions
abella+0.479
top+0.448
 wrong+0.437
bench+0.430
box+0.385
Neg logit contributions
hr-0.402
oc-0.341
 exc-0.333
AAAA-0.329
 tou-0.329
Token,
Token ID11
Feature activation+5.22e-01

Pos logit contributions
abella+0.078
top+0.073
 wrong+0.071
bench+0.070
box+0.063
Neg logit contributions
hr-0.066
oc-0.055
 exc-0.054
 tou-0.054
 accidents-0.054
Token it
Token ID340
Feature activation+3.40e-01

Pos logit contributions
hr+0.841
oc+0.722
 tou+0.717
EX+0.716
 exc+0.715
Neg logit contributions
abella-0.752
top-0.706
 wrong-0.683
bench-0.650
box-0.578
Token kept
Token ID4030
Feature activation-1.74e-01

Pos logit contributions
hr+0.531
oc+0.450
 exc+0.447
 tou+0.445
EX+0.444
Neg logit contributions
abella-0.511
top-0.476
 wrong-0.461
bench-0.445
box-0.396
Token growing
Token ID3957
Feature activation-6.80e-02

Pos logit contributions
abella+0.281
top+0.263
 wrong+0.253
bench+0.250
box+0.224
Neg logit contributions
hr-0.252
oc-0.210
 exc-0.210
 tou-0.209
 accidents-0.208
Token bigger
Token ID5749
Feature activation+7.33e-01

Pos logit contributions
abella+0.121
top+0.113
 wrong+0.110
bench+0.108
box+0.098
Neg logit contributions
hr-0.087
oc-0.071
 exc-0.070
 tou-0.070
 accidents-0.069
Token and
Token ID290
Feature activation-1.19e+00

Pos logit contributions
hr+0.719
Ps+0.561
 tou+0.552
AAAA+0.551
EX+0.550
Neg logit contributions
abella-1.492
top-1.462
 wrong-1.421
bench-1.378
box-1.286
Token bigger
Token ID5749
Feature activation+7.76e-01

Pos logit contributions
abella+1.949
bench+1.816
top+1.795
 wrong+1.704
box+1.647
Neg logit contributions
hr-1.679
 accidents-1.471
 exc-1.414
Ps-1.383
 tou-1.377
Token until
Token ID1566
Feature activation-1.04e+00

Pos logit contributions
hr+0.966
Ps+0.799
 tou+0.789
EX+0.788
AAAA+0.784
Neg logit contributions
abella-1.378
top-1.340
 wrong-1.306
bench-1.255
box-1.150
Token finally
Token ID3443
Feature activation+2.24e-01

Pos logit contributions
abella+1.728
top+1.609
bench+1.606
 wrong+1.557
box+1.439
Neg logit contributions
hr-1.402
 accidents-1.222
 tou-1.175
AAAA-1.168
 exc-1.163
Token soon
Token ID2582
Feature activation+6.36e-01

Pos logit contributions
abella+0.926
top+0.870
bench+0.851
 wrong+0.841
box+0.768
Neg logit contributions
hr-0.807
 exc-0.680
oc-0.674
 tou-0.672
 accidents-0.667
Token as
Token ID355
Feature activation-3.49e-01

Pos logit contributions
hr+0.556
oc+0.419
 exc+0.401
EX+0.400
 tou+0.396
Neg logit contributions
abella-1.400
top-1.326
 wrong-1.296
bench-1.272
 versa-1.156
Token the
Token ID262
Feature activation+2.96e-01

Pos logit contributions
abella+0.548
top+0.513
 wrong+0.494
bench+0.493
box+0.444
Neg logit contributions
hr-0.521
 exc-0.444
 tou-0.435
oc-0.434
 accidents-0.434
Token family
Token ID1641
Feature activation+3.77e-01

Pos logit contributions
hr+0.441
oc+0.377
AAAA+0.366
 exc+0.365
 tou+0.364
Neg logit contributions
abella-0.464
top-0.436
 wrong-0.424
bench-0.413
box-0.370
Token arrived
Token ID5284
Feature activation-3.60e-02

Pos logit contributions
hr+0.522
oc+0.433
 exc+0.429
 tou+0.426
 accidents+0.425
Neg logit contributions
abella-0.631
top-0.595
 wrong-0.571
bench-0.560
box-0.512
Token at
Token ID379
Feature activation-3.68e-01

Pos logit contributions
abella+0.066
top+0.062
 wrong+0.060
bench+0.059
box+0.054
Neg logit contributions
hr-0.044
oc-0.036
 exc-0.036
 tou-0.035
 accidents-0.035
Token their
Token ID511
Feature activation+6.94e-01

Pos logit contributions
abella+0.698
top+0.657
 wrong+0.640
bench+0.632
box+0.577
Neg logit contributions
hr-0.432
 exc-0.344
oc-0.342
 tou-0.338
 accidents-0.332
Token new
Token ID649
Feature activation+1.06e+00

Pos logit contributions
hr+1.049
AAAA+0.908
oc+0.901
rum+0.894
 exc+0.893
Neg logit contributions
abella-1.057
top-1.018
 wrong-0.986
bench-0.914
box-0.853
Token home
Token ID1363
Feature activation+4.69e-01

Pos logit contributions
hr+1.646
AAAA+1.429
oc+1.396
Ps+1.389
 tou+1.388
Neg logit contributions
abella-1.551
top-1.523
 wrong-1.480
bench-1.381
box-1.323
Token,
Token ID11
Feature activation-1.95e-01

Pos logit contributions
hr+0.711
oc+0.614
AAAA+0.605
 tou+0.603
EX+0.601
Neg logit contributions
abella-0.714
top-0.677
 wrong-0.659
bench-0.629
box-0.579
Token they
Token ID484
Feature activation-6.54e-02

Pos logit contributions
abella+0.307
top+0.288
 wrong+0.277
bench+0.275
box+0.245
Neg logit contributions
hr-0.287
oc-0.241
 exc-0.241
 tou-0.239
 accidents-0.238
Token and
Token ID290
Feature activation-2.71e-01

Pos logit contributions
abella+0.306
top+0.287
 wrong+0.280
bench+0.276
box+0.252
Neg logit contributions
hr-0.183
oc-0.147
 exc-0.145
 tou-0.144
EX-0.142
Token butterflies
Token ID44571
Feature activation-3.71e-01

Pos logit contributions
abella+0.458
top+0.429
 wrong+0.415
bench+0.410
box+0.369
Neg logit contributions
hr-0.371
oc-0.311
 exc-0.308
 tou-0.307
 accidents-0.304
Token inside
Token ID2641
Feature activation-8.22e-03

Pos logit contributions
abella+0.683
top+0.636
 wrong+0.616
bench+0.611
box+0.555
Neg logit contributions
hr-0.452
oc-0.371
 exc-0.365
 tou-0.365
EX-0.357
Token.
Token ID13
Feature activation-2.23e-02

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.013
bench+0.012
box+0.011
Neg logit contributions
hr-0.011
oc-0.009
 exc-0.009
 tou-0.009
EX-0.009
Token You
Token ID921
Feature activation+1.53e-01

Pos logit contributions
abella+0.034
top+0.031
 wrong+0.030
bench+0.030
box+0.026
Neg logit contributions
hr-0.034
oc-0.029
 exc-0.029
 tou-0.029
EX-0.028
Token should
Token ID815
Feature activation-1.75e+00

Pos logit contributions
hr+0.250
oc+0.215
 exc+0.212
 tou+0.211
EX+0.210
Neg logit contributions
abella-0.216
top-0.201
 wrong-0.194
bench-0.189
box-0.167
Token try
Token ID1949
Feature activation-8.74e-02

Pos logit contributions
abella+2.879
top+2.707
bench+2.620
 wrong+2.490
box+2.369
Neg logit contributions
hr-2.478
 exc-2.310
 tou-2.192
 mis-2.039
 accidents-2.021
Token it
Token ID340
Feature activation+6.36e-01

Pos logit contributions
abella+0.141
top+0.132
 wrong+0.127
bench+0.126
box+0.113
Neg logit contributions
hr-0.126
oc-0.106
 exc-0.105
 tou-0.104
 accidents-0.104
Token,
Token ID11
Feature activation+5.64e-01

Pos logit contributions
hr+0.921
oc+0.777
AAAA+0.761
 tou+0.761
Ps+0.758
Neg logit contributions
abella-1.026
top-0.977
 wrong-0.973
bench-0.914
box-0.831
Token Sam
Token ID3409
Feature activation+2.71e-01

Pos logit contributions
hr+0.853
oc+0.735
AAAA+0.717
Ps+0.712
EX+0.712
Neg logit contributions
abella-0.861
top-0.806
 wrong-0.794
bench-0.756
box-0.684
Token."
Token ID526
Feature activation+2.21e-01

Pos logit contributions
hr+0.370
oc+0.306
 tou+0.304
 exc+0.302
AAAA+0.302
Neg logit contributions
abella-0.456
top-0.429
 wrong-0.421
bench-0.408
box-0.369
Token the
Token ID262
Feature activation+6.16e-01

Pos logit contributions
abella+2.163
bench+2.060
top+2.057
 wrong+1.990
box+1.828
Neg logit contributions
hr-1.681
 accidents-1.399
 tou-1.363
EX-1.363
Fear-1.339
Token jar
Token ID17379
Feature activation-8.15e-01

Pos logit contributions
hr+0.871
oc+0.755
 exc+0.716
EX+0.714
AAAA+0.713
Neg logit contributions
abella-1.003
top-0.950
 wrong-0.950
bench-0.895
box-0.821
Token in
Token ID287
Feature activation+5.66e-01

Pos logit contributions
abella+1.424
top+1.288
 wrong+1.255
bench+1.246
 versa+1.136
Neg logit contributions
hr-1.055
 exc-0.886
oc-0.871
EX-0.860
 tou-0.852
Token a
Token ID257
Feature activation+2.52e-01

Pos logit contributions
hr+0.879
oc+0.756
AAAA+0.743
 tou+0.741
Ps+0.740
Neg logit contributions
abella-0.854
top-0.805
 wrong-0.776
bench-0.750
box-0.674
Token big
Token ID1263
Feature activation+2.52e-01

Pos logit contributions
hr+0.371
oc+0.310
 exc+0.307
 tou+0.306
AAAA+0.304
Neg logit contributions
abella-0.398
top-0.376
 wrong-0.367
bench-0.356
box-0.322
Token blanket
Token ID18447
Feature activation-1.74e+00

Pos logit contributions
hr+0.415
oc+0.354
 exc+0.353
 tou+0.351
EX+0.349
Neg logit contributions
abella-0.353
top-0.333
 wrong-0.322
bench-0.313
box-0.278
Token.
Token ID13
Feature activation+9.32e-01

Pos logit contributions
abella+3.570
top+3.141
 wrong+3.085
bench+3.067
 versa+2.931
Neg logit contributions
hr-1.816
 from-1.555
oc-1.470
 exc-1.408
 tou-1.360
Token She
Token ID1375
Feature activation+2.34e-01

Pos logit contributions
hr+1.540
oc+1.345
 tou+1.315
 exc+1.299
AAAA+1.291
Neg logit contributions
abella-1.348
top-1.260
 wrong-1.177
bench-1.174
box-1.033
Token wanted
Token ID2227
Feature activation-1.06e+00

Pos logit contributions
hr+0.348
oc+0.294
 tou+0.289
 exc+0.288
EX+0.288
Neg logit contributions
abella-0.369
top-0.343
 wrong-0.331
bench-0.324
box-0.292
Token to
Token ID284
Feature activation+7.85e-01

Pos logit contributions
abella+1.694
bench+1.625
top+1.621
 wrong+1.531
box+1.405
Neg logit contributions
hr-1.498
 accidents-1.276
 tou-1.244
 exc-1.224
EX-1.217
Token keep
Token ID1394
Feature activation-2.29e-01

Pos logit contributions
hr+1.182
oc+1.022
 accidents+1.006
AAAA+1.004
Ps+0.985
Neg logit contributions
abella-1.196
top-1.132
 wrong-1.076
bench-1.052
box-0.940
Token cr
Token ID1067
Feature activation-1.02e+00

Pos logit contributions
abella+1.184
top+1.047
 wrong+1.032
bench+1.024
 versa+0.912
Neg logit contributions
hr-1.022
oc-0.905
 exc-0.856
 tou-0.854
 accidents-0.852
Tokenay
Token ID323
Feature activation-1.08e+00

Pos logit contributions
abella+1.964
 wrong+1.839
top+1.812
bench+1.726
 versa+1.619
Neg logit contributions
hr-1.141
oc-0.957
rum-0.917
EX-0.915
 exc-0.913
Tokenon
Token ID261
Feature activation-4.31e-01

Pos logit contributions
abella+1.219
top+1.061
 wrong+1.032
bench+0.971
 versa+0.845
Neg logit contributions
hr-2.079
oc-1.911
 exc-1.858
rum-1.853
 accidents-1.849
Token.
Token ID13
Feature activation+3.55e-01

Pos logit contributions
abella+0.776
top+0.720
 wrong+0.698
bench+0.685
box+0.613
Neg logit contributions
hr-0.539
 exc-0.449
oc-0.445
 tou-0.438
 accidents-0.435
Token They
Token ID1119
Feature activation-1.20e+00

Pos logit contributions
hr+0.592
oc+0.506
 tou+0.497
 exc+0.496
 accidents+0.495
Neg logit contributions
abella-0.504
top-0.470
 wrong-0.447
bench-0.436
box-0.382
Token find
Token ID1064
Feature activation-1.82e+00

Pos logit contributions
abella+2.109
 wrong+1.898
top+1.891
bench+1.878
box+1.721
Neg logit contributions
hr-1.557
 exc-1.433
oc-1.313
 tou-1.295
Ps-1.280
Token a
Token ID257
Feature activation-5.71e-01

Pos logit contributions
abella+3.067
bench+2.953
top+2.834
 wrong+2.659
 versa+2.562
Neg logit contributions
hr-2.342
 accidents-2.128
 from-2.069
 exc-2.055
 little-2.031
Token white
Token ID2330
Feature activation-8.41e-01

Pos logit contributions
abella+1.024
top+0.950
 wrong+0.909
bench+0.907
 versa+0.809
Neg logit contributions
hr-0.727
 exc-0.607
oc-0.605
 tou-0.604
EX-0.585
Token paper
Token ID3348
Feature activation-1.15e+00

Pos logit contributions
abella+1.464
top+1.308
 wrong+1.287
bench+1.275
 versa+1.139
Neg logit contributions
hr-1.141
oc-1.021
 tou-0.956
 accidents-0.942
 exc-0.941
Token.
Token ID13
Feature activation+1.97e-01

Pos logit contributions
abella+2.523
top+2.331
 wrong+2.281
bench+2.271
 versa+2.093
Neg logit contributions
hr-0.996
oc-0.803
 tou-0.777
 exc-0.754
 from-0.744
Token They
Token ID1119
Feature activation-1.09e+00

Pos logit contributions
hr+0.317
oc+0.271
 exc+0.267
 tou+0.266
 accidents+0.265
Neg logit contributions
abella-0.287
top-0.268
 wrong-0.257
bench-0.251
box-0.222
Token.
Token ID13
Feature activation-3.07e-01

Pos logit contributions
hr+0.746
AAAA+0.625
 accidents+0.623
oc+0.618
 tou+0.616
Neg logit contributions
abella-0.792
top-0.751
 wrong-0.729
bench-0.702
box-0.627
Token She
Token ID1375
Feature activation-7.68e-01

Pos logit contributions
abella+0.492
top+0.456
 wrong+0.443
bench+0.434
box+0.394
Neg logit contributions
hr-0.447
 exc-0.379
Ps-0.376
oc-0.375
EX-0.373
Token had
Token ID550
Feature activation-1.71e+00

Pos logit contributions
abella+1.178
top+1.126
bench+1.068
 wrong+1.056
box+0.963
Neg logit contributions
hr-1.150
 exc-1.018
oc-0.994
Ps-0.991
 tou-0.987
Token a
Token ID257
Feature activation-9.83e-01

Pos logit contributions
abella+2.953
top+2.846
bench+2.833
 wrong+2.587
box+2.432
Neg logit contributions
hr-2.240
 exc-1.960
 tou-1.946
 little-1.902
EX-1.891
Token blue
Token ID4171
Feature activation-5.46e-01

Pos logit contributions
abella+1.944
top+1.822
bench+1.760
 wrong+1.681
 versa+1.559
Neg logit contributions
hr-1.069
 exc-0.923
 tou-0.911
oc-0.877
 mis-0.867
Token case
Token ID1339
Feature activation-1.82e+00

Pos logit contributions
abella+0.975
top+0.897
bench+0.873
 wrong+0.866
box+0.759
Neg logit contributions
hr-0.703
oc-0.597
 tou-0.589
 exc-0.584
 accidents-0.571
Token with
Token ID351
Feature activation+3.30e-01

Pos logit contributions
abella+3.333
top+2.949
bench+2.877
 versa+2.751
 Dame+2.673
Neg logit contributions
hr-2.296
 exc-2.045
 from-2.040
 tou-1.974
 accidents-1.888
Token her
Token ID607
Feature activation+1.48e+00

Pos logit contributions
hr+0.514
oc+0.439
AAAA+0.431
 exc+0.429
 tou+0.425
Neg logit contributions
abella-0.499
top-0.465
 wrong-0.451
bench-0.435
box-0.390
Token that
Token ID326
Feature activation-1.62e-01

Pos logit contributions
hr+2.567
AAAA+2.230
oc+2.176
rum+2.171
Ps+2.160
Neg logit contributions
abella-2.027
 wrong-1.973
top-1.891
bench-1.746
box-1.648
Token was
Token ID373
Feature activation-5.38e-02

Pos logit contributions
abella+0.254
top+0.238
 wrong+0.228
bench+0.225
box+0.200
Neg logit contributions
hr-0.239
oc-0.203
 exc-0.203
 tou-0.202
EX-0.199
Token filled
Token ID5901
Feature activation-4.67e-01

Pos logit contributions
abella+0.079
top+0.074
 wrong+0.071
bench+0.070
box+0.062
Neg logit contributions
hr-0.085
oc-0.072
 exc-0.072
 tou-0.072
EX-0.071
Token Tim
Token ID5045
Feature activation-7.15e-01

Pos logit contributions
abella+1.372
top+1.276
 wrong+1.245
bench+1.231
box+1.075
Neg logit contributions
hr-1.364
 exc-1.171
oc-1.147
 tou-1.123
AAAA-1.117
Token and
Token ID290
Feature activation-1.77e+00

Pos logit contributions
abella+1.266
top+1.166
 wrong+1.119
bench+1.108
 versa+1.002
Neg logit contributions
hr-0.944
oc-0.815
 exc-0.781
rum-0.757
 accidents-0.754
Token Tom
Token ID4186
Feature activation-4.26e-01

Pos logit contributions
abella+3.100
top+2.981
 wrong+2.860
bench+2.817
box+2.567
Neg logit contributions
hr-2.340
 exc-2.079
oc-1.923
 accidents-1.884
 mis-1.880
Token.
Token ID13
Feature activation-1.22e-01

Pos logit contributions
abella+0.772
top+0.712
 wrong+0.686
bench+0.686
box+0.619
Neg logit contributions
hr-0.546
oc-0.459
 exc-0.449
 tou-0.436
 accidents-0.435
Token Tom
Token ID4186
Feature activation-1.32e+00

Pos logit contributions
abella+0.188
top+0.174
 wrong+0.170
bench+0.165
box+0.148
Neg logit contributions
hr-0.184
oc-0.158
 exc-0.156
 tou-0.154
EX-0.153
Token had
Token ID550
Feature activation-1.99e+00

Pos logit contributions
abella+2.252
top+2.005
bench+1.937
 wrong+1.888
 versa+1.745
Neg logit contributions
hr-1.802
oc-1.665
 exc-1.637
 tou-1.523
Ps-1.494
Token a
Token ID257
Feature activation-1.17e+00

Pos logit contributions
abella+3.706
top+3.479
bench+3.402
 wrong+3.142
 versa+3.015
Neg logit contributions
hr-2.443
 exc-2.193
 tou-2.134
 little-2.125
 ac-2.072
Token model
Token ID2746
Feature activation-2.96e-01

Pos logit contributions
abella+2.347
top+2.190
bench+2.091
 wrong+2.004
 versa+1.870
Neg logit contributions
hr-1.227
 exc-1.111
oc-1.068
 tou-1.066
 mis-1.043
Token car
Token ID1097
Feature activation-4.40e-01

Pos logit contributions
abella+0.490
top+0.453
 wrong+0.436
bench+0.431
box+0.381
Neg logit contributions
hr-0.420
oc-0.359
 exc-0.352
 tou-0.350
 accidents-0.344
Token,
Token ID11
Feature activation-7.77e-03

Pos logit contributions
abella+0.693
top+0.633
 wrong+0.603
bench+0.601
box+0.527
Neg logit contributions
hr-0.654
oc-0.566
 exc-0.557
 tou-0.552
 accidents-0.542
Token and
Token ID290
Feature activation-1.09e+00

Pos logit contributions
abella+0.012
top+0.011
 wrong+0.011
bench+0.011
box+0.010
Neg logit contributions
hr-0.012
oc-0.010
 exc-0.010
 tou-0.010
EX-0.010