TOP ACTIVATIONS
MAX = 3.474

's time to clean up your toys ." Lily did not
relieved . She even shared her delicious snack with the little
, Nut ty always shared her ac orns with her friends
One day , while they were walking , Tim my
I 'm happy to help my reliable friend . Remember ,
happy because he could share his ac orn with his friends
. She wanted to fill her basket with the needles .
One day , while they were walking , they saw
One day , while they were playing in the garden
his cousins played and shared their ac orns together .
it and decided to share her je ep with Max .
, Lily learned to share her candy and her brother learned
on , Sue loved hanging her purse in the tree as
to fly and play with other birds . One
Sally promised to always share her toys with others . <|endoftext|>
, Lily , for sharing your toys with me ." Lily
, " Can you raise your trunk ?" The elephant raised
Lily felt good about sharing her toy . The moral of
One day , while they were walking , Spot saw
. She quickly filled up her basket with blue berries and

MIN ACTIVATIONS
MIN = -3.295

they played in the garden and had lots of fun .
play . They looked around and saw a black board .
One day , they find a big box in the
day , Anna and Ben find a big frog in the
day , Anna and Ben find a big roll of paper
One day , they find a new toy in their
. One day , they find a big box in the
or on paper . You can not draw on other people
One day , they found a big fan in the
One day , they find a new soap in the
day , Tom and Mia find a big box in the
. One day , they find a big box in the
They were always playing together and they were very good at
One day , they find a yellow box in their
. One day , they find a big box in the
were walking in the park and stopped in front of a
my . They would stretch and move their bodies in fun
One day , they find a new rocket in their
One day , they find a new truck in the
mask . They ran around and made loud noises .

INTERVAL 1.782 - 3.474
CONTAINS 0.403%

L ily held the pit in her hand and said ,
Mom my , Daddy , come see !" she exclaimed .
, she decided to take her daughter to the toy store
watch for him . When he opened it , he was
the grass . She wanted to catch it , but it

INTERVAL 0.090 - 1.782
CONTAINS 43.691%

But when it was time to go to bed , Jack
play in the water and feel the warm sun shining on
and said that if she did , she would make all
little girl smiled and said , " Thank you , bird
Look at all the kind prints I

INTERVAL -1.602 - 0.090
CONTAINS 55.363%

Dad kept talking gently and said he would buy her a
Max at the park . Max asked Lily
s courage . He moved a little bit closer to the
pushed and pulled , but it was too heavy to open
S ara pressed the button and a jet of

INTERVAL -3.295 - -1.602
CONTAINS 0.543%

. He would run around and see all the animals .
they can do . They find out that they can have
." L ily says , " I love you
to hurt you . I was just angry ."
Sally and Jimmy . Sally wanted to give Jimmy a present
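
The four INTERVAL ranges above look like equal-width bins spanning MIN = -3.295 to MAX = 3.474. The sketch below assumes that is how they were derived; the dump itself does not say so. The function name `equal_width_bins` and the bin count of 4 are illustrative choices, not part of the original tool.

```python
def equal_width_bins(lo, hi, n=4):
    """Split [lo, hi] into n equal-width (low, high) intervals.

    Assumption: the dashboard bins activations this way; the dump
    only shows the resulting boundaries, not the method.
    """
    width = (hi - lo) / n
    edges = [lo + i * width for i in range(n + 1)]
    return list(zip(edges[:-1], edges[1:]))


# MIN and MAX as reported in the dump.
bins = equal_width_bins(-3.295, 3.474)

# Print highest interval first, matching the order used in the dump.
for low, high in reversed(bins):
    print(f"INTERVAL {low:.3f} - {high:.3f}")
```

Under this assumption the computed edges agree with the listed boundaries (-1.602, 0.090, 1.782) to within rounding in the third decimal place. The four CONTAINS percentages (0.403, 43.691, 55.363, 0.543) sum to 100, which is consistent with the bins partitioning all sampled activations.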
Token's
Token ID338
Feature activation+7.92e-02

Pos logit contributions
hr+0.814
oc+0.701
EX+0.694
 exc+0.693
AAAA+0.691
Neg logit contributions
abella-0.605
top-0.561
 wrong-0.554
bench-0.518
box-0.460
Token time
Token ID640
Feature activation-7.54e-01

Pos logit contributions
hr+0.130
oc+0.112
 exc+0.110
 tou+0.110
EX+0.109
Neg logit contributions
abella-0.113
top-0.104
 wrong-0.101
bench-0.098
box-0.087
Token to
Token ID284
Feature activation+1.42e-01

Pos logit contributions
abella+1.138
top+1.062
bench+0.999
 wrong+0.996
box+0.866
Neg logit contributions
hr-1.162
 exc-0.984
 accidents-0.955
EX-0.955
oc-0.947
Token clean
Token ID3424
Feature activation-6.38e-01

Pos logit contributions
hr+0.190
oc+0.158
 exc+0.155
 tou+0.154
AAAA+0.153
Neg logit contributions
abella-0.242
top-0.228
 wrong-0.222
bench-0.217
box-0.196
Token up
Token ID510
Feature activation+1.04e-01

Pos logit contributions
abella+1.073
top+1.019
bench+0.981
 wrong+0.962
box+0.875
Neg logit contributions
hr-0.871
 accidents-0.723
oc-0.717
 exc-0.704
rum-0.685
Token your
Token ID534
Feature activation+3.47e+00

Pos logit contributions
hr+0.130
oc+0.107
 exc+0.105
 tou+0.105
EX+0.104
Neg logit contributions
abella-0.186
top-0.176
 wrong-0.171
bench-0.167
box-0.152
Token toys
Token ID14958
Feature activation-6.60e-01

Pos logit contributions
hr+5.951
AAAA+5.492
Fear+5.474
rum+5.421
RO+5.377
Neg logit contributions
 wrong-5.494
box-4.784
bench-4.505
abella-4.367
top-4.310
Token."
Token ID526
Feature activation+4.89e-01

Pos logit contributions
abella+1.267
top+1.177
 wrong+1.139
bench+1.131
box+1.020
Neg logit contributions
hr-0.754
 exc-0.602
oc-0.599
 accidents-0.585
 tou-0.564
Token Lily
Token ID20037
Feature activation+1.47e-01

Pos logit contributions
hr+0.766
oc+0.655
 tou+0.643
 exc+0.635
AAAA+0.633
Neg logit contributions
abella-0.739
top-0.694
 wrong-0.671
bench-0.653
box-0.580
Token did
Token ID750
Feature activation+1.23e+00

Pos logit contributions
hr+0.220
oc+0.187
 tou+0.184
 exc+0.184
EX+0.182
Neg logit contributions
abella-0.230
top-0.214
 wrong-0.207
bench-0.203
box-0.182
Token not
Token ID407
Feature activation-7.26e-01

Pos logit contributions
hr+2.748
oc+2.587
 tou+2.549
EX+2.513
AAAA+2.508
Neg logit contributions
abella-1.060
 wrong-0.926
top-0.876
bench-0.751
 Luke-0.592
Token relieved
Token ID22598
Feature activation-2.76e-02

Pos logit contributions
hr+1.757
oc+1.521
rum+1.490
AAAA+1.482
EX+1.466
Neg logit contributions
abella-2.019
top-1.878
 wrong-1.871
bench-1.774
 Luke-1.630
Token.
Token ID13
Feature activation-2.34e-01

Pos logit contributions
abella+0.042
top+0.039
 wrong+0.037
bench+0.037
box+0.033
Neg logit contributions
hr-0.043
oc-0.037
 exc-0.036
 tou-0.036
 accidents-0.036
Token She
Token ID1375
Feature activation+1.16e+00

Pos logit contributions
abella+0.397
top+0.373
 wrong+0.363
bench+0.356
box+0.322
Neg logit contributions
hr-0.314
oc-0.264
 exc-0.260
 tou-0.257
 accidents-0.256
Token even
Token ID772
Feature activation+4.13e-01

Pos logit contributions
hr+1.913
rum+1.634
EX+1.598
oc+1.581
 accidents+1.577
Neg logit contributions
abella-1.736
top-1.550
 wrong-1.522
bench-1.429
box-1.285
Token shared
Token ID4888
Feature activation+9.34e-01

Pos logit contributions
hr+0.637
oc+0.541
 exc+0.538
EX+0.537
 tou+0.537
Neg logit contributions
abella-0.626
top-0.579
 wrong-0.563
bench-0.548
box-0.485
Token her
Token ID607
Feature activation+3.36e+00

Pos logit contributions
hr+1.271
AAAA+1.079
oc+1.060
 exc+1.040
Ps+1.039
Neg logit contributions
abella-1.623
top-1.513
 wrong-1.504
bench-1.420
 Luke-1.303
Token delicious
Token ID12625
Feature activation+2.60e+00

Pos logit contributions
hr+5.167
rum+4.503
oc+4.461
AAAA+4.343
Ps+4.241
Neg logit contributions
 wrong-5.796
abella-5.196
top-4.896
bench-4.875
box-4.641
Token snack
Token ID26906
Feature activation-4.01e-01

Pos logit contributions
hr+3.672
AAAA+3.091
 exc+3.035
Fear+2.960
rum+2.958
Neg logit contributions
 wrong-4.218
top-4.137
abella-4.111
bench-4.066
box-3.816
Token with
Token ID351
Feature activation-1.45e-01

Pos logit contributions
abella+0.832
top+0.784
 wrong+0.769
bench+0.754
box+0.690
Neg logit contributions
hr-0.391
oc-0.307
 exc-0.293
 tou-0.290
 accidents-0.287
Token the
Token ID262
Feature activation+1.13e+00

Pos logit contributions
abella+0.277
top+0.263
 wrong+0.255
bench+0.253
box+0.231
Neg logit contributions
hr-0.165
oc-0.131
 exc-0.130
 tou-0.130
 accidents-0.129
Token little
Token ID1310
Feature activation+3.44e-01

Pos logit contributions
hr+1.275
oc+1.011
rum+0.988
AAAA+0.972
EX+0.947
Neg logit contributions
abella-2.189
 wrong-2.186
top-2.101
bench-2.045
box-1.877
Token,
Token ID11
Feature activation+1.81e+00

Pos logit contributions
hr+0.407
oc+0.342
 exc+0.339
 tou+0.336
EX+0.334
Neg logit contributions
abella-0.423
top-0.396
 wrong-0.390
bench-0.375
box-0.338
Token Nut
Token ID11959
Feature activation+5.99e-01

Pos logit contributions
hr+2.903
rum+2.498
Ps+2.331
AAAA+2.327
oc+2.320
Neg logit contributions
abella-2.657
 wrong-2.576
top-2.535
bench-2.259
 Luke-2.106
Tokenty
Token ID774
Feature activation+4.27e-01

Pos logit contributions
hr+1.196
 exc+1.057
Ps+1.044
AAAA+1.043
 tou+1.042
Neg logit contributions
abella-0.642
top-0.566
 wrong-0.560
bench-0.516
box-0.435
Token always
Token ID1464
Feature activation-5.47e-01

Pos logit contributions
hr+0.660
oc+0.554
 tou+0.552
 exc+0.552
EX+0.550
Neg logit contributions
abella-0.642
top-0.602
 wrong-0.586
bench-0.564
box-0.506
Token shared
Token ID4888
Feature activation+1.55e+00

Pos logit contributions
abella+0.879
top+0.815
 wrong+0.774
bench+0.770
box+0.690
Neg logit contributions
hr-0.813
 exc-0.683
oc-0.678
 accidents-0.675
AAAA-0.673
Token her
Token ID607
Feature activation+3.33e+00

Pos logit contributions
hr+2.188
AAAA+1.879
 exc+1.836
Ps+1.830
oc+1.822
Neg logit contributions
abella-2.551
 wrong-2.512
top-2.466
bench-2.268
box-2.127
Token ac
Token ID936
Feature activation+9.99e-01

Pos logit contributions
hr+4.479
Ps+3.784
AAAA+3.777
rum+3.561
packs+3.488
Neg logit contributions
 wrong-6.112
top-5.482
abella-5.208
box-5.118
bench-5.104
Tokenorns
Token ID19942
Feature activation+2.74e-01

Pos logit contributions
hr+1.516
AAAA+1.283
Ps+1.271
 exc+1.268
 tou+1.245
Neg logit contributions
abella-1.506
top-1.462
 wrong-1.427
bench-1.382
box-1.250
Token with
Token ID351
Feature activation+8.85e-02

Pos logit contributions
hr+0.262
oc+0.198
AAAA+0.197
 exc+0.196
 tou+0.195
Neg logit contributions
abella-0.568
top-0.547
 wrong-0.535
bench-0.524
box-0.485
Token her
Token ID607
Feature activation+1.49e+00

Pos logit contributions
hr+0.114
oc+0.094
 exc+0.093
 tou+0.092
EX+0.091
Neg logit contributions
abella-0.156
top-0.148
 wrong-0.144
bench-0.140
box-0.128
Token friends
Token ID2460
Feature activation+1.51e-03

Pos logit contributions
hr+2.281
AAAA+2.031
Ps+1.955
EX+1.907
 tou+1.873
Neg logit contributions
abella-2.238
 wrong-2.230
top-2.155
bench-1.981
box-1.895
Token
Token ID198
Feature activation+1.49e-01

Pos logit contributions
abella+1.061
top+1.008
 wrong+0.983
bench+0.961
box+0.876
Neg logit contributions
hr-0.762
oc-0.651
 exc-0.630
Ps-0.610
EX-0.609
TokenOne
Token ID3198
Feature activation+7.27e-01

Pos logit contributions
hr+0.302
oc+0.267
 tou+0.265
 exc+0.265
 accidents+0.263
Neg logit contributions
abella-0.154
top-0.140
 wrong-0.132
bench-0.127
box-0.105
Token day
Token ID1110
Feature activation+7.94e-01

Pos logit contributions
hr+1.503
oc+1.321
EX+1.319
 accidents+1.319
AAAA+1.315
Neg logit contributions
abella-0.718
top-0.660
 wrong-0.631
bench-0.576
box-0.457
Token,
Token ID11
Feature activation+1.08e+00

Pos logit contributions
hr+1.298
 tou+1.093
EX+1.085
 exc+1.083
oc+1.083
Neg logit contributions
abella-1.140
top-1.071
 wrong-1.063
bench-0.974
box-0.888
Token while
Token ID981
Feature activation+4.37e-01

Pos logit contributions
hr+1.748
oc+1.493
rum+1.476
 tou+1.473
EX+1.450
Neg logit contributions
abella-1.575
top-1.497
 wrong-1.467
bench-1.324
 Luke-1.228
Token they
Token ID484
Feature activation+3.30e+00

Pos logit contributions
hr+0.762
oc+0.661
 tou+0.649
EX+0.646
 exc+0.645
Neg logit contributions
abella-0.576
top-0.535
 wrong-0.519
bench-0.494
box-0.435
Token were
Token ID547
Feature activation-4.57e-01

Pos logit contributions
hr+7.197
rum+6.695
 accidents+6.588
adoes+6.545
 fortunate+6.487
Neg logit contributions
abella-3.213
top-3.092
 wrong-2.269
bench-2.161
case-2.034
Token walking
Token ID6155
Feature activation-7.34e-01

Pos logit contributions
abella+0.726
top+0.687
bench+0.657
 wrong+0.644
box+0.578
Neg logit contributions
hr-0.673
 tou-0.567
 exc-0.566
 accidents-0.558
AAAA-0.557
Token,
Token ID11
Feature activation-1.09e-01

Pos logit contributions
abella+1.305
top+1.200
bench+1.179
 wrong+1.144
box+1.051
Neg logit contributions
hr-0.949
oc-0.808
 exc-0.794
 tou-0.790
AAAA-0.776
Token Tim
Token ID5045
Feature activation-7.51e-02

Pos logit contributions
abella+0.167
top+0.156
 wrong+0.151
bench+0.149
box+0.133
Neg logit contributions
hr-0.164
oc-0.139
 exc-0.138
 tou-0.137
 accidents-0.136
Tokenmy
Token ID1820
Feature activation+4.33e-02

Pos logit contributions
abella+0.096
top+0.088
 wrong+0.084
bench+0.082
box+0.071
Neg logit contributions
hr-0.134
oc-0.117
 exc-0.115
 tou-0.115
EX-0.114
TokenI
Token ID40
Feature activation-6.73e-01

Pos logit contributions
abella+0.074
top+0.069
 wrong+0.067
bench+0.066
box+0.059
Neg logit contributions
hr-0.073
oc-0.062
 exc-0.061
 tou-0.061
EX-0.060
Token'm
Token ID1101
Feature activation-6.93e-02

Pos logit contributions
abella+1.148
top+1.084
bench+1.035
 wrong+1.034
box+0.935
Neg logit contributions
hr-0.906
oc-0.791
 exc-0.779
 tou-0.753
AAAA-0.742
Token happy
Token ID3772
Feature activation-1.73e-01

Pos logit contributions
abella+0.118
top+0.111
 wrong+0.107
bench+0.106
box+0.095
Neg logit contributions
hr-0.095
oc-0.078
 exc-0.078
 tou-0.078
 accidents-0.077
Token to
Token ID284
Feature activation+6.34e-01

Pos logit contributions
abella+0.257
top+0.240
 wrong+0.230
bench+0.229
box+0.201
Neg logit contributions
hr-0.273
oc-0.233
 exc-0.231
 accidents-0.230
 tou-0.229
Token help
Token ID1037
Feature activation+9.19e-01

Pos logit contributions
hr+1.035
 accidents+0.878
oc+0.866
Ps+0.856
EX+0.856
Neg logit contributions
abella-0.895
top-0.846
 wrong-0.826
bench-0.797
box-0.698
Token my
Token ID616
Feature activation+3.30e+00

Pos logit contributions
hr+1.351
AAAA+1.154
Ps+1.144
EX+1.134
oc+1.126
Neg logit contributions
abella-1.432
 wrong-1.379
top-1.362
bench-1.261
box-1.152
Token reliable
Token ID9314
Feature activation+1.75e+00

Pos logit contributions
hr+5.759
AAAA+5.455
Upon+5.132
Ps+5.023
EX+4.860
Neg logit contributions
 wrong-4.851
top-4.383
abella-4.095
box-4.060
bench-3.751
Token friend
Token ID1545
Feature activation+1.47e+00

Pos logit contributions
hr+2.602
AAAA+2.179
Upon+2.135
Ps+2.126
Fear+2.060
Neg logit contributions
 wrong-2.749
abella-2.686
top-2.678
bench-2.518
box-2.424
Token.
Token ID13
Feature activation+1.16e-01

Pos logit contributions
hr+2.288
AAAA+2.067
Ps+2.021
EX+1.966
 accidents+1.928
Neg logit contributions
 wrong-2.156
abella-2.155
top-2.125
bench-1.932
box-1.834
Token Remember
Token ID11436
Feature activation-9.38e-02

Pos logit contributions
hr+0.177
oc+0.149
 exc+0.148
 tou+0.147
EX+0.146
Neg logit contributions
abella-0.178
top-0.166
 wrong-0.161
bench-0.157
box-0.140
Token,
Token ID11
Feature activation+1.36e+00

Pos logit contributions
abella+0.147
top+0.138
 wrong+0.132
bench+0.131
box+0.117
Neg logit contributions
hr-0.140
oc-0.118
 exc-0.117
 tou-0.116
 accidents-0.116
Token happy
Token ID3772
Feature activation-1.06e-01

Pos logit contributions
hr+0.971
oc+0.825
EX+0.807
AAAA+0.798
Ps+0.796
Neg logit contributions
abella-1.008
 wrong-0.949
top-0.948
bench-0.884
 Luke-0.775
Token because
Token ID780
Feature activation-1.08e-01

Pos logit contributions
abella+0.182
top+0.171
 wrong+0.165
bench+0.163
box+0.147
Neg logit contributions
hr-0.144
oc-0.120
 exc-0.119
 accidents-0.118
 tou-0.118
Token he
Token ID339
Feature activation+7.99e-01

Pos logit contributions
abella+0.153
top+0.142
 wrong+0.137
bench+0.134
box+0.119
Neg logit contributions
hr-0.176
oc-0.152
 exc-0.151
 tou-0.150
 accidents-0.149
Token could
Token ID714
Feature activation+2.05e-01

Pos logit contributions
hr+1.416
 accidents+1.217
 exc+1.211
rum+1.204
EX+1.202
Neg logit contributions
abella-1.038
top-0.949
 wrong-0.923
bench-0.871
box-0.761
Token share
Token ID2648
Feature activation+1.89e+00

Pos logit contributions
hr+0.346
oc+0.299
AAAA+0.294
 tou+0.293
 exc+0.292
Neg logit contributions
abella-0.282
top-0.263
 wrong-0.254
bench-0.246
box-0.216
Token his
Token ID465
Feature activation+3.29e+00

Pos logit contributions
hr+2.598
AAAA+2.329
Ps+2.251
EX+2.155
 exc+2.132
Neg logit contributions
 wrong-3.163
abella-3.133
top-3.009
bench-2.822
box-2.605
Token ac
Token ID936
Feature activation+7.84e-01

Pos logit contributions
hr+4.384
rum+3.603
AAAA+3.540
Ps+3.484
Upon+3.355
Neg logit contributions
 wrong-6.126
top-5.370
abella-5.234
box-5.216
bench-5.185
Tokenorn
Token ID1211
Feature activation+7.77e-01

Pos logit contributions
hr+1.126
AAAA+0.948
 exc+0.934
Ps+0.931
 tou+0.919
Neg logit contributions
abella-1.247
top-1.202
 wrong-1.168
bench-1.137
box-1.027
Token with
Token ID351
Feature activation+1.66e-01

Pos logit contributions
hr+0.921
AAAA+0.783
Ps+0.765
EX+0.731
 tou+0.725
Neg logit contributions
abella-1.414
top-1.382
 wrong-1.354
bench-1.311
box-1.217
Token his
Token ID465
Feature activation+1.69e+00

Pos logit contributions
hr+0.189
oc+0.150
 exc+0.148
 tou+0.147
EX+0.146
Neg logit contributions
abella-0.322
top-0.305
 wrong-0.299
bench-0.291
box-0.267
Token friends
Token ID2460
Feature activation-8.03e-02

Pos logit contributions
hr+2.212
AAAA+1.846
EX+1.835
Ps+1.806
Upon+1.760
Neg logit contributions
 wrong-2.930
abella-2.876
top-2.740
bench-2.577
box-2.515
Token.
Token ID13
Feature activation-2.76e-01

Pos logit contributions
abella+0.918
top+0.842
 wrong+0.814
bench+0.813
box+0.737
Neg logit contributions
hr-0.607
oc-0.499
 exc-0.498
 tou-0.492
 accidents-0.487
Token She
Token ID1375
Feature activation-1.82e-01

Pos logit contributions
abella+0.423
top+0.394
 wrong+0.382
bench+0.375
box+0.336
Neg logit contributions
hr-0.416
oc-0.353
 exc-0.352
 tou-0.351
Ps-0.350
Token wanted
Token ID2227
Feature activation-7.98e-01

Pos logit contributions
abella+0.354
top+0.337
 wrong+0.329
bench+0.324
box+0.296
Neg logit contributions
hr-0.198
oc-0.158
 exc-0.158
 tou-0.155
Ps-0.153
Token to
Token ID284
Feature activation+1.88e+00

Pos logit contributions
abella+1.249
top+1.156
bench+1.149
 wrong+1.106
 projector+1.004
Neg logit contributions
hr-1.185
 tou-1.018
 exc-0.999
 accidents-0.997
EX-0.997
Token fill
Token ID6070
Feature activation-3.29e-01

Pos logit contributions
hr+2.673
 accidents+2.410
AAAA+2.308
 fortunate+2.292
oc+2.256
Neg logit contributions
top-3.043
abella-3.010
 wrong-2.773
bench-2.697
 tenth-2.427
Token her
Token ID607
Feature activation+3.25e+00

Pos logit contributions
abella+0.555
top+0.520
 wrong+0.504
bench+0.502
box+0.452
Neg logit contributions
hr-0.449
oc-0.373
 exc-0.370
 tou-0.369
 accidents-0.366
Token basket
Token ID7988
Feature activation-6.88e-01

Pos logit contributions
hr+5.748
AAAA+5.219
Ps+5.156
 accidents+4.798
 fortunate+4.798
Neg logit contributions
 wrong-4.453
top-4.376
box-4.138
abella-4.025
 tenth-3.489
Token with
Token ID351
Feature activation+1.19e+00

Pos logit contributions
abella+1.476
top+1.347
 wrong+1.333
bench+1.319
 versa+1.208
Neg logit contributions
hr-0.648
 exc-0.500
oc-0.488
 tou-0.472
 accidents-0.461
Token the
Token ID262
Feature activation+1.24e+00

Pos logit contributions
hr+1.914
oc+1.645
AAAA+1.621
Ps+1.613
EX+1.562
Neg logit contributions
abella-1.752
top-1.704
 wrong-1.651
bench-1.542
box-1.365
Token needles
Token ID25209
Feature activation-1.69e+00

Pos logit contributions
hr+1.388
AAAA+1.058
rum+1.028
oc+1.023
Ps+1.021
Neg logit contributions
abella-2.440
top-2.353
 wrong-2.347
bench-2.184
box-2.087
Token.
Token ID13
Feature activation-2.70e-01

Pos logit contributions
abella+3.502
top+3.068
bench+3.037
 wrong+2.939
 projector+2.872
Neg logit contributions
hr-1.728
 from-1.513
 exc-1.462
 accidents-1.327
rum-1.320
Token
Token ID198
Feature activation-8.91e-01

Pos logit contributions
abella+1.606
top+1.550
 wrong+1.511
bench+1.481
box+1.362
Neg logit contributions
hr-1.055
oc-0.895
 exc-0.869
Ps-0.854
EX-0.839
TokenOne
Token ID3198
Feature activation+4.51e-01

Pos logit contributions
abella+1.068
top+1.010
 wrong+0.980
bench+0.969
box+0.840
Neg logit contributions
hr-1.611
oc-1.437
Ps-1.430
 exc-1.415
EX-1.403
Token day
Token ID1110
Feature activation+3.26e-01

Pos logit contributions
hr+0.914
oc+0.806
 tou+0.799
AAAA+0.798
EX+0.798
Neg logit contributions
abella-0.467
top-0.427
 wrong-0.409
bench-0.383
box-0.312
Token,
Token ID11
Feature activation+6.65e-01

Pos logit contributions
hr+0.492
oc+0.420
 tou+0.415
 exc+0.414
EX+0.410
Neg logit contributions
abella-0.511
top-0.473
 wrong-0.461
bench-0.443
box-0.400
Token while
Token ID981
Feature activation+2.71e-01

Pos logit contributions
hr+1.084
oc+0.978
 tou+0.949
rum+0.940
AAAA+0.930
Neg logit contributions
abella-0.973
top-0.886
 wrong-0.865
bench-0.816
 Luke-0.737
Token they
Token ID484
Feature activation+3.25e+00

Pos logit contributions
hr+0.447
oc+0.389
 tou+0.380
 exc+0.380
EX+0.378
Neg logit contributions
abella-0.385
top-0.354
 wrong-0.343
bench-0.331
box-0.293
Token were
Token ID547
Feature activation-7.34e-01

Pos logit contributions
hr+7.009
rum+6.470
 accidents+6.355
adoes+6.267
EX+6.204
Neg logit contributions
abella-3.143
top-2.957
 wrong-2.346
bench-2.145
 versa-2.130
Token walking
Token ID6155
Feature activation-8.19e-01

Pos logit contributions
abella+1.190
top+1.135
bench+1.081
 wrong+1.046
box+0.948
Neg logit contributions
hr-1.061
 tou-0.893
 exc-0.883
 accidents-0.878
Ps-0.868
Token,
Token ID11
Feature activation-2.97e-01

Pos logit contributions
abella+1.470
top+1.349
bench+1.334
 wrong+1.286
box+1.195
Neg logit contributions
hr-1.050
oc-0.883
 exc-0.868
 tou-0.867
 accidents-0.854
Token they
Token ID484
Feature activation-5.75e-01

Pos logit contributions
abella+0.466
top+0.443
 wrong+0.427
bench+0.426
box+0.377
Neg logit contributions
hr-0.435
 accidents-0.363
 exc-0.362
oc-0.360
 tou-0.359
Token saw
Token ID2497
Feature activation-1.78e+00

Pos logit contributions
abella+0.918
top+0.869
bench+0.851
 wrong+0.837
box+0.745
Neg logit contributions
hr-0.828
 tou-0.698
Ps-0.690
 exc-0.685
oc-0.685
Token
Token ID198
Feature activation+2.48e-01

Pos logit contributions
abella+1.502
top+1.438
 wrong+1.408
bench+1.376
box+1.261
Neg logit contributions
hr-1.014
oc-0.857
 exc-0.827
Ps-0.813
EX-0.802
TokenOne
Token ID3198
Feature activation+8.86e-01

Pos logit contributions
hr+0.506
oc+0.447
 tou+0.444
 exc+0.443
 accidents+0.442
Neg logit contributions
abella-0.257
top-0.231
 wrong-0.219
bench-0.210
box-0.173
Token day
Token ID1110
Feature activation+1.09e+00

Pos logit contributions
hr+1.874
AAAA+1.658
EX+1.647
oc+1.643
 accidents+1.642
Neg logit contributions
abella-0.838
top-0.767
 wrong-0.735
bench-0.667
 projector-0.538
Token,
Token ID11
Feature activation+1.36e+00

Pos logit contributions
hr+1.862
 exc+1.586
 tou+1.583
oc+1.580
EX+1.580
Neg logit contributions
abella-1.529
 wrong-1.397
top-1.392
bench-1.251
box-1.158
Token while
Token ID981
Feature activation+7.50e-01

Pos logit contributions
hr+2.318
rum+2.069
oc+2.054
 tou+1.967
EX+1.958
Neg logit contributions
abella-1.950
 wrong-1.772
top-1.755
bench-1.573
 Luke-1.478
Token they
Token ID484
Feature activation+3.23e+00

Pos logit contributions
hr+1.354
oc+1.198
rum+1.161
EX+1.158
 tou+1.157
Neg logit contributions
abella-0.957
top-0.864
 wrong-0.847
bench-0.797
box-0.698
Token were
Token ID547
Feature activation-1.31e-01

Pos logit contributions
hr+6.864
rum+6.396
adoes+6.228
 accidents+6.150
 exc+6.142
Neg logit contributions
abella-3.330
top-3.115
 wrong-2.588
bench-2.335
 versa-2.205
Token playing
Token ID2712
Feature activation+1.17e-01

Pos logit contributions
abella+0.198
top+0.186
 wrong+0.178
bench+0.176
box+0.156
Neg logit contributions
hr-0.202
 exc-0.170
oc-0.170
 tou-0.170
 accidents-0.167
Token in
Token ID287
Feature activation+1.12e+00

Pos logit contributions
hr+0.179
oc+0.152
 exc+0.150
 tou+0.150
EX+0.148
Neg logit contributions
abella-0.179
top-0.168
 wrong-0.163
bench-0.159
box-0.142
Token the
Token ID262
Feature activation-8.06e-02

Pos logit contributions
hr+1.537
oc+1.342
 tou+1.280
rum+1.277
AAAA+1.269
Neg logit contributions
abella-1.858
top-1.796
 wrong-1.737
bench-1.715
box-1.541
Token garden
Token ID11376
Feature activation+3.00e-02

Pos logit contributions
abella+0.118
top+0.109
 wrong+0.105
bench+0.103
box+0.091
Neg logit contributions
hr-0.129
oc-0.111
 tou-0.110
 exc-0.110
EX-0.109
Token his
Token ID465
Feature activation+1.14e+00

Pos logit contributions
hr+1.903
oc+1.577
AAAA+1.510
Upon+1.480
rum+1.474
Neg logit contributions
abella-2.897
 wrong-2.698
top-2.674
bench-2.632
 Luke-2.487
Token cousins
Token ID28706
Feature activation-7.42e-01

Pos logit contributions
hr+1.649
AAAA+1.385
oc+1.366
rum+1.357
Ps+1.329
Neg logit contributions
abella-1.853
 wrong-1.755
top-1.732
bench-1.650
box-1.556
Token played
Token ID2826
Feature activation-3.44e-01

Pos logit contributions
abella+1.196
top+1.110
 wrong+1.092
bench+1.069
box+0.952
Neg logit contributions
hr-1.070
oc-0.906
 exc-0.905
 tou-0.899
EX-0.875
Token and
Token ID290
Feature activation-1.13e+00

Pos logit contributions
abella+0.560
top+0.526
 wrong+0.503
bench+0.497
box+0.445
Neg logit contributions
hr-0.494
oc-0.419
 exc-0.410
 tou-0.407
EX-0.404
Token shared
Token ID4888
Feature activation+1.15e+00

Pos logit contributions
abella+1.757
top+1.704
 wrong+1.672
bench+1.620
box+1.469
Neg logit contributions
hr-1.672
oc-1.441
 exc-1.439
 tou-1.428
EX-1.391
Token their
Token ID511
Feature activation+3.22e+00

Pos logit contributions
hr+1.898
AAAA+1.639
 exc+1.625
Ps+1.619
EX+1.605
Neg logit contributions
abella-1.662
top-1.519
 wrong-1.492
bench-1.410
box-1.267
Token ac
Token ID936
Feature activation+5.63e-01

Pos logit contributions
hr+5.623
rum+5.259
AAAA+4.842
Ps+4.694
cess+4.693
Neg logit contributions
 wrong-4.212
top-4.040
abella-4.003
bench-3.644
box-3.464
Tokenorns
Token ID19942
Feature activation-4.85e-01

Pos logit contributions
hr+0.719
AAAA+0.586
 exc+0.582
Ps+0.579
 tou+0.571
Neg logit contributions
abella-0.995
top-0.952
 wrong-0.918
bench-0.907
box-0.827
Token together
Token ID1978
Feature activation-2.83e-01

Pos logit contributions
abella+0.952
top+0.887
 wrong+0.867
bench+0.855
box+0.785
Neg logit contributions
hr-0.539
oc-0.439
 exc-0.424
 tou-0.414
 accidents-0.409
Token.
Token ID13
Feature activation-1.45e-01

Pos logit contributions
abella+0.497
top+0.465
 wrong+0.451
bench+0.445
box+0.402
Neg logit contributions
hr-0.369
oc-0.307
 exc-0.300
 tou-0.295
 accidents-0.293
Token
Token ID220
Feature activation+9.05e-02

Pos logit contributions
abella+0.213
top+0.199
 wrong+0.193
bench+0.189
box+0.167
Neg logit contributions
hr-0.230
oc-0.200
 exc-0.196
 tou-0.195
EX-0.194
Token it
Token ID340
Feature activation+3.87e-01

Pos logit contributions
hr+0.547
oc+0.460
 tou+0.447
EX+0.446
 exc+0.445
Neg logit contributions
abella-0.683
top-0.633
 wrong-0.628
bench-0.603
box-0.550
Token and
Token ID290
Feature activation+2.37e-01

Pos logit contributions
hr+0.489
oc+0.397
 tou+0.393
AAAA+0.390
EX+0.389
Neg logit contributions
abella-0.701
top-0.665
 wrong-0.663
bench-0.632
box-0.580
Token decided
Token ID3066
Feature activation-8.55e-03

Pos logit contributions
hr+0.353
oc+0.301
 exc+0.294
 tou+0.294
EX+0.293
Neg logit contributions
abella-0.377
top-0.351
 wrong-0.342
bench-0.330
box-0.297
Token to
Token ID284
Feature activation-4.76e-01

Pos logit contributions
abella+0.012
top+0.011
 wrong+0.011
bench+0.011
box+0.009
Neg logit contributions
hr-0.014
oc-0.012
 exc-0.012
 tou-0.012
EX-0.012
Token share
Token ID2648
Feature activation+1.12e+00

Pos logit contributions
abella+0.832
top+0.778
 wrong+0.761
bench+0.744
box+0.670
Neg logit contributions
hr-0.628
 exc-0.530
oc-0.518
 tou-0.516
EX-0.505
Token her
Token ID607
Feature activation+3.22e+00

Pos logit contributions
hr+1.609
AAAA+1.385
EX+1.343
Ps+1.340
oc+1.335
Neg logit contributions
abella-1.825
 wrong-1.733
top-1.714
bench-1.617
 Luke-1.484
Token je
Token ID11223
Feature activation+1.17e+00

Pos logit contributions
hr+5.290
Ps+4.796
Upon+4.711
Fear+4.611
AAAA+4.528
Neg logit contributions
 wrong-4.846
top-4.317
box-4.213
bench-4.126
abella-4.108
Tokenep
Token ID538
Feature activation+2.13e-01

Pos logit contributions
hr+1.813
AAAA+1.617
 fortunate+1.605
Ps+1.587
 accidents+1.572
Neg logit contributions
abella-1.662
top-1.585
 wrong-1.560
bench-1.517
box-1.413
Token with
Token ID351
Feature activation-9.48e-01

Pos logit contributions
hr+0.243
oc+0.195
 tou+0.193
AAAA+0.192
 exc+0.192
Neg logit contributions
abella-0.406
top-0.387
 wrong-0.377
bench-0.370
box-0.339
Token Max
Token ID5436
Feature activation-3.43e-01

Pos logit contributions
abella+1.275
top+1.203
bench+1.160
 wrong+1.109
box+1.003
Neg logit contributions
hr-1.575
 accidents-1.411
oc-1.383
 tou-1.366
 exc-1.356
Token.
Token ID13
Feature activation+3.76e-02

Pos logit contributions
abella+0.599
top+0.561
 wrong+0.537
bench+0.536
box+0.482
Neg logit contributions
hr-0.448
oc-0.374
 exc-0.367
 accidents-0.360
 tou-0.358
Token,
Token ID11
Feature activation+9.05e-01

Pos logit contributions
hr+0.289
oc+0.243
 exc+0.240
 tou+0.240
EX+0.238
Neg logit contributions
abella-0.306
top-0.286
 wrong-0.280
bench-0.272
box-0.244
Token Lily
Token ID20037
Feature activation+6.93e-02

Pos logit contributions
hr+1.497
AAAA+1.284
oc+1.278
rum+1.275
 tou+1.275
Neg logit contributions
abella-1.296
 wrong-1.208
top-1.202
bench-1.115
 Luke-1.010
Token learned
Token ID4499
Feature activation+1.11e-01

Pos logit contributions
hr+0.104
oc+0.088
 exc+0.087
 tou+0.086
EX+0.086
Neg logit contributions
abella-0.108
top-0.101
 wrong-0.098
bench-0.095
box-0.085
Token to
Token ID284
Feature activation-1.39e+00

Pos logit contributions
hr+0.197
oc+0.172
 exc+0.170
 tou+0.169
AAAA+0.168
Neg logit contributions
abella-0.145
top-0.134
 wrong-0.129
bench-0.125
box-0.108
Token share
Token ID2648
Feature activation+1.71e+00

Pos logit contributions
abella+2.717
top+2.485
bench+2.402
 wrong+2.373
 projector+2.265
Neg logit contributions
hr-1.628
 exc-1.412
oc-1.377
 tou-1.339
rum-1.267
Token her
Token ID607
Feature activation+3.20e+00

Pos logit contributions
hr+2.188
AAAA+1.897
Ps+1.784
EX+1.743
 exc+1.740
Neg logit contributions
abella-3.056
 wrong-3.035
top-2.939
bench-2.756
box-2.602
Token candy
Token ID18550
Feature activation+8.38e-01

Pos logit contributions
hr+4.664
AAAA+4.457
Ps+4.065
 tou+3.993
Upon+3.991
Neg logit contributions
 wrong-5.444
abella-4.940
box-4.730
top-4.676
bench-4.627
Token and
Token ID290
Feature activation-6.66e-01

Pos logit contributions
hr+0.990
AAAA+0.843
Ps+0.814
 tou+0.793
EX+0.790
Neg logit contributions
abella-1.566
top-1.512
 wrong-1.479
bench-1.433
box-1.333
Token her
Token ID607
Feature activation+1.32e+00

Pos logit contributions
abella+1.013
top+0.937
 wrong+0.881
bench+0.880
box+0.775
Neg logit contributions
hr-1.033
 exc-0.897
 accidents-0.886
 tou-0.883
oc-0.878
Token brother
Token ID3956
Feature activation+1.21e+00

Pos logit contributions
hr+1.924
AAAA+1.644
Ps+1.580
 tou+1.579
Upon+1.571
Neg logit contributions
abella-2.076
 wrong-2.051
top-1.938
bench-1.896
box-1.740
Token learned
Token ID4499
Feature activation+2.60e-01

Pos logit contributions
hr+2.004
AAAA+1.751
EX+1.711
 tou+1.698
Ps+1.698
Neg logit contributions
abella-1.689
 wrong-1.666
top-1.641
bench-1.520
box-1.361
Token on
Token ID319
Feature activation+1.20e-01

Pos logit contributions
abella+0.126
top+0.118
 wrong+0.113
bench+0.111
box+0.099
Neg logit contributions
hr-0.125
oc-0.107
 exc-0.106
 tou-0.105
AAAA-0.104
Token,
Token ID11
Feature activation+1.67e+00

Pos logit contributions
hr+0.177
oc+0.149
 exc+0.147
 tou+0.146
EX+0.145
Neg logit contributions
abella-0.192
top-0.180
 wrong-0.175
bench-0.170
box-0.153
Token Sue
Token ID26089
Feature activation+3.40e-01

Pos logit contributions
hr+2.469
AAAA+2.115
rum+2.088
oc+2.081
EX+2.022
Neg logit contributions
abella-2.675
 wrong-2.564
top-2.560
bench-2.216
 Luke-2.078
Token loved
Token ID6151
Feature activation+8.54e-01

Pos logit contributions
hr+0.682
oc+0.599
 tou+0.595
 exc+0.594
AAAA+0.594
Neg logit contributions
abella-0.360
top-0.323
 wrong-0.309
bench-0.293
box-0.246
Token hanging
Token ID10938
Feature activation+1.40e+00

Pos logit contributions
hr+1.353
oc+1.227
AAAA+1.171
Ps+1.153
rum+1.153
Neg logit contributions
abella-1.247
 wrong-1.149
top-1.147
bench-1.070
box-0.958
Token her
Token ID607
Feature activation+3.16e+00

Pos logit contributions
hr+2.019
oc+1.716
Ps+1.693
AAAA+1.670
rum+1.623
Neg logit contributions
abella-2.273
 wrong-2.245
top-2.218
bench-2.104
box-1.947
Token purse
Token ID27517
Feature activation+4.89e-01

Pos logit contributions
hr+5.378
rum+5.138
oc+4.994
AAAA+4.695
 fortunate+4.545
Neg logit contributions
 wrong-5.104
abella-4.663
top-4.441
bench-4.010
box-3.973
Token in
Token ID287
Feature activation+8.25e-01

Pos logit contributions
hr+0.565
AAAA+0.457
oc+0.452
Ps+0.444
 exc+0.443
Neg logit contributions
abella-0.920
top-0.893
 wrong-0.870
bench-0.846
box-0.783
Token the
Token ID262
Feature activation-1.15e-02

Pos logit contributions
hr+1.120
oc+0.952
AAAA+0.925
Ps+0.921
 tou+0.915
Neg logit contributions
abella-1.386
top-1.342
 wrong-1.300
bench-1.266
box-1.142
Token tree
Token ID5509
Feature activation-3.10e-01

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
bench+0.016
box+0.014
Neg logit contributions
hr-0.017
oc-0.014
 exc-0.014
 tou-0.014
EX-0.014
Token as
Token ID355
Feature activation+5.03e-03

Pos logit contributions
abella+0.592
top+0.549
 wrong+0.537
bench+0.531
box+0.481
Neg logit contributions
hr-0.361
oc-0.293
 exc-0.285
 tou-0.282
EX-0.278
Token to
Token ID284
Feature activation+8.38e-01

Pos logit contributions
abella+0.671
top+0.632
bench+0.604
 wrong+0.596
box+0.535
Neg logit contributions
hr-0.600
 exc-0.509
 accidents-0.509
 tou-0.502
oc-0.498
Token fly
Token ID6129
Feature activation+4.87e-01

Pos logit contributions
hr+1.232
oc+1.064
 accidents+1.031
AAAA+1.030
rum+1.023
Neg logit contributions
abella-1.288
top-1.228
 wrong-1.214
bench-1.157
box-1.037
Token and
Token ID290
Feature activation-7.26e-02

Pos logit contributions
hr+0.742
oc+0.629
 exc+0.616
Ps+0.614
 tou+0.613
Neg logit contributions
abella-0.741
top-0.705
 wrong-0.701
bench-0.659
box-0.588
Token play
Token ID711
Feature activation+1.96e-01

Pos logit contributions
abella+0.119
top+0.111
 wrong+0.107
bench+0.106
box+0.095
Neg logit contributions
hr-0.103
oc-0.086
 exc-0.086
 tou-0.085
EX-0.084
Token with
Token ID351
Feature activation+1.39e+00

Pos logit contributions
hr+0.247
oc+0.200
 exc+0.198
 tou+0.197
EX+0.195
Neg logit contributions
abella-0.353
top-0.334
 wrong-0.328
bench-0.319
box-0.290
Token other
Token ID584
Feature activation+3.14e+00

Pos logit contributions
hr+2.081
EX+1.736
AAAA+1.693
Ps+1.686
oc+1.663
Neg logit contributions
abella-2.195
 wrong-2.156
top-2.100
bench-1.882
 Luke-1.724
Token birds
Token ID10087
Feature activation-4.09e-01

Pos logit contributions
hr+4.415
rum+3.993
EX+3.723
ge+3.722
Fear+3.618
Neg logit contributions
 wrong-5.317
top-4.713
abella-4.655
box-4.123
 contest-4.043
Token.
Token ID13
Feature activation-3.69e-01

Pos logit contributions
abella+0.730
top+0.678
bench+0.652
 wrong+0.646
box+0.590
Neg logit contributions
hr-0.528
oc-0.440
 exc-0.438
 tou-0.433
 accidents-0.423
Token\n
Token ID198
Feature activation-8.24e-01

Pos logit contributions
abella+0.643
top+0.598
 wrong+0.584
bench+0.577
box+0.523
Neg logit contributions
hr-0.492
oc-0.421
 exc-0.412
 tou-0.403
AAAA-0.401
Token\n
Token ID198
Feature activation-4.10e-01

Pos logit contributions
abella+1.494
top+1.420
 wrong+1.374
bench+1.356
box+1.255
Neg logit contributions
hr-1.008
oc-0.861
 exc-0.844
Ps-0.805
AAAA-0.803
TokenOne
Token ID3198
Feature activation+8.33e-01

Pos logit contributions
abella+0.466
top+0.421
 wrong+0.401
bench+0.395
box+0.341
Neg logit contributions
hr-0.794
oc-0.707
 exc-0.703
AAAA-0.694
Ps-0.694
Token Sally
Token ID25737
Feature activation+3.30e-01

Pos logit contributions
hr+2.329
oc+2.112
rum+2.105
AAAA+2.032
Ps+1.945
Neg logit contributions
abella-2.271
top-2.194
 wrong-2.149
bench-1.909
 Luke-1.790
Token promised
Token ID8072
Feature activation-5.91e-01

Pos logit contributions
hr+0.548
oc+0.467
 exc+0.465
AAAA+0.463
 tou+0.463
Neg logit contributions
abella-0.461
top-0.427
 wrong-0.411
bench-0.399
box-0.354
Token to
Token ID284
Feature activation-7.25e-01

Pos logit contributions
abella+0.887
top+0.832
bench+0.803
 wrong+0.793
 projector+0.713
Neg logit contributions
hr-0.923
 accidents-0.777
 tou-0.777
 exc-0.766
EX-0.766
Token always
Token ID1464
Feature activation-1.41e-01

Pos logit contributions
abella+1.056
top+0.972
 wrong+0.940
bench+0.930
box+0.810
Neg logit contributions
hr-1.172
 exc-1.023
 tou-1.023
oc-1.013
EX-0.982
Token share
Token ID2648
Feature activation+1.41e+00

Pos logit contributions
abella+0.243
top+0.228
 wrong+0.220
bench+0.217
box+0.196
Neg logit contributions
hr-0.190
oc-0.157
 exc-0.157
 tou-0.155
EX-0.153
Token her
Token ID607
Feature activation+3.13e+00

Pos logit contributions
hr+1.855
AAAA+1.602
Ps+1.531
oc+1.529
 exc+1.527
Neg logit contributions
abella-2.467
 wrong-2.362
top-2.329
bench-2.175
 Luke-2.025
Token toys
Token ID14958
Feature activation+3.76e-01

Pos logit contributions
hr+4.794
AAAA+4.158
Ps+3.938
 tou+3.877
Upon+3.836
Neg logit contributions
 wrong-5.454
top-4.716
box-4.655
abella-4.625
bench-4.590
Token with
Token ID351
Feature activation+3.94e-01

Pos logit contributions
hr+0.417
AAAA+0.335
oc+0.331
Ps+0.329
 tou+0.327
Neg logit contributions
abella-0.724
top-0.695
 wrong-0.678
bench-0.662
box-0.610
Token others
Token ID1854
Feature activation+4.36e-01

Pos logit contributions
hr+0.496
oc+0.410
AAAA+0.401
 exc+0.400
 tou+0.395
Neg logit contributions
abella-0.709
top-0.669
 wrong-0.656
bench-0.630
box-0.576
Token.
Token ID13
Feature activation+4.81e-01

Pos logit contributions
hr+0.592
AAAA+0.490
oc+0.488
 tou+0.486
 exc+0.484
Neg logit contributions
abella-0.738
top-0.699
 wrong-0.686
bench-0.659
box-0.601
Token<|endoftext|>
Token ID50256
Feature activation-5.12e-01

Pos logit contributions
hr+0.725
 exc+0.602
oc+0.601
AAAA+0.598
 tou+0.597
Neg logit contributions
abella-0.749
top-0.708
 wrong-0.676
bench-0.659
box-0.587
Token,
Token ID11
Feature activation+1.32e+00

Pos logit contributions
abella+1.055
top+1.000
bench+0.965
 wrong+0.941
 versa+0.827
Neg logit contributions
hr-0.965
oc-0.845
 accidents-0.823
 exc-0.808
 tou-0.803
Token Lily
Token ID20037
Feature activation-6.35e-01

Pos logit contributions
hr+2.218
EX+1.882
AAAA+1.846
oc+1.837
rum+1.814
Neg logit contributions
abella-1.889
 wrong-1.769
top-1.658
bench-1.612
box-1.495
Token,
Token ID11
Feature activation+5.81e-01

Pos logit contributions
abella+1.208
top+1.138
bench+1.083
 wrong+1.080
 versa+0.980
Neg logit contributions
hr-0.742
oc-0.634
 exc-0.596
 accidents-0.591
 tou-0.586
Token for
Token ID329
Feature activation+2.47e-02

Pos logit contributions
hr+1.065
oc+0.929
EX+0.915
AAAA+0.913
Ps+0.906
Neg logit contributions
abella-0.713
 wrong-0.647
top-0.646
bench-0.601
box-0.535
Token sharing
Token ID7373
Feature activation+9.82e-01

Pos logit contributions
hr+0.036
oc+0.030
 exc+0.030
 tou+0.030
EX+0.030
Neg logit contributions
abella-0.039
top-0.037
 wrong-0.036
bench-0.035
box-0.031
Token your
Token ID534
Feature activation+3.12e+00

Pos logit contributions
hr+1.202
Ps+0.966
EX+0.966
AAAA+0.964
 tou+0.959
Neg logit contributions
abella-1.800
 wrong-1.747
top-1.709
bench-1.619
box-1.524
Token toys
Token ID14958
Feature activation+5.28e-01

Pos logit contributions
hr+4.795
AAAA+3.882
Upon+3.825
EX+3.822
Ps+3.801
Neg logit contributions
 wrong-5.441
box-4.612
top-4.565
bench-4.541
 projector-4.399
Token with
Token ID351
Feature activation+1.24e-01

Pos logit contributions
hr+0.641
AAAA+0.527
oc+0.523
Ps+0.523
 tou+0.517
Neg logit contributions
abella-0.961
top-0.922
 wrong-0.908
bench-0.875
box-0.810
Token me
Token ID502
Feature activation+3.95e-01

Pos logit contributions
hr+0.179
oc+0.151
 exc+0.149
 tou+0.148
EX+0.147
Neg logit contributions
abella-0.199
top-0.187
 wrong-0.182
bench-0.177
box-0.159
Token."
Token ID526
Feature activation+7.03e-01

Pos logit contributions
hr+0.553
 tou+0.460
AAAA+0.460
oc+0.460
Ps+0.458
Neg logit contributions
abella-0.652
top-0.613
 wrong-0.603
bench-0.578
box-0.530
Token Lily
Token ID20037
Feature activation-1.13e-01

Pos logit contributions
hr+1.067
 tou+0.880
 exc+0.869
oc+0.869
EX+0.867
Neg logit contributions
abella-1.116
top-1.044
 wrong-1.014
bench-0.993
box-0.897
Token,
Token ID11
Feature activation+6.59e-01

Pos logit contributions
hr+0.191
oc+0.159
 exc+0.157
 tou+0.157
AAAA+0.156
Neg logit contributions
abella-0.231
top-0.218
 wrong-0.212
bench-0.206
box-0.187
Token "
Token ID366
Feature activation+1.30e+00

Pos logit contributions
hr+0.974
oc+0.841
 tou+0.817
EX+0.810
AAAA+0.804
Neg logit contributions
abella-1.030
 wrong-0.968
top-0.967
bench-0.906
box-0.821
TokenCan
Token ID6090
Feature activation-6.28e-01

Pos logit contributions
hr+1.723
rum+1.488
 exc+1.456
 fortunate+1.436
 accidents+1.415
Neg logit contributions
abella-2.240
top-2.189
 wrong-2.044
bench-1.968
box-1.813
Token you
Token ID345
Feature activation-3.07e-01

Pos logit contributions
abella+1.020
top+0.954
bench+0.932
 wrong+0.891
box+0.813
Neg logit contributions
hr-0.914
 exc-0.781
 tou-0.772
 accidents-0.766
oc-0.761
Token raise
Token ID5298
Feature activation-4.97e-02

Pos logit contributions
abella+0.455
top+0.418
bench+0.398
 wrong+0.398
box+0.351
Neg logit contributions
hr-0.491
 exc-0.422
oc-0.420
 tou-0.419
EX-0.411
Token your
Token ID534
Feature activation+3.12e+00

Pos logit contributions
abella+0.101
top+0.095
 wrong+0.093
bench+0.092
box+0.084
Neg logit contributions
hr-0.051
oc-0.040
 exc-0.039
 tou-0.039
 accidents-0.039
Token trunk
Token ID21427
Feature activation+9.22e-01

Pos logit contributions
hr+6.115
 accidents+5.246
 fortunate+5.190
AAAA+5.093
Ps+5.016
Neg logit contributions
 wrong-3.927
box-3.592
top-3.527
abella-3.214
bench-3.118
Token?"
Token ID1701
Feature activation-1.58e-01

Pos logit contributions
hr+1.529
Ps+1.315
AAAA+1.315
oc+1.308
 tou+1.297
Neg logit contributions
abella-1.247
top-1.233
 wrong-1.225
bench-1.118
box-1.035
Token The
Token ID383
Feature activation+2.09e+00

Pos logit contributions
abella+0.268
top+0.250
 wrong+0.244
bench+0.239
box+0.217
Neg logit contributions
hr-0.216
oc-0.181
 exc-0.177
 tou-0.176
EX-0.175
Token elephant
Token ID20950
Feature activation-2.67e-01

Pos logit contributions
hr+3.125
rum+2.650
 accidents+2.552
EX+2.516
ment+2.503
Neg logit contributions
 wrong-3.317
top-3.135
abella-3.135
bench-2.821
 tenth-2.727
Token raised
Token ID4376
Feature activation+1.01e+00

Pos logit contributions
abella+0.390
top+0.363
 wrong+0.348
bench+0.346
box+0.305
Neg logit contributions
hr-0.424
oc-0.366
 exc-0.361
 tou-0.357
Ps-0.353
Token Lily
Token ID20037
Feature activation+5.90e-01

Pos logit contributions
hr+0.806
 tou+0.666
oc+0.666
 exc+0.664
EX+0.663
Neg logit contributions
abella-0.826
top-0.769
 wrong-0.745
bench-0.731
box-0.649
Token felt
Token ID2936
Feature activation+1.05e+00

Pos logit contributions
hr+0.896
oc+0.755
EX+0.754
 accidents+0.748
AAAA+0.746
Neg logit contributions
abella-0.918
top-0.847
 wrong-0.820
bench-0.790
box-0.717
Token good
Token ID922
Feature activation-1.74e-01

Pos logit contributions
hr+1.578
oc+1.352
EX+1.308
rum+1.286
AAAA+1.285
Neg logit contributions
abella-1.671
 wrong-1.649
top-1.572
bench-1.471
 Luke-1.300
Token about
Token ID546
Feature activation+7.38e-01

Pos logit contributions
abella+0.315
top+0.298
 wrong+0.287
bench+0.284
box+0.257
Neg logit contributions
hr-0.220
oc-0.180
 exc-0.178
 tou-0.177
 accidents-0.176
Token sharing
Token ID7373
Feature activation+8.88e-01

Pos logit contributions
hr+1.127
oc+0.996
AAAA+0.964
EX+0.955
 tou+0.945
Neg logit contributions
abella-1.125
 wrong-1.045
top-1.032
bench-0.978
 Luke-0.877
Token her
Token ID607
Feature activation+3.12e+00

Pos logit contributions
hr+1.116
AAAA+0.911
oc+0.889
Ps+0.883
 exc+0.882
Neg logit contributions
abella-1.615
 wrong-1.550
top-1.533
bench-1.469
box-1.342
Token toy
Token ID13373
Feature activation+1.05e+00

Pos logit contributions
hr+5.199
AAAA+4.391
EX+4.146
Upon+4.134
Ps+4.109
Neg logit contributions
 wrong-5.075
abella-4.472
top-4.296
box-4.140
bench-3.996
Token.
Token ID13
Feature activation+5.19e-01

Pos logit contributions
hr+1.516
AAAA+1.295
Ps+1.253
EX+1.228
 tou+1.215
Neg logit contributions
abella-1.686
top-1.646
 wrong-1.634
bench-1.512
box-1.463
Token The
Token ID383
Feature activation+6.27e-01

Pos logit contributions
hr+0.790
oc+0.653
 tou+0.650
 exc+0.649
EX+0.646
Neg logit contributions
abella-0.814
top-0.760
 wrong-0.732
bench-0.717
box-0.638
Token moral
Token ID6573
Feature activation-1.44e+00

Pos logit contributions
hr+0.933
oc+0.778
EX+0.765
rum+0.761
 tou+0.755
Neg logit contributions
abella-0.990
 wrong-0.946
top-0.933
bench-0.895
box-0.802
Token of
Token ID286
Feature activation-1.16e-01

Pos logit contributions
abella+3.262
top+3.060
bench+2.970
 wrong+2.937
 versa+2.824
Neg logit contributions
hr-1.165
 tou-0.958
oc-0.913
 exc-0.884
 accidents-0.862
Token\n
Token ID198
Feature activation-4.94e-01

Pos logit contributions
abella+1.670
top+1.593
 wrong+1.570
bench+1.522
box+1.409
Neg logit contributions
hr-1.094
oc-0.949
 exc-0.900
EX-0.867
Ps-0.865
TokenOne
Token ID3198
Feature activation+8.91e-01

Pos logit contributions
abella+0.561
top+0.513
 wrong+0.483
bench+0.478
box+0.414
Neg logit contributions
hr-0.935
oc-0.856
 exc-0.834
 tou-0.829
AAAA-0.828
Token day
Token ID1110
Feature activation+7.16e-01

Pos logit contributions
hr+1.891
 accidents+1.656
EX+1.652
AAAA+1.651
oc+1.642
Neg logit contributions
abella-0.843
top-0.774
 wrong-0.748
bench-0.671
box-0.526
Token,
Token ID11
Feature activation+8.04e-01

Pos logit contributions
hr+1.156
oc+0.957
EX+0.955
 exc+0.954
 tou+0.948
Neg logit contributions
abella-1.052
top-0.979
 wrong-0.979
bench-0.908
box-0.816
Token while
Token ID981
Feature activation+4.58e-01

Pos logit contributions
hr+1.412
oc+1.198
EX+1.171
rum+1.165
 exc+1.163
Neg logit contributions
abella-1.093
top-1.005
 wrong-1.004
bench-0.921
 Luke-0.819
Token they
Token ID484
Feature activation+3.11e+00

Pos logit contributions
hr+0.782
oc+0.670
EX+0.656
 exc+0.654
 tou+0.652
Neg logit contributions
abella-0.626
top-0.578
 wrong-0.569
bench-0.538
box-0.471
Token were
Token ID547
Feature activation+1.60e-02

Pos logit contributions
hr+6.640
rum+6.178
 accidents+5.970
adoes+5.926
 fortunate+5.925
Neg logit contributions
abella-3.280
top-3.006
 wrong-2.564
bench-2.302
 versa-2.136
Token walking
Token ID6155
Feature activation-8.08e-01

Pos logit contributions
hr+0.024
oc+0.021
 exc+0.021
 tou+0.021
EX+0.020
Neg logit contributions
abella-0.024
top-0.023
 wrong-0.022
bench-0.021
box-0.019
Token,
Token ID11
Feature activation-3.28e-01

Pos logit contributions
abella+1.448
top+1.327
bench+1.311
 wrong+1.256
box+1.176
Neg logit contributions
hr-1.032
oc-0.883
 exc-0.864
 tou-0.861
AAAA-0.851
Token Spot
Token ID15899
Feature activation+2.30e-01

Pos logit contributions
abella+0.486
top+0.457
bench+0.438
 wrong+0.435
box+0.389
Neg logit contributions
hr-0.505
 tou-0.433
 exc-0.433
oc-0.432
 accidents-0.432
Token saw
Token ID2497
Feature activation-7.73e-01

Pos logit contributions
hr+0.368
oc+0.315
 exc+0.313
 tou+0.310
 accidents+0.309
Neg logit contributions
abella-0.336
top-0.313
 wrong-0.304
bench-0.291
box-0.260
Token.
Token ID13
Feature activation+1.13e+00

Pos logit contributions
hr+1.300
 tou+1.104
AAAA+1.104
oc+1.099
Ps+1.090
Neg logit contributions
abella-1.443
top-1.402
 wrong-1.349
bench-1.346
box-1.216
Token She
Token ID1375
Feature activation+8.99e-01

Pos logit contributions
hr+1.839
oc+1.548
 tou+1.542
 accidents+1.533
 exc+1.517
Neg logit contributions
abella-1.628
top-1.585
 wrong-1.433
bench-1.419
 Luke-1.284
Token quickly
Token ID2952
Feature activation+1.20e+00

Pos logit contributions
hr+1.324
 accidents+1.122
AAAA+1.115
rum+1.097
 tou+1.097
Neg logit contributions
abella-1.443
top-1.339
 wrong-1.274
bench-1.249
box-1.169
Token filled
Token ID5901
Feature activation+6.36e-01

Pos logit contributions
hr+1.667
 accidents+1.398
 exc+1.371
oc+1.367
AAAA+1.364
Neg logit contributions
abella-2.029
top-1.927
bench-1.845
 wrong-1.837
box-1.678
Token up
Token ID510
Feature activation+1.40e+00

Pos logit contributions
hr+0.852
oc+0.725
AAAA+0.718
 exc+0.714
 tou+0.712
Neg logit contributions
abella-1.079
top-1.024
 wrong-0.984
bench-0.959
box-0.872
Token her
Token ID607
Feature activation+3.11e+00

Pos logit contributions
hr+1.931
AAAA+1.720
oc+1.680
 exc+1.664
Ps+1.656
Neg logit contributions
abella-2.320
top-2.244
 wrong-2.152
bench-2.071
box-1.885
Token basket
Token ID7988
Feature activation-1.70e-01

Pos logit contributions
hr+5.524
AAAA+4.908
rum+4.795
Ps+4.723
Get+4.719
Neg logit contributions
 wrong-4.236
top-4.001
box-3.893
abella-3.783
bench-3.446
Token with
Token ID351
Feature activation+8.56e-01

Pos logit contributions
abella+0.346
top+0.325
 wrong+0.319
bench+0.313
box+0.287
Neg logit contributions
hr-0.173
oc-0.134
 exc-0.132
 tou-0.130
EX-0.128
Token blue
Token ID4171
Feature activation+1.64e+00

Pos logit contributions
hr+1.333
oc+1.131
AAAA+1.122
Ps+1.107
 exc+1.096
Neg logit contributions
abella-1.284
top-1.241
 wrong-1.200
bench-1.153
box-1.035
Tokenberries
Token ID20853
Feature activation-4.42e-01

Pos logit contributions
hr+2.273
 exc+1.850
AAAA+1.847
Ps+1.806
oc+1.786
Neg logit contributions
top-2.776
abella-2.679
 wrong-2.573
bench-2.551
box-2.470
Token and
Token ID290
Feature activation+6.28e-01

Pos logit contributions
abella+0.895
top+0.829
 wrong+0.815
bench+0.802
box+0.729
Neg logit contributions
hr-0.460
oc-0.361
 exc-0.355
 tou-0.355
 accidents-0.352
Token they
Token ID484
Feature activation-1.17e+00

Pos logit contributions
abella+1.841
top+1.789
bench+1.741
 wrong+1.716
box+1.536
Neg logit contributions
hr-1.468
 accidents-1.240
oc-1.237
Ps-1.218
 exc-1.214
Token played
Token ID2826
Feature activation-6.49e-01

Pos logit contributions
abella+1.658
top+1.490
 wrong+1.441
bench+1.420
box+1.284
Neg logit contributions
hr-1.898
 exc-1.781
oc-1.709
 tou-1.693
Ps-1.657
Token in
Token ID287
Feature activation+2.97e-01

Pos logit contributions
abella+1.009
top+0.937
bench+0.886
 wrong+0.883
box+0.764
Neg logit contributions
hr-0.994
oc-0.856
 exc-0.841
AAAA-0.831
 accidents-0.825
Token the
Token ID262
Feature activation-1.18e+00

Pos logit contributions
hr+0.358
oc+0.295
 tou+0.288
 exc+0.286
 accidents+0.284
Neg logit contributions
abella-0.546
top-0.519
 wrong-0.504
bench-0.494
box-0.450
Token garden
Token ID11376
Feature activation-1.55e+00

Pos logit contributions
abella+2.076
bench+1.882
top+1.860
 wrong+1.769
 versa+1.640
Neg logit contributions
hr-1.571
 tou-1.364
Ps-1.333
AAAA-1.329
oc-1.314
Token and
Token ID290
Feature activation-3.29e+00

Pos logit contributions
abella+3.389
bench+2.967
top+2.925
 wrong+2.866
 versa+2.779
Neg logit contributions
hr-1.558
 exc-1.265
oc-1.243
 from-1.173
 accidents-1.079
Token had
Token ID550
Feature activation-1.51e+00

Pos logit contributions
bench+6.219
top+5.889
abella+5.879
 wrong+5.793
box+5.527
Neg logit contributions
hr-4.061
 exc-3.600
 pets-3.563
Ps-3.436
 from-3.431
Token lots
Token ID6041
Feature activation+1.26e+00

Pos logit contributions
abella+2.493
top+2.339
bench+2.330
 wrong+2.180
box+2.019
Neg logit contributions
hr-2.118
 tou-1.776
 exc-1.760
 accidents-1.741
 little-1.732
Token of
Token ID286
Feature activation-4.28e-02

Pos logit contributions
hr+1.678
AAAA+1.416
EX+1.370
Ps+1.361
oc+1.357
Neg logit contributions
abella-2.114
top-2.098
 wrong-2.054
bench-1.972
box-1.754
Token fun
Token ID1257
Feature activation+8.82e-02

Pos logit contributions
abella+0.055
top+0.051
 wrong+0.049
bench+0.048
box+0.041
Neg logit contributions
hr-0.076
oc-0.066
 exc-0.066
 tou-0.065
EX-0.065
Token.
Token ID13
Feature activation-2.07e+00

Pos logit contributions
hr+0.125
oc+0.104
 exc+0.103
 tou+0.103
EX+0.102
Neg logit contributions
abella-0.145
top-0.136
 wrong-0.132
bench-0.129
box-0.116
Token play
Token ID711
Feature activation-4.39e-01

Pos logit contributions
abella+1.332
top+1.235
 wrong+1.214
bench+1.194
box+1.061
Neg logit contributions
hr-1.009
 exc-0.856
 tou-0.838
EX-0.815
oc-0.812
Token.
Token ID13
Feature activation-6.58e-01

Pos logit contributions
abella+0.761
top+0.708
bench+0.676
 wrong+0.676
box+0.605
Neg logit contributions
hr-0.592
oc-0.499
 exc-0.488
AAAA-0.482
 accidents-0.479
Token They
Token ID1119
Feature activation-6.24e-01

Pos logit contributions
abella+1.021
top+0.960
 wrong+0.945
bench+0.914
box+0.825
Neg logit contributions
hr-0.956
 exc-0.810
oc-0.809
Ps-0.806
AAAA-0.797
Token looked
Token ID3114
Feature activation+2.59e-01

Pos logit contributions
abella+1.022
top+0.956
 wrong+0.939
bench+0.915
box+0.821
Neg logit contributions
hr-0.874
 exc-0.754
 tou-0.735
oc-0.731
Ps-0.730
Token around
Token ID1088
Feature activation-2.17e-01

Pos logit contributions
hr+0.432
oc+0.373
 tou+0.367
 exc+0.366
EX+0.364
Neg logit contributions
abella-0.362
top-0.337
 wrong-0.330
bench-0.315
box-0.278
Token and
Token ID290
Feature activation-3.09e+00

Pos logit contributions
abella+0.423
top+0.399
 wrong+0.387
bench+0.385
box+0.352
Neg logit contributions
hr-0.243
oc-0.193
 exc-0.191
 accidents-0.188
AAAA-0.186
Token saw
Token ID2497
Feature activation-1.58e+00

Pos logit contributions
abella+6.428
bench+6.229
top+5.906
box+5.745
 wrong+5.740
Neg logit contributions
hr-3.243
Ps-2.739
 exc-2.665
 mis-2.568
 little-2.554
Token a
Token ID257
Feature activation-5.12e-01

Pos logit contributions
abella+2.665
bench+2.563
top+2.527
 wrong+2.296
box+2.125
Neg logit contributions
hr-2.152
 accidents-1.966
 exc-1.821
 little-1.793
 pets-1.769
Token black
Token ID2042
Feature activation-9.24e-01

Pos logit contributions
abella+0.959
top+0.902
bench+0.866
 wrong+0.860
box+0.776
Neg logit contributions
hr-0.611
 tou-0.502
oc-0.500
 exc-0.494
AAAA-0.486
Tokenboard
Token ID3526
Feature activation-1.81e+00

Pos logit contributions
abella+1.771
top+1.621
bench+1.573
 wrong+1.564
 versa+1.390
Neg logit contributions
hr-1.085
oc-0.951
 tou-0.920
AAAA-0.902
 accidents-0.888
Token.
Token ID13
Feature activation-9.66e-01

Pos logit contributions
abella+3.805
top+3.318
 wrong+3.142
bench+3.128
acey+3.078
Neg logit contributions
hr-1.891
 exc-1.568
 from-1.526
oc-1.525
 tou-1.491
Token\n
Token ID198
Feature activation-1.40e+00

Pos logit contributions
abella+2.709
top+2.606
 wrong+2.554
bench+2.499
box+2.325
Neg logit contributions
hr-1.555
oc-1.366
Ps-1.318
 exc-1.313
AAAA-1.265
TokenOne
Token ID3198
Feature activation-3.32e-01

Pos logit contributions
abella+1.810
top+1.660
 wrong+1.648
bench+1.632
box+1.437
Neg logit contributions
hr-2.386
oc-2.171
Ps-2.136
 exc-2.110
AAAA-2.100
Token day
Token ID1110
Feature activation-1.67e-01

Pos logit contributions
abella+0.407
top+0.368
 wrong+0.345
bench+0.343
box+0.297
Neg logit contributions
hr-0.619
 exc-0.549
oc-0.544
 tou-0.536
 accidents-0.533
Token,
Token ID11
Feature activation-3.34e-01

Pos logit contributions
abella+0.274
top+0.258
 wrong+0.249
bench+0.247
box+0.220
Neg logit contributions
hr-0.234
oc-0.196
 exc-0.194
 tou-0.192
AAAA-0.192
Token they
Token ID484
Feature activation-7.27e-01

Pos logit contributions
abella+0.437
top+0.410
 wrong+0.392
bench+0.391
box+0.338
Neg logit contributions
hr-0.575
 exc-0.498
oc-0.494
 accidents-0.491
 tou-0.488
Token find
Token ID1064
Feature activation-2.87e+00

Pos logit contributions
abella+1.336
top+1.239
bench+1.190
 wrong+1.188
box+1.087
Neg logit contributions
hr-0.892
 exc-0.763
 tou-0.745
Ps-0.736
oc-0.727
Token a
Token ID257
Feature activation-6.24e-01

Pos logit contributions
abella+5.486
bench+5.436
top+5.320
ven+4.664
 versa+4.617
Neg logit contributions
 little-3.001
hr-2.924
 from-2.852
 accidents-2.785
 pets-2.646
Token big
Token ID1263
Feature activation-7.23e-01

Pos logit contributions
abella+1.176
top+1.105
bench+1.055
 wrong+1.031
box+0.940
Neg logit contributions
hr-0.736
 tou-0.613
 exc-0.603
oc-0.589
 accidents-0.586
Token box
Token ID3091
Feature activation-5.05e-01

Pos logit contributions
abella+1.164
top+1.049
bench+0.978
 wrong+0.977
 versa+0.852
Neg logit contributions
hr-1.074
oc-0.936
 tou-0.927
 exc-0.920
 accidents-0.915
Token in
Token ID287
Feature activation+4.10e-01

Pos logit contributions
abella+0.911
top+0.838
 wrong+0.805
bench+0.795
 versa+0.716
Neg logit contributions
hr-0.643
 exc-0.532
oc-0.520
 tou-0.518
EX-0.511
Token the
Token ID262
Feature activation-1.53e+00

Pos logit contributions
hr+0.513
oc+0.425
 tou+0.413
 exc+0.411
AAAA+0.411
Neg logit contributions
abella-0.745
top-0.708
 wrong-0.687
bench-0.676
box-0.613
Token day
Token ID1110
Feature activation-2.06e-01

Pos logit contributions
abella+0.445
top+0.403
 wrong+0.380
bench+0.379
box+0.327
Neg logit contributions
hr-0.679
 exc-0.603
oc-0.595
 tou-0.586
 accidents-0.586
Token,
Token ID11
Feature activation-1.19e-01

Pos logit contributions
abella+0.337
top+0.319
 wrong+0.307
bench+0.305
box+0.272
Neg logit contributions
hr-0.289
oc-0.240
 exc-0.240
 accidents-0.236
AAAA-0.236
Token Anna
Token ID11735
Feature activation+3.61e-01

Pos logit contributions
abella+0.153
top+0.142
 wrong+0.136
bench+0.134
box+0.116
Neg logit contributions
hr-0.209
oc-0.181
 exc-0.181
 tou-0.179
 accidents-0.179
Token and
Token ID290
Feature activation-1.18e+00

Pos logit contributions
hr+0.335
oc+0.246
 tou+0.244
 exc+0.241
 accidents+0.241
Neg logit contributions
abella-0.765
top-0.733
 wrong-0.714
bench-0.703
box-0.652
Token Ben
Token ID3932
Feature activation-4.68e-01

Pos logit contributions
abella+1.408
top+1.307
bench+1.271
 wrong+1.260
box+1.046
Neg logit contributions
hr-2.173
 exc-1.939
 accidents-1.923
 tou-1.879
Ps-1.871
Token find
Token ID1064
Feature activation-2.86e+00

Pos logit contributions
abella+0.931
top+0.872
 wrong+0.848
bench+0.842
box+0.778
Neg logit contributions
hr-0.500
 exc-0.412
 tou-0.396
oc-0.394
Ps-0.392
Token a
Token ID257
Feature activation-1.33e+00

Pos logit contributions
abella+5.461
bench+5.256
top+5.152
ven+4.645
rob+4.558
Neg logit contributions
 little-3.108
 accidents-3.095
 from-3.063
hr-3.033
 exc-2.753
Token big
Token ID1263
Feature activation-1.32e+00

Pos logit contributions
abella+2.540
top+2.354
bench+2.282
 wrong+2.139
 versa+2.025
Neg logit contributions
hr-1.551
 exc-1.335
 accidents-1.306
 tou-1.304
 mis-1.294
Token frog
Token ID21264
Feature activation+1.76e-01

Pos logit contributions
abella+2.239
top+1.978
bench+1.864
 wrong+1.805
liv+1.632
Neg logit contributions
hr-1.890
 accidents-1.679
 exc-1.647
oc-1.643
 tou-1.620
Token in
Token ID287
Feature activation+7.02e-01

Pos logit contributions
hr+0.260
oc+0.219
 exc+0.216
 tou+0.216
EX+0.214
Neg logit contributions
abella-0.277
top-0.261
 wrong-0.254
bench-0.247
box-0.223
Token the
Token ID262
Feature activation-1.05e+00

Pos logit contributions
hr+0.916
oc+0.776
 tou+0.742
EX+0.741
Ps+0.739
Neg logit contributions
abella-1.227
top-1.172
 wrong-1.134
bench-1.115
 Luke-1.014
Token day
Token ID1110
Feature activation+9.51e-02

Pos logit contributions
abella+0.331
top+0.299
 wrong+0.285
bench+0.279
box+0.239
Neg logit contributions
hr-0.521
 exc-0.461
oc-0.460
 tou-0.451
EX-0.446
Token,
Token ID11
Feature activation+1.81e-01

Pos logit contributions
hr+0.139
oc+0.117
 exc+0.116
 tou+0.115
EX+0.114
Neg logit contributions
abella-0.152
top-0.142
 wrong-0.138
bench-0.135
box-0.121
Token Anna
Token ID11735
Feature activation+1.87e-01

Pos logit contributions
hr+0.312
oc+0.272
 exc+0.267
 tou+0.266
EX+0.265
Neg logit contributions
abella-0.243
top-0.224
 wrong-0.216
bench-0.208
box-0.183
Token and
Token ID290
Feature activation-6.59e-01

Pos logit contributions
hr+0.243
oc+0.199
 exc+0.196
 tou+0.196
EX+0.195
Neg logit contributions
abella-0.328
top-0.310
 wrong-0.301
bench-0.295
box-0.268
Token Ben
Token ID3932
Feature activation-3.99e-01

Pos logit contributions
abella+0.827
top+0.797
bench+0.746
 wrong+0.736
box+0.636
Neg logit contributions
hr-1.164
 tou-1.032
 exc-1.029
 accidents-1.018
Ps-1.018
Token find
Token ID1064
Feature activation-2.84e+00

Pos logit contributions
abella+0.681
top+0.635
 wrong+0.614
bench+0.605
box+0.548
Neg logit contributions
hr-0.533
 exc-0.454
 tou-0.447
oc-0.446
Ps-0.440
Token a
Token ID257
Feature activation-1.22e+00

Pos logit contributions
abella+5.456
top+5.402
bench+5.263
ven+4.708
 versa+4.614
Neg logit contributions
 little-2.944
hr-2.870
 from-2.853
 accidents-2.690
 pets-2.651
Token big
Token ID1263
Feature activation-1.31e+00

Pos logit contributions
abella+2.431
top+2.285
bench+2.190
 wrong+2.095
 versa+1.982
Neg logit contributions
hr-1.322
 exc-1.105
 tou-1.103
oc-1.040
 mis-1.036
Token roll
Token ID4836
Feature activation-5.40e-01

Pos logit contributions
abella+2.221
top+1.970
bench+1.834
 wrong+1.804
liv+1.680
Neg logit contributions
hr-1.879
oc-1.673
 tou-1.623
 exc-1.619
 accidents-1.598
Token of
Token ID286
Feature activation-6.69e-01

Pos logit contributions
abella+0.923
top+0.842
 wrong+0.811
bench+0.805
 versa+0.721
Neg logit contributions
hr-0.731
oc-0.622
 exc-0.613
 tou-0.607
 accidents-0.597
Token paper
Token ID3348
Feature activation-8.09e-01

Pos logit contributions
abella+1.002
top+0.918
 wrong+0.864
bench+0.850
box+0.739
Neg logit contributions
hr-1.058
 exc-0.911
oc-0.905
 tou-0.897
 accidents-0.892
Token\n
Token ID198
Feature activation-1.57e+00

Pos logit contributions
abella+3.677
top+3.547
bench+3.519
 wrong+3.509
box+3.277
Neg logit contributions
hr-1.864
Ps-1.623
 exc-1.599
oc-1.598
Upon-1.549
TokenOne
Token ID3198
Feature activation-1.27e-01

Pos logit contributions
abella+2.091
bench+1.956
top+1.929
 wrong+1.920
box+1.749
Neg logit contributions
hr-2.619
Ps-2.345
oc-2.327
 exc-2.318
AAAA-2.276
Token day
Token ID1110
Feature activation-3.11e-01

Pos logit contributions
abella+0.149
top+0.135
 wrong+0.128
bench+0.125
box+0.107
Neg logit contributions
hr-0.242
oc-0.213
 exc-0.213
 tou-0.210
 accidents-0.208
Token,
Token ID11
Feature activation-3.71e-01

Pos logit contributions
abella+0.517
top+0.490
bench+0.471
 wrong+0.471
box+0.420
Neg logit contributions
hr-0.427
oc-0.355
 exc-0.354
AAAA-0.350
 accidents-0.350
Token they
Token ID484
Feature activation-3.59e-01

Pos logit contributions
abella+0.480
top+0.451
bench+0.431
 wrong+0.431
box+0.372
Neg logit contributions
hr-0.641
 exc-0.555
oc-0.548
 accidents-0.548
AAAA-0.544
Token find
Token ID1064
Feature activation-2.83e+00

Pos logit contributions
abella+0.624
top+0.584
 wrong+0.566
bench+0.557
box+0.503
Neg logit contributions
hr-0.473
 exc-0.397
oc-0.391
 tou-0.388
Ps-0.384
Token a
Token ID257
Feature activation-8.93e-01

Pos logit contributions
bench+5.337
abella+5.308
top+5.213
 versa+4.554
ven+4.539
Neg logit contributions
 little-3.024
hr-2.965
 from-2.822
 accidents-2.690
AAAA-2.542
Token new
Token ID649
Feature activation-9.47e-01

Pos logit contributions
abella+1.726
top+1.615
bench+1.542
 wrong+1.490
 versa+1.379
Neg logit contributions
hr-1.020
 tou-0.856
 exc-0.843
oc-0.819
 accidents-0.804
Token toy
Token ID13373
Feature activation-4.93e-01

Pos logit contributions
abella+1.838
top+1.675
bench+1.593
 wrong+1.585
 versa+1.470
Neg logit contributions
hr-1.099
oc-0.925
 exc-0.919
 tou-0.884
EX-0.872
Token in
Token ID287
Feature activation+2.61e-01

Pos logit contributions
abella+0.790
top+0.730
bench+0.697
 wrong+0.688
box+0.602
Neg logit contributions
hr-0.710
oc-0.611
 exc-0.606
 tou-0.603
 accidents-0.593
Token their
Token ID511
Feature activation-2.90e-01

Pos logit contributions
hr+0.324
oc+0.267
 exc+0.260
 tou+0.259
EX+0.258
Neg logit contributions
abella-0.474
top-0.449
 wrong-0.436
bench-0.428
box-0.390
Token.
Token ID13
Feature activation-1.42e+00

Pos logit contributions
abella+0.298
top+0.278
 wrong+0.269
bench+0.264
box+0.240
Neg logit contributions
hr-0.230
oc-0.192
 exc-0.189
 tou-0.188
EX-0.185
Token One
Token ID1881
Feature activation-3.92e-01

Pos logit contributions
abella+2.257
top+2.112
bench+2.110
 wrong+2.080
box+1.933
Neg logit contributions
hr-1.970
 exc-1.780
Ps-1.772
oc-1.747
AAAA-1.687
Token day
Token ID1110
Feature activation-7.27e-02

Pos logit contributions
abella+0.489
top+0.436
 wrong+0.410
bench+0.409
box+0.350
Neg logit contributions
hr-0.724
 exc-0.645
oc-0.644
 tou-0.627
 accidents-0.622
Token,
Token ID11
Feature activation-3.78e-01

Pos logit contributions
abella+0.118
top+0.111
 wrong+0.107
bench+0.105
box+0.094
Neg logit contributions
hr-0.104
oc-0.087
 exc-0.087
 tou-0.086
AAAA-0.085
Token they
Token ID484
Feature activation-5.77e-01

Pos logit contributions
abella+0.486
top+0.455
bench+0.434
 wrong+0.432
box+0.373
Neg logit contributions
hr-0.658
 exc-0.572
 accidents-0.566
oc-0.564
 tou-0.561
Token find
Token ID1064
Feature activation-2.80e+00

Pos logit contributions
abella+1.043
top+0.965
 wrong+0.934
bench+0.918
box+0.837
Neg logit contributions
hr-0.724
 exc-0.622
 tou-0.604
oc-0.602
Ps-0.590
Token a
Token ID257
Feature activation-8.67e-01

Pos logit contributions
abella+5.395
bench+5.230
top+5.181
 versa+4.606
ven+4.507
Neg logit contributions
 little-2.888
hr-2.850
 from-2.817
 accidents-2.642
AAAA-2.611
Token big
Token ID1263
Feature activation-9.16e-01

Pos logit contributions
abella+1.674
top+1.562
bench+1.500
 wrong+1.435
 versa+1.342
Neg logit contributions
hr-0.995
 tou-0.843
 exc-0.824
oc-0.806
Ps-0.789
Token box
Token ID3091
Feature activation-5.16e-01

Pos logit contributions
abella+1.411
top+1.242
bench+1.158
 wrong+1.140
 versa+1.017
Neg logit contributions
hr-1.442
oc-1.294
 tou-1.269
 exc-1.256
 accidents-1.245
Token in
Token ID287
Feature activation+6.45e-01

Pos logit contributions
abella+0.943
top+0.866
 wrong+0.838
bench+0.823
 versa+0.743
Neg logit contributions
hr-0.643
 exc-0.529
oc-0.522
 tou-0.517
EX-0.510
Token the
Token ID262
Feature activation-1.21e+00

Pos logit contributions
hr+0.827
oc+0.693
AAAA+0.673
 tou+0.673
Ps+0.670
Neg logit contributions
abella-1.151
top-1.100
 wrong-1.066
bench-1.050
box-0.945
Token or
Token ID393
Feature activation+5.85e-02

Pos logit contributions
abella+0.589
top+0.552
bench+0.523
 wrong+0.503
box+0.438
Neg logit contributions
hr-0.883
 accidents-0.784
 exc-0.778
 tou-0.761
oc-0.761
Token on
Token ID319
Feature activation+3.90e-01

Pos logit contributions
hr+0.082
oc+0.069
 exc+0.068
 tou+0.067
EX+0.067
Neg logit contributions
abella-0.096
top-0.090
 wrong-0.088
bench-0.086
box-0.077
Token paper
Token ID3348
Feature activation-1.65e-01

Pos logit contributions
hr+0.515
oc+0.426
 tou+0.421
 exc+0.416
EX+0.416
Neg logit contributions
abella-0.672
top-0.638
 wrong-0.625
bench-0.606
box-0.551
Token.
Token ID13
Feature activation+2.47e-01

Pos logit contributions
abella+0.287
top+0.269
 wrong+0.261
bench+0.256
box+0.231
Neg logit contributions
hr-0.215
oc-0.179
 exc-0.176
 tou-0.174
 accidents-0.174
Token You
Token ID921
Feature activation-7.72e-01

Pos logit contributions
hr+0.383
oc+0.323
 tou+0.321
 exc+0.319
 accidents+0.318
Neg logit contributions
abella-0.371
top-0.346
 wrong-0.333
bench-0.325
box-0.290
Token can
Token ID460
Feature activation-2.78e+00

Pos logit contributions
abella+1.239
top+1.145
 wrong+1.087
bench+1.084
box+0.974
Neg logit contributions
hr-1.133
 exc-0.997
oc-0.976
 tou-0.963
AAAA-0.942
Token not
Token ID407
Feature activation-1.13e+00

Pos logit contributions
abella+5.410
bench+5.034
top+4.761
 wrong+4.719
 projector+4.405
Neg logit contributions
hr-3.456
 exc-3.368
 from-2.795
 mis-2.743
 ac-2.689
Token draw
Token ID3197
Feature activation+5.19e-01

Pos logit contributions
abella+2.083
top+1.873
bench+1.839
 wrong+1.813
box+1.656
Neg logit contributions
hr-1.467
 exc-1.270
 tou-1.188
oc-1.168
 accidents-1.153
Token on
Token ID319
Feature activation+3.06e-01

Pos logit contributions
hr+0.767
oc+0.645
 tou+0.637
EX+0.635
 exc+0.635
Neg logit contributions
abella-0.820
top-0.769
 wrong-0.769
bench-0.722
box-0.661
Token other
Token ID584
Feature activation+1.98e-01

Pos logit contributions
hr+0.393
oc+0.324
 exc+0.320
 tou+0.320
EX+0.316
Neg logit contributions
abella-0.539
top-0.509
 wrong-0.497
bench-0.482
box-0.441
Token people
Token ID661
Feature activation+1.48e-01

Pos logit contributions
hr+0.281
oc+0.236
 exc+0.233
 tou+0.233
EX+0.231
Neg logit contributions
abella-0.322
top-0.305
 wrong-0.296
bench-0.288
box-0.261
Token\n
Token ID198
Feature activation-1.37e+00

Pos logit contributions
abella+2.341
top+2.262
 wrong+2.216
bench+2.180
box+2.007
Neg logit contributions
hr-1.365
oc-1.191
 exc-1.145
Ps-1.104
Upon-1.080
TokenOne
Token ID3198
Feature activation-2.07e-01

Pos logit contributions
abella+1.860
top+1.735
 wrong+1.705
bench+1.701
box+1.504
Neg logit contributions
hr-2.236
oc-2.039
Ps-1.997
 exc-1.975
AAAA-1.961
Token day
Token ID1110
Feature activation-2.38e-01

Pos logit contributions
abella+0.245
top+0.223
 wrong+0.211
bench+0.208
box+0.178
Neg logit contributions
hr-0.391
oc-0.345
 exc-0.344
 tou-0.338
 accidents-0.336
Token,
Token ID11
Feature activation-4.07e-01

Pos logit contributions
abella+0.393
top+0.372
 wrong+0.357
bench+0.356
box+0.317
Neg logit contributions
hr-0.329
oc-0.276
 exc-0.274
AAAA-0.271
 accidents-0.270
Token they
Token ID484
Feature activation-1.01e-01

Pos logit contributions
abella+0.559
top+0.526
bench+0.504
 wrong+0.502
box+0.440
Neg logit contributions
hr-0.670
 exc-0.578
oc-0.573
 accidents-0.571
AAAA-0.566
Token found
Token ID1043
Feature activation-2.77e+00

Pos logit contributions
abella+0.176
top+0.166
 wrong+0.161
bench+0.158
box+0.143
Neg logit contributions
hr-0.132
oc-0.109
 exc-0.108
 tou-0.107
Ps-0.106
Token a
Token ID257
Feature activation-8.09e-01

Pos logit contributions
bench+5.368
abella+5.314
top+5.057
ven+4.601
 Dame+4.591
Neg logit contributions
 from-2.847
hr-2.815
 accidents-2.712
 little-2.694
 pets-2.517
Token big
Token ID1263
Feature activation-1.08e+00

Pos logit contributions
abella+1.540
top+1.444
bench+1.377
 wrong+1.338
 versa+1.221
Neg logit contributions
hr-0.942
 tou-0.780
 exc-0.768
oc-0.766
 accidents-0.750
Token fan
Token ID4336
Feature activation-4.52e-01

Pos logit contributions
abella+1.746
top+1.555
bench+1.440
 wrong+1.423
 versa+1.268
Neg logit contributions
hr-1.627
oc-1.447
 tou-1.418
 accidents-1.416
 exc-1.392
Token in
Token ID287
Feature activation+3.51e-01

Pos logit contributions
abella+0.762
top+0.698
 wrong+0.674
bench+0.668
box+0.592
Neg logit contributions
hr-0.632
oc-0.527
 exc-0.519
 tou-0.517
EX-0.510
Token the
Token ID262
Feature activation-1.71e+00

Pos logit contributions
hr+0.443
oc+0.367
 tou+0.360
 exc+0.358
EX+0.354
Neg logit contributions
abella-0.632
top-0.601
 wrong-0.579
bench-0.572
box-0.519
Token\n
Token ID198
Feature activation-7.92e-01

Pos logit contributions
abella+1.684
top+1.615
 wrong+1.599
bench+1.571
box+1.453
Neg logit contributions
hr-1.150
 exc-0.983
oc-0.980
Ps-0.945
AAAA-0.944
TokenOne
Token ID3198
Feature activation-5.13e-01

Pos logit contributions
abella+1.045
top+0.972
 wrong+0.965
bench+0.950
box+0.848
Neg logit contributions
hr-1.350
oc-1.191
 exc-1.185
AAAA-1.164
Ps-1.156
Token day
Token ID1110
Feature activation+2.89e-01

Pos logit contributions
abella+0.639
top+0.582
bench+0.549
 wrong+0.549
box+0.479
Neg logit contributions
hr-0.939
 exc-0.835
oc-0.820
 tou-0.804
 accidents-0.801
Token,
Token ID11
Feature activation+4.92e-01

Pos logit contributions
hr+0.434
oc+0.370
 tou+0.363
 exc+0.362
EX+0.361
Neg logit contributions
abella-0.454
top-0.422
 wrong-0.411
bench-0.394
box-0.357
Token they
Token ID484
Feature activation+4.55e-02

Pos logit contributions
hr+0.853
oc+0.756
EX+0.737
 tou+0.734
AAAA+0.728
Neg logit contributions
abella-0.670
top-0.608
 wrong-0.586
bench-0.549
box-0.486
Token find
Token ID1064
Feature activation-2.75e+00

Pos logit contributions
hr+0.064
oc+0.054
 exc+0.053
 tou+0.053
EX+0.052
Neg logit contributions
abella-0.075
top-0.071
 wrong-0.068
bench-0.067
box-0.060
Token a
Token ID257
Feature activation-1.33e+00

Pos logit contributions
bench+5.225
top+5.049
abella+5.044
ven+4.427
 wrong+4.404
Neg logit contributions
 little-3.025
hr-2.935
 from-2.864
 accidents-2.664
AAAA-2.596
Token new
Token ID649
Feature activation-1.30e+00

Pos logit contributions
abella+2.636
top+2.482
bench+2.408
 wrong+2.255
 versa+2.127
Neg logit contributions
hr-1.461
 tou-1.240
 exc-1.206
 accidents-1.163
 mis-1.153
Token soap
Token ID19533
Feature activation-8.84e-01

Pos logit contributions
abella+2.418
top+2.201
bench+2.118
 wrong+2.061
 versa+1.909
Neg logit contributions
hr-1.654
 exc-1.399
oc-1.347
 accidents-1.342
 tou-1.332
Token in
Token ID287
Feature activation+5.23e-01

Pos logit contributions
abella+1.534
top+1.405
bench+1.337
 wrong+1.324
 versa+1.193
Neg logit contributions
hr-1.168
 exc-0.985
 tou-0.978
oc-0.974
 accidents-0.967
Token the
Token ID262
Feature activation-1.78e+00

Pos logit contributions
hr+0.690
oc+0.581
 tou+0.565
Ps+0.562
EX+0.560
Neg logit contributions
abella-0.911
top-0.867
 wrong-0.840
bench-0.826
box-0.748
Token day
Token ID1110
Feature activation+1.77e-01

Pos logit contributions
abella+0.256
top+0.233
 wrong+0.220
bench+0.217
box+0.186
Neg logit contributions
hr-0.409
oc-0.360
 exc-0.360
 tou-0.355
 accidents-0.351
Token,
Token ID11
Feature activation+1.36e-01

Pos logit contributions
hr+0.262
oc+0.223
 tou+0.219
 exc+0.219
EX+0.217
Neg logit contributions
abella-0.281
top-0.263
 wrong-0.255
bench-0.247
box-0.223
Token Tom
Token ID4186
Feature activation+3.29e-01

Pos logit contributions
hr+0.240
oc+0.210
 tou+0.207
 exc+0.207
EX+0.205
Neg logit contributions
abella-0.176
top-0.162
 wrong-0.155
bench-0.150
box-0.131
Token and
Token ID290
Feature activation-1.21e-01

Pos logit contributions
hr+0.432
oc+0.353
 exc+0.347
 tou+0.346
EX+0.346
Neg logit contributions
abella-0.572
top-0.541
 wrong-0.530
bench-0.515
box-0.469
Token Mia
Token ID32189
Feature activation-3.67e-01

Pos logit contributions
abella+0.176
top+0.165
 wrong+0.159
bench+0.156
box+0.138
Neg logit contributions
hr-0.194
oc-0.166
 exc-0.165
 tou-0.163
EX-0.162
Token find
Token ID1064
Feature activation-2.74e+00

Pos logit contributions
abella+0.654
top+0.609
 wrong+0.590
bench+0.586
box+0.533
Neg logit contributions
hr-0.467
 exc-0.391
oc-0.386
 tou-0.386
Ps-0.379
Token a
Token ID257
Feature activation-6.83e-01

Pos logit contributions
abella+5.127
bench+5.118
top+5.053
ven+4.375
 versa+4.316
Neg logit contributions
 little-2.899
 from-2.830
hr-2.822
 accidents-2.671
 tou-2.639
Token big
Token ID1263
Feature activation-7.76e-01

Pos logit contributions
abella+1.327
top+1.251
bench+1.197
 wrong+1.166
box+1.065
Neg logit contributions
hr-0.760
 tou-0.632
 exc-0.621
oc-0.607
 accidents-0.601
Token box
Token ID3091
Feature activation-6.03e-01

Pos logit contributions
abella+1.252
top+1.130
bench+1.050
 wrong+1.050
liv+0.913
Neg logit contributions
hr-1.145
oc-1.005
 tou-0.995
 exc-0.986
 accidents-0.976
Token in
Token ID287
Feature activation+6.43e-01

Pos logit contributions
abella+1.092
top+1.000
 wrong+0.957
bench+0.947
 versa+0.859
Neg logit contributions
hr-0.764
 exc-0.635
oc-0.620
 tou-0.617
EX-0.608
Token the
Token ID262
Feature activation-1.27e+00

Pos logit contributions
hr+0.836
oc+0.701
 tou+0.686
AAAA+0.686
Ps+0.682
Neg logit contributions
abella-1.136
top-1.084
 wrong-1.048
bench-1.033
box-0.933
Token.
Token ID13
Feature activation-1.44e+00

Pos logit contributions
abella+3.187
top+2.878
 wrong+2.754
bench+2.721
liv+2.584
Neg logit contributions
hr-1.537
 from-1.347
oc-1.338
 exc-1.325
 tou-1.222
Token One
Token ID1881
Feature activation-1.82e-01

Pos logit contributions
abella+2.036
top+1.850
bench+1.842
 wrong+1.824
box+1.665
Neg logit contributions
hr-2.269
 exc-2.104
Ps-2.090
oc-2.085
AAAA-1.998
Token day
Token ID1110
Feature activation-2.38e-01

Pos logit contributions
abella+0.215
top+0.195
 wrong+0.185
bench+0.181
box+0.155
Neg logit contributions
hr-0.344
oc-0.304
 exc-0.304
 tou-0.300
EX-0.297
Token,
Token ID11
Feature activation-4.05e-01

Pos logit contributions
abella+0.391
top+0.369
 wrong+0.355
bench+0.353
box+0.314
Neg logit contributions
hr-0.333
oc-0.278
 exc-0.278
AAAA-0.273
 accidents-0.273
Token they
Token ID484
Feature activation-2.36e-01

Pos logit contributions
abella+0.525
top+0.490
bench+0.468
 wrong+0.467
box+0.403
Neg logit contributions
hr-0.704
 exc-0.612
 accidents-0.605
oc-0.603
 tou-0.598
Token find
Token ID1064
Feature activation-2.74e+00

Pos logit contributions
abella+0.409
top+0.383
 wrong+0.372
bench+0.366
box+0.331
Neg logit contributions
hr-0.313
 exc-0.262
oc-0.260
 tou-0.258
Ps-0.254
Token a
Token ID257
Feature activation-7.01e-01

Pos logit contributions
abella+5.128
bench+5.041
top+5.039
 wrong+4.364
ven+4.342
Neg logit contributions
 little-2.871
hr-2.816
 from-2.809
 accidents-2.598
AAAA-2.586
Token big
Token ID1263
Feature activation-8.85e-01

Pos logit contributions
abella+1.313
top+1.232
bench+1.179
 wrong+1.143
box+1.042
Neg logit contributions
hr-0.834
 tou-0.708
 exc-0.691
oc-0.683
Ps-0.672
Token box
Token ID3091
Feature activation-5.37e-01

Pos logit contributions
abella+1.394
top+1.242
bench+1.158
 wrong+1.145
 versa+1.005
Neg logit contributions
hr-1.349
oc-1.201
 tou-1.184
 exc-1.167
 accidents-1.165
Token in
Token ID287
Feature activation+6.46e-01

Pos logit contributions
abella+0.973
top+0.894
 wrong+0.862
bench+0.849
 Luke+0.766
Neg logit contributions
hr-0.678
 exc-0.562
oc-0.550
 tou-0.548
EX-0.539
Token the
Token ID262
Feature activation-1.28e+00

Pos logit contributions
hr+0.831
oc+0.696
AAAA+0.678
 tou+0.677
Ps+0.674
Neg logit contributions
abella-1.149
top-1.096
 wrong-1.061
bench-1.046
box-0.942
Token They
Token ID1119
Feature activation-8.26e-01

Pos logit contributions
abella+3.099
 wrong+2.876
top+2.796
bench+2.639
box+2.475
Neg logit contributions
hr-2.070
oc-1.820
Ps-1.811
AAAA-1.759
 exc-1.753
Token were
Token ID547
Feature activation-1.08e+00

Pos logit contributions
abella+1.343
top+1.254
bench+1.215
 wrong+1.202
box+1.093
Neg logit contributions
hr-1.165
 exc-1.042
 tou-0.992
oc-0.989
Ps-0.982
Token always
Token ID1464
Feature activation-1.31e-01

Pos logit contributions
abella+1.750
top+1.668
bench+1.619
 wrong+1.534
box+1.389
Neg logit contributions
hr-1.580
 exc-1.402
 tou-1.336
 mis-1.259
 fortunate-1.258
Token playing
Token ID2712
Feature activation-7.41e-01

Pos logit contributions
abella+0.210
top+0.197
 wrong+0.189
bench+0.186
box+0.166
Neg logit contributions
hr-0.193
 exc-0.161
oc-0.160
 tou-0.159
 accidents-0.158
Token together
Token ID1978
Feature activation-1.24e+00

Pos logit contributions
abella+1.250
top+1.152
bench+1.103
 wrong+1.090
box+0.958
Neg logit contributions
hr-1.051
oc-0.903
AAAA-0.870
 exc-0.870
 accidents-0.853
Token and
Token ID290
Feature activation-2.74e+00

Pos logit contributions
abella+2.442
top+2.230
bench+2.212
 wrong+2.115
liv+1.969
Neg logit contributions
hr-1.384
 exc-1.111
oc-1.108
 accidents-1.059
 from-1.056
Token they
Token ID484
Feature activation-1.21e+00

Pos logit contributions
abella+5.308
bench+4.961
top+4.925
 wrong+4.601
box+4.472
Neg logit contributions
hr-3.536
 exc-2.976
 accidents-2.784
 mis-2.682
Ps-2.618
Token were
Token ID547
Feature activation-8.20e-01

Pos logit contributions
abella+1.989
top+1.881
bench+1.844
 wrong+1.805
box+1.670
Neg logit contributions
hr-1.663
 exc-1.502
oc-1.401
 tou-1.397
Ps-1.375
Token very
Token ID845
Feature activation-5.13e-01

Pos logit contributions
abella+1.303
top+1.239
bench+1.199
 wrong+1.150
box+1.061
Neg logit contributions
hr-1.223
 exc-1.057
 tou-1.002
 fortunate-0.984
 accidents-0.976
Token good
Token ID922
Feature activation-3.22e-01

Pos logit contributions
abella+1.067
top+1.013
bench+0.982
 wrong+0.979
box+0.918
Neg logit contributions
hr-0.514
 exc-0.418
oc-0.382
 fortunate-0.378
 tou-0.376
Token at
Token ID379
Feature activation-1.31e+00

Pos logit contributions
abella+0.692
top+0.661
 wrong+0.641
bench+0.638
box+0.587
Neg logit contributions
hr-0.297
 exc-0.220
oc-0.218
 tou-0.213
 accidents-0.212
Token\n
Token ID198
Feature activation-1.46e+00

Pos logit contributions
abella+3.103
top+3.014
bench+2.949
 wrong+2.942
box+2.743
Neg logit contributions
hr-1.677
oc-1.465
Ps-1.430
 exc-1.401
AAAA-1.377
TokenOne
Token ID3198
Feature activation-4.12e-01

Pos logit contributions
abella+1.875
bench+1.750
top+1.738
 wrong+1.702
box+1.561
Neg logit contributions
hr-2.496
Ps-2.246
oc-2.245
AAAA-2.206
 exc-2.196
Token day
Token ID1110
Feature activation-1.69e-01

Pos logit contributions
abella+0.509
top+0.461
 wrong+0.433
bench+0.431
box+0.372
Neg logit contributions
hr-0.764
 exc-0.677
oc-0.671
 accidents-0.655
 tou-0.655
Token,
Token ID11
Feature activation-3.11e-01

Pos logit contributions
abella+0.277
top+0.261
 wrong+0.252
bench+0.250
box+0.223
Neg logit contributions
hr-0.238
oc-0.199
 exc-0.197
AAAA-0.195
 tou-0.195
Token they
Token ID484
Feature activation-6.84e-01

Pos logit contributions
abella+0.411
top+0.385
 wrong+0.367
bench+0.367
box+0.318
Neg logit contributions
hr-0.533
 exc-0.460
oc-0.457
 accidents-0.456
 tou-0.452
Token find
Token ID1064
Feature activation-2.72e+00

Pos logit contributions
abella+1.247
top+1.161
 wrong+1.115
bench+1.113
box+1.012
Neg logit contributions
hr-0.845
 exc-0.716
 tou-0.698
Ps-0.695
AAAA-0.681
Token a
Token ID257
Feature activation-5.54e-01

Pos logit contributions
abella+5.015
bench+5.011
top+4.882
ven+4.257
rob+4.248
Neg logit contributions
hr-2.949
 little-2.929
 from-2.829
 accidents-2.731
AAAA-2.531
Token yellow
Token ID7872
Feature activation-2.51e-01

Pos logit contributions
abella+1.025
top+0.963
bench+0.915
 wrong+0.900
box+0.814
Neg logit contributions
hr-0.672
 tou-0.557
 exc-0.553
oc-0.545
 accidents-0.538
Token box
Token ID3091
Feature activation-4.97e-01

Pos logit contributions
abella+0.419
top+0.389
 wrong+0.376
bench+0.372
box+0.329
Neg logit contributions
hr-0.346
oc-0.295
 tou-0.289
 exc-0.288
 accidents-0.284
Token in
Token ID287
Feature activation+5.21e-01

Pos logit contributions
abella+0.886
top+0.816
 wrong+0.782
bench+0.772
 versa+0.693
Neg logit contributions
hr-0.639
 exc-0.529
oc-0.518
 tou-0.517
EX-0.511
Token their
Token ID511
Feature activation-2.90e-01

Pos logit contributions
hr+0.667
oc+0.558
 tou+0.543
AAAA+0.541
 exc+0.538
Neg logit contributions
abella-0.927
top-0.885
 wrong-0.857
bench-0.841
box-0.762
Token.
Token ID13
Feature activation-1.38e+00

Pos logit contributions
abella+1.470
top+1.353
 wrong+1.293
bench+1.291
box+1.188
Neg logit contributions
hr-0.942
oc-0.785
 exc-0.781
 accidents-0.764
 tou-0.752
Token One
Token ID1881
Feature activation-5.30e-01

Pos logit contributions
abella+1.834
bench+1.711
top+1.710
 wrong+1.679
box+1.538
Neg logit contributions
hr-2.214
 exc-2.032
Ps-2.027
oc-1.992
AAAA-1.941
Token day
Token ID1110
Feature activation-7.29e-02

Pos logit contributions
abella+0.672
top+0.603
bench+0.570
 wrong+0.566
box+0.486
Neg logit contributions
hr-0.963
 exc-0.859
oc-0.854
 tou-0.831
 accidents-0.826
Token,
Token ID11
Feature activation-2.85e-01

Pos logit contributions
abella+0.118
top+0.111
 wrong+0.107
bench+0.106
box+0.094
Neg logit contributions
hr-0.104
oc-0.088
 exc-0.087
 tou-0.086
AAAA-0.086
Token they
Token ID484
Feature activation-4.83e-01

Pos logit contributions
abella+0.368
top+0.343
 wrong+0.327
bench+0.326
box+0.280
Neg logit contributions
hr-0.498
 exc-0.432
oc-0.427
 accidents-0.427
 tou-0.425
Token find
Token ID1064
Feature activation-2.72e+00

Pos logit contributions
abella+0.860
top+0.802
 wrong+0.778
bench+0.767
box+0.697
Neg logit contributions
hr-0.615
 exc-0.522
 tou-0.509
oc-0.508
Ps-0.502
Token a
Token ID257
Feature activation-8.03e-01

Pos logit contributions
bench+5.101
abella+5.012
top+4.964
 wrong+4.370
ven+4.303
Neg logit contributions
 little-2.892
hr-2.874
 from-2.803
 accidents-2.594
 tou-2.575
Token big
Token ID1263
Feature activation-9.78e-01

Pos logit contributions
abella+1.534
top+1.441
bench+1.384
 wrong+1.333
box+1.221
Neg logit contributions
hr-0.930
 tou-0.783
 exc-0.762
oc-0.753
 accidents-0.740
Token box
Token ID3091
Feature activation-5.53e-01

Pos logit contributions
abella+1.530
top+1.358
bench+1.267
 wrong+1.249
 versa+1.095
Neg logit contributions
hr-1.511
oc-1.349
 tou-1.326
 accidents-1.306
 exc-1.305
Token in
Token ID287
Feature activation+7.11e-01

Pos logit contributions
abella+1.019
top+0.937
 wrong+0.906
bench+0.890
 versa+0.804
Neg logit contributions
hr-0.684
 exc-0.562
oc-0.551
 tou-0.548
EX-0.541
Token the
Token ID262
Feature activation-1.39e+00

Pos logit contributions
hr+0.921
oc+0.778
 tou+0.752
AAAA+0.752
Ps+0.752
Neg logit contributions
abella-1.259
top-1.203
 wrong-1.162
bench-1.147
box-1.030
Token were
Token ID547
Feature activation-8.97e-01

Pos logit contributions
abella+1.443
top+1.356
bench+1.320
 wrong+1.299
box+1.193
Neg logit contributions
hr-1.151
 exc-1.029
oc-0.975
 tou-0.973
Ps-0.973
Token walking
Token ID6155
Feature activation-4.90e-01

Pos logit contributions
abella+1.372
top+1.305
bench+1.258
 wrong+1.199
box+1.084
Neg logit contributions
hr-1.384
 exc-1.224
 tou-1.178
oc-1.136
 fortunate-1.126
Token in
Token ID287
Feature activation+3.77e-01

Pos logit contributions
abella+0.813
top+0.748
 wrong+0.729
bench+0.725
box+0.643
Neg logit contributions
hr-0.699
oc-0.596
 exc-0.583
EX-0.574
 tou-0.573
Token the
Token ID262
Feature activation-9.06e-01

Pos logit contributions
hr+0.469
oc+0.388
 tou+0.380
 exc+0.380
EX+0.375
Neg logit contributions
abella-0.678
top-0.646
 wrong-0.627
bench-0.613
box-0.558
Token park
Token ID3952
Feature activation-1.56e+00

Pos logit contributions
abella+1.481
bench+1.350
top+1.346
 wrong+1.283
box+1.173
Neg logit contributions
hr-1.308
 tou-1.139
Ps-1.119
oc-1.114
AAAA-1.102
Token and
Token ID290
Feature activation-2.72e+00

Pos logit contributions
abella+3.400
bench+3.115
top+2.976
 wrong+2.976
acey+2.852
Neg logit contributions
hr-1.577
 exc-1.200
oc-1.167
 from-1.085
EX-1.018
Token stopped
Token ID5025
Feature activation+4.71e-01

Pos logit contributions
bench+4.932
top+4.393
abella+4.391
 wrong+4.246
box+4.069
Neg logit contributions
hr-3.826
 pets-3.076
Ps-3.033
 from-3.027
 mis-2.960
Token in
Token ID287
Feature activation+2.47e-02

Pos logit contributions
hr+0.752
oc+0.656
 tou+0.641
 exc+0.636
EX+0.632
Neg logit contributions
abella-0.670
top-0.640
 wrong-0.624
bench-0.586
box-0.527
Token front
Token ID2166
Feature activation-1.84e-01

Pos logit contributions
hr+0.035
oc+0.030
 exc+0.029
 tou+0.029
EX+0.029
Neg logit contributions
abella-0.040
top-0.038
 wrong-0.037
bench-0.036
box-0.032
Token of
Token ID286
Feature activation-9.00e-01

Pos logit contributions
abella+0.369
top+0.347
 wrong+0.339
bench+0.335
box+0.307
Neg logit contributions
hr-0.194
oc-0.150
 exc-0.149
 tou-0.147
 accidents-0.146
Token a
Token ID257
Feature activation-5.06e-01

Pos logit contributions
abella+1.459
bench+1.384
top+1.379
 wrong+1.371
box+1.221
Neg logit contributions
hr-1.292
oc-1.058
 tou-1.043
 exc-1.029
EX-1.025
Tokenmy
Token ID1820
Feature activation-4.61e-01

Pos logit contributions
abella+1.067
top+0.966
 wrong+0.933
bench+0.921
 versa+0.813
Neg logit contributions
hr-0.961
oc-0.817
 exc-0.803
EX-0.785
 tou-0.784
Token.
Token ID13
Feature activation-8.10e-01

Pos logit contributions
abella+0.898
top+0.838
 wrong+0.812
bench+0.801
box+0.725
Neg logit contributions
hr-0.519
oc-0.419
 exc-0.411
 accidents-0.398
 tou-0.396
Token They
Token ID1119
Feature activation-1.96e+00

Pos logit contributions
abella+1.072
top+0.974
 wrong+0.965
bench+0.943
box+0.808
Neg logit contributions
hr-1.372
 exc-1.221
oc-1.217
Ps-1.214
EX-1.204
Token would
Token ID561
Feature activation-1.83e+00

Pos logit contributions
abella+2.374
top+2.303
 wrong+2.276
bench+2.206
box+1.992
Neg logit contributions
hr-3.431
 exc-3.344
Ps-3.197
 mis-3.121
 tou-3.080
Token stretch
Token ID7539
Feature activation-1.44e+00

Pos logit contributions
abella+3.537
top+3.268
 wrong+3.261
bench+3.240
box+3.004
Neg logit contributions
hr-2.151
 exc-2.017
 mis-1.762
Ps-1.696
oc-1.696
Token and
Token ID290
Feature activation-2.72e+00

Pos logit contributions
abella+2.848
top+2.666
 wrong+2.606
bench+2.591
box+2.337
Neg logit contributions
hr-1.562
 exc-1.275
 from-1.249
oc-1.238
 accidents-1.192
Token move
Token ID1445
Feature activation-8.16e-01

Pos logit contributions
abella+5.875
 wrong+5.303
bench+5.125
top+5.078
box+4.936
Neg logit contributions
hr-2.974
 exc-2.640
 mis-2.507
Ps-2.214
oc-2.187
Token their
Token ID511
Feature activation+2.29e-01

Pos logit contributions
abella+1.409
top+1.311
bench+1.279
 wrong+1.269
box+1.163
Neg logit contributions
hr-1.081
oc-0.899
 exc-0.889
 accidents-0.888
EX-0.869
Token bodies
Token ID5920
Feature activation-1.52e+00

Pos logit contributions
hr+0.327
oc+0.277
 tou+0.271
 exc+0.271
AAAA+0.268
Neg logit contributions
abella-0.372
top-0.351
 wrong-0.342
bench-0.333
box-0.301
Token in
Token ID287
Feature activation+1.72e-01

Pos logit contributions
abella+3.039
top+2.658
 wrong+2.555
bench+2.532
liv+2.391
Neg logit contributions
hr-1.739
 exc-1.489
 from-1.426
rum-1.291
oc-1.280
Token fun
Token ID1257
Feature activation-4.27e-01

Pos logit contributions
hr+0.219
oc+0.181
 tou+0.178
 exc+0.177
EX+0.175
Neg logit contributions
abella-0.306
top-0.290
 wrong-0.282
bench-0.276
box-0.251
Token\n
Token ID198
Feature activation-1.46e+00

Pos logit contributions
abella+2.678
top+2.600
 wrong+2.553
bench+2.513
box+2.352
Neg logit contributions
hr-1.506
oc-1.314
 exc-1.254
Ps-1.233
AAAA-1.209
TokenOne
Token ID3198
Feature activation-1.52e-01

Pos logit contributions
abella+1.900
top+1.782
bench+1.778
 wrong+1.777
box+1.613
Neg logit contributions
hr-2.456
oc-2.238
Ps-2.186
AAAA-2.153
 exc-2.146
Token day
Token ID1110
Feature activation-3.27e-01

Pos logit contributions
abella+0.178
top+0.162
 wrong+0.153
bench+0.150
box+0.128
Neg logit contributions
hr-0.290
oc-0.256
 exc-0.255
 tou-0.251
 accidents-0.249
Token,
Token ID11
Feature activation-3.65e-01

Pos logit contributions
abella+0.546
top+0.517
bench+0.498
 wrong+0.496
box+0.443
Neg logit contributions
hr-0.448
oc-0.373
 exc-0.370
AAAA-0.367
 accidents-0.366
Token they
Token ID484
Feature activation-1.19e-01

Pos logit contributions
abella+0.498
top+0.470
bench+0.449
 wrong+0.448
box+0.391
Neg logit contributions
hr-0.609
 exc-0.522
oc-0.520
 accidents-0.518
AAAA-0.513
Token find
Token ID1064
Feature activation-2.72e+00

Pos logit contributions
abella+0.200
top+0.188
 wrong+0.182
bench+0.178
box+0.161
Neg logit contributions
hr-0.164
oc-0.137
 exc-0.137
 tou-0.135
AAAA-0.134
Token a
Token ID257
Feature activation-8.54e-01

Pos logit contributions
abella+5.199
top+4.992
bench+4.961
liv+4.363
 wrong+4.359
Neg logit contributions
 little-2.936
hr-2.878
 from-2.766
 accidents-2.660
 pets-2.509
Token new
Token ID649
Feature activation-8.73e-01

Pos logit contributions
abella+1.650
top+1.535
bench+1.454
 wrong+1.420
liv+1.323
Neg logit contributions
hr-0.989
 tou-0.825
 exc-0.825
oc-0.801
 accidents-0.784
Token rocket
Token ID10701
Feature activation+3.51e-01

Pos logit contributions
abella+1.362
top+1.205
 wrong+1.133
bench+1.122
liv+1.005
Neg logit contributions
hr-1.355
oc-1.198
 exc-1.197
 tou-1.160
EX-1.143
Token in
Token ID287
Feature activation+2.80e-02

Pos logit contributions
hr+0.523
oc+0.440
 exc+0.434
 tou+0.433
EX+0.432
Neg logit contributions
abella-0.543
top-0.516
 wrong-0.505
bench-0.488
box-0.443
Token their
Token ID511
Feature activation-9.11e-01

Pos logit contributions
hr+0.034
oc+0.028
 exc+0.028
 tou+0.027
EX+0.027
Neg logit contributions
abella-0.051
top-0.049
 wrong-0.047
bench-0.046
box-0.042
Token\n
Token ID198
Feature activation+1.81e-01

Pos logit contributions
abella+1.177
top+1.105
 wrong+1.091
bench+1.065
box+0.975
Neg logit contributions
hr-0.835
oc-0.709
 exc-0.704
Ps-0.689
AAAA-0.689
TokenOne
Token ID3198
Feature activation-1.31e-01

Pos logit contributions
hr+0.363
oc+0.320
 tou+0.318
 exc+0.317
 accidents+0.316
Neg logit contributions
abella-0.191
top-0.174
 wrong-0.164
bench-0.158
box-0.132
Token day
Token ID1110
Feature activation+5.31e-01

Pos logit contributions
abella+0.154
top+0.140
 wrong+0.133
bench+0.130
box+0.111
Neg logit contributions
hr-0.249
oc-0.220
 exc-0.219
 tou-0.216
 accidents-0.214
Token,
Token ID11
Feature activation+7.51e-01

Pos logit contributions
hr+0.830
oc+0.711
 tou+0.697
EX+0.695
 exc+0.689
Neg logit contributions
abella-0.798
top-0.746
 wrong-0.733
bench-0.685
box-0.624
Token they
Token ID484
Feature activation-3.74e-01

Pos logit contributions
hr+1.318
oc+1.168
 tou+1.134
EX+1.126
rum+1.120
Neg logit contributions
abella-1.006
top-0.926
 wrong-0.901
bench-0.817
box-0.729
Token find
Token ID1064
Feature activation-2.71e+00

Pos logit contributions
abella+0.647
top+0.605
 wrong+0.585
bench+0.576
box+0.521
Neg logit contributions
hr-0.497
 exc-0.420
oc-0.413
 tou-0.412
AAAA-0.407
Token a
Token ID257
Feature activation-8.06e-01

Pos logit contributions
bench+5.071
abella+4.983
top+4.947
 wrong+4.333
 versa+4.318
Neg logit contributions
 little-2.941
hr-2.922
 from-2.722
 accidents-2.598
 tou-2.543
Token new
Token ID649
Feature activation-1.02e+00

Pos logit contributions
abella+1.541
top+1.448
bench+1.387
 wrong+1.344
 versa+1.235
Neg logit contributions
hr-0.932
 tou-0.776
 exc-0.761
oc-0.759
AAAA-0.732
Token truck
Token ID7779
Feature activation+3.03e-01

Pos logit contributions
abella+1.910
top+1.718
bench+1.652
 wrong+1.642
 versa+1.530
Neg logit contributions
hr-1.266
oc-1.122
 exc-1.064
 tou-1.019
EX-1.018
Token in
Token ID287
Feature activation+6.12e-01

Pos logit contributions
hr+0.443
oc+0.372
 exc+0.367
AAAA+0.366
 tou+0.365
Neg logit contributions
abella-0.480
top-0.455
 wrong-0.443
bench-0.430
box-0.389
Token the
Token ID262
Feature activation-1.27e+00

Pos logit contributions
hr+0.796
oc+0.670
Ps+0.647
EX+0.647
 tou+0.646
Neg logit contributions
abella-1.074
top-1.022
 wrong-0.989
bench-0.975
box-0.883
Token mask
Token ID9335
Feature activation-6.64e-01

Pos logit contributions
abella+0.884
top+0.815
 wrong+0.783
bench+0.774
box+0.671
Neg logit contributions
hr-0.792
oc-0.690
 tou-0.673
 exc-0.670
EX-0.659
Token.
Token ID13
Feature activation-7.07e-01

Pos logit contributions
abella+1.188
top+1.108
 wrong+1.066
bench+1.057
box+0.947
Neg logit contributions
hr-0.823
oc-0.688
 exc-0.678
 tou-0.673
EX-0.659
Token They
Token ID1119
Feature activation-9.46e-01

Pos logit contributions
abella+1.168
top+1.105
 wrong+1.087
bench+1.062
box+0.960
Neg logit contributions
hr-0.943
oc-0.805
Ps-0.801
 exc-0.797
AAAA-0.794
Token ran
Token ID4966
Feature activation-6.80e-01

Pos logit contributions
abella+1.634
top+1.548
 wrong+1.546
bench+1.483
box+1.340
Neg logit contributions
hr-1.215
 exc-1.082
oc-1.034
 tou-1.010
Ps-0.995
Token around
Token ID1088
Feature activation-5.09e-01

Pos logit contributions
abella+1.037
top+0.962
 wrong+0.924
bench+0.917
box+0.820
Neg logit contributions
hr-1.027
oc-0.879
 exc-0.864
 tou-0.861
EX-0.860
Token and
Token ID290
Feature activation-2.71e+00

Pos logit contributions
abella+1.050
top+0.991
 wrong+0.962
bench+0.959
box+0.878
Neg logit contributions
hr-0.522
oc-0.402
 exc-0.391
 accidents-0.386
AAAA-0.378
Token made
Token ID925
Feature activation-5.95e-01

Pos logit contributions
abella+5.085
top+4.851
 wrong+4.727
bench+4.692
box+4.457
Neg logit contributions
hr-3.208
 exc-2.694
 from-2.632
 mis-2.622
 pets-2.594
Token loud
Token ID7812
Feature activation+5.36e-01

Pos logit contributions
abella+1.049
top+1.001
bench+0.965
 wrong+0.961
box+0.870
Neg logit contributions
hr-0.759
oc-0.617
 accidents-0.615
 exc-0.614
 tou-0.598
Token noises
Token ID26782
Feature activation-3.35e-01

Pos logit contributions
hr+0.631
 tou+0.512
 exc+0.509
EX+0.506
AAAA+0.499
Neg logit contributions
abella-0.989
top-0.943
 wrong-0.928
bench-0.898
box-0.817
Token.
Token ID13
Feature activation-1.10e+00

Pos logit contributions
abella+0.591
top+0.551
 wrong+0.534
bench+0.528
box+0.481
Neg logit contributions
hr-0.436
oc-0.361
 exc-0.356
 tou-0.350
 accidents-0.350
Token\n
Token ID198
Feature activation-1.16e+00

Pos logit contributions
abella+1.926
top+1.846
bench+1.834
 wrong+1.826
box+1.661
Neg logit contributions
hr-1.332
Ps-1.149
AAAA-1.139
oc-1.119
 exc-1.118
TokenL
Token ID43
Feature activation+1.14e+00

Pos logit contributions
hr+1.818
 tou+1.560
oc+1.529
rum+1.505
 exc+1.474
Neg logit contributions
abella-2.176
top-2.072
 wrong-2.026
bench-1.945
box-1.744
Tokenily
Token ID813
Feature activation+2.70e-01

Pos logit contributions
hr+2.603
oc+2.399
 tou+2.358
EX+2.334
 exc+2.300
Neg logit contributions
abella-0.946
 wrong-0.813
top-0.762
bench-0.713
box-0.576
Token held
Token ID2714
Feature activation+8.66e-01

Pos logit contributions
hr+0.400
oc+0.335
 tou+0.333
 exc+0.332
EX+0.330
Neg logit contributions
abella-0.427
top-0.400
 wrong-0.386
bench-0.376
box-0.340
Token the
Token ID262
Feature activation+1.61e+00

Pos logit contributions
hr+1.211
oc+1.001
AAAA+0.978
 tou+0.976
 exc+0.971
Neg logit contributions
abella-1.455
top-1.384
 wrong-1.360
bench-1.291
 Luke-1.179
Token pit
Token ID6028
Feature activation+2.25e-01

Pos logit contributions
hr+2.465
oc+2.097
rum+2.037
Ps+2.004
AAAA+1.991
Neg logit contributions
 wrong-2.530
top-2.406
abella-2.294
bench-2.198
box-2.095
Token in
Token ID287
Feature activation+1.85e+00

Pos logit contributions
hr+0.287
oc+0.235
 tou+0.232
AAAA+0.231
 exc+0.230
Neg logit contributions
abella-0.398
top-0.381
 wrong-0.370
bench-0.361
box-0.329
Token her
Token ID607
Feature activation+1.09e+00

Pos logit contributions
hr+2.712
oc+2.412
 tou+2.224
rum+2.201
AAAA+2.180
Neg logit contributions
top-2.942
abella-2.893
 wrong-2.858
bench-2.808
 Luke-2.574
Token hand
Token ID1021
Feature activation-5.96e-01

Pos logit contributions
hr+1.585
Ps+1.304
EX+1.294
AAAA+1.292
oc+1.286
Neg logit contributions
abella-1.729
top-1.712
 wrong-1.698
bench-1.597
box-1.527
Token and
Token ID290
Feature activation+3.96e-01

Pos logit contributions
abella+1.381
top+1.287
 wrong+1.265
bench+1.262
box+1.164
Neg logit contributions
hr-0.445
oc-0.325
 exc-0.304
 accidents-0.299
 tou-0.291
Token said
Token ID531
Feature activation-2.13e-01

Pos logit contributions
hr+0.620
oc+0.535
 tou+0.525
 exc+0.524
EX+0.522
Neg logit contributions
abella-0.594
top-0.554
 wrong-0.533
bench-0.516
box-0.463
Token,
Token ID11
Feature activation+4.28e-01

Pos logit contributions
abella+0.354
top+0.334
bench+0.319
 wrong+0.318
box+0.285
Neg logit contributions
hr-0.298
 exc-0.249
oc-0.248
 tou-0.245
 accidents-0.245
TokenMom
Token ID29252
Feature activation+2.05e-01

Pos logit contributions
hr+0.654
oc+0.551
 exc+0.544
 tou+0.540
rum+0.537
Neg logit contributions
abella-0.706
top-0.665
 wrong-0.634
bench-0.621
box-0.556
Tokenmy
Token ID1820
Feature activation-3.26e-01

Pos logit contributions
hr+0.375
oc+0.329
 tou+0.327
 exc+0.326
AAAA+0.325
Neg logit contributions
abella-0.246
top-0.227
 wrong-0.218
bench-0.211
box-0.184
Token,
Token ID11
Feature activation+2.25e-01

Pos logit contributions
abella+0.550
top+0.515
 wrong+0.493
bench+0.492
box+0.437
Neg logit contributions
hr-0.450
oc-0.376
 exc-0.371
 accidents-0.368
 tou-0.367
Token Daddy
Token ID26926
Feature activation-6.57e-01

Pos logit contributions
hr+0.341
oc+0.292
 exc+0.285
 tou+0.285
EX+0.284
Neg logit contributions
abella-0.344
top-0.320
 wrong-0.314
bench-0.301
box-0.273
Token,
Token ID11
Feature activation+5.39e-01

Pos logit contributions
abella+1.163
top+1.079
bench+1.033
 wrong+1.026
 projector+0.920
Neg logit contributions
hr-0.871
oc-0.726
 exc-0.714
 accidents-0.707
 tou-0.697
Token come
Token ID1282
Feature activation+2.20e+00

Pos logit contributions
hr+0.909
oc+0.803
EX+0.779
AAAA+0.771
 tou+0.771
Neg logit contributions
abella-0.732
 wrong-0.677
top-0.668
bench-0.619
box-0.567
Token see
Token ID766
Feature activation+8.73e-01

Pos logit contributions
hr+3.575
AAAA+3.238
 fortunate+3.217
rum+3.179
 accidents+3.138
Neg logit contributions
top-2.915
abella-2.895
 wrong-2.795
box-2.544
 projector-2.502
Token!"
Token ID2474
Feature activation+3.81e-02

Pos logit contributions
hr+1.369
 tou+1.184
oc+1.183
AAAA+1.175
 exc+1.174
Neg logit contributions
abella-1.292
 wrong-1.209
top-1.203
bench-1.088
box-1.013
Token she
Token ID673
Feature activation+1.66e-01

Pos logit contributions
hr+0.060
oc+0.051
 exc+0.050
 tou+0.050
EX+0.050
Neg logit contributions
abella-0.057
top-0.053
 wrong-0.051
bench-0.050
box-0.045
Token exclaimed
Token ID33552
Feature activation-4.33e-01

Pos logit contributions
hr+0.269
oc+0.232
 tou+0.229
 exc+0.229
EX+0.227
Neg logit contributions
abella-0.238
top-0.221
 wrong-0.213
bench-0.208
box-0.184
Token.
Token ID13
Feature activation-6.19e-02

Pos logit contributions
abella+0.773
top+0.717
 wrong+0.692
bench+0.685
box+0.624
Neg logit contributions
hr-0.564
 exc-0.465
oc-0.458
 accidents-0.456
 tou-0.442
Token,
Token ID11
Feature activation+8.00e-01

Pos logit contributions
abella+0.041
top+0.038
 wrong+0.037
bench+0.037
box+0.033
Neg logit contributions
hr-0.036
oc-0.030
 exc-0.030
 tou-0.029
 accidents-0.029
Token she
Token ID673
Feature activation-7.33e-01

Pos logit contributions
hr+1.234
oc+1.089
rum+1.051
AAAA+1.032
EX+1.027
Neg logit contributions
abella-1.243
top-1.160
 wrong-1.139
bench-1.060
 Luke-0.966
Token decided
Token ID3066
Feature activation+4.11e-01

Pos logit contributions
abella+1.226
top+1.170
 wrong+1.126
bench+1.109
box+0.973
Neg logit contributions
hr-0.986
oc-0.849
 exc-0.840
 tou-0.822
Ps-0.813
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
hr+0.718
oc+0.635
 exc+0.615
 tou+0.614
AAAA+0.614
Neg logit contributions
abella-0.545
top-0.504
 wrong-0.500
bench-0.471
box-0.414
Token take
Token ID1011
Feature activation-4.38e-01

Pos logit contributions
abella+0.835
top+0.771
 wrong+0.745
bench+0.738
box+0.662
Neg logit contributions
hr-0.650
 exc-0.547
oc-0.538
 tou-0.529
 accidents-0.525
Token her
Token ID607
Feature activation+1.82e+00

Pos logit contributions
abella+0.795
top+0.753
 wrong+0.729
bench+0.726
box+0.662
Neg logit contributions
hr-0.536
oc-0.439
 accidents-0.434
 exc-0.433
 tou-0.432
Token daughter
Token ID4957
Feature activation+8.46e-01

Pos logit contributions
hr+2.727
AAAA+2.390
Ps+2.303
 tou+2.257
oc+2.242
Neg logit contributions
 wrong-2.956
abella-2.832
top-2.753
bench-2.591
box-2.545
Token to
Token ID284
Feature activation+1.13e+00

Pos logit contributions
hr+1.459
AAAA+1.301
oc+1.276
Ps+1.275
 tou+1.268
Neg logit contributions
abella-1.080
top-1.056
 wrong-1.039
bench-0.981
box-0.892
Token the
Token ID262
Feature activation+9.01e-02

Pos logit contributions
hr+1.575
oc+1.436
AAAA+1.350
 tou+1.336
rum+1.322
Neg logit contributions
abella-1.835
 wrong-1.747
top-1.736
bench-1.671
 Luke-1.496
Token toy
Token ID13373
Feature activation-1.76e-01

Pos logit contributions
hr+0.140
oc+0.119
 exc+0.118
 tou+0.117
EX+0.117
Neg logit contributions
abella-0.135
top-0.126
 wrong-0.123
bench-0.119
box-0.106
Token store
Token ID3650
Feature activation-7.75e-01

Pos logit contributions
abella+0.240
top+0.220
 wrong+0.212
bench+0.208
box+0.179
Neg logit contributions
hr-0.297
oc-0.260
 tou-0.254
 exc-0.254
 accidents-0.250
Token watch
Token ID2342
Feature activation-6.23e-02

Pos logit contributions
abella+0.642
top+0.596
 wrong+0.576
bench+0.573
box+0.507
Neg logit contributions
hr-0.557
oc-0.470
 exc-0.462
 tou-0.456
EX-0.452
Token for
Token ID329
Feature activation+1.63e-01

Pos logit contributions
abella+0.105
top+0.099
 wrong+0.096
bench+0.094
box+0.085
Neg logit contributions
hr-0.084
oc-0.070
 exc-0.069
 tou-0.068
EX-0.068
Token him
Token ID683
Feature activation+1.03e+00

Pos logit contributions
hr+0.246
oc+0.210
 exc+0.207
 tou+0.206
AAAA+0.205
Neg logit contributions
abella-0.253
top-0.236
 wrong-0.229
bench-0.223
box-0.200
Token.
Token ID13
Feature activation+4.70e-01

Pos logit contributions
hr+1.564
AAAA+1.376
Ps+1.358
 tou+1.331
oc+1.329
Neg logit contributions
abella-1.565
top-1.507
 wrong-1.489
bench-1.397
box-1.285
Token When
Token ID1649
Feature activation-6.53e-01

Pos logit contributions
hr+0.741
 tou+0.624
oc+0.624
 exc+0.623
 accidents+0.615
Neg logit contributions
abella-0.705
top-0.659
 wrong-0.624
bench-0.617
box-0.551
Token he
Token ID339
Feature activation+1.85e+00

Pos logit contributions
abella+0.997
top+0.947
bench+0.908
 wrong+0.903
box+0.802
Neg logit contributions
hr-0.984
 accidents-0.842
 tou-0.829
oc-0.824
 exc-0.823
Token opened
Token ID4721
Feature activation-2.20e-01

Pos logit contributions
hr+3.328
rum+3.001
 accidents+2.947
 fortunate+2.920
EX+2.919
Neg logit contributions
abella-2.544
top-2.223
 wrong-2.107
bench-2.001
box-1.845
Token it
Token ID340
Feature activation-1.58e-01

Pos logit contributions
abella+0.407
top+0.384
 wrong+0.374
bench+0.370
box+0.336
Neg logit contributions
hr-0.261
oc-0.211
 exc-0.207
 tou-0.206
 accidents-0.205
Token,
Token ID11
Feature activation+1.33e-01

Pos logit contributions
abella+0.260
top+0.242
 wrong+0.234
bench+0.232
box+0.206
Neg logit contributions
hr-0.223
oc-0.188
 exc-0.184
 tou-0.183
 accidents-0.183
Token he
Token ID339
Feature activation+2.29e-02

Pos logit contributions
hr+0.208
oc+0.178
 exc+0.175
 tou+0.175
EX+0.174
Neg logit contributions
abella-0.199
top-0.186
 wrong-0.180
bench-0.175
box-0.156
Token was
Token ID373
Feature activation-4.71e-01

Pos logit contributions
hr+0.035
oc+0.029
 exc+0.029
 tou+0.029
EX+0.029
Neg logit contributions
abella-0.035
top-0.033
 wrong-0.032
bench-0.031
box-0.028
Token the
Token ID262
Feature activation-5.74e-01

Pos logit contributions
hr+2.511
oc+2.157
 tou+2.135
 fortunate+2.121
 exc+2.120
Neg logit contributions
top-2.409
abella-2.352
 wrong-2.345
bench-2.205
 Luke-2.040
Token grass
Token ID8701
Feature activation-1.21e+00

Pos logit contributions
abella+1.018
top+0.929
bench+0.911
 wrong+0.896
box+0.819
Neg logit contributions
hr-0.752
 tou-0.637
Ps-0.627
oc-0.625
AAAA-0.622
Token.
Token ID13
Feature activation+2.85e-01

Pos logit contributions
abella+2.414
bench+2.131
top+2.114
 wrong+2.075
 versa+1.941
Neg logit contributions
hr-1.391
oc-1.100
 exc-1.078
 tou-1.048
 from-1.041
Token She
Token ID1375
Feature activation+6.28e-01

Pos logit contributions
hr+0.471
oc+0.402
 exc+0.398
 tou+0.397
 accidents+0.393
Neg logit contributions
abella-0.406
top-0.379
 wrong-0.364
bench-0.355
box-0.311
Token wanted
Token ID2227
Feature activation-1.30e+00

Pos logit contributions
hr+0.838
oc+0.688
EX+0.674
 accidents+0.673
 tou+0.671
Neg logit contributions
abella-1.099
top-1.017
 wrong-0.996
bench-0.951
box-0.867
Token to
Token ID284
Feature activation+1.84e+00

Pos logit contributions
abella+2.200
bench+2.089
top+2.083
 wrong+1.916
box+1.830
Neg logit contributions
hr-1.755
 accidents-1.525
 tou-1.509
Fear-1.463
EX-1.461
Token catch
Token ID4929
Feature activation-2.38e-02

Pos logit contributions
hr+2.868
 accidents+2.568
 fortunate+2.469
oc+2.443
AAAA+2.372
Neg logit contributions
abella-2.675
top-2.612
 wrong-2.449
bench-2.265
 Luke-2.226
Token it
Token ID340
Feature activation+1.03e+00

Pos logit contributions
abella+0.035
top+0.032
 wrong+0.031
bench+0.031
box+0.027
Neg logit contributions
hr-0.038
oc-0.033
 exc-0.032
 tou-0.032
 accidents-0.032
Token,
Token ID11
Feature activation+4.31e-01

Pos logit contributions
hr+1.625
AAAA+1.388
Ps+1.381
EX+1.376
oc+1.368
Neg logit contributions
abella-1.493
 wrong-1.489
top-1.465
bench-1.334
box-1.212
Token but
Token ID475
Feature activation+8.01e-01

Pos logit contributions
hr+0.560
oc+0.471
 tou+0.453
AAAA+0.453
EX+0.452
Neg logit contributions
abella-0.756
top-0.713
 wrong-0.702
bench-0.674
box-0.613
Token it
Token ID340
Feature activation+8.18e-01

Pos logit contributions
hr+1.325
oc+1.155
EX+1.112
rum+1.103
AAAA+1.101
Neg logit contributions
abella-1.175
 wrong-1.078
top-1.064
bench-0.982
 Luke-0.895
TokenBut
Token ID1537
Feature activation-2.13e-01

Pos logit contributions
hr+0.391
oc+0.331
 tou+0.329
 exc+0.328
 accidents+0.322
Neg logit contributions
abella-0.404
top-0.381
 wrong-0.367
bench-0.358
box-0.320
Token when
Token ID618
Feature activation-7.23e-01

Pos logit contributions
abella+0.347
top+0.325
 wrong+0.313
bench+0.311
box+0.278
Neg logit contributions
hr-0.305
oc-0.254
 accidents-0.253
 exc-0.252
 tou-0.249
Token it
Token ID340
Feature activation+9.51e-01

Pos logit contributions
abella+1.124
top+1.034
bench+0.995
 wrong+0.973
box+0.896
Neg logit contributions
hr-1.084
 accidents-0.951
Ps-0.914
 exc-0.913
AAAA-0.911
Token was
Token ID373
Feature activation-6.37e-01

Pos logit contributions
hr+1.443
oc+1.245
 exc+1.223
rum+1.219
 tou+1.218
Neg logit contributions
abella-1.484
top-1.382
 wrong-1.366
bench-1.279
box-1.169
Token time
Token ID640
Feature activation-1.25e+00

Pos logit contributions
abella+0.885
top+0.831
 wrong+0.766
bench+0.761
box+0.670
Neg logit contributions
hr-1.050
 exc-0.914
 accidents-0.908
Ps-0.894
 tou-0.894
Token to
Token ID284
Feature activation+7.58e-01

Pos logit contributions
abella+1.956
top+1.871
bench+1.736
 wrong+1.674
 projector+1.553
Neg logit contributions
hr-1.773
 accidents-1.515
EX-1.478
 from-1.476
 exc-1.462
Token go
Token ID467
Feature activation-2.30e-01

Pos logit contributions
hr+1.060
oc+0.920
AAAA+0.873
 accidents+0.871
EX+0.867
Neg logit contributions
abella-1.260
top-1.182
 wrong-1.153
bench-1.124
box-1.005
Token to
Token ID284
Feature activation+5.87e-01

Pos logit contributions
abella+0.369
top+0.344
 wrong+0.330
bench+0.327
box+0.290
Neg logit contributions
hr-0.336
oc-0.281
 exc-0.280
 tou-0.279
 accidents-0.277
Token bed
Token ID3996
Feature activation-5.54e-01

Pos logit contributions
hr+0.876
oc+0.762
AAAA+0.733
 tou+0.732
EX+0.732
Neg logit contributions
abella-0.917
top-0.860
 wrong-0.837
bench-0.810
box-0.722
Token,
Token ID11
Feature activation-3.64e-02

Pos logit contributions
abella+0.930
top+0.879
bench+0.836
 wrong+0.833
box+0.745
Neg logit contributions
hr-0.758
 accidents-0.623
 exc-0.612
oc-0.611
 tou-0.602
Token Jack
Token ID3619
Feature activation-3.23e-02

Pos logit contributions
abella+0.049
top+0.045
 wrong+0.043
bench+0.042
box+0.037
Neg logit contributions
hr-0.062
oc-0.054
 exc-0.054
 tou-0.053
 accidents-0.053
Token play
Token ID711
Feature activation-3.40e-01

Pos logit contributions
abella+1.307
top+1.200
 wrong+1.186
bench+1.166
box+1.056
Neg logit contributions
hr-0.983
 exc-0.848
oc-0.815
 tou-0.809
EX-0.784
Token in
Token ID287
Feature activation+4.29e-01

Pos logit contributions
abella+0.588
top+0.553
 wrong+0.531
bench+0.530
box+0.476
Neg logit contributions
hr-0.453
oc-0.381
 exc-0.372
AAAA-0.367
 tou-0.366
Token the
Token ID262
Feature activation-1.20e+00

Pos logit contributions
hr+0.527
oc+0.431
 tou+0.427
EX+0.423
Ps+0.422
Neg logit contributions
abella-0.776
top-0.743
 wrong-0.723
bench-0.701
box-0.642
Token water
Token ID1660
Feature activation-1.34e+00

Pos logit contributions
abella+2.092
bench+1.928
top+1.879
 wrong+1.790
box+1.677
Neg logit contributions
hr-1.634
oc-1.440
 tou-1.404
AAAA-1.376
Ps-1.362
Token and
Token ID290
Feature activation-1.25e+00

Pos logit contributions
abella+3.055
bench+2.797
top+2.753
 wrong+2.699
 versa+2.563
Neg logit contributions
hr-1.138
oc-0.852
 exc-0.822
 from-0.775
 accidents-0.754
Token feel
Token ID1254
Feature activation+7.10e-01

Pos logit contributions
abella+2.262
bench+2.075
top+2.055
 wrong+2.029
box+1.909
Neg logit contributions
hr-1.669
 exc-1.414
oc-1.380
 tou-1.359
Ps-1.322
Token the
Token ID262
Feature activation-4.67e-01

Pos logit contributions
hr+0.927
oc+0.785
EX+0.764
AAAA+0.759
Ps+0.759
Neg logit contributions
abella-1.220
top-1.173
 wrong-1.162
bench-1.093
 projector-0.982
Token warm
Token ID5814
Feature activation+3.38e-01

Pos logit contributions
abella+0.787
top+0.723
bench+0.702
 wrong+0.694
box+0.633
Neg logit contributions
hr-0.663
 tou-0.554
 exc-0.554
oc-0.546
 accidents-0.540
Token sun
Token ID4252
Feature activation-2.65e-01

Pos logit contributions
hr+0.529
oc+0.452
 exc+0.447
EX+0.443
 tou+0.443
Neg logit contributions
abella-0.495
top-0.473
 wrong-0.455
bench-0.440
box-0.389
Token shining
Token ID22751
Feature activation-3.12e-01

Pos logit contributions
abella+0.447
top+0.413
 wrong+0.402
bench+0.397
box+0.358
Neg logit contributions
hr-0.367
oc-0.305
 tou-0.303
 exc-0.302
 accidents-0.300
Token on
Token ID319
Feature activation+5.39e-01

Pos logit contributions
abella+0.554
top+0.513
bench+0.497
 wrong+0.497
box+0.452
Neg logit contributions
hr-0.398
oc-0.328
 exc-0.325
 tou-0.325
 accidents-0.325
Token and
Token ID290
Feature activation-4.97e-01

Pos logit contributions
abella+1.192
top+1.106
 wrong+1.094
bench+1.081
box+0.984
Neg logit contributions
hr-0.534
oc-0.412
 exc-0.392
 tou-0.385
EX-0.376
Token said
Token ID531
Feature activation-7.57e-01

Pos logit contributions
abella+0.783
top+0.731
 wrong+0.712
bench+0.705
box+0.622
Neg logit contributions
hr-0.737
 exc-0.616
oc-0.615
 tou-0.612
EX-0.600
Token that
Token ID326
Feature activation-7.11e-01

Pos logit contributions
abella+1.272
top+1.221
bench+1.183
 wrong+1.135
box+1.053
Neg logit contributions
hr-1.025
 exc-0.865
 accidents-0.848
 tou-0.843
oc-0.836
Token if
Token ID611
Feature activation-8.40e-01

Pos logit contributions
abella+1.167
top+1.102
bench+1.068
 wrong+1.047
box+0.944
Neg logit contributions
hr-1.006
 exc-0.848
 accidents-0.843
 tou-0.820
oc-0.816
Token she
Token ID673
Feature activation+5.90e-01

Pos logit contributions
abella+1.321
top+1.262
bench+1.250
 wrong+1.176
box+1.095
Neg logit contributions
hr-1.229
 accidents-1.036
 exc-1.018
 tou-0.999
Ps-0.987
Token did
Token ID750
Feature activation+8.11e-01

Pos logit contributions
hr+0.945
 accidents+0.797
oc+0.792
EX+0.789
AAAA+0.789
Neg logit contributions
abella-0.869
top-0.806
 wrong-0.772
bench-0.749
box-0.667
Token,
Token ID11
Feature activation+7.44e-01

Pos logit contributions
hr+1.264
oc+1.112
AAAA+1.073
Ps+1.067
EX+1.067
Neg logit contributions
abella-1.226
 wrong-1.146
top-1.134
bench-1.035
box-0.923
Token she
Token ID673
Feature activation+5.09e-01

Pos logit contributions
hr+1.126
oc+0.994
AAAA+0.968
 tou+0.954
EX+0.950
Neg logit contributions
abella-1.134
top-1.070
 wrong-1.044
bench-0.980
box-0.893
Token would
Token ID561
Feature activation-1.73e-01

Pos logit contributions
hr+0.933
 accidents+0.806
oc+0.805
AAAA+0.804
EX+0.800
Neg logit contributions
abella-0.625
top-0.577
 wrong-0.549
bench-0.528
box-0.460
Token make
Token ID787
Feature activation-3.84e-01

Pos logit contributions
abella+0.290
top+0.271
 wrong+0.263
bench+0.259
box+0.233
Neg logit contributions
hr-0.237
oc-0.198
 exc-0.198
 tou-0.196
EX-0.192
Token all
Token ID477
Feature activation+1.02e+00

Pos logit contributions
abella+0.653
top+0.618
bench+0.595
 wrong+0.594
box+0.533
Neg logit contributions
hr-0.519
oc-0.423
 exc-0.422
 accidents-0.422
 tou-0.418
Token little
Token ID1310
Feature activation-1.02e-01

Pos logit contributions
hr+0.038
oc+0.031
 exc+0.030
 tou+0.030
EX+0.029
Neg logit contributions
abella-0.064
top-0.060
 wrong-0.059
bench-0.058
box-0.053
Token girl
Token ID2576
Feature activation-8.92e-02

Pos logit contributions
abella+0.221
top+0.210
 wrong+0.205
bench+0.202
box+0.187
Neg logit contributions
hr-0.090
oc-0.067
 exc-0.066
 tou-0.065
 accidents-0.064
Token smiled
Token ID13541
Feature activation+9.49e-01

Pos logit contributions
abella+0.138
top+0.129
 wrong+0.125
bench+0.123
box+0.109
Neg logit contributions
hr-0.134
oc-0.114
 exc-0.113
 tou-0.111
AAAA-0.110
Token and
Token ID290
Feature activation+6.51e-02

Pos logit contributions
hr+1.372
AAAA+1.142
oc+1.133
EX+1.128
 tou+1.116
Neg logit contributions
abella-1.531
top-1.489
 wrong-1.467
bench-1.385
box-1.251
Token said
Token ID531
Feature activation-1.04e+00

Pos logit contributions
hr+0.099
oc+0.085
 exc+0.084
 tou+0.083
EX+0.083
Neg logit contributions
abella-0.100
top-0.093
 wrong-0.090
bench-0.088
box-0.078
Token,
Token ID11
Feature activation+4.28e-01

Pos logit contributions
abella+1.881
top+1.782
bench+1.723
 wrong+1.662
 projector+1.534
Neg logit contributions
hr-1.294
 exc-1.103
 accidents-1.061
oc-1.052
rum-1.043
Token "
Token ID366
Feature activation-4.14e-01

Pos logit contributions
hr+0.613
oc+0.519
 tou+0.508
EX+0.504
 accidents+0.502
Neg logit contributions
abella-0.692
top-0.652
 wrong-0.630
bench-0.614
box-0.552
TokenThank
Token ID10449
Feature activation-8.37e-01

Pos logit contributions
abella+0.743
top+0.692
 wrong+0.692
bench+0.670
box+0.618
Neg logit contributions
hr-0.530
oc-0.436
 exc-0.431
 tou-0.426
EX-0.424
Token you
Token ID345
Feature activation-9.66e-01

Pos logit contributions
abella+1.444
top+1.353
bench+1.323
 wrong+1.289
box+1.163
Neg logit contributions
hr-1.162
 accidents-0.951
oc-0.950
 exc-0.940
 tou-0.939
Token,
Token ID11
Feature activation+1.03e+00

Pos logit contributions
abella+1.726
top+1.639
bench+1.606
 wrong+1.541
 versa+1.396
Neg logit contributions
hr-1.220
oc-1.044
 accidents-1.011
 exc-0.992
 tou-0.972
Token bird
Token ID6512
Feature activation-3.36e-04

Pos logit contributions
hr+1.737
oc+1.517
EX+1.493
AAAA+1.481
rum+1.459
Neg logit contributions
abella-1.433
top-1.277
 wrong-1.276
bench-1.186
 Luke-1.111
Token<0xA2>
Token ID95
Feature activation-7.77e-01

Pos logit contributions
abella+0.687
top+0.634
 wrong+0.620
bench+0.614
box+0.548
Neg logit contributions
hr-0.741
oc-0.633
 exc-0.623
 tou-0.621
AAAA-0.615
Token€
Token ID26391
Feature activation-2.44e-01

Pos logit contributions
abella+1.184
top+1.073
 wrong+1.059
bench+1.034
box+0.931
Neg logit contributions
hr-1.210
oc-1.048
EX-1.012
 tou-1.005
AAAA-1.000
Token<0xC5>
Token ID129
Feature activation+5.66e-02

Pos logit contributions
abella+0.405
top+0.378
 wrong+0.369
bench+0.361
box+0.322
Neg logit contributions
hr-0.343
oc-0.292
 exc-0.282
 tou-0.281
EX-0.279
Token<0x93>
Token ID241
Feature activation-4.30e-01

Pos logit contributions
hr+0.088
oc+0.075
 exc+0.074
 tou+0.074
EX+0.073
Neg logit contributions
abella-0.085
top-0.079
 wrong-0.077
bench-0.075
box-0.066
TokenLook
Token ID8567
Feature activation+9.43e-01

Pos logit contributions
abella+0.640
top+0.600
 wrong+0.591
bench+0.576
box+0.519
Neg logit contributions
hr-0.664
oc-0.567
 exc-0.562
EX-0.552
 tou-0.549
Token at
Token ID379
Feature activation+2.11e-01

Pos logit contributions
hr+1.373
AAAA+1.163
EX+1.162
 tou+1.161
oc+1.161
Neg logit contributions
abella-1.564
top-1.407
 wrong-1.398
bench-1.263
box-1.221
Token all
Token ID477
Feature activation+9.16e-01

Pos logit contributions
hr+0.269
oc+0.221
 tou+0.217
 exc+0.217
EX+0.216
Neg logit contributions
abella-0.378
top-0.355
 wrong-0.347
bench-0.338
box-0.310
Token the
Token ID262
Feature activation-4.43e-01

Pos logit contributions
hr+1.209
Ps+1.032
oc+1.031
AAAA+1.022
 tou+1.022
Neg logit contributions
abella-1.567
top-1.491
 wrong-1.462
bench-1.406
box-1.297
Token kind
Token ID1611
Feature activation+1.87e-01

Pos logit contributions
abella+0.846
top+0.792
bench+0.762
 wrong+0.762
box+0.687
Neg logit contributions
hr-0.531
 exc-0.423
oc-0.419
 tou-0.413
 accidents-0.412
Token prints
Token ID20842
Feature activation+3.94e-01

Pos logit contributions
hr+0.203
oc+0.161
 exc+0.158
 tou+0.157
AAAA+0.156
Neg logit contributions
abella-0.366
top-0.350
 wrong-0.340
bench-0.334
box-0.308
Token I
Token ID314
Feature activation+3.85e-01

Pos logit contributions
hr+0.548
oc+0.462
 exc+0.453
 tou+0.451
AAAA+0.450
Neg logit contributions
abella-0.645
top-0.613
 wrong-0.598
bench-0.577
box-0.529
Token Dad
Token ID17415
Feature activation-8.00e-01

Pos logit contributions
abella+0.235
top+0.221
 wrong+0.214
bench+0.210
box+0.187
Neg logit contributions
hr-0.224
oc-0.191
 exc-0.188
 tou-0.186
AAAA-0.186
Token kept
Token ID4030
Feature activation-4.29e-01

Pos logit contributions
abella+1.324
top+1.256
bench+1.211
 wrong+1.209
box+1.065
Neg logit contributions
hr-1.102
oc-0.929
 accidents-0.897
 exc-0.895
Ps-0.892
Token talking
Token ID3375
Feature activation-1.23e-01

Pos logit contributions
abella+0.744
top+0.702
 wrong+0.668
bench+0.665
box+0.600
Neg logit contributions
hr-0.573
 accidents-0.467
 exc-0.466
 tou-0.460
oc-0.459
Token gently
Token ID15165
Feature activation+5.05e-01

Pos logit contributions
abella+0.227
top+0.214
 wrong+0.208
bench+0.205
box+0.187
Neg logit contributions
hr-0.149
oc-0.121
 exc-0.119
 accidents-0.118
 tou-0.118
Token and
Token ID290
Feature activation-1.14e+00

Pos logit contributions
hr+0.603
 tou+0.489
oc+0.488
 exc+0.487
AAAA+0.486
Neg logit contributions
abella-0.928
top-0.890
 wrong-0.871
bench-0.845
box-0.772
Token said
Token ID531
Feature activation-9.06e-01

Pos logit contributions
abella+1.892
top+1.777
bench+1.735
 wrong+1.687
box+1.541
Neg logit contributions
hr-1.620
 accidents-1.348
 exc-1.305
oc-1.279
 tou-1.277
Token he
Token ID339
Feature activation-1.30e+00

Pos logit contributions
abella+1.467
top+1.399
bench+1.363
 wrong+1.284
box+1.197
Neg logit contributions
hr-1.281
 accidents-1.112
 exc-1.074
oc-1.064
AAAA-1.041
Token would
Token ID561
Feature activation-9.89e-01

Pos logit contributions
abella+2.259
top+2.154
bench+2.140
 wrong+2.016
box+1.877
Neg logit contributions
hr-1.638
oc-1.434
 exc-1.401
 accidents-1.382
Ps-1.346
Token buy
Token ID2822
Feature activation-9.70e-01

Pos logit contributions
abella+1.694
top+1.572
bench+1.531
 wrong+1.503
box+1.379
Neg logit contributions
hr-1.344
 exc-1.160
 tou-1.109
oc-1.105
 accidents-1.075
Token her
Token ID607
Feature activation+1.27e+00

Pos logit contributions
abella+1.570
top+1.472
bench+1.452
 wrong+1.383
box+1.265
Neg logit contributions
hr-1.361
 accidents-1.176
oc-1.142
 exc-1.119
 tou-1.115
Token a
Token ID257
Feature activation-5.87e-02

Pos logit contributions
hr+2.154
AAAA+1.918
Ps+1.904
oc+1.874
 tou+1.861
Neg logit contributions
abella-1.688
 wrong-1.653
top-1.619
bench-1.471
box-1.351
Token Max
Token ID5436
Feature activation+1.06e-01

Pos logit contributions
abella+1.531
top+1.450
bench+1.375
 wrong+1.367
box+1.234
Neg logit contributions
hr-1.383
 exc-1.159
 tou-1.130
 accidents-1.122
Ps-1.117
Token at
Token ID379
Feature activation+8.97e-01

Pos logit contributions
hr+0.146
oc+0.122
 exc+0.120
 tou+0.120
EX+0.119
Neg logit contributions
abella-0.177
top-0.167
 wrong-0.162
bench-0.159
box-0.143
Token the
Token ID262
Feature activation-1.21e+00

Pos logit contributions
hr+1.224
oc+1.070
rum+1.042
AAAA+1.014
 exc+1.014
Neg logit contributions
abella-1.483
top-1.422
 wrong-1.402
bench-1.345
box-1.226
Token park
Token ID3952
Feature activation+5.21e-02

Pos logit contributions
abella+1.983
bench+1.773
top+1.757
 wrong+1.630
 versa+1.515
Neg logit contributions
hr-1.769
 tou-1.544
EX-1.455
 exc-1.450
Ps-1.441
Token.
Token ID13
Feature activation-1.20e-01

Pos logit contributions
hr+0.072
oc+0.060
 exc+0.059
 tou+0.059
EX+0.058
Neg logit contributions
abella-0.087
top-0.082
 wrong-0.080
bench-0.078
box-0.070
Token
Token ID220
Feature activation-8.96e-02

Pos logit contributions
abella+0.185
top+0.173
 wrong+0.168
bench+0.163
box+0.147
Neg logit contributions
hr-0.180
oc-0.154
 exc-0.152
 tou-0.151
EX-0.151
Token
Token ID198
Feature activation-8.29e-01

Pos logit contributions
abella+0.150
top+0.141
 wrong+0.137
bench+0.134
box+0.121
Neg logit contributions
hr-0.122
oc-0.102
 exc-0.101
 tou-0.100
EX-0.099
Token
Token ID198
Feature activation+1.41e-01

Pos logit contributions
abella+1.511
top+1.434
 wrong+1.408
bench+1.359
box+1.258
Neg logit contributions
hr-0.985
oc-0.843
Ps-0.804
EX-0.797
 exc-0.795
TokenMax
Token ID11518
Feature activation-8.27e-02

Pos logit contributions
hr+0.188
oc+0.154
 exc+0.153
 tou+0.153
 accidents+0.151
Neg logit contributions
abella-0.244
top-0.230
 wrong-0.223
bench-0.220
box-0.198
Token asked
Token ID1965
Feature activation+8.95e-01

Pos logit contributions
abella+0.127
top+0.119
 wrong+0.115
bench+0.113
box+0.100
Neg logit contributions
hr-0.125
oc-0.107
 exc-0.105
 tou-0.105
AAAA-0.104
Token Lily
Token ID20037
Feature activation+4.24e-01

Pos logit contributions
hr+1.435
 tou+1.186
EX+1.173
oc+1.169
AAAA+1.164
Neg logit contributions
abella-1.371
 wrong-1.282
top-1.248
bench-1.219
box-1.087
Tokens
Token ID82
Feature activation+7.97e-01

Pos logit contributions
abella+1.424
top+1.301
 wrong+1.293
bench+1.276
 projector+1.130
Neg logit contributions
hr-1.418
oc-1.227
 exc-1.178
rum-1.163
EX-1.162
Token courage
Token ID11917
Feature activation-9.44e-02

Pos logit contributions
hr+1.205
oc+1.033
AAAA+1.030
Ps+1.028
 tou+1.018
Neg logit contributions
abella-1.216
top-1.158
 wrong-1.153
bench-1.094
box-0.988
Token.
Token ID13
Feature activation+2.01e-01

Pos logit contributions
abella+0.174
top+0.164
 wrong+0.159
bench+0.157
box+0.143
Neg logit contributions
hr-0.113
oc-0.092
 exc-0.091
 tou-0.090
 accidents-0.089
Token He
Token ID679
Feature activation+4.88e-01

Pos logit contributions
hr+0.300
oc+0.252
 exc+0.250
 tou+0.250
EX+0.248
Neg logit contributions
abella-0.314
top-0.294
 wrong-0.283
bench-0.277
box-0.248
Token moved
Token ID3888
Feature activation+5.37e-02

Pos logit contributions
hr+0.747
 accidents+0.626
oc+0.623
EX+0.621
 tou+0.620
Neg logit contributions
abella-0.751
top-0.697
 wrong-0.669
bench-0.650
box-0.582
Token a
Token ID257
Feature activation-1.91e-01

Pos logit contributions
hr+0.074
oc+0.061
 exc+0.061
 tou+0.060
EX+0.060
Neg logit contributions
abella-0.091
top-0.086
 wrong-0.083
bench-0.081
box-0.073
Token little
Token ID1310
Feature activation-3.24e-01

Pos logit contributions
abella+0.344
top+0.322
 wrong+0.312
bench+0.308
box+0.279
Neg logit contributions
hr-0.242
oc-0.200
 exc-0.195
 tou-0.195
EX-0.191
Token bit
Token ID1643
Feature activation-3.42e-01

Pos logit contributions
abella+0.585
top+0.539
 wrong+0.524
bench+0.520
box+0.471
Neg logit contributions
hr-0.414
oc-0.343
 exc-0.332
 accidents-0.329
 tou-0.329
Token closer
Token ID5699
Feature activation-2.43e-01

Pos logit contributions
abella+0.635
top+0.590
 wrong+0.574
bench+0.572
box+0.525
Neg logit contributions
hr-0.416
oc-0.337
 exc-0.335
 accidents-0.330
 tou-0.325
Token to
Token ID284
Feature activation+2.13e-01

Pos logit contributions
abella+0.427
top+0.397
 wrong+0.386
bench+0.382
box+0.345
Neg logit contributions
hr-0.318
oc-0.263
 exc-0.258
 tou-0.256
 accidents-0.255
Token the
Token ID262
Feature activation-1.69e-02

Pos logit contributions
hr+0.270
oc+0.223
 exc+0.219
 tou+0.218
AAAA+0.217
Neg logit contributions
abella-0.381
top-0.359
 wrong-0.349
bench-0.340
box-0.309
Token pushed
Token ID7121
Feature activation+5.09e-01

Pos logit contributions
abella+0.410
top+0.386
 wrong+0.377
bench+0.370
box+0.331
Neg logit contributions
hr-0.343
oc-0.289
 exc-0.283
 tou-0.281
EX-0.278
Token and
Token ID290
Feature activation-2.21e-01

Pos logit contributions
hr+0.644
oc+0.539
 tou+0.532
 exc+0.524
AAAA+0.519
Neg logit contributions
abella-0.920
top-0.868
 wrong-0.843
bench-0.817
 Luke-0.743
Token pulled
Token ID5954
Feature activation-1.53e-01

Pos logit contributions
abella+0.366
top+0.344
 wrong+0.336
bench+0.328
box+0.296
Neg logit contributions
hr-0.311
oc-0.258
 exc-0.256
EX-0.252
 tou-0.252
Token,
Token ID11
Feature activation+2.74e-01

Pos logit contributions
abella+0.271
top+0.255
 wrong+0.247
bench+0.244
box+0.221
Neg logit contributions
hr-0.198
oc-0.162
 exc-0.161
 accidents-0.159
 tou-0.158
Token but
Token ID475
Feature activation-1.01e+00

Pos logit contributions
hr+0.319
oc+0.260
 tou+0.254
 exc+0.252
EX+0.249
Neg logit contributions
abella-0.517
top-0.492
 wrong-0.479
bench-0.469
box-0.428
Token it
Token ID340
Feature activation-5.11e-01

Pos logit contributions
abella+1.615
top+1.545
bench+1.523
 wrong+1.467
box+1.339
Neg logit contributions
hr-1.421
 accidents-1.230
 exc-1.186
Fear-1.170
Ps-1.153
Token was
Token ID373
Feature activation-2.04e-01

Pos logit contributions
abella+0.853
top+0.804
bench+0.773
 wrong+0.771
box+0.691
Neg logit contributions
hr-0.694
 exc-0.581
oc-0.578
 tou-0.577
 accidents-0.576
Token too
Token ID1165
Feature activation+7.86e-02

Pos logit contributions
abella+0.308
top+0.288
 wrong+0.274
bench+0.272
box+0.243
Neg logit contributions
hr-0.312
 exc-0.265
 accidents-0.263
 tou-0.263
oc-0.262
Token heavy
Token ID4334
Feature activation-5.40e-01

Pos logit contributions
hr+0.123
oc+0.105
 exc+0.103
 tou+0.103
EX+0.102
Neg logit contributions
abella-0.117
top-0.110
 wrong-0.106
bench-0.104
box-0.092
Token to
Token ID284
Feature activation+6.36e-01

Pos logit contributions
abella+0.875
top+0.804
bench+0.763
 wrong+0.761
box+0.679
Neg logit contributions
hr-0.772
 accidents-0.658
oc-0.645
 exc-0.644
EX-0.636
Token open
Token ID1280
Feature activation+1.25e-01

Pos logit contributions
hr+0.841
oc+0.706
AAAA+0.683
 accidents+0.682
 tou+0.678
Neg logit contributions
abella-1.098
top-1.058
 wrong-1.020
bench-1.005
box-0.895
Token
Token ID198
Feature activation-2.11e-01

Pos logit contributions
hr+0.149
oc+0.124
 exc+0.123
 tou+0.123
EX+0.122
Neg logit contributions
abella-0.172
top-0.162
 wrong-0.157
bench-0.153
box-0.138
Token
Token ID198
Feature activation+1.01e-01

Pos logit contributions
abella+0.360
top+0.340
 wrong+0.330
bench+0.324
box+0.293
Neg logit contributions
hr-0.285
oc-0.241
 exc-0.236
 tou-0.232
AAAA-0.231
TokenS
Token ID50
Feature activation+2.30e-01

Pos logit contributions
hr+0.118
oc+0.094
 exc+0.093
 tou+0.092
EX+0.092
Neg logit contributions
abella-0.192
top-0.182
 wrong-0.177
bench-0.174
box-0.159
Tokenara
Token ID3301
Feature activation-2.38e-01

Pos logit contributions
hr+0.386
 exc+0.332
oc+0.332
 tou+0.328
EX+0.327
Neg logit contributions
abella-0.322
top-0.299
 wrong-0.288
bench-0.279
box-0.246
Token pressed
Token ID12070
Feature activation-1.88e-01

Pos logit contributions
abella+0.380
top+0.356
 wrong+0.347
bench+0.340
box+0.302
Neg logit contributions
hr-0.346
oc-0.293
 exc-0.289
 tou-0.286
AAAA-0.284
Token the
Token ID262
Feature activation-3.25e-01

Pos logit contributions
abella+0.356
top+0.336
 wrong+0.327
bench+0.323
box+0.297
Neg logit contributions
hr-0.220
oc-0.176
 exc-0.174
 accidents-0.173
 tou-0.172
Token button
Token ID4936
Feature activation-1.35e-01

Pos logit contributions
abella+0.529
top+0.489
 wrong+0.466
bench+0.466
box+0.415
Neg logit contributions
hr-0.468
 exc-0.395
oc-0.394
 tou-0.393
AAAA-0.388
Token and
Token ID290
Feature activation-9.40e-01

Pos logit contributions
abella+0.277
top+0.262
 wrong+0.255
bench+0.252
box+0.232
Neg logit contributions
hr-0.133
oc-0.103
 exc-0.101
 tou-0.099
 accidents-0.099
Token a
Token ID257
Feature activation-4.10e-01

Pos logit contributions
abella+1.486
top+1.406
bench+1.382
 wrong+1.368
box+1.230
Neg logit contributions
hr-1.373
 accidents-1.143
AAAA-1.140
 exc-1.128
Ps-1.123
Token jet
Token ID12644
Feature activation+5.38e-01

Pos logit contributions
abella+0.685
top+0.634
 wrong+0.608
bench+0.600
box+0.534
Neg logit contributions
hr-0.574
oc-0.485
 exc-0.478
 tou-0.476
AAAA-0.474
Token of
Token ID286
Feature activation-5.19e-01

Pos logit contributions
hr+0.747
oc+0.629
 accidents+0.608
 exc+0.607
 tou+0.606
Neg logit contributions
abella-0.888
top-0.848
 wrong-0.831
bench-0.804
box-0.724
Token.
Token ID13
Feature activation-4.73e-01

Pos logit contributions
abella+0.199
top+0.185
 wrong+0.179
bench+0.177
box+0.158
Neg logit contributions
hr-0.185
oc-0.158
 exc-0.155
 tou-0.154
AAAA-0.153
Token He
Token ID679
Feature activation-1.92e+00

Pos logit contributions
abella+0.712
top+0.653
 wrong+0.640
bench+0.620
box+0.557
Neg logit contributions
hr-0.727
oc-0.640
 exc-0.634
Ps-0.620
EX-0.619
Token would
Token ID561
Feature activation-1.16e+00

Pos logit contributions
abella+2.878
top+2.848
bench+2.723
 wrong+2.714
 projector+2.411
Neg logit contributions
 exc-2.758
hr-2.748
oc-2.573
Ps-2.483
AAAA-2.447
Token run
Token ID1057
Feature activation-8.59e-01

Pos logit contributions
abella+2.284
top+2.118
 wrong+2.075
bench+2.049
box+1.877
Neg logit contributions
hr-1.301
 exc-1.182
oc-1.040
 tou-1.037
Ps-1.018
Token around
Token ID1088
Feature activation-2.67e-01

Pos logit contributions
abella+1.754
top+1.656
bench+1.626
 wrong+1.607
box+1.504
Neg logit contributions
hr-0.854
oc-0.707
 exc-0.666
AAAA-0.660
 tou-0.654
Token and
Token ID290
Feature activation-2.11e+00

Pos logit contributions
abella+0.580
top+0.550
 wrong+0.536
bench+0.532
box+0.490
Neg logit contributions
hr-0.243
oc-0.184
 exc-0.179
AAAA-0.173
 tou-0.172
Token see
Token ID766
Feature activation+1.06e+00

Pos logit contributions
abella+4.488
bench+4.090
 wrong+3.951
top+3.933
box+3.783
Neg logit contributions
hr-2.283
 exc-1.969
oc-1.921
 mis-1.805
 tou-1.750
Token all
Token ID477
Feature activation+1.01e+00

Pos logit contributions
hr+1.678
AAAA+1.455
EX+1.442
oc+1.432
 tou+1.418
Neg logit contributions
abella-1.567
 wrong-1.475
top-1.465
bench-1.308
box-1.228
Token the
Token ID262
Feature activation+1.19e+00

Pos logit contributions
hr+1.310
AAAA+1.121
oc+1.102
Ps+1.091
EX+1.089
Neg logit contributions
abella-1.709
top-1.674
 wrong-1.649
bench-1.581
 Luke-1.387
Token animals
Token ID4695
Feature activation+1.45e-01

Pos logit contributions
hr+1.661
oc+1.424
rum+1.398
AAAA+1.374
 tou+1.356
Neg logit contributions
abella-1.936
 wrong-1.880
top-1.839
bench-1.701
box-1.524
Token.
Token ID13
Feature activation-1.12e+00

Pos logit contributions
hr+0.209
oc+0.176
 exc+0.173
 tou+0.173
EX+0.172
Neg logit contributions
abella-0.232
top-0.219
 wrong-0.213
bench-0.207
box-0.186
Token they
Token ID484
Feature activation+4.38e-01

Pos logit contributions
abella+1.765
top+1.675
bench+1.574
 wrong+1.493
box+1.402
Neg logit contributions
hr-1.797
 exc-1.603
 accidents-1.599
 tou-1.566
Ps-1.546
Token can
Token ID460
Feature activation-7.82e-01

Pos logit contributions
hr+0.693
oc+0.588
AAAA+0.580
 accidents+0.577
EX+0.576
Neg logit contributions
abella-0.641
top-0.609
 wrong-0.595
bench-0.571
box-0.502
Token do
Token ID466
Feature activation-6.12e-02

Pos logit contributions
abella+1.354
top+1.210
 wrong+1.170
bench+1.166
box+1.054
Neg logit contributions
hr-1.077
 exc-0.936
oc-0.893
 accidents-0.875
EX-0.871
Token.
Token ID13
Feature activation-1.86e-01

Pos logit contributions
abella+0.106
top+0.100
 wrong+0.096
bench+0.095
box+0.086
Neg logit contributions
hr-0.080
oc-0.066
 exc-0.066
 tou-0.065
 accidents-0.065
Token They
Token ID1119
Feature activation-1.28e+00

Pos logit contributions
abella+0.285
top+0.266
 wrong+0.259
bench+0.253
box+0.227
Neg logit contributions
hr-0.285
oc-0.244
 exc-0.242
 tou-0.239
AAAA-0.238
Token find
Token ID1064
Feature activation-1.64e+00

Pos logit contributions
abella+2.305
top+2.105
bench+2.061
 wrong+2.030
box+1.910
Neg logit contributions
hr-1.596
 exc-1.451
 tou-1.377
oc-1.339
Ps-1.305
Token out
Token ID503
Feature activation+9.70e-01

Pos logit contributions
abella+2.729
top+2.556
bench+2.543
 wrong+2.322
box+2.186
Neg logit contributions
hr-2.146
 accidents-1.937
 tou-1.884
 exc-1.874
oc-1.855
Token that
Token ID326
Feature activation-6.56e-01

Pos logit contributions
hr+1.708
oc+1.525
 tou+1.517
EX+1.516
Ps+1.500
Neg logit contributions
abella-1.281
 wrong-1.211
top-1.171
bench-1.054
box-0.947
Token they
Token ID484
Feature activation-6.74e-01

Pos logit contributions
abella+1.020
top+0.957
bench+0.899
 wrong+0.899
box+0.795
Neg logit contributions
hr-0.979
 exc-0.848
 tou-0.837
 accidents-0.832
oc-0.822
Token can
Token ID460
Feature activation-1.65e+00

Pos logit contributions
abella+1.077
top+0.990
bench+0.947
 wrong+0.944
box+0.857
Neg logit contributions
hr-0.983
 exc-0.864
oc-0.852
 tou-0.840
Ps-0.819
Token have
Token ID423
Feature activation+2.47e-01

Pos logit contributions
abella+3.132
top+2.747
bench+2.724
 wrong+2.707
box+2.478
Neg logit contributions
hr-2.020
 exc-1.899
 tou-1.697
oc-1.669
 mis-1.633
Token."
Token ID526
Feature activation-8.04e-02

Pos logit contributions
abella+1.276
top+1.190
bench+1.156
 wrong+1.120
box+1.031
Neg logit contributions
hr-0.849
 exc-0.703
oc-0.695
 accidents-0.683
 tou-0.667
Token
Token ID198
Feature activation-2.05e-01

Pos logit contributions
abella+0.133
top+0.125
 wrong+0.121
bench+0.118
box+0.107
Neg logit contributions
hr-0.113
oc-0.095
 exc-0.094
 tou-0.094
AAAA-0.093
Token
Token ID198
Feature activation-1.54e-01

Pos logit contributions
abella+0.345
top+0.325
 wrong+0.315
bench+0.309
box+0.279
Neg logit contributions
hr-0.281
oc-0.237
 exc-0.233
 tou-0.230
AAAA-0.228
TokenL
Token ID43
Feature activation-4.00e-01

Pos logit contributions
abella+0.261
top+0.245
 wrong+0.239
bench+0.233
box+0.211
Neg logit contributions
hr-0.213
oc-0.179
 exc-0.176
 tou-0.174
AAAA-0.173
Tokenily
Token ID813
Feature activation-1.34e-01

Pos logit contributions
abella+0.436
top+0.404
 wrong+0.376
bench+0.365
box+0.302
Neg logit contributions
hr-0.790
 exc-0.698
oc-0.696
AAAA-0.688
 accidents-0.686
Token says
Token ID1139
Feature activation-1.77e+00

Pos logit contributions
abella+0.259
top+0.246
 wrong+0.239
bench+0.236
box+0.216
Neg logit contributions
hr-0.150
oc-0.120
 exc-0.118
 tou-0.117
AAAA-0.115
Token,
Token ID11
Feature activation+1.18e-01

Pos logit contributions
abella+3.581
top+3.430
bench+3.362
 wrong+3.050
 projector+2.997
Neg logit contributions
hr-1.801
 accidents-1.617
 exc-1.541
 from-1.499
 little-1.442
Token "
Token ID366
Feature activation-1.07e+00

Pos logit contributions
hr+0.162
oc+0.135
 exc+0.133
 tou+0.133
EX+0.132
Neg logit contributions
abella-0.198
top-0.186
 wrong-0.181
bench-0.177
box-0.160
TokenI
Token ID40
Feature activation-9.06e-01

Pos logit contributions
abella+1.998
 wrong+1.842
top+1.831
bench+1.781
box+1.680
Neg logit contributions
hr-1.340
oc-1.118
 exc-1.102
 tou-1.101
Ps-1.101
Token love
Token ID1842
Feature activation-5.83e-01

Pos logit contributions
abella+1.701
top+1.606
bench+1.560
 wrong+1.545
box+1.426
Neg logit contributions
hr-1.073
oc-0.922
 exc-0.913
 tou-0.853
Ps-0.845
Token you
Token ID345
Feature activation-9.41e-01

Pos logit contributions
abella+0.964
top+0.910
bench+0.878
 wrong+0.866
box+0.774
Neg logit contributions
hr-0.843
 accidents-0.710
 exc-0.694
 tou-0.683
oc-0.672
Token to
Token ID284
Feature activation-3.18e-01

Pos logit contributions
hr+0.096
oc+0.083
 exc+0.082
 tou+0.082
EX+0.081
Neg logit contributions
abella-0.083
top-0.077
 wrong-0.075
bench-0.073
box-0.064
Token hurt
Token ID5938
Feature activation+8.44e-02

Pos logit contributions
abella+0.563
top+0.525
bench+0.505
 wrong+0.502
box+0.461
Neg logit contributions
hr-0.421
 exc-0.346
oc-0.343
 accidents-0.341
 tou-0.338
Token you
Token ID345
Feature activation+6.20e-01

Pos logit contributions
hr+0.115
oc+0.096
 exc+0.095
 tou+0.094
EX+0.093
Neg logit contributions
abella-0.143
top-0.135
 wrong-0.131
bench-0.128
box-0.116
Token.
Token ID13
Feature activation-2.95e-01

Pos logit contributions
hr+0.935
AAAA+0.793
 tou+0.793
oc+0.790
EX+0.786
Neg logit contributions
abella-0.950
 wrong-0.918
top-0.913
bench-0.850
box-0.768
Token I
Token ID314
Feature activation-1.80e-01

Pos logit contributions
abella+0.466
top+0.435
 wrong+0.418
bench+0.415
box+0.368
Neg logit contributions
hr-0.438
oc-0.377
 exc-0.374
 tou-0.365
 accidents-0.365
Token was
Token ID373
Feature activation-1.63e+00

Pos logit contributions
abella+0.298
top+0.279
 wrong+0.269
bench+0.267
box+0.241
Neg logit contributions
hr-0.251
oc-0.213
 exc-0.210
 tou-0.206
AAAA-0.205
Token just
Token ID655
Feature activation+7.26e-02

Pos logit contributions
abella+2.937
top+2.810
bench+2.734
box+2.381
 versa+2.379
Neg logit contributions
hr-2.203
 exc-1.893
 accidents-1.839
 mis-1.772
 tou-1.718
Token angry
Token ID7954
Feature activation-3.85e-01

Pos logit contributions
hr+0.107
oc+0.091
 exc+0.089
 tou+0.089
EX+0.088
Neg logit contributions
abella-0.115
top-0.107
 wrong-0.104
bench-0.102
box-0.091
Token."
Token ID526
Feature activation-7.47e-02

Pos logit contributions
abella+0.694
top+0.653
bench+0.628
 wrong+0.621
box+0.564
Neg logit contributions
hr-0.487
 accidents-0.399
oc-0.397
 exc-0.396
AAAA-0.384
Token
Token ID198
Feature activation+4.65e-02

Pos logit contributions
abella+0.126
top+0.118
 wrong+0.114
bench+0.112
box+0.101
Neg logit contributions
hr-0.103
oc-0.087
 exc-0.086
 tou-0.085
AAAA-0.084
Token
Token ID198
Feature activation+2.34e-01

Pos logit contributions
hr+0.065
oc+0.055
 exc+0.054
 tou+0.054
EX+0.053
Neg logit contributions
abella-0.077
top-0.072
 wrong-0.070
bench-0.069
box-0.062
Token Sally
Token ID25737
Feature activation-7.57e-01

Pos logit contributions
abella+0.472
top+0.443
 wrong+0.429
bench+0.423
box+0.377
Neg logit contributions
hr-0.439
 exc-0.372
oc-0.368
 tou-0.366
AAAA-0.361
Token and
Token ID290
Feature activation-2.04e+00

Pos logit contributions
abella+1.821
top+1.726
 wrong+1.693
bench+1.675
 versa+1.552
Neg logit contributions
hr-0.503
oc-0.373
 exc-0.338
 tou-0.324
EX-0.296
Token Jimmy
Token ID12963
Feature activation-5.43e-01

Pos logit contributions
abella+3.827
top+3.718
bench+3.610
 wrong+3.492
box+3.321
Neg logit contributions
hr-2.416
 exc-2.215
 tou-2.073
 mis-2.009
Ps-1.975
Token.
Token ID13
Feature activation-2.35e-01

Pos logit contributions
abella+0.979
top+0.908
 wrong+0.881
bench+0.880
box+0.794
Neg logit contributions
hr-0.688
oc-0.581
 exc-0.571
 tou-0.555
EX-0.540
Token Sally
Token ID25737
Feature activation-1.50e+00

Pos logit contributions
abella+0.357
top+0.332
 wrong+0.325
bench+0.315
box+0.283
Neg logit contributions
hr-0.357
oc-0.306
 exc-0.303
 tou-0.301
AAAA-0.299
Token wanted
Token ID2227
Feature activation-1.93e+00

Pos logit contributions
abella+2.716
top+2.503
bench+2.450
 wrong+2.370
 projector+2.213
Neg logit contributions
hr-1.862
 exc-1.664
oc-1.659
 tou-1.568
Ps-1.514
Token to
Token ID284
Feature activation-9.35e-03

Pos logit contributions
abella+3.464
bench+3.280
top+3.238
 wrong+3.001
 projector+2.852
Neg logit contributions
hr-2.467
 accidents-2.184
 tou-2.097
 from-2.057
EX-2.056
Token give
Token ID1577
Feature activation-9.49e-01

Pos logit contributions
abella+0.015
top+0.014
 wrong+0.014
bench+0.013
box+0.012
Neg logit contributions
hr-0.014
oc-0.011
 exc-0.011
 tou-0.011
EX-0.011
Token Jimmy
Token ID12963
Feature activation-8.33e-02

Pos logit contributions
abella+1.564
top+1.472
bench+1.461
 wrong+1.408
box+1.299
Neg logit contributions
hr-1.317
 tou-1.128
 accidents-1.115
 exc-1.088
EX-1.087
Token a
Token ID257
Feature activation-2.21e-01

Pos logit contributions
abella+0.119
top+0.110
 wrong+0.106
bench+0.104
box+0.092
Neg logit contributions
hr-0.135
oc-0.116
 exc-0.115
 tou-0.115
EX-0.114
Token present
Token ID1944
Feature activation-1.36e+00

Pos logit contributions
abella+0.388
top+0.364
 wrong+0.350
bench+0.348
box+0.314
Neg logit contributions
hr-0.284
 exc-0.235
oc-0.234
 tou-0.233
EX-0.229