TOP ACTIVATIONS
MAX = 1.069

. <|endoftext|> Image copyright publicity Image caption The Greeks
ported License . <|endoftext|> Image copyright AP Image caption Peace talks
. <|endoftext|> Image copyright Deadline Police have
million years . <|endoftext|> Image copyright AFP The Bank
admitting to having a sn oo ze at their desks and
Serve hot ! <|endoftext|> Image copyright ESA Image caption Go ce
small fish from a tiny pond . The skeptics were abruptly
impression not only on her ce int ure , but on
<|endoftext|> The Geek Z odiac THE G EEK
Power - sharing sc ot ched Mr Abdullah
on taxes and Social Security seriously . And what
spending a decade in Guantanamo Bay , Kh adr was sentenced
owns four numbered Ontario real estate holding companies with his siblings
Ryan explained . Beat ty hesitated twice after opening
for play with the Wii U Game Pad controller .
have to do more heavy lifting with Ye ldon likely out
. Y oy ore creation Garg ant ua
yers for the former Guantanamo Bay detainee cons ented Thursday to
be more TM NT tid bits in the film ? Will
21 . Henry Pyr g os ( 116 )

MIN ACTIVATIONS
MIN = -0.969

after the near - dis aster , the Australian Transport Safety
situation of the Bur m ese people . Therefore I want
, while the Bur m ese authorities said there had been
party abandoned President Dil ma Rousse ff s governing
Irvine , California . Hide Caption 10 of 15 Photos :
Irvine , California . Hide Caption 9 of 15 Photos :
: ( Example : Th urb ion the Paladin became a
Women in politics . Hide Caption 7 of 15 Photos :
Community Newspaper Group <|endoftext|> W ired Equ ivalent Privacy ( WE
something like it ( B ab ylon was under Persian control
Hills , California . Hide Caption 14 of 15 Photos :
. Guess what ! Ch ances are , you rarely do
aga , California . Hide Caption 13 of 15 Photos :
Pasadena , California . Hide Caption 12 of 15 Photos :
year . W age gap ' sim ply unacceptable
, seeking work . Bol iv ians and Hait ians are
figure model . <|endoftext|> Ch ances are pretty good that you
Token while your opponent Sum mons a lot of monsters to
, features a full Shim ano 600 tri - color group
20 , 2015 . Hide Caption 8 of 15 Photos :

INTERVAL 0.560 to 1.069
CONTAINS 0.125%

took the reins of O CL C as its new president
Russia . Image copyright AFP Image caption The Kurds
in increasingly gender - integ rated military units , he replied
and leng then the tax credits for renewable energy production .
- war , anti - nuclear , peace movement The

INTERVAL 0.050 to 0.560
CONTAINS 31.010%

there , V ese y told the conference .
alone . O ' Brian is as fascinated by
. 1 million ( 27 th among all edge r us
making some bad choices , Michael Vick has devoted time to
only beer I can 't say I was crazy about

INTERVAL -0.459 to 0.050
CONTAINS 68.459%

Congress .[ 52 ] The funds for the monument were raised
He shrug s his shoulders , moves over to
R apt or is powered by Ford 's new ,
people who are subject matter experts , who know how to
known as a " name - cle aring hearing " with

INTERVAL -0.969 to -0.459
CONTAINS 0.406%

ivating the unit . a failed sensor providing false information ,
must select a newsletter to subscribe to . Sign Up You
the plurality of pro - independence parties . But above all
it becomes relevant . And apparently in Seattle , it
are metal and should not snap off anyway Once it 's
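
NOTE ON THE INTERVAL PERCENTAGES
The four CONTAINS figures above sum to 100.000%, which suggests that each one is the share of sampled activations falling inside its interval. The sketch below shows one way such shares could be computed. It is an illustration only: the function name, the half-open binning rule (with the top interval also including the maximum), and the sample activation values are assumptions, not taken from the tool that produced this dump. Only the interval edges come from the listing above. Results are returned from the lowest interval to the highest, the reverse of the order shown above.

```python
from bisect import bisect_right


def interval_percentages(activations, edges):
    """Return the percentage of activations in each [edges[i], edges[i+1]) bin.

    Assumed rule: bins are half-open, except the last one, which also
    includes its upper edge so that the maximum value is counted.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        if a < edges[0] or a > edges[-1]:
            continue
        # bisect_right gives the index of the first edge greater than a;
        # clamp so that a == edges[-1] lands in the last bin.
        idx = min(bisect_right(edges, a) - 1, len(counts) - 1)
        counts[idx] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# Interval edges as reported above (MIN, the three inner cut points, MAX).
EDGES = [-0.969, -0.459, 0.050, 0.560, 1.069]

# Hypothetical activation values, made up purely for illustration.
sample = [-0.969, -0.50, -0.30, -0.10, 0.00, 0.03, 0.20, 0.40, 0.90, 1.069]

percentages = interval_percentages(sample, EDGES)
for lo, hi, pct in zip(EDGES[:-1], EDGES[1:], percentages):
    print(f"INTERVAL {lo:.3f} to {hi:.3f} CONTAINS {pct:.3f}%")
```

With real data, the sample list would be replaced by the feature's activations over the whole evaluation set. The percentages always sum to 100 as long as at least one value lies inside the overall range.
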
Token.
Token ID13
Feature activation+3.27e-02

Pos logit contributions
Rated+0.001
ENCY+0.001
 Draper+0.001
DEN+0.001
OWER+0.001
Neg logit contributions
 patched-0.001
 merit-0.001
minster-0.001
 consolation-0.001
bridge-0.001
Token
Token ID447
Feature activation+1.19e-02

Pos logit contributions
 patched+0.051
 merit+0.048
minster+0.046
 consolation+0.045
 elig+0.045
Neg logit contributions
Rated-0.064
ENCY-0.058
 Draper-0.058
DEN-0.057
OWER-0.056
Token
Token ID251
Feature activation-7.27e-02

Pos logit contributions
 patched+0.024
 merit+0.022
minster+0.022
 consolation+0.022
 elig+0.021
Neg logit contributions
Rated-0.018
ENCY-0.016
 Draper-0.016
DEN-0.016
OWER-0.016
Token<|endoftext|>
Token ID50256
Feature activation-8.91e-02

Pos logit contributions
Rated+0.139
ENCY+0.126
 Draper+0.125
DEN+0.125
OWER+0.122
Neg logit contributions
 patched-0.117
 merit-0.109
minster-0.105
 consolation-0.104
 elig-0.103
TokenImage
Token ID5159
Feature activation+2.15e-01

Pos logit contributions
Rated+0.139
ENCY+0.122
DEN+0.121
 Draper+0.120
OWER+0.117
Neg logit contributions
 patched-0.176
 merit-0.167
minster-0.162
 consolation-0.161
 elig-0.160
Token copyright
Token ID6634
Feature activation+1.07e+00

Pos logit contributions
 patched+0.404
 merit+0.382
minster+0.371
 consolation+0.368
 elig+0.367
Neg logit contributions
Rated-0.351
ENCY-0.313
DEN-0.310
 Draper-0.309
OWER-0.300
Token publicity
Token ID19403
Feature activation-2.83e-01

Pos logit contributions
 patched+2.041
 merit+1.929
 consolation+1.862
minster+1.855
bridge+1.847
Neg logit contributions
Rated-1.709
ENCY-1.517
DEN-1.493
 Draper-1.480
OWER-1.458
Token Image
Token ID7412
Feature activation-1.05e-01

Pos logit contributions
Rated+0.482
ENCY+0.432
 Draper+0.427
DEN+0.427
OWER+0.418
Neg logit contributions
 patched-0.511
 merit-0.481
minster-0.466
 consolation-0.465
 elig-0.461
Token caption
Token ID8305
Feature activation+2.04e-01

Pos logit contributions
Rated+0.168
ENCY+0.149
DEN+0.148
 Draper+0.147
OWER+0.144
Neg logit contributions
 patched-0.202
 merit-0.190
minster-0.185
 consolation-0.184
 elig-0.183
Token The
Token ID383
Feature activation-8.03e-02

Pos logit contributions
 patched+0.385
 merit+0.364
minster+0.352
 consolation+0.350
bridge+0.348
Neg logit contributions
Rated-0.331
ENCY-0.295
DEN-0.291
 Draper-0.291
OWER-0.284
Token Greeks
Token ID25059
Feature activation-1.58e-01

Pos logit contributions
Rated+0.145
ENCY+0.131
 Draper+0.131
DEN+0.130
OWER+0.127
Neg logit contributions
 patched-0.136
 merit-0.128
minster-0.124
 consolation-0.123
 elig-0.122
Tokenported
Token ID9213
Feature activation-8.90e-02

Pos logit contributions
Rated+0.318
 Draper+0.288
ENCY+0.286
DEN+0.284
 Author+0.277
Neg logit contributions
 patched-0.226
 merit-0.210
minster-0.204
 elig-0.203
 trickle-0.203
Token License
Token ID13789
Feature activation+5.59e-02

Pos logit contributions
Rated+0.175
ENCY+0.159
 Draper+0.158
DEN+0.157
OWER+0.155
Neg logit contributions
 patched-0.137
 merit-0.128
minster-0.124
 consolation-0.122
bridge-0.122
Token.
Token ID13
Feature activation-1.59e-01

Pos logit contributions
 patched+0.100
 merit+0.095
minster+0.092
 consolation+0.091
bridge+0.090
Neg logit contributions
Rated-0.096
ENCY-0.086
 Draper-0.085
DEN-0.085
OWER-0.083
Token<|endoftext|>
Token ID50256
Feature activation-8.48e-02

Pos logit contributions
Rated+0.285
ENCY+0.257
 Draper+0.255
DEN+0.254
OWER+0.249
Neg logit contributions
 patched-0.272
 merit-0.255
minster-0.247
 consolation-0.245
 elig-0.244
TokenImage
Token ID5159
Feature activation+1.09e-01

Pos logit contributions
Rated+0.132
ENCY+0.116
DEN+0.115
 Draper+0.114
OWER+0.112
Neg logit contributions
 patched-0.168
 merit-0.158
minster-0.154
 consolation-0.153
 elig-0.152
Token copyright
Token ID6634
Feature activation+1.01e+00

Pos logit contributions
 patched+0.204
 merit+0.193
minster+0.187
 consolation+0.186
 elig+0.186
Neg logit contributions
Rated-0.177
ENCY-0.157
DEN-0.156
 Draper-0.156
OWER-0.151
Token AP
Token ID3486
Feature activation+2.33e-01

Pos logit contributions
 patched+1.934
 merit+1.828
 consolation+1.766
minster+1.760
bridge+1.751
Neg logit contributions
Rated-1.606
ENCY-1.425
DEN-1.403
 Draper-1.391
OWER-1.369
Token Image
Token ID7412
Feature activation+4.55e-02

Pos logit contributions
 patched+0.528
 merit+0.504
minster+0.491
 consolation+0.489
bridge+0.486
Neg logit contributions
Rated-0.290
ENCY-0.248
 Draper-0.245
DEN-0.244
OWER-0.236
Token caption
Token ID8305
Feature activation+2.04e-01

Pos logit contributions
 patched+0.088
 merit+0.083
minster+0.081
 elig+0.080
 consolation+0.080
Neg logit contributions
Rated-0.072
ENCY-0.064
DEN-0.063
 Draper-0.063
OWER-0.061
Token Peace
Token ID12689
Feature activation-1.13e-01

Pos logit contributions
 patched+0.387
 merit+0.365
minster+0.354
 consolation+0.352
bridge+0.350
Neg logit contributions
Rated-0.330
ENCY-0.294
DEN-0.290
 Draper-0.290
OWER-0.283
Token talks
Token ID6130
Feature activation+2.65e-01

Pos logit contributions
Rated+0.200
ENCY+0.180
 Draper+0.179
DEN+0.178
OWER+0.175
Neg logit contributions
 patched-0.196
 merit-0.184
minster-0.178
 consolation-0.176
 elig-0.175
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
 patched+0.342
 merit+0.322
minster+0.314
 elig+0.310
 consolation+0.310
Neg logit contributions
Rated-0.301
ENCY-0.269
 Draper-0.266
DEN-0.264
OWER-0.260
Token
Token ID447
Feature activation-2.74e-02

Pos logit contributions
Rated+0.179
ENCY+0.161
 Draper+0.159
DEN+0.159
OWER+0.155
Neg logit contributions
 patched-0.183
 merit-0.172
minster-0.166
 consolation-0.165
 elig-0.164
Token
Token ID251
Feature activation+1.51e-02

Pos logit contributions
Rated+0.042
ENCY+0.037
 Draper+0.037
DEN+0.037
OWER+0.036
Neg logit contributions
 patched-0.054
 merit-0.051
minster-0.050
 consolation-0.050
bridge-0.049
Token<|endoftext|>
Token ID50256
Feature activation-4.53e-02

Pos logit contributions
 patched+0.026
 merit+0.025
minster+0.024
 consolation+0.024
 elig+0.024
Neg logit contributions
Rated-0.027
ENCY-0.024
 Draper-0.024
DEN-0.024
OWER-0.023
TokenImage
Token ID5159
Feature activation+1.72e-01

Pos logit contributions
Rated+0.071
ENCY+0.062
DEN+0.062
 Draper+0.061
OWER+0.060
Neg logit contributions
 patched-0.090
 merit-0.085
minster-0.082
 consolation-0.082
 elig-0.081
Token copyright
Token ID6634
Feature activation+1.01e+00

Pos logit contributions
 patched+0.322
 merit+0.305
minster+0.296
 consolation+0.294
 elig+0.293
Neg logit contributions
Rated-0.280
ENCY-0.249
DEN-0.247
 Draper-0.246
OWER-0.238
Token Deadline
Token ID35954
Feature activation+1.68e-01

Pos logit contributions
 patched+1.928
 merit+1.821
 consolation+1.758
minster+1.754
bridge+1.745
Neg logit contributions
Rated-1.609
ENCY-1.427
DEN-1.406
 Draper-1.395
OWER-1.372
Token
Token ID198
Feature activation+5.32e-02

Pos logit contributions
 patched+0.339
 merit+0.322
minster+0.312
 consolation+0.311
 elig+0.309
Neg logit contributions
Rated-0.251
ENCY-0.221
 Draper-0.218
DEN-0.218
OWER-0.212
Token
Token ID198
Feature activation+3.86e-02

Pos logit contributions
 patched+0.081
 merit+0.076
minster+0.073
 consolation+0.072
 elig+0.072
Neg logit contributions
Rated-0.106
ENCY-0.096
 Draper-0.096
DEN-0.096
OWER-0.094
TokenPolice
Token ID9039
Feature activation+2.40e-01

Pos logit contributions
 patched+0.077
 merit+0.073
minster+0.071
 consolation+0.070
bridge+0.070
Neg logit contributions
Rated-0.059
ENCY-0.052
 Draper-0.051
DEN-0.051
OWER-0.050
Token have
Token ID423
Feature activation+5.27e-02

Pos logit contributions
 patched+0.423
 merit+0.398
minster+0.385
 consolation+0.382
bridge+0.380
Neg logit contributions
Rated-0.421
ENCY-0.378
 Draper-0.374
DEN-0.374
OWER-0.365
Token million
Token ID1510
Feature activation-6.59e-02

Pos logit contributions
 patched+0.223
 merit+0.211
minster+0.204
 consolation+0.203
bridge+0.202
Neg logit contributions
Rated-0.198
ENCY-0.177
 Draper-0.175
DEN-0.175
OWER-0.171
Token years
Token ID812
Feature activation+1.06e-01

Pos logit contributions
Rated+0.129
ENCY+0.117
 Draper+0.116
DEN+0.116
OWER+0.113
Neg logit contributions
 patched-0.103
 merit-0.096
minster-0.092
 consolation-0.091
 elig-0.091
Token.
Token ID13
Feature activation-1.65e-03

Pos logit contributions
 patched+0.192
 merit+0.181
minster+0.176
 consolation+0.175
 elig+0.173
Neg logit contributions
Rated-0.178
ENCY-0.160
 Draper-0.158
DEN-0.158
OWER-0.154
Token<|endoftext|>
Token ID50256
Feature activation-9.69e-02

Pos logit contributions
Rated+0.003
ENCY+0.003
 Draper+0.003
DEN+0.003
OWER+0.003
Neg logit contributions
 patched-0.003
 merit-0.003
minster-0.003
 consolation-0.003
 elig-0.003
TokenImage
Token ID5159
Feature activation+1.35e-01

Pos logit contributions
Rated+0.151
ENCY+0.133
DEN+0.132
 Draper+0.130
OWER+0.128
Neg logit contributions
 patched-0.192
 merit-0.181
minster-0.176
 consolation-0.175
 elig-0.174
Token copyright
Token ID6634
Feature activation+9.89e-01

Pos logit contributions
 patched+0.253
 merit+0.239
minster+0.232
 consolation+0.231
 elig+0.230
Neg logit contributions
Rated-0.219
ENCY-0.195
DEN-0.194
 Draper-0.193
OWER-0.187
Token AFP
Token ID8916
Feature activation+6.11e-01

Pos logit contributions
 patched+1.891
 merit+1.787
 consolation+1.725
minster+1.720
bridge+1.712
Neg logit contributions
Rated-1.580
ENCY-1.403
DEN-1.381
 Draper-1.370
OWER-1.348
Token
Token ID198
Feature activation-5.17e-02

Pos logit contributions
 patched+1.370
 merit+1.306
minster+1.271
 consolation+1.267
bridge+1.260
Neg logit contributions
Rated-0.777
ENCY-0.666
 Draper-0.657
DEN-0.655
OWER-0.633
Token
Token ID198
Feature activation-6.15e-02

Pos logit contributions
Rated+0.102
ENCY+0.093
DEN+0.092
 Draper+0.092
OWER+0.090
Neg logit contributions
 patched-0.080
 merit-0.074
minster-0.072
 consolation-0.071
 elig-0.071
TokenThe
Token ID464
Feature activation-8.40e-02

Pos logit contributions
Rated+0.094
ENCY+0.083
DEN+0.082
 Draper+0.081
OWER+0.079
Neg logit contributions
 patched-0.123
 merit-0.116
minster-0.113
 consolation-0.112
 elig-0.112
Token Bank
Token ID5018
Feature activation-8.25e-04

Pos logit contributions
Rated+0.139
ENCY+0.125
 Draper+0.124
DEN+0.123
OWER+0.120
Neg logit contributions
 patched-0.155
 merit-0.146
minster-0.142
 consolation-0.141
 elig-0.140
Token admitting
Token ID22688
Feature activation+6.29e-02

Pos logit contributions
 patched+0.114
 merit+0.108
minster+0.104
 consolation+0.103
bridge+0.103
Neg logit contributions
Rated-0.116
ENCY-0.104
 Draper-0.103
DEN-0.103
OWER-0.101
Token to
Token ID284
Feature activation+8.58e-03

Pos logit contributions
 patched+0.122
 merit+0.116
minster+0.113
 consolation+0.112
bridge+0.111
Neg logit contributions
Rated-0.098
ENCY-0.087
DEN-0.086
 Draper-0.086
OWER-0.084
Token having
Token ID1719
Feature activation-4.86e-02

Pos logit contributions
 patched+0.014
 merit+0.013
minster+0.013
 consolation+0.013
bridge+0.013
Neg logit contributions
Rated-0.016
ENCY-0.014
 Draper-0.014
DEN-0.014
OWER-0.014
Token a
Token ID257
Feature activation+4.84e-02

Pos logit contributions
Rated+0.080
ENCY+0.071
 Draper+0.070
DEN+0.070
OWER+0.068
Neg logit contributions
 patched-0.090
 merit-0.086
minster-0.084
 consolation-0.083
bridge-0.083
Token sn
Token ID3013
Feature activation-1.46e-01

Pos logit contributions
 patched+0.067
 merit+0.062
minster+0.060
 consolation+0.059
bridge+0.059
Neg logit contributions
Rated-0.102
ENCY-0.093
 Draper-0.093
DEN-0.093
OWER-0.091
Tokenoo
Token ID2238
Feature activation+9.56e-01

Pos logit contributions
Rated+0.248
ENCY+0.221
 Draper+0.219
DEN+0.218
OWER+0.217
Neg logit contributions
 patched-0.266
 merit-0.253
 consolation-0.244
minster-0.242
 elig-0.241
Tokenze
Token ID2736
Feature activation-2.59e-02

Pos logit contributions
 patched+1.969
 merit+1.864
 trickle+1.810
minster+1.804
 consolation+1.804
Neg logit contributions
Rated-1.374
ENCY-1.214
DEN-1.185
 Draper-1.176
OWER-1.155
Token at
Token ID379
Feature activation+4.58e-02

Pos logit contributions
Rated+0.045
ENCY+0.040
 Draper+0.040
DEN+0.040
OWER+0.039
Neg logit contributions
 patched-0.046
 merit-0.043
minster-0.042
 consolation-0.041
 elig-0.041
Token their
Token ID511
Feature activation+4.10e-03

Pos logit contributions
 patched+0.073
 merit+0.068
minster+0.066
 consolation+0.065
bridge+0.065
Neg logit contributions
Rated-0.087
ENCY-0.080
 Draper-0.079
DEN-0.078
OWER-0.077
Token desks
Token ID44301
Feature activation-3.10e-01

Pos logit contributions
 patched+0.007
 merit+0.006
minster+0.006
bridge+0.006
 consolation+0.006
Neg logit contributions
Rated-0.008
ENCY-0.007
 Draper-0.007
DEN-0.007
OWER-0.007
Token and
Token ID290
Feature activation+3.47e-03

Pos logit contributions
Rated+0.524
ENCY+0.469
 Draper+0.464
DEN+0.463
OWER+0.454
Neg logit contributions
 patched-0.564
 merit-0.532
minster-0.515
 consolation-0.511
 elig-0.509
Token Serve
Token ID35557
Feature activation+1.77e-01

Pos logit contributions
 patched+0.028
 merit+0.025
minster+0.024
 consolation+0.024
bridge+0.024
Neg logit contributions
Rated-0.049
ENCY-0.045
 Draper-0.045
DEN-0.045
OWER-0.044
Token hot
Token ID3024
Feature activation+3.21e-01

Pos logit contributions
 patched+0.214
 merit+0.196
minster+0.189
 consolation+0.185
 elig+0.184
Neg logit contributions
Rated-0.405
ENCY-0.373
 Draper-0.371
DEN-0.370
OWER-0.364
Token!
Token ID0
Feature activation-1.68e-01

Pos logit contributions
 patched+0.583
 merit+0.549
minster+0.531
 consolation+0.528
bridge+0.525
Neg logit contributions
Rated-0.546
ENCY-0.489
 Draper-0.483
DEN-0.483
OWER-0.472
Token<|endoftext|>
Token ID50256
Feature activation+1.34e-01

Pos logit contributions
Rated+0.323
ENCY+0.293
 Draper+0.292
DEN+0.291
OWER+0.286
Neg logit contributions
 patched-0.268
 merit-0.251
minster-0.240
 consolation-0.238
 elig-0.237
TokenImage
Token ID5159
Feature activation+1.03e-01

Pos logit contributions
 patched+0.264
 merit+0.249
minster+0.242
 consolation+0.241
 elig+0.239
Neg logit contributions
Rated-0.207
ENCY-0.183
DEN-0.181
 Draper-0.180
OWER-0.176
Token copyright
Token ID6634
Feature activation+9.54e-01

Pos logit contributions
 patched+0.193
 merit+0.183
minster+0.178
 consolation+0.177
 elig+0.176
Neg logit contributions
Rated-0.167
ENCY-0.149
DEN-0.148
 Draper-0.147
OWER-0.143
Token ESA
Token ID42212
Feature activation-5.26e-02

Pos logit contributions
 patched+1.820
 merit+1.720
 consolation+1.660
minster+1.656
bridge+1.647
Neg logit contributions
Rated-1.530
ENCY-1.359
DEN-1.339
 Draper-1.329
OWER-1.307
Token Image
Token ID7412
Feature activation-2.41e-02

Pos logit contributions
Rated+0.075
ENCY+0.066
 Draper+0.065
DEN+0.065
OWER+0.063
Neg logit contributions
 patched-0.110
 merit-0.104
minster-0.101
 consolation-0.101
 elig-0.100
Token caption
Token ID8305
Feature activation+1.95e-01

Pos logit contributions
Rated+0.038
ENCY+0.034
DEN+0.034
 Draper+0.034
OWER+0.033
Neg logit contributions
 patched-0.046
 merit-0.044
minster-0.042
 consolation-0.042
 elig-0.042
Token Go
Token ID1514
Feature activation+2.66e-01

Pos logit contributions
 patched+0.373
 merit+0.351
minster+0.340
 consolation+0.339
bridge+0.338
Neg logit contributions
Rated-0.312
ENCY-0.278
DEN-0.274
 Draper-0.272
OWER-0.267
Tokence
Token ID344
Feature activation+9.39e-02

Pos logit contributions
 patched+0.500
 merit+0.471
minster+0.458
 consolation+0.455
bridge+0.453
Neg logit contributions
Rated-0.433
ENCY-0.387
 Draper-0.381
DEN-0.381
OWER-0.371
Token small
Token ID1402
Feature activation+3.80e-02

Pos logit contributions
 patched+0.022
 merit+0.021
minster+0.020
 consolation+0.020
bridge+0.020
Neg logit contributions
Rated-0.021
ENCY-0.019
 Draper-0.019
DEN-0.019
OWER-0.018
Token fish
Token ID5916
Feature activation+9.94e-02

Pos logit contributions
 patched+0.065
 merit+0.061
minster+0.059
 consolation+0.058
 elig+0.058
Neg logit contributions
Rated-0.068
ENCY-0.062
 Draper-0.061
DEN-0.061
OWER-0.060
Token from
Token ID422
Feature activation+2.64e-02

Pos logit contributions
 patched+0.181
 merit+0.171
minster+0.166
 consolation+0.164
 elig+0.164
Neg logit contributions
Rated-0.167
ENCY-0.149
 Draper-0.148
DEN-0.148
OWER-0.144
Token a
Token ID257
Feature activation-5.10e-03

Pos logit contributions
 patched+0.048
 merit+0.045
minster+0.044
 consolation+0.044
bridge+0.043
Neg logit contributions
Rated-0.045
ENCY-0.040
DEN-0.040
 Draper-0.040
OWER-0.039
Token tiny
Token ID7009
Feature activation+2.50e-01

Pos logit contributions
Rated+0.009
ENCY+0.008
 Draper+0.008
DEN+0.008
OWER+0.008
Neg logit contributions
 patched-0.009
 merit-0.008
minster-0.008
 consolation-0.008
bridge-0.008
Token pond
Token ID16723
Feature activation+9.16e-01

Pos logit contributions
 patched+0.397
 merit+0.370
minster+0.358
 consolation+0.354
 elig+0.352
Neg logit contributions
Rated-0.482
ENCY-0.437
 Draper-0.435
DEN-0.433
OWER-0.424
Token.
Token ID13
Feature activation+4.04e-03

Pos logit contributions
 patched+1.632
 merit+1.539
 consolation+1.474
minster+1.473
bridge+1.473
Neg logit contributions
Rated-1.594
ENCY-1.432
DEN-1.408
 Draper-1.407
OWER-1.374
Token The
Token ID383
Feature activation+6.56e-02

Pos logit contributions
 patched+0.008
 merit+0.007
minster+0.007
 consolation+0.007
bridge+0.007
Neg logit contributions
Rated-0.006
ENCY-0.006
DEN-0.006
 Draper-0.006
OWER-0.006
Token skeptics
Token ID37972
Feature activation+1.72e-01

Pos logit contributions
 patched+0.114
 merit+0.107
minster+0.104
 consolation+0.103
bridge+0.103
Neg logit contributions
Rated-0.116
ENCY-0.104
 Draper-0.103
DEN-0.103
OWER-0.101
Token were
Token ID547
Feature activation+2.82e-02

Pos logit contributions
 patched+0.305
 merit+0.288
minster+0.280
 consolation+0.276
 elig+0.275
Neg logit contributions
Rated-0.296
ENCY-0.265
 Draper-0.264
DEN-0.263
OWER-0.257
Token abruptly
Token ID25891
Feature activation-4.88e-01

Pos logit contributions
 patched+0.050
 merit+0.047
minster+0.046
 consolation+0.045
 elig+0.045
Neg logit contributions
Rated-0.049
ENCY-0.044
 Draper-0.043
DEN-0.043
OWER-0.042
Token impression
Token ID10647
Feature activation+1.14e-01

Pos logit contributions
Rated+0.026
 Draper+0.024
ENCY+0.024
DEN+0.023
OWER+0.023
Neg logit contributions
 patched-0.021
 merit-0.019
minster-0.019
 elig-0.019
bridge-0.018
Token not
Token ID407
Feature activation+1.12e-01

Pos logit contributions
 patched+0.201
 merit+0.189
minster+0.184
 consolation+0.182
 elig+0.181
Neg logit contributions
Rated-0.199
ENCY-0.178
 Draper-0.177
DEN-0.176
OWER-0.172
Token only
Token ID691
Feature activation-9.98e-02

Pos logit contributions
 patched+0.214
 merit+0.202
minster+0.196
 consolation+0.195
bridge+0.194
Neg logit contributions
Rated-0.179
ENCY-0.159
 Draper-0.158
DEN-0.157
OWER-0.153
Token on
Token ID319
Feature activation+7.13e-02

Pos logit contributions
Rated+0.178
ENCY+0.159
 Draper+0.158
DEN+0.158
OWER+0.154
Neg logit contributions
 patched-0.172
 merit-0.162
minster-0.157
 consolation-0.155
 elig-0.155
Token her
Token ID607
Feature activation+3.94e-03

Pos logit contributions
 patched+0.131
 merit+0.123
minster+0.120
 consolation+0.119
bridge+0.118
Neg logit contributions
Rated-0.119
ENCY-0.107
 Draper-0.106
DEN-0.105
OWER-0.103
Token ce
Token ID2906
Feature activation+9.02e-01

Pos logit contributions
 patched+0.007
 merit+0.006
minster+0.006
bridge+0.006
 consolation+0.006
Neg logit contributions
Rated-0.007
ENCY-0.006
 Draper-0.006
DEN-0.006
OWER-0.006
Tokenint
Token ID600
Feature activation+7.17e-02

Pos logit contributions
 patched+1.246
 merit+1.155
minster+1.100
 consolation+1.098
bridge+1.090
Neg logit contributions
Rated-1.922
ENCY-1.764
 Draper-1.747
DEN-1.747
OWER-1.712
Tokenure
Token ID495
Feature activation+1.56e-01

Pos logit contributions
 patched+0.125
 merit+0.118
minster+0.114
 consolation+0.113
 elig+0.113
Neg logit contributions
Rated-0.126
ENCY-0.114
 Draper-0.112
DEN-0.112
OWER-0.110
Token,
Token ID11
Feature activation+5.19e-02

Pos logit contributions
 patched+0.284
 merit+0.267
minster+0.260
 consolation+0.257
 elig+0.256
Neg logit contributions
Rated-0.263
ENCY-0.236
 Draper-0.233
DEN-0.233
OWER-0.227
Token but
Token ID475
Feature activation-9.43e-02

Pos logit contributions
 patched+0.094
 merit+0.089
minster+0.086
 consolation+0.085
bridge+0.085
Neg logit contributions
Rated-0.088
ENCY-0.079
 Draper-0.078
DEN-0.078
OWER-0.076
Token on
Token ID319
Feature activation+5.91e-02

Pos logit contributions
Rated+0.170
ENCY+0.153
 Draper+0.151
DEN+0.151
OWER+0.148
Neg logit contributions
 patched-0.161
 merit-0.151
minster-0.147
 consolation-0.144
bridge-0.144
Token
Token ID251
Feature activation-2.20e-02

Pos logit contributions
 patched+0.015
 merit+0.015
minster+0.014
 consolation+0.014
 elig+0.014
Neg logit contributions
Rated-0.008
ENCY-0.007
 Draper-0.007
DEN-0.007
OWER-0.007
Token<|endoftext|>
Token ID50256
Feature activation-1.87e-01

Pos logit contributions
Rated+0.042
ENCY+0.038
 Draper+0.038
DEN+0.038
OWER+0.037
Neg logit contributions
 patched-0.035
 merit-0.033
minster-0.032
 consolation-0.032
 elig-0.031
TokenThe
Token ID464
Feature activation+1.34e-02

Pos logit contributions
Rated+0.292
ENCY+0.257
DEN+0.255
 Draper+0.250
OWER+0.246
Neg logit contributions
 patched-0.371
 merit-0.350
minster-0.340
 consolation-0.340
 elig-0.336
Token Geek
Token ID33639
Feature activation+7.81e-02

Pos logit contributions
 patched+0.024
 merit+0.022
minster+0.022
 consolation+0.021
 elig+0.021
Neg logit contributions
Rated-0.023
ENCY-0.021
 Draper-0.021
DEN-0.021
OWER-0.020
Token Z
Token ID1168
Feature activation+4.72e-01

Pos logit contributions
 patched+0.136
 merit+0.128
minster+0.124
 consolation+0.123
 elig+0.123
Neg logit contributions
Rated-0.139
ENCY-0.124
 Draper-0.123
DEN-0.123
OWER-0.120
Tokenodiac
Token ID40096
Feature activation+8.95e-01

Pos logit contributions
 patched+0.859
 merit+0.811
minster+0.784
 consolation+0.779
bridge+0.773
Neg logit contributions
Rated-0.800
ENCY-0.717
 Draper-0.709
DEN-0.707
OWER-0.692
Token
Token ID198
Feature activation+1.41e-01

Pos logit contributions
 patched+1.543
 merit+1.451
minster+1.399
 consolation+1.391
bridge+1.380
Neg logit contributions
Rated-1.598
ENCY-1.441
DEN-1.424
 Draper-1.421
OWER-1.395
Token
Token ID198
Feature activation+1.27e-01

Pos logit contributions
 patched+0.217
 merit+0.202
minster+0.194
 consolation+0.194
 elig+0.192
Neg logit contributions
Rated-0.280
ENCY-0.255
DEN-0.253
 Draper-0.252
OWER-0.247
TokenTHE
Token ID10970
Feature activation+1.16e-01

Pos logit contributions
 patched+0.246
 merit+0.232
minster+0.225
 consolation+0.225
 elig+0.223
Neg logit contributions
Rated-0.201
ENCY-0.178
DEN-0.176
 Draper-0.174
OWER-0.171
Token G
Token ID402
Feature activation-1.18e-01

Pos logit contributions
 patched+0.221
 merit+0.209
minster+0.203
 consolation+0.202
bridge+0.200
Neg logit contributions
Rated-0.187
ENCY-0.167
 Draper-0.165
DEN-0.165
OWER-0.161
TokenEEK
Token ID33823
Feature activation+4.71e-01

Pos logit contributions
Rated+0.168
ENCY+0.148
DEN+0.146
 Draper+0.145
OWER+0.142
Neg logit contributions
 patched-0.245
 merit-0.233
minster-0.227
 consolation-0.225
bridge-0.225
Token
Token ID198
Feature activation-3.10e-02

Pos logit contributions
Rated+0.143
ENCY+0.130
DEN+0.129
 Draper+0.129
OWER+0.126
Neg logit contributions
 patched-0.118
 merit-0.109
 elig-0.106
 consolation-0.105
minster-0.105
TokenPower
Token ID13434
Feature activation-2.05e-01

Pos logit contributions
Rated+0.046
ENCY+0.041
DEN+0.040
 Draper+0.040
OWER+0.039
Neg logit contributions
 patched-0.063
 merit-0.060
minster-0.058
 consolation-0.058
 elig-0.058
Token-
Token ID12
Feature activation-1.96e-01

Pos logit contributions
Rated+0.373
ENCY+0.338
 Draper+0.333
DEN+0.332
OWER+0.326
Neg logit contributions
 patched-0.348
 merit-0.328
minster-0.315
 consolation-0.314
 elig-0.312
Tokensharing
Token ID21987
Feature activation-1.98e-01

Pos logit contributions
Rated+0.330
ENCY+0.295
DEN+0.292
 Draper+0.291
OWER+0.284
Neg logit contributions
 patched-0.359
 merit-0.339
minster-0.327
 consolation-0.327
 elig-0.324
Token sc
Token ID629
Feature activation-1.63e-02

Pos logit contributions
Rated+0.365
ENCY+0.331
 Draper+0.330
DEN+0.327
OWER+0.321
Neg logit contributions
 patched-0.328
 merit-0.308
minster-0.299
 consolation-0.294
bridge-0.294
Tokenot
Token ID313
Feature activation+8.87e-01

Pos logit contributions
Rated+0.031
ENCY+0.028
 Draper+0.028
DEN+0.028
OWER+0.028
Neg logit contributions
 patched-0.026
 merit-0.024
minster-0.023
 consolation-0.023
 elig-0.023
Tokenched
Token ID1740
Feature activation+1.59e-01

Pos logit contributions
 patched+1.580
 merit+1.490
minster+1.438
 consolation+1.427
 elig+1.425
Neg logit contributions
Rated-1.533
ENCY-1.380
 Draper-1.362
DEN-1.359
OWER-1.331
Token
Token ID198
Feature activation+7.69e-02

Pos logit contributions
 patched+0.271
 merit+0.254
minster+0.246
 consolation+0.244
 elig+0.242
Neg logit contributions
Rated-0.288
ENCY-0.260
 Draper-0.258
DEN-0.257
OWER-0.252
Token
Token ID198
Feature activation+9.31e-02

Pos logit contributions
 patched+0.121
 merit+0.112
 consolation+0.108
 elig+0.108
minster+0.108
Neg logit contributions
Rated-0.151
ENCY-0.137
DEN-0.135
 Draper-0.135
OWER-0.133
TokenMr
Token ID5246
Feature activation-1.56e-01

Pos logit contributions
 patched+0.189
 merit+0.179
minster+0.175
 consolation+0.173
bridge+0.173
Neg logit contributions
Rated-0.137
ENCY-0.121
 Draper-0.120
DEN-0.119
OWER-0.116
Token Abdullah
Token ID26804
Feature activation+4.04e-01

Pos logit contributions
Rated+0.267
 Draper+0.241
ENCY+0.241
DEN+0.237
OWER+0.233
Neg logit contributions
 patched-0.277
 merit-0.262
minster-0.251
 consolation-0.251
 elig-0.251
Token
Token ID851
Feature activation+1.17e-01

Pos logit contributions
 patched+0.365
 merit+0.345
minster+0.334
 consolation+0.332
bridge+0.330
Neg logit contributions
Rated-0.319
ENCY-0.284
 Draper-0.281
DEN-0.281
OWER-0.274
Token on
Token ID319
Feature activation-8.55e-04

Pos logit contributions
 patched+0.206
 merit+0.193
minster+0.189
 consolation+0.186
bridge+0.185
Neg logit contributions
Rated-0.205
 Draper-0.183
ENCY-0.183
DEN-0.181
 Roose-0.177
Token taxes
Token ID5704
Feature activation-7.89e-02

Pos logit contributions
Rated+0.001
 Draper+0.001
ENCY+0.001
DEN+0.001
OWER+0.001
Neg logit contributions
 patched-0.002
minster-0.001
 merit-0.001
 consolation-0.001
 elig-0.001
Token and
Token ID290
Feature activation+5.83e-02

Pos logit contributions
Rated+0.133
ENCY+0.119
 Draper+0.117
DEN+0.117
OWER+0.114
Neg logit contributions
 patched-0.144
 merit-0.136
minster-0.132
 consolation-0.131
 elig-0.130
Token Social
Token ID5483
Feature activation-5.59e-02

Pos logit contributions
 patched+0.098
 merit+0.091
minster+0.089
 consolation+0.088
bridge+0.088
Neg logit contributions
Rated-0.106
ENCY-0.096
 Draper-0.095
DEN-0.095
OWER-0.092
Token Security
Token ID4765
Feature activation+8.78e-01

Pos logit contributions
Rated+0.101
ENCY+0.091
 Draper+0.090
DEN+0.090
OWER+0.088
Neg logit contributions
 patched-0.095
 merit-0.089
minster-0.086
 consolation-0.086
 elig-0.085
Token
Token ID851
Feature activation+7.42e-02

Pos logit contributions
 patched+1.565
 merit+1.480
 consolation+1.419
minster+1.419
bridge+1.406
Neg logit contributions
Rated-1.519
ENCY-1.364
 Draper-1.349
DEN-1.348
OWER-1.316
Token seriously
Token ID6411
Feature activation-2.33e-01

Pos logit contributions
 patched+0.143
 merit+0.136
minster+0.132
 consolation+0.131
bridge+0.130
Neg logit contributions
Rated-0.117
ENCY-0.104
 Draper-0.103
DEN-0.103
OWER-0.100
Token.
Token ID13
Feature activation+7.67e-02

Pos logit contributions
Rated+0.393
ENCY+0.349
 Draper+0.348
DEN+0.347
OWER+0.338
Neg logit contributions
 patched-0.427
 merit-0.401
minster-0.391
 consolation-0.386
 elig-0.385
Token And
Token ID843
Feature activation+7.14e-02

Pos logit contributions
 patched+0.137
 merit+0.129
minster+0.125
 consolation+0.124
 elig+0.123
Neg logit contributions
Rated-0.133
ENCY-0.119
 Draper-0.118
DEN-0.118
OWER-0.115
Token what
Token ID644
Feature activation-6.08e-03

Pos logit contributions
 patched+0.123
 merit+0.114
minster+0.113
bridge+0.110
 elig+0.110
Neg logit contributions
Rated-0.127
 Draper-0.115
ENCY-0.114
DEN-0.113
 Roose-0.111
Token spending
Token ID4581
Feature activation-1.38e-01

Pos logit contributions
 patched+0.358
 merit+0.334
minster+0.325
 consolation+0.322
 elig+0.321
Neg logit contributions
Rated-0.377
ENCY-0.341
 Draper-0.341
DEN-0.337
OWER-0.330
Token a
Token ID257
Feature activation-4.18e-02

Pos logit contributions
Rated+0.234
 Draper+0.211
ENCY+0.209
DEN+0.206
OWER+0.203
Neg logit contributions
 patched-0.246
 merit-0.231
minster-0.226
bridge-0.224
 elig-0.224
Token decade
Token ID5707
Feature activation+1.02e-01

Pos logit contributions
Rated+0.074
 Draper+0.067
ENCY+0.067
DEN+0.066
OWER+0.065
Neg logit contributions
 patched-0.071
 merit-0.066
minster-0.065
bridge-0.064
 elig-0.064
Token in
Token ID287
Feature activation-7.24e-02

Pos logit contributions
 patched+0.207
 merit+0.197
minster+0.193
 consolation+0.191
bridge+0.190
Neg logit contributions
Rated-0.148
ENCY-0.130
 Draper-0.129
DEN-0.129
OWER-0.125
Token Guantanamo
Token ID34378
Feature activation-5.67e-02

Pos logit contributions
Rated+0.134
 Draper+0.122
ENCY+0.121
DEN+0.119
OWER+0.118
Neg logit contributions
 patched-0.119
 merit-0.111
minster-0.108
 elig-0.106
 consolation-0.106
Token Bay
Token ID4696
Feature activation+8.66e-01

Pos logit contributions
Rated+0.096
ENCY+0.086
 Draper+0.085
DEN+0.085
OWER+0.083
Neg logit contributions
 patched-0.103
 merit-0.097
minster-0.094
 consolation-0.093
bridge-0.092
Token,
Token ID11
Feature activation-3.18e-02

Pos logit contributions
 patched+1.547
 merit+1.459
minster+1.408
 consolation+1.401
bridge+1.390
Neg logit contributions
Rated-1.497
ENCY-1.342
 Draper-1.327
DEN-1.325
OWER-1.294
Token Kh
Token ID5311
Feature activation+2.97e-01

Pos logit contributions
Rated+0.059
ENCY+0.054
 Draper+0.054
DEN+0.053
OWER+0.052
Neg logit contributions
 patched-0.052
 merit-0.048
minster-0.047
 consolation-0.046
 elig-0.046
Tokenadr
Token ID41909
Feature activation+3.02e-01

Pos logit contributions
 patched+0.571
 merit+0.538
minster+0.521
 consolation+0.520
 elig+0.516
Neg logit contributions
Rated-0.475
ENCY-0.420
 Draper-0.419
DEN-0.416
OWER-0.405
Token was
Token ID373
Feature activation+6.78e-02

Pos logit contributions
 patched+0.540
 merit+0.509
minster+0.494
 consolation+0.490
 elig+0.486
Neg logit contributions
Rated-0.519
ENCY-0.465
 Draper-0.462
DEN-0.461
OWER-0.450
Token sentenced
Token ID11897
Feature activation-4.30e-02

Pos logit contributions
 patched+0.099
 merit+0.093
minster+0.091
 consolation+0.089
bridge+0.089
Neg logit contributions
Rated-0.137
ENCY-0.124
 Draper-0.124
DEN-0.123
OWER-0.121
Token owns
Token ID12216
Feature activation+2.61e-01

Pos logit contributions
 patched+0.186
 merit+0.176
minster+0.170
 consolation+0.169
bridge+0.168
Neg logit contributions
Rated-0.160
ENCY-0.143
 Draper-0.141
DEN-0.141
OWER-0.137
Token four
Token ID1440
Feature activation-9.34e-02

Pos logit contributions
 patched+0.477
 merit+0.451
minster+0.436
 consolation+0.433
bridge+0.430
Neg logit contributions
Rated-0.441
ENCY-0.395
DEN-0.390
 Draper-0.390
OWER-0.381
Token numbered
Token ID25840
Feature activation+4.46e-02

Pos logit contributions
Rated+0.163
ENCY+0.147
 Draper+0.146
DEN+0.145
OWER+0.142
Neg logit contributions
 patched-0.165
 merit-0.155
minster-0.150
 consolation-0.149
bridge-0.148
Token Ontario
Token ID10553
Feature activation+2.68e-01

Pos logit contributions
 patched+0.071
 merit+0.066
minster+0.064
 consolation+0.063
bridge+0.063
Neg logit contributions
Rated-0.085
ENCY-0.077
 Draper-0.077
DEN-0.077
OWER-0.075
Token real
Token ID1103
Feature activation-9.71e-02

Pos logit contributions
 patched+0.457
 merit+0.429
minster+0.414
 consolation+0.411
bridge+0.408
Neg logit contributions
Rated-0.485
ENCY-0.437
 Draper-0.433
DEN-0.432
OWER-0.423
Token estate
Token ID7964
Feature activation+8.60e-01

Pos logit contributions
Rated+0.194
ENCY+0.177
 Draper+0.175
DEN+0.175
OWER+0.172
Neg logit contributions
 patched-0.147
 merit-0.136
minster-0.131
 consolation-0.130
 elig-0.129
Token holding
Token ID4769
Feature activation+3.59e-01

Pos logit contributions
 patched+1.467
 merit+1.385
 consolation+1.327
minster+1.324
bridge+1.311
Neg logit contributions
Rated-1.576
ENCY-1.411
DEN-1.394
 Draper-1.373
OWER-1.367
Token companies
Token ID2706
Feature activation+1.93e-01

Pos logit contributions
 patched+0.414
 merit+0.374
minster+0.359
 consolation+0.353
bridge+0.351
Neg logit contributions
Rated-0.841
ENCY-0.781
 Draper-0.777
DEN-0.775
OWER-0.762
Token with
Token ID351
Feature activation+2.97e-02

Pos logit contributions
 patched+0.361
 merit+0.341
minster+0.329
 consolation+0.327
bridge+0.326
Neg logit contributions
Rated-0.317
ENCY-0.283
DEN-0.280
 Draper-0.278
OWER-0.272
Token his
Token ID465
Feature activation+2.12e-02

Pos logit contributions
 patched+0.054
 merit+0.051
minster+0.049
 consolation+0.049
bridge+0.049
Neg logit contributions
Rated-0.050
ENCY-0.045
 Draper-0.045
DEN-0.044
OWER-0.043
Token siblings
Token ID20569
Feature activation-1.58e-01

Pos logit contributions
 patched+0.037
 merit+0.034
minster+0.034
 elig+0.033
 consolation+0.033
Neg logit contributions
Rated-0.037
 Draper-0.034
ENCY-0.033
DEN-0.033
OWER-0.032
Token Ryan
Token ID6047
Feature activation+1.45e-01

Pos logit contributions
Rated+0.034
ENCY+0.030
 Draper+0.030
DEN+0.030
OWER+0.029
Neg logit contributions
 patched-0.040
 merit-0.038
minster-0.037
 consolation-0.037
bridge-0.036
Token explained
Token ID4893
Feature activation-1.59e-02

Pos logit contributions
 patched+0.269
 merit+0.256
minster+0.248
 consolation+0.245
 elig+0.244
Neg logit contributions
Rated-0.238
 Draper-0.211
ENCY-0.211
DEN-0.209
OWER-0.204
Token.
Token ID13
Feature activation+6.81e-02

Pos logit contributions
Rated+0.026
ENCY+0.024
 Draper+0.023
DEN+0.023
OWER+0.023
Neg logit contributions
 patched-0.029
 merit-0.028
minster-0.027
 consolation-0.027
bridge-0.027
Token
Token ID198
Feature activation+5.69e-03

Pos logit contributions
 patched+0.113
 merit+0.106
minster+0.102
 consolation+0.101
 elig+0.100
Neg logit contributions
Rated-0.127
ENCY-0.115
 Draper-0.114
DEN-0.113
OWER-0.111
Token
Token ID198
Feature activation+3.59e-02

Pos logit contributions
 patched+0.009
 merit+0.008
 consolation+0.008
 elig+0.008
minster+0.008
Neg logit contributions
Rated-0.011
ENCY-0.010
DEN-0.010
 Draper-0.010
OWER-0.010
TokenBeat
Token ID34979
Feature activation+8.34e-01

Pos logit contributions
 patched+0.060
 merit+0.056
minster+0.054
 consolation+0.054
 elig+0.053
Neg logit contributions
Rated-0.067
ENCY-0.061
DEN-0.060
 Draper-0.060
OWER-0.059
Tokenty
Token ID774
Feature activation+1.06e-01

Pos logit contributions
 patched+1.475
 merit+1.388
minster+1.342
 consolation+1.333
bridge+1.324
Neg logit contributions
Rated-1.457
ENCY-1.309
 Draper-1.295
DEN-1.293
OWER-1.264
Token hesitated
Token ID41723
Feature activation+1.47e-01

Pos logit contributions
 patched+0.184
 merit+0.174
minster+0.169
 consolation+0.167
bridge+0.166
Neg logit contributions
Rated-0.188
ENCY-0.169
 Draper-0.168
DEN-0.167
OWER-0.163
Token twice
Token ID5403
Feature activation-3.37e-01

Pos logit contributions
 patched+0.265
 merit+0.249
minster+0.242
 consolation+0.239
 elig+0.238
Neg logit contributions
Rated-0.250
ENCY-0.224
 Draper-0.222
DEN-0.221
OWER-0.217
Token after
Token ID706
Feature activation+1.48e-01

Pos logit contributions
Rated+0.566
 Draper+0.504
ENCY+0.502
DEN+0.500
OWER+0.489
Neg logit contributions
 patched-0.616
 merit-0.581
minster-0.567
 elig-0.557
 consolation-0.557
Token opening
Token ID4756
Feature activation-1.47e-02

Pos logit contributions
 patched+0.255
 merit+0.239
minster+0.232
bridge+0.229
 consolation+0.229
Neg logit contributions
Rated-0.263
ENCY-0.236
 Draper-0.236
DEN-0.233
OWER-0.229
Token for
Token ID329
Feature activation+1.21e-01

Pos logit contributions
 patched+0.128
 merit+0.120
minster+0.118
bridge+0.116
 consolation+0.116
Neg logit contributions
Rated-0.119
ENCY-0.107
 Draper-0.106
DEN-0.105
OWER-0.103
Token play
Token ID711
Feature activation+2.36e-01

Pos logit contributions
 patched+0.201
 merit+0.188
minster+0.187
bridge+0.183
 consolation+0.181
Neg logit contributions
Rated-0.221
ENCY-0.199
 Draper-0.198
DEN-0.197
OWER-0.195
Token with
Token ID351
Feature activation+1.31e-01

Pos logit contributions
 patched+0.432
 merit+0.406
minster+0.396
 consolation+0.394
 elig+0.392
Neg logit contributions
Rated-0.394
ENCY-0.351
DEN-0.350
 Draper-0.348
OWER-0.343
Token the
Token ID262
Feature activation+5.71e-02

Pos logit contributions
 patched+0.202
 merit+0.189
minster+0.187
bridge+0.182
 consolation+0.181
Neg logit contributions
Rated-0.255
 Draper-0.231
ENCY-0.230
DEN-0.229
OWER-0.226
Token Wii
Token ID16591
Feature activation+2.60e-01

Pos logit contributions
 patched+0.080
 merit+0.074
minster+0.073
bridge+0.071
 consolation+0.071
Neg logit contributions
Rated-0.119
ENCY-0.108
 Draper-0.108
DEN-0.108
OWER-0.106
Token U
Token ID471
Feature activation+8.23e-01

Pos logit contributions
 patched+0.439
 merit+0.413
minster+0.401
 consolation+0.396
bridge+0.394
Neg logit contributions
Rated-0.473
ENCY-0.427
DEN-0.424
 Draper-0.423
OWER-0.416
Token Game
Token ID3776
Feature activation+3.34e-01

Pos logit contributions
 patched+1.508
 merit+1.421
minster+1.374
 consolation+1.366
bridge+1.357
Neg logit contributions
Rated-1.385
ENCY-1.239
 Draper-1.225
DEN-1.222
OWER-1.193
TokenPad
Token ID26114
Feature activation+1.07e-01

Pos logit contributions
 patched+0.632
 merit+0.598
minster+0.579
 consolation+0.576
bridge+0.572
Neg logit contributions
Rated-0.540
ENCY-0.481
 Draper-0.475
DEN-0.474
OWER-0.463
Token controller
Token ID10444
Feature activation-2.69e-02

Pos logit contributions
 patched+0.195
 merit+0.185
minster+0.180
 consolation+0.178
 elig+0.177
Neg logit contributions
Rated-0.179
ENCY-0.160
 Draper-0.159
DEN-0.159
OWER-0.156
Token.
Token ID13
Feature activation-9.09e-02

Pos logit contributions
Rated+0.045
ENCY+0.040
 Draper+0.040
DEN+0.039
OWER+0.039
Neg logit contributions
 patched-0.049
 merit-0.047
minster-0.046
 consolation-0.045
 elig-0.045
Token
Token ID198
Feature activation+6.22e-02

Pos logit contributions
Rated+0.165
ENCY+0.148
 Draper+0.147
DEN+0.147
OWER+0.144
Neg logit contributions
 patched-0.155
 merit-0.146
minster-0.140
 consolation-0.140
 elig-0.139
Token have
Token ID423
Feature activation+1.01e-01

Pos logit contributions
 patched+0.109
 merit+0.101
minster+0.100
 consolation+0.099
 elig+0.098
Neg logit contributions
Rated-0.110
ENCY-0.099
DEN-0.098
 Draper-0.098
OWER-0.096
Token to
Token ID284
Feature activation+1.15e-01

Pos logit contributions
 patched+0.187
 merit+0.176
minster+0.173
bridge+0.170
 consolation+0.170
Neg logit contributions
Rated-0.166
 Draper-0.147
ENCY-0.147
DEN-0.146
OWER-0.143
Token do
Token ID466
Feature activation+5.79e-02

Pos logit contributions
 patched+0.193
 merit+0.179
minster+0.178
 consolation+0.174
bridge+0.174
Neg logit contributions
Rated-0.207
ENCY-0.188
 Draper-0.185
DEN-0.185
OWER-0.181
Token more
Token ID517
Feature activation+1.26e-01

Pos logit contributions
 patched+0.102
 merit+0.095
minster+0.093
 consolation+0.092
bridge+0.091
Neg logit contributions
Rated-0.101
ENCY-0.091
 Draper-0.090
DEN-0.090
OWER-0.088
Token heavy
Token ID4334
Feature activation+6.97e-02

Pos logit contributions
 patched+0.236
 merit+0.222
minster+0.216
 consolation+0.214
bridge+0.213
Neg logit contributions
Rated-0.206
ENCY-0.184
 Draper-0.183
DEN-0.182
OWER-0.178
Token lifting
Token ID16842
Feature activation+8.18e-01

Pos logit contributions
 patched+0.125
 merit+0.118
 consolation+0.114
minster+0.113
 trickle+0.112
Neg logit contributions
Rated-0.121
ENCY-0.108
 Draper-0.107
DEN-0.107
OWER-0.104
Token with
Token ID351
Feature activation+1.67e-01

Pos logit contributions
 patched+1.483
 merit+1.398
minster+1.345
 consolation+1.343
 elig+1.330
Neg logit contributions
Rated-1.402
ENCY-1.256
DEN-1.238
 Draper-1.231
OWER-1.207
Token Ye
Token ID11609
Feature activation+1.81e-01

Pos logit contributions
 patched+0.314
 merit+0.296
minster+0.290
bridge+0.286
 consolation+0.285
Neg logit contributions
Rated-0.271
ENCY-0.242
 Draper-0.242
DEN-0.240
OWER-0.233
Tokenldon
Token ID25900
Feature activation-1.82e-01

Pos logit contributions
 patched+0.507
 merit+0.488
minster+0.478
 consolation+0.476
 elig+0.475
Neg logit contributions
Rated-0.130
ENCY-0.097
 Draper-0.096
DEN-0.093
OWER-0.088
Token likely
Token ID1884
Feature activation+7.60e-02

Pos logit contributions
Rated+0.309
ENCY+0.276
 Draper+0.275
DEN+0.274
OWER+0.267
Neg logit contributions
 patched-0.327
 merit-0.309
minster-0.300
 consolation-0.297
bridge-0.295
Token out
Token ID503
Feature activation+1.34e-02

Pos logit contributions
 patched+0.137
 merit+0.129
minster+0.126
 consolation+0.125
bridge+0.124
Neg logit contributions
Rated-0.129
ENCY-0.116
 Draper-0.115
DEN-0.115
OWER-0.112
Token.
Token ID13
Feature activation+1.44e-01

Pos logit contributions
Rated+0.005
ENCY+0.005
 Draper+0.005
DEN+0.005
OWER+0.005
Neg logit contributions
 patched-0.006
 merit-0.006
minster-0.005
 consolation-0.005
bridge-0.005
Token
Token ID198
Feature activation+6.04e-02

Pos logit contributions
 patched+0.265
 merit+0.250
minster+0.242
 consolation+0.240
bridge+0.240
Neg logit contributions
Rated-0.241
ENCY-0.216
DEN-0.214
 Draper-0.212
OWER-0.208
Token
Token ID198
Feature activation+7.66e-02

Pos logit contributions
 patched+0.094
 merit+0.087
 consolation+0.084
minster+0.084
 elig+0.084
Neg logit contributions
Rated-0.119
ENCY-0.108
DEN-0.107
 Draper-0.107
OWER-0.105
TokenY
Token ID56
Feature activation+1.30e-01

Pos logit contributions
 patched+0.149
 merit+0.141
minster+0.137
 consolation+0.136
bridge+0.136
Neg logit contributions
Rated-0.120
ENCY-0.106
 Draper-0.105
DEN-0.105
OWER-0.102
Tokenoy
Token ID726
Feature activation+2.98e-02

Pos logit contributions
 patched+0.240
 merit+0.227
minster+0.219
 consolation+0.218
 elig+0.217
Neg logit contributions
Rated-0.217
ENCY-0.194
 Draper-0.191
DEN-0.191
OWER-0.188
Tokenore
Token ID382
Feature activation+8.16e-01

Pos logit contributions
 patched+0.046
 merit+0.042
minster+0.040
 elig+0.040
 consolation+0.040
Neg logit contributions
Rated-0.059
ENCY-0.054
 Draper-0.053
DEN-0.053
OWER-0.053
Tokencreation
Token ID38793
Feature activation+3.47e-01

Pos logit contributions
 patched+1.452
 merit+1.367
minster+1.323
 consolation+1.315
bridge+1.306
Neg logit contributions
Rated-1.411
ENCY-1.271
 Draper-1.255
DEN-1.253
OWER-1.225
Token Garg
Token ID39013
Feature activation+2.86e-01

Pos logit contributions
 patched+0.612
 merit+0.576
minster+0.557
 consolation+0.553
 elig+0.550
Neg logit contributions
Rated-0.610
ENCY-0.547
 Draper-0.541
DEN-0.540
OWER-0.529
Tokenant
Token ID415
Feature activation-2.57e-01

Pos logit contributions
 patched+0.487
 merit+0.457
minster+0.441
 consolation+0.438
 elig+0.436
Neg logit contributions
Rated-0.516
ENCY-0.467
 Draper-0.461
DEN-0.460
OWER-0.451
Tokenua
Token ID6413
Feature activation+2.14e-02

Pos logit contributions
Rated+0.319
ENCY+0.276
 Draper+0.268
DEN+0.265
OWER+0.261
Neg logit contributions
 patched-0.584
 merit-0.556
minster-0.540
 consolation-0.540
 elig-0.539
Token
Token ID198
Feature activation+8.07e-02

Pos logit contributions
 patched+0.037
 merit+0.035
minster+0.034
 consolation+0.034
 elig+0.033
Neg logit contributions
Rated-0.038
ENCY-0.034
 Draper-0.034
DEN-0.034
OWER-0.033
Tokenyers
Token ID21200
Feature activation+2.53e-01

Pos logit contributions
Rated+0.481
ENCY+0.438
 Draper+0.434
DEN+0.434
OWER+0.425
Neg logit contributions
 patched-0.446
 merit-0.417
 consolation-0.401
minster-0.400
 elig-0.397
Token for
Token ID329
Feature activation-1.25e-02

Pos logit contributions
 patched+0.459
 merit+0.434
minster+0.421
 consolation+0.417
 elig+0.414
Neg logit contributions
Rated-0.429
ENCY-0.384
 Draper-0.381
DEN-0.378
OWER-0.370
Token the
Token ID262
Feature activation-1.63e-02

Pos logit contributions
Rated+0.021
 Draper+0.019
ENCY+0.019
DEN+0.018
OWER+0.018
Neg logit contributions
 patched-0.023
 merit-0.022
minster-0.021
 consolation-0.021
 elig-0.021
Token former
Token ID1966
Feature activation+1.23e-01

Pos logit contributions
Rated+0.029
 Draper+0.027
ENCY+0.027
DEN+0.026
OWER+0.026
Neg logit contributions
 patched-0.027
 merit-0.026
minster-0.025
 consolation-0.024
 elig-0.024
Token Guantanamo
Token ID34378
Feature activation-9.90e-02

Pos logit contributions
 patched+0.177
 merit+0.162
minster+0.161
 elig+0.156
bridge+0.155
Neg logit contributions
Rated-0.248
 Draper-0.232
ENCY-0.227
DEN-0.225
OWER-0.223
Token Bay
Token ID4696
Feature activation+8.11e-01

Pos logit contributions
Rated+0.175
ENCY+0.157
 Draper+0.156
DEN+0.155
OWER+0.152
Neg logit contributions
 patched-0.173
 merit-0.162
minster-0.158
 consolation-0.156
bridge-0.154
Token detainee
Token ID49436
Feature activation+1.64e-02

Pos logit contributions
 patched+0.937
 merit+0.848
minster+0.811
 consolation+0.797
bridge+0.789
Neg logit contributions
Rated-1.911
ENCY-1.767
 Draper-1.754
DEN-1.752
OWER-1.725
Token cons
Token ID762
Feature activation+4.34e-01

Pos logit contributions
 patched+0.028
 merit+0.027
minster+0.026
 consolation+0.025
bridge+0.025
Neg logit contributions
Rated-0.029
ENCY-0.026
 Draper-0.026
DEN-0.026
OWER-0.025
Tokenented
Token ID4714
Feature activation-1.73e-01

Pos logit contributions
 patched+0.808
 merit+0.762
minster+0.738
 consolation+0.734
bridge+0.729
Neg logit contributions
Rated-0.717
ENCY-0.641
 Draper-0.633
DEN-0.632
OWER-0.617
Token Thursday
Token ID3635
Feature activation-8.34e-03

Pos logit contributions
Rated+0.281
ENCY+0.251
 Draper+0.249
DEN+0.247
OWER+0.242
Neg logit contributions
 patched-0.325
 merit-0.307
minster-0.298
 consolation-0.296
bridge-0.295
Token to
Token ID284
Feature activation+9.77e-02

Pos logit contributions
Rated+0.013
ENCY+0.011
 Draper+0.011
DEN+0.011
OWER+0.011
Neg logit contributions
 patched-0.017
 merit-0.016
minster-0.015
 consolation-0.015
bridge-0.015
Token be
Token ID307
Feature activation-1.31e-01

Pos logit contributions
Rated+0.024
ENCY+0.022
DEN+0.022
 Draper+0.022
OWER+0.021
Neg logit contributions
 patched-0.022
 merit-0.021
minster-0.020
 elig-0.020
 consolation-0.020
Token more
Token ID517
Feature activation-6.75e-02

Pos logit contributions
Rated+0.218
ENCY+0.196
 Draper+0.195
DEN+0.192
OWER+0.188
Neg logit contributions
 patched-0.238
 merit-0.225
minster-0.221
 elig-0.218
bridge-0.217
Token TM
Token ID21232
Feature activation-8.80e-03

Pos logit contributions
Rated+0.123
ENCY+0.111
 Draper+0.110
DEN+0.110
OWER+0.107
Neg logit contributions
 patched-0.112
 merit-0.106
minster-0.103
 elig-0.102
 consolation-0.102
TokenNT
Token ID11251
Feature activation+3.16e-01

Pos logit contributions
Rated+0.008
ENCY+0.007
DEN+0.006
 Draper+0.006
OWER+0.006
Neg logit contributions
 patched-0.023
 merit-0.022
minster-0.021
 consolation-0.021
 elig-0.021
Token tid
Token ID29770
Feature activation+1.81e-02

Pos logit contributions
 patched+0.466
 merit+0.435
minster+0.418
 consolation+0.414
 elig+0.411
Neg logit contributions
Rated-0.643
ENCY-0.585
DEN-0.580
 Draper-0.580
OWER-0.569
Tokenbits
Token ID9895
Feature activation+8.04e-01

Pos logit contributions
 patched+0.012
 merit+0.011
minster+0.009
 consolation+0.009
 elig+0.009
Neg logit contributions
Rated-0.051
ENCY-0.048
 Draper-0.048
DEN-0.048
OWER-0.047
Token in
Token ID287
Feature activation+8.19e-02

Pos logit contributions
 patched+1.424
 merit+1.334
minster+1.287
 consolation+1.281
bridge+1.272
Neg logit contributions
Rated-1.410
ENCY-1.269
DEN-1.249
 Draper-1.248
OWER-1.222
Token the
Token ID262
Feature activation+7.58e-02

Pos logit contributions
 patched+0.155
 merit+0.147
minster+0.143
 elig+0.142
 consolation+0.141
Neg logit contributions
Rated-0.132
ENCY-0.117
 Draper-0.116
DEN-0.115
OWER-0.112
Token film
Token ID2646
Feature activation+3.99e-01

Pos logit contributions
 patched+0.135
 merit+0.129
minster+0.126
 elig+0.125
 consolation+0.124
Neg logit contributions
Rated-0.128
ENCY-0.114
 Draper-0.113
DEN-0.113
OWER-0.111
Token?
Token ID30
Feature activation+1.39e-01

Pos logit contributions
 patched+0.634
 merit+0.592
minster+0.570
 consolation+0.566
bridge+0.561
Neg logit contributions
Rated-0.769
ENCY-0.698
 Draper-0.691
DEN-0.690
OWER-0.676
Token Will
Token ID2561
Feature activation-5.00e-02

Pos logit contributions
 patched+0.260
 merit+0.245
minster+0.238
 consolation+0.237
 elig+0.236
Neg logit contributions
Rated-0.228
ENCY-0.202
 Draper-0.202
DEN-0.200
OWER-0.196
Token
Token ID198
Feature activation+9.01e-02

Pos logit contributions
 patched+0.157
 merit+0.146
minster+0.141
 consolation+0.140
bridge+0.139
Neg logit contributions
Rated-0.199
ENCY-0.181
 Draper-0.180
DEN-0.179
OWER-0.176
Token
Token ID198
Feature activation+9.25e-02

Pos logit contributions
 patched+0.141
 merit+0.131
 consolation+0.126
minster+0.126
 elig+0.126
Neg logit contributions
Rated-0.177
ENCY-0.160
DEN-0.159
 Draper-0.159
OWER-0.156
Token21
Token ID2481
Feature activation-6.81e-02

Pos logit contributions
 patched+0.175
 merit+0.165
minster+0.160
 consolation+0.159
 elig+0.158
Neg logit contributions
Rated-0.151
ENCY-0.134
DEN-0.132
 Draper-0.132
OWER-0.129
Token.
Token ID13
Feature activation-2.03e-02

Pos logit contributions
Rated+0.114
ENCY+0.102
 Draper+0.101
DEN+0.101
OWER+0.098
Neg logit contributions
 patched-0.126
 merit-0.118
minster-0.114
 consolation-0.114
 elig-0.114
Token Henry
Token ID8616
Feature activation-1.01e-01

Pos logit contributions
Rated+0.032
ENCY+0.028
 Draper+0.028
DEN+0.028
OWER+0.027
Neg logit contributions
 patched-0.039
 merit-0.037
minster-0.036
 consolation-0.036
 elig-0.036
Token Pyr
Token ID27958
Feature activation+7.99e-01

Pos logit contributions
Rated+0.150
ENCY+0.132
 Draper+0.131
DEN+0.130
OWER+0.127
Neg logit contributions
 patched-0.206
 merit-0.196
minster-0.190
 consolation-0.189
 elig-0.188
Tokeng
Token ID70
Feature activation-6.06e-02

Pos logit contributions
 patched+1.515
 merit+1.438
minster+1.392
bridge+1.382
 consolation+1.381
Neg logit contributions
Rated-1.285
ENCY-1.150
 Draper-1.132
DEN-1.129
OWER-1.104
Tokenos
Token ID418
Feature activation+4.57e-02

Pos logit contributions
Rated+0.085
ENCY+0.074
 Draper+0.074
DEN+0.073
OWER+0.072
Neg logit contributions
 patched-0.128
 merit-0.120
 elig-0.118
 consolation-0.117
minster-0.116
Token (
Token ID357
Feature activation-8.74e-02

Pos logit contributions
 patched+0.087
 merit+0.083
minster+0.080
 consolation+0.080
 elig+0.079
Neg logit contributions
Rated-0.073
ENCY-0.065
 Draper-0.064
DEN-0.064
OWER-0.063
Token116
Token ID18298
Feature activation+7.51e-02

Pos logit contributions
Rated+0.135
ENCY+0.119
 Draper+0.118
DEN+0.118
OWER+0.115
Neg logit contributions
 patched-0.172
 merit-0.163
minster-0.158
 consolation-0.158
 elig-0.157
Token)
Token ID8
Feature activation+9.91e-02

Pos logit contributions
 patched+0.117
 merit+0.109
 consolation+0.104
minster+0.104
 elig+0.104
Neg logit contributions
Rated-0.148
ENCY-0.134
DEN-0.133
 Draper-0.133
OWER-0.130
Token after
Token ID706
Feature activation+7.85e-02

Pos logit contributions
Rated+0.132
ENCY+0.120
 Draper+0.120
DEN+0.119
OWER+0.116
Neg logit contributions
 patched-0.111
 merit-0.104
minster-0.101
 consolation-0.100
 elig-0.099
Token the
Token ID262
Feature activation+4.52e-03

Pos logit contributions
 patched+0.143
 merit+0.135
minster+0.131
 consolation+0.130
 elig+0.129
Neg logit contributions
Rated-0.132
 Draper-0.118
ENCY-0.118
DEN-0.116
OWER-0.114
Token near
Token ID1474
Feature activation-1.85e-01

Pos logit contributions
 patched+0.007
 merit+0.006
minster+0.006
 elig+0.006
 consolation+0.006
Neg logit contributions
Rated-0.009
ENCY-0.008
 Draper-0.008
DEN-0.008
OWER-0.008
Token-
Token ID12
Feature activation+9.38e-03

Pos logit contributions
Rated+0.303
 Draper+0.270
ENCY+0.270
DEN+0.268
OWER+0.262
Neg logit contributions
 patched-0.345
 merit-0.326
minster-0.316
 elig-0.314
 consolation-0.313
Tokendis
Token ID6381
Feature activation+1.57e-01

Pos logit contributions
 patched+0.015
 merit+0.014
minster+0.014
 consolation+0.014
 elig+0.014
Neg logit contributions
Rated-0.018
ENCY-0.016
 Draper-0.016
DEN-0.016
OWER-0.016
Tokenaster
Token ID1603
Feature activation-9.69e-01

Pos logit contributions
 patched+0.299
 merit+0.283
minster+0.273
 consolation+0.272
 elig+0.271
Neg logit contributions
Rated-0.251
ENCY-0.224
 Draper-0.222
DEN-0.222
OWER-0.216
Token,
Token ID11
Feature activation-1.34e-02

Pos logit contributions
Rated+1.567
 Draper+1.400
ENCY+1.394
DEN+1.385
OWER+1.351
Neg logit contributions
 patched-1.792
 merit-1.702
minster-1.665
 elig-1.648
 consolation-1.641
Token the
Token ID262
Feature activation+3.53e-02

Pos logit contributions
Rated+0.022
 Draper+0.019
ENCY+0.019
DEN+0.019
OWER+0.019
Neg logit contributions
 patched-0.025
 merit-0.024
minster-0.023
 elig-0.023
 consolation-0.023
Token Australian
Token ID6638
Feature activation+2.52e-03

Pos logit contributions
 patched+0.060
 merit+0.056
minster+0.055
bridge+0.054
 elig+0.054
Neg logit contributions
Rated-0.064
ENCY-0.058
 Draper-0.058
DEN-0.057
OWER-0.056
Token Transport
Token ID19940
Feature activation-1.27e-01

Pos logit contributions
 patched+0.005
 merit+0.004
minster+0.004
 elig+0.004
 consolation+0.004
Neg logit contributions
Rated-0.004
ENCY-0.004
 Draper-0.004
DEN-0.004
OWER-0.004
Token Safety
Token ID11233
Feature activation+2.04e-01

Pos logit contributions
Rated+0.179
ENCY+0.156
 Draper+0.156
DEN+0.154
OWER+0.149
Neg logit contributions
 patched-0.268
 merit-0.254
minster-0.247
 consolation-0.246
 elig-0.246
Token situation
Token ID3074
Feature activation+5.06e-02

Pos logit contributions
Rated+0.085
ENCY+0.076
 Draper+0.075
DEN+0.075
OWER+0.073
Neg logit contributions
 patched-0.088
 merit-0.083
minster-0.080
 consolation-0.079
bridge-0.079
Token of
Token ID286
Feature activation-3.09e-02

Pos logit contributions
 patched+0.092
 merit+0.087
minster+0.084
 consolation+0.083
bridge+0.083
Neg logit contributions
Rated-0.086
ENCY-0.077
 Draper-0.076
DEN-0.076
OWER-0.074
Token the
Token ID262
Feature activation-8.95e-02

Pos logit contributions
Rated+0.054
ENCY+0.048
 Draper+0.048
DEN+0.048
OWER+0.047
Neg logit contributions
 patched-0.054
 merit-0.051
minster-0.050
bridge-0.049
 consolation-0.049
Token Bur
Token ID5481
Feature activation-8.55e-02

Pos logit contributions
Rated+0.147
ENCY+0.131
 Draper+0.129
DEN+0.129
OWER+0.126
Neg logit contributions
 patched-0.168
 merit-0.158
minster-0.154
 consolation-0.152
bridge-0.152
Tokenm
Token ID76
Feature activation-1.49e-01

Pos logit contributions
Rated+0.145
ENCY+0.130
DEN+0.128
 Draper+0.128
OWER+0.125
Neg logit contributions
 patched-0.156
 merit-0.147
minster-0.142
 consolation-0.141
 elig-0.140
Tokenese
Token ID2771
Feature activation-9.54e-01

Pos logit contributions
Rated+0.300
ENCY+0.273
 Draper+0.271
DEN+0.270
OWER+0.266
Neg logit contributions
 patched-0.224
 merit-0.208
minster-0.201
 consolation-0.199
 elig-0.197
Token people
Token ID661
Feature activation-7.31e-03

Pos logit contributions
Rated+1.750
ENCY+1.583
 Draper+1.576
DEN+1.572
OWER+1.528
Neg logit contributions
 patched-1.578
 merit-1.473
minster-1.447
 elig-1.420
 consolation-1.412
Token.
Token ID13
Feature activation-6.69e-02

Pos logit contributions
Rated+0.012
ENCY+0.011
 Draper+0.011
DEN+0.011
OWER+0.010
Neg logit contributions
 patched-0.014
 merit-0.013
minster-0.012
 consolation-0.012
bridge-0.012
Token Therefore
Token ID8447
Feature activation+3.17e-01

Pos logit contributions
Rated+0.113
ENCY+0.101
 Draper+0.100
DEN+0.100
OWER+0.098
Neg logit contributions
 patched-0.122
 merit-0.115
minster-0.111
 consolation-0.111
bridge-0.110
Token I
Token ID314
Feature activation-2.48e-03

Pos logit contributions
 patched+0.525
 merit+0.494
minster+0.473
 consolation+0.473
bridge+0.468
Neg logit contributions
Rated-0.589
ENCY-0.533
 Draper-0.527
DEN-0.526
OWER-0.517
Token want
Token ID765
Feature activation+1.01e-01

Pos logit contributions
Rated+0.005
ENCY+0.004
 Draper+0.004
DEN+0.004
OWER+0.004
Neg logit contributions
 patched-0.004
 merit-0.004
minster-0.004
 consolation-0.004
bridge-0.004
Token,
Token ID11
Feature activation+3.64e-02

Pos logit contributions
Rated+0.409
ENCY+0.363
DEN+0.361
 Draper+0.360
OWER+0.352
Neg logit contributions
 patched-0.470
 merit-0.443
minster-0.433
 consolation-0.427
 elig-0.427
Token while
Token ID981
Feature activation-1.67e-01

Pos logit contributions
 patched+0.066
 merit+0.062
minster+0.061
 consolation+0.060
bridge+0.059
Neg logit contributions
Rated-0.062
ENCY-0.055
 Draper-0.055
DEN-0.055
OWER-0.053
Token the
Token ID262
Feature activation+2.34e-02

Pos logit contributions
Rated+0.277
ENCY+0.251
 Draper+0.249
DEN+0.247
OWER+0.242
Neg logit contributions
 patched-0.303
 merit-0.283
minster-0.279
bridge-0.275
 elig-0.274
Token Bur
Token ID5481
Feature activation-2.59e-01

Pos logit contributions
 patched+0.043
 merit+0.040
minster+0.039
 consolation+0.039
bridge+0.039
Neg logit contributions
Rated-0.039
ENCY-0.035
 Draper-0.035
DEN-0.034
OWER-0.034
Tokenm
Token ID76
Feature activation-1.51e-01

Pos logit contributions
Rated+0.439
ENCY+0.396
DEN+0.394
 Draper+0.390
OWER+0.380
Neg logit contributions
 patched-0.467
 merit-0.440
minster-0.427
 elig-0.423
 consolation-0.422
Tokenese
Token ID2771
Feature activation-9.27e-01

Pos logit contributions
Rated+0.301
ENCY+0.273
 Draper+0.271
DEN+0.270
OWER+0.266
Neg logit contributions
 patched-0.230
 merit-0.213
minster-0.206
 consolation-0.204
 elig-0.203
Token authorities
Token ID4773
Feature activation-3.01e-01

Pos logit contributions
Rated+1.397
ENCY+1.258
 Draper+1.239
DEN+1.236
OWER+1.198
Neg logit contributions
 patched-1.811
 merit-1.713
minster-1.683
 elig-1.667
bridge-1.656
Token said
Token ID531
Feature activation+5.68e-02

Pos logit contributions
Rated+0.522
ENCY+0.470
 Draper+0.469
DEN+0.467
OWER+0.456
Neg logit contributions
 patched-0.518
 merit-0.490
minster-0.482
bridge-0.474
 elig-0.473
Token there
Token ID612
Feature activation-1.93e-01

Pos logit contributions
 patched+0.096
 merit+0.090
minster+0.089
bridge+0.087
 elig+0.087
Neg logit contributions
Rated-0.101
ENCY-0.092
 Draper-0.091
DEN-0.091
OWER-0.089
Token had
Token ID550
Feature activation+1.65e-01

Pos logit contributions
Rated+0.334
ENCY+0.302
DEN+0.298
 Draper+0.296
OWER+0.291
Neg logit contributions
 patched-0.340
 merit-0.320
minster-0.312
 consolation-0.309
bridge-0.308
Token been
Token ID587
Feature activation-6.04e-02

Pos logit contributions
 patched+0.272
 merit+0.255
minster+0.248
 consolation+0.246
 elig+0.244
Neg logit contributions
Rated-0.305
ENCY-0.276
DEN-0.273
 Draper-0.273
OWER-0.267
Token party
Token ID2151
Feature activation+1.51e-01

Pos logit contributions
 patched+0.328
 merit+0.308
minster+0.298
 consolation+0.295
bridge+0.294
Neg logit contributions
Rated-0.354
ENCY-0.320
 Draper-0.317
DEN-0.316
OWER-0.309
Token abandoned
Token ID9958
Feature activation+2.46e-01

Pos logit contributions
 patched+0.265
 merit+0.249
minster+0.241
 consolation+0.240
bridge+0.238
Neg logit contributions
Rated-0.265
ENCY-0.239
 Draper-0.236
DEN-0.236
OWER-0.230
Token President
Token ID1992
Feature activation+8.01e-02

Pos logit contributions
 patched+0.458
 merit+0.432
minster+0.419
 consolation+0.416
bridge+0.414
Neg logit contributions
Rated-0.405
ENCY-0.362
 Draper-0.358
DEN-0.358
OWER-0.349
Token Dil
Token ID29328
Feature activation-3.63e-01

Pos logit contributions
 patched+0.157
 merit+0.150
minster+0.146
 consolation+0.144
bridge+0.144
Neg logit contributions
Rated-0.124
ENCY-0.110
DEN-0.109
 Draper-0.107
OWER-0.106
Tokenma
Token ID2611
Feature activation+1.15e-01

Pos logit contributions
Rated+0.512
ENCY+0.454
DEN+0.441
 Draper+0.439
OWER+0.427
Neg logit contributions
 patched-0.764
 merit-0.726
minster-0.720
bridge-0.713
 trickle-0.700
Token Rousse
Token ID42849
Feature activation-9.07e-01

Pos logit contributions
 patched+0.296
 merit+0.282
 trickle+0.278
 consolation+0.275
 Sabha+0.271
Neg logit contributions
Rated-0.115
ENCY-0.091
DEN-0.088
OWER-0.086
 Draper-0.086
Tokenff
Token ID487
Feature activation-9.95e-02

Pos logit contributions
Rated+1.345
ENCY+1.181
DEN+1.176
 Draper+1.174
OWER+1.139
Neg logit contributions
 patched-1.829
 merit-1.739
minster-1.690
 consolation-1.689
 elig-1.675
Token
Token ID447
Feature activation+9.48e-02

Pos logit contributions
Rated+0.153
ENCY+0.135
 Draper+0.133
DEN+0.133
OWER+0.130
Neg logit contributions
 patched-0.197
 merit-0.186
minster-0.181
 consolation-0.180
bridge-0.179
Token
Token ID247
Feature activation+8.26e-02

Pos logit contributions
 patched+0.185
 merit+0.176
 consolation+0.170
minster+0.170
bridge+0.169
Neg logit contributions
Rated-0.146
ENCY-0.130
 Draper-0.129
DEN-0.129
OWER-0.125
Tokens
Token ID82
Feature activation-9.53e-02

Pos logit contributions
 patched+0.144
 merit+0.136
minster+0.131
 consolation+0.131
bridge+0.130
Neg logit contributions
Rated-0.146
ENCY-0.131
 Draper-0.130
DEN-0.129
OWER-0.127
Token governing
Token ID15030
Feature activation+2.90e-02

Pos logit contributions
Rated+0.164
ENCY+0.147
 Draper+0.145
DEN+0.145
OWER+0.142
Neg logit contributions
 patched-0.171
 merit-0.161
minster-0.156
 consolation-0.154
bridge-0.154
Token Irvine
Token ID37277
Feature activation+7.94e-04

Pos logit contributions
Rated+0.001
ENCY+0.001
 Draper+0.001
DEN+0.001
OWER+0.001
Neg logit contributions
 patched-0.002
 merit-0.002
minster-0.002
 consolation-0.002
bridge-0.002
Token,
Token ID11
Feature activation-5.01e-02

Pos logit contributions
 patched+0.001
 merit+0.001
minster+0.001
 consolation+0.001
 elig+0.001
Neg logit contributions
Rated-0.001
ENCY-0.001
 Draper-0.001
DEN-0.001
OWER-0.001
Token California
Token ID3442
Feature activation+2.09e-01

Pos logit contributions
Rated+0.074
ENCY+0.066
DEN+0.065
 Draper+0.065
OWER+0.063
Neg logit contributions
 patched-0.102
 merit-0.097
 consolation-0.093
minster-0.093
 elig-0.093
Token.
Token ID13
Feature activation-1.36e-01

Pos logit contributions
 patched+0.373
 merit+0.351
minster+0.339
 consolation+0.338
bridge+0.336
Neg logit contributions
Rated-0.360
ENCY-0.324
DEN-0.321
 Draper-0.320
OWER-0.313
Token Hide
Token ID10415
Feature activation+1.85e-01

Pos logit contributions
Rated+0.260
ENCY+0.235
 Draper+0.235
DEN+0.232
OWER+0.228
Neg logit contributions
 patched-0.218
 merit-0.204
minster-0.196
 consolation-0.196
 elig-0.195
Token Caption
Token ID11260
Feature activation-8.96e-01

Pos logit contributions
 patched+0.386
 merit+0.366
minster+0.356
 consolation+0.354
 elig+0.352
Neg logit contributions
Rated-0.265
ENCY-0.232
 Draper-0.229
DEN-0.228
OWER-0.222
Token 10
Token ID838
Feature activation-4.02e-02

Pos logit contributions
Rated+1.544
ENCY+1.389
 Draper+1.386
DEN+1.350
 Roose+1.349
Neg logit contributions
 patched-1.613
 merit-1.509
 elig-1.471
minster-1.456
 consolation-1.453
Token of
Token ID286
Feature activation-3.98e-02

Pos logit contributions
Rated+0.070
ENCY+0.062
 Draper+0.062
DEN+0.062
OWER+0.060
Neg logit contributions
 patched-0.072
 merit-0.067
minster-0.065
 consolation-0.065
 elig-0.065
Token 15
Token ID1315
Feature activation-8.15e-03

Pos logit contributions
Rated+0.074
ENCY+0.067
 Draper+0.066
DEN+0.066
OWER+0.065
Neg logit contributions
 patched-0.066
 merit-0.062
minster-0.060
 consolation-0.059
 elig-0.059
Token Photos
Token ID9434
Feature activation-2.97e-01

Pos logit contributions
Rated+0.012
 Draper+0.010
ENCY+0.010
DEN+0.010
OWER+0.010
Neg logit contributions
 patched-0.017
 merit-0.016
 elig-0.016
 consolation-0.016
minster-0.016
Token:
Token ID25
Feature activation+7.58e-02

Pos logit contributions
Rated+0.448
ENCY+0.392
 Draper+0.388
DEN+0.388
OWER+0.376
Neg logit contributions
 patched-0.601
 merit-0.566
 consolation-0.552
 elig-0.551
minster-0.550
Token Irvine
Token ID37277
Feature activation-7.92e-02

Pos logit contributions
 patched+0.048
 merit+0.045
minster+0.044
 consolation+0.044
 elig+0.044
Neg logit contributions
Rated-0.023
ENCY-0.019
 Draper-0.019
DEN-0.019
OWER-0.018
Token,
Token ID11
Feature activation-8.96e-03

Pos logit contributions
Rated+0.133
ENCY+0.119
 Draper+0.118
DEN+0.117
OWER+0.115
Neg logit contributions
 patched-0.145
 merit-0.137
minster-0.133
 consolation-0.132
bridge-0.131
Token California
Token ID3442
Feature activation+2.66e-01

Pos logit contributions
Rated+0.013
ENCY+0.012
DEN+0.012
 Draper+0.012
OWER+0.011
Neg logit contributions
 patched-0.018
 merit-0.017
minster-0.017
 consolation-0.017
bridge-0.017
Token.
Token ID13
Feature activation-1.59e-01

Pos logit contributions
 patched+0.468
 merit+0.439
minster+0.423
 consolation+0.423
bridge+0.420
Neg logit contributions
Rated-0.466
ENCY-0.422
DEN-0.418
 Draper-0.414
OWER-0.406
Token Hide
Token ID10415
Feature activation+2.15e-01

Pos logit contributions
Rated+0.306
ENCY+0.277
 Draper+0.276
DEN+0.274
OWER+0.270
Neg logit contributions
 patched-0.252
 merit-0.236
minster-0.227
 consolation-0.225
 elig-0.224
Token Caption
Token ID11260
Feature activation-8.90e-01

Pos logit contributions
 patched+0.446
 merit+0.423
minster+0.411
 consolation+0.409
 elig+0.407
Neg logit contributions
Rated-0.309
ENCY-0.271
 Draper-0.268
DEN-0.267
OWER-0.260
Token 9
Token ID860
Feature activation+1.80e-02

Pos logit contributions
Rated+1.701
ENCY+1.542
 Draper+1.536
DEN+1.518
OWER+1.489
Neg logit contributions
 patched-1.434
 merit-1.329
 elig-1.292
 consolation-1.290
minster-1.283
Token of
Token ID286
Feature activation+1.90e-02

Pos logit contributions
 patched+0.032
 merit+0.030
minster+0.029
 consolation+0.029
 elig+0.029
Neg logit contributions
Rated-0.031
ENCY-0.028
 Draper-0.028
DEN-0.028
OWER-0.027
Token 15
Token ID1315
Feature activation+5.23e-02

Pos logit contributions
 patched+0.032
 merit+0.029
minster+0.028
 consolation+0.028
 elig+0.028
Neg logit contributions
Rated-0.035
ENCY-0.032
 Draper-0.032
DEN-0.032
OWER-0.031
Token Photos
Token ID9434
Feature activation-2.56e-01

Pos logit contributions
 patched+0.109
 merit+0.103
 elig+0.100
 consolation+0.100
minster+0.100
Neg logit contributions
Rated-0.076
ENCY-0.066
 Draper-0.066
DEN-0.065
OWER-0.064
Token:
Token ID25
Feature activation+1.14e-01

Pos logit contributions
Rated+0.388
ENCY+0.339
 Draper+0.336
DEN+0.335
OWER+0.325
Neg logit contributions
 patched-0.519
 merit-0.488
 consolation-0.476
 elig-0.475
minster-0.474
Token:
Token ID25
Feature activation-4.77e-02

Pos logit contributions
 patched+0.507
 merit+0.480
minster+0.464
 consolation+0.462
bridge+0.460
Neg logit contributions
Rated-0.430
ENCY-0.383
 Draper-0.378
DEN-0.378
OWER-0.369
Token (
Token ID357
Feature activation-8.85e-03

Pos logit contributions
Rated+0.078
ENCY+0.069
 Draper+0.068
DEN+0.068
OWER+0.067
Neg logit contributions
 patched-0.090
 merit-0.085
minster-0.083
 consolation-0.082
bridge-0.082
TokenExample
Token ID16281
Feature activation-1.23e-01

Pos logit contributions
Rated+0.013
ENCY+0.012
 Draper+0.012
DEN+0.012
OWER+0.011
Neg logit contributions
 patched-0.018
 merit-0.017
minster-0.016
 consolation-0.016
bridge-0.016
Token:
Token ID25
Feature activation-1.37e-02

Pos logit contributions
Rated+0.190
ENCY+0.168
 Draper+0.166
DEN+0.166
OWER+0.161
Neg logit contributions
 patched-0.243
 merit-0.230
minster-0.224
 consolation-0.222
 elig-0.221
Token Th
Token ID536
Feature activation+2.24e-01

Pos logit contributions
Rated+0.023
ENCY+0.020
DEN+0.020
 Draper+0.020
OWER+0.020
Neg logit contributions
 patched-0.025
 merit-0.024
minster-0.023
 consolation-0.023
bridge-0.023
Tokenurb
Token ID5945
Feature activation-8.89e-01

Pos logit contributions
 patched+0.396
 merit+0.372
minster+0.360
 consolation+0.358
bridge+0.356
Neg logit contributions
Rated-0.391
ENCY-0.351
 Draper-0.348
DEN-0.347
OWER-0.339
Tokenion
Token ID295
Feature activation-4.09e-01

Pos logit contributions
Rated+1.525
ENCY+1.367
 Draper+1.352
DEN+1.350
OWER+1.320
Neg logit contributions
 patched-1.599
 merit-1.507
minster-1.455
 consolation-1.449
 elig-1.440
Token the
Token ID262
Feature activation-3.20e-02

Pos logit contributions
Rated+0.717
ENCY+0.645
 Draper+0.638
DEN+0.637
OWER+0.623
Neg logit contributions
 patched-0.721
 merit-0.679
minster-0.656
 consolation-0.652
bridge-0.647
Token Paladin
Token ID26693
Feature activation-1.72e-01

Pos logit contributions
Rated+0.055
ENCY+0.050
 Draper+0.049
DEN+0.049
OWER+0.048
Neg logit contributions
 patched-0.057
 merit-0.054
minster-0.052
 consolation-0.051
bridge-0.051
Token became
Token ID2627
Feature activation+2.41e-02

Pos logit contributions
Rated+0.309
ENCY+0.279
 Draper+0.275
DEN+0.275
OWER+0.269
Neg logit contributions
 patched-0.297
 merit-0.278
minster-0.269
 consolation-0.267
bridge-0.265
Token a
Token ID257
Feature activation-5.24e-02

Pos logit contributions
 patched+0.043
 merit+0.040
minster+0.039
 consolation+0.039
 elig+0.038
Neg logit contributions
Rated-0.042
ENCY-0.038
 Draper-0.037
DEN-0.037
OWER-0.036
Token Women
Token ID6926
Feature activation-3.45e-01

Pos logit contributions
Rated+0.076
ENCY+0.068
 Draper+0.068
DEN+0.067
OWER+0.066
Neg logit contributions
 patched-0.067
 merit-0.063
minster-0.061
 elig-0.060
 consolation-0.060
Token in
Token ID287
Feature activation-1.71e-02

Pos logit contributions
Rated+0.564
 Draper+0.503
ENCY+0.501
DEN+0.497
OWER+0.487
Neg logit contributions
 patched-0.643
 merit-0.604
minster-0.592
 consolation-0.585
 elig-0.583
Token politics
Token ID4819
Feature activation-9.66e-02

Pos logit contributions
Rated+0.028
ENCY+0.025
 Draper+0.024
DEN+0.024
OWER+0.024
Neg logit contributions
 patched-0.032
 merit-0.031
minster-0.030
 consolation-0.030
 elig-0.029
Token.
Token ID13
Feature activation-8.53e-02

Pos logit contributions
Rated+0.161
 Draper+0.143
ENCY+0.143
DEN+0.142
OWER+0.139
Neg logit contributions
 patched-0.177
 merit-0.167
minster-0.163
 consolation-0.161
 elig-0.161
Token Hide
Token ID10415
Feature activation+2.45e-01

Pos logit contributions
Rated+0.170
ENCY+0.155
 Draper+0.154
DEN+0.154
OWER+0.151
Neg logit contributions
 patched-0.129
 merit-0.121
minster-0.116
 consolation-0.115
 elig-0.114
Token Caption
Token ID11260
Feature activation-8.75e-01

Pos logit contributions
 patched+0.505
 merit+0.480
minster+0.467
 consolation+0.464
bridge+0.462
Neg logit contributions
Rated-0.355
ENCY-0.312
 Draper-0.308
DEN-0.307
OWER-0.299
Token 7
Token ID767
Feature activation-7.71e-02

Pos logit contributions
Rated+1.724
 Draper+1.578
ENCY+1.573
DEN+1.551
OWER+1.527
Neg logit contributions
 patched-1.342
 merit-1.247
 elig-1.211
 consolation-1.201
minster-1.193
Token of
Token ID286
Feature activation+2.43e-02

Pos logit contributions
Rated+0.133
ENCY+0.118
 Draper+0.117
DEN+0.117
OWER+0.114
Neg logit contributions
 patched-0.139
 merit-0.130
 elig-0.126
minster-0.126
 consolation-0.126
Token 15
Token ID1315
Feature activation+7.38e-02

Pos logit contributions
 patched+0.040
 merit+0.037
minster+0.036
 consolation+0.036
 elig+0.036
Neg logit contributions
Rated-0.046
ENCY-0.041
 Draper-0.041
DEN-0.041
OWER-0.040
Token Photos
Token ID9434
Feature activation-2.33e-01

Pos logit contributions
 patched+0.154
 merit+0.145
 elig+0.141
 consolation+0.141
minster+0.141
Neg logit contributions
Rated-0.107
 Draper-0.093
ENCY-0.093
DEN-0.092
OWER-0.090
Token:
Token ID25
Feature activation+1.24e-01

Pos logit contributions
Rated+0.354
ENCY+0.310
 Draper+0.307
DEN+0.306
OWER+0.297
Neg logit contributions
 patched-0.471
 merit-0.444
 consolation-0.432
 elig-0.431
minster-0.430
Token Community
Token ID8108
Feature activation-2.26e-01

Pos logit contributions
Rated+0.474
ENCY+0.423
 Draper+0.422
DEN+0.419
OWER+0.411
Neg logit contributions
 patched-0.508
 merit-0.475
minster-0.462
 consolation-0.457
 elig-0.456
Token Newspaper
Token ID49598
Feature activation-3.10e-01

Pos logit contributions
Rated+0.434
ENCY+0.391
 Draper+0.390
DEN+0.386
OWER+0.382
Neg logit contributions
 patched-0.362
 merit-0.338
minster-0.327
 consolation-0.324
 elig-0.322
Token Group
Token ID4912
Feature activation-1.60e-01

Pos logit contributions
Rated+0.615
 Draper+0.559
ENCY+0.559
DEN+0.545
OWER+0.544
Neg logit contributions
 patched-0.473
 merit-0.442
minster-0.428
 consolation-0.423
 elig-0.421
Token<|endoftext|>
Token ID50256
Feature activation-1.53e-01

Pos logit contributions
Rated+0.275
ENCY+0.247
 Draper+0.244
DEN+0.243
OWER+0.239
Neg logit contributions
 patched-0.287
 merit-0.270
minster-0.262
 consolation-0.260
 elig-0.258
TokenW
Token ID54
Feature activation-8.24e-02

Pos logit contributions
Rated+0.239
ENCY+0.210
DEN+0.209
 Draper+0.206
OWER+0.202
Neg logit contributions
 patched-0.303
 merit-0.286
minster-0.277
 consolation-0.277
 elig-0.274
Tokenired
Token ID1202
Feature activation-8.67e-01

Pos logit contributions
Rated+0.128
ENCY+0.113
 Draper+0.112
DEN+0.112
OWER+0.110
Neg logit contributions
 patched-0.162
 merit-0.153
minster-0.148
 consolation-0.148
 elig-0.147
Token Equ
Token ID7889
Feature activation-9.09e-03

Pos logit contributions
Rated+1.516
ENCY+1.342
 Draper+1.341
DEN+1.339
 Author+1.293
Neg logit contributions
 patched-1.517
 merit-1.439
 elig-1.393
 consolation-1.390
minster-1.387
Tokenivalent
Token ID29540
Feature activation-1.69e-01

Pos logit contributions
Rated+0.016
ENCY+0.015
 Draper+0.015
DEN+0.014
OWER+0.014
Neg logit contributions
 patched-0.016
 merit-0.015
minster-0.014
 consolation-0.014
 elig-0.014
Token Privacy
Token ID16777
Feature activation+8.20e-02

Pos logit contributions
Rated+0.292
ENCY+0.261
 Draper+0.260
DEN+0.256
OWER+0.250
Neg logit contributions
 patched-0.300
 merit-0.282
minster-0.274
 consolation-0.272
 elig-0.271
Token (
Token ID357
Feature activation-3.54e-02

Pos logit contributions
 patched+0.142
 merit+0.134
minster+0.130
 consolation+0.129
 elig+0.128
Neg logit contributions
Rated-0.146
ENCY-0.131
 Draper-0.130
DEN-0.130
OWER-0.127
TokenWE
Token ID8845
Feature activation-1.71e-01

Pos logit contributions
Rated+0.058
ENCY+0.052
DEN+0.051
 Draper+0.051
OWER+0.050
Neg logit contributions
 patched-0.066
 merit-0.063
minster-0.061
 consolation-0.061
 elig-0.060
Token something
Token ID1223
Feature activation-1.75e-01

Pos logit contributions
Rated+0.253
ENCY+0.229
 Draper+0.226
DEN+0.226
OWER+0.221
Neg logit contributions
 patched-0.229
 merit-0.215
minster-0.207
 consolation-0.206
bridge-0.204
Token like
Token ID588
Feature activation-2.32e-01

Pos logit contributions
Rated+0.306
ENCY+0.275
 Draper+0.273
DEN+0.272
OWER+0.266
Neg logit contributions
 patched-0.309
 merit-0.291
minster-0.282
 consolation-0.279
 elig-0.278
Token it
Token ID340
Feature activation-7.91e-02

Pos logit contributions
Rated+0.409
ENCY+0.369
 Draper+0.366
DEN+0.364
OWER+0.357
Neg logit contributions
 patched-0.404
 merit-0.379
minster-0.367
 consolation-0.364
 elig-0.362
Token (
Token ID357
Feature activation-2.23e-02

Pos logit contributions
Rated+0.133
ENCY+0.119
 Draper+0.118
DEN+0.118
OWER+0.115
Neg logit contributions
 patched-0.145
 merit-0.136
minster-0.132
 consolation-0.131
bridge-0.130
TokenB
Token ID33
Feature activation-8.24e-02

Pos logit contributions
Rated+0.039
ENCY+0.035
 Draper+0.034
DEN+0.034
OWER+0.033
Neg logit contributions
 patched-0.040
 merit-0.037
minster-0.036
 consolation-0.036
 elig-0.036
Tokenab
Token ID397
Feature activation-8.48e-01

Pos logit contributions
Rated+0.138
ENCY+0.123
 Draper+0.121
DEN+0.121
OWER+0.119
Neg logit contributions
 patched-0.152
 merit-0.143
minster-0.138
 consolation-0.138
 elig-0.137
Tokenylon
Token ID15158
Feature activation-5.75e-01

Pos logit contributions
Rated+0.738
 Draper+0.579
ENCY+0.576
DEN+0.561
 Roose+0.534
Neg logit contributions
 patched-2.235
 merit-2.151
 elig-2.082
minster-2.081
 consolation-2.076
Token was
Token ID373
Feature activation-1.24e-01

Pos logit contributions
Rated+1.014
ENCY+0.912
 Draper+0.905
DEN+0.900
OWER+0.880
Neg logit contributions
 patched-0.993
 merit-0.936
minster-0.904
 consolation-0.900
 elig-0.897
Token under
Token ID739
Feature activation-8.60e-02

Pos logit contributions
Rated+0.213
ENCY+0.191
 Draper+0.189
DEN+0.189
OWER+0.185
Neg logit contributions
 patched-0.222
 merit-0.209
minster-0.202
 consolation-0.201
bridge-0.199
Token Persian
Token ID22035
Feature activation-1.49e-01

Pos logit contributions
Rated+0.160
ENCY+0.145
 Draper+0.144
DEN+0.144
OWER+0.141
Neg logit contributions
 patched-0.142
 merit-0.133
minster-0.128
 consolation-0.127
bridge-0.126
Token control
Token ID1630
Feature activation-1.95e-01

Pos logit contributions
Rated+0.253
ENCY+0.227
 Draper+0.225
DEN+0.223
OWER+0.218
Neg logit contributions
 patched-0.269
 merit-0.252
minster-0.245
 elig-0.243
 consolation-0.242
Token Hills
Token ID14379
Feature activation+4.66e-03

Pos logit contributions
Rated+0.267
ENCY+0.239
 Draper+0.236
DEN+0.235
OWER+0.230
Neg logit contributions
 patched-0.290
 merit-0.274
minster-0.265
 consolation-0.263
bridge-0.262
Token,
Token ID11
Feature activation-5.30e-02

Pos logit contributions
 patched+0.008
 merit+0.008
minster+0.008
 consolation+0.008
bridge+0.008
Neg logit contributions
Rated-0.008
ENCY-0.007
DEN-0.007
 Draper-0.007
OWER-0.007
Token California
Token ID3442
Feature activation+2.01e-01

Pos logit contributions
Rated+0.076
ENCY+0.067
DEN+0.066
 Draper+0.066
OWER+0.064
Neg logit contributions
 patched-0.110
 merit-0.104
minster-0.101
 consolation-0.101
 elig-0.100
Token.
Token ID13
Feature activation-9.94e-02

Pos logit contributions
 patched+0.354
 merit+0.333
minster+0.321
 consolation+0.320
bridge+0.318
Neg logit contributions
Rated-0.351
ENCY-0.316
DEN-0.314
 Draper-0.312
OWER-0.305
Token Hide
Token ID10415
Feature activation+1.78e-01

Pos logit contributions
Rated+0.203
ENCY+0.185
 Draper+0.184
DEN+0.183
OWER+0.180
Neg logit contributions
 patched-0.147
 merit-0.137
minster-0.131
 consolation-0.130
 elig-0.129
Token Caption
Token ID11260
Feature activation-8.46e-01

Pos logit contributions
 patched+0.371
 merit+0.352
minster+0.342
 consolation+0.341
 elig+0.339
Neg logit contributions
Rated-0.256
ENCY-0.224
 Draper-0.221
DEN-0.221
OWER-0.214
Token 14
Token ID1478
Feature activation-1.05e-02

Pos logit contributions
Rated+1.604
ENCY+1.461
 Draper+1.456
DEN+1.424
 Author+1.418
Neg logit contributions
 patched-1.388
 merit-1.266
 facult-1.249
 arrang-1.239
 elig-1.237
Token of
Token ID286
Feature activation+3.15e-03

Pos logit contributions
Rated+0.018
ENCY+0.017
 Draper+0.016
DEN+0.016
OWER+0.016
Neg logit contributions
 patched-0.019
 merit-0.017
minster-0.017
 consolation-0.017
 elig-0.017
Token 15
Token ID1315
Feature activation+3.67e-03

Pos logit contributions
 patched+0.005
 merit+0.005
minster+0.005
 consolation+0.005
 elig+0.005
Neg logit contributions
Rated-0.006
ENCY-0.005
 Draper-0.005
DEN-0.005
OWER-0.005
Token Photos
Token ID9434
Feature activation-2.84e-01

Pos logit contributions
 patched+0.008
 merit+0.007
 elig+0.007
 consolation+0.007
minster+0.007
Neg logit contributions
Rated-0.005
 Draper-0.005
ENCY-0.005
DEN-0.005
OWER-0.004
Token:
Token ID25
Feature activation+1.18e-01

Pos logit contributions
Rated+0.428
ENCY+0.374
 Draper+0.371
DEN+0.370
OWER+0.359
Neg logit contributions
 patched-0.576
 merit-0.542
 consolation-0.529
 elig-0.528
minster-0.526
Token.
Token ID13
Feature activation-9.00e-02

Pos logit contributions
Rated+0.217
ENCY+0.194
DEN+0.193
 Draper+0.192
OWER+0.187
Neg logit contributions
 patched-0.226
 merit-0.213
minster-0.207
 consolation-0.205
bridge-0.204
Token Guess
Token ID37571
Feature activation+1.56e-01

Pos logit contributions
Rated+0.146
ENCY+0.130
 Draper+0.128
DEN+0.128
OWER+0.125
Neg logit contributions
 patched-0.170
 merit-0.161
minster-0.156
 consolation-0.155
bridge-0.154
Token what
Token ID644
Feature activation-1.76e-01

Pos logit contributions
 patched+0.229
 merit+0.212
minster+0.205
 consolation+0.202
 elig+0.201
Neg logit contributions
Rated-0.320
ENCY-0.291
 Draper-0.291
DEN-0.289
OWER-0.284
Token!
Token ID0
Feature activation-2.29e-01

Pos logit contributions
Rated+0.368
ENCY+0.338
 Draper+0.335
DEN+0.335
OWER+0.328
Neg logit contributions
 patched-0.248
 merit-0.230
minster-0.221
 consolation-0.218
bridge-0.218
Token Ch
Token ID609
Feature activation+1.61e-01

Pos logit contributions
Rated+0.390
ENCY+0.350
 Draper+0.348
DEN+0.345
OWER+0.337
Neg logit contributions
 patched-0.414
 merit-0.391
minster-0.378
 consolation-0.376
 elig-0.374
Tokenances
Token ID1817
Feature activation-8.44e-01

Pos logit contributions
 patched+0.243
 merit+0.227
 consolation+0.216
minster+0.215
 elig+0.213
Neg logit contributions
Rated-0.322
ENCY-0.293
 Draper-0.291
DEN-0.291
OWER-0.286
Token are
Token ID389
Feature activation-9.96e-02

Pos logit contributions
Rated+1.533
ENCY+1.423
 Draper+1.391
DEN+1.378
OWER+1.360
Neg logit contributions
 patched-1.387
 merit-1.298
 elig-1.270
minster-1.266
 consolation-1.254
Token,
Token ID11
Feature activation-8.18e-02

Pos logit contributions
Rated+0.195
ENCY+0.178
 Draper+0.176
DEN+0.176
OWER+0.172
Neg logit contributions
 patched-0.154
 merit-0.143
minster-0.139
 consolation-0.137
bridge-0.136
Token you
Token ID345
Feature activation-7.40e-02

Pos logit contributions
Rated+0.169
ENCY+0.155
 Draper+0.154
DEN+0.153
OWER+0.150
Neg logit contributions
 patched-0.117
 merit-0.108
minster-0.105
bridge-0.103
 consolation-0.103
Token rarely
Token ID8365
Feature activation-2.63e-01

Pos logit contributions
Rated+0.121
ENCY+0.108
 Draper+0.107
DEN+0.107
OWER+0.104
Neg logit contributions
 patched-0.138
 merit-0.131
minster-0.127
 consolation-0.126
bridge-0.125
Token do
Token ID466
Feature activation-1.28e-01

Pos logit contributions
Rated+0.490
ENCY+0.443
 Draper+0.440
DEN+0.438
OWER+0.429
Neg logit contributions
 patched-0.429
 merit-0.402
minster-0.392
 consolation-0.387
bridge-0.384
Tokenaga
Token ID8126
Feature activation-2.29e-01

Pos logit contributions
Rated+0.214
 Draper+0.195
ENCY+0.195
DEN+0.193
 Roose+0.189
Neg logit contributions
 patched-0.167
 merit-0.156
 consolation-0.148
minster-0.147
 elig-0.147
Token,
Token ID11
Feature activation-4.96e-02

Pos logit contributions
Rated+0.375
ENCY+0.335
 Draper+0.332
DEN+0.330
OWER+0.322
Neg logit contributions
 patched-0.429
 merit-0.405
minster-0.391
 consolation-0.390
 elig-0.388
Token California
Token ID3442
Feature activation+1.80e-01

Pos logit contributions
Rated+0.071
ENCY+0.062
DEN+0.061
 Draper+0.061
OWER+0.060
Neg logit contributions
 patched-0.103
 merit-0.098
minster-0.095
 consolation-0.095
bridge-0.094
Token.
Token ID13
Feature activation-1.11e-01

Pos logit contributions
 patched+0.319
 merit+0.299
minster+0.289
 consolation+0.288
bridge+0.286
Neg logit contributions
Rated-0.313
ENCY-0.283
DEN-0.281
 Draper-0.279
OWER-0.273
Token Hide
Token ID10415
Feature activation+1.70e-01

Pos logit contributions
Rated+0.229
ENCY+0.209
 Draper+0.209
DEN+0.207
OWER+0.204
Neg logit contributions
 patched-0.160
 merit-0.149
minster-0.143
 consolation-0.142
 elig-0.141
Token Caption
Token ID11260
Feature activation-8.42e-01

Pos logit contributions
 patched+0.355
 merit+0.337
minster+0.327
 consolation+0.326
 elig+0.324
Neg logit contributions
Rated-0.244
ENCY-0.213
 Draper-0.211
DEN-0.210
OWER-0.204
Token 13
Token ID1511
Feature activation-1.13e-01

Pos logit contributions
Rated+1.533
 Draper+1.393
ENCY+1.387
DEN+1.371
OWER+1.335
Neg logit contributions
 patched-1.435
 merit-1.323
 elig-1.288
 consolation-1.288
minster-1.284
Token of
Token ID286
Feature activation-3.78e-03

Pos logit contributions
Rated+0.196
ENCY+0.175
 Draper+0.173
DEN+0.173
OWER+0.169
Neg logit contributions
 patched-0.203
 merit-0.190
minster-0.184
 consolation-0.184
 elig-0.184
Token 15
Token ID1315
Feature activation+3.29e-02

Pos logit contributions
Rated+0.007
ENCY+0.006
 Draper+0.006
DEN+0.006
OWER+0.006
Neg logit contributions
 patched-0.006
 merit-0.006
minster-0.006
 consolation-0.006
 elig-0.006
Token Photos
Token ID9434
Feature activation-2.99e-01

Pos logit contributions
 patched+0.069
 merit+0.065
 elig+0.063
 consolation+0.063
minster+0.063
Neg logit contributions
Rated-0.048
 Draper-0.042
ENCY-0.042
DEN-0.041
OWER-0.040
Token:
Token ID25
Feature activation+9.50e-02

Pos logit contributions
Rated+0.449
ENCY+0.392
 Draper+0.389
DEN+0.388
OWER+0.376
Neg logit contributions
 patched-0.607
 merit-0.571
 consolation-0.557
 elig-0.557
minster-0.554
Token Pasadena
Token ID41010
Feature activation+2.27e-01

Pos logit contributions
Rated+0.030
ENCY+0.026
 Draper+0.026
DEN+0.026
OWER+0.025
Neg logit contributions
 patched-0.043
 merit-0.040
minster-0.039
 consolation-0.039
 elig-0.039
Token,
Token ID11
Feature activation-6.52e-02

Pos logit contributions
 patched+0.404
 merit+0.380
minster+0.368
 consolation+0.365
bridge+0.363
Neg logit contributions
Rated-0.395
ENCY-0.355
DEN-0.352
 Draper-0.350
OWER-0.342
Token California
Token ID3442
Feature activation+1.62e-01

Pos logit contributions
Rated+0.096
ENCY+0.085
DEN+0.084
 Draper+0.083
OWER+0.081
Neg logit contributions
 patched-0.133
 merit-0.126
minster-0.122
 consolation-0.122
 elig-0.121
Token.
Token ID13
Feature activation-2.05e-01

Pos logit contributions
 patched+0.291
 merit+0.274
minster+0.265
 consolation+0.264
bridge+0.262
Neg logit contributions
Rated-0.279
ENCY-0.251
DEN-0.248
 Draper-0.248
OWER-0.242
Token Hide
Token ID10415
Feature activation+1.55e-01

Pos logit contributions
Rated+0.426
ENCY+0.389
 Draper+0.388
DEN+0.385
OWER+0.379
Neg logit contributions
 patched-0.293
 merit-0.272
minster-0.260
 consolation-0.259
 elig-0.256
Token Caption
Token ID11260
Feature activation-8.36e-01

Pos logit contributions
 patched+0.322
 merit+0.305
minster+0.297
 consolation+0.295
 elig+0.294
Neg logit contributions
Rated-0.222
ENCY-0.194
 Draper-0.192
DEN-0.191
OWER-0.186
Token 12
Token ID1105
Feature activation-1.27e-01

Pos logit contributions
Rated+1.550
 Draper+1.413
ENCY+1.393
DEN+1.376
 Roose+1.351
Neg logit contributions
 patched-1.414
 merit-1.294
 elig-1.275
 arrang-1.269
 facult-1.264
Token of
Token ID286
Feature activation-3.89e-02

Pos logit contributions
Rated+0.221
ENCY+0.197
 Draper+0.195
DEN+0.195
OWER+0.190
Neg logit contributions
 patched-0.228
 merit-0.214
minster-0.207
 consolation-0.207
 elig-0.207
Token 15
Token ID1315
Feature activation-6.82e-02

Pos logit contributions
Rated+0.072
ENCY+0.065
 Draper+0.065
DEN+0.064
OWER+0.063
Neg logit contributions
 patched-0.065
 merit-0.060
minster-0.058
 consolation-0.058
 elig-0.058
Token Photos
Token ID9434
Feature activation-2.82e-01

Pos logit contributions
Rated+0.099
 Draper+0.086
ENCY+0.086
DEN+0.085
OWER+0.083
Neg logit contributions
 patched-0.142
 merit-0.135
 elig-0.131
 consolation-0.131
minster-0.130
Token:
Token ID25
Feature activation+7.18e-02

Pos logit contributions
Rated+0.426
ENCY+0.373
 Draper+0.369
DEN+0.369
OWER+0.358
Neg logit contributions
 patched-0.570
 merit-0.537
 consolation-0.522
 elig-0.522
minster-0.521
Token year
Token ID614
Feature activation-1.63e-01

Pos logit contributions
Rated+0.086
 Draper+0.078
ENCY+0.078
DEN+0.077
OWER+0.075
Neg logit contributions
 patched-0.077
 merit-0.071
minster-0.070
 elig-0.069
bridge-0.069
Token.
Token ID13
Feature activation-1.15e-01

Pos logit contributions
Rated+0.266
ENCY+0.238
 Draper+0.236
DEN+0.235
OWER+0.229
Neg logit contributions
 patched-0.303
 merit-0.286
minster-0.278
 consolation-0.276
 elig-0.274
Token
Token ID198
Feature activation-6.74e-02

Pos logit contributions
Rated+0.220
ENCY+0.200
 Draper+0.199
DEN+0.197
OWER+0.194
Neg logit contributions
 patched-0.184
 merit-0.172
minster-0.165
 consolation-0.165
 elig-0.163
Token
Token ID198
Feature activation-9.75e-03

Pos logit contributions
Rated+0.131
ENCY+0.118
DEN+0.118
 Draper+0.117
OWER+0.115
Neg logit contributions
 patched-0.107
 merit-0.099
 elig-0.096
 consolation-0.096
minster-0.095
TokenW
Token ID54
Feature activation+3.42e-03

Pos logit contributions
Rated+0.015
ENCY+0.013
DEN+0.013
 Draper+0.013
OWER+0.013
Neg logit contributions
 patched-0.019
 merit-0.018
minster-0.018
 consolation-0.018
 elig-0.018
Tokenage
Token ID496
Feature activation-8.29e-01

Pos logit contributions
 patched+0.006
 merit+0.006
minster+0.006
 consolation+0.006
 elig+0.005
Neg logit contributions
Rated-0.006
ENCY-0.005
 Draper-0.005
DEN-0.005
OWER-0.005
Token gap
Token ID7625
Feature activation+2.44e-01

Pos logit contributions
Rated+1.479
 Draper+1.326
ENCY+1.324
DEN+1.322
OWER+1.287
Neg logit contributions
 patched-1.410
 merit-1.312
minster-1.281
 consolation-1.263
 elig-1.248
Token '
Token ID705
Feature activation-1.97e-01

Pos logit contributions
 patched+0.394
 merit+0.370
minster+0.358
 consolation+0.354
 elig+0.352
Neg logit contributions
Rated-0.460
ENCY-0.416
 Draper-0.414
DEN-0.413
OWER-0.405
Tokensim
Token ID14323
Feature activation+2.25e-01

Pos logit contributions
Rated+0.332
ENCY+0.296
DEN+0.293
 Draper+0.292
OWER+0.285
Neg logit contributions
 patched-0.360
 merit-0.339
minster-0.328
 consolation-0.327
 elig-0.325
Tokenply
Token ID2145
Feature activation+1.49e-01

Pos logit contributions
 patched+0.326
 merit+0.303
minster+0.290
 consolation+0.288
bridge+0.286
Neg logit contributions
Rated-0.466
ENCY-0.426
 Draper-0.422
DEN-0.421
OWER-0.414
Token unacceptable
Token ID18010
Feature activation+2.56e-02

Pos logit contributions
 patched+0.273
 merit+0.258
minster+0.250
 consolation+0.248
bridge+0.247
Neg logit contributions
Rated-0.250
ENCY-0.223
 Draper-0.221
DEN-0.221
OWER-0.216
Token,
Token ID11
Feature activation-8.05e-02

Pos logit contributions
Rated+0.398
ENCY+0.357
 Draper+0.353
DEN+0.353
OWER+0.345
Neg logit contributions
 patched-0.420
 merit-0.396
minster-0.383
 consolation-0.380
bridge-0.378
Token seeking
Token ID6095
Feature activation-3.82e-02

Pos logit contributions
Rated+0.133
ENCY+0.119
 Draper+0.118
DEN+0.117
OWER+0.115
Neg logit contributions
 patched-0.149
 merit-0.141
minster-0.137
 consolation-0.136
bridge-0.135
Token work
Token ID670
Feature activation-7.23e-02

Pos logit contributions
Rated+0.069
ENCY+0.062
 Draper+0.062
DEN+0.062
OWER+0.060
Neg logit contributions
 patched-0.064
 merit-0.060
minster-0.059
bridge-0.058
 elig-0.058
Token.
Token ID13
Feature activation-2.52e-02

Pos logit contributions
Rated+0.118
ENCY+0.105
 Draper+0.104
DEN+0.104
OWER+0.102
Neg logit contributions
 patched-0.134
 merit-0.126
minster-0.125
 consolation-0.122
 elig-0.122
Token Bol
Token ID10797
Feature activation-3.78e-01

Pos logit contributions
Rated+0.044
ENCY+0.039
 Draper+0.039
DEN+0.039
OWER+0.038
Neg logit contributions
 patched-0.045
 merit-0.042
minster-0.041
 consolation-0.041
 elig-0.040
Tokeniv
Token ID452
Feature activation-8.27e-01

Pos logit contributions
Rated+0.417
 Draper+0.348
DEN+0.348
ENCY+0.346
OWER+0.328
Neg logit contributions
 patched-0.908
 merit-0.870
 consolation-0.844
minster-0.844
 elig-0.840
Tokenians
Token ID1547
Feature activation-2.35e-01

Pos logit contributions
Rated+1.502
DEN+1.342
 Draper+1.341
ENCY+1.339
OWER+1.293
Neg logit contributions
 patched-1.397
 merit-1.316
 consolation-1.270
minster-1.264
 elig-1.249
Token and
Token ID290
Feature activation-1.15e-02

Pos logit contributions
Rated+0.408
ENCY+0.365
 Draper+0.365
DEN+0.361
OWER+0.353
Neg logit contributions
 patched-0.412
 merit-0.388
minster-0.382
 consolation-0.376
bridge-0.372
Token Hait
Token ID19456
Feature activation-5.24e-01

Pos logit contributions
Rated+0.018
ENCY+0.016
 Draper+0.016
DEN+0.016
 Roose+0.016
Neg logit contributions
 patched-0.022
 merit-0.020
minster-0.020
 consolation-0.020
bridge-0.020
Tokenians
Token ID1547
Feature activation-1.95e-01

Pos logit contributions
Rated+1.037
ENCY+0.936
DEN+0.935
 Draper+0.930
OWER+0.912
Neg logit contributions
 patched-0.802
 merit-0.748
minster-0.716
 consolation-0.716
 elig-0.711
Token are
Token ID389
Feature activation+8.95e-02

Pos logit contributions
Rated+0.344
 Draper+0.308
ENCY+0.308
DEN+0.305
OWER+0.299
Neg logit contributions
 patched-0.333
 merit-0.313
minster-0.311
 consolation-0.304
 elig-0.302
Token figure
Token ID3785
Feature activation-2.84e-02

Pos logit contributions
 patched+0.013
minster+0.012
 merit+0.012
 elig+0.012
bridge+0.012
Neg logit contributions
Rated-0.013
 Draper-0.011
ENCY-0.011
DEN-0.011
OWER-0.011
Token model
Token ID2746
Feature activation-2.19e-01

Pos logit contributions
Rated+0.030
ENCY+0.025
 Draper+0.025
DEN+0.024
OWER+0.023
Neg logit contributions
 patched-0.070
 merit-0.067
minster-0.065
 consolation-0.065
 elig-0.065
Token.
Token ID13
Feature activation-5.91e-02

Pos logit contributions
Rated+0.365
ENCY+0.324
 Draper+0.322
DEN+0.321
OWER+0.313
Neg logit contributions
 patched-0.407
 merit-0.383
minster-0.370
 consolation-0.370
 elig-0.370
Token<|endoftext|>
Token ID50256
Feature activation-5.49e-02

Pos logit contributions
Rated+0.109
ENCY+0.098
 Draper+0.098
DEN+0.097
OWER+0.095
Neg logit contributions
 patched-0.099
 merit-0.093
minster-0.090
 consolation-0.089
 elig-0.088
TokenCh
Token ID1925
Feature activation+2.16e-01

Pos logit contributions
Rated+0.085
ENCY+0.075
DEN+0.075
 Draper+0.074
OWER+0.072
Neg logit contributions
 patched-0.108
 merit-0.103
minster-0.099
 consolation-0.099
 elig-0.098
Tokenances
Token ID1817
Feature activation-8.25e-01

Pos logit contributions
 patched+0.388
 merit+0.366
minster+0.352
 consolation+0.351
 elig+0.349
Neg logit contributions
Rated-0.372
ENCY-0.333
 Draper-0.331
DEN-0.329
OWER-0.322
Token are
Token ID389
Feature activation+1.44e-01

Pos logit contributions
Rated+1.507
ENCY+1.381
 Draper+1.366
DEN+1.349
OWER+1.325
Neg logit contributions
 patched-1.358
 merit-1.264
minster-1.247
 elig-1.237
 consolation-1.228
Token pretty
Token ID2495
Feature activation+3.18e-01

Pos logit contributions
 patched+0.227
 merit+0.212
minster+0.204
 consolation+0.202
bridge+0.201
Neg logit contributions
Rated-0.280
ENCY-0.254
 Draper-0.252
DEN-0.252
OWER-0.247
Token good
Token ID922
Feature activation+2.09e-01

Pos logit contributions
 patched+0.546
 merit+0.513
minster+0.493
 consolation+0.493
bridge+0.488
Neg logit contributions
Rated-0.571
ENCY-0.516
DEN-0.509
 Draper-0.509
OWER-0.498
Token that
Token ID326
Feature activation+1.90e-02

Pos logit contributions
 patched+0.330
 merit+0.307
minster+0.297
 consolation+0.293
bridge+0.292
Neg logit contributions
Rated-0.406
ENCY-0.369
 Draper-0.367
DEN-0.365
OWER-0.358
Token you
Token ID345
Feature activation+3.43e-02

Pos logit contributions
 patched+0.030
 merit+0.028
minster+0.027
 consolation+0.026
bridge+0.026
Neg logit contributions
Rated-0.037
 Draper-0.034
ENCY-0.033
DEN-0.033
 Roose-0.032
Token Token
Token ID29130
Feature activation-3.45e-01

Pos logit contributions
Rated+0.054
ENCY+0.049
 Draper+0.049
DEN+0.049
OWER+0.048
Neg logit contributions
 patched-0.045
 merit-0.042
minster-0.041
 consolation-0.040
bridge-0.040
Token while
Token ID981
Feature activation-2.20e-01

Pos logit contributions
Rated+0.591
ENCY+0.534
 Draper+0.526
DEN+0.525
OWER+0.513
Neg logit contributions
 patched-0.613
 merit-0.580
minster-0.564
 consolation-0.556
 elig-0.555
Token your
Token ID534
Feature activation-2.50e-01

Pos logit contributions
Rated+0.401
ENCY+0.364
 Draper+0.360
DEN+0.358
OWER+0.352
Neg logit contributions
 patched-0.369
 merit-0.348
minster-0.339
 consolation-0.334
bridge-0.334
Token opponent
Token ID6125
Feature activation-4.95e-01

Pos logit contributions
Rated+0.340
ENCY+0.298
 Draper+0.293
DEN+0.291
OWER+0.284
Neg logit contributions
 patched-0.533
 merit-0.508
minster-0.499
 consolation-0.492
 elig-0.492
Token Sum
Token ID5060
Feature activation-2.89e-01

Pos logit contributions
Rated+0.907
ENCY+0.826
DEN+0.817
 Draper+0.816
OWER+0.800
Neg logit contributions
 patched-0.820
 merit-0.775
minster-0.747
 consolation-0.741
 elig-0.737
Tokenmons
Token ID11567
Feature activation-8.23e-01

Pos logit contributions
Rated+0.484
ENCY+0.425
DEN+0.424
 Draper+0.420
OWER+0.420
Neg logit contributions
 patched-0.535
 merit-0.507
 consolation-0.490
minster-0.485
 trickle-0.484
Token a
Token ID257
Feature activation-1.00e-02

Pos logit contributions
Rated+1.380
ENCY+1.250
 Draper+1.236
DEN+1.228
OWER+1.201
Neg logit contributions
 patched-1.474
 merit-1.392
minster-1.371
bridge-1.345
 consolation-1.342
Token lot
Token ID1256
Feature activation-6.72e-02

Pos logit contributions
Rated+0.017
ENCY+0.016
 Draper+0.015
DEN+0.015
OWER+0.015
Neg logit contributions
 patched-0.018
 merit-0.017
minster-0.016
 consolation-0.016
 elig-0.016
Token of
Token ID286
Feature activation+1.79e-02

Pos logit contributions
Rated+0.117
ENCY+0.105
 Draper+0.104
DEN+0.104
OWER+0.102
Neg logit contributions
 patched-0.119
 merit-0.112
minster-0.109
 consolation-0.108
 elig-0.107
Token monsters
Token ID10963
Feature activation-4.07e-01

Pos logit contributions
 patched+0.031
 merit+0.029
minster+0.029
 consolation+0.028
bridge+0.028
Neg logit contributions
Rated-0.032
ENCY-0.028
 Draper-0.028
DEN-0.028
OWER-0.027
Token to
Token ID284
Feature activation+9.45e-04

Pos logit contributions
Rated+0.676
ENCY+0.605
 Draper+0.598
DEN+0.598
OWER+0.585
Neg logit contributions
 patched-0.749
 merit-0.708
minster-0.688
 consolation-0.681
 elig-0.677
Token,
Token ID11
Feature activation-4.76e-02

Pos logit contributions
Rated+0.003
 Draper+0.003
ENCY+0.003
DEN+0.003
OWER+0.003
Neg logit contributions
 patched-0.003
 merit-0.003
minster-0.003
 elig-0.003
 consolation-0.003
Token features
Token ID3033
Feature activation+2.01e-01

Pos logit contributions
Rated+0.086
ENCY+0.077
 Draper+0.077
DEN+0.076
OWER+0.075
Neg logit contributions
 patched-0.081
 merit-0.076
minster-0.074
 consolation-0.073
bridge-0.073
Token a
Token ID257
Feature activation-4.00e-02

Pos logit contributions
 patched+0.360
 merit+0.339
minster+0.328
 consolation+0.326
bridge+0.324
Neg logit contributions
Rated-0.346
ENCY-0.311
 Draper-0.307
DEN-0.307
OWER-0.300
Token full
Token ID1336
Feature activation+2.26e-02

Pos logit contributions
Rated+0.073
 Draper+0.066
ENCY+0.066
DEN+0.066
OWER+0.064
Neg logit contributions
 patched-0.066
 merit-0.062
minster-0.060
 consolation-0.060
bridge-0.060
Token Shim
Token ID31698
Feature activation-5.05e-01

Pos logit contributions
 patched+0.038
 merit+0.036
minster+0.035
 consolation+0.034
bridge+0.034
Neg logit contributions
Rated-0.041
ENCY-0.037
 Draper-0.037
DEN-0.037
OWER-0.036
Tokenano
Token ID5733
Feature activation-8.21e-01

Pos logit contributions
Rated+0.785
ENCY+0.696
 Draper+0.687
DEN+0.686
OWER+0.669
Neg logit contributions
 patched-0.988
 merit-0.935
minster-0.907
 consolation-0.902
bridge-0.897
Token 600
Token ID10053
Feature activation-8.07e-02

Pos logit contributions
Rated+1.446
 Draper+1.299
ENCY+1.280
DEN+1.276
 Roose+1.247
Neg logit contributions
 patched-1.410
 merit-1.329
minster-1.304
 elig-1.288
bridge-1.280
Token tri
Token ID1333
Feature activation+5.82e-02

Pos logit contributions
Rated+0.140
ENCY+0.126
 Draper+0.124
DEN+0.124
OWER+0.121
Neg logit contributions
 patched-0.143
 merit-0.135
minster-0.131
 consolation-0.130
 elig-0.129
Token-
Token ID12
Feature activation+1.32e-01

Pos logit contributions
 patched+0.107
 merit+0.101
minster+0.098
 consolation+0.098
 elig+0.097
Neg logit contributions
Rated-0.097
ENCY-0.086
 Draper-0.086
DEN-0.085
OWER-0.083
Tokencolor
Token ID8043
Feature activation-2.49e-01

Pos logit contributions
 patched+0.226
 merit+0.213
 consolation+0.206
minster+0.206
 elig+0.204
Neg logit contributions
Rated-0.238
ENCY-0.213
DEN-0.211
 Draper-0.210
OWER-0.205
Token group
Token ID1448
Feature activation-3.25e-01

Pos logit contributions
Rated+0.470
 Draper+0.424
ENCY+0.423
DEN+0.420
OWER+0.410
Neg logit contributions
 patched-0.402
 merit-0.377
minster-0.367
 consolation-0.361
 elig-0.360
Token 20
Token ID1160
Feature activation-1.77e-01

Pos logit contributions
Rated+0.233
ENCY+0.209
 Draper+0.207
DEN+0.206
OWER+0.202
Neg logit contributions
 patched-0.244
 merit-0.229
minster-0.222
 consolation-0.220
 elig-0.219
Token,
Token ID11
Feature activation-1.74e-02

Pos logit contributions
Rated+0.295
ENCY+0.263
 Draper+0.261
DEN+0.260
OWER+0.254
Neg logit contributions
 patched-0.326
 merit-0.307
minster-0.298
 consolation-0.296
 elig-0.295
Token 2015
Token ID1853
Feature activation-1.16e-01

Pos logit contributions
Rated+0.040
ENCY+0.037
 Draper+0.037
DEN+0.036
OWER+0.036
Neg logit contributions
 patched-0.021
 merit-0.019
minster-0.019
 consolation-0.018
 elig-0.018
Token.
Token ID13
Feature activation-1.30e-01

Pos logit contributions
Rated+0.198
ENCY+0.178
 Draper+0.176
DEN+0.176
OWER+0.172
Neg logit contributions
 patched-0.211
 merit-0.199
minster-0.192
 consolation-0.191
bridge-0.190
Token Hide
Token ID10415
Feature activation+2.77e-01

Pos logit contributions
Rated+0.269
ENCY+0.246
 Draper+0.244
DEN+0.243
OWER+0.239
Neg logit contributions
 patched-0.188
 merit-0.175
minster-0.168
 consolation-0.166
 elig-0.165
Token Caption
Token ID11260
Feature activation-8.15e-01

Pos logit contributions
 patched+0.571
 merit+0.542
minster+0.527
 consolation+0.524
bridge+0.521
Neg logit contributions
Rated-0.401
ENCY-0.352
 Draper-0.347
DEN-0.346
OWER-0.337
Token 8
Token ID807
Feature activation-2.72e-03

Pos logit contributions
Rated+1.604
ENCY+1.464
 Draper+1.456
DEN+1.432
 Author+1.412
Neg logit contributions
 patched-1.275
 merit-1.187
 elig-1.150
 facult-1.137
 arrang-1.135
Token of
Token ID286
Feature activation+6.53e-02

Pos logit contributions
Rated+0.005
ENCY+0.004
 Draper+0.004
DEN+0.004
OWER+0.004
Neg logit contributions
 patched-0.005
 merit-0.005
minster-0.004
 consolation-0.004
 elig-0.004
Token 15
Token ID1315
Feature activation+7.14e-02

Pos logit contributions
 patched+0.108
 merit+0.100
minster+0.097
 consolation+0.096
 elig+0.096
Neg logit contributions
Rated-0.122
ENCY-0.110
 Draper-0.109
DEN-0.109
OWER-0.107
Token Photos
Token ID9434
Feature activation-1.80e-01

Pos logit contributions
 patched+0.148
 merit+0.140
 elig+0.136
 consolation+0.136
minster+0.135
Neg logit contributions
Rated-0.104
ENCY-0.091
 Draper-0.091
DEN-0.090
OWER-0.087
Token:
Token ID25
Feature activation+1.53e-01

Pos logit contributions
Rated+0.274
ENCY+0.241
 Draper+0.238
DEN+0.238
OWER+0.231
Neg logit contributions
 patched-0.361
 merit-0.340
 consolation-0.331
minster-0.330
 elig-0.330
Token took
Token ID1718
Feature activation-1.02e-01

Pos logit contributions
Rated+0.272
ENCY+0.242
 Draper+0.242
DEN+0.241
OWER+0.235
Neg logit contributions
 patched-0.294
 merit-0.278
minster-0.269
 consolation-0.267
 elig-0.266
Token the
Token ID262
Feature activation-1.24e-04

Pos logit contributions
Rated+0.156
ENCY+0.137
 Draper+0.137
DEN+0.135
OWER+0.133
Neg logit contributions
 patched-0.203
 merit-0.191
minster-0.186
 consolation-0.184
 elig-0.184
Token reins
Token ID35627
Feature activation-3.49e-01

Pos logit contributions
Rated+0.000
ENCY+0.000
 Draper+0.000
DEN+0.000
OWER+0.000
Neg logit contributions
 patched-0.000
 merit-0.000
minster-0.000
 consolation-0.000
 elig-0.000
Token of
Token ID286
Feature activation+5.64e-02

Pos logit contributions
Rated+0.609
ENCY+0.546
 Draper+0.543
DEN+0.542
OWER+0.529
Neg logit contributions
 patched-0.617
 merit-0.577
minster-0.560
 consolation-0.555
 elig-0.554
Token O
Token ID440
Feature activation-2.43e-01

Pos logit contributions
 patched+0.102
 merit+0.096
minster+0.093
 consolation+0.092
bridge+0.091
Neg logit contributions
Rated-0.097
ENCY-0.087
 Draper-0.086
DEN-0.085
OWER-0.084
TokenCL
Token ID5097
Feature activation+6.54e-01

Pos logit contributions
Rated+0.381
ENCY+0.338
DEN+0.335
 Draper+0.334
OWER+0.326
Neg logit contributions
 patched-0.475
 merit-0.449
minster-0.435
 consolation-0.433
 elig-0.430
TokenC
Token ID34
Feature activation+1.54e-01

Pos logit contributions
 patched+1.202
 merit+1.136
minster+1.101
bridge+1.099
 consolation+1.091
Neg logit contributions
Rated-1.088
ENCY-0.967
 Draper-0.952
DEN-0.952
 Roose-0.927
Token as
Token ID355
Feature activation-2.91e-03

Pos logit contributions
 patched+0.278
 merit+0.262
minster+0.254
bridge+0.252
 consolation+0.252
Neg logit contributions
Rated-0.262
ENCY-0.234
DEN-0.231
 Draper-0.230
OWER-0.225
Token its
Token ID663
Feature activation+9.92e-03

Pos logit contributions
Rated+0.005
ENCY+0.005
 Draper+0.004
DEN+0.004
OWER+0.004
Neg logit contributions
 patched-0.005
 merit-0.005
minster-0.005
 consolation-0.005
bridge-0.005
Token new
Token ID649
Feature activation+1.04e-01

Pos logit contributions
 patched+0.017
 merit+0.016
minster+0.016
 consolation+0.016
bridge+0.016
Neg logit contributions
Rated-0.017
ENCY-0.016
 Draper-0.015
DEN-0.015
OWER-0.015
Token president
Token ID1893
Feature activation+1.20e-01

Pos logit contributions
 patched+0.178
 merit+0.167
minster+0.162
 consolation+0.161
bridge+0.159
Neg logit contributions
Rated-0.186
ENCY-0.168
 Draper-0.166
DEN-0.166
OWER-0.162
Token Russia
Token ID3284
Feature activation-1.83e-01

Pos logit contributions
 patched+0.018
 merit+0.017
minster+0.016
 consolation+0.016
bridge+0.016
Neg logit contributions
Rated-0.016
ENCY-0.015
 Draper-0.015
DEN-0.015
OWER-0.014
Token.
Token ID13
Feature activation+5.17e-02

Pos logit contributions
Rated+0.306
ENCY+0.274
 Draper+0.271
DEN+0.270
OWER+0.264
Neg logit contributions
 patched-0.337
 merit-0.318
minster-0.308
 consolation-0.306
bridge-0.304
Token\n
Token ID198
Feature activation-5.68e-03

Pos logit contributions
 patched+0.082
 merit+0.076
minster+0.073
 consolation+0.073
bridge+0.072
Neg logit contributions
Rated-0.100
ENCY-0.091
DEN-0.090
 Draper-0.090
OWER-0.088
Token\n
Token ID198
Feature activation+1.90e-04

Pos logit contributions
Rated+0.011
ENCY+0.010
DEN+0.010
 Draper+0.010
OWER+0.010
Neg logit contributions
 patched-0.009
 merit-0.008
 consolation-0.008
 elig-0.008
minster-0.008
TokenImage
Token ID5159
Feature activation-1.48e-01

Pos logit contributions
 patched+0.000
 merit+0.000
minster+0.000
 consolation+0.000
bridge+0.000
Neg logit contributions
Rated-0.000
ENCY-0.000
 Draper-0.000
DEN-0.000
OWER-0.000
Token copyright
Token ID6634
Feature activation+7.81e-01

Pos logit contributions
Rated+0.236
ENCY+0.211
 Draper+0.208
DEN+0.208
OWER+0.201
Neg logit contributions
 patched-0.280
 merit-0.265
minster-0.257
 elig-0.256
 consolation-0.255
Token AFP
Token ID8916
Feature activation+4.41e-01

Pos logit contributions
 patched+1.396
 merit+1.313
 consolation+1.263
minster+1.262
bridge+1.253
Neg logit contributions
Rated-1.350
ENCY-1.204
DEN-1.190
 Draper-1.186
OWER-1.164
Token Image
Token ID7412
Feature activation-1.14e-01

Pos logit contributions
 patched+1.005
 merit+0.959
minster+0.934
 consolation+0.930
bridge+0.925
Neg logit contributions
Rated-0.546
ENCY-0.466
 Draper-0.460
DEN-0.458
OWER-0.443
Token caption
Token ID8305
Feature activation+3.84e-02

Pos logit contributions
Rated+0.176
ENCY+0.156
DEN+0.155
 Draper+0.153
OWER+0.149
Neg logit contributions
 patched-0.226
 merit-0.211
 elig-0.208
minster-0.207
 consolation-0.205
Token The
Token ID383
Feature activation-5.47e-02

Pos logit contributions
 patched+0.074
 merit+0.070
minster+0.068
bridge+0.067
 consolation+0.067
Neg logit contributions
Rated-0.061
ENCY-0.054
DEN-0.054
 Draper-0.053
OWER-0.052
Token Kurds
Token ID23880
Feature activation-9.21e-02

Pos logit contributions
Rated+0.088
ENCY+0.079
 Draper+0.078
DEN+0.078
OWER+0.076
Neg logit contributions
 patched-0.104
 merit-0.098
minster-0.095
 consolation-0.095
bridge-0.094
Token in
Token ID287
Feature activation+6.09e-02

Pos logit contributions
Rated+0.093
ENCY+0.084
 Draper+0.084
DEN+0.083
OWER+0.081
Neg logit contributions
 patched-0.091
 merit-0.085
minster-0.083
 consolation-0.082
bridge-0.082
Token increasingly
Token ID6481
Feature activation+2.11e-01

Pos logit contributions
 patched+0.113
 merit+0.106
minster+0.103
 consolation+0.103
 elig+0.102
Neg logit contributions
Rated-0.101
ENCY-0.090
 Draper-0.090
DEN-0.089
OWER-0.087
Token gender
Token ID5279
Feature activation+1.81e-01

Pos logit contributions
 patched+0.356
 merit+0.333
minster+0.326
 consolation+0.322
bridge+0.321
Neg logit contributions
Rated-0.384
ENCY-0.346
 Draper-0.344
DEN-0.343
OWER-0.335
Token-
Token ID12
Feature activation+6.27e-03

Pos logit contributions
 patched+0.330
 merit+0.311
minster+0.303
 consolation+0.300
bridge+0.298
Neg logit contributions
Rated-0.304
ENCY-0.271
 Draper-0.269
DEN-0.268
OWER-0.263
Tokeninteg
Token ID18908
Feature activation-5.82e-02

Pos logit contributions
 patched+0.012
 merit+0.011
minster+0.011
 consolation+0.011
 elig+0.011
Neg logit contributions
Rated-0.010
ENCY-0.009
DEN-0.009
 Draper-0.009
OWER-0.009
Tokenrated
Token ID4111
Feature activation+5.66e-01

Pos logit contributions
Rated+0.090
ENCY+0.080
 Draper+0.079
DEN+0.079
OWER+0.077
Neg logit contributions
 patched-0.114
 merit-0.108
minster-0.105
 consolation-0.104
bridge-0.104
Token military
Token ID2422
Feature activation-6.88e-03

Pos logit contributions
 patched+0.965
 merit+0.907
minster+0.874
 consolation+0.869
bridge+0.862
Neg logit contributions
Rated-1.025
ENCY-0.926
 Draper-0.915
DEN-0.914
OWER-0.895
Token units
Token ID4991
Feature activation+2.72e-01

Pos logit contributions
Rated+0.012
 Draper+0.011
ENCY+0.011
DEN+0.011
OWER+0.011
Neg logit contributions
 patched-0.012
 merit-0.011
minster-0.011
 consolation-0.011
 elig-0.011
Token,
Token ID11
Feature activation+4.15e-02

Pos logit contributions
 patched+0.497
 merit+0.468
minster+0.453
 consolation+0.450
bridge+0.447
Neg logit contributions
Rated-0.458
ENCY-0.410
 Draper-0.405
DEN-0.405
OWER-0.395
Token he
Token ID339
Feature activation+1.14e-01

Pos logit contributions
 patched+0.074
 merit+0.069
minster+0.067
 consolation+0.067
 elig+0.066
Neg logit contributions
Rated-0.072
ENCY-0.064
 Draper-0.064
DEN-0.064
OWER-0.062
Token replied
Token ID8712
Feature activation-6.08e-02

Pos logit contributions
 patched+0.207
 merit+0.196
minster+0.190
 consolation+0.188
 elig+0.187
Neg logit contributions
Rated-0.194
ENCY-0.173
 Draper-0.172
DEN-0.171
OWER-0.167
Token and
Token ID290
Feature activation+1.20e-01

Pos logit contributions
Rated+0.780
ENCY+0.693
 Draper+0.689
DEN+0.684
OWER+0.670
Neg logit contributions
 patched-0.877
 merit-0.823
minster-0.802
 consolation-0.797
bridge-0.794
Token leng
Token ID40038
Feature activation+1.09e-01

Pos logit contributions
 patched+0.194
 merit+0.180
minster+0.176
 consolation+0.174
 elig+0.173
Neg logit contributions
Rated-0.225
ENCY-0.204
DEN-0.202
 Draper-0.202
OWER-0.198
Tokenthen
Token ID8524
Feature activation-7.95e-02

Pos logit contributions
 patched+0.174
 merit+0.163
minster+0.156
 consolation+0.155
 elig+0.154
Neg logit contributions
Rated-0.209
ENCY-0.190
 Draper-0.188
DEN-0.188
OWER-0.184
Token the
Token ID262
Feature activation+1.33e-01

Pos logit contributions
Rated+0.136
ENCY+0.122
 Draper+0.121
DEN+0.120
OWER+0.118
Neg logit contributions
 patched-0.144
 merit-0.135
minster-0.131
 consolation-0.130
bridge-0.129
Token tax
Token ID1687
Feature activation-3.19e-02

Pos logit contributions
 patched+0.230
 merit+0.215
minster+0.210
 consolation+0.208
bridge+0.207
Neg logit contributions
Rated-0.236
ENCY-0.211
 Draper-0.211
DEN-0.208
OWER-0.204
Token credits
Token ID10824
Feature activation+6.16e-01

Pos logit contributions
Rated+0.065
ENCY+0.059
 Draper+0.059
DEN+0.059
OWER+0.057
Neg logit contributions
 patched-0.047
 merit-0.044
minster-0.042
 consolation-0.042
bridge-0.041
Token for
Token ID329
Feature activation+8.73e-02

Pos logit contributions
 patched+1.076
 merit+1.013
minster+0.975
 consolation+0.969
 elig+0.964
Neg logit contributions
Rated-1.089
ENCY-0.980
DEN-0.969
 Draper-0.968
OWER-0.947
Token renewable
Token ID15713
Feature activation-1.86e-01

Pos logit contributions
 patched+0.153
 merit+0.144
minster+0.139
 consolation+0.139
bridge+0.138
Neg logit contributions
Rated-0.153
ENCY-0.138
 Draper-0.136
DEN-0.136
OWER-0.133
Token energy
Token ID2568
Feature activation+3.34e-01

Pos logit contributions
Rated+0.322
ENCY+0.287
DEN+0.284
 Draper+0.284
OWER+0.279
Neg logit contributions
 patched-0.333
 merit-0.313
minster-0.304
 consolation-0.302
bridge-0.299
Token production
Token ID3227
Feature activation+8.11e-02

Pos logit contributions
 patched+0.603
 merit+0.570
minster+0.548
 consolation+0.545
 elig+0.543
Neg logit contributions
Rated-0.570
ENCY-0.514
DEN-0.506
 Draper-0.506
OWER-0.492
Token.
Token ID13
Feature activation+2.52e-03

Pos logit contributions
 patched+0.148
 merit+0.140
minster+0.135
 consolation+0.134
bridge+0.133
Neg logit contributions
Rated-0.137
ENCY-0.122
 Draper-0.121
DEN-0.121
OWER-0.118
Token-
Token ID12
Feature activation+1.32e-01

Pos logit contributions
Rated+0.386
ENCY+0.344
 Draper+0.341
DEN+0.340
OWER+0.332
Neg logit contributions
 patched-0.437
 merit-0.410
minster-0.398
 consolation-0.397
 elig-0.396
Tokenwar
Token ID5767
Feature activation-9.60e-02

Pos logit contributions
 patched+0.230
 merit+0.216
minster+0.209
 consolation+0.208
 elig+0.206
Neg logit contributions
Rated-0.235
ENCY-0.212
 Draper-0.210
DEN-0.210
OWER-0.205
Token,
Token ID11
Feature activation+1.74e-01

Pos logit contributions
Rated+0.160
ENCY+0.143
 Draper+0.142
DEN+0.141
OWER+0.138
Neg logit contributions
 patched-0.177
 merit-0.167
minster-0.162
 consolation-0.161
 elig-0.160
Token anti
Token ID3098
Feature activation-2.78e-01

Pos logit contributions
 patched+0.298
 merit+0.278
minster+0.271
 consolation+0.269
 elig+0.267
Neg logit contributions
Rated-0.310
ENCY-0.280
 Draper-0.278
DEN-0.276
OWER-0.271
Token-
Token ID12
Feature activation+1.26e-01

Pos logit contributions
Rated+0.457
ENCY+0.408
 Draper+0.403
DEN+0.403
OWER+0.393
Neg logit contributions
 patched-0.519
 merit-0.487
 consolation-0.472
 elig-0.472
minster-0.471
Tokennuclear
Token ID43131
Feature activation+6.85e-01

Pos logit contributions
 patched+0.232
 merit+0.219
minster+0.212
 consolation+0.211
 elig+0.209
Neg logit contributions
Rated-0.212
ENCY-0.190
 Draper-0.187
DEN-0.187
OWER-0.183
Token,
Token ID11
Feature activation+1.47e-01

Pos logit contributions
 patched+1.237
 merit+1.174
minster+1.126
 consolation+1.124
bridge+1.118
Neg logit contributions
Rated-1.172
ENCY-1.051
DEN-1.036
 Draper-1.031
OWER-1.009
Token peace
Token ID4167
Feature activation-1.66e-01

Pos logit contributions
 patched+0.256
 merit+0.239
minster+0.233
 consolation+0.231
 elig+0.230
Neg logit contributions
Rated-0.259
ENCY-0.234
 Draper-0.232
DEN-0.231
OWER-0.226
Token movement
Token ID3356
Feature activation+1.96e-02

Pos logit contributions
Rated+0.270
ENCY+0.241
 Draper+0.238
DEN+0.238
OWER+0.232
Neg logit contributions
 patched-0.312
 merit-0.295
minster-0.285
 consolation-0.284
bridge-0.282
Token
Token ID1399
Feature activation-7.66e-02

Pos logit contributions
 patched+0.037
 merit+0.034
minster+0.033
 consolation+0.033
bridge+0.033
Neg logit contributions
Rated-0.032
ENCY-0.029
 Draper-0.029
DEN-0.029
OWER-0.028
TokenThe
Token ID464
Feature activation-7.25e-02

Pos logit contributions
Rated+0.124
ENCY+0.110
 Draper+0.109
DEN+0.108
OWER+0.106
Neg logit contributions
 patched-0.146
 merit-0.138
minster-0.133
 consolation-0.133
 elig-0.132
Token there
Token ID612
Feature activation+1.45e-04

Pos logit contributions
Rated+0.026
ENCY+0.024
 Draper+0.023
DEN+0.023
OWER+0.023
Neg logit contributions
 patched-0.018
 merit-0.016
minster-0.016
 consolation-0.016
 elig-0.015
Token,
Token ID11
Feature activation+5.32e-02

Pos logit contributions
 patched+0.000
 merit+0.000
minster+0.000
 consolation+0.000
 elig+0.000
Neg logit contributions
Rated-0.000
ENCY-0.000
 Draper-0.000
DEN-0.000
OWER-0.000
Token
Token ID447
Feature activation+8.87e-03

Pos logit contributions
 patched+0.091
 merit+0.085
minster+0.083
 consolation+0.082
bridge+0.081
Neg logit contributions
Rated-0.096
ENCY-0.086
 Draper-0.086
DEN-0.085
OWER-0.083
Token
Token ID251
Feature activation-3.57e-02

Pos logit contributions
 patched+0.021
 merit+0.020
minster+0.019
 consolation+0.019
bridge+0.019
Neg logit contributions
Rated-0.011
ENCY-0.009
 Draper-0.009
DEN-0.009
OWER-0.008
Token V
Token ID569
Feature activation-1.67e-01

Pos logit contributions
Rated+0.059
ENCY+0.053
 Draper+0.052
DEN+0.052
OWER+0.051
Neg logit contributions
 patched-0.066
 merit-0.062
minster-0.060
 consolation-0.060
bridge-0.060
Tokenese
Token ID2771
Feature activation+1.22e-01

Pos logit contributions
Rated+0.339
ENCY+0.309
 Draper+0.306
DEN+0.306
OWER+0.300
Neg logit contributions
 patched-0.247
 merit-0.229
minster-0.220
 consolation-0.219
 elig-0.217
Tokeny
Token ID88
Feature activation+4.40e-02

Pos logit contributions
 patched+0.192
 merit+0.178
 elig+0.172
 consolation+0.171
minster+0.171
Neg logit contributions
Rated-0.240
ENCY-0.216
 Draper-0.215
DEN-0.214
OWER-0.210
Token told
Token ID1297
Feature activation-3.36e-01

Pos logit contributions
 patched+0.081
 merit+0.076
minster+0.074
 consolation+0.073
 elig+0.073
Neg logit contributions
Rated-0.074
ENCY-0.066
 Draper-0.065
DEN-0.065
OWER-0.063
Token the
Token ID262
Feature activation+4.04e-02

Pos logit contributions
Rated+0.585
ENCY+0.525
 Draper+0.522
DEN+0.519
OWER+0.508
Neg logit contributions
 patched-0.595
 merit-0.558
minster-0.543
 consolation-0.537
bridge-0.535
Token conference
Token ID4495
Feature activation+4.31e-02

Pos logit contributions
 patched+0.061
 merit+0.057
minster+0.055
 consolation+0.054
bridge+0.054
Neg logit contributions
Rated-0.080
ENCY-0.073
 Draper-0.073
DEN-0.073
OWER-0.071
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
 patched+0.078
 merit+0.073
minster+0.071
 consolation+0.071
bridge+0.070
Neg logit contributions
Rated-0.074
ENCY-0.066
 Draper-0.065
DEN-0.065
OWER-0.064
Token alone
Token ID3436
Feature activation-1.78e-01

Pos logit contributions
 patched+0.089
 merit+0.084
minster+0.082
 consolation+0.081
bridge+0.081
Neg logit contributions
Rated-0.101
ENCY-0.091
 Draper-0.091
DEN-0.090
OWER-0.088
Token.
Token ID13
Feature activation+2.74e-02

Pos logit contributions
Rated+0.296
ENCY+0.262
 Draper+0.261
DEN+0.260
OWER+0.254
Neg logit contributions
 patched-0.327
 merit-0.308
minster-0.300
 consolation-0.297
 elig-0.295
Token\n
Token ID198
Feature activation+5.05e-02

Pos logit contributions
 patched+0.051
 merit+0.048
minster+0.046
 consolation+0.046
bridge+0.046
Neg logit contributions
Rated-0.045
ENCY-0.041
DEN-0.040
 Draper-0.040
OWER-0.039
Token\n
Token ID198
Feature activation+7.15e-02

Pos logit contributions
 patched+0.079
 merit+0.073
minster+0.070
 consolation+0.070
 elig+0.070
Neg logit contributions
Rated-0.099
ENCY-0.090
DEN-0.090
 Draper-0.089
OWER-0.088
TokenO
Token ID46
Feature activation-3.51e-02

Pos logit contributions
 patched+0.144
 merit+0.136
minster+0.132
 consolation+0.132
bridge+0.131
Neg logit contributions
Rated-0.107
ENCY-0.094
 Draper-0.093
DEN-0.093
OWER-0.091
Token'
Token ID6
Feature activation+4.72e-01

Pos logit contributions
Rated+0.053
ENCY+0.047
 Draper+0.046
DEN+0.046
OWER+0.045
Neg logit contributions
 patched-0.070
 merit-0.067
minster-0.065
 consolation-0.064
 elig-0.064
TokenBrian
Token ID24761
Feature activation+2.42e-01

Pos logit contributions
 patched+0.737
 merit+0.688
minster+0.661
 consolation+0.658
 elig+0.655
Neg logit contributions
Rated-0.928
ENCY-0.842
DEN-0.835
 Draper-0.832
OWER-0.817
Token is
Token ID318
Feature activation-8.22e-02

Pos logit contributions
 patched+0.451
 merit+0.426
minster+0.412
 consolation+0.410
bridge+0.407
Neg logit contributions
Rated-0.400
ENCY-0.357
 Draper-0.353
DEN-0.353
OWER-0.344
Token as
Token ID355
Feature activation+3.14e-02

Pos logit contributions
Rated+0.140
ENCY+0.125
 Draper+0.125
DEN+0.124
OWER+0.122
Neg logit contributions
 patched-0.148
 merit-0.139
minster-0.136
 consolation-0.134
bridge-0.133
Token fascinated
Token ID30103
Feature activation-1.11e-02

Pos logit contributions
 patched+0.053
 merit+0.050
minster+0.048
 consolation+0.048
bridge+0.048
Neg logit contributions
Rated-0.057
ENCY-0.051
 Draper-0.051
DEN-0.051
OWER-0.050
Token by
Token ID416
Feature activation-8.33e-02

Pos logit contributions
Rated+0.020
ENCY+0.018
 Draper+0.018
DEN+0.018
OWER+0.018
Neg logit contributions
 patched-0.018
 merit-0.017
minster-0.017
 consolation-0.017
 elig-0.017
Token.
Token ID13
Feature activation+9.24e-02

Pos logit contributions
 patched+0.337
 merit+0.319
minster+0.307
 consolation+0.306
bridge+0.303
Neg logit contributions
Rated-0.318
ENCY-0.285
 Draper-0.282
DEN-0.281
OWER-0.275
Token1
Token ID16
Feature activation+1.50e-01

Pos logit contributions
 patched+0.199
 merit+0.190
minster+0.184
 consolation+0.183
bridge+0.183
Neg logit contributions
Rated-0.125
ENCY-0.109
 Draper-0.107
DEN-0.107
OWER-0.104
Token million
Token ID1510
Feature activation-4.12e-02

Pos logit contributions
 patched+0.307
 merit+0.291
minster+0.283
 consolation+0.281
bridge+0.280
Neg logit contributions
Rated-0.220
ENCY-0.193
 Draper-0.190
DEN-0.190
OWER-0.185
Token (
Token ID357
Feature activation+8.22e-02

Pos logit contributions
Rated+0.066
ENCY+0.059
 Draper+0.058
DEN+0.058
OWER+0.057
Neg logit contributions
 patched-0.079
 merit-0.074
minster-0.072
 consolation-0.072
bridge-0.071
Token27
Token ID1983
Feature activation+3.25e-02

Pos logit contributions
 patched+0.161
 merit+0.153
minster+0.148
 consolation+0.147
bridge+0.147
Neg logit contributions
Rated-0.127
ENCY-0.113
 Draper-0.111
DEN-0.111
OWER-0.108
Tokenth
Token ID400
Feature activation+1.42e-01

Pos logit contributions
 patched+0.055
 merit+0.051
minster+0.049
 consolation+0.049
 elig+0.049
Neg logit contributions
Rated-0.059
ENCY-0.053
 Draper-0.053
DEN-0.053
OWER-0.052
Token among
Token ID1871
Feature activation-1.92e-01

Pos logit contributions
 patched+0.279
 merit+0.264
minster+0.257
 consolation+0.255
bridge+0.254
Neg logit contributions
Rated-0.219
ENCY-0.193
 Draper-0.191
DEN-0.191
OWER-0.186
Token all
Token ID477
Feature activation+1.71e-01

Pos logit contributions
Rated+0.353
ENCY+0.316
 Draper+0.314
DEN+0.313
OWER+0.308
Neg logit contributions
 patched-0.324
 merit-0.304
minster-0.295
 consolation-0.290
bridge-0.289
Token edge
Token ID5743
Feature activation+2.09e-01

Pos logit contributions
 patched+0.279
 merit+0.261
minster+0.252
 consolation+0.250
bridge+0.248
Neg logit contributions
Rated-0.321
ENCY-0.290
 Draper-0.288
DEN-0.287
OWER-0.281
Token r
Token ID374
Feature activation-1.39e-02

Pos logit contributions
 patched+0.366
 merit+0.344
minster+0.333
 consolation+0.330
bridge+0.328
Neg logit contributions
Rated-0.368
ENCY-0.331
 Draper-0.327
DEN-0.327
OWER-0.320
Tokenus
Token ID385
Feature activation+2.80e-01

Pos logit contributions
Rated+0.027
ENCY+0.024
 Draper+0.024
DEN+0.024
OWER+0.024
Neg logit contributions
 patched-0.022
 merit-0.020
 consolation-0.019
minster-0.019
bridge-0.019
Token making
Token ID1642
Feature activation+7.09e-02

Pos logit contributions
 patched+0.198
 merit+0.186
minster+0.180
 consolation+0.179
bridge+0.177
Neg logit contributions
Rated-0.199
ENCY-0.179
 Draper-0.178
DEN-0.177
OWER-0.173
Token some
Token ID617
Feature activation+1.26e-01

Pos logit contributions
 patched+0.130
 merit+0.122
minster+0.119
 consolation+0.117
bridge+0.117
Neg logit contributions
Rated-0.119
ENCY-0.107
 Draper-0.106
DEN-0.105
OWER-0.103
Token bad
Token ID2089
Feature activation+8.01e-02

Pos logit contributions
 patched+0.216
 merit+0.202
minster+0.197
 consolation+0.194
bridge+0.194
Neg logit contributions
Rated-0.226
ENCY-0.203
 Draper-0.202
DEN-0.201
OWER-0.196
Token choices
Token ID7747
Feature activation-8.52e-03

Pos logit contributions
 patched+0.142
 merit+0.133
minster+0.129
 consolation+0.128
 elig+0.128
Neg logit contributions
Rated-0.140
ENCY-0.125
 Draper-0.124
DEN-0.124
OWER-0.121
Token,
Token ID11
Feature activation+5.60e-02

Pos logit contributions
Rated+0.015
ENCY+0.014
 Draper+0.014
DEN+0.014
OWER+0.013
Neg logit contributions
 patched-0.015
 merit-0.014
minster-0.013
 consolation-0.013
bridge-0.013
Token Michael
Token ID3899
Feature activation+1.04e-01

Pos logit contributions
 patched+0.095
 merit+0.089
minster+0.086
 consolation+0.086
bridge+0.085
Neg logit contributions
Rated-0.101
ENCY-0.092
 Draper-0.091
DEN-0.090
OWER-0.089
Token Vick
Token ID33347
Feature activation-4.66e-01

Pos logit contributions
 patched+0.204
 merit+0.193
minster+0.187
 consolation+0.186
bridge+0.185
Neg logit contributions
Rated-0.160
ENCY-0.142
 Draper-0.140
DEN-0.140
OWER-0.136
Token has
Token ID468
Feature activation-6.99e-02

Pos logit contributions
Rated+0.875
ENCY+0.786
 Draper+0.782
DEN+0.781
OWER+0.762
Neg logit contributions
 patched-0.760
 merit-0.714
minster-0.689
 consolation-0.687
 elig-0.680
Token devoted
Token ID13378
Feature activation+1.15e-01

Pos logit contributions
Rated+0.123
ENCY+0.109
 Draper+0.109
DEN+0.108
OWER+0.106
Neg logit contributions
 patched-0.122
 merit-0.115
minster-0.112
 consolation-0.111
bridge-0.111
Token time
Token ID640
Feature activation+1.40e-01

Pos logit contributions
 patched+0.215
 merit+0.203
minster+0.197
 consolation+0.196
bridge+0.194
Neg logit contributions
Rated-0.191
ENCY-0.170
 Draper-0.168
DEN-0.168
OWER-0.164
Token to
Token ID284
Feature activation-1.14e-01

Pos logit contributions
 patched+0.266
 merit+0.252
minster+0.243
 consolation+0.242
bridge+0.240
Neg logit contributions
Rated-0.226
ENCY-0.202
 Draper-0.199
DEN-0.199
OWER-0.194
Token only
Token ID691
Feature activation-9.10e-02

Pos logit contributions
 patched+0.130
 merit+0.122
minster+0.119
bridge+0.117
 elig+0.117
Neg logit contributions
Rated-0.137
 Draper-0.123
ENCY-0.122
DEN-0.122
OWER-0.119
Token beer
Token ID6099
Feature activation-2.67e-01

Pos logit contributions
Rated+0.161
 Draper+0.144
ENCY+0.144
DEN+0.143
OWER+0.140
Neg logit contributions
 patched-0.157
 merit-0.147
minster-0.144
bridge-0.142
 elig-0.141
Token I
Token ID314
Feature activation+4.23e-02

Pos logit contributions
Rated+0.463
ENCY+0.414
 Draper+0.413
DEN+0.412
OWER+0.401
Neg logit contributions
 patched-0.473
 merit-0.444
minster-0.432
 consolation-0.429
 elig-0.426
Token can
Token ID460
Feature activation+3.26e-01

Pos logit contributions
 patched+0.079
 merit+0.074
minster+0.072
 consolation+0.071
bridge+0.071
Neg logit contributions
Rated-0.070
ENCY-0.063
 Draper-0.062
DEN-0.062
OWER-0.061
Token't
Token ID470
Feature activation+2.92e-01

Pos logit contributions
 patched+0.503
 merit+0.465
minster+0.451
 consolation+0.448
bridge+0.445
Neg logit contributions
Rated-0.643
ENCY-0.583
DEN-0.579
 Draper-0.578
OWER-0.565
Token say
Token ID910
Feature activation+1.64e-01

Pos logit contributions
 patched+0.475
 merit+0.443
minster+0.429
 consolation+0.425
 elig+0.423
Neg logit contributions
Rated-0.553
ENCY-0.499
 Draper-0.495
DEN-0.494
OWER-0.482
Token I
Token ID314
Feature activation+5.67e-02

Pos logit contributions
 patched+0.281
 merit+0.263
minster+0.256
 consolation+0.253
bridge+0.253
Neg logit contributions
Rated-0.296
ENCY-0.265
 Draper-0.265
DEN-0.263
OWER-0.257
Token was
Token ID373
Feature activation+7.22e-03

Pos logit contributions
 patched+0.102
 merit+0.097
minster+0.093
 consolation+0.093
bridge+0.092
Neg logit contributions
Rated-0.097
ENCY-0.087
 Draper-0.086
DEN-0.085
OWER-0.084
Token crazy
Token ID7165
Feature activation-2.31e-01

Pos logit contributions
 patched+0.012
 merit+0.012
minster+0.011
 consolation+0.011
bridge+0.011
Neg logit contributions
Rated-0.013
ENCY-0.012
 Draper-0.012
DEN-0.012
OWER-0.011
Token about
Token ID546
Feature activation-7.98e-03

Pos logit contributions
Rated+0.417
ENCY+0.375
 Draper+0.373
DEN+0.372
OWER+0.361
Neg logit contributions
 patched-0.394
 merit-0.370
minster-0.361
 consolation-0.354
 elig-0.352
Token
Token ID960
Feature activation-6.54e-02

Pos logit contributions
Rated+0.013
ENCY+0.012
 Draper+0.012
DEN+0.012
OWER+0.012
Neg logit contributions
 patched-0.015
 merit-0.014
minster-0.013
 consolation-0.013
bridge-0.013
Token Congress
Token ID3162
Feature activation+2.02e-01

Pos logit contributions
 patched+0.099
 merit+0.093
minster+0.090
 consolation+0.090
bridge+0.089
Neg logit contributions
Rated-0.090
ENCY-0.081
DEN-0.080
 Draper-0.080
OWER-0.078
Token.[
Token ID3693
Feature activation-5.38e-02

Pos logit contributions
 patched+0.356
 merit+0.335
minster+0.323
 consolation+0.321
bridge+0.320
Neg logit contributions
Rated-0.353
ENCY-0.318
DEN-0.314
 Draper-0.314
OWER-0.307
Token52
Token ID4309
Feature activation-3.25e-01

Pos logit contributions
Rated+0.083
ENCY+0.073
 Draper+0.073
DEN+0.072
OWER+0.071
Neg logit contributions
 patched-0.106
 merit-0.100
minster-0.097
 consolation-0.097
bridge-0.096
Token]
Token ID60
Feature activation-7.28e-03

Pos logit contributions
Rated+0.550
ENCY+0.493
 Draper+0.488
DEN+0.486
OWER+0.476
Neg logit contributions
 patched-0.590
 merit-0.555
minster-0.538
 consolation-0.534
 elig-0.532
Token The
Token ID383
Feature activation-3.60e-02

Pos logit contributions
Rated+0.013
ENCY+0.011
 Draper+0.011
DEN+0.011
OWER+0.011
Neg logit contributions
 patched-0.013
 merit-0.012
minster-0.012
 consolation-0.012
 elig-0.011
Token funds
Token ID5153
Feature activation-3.06e-01

Pos logit contributions
Rated+0.063
ENCY+0.057
 Draper+0.056
DEN+0.056
OWER+0.055
Neg logit contributions
 patched-0.063
 merit-0.059
minster-0.058
 consolation-0.057
 elig-0.057
Token for
Token ID329
Feature activation-4.64e-02

Pos logit contributions
Rated+0.542
ENCY+0.489
 Draper+0.485
DEN+0.481
OWER+0.472
Neg logit contributions
 patched-0.529
 merit-0.497
minster-0.484
 consolation-0.478
bridge-0.475
Token the
Token ID262
Feature activation-4.20e-02

Pos logit contributions
Rated+0.078
ENCY+0.069
 Draper+0.069
DEN+0.068
OWER+0.067
Neg logit contributions
 patched-0.085
 merit-0.080
minster-0.078
 consolation-0.077
 elig-0.077
Token monument
Token ID17757
Feature activation-1.95e-01

Pos logit contributions
Rated+0.077
ENCY+0.069
 Draper+0.069
DEN+0.068
OWER+0.067
Neg logit contributions
 patched-0.071
 merit-0.066
minster-0.064
 consolation-0.064
 elig-0.063
Token were
Token ID547
Feature activation+3.15e-02

Pos logit contributions
Rated+0.347
ENCY+0.313
 Draper+0.310
DEN+0.309
OWER+0.303
Neg logit contributions
 patched-0.336
 merit-0.316
minster-0.306
 consolation-0.303
bridge-0.302
Token raised
Token ID4376
Feature activation-3.91e-01

Pos logit contributions
 patched+0.052
 merit+0.049
minster+0.047
 consolation+0.047
 elig+0.047
Neg logit contributions
Rated-0.058
ENCY-0.053
 Draper-0.052
DEN-0.052
OWER-0.051
Token
Token ID447
Feature activation+2.05e-02

Pos logit contributions
 patched+0.119
 merit+0.112
minster+0.107
 consolation+0.107
 elig+0.106
Neg logit contributions
Rated-0.146
ENCY-0.132
 Draper-0.131
DEN-0.131
OWER-0.128
Token
Token ID251
Feature activation-6.43e-03

Pos logit contributions
 patched+0.039
 merit+0.037
minster+0.036
 consolation+0.036
 elig+0.035
Neg logit contributions
Rated-0.033
ENCY-0.029
 Draper-0.029
DEN-0.029
OWER-0.028
Token He
Token ID679
Feature activation+9.81e-02

Pos logit contributions
Rated+0.012
ENCY+0.011
 Draper+0.011
DEN+0.011
OWER+0.010
Neg logit contributions
 patched-0.011
 merit-0.010
minster-0.010
 consolation-0.010
 elig-0.010
Token shrug
Token ID32545
Feature activation+6.18e-02

Pos logit contributions
 patched+0.158
 merit+0.148
minster+0.142
 consolation+0.141
bridge+0.140
Neg logit contributions
Rated-0.187
ENCY-0.169
 Draper-0.168
DEN-0.168
OWER-0.164
Tokens
Token ID82
Feature activation-3.17e-02

Pos logit contributions
 patched+0.110
 merit+0.103
minster+0.100
 consolation+0.099
 elig+0.099
Neg logit contributions
Rated-0.107
ENCY-0.095
DEN-0.095
 Draper-0.095
OWER-0.093
Token his
Token ID465
Feature activation+7.73e-03

Pos logit contributions
Rated+0.053
DEN+0.047
 Draper+0.047
ENCY+0.047
OWER+0.046
Neg logit contributions
 patched-0.058
 merit-0.054
minster-0.054
bridge-0.053
 elig-0.052
Token shoulders
Token ID12450
Feature activation+2.61e-02

Pos logit contributions
 patched+0.012
 merit+0.011
minster+0.011
 consolation+0.010
bridge+0.010
Neg logit contributions
Rated-0.015
ENCY-0.014
 Draper-0.014
DEN-0.014
OWER-0.014
Token,
Token ID11
Feature activation-2.98e-02

Pos logit contributions
 patched+0.048
 merit+0.045
minster+0.044
 consolation+0.043
 elig+0.043
Neg logit contributions
Rated-0.044
ENCY-0.039
 Draper-0.039
DEN-0.039
OWER-0.038
Token moves
Token ID6100
Feature activation-1.13e-01

Pos logit contributions
Rated+0.055
ENCY+0.050
 Draper+0.050
DEN+0.050
OWER+0.048
Neg logit contributions
 patched-0.049
 merit-0.046
minster-0.045
 consolation-0.044
bridge-0.044
Token over
Token ID625
Feature activation+5.18e-02

Pos logit contributions
Rated+0.184
ENCY+0.164
 Draper+0.162
DEN+0.162
OWER+0.158
Neg logit contributions
 patched-0.213
 merit-0.201
minster-0.196
 consolation-0.194
bridge-0.193
Token to
Token ID284
Feature activation-6.18e-03

Pos logit contributions
 patched+0.103
 merit+0.097
minster+0.094
 consolation+0.094
 elig+0.093
Neg logit contributions
Rated-0.079
ENCY-0.070
 Draper-0.069
DEN-0.069
OWER-0.067
Token\n
Token ID198
Feature activation+1.35e-01

Pos logit contributions
 patched+0.209
 merit+0.194
minster+0.187
 consolation+0.187
 elig+0.186
Neg logit contributions
Rated-0.268
ENCY-0.244
DEN-0.241
 Draper-0.241
OWER-0.237
TokenR
Token ID49
Feature activation+3.03e-01

Pos logit contributions
 patched+0.274
 merit+0.260
minster+0.253
 consolation+0.251
bridge+0.251
Neg logit contributions
Rated-0.199
ENCY-0.176
 Draper-0.174
DEN-0.173
OWER-0.168
Tokenapt
Token ID2373
Feature activation+3.24e-01

Pos logit contributions
 patched+0.354
 merit+0.322
minster+0.305
 consolation+0.302
bridge+0.299
Neg logit contributions
Rated-0.712
ENCY-0.658
 Draper-0.653
DEN-0.652
OWER-0.642
Tokenor
Token ID273
Feature activation+6.03e-03

Pos logit contributions
 patched+0.581
 merit+0.544
minster+0.527
 elig+0.526
 consolation+0.525
Neg logit contributions
Rated-0.562
ENCY-0.502
 Draper-0.498
DEN-0.495
OWER-0.488
Token is
Token ID318
Feature activation-6.09e-02

Pos logit contributions
 patched+0.011
 merit+0.010
minster+0.010
 consolation+0.010
bridge+0.010
Neg logit contributions
Rated-0.010
ENCY-0.009
 Draper-0.009
DEN-0.009
OWER-0.009
Token powered
Token ID13232
Feature activation-8.70e-02

Pos logit contributions
Rated+0.104
ENCY+0.093
 Draper+0.092
DEN+0.092
OWER+0.090
Neg logit contributions
 patched-0.109
 merit-0.103
minster-0.101
 consolation-0.100
bridge-0.099
Token by
Token ID416
Feature activation+3.87e-02

Pos logit contributions
Rated+0.174
ENCY+0.158
 Draper+0.157
DEN+0.156
OWER+0.154
Neg logit contributions
 patched-0.132
 merit-0.123
minster-0.118
 consolation-0.117
 elig-0.116
Token Ford
Token ID8092
Feature activation-2.61e-02

Pos logit contributions
 patched+0.070
 merit+0.065
minster+0.064
 consolation+0.063
bridge+0.063
Neg logit contributions
Rated-0.066
ENCY-0.059
 Draper-0.059
DEN-0.059
OWER-0.057
Token's
Token ID338
Feature activation+8.01e-02

Pos logit contributions
Rated+0.038
ENCY+0.033
 Draper+0.033
DEN+0.033
OWER+0.032
Neg logit contributions
 patched-0.054
 merit-0.051
minster-0.050
 consolation-0.049
bridge-0.049
Token new
Token ID649
Feature activation-1.03e-01

Pos logit contributions
 patched+0.131
 merit+0.122
minster+0.119
 consolation+0.118
bridge+0.117
Neg logit contributions
Rated-0.150
ENCY-0.135
 Draper-0.135
DEN-0.134
OWER-0.132
Token,
Token ID11
Feature activation-3.57e-03

Pos logit contributions
Rated+0.182
 Draper+0.163
ENCY+0.163
DEN+0.162
OWER+0.158
Neg logit contributions
 patched-0.180
 merit-0.169
minster-0.164
 consolation-0.162
 elig-0.161
Token people
Token ID661
Feature activation-1.72e-02

Pos logit contributions
Rated+0.064
ENCY+0.058
 Draper+0.057
DEN+0.057
OWER+0.056
Neg logit contributions
 patched-0.059
 merit-0.055
minster-0.053
 consolation-0.053
bridge-0.052
Token who
Token ID508
Feature activation+9.29e-02

Pos logit contributions
Rated+0.030
ENCY+0.027
 Draper+0.026
DEN+0.026
OWER+0.026
Neg logit contributions
 patched-0.031
 merit-0.029
minster-0.028
 consolation-0.028
bridge-0.028
Token are
Token ID389
Feature activation+7.15e-02

Pos logit contributions
 patched+0.152
 merit+0.141
minster+0.138
 consolation+0.136
bridge+0.135
Neg logit contributions
Rated-0.174
ENCY-0.157
 Draper-0.156
DEN-0.155
OWER-0.153
Token subject
Token ID2426
Feature activation-1.08e-01

Pos logit contributions
 patched+0.127
 merit+0.119
minster+0.116
 consolation+0.115
bridge+0.114
Neg logit contributions
Rated-0.124
ENCY-0.111
 Draper-0.110
DEN-0.110
OWER-0.108
Token matter
Token ID2300
Feature activation-1.10e-01

Pos logit contributions
Rated+0.166
ENCY+0.147
 Draper+0.145
DEN+0.145
OWER+0.141
Neg logit contributions
 patched-0.215
 merit-0.203
minster-0.197
 consolation-0.196
bridge-0.195
Token experts
Token ID6154
Feature activation-3.01e-01

Pos logit contributions
Rated+0.219
ENCY+0.197
 Draper+0.197
DEN+0.196
OWER+0.193
Neg logit contributions
 patched-0.168
 merit-0.154
minster-0.152
bridge-0.148
 consolation-0.147
Token,
Token ID11
Feature activation-6.50e-02

Pos logit contributions
Rated+0.494
ENCY+0.440
 Draper+0.437
DEN+0.434
OWER+0.427
Neg logit contributions
 patched-0.561
 merit-0.527
minster-0.516
 consolation-0.510
 elig-0.507
Token who
Token ID508
Feature activation+4.24e-02

Pos logit contributions
Rated+0.115
ENCY+0.104
 Draper+0.102
DEN+0.102
OWER+0.100
Neg logit contributions
 patched-0.113
 merit-0.106
minster-0.103
 consolation-0.102
bridge-0.101
Token know
Token ID760
Feature activation-2.61e-02

Pos logit contributions
 patched+0.069
 merit+0.064
minster+0.063
 consolation+0.062
bridge+0.061
Neg logit contributions
Rated-0.080
ENCY-0.072
 Draper-0.072
DEN-0.071
OWER-0.070
Token how
Token ID703
Feature activation-6.54e-02

Pos logit contributions
Rated+0.047
ENCY+0.042
 Draper+0.042
DEN+0.042
OWER+0.041
Neg logit contributions
 patched-0.045
 merit-0.041
minster-0.041
 consolation-0.040
bridge-0.040
Token to
Token ID284
Feature activation-4.66e-02

Pos logit contributions
Rated+0.101
ENCY+0.090
 Draper+0.089
DEN+0.088
OWER+0.086
Neg logit contributions
 patched-0.129
 merit-0.121
minster-0.118
 consolation-0.117
bridge-0.117
Token known
Token ID1900
Feature activation+1.29e-01

Pos logit contributions
Rated+0.003
ENCY+0.002
DEN+0.002
 Draper+0.002
OWER+0.002
Neg logit contributions
 patched-0.003
 merit-0.003
minster-0.003
 consolation-0.003
bridge-0.003
Token as
Token ID355
Feature activation+6.45e-02

Pos logit contributions
 patched+0.214
 merit+0.199
minster+0.194
 consolation+0.192
bridge+0.191
Neg logit contributions
Rated-0.240
ENCY-0.216
 Draper-0.214
DEN-0.214
OWER-0.209
Token a
Token ID257
Feature activation+3.69e-03

Pos logit contributions
 patched+0.119
 merit+0.111
minster+0.108
 consolation+0.107
bridge+0.107
Neg logit contributions
Rated-0.107
ENCY-0.096
 Draper-0.095
DEN-0.095
OWER-0.093
Token "
Token ID366
Feature activation+1.92e-01

Pos logit contributions
 patched+0.007
 merit+0.006
minster+0.006
 consolation+0.006
bridge+0.006
Neg logit contributions
Rated-0.006
 Draper-0.005
ENCY-0.005
DEN-0.005
OWER-0.005
Tokenname
Token ID3672
Feature activation+7.70e-02

Pos logit contributions
 patched+0.336
 merit+0.314
minster+0.304
 consolation+0.304
 elig+0.300
Neg logit contributions
Rated-0.341
ENCY-0.305
DEN-0.304
 Draper-0.302
OWER-0.295
Token-
Token ID12
Feature activation-8.85e-02

Pos logit contributions
 patched+0.123
 merit+0.114
minster+0.113
 elig+0.110
 consolation+0.110
Neg logit contributions
Rated-0.145
DEN-0.131
ENCY-0.131
 Draper-0.130
OWER-0.128
Tokencle
Token ID2375
Feature activation+3.61e-01

Pos logit contributions
Rated+0.158
ENCY+0.141
DEN+0.140
 Draper+0.139
OWER+0.137
Neg logit contributions
 patched-0.153
 merit-0.144
minster-0.140
 consolation-0.139
 elig-0.138
Tokenaring
Token ID1723
Feature activation-5.80e-03

Pos logit contributions
 patched+0.591
 merit+0.553
minster+0.533
 consolation+0.529
bridge+0.525
Neg logit contributions
Rated-0.678
ENCY-0.615
 Draper-0.608
DEN-0.607
OWER-0.595
Token hearing
Token ID4854
Feature activation+8.10e-02

Pos logit contributions
Rated+0.010
ENCY+0.009
DEN+0.009
 Draper+0.009
OWER+0.009
Neg logit contributions
 patched-0.010
 merit-0.010
minster-0.009
 consolation-0.009
bridge-0.009
Token"
Token ID1
Feature activation-5.53e-02

Pos logit contributions
 patched+0.132
 merit+0.123
minster+0.119
 consolation+0.118
bridge+0.117
Neg logit contributions
Rated-0.152
ENCY-0.138
DEN-0.137
 Draper-0.136
OWER-0.134
Token with
Token ID351
Feature activation-3.12e-02

Pos logit contributions
Rated+0.092
ENCY+0.082
 Draper+0.082
DEN+0.081
OWER+0.080
Neg logit contributions
 patched-0.101
 merit-0.095
minster-0.093
 consolation-0.092
bridge-0.092
Tokenivating
Token ID39438
Feature activation+1.52e-02

Pos logit contributions
Rated+0.114
ENCY+0.097
 Draper+0.096
DEN+0.095
OWER+0.092
Neg logit contributions
 patched-0.217
 merit-0.207
minster-0.202
 consolation-0.201
 elig-0.200
Token the
Token ID262
Feature activation+5.16e-02

Pos logit contributions
 patched+0.028
 merit+0.026
minster+0.025
 consolation+0.025
bridge+0.025
Neg logit contributions
Rated-0.026
ENCY-0.023
 Draper-0.023
DEN-0.023
OWER-0.022
Token unit
Token ID4326
Feature activation-1.45e-01

Pos logit contributions
 patched+0.087
 merit+0.082
minster+0.079
 consolation+0.079
bridge+0.078
Neg logit contributions
Rated-0.094
ENCY-0.085
 Draper-0.084
DEN-0.083
OWER-0.082
Token.
Token ID13
Feature activation+4.58e-03

Pos logit contributions
Rated+0.247
ENCY+0.221
 Draper+0.219
DEN+0.218
OWER+0.213
Neg logit contributions
 patched-0.261
 merit-0.247
minster-0.240
 consolation-0.238
 elig-0.236
Token a
Token ID257
Feature activation-3.23e-02

Pos logit contributions
 patched+0.008
 merit+0.007
minster+0.007
 consolation+0.007
bridge+0.007
Neg logit contributions
Rated-0.008
ENCY-0.007
 Draper-0.007
DEN-0.007
OWER-0.007
Token failed
Token ID4054
Feature activation-5.09e-01

Pos logit contributions
Rated+0.047
ENCY+0.041
 Draper+0.041
DEN+0.041
OWER+0.040
Neg logit contributions
 patched-0.066
 merit-0.063
minster-0.061
 consolation-0.061
bridge-0.061
Token sensor
Token ID12694
Feature activation-1.11e-02

Pos logit contributions
Rated+0.944
 Draper+0.851
ENCY+0.850
DEN+0.841
OWER+0.821
Neg logit contributions
 patched-0.834
 merit-0.783
minster-0.765
 consolation-0.750
bridge-0.749
Token providing
Token ID4955
Feature activation+7.02e-02

Pos logit contributions
Rated+0.019
ENCY+0.017
 Draper+0.017
DEN+0.017
OWER+0.017
Neg logit contributions
 patched-0.019
 merit-0.018
minster-0.018
 consolation-0.018
 elig-0.018
Token false
Token ID3991
Feature activation-9.92e-02

Pos logit contributions
 patched+0.121
 merit+0.113
minster+0.110
 consolation+0.108
bridge+0.108
Neg logit contributions
Rated-0.125
ENCY-0.113
 Draper-0.112
DEN-0.111
OWER-0.109
Token information
Token ID1321
Feature activation+1.69e-02

Pos logit contributions
Rated+0.182
ENCY+0.163
 Draper+0.163
DEN+0.161
OWER+0.157
Neg logit contributions
 patched-0.166
 merit-0.155
minster-0.151
 consolation-0.149
 elig-0.149
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 patched+0.030
 merit+0.028
minster+0.028
 consolation+0.027
bridge+0.027
Neg logit contributions
Rated-0.029
ENCY-0.026
 Draper-0.026
DEN-0.026
OWER-0.025
Token must
Token ID1276
Feature activation-5.44e-01

Pos logit contributions
Rated+0.227
ENCY+0.202
 Draper+0.202
DEN+0.200
OWER+0.193
Neg logit contributions
 patched-0.291
 merit-0.274
minster-0.269
 consolation-0.266
 elig-0.266
Token select
Token ID2922
Feature activation-9.00e-02

Pos logit contributions
Rated+0.843
ENCY+0.741
 Draper+0.738
DEN+0.731
OWER+0.717
Neg logit contributions
 patched-1.058
 merit-0.989
minster-0.985
 elig-0.977
 facult-0.966
Token a
Token ID257
Feature activation-1.16e-01

Pos logit contributions
Rated+0.147
ENCY+0.131
 Draper+0.130
DEN+0.129
OWER+0.126
Neg logit contributions
 patched-0.169
 merit-0.158
minster-0.154
 consolation-0.154
 elig-0.153
Token newsletter
Token ID13129
Feature activation-4.71e-04

Pos logit contributions
Rated+0.200
 Draper+0.180
ENCY+0.180
DEN+0.177
OWER+0.174
Neg logit contributions
 patched-0.206
 merit-0.190
 elig-0.187
minster-0.187
 consolation-0.184
Token to
Token ID284
Feature activation-6.65e-02

Pos logit contributions
Rated+0.001
ENCY+0.001
 Draper+0.001
DEN+0.001
OWER+0.001
Neg logit contributions
 patched-0.001
 merit-0.001
minster-0.001
 consolation-0.001
 elig-0.001
Token subscribe
Token ID12383
Feature activation-5.08e-01

Pos logit contributions
Rated+0.114
ENCY+0.102
 Draper+0.101
DEN+0.101
OWER+0.098
Neg logit contributions
 patched-0.120
 merit-0.113
minster-0.109
 consolation-0.108
 elig-0.108
Token to
Token ID284
Feature activation-1.67e-01

Pos logit contributions
Rated+0.718
ENCY+0.623
 Draper+0.616
DEN+0.614
OWER+0.597
Neg logit contributions
 patched-1.072
 merit-1.013
 elig-0.988
minster-0.987
 consolation-0.985
Token.
Token ID13
Feature activation-1.78e-01

Pos logit contributions
Rated+0.278
ENCY+0.248
 Draper+0.245
DEN+0.245
OWER+0.238
Neg logit contributions
 patched-0.310
 merit-0.291
minster-0.283
 consolation-0.281
 elig-0.281
Token Sign
Token ID5865
Feature activation-2.63e-01

Pos logit contributions
Rated+0.334
ENCY+0.302
 Draper+0.300
DEN+0.299
OWER+0.293
Neg logit contributions
 patched-0.292
 merit-0.273
minster-0.264
 consolation-0.262
 elig-0.260
Token Up
Token ID3205
Feature activation-3.05e-01

Pos logit contributions
Rated+0.419
ENCY+0.372
 Draper+0.369
DEN+0.367
OWER+0.358
Neg logit contributions
 patched-0.506
 merit-0.478
minster-0.463
 consolation-0.461
 elig-0.459
Token You
Token ID921
Feature activation-2.35e-01

Pos logit contributions
Rated+0.571
 Draper+0.524
ENCY+0.523
DEN+0.518
OWER+0.507
Neg logit contributions
 patched-0.496
 merit-0.462
 elig-0.449
 consolation-0.445
minster-0.444
Token the
Token ID262
Feature activation-9.05e-03

Pos logit contributions
Rated+0.226
ENCY+0.203
 Draper+0.201
DEN+0.199
OWER+0.196
Neg logit contributions
 patched-0.240
 merit-0.225
minster-0.219
 consolation-0.217
bridge-0.217
Token plurality
Token ID36325
Feature activation+5.88e-02

Pos logit contributions
Rated+0.015
ENCY+0.014
 Draper+0.013
DEN+0.013
OWER+0.013
Neg logit contributions
 patched-0.017
 merit-0.016
minster-0.015
 consolation-0.015
bridge-0.015
Token of
Token ID286
Feature activation+2.91e-02

Pos logit contributions
 patched+0.102
 merit+0.096
minster+0.093
 consolation+0.092
 elig+0.092
Neg logit contributions
Rated-0.105
ENCY-0.094
 Draper-0.093
DEN-0.093
OWER-0.091
Token pro
Token ID386
Feature activation-1.88e-01

Pos logit contributions
 patched+0.057
 merit+0.053
minster+0.052
 consolation+0.052
bridge+0.051
Neg logit contributions
Rated-0.046
ENCY-0.041
 Draper-0.040
DEN-0.040
OWER-0.039
Token-
Token ID12
Feature activation-1.11e-02

Pos logit contributions
Rated+0.315
ENCY+0.280
 Draper+0.277
DEN+0.277
OWER+0.270
Neg logit contributions
 patched-0.347
 merit-0.326
minster-0.316
 consolation-0.315
 elig-0.313
Tokenindependence
Token ID39894
Feature activation-4.70e-01

Pos logit contributions
Rated+0.022
ENCY+0.020
DEN+0.020
 Draper+0.020
OWER+0.020
Neg logit contributions
 patched-0.017
 merit-0.016
minster-0.015
 consolation-0.015
 elig-0.015
Token parties
Token ID4671
Feature activation-2.00e-01

Pos logit contributions
Rated+0.806
ENCY+0.721
 Draper+0.717
DEN+0.713
OWER+0.699
Neg logit contributions
 patched-0.841
 merit-0.785
minster-0.763
 consolation-0.758
bridge-0.758
Token.
Token ID13
Feature activation-1.95e-02

Pos logit contributions
Rated+0.333
ENCY+0.297
 Draper+0.294
DEN+0.293
OWER+0.287
Neg logit contributions
 patched-0.369
 merit-0.348
minster-0.337
 consolation-0.335
bridge-0.333
Token But
Token ID887
Feature activation-2.27e-02

Pos logit contributions
Rated+0.034
ENCY+0.030
 Draper+0.030
DEN+0.030
OWER+0.029
Neg logit contributions
 patched-0.035
 merit-0.033
minster-0.032
 consolation-0.032
 elig-0.031
Token above
Token ID2029
Feature activation-1.99e-01

Pos logit contributions
Rated+0.039
ENCY+0.035
 Draper+0.035
DEN+0.035
OWER+0.034
Neg logit contributions
 patched-0.040
 merit-0.037
minster-0.037
bridge-0.036
 consolation-0.036
Token all
Token ID477
Feature activation-7.18e-02

Pos logit contributions
Rated+0.391
ENCY+0.354
 Draper+0.352
DEN+0.350
OWER+0.345
Neg logit contributions
 patched-0.308
 merit-0.282
minster-0.276
 elig-0.272
 consolation-0.270
Token it
Token ID340
Feature activation+3.39e-02

Pos logit contributions
Rated+0.257
 Draper+0.233
ENCY+0.232
DEN+0.230
OWER+0.227
Neg logit contributions
 patched-0.226
 merit-0.208
minster-0.206
 consolation-0.200
bridge-0.200
Token becomes
Token ID4329
Feature activation+2.39e-01

Pos logit contributions
 patched+0.060
 merit+0.057
minster+0.055
 consolation+0.055
bridge+0.054
Neg logit contributions
Rated-0.059
ENCY-0.052
 Draper-0.052
DEN-0.052
OWER-0.051
Token relevant
Token ID5981
Feature activation-1.06e-01

Pos logit contributions
 patched+0.421
 merit+0.395
minster+0.383
 consolation+0.379
bridge+0.378
Neg logit contributions
Rated-0.419
ENCY-0.376
 Draper-0.373
DEN-0.372
OWER-0.364
Token.
Token ID13
Feature activation+7.05e-03

Pos logit contributions
Rated+0.193
ENCY+0.174
 Draper+0.172
DEN+0.172
OWER+0.168
Neg logit contributions
 patched-0.179
 merit-0.167
minster-0.163
 consolation-0.160
bridge-0.160
Token And
Token ID843
Feature activation+4.11e-02

Pos logit contributions
 patched+0.012
 merit+0.012
minster+0.011
 consolation+0.011
 elig+0.011
Neg logit contributions
Rated-0.013
ENCY-0.011
 Draper-0.011
DEN-0.011
OWER-0.011
Token apparently
Token ID5729
Feature activation-4.82e-01

Pos logit contributions
 patched+0.069
 merit+0.064
minster+0.063
bridge+0.061
 consolation+0.061
Neg logit contributions
Rated-0.076
 Draper-0.069
ENCY-0.068
DEN-0.068
OWER-0.066
Token in
Token ID287
Feature activation-6.05e-02

Pos logit contributions
Rated+0.816
 Draper+0.737
DEN+0.727
ENCY+0.725
OWER+0.708
Neg logit contributions
 patched-0.874
 merit-0.813
minster-0.805
 consolation-0.788
bridge-0.786
Token Seattle
Token ID7312
Feature activation-2.62e-01

Pos logit contributions
Rated+0.090
ENCY+0.079
 Draper+0.079
DEN+0.078
OWER+0.076
Neg logit contributions
 patched-0.122
 merit-0.116
minster-0.113
 consolation-0.112
 elig-0.111
Token,
Token ID11
Feature activation-1.38e-02

Pos logit contributions
Rated+0.442
ENCY+0.395
 Draper+0.394
DEN+0.390
OWER+0.383
Neg logit contributions
 patched-0.476
 merit-0.445
minster-0.435
 consolation-0.430
bridge-0.428
Token it
Token ID340
Feature activation+1.07e-02

Pos logit contributions
Rated+0.025
ENCY+0.023
 Draper+0.023
DEN+0.023
OWER+0.022
Neg logit contributions
 patched-0.023
 merit-0.022
minster-0.021
 consolation-0.021
bridge-0.021
Token
Token ID447
Feature activation+2.93e-02

Pos logit contributions
 patched+0.019
 merit+0.018
minster+0.017
 consolation+0.017
 elig+0.017
Neg logit contributions
Rated-0.019
ENCY-0.017
 Draper-0.017
DEN-0.017
OWER-0.016
Token are
Token ID389
Feature activation+1.33e-03

Pos logit contributions
Rated+0.398
ENCY+0.356
 Draper+0.355
DEN+0.351
OWER+0.345
Neg logit contributions
 patched-0.380
 merit-0.357
minster-0.351
 consolation-0.348
 elig-0.346
Token metal
Token ID6147
Feature activation-8.66e-02

Pos logit contributions
 patched+0.002
 merit+0.002
minster+0.002
 consolation+0.002
bridge+0.002
Neg logit contributions
Rated-0.002
ENCY-0.002
 Draper-0.002
DEN-0.002
OWER-0.002
Token and
Token ID290
Feature activation+2.84e-04

Pos logit contributions
Rated+0.150
ENCY+0.132
 Draper+0.132
DEN+0.131
OWER+0.128
Neg logit contributions
 patched-0.154
 merit-0.146
minster-0.142
 consolation-0.141
 elig-0.140
Token should
Token ID815
Feature activation-5.86e-02

Pos logit contributions
 patched+0.000
 merit+0.000
minster+0.000
 consolation+0.000
bridge+0.000
Neg logit contributions
Rated-0.001
ENCY-0.000
 Draper-0.000
DEN-0.000
OWER-0.000
Token not
Token ID407
Feature activation-1.82e-02

Pos logit contributions
Rated+0.101
ENCY+0.091
DEN+0.089
 Draper+0.089
OWER+0.087
Neg logit contributions
 patched-0.104
 merit-0.097
minster-0.096
 consolation-0.095
 elig-0.094
Token snap
Token ID11495
Feature activation-5.18e-01

Pos logit contributions
Rated+0.034
ENCY+0.030
 Draper+0.030
DEN+0.030
OWER+0.029
Neg logit contributions
 patched-0.030
 merit-0.028
minster-0.028
 consolation-0.027
 elig-0.027
Token off
Token ID572
Feature activation+3.09e-01

Pos logit contributions
Rated+0.932
ENCY+0.830
 Draper+0.829
DEN+0.824
OWER+0.805
Neg logit contributions
 patched-0.879
 merit-0.826
minster-0.814
 consolation-0.805
 elig-0.800
Token anyway
Token ID6949
Feature activation+1.73e-01

Pos logit contributions
 patched+0.555
 merit+0.523
minster+0.506
 consolation+0.503
bridge+0.499
Neg logit contributions
Rated-0.530
ENCY-0.476
 Draper-0.470
DEN-0.470
OWER-0.459
TokenOnce
Token ID7454
Feature activation-2.65e-02

Pos logit contributions
 patched+0.313
 merit+0.295
minster+0.285
 consolation+0.283
bridge+0.282
Neg logit contributions
Rated-0.294
ENCY-0.264
 Draper-0.261
DEN-0.260
OWER-0.254
Token it
Token ID340
Feature activation-9.61e-02

Pos logit contributions
Rated+0.055
ENCY+0.050
 Draper+0.050
DEN+0.050
OWER+0.049
Neg logit contributions
 patched-0.037
 merit-0.035
minster-0.034
 consolation-0.034
 elig-0.033
Token's
Token ID338
Feature activation+5.58e-02

Pos logit contributions
Rated+0.174
 Draper+0.156
ENCY+0.155
DEN+0.154
OWER+0.152
Neg logit contributions
 patched-0.160
 merit-0.151
minster-0.148
 consolation-0.147
 elig-0.145