TOP ACTIVATIONS
MAX = 3.822

2011 United States 6 12 Posts # 17 Good to hear
2011 United States 228 65 Posts # 13 On September 20
TL AD T 235 21 Posts # 18 Good stuff !
2010 United States 10 53 Posts # 15 honestly if we
2010 United States 21 50 Posts # 16 Glad to hear
June 2009 Germany 247 34 Posts # 20 Nice , happy
2010 United States 6 83 Posts Last Edited : 2013 -
2002 United States 3 472 Posts # 11 On September 20
June 2012 Canada 16 10 Posts # 14 On September 20
2012 United States 5 667 Posts # 12 I 'm glad
mates , V Ps See Gallery Donald Trump passes on '
2012 United States 3 240 Posts # 10 Happy it worked
Irvine , California . Hide Caption 9 of 15 Photos :
20 , 2015 . Hide Caption 8 of 15 Photos :
Irvine , California . Hide Caption 10 of 15 Photos :
Women in politics . Hide Caption 7 of 15 Photos :
Grove , California . Hide Caption 15 of 15
been playing the attention - grab bing PR - s ab
Pasadena , California . Hide Caption 12 of 15 Photos :
Hills , California . Hide Caption 14 of 15 Photos :

MIN ACTIVATIONS
MIN = -2.377

convince the two cities let alone 59 community districts
Sir sa , while comm ending the work
Press is reporting that AT & T , the defendant in
To mark the World Wide Web s 28
AND STEP H EN M atur in first meet in April
this into perspective , AT & T 's spending for three
Spielberg 's classic film J aws in which the residents of
officer , and Stephen M atur in , a penn iless
is long ; he comm ends the Monday job - seekers
. The pilot introduced us to a LOT of
sych omen > GU YS STA HP [ 10 : 47
buy you ? While AT & T 's powerhouse FISA lobbyist
a member of the break away group . Analy
fracking . It is also doubtful that she would take effective
Last week , he comm ended Trump for conducting military strikes
support the mission of New Advent and get the full contents
urally generated universe Col on ize unch arted planets
Spain , to let s see first
S Mr á z ( Mich l í k 41 ),
not use social media to let people know what you look

INTERVAL 2.272 - 3.822
CONTAINS 0.004%

2010 United States 21 50 Posts # 16 Glad to hear
mates , V Ps See Gallery Donald Trump passes on '
2011 United States 228 65 Posts # 13 On September 20
June 2012 Canada 16 10 Posts # 14 On September 20
2011 United States 6 12 Posts # 17 Good to hear

INTERVAL 0.722 - 2.272
CONTAINS 3.213%

( 316 s )} 59 . N f 5 + {
Max ie , of Se gu in The at res Inc
t nearly as entertained by it in fact
bitcoins at last week 's price of $ 140 is still
Italy , but since the Allies had never recognised the R

INTERVAL -0.827 - 0.722
CONTAINS 94.764%

a problem worse . Some examples : The bombshell
after dropping out and becoming a drug addict . The common
, 000 dues pay ers in 1928 to a membership of
One feature of flat buff ers is that it
mar chers dead and thousands in jail . But the authorities

INTERVAL -2.377 - -0.827
CONTAINS 2.018%

are streets and al leys dotted with still more tents or
privacy issues . As my colleague Nicole O zer has discussed
more data delivery rate is limited by the application .
uries it can no longer afford . Saudi Arabia has been
of Ethiopian troops towards B aid oa , which is held
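
The four INTERVAL ranges above appear to be equal-width bins spanning MIN = -2.377 to MAX = 3.822, and the four CONTAINS figures sum to 99.999%, consistent with a partition of all activations up to rounding. A minimal sketch of that binning follows, assuming equal-width bins with the top bin closed on the right; the function names and the toy activation list are hypothetical and are not taken from the tool that produced this dump.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def interval_shares(activations, edges):
    """Percentage of activations in each bin.

    Bins are half-open [edge_i, edge_{i+1}), except the last one,
    which is closed on the right so the maximum value is counted.
    """
    n = len(edges) - 1
    counts = [0] * n
    for a in activations:
        for i in range(n):
            upper_ok = a <= edges[i + 1] if i == n - 1 else a < edges[i + 1]
            if edges[i] <= a and upper_ok:
                counts[i] += 1
                break
    total = len(activations)
    return [100.0 * c / total for c in counts]


# Edges derived from the displayed MIN and MAX; they agree with the
# INTERVAL headings to within the rounding of those displayed values.
edges = interval_edges(-2.377, 3.822)

# Hypothetical toy activations, for illustration only.
toy_shares = interval_shares([-2.0, 0.0, 0.1, 3.0], edges)
print(edges)
print(toy_shares)
```

Because MIN and MAX are themselves shown rounded to three decimals, the recomputed edges can differ from the headings in the last digit.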

Token 2011
Token ID2813
Feature activation-2.48e-01

Pos logit contributions
uld+0.394
 underscore+0.350
 Kling+0.338
��+0.336
��+0.335
Neg logit contributions
hra-0.368
camp-0.365
anas-0.364
 Niagara-0.364
 Nasa-0.361
Token United
Token ID1578
Feature activation-2.81e-01

Pos logit contributions
hra+0.165
 Niagara+0.162
camp+0.162
anas+0.162
veh+0.162
Neg logit contributions
uld-0.183
 underscore-0.163
 Kling-0.159
 twins-0.153
��-0.153
Token States
Token ID1829
Feature activation-2.92e-01

Pos logit contributions
hra+0.136
anas+0.133
camp+0.132
veh+0.132
MIT+0.131
Neg logit contributions
uld-0.258
 Kling-0.236
 underscore-0.235
 twins-0.226
 tant-0.223
Token 6
Token ID718
Feature activation-9.25e-02

Pos logit contributions
hra+0.205
 Niagara+0.203
camp+0.202
anas+0.201
veh+0.201
Neg logit contributions
uld-0.206
 underscore-0.181
 Kling-0.177
��-0.172
��-0.170
Token12
Token ID1065
Feature activation+8.47e-02

Pos logit contributions
hra+0.058
camp+0.057
anas+0.057
veh+0.057
 Niagara+0.056
Neg logit contributions
uld-0.072
 underscore-0.065
 Kling-0.064
 twins-0.061
��-0.061
Token Posts
Token ID12043
Feature activation+3.28e+00

Pos logit contributions
uld+0.065
 underscore+0.058
 Kling+0.057
��+0.056
��+0.056
Neg logit contributions
camp-0.053
hra-0.053
 Niagara-0.053
veh-0.053
anas-0.053
Token #
Token ID1303
Feature activation+1.01e+00

Pos logit contributions
uld+1.706
 underscore+1.459
 Kling+1.436
 twins+1.336
 congr+1.326
Neg logit contributions
hra-2.849
MIT-2.827
veh-2.820
anas-2.815
camp-2.806
Token17
Token ID1558
Feature activation+1.29e-01

Pos logit contributions
uld+0.729
 underscore+0.656
 Kling+0.655
��+0.628
 twins+0.624
Neg logit contributions
hra-0.677
camp-0.675
veh-0.675
anas-0.668
 Niagara-0.657
Token Good
Token ID4599
Feature activation-6.76e-01

Pos logit contributions
uld+0.086
 underscore+0.075
 Kling+0.075
��+0.072
��+0.071
Neg logit contributions
hra-0.094
camp-0.093
 Niagara-0.093
anas-0.093
 Nasa-0.092
Token to
Token ID284
Feature activation-6.43e-01

Pos logit contributions
hra+0.513
camp+0.507
anas+0.503
veh+0.503
 Niagara+0.499
Neg logit contributions
uld-0.435
 underscore-0.379
 Kling-0.377
��-0.355
 twins-0.352
Token hear
Token ID3285
Feature activation-9.02e-01

Pos logit contributions
hra+0.471
 Niagara+0.461
camp+0.460
veh+0.458
anas+0.456
Neg logit contributions
uld-0.432
 Kling-0.373
 underscore-0.370
��-0.361
��-0.357
Token 2011
Token ID2813
Feature activation-1.01e-01

Pos logit contributions
uld+0.120
 underscore+0.107
��+0.104
 Kling+0.104
��+0.103
Neg logit contributions
hra-0.110
camp-0.109
 Niagara-0.109
 Nasa-0.108
anas-0.108
Token United
Token ID1578
Feature activation-2.00e-01

Pos logit contributions
hra+0.067
 Niagara+0.066
camp+0.065
anas+0.065
veh+0.065
Neg logit contributions
uld-0.075
 underscore-0.067
 Kling-0.065
 twins-0.063
��-0.062
Token States
Token ID1829
Feature activation-2.03e-01

Pos logit contributions
hra+0.095
camp+0.093
anas+0.093
veh+0.093
MIT+0.092
Neg logit contributions
uld-0.186
 Kling-0.170
 underscore-0.169
 twins-0.163
 tant-0.160
Token 228
Token ID29041
Feature activation+8.16e-02

Pos logit contributions
hra+0.142
 Niagara+0.140
camp+0.140
anas+0.139
veh+0.139
Neg logit contributions
uld-0.143
 underscore-0.126
 Kling-0.124
��-0.119
��-0.118
Token65
Token ID2996
Feature activation-1.01e-02

Pos logit contributions
uld+0.061
 underscore+0.054
 Kling+0.053
��+0.051
��+0.051
Neg logit contributions
hra-0.053
camp-0.053
 Niagara-0.053
veh-0.053
anas-0.052
Token Posts
Token ID12043
Feature activation+3.10e+00

Pos logit contributions
camp+0.006
hra+0.006
 Niagara+0.006
veh+0.006
 AMA+0.006
Neg logit contributions
uld-0.008
 underscore-0.007
��-0.007
��-0.007
 Kling-0.007
Token #
Token ID1303
Feature activation+1.02e+00

Pos logit contributions
uld+1.641
 underscore+1.404
 Kling+1.383
 twins+1.287
 congr+1.275
Neg logit contributions
hra-2.663
MIT-2.646
veh-2.638
anas-2.637
camp-2.629
Token13
Token ID1485
Feature activation-7.93e-02

Pos logit contributions
uld+0.707
 underscore+0.624
 Kling+0.619
 twins+0.589
��+0.587
Neg logit contributions
hra-0.731
camp-0.724
veh-0.723
anas-0.717
arkin-0.711
Token On
Token ID1550
Feature activation-3.24e-01

Pos logit contributions
hra+0.058
camp+0.057
 Niagara+0.057
anas+0.057
 Nasa+0.057
Neg logit contributions
uld-0.053
 underscore-0.046
 Kling-0.046
��-0.044
��-0.044
Token September
Token ID2693
Feature activation-1.55e-01

Pos logit contributions
hra+0.197
 Niagara+0.193
camp+0.191
 Nasa+0.190
veh+0.190
Neg logit contributions
uld-0.258
 underscore-0.231
 Kling-0.231
��-0.223
��-0.222
Token 20
Token ID1160
Feature activation-7.63e-01

Pos logit contributions
hra+0.127
anas+0.126
camp+0.125
veh+0.125
 Niagara+0.124
Neg logit contributions
uld-0.091
 underscore-0.078
 Kling-0.078
 twins-0.072
 tant-0.071
Token TL
Token ID24811
Feature activation+3.74e-01

Pos logit contributions
hra+0.220
camp+0.216
anas+0.216
veh+0.215
 Niagara+0.214
Neg logit contributions
uld-0.237
 underscore-0.211
 Kling-0.209
 twins-0.198
��-0.196
TokenAD
Token ID2885
Feature activation+7.55e-02

Pos logit contributions
uld+0.277
 underscore+0.246
 Kling+0.245
 twins+0.230
��+0.230
Neg logit contributions
hra-0.247
camp-0.245
anas-0.243
veh-0.243
 Niagara-0.242
TokenT
Token ID51
Feature activation+1.18e-01

Pos logit contributions
uld+0.056
 underscore+0.049
 Kling+0.048
��+0.046
��+0.046
Neg logit contributions
hra-0.051
camp-0.050
anas-0.050
MIT-0.050
 Niagara-0.050
Token 235
Token ID28878
Feature activation-1.22e-01

Pos logit contributions
uld+0.090
 underscore+0.080
 Kling+0.079
 twins+0.075
��+0.075
Neg logit contributions
hra-0.076
camp-0.076
anas-0.075
veh-0.075
 Niagara-0.075
Token21
Token ID2481
Feature activation-6.85e-02

Pos logit contributions
hra+0.082
camp+0.081
anas+0.081
veh+0.081
 Niagara+0.081
Neg logit contributions
uld-0.089
 underscore-0.079
 Kling-0.078
 twins-0.074
��-0.074
Token Posts
Token ID12043
Feature activation+3.04e+00

Pos logit contributions
camp+0.044
hra+0.043
 Niagara+0.043
veh+0.043
anas+0.043
Neg logit contributions
uld-0.053
 underscore-0.047
 Kling-0.046
��-0.045
��-0.045
Token #
Token ID1303
Feature activation+9.20e-01

Pos logit contributions
uld+1.601
 underscore+1.366
 Kling+1.346
 twins+1.250
 congr+1.249
Neg logit contributions
hra-2.630
veh-2.608
MIT-2.608
anas-2.607
camp-2.596
Token18
Token ID1507
Feature activation-1.28e-01

Pos logit contributions
uld+0.582
 underscore+0.519
 Kling+0.515
��+0.489
 twins+0.488
Neg logit contributions
hra-0.705
veh-0.701
camp-0.696
anas-0.695
 Niagara-0.682
Token Good
Token ID4599
Feature activation-7.25e-01

Pos logit contributions
hra+0.094
camp+0.092
 Niagara+0.092
anas+0.092
 AMA+0.091
Neg logit contributions
uld-0.086
 underscore-0.075
 Kling-0.074
��-0.071
��-0.071
Token stuff
Token ID3404
Feature activation-1.03e+00

Pos logit contributions
hra+0.556
camp+0.548
veh+0.544
anas+0.543
 Niagara+0.541
Neg logit contributions
uld-0.459
 underscore-0.400
 Kling-0.398
��-0.375
��-0.371
Token!
Token ID0
Feature activation-3.60e-01

Pos logit contributions
hra+0.668
camp+0.655
anas+0.653
veh+0.650
 Niagara+0.649
Neg logit contributions
uld-0.769
 underscore-0.686
 Kling-0.681
��-0.647
 twins-0.644
Token 2010
Token ID3050
Feature activation-1.52e-01

Pos logit contributions
uld+0.428
 underscore+0.380
 Kling+0.371
��+0.361
��+0.357
Neg logit contributions
hra-0.421
camp-0.417
anas-0.416
 Niagara-0.415
veh-0.413
Token United
Token ID1578
Feature activation+4.81e-03

Pos logit contributions
hra+0.101
camp+0.100
 Niagara+0.100
anas+0.100
veh+0.099
Neg logit contributions
uld-0.112
 underscore-0.100
 Kling-0.098
 twins-0.094
��-0.093
Token States
Token ID1829
Feature activation-1.54e-01

Pos logit contributions
uld+0.004
 Kling+0.004
 underscore+0.004
 twins+0.004
 tant+0.004
Neg logit contributions
hra-0.002
camp-0.002
veh-0.002
anas-0.002
MIT-0.002
Token 10
Token ID838
Feature activation-8.74e-02

Pos logit contributions
hra+0.108
camp+0.107
 Niagara+0.107
anas+0.106
veh+0.106
Neg logit contributions
uld-0.108
 underscore-0.095
 Kling-0.093
��-0.089
��-0.089
Token53
Token ID4310
Feature activation-7.08e-02

Pos logit contributions
hra+0.055
camp+0.055
anas+0.054
veh+0.054
 Niagara+0.054
Neg logit contributions
uld-0.067
 underscore-0.060
 Kling-0.060
 twins-0.057
��-0.057
Token Posts
Token ID12043
Feature activation+2.92e+00

Pos logit contributions
camp+0.044
hra+0.043
 Niagara+0.043
anas+0.043
veh+0.043
Neg logit contributions
uld-0.056
 underscore-0.050
��-0.049
��-0.048
 Kling-0.048
Token #
Token ID1303
Feature activation+8.84e-01

Pos logit contributions
uld+1.559
 underscore+1.336
 Kling+1.320
 twins+1.227
 congr+1.216
Neg logit contributions
hra-2.505
MIT-2.480
anas-2.479
veh-2.479
camp-2.466
Token15
Token ID1314
Feature activation-2.25e-01

Pos logit contributions
uld+0.591
 underscore+0.524
 Kling+0.515
��+0.500
 twins+0.493
Neg logit contributions
camp-0.658
veh-0.656
hra-0.654
anas-0.653
arkin-0.636
Token honestly
Token ID12698
Feature activation-1.79e-01

Pos logit contributions
hra+0.165
camp+0.163
 Niagara+0.163
anas+0.162
 Nasa+0.161
Neg logit contributions
uld-0.149
 underscore-0.130
 Kling-0.129
��-0.124
��-0.123
Token if
Token ID611
Feature activation-3.51e-01

Pos logit contributions
hra+0.121
camp+0.119
anas+0.118
veh+0.117
 Niagara+0.117
Neg logit contributions
uld-0.130
 underscore-0.116
 Kling-0.115
��-0.109
 twins-0.108
Token we
Token ID356
Feature activation+1.24e-01

Pos logit contributions
hra+0.233
camp+0.231
anas+0.229
 Niagara+0.228
veh+0.227
Neg logit contributions
uld-0.258
 underscore-0.230
 Kling-0.228
��-0.216
 twins-0.215
Token 2010
Token ID3050
Feature activation+1.23e-01

Pos logit contributions
uld+0.238
 underscore+0.212
 Kling+0.205
��+0.204
��+0.203
Neg logit contributions
hra-0.221
camp-0.218
 Niagara-0.218
anas-0.218
 Nasa-0.217
Token United
Token ID1578
Feature activation+3.93e-02

Pos logit contributions
uld+0.090
 underscore+0.080
 Kling+0.079
 twins+0.075
��+0.074
Neg logit contributions
hra-0.083
anas-0.081
camp-0.081
veh-0.081
 Niagara-0.081
Token States
Token ID1829
Feature activation-8.97e-02

Pos logit contributions
uld+0.036
 Kling+0.033
 underscore+0.033
 twins+0.032
 tant+0.032
Neg logit contributions
hra-0.019
camp-0.018
veh-0.018
anas-0.018
MIT-0.018
Token 21
Token ID2310
Feature activation+4.04e-01

Pos logit contributions
hra+0.064
 Niagara+0.063
camp+0.063
anas+0.062
veh+0.062
Neg logit contributions
uld-0.063
 underscore-0.055
 Kling-0.054
��-0.052
��-0.051
Token50
Token ID1120
Feature activation-2.62e-01

Pos logit contributions
uld+0.268
 underscore+0.236
 Kling+0.231
��+0.219
 twins+0.219
Neg logit contributions
hra-0.300
camp-0.296
 Niagara-0.295
anas-0.294
veh-0.293
Token Posts
Token ID12043
Feature activation+2.91e+00

Pos logit contributions
camp+0.159
 Niagara+0.159
hra+0.158
 AMA+0.157
anas+0.157
Neg logit contributions
uld-0.207
 underscore-0.187
��-0.184
��-0.183
 Kling-0.178
Token #
Token ID1303
Feature activation+8.86e-01

Pos logit contributions
uld+1.548
 underscore+1.327
 Kling+1.309
 twins+1.215
 congr+1.211
Neg logit contributions
hra-2.503
anas-2.482
veh-2.480
MIT-2.478
camp-2.470
Token16
Token ID1433
Feature activation+7.07e-03

Pos logit contributions
uld+0.514
 underscore+0.458
 Kling+0.455
��+0.451
��+0.436
Neg logit contributions
hra-0.721
camp-0.720
veh-0.717
anas-0.709
MIT-0.699
Token Glad
Token ID28442
Feature activation-3.08e-01

Pos logit contributions
uld+0.005
 underscore+0.004
 Kling+0.004
��+0.004
��+0.004
Neg logit contributions
hra-0.005
camp-0.005
 Niagara-0.005
anas-0.005
 Nasa-0.005
Token to
Token ID284
Feature activation-1.80e-01

Pos logit contributions
hra+0.217
camp+0.215
anas+0.214
 Niagara+0.212
veh+0.211
Neg logit contributions
uld-0.215
 underscore-0.189
 Kling-0.188
��-0.178
 twins-0.177
Token hear
Token ID3285
Feature activation-6.43e-01

Pos logit contributions
hra+0.128
camp+0.126
veh+0.125
 Niagara+0.124
anas+0.124
Neg logit contributions
uld-0.125
 Kling-0.109
 underscore-0.108
��-0.106
��-0.105
Token June
Token ID2795
Feature activation-6.33e-02

Pos logit contributions
uld+0.969
 underscore+0.867
 Kling+0.862
 twins+0.820
��+0.813
Neg logit contributions
hra-0.760
camp-0.749
anas-0.747
veh-0.744
 Niagara-0.740
Token 2009
Token ID3717
Feature activation-3.95e-01

Pos logit contributions
hra+0.043
camp+0.042
 Niagara+0.042
anas+0.042
 Nasa+0.042
Neg logit contributions
uld-0.046
 underscore-0.041
 Kling-0.040
��-0.040
��-0.039
Token Germany
Token ID4486
Feature activation-5.96e-01

Pos logit contributions
hra+0.260
camp+0.257
 Niagara+0.257
anas+0.256
veh+0.256
Neg logit contributions
uld-0.293
 underscore-0.261
 Kling-0.256
 twins-0.245
��-0.245
Token 247
Token ID30179
Feature activation+2.25e-01

Pos logit contributions
hra+0.411
 Niagara+0.410
camp+0.407
anas+0.405
 Nasa+0.404
Neg logit contributions
uld-0.426
 underscore-0.376
 Kling-0.366
��-0.359
��-0.357
Token34
Token ID2682
Feature activation-6.91e-02

Pos logit contributions
uld+0.165
 underscore+0.147
 Kling+0.144
��+0.139
��+0.138
Neg logit contributions
hra-0.150
camp-0.149
 Niagara-0.148
veh-0.147
anas-0.147
Token Posts
Token ID12043
Feature activation+2.90e+00

Pos logit contributions
camp+0.043
hra+0.043
 Niagara+0.043
veh+0.043
anas+0.043
Neg logit contributions
uld-0.054
 underscore-0.048
��-0.047
��-0.047
 Kling-0.046
Token #
Token ID1303
Feature activation+1.04e+00

Pos logit contributions
uld+1.558
 underscore+1.333
 Kling+1.316
 twins+1.222
 congr+1.217
Neg logit contributions
hra-2.489
anas-2.463
veh-2.462
MIT-2.459
camp-2.454
Token20
Token ID1238
Feature activation-1.60e-01

Pos logit contributions
uld+0.699
 underscore+0.627
 Kling+0.611
��+0.599
��+0.589
Neg logit contributions
camp-0.756
veh-0.755
hra-0.754
anas-0.747
MIT-0.740
Token Nice
Token ID18460
Feature activation-4.73e-01

Pos logit contributions
hra+0.116
camp+0.115
 Niagara+0.114
anas+0.114
 AMA+0.113
Neg logit contributions
uld-0.107
 underscore-0.094
 Kling-0.093
��-0.089
��-0.089
Token,
Token ID11
Feature activation-6.50e-01

Pos logit contributions
hra+0.340
camp+0.334
anas+0.332
veh+0.331
 Niagara+0.329
Neg logit contributions
uld-0.324
 underscore-0.284
 Kling-0.283
��-0.269
 twins-0.266
Token happy
Token ID3772
Feature activation-3.47e-01

Pos logit contributions
hra+0.451
camp+0.442
anas+0.442
 Niagara+0.439
 AMA+0.436
Neg logit contributions
uld-0.460
 underscore-0.406
 Kling-0.402
��-0.387
��-0.382
Token 2010
Token ID3050
Feature activation-7.11e-01

Pos logit contributions
hra+0.303
 Niagara+0.301
camp+0.301
 Nasa+0.301
anas+0.298
Neg logit contributions
uld-0.331
 underscore-0.296
��-0.287
��-0.285
 Kling-0.281
Token United
Token ID1578
Feature activation-6.18e-01

Pos logit contributions
hra+0.472
 Niagara+0.464
camp+0.464
anas+0.464
veh+0.462
Neg logit contributions
uld-0.525
 underscore-0.468
 Kling-0.459
 twins-0.439
��-0.438
Token States
Token ID1829
Feature activation-5.76e-01

Pos logit contributions
hra+0.267
camp+0.262
anas+0.261
 Niagara+0.260
veh+0.259
Neg logit contributions
uld-0.601
 underscore-0.551
 Kling-0.542
��-0.527
��-0.525
Token 6
Token ID718
Feature activation-3.18e-01

Pos logit contributions
hra+0.398
 Niagara+0.394
camp+0.393
anas+0.392
veh+0.391
Neg logit contributions
uld-0.411
 underscore-0.363
 Kling-0.356
��-0.344
��-0.341
Token83
Token ID5999
Feature activation-5.03e-01

Pos logit contributions
hra+0.210
camp+0.207
anas+0.206
veh+0.206
 Niagara+0.206
Neg logit contributions
uld-0.236
 underscore-0.210
 Kling-0.208
��-0.199
 twins-0.198
Token Posts
Token ID12043
Feature activation+2.87e+00

Pos logit contributions
camp+0.309
hra+0.308
 Niagara+0.306
veh+0.304
anas+0.303
Neg logit contributions
uld-0.396
 underscore-0.358
��-0.348
��-0.345
 Kling-0.345
Token Last
Token ID4586
Feature activation-9.68e-01

Pos logit contributions
uld+1.519
 underscore+1.297
 Kling+1.280
 twins+1.190
 congr+1.181
Neg logit contributions
hra-2.480
MIT-2.456
veh-2.450
anas-2.446
camp-2.441
Token Edited
Token ID34212
Feature activation+5.74e-01

Pos logit contributions
hra+0.573
camp+0.569
 AMA+0.563
anas+0.559
 Nasa+0.558
Neg logit contributions
uld-0.791
 underscore-0.701
 Kling-0.700
��-0.693
��-0.691
Token:
Token ID25
Feature activation-1.64e-01

Pos logit contributions
uld+0.425
 underscore+0.379
 Kling+0.374
��+0.364
��+0.359
Neg logit contributions
hra-0.382
camp-0.378
veh-0.375
 Niagara-0.373
anas-0.371
Token 2013
Token ID2211
Feature activation-4.47e-01

Pos logit contributions
hra+0.104
camp+0.102
 Niagara+0.102
veh+0.101
 Nasa+0.101
Neg logit contributions
uld-0.128
 underscore-0.114
 Kling-0.112
��-0.109
��-0.109
Token-
Token ID12
Feature activation-5.02e-01

Pos logit contributions
hra+0.259
camp+0.258
veh+0.253
 Niagara+0.253
anas+0.252
Neg logit contributions
uld-0.369
 underscore-0.333
 Kling-0.327
��-0.323
��-0.321
Token 2002
Token ID6244
Feature activation-3.92e-01

Pos logit contributions
uld+0.231
 underscore+0.206
 Kling+0.200
��+0.197
��+0.197
Neg logit contributions
hra-0.206
camp-0.204
anas-0.203
 Niagara-0.203
 Nasa-0.202
Token United
Token ID1578
Feature activation-1.59e-01

Pos logit contributions
hra+0.262
 Niagara+0.258
camp+0.258
anas+0.257
veh+0.256
Neg logit contributions
uld-0.288
 underscore-0.256
 Kling-0.250
��-0.241
��-0.241
Token States
Token ID1829
Feature activation-3.32e-01

Pos logit contributions
hra+0.071
camp+0.070
anas+0.069
veh+0.069
MIT+0.069
Neg logit contributions
uld-0.152
 Kling-0.139
 underscore-0.138
 twins-0.133
 tant-0.131
Token 3
Token ID513
Feature activation+1.18e-01

Pos logit contributions
hra+0.231
 Niagara+0.228
camp+0.227
anas+0.226
veh+0.226
Neg logit contributions
uld-0.236
 underscore-0.209
 Kling-0.205
��-0.197
��-0.196
Token472
Token ID37856
Feature activation-4.70e-01

Pos logit contributions
uld+0.096
 underscore+0.086
 Kling+0.086
 twins+0.082
��+0.082
Neg logit contributions
hra-0.069
camp-0.068
anas-0.068
veh-0.068
 Niagara-0.067
Token Posts
Token ID12043
Feature activation+2.86e+00

Pos logit contributions
camp+0.282
hra+0.280
anas+0.280
 Niagara+0.279
 AMA+0.279
Neg logit contributions
uld-0.377
 underscore-0.339
��-0.335
��-0.333
 Kling-0.324
Token #
Token ID1303
Feature activation+8.16e-01

Pos logit contributions
uld+1.547
 underscore+1.333
 Kling+1.316
 twins+1.231
 congr+1.217
Neg logit contributions
hra-2.436
veh-2.405
MIT-2.400
camp-2.397
anas-2.396
Token11
Token ID1157
Feature activation-2.69e-01

Pos logit contributions
uld+0.493
 underscore+0.433
 Kling+0.430
��+0.418
��+0.413
Neg logit contributions
veh-0.651
camp-0.649
hra-0.648
anas-0.642
arkin-0.639
Token On
Token ID1550
Feature activation-4.79e-01

Pos logit contributions
hra+0.195
camp+0.193
 Niagara+0.192
anas+0.192
 Nasa+0.192
Neg logit contributions
uld-0.180
 underscore-0.158
 Kling-0.156
��-0.151
��-0.150
Token September
Token ID2693
Feature activation-9.18e-02

Pos logit contributions
hra+0.293
camp+0.288
anas+0.287
veh+0.286
 Niagara+0.285
Neg logit contributions
uld-0.379
 underscore-0.341
 Kling-0.339
 twins-0.322
��-0.320
Token 20
Token ID1160
Feature activation-8.39e-01

Pos logit contributions
hra+0.068
camp+0.067
 Niagara+0.067
anas+0.067
 Nasa+0.066
Neg logit contributions
uld-0.062
 underscore-0.054
��-0.053
 Kling-0.053
��-0.052
Token June
Token ID2795
Feature activation+1.63e-01

Pos logit contributions
uld+1.239
 underscore+1.107
 Kling+1.102
 twins+1.048
��+1.034
Neg logit contributions
hra-0.992
camp-0.976
anas-0.974
veh-0.971
 Niagara-0.963
Token 2012
Token ID2321
Feature activation-3.24e-01

Pos logit contributions
uld+0.119
 underscore+0.106
 Kling+0.103
��+0.101
��+0.101
Neg logit contributions
hra-0.110
camp-0.109
 Niagara-0.108
anas-0.108
 Nasa-0.108
Token Canada
Token ID3340
Feature activation+2.08e-01

Pos logit contributions
hra+0.215
 Niagara+0.212
camp+0.211
anas+0.211
veh+0.210
Neg logit contributions
uld-0.240
 underscore-0.214
 Kling-0.209
��-0.201
 twins-0.201
Token 16
Token ID1467
Feature activation-2.15e-02

Pos logit contributions
uld+0.150
 underscore+0.132
 Kling+0.129
��+0.126
��+0.125
Neg logit contributions
hra-0.143
 Niagara-0.142
camp-0.141
anas-0.140
 Nasa-0.140
Token10
Token ID940
Feature activation-3.70e-01

Pos logit contributions
hra+0.014
camp+0.014
anas+0.014
veh+0.014
 Niagara+0.014
Neg logit contributions
uld-0.016
 underscore-0.014
 Kling-0.014
��-0.014
 twins-0.014
Token Posts
Token ID12043
Feature activation+2.81e+00

Pos logit contributions
camp+0.227
 Niagara+0.227
hra+0.225
 AMA+0.225
veh+0.225
Neg logit contributions
uld-0.289
 underscore-0.260
��-0.256
��-0.252
 Kling-0.250
Token #
Token ID1303
Feature activation+9.53e-01

Pos logit contributions
uld+1.498
 underscore+1.287
 Kling+1.271
 twins+1.179
 congr+1.165
Neg logit contributions
hra-2.418
MIT-2.397
anas-2.393
veh-2.391
camp-2.385
Token14
Token ID1415
Feature activation-5.46e-01

Pos logit contributions
uld+0.623
 underscore+0.549
 Kling+0.544
��+0.516
 twins+0.515
Neg logit contributions
hra-0.717
veh-0.711
camp-0.708
anas-0.703
arkin-0.697
Token On
Token ID1550
Feature activation-3.45e-01

Pos logit contributions
hra+0.400
camp+0.396
 Niagara+0.395
 Nasa+0.392
anas+0.392
Neg logit contributions
uld-0.363
 underscore-0.316
 Kling-0.314
��-0.304
��-0.302
Token September
Token ID2693
Feature activation-2.63e-01

Pos logit contributions
hra+0.208
 Niagara+0.204
 Nasa+0.202
camp+0.202
veh+0.201
Neg logit contributions
uld-0.276
 underscore-0.248
 Kling-0.248
��-0.240
��-0.239
Token 20
Token ID1160
Feature activation-1.00e+00

Pos logit contributions
hra+0.222
camp+0.220
veh+0.219
anas+0.219
 Niagara+0.217
Neg logit contributions
uld-0.147
 underscore-0.125
 Kling-0.125
 twins-0.115
 tant-0.113
Token 2012
Token ID2321
Feature activation-4.96e-01

Pos logit contributions
uld+0.171
 underscore+0.153
 Kling+0.148
��+0.146
��+0.146
Neg logit contributions
hra-0.154
camp-0.153
anas-0.152
 Niagara-0.152
 Nasa-0.152
Token United
Token ID1578
Feature activation-5.19e-01

Pos logit contributions
hra+0.329
 Niagara+0.324
camp+0.323
anas+0.322
veh+0.321
Neg logit contributions
uld-0.368
 underscore-0.328
 Kling-0.320
��-0.309
 twins-0.308
Token States
Token ID1829
Feature activation-5.90e-01

Pos logit contributions
hra+0.239
camp+0.233
anas+0.232
veh+0.232
MIT+0.230
Neg logit contributions
uld-0.491
 Kling-0.450
 underscore-0.447
 twins-0.431
 tant-0.425
Token 5
Token ID642
Feature activation-1.01e-01

Pos logit contributions
hra+0.413
 Niagara+0.409
camp+0.408
anas+0.406
veh+0.405
Neg logit contributions
uld-0.416
 underscore-0.367
 Kling-0.359
��-0.348
��-0.346
Token667
Token ID28933
Feature activation+8.19e-02

Pos logit contributions
hra+0.064
camp+0.063
anas+0.063
veh+0.062
 Niagara+0.062
Neg logit contributions
uld-0.079
 underscore-0.070
 Kling-0.070
 twins-0.066
��-0.066
Token Posts
Token ID12043
Feature activation+2.70e+00

Pos logit contributions
uld+0.064
 underscore+0.058
��+0.057
��+0.057
 Kling+0.056
Neg logit contributions
camp-0.050
hra-0.050
 Niagara-0.050
anas-0.049
 AMA-0.049
Token #
Token ID1303
Feature activation+5.97e-01

Pos logit contributions
uld+1.450
 underscore+1.244
 Kling+1.223
 twins+1.143
 congr+1.125
Neg logit contributions
hra-2.317
MIT-2.292
veh-2.289
anas-2.286
camp-2.285
Token12
Token ID1065
Feature activation-4.42e-01

Pos logit contributions
uld+0.341
 underscore+0.293
 Kling+0.286
��+0.285
��+0.284
Neg logit contributions
veh-0.501
hra-0.499
camp-0.498
arkin-0.486
MIT-0.486
Token I
Token ID314
Feature activation-5.82e-01

Pos logit contributions
hra+0.322
camp+0.318
 Niagara+0.318
anas+0.316
 Nasa+0.316
Neg logit contributions
uld-0.295
 underscore-0.257
 Kling-0.256
��-0.246
��-0.244
Token'm
Token ID1101
Feature activation-1.09e+00

Pos logit contributions
hra+0.391
anas+0.383
camp+0.381
veh+0.379
 Niagara+0.378
Neg logit contributions
uld-0.422
 Kling-0.379
 underscore-0.375
��-0.360
��-0.358
Token glad
Token ID9675
Feature activation-8.90e-01

Pos logit contributions
hra+0.783
camp+0.771
anas+0.766
 Niagara+0.762
 Nasa+0.754
Neg logit contributions
uld-0.747
 underscore-0.656
 Kling-0.648
��-0.626
��-0.614
Token mates
Token ID27973
Feature activation+2.61e-02

Pos logit contributions
hra+0.509
anas+0.506
camp+0.505
veh+0.501
 Niagara+0.500
Neg logit contributions
uld-0.376
 underscore-0.319
 Kling-0.319
��-0.298
 congr-0.293
Token,
Token ID11
Feature activation-1.83e-01

Pos logit contributions
uld+0.021
 Kling+0.019
 underscore+0.019
 twins+0.018
��+0.018
Neg logit contributions
hra-0.016
anas-0.015
camp-0.015
veh-0.015
 Niagara-0.015
Token V
Token ID569
Feature activation-7.40e-01

Pos logit contributions
hra+0.127
anas+0.126
camp+0.125
veh+0.124
 Niagara+0.124
Neg logit contributions
uld-0.130
 Kling-0.113
 underscore-0.113
��-0.108
 twins-0.106
TokenPs
Token ID12016
Feature activation-3.71e-02

Pos logit contributions
hra+0.497
anas+0.496
camp+0.492
veh+0.485
arkin+0.485
Neg logit contributions
uld-0.533
 Kling-0.477
 underscore-0.473
��-0.455
 twins-0.449
Token See
Token ID4091
Feature activation+8.44e-02

Pos logit contributions
hra+0.024
anas+0.024
camp+0.024
veh+0.023
 Niagara+0.023
Neg logit contributions
uld-0.028
 Kling-0.025
 underscore-0.025
 twins-0.024
��-0.023
Token Gallery
Token ID12917
Feature activation+2.52e+00

Pos logit contributions
uld+0.074
 underscore+0.066
 Kling+0.066
 twins+0.063
��+0.063
Neg logit contributions
hra-0.045
camp-0.044
veh-0.044
anas-0.044
 Niagara-0.044
Token Donald
Token ID3759
Feature activation-1.11e+00

Pos logit contributions
uld+1.679
 underscore+1.482
 Kling+1.478
 twins+1.393
 tant+1.362
Neg logit contributions
hra-1.844
anas-1.810
camp-1.809
veh-1.800
arkin-1.781
Token Trump
Token ID1301
Feature activation-9.96e-01

Pos logit contributions
hra+0.616
 Nasa+0.612
 Niagara+0.612
camp+0.611
anas+0.611
Neg logit contributions
uld-0.943
 underscore-0.832
��-0.817
 Kling-0.812
��-0.806
Token passes
Token ID8318
Feature activation+7.74e-02

Pos logit contributions
hra+0.599
anas+0.590
 Niagara+0.587
camp+0.587
veh+0.586
Neg logit contributions
uld-0.800
 underscore-0.714
 Kling-0.707
��-0.693
��-0.681
Token on
Token ID319
Feature activation-1.79e-01

Pos logit contributions
uld+0.053
 underscore+0.046
 Kling+0.046
 twins+0.043
��+0.043
Neg logit contributions
hra-0.056
anas-0.055
camp-0.055
veh-0.055
 Niagara-0.054
Token '
Token ID705
Feature activation-8.94e-01

Pos logit contributions
hra+0.141
anas+0.139
camp+0.139
veh+0.138
 Niagara+0.137
Neg logit contributions
uld-0.111
 underscore-0.096
 Kling-0.096
 twins-0.089
��-0.088
Token 2012
Token ID2321
Feature activation-3.08e-01

Pos logit contributions
hra+0.076
anas+0.076
camp+0.076
 Niagara+0.075
 Nasa+0.075
Neg logit contributions
uld-0.079
 underscore-0.070
 Kling-0.068
��-0.067
��-0.066
Token United
Token ID1578
Feature activation-2.12e-01

Pos logit contributions
hra+0.206
 Niagara+0.203
camp+0.203
anas+0.203
veh+0.202
Neg logit contributions
uld-0.226
 underscore-0.201
 Kling-0.196
��-0.189
 twins-0.188
Token States
Token ID1829
Feature activation-3.97e-01

Pos logit contributions
hra+0.092
camp+0.090
anas+0.090
veh+0.090
 Niagara+0.088
Neg logit contributions
uld-0.205
 underscore-0.188
 Kling-0.187
 twins-0.180
 tant-0.178
Token 3
Token ID513
Feature activation+2.23e-01

Pos logit contributions
hra+0.280
 Niagara+0.278
camp+0.277
anas+0.276
veh+0.276
Neg logit contributions
uld-0.277
 underscore-0.244
 Kling-0.239
��-0.230
��-0.228
Token240
Token ID16102
Feature activation+1.20e-01

Pos logit contributions
uld+0.172
 underscore+0.153
 Kling+0.152
 twins+0.145
��+0.143
Neg logit contributions
hra-0.141
camp-0.139
anas-0.139
veh-0.138
 Niagara-0.138
Token Posts
Token ID12043
Feature activation+2.48e+00

Pos logit contributions
uld+0.095
 underscore+0.087
��+0.085
 Kling+0.084
��+0.083
Neg logit contributions
 Niagara-0.072
camp-0.071
hra-0.071
 AMA-0.070
anas-0.070
Token #
Token ID1303
Feature activation+6.67e-01

Pos logit contributions
uld+1.404
 underscore+1.215
 Kling+1.203
 twins+1.125
 congr+1.108
Neg logit contributions
hra-2.055
veh-2.024
camp-2.020
anas-2.019
MIT-2.013
Token10
Token ID940
Feature activation-4.97e-01

Pos logit contributions
uld+0.436
 underscore+0.380
 Kling+0.377
 twins+0.356
��+0.354
Neg logit contributions
hra-0.503
camp-0.499
veh-0.497
anas-0.495
 Niagara-0.492
Token Happy
Token ID14628
Feature activation-6.76e-02

Pos logit contributions
hra+0.367
camp+0.365
 Niagara+0.362
anas+0.362
veh+0.360
Neg logit contributions
uld-0.327
 underscore-0.287
 Kling-0.284
��-0.271
��-0.268
Token it
Token ID340
Feature activation-4.09e-01

Pos logit contributions
hra+0.048
camp+0.048
anas+0.047
veh+0.047
 Niagara+0.047
Neg logit contributions
uld-0.047
 underscore-0.041
 Kling-0.040
��-0.038
 twins-0.038
Token worked
Token ID3111
Feature activation-9.44e-02

Pos logit contributions
hra+0.276
anas+0.271
camp+0.270
veh+0.269
 Niagara+0.267
Neg logit contributions
uld-0.299
 underscore-0.266
 Kling-0.265
 twins-0.250
��-0.249
Token Irvine
Token ID37277
Feature activation+5.94e-01

Pos logit contributions
hra+0.024
anas+0.023
camp+0.023
 Niagara+0.023
veh+0.023
Neg logit contributions
uld-0.078
 underscore-0.071
 Kling-0.071
 twins-0.069
��-0.068
Token,
Token ID11
Feature activation+1.91e-01

Pos logit contributions
uld+0.454
 underscore+0.406
 Kling+0.405
 twins+0.384
 tant+0.378
Neg logit contributions
hra-0.379
camp-0.372
anas-0.372
veh-0.371
 Niagara-0.367
Token California
Token ID3442
Feature activation+8.45e-01

Pos logit contributions
uld+0.201
 underscore+0.185
 Kling+0.184
��+0.179
��+0.178
Neg logit contributions
hra-0.066
anas-0.065
 Nasa-0.065
veh-0.065
camp-0.064
Token.
Token ID13
Feature activation+6.26e-01

Pos logit contributions
uld+0.629
 Kling+0.561
 underscore+0.558
 twins+0.527
 congr+0.519
Neg logit contributions
hra-0.558
camp-0.546
veh-0.544
anas-0.542
 Niagara-0.540
Token Hide
Token ID10415
Feature activation+1.93e+00

Pos logit contributions
uld+0.414
 underscore+0.358
 Kling+0.354
��+0.341
 twins+0.336
Neg logit contributions
hra-0.469
camp-0.461
anas-0.461
 Niagara-0.458
veh-0.457
Token Caption
Token ID11260
Feature activation+2.47e+00

Pos logit contributions
uld+0.868
 underscore+0.708
 Kling+0.692
��+0.646
 twins+0.644
Neg logit contributions
hra-1.851
camp-1.833
veh-1.826
anas-1.824
 Niagara-1.816
Token 9
Token ID860
Feature activation+1.01e-01

Pos logit contributions
uld+1.258
 underscore+1.049
 Kling+1.037
 twins+0.963
��+0.957
Neg logit contributions
hra-2.217
camp-2.197
anas-2.194
veh-2.187
 Niagara-2.182
Token of
Token ID286
Feature activation+4.63e-01

Pos logit contributions
uld+0.077
 underscore+0.069
 Kling+0.069
��+0.067
��+0.066
Neg logit contributions
hra-0.064
camp-0.063
veh-0.063
anas-0.063
 Niagara-0.062
Token 15
Token ID1315
Feature activation-3.71e-01

Pos logit contributions
uld+0.303
 underscore+0.259
��+0.250
 Kling+0.250
��+0.247
Neg logit contributions
hra-0.359
camp-0.352
veh-0.350
 Niagara-0.349
anas-0.349
Token Photos
Token ID9434
Feature activation+2.16e+00

Pos logit contributions
hra+0.245
camp+0.243
veh+0.242
anas+0.242
 Nasa+0.241
Neg logit contributions
uld-0.279
 underscore-0.248
��-0.240
��-0.240
 Kling-0.239
Token:
Token ID25
Feature activation+3.93e-01

Pos logit contributions
uld+1.500
 Kling+1.320
 underscore+1.320
 twins+1.241
 tant+1.219
Neg logit contributions
hra-1.528
anas-1.506
camp-1.503
veh-1.498
 Niagara-1.488
Token 20
Token ID1160
Feature activation-8.39e-02

Pos logit contributions
hra+0.226
camp+0.223
 Niagara+0.223
 Nasa+0.221
anas+0.221
Neg logit contributions
uld-0.220
 underscore-0.192
��-0.189
��-0.187
 Kling-0.186
Token,
Token ID11
Feature activation+7.74e-02

Pos logit contributions
hra+0.049
camp+0.048
veh+0.048
anas+0.047
 Niagara+0.047
Neg logit contributions
uld-0.069
 underscore-0.062
 Kling-0.061
��-0.061
��-0.060
Token 2015
Token ID1853
Feature activation-4.44e-01

Pos logit contributions
uld+0.052
 underscore+0.045
 Kling+0.044
 twins+0.042
��+0.042
Neg logit contributions
hra-0.057
anas-0.056
camp-0.056
 Niagara-0.056
veh-0.056
Token.
Token ID13
Feature activation+5.88e-01

Pos logit contributions
hra+0.281
camp+0.276
anas+0.276
veh+0.276
 Niagara+0.274
Neg logit contributions
uld-0.341
 underscore-0.306
 Kling-0.304
 twins-0.288
 tant-0.284
Token Hide
Token ID10415
Feature activation+2.01e+00

Pos logit contributions
uld+0.386
 underscore+0.334
 Kling+0.330
��+0.313
 twins+0.312
Neg logit contributions
hra-0.442
camp-0.436
anas-0.435
veh-0.433
 Niagara-0.433
Token Caption
Token ID11260
Feature activation+2.45e+00

Pos logit contributions
uld+0.900
 underscore+0.732
 Kling+0.718
��+0.666
 twins+0.666
Neg logit contributions
hra-1.929
camp-1.911
veh-1.902
anas-1.901
 Niagara-1.893
Token 8
Token ID807
Feature activation-2.29e-01

Pos logit contributions
uld+1.254
 underscore+1.046
 Kling+1.035
��+0.970
 twins+0.957
Neg logit contributions
hra-2.201
veh-2.175
camp-2.175
anas-2.173
 Niagara-2.167
Token of
Token ID286
Feature activation+2.28e-01

Pos logit contributions
hra+0.144
camp+0.143
veh+0.142
anas+0.141
 Niagara+0.140
Neg logit contributions
uld-0.178
 underscore-0.160
 Kling-0.158
��-0.155
��-0.153
Token 15
Token ID1315
Feature activation-6.13e-01

Pos logit contributions
uld+0.151
 underscore+0.129
��+0.126
��+0.124
 Kling+0.124
Neg logit contributions
hra-0.175
camp-0.172
veh-0.171
 Niagara-0.170
anas-0.170
Token Photos
Token ID9434
Feature activation+1.96e+00

Pos logit contributions
hra+0.400
camp+0.398
veh+0.398
 Nasa+0.396
anas+0.396
Neg logit contributions
uld-0.465
 underscore-0.413
��-0.402
��-0.402
 Kling-0.399
Token:
Token ID25
Feature activation+3.15e-01

Pos logit contributions
uld+1.368
 underscore+1.205
 Kling+1.203
 twins+1.133
 tant+1.114
Neg logit contributions
hra-1.376
anas-1.355
camp-1.354
veh-1.349
 Niagara-1.339
Token Irvine
Token ID37277
Feature activation+4.26e-01

Pos logit contributions
uld+0.021
 underscore+0.020
 Kling+0.020
 twins+0.019
 tant+0.019
Neg logit contributions
hra-0.005
anas-0.005
camp-0.005
veh-0.005
 Niagara-0.004
Token,
Token ID11
Feature activation+2.16e-01

Pos logit contributions
uld+0.335
 underscore+0.300
 Kling+0.298
 twins+0.284
��+0.281
Neg logit contributions
hra-0.263
camp-0.259
anas-0.259
veh-0.257
 Niagara-0.255
Token California
Token ID3442
Feature activation+7.09e-01

Pos logit contributions
uld+0.213
 underscore+0.196
 Kling+0.195
 twins+0.188
 tant+0.186
Neg logit contributions
hra-0.089
camp-0.087
anas-0.086
veh-0.086
 Niagara-0.085
Token.
Token ID13
Feature activation+6.16e-01

Pos logit contributions
uld+0.533
 Kling+0.476
 underscore+0.474
 twins+0.448
 congr+0.440
Neg logit contributions
hra-0.462
camp-0.451
anas-0.451
veh-0.450
arkin-0.448
Token Hide
Token ID10415
Feature activation+1.87e+00

Pos logit contributions
uld+0.403
 underscore+0.349
 Kling+0.342
��+0.328
 twins+0.326
Neg logit contributions
hra-0.465
camp-0.459
anas-0.457
 Niagara-0.457
veh-0.456
Token Caption
Token ID11260
Feature activation+2.42e+00

Pos logit contributions
uld+0.843
 underscore+0.688
 Kling+0.671
��+0.630
 twins+0.625
Neg logit contributions
hra-1.796
camp-1.779
veh-1.772
anas-1.771
 Niagara-1.764
Token 10
Token ID838
Feature activation-2.62e-01

Pos logit contributions
uld+1.327
 underscore+1.101
 Kling+1.067
��+1.064
��+1.016
Neg logit contributions
hra-2.111
anas-2.101
 Niagara-2.101
camp-2.094
veh-2.090
Token of
Token ID286
Feature activation+3.87e-01

Pos logit contributions
hra+0.165
camp+0.163
veh+0.163
anas+0.161
 Niagara+0.160
Neg logit contributions
uld-0.204
 underscore-0.183
 Kling-0.180
��-0.177
��-0.175
Token 15
Token ID1315
Feature activation-5.14e-01

Pos logit contributions
uld+0.251
 underscore+0.214
 Kling+0.207
��+0.206
��+0.203
Neg logit contributions
hra-0.301
camp-0.296
veh-0.295
anas-0.294
 Niagara-0.293
Token Photos
Token ID9434
Feature activation+2.07e+00

Pos logit contributions
hra+0.337
camp+0.335
veh+0.335
anas+0.334
 Nasa+0.333
Neg logit contributions
uld-0.388
 underscore-0.345
��-0.335
��-0.335
 Kling-0.333
Token:
Token ID25
Feature activation+3.20e-01

Pos logit contributions
uld+1.444
 underscore+1.272
 Kling+1.271
 twins+1.196
 tant+1.176
Neg logit contributions
hra-1.463
anas-1.441
camp-1.438
veh-1.433
 Niagara-1.423
Token Women
Token ID6926
Feature activation-4.83e-01

Pos logit contributions
uld+0.171
 underscore+0.148
 Kling+0.146
 twins+0.138
��+0.138
Neg logit contributions
hra-0.215
anas-0.213
camp-0.212
veh-0.212
 Niagara-0.211
Token in
Token ID287
Feature activation-7.57e-02

Pos logit contributions
hra+0.312
camp+0.310
anas+0.308
veh+0.306
 Niagara+0.305
Neg logit contributions
uld-0.367
 Kling-0.323
 underscore-0.321
��-0.307
 twins-0.305
Token politics
Token ID4819
Feature activation+1.45e-01

Pos logit contributions
hra+0.052
anas+0.051
 Niagara+0.051
camp+0.051
veh+0.050
Neg logit contributions
uld-0.055
 Kling-0.048
 underscore-0.048
��-0.046
��-0.045
Token.
Token ID13
Feature activation+1.90e-01

Pos logit contributions
uld+0.110
 Kling+0.097
 underscore+0.097
 twins+0.092
��+0.091
Neg logit contributions
hra-0.094
camp-0.093
anas-0.092
veh-0.091
 Niagara-0.091
Token Hide
Token ID10415
Feature activation+1.91e+00

Pos logit contributions
uld+0.124
 underscore+0.107
 Kling+0.105
��+0.101
 twins+0.100
Neg logit contributions
hra-0.144
camp-0.142
anas-0.142
veh-0.141
 Niagara-0.141
Token Caption
Token ID11260
Feature activation+2.38e+00

Pos logit contributions
uld+0.864
 underscore+0.702
 Kling+0.687
��+0.642
 twins+0.639
Neg logit contributions
hra-1.835
camp-1.817
veh-1.809
anas-1.809
 Niagara-1.802
Token 7
Token ID767
Feature activation-7.34e-02

Pos logit contributions
uld+1.496
 underscore+1.297
 Kling+1.291
 twins+1.210
 tant+1.188
Neg logit contributions
hra-1.852
camp-1.827
anas-1.826
veh-1.821
 Niagara-1.809
Token of
Token ID286
Feature activation+5.77e-01

Pos logit contributions
hra+0.046
camp+0.046
veh+0.046
anas+0.045
 Niagara+0.045
Neg logit contributions
uld-0.057
 underscore-0.051
 Kling-0.050
��-0.050
��-0.049
Token 15
Token ID1315
Feature activation-6.00e-01

Pos logit contributions
uld+0.376
 underscore+0.322
 Kling+0.310
��+0.308
��+0.306
Neg logit contributions
hra-0.448
camp-0.440
veh-0.438
anas-0.437
 Niagara-0.437
Token Photos
Token ID9434
Feature activation+2.08e+00

Pos logit contributions
hra+0.392
camp+0.389
veh+0.389
anas+0.388
 Nasa+0.388
Neg logit contributions
uld-0.455
 underscore-0.404
��-0.393
��-0.393
 Kling-0.389
Token:
Token ID25
Feature activation+2.40e-01

Pos logit contributions
uld+1.448
 underscore+1.274
 Kling+1.273
 twins+1.199
 tant+1.178
Neg logit contributions
hra-1.467
anas-1.445
camp-1.443
veh-1.438
 Niagara-1.428
Token Grove
Token ID24144
Feature activation+8.91e-01

Pos logit contributions
uld+0.106
 underscore+0.096
 Kling+0.095
 twins+0.091
 congr+0.090
Neg logit contributions
hra-0.069
anas-0.068
camp-0.068
veh-0.067
 Niagara-0.067
Token,
Token ID11
Feature activation-8.81e-02

Pos logit contributions
uld+0.678
 underscore+0.603
 Kling+0.603
 twins+0.572
 congr+0.562
Neg logit contributions
hra-0.572
camp-0.562
anas-0.561
veh-0.560
 Niagara-0.553
Token California
Token ID3442
Feature activation+2.01e-01

Pos logit contributions
hra+0.034
anas+0.033
veh+0.033
camp+0.033
 Niagara+0.033
Neg logit contributions
uld-0.089
 underscore-0.082
 Kling-0.082
 twins-0.078
��-0.078
Token.
Token ID13
Feature activation+7.06e-01

Pos logit contributions
uld+0.151
 Kling+0.135
 underscore+0.134
 twins+0.127
 congr+0.125
Neg logit contributions
hra-0.131
camp-0.128
veh-0.127
anas-0.127
arkin-0.126
Token Hide
Token ID10415
Feature activation+1.93e+00

Pos logit contributions
uld+0.462
 underscore+0.401
 Kling+0.397
��+0.375
 twins+0.375
Neg logit contributions
hra-0.531
camp-0.524
anas-0.523
veh-0.520
 Niagara-0.520
Token Caption
Token ID11260
Feature activation+2.36e+00

Pos logit contributions
uld+0.867
 underscore+0.707
 Kling+0.690
��+0.647
 twins+0.642
Neg logit contributions
hra-1.848
camp-1.832
veh-1.825
anas-1.824
 Niagara-1.816
Token 15
Token ID1315
Feature activation-3.88e-01

Pos logit contributions
uld+1.434
 underscore+1.229
 Kling+1.206
��+1.146
 twins+1.143
Neg logit contributions
hra-1.889
anas-1.879
camp-1.873
veh-1.864
 Niagara-1.853
Token of
Token ID286
Feature activation+2.71e-01

Pos logit contributions
camp+0.236
hra+0.235
veh+0.234
anas+0.232
 Niagara+0.231
Neg logit contributions
uld-0.308
 underscore-0.277
��-0.275
��-0.273
 Kling-0.271
Token 15
Token ID1315
Feature activation-7.59e-01

Pos logit contributions
uld+0.177
 underscore+0.151
 Kling+0.145
��+0.145
��+0.144
Neg logit contributions
hra-0.209
camp-0.208
veh-0.206
anas-0.206
 Niagara-0.206
Token
Token ID198
Feature activation+1.82e-01

Pos logit contributions
hra+0.554
camp+0.553
veh+0.550
MIT+0.547
 Nasa+0.546
Neg logit contributions
uld-0.514
 underscore-0.449
��-0.437
 Kling-0.436
��-0.436
Token
Token ID198
Feature activation-2.23e-02

Pos logit contributions
uld+0.121
 underscore+0.107
 Kling+0.104
��+0.103
��+0.103
Neg logit contributions
hra-0.136
camp-0.135
veh-0.135
anas-0.133
 Niagara-0.133
Token been
Token ID587
Feature activation-2.53e-01

Pos logit contributions
hra+0.157
anas+0.155
camp+0.155
veh+0.154
 Niagara+0.154
Neg logit contributions
uld-0.148
 underscore-0.129
 Kling-0.128
��-0.121
 twins-0.120
Token playing
Token ID2712
Feature activation+2.05e-01

Pos logit contributions
hra+0.180
camp+0.179
anas+0.177
veh+0.176
 Niagara+0.176
Neg logit contributions
uld-0.176
 underscore-0.153
 Kling-0.151
��-0.144
 twins-0.143
Token the
Token ID262
Feature activation+4.21e-01

Pos logit contributions
uld+0.141
 underscore+0.122
 Kling+0.122
 twins+0.115
��+0.115
Neg logit contributions
hra-0.147
camp-0.146
anas-0.146
veh-0.145
 Niagara-0.144
Token attention
Token ID3241
Feature activation+4.88e-01

Pos logit contributions
uld+0.305
 underscore+0.269
 Kling+0.268
 twins+0.253
��+0.250
Neg logit contributions
hra-0.286
camp-0.283
anas-0.282
veh-0.281
 Niagara-0.279
Token-
Token ID12
Feature activation+9.35e-01

Pos logit contributions
uld+0.371
 underscore+0.331
 Kling+0.330
 twins+0.314
 tant+0.310
Neg logit contributions
hra-0.313
anas-0.308
camp-0.307
veh-0.307
 Niagara-0.304
Tokengrab
Token ID32393
Feature activation+2.34e+00

Pos logit contributions
uld+0.700
 underscore+0.621
 Kling+0.620
 twins+0.588
 tant+0.580
Neg logit contributions
hra-0.612
anas-0.603
camp-0.599
veh-0.598
 Niagara-0.595
Tokenbing
Token ID4623
Feature activation+5.33e-01

Pos logit contributions
uld+2.035
 underscore+1.849
 Kling+1.838
 twins+1.749
 tant+1.746
Neg logit contributions
hra-1.235
anas-1.204
camp-1.203
veh-1.201
 Niagara-1.190
Token PR
Token ID4810
Feature activation+5.51e-01

Pos logit contributions
uld+0.401
 underscore+0.356
 Kling+0.355
 twins+0.337
��+0.332
Neg logit contributions
hra-0.347
camp-0.342
anas-0.341
veh-0.340
 Niagara-0.337
Token-
Token ID12
Feature activation+7.16e-01

Pos logit contributions
uld+0.423
 underscore+0.377
 Kling+0.376
 twins+0.357
 tant+0.352
Neg logit contributions
hra-0.351
anas-0.345
camp-0.344
veh-0.343
 Niagara-0.340
Tokens
Token ID82
Feature activation+8.87e-01

Pos logit contributions
uld+0.557
 underscore+0.497
 Kling+0.496
 twins+0.471
 tant+0.464
Neg logit contributions
hra-0.448
anas-0.440
camp-0.439
veh-0.437
 Niagara-0.435
Tokenab
Token ID397
Feature activation+6.45e-01

Pos logit contributions
uld+0.642
 underscore+0.568
 Kling+0.567
 twins+0.537
 congr+0.529
Neg logit contributions
hra-0.606
camp-0.595
anas-0.595
veh-0.593
 Niagara-0.586
Token Pasadena
Token ID41010
Feature activation+2.83e-01

Pos logit contributions
hra+0.061
anas+0.060
camp+0.060
veh+0.060
 Niagara+0.060
Neg logit contributions
uld-0.091
 underscore-0.082
 Kling-0.081
 twins-0.078
��-0.077
Token,
Token ID11
Feature activation-7.01e-02

Pos logit contributions
uld+0.209
 underscore+0.187
 Kling+0.187
 twins+0.177
 tant+0.173
Neg logit contributions
hra-0.188
veh-0.185
camp-0.184
anas-0.183
MIT-0.182
Token California
Token ID3442
Feature activation+2.75e-01

Pos logit contributions
hra+0.029
camp+0.028
anas+0.028
veh+0.028
 Niagara+0.027
Neg logit contributions
uld-0.070
 underscore-0.064
 Kling-0.064
 twins-0.062
 tant-0.061
Token.
Token ID13
Feature activation+4.98e-01

Pos logit contributions
uld+0.207
 Kling+0.186
 underscore+0.185
 twins+0.175
 congr+0.172
Neg logit contributions
hra-0.179
camp-0.175
veh-0.175
anas-0.174
arkin-0.173
Token Hide
Token ID10415
Feature activation+1.74e+00

Pos logit contributions
uld+0.325
 underscore+0.282
 Kling+0.280
 twins+0.265
��+0.263
Neg logit contributions
hra-0.376
camp-0.371
anas-0.370
veh-0.368
 Niagara-0.368
Token Caption
Token ID11260
Feature activation+2.27e+00

Pos logit contributions
uld+0.784
 underscore+0.641
 Kling+0.624
��+0.588
 twins+0.582
Neg logit contributions
hra-1.664
camp-1.650
veh-1.642
anas-1.641
 Niagara-1.636
Token 12
Token ID1105
Feature activation-2.04e-01

Pos logit contributions
uld+1.198
 underscore+0.996
 Kling+0.979
��+0.925
 twins+0.916
Neg logit contributions
hra-2.007
camp-1.982
veh-1.979
anas-1.976
 Niagara-1.971
Token of
Token ID286
Feature activation+3.50e-01

Pos logit contributions
hra+0.129
camp+0.128
veh+0.127
anas+0.126
 Niagara+0.125
Neg logit contributions
uld-0.158
 underscore-0.141
 Kling-0.139
��-0.137
��-0.136
Token 15
Token ID1315
Feature activation-5.83e-01

Pos logit contributions
uld+0.227
 underscore+0.194
 Kling+0.188
��+0.186
��+0.184
Neg logit contributions
hra-0.273
camp-0.268
veh-0.267
anas-0.266
 Niagara-0.265
Token Photos
Token ID9434
Feature activation+2.00e+00

Pos logit contributions
hra+0.383
camp+0.381
veh+0.380
anas+0.380
 Nasa+0.379
Neg logit contributions
uld-0.439
 underscore-0.390
��-0.378
��-0.378
 Kling-0.377
Token:
Token ID25
Feature activation+2.97e-01

Pos logit contributions
uld+1.397
 underscore+1.230
 Kling+1.229
 twins+1.157
 tant+1.137
Neg logit contributions
hra-1.412
anas-1.390
camp-1.388
veh-1.384
 Niagara-1.374
Token Hills
Token ID14379
Feature activation+1.88e-01

Pos logit contributions
hra+0.247
camp+0.242
veh+0.241
anas+0.241
 Niagara+0.238
Neg logit contributions
uld-0.300
 underscore-0.267
 Kling-0.267
 twins-0.254
 congr-0.250
Token,
Token ID11
Feature activation-1.76e-01

Pos logit contributions
uld+0.144
 underscore+0.128
 Kling+0.128
 twins+0.121
 congr+0.119
Neg logit contributions
hra-0.120
anas-0.118
camp-0.118
veh-0.117
 Niagara-0.116
Token California
Token ID3442
Feature activation+3.09e-01

Pos logit contributions
hra+0.068
anas+0.067
camp+0.066
veh+0.066
 Niagara+0.066
Neg logit contributions
uld-0.179
 underscore-0.164
 Kling-0.163
 twins-0.157
��-0.157
Token.
Token ID13
Feature activation+3.94e-01

Pos logit contributions
uld+0.231
 Kling+0.206
 underscore+0.204
 twins+0.194
 congr+0.190
Neg logit contributions
hra-0.203
camp-0.199
veh-0.198
anas-0.198
arkin-0.197
Token Hide
Token ID10415
Feature activation+1.57e+00

Pos logit contributions
uld+0.262
 underscore+0.227
 Kling+0.224
��+0.215
 twins+0.212
Neg logit contributions
hra-0.293
camp-0.289
anas-0.289
 Niagara-0.287
veh-0.287
Token Caption
Token ID11260
Feature activation+2.26e+00

Pos logit contributions
uld+0.716
 underscore+0.586
 Kling+0.568
��+0.543
 twins+0.533
Neg logit contributions
hra-1.496
camp-1.483
veh-1.477
anas-1.475
 Niagara-1.471
Token 14
Token ID1478
Feature activation-2.35e-01

Pos logit contributions
uld+1.453
 underscore+1.263
 Kling+1.256
 twins+1.181
��+1.165
Neg logit contributions
hra-1.729
anas-1.703
camp-1.700
veh-1.698
 Niagara-1.685
Token of
Token ID286
Feature activation+1.01e-01

Pos logit contributions
hra+0.148
camp+0.147
veh+0.146
anas+0.145
 Niagara+0.144
Neg logit contributions
uld-0.182
 underscore-0.163
 Kling-0.161
��-0.158
��-0.157
Token 15
Token ID1315
Feature activation-8.98e-01

Pos logit contributions
uld+0.066
 underscore+0.056
 Kling+0.054
��+0.054
��+0.053
Neg logit contributions
hra-0.079
camp-0.078
veh-0.077
anas-0.077
 Niagara-0.077
Token Photos
Token ID9434
Feature activation+2.07e+00

Pos logit contributions
hra+0.582
 Nasa+0.579
camp+0.578
anas+0.578
veh+0.578
Neg logit contributions
uld-0.684
 underscore-0.608
��-0.595
��-0.592
 Kling-0.584
Token:
Token ID25
Feature activation+1.95e-01

Pos logit contributions
uld+1.441
 underscore+1.269
 Kling+1.268
 twins+1.194
 tant+1.173
Neg logit contributions
hra-1.456
anas-1.434
camp-1.432
veh-1.427
 Niagara-1.417
Token convince
Token ID11508
Feature activation-3.39e-01

Pos logit contributions
uld+0.124
 Kling+0.109
 underscore+0.107
 twins+0.103
��+0.102
Neg logit contributions
hra-0.125
camp-0.123
anas-0.123
veh-0.123
 Niagara-0.122
Token the
Token ID262
Feature activation-2.39e-03

Pos logit contributions
hra+0.231
camp+0.228
anas+0.227
veh+0.227
 Niagara+0.226
Neg logit contributions
uld-0.244
 underscore-0.215
 Kling-0.214
 twins-0.203
��-0.200
Token two
Token ID734
Feature activation-9.86e-02

Pos logit contributions
hra+0.001
anas+0.001
camp+0.001
veh+0.001
 Niagara+0.001
Neg logit contributions
uld-0.002
 underscore-0.002
 Kling-0.002
 twins-0.002
��-0.002
Token cities
Token ID4736
Feature activation+2.35e-01

Pos logit contributions
hra+0.058
camp+0.057
anas+0.057
veh+0.056
 Niagara+0.056
Neg logit contributions
uld-0.081
 underscore-0.072
 Kling-0.072
 twins-0.069
 tant-0.068
Token
Token ID960
Feature activation+3.84e-02

Pos logit contributions
uld+0.150
 underscore+0.131
 Kling+0.130
 twins+0.123
 tant+0.120
Neg logit contributions
hra-0.179
anas-0.177
veh-0.177
camp-0.177
 Niagara-0.174
Tokenlet
Token ID1616
Feature activation-2.38e+00

Pos logit contributions
uld+0.031
 underscore+0.028
 Kling+0.028
 twins+0.027
 tant+0.026
Neg logit contributions
hra-0.023
camp-0.022
anas-0.022
veh-0.022
 Niagara-0.022
Token alone
Token ID3436
Feature activation-4.36e-01

Pos logit contributions
hra+1.586
anas+1.564
camp+1.559
veh+1.555
 Niagara+1.554
Neg logit contributions
uld-1.755
 Kling-1.533
 underscore-1.530
 twins-1.448
 congr-1.442
Token 59
Token ID7863
Feature activation-9.28e-02

Pos logit contributions
hra+0.302
 Niagara+0.296
anas+0.296
camp+0.295
veh+0.295
Neg logit contributions
uld-0.311
 Kling-0.272
 underscore-0.270
 twins-0.256
��-0.254
Token community
Token ID2055
Feature activation-2.39e-01

Pos logit contributions
hra+0.058
anas+0.057
camp+0.057
veh+0.056
 Niagara+0.056
Neg logit contributions
uld-0.073
 underscore-0.065
 Kling-0.065
 twins-0.062
 tant-0.061
Token districts
Token ID12815
Feature activation+3.00e-01

Pos logit contributions
hra+0.098
anas+0.096
camp+0.096
veh+0.095
 Niagara+0.094
Neg logit contributions
uld-0.238
 underscore-0.218
 Kling-0.217
 twins-0.209
 congr-0.207
Token
Token ID960
Feature activation-9.78e-02

Pos logit contributions
uld+0.219
 underscore+0.194
 Kling+0.193
 twins+0.183
 tant+0.180
Neg logit contributions
hra-0.203
camp-0.200
anas-0.200
veh-0.199
 Niagara-0.197
Token Sir
Token ID7361
Feature activation+3.01e-01

Pos logit contributions
hra+0.210
camp+0.207
anas+0.206
veh+0.206
 Niagara+0.205
Neg logit contributions
uld-0.203
 underscore-0.178
 Kling-0.177
 twins-0.167
��-0.167
Tokensa
Token ID11400
Feature activation+1.77e-01

Pos logit contributions
uld+0.232
 underscore+0.207
 Kling+0.206
 twins+0.196
 tant+0.193
Neg logit contributions
hra-0.190
camp-0.187
anas-0.187
veh-0.186
 Niagara-0.185
Token,
Token ID11
Feature activation+8.54e-02

Pos logit contributions
uld+0.130
 underscore+0.116
 Kling+0.115
 twins+0.109
��+0.108
Neg logit contributions
hra-0.120
camp-0.117
veh-0.117
anas-0.117
 Niagara-0.116
Token while
Token ID981
Feature activation-1.61e-01

Pos logit contributions
uld+0.061
 underscore+0.054
 Kling+0.054
 twins+0.051
��+0.050
Neg logit contributions
hra-0.059
camp-0.058
anas-0.058
veh-0.058
 Niagara-0.057
Token comm
Token ID725
Feature activation+5.16e-01

Pos logit contributions
hra+0.114
anas+0.112
camp+0.112
veh+0.111
 Niagara+0.111
Neg logit contributions
uld-0.114
 underscore-0.099
 Kling-0.098
��-0.094
 twins-0.094
Tokenending
Token ID1571
Feature activation-2.34e+00

Pos logit contributions
uld+0.333
 underscore+0.291
 Kling+0.289
 twins+0.272
��+0.267
Neg logit contributions
hra-0.392
camp-0.386
anas-0.385
veh-0.385
 Niagara-0.382
Token the
Token ID262
Feature activation-2.59e-01

Pos logit contributions
hra+1.646
anas+1.605
veh+1.603
camp+1.602
 Nasa+1.595
Neg logit contributions
uld-1.642
 underscore-1.434
 Kling-1.419
��-1.356
��-1.346
Token
Token ID564
Feature activation-2.79e-02

Pos logit contributions
hra+0.176
anas+0.173
camp+0.173
veh+0.172
 Niagara+0.171
Neg logit contributions
uld-0.189
 underscore-0.167
 Kling-0.166
 twins-0.156
��-0.156
Token
Token ID246
Feature activation-2.28e-01

Pos logit contributions
hra+0.025
camp+0.025
anas+0.025
veh+0.025
 Niagara+0.025
Neg logit contributions
uld-0.014
 underscore-0.012
 Kling-0.011
 twins-0.010
 tant-0.010
Tokenwork
Token ID1818
Feature activation+5.32e-02

Pos logit contributions
hra+0.147
camp+0.145
anas+0.144
veh+0.144
 Niagara+0.142
Neg logit contributions
uld-0.173
 underscore-0.154
 Kling-0.153
 twins-0.146
 congr-0.143
Token
Token ID447
Feature activation-3.97e-01

Pos logit contributions
uld+0.039
 underscore+0.035
 Kling+0.035
 twins+0.033
��+0.032
Neg logit contributions
hra-0.036
camp-0.035
anas-0.035
veh-0.035
 Niagara-0.035
Token Press
Token ID4332
Feature activation-4.38e-01

Pos logit contributions
hra+0.004
 Niagara+0.004
camp+0.004
veh+0.004
anas+0.004
Neg logit contributions
uld-0.005
 underscore-0.004
 Kling-0.004
��-0.004
��-0.004
Token is
Token ID318
Feature activation+4.73e-01

Pos logit contributions
hra+0.378
anas+0.371
camp+0.371
 Niagara+0.370
veh+0.370
Neg logit contributions
uld-0.237
 underscore-0.199
 Kling-0.199
��-0.185
 twins-0.183
Token reporting
Token ID6447
Feature activation-1.72e-01

Pos logit contributions
uld+0.342
 underscore+0.300
 Kling+0.299
 twins+0.283
��+0.282
Neg logit contributions
hra-0.325
camp-0.318
 Niagara-0.316
veh-0.316
anas-0.315
Token that
Token ID326
Feature activation+2.52e-01

Pos logit contributions
hra+0.117
camp+0.115
anas+0.114
veh+0.114
 Niagara+0.114
Neg logit contributions
uld-0.125
 underscore-0.110
 Kling-0.109
 twins-0.103
��-0.102
Token AT
Token ID5161
Feature activation+1.75e-01

Pos logit contributions
uld+0.180
 underscore+0.158
 Kling+0.156
��+0.149
 twins+0.148
Neg logit contributions
hra-0.175
anas-0.171
 Niagara-0.171
camp-0.171
veh-0.171
Token&
Token ID5
Feature activation-2.32e+00

Pos logit contributions
uld+0.126
 underscore+0.112
 Kling+0.111
 twins+0.105
 tant+0.104
Neg logit contributions
hra-0.119
anas-0.118
camp-0.117
veh-0.117
 Niagara-0.116
TokenT
Token ID51
Feature activation-1.26e+00

Pos logit contributions
hra+1.419
MIT+1.409
veh+1.386
camp+1.378
 Niagara+1.371
Neg logit contributions
uld-1.850
 underscore-1.679
 Kling-1.631
��-1.626
��-1.622
Token,
Token ID11
Feature activation+1.67e-01

Pos logit contributions
hra+0.843
 Niagara+0.831
veh+0.830
camp+0.824
anas+0.824
Neg logit contributions
uld-0.924
 underscore-0.818
 Kling-0.813
��-0.781
��-0.775
Token the
Token ID262
Feature activation+4.43e-02

Pos logit contributions
uld+0.140
 underscore+0.125
 Kling+0.125
��+0.121
��+0.120
Neg logit contributions
hra-0.095
 Niagara-0.093
veh-0.092
anas-0.092
camp-0.092
Token defendant
Token ID11304
Feature activation-2.44e-01

Pos logit contributions
uld+0.034
 Kling+0.030
 underscore+0.030
��+0.029
��+0.028
Neg logit contributions
hra-0.029
 Niagara-0.028
veh-0.028
anas-0.028
 Nasa-0.028
Token in
Token ID287
Feature activation-2.39e-01

Pos logit contributions
hra+0.175
camp+0.173
anas+0.172
veh+0.172
 Niagara+0.171
Neg logit contributions
uld-0.167
 underscore-0.146
 Kling-0.146
 twins-0.137
��-0.135
Token
Token ID198
Feature activation-1.72e-01

Pos logit contributions
uld+0.043
 underscore+0.038
 Kling+0.037
��+0.037
��+0.037
Neg logit contributions
hra-0.049
camp-0.049
veh-0.049
anas-0.048
 Niagara-0.048
TokenTo
Token ID2514
Feature activation-1.43e-01

Pos logit contributions
hra+0.127
camp+0.126
veh+0.125
MIT+0.125
anas+0.125
Neg logit contributions
uld-0.116
 underscore-0.103
 Kling-0.101
 twins-0.096
��-0.095
Token mark
Token ID1317
Feature activation-1.06e+00

Pos logit contributions
hra+0.104
anas+0.103
camp+0.103
veh+0.102
 Niagara+0.102
Neg logit contributions
uld-0.096
 Kling-0.083
 underscore-0.082
��-0.079
��-0.079
Token the
Token ID262
Feature activation-1.08e+00

Pos logit contributions
hra+0.769
camp+0.762
anas+0.759
 Nasa+0.758
veh+0.757
Neg logit contributions
uld-0.712
 underscore-0.621
 Kling-0.616
��-0.586
 twins-0.579
Token World
Token ID2159
Feature activation-4.27e-01

Pos logit contributions
hra+0.799
veh+0.791
 Nasa+0.786
anas+0.785
 Niagara+0.784
Neg logit contributions
uld-0.716
 underscore-0.614
 Kling-0.610
��-0.588
��-0.577
Token Wide
Token ID23399
Feature activation-2.32e+00

Pos logit contributions
hra+0.284
camp+0.280
veh+0.278
anas+0.278
 Niagara+0.276
Neg logit contributions
uld-0.317
 underscore-0.282
 Kling-0.279
 twins-0.265
��-0.264
Token Web
Token ID5313
Feature activation-1.11e+00

Pos logit contributions
hra+1.335
veh+1.321
camp+1.314
anas+1.302
 Niagara+1.300
Neg logit contributions
uld-1.906
 underscore-1.694
 Kling-1.692
��-1.634
 twins-1.622
Token
Token ID447
Feature activation-6.39e-01

Pos logit contributions
hra+0.700
veh+0.688
anas+0.687
camp+0.686
 Niagara+0.683
Neg logit contributions
uld-0.862
 underscore-0.768
 Kling-0.765
��-0.728
 twins-0.727
Token
Token ID247
Feature activation-2.41e-01

Pos logit contributions
hra+0.606
camp+0.599
anas+0.599
veh+0.598
 Niagara+0.594
Neg logit contributions
uld-0.291
 underscore-0.237
 Kling-0.236
 twins-0.214
 tant-0.208
Tokens
Token ID82
Feature activation-9.40e-02

Pos logit contributions
hra+0.167
camp+0.165
anas+0.165
veh+0.164
 Niagara+0.163
Neg logit contributions
uld-0.171
 underscore-0.152
 Kling-0.151
 twins-0.143
��-0.140
Token 28
Token ID2579
Feature activation+8.98e-02

Pos logit contributions
hra+0.065
anas+0.064
veh+0.064
camp+0.064
 Niagara+0.064
Neg logit contributions
uld-0.067
 underscore-0.059
 Kling-0.058
 twins-0.055
��-0.055
Token AND
Token ID5357
Feature activation+2.33e-01

Pos logit contributions
hra+0.093
camp+0.092
anas+0.092
veh+0.091
 Niagara+0.091
Neg logit contributions
uld-0.094
 underscore-0.083
 Kling-0.082
 twins-0.078
��-0.077
Token STEP
Token ID49154
Feature activation+1.60e-01

Pos logit contributions
uld+0.167
 underscore+0.147
 Kling+0.146
 twins+0.138
��+0.138
Neg logit contributions
hra-0.159
camp-0.158
anas-0.158
 Niagara-0.157
MIT-0.156
TokenH
Token ID39
Feature activation-9.13e-02

Pos logit contributions
uld+0.125
 underscore+0.112
 Kling+0.111
 twins+0.106
 tant+0.105
Neg logit contributions
hra-0.100
anas-0.099
MIT-0.098
camp-0.098
veh-0.098
TokenEN
Token ID1677
Feature activation-8.80e-01

Pos logit contributions
hra+0.057
anas+0.057
camp+0.056
veh+0.056
MIT+0.056
Neg logit contributions
uld-0.071
 underscore-0.063
 Kling-0.063
 twins-0.060
��-0.060
Token M
Token ID337
Feature activation-8.07e-01

Pos logit contributions
hra+0.594
camp+0.590
anas+0.587
MIT+0.585
veh+0.584
Neg logit contributions
uld-0.639
 underscore-0.566
 Kling-0.558
 twins-0.531
��-0.527
Tokenatur
Token ID2541
Feature activation-2.22e+00

Pos logit contributions
hra+0.576
camp+0.570
anas+0.569
MIT+0.568
veh+0.567
Neg logit contributions
uld-0.551
 underscore-0.490
 Kling-0.485
 twins-0.461
 tant-0.455
Tokenin
Token ID259
Feature activation-1.98e-01

Pos logit contributions
camp+1.134
anas+1.133
hra+1.131
veh+1.122
arkin+1.111
Neg logit contributions
uld-1.952
 underscore-1.798
��-1.791
 Kling-1.788
��-1.782
Token first
Token ID717
Feature activation-2.26e-01

Pos logit contributions
hra+0.141
camp+0.139
anas+0.139
 Niagara+0.138
veh+0.138
Neg logit contributions
uld-0.137
 underscore-0.118
 Kling-0.118
��-0.112
��-0.112
Token meet
Token ID1826
Feature activation-3.75e-01

Pos logit contributions
hra+0.148
camp+0.147
 Niagara+0.147
anas+0.147
veh+0.146
Neg logit contributions
uld-0.169
 Kling-0.147
 underscore-0.147
��-0.141
��-0.141
Token in
Token ID287
Feature activation+2.04e-01

Pos logit contributions
hra+0.262
camp+0.260
 Niagara+0.259
anas+0.259
veh+0.256
Neg logit contributions
uld-0.263
 underscore-0.230
 Kling-0.228
��-0.219
��-0.217
Token April
Token ID3035
Feature activation-1.22e-01

Pos logit contributions
uld+0.150
 underscore+0.133
 Kling+0.132
 twins+0.125
��+0.124
Neg logit contributions
hra-0.136
camp-0.134
anas-0.134
veh-0.133
 Niagara-0.133
Token this
Token ID428
Feature activation-7.59e-01

Pos logit contributions
hra+0.199
camp+0.197
anas+0.196
veh+0.196
 Niagara+0.195
Neg logit contributions
uld-0.196
 underscore-0.170
 Kling-0.170
��-0.160
 twins-0.160
Token into
Token ID656
Feature activation-1.16e+00

Pos logit contributions
hra+0.549
 Niagara+0.540
camp+0.538
veh+0.536
 Nasa+0.535
Neg logit contributions
uld-0.520
 Kling-0.446
 underscore-0.446
��-0.434
��-0.423
Token perspective
Token ID6650
Feature activation-1.41e+00

Pos logit contributions
hra+1.000
 Niagara+0.991
veh+0.990
camp+0.990
anas+0.989
Neg logit contributions
uld-0.626
 Kling-0.516
 underscore-0.512
��-0.484
 twins-0.478
Token,
Token ID11
Feature activation-7.28e-01

Pos logit contributions
hra+0.896
camp+0.879
anas+0.876
veh+0.875
 Niagara+0.868
Neg logit contributions
uld-1.083
 underscore-0.963
 Kling-0.962
 twins-0.913
 tant-0.899
Token AT
Token ID5161
Feature activation+3.36e-01

Pos logit contributions
hra+0.483
 Niagara+0.471
camp+0.470
veh+0.468
 Nasa+0.467
Neg logit contributions
uld-0.546
 Kling-0.478
 underscore-0.475
��-0.464
��-0.455
Token&
Token ID5
Feature activation-2.21e+00

Pos logit contributions
uld+0.245
 underscore+0.219
 Kling+0.217
 twins+0.206
��+0.204
Neg logit contributions
hra-0.226
camp-0.223
anas-0.223
veh-0.222
 Niagara-0.220
TokenT
Token ID51
Feature activation-1.03e+00

Pos logit contributions
MIT+1.338
hra+1.335
veh+1.323
camp+1.314
 Niagara+1.282
Neg logit contributions
uld-1.766
 underscore-1.617
 Kling-1.582
��-1.573
��-1.570
Token's
Token ID338
Feature activation+2.55e-01

Pos logit contributions
hra+0.524
camp+0.516
anas+0.513
veh+0.513
 Niagara+0.512
Neg logit contributions
uld-0.918
 underscore-0.830
 Kling-0.829
��-0.793
 twins-0.792
Token spending
Token ID4581
Feature activation-8.31e-01

Pos logit contributions
uld+0.200
 underscore+0.178
 Kling+0.177
 twins+0.168
��+0.168
Neg logit contributions
hra-0.159
camp-0.156
anas-0.155
veh-0.155
 Niagara-0.155
Token for
Token ID329
Feature activation-4.20e-01

Pos logit contributions
hra+0.584
camp+0.582
 Niagara+0.579
anas+0.579
veh+0.572
Neg logit contributions
uld-0.583
 Kling-0.505
 underscore-0.501
��-0.485
 twins-0.476
Token three
Token ID1115
Feature activation+1.23e-01

Pos logit contributions
hra+0.298
camp+0.294
 Niagara+0.293
anas+0.293
veh+0.291
Neg logit contributions
uld-0.292
 underscore-0.254
 Kling-0.253
��-0.244
��-0.239
Token Spielberg
Token ID41151
Feature activation-6.30e-01

Pos logit contributions
uld+0.198
 underscore+0.173
 Kling+0.172
 twins+0.162
 tant+0.159
Neg logit contributions
hra-0.217
camp-0.214
anas-0.214
veh-0.213
 Niagara-0.211
Token's
Token ID338
Feature activation+2.77e-01

Pos logit contributions
hra+0.402
camp+0.397
anas+0.396
veh+0.395
 Niagara+0.392
Neg logit contributions
uld-0.479
 underscore-0.427
 Kling-0.425
��-0.404
 twins-0.401
Token classic
Token ID6833
Feature activation+4.03e-01

Pos logit contributions
uld+0.203
 underscore+0.180
 Kling+0.179
��+0.169
 twins+0.169
Neg logit contributions
hra-0.185
camp-0.182
anas-0.182
veh-0.182
 Niagara-0.182
Token film
Token ID2646
Feature activation-4.33e-01

Pos logit contributions
uld+0.296
 underscore+0.261
 Kling+0.261
��+0.246
 twins+0.245
Neg logit contributions
hra-0.268
camp-0.264
anas-0.264
 Niagara-0.264
veh-0.264
Token J
Token ID449
Feature activation-1.59e-01

Pos logit contributions
hra+0.300
camp+0.297
 Niagara+0.296
veh+0.295
anas+0.295
Neg logit contributions
uld-0.306
 underscore-0.269
 Kling-0.268
��-0.253
 twins-0.251
Tokenaws
Token ID8356
Feature activation-2.19e+00

Pos logit contributions
hra+0.107
anas+0.107
veh+0.106
camp+0.106
 Niagara+0.106
Neg logit contributions
uld-0.115
 underscore-0.102
 Kling-0.102
 twins-0.097
 congr-0.096
Token in
Token ID287
Feature activation+5.25e-04

Pos logit contributions
hra+1.315
anas+1.307
camp+1.305
veh+1.295
 Niagara+1.284
Neg logit contributions
uld-1.734
 Kling-1.556
 underscore-1.545
��-1.484
��-1.464
Token which
Token ID543
Feature activation+1.72e-02

Pos logit contributions
uld+0.000
 underscore+0.000
 Kling+0.000
��+0.000
 twins+0.000
Neg logit contributions
hra-0.000
anas-0.000
 Niagara-0.000
camp-0.000
veh-0.000
Token the
Token ID262
Feature activation-2.10e-01

Pos logit contributions
uld+0.012
 underscore+0.010
 Kling+0.010
 twins+0.010
 congr+0.010
Neg logit contributions
hra-0.012
anas-0.012
camp-0.012
veh-0.012
 Niagara-0.012
Token residents
Token ID5085
Feature activation-3.05e-01

Pos logit contributions
hra+0.147
anas+0.145
 Niagara+0.145
camp+0.145
veh+0.145
Neg logit contributions
uld-0.147
 underscore-0.129
 Kling-0.128
��-0.120
 twins-0.120
Token of
Token ID286
Feature activation+5.32e-01

Pos logit contributions
hra+0.208
camp+0.205
anas+0.204
veh+0.203
 Niagara+0.203
Neg logit contributions
uld-0.221
 Kling-0.194
 underscore-0.194
 twins-0.183
��-0.181
Token officer
Token ID3818
Feature activation-7.51e-01

Pos logit contributions
hra+0.070
anas+0.068
camp+0.068
veh+0.068
 Niagara+0.067
Neg logit contributions
uld-0.073
 underscore-0.065
 Kling-0.065
 twins-0.061
 tant-0.060
Token,
Token ID11
Feature activation-5.14e-01

Pos logit contributions
hra+0.500
camp+0.494
anas+0.492
veh+0.491
 Niagara+0.488
Neg logit contributions
uld-0.553
 underscore-0.490
 Kling-0.488
 twins-0.461
��-0.455
Token and
Token ID290
Feature activation-5.08e-01

Pos logit contributions
hra+0.343
camp+0.337
anas+0.337
veh+0.336
 Niagara+0.334
Neg logit contributions
uld-0.378
 underscore-0.336
 Kling-0.334
 twins-0.316
 congr-0.312
Token Stephen
Token ID7970
Feature activation-1.19e-01

Pos logit contributions
hra+0.342
anas+0.337
camp+0.337
veh+0.336
 Niagara+0.334
Neg logit contributions
uld-0.370
 underscore-0.327
 Kling-0.325
 twins-0.308
��-0.304
Token M
Token ID337
Feature activation-2.58e-01

Pos logit contributions
hra+0.079
camp+0.078
anas+0.078
veh+0.078
 Niagara+0.077
Neg logit contributions
uld-0.088
 underscore-0.078
 Kling-0.077
 twins-0.073
 tant-0.072
Tokenatur
Token ID2541
Feature activation-2.16e+00

Pos logit contributions
hra+0.168
camp+0.165
anas+0.165
veh+0.165
 Niagara+0.163
Neg logit contributions
uld-0.195
 underscore-0.173
 Kling-0.173
 twins-0.164
 tant-0.161
Tokenin
Token ID259
Feature activation-5.62e-02

Pos logit contributions
hra+1.253
anas+1.236
camp+1.234
veh+1.229
arkin+1.206
Neg logit contributions
uld-1.778
 underscore-1.607
 Kling-1.597
��-1.529
 twins-1.526
Token,
Token ID11
Feature activation-3.84e-01

Pos logit contributions
hra+0.037
veh+0.037
camp+0.036
anas+0.036
 Niagara+0.036
Neg logit contributions
uld-0.042
 Kling-0.037
 underscore-0.037
 twins-0.035
 tant-0.034
Token a
Token ID257
Feature activation-3.87e-01

Pos logit contributions
hra+0.287
anas+0.283
camp+0.283
veh+0.281
 Niagara+0.281
Neg logit contributions
uld-0.252
 underscore-0.220
 Kling-0.218
 twins-0.205
��-0.202
Token penn
Token ID22429
Feature activation+5.67e-03

Pos logit contributions
hra+0.252
camp+0.247
anas+0.247
veh+0.247
 Niagara+0.245
Neg logit contributions
uld-0.292
 underscore-0.260
 Kling-0.258
 twins-0.245
 congr-0.242
Tokeniless
Token ID30608
Feature activation+2.38e-01

Pos logit contributions
uld+0.005
 underscore+0.005
 Kling+0.005
 twins+0.005
 tant+0.005
Neg logit contributions
hra-0.003
veh-0.003
camp-0.003
anas-0.003
 Niagara-0.003
Token is
Token ID318
Feature activation+9.91e-02

Pos logit contributions
uld+0.278
 Kling+0.244
 underscore+0.242
 twins+0.230
��+0.229
Neg logit contributions
hra-0.274
anas-0.270
camp-0.270
veh-0.269
 Niagara-0.266
Token long
Token ID890
Feature activation+7.70e-02

Pos logit contributions
uld+0.069
 underscore+0.060
 Kling+0.059
��+0.056
 congr+0.056
Neg logit contributions
hra-0.071
anas-0.070
camp-0.070
 Niagara-0.070
veh-0.069
Token;
Token ID26
Feature activation+5.83e-01

Pos logit contributions
uld+0.060
 Kling+0.053
 underscore+0.053
��+0.051
 twins+0.050
Neg logit contributions
hra-0.048
anas-0.048
camp-0.048
veh-0.047
 Niagara-0.047
Token he
Token ID339
Feature activation+1.51e-01

Pos logit contributions
uld+0.418
 underscore+0.369
 Kling+0.367
 twins+0.347
��+0.343
Neg logit contributions
hra-0.401
camp-0.395
anas-0.395
veh-0.393
 Niagara-0.391
Token comm
Token ID725
Feature activation+6.66e-01

Pos logit contributions
uld+0.115
 underscore+0.103
 Kling+0.103
 twins+0.097
��+0.096
Neg logit contributions
hra-0.097
anas-0.095
camp-0.095
veh-0.095
 Niagara-0.094
Tokenends
Token ID2412
Feature activation-2.15e+00

Pos logit contributions
uld+0.469
 underscore+0.414
 Kling+0.413
 twins+0.390
 tant+0.383
Neg logit contributions
hra-0.466
camp-0.459
anas-0.459
veh-0.457
 Niagara-0.453
Token the
Token ID262
Feature activation-2.71e-01

Pos logit contributions
hra+1.496
camp+1.474
anas+1.472
 Niagara+1.471
veh+1.468
Neg logit contributions
uld-1.520
 underscore-1.332
 Kling-1.321
��-1.251
 twins-1.244
Token Monday
Token ID3321
Feature activation-1.20e-02

Pos logit contributions
hra+0.183
anas+0.180
camp+0.180
veh+0.179
 Niagara+0.178
Neg logit contributions
uld-0.199
 underscore-0.175
 Kling-0.174
 twins-0.164
��-0.164
Token job
Token ID1693
Feature activation+1.29e+00

Pos logit contributions
hra+0.008
anas+0.008
camp+0.008
veh+0.008
 Niagara+0.008
Neg logit contributions
uld-0.009
 underscore-0.008
 Kling-0.008
 twins-0.007
��-0.007
Token-
Token ID12
Feature activation-1.78e-01

Pos logit contributions
uld+0.936
 underscore+0.831
 Kling+0.829
 twins+0.784
 tant+0.772
Neg logit contributions
hra-0.873
anas-0.859
veh-0.857
camp-0.856
 Niagara-0.848
Tokenseekers
Token ID47971
Feature activation+9.48e-04

Pos logit contributions
hra+0.116
camp+0.115
veh+0.114
anas+0.114
 Niagara+0.112
Neg logit contributions
uld-0.135
 underscore-0.120
 Kling-0.120
 twins-0.113
��-0.112
Token.
Token ID13
Feature activation-1.37e-01

Pos logit contributions
hra+0.081
camp+0.080
anas+0.079
veh+0.079
 Niagara+0.078
Neg logit contributions
uld-0.107
 Kling-0.095
 underscore-0.095
 twins-0.091
��-0.090
Token
Token ID198
Feature activation-3.92e-01

Pos logit contributions
hra+0.102
camp+0.101
 Niagara+0.101
anas+0.100
veh+0.100
Neg logit contributions
uld-0.091
 underscore-0.079
 Kling-0.077
��-0.075
��-0.074
Token
Token ID198
Feature activation-5.40e-01

Pos logit contributions
hra+0.291
camp+0.290
veh+0.289
MIT+0.286
 Niagara+0.286
Neg logit contributions
uld-0.262
 underscore-0.232
 Kling-0.225
��-0.224
��-0.222
TokenThe
Token ID464
Feature activation-6.39e-01

Pos logit contributions
camp+0.399
hra+0.397
MIT+0.396
veh+0.396
anas+0.394
Neg logit contributions
uld-0.360
 underscore-0.319
 Kling-0.314
 twins-0.297
 tant-0.293
Token pilot
Token ID8022
Feature activation+4.92e-01

Pos logit contributions
hra+0.413
camp+0.412
 Niagara+0.409
anas+0.409
veh+0.407
Neg logit contributions
uld-0.481
 underscore-0.423
 Kling-0.418
��-0.400
��-0.399
Token introduced
Token ID5495
Feature activation-2.14e+00

Pos logit contributions
uld+0.366
 underscore+0.320
 Kling+0.320
��+0.307
 twins+0.304
Neg logit contributions
camp-0.324
hra-0.321
anas-0.321
 Niagara-0.320
veh-0.315
Token us
Token ID514
Feature activation+3.63e-01

Pos logit contributions
hra+1.472
camp+1.470
 Niagara+1.469
anas+1.463
veh+1.462
Neg logit contributions
uld-1.502
 underscore-1.298
 Kling-1.282
��-1.223
��-1.215
Token to
Token ID284
Feature activation+8.59e-01

Pos logit contributions
uld+0.216
 underscore+0.185
 Kling+0.185
 twins+0.173
 congr+0.169
Neg logit contributions
hra-0.294
camp-0.290
anas-0.290
veh-0.289
 Niagara-0.287
Token a
Token ID257
Feature activation+1.82e-01

Pos logit contributions
uld+0.645
 underscore+0.572
 Kling+0.569
 twins+0.539
 tant+0.533
Neg logit contributions
hra-0.561
camp-0.554
anas-0.553
veh-0.550
 Niagara-0.548
Token LOT
Token ID30314
Feature activation-1.65e-01

Pos logit contributions
uld+0.124
 underscore+0.109
 Kling+0.108
 twins+0.102
��+0.101
Neg logit contributions
hra-0.132
camp-0.130
veh-0.130
 Niagara-0.129
anas-0.129
Token of
Token ID286
Feature activation-2.11e-01

Pos logit contributions
hra+0.114
anas+0.113
camp+0.112
veh+0.112
 Niagara+0.111
Neg logit contributions
uld-0.117
 underscore-0.103
 Kling-0.103
 twins-0.097
 congr-0.096
Tokensych
Token ID2924
Feature activation+4.62e-01

Pos logit contributions
uld+0.182
 underscore+0.162
 Kling+0.162
 twins+0.154
 tant+0.151
Neg logit contributions
hra-0.155
camp-0.153
anas-0.153
veh-0.152
 Niagara-0.151
Tokenomen
Token ID3674
Feature activation+2.46e-02

Pos logit contributions
uld+0.369
 Kling+0.332
 underscore+0.332
 twins+0.316
 congr+0.312
Neg logit contributions
hra-0.279
anas-0.275
camp-0.274
veh-0.272
 Niagara-0.269
Token>
Token ID29
Feature activation-2.31e-01

Pos logit contributions
uld+0.018
 underscore+0.016
 Kling+0.016
 twins+0.015
 congr+0.015
Neg logit contributions
hra-0.017
camp-0.016
anas-0.016
veh-0.016
 Niagara-0.016
Token GU
Token ID19348
Feature activation-2.16e-01

Pos logit contributions
hra+0.152
camp+0.151
anas+0.150
 Niagara+0.149
veh+0.149
Neg logit contributions
uld-0.171
 underscore-0.150
 Kling-0.150
��-0.143
 twins-0.141
TokenYS
Token ID16309
Feature activation+3.05e-02

Pos logit contributions
hra+0.144
anas+0.142
camp+0.142
veh+0.141
MIT+0.140
Neg logit contributions
uld-0.158
 underscore-0.141
 Kling-0.141
 twins-0.133
 tant-0.131
Token STA
Token ID47550
Feature activation-2.11e+00

Pos logit contributions
uld+0.023
 underscore+0.021
 Kling+0.021
 twins+0.020
 tant+0.019
Neg logit contributions
hra-0.019
camp-0.019
anas-0.019
veh-0.019
 Niagara-0.019
TokenHP
Token ID14082
Feature activation-4.58e-01

Pos logit contributions
MIT+1.229
camp+1.225
hra+1.222
anas+1.217
 Niagara+1.203
Neg logit contributions
uld-1.704
 underscore-1.548
 Kling-1.529
 twins-1.465
��-1.459
Token [
Token ID685
Feature activation-5.55e-02

Pos logit contributions
hra+0.297
anas+0.294
camp+0.294
MIT+0.291
veh+0.291
Neg logit contributions
uld-0.344
 underscore-0.306
 Kling-0.305
 twins-0.289
��-0.285
Token10
Token ID940
Feature activation-3.88e-01

Pos logit contributions
hra+0.050
veh+0.050
camp+0.050
MIT+0.049
anas+0.049
Neg logit contributions
uld-0.028
 underscore-0.024
 Kling-0.024
��-0.022
��-0.022
Token:
Token ID25
Feature activation-2.99e-02

Pos logit contributions
hra+0.258
camp+0.256
veh+0.255
anas+0.252
 Niagara+0.251
Neg logit contributions
uld-0.289
 underscore-0.259
 Kling-0.254
��-0.249
��-0.246
Token47
Token ID2857
Feature activation-5.77e-01

Pos logit contributions
hra+0.022
anas+0.022
camp+0.022
veh+0.022
MIT+0.022
Neg logit contributions
uld-0.020
 underscore-0.017
 Kling-0.017
 twins-0.016
 tant-0.016
Token buy
Token ID2822
Feature activation-6.40e-01

Pos logit contributions
uld+0.208
 Kling+0.183
 underscore+0.183
 twins+0.174
��+0.172
Neg logit contributions
hra-0.197
camp-0.194
anas-0.194
veh-0.193
 Niagara-0.192
Token you
Token ID345
Feature activation-3.17e-01

Pos logit contributions
hra+0.351
anas+0.344
camp+0.344
veh+0.342
 Niagara+0.340
Neg logit contributions
uld-0.547
 underscore-0.493
 Kling-0.492
 twins-0.468
��-0.465
Token?
Token ID30
Feature activation-3.51e-01

Pos logit contributions
hra+0.167
anas+0.164
camp+0.163
veh+0.163
 Niagara+0.161
Neg logit contributions
uld-0.278
 underscore-0.252
 Kling-0.251
 twins-0.240
��-0.238
Token While
Token ID2893
Feature activation-1.21e-01

Pos logit contributions
hra+0.265
camp+0.263
 Niagara+0.262
anas+0.261
veh+0.261
Neg logit contributions
uld-0.229
 underscore-0.198
 Kling-0.194
��-0.192
��-0.186
Token AT
Token ID5161
Feature activation-2.22e-01

Pos logit contributions
hra+0.081
camp+0.080
anas+0.079
 Niagara+0.079
veh+0.079
Neg logit contributions
uld-0.090
 Kling-0.079
 underscore-0.079
��-0.075
 twins-0.075
Token&
Token ID5
Feature activation-2.10e+00

Pos logit contributions
hra+0.147
camp+0.146
anas+0.146
veh+0.145
 Niagara+0.144
Neg logit contributions
uld-0.165
 underscore-0.148
 Kling-0.146
��-0.139
 twins-0.139
TokenT
Token ID51
Feature activation-1.25e+00

Pos logit contributions
MIT+1.267
hra+1.265
veh+1.258
camp+1.251
anas+1.218
Neg logit contributions
uld-1.681
 underscore-1.537
 Kling-1.504
��-1.499
��-1.495
Token's
Token ID338
Feature activation+1.40e-03

Pos logit contributions
hra+0.786
camp+0.778
anas+0.772
 Niagara+0.771
veh+0.770
Neg logit contributions
uld-0.966
 underscore-0.859
 Kling-0.857
��-0.821
 twins-0.814
Token powerhouse
Token ID34886
Feature activation-8.26e-01

Pos logit contributions
uld+0.001
 underscore+0.001
 Kling+0.001
 twins+0.001
��+0.001
Neg logit contributions
hra-0.001
camp-0.001
anas-0.001
veh-0.001
 Niagara-0.001
Token FISA
Token ID44073
Feature activation+4.26e-01

Pos logit contributions
hra+0.512
camp+0.508
 Niagara+0.504
veh+0.501
anas+0.497
Neg logit contributions
uld-0.647
 Kling-0.571
 underscore-0.571
��-0.556
��-0.542
Token lobbyist
Token ID35196
Feature activation-4.45e-01

Pos logit contributions
uld+0.324
 Kling+0.288
 underscore+0.288
 twins+0.272
��+0.270
Neg logit contributions
hra-0.274
camp-0.271
anas-0.269
veh-0.269
 Niagara-0.267
Token a
Token ID257
Feature activation+2.31e-02

Pos logit contributions
uld+0.130
 underscore+0.114
 Kling+0.114
 twins+0.107
��+0.106
Neg logit contributions
hra-0.124
anas-0.122
camp-0.122
veh-0.121
 Niagara-0.121
Token member
Token ID2888
Feature activation+2.03e-01

Pos logit contributions
uld+0.016
 underscore+0.015
 Kling+0.014
 twins+0.014
 tant+0.013
Neg logit contributions
hra-0.016
camp-0.016
anas-0.016
veh-0.016
 Niagara-0.015
Token of
Token ID286
Feature activation-1.46e-01

Pos logit contributions
uld+0.141
 Kling+0.125
 underscore+0.125
 twins+0.118
 tant+0.116
Neg logit contributions
hra-0.144
anas-0.141
veh-0.141
camp-0.141
MIT-0.141
Token the
Token ID262
Feature activation+1.71e-02

Pos logit contributions
hra+0.110
camp+0.108
anas+0.108
veh+0.108
 Niagara+0.107
Neg logit contributions
uld-0.095
 underscore-0.083
 Kling-0.083
 twins-0.078
 congr-0.076
Token break
Token ID2270
Feature activation-4.72e-01

Pos logit contributions
uld+0.013
 underscore+0.012
 Kling+0.012
 twins+0.011
��+0.011
Neg logit contributions
hra-0.011
anas-0.011
camp-0.011
veh-0.011
 Niagara-0.011
Tokenaway
Token ID8272
Feature activation-2.09e+00

Pos logit contributions
hra+0.291
camp+0.286
veh+0.285
anas+0.285
 Niagara+0.282
Neg logit contributions
uld-0.373
 underscore-0.333
 Kling-0.333
 twins-0.316
��-0.312
Token group
Token ID1448
Feature activation-9.87e-03

Pos logit contributions
 Niagara+1.354
camp+1.351
hra+1.351
veh+1.345
anas+1.340
Neg logit contributions
uld-1.557
 underscore-1.368
 Kling-1.358
��-1.296
��-1.291
Token.
Token ID13
Feature activation-1.64e-01

Pos logit contributions
hra+0.006
camp+0.006
anas+0.006
veh+0.006
 Niagara+0.006
Neg logit contributions
uld-0.008
 underscore-0.007
 Kling-0.007
 twins-0.006
 tant-0.006
Token
Token ID198
Feature activation+8.32e-02

Pos logit contributions
hra+0.126
camp+0.125
 Niagara+0.125
veh+0.125
anas+0.124
Neg logit contributions
uld-0.104
 underscore-0.090
 Kling-0.088
��-0.084
 twins-0.083
Token
Token ID198
Feature activation-1.11e-01

Pos logit contributions
uld+0.055
 underscore+0.048
 Kling+0.047
��+0.047
��+0.047
Neg logit contributions
hra-0.062
camp-0.062
veh-0.062
anas-0.061
 Niagara-0.061
TokenAnaly
Token ID37702
Feature activation-1.54e-01

Pos logit contributions
hra+0.084
camp+0.083
anas+0.083
veh+0.083
MIT+0.082
Neg logit contributions
uld-0.072
 underscore-0.063
 Kling-0.063
 twins-0.059
 congr-0.058
Token fracking
Token ID25441
Feature activation-2.23e-01

Pos logit contributions
hra+0.056
camp+0.055
anas+0.055
veh+0.055
 Niagara+0.054
Neg logit contributions
uld-0.057
 Kling-0.050
 underscore-0.050
 twins-0.048
��-0.047
Token.
Token ID13
Feature activation-1.12e-01

Pos logit contributions
hra+0.147
camp+0.144
anas+0.144
veh+0.144
 Niagara+0.142
Neg logit contributions
uld-0.166
 underscore-0.148
 Kling-0.147
 twins-0.139
 congr-0.137
Token It
Token ID632
Feature activation-7.64e-02

Pos logit contributions
 Niagara+0.085
camp+0.085
hra+0.085
 Nasa+0.084
MIT+0.083
Neg logit contributions
uld-0.073
 underscore-0.062
��-0.061
 Kling-0.060
��-0.060
Token is
Token ID318
Feature activation+3.51e-02

Pos logit contributions
hra+0.053
camp+0.052
anas+0.052
veh+0.052
 Niagara+0.051
Neg logit contributions
uld-0.055
 Kling-0.048
 underscore-0.048
 twins-0.046
��-0.045
Token also
Token ID635
Feature activation-3.76e-01

Pos logit contributions
uld+0.025
 underscore+0.021
 Kling+0.021
��+0.020
 twins+0.020
Neg logit contributions
hra-0.025
camp-0.025
anas-0.024
 Niagara-0.024
veh-0.024
Token doubtful
Token ID31608
Feature activation-2.08e+00

Pos logit contributions
hra+0.257
 Niagara+0.253
camp+0.253
anas+0.252
veh+0.252
Neg logit contributions
uld-0.273
 Kling-0.234
 underscore-0.234
��-0.226
��-0.223
Token that
Token ID326
Feature activation-1.77e-01

Pos logit contributions
hra+1.304
camp+1.295
 Niagara+1.282
veh+1.261
arkin+1.260
Neg logit contributions
uld-1.609
 underscore-1.412
 Kling-1.406
��-1.360
 twins-1.342
Token she
Token ID673
Feature activation+1.52e-02

Pos logit contributions
hra+0.137
camp+0.136
 Niagara+0.135
anas+0.135
 Nasa+0.134
Neg logit contributions
uld-0.112
 underscore-0.096
 Kling-0.095
��-0.091
 twins-0.090
Token would
Token ID561
Feature activation+2.00e-02

Pos logit contributions
uld+0.011
 Kling+0.009
 underscore+0.009
��+0.009
 twins+0.009
Neg logit contributions
hra-0.011
camp-0.010
anas-0.010
veh-0.010
 Niagara-0.010
Token take
Token ID1011
Feature activation-2.66e-01

Pos logit contributions
uld+0.013
 Kling+0.011
 underscore+0.011
��+0.011
��+0.011
Neg logit contributions
hra-0.015
camp-0.015
anas-0.014
veh-0.014
 Niagara-0.014
Token effective
Token ID4050
Feature activation+1.14e+00

Pos logit contributions
hra+0.193
camp+0.191
anas+0.190
 Niagara+0.190
 Nasa+0.188
Neg logit contributions
uld-0.181
 underscore-0.157
 Kling-0.156
��-0.150
��-0.147
Token Last
Token ID4586
Feature activation-7.80e-01

Pos logit contributions
hra+0.075
camp+0.074
veh+0.074
anas+0.074
 Niagara+0.074
Neg logit contributions
uld-0.063
 underscore-0.054
 Kling-0.053
��-0.051
 twins-0.051
Token week
Token ID1285
Feature activation-1.27e-01

Pos logit contributions
hra+0.495
anas+0.491
camp+0.490
veh+0.489
 Niagara+0.485
Neg logit contributions
uld-0.595
 underscore-0.526
 Kling-0.526
 twins-0.499
��-0.497
Token,
Token ID11
Feature activation+2.29e-01

Pos logit contributions
hra+0.080
camp+0.079
anas+0.079
veh+0.078
 Niagara+0.078
Neg logit contributions
uld-0.099
 underscore-0.088
 Kling-0.087
 twins-0.083
��-0.082
Token he
Token ID339
Feature activation-5.55e-02

Pos logit contributions
uld+0.166
 underscore+0.144
 Kling+0.144
��+0.137
 twins+0.136
Neg logit contributions
hra-0.157
camp-0.155
anas-0.155
veh-0.154
 Niagara-0.154
Token comm
Token ID725
Feature activation+8.90e-01

Pos logit contributions
hra+0.035
anas+0.034
camp+0.034
veh+0.034
 Niagara+0.034
Neg logit contributions
uld-0.043
 Kling-0.038
 underscore-0.038
��-0.036
 twins-0.036
Tokenended
Token ID1631
Feature activation-2.06e+00

Pos logit contributions
uld+0.636
 underscore+0.567
 Kling+0.565
 twins+0.535
��+0.527
Neg logit contributions
hra-0.612
camp-0.602
anas-0.601
veh-0.599
 Niagara-0.594
Token Trump
Token ID1301
Feature activation-4.44e-01

Pos logit contributions
hra+1.325
 Niagara+1.310
camp+1.300
anas+1.298
 Nasa+1.297
Neg logit contributions
uld-1.574
 underscore-1.376
 Kling-1.360
��-1.311
 twins-1.298
Token for
Token ID329
Feature activation+5.58e-01

Pos logit contributions
hra+0.288
camp+0.283
anas+0.281
veh+0.280
 Niagara+0.280
Neg logit contributions
uld-0.338
 Kling-0.298
 underscore-0.298
 twins-0.282
��-0.280
Token conducting
Token ID14523
Feature activation+6.48e-01

Pos logit contributions
uld+0.397
 underscore+0.348
 Kling+0.347
 twins+0.328
��+0.325
Neg logit contributions
hra-0.387
camp-0.381
anas-0.380
veh-0.379
 Niagara-0.378
Token military
Token ID2422
Feature activation+6.97e-02

Pos logit contributions
uld+0.445
 underscore+0.389
 Kling+0.388
 twins+0.365
��+0.360
Neg logit contributions
hra-0.466
camp-0.459
anas-0.459
veh-0.457
 Niagara-0.456
Token strikes
Token ID8956
Feature activation-9.75e-02

Pos logit contributions
uld+0.056
 Kling+0.049
 underscore+0.049
��+0.047
 twins+0.046
Neg logit contributions
camp-0.043
hra-0.042
veh-0.042
 Niagara-0.042
anas-0.042
Token support
Token ID1104
Feature activation+2.11e-02

Pos logit contributions
uld+0.259
 Kling+0.213
 underscore+0.210
 congr+0.197
��+0.196
Neg logit contributions
anas-0.440
hra-0.440
 Niagara-0.439
camp-0.438
 Nasa-0.437
Token the
Token ID262
Feature activation-1.98e-01

Pos logit contributions
uld+0.015
 Kling+0.013
 underscore+0.013
 congr+0.013
��+0.012
Neg logit contributions
anas-0.015
camp-0.015
 Nasa-0.015
hra-0.015
 Niagara-0.015
Token mission
Token ID4365
Feature activation-2.64e-01

Pos logit contributions
hra+0.130
camp+0.127
anas+0.127
veh+0.126
 Niagara+0.126
Neg logit contributions
uld-0.149
 underscore-0.132
 Kling-0.132
 twins-0.125
 congr-0.124
Token of
Token ID286
Feature activation-1.75e-01

Pos logit contributions
hra+0.171
camp+0.170
veh+0.169
anas+0.168
 Niagara+0.168
Neg logit contributions
uld-0.198
 Kling-0.176
 underscore-0.175
 twins-0.167
��-0.166
Token New
Token ID968
Feature activation-1.53e-01

Pos logit contributions
anas+0.108
camp+0.108
 Niagara+0.108
hra+0.107
 Nasa+0.107
Neg logit contributions
uld-0.138
 underscore-0.121
 Kling-0.120
 twins-0.114
��-0.114
Token Advent
Token ID33732
Feature activation-2.03e+00

Pos logit contributions
hra+0.114
anas+0.112
camp+0.112
veh+0.111
 Niagara+0.111
Neg logit contributions
uld-0.102
 underscore-0.089
 Kling-0.089
 twins-0.083
 tant-0.082
Token and
Token ID290
Feature activation+2.78e-01

Pos logit contributions
camp+1.150
hra+1.140
veh+1.136
 AMA+1.118
anas+1.107
Neg logit contributions
uld-1.685
��-1.521
 underscore-1.515
 Kling-1.509
��-1.457
Token get
Token ID651
Feature activation+1.24e-01

Pos logit contributions
uld+0.202
 underscore+0.182
 Kling+0.180
 twins+0.170
 tant+0.167
Neg logit contributions
hra-0.188
anas-0.188
veh-0.184
camp-0.184
MIT-0.184
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
uld+0.087
 underscore+0.076
 Kling+0.075
��+0.072
��+0.071
Neg logit contributions
hra-0.087
camp-0.086
 Niagara-0.085
veh-0.085
anas-0.085
Token full
Token ID1336
Feature activation+6.19e-01

Pos logit contributions
uld+0.008
 Kling+0.007
 underscore+0.007
��+0.006
��+0.006
Neg logit contributions
hra-0.011
camp-0.011
anas-0.011
veh-0.011
 Niagara-0.011
Token contents
Token ID10154
Feature activation+4.42e-01

Pos logit contributions
uld+0.354
 underscore+0.320
 Kling+0.304
 twins+0.294
 tant+0.285
Neg logit contributions
hra-0.510
anas-0.510
MIT-0.503
anos-0.502
arkin-0.501
Tokenurally
Token ID20221
Feature activation-1.65e-01

Pos logit contributions
uld+0.913
 underscore+0.811
 Kling+0.809
 twins+0.766
��+0.757
Neg logit contributions
hra-0.903
anas-0.888
veh-0.886
camp-0.884
arkin-0.873
Token generated
Token ID7560
Feature activation-1.97e-01

Pos logit contributions
hra+0.117
anas+0.115
veh+0.115
camp+0.114
 Niagara+0.114
Neg logit contributions
uld-0.115
 Kling-0.101
 underscore-0.101
 twins-0.095
��-0.094
Token universe
Token ID6881
Feature activation-1.46e-01

Pos logit contributions
hra+0.144
veh+0.142
anas+0.141
 Niagara+0.141
camp+0.141
Neg logit contributions
uld-0.133
 underscore-0.115
 Kling-0.114
��-0.108
 twins-0.108
Token
Token ID198
Feature activation-2.39e-01

Pos logit contributions
hra+0.097
camp+0.095
anas+0.095
veh+0.095
 Niagara+0.094
Neg logit contributions
uld-0.108
 underscore-0.096
 Kling-0.095
 twins-0.090
 congr-0.089
Token
Token ID198
Feature activation-6.29e-01

Pos logit contributions
hra+0.180
camp+0.179
veh+0.178
anas+0.176
 Niagara+0.176
Neg logit contributions
uld-0.157
 underscore-0.138
 Kling-0.135
��-0.134
��-0.133
TokenCol
Token ID5216
Feature activation-1.99e+00

Pos logit contributions
hra+0.419
camp+0.415
veh+0.415
anas+0.413
MIT+0.410
Neg logit contributions
uld-0.464
 underscore-0.414
 Kling-0.411
 twins-0.390
 tant-0.385
Tokenon
Token ID261
Feature activation+1.45e-02

Pos logit contributions
anas+1.286
hra+1.281
camp+1.280
veh+1.278
arkin+1.259
Neg logit contributions
uld-1.491
 Kling-1.345
 underscore-1.342
��-1.270
 twins-1.265
Tokenize
Token ID1096
Feature activation+8.06e-01

Pos logit contributions
uld+0.012
 underscore+0.011
 Kling+0.011
 twins+0.010
 tant+0.010
Neg logit contributions
hra-0.009
camp-0.008
anas-0.008
veh-0.008
 Niagara-0.008
Token unch
Token ID16258
Feature activation-9.33e-02

Pos logit contributions
uld+0.593
 underscore+0.527
 Kling+0.526
 twins+0.498
 tant+0.489
Neg logit contributions
hra-0.539
camp-0.529
anas-0.529
veh-0.527
 Niagara-0.522
Tokenarted
Token ID19112
Feature activation-2.43e-01

Pos logit contributions
hra+0.095
anas+0.094
camp+0.093
veh+0.093
arkin+0.092
Neg logit contributions
uld-0.036
 underscore-0.030
 Kling-0.029
 twins-0.026
 congr-0.026
Token planets
Token ID14705
Feature activation-3.24e-01

Pos logit contributions
hra+0.151
veh+0.148
anas+0.148
camp+0.148
 Niagara+0.147
Neg logit contributions
uld-0.190
 underscore-0.170
 Kling-0.168
 twins-0.160
��-0.159
Token Spain
Token ID8602
Feature activation+4.54e-01

Pos logit contributions
uld+0.061
 underscore+0.054
 Kling+0.054
��+0.051
 twins+0.051
Neg logit contributions
hra-0.053
anas-0.052
veh-0.052
 Niagara-0.052
camp-0.052
Token,
Token ID11
Feature activation+3.73e-01

Pos logit contributions
uld+0.322
 Kling+0.284
 underscore+0.282
 twins+0.266
��+0.265
Neg logit contributions
hra-0.316
camp-0.312
anas-0.311
veh-0.308
 Niagara-0.308
Token to
Token ID284
Feature activation+3.23e-02

Pos logit contributions
uld+0.253
 underscore+0.219
 Kling+0.219
��+0.209
 twins+0.207
Neg logit contributions
hra-0.272
camp-0.267
anas-0.267
veh-0.266
 Niagara-0.265
Token
Token ID564
Feature activation+6.19e-01

Pos logit contributions
uld+0.023
 Kling+0.020
 underscore+0.020
��+0.019
 twins+0.019
Neg logit contributions
hra-0.022
camp-0.022
anas-0.022
 Niagara-0.022
veh-0.022
Token
Token ID250
Feature activation+1.29e-01

Pos logit contributions
uld+0.374
 Kling+0.319
 underscore+0.318
 twins+0.302
 congr+0.291
Neg logit contributions
camp-0.495
hra-0.493
veh-0.489
anas-0.487
arkin-0.487
Tokenlet
Token ID1616
Feature activation-1.98e+00

Pos logit contributions
uld+0.094
 Kling+0.084
 underscore+0.084
 twins+0.080
��+0.079
Neg logit contributions
hra-0.086
camp-0.086
anas-0.086
veh-0.085
arkin-0.084
Token
Token ID447
Feature activation+5.99e-03

Pos logit contributions
hra+1.278
anas+1.260
camp+1.258
veh+1.252
arkin+1.235
Neg logit contributions
uld-1.484
 Kling-1.316
 underscore-1.309
 twins-1.257
��-1.253
Token
Token ID247
Feature activation-2.50e-01

Pos logit contributions
uld+0.002
 Kling+0.002
 underscore+0.002
 twins+0.002
 congr+0.002
Neg logit contributions
hra-0.006
camp-0.006
anas-0.006
 Niagara-0.006
veh-0.006
Tokens
Token ID82
Feature activation+2.83e-01

Pos logit contributions
hra+0.175
veh+0.171
camp+0.171
anas+0.171
 Niagara+0.168
Neg logit contributions
uld-0.176
 Kling-0.156
 underscore-0.155
 twins-0.148
 congr-0.145
Token see
Token ID766
Feature activation-9.33e-02

Pos logit contributions
uld+0.194
 Kling+0.170
 underscore+0.168
 twins+0.161
��+0.160
Neg logit contributions
hra-0.203
camp-0.200
veh-0.199
anas-0.199
 Niagara-0.197
Token first
Token ID717
Feature activation-8.98e-01

Pos logit contributions
hra+0.066
camp+0.065
anas+0.064
veh+0.064
 Niagara+0.064
Neg logit contributions
uld-0.065
 Kling-0.057
 underscore-0.057
 twins-0.054
��-0.054
Token S
Token ID311
Feature activation-2.09e-01

Pos logit contributions
hra+0.090
camp+0.088
anas+0.088
veh+0.088
 Niagara+0.087
Neg logit contributions
uld-0.102
 underscore-0.091
 Kling-0.090
 twins-0.086
 tant-0.084
Token Mr
Token ID1770
Feature activation+3.75e-01

Pos logit contributions
hra+0.136
anas+0.135
camp+0.134
veh+0.134
arkin+0.132
Neg logit contributions
uld-0.157
 underscore-0.141
 Kling-0.140
 twins-0.134
 tant-0.132
Tokená
Token ID6557
Feature activation-1.72e-01

Pos logit contributions
uld+0.289
 underscore+0.258
 Kling+0.257
 twins+0.244
 tant+0.241
Neg logit contributions
hra-0.237
camp-0.232
anas-0.232
veh-0.232
 Niagara-0.230
Tokenz
Token ID89
Feature activation-1.63e-01

Pos logit contributions
hra+0.111
camp+0.109
anas+0.109
veh+0.108
 Niagara+0.107
Neg logit contributions
uld-0.131
 underscore-0.117
 Kling-0.117
 twins-0.111
 congr-0.109
Token (
Token ID357
Feature activation-2.03e-01

Pos logit contributions
hra+0.104
camp+0.102
anas+0.102
veh+0.101
 Niagara+0.101
Neg logit contributions
uld-0.125
 underscore-0.112
 Kling-0.111
 twins-0.106
 tant-0.104
TokenMich
Token ID11180
Feature activation-1.97e+00

Pos logit contributions
hra+0.130
anas+0.128
camp+0.128
veh+0.127
 Niagara+0.126
Neg logit contributions
uld-0.156
 underscore-0.139
 Kling-0.138
 twins-0.131
 tant-0.129
Tokenl
Token ID75
Feature activation-3.46e-01

Pos logit contributions
hra+1.328
anas+1.306
camp+1.293
veh+1.290
arkin+1.277
Neg logit contributions
uld-1.415
 underscore-1.289
 Kling-1.270
 twins-1.223
��-1.215
Tokení
Token ID8836
Feature activation+5.82e-01

Pos logit contributions
hra+0.213
anas+0.209
camp+0.208
veh+0.207
arkin+0.205
Neg logit contributions
uld-0.275
 underscore-0.247
 Kling-0.245
 twins-0.234
��-0.232
Tokenk
Token ID74
Feature activation+5.41e-01

Pos logit contributions
uld+0.439
 underscore+0.390
 Kling+0.389
 twins+0.369
 tant+0.363
Neg logit contributions
hra-0.377
camp-0.371
anas-0.371
veh-0.371
 Niagara-0.367
Token 41
Token ID6073
Feature activation-4.16e-01

Pos logit contributions
uld+0.394
 underscore+0.350
 Kling+0.348
 twins+0.330
��+0.326
Neg logit contributions
hra-0.367
anas-0.361
camp-0.360
veh-0.358
 Niagara-0.356
Token),
Token ID828
Feature activation-1.76e-01

Pos logit contributions
hra+0.308
camp+0.304
anas+0.304
veh+0.303
 Niagara+0.301
Neg logit contributions
uld-0.275
 underscore-0.241
 Kling-0.240
 twins-0.225
 tant-0.221
Token not
Token ID407
Feature activation-7.22e-02

Pos logit contributions
hra+0.192
anas+0.189
camp+0.189
veh+0.189
 Niagara+0.187
Neg logit contributions
uld-0.145
 underscore-0.126
 Kling-0.125
 twins-0.117
 tant-0.115
Token use
Token ID779
Feature activation-3.69e-01

Pos logit contributions
hra+0.050
camp+0.050
anas+0.049
veh+0.049
 Niagara+0.049
Neg logit contributions
uld-0.051
 Kling-0.044
 underscore-0.044
��-0.042
 twins-0.042
Token social
Token ID1919
Feature activation-1.10e+00

Pos logit contributions
hra+0.268
anas+0.263
camp+0.263
veh+0.263
 Niagara+0.260
Neg logit contributions
uld-0.251
 underscore-0.221
 Kling-0.220
 twins-0.207
 tant-0.204
Token media
Token ID2056
Feature activation-7.29e-01

Pos logit contributions
camp+0.643
hra+0.637
veh+0.636
 Niagara+0.628
 Nasa+0.619
Neg logit contributions
uld-0.901
 underscore-0.797
 Kling-0.791
��-0.790
��-0.786
Token to
Token ID284
Feature activation-4.15e-01

Pos logit contributions
hra+0.564
camp+0.557
anas+0.556
veh+0.555
 Niagara+0.550
Neg logit contributions
uld-0.459
 underscore-0.397
 Kling-0.396
 twins-0.370
��-0.366
Token let
Token ID1309
Feature activation-1.94e+00

Pos logit contributions
hra+0.290
camp+0.289
anas+0.285
 Niagara+0.285
veh+0.285
Neg logit contributions
uld-0.290
 Kling-0.252
 underscore-0.246
��-0.245
��-0.239
Token people
Token ID661
Feature activation-7.13e-01

Pos logit contributions
hra+1.273
camp+1.266
 Niagara+1.263
anas+1.260
veh+1.259
Neg logit contributions
uld-1.450
 underscore-1.254
 Kling-1.245
��-1.207
 twins-1.185
Token know
Token ID760
Feature activation-2.97e-01

Pos logit contributions
hra+0.471
camp+0.468
anas+0.463
 Niagara+0.463
veh+0.459
Neg logit contributions
uld-0.528
 Kling-0.462
 underscore-0.458
��-0.451
��-0.440
Token what
Token ID644
Feature activation+2.70e-02

Pos logit contributions
hra+0.184
camp+0.183
 Niagara+0.181
anas+0.181
veh+0.181
Neg logit contributions
uld-0.233
 underscore-0.205
 Kling-0.203
��-0.195
 twins-0.194
Token you
Token ID345
Feature activation-3.81e-02

Pos logit contributions
uld+0.022
 underscore+0.019
 Kling+0.019
��+0.018
 twins+0.018
Neg logit contributions
hra-0.016
anas-0.016
camp-0.016
 Niagara-0.016
veh-0.016
Token look
Token ID804
Feature activation-1.45e-02

Pos logit contributions
hra+0.028
camp+0.028
anas+0.028
 Niagara+0.028
veh+0.028
Neg logit contributions
uld-0.025
 Kling-0.022
 underscore-0.022
��-0.021
��-0.021
Token 2010
Token ID3050
Feature activation+1.23e-01

Pos logit contributions
uld+0.238
 underscore+0.212
 Kling+0.205
��+0.204
��+0.203
Neg logit contributions
hra-0.221
camp-0.218
 Niagara-0.218
anas-0.218
 Nasa-0.217
Token United
Token ID1578
Feature activation+3.93e-02

Pos logit contributions
uld+0.090
 underscore+0.080
 Kling+0.079
 twins+0.075
��+0.074
Neg logit contributions
hra-0.083
anas-0.081
camp-0.081
veh-0.081
 Niagara-0.081
Token States
Token ID1829
Feature activation-8.97e-02

Pos logit contributions
uld+0.036
 Kling+0.033
 underscore+0.033
 twins+0.032
 tant+0.032
Neg logit contributions
hra-0.019
camp-0.018
veh-0.018
anas-0.018
MIT-0.018
Token 21
Token ID2310
Feature activation+4.04e-01

Pos logit contributions
hra+0.064
 Niagara+0.063
camp+0.063
anas+0.062
veh+0.062
Neg logit contributions
uld-0.063
 underscore-0.055
 Kling-0.054
��-0.052
��-0.051
Token50
Token ID1120
Feature activation-2.62e-01

Pos logit contributions
uld+0.268
 underscore+0.236
 Kling+0.231
��+0.219
 twins+0.219
Neg logit contributions
hra-0.300
camp-0.296
 Niagara-0.295
anas-0.294
veh-0.293
Token Posts
Token ID12043
Feature activation+2.91e+00

Pos logit contributions
camp+0.159
 Niagara+0.159
hra+0.158
 AMA+0.157
anas+0.157
Neg logit contributions
uld-0.207
 underscore-0.187
��-0.184
��-0.183
 Kling-0.178
Token #
Token ID1303
Feature activation+8.86e-01

Pos logit contributions
uld+1.548
 underscore+1.327
 Kling+1.309
 twins+1.215
 congr+1.211
Neg logit contributions
hra-2.503
anas-2.482
veh-2.480
MIT-2.478
camp-2.470
Token16
Token ID1433
Feature activation+7.07e-03

Pos logit contributions
uld+0.514
 underscore+0.458
 Kling+0.455
��+0.451
��+0.436
Neg logit contributions
hra-0.721
camp-0.720
veh-0.717
anas-0.709
MIT-0.699
Token Glad
Token ID28442
Feature activation-3.08e-01

Pos logit contributions
uld+0.005
 underscore+0.004
 Kling+0.004
��+0.004
��+0.004
Neg logit contributions
hra-0.005
camp-0.005
 Niagara-0.005
anas-0.005
 Nasa-0.005
Token to
Token ID284
Feature activation-1.80e-01

Pos logit contributions
hra+0.217
camp+0.215
anas+0.214
 Niagara+0.212
veh+0.211
Neg logit contributions
uld-0.215
 underscore-0.189
 Kling-0.188
��-0.178
 twins-0.177
Token hear
Token ID3285
Feature activation-6.43e-01

Pos logit contributions
hra+0.128
camp+0.126
veh+0.125
 Niagara+0.124
anas+0.124
Neg logit contributions
uld-0.125
 Kling-0.109
 underscore-0.108
��-0.106
��-0.105
Token mates
Token ID27973
Feature activation+2.61e-02

Pos logit contributions
hra+0.509
anas+0.506
camp+0.505
veh+0.501
 Niagara+0.500
Neg logit contributions
uld-0.376
 underscore-0.319
 Kling-0.319
��-0.298
 congr-0.293
Token,
Token ID11
Feature activation-1.83e-01

Pos logit contributions
uld+0.021
 Kling+0.019
 underscore+0.019
 twins+0.018
��+0.018
Neg logit contributions
hra-0.016
anas-0.015
camp-0.015
veh-0.015
 Niagara-0.015
Token V
Token ID569
Feature activation-7.40e-01

Pos logit contributions
hra+0.127
anas+0.126
camp+0.125
veh+0.124
 Niagara+0.124
Neg logit contributions
uld-0.130
 Kling-0.113
 underscore-0.113
��-0.108
 twins-0.106
TokenPs
Token ID12016
Feature activation-3.71e-02

Pos logit contributions
hra+0.497
anas+0.496
camp+0.492
veh+0.485
arkin+0.485
Neg logit contributions
uld-0.533
 Kling-0.477
 underscore-0.473
��-0.455
 twins-0.449
Token See
Token ID4091
Feature activation+8.44e-02

Pos logit contributions
hra+0.024
anas+0.024
camp+0.024
veh+0.023
 Niagara+0.023
Neg logit contributions
uld-0.028
 Kling-0.025
 underscore-0.025
 twins-0.024
��-0.023
Token Gallery
Token ID12917
Feature activation+2.52e+00

Pos logit contributions
uld+0.074
 underscore+0.066
 Kling+0.066
 twins+0.063
��+0.063
Neg logit contributions
hra-0.045
camp-0.044
veh-0.044
anas-0.044
 Niagara-0.044
Token Donald
Token ID3759
Feature activation-1.11e+00

Pos logit contributions
uld+1.679
 underscore+1.482
 Kling+1.478
 twins+1.393
 tant+1.362
Neg logit contributions
hra-1.844
anas-1.810
camp-1.809
veh-1.800
arkin-1.781
Token Trump
Token ID1301
Feature activation-9.96e-01

Pos logit contributions
hra+0.616
 Nasa+0.612
 Niagara+0.612
camp+0.611
anas+0.611
Neg logit contributions
uld-0.943
 underscore-0.832
��-0.817
 Kling-0.812
��-0.806
Token passes
Token ID8318
Feature activation+7.74e-02

Pos logit contributions
hra+0.599
anas+0.590
 Niagara+0.587
camp+0.587
veh+0.586
Neg logit contributions
uld-0.800
 underscore-0.714
 Kling-0.707
��-0.693
��-0.681
Token on
Token ID319
Feature activation-1.79e-01

Pos logit contributions
uld+0.053
 underscore+0.046
 Kling+0.046
 twins+0.043
��+0.043
Neg logit contributions
hra-0.056
anas-0.055
camp-0.055
veh-0.055
 Niagara-0.054
Token '
Token ID705
Feature activation-8.94e-01

Pos logit contributions
hra+0.141
anas+0.139
camp+0.139
veh+0.138
 Niagara+0.137
Neg logit contributions
uld-0.111
 underscore-0.096
 Kling-0.096
 twins-0.089
��-0.088
Token 2011
Token ID2813
Feature activation-1.01e-01

Pos logit contributions
uld+0.120
 underscore+0.107
��+0.104
 Kling+0.104
��+0.103
Neg logit contributions
hra-0.110
camp-0.109
 Niagara-0.109
 Nasa-0.108
anas-0.108
Token United
Token ID1578
Feature activation-2.00e-01

Pos logit contributions
hra+0.067
 Niagara+0.066
camp+0.065
anas+0.065
veh+0.065
Neg logit contributions
uld-0.075
 underscore-0.067
 Kling-0.065
 twins-0.063
��-0.062
Token States
Token ID1829
Feature activation-2.03e-01

Pos logit contributions
hra+0.095
camp+0.093
anas+0.093
veh+0.093
MIT+0.092
Neg logit contributions
uld-0.186
 Kling-0.170
 underscore-0.169
 twins-0.163
 tant-0.160
Token 228
Token ID29041
Feature activation+8.16e-02

Pos logit contributions
hra+0.142
 Niagara+0.140
camp+0.140
anas+0.139
veh+0.139
Neg logit contributions
uld-0.143
 underscore-0.126
 Kling-0.124
��-0.119
��-0.118
Token65
Token ID2996
Feature activation-1.01e-02

Pos logit contributions
uld+0.061
 underscore+0.054
 Kling+0.053
��+0.051
��+0.051
Neg logit contributions
hra-0.053
camp-0.053
 Niagara-0.053
veh-0.053
anas-0.052
Token Posts
Token ID12043
Feature activation+3.10e+00

Pos logit contributions
camp+0.006
hra+0.006
 Niagara+0.006
veh+0.006
 AMA+0.006
Neg logit contributions
uld-0.008
 underscore-0.007
��-0.007
��-0.007
 Kling-0.007
Token #
Token ID1303
Feature activation+1.02e+00

Pos logit contributions
uld+1.641
 underscore+1.404
 Kling+1.383
 twins+1.287
 congr+1.275
Neg logit contributions
hra-2.663
MIT-2.646
veh-2.638
anas-2.637
camp-2.629
Token13
Token ID1485
Feature activation-7.93e-02

Pos logit contributions
uld+0.707
 underscore+0.624
 Kling+0.619
 twins+0.589
��+0.587
Neg logit contributions
hra-0.731
camp-0.724
veh-0.723
anas-0.717
arkin-0.711
Token On
Token ID1550
Feature activation-3.24e-01

Pos logit contributions
hra+0.058
camp+0.057
 Niagara+0.057
anas+0.057
 Nasa+0.057
Neg logit contributions
uld-0.053
 underscore-0.046
 Kling-0.046
��-0.044
��-0.044
Token September
Token ID2693
Feature activation-1.55e-01

Pos logit contributions
hra+0.197
 Niagara+0.193
camp+0.191
 Nasa+0.190
veh+0.190
Neg logit contributions
uld-0.258
 underscore-0.231
 Kling-0.231
��-0.223
��-0.222
Token 20
Token ID1160
Feature activation-7.63e-01

Pos logit contributions
hra+0.127
anas+0.126
camp+0.125
veh+0.125
 Niagara+0.124
Neg logit contributions
uld-0.091
 underscore-0.078
 Kling-0.078
 twins-0.072
 tant-0.071
Token June
Token ID2795
Feature activation+1.63e-01

Pos logit contributions
uld+1.239
 underscore+1.107
 Kling+1.102
 twins+1.048
��+1.034
Neg logit contributions
hra-0.992
camp-0.976
anas-0.974
veh-0.971
 Niagara-0.963
Token 2012
Token ID2321
Feature activation-3.24e-01

Pos logit contributions
uld+0.119
 underscore+0.106
 Kling+0.103
��+0.101
��+0.101
Neg logit contributions
hra-0.110
camp-0.109
 Niagara-0.108
anas-0.108
 Nasa-0.108
Token Canada
Token ID3340
Feature activation+2.08e-01

Pos logit contributions
hra+0.215
 Niagara+0.212
camp+0.211
anas+0.211
veh+0.210
Neg logit contributions
uld-0.240
 underscore-0.214
 Kling-0.209
��-0.201
 twins-0.201
Token 16
Token ID1467
Feature activation-2.15e-02

Pos logit contributions
uld+0.150
 underscore+0.132
 Kling+0.129
��+0.126
��+0.125
Neg logit contributions
hra-0.143
 Niagara-0.142
camp-0.141
anas-0.140
 Nasa-0.140
Token10
Token ID940
Feature activation-3.70e-01

Pos logit contributions
hra+0.014
camp+0.014
anas+0.014
veh+0.014
 Niagara+0.014
Neg logit contributions
uld-0.016
 underscore-0.014
 Kling-0.014
��-0.014
 twins-0.014
Token Posts
Token ID12043
Feature activation+2.81e+00

Pos logit contributions
camp+0.227
 Niagara+0.227
hra+0.225
 AMA+0.225
veh+0.225
Neg logit contributions
uld-0.289
 underscore-0.260
��-0.256
��-0.252
 Kling-0.250
Token #
Token ID1303
Feature activation+9.53e-01

Pos logit contributions
uld+1.498
 underscore+1.287
 Kling+1.271
 twins+1.179
 congr+1.165
Neg logit contributions
hra-2.418
MIT-2.397
anas-2.393
veh-2.391
camp-2.385
Token14
Token ID1415
Feature activation-5.46e-01

Pos logit contributions
uld+0.623
 underscore+0.549
 Kling+0.544
��+0.516
 twins+0.515
Neg logit contributions
hra-0.717
veh-0.711
camp-0.708
anas-0.703
arkin-0.697
Token On
Token ID1550
Feature activation-3.45e-01

Pos logit contributions
hra+0.400
camp+0.396
 Niagara+0.395
 Nasa+0.392
anas+0.392
Neg logit contributions
uld-0.363
 underscore-0.316
 Kling-0.314
��-0.304
��-0.302
Token September
Token ID2693
Feature activation-2.63e-01

Pos logit contributions
hra+0.208
 Niagara+0.204
 Nasa+0.202
camp+0.202
veh+0.201
Neg logit contributions
uld-0.276
 underscore-0.248
 Kling-0.248
��-0.240
��-0.239
Token 20
Token ID1160
Feature activation-1.00e+00

Pos logit contributions
hra+0.222
camp+0.220
veh+0.219
anas+0.219
 Niagara+0.217
Neg logit contributions
uld-0.147
 underscore-0.125
 Kling-0.125
 twins-0.115
 tant-0.113
Token 2011
Token ID2813
Feature activation-2.48e-01

Pos logit contributions
uld+0.394
 underscore+0.350
 Kling+0.338
��+0.336
��+0.335
Neg logit contributions
hra-0.368
camp-0.365
anas-0.364
 Niagara-0.364
 Nasa-0.361
Token United
Token ID1578
Feature activation-2.81e-01

Pos logit contributions
hra+0.165
 Niagara+0.162
camp+0.162
anas+0.162
veh+0.162
Neg logit contributions
uld-0.183
 underscore-0.163
 Kling-0.159
 twins-0.153
��-0.153
Token States
Token ID1829
Feature activation-2.92e-01

Pos logit contributions
hra+0.136
anas+0.133
camp+0.132
veh+0.132
MIT+0.131
Neg logit contributions
uld-0.258
 Kling-0.236
 underscore-0.235
 twins-0.226
 tant-0.223
Token 6
Token ID718
Feature activation-9.25e-02

Pos logit contributions
hra+0.205
 Niagara+0.203
camp+0.202
anas+0.201
veh+0.201
Neg logit contributions
uld-0.206
 underscore-0.181
 Kling-0.177
��-0.172
��-0.170
Token12
Token ID1065
Feature activation+8.47e-02

Pos logit contributions
hra+0.058
camp+0.057
anas+0.057
veh+0.057
 Niagara+0.056
Neg logit contributions
uld-0.072
 underscore-0.065
 Kling-0.064
 twins-0.061
��-0.061
Token Posts
Token ID12043
Feature activation+3.28e+00

Pos logit contributions
uld+0.065
 underscore+0.058
 Kling+0.057
��+0.056
��+0.056
Neg logit contributions
camp-0.053
hra-0.053
 Niagara-0.053
veh-0.053
anas-0.053
Token #
Token ID1303
Feature activation+1.01e+00

Pos logit contributions
uld+1.706
 underscore+1.459
 Kling+1.436
 twins+1.336
 congr+1.326
Neg logit contributions
hra-2.849
MIT-2.827
veh-2.820
anas-2.815
camp-2.806
Token17
Token ID1558
Feature activation+1.29e-01

Pos logit contributions
uld+0.729
 underscore+0.656
 Kling+0.655
��+0.628
 twins+0.624
Neg logit contributions
hra-0.677
camp-0.675
veh-0.675
anas-0.668
 Niagara-0.657
Token Good
Token ID4599
Feature activation-6.76e-01

Pos logit contributions
uld+0.086
 underscore+0.075
 Kling+0.075
��+0.072
��+0.071
Neg logit contributions
hra-0.094
camp-0.093
 Niagara-0.093
anas-0.093
 Nasa-0.092
Token to
Token ID284
Feature activation-6.43e-01

Pos logit contributions
hra+0.513
camp+0.507
anas+0.503
veh+0.503
 Niagara+0.499
Neg logit contributions
uld-0.435
 underscore-0.379
 Kling-0.377
��-0.355
 twins-0.352
Token hear
Token ID3285
Feature activation-9.02e-01

Pos logit contributions
hra+0.471
 Niagara+0.461
camp+0.460
veh+0.458
anas+0.456
Neg logit contributions
uld-0.432
 Kling-0.373
 underscore-0.370
��-0.361
��-0.357
Token(
Token ID7
Feature activation-3.91e-02

Pos logit contributions
uld+0.196
 underscore+0.178
 Kling+0.175
��+0.170
 twins+0.168
Neg logit contributions
hra-0.150
camp-0.149
veh-0.149
anas-0.147
MIT-0.146
Token316
Token ID33400
Feature activation+1.87e-01

Pos logit contributions
hra+0.026
camp+0.026
veh+0.025
anas+0.025
 Niagara+0.025
Neg logit contributions
uld-0.029
 underscore-0.026
 Kling-0.026
 twins-0.024
��-0.024
Tokens
Token ID82
Feature activation+7.11e-02

Pos logit contributions
uld+0.132
 underscore+0.119
 Kling+0.117
��+0.113
��+0.113
Neg logit contributions
hra-0.129
anas-0.129
veh-0.129
camp-0.128
arkin-0.126
Token)}
Token ID38165
Feature activation+1.63e-01

Pos logit contributions
uld+0.031
 underscore+0.025
 Kling+0.024
��+0.023
��+0.023
Neg logit contributions
hra-0.069
veh-0.068
camp-0.068
 Niagara-0.068
anas-0.068
Token 59
Token ID7863
Feature activation+3.27e-01

Pos logit contributions
uld+0.114
 Kling+0.101
 underscore+0.101
 twins+0.094
 congr+0.094
Neg logit contributions
hra-0.114
anas-0.113
camp-0.113
veh-0.112
MIT-0.111
Token.
Token ID13
Feature activation+7.61e-01

Pos logit contributions
uld+0.270
 underscore+0.244
 Kling+0.239
��+0.234
��+0.232
Neg logit contributions
hra-0.191
camp-0.187
veh-0.187
anas-0.185
 Niagara-0.184
Token N
Token ID399
Feature activation-1.13e-01

Pos logit contributions
uld+0.517
 underscore+0.452
 Kling+0.446
 twins+0.423
��+0.420
Neg logit contributions
hra-0.553
camp-0.543
veh-0.541
anas-0.541
 Niagara-0.539
Tokenf
Token ID69
Feature activation+8.05e-01

Pos logit contributions
hra+0.073
camp+0.072
veh+0.071
anas+0.071
arkin+0.070
Neg logit contributions
uld-0.086
 underscore-0.078
 Kling-0.077
 twins-0.073
��-0.072
Token5
Token ID20
Feature activation+7.59e-01

Pos logit contributions
uld+0.590
 underscore+0.524
 Kling+0.518
 twins+0.496
��+0.488
Neg logit contributions
hra-0.543
camp-0.532
veh-0.532
anas-0.532
 Niagara-0.526
Token+
Token ID10
Feature activation-6.75e-01

Pos logit contributions
uld+0.640
 underscore+0.577
 Kling+0.576
 twins+0.549
 congr+0.542
Neg logit contributions
hra-0.426
anas-0.418
camp-0.417
veh-0.416
 Niagara-0.411
Token {
Token ID1391
Feature activation-1.59e-01

Pos logit contributions
hra+0.369
camp+0.361
anas+0.361
veh+0.359
 Niagara+0.356
Neg logit contributions
uld-0.579
 underscore-0.523
 Kling-0.522
 twins-0.498
 tant-0.492
Token Max
Token ID5436
Feature activation+6.52e-01

Pos logit contributions
hra+0.013
anas+0.013
camp+0.013
veh+0.012
 Niagara+0.012
Neg logit contributions
uld-0.011
 underscore-0.010
 Kling-0.010
 twins-0.009
��-0.009
Tokenie
Token ID494
Feature activation+4.53e-01

Pos logit contributions
uld+0.497
 underscore+0.443
 Kling+0.441
 twins+0.419
 tant+0.413
Neg logit contributions
hra-0.418
camp-0.411
anas-0.411
veh-0.409
 Niagara-0.406
Token,
Token ID11
Feature activation+2.90e-01

Pos logit contributions
uld+0.353
 underscore+0.315
 Kling+0.315
 twins+0.299
 tant+0.294
Neg logit contributions
hra-0.283
anas-0.278
camp-0.278
veh-0.277
 Niagara-0.274
Token of
Token ID286
Feature activation+3.61e-01

Pos logit contributions
uld+0.208
 underscore+0.183
 Kling+0.182
��+0.171
 twins+0.171
Neg logit contributions
hra-0.198
camp-0.197
anas-0.196
veh-0.196
 Niagara-0.195
Token Se
Token ID1001
Feature activation+3.09e-01

Pos logit contributions
uld+0.273
 underscore+0.243
 Kling+0.242
 twins+0.230
 tant+0.227
Neg logit contributions
hra-0.234
camp-0.230
anas-0.230
veh-0.229
 Niagara-0.227
Tokengu
Token ID5162
Feature activation+9.14e-01

Pos logit contributions
uld+0.210
 underscore+0.185
 Kling+0.185
 twins+0.174
 tant+0.171
Neg logit contributions
hra-0.223
camp-0.220
anas-0.219
veh-0.219
 Niagara-0.217
Tokenin
Token ID259
Feature activation+3.24e-01

Pos logit contributions
uld+0.755
 underscore+0.682
 Kling+0.679
 twins+0.648
��+0.644
Neg logit contributions
hra-0.529
anas-0.520
camp-0.518
veh-0.516
 Niagara-0.511
Token The
Token ID383
Feature activation-1.16e-01

Pos logit contributions
uld+0.260
 underscore+0.233
 Kling+0.232
 twins+0.221
 tant+0.218
Neg logit contributions
hra-0.195
anas-0.191
camp-0.191
veh-0.191
 Niagara-0.189
Tokenat
Token ID265
Feature activation+7.29e-02

Pos logit contributions
hra+0.064
anas+0.063
camp+0.063
veh+0.063
 Niagara+0.062
Neg logit contributions
uld-0.098
 underscore-0.089
 Kling-0.088
 twins-0.084
 congr-0.083
Tokenres
Token ID411
Feature activation-1.68e-01

Pos logit contributions
uld+0.073
 underscore+0.067
 Kling+0.067
 twins+0.065
 tant+0.064
Neg logit contributions
hra-0.029
camp-0.028
anas-0.028
veh-0.028
 Niagara-0.028
Token Inc
Token ID3457
Feature activation+2.66e-01

Pos logit contributions
hra+0.107
camp+0.106
anas+0.106
veh+0.105
 Niagara+0.104
Neg logit contributions
uld-0.129
 underscore-0.115
 Kling-0.114
 twins-0.108
 tant-0.107
Token
Token ID447
Feature activation-1.06e-03

Pos logit contributions
camp+0.032
hra+0.032
veh+0.032
anas+0.032
MIT+0.031
Neg logit contributions
uld-0.038
 underscore-0.035
 Kling-0.034
��-0.033
��-0.033
Token
Token ID247
Feature activation+4.06e-02

Pos logit contributions
hra+0.001
camp+0.001
anas+0.001
MIT+0.001
 Niagara+0.001
Neg logit contributions
uld-0.000
 Kling-0.000
 underscore-0.000
 twins-0.000
 congr-0.000
Tokent
Token ID83
Feature activation-2.74e-01

Pos logit contributions
uld+0.024
 underscore+0.021
 Kling+0.021
 twins+0.020
��+0.019
Neg logit contributions
hra-0.032
camp-0.032
anas-0.032
veh-0.032
MIT-0.032
Token nearly
Token ID3016
Feature activation+6.01e-01

Pos logit contributions
camp+0.187
hra+0.187
anas+0.187
veh+0.185
 Nasa+0.184
Neg logit contributions
uld-0.196
 underscore-0.173
 Kling-0.168
��-0.163
��-0.160
Token as
Token ID355
Feature activation-2.49e-04

Pos logit contributions
uld+0.440
 underscore+0.387
 Kling+0.380
��+0.364
��+0.362
Neg logit contributions
hra-0.405
anas-0.402
camp-0.400
 Niagara-0.399
 Nasa-0.399
Token entertained
Token ID34988
Feature activation+7.29e-01

Pos logit contributions
hra+0.000
camp+0.000
anas+0.000
veh+0.000
 Niagara+0.000
Neg logit contributions
uld-0.000
 underscore-0.000
 Kling-0.000
��-0.000
��-0.000
Token by
Token ID416
Feature activation-1.79e-01

Pos logit contributions
uld+0.522
 underscore+0.460
 Kling+0.458
 twins+0.433
��+0.428
Neg logit contributions
hra-0.502
camp-0.496
anas-0.495
veh-0.493
 Niagara-0.488
Token it
Token ID340
Feature activation-2.24e-01

Pos logit contributions
hra+0.126
anas+0.126
 Niagara+0.125
camp+0.125
 Nasa+0.124
Neg logit contributions
uld-0.125
 underscore-0.109
 Kling-0.107
��-0.102
��-0.102
Token
Token ID851
Feature activation+1.07e-01

Pos logit contributions
hra+0.152
camp+0.150
anas+0.150
veh+0.149
 Niagara+0.148
Neg logit contributions
uld-0.163
 underscore-0.144
 Kling-0.144
 twins-0.136
��-0.134
Token in
Token ID287
Feature activation-5.22e-01

Pos logit contributions
uld+0.078
 underscore+0.068
 Kling+0.068
��+0.065
��+0.064
Neg logit contributions
hra-0.073
camp-0.072
anas-0.072
veh-0.071
 Niagara-0.071
Token fact
Token ID1109
Feature activation-7.00e-01

Pos logit contributions
hra+0.354
camp+0.353
anas+0.353
veh+0.349
 Niagara+0.349
Neg logit contributions
uld-0.377
 underscore-0.329
 Kling-0.327
��-0.313
 twins-0.309
Token bitcoins
Token ID22690
Feature activation-3.48e-01

Pos logit contributions
uld+0.223
 underscore+0.199
 Kling+0.199
 twins+0.188
��+0.188
Neg logit contributions
hra-0.174
anas-0.171
camp-0.171
veh-0.171
 Niagara-0.170
Token at
Token ID379
Feature activation+2.02e-02

Pos logit contributions
hra+0.248
anas+0.247
camp+0.247
veh+0.245
 Niagara+0.244
Neg logit contributions
uld-0.238
 Kling-0.210
 underscore-0.209
��-0.199
��-0.197
Token last
Token ID938
Feature activation-3.14e-02

Pos logit contributions
uld+0.014
 underscore+0.012
 Kling+0.012
 twins+0.012
��+0.012
Neg logit contributions
hra-0.014
camp-0.014
anas-0.014
 Niagara-0.014
veh-0.014
Token week
Token ID1285
Feature activation-1.64e-01

Pos logit contributions
hra+0.021
anas+0.021
camp+0.021
veh+0.021
 Niagara+0.021
Neg logit contributions
uld-0.023
 Kling-0.020
 underscore-0.020
 twins-0.019
��-0.019
Token's
Token ID338
Feature activation-1.48e-01

Pos logit contributions
hra+0.111
veh+0.109
anas+0.109
camp+0.109
MIT+0.108
Neg logit contributions
uld-0.120
 underscore-0.107
 Kling-0.105
 twins-0.100
 tant-0.099
Token price
Token ID2756
Feature activation+9.58e-01

Pos logit contributions
hra+0.100
camp+0.099
anas+0.099
 Niagara+0.098
veh+0.098
Neg logit contributions
uld-0.107
 underscore-0.094
 Kling-0.093
��-0.089
 twins-0.088
Token of
Token ID286
Feature activation+1.63e-01

Pos logit contributions
uld+0.652
 Kling+0.569
 underscore+0.569
 twins+0.535
��+0.529
Neg logit contributions
hra-0.693
camp-0.685
anas-0.684
veh-0.680
 Niagara-0.677
Token $
Token ID720
Feature activation+8.34e-02

Pos logit contributions
uld+0.139
 Kling+0.124
 underscore+0.124
 twins+0.118
��+0.118
Neg logit contributions
hra-0.090
camp-0.089
anas-0.088
veh-0.088
 Niagara-0.088
Token140
Token ID15187
Feature activation+3.57e-01

Pos logit contributions
uld+0.067
 underscore+0.061
 Kling+0.060
 twins+0.058
 tant+0.057
Neg logit contributions
hra-0.050
camp-0.049
anas-0.049
veh-0.049
 Niagara-0.048
Token is
Token ID318
Feature activation-3.77e-01

Pos logit contributions
uld+0.249
 Kling+0.220
 underscore+0.220
 twins+0.207
��+0.205
Neg logit contributions
hra-0.250
camp-0.247
anas-0.247
veh-0.246
 Niagara-0.244
Token still
Token ID991
Feature activation-9.68e-02

Pos logit contributions
hra+0.263
 Niagara+0.261
camp+0.261
anas+0.260
 Nasa+0.259
Neg logit contributions
uld-0.264
 underscore-0.229
 Kling-0.228
��-0.219
��-0.216
Token Italy
Token ID8031
Feature activation+3.58e-01

Pos logit contributions
uld+0.245
 underscore+0.220
 Kling+0.218
 twins+0.208
��+0.206
Neg logit contributions
hra-0.192
camp-0.190
anas-0.189
 Niagara-0.188
veh-0.188
Token,
Token ID11
Feature activation+7.12e-01

Pos logit contributions
uld+0.290
 underscore+0.260
 Kling+0.259
 twins+0.247
 tant+0.243
Neg logit contributions
hra-0.213
camp-0.209
anas-0.208
veh-0.208
 Niagara-0.206
Token but
Token ID475
Feature activation+1.07e-01

Pos logit contributions
uld+0.502
 underscore+0.441
 Kling+0.438
��+0.418
 twins+0.414
Neg logit contributions
hra-0.498
anas-0.489
camp-0.489
veh-0.488
 Niagara-0.488
Token since
Token ID1201
Feature activation-2.73e-01

Pos logit contributions
uld+0.075
 underscore+0.066
 Kling+0.065
��+0.062
 twins+0.062
Neg logit contributions
hra-0.075
anas-0.074
camp-0.074
veh-0.074
 Niagara-0.074
Token the
Token ID262
Feature activation+4.40e-02

Pos logit contributions
hra+0.187
camp+0.185
anas+0.185
 Niagara+0.184
veh+0.183
Neg logit contributions
uld-0.194
 underscore-0.172
 Kling-0.169
��-0.162
��-0.161
Token Allies
Token ID32430
Feature activation+8.48e-01

Pos logit contributions
uld+0.031
 underscore+0.028
 Kling+0.027
��+0.026
 twins+0.026
Neg logit contributions
hra-0.030
camp-0.030
anas-0.030
veh-0.030
 Niagara-0.030
Token had
Token ID550
Feature activation-2.55e-01

Pos logit contributions
uld+0.537
 underscore+0.466
 Kling+0.465
 twins+0.435
 congr+0.427
Neg logit contributions
hra-0.654
camp-0.645
anas-0.644
veh-0.643
 Niagara-0.638
Token never
Token ID1239
Feature activation-1.69e-02

Pos logit contributions
hra+0.176
camp+0.173
anas+0.173
veh+0.173
 Niagara+0.172
Neg logit contributions
uld-0.182
 underscore-0.160
 Kling-0.159
��-0.151
 twins-0.150
Token recognised
Token ID20915
Feature activation-2.31e-01

Pos logit contributions
hra+0.011
anas+0.011
camp+0.011
veh+0.011
 Niagara+0.011
Neg logit contributions
uld-0.012
 underscore-0.011
 Kling-0.011
��-0.010
 twins-0.010
Token the
Token ID262
Feature activation+9.54e-02

Pos logit contributions
hra+0.163
anas+0.162
camp+0.162
 Niagara+0.162
veh+0.160
Neg logit contributions
uld-0.160
 underscore-0.141
 Kling-0.138
��-0.132
 twins-0.131
Token R
Token ID371
Feature activation+6.57e-02

Pos logit contributions
uld+0.067
 underscore+0.058
 Kling+0.058
��+0.055
 twins+0.054
Neg logit contributions
hra-0.067
anas-0.066
camp-0.066
 Niagara-0.066
veh-0.066
Token a
Token ID257
Feature activation-7.50e-02

Pos logit contributions
uld+0.044
 underscore+0.038
 Kling+0.038
��+0.036
 twins+0.036
Neg logit contributions
hra-0.045
anas-0.044
camp-0.044
veh-0.044
 Niagara-0.044
Token problem
Token ID1917
Feature activation+3.25e-01

Pos logit contributions
hra+0.053
camp+0.052
veh+0.052
anas+0.052
 Niagara+0.052
Neg logit contributions
uld-0.052
 underscore-0.046
 Kling-0.046
 twins-0.043
��-0.043
Token worse
Token ID4785
Feature activation-3.35e-01

Pos logit contributions
uld+0.217
 underscore+0.190
 Kling+0.189
 twins+0.178
 congr+0.174
Neg logit contributions
hra-0.240
camp-0.237
anas-0.236
veh-0.236
 Niagara-0.234
Token.
Token ID13
Feature activation-7.74e-02

Pos logit contributions
hra+0.206
camp+0.203
anas+0.202
veh+0.202
 Niagara+0.200
Neg logit contributions
uld-0.264
 Kling-0.235
 underscore-0.235
 twins-0.224
��-0.221
Token
Token ID198
Feature activation-4.96e-02

Pos logit contributions
hra+0.062
anas+0.061
camp+0.061
veh+0.061
 Niagara+0.060
Neg logit contributions
uld-0.047
 Kling-0.041
 underscore-0.041
 twins-0.038
 tant-0.037
Token
Token ID198
Feature activation-1.75e-01

Pos logit contributions
hra+0.037
camp+0.037
veh+0.037
anas+0.037
 Niagara+0.037
Neg logit contributions
uld-0.032
 underscore-0.029
 Kling-0.028
��-0.028
��-0.028
TokenSome
Token ID4366
Feature activation+2.06e-01

Pos logit contributions
hra+0.126
camp+0.125
veh+0.125
anas+0.124
MIT+0.124
Neg logit contributions
uld-0.119
 underscore-0.105
 Kling-0.104
 twins-0.099
��-0.097
Token examples
Token ID6096
Feature activation-6.51e-01

Pos logit contributions
uld+0.147
 underscore+0.128
 Kling+0.128
 twins+0.121
��+0.120
Neg logit contributions
hra-0.143
camp-0.141
anas-0.140
veh-0.140
 Niagara-0.140
Token:
Token ID25
Feature activation-9.96e-02

Pos logit contributions
hra+0.443
camp+0.438
anas+0.434
 Niagara+0.432
veh+0.432
Neg logit contributions
uld-0.471
 Kling-0.412
 underscore-0.407
 twins-0.394
��-0.389
Token The
Token ID383
Feature activation-1.19e-01

Pos logit contributions
hra+0.075
camp+0.073
anas+0.073
veh+0.073
 Niagara+0.073
Neg logit contributions
uld-0.065
 underscore-0.057
 Kling-0.056
 twins-0.053
��-0.053
Token bombshell
Token ID48714
Feature activation+5.85e-01

Pos logit contributions
hra+0.080
camp+0.079
anas+0.079
 Niagara+0.079
veh+0.079
Neg logit contributions
uld-0.088
 underscore-0.077
 Kling-0.077
��-0.072
 twins-0.072
Token after
Token ID706
Feature activation+3.96e-02

Pos logit contributions
camp+0.134
hra+0.134
anas+0.132
 Niagara+0.132
veh+0.132
Neg logit contributions
uld-0.125
 underscore-0.109
 Kling-0.109
��-0.102
��-0.102
Token dropping
Token ID12047
Feature activation+6.33e-02

Pos logit contributions
uld+0.027
 underscore+0.023
 Kling+0.023
��+0.022
 twins+0.022
Neg logit contributions
hra-0.029
camp-0.028
anas-0.028
veh-0.028
 Niagara-0.028
Token out
Token ID503
Feature activation+5.77e-01

Pos logit contributions
uld+0.036
 underscore+0.030
 Kling+0.030
 twins+0.027
 congr+0.027
Neg logit contributions
hra-0.053
camp-0.053
anas-0.053
 Niagara-0.052
veh-0.052
Token and
Token ID290
Feature activation+2.17e-02

Pos logit contributions
uld+0.415
 underscore+0.367
 Kling+0.366
 twins+0.347
 tant+0.341
Neg logit contributions
hra-0.395
anas-0.389
camp-0.388
veh-0.388
 Niagara-0.384
Token becoming
Token ID5033
Feature activation-8.00e-01

Pos logit contributions
uld+0.016
 underscore+0.014
 Kling+0.014
 twins+0.014
��+0.014
Neg logit contributions
hra-0.014
camp-0.014
anas-0.014
veh-0.014
 Niagara-0.014
Token a
Token ID257
Feature activation-4.31e-01

Pos logit contributions
hra+0.576
camp+0.573
anas+0.568
 Niagara+0.568
veh+0.566
Neg logit contributions
uld-0.546
 underscore-0.474
 Kling-0.471
��-0.443
 twins-0.441
Token drug
Token ID2563
Feature activation+3.79e-01

Pos logit contributions
hra+0.280
camp+0.277
veh+0.276
anas+0.276
 Niagara+0.274
Neg logit contributions
uld-0.324
 underscore-0.288
 Kling-0.286
 twins-0.271
��-0.269
Token addict
Token ID19678
Feature activation-4.95e-01

Pos logit contributions
uld+0.293
 Kling+0.261
 underscore+0.258
��+0.245
 twins+0.244
Neg logit contributions
hra-0.239
camp-0.238
anas-0.237
veh-0.237
 Nasa-0.233
Token.
Token ID13
Feature activation-4.84e-01

Pos logit contributions
hra+0.305
camp+0.303
veh+0.300
anas+0.300
 Niagara+0.298
Neg logit contributions
uld-0.389
 Kling-0.346
 underscore-0.345
��-0.326
 twins-0.325
Token The
Token ID383
Feature activation-4.61e-01

Pos logit contributions
camp+0.367
 Niagara+0.367
hra+0.366
 Nasa+0.364
veh+0.363
Neg logit contributions
uld-0.311
 underscore-0.269
 Kling-0.260
��-0.258
��-0.254
Token common
Token ID2219
Feature activation+2.91e-01

Pos logit contributions
hra+0.304
anas+0.303
camp+0.303
 Niagara+0.302
veh+0.299
Neg logit contributions
uld-0.340
 underscore-0.296
 Kling-0.294
��-0.285
��-0.283
Token,
Token ID11
Feature activation+1.39e-01

Pos logit contributions
uld+0.170
 underscore+0.152
 Kling+0.151
 twins+0.144
 tant+0.142
Neg logit contributions
hra-0.139
anas-0.136
camp-0.136
veh-0.136
 Niagara-0.134
Token000
Token ID830
Feature activation+5.60e-02

Pos logit contributions
uld+0.107
 underscore+0.095
 Kling+0.095
 twins+0.090
 tant+0.089
Neg logit contributions
hra-0.088
camp-0.087
anas-0.087
veh-0.087
 Niagara-0.086
Token dues
Token ID40789
Feature activation-6.19e-01

Pos logit contributions
uld+0.031
 underscore+0.027
 Kling+0.027
 twins+0.025
��+0.024
Neg logit contributions
hra-0.047
camp-0.047
anas-0.047
veh-0.046
 Niagara-0.046
Token pay
Token ID1414
Feature activation-5.05e-01

Pos logit contributions
hra+0.452
anas+0.447
camp+0.446
veh+0.444
 Niagara+0.443
Neg logit contributions
uld-0.418
 underscore-0.365
 Kling-0.363
��-0.341
 twins-0.338
Tokeners
Token ID364
Feature activation+2.56e-01

Pos logit contributions
hra+0.319
anas+0.318
veh+0.315
camp+0.315
 Niagara+0.312
Neg logit contributions
uld-0.387
 underscore-0.346
 Kling-0.344
��-0.327
 twins-0.325
Token in
Token ID287
Feature activation+8.48e-02

Pos logit contributions
uld+0.170
 underscore+0.148
 Kling+0.148
 twins+0.138
��+0.137
Neg logit contributions
hra-0.190
camp-0.187
anas-0.187
veh-0.186
 Niagara-0.186
Token 1928
Token ID35768
Feature activation-6.44e-02

Pos logit contributions
uld+0.072
 underscore+0.066
 Kling+0.065
��+0.063
 twins+0.062
Neg logit contributions
hra-0.046
 Niagara-0.046
anas-0.045
camp-0.045
veh-0.045
Token to
Token ID284
Feature activation+1.07e-02

Pos logit contributions
hra+0.051
camp+0.051
anas+0.051
veh+0.051
 Niagara+0.050
Neg logit contributions
uld-0.039
 underscore-0.034
 Kling-0.034
 twins-0.031
 tant-0.031
Token a
Token ID257
Feature activation+1.03e-01

Pos logit contributions
uld+0.007
 underscore+0.006
 Kling+0.006
��+0.006
��+0.006
Neg logit contributions
hra-0.008
 Niagara-0.008
anas-0.008
camp-0.008
 Nasa-0.008
Token membership
Token ID9931
Feature activation+3.81e-01

Pos logit contributions
uld+0.074
 underscore+0.065
 Kling+0.065
 twins+0.061
��+0.061
Neg logit contributions
hra-0.071
camp-0.070
anas-0.069
 Niagara-0.069
veh-0.069
Token of
Token ID286
Feature activation-2.11e-01

Pos logit contributions
uld+0.280
 underscore+0.247
 Kling+0.247
 twins+0.233
��+0.231
Neg logit contributions
hra-0.255
camp-0.251
anas-0.251
veh-0.250
 Niagara-0.249
Token\n
Token ID198
Feature activation+1.81e-01

Pos logit contributions
hra+0.205
camp+0.203
anas+0.203
veh+0.202
 Niagara+0.201
Neg logit contributions
uld-0.189
 Kling-0.165
 underscore-0.164
 twins-0.154
��-0.152
Token\n
Token ID198
Feature activation+7.67e-02

Pos logit contributions
uld+0.116
 underscore+0.102
 Kling+0.101
 twins+0.095
��+0.095
Neg logit contributions
hra-0.138
camp-0.137
veh-0.137
MIT-0.136
anas-0.136
TokenOne
Token ID3198
Feature activation+4.69e-01

Pos logit contributions
uld+0.049
 underscore+0.043
 Kling+0.043
 twins+0.040
 tant+0.039
Neg logit contributions
hra-0.059
camp-0.058
anas-0.058
veh-0.058
MIT-0.057
Token feature
Token ID3895
Feature activation+9.42e-02

Pos logit contributions
uld+0.329
 Kling+0.289
 underscore+0.287
 twins+0.273
��+0.269
Neg logit contributions
hra-0.329
camp-0.325
veh-0.324
anas-0.324
 Niagara-0.322
Token of
Token ID286
Feature activation+4.83e-02

Pos logit contributions
uld+0.068
 Kling+0.060
 underscore+0.060
 twins+0.057
��+0.056
Neg logit contributions
hra-0.064
anas-0.063
camp-0.063
veh-0.063
 Niagara-0.062
Token flat
Token ID6228
Feature activation+1.50e-01

Pos logit contributions
uld+0.033
 Kling+0.028
 underscore+0.028
 twins+0.027
��+0.026
Neg logit contributions
hra-0.035
anas-0.035
camp-0.035
veh-0.035
 Niagara-0.034
Tokenbuff
Token ID36873
Feature activation-1.65e-01

Pos logit contributions
uld+0.112
 underscore+0.099
 Kling+0.099
 twins+0.094
 tant+0.092
Neg logit contributions
hra-0.099
camp-0.097
anas-0.097
veh-0.097
 Niagara-0.096
Tokeners
Token ID364
Feature activation+4.28e-01

Pos logit contributions
hra+0.118
camp+0.117
anas+0.117
veh+0.116
 Niagara+0.116
Neg logit contributions
uld-0.113
 Kling-0.100
 underscore-0.099
 twins-0.093
��-0.092
Token is
Token ID318
Feature activation-3.40e-01

Pos logit contributions
uld+0.308
 Kling+0.272
 underscore+0.271
 twins+0.256
��+0.253
Neg logit contributions
hra-0.292
camp-0.288
anas-0.288
veh-0.286
 Niagara-0.285
Token that
Token ID326
Feature activation+5.89e-02

Pos logit contributions
hra+0.238
camp+0.234
anas+0.233
veh+0.233
 Niagara+0.231
Neg logit contributions
uld-0.240
 Kling-0.211
 underscore-0.210
 twins-0.199
��-0.196
Token it
Token ID340
Feature activation+5.20e-02

Pos logit contributions
uld+0.043
 underscore+0.038
 Kling+0.038
 twins+0.036
��+0.036
Neg logit contributions
hra-0.039
anas-0.039
camp-0.039
veh-0.039
 Niagara-0.038
Token mar
Token ID1667
Feature activation+3.82e-01

Pos logit contributions
uld+0.459
 underscore+0.404
 Kling+0.402
 twins+0.379
 congr+0.373
Neg logit contributions
hra-0.473
camp-0.466
anas-0.466
veh-0.464
 Niagara-0.461
Tokenchers
Token ID3533
Feature activation+6.17e-01

Pos logit contributions
uld+0.282
 underscore+0.249
 Kling+0.249
 twins+0.235
 tant+0.232
Neg logit contributions
hra-0.255
camp-0.250
anas-0.250
veh-0.249
 Niagara-0.247
Token dead
Token ID2636
Feature activation+2.36e-01

Pos logit contributions
uld+0.456
 underscore+0.404
 Kling+0.403
 twins+0.381
 congr+0.376
Neg logit contributions
hra-0.411
camp-0.405
anas-0.404
veh-0.403
 Niagara-0.400
Token and
Token ID290
Feature activation-5.69e-02

Pos logit contributions
uld+0.177
 underscore+0.157
 Kling+0.157
 twins+0.149
 tant+0.146
Neg logit contributions
hra-0.155
camp-0.152
anas-0.152
veh-0.152
 Niagara-0.151
Token thousands
Token ID4138
Feature activation+8.36e-01

Pos logit contributions
hra+0.041
anas+0.041
veh+0.041
camp+0.041
 Niagara+0.040
Neg logit contributions
uld-0.039
 Kling-0.034
 underscore-0.034
 twins-0.032
��-0.031
Token in
Token ID287
Feature activation-2.33e-01

Pos logit contributions
uld+0.656
 underscore+0.588
 Kling+0.586
 twins+0.558
 tant+0.550
Neg logit contributions
hra-0.517
anas-0.508
camp-0.506
veh-0.505
 Niagara-0.500
Token jail
Token ID7356
Feature activation+4.62e-01

Pos logit contributions
hra+0.149
camp+0.146
anas+0.146
veh+0.146
 Niagara+0.144
Neg logit contributions
uld-0.179
 underscore-0.159
 Kling-0.159
 twins-0.151
 tant-0.149
Token.
Token ID13
Feature activation-1.93e-02

Pos logit contributions
uld+0.358
 underscore+0.320
 Kling+0.319
 twins+0.303
 congr+0.298
Neg logit contributions
hra-0.290
camp-0.285
anas-0.285
veh-0.284
 Niagara-0.281
Token But
Token ID887
Feature activation+1.33e-01

Pos logit contributions
hra+0.015
camp+0.014
anas+0.014
veh+0.014
 Niagara+0.014
Neg logit contributions
uld-0.013
 underscore-0.011
 Kling-0.011
 twins-0.010
��-0.010
Token the
Token ID262
Feature activation-2.43e-01

Pos logit contributions
uld+0.091
 underscore+0.079
 Kling+0.079
 twins+0.075
��+0.075
Neg logit contributions
hra-0.096
anas-0.095
camp-0.094
veh-0.094
 Niagara-0.094
Token authorities
Token ID4773
Feature activation-3.93e-01

Pos logit contributions
hra+0.171
anas+0.169
camp+0.168
veh+0.168
 Niagara+0.167
Neg logit contributions
uld-0.170
 underscore-0.149
 Kling-0.148
��-0.140
 twins-0.139
Token are
Token ID389
Feature activation+6.27e-01

Pos logit contributions
hra+0.048
anas+0.047
camp+0.047
veh+0.046
 Niagara+0.046
Neg logit contributions
uld-0.116
 underscore-0.107
 Kling-0.107
 twins-0.103
 tant-0.101
Token streets
Token ID6483
Feature activation+4.18e-01

Pos logit contributions
uld+0.439
 underscore+0.387
 Kling+0.385
 twins+0.363
 congr+0.358
Neg logit contributions
hra-0.441
camp-0.435
anas-0.434
veh-0.433
 Niagara-0.430
Token and
Token ID290
Feature activation+2.06e-01

Pos logit contributions
uld+0.294
 underscore+0.259
 Kling+0.258
 twins+0.244
 congr+0.240
Neg logit contributions
hra-0.293
camp-0.289
anas-0.288
veh-0.287
 Niagara-0.285
Token al
Token ID435
Feature activation+1.98e-01

Pos logit contributions
uld+0.163
 underscore+0.145
 Kling+0.144
 twins+0.137
 congr+0.136
Neg logit contributions
hra-0.127
camp-0.125
anas-0.124
veh-0.124
 Niagara-0.123
Tokenleys
Token ID21325
Feature activation+6.11e-01

Pos logit contributions
uld+0.098
 underscore+0.082
 Kling+0.082
 twins+0.075
 congr+0.073
Neg logit contributions
hra-0.180
camp-0.179
anas-0.179
veh-0.178
 Niagara-0.176
Token dotted
Token ID38745
Feature activation-8.60e-01

Pos logit contributions
uld+0.437
 underscore+0.387
 Kling+0.385
 twins+0.364
 tant+0.358
Neg logit contributions
hra-0.421
camp-0.414
anas-0.414
veh-0.413
 Niagara-0.409
Token with
Token ID351
Feature activation-3.02e-01

Pos logit contributions
hra+0.657
anas+0.647
camp+0.646
veh+0.644
MIT+0.641
Neg logit contributions
uld-0.548
 underscore-0.484
 Kling-0.477
 twins-0.452
 tant-0.443
Token still
Token ID991
Feature activation-1.47e-01

Pos logit contributions
hra+0.194
camp+0.192
anas+0.192
veh+0.191
 Niagara+0.190
Neg logit contributions
uld-0.230
 underscore-0.203
 Kling-0.203
 twins-0.192
 congr-0.190
Token more
Token ID517
Feature activation-5.47e-01

Pos logit contributions
hra+0.096
camp+0.094
anas+0.094
veh+0.094
 Niagara+0.093
Neg logit contributions
uld-0.111
 underscore-0.099
 Kling-0.099
 twins-0.093
 congr-0.092
Token tents
Token ID29804
Feature activation-7.15e-01

Pos logit contributions
hra+0.327
camp+0.322
anas+0.321
veh+0.321
 Niagara+0.317
Neg logit contributions
uld-0.442
 underscore-0.395
 Kling-0.394
 twins-0.375
 congr-0.370
Token or
Token ID393
Feature activation-2.97e-01

Pos logit contributions
hra+0.456
camp+0.450
anas+0.449
veh+0.447
 Niagara+0.443
Neg logit contributions
uld-0.548
 underscore-0.487
 Kling-0.486
 twins-0.461
 congr-0.454
Token privacy
Token ID6782
Feature activation-3.15e-01

Pos logit contributions
uld+0.063
 underscore+0.056
 Kling+0.055
��+0.052
 twins+0.052
Neg logit contributions
hra-0.063
camp-0.063
anas-0.062
veh-0.062
 Niagara-0.062
Token issues
Token ID2428
Feature activation+3.00e-01

Pos logit contributions
hra+0.272
camp+0.269
veh+0.269
anas+0.268
 Niagara+0.267
Neg logit contributions
uld-0.171
 underscore-0.143
 Kling-0.143
��-0.133
 twins-0.130
Token.
Token ID13
Feature activation+2.46e-01

Pos logit contributions
uld+0.231
 Kling+0.205
 underscore+0.205
 twins+0.194
��+0.192
Neg logit contributions
hra-0.191
camp-0.188
anas-0.187
veh-0.187
 Niagara-0.185
Token As
Token ID1081
Feature activation+5.18e-01

Pos logit contributions
uld+0.163
 underscore+0.142
 Kling+0.140
��+0.135
 twins+0.133
Neg logit contributions
hra-0.182
camp-0.181
anas-0.180
 Niagara-0.180
veh-0.180
Token my
Token ID616
Feature activation-1.57e-01

Pos logit contributions
uld+0.370
 underscore+0.325
 Kling+0.322
 twins+0.306
��+0.305
Neg logit contributions
hra-0.357
anas-0.354
camp-0.353
 Niagara-0.351
veh-0.351
Token colleague
Token ID16008
Feature activation-9.27e-01

Pos logit contributions
hra+0.104
camp+0.102
anas+0.102
veh+0.101
 Niagara+0.101
Neg logit contributions
uld-0.118
 underscore-0.104
 Kling-0.104
 twins-0.098
��-0.098
Token Nicole
Token ID25176
Feature activation-9.20e-02

Pos logit contributions
hra+0.600
camp+0.593
anas+0.592
veh+0.589
 Niagara+0.587
Neg logit contributions
uld-0.700
 underscore-0.618
 Kling-0.615
 twins-0.583
��-0.581
Token O
Token ID440
Feature activation+1.38e-01

Pos logit contributions
hra+0.063
camp+0.061
anas+0.061
veh+0.061
 Niagara+0.061
Neg logit contributions
uld-0.067
 underscore-0.059
 Kling-0.059
 twins-0.056
��-0.055
Tokenzer
Token ID9107
Feature activation-1.61e-01

Pos logit contributions
uld+0.115
 underscore+0.104
 Kling+0.104
 twins+0.099
 tant+0.098
Neg logit contributions
hra-0.079
camp-0.077
anas-0.077
veh-0.077
 Niagara-0.076
Token has
Token ID468
Feature activation-4.21e-01

Pos logit contributions
hra+0.100
camp+0.099
anas+0.098
veh+0.098
 Niagara+0.097
Neg logit contributions
uld-0.125
 underscore-0.112
 Kling-0.112
 twins-0.106
 tant-0.105
Token discussed
Token ID6693
Feature activation-2.84e-02

Pos logit contributions
hra+0.256
camp+0.253
anas+0.253
 Niagara+0.252
veh+0.252
Neg logit contributions
uld-0.335
 Kling-0.294
 underscore-0.293
��-0.282
 twins-0.280
Token more
Token ID517
Feature activation+5.45e-02

Pos logit contributions
hra+0.086
veh+0.085
anas+0.085
camp+0.085
 Niagara+0.085
Neg logit contributions
uld-0.081
 underscore-0.071
 Kling-0.070
��-0.067
 twins-0.066
Token data
Token ID1366
Feature activation+5.63e-03

Pos logit contributions
uld+0.039
 underscore+0.035
 Kling+0.034
��+0.033
 twins+0.033
Neg logit contributions
hra-0.037
veh-0.037
camp-0.037
anas-0.037
 Niagara-0.036
Token
Token ID784
Feature activation-5.08e-01

Pos logit contributions
uld+0.004
 Kling+0.004
 underscore+0.004
��+0.003
 twins+0.003
Neg logit contributions
hra-0.004
veh-0.004
camp-0.004
anas-0.004
 Niagara-0.004
Token delivery
Token ID7585
Feature activation+4.77e-02

Pos logit contributions
hra+0.354
veh+0.349
camp+0.348
anas+0.346
 Niagara+0.343
Neg logit contributions
uld-0.359
 underscore-0.315
 Kling-0.313
��-0.303
 twins-0.297
Token rate
Token ID2494
Feature activation-3.92e-01

Pos logit contributions
uld+0.037
 underscore+0.033
 Kling+0.033
 twins+0.032
 tant+0.031
Neg logit contributions
hra-0.030
anas-0.029
camp-0.029
veh-0.029
 Niagara-0.029
Token is
Token ID318
Feature activation-8.80e-01

Pos logit contributions
hra+0.233
anas+0.228
camp+0.228
veh+0.227
 Niagara+0.226
Neg logit contributions
uld-0.317
 underscore-0.284
 Kling-0.283
 twins-0.270
��-0.269
Token limited
Token ID3614
Feature activation-3.01e-01

Pos logit contributions
hra+0.612
camp+0.601
 Niagara+0.598
anas+0.598
veh+0.598
Neg logit contributions
uld-0.626
 underscore-0.546
 Kling-0.542
��-0.519
 congr-0.511
Token by
Token ID416
Feature activation+1.33e-01

Pos logit contributions
hra+0.208
camp+0.205
anas+0.204
veh+0.204
 Niagara+0.203
Neg logit contributions
uld-0.214
 underscore-0.189
 Kling-0.188
��-0.177
 twins-0.177
Token the
Token ID262
Feature activation+2.02e-01

Pos logit contributions
uld+0.092
 underscore+0.081
 Kling+0.080
��+0.076
 twins+0.076
Neg logit contributions
hra-0.095
camp-0.093
veh-0.093
anas-0.093
 Niagara-0.092
Token application
Token ID3586
Feature activation+1.33e+00

Pos logit contributions
uld+0.139
 underscore+0.121
 Kling+0.120
��+0.115
 twins+0.114
Neg logit contributions
hra-0.145
veh-0.142
camp-0.142
anas-0.142
 Niagara-0.142
Token.
Token ID13
Feature activation-1.75e-01

Pos logit contributions
uld+0.958
 underscore+0.850
 Kling+0.846
 twins+0.799
 tant+0.788
Neg logit contributions
hra-0.908
anas-0.892
camp-0.890
veh-0.889
 Niagara-0.881
Tokenuries
Token ID4740
Feature activation+2.94e-01

Pos logit contributions
hra+0.019
anas+0.018
camp+0.018
veh+0.018
arkin+0.018
Neg logit contributions
uld-0.020
 underscore-0.018
 Kling-0.018
 twins-0.017
 congr-0.017
Token it
Token ID340
Feature activation-5.57e-02

Pos logit contributions
uld+0.213
 underscore+0.189
 Kling+0.188
 twins+0.178
 congr+0.175
Neg logit contributions
hra-0.199
anas-0.196
camp-0.196
veh-0.195
 Niagara-0.194
Token can
Token ID460
Feature activation-1.65e-01

Pos logit contributions
hra+0.034
anas+0.033
camp+0.033
veh+0.033
 Niagara+0.033
Neg logit contributions
uld-0.045
 underscore-0.040
 Kling-0.040
 twins-0.038
��-0.038
Token no
Token ID645
Feature activation-6.79e-02

Pos logit contributions
hra+0.116
camp+0.115
anas+0.114
veh+0.114
 Niagara+0.113
Neg logit contributions
uld-0.116
 Kling-0.101
 underscore-0.100
��-0.096
 twins-0.095
Token longer
Token ID2392
Feature activation-4.32e-01

Pos logit contributions
hra+0.034
anas+0.034
camp+0.034
veh+0.033
 Niagara+0.033
Neg logit contributions
uld-0.061
 Kling-0.055
 underscore-0.055
��-0.053
 twins-0.053
Token afford
Token ID5368
Feature activation-1.49e+00

Pos logit contributions
hra+0.313
camp+0.310
anas+0.309
 Niagara+0.308
veh+0.307
Neg logit contributions
uld-0.291
 Kling-0.253
 underscore-0.246
��-0.244
��-0.239
Token.
Token ID13
Feature activation+5.44e-02

Pos logit contributions
hra+0.933
camp+0.917
anas+0.912
veh+0.906
 Niagara+0.903
Neg logit contributions
uld-1.162
 underscore-1.029
 Kling-1.029
��-0.985
 twins-0.980
Token Saudi
Token ID7420
Feature activation-6.95e-01

Pos logit contributions
uld+0.034
 underscore+0.029
 Kling+0.029
��+0.027
 twins+0.027
Neg logit contributions
hra-0.043
camp-0.042
 Niagara-0.042
anas-0.042
veh-0.042
Token Arabia
Token ID9671
Feature activation-6.16e-01

Pos logit contributions
hra+0.267
camp+0.259
anas+0.259
veh+0.258
 Niagara+0.255
Neg logit contributions
uld-0.708
 underscore-0.648
 Kling-0.647
��-0.622
 twins-0.622
Token has
Token ID468
Feature activation-2.16e-01

Pos logit contributions
hra+0.414
camp+0.408
anas+0.406
veh+0.405
 Niagara+0.402
Neg logit contributions
uld-0.451
 underscore-0.399
 Kling-0.398
 twins-0.377
��-0.374
Token been
Token ID587
Feature activation-5.45e-01

Pos logit contributions
hra+0.149
anas+0.147
camp+0.146
veh+0.146
 Niagara+0.146
Neg logit contributions
uld-0.154
 underscore-0.135
 Kling-0.134
��-0.127
 twins-0.126
Token of
Token ID286
Feature activation+1.15e-02

Pos logit contributions
uld+0.449
 underscore+0.398
 Kling+0.396
 twins+0.374
 tant+0.370
Neg logit contributions
hra-0.473
anas-0.467
MIT-0.464
camp-0.464
veh-0.463
Token Ethiopian
Token ID47689
Feature activation-8.54e-01

Pos logit contributions
uld+0.008
 underscore+0.007
 Kling+0.007
 twins+0.007
��+0.007
Neg logit contributions
hra-0.008
anas-0.008
camp-0.008
veh-0.008
 Niagara-0.008
Token troops
Token ID6553
Feature activation+3.37e-01

Pos logit contributions
hra+0.560
veh+0.556
anas+0.554
camp+0.553
 Niagara+0.549
Neg logit contributions
uld-0.633
 underscore-0.559
 Kling-0.556
��-0.529
��-0.525
Token towards
Token ID3371
Feature activation+3.09e-01

Pos logit contributions
uld+0.248
 underscore+0.219
 Kling+0.219
 twins+0.207
 congr+0.204
Neg logit contributions
hra-0.225
camp-0.222
anas-0.221
veh-0.221
 Niagara-0.219
Token B
Token ID347
Feature activation-1.80e-01

Pos logit contributions
uld+0.221
 underscore+0.195
 Kling+0.195
 twins+0.184
 congr+0.181
Neg logit contributions
hra-0.213
camp-0.210
anas-0.210
veh-0.209
 Niagara-0.207
Tokenaid
Token ID1698
Feature activation-1.00e+00

Pos logit contributions
hra+0.101
anas+0.099
camp+0.099
arkin+0.098
veh+0.098
Neg logit contributions
uld-0.148
 Kling-0.140
 underscore-0.139
��-0.134
 twins-0.133
Tokenoa
Token ID12162
Feature activation+1.40e-01

Pos logit contributions
hra+0.395
anas+0.390
camp+0.384
veh+0.378
arkin+0.373
Neg logit contributions
uld-1.008
 Kling-0.943
 underscore-0.940
��-0.909
 twins-0.907
Token,
Token ID11
Feature activation-3.61e-02

Pos logit contributions
uld+0.110
 underscore+0.098
 Kling+0.098
 twins+0.093
 congr+0.092
Neg logit contributions
hra-0.087
camp-0.085
anas-0.085
veh-0.085
 Niagara-0.084
Token which
Token ID543
Feature activation-4.31e-02

Pos logit contributions
hra+0.025
camp+0.024
anas+0.024
veh+0.024
 Nasa+0.024
Neg logit contributions
uld-0.026
 underscore-0.023
 Kling-0.023
��-0.022
��-0.022
Token is
Token ID318
Feature activation+6.67e-02

Pos logit contributions
hra+0.030
anas+0.029
camp+0.029
 Nasa+0.029
veh+0.029
Neg logit contributions
uld-0.030
 Kling-0.027
 underscore-0.027
 twins-0.025
��-0.025
Token held
Token ID2714
Feature activation+7.54e-01

Pos logit contributions
uld+0.048
 underscore+0.042
 Kling+0.042
��+0.039
 congr+0.039
Neg logit contributions
hra-0.046
camp-0.045
anas-0.045
veh-0.045
 Nasa-0.045