TOP ACTIVATIONS
MAX = 5.575

F OUR and Ch ado Ralph Ru cci and portraits of
onia , Allen and editor Ralph Rosen bl um re worked
perform pharmac ological and elect roph ys iological procedures . This
But this ability also at roph ies or shr inks over
ra e multiplied in mac roph ages at the site of
started , of skills at roph ied , and of hopes
am , a benz od iazep ine anx io ly tic
Irvine , California . Hide Caption 10 of 15 Photos :
Jimmy was essentially hust ling Ralph . Also ,
Irvine , California . Hide Caption 9 of 15 Photos :
SS RI or benz od iazep ine . Copyright
2 : 21 , the sax started a phrase on B
, and critical essays by Scott Ric her t and David
20 , 2015 . Hide Caption 8 of 15 Photos :
For decades , Scott Free better known as
, while CBS ' anchor Scott Pel ley is leading his
in , Iowa . Hide Caption 5 of 15 Photos :
Pasadena , California . Hide Caption 12 of 15 Photos :
Pasadena , California . Hide Caption 11 of 15 Photos :
, we will use the 2 test for each

MIN ACTIVATIONS
MIN = -4.650

back ' the web A Downing Street petition is calling for
) who were marching on Downing Street , clashed with anti
Mike Bloom (@ AM ike Bloom Type ) and a
the cost of renting is fue lling wage demands from workers
he s not impe ached within the next year
UK may be fue lling xen ophobic sentiment
attack data . Airbus later rede - signed the algorithm and
media can provide platforms to dissemin ate evidence - based injury
graduated in April ; Brian Hale , a senior aerospace engineering
and that he will be impe ached , she
s history . Bow e is the one prisoner
will massively impact on many Northern Ireland companies .
She arer , no Chris Sutton , probably no Kenny Dal
to UCLA on Friday via Ble acher Report video .
au said . Bloom coming early to western Oregon
to gain the release of Bow e Ber gd ahl had
, although she said the pur view of the bank is
possible U . S . downgrade seemed less dire and narrower
lon Sports ,[ 97 ] Ble acher Report ,[ 98 ]
of everything the petty - bour ge ois intellig ents ia

INTERVAL 3.019 - 5.575
CONTAINS 0.057%

: 13 PH OTOS Donald Trump 's potential running
Victor De Nob el and Paul Me le , testify before
," [ Play boy CEO Scott F landers ] said ,
said PS AC president Rob yn Benson .
lot of issues fall into HR s lap .

INTERVAL 0.462 - 3.019
CONTAINS 22.368%

alities Current Anthropology , 43 ( S 4 ) DOI :
Libya and Iraq on the pretext that invasion was the only
lie . The image was taken from a different news story
g osh , I just climbed half of Mount Everest .
what is going on in order to lend support . Sometimes

INTERVAL -2.094 - 0.462
CONTAINS 76.576%

." They 're all about stakes . I 've always been
had smoked crack cocaine . The journalists Rob yn D ool
myself : Where are the feds when police over step the
at guard for the Baltimore Ravens . He started a total
)} 17 . c xd 5 { ( 21 s )}

INTERVAL -4.650 - -2.094
CONTAINS 0.998%

s refusal to denounce Donald Trump s
, procedural maps were already bog ging us in two factors
[ 20 ] So the study of mythology
goes and makes Silence . So I think in a perfect
're in luck . Turn on the monitor and note the

TOKEN DETAILS

TokenF
Token ID37
Feature activation+6.08e-01

Pos logit contributions
��+0.553
ivism+0.484
idem+0.473
icles+0.441
��+0.433
Neg logit contributions
 Spur-0.463
 misunderstand-0.450
fork-0.436
 twe-0.424
 deem-0.414
TokenOUR
Token ID11698
Feature activation+2.33e-01

Pos logit contributions
��+1.099
ivism+0.978
idem+0.957
icles+0.897
��+0.874
Neg logit contributions
 Spur-0.780
 misunderstand-0.754
fork-0.733
 twe-0.709
 lig-0.693
Token and
Token ID290
Feature activation-3.14e-01

Pos logit contributions
��+0.402
ivism+0.355
idem+0.346
icles+0.323
��+0.316
Neg logit contributions
 Spur-0.319
 misunderstand-0.309
fork-0.303
 twe-0.293
 lig-0.286
Token Ch
Token ID609
Feature activation+5.51e-01

Pos logit contributions
 Spur+0.432
 misunderstand+0.413
fork+0.400
 twe+0.394
 deem+0.383
Neg logit contributions
��-0.545
ivism-0.483
idem-0.474
icles-0.439
��-0.433
Tokenado
Token ID4533
Feature activation-2.60e-01

Pos logit contributions
��+0.924
ivism+0.796
idem+0.776
��+0.725
icles+0.722
Neg logit contributions
 Spur-0.791
 misunderstand-0.757
fork-0.748
 twe-0.720
 lig-0.700
Token Ralph
Token ID20993
Feature activation+5.57e+00

Pos logit contributions
 Spur+0.363
 misunderstand+0.345
fork+0.333
 twe+0.329
 lig+0.323
Neg logit contributions
��-0.455
ivism-0.391
idem-0.384
icles-0.358
��-0.353
Token Ru
Token ID11667
Feature activation+8.34e-01

Pos logit contributions
��+5.469
ivism+4.453
idem+4.269
icles+3.780
ão+3.555
Neg logit contributions
 Spur-11.303
 misunderstand-11.069
fork-10.986
 twe-10.687
 Learns-10.581
Tokencci
Token ID35764
Feature activation+1.35e+00

Pos logit contributions
��+1.480
ivism+1.288
idem+1.258
icles+1.202
��+1.172
Neg logit contributions
 Spur-1.115
 misunderstand-1.063
fork-1.051
 twe-1.008
 lig-0.987
Token and
Token ID290
Feature activation-3.82e-01

Pos logit contributions
��+2.222
ivism+1.949
idem+1.905
icles+1.783
��+1.720
Neg logit contributions
 Spur-1.966
 misunderstand-1.903
fork-1.860
 twe-1.804
 lig-1.764
Token portraits
Token ID31725
Feature activation+1.69e+00

Pos logit contributions
 Spur+0.523
 misunderstand+0.500
fork+0.483
 twe+0.479
 lig+0.464
Neg logit contributions
��-0.668
ivism-0.589
idem-0.580
icles-0.536
��-0.531
Token of
Token ID286
Feature activation-1.48e-01

Pos logit contributions
��+2.899
ivism+2.568
idem+2.502
icles+2.368
��+2.268
Neg logit contributions
 Spur-2.296
 misunderstand-2.223
fork-2.175
 twe-2.098
 Learns-2.056
Tokenonia
Token ID11339
Feature activation-1.38e+00

Pos logit contributions
��+0.415
ivism+0.366
idem+0.360
icles+0.341
��+0.333
Neg logit contributions
 Spur-0.268
 misunderstand-0.259
fork-0.251
 twe-0.242
 Learns-0.236
Token,
Token ID11
Feature activation-1.88e-01

Pos logit contributions
 Spur+1.881
 misunderstand+1.803
fork+1.751
 twe+1.724
 lig+1.677
Neg logit contributions
��-2.372
ivism-2.088
idem-2.059
icles-1.920
��-1.891
Token Allen
Token ID9659
Feature activation-1.20e+00

Pos logit contributions
 Spur+0.269
 misunderstand+0.261
fork+0.254
 twe+0.249
 Learns+0.242
Neg logit contributions
��-0.313
ivism-0.274
idem-0.267
icles-0.252
��-0.245
Token and
Token ID290
Feature activation-9.08e-02

Pos logit contributions
 Spur+1.524
 misunderstand+1.475
fork+1.415
 twe+1.382
 Learns+1.362
Neg logit contributions
��-2.172
ivism-1.910
idem-1.881
icles-1.761
��-1.749
Token editor
Token ID5464
Feature activation-4.46e-01

Pos logit contributions
 Spur+0.131
 misunderstand+0.126
fork+0.123
 twe+0.120
 lig+0.117
Neg logit contributions
��-0.151
ivism-0.132
idem-0.130
icles-0.120
��-0.117
Token Ralph
Token ID20993
Feature activation+4.96e+00

Pos logit contributions
 Spur+0.647
 misunderstand+0.616
fork+0.599
 twe+0.586
 Learns+0.572
Neg logit contributions
��-0.746
ivism-0.649
idem-0.633
icles-0.591
��-0.580
Token Rosen
Token ID15564
Feature activation-2.36e+00

Pos logit contributions
��+8.482
ivism+7.579
idem+7.449
icles+7.085
ão+6.756
Neg logit contributions
 misunderstand-6.148
 Spur-6.082
fork-5.975
 twe-5.793
 deem-5.623
Tokenbl
Token ID2436
Feature activation-1.63e+00

Pos logit contributions
 Spur+2.923
 misunderstand+2.823
fork+2.793
 twe+2.649
 deem+2.629
Neg logit contributions
��-4.277
ivism-3.803
idem-3.726
��-3.490
icles-3.468
Tokenum
Token ID388
Feature activation+1.86e-02

Pos logit contributions
 Spur+2.130
fork+2.063
 misunderstand+2.027
 twe+1.999
ately+1.891
Neg logit contributions
��-2.883
ivism-2.446
idem-2.363
��-2.294
icles-2.264
Token re
Token ID302
Feature activation-3.95e-01

Pos logit contributions
��+0.034
ivism+0.030
idem+0.030
icles+0.028
��+0.027
Neg logit contributions
 Spur-0.024
 misunderstand-0.023
fork-0.022
 twe-0.022
 deem-0.021
Tokenworked
Token ID32931
Feature activation+4.98e-01

Pos logit contributions
 Spur+0.493
 misunderstand+0.476
fork+0.474
 twe+0.456
 lig+0.442
Neg logit contributions
��-0.720
ivism-0.637
idem-0.623
icles-0.587
��-0.583
Token perform
Token ID1620
Feature activation+1.36e+00

Pos logit contributions
��+0.246
ivism+0.221
idem+0.216
icles+0.205
��+0.199
Neg logit contributions
 Spur-0.144
 misunderstand-0.139
fork-0.135
 twe-0.130
 lig-0.126
Token pharmac
Token ID25702
Feature activation-9.75e-02

Pos logit contributions
��+2.393
ivism+2.127
idem+2.075
icles+1.961
��+1.886
Neg logit contributions
 Spur-1.776
 misunderstand-1.714
fork-1.682
 twe-1.616
 Learns-1.581
Tokenological
Token ID2770
Feature activation-4.24e-01

Pos logit contributions
 Spur+0.101
 misunderstand+0.096
fork+0.095
 twe+0.090
 lig+0.087
Neg logit contributions
��-0.201
ivism-0.180
idem-0.175
icles-0.167
��-0.164
Token and
Token ID290
Feature activation-1.28e-01

Pos logit contributions
 Spur+0.537
 misunderstand+0.524
fork+0.503
 twe+0.492
 lig+0.484
Neg logit contributions
��-0.779
ivism-0.691
idem-0.675
icles-0.640
��-0.623
Token elect
Token ID1742
Feature activation+5.58e-01

Pos logit contributions
 Spur+0.156
 misunderstand+0.151
fork+0.145
 twe+0.142
 lig+0.140
Neg logit contributions
��-0.242
ivism-0.216
idem-0.210
icles-0.200
��-0.194
Tokenroph
Token ID10051
Feature activation+4.94e+00

Pos logit contributions
��+0.813
ivism+0.686
idem+0.668
icles+0.620
��+0.607
Neg logit contributions
 Spur-0.912
 misunderstand-0.891
fork-0.879
 twe-0.854
 lig-0.842
Tokenys
Token ID893
Feature activation-5.55e-02

Pos logit contributions
��+8.936
idem+8.140
ivism+7.961
icles+7.537
imus+7.246
Neg logit contributions
 Spur-5.838
 misunderstand-5.606
fork-5.381
 twe-5.242
 Learns-5.186
Tokeniological
Token ID15071
Feature activation+7.34e-02

Pos logit contributions
 Spur+0.071
 misunderstand+0.069
fork+0.067
 twe+0.065
 lig+0.064
Neg logit contributions
��-0.100
ivism-0.088
idem-0.086
icles-0.081
��-0.080
Token procedures
Token ID9021
Feature activation+9.83e-01

Pos logit contributions
��+0.136
ivism+0.122
idem+0.119
icles+0.113
��+0.109
Neg logit contributions
 Spur-0.091
 misunderstand-0.088
fork-0.085
 twe-0.083
 lig-0.081
Token.
Token ID13
Feature activation-1.01e-01

Pos logit contributions
��+1.665
ivism+1.463
idem+1.426
icles+1.345
��+1.298
Neg logit contributions
 Spur-1.380
 misunderstand-1.337
fork-1.305
 twe-1.266
 lig-1.237
Token This
Token ID770
Feature activation-1.40e+00

Pos logit contributions
 Spur+0.126
 misunderstand+0.119
fork+0.117
 twe+0.112
 Learns+0.111
Neg logit contributions
��-0.187
ivism-0.167
idem-0.163
icles-0.155
��-0.151
Token But
Token ID887
Feature activation-1.53e-01

Pos logit contributions
 Spur+0.024
 misunderstand+0.022
fork+0.022
 Learns+0.021
 twe+0.021
Neg logit contributions
��-0.038
ivism-0.034
idem-0.033
icles-0.031
��-0.031
Token this
Token ID428
Feature activation-1.53e+00

Pos logit contributions
 Spur+0.210
 misunderstand+0.205
fork+0.198
 twe+0.192
 lig+0.189
Neg logit contributions
��-0.263
ivism-0.232
idem-0.225
icles-0.213
��-0.208
Token ability
Token ID2694
Feature activation-4.32e-02

Pos logit contributions
 Spur+1.660
 misunderstand+1.607
fork+1.537
 twe+1.516
 lig+1.504
Neg logit contributions
��-3.050
ivism-2.718
idem-2.680
icles-2.554
��-2.501
Token also
Token ID635
Feature activation-3.71e-01

Pos logit contributions
 Spur+0.047
 misunderstand+0.046
fork+0.044
 twe+0.042
 lig+0.041
Neg logit contributions
��-0.087
ivism-0.077
idem-0.076
icles-0.072
��-0.071
Token at
Token ID379
Feature activation-1.01e+00

Pos logit contributions
 Spur+0.423
 misunderstand+0.414
fork+0.393
 twe+0.383
 lig+0.376
Neg logit contributions
��-0.731
ivism-0.646
idem-0.641
icles-0.601
��-0.594
Tokenroph
Token ID10051
Feature activation+4.66e+00

Pos logit contributions
 Spur+1.221
 misunderstand+1.177
fork+1.154
 twe+1.117
 lig+1.096
Neg logit contributions
��-1.886
ivism-1.666
idem-1.633
icles-1.553
��-1.524
Tokenies
Token ID444
Feature activation+2.72e-01

Pos logit contributions
��+6.319
ivism+5.754
idem+5.689
icles+5.416
ão+4.923
Neg logit contributions
 Spur-7.664
 misunderstand-7.196
fork-7.049
 twe-6.941
 deem-6.787
Token or
Token ID393
Feature activation-6.07e-01

Pos logit contributions
��+0.477
ivism+0.418
idem+0.413
icles+0.387
��+0.376
Neg logit contributions
 Spur-0.367
 misunderstand-0.358
fork-0.345
 twe-0.337
 lig-0.332
Token shr
Token ID10157
Feature activation+9.48e-01

Pos logit contributions
 Spur+0.732
 misunderstand+0.714
fork+0.684
 twe+0.662
 lig+0.650
Neg logit contributions
��-1.154
ivism-1.024
idem-1.006
icles-0.948
��-0.928
Tokeninks
Token ID2973
Feature activation+3.39e-01

Pos logit contributions
��+1.827
ivism+1.614
idem+1.576
icles+1.498
��+1.482
Neg logit contributions
 Spur-1.111
 misunderstand-1.087
fork-1.051
 twe-1.009
 lig-0.982
Token over
Token ID625
Feature activation-8.48e-01

Pos logit contributions
��+0.574
ivism+0.502
idem+0.494
icles+0.461
��+0.450
Neg logit contributions
 Spur-0.476
 misunderstand-0.463
fork-0.447
 twe-0.437
 lig-0.432
Tokenra
Token ID430
Feature activation+4.84e-01

Pos logit contributions
��+1.705
ivism+1.506
idem+1.476
icles+1.392
��+1.364
Neg logit contributions
 Spur-1.126
 misunderstand-1.090
fork-1.068
 twe-1.020
 lig-0.996
Tokene
Token ID68
Feature activation+2.04e-01

Pos logit contributions
��+0.825
ivism+0.706
idem+0.687
icles+0.645
��+0.642
Neg logit contributions
 Spur-0.679
 misunderstand-0.663
fork-0.654
 twe-0.629
 lig-0.619
Token multiplied
Token ID33096
Feature activation-5.61e-01

Pos logit contributions
��+0.388
ivism+0.343
idem+0.335
icles+0.317
��+0.311
Neg logit contributions
 Spur-0.245
 misunderstand-0.238
fork-0.228
 twe-0.222
 lig-0.219
Token in
Token ID287
Feature activation+2.06e-01

Pos logit contributions
 Spur+0.740
 misunderstand+0.722
fork+0.683
 twe+0.674
 lig+0.672
Neg logit contributions
��-1.014
ivism-0.875
idem-0.865
��-0.808
icles-0.806
Token mac
Token ID8352
Feature activation+8.20e-01

Pos logit contributions
��+0.359
ivism+0.318
idem+0.310
icles+0.290
��+0.284
Neg logit contributions
 Spur-0.282
 misunderstand-0.271
fork-0.261
 twe-0.259
 lig-0.255
Tokenroph
Token ID10051
Feature activation+4.39e+00

Pos logit contributions
��+1.479
ivism+1.305
idem+1.271
icles+1.199
��+1.170
Neg logit contributions
 Spur-1.065
 misunderstand-1.025
fork-1.002
 twe-0.963
 lig-0.943
Tokenages
Token ID1095
Feature activation+2.14e-01

Pos logit contributions
��+1.959
ivism+1.259
idem+1.231
icles+0.842
ão+0.515
Neg logit contributions
 Spur-11.297
 misunderstand-11.076
fork-10.927
 twe-10.757
 Learns-10.679
Token at
Token ID379
Feature activation-1.10e+00

Pos logit contributions
��+0.373
ivism+0.327
idem+0.319
icles+0.299
��+0.294
Neg logit contributions
 Spur-0.292
 misunderstand-0.282
fork-0.274
 twe-0.267
 lig-0.264
Token the
Token ID262
Feature activation+4.73e-02

Pos logit contributions
 Spur+1.500
 misunderstand+1.438
fork+1.381
 twe+1.370
 lig+1.364
Neg logit contributions
��-1.915
ivism-1.681
idem-1.641
icles-1.539
��-1.525
Token site
Token ID2524
Feature activation-1.93e-01

Pos logit contributions
��+0.106
ivism+0.096
idem+0.095
icles+0.090
��+0.088
Neg logit contributions
 Spur-0.041
 misunderstand-0.039
fork-0.037
 twe-0.036
 lig-0.035
Token of
Token ID286
Feature activation+5.61e-02

Pos logit contributions
 Spur+0.249
 misunderstand+0.241
fork+0.235
 twe+0.228
 lig+0.223
Neg logit contributions
��-0.347
ivism-0.307
idem-0.299
icles-0.282
��-0.275
Token started
Token ID2067
Feature activation+1.29e+00

Pos logit contributions
��+1.566
ivism+1.388
idem+1.369
icles+1.291
��+1.241
Neg logit contributions
 Spur-1.123
 misunderstand-1.087
fork-1.061
 twe-1.028
 deem-1.007
Token,
Token ID11
Feature activation-1.34e-03

Pos logit contributions
��+2.157
ivism+1.887
idem+1.847
icles+1.735
��+1.674
Neg logit contributions
 Spur-1.821
 misunderstand-1.769
fork-1.730
 twe-1.675
 Learns-1.633
Token of
Token ID286
Feature activation+2.87e-01

Pos logit contributions
 Spur+0.002
 misunderstand+0.002
fork+0.002
 twe+0.002
 lig+0.002
Neg logit contributions
��-0.002
ivism-0.002
idem-0.002
icles-0.002
��-0.002
Token skills
Token ID4678
Feature activation-8.10e-01

Pos logit contributions
��+0.528
ivism+0.467
idem+0.457
icles+0.433
��+0.419
Neg logit contributions
 Spur-0.362
 misunderstand-0.353
fork-0.339
 twe-0.331
 lig-0.322
Token at
Token ID379
Feature activation-1.02e+00

Pos logit contributions
 Spur+0.960
 misunderstand+0.940
fork+0.903
 twe+0.869
 lig+0.847
Neg logit contributions
��-1.562
ivism-1.373
idem-1.357
icles-1.280
��-1.254
Tokenroph
Token ID10051
Feature activation+4.29e+00

Pos logit contributions
 Spur+0.979
 misunderstand+0.949
fork+0.907
 twe+0.876
 lig+0.839
Neg logit contributions
��-2.185
ivism-1.958
idem-1.944
icles-1.856
��-1.805
Tokenied
Token ID798
Feature activation-1.46e-01

Pos logit contributions
��+6.993
ivism+6.348
idem+6.297
icles+5.865
ão+5.595
Neg logit contributions
 Spur-5.956
 misunderstand-5.656
fork-5.497
 twe-5.343
 Learns-5.304
Token,
Token ID11
Feature activation-2.18e-01

Pos logit contributions
 Spur+0.207
 misunderstand+0.202
fork+0.197
 twe+0.191
 lig+0.187
Neg logit contributions
��-0.243
ivism-0.211
idem-0.207
icles-0.195
��-0.190
Token and
Token ID290
Feature activation+1.38e-01

Pos logit contributions
 Spur+0.278
 misunderstand+0.271
fork+0.261
 twe+0.253
 lig+0.247
Neg logit contributions
��-0.397
ivism-0.351
idem-0.343
icles-0.324
��-0.314
Token of
Token ID286
Feature activation+1.46e-01

Pos logit contributions
��+0.247
ivism+0.218
idem+0.214
icles+0.201
��+0.194
Neg logit contributions
 Spur-0.179
 misunderstand-0.175
fork-0.169
 twe-0.164
 lig-0.160
Token hopes
Token ID7176
Feature activation-6.00e-02

Pos logit contributions
��+0.270
ivism+0.238
idem+0.234
icles+0.221
��+0.214
Neg logit contributions
 Spur-0.184
 misunderstand-0.180
fork-0.172
 twe-0.168
 lig-0.163
Tokenam
Token ID321
Feature activation-4.12e-01

Pos logit contributions
��+3.304
ivism+2.835
idem+2.701
icles+2.545
��+2.509
Neg logit contributions
 Spur-3.169
 misunderstand-3.083
fork-3.072
 twe-2.998
 lig-2.921
Token,
Token ID11
Feature activation-3.15e-01

Pos logit contributions
 Spur+0.509
 misunderstand+0.492
fork+0.475
 twe+0.468
 lig+0.464
Neg logit contributions
��-0.769
ivism-0.681
idem-0.655
icles-0.626
��-0.613
Token a
Token ID257
Feature activation-1.45e-01

Pos logit contributions
 Spur+0.453
 misunderstand+0.434
fork+0.422
 twe+0.416
 lig+0.412
Neg logit contributions
��-0.527
ivism-0.463
idem-0.446
icles-0.426
��-0.410
Token benz
Token ID39601
Feature activation-6.22e-01

Pos logit contributions
 Spur+0.149
 misunderstand+0.142
fork+0.137
 twe+0.135
 lig+0.135
Neg logit contributions
��-0.299
ivism-0.270
idem-0.263
icles-0.252
��-0.246
Tokenod
Token ID375
Feature activation+7.79e-01

Pos logit contributions
 Spur+0.869
 misunderstand+0.847
fork+0.835
 twe+0.820
 lig+0.810
Neg logit contributions
��-1.039
ivism-0.896
idem-0.860
icles-0.825
��-0.819
Tokeniazep
Token ID48826
Feature activation+4.28e+00

Pos logit contributions
��+1.198
ivism+1.016
idem+0.978
icles+0.924
��+0.908
Neg logit contributions
 Spur-1.209
 misunderstand-1.181
fork-1.162
 twe-1.140
 lig-1.117
Tokenine
Token ID500
Feature activation-2.48e-01

Pos logit contributions
��+3.902
ivism+3.216
idem+3.174
icles+2.743
imus+2.489
Neg logit contributions
 Spur-8.926
 misunderstand-8.820
fork-8.647
 Learns-8.404
 twe-8.323
Token anx
Token ID7296
Feature activation-1.20e+00

Pos logit contributions
 Spur+0.319
 misunderstand+0.309
fork+0.300
 twe+0.295
 lig+0.291
Neg logit contributions
��-0.447
ivism-0.396
idem-0.386
icles-0.366
��-0.356
Tokenio
Token ID952
Feature activation+5.28e-01

Pos logit contributions
 Spur+1.662
fork+1.640
 misunderstand+1.624
 twe+1.598
 lig+1.565
Neg logit contributions
��-2.008
ivism-1.708
idem-1.652
��-1.595
icles-1.556
Tokenly
Token ID306
Feature activation-7.59e-01

Pos logit contributions
��+1.251
ivism+1.146
idem+1.127
icles+1.082
��+1.054
Neg logit contributions
 Spur-0.377
 misunderstand-0.354
fork-0.339
 twe-0.316
 Learns-0.300
Tokentic
Token ID13370
Feature activation-7.48e-02

Pos logit contributions
 Spur+0.878
fork+0.822
 misunderstand+0.813
 twe+0.783
 lig+0.767
Neg logit contributions
��-1.476
ivism-1.344
idem-1.279
icles-1.248
��-1.213
Token Irvine
Token ID37277
Feature activation+1.54e-01

Pos logit contributions
��+1.040
ivism+0.899
idem+0.877
icles+0.815
��+0.790
Neg logit contributions
 Spur-1.023
 misunderstand-0.991
fork-0.968
 twe-0.945
 lig-0.924
Token,
Token ID11
Feature activation+2.59e-01

Pos logit contributions
��+0.259
ivism+0.226
idem+0.221
icles+0.207
��+0.202
Neg logit contributions
 Spur-0.220
 misunderstand-0.211
fork-0.209
 twe-0.201
 lig-0.197
Token California
Token ID3442
Feature activation+2.36e-01

Pos logit contributions
��+0.468
ivism+0.415
idem+0.406
icles+0.384
��+0.371
Neg logit contributions
 Spur-0.333
 misunderstand-0.322
fork-0.314
 twe-0.304
 lig-0.296
Token.
Token ID13
Feature activation+2.80e-01

Pos logit contributions
��+0.342
ivism+0.296
idem+0.286
icles+0.267
��+0.253
Neg logit contributions
 Spur-0.385
 misunderstand-0.377
fork-0.366
 twe-0.358
 Learns-0.352
Token Hide
Token ID10415
Feature activation+1.45e+00

Pos logit contributions
��+0.522
ivism+0.464
idem+0.455
icles+0.429
��+0.419
Neg logit contributions
 Spur-0.348
 misunderstand-0.330
fork-0.324
 twe-0.312
 Learns-0.307
Token Caption
Token ID11260
Feature activation+4.17e+00

Pos logit contributions
��+1.815
ivism+1.514
idem+1.470
icles+1.337
��+1.277
Neg logit contributions
 Spur-2.678
 misunderstand-2.606
fork-2.566
 twe-2.498
 Learns-2.465
Token 10
Token ID838
Feature activation+5.19e-01

Pos logit contributions
��+5.295
ivism+4.742
idem+4.376
icles+4.272
ão+3.959
Neg logit contributions
 Spur-6.984
 misunderstand-6.875
fork-6.730
 twe-6.523
 deem-6.498
Token of
Token ID286
Feature activation-2.26e-01

Pos logit contributions
��+0.971
ivism+0.855
idem+0.839
icles+0.786
��+0.782
Neg logit contributions
 Spur-0.644
 misunderstand-0.619
fork-0.608
 twe-0.580
 lig-0.570
Token 15
Token ID1315
Feature activation+7.94e-01

Pos logit contributions
 Spur+0.382
 misunderstand+0.366
fork+0.358
 twe+0.351
 Learns+0.346
Neg logit contributions
��-0.333
ivism-0.275
idem-0.272
icles-0.248
��-0.248
Token Photos
Token ID9434
Feature activation+2.44e+00

Pos logit contributions
��+1.530
ivism+1.357
idem+1.333
icles+1.247
��+1.245
Neg logit contributions
 Spur-0.947
 misunderstand-0.897
fork-0.875
 Learns-0.835
 twe-0.834
Token:
Token ID25
Feature activation+4.48e-02

Pos logit contributions
��+3.496
ivism+3.067
idem+2.964
icles+2.780
ão+2.610
Neg logit contributions
 Spur-3.929
 misunderstand-3.844
fork-3.743
 twe-3.674
 Learns-3.585
Token Jimmy
Token ID12963
Feature activation+2.01e+00

Pos logit contributions
��+2.632
ivism+2.334
idem+2.275
icles+2.140
��+2.049
Neg logit contributions
 Spur-2.101
 misunderstand-2.031
fork-1.994
 twe-1.919
 Learns-1.885
Token was
Token ID373
Feature activation-2.17e-01

Pos logit contributions
��+3.542
ivism+3.145
idem+3.079
icles+2.899
��+2.791
Neg logit contributions
 Spur-2.646
 misunderstand-2.562
fork-2.501
 twe-2.416
 Learns-2.356
Token essentially
Token ID6986
Feature activation+3.31e-01

Pos logit contributions
 Spur+0.279
 misunderstand+0.270
fork+0.261
 twe+0.255
 deem+0.247
Neg logit contributions
��-0.394
ivism-0.350
idem-0.343
icles-0.327
��-0.317
Token hust
Token ID33376
Feature activation-5.43e-01

Pos logit contributions
��+0.587
ivism+0.519
idem+0.508
icles+0.482
��+0.465
Neg logit contributions
 Spur-0.438
 misunderstand-0.425
fork-0.412
 twe-0.400
 lig-0.389
Tokenling
Token ID1359
Feature activation+1.77e-01

Pos logit contributions
 Spur+0.358
 misunderstand+0.342
fork+0.336
 twe+0.311
 lig+0.291
Neg logit contributions
��-1.312
ivism-1.189
idem-1.175
icles-1.128
��-1.116
Token Ralph
Token ID20993
Feature activation+4.16e+00

Pos logit contributions
��+0.305
ivism+0.266
idem+0.262
icles+0.247
��+0.241
Neg logit contributions
 Spur-0.245
 misunderstand-0.236
fork-0.229
 twe-0.225
 lig-0.218
Token.
Token ID13
Feature activation-7.62e-02

Pos logit contributions
��+6.946
ivism+6.180
idem+6.007
icles+5.646
ão+5.378
Neg logit contributions
 Spur-5.672
 misunderstand-5.528
fork-5.424
 twe-5.247
 lig-5.119
Token
Token ID198
Feature activation-1.14e-02

Pos logit contributions
 Spur+0.094
 misunderstand+0.089
fork+0.087
 twe+0.084
 Learns+0.082
Neg logit contributions
��-0.143
ivism-0.127
idem-0.124
icles-0.118
��-0.116
Token
Token ID198
Feature activation-1.61e-01

Pos logit contributions
 Spur+0.014
 misunderstand+0.014
fork+0.014
 twe+0.013
 lig+0.013
Neg logit contributions
��-0.021
ivism-0.019
idem-0.018
icles-0.017
��-0.017
TokenAlso
Token ID7583
Feature activation-1.08e+00

Pos logit contributions
 Spur+0.219
fork+0.212
 misunderstand+0.212
 twe+0.202
 lig+0.195
Neg logit contributions
��-0.279
ivism-0.243
idem-0.236
icles-0.225
��-0.221
Token,
Token ID11
Feature activation+5.78e-02

Pos logit contributions
 Spur+1.513
 misunderstand+1.471
fork+1.423
 twe+1.394
 lig+1.355
Neg logit contributions
��-1.822
ivism-1.596
idem-1.567
icles-1.480
��-1.431
Token Irvine
Token ID37277
Feature activation+8.35e-02

Pos logit contributions
��+1.212
ivism+1.057
idem+1.033
icles+0.966
��+0.935
Neg logit contributions
 Spur-1.093
 misunderstand-1.055
fork-1.033
 twe-1.004
 lig-0.980
Token,
Token ID11
Feature activation+2.75e-01

Pos logit contributions
��+0.137
ivism+0.120
idem+0.117
icles+0.110
��+0.106
Neg logit contributions
 Spur-0.121
 misunderstand-0.117
fork-0.115
 twe-0.111
 lig-0.109
Token California
Token ID3442
Feature activation+1.51e-01

Pos logit contributions
��+0.516
ivism+0.454
idem+0.448
icles+0.422
��+0.415
Neg logit contributions
 Spur-0.340
 misunderstand-0.327
fork-0.319
 twe-0.309
 lig-0.300
Token.
Token ID13
Feature activation+2.65e-01

Pos logit contributions
��+0.219
ivism+0.190
idem+0.183
icles+0.171
��+0.162
Neg logit contributions
 Spur-0.247
 misunderstand-0.242
fork-0.235
 twe-0.230
 Learns-0.226
Token Hide
Token ID10415
Feature activation+1.43e+00

Pos logit contributions
��+0.501
ivism+0.444
idem+0.437
icles+0.412
��+0.404
Neg logit contributions
 Spur-0.326
 misunderstand-0.308
fork-0.301
 twe-0.289
 Learns-0.286
Token Caption
Token ID11260
Feature activation+4.14e+00

Pos logit contributions
��+1.789
ivism+1.492
idem+1.450
icles+1.318
��+1.261
Neg logit contributions
 Spur-2.627
 misunderstand-2.557
fork-2.517
 twe-2.450
 Learns-2.419
Token 9
Token ID860
Feature activation+1.02e+00

Pos logit contributions
��+5.258
ivism+4.566
idem+4.302
icles+4.112
ão+3.776
Neg logit contributions
 Spur-7.107
 misunderstand-6.969
fork-6.857
 twe-6.685
 deem-6.624
Token of
Token ID286
Feature activation-2.63e-01

Pos logit contributions
��+1.864
ivism+1.648
idem+1.613
icles+1.519
��+1.487
Neg logit contributions
 Spur-1.296
 misunderstand-1.249
fork-1.224
 twe-1.175
 lig-1.148
Token 15
Token ID1315
Feature activation+8.12e-01

Pos logit contributions
 Spur+0.443
 misunderstand+0.424
fork+0.416
 twe+0.407
 Learns+0.402
Neg logit contributions
��-0.386
ivism-0.319
idem-0.315
icles-0.288
��-0.288
Token Photos
Token ID9434
Feature activation+2.42e+00

Pos logit contributions
��+1.569
ivism+1.390
idem+1.366
icles+1.278
��+1.277
Neg logit contributions
 Spur-0.967
 misunderstand-0.914
fork-0.893
 Learns-0.851
 twe-0.849
Token:
Token ID25
Feature activation+2.55e-02

Pos logit contributions
��+3.491
ivism+3.060
idem+2.959
icles+2.774
ão+2.606
Neg logit contributions
 Spur-3.907
 misunderstand-3.820
fork-3.720
 twe-3.650
 Learns-3.562
Token SS
Token ID6723
Feature activation-3.40e-01

Pos logit contributions
��+0.697
ivism+0.626
idem+0.613
icles+0.582
��+0.564
Neg logit contributions
 Spur-0.405
 misunderstand-0.389
fork-0.378
 twe-0.364
 lig-0.354
TokenRI
Token ID7112
Feature activation+1.12e+00

Pos logit contributions
 Spur+0.607
 misunderstand+0.584
fork+0.583
 twe+0.567
 lig+0.567
Neg logit contributions
��-0.439
ivism-0.370
idem-0.363
icles-0.337
��-0.327
Token or
Token ID393
Feature activation-7.73e-01

Pos logit contributions
��+1.796
ivism+1.573
idem+1.533
icles+1.437
��+1.378
Neg logit contributions
 Spur-1.651
 misunderstand-1.603
fork-1.569
 twe-1.522
 lig-1.488
Token benz
Token ID39601
Feature activation-6.40e-01

Pos logit contributions
 Spur+0.948
 misunderstand+0.912
fork+0.883
 twe+0.869
 lig+0.868
Neg logit contributions
��-1.438
ivism-1.288
idem-1.262
icles-1.195
��-1.160
Tokenod
Token ID375
Feature activation+6.47e-01

Pos logit contributions
 Spur+0.871
fork+0.851
 misunderstand+0.845
 twe+0.819
 lig+0.809
Neg logit contributions
��-1.100
ivism-0.936
idem-0.908
��-0.874
icles-0.861
Tokeniazep
Token ID48826
Feature activation+4.10e+00

Pos logit contributions
��+0.981
ivism+0.832
idem+0.807
icles+0.751
��+0.743
Neg logit contributions
 Spur-1.020
 misunderstand-0.996
fork-0.984
 twe-0.951
 lig-0.939
Tokenine
Token ID500
Feature activation-4.79e-01

Pos logit contributions
��+3.789
ivism+3.079
idem+3.046
icles+2.642
ão+2.354
Neg logit contributions
 Spur-8.597
 misunderstand-8.458
fork-8.288
 Learns-8.059
 twe-8.048
Token.
Token ID13
Feature activation-2.19e-01

Pos logit contributions
 Spur+0.663
 misunderstand+0.638
fork+0.624
 twe+0.615
 lig+0.612
Neg logit contributions
��-0.819
ivism-0.722
idem-0.700
icles-0.664
��-0.646
Token
Token ID198
Feature activation-4.46e-02

Pos logit contributions
 Spur+0.278
 misunderstand+0.264
fork+0.257
 twe+0.247
 lig+0.245
Neg logit contributions
��-0.404
ivism-0.360
idem-0.350
icles-0.331
��-0.323
Token
Token ID198
Feature activation-1.64e-01

Pos logit contributions
 Spur+0.055
 misunderstand+0.053
fork+0.052
 twe+0.050
 lig+0.049
Neg logit contributions
��-0.084
ivism-0.073
idem-0.072
icles-0.068
��-0.068
TokenCopyright
Token ID15269
Feature activation-4.94e-01

Pos logit contributions
 Spur+0.212
fork+0.205
 misunderstand+0.205
 twe+0.193
 lig+0.192
Neg logit contributions
��-0.291
ivism-0.258
idem-0.246
icles-0.237
��-0.230
Token 2
Token ID362
Feature activation-7.44e-02

Pos logit contributions
 Spur+0.291
 misunderstand+0.282
fork+0.272
 twe+0.261
 lig+0.253
Neg logit contributions
��-0.515
ivism-0.463
idem-0.453
icles-0.431
��-0.419
Token:
Token ID25
Feature activation+4.48e-01

Pos logit contributions
 Spur+0.103
 misunderstand+0.100
fork+0.098
 twe+0.095
 Learns+0.092
Neg logit contributions
��-0.126
ivism-0.111
idem-0.109
icles-0.102
��-0.099
Token21
Token ID2481
Feature activation+8.68e-03

Pos logit contributions
��+0.760
ivism+0.671
idem+0.654
icles+0.616
��+0.594
Neg logit contributions
 Spur-0.628
 misunderstand-0.605
fork-0.593
 twe-0.571
 Learns-0.566
Token,
Token ID11
Feature activation-1.16e-01

Pos logit contributions
��+0.015
ivism+0.013
idem+0.013
icles+0.012
��+0.012
Neg logit contributions
 Spur-0.012
 misunderstand-0.012
fork-0.011
 twe-0.011
 Learns-0.011
Token the
Token ID262
Feature activation-6.39e-02

Pos logit contributions
 Spur+0.169
 misunderstand+0.163
fork+0.160
 twe+0.156
 Learns+0.153
Neg logit contributions
��-0.192
ivism-0.168
idem-0.163
icles-0.154
��-0.149
Token sax
Token ID46909
Feature activation+4.08e+00

Pos logit contributions
 Spur+0.080
 misunderstand+0.078
fork+0.075
 twe+0.073
 lig+0.071
Neg logit contributions
��-0.117
ivism-0.105
idem-0.102
icles-0.096
��-0.094
Token started
Token ID2067
Feature activation+1.38e+00

Pos logit contributions
��+5.860
ivism+5.117
idem+4.969
icles+4.671
ão+4.416
Neg logit contributions
 Spur-6.489
 misunderstand-6.322
fork-6.172
 twe-5.975
 Learns-5.904
Token a
Token ID257
Feature activation-1.45e-01

Pos logit contributions
��+2.370
ivism+2.091
idem+2.044
icles+1.921
��+1.856
Neg logit contributions
 Spur-1.905
 misunderstand-1.847
fork-1.802
 twe-1.749
 lig-1.704
Token phrase
Token ID9546
Feature activation+1.50e+00

Pos logit contributions
 Spur+0.175
 misunderstand+0.169
fork+0.164
 twe+0.160
 lig+0.155
Neg logit contributions
��-0.274
ivism-0.246
idem-0.242
icles-0.228
��-0.222
Token on
Token ID319
Feature activation-1.92e+00

Pos logit contributions
��+2.580
ivism+2.276
idem+2.224
icles+2.092
��+2.021
Neg logit contributions
 Spur-2.052
 misunderstand-1.988
fork-1.944
 twe-1.880
 Learns-1.832
Token B
Token ID347
Feature activation-1.11e+00

Pos logit contributions
 Spur+2.668
 misunderstand+2.539
fork+2.476
 twe+2.454
 lig+2.379
Neg logit contributions
��-3.257
ivism-2.853
idem-2.787
icles-2.621
��-2.585
Token,
Token ID11
Feature activation-1.79e-01

Pos logit contributions
 Spur+3.109
 misunderstand+3.004
fork+2.942
 twe+2.855
 deem+2.788
Neg logit contributions
��-3.589
ivism-3.153
idem-3.067
icles-2.877
��-2.801
Token and
Token ID290
Feature activation+9.55e-02

Pos logit contributions
 Spur+0.243
 misunderstand+0.236
fork+0.233
 twe+0.224
 Learns+0.220
Neg logit contributions
��-0.305
ivism-0.268
idem-0.266
icles-0.252
ão-0.237
Token critical
Token ID4688
Feature activation-6.21e-01

Pos logit contributions
��+0.219
ivism+0.200
idem+0.197
icles+0.189
��+0.183
Neg logit contributions
 Spur-0.075
 misunderstand-0.071
fork-0.068
 twe-0.064
 Learns-0.062
Token essays
Token ID27126
Feature activation+1.37e+00

Pos logit contributions
 Spur+0.898
 misunderstand+0.887
fork+0.848
 twe+0.836
 deem+0.824
Neg logit contributions
��-1.027
ivism-0.892
idem-0.856
icles-0.809
��-0.803
Token by
Token ID416
Feature activation-9.14e-01

Pos logit contributions
��+2.183
ivism+1.925
idem+1.886
icles+1.805
ão+1.684
Neg logit contributions
 Spur-1.969
 misunderstand-1.917
fork-1.872
 Learns-1.817
 twe-1.808
Token Scott
Token ID4746
Feature activation+4.05e+00

Pos logit contributions
 Spur+1.655
 misunderstand+1.602
fork+1.566
 twe+1.544
 deem+1.524
Neg logit contributions
��-1.164
ivism-0.963
idem-0.953
icles-0.853
��-0.844
Token Ric
Token ID15868
Feature activation+1.67e+00

Pos logit contributions
��+5.764
ivism+5.004
idem+4.891
icles+4.602
ão+4.276
Neg logit contributions
 Spur-6.520
 misunderstand-6.419
fork-6.360
 twe-6.132
 lig-6.017
Tokenher
Token ID372
Feature activation-4.42e-01

Pos logit contributions
��+3.321
ivism+2.984
idem+2.894
��+2.744
icles+2.729
Neg logit contributions
 Spur-1.843
 misunderstand-1.765
fork-1.744
 twe-1.667
 lig-1.593
Tokent
Token ID83
Feature activation+7.79e-02

Pos logit contributions
 Spur+0.497
 misunderstand+0.472
fork+0.468
 twe+0.438
 lig+0.431
Neg logit contributions
��-0.875
ivism-0.786
idem-0.768
icles-0.726
��-0.717
Token and
Token ID290
Feature activation-4.01e-01

Pos logit contributions
��+0.143
ivism+0.128
idem+0.124
icles+0.118
��+0.115
Neg logit contributions
 Spur-0.098
 misunderstand-0.094
fork-0.092
 twe-0.089
 lig-0.086
Token David
Token ID3271
Feature activation+8.72e-01

Pos logit contributions
 Spur+0.631
 misunderstand+0.611
fork+0.596
 twe+0.582
 deem+0.569
Neg logit contributions
��-0.610
ivism-0.532
idem-0.516
icles-0.474
��-0.461
Token 20
Token ID1160
Feature activation+3.96e-02

Pos logit contributions
��+0.379
ivism+0.324
idem+0.319
icles+0.295
��+0.292
Neg logit contributions
 Spur-0.367
 misunderstand-0.352
fork-0.347
 twe-0.336
 lig-0.330
Token,
Token ID11
Feature activation+3.10e-02

Pos logit contributions
��+0.067
ivism+0.058
idem+0.057
icles+0.053
��+0.053
Neg logit contributions
 Spur-0.056
 misunderstand-0.054
fork-0.054
 twe-0.051
 lig-0.051
Token 2015
Token ID1853
Feature activation+2.40e+00

Pos logit contributions
��+0.050
ivism+0.043
idem+0.042
icles+0.039
��+0.038
Neg logit contributions
 Spur-0.047
 misunderstand-0.045
fork-0.044
 twe-0.043
 lig-0.042
Token.
Token ID13
Feature activation+3.10e-02

Pos logit contributions
��+3.196
ivism+2.760
idem+2.627
icles+2.450
ão+2.290
Neg logit contributions
 Spur-4.042
 misunderstand-4.006
fork-3.871
 twe-3.809
 Learns-3.784
Token Hide
Token ID10415
Feature activation+1.29e+00

Pos logit contributions
��+0.058
ivism+0.051
idem+0.051
icles+0.048
��+0.047
Neg logit contributions
 Spur-0.038
 misunderstand-0.036
fork-0.036
 twe-0.034
 Learns-0.034
Token Caption
Token ID11260
Feature activation+4.01e+00

Pos logit contributions
��+1.628
ivism+1.356
idem+1.320
icles+1.199
��+1.149
Neg logit contributions
 Spur-2.379
 misunderstand-2.315
fork-2.279
 twe-2.217
 Learns-2.190
Token 8
Token ID807
Feature activation+3.51e-01

Pos logit contributions
��+4.775
ivism+4.206
idem+3.997
icles+3.719
ão+3.433
Neg logit contributions
 Spur-7.169
 misunderstand-7.095
fork-6.911
 twe-6.763
 lig-6.644
Token of
Token ID286
Feature activation-3.51e-01

Pos logit contributions
��+0.664
ivism+0.583
idem+0.572
icles+0.536
��+0.536
Neg logit contributions
 Spur-0.432
 misunderstand-0.414
fork-0.408
 twe-0.389
 lig-0.382
Token 15
Token ID1315
Feature activation+8.17e-01

Pos logit contributions
 Spur+0.591
 misunderstand+0.563
fork+0.553
 twe+0.541
 Learns+0.535
Neg logit contributions
��-0.518
ivism-0.431
idem-0.425
icles-0.389
��-0.388
Token Photos
Token ID9434
Feature activation+2.30e+00

Pos logit contributions
��+1.580
ivism+1.400
idem+1.376
��+1.287
icles+1.286
Neg logit contributions
 Spur-0.970
 misunderstand-0.918
fork-0.896
 Learns-0.854
 twe-0.852
Token:
Token ID25
Feature activation-1.43e-01

Pos logit contributions
��+3.349
ivism+2.929
idem+2.835
icles+2.656
ão+2.497
Neg logit contributions
 Spur-3.695
 misunderstand-3.610
fork-3.520
 twe-3.447
 Learns-3.366
Token
Token ID198
Feature activation+2.38e-01

Pos logit contributions
��+0.284
ivism+0.253
idem+0.248
icles+0.234
��+0.229
Neg logit contributions
 Spur-0.187
 misunderstand-0.177
fork-0.175
 twe-0.167
 Learns-0.165
Token
Token ID198
Feature activation+7.58e-02

Pos logit contributions
��+0.440
ivism+0.386
idem+0.379
icles+0.357
��+0.354
Neg logit contributions
 Spur-0.300
 misunderstand-0.287
fork-0.285
 twe-0.270
 lig-0.263
TokenFor
Token ID1890
Feature activation+1.73e+00

Pos logit contributions
��+0.134
ivism+0.118
idem+0.114
icles+0.108
��+0.106
Neg logit contributions
 Spur-0.101
fork-0.097
 misunderstand-0.096
 twe-0.092
 Learns-0.089
Token decades
Token ID4647
Feature activation-8.72e-01

Pos logit contributions
��+2.894
ivism+2.553
idem+2.487
icles+2.338
��+2.253
Neg logit contributions
 Spur-2.462
 misunderstand-2.382
fork-2.330
 twe-2.262
 Learns-2.203
Token,
Token ID11
Feature activation+2.82e-01

Pos logit contributions
 Spur+1.259
 misunderstand+1.209
fork+1.181
 twe+1.157
 lig+1.119
Neg logit contributions
��-1.446
ivism-1.262
idem-1.239
icles-1.159
��-1.118
Token Scott
Token ID4746
Feature activation+3.97e+00

Pos logit contributions
��+0.485
ivism+0.429
idem+0.418
icles+0.394
��+0.381
Neg logit contributions
 Spur-0.388
 misunderstand-0.374
fork-0.364
 twe-0.355
 lig-0.344
Token Free
Token ID3232
Feature activation+1.70e+00

Pos logit contributions
��+6.748
ivism+6.057
idem+5.903
icles+5.629
ão+5.295
Neg logit contributions
 Spur-5.147
 misunderstand-5.098
fork-4.995
 twe-4.802
 lig-4.685
Token
Token ID851
Feature activation-3.40e-01

Pos logit contributions
��+3.162
ivism+2.828
idem+2.771
icles+2.623
��+2.529
Neg logit contributions
 Spur-2.051
 misunderstand-1.981
fork-1.924
 twe-1.849
 Learns-1.803
Token better
Token ID1365
Feature activation+5.02e-01

Pos logit contributions
 Spur+0.451
 misunderstand+0.431
fork+0.422
 twe+0.412
 deem+0.399
Neg logit contributions
��-0.605
ivism-0.533
idem-0.521
icles-0.492
��-0.479
Token known
Token ID1900
Feature activation-1.13e+00

Pos logit contributions
��+1.061
ivism+0.960
idem+0.943
icles+0.899
��+0.875
Neg logit contributions
 Spur-0.488
 misunderstand-0.467
fork-0.451
 twe-0.431
 lig-0.415
Token as
Token ID355
Feature activation+4.66e-01

Pos logit contributions
 Spur+1.715
 misunderstand+1.667
fork+1.632
 twe+1.602
 deem+1.558
Neg logit contributions
��-1.791
ivism-1.528
idem-1.511
icles-1.406
��-1.372
Token,
Token ID11
Feature activation+3.23e-01

Pos logit contributions
 Spur+0.179
 misunderstand+0.172
fork+0.168
 twe+0.165
 Learns+0.158
Neg logit contributions
��-0.251
ivism-0.220
idem-0.213
icles-0.203
��-0.198
Token while
Token ID981
Feature activation+7.99e-01

Pos logit contributions
��+0.549
ivism+0.483
idem+0.469
icles+0.441
��+0.432
Neg logit contributions
 Spur-0.450
 misunderstand-0.435
fork-0.425
 twe-0.415
 deem-0.405
Token CBS
Token ID11133
Feature activation-3.10e-01

Pos logit contributions
��+1.354
ivism+1.189
idem+1.160
icles+1.086
��+1.056
Neg logit contributions
 Spur-1.127
 misunderstand-1.087
fork-1.058
 twe-1.034
 lig-1.006
Token'
Token ID6
Feature activation-5.96e-01

Pos logit contributions
 Spur+0.384
 misunderstand+0.370
fork+0.361
 twe+0.351
 lig+0.338
Neg logit contributions
��-0.580
ivism-0.512
idem-0.496
icles-0.472
��-0.461
Token anchor
Token ID18021
Feature activation-6.95e-02

Pos logit contributions
 Spur+0.826
 misunderstand+0.794
fork+0.773
 twe+0.762
 deem+0.740
Neg logit contributions
��-1.023
ivism-0.895
idem-0.869
icles-0.822
��-0.798
Token Scott
Token ID4746
Feature activation+3.96e+00

Pos logit contributions
 Spur+0.099
 misunderstand+0.095
fork+0.093
 twe+0.090
 deem+0.088
Neg logit contributions
��-0.117
ivism-0.103
idem-0.101
icles-0.094
��-0.092
Token Pel
Token ID12903
Feature activation-4.98e-01

Pos logit contributions
��+5.018
ivism+4.209
idem+4.130
icles+3.801
ão+3.513
Neg logit contributions
 Spur-6.928
 misunderstand-6.863
fork-6.724
 twe-6.556
 lig-6.413
Tokenley
Token ID1636
Feature activation-6.90e-01

Pos logit contributions
 Spur+0.314
fork+0.304
 misunderstand+0.290
 twe+0.271
 lig+0.251
Neg logit contributions
��-1.220
ivism-1.103
idem-1.075
��-1.046
icles-1.028
Token is
Token ID318
Feature activation+1.76e-01

Pos logit contributions
 Spur+0.781
 misunderstand+0.762
fork+0.742
 twe+0.709
 lig+0.685
Neg logit contributions
��-1.365
ivism-1.204
idem-1.174
��-1.106
icles-1.105
Token leading
Token ID3756
Feature activation+3.24e-01

Pos logit contributions
��+0.312
ivism+0.275
idem+0.268
icles+0.253
��+0.248
Neg logit contributions
 Spur-0.234
 misunderstand-0.227
fork-0.221
 twe-0.217
 deem-0.213
Token his
Token ID465
Feature activation+2.64e-01

Pos logit contributions
��+0.542
ivism+0.475
idem+0.461
icles+0.431
��+0.418
Neg logit contributions
 Spur-0.465
 misunderstand-0.448
fork-0.433
 twe-0.429
 lig-0.417
Tokenin
Token ID259
Feature activation+3.80e-01

Pos logit contributions
��+2.334
ivism+1.992
idem+1.934
icles+1.818
��+1.768
Neg logit contributions
 Spur-2.462
fork-2.372
 misunderstand-2.370
 twe-2.271
 lig-2.215
Token,
Token ID11
Feature activation+4.74e-02

Pos logit contributions
��+0.613
ivism+0.536
idem+0.524
icles+0.491
��+0.472
Neg logit contributions
 Spur-0.563
 misunderstand-0.543
fork-0.536
 twe-0.516
 Learns-0.505
Token Iowa
Token ID9406
Feature activation+6.31e-02

Pos logit contributions
��+0.083
ivism+0.073
idem+0.072
icles+0.068
��+0.065
Neg logit contributions
 Spur-0.064
 misunderstand-0.061
fork-0.060
 twe-0.058
 lig-0.057
Token.
Token ID13
Feature activation+8.21e-02

Pos logit contributions
��+0.096
ivism+0.083
idem+0.081
icles+0.076
��+0.072
Neg logit contributions
 Spur-0.099
 misunderstand-0.096
fork-0.094
 twe-0.092
 lig-0.090
Token Hide
Token ID10415
Feature activation+1.08e+00

Pos logit contributions
��+0.155
ivism+0.137
idem+0.135
icles+0.127
��+0.125
Neg logit contributions
 Spur-0.102
 misunderstand-0.096
fork-0.094
 twe-0.090
 Learns-0.089
Token Caption
Token ID11260
Feature activation+3.95e+00

Pos logit contributions
��+1.379
ivism+1.146
idem+1.117
icles+1.014
��+0.976
Neg logit contributions
 Spur-1.983
 misunderstand-1.928
fork-1.898
 twe-1.845
 Learns-1.823
Token 5
Token ID642
Feature activation-1.61e-01

Pos logit contributions
��+5.359
ivism+4.635
idem+4.405
icles+4.233
ão+3.886
Neg logit contributions
 misunderstand-6.465
 Spur-6.440
fork-6.293
 twe-6.158
 deem-6.070
Token of
Token ID286
Feature activation-2.46e-01

Pos logit contributions
 Spur+0.193
 misunderstand+0.185
fork+0.183
 twe+0.174
 lig+0.171
Neg logit contributions
��-0.310
ivism-0.272
idem-0.267
��-0.252
icles-0.249
Token 15
Token ID1315
Feature activation+7.69e-01

Pos logit contributions
 Spur+0.415
 misunderstand+0.393
fork+0.386
 twe+0.380
 Learns+0.378
Neg logit contributions
��-0.363
ivism-0.304
idem-0.298
icles-0.272
��-0.271
Token Photos
Token ID9434
Feature activation+2.38e+00

Pos logit contributions
��+1.499
ivism+1.331
idem+1.308
��+1.222
icles+1.219
Neg logit contributions
 Spur-0.905
 misunderstand-0.853
fork-0.827
 Learns-0.795
 twe-0.789
Token:
Token ID25
Feature activation-7.52e-03

Pos logit contributions
��+3.434
ivism+3.003
idem+2.906
icles+2.729
ão+2.560
Neg logit contributions
 Spur-3.841
 misunderstand-3.762
fork-3.660
 twe-3.593
 lig-3.506
Token Pasadena
Token ID41010
Feature activation-2.31e+00

Pos logit contributions
��+0.730
ivism+0.650
idem+0.637
icles+0.599
��+0.586
Neg logit contributions
 Spur-0.500
 misunderstand-0.476
fork-0.466
 twe-0.451
 lig-0.440
Token,
Token ID11
Feature activation+2.01e-01

Pos logit contributions
 Spur+3.259
 misunderstand+3.112
fork+3.080
 twe+2.960
 lig+2.897
Neg logit contributions
��-3.812
ivism-3.371
idem-3.303
icles-3.085
��-2.994
Token California
Token ID3442
Feature activation+2.85e-01

Pos logit contributions
��+0.361
ivism+0.321
idem+0.314
icles+0.297
��+0.286
Neg logit contributions
 Spur-0.259
 misunderstand-0.250
fork-0.244
 twe-0.236
 lig-0.230
Token.
Token ID13
Feature activation+2.83e-01

Pos logit contributions
��+0.414
ivism+0.358
idem+0.346
icles+0.322
��+0.306
Neg logit contributions
 Spur-0.465
 misunderstand-0.455
fork-0.443
 twe-0.432
 Learns-0.426
Token Hide
Token ID10415
Feature activation+1.35e+00

Pos logit contributions
��+0.525
ivism+0.466
idem+0.458
icles+0.432
��+0.422
Neg logit contributions
 Spur-0.354
 misunderstand-0.337
fork-0.329
 twe-0.317
 Learns-0.312
Token Caption
Token ID11260
Feature activation+3.93e+00

Pos logit contributions
��+1.688
ivism+1.408
idem+1.368
icles+1.244
��+1.188
Neg logit contributions
 Spur-2.488
 misunderstand-2.422
fork-2.384
 twe-2.321
 Learns-2.290
Token 12
Token ID1105
Feature activation+1.84e-01

Pos logit contributions
��+4.769
ivism+4.208
idem+4.007
icles+3.874
ão+3.492
Neg logit contributions
 Spur-6.797
 misunderstand-6.761
fork-6.597
 twe-6.486
 deem-6.345
Token of
Token ID286
Feature activation-1.78e-01

Pos logit contributions
��+0.351
ivism+0.308
idem+0.303
��+0.285
icles+0.283
Neg logit contributions
 Spur-0.224
 misunderstand-0.215
fork-0.212
 twe-0.201
 lig-0.198
Token 15
Token ID1315
Feature activation+9.29e-01

Pos logit contributions
 Spur+0.301
 misunderstand+0.288
fork+0.282
 twe+0.276
 Learns+0.272
Neg logit contributions
��-0.260
ivism-0.216
idem-0.213
icles-0.195
��-0.194
Token Photos
Token ID9434
Feature activation+2.45e+00

Pos logit contributions
��+1.771
ivism+1.572
idem+1.543
icles+1.448
��+1.434
Neg logit contributions
 Spur-1.118
 misunderstand-1.064
fork-1.038
 twe-0.991
 Learns-0.986
Token:
Token ID25
Feature activation+1.87e-02

Pos logit contributions
��+3.509
ivism+3.080
idem+2.975
icles+2.790
ão+2.620
Neg logit contributions
 Spur-3.947
 misunderstand-3.862
fork-3.760
 twe-3.692
 Learns-3.602
Token Pasadena
Token ID41010
Feature activation-2.30e+00

Pos logit contributions
��+1.099
ivism+0.971
idem+0.952
icles+0.896
��+0.873
Neg logit contributions
 Spur-0.811
 misunderstand-0.775
fork-0.759
 twe-0.734
 lig-0.716
Token,
Token ID11
Feature activation+1.73e-01

Pos logit contributions
 Spur+3.205
 misunderstand+3.050
fork+3.038
 twe+2.901
 lig+2.844
Neg logit contributions
��-3.834
ivism-3.414
idem-3.340
icles-3.123
��-3.010
Token California
Token ID3442
Feature activation+2.94e-01

Pos logit contributions
��+0.319
ivism+0.284
idem+0.279
icles+0.263
��+0.257
Neg logit contributions
 Spur-0.217
 misunderstand-0.208
fork-0.204
 twe-0.197
 lig-0.192
Token.
Token ID13
Feature activation+1.70e-01

Pos logit contributions
��+0.422
ivism+0.365
idem+0.352
icles+0.329
��+0.311
Neg logit contributions
 Spur-0.482
 misunderstand-0.473
fork-0.459
 twe-0.449
 Learns-0.442
Token Hide
Token ID10415
Feature activation+1.27e+00

Pos logit contributions
��+0.315
ivism+0.280
idem+0.275
icles+0.259
��+0.253
Neg logit contributions
 Spur-0.212
 misunderstand-0.201
fork-0.197
 twe-0.189
 Learns-0.186
Token Caption
Token ID11260
Feature activation+3.93e+00

Pos logit contributions
��+1.608
ivism+1.340
idem+1.305
icles+1.185
��+1.140
Neg logit contributions
 Spur-2.330
 misunderstand-2.265
fork-2.231
 twe-2.170
 Learns-2.144
Token 11
Token ID1367
Feature activation+1.07e+00

Pos logit contributions
��+4.723
ivism+4.174
icles+3.862
idem+3.751
ão+3.497
Neg logit contributions
 Spur-6.807
 misunderstand-6.658
fork-6.499
 twe-6.400
 deem-6.330
Token of
Token ID286
Feature activation-2.28e-01

Pos logit contributions
��+1.954
ivism+1.726
idem+1.691
icles+1.590
��+1.561
Neg logit contributions
 Spur-1.356
 misunderstand-1.306
fork-1.280
 twe-1.229
 lig-1.202
Token 15
Token ID1315
Feature activation+8.18e-01

Pos logit contributions
 Spur+0.385
 misunderstand+0.368
fork+0.360
 twe+0.352
 Learns+0.348
Neg logit contributions
��-0.336
ivism-0.278
idem-0.274
icles-0.251
��-0.251
Token Photos
Token ID9434
Feature activation+2.35e+00

Pos logit contributions
��+1.572
ivism+1.394
idem+1.370
icles+1.282
��+1.279
Neg logit contributions
 Spur-0.976
 misunderstand-0.926
fork-0.904
 Learns-0.861
 twe-0.861
Token:
Token ID25
Feature activation-7.08e-02

Pos logit contributions
��+3.401
ivism+2.978
idem+2.880
icles+2.699
ão+2.537
Neg logit contributions
 Spur-3.771
 misunderstand-3.684
fork-3.591
 twe-3.519
 Learns-3.436
Token,
Token ID11
Feature activation-5.12e-01

Pos logit contributions
��+0.183
ivism+0.160
idem+0.156
icles+0.147
��+0.142
Neg logit contributions
 Spur-0.155
 misunderstand-0.151
fork-0.147
 twe-0.143
 lig-0.141
Token we
Token ID356
Feature activation+6.51e-01

Pos logit contributions
 Spur+0.730
 misunderstand+0.713
fork+0.687
 twe+0.675
 lig+0.667
Neg logit contributions
��-0.861
ivism-0.744
idem-0.726
icles-0.690
��-0.675
Token will
Token ID481
Feature activation+9.20e-01

Pos logit contributions
��+1.255
ivism+1.129
idem+1.107
icles+1.048
��+1.012
Neg logit contributions
 Spur-0.750
 misunderstand-0.720
fork-0.702
 twe-0.673
 Learns-0.656
Token use
Token ID779
Feature activation-2.24e+00

Pos logit contributions
��+1.712
ivism+1.537
idem+1.508
icles+1.422
��+1.370
Neg logit contributions
 Spur-1.112
 misunderstand-1.069
fork-1.047
 twe-1.000
 Learns-0.983
Token the
Token ID262
Feature activation-1.26e-02

Pos logit contributions
 Spur+2.920
 misunderstand+2.810
 twe+2.699
fork+2.687
 lig+2.666
Neg logit contributions
��-4.004
ivism-3.449
idem-3.388
icles-3.260
��-3.183
Token
Token ID18074
Feature activation+3.91e+00

Pos logit contributions
 Spur+0.016
 misunderstand+0.015
fork+0.015
 twe+0.014
 lig+0.014
Neg logit contributions
��-0.023
ivism-0.021
idem-0.020
icles-0.019
��-0.019
Token
Token ID229
Feature activation+1.91e+00

Pos logit contributions
��+2.240
ivism+1.398
idem+1.269
icles+0.900
��+0.784
Neg logit contributions
 Spur-9.837
 misunderstand-9.644
fork-9.542
 twe-9.319
 Learns-9.262
Token2
Token ID17
Feature activation-1.07e+00

Pos logit contributions
��+3.188
ivism+2.827
idem+2.762
icles+2.598
��+2.478
Neg logit contributions
 Spur-2.673
 misunderstand-2.587
fork-2.529
 twe-2.436
 Learns-2.396
Token test
Token ID1332
Feature activation-1.95e-01

Pos logit contributions
 Spur+1.159
 misunderstand+1.135
fork+1.088
 twe+1.068
 lig+1.047
Neg logit contributions
��-2.140
ivism-1.905
idem-1.846
icles-1.778
��-1.740
Token for
Token ID329
Feature activation+8.46e-01

Pos logit contributions
 Spur+0.265
 misunderstand+0.259
fork+0.254
 twe+0.248
 lig+0.242
Neg logit contributions
��-0.338
ivism-0.293
idem-0.287
icles-0.271
��-0.265
Token each
Token ID1123
Feature activation+1.35e+00

Pos logit contributions
��+1.623
ivism+1.449
idem+1.419
icles+1.347
��+1.307
Neg logit contributions
 Spur-0.991
 misunderstand-0.956
fork-0.927
 twe-0.896
 lig-0.870
Token back
Token ID736
Feature activation-4.02e-01

Pos logit contributions
��+0.376
ivism+0.328
idem+0.323
icles+0.302
��+0.295
Neg logit contributions
 Spur-0.287
 misunderstand-0.281
fork-0.270
 twe-0.264
 lig-0.257
Token'
Token ID6
Feature activation-1.66e-01

Pos logit contributions
 Spur+0.539
 misunderstand+0.526
fork+0.506
 twe+0.497
 deem+0.480
Neg logit contributions
��-0.711
ivism-0.614
idem-0.609
icles-0.575
��-0.562
Token the
Token ID262
Feature activation+7.38e-02

Pos logit contributions
 Spur+0.219
 misunderstand+0.212
fork+0.205
 twe+0.202
 lig+0.195
Neg logit contributions
��-0.297
ivism-0.260
idem-0.256
icles-0.241
��-0.235
Token web
Token ID3992
Feature activation+1.29e+00

Pos logit contributions
��+0.143
ivism+0.127
idem+0.125
icles+0.119
��+0.116
Neg logit contributions
 Spur-0.085
 misunderstand-0.082
fork-0.080
 twe-0.078
 lig-0.075
Token A
Token ID317
Feature activation+2.05e-01

Pos logit contributions
��+2.212
ivism+1.942
idem+1.897
icles+1.787
��+1.731
Neg logit contributions
 Spur-1.780
 misunderstand-1.724
fork-1.691
 twe-1.638
 Learns-1.595
Token Downing
Token ID31355
Feature activation-4.65e+00

Pos logit contributions
��+0.391
ivism+0.350
idem+0.341
icles+0.325
��+0.315
Neg logit contributions
 Spur-0.242
 misunderstand-0.233
fork-0.227
 twe-0.220
 lig-0.213
Token Street
Token ID3530
Feature activation-7.23e-01

Pos logit contributions
 Spur+3.848
 misunderstand+3.627
fork+3.546
 twe+3.461
 lig+3.352
Neg logit contributions
��-10.180
ivism-9.511
idem-9.208
icles-8.857
��-8.585
Token petition
Token ID8853
Feature activation-3.75e-01

Pos logit contributions
 Spur+0.710
 misunderstand+0.685
fork+0.659
 twe+0.643
 lig+0.614
Neg logit contributions
��-1.530
ivism-1.379
idem-1.349
icles-1.291
��-1.259
Token is
Token ID318
Feature activation+2.38e-01

Pos logit contributions
 Spur+0.475
 misunderstand+0.462
fork+0.451
 twe+0.437
 deem+0.426
Neg logit contributions
��-0.687
ivism-0.604
idem-0.596
icles-0.558
��-0.548
Token calling
Token ID4585
Feature activation-7.40e-01

Pos logit contributions
��+0.407
ivism+0.358
idem+0.351
icles+0.330
��+0.318
Neg logit contributions
 Spur-0.330
 misunderstand-0.321
fork-0.313
 twe-0.304
 deem-0.299
Token for
Token ID329
Feature activation+1.35e+00

Pos logit contributions
 Spur+0.983
 misunderstand+0.944
fork+0.926
 twe+0.896
 lig+0.868
Neg logit contributions
��-1.323
ivism-1.154
idem-1.128
icles-1.064
��-1.037
Token)
Token ID8
Feature activation-5.70e-03

Pos logit contributions
 Spur+2.146
 misunderstand+2.089
fork+2.037
 twe+1.977
 lig+1.927
Neg logit contributions
��-2.874
ivism-2.538
idem-2.491
icles-2.346
��-2.264
Token who
Token ID508
Feature activation-7.05e-01

Pos logit contributions
 Spur+0.007
 misunderstand+0.007
fork+0.007
 twe+0.007
 lig+0.007
Neg logit contributions
��-0.010
ivism-0.009
idem-0.009
icles-0.008
��-0.008
Token were
Token ID547
Feature activation-1.82e-01

Pos logit contributions
 Spur+0.871
 misunderstand+0.854
fork+0.819
 twe+0.802
 deem+0.797
Neg logit contributions
��-1.305
ivism-1.148
idem-1.136
icles-1.066
��-1.044
Token marching
Token ID26409
Feature activation+1.20e+00

Pos logit contributions
 Spur+0.244
 misunderstand+0.239
fork+0.230
 twe+0.227
 deem+0.222
Neg logit contributions
��-0.318
ivism-0.282
idem-0.279
icles-0.261
��-0.250
Token on
Token ID319
Feature activation-1.70e+00

Pos logit contributions
��+2.028
ivism+1.777
idem+1.747
icles+1.639
��+1.580
Neg logit contributions
 Spur-1.677
 misunderstand-1.626
fork-1.585
 twe-1.542
 lig-1.504
Token Downing
Token ID31355
Feature activation-4.51e+00

Pos logit contributions
 Spur+2.221
 misunderstand+2.100
fork+2.040
 twe+2.017
 lig+1.939
Neg logit contributions
��-3.046
ivism-2.667
idem-2.642
icles-2.465
ão-2.426
Token Street
Token ID3530
Feature activation-1.04e+00

Pos logit contributions
 Spur+3.737
 misunderstand+3.376
 twe+3.207
fork+3.206
 lig+3.142
Neg logit contributions
��-9.851
ivism-9.194
idem-8.991
icles-8.563
��-8.395
Token,
Token ID11
Feature activation+1.42e-01

Pos logit contributions
 Spur+1.343
 misunderstand+1.321
fork+1.275
 twe+1.245
 deem+1.220
Neg logit contributions
��-1.850
ivism-1.630
idem-1.617
icles-1.505
��-1.481
Token clashed
Token ID37003
Feature activation-1.81e-01

Pos logit contributions
��+0.249
ivism+0.221
idem+0.217
icles+0.203
��+0.198
Neg logit contributions
 Spur-0.190
 misunderstand-0.184
fork-0.179
 twe-0.175
 deem-0.170
Token with
Token ID351
Feature activation+4.44e-01

Pos logit contributions
 Spur+0.279
 misunderstand+0.275
fork+0.265
 twe+0.262
 lig+0.256
Neg logit contributions
��-0.284
ivism-0.242
idem-0.240
icles-0.220
��-0.215
Token anti
Token ID3098
Feature activation+3.16e-01

Pos logit contributions
��+0.791
ivism+0.702
idem+0.688
icles+0.647
��+0.626
Neg logit contributions
 Spur-0.582
 misunderstand-0.564
fork-0.546
 twe-0.534
 lig-0.517
Token Mike
Token ID4995
Feature activation+1.52e+00

Pos logit contributions
��+0.367
ivism+0.326
idem+0.318
icles+0.297
��+0.289
Neg logit contributions
 Spur-0.289
 misunderstand-0.280
fork-0.270
 twe-0.265
 lig-0.258
Token Bloom
Token ID11891
Feature activation-3.81e+00

Pos logit contributions
��+2.677
ivism+2.375
idem+2.317
icles+2.186
��+2.108
Neg logit contributions
 Spur-2.032
 misunderstand-1.960
fork-1.915
 twe-1.850
 Learns-1.806
Token (@
Token ID4275
Feature activation+4.89e-02

Pos logit contributions
 Spur+4.583
fork+4.425
 misunderstand+4.256
 twe+4.077
berry+4.035
Neg logit contributions
��-7.009
ivism-6.109
idem-5.956
��-5.695
icles-5.541
TokenAM
Token ID2390
Feature activation-1.04e+00

Pos logit contributions
��+0.083
ivism+0.074
idem+0.072
icles+0.067
��+0.065
Neg logit contributions
 Spur-0.067
 misunderstand-0.065
fork-0.064
 twe-0.062
 Learns-0.060
Tokenike
Token ID522
Feature activation+6.42e-01

Pos logit contributions
 Spur+1.442
 misunderstand+1.393
fork+1.380
 twe+1.313
 Learns+1.275
Neg logit contributions
��-1.761
ivism-1.539
idem-1.488
icles-1.383
��-1.380
TokenBloom
Token ID38941
Feature activation-4.50e+00

Pos logit contributions
��+1.265
ivism+1.141
idem+1.120
icles+1.065
��+1.027
Neg logit contributions
 Spur-0.710
 misunderstand-0.684
fork-0.658
 twe-0.635
 Learns-0.620
TokenType
Token ID6030
Feature activation-6.19e-01

Pos logit contributions
 Spur+5.366
fork+5.313
 misunderstand+5.081
 twe+4.973
berry+4.810
Neg logit contributions
��-7.965
ivism-6.792
idem-6.630
��-6.383
ão-6.160
Token)
Token ID8
Feature activation-4.65e-01

Pos logit contributions
 Spur+0.788
 misunderstand+0.761
fork+0.753
 twe+0.725
 lig+0.696
Neg logit contributions
��-1.126
ivism-0.985
idem-0.965
icles-0.904
��-0.895
Token and
Token ID290
Feature activation-4.26e-01

Pos logit contributions
 Spur+0.595
 misunderstand+0.569
fork+0.556
 twe+0.541
 deem+0.523
Neg logit contributions
��-0.847
ivism-0.753
idem-0.732
icles-0.687
��-0.673
Token a
Token ID257
Feature activation-3.23e-01

Pos logit contributions
 Spur+0.593
 misunderstand+0.575
fork+0.552
 twe+0.548
 lig+0.530
Neg logit contributions
��-0.729
ivism-0.645
idem-0.628
icles-0.581
��-0.575
Token
Token ID564
Feature activation+4.29e-01

Pos logit contributions
 Spur+0.388
 misunderstand+0.373
fork+0.359
 twe+0.353
 lig+0.340
Neg logit contributions
��-0.612
ivism-0.550
idem-0.537
icles-0.504
��-0.495
Token the
Token ID262
Feature activation-8.66e-03

Pos logit contributions
��+1.497
ivism+1.319
idem+1.292
icles+1.213
��+1.174
Neg logit contributions
 Spur-1.229
 misunderstand-1.193
fork-1.158
 twe-1.124
 Learns-1.100
Token cost
Token ID1575
Feature activation+1.78e-01

Pos logit contributions
 Spur+0.011
 misunderstand+0.010
fork+0.010
 twe+0.010
 lig+0.009
Neg logit contributions
��-0.016
ivism-0.014
idem-0.014
icles-0.013
��-0.013
Token of
Token ID286
Feature activation+2.18e-01

Pos logit contributions
��+0.325
ivism+0.288
idem+0.283
icles+0.266
��+0.259
Neg logit contributions
 Spur-0.225
 misunderstand-0.218
fork-0.212
 twe-0.204
 lig-0.200
Token renting
Token ID34245
Feature activation-7.52e-02

Pos logit contributions
��+0.372
ivism+0.327
idem+0.321
icles+0.300
��+0.294
Neg logit contributions
 Spur-0.305
 misunderstand-0.297
fork-0.286
 twe-0.280
 lig-0.275
Token is
Token ID318
Feature activation+1.22e-01

Pos logit contributions
 Spur+0.090
 misunderstand+0.087
fork+0.084
 twe+0.081
 lig+0.080
Neg logit contributions
��-0.144
ivism-0.127
idem-0.126
icles-0.118
��-0.116
Token fue
Token ID37911
Feature activation-4.47e+00

Pos logit contributions
��+0.238
ivism+0.213
idem+0.209
icles+0.199
��+0.192
Neg logit contributions
 Spur-0.140
 misunderstand-0.137
fork-0.130
 twe-0.127
 lig-0.126
Tokenlling
Token ID2680
Feature activation-1.29e+00

Pos logit contributions
 Spur+1.828
 misunderstand+1.660
fork+1.640
 twe+1.300
 lig+1.279
Neg logit contributions
��-11.505
idem-10.528
ivism-10.521
��-10.136
icles-10.059
Token wage
Token ID7699
Feature activation-1.14e-01

Pos logit contributions
 Spur+1.667
 misunderstand+1.656
fork+1.549
 twe+1.523
 lig+1.483
Neg logit contributions
��-2.333
ivism-2.036
idem-2.004
icles-1.907
��-1.859
Token demands
Token ID8665
Feature activation-7.37e-01

Pos logit contributions
 Spur+0.137
 misunderstand+0.135
fork+0.131
 twe+0.126
 lig+0.123
Neg logit contributions
��-0.216
ivism-0.190
idem-0.189
icles-0.178
��-0.174
Token from
Token ID422
Feature activation-7.48e-03

Pos logit contributions
 Spur+1.013
 misunderstand+0.986
fork+0.964
 twe+0.932
 lig+0.910
Neg logit contributions
��-1.272
ivism-1.099
idem-1.088
icles-1.020
��-1.012
Token workers
Token ID3259
Feature activation+3.24e-01

Pos logit contributions
 Spur+0.010
 misunderstand+0.009
fork+0.009
 twe+0.009
 lig+0.009
Neg logit contributions
��-0.014
ivism-0.012
idem-0.012
icles-0.011
��-0.011
Token he
Token ID339
Feature activation-5.52e-01

Pos logit contributions
��+1.884
ivism+1.654
idem+1.616
icles+1.516
��+1.457
Neg logit contributions
 Spur-1.659
 misunderstand-1.611
fork-1.573
 twe-1.529
 lig-1.493
Token
Token ID447
Feature activation-5.87e-01

Pos logit contributions
 Spur+0.613
 misunderstand+0.608
fork+0.574
 twe+0.555
 lig+0.543
Neg logit contributions
��-1.089
ivism-0.969
idem-0.953
icles-0.902
��-0.891
Token
Token ID247
Feature activation+2.69e-02

Pos logit contributions
 Spur+1.401
 misunderstand+1.384
fork+1.375
 twe+1.347
 lig+1.347
Neg logit contributions
��-0.330
ivism-0.271
idem-0.254
icles-0.179
imus-0.157
Tokens
Token ID82
Feature activation-2.00e-01

Pos logit contributions
��+0.043
ivism+0.037
idem+0.036
icles+0.033
��+0.033
Neg logit contributions
 Spur-0.041
 misunderstand-0.040
fork-0.039
 twe-0.038
 Learns-0.037
Token not
Token ID407
Feature activation+6.64e-01

Pos logit contributions
 Spur+0.238
 misunderstand+0.233
fork+0.223
 twe+0.220
 deem+0.215
Neg logit contributions
��-0.376
ivism-0.333
idem-0.331
icles-0.312
��-0.302
Token impe
Token ID18516
Feature activation-4.32e+00

Pos logit contributions
��+1.230
ivism+1.090
idem+1.078
icles+1.017
��+0.984
Neg logit contributions
 Spur-0.818
 misunderstand-0.793
fork-0.767
 twe-0.749
 lig-0.729
Tokenached
Token ID2317
Feature activation-1.53e+00

Pos logit contributions
fork+4.404
 Spur+4.272
 misunderstand+4.104
 twe+4.025
 lig+3.904
Neg logit contributions
��-8.613
ivism-7.573
idem-7.275
��-7.100
icles-6.877
Token within
Token ID1626
Feature activation+1.56e+00

Pos logit contributions
 Spur+2.006
 misunderstand+1.950
fork+1.891
 twe+1.855
 lig+1.807
Neg logit contributions
��-2.680
ivism-2.327
idem-2.295
icles-2.151
��-2.131
Token the
Token ID262
Feature activation-8.63e-03

Pos logit contributions
��+2.396
ivism+2.084
idem+2.025
icles+1.890
��+1.816
Neg logit contributions
 Spur-2.435
 misunderstand-2.370
fork-2.314
 twe-2.259
 lig-2.209
Token next
Token ID1306
Feature activation+7.70e-01

Pos logit contributions
 Spur+0.013
 misunderstand+0.013
fork+0.012
 twe+0.012
 lig+0.012
Neg logit contributions
��-0.014
ivism-0.012
idem-0.011
icles-0.011
��-0.010
Token year
Token ID614
Feature activation-1.19e+00

Pos logit contributions
��+1.225
ivism+1.076
idem+1.055
icles+0.974
��+0.956
Neg logit contributions
 Spur-1.155
 misunderstand-1.125
fork-1.092
 twe-1.077
 lig-1.051
Token UK
Token ID3482
Feature activation+1.13e-01

Pos logit contributions
��+0.537
ivism+0.469
idem+0.456
icles+0.427
��+0.415
Neg logit contributions
 Spur-0.475
 misunderstand-0.463
fork-0.450
 twe-0.439
 lig-0.431
Token
Token ID564
Feature activation-3.82e-02

Pos logit contributions
��+0.212
ivism+0.187
idem+0.184
icles+0.173
��+0.170
Neg logit contributions
 Spur-0.140
 misunderstand-0.136
fork-0.131
 twe-0.127
 lig-0.124
Token
Token ID250
Feature activation+1.76e-01

Pos logit contributions
 Spur+0.072
 misunderstand+0.070
fork+0.070
 twe+0.068
 lig+0.067
Neg logit contributions
��-0.044
ivism-0.038
idem-0.037
icles-0.033
imus-0.030
Tokenmay
Token ID11261
Feature activation-1.39e+00

Pos logit contributions
��+0.307
ivism+0.271
idem+0.262
icles+0.249
��+0.244
Neg logit contributions
 Spur-0.237
 misunderstand-0.230
fork-0.226
 twe-0.217
 deem-0.212
Token be
Token ID307
Feature activation-3.28e-01

Pos logit contributions
 misunderstand+1.603
 Spur+1.602
fork+1.502
 twe+1.465
 deem+1.453
Neg logit contributions
��-2.720
ivism-2.394
idem-2.335
icles-2.205
��-2.179
Token fue
Token ID37911
Feature activation-4.19e+00

Pos logit contributions
 Spur+0.400
 misunderstand+0.392
fork+0.377
 twe+0.366
 lig+0.359
Neg logit contributions
��-0.612
ivism-0.547
idem-0.537
icles-0.507
��-0.486
Tokenlling
Token ID2680
Feature activation-1.14e+00

Pos logit contributions
 Spur+1.875
 misunderstand+1.837
fork+1.763
 lig+1.444
 twe+1.438
Neg logit contributions
��-10.583
ivism-9.683
idem-9.624
icles-9.220
��-9.214
Token xen
Token ID27132
Feature activation+1.08e+00

Pos logit contributions
 misunderstand+1.288
 Spur+1.287
fork+1.196
 twe+1.166
 lig+1.128
Neg logit contributions
��-2.227
ivism-1.979
idem-1.947
icles-1.861
��-1.797
Tokenophobic
Token ID23468
Feature activation+6.03e-01

Pos logit contributions
��+1.375
ivism+1.134
idem+1.092
icles+1.009
��+0.964
Neg logit contributions
 Spur-1.967
 misunderstand-1.925
fork-1.889
 twe-1.843
 lig-1.809
Token sentiment
Token ID15598
Feature activation-8.99e-01

Pos logit contributions
��+1.136
ivism+1.014
idem+0.993
icles+0.942
��+0.910
Neg logit contributions
 Spur-0.725
 misunderstand-0.701
fork-0.680
 twe-0.656
 lig-0.636
Token
Token ID447
Feature activation-4.61e-01

Pos logit contributions
 Spur+1.216
 misunderstand+1.193
fork+1.142
 twe+1.119
 deem+1.088
Neg logit contributions
��-1.561
ivism-1.344
idem-1.328
icles-1.270
��-1.235
Tokenattack
Token ID20358
Feature activation-2.31e-01

Pos logit contributions
 Spur+0.198
fork+0.197
 misunderstand+0.191
 twe+0.185
 deem+0.181
Neg logit contributions
��-0.182
ivism-0.151
idem-0.144
��-0.139
icles-0.138
Token data
Token ID1366
Feature activation-1.53e+00

Pos logit contributions
 Spur+0.263
 misunderstand+0.252
fork+0.243
 twe+0.239
 lig+0.232
Neg logit contributions
��-0.454
ivism-0.407
idem-0.397
icles-0.379
��-0.374
Token.
Token ID13
Feature activation-3.01e-01

Pos logit contributions
 Spur+2.150
 misunderstand+2.083
 twe+1.994
fork+1.993
 deem+1.942
Neg logit contributions
��-2.597
ivism-2.264
idem-2.224
icles-2.071
��-2.071
Token Airbus
Token ID39173
Feature activation+9.44e-01

Pos logit contributions
 Spur+0.373
 misunderstand+0.354
fork+0.344
 twe+0.332
 Learns+0.329
Neg logit contributions
��-0.565
ivism-0.503
idem-0.490
icles-0.465
��-0.455
Token later
Token ID1568
Feature activation-1.96e+00

Pos logit contributions
��+1.757
ivism+1.570
idem+1.536
icles+1.456
��+1.401
Neg logit contributions
 Spur-1.140
 misunderstand-1.098
fork-1.079
 twe-1.032
 Learns-1.007
Token rede
Token ID21459
Feature activation-4.14e+00

Pos logit contributions
 Spur+2.323
 misunderstand+2.295
fork+2.143
 twe+2.140
 deem+2.120
Neg logit contributions
��-3.777
ivism-3.315
idem-3.275
��-3.086
icles-3.083
Token -
Token ID532
Feature activation-7.76e-01

Pos logit contributions
 Spur+5.712
 misunderstand+5.640
fork+5.497
 Learns+5.290
 twe+5.281
Neg logit contributions
��-6.707
ivism-5.803
idem-5.633
��-5.375
icles-5.181
Token signed
Token ID4488
Feature activation-9.64e-01

Pos logit contributions
 Spur+1.006
 misunderstand+0.982
fork+0.958
 twe+0.931
 Learns+0.904
Neg logit contributions
��-1.386
ivism-1.232
idem-1.192
icles-1.138
��-1.110
Token the
Token ID262
Feature activation-1.85e-01

Pos logit contributions
 Spur+1.351
 misunderstand+1.326
fork+1.263
 twe+1.249
 lig+1.221
Neg logit contributions
��-1.640
ivism-1.436
idem-1.385
icles-1.312
��-1.296
Token algorithm
Token ID11862
Feature activation-9.08e-01

Pos logit contributions
 Spur+0.204
 misunderstand+0.198
fork+0.189
 twe+0.184
 lig+0.178
Neg logit contributions
��-0.368
ivism-0.330
idem-0.323
icles-0.307
��-0.300
Token and
Token ID290
Feature activation+2.22e-02

Pos logit contributions
 Spur+1.207
 misunderstand+1.184
fork+1.147
 twe+1.115
 deem+1.096
Neg logit contributions
��-1.614
ivism-1.410
idem-1.370
icles-1.307
��-1.294
Token media
Token ID2056
Feature activation+9.03e-01

Pos logit contributions
��+1.265
ivism+1.136
idem+1.112
icles+1.057
��+1.022
Neg logit contributions
 Spur-0.743
 misunderstand-0.715
fork-0.695
 twe-0.668
 Learns-0.648
Token can
Token ID460
Feature activation-1.19e+00

Pos logit contributions
��+1.690
ivism+1.506
idem+1.475
icles+1.396
��+1.354
Neg logit contributions
 Spur-1.101
 misunderstand-1.062
fork-1.032
 twe-0.997
 lig-0.969
Token provide
Token ID2148
Feature activation+8.15e-01

Pos logit contributions
 Spur+1.379
 misunderstand+1.342
fork+1.266
 twe+1.249
 deem+1.218
Neg logit contributions
��-2.355
ivism-2.079
idem-2.035
icles-1.929
��-1.902
Token platforms
Token ID9554
Feature activation+1.25e+00

Pos logit contributions
��+1.466
ivism+1.301
idem+1.271
icles+1.202
��+1.159
Neg logit contributions
 Spur-1.056
 misunderstand-1.024
fork-0.991
 twe-0.964
 lig-0.940
Token to
Token ID284
Feature activation+1.26e-01

Pos logit contributions
��+2.103
ivism+1.857
idem+1.811
icles+1.705
��+1.638
Neg logit contributions
 Spur-1.739
 misunderstand-1.685
fork-1.647
 twe-1.593
 Learns-1.557
Token dissemin
Token ID30548
Feature activation-4.11e+00

Pos logit contributions
��+0.244
ivism+0.218
idem+0.213
icles+0.202
��+0.197
Neg logit contributions
 Spur-0.147
 misunderstand-0.142
fork-0.136
 twe-0.133
 deem-0.129
Tokenate
Token ID378
Feature activation-1.03e+00

Pos logit contributions
 Spur+3.188
 misunderstand+2.997
ately+2.864
fork+2.823
 twe+2.749
Neg logit contributions
��-9.254
ivism-8.170
idem-8.019
��-7.811
icles-7.681
Token evidence
Token ID2370
Feature activation-1.89e+00

Pos logit contributions
 Spur+0.841
 misunderstand+0.826
 twe+0.732
fork+0.729
 lig+0.711
Neg logit contributions
��-2.351
ivism-2.112
idem-2.088
icles-2.007
��-1.963
Token-
Token ID12
Feature activation+3.17e-01

Pos logit contributions
 Spur+2.304
 misunderstand+2.270
fork+2.140
 twe+2.117
 deem+2.080
Neg logit contributions
��-3.519
ivism-3.039
idem-2.991
icles-2.825
��-2.801
Tokenbased
Token ID3106
Feature activation+2.11e-01

Pos logit contributions
��+0.561
ivism+0.494
idem+0.481
icles+0.455
��+0.443
Neg logit contributions
 Spur-0.419
 misunderstand-0.404
fork-0.397
 twe-0.384
 lig-0.374
Token injury
Token ID5095
Feature activation+1.17e+00

Pos logit contributions
��+0.452
ivism+0.406
idem+0.399
icles+0.382
��+0.372
Neg logit contributions
 Spur-0.204
 misunderstand-0.197
fork-0.186
 twe-0.180
 lig-0.174
Token graduated
Token ID18303
Feature activation-2.72e-01

Pos logit contributions
 Spur+0.820
 misunderstand+0.796
fork+0.766
 twe+0.748
 lig+0.733
Neg logit contributions
��-1.282
ivism-1.130
idem-1.113
icles-1.048
��-1.026
Token in
Token ID287
Feature activation+1.58e-01

Pos logit contributions
 Spur+0.362
 misunderstand+0.348
fork+0.336
 twe+0.331
 Learns+0.322
Neg logit contributions
��-0.484
ivism-0.425
idem-0.415
icles-0.391
��-0.382
Token April
Token ID3035
Feature activation+1.58e-02

Pos logit contributions
��+0.312
ivism+0.278
idem+0.273
icles+0.259
��+0.253
Neg logit contributions
 Spur-0.179
 misunderstand-0.171
fork-0.165
 twe-0.162
 lig-0.156
Token;
Token ID26
Feature activation-1.48e+00

Pos logit contributions
��+0.026
ivism+0.023
idem+0.023
icles+0.021
��+0.021
Neg logit contributions
 Spur-0.022
 misunderstand-0.022
fork-0.021
 twe-0.021
 lig-0.020
Token Brian
Token ID8403
Feature activation-2.91e-01

Pos logit contributions
 Spur+2.043
 misunderstand+1.921
fork+1.867
 twe+1.842
 lig+1.806
Neg logit contributions
��-2.555
ivism-2.226
idem-2.176
icles-2.054
��-2.018
Token Hale
Token ID35056
Feature activation-4.10e+00

Pos logit contributions
 Spur+0.402
 misunderstand+0.377
fork+0.371
 twe+0.357
 Learns+0.353
Neg logit contributions
��-0.509
ivism-0.448
idem-0.435
icles-0.411
��-0.402
Token,
Token ID11
Feature activation-1.24e-01

Pos logit contributions
 Spur+5.294
fork+5.051
 misunderstand+4.962
 twe+4.901
 lig+4.804
Neg logit contributions
��-6.923
ivism-6.193
idem-5.811
icles-5.671
��-5.525
Token a
Token ID257
Feature activation-1.40e-01

Pos logit contributions
 Spur+0.176
 misunderstand+0.170
fork+0.166
 twe+0.162
 lig+0.158
Neg logit contributions
��-0.207
ivism-0.182
idem-0.177
icles-0.167
��-0.162
Token senior
Token ID4664
Feature activation-8.06e-01

Pos logit contributions
 Spur+0.157
 misunderstand+0.150
fork+0.145
 twe+0.142
 lig+0.138
Neg logit contributions
��-0.277
ivism-0.248
idem-0.243
icles-0.231
��-0.225
Token aerospace
Token ID40439
Feature activation+7.07e-01

Pos logit contributions
 Spur+0.907
 misunderstand+0.873
fork+0.833
 twe+0.821
 lig+0.803
Neg logit contributions
��-1.599
ivism-1.423
idem-1.393
icles-1.326
��-1.300
Token engineering
Token ID8705
Feature activation-1.96e-01

Pos logit contributions
��+1.345
ivism+1.202
idem+1.177
icles+1.116
��+1.080
Neg logit contributions
 Spur-0.839
 misunderstand-0.809
fork-0.785
 twe-0.759
 lig-0.738
Token and
Token ID290
Feature activation+1.88e-01

Pos logit contributions
 Spur+2.120
 misunderstand+2.069
fork+2.015
 twe+1.963
 lig+1.955
Neg logit contributions
��-2.860
ivism-2.482
idem-2.466
icles-2.313
��-2.271
Token that
Token ID326
Feature activation+6.83e-01

Pos logit contributions
��+0.320
ivism+0.281
idem+0.276
icles+0.258
��+0.253
Neg logit contributions
 Spur-0.262
 misunderstand-0.256
fork-0.246
 twe-0.241
 lig-0.236
Token he
Token ID339
Feature activation+9.14e-03

Pos logit contributions
��+1.244
ivism+1.103
idem+1.081
icles+1.023
��+0.990
Neg logit contributions
 Spur-0.864
 misunderstand-0.841
fork-0.813
 twe-0.791
 lig-0.771
Token will
Token ID481
Feature activation+8.21e-01

Pos logit contributions
��+0.018
ivism+0.016
idem+0.015
icles+0.015
��+0.014
Neg logit contributions
 Spur-0.010
 misunderstand-0.010
fork-0.010
 twe-0.010
 lig-0.009
Token be
Token ID307
Feature activation-2.53e-01

Pos logit contributions
��+1.578
ivism+1.409
idem+1.379
icles+1.310
��+1.271
Neg logit contributions
 Spur-0.962
 misunderstand-0.934
fork-0.899
 twe-0.869
 lig-0.845
Token impe
Token ID18516
Feature activation-4.10e+00

Pos logit contributions
 Spur+0.270
 misunderstand+0.262
fork+0.251
 twe+0.247
 deem+0.239
Neg logit contributions
��-0.507
ivism-0.457
idem-0.454
icles-0.428
��-0.416
Tokenached
Token ID2317
Feature activation-1.24e+00

Pos logit contributions
fork+4.205
 Spur+4.000
 misunderstand+3.889
 twe+3.790
 lig+3.643
Neg logit contributions
��-8.224
ivism-7.266
idem-7.081
��-6.822
icles-6.641
Token,
Token ID11
Feature activation+3.47e-01

Pos logit contributions
 Spur+1.649
 misunderstand+1.605
fork+1.570
 twe+1.529
 deem+1.489
Neg logit contributions
��-2.153
ivism-1.860
idem-1.843
icles-1.726
��-1.718
Token
Token ID447
Feature activation+2.11e-01

Pos logit contributions
��+0.620
ivism+0.551
idem+0.537
icles+0.508
��+0.493
Neg logit contributions
 Spur-0.454
 misunderstand-0.438
fork-0.427
 twe-0.415
 lig-0.405
Token
Token ID251
Feature activation+2.11e-01

Pos logit contributions
��+0.224
ivism+0.194
idem+0.186
icles+0.167
imus+0.152
Neg logit contributions
 Spur-0.414
 misunderstand-0.404
fork-0.401
 twe-0.394
 lig-0.387
Token she
Token ID673
Feature activation-1.41e-01

Pos logit contributions
��+0.411
ivism+0.369
idem+0.360
icles+0.343
��+0.333
Neg logit contributions
 Spur-0.244
 misunderstand-0.232
fork-0.227
 twe-0.219
 lig-0.212
Token
Token ID447
Feature activation-2.47e-02

Pos logit contributions
 Spur+0.322
 misunderstand+0.312
fork+0.304
 twe+0.295
 lig+0.289
Neg logit contributions
��-0.388
ivism-0.340
idem-0.335
icles-0.316
��-0.304
Token
Token ID247
Feature activation+2.47e-01

Pos logit contributions
 Spur+0.056
 misunderstand+0.055
fork+0.055
 twe+0.054
 Learns+0.053
Neg logit contributions
��-0.019
ivism-0.015
idem-0.014
icles-0.012
ão-0.010
Tokens
Token ID82
Feature activation+5.87e-02

Pos logit contributions
��+0.392
ivism+0.340
idem+0.334
icles+0.313
��+0.300
Neg logit contributions
 Spur-0.371
 misunderstand-0.360
fork-0.352
 twe-0.344
 lig-0.337
Token history
Token ID2106
Feature activation+1.81e-01

Pos logit contributions
��+0.127
ivism+0.115
idem+0.113
icles+0.108
��+0.105
Neg logit contributions
 Spur-0.054
 misunderstand-0.052
fork-0.049
 twe-0.048
 lig-0.046
Token.
Token ID13
Feature activation-2.18e-01

Pos logit contributions
��+0.284
ivism+0.246
idem+0.242
icles+0.226
��+0.217
Neg logit contributions
 Spur-0.274
 misunderstand-0.267
fork-0.261
 twe-0.254
 lig-0.249
Token Bow
Token ID9740
Feature activation-4.06e+00

Pos logit contributions
 Spur+0.271
 misunderstand+0.255
fork+0.250
 twe+0.239
 Learns+0.237
Neg logit contributions
��-0.409
ivism-0.363
idem-0.356
icles-0.337
��-0.329
Tokene
Token ID68
Feature activation-1.91e-01

Pos logit contributions
 Spur+4.658
fork+4.559
 misunderstand+4.499
 lig+4.322
 twe+4.227
Neg logit contributions
��-7.515
ivism-6.395
��-6.261
idem-6.153
icles-5.991
Token is
Token ID318
Feature activation-1.28e-01

Pos logit contributions
 Spur+0.281
 misunderstand+0.272
fork+0.264
 twe+0.256
 lig+0.253
Neg logit contributions
��-0.311
ivism-0.271
idem-0.266
icles-0.251
��-0.240
Token the
Token ID262
Feature activation+1.48e-02

Pos logit contributions
 Spur+0.174
 misunderstand+0.170
fork+0.162
 twe+0.159
 lig+0.158
Neg logit contributions
��-0.217
ivism-0.193
idem-0.191
icles-0.180
��-0.172
Token one
Token ID530
Feature activation+9.08e-01

Pos logit contributions
��+0.027
ivism+0.024
idem+0.023
icles+0.022
��+0.022
Neg logit contributions
 Spur-0.019
 misunderstand-0.019
fork-0.018
 twe-0.017
 lig-0.017
Token prisoner
Token ID17234
Feature activation-1.68e-01

Pos logit contributions
��+1.654
ivism+1.472
idem+1.440
icles+1.361
��+1.316
Neg logit contributions
 Spur-1.156
 misunderstand-1.115
fork-1.082
 twe-1.050
 lig-1.025
Token will
Token ID481
Feature activation+5.18e-01

Pos logit contributions
��+2.250
ivism+2.008
idem+1.968
icles+1.869
��+1.814
Neg logit contributions
 Spur-1.350
 misunderstand-1.305
fork-1.264
 twe-1.216
 lig-1.181
Token massively
Token ID28161
Feature activation-9.28e-01

Pos logit contributions
��+1.008
ivism+0.903
idem+0.884
icles+0.839
��+0.815
Neg logit contributions
 Spur-0.592
 misunderstand-0.571
fork-0.554
 twe-0.532
 lig-0.517
Token impact
Token ID2928
Feature activation-5.00e-01

Pos logit contributions
 Spur+0.905
 misunderstand+0.888
fork+0.840
 twe+0.798
 deem+0.775
Neg logit contributions
��-1.979
ivism-1.787
idem-1.738
icles-1.669
��-1.636
Token on
Token ID319
Feature activation-1.61e+00

Pos logit contributions
 Spur+0.667
 misunderstand+0.649
fork+0.618
 twe+0.604
 lig+0.594
Neg logit contributions
��-0.888
ivism-0.780
idem-0.765
icles-0.719
��-0.705
Token many
Token ID867
Feature activation-1.21e+00

Pos logit contributions
 Spur+2.099
 misunderstand+2.063
fork+1.920
 twe+1.900
 lig+1.868
Neg logit contributions
��-2.870
ivism-2.535
idem-2.484
icles-2.341
��-2.318
Token Northern
Token ID8342
Feature activation-4.04e+00

Pos logit contributions
 Spur+1.373
 misunderstand+1.349
fork+1.258
 twe+1.239
 lig+1.207
Neg logit contributions
��-2.379
ivism-2.148
idem-2.087
icles-1.975
��-1.935
Token Ireland
Token ID7517
Feature activation-2.11e+00

Pos logit contributions
 Spur+4.869
 misunderstand+4.671
fork+4.437
 twe+4.407
 lig+4.293
Neg logit contributions
��-7.418
ivism-6.530
idem-6.357
icles-6.109
��-5.972
Token companies
Token ID2706
Feature activation-4.62e-01

Pos logit contributions
 Spur+2.542
 misunderstand+2.504
fork+2.387
 twe+2.342
 lig+2.308
Neg logit contributions
��-3.940
ivism-3.507
idem-3.396
icles-3.216
��-3.126
Token.
Token ID13
Feature activation-4.16e-01

Pos logit contributions
 Spur+0.640
 misunderstand+0.622
fork+0.603
 twe+0.583
 deem+0.574
Neg logit contributions
��-0.791
ivism-0.694
idem-0.680
icles-0.640
��-0.626
Token\n
Token ID198
Feature activation-3.25e-02

Pos logit contributions
 Spur+0.517
 misunderstand+0.488
fork+0.482
 twe+0.454
 Learns+0.453
Neg logit contributions
��-0.779
ivism-0.695
idem-0.678
icles-0.641
��-0.632
Token\n
Token ID198
Feature activation-1.57e-01

Pos logit contributions
 Spur+0.041
 misunderstand+0.039
fork+0.039
 twe+0.037
 lig+0.036
Neg logit contributions
��-0.061
ivism-0.053
idem-0.052
icles-0.049
��-0.049
Token She
Token ID1375
Feature activation-3.60e-01

Pos logit contributions
��+0.414
ivism+0.367
idem+0.358
icles+0.337
��+0.328
Neg logit contributions
 Spur-0.306
 misunderstand-0.293
fork-0.286
 twe-0.276
 Learns-0.270
Tokenarer
Token ID11258
Feature activation-1.20e+00

Pos logit contributions
 Spur+0.563
fork+0.545
 misunderstand+0.542
 twe+0.517
 lig+0.512
Neg logit contributions
��-0.561
ivism-0.478
idem-0.455
��-0.430
icles-0.428
Token,
Token ID11
Feature activation-7.67e-02

Pos logit contributions
 Spur+1.623
 misunderstand+1.569
fork+1.533
 twe+1.489
 lig+1.463
Neg logit contributions
��-2.086
ivism-1.849
idem-1.815
icles-1.697
��-1.646
Token no
Token ID645
Feature activation+8.44e-01

Pos logit contributions
 Spur+0.111
 misunderstand+0.107
fork+0.104
 twe+0.102
 lig+0.099
Neg logit contributions
��-0.127
ivism-0.112
idem-0.110
icles-0.102
��-0.099
Token Chris
Token ID5180
Feature activation+1.60e+00

Pos logit contributions
��+1.416
ivism+1.250
idem+1.219
icles+1.145
��+1.102
Neg logit contributions
 Spur-1.195
 misunderstand-1.156
fork-1.126
 twe-1.096
 lig-1.070
Token Sutton
Token ID39964
Feature activation-4.01e+00

Pos logit contributions
��+2.721
ivism+2.407
idem+2.348
icles+2.213
��+2.127
Neg logit contributions
 Spur-2.191
 misunderstand-2.134
fork-2.086
 twe-2.018
 lig-1.967
Token,
Token ID11
Feature activation-1.29e-01

Pos logit contributions
 Spur+5.038
 misunderstand+4.812
fork+4.788
 twe+4.631
 lig+4.565
Neg logit contributions
��-6.901
ivism-6.188
idem-6.068
icles-5.695
��-5.556
Token probably
Token ID2192
Feature activation-7.33e-02

Pos logit contributions
 Spur+0.201
 misunderstand+0.195
fork+0.190
 twe+0.186
 lig+0.182
Neg logit contributions
��-0.198
ivism-0.173
idem-0.169
icles-0.157
��-0.150
Token no
Token ID645
Feature activation+8.50e-01

Pos logit contributions
 Spur+0.105
 misunderstand+0.101
fork+0.098
 twe+0.096
 lig+0.094
Neg logit contributions
��-0.123
ivism-0.108
idem-0.106
icles-0.099
��-0.096
Token Kenny
Token ID22102
Feature activation+3.49e-01

Pos logit contributions
��+1.423
ivism+1.256
idem+1.226
icles+1.151
��+1.108
Neg logit contributions
 Spur-1.209
 misunderstand-1.166
fork-1.135
 twe-1.108
 lig-1.082
Token Dal
Token ID12348
Feature activation-4.16e-01

Pos logit contributions
��+0.563
ivism+0.498
idem+0.483
icles+0.457
��+0.436
Neg logit contributions
 Spur-0.519
 misunderstand-0.493
fork-0.490
 twe-0.475
 lig-0.464
Token to
Token ID284
Feature activation-3.07e-01

Pos logit contributions
 Spur+0.798
 misunderstand+0.758
fork+0.734
 twe+0.724
 lig+0.717
Neg logit contributions
��-1.000
ivism-0.868
idem-0.855
icles-0.809
��-0.791
Token UCLA
Token ID21750
Feature activation-7.78e-01

Pos logit contributions
 Spur+0.425
 misunderstand+0.402
fork+0.394
 twe+0.383
 lig+0.376
Neg logit contributions
��-0.530
ivism-0.467
idem-0.459
icles-0.434
��-0.417
Token on
Token ID319
Feature activation-2.41e+00

Pos logit contributions
 Spur+1.028
 misunderstand+0.981
fork+0.949
 twe+0.934
 lig+0.914
Neg logit contributions
��-1.390
ivism-1.226
idem-1.207
icles-1.135
��-1.105
Token Friday
Token ID3217
Feature activation+6.51e-01

Pos logit contributions
 Spur+2.547
 misunderstand+2.384
 twe+2.276
fork+2.276
 lig+2.221
Neg logit contributions
��-4.927
ivism-4.416
idem-4.343
icles-4.125
��-4.075
Token via
Token ID2884
Feature activation-1.35e+00

Pos logit contributions
��+1.067
ivism+0.933
idem+0.912
icles+0.856
��+0.826
Neg logit contributions
 Spur-0.948
 misunderstand-0.917
fork-0.897
 twe-0.872
 lig-0.854
Token Ble
Token ID17175
Feature activation-3.99e+00

Pos logit contributions
 Spur+1.772
 misunderstand+1.657
fork+1.614
 twe+1.606
 lig+1.571
Neg logit contributions
��-2.437
ivism-2.180
idem-2.113
icles-2.006
��-1.958
Tokenacher
Token ID3493
Feature activation+1.48e-01

Pos logit contributions
fork+5.163
 Spur+5.111
 twe+4.940
 misunderstand+4.801
berry+4.678
Neg logit contributions
��-6.742
ivism-6.108
idem-5.635
icles-5.439
��-5.318
Token Report
Token ID6358
Feature activation-1.29e+00

Pos logit contributions
��+0.336
ivism+0.305
idem+0.298
icles+0.286
��+0.281
Neg logit contributions
 Spur-0.127
fork-0.117
 misunderstand-0.117
 twe-0.107
 deem-0.103
Token video
Token ID2008
Feature activation+1.36e-01

Pos logit contributions
 Spur+1.829
 misunderstand+1.733
fork+1.702
 twe+1.684
 lig+1.644
Neg logit contributions
��-2.167
ivism-1.899
idem-1.841
icles-1.735
��-1.698
Token.
Token ID13
Feature activation-2.98e-01

Pos logit contributions
��+0.240
ivism+0.209
idem+0.204
icles+0.193
��+0.189
Neg logit contributions
 Spur-0.185
 misunderstand-0.179
fork-0.175
 twe-0.170
 lig-0.166
Token\n
Token ID198
Feature activation-2.05e-01

Pos logit contributions
 Spur+0.376
 misunderstand+0.352
fork+0.348
 twe+0.336
 Learns+0.331
Neg logit contributions
��-0.553
ivism-0.490
idem-0.481
icles-0.455
��-0.445
Tokenau
Token ID559
Feature activation-3.14e-01

Pos logit contributions
��+1.175
ivism+1.045
idem+1.027
icles+0.973
��+0.965
Neg logit contributions
 Spur-0.640
 misunderstand-0.608
fork-0.605
 twe-0.569
 lig-0.555
Token said
Token ID531
Feature activation+6.78e-01

Pos logit contributions
 Spur+0.386
 misunderstand+0.372
fork+0.363
 twe+0.351
 lig+0.340
Neg logit contributions
��-0.584
ivism-0.520
idem-0.510
icles-0.483
��-0.467
Token.
Token ID13
Feature activation+2.12e-01

Pos logit contributions
��+1.008
ivism+0.876
idem+0.847
icles+0.792
��+0.757
Neg logit contributions
 Spur-1.079
 misunderstand-1.052
fork-1.029
 twe-0.996
 Learns-0.987
Token\n
Token ID198
Feature activation+3.61e-02

Pos logit contributions
��+0.389
ivism+0.347
idem+0.340
icles+0.321
��+0.310
Neg logit contributions
 Spur-0.267
 misunderstand-0.255
fork-0.250
 twe-0.241
 Learns-0.235
Token\n
Token ID198
Feature activation-8.89e-02

Pos logit contributions
��+0.067
ivism+0.059
idem+0.058
icles+0.054
��+0.054
Neg logit contributions
 Spur-0.045
 misunderstand-0.043
fork-0.043
 twe-0.041
 lig-0.040
TokenBloom
Token ID38941
Feature activation-3.98e+00

Pos logit contributions
 Spur+0.116
fork+0.111
 misunderstand+0.110
 twe+0.105
 lig+0.102
Neg logit contributions
��-0.158
ivism-0.140
idem-0.136
icles-0.129
��-0.126
Token coming
Token ID2406
Feature activation+1.20e+00

Pos logit contributions
 Spur+4.839
 misunderstand+4.743
fork+4.574
 twe+4.523
 lig+4.413
Neg logit contributions
��-7.085
idem-6.203
ivism-6.151
��-5.823
icles-5.695
Token early
Token ID1903
Feature activation-1.58e+00

Pos logit contributions
��+2.137
ivism+1.900
idem+1.855
icles+1.754
��+1.687
Neg logit contributions
 Spur-1.566
 misunderstand-1.515
fork-1.479
 twe-1.427
 Learns-1.392
Token to
Token ID284
Feature activation+9.46e-02

Pos logit contributions
 Spur+2.034
 misunderstand+1.971
fork+1.906
 twe+1.871
 lig+1.826
Neg logit contributions
��-2.831
ivism-2.458
idem-2.452
icles-2.311
��-2.264
Token western
Token ID8830
Feature activation-5.04e-01

Pos logit contributions
��+0.177
ivism+0.158
idem+0.155
icles+0.146
��+0.142
Neg logit contributions
 Spur-0.117
 misunderstand-0.112
fork-0.109
 twe-0.105
 lig-0.102
Token Oregon
Token ID8819
Feature activation+8.53e-01

Pos logit contributions
 Spur+0.801
 misunderstand+0.774
fork+0.760
 twe+0.741
 lig+0.727
Neg logit contributions
��-0.758
ivism-0.654
idem-0.642
icles-0.598
��-0.581
Token to
Token ID284
Feature activation+5.74e-03

Pos logit contributions
��+0.066
ivism+0.058
idem+0.057
icles+0.054
��+0.052
Neg logit contributions
 Spur-0.053
 misunderstand-0.052
fork-0.051
 twe-0.049
 lig-0.048
Token gain
Token ID4461
Feature activation-7.83e-01

Pos logit contributions
��+0.011
ivism+0.010
idem+0.010
icles+0.010
��+0.009
Neg logit contributions
 Spur-0.006
 misunderstand-0.006
fork-0.006
 twe-0.006
 deem-0.006
Token the
Token ID262
Feature activation+3.27e-02

Pos logit contributions
 Spur+0.930
 misunderstand+0.902
 twe+0.853
fork+0.851
 deem+0.826
Neg logit contributions
��-1.492
ivism-1.316
idem-1.296
icles-1.230
��-1.194
Token release
Token ID2650
Feature activation-5.59e-01

Pos logit contributions
��+0.062
ivism+0.055
idem+0.054
icles+0.051
��+0.049
Neg logit contributions
 Spur-0.040
 misunderstand-0.038
fork-0.037
 twe-0.036
 lig-0.035
Token of
Token ID286
Feature activation+3.29e-01

Pos logit contributions
 Spur+0.731
 misunderstand+0.707
fork+0.690
 twe+0.666
 Learns+0.650
Neg logit contributions
��-0.993
ivism-0.882
idem-0.861
icles-0.814
��-0.784
Token Bow
Token ID9740
Feature activation-3.97e+00

Pos logit contributions
��+0.580
ivism+0.513
idem+0.503
icles+0.473
��+0.460
Neg logit contributions
 Spur-0.437
 misunderstand-0.423
fork-0.409
 twe-0.401
 lig-0.390
Tokene
Token ID68
Feature activation+2.70e-01

Pos logit contributions
 Spur+5.019
fork+4.946
 misunderstand+4.788
 lig+4.723
 twe+4.661
Neg logit contributions
��-6.977
ivism-6.191
idem-5.863
icles-5.727
��-5.679
Token Ber
Token ID4312
Feature activation-2.14e+00

Pos logit contributions
��+0.437
ivism+0.379
idem+0.367
icles+0.349
��+0.335
Neg logit contributions
 Spur-0.403
 misunderstand-0.388
fork-0.379
 twe-0.369
 lig-0.363
Tokengd
Token ID21287
Feature activation+2.40e-01

Pos logit contributions
 Spur+2.683
 misunderstand+2.595
fork+2.593
 twe+2.554
 lig+2.420
Neg logit contributions
��-3.907
ivism-3.387
idem-3.333
��-3.114
icles-3.093
Tokenahl
Token ID15668
Feature activation-4.19e-01

Pos logit contributions
��+0.456
ivism+0.399
idem+0.395
icles+0.372
��+0.362
Neg logit contributions
 Spur-0.294
 misunderstand-0.279
fork-0.273
 twe-0.269
 lig-0.256
Token had
Token ID550
Feature activation+3.42e-01

Pos logit contributions
 Spur+0.529
 misunderstand+0.517
fork+0.493
 twe+0.485
 lig+0.476
Neg logit contributions
��-0.764
ivism-0.675
idem-0.666
icles-0.623
��-0.606
Token,
Token ID11
Feature activation+3.62e-02

Pos logit contributions
 Spur+2.355
 misunderstand+2.254
 deem+2.177
fork+2.175
 twe+2.150
Neg logit contributions
��-3.244
ivism-2.812
idem-2.784
��-2.607
icles-2.589
Token although
Token ID3584
Feature activation+1.16e+00

Pos logit contributions
��+0.059
ivism+0.051
idem+0.050
icles+0.047
��+0.046
Neg logit contributions
 Spur-0.053
 misunderstand-0.052
fork-0.051
 twe-0.049
 deem-0.049
Token she
Token ID673
Feature activation-5.47e-01

Pos logit contributions
��+1.901
ivism+1.666
idem+1.626
icles+1.524
��+1.469
Neg logit contributions
 Spur-1.682
 misunderstand-1.635
fork-1.593
 twe-1.548
 lig-1.511
Token said
Token ID531
Feature activation+3.49e-01

Pos logit contributions
 Spur+0.646
 misunderstand+0.635
fork+0.604
 twe+0.590
 deem+0.578
Neg logit contributions
��-1.052
ivism-0.924
idem-0.915
icles-0.861
��-0.848
Token the
Token ID262
Feature activation-1.79e-01

Pos logit contributions
��+0.593
ivism+0.520
idem+0.510
icles+0.477
��+0.464
Neg logit contributions
 Spur-0.492
 misunderstand-0.480
fork-0.461
 twe-0.451
 deem-0.441
Token pur
Token ID1308
Feature activation-3.96e+00

Pos logit contributions
 Spur+0.210
 misunderstand+0.204
fork+0.195
 twe+0.191
 deem+0.187
Neg logit contributions
��-0.344
ivism-0.305
idem-0.300
icles-0.285
��-0.278
Tokenview
Token ID1177
Feature activation-1.06e-01

Pos logit contributions
fork+3.387
 Spur+3.362
 misunderstand+3.318
 twe+3.138
 lig+3.006
Neg logit contributions
��-8.473
��-7.235
ivism-7.216
idem-6.951
-6.716
Token of
Token ID286
Feature activation+5.68e-02

Pos logit contributions
 Spur+0.127
 misunderstand+0.124
fork+0.119
 twe+0.116
 deem+0.113
Neg logit contributions
��-0.199
ivism-0.176
idem-0.174
icles-0.164
��-0.160
Token the
Token ID262
Feature activation-2.76e-01

Pos logit contributions
��+0.096
ivism+0.084
idem+0.083
icles+0.077
��+0.076
Neg logit contributions
 Spur-0.080
 misunderstand-0.078
fork-0.075
 twe-0.073
 deem-0.072
Token bank
Token ID3331
Feature activation-1.91e-01

Pos logit contributions
 Spur+0.332
 misunderstand+0.319
fork+0.307
 twe+0.298
 deem+0.293
Neg logit contributions
��-0.526
ivism-0.466
idem-0.458
icles-0.434
��-0.425
Token is
Token ID318
Feature activation+1.43e-01

Pos logit contributions
 Spur+0.244
 misunderstand+0.237
fork+0.230
 twe+0.222
 deem+0.217
Neg logit contributions
��-0.345
ivism-0.305
idem-0.299
icles-0.283
��-0.274
Token possible
Token ID1744
Feature activation-1.01e+00

Pos logit contributions
 Spur+0.473
 misunderstand+0.459
fork+0.439
 twe+0.430
 lig+0.418
Neg logit contributions
��-0.817
ivism-0.727
idem-0.718
icles-0.677
��-0.657
Token U
Token ID471
Feature activation+8.40e-01

Pos logit contributions
 Spur+1.129
 misunderstand+1.109
fork+1.041
 twe+1.026
 lig+0.998
Neg logit contributions
��-2.020
ivism-1.786
idem-1.772
icles-1.670
��-1.631
Token.
Token ID13
Feature activation+3.40e-01

Pos logit contributions
��+1.330
ivism+1.149
idem+1.119
icles+1.044
��+1.024
Neg logit contributions
 Spur-1.270
 misunderstand-1.235
fork-1.226
 twe-1.171
 lig-1.150
TokenS
Token ID50
Feature activation-3.43e-01

Pos logit contributions
��+0.570
ivism+0.493
idem+0.483
��+0.452
icles+0.450
Neg logit contributions
 Spur-0.490
fork-0.470
 misunderstand-0.470
 twe-0.440
 Learns-0.437
Token.
Token ID13
Feature activation+2.51e-02

Pos logit contributions
 Spur+0.545
 misunderstand+0.530
fork+0.519
 twe+0.505
 Learns+0.494
Neg logit contributions
��-0.513
ivism-0.444
idem-0.432
icles-0.403
��-0.385
Token downgrade
Token ID48514
Feature activation-3.86e+00

Pos logit contributions
��+0.049
ivism+0.044
idem+0.043
icles+0.040
��+0.039
Neg logit contributions
 Spur-0.029
 misunderstand-0.028
fork-0.027
 twe-0.026
 lig-0.026
Token seemed
Token ID3947
Feature activation-5.21e-01

Pos logit contributions
 misunderstand+4.434
 Spur+4.333
fork+4.078
 lig+4.066
 twe+4.037
Neg logit contributions
��-7.321
idem-6.406
ivism-6.299
��-5.974
icles-5.866
Token less
Token ID1342
Feature activation+4.28e-01

Pos logit contributions
 Spur+0.676
 misunderstand+0.664
fork+0.636
 twe+0.632
 deem+0.613
Neg logit contributions
��-0.935
ivism-0.821
idem-0.807
icles-0.762
ão-0.730
Token dire
Token ID19958
Feature activation-2.71e+00

Pos logit contributions
��+0.817
ivism+0.731
idem+0.717
icles+0.679
��+0.656
Neg logit contributions
 Spur-0.503
 misunderstand-0.485
fork-0.471
 twe-0.455
 lig-0.442
Token and
Token ID290
Feature activation-4.38e-02

Pos logit contributions
 Spur+3.418
 misunderstand+3.323
fork+3.210
 twe+3.153
 lig+3.064
Neg logit contributions
��-4.904
idem-4.199
ivism-4.179
��-3.968
icles-3.957
Token narrower
Token ID40369
Feature activation-6.95e-01

Pos logit contributions
 Spur+0.053
 misunderstand+0.052
fork+0.049
 twe+0.049
 deem+0.048
Neg logit contributions
��-0.082
ivism-0.073
idem-0.072
icles-0.068
��-0.066
Tokenlon
Token ID14995
Feature activation-1.10e+00

Pos logit contributions
��+1.225
ivism+1.139
idem+1.118
icles+1.082
��+1.069
Neg logit contributions
 Spur-0.118
 misunderstand-0.097
fork-0.089
 twe-0.070
 lig-0.055
Token Sports
Token ID7092
Feature activation-3.47e-01

Pos logit contributions
 Spur+1.487
 misunderstand+1.386
fork+1.350
 twe+1.332
 lig+1.323
Neg logit contributions
��-1.950
ivism-1.718
idem-1.672
icles-1.560
��-1.545
Token,[
Token ID17414
Feature activation+4.91e-01

Pos logit contributions
 Spur+0.513
 misunderstand+0.490
fork+0.483
 twe+0.476
 lig+0.465
Neg logit contributions
��-0.561
ivism-0.488
idem-0.472
icles-0.438
��-0.431
Token97
Token ID5607
Feature activation+5.60e-01

Pos logit contributions
��+0.845
ivism+0.750
idem+0.729
icles+0.688
��+0.666
Neg logit contributions
 Spur-0.668
 misunderstand-0.646
fork-0.635
 twe-0.608
 lig-0.595
Token]
Token ID60
Feature activation-3.96e-01

Pos logit contributions
��+0.875
ivism+0.765
idem+0.745
icles+0.699
��+0.672
Neg logit contributions
 Spur-0.846
 misunderstand-0.824
fork-0.809
 twe-0.782
 Learns-0.771
Token Ble
Token ID17175
Feature activation-3.82e+00

Pos logit contributions
 Spur+0.524
 misunderstand+0.489
fork+0.480
 twe+0.472
 lig+0.456
Neg logit contributions
��-0.713
ivism-0.631
idem-0.619
icles-0.582
��-0.574
Tokenacher
Token ID3493
Feature activation+4.18e-01

Pos logit contributions
fork+4.923
 Spur+4.921
 twe+4.804
 misunderstand+4.689
berry+4.528
Neg logit contributions
��-6.475
ivism-5.717
idem-5.372
icles-5.119
��-5.045
Token Report
Token ID6358
Feature activation-1.03e+00

Pos logit contributions
��+0.937
ivism+0.852
idem+0.833
icles+0.798
��+0.780
Neg logit contributions
 Spur-0.365
 misunderstand-0.338
fork-0.334
 twe-0.311
 deem-0.297
Token,[
Token ID17414
Feature activation+4.80e-01

Pos logit contributions
 Spur+1.614
 misunderstand+1.542
fork+1.524
 twe+1.503
 deem+1.477
Neg logit contributions
��-1.554
ivism-1.334
idem-1.299
icles-1.198
��-1.181
Token98
Token ID4089
Feature activation-2.45e-01

Pos logit contributions
��+0.797
ivism+0.702
idem+0.687
icles+0.644
��+0.617
Neg logit contributions
 Spur-0.683
 misunderstand-0.663
fork-0.646
 twe-0.628
 Learns-0.614
Token]
Token ID60
Feature activation-4.16e-01

Pos logit contributions
 Spur+0.366
 misunderstand+0.355
fork+0.348
 twe+0.338
 Learns+0.331
Neg logit contributions
��-0.390
ivism-0.341
idem-0.333
icles-0.312
��-0.300
Token of
Token ID286
Feature activation-2.44e-01

Pos logit contributions
��+1.205
ivism+1.068
idem+1.042
icles+0.986
��+0.947
Neg logit contributions
 Spur-0.917
 misunderstand-0.886
fork-0.863
 twe-0.833
 Learns-0.817
Token everything
Token ID2279
Feature activation+1.08e+00

Pos logit contributions
 Spur+0.342
 misunderstand+0.333
fork+0.326
 twe+0.317
 deem+0.308
Neg logit contributions
��-0.408
ivism-0.359
idem-0.351
icles-0.330
��-0.320
Token the
Token ID262
Feature activation-3.01e-01

Pos logit contributions
��+1.870
ivism+1.652
idem+1.620
icles+1.525
��+1.468
Neg logit contributions
 Spur-1.475
 misunderstand-1.427
fork-1.396
 twe-1.352
 lig-1.316
Token petty
Token ID25229
Feature activation-1.12e+00

Pos logit contributions
 Spur+0.360
 misunderstand+0.351
fork+0.342
 twe+0.331
 lig+0.319
Neg logit contributions
��-0.565
ivism-0.504
idem-0.493
icles-0.469
��-0.455
Token-
Token ID12
Feature activation-1.22e-01

Pos logit contributions
 Spur+1.311
 misunderstand+1.292
fork+1.252
 twe+1.199
 deem+1.162
Neg logit contributions
��-2.124
ivism-1.887
idem-1.867
icles-1.764
��-1.707
Tokenbour
Token ID6084
Feature activation-3.82e+00

Pos logit contributions
 Spur+0.157
 misunderstand+0.152
fork+0.151
 twe+0.143
 lig+0.140
Neg logit contributions
��-0.219
ivism-0.193
idem-0.190
icles-0.179
��-0.173
Tokenge
Token ID469
Feature activation-1.59e+00

Pos logit contributions
 Spur+2.179
fork+2.074
 misunderstand+2.009
 twe+1.942
 deem+1.733
Neg logit contributions
��-9.260
ivism-8.152
idem-7.968
��-7.961
icles-7.726
Tokenois
Token ID10924
Feature activation+8.95e-02

Pos logit contributions
 Spur+1.521
 misunderstand+1.479
fork+1.405
 twe+1.389
 lig+1.332
Neg logit contributions
��-3.332
ivism-2.960
idem-2.944
icles-2.747
��-2.739
Token intellig
Token ID10878
Feature activation-1.93e+00

Pos logit contributions
��+0.158
ivism+0.139
idem+0.137
icles+0.128
��+0.124
Neg logit contributions
 Spur-0.118
 misunderstand-0.115
fork-0.112
 twe-0.108
 deem-0.105
Tokenents
Token ID658
Feature activation-4.67e-01

Pos logit contributions
 Spur+3.128
 misunderstand+3.050
fork+3.031
 twe+2.963
 lig+2.913
Neg logit contributions
��-2.844
ivism-2.187
idem-2.179
��-2.093
icles-2.008
Tokenia
Token ID544
Feature activation-1.04e+00

Pos logit contributions
 Spur+0.741
 misunderstand+0.708
fork+0.692
 twe+0.680
 lig+0.678
Neg logit contributions
��-0.708
ivism-0.577
idem-0.562
��-0.538
icles-0.521
Token:
Token ID25
Feature activation-3.15e-01

Pos logit contributions
��+0.938
ivism+0.845
idem+0.831
icles+0.789
��+0.764
Neg logit contributions
 Spur-0.490
 misunderstand-0.474
fork-0.458
 twe-0.448
 deem-0.425
Token\n
Token ID198
Feature activation+1.23e-01

Pos logit contributions
 Spur+0.402
 misunderstand+0.385
fork+0.377
 twe+0.364
 Learns+0.353
Neg logit contributions
��-0.573
ivism-0.509
idem-0.499
icles-0.469
��-0.456
Token\n
Token ID198
Feature activation-6.07e-02

Pos logit contributions
��+0.230
ivism+0.201
idem+0.197
icles+0.186
��+0.186
Neg logit contributions
 Spur-0.154
 misunderstand-0.147
fork-0.146
 twe-0.138
 lig-0.135
Token13
Token ID1485
Feature activation+4.48e-02

Pos logit contributions
 Spur+0.085
fork+0.082
 misunderstand+0.082
 twe+0.078
 lig+0.075
Neg logit contributions
��-0.103
ivism-0.090
idem-0.087
icles-0.082
��-0.081
Token PH
Token ID9370
Feature activation+9.54e-01

Pos logit contributions
��+0.077
ivism+0.068
idem+0.066
icles+0.062
��+0.060
Neg logit contributions
 Spur-0.062
 misunderstand-0.061
fork-0.059
 twe-0.057
 Learns-0.056
TokenOTOS
Token ID33291
Feature activation+3.38e+00

Pos logit contributions
��+0.881
ivism+0.706
idem+0.653
icles+0.564
ão+0.514
Neg logit contributions
 Spur-2.073
 misunderstand-2.049
fork-1.997
 twe-1.962
 deem-1.930
Token Donald
Token ID3759
Feature activation+1.91e+00

Pos logit contributions
��+5.325
ivism+4.640
idem+4.539
icles+4.289
��+4.065
Neg logit contributions
 Spur-4.965
 misunderstand-4.876
fork-4.777
 twe-4.607
 lig-4.557
Token Trump
Token ID1301
Feature activation+1.97e+00

Pos logit contributions
��+2.054
ivism+1.664
idem+1.598
icles+1.433
��+1.340
Neg logit contributions
 Spur-3.869
 misunderstand-3.786
fork-3.727
 twe-3.648
 Learns-3.587
Token's
Token ID338
Feature activation-1.19e+00

Pos logit contributions
��+3.429
ivism+3.047
idem+2.978
icles+2.815
��+2.690
Neg logit contributions
 Spur-2.614
 misunderstand-2.524
fork-2.479
 twe-2.384
 lig-2.323
Token potential
Token ID2785
Feature activation-1.01e+00

Pos logit contributions
 Spur+1.548
 misunderstand+1.506
 twe+1.439
fork+1.413
 lig+1.373
Neg logit contributions
��-2.173
ivism-1.900
idem-1.884
��-1.722
icles-1.714
Token running
Token ID2491
Feature activation+1.32e+00

Pos logit contributions
 Spur+1.395
 misunderstand+1.361
 twe+1.309
fork+1.304
 lig+1.253
Neg logit contributions
��-1.750
ivism-1.513
idem-1.501
��-1.377
icles-1.375
Token Victor
Token ID12622
Feature activation+1.13e+00

Pos logit contributions
��+0.314
ivism+0.276
idem+0.269
icles+0.255
��+0.247
Neg logit contributions
 Spur-0.265
 misunderstand-0.255
fork-0.247
 twe-0.241
 deem-0.236
Token De
Token ID1024
Feature activation-6.75e-01

Pos logit contributions
��+1.984
ivism+1.746
idem+1.700
icles+1.604
��+1.564
Neg logit contributions
 Spur-1.531
 misunderstand-1.462
fork-1.424
 twe-1.378
 Learns-1.352
TokenNob
Token ID21191
Feature activation-3.96e-02

Pos logit contributions
 Spur+0.970
 misunderstand+0.935
fork+0.922
 twe+0.890
 lig+0.866
Neg logit contributions
��-1.111
ivism-0.976
idem-0.934
icles-0.896
��-0.860
Tokenel
Token ID417
Feature activation+1.36e-01

Pos logit contributions
 Spur+0.046
fork+0.043
 misunderstand+0.043
 twe+0.041
 lig+0.040
Neg logit contributions
��-0.077
ivism-0.068
idem-0.067
icles-0.063
��-0.062
Token and
Token ID290
Feature activation-1.32e-01

Pos logit contributions
��+0.243
ivism+0.215
idem+0.210
icles+0.199
��+0.193
Neg logit contributions
 Spur-0.178
 misunderstand-0.171
fork-0.167
 twe-0.161
 lig-0.157
Token Paul
Token ID3362
Feature activation+3.79e+00

Pos logit contributions
 Spur+0.195
 misunderstand+0.187
fork+0.181
 twe+0.177
 lig+0.174
Neg logit contributions
��-0.218
ivism-0.190
idem-0.185
icles-0.174
��-0.169
Token Me
Token ID2185
Feature activation-8.14e-01

Pos logit contributions
��+6.627
ivism+5.954
idem+5.858
icles+5.510
ão+5.290
Neg logit contributions
 Spur-4.743
 misunderstand-4.704
fork-4.576
 twe-4.430
 deem-4.342
Tokenle
Token ID293
Feature activation-1.53e+00

Pos logit contributions
 Spur+1.084
 misunderstand+1.039
fork+1.038
 twe+0.980
 lig+0.960
Neg logit contributions
��-1.425
ivism-1.247
idem-1.182
��-1.142
icles-1.128
Token,
Token ID11
Feature activation-2.67e-02

Pos logit contributions
 Spur+2.052
fork+1.944
 misunderstand+1.939
 twe+1.822
 lig+1.787
Neg logit contributions
��-2.717
ivism-2.374
idem-2.297
icles-2.167
��-2.139
Token testify
Token ID19671
Feature activation-6.70e-01

Pos logit contributions
 Spur+0.033
 misunderstand+0.032
fork+0.031
 twe+0.030
 deem+0.029
Neg logit contributions
��-0.050
ivism-0.044
idem-0.043
icles-0.041
��-0.040
Token before
Token ID878
Feature activation-7.83e-01

Pos logit contributions
 Spur+0.856
 misunderstand+0.823
fork+0.798
 twe+0.775
 deem+0.761
Neg logit contributions
��-1.235
ivism-1.074
idem-1.057
icles-1.003
��-0.987
Token,"
Token ID553
Feature activation-4.85e-01

Pos logit contributions
 Spur+0.677
 misunderstand+0.651
fork+0.639
 twe+0.610
 deem+0.594
Neg logit contributions
��-1.038
ivism-0.912
idem-0.899
icles-0.854
��-0.835
Token [
Token ID685
Feature activation+1.35e+00

Pos logit contributions
 Spur+0.676
 misunderstand+0.638
fork+0.622
 twe+0.610
 Learns+0.592
Neg logit contributions
��-0.847
ivism-0.741
idem-0.719
icles-0.676
��-0.672
TokenPlay
Token ID11002
Feature activation-1.25e+00

Pos logit contributions
��+2.336
ivism+2.073
idem+2.029
icles+1.911
��+1.831
Neg logit contributions
 Spur-1.802
 misunderstand-1.749
fork-1.702
 twe-1.647
 Learns-1.612
Tokenboy
Token ID7081
Feature activation-1.26e+00

Pos logit contributions
 Spur+0.793
fork+0.754
 misunderstand+0.724
 twe+0.670
 deem+0.623
Neg logit contributions
��-3.062
ivism-2.783
idem-2.743
��-2.629
icles-2.606
Token CEO
Token ID6123
Feature activation-2.55e-01

Pos logit contributions
 Spur+1.697
 misunderstand+1.620
fork+1.581
 twe+1.549
 deem+1.493
Neg logit contributions
��-2.214
ivism-1.952
idem-1.910
icles-1.791
��-1.758
Token Scott
Token ID4746
Feature activation+3.76e+00

Pos logit contributions
 Spur+0.367
 misunderstand+0.350
fork+0.340
 twe+0.333
 lig+0.325
Neg logit contributions
��-0.430
ivism-0.373
idem-0.366
icles-0.343
��-0.335
Token F
Token ID376
Feature activation+1.16e+00

Pos logit contributions
��+5.984
ivism+5.332
idem+5.175
icles+4.911
ão+4.612
Neg logit contributions
 Spur-5.337
 misunderstand-5.294
fork-5.148
 twe-5.012
 Learns-4.910
Tokenlanders
Token ID32358
Feature activation-7.56e-01

Pos logit contributions
��+1.913
ivism+1.659
idem+1.617
icles+1.505
��+1.478
Neg logit contributions
 Spur-1.700
 misunderstand-1.649
fork-1.612
 twe-1.563
 deem-1.523
Token]
Token ID60
Feature activation+9.82e-03

Pos logit contributions
 Spur+1.059
 misunderstand+1.026
fork+0.999
 twe+0.970
 Learns+0.946
Neg logit contributions
��-1.284
ivism-1.123
idem-1.098
icles-1.031
��-0.997
Token said
Token ID531
Feature activation+6.22e-01

Pos logit contributions
��+0.018
ivism+0.016
idem+0.016
icles+0.015
��+0.015
Neg logit contributions
 Spur-0.012
 misunderstand-0.012
fork-0.011
 twe-0.011
 Learns-0.011
Token,
Token ID11
Feature activation+5.20e-01

Pos logit contributions
��+1.049
ivism+0.918
idem+0.898
icles+0.844
��+0.816
Neg logit contributions
 Spur-0.876
 misunderstand-0.849
fork-0.829
 twe-0.805
 lig-0.784
Token
Token ID251
Feature activation-2.32e-02

Pos logit contributions
��+0.036
ivism+0.028
idem+0.026
icles+0.021
imus+0.017
Neg logit contributions
 Spur-0.128
 misunderstand-0.125
fork-0.124
 twe-0.122
 lig-0.121
Token said
Token ID531
Feature activation+7.47e-01

Pos logit contributions
 Spur+0.031
 misunderstand+0.030
fork+0.029
 twe+0.029
 deem+0.028
Neg logit contributions
��-0.040
ivism-0.036
idem-0.035
icles-0.033
��-0.032
Token PS
Token ID6599
Feature activation+1.17e+00

Pos logit contributions
��+1.299
ivism+1.148
idem+1.120
icles+1.055
��+1.018
Neg logit contributions
 Spur-1.014
 misunderstand-0.977
fork-0.954
 twe-0.926
 lig-0.901
TokenAC
Token ID2246
Feature activation-3.49e-04

Pos logit contributions
��+2.013
ivism+1.777
idem+1.731
icles+1.636
��+1.574
Neg logit contributions
 Spur-1.614
 misunderstand-1.554
fork-1.521
 twe-1.468
 Learns-1.433
Token president
Token ID1893
Feature activation+9.09e-02

Pos logit contributions
 Spur+0.000
 misunderstand+0.000
fork+0.000
 twe+0.000
 deem+0.000
Neg logit contributions
��-0.001
ivism-0.001
idem-0.001
icles-0.000
��-0.000
Token Rob
Token ID3851
Feature activation+3.17e+00

Pos logit contributions
��+0.151
ivism+0.131
idem+0.129
icles+0.121
��+0.117
Neg logit contributions
 Spur-0.132
 misunderstand-0.126
fork-0.123
 twe-0.120
 Learns-0.117
Tokenyn
Token ID2047
Feature activation+5.58e-02

Pos logit contributions
��+5.523
ivism+4.949
idem+4.826
icles+4.575
ão+4.397
Neg logit contributions
 Spur-4.048
 misunderstand-4.022
fork-3.879
 twe-3.769
 deem-3.677
Token Benson
Token ID33940
Feature activation-1.94e+00

Pos logit contributions
��+0.098
ivism+0.086
idem+0.085
icles+0.080
��+0.078
Neg logit contributions
 Spur-0.076
 misunderstand-0.071
fork-0.070
 twe-0.068
 Learns-0.067
Token.
Token ID13
Feature activation-2.36e-01

Pos logit contributions
 Spur+2.814
 misunderstand+2.703
fork+2.674
 twe+2.578
 lig+2.508
Neg logit contributions
��-3.159
ivism-2.760
idem-2.670
icles-2.519
��-2.447
Token\n
Token ID198
Feature activation-8.56e-02

Pos logit contributions
 Spur+0.301
 misunderstand+0.287
fork+0.282
 twe+0.271
 Learns+0.265
Neg logit contributions
��-0.431
ivism-0.381
idem-0.375
icles-0.352
��-0.343
Token\n
Token ID198
Feature activation-1.90e-01

Pos logit contributions
 Spur+0.107
 misunderstand+0.103
fork+0.102
 twe+0.097
 lig+0.094
Neg logit contributions
��-0.160
ivism-0.140
idem-0.137
icles-0.129
��-0.129
Token lot
Token ID1256
Feature activation-1.22e+00

Pos logit contributions
 Spur+0.348
 misunderstand+0.337
fork+0.327
 twe+0.318
 lig+0.306
Neg logit contributions
��-0.574
ivism-0.516
idem-0.504
icles-0.480
��-0.465
Token of
Token ID286
Feature activation-2.64e-01

Pos logit contributions
 Spur+1.592
 misunderstand+1.536
fork+1.501
 twe+1.449
 Learns+1.413
Neg logit contributions
��-2.168
ivism-1.927
idem-1.883
icles-1.781
��-1.713
Token issues
Token ID2428
Feature activation+2.04e-01

Pos logit contributions
 Spur+0.342
 misunderstand+0.337
fork+0.321
 twe+0.315
 lig+0.305
Neg logit contributions
��-0.474
ivism-0.420
idem-0.411
icles-0.387
��-0.378
Token fall
Token ID2121
Feature activation-8.57e-01

Pos logit contributions
��+0.385
ivism+0.340
idem+0.335
icles+0.317
��+0.309
Neg logit contributions
 Spur-0.248
 misunderstand-0.241
fork-0.232
 twe-0.224
 deem-0.219
Token into
Token ID656
Feature activation-2.85e-01

Pos logit contributions
 Spur+1.041
 misunderstand+1.032
fork+0.980
 twe+0.975
 lig+0.935
Neg logit contributions
��-1.590
ivism-1.392
idem-1.376
icles-1.299
��-1.266
Token HR
Token ID15172
Feature activation+3.14e+00

Pos logit contributions
 Spur+0.403
 misunderstand+0.395
fork+0.376
 twe+0.373
 deem+0.361
Neg logit contributions
��-0.480
ivism-0.422
idem-0.411
icles-0.389
��-0.379
Token
Token ID447
Feature activation+1.40e-01

Pos logit contributions
��+5.409
ivism+4.806
idem+4.698
icles+4.400
��+4.232
Neg logit contributions
 Spur-4.183
 misunderstand-3.997
fork-3.965
 twe-3.806
 lig-3.705
Token
Token ID247
Feature activation+3.52e-01

Pos logit contributions
��+0.077
ivism+0.059
idem+0.057
icles+0.040
imus+0.032
Neg logit contributions
 Spur-0.341
 misunderstand-0.335
fork-0.332
 twe-0.327
 Learns-0.326
Tokens
Token ID82
Feature activation-6.38e-02

Pos logit contributions
��+0.543
ivism+0.470
idem+0.459
icles+0.429
��+0.411
Neg logit contributions
 Spur-0.540
 misunderstand-0.527
fork-0.515
 twe-0.503
 Learns-0.493
Token lap
Token ID14779
Feature activation+9.45e-01

Pos logit contributions
 Spur+0.074
 misunderstand+0.072
fork+0.069
 twe+0.067
 deem+0.065
Neg logit contributions
��-0.123
ivism-0.110
idem-0.108
icles-0.102
��-0.099
Token.
Token ID13
Feature activation-3.90e-01

Pos logit contributions
��+1.533
ivism+1.333
idem+1.291
icles+1.213
��+1.179
Neg logit contributions
 Spur-1.395
 misunderstand-1.356
fork-1.329
 twe-1.286
 lig-1.255
Tokenalities
Token ID27969
Feature activation-2.30e-01

Pos logit contributions
 Spur+1.070
 misunderstand+1.017
fork+0.988
 twe+0.938
 lig+0.914
Neg logit contributions
��-2.292
ivism-2.019
idem-1.967
��-1.901
icles-1.874
Token Current
Token ID9236
Feature activation+1.18e+00

Pos logit contributions
 Spur+0.324
 misunderstand+0.312
fork+0.304
 twe+0.299
 Learns+0.292
Neg logit contributions
��-0.387
ivism-0.336
idem-0.332
icles-0.310
��-0.305
Token Anthropology
Token ID50171
Feature activation+4.99e-01

Pos logit contributions
��+2.164
ivism+1.920
idem+1.883
icles+1.779
��+1.730
Neg logit contributions
 Spur-1.485
 misunderstand-1.437
fork-1.387
 twe-1.343
 Learns-1.308
Token,
Token ID11
Feature activation-1.15e-01

Pos logit contributions
��+0.815
ivism+0.707
idem+0.695
icles+0.647
��+0.644
Neg logit contributions
 Spur-0.735
 misunderstand-0.708
fork-0.689
 twe-0.671
 Learns-0.660
Token 43
Token ID5946
Feature activation+1.16e+00

Pos logit contributions
 Spur+0.175
 misunderstand+0.168
fork+0.164
 twe+0.159
 lig+0.157
Neg logit contributions
��-0.183
ivism-0.158
idem-0.154
��-0.144
icles-0.144
Token (
Token ID357
Feature activation+5.66e-01

Pos logit contributions
��+1.805
ivism+1.571
idem+1.528
icles+1.437
ão+1.360
Neg logit contributions
 Spur-1.760
 misunderstand-1.725
fork-1.689
 twe-1.632
 Learns-1.610
TokenS
Token ID50
Feature activation-3.66e-01

Pos logit contributions
��+0.927
ivism+0.812
idem+0.789
icles+0.742
��+0.718
Neg logit contributions
 Spur-0.823
 misunderstand-0.795
fork-0.784
 twe-0.757
 lig-0.740
Token4
Token ID19
Feature activation-1.08e+00

Pos logit contributions
 Spur+0.523
 misunderstand+0.503
fork+0.497
 twe+0.474
 Learns+0.465
Neg logit contributions
��-0.611
ivism-0.532
idem-0.508
��-0.484
icles-0.477
Token)
Token ID8
Feature activation-1.38e-01

Pos logit contributions
 Spur+1.288
 misunderstand+1.215
fork+1.187
 twe+1.152
 lig+1.129
Neg logit contributions
��-2.058
ivism-1.841
idem-1.799
icles-1.689
��-1.675
Token DOI
Token ID40722
Feature activation-1.25e+00

Pos logit contributions
 Spur+0.202
 misunderstand+0.195
fork+0.191
 twe+0.186
 lig+0.182
Neg logit contributions
��-0.224
ivism-0.196
idem-0.191
icles-0.178
��-0.173
Token:
Token ID25
Feature activation-5.26e-01

Pos logit contributions
 Spur+1.889
 misunderstand+1.831
fork+1.787
 twe+1.744
 lig+1.715
Neg logit contributions
��-1.975
ivism-1.730
idem-1.698
icles-1.571
��-1.531
Token Libya
Token ID13465
Feature activation-8.06e-01

Pos logit contributions
��+0.509
ivism+0.447
idem+0.435
icles+0.411
��+0.402
Neg logit contributions
 Spur-0.414
 misunderstand-0.404
fork-0.389
 twe-0.379
 lig-0.373
Token and
Token ID290
Feature activation+1.19e-01

Pos logit contributions
 Spur+1.102
 misunderstand+1.072
fork+1.044
 twe+1.010
 lig+0.988
Neg logit contributions
��-1.387
ivism-1.219
idem-1.192
icles-1.124
��-1.084
Token Iraq
Token ID3908
Feature activation+6.17e-01

Pos logit contributions
��+0.206
ivism+0.182
idem+0.176
icles+0.167
��+0.162
Neg logit contributions
 Spur-0.163
 misunderstand-0.158
fork-0.153
 twe-0.149
 lig-0.146
Token on
Token ID319
Feature activation-1.77e+00

Pos logit contributions
��+1.035
ivism+0.903
idem+0.886
icles+0.831
��+0.805
Neg logit contributions
 Spur-0.878
 misunderstand-0.854
fork-0.829
 twe-0.806
 lig-0.791
Token the
Token ID262
Feature activation+2.37e-01

Pos logit contributions
 Spur+2.403
 misunderstand+2.355
 twe+2.224
fork+2.208
 lig+2.173
Neg logit contributions
��-3.043
ivism-2.681
idem-2.579
icles-2.439
��-2.418
Token pretext
Token ID35097
Feature activation+1.10e+00

Pos logit contributions
��+0.476
ivism+0.432
idem+0.420
icles+0.399
��+0.390
Neg logit contributions
 Spur-0.256
 misunderstand-0.246
fork-0.235
 twe-0.229
 lig-0.222
Token that
Token ID326
Feature activation+7.02e-01

Pos logit contributions
��+1.858
ivism+1.651
idem+1.597
icles+1.519
��+1.452
Neg logit contributions
 Spur-1.512
 misunderstand-1.468
fork-1.452
 twe-1.388
 Learns-1.374
Token invasion
Token ID11796
Feature activation-9.61e-01

Pos logit contributions
��+1.198
ivism+1.056
idem+1.030
icles+0.966
��+0.936
Neg logit contributions
 Spur-0.981
 misunderstand-0.958
fork-0.915
 twe-0.899
 lig-0.881
Token was
Token ID373
Feature activation-2.37e-01

Pos logit contributions
 Spur+1.074
 misunderstand+1.056
fork+0.995
 twe+0.974
 lig+0.961
Neg logit contributions
��-1.889
ivism-1.667
idem-1.656
icles-1.542
��-1.533
Token the
Token ID262
Feature activation+2.81e-01

Pos logit contributions
 Spur+0.293
 misunderstand+0.283
fork+0.268
 twe+0.266
 deem+0.260
Neg logit contributions
��-0.447
ivism-0.394
idem-0.389
icles-0.366
��-0.355
Token only
Token ID691
Feature activation+1.32e+00

Pos logit contributions
��+0.513
ivism+0.454
idem+0.447
icles+0.418
��+0.408
Neg logit contributions
 Spur-0.362
 misunderstand-0.350
fork-0.334
 twe-0.332
 lig-0.323
Token lie
Token ID6486
Feature activation-1.21e+00

Pos logit contributions
 Spur+0.042
 misunderstand+0.040
fork+0.039
 twe+0.038
 lig+0.037
Neg logit contributions
��-0.068
ivism-0.061
idem-0.060
icles-0.056
��-0.055
Token.
Token ID13
Feature activation+3.68e-02

Pos logit contributions
 Spur+1.728
 misunderstand+1.662
fork+1.637
 twe+1.604
 lig+1.535
Neg logit contributions
��-2.025
ivism-1.752
idem-1.721
icles-1.597
��-1.580
Token The
Token ID383
Feature activation+2.32e-02

Pos logit contributions
��+0.069
ivism+0.061
idem+0.060
icles+0.057
��+0.056
Neg logit contributions
 Spur-0.046
 misunderstand-0.043
fork-0.042
 twe-0.041
 Learns-0.040
Token image
Token ID2939
Feature activation+1.14e-01

Pos logit contributions
��+0.044
ivism+0.039
idem+0.038
icles+0.036
��+0.035
Neg logit contributions
 Spur-0.028
 misunderstand-0.027
fork-0.026
 twe-0.026
 lig-0.025
Token was
Token ID373
Feature activation-4.20e-01

Pos logit contributions
��+0.215
ivism+0.190
idem+0.187
icles+0.175
��+0.173
Neg logit contributions
 Spur-0.138
 misunderstand-0.135
fork-0.131
 twe-0.128
 lig-0.123
Token taken
Token ID2077
Feature activation+1.50e+00

Pos logit contributions
 Spur+0.531
 misunderstand+0.512
fork+0.496
 twe+0.490
 lig+0.475
Neg logit contributions
��-0.774
ivism-0.686
idem-0.671
icles-0.632
��-0.616
Token from
Token ID422
Feature activation+8.23e-02

Pos logit contributions
��+2.528
ivism+2.226
idem+2.173
icles+2.042
��+1.968
Neg logit contributions
 Spur-2.096
 misunderstand-2.031
fork-1.983
 twe-1.923
 lig-1.876
Token a
Token ID257
Feature activation+1.92e-02

Pos logit contributions
��+0.139
ivism+0.122
idem+0.119
icles+0.112
��+0.108
Neg logit contributions
 Spur-0.116
 misunderstand-0.112
fork-0.109
 twe-0.107
 lig-0.104
Token different
Token ID1180
Feature activation+1.39e+00

Pos logit contributions
��+0.037
ivism+0.033
idem+0.032
icles+0.031
��+0.030
Neg logit contributions
 Spur-0.023
 misunderstand-0.021
fork-0.021
 twe-0.020
 lig-0.020
Token news
Token ID1705
Feature activation-1.23e+00

Pos logit contributions
��+2.654
ivism+2.385
idem+2.335
icles+2.217
��+2.135
Neg logit contributions
 Spur-1.605
 misunderstand-1.555
fork-1.516
 twe-1.444
 Learns-1.417
Token story
Token ID1621
Feature activation-5.62e-01

Pos logit contributions
 Spur+1.178
 misunderstand+1.110
fork+1.083
 twe+1.055
 lig+0.993
Neg logit contributions
��-2.662
ivism-2.378
idem-2.326
icles-2.217
��-2.202
Token g
Token ID308
Feature activation-5.91e-01

Pos logit contributions
 Spur+0.530
 misunderstand+0.511
fork+0.497
 twe+0.486
 lig+0.479
Neg logit contributions
��-0.666
ivism-0.592
idem-0.581
icles-0.541
��-0.524
Tokenosh
Token ID3768
Feature activation+1.52e+00

Pos logit contributions
 Spur+1.039
 misunderstand+1.012
fork+1.004
 twe+0.970
 lig+0.955
Neg logit contributions
��-0.797
ivism-0.651
idem-0.633
��-0.581
icles-0.580
Token,
Token ID11
Feature activation+1.78e-01

Pos logit contributions
��+2.391
ivism+2.084
idem+2.036
icles+1.912
��+1.818
Neg logit contributions
 Spur-2.270
 misunderstand-2.203
fork-2.157
 twe-2.086
 Learns-2.057
Token I
Token ID314
Feature activation-2.66e-01

Pos logit contributions
��+0.320
ivism+0.285
idem+0.278
icles+0.262
��+0.255
Neg logit contributions
 Spur-0.230
 misunderstand-0.221
fork-0.215
 twe-0.210
 lig-0.204
Token just
Token ID655
Feature activation+6.34e-01

Pos logit contributions
 Spur+0.289
 misunderstand+0.285
fork+0.269
 twe+0.263
 lig+0.255
Neg logit contributions
��-0.533
ivism-0.477
idem-0.465
icles-0.445
��-0.435
Token climbed
Token ID19952
Feature activation+5.03e-01

Pos logit contributions
��+1.200
ivism+1.072
idem+1.048
icles+0.993
��+0.966
Neg logit contributions
 Spur-0.761
 misunderstand-0.741
fork-0.713
 twe-0.692
 lig-0.673
Token half
Token ID2063
Feature activation+8.27e-02

Pos logit contributions
��+0.848
ivism+0.741
idem+0.727
icles+0.680
��+0.659
Neg logit contributions
 Spur-0.716
 misunderstand-0.692
fork-0.675
 twe-0.655
 lig-0.641
Token of
Token ID286
Feature activation+1.08e-01

Pos logit contributions
��+0.141
ivism+0.121
idem+0.119
icles+0.111
��+0.109
Neg logit contributions
 Spur-0.118
 misunderstand-0.114
fork-0.111
 twe-0.108
 lig-0.106
Token Mount
Token ID5628
Feature activation-2.97e-01

Pos logit contributions
��+0.184
ivism+0.161
idem+0.158
icles+0.148
��+0.144
Neg logit contributions
 Spur-0.151
 misunderstand-0.145
fork-0.142
 twe-0.138
 Learns-0.134
Token Everest
Token ID41336
Feature activation+1.06e+00

Pos logit contributions
 Spur+0.456
fork+0.423
 misunderstand+0.422
 twe+0.406
 Learns+0.400
Neg logit contributions
��-0.475
ivism-0.410
idem-0.407
icles-0.373
��-0.370
Token.
Token ID13
Feature activation-7.05e-02

Pos logit contributions
��+1.742
ivism+1.523
idem+1.491
icles+1.392
��+1.347
Neg logit contributions
 Spur-1.545
 misunderstand-1.499
fork-1.469
 twe-1.422
 lig-1.392
Token what
Token ID644
Feature activation-2.85e-02

Pos logit contributions
 Spur+0.485
 misunderstand+0.475
fork+0.456
 twe+0.448
 lig+0.441
Neg logit contributions
��-0.590
ivism-0.510
idem-0.498
icles-0.464
��-0.461
Token is
Token ID318
Feature activation+3.54e-01

Pos logit contributions
 Spur+0.040
 misunderstand+0.039
fork+0.037
 twe+0.037
 lig+0.036
Neg logit contributions
��-0.049
ivism-0.043
idem-0.042
icles-0.039
��-0.038
Token going
Token ID1016
Feature activation+1.14e+00

Pos logit contributions
��+0.659
ivism+0.587
idem+0.576
icles+0.547
��+0.527
Neg logit contributions
 Spur-0.433
 misunderstand-0.420
fork-0.407
 twe-0.393
 lig-0.383
Token on
Token ID319
Feature activation-1.43e+00

Pos logit contributions
��+1.808
ivism+1.623
idem+1.551
icles+1.461
��+1.398
Neg logit contributions
 Spur-1.674
 misunderstand-1.609
fork-1.599
 twe-1.541
 Learns-1.523
Token in
Token ID287
Feature activation+6.82e-01

Pos logit contributions
 Spur+1.948
 misunderstand+1.928
fork+1.823
 twe+1.801
 lig+1.773
Neg logit contributions
��-2.499
ivism-2.108
idem-2.099
icles-1.974
��-1.946
Token order
Token ID1502
Feature activation+1.14e+00

Pos logit contributions
��+1.210
ivism+1.071
idem+1.047
icles+0.988
��+0.955
Neg logit contributions
 Spur-0.906
 misunderstand-0.879
fork-0.850
 twe-0.828
 lig-0.808
Token to
Token ID284
Feature activation+2.89e-01

Pos logit contributions
��+1.722
ivism+1.521
idem+1.470
icles+1.382
��+1.313
Neg logit contributions
 Spur-1.748
 misunderstand-1.679
fork-1.660
 twe-1.609
 Learns-1.601
Token lend
Token ID22096
Feature activation-2.65e+00

Pos logit contributions
��+0.552
ivism+0.492
idem+0.482
icles+0.458
��+0.444
Neg logit contributions
 Spur-0.344
 misunderstand-0.333
fork-0.322
 twe-0.311
 lig-0.302
Token support
Token ID1104
Feature activation+1.84e-01

Pos logit contributions
 Spur+3.329
 misunderstand+3.263
 lig+3.126
 twe+3.047
fork+3.023
Neg logit contributions
��-4.797
ivism-4.203
idem-4.171
icles-3.807
��-3.794
Token.
Token ID13
Feature activation+6.24e-02

Pos logit contributions
��+0.306
ivism+0.265
idem+0.262
icles+0.244
��+0.238
Neg logit contributions
 Spur-0.261
 misunderstand-0.255
fork-0.248
 twe-0.241
 lig-0.238
Token Sometimes
Token ID8975
Feature activation+1.11e+00

Pos logit contributions
��+0.121
ivism+0.108
idem+0.106
icles+0.100
��+0.098
Neg logit contributions
 Spur-0.074
 misunderstand-0.070
fork-0.068
 twe-0.065
 Learns-0.065
Token."
Token ID526
Feature activation+7.29e-02

Pos logit contributions
 Spur+0.318
 misunderstand+0.306
fork+0.301
 twe+0.292
 lig+0.282
Neg logit contributions
��-0.406
ivism-0.352
idem-0.349
icles-0.325
��-0.321
Token They
Token ID1119
Feature activation+7.01e-01

Pos logit contributions
��+0.141
ivism+0.125
idem+0.123
icles+0.115
��+0.114
Neg logit contributions
 Spur-0.088
 misunderstand-0.082
fork-0.081
 twe-0.077
 Learns-0.077
Token're
Token ID821
Feature activation-8.46e-01

Pos logit contributions
��+1.250
ivism+1.109
idem+1.084
icles+1.023
��+0.990
Neg logit contributions
 Spur-0.915
 misunderstand-0.887
fork-0.863
 twe-0.834
 lig-0.812
Token all
Token ID477
Feature activation-1.59e-01

Pos logit contributions
 Spur+1.074
 misunderstand+1.054
fork+0.997
 twe+0.996
 deem+0.962
Neg logit contributions
��-1.557
ivism-1.377
idem-1.362
icles-1.262
��-1.255
Token about
Token ID546
Feature activation+5.07e-02

Pos logit contributions
 Spur+0.205
 misunderstand+0.200
fork+0.192
 twe+0.188
 lig+0.181
Neg logit contributions
��-0.289
ivism-0.256
idem-0.252
icles-0.236
��-0.231
Token stakes
Token ID21147
Feature activation-3.40e-01

Pos logit contributions
��+0.090
ivism+0.080
idem+0.078
icles+0.073
��+0.071
Neg logit contributions
 Spur-0.067
 misunderstand-0.065
fork-0.063
 twe-0.062
 Learns-0.060
Token.
Token ID13
Feature activation+3.97e-02

Pos logit contributions
 Spur+0.470
 misunderstand+0.461
fork+0.452
 twe+0.434
 lig+0.422
Neg logit contributions
��-0.577
ivism-0.502
idem-0.500
icles-0.464
��-0.452
Token I
Token ID314
Feature activation-5.44e-01

Pos logit contributions
��+0.076
ivism+0.068
idem+0.067
icles+0.063
��+0.062
Neg logit contributions
 Spur-0.048
 misunderstand-0.045
fork-0.044
 twe-0.042
 Learns-0.042
Token've
Token ID1053
Feature activation-7.47e-01

Pos logit contributions
 Spur+0.577
 misunderstand+0.565
fork+0.534
 twe+0.522
 deem+0.507
Neg logit contributions
��-1.109
ivism-0.994
idem-0.972
icles-0.924
��-0.913
Token always
Token ID1464
Feature activation-3.86e-01

Pos logit contributions
 Spur+0.956
 misunderstand+0.943
fork+0.902
 twe+0.891
 lig+0.863
Neg logit contributions
��-1.354
ivism-1.199
idem-1.180
icles-1.101
��-1.094
Token been
Token ID587
Feature activation-4.52e-01

Pos logit contributions
 Spur+0.485
 misunderstand+0.481
fork+0.460
 twe+0.451
 lig+0.442
Neg logit contributions
��-0.709
ivism-0.620
idem-0.619
icles-0.577
��-0.572
Token had
Token ID550
Feature activation+4.84e-01

Pos logit contributions
��+1.278
ivism+1.133
idem+1.107
icles+1.045
��+1.006
Neg logit contributions
 Spur-0.968
 misunderstand-0.936
fork-0.914
 twe-0.883
 Learns-0.862
Token smoked
Token ID21603
Feature activation+3.29e-01

Pos logit contributions
��+0.889
ivism+0.792
idem+0.777
icles+0.732
��+0.709
Neg logit contributions
 Spur-0.607
 misunderstand-0.589
fork-0.571
 twe-0.553
 lig-0.540
Token crack
Token ID8469
Feature activation-2.13e-01

Pos logit contributions
��+0.637
ivism+0.567
idem+0.560
icles+0.529
��+0.515
Neg logit contributions
 Spur-0.384
 misunderstand-0.367
fork-0.356
 twe-0.348
 lig-0.339
Token cocaine
Token ID15144
Feature activation+2.61e-02

Pos logit contributions
 Spur+0.270
 misunderstand+0.262
fork+0.257
 twe+0.249
 lig+0.246
Neg logit contributions
��-0.387
ivism-0.341
idem-0.335
icles-0.314
��-0.309
Token.
Token ID13
Feature activation-1.15e-01

Pos logit contributions
��+0.044
ivism+0.038
idem+0.038
icles+0.035
��+0.034
Neg logit contributions
 Spur-0.037
 misunderstand-0.036
fork-0.035
 twe-0.034
 lig-0.034
Token The
Token ID383
Feature activation-1.34e-01

Pos logit contributions
 Spur+0.145
 misunderstand+0.135
fork+0.133
 twe+0.128
 Learns+0.127
Neg logit contributions
��-0.216
ivism-0.191
idem-0.188
icles-0.177
��-0.174
Token journalists
Token ID9046
Feature activation+1.44e-01

Pos logit contributions
 Spur+0.157
 misunderstand+0.150
fork+0.145
 twe+0.141
 lig+0.137
Neg logit contributions
��-0.258
ivism-0.230
idem-0.226
icles-0.214
��-0.208
Token Rob
Token ID3851
Feature activation+3.34e+00

Pos logit contributions
��+0.265
ivism+0.236
idem+0.233
icles+0.218
��+0.214
Neg logit contributions
 Spur-0.183
 misunderstand-0.177
fork-0.170
 twe-0.166
 deem-0.162
Tokenyn
Token ID2047
Feature activation+5.28e-02

Pos logit contributions
��+5.892
ivism+5.277
idem+5.166
icles+4.911
ão+4.704
Neg logit contributions
 Spur-4.212
 misunderstand-4.156
fork-4.015
 twe-3.896
 Learns-3.784
Token D
Token ID360
Feature activation+1.16e+00

Pos logit contributions
��+0.093
ivism+0.081
idem+0.080
icles+0.075
��+0.073
Neg logit contributions
 Spur-0.072
 misunderstand-0.068
fork-0.067
 twe-0.065
 Learns-0.064
Tokenool
Token ID970
Feature activation+1.72e+00

Pos logit contributions
��+1.963
ivism+1.718
idem+1.677
icles+1.574
��+1.522
Neg logit contributions
 Spur-1.633
 misunderstand-1.576
fork-1.551
 twe-1.499
 lig-1.462
Token myself
Token ID3589
Feature activation+4.84e-01

Pos logit contributions
��+0.473
ivism+0.415
idem+0.407
icles+0.381
��+0.373
Neg logit contributions
 Spur-0.374
 misunderstand-0.362
fork-0.350
 twe-0.345
 lig-0.332
Token:
Token ID25
Feature activation+4.58e-02

Pos logit contributions
��+0.805
ivism+0.696
idem+0.685
icles+0.641
��+0.627
Neg logit contributions
 Spur-0.695
 misunderstand-0.678
fork-0.658
 twe-0.643
 lig-0.626
Token Where
Token ID6350
Feature activation-1.44e-01

Pos logit contributions
��+0.089
ivism+0.080
idem+0.079
icles+0.075
��+0.073
Neg logit contributions
 Spur-0.052
 misunderstand-0.050
fork-0.049
 twe-0.047
 Learns-0.045
Token are
Token ID389
Feature activation-1.28e-01

Pos logit contributions
 Spur+0.162
 misunderstand+0.157
fork+0.152
 twe+0.146
 deem+0.142
Neg logit contributions
��-0.283
ivism-0.251
idem-0.248
icles-0.235
��-0.230
Token the
Token ID262
Feature activation+8.79e-02

Pos logit contributions
 Spur+0.186
 misunderstand+0.181
fork+0.175
 twe+0.172
 lig+0.168
Neg logit contributions
��-0.212
ivism-0.185
idem-0.182
icles-0.169
��-0.164
Token feds
Token ID43646
Feature activation-3.78e-01

Pos logit contributions
��+0.163
ivism+0.145
idem+0.143
icles+0.134
��+0.131
Neg logit contributions
 Spur-0.109
 misunderstand-0.106
fork-0.102
 twe-0.100
 lig-0.097
Token when
Token ID618
Feature activation-1.10e-01

Pos logit contributions
 Spur+0.501
 misunderstand+0.487
fork+0.470
 twe+0.459
 deem+0.453
Neg logit contributions
��-0.669
ivism-0.583
idem-0.578
icles-0.542
��-0.532
Token police
Token ID1644
Feature activation-1.58e-01

Pos logit contributions
 Spur+0.148
 misunderstand+0.143
fork+0.139
 twe+0.135
 deem+0.132
Neg logit contributions
��-0.192
ivism-0.169
idem-0.166
icles-0.156
��-0.151
Token over
Token ID625
Feature activation-7.33e-01

Pos logit contributions
 Spur+0.189
 misunderstand+0.185
fork+0.176
 twe+0.172
 deem+0.169
Neg logit contributions
��-0.303
ivism-0.267
idem-0.265
icles-0.248
��-0.244
Tokenstep
Token ID9662
Feature activation+2.06e+00

Pos logit contributions
 Spur+0.873
 misunderstand+0.854
fork+0.838
 twe+0.794
 lig+0.776
Neg logit contributions
��-1.395
ivism-1.235
idem-1.208
icles-1.146
��-1.123
Token the
Token ID262
Feature activation-1.72e-01

Pos logit contributions
��+3.200
ivism+2.829
idem+2.747
icles+2.570
��+2.452
Neg logit contributions
 Spur-3.098
 misunderstand-3.006
fork-2.966
 twe-2.863
 Learns-2.820
Token at
Token ID379
Feature activation-1.26e+00

Pos logit contributions
 Spur+1.776
 misunderstand+1.709
fork+1.670
 twe+1.643
 deem+1.610
Neg logit contributions
��-2.011
ivism-1.724
idem-1.711
icles-1.593
��-1.548
Token guard
Token ID4860
Feature activation+9.42e-01

Pos logit contributions
 Spur+1.549
 misunderstand+1.458
fork+1.413
 twe+1.407
 lig+1.365
Neg logit contributions
��-2.374
ivism-2.118
idem-2.075
icles-1.983
��-1.898
Token for
Token ID329
Feature activation+9.03e-01

Pos logit contributions
��+1.538
ivism+1.347
idem+1.315
icles+1.233
��+1.186
Neg logit contributions
 Spur-1.376
 misunderstand-1.333
fork-1.306
 twe-1.266
 lig-1.237
Token the
Token ID262
Feature activation-3.61e-01

Pos logit contributions
��+1.483
ivism+1.303
idem+1.272
icles+1.195
��+1.146
Neg logit contributions
 Spur-1.312
 misunderstand-1.265
fork-1.237
 twe-1.204
 lig-1.175
Token Baltimore
Token ID10346
Feature activation-2.85e+00

Pos logit contributions
 Spur+0.444
 misunderstand+0.420
fork+0.406
 twe+0.399
 lig+0.386
Neg logit contributions
��-0.681
ivism-0.607
idem-0.597
icles-0.564
��-0.548
Token Ravens
Token ID20091
Feature activation-1.37e+00

Pos logit contributions
 Spur+2.308
 misunderstand+2.080
 twe+1.975
fork+1.962
 deem+1.846
Neg logit contributions
��-6.367
ivism-5.872
idem-5.771
icles-5.447
ão-5.358
Token.
Token ID13
Feature activation-1.73e-01

Pos logit contributions
 Spur+1.958
 misunderstand+1.889
fork+1.856
 twe+1.814
 lig+1.779
Neg logit contributions
��-2.251
ivism-1.970
idem-1.933
icles-1.808
��-1.739
Token He
Token ID679
Feature activation-2.40e-01

Pos logit contributions
 Spur+0.204
 misunderstand+0.190
fork+0.189
 twe+0.179
 Learns+0.178
Neg logit contributions
��-0.334
ivism-0.298
idem-0.292
icles-0.277
��-0.271
Token started
Token ID2067
Feature activation+6.84e-01

Pos logit contributions
 Spur+0.297
 misunderstand+0.288
fork+0.280
 twe+0.271
 lig+0.265
Neg logit contributions
��-0.445
ivism-0.395
idem-0.386
icles-0.366
��-0.357
Token a
Token ID257
Feature activation-3.91e-01

Pos logit contributions
��+1.085
ivism+0.945
idem+0.922
icles+0.862
��+0.830
Neg logit contributions
 Spur-1.038
 misunderstand-1.003
fork-0.978
 twe-0.959
 lig-0.938
Token total
Token ID2472
Feature activation+6.45e-01

Pos logit contributions
 Spur+0.468
 misunderstand+0.447
fork+0.433
 twe+0.426
 lig+0.412
Neg logit contributions
��-0.742
ivism-0.666
idem-0.655
icles-0.617
��-0.598
Token)}
Token ID38165
Feature activation-3.51e-01

Pos logit contributions
 Spur+0.184
 misunderstand+0.179
fork+0.176
 twe+0.167
 lig+0.163
Neg logit contributions
��-0.269
ivism-0.237
idem-0.232
icles-0.218
��-0.216
Token 17
Token ID1596
Feature activation+1.19e+00

Pos logit contributions
 Spur+0.548
 misunderstand+0.533
fork+0.522
 twe+0.507
 Learns+0.497
Neg logit contributions
��-0.534
ivism-0.465
idem-0.453
icles-0.422
��-0.404
Token.
Token ID13
Feature activation+5.97e-02

Pos logit contributions
��+1.804
ivism+1.562
idem+1.520
icles+1.417
��+1.360
Neg logit contributions
 Spur-1.879
 misunderstand-1.827
fork-1.793
 twe-1.739
 Learns-1.705
Token c
Token ID269
Feature activation+3.33e-01

Pos logit contributions
��+0.109
ivism+0.097
idem+0.095
icles+0.090
��+0.087
Neg logit contributions
 Spur-0.076
 misunderstand-0.073
fork-0.071
 twe-0.069
 lig-0.067
Tokenxd
Token ID24954
Feature activation+6.76e-01

Pos logit contributions
��+0.532
ivism+0.457
idem+0.443
icles+0.416
��+0.406
Neg logit contributions
 Spur-0.504
 misunderstand-0.488
fork-0.484
 twe-0.465
 lig-0.455
Token5
Token ID20
Feature activation-7.64e-01

Pos logit contributions
��+1.043
ivism+0.902
idem+0.881
icles+0.826
��+0.794
Neg logit contributions
 Spur-1.047
 misunderstand-1.023
fork-1.004
 twe-0.968
 lig-0.958
Token {
Token ID1391
Feature activation+1.26e-01

Pos logit contributions
 Spur+1.034
 misunderstand+1.003
fork+0.978
 twe+0.949
 Learns+0.926
Neg logit contributions
��-1.323
ivism-1.165
idem-1.144
icles-1.080
��-1.037
Token(
Token ID7
Feature activation+9.57e-01

Pos logit contributions
��+0.216
ivism+0.187
idem+0.183
��+0.172
icles+0.172
Neg logit contributions
 Spur-0.176
fork-0.171
 misunderstand-0.169
 twe-0.160
 lig-0.157
Token21
Token ID2481
Feature activation-2.03e-01

Pos logit contributions
��+1.567
ivism+1.374
idem+1.338
icles+1.257
��+1.211
Neg logit contributions
 Spur-1.392
 misunderstand-1.347
fork-1.322
 twe-1.277
 Learns-1.253
Tokens
Token ID82
Feature activation-1.15e-01

Pos logit contributions
 Spur+0.327
 misunderstand+0.318
fork+0.312
 twe+0.303
 Learns+0.297
Neg logit contributions
��-0.299
ivism-0.259
idem-0.252
icles-0.234
��-0.223
Token)}
Token ID38165
Feature activation-3.98e-01

Pos logit contributions
 Spur+0.147
 misunderstand+0.142
fork+0.140
 twe+0.134
 Learns+0.130
Neg logit contributions
��-0.210
ivism-0.185
idem-0.181
icles-0.171
��-0.169
Token
Token ID447
Feature activation+3.18e-01

Pos logit contributions
 Spur+0.638
 misunderstand+0.619
fork+0.595
 twe+0.586
 deem+0.573
Neg logit contributions
��-0.801
ivism-0.698
idem-0.690
icles-0.648
��-0.629
Token
Token ID247
Feature activation+4.01e-01

Pos logit contributions
��+0.226
ivism+0.191
idem+0.184
icles+0.146
ão+0.123
Neg logit contributions
 Spur-0.724
 misunderstand-0.710
fork-0.704
 twe-0.696
 Learns-0.689
Tokens
Token ID82
Feature activation+3.36e-02

Pos logit contributions
��+0.613
ivism+0.531
idem+0.520
icles+0.482
��+0.461
Neg logit contributions
 Spur-0.627
 misunderstand-0.607
fork-0.594
 twe-0.582
 lig-0.569
Token refusal
Token ID17387
Feature activation-1.74e+00

Pos logit contributions
��+0.064
ivism+0.057
idem+0.056
icles+0.053
��+0.052
Neg logit contributions
 Spur-0.040
 misunderstand-0.039
fork-0.037
 twe-0.037
 lig-0.035
Token to
Token ID284
Feature activation-2.65e-01

Pos logit contributions
 Spur+2.436
 misunderstand+2.361
fork+2.303
 twe+2.244
 lig+2.200
Neg logit contributions
��-2.910
ivism-2.541
idem-2.496
icles-2.345
��-2.258
Token denounce
Token ID38639
Feature activation-2.24e+00

Pos logit contributions
 Spur+0.231
 misunderstand+0.221
fork+0.208
 twe+0.201
 deem+0.196
Neg logit contributions
��-0.593
ivism-0.536
idem-0.526
icles-0.504
��-0.493
Token Donald
Token ID3759
Feature activation+2.03e+00

Pos logit contributions
 Spur+3.633
 misunderstand+3.539
 twe+3.379
fork+3.361
 lig+3.290
Neg logit contributions
��-3.246
idem-2.727
ivism-2.724
icles-2.535
��-2.389
Token Trump
Token ID1301
Feature activation+1.90e+00

Pos logit contributions
��+1.959
ivism+1.562
idem+1.451
icles+1.289
��+1.198
Neg logit contributions
 Spur-4.282
 misunderstand-4.184
fork-4.143
 twe-4.026
 Learns-4.011
Token
Token ID447
Feature activation-8.80e-02

Pos logit contributions
��+3.122
ivism+2.750
idem+2.677
icles+2.515
��+2.412
Neg logit contributions
 Spur-2.734
 misunderstand-2.650
fork-2.597
 twe-2.513
 Learns-2.457
Token
Token ID247
Feature activation+1.17e-01

Pos logit contributions
 Spur+0.213
 misunderstand+0.209
fork+0.207
 twe+0.205
 Learns+0.203
Neg logit contributions
��-0.048
ivism-0.039
idem-0.038
icles-0.028
imus-0.022
Tokens
Token ID82
Feature activation-1.87e-01

Pos logit contributions
��+0.180
ivism+0.156
idem+0.153
icles+0.142
��+0.136
Neg logit contributions
 Spur-0.181
 misunderstand-0.175
fork-0.171
 twe-0.168
 lig-0.164
Token,
Token ID11
Feature activation-9.58e-02

Pos logit contributions
��+2.302
ivism+2.023
idem+1.974
icles+1.853
��+1.777
Neg logit contributions
 Spur-2.031
 misunderstand-1.970
fork-1.930
 twe-1.867
 Learns-1.829
Token procedural
Token ID27931
Feature activation+1.57e+00

Pos logit contributions
 Spur+0.133
 misunderstand+0.129
fork+0.125
 twe+0.123
 lig+0.120
Neg logit contributions
��-0.164
ivism-0.145
idem-0.141
icles-0.132
��-0.129
Token maps
Token ID8739
Feature activation-6.89e-02

Pos logit contributions
��+3.371
ivism+3.065
idem+3.009
icles+2.875
��+2.787
Neg logit contributions
 Spur-1.464
 misunderstand-1.391
fork-1.347
 twe-1.275
 Learns-1.235
Token were
Token ID547
Feature activation-1.06e-01

Pos logit contributions
 Spur+0.082
 misunderstand+0.080
fork+0.077
 twe+0.075
 lig+0.073
Neg logit contributions
��-0.132
ivism-0.116
idem-0.115
icles-0.107
��-0.106
Token already
Token ID1541
Feature activation-8.30e-01

Pos logit contributions
 Spur+0.131
 misunderstand+0.127
fork+0.123
 twe+0.121
 lig+0.117
Neg logit contributions
��-0.197
ivism-0.176
idem-0.172
icles-0.162
��-0.158
Token bog
Token ID22922
Feature activation-3.39e+00

Pos logit contributions
 Spur+1.026
 misunderstand+1.001
fork+0.965
 twe+0.957
 lig+0.932
Neg logit contributions
��-1.531
ivism-1.368
idem-1.337
icles-1.261
��-1.236
Tokenging
Token ID2667
Feature activation-1.08e+00

Pos logit contributions
 Spur+2.814
 misunderstand+2.700
fork+2.695
 twe+2.543
 lig+2.458
Neg logit contributions
��-7.580
ivism-6.819
idem-6.573
��-6.376
icles-6.253
Token us
Token ID514
Feature activation+1.05e+00

Pos logit contributions
 Spur+1.327
 misunderstand+1.291
fork+1.244
 twe+1.212
 lig+1.188
Neg logit contributions
��-2.005
ivism-1.764
idem-1.730
icles-1.633
��-1.598
Token in
Token ID287
Feature activation+2.25e-01

Pos logit contributions
��+1.925
ivism+1.692
idem+1.660
icles+1.568
��+1.531
Neg logit contributions
 Spur-1.330
 misunderstand-1.286
fork-1.248
 twe-1.213
 lig-1.184
Token two
Token ID734
Feature activation-7.75e-01

Pos logit contributions
��+0.400
ivism+0.352
idem+0.345
icles+0.325
��+0.316
Neg logit contributions
 Spur-0.297
 misunderstand-0.291
fork-0.281
 twe-0.274
 lig-0.267
Token factors
Token ID5087
Feature activation+3.95e-01

Pos logit contributions
 Spur+0.920
 misunderstand+0.910
fork+0.865
 twe+0.841
 lig+0.818
Neg logit contributions
��-1.477
ivism-1.326
idem-1.285
icles-1.218
��-1.191
Token[
Token ID58
Feature activation+1.24e+00

Pos logit contributions
 Spur+0.088
 misunderstand+0.085
fork+0.082
 twe+0.080
 Learns+0.079
Neg logit contributions
��-0.115
ivism-0.102
idem-0.099
icles-0.094
��-0.091
Token20
Token ID1238
Feature activation-5.79e-01

Pos logit contributions
��+1.758
ivism+1.527
idem+1.497
icles+1.358
ão+1.296
Neg logit contributions
 Spur-2.043
 misunderstand-1.991
fork-1.921
 twe-1.890
 Learns-1.858
Token]
Token ID60
Feature activation-7.06e-01

Pos logit contributions
 Spur+0.777
 misunderstand+0.753
fork+0.729
 twe+0.697
 lig+0.695
Neg logit contributions
��-1.022
ivism-0.889
idem-0.872
icles-0.820
��-0.810
Token\n
Token ID198
Feature activation+9.77e-02

Pos logit contributions
 Spur+0.852
 misunderstand+0.801
fork+0.773
 Learns+0.748
 twe+0.741
Neg logit contributions
��-1.355
ivism-1.200
idem-1.173
icles-1.114
��-1.095
Token\n
Token ID198
Feature activation-5.06e-02

Pos logit contributions
��+0.181
ivism+0.159
idem+0.156
icles+0.147
��+0.146
Neg logit contributions
 Spur-0.123
 misunderstand-0.118
fork-0.117
 twe-0.111
 lig-0.109
TokenSo
Token ID2396
Feature activation-2.71e+00

Pos logit contributions
 Spur+0.064
 misunderstand+0.062
fork+0.061
 twe+0.058
 lig+0.057
Neg logit contributions
��-0.092
ivism-0.081
idem-0.079
icles-0.075
��-0.074
Token the
Token ID262
Feature activation-1.79e-02

Pos logit contributions
 Spur+3.494
 misunderstand+3.471
fork+3.233
 twe+3.173
 deem+3.136
Neg logit contributions
��-4.707
ivism-4.156
idem-4.075
icles-3.855
��-3.754
Token study
Token ID2050
Feature activation-5.43e-01

Pos logit contributions
 Spur+0.021
 misunderstand+0.020
fork+0.019
 twe+0.019
 lig+0.018
Neg logit contributions
��-0.035
ivism-0.031
idem-0.031
icles-0.029
��-0.028
Token of
Token ID286
Feature activation+2.43e-01

Pos logit contributions
 Spur+0.695
 misunderstand+0.672
fork+0.654
 twe+0.632
 Learns+0.616
Neg logit contributions
��-0.983
ivism-0.868
idem-0.852
icles-0.803
��-0.778
Token mythology
Token ID25871
Feature activation-1.83e+00

Pos logit contributions
��+0.561
ivism+0.512
idem+0.503
icles+0.482
��+0.471
Neg logit contributions
 Spur-0.189
 misunderstand-0.180
fork-0.169
 twe-0.161
 lig-0.155
Token
Token ID960
Feature activation-9.22e-01

Pos logit contributions
 Spur+2.270
 misunderstand+2.238
fork+2.101
 twe+2.090
 lig+2.039
Neg logit contributions
��-3.399
idem-2.908
ivism-2.889
icles-2.741
��-2.728
Token goes
Token ID2925
Feature activation+1.19e+00

Pos logit contributions
 Spur+0.931
 misunderstand+0.892
fork+0.863
 twe+0.843
 Learns+0.823
Neg logit contributions
��-1.289
ivism-1.134
idem-1.107
icles-1.037
��-1.033
Token and
Token ID290
Feature activation+2.23e-01

Pos logit contributions
��+2.063
ivism+1.821
idem+1.782
icles+1.674
��+1.619
Neg logit contributions
 Spur-1.613
 misunderstand-1.556
fork-1.521
 twe-1.476
 lig-1.437
Token makes
Token ID1838
Feature activation-6.49e-02

Pos logit contributions
��+0.428
ivism+0.381
idem+0.374
icles+0.353
��+0.348
Neg logit contributions
 Spur-0.264
 misunderstand-0.256
fork-0.248
 twe-0.239
 Learns-0.234
Token Silence
Token ID26354
Feature activation+4.61e-02

Pos logit contributions
 Spur+0.092
 misunderstand+0.089
fork+0.086
 twe+0.084
 lig+0.082
Neg logit contributions
��-0.109
ivism-0.096
idem-0.094
icles-0.087
��-0.086
Token.
Token ID13
Feature activation+7.51e-02

Pos logit contributions
��+0.083
ivism+0.073
idem+0.072
icles+0.068
��+0.066
Neg logit contributions
 Spur-0.059
 misunderstand-0.057
fork-0.055
 twe-0.054
 lig-0.053
Token So
Token ID1406
Feature activation-2.53e+00

Pos logit contributions
��+0.142
ivism+0.126
idem+0.123
icles+0.117
��+0.115
Neg logit contributions
 Spur-0.092
 misunderstand-0.086
fork-0.085
 twe-0.082
 Learns-0.081
Token I
Token ID314
Feature activation-8.54e-01

Pos logit contributions
 Spur+3.289
 misunderstand+3.195
fork+3.107
 twe+3.035
 lig+2.927
Neg logit contributions
��-4.426
ivism-3.882
idem-3.798
��-3.603
icles-3.552
Token think
Token ID892
Feature activation-1.07e-03

Pos logit contributions
 Spur+0.826
 misunderstand+0.816
fork+0.761
 twe+0.747
 deem+0.726
Neg logit contributions
��-1.802
ivism-1.622
idem-1.582
icles-1.519
��-1.494
Token in
Token ID287
Feature activation+3.56e-01

Pos logit contributions
 Spur+0.001
 misunderstand+0.001
fork+0.001
 twe+0.001
 lig+0.001
Neg logit contributions
��-0.002
ivism-0.002
idem-0.002
icles-0.001
��-0.001
Token a
Token ID257
Feature activation+3.85e-02

Pos logit contributions
��+0.651
ivism+0.576
idem+0.564
icles+0.534
��+0.523
Neg logit contributions
 Spur-0.454
 misunderstand-0.440
fork-0.428
 twe-0.414
 lig-0.403
Token perfect
Token ID2818
Feature activation+4.79e-01

Pos logit contributions
��+0.073
ivism+0.064
idem+0.063
icles+0.060
��+0.059
Neg logit contributions
 Spur-0.047
 misunderstand-0.045
fork-0.044
 twe-0.043
 lig-0.042
Token're
Token ID821
Feature activation-1.02e+00

Pos logit contributions
 Spur+0.478
 misunderstand+0.468
fork+0.446
 twe+0.432
 deem+0.418
Neg logit contributions
��-0.899
ivism-0.808
idem-0.788
icles-0.752
��-0.737
Token in
Token ID287
Feature activation+2.10e-01

Pos logit contributions
 Spur+1.268
 misunderstand+1.253
fork+1.196
 twe+1.176
 deem+1.148
Neg logit contributions
��-1.860
ivism-1.672
idem-1.625
icles-1.551
��-1.509
Token luck
Token ID8458
Feature activation-1.41e-01

Pos logit contributions
��+0.384
ivism+0.342
idem+0.334
icles+0.316
��+0.307
Neg logit contributions
 Spur-0.266
 misunderstand-0.259
fork-0.251
 twe-0.242
 lig-0.235
Token.
Token ID13
Feature activation-2.23e-01

Pos logit contributions
 Spur+0.207
 misunderstand+0.202
fork+0.197
 twe+0.192
 lig+0.187
Neg logit contributions
��-0.228
ivism-0.198
idem-0.193
icles-0.180
��-0.176
Token Turn
Token ID6756
Feature activation-2.78e+00

Pos logit contributions
 Spur+0.268
 misunderstand+0.254
fork+0.250
 twe+0.239
 Learns+0.235
Neg logit contributions
��-0.426
ivism-0.379
idem-0.371
icles-0.352
��-0.347
Token on
Token ID319
Feature activation-2.31e+00

Pos logit contributions
 Spur+3.609
 misunderstand+3.501
fork+3.424
 twe+3.346
 lig+3.254
Neg logit contributions
��-4.876
ivism-4.275
idem-4.170
icles-3.933
��-3.873
Token the
Token ID262
Feature activation-2.72e-01

Pos logit contributions
 Spur+3.152
 misunderstand+3.053
fork+2.986
 twe+2.944
 lig+2.849
Neg logit contributions
��-3.920
ivism-3.454
idem-3.338
icles-3.150
��-3.112
Token monitor
Token ID5671
Feature activation+1.03e+00

Pos logit contributions
 Spur+0.258
 misunderstand+0.247
fork+0.238
 twe+0.232
 lig+0.219
Neg logit contributions
��-0.585
ivism-0.530
idem-0.516
icles-0.497
��-0.486
Token and
Token ID290
Feature activation-7.91e-02

Pos logit contributions
��+1.812
ivism+1.601
idem+1.563
icles+1.474
��+1.435
Neg logit contributions
 Spur-1.374
 misunderstand-1.329
fork-1.297
 twe-1.266
 lig-1.230
Token note
Token ID3465
Feature activation+9.89e-01

Pos logit contributions
 Spur+0.087
 misunderstand+0.084
fork+0.081
 twe+0.079
 lig+0.076
Neg logit contributions
��-0.158
ivism-0.142
idem-0.138
icles-0.132
��-0.129
Token the
Token ID262
Feature activation-1.42e-01

Pos logit contributions
��+1.624
ivism+1.425
idem+1.389
icles+1.305
��+1.258
Neg logit contributions
 Spur-1.431
 misunderstand-1.390
fork-1.358
 twe-1.318
 lig-1.287