TOP ACTIVATIONS
MAX = 0.536

Fire j ail . Dist ribute your desktop Linux application in
, destroy it or T ribute it with a card effect
s win over Wer rib ee on Saturday .
to Ritual Summon and T ribute your opponent s
, change - up or curve ball by moving the Game
Ch oya Week end T ear V oid Ray J KS
Week end T ear V oid Ray
The tradition of " wife inheritance " says that the brother
the otherwise rect il in ear facade . The designers explained
ard Span , Jays on Wer th . Where would that
. Both reports discuss Puerto Ric ans in Philly mirror ing
the rule prefer composition over inheritance , he made an is
and critical essays by Scott Ric her t and David Gordon
former executive at Bear St ear ns , a Goldman competitor
see the handcuffed man reapp ear . He was accompanied by
], with a very gentle curve . We can
for being able to T ribute any opponent s
due to a significant learning curve in teaching . Ch urn
track session at Circuit Paul Ric ard . The
high price - to - ear nings multiple may be justified

MIN ACTIVATIONS
MIN = -0.649

ρ π α γ ς ( Ant
ε ι ι
h . <|endoftext|> NEW DEL HI : In a setback to
On Fox News , Shepard Smith led the coverage in
), who was identified by Honest Reporting as the information and
an accident , according to Honest Reporting . Al ze er
ov . Ser hi y Kaz arov , former
photo ) Del hi Law Minister J it ender
s refinery business . Ser hi y Kaz arov is former
Gol ov ko , Ser hi y Kaz arov , Vital
" ann data su k hi b h ava " and
Son ip at , B hi w ani , J ind
... k uch h b hi . al ag se int
ana s Gar hi H ars aru Junction .
: Sh ri Pre mb hi ks hu ji Marg ,
Roh t ak , B hi w ani , J h
ago when se ared a hi was a trend , they
baseball , saying hi to everybody ,
ama i , k ū pa a , al
/ s gb ( d pa , br , ok to

INTERVAL 0.240 - 0.536
CONTAINS 1.181%

Clement 2009 Jim Parsons 2009 Tony Shal
Although U em atsu s
, a professor in the U - M Department of Atmospheric
are ) great characters ," Walker said . "
arc a felt the best tribute a fellow writer could pay

INTERVAL -0.057 - 0.240
CONTAINS 80.133%

. The independence debate shows us that we need a politics
increases to 1 , 709 m hz with a 1 ,
re ind eer her der , Yuri Kh udi , and
place ." More details and further peace offerings were being considered
7 s )} B c 5 { ( 4 s )}

INTERVAL -0.353 - -0.057
CONTAINS 18.579%

. And it plays really well , as it
The Aub rey / M atur in Novel s , as
R id ley , who is battling
knew that with the W KO , I needed to build
identified in time for the season opener , he replied :

INTERVAL -0.649 - -0.353
CONTAINS 0.107%

Do About It by Marc Goodman , available February 24 .
demands , such as bringing integrity back to government science ,
), who was identified by Honest Reporting as the information and
Calgary ( which has the Hyp ac ros aurus egg )
1950 NFL Championship Game arg uably the greatest game ever
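
Note on the intervals: the four INTERVAL ranges above look like equal-width bins spanning MIN to MAX, and the CONTAINS percentages look like they partition all sampled activations. That is an inference from the numbers shown, not something the dump states. A minimal sketch that checks both readings, with all values copied from the headers above and the helper name `equal_width_edges` chosen only for illustration:

```python
# Values copied from the MIN/MAX headers and the INTERVAL/CONTAINS lines.
act_min, act_max = -0.649, 0.536
reported = [
    # (lower edge, upper edge, percent of activations)
    (0.240, 0.536, 1.181),
    (-0.057, 0.240, 80.133),
    (-0.353, -0.057, 18.579),
    (-0.649, -0.353, 0.107),
]


def equal_width_edges(lo, hi, n_bins):
    """Return the n_bins + 1 edges of equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Assumption: the dashboard used equal-width bins between MIN and MAX.
edges = equal_width_edges(act_min, act_max, len(reported))

# Distinct edges as printed in the dump (rounded to three decimals there).
reported_edges = sorted({e for lo, hi, _ in reported for e in (lo, hi)})

# Largest disagreement between computed and printed edges.
max_edge_gap = max(abs(a - b) for a, b in zip(edges, reported_edges))

# Sum of the CONTAINS percentages.
total_pct = sum(pct for _, _, pct in reported)

print("computed edges:", [round(e, 5) for e in edges])
print("largest gap to printed edges:", max_edge_gap)
print("sum of percentages:", total_pct)
```

If the equal-width reading holds, the largest gap stays within the three-decimal rounding of the printed edges and the percentages sum to roughly 100.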
Token Fire
Token ID3764
Feature activation+1.39e-01

Pos logit contributions
+0.109
HI+0.100
+0.099
 Shepard+0.092
 Honest+0.090
Neg logit contributions
ribute-0.090
hemat-0.090
adem-0.090
theless-0.088
hower-0.087
Tokenj
Token ID73
Feature activation-2.11e-01

Pos logit contributions
ribute+0.096
hemat+0.095
adem+0.095
theless+0.094
hower+0.092
Neg logit contributions
-0.119
HI-0.110
-0.107
 Shepard-0.100
 Honest-0.097
Tokenail
Token ID603
Feature activation-2.48e-02

Pos logit contributions
+0.175
HI+0.161
+0.158
 Shepard+0.147
 Honest+0.143
Neg logit contributions
ribute-0.149
hemat-0.148
adem-0.148
theless-0.146
 inheritance-0.144
Token.
Token ID764
Feature activation+2.46e-01

Pos logit contributions
+0.020
HI+0.019
+0.018
 Shepard+0.017
 Honest+0.017
Neg logit contributions
ribute-0.018
hemat-0.018
adem-0.018
theless-0.017
 inheritance-0.017
Token Dist
Token ID4307
Feature activation+5.72e-02

Pos logit contributions
ribute+0.170
hemat+0.169
adem+0.168
theless+0.166
 inheritance+0.164
Neg logit contributions
-0.208
HI-0.191
-0.188
 Shepard-0.176
 Honest-0.171
Tokenribute
Token ID4163
Feature activation+5.36e-01

Pos logit contributions
theless+0.022
hemat+0.022
adem+0.022
hower+0.021
ribute+0.021
Neg logit contributions
-0.066
HI-0.061
-0.061
 Shepard-0.058
 Honest-0.057
Token your
Token ID534
Feature activation-2.76e-02

Pos logit contributions
ribute+0.374
adem+0.370
hemat+0.370
theless+0.365
 inheritance+0.361
Neg logit contributions
-0.451
HI-0.413
-0.407
 Shepard-0.380
 Honest-0.369
Token desktop
Token ID11364
Feature activation+2.23e-02

Pos logit contributions
+0.023
HI+0.021
+0.021
 Shepard+0.020
 Honest+0.019
Neg logit contributions
ribute-0.019
hemat-0.019
adem-0.019
theless-0.019
 inheritance-0.018
Token Linux
Token ID7020
Feature activation-1.23e-01

Pos logit contributions
ribute+0.016
hemat+0.016
adem+0.016
theless+0.015
 inheritance+0.015
Neg logit contributions
-0.019
HI-0.017
-0.017
 Shepard-0.016
 Honest-0.015
Token application
Token ID3586
Feature activation-1.04e-02

Pos logit contributions
+0.106
HI+0.097
+0.095
 Shepard+0.089
 Honest+0.087
Neg logit contributions
ribute-0.084
hemat-0.083
adem-0.083
theless-0.082
 inheritance-0.081
Token in
Token ID287
Feature activation-1.30e-02

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.007
 Honest+0.007
Neg logit contributions
ribute-0.007
hemat-0.007
adem-0.007
theless-0.007
 inheritance-0.007
Token,
Token ID11
Feature activation+8.71e-03

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.008
adem-0.008
hemat-0.008
theless-0.008
 inheritance-0.008
Token destroy
Token ID4117
Feature activation+9.24e-02

Pos logit contributions
ribute+0.006
hemat+0.006
adem+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.007
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token it
Token ID340
Feature activation+9.75e-03

Pos logit contributions
ribute+0.066
hemat+0.065
adem+0.065
theless+0.064
 inheritance+0.064
Neg logit contributions
-0.076
HI-0.070
-0.069
 Shepard-0.064
 Honest-0.062
Token or
Token ID393
Feature activation-6.89e-02

Pos logit contributions
ribute+0.006
hemat+0.006
adem+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
Token T
Token ID309
Feature activation+1.36e-02

Pos logit contributions
+0.059
HI+0.054
+0.053
 Shepard+0.050
 Honest+0.048
Neg logit contributions
ribute-0.047
hemat-0.047
adem-0.047
theless-0.046
 inheritance-0.045
Tokenribute
Token ID4163
Feature activation+5.34e-01

Pos logit contributions
theless+0.000
adem+0.000
hemat+0.000
hower+0.000
 inheritance+0.000
Neg logit contributions
-0.021
HI-0.020
-0.019
 Shepard-0.019
 Honest-0.018
Token it
Token ID340
Feature activation+6.81e-03

Pos logit contributions
ribute+0.376
hemat+0.371
adem+0.370
theless+0.366
 inheritance+0.362
Neg logit contributions
-0.447
HI-0.411
-0.403
 Shepard-0.375
 Honest-0.364
Token with
Token ID351
Feature activation+6.57e-02

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
 inheritance+0.004
Neg logit contributions
-0.006
HI-0.005
-0.005
 Shepard-0.005
 Honest-0.005
Token a
Token ID257
Feature activation+3.36e-02

Pos logit contributions
ribute+0.046
hemat+0.045
adem+0.045
theless+0.045
 inheritance+0.044
Neg logit contributions
-0.055
HI-0.051
-0.050
 Shepard-0.047
 Honest-0.045
Token card
Token ID2657
Feature activation-3.58e-02

Pos logit contributions
adem+0.022
hemat+0.022
ribute+0.022
theless+0.022
 inheritance+0.021
Neg logit contributions
-0.030
HI-0.027
-0.027
 Shepard-0.025
 Honest-0.024
Token effect
Token ID1245
Feature activation+3.90e-02

Pos logit contributions
+0.030
HI+0.028
+0.027
 Shepard+0.025
 Honest+0.025
Neg logit contributions
ribute-0.025
adem-0.025
hemat-0.025
theless-0.024
 inheritance-0.024
Token
Token ID447
Feature activation+3.66e-03

Pos logit contributions
ribute+0.008
hemat+0.007
adem+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.008
 Honest-0.008
Token
Token ID247
Feature activation+6.39e-03

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
Tokens
Token ID82
Feature activation-6.88e-03

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.005
-0.005
 Shepard-0.005
 Honest-0.004
Token win
Token ID1592
Feature activation-7.28e-02

Pos logit contributions
+0.006
HI+0.005
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
Token over
Token ID625
Feature activation+5.25e-02

Pos logit contributions
+0.066
HI+0.061
+0.060
 Shepard+0.056
 Honest+0.055
Neg logit contributions
ribute-0.046
hemat-0.046
adem-0.046
theless-0.045
 inheritance-0.044
Token Wer
Token ID27648
Feature activation+5.32e-01

Pos logit contributions
ribute+0.039
hemat+0.038
adem+0.038
theless+0.038
 inheritance+0.037
Neg logit contributions
-0.042
HI-0.038
-0.038
 Shepard-0.035
 Honest-0.034
Tokenrib
Token ID822
Feature activation+3.02e-01

Pos logit contributions
ribute+0.165
adem+0.163
hemat+0.162
theless+0.158
 inheritance+0.154
Neg logit contributions
-0.653
HI-0.616
-0.609
 Shepard-0.582
 Honest-0.570
Tokenee
Token ID1453
Feature activation-2.24e-01

Pos logit contributions
hemat+0.314
ribute+0.313
adem+0.311
theless+0.311
hower+0.307
Neg logit contributions
-0.154
HI-0.132
-0.129
 Shepard-0.117
phi-0.106
Token on
Token ID319
Feature activation-1.19e-02

Pos logit contributions
+0.190
HI+0.175
+0.172
 Shepard+0.160
 Honest+0.156
Neg logit contributions
ribute-0.155
adem-0.154
hemat-0.154
theless-0.152
 inheritance-0.150
Token Saturday
Token ID3909
Feature activation+4.03e-02

Pos logit contributions
+0.010
HI+0.010
+0.009
 Shepard+0.009
phi+0.008
Neg logit contributions
ribute-0.008
hemat-0.008
adem-0.008
theless-0.008
 inheritance-0.008
Token.
Token ID13
Feature activation+7.05e-03

Pos logit contributions
ribute+0.028
adem+0.028
hemat+0.027
theless+0.027
 inheritance+0.027
Neg logit contributions
-0.034
HI-0.031
-0.031
 Shepard-0.029
 Honest-0.028
Token to
Token ID284
Feature activation-1.96e-03

Pos logit contributions
+0.058
HI+0.054
+0.053
 Shepard+0.049
 Honest+0.048
Neg logit contributions
ribute-0.047
hemat-0.047
adem-0.047
theless-0.046
 inheritance-0.046
Token Ritual
Token ID35179
Feature activation+2.49e-01

Pos logit contributions
+0.002
HI+0.002
+0.002
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token Summon
Token ID19515
Feature activation+1.20e-01

Pos logit contributions
ribute+0.136
adem+0.136
hemat+0.136
theless+0.133
 inheritance+0.131
Neg logit contributions
-0.247
HI-0.229
-0.226
 Shepard-0.213
 Honest-0.208
Token and
Token ID290
Feature activation-1.32e-04

Pos logit contributions
ribute+0.084
adem+0.084
hemat+0.084
theless+0.082
 inheritance+0.081
Neg logit contributions
-0.100
HI-0.091
-0.090
 Shepard-0.084
 Honest-0.081
Token T
Token ID309
Feature activation+8.44e-03

Pos logit contributions
+0.000
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
Tokenribute
Token ID4163
Feature activation+5.30e-01

Pos logit contributions
theless+0.000
adem+0.000
hemat+0.000
hower+0.000
 inheritance+0.000
Neg logit contributions
-0.013
HI-0.012
-0.012
 Shepard-0.012
 Honest-0.011
Token your
Token ID534
Feature activation-3.47e-02

Pos logit contributions
ribute+0.375
hemat+0.370
adem+0.369
theless+0.365
 inheritance+0.361
Neg logit contributions
-0.442
HI-0.405
-0.398
 Shepard-0.370
 Honest-0.359
Token opponent
Token ID6125
Feature activation-9.27e-02

Pos logit contributions
+0.029
HI+0.026
+0.026
 Shepard+0.024
 Honest+0.023
Neg logit contributions
ribute-0.025
hemat-0.025
adem-0.025
theless-0.024
 inheritance-0.024
Token
Token ID447
Feature activation-1.03e-03

Pos logit contributions
+0.081
HI+0.075
+0.074
 Shepard+0.069
 Honest+0.067
Neg logit contributions
ribute-0.061
hemat-0.061
adem-0.061
theless-0.060
 inheritance-0.059
Token
Token ID247
Feature activation-8.39e-04

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
 inheritance-0.001
rue-0.001
Tokens
Token ID82
Feature activation-8.53e-03

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token,
Token ID11
Feature activation+8.99e-03

Pos logit contributions
ribute+0.143
hemat+0.142
adem+0.142
theless+0.140
 inheritance+0.138
Neg logit contributions
-0.168
HI-0.154
-0.151
 Shepard-0.141
 Honest-0.137
Token change
Token ID1487
Feature activation+9.21e-04

Pos logit contributions
ribute+0.006
adem+0.006
hemat+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token-
Token ID12
Feature activation-6.51e-03

Pos logit contributions
ribute+0.001
hemat+0.001
adem+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.001
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Tokenup
Token ID929
Feature activation-1.93e-01

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.006
adem-0.006
hemat-0.006
theless-0.006
rue-0.006
Token or
Token ID393
Feature activation-6.97e-02

Pos logit contributions
+0.158
HI+0.145
+0.142
 Shepard+0.133
 Honest+0.129
Neg logit contributions
ribute-0.138
hemat-0.137
adem-0.137
theless-0.135
 inheritance-0.133
Token curve
Token ID12133
Feature activation+5.30e-01

Pos logit contributions
+0.064
HI+0.059
+0.058
 Shepard+0.055
 Honest+0.054
Neg logit contributions
ribute-0.043
hemat-0.043
adem-0.042
theless-0.042
hower-0.041
Tokenball
Token ID1894
Feature activation-4.53e-03

Pos logit contributions
ribute+0.378
adem+0.375
hemat+0.374
theless+0.369
 inheritance+0.366
Neg logit contributions
-0.437
HI-0.399
-0.394
 Shepard-0.365
 Honest-0.356
Token by
Token ID416
Feature activation-1.15e-01

Pos logit contributions
+0.004
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token moving
Token ID3867
Feature activation+2.06e-02

Pos logit contributions
+0.099
HI+0.090
+0.089
 Shepard+0.083
 Honest+0.081
Neg logit contributions
ribute-0.079
hemat-0.078
adem-0.078
theless-0.077
 inheritance-0.076
Token the
Token ID262
Feature activation+7.62e-03

Pos logit contributions
ribute+0.015
hemat+0.015
adem+0.015
theless+0.015
 inheritance+0.015
Neg logit contributions
-0.017
HI-0.015
-0.015
 Shepard-0.014
 Honest-0.013
Token Game
Token ID3776
Feature activation-2.15e-01

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.007
HI-0.006
-0.006
 Shepard-0.006
 Honest-0.005
TokenCh
Token ID1925
Feature activation+4.21e-02

Pos logit contributions
+0.076
HI+0.069
+0.068
 Shepard+0.063
 Honest+0.061
Neg logit contributions
ribute-0.068
hemat-0.068
adem-0.067
theless-0.067
 inheritance-0.066
Tokenoya
Token ID23790
Feature activation-1.15e-01

Pos logit contributions
ribute+0.033
adem+0.033
hemat+0.033
theless+0.033
rue+0.032
Neg logit contributions
-0.031
HI-0.028
-0.028
 Shepard-0.026
 Honest-0.025
TokenWeek
Token ID20916
Feature activation-9.21e-02

Pos logit contributions
+0.094
HI+0.085
+0.084
 Shepard+0.078
 Honest+0.076
Neg logit contributions
ribute-0.083
adem-0.083
hemat-0.083
theless-0.081
 inheritance-0.080
Tokenend
Token ID437
Feature activation+6.77e-02

Pos logit contributions
+0.084
HI+0.077
+0.076
 Shepard+0.072
 Honest+0.070
Neg logit contributions
ribute-0.058
adem-0.057
hemat-0.057
theless-0.056
 inheritance-0.056
TokenT
Token ID51
Feature activation+7.50e-02

Pos logit contributions
ribute+0.046
hemat+0.046
adem+0.045
theless+0.045
 inheritance+0.044
Neg logit contributions
-0.058
HI-0.054
-0.053
 Shepard-0.049
 Honest-0.048
Tokenear
Token ID451
Feature activation+5.29e-01

Pos logit contributions
theless+0.026
ribute+0.026
hemat+0.026
adem+0.026
 inheritance+0.025
Neg logit contributions
-0.089
HI-0.084
-0.083
 Shepard-0.079
phi-0.077
TokenV
Token ID53
Feature activation+2.64e-02

Pos logit contributions
ribute+0.376
hemat+0.373
adem+0.373
theless+0.368
 inheritance+0.363
Neg logit contributions
-0.439
HI-0.401
-0.395
 Shepard-0.368
 Honest-0.357
Tokenoid
Token ID1868
Feature activation+9.16e-03

Pos logit contributions
ribute+0.017
adem+0.017
hemat+0.017
theless+0.016
 inheritance+0.016
Neg logit contributions
-0.024
HI-0.022
-0.022
 Shepard-0.020
 Honest-0.020
TokenRay
Token ID19591
Feature activation+1.51e-01

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.008
 Honest-0.008
TokenJ
Token ID41
Feature activation-3.70e-02

Pos logit contributions
ribute+0.107
hemat+0.106
adem+0.106
theless+0.105
 inheritance+0.103
Neg logit contributions
-0.125
HI-0.115
-0.113
 Shepard-0.105
 Honest-0.102
TokenKS
Token ID27015
Feature activation-2.77e-01

Pos logit contributions
+0.018
HI+0.015
+0.015
 Shepard+0.013
 Honest+0.012
Neg logit contributions
ribute-0.039
adem-0.039
hemat-0.039
theless-0.038
 inheritance-0.038
TokenWeek
Token ID20916
Feature activation-8.78e-02

Pos logit contributions
ribute+0.004
adem+0.004
hemat+0.004
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.003
 Honest-0.003
Tokenend
Token ID437
Feature activation+7.28e-02

Pos logit contributions
+0.078
HI+0.071
+0.070
 Shepard+0.066
 Honest+0.064
Neg logit contributions
ribute-0.057
hemat-0.057
adem-0.057
theless-0.056
 inheritance-0.055
Token
Token ID198
Feature activation-2.97e-03

Pos logit contributions
ribute+0.052
hemat+0.052
adem+0.051
theless+0.051
 inheritance+0.050
Neg logit contributions
-0.060
HI-0.055
-0.054
 Shepard-0.050
 Honest-0.049
Token
Token ID198
Feature activation+3.04e-03

Pos logit contributions
+0.003
HI+0.002
+0.002
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.002
hemat-0.002
adem-0.002
theless-0.002
 inheritance-0.002
TokenT
Token ID51
Feature activation+7.80e-02

Pos logit contributions
ribute+0.002
adem+0.002
hemat+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Tokenear
Token ID451
Feature activation+5.28e-01

Pos logit contributions
ribute+0.055
hemat+0.054
adem+0.054
theless+0.054
 inheritance+0.053
Neg logit contributions
-0.065
HI-0.060
-0.059
 Shepard-0.055
 Honest-0.053
Token
Token ID198
Feature activation+4.38e-03

Pos logit contributions
ribute+0.367
adem+0.364
hemat+0.363
theless+0.358
 inheritance+0.354
Neg logit contributions
-0.446
HI-0.408
-0.402
 Shepard-0.375
 Honest-0.364
Token
Token ID198
Feature activation-3.76e-03

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
TokenV
Token ID53
Feature activation+2.91e-02

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Tokenoid
Token ID1868
Feature activation+6.21e-03

Pos logit contributions
ribute+0.019
hemat+0.018
adem+0.018
theless+0.018
 inheritance+0.018
Neg logit contributions
-0.026
HI-0.024
-0.024
 Shepard-0.022
 Honest-0.022
TokenRay
Token ID19591
Feature activation+1.55e-01

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.005
-0.005
 Shepard-0.004
 Honest-0.004
TokenThe
Token ID464
Feature activation-6.54e-02

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token tradition
Token ID6761
Feature activation+2.44e-01

Pos logit contributions
+0.057
HI+0.052
+0.051
 Shepard+0.048
 Honest+0.047
Neg logit contributions
ribute-0.044
hemat-0.044
adem-0.044
theless-0.043
 inheritance-0.042
Token of
Token ID286
Feature activation+8.10e-03

Pos logit contributions
ribute+0.167
adem+0.166
hemat+0.166
theless+0.164
 inheritance+0.162
Neg logit contributions
-0.208
HI-0.191
-0.188
 Shepard-0.175
 Honest-0.170
Token "
Token ID366
Feature activation+3.57e-02

Pos logit contributions
ribute+0.006
hemat+0.006
adem+0.006
theless+0.006
 inheritance+0.005
Neg logit contributions
-0.007
HI-0.006
-0.006
 Shepard-0.006
 Honest-0.006
Tokenwife
Token ID22095
Feature activation+8.02e-02

Pos logit contributions
ribute+0.025
hemat+0.025
adem+0.025
theless+0.025
 inheritance+0.024
Neg logit contributions
-0.030
HI-0.027
-0.027
 Shepard-0.025
 Honest-0.024
Token inheritance
Token ID24155
Feature activation+5.26e-01

Pos logit contributions
ribute+0.053
hemat+0.052
adem+0.052
theless+0.052
 inheritance+0.051
Neg logit contributions
-0.071
HI-0.065
-0.064
 Shepard-0.060
 Honest-0.058
Token"
Token ID1
Feature activation+2.85e-02

Pos logit contributions
ribute+0.347
hemat+0.344
adem+0.344
theless+0.339
 inheritance+0.335
Neg logit contributions
-0.462
HI-0.425
-0.418
 Shepard-0.392
 Honest-0.381
Token says
Token ID1139
Feature activation-1.41e-01

Pos logit contributions
ribute+0.020
hemat+0.020
adem+0.020
theless+0.019
 inheritance+0.019
Neg logit contributions
-0.024
HI-0.022
-0.022
 Shepard-0.020
 Honest-0.020
Token that
Token ID326
Feature activation-1.26e-02

Pos logit contributions
+0.117
HI+0.107
+0.105
 Shepard+0.099
 Honest+0.096
Neg logit contributions
ribute-0.099
hemat-0.099
adem-0.099
theless-0.098
hower-0.096
Token the
Token ID262
Feature activation+6.87e-03

Pos logit contributions
+0.011
HI+0.010
+0.010
 Shepard+0.009
 Honest+0.009
Neg logit contributions
ribute-0.009
hemat-0.009
adem-0.009
theless-0.009
hower-0.008
Token brother
Token ID3956
Feature activation+1.86e-01

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
hower+0.004
Neg logit contributions
-0.006
HI-0.006
-0.006
 Shepard-0.005
 Honest-0.005
Token the
Token ID262
Feature activation+6.11e-03

Pos logit contributions
+0.006
HI+0.006
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
Token otherwise
Token ID4306
Feature activation-1.92e-01

Pos logit contributions
ribute+0.004
adem+0.004
hemat+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.005
-0.005
 Shepard-0.004
 Honest-0.004
Token rect
Token ID13621
Feature activation+5.27e-02

Pos logit contributions
+0.173
HI+0.160
+0.157
 Shepard+0.148
 Honest+0.144
Neg logit contributions
ribute-0.122
hemat-0.121
adem-0.121
theless-0.119
hower-0.117
Tokenil
Token ID346
Feature activation+4.73e-02

Pos logit contributions
ribute+0.034
hemat+0.034
adem+0.034
theless+0.033
 inheritance+0.033
Neg logit contributions
-0.047
HI-0.043
-0.042
 Shepard-0.040
 Honest-0.039
Tokenin
Token ID259
Feature activation-2.08e-02

Pos logit contributions
ribute+0.034
hemat+0.034
adem+0.034
theless+0.033
 inheritance+0.033
Neg logit contributions
-0.039
HI-0.035
-0.035
 Shepard-0.032
 Honest-0.031
Tokenear
Token ID451
Feature activation+5.26e-01

Pos logit contributions
+0.030
HI+0.029
+0.029
 Shepard+0.028
 Honest+0.027
Neg logit contributions
ribute-0.002
adem-0.001
hemat-0.001
theless-0.001
 inheritance-0.001
Token facade
Token ID43562
Feature activation-7.45e-02

Pos logit contributions
ribute+0.330
hemat+0.328
adem+0.327
theless+0.322
 inheritance+0.318
Neg logit contributions
-0.478
HI-0.441
-0.435
 Shepard-0.408
 Honest-0.397
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
+0.063
HI+0.058
+0.057
 Shepard+0.053
 Honest+0.051
Neg logit contributions
ribute-0.052
hemat-0.051
adem-0.051
theless-0.051
 inheritance-0.050
Token The
Token ID383
Feature activation-5.30e-02

Pos logit contributions
ribute+0.009
adem+0.009
hemat+0.009
theless+0.009
 inheritance+0.009
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.009
 Honest-0.008
Token designers
Token ID15441
Feature activation-3.72e-02

Pos logit contributions
+0.046
HI+0.042
+0.042
 Shepard+0.039
 Honest+0.038
Neg logit contributions
ribute-0.035
hemat-0.035
adem-0.035
theless-0.035
 inheritance-0.034
Token explained
Token ID4893
Feature activation+6.23e-02

Pos logit contributions
+0.032
HI+0.029
+0.029
 Shepard+0.027
 Honest+0.026
Neg logit contributions
ribute-0.025
hemat-0.025
adem-0.025
theless-0.025
 inheritance-0.024
Tokenard
Token ID446
Feature activation-4.12e-02

Pos logit contributions
ribute+0.041
hemat+0.041
adem+0.040
theless+0.040
 inheritance+0.040
Neg logit contributions
-0.044
HI-0.040
-0.039
 Shepard-0.036
 Honest-0.035
Token Span
Token ID49101
Feature activation-1.52e-01

Pos logit contributions
+0.027
HI+0.023
+0.023
 Shepard+0.021
 Honest+0.020
Neg logit contributions
hemat-0.037
ribute-0.037
adem-0.037
theless-0.037
 inheritance-0.036
Token,
Token ID11
Feature activation+1.02e-02

Pos logit contributions
+0.128
HI+0.117
+0.115
 Shepard+0.107
 Honest+0.104
Neg logit contributions
ribute-0.106
hemat-0.105
adem-0.105
theless-0.104
 inheritance-0.103
Token Jays
Token ID23288
Feature activation-1.87e-02

Pos logit contributions
ribute+0.008
hemat+0.007
adem+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.007
 Honest-0.007
Tokenon
Token ID261
Feature activation+1.27e-01

Pos logit contributions
+0.018
HI+0.017
+0.017
 Shepard+0.016
 Honest+0.015
Neg logit contributions
ribute-0.011
adem-0.010
hemat-0.010
theless-0.010
 inheritance-0.010
Token Wer
Token ID27648
Feature activation+5.25e-01

Pos logit contributions
ribute+0.016
adem+0.016
hemat+0.015
hower+0.015
theless+0.015
Neg logit contributions
-0.180
HI-0.170
-0.168
 Shepard-0.164
 Len-0.161
Tokenth
Token ID400
Feature activation-1.64e-02

Pos logit contributions
ribute+0.364
adem+0.362
hemat+0.361
rue+0.356
theless+0.355
Neg logit contributions
-0.441
-0.407
HI-0.401
 Shepard-0.373
 Honest-0.366
Token.
Token ID13
Feature activation+2.10e-03

Pos logit contributions
+0.014
HI+0.012
+0.012
 Shepard+0.011
 Honest+0.011
Neg logit contributions
ribute-0.012
adem-0.012
hemat-0.012
theless-0.012
 inheritance-0.011
Token Where
Token ID6350
Feature activation-1.70e-02

Pos logit contributions
ribute+0.002
adem+0.001
hemat+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.001
 Honest-0.001
Token would
Token ID561
Feature activation+3.23e-02

Pos logit contributions
+0.014
HI+0.013
+0.013
 Shepard+0.012
 Honest+0.012
Neg logit contributions
ribute-0.012
hemat-0.012
adem-0.012
theless-0.011
 inheritance-0.011
Token that
Token ID326
Feature activation-1.44e-02

Pos logit contributions
ribute+0.024
hemat+0.023
adem+0.023
theless+0.023
 inheritance+0.023
Neg logit contributions
-0.026
HI-0.024
-0.023
 Shepard-0.022
 Honest-0.021
Token.
Token ID13
Feature activation-1.66e-03

Pos logit contributions
+0.207
HI+0.189
+0.186
 Shepard+0.174
 Honest+0.169
Neg logit contributions
ribute-0.172
hemat-0.170
adem-0.170
theless-0.168
 inheritance-0.166
Token Both
Token ID5747
Feature activation+1.70e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token reports
Token ID3136
Feature activation-5.60e-02

Pos logit contributions
ribute+0.012
adem+0.012
hemat+0.012
theless+0.012
 inheritance+0.011
Neg logit contributions
-0.014
HI-0.013
-0.013
 Shepard-0.012
 Honest-0.012
Token discuss
Token ID2112
Feature activation+7.13e-02

Pos logit contributions
+0.047
HI+0.043
+0.042
 Shepard+0.039
 Honest+0.038
Neg logit contributions
ribute-0.039
adem-0.039
hemat-0.039
theless-0.038
 inheritance-0.038
Token Puerto
Token ID15295
Feature activation+1.38e-01

Pos logit contributions
ribute+0.048
hemat+0.047
adem+0.047
theless+0.047
 inheritance+0.046
Neg logit contributions
-0.062
HI-0.057
-0.056
 Shepard-0.052
 Honest-0.051
Token Ric
Token ID15868
Feature activation+5.25e-01

Pos logit contributions
ribute+0.046
hemat+0.045
adem+0.045
theless+0.044
hower+0.043
Neg logit contributions
-0.167
HI-0.157
-0.155
 Shepard-0.148
 Honest-0.146
Tokenans
Token ID504
Feature activation-1.04e-01

Pos logit contributions
ribute+0.414
adem+0.411
hemat+0.411
theless+0.406
 inheritance+0.401
Neg logit contributions
-0.394
HI-0.357
-0.351
 Shepard-0.324
 Honest-0.313
Token in
Token ID287
Feature activation-1.37e-02

Pos logit contributions
+0.088
HI+0.080
+0.079
 Shepard+0.074
 Honest+0.072
Neg logit contributions
ribute-0.072
hemat-0.071
adem-0.071
theless-0.070
 inheritance-0.069
Token Philly
Token ID30029
Feature activation-2.51e-01

Pos logit contributions
+0.011
HI+0.010
+0.010
 Shepard+0.009
 Honest+0.009
Neg logit contributions
ribute-0.010
hemat-0.010
adem-0.010
theless-0.010
 inheritance-0.010
Token mirror
Token ID10162
Feature activation+1.57e-01

Pos logit contributions
+0.211
HI+0.194
+0.190
 Shepard+0.178
 Honest+0.173
Neg logit contributions
ribute-0.174
hemat-0.173
adem-0.173
theless-0.171
 inheritance-0.168
Tokening
Token ID278
Feature activation-3.24e-03

Pos logit contributions
ribute+0.106
adem+0.106
hemat+0.106
theless+0.104
 inheritance+0.103
Neg logit contributions
-0.135
HI-0.124
-0.122
 Shepard-0.114
 Honest-0.111
Token the
Token ID262
Feature activation+8.08e-03

Pos logit contributions
+0.096
HI+0.088
+0.086
 Shepard+0.080
 Honest+0.078
Neg logit contributions
ribute-0.081
adem-0.080
hemat-0.080
theless-0.079
 inheritance-0.078
Token rule
Token ID3896
Feature activation+1.49e-01

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
hower+0.005
Neg logit contributions
-0.007
HI-0.006
-0.006
 Shepard-0.006
 Honest-0.006
Token prefer
Token ID4702
Feature activation+3.11e-02

Pos logit contributions
ribute+0.102
hemat+0.101
adem+0.101
theless+0.100
 inheritance+0.098
Neg logit contributions
-0.127
HI-0.116
-0.114
 Shepard-0.107
 Honest-0.104
Token composition
Token ID11742
Feature activation+1.57e-01

Pos logit contributions
ribute+0.021
adem+0.021
hemat+0.021
theless+0.020
 inheritance+0.020
Neg logit contributions
-0.027
HI-0.025
-0.024
 Shepard-0.023
 Honest-0.022
Token over
Token ID625
Feature activation+4.97e-02

Pos logit contributions
ribute+0.108
adem+0.107
hemat+0.107
theless+0.106
hower+0.104
Neg logit contributions
-0.134
HI-0.123
-0.121
 Shepard-0.113
 Honest-0.110
Token inheritance
Token ID24155
Feature activation+5.24e-01

Pos logit contributions
ribute+0.032
adem+0.032
hemat+0.032
theless+0.032
hower+0.031
Neg logit contributions
-0.044
HI-0.041
-0.040
 Shepard-0.038
 Honest-0.037
Token,
Token ID11
Feature activation+1.97e-04

Pos logit contributions
ribute+0.364
hemat+0.362
adem+0.362
theless+0.357
hower+0.351
Neg logit contributions
-0.442
HI-0.404
-0.398
 Shepard-0.372
 Honest-0.361
Token he
Token ID339
Feature activation-2.14e-02

Pos logit contributions
ribute+0.000
adem+0.000
hemat+0.000
theless+0.000
hower+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
Token made
Token ID925
Feature activation+5.65e-02

Pos logit contributions
+0.018
HI+0.017
+0.017
 Shepard+0.016
 Honest+0.015
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.014
 inheritance-0.014
Token an
Token ID281
Feature activation-6.02e-02

Pos logit contributions
ribute+0.040
adem+0.040
hemat+0.039
theless+0.039
 inheritance+0.039
Neg logit contributions
-0.047
HI-0.043
-0.042
 Shepard-0.040
 Honest-0.038
Token is
Token ID318
Feature activation+2.48e-03

Pos logit contributions
+0.052
HI+0.048
+0.047
 Shepard+0.044
 Honest+0.043
Neg logit contributions
ribute-0.040
adem-0.040
hemat-0.040
theless-0.039
 inheritance-0.039
Token and
Token ID290
Feature activation+6.60e-04

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
adem-0.001
hemat-0.001
theless-0.001
 inheritance-0.001
Token critical
Token ID4688
Feature activation-7.77e-02

Pos logit contributions
ribute+0.000
adem+0.000
hemat+0.000
 inheritance+0.000
hower+0.000
Neg logit contributions
-0.001
HI-0.001
-0.001
 Shepard-0.000
phi-0.000
Token essays
Token ID27126
Feature activation+9.03e-02

Pos logit contributions
+0.077
HI+0.072
+0.071
 Shepard+0.067
 Honest+0.065
Neg logit contributions
ribute-0.042
hemat-0.042
adem-0.042
theless-0.041
 inheritance-0.040
Token by
Token ID416
Feature activation-1.30e-01

Pos logit contributions
ribute+0.073
hemat+0.072
adem+0.071
 inheritance+0.070
theless+0.070
Neg logit contributions
-0.066
HI-0.060
-0.059
 Shepard-0.054
phi-0.052
Token Scott
Token ID4746
Feature activation-7.71e-02

Pos logit contributions
+0.098
HI+0.089
+0.087
 Shepard+0.079
 Honest+0.077
Neg logit contributions
ribute-0.103
adem-0.102
hemat-0.101
 inheritance-0.100
 curve-0.099
Token Ric
Token ID15868
Feature activation+5.24e-01

Pos logit contributions
+0.114
HI+0.108
+0.107
 Shepard+0.103
 Honest+0.102
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.004
 inheritance-0.003
Tokenher
Token ID372
Feature activation+9.61e-02

Pos logit contributions
ribute+0.277
adem+0.276
hemat+0.276
rue+0.265
 inheritance+0.264
Neg logit contributions
-0.529
HI-0.491
-0.489
 Shepard-0.460
 Honest-0.451
Tokent
Token ID83
Feature activation-1.24e-02

Pos logit contributions
ribute+0.074
hemat+0.074
theless+0.073
adem+0.073
 inheritance+0.072
Neg logit contributions
-0.074
HI-0.066
-0.066
 Shepard-0.062
 Honest-0.059
Token and
Token ID290
Feature activation-6.53e-05

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.009
theless-0.009
hemat-0.009
adem-0.009
 inheritance-0.009
Token David
Token ID3271
Feature activation-5.01e-02

Pos logit contributions
+0.000
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
adem-0.000
hemat-0.000
 inheritance-0.000
theless-0.000
Token Gordon
Token ID11646
Feature activation-1.08e-01

Pos logit contributions
+0.033
HI+0.030
+0.029
 Shepard+0.026
 Honest+0.025
Neg logit contributions
ribute-0.044
hemat-0.044
adem-0.044
theless-0.043
 inheritance-0.043
Token former
Token ID1966
Feature activation-4.98e-02

Pos logit contributions
ribute+0.022
hemat+0.022
adem+0.022
theless+0.021
 inheritance+0.021
Neg logit contributions
-0.026
HI-0.024
-0.023
 Shepard-0.022
 Honest-0.021
Token executive
Token ID4640
Feature activation-7.43e-03

Pos logit contributions
+0.043
HI+0.040
+0.039
 Shepard+0.037
 Honest+0.036
Neg logit contributions
ribute-0.033
hemat-0.033
adem-0.033
theless-0.033
 inheritance-0.032
Token at
Token ID379
Feature activation-2.46e-02

Pos logit contributions
+0.006
HI+0.006
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
adem-0.005
hemat-0.005
theless-0.005
 inheritance-0.005
Token Bear
Token ID14732
Feature activation+1.25e-01

Pos logit contributions
+0.022
HI+0.020
+0.020
 Shepard+0.018
 Honest+0.018
Neg logit contributions
ribute-0.016
hemat-0.016
adem-0.016
theless-0.016
 inheritance-0.016
Token St
Token ID520
Feature activation-3.30e-02

Pos logit contributions
ribute+0.090
hemat+0.090
adem+0.089
theless+0.088
 inheritance+0.088
Neg logit contributions
-0.102
HI-0.093
-0.092
 Shepard-0.084
 Honest-0.083
Tokenear
Token ID451
Feature activation+5.23e-01

Pos logit contributions
+0.049
HI+0.047
+0.046
 Shepard+0.045
 Honest+0.044
Neg logit contributions
ribute-0.002
adem-0.002
hemat-0.002
theless-0.001
 inheritance-0.001
Tokenns
Token ID5907
Feature activation+1.43e-01

Pos logit contributions
ribute+0.262
hemat+0.258
adem+0.257
theless+0.252
 inheritance+0.250
Neg logit contributions
-0.542
HI-0.507
-0.501
 Shepard-0.473
 Honest-0.462
Token,
Token ID11
Feature activation+5.72e-03

Pos logit contributions
ribute+0.104
hemat+0.103
adem+0.103
theless+0.101
 inheritance+0.100
Neg logit contributions
-0.116
HI-0.106
-0.104
 Shepard-0.097
 Honest-0.094
Token a
Token ID257
Feature activation+3.00e-02

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Token Goldman
Token ID18740
Feature activation+5.14e-02

Pos logit contributions
ribute+0.019
hemat+0.019
adem+0.019
theless+0.019
rue+0.018
Neg logit contributions
-0.027
HI-0.025
-0.025
 Shepard-0.023
 Honest-0.022
Token competitor
Token ID20319
Feature activation+5.43e-02

Pos logit contributions
ribute+0.037
hemat+0.037
adem+0.037
theless+0.036
 inheritance+0.036
Neg logit contributions
-0.042
HI-0.038
-0.038
 Shepard-0.035
 Honest-0.034
Token see
Token ID766
Feature activation-5.06e-02

Pos logit contributions
ribute+0.001
hemat+0.001
adem+0.001
theless+0.001
hower+0.001
Neg logit contributions
-0.002
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Token the
Token ID262
Feature activation+2.61e-03

Pos logit contributions
+0.042
HI+0.038
+0.038
 Shepard+0.035
 Honest+0.034
Neg logit contributions
ribute-0.036
hemat-0.036
adem-0.036
theless-0.035
 inheritance-0.035
Token handcuffed
Token ID41529
Feature activation+3.84e-02

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
hower+0.002
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token man
Token ID582
Feature activation-1.49e-02

Pos logit contributions
ribute+0.026
adem+0.026
hemat+0.026
theless+0.026
 inheritance+0.025
Neg logit contributions
-0.033
HI-0.030
-0.030
 Shepard-0.028
 Honest-0.027
Token reapp
Token ID24578
Feature activation+3.55e-02

Pos logit contributions
+0.012
HI+0.011
+0.011
 Shepard+0.010
 Honest+0.010
Neg logit contributions
ribute-0.011
adem-0.010
hemat-0.010
theless-0.010
 inheritance-0.010
Tokenear
Token ID451
Feature activation+5.22e-01

Pos logit contributions
hemat+0.005
theless+0.004
ribute+0.004
adem+0.004
hower+0.004
Neg logit contributions
-0.050
HI-0.047
-0.047
 Shepard-0.045
phi-0.044
Token.
Token ID13
Feature activation+3.28e-03

Pos logit contributions
ribute+0.354
adem+0.351
hemat+0.351
theless+0.346
 inheritance+0.342
Neg logit contributions
-0.449
HI-0.413
-0.406
 Shepard-0.379
 Honest-0.369
Token He
Token ID679
Feature activation+4.87e-02

Pos logit contributions
ribute+0.002
adem+0.002
hemat+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.002
 Honest-0.002
Token was
Token ID373
Feature activation-9.48e-03

Pos logit contributions
ribute+0.034
adem+0.033
hemat+0.033
theless+0.033
 inheritance+0.032
Neg logit contributions
-0.041
HI-0.038
-0.037
 Shepard-0.035
 Honest-0.034
Token accompanied
Token ID11791
Feature activation+3.74e-01

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.007
 Honest+0.007
Neg logit contributions
ribute-0.006
hemat-0.006
adem-0.006
theless-0.006
 inheritance-0.006
Token by
Token ID416
Feature activation-1.28e-01

Pos logit contributions
ribute+0.307
hemat+0.305
adem+0.304
theless+0.301
 inheritance+0.298
Neg logit contributions
-0.268
HI-0.242
-0.239
 Shepard-0.218
 Honest-0.211
Token],
Token ID4357
Feature activation+2.12e-01

Pos logit contributions
+0.140
HI+0.130
+0.128
 Shepard+0.121
 Honest+0.118
Neg logit contributions
ribute-0.074
hemat-0.074
adem-0.073
theless-0.072
 inheritance-0.071
Token with
Token ID351
Feature activation+6.23e-02

Pos logit contributions
ribute+0.149
hemat+0.147
adem+0.147
theless+0.145
 inheritance+0.144
Neg logit contributions
-0.176
HI-0.161
-0.159
 Shepard-0.149
 Honest-0.145
Token a
Token ID257
Feature activation+3.30e-02

Pos logit contributions
ribute+0.043
adem+0.043
hemat+0.043
theless+0.042
 inheritance+0.042
Neg logit contributions
-0.052
HI-0.048
-0.047
 Shepard-0.044
 Honest-0.043
Token very
Token ID845
Feature activation+8.27e-03

Pos logit contributions
ribute+0.023
hemat+0.023
adem+0.023
theless+0.022
 inheritance+0.022
Neg logit contributions
-0.028
HI-0.026
-0.025
 Shepard-0.024
 Honest-0.023
Token gentle
Token ID10296
Feature activation+1.06e-02

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.007
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token curve
Token ID12133
Feature activation+5.22e-01

Pos logit contributions
ribute+0.007
hemat+0.007
adem+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.008
 Honest-0.008
Token.
Token ID13
Feature activation+1.92e-03

Pos logit contributions
ribute+0.357
hemat+0.354
adem+0.354
theless+0.348
 inheritance+0.345
Neg logit contributions
-0.446
HI-0.409
-0.403
 Shepard-0.376
 Honest-0.366
Token\n
Token ID198
Feature activation+1.12e-05

Pos logit contributions
ribute+0.001
hemat+0.001
adem+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.002
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Token\n
Token ID198
Feature activation-1.49e-03

Pos logit contributions
ribute+0.000
hemat+0.000
adem+0.000
theless+0.000
 inheritance+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
TokenWe
Token ID1135
Feature activation-5.87e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token can
Token ID460
Feature activation+2.20e-02

Pos logit contributions
+0.051
HI+0.047
+0.046
 Shepard+0.043
 Honest+0.042
Neg logit contributions
ribute-0.039
hemat-0.039
adem-0.039
theless-0.038
 inheritance-0.038
Token for
Token ID329
Feature activation-1.26e-02

Pos logit contributions
ribute+0.087
hemat+0.086
adem+0.086
theless+0.085
hower+0.084
Neg logit contributions
-0.102
HI-0.093
-0.092
 Shepard-0.086
 Honest-0.083
Token being
Token ID852
Feature activation-9.51e-02

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.009
adem-0.009
hemat-0.009
theless-0.009
hower-0.009
Token able
Token ID1498
Feature activation+2.01e-01

Pos logit contributions
+0.084
HI+0.077
+0.076
 Shepard+0.072
 Honest+0.070
Neg logit contributions
ribute-0.062
adem-0.062
hemat-0.062
theless-0.061
hower-0.060
Token to
Token ID284
Feature activation-1.80e-03

Pos logit contributions
ribute+0.137
adem+0.135
hemat+0.135
theless+0.133
 inheritance+0.132
Neg logit contributions
-0.173
HI-0.159
-0.157
 Shepard-0.146
 Honest-0.142
Token T
Token ID309
Feature activation+7.71e-03

Pos logit contributions
+0.002
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Tokenribute
Token ID4163
Feature activation+5.21e-01

Pos logit contributions
theless+0.000
adem+0.000
hemat+0.000
hower+0.000
ribute+0.000
Neg logit contributions
-0.012
HI-0.011
-0.011
 Shepard-0.011
 Honest-0.010
Token any
Token ID597
Feature activation+7.56e-02

Pos logit contributions
ribute+0.369
hemat+0.365
adem+0.365
theless+0.360
 inheritance+0.356
Neg logit contributions
-0.434
HI-0.397
-0.391
 Shepard-0.364
 Honest-0.353
Token opponent
Token ID6125
Feature activation-9.93e-02

Pos logit contributions
ribute+0.053
hemat+0.053
adem+0.052
theless+0.052
 inheritance+0.051
Neg logit contributions
-0.063
HI-0.058
-0.057
 Shepard-0.053
 Honest-0.051
Token
Token ID447
Feature activation-4.04e-03

Pos logit contributions
+0.085
HI+0.078
+0.077
 Shepard+0.072
 Honest+0.070
Neg logit contributions
ribute-0.067
hemat-0.067
adem-0.067
theless-0.066
 inheritance-0.065
Token
Token ID247
Feature activation-7.25e-03

Pos logit contributions
+0.004
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
rue-0.003
 inheritance-0.003
Tokens
Token ID82
Feature activation-1.09e-02

Pos logit contributions
+0.006
HI+0.006
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
Token due
Token ID2233
Feature activation+1.48e-01

Pos logit contributions
ribute+0.111
hemat+0.110
adem+0.110
theless+0.109
 inheritance+0.107
Neg logit contributions
-0.132
HI-0.121
-0.119
 Shepard-0.111
 Honest-0.108
Token to
Token ID284
Feature activation+6.81e-03

Pos logit contributions
ribute+0.103
adem+0.102
hemat+0.102
theless+0.101
 inheritance+0.099
Neg logit contributions
-0.125
HI-0.115
-0.113
 Shepard-0.106
 Honest-0.103
Token a
Token ID257
Feature activation+3.20e-02

Pos logit contributions
ribute+0.005
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.006
HI-0.005
-0.005
 Shepard-0.005
 Honest-0.005
Token significant
Token ID2383
Feature activation+3.51e-02

Pos logit contributions
ribute+0.022
hemat+0.021
adem+0.021
theless+0.021
 inheritance+0.021
Neg logit contributions
-0.028
HI-0.025
-0.025
 Shepard-0.023
 Honest-0.023
Token learning
Token ID4673
Feature activation-1.68e-02

Pos logit contributions
ribute+0.022
hemat+0.022
adem+0.022
theless+0.022
rue+0.021
Neg logit contributions
-0.032
HI-0.029
-0.029
 Shepard-0.027
 Honest-0.026
Token curve
Token ID12133
Feature activation+5.19e-01

Pos logit contributions
+0.016
HI+0.015
+0.015
 Shepard+0.014
 Honest+0.013
Neg logit contributions
ribute-0.010
hemat-0.010
adem-0.010
theless-0.010
rue-0.010
Token in
Token ID287
Feature activation-1.44e-02

Pos logit contributions
ribute+0.360
hemat+0.358
adem+0.357
theless+0.353
 inheritance+0.347
Neg logit contributions
-0.438
HI-0.401
-0.395
 Shepard-0.368
 Honest-0.358
Token teaching
Token ID7743
Feature activation+1.52e-01

Pos logit contributions
+0.012
HI+0.011
+0.011
 Shepard+0.010
 Honest+0.010
Neg logit contributions
ribute-0.010
hemat-0.010
adem-0.010
theless-0.010
rue-0.010
Token.
Token ID13
Feature activation-2.57e-03

Pos logit contributions
ribute+0.107
hemat+0.106
adem+0.106
theless+0.105
rue+0.103
Neg logit contributions
-0.128
HI-0.116
-0.115
 Shepard-0.107
 Honest-0.104
Token Ch
Token ID609
Feature activation+2.24e-02

Pos logit contributions
+0.002
HI+0.002
+0.002
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.002
hemat-0.002
adem-0.002
theless-0.002
 inheritance-0.002
Tokenurn
Token ID700
Feature activation+1.94e-01

Pos logit contributions
ribute+0.015
hemat+0.015
adem+0.015
theless+0.015
 inheritance+0.015
Neg logit contributions
-0.019
HI-0.018
-0.018
 Shepard-0.016
 Honest-0.016
Tokentrack
Token ID11659
Feature activation+2.98e-01

Pos logit contributions
+0.025
HI+0.023
+0.023
 Shepard+0.022
phi+0.021
Neg logit contributions
ribute-0.010
adem-0.010
hemat-0.010
theless-0.009
 inheritance-0.009
Token session
Token ID6246
Feature activation+9.85e-02

Pos logit contributions
hemat+0.199
ribute+0.199
adem+0.198
theless+0.194
hower+0.193
Neg logit contributions
-0.261
HI-0.240
-0.235
 Shepard-0.222
 Honest-0.215
Token at
Token ID379
Feature activation-3.47e-02

Pos logit contributions
ribute+0.069
hemat+0.069
adem+0.068
theless+0.068
hower+0.066
Neg logit contributions
-0.083
HI-0.076
-0.074
 Shepard-0.070
 Honest-0.068
Token Circuit
Token ID13588
Feature activation+1.50e-01

Pos logit contributions
+0.029
HI+0.026
+0.026
 Shepard+0.024
 Honest+0.023
Neg logit contributions
ribute-0.025
hemat-0.025
adem-0.025
theless-0.024
hower-0.024
Token Paul
Token ID3362
Feature activation+1.19e-01

Pos logit contributions
ribute+0.100
hemat+0.099
adem+0.099
theless+0.098
 inheritance+0.096
Neg logit contributions
-0.131
HI-0.120
-0.118
 Shepard-0.111
 Honest-0.108
Token Ric
Token ID15868
Feature activation+5.16e-01

Pos logit contributions
ribute+0.083
hemat+0.082
adem+0.081
theless+0.081
hower+0.080
Neg logit contributions
-0.100
HI-0.091
-0.090
 Shepard-0.085
 Honest-0.082
Tokenard
Token ID446
Feature activation-4.88e-02

Pos logit contributions
ribute+0.379
hemat+0.377
adem+0.377
theless+0.370
 inheritance+0.367
Neg logit contributions
-0.415
HI-0.378
-0.373
 Shepard-0.346
 Honest-0.336
Token.
Token ID13
Feature activation-7.44e-03

Pos logit contributions
+0.041
HI+0.037
+0.036
 Shepard+0.034
 Honest+0.033
Neg logit contributions
ribute-0.035
hemat-0.034
theless-0.034
adem-0.034
 inheritance-0.033
Token\n
Token ID198
Feature activation-7.59e-03

Pos logit contributions
+0.006
HI+0.006
+0.006
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
adem-0.005
hemat-0.005
theless-0.005
 inheritance-0.005
Token\n
Token ID198
Feature activation-8.82e-03

Pos logit contributions
+0.007
HI+0.006
+0.006
 Shepard+0.006
 Honest+0.005
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
TokenThe
Token ID464
Feature activation-8.28e-02

Pos logit contributions
+0.007
HI+0.006
+0.006
 Shepard+0.006
 Honest+0.006
Neg logit contributions
ribute-0.007
adem-0.006
hemat-0.006
theless-0.006
 inheritance-0.006
Token high
Token ID1029
Feature activation+4.52e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
hower-0.001
Token price
Token ID2756
Feature activation+8.46e-02

Pos logit contributions
ribute+0.031
hemat+0.031
adem+0.031
theless+0.031
hower+0.030
Neg logit contributions
-0.038
HI-0.035
-0.034
 Shepard-0.032
 Honest-0.031
Token-
Token ID12
Feature activation-1.29e-02

Pos logit contributions
ribute+0.058
adem+0.057
hemat+0.057
theless+0.057
rue+0.055
Neg logit contributions
-0.073
HI-0.067
-0.066
 Shepard-0.061
 Honest-0.060
Tokento
Token ID1462
Feature activation+1.55e-02

Pos logit contributions
+0.011
HI+0.010
+0.010
 Shepard+0.010
 Honest+0.009
Neg logit contributions
ribute-0.009
adem-0.008
hemat-0.008
theless-0.008
 inheritance-0.008
Token-
Token ID12
Feature activation-1.30e-02

Pos logit contributions
ribute+0.011
hemat+0.011
adem+0.011
theless+0.011
 inheritance+0.011
Neg logit contributions
-0.013
HI-0.012
-0.011
 Shepard-0.011
 Honest-0.010
Tokenear
Token ID451
Feature activation+5.14e-01

Pos logit contributions
+0.014
HI+0.013
+0.012
 Shepard+0.012
phi+0.011
Neg logit contributions
ribute-0.006
adem-0.006
hemat-0.006
theless-0.006
hower-0.006
Tokennings
Token ID23400
Feature activation-8.39e-02

Pos logit contributions
ribute+0.419
adem+0.416
hemat+0.416
theless+0.411
 inheritance+0.406
Neg logit contributions
-0.371
HI-0.335
-0.329
 Shepard-0.303
 Honest-0.293
Token multiple
Token ID3294
Feature activation-1.60e-01

Pos logit contributions
+0.085
HI+0.079
+0.078
 Shepard+0.074
 Honest+0.072
Neg logit contributions
ribute-0.044
adem-0.044
hemat-0.044
theless-0.043
rue-0.042
Token may
Token ID743
Feature activation-9.27e-02

Pos logit contributions
+0.135
HI+0.123
+0.121
 Shepard+0.114
 Honest+0.110
Neg logit contributions
ribute-0.112
hemat-0.111
adem-0.111
theless-0.110
hower-0.108
Token be
Token ID307
Feature activation+1.30e-02

Pos logit contributions
+0.080
HI+0.073
+0.072
 Shepard+0.068
 Honest+0.066
Neg logit contributions
ribute-0.062
adem-0.062
hemat-0.062
theless-0.061
hower-0.060
Token justified
Token ID14460
Feature activation+4.10e-02

Pos logit contributions
ribute+0.009
hemat+0.009
adem+0.009
theless+0.009
hower+0.008
Neg logit contributions
-0.011
HI-0.010
-0.010
 Shepard-0.010
 Honest-0.009
Token
Token ID223
Feature activation+1.82e-01

Pos logit contributions
ribute+0.048
hemat+0.048
adem+0.048
theless+0.047
 inheritance+0.047
Neg logit contributions
-0.071
HI-0.066
-0.065
 Shepard-0.061
 Honest-0.059
Tokenρ
Token ID33643
Feature activation-2.83e-01

Pos logit contributions
ribute+0.202
hemat+0.202
adem+0.201
theless+0.200
 inheritance+0.198
Neg logit contributions
-0.076
HI-0.065
-0.062
 Shepard-0.054
 Honest-0.050
Tokenπ
Token ID46582
Feature activation-3.06e-01

Pos logit contributions
+0.125
HI+0.107
+0.103
 Shepard+0.090
 Honest+0.084
Neg logit contributions
ribute-0.307
hemat-0.306
adem-0.306
theless-0.304
 inheritance-0.301
Tokenα
Token ID17394
Feature activation-3.09e-01

Pos logit contributions
+0.137
HI+0.117
+0.112
 Shepard+0.098
 Honest+0.091
Neg logit contributions
ribute-0.332
hemat-0.331
adem-0.331
theless-0.328
 inheritance-0.325
Tokenγ
Token ID42063
Feature activation-3.90e-01

Pos logit contributions
+0.138
HI+0.119
+0.114
 Shepard+0.100
 Honest+0.093
Neg logit contributions
ribute-0.333
hemat-0.333
adem-0.332
theless-0.329
 inheritance-0.326
Token
Token ID157
Feature activation-6.49e-01

Pos logit contributions
+0.190
HI+0.164
+0.159
 Shepard+0.140
 Honest+0.132
Neg logit contributions
ribute-0.407
hemat-0.406
adem-0.406
theless-0.402
 inheritance-0.398
Token
Token ID123
Feature activation+2.38e-02

Pos logit contributions
+0.565
HI+0.518
+0.510
 Shepard+0.476
 Honest+0.464
Neg logit contributions
ribute-0.436
adem-0.433
hemat-0.432
theless-0.427
 inheritance-0.421
Token
Token ID228
Feature activation-1.29e-01

Pos logit contributions
ribute+0.015
hemat+0.015
adem+0.015
theless+0.015
 inheritance+0.015
Neg logit contributions
-0.021
HI-0.019
-0.019
 Shepard-0.018
 Honest-0.017
Tokenς
Token ID35558
Feature activation-1.70e-01

Pos logit contributions
+0.072
HI+0.063
+0.062
 Shepard+0.055
 Honest+0.052
Neg logit contributions
ribute-0.125
hemat-0.125
adem-0.125
theless-0.124
 inheritance-0.122
Token (
Token ID357
Feature activation+5.21e-02

Pos logit contributions
+0.118
HI+0.107
+0.104
 Shepard+0.096
 Honest+0.092
Neg logit contributions
ribute-0.143
hemat-0.142
adem-0.142
theless-0.140
 inheritance-0.139
TokenAnt
Token ID13217
Feature activation-1.96e-01

Pos logit contributions
ribute+0.039
hemat+0.039
adem+0.039
theless+0.039
 inheritance+0.038
Neg logit contributions
-0.041
HI-0.037
-0.036
 Shepard-0.034
 Honest-0.033
Tokenε
Token ID30950
Feature activation-2.90e-01

Pos logit contributions
+0.186
HI+0.168
+0.164
 Shepard+0.150
 Honest+0.144
Neg logit contributions
ribute-0.246
hemat-0.245
adem-0.245
theless-0.242
 inheritance-0.239
Tokenι
Token ID29945
Feature activation-2.26e-01

Pos logit contributions
+0.132
HI+0.113
+0.109
 Shepard+0.095
 Honest+0.089
Neg logit contributions
ribute-0.311
hemat-0.310
adem-0.310
theless-0.307
 inheritance-0.305
Token
Token ID7377
Feature activation-3.91e-01

Pos logit contributions
+0.125
HI+0.111
+0.108
 Shepard+0.097
 Honest+0.092
Neg logit contributions
ribute-0.221
hemat-0.220
adem-0.220
theless-0.218
 inheritance-0.216
Token
Token ID112
Feature activation-1.89e-01

Pos logit contributions
+0.289
HI+0.262
+0.257
 Shepard+0.238
 Honest+0.230
Neg logit contributions
ribute-0.312
hemat-0.310
adem-0.310
theless-0.305
 inheritance-0.302
Tokenι
Token ID29945
Feature activation-2.21e-01

Pos logit contributions
+0.083
HI+0.070
+0.068
 Shepard+0.058
 Honest+0.054
Neg logit contributions
ribute-0.208
hemat-0.207
adem-0.207
theless-0.205
 inheritance-0.203
Token
Token ID157
Feature activation-6.48e-01

Pos logit contributions
+0.094
HI+0.079
+0.076
 Shepard+0.065
 Honest+0.060
Neg logit contributions
ribute-0.245
hemat-0.244
adem-0.244
theless-0.242
 inheritance-0.240
Token
Token ID122
Feature activation-5.03e-02

Pos logit contributions
+0.566
HI+0.519
+0.511
 Shepard+0.477
 Honest+0.465
Neg logit contributions
ribute-0.433
adem-0.429
hemat-0.428
theless-0.423
 inheritance-0.417
Token
Token ID121
Feature activation+6.47e-02

Pos logit contributions
+0.048
HI+0.044
+0.044
 Shepard+0.041
 Honest+0.040
Neg logit contributions
ribute-0.030
hemat-0.029
adem-0.029
theless-0.029
 inheritance-0.028
Token
Token ID28053
Feature activation-3.36e-01

Pos logit contributions
ribute+0.067
hemat+0.067
adem+0.067
theless+0.067
 inheritance+0.066
Neg logit contributions
-0.031
HI-0.027
-0.026
 Shepard-0.023
 Honest-0.022
Token
Token ID120
Feature activation+7.77e-02

Pos logit contributions
+0.313
HI+0.290
+0.286
 Shepard+0.269
 Honest+0.262
Neg logit contributions
ribute-0.203
hemat-0.201
adem-0.201
theless-0.198
 inheritance-0.195
Token
Token ID223
Feature activation+1.82e-01

Pos logit contributions
ribute+0.048
hemat+0.048
adem+0.048
theless+0.047
 inheritance+0.047
Neg logit contributions
-0.071
HI-0.066
-0.065
 Shepard-0.061
 Honest-0.059
Tokenh
Token ID71
Feature activation-3.04e-01

Pos logit contributions
adem+0.162
ribute+0.162
hemat+0.161
theless+0.159
 inheritance+0.158
Neg logit contributions
-0.089
HI-0.078
-0.077
 Shepard-0.068
 Honest-0.065
Token.
Token ID13
Feature activation-3.55e-04

Pos logit contributions
+0.260
HI+0.239
+0.235
 Shepard+0.220
 Honest+0.213
Neg logit contributions
ribute-0.207
hemat-0.205
adem-0.205
theless-0.203
 inheritance-0.200
Token<|endoftext|>
Token ID50256
Feature activation+2.06e-01

Pos logit contributions
+0.000
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
TokenNEW
Token ID13965
Feature activation-1.90e-01

Pos logit contributions
ribute+0.152
hemat+0.151
adem+0.151
theless+0.148
rue+0.146
Neg logit contributions
-0.164
HI-0.149
-0.147
 Shepard-0.137
 Honest-0.133
Token DEL
Token ID28163
Feature activation-2.92e-02

Pos logit contributions
+0.169
HI+0.155
+0.153
 Shepard+0.143
 Honest+0.140
Neg logit contributions
ribute-0.123
hemat-0.122
adem-0.122
theless-0.121
 inheritance-0.119
TokenHI
Token ID25374
Feature activation-6.20e-01

Pos logit contributions
+0.003
+0.001
HI+0.000
 Shepard-0.001
 Honest-0.001
Neg logit contributions
ribute-0.042
hemat-0.041
adem-0.041
theless-0.041
 inheritance-0.041
Token:
Token ID25
Feature activation-1.42e-03

Pos logit contributions
+0.524
HI+0.480
+0.472
 Shepard+0.441
 Honest+0.428
Neg logit contributions
ribute-0.430
hemat-0.427
adem-0.427
theless-0.421
 inheritance-0.415
Token In
Token ID554
Feature activation+2.93e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token a
Token ID257
Feature activation+2.30e-02

Pos logit contributions
ribute+0.021
hemat+0.021
adem+0.021
theless+0.020
 inheritance+0.020
Neg logit contributions
-0.024
HI-0.022
-0.022
 Shepard-0.020
 Honest-0.020
Token setback
Token ID34116
Feature activation-8.66e-02

Pos logit contributions
ribute+0.015
hemat+0.015
adem+0.015
theless+0.015
 inheritance+0.015
Neg logit contributions
-0.020
HI-0.019
-0.018
 Shepard-0.017
 Honest-0.017
Token to
Token ID284
Feature activation+1.31e-04

Pos logit contributions
+0.073
HI+0.067
+0.066
 Shepard+0.061
 Honest+0.059
Neg logit contributions
ribute-0.061
hemat-0.060
adem-0.060
theless-0.059
 inheritance-0.058
Token\n
Token ID198
Feature activation+5.52e-03

Pos logit contributions
ribute+0.006
hemat+0.006
adem+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.008
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
TokenOn
Token ID2202
Feature activation+7.12e-03

Pos logit contributions
ribute+0.004
adem+0.004
hemat+0.004
theless+0.004
rue+0.004
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.003
 Honest-0.003
Token Fox
Token ID5426
Feature activation-1.08e-01

Pos logit contributions
ribute+0.005
adem+0.005
hemat+0.005
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.006
HI-0.005
-0.005
 Shepard-0.005
phi-0.005
Token News
Token ID3000
Feature activation-1.75e-01

Pos logit contributions
+0.071
HI+0.063
+0.062
 Shepard+0.056
 Honest+0.054
Neg logit contributions
ribute-0.095
hemat-0.094
adem-0.094
theless-0.093
 inheritance-0.092
Token,
Token ID11
Feature activation+1.59e-02

Pos logit contributions
+0.132
HI+0.120
+0.117
 Shepard+0.109
 Honest+0.105
Neg logit contributions
ribute-0.137
adem-0.136
hemat-0.136
theless-0.135
 inheritance-0.133
Token Shepard
Token ID28606
Feature activation-5.71e-01

Pos logit contributions
ribute+0.012
adem+0.012
hemat+0.012
theless+0.012
 inheritance+0.012
Neg logit contributions
-0.012
HI-0.011
-0.011
 Shepard-0.010
 Honest-0.010
Token Smith
Token ID4176
Feature activation+4.34e-02

Pos logit contributions
+0.487
HI+0.446
+0.440
 Shepard+0.412
 Honest+0.399
Neg logit contributions
ribute-0.392
hemat-0.389
adem-0.388
theless-0.383
 inheritance-0.378
Token led
Token ID2957
Feature activation+7.76e-02

Pos logit contributions
ribute+0.029
adem+0.029
hemat+0.029
theless+0.028
 inheritance+0.028
Neg logit contributions
-0.038
HI-0.035
-0.034
 Shepard-0.032
 Honest-0.031
Token the
Token ID262
Feature activation+1.02e-02

Pos logit contributions
ribute+0.055
adem+0.055
hemat+0.055
theless+0.054
 inheritance+0.054
Neg logit contributions
-0.064
HI-0.059
-0.058
 Shepard-0.053
 Honest-0.052
Token coverage
Token ID5197
Feature activation-2.18e-01

Pos logit contributions
ribute+0.008
hemat+0.008
adem+0.008
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.007
 Honest-0.007
Token in
Token ID287
Feature activation-1.08e-02

Pos logit contributions
+0.185
HI+0.170
+0.167
 Shepard+0.157
 Honest+0.152
Neg logit contributions
ribute-0.151
hemat-0.149
adem-0.149
theless-0.147
 inheritance-0.145
Token),
Token ID828
Feature activation+9.83e-03

Pos logit contributions
ribute+0.070
hemat+0.070
adem+0.070
theless+0.069
 inheritance+0.068
Neg logit contributions
-0.077
HI-0.071
-0.070
 Shepard-0.065
 Honest-0.063
Token who
Token ID508
Feature activation-1.09e-01

Pos logit contributions
ribute+0.007
adem+0.007
hemat+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.008
HI-0.008
-0.007
 Shepard-0.007
 Honest-0.007
Token was
Token ID373
Feature activation-1.63e-02

Pos logit contributions
+0.096
HI+0.088
+0.087
 Shepard+0.081
 Honest+0.079
Neg logit contributions
ribute-0.072
hemat-0.071
adem-0.071
theless-0.070
 inheritance-0.069
Token identified
Token ID5174
Feature activation-9.97e-02

Pos logit contributions
+0.014
HI+0.013
+0.013
 Shepard+0.012
 Honest+0.012
Neg logit contributions
ribute-0.011
hemat-0.011
adem-0.011
theless-0.010
hower-0.010
Token by
Token ID416
Feature activation-1.29e-01

Pos logit contributions
+0.080
HI+0.073
+0.072
 Shepard+0.067
 Honest+0.065
Neg logit contributions
ribute-0.073
adem-0.073
hemat-0.073
theless-0.072
 inheritance-0.071
Token Honest
Token ID28106
Feature activation-5.63e-01

Pos logit contributions
+0.106
HI+0.097
+0.096
 Shepard+0.088
 Honest+0.086
Neg logit contributions
ribute-0.092
adem-0.091
hemat-0.091
theless-0.089
 inheritance-0.089
Token Reporting
Token ID29595
Feature activation-4.86e-02

Pos logit contributions
+0.560
HI+0.519
+0.512
 Shepard+0.483
 Honest+0.471
Neg logit contributions
ribute-0.306
hemat-0.304
adem-0.304
theless-0.296
 inheritance-0.293
Token as
Token ID355
Feature activation-2.04e-02

Pos logit contributions
+0.041
HI+0.037
+0.037
 Shepard+0.034
 Honest+0.033
Neg logit contributions
ribute-0.034
adem-0.034
hemat-0.034
theless-0.033
 inheritance-0.033
Token the
Token ID262
Feature activation+2.97e-04

Pos logit contributions
+0.017
HI+0.016
+0.015
 Shepard+0.014
 Honest+0.014
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.014
 inheritance-0.014
Token information
Token ID1321
Feature activation-2.64e-02

Pos logit contributions
ribute+0.000
adem+0.000
hemat+0.000
theless+0.000
 inheritance+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
Token and
Token ID290
Feature activation-1.65e-03

Pos logit contributions
+0.024
HI+0.022
+0.022
 Shepard+0.020
 Honest+0.020
Neg logit contributions
ribute-0.017
adem-0.016
hemat-0.016
theless-0.016
 inheritance-0.016
Token an
Token ID281
Feature activation-5.28e-02

Pos logit contributions
+0.006
HI+0.006
+0.006
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.006
hemat-0.006
adem-0.006
theless-0.006
hower-0.006
Token accident
Token ID5778
Feature activation-4.64e-02

Pos logit contributions
+0.037
HI+0.033
+0.032
 Shepard+0.030
 Honest+0.029
Neg logit contributions
ribute-0.044
hemat-0.044
adem-0.044
theless-0.044
 inheritance-0.043
Token,
Token ID11
Feature activation+4.36e-03

Pos logit contributions
+0.040
HI+0.036
+0.036
 Shepard+0.033
 Honest+0.032
Neg logit contributions
ribute-0.032
hemat-0.032
adem-0.032
theless-0.031
 inheritance-0.031
Token according
Token ID1864
Feature activation-6.31e-02

Pos logit contributions
ribute+0.003
adem+0.003
hemat+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
Token to
Token ID284
Feature activation+7.77e-03

Pos logit contributions
+0.055
HI+0.050
+0.050
 Shepard+0.046
 Honest+0.045
Neg logit contributions
ribute-0.042
hemat-0.042
adem-0.042
theless-0.041
 inheritance-0.041
Token Honest
Token ID28106
Feature activation-5.61e-01

Pos logit contributions
ribute+0.006
adem+0.006
hemat+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.006
HI-0.006
-0.006
 Shepard-0.005
phi-0.005
Token Reporting
Token ID29595
Feature activation-5.09e-02

Pos logit contributions
+0.447
HI+0.408
+0.401
 Shepard+0.372
 Honest+0.361
Neg logit contributions
ribute-0.415
hemat-0.412
adem-0.412
theless-0.407
 inheritance-0.402
Token.
Token ID13
Feature activation+3.67e-03

Pos logit contributions
+0.043
HI+0.039
+0.039
 Shepard+0.036
 Honest+0.035
Neg logit contributions
ribute-0.035
hemat-0.035
adem-0.035
theless-0.035
 inheritance-0.034
Token Al
Token ID978
Feature activation-1.15e-01

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
Tokenze
Token ID2736
Feature activation-5.59e-02

Pos logit contributions
+0.091
HI+0.082
+0.081
 Shepard+0.075
 Honest+0.073
Neg logit contributions
ribute-0.086
adem-0.086
hemat-0.086
theless-0.084
 inheritance-0.083
Tokener
Token ID263
Feature activation+3.46e-02

Pos logit contributions
+0.050
HI+0.045
+0.045
 Shepard+0.042
 Honest+0.041
Neg logit contributions
ribute-0.036
hemat-0.036
adem-0.036
theless-0.036
 inheritance-0.035
Tokenov
Token ID709
Feature activation-4.74e-02

Pos logit contributions
ribute+0.135
hemat+0.134
adem+0.134
theless+0.132
 inheritance+0.130
Neg logit contributions
-0.165
HI-0.151
-0.149
 Shepard-0.139
 Honest-0.135
Token.
Token ID13
Feature activation-4.39e-03

Pos logit contributions
+0.040
HI+0.037
+0.036
 Shepard+0.034
 Honest+0.033
Neg logit contributions
ribute-0.033
hemat-0.032
adem-0.032
theless-0.032
 inheritance-0.031
Token
Token ID198
Feature activation-1.18e-02

Pos logit contributions
+0.004
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
hower-0.003
Token
Token ID198
Feature activation-4.21e-03

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.009
 Honest+0.008
Neg logit contributions
ribute-0.008
hemat-0.008
adem-0.008
theless-0.008
 inheritance-0.008
TokenSer
Token ID7089
Feature activation+1.09e-01

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Tokenhi
Token ID5303
Feature activation-5.54e-01

Pos logit contributions
ribute+0.078
adem+0.077
hemat+0.077
theless+0.076
 inheritance+0.076
Neg logit contributions
-0.089
HI-0.082
-0.080
 Shepard-0.075
 Honest-0.072
Tokeny
Token ID88
Feature activation-1.02e-01

Pos logit contributions
+0.437
HI+0.398
+0.391
 Shepard+0.363
 Honest+0.352
Neg logit contributions
ribute-0.415
hemat-0.412
adem-0.412
theless-0.407
 inheritance-0.402
Token Kaz
Token ID16385
Feature activation-1.11e-01

Pos logit contributions
+0.086
HI+0.079
+0.077
 Shepard+0.072
 Honest+0.070
Neg logit contributions
ribute-0.071
adem-0.070
hemat-0.070
theless-0.070
 inheritance-0.069
Tokenarov
Token ID42737
Feature activation-6.79e-04

Pos logit contributions
+0.096
HI+0.088
+0.086
 Shepard+0.081
 Honest+0.078
Neg logit contributions
ribute-0.075
hemat-0.075
adem-0.075
theless-0.074
 inheritance-0.073
Token,
Token ID11
Feature activation-5.29e-03

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
Token former
Token ID1966
Feature activation-5.43e-02

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.004
hemat-0.004
adem-0.004
theless-0.004
hower-0.004
Token photo
Token ID4590
Feature activation+1.16e-01

Pos logit contributions
+0.048
HI+0.044
+0.043
 Shepard+0.040
 Honest+0.039
Neg logit contributions
ribute-0.042
hemat-0.042
adem-0.042
theless-0.041
 inheritance-0.041
Token)
Token ID8
Feature activation-5.27e-02

Pos logit contributions
ribute+0.084
adem+0.083
hemat+0.083
 inheritance+0.081
theless+0.081
Neg logit contributions
-0.094
HI-0.086
-0.084
 Shepard-0.078
phi-0.076
Token
Token ID198
Feature activation-9.19e-03

Pos logit contributions
+0.044
HI+0.041
+0.040
 Shepard+0.037
 Honest+0.036
Neg logit contributions
ribute-0.037
hemat-0.036
adem-0.036
theless-0.036
 inheritance-0.035
Token
Token ID198
Feature activation-3.67e-03

Pos logit contributions
+0.008
HI+0.007
+0.007
 Shepard+0.007
 Honest+0.007
Neg logit contributions
ribute-0.006
hemat-0.006
adem-0.006
theless-0.006
 inheritance-0.006
TokenDel
Token ID13856
Feature activation-3.47e-03

Pos logit contributions
+0.003
HI+0.003
+0.002
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Tokenhi
Token ID5303
Feature activation-5.52e-01

Pos logit contributions
+0.001
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.005
adem-0.005
hemat-0.005
theless-0.005
 inheritance-0.005
Token Law
Token ID3854
Feature activation+7.23e-02

Pos logit contributions
+0.467
HI+0.428
+0.420
 Shepard+0.393
 Honest+0.382
Neg logit contributions
ribute-0.384
hemat-0.380
adem-0.379
theless-0.375
hower-0.369
Token Minister
Token ID4139
Feature activation-1.98e-01

Pos logit contributions
ribute+0.068
hemat+0.068
adem+0.068
theless+0.067
 inheritance+0.066
Neg logit contributions
-0.043
HI-0.038
-0.037
 Shepard-0.034
 Honest-0.032
Token J
Token ID449
Feature activation+2.37e-03

Pos logit contributions
+0.164
HI+0.150
+0.148
 Shepard+0.138
 Honest+0.134
Neg logit contributions
ribute-0.141
hemat-0.140
adem-0.140
theless-0.138
 inheritance-0.136
Tokenit
Token ID270
Feature activation+1.18e-01

Pos logit contributions
ribute+0.001
adem+0.001
hemat+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Tokenender
Token ID2194
Feature activation+4.73e-03

Pos logit contributions
ribute+0.084
hemat+0.083
adem+0.083
theless+0.082
 inheritance+0.081
Neg logit contributions
-0.097
HI-0.089
-0.088
 Shepard-0.082
 Honest-0.079
Tokens
Token ID82
Feature activation-1.28e-02

Pos logit contributions
+0.001
HI+0.001
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
Token refinery
Token ID44133
Feature activation-4.94e-02

Pos logit contributions
+0.011
HI+0.010
+0.010
 Shepard+0.009
 Honest+0.009
Neg logit contributions
ribute-0.009
hemat-0.009
adem-0.009
theless-0.008
 inheritance-0.008
Token business
Token ID1597
Feature activation-1.53e-03

Pos logit contributions
+0.041
HI+0.037
+0.037
 Shepard+0.034
 Honest+0.033
Neg logit contributions
ribute-0.035
adem-0.035
hemat-0.035
theless-0.034
 inheritance-0.034
Token.
Token ID13
Feature activation+1.56e-04

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token Ser
Token ID2930
Feature activation+2.25e-01

Pos logit contributions
ribute+0.000
adem+0.000
hemat+0.000
theless+0.000
 inheritance+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
Tokenhi
Token ID5303
Feature activation-5.52e-01

Pos logit contributions
ribute+0.297
hemat+0.297
adem+0.295
theless+0.294
rue+0.291
Neg logit contributions
-0.048
-0.032
HI-0.030
 Shepard-0.019
 Honest-0.014
Tokeny
Token ID88
Feature activation-1.06e-01

Pos logit contributions
+0.403
HI+0.364
+0.358
 Shepard+0.329
 Honest+0.318
Neg logit contributions
ribute-0.446
hemat-0.443
adem-0.443
theless-0.438
 inheritance-0.433
Token Kaz
Token ID16385
Feature activation-1.11e-01

Pos logit contributions
+0.075
HI+0.068
+0.067
 Shepard+0.062
 Honest+0.059
Neg logit contributions
ribute-0.088
adem-0.088
hemat-0.088
theless-0.087
 inheritance-0.086
Tokenarov
Token ID42737
Feature activation+5.74e-03

Pos logit contributions
+0.099
HI+0.091
+0.090
 Shepard+0.084
 Honest+0.082
Neg logit contributions
ribute-0.072
hemat-0.072
adem-0.072
theless-0.071
 inheritance-0.070
Token is
Token ID318
Feature activation-1.29e-03

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Token former
Token ID1966
Feature activation-6.15e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
hower-0.001
Token Gol
Token ID26849
Feature activation+1.82e-01

Pos logit contributions
ribute+0.013
hemat+0.013
adem+0.013
theless+0.013
 inheritance+0.013
Neg logit contributions
-0.016
HI-0.015
-0.014
 Shepard-0.013
 Honest-0.013
Tokenov
Token ID709
Feature activation-3.97e-02

Pos logit contributions
ribute+0.117
hemat+0.116
adem+0.116
theless+0.115
 inheritance+0.113
Neg logit contributions
-0.163
HI-0.149
-0.147
 Shepard-0.138
 Honest-0.134
Tokenko
Token ID7204
Feature activation-1.50e-01

Pos logit contributions
+0.035
HI+0.032
+0.032
 Shepard+0.030
 Honest+0.029
Neg logit contributions
ribute-0.026
hemat-0.026
adem-0.026
theless-0.026
 inheritance-0.025
Token,
Token ID11
Feature activation-4.72e-03

Pos logit contributions
+0.124
HI+0.113
+0.111
 Shepard+0.104
 Honest+0.101
Neg logit contributions
ribute-0.107
hemat-0.106
adem-0.106
theless-0.105
hower-0.103
Token Ser
Token ID2930
Feature activation+2.27e-01

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
hower-0.003
Tokenhi
Token ID5303
Feature activation-5.51e-01

Pos logit contributions
ribute+0.298
hemat+0.297
adem+0.297
theless+0.294
rue+0.292
Neg logit contributions
-0.051
HI-0.034
-0.034
 Shepard-0.022
 Honest-0.017
Tokeny
Token ID88
Feature activation-9.58e-02

Pos logit contributions
+0.401
HI+0.361
+0.354
 Shepard+0.327
 Honest+0.316
Neg logit contributions
ribute-0.448
hemat-0.445
adem-0.444
theless-0.441
 inheritance-0.435
Token Kaz
Token ID16385
Feature activation-1.06e-01

Pos logit contributions
+0.072
HI+0.065
+0.064
 Shepard+0.059
 Honest+0.057
Neg logit contributions
ribute-0.075
adem-0.075
hemat-0.075
theless-0.074
 inheritance-0.073
Tokenarov
Token ID42737
Feature activation+6.09e-03

Pos logit contributions
+0.093
HI+0.086
+0.084
 Shepard+0.079
 Honest+0.077
Neg logit contributions
ribute-0.069
hemat-0.069
adem-0.069
theless-0.068
 inheritance-0.067
Token,
Token ID11
Feature activation+3.43e-03

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.005
-0.005
 Shepard-0.004
 Honest-0.004
Token Vital
Token ID28476
Feature activation+1.98e-01

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
hower+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.002
 Honest-0.002
Token "
Token ID366
Feature activation+3.14e-02

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.004
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Tokenann
Token ID1236
Feature activation-1.39e-01

Pos logit contributions
ribute+0.024
hemat+0.023
adem+0.023
theless+0.023
 inheritance+0.023
Neg logit contributions
-0.025
HI-0.023
-0.022
 Shepard-0.021
 Honest-0.020
Token data
Token ID1366
Feature activation-6.78e-02

Pos logit contributions
+0.119
HI+0.110
+0.108
 Shepard+0.101
 Honest+0.098
Neg logit contributions
ribute-0.095
adem-0.094
hemat-0.094
theless-0.093
 inheritance-0.091
Token su
Token ID424
Feature activation-1.76e-02

Pos logit contributions
+0.057
HI+0.052
+0.051
 Shepard+0.048
 Honest+0.047
Neg logit contributions
ribute-0.047
adem-0.047
hemat-0.047
theless-0.046
 inheritance-0.046
Tokenk
Token ID74
Feature activation-9.12e-03

Pos logit contributions
+0.013
HI+0.012
+0.012
 Shepard+0.011
 Honest+0.011
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.013
 inheritance-0.013
Tokenhi
Token ID5303
Feature activation-5.51e-01

Pos logit contributions
+0.007
HI+0.007
+0.006
 Shepard+0.006
 Honest+0.006
Neg logit contributions
ribute-0.007
hemat-0.007
adem-0.007
theless-0.007
 inheritance-0.007
Token b
Token ID275
Feature activation-8.11e-02

Pos logit contributions
+0.457
HI+0.418
+0.411
 Shepard+0.384
 Honest+0.372
Neg logit contributions
ribute-0.390
hemat-0.387
adem-0.386
theless-0.382
 inheritance-0.377
Tokenh
Token ID71
Feature activation-3.09e-01

Pos logit contributions
+0.038
HI+0.033
+0.032
 Shepard+0.028
 Honest+0.026
Neg logit contributions
ribute-0.086
adem-0.086
hemat-0.086
theless-0.085
rue-0.084
Tokenava
Token ID4170
Feature activation+1.16e-03

Pos logit contributions
+0.265
HI+0.243
+0.240
 Shepard+0.225
 Honest+0.218
Neg logit contributions
ribute-0.211
adem-0.209
hemat-0.209
theless-0.205
 inheritance-0.202
Token"
Token ID1
Feature activation+2.54e-02

Pos logit contributions
ribute+0.001
hemat+0.001
adem+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.001
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Token and
Token ID290
Feature activation-6.96e-03

Pos logit contributions
ribute+0.017
adem+0.017
hemat+0.017
theless+0.017
 inheritance+0.016
Neg logit contributions
-0.022
HI-0.020
-0.020
 Shepard-0.019
 Honest-0.018
Token Son
Token ID6295
Feature activation+1.74e-01

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Tokenip
Token ID541
Feature activation+1.18e-01

Pos logit contributions
ribute+0.128
adem+0.128
hemat+0.128
theless+0.125
rue+0.124
Neg logit contributions
-0.139
HI-0.128
-0.126
 Shepard-0.117
 Honest-0.113
Tokenat
Token ID265
Feature activation+1.46e-01

Pos logit contributions
ribute+0.075
hemat+0.074
adem+0.074
theless+0.073
 inheritance+0.072
Neg logit contributions
-0.106
HI-0.098
-0.096
 Shepard-0.090
 Honest-0.088
Token,
Token ID11
Feature activation+1.01e-02

Pos logit contributions
ribute+0.106
hemat+0.105
adem+0.105
theless+0.104
 inheritance+0.103
Neg logit contributions
-0.118
HI-0.108
-0.106
 Shepard-0.098
 Honest-0.096
Token B
Token ID347
Feature activation-7.19e-03

Pos logit contributions
ribute+0.007
hemat+0.007
adem+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.008
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
Tokenhi
Token ID5303
Feature activation-5.46e-01

Pos logit contributions
+0.006
HI+0.005
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
adem-0.005
hemat-0.005
theless-0.005
 inheritance-0.005
Tokenw
Token ID86
Feature activation+3.45e-02

Pos logit contributions
+0.476
HI+0.438
+0.431
 Shepard+0.403
 Honest+0.392
Neg logit contributions
ribute-0.364
hemat-0.361
adem-0.361
theless-0.356
 inheritance-0.351
Tokenani
Token ID3216
Feature activation-1.34e-01

Pos logit contributions
ribute+0.027
adem+0.026
hemat+0.026
theless+0.026
 inheritance+0.026
Neg logit contributions
-0.027
HI-0.024
-0.024
 Shepard-0.022
 Honest-0.022
Token,
Token ID11
Feature activation+1.07e-02

Pos logit contributions
+0.112
HI+0.103
+0.101
 Shepard+0.094
 Honest+0.092
Neg logit contributions
ribute-0.093
hemat-0.093
adem-0.092
theless-0.091
 inheritance-0.090
Token J
Token ID449
Feature activation+1.23e-02

Pos logit contributions
ribute+0.008
hemat+0.008
adem+0.008
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
Tokenind
Token ID521
Feature activation-1.30e-01

Pos logit contributions
ribute+0.010
adem+0.010
hemat+0.010
theless+0.010
rue+0.010
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
Token...
Token ID986
Feature activation-8.99e-02

Pos logit contributions
+0.270
HI+0.247
+0.243
 Shepard+0.227
 Honest+0.220
Neg logit contributions
ribute-0.227
hemat-0.225
adem-0.225
theless-0.222
 inheritance-0.219
Tokenk
Token ID74
Feature activation-8.71e-03

Pos logit contributions
+0.075
HI+0.069
+0.068
 Shepard+0.063
 Honest+0.061
Neg logit contributions
ribute-0.063
hemat-0.063
adem-0.063
theless-0.062
 inheritance-0.061
Tokenuch
Token ID794
Feature activation+2.25e-02

Pos logit contributions
+0.007
HI+0.006
+0.006
 Shepard+0.006
 Honest+0.006
Neg logit contributions
ribute-0.006
adem-0.006
hemat-0.006
theless-0.006
 inheritance-0.006
Tokenh
Token ID71
Feature activation-3.04e-01

Pos logit contributions
ribute+0.016
adem+0.016
hemat+0.016
theless+0.016
 inheritance+0.016
Neg logit contributions
-0.018
HI-0.017
-0.016
 Shepard-0.015
 Honest-0.015
Token b
Token ID275
Feature activation-7.51e-02

Pos logit contributions
+0.250
HI+0.229
+0.225
 Shepard+0.210
 Honest+0.204
Neg logit contributions
ribute-0.218
adem-0.216
hemat-0.216
theless-0.213
 inheritance-0.210
Tokenhi
Token ID5303
Feature activation-5.46e-01

Pos logit contributions
+0.039
HI+0.034
+0.034
 Shepard+0.030
 Honest+0.028
Neg logit contributions
ribute-0.076
adem-0.076
hemat-0.076
theless-0.075
rue-0.074
Token.
Token ID13
Feature activation+5.47e-03

Pos logit contributions
+0.444
HI+0.405
+0.398
 Shepard+0.371
 Honest+0.360
Neg logit contributions
ribute-0.397
hemat-0.394
adem-0.394
theless-0.389
 inheritance-0.384
Token al
Token ID435
Feature activation-1.74e-03

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Tokenag
Token ID363
Feature activation+4.90e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
adem-0.001
hemat-0.001
theless-0.001
 inheritance-0.001
Token se
Token ID384
Feature activation-6.49e-03

Pos logit contributions
ribute+0.036
adem+0.036
hemat+0.036
theless+0.036
 inheritance+0.035
Neg logit contributions
-0.039
HI-0.035
-0.035
 Shepard-0.032
 Honest-0.031
Token int
Token ID493
Feature activation-2.76e-01

Pos logit contributions
+0.005
HI+0.005
+0.005
 Shepard+0.004
 Honest+0.004
Neg logit contributions
ribute-0.005
adem-0.005
hemat-0.005
theless-0.005
 inheritance-0.005
Tokenana
Token ID2271
Feature activation-1.83e-01

Pos logit contributions
adem+0.056
ribute+0.056
hemat+0.055
theless+0.054
rue+0.054
Neg logit contributions
-0.043
HI-0.039
-0.038
 Shepard-0.035
 Honest-0.034
Token
Token ID447
Feature activation+1.42e-02

Pos logit contributions
+0.147
HI+0.134
+0.132
 Shepard+0.122
 Honest+0.119
Neg logit contributions
ribute-0.135
hemat-0.134
adem-0.134
theless-0.133
 inheritance-0.131
Token
Token ID247
Feature activation+1.50e-02

Pos logit contributions
ribute+0.009
hemat+0.009
adem+0.009
theless+0.009
 inheritance+0.008
Neg logit contributions
-0.013
HI-0.012
-0.012
 Shepard-0.011
 Honest-0.011
Tokens
Token ID82
Feature activation+1.40e-03

Pos logit contributions
ribute+0.010
hemat+0.010
adem+0.010
theless+0.010
 inheritance+0.010
Neg logit contributions
-0.013
HI-0.012
-0.011
 Shepard-0.011
 Honest-0.010
Token Gar
Token ID7164
Feature activation+5.20e-02

Pos logit contributions
ribute+0.001
hemat+0.001
adem+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.001
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Tokenhi
Token ID5303
Feature activation-5.44e-01

Pos logit contributions
ribute+0.041
adem+0.040
hemat+0.040
theless+0.039
rue+0.039
Neg logit contributions
-0.039
HI-0.036
-0.035
 Shepard-0.033
 Honest-0.032
Token H
Token ID367
Feature activation-1.03e-02

Pos logit contributions
+0.432
HI+0.393
+0.387
 Shepard+0.359
 Honest+0.349
Neg logit contributions
ribute-0.405
hemat-0.402
adem-0.402
theless-0.396
 inheritance-0.392
Tokenars
Token ID945
Feature activation+7.05e-02

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.008
 Honest+0.007
Neg logit contributions
ribute-0.007
adem-0.007
hemat-0.007
theless-0.007
 inheritance-0.007
Tokenaru
Token ID11493
Feature activation-2.99e-02

Pos logit contributions
ribute+0.057
adem+0.057
hemat+0.057
theless+0.056
rue+0.055
Neg logit contributions
-0.051
HI-0.046
-0.046
 Shepard-0.042
 Honest-0.040
Token Junction
Token ID44410
Feature activation-2.56e-01

Pos logit contributions
+0.022
HI+0.020
+0.020
 Shepard+0.018
 Honest+0.018
Neg logit contributions
ribute-0.024
hemat-0.024
adem-0.024
theless-0.023
 inheritance-0.023
Token.
Token ID13
Feature activation-3.41e-03

Pos logit contributions
+0.213
HI+0.195
+0.192
 Shepard+0.179
 Honest+0.174
Neg logit contributions
ribute-0.181
adem-0.179
hemat-0.179
theless-0.176
 inheritance-0.175
Token:
Token ID25
Feature activation+1.23e-02

Pos logit contributions
+0.039
HI+0.036
+0.035
 Shepard+0.033
 Honest+0.032
Neg logit contributions
ribute-0.031
hemat-0.031
adem-0.031
theless-0.030
 inheritance-0.030
Token Sh
Token ID911
Feature activation+9.20e-03

Pos logit contributions
ribute+0.009
adem+0.009
hemat+0.009
theless+0.009
 inheritance+0.008
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.009
 Honest-0.008
Tokenri
Token ID380
Feature activation-1.42e-01

Pos logit contributions
ribute+0.006
adem+0.006
hemat+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token Pre
Token ID3771
Feature activation+2.54e-02

Pos logit contributions
+0.116
HI+0.106
+0.104
 Shepard+0.097
 Honest+0.094
Neg logit contributions
ribute-0.103
hemat-0.102
adem-0.102
theless-0.101
 inheritance-0.100
Tokenmb
Token ID2022
Feature activation-5.19e-02

Pos logit contributions
ribute+0.020
hemat+0.019
adem+0.019
theless+0.019
 inheritance+0.019
Neg logit contributions
-0.019
HI-0.018
-0.018
 Shepard-0.016
 Honest-0.016
Tokenhi
Token ID5303
Feature activation-5.44e-01

Pos logit contributions
+0.044
HI+0.040
+0.040
 Shepard+0.037
 Honest+0.036
Neg logit contributions
ribute-0.036
hemat-0.036
adem-0.036
theless-0.035
 inheritance-0.035
Tokenks
Token ID591
Feature activation+2.33e-02

Pos logit contributions
+0.454
HI+0.415
+0.409
 Shepard+0.381
 Honest+0.370
Neg logit contributions
ribute-0.384
hemat-0.381
adem-0.381
theless-0.375
 inheritance-0.370
Tokenhu
Token ID13415
Feature activation-9.71e-02

Pos logit contributions
ribute+0.019
adem+0.019
hemat+0.019
theless+0.018
rue+0.018
Neg logit contributions
-0.017
HI-0.015
-0.015
 Shepard-0.014
 Honest-0.013
Tokenji
Token ID7285
Feature activation-1.87e-01

Pos logit contributions
+0.082
HI+0.075
+0.074
 Shepard+0.069
 Honest+0.067
Neg logit contributions
ribute-0.067
hemat-0.067
adem-0.067
theless-0.066
 inheritance-0.065
Token Marg
Token ID11899
Feature activation-1.58e-01

Pos logit contributions
+0.158
HI+0.145
+0.143
 Shepard+0.133
 Honest+0.129
Neg logit contributions
ribute-0.130
hemat-0.129
adem-0.129
theless-0.126
 inheritance-0.126
Token,
Token ID11
Feature activation+6.40e-03

Pos logit contributions
+0.134
HI+0.123
+0.121
 Shepard+0.113
 Honest+0.110
Neg logit contributions
ribute-0.109
adem-0.108
hemat-0.108
theless-0.107
 inheritance-0.105
Token Roh
Token ID32694
Feature activation-1.41e-01

Pos logit contributions
+0.002
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
adem-0.001
hemat-0.001
theless-0.001
hower-0.001
Tokent
Token ID83
Feature activation-1.00e-03

Pos logit contributions
+0.115
HI+0.105
+0.104
 Shepard+0.096
 Honest+0.094
Neg logit contributions
ribute-0.102
hemat-0.101
adem-0.101
theless-0.100
 inheritance-0.099
Tokenak
Token ID461
Feature activation-8.55e-03

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
adem-0.001
hemat-0.001
theless-0.001
 inheritance-0.001
Token,
Token ID11
Feature activation+3.59e-03

Pos logit contributions
+0.007
HI+0.007
+0.006
 Shepard+0.006
 Honest+0.006
Neg logit contributions
ribute-0.006
hemat-0.006
adem-0.006
theless-0.006
 inheritance-0.006
Token B
Token ID347
Feature activation-6.76e-03

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.002
Tokenhi
Token ID5303
Feature activation-5.43e-01

Pos logit contributions
+0.003
+0.002
HI+0.002
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.008
adem-0.008
hemat-0.008
theless-0.008
hower-0.007
Tokenw
Token ID86
Feature activation+3.00e-02

Pos logit contributions
+0.475
HI+0.435
+0.428
 Shepard+0.401
 Honest+0.390
Neg logit contributions
ribute-0.361
hemat-0.358
adem-0.358
theless-0.355
 inheritance-0.349
Tokenani
Token ID3216
Feature activation-1.35e-01

Pos logit contributions
ribute+0.024
adem+0.024
hemat+0.023
rue+0.023
 inheritance+0.023
Neg logit contributions
-0.023
HI-0.021
-0.020
 Shepard-0.019
 Honest-0.018
Token,
Token ID11
Feature activation+1.06e-02

Pos logit contributions
+0.114
HI+0.104
+0.103
 Shepard+0.096
 Honest+0.093
Neg logit contributions
ribute-0.094
hemat-0.093
adem-0.093
theless-0.092
 inheritance-0.090
Token J
Token ID449
Feature activation+1.36e-02

Pos logit contributions
ribute+0.007
hemat+0.007
adem+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.008
 Honest-0.007
Tokenh
Token ID71
Feature activation-2.95e-01

Pos logit contributions
ribute+0.013
hemat+0.013
adem+0.013
theless+0.013
 inheritance+0.013
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token ago
Token ID2084
Feature activation-8.60e-02

Pos logit contributions
+0.008
HI+0.007
+0.007
 Shepard+0.006
 Honest+0.006
Neg logit contributions
ribute-0.008
adem-0.007
hemat-0.007
theless-0.007
 inheritance-0.007
Token when
Token ID618
Feature activation+6.22e-02

Pos logit contributions
+0.073
HI+0.067
+0.066
 Shepard+0.062
 Honest+0.060
Neg logit contributions
ribute-0.059
adem-0.059
hemat-0.059
theless-0.058
 inheritance-0.057
Token se
Token ID384
Feature activation-1.02e-02

Pos logit contributions
ribute+0.044
adem+0.043
hemat+0.043
theless+0.043
 inheritance+0.042
Neg logit contributions
-0.052
HI-0.048
-0.047
 Shepard-0.043
 Honest-0.042
Tokenared
Token ID1144
Feature activation+7.35e-02

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.008
 Honest+0.007
Neg logit contributions
ribute-0.007
hemat-0.007
adem-0.007
theless-0.006
 inheritance-0.006
Token a
Token ID257
Feature activation+2.99e-02

Pos logit contributions
ribute+0.049
adem+0.048
hemat+0.048
theless+0.048
 inheritance+0.047
Neg logit contributions
-0.064
HI-0.059
-0.058
 Shepard-0.054
 Honest-0.053
Tokenhi
Token ID5303
Feature activation-5.42e-01

Pos logit contributions
ribute+0.020
adem+0.020
hemat+0.020
theless+0.020
 inheritance+0.019
Neg logit contributions
-0.026
HI-0.024
-0.023
 Shepard-0.022
 Honest-0.021
Token was
Token ID373
Feature activation-7.93e-03

Pos logit contributions
+0.452
HI+0.414
+0.407
 Shepard+0.379
 Honest+0.368
Neg logit contributions
ribute-0.382
hemat-0.379
adem-0.379
theless-0.374
 inheritance-0.369
Token a
Token ID257
Feature activation+3.50e-02

Pos logit contributions
+0.007
HI+0.006
+0.006
 Shepard+0.006
 Honest+0.006
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
Token trend
Token ID5182
Feature activation+1.72e-01

Pos logit contributions
ribute+0.023
adem+0.023
hemat+0.023
theless+0.022
 inheritance+0.022
Neg logit contributions
-0.031
HI-0.028
-0.028
 Shepard-0.026
 Honest-0.025
Token,
Token ID11
Feature activation+3.80e-03

Pos logit contributions
ribute+0.118
hemat+0.117
adem+0.117
theless+0.116
 inheritance+0.115
Neg logit contributions
-0.146
HI-0.134
-0.132
 Shepard-0.123
 Honest-0.120
Token they
Token ID484
Feature activation+1.32e-02

Pos logit contributions
ribute+0.003
adem+0.003
hemat+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
Token baseball
Token ID9283
Feature activation-9.41e-03

Pos logit contributions
ribute+0.007
hemat+0.007
adem+0.007
theless+0.007
hower+0.007
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
Token,
Token ID11
Feature activation+2.72e-02

Pos logit contributions
+0.008
HI+0.007
+0.007
 Shepard+0.007
 Honest+0.007
Neg logit contributions
ribute-0.006
adem-0.006
hemat-0.006
theless-0.006
 inheritance-0.006
Token saying
Token ID2282
Feature activation-1.50e-01

Pos logit contributions
ribute+0.019
hemat+0.019
adem+0.019
theless+0.019
hower+0.018
Neg logit contributions
-0.023
HI-0.021
-0.020
 Shepard-0.019
 Honest-0.018
Token
Token ID564
Feature activation+3.11e-02

Pos logit contributions
+0.125
HI+0.114
+0.112
 Shepard+0.105
 Honest+0.102
Neg logit contributions
ribute-0.107
hemat-0.106
adem-0.106
theless-0.104
 inheritance-0.103
Token
Token ID246
Feature activation+7.21e-02

Pos logit contributions
ribute+0.022
hemat+0.022
adem+0.022
theless+0.022
 inheritance+0.021
Neg logit contributions
-0.026
HI-0.024
-0.023
 Shepard-0.022
 Honest-0.021
Tokenhi
Token ID5303
Feature activation-5.32e-01

Pos logit contributions
ribute+0.051
adem+0.051
hemat+0.051
theless+0.050
 inheritance+0.050
Neg logit contributions
-0.059
HI-0.054
-0.054
 Shepard-0.050
 Honest-0.048
Token
Token ID447
Feature activation+2.09e-02

Pos logit contributions
+0.454
HI+0.416
+0.409
 Shepard+0.382
 Honest+0.371
Neg logit contributions
ribute-0.365
adem-0.363
hemat-0.363
theless-0.358
 inheritance-0.353
Token
Token ID247
Feature activation+1.79e-02

Pos logit contributions
ribute+0.014
hemat+0.014
adem+0.014
theless+0.014
 inheritance+0.014
Neg logit contributions
-0.018
HI-0.017
-0.016
 Shepard-0.015
 Honest-0.015
Token to
Token ID284
Feature activation+1.88e-02

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.012
rue+0.012
Neg logit contributions
-0.015
HI-0.014
-0.014
 Shepard-0.013
 Honest-0.013
Token everybody
Token ID7288
Feature activation+2.79e-02

Pos logit contributions
ribute+0.013
hemat+0.013
adem+0.013
theless+0.012
 inheritance+0.012
Neg logit contributions
-0.016
HI-0.015
-0.015
 Shepard-0.014
 Honest-0.013
Token,
Token ID11
Feature activation+1.75e-02

Pos logit contributions
ribute+0.020
hemat+0.019
adem+0.019
theless+0.019
 inheritance+0.019
Neg logit contributions
-0.023
HI-0.021
-0.021
 Shepard-0.020
 Honest-0.019
Tokenama
Token ID1689
Feature activation-1.91e-01

Pos logit contributions
+0.090
HI+0.082
+0.080
 Shepard+0.074
 Honest+0.072
Neg logit contributions
ribute-0.094
hemat-0.093
adem-0.093
theless-0.092
 inheritance-0.091
Tokeni
Token ID72
Feature activation-1.19e-01

Pos logit contributions
+0.153
HI+0.140
+0.138
 Shepard+0.128
 Honest+0.124
Neg logit contributions
ribute-0.140
hemat-0.139
adem-0.139
theless-0.137
 inheritance-0.135
Token,
Token ID11
Feature activation-6.48e-03

Pos logit contributions
+0.096
HI+0.088
+0.086
 Shepard+0.080
 Honest+0.078
Neg logit contributions
ribute-0.087
hemat-0.087
adem-0.087
theless-0.085
 inheritance-0.085
Token k
Token ID479
Feature activation-1.52e-01

Pos logit contributions
+0.005
HI+0.005
+0.005
 Shepard+0.004
 Honest+0.004
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
Tokenū
Token ID20317
Feature activation-1.05e-01

Pos logit contributions
+0.120
HI+0.110
+0.108
 Shepard+0.100
 Honest+0.097
Neg logit contributions
ribute-0.114
adem-0.113
hemat-0.113
theless-0.111
rue-0.110
Tokenpa
Token ID8957
Feature activation-5.28e-01

Pos logit contributions
+0.081
HI+0.074
+0.073
 Shepard+0.068
 Honest+0.066
Neg logit contributions
ribute-0.079
hemat-0.079
adem-0.079
theless-0.078
 inheritance-0.077
Token
Token ID134
Feature activation-9.78e-02

Pos logit contributions
+0.427
HI+0.389
+0.382
 Shepard+0.355
 Honest+0.345
Neg logit contributions
ribute-0.386
hemat-0.383
adem-0.383
theless-0.378
 inheritance-0.373
Token
Token ID119
Feature activation-1.93e-02

Pos logit contributions
+0.079
HI+0.072
+0.071
 Shepard+0.066
 Honest+0.064
Neg logit contributions
ribute-0.072
hemat-0.071
adem-0.071
theless-0.070
 inheritance-0.069
Tokena
Token ID64
Feature activation-1.23e-03

Pos logit contributions
+0.016
HI+0.014
+0.014
 Shepard+0.013
 Honest+0.013
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.014
 inheritance-0.014
Token,
Token ID11
Feature activation-1.26e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Token al
Token ID435
Feature activation-1.34e-02

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.009
hemat-0.009
adem-0.009
theless-0.009
 inheritance-0.009
Token/
Token ID14
Feature activation-4.17e-02

Pos logit contributions
+0.018
HI+0.017
+0.017
 Shepard+0.016
 Honest+0.015
Neg logit contributions
ribute-0.013
hemat-0.013
adem-0.013
theless-0.012
 inheritance-0.012
Tokens
Token ID82
Feature activation-1.86e-02

Pos logit contributions
+0.033
HI+0.031
+0.030
 Shepard+0.028
 Honest+0.027
Neg logit contributions
ribute-0.031
hemat-0.030
adem-0.030
theless-0.030
 inheritance-0.030
Tokengb
Token ID22296
Feature activation-1.43e-01

Pos logit contributions
+0.014
HI+0.013
+0.013
 Shepard+0.012
 Honest+0.012
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.014
 inheritance-0.014
Token (
Token ID357
Feature activation+4.53e-02

Pos logit contributions
+0.134
HI+0.124
+0.122
 Shepard+0.115
 Honest+0.112
Neg logit contributions
ribute-0.085
hemat-0.085
adem-0.085
theless-0.083
 inheritance-0.082
Tokend
Token ID67
Feature activation+1.17e-01

Pos logit contributions
ribute+0.033
hemat+0.033
adem+0.033
theless+0.033
 inheritance+0.032
Neg logit contributions
-0.036
HI-0.033
-0.033
 Shepard-0.030
 Honest-0.029
Tokenpa
Token ID8957
Feature activation-5.27e-01

Pos logit contributions
ribute+0.148
hemat+0.147
adem+0.147
theless+0.146
 inheritance+0.145
Neg logit contributions
-0.032
HI-0.023
-0.022
 Shepard-0.016
 Honest-0.014
Token,
Token ID11
Feature activation-3.93e-03

Pos logit contributions
+0.436
HI+0.398
+0.392
 Shepard+0.365
 Honest+0.354
Neg logit contributions
ribute-0.376
hemat-0.375
adem-0.374
theless-0.371
 inheritance-0.364
Token br
Token ID865
Feature activation-2.62e-04

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.003
adem-0.003
hemat-0.003
theless-0.003
 inheritance-0.003
Token,
Token ID11
Feature activation-3.24e-03

Pos logit contributions
+0.000
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
adem-0.000
hemat-0.000
theless-0.000
 inheritance-0.000
Token ok
Token ID12876
Feature activation-1.72e-01

Pos logit contributions
+0.003
HI+0.002
+0.002
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.002
adem-0.002
hemat-0.002
theless-0.002
 inheritance-0.002
Tokento
Token ID1462
Feature activation+1.30e-02

Pos logit contributions
+0.141
HI+0.130
+0.128
 Shepard+0.119
 Honest+0.116
Neg logit contributions
ribute-0.123
adem-0.122
hemat-0.122
theless-0.120
 inheritance-0.118
Token Clement
Token ID27311
Feature activation+2.55e-01

Pos logit contributions
+0.044
HI+0.041
+0.041
 Shepard+0.040
 Honest+0.039
Neg logit contributions
adem-0.018
ribute-0.018
theless-0.018
hemat-0.017
rue-0.017
Token
Token ID198
Feature activation-1.63e-04

Pos logit contributions
ribute+0.176
hemat+0.175
adem+0.175
theless+0.172
 inheritance+0.170
Neg logit contributions
-0.217
HI-0.199
-0.195
 Shepard-0.183
 Honest-0.177
Token
Token ID198
Feature activation+2.86e-04

Pos logit contributions
+0.000
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
Token2009
Token ID10531
Feature activation-3.20e-02

Pos logit contributions
ribute+0.000
hemat+0.000
adem+0.000
theless+0.000
 inheritance+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
Token Jim
Token ID5395
Feature activation-1.34e-01

Pos logit contributions
+0.024
HI+0.021
+0.021
 Shepard+0.019
 Honest+0.019
Neg logit contributions
ribute-0.026
hemat-0.025
adem-0.025
theless-0.025
 inheritance-0.025
Token Parsons
Token ID35925
Feature activation+2.91e-01

Pos logit contributions
+0.120
HI+0.111
+0.109
 Shepard+0.102
 Honest+0.100
Neg logit contributions
ribute-0.086
hemat-0.086
adem-0.085
theless-0.084
 inheritance-0.083
Token
Token ID198
Feature activation-4.21e-03

Pos logit contributions
ribute+0.198
adem+0.196
hemat+0.196
theless+0.193
 inheritance+0.191
Neg logit contributions
-0.250
HI-0.230
-0.226
 Shepard-0.211
 Honest-0.205
Token
Token ID198
Feature activation-1.39e-04

Pos logit contributions
+0.004
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token2009
Token ID10531
Feature activation-2.55e-02

Pos logit contributions
+0.000
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
Token Tony
Token ID8832
Feature activation+1.69e-02

Pos logit contributions
+0.019
HI+0.017
+0.017
 Shepard+0.016
 Honest+0.015
Neg logit contributions
ribute-0.020
hemat-0.020
adem-0.020
theless-0.020
 inheritance-0.020
Token Shal
Token ID47974
Feature activation+7.18e-02

Pos logit contributions
ribute+0.010
hemat+0.010
adem+0.010
theless+0.010
 inheritance+0.010
Neg logit contributions
-0.016
HI-0.015
-0.014
 Shepard-0.013
 Honest-0.013
Token
Token ID447
Feature activation+1.79e-02

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.012
 inheritance+0.012
Neg logit contributions
-0.015
HI-0.014
-0.014
 Shepard-0.013
 Honest-0.012
Token
Token ID251
Feature activation+4.14e-02

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.011
 inheritance+0.011
Neg logit contributions
-0.016
HI-0.015
-0.014
 Shepard-0.013
 Honest-0.013
Token
Token ID198
Feature activation+8.21e-03

Pos logit contributions
ribute+0.027
hemat+0.027
adem+0.027
theless+0.026
 inheritance+0.026
Neg logit contributions
-0.037
HI-0.034
-0.033
 Shepard-0.031
 Honest-0.030
Token
Token ID198
Feature activation+1.25e-02

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
rue+0.005
Neg logit contributions
-0.007
HI-0.007
-0.006
 Shepard-0.006
 Honest-0.006
TokenAlthough
Token ID7003
Feature activation+4.27e-02

Pos logit contributions
ribute+0.009
hemat+0.009
adem+0.009
theless+0.009
 inheritance+0.009
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.009
 Honest-0.008
Token U
Token ID471
Feature activation+2.49e-01

Pos logit contributions
ribute+0.029
hemat+0.029
adem+0.029
theless+0.029
 inheritance+0.028
Neg logit contributions
-0.036
HI-0.033
-0.033
 Shepard-0.031
 Honest-0.030
Tokenem
Token ID368
Feature activation+1.58e-01

Pos logit contributions
ribute+0.128
hemat+0.126
adem+0.126
theless+0.124
 inheritance+0.122
Neg logit contributions
-0.255
HI-0.238
-0.235
 Shepard-0.222
 Honest-0.217
Tokenatsu
Token ID19231
Feature activation-1.11e-01

Pos logit contributions
ribute+0.129
hemat+0.128
adem+0.128
theless+0.126
 inheritance+0.125
Neg logit contributions
-0.114
HI-0.103
-0.101
 Shepard-0.093
 Honest-0.090
Token
Token ID447
Feature activation+1.45e-02

Pos logit contributions
+0.094
HI+0.086
+0.085
 Shepard+0.079
 Honest+0.077
Neg logit contributions
ribute-0.077
hemat-0.077
adem-0.077
theless-0.076
 inheritance-0.075
Token
Token ID247
Feature activation+1.21e-02

Pos logit contributions
ribute+0.010
hemat+0.009
adem+0.009
 inheritance+0.009
hower+0.009
Neg logit contributions
-0.013
HI-0.012
-0.012
 Shepard-0.011
 Honest-0.011
Tokens
Token ID82
Feature activation-2.82e-03

Pos logit contributions
ribute+0.008
hemat+0.008
adem+0.008
theless+0.008
 inheritance+0.008
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.008
 Honest-0.008
Token,
Token ID11
Feature activation+2.14e-02

Pos logit contributions
ribute+0.114
hemat+0.113
adem+0.113
theless+0.111
 inheritance+0.110
Neg logit contributions
-0.138
HI-0.127
-0.124
 Shepard-0.116
 Honest-0.113
Token a
Token ID257
Feature activation+4.28e-02

Pos logit contributions
ribute+0.016
adem+0.016
hemat+0.016
theless+0.016
 inheritance+0.015
Neg logit contributions
-0.017
HI-0.016
-0.015
 Shepard-0.014
 Honest-0.014
Token professor
Token ID6240
Feature activation-3.61e-03

Pos logit contributions
ribute+0.029
hemat+0.028
adem+0.028
theless+0.028
 inheritance+0.028
Neg logit contributions
-0.037
HI-0.034
-0.034
 Shepard-0.032
 Honest-0.031
Token in
Token ID287
Feature activation+2.27e-03

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.002
hemat-0.002
adem-0.002
theless-0.002
 inheritance-0.002
Token the
Token ID262
Feature activation+1.84e-02

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token U
Token ID471
Feature activation+2.46e-01

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.011
 inheritance+0.011
Neg logit contributions
-0.017
HI-0.015
-0.015
 Shepard-0.014
 Honest-0.014
Token-
Token ID12
Feature activation+4.74e-03

Pos logit contributions
ribute+0.171
hemat+0.170
adem+0.170
theless+0.165
 inheritance+0.164
Neg logit contributions
-0.208
HI-0.189
-0.188
 Shepard-0.175
 Honest-0.171
TokenM
Token ID44
Feature activation+2.69e-01

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Token Department
Token ID2732
Feature activation+2.99e-02

Pos logit contributions
ribute+0.169
adem+0.168
hemat+0.168
theless+0.164
 inheritance+0.163
Neg logit contributions
-0.244
HI-0.225
-0.222
 Shepard-0.208
 Honest-0.203
Token of
Token ID286
Feature activation+7.75e-03

Pos logit contributions
ribute+0.020
hemat+0.020
adem+0.020
theless+0.020
 inheritance+0.020
Neg logit contributions
-0.026
HI-0.023
-0.023
 Shepard-0.022
 Honest-0.021
Token Atmospheric
Token ID41516
Feature activation-1.08e-01

Pos logit contributions
ribute+0.004
adem+0.004
hemat+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.007
 Honest-0.007
Tokenare
Token ID533
Feature activation+6.25e-02

Pos logit contributions
ribute+0.040
hemat+0.040
adem+0.040
theless+0.040
 inheritance+0.039
Neg logit contributions
-0.049
HI-0.045
-0.044
 Shepard-0.041
 Honest-0.040
Token)
Token ID8
Feature activation-4.20e-02

Pos logit contributions
ribute+0.047
hemat+0.047
adem+0.047
theless+0.046
 inheritance+0.045
Neg logit contributions
-0.049
HI-0.045
-0.044
 Shepard-0.041
 Honest-0.040
Token great
Token ID1049
Feature activation-1.43e-01

Pos logit contributions
+0.036
HI+0.033
+0.033
 Shepard+0.031
 Honest+0.030
Neg logit contributions
ribute-0.028
hemat-0.028
adem-0.028
theless-0.028
 inheritance-0.027
Token characters
Token ID3435
Feature activation-2.74e-01

Pos logit contributions
+0.123
HI+0.113
+0.111
 Shepard+0.105
 Honest+0.102
Neg logit contributions
ribute-0.097
hemat-0.096
adem-0.096
theless-0.095
hower-0.094
Token,"
Token ID553
Feature activation-4.49e-03

Pos logit contributions
+0.230
HI+0.211
+0.208
 Shepard+0.194
 Honest+0.188
Neg logit contributions
ribute-0.190
hemat-0.189
adem-0.189
theless-0.187
 inheritance-0.184
Token Walker
Token ID10120
Feature activation+2.80e-01

Pos logit contributions
+0.005
HI+0.005
+0.005
 Shepard+0.004
 Honest+0.004
Neg logit contributions
ribute-0.002
adem-0.002
hemat-0.002
theless-0.002
 inheritance-0.002
Token said
Token ID531
Feature activation-3.39e-02

Pos logit contributions
ribute+0.206
hemat+0.204
adem+0.204
theless+0.202
 inheritance+0.199
Neg logit contributions
-0.225
HI-0.205
-0.202
 Shepard-0.188
 Honest-0.182
Token.
Token ID13
Feature activation+7.46e-03

Pos logit contributions
+0.028
HI+0.026
+0.025
 Shepard+0.024
 Honest+0.023
Neg logit contributions
ribute-0.024
adem-0.024
hemat-0.024
theless-0.024
 inheritance-0.023
Token
Token ID198
Feature activation+3.25e-03

Pos logit contributions
ribute+0.005
adem+0.005
hemat+0.005
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.006
HI-0.006
-0.006
 Shepard-0.005
 Honest-0.005
Token
Token ID198
Feature activation-6.74e-03

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.002
 Honest-0.002
Token"
Token ID1
Feature activation+3.36e-02

Pos logit contributions
+0.006
HI+0.005
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
hemat-0.004
adem-0.004
theless-0.004
 inheritance-0.004
Tokenarc
Token ID5605
Feature activation-1.20e-01

Pos logit contributions
+0.036
HI+0.033
+0.032
 Shepard+0.030
 Honest+0.029
Neg logit contributions
ribute-0.029
adem-0.029
hemat-0.029
theless-0.028
 inheritance-0.028
Tokena
Token ID64
Feature activation+4.79e-03

Pos logit contributions
+0.103
HI+0.095
+0.093
 Shepard+0.087
 Honest+0.085
Neg logit contributions
ribute-0.081
adem-0.081
hemat-0.081
theless-0.079
 inheritance-0.078
Token felt
Token ID2936
Feature activation+1.08e-01

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.003
 Honest-0.003
Token the
Token ID262
Feature activation-1.27e-03

Pos logit contributions
ribute+0.075
hemat+0.075
adem+0.075
theless+0.074
 inheritance+0.073
Neg logit contributions
-0.091
HI-0.083
-0.082
 Shepard-0.076
 Honest-0.074
Token best
Token ID1266
Feature activation+1.22e-01

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
hower-0.001
Token tribute
Token ID17547
Feature activation+3.65e-01

Pos logit contributions
ribute+0.065
hemat+0.065
adem+0.064
theless+0.064
hower+0.062
Neg logit contributions
-0.122
HI-0.114
-0.112
 Shepard-0.107
 Honest-0.104
Token a
Token ID257
Feature activation+2.00e-02

Pos logit contributions
ribute+0.252
hemat+0.250
adem+0.250
theless+0.247
 inheritance+0.243
Neg logit contributions
-0.309
HI-0.284
-0.279
 Shepard-0.261
 Honest-0.254
Token fellow
Token ID5891
Feature activation-1.83e-02

Pos logit contributions
ribute+0.014
hemat+0.014
adem+0.014
theless+0.014
 inheritance+0.013
Neg logit contributions
-0.017
HI-0.015
-0.015
 Shepard-0.014
 Honest-0.014
Token writer
Token ID6260
Feature activation-1.09e-01

Pos logit contributions
+0.016
HI+0.015
+0.015
 Shepard+0.014
 Honest+0.014
Neg logit contributions
ribute-0.012
hemat-0.012
adem-0.012
theless-0.012
 inheritance-0.011
Token could
Token ID714
Feature activation+4.56e-02

Pos logit contributions
+0.097
HI+0.089
+0.088
 Shepard+0.083
 Honest+0.080
Neg logit contributions
ribute-0.071
hemat-0.071
adem-0.071
theless-0.070
hower-0.069
Token pay
Token ID1414
Feature activation+8.90e-02

Pos logit contributions
hemat+0.031
adem+0.031
theless+0.031
ribute+0.031
hower+0.030
Neg logit contributions
-0.039
HI-0.036
-0.035
 Shepard-0.033
 Honest-0.032
Token.
Token ID13
Feature activation+5.18e-03

Pos logit contributions
ribute+0.063
hemat+0.063
adem+0.063
theless+0.062
 inheritance+0.061
Neg logit contributions
-0.075
HI-0.069
-0.068
 Shepard-0.063
 Honest-0.061
Token The
Token ID383
Feature activation-5.82e-02

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Token independence
Token ID10404
Feature activation-1.89e-01

Pos logit contributions
+0.052
HI+0.048
+0.047
 Shepard+0.044
 Honest+0.043
Neg logit contributions
ribute-0.038
adem-0.037
hemat-0.037
theless-0.037
 inheritance-0.036
Token debate
Token ID4384
Feature activation-4.18e-02

Pos logit contributions
+0.178
HI+0.165
+0.163
 Shepard+0.153
 Honest+0.149
Neg logit contributions
ribute-0.112
hemat-0.111
adem-0.111
theless-0.110
 inheritance-0.107
Token shows
Token ID2523
Feature activation-1.63e-01

Pos logit contributions
+0.036
HI+0.033
+0.032
 Shepard+0.030
 Honest+0.029
Neg logit contributions
ribute-0.029
hemat-0.029
adem-0.029
theless-0.028
 inheritance-0.028
Token us
Token ID514
Feature activation+5.43e-02

Pos logit contributions
+0.137
HI+0.126
+0.124
 Shepard+0.115
 Honest+0.112
Neg logit contributions
ribute-0.114
hemat-0.113
adem-0.113
theless-0.112
hower-0.110
Token that
Token ID326
Feature activation-4.52e-03

Pos logit contributions
ribute+0.038
hemat+0.038
adem+0.038
theless+0.037
 inheritance+0.037
Neg logit contributions
-0.045
HI-0.041
-0.041
 Shepard-0.038
 Honest-0.037
Token we
Token ID356
Feature activation-2.17e-02

Pos logit contributions
+0.004
HI+0.004
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token need
Token ID761
Feature activation+5.41e-02

Pos logit contributions
+0.019
HI+0.017
+0.017
 Shepard+0.016
 Honest+0.015
Neg logit contributions
ribute-0.015
hemat-0.015
adem-0.015
theless-0.014
 inheritance-0.014
Token a
Token ID257
Feature activation+3.31e-02

Pos logit contributions
ribute+0.037
adem+0.036
hemat+0.036
theless+0.036
 inheritance+0.035
Neg logit contributions
-0.046
HI-0.043
-0.042
 Shepard-0.039
 Honest-0.038
Token politics
Token ID4819
Feature activation+2.34e-01

Pos logit contributions
ribute+0.022
adem+0.022
hemat+0.022
theless+0.022
 inheritance+0.022
Neg logit contributions
-0.028
HI-0.026
-0.026
 Shepard-0.024
 Honest-0.023
Token increases
Token ID5732
Feature activation-2.29e-02

Pos logit contributions
ribute+0.059
hemat+0.058
adem+0.058
theless+0.058
 inheritance+0.057
Neg logit contributions
-0.085
HI-0.079
-0.077
 Shepard-0.073
 Honest-0.071
Token to
Token ID284
Feature activation+1.69e-02

Pos logit contributions
+0.020
HI+0.018
+0.018
 Shepard+0.017
 Honest+0.016
Neg logit contributions
ribute-0.015
hemat-0.015
adem-0.015
theless-0.015
 inheritance-0.015
Token 1
Token ID352
Feature activation-1.18e-01

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.012
 inheritance+0.011
Neg logit contributions
-0.014
HI-0.013
-0.013
 Shepard-0.012
 Honest-0.012
Token,
Token ID11
Feature activation+1.11e-02

Pos logit contributions
+0.099
HI+0.091
+0.089
 Shepard+0.083
 Honest+0.081
Neg logit contributions
ribute-0.083
adem-0.083
hemat-0.083
theless-0.082
 inheritance-0.080
Token709
Token ID31495
Feature activation+1.50e-02

Pos logit contributions
ribute+0.007
adem+0.007
hemat+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.009
 Honest-0.008
Tokenm
Token ID76
Feature activation+1.03e-01

Pos logit contributions
ribute+0.010
adem+0.010
hemat+0.010
theless+0.009
 inheritance+0.009
Neg logit contributions
-0.014
HI-0.012
-0.012
 Shepard-0.011
 Honest-0.011
Tokenhz
Token ID32179
Feature activation-8.01e-02

Pos logit contributions
ribute+0.085
hemat+0.085
adem+0.084
theless+0.083
 inheritance+0.082
Neg logit contributions
-0.072
-0.065
HI-0.065
 Shepard-0.060
 Honest-0.058
Token with
Token ID351
Feature activation+7.08e-02

Pos logit contributions
+0.068
HI+0.062
+0.061
 Shepard+0.057
 Honest+0.056
Neg logit contributions
ribute-0.055
adem-0.055
hemat-0.055
theless-0.054
 inheritance-0.053
Token a
Token ID257
Feature activation+4.35e-02

Pos logit contributions
ribute+0.049
hemat+0.048
adem+0.048
theless+0.048
 inheritance+0.047
Neg logit contributions
-0.060
HI-0.055
-0.054
 Shepard-0.051
 Honest-0.049
Token 1
Token ID352
Feature activation-1.25e-01

Pos logit contributions
ribute+0.028
hemat+0.028
adem+0.028
theless+0.027
 inheritance+0.027
Neg logit contributions
-0.039
HI-0.036
-0.035
 Shepard-0.033
 Honest-0.032
Token,
Token ID11
Feature activation+1.47e-02

Pos logit contributions
+0.105
HI+0.096
+0.094
 Shepard+0.088
 Honest+0.085
Neg logit contributions
ribute-0.089
adem-0.088
hemat-0.088
theless-0.087
 inheritance-0.085
Token re
Token ID302
Feature activation+9.14e-03

Pos logit contributions
+0.016
HI+0.015
+0.015
 Shepard+0.014
 Honest+0.013
Neg logit contributions
ribute-0.012
hemat-0.012
adem-0.012
theless-0.012
 inheritance-0.011
Tokenind
Token ID521
Feature activation-1.36e-01

Pos logit contributions
ribute+0.007
adem+0.007
hemat+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.007
HI-0.006
-0.006
 Shepard-0.006
 Honest-0.005
Tokeneer
Token ID28153
Feature activation-7.08e-02

Pos logit contributions
+0.104
HI+0.094
+0.092
 Shepard+0.085
 Honest+0.083
Neg logit contributions
ribute-0.105
hemat-0.104
adem-0.104
theless-0.103
 inheritance-0.102
Token her
Token ID607
Feature activation-1.02e-02

Pos logit contributions
+0.060
HI+0.055
+0.054
 Shepard+0.050
 Honest+0.049
Neg logit contributions
ribute-0.049
hemat-0.049
adem-0.049
theless-0.048
 inheritance-0.048
Tokender
Token ID1082
Feature activation+1.34e-01

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.009
 Honest+0.008
Neg logit contributions
hemat-0.006
ribute-0.006
adem-0.006
theless-0.006
 inheritance-0.005
Token,
Token ID11
Feature activation+5.79e-03

Pos logit contributions
ribute+0.091
hemat+0.090
adem+0.090
theless+0.089
 inheritance+0.087
Neg logit contributions
-0.115
HI-0.105
-0.104
 Shepard-0.097
 Honest-0.094
Token Yuri
Token ID38450
Feature activation-1.56e-03

Pos logit contributions
ribute+0.004
hemat+0.004
adem+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.005
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Token Kh
Token ID5311
Feature activation-1.10e-02

Pos logit contributions
+0.001
HI+0.001
+0.001
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
hemat-0.001
adem-0.001
theless-0.001
 inheritance-0.001
Tokenudi
Token ID47928
Feature activation+1.31e-01

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.007
hemat-0.007
adem-0.007
theless-0.007
 inheritance-0.007
Token,
Token ID11
Feature activation+7.77e-03

Pos logit contributions
ribute+0.090
hemat+0.090
adem+0.090
theless+0.088
 inheritance+0.087
Neg logit contributions
-0.112
HI-0.103
-0.101
 Shepard-0.094
 Honest-0.092
Token and
Token ID290
Feature activation+4.43e-03

Pos logit contributions
ribute+0.005
hemat+0.005
adem+0.005
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.007
HI-0.006
-0.006
 Shepard-0.006
 Honest-0.005
Token place
Token ID1295
Feature activation+4.95e-02

Pos logit contributions
+0.040
HI+0.036
+0.036
 Shepard+0.033
 Honest+0.032
Neg logit contributions
ribute-0.035
adem-0.034
hemat-0.034
theless-0.034
 inheritance-0.034
Token."
Token ID526
Feature activation+1.92e-02

Pos logit contributions
ribute+0.034
adem+0.034
hemat+0.034
theless+0.033
 inheritance+0.033
Neg logit contributions
-0.042
HI-0.039
-0.038
 Shepard-0.036
 Honest-0.035
Token More
Token ID3125
Feature activation-6.50e-03

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.012
 inheritance+0.012
Neg logit contributions
-0.017
HI-0.016
-0.015
 Shepard-0.014
 Honest-0.014
Token details
Token ID3307
Feature activation+3.86e-02

Pos logit contributions
+0.006
HI+0.005
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.004
adem-0.004
hemat-0.004
theless-0.004
 inheritance-0.004
Token and
Token ID290
Feature activation+1.00e-02

Pos logit contributions
ribute+0.026
hemat+0.026
adem+0.026
theless+0.025
 inheritance+0.025
Neg logit contributions
-0.033
HI-0.031
-0.030
 Shepard-0.028
 Honest-0.028
Token further
Token ID2252
Feature activation+1.32e-02

Pos logit contributions
ribute+0.007
adem+0.007
hemat+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.009
HI-0.008
-0.008
 Shepard-0.007
 Honest-0.007
Token peace
Token ID4167
Feature activation-1.96e-01

Pos logit contributions
ribute+0.009
hemat+0.009
adem+0.009
theless+0.008
 inheritance+0.008
Neg logit contributions
-0.012
HI-0.011
-0.011
 Shepard-0.010
 Honest-0.010
Token offerings
Token ID18369
Feature activation+2.21e-02

Pos logit contributions
+0.168
HI+0.154
+0.152
 Shepard+0.142
 Honest+0.138
Neg logit contributions
ribute-0.134
hemat-0.133
adem-0.133
theless-0.131
 inheritance-0.129
Token were
Token ID547
Feature activation-4.81e-02

Pos logit contributions
ribute+0.015
hemat+0.015
adem+0.015
theless+0.014
 inheritance+0.014
Neg logit contributions
-0.019
HI-0.018
-0.017
 Shepard-0.016
 Honest-0.016
Token being
Token ID852
Feature activation-8.59e-02

Pos logit contributions
+0.043
HI+0.039
+0.039
 Shepard+0.037
 Honest+0.036
Neg logit contributions
ribute-0.031
hemat-0.031
adem-0.031
theless-0.031
 inheritance-0.030
Token considered
Token ID3177
Feature activation+8.12e-03

Pos logit contributions
+0.080
HI+0.074
+0.073
 Shepard+0.069
 Honest+0.067
Neg logit contributions
ribute-0.053
hemat-0.052
adem-0.052
theless-0.051
 inheritance-0.050
Token7
Token ID22
Feature activation+2.24e-03

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.004
 Honest+0.003
Neg logit contributions
ribute-0.003
adem-0.003
hemat-0.003
theless-0.003
 inheritance-0.003
Tokens
Token ID82
Feature activation-4.34e-03

Pos logit contributions
ribute+0.002
adem+0.002
hemat+0.002
theless+0.002
hower+0.002
Neg logit contributions
-0.002
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token)}
Token ID38165
Feature activation-3.38e-01

Pos logit contributions
+0.002
HI+0.002
+0.002
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.005
 inheritance-0.005
Token B
Token ID347
Feature activation-3.62e-03

Pos logit contributions
+0.284
HI+0.259
+0.255
 Shepard+0.239
 Honest+0.232
Neg logit contributions
ribute-0.237
hemat-0.236
adem-0.235
theless-0.232
hower-0.229
Tokenc
Token ID66
Feature activation-2.04e-02

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token5
Token ID20
Feature activation+5.70e-02

Pos logit contributions
+0.018
HI+0.017
+0.017
 Shepard+0.016
 Honest+0.015
Neg logit contributions
ribute-0.013
hemat-0.013
adem-0.013
theless-0.013
 inheritance-0.013
Token {
Token ID1391
Feature activation-9.25e-02

Pos logit contributions
ribute+0.046
adem+0.045
hemat+0.045
theless+0.045
 inheritance+0.044
Neg logit contributions
-0.042
HI-0.038
-0.037
 Shepard-0.034
 Honest-0.033
Token(
Token ID7
Feature activation+1.40e-03

Pos logit contributions
+0.075
HI+0.068
+0.067
 Shepard+0.062
 Honest+0.060
Neg logit contributions
ribute-0.068
hemat-0.067
adem-0.067
theless-0.066
 inheritance-0.065
Token4
Token ID19
Feature activation+1.70e-01

Pos logit contributions
ribute+0.001
adem+0.001
hemat+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.001
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Tokens
Token ID82
Feature activation-7.44e-03

Pos logit contributions
ribute+0.120
adem+0.119
hemat+0.119
theless+0.118
 inheritance+0.116
Neg logit contributions
-0.142
HI-0.130
-0.128
 Shepard-0.119
 Honest-0.116
Token)}
Token ID38165
Feature activation-3.37e-01

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.008
hemat-0.008
adem-0.008
theless-0.008
 inheritance-0.008
Token.
Token ID13
Feature activation-6.64e-03

Pos logit contributions
+0.058
HI+0.053
+0.052
 Shepard+0.049
 Honest+0.047
Neg logit contributions
ribute-0.048
hemat-0.048
adem-0.048
theless-0.047
hower-0.047
Token
Token ID198
Feature activation-1.53e-02

Pos logit contributions
+0.006
HI+0.005
+0.005
 Shepard+0.005
 Honest+0.005
Neg logit contributions
ribute-0.005
hemat-0.005
adem-0.005
theless-0.004
 inheritance-0.004
Token
Token ID198
Feature activation-1.19e-02

Pos logit contributions
+0.013
HI+0.012
+0.012
 Shepard+0.011
 Honest+0.011
Neg logit contributions
ribute-0.010
hemat-0.010
adem-0.010
theless-0.010
 inheritance-0.010
TokenAnd
Token ID1870
Feature activation-3.55e-02

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.009
hemat-0.009
adem-0.009
theless-0.009
 inheritance-0.009
Token it
Token ID340
Feature activation+4.88e-03

Pos logit contributions
+0.029
HI+0.027
+0.026
 Shepard+0.024
 Honest+0.024
Neg logit contributions
ribute-0.025
adem-0.025
hemat-0.025
theless-0.025
 inheritance-0.025
Token plays
Token ID5341
Feature activation-6.55e-02

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.003
 Honest-0.003
Token really
Token ID1107
Feature activation+4.36e-02

Pos logit contributions
+0.057
HI+0.052
+0.051
 Shepard+0.048
 Honest+0.047
Neg logit contributions
ribute-0.044
hemat-0.044
adem-0.044
theless-0.043
hower-0.043
Token well
Token ID880
Feature activation+1.53e-01

Pos logit contributions
ribute+0.023
hemat+0.023
adem+0.023
theless+0.023
hower+0.022
Neg logit contributions
-0.044
HI-0.041
-0.040
 Shepard-0.038
 Honest-0.037
Token,
Token ID11
Feature activation-5.86e-03

Pos logit contributions
ribute+0.106
hemat+0.105
adem+0.105
theless+0.104
 inheritance+0.102
Neg logit contributions
-0.129
HI-0.118
-0.116
 Shepard-0.109
 Honest-0.105
Token as
Token ID355
Feature activation-2.69e-02

Pos logit contributions
+0.005
HI+0.005
+0.004
 Shepard+0.004
 Honest+0.004
Neg logit contributions
ribute-0.004
hemat-0.004
adem-0.004
theless-0.004
 inheritance-0.004
Token it
Token ID340
Feature activation+1.29e-03

Pos logit contributions
+0.023
HI+0.021
+0.020
 Shepard+0.019
 Honest+0.019
Neg logit contributions
ribute-0.019
hemat-0.019
adem-0.019
theless-0.018
hower-0.018
Token The
Token ID383
Feature activation-6.63e-02

Pos logit contributions
ribute+0.015
adem+0.014
hemat+0.014
theless+0.014
 inheritance+0.014
Neg logit contributions
-0.019
HI-0.017
-0.017
 Shepard-0.016
 Honest-0.015
Token Aub
Token ID19723
Feature activation-1.03e-01

Pos logit contributions
+0.058
HI+0.054
+0.053
 Shepard+0.050
 Honest+0.048
Neg logit contributions
ribute-0.044
hemat-0.043
adem-0.043
theless-0.042
 inheritance-0.042
Tokenrey
Token ID4364
Feature activation-1.37e-01

Pos logit contributions
+0.068
HI+0.061
+0.060
 Shepard+0.055
 Honest+0.052
Neg logit contributions
ribute-0.090
hemat-0.090
adem-0.090
theless-0.089
 inheritance-0.088
Token/
Token ID14
Feature activation-3.29e-02

Pos logit contributions
+0.121
HI+0.111
+0.110
 Shepard+0.102
 Honest+0.100
Neg logit contributions
ribute-0.091
adem-0.090
hemat-0.090
theless-0.089
 inheritance-0.087
TokenM
Token ID44
Feature activation+2.55e-01

Pos logit contributions
+0.037
HI+0.034
+0.034
 Shepard+0.032
 Honest+0.032
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.013
 inheritance-0.013
Tokenatur
Token ID2541
Feature activation-1.62e-01

Pos logit contributions
ribute+0.227
hemat+0.226
adem+0.226
theless+0.223
 inheritance+0.221
Neg logit contributions
-0.165
HI-0.147
-0.144
 Shepard-0.131
 Honest-0.126
Tokenin
Token ID259
Feature activation-3.00e-02

Pos logit contributions
+0.131
HI+0.120
+0.118
 Shepard+0.110
 Honest+0.107
Neg logit contributions
ribute-0.118
hemat-0.117
adem-0.117
theless-0.115
 inheritance-0.114
Token Novel
Token ID24467
Feature activation+6.46e-02

Pos logit contributions
+0.027
HI+0.025
+0.025
 Shepard+0.023
 Honest+0.023
Neg logit contributions
ribute-0.019
hemat-0.019
adem-0.019
theless-0.019
 inheritance-0.018
Tokens
Token ID82
Feature activation-1.07e-02

Pos logit contributions
ribute+0.045
hemat+0.044
adem+0.044
theless+0.044
 inheritance+0.043
Neg logit contributions
-0.055
HI-0.050
-0.049
 Shepard-0.046
 Honest-0.045
Token,
Token ID11
Feature activation-4.24e-03

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.008
 Honest+0.007
Neg logit contributions
ribute-0.007
hemat-0.007
adem-0.007
theless-0.007
 inheritance-0.007
Token as
Token ID355
Feature activation-1.24e-02

Pos logit contributions
+0.004
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token
Token ID447
Feature activation+4.77e-03

Pos logit contributions
+0.001
HI+0.000
+0.000
 Shepard+0.000
 Honest+0.000
Neg logit contributions
ribute-0.000
hemat-0.000
adem-0.000
theless-0.000
 inheritance-0.000
Token
Token ID251
Feature activation+3.09e-02

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.003
 Honest-0.003
Token
Token ID198
Feature activation-3.86e-03

Pos logit contributions
ribute+0.021
hemat+0.021
adem+0.021
theless+0.020
 inheritance+0.020
Neg logit contributions
-0.027
HI-0.025
-0.024
 Shepard-0.023
 Honest-0.022
Token
Token ID198
Feature activation-2.22e-03

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.002
 inheritance-0.002
TokenR
Token ID49
Feature activation+1.60e-01

Pos logit contributions
+0.002
HI+0.002
+0.002
 Shepard+0.002
 Honest+0.001
Neg logit contributions
ribute-0.002
hemat-0.002
adem-0.002
theless-0.002
rue-0.002
Tokenid
Token ID312
Feature activation-1.04e-01

Pos logit contributions
ribute+0.130
adem+0.129
hemat+0.128
rue+0.125
theless+0.125
Neg logit contributions
-0.117
HI-0.105
-0.104
 Shepard-0.097
 Honest-0.094
Tokenley
Token ID1636
Feature activation+2.75e-02

Pos logit contributions
+0.088
HI+0.080
+0.078
 Shepard+0.074
 Honest+0.071
Neg logit contributions
ribute-0.073
hemat-0.073
adem-0.073
theless-0.072
 inheritance-0.071
Token,
Token ID11
Feature activation+3.36e-03

Pos logit contributions
ribute+0.019
adem+0.019
hemat+0.019
theless+0.018
 inheritance+0.018
Neg logit contributions
-0.023
HI-0.022
-0.021
 Shepard-0.020
 Honest-0.019
Token who
Token ID508
Feature activation-1.02e-01

Pos logit contributions
ribute+0.002
adem+0.002
hemat+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token is
Token ID318
Feature activation+9.35e-03

Pos logit contributions
+0.088
HI+0.081
+0.079
 Shepard+0.074
 Honest+0.072
Neg logit contributions
ribute-0.069
adem-0.068
hemat-0.068
theless-0.067
 inheritance-0.066
Token battling
Token ID20581
Feature activation-1.29e-02

Pos logit contributions
ribute+0.006
adem+0.006
hemat+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.008
HI-0.008
-0.007
 Shepard-0.007
 Honest-0.007
Token knew
Token ID2993
Feature activation+1.81e-02

Pos logit contributions
+0.022
HI+0.020
+0.020
 Shepard+0.019
 Honest+0.018
Neg logit contributions
ribute-0.018
hemat-0.018
adem-0.018
theless-0.017
 inheritance-0.017
Token that
Token ID326
Feature activation-1.51e-02

Pos logit contributions
ribute+0.012
hemat+0.012
adem+0.012
theless+0.012
 inheritance+0.012
Neg logit contributions
-0.015
HI-0.014
-0.014
 Shepard-0.013
 Honest-0.013
Token with
Token ID351
Feature activation+6.72e-02

Pos logit contributions
+0.013
HI+0.012
+0.012
 Shepard+0.011
 Honest+0.011
Neg logit contributions
ribute-0.010
hemat-0.010
adem-0.010
theless-0.010
 inheritance-0.010
Token the
Token ID262
Feature activation+1.93e-03

Pos logit contributions
ribute+0.045
adem+0.045
hemat+0.045
theless+0.044
 inheritance+0.044
Neg logit contributions
-0.058
HI-0.053
-0.053
 Shepard-0.049
 Honest-0.048
Token W
Token ID370
Feature activation+1.84e-01

Pos logit contributions
ribute+0.001
adem+0.001
hemat+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.002
HI-0.002
-0.001
 Shepard-0.001
 Honest-0.001
TokenKO
Token ID22328
Feature activation-2.40e-01

Pos logit contributions
ribute+0.125
adem+0.124
hemat+0.124
theless+0.122
 inheritance+0.121
Neg logit contributions
-0.157
HI-0.144
-0.142
 Shepard-0.133
 Honest-0.129
Token,
Token ID11
Feature activation+1.23e-03

Pos logit contributions
+0.204
HI+0.187
+0.184
 Shepard+0.172
 Honest+0.167
Neg logit contributions
ribute-0.165
hemat-0.164
adem-0.163
theless-0.161
 inheritance-0.159
Token I
Token ID314
Feature activation+5.47e-04

Pos logit contributions
ribute+0.001
hemat+0.001
adem+0.001
theless+0.001
 inheritance+0.001
Neg logit contributions
-0.001
HI-0.001
-0.001
 Shepard-0.001
 Honest-0.001
Token needed
Token ID2622
Feature activation-4.06e-02

Pos logit contributions
ribute+0.000
hemat+0.000
adem+0.000
theless+0.000
 inheritance+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
Token to
Token ID284
Feature activation+3.64e-03

Pos logit contributions
+0.034
HI+0.032
+0.031
 Shepard+0.029
 Honest+0.028
Neg logit contributions
ribute-0.028
adem-0.028
hemat-0.028
theless-0.027
 inheritance-0.027
Token build
Token ID1382
Feature activation-2.43e-02

Pos logit contributions
ribute+0.002
adem+0.002
hemat+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
Token identified
Token ID5174
Feature activation-9.93e-02

Pos logit contributions
+0.004
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token in
Token ID287
Feature activation-5.42e-03

Pos logit contributions
+0.087
HI+0.080
+0.079
 Shepard+0.074
 Honest+0.072
Neg logit contributions
ribute-0.066
hemat-0.065
adem-0.065
theless-0.064
 inheritance-0.063
Token time
Token ID640
Feature activation+3.66e-02

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.004
 Honest+0.004
Neg logit contributions
ribute-0.004
hemat-0.004
adem-0.004
theless-0.004
 inheritance-0.004
Token for
Token ID329
Feature activation-4.06e-03

Pos logit contributions
ribute+0.025
hemat+0.025
adem+0.025
theless+0.025
 inheritance+0.025
Neg logit contributions
-0.031
HI-0.028
-0.028
 Shepard-0.026
 Honest-0.025
Token the
Token ID262
Feature activation+3.98e-03

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token season
Token ID1622
Feature activation-1.23e-01

Pos logit contributions
ribute+0.003
hemat+0.003
adem+0.003
theless+0.003
 inheritance+0.003
Neg logit contributions
-0.003
HI-0.003
-0.003
 Shepard-0.003
 Honest-0.003
Token opener
Token ID21996
Feature activation+8.70e-02

Pos logit contributions
+0.104
HI+0.095
+0.093
 Shepard+0.087
 Honest+0.085
Neg logit contributions
ribute-0.086
hemat-0.086
adem-0.086
theless-0.085
hower-0.083
Token,
Token ID11
Feature activation+9.79e-03

Pos logit contributions
ribute+0.056
hemat+0.056
adem+0.056
theless+0.055
 inheritance+0.054
Neg logit contributions
-0.078
HI-0.071
-0.071
 Shepard-0.066
 Honest-0.064
Token he
Token ID339
Feature activation-1.67e-02

Pos logit contributions
ribute+0.006
adem+0.006
hemat+0.006
theless+0.005
 inheritance+0.005
Neg logit contributions
-0.009
HI-0.009
-0.009
 Shepard-0.008
 Honest-0.008
Token replied
Token ID8712
Feature activation+9.11e-03

Pos logit contributions
+0.014
HI+0.013
+0.012
 Shepard+0.012
 Honest+0.011
Neg logit contributions
ribute-0.012
hemat-0.012
adem-0.012
theless-0.012
 inheritance-0.012
Token:
Token ID25
Feature activation+6.41e-03

Pos logit contributions
ribute+0.006
hemat+0.006
adem+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token Do
Token ID2141
Feature activation+1.07e-01

Pos logit contributions
ribute+0.041
adem+0.040
hemat+0.040
theless+0.040
 inheritance+0.039
Neg logit contributions
-0.062
HI-0.057
-0.056
 Shepard-0.053
 Honest-0.051
Token About
Token ID7994
Feature activation+1.53e-01

Pos logit contributions
ribute+0.063
adem+0.063
theless+0.062
hemat+0.062
 inheritance+0.061
Neg logit contributions
-0.102
HI-0.094
-0.092
 Shepard-0.088
 Honest-0.086
Token It
Token ID632
Feature activation+2.50e-02

Pos logit contributions
ribute+0.105
hemat+0.104
adem+0.104
theless+0.102
 inheritance+0.101
Neg logit contributions
-0.131
HI-0.120
-0.118
 Shepard-0.110
 Honest-0.107
Token by
Token ID416
Feature activation-1.29e-01

Pos logit contributions
ribute+0.018
hemat+0.018
adem+0.018
theless+0.018
 inheritance+0.018
Neg logit contributions
-0.020
HI-0.018
-0.018
 Shepard-0.017
 Honest-0.016
Token Marc
Token ID13067
Feature activation+9.16e-03

Pos logit contributions
+0.100
HI+0.091
+0.090
 Shepard+0.083
 Honest+0.080
Neg logit contributions
ribute-0.099
hemat-0.098
adem-0.098
theless-0.097
 inheritance-0.095
Token Goodman
Token ID33318
Feature activation-3.54e-01

Pos logit contributions
ribute+0.007
hemat+0.007
adem+0.007
theless+0.006
rue+0.006
Neg logit contributions
-0.007
HI-0.007
-0.007
 Shepard-0.006
 Honest-0.006
Token,
Token ID11
Feature activation-2.30e-03

Pos logit contributions
+0.298
HI+0.272
+0.268
 Shepard+0.251
 Honest+0.243
Neg logit contributions
ribute-0.248
hemat-0.246
adem-0.246
theless-0.244
 inheritance-0.240
Token available
Token ID1695
Feature activation+9.64e-02

Pos logit contributions
+0.002
HI+0.002
+0.002
 Shepard+0.002
 Honest+0.002
Neg logit contributions
ribute-0.002
hemat-0.002
adem-0.002
theless-0.002
 inheritance-0.002
Token February
Token ID3945
Feature activation-3.65e-02

Pos logit contributions
ribute+0.066
hemat+0.066
adem+0.065
theless+0.065
hower+0.064
Neg logit contributions
-0.082
HI-0.075
-0.074
 Shepard-0.070
 Honest-0.068
Token 24
Token ID1987
Feature activation-1.99e-03

Pos logit contributions
+0.031
HI+0.029
+0.028
 Shepard+0.026
 Honest+0.026
Neg logit contributions
ribute-0.025
hemat-0.025
adem-0.025
theless-0.025
 inheritance-0.024
Token.
Token ID13
Feature activation-6.04e-03

Pos logit contributions
+0.002
HI+0.002
+0.002
 Shepard+0.001
 Honest+0.001
Neg logit contributions
ribute-0.001
adem-0.001
hemat-0.001
theless-0.001
 inheritance-0.001
Token demands
Token ID8665
Feature activation+1.57e-01

Pos logit contributions
ribute+0.027
hemat+0.027
adem+0.027
theless+0.027
 inheritance+0.026
Neg logit contributions
-0.037
HI-0.034
-0.033
 Shepard-0.031
 Honest-0.030
Token,
Token ID11
Feature activation+3.21e-03

Pos logit contributions
ribute+0.110
adem+0.109
hemat+0.109
theless+0.108
 inheritance+0.106
Neg logit contributions
-0.132
HI-0.121
-0.119
 Shepard-0.111
 Honest-0.108
Token such
Token ID884
Feature activation-3.98e-02

Pos logit contributions
ribute+0.002
hemat+0.002
adem+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token as
Token ID355
Feature activation-1.07e-02

Pos logit contributions
+0.033
HI+0.030
+0.030
 Shepard+0.028
 Honest+0.027
Neg logit contributions
ribute-0.028
hemat-0.028
adem-0.028
theless-0.027
 inheritance-0.027
Token bringing
Token ID6079
Feature activation-1.16e-01

Pos logit contributions
+0.009
HI+0.008
+0.008
 Shepard+0.008
 Honest+0.007
Neg logit contributions
ribute-0.007
hemat-0.007
adem-0.007
theless-0.007
 inheritance-0.007
Token integrity
Token ID11540
Feature activation-4.64e-01

Pos logit contributions
+0.092
HI+0.083
+0.082
 Shepard+0.076
 Honest+0.074
Neg logit contributions
ribute-0.086
adem-0.086
hemat-0.086
theless-0.084
 inheritance-0.084
Token back
Token ID736
Feature activation-1.86e-01

Pos logit contributions
+0.377
HI+0.344
+0.338
 Shepard+0.315
 Honest+0.305
Neg logit contributions
ribute-0.337
adem-0.334
hemat-0.334
theless-0.329
 inheritance-0.326
Token to
Token ID284
Feature activation+1.10e-02

Pos logit contributions
+0.151
HI+0.138
+0.136
 Shepard+0.126
 Honest+0.123
Neg logit contributions
ribute-0.134
adem-0.133
hemat-0.133
theless-0.131
 inheritance-0.130
Token government
Token ID1230
Feature activation-4.05e-03

Pos logit contributions
ribute+0.007
adem+0.007
hemat+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.010
HI-0.009
-0.009
 Shepard-0.008
 Honest-0.008
Token science
Token ID3783
Feature activation+9.14e-03

Pos logit contributions
+0.003
HI+0.003
+0.003
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Token,
Token ID11
Feature activation+9.48e-03

Pos logit contributions
ribute+0.006
hemat+0.006
adem+0.006
theless+0.006
 inheritance+0.006
Neg logit contributions
-0.008
HI-0.007
-0.007
 Shepard-0.007
 Honest-0.007
Token),
Token ID828
Feature activation+9.83e-03

Pos logit contributions
ribute+0.070
hemat+0.070
adem+0.070
theless+0.069
 inheritance+0.068
Neg logit contributions
-0.077
HI-0.071
-0.070
 Shepard-0.065
 Honest-0.063
Token who
Token ID508
Feature activation-1.09e-01

Pos logit contributions
ribute+0.007
adem+0.007
hemat+0.007
theless+0.007
 inheritance+0.007
Neg logit contributions
-0.008
HI-0.008
-0.007
 Shepard-0.007
 Honest-0.007
Token was
Token ID373
Feature activation-1.63e-02

Pos logit contributions
+0.096
HI+0.088
+0.087
 Shepard+0.081
 Honest+0.079
Neg logit contributions
ribute-0.072
hemat-0.071
adem-0.071
theless-0.070
 inheritance-0.069
Token identified
Token ID5174
Feature activation-9.97e-02

Pos logit contributions
+0.014
HI+0.013
+0.013
 Shepard+0.012
 Honest+0.012
Neg logit contributions
ribute-0.011
hemat-0.011
adem-0.011
theless-0.010
hower-0.010
Token by
Token ID416
Feature activation-1.29e-01

Pos logit contributions
+0.080
HI+0.073
+0.072
 Shepard+0.067
 Honest+0.065
Neg logit contributions
ribute-0.073
adem-0.073
hemat-0.073
theless-0.072
 inheritance-0.071
Token Honest
Token ID28106
Feature activation-5.63e-01

Pos logit contributions
+0.106
HI+0.097
+0.096
 Shepard+0.088
 Honest+0.086
Neg logit contributions
ribute-0.092
adem-0.091
hemat-0.091
theless-0.089
 inheritance-0.089
Token Reporting
Token ID29595
Feature activation-4.86e-02

Pos logit contributions
+0.560
HI+0.519
+0.512
 Shepard+0.483
 Honest+0.471
Neg logit contributions
ribute-0.306
hemat-0.304
adem-0.304
theless-0.296
 inheritance-0.293
Token as
Token ID355
Feature activation-2.04e-02

Pos logit contributions
+0.041
HI+0.037
+0.037
 Shepard+0.034
 Honest+0.033
Neg logit contributions
ribute-0.034
adem-0.034
hemat-0.034
theless-0.033
 inheritance-0.033
Token the
Token ID262
Feature activation+2.97e-04

Pos logit contributions
+0.017
HI+0.016
+0.015
 Shepard+0.014
 Honest+0.014
Neg logit contributions
ribute-0.014
adem-0.014
hemat-0.014
theless-0.014
 inheritance-0.014
Token information
Token ID1321
Feature activation-2.64e-02

Pos logit contributions
ribute+0.000
adem+0.000
hemat+0.000
theless+0.000
 inheritance+0.000
Neg logit contributions
-0.000
HI-0.000
-0.000
 Shepard-0.000
 Honest-0.000
Token and
Token ID290
Feature activation-1.65e-03

Pos logit contributions
+0.024
HI+0.022
+0.022
 Shepard+0.020
 Honest+0.020
Neg logit contributions
ribute-0.017
adem-0.016
hemat-0.016
theless-0.016
 inheritance-0.016
Token Calgary
Token ID17842
Feature activation+5.79e-02

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.004
 Honest+0.003
Neg logit contributions
ribute-0.004
hemat-0.004
adem-0.004
theless-0.004
 inheritance-0.004
Token (
Token ID357
Feature activation+4.13e-02

Pos logit contributions
ribute+0.040
adem+0.040
hemat+0.040
theless+0.039
hower+0.039
Neg logit contributions
-0.049
HI-0.045
-0.044
 Shepard-0.041
 Honest-0.040
Tokenwhich
Token ID4758
Feature activation-1.29e-02

Pos logit contributions
ribute+0.030
adem+0.030
hemat+0.030
theless+0.030
 inheritance+0.029
Neg logit contributions
-0.034
HI-0.031
-0.030
 Shepard-0.028
 Honest-0.027
Token has
Token ID468
Feature activation+1.11e-02

Pos logit contributions
+0.010
HI+0.009
+0.009
 Shepard+0.008
 Honest+0.008
Neg logit contributions
ribute-0.010
hemat-0.010
adem-0.010
theless-0.010
 inheritance-0.009
Token the
Token ID262
Feature activation-4.70e-03

Pos logit contributions
ribute+0.008
hemat+0.008
adem+0.008
theless+0.008
 inheritance+0.007
Neg logit contributions
-0.009
HI-0.009
-0.008
 Shepard-0.008
 Honest-0.008
Token Hyp
Token ID21209
Feature activation-3.67e-01

Pos logit contributions
+0.004
HI+0.004
+0.004
 Shepard+0.003
 Honest+0.003
Neg logit contributions
ribute-0.003
hemat-0.003
adem-0.003
theless-0.003
 inheritance-0.003
Tokenac
Token ID330
Feature activation+1.14e-01

Pos logit contributions
+0.363
HI+0.336
+0.331
 Shepard+0.313
 Honest+0.306
Neg logit contributions
ribute-0.202
hemat-0.200
adem-0.200
theless-0.197
 inheritance-0.194
Tokenros
Token ID4951
Feature activation-6.84e-02

Pos logit contributions
ribute+0.090
hemat+0.089
adem+0.089
theless+0.089
 inheritance+0.088
Neg logit contributions
-0.085
HI-0.076
-0.074
 Shepard-0.069
 Honest-0.067
Tokenaurus
Token ID22302
Feature activation-3.85e-02

Pos logit contributions
+0.055
HI+0.050
+0.049
 Shepard+0.046
 Honest+0.044
Neg logit contributions
ribute-0.050
hemat-0.050
adem-0.050
theless-0.050
 inheritance-0.049
Token egg
Token ID5935
Feature activation-1.76e-02

Pos logit contributions
+0.036
HI+0.033
+0.033
 Shepard+0.031
 Honest+0.030
Neg logit contributions
ribute-0.024
hemat-0.023
adem-0.023
theless-0.023
 inheritance-0.023
Token)
Token ID8
Feature activation-5.63e-02

Pos logit contributions
+0.015
HI+0.014
+0.014
 Shepard+0.013
 Honest+0.012
Neg logit contributions
ribute-0.012
hemat-0.012
adem-0.012
theless-0.012
 inheritance-0.012
Token 1950
Token ID11445
Feature activation-5.99e-02

Pos logit contributions
ribute+0.002
adem+0.002
hemat+0.002
theless+0.002
 inheritance+0.002
Neg logit contributions
-0.003
HI-0.002
-0.002
 Shepard-0.002
 Honest-0.002
Token NFL
Token ID5134
Feature activation-1.53e-01

Pos logit contributions
+0.048
HI+0.044
+0.043
 Shepard+0.040
 Honest+0.039
Neg logit contributions
ribute-0.044
adem-0.044
hemat-0.043
theless-0.043
 inheritance-0.043
Token Championship
Token ID10749
Feature activation-9.21e-02

Pos logit contributions
+0.120
HI+0.109
+0.107
 Shepard+0.100
 Honest+0.096
Neg logit contributions
ribute-0.115
adem-0.114
hemat-0.114
theless-0.113
 inheritance-0.112
Token Game
Token ID3776
Feature activation-2.22e-01

Pos logit contributions
+0.070
HI+0.063
+0.062
 Shepard+0.057
 Honest+0.055
Neg logit contributions
ribute-0.072
hemat-0.072
adem-0.072
theless-0.071
 inheritance-0.070
Token
Token ID960
Feature activation-2.24e-01

Pos logit contributions
+0.187
HI+0.171
+0.168
 Shepard+0.157
 Honest+0.153
Neg logit contributions
ribute-0.155
hemat-0.154
adem-0.154
theless-0.152
 inheritance-0.150
Tokenarg
Token ID853
Feature activation-4.06e-01

Pos logit contributions
+0.189
HI+0.173
+0.170
 Shepard+0.159
 Honest+0.154
Neg logit contributions
ribute-0.156
hemat-0.155
adem-0.155
theless-0.153
 inheritance-0.151
Tokenuably
Token ID14632
Feature activation-7.31e-02

Pos logit contributions
+0.296
HI+0.267
+0.262
 Shepard+0.242
 Honest+0.233
Neg logit contributions
ribute-0.328
hemat-0.326
adem-0.326
theless-0.322
 inheritance-0.319
Token the
Token ID262
Feature activation+5.34e-03

Pos logit contributions
+0.061
HI+0.056
+0.055
 Shepard+0.051
 Honest+0.049
Neg logit contributions
ribute-0.052
adem-0.051
hemat-0.051
theless-0.051
 inheritance-0.050
Token greatest
Token ID6000
Feature activation-1.04e-01

Pos logit contributions
ribute+0.004
adem+0.004
hemat+0.004
theless+0.004
 inheritance+0.004
Neg logit contributions
-0.004
HI-0.004
-0.004
 Shepard-0.004
 Honest-0.004
Token game
Token ID983
Feature activation-1.59e-01

Pos logit contributions
+0.086
HI+0.079
+0.078
 Shepard+0.073
 Honest+0.071
Neg logit contributions
ribute-0.073
hemat-0.073
adem-0.073
theless-0.072
hower-0.071
Token ever
Token ID1683
Feature activation+2.57e-01

Pos logit contributions
+0.140
HI+0.129
+0.127
 Shepard+0.119
 Honest+0.116
Neg logit contributions
ribute-0.105
hemat-0.104
adem-0.104
theless-0.103
 inheritance-0.101