TOP ACTIVATIONS
MAX = 0.608

Alexand rium and Din oph ysis , which were both subjects
gg u ibi ( B res lin 63 ), Dug gan
ord ent re mem b res , les Offic iers et
2014 . REUTERS / En rique de la Os a
would like a hi - res version of your image please
image = im utils . res ize ( image , height
Se gu in The at res Inc ., working
Rs 31 . 20 cro res as opposed to the Rs
in the centre , out res olving the sensor at all
said Alexand rine Lat end res se , the NDP
s living in Fair ac res , but they can certainly
to the Rs three cro res valuation done by MF SC
bit re des mem b res , en partic ul ier
ouls , ghosts , spect res and other creepy craw lers
, Lat end res se said .
call sign , no add res see , just panic .
Valve , especially if E SEA helps them out , can
logic and my energy c res cend o in the last
ep oxy resin is " res il ient to mechanical stresses
, Australian Ben Mend el so hn ( Blood

MIN ACTIVATIONS
MIN = -0.533

H ints on Et iquette and the Us ages of
H ints on Et iquette and the Us ages of
E t iquette for Gentle men ; or
children will work on " action projects " tied to the
E t iquette for Gentle men , 18
men s Et iquette , 18 77 .
reaction to the ­ re action is a textbook case in
men s Et iquette , 18 77 .
s Book of Et iquette and Manual of Polit eness
ach ieve repetition and survival during the digital encoding process
/ A 3 bolt - action rifle on the rooftop during
Book of Pract ical Et iquette for Ladies and Gentle men
men s Et iquette , 18 77 .
Gl ance at Bad Hab its , 18 44 .
off three existing root k its : J yn x ,
Islands 14 . 6 8 Bots wana 14 . 9 9
Gl ance at Bad Hab its , 18 44 .
A J Po opy Sh its - Add in Audio To
be like j ugg ling the hot potato ,
and harsh . The D ugg ars intoler ant

INTERVAL 0.323 - 0.608
CONTAINS 0.531%

on what they re going to consume .
only in the wider larger image ). In the June 1
is immediately establish Safe Z ones so that families with children
BLACK FO OT T OC RE P ORTS B 6 HAS
right ? Leave Shia alone , guys . You

INTERVAL 0.037 - 0.323
CONTAINS 26.680%

the cost of oil in Haiti after several years of high
Peter Mell gard / The World Post The water hours
re doing now is working out the tactics ,
The expansion would be the largest since vouchers were introduced in
hes said the aston ishing revision also

INTERVAL -0.248 - 0.037
CONTAINS 71.128%

to try this ! reactions generated after playing the
certain Qaeda suspects , but the letter said the C .
. But long before arrival of the Greeks and the Pers
the need to seamlessly transition from the office levels at the
happen , it is necessary the EU to consider carefully the

INTERVAL -0.533 - -0.248
CONTAINS 1.661%

of it . I asked life form 7 for their permission
this perspective with Gary during our interview , he ref ram
While they are both Stanford economists , they are not in
for the rest of their lives and it may even turn
Reagan was actually caught giving weapons to the Iranians , you
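
A quick consistency check on the four INTERVAL blocks above, as a minimal Python sketch. The numbers are copied from this dump; the variable and function names are illustrative only. That the bins are equal-width slices of the range from MIN to MAX is an inference from the listed bounds, not something the dump states, so the check allows for three-decimal rounding.

```python
# Values copied from the dump above; all names here are hypothetical.
MIN_ACT, MAX_ACT = -0.533, 0.608

# (low, high, share in percent) for each INTERVAL block, highest first.
intervals = [
    (0.323, 0.608, 0.531),
    (0.037, 0.323, 26.680),
    (-0.248, 0.037, 71.128),
    (-0.533, -0.248, 1.661),
]


def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


# Edges we would expect if the range were cut into equal-width bins (assumption).
expected = equal_width_edges(MIN_ACT, MAX_ACT, len(intervals))

# Edges as listed in the dump, ascending: lowest bound, then each upper bound.
listed = [intervals[-1][0]] + [high for _, high, _ in reversed(intervals)]

# Listed bounds are rounded to three decimals, so compare with a small tolerance.
edges_match = all(abs(a - b) < 1e-3 for a, b in zip(expected, listed))

# The CONTAINS percentages should cover all sampled activations.
total_share = sum(share for _, _, share in intervals)

print("edges match equal-width bins:", edges_match)
print("total share (%):", round(total_share, 3))
```

If both checks pass, the four blocks can be read as a four-bin histogram over the feature's activation range, with most positions falling in the bin that contains zero.
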
Token Alexand
Token ID21000
Feature activation+5.78e-02

Pos logit contributions
 Cunning+0.022
 harmless+0.018
its+0.017
iquette+0.017
attering+0.017
Neg logit contributions
ESE-0.021
ysis-0.019
res-0.018
SEA-0.018
sta-0.017
Tokenrium
Token ID19172
Feature activation+1.79e-01

Pos logit contributions
ESE+0.030
ysis+0.026
res+0.024
SEA+0.024
sta+0.022
Neg logit contributions
 Cunning-0.065
 harmless-0.056
its-0.054
iquette-0.054
attering-0.053
Token and
Token ID290
Feature activation-1.15e-02

Pos logit contributions
ESE+0.140
ysis+0.129
res+0.124
SEA+0.122
sta+0.116
Neg logit contributions
 Cunning-0.153
 harmless-0.124
its-0.119
iquette-0.118
attering-0.118
Token Din
Token ID23448
Feature activation-1.10e-01

Pos logit contributions
 Cunning+0.009
 harmless+0.007
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.010
ysis-0.009
res-0.008
SEA-0.008
sta-0.008
Tokenoph
Token ID2522
Feature activation-1.64e-01

Pos logit contributions
 Cunning+0.080
 harmless+0.063
its+0.059
iquette+0.059
attering+0.059
Neg logit contributions
ESE-0.101
ysis-0.093
res-0.090
SEA-0.089
thodox-0.086
Tokenysis
Token ID3097
Feature activation+6.08e-01

Pos logit contributions
 Cunning+0.132
 harmless+0.106
its+0.100
iquette+0.100
attering+0.099
Neg logit contributions
ESE-0.137
ysis-0.127
res-0.122
SEA-0.121
sta-0.115
Token,
Token ID11
Feature activation+4.85e-03

Pos logit contributions
ESE+0.488
ysis+0.452
res+0.436
SEA+0.428
 seeded+0.411
Neg logit contributions
 Cunning-0.505
 harmless-0.405
iquette-0.389
its-0.387
attering-0.387
Token which
Token ID543
Feature activation-3.65e-02

Pos logit contributions
ESE+0.004
ysis+0.004
res+0.004
SEA+0.004
sta+0.003
Neg logit contributions
 Cunning-0.004
 harmless-0.003
iquette-0.003
its-0.003
attering-0.003
Token were
Token ID547
Feature activation-3.32e-02

Pos logit contributions
 Cunning+0.031
 harmless+0.025
iquette+0.024
its+0.024
attering+0.024
Neg logit contributions
ESE-0.029
ysis-0.026
res-0.026
SEA-0.025
 seeded-0.024
Token both
Token ID1111
Feature activation-6.28e-02

Pos logit contributions
 Cunning+0.028
 harmless+0.022
iquette+0.021
its+0.021
attering+0.021
Neg logit contributions
ESE-0.027
ysis-0.024
res-0.024
SEA-0.023
 seeded-0.022
Token subjects
Token ID7481
Feature activation-1.39e-01

Pos logit contributions
 Cunning+0.052
 harmless+0.042
its+0.040
iquette+0.040
attering+0.040
Neg logit contributions
ESE-0.050
ysis-0.046
res-0.045
SEA-0.044
 seeded-0.042
Tokengg
Token ID1130
Feature activation-2.30e-01

Pos logit contributions
ESE+0.405
ysis+0.373
res+0.362
SEA+0.355
sta+0.341
Neg logit contributions
 Cunning-0.414
 harmless-0.336
iquette-0.316
its-0.316
attering-0.315
Tokenu
Token ID84
Feature activation-6.98e-02

Pos logit contributions
 Cunning+0.189
 harmless+0.153
iquette+0.143
its+0.143
attering+0.143
Neg logit contributions
ESE-0.188
ysis-0.173
res-0.168
SEA-0.165
sta-0.158
Tokenibi
Token ID27567
Feature activation+1.72e-01

Pos logit contributions
 Cunning+0.060
 harmless+0.049
its+0.047
iquette+0.047
attering+0.047
Neg logit contributions
ESE-0.054
ysis-0.049
res-0.047
SEA-0.047
thodox-0.045
Token (
Token ID357
Feature activation+1.54e-01

Pos logit contributions
ESE+0.132
ysis+0.121
res+0.116
SEA+0.115
sta+0.109
Neg logit contributions
 Cunning-0.149
 harmless-0.122
its-0.116
iquette-0.115
attering-0.115
TokenB
Token ID33
Feature activation-2.10e-01

Pos logit contributions
ESE+0.122
ysis+0.112
res+0.108
SEA+0.107
sta+0.102
Neg logit contributions
 Cunning-0.130
 harmless-0.106
iquette-0.100
its-0.100
attering-0.100
Tokenres
Token ID411
Feature activation+5.89e-01

Pos logit contributions
 Cunning+0.166
 harmless+0.133
its+0.125
iquette+0.124
attering+0.124
Neg logit contributions
ESE-0.179
ysis-0.165
res-0.159
SEA-0.157
sta-0.150
Tokenlin
Token ID2815
Feature activation+1.05e-01

Pos logit contributions
ESE+0.462
ysis+0.425
res+0.410
SEA+0.403
sta+0.386
Neg logit contributions
 Cunning-0.502
 harmless-0.411
its-0.387
iquette-0.386
attering-0.385
Token 63
Token ID8093
Feature activation-1.04e-01

Pos logit contributions
ESE+0.089
ysis+0.082
res+0.080
SEA+0.079
sta+0.075
Neg logit contributions
 Cunning-0.083
 harmless-0.067
its-0.063
iquette-0.063
attering-0.062
Token),
Token ID828
Feature activation-1.32e-01

Pos logit contributions
 Cunning+0.068
 harmless+0.051
its+0.048
iquette+0.047
attering+0.047
Neg logit contributions
ESE-0.103
ysis-0.096
res-0.094
SEA-0.093
sta-0.089
Token Dug
Token ID27436
Feature activation-1.11e-01

Pos logit contributions
 Cunning+0.099
 harmless+0.079
its+0.074
iquette+0.074
attering+0.074
Neg logit contributions
ESE-0.117
ysis-0.108
res-0.105
SEA-0.104
sta-0.099
Tokengan
Token ID1030
Feature activation-1.15e-01

Pos logit contributions
 Cunning+0.079
 harmless+0.062
iquette+0.057
its+0.057
attering+0.057
Neg logit contributions
ESE-0.103
ysis-0.096
res-0.093
SEA-0.091
sta-0.088
Tokenord
Token ID585
Feature activation-5.15e-02

Pos logit contributions
 Cunning+0.370
 harmless+0.314
its+0.304
iquette+0.301
attering+0.301
Neg logit contributions
ESE-0.220
ysis-0.195
SEA-0.184
res-0.183
thodox-0.171
Token ent
Token ID920
Feature activation-3.37e-02

Pos logit contributions
 Cunning+0.047
 harmless+0.039
its+0.037
iquette+0.037
attering+0.037
Neg logit contributions
ESE-0.037
ysis-0.034
res-0.032
SEA-0.032
sta-0.030
Tokenre
Token ID260
Feature activation+4.23e-01

Pos logit contributions
 Cunning+0.042
 harmless+0.036
its+0.035
iquette+0.035
attering+0.035
Neg logit contributions
ESE-0.014
ysis-0.011
res-0.010
SEA-0.010
thodox-0.009
Token mem
Token ID1066
Feature activation+5.34e-02

Pos logit contributions
ESE+0.305
ysis+0.278
res+0.267
SEA+0.263
sta+0.249
Neg logit contributions
 Cunning-0.388
 harmless-0.321
its-0.306
iquette-0.305
attering-0.305
Tokenb
Token ID65
Feature activation-4.31e-02

Pos logit contributions
ESE+0.045
ysis+0.041
res+0.040
SEA+0.039
sta+0.037
Neg logit contributions
 Cunning-0.043
 harmless-0.034
its-0.033
iquette-0.032
attering-0.032
Tokenres
Token ID411
Feature activation+5.86e-01

Pos logit contributions
 Cunning+0.064
 harmless+0.057
its+0.056
action+0.056
iquette+0.056
Neg logit contributions
ESE-0.007
ysis-0.003
SEA-0.002
res-0.001
thodox-0.001
Token,
Token ID11
Feature activation+1.03e-02

Pos logit contributions
ESE+0.445
ysis+0.407
res+0.392
SEA+0.386
sta+0.368
Neg logit contributions
 Cunning-0.514
 harmless-0.422
its-0.401
attering-0.400
iquette-0.400
Token les
Token ID10287
Feature activation+2.00e-01

Pos logit contributions
ESE+0.007
ysis+0.007
res+0.006
SEA+0.006
sta+0.006
Neg logit contributions
 Cunning-0.009
 harmless-0.008
its-0.007
iquette-0.007
attering-0.007
Token Offic
Token ID6328
Feature activation+2.35e-01

Pos logit contributions
ESE+0.157
ysis+0.144
res+0.139
SEA+0.137
sta+0.130
Neg logit contributions
 Cunning-0.170
 harmless-0.139
its-0.132
iquette-0.131
attering-0.131
Tokeniers
Token ID3183
Feature activation-4.12e-03

Pos logit contributions
ESE+0.185
ysis+0.170
res+0.164
SEA+0.161
sta+0.154
Neg logit contributions
 Cunning-0.200
 harmless-0.163
its-0.154
iquette-0.153
attering-0.153
Token et
Token ID2123
Feature activation+1.32e-01

Pos logit contributions
 Cunning+0.004
 harmless+0.003
its+0.003
iquette+0.003
attering+0.003
Neg logit contributions
ESE-0.003
ysis-0.003
res-0.002
SEA-0.002
sta-0.002
Token 2014
Token ID1946
Feature activation-1.57e-01

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.000
 harmless-0.000
iquette-0.000
its-0.000
attering-0.000
Token.
Token ID13
Feature activation-1.09e-02

Pos logit contributions
 Cunning+0.128
 harmless+0.104
iquette+0.098
its+0.098
attering+0.097
Neg logit contributions
ESE-0.129
ysis-0.119
res-0.115
SEA-0.113
sta-0.108
Token REUTERS
Token ID15862
Feature activation-8.92e-02

Pos logit contributions
 Cunning+0.007
 harmless+0.005
its+0.005
 Bots+0.005
 Buckley+0.005
Neg logit contributions
ESE-0.011
ysis-0.010
SEA-0.009
res-0.009
thodox-0.009
Token/
Token ID14
Feature activation+5.94e-02

Pos logit contributions
 Cunning+0.080
 harmless+0.065
its+0.063
iquette+0.062
attering+0.062
Neg logit contributions
ESE-0.067
ysis-0.061
res-0.058
SEA-0.058
thodox-0.055
TokenEn
Token ID4834
Feature activation+1.26e-01

Pos logit contributions
ESE+0.052
ysis+0.048
res+0.046
SEA+0.046
thodox+0.044
Neg logit contributions
 Cunning-0.046
 harmless-0.036
its-0.034
iquette-0.034
attering-0.034
Tokenrique
Token ID33865
Feature activation+5.86e-01

Pos logit contributions
ESE+0.028
ysis+0.018
SEA+0.013
res+0.012
thodox+0.011
Neg logit contributions
 Cunning-0.180
 harmless-0.159
its-0.157
attering-0.155
 Buckley-0.155
Token de
Token ID390
Feature activation+2.43e-01

Pos logit contributions
ESE+0.457
ysis+0.423
res+0.411
SEA+0.401
sta+0.384
Neg logit contributions
 Cunning-0.489
 harmless-0.401
attering-0.386
its-0.386
iquette-0.385
Token la
Token ID8591
Feature activation+3.42e-02

Pos logit contributions
ESE+0.212
ysis+0.195
res+0.189
SEA+0.187
sta+0.180
Neg logit contributions
 Cunning-0.184
 harmless-0.148
iquette-0.139
its-0.139
attering-0.138
Token Os
Token ID8834
Feature activation+2.08e-01

Pos logit contributions
ESE+0.027
ysis+0.025
SEA+0.024
res+0.024
thodox+0.022
Neg logit contributions
 Cunning-0.029
 harmless-0.024
its-0.023
iquette-0.022
 Buckley-0.022
Tokena
Token ID64
Feature activation-2.11e-01

Pos logit contributions
ESE+0.185
ysis+0.171
res+0.164
SEA+0.164
thodox+0.157
Neg logit contributions
 Cunning-0.157
 harmless-0.124
its-0.117
iquette-0.116
attering-0.115
Token
Token ID198
Feature activation-1.01e-02

Pos logit contributions
 Cunning+0.170
 harmless+0.136
its+0.130
 Buckley+0.127
 Bots+0.127
Neg logit contributions
ESE-0.178
ysis-0.164
SEA-0.156
res-0.156
thodox-0.150
Token would
Token ID561
Feature activation-1.03e-02

Pos logit contributions
ESE+0.032
ysis+0.029
SEA+0.028
res+0.028
thodox+0.027
Neg logit contributions
 Cunning-0.028
 harmless-0.022
its-0.021
iquette-0.021
attering-0.021
Token like
Token ID588
Feature activation-1.48e-01

Pos logit contributions
 Cunning+0.007
 harmless+0.005
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.010
ysis-0.010
res-0.009
SEA-0.009
thodox-0.009
Token a
Token ID257
Feature activation+3.09e-03

Pos logit contributions
 Cunning+0.117
 harmless+0.094
its+0.089
iquette+0.088
attering+0.088
Neg logit contributions
ESE-0.126
ysis-0.116
res-0.111
SEA-0.111
thodox-0.106
Token hi
Token ID23105
Feature activation+2.15e-01

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.003
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token-
Token ID12
Feature activation-2.93e-02

Pos logit contributions
ESE+0.187
ysis+0.171
SEA+0.165
res+0.163
thodox+0.157
Neg logit contributions
 Cunning-0.166
 harmless-0.132
its-0.126
iquette-0.124
attering-0.124
Tokenres
Token ID411
Feature activation+5.85e-01

Pos logit contributions
 Cunning+0.044
 harmless+0.039
its+0.038
iquette+0.038
attering+0.038
Neg logit contributions
ESE-0.004
ysis-0.002
SEA-0.001
res-0.001
thodox-0.000
Token version
Token ID2196
Feature activation+1.43e-01

Pos logit contributions
ESE+0.423
ysis+0.384
res+0.367
SEA+0.364
thodox+0.344
Neg logit contributions
 Cunning-0.535
 harmless-0.444
its-0.423
iquette-0.421
attering-0.420
Token of
Token ID286
Feature activation+7.38e-03

Pos logit contributions
ESE+0.120
ysis+0.110
res+0.106
SEA+0.105
thodox+0.100
Neg logit contributions
 Cunning-0.114
 harmless-0.092
its-0.087
iquette-0.086
attering-0.086
Token your
Token ID534
Feature activation-6.31e-02

Pos logit contributions
ESE+0.007
ysis+0.006
res+0.006
SEA+0.006
sta+0.006
Neg logit contributions
 Cunning-0.005
 harmless-0.004
its-0.004
iquette-0.004
attering-0.004
Token image
Token ID2939
Feature activation+3.34e-01

Pos logit contributions
 Cunning+0.056
 harmless+0.046
its+0.044
iquette+0.044
attering+0.044
Neg logit contributions
ESE-0.047
ysis-0.043
res-0.041
SEA-0.041
thodox-0.039
Token please
Token ID3387
Feature activation+3.19e-02

Pos logit contributions
ESE+0.266
ysis+0.243
res+0.233
SEA+0.232
thodox+0.220
Neg logit contributions
 Cunning-0.282
 harmless-0.230
its-0.218
iquette-0.217
attering-0.216
Token image
Token ID2939
Feature activation+3.29e-01

Pos logit contributions
 Cunning+0.081
 harmless+0.066
its+0.063
iquette+0.063
attering+0.063
Neg logit contributions
ESE-0.070
ysis-0.063
res-0.061
SEA-0.060
thodox-0.057
Token =
Token ID796
Feature activation-1.10e-01

Pos logit contributions
ESE+0.271
ysis+0.250
res+0.241
SEA+0.239
sta+0.227
Neg logit contributions
 Cunning-0.268
 harmless-0.216
its-0.204
iquette-0.204
attering-0.203
Token im
Token ID545
Feature activation+4.34e-01

Pos logit contributions
 Cunning+0.099
 harmless+0.082
its+0.078
iquette+0.077
attering+0.077
Neg logit contributions
ESE-0.082
ysis-0.074
res-0.071
SEA-0.070
thodox-0.067
Tokenutils
Token ID26791
Feature activation-1.14e-02

Pos logit contributions
ESE+0.355
ysis+0.329
res+0.319
SEA+0.312
sta+0.298
Neg logit contributions
 Cunning-0.359
 harmless-0.290
iquette-0.272
 Buckley-0.271
its-0.270
Token.
Token ID13
Feature activation+6.73e-04

Pos logit contributions
 Cunning+0.009
 harmless+0.008
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.009
ysis-0.009
res-0.008
SEA-0.008
sta-0.008
Tokenres
Token ID411
Feature activation+5.82e-01

Pos logit contributions
ESE+0.000
ysis+0.000
res+0.000
SEA+0.000
thodox+0.000
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
attering-0.000
iquette-0.000
Tokenize
Token ID1096
Feature activation+1.45e-01

Pos logit contributions
ESE+0.375
ysis+0.337
res+0.320
SEA+0.316
sta+0.296
Neg logit contributions
 Cunning-0.579
 harmless-0.487
its-0.466
iquette-0.464
attering-0.463
Token(
Token ID7
Feature activation+2.09e-01

Pos logit contributions
ESE+0.096
ysis+0.087
res+0.082
SEA+0.082
thodox+0.077
Neg logit contributions
 Cunning-0.142
 harmless-0.119
its-0.114
iquette-0.113
attering-0.113
Tokenimage
Token ID9060
Feature activation+2.99e-01

Pos logit contributions
ESE+0.114
ysis+0.100
res+0.094
SEA+0.093
sta+0.086
Neg logit contributions
 Cunning-0.229
 harmless-0.196
its-0.189
iquette-0.188
attering-0.188
Token,
Token ID11
Feature activation+5.73e-03

Pos logit contributions
ESE+0.219
ysis+0.200
res+0.192
SEA+0.190
sta+0.178
Neg logit contributions
 Cunning-0.272
 harmless-0.224
its-0.213
iquette-0.212
attering-0.212
Token height
Token ID6001
Feature activation+1.85e-01

Pos logit contributions
ESE+0.005
ysis+0.004
res+0.004
SEA+0.004
sta+0.004
Neg logit contributions
 Cunning-0.005
 harmless-0.004
iquette-0.004
its-0.004
attering-0.004
Token Se
Token ID1001
Feature activation+2.26e-01

Pos logit contributions
ESE+0.008
ysis+0.008
res+0.007
SEA+0.007
sta+0.007
Neg logit contributions
 Cunning-0.008
 harmless-0.007
its-0.006
iquette-0.006
attering-0.006
Tokengu
Token ID5162
Feature activation-1.45e-01

Pos logit contributions
ESE+0.219
ysis+0.205
res+0.200
SEA+0.197
sta+0.190
Neg logit contributions
 Cunning-0.151
 harmless-0.115
its-0.106
iquette-0.105
 Buckley-0.105
Tokenin
Token ID259
Feature activation-2.11e-01

Pos logit contributions
 Cunning+0.089
 harmless+0.065
iquette+0.059
 Bots+0.059
 Buckley+0.059
Neg logit contributions
ESE-0.150
ysis-0.141
res-0.139
SEA-0.134
sta-0.132
Token The
Token ID383
Feature activation-1.47e-02

Pos logit contributions
 Cunning+0.176
 harmless+0.143
its+0.136
iquette+0.135
attering+0.134
Neg logit contributions
ESE-0.171
ysis-0.156
res-0.150
SEA-0.149
sta-0.142
Tokenat
Token ID265
Feature activation-1.83e-01

Pos logit contributions
 Cunning+0.010
 harmless+0.008
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.014
ysis-0.013
res-0.012
SEA-0.012
sta-0.012
Tokenres
Token ID411
Feature activation+5.81e-01

Pos logit contributions
 Cunning+0.271
 harmless+0.242
its+0.237
iquette+0.236
attering+0.235
Neg logit contributions
ESE-0.028
ysis-0.015
SEA-0.010
res-0.007
thodox-0.003
Token Inc
Token ID3457
Feature activation-3.50e-02

Pos logit contributions
ESE+0.474
ysis+0.436
res+0.421
SEA+0.416
sta+0.396
Neg logit contributions
 Cunning-0.478
 harmless-0.387
iquette-0.366
its-0.365
attering-0.365
Token.,
Token ID1539
Feature activation-1.07e-01

Pos logit contributions
 Cunning+0.027
 harmless+0.021
its+0.020
iquette+0.020
attering+0.020
Neg logit contributions
ESE-0.030
ysis-0.028
res-0.027
SEA-0.027
sta-0.026
Token
Token ID564
Feature activation+3.86e-02

Pos logit contributions
 Cunning+0.089
 harmless+0.072
its+0.068
iquette+0.068
attering+0.068
Neg logit contributions
ESE-0.086
ysis-0.079
res-0.076
SEA-0.075
sta-0.072
Token
Token ID250
Feature activation+1.84e-04

Pos logit contributions
ESE+0.032
ysis+0.029
res+0.028
SEA+0.028
sta+0.027
Neg logit contributions
 Cunning-0.031
 harmless-0.025
its-0.024
iquette-0.024
attering-0.024
Tokenworking
Token ID16090
Feature activation+8.27e-02

Pos logit contributions
ESE+0.000
ysis+0.000
res+0.000
SEA+0.000
sta+0.000
Neg logit contributions
 Cunning-0.000
 harmless-0.000
iquette-0.000
its-0.000
attering-0.000
Token Rs
Token ID12820
Feature activation+8.66e-02

Pos logit contributions
ESE+0.027
ysis+0.025
res+0.024
SEA+0.024
sta+0.023
Neg logit contributions
 Cunning-0.031
 harmless-0.025
its-0.024
iquette-0.024
attering-0.024
Token 31
Token ID3261
Feature activation-1.27e-01

Pos logit contributions
ESE+0.073
ysis+0.067
res+0.065
SEA+0.064
sta+0.061
Neg logit contributions
 Cunning-0.069
 harmless-0.055
iquette-0.052
attering-0.052
its-0.052
Token.
Token ID13
Feature activation-3.53e-03

Pos logit contributions
 Cunning+0.101
 harmless+0.081
its+0.077
iquette+0.077
attering+0.076
Neg logit contributions
ESE-0.107
ysis-0.099
res-0.095
SEA-0.094
sta-0.090
Token20
Token ID1238
Feature activation-1.42e-01

Pos logit contributions
 Cunning+0.003
 harmless+0.002
its+0.002
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.003
ysis-0.003
res-0.002
SEA-0.002
sta-0.002
Token cro
Token ID6763
Feature activation-3.61e-02

Pos logit contributions
 Cunning+0.101
 harmless+0.079
iquette+0.074
its+0.074
attering+0.073
Neg logit contributions
ESE-0.131
ysis-0.122
res-0.119
SEA-0.117
sta-0.113
Tokenres
Token ID411
Feature activation+5.79e-01

Pos logit contributions
 Cunning+0.055
 harmless+0.049
its+0.048
iquette+0.048
attering+0.048
Neg logit contributions
ESE-0.004
ysis-0.002
SEA-0.001
res-0.000
thodox+0.001
Token as
Token ID355
Feature activation-2.76e-02

Pos logit contributions
ESE+0.458
ysis+0.420
res+0.404
SEA+0.399
sta+0.379
Neg logit contributions
 Cunning-0.492
 harmless-0.400
its-0.380
iquette-0.379
attering-0.377
Token opposed
Token ID6886
Feature activation+1.01e-01

Pos logit contributions
 Cunning+0.022
 harmless+0.018
iquette+0.017
its+0.017
attering+0.017
Neg logit contributions
ESE-0.023
ysis-0.021
res-0.020
SEA-0.020
sta-0.019
Token to
Token ID284
Feature activation-1.35e-02

Pos logit contributions
ESE+0.081
ysis+0.075
res+0.073
SEA+0.072
thodox+0.069
Neg logit contributions
 Cunning-0.083
 harmless-0.067
iquette-0.064
its-0.063
attering-0.063
Token the
Token ID262
Feature activation+1.39e-02

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.010
ysis-0.009
res-0.009
SEA-0.009
sta-0.008
Token Rs
Token ID12820
Feature activation+8.77e-02

Pos logit contributions
ESE+0.011
ysis+0.010
res+0.010
SEA+0.009
sta+0.009
Neg logit contributions
 Cunning-0.012
 harmless-0.010
iquette-0.009
its-0.009
attering-0.009
Token in
Token ID287
Feature activation-1.68e-02

Pos logit contributions
ESE+0.034
ysis+0.031
res+0.030
SEA+0.030
thodox+0.028
Neg logit contributions
 Cunning-0.035
 harmless-0.028
its-0.027
iquette-0.027
attering-0.026
Token the
Token ID262
Feature activation+2.45e-02

Pos logit contributions
 Cunning+0.014
 harmless+0.011
iquette+0.010
its+0.010
attering+0.010
Neg logit contributions
ESE-0.014
ysis-0.013
res-0.012
SEA-0.012
thodox-0.012
Token centre
Token ID7372
Feature activation-1.50e-03

Pos logit contributions
ESE+0.020
ysis+0.018
res+0.018
SEA+0.018
thodox+0.017
Neg logit contributions
 Cunning-0.020
 harmless-0.016
its-0.015
iquette-0.015
attering-0.015
Token,
Token ID11
Feature activation+3.11e-03

Pos logit contributions
 Cunning+0.001
 harmless+0.001
its+0.001
iquette+0.001
attering+0.001
Neg logit contributions
ESE-0.001
ysis-0.001
res-0.001
SEA-0.001
thodox-0.001
Token out
Token ID503
Feature activation-1.40e-01

Pos logit contributions
ESE+0.003
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.003
 harmless-0.002
iquette-0.002
its-0.002
attering-0.002
Tokenres
Token ID411
Feature activation+5.78e-01

Pos logit contributions
 Cunning+0.115
 harmless+0.093
its+0.088
iquette+0.088
attering+0.087
Neg logit contributions
ESE-0.114
ysis-0.105
res-0.101
SEA-0.100
sta-0.095
Tokenolving
Token ID10890
Feature activation-2.79e-01

Pos logit contributions
ESE+0.533
ysis+0.494
res+0.482
SEA+0.476
sta+0.455
Neg logit contributions
 Cunning-0.414
 harmless-0.324
iquette-0.302
its-0.299
 Buckley-0.299
Token the
Token ID262
Feature activation+1.70e-02

Pos logit contributions
 Cunning+0.233
 harmless+0.189
its+0.179
iquette+0.178
attering+0.178
Neg logit contributions
ESE-0.225
ysis-0.207
res-0.199
SEA-0.197
sta-0.187
Token sensor
Token ID12694
Feature activation+2.01e-02

Pos logit contributions
ESE+0.013
ysis+0.012
res+0.012
SEA+0.011
sta+0.011
Neg logit contributions
 Cunning-0.015
 harmless-0.012
its-0.011
iquette-0.011
attering-0.011
Token at
Token ID379
Feature activation+3.45e-02

Pos logit contributions
ESE+0.016
ysis+0.015
res+0.014
SEA+0.014
thodox+0.013
Neg logit contributions
 Cunning-0.017
 harmless-0.014
its-0.013
attering-0.013
iquette-0.013
Token all
Token ID477
Feature activation-4.11e-02

Pos logit contributions
ESE+0.029
ysis+0.026
res+0.025
SEA+0.025
thodox+0.024
Neg logit contributions
 Cunning-0.028
 harmless-0.023
its-0.021
iquette-0.021
attering-0.021
Token said
Token ID531
Feature activation-1.75e-02

Pos logit contributions
 Cunning+0.019
 harmless+0.016
its+0.015
iquette+0.015
attering+0.015
Neg logit contributions
ESE-0.020
ysis-0.018
res-0.018
SEA-0.017
sta-0.017
Token Alexand
Token ID21000
Feature activation+5.62e-02

Pos logit contributions
 Cunning+0.014
 harmless+0.011
its+0.011
iquette+0.010
attering+0.010
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.013
SEA-0.013
sta-0.012
Tokenrine
Token ID7640
Feature activation+2.84e-01

Pos logit contributions
ESE+0.017
ysis+0.014
res+0.012
SEA+0.012
sta+0.010
Neg logit contributions
 Cunning-0.075
 harmless-0.066
its-0.064
iquette-0.064
attering-0.064
Token Lat
Token ID5476
Feature activation+1.11e-01

Pos logit contributions
ESE+0.249
ysis+0.230
res+0.222
SEA+0.220
sta+0.211
Neg logit contributions
 Cunning-0.217
 harmless-0.173
its-0.162
iquette-0.162
attering-0.161
Tokenend
Token ID437
Feature activation-1.18e-01

Pos logit contributions
ESE+0.109
ysis+0.101
res+0.098
SEA+0.098
thodox+0.094
Neg logit contributions
 Cunning-0.074
 harmless-0.056
its-0.053
iquette-0.052
attering-0.052
Tokenres
Token ID411
Feature activation+5.78e-01

Pos logit contributions
 Cunning+0.113
 harmless+0.094
its+0.091
iquette+0.090
attering+0.089
Neg logit contributions
ESE-0.082
ysis-0.073
SEA-0.070
res-0.068
thodox-0.065
Tokense
Token ID325
Feature activation+5.21e-01

Pos logit contributions
ESE+0.347
ysis+0.309
res+0.292
SEA+0.289
thodox+0.269
Neg logit contributions
 Cunning-0.601
 harmless-0.510
its-0.490
iquette-0.488
attering-0.486
Token,
Token ID11
Feature activation-2.22e-04

Pos logit contributions
ESE+0.427
ysis+0.393
res+0.379
SEA+0.375
sta+0.357
Neg logit contributions
 Cunning-0.426
 harmless-0.345
its-0.326
iquette-0.325
attering-0.323
Token the
Token ID262
Feature activation+1.71e-02

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
sta-0.000
Token NDP
Token ID20392
Feature activation+9.40e-02

Pos logit contributions
ESE+0.014
ysis+0.013
res+0.013
SEA+0.012
sta+0.012
Neg logit contributions
 Cunning-0.014
 harmless-0.011
its-0.010
iquette-0.010
attering-0.010
Token
Token ID447
Feature activation-1.91e-03

Pos logit contributions
ESE+0.071
ysis+0.064
res+0.062
SEA+0.061
sta+0.058
Neg logit contributions
 Cunning-0.083
 harmless-0.068
its-0.065
iquette-0.065
attering-0.065
Tokens
Token ID82
Feature activation+1.65e-03

Pos logit contributions
ESE+0.012
ysis+0.011
res+0.011
SEA+0.010
sta+0.010
Neg logit contributions
 Cunning-0.012
 harmless-0.010
its-0.009
iquette-0.009
attering-0.009
Token living
Token ID2877
Feature activation-4.25e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
thodox+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token in
Token ID287
Feature activation-8.26e-03

Pos logit contributions
 Cunning+0.034
 harmless+0.027
its+0.026
iquette+0.026
attering+0.026
Neg logit contributions
ESE-0.035
ysis-0.033
res-0.032
SEA-0.031
sta-0.030
Token Fair
Token ID7011
Feature activation-1.99e-01

Pos logit contributions
 Cunning+0.007
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.006
ysis-0.006
res-0.006
SEA-0.005
sta-0.005
Tokenac
Token ID330
Feature activation-9.95e-02

Pos logit contributions
 Cunning+0.170
 harmless+0.138
its+0.132
iquette+0.130
 Buckley+0.130
Neg logit contributions
ESE-0.158
ysis-0.144
res-0.138
SEA-0.137
thodox-0.130
Tokenres
Token ID411
Feature activation+5.77e-01

Pos logit contributions
 Cunning+0.145
 harmless+0.128
its+0.127
action+0.126
iquette+0.125
Neg logit contributions
ESE-0.019
ysis-0.011
SEA-0.008
res-0.006
thodox-0.005
Token,
Token ID11
Feature activation+3.77e-03

Pos logit contributions
ESE+0.460
ysis+0.421
res+0.405
SEA+0.402
sta+0.382
Neg logit contributions
 Cunning-0.486
 harmless-0.396
its-0.375
iquette-0.374
attering-0.372
Token but
Token ID475
Feature activation-2.36e-02

Pos logit contributions
ESE+0.003
ysis+0.003
res+0.003
SEA+0.003
sta+0.003
Neg logit contributions
 Cunning-0.003
 harmless-0.003
its-0.002
iquette-0.002
attering-0.002
Token they
Token ID484
Feature activation+1.53e-01

Pos logit contributions
 Cunning+0.019
 harmless+0.015
iquette+0.014
its+0.014
attering+0.014
Neg logit contributions
ESE-0.020
ysis-0.018
res-0.017
SEA-0.017
thodox-0.016
Token can
Token ID460
Feature activation-1.22e-01

Pos logit contributions
ESE+0.119
ysis+0.109
res+0.104
SEA+0.103
sta+0.098
Neg logit contributions
 Cunning-0.133
 harmless-0.109
its-0.103
iquette-0.103
attering-0.102
Token certainly
Token ID3729
Feature activation-9.10e-02

Pos logit contributions
 Cunning+0.110
 harmless+0.091
its+0.086
iquette+0.086
attering+0.086
Neg logit contributions
ESE-0.090
ysis-0.082
res-0.078
SEA-0.077
sta-0.073
Token to
Token ID284
Feature activation-1.35e-02

Pos logit contributions
ESE+0.081
ysis+0.075
res+0.073
SEA+0.072
thodox+0.069
Neg logit contributions
 Cunning-0.083
 harmless-0.067
iquette-0.064
its-0.063
attering-0.063
Token the
Token ID262
Feature activation+1.39e-02

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.010
ysis-0.009
res-0.009
SEA-0.009
sta-0.008
Token Rs
Token ID12820
Feature activation+8.77e-02

Pos logit contributions
ESE+0.011
ysis+0.010
res+0.010
SEA+0.009
sta+0.009
Neg logit contributions
 Cunning-0.012
 harmless-0.010
iquette-0.009
its-0.009
attering-0.009
Token three
Token ID1115
Feature activation-6.13e-02

Pos logit contributions
ESE+0.071
ysis+0.065
res+0.063
SEA+0.062
sta+0.059
Neg logit contributions
 Cunning-0.073
 harmless-0.059
iquette-0.056
its-0.056
attering-0.056
Token cro
Token ID6763
Feature activation-3.35e-02

Pos logit contributions
 Cunning+0.047
 harmless+0.037
its+0.035
iquette+0.035
attering+0.035
Neg logit contributions
ESE-0.053
ysis-0.049
res-0.048
SEA-0.047
sta-0.045
Tokenres
Token ID411
Feature activation+5.76e-01

Pos logit contributions
 Cunning+0.051
 harmless+0.046
its+0.045
attering+0.044
iquette+0.044
Neg logit contributions
ESE-0.004
ysis-0.002
SEA-0.001
res-0.000
thodox+0.001
Token valuation
Token ID29115
Feature activation-1.72e-01

Pos logit contributions
ESE+0.471
ysis+0.434
res+0.421
SEA+0.414
sta+0.395
Neg logit contributions
 Cunning-0.470
 harmless-0.379
iquette-0.361
its-0.359
attering-0.358
Token done
Token ID1760
Feature activation+2.15e-01

Pos logit contributions
 Cunning+0.142
 harmless+0.115
its+0.109
iquette+0.109
attering+0.109
Neg logit contributions
ESE-0.139
ysis-0.128
res-0.123
SEA-0.122
sta-0.116
Token by
Token ID416
Feature activation+2.70e-01

Pos logit contributions
ESE+0.125
ysis+0.111
res+0.105
SEA+0.103
sta+0.096
Neg logit contributions
 Cunning-0.227
 harmless-0.193
its-0.186
iquette-0.185
attering-0.185
Token MF
Token ID32850
Feature activation+4.31e-02

Pos logit contributions
ESE+0.220
ysis+0.202
res+0.195
SEA+0.193
sta+0.184
Neg logit contributions
 Cunning-0.222
 harmless-0.180
its-0.171
iquette-0.170
attering-0.170
TokenSC
Token ID6173
Feature activation+1.80e-01

Pos logit contributions
ESE+0.026
ysis+0.023
res+0.022
SEA+0.021
sta+0.020
Neg logit contributions
 Cunning-0.045
 harmless-0.038
its-0.037
iquette-0.037
attering-0.036
Tokenbit
Token ID2545
Feature activation+1.56e-01

Pos logit contributions
 Cunning+0.207
 harmless+0.170
its+0.164
iquette+0.162
attering+0.162
Neg logit contributions
ESE-0.173
ysis-0.157
SEA-0.150
res-0.149
thodox-0.141
Tokenre
Token ID260
Feature activation+4.11e-01

Pos logit contributions
ESE+0.106
ysis+0.096
res+0.091
SEA+0.090
sta+0.085
Neg logit contributions
 Cunning-0.149
 harmless-0.125
its-0.119
iquette-0.119
attering-0.118
Token des
Token ID748
Feature activation+3.57e-01

Pos logit contributions
ESE+0.297
ysis+0.270
res+0.260
SEA+0.256
sta+0.242
Neg logit contributions
 Cunning-0.376
 harmless-0.311
its-0.296
attering-0.296
iquette-0.295
Token mem
Token ID1066
Feature activation+4.43e-02

Pos logit contributions
ESE+0.286
ysis+0.263
res+0.255
SEA+0.251
sta+0.239
Neg logit contributions
 Cunning-0.297
 harmless-0.241
its-0.228
iquette-0.228
attering-0.227
Tokenb
Token ID65
Feature activation-5.14e-02

Pos logit contributions
ESE+0.037
ysis+0.034
res+0.033
SEA+0.032
sta+0.031
Neg logit contributions
 Cunning-0.036
 harmless-0.029
its-0.027
iquette-0.027
attering-0.027
Tokenres
Token ID411
Feature activation+5.74e-01

Pos logit contributions
 Cunning+0.078
 harmless+0.070
its+0.068
iquette+0.068
attering+0.068
Neg logit contributions
ESE-0.007
ysis-0.003
SEA-0.002
res-0.001
thodox+0.000
Token,
Token ID11
Feature activation-4.82e-04

Pos logit contributions
ESE+0.424
ysis+0.386
res+0.371
SEA+0.365
sta+0.346
Neg logit contributions
 Cunning-0.516
 harmless-0.426
its-0.405
attering-0.404
iquette-0.404
Token en
Token ID551
Feature activation+2.17e-01

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
sta-0.000
Token partic
Token ID1344
Feature activation+2.54e-01

Pos logit contributions
ESE+0.166
ysis+0.151
res+0.145
SEA+0.144
sta+0.136
Neg logit contributions
 Cunning-0.190
 harmless-0.156
its-0.148
iquette-0.147
attering-0.147
Tokenul
Token ID377
Feature activation-1.91e-02

Pos logit contributions
ESE+0.212
ysis+0.197
res+0.191
SEA+0.187
sta+0.179
Neg logit contributions
 Cunning-0.205
 harmless-0.164
its-0.154
 Buckley-0.153
attering-0.153
Tokenier
Token ID959
Feature activation-1.51e-01

Pos logit contributions
 Cunning+0.013
 harmless+0.010
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.018
ysis-0.017
res-0.016
SEA-0.016
sta-0.016
Tokenouls
Token ID42033
Feature activation-2.16e-01

Pos logit contributions
 Cunning+0.092
 harmless+0.066
iquette+0.059
its+0.059
 Buckley+0.058
Neg logit contributions
ESE-0.184
ysis-0.172
res-0.169
SEA-0.166
sta-0.161
Token,
Token ID11
Feature activation+2.79e-03

Pos logit contributions
 Cunning+0.178
 harmless+0.144
its+0.136
iquette+0.136
attering+0.135
Neg logit contributions
ESE-0.177
ysis-0.162
res-0.156
SEA-0.155
sta-0.148
Token ghosts
Token ID25899
Feature activation-5.11e-02

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.002
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token,
Token ID11
Feature activation-3.21e-04

Pos logit contributions
 Cunning+0.042
 harmless+0.034
its+0.032
iquette+0.032
attering+0.031
Neg logit contributions
ESE-0.042
ysis-0.039
res-0.037
SEA-0.037
thodox-0.035
Token spect
Token ID5444
Feature activation+2.62e-02

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
sta-0.000
Tokenres
Token ID411
Feature activation+5.74e-01

Pos logit contributions
ESE+0.007
ysis+0.005
res+0.005
SEA+0.004
thodox+0.003
Neg logit contributions
 Cunning-0.036
 harmless-0.032
iquette-0.031
its-0.031
attering-0.031
Token and
Token ID290
Feature activation-8.85e-03

Pos logit contributions
ESE+0.468
ysis+0.430
res+0.414
SEA+0.410
sta+0.391
Neg logit contributions
 Cunning-0.472
 harmless-0.381
its-0.361
iquette-0.360
attering-0.359
Token other
Token ID584
Feature activation-2.19e-03

Pos logit contributions
 Cunning+0.007
 harmless+0.006
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
sta-0.006
Token creepy
Token ID23387
Feature activation-1.14e-01

Pos logit contributions
 Cunning+0.002
 harmless+0.001
iquette+0.001
its+0.001
attering+0.001
Neg logit contributions
ESE-0.002
ysis-0.002
res-0.002
SEA-0.002
sta-0.002
Token craw
Token ID27784
Feature activation+1.06e-03

Pos logit contributions
 Cunning+0.077
 harmless+0.059
its+0.055
iquette+0.055
attering+0.055
Neg logit contributions
ESE-0.109
ysis-0.101
res-0.098
SEA-0.097
sta-0.094
Tokenlers
Token ID8116
Feature activation-1.22e-01

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
iquette-0.001
its-0.001
attering-0.001
Token,
Token ID11
Feature activation+1.12e-03

Pos logit contributions
 Cunning+0.137
 harmless+0.111
its+0.105
iquette+0.104
attering+0.104
Neg logit contributions
ESE-0.138
ysis-0.127
res-0.122
SEA-0.121
thodox-0.116
Token
Token ID447
Feature activation-1.13e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token
Token ID251
Feature activation-2.33e-02

Pos logit contributions
 Cunning+0.009
 harmless+0.007
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.008
SEA-0.008
sta-0.008
Token Lat
Token ID5476
Feature activation+1.11e-01

Pos logit contributions
 Cunning+0.021
 harmless+0.017
its+0.016
iquette+0.016
attering+0.016
Neg logit contributions
ESE-0.018
ysis-0.016
res-0.016
SEA-0.015
sta-0.015
Tokenend
Token ID437
Feature activation-1.21e-01

Pos logit contributions
ESE+0.109
ysis+0.102
SEA+0.098
res+0.098
thodox+0.094
Neg logit contributions
 Cunning-0.072
 harmless-0.055
its-0.052
attering-0.051
iquette-0.051
Tokenres
Token ID411
Feature activation+5.74e-01

Pos logit contributions
 Cunning+0.182
 harmless+0.163
its+0.161
attering+0.159
action+0.159
Neg logit contributions
ESE-0.016
ysis-0.007
SEA-0.003
res-0.000
thodox+0.000
Tokense
Token ID325
Feature activation+5.25e-01

Pos logit contributions
ESE+0.117
ysis+0.079
SEA+0.057
res+0.053
thodox+0.040
Neg logit contributions
 Cunning-0.819
 harmless-0.727
its-0.719
action-0.712
attering-0.709
Token said
Token ID531
Feature activation-1.57e-02

Pos logit contributions
ESE+0.421
ysis+0.386
res+0.372
SEA+0.368
sta+0.350
Neg logit contributions
 Cunning-0.440
 harmless-0.357
its-0.338
iquette-0.337
attering-0.335
Token.
Token ID13
Feature activation-1.86e-03

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.009
attering+0.009
iquette+0.009
Neg logit contributions
ESE-0.013
ysis-0.012
res-0.012
SEA-0.012
sta-0.011
Token
Token ID198
Feature activation-4.95e-03

Pos logit contributions
 Cunning+0.001
 harmless+0.001
its+0.001
iquette+0.001
attering+0.001
Neg logit contributions
ESE-0.002
ysis-0.001
res-0.001
SEA-0.001
sta-0.001
Token
Token ID198
Feature activation-3.91e-03

Pos logit contributions
 Cunning+0.004
 harmless+0.003
iquette+0.003
its+0.003
attering+0.003
Neg logit contributions
ESE-0.004
ysis-0.004
res-0.004
SEA-0.004
sta-0.003
Token call
Token ID869
Feature activation-7.54e-02

Pos logit contributions
ESE+0.069
ysis+0.064
res+0.061
SEA+0.061
sta+0.057
Neg logit contributions
 Cunning-0.075
 harmless-0.062
its-0.058
iquette-0.058
attering-0.058
Token sign
Token ID1051
Feature activation+2.46e-01

Pos logit contributions
 Cunning+0.063
 harmless+0.051
its+0.049
attering+0.048
iquette+0.048
Neg logit contributions
ESE-0.060
ysis-0.056
res-0.053
SEA-0.053
sta-0.050
Token,
Token ID11
Feature activation+1.91e-03

Pos logit contributions
ESE+0.195
ysis+0.179
res+0.172
SEA+0.170
sta+0.162
Neg logit contributions
 Cunning-0.208
 harmless-0.169
its-0.160
iquette-0.159
attering-0.159
Token no
Token ID645
Feature activation+1.05e-01

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.002
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token add
Token ID751
Feature activation+2.06e-01

Pos logit contributions
ESE+0.085
ysis+0.078
res+0.075
SEA+0.074
sta+0.071
Neg logit contributions
 Cunning-0.087
 harmless-0.070
its-0.067
iquette-0.066
attering-0.066
Tokenres
Token ID411
Feature activation+5.74e-01

Pos logit contributions
ESE+0.098
ysis+0.084
res+0.078
SEA+0.077
thodox+0.070
Neg logit contributions
 Cunning-0.240
 harmless-0.207
its-0.200
iquette-0.199
attering-0.199
Tokensee
Token ID3826
Feature activation+5.48e-02

Pos logit contributions
ESE+0.345
ysis+0.309
res+0.294
SEA+0.286
sta+0.269
Neg logit contributions
 Cunning-0.595
 harmless-0.506
iquette-0.483
its-0.483
attering-0.482
Token,
Token ID11
Feature activation+2.95e-03

Pos logit contributions
ESE+0.044
ysis+0.040
res+0.038
SEA+0.038
sta+0.036
Neg logit contributions
 Cunning-0.046
 harmless-0.038
its-0.036
iquette-0.035
attering-0.035
Token just
Token ID655
Feature activation-9.85e-02

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.003
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token panic
Token ID13619
Feature activation-9.03e-02

Pos logit contributions
 Cunning+0.080
 harmless+0.064
its+0.061
iquette+0.061
attering+0.061
Neg logit contributions
ESE-0.081
ysis-0.075
res-0.072
SEA-0.071
sta-0.068
Token.
Token ID13
Feature activation-3.52e-03

Pos logit contributions
 Cunning+0.073
 harmless+0.059
its+0.056
attering+0.056
iquette+0.055
Neg logit contributions
ESE-0.075
ysis-0.069
res-0.066
SEA-0.065
sta-0.063
Token Valve
Token ID22479
Feature activation+1.23e-01

Pos logit contributions
ESE+0.015
ysis+0.014
res+0.014
SEA+0.014
sta+0.013
Neg logit contributions
 Cunning-0.015
 harmless-0.012
its-0.011
iquette-0.011
attering-0.011
Token,
Token ID11
Feature activation+2.43e-03

Pos logit contributions
ESE+0.114
ysis+0.106
res+0.102
SEA+0.101
sta+0.097
Neg logit contributions
 Cunning-0.087
 harmless-0.068
its-0.064
iquette-0.063
attering-0.063
Token especially
Token ID2592
Feature activation+1.56e-01

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
thodox+0.002
Neg logit contributions
 Cunning-0.002
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token if
Token ID611
Feature activation+5.98e-02

Pos logit contributions
ESE+0.129
ysis+0.119
res+0.114
SEA+0.113
sta+0.108
Neg logit contributions
 Cunning-0.127
 harmless-0.103
its-0.097
iquette-0.097
attering-0.096
Token E
Token ID412
Feature activation-2.26e-04

Pos logit contributions
ESE+0.048
ysis+0.044
res+0.042
SEA+0.042
thodox+0.040
Neg logit contributions
 Cunning-0.050
 harmless-0.041
its-0.039
 Buckley-0.038
iquette-0.038
TokenSEA
Token ID46887
Feature activation+5.73e-01

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
thodox-0.000
Token helps
Token ID5419
Feature activation+4.93e-02

Pos logit contributions
ESE+0.473
ysis+0.435
res+0.417
SEA+0.413
thodox+0.397
Neg logit contributions
 Cunning-0.466
 harmless-0.375
its-0.357
iquette-0.352
 Buckley-0.352
Token them
Token ID606
Feature activation+1.40e-01

Pos logit contributions
ESE+0.041
ysis+0.037
res+0.036
SEA+0.035
thodox+0.034
Neg logit contributions
 Cunning-0.040
 harmless-0.033
its-0.031
iquette-0.030
 Buckley-0.030
Token out
Token ID503
Feature activation-1.44e-01

Pos logit contributions
ESE+0.118
ysis+0.108
res+0.104
SEA+0.104
thodox+0.099
Neg logit contributions
 Cunning-0.112
 harmless-0.090
its-0.085
iquette-0.084
attering-0.084
Token,
Token ID11
Feature activation+3.95e-03

Pos logit contributions
 Cunning+0.117
 harmless+0.095
its+0.090
iquette+0.088
 Buckley+0.088
Neg logit contributions
ESE-0.119
ysis-0.109
SEA-0.104
res-0.104
thodox-0.099
Token can
Token ID460
Feature activation-1.24e-01

Pos logit contributions
ESE+0.004
ysis+0.003
res+0.003
SEA+0.003
sta+0.003
Neg logit contributions
 Cunning-0.003
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token logic
Token ID9156
Feature activation-1.53e-01

Pos logit contributions
ESE+0.108
ysis+0.099
res+0.096
SEA+0.095
thodox+0.090
Neg logit contributions
 Cunning-0.111
 harmless-0.090
its-0.085
iquette-0.085
attering-0.085
Token and
Token ID290
Feature activation-9.47e-03

Pos logit contributions
 Cunning+0.124
 harmless+0.100
its+0.095
iquette+0.094
attering+0.094
Neg logit contributions
ESE-0.127
ysis-0.116
res-0.112
SEA-0.111
sta-0.106
Token my
Token ID616
Feature activation-1.42e-01

Pos logit contributions
 Cunning+0.008
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
sta-0.006
Token energy
Token ID2568
Feature activation+3.28e-02

Pos logit contributions
 Cunning+0.118
 harmless+0.096
its+0.091
iquette+0.090
attering+0.090
Neg logit contributions
ESE-0.115
ysis-0.105
res-0.101
SEA-0.101
sta-0.096
Token c
Token ID269
Feature activation-6.76e-02

Pos logit contributions
ESE+0.027
ysis+0.024
res+0.024
SEA+0.023
sta+0.022
Neg logit contributions
 Cunning-0.027
 harmless-0.022
its-0.021
iquette-0.021
attering-0.021
Tokenres
Token ID411
Feature activation+5.72e-01

Pos logit contributions
 Cunning+0.062
 harmless+0.052
its+0.050
attering+0.049
iquette+0.049
Neg logit contributions
ESE-0.049
ysis-0.044
SEA-0.042
res-0.042
thodox-0.039
Tokencend
Token ID15695
Feature activation-9.92e-02

Pos logit contributions
ESE+0.544
ysis+0.509
res+0.493
SEA+0.488
sta+0.470
Neg logit contributions
 Cunning-0.393
 harmless-0.304
its-0.280
attering-0.279
iquette-0.278
Tokeno
Token ID78
Feature activation-1.81e-01

Pos logit contributions
 Cunning+0.066
 harmless+0.050
iquette+0.046
its+0.046
 Buckley+0.046
Neg logit contributions
ESE-0.097
ysis-0.091
res-0.088
SEA-0.087
sta-0.084
Token in
Token ID287
Feature activation-1.32e-02

Pos logit contributions
 Cunning+0.147
 harmless+0.119
its+0.113
iquette+0.112
attering+0.111
Neg logit contributions
ESE-0.149
ysis-0.137
res-0.131
SEA-0.131
thodox-0.125
Token the
Token ID262
Feature activation+1.39e-02

Pos logit contributions
 Cunning+0.011
 harmless+0.009
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.011
ysis-0.010
res-0.010
SEA-0.009
sta-0.009
Token last
Token ID938
Feature activation-6.76e-02

Pos logit contributions
ESE+0.011
ysis+0.010
res+0.010
SEA+0.010
sta+0.009
Neg logit contributions
 Cunning-0.012
 harmless-0.009
its-0.009
iquette-0.009
attering-0.009
Token ep
Token ID2462
Feature activation+9.19e-02

Pos logit contributions
ESE+0.093
ysis+0.085
res+0.082
SEA+0.081
thodox+0.077
Neg logit contributions
 Cunning-0.098
 harmless-0.080
its-0.076
iquette-0.075
attering-0.075
Tokenoxy
Token ID23536
Feature activation+1.14e-01

Pos logit contributions
ESE+0.068
ysis+0.062
res+0.059
SEA+0.058
sta+0.055
Neg logit contributions
 Cunning-0.083
 harmless-0.069
its-0.065
iquette-0.065
attering-0.065
Token resin
Token ID36536
Feature activation+2.76e-01

Pos logit contributions
ESE+0.078
ysis+0.071
res+0.067
SEA+0.067
thodox+0.063
Neg logit contributions
 Cunning-0.108
 harmless-0.090
its-0.086
iquette-0.086
attering-0.086
Token is
Token ID318
Feature activation+8.30e-02

Pos logit contributions
ESE+0.228
ysis+0.210
res+0.203
SEA+0.200
 seeded+0.191
Neg logit contributions
 Cunning-0.224
 harmless-0.179
iquette-0.170
its-0.170
attering-0.169
Token "
Token ID366
Feature activation-2.16e-01

Pos logit contributions
ESE+0.069
ysis+0.063
res+0.061
SEA+0.060
sta+0.057
Neg logit contributions
 Cunning-0.067
 harmless-0.054
its-0.051
iquette-0.051
attering-0.051
Tokenres
Token ID411
Feature activation+5.72e-01

Pos logit contributions
 Cunning+0.181
 harmless+0.147
its+0.140
iquette+0.138
attering+0.138
Neg logit contributions
ESE-0.174
ysis-0.159
res-0.153
SEA-0.152
thodox-0.144
Tokenil
Token ID346
Feature activation-2.13e-01

Pos logit contributions
ESE+0.552
ysis+0.520
res+0.507
SEA+0.495
sta+0.480
Neg logit contributions
 Cunning-0.385
 harmless-0.293
iquette-0.270
 Buckley-0.270
 Bots-0.270
Tokenient
Token ID1153
Feature activation+4.84e-02

Pos logit contributions
 Cunning+0.186
 harmless+0.152
iquette+0.144
its+0.144
 Buckley+0.143
Neg logit contributions
ESE-0.164
ysis-0.150
res-0.144
SEA-0.142
sta-0.135
Token to
Token ID284
Feature activation-1.61e-02

Pos logit contributions
ESE+0.045
ysis+0.042
res+0.040
SEA+0.040
sta+0.038
Neg logit contributions
 Cunning-0.035
 harmless-0.027
its-0.025
iquette-0.025
attering-0.025
Token mechanical
Token ID12370
Feature activation+2.39e-02

Pos logit contributions
 Cunning+0.014
 harmless+0.012
its+0.011
iquette+0.011
attering+0.011
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.011
SEA-0.011
thodox-0.010
Token stresses
Token ID29787
Feature activation+2.08e-01

Pos logit contributions
ESE+0.017
ysis+0.015
res+0.015
SEA+0.015
sta+0.014
Neg logit contributions
 Cunning-0.022
 harmless-0.018
its-0.017
iquette-0.017
attering-0.017
Token,
Token ID11
Feature activation+9.45e-03

Pos logit contributions
 Cunning+0.069
 harmless+0.055
its+0.052
iquette+0.052
attering+0.052
Neg logit contributions
ESE-0.072
ysis-0.066
res-0.064
SEA-0.063
sta-0.060
Token Australian
Token ID6638
Feature activation-5.23e-02

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
sta+0.007
Neg logit contributions
 Cunning-0.008
 harmless-0.006
iquette-0.006
its-0.006
attering-0.006
Token Ben
Token ID3932
Feature activation+5.89e-02

Pos logit contributions
 Cunning+0.041
 harmless+0.033
its+0.031
iquette+0.031
attering+0.031
Neg logit contributions
ESE-0.045
ysis-0.041
res-0.040
SEA-0.040
sta-0.038
Token Mend
Token ID20442
Feature activation-8.46e-03

Pos logit contributions
ESE+0.053
ysis+0.049
res+0.047
SEA+0.047
sta+0.045
Neg logit contributions
 Cunning-0.043
 harmless-0.034
iquette-0.032
its-0.032
attering-0.032
Tokenel
Token ID417
Feature activation-7.34e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.004
iquette+0.004
attering+0.004
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
sta-0.007
Tokenso
Token ID568
Feature activation+5.71e-01

Pos logit contributions
 Cunning+0.105
 harmless+0.094
its+0.092
iquette+0.091
 Buckley+0.091
Neg logit contributions
ESE-0.015
ysis-0.010
SEA-0.008
res-0.008
thodox-0.005
Tokenhn
Token ID21116
Feature activation-9.45e-02

Pos logit contributions
ESE+0.528
ysis+0.487
res+0.484
SEA+0.463
sta+0.459
Neg logit contributions
 Cunning-0.403
 harmless-0.316
iquette-0.300
attering-0.290
its-0.289
Token (
Token ID357
Feature activation+1.61e-01

Pos logit contributions
 Cunning+0.078
 harmless+0.063
its+0.060
iquette+0.060
attering+0.060
Neg logit contributions
ESE-0.077
ysis-0.070
res-0.068
SEA-0.067
sta-0.064
Token
Token ID447
Feature activation+1.57e-03

Pos logit contributions
ESE+0.124
ysis+0.113
res+0.109
SEA+0.108
sta+0.102
Neg logit contributions
 Cunning-0.140
 harmless-0.115
iquette-0.109
its-0.109
attering-0.108
Token
Token ID250
Feature activation+1.50e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
 Buckley-0.001
TokenBlood
Token ID21659
Feature activation+1.06e-01

Pos logit contributions
ESE+0.012
ysis+0.011
res+0.011
SEA+0.011
sta+0.010
Neg logit contributions
 Cunning-0.012
 harmless-0.010
its-0.009
iquette-0.009
attering-0.009
Token
Token ID198
Feature activation-1.10e-02

Pos logit contributions
 Cunning+0.015
 harmless+0.012
iquette+0.012
its+0.012
attering+0.012
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.014
SEA-0.013
sta-0.013
TokenH
Token ID39
Feature activation-7.87e-02

Pos logit contributions
 Cunning+0.010
 harmless+0.008
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.008
ysis-0.008
res-0.007
SEA-0.007
thodox-0.007
Tokenints
Token ID29503
Feature activation-2.00e-01

Pos logit contributions
 Cunning+0.059
 harmless+0.046
 Buckley+0.043
iquette+0.043
its+0.043
Neg logit contributions
ESE-0.071
ysis-0.065
res-0.063
SEA-0.062
sta-0.060
Token on
Token ID319
Feature activation-3.88e-02

Pos logit contributions
 Cunning+0.161
 harmless+0.130
its+0.123
iquette+0.122
attering+0.122
Neg logit contributions
ESE-0.166
ysis-0.153
res-0.147
SEA-0.146
sta-0.139
Token Et
Token ID17906
Feature activation+7.39e-02

Pos logit contributions
 Cunning+0.031
 harmless+0.025
its+0.024
attering+0.024
iquette+0.024
Neg logit contributions
ESE-0.032
ysis-0.029
res-0.028
SEA-0.028
sta-0.027
Tokeniquette
Token ID40387
Feature activation-5.28e-01

Pos logit contributions
ESE+0.104
ysis+0.099
res+0.098
SEA+0.096
thodox+0.094
Neg logit contributions
 Cunning-0.017
 harmless-0.006
 Buckley-0.002
 Bots-0.002
attering-0.002
Token and
Token ID290
Feature activation-2.25e-02

Pos logit contributions
 Cunning+0.428
 harmless+0.346
its+0.327
iquette+0.325
attering+0.324
Neg logit contributions
ESE-0.437
ysis-0.402
res-0.386
SEA-0.383
sta-0.365
Token the
Token ID262
Feature activation+1.01e-02

Pos logit contributions
 Cunning+0.018
 harmless+0.014
its+0.014
attering+0.013
 Buckley+0.013
Neg logit contributions
ESE-0.019
ysis-0.017
res-0.017
SEA-0.017
sta-0.016
Token Us
Token ID4021
Feature activation-9.28e-02

Pos logit contributions
ESE+0.009
ysis+0.008
res+0.008
SEA+0.008
sta+0.007
Neg logit contributions
 Cunning-0.008
 harmless-0.006
its-0.006
attering-0.006
iquette-0.006
Tokenages
Token ID1095
Feature activation-9.56e-02

Pos logit contributions
 Cunning+0.071
 harmless+0.056
its+0.052
attering+0.052
 Buckley+0.052
Neg logit contributions
ESE-0.081
ysis-0.075
res-0.073
SEA-0.071
sta-0.069
Token of
Token ID286
Feature activation-3.49e-03

Pos logit contributions
 Cunning+0.078
 harmless+0.063
its+0.059
iquette+0.059
attering+0.059
Neg logit contributions
ESE-0.079
ysis-0.072
res-0.070
SEA-0.069
sta-0.066
Token
Token ID198
Feature activation-1.79e-02

Pos logit contributions
 Cunning+0.013
 harmless+0.011
iquette+0.010
its+0.010
attering+0.010
Neg logit contributions
ESE-0.013
ysis-0.012
res-0.012
SEA-0.012
sta-0.011
TokenH
Token ID39
Feature activation-7.20e-02

Pos logit contributions
 Cunning+0.016
 harmless+0.013
its+0.012
iquette+0.012
attering+0.012
Neg logit contributions
ESE-0.014
ysis-0.012
res-0.012
SEA-0.012
thodox-0.011
Tokenints
Token ID29503
Feature activation-1.98e-01

Pos logit contributions
 Cunning+0.043
 harmless+0.032
 Buckley+0.029
iquette+0.029
attering+0.029
Neg logit contributions
ESE-0.075
ysis-0.070
res-0.068
SEA-0.068
thodox-0.065
Token on
Token ID319
Feature activation-3.47e-02

Pos logit contributions
 Cunning+0.155
 harmless+0.123
its+0.117
iquette+0.116
attering+0.115
Neg logit contributions
ESE-0.170
ysis-0.157
res-0.151
SEA-0.150
thodox-0.143
Token Et
Token ID17906
Feature activation+7.28e-02

Pos logit contributions
 Cunning+0.031
 harmless+0.025
its+0.024
attering+0.024
iquette+0.024
Neg logit contributions
ESE-0.026
ysis-0.024
res-0.023
SEA-0.023
sta-0.022
Tokeniquette
Token ID40387
Feature activation-5.26e-01

Pos logit contributions
ESE+0.101
ysis+0.096
res+0.095
SEA+0.094
thodox+0.092
Neg logit contributions
 Cunning-0.018
 harmless-0.007
 Bots-0.004
 Buckley-0.004
attering-0.003
Token and
Token ID290
Feature activation-2.22e-02

Pos logit contributions
 Cunning+0.426
 harmless+0.343
its+0.325
iquette+0.322
attering+0.322
Neg logit contributions
ESE-0.437
ysis-0.402
res-0.387
SEA-0.383
sta-0.365
Token the
Token ID262
Feature activation+1.02e-02

Pos logit contributions
 Cunning+0.018
 harmless+0.015
its+0.014
attering+0.014
iquette+0.014
Neg logit contributions
ESE-0.018
ysis-0.017
res-0.016
SEA-0.016
sta-0.015
Token Us
Token ID4021
Feature activation-9.64e-02

Pos logit contributions
ESE+0.010
ysis+0.009
res+0.009
SEA+0.008
sta+0.008
Neg logit contributions
 Cunning-0.007
 harmless-0.006
its-0.005
attering-0.005
iquette-0.005
Tokenages
Token ID1095
Feature activation-9.48e-02

Pos logit contributions
 Cunning+0.075
 harmless+0.060
 Buckley+0.056
 Bots+0.055
attering+0.055
Neg logit contributions
ESE-0.084
ysis-0.077
res-0.076
SEA-0.074
sta-0.071
Token of
Token ID286
Feature activation-6.45e-03

Pos logit contributions
 Cunning+0.077
 harmless+0.063
iquette+0.059
its+0.059
attering+0.059
Neg logit contributions
ESE-0.078
ysis-0.071
res-0.069
SEA-0.068
sta-0.065
Token
Token ID251
Feature activation-3.23e-02

Pos logit contributions
 Cunning+0.017
 harmless+0.014
its+0.013
iquette+0.013
attering+0.013
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.010
SEA-0.010
sta-0.010
Token
Token ID198
Feature activation-1.28e-02

Pos logit contributions
 Cunning+0.026
 harmless+0.022
its+0.020
iquette+0.020
attering+0.020
Neg logit contributions
ESE-0.026
ysis-0.024
res-0.023
SEA-0.023
sta-0.022
Token
Token ID198
Feature activation-1.53e-02

Pos logit contributions
 Cunning+0.010
 harmless+0.008
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.011
ysis-0.010
res-0.009
SEA-0.009
sta-0.009
TokenE
Token ID36
Feature activation-6.15e-02

Pos logit contributions
 Cunning+0.013
 harmless+0.011
its+0.011
iquette+0.010
attering+0.010
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.010
SEA-0.010
sta-0.009
Tokent
Token ID83
Feature activation-1.77e-01

Pos logit contributions
 Cunning+0.054
 harmless+0.044
its+0.042
iquette+0.042
attering+0.042
Neg logit contributions
ESE-0.047
ysis-0.043
res-0.041
SEA-0.041
sta-0.038
Tokeniquette
Token ID40387
Feature activation-5.26e-01

Pos logit contributions
 Cunning+0.139
 harmless+0.111
iquette+0.105
its+0.105
attering+0.104
Neg logit contributions
ESE-0.151
ysis-0.140
res-0.134
SEA-0.134
thodox-0.127
Token for
Token ID329
Feature activation-3.89e-02

Pos logit contributions
 Cunning+0.438
 harmless+0.356
its+0.337
attering+0.335
iquette+0.334
Neg logit contributions
ESE-0.424
ysis-0.389
res-0.375
SEA-0.371
sta-0.353
Token Gentle
Token ID25066
Feature activation+1.01e-02

Pos logit contributions
 Cunning+0.031
 harmless+0.025
its+0.024
iquette+0.024
attering+0.024
Neg logit contributions
ESE-0.032
ysis-0.030
res-0.029
SEA-0.028
thodox-0.027
Tokenmen
Token ID3653
Feature activation+2.06e-01

Pos logit contributions
ESE+0.006
ysis+0.005
res+0.005
SEA+0.005
sta+0.004
Neg logit contributions
 Cunning-0.011
 harmless-0.009
its-0.009
iquette-0.009
attering-0.009
Token;
Token ID26
Feature activation+3.42e-02

Pos logit contributions
ESE+0.167
ysis+0.154
res+0.148
SEA+0.146
sta+0.139
Neg logit contributions
 Cunning-0.170
 harmless-0.137
its-0.130
iquette-0.129
attering-0.129
Token or
Token ID393
Feature activation+1.68e-01

Pos logit contributions
ESE+0.029
ysis+0.026
res+0.025
SEA+0.025
sta+0.024
Neg logit contributions
 Cunning-0.028
 harmless-0.022
its-0.021
iquette-0.021
attering-0.021
Token children
Token ID1751
Feature activation+1.51e-02

Pos logit contributions
 Cunning+0.004
 harmless+0.003
its+0.003
iquette+0.003
attering+0.003
Neg logit contributions
ESE-0.004
ysis-0.004
res-0.004
SEA-0.004
sta-0.004
Token will
Token ID481
Feature activation-1.17e-01

Pos logit contributions
ESE+0.012
ysis+0.011
res+0.011
SEA+0.011
sta+0.010
Neg logit contributions
 Cunning-0.013
 harmless-0.010
its-0.010
iquette-0.010
attering-0.010
Token work
Token ID670
Feature activation+2.16e-01

Pos logit contributions
 Cunning+0.096
 harmless+0.078
its+0.074
iquette+0.073
attering+0.073
Neg logit contributions
ESE-0.096
ysis-0.088
res-0.085
SEA-0.084
sta-0.080
Token on
Token ID319
Feature activation-3.66e-02

Pos logit contributions
ESE+0.182
ysis+0.168
res+0.162
SEA+0.160
sta+0.154
Neg logit contributions
 Cunning-0.171
 harmless-0.136
its-0.129
iquette-0.129
attering-0.129
Token "
Token ID366
Feature activation-2.23e-01

Pos logit contributions
 Cunning+0.030
 harmless+0.024
its+0.023
iquette+0.023
attering+0.023
Neg logit contributions
ESE-0.030
ysis-0.028
res-0.027
SEA-0.026
sta-0.025
Tokenaction
Token ID2673
Feature activation-5.26e-01

Pos logit contributions
 Cunning+0.183
 harmless+0.148
its+0.140
iquette+0.139
attering+0.139
Neg logit contributions
ESE-0.183
ysis-0.169
res-0.162
SEA-0.161
thodox-0.153
Token projects
Token ID4493
Feature activation+2.91e-01

Pos logit contributions
 Cunning+0.416
 harmless+0.333
its+0.315
iquette+0.313
attering+0.312
Neg logit contributions
ESE-0.446
ysis-0.411
res-0.397
SEA-0.393
sta-0.375
Token"
Token ID1
Feature activation-1.92e-01

Pos logit contributions
ESE+0.258
ysis+0.239
res+0.232
SEA+0.229
sta+0.220
Neg logit contributions
 Cunning-0.217
 harmless-0.170
iquette-0.161
its-0.161
attering-0.160
Token tied
Token ID8165
Feature activation+4.04e-02

Pos logit contributions
 Cunning+0.154
 harmless+0.124
its+0.117
iquette+0.117
attering+0.116
Neg logit contributions
ESE-0.161
ysis-0.148
res-0.143
SEA-0.141
sta-0.135
Token to
Token ID284
Feature activation-1.86e-02

Pos logit contributions
ESE+0.034
ysis+0.032
res+0.030
SEA+0.030
sta+0.029
Neg logit contributions
 Cunning-0.032
 harmless-0.026
its-0.024
iquette-0.024
attering-0.024
Token the
Token ID262
Feature activation+5.66e-03

Pos logit contributions
 Cunning+0.015
 harmless+0.012
its+0.011
iquette+0.011
attering+0.011
Neg logit contributions
ESE-0.016
ysis-0.015
res-0.014
SEA-0.014
sta-0.013
Token
Token ID251
Feature activation-1.92e-02

Pos logit contributions
 Cunning+0.005
 harmless+0.004
its+0.004
iquette+0.004
attering+0.004
Neg logit contributions
ESE-0.004
ysis-0.004
res-0.004
SEA-0.004
thodox-0.004
Token
Token ID198
Feature activation-6.69e-03

Pos logit contributions
 Cunning+0.016
 harmless+0.013
its+0.012
iquette+0.012
attering+0.012
Neg logit contributions
ESE-0.016
ysis-0.015
res-0.014
SEA-0.014
thodox-0.013
Token
Token ID198
Feature activation-2.63e-03

Pos logit contributions
 Cunning+0.005
 harmless+0.004
iquette+0.004
its+0.004
attering+0.004
Neg logit contributions
ESE-0.006
ysis-0.005
res-0.005
SEA-0.005
sta-0.005
TokenE
Token ID36
Feature activation-4.79e-02

Pos logit contributions
 Cunning+0.002
 harmless+0.002
its+0.002
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.002
ysis-0.002
res-0.002
SEA-0.002
thodox-0.002
Tokent
Token ID83
Feature activation-1.67e-01

Pos logit contributions
 Cunning+0.031
 harmless+0.023
its+0.021
attering+0.021
iquette+0.021
Neg logit contributions
ESE-0.048
ysis-0.045
res-0.044
SEA-0.043
sta-0.042
Tokeniquette
Token ID40387
Feature activation-5.18e-01

Pos logit contributions
 Cunning+0.037
 harmless+0.012
attering+0.004
its+0.004
 Buckley+0.004
Neg logit contributions
ESE-0.236
ysis-0.225
res-0.221
SEA-0.219
thodox-0.214
Token for
Token ID329
Feature activation-2.77e-02

Pos logit contributions
 Cunning+0.405
 harmless+0.324
its+0.306
iquette+0.304
attering+0.303
Neg logit contributions
ESE-0.444
ysis-0.409
res-0.394
SEA-0.391
sta-0.373
Token Gentle
Token ID25066
Feature activation+2.36e-02

Pos logit contributions
 Cunning+0.023
 harmless+0.019
its+0.018
iquette+0.018
attering+0.018
Neg logit contributions
ESE-0.022
ysis-0.020
res-0.020
SEA-0.019
sta-0.018
Tokenmen
Token ID3653
Feature activation+2.17e-01

Pos logit contributions
ESE+0.013
ysis+0.011
res+0.011
SEA+0.010
sta+0.010
Neg logit contributions
 Cunning-0.026
 harmless-0.022
its-0.021
iquette-0.021
attering-0.021
Token,
Token ID11
Feature activation+5.83e-03

Pos logit contributions
ESE+0.172
ysis+0.158
res+0.151
SEA+0.150
thodox+0.142
Neg logit contributions
 Cunning-0.184
 harmless-0.150
its-0.142
iquette-0.142
attering-0.141
Token 18
Token ID1248
Feature activation+5.29e-02

Pos logit contributions
ESE+0.005
ysis+0.004
res+0.004
SEA+0.004
sta+0.004
Neg logit contributions
 Cunning-0.005
 harmless-0.004
its-0.004
attering-0.004
iquette-0.004
Tokenmen
Token ID3653
Feature activation+2.21e-01

Pos logit contributions
ESE+0.016
ysis+0.014
res+0.013
SEA+0.013
sta+0.012
Neg logit contributions
 Cunning-0.033
 harmless-0.029
its-0.028
iquette-0.027
attering-0.027
Token
Token ID447
Feature activation-7.54e-03

Pos logit contributions
ESE+0.183
ysis+0.168
res+0.162
SEA+0.161
sta+0.153
Neg logit contributions
 Cunning-0.179
 harmless-0.144
its-0.137
iquette-0.136
attering-0.135
Token
Token ID247
Feature activation+1.18e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.006
ysis-0.005
res-0.005
SEA-0.005
sta-0.005
Tokens
Token ID82
Feature activation-9.13e-04

Pos logit contributions
ESE+0.009
ysis+0.009
res+0.008
SEA+0.008
sta+0.008
Neg logit contributions
 Cunning-0.010
 harmless-0.008
attering-0.008
its-0.008
iquette-0.008
Token Et
Token ID17906
Feature activation+8.20e-02

Pos logit contributions
 Cunning+0.001
 harmless+0.001
its+0.001
attering+0.001
iquette+0.001
Neg logit contributions
ESE-0.001
ysis-0.001
res-0.001
SEA-0.001
sta-0.001
Tokeniquette
Token ID40387
Feature activation-5.17e-01

Pos logit contributions
ESE+0.114
ysis+0.109
res+0.107
SEA+0.106
sta+0.104
Neg logit contributions
 Cunning-0.020
 harmless-0.008
 Bots-0.004
 Buckley-0.004
attering-0.004
Token,
Token ID11
Feature activation-2.73e-03

Pos logit contributions
 Cunning+0.423
 harmless+0.342
its+0.324
iquette+0.322
attering+0.320
Neg logit contributions
ESE-0.426
ysis-0.391
res-0.376
SEA-0.374
thodox-0.356
Token 18
Token ID1248
Feature activation+5.32e-02

Pos logit contributions
 Cunning+0.002
 harmless+0.002
its+0.002
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.002
ysis-0.002
res-0.002
SEA-0.002
thodox-0.002
Token77
Token ID3324
Feature activation+4.32e-02

Pos logit contributions
ESE+0.040
ysis+0.037
res+0.035
SEA+0.035
thodox+0.033
Neg logit contributions
 Cunning-0.047
 harmless-0.038
its-0.037
iquette-0.036
attering-0.036
Token.
Token ID13
Feature activation-1.84e-04

Pos logit contributions
ESE+0.037
ysis+0.034
res+0.032
SEA+0.032
thodox+0.031
Neg logit contributions
 Cunning-0.034
 harmless-0.028
its-0.026
iquette-0.026
attering-0.026
Token
Token ID198
Feature activation-1.99e-03

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
thodox-0.000
Token reaction
Token ID6317
Feature activation-2.98e-01

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.006
ysis-0.005
res-0.005
SEA-0.005
sta-0.005
Token to
Token ID284
Feature activation-1.46e-02

Pos logit contributions
 Cunning+0.244
 harmless+0.197
its+0.187
iquette+0.186
attering+0.185
Neg logit contributions
ESE-0.245
ysis-0.225
res-0.216
SEA-0.215
sta-0.204
Token the
Token ID262
Feature activation+9.25e-03

Pos logit contributions
 Cunning+0.011
 harmless+0.009
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.013
ysis-0.012
res-0.012
SEA-0.011
sta-0.011
Token ­
Token ID22020
Feature activation+6.40e-02

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
sta+0.007
Neg logit contributions
 Cunning-0.007
 harmless-0.006
its-0.006
iquette-0.006
attering-0.006
Tokenre
Token ID260
Feature activation+4.14e-01

Pos logit contributions
ESE+0.052
ysis+0.047
res+0.046
SEA+0.045
sta+0.043
Neg logit contributions
 Cunning-0.053
 harmless-0.043
its-0.041
iquette-0.041
attering-0.041
Tokenaction
Token ID2673
Feature activation-5.17e-01

Pos logit contributions
ESE+0.348
ysis+0.320
res+0.310
SEA+0.306
thodox+0.292
Neg logit contributions
 Cunning-0.331
 harmless-0.266
its-0.249
iquette-0.249
 Buckley-0.248
Token is
Token ID318
Feature activation+8.53e-02

Pos logit contributions
 Cunning+0.427
 harmless+0.346
its+0.329
iquette+0.326
attering+0.325
Neg logit contributions
ESE-0.421
ysis-0.386
res-0.370
SEA-0.368
thodox-0.350
Token a
Token ID257
Feature activation+2.44e-04

Pos logit contributions
ESE+0.072
ysis+0.066
res+0.064
SEA+0.063
sta+0.060
Neg logit contributions
 Cunning-0.068
 harmless-0.055
its-0.052
iquette-0.051
attering-0.051
Token textbook
Token ID28979
Feature activation-1.75e-01

Pos logit contributions
ESE+0.000
ysis+0.000
res+0.000
SEA+0.000
sta+0.000
Neg logit contributions
 Cunning-0.000
 harmless-0.000
its-0.000
iquette-0.000
attering-0.000
Token case
Token ID1339
Feature activation-1.32e-01

Pos logit contributions
 Cunning+0.158
 harmless+0.130
its+0.125
iquette+0.124
attering+0.124
Neg logit contributions
ESE-0.128
ysis-0.116
res-0.112
SEA-0.110
sta-0.104
Token in
Token ID287
Feature activation-1.60e-02

Pos logit contributions
 Cunning+0.107
 harmless+0.086
its+0.082
iquette+0.081
attering+0.081
Neg logit contributions
ESE-0.110
ysis-0.101
res-0.097
SEA-0.096
sta-0.092
Tokenmen
Token ID3653
Feature activation+2.11e-01

Pos logit contributions
ESE+0.012
ysis+0.011
res+0.010
SEA+0.010
sta+0.009
Neg logit contributions
 Cunning-0.025
 harmless-0.022
its-0.021
iquette-0.021
attering-0.021
Token
Token ID447
Feature activation-6.93e-03

Pos logit contributions
ESE+0.172
ysis+0.158
res+0.153
SEA+0.151
sta+0.144
Neg logit contributions
 Cunning-0.172
 harmless-0.140
its-0.132
attering-0.131
iquette-0.131
Token
Token ID247
Feature activation+1.27e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.005
ysis-0.005
res-0.005
SEA-0.005
sta-0.004
Tokens
Token ID82
Feature activation-1.86e-03

Pos logit contributions
ESE+0.010
ysis+0.009
res+0.009
SEA+0.009
sta+0.008
Neg logit contributions
 Cunning-0.011
 harmless-0.009
its-0.008
iquette-0.008
attering-0.008
Token Et
Token ID17906
Feature activation+8.12e-02

Pos logit contributions
 Cunning+0.002
 harmless+0.001
its+0.001
attering+0.001
iquette+0.001
Neg logit contributions
ESE-0.001
ysis-0.001
res-0.001
SEA-0.001
sta-0.001
Tokeniquette
Token ID40387
Feature activation-5.15e-01

Pos logit contributions
ESE+0.113
ysis+0.108
res+0.107
SEA+0.105
sta+0.103
Neg logit contributions
 Cunning-0.019
 harmless-0.007
 Bots-0.004
 Buckley-0.004
attering-0.003
Token,
Token ID11
Feature activation+3.84e-03

Pos logit contributions
 Cunning+0.421
 harmless+0.340
its+0.322
iquette+0.320
attering+0.318
Neg logit contributions
ESE-0.424
ysis-0.390
res-0.374
SEA-0.372
thodox-0.355
Token 18
Token ID1248
Feature activation+5.51e-02

Pos logit contributions
ESE+0.003
ysis+0.003
res+0.003
SEA+0.003
thodox+0.003
Neg logit contributions
 Cunning-0.003
 harmless-0.003
its-0.002
iquette-0.002
attering-0.002
Token77
Token ID3324
Feature activation+4.26e-02

Pos logit contributions
ESE+0.044
ysis+0.040
res+0.039
SEA+0.039
sta+0.037
Neg logit contributions
 Cunning-0.046
 harmless-0.037
its-0.035
attering-0.035
iquette-0.035
Token.
Token ID13
Feature activation+8.66e-04

Pos logit contributions
ESE+0.035
ysis+0.032
res+0.031
SEA+0.031
sta+0.029
Neg logit contributions
 Cunning-0.035
 harmless-0.028
its-0.027
iquette-0.027
attering-0.026
Token
Token ID198
Feature activation+2.57e-03

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token
Token ID247
Feature activation+1.24e-02

Pos logit contributions
 Cunning+0.008
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.006
ysis-0.005
res-0.005
SEA-0.005
sta-0.005
Tokens
Token ID82
Feature activation-4.29e-03

Pos logit contributions
ESE+0.010
ysis+0.009
res+0.009
SEA+0.009
thodox+0.008
Neg logit contributions
 Cunning-0.010
 harmless-0.008
its-0.008
iquette-0.008
attering-0.008
Token Book
Token ID4897
Feature activation-1.68e-01

Pos logit contributions
 Cunning+0.003
 harmless+0.003
its+0.003
iquette+0.003
attering+0.003
Neg logit contributions
ESE-0.004
ysis-0.003
res-0.003
SEA-0.003
sta-0.003
Token of
Token ID286
Feature activation+3.08e-03

Pos logit contributions
 Cunning+0.136
 harmless+0.110
its+0.103
iquette+0.103
attering+0.103
Neg logit contributions
ESE-0.139
ysis-0.128
res-0.123
SEA-0.122
sta-0.116
Token Et
Token ID17906
Feature activation+8.47e-02

Pos logit contributions
ESE+0.003
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.002
 harmless-0.002
its-0.002
attering-0.002
iquette-0.002
Tokeniquette
Token ID40387
Feature activation-5.15e-01

Pos logit contributions
ESE+0.120
ysis+0.115
res+0.113
SEA+0.111
sta+0.109
Neg logit contributions
 Cunning-0.019
 harmless-0.005
 Buckley-0.002
attering-0.001
its-0.001
Token and
Token ID290
Feature activation-8.12e-03

Pos logit contributions
 Cunning+0.423
 harmless+0.343
its+0.324
attering+0.322
iquette+0.322
Neg logit contributions
ESE-0.420
ysis-0.386
res-0.372
SEA-0.367
sta-0.350
Token Manual
Token ID17969
Feature activation+3.35e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
attering+0.005
 Buckley+0.005
Neg logit contributions
ESE-0.007
ysis-0.006
res-0.006
SEA-0.006
sta-0.006
Token of
Token ID286
Feature activation+8.88e-05

Pos logit contributions
ESE+0.027
ysis+0.025
res+0.024
SEA+0.024
sta+0.023
Neg logit contributions
 Cunning-0.028
 harmless-0.022
its-0.021
attering-0.021
iquette-0.021
Token Polit
Token ID7793
Feature activation-3.23e-01

Pos logit contributions
ESE+0.000
ysis+0.000
res+0.000
SEA+0.000
sta+0.000
Neg logit contributions
 Cunning-0.000
 harmless-0.000
its-0.000
attering-0.000
 Buckley-0.000
Tokeneness
Token ID9449
Feature activation-5.52e-02

Pos logit contributions
 Cunning+0.246
 harmless+0.196
its+0.183
 Buckley+0.182
attering+0.182
Neg logit contributions
ESE-0.283
ysis-0.263
res-0.254
SEA-0.250
sta-0.240
Token
Token ID1906
Feature activation-1.40e-02

Pos logit contributions
 Cunning+0.187
 harmless+0.152
its+0.144
iquette+0.143
attering+0.142
Neg logit contributions
ESE-0.185
ysis-0.170
res-0.164
SEA-0.162
sta-0.154
Tokenach
Token ID620
Feature activation-6.41e-02

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.011
ysis-0.010
res-0.010
SEA-0.010
sta-0.009
Tokenieve
Token ID12311
Feature activation+1.17e-01

Pos logit contributions
 Cunning+0.057
 harmless+0.047
its+0.045
iquette+0.044
attering+0.044
Neg logit contributions
ESE-0.048
ysis-0.044
res-0.042
SEA-0.042
sta-0.039
Token repetition
Token ID29693
Feature activation+1.74e-01

Pos logit contributions
ESE+0.096
ysis+0.087
SEA+0.084
res+0.083
thodox+0.080
Neg logit contributions
 Cunning-0.097
 harmless-0.079
its-0.075
attering-0.074
iquette-0.073
Token and
Token ID290
Feature activation-1.90e-02

Pos logit contributions
ESE+0.144
ysis+0.132
res+0.127
SEA+0.127
thodox+0.121
Neg logit contributions
 Cunning-0.141
 harmless-0.114
its-0.108
iquette-0.107
attering-0.107
Token survival
Token ID9441
Feature activation-5.11e-01

Pos logit contributions
 Cunning+0.016
 harmless+0.013
its+0.012
iquette+0.012
attering+0.012
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.013
SEA-0.013
thodox-0.012
Token during
Token ID1141
Feature activation+2.08e-02

Pos logit contributions
 Cunning+0.413
 harmless+0.333
its+0.316
iquette+0.312
attering+0.310
Neg logit contributions
ESE-0.425
ysis-0.390
res-0.374
SEA-0.374
thodox-0.356
Token the
Token ID262
Feature activation+6.42e-03

Pos logit contributions
ESE+0.017
ysis+0.016
res+0.015
SEA+0.015
thodox+0.014
Neg logit contributions
 Cunning-0.017
 harmless-0.014
its-0.013
iquette-0.013
attering-0.013
Token digital
Token ID4875
Feature activation+1.22e-02

Pos logit contributions
ESE+0.005
ysis+0.005
res+0.005
SEA+0.005
thodox+0.004
Neg logit contributions
 Cunning-0.005
 harmless-0.004
its-0.004
iquette-0.004
attering-0.004
Token encoding
Token ID21004
Feature activation+1.29e-01

Pos logit contributions
ESE+0.008
ysis+0.008
res+0.007
SEA+0.007
thodox+0.007
Neg logit contributions
 Cunning-0.011
 harmless-0.010
its-0.009
iquette-0.009
attering-0.009
Token process
Token ID1429
Feature activation+3.39e-02

Pos logit contributions
ESE+0.103
ysis+0.094
SEA+0.090
res+0.090
thodox+0.086
Neg logit contributions
 Cunning-0.109
 harmless-0.089
its-0.085
iquette-0.084
attering-0.084
Token/
Token ID14
Feature activation+6.33e-02

Pos logit contributions
 Cunning+0.118
 harmless+0.094
its+0.089
iquette+0.088
attering+0.088
Neg logit contributions
ESE-0.132
ysis-0.121
res-0.117
SEA-0.116
sta-0.111
TokenA
Token ID32
Feature activation-1.09e-01

Pos logit contributions
ESE+0.052
ysis+0.047
res+0.046
SEA+0.045
sta+0.043
Neg logit contributions
 Cunning-0.052
 harmless-0.042
its-0.040
iquette-0.040
attering-0.040
Token3
Token ID18
Feature activation+6.38e-02

Pos logit contributions
 Cunning+0.091
 harmless+0.074
its+0.070
iquette+0.070
attering+0.070
Neg logit contributions
ESE-0.087
ysis-0.080
res-0.076
SEA-0.076
thodox-0.072
Token bolt
Token ID18100
Feature activation-4.70e-03

Pos logit contributions
ESE+0.054
ysis+0.050
res+0.048
SEA+0.047
sta+0.045
Neg logit contributions
 Cunning-0.051
 harmless-0.041
its-0.038
iquette-0.038
attering-0.038
Token-
Token ID12
Feature activation-3.29e-02

Pos logit contributions
 Cunning+0.003
 harmless+0.003
its+0.002
iquette+0.002
 Buckley+0.002
Neg logit contributions
ESE-0.004
ysis-0.004
res-0.004
SEA-0.004
sta-0.004
Tokenaction
Token ID2673
Feature activation-5.10e-01

Pos logit contributions
 Cunning+0.009
 harmless+0.004
 Buckley+0.002
iquette+0.002
 Bots+0.002
Neg logit contributions
ESE-0.045
ysis-0.043
res-0.043
SEA-0.042
sta-0.041
Token rifle
Token ID11575
Feature activation-2.96e-02

Pos logit contributions
 Cunning+0.391
 harmless+0.310
its+0.293
iquette+0.292
attering+0.291
Neg logit contributions
ESE-0.445
ysis-0.412
res-0.398
SEA-0.393
sta-0.377
Token on
Token ID319
Feature activation-2.92e-02

Pos logit contributions
 Cunning+0.024
 harmless+0.019
its+0.018
iquette+0.018
attering+0.018
Neg logit contributions
ESE-0.025
ysis-0.023
res-0.022
SEA-0.022
sta-0.021
Token the
Token ID262
Feature activation+1.49e-02

Pos logit contributions
 Cunning+0.023
 harmless+0.019
iquette+0.018
its+0.018
attering+0.017
Neg logit contributions
ESE-0.025
ysis-0.023
res-0.022
SEA-0.022
sta-0.021
Token rooftop
Token ID33880
Feature activation+2.74e-01

Pos logit contributions
ESE+0.012
ysis+0.011
res+0.011
SEA+0.010
sta+0.010
Neg logit contributions
 Cunning-0.012
 harmless-0.010
its-0.010
iquette-0.010
attering-0.010
Token during
Token ID1141
Feature activation+2.61e-02

Pos logit contributions
ESE+0.223
ysis+0.206
res+0.199
SEA+0.196
sta+0.187
Neg logit contributions
 Cunning-0.226
 harmless-0.182
iquette-0.173
its-0.172
attering-0.172
Token Book
Token ID4897
Feature activation-1.61e-01

Pos logit contributions
ESE+0.055
ysis+0.051
res+0.049
SEA+0.048
sta+0.046
Neg logit contributions
 Cunning-0.054
 harmless-0.043
its-0.041
attering-0.041
iquette-0.041
Token of
Token ID286
Feature activation+1.39e-02

Pos logit contributions
 Cunning+0.131
 harmless+0.106
its+0.100
iquette+0.100
attering+0.099
Neg logit contributions
ESE-0.132
ysis-0.122
res-0.117
SEA-0.116
thodox-0.110
Token Pract
Token ID13672
Feature activation-9.47e-02

Pos logit contributions
ESE+0.011
ysis+0.010
res+0.009
SEA+0.009
sta+0.009
Neg logit contributions
 Cunning-0.012
 harmless-0.010
its-0.009
attering-0.009
iquette-0.009
Tokenical
Token ID605
Feature activation+1.03e-01

Pos logit contributions
 Cunning+0.088
 harmless+0.074
 Bots+0.069
 Buckley+0.069
attering+0.069
Neg logit contributions
ESE-0.067
ysis-0.061
res-0.058
SEA-0.057
sta-0.054
Token Et
Token ID17906
Feature activation+8.97e-02

Pos logit contributions
ESE+0.081
ysis+0.075
res+0.072
SEA+0.071
sta+0.068
Neg logit contributions
 Cunning-0.087
 harmless-0.071
its-0.067
attering-0.067
iquette-0.066
Tokeniquette
Token ID40387
Feature activation-5.09e-01

Pos logit contributions
ESE+0.127
ysis+0.121
res+0.119
SEA+0.117
thodox+0.115
Neg logit contributions
 Cunning-0.020
 harmless-0.007
 Buckley-0.002
attering-0.002
 Bots-0.002
Token for
Token ID329
Feature activation-1.43e-02

Pos logit contributions
 Cunning+0.419
 harmless+0.339
its+0.321
iquette+0.319
attering+0.319
Neg logit contributions
ESE-0.416
ysis-0.382
res-0.367
SEA-0.364
sta-0.347
Token Ladies
Token ID37401
Feature activation+1.26e-02

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.010
SEA-0.010
sta-0.010
Token and
Token ID290
Feature activation+1.15e-03

Pos logit contributions
ESE+0.010
ysis+0.009
res+0.009
SEA+0.009
thodox+0.009
Neg logit contributions
 Cunning-0.010
 harmless-0.008
its-0.008
iquette-0.008
attering-0.008
Token Gentle
Token ID25066
Feature activation+3.13e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
thodox+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Tokenmen
Token ID3653
Feature activation+2.22e-01

Pos logit contributions
ESE+0.017
ysis+0.015
res+0.014
SEA+0.014
thodox+0.013
Neg logit contributions
 Cunning-0.034
 harmless-0.029
its-0.028
iquette-0.028
attering-0.028
Tokenmen
Token ID3653
Feature activation+2.22e-01

Pos logit contributions
ESE+0.018
ysis+0.016
res+0.015
SEA+0.015
thodox+0.014
Neg logit contributions
 Cunning-0.035
 harmless-0.030
its-0.029
iquette-0.029
attering-0.029
Token
Token ID447
Feature activation-3.58e-03

Pos logit contributions
ESE+0.178
ysis+0.163
res+0.157
SEA+0.155
sta+0.147
Neg logit contributions
 Cunning-0.185
 harmless-0.151
its-0.143
attering-0.142
iquette-0.141
Token
Token ID247
Feature activation+2.16e-02

Pos logit contributions
 Cunning+0.003
 harmless+0.002
its+0.002
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.003
ysis-0.003
res-0.003
SEA-0.002
sta-0.002
Tokens
Token ID82
Feature activation+9.44e-04

Pos logit contributions
ESE+0.017
ysis+0.016
res+0.015
SEA+0.015
sta+0.015
Neg logit contributions
 Cunning-0.018
 harmless-0.015
its-0.014
iquette-0.014
attering-0.014
Token Et
Token ID17906
Feature activation+8.69e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.000
its-0.000
attering-0.000
iquette-0.000
Tokeniquette
Token ID40387
Feature activation-5.08e-01

Pos logit contributions
ESE+0.119
ysis+0.114
res+0.113
SEA+0.110
sta+0.109
Neg logit contributions
 Cunning-0.022
 harmless-0.010
 Bots-0.006
 Buckley-0.005
attering-0.005
Token,
Token ID11
Feature activation+1.04e-02

Pos logit contributions
 Cunning+0.395
 harmless+0.317
its+0.298
attering+0.297
iquette+0.296
Neg logit contributions
ESE-0.436
ysis-0.403
res-0.389
SEA-0.384
sta-0.367
Token 18
Token ID1248
Feature activation+5.89e-02

Pos logit contributions
ESE+0.009
ysis+0.008
res+0.008
SEA+0.008
sta+0.007
Neg logit contributions
 Cunning-0.008
 harmless-0.007
its-0.006
iquette-0.006
attering-0.006
Token77
Token ID3324
Feature activation+4.71e-02

Pos logit contributions
ESE+0.048
ysis+0.044
res+0.042
SEA+0.042
sta+0.040
Neg logit contributions
 Cunning-0.049
 harmless-0.040
its-0.038
iquette-0.037
attering-0.037
Token.
Token ID13
Feature activation+7.15e-04

Pos logit contributions
ESE+0.040
ysis+0.036
res+0.035
SEA+0.035
thodox+0.033
Neg logit contributions
 Cunning-0.038
 harmless-0.030
its-0.029
iquette-0.029
attering-0.028
Token
Token ID198
Feature activation+4.29e-03

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
thodox+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.000
its-0.000
iquette-0.000
attering-0.000
Token Gl
Token ID2671
Feature activation-1.53e-01

Pos logit contributions
 Cunning+0.004
 harmless+0.003
its+0.003
iquette+0.003
attering+0.003
Neg logit contributions
ESE-0.007
ysis-0.006
res-0.006
SEA-0.006
sta-0.006
Tokenance
Token ID590
Feature activation+1.24e-01

Pos logit contributions
 Cunning+0.144
 harmless+0.120
its+0.116
attering+0.115
iquette+0.115
Neg logit contributions
ESE-0.106
ysis-0.095
res-0.090
SEA-0.090
thodox-0.085
Token at
Token ID379
Feature activation+2.53e-02

Pos logit contributions
ESE+0.097
ysis+0.089
res+0.085
SEA+0.085
thodox+0.080
Neg logit contributions
 Cunning-0.107
 harmless-0.087
its-0.083
iquette-0.082
attering-0.082
Token Bad
Token ID7772
Feature activation-8.55e-02

Pos logit contributions
ESE+0.023
ysis+0.021
res+0.021
SEA+0.021
sta+0.020
Neg logit contributions
 Cunning-0.018
 harmless-0.014
its-0.013
attering-0.013
iquette-0.013
Token Hab
Token ID19654
Feature activation-1.31e-01

Pos logit contributions
 Cunning+0.059
 harmless+0.046
attering+0.043
iquette+0.042
its+0.042
Neg logit contributions
ESE-0.080
ysis-0.074
res-0.073
SEA-0.071
sta-0.069
Tokenits
Token ID896
Feature activation-5.05e-01

Pos logit contributions
 Cunning+0.038
 harmless+0.018
 Buckley+0.011
 Bots+0.010
attering+0.009
Neg logit contributions
ESE-0.176
ysis-0.169
res-0.168
ses-0.162
SEA-0.162
Token,
Token ID11
Feature activation-9.37e-03

Pos logit contributions
 Cunning+0.410
 harmless+0.329
its+0.313
iquette+0.311
attering+0.309
Neg logit contributions
ESE-0.420
ysis-0.387
res-0.371
SEA-0.370
thodox-0.353
Token 18
Token ID1248
Feature activation+4.32e-02

Pos logit contributions
 Cunning+0.008
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
thodox-0.007
Token44
Token ID2598
Feature activation+3.84e-02

Pos logit contributions
ESE+0.033
ysis+0.030
res+0.029
SEA+0.028
thodox+0.027
Neg logit contributions
 Cunning-0.038
 harmless-0.031
its-0.030
iquette-0.030
attering-0.029
Token.
Token ID13
Feature activation-1.02e-02

Pos logit contributions
ESE+0.033
ysis+0.030
res+0.029
SEA+0.029
thodox+0.027
Neg logit contributions
 Cunning-0.030
 harmless-0.024
its-0.023
iquette-0.023
attering-0.023
Token
Token ID198
Feature activation-1.34e-02

Pos logit contributions
 Cunning+0.008
 harmless+0.007
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.008
SEA-0.008
thodox-0.007
Token off
Token ID572
Feature activation-2.35e-02

Pos logit contributions
 Cunning+0.075
 harmless+0.060
its+0.056
iquette+0.056
attering+0.056
Neg logit contributions
ESE-0.083
ysis-0.076
res-0.074
SEA-0.073
sta-0.070
Token three
Token ID1115
Feature activation-7.01e-02

Pos logit contributions
 Cunning+0.019
 harmless+0.015
its+0.014
attering+0.014
 Buckley+0.014
Neg logit contributions
ESE-0.020
ysis-0.018
res-0.018
SEA-0.017
thodox-0.017
Token existing
Token ID4683
Feature activation-2.50e-04

Pos logit contributions
 Cunning+0.059
 harmless+0.048
its+0.045
iquette+0.045
attering+0.045
Neg logit contributions
ESE-0.056
ysis-0.052
res-0.050
SEA-0.049
thodox-0.047
Token root
Token ID6808
Feature activation+4.73e-01

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
 Buckley+0.000
iquette+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
thodox-0.000
Tokenk
Token ID74
Feature activation-2.23e-01

Pos logit contributions
ESE+0.396
ysis+0.365
res+0.352
SEA+0.349
sta+0.333
Neg logit contributions
 Cunning-0.380
 harmless-0.305
iquette-0.288
its-0.288
attering-0.287
Tokenits
Token ID896
Feature activation-5.05e-01

Pos logit contributions
 Cunning+0.045
 harmless+0.010
its+0.004
iquette+0.001
attering+0.001
Neg logit contributions
ESE-0.321
ysis-0.306
res-0.299
SEA-0.298
thodox-0.291
Token:
Token ID25
Feature activation+4.31e-02

Pos logit contributions
 Cunning+0.418
 harmless+0.340
its+0.323
attering+0.317
iquette+0.317
Neg logit contributions
ESE-0.411
ysis-0.376
res-0.360
SEA-0.359
thodox-0.342
Token J
Token ID449
Feature activation-1.13e-01

Pos logit contributions
ESE+0.035
ysis+0.032
res+0.031
SEA+0.030
thodox+0.029
Neg logit contributions
 Cunning-0.036
 harmless-0.029
its-0.028
iquette-0.027
attering-0.027
Tokenyn
Token ID2047
Feature activation+7.61e-03

Pos logit contributions
 Cunning+0.091
 harmless+0.073
its+0.069
iquette+0.069
attering+0.068
Neg logit contributions
ESE-0.094
ysis-0.087
res-0.084
SEA-0.083
sta-0.079
Tokenx
Token ID87
Feature activation+8.30e-02

Pos logit contributions
ESE+0.006
ysis+0.005
res+0.005
SEA+0.005
thodox+0.004
Neg logit contributions
 Cunning-0.007
 harmless-0.006
its-0.006
iquette-0.005
attering-0.005
Token,
Token ID11
Feature activation-8.27e-03

Pos logit contributions
ESE+0.066
ysis+0.061
res+0.058
SEA+0.058
thodox+0.055
Neg logit contributions
 Cunning-0.070
 harmless-0.057
its-0.054
iquette-0.053
attering-0.053
Token Islands
Token ID12010
Feature activation+1.17e-01

Pos logit contributions
 Cunning+0.140
 harmless+0.117
its+0.112
attering+0.110
 Buckley+0.110
Neg logit contributions
ESE-0.104
ysis-0.095
res-0.090
SEA-0.089
thodox-0.085
Token 14
Token ID1478
Feature activation+8.48e-02

Pos logit contributions
ESE+0.090
ysis+0.082
res+0.078
SEA+0.078
sta+0.074
Neg logit contributions
 Cunning-0.102
 harmless-0.084
its-0.080
iquette-0.080
attering-0.079
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
ESE+0.071
ysis+0.065
res+0.062
SEA+0.062
thodox+0.059
Neg logit contributions
 Cunning-0.068
 harmless-0.055
its-0.052
attering-0.051
iquette-0.051
Token6
Token ID21
Feature activation+8.13e-02

Pos logit contributions
 Cunning+0.009
 harmless+0.007
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
thodox-0.006
Token 8
Token ID807
Feature activation+1.54e-01

Pos logit contributions
ESE+0.055
ysis+0.050
res+0.047
SEA+0.046
thodox+0.044
Neg logit contributions
 Cunning-0.079
 harmless-0.065
its-0.063
 Survival-0.063
 Bots-0.062
Token Bots
Token ID40946
Feature activation-5.04e-01

Pos logit contributions
ESE+0.121
ysis+0.111
res+0.107
SEA+0.106
sta+0.101
Neg logit contributions
 Cunning-0.130
 harmless-0.107
iquette-0.101
its-0.101
attering-0.101
Tokenwana
Token ID49484
Feature activation-2.67e-02

Pos logit contributions
 Cunning+0.402
 harmless+0.323
its+0.306
iquette+0.303
attering+0.303
Neg logit contributions
ESE-0.424
ysis-0.391
res-0.375
SEA-0.373
thodox-0.355
Token 14
Token ID1478
Feature activation+8.32e-02

Pos logit contributions
 Cunning+0.023
 harmless+0.019
its+0.018
iquette+0.018
attering+0.018
Neg logit contributions
ESE-0.021
ysis-0.019
res-0.018
SEA-0.018
thodox-0.017
Token.
Token ID13
Feature activation-1.16e-02

Pos logit contributions
ESE+0.070
ysis+0.065
res+0.062
SEA+0.062
thodox+0.059
Neg logit contributions
 Cunning-0.066
 harmless-0.053
its-0.050
iquette-0.050
attering-0.050
Token9
Token ID24
Feature activation+1.37e-01

Pos logit contributions
 Cunning+0.010
 harmless+0.009
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.007
SEA-0.007
thodox-0.007
Token 9
Token ID860
Feature activation+8.53e-02

Pos logit contributions
ESE+0.100
ysis+0.091
res+0.087
SEA+0.086
sta+0.081
Neg logit contributions
 Cunning-0.124
 harmless-0.103
its-0.098
iquette-0.098
attering-0.097
Token Gl
Token ID2671
Feature activation-1.51e-01

Pos logit contributions
 Cunning+0.008
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.008
ysis-0.008
res-0.007
SEA-0.007
thodox-0.007
Tokenance
Token ID590
Feature activation+1.26e-01

Pos logit contributions
 Cunning+0.118
 harmless+0.095
its+0.089
iquette+0.088
 Buckley+0.088
Neg logit contributions
ESE-0.129
ysis-0.119
res-0.115
SEA-0.114
sta-0.109
Token at
Token ID379
Feature activation+2.86e-02

Pos logit contributions
ESE+0.100
ysis+0.092
res+0.088
SEA+0.087
sta+0.083
Neg logit contributions
 Cunning-0.106
 harmless-0.086
its-0.082
iquette-0.081
attering-0.081
Token Bad
Token ID7772
Feature activation-8.65e-02

Pos logit contributions
ESE+0.024
ysis+0.022
res+0.021
SEA+0.021
sta+0.020
Neg logit contributions
 Cunning-0.023
 harmless-0.019
its-0.018
attering-0.017
iquette-0.017
Token Hab
Token ID19654
Feature activation-1.27e-01

Pos logit contributions
 Cunning+0.067
 harmless+0.054
its+0.051
attering+0.050
 Buckley+0.050
Neg logit contributions
ESE-0.074
ysis-0.068
res-0.066
SEA-0.065
sta-0.063
Tokenits
Token ID896
Feature activation-5.04e-01

Pos logit contributions
 Cunning+0.030
 harmless+0.010
 Buckley+0.004
attering+0.004
iquette+0.003
Neg logit contributions
ESE-0.177
ysis-0.170
res-0.168
SEA-0.164
sta-0.162
Token,
Token ID11
Feature activation-3.44e-03

Pos logit contributions
 Cunning+0.416
 harmless+0.337
its+0.319
iquette+0.317
attering+0.317
Neg logit contributions
ESE-0.409
ysis-0.376
res-0.361
SEA-0.358
sta-0.341
Token 18
Token ID1248
Feature activation+4.19e-02

Pos logit contributions
 Cunning+0.003
 harmless+0.002
its+0.002
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.003
ysis-0.003
res-0.003
SEA-0.003
sta-0.002
Token44
Token ID2598
Feature activation+4.09e-02

Pos logit contributions
ESE+0.033
ysis+0.031
res+0.029
SEA+0.029
sta+0.028
Neg logit contributions
 Cunning-0.035
 harmless-0.029
its-0.027
iquette-0.027
attering-0.027
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
ESE+0.034
ysis+0.032
res+0.030
SEA+0.030
thodox+0.029
Neg logit contributions
 Cunning-0.033
 harmless-0.026
its-0.025
iquette-0.025
attering-0.025
Token
Token ID198
Feature activation-1.52e-02

Pos logit contributions
 Cunning+0.011
 harmless+0.009
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.010
SEA-0.010
thodox-0.010
TokenA
Token ID32
Feature activation-1.11e-01

Pos logit contributions
 Cunning+0.037
 harmless+0.030
its+0.029
iquette+0.029
attering+0.029
Neg logit contributions
ESE-0.032
ysis-0.029
res-0.028
SEA-0.028
thodox-0.026
TokenJ
Token ID41
Feature activation-8.72e-02

Pos logit contributions
 Cunning+0.089
 harmless+0.071
iquette+0.067
its+0.067
attering+0.067
Neg logit contributions
ESE-0.093
ysis-0.086
res-0.083
SEA-0.082
sta-0.078
Token Po
Token ID7695
Feature activation-3.82e-02

Pos logit contributions
 Cunning+0.066
 harmless+0.053
its+0.049
iquette+0.049
attering+0.049
Neg logit contributions
ESE-0.077
ysis-0.071
res-0.069
SEA-0.068
sta-0.065
Tokenopy
Token ID11081
Feature activation+1.46e-02

Pos logit contributions
 Cunning+0.029
 harmless+0.023
iquette+0.022
attering+0.022
 Buckley+0.022
Neg logit contributions
ESE-0.033
ysis-0.031
res-0.030
SEA-0.029
sta-0.028
Token Sh
Token ID911
Feature activation+7.90e-03

Pos logit contributions
ESE+0.013
ysis+0.012
res+0.011
SEA+0.011
sta+0.011
Neg logit contributions
 Cunning-0.011
 harmless-0.009
its-0.008
attering-0.008
iquette-0.008
Tokenits
Token ID896
Feature activation-5.03e-01

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
sta+0.007
Neg logit contributions
 Cunning-0.005
 harmless-0.004
iquette-0.004
 Buckley-0.004
its-0.004
Token-
Token ID12
Feature activation-4.81e-02

Pos logit contributions
 Cunning+0.366
 harmless+0.288
attering+0.267
iquette+0.266
its+0.266
Neg logit contributions
ESE-0.458
ysis-0.426
res-0.413
SEA-0.406
sta-0.393
TokenAdd
Token ID4550
Feature activation+1.46e-01

Pos logit contributions
 Cunning+0.042
 harmless+0.035
its+0.033
iquette+0.033
attering+0.033
Neg logit contributions
ESE-0.037
ysis-0.034
res-0.032
SEA-0.032
thodox-0.030
Tokenin
Token ID259
Feature activation-2.29e-01

Pos logit contributions
ESE+0.113
ysis+0.104
res+0.100
SEA+0.099
sta+0.094
Neg logit contributions
 Cunning-0.125
 harmless-0.102
its-0.097
iquette-0.096
attering-0.096
Token Audio
Token ID13491
Feature activation+8.55e-02

Pos logit contributions
 Cunning+0.180
 harmless+0.144
its+0.136
iquette+0.135
attering+0.135
Neg logit contributions
ESE-0.195
ysis-0.180
res-0.174
SEA-0.172
sta-0.165
Token To
Token ID1675
Feature activation+2.45e-02

Pos logit contributions
ESE+0.073
ysis+0.068
res+0.065
SEA+0.065
sta+0.062
Neg logit contributions
 Cunning-0.067
 harmless-0.053
its-0.050
iquette-0.050
attering-0.050
Token be
Token ID307
Feature activation-1.62e-02

Pos logit contributions
 Cunning+0.108
 harmless+0.088
its+0.083
iquette+0.083
attering+0.082
Neg logit contributions
ESE-0.104
ysis-0.096
res-0.092
SEA-0.091
sta-0.087
Token like
Token ID588
Feature activation-1.57e-01

Pos logit contributions
 Cunning+0.013
 harmless+0.011
its+0.010
iquette+0.010
attering+0.010
Neg logit contributions
ESE-0.014
ysis-0.012
res-0.012
SEA-0.012
thodox-0.011
Token
Token ID564
Feature activation+2.26e-02

Pos logit contributions
 Cunning+0.128
 harmless+0.104
its+0.098
iquette+0.098
attering+0.097
Neg logit contributions
ESE-0.129
ysis-0.118
res-0.114
SEA-0.113
sta-0.108
Token
Token ID250
Feature activation-9.98e-03

Pos logit contributions
ESE+0.019
ysis+0.017
res+0.016
SEA+0.016
sta+0.015
Neg logit contributions
 Cunning-0.019
 harmless-0.015
its-0.014
iquette-0.014
attering-0.014
Tokenj
Token ID73
Feature activation-1.02e-01

Pos logit contributions
 Cunning+0.008
 harmless+0.006
iquette+0.006
its+0.006
 Buckley+0.006
Neg logit contributions
ESE-0.008
ysis-0.008
res-0.008
SEA-0.007
sta-0.007
Tokenugg
Token ID6837
Feature activation-5.03e-01

Pos logit contributions
 Cunning+0.083
 harmless+0.067
its+0.063
iquette+0.063
attering+0.063
Neg logit contributions
ESE-0.083
ysis-0.077
res-0.074
SEA-0.073
sta-0.070
Tokenling
Token ID1359
Feature activation+1.11e-02

Pos logit contributions
 Cunning+0.411
 harmless+0.332
its+0.318
iquette+0.315
attering+0.315
Neg logit contributions
ESE-0.413
ysis-0.379
res-0.362
SEA-0.361
thodox-0.344
Token the
Token ID262
Feature activation+7.36e-03

Pos logit contributions
ESE+0.009
ysis+0.008
res+0.008
SEA+0.008
sta+0.007
Neg logit contributions
 Cunning-0.009
 harmless-0.007
its-0.007
iquette-0.007
attering-0.007
Token hot
Token ID3024
Feature activation+1.13e-01

Pos logit contributions
ESE+0.006
ysis+0.005
res+0.005
SEA+0.005
thodox+0.005
Neg logit contributions
 Cunning-0.006
 harmless-0.005
its-0.005
iquette-0.005
attering-0.005
Token potato
Token ID21219
Feature activation-1.23e-01

Pos logit contributions
ESE+0.093
ysis+0.086
res+0.083
SEA+0.082
sta+0.078
Neg logit contributions
 Cunning-0.092
 harmless-0.074
its-0.070
iquette-0.070
attering-0.070
Token,
Token ID11
Feature activation-6.57e-03

Pos logit contributions
 Cunning+0.099
 harmless+0.080
its+0.075
iquette+0.075
attering+0.074
Neg logit contributions
ESE-0.103
ysis-0.095
res-0.091
SEA-0.090
thodox-0.086
Token and
Token ID290
Feature activation-1.41e-02

Pos logit contributions
 Cunning+0.157
 harmless+0.127
its+0.120
iquette+0.120
attering+0.119
Neg logit contributions
ESE-0.157
ysis-0.144
res-0.139
SEA-0.138
sta-0.131
Token harsh
Token ID11859
Feature activation+2.43e-03

Pos logit contributions
 Cunning+0.012
 harmless+0.009
iquette+0.009
its+0.009
attering+0.009
Neg logit contributions
ESE-0.011
ysis-0.010
res-0.010
SEA-0.010
 seeded-0.009
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.002
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token The
Token ID383
Feature activation-2.37e-02

Pos logit contributions
 Cunning+0.008
 harmless+0.006
iquette+0.006
its+0.006
attering+0.006
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.008
SEA-0.008
sta-0.007
Token D
Token ID360
Feature activation-7.53e-02

Pos logit contributions
 Cunning+0.019
 harmless+0.015
its+0.015
iquette+0.015
attering+0.015
Neg logit contributions
ESE-0.020
ysis-0.018
res-0.017
SEA-0.017
sta-0.016
Tokenugg
Token ID6837
Feature activation-5.03e-01

Pos logit contributions
 Cunning+0.020
 harmless+0.008
iquette+0.004
 Bots+0.004
 Buckley+0.004
Neg logit contributions
ESE-0.104
ysis-0.100
res-0.098
SEA-0.097
sta-0.094
Tokenars
Token ID945
Feature activation-1.59e-01

Pos logit contributions
 Cunning+0.306
 harmless+0.227
its+0.210
iquette+0.208
attering+0.207
Neg logit contributions
ESE-0.518
ysis-0.485
res-0.470
SEA-0.467
sta-0.450
Token
Token ID447
Feature activation-1.48e-02

Pos logit contributions
 Cunning+0.130
 harmless+0.105
its+0.099
iquette+0.099
attering+0.098
Neg logit contributions
ESE-0.131
ysis-0.120
res-0.116
SEA-0.115
sta-0.110
Token
Token ID247
Feature activation+4.72e-03

Pos logit contributions
 Cunning+0.013
 harmless+0.010
its+0.010
iquette+0.010
attering+0.010
Neg logit contributions
ESE-0.011
ysis-0.010
res-0.010
SEA-0.010
sta-0.009
Token intoler
Token ID20619
Feature activation-4.62e-03

Pos logit contributions
ESE+0.004
ysis+0.003
res+0.003
SEA+0.003
sta+0.003
Neg logit contributions
 Cunning-0.004
 harmless-0.003
its-0.003
iquette-0.003
attering-0.003
Tokenant
Token ID415
Feature activation-5.64e-02

Pos logit contributions
 Cunning+0.003
 harmless+0.003
its+0.003
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.004
ysis-0.004
res-0.004
SEA-0.004
sta-0.004
Token on
Token ID319
Feature activation-1.99e-02

Pos logit contributions
ESE+0.068
ysis+0.062
res+0.060
SEA+0.059
thodox+0.057
Neg logit contributions
 Cunning-0.066
 harmless-0.053
its-0.051
iquette-0.050
attering-0.050
Token what
Token ID644
Feature activation+2.61e-02

Pos logit contributions
 Cunning+0.016
 harmless+0.013
its+0.013
iquette+0.013
attering+0.013
Neg logit contributions
ESE-0.016
ysis-0.015
res-0.014
SEA-0.014
sta-0.013
Token they
Token ID484
Feature activation+1.60e-01

Pos logit contributions
ESE+0.021
ysis+0.019
res+0.019
SEA+0.018
sta+0.017
Neg logit contributions
 Cunning-0.022
 harmless-0.018
its-0.017
iquette-0.017
attering-0.017
Token
Token ID447
Feature activation+1.48e-03

Pos logit contributions
ESE+0.121
ysis+0.110
res+0.105
SEA+0.104
thodox+0.099
Neg logit contributions
 Cunning-0.141
 harmless-0.116
its-0.110
iquette-0.110
attering-0.110
Token
Token ID247
Feature activation+2.60e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
iquette-0.001
its-0.001
attering-0.001
Tokenre
Token ID260
Feature activation+4.30e-01

Pos logit contributions
ESE+0.021
ysis+0.019
res+0.018
SEA+0.018
sta+0.017
Neg logit contributions
 Cunning-0.022
 harmless-0.018
its-0.017
iquette-0.017
attering-0.017
Token going
Token ID1016
Feature activation+1.89e-02

Pos logit contributions
ESE+0.341
ysis+0.313
res+0.301
SEA+0.299
sta+0.283
Neg logit contributions
 Cunning-0.362
 harmless-0.295
its-0.280
iquette-0.278
attering-0.277
Token to
Token ID284
Feature activation-3.26e-03

Pos logit contributions
ESE+0.016
ysis+0.014
res+0.014
SEA+0.014
sta+0.013
Neg logit contributions
 Cunning-0.015
 harmless-0.012
its-0.012
iquette-0.012
attering-0.012
Token consume
Token ID15000
Feature activation-2.68e-01

Pos logit contributions
 Cunning+0.003
 harmless+0.002
its+0.002
iquette+0.002
attering+0.002
Neg logit contributions
ESE-0.002
ysis-0.002
res-0.002
SEA-0.002
sta-0.002
Token.
Token ID13
Feature activation+8.16e-03

Pos logit contributions
 Cunning+0.215
 harmless+0.173
its+0.164
iquette+0.163
attering+0.162
Neg logit contributions
ESE-0.223
ysis-0.206
res-0.198
SEA-0.196
sta-0.187
Token
Token ID198
Feature activation+5.66e-03

Pos logit contributions
ESE+0.007
ysis+0.006
res+0.006
SEA+0.006
sta+0.006
Neg logit contributions
 Cunning-0.007
 harmless-0.005
its-0.005
iquette-0.005
attering-0.005
Token only
Token ID691
Feature activation-8.99e-03

Pos logit contributions
 Cunning+0.126
 harmless+0.103
its+0.097
iquette+0.096
attering+0.096
Neg logit contributions
ESE-0.120
ysis-0.109
res-0.105
SEA-0.104
sta-0.099
Token in
Token ID287
Feature activation-5.14e-03

Pos logit contributions
 Cunning+0.007
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.007
ysis-0.007
res-0.006
SEA-0.006
sta-0.006
Token the
Token ID262
Feature activation+2.61e-02

Pos logit contributions
 Cunning+0.004
 harmless+0.004
its+0.003
iquette+0.003
attering+0.003
Neg logit contributions
ESE-0.004
ysis-0.004
res-0.004
SEA-0.004
sta-0.003
Token wider
Token ID10595
Feature activation-4.47e-02

Pos logit contributions
ESE+0.020
ysis+0.018
res+0.018
SEA+0.017
sta+0.017
Neg logit contributions
 Cunning-0.023
 harmless-0.019
its-0.018
iquette-0.017
attering-0.017
Token larger
Token ID4025
Feature activation+5.45e-02

Pos logit contributions
 Cunning+0.039
 harmless+0.032
its+0.030
iquette+0.030
attering+0.030
Neg logit contributions
ESE-0.035
ysis-0.032
res-0.030
SEA-0.030
thodox-0.029
Token image
Token ID2939
Feature activation+3.38e-01

Pos logit contributions
ESE+0.041
ysis+0.038
res+0.036
SEA+0.036
sta+0.034
Neg logit contributions
 Cunning-0.048
 harmless-0.039
its-0.038
iquette-0.037
attering-0.037
Token).
Token ID737
Feature activation-1.29e-01

Pos logit contributions
ESE+0.306
ysis+0.284
res+0.274
SEA+0.272
thodox+0.260
Neg logit contributions
 Cunning-0.248
 harmless-0.195
its-0.183
iquette-0.181
attering-0.181
Token In
Token ID554
Feature activation-7.51e-02

Pos logit contributions
 Cunning+0.104
 harmless+0.084
its+0.080
iquette+0.079
attering+0.079
Neg logit contributions
ESE-0.107
ysis-0.098
res-0.094
SEA-0.094
sta-0.089
Token the
Token ID262
Feature activation+3.11e-02

Pos logit contributions
 Cunning+0.061
 harmless+0.049
its+0.046
iquette+0.046
attering+0.046
Neg logit contributions
ESE-0.062
ysis-0.057
res-0.055
SEA-0.055
sta-0.052
Token June
Token ID2795
Feature activation+7.52e-02

Pos logit contributions
ESE+0.024
ysis+0.022
res+0.022
SEA+0.021
sta+0.020
Neg logit contributions
 Cunning-0.027
 harmless-0.022
its-0.021
iquette-0.020
attering-0.020
Token 1
Token ID352
Feature activation-5.74e-02

Pos logit contributions
ESE+0.070
ysis+0.064
res+0.062
SEA+0.062
sta+0.059
Neg logit contributions
 Cunning-0.054
 harmless-0.042
iquette-0.039
its-0.039
attering-0.039
Token is
Token ID318
Feature activation+8.90e-02

Pos logit contributions
ESE+0.060
ysis+0.055
res+0.052
SEA+0.052
sta+0.049
Neg logit contributions
 Cunning-0.065
 harmless-0.053
its-0.050
iquette-0.050
attering-0.050
Token immediately
Token ID3393
Feature activation-1.52e-01

Pos logit contributions
ESE+0.075
ysis+0.069
res+0.066
SEA+0.066
sta+0.063
Neg logit contributions
 Cunning-0.071
 harmless-0.057
its-0.054
iquette-0.054
attering-0.054
Token establish
Token ID4474
Feature activation+8.25e-02

Pos logit contributions
 Cunning+0.121
 harmless+0.097
iquette+0.092
its+0.091
attering+0.091
Neg logit contributions
ESE-0.129
ysis-0.119
res-0.115
SEA-0.113
sta-0.108
Token Safe
Token ID19978
Feature activation-1.56e-01

Pos logit contributions
ESE+0.068
ysis+0.063
res+0.061
SEA+0.060
sta+0.057
Neg logit contributions
 Cunning-0.067
 harmless-0.054
its-0.051
iquette-0.051
attering-0.050
Token Z
Token ID1168
Feature activation-5.20e-03

Pos logit contributions
 Cunning+0.125
 harmless+0.101
its+0.096
iquette+0.094
attering+0.094
Neg logit contributions
ESE-0.131
ysis-0.121
res-0.116
SEA-0.115
thodox-0.110
Tokenones
Token ID1952
Feature activation+4.04e-01

Pos logit contributions
 Cunning+0.007
 harmless+0.006
iquette+0.006
its+0.006
attering+0.006
Neg logit contributions
ESE-0.002
ysis-0.002
res-0.001
SEA-0.001
sta-0.001
Token so
Token ID523
Feature activation+2.95e-01

Pos logit contributions
ESE+0.333
ysis+0.306
res+0.295
SEA+0.292
sta+0.278
Neg logit contributions
 Cunning-0.330
 harmless-0.266
its-0.252
iquette-0.252
attering-0.250
Token that
Token ID326
Feature activation-1.13e-02

Pos logit contributions
ESE+0.246
ysis+0.226
res+0.219
SEA+0.215
sta+0.206
Neg logit contributions
 Cunning-0.239
 harmless-0.191
iquette-0.181
its-0.181
attering-0.180
Token families
Token ID4172
Feature activation-6.13e-02

Pos logit contributions
 Cunning+0.009
 harmless+0.007
its+0.007
iquette+0.007
attering+0.007
Neg logit contributions
ESE-0.010
ysis-0.009
res-0.009
SEA-0.008
sta-0.008
Token with
Token ID351
Feature activation-4.95e-02

Pos logit contributions
 Cunning+0.050
 harmless+0.040
its+0.038
iquette+0.038
attering+0.038
Neg logit contributions
ESE-0.050
ysis-0.046
res-0.045
SEA-0.044
sta-0.042
Token children
Token ID1751
Feature activation+2.39e-02

Pos logit contributions
 Cunning+0.042
 harmless+0.034
its+0.032
attering+0.032
 Buckley+0.032
Neg logit contributions
ESE-0.039
ysis-0.036
SEA-0.034
res-0.034
sta-0.033
Token BLACK
Token ID31963
Feature activation-4.31e-02

Pos logit contributions
ESE+0.178
ysis+0.163
res+0.157
SEA+0.156
sta+0.148
Neg logit contributions
 Cunning-0.175
 harmless-0.141
its-0.133
iquette-0.133
attering-0.132
TokenFO
Token ID6080
Feature activation-2.94e-01

Pos logit contributions
 Cunning+0.024
 harmless+0.018
its+0.016
iquette+0.016
attering+0.016
Neg logit contributions
ESE-0.047
ysis-0.043
res-0.042
SEA-0.042
sta-0.041
TokenOT
Token ID2394
Feature activation+4.70e-02

Pos logit contributions
 Cunning+0.266
 harmless+0.221
iquette+0.210
attering+0.209
its+0.208
Neg logit contributions
ESE-0.221
ysis-0.197
res-0.190
SEA-0.190
sta-0.178
Token T
Token ID309
Feature activation-8.31e-02

Pos logit contributions
ESE+0.039
ysis+0.036
res+0.034
SEA+0.034
sta+0.032
Neg logit contributions
 Cunning-0.038
 harmless-0.031
its-0.029
iquette-0.029
attering-0.029
TokenOC
Token ID4503
Feature activation-1.07e-02

Pos logit contributions
 Cunning+0.068
 harmless+0.055
its+0.052
iquette+0.052
attering+0.052
Neg logit contributions
ESE-0.068
ysis-0.063
res-0.060
SEA-0.060
sta-0.057
Token RE
Token ID4526
Feature activation+3.57e-01

Pos logit contributions
 Cunning+0.011
 harmless+0.009
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.007
ysis-0.006
res-0.006
SEA-0.006
sta-0.005
TokenP
Token ID47
Feature activation-4.16e-02

Pos logit contributions
ESE+0.295
ysis+0.268
res+0.260
SEA+0.259
sta+0.244
Neg logit contributions
 Cunning-0.294
 harmless-0.238
attering-0.223
iquette-0.223
its-0.223
TokenORTS
Token ID33002
Feature activation-1.87e-02

Pos logit contributions
 Cunning+0.033
 harmless+0.026
iquette+0.025
its+0.024
attering+0.024
Neg logit contributions
ESE-0.036
ysis-0.033
res-0.032
SEA-0.031
sta-0.030
Token B
Token ID347
Feature activation-2.00e-01

Pos logit contributions
 Cunning+0.015
 harmless+0.012
its+0.012
iquette+0.012
attering+0.012
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.014
SEA-0.013
sta-0.013
Token6
Token ID21
Feature activation+9.07e-02

Pos logit contributions
 Cunning+0.172
 harmless+0.140
its+0.132
iquette+0.131
 Buckley+0.131
Neg logit contributions
ESE-0.158
ysis-0.144
res-0.139
SEA-0.138
sta-0.130
Token HAS
Token ID33930
Feature activation-1.72e-02

Pos logit contributions
ESE+0.073
ysis+0.067
res+0.065
SEA+0.064
sta+0.061
Neg logit contributions
 Cunning-0.075
 harmless-0.061
its-0.058
iquette-0.058
attering-0.057
Token right
Token ID826
Feature activation+1.18e-01

Pos logit contributions
ESE+0.004
ysis+0.003
res+0.003
SEA+0.003
sta+0.003
Neg logit contributions
 Cunning-0.004
 harmless-0.003
its-0.003
iquette-0.003
attering-0.003
Token?
Token ID30
Feature activation+6.88e-03

Pos logit contributions
ESE+0.099
ysis+0.091
res+0.087
SEA+0.086
thodox+0.082
Neg logit contributions
 Cunning-0.096
 harmless-0.078
its-0.073
iquette-0.073
attering-0.072
Token
Token ID198
Feature activation+3.61e-03

Pos logit contributions
ESE+0.006
ysis+0.005
res+0.005
SEA+0.005
sta+0.005
Neg logit contributions
 Cunning-0.006
 harmless-0.004
its-0.004
iquette-0.004
attering-0.004
Token
Token ID198
Feature activation+6.22e-04

Pos logit contributions
ESE+0.003
ysis+0.003
res+0.003
SEA+0.003
sta+0.002
Neg logit contributions
 Cunning-0.003
 harmless-0.002
iquette-0.002
its-0.002
attering-0.002
TokenLeave
Token ID35087
Feature activation+4.29e-02

Pos logit contributions
ESE+0.000
ysis+0.000
res+0.000
SEA+0.000
thodox+0.000
Neg logit contributions
 Cunning-0.001
 harmless-0.000
its-0.000
iquette-0.000
attering-0.000
Token Shia
Token ID27424
Feature activation+3.63e-01

Pos logit contributions
ESE+0.037
ysis+0.034
res+0.033
SEA+0.032
thodox+0.031
Neg logit contributions
 Cunning-0.034
 harmless-0.027
its-0.025
iquette-0.025
attering-0.025
Token alone
Token ID3436
Feature activation-9.29e-02

Pos logit contributions
ESE+0.301
ysis+0.278
res+0.268
SEA+0.265
sta+0.253
Neg logit contributions
 Cunning-0.294
 harmless-0.235
its-0.224
iquette-0.223
attering-0.222
Token,
Token ID11
Feature activation+3.49e-04

Pos logit contributions
 Cunning+0.075
 harmless+0.060
its+0.057
iquette+0.057
attering+0.056
Neg logit contributions
ESE-0.078
ysis-0.071
res-0.069
SEA-0.068
thodox-0.065
Token guys
Token ID3730
Feature activation+2.98e-01

Pos logit contributions
ESE+0.000
ysis+0.000
res+0.000
SEA+0.000
sta+0.000
Neg logit contributions
 Cunning-0.000
 harmless-0.000
its-0.000
iquette-0.000
attering-0.000
Token.
Token ID13
Feature activation+1.43e-03

Pos logit contributions
ESE+0.252
ysis+0.232
res+0.223
SEA+0.222
thodox+0.211
Neg logit contributions
 Cunning-0.236
 harmless-0.189
its-0.179
iquette-0.178
attering-0.177
Token You
Token ID921
Feature activation+1.18e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token the
Token ID262
Feature activation+2.57e-02

Pos logit contributions
 Cunning+0.035
 harmless+0.029
its+0.028
attering+0.027
iquette+0.027
Neg logit contributions
ESE-0.033
ysis-0.030
res-0.029
SEA-0.029
thodox-0.027
Token cost
Token ID1575
Feature activation-1.77e-01

Pos logit contributions
ESE+0.019
ysis+0.018
res+0.017
SEA+0.017
sta+0.016
Neg logit contributions
 Cunning-0.023
 harmless-0.019
its-0.018
iquette-0.018
attering-0.018
Token of
Token ID286
Feature activation+1.12e-02

Pos logit contributions
 Cunning+0.137
 harmless+0.110
its+0.104
attering+0.103
iquette+0.102
Neg logit contributions
ESE-0.153
ysis-0.140
res-0.134
SEA-0.134
thodox-0.128
Token oil
Token ID3056
Feature activation+1.49e-01

Pos logit contributions
ESE+0.009
ysis+0.008
res+0.008
SEA+0.008
sta+0.007
Neg logit contributions
 Cunning-0.010
 harmless-0.008
its-0.007
iquette-0.007
attering-0.007
Token in
Token ID287
Feature activation-8.40e-03

Pos logit contributions
ESE+0.121
ysis+0.111
SEA+0.106
res+0.106
thodox+0.101
Neg logit contributions
 Cunning-0.123
 harmless-0.100
its-0.095
iquette-0.094
attering-0.093
Token Haiti
Token ID25051
Feature activation+2.11e-01

Pos logit contributions
 Cunning+0.007
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.006
ysis-0.006
res-0.006
SEA-0.006
sta-0.005
Token after
Token ID706
Feature activation+1.01e-01

Pos logit contributions
ESE+0.170
ysis+0.155
res+0.149
SEA+0.148
thodox+0.142
Neg logit contributions
 Cunning-0.175
 harmless-0.144
its-0.137
attering-0.134
iquette-0.134
Token several
Token ID1811
Feature activation-6.14e-02

Pos logit contributions
ESE+0.085
ysis+0.078
res+0.075
SEA+0.075
thodox+0.071
Neg logit contributions
 Cunning-0.080
 harmless-0.065
its-0.061
iquette-0.061
attering-0.060
Token years
Token ID812
Feature activation+5.99e-02

Pos logit contributions
 Cunning+0.054
 harmless+0.044
its+0.042
iquette+0.042
attering+0.042
Neg logit contributions
ESE-0.047
ysis-0.043
res-0.041
SEA-0.041
sta-0.039
Token of
Token ID286
Feature activation+1.11e-02

Pos logit contributions
ESE+0.051
ysis+0.047
res+0.045
SEA+0.045
thodox+0.043
Neg logit contributions
 Cunning-0.047
 harmless-0.038
its-0.036
attering-0.036
iquette-0.035
Token high
Token ID1029
Feature activation+2.23e-01

Pos logit contributions
ESE+0.009
ysis+0.008
SEA+0.007
res+0.007
thodox+0.007
Neg logit contributions
 Cunning-0.010
 harmless-0.008
its-0.008
attering-0.007
 Buckley-0.007
Token
Token ID198
Feature activation+1.77e-03

Pos logit contributions
ESE+0.003
ysis+0.003
res+0.003
SEA+0.003
sta+0.003
Neg logit contributions
 Cunning-0.003
 harmless-0.002
iquette-0.002
its-0.002
attering-0.002
TokenPeter
Token ID19727
Feature activation-8.58e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
thodox+0.001
Neg logit contributions
 Cunning-0.002
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token Mell
Token ID32993
Feature activation-2.00e-01

Pos logit contributions
 Cunning+0.064
 harmless+0.051
its+0.048
iquette+0.048
attering+0.048
Neg logit contributions
ESE-0.076
ysis-0.071
res-0.068
SEA-0.068
sta-0.065
Tokengard
Token ID19977
Feature activation-1.03e-01

Pos logit contributions
 Cunning+0.161
 harmless+0.130
its+0.122
iquette+0.121
attering+0.121
Neg logit contributions
ESE-0.167
ysis-0.154
res-0.148
SEA-0.147
sta-0.140
Token/
Token ID14
Feature activation+6.51e-02

Pos logit contributions
 Cunning+0.084
 harmless+0.068
its+0.065
iquette+0.064
attering+0.064
Neg logit contributions
ESE-0.085
ysis-0.078
res-0.075
SEA-0.075
thodox-0.071
TokenThe
Token ID464
Feature activation+7.63e-02

Pos logit contributions
ESE+0.054
ysis+0.050
res+0.048
SEA+0.048
thodox+0.045
Neg logit contributions
 Cunning-0.053
 harmless-0.042
its-0.040
iquette-0.040
attering-0.040
Token World
Token ID2159
Feature activation-1.02e-01

Pos logit contributions
ESE+0.072
ysis+0.067
res+0.064
SEA+0.064
thodox+0.062
Neg logit contributions
 Cunning-0.053
 harmless-0.041
its-0.038
iquette-0.038
attering-0.038
TokenPost
Token ID6307
Feature activation+1.90e-01

Pos logit contributions
 Cunning+0.106
 harmless+0.090
its+0.087
attering+0.087
 Buckley+0.086
Neg logit contributions
ESE-0.060
ysis-0.054
res-0.050
SEA-0.050
thodox-0.047
Token The
Token ID383
Feature activation-1.03e-02

Pos logit contributions
ESE+0.159
ysis+0.146
res+0.140
SEA+0.139
thodox+0.133
Neg logit contributions
 Cunning-0.153
 harmless-0.123
its-0.117
iquette-0.115
attering-0.115
Token water
Token ID1660
Feature activation+2.29e-01

Pos logit contributions
 Cunning+0.008
 harmless+0.006
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.008
SEA-0.008
sta-0.007
Token hours
Token ID2250
Feature activation-7.34e-02

Pos logit contributions
ESE+0.186
ysis+0.172
res+0.166
SEA+0.163
sta+0.156
Neg logit contributions
 Cunning-0.189
 harmless-0.152
its-0.144
iquette-0.144
attering-0.144
Token
Token ID447
Feature activation-1.25e-02

Pos logit contributions
 Cunning+0.085
 harmless+0.070
its+0.067
iquette+0.066
attering+0.066
Neg logit contributions
ESE-0.071
ysis-0.065
res-0.062
SEA-0.061
thodox-0.058
Token
Token ID247
Feature activation+1.02e-02

Pos logit contributions
 Cunning+0.010
 harmless+0.008
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.010
ysis-0.009
res-0.009
SEA-0.009
thodox-0.008
Tokenre
Token ID260
Feature activation+4.11e-01

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
thodox+0.007
Neg logit contributions
 Cunning-0.009
 harmless-0.007
its-0.007
iquette-0.007
attering-0.007
Token doing
Token ID1804
Feature activation+7.89e-02

Pos logit contributions
ESE+0.323
ysis+0.296
res+0.283
SEA+0.281
thodox+0.267
Neg logit contributions
 Cunning-0.351
 harmless-0.287
its-0.272
iquette-0.270
attering-0.270
Token now
Token ID783
Feature activation-3.04e-02

Pos logit contributions
ESE+0.061
ysis+0.056
res+0.054
SEA+0.053
sta+0.051
Neg logit contributions
 Cunning-0.068
 harmless-0.056
its-0.053
iquette-0.052
attering-0.052
Token is
Token ID318
Feature activation+7.85e-02

Pos logit contributions
 Cunning+0.027
 harmless+0.022
its+0.021
iquette+0.021
attering+0.021
Neg logit contributions
ESE-0.023
ysis-0.021
res-0.020
SEA-0.020
sta-0.019
Token working
Token ID1762
Feature activation+1.60e-01

Pos logit contributions
ESE+0.063
ysis+0.057
res+0.055
SEA+0.055
thodox+0.052
Neg logit contributions
 Cunning-0.066
 harmless-0.054
its-0.051
iquette-0.051
attering-0.051
Token out
Token ID503
Feature activation-1.38e-01

Pos logit contributions
ESE+0.134
ysis+0.123
res+0.119
SEA+0.118
sta+0.113
Neg logit contributions
 Cunning-0.129
 harmless-0.103
iquette-0.098
attering-0.098
 Buckley-0.097
Token the
Token ID262
Feature activation+1.35e-02

Pos logit contributions
 Cunning+0.113
 harmless+0.091
its+0.086
iquette+0.086
attering+0.086
Neg logit contributions
ESE-0.113
ysis-0.104
res-0.100
SEA-0.099
sta-0.095
Token tactics
Token ID10815
Feature activation-1.38e-01

Pos logit contributions
ESE+0.011
ysis+0.010
res+0.010
SEA+0.009
sta+0.009
Neg logit contributions
 Cunning-0.011
 harmless-0.009
its-0.009
iquette-0.009
attering-0.009
Token,
Token ID11
Feature activation-9.68e-04

Pos logit contributions
 Cunning+0.111
 harmless+0.089
its+0.084
iquette+0.084
attering+0.084
Neg logit contributions
ESE-0.114
ysis-0.105
res-0.102
SEA-0.101
sta-0.096
Token The
Token ID383
Feature activation-3.11e-02

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.009
iquette+0.009
attering+0.009
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.011
SEA-0.011
sta-0.010
Token expansion
Token ID7118
Feature activation+8.32e-02

Pos logit contributions
 Cunning+0.027
 harmless+0.022
its+0.021
iquette+0.021
attering+0.021
Neg logit contributions
ESE-0.024
ysis-0.022
res-0.021
SEA-0.021
thodox-0.020
Token would
Token ID561
Feature activation-2.18e-02

Pos logit contributions
ESE+0.070
ysis+0.064
res+0.062
SEA+0.062
sta+0.059
Neg logit contributions
 Cunning-0.066
 harmless-0.053
its-0.050
iquette-0.050
attering-0.050
Token be
Token ID307
Feature activation-1.94e-02

Pos logit contributions
 Cunning+0.018
 harmless+0.015
its+0.014
iquette+0.014
attering+0.014
Neg logit contributions
ESE-0.017
ysis-0.016
res-0.015
SEA-0.015
sta-0.014
Token the
Token ID262
Feature activation+5.73e-03

Pos logit contributions
 Cunning+0.017
 harmless+0.014
its+0.013
iquette+0.013
attering+0.013
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.013
SEA-0.013
sta-0.012
Token largest
Token ID4387
Feature activation+6.42e-02

Pos logit contributions
ESE+0.004
ysis+0.004
res+0.004
SEA+0.004
sta+0.004
Neg logit contributions
 Cunning-0.005
 harmless-0.004
its-0.004
iquette-0.004
attering-0.004
Token since
Token ID1201
Feature activation+1.09e-01

Pos logit contributions
ESE+0.052
ysis+0.047
SEA+0.045
res+0.045
thodox+0.043
Neg logit contributions
 Cunning-0.053
 harmless-0.044
 Buckley-0.041
its-0.041
action-0.041
Token vouchers
Token ID41262
Feature activation+2.38e-03

Pos logit contributions
ESE+0.100
ysis+0.093
res+0.090
SEA+0.089
sta+0.085
Neg logit contributions
 Cunning-0.078
 harmless-0.061
its-0.057
iquette-0.057
attering-0.057
Token were
Token ID547
Feature activation-3.32e-02

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
 seeded+0.002
Neg logit contributions
 Cunning-0.002
 harmless-0.001
iquette-0.001
its-0.001
attering-0.001
Token introduced
Token ID5495
Feature activation-7.41e-03

Pos logit contributions
 Cunning+0.029
 harmless+0.024
its+0.022
iquette+0.022
attering+0.022
Neg logit contributions
ESE-0.026
ysis-0.023
res-0.022
SEA-0.022
sta-0.021
Token in
Token ID287
Feature activation-1.69e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
iquette+0.004
attering+0.004
Neg logit contributions
ESE-0.006
ysis-0.006
res-0.005
SEA-0.005
sta-0.005
Tokenhes
Token ID956
Feature activation+1.36e-01

Pos logit contributions
 Cunning+0.206
 harmless+0.173
iquette+0.168
 Bots+0.167
attering+0.165
Neg logit contributions
ESE-0.108
ysis-0.095
res-0.094
SEA-0.089
sta-0.084
Token said
Token ID531
Feature activation-2.20e-02

Pos logit contributions
ESE+0.111
ysis+0.102
res+0.098
SEA+0.097
sta+0.092
Neg logit contributions
 Cunning-0.112
 harmless-0.091
its-0.086
iquette-0.086
attering-0.085
Token the
Token ID262
Feature activation+2.47e-03

Pos logit contributions
 Cunning+0.018
 harmless+0.015
iquette+0.014
its+0.014
attering+0.014
Neg logit contributions
ESE-0.018
ysis-0.016
res-0.016
SEA-0.016
sta-0.015
Token
Token ID564
Feature activation+2.28e-02

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.002
SEA+0.002
sta+0.002
Neg logit contributions
 Cunning-0.002
 harmless-0.002
its-0.002
iquette-0.002
attering-0.002
Token
Token ID250
Feature activation-1.22e-02

Pos logit contributions
ESE+0.019
ysis+0.017
res+0.017
SEA+0.016
sta+0.016
Neg logit contributions
 Cunning-0.019
 harmless-0.015
its-0.014
iquette-0.014
attering-0.014
Tokenaston
Token ID45966
Feature activation+1.20e-01

Pos logit contributions
 Cunning+0.011
 harmless+0.009
its+0.009
iquette+0.009
attering+0.008
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.008
SEA-0.008
thodox-0.007
Tokenishing
Token ID3929
Feature activation-2.50e-01

Pos logit contributions
ESE+0.129
ysis+0.121
res+0.118
SEA+0.117
sta+0.113
Neg logit contributions
 Cunning-0.068
 harmless-0.049
its-0.045
iquette-0.045
attering-0.044
Token
Token ID447
Feature activation-1.72e-02

Pos logit contributions
 Cunning+0.213
 harmless+0.174
its+0.165
iquette+0.164
attering+0.164
Neg logit contributions
ESE-0.196
ysis-0.180
res-0.173
SEA-0.171
sta-0.162
Token
Token ID251
Feature activation-3.35e-02

Pos logit contributions
 Cunning+0.015
 harmless+0.012
its+0.011
iquette+0.011
attering+0.011
Neg logit contributions
ESE-0.013
ysis-0.012
res-0.012
SEA-0.012
sta-0.011
Token revision
Token ID18440
Feature activation+1.51e-01

Pos logit contributions
 Cunning+0.029
 harmless+0.023
its+0.022
iquette+0.022
attering+0.022
Neg logit contributions
ESE-0.026
ysis-0.024
res-0.023
SEA-0.023
sta-0.022
Token also
Token ID635
Feature activation-4.35e-02

Pos logit contributions
ESE+0.123
ysis+0.113
res+0.109
SEA+0.108
sta+0.103
Neg logit contributions
 Cunning-0.125
 harmless-0.101
iquette-0.095
its-0.095
attering-0.095
Token to
Token ID284
Feature activation-9.58e-03

Pos logit contributions
 Cunning+0.070
 harmless+0.057
its+0.054
iquette+0.053
attering+0.053
Neg logit contributions
ESE-0.072
ysis-0.066
res-0.064
SEA-0.063
sta-0.060
Token try
Token ID1949
Feature activation-1.76e-02

Pos logit contributions
 Cunning+0.008
 harmless+0.007
its+0.006
iquette+0.006
attering+0.006
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
sta-0.006
Token this
Token ID428
Feature activation+1.31e-01

Pos logit contributions
 Cunning+0.014
 harmless+0.011
iquette+0.010
its+0.010
 Buckley+0.010
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.013
SEA-0.013
sta-0.013
Token!
Token ID0
Feature activation-9.68e-02

Pos logit contributions
ESE+0.109
ysis+0.100
res+0.096
SEA+0.095
sta+0.091
Neg logit contributions
 Cunning-0.106
 harmless-0.085
its-0.081
iquette-0.081
attering-0.080
Token
Token ID447
Feature activation-1.18e-02

Pos logit contributions
 Cunning+0.080
 harmless+0.065
its+0.061
iquette+0.061
attering+0.061
Neg logit contributions
ESE-0.079
ysis-0.072
res-0.069
SEA-0.069
sta-0.066
Token
Token ID251
Feature activation-1.88e-02

Pos logit contributions
 Cunning+0.010
 harmless+0.008
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.009
ysis-0.008
res-0.008
SEA-0.008
thodox-0.007
Token reactions
Token ID12737
Feature activation-1.93e-01

Pos logit contributions
 Cunning+0.015
 harmless+0.012
its+0.012
iquette+0.012
attering+0.012
Neg logit contributions
ESE-0.015
ysis-0.014
res-0.014
SEA-0.014
sta-0.013
Token generated
Token ID7560
Feature activation+2.78e-02

Pos logit contributions
 Cunning+0.157
 harmless+0.126
its+0.120
iquette+0.119
attering+0.119
Neg logit contributions
ESE-0.159
ysis-0.146
res-0.141
SEA-0.139
sta-0.133
Token after
Token ID706
Feature activation+8.96e-02

Pos logit contributions
ESE+0.018
ysis+0.016
res+0.015
SEA+0.015
thodox+0.014
Neg logit contributions
 Cunning-0.028
 harmless-0.023
its-0.022
iquette-0.022
attering-0.022
Token playing
Token ID2712
Feature activation+1.64e-01

Pos logit contributions
ESE+0.069
ysis+0.064
res+0.061
SEA+0.060
sta+0.057
Neg logit contributions
 Cunning-0.077
 harmless-0.063
its-0.060
iquette-0.060
attering-0.060
Token the
Token ID262
Feature activation+1.84e-02

Pos logit contributions
ESE+0.132
ysis+0.121
res+0.116
SEA+0.115
sta+0.110
Neg logit contributions
 Cunning-0.136
 harmless-0.110
its-0.105
iquette-0.105
attering-0.104
Token certain
Token ID1728
Feature activation-2.90e-02

Pos logit contributions
 Cunning+0.009
 harmless+0.007
iquette+0.007
its+0.007
attering+0.007
Neg logit contributions
ESE-0.011
ysis-0.010
res-0.010
SEA-0.010
sta-0.009
Token Qaeda
Token ID17942
Feature activation-1.20e-01

Pos logit contributions
 Cunning+0.022
 harmless+0.017
iquette+0.016
its+0.016
attering+0.016
Neg logit contributions
ESE-0.026
ysis-0.024
res-0.023
SEA-0.023
sta-0.022
Token suspects
Token ID11826
Feature activation+7.32e-02

Pos logit contributions
 Cunning+0.085
 harmless+0.066
iquette+0.062
its+0.062
attering+0.061
Neg logit contributions
ESE-0.111
ysis-0.103
res-0.100
SEA-0.099
sta-0.095
Token,
Token ID11
Feature activation-9.27e-03

Pos logit contributions
ESE+0.060
ysis+0.055
res+0.053
SEA+0.052
sta+0.050
Neg logit contributions
 Cunning-0.060
 harmless-0.049
its-0.046
iquette-0.046
attering-0.046
Token but
Token ID475
Feature activation-3.89e-02

Pos logit contributions
 Cunning+0.008
 harmless+0.006
iquette+0.006
its+0.006
attering+0.006
Neg logit contributions
ESE-0.008
ysis-0.007
res-0.007
SEA-0.007
sta-0.006
Token the
Token ID262
Feature activation+6.08e-03

Pos logit contributions
 Cunning+0.032
 harmless+0.026
iquette+0.025
its+0.025
attering+0.025
Neg logit contributions
ESE-0.031
ysis-0.029
res-0.028
SEA-0.027
sta-0.026
Token letter
Token ID3850
Feature activation-1.86e-01

Pos logit contributions
ESE+0.006
ysis+0.005
res+0.005
SEA+0.005
sta+0.005
Neg logit contributions
 Cunning-0.004
 harmless-0.003
its-0.003
iquette-0.003
attering-0.003
Token said
Token ID531
Feature activation-1.82e-02

Pos logit contributions
 Cunning+0.155
 harmless+0.126
its+0.119
iquette+0.118
attering+0.118
Neg logit contributions
ESE-0.151
ysis-0.138
res-0.133
SEA-0.132
thodox-0.125
Token the
Token ID262
Feature activation+8.75e-03

Pos logit contributions
 Cunning+0.015
 harmless+0.012
its+0.012
iquette+0.012
attering+0.012
Neg logit contributions
ESE-0.015
ysis-0.013
res-0.013
SEA-0.013
thodox-0.012
Token C
Token ID327
Feature activation-2.40e-01

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
sta+0.006
Neg logit contributions
 Cunning-0.007
 harmless-0.005
its-0.005
iquette-0.005
attering-0.005
Token.
Token ID13
Feature activation-7.33e-03

Pos logit contributions
 Cunning+0.194
 harmless+0.156
its+0.147
iquette+0.147
attering+0.146
Neg logit contributions
ESE-0.200
ysis-0.184
res-0.178
SEA-0.176
sta-0.168
Token.
Token ID13
Feature activation-6.10e-03

Pos logit contributions
 Cunning+0.082
 harmless+0.066
its+0.062
iquette+0.062
attering+0.062
Neg logit contributions
ESE-0.089
ysis-0.082
res-0.079
SEA-0.078
sta-0.075
Token But
Token ID887
Feature activation-8.25e-02

Pos logit contributions
 Cunning+0.005
 harmless+0.004
its+0.004
iquette+0.004
attering+0.004
Neg logit contributions
ESE-0.005
ysis-0.005
res-0.004
SEA-0.004
sta-0.004
Token long
Token ID890
Feature activation+7.20e-02

Pos logit contributions
 Cunning+0.070
 harmless+0.057
its+0.054
iquette+0.053
attering+0.053
Neg logit contributions
ESE-0.066
ysis-0.060
res-0.058
SEA-0.057
sta-0.054
Token before
Token ID878
Feature activation-5.56e-02

Pos logit contributions
ESE+0.062
ysis+0.057
res+0.055
SEA+0.055
sta+0.052
Neg logit contributions
 Cunning-0.056
 harmless-0.045
its-0.042
iquette-0.042
attering-0.042
Token arrival
Token ID10325
Feature activation-9.15e-02

Pos logit contributions
 Cunning+0.047
 harmless+0.038
its+0.036
iquette+0.036
attering+0.036
Neg logit contributions
ESE-0.044
ysis-0.040
res-0.039
SEA-0.039
thodox-0.037
Token of
Token ID286
Feature activation-4.16e-04

Pos logit contributions
 Cunning+0.074
 harmless+0.060
its+0.057
iquette+0.056
attering+0.056
Neg logit contributions
ESE-0.076
ysis-0.070
res-0.067
SEA-0.067
thodox-0.064
Token the
Token ID262
Feature activation+1.83e-02

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
thodox-0.000
Token Greeks
Token ID25059
Feature activation+2.29e-01

Pos logit contributions
ESE+0.013
ysis+0.012
res+0.012
SEA+0.012
sta+0.011
Neg logit contributions
 Cunning-0.016
 harmless-0.014
its-0.013
iquette-0.013
attering-0.013
Token and
Token ID290
Feature activation-1.45e-02

Pos logit contributions
ESE+0.190
ysis+0.176
res+0.169
SEA+0.168
thodox+0.160
Neg logit contributions
 Cunning-0.185
 harmless-0.149
its-0.142
attering-0.140
 Buckley-0.139
Token the
Token ID262
Feature activation+1.04e-02

Pos logit contributions
 Cunning+0.012
 harmless+0.010
its+0.010
attering+0.009
iquette+0.009
Neg logit contributions
ESE-0.012
ysis-0.011
res-0.010
SEA-0.010
sta-0.010
Token Pers
Token ID9467
Feature activation+8.76e-02

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
sta+0.006
Neg logit contributions
 Cunning-0.009
 harmless-0.008
its-0.007
iquette-0.007
attering-0.007
Token the
Token ID262
Feature activation+1.94e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
thodox+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token need
Token ID761
Feature activation+2.89e-02

Pos logit contributions
ESE+0.016
ysis+0.015
res+0.014
SEA+0.014
thodox+0.013
Neg logit contributions
 Cunning-0.016
 harmless-0.013
its-0.012
iquette-0.012
attering-0.012
Token to
Token ID284
Feature activation-7.35e-03

Pos logit contributions
ESE+0.025
ysis+0.023
res+0.022
SEA+0.022
sta+0.021
Neg logit contributions
 Cunning-0.023
 harmless-0.018
its-0.017
iquette-0.017
attering-0.017
Token seamlessly
Token ID33681
Feature activation-7.15e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.006
ysis-0.005
res-0.005
SEA-0.005
sta-0.005
Token transition
Token ID6801
Feature activation-1.60e-01

Pos logit contributions
 Cunning+0.061
 harmless+0.049
its+0.047
iquette+0.047
attering+0.047
Neg logit contributions
ESE-0.056
ysis-0.052
res-0.050
SEA-0.049
sta-0.047
Token from
Token ID422
Feature activation+9.46e-04

Pos logit contributions
 Cunning+0.118
 harmless+0.092
its+0.087
iquette+0.086
attering+0.086
Neg logit contributions
ESE-0.145
ysis-0.134
res-0.129
SEA-0.128
sta-0.123
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
ESE+0.001
ysis+0.001
res+0.001
SEA+0.001
thodox+0.001
Neg logit contributions
 Cunning-0.001
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token office
Token ID2607
Feature activation+1.69e-01

Pos logit contributions
ESE+0.016
ysis+0.015
res+0.014
SEA+0.014
thodox+0.013
Neg logit contributions
 Cunning-0.018
 harmless-0.015
its-0.014
iquette-0.014
attering-0.014
Token levels
Token ID2974
Feature activation+3.64e-02

Pos logit contributions
ESE+0.132
ysis+0.121
res+0.116
SEA+0.115
sta+0.109
Neg logit contributions
 Cunning-0.144
 harmless-0.118
its-0.112
iquette-0.111
attering-0.111
Token at
Token ID379
Feature activation+4.22e-02

Pos logit contributions
ESE+0.030
ysis+0.028
res+0.027
SEA+0.027
sta+0.025
Neg logit contributions
 Cunning-0.029
 harmless-0.024
its-0.022
iquette-0.022
attering-0.022
Token the
Token ID262
Feature activation+2.31e-02

Pos logit contributions
ESE+0.035
ysis+0.032
res+0.031
SEA+0.030
sta+0.029
Neg logit contributions
 Cunning-0.035
 harmless-0.028
its-0.026
iquette-0.026
attering-0.026
Token happen
Token ID1645
Feature activation-2.03e-01

Pos logit contributions
ESE+0.106
ysis+0.097
res+0.093
SEA+0.092
sta+0.087
Neg logit contributions
 Cunning-0.120
 harmless-0.097
its-0.093
iquette-0.093
attering-0.093
Token,
Token ID11
Feature activation+1.86e-03

Pos logit contributions
 Cunning+0.167
 harmless+0.135
its+0.128
iquette+0.128
attering+0.127
Neg logit contributions
ESE-0.166
ysis-0.153
res-0.147
SEA-0.146
sta-0.139
Token it
Token ID340
Feature activation-8.16e-02

Pos logit contributions
ESE+0.002
ysis+0.001
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.002
 harmless-0.001
iquette-0.001
its-0.001
attering-0.001
Token is
Token ID318
Feature activation+8.58e-02

Pos logit contributions
 Cunning+0.068
 harmless+0.055
its+0.052
iquette+0.052
attering+0.052
Neg logit contributions
ESE-0.065
ysis-0.060
res-0.058
SEA-0.057
sta-0.055
Token necessary
Token ID3306
Feature activation+1.08e-01

Pos logit contributions
ESE+0.068
ysis+0.062
res+0.060
SEA+0.059
sta+0.056
Neg logit contributions
 Cunning-0.073
 harmless-0.059
its-0.056
iquette-0.056
attering-0.056
Token the
Token ID262
Feature activation+2.04e-02

Pos logit contributions
ESE+0.091
ysis+0.084
res+0.081
SEA+0.080
sta+0.076
Neg logit contributions
 Cunning-0.086
 harmless-0.069
its-0.065
iquette-0.065
attering-0.064
Token EU
Token ID4576
Feature activation+5.45e-02

Pos logit contributions
ESE+0.016
ysis+0.015
res+0.014
SEA+0.014
sta+0.013
Neg logit contributions
 Cunning-0.017
 harmless-0.014
its-0.013
iquette-0.013
attering-0.013
Token to
Token ID284
Feature activation-1.70e-02

Pos logit contributions
ESE+0.045
ysis+0.041
res+0.040
SEA+0.040
sta+0.038
Neg logit contributions
 Cunning-0.044
 harmless-0.036
its-0.034
iquette-0.034
attering-0.034
Token consider
Token ID2074
Feature activation+5.52e-03

Pos logit contributions
 Cunning+0.014
 harmless+0.012
its+0.011
iquette+0.011
attering+0.011
Neg logit contributions
ESE-0.014
ysis-0.012
res-0.012
SEA-0.012
sta-0.011
Token carefully
Token ID7773
Feature activation-1.35e-01

Pos logit contributions
ESE+0.005
ysis+0.004
res+0.004
SEA+0.004
sta+0.004
Neg logit contributions
 Cunning-0.004
 harmless-0.004
its-0.003
iquette-0.003
attering-0.003
Token the
Token ID262
Feature activation+1.70e-02

Pos logit contributions
 Cunning+0.108
 harmless+0.086
iquette+0.082
its+0.081
attering+0.081
Neg logit contributions
ESE-0.113
ysis-0.104
res-0.101
SEA-0.099
sta-0.095
Token of
Token ID286
Feature activation-3.79e-03

Pos logit contributions
 Cunning+0.115
 harmless+0.093
its+0.088
iquette+0.087
attering+0.087
Neg logit contributions
ESE-0.119
ysis-0.110
res-0.106
SEA-0.105
sta-0.100
Token it
Token ID340
Feature activation-8.72e-02

Pos logit contributions
 Cunning+0.003
 harmless+0.002
iquette+0.002
its+0.002
attering+0.002
Neg logit contributions
ESE-0.003
ysis-0.003
res-0.003
SEA-0.003
sta-0.003
Token.
Token ID13
Feature activation-7.79e-03

Pos logit contributions
 Cunning+0.072
 harmless+0.058
its+0.055
iquette+0.054
attering+0.054
Neg logit contributions
ESE-0.071
ysis-0.066
res-0.063
SEA-0.063
sta-0.060
Token I
Token ID314
Feature activation-3.06e-02

Pos logit contributions
 Cunning+0.006
 harmless+0.005
its+0.005
iquette+0.005
attering+0.005
Neg logit contributions
ESE-0.006
ysis-0.006
res-0.006
SEA-0.006
thodox-0.005
Token asked
Token ID1965
Feature activation+1.58e-01

Pos logit contributions
 Cunning+0.027
 harmless+0.022
its+0.021
iquette+0.021
attering+0.021
Neg logit contributions
ESE-0.024
ysis-0.021
SEA-0.020
res-0.020
thodox-0.019
Token life
Token ID1204
Feature activation-2.65e-01

Pos logit contributions
ESE+0.135
ysis+0.125
res+0.121
SEA+0.119
sta+0.114
Neg logit contributions
 Cunning-0.123
 harmless-0.098
its-0.093
iquette-0.093
attering-0.092
Tokenform
Token ID687
Feature activation-1.13e-02

Pos logit contributions
 Cunning+0.203
 harmless+0.162
its+0.153
iquette+0.152
attering+0.151
Neg logit contributions
ESE-0.230
ysis-0.213
res-0.205
SEA-0.204
thodox-0.194
Token7
Token ID22
Feature activation+5.87e-02

Pos logit contributions
 Cunning+0.010
 harmless+0.008
its+0.008
iquette+0.008
attering+0.008
Neg logit contributions
ESE-0.008
ysis-0.008
res-0.007
SEA-0.007
sta-0.007
Token for
Token ID329
Feature activation-2.77e-02

Pos logit contributions
ESE+0.047
ysis+0.043
res+0.042
SEA+0.041
thodox+0.039
Neg logit contributions
 Cunning-0.049
 harmless-0.040
its-0.038
iquette-0.038
attering-0.037
Token their
Token ID511
Feature activation-4.52e-02

Pos logit contributions
 Cunning+0.023
 harmless+0.018
its+0.018
iquette+0.017
 Buckley+0.017
Neg logit contributions
ESE-0.023
ysis-0.021
res-0.020
SEA-0.020
thodox-0.019
Token permission
Token ID7170
Feature activation+2.20e-01

Pos logit contributions
 Cunning+0.039
 harmless+0.032
its+0.030
iquette+0.030
 Buckley+0.030
Neg logit contributions
ESE-0.036
ysis-0.032
res-0.031
SEA-0.031
thodox-0.029
Token this
Token ID428
Feature activation+1.25e-01

Pos logit contributions
 Cunning+0.084
 harmless+0.067
its+0.063
iquette+0.063
attering+0.063
Neg logit contributions
ESE-0.096
ysis-0.089
res-0.086
SEA-0.085
sta-0.081
Token perspective
Token ID6650
Feature activation-3.14e-02

Pos logit contributions
ESE+0.101
ysis+0.093
res+0.090
SEA+0.089
 seeded+0.085
Neg logit contributions
 Cunning-0.102
 harmless-0.083
its-0.079
iquette-0.079
attering-0.078
Token with
Token ID351
Feature activation-6.10e-02

Pos logit contributions
 Cunning+0.025
 harmless+0.020
its+0.019
iquette+0.019
attering+0.019
Neg logit contributions
ESE-0.027
ysis-0.024
res-0.024
SEA-0.023
sta-0.022
Token Gary
Token ID10936
Feature activation-1.95e-01

Pos logit contributions
 Cunning+0.046
 harmless+0.037
its+0.034
iquette+0.034
attering+0.034
Neg logit contributions
ESE-0.054
ysis-0.050
res-0.048
SEA-0.048
sta-0.046
Token during
Token ID1141
Feature activation+1.79e-02

Pos logit contributions
 Cunning+0.159
 harmless+0.128
its+0.122
iquette+0.121
attering+0.120
Neg logit contributions
ESE-0.160
ysis-0.147
res-0.142
SEA-0.141
thodox-0.134
Token our
Token ID674
Feature activation-2.73e-01

Pos logit contributions
ESE+0.016
ysis+0.015
res+0.014
SEA+0.014
sta+0.013
Neg logit contributions
 Cunning-0.013
 harmless-0.011
iquette-0.010
its-0.010
attering-0.010
Token interview
Token ID2720
Feature activation-1.22e-01

Pos logit contributions
 Cunning+0.198
 harmless+0.154
iquette+0.146
its+0.146
attering+0.145
Neg logit contributions
ESE-0.248
ysis-0.230
res-0.224
SEA-0.220
sta-0.211
Token,
Token ID11
Feature activation-6.12e-03

Pos logit contributions
 Cunning+0.100
 harmless+0.081
its+0.076
iquette+0.076
attering+0.076
Neg logit contributions
ESE-0.099
ysis-0.091
res-0.088
SEA-0.087
sta-0.083
Token he
Token ID339
Feature activation+8.47e-02

Pos logit contributions
 Cunning+0.005
 harmless+0.004
its+0.004
iquette+0.004
attering+0.004
Neg logit contributions
ESE-0.005
ysis-0.005
res-0.004
SEA-0.004
sta-0.004
Token ref
Token ID1006
Feature activation+2.03e-01

Pos logit contributions
ESE+0.072
ysis+0.066
res+0.064
SEA+0.063
sta+0.061
Neg logit contributions
 Cunning-0.067
 harmless-0.054
its-0.051
iquette-0.050
attering-0.050
Tokenram
Token ID859
Feature activation+4.29e-01

Pos logit contributions
ESE+0.165
ysis+0.153
res+0.149
SEA+0.145
sta+0.139
Neg logit contributions
 Cunning-0.167
 harmless-0.135
iquette-0.127
its-0.126
 Buckley-0.126
Token While
Token ID2893
Feature activation-1.20e-01

Pos logit contributions
ESE+0.002
ysis+0.002
res+0.001
SEA+0.001
sta+0.001
Neg logit contributions
 Cunning-0.002
 harmless-0.001
its-0.001
iquette-0.001
attering-0.001
Token they
Token ID484
Feature activation+1.53e-01

Pos logit contributions
 Cunning+0.095
 harmless+0.076
iquette+0.072
its+0.072
attering+0.072
Neg logit contributions
ESE-0.101
ysis-0.093
res-0.090
SEA-0.089
sta-0.085
Token are
Token ID389
Feature activation+9.41e-02

Pos logit contributions
ESE+0.123
ysis+0.113
res+0.109
SEA+0.108
sta+0.103
Neg logit contributions
 Cunning-0.128
 harmless-0.104
its-0.099
iquette-0.098
attering-0.098
Token both
Token ID1111
Feature activation-5.47e-02

Pos logit contributions
ESE+0.079
ysis+0.073
res+0.071
SEA+0.070
sta+0.067
Neg logit contributions
 Cunning-0.075
 harmless-0.060
its-0.057
iquette-0.057
attering-0.056
Token Stanford
Token ID13863
Feature activation+9.34e-03

Pos logit contributions
 Cunning+0.044
 harmless+0.035
its+0.033
iquette+0.033
attering+0.033
Neg logit contributions
ESE-0.046
ysis-0.042
res-0.041
SEA-0.040
sta-0.038
Token economists
Token ID16816
Feature activation-2.57e-01

Pos logit contributions
ESE+0.009
ysis+0.008
res+0.008
SEA+0.008
thodox+0.007
Neg logit contributions
 Cunning-0.007
 harmless-0.005
its-0.005
iquette-0.005
attering-0.005
Token,
Token ID11
Feature activation+9.07e-03

Pos logit contributions
 Cunning+0.209
 harmless+0.169
its+0.160
iquette+0.158
attering+0.158
Neg logit contributions
ESE-0.213
ysis-0.197
res-0.188
SEA-0.187
thodox-0.178
Token they
Token ID484
Feature activation+1.58e-01

Pos logit contributions
ESE+0.008
ysis+0.007
res+0.007
SEA+0.007
sta+0.006
Neg logit contributions
 Cunning-0.007
 harmless-0.006
iquette-0.006
its-0.006
attering-0.005
Token are
Token ID389
Feature activation+9.38e-02

Pos logit contributions
ESE+0.126
ysis+0.116
res+0.111
SEA+0.110
sta+0.105
Neg logit contributions
 Cunning-0.132
 harmless-0.108
its-0.102
iquette-0.101
attering-0.101
Token not
Token ID407
Feature activation-6.55e-02

Pos logit contributions
ESE+0.082
ysis+0.076
res+0.074
SEA+0.073
sta+0.070
Neg logit contributions
 Cunning-0.071
 harmless-0.056
its-0.053
iquette-0.053
attering-0.053
Token in
Token ID287
Feature activation-3.28e-03

Pos logit contributions
 Cunning+0.050
 harmless+0.040
iquette+0.037
its+0.037
attering+0.037
Neg logit contributions
ESE-0.057
ysis-0.053
res-0.051
SEA-0.051
sta-0.048
Token for
Token ID329
Feature activation-2.84e-02

Pos logit contributions
 Cunning+0.073
 harmless+0.059
its+0.055
iquette+0.055
attering+0.055
Neg logit contributions
ESE-0.078
ysis-0.072
res-0.069
SEA-0.068
thodox-0.065
Token the
Token ID262
Feature activation+1.20e-02

Pos logit contributions
 Cunning+0.022
 harmless+0.018
its+0.017
iquette+0.017
attering+0.017
Neg logit contributions
ESE-0.024
ysis-0.022
res-0.021
SEA-0.021
sta-0.020
Token rest
Token ID1334
Feature activation-1.15e-01

Pos logit contributions
ESE+0.011
ysis+0.010
res+0.010
SEA+0.009
sta+0.009
Neg logit contributions
 Cunning-0.009
 harmless-0.007
its-0.007
iquette-0.007
attering-0.007
Token of
Token ID286
Feature activation-6.46e-03

Pos logit contributions
 Cunning+0.095
 harmless+0.076
iquette+0.072
its+0.072
attering+0.072
Neg logit contributions
ESE-0.095
ysis-0.087
res-0.084
SEA-0.083
sta-0.079
Token their
Token ID511
Feature activation-5.53e-02

Pos logit contributions
 Cunning+0.005
 harmless+0.004
its+0.004
iquette+0.004
attering+0.004
Neg logit contributions
ESE-0.006
ysis-0.005
res-0.005
SEA-0.005
sta-0.005
Token lives
Token ID3160
Feature activation-2.76e-01

Pos logit contributions
 Cunning+0.027
 harmless+0.018
its+0.016
iquette+0.016
attering+0.016
Neg logit contributions
ESE-0.064
ysis-0.060
res-0.059
SEA-0.058
sta-0.056
Token and
Token ID290
Feature activation-1.73e-02

Pos logit contributions
 Cunning+0.224
 harmless+0.181
its+0.171
iquette+0.170
attering+0.169
Neg logit contributions
ESE-0.230
ysis-0.211
res-0.203
SEA-0.202
thodox-0.192
Token it
Token ID340
Feature activation-9.19e-02

Pos logit contributions
 Cunning+0.014
 harmless+0.011
its+0.010
iquette+0.010
attering+0.010
Neg logit contributions
ESE-0.015
ysis-0.013
res-0.013
SEA-0.013
thodox-0.012
Token may
Token ID743
Feature activation-7.22e-02

Pos logit contributions
 Cunning+0.074
 harmless+0.060
its+0.057
iquette+0.056
attering+0.056
Neg logit contributions
ESE-0.076
ysis-0.070
res-0.068
SEA-0.067
sta-0.064
Token even
Token ID772
Feature activation-3.75e-02

Pos logit contributions
 Cunning+0.057
 harmless+0.045
its+0.043
iquette+0.043
 Buckley+0.043
Neg logit contributions
ESE-0.061
ysis-0.056
res-0.055
SEA-0.054
sta-0.052
Token turn
Token ID1210
Feature activation-1.26e-01

Pos logit contributions
 Cunning+0.031
 harmless+0.025
its+0.024
iquette+0.024
attering+0.024
Neg logit contributions
ESE-0.030
ysis-0.028
res-0.027
SEA-0.026
sta-0.025
Token Reagan
Token ID12823
Feature activation+6.53e-02

Pos logit contributions
 Cunning+0.000
 harmless+0.000
its+0.000
iquette+0.000
attering+0.000
Neg logit contributions
ESE-0.000
ysis-0.000
res-0.000
SEA-0.000
sta-0.000
Token was
Token ID373
Feature activation+5.40e-02

Pos logit contributions
ESE+0.054
ysis+0.049
res+0.048
SEA+0.047
sta+0.045
Neg logit contributions
 Cunning-0.053
 harmless-0.043
its-0.041
iquette-0.040
attering-0.040
Token actually
Token ID1682
Feature activation-8.14e-02

Pos logit contributions
ESE+0.042
ysis+0.038
res+0.037
SEA+0.037
sta+0.035
Neg logit contributions
 Cunning-0.046
 harmless-0.038
its-0.036
iquette-0.036
attering-0.036
Token caught
Token ID4978
Feature activation-1.16e-01

Pos logit contributions
 Cunning+0.071
 harmless+0.058
its+0.056
iquette+0.055
attering+0.055
Neg logit contributions
ESE-0.062
ysis-0.057
res-0.054
SEA-0.054
thodox-0.051
Token giving
Token ID3501
Feature activation-1.33e-01

Pos logit contributions
 Cunning+0.098
 harmless+0.080
its+0.076
attering+0.075
 Buckley+0.075
Neg logit contributions
ESE-0.093
ysis-0.085
res-0.081
SEA-0.081
thodox-0.077
Token weapons
Token ID3777
Feature activation-2.50e-01

Pos logit contributions
 Cunning+0.112
 harmless+0.091
its+0.087
iquette+0.086
attering+0.086
Neg logit contributions
ESE-0.106
ysis-0.097
res-0.093
SEA-0.092
thodox-0.088
Token to
Token ID284
Feature activation-8.45e-03

Pos logit contributions
 Cunning+0.196
 harmless+0.158
its+0.149
attering+0.147
 Buckley+0.147
Neg logit contributions
ESE-0.214
ysis-0.196
res-0.189
SEA-0.188
thodox-0.180
Token the
Token ID262
Feature activation+2.30e-02

Pos logit contributions
 Cunning+0.007
 harmless+0.006
its+0.005
 Buckley+0.005
attering+0.005
Neg logit contributions
ESE-0.007
ysis-0.006
res-0.006
SEA-0.006
thodox-0.006
Token Iranians
Token ID34608
Feature activation+2.43e-01

Pos logit contributions
ESE+0.016
ysis+0.014
SEA+0.013
res+0.013
sta+0.013
Neg logit contributions
 Cunning-0.022
 harmless-0.019
its-0.018
 Buckley-0.018
attering-0.018
Token,
Token ID11
Feature activation+8.98e-03

Pos logit contributions
ESE+0.197
ysis+0.181
res+0.173
SEA+0.173
thodox+0.164
Neg logit contributions
 Cunning-0.201
 harmless-0.163
its-0.155
attering-0.153
iquette-0.152
Token you
Token ID345
Feature activation+3.69e-02

Pos logit contributions
ESE+0.007
ysis+0.007
res+0.006
SEA+0.006
sta+0.006
Neg logit contributions
 Cunning-0.008
 harmless-0.006
its-0.006
iquette-0.006
attering-0.006