TOP ACTIVATIONS
MAX = 1.063

evidence . Sim pl ified , rape by fraud
have somebody add in sn ark y commentary . - February
and save it as a backup copy . You
respondents saying they had a negative opinion of the brothers and
Ass assin s Creed ® Liberation HD , is
engineers who assisted are Steve Mus ko and Steve Rog ack
This shows that the ex oplan et must have a dense
Quake D ude from Explosive !!!!!!!! If you give this product
YouTube spokeswoman . Newsletter Sign Up Continue reading the
images ( args [" images "] ): # load the image
seasons in Denver as a backup and special teams player ,
. I think I was stumbling a little bit , because
David Duke . ( Joe Bur bank / Or lando Sentinel
has re - eval uated his language
offered in MO BI or Plain Text is a perfect format
assembly was started at Sam pl a Roh t ak by
public safety issues . David Dill s final opportunity
to 540 , 000 ; field 36 batt alions of Marines
images ( args [" images "] ): # load the image
these troops out in the field on military operations ad inf

MIN ACTIVATIONS
MIN = -1.061

10 . 13 71 / journal . p one . 00
A - G iants Super bowl - Italian St ere otypes
ates # Occup y The Deb ates pic . twitter .
Memorial , by artist Robert Berks , was erected in her
on the slippery Cleveland Stadium sod , the Rams ran out
). Extreme Un ction . Holy Orders . Mat rim ony
- G ambling - Super bowl Advertising - Aust rail ian
The 2008 clip resur faced after McCain
. Conf irmation . The Holy E uch arist . Pen
l ange , the j umble of floating ice downstream from
), a selective serotonin re upt ake inhibitor ( SS RI
. For more from The Nation , check out our latest
ap uni , a p ē l ā p ū ke
Cup instead of the Super Bowl . But after leaving Kentucky
? Ichigo m umbled to himself as he watched
Mind claw Shaman vs Pyro blast Prim al
at oder ma with variable penet rance , characterized clinically by
and J ansson 's pict orial and descriptive , yet rhy
. For more from The Nation , check out our latest
with little pressing and few penet rative runs .

INTERVAL 0.532 - 1.063
CONTAINS 0.274%

- Pat rices 3 way - Ant s Party in -
. Shop ify POS delivers in - store and
in its own long - term interest . It seized control
attitudes , but it seems rude to send someone a link
the traditional flavors of Vietnamese cuisine in New Orleans . To

INTERVAL 0.001 - 0.532
CONTAINS 48.742%

to accept a complaint . Five community groups sued
We are in the process of reim plement ing our message
The number of lines that are checked can be set with
the camera , causing Aut on av to mis ident ify
of 5 . 37 an over .

INTERVAL -0.530 - 0.001
CONTAINS 50.685%

of coward ice on social media . But what
pres ­ id ­ ent since at least Jimmy Carter ,
were targeted . The cops identified as Joshua E ern
companies were granted retro active immunity for violating the rights and
helped him grow since returning to Europe . He

INTERVAL -1.061 - -0.530
CONTAINS 0.298%

EP uses the stream cipher RC 4 for confidentiality ,[ 9
side question , in an interview with you a few months
all my Nak Mu ay Nation features here .
ath ons to promote j ogging it gets more people
permanently face away . By comparison : Earth 's moon is
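The four INTERVAL boundaries appear to sit at three points: half the maximum (1.063 / 2 ≈ 0.532), just above zero (0.001), and half the minimum (-1.061 / 2 ≈ -0.530). Each CONTAINS figure would then be the share of sampled token activations that fall in that bin. The sketch below reproduces that bookkeeping under this assumed binning. `interval_shares` and its `eps` threshold are illustrative names, not the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def interval_shares(acts: Sequence[float], eps: float = 0.001) -> List[Tuple[float, float, float]]:
    """Return (low, high, percent) for four activation bins, top bin first.

    Assumed binning, inferred from the INTERVAL headers rather than taken from
    any dashboard source: [max/2, max], [eps, max/2), (min/2, eps), [min, min/2].
    Requires min < 0 < max so that the bins do not overlap.
    """
    hi, lo = max(acts), min(acts)
    if not lo < 0 < hi:
        raise ValueError("expects both positive and negative activations")
    counts = [0, 0, 0, 0]
    for a in acts:
        if a >= hi / 2:          # upper half of the positive range
            counts[0] += 1
        elif a >= eps:           # small positive activations
            counts[1] += 1
        elif a > lo / 2:         # near zero or small negative activations
            counts[2] += 1
        else:                    # lower half of the negative range
            counts[3] += 1
    bounds = [(hi / 2, hi), (eps, hi / 2), (lo / 2, eps), (lo, lo / 2)]
    n = len(acts)
    return [(low, high, 100.0 * c / n) for (low, high), c in zip(bounds, counts)]
```

Every activation lands in exactly one bin, so the shares always sum to 100%. The dump's figures give 0.274 + 48.742 + 50.685 + 0.298 = 99.999%, which differs from 100% only by rounding.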

PER-TOKEN DETAILS

Token evidence
Token ID2370
Feature activation+2.05e-01

Pos logit contributions
���+0.308
 THESE+0.261
abouts+0.255
 wrists+0.250
retty+0.249
Neg logit contributions
aith-0.228
ipe-0.220
lus-0.210
 toxicity-0.208
ARA-0.202
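Every per-token entry uses the same layout, in this order:

- The `Token` label fused directly to the token text. A leading space, as in ` evidence`, appears to belong to the token itself.
- The vocabulary ID.
- The feature activation in scientific notation.
- The top five positive and top five negative logit contributions, each written as token text followed by a signed value.

The following is a minimal parsing sketch. It assumes the dump is read as plain-text lines. `TokenRecord` and `parse_dump` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Layout assumed from the dump: "Token<text>", "Token ID<n>",
# "Feature activation<+x.xxe-yy>", then "<token><signed value>" logit lines.
ID_RE = re.compile(r"^Token ID(\d+)$")
ACT_RE = re.compile(r"^Feature activation([+-]\d+(?:\.\d+)?e[+-]\d+)$")
LOGIT_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


@dataclass
class TokenRecord:
    text: str                      # token text; a leading space is kept
    token_id: int = -1
    activation: float = 0.0
    pos_logits: Dict[str, float] = field(default_factory=dict)
    neg_logits: Dict[str, float] = field(default_factory=dict)


def parse_dump(lines: List[str]) -> List[TokenRecord]:
    """Turn consecutive per-token entries into TokenRecord objects."""
    records: List[TokenRecord] = []
    section: Optional[Dict[str, float]] = None  # logit list currently being filled
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank separator lines
        if (m := ID_RE.match(line)):
            records[-1].token_id = int(m.group(1))
        elif (m := ACT_RE.match(line)):
            records[-1].activation = float(m.group(1))
        elif line == "Pos logit contributions":
            section = records[-1].pos_logits
        elif line == "Neg logit contributions":
            section = records[-1].neg_logits
        elif line.startswith("Token"):
            # A new entry starts: everything after the label is the token text.
            records.append(TokenRecord(text=line[len("Token"):]))
            section = None
        elif section is not None and (m := LOGIT_RE.match(line)):
            section[m.group(1)] = float(m.group(2))
    return records
```

After parsing, sorting the records by `activation` surfaces the tokens that fire near the maximum. Examples are `pl` (+1.06e+00) and `ark` (+1.02e+00).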
Token.
Token ID13
Feature activation+2.35e-02

Pos logit contributions
���+0.448
abouts+0.380
 THESE+0.378
 Zup+0.367
retty+0.362
Neg logit contributions
aith-0.361
ipe-0.346
 toxicity-0.336
lus-0.333
ARA-0.325
Token (newline)
Token ID198
Feature activation-8.03e-02

Pos logit contributions
���+0.048
 THESE+0.041
abouts+0.040
 Zup+0.039
retty+0.038
Neg logit contributions
aith-0.045
ipe-0.043
 toxicity-0.043
lus-0.041
ARA-0.040
Token (newline)
Token ID198
Feature activation-2.90e-03

Pos logit contributions
aith+0.184
 toxicity+0.177
ipe+0.174
 sympt+0.168
lus+0.167
Neg logit contributions
 THESE-0.109
abouts-0.108
���-0.102
 Zup-0.101
bable-0.098
TokenSim
Token ID8890
Feature activation-8.18e-02

Pos logit contributions
aith+0.005
 toxicity+0.005
ipe+0.005
lus+0.005
 sympt+0.005
Neg logit contributions
���-0.006
 THESE-0.005
abouts-0.005
retty-0.005
 Zup-0.005
Tokenpl
Token ID489
Feature activation+1.06e+00

Pos logit contributions
aith+0.133
ipe+0.126
 toxicity+0.125
lus+0.120
ARA+0.117
Neg logit contributions
���-0.190
abouts-0.161
 THESE-0.161
retty-0.156
bable-0.156
Tokenified
Token ID1431
Feature activation-1.35e-01

Pos logit contributions
���+1.676
abouts+1.375
 THESE+1.352
retty+1.302
 Zup+1.299
Neg logit contributions
aith-2.407
 toxicity-2.313
ipe-2.308
lus-2.279
ARA-2.242
Token,
Token ID11
Feature activation-5.59e-02

Pos logit contributions
aith+0.243
ipe+0.236
 toxicity+0.225
lus+0.225
ARA+0.218
Neg logit contributions
���-0.294
 THESE-0.246
abouts-0.244
 Zup-0.237
retty-0.236
Token rape
Token ID7262
Feature activation-1.91e-01

Pos logit contributions
aith+0.097
ipe+0.093
lus+0.090
 toxicity+0.090
ARA+0.087
Neg logit contributions
���-0.127
 THESE-0.106
abouts-0.104
 Zup-0.102
retty-0.101
Token by
Token ID416
Feature activation+8.15e-02

Pos logit contributions
aith+0.333
ipe+0.319
 toxicity+0.308
lus+0.307
ARA+0.297
Neg logit contributions
���-0.426
abouts-0.359
 THESE-0.357
retty-0.347
 Zup-0.341
Token fraud
Token ID7394
Feature activation-2.41e-01

Pos logit contributions
���+0.160
abouts+0.129
 THESE+0.129
retty+0.124
 Zup+0.122
Neg logit contributions
aith-0.166
ipe-0.159
 toxicity-0.156
lus-0.154
ARA-0.150
Token have
Token ID423
Feature activation-4.26e-02

Pos logit contributions
���+0.007
 THESE+0.006
abouts+0.006
 peas+0.006
 Zup+0.006
Neg logit contributions
aith-0.006
ipe-0.005
lus-0.005
ARA-0.005
 toxicity-0.005
Token somebody
Token ID8276
Feature activation+1.28e-01

Pos logit contributions
aith+0.066
ipe+0.063
lus+0.060
 toxicity+0.059
ARA+0.059
Neg logit contributions
���-0.101
 THESE-0.089
abouts-0.089
 peas-0.086
retty-0.085
Token add
Token ID751
Feature activation-2.10e-01

Pos logit contributions
���+0.272
 THESE+0.235
abouts+0.235
 wrists+0.224
 peas+0.224
Neg logit contributions
aith-0.229
ipe-0.215
lus-0.209
 toxicity-0.207
ARA-0.206
Token in
Token ID287
Feature activation+7.76e-02

Pos logit contributions
aith+0.337
ipe+0.320
lus+0.310
 toxicity+0.306
ARA+0.301
Neg logit contributions
���-0.494
 THESE-0.427
abouts-0.421
retty-0.407
 peas-0.406
Token sn
Token ID3013
Feature activation+1.88e-04

Pos logit contributions
���+0.182
abouts+0.159
 THESE+0.158
retty+0.151
 Zup+0.150
Neg logit contributions
aith-0.124
ipe-0.117
lus-0.114
 toxicity-0.113
ARA-0.110
Tokenark
Token ID668
Feature activation+1.02e+00

Pos logit contributions
���+0.000
abouts+0.000
 THESE+0.000
retty+0.000
 peas+0.000
Neg logit contributions
aith-0.000
 toxicity-0.000
ipe-0.000
lus-0.000
ARA-0.000
Tokeny
Token ID88
Feature activation+2.93e-01

Pos logit contributions
���+2.088
abouts+1.803
 THESE+1.789
 peas+1.718
retty+1.703
Neg logit contributions
aith-1.866
ipe-1.740
lus-1.708
ARA-1.695
 toxicity-1.688
Token commentary
Token ID14604
Feature activation+1.83e-01

Pos logit contributions
���+0.670
abouts+0.579
 THESE+0.572
 peas+0.557
 wrists+0.550
Neg logit contributions
aith-0.488
ipe-0.464
lus-0.450
ARA-0.436
 toxicity-0.435
Token.
Token ID13
Feature activation+3.29e-02

Pos logit contributions
���+0.391
abouts+0.343
 THESE+0.342
 wrists+0.326
 peas+0.324
Neg logit contributions
aith-0.320
ipe-0.305
lus-0.296
 toxicity-0.294
ARA-0.293
Token -
Token ID532
Feature activation-2.05e-01

Pos logit contributions
���+0.066
 THESE+0.058
abouts+0.057
 Zup+0.055
retty+0.053
Neg logit contributions
aith-0.063
ipe-0.060
 toxicity-0.059
lus-0.058
ARA-0.056
Token February
Token ID3945
Feature activation+1.11e-01

Pos logit contributions
aith+0.469
ipe+0.436
lus+0.430
 toxicity+0.423
 sympt+0.408
Neg logit contributions
���-0.323
 THESE-0.301
abouts-0.290
 wrists-0.282
 Zup-0.277
Token and
Token ID290
Feature activation+6.36e-02

Pos logit contributions
���+0.098
 THESE+0.083
abouts+0.082
retty+0.080
 wrists+0.078
Neg logit contributions
aith-0.085
ipe-0.080
 toxicity-0.080
lus-0.078
ARA-0.076
Token save
Token ID3613
Feature activation+2.61e-01

Pos logit contributions
���+0.138
 THESE+0.119
abouts+0.115
 wrists+0.113
 peas+0.112
Neg logit contributions
aith-0.114
ipe-0.106
lus-0.105
 toxicity-0.105
ARA-0.102
Token it
Token ID340
Feature activation+7.70e-02

Pos logit contributions
���+0.633
 THESE+0.549
abouts+0.543
 wrists+0.524
 peas+0.522
Neg logit contributions
aith-0.400
ipe-0.374
 toxicity-0.369
lus-0.363
ARA-0.346
Token as
Token ID355
Feature activation+4.83e-02

Pos logit contributions
���+0.163
 THESE+0.140
abouts+0.139
 wrists+0.132
retty+0.132
Neg logit contributions
aith-0.140
ipe-0.132
 toxicity-0.132
lus-0.128
ARA-0.126
Token a
Token ID257
Feature activation+1.78e-01

Pos logit contributions
���+0.113
 THESE+0.098
abouts+0.096
 peas+0.093
 Zup+0.092
Neg logit contributions
aith-0.078
ipe-0.074
 toxicity-0.073
lus-0.071
ARA-0.069
Token backup
Token ID11559
Feature activation+1.02e+00

Pos logit contributions
���+0.348
 THESE+0.292
abouts+0.283
retty+0.274
 peas+0.273
Neg logit contributions
aith-0.354
ipe-0.338
lus-0.334
 toxicity-0.333
ARA-0.317
Token copy
Token ID4866
Feature activation-1.04e-01

Pos logit contributions
���+2.095
 THESE+1.824
abouts+1.739
retty+1.704
bable+1.701
Neg logit contributions
aith-1.808
ipe-1.730
 toxicity-1.728
lus-1.714
ARA-1.617
Token.
Token ID13
Feature activation+6.59e-02

Pos logit contributions
aith+0.180
ipe+0.170
 toxicity+0.169
lus+0.166
ARA+0.161
Neg logit contributions
���-0.230
 THESE-0.197
abouts-0.195
retty-0.188
 Zup-0.186
Token You
Token ID921
Feature activation-9.86e-02

Pos logit contributions
���+0.131
 THESE+0.113
abouts+0.109
 Zup+0.105
 wrists+0.104
Neg logit contributions
aith-0.130
ipe-0.123
 toxicity-0.122
lus-0.119
ARA-0.116
Token (UTF-8 fragment of ’)
Token ID447
Feature activation+5.09e-02

Pos logit contributions
aith+0.163
 toxicity+0.154
ipe+0.153
lus+0.150
ARA+0.146
Neg logit contributions
���-0.226
 THESE-0.194
abouts-0.192
 wrists-0.185
 peas-0.185
Token (UTF-8 fragment of ’)
Token ID247
Feature activation-6.60e-02

Pos logit contributions
���+0.071
 THESE+0.057
abouts+0.057
retty+0.054
 peas+0.053
Neg logit contributions
aith-0.126
 toxicity-0.122
ipe-0.121
ARA-0.115
 sympt-0.115
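The entries above with IDs 447 and 247 fall between `You` and `respondents`. The same pair reappears inside `Assassin’s Creed`, which suggests the two tokens are the halves of the right single quotation mark ’ (U+2019, UTF-8 bytes E2 80 99). That reading assumes a GPT-2-style byte-level vocabulary, which is consistent with the IDs in this dump: 13 for `.`, 11 for `,`, and 198 for a newline.

Neither half is valid UTF-8 on its own. It therefore renders as nothing, or presumably as replacement characters like the `���` seen in the logit lists.

The sketch below rebuilds the byte-to-character table used by GPT-2's BPE encoder. It then checks that single-byte ID 247 is byte 0x99 and ID 198 is the newline byte. Two things are assumptions: that the first 256 IDs follow the table order, and that ID 447 is the merged pair E2 80 (rendered `âĢ`).

```python
from typing import Dict, List


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2 style table mapping every byte 0-255 to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:           # control and space bytes get shifted code points
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}
# Assumption: the first 256 vocabulary IDs are the single-byte tokens in table order.
SINGLE_BYTE_IDS: List[int] = list(BYTE_ENCODER)


def token_to_bytes(token: str) -> bytes:
    """Map a byte-level token string back to the raw bytes it stands for."""
    return bytes(BYTE_DECODER[ch] for ch in token)


if __name__ == "__main__":
    first_half = token_to_bytes("\u00e2\u0122")   # "âĢ", assumed text of ID 447
    second_half = bytes([SINGLE_BYTE_IDS[247]])   # ID 247 -> byte 0x99
    joined = first_half + second_half
    print(joined)                                 # b'\xe2\x80\x99'
    print(ascii(joined.decode("utf-8")))          # '\u2019'
```

Decoding `first_half` by itself with `errors="replace"` yields only U+FFFD replacement characters. This is why a tokenizer-level dashboard cannot show readable text for either fragment alone.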
Token respondents
Token ID14502
Feature activation+2.63e-02

Pos logit contributions
aith+0.053
ipe+0.051
 toxicity+0.050
lus+0.049
ARA+0.048
Neg logit contributions
���-0.081
abouts-0.070
 THESE-0.070
retty-0.067
 peas-0.067
Token saying
Token ID2282
Feature activation+9.97e-02

Pos logit contributions
���+0.056
abouts+0.049
 THESE+0.048
 wrists+0.046
retty+0.046
Neg logit contributions
aith-0.047
ipe-0.045
 toxicity-0.044
lus-0.044
ARA-0.042
Token they
Token ID484
Feature activation+1.64e-01

Pos logit contributions
���+0.187
abouts+0.156
 THESE+0.156
retty+0.147
 wrists+0.145
Neg logit contributions
aith-0.207
ipe-0.199
 toxicity-0.193
lus-0.192
ARA-0.187
Token had
Token ID550
Feature activation+1.96e-01

Pos logit contributions
���+0.337
abouts+0.288
 THESE+0.283
 peas+0.268
 wrists+0.267
Neg logit contributions
aith-0.311
ipe-0.301
 toxicity-0.293
lus-0.290
ARA-0.284
Token a
Token ID257
Feature activation+1.65e-01

Pos logit contributions
���+0.414
 THESE+0.353
abouts+0.351
 peas+0.333
 wrists+0.332
Neg logit contributions
aith-0.360
ipe-0.347
lus-0.335
 toxicity-0.330
ARA-0.326
Token negative
Token ID4633
Feature activation+9.99e-01

Pos logit contributions
���+0.370
 THESE+0.316
abouts+0.313
retty+0.305
bable+0.301
Neg logit contributions
aith-0.279
ipe-0.267
lus-0.256
 toxicity-0.255
ARA-0.250
Token opinion
Token ID4459
Feature activation+1.74e-01

Pos logit contributions
���+2.162
 THESE+1.901
abouts+1.850
 wrists+1.825
 peas+1.814
Neg logit contributions
aith-1.650
ipe-1.618
lus-1.540
 toxicity-1.511
ARA-1.507
Token of
Token ID286
Feature activation+1.10e-01

Pos logit contributions
���+0.433
 THESE+0.364
abouts+0.362
retty+0.353
 Zup+0.349
Neg logit contributions
aith-0.263
ipe-0.255
 toxicity-0.249
lus-0.238
 sympt-0.230
Token the
Token ID262
Feature activation+1.15e-01

Pos logit contributions
���+0.249
 THESE+0.214
abouts+0.212
retty+0.202
 wrists+0.201
Neg logit contributions
aith-0.189
ipe-0.179
 toxicity-0.171
lus-0.170
ARA-0.168
Token brothers
Token ID9397
Feature activation+1.05e-01

Pos logit contributions
���+0.246
 THESE+0.211
abouts+0.209
 wrists+0.204
 peas+0.203
Neg logit contributions
aith-0.205
ipe-0.197
ARA-0.188
lus-0.185
 toxicity-0.185
Token and
Token ID290
Feature activation+6.29e-02

Pos logit contributions
���+0.259
abouts+0.226
 THESE+0.225
retty+0.216
 Zup+0.214
Neg logit contributions
aith-0.157
ipe-0.147
 toxicity-0.143
lus-0.141
ARA-0.137
TokenAss
Token ID8021
Feature activation+2.21e-01

Pos logit contributions
���+0.175
abouts+0.152
 THESE+0.150
retty+0.145
bable+0.143
Neg logit contributions
aith-0.151
 toxicity-0.144
ipe-0.144
lus-0.138
 sympt-0.135
Tokenassin
Token ID44961
Feature activation+2.60e-01

Pos logit contributions
���+0.470
 THESE+0.394
abouts+0.391
retty+0.382
 Zup+0.376
Neg logit contributions
aith-0.399
ipe-0.382
 toxicity-0.378
lus-0.367
ARA-0.361
Token (UTF-8 fragment of ’)
Token ID447
Feature activation+8.79e-02

Pos logit contributions
���+0.556
 THESE+0.464
abouts+0.460
retty+0.449
 Zup+0.443
Neg logit contributions
aith-0.475
ipe-0.452
 toxicity-0.442
lus-0.441
ARA-0.424
Token (UTF-8 fragment of ’)
Token ID247
Feature activation+1.31e-01

Pos logit contributions
���+0.172
 THESE+0.137
abouts+0.137
retty+0.132
 Zup+0.129
Neg logit contributions
aith-0.180
ipe-0.173
 toxicity-0.171
lus-0.167
ARA-0.164
Tokens
Token ID82
Feature activation+5.79e-02

Pos logit contributions
���+0.285
 THESE+0.240
abouts+0.238
retty+0.231
 Zup+0.229
Neg logit contributions
aith-0.236
ipe-0.225
 toxicity-0.220
lus-0.215
ARA-0.210
Token Creed
Token ID24299
Feature activation+9.91e-01

Pos logit contributions
���+0.123
 THESE+0.103
abouts+0.102
retty+0.099
 wrists+0.098
Neg logit contributions
aith-0.106
ipe-0.102
 toxicity-0.099
lus-0.098
ARA-0.096
Token®
Token ID7461
Feature activation+4.81e-01

Pos logit contributions
���+1.976
 THESE+1.662
abouts+1.637
retty+1.565
 wrists+1.558
Neg logit contributions
aith-1.916
ipe-1.820
 toxicity-1.765
lus-1.741
ARA-1.704
Token Liberation
Token ID29235
Feature activation+4.40e-01

Pos logit contributions
���+1.000
 THESE+0.861
abouts+0.850
retty+0.811
bable+0.808
Neg logit contributions
aith-0.885
ipe-0.842
 toxicity-0.823
lus-0.808
ARA-0.788
Token HD
Token ID5572
Feature activation+1.76e-01

Pos logit contributions
���+0.933
 THESE+0.791
abouts+0.778
retty+0.753
 wrists+0.746
Neg logit contributions
aith-0.807
ipe-0.766
 toxicity-0.750
lus-0.741
ARA-0.714
Token,
Token ID11
Feature activation+8.75e-02

Pos logit contributions
���+0.383
 THESE+0.319
abouts+0.318
retty+0.309
 Zup+0.305
Neg logit contributions
aith-0.317
ipe-0.301
 toxicity-0.295
lus-0.290
ARA-0.282
Token is
Token ID318
Feature activation+1.75e-01

Pos logit contributions
���+0.194
 THESE+0.164
abouts+0.162
retty+0.157
 Zup+0.156
Neg logit contributions
aith-0.155
ipe-0.146
 toxicity-0.143
lus-0.139
ARA-0.137
Token engineers
Token ID12037
Feature activation+1.02e-01

Pos logit contributions
���+0.927
abouts+0.794
 THESE+0.791
retty+0.771
bable+0.770
Neg logit contributions
aith-0.549
ipe-0.526
 toxicity-0.508
lus-0.500
ARA-0.477
Token who
Token ID508
Feature activation-8.20e-02

Pos logit contributions
���+0.226
abouts+0.193
 THESE+0.192
 Zup+0.188
 wrists+0.185
Neg logit contributions
aith-0.173
ipe-0.169
 toxicity-0.163
lus-0.161
ARA-0.158
Token assisted
Token ID18419
Feature activation-1.10e-01

Pos logit contributions
aith+0.152
ipe+0.147
 toxicity+0.143
lus+0.141
ARA+0.139
Neg logit contributions
���-0.172
abouts-0.145
 THESE-0.144
 Zup-0.137
 peas-0.137
Token are
Token ID389
Feature activation-5.50e-02

Pos logit contributions
aith+0.186
ipe+0.180
 toxicity+0.174
lus+0.170
ARA+0.166
Neg logit contributions
���-0.250
 THESE-0.211
abouts-0.208
retty-0.203
 Zup-0.202
Token Steve
Token ID6542
Feature activation-7.34e-02

Pos logit contributions
aith+0.105
ipe+0.101
 toxicity+0.098
lus+0.098
ARA+0.095
Neg logit contributions
���-0.114
 THESE-0.094
abouts-0.094
retty-0.090
 Zup-0.090
Token Mus
Token ID2629
Feature activation+9.51e-01

Pos logit contributions
aith+0.137
ipe+0.131
 toxicity+0.129
lus+0.126
ARA+0.123
Neg logit contributions
���-0.154
abouts-0.128
 THESE-0.127
 Zup-0.124
retty-0.123
Tokenko
Token ID7204
Feature activation-8.67e-02

Pos logit contributions
���+1.975
abouts+1.779
retty+1.738
 THESE+1.712
bable+1.698
Neg logit contributions
aith-1.615
 toxicity-1.559
ipe-1.535
ARA-1.475
 sympt-1.460
Token and
Token ID290
Feature activation-1.33e-02

Pos logit contributions
aith+0.142
ipe+0.135
 toxicity+0.133
lus+0.130
ARA+0.127
Neg logit contributions
���-0.202
abouts-0.170
 THESE-0.170
retty-0.165
 Zup-0.164
Token Steve
Token ID6542
Feature activation-8.67e-02

Pos logit contributions
aith+0.027
ipe+0.026
 toxicity+0.025
lus+0.025
ARA+0.024
Neg logit contributions
���-0.026
 THESE-0.021
abouts-0.021
 Zup-0.021
retty-0.020
Token Rog
Token ID8041
Feature activation-7.08e-02

Pos logit contributions
aith+0.166
ipe+0.159
 toxicity+0.157
lus+0.153
ARA+0.150
Neg logit contributions
���-0.178
abouts-0.147
 THESE-0.146
 Zup-0.142
retty-0.141
Tokenack
Token ID441
Feature activation-1.12e-02

Pos logit contributions
aith+0.132
ipe+0.126
 toxicity+0.125
lus+0.120
ARA+0.120
Neg logit contributions
���-0.147
abouts-0.124
 THESE-0.122
retty-0.120
bable-0.117
Token This
Token ID770
Feature activation+2.91e-03

Pos logit contributions
aith+0.317
ipe+0.305
 toxicity+0.299
lus+0.292
ARA+0.285
Neg logit contributions
���-0.310
 THESE-0.266
abouts-0.260
 Zup-0.250
retty-0.246
Token shows
Token ID2523
Feature activation-8.27e-02

Pos logit contributions
���+0.007
 THESE+0.006
abouts+0.006
 peas+0.006
 wrists+0.005
Neg logit contributions
aith-0.005
ipe-0.005
lus-0.004
ARA-0.004
 toxicity-0.004
Token that
Token ID326
Feature activation-4.54e-02

Pos logit contributions
aith+0.141
ipe+0.134
lus+0.129
 toxicity+0.127
ARA+0.126
Neg logit contributions
���-0.183
 THESE-0.159
abouts-0.158
 peas-0.153
retty-0.151
Token the
Token ID262
Feature activation-2.63e-02

Pos logit contributions
aith+0.078
ipe+0.074
lus+0.070
 toxicity+0.069
ARA+0.069
Neg logit contributions
���-0.101
 THESE-0.089
abouts-0.088
 peas-0.085
 wrists-0.084
Token ex
Token ID409
Feature activation-6.44e-02

Pos logit contributions
aith+0.047
ipe+0.044
ARA+0.043
lus+0.043
 toxicity+0.042
Neg logit contributions
���-0.056
 THESE-0.049
abouts-0.049
 peas-0.047
 wrists-0.047
Tokenoplan
Token ID46853
Feature activation+9.42e-01

Pos logit contributions
aith+0.151
ipe+0.146
 toxicity+0.145
ARA+0.140
 sympt+0.139
Neg logit contributions
���-0.101
abouts-0.081
 THESE-0.080
retty-0.077
bable-0.075
Tokenet
Token ID316
Feature activation+4.59e-01

Pos logit contributions
���+1.574
abouts+1.300
 THESE+1.269
retty+1.202
 peas+1.180
Neg logit contributions
aith-2.090
ipe-1.973
 toxicity-1.970
lus-1.953
ARA-1.928
Token must
Token ID1276
Feature activation+1.85e-01

Pos logit contributions
���+0.993
 THESE+0.870
abouts+0.868
 peas+0.829
 Zup+0.817
Neg logit contributions
aith-0.803
ipe-0.758
lus-0.734
 toxicity-0.726
ARA-0.716
Token have
Token ID423
Feature activation-1.83e-02

Pos logit contributions
���+0.419
abouts+0.363
 THESE+0.360
 peas+0.344
 Zup+0.343
Neg logit contributions
aith-0.311
ipe-0.295
 toxicity-0.286
lus-0.285
ARA-0.284
Token a
Token ID257
Feature activation-3.36e-03

Pos logit contributions
aith+0.031
ipe+0.029
lus+0.028
ARA+0.027
 toxicity+0.027
Neg logit contributions
���-0.042
abouts-0.036
 THESE-0.036
 peas-0.035
 wrists-0.034
Token dense
Token ID15715
Feature activation-6.89e-02

Pos logit contributions
aith+0.006
ipe+0.006
lus+0.005
ARA+0.005
 toxicity+0.005
Neg logit contributions
���-0.007
 THESE-0.006
abouts-0.006
 wrists-0.006
 peas-0.006
Token Quake
Token ID42901
Feature activation+2.09e-01

Pos logit contributions
���+0.145
 THESE+0.122
abouts+0.121
 Zup+0.117
retty+0.117
Neg logit contributions
aith-0.124
ipe-0.119
 toxicity-0.115
lus-0.114
ARA-0.111
TokenD
Token ID35
Feature activation+6.09e-02

Pos logit contributions
���+0.456
 THESE+0.383
abouts+0.383
retty+0.373
 Zup+0.366
Neg logit contributions
aith-0.362
ipe-0.346
 toxicity-0.344
lus-0.334
 sympt-0.322
Tokenude
Token ID2507
Feature activation+4.98e-02

Pos logit contributions
���+0.132
 THESE+0.111
abouts+0.111
retty+0.108
 Zup+0.106
Neg logit contributions
aith-0.107
ipe-0.102
 toxicity-0.101
lus-0.099
ARA-0.095
Token from
Token ID422
Feature activation-2.65e-01

Pos logit contributions
���+0.109
 THESE+0.092
abouts+0.091
retty+0.089
 Zup+0.087
Neg logit contributions
aith-0.088
ipe-0.084
 toxicity-0.083
lus-0.081
ARA-0.078
Token Explosive
Token ID48178
Feature activation-1.69e-01

Pos logit contributions
aith+0.543
ipe+0.522
 toxicity+0.509
lus+0.503
ARA+0.493
Neg logit contributions
���-0.512
 THESE-0.426
abouts-0.415
retty-0.400
 Zup-0.398
Token!!!!!!!!
Token ID34635
Feature activation+9.31e-01

Pos logit contributions
aith+0.314
ipe+0.301
 toxicity+0.291
lus+0.289
ARA+0.282
Neg logit contributions
���-0.353
 THESE-0.300
abouts-0.294
retty-0.284
 Zup-0.284
Token If
Token ID1002
Feature activation+1.97e-01

Pos logit contributions
���+1.473
 THESE+1.254
abouts+1.148
retty+1.139
 Zup+1.135
Neg logit contributions
aith-2.150
ipe-2.062
 toxicity-2.049
lus-2.017
ARA-1.972
Token you
Token ID345
Feature activation+4.48e-02

Pos logit contributions
���+0.455
 THESE+0.399
abouts+0.389
retty+0.376
 peas+0.375
Neg logit contributions
aith-0.321
ipe-0.306
 toxicity-0.294
lus-0.293
ARA-0.284
Token give
Token ID1577
Feature activation+2.03e-01

Pos logit contributions
���+0.096
 THESE+0.083
abouts+0.082
 Zup+0.079
 peas+0.079
Neg logit contributions
aith-0.080
ipe-0.076
 toxicity-0.074
lus-0.074
ARA-0.073
Token this
Token ID428
Feature activation-4.97e-02

Pos logit contributions
���+0.466
 THESE+0.405
abouts+0.395
 Zup+0.383
 peas+0.379
Neg logit contributions
aith-0.339
ipe-0.322
lus-0.308
 toxicity-0.308
ARA-0.299
Token product
Token ID1720
Feature activation-1.64e-01

Pos logit contributions
aith+0.080
ipe+0.076
 toxicity+0.073
lus+0.073
ARA+0.071
Neg logit contributions
���-0.115
 THESE-0.101
abouts-0.098
 Zup-0.096
 wrists-0.095
Token YouTube
Token ID7444
Feature activation-1.57e-01

Pos logit contributions
aith+0.035
ipe+0.033
 toxicity+0.032
lus+0.032
ARA+0.031
Neg logit contributions
���-0.048
 THESE-0.041
abouts-0.040
 wrists-0.040
retty-0.039
Token spokeswoman
Token ID12262
Feature activation-1.98e-01

Pos logit contributions
aith+0.233
ipe+0.223
 toxicity+0.209
lus+0.208
ARA+0.204
Neg logit contributions
���-0.381
 THESE-0.332
abouts-0.329
 wrists-0.323
retty-0.319
Token.
Token ID13
Feature activation-6.52e-02

Pos logit contributions
aith+0.328
ipe+0.313
 toxicity+0.306
lus+0.297
ARA+0.290
Neg logit contributions
���-0.454
abouts-0.391
 THESE-0.388
retty-0.377
 Zup-0.372
Token (newline)
Token ID198
Feature activation+4.65e-02

Pos logit contributions
aith+0.126
ipe+0.121
 toxicity+0.118
lus+0.115
ARA+0.112
Neg logit contributions
���-0.133
 THESE-0.112
abouts-0.110
 Zup-0.105
retty-0.105
Token (newline)
Token ID198
Feature activation+6.79e-02

Pos logit contributions
 THESE+0.063
abouts+0.062
���+0.060
 Zup+0.059
 Shining+0.057
Neg logit contributions
aith-0.106
 toxicity-0.102
ipe-0.100
 sympt-0.097
lus-0.097
TokenNewsletter
Token ID33031
Feature activation+9.28e-01

Pos logit contributions
���+0.139
abouts+0.123
 THESE+0.122
retty+0.116
bable+0.116
Neg logit contributions
aith-0.122
 toxicity-0.117
ipe-0.117
lus-0.110
 sympt-0.109
Token Sign
Token ID5865
Feature activation+6.33e-01

Pos logit contributions
���+1.117
 THESE+1.104
abouts+1.040
 Zup+1.014
 wrists+0.992
Neg logit contributions
aith-2.283
ipe-2.179
 toxicity-2.132
lus-2.097
ARA-2.084
Token Up
Token ID3205
Feature activation+3.28e-01

Pos logit contributions
���+1.235
 THESE+1.149
abouts+1.114
 Zup+1.098
 Pagan+1.039
Neg logit contributions
aith-1.145
 toxicity-1.115
ipe-1.083
lus-1.056
 sympt-1.047
Token Continue
Token ID10054
Feature activation-9.09e-03

Pos logit contributions
���+0.777
 THESE+0.740
abouts+0.708
 Zup+0.707
 Pagan+0.684
Neg logit contributions
aith-0.458
 toxicity-0.430
ipe-0.430
lus-0.416
 sympt-0.388
Token reading
Token ID3555
Feature activation+1.54e-02

Pos logit contributions
aith+0.016
ipe+0.015
 toxicity+0.015
lus+0.015
ARA+0.014
Neg logit contributions
���-0.019
 THESE-0.017
abouts-0.017
 Zup-0.016
 wrists-0.016
Token the
Token ID262
Feature activation-1.58e-01

Pos logit contributions
���+0.032
 THESE+0.030
abouts+0.029
 Zup+0.028
 peas+0.028
Neg logit contributions
aith-0.026
ipe-0.025
 toxicity-0.024
lus-0.023
ARA-0.023
Tokenimages
Token ID17566
Feature activation+3.43e-01

Pos logit contributions
���+0.264
abouts+0.223
 THESE+0.218
retty+0.212
bable+0.211
Neg logit contributions
aith-0.241
 toxicity-0.229
ipe-0.227
lus-0.222
ARA-0.217
Token(
Token ID7
Feature activation+1.29e-01

Pos logit contributions
���+0.625
 THESE+0.506
abouts+0.506
retty+0.484
bable+0.474
Neg logit contributions
aith-0.733
ipe-0.701
 toxicity-0.694
lus-0.682
ARA-0.670
Tokenargs
Token ID22046
Feature activation+4.52e-02

Pos logit contributions
���+0.265
abouts+0.219
 THESE+0.217
retty+0.211
bable+0.206
Neg logit contributions
aith-0.248
ipe-0.236
 toxicity-0.234
lus-0.229
ARA-0.224
Token["
Token ID14692
Feature activation+5.62e-01

Pos logit contributions
���+0.098
abouts+0.082
 THESE+0.082
retty+0.080
 Zup+0.079
Neg logit contributions
aith-0.080
ipe-0.076
 toxicity-0.075
lus-0.074
ARA-0.072
Tokenimages
Token ID17566
Feature activation+3.89e-01

Pos logit contributions
���+1.160
abouts+0.989
 THESE+0.969
retty+0.943
bable+0.932
Neg logit contributions
aith-1.052
 toxicity-0.996
ipe-0.987
lus-0.957
ARA-0.933
Token"]
Token ID8973
Feature activation+9.25e-01

Pos logit contributions
���+0.766
abouts+0.621
 THESE+0.620
retty+0.598
 Zup+0.587
Neg logit contributions
aith-0.784
ipe-0.755
 toxicity-0.739
lus-0.728
ARA-0.711
Token):
Token ID2599
Feature activation-1.30e-01

Pos logit contributions
���+1.925
 THESE+1.617
abouts+1.607
retty+1.569
 Zup+1.542
Neg logit contributions
aith-1.710
ipe-1.614
 toxicity-1.589
lus-1.580
ARA-1.530
Token #
Token ID1303
Feature activation-7.60e-02

Pos logit contributions
aith+0.217
ipe+0.202
 toxicity+0.198
lus+0.196
ARA+0.192
Neg logit contributions
���-0.300
 THESE-0.257
abouts-0.254
retty-0.245
 peas-0.245
Token load
Token ID3440
Feature activation-2.17e-01

Pos logit contributions
aith+0.136
ipe+0.127
 toxicity+0.126
lus+0.123
ARA+0.120
Neg logit contributions
���-0.167
 THESE-0.143
abouts-0.139
 Zup-0.135
retty-0.135
Token the
Token ID262
Feature activation+1.14e-01

Pos logit contributions
aith+0.365
ipe+0.346
 toxicity+0.338
lus+0.333
ARA+0.320
Neg logit contributions
���-0.500
 THESE-0.424
abouts-0.421
retty-0.405
 Zup-0.404
Token image
Token ID2939
Feature activation-4.97e-03

Pos logit contributions
���+0.249
 THESE+0.212
abouts+0.209
 peas+0.202
 Zup+0.202
Neg logit contributions
aith-0.207
ipe-0.195
 toxicity-0.189
lus-0.189
ARA-0.182
Token seasons
Token ID7028
Feature activation+2.41e-01

Pos logit contributions
aith+0.226
ipe+0.214
 toxicity+0.206
lus+0.204
ARA+0.198
Neg logit contributions
���-0.319
abouts-0.270
 THESE-0.270
 wrists-0.260
retty-0.259
Token in
Token ID287
Feature activation+2.59e-02

Pos logit contributions
���+0.501
abouts+0.420
 THESE+0.417
retty+0.398
 wrists+0.396
Neg logit contributions
aith-0.455
ipe-0.435
 toxicity-0.423
lus-0.417
ARA-0.409
Token Denver
Token ID10656
Feature activation+8.81e-02

Pos logit contributions
���+0.057
 THESE+0.047
abouts+0.046
 wrists+0.046
 Zup+0.046
Neg logit contributions
aith-0.047
ipe-0.044
 toxicity-0.042
lus-0.042
ARA-0.041
Token as
Token ID355
Feature activation-5.42e-05

Pos logit contributions
���+0.202
abouts+0.173
 THESE+0.171
retty+0.166
 Zup+0.163
Neg logit contributions
aith-0.149
ipe-0.140
 toxicity-0.136
lus-0.135
ARA-0.129
Token a
Token ID257
Feature activation+1.61e-01

Pos logit contributions
aith+0.000
ipe+0.000
 toxicity+0.000
lus+0.000
ARA+0.000
Neg logit contributions
���-0.000
 THESE-0.000
abouts-0.000
retty-0.000
 Zup-0.000
Token backup
Token ID11559
Feature activation+9.16e-01

Pos logit contributions
���+0.341
 THESE+0.286
abouts+0.282
 wrists+0.276
retty+0.274
Neg logit contributions
aith-0.299
ipe-0.281
 toxicity-0.273
lus-0.267
ARA-0.267
Token and
Token ID290
Feature activation+7.03e-03

Pos logit contributions
���+2.009
 THESE+1.723
abouts+1.715
retty+1.676
 wrists+1.667
Neg logit contributions
aith-1.531
ipe-1.443
 toxicity-1.407
lus-1.378
ARA-1.362
Token special
Token ID2041
Feature activation+3.48e-01

Pos logit contributions
���+0.016
abouts+0.013
 THESE+0.013
 wrists+0.013
retty+0.013
Neg logit contributions
aith-0.012
ipe-0.012
 toxicity-0.011
lus-0.011
ARA-0.011
Token teams
Token ID3466
Feature activation+2.04e-01

Pos logit contributions
���+0.862
abouts+0.765
 THESE+0.755
 wrists+0.748
 peas+0.731
Neg logit contributions
aith-0.497
ipe-0.468
lus-0.446
ARA-0.435
 toxicity-0.429
Token player
Token ID2137
Feature activation-2.57e-01

Pos logit contributions
���+0.470
 THESE+0.403
abouts+0.403
retty+0.396
 wrists+0.391
Neg logit contributions
aith-0.329
ipe-0.312
 toxicity-0.306
lus-0.296
ARA-0.295
Token,
Token ID11
Feature activation+1.54e-02

Pos logit contributions
aith+0.400
ipe+0.379
 toxicity+0.370
lus+0.361
ARA+0.351
Neg logit contributions
���-0.622
abouts-0.534
 THESE-0.529
retty-0.515
 Zup-0.510
Token.
Token ID13
Feature activation+8.87e-02

Pos logit contributions
aith+0.145
ipe+0.137
 toxicity+0.133
lus+0.132
ARA+0.130
Neg logit contributions
���-0.196
abouts-0.173
 THESE-0.171
retty-0.164
 wrists-0.162
Token I
Token ID314
Feature activation+9.43e-02

Pos logit contributions
���+0.165
 THESE+0.148
abouts+0.143
 Zup+0.140
 Shining+0.136
Neg logit contributions
aith-0.181
ipe-0.172
 toxicity-0.170
lus-0.165
ARA-0.160
Token think
Token ID892
Feature activation-6.02e-02

Pos logit contributions
���+0.197
abouts+0.168
 THESE+0.166
 wrists+0.159
 peas+0.159
Neg logit contributions
aith-0.171
ipe-0.164
 toxicity-0.162
ARA-0.158
lus-0.157
Token I
Token ID314
Feature activation+1.05e-01

Pos logit contributions
aith+0.103
ipe+0.097
 toxicity+0.094
lus+0.093
ARA+0.092
Neg logit contributions
���-0.134
 THESE-0.116
abouts-0.116
retty-0.111
 peas-0.110
Token was
Token ID373
Feature activation+1.31e-01

Pos logit contributions
���+0.222
abouts+0.192
 THESE+0.188
 peas+0.182
 wrists+0.182
Neg logit contributions
aith-0.186
ipe-0.177
 toxicity-0.176
ARA-0.173
lus-0.172
Token stumbling
Token ID37829
Feature activation+9.13e-01

Pos logit contributions
���+0.275
abouts+0.235
 THESE+0.231
 wrists+0.225
 peas+0.224
Neg logit contributions
aith-0.240
ipe-0.231
 toxicity-0.224
lus-0.221
ARA-0.220
Token a
Token ID257
Feature activation+1.52e-01

Pos logit contributions
���+1.790
abouts+1.589
 THESE+1.575
 wrists+1.502
 peas+1.498
Neg logit contributions
aith-1.714
ipe-1.625
 toxicity-1.593
lus-1.585
ARA-1.583
Token little
Token ID1310
Feature activation-2.92e-01

Pos logit contributions
���+0.330
abouts+0.280
 THESE+0.279
retty+0.272
 peas+0.269
Neg logit contributions
aith-0.270
ipe-0.254
 toxicity-0.250
lus-0.246
ARA-0.241
Token bit
Token ID1643
Feature activation+1.59e-02

Pos logit contributions
aith+0.568
ipe+0.541
 toxicity+0.521
lus+0.520
ARA+0.512
Neg logit contributions
���-0.585
abouts-0.501
 THESE-0.500
 wrists-0.478
 Zup-0.475
Token,
Token ID11
Feature activation+1.22e-01

Pos logit contributions
���+0.032
abouts+0.028
 THESE+0.028
 wrists+0.027
 Zup+0.027
Neg logit contributions
aith-0.030
ipe-0.029
 toxicity-0.028
lus-0.027
ARA-0.027
Token because
Token ID780
Feature activation+3.61e-01

Pos logit contributions
���+0.254
 THESE+0.218
abouts+0.217
 wrists+0.209
retty+0.208
Neg logit contributions
aith-0.225
ipe-0.215
lus-0.208
 toxicity-0.207
ARA-0.203
Token David
Token ID3271
Feature activation-1.00e-01

Pos logit contributions
���+0.148
 THESE+0.124
abouts+0.123
retty+0.117
 wrists+0.117
Neg logit contributions
aith-0.136
ipe-0.131
 toxicity-0.127
lus-0.126
ARA-0.123
Token Duke
Token ID11083
Feature activation+4.08e-01

Pos logit contributions
aith+0.172
ipe+0.164
 toxicity+0.160
lus+0.158
ARA+0.153
Neg logit contributions
���-0.227
 THESE-0.190
abouts-0.190
retty-0.183
 Zup-0.182
Token.
Token ID13
Feature activation-2.43e-02

Pos logit contributions
���+0.900
abouts+0.774
 THESE+0.770
retty+0.733
 Zup+0.730
Neg logit contributions
aith-0.692
ipe-0.665
 toxicity-0.643
lus-0.635
ARA-0.629
Token (
Token ID357
Feature activation-2.12e-02

Pos logit contributions
aith+0.048
ipe+0.046
 toxicity+0.045
lus+0.044
ARA+0.043
Neg logit contributions
���-0.047
 THESE-0.042
abouts-0.041
 Zup-0.039
retty-0.038
TokenJoe
Token ID19585
Feature activation+9.25e-03

Pos logit contributions
aith+0.041
ipe+0.039
 toxicity+0.039
lus+0.038
 sympt+0.037
Neg logit contributions
���-0.043
 THESE-0.036
abouts-0.035
retty-0.034
 Zup-0.034
Token Bur
Token ID5481
Feature activation+9.00e-01

Pos logit contributions
���+0.018
 THESE+0.014
abouts+0.014
 Zup+0.014
retty+0.014
Neg logit contributions
aith-0.019
ipe-0.018
 toxicity-0.018
lus-0.018
ARA-0.017
Tokenbank
Token ID17796
Feature activation-3.75e-01

Pos logit contributions
���+1.369
 THESE+1.092
abouts+1.085
retty+1.063
bable+1.039
Neg logit contributions
aith-2.106
 toxicity-2.078
ipe-2.032
ARA-1.979
 sympt-1.973
Token/
Token ID14
Feature activation+7.66e-02

Pos logit contributions
aith+0.695
ipe+0.667
 toxicity+0.654
lus+0.640
ARA+0.624
Neg logit contributions
���-0.796
 THESE-0.661
abouts-0.656
retty-0.635
 Zup-0.627
TokenOr
Token ID5574
Feature activation+1.76e-02

Pos logit contributions
���+0.182
 THESE+0.156
abouts+0.155
retty+0.150
bable+0.148
Neg logit contributions
aith-0.121
ipe-0.116
 toxicity-0.113
lus-0.109
 sympt-0.104
Tokenlando
Token ID11993
Feature activation+2.88e-01

Pos logit contributions
���+0.029
abouts+0.024
 THESE+0.024
retty+0.023
bable+0.022
Neg logit contributions
aith-0.039
ipe-0.038
 toxicity-0.038
lus-0.036
ARA-0.036
Token Sentinel
Token ID26716
Feature activation-2.71e-01

Pos logit contributions
���+0.670
 THESE+0.585
abouts+0.564
 Zup+0.548
 wrists+0.542
Neg logit contributions
aith-0.453
ipe-0.440
lus-0.433
 toxicity-0.433
 sympt-0.411
Token has
Token ID468
Feature activation+1.02e-01

Pos logit contributions
���+0.266
abouts+0.229
 THESE+0.228
 wrists+0.220
 peas+0.218
Neg logit contributions
aith-0.199
ipe-0.194
 toxicity-0.186
lus-0.185
ARA-0.182
Token
Token ID564
Feature activation+1.55e-01

Pos logit contributions
���+0.217
 THESE+0.184
abouts+0.183
 wrists+0.178
 peas+0.177
Neg logit contributions
aith-0.185
ipe-0.180
lus-0.171
 toxicity-0.169
ARA-0.168
Token
Token ID250
Feature activation+2.58e-01

Pos logit contributions
���+0.302
 THESE+0.245
abouts+0.243
bable+0.240
retty+0.231
Neg logit contributions
aith-0.306
 toxicity-0.302
ipe-0.297
ARA-0.286
lus-0.283
Tokenre
Token ID260
Feature activation-6.90e-02

Pos logit contributions
���+0.537
abouts+0.462
 THESE+0.452
bable+0.441
retty+0.438
Neg logit contributions
aith-0.467
ipe-0.451
 toxicity-0.449
lus-0.431
ARA-0.428
Token-
Token ID12
Feature activation+1.84e-02

Pos logit contributions
aith+0.138
ipe+0.133
 toxicity+0.133
lus+0.128
ARA+0.127
Neg logit contributions
���-0.135
abouts-0.112
 THESE-0.110
bable-0.105
retty-0.105
Tokeneval
Token ID18206
Feature activation+8.92e-01

Pos logit contributions
���+0.037
abouts+0.032
 THESE+0.031
bable+0.030
retty+0.030
Neg logit contributions
aith-0.035
ipe-0.034
 toxicity-0.034
lus-0.032
ARA-0.032
Tokenuated
Token ID6605
Feature activation+1.95e-01

Pos logit contributions
���+1.373
 THESE+1.061
abouts+1.051
bable+1.006
retty+0.992
Neg logit contributions
aith-2.150
ipe-2.074
 toxicity-2.068
lus-2.006
ARA-1.996
Token
Token ID447
Feature activation+8.85e-02

Pos logit contributions
���+0.421
 THESE+0.356
abouts+0.354
retty+0.335
 wrists+0.335
Neg logit contributions
aith-0.350
ipe-0.338
lus-0.324
 toxicity-0.324
ARA-0.315
Token
Token ID251
Feature activation+1.94e-01

Pos logit contributions
���+0.161
 THESE+0.125
abouts+0.123
bable+0.120
retty+0.116
Neg logit contributions
aith-0.195
ipe-0.190
 toxicity-0.186
lus-0.180
ARA-0.179
Token his
Token ID465
Feature activation+3.20e-02

Pos logit contributions
���+0.437
abouts+0.374
 THESE+0.370
retty+0.357
bable+0.352
Neg logit contributions
aith-0.330
ipe-0.316
 toxicity-0.310
lus-0.301
ARA-0.298
Token language
Token ID3303
Feature activation+3.52e-01

Pos logit contributions
���+0.072
 THESE+0.062
abouts+0.061
 wrists+0.060
 peas+0.059
Neg logit contributions
aith-0.054
ipe-0.052
lus-0.049
 toxicity-0.048
ARA-0.048
Token offered
Token ID4438
Feature activation+6.52e-02

Pos logit contributions
���+0.637
 THESE+0.545
abouts+0.535
 Zup+0.524
retty+0.513
Neg logit contributions
aith-0.505
ipe-0.478
 toxicity-0.470
lus-0.463
ARA-0.458
Token in
Token ID287
Feature activation-5.66e-02

Pos logit contributions
���+0.133
 THESE+0.114
abouts+0.114
 Zup+0.108
 wrists+0.108
Neg logit contributions
aith-0.122
ipe-0.118
 toxicity-0.115
lus-0.114
ARA-0.111
Token MO
Token ID13070
Feature activation+2.56e-01

Pos logit contributions
aith+0.098
ipe+0.094
 toxicity+0.091
lus+0.090
ARA+0.087
Neg logit contributions
���-0.125
 THESE-0.108
abouts-0.105
 Zup-0.102
 peas-0.101
TokenBI
Token ID3483
Feature activation+4.79e-01

Pos logit contributions
���+0.595
 THESE+0.508
abouts+0.501
bable+0.488
retty+0.485
Neg logit contributions
aith-0.417
ipe-0.400
 toxicity-0.391
lus-0.383
ARA-0.363
Token or
Token ID393
Feature activation+4.50e-02

Pos logit contributions
���+1.049
 THESE+0.898
abouts+0.876
retty+0.858
bable+0.850
Neg logit contributions
aith-0.834
ipe-0.794
 toxicity-0.781
lus-0.755
 sympt-0.734
Token Plain
Token ID28847
Feature activation+8.88e-01

Pos logit contributions
���+0.095
 THESE+0.082
abouts+0.080
 Zup+0.077
retty+0.075
Neg logit contributions
aith-0.084
ipe-0.080
 toxicity-0.077
lus-0.077
ARA-0.074
Token Text
Token ID8255
Feature activation-8.72e-02

Pos logit contributions
���+1.699
 THESE+1.448
abouts+1.442
retty+1.379
bable+1.324
Neg logit contributions
aith-1.757
ipe-1.706
 toxicity-1.695
lus-1.645
 sympt-1.582
Token is
Token ID318
Feature activation+3.62e-02

Pos logit contributions
aith+0.154
ipe+0.149
 toxicity+0.144
lus+0.143
ARA+0.136
Neg logit contributions
���-0.192
 THESE-0.162
abouts-0.161
 Zup-0.153
 wrists-0.153
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
���+0.077
 THESE+0.064
abouts+0.063
 Zup+0.061
retty+0.060
Neg logit contributions
aith-0.067
ipe-0.064
 toxicity-0.063
lus-0.062
ARA-0.060
Token perfect
Token ID2818
Feature activation-1.16e-01

Pos logit contributions
���+0.028
 THESE+0.023
abouts+0.023
retty+0.022
 Zup+0.022
Neg logit contributions
aith-0.026
ipe-0.025
 toxicity-0.024
lus-0.024
ARA-0.023
Token format
Token ID5794
Feature activation-2.49e-02

Pos logit contributions
aith+0.213
ipe+0.207
 toxicity+0.203
lus+0.200
ARA+0.192
Neg logit contributions
���-0.243
 THESE-0.206
abouts-0.203
 Zup-0.197
 wrists-0.194
Token assembly
Token ID10474
Feature activation+1.20e-01

Pos logit contributions
aith+0.576
ipe+0.560
lus+0.539
 toxicity+0.533
ARA+0.521
Neg logit contributions
���-0.766
 THESE-0.656
abouts-0.656
 Zup-0.635
 wrists-0.631
Token was
Token ID373
Feature activation+4.20e-02

Pos logit contributions
���+0.259
 THESE+0.217
abouts+0.216
 Zup+0.207
bable+0.206
Neg logit contributions
aith-0.213
ipe-0.206
 toxicity-0.201
lus-0.199
ARA-0.194
Token started
Token ID2067
Feature activation+4.06e-03

Pos logit contributions
���+0.091
abouts+0.077
 THESE+0.075
 peas+0.072
bable+0.072
Neg logit contributions
aith-0.075
ipe-0.073
 toxicity-0.072
lus-0.070
NL-0.068
Token at
Token ID379
Feature activation-8.36e-02

Pos logit contributions
���+0.008
abouts+0.007
 THESE+0.006
 Zup+0.006
retty+0.006
Neg logit contributions
aith-0.008
ipe-0.008
 toxicity-0.008
lus-0.008
ARA-0.007
Token Sam
Token ID3409
Feature activation+3.57e-01

Pos logit contributions
aith+0.143
ipe+0.137
 toxicity+0.134
lus+0.132
ARA+0.129
Neg logit contributions
���-0.186
abouts-0.160
 THESE-0.158
 Zup-0.154
 peas-0.151
Tokenpl
Token ID489
Feature activation+8.86e-01

Pos logit contributions
���+0.783
abouts+0.684
 THESE+0.663
retty+0.650
 Zup+0.645
Neg logit contributions
aith-0.591
ipe-0.577
 toxicity-0.576
lus-0.555
ARA-0.543
Tokena
Token ID64
Feature activation-1.73e-01

Pos logit contributions
���+1.908
abouts+1.695
 THESE+1.677
retty+1.648
 Zup+1.611
Neg logit contributions
aith-1.423
 toxicity-1.386
ipe-1.369
lus-1.360
 sympt-1.314
Token Roh
Token ID32694
Feature activation-7.23e-02

Pos logit contributions
aith+0.306
ipe+0.294
 toxicity+0.292
lus+0.288
ARA+0.278
Neg logit contributions
���-0.374
abouts-0.316
 THESE-0.312
retty-0.302
 Zup-0.301
Tokent
Token ID83
Feature activation+8.09e-02

Pos logit contributions
aith+0.114
 toxicity+0.108
ipe+0.107
lus+0.103
ARA+0.101
Neg logit contributions
���-0.170
abouts-0.147
 THESE-0.145
retty-0.141
 Zup-0.140
Tokenak
Token ID461
Feature activation+1.55e-01

Pos logit contributions
���+0.178
abouts+0.154
 THESE+0.151
retty+0.147
 Zup+0.145
Neg logit contributions
aith-0.137
 toxicity-0.132
ipe-0.131
lus-0.127
ARA-0.123
Token by
Token ID416
Feature activation+1.26e-03

Pos logit contributions
���+0.342
abouts+0.292
 THESE+0.286
retty+0.278
 Zup+0.277
Neg logit contributions
aith-0.262
ipe-0.253
 toxicity-0.251
lus-0.245
ARA-0.238
Token public
Token ID1171
Feature activation+2.18e-01

Pos logit contributions
���+0.149
 THESE+0.129
abouts+0.128
 peas+0.123
 Zup+0.122
Neg logit contributions
aith-0.112
ipe-0.107
lus-0.104
 toxicity-0.101
ARA-0.100
Token safety
Token ID3747
Feature activation-3.05e-01

Pos logit contributions
���+0.519
 THESE+0.452
abouts+0.445
 peas+0.433
 wrists+0.433
Neg logit contributions
aith-0.330
ipe-0.313
lus-0.302
ARA-0.295
 toxicity-0.287
Token issues
Token ID2428
Feature activation-2.45e-01

Pos logit contributions
aith+0.470
ipe+0.450
lus+0.428
 toxicity+0.417
ARA+0.417
Neg logit contributions
���-0.730
abouts-0.632
 THESE-0.632
 peas-0.607
 wrists-0.605
Token.
Token ID13
Feature activation+7.67e-02

Pos logit contributions
aith+0.416
ipe+0.397
 toxicity+0.384
lus+0.383
ARA+0.370
Neg logit contributions
���-0.551
abouts-0.477
 THESE-0.474
retty-0.451
 Zup-0.449
Token David
Token ID3271
Feature activation+7.36e-03

Pos logit contributions
���+0.150
 THESE+0.132
abouts+0.128
 Zup+0.123
retty+0.121
Neg logit contributions
aith-0.150
ipe-0.144
 toxicity-0.141
lus-0.138
ARA-0.133
Token Dill
Token ID44322
Feature activation+8.82e-01

Pos logit contributions
���+0.016
 THESE+0.013
abouts+0.013
 Zup+0.013
 wrists+0.013
Neg logit contributions
aith-0.013
ipe-0.013
 toxicity-0.012
lus-0.012
ARA-0.012
Token
Token ID447
Feature activation-6.00e-02

Pos logit contributions
���+1.837
abouts+1.589
 THESE+1.587
 wrists+1.516
retty+1.502
Neg logit contributions
aith-1.557
ipe-1.482
 toxicity-1.456
lus-1.437
ARA-1.398
Token
Token ID247
Feature activation+4.10e-02

Pos logit contributions
aith+0.124
ipe+0.120
 toxicity+0.117
lus+0.115
ARA+0.113
Neg logit contributions
���-0.115
abouts-0.093
 THESE-0.093
retty-0.089
 Zup-0.088
Tokens
Token ID82
Feature activation-4.02e-02

Pos logit contributions
���+0.092
abouts+0.079
 THESE+0.079
retty+0.076
bable+0.075
Neg logit contributions
aith-0.069
ipe-0.066
 toxicity-0.065
lus-0.063
ARA-0.062
Token final
Token ID2457
Feature activation+2.05e-01

Pos logit contributions
aith+0.069
ipe+0.066
lus+0.063
 toxicity+0.063
ARA+0.062
Neg logit contributions
���-0.089
 THESE-0.077
abouts-0.076
retty-0.073
 wrists-0.073
Token opportunity
Token ID3663
Feature activation+3.65e-01

Pos logit contributions
���+0.449
 THESE+0.386
abouts+0.385
 Zup+0.373
 wrists+0.373
Neg logit contributions
aith-0.351
ipe-0.333
lus-0.322
 toxicity-0.320
ARA-0.314
Token to
Token ID284
Feature activation-5.01e-03

Pos logit contributions
aith+0.103
ipe+0.098
 toxicity+0.096
lus+0.095
ARA+0.092
Neg logit contributions
���-0.126
abouts-0.107
 THESE-0.107
retty-0.103
 Zup-0.102
Token 540
Token ID38190
Feature activation-1.38e-01

Pos logit contributions
aith+0.009
ipe+0.009
 toxicity+0.008
lus+0.008
ARA+0.008
Neg logit contributions
���-0.011
abouts-0.009
 THESE-0.009
 Zup-0.009
 peas-0.009
Token,
Token ID11
Feature activation+9.94e-03

Pos logit contributions
aith+0.211
ipe+0.198
 toxicity+0.194
lus+0.192
ARA+0.182
Neg logit contributions
���-0.337
 THESE-0.290
abouts-0.288
retty-0.284
bable-0.279
Token000
Token ID830
Feature activation+6.83e-02

Pos logit contributions
���+0.020
 THESE+0.016
abouts+0.016
retty+0.016
 Zup+0.015
Neg logit contributions
aith-0.020
ipe-0.019
 toxicity-0.019
lus-0.018
ARA-0.018
Token;
Token ID26
Feature activation+1.25e-01

Pos logit contributions
���+0.152
 THESE+0.130
abouts+0.129
retty+0.126
 peas+0.124
Neg logit contributions
aith-0.118
ipe-0.112
 toxicity-0.110
lus-0.109
ARA-0.104
Token field
Token ID2214
Feature activation+8.71e-01

Pos logit contributions
���+0.279
 THESE+0.235
abouts+0.233
retty+0.226
 Zup+0.226
Neg logit contributions
aith-0.218
ipe-0.207
 toxicity-0.202
lus-0.200
ARA-0.193
Token 36
Token ID4570
Feature activation+2.00e-01

Pos logit contributions
���+1.908
abouts+1.618
 THESE+1.589
retty+1.555
 peas+1.547
Neg logit contributions
aith-1.528
ipe-1.462
lus-1.441
 toxicity-1.419
ARA-1.388
Token batt
Token ID6274
Feature activation+3.07e-02

Pos logit contributions
���+0.483
 THESE+0.409
abouts+0.406
retty+0.401
 Zup+0.393
Neg logit contributions
aith-0.314
ipe-0.297
lus-0.289
 toxicity-0.285
ARA-0.274
Tokenalions
Token ID50022
Feature activation+3.95e-01

Pos logit contributions
���+0.060
abouts+0.054
 THESE+0.053
 Zup+0.051
 wrists+0.050
Neg logit contributions
aith-0.057
ipe-0.056
 toxicity-0.054
 sympt-0.052
ARA-0.052
Token of
Token ID286
Feature activation+1.03e-01

Pos logit contributions
���+0.820
abouts+0.693
 THESE+0.692
retty+0.657
 Zup+0.656
Neg logit contributions
aith-0.738
ipe-0.709
 toxicity-0.691
lus-0.687
ARA-0.661
Token Marines
Token ID19586
Feature activation+1.96e-01

Pos logit contributions
���+0.236
 THESE+0.200
abouts+0.199
 peas+0.193
 Zup+0.192
Neg logit contributions
aith-0.175
ipe-0.168
lus-0.161
 toxicity-0.159
ARA-0.154
Tokenimages
Token ID17566
Feature activation+3.16e-01

Pos logit contributions
���+0.139
abouts+0.126
 THESE+0.118
bable+0.109
 peas+0.108
Neg logit contributions
aith-0.330
 toxicity-0.323
ipe-0.316
 sympt-0.308
lus-0.308
Token(
Token ID7
Feature activation+1.15e-01

Pos logit contributions
 THESE+0.418
abouts+0.409
���+0.407
 Zup+0.389
bable+0.374
Neg logit contributions
aith-0.731
 toxicity-0.710
ipe-0.692
lus-0.672
 sympt-0.671
Tokenargs
Token ID22046
Feature activation+4.57e-02

Pos logit contributions
���+0.134
abouts+0.117
 THESE+0.115
bable+0.108
retty+0.104
Neg logit contributions
aith-0.305
 toxicity-0.291
ipe-0.290
lus-0.284
ARA-0.279
Token["
Token ID14692
Feature activation+6.36e-01

Pos logit contributions
���+0.052
abouts+0.051
 THESE+0.050
bable+0.046
 peas+0.045
Neg logit contributions
aith-0.116
 toxicity-0.111
ipe-0.109
 sympt-0.107
lus-0.107
Tokenimages
Token ID17566
Feature activation+3.15e-01

Pos logit contributions
���+0.631
abouts+0.615
 THESE+0.568
bable+0.525
 peas+0.525
Neg logit contributions
aith-1.703
 toxicity-1.671
ipe-1.615
ARA-1.594
 sympt-1.593
Token"]
Token ID8973
Feature activation+8.68e-01

Pos logit contributions
abouts+0.396
���+0.391
 THESE+0.390
 Zup+0.363
bable+0.361
Neg logit contributions
aith-0.755
 toxicity-0.719
ipe-0.714
lus-0.689
 sympt-0.683
Token):
Token ID2599
Feature activation-5.86e-02

Pos logit contributions
 THESE+1.271
abouts+1.252
 Zup+1.178
bable+1.176
 Varg+1.160
Neg logit contributions
aith-1.865
 toxicity-1.798
ipe-1.766
lus-1.726
 sympt-1.673
Token #
Token ID1303
Feature activation-6.39e-02

Pos logit contributions
aith+0.087
ipe+0.080
lus+0.077
 toxicity+0.077
ARA+0.075
Neg logit contributions
���-0.138
 THESE-0.129
abouts-0.125
 peas-0.123
 wrists-0.123
Token load
Token ID3440
Feature activation-1.73e-01

Pos logit contributions
aith+0.139
ipe+0.131
 toxicity+0.129
lus+0.128
ARA+0.127
Neg logit contributions
���-0.113
 THESE-0.096
abouts-0.090
 peas-0.089
 wrists-0.089
Token the
Token ID262
Feature activation+1.28e-01

Pos logit contributions
aith+0.297
ipe+0.280
 toxicity+0.276
lus+0.265
ARA+0.258
Neg logit contributions
���-0.342
 THESE-0.337
abouts-0.328
 Zup-0.320
 wrists-0.312
Token image
Token ID2939
Feature activation-8.11e-02

Pos logit contributions
���+0.221
 THESE+0.204
abouts+0.200
 peas+0.198
 Zup+0.195
Neg logit contributions
aith-0.261
ipe-0.243
lus-0.242
 toxicity-0.238
ARA-0.235
Token these
Token ID777
Feature activation-1.39e-01

Pos logit contributions
���+0.081
 THESE+0.070
abouts+0.069
 peas+0.067
 Zup+0.067
Neg logit contributions
aith-0.062
ipe-0.059
lus-0.057
 toxicity-0.056
ARA-0.055
Token troops
Token ID6553
Feature activation+1.48e-01

Pos logit contributions
aith+0.225
ipe+0.213
lus+0.204
ARA+0.202
 toxicity+0.202
Neg logit contributions
���-0.319
abouts-0.283
 THESE-0.280
 peas-0.274
 wrists-0.272
Token out
Token ID503
Feature activation-1.08e-01

Pos logit contributions
���+0.304
abouts+0.266
 THESE+0.265
 wrists+0.254
 peas+0.253
Neg logit contributions
aith-0.272
ipe-0.263
lus-0.253
 toxicity-0.251
ARA-0.248
Token in
Token ID287
Feature activation-9.88e-02

Pos logit contributions
aith+0.194
ipe+0.185
 toxicity+0.179
lus+0.176
ARA+0.174
Neg logit contributions
���-0.233
abouts-0.202
 THESE-0.200
retty-0.192
 wrists-0.189
Token the
Token ID262
Feature activation-7.27e-02

Pos logit contributions
aith+0.158
ipe+0.152
lus+0.145
 toxicity+0.142
ARA+0.141
Neg logit contributions
���-0.226
 THESE-0.201
abouts-0.197
retty-0.192
 peas-0.190
Token field
Token ID2214
Feature activation+8.66e-01

Pos logit contributions
aith+0.130
ipe+0.125
lus+0.120
ARA+0.119
 toxicity+0.119
Neg logit contributions
���-0.154
 THESE-0.132
abouts-0.131
 wrists-0.129
 peas-0.127
Token on
Token ID319
Feature activation-4.86e-02

Pos logit contributions
���+1.824
abouts+1.626
 THESE+1.584
retty+1.526
 peas+1.518
Neg logit contributions
aith-1.531
ipe-1.452
 toxicity-1.403
ARA-1.391
lus-1.389
Token military
Token ID2422
Feature activation-1.16e-01

Pos logit contributions
aith+0.082
ipe+0.078
 toxicity+0.075
ARA+0.075
lus+0.074
Neg logit contributions
���-0.109
 THESE-0.094
abouts-0.093
 wrists-0.090
retty-0.090
Token operations
Token ID4560
Feature activation+2.46e-01

Pos logit contributions
aith+0.199
ipe+0.192
lus+0.183
ARA+0.182
 toxicity+0.181
Neg logit contributions
���-0.255
abouts-0.219
 THESE-0.218
 wrists-0.213
 peas-0.211
Token ad
Token ID512
Feature activation-1.64e-01

Pos logit contributions
���+0.518
abouts+0.452
 THESE+0.447
retty+0.428
 peas+0.424
Neg logit contributions
aith-0.436
ipe-0.420
 toxicity-0.408
lus-0.406
ARA-0.399
Token inf
Token ID1167
Feature activation-3.48e-02

Pos logit contributions
aith+0.278
ipe+0.262
 toxicity+0.257
ARA+0.250
lus+0.249
Neg logit contributions
���-0.373
 THESE-0.315
abouts-0.311
retty-0.306
 Zup-0.303
Token10
Token ID940
Feature activation-4.91e-02

Pos logit contributions
aith+0.022
 toxicity+0.020
ipe+0.020
lus+0.019
 sympt+0.018
Neg logit contributions
���-0.029
 THESE-0.027
abouts-0.027
retty-0.026
 Zup-0.026
Token.
Token ID13
Feature activation-6.63e-02

Pos logit contributions
aith+0.097
 toxicity+0.090
ipe+0.090
lus+0.087
 sympt+0.086
Neg logit contributions
 THESE-0.082
abouts-0.081
 Zup-0.078
���-0.078
 Varg-0.075
Token13
Token ID1485
Feature activation-2.07e-01

Pos logit contributions
aith+0.129
 toxicity+0.117
ipe+0.117
lus+0.113
ARA+0.113
Neg logit contributions
���-0.127
 THESE-0.115
 Zup-0.113
abouts-0.113
bable-0.106
Token71
Token ID4869
Feature activation-1.80e-01

Pos logit contributions
aith+0.347
ipe+0.334
 toxicity+0.324
lus+0.318
ARA+0.310
Neg logit contributions
���-0.476
 THESE-0.402
abouts-0.398
 Zup-0.386
retty-0.386
Token/
Token ID14
Feature activation-5.35e-02

Pos logit contributions
aith+0.382
 toxicity+0.365
ipe+0.358
lus+0.345
 sympt+0.341
Neg logit contributions
���-0.297
abouts-0.280
 THESE-0.279
 Zup-0.269
bable-0.255
Tokenjournal
Token ID24891
Feature activation-1.03e+00

Pos logit contributions
aith+0.081
ipe+0.075
 toxicity+0.074
lus+0.072
 sympt+0.069
Neg logit contributions
���-0.132
abouts-0.112
 THESE-0.111
 Zup-0.108
 peas-0.108
Token.
Token ID13
Feature activation-8.08e-02

Pos logit contributions
aith+1.842
 toxicity+1.782
ipe+1.717
lus+1.677
 sympt+1.676
Neg logit contributions
���-1.934
abouts-1.911
 THESE-1.862
 Zup-1.798
bable-1.795
Tokenp
Token ID79
Feature activation-1.37e-02

Pos logit contributions
aith+0.162
ipe+0.154
 toxicity+0.153
lus+0.149
ARA+0.147
Neg logit contributions
���-0.158
abouts-0.132
 THESE-0.129
retty-0.126
bable-0.124
Tokenone
Token ID505
Feature activation-9.98e-03

Pos logit contributions
aith+0.021
 toxicity+0.021
 sympt+0.020
ipe+0.020
ARA+0.020
Neg logit contributions
���-0.031
abouts-0.027
 THESE-0.027
 peas-0.026
bable-0.026
Token.
Token ID13
Feature activation-3.76e-02

Pos logit contributions
aith+0.019
 toxicity+0.018
ipe+0.018
 sympt+0.017
lus+0.017
Neg logit contributions
abouts-0.017
 THESE-0.017
���-0.016
 Zup-0.016
bable-0.016
Token00
Token ID405
Feature activation+2.63e-02

Pos logit contributions
aith+0.085
ipe+0.081
 toxicity+0.080
lus+0.077
 sympt+0.077
Neg logit contributions
���-0.063
abouts-0.052
 THESE-0.051
retty-0.049
 Zup-0.048
TokenA
Token ID32
Feature activation-3.97e-02

Pos logit contributions
aith+0.579
 toxicity+0.560
ipe+0.549
lus+0.529
 sympt+0.529
Neg logit contributions
 THESE-0.557
abouts-0.547
���-0.541
 Zup-0.525
 Shining-0.515
Token-
Token ID12
Feature activation+2.89e-01

Pos logit contributions
aith+0.068
 toxicity+0.065
ipe+0.064
 sympt+0.061
lus+0.060
Neg logit contributions
abouts-0.077
 THESE-0.077
���-0.075
 Zup-0.074
 Shining-0.072
TokenG
Token ID38
Feature activation+1.59e-01

Pos logit contributions
���+0.613
abouts+0.530
 THESE+0.529
retty+0.500
 Zup+0.496
Neg logit contributions
aith-0.516
ipe-0.486
 toxicity-0.486
lus-0.464
 sympt-0.455
Tokeniants
Token ID17883
Feature activation-3.81e-01

Pos logit contributions
���+0.334
 THESE+0.279
abouts+0.278
retty+0.275
 Zup+0.265
Neg logit contributions
aith-0.290
ipe-0.278
 toxicity-0.276
lus-0.268
ARA-0.262
Token Super
Token ID3115
Feature activation-1.98e-01

Pos logit contributions
aith+0.646
ipe+0.616
 toxicity+0.602
lus+0.588
ARA+0.575
Neg logit contributions
���-0.872
abouts-0.734
 THESE-0.732
retty-0.704
 Zup-0.699
Tokenbowl
Token ID36859
Feature activation-9.82e-01

Pos logit contributions
aith+0.327
ipe+0.308
 toxicity+0.303
lus+0.293
ARA+0.290
Neg logit contributions
���-0.464
abouts-0.390
 THESE-0.390
retty-0.381
 Zup-0.375
Token-
Token ID12
Feature activation+4.47e-01

Pos logit contributions
aith+1.573
ipe+1.500
 toxicity+1.460
lus+1.431
ARA+1.389
Neg logit contributions
���-2.341
abouts-1.974
 THESE-1.972
retty-1.918
 Zup-1.889
TokenItalian
Token ID45696
Feature activation-1.38e-01

Pos logit contributions
���+0.957
abouts+0.831
 THESE+0.825
retty+0.785
 Zup+0.777
Neg logit contributions
aith-0.787
ipe-0.742
 toxicity-0.742
lus-0.709
 sympt-0.694
Token St
Token ID520
Feature activation-4.76e-02

Pos logit contributions
aith+0.260
ipe+0.250
 toxicity+0.245
lus+0.238
ARA+0.234
Neg logit contributions
���-0.286
abouts-0.238
 THESE-0.237
retty-0.227
 Zup-0.227
Tokenere
Token ID567
Feature activation-3.59e-01

Pos logit contributions
aith+0.098
 toxicity+0.093
ipe+0.092
lus+0.090
ARA+0.089
Neg logit contributions
���-0.089
abouts-0.075
 THESE-0.074
retty-0.072
 Zup-0.070
Tokenotypes
Token ID13567
Feature activation+1.57e-01

Pos logit contributions
aith+0.896
ipe+0.869
 toxicity+0.855
lus+0.844
ARA+0.831
Neg logit contributions
���-0.533
abouts-0.401
 THESE-0.400
retty-0.379
 Zup-0.369
Tokenates
Token ID689
Feature activation-3.55e-01

Pos logit contributions
aith+0.942
ipe+0.900
 toxicity+0.891
lus+0.870
ARA+0.864
Neg logit contributions
���-0.943
abouts-0.777
 THESE-0.775
retty-0.749
bable-0.732
Token #
Token ID1303
Feature activation-8.78e-02

Pos logit contributions
aith+0.597
ipe+0.568
 toxicity+0.556
lus+0.547
ARA+0.534
Neg logit contributions
���-0.811
abouts-0.695
 THESE-0.691
retty-0.666
bable-0.653
TokenOccup
Token ID47658
Feature activation-4.74e-01

Pos logit contributions
aith+0.153
ipe+0.145
 toxicity+0.143
lus+0.139
ARA+0.135
Neg logit contributions
���-0.194
abouts-0.166
 THESE-0.164
retty-0.159
bable-0.157
Tokeny
Token ID88
Feature activation-1.20e-01

Pos logit contributions
aith+0.915
ipe+0.873
 toxicity+0.861
lus+0.844
ARA+0.828
Neg logit contributions
���-0.962
abouts-0.797
 THESE-0.797
retty-0.772
 wrists-0.752
TokenThe
Token ID464
Feature activation-1.39e-01

Pos logit contributions
aith+0.197
ipe+0.187
 toxicity+0.184
lus+0.179
ARA+0.173
Neg logit contributions
���-0.278
abouts-0.237
 THESE-0.237
retty-0.228
bable-0.224
TokenDeb
Token ID16587
Feature activation-9.56e-01

Pos logit contributions
aith+0.201
ipe+0.188
 toxicity+0.187
lus+0.180
 sympt+0.174
Neg logit contributions
���-0.347
abouts-0.301
 THESE-0.298
retty-0.290
bable-0.287
Tokenates
Token ID689
Feature activation-2.84e-01

Pos logit contributions
aith+1.872
ipe+1.802
 toxicity+1.760
lus+1.735
ARA+1.690
Neg logit contributions
���-1.942
abouts-1.584
 THESE-1.582
retty-1.528
 Zup-1.503
Token pic
Token ID8301
Feature activation-2.46e-01

Pos logit contributions
aith+0.460
ipe+0.437
 toxicity+0.426
lus+0.419
ARA+0.405
Neg logit contributions
���-0.669
abouts-0.574
 THESE-0.573
retty-0.552
bable-0.540
Token.
Token ID13
Feature activation-1.54e-01

Pos logit contributions
aith+0.458
 toxicity+0.435
ipe+0.435
lus+0.412
 sympt+0.409
Neg logit contributions
 THESE-0.442
abouts-0.439
���-0.426
 Zup-0.416
 Varg-0.403
Tokentwitter
Token ID6956
Feature activation-4.49e-02

Pos logit contributions
aith+0.378
 toxicity+0.366
ipe+0.356
 sympt+0.349
lus+0.344
Neg logit contributions
abouts-0.192
���-0.188
 THESE-0.184
bable-0.171
 Zup-0.166
Token.
Token ID13
Feature activation-2.12e-01

Pos logit contributions
aith+0.084
 toxicity+0.081
ipe+0.079
 sympt+0.076
lus+0.076
Neg logit contributions
abouts-0.079
 THESE-0.079
���-0.074
 Zup-0.074
bable-0.072
Token Memorial
Token ID14861
Feature activation+3.61e-02

Pos logit contributions
aith+0.152
ipe+0.148
 toxicity+0.144
lus+0.141
ARA+0.136
Neg logit contributions
���-0.181
 THESE-0.149
abouts-0.147
 Zup-0.142
 peas-0.141
Token,
Token ID11
Feature activation+6.23e-03

Pos logit contributions
���+0.075
 THESE+0.063
abouts+0.062
 Zup+0.060
 peas+0.059
Neg logit contributions
aith-0.066
ipe-0.065
 toxicity-0.063
lus-0.062
ARA-0.060
Token by
Token ID416
Feature activation+6.86e-02

Pos logit contributions
���+0.014
 THESE+0.012
abouts+0.012
 Zup+0.011
 peas+0.011
Neg logit contributions
aith-0.011
ipe-0.010
 toxicity-0.010
lus-0.010
ARA-0.010
Token artist
Token ID6802
Feature activation-6.22e-02

Pos logit contributions
���+0.157
 THESE+0.133
abouts+0.132
 Zup+0.128
 wrists+0.127
Neg logit contributions
aith-0.112
ipe-0.109
 toxicity-0.107
lus-0.104
ARA-0.101
Token Robert
Token ID5199
Feature activation-2.06e-01

Pos logit contributions
aith+0.119
ipe+0.115
 toxicity+0.114
lus+0.112
ARA+0.109
Neg logit contributions
���-0.126
abouts-0.104
 THESE-0.103
 Zup-0.100
retty-0.100
Token Berks
Token ID45167
Feature activation-9.54e-01

Pos logit contributions
aith+0.385
ipe+0.370
 toxicity+0.367
lus+0.356
ARA+0.349
Neg logit contributions
���-0.429
 THESE-0.355
abouts-0.354
 Zup-0.347
retty-0.342
Token,
Token ID11
Feature activation-4.12e-02

Pos logit contributions
aith+1.321
ipe+1.258
 toxicity+1.225
lus+1.187
ARA+1.151
Neg logit contributions
���-2.473
abouts-2.121
 THESE-2.116
retty-2.060
 Zup-2.044
Token was
Token ID373
Feature activation+4.68e-02

Pos logit contributions
aith+0.073
ipe+0.072
 toxicity+0.070
lus+0.068
ARA+0.067
Neg logit contributions
���-0.090
 THESE-0.075
abouts-0.074
 wrists-0.072
 Zup-0.072
Token erected
Token ID28415
Feature activation-3.28e-03

Pos logit contributions
���+0.101
 THESE+0.083
abouts+0.083
 wrists+0.080
 peas+0.079
Neg logit contributions
aith-0.084
ipe-0.083
 toxicity-0.081
lus-0.079
ARA-0.076
Token in
Token ID287
Feature activation+1.08e-01

Pos logit contributions
aith+0.006
ipe+0.006
 toxicity+0.006
lus+0.006
ARA+0.006
Neg logit contributions
���-0.007
abouts-0.006
 THESE-0.006
 Zup-0.006
 wrists-0.006
Token her
Token ID607
Feature activation+5.20e-02

Pos logit contributions
���+0.236
 THESE+0.203
abouts+0.201
 Zup+0.199
 wrists+0.195
Neg logit contributions
aith-0.181
ipe-0.180
 toxicity-0.172
lus-0.171
ARA-0.164
Token on
Token ID319
Feature activation-8.51e-03

Pos logit contributions
aith+0.070
ipe+0.068
 toxicity+0.066
lus+0.065
ARA+0.063
Neg logit contributions
���-0.099
abouts-0.087
 THESE-0.085
retty-0.081
bable-0.081
Token the
Token ID262
Feature activation+1.95e-02

Pos logit contributions
aith+0.014
ipe+0.014
 toxicity+0.013
ARA+0.013
lus+0.013
Neg logit contributions
���-0.019
 THESE-0.016
abouts-0.016
 wrists-0.016
 Zup-0.016
Token slippery
Token ID32911
Feature activation+4.92e-02

Pos logit contributions
���+0.043
abouts+0.037
 THESE+0.037
 wrists+0.036
 Zup+0.036
Neg logit contributions
aith-0.033
ipe-0.032
ARA-0.031
lus-0.031
 toxicity-0.031
Token Cleveland
Token ID10306
Feature activation-6.87e-02

Pos logit contributions
���+0.102
 THESE+0.087
abouts+0.087
 wrists+0.085
 Zup+0.085
Neg logit contributions
aith-0.090
ipe-0.087
 toxicity-0.084
lus-0.084
ARA-0.083
Token Stadium
Token ID10499
Feature activation-3.90e-01

Pos logit contributions
aith+0.130
ipe+0.126
 toxicity+0.123
ARA+0.121
lus+0.121
Neg logit contributions
���-0.136
 THESE-0.117
abouts-0.115
 wrists-0.113
 peas-0.112
Token sod
Token ID32581
Feature activation-9.49e-01

Pos logit contributions
aith+0.847
ipe+0.815
lus+0.798
 toxicity+0.795
ARA+0.789
Neg logit contributions
���-0.684
 THESE-0.562
abouts-0.556
 wrists-0.536
 peas-0.531
Token,
Token ID11
Feature activation-9.86e-02

Pos logit contributions
aith+1.336
ipe+1.265
 toxicity+1.226
lus+1.199
ARA+1.159
Neg logit contributions
���-2.446
abouts-2.094
 THESE-2.091
retty-2.036
 Zup-2.010
Token the
Token ID262
Feature activation+2.34e-02

Pos logit contributions
aith+0.205
ipe+0.197
 toxicity+0.192
lus+0.189
ARA+0.186
Neg logit contributions
���-0.187
 THESE-0.152
abouts-0.151
 Zup-0.145
retty-0.144
Token Rams
Token ID16077
Feature activation+3.03e-02

Pos logit contributions
���+0.046
 THESE+0.038
abouts+0.038
 wrists+0.037
 Zup+0.037
Neg logit contributions
aith-0.046
ipe-0.045
lus-0.043
 toxicity-0.043
ARA-0.042
Token ran
Token ID4966
Feature activation-2.81e-02

Pos logit contributions
���+0.065
abouts+0.055
 THESE+0.055
 wrists+0.053
 peas+0.052
Neg logit contributions
aith-0.054
ipe-0.053
 toxicity-0.051
lus-0.051
ARA-0.049
Token out
Token ID503
Feature activation+1.36e-01

Pos logit contributions
aith+0.052
ipe+0.050
 toxicity+0.048
lus+0.048
ARA+0.047
Neg logit contributions
���-0.060
abouts-0.050
 THESE-0.050
 wrists-0.048
 peas-0.048
Token).
Token ID737
Feature activation-1.14e-01

Pos logit contributions
���+0.229
 THESE+0.193
abouts+0.193
retty+0.186
 wrists+0.184
Neg logit contributions
aith-0.181
ipe-0.176
 toxicity-0.172
lus-0.168
ARA-0.166
Token Extreme
Token ID18111
Feature activation-1.22e-01

Pos logit contributions
aith+0.219
ipe+0.213
 toxicity+0.209
lus+0.204
ARA+0.200
Neg logit contributions
���-0.231
 THESE-0.194
abouts-0.192
retty-0.183
 Zup-0.183
Token Un
Token ID791
Feature activation-2.00e-01

Pos logit contributions
aith+0.224
ipe+0.217
 toxicity+0.211
lus+0.208
ARA+0.204
Neg logit contributions
���-0.259
abouts-0.216
 THESE-0.216
retty-0.209
 wrists-0.205
Tokenction
Token ID596
Feature activation-1.79e-01

Pos logit contributions
aith+0.357
ipe+0.346
 toxicity+0.343
lus+0.332
ARA+0.330
Neg logit contributions
���-0.425
abouts-0.359
 THESE-0.357
retty-0.347
bable-0.341
Token.
Token ID13
Feature activation-1.19e-01

Pos logit contributions
aith+0.296
ipe+0.283
 toxicity+0.276
lus+0.271
ARA+0.263
Neg logit contributions
���-0.417
abouts-0.352
 THESE-0.351
retty-0.341
 Zup-0.335
Token Holy
Token ID7439
Feature activation-9.28e-01

Pos logit contributions
aith+0.229
ipe+0.222
 toxicity+0.217
lus+0.213
ARA+0.209
Neg logit contributions
���-0.242
 THESE-0.202
abouts-0.200
retty-0.191
 Zup-0.191
Token Orders
Token ID30689
Feature activation-9.12e-02

Pos logit contributions
aith+1.776
ipe+1.708
 toxicity+1.670
lus+1.642
ARA+1.604
Neg logit contributions
���-1.921
abouts-1.580
 THESE-1.576
retty-1.523
 Zup-1.496
Token.
Token ID13
Feature activation-4.89e-02

Pos logit contributions
aith+0.144
ipe+0.137
 toxicity+0.134
lus+0.131
ARA+0.127
Neg logit contributions
���-0.219
abouts-0.186
 THESE-0.185
retty-0.180
 Zup-0.177
Token Mat
Token ID6550
Feature activation-2.83e-01

Pos logit contributions
aith+0.096
ipe+0.093
 toxicity+0.091
lus+0.089
ARA+0.087
Neg logit contributions
���-0.098
 THESE-0.081
abouts-0.081
retty-0.077
 Zup-0.077
Tokenrim
Token ID3036
Feature activation-1.01e-01

Pos logit contributions
aith+0.541
ipe+0.526
 toxicity+0.518
lus+0.506
ARA+0.494
Neg logit contributions
���-0.574
abouts-0.477
 THESE-0.472
retty-0.457
 Zup-0.447
Tokenony
Token ID1647
Feature activation+3.09e-01

Pos logit contributions
aith+0.211
ipe+0.204
 toxicity+0.199
lus+0.197
ARA+0.193
Neg logit contributions
���-0.192
abouts-0.155
 THESE-0.154
retty-0.149
bable-0.146
Token-
Token ID12
Feature activation+5.33e-01

Pos logit contributions
���+0.179
 THESE+0.150
abouts+0.149
 Zup+0.144
retty+0.144
Neg logit contributions
aith-0.154
ipe-0.148
 toxicity-0.143
lus-0.139
ARA-0.137
TokenG
Token ID38
Feature activation+2.83e-01

Pos logit contributions
���+1.138
abouts+0.990
 THESE+0.981
retty+0.933
 Zup+0.925
Neg logit contributions
aith-0.935
ipe-0.885
 toxicity-0.884
lus-0.843
 sympt-0.830
Tokenambling
Token ID15366
Feature activation+5.16e-02

Pos logit contributions
���+0.608
 THESE+0.509
abouts+0.507
retty+0.501
 Zup+0.486
Neg logit contributions
aith-0.498
ipe-0.477
 toxicity-0.474
lus-0.460
ARA-0.448
Token-
Token ID12
Feature activation+5.74e-01

Pos logit contributions
���+0.118
abouts+0.100
 THESE+0.100
 Zup+0.095
retty+0.095
Neg logit contributions
aith-0.087
ipe-0.082
 toxicity-0.080
lus-0.079
ARA-0.076
TokenSuper
Token ID12442
Feature activation-2.53e-01

Pos logit contributions
���+1.219
abouts+1.062
 THESE+1.053
retty+0.998
 Zup+0.994
Neg logit contributions
aith-1.009
ipe-0.956
 toxicity-0.953
lus-0.910
 sympt-0.898
Tokenbowl
Token ID36859
Feature activation-9.28e-01

Pos logit contributions
aith+0.455
ipe+0.430
 toxicity+0.421
lus+0.412
ARA+0.408
Neg logit contributions
���-0.551
 THESE-0.460
abouts-0.460
retty-0.449
bable-0.441
Token Advertising
Token ID40113
Feature activation-2.71e-01

Pos logit contributions
aith+1.603
ipe+1.532
 toxicity+1.492
lus+1.469
ARA+1.425
Neg logit contributions
���-2.099
abouts-1.760
 THESE-1.755
retty-1.697
 Zup-1.676
Token-
Token ID12
Feature activation+4.83e-01

Pos logit contributions
aith+0.392
ipe+0.372
 toxicity+0.361
lus+0.353
ARA+0.342
Neg logit contributions
���-0.687
abouts-0.586
 THESE-0.585
retty-0.570
 Zup-0.562
TokenAust
Token ID15160
Feature activation-3.95e-01

Pos logit contributions
���+1.033
abouts+0.896
 THESE+0.892
retty+0.845
 Zup+0.843
Neg logit contributions
aith-0.848
ipe-0.802
 toxicity-0.799
lus-0.765
 sympt-0.751
Tokenrail
Token ID30224
Feature activation-2.54e-01

Pos logit contributions
aith+0.641
ipe+0.609
 toxicity+0.604
lus+0.580
ARA+0.567
Neg logit contributions
���-0.920
abouts-0.788
 THESE-0.782
retty-0.762
 Zup-0.747
Tokenian
Token ID666
Feature activation+1.36e-01

Pos logit contributions
aith+0.345
ipe+0.323
 toxicity+0.314
lus+0.307
ARA+0.296
Neg logit contributions
���-0.664
abouts-0.575
 THESE-0.574
retty-0.560
 Zup-0.551
Token
Token ID198
Feature activation-1.49e-02

Pos logit contributions
aith+0.092
ipe+0.087
 toxicity+0.086
lus+0.084
ARA+0.082
Neg logit contributions
���-0.102
 THESE-0.086
abouts-0.085
retty-0.082
 Zup-0.081
Token
Token ID198
Feature activation-6.27e-03

Pos logit contributions
aith+0.034
 toxicity+0.033
ipe+0.032
 sympt+0.031
lus+0.031
Neg logit contributions
 THESE-0.020
abouts-0.020
���-0.019
 Zup-0.019
bable-0.018
TokenThe
Token ID464
Feature activation-1.12e-02

Pos logit contributions
aith+0.011
 toxicity+0.011
ipe+0.010
lus+0.010
 sympt+0.010
Neg logit contributions
���-0.013
abouts-0.012
 THESE-0.011
bable-0.011
retty-0.011
Token 2008
Token ID3648
Feature activation-1.23e-01

Pos logit contributions
aith+0.019
ipe+0.018
 toxicity+0.017
lus+0.017
ARA+0.017
Neg logit contributions
���-0.026
 THESE-0.022
abouts-0.022
 wrists-0.021
 peas-0.021
Token clip
Token ID10651
Feature activation-1.38e-01

Pos logit contributions
aith+0.210
ipe+0.201
 toxicity+0.196
lus+0.192
ARA+0.188
Neg logit contributions
���-0.276
abouts-0.231
 THESE-0.231
 peas-0.222
retty-0.222
Token resur
Token ID15328
Feature activation-9.17e-01

Pos logit contributions
aith+0.238
ipe+0.226
 toxicity+0.221
lus+0.218
ARA+0.215
Neg logit contributions
���-0.306
 THESE-0.260
abouts-0.259
 Zup-0.249
 wrists-0.248
Tokenfaced
Token ID24903
Feature activation+1.62e-02

Pos logit contributions
aith+1.854
ipe+1.784
 toxicity+1.757
lus+1.715
ARA+1.690
Neg logit contributions
���-1.791
abouts-1.461
 THESE-1.454
retty-1.405
 Zup-1.375
Token after
Token ID706
Feature activation+7.06e-02

Pos logit contributions
���+0.033
abouts+0.029
 THESE+0.029
 wrists+0.027
 Zup+0.027
Neg logit contributions
aith-0.030
ipe-0.029
 toxicity-0.028
ARA-0.027
lus-0.027
Token McCain
Token ID14264
Feature activation-6.65e-02

Pos logit contributions
���+0.152
abouts+0.127
 THESE+0.127
 Zup+0.122
 peas+0.121
Neg logit contributions
aith-0.128
ipe-0.122
 toxicity-0.117
lus-0.115
ARA-0.115
Token
Token ID447
Feature activation+4.21e-02

Pos logit contributions
aith+0.120
ipe+0.116
 toxicity+0.111
lus+0.110
ARA+0.109
Neg logit contributions
���-0.143
abouts-0.119
 THESE-0.119
 wrists-0.115
 peas-0.114
Token
Token ID247
Feature activation-1.57e-02

Pos logit contributions
���+0.061
 THESE+0.047
abouts+0.046
bable+0.044
retty+0.044
Neg logit contributions
aith-0.104
ipe-0.101
 toxicity-0.100
ARA-0.095
lus-0.095
Token.
Token ID13
Feature activation+5.20e-02

Pos logit contributions
aith+0.272
ipe+0.258
 toxicity+0.249
lus+0.246
ARA+0.236
Neg logit contributions
���-0.439
 THESE-0.371
abouts-0.370
retty-0.361
 Zup-0.356
Token Conf
Token ID7326
Feature activation-1.09e-02

Pos logit contributions
���+0.104
 THESE+0.088
abouts+0.086
retty+0.082
 Zup+0.082
Neg logit contributions
aith-0.100
ipe-0.098
 toxicity-0.096
lus-0.093
ARA-0.092
Tokenirmation
Token ID36241
Feature activation-1.28e-01

Pos logit contributions
aith+0.017
 toxicity+0.016
ipe+0.016
ARA+0.015
lus+0.015
Neg logit contributions
���-0.025
abouts-0.022
 THESE-0.022
retty-0.022
bable-0.021
Token.
Token ID13
Feature activation+3.32e-02

Pos logit contributions
aith+0.197
ipe+0.186
 toxicity+0.181
lus+0.178
ARA+0.171
Neg logit contributions
���-0.316
 THESE-0.267
abouts-0.267
retty-0.260
 Zup-0.257
Token The
Token ID383
Feature activation+2.26e-01

Pos logit contributions
���+0.067
 THESE+0.056
abouts+0.055
 Zup+0.053
retty+0.053
Neg logit contributions
aith-0.064
ipe-0.062
 toxicity-0.061
lus-0.060
ARA-0.059
Token Holy
Token ID7439
Feature activation-9.14e-01

Pos logit contributions
���+0.465
abouts+0.389
 THESE+0.386
retty+0.374
 Zup+0.367
Neg logit contributions
aith-0.417
ipe-0.406
 toxicity-0.400
ARA-0.388
lus-0.387
Token E
Token ID412
Feature activation-2.48e-01

Pos logit contributions
aith+1.506
ipe+1.441
 toxicity+1.408
lus+1.375
ARA+1.343
Neg logit contributions
���-2.124
abouts-1.800
 THESE-1.794
retty-1.741
 Zup-1.714
Tokenuch
Token ID794
Feature activation-6.23e-02

Pos logit contributions
aith+0.650
 toxicity+0.636
ipe+0.631
lus+0.622
ARA+0.616
Neg logit contributions
���-0.317
 THESE-0.237
abouts-0.236
retty-0.220
 Zup-0.213
Tokenarist
Token ID34566
Feature activation-6.36e-01

Pos logit contributions
aith+0.122
 toxicity+0.118
ipe+0.117
ARA+0.114
lus+0.112
Neg logit contributions
���-0.121
 THESE-0.101
abouts-0.099
retty-0.098
bable-0.095
Token.
Token ID13
Feature activation-8.99e-02

Pos logit contributions
aith+0.905
ipe+0.857
 toxicity+0.811
lus+0.799
ARA+0.760
Neg logit contributions
���-1.653
 THESE-1.392
abouts-1.392
retty-1.363
 Zup-1.352
Token Pen
Token ID7507
Feature activation-6.15e-01

Pos logit contributions
aith+0.172
ipe+0.167
 toxicity+0.164
lus+0.160
ARA+0.157
Neg logit contributions
���-0.183
 THESE-0.153
abouts-0.151
retty-0.145
 Zup-0.144
Tokenl
Token ID75
Feature activation-1.78e-01

Pos logit contributions
aith+0.251
ipe+0.232
 toxicity+0.229
ARA+0.218
lus+0.217
Neg logit contributions
���-0.459
 THESE-0.399
abouts-0.398
retty-0.391
bable-0.385
Tokenange
Token ID858
Feature activation+1.16e-02

Pos logit contributions
aith+0.509
 toxicity+0.497
ipe+0.491
lus+0.489
 sympt+0.477
Neg logit contributions
���-0.186
 THESE-0.130
abouts-0.128
retty-0.123
bable-0.112
Token,
Token ID11
Feature activation-9.96e-02

Pos logit contributions
���+0.026
abouts+0.023
 THESE+0.023
retty+0.022
 peas+0.022
Neg logit contributions
aith-0.019
ipe-0.018
 toxicity-0.018
lus-0.018
ARA-0.017
Token the
Token ID262
Feature activation-9.01e-02

Pos logit contributions
aith+0.174
ipe+0.164
lus+0.160
 toxicity+0.159
ARA+0.155
Neg logit contributions
���-0.218
 THESE-0.188
abouts-0.184
 wrists-0.180
 peas-0.180
Token j
Token ID474
Feature activation-1.30e-01

Pos logit contributions
aith+0.154
ipe+0.147
lus+0.142
 toxicity+0.141
ARA+0.140
Neg logit contributions
���-0.197
 THESE-0.171
abouts-0.169
 wrists-0.166
 peas-0.166
Tokenumble
Token ID10344
Feature activation-9.05e-01

Pos logit contributions
aith+0.299
 toxicity+0.288
ipe+0.287
lus+0.281
ARA+0.279
Neg logit contributions
���-0.209
 THESE-0.167
abouts-0.166
retty-0.161
 Zup-0.158
Token of
Token ID286
Feature activation-3.47e-02

Pos logit contributions
aith+1.539
ipe+1.466
 toxicity+1.433
lus+1.408
ARA+1.371
Neg logit contributions
���-2.064
abouts-1.738
 THESE-1.733
retty-1.678
 Zup-1.656
Token floating
Token ID12462
Feature activation+2.28e-01

Pos logit contributions
aith+0.064
ipe+0.061
lus+0.059
 toxicity+0.059
ARA+0.058
Neg logit contributions
���-0.073
 THESE-0.062
abouts-0.062
 peas-0.060
 wrists-0.060
Token ice
Token ID4771
Feature activation-3.04e-01

Pos logit contributions
���+0.469
abouts+0.407
 THESE+0.402
 peas+0.398
 wrists+0.394
Neg logit contributions
aith-0.421
ipe-0.402
lus-0.388
ARA-0.386
 toxicity-0.385
Token downstream
Token ID33218
Feature activation-2.53e-01

Pos logit contributions
aith+0.508
ipe+0.480
 toxicity+0.471
lus+0.469
ARA+0.455
Neg logit contributions
���-0.687
abouts-0.592
 THESE-0.589
 peas-0.572
 wrists-0.568
Token from
Token ID422
Feature activation-7.83e-02

Pos logit contributions
aith+0.423
ipe+0.403
 toxicity+0.390
lus+0.388
ARA+0.379
Neg logit contributions
���-0.565
abouts-0.495
 THESE-0.489
 wrists-0.474
 peas-0.472
Token),
Token ID828
Feature activation+2.86e-03

Pos logit contributions
aith+0.429
ipe+0.408
 toxicity+0.397
lus+0.392
ARA+0.383
Neg logit contributions
���-0.570
abouts-0.476
 THESE-0.476
retty-0.460
 Zup-0.456
Token a
Token ID257
Feature activation-1.62e-02

Pos logit contributions
���+0.006
 THESE+0.005
abouts+0.005
 peas+0.005
 wrists+0.005
Neg logit contributions
aith-0.005
ipe-0.005
lus-0.005
 toxicity-0.005
ARA-0.005
Token selective
Token ID21792
Feature activation-3.63e-01

Pos logit contributions
aith+0.027
ipe+0.026
lus+0.026
 toxicity+0.025
ARA+0.024
Neg logit contributions
���-0.037
 THESE-0.031
abouts-0.031
 Zup-0.030
 wrists-0.030
Token serotonin
Token ID32236
Feature activation+4.24e-02

Pos logit contributions
aith+0.666
ipe+0.634
lus+0.626
ARA+0.612
 toxicity+0.596
Neg logit contributions
���-0.749
 THESE-0.636
abouts-0.636
 Zup-0.617
 peas-0.610
Token re
Token ID302
Feature activation+1.75e-01

Pos logit contributions
���+0.095
 THESE+0.080
abouts+0.079
bable+0.077
retty+0.077
Neg logit contributions
aith-0.072
ipe-0.069
lus-0.067
ARA-0.064
 toxicity-0.064
Tokenupt
Token ID37623
Feature activation-8.94e-01

Pos logit contributions
���+0.408
 THESE+0.341
abouts+0.330
bable+0.327
retty+0.326
Neg logit contributions
aith-0.285
ipe-0.270
lus-0.259
 toxicity-0.256
ARA-0.249
Tokenake
Token ID539
Feature activation-1.28e-01

Pos logit contributions
aith+1.541
ipe+1.476
 toxicity+1.471
lus+1.427
ARA+1.416
Neg logit contributions
���-1.955
 THESE-1.657
abouts-1.655
retty-1.607
 Zup-1.598
Token inhibitor
Token ID41704
Feature activation+4.27e-01

Pos logit contributions
aith+0.178
ipe+0.170
lus+0.160
NL+0.157
ARA+0.156
Neg logit contributions
���-0.331
 THESE-0.281
abouts-0.281
bable-0.272
 peas-0.270
Token (
Token ID357
Feature activation+2.78e-02

Pos logit contributions
���+0.915
 THESE+0.762
abouts+0.756
 Zup+0.728
retty+0.727
Neg logit contributions
aith-0.774
ipe-0.737
lus-0.724
 toxicity-0.705
ARA-0.695
TokenSS
Token ID5432
Feature activation+1.76e-01

Pos logit contributions
���+0.062
abouts+0.054
 THESE+0.053
retty+0.052
bable+0.050
Neg logit contributions
aith-0.046
ipe-0.044
 toxicity-0.044
lus-0.042
 sympt-0.040
TokenRI
Token ID7112
Feature activation+3.85e-01

Pos logit contributions
���+0.422
abouts+0.373
 THESE+0.371
 Zup+0.361
retty+0.360
Neg logit contributions
aith-0.265
ipe-0.239
 toxicity-0.236
lus-0.235
 sympt-0.221
Token.
Token ID13
Feature activation-1.59e-01

Pos logit contributions
aith+0.258
 toxicity+0.251
ipe+0.242
lus+0.236
 sympt+0.235
Neg logit contributions
abouts-0.224
 THESE-0.224
 Zup-0.214
 Pagan-0.208
 Varg-0.207
Token For
Token ID1114
Feature activation-2.75e-01

Pos logit contributions
aith+0.304
 toxicity+0.292
ipe+0.285
lus+0.273
 sympt+0.269
Neg logit contributions
���-0.290
 THESE-0.289
 Zup-0.279
 Pagan-0.274
abouts-0.270
Token more
Token ID517
Feature activation-2.07e-01

Pos logit contributions
aith+0.533
ARA+0.498
ipe+0.492
 toxicity+0.490
lus+0.482
Neg logit contributions
���-0.483
abouts-0.465
 THESE-0.461
 peas-0.444
 wrists-0.444
Token from
Token ID422
Feature activation-2.22e-01

Pos logit contributions
aith+0.420
ipe+0.400
 toxicity+0.396
ARA+0.387
lus+0.386
Neg logit contributions
���-0.335
 THESE-0.323
abouts-0.318
 Zup-0.311
 peas-0.304
Token The
Token ID383
Feature activation-9.23e-02

Pos logit contributions
aith+0.472
 toxicity+0.443
ipe+0.440
lus+0.418
 sympt+0.414
Neg logit contributions
���-0.377
 THESE-0.356
abouts-0.345
 Zup-0.338
 Shining-0.327
Token Nation
Token ID8741
Feature activation-8.81e-01

Pos logit contributions
aith+0.160
ipe+0.146
 toxicity+0.143
ARA+0.138
 sympt+0.138
Neg logit contributions
���-0.188
abouts-0.181
 THESE-0.178
 Zup-0.177
 Pagan-0.173
Token,
Token ID11
Feature activation-2.06e-01

Pos logit contributions
aith+1.421
 toxicity+1.307
ipe+1.294
ARA+1.253
lus+1.251
Neg logit contributions
���-1.891
abouts-1.818
 THESE-1.798
 Zup-1.739
bable-1.712
Token check
Token ID2198
Feature activation+1.13e-01

Pos logit contributions
aith+0.414
ARA+0.387
 toxicity+0.384
ipe+0.383
lus+0.373
Neg logit contributions
���-0.365
 THESE-0.337
abouts-0.332
 wrists-0.318
bable-0.315
Token out
Token ID503
Feature activation-1.59e-01

Pos logit contributions
���+0.165
 THESE+0.157
abouts+0.157
 Zup+0.146
 peas+0.145
Neg logit contributions
aith-0.252
ipe-0.239
ARA-0.238
 toxicity-0.235
lus-0.233
Token our
Token ID674
Feature activation-2.56e-01

Pos logit contributions
aith+0.392
ipe+0.369
 toxicity+0.367
ARA+0.365
lus+0.355
Neg logit contributions
���-0.217
 THESE-0.201
abouts-0.190
 wrists-0.177
 Zup-0.175
Token latest
Token ID3452
Feature activation-3.26e-01

Pos logit contributions
aith+0.422
ipe+0.389
lus+0.381
ARA+0.381
 toxicity+0.380
Neg logit contributions
���-0.558
abouts-0.510
 THESE-0.508
 Zup-0.498
 peas-0.488
Tokenap
Token ID499
Feature activation+2.21e-01

Pos logit contributions
���+0.593
abouts+0.491
 THESE+0.488
retty+0.472
 Zup+0.471
Neg logit contributions
aith-0.508
 toxicity-0.484
ipe-0.483
lus-0.473
ARA-0.456
Tokenuni
Token ID35657
Feature activation-1.85e-02

Pos logit contributions
���+0.431
abouts+0.356
 THESE+0.355
retty+0.345
 Zup+0.337
Neg logit contributions
aith-0.433
 toxicity-0.417
ipe-0.416
lus-0.403
ARA-0.396
Token,
Token ID11
Feature activation-5.66e-02

Pos logit contributions
aith+0.031
ipe+0.030
 toxicity+0.029
lus+0.029
ARA+0.028
Neg logit contributions
���-0.042
 THESE-0.036
abouts-0.036
 Zup-0.034
retty-0.034
Token a
Token ID257
Feature activation+7.15e-02

Pos logit contributions
aith+0.096
ipe+0.091
 toxicity+0.090
lus+0.088
ARA+0.085
Neg logit contributions
���-0.128
 THESE-0.109
abouts-0.108
retty-0.104
 peas-0.104
Token p
Token ID279
Feature activation+7.93e-02

Pos logit contributions
���+0.164
abouts+0.139
 THESE+0.138
retty+0.134
 peas+0.133
Neg logit contributions
aith-0.118
 toxicity-0.112
ipe-0.112
lus-0.109
ARA-0.105
Tokenē
Token ID27092
Feature activation-8.79e-01

Pos logit contributions
���+0.145
abouts+0.119
 THESE+0.118
retty+0.116
 Zup+0.113
Neg logit contributions
aith-0.164
 toxicity-0.160
ipe-0.158
lus-0.155
 sympt-0.151
Tokenl
Token ID75
Feature activation-1.34e-01

Pos logit contributions
aith+1.505
ipe+1.431
 toxicity+1.423
lus+1.373
ARA+1.353
Neg logit contributions
���-1.991
abouts-1.674
 THESE-1.663
retty-1.618
 Zup-1.596
Tokenā
Token ID10235
Feature activation-4.47e-01

Pos logit contributions
aith+0.222
 toxicity+0.214
ipe+0.211
lus+0.207
ARA+0.200
Neg logit contributions
���-0.305
abouts-0.260
 THESE-0.257
retty-0.250
 peas-0.246
Token p
Token ID279
Feature activation+3.44e-02

Pos logit contributions
aith+0.744
ipe+0.708
 toxicity+0.705
lus+0.674
ARA+0.667
Neg logit contributions
���-1.020
abouts-0.869
 THESE-0.860
retty-0.836
 Zup-0.827
Tokenū
Token ID20317
Feature activation-4.62e-01

Pos logit contributions
���+0.071
abouts+0.060
 THESE+0.059
retty+0.058
 Zup+0.057
Neg logit contributions
aith-0.064
 toxicity-0.061
ipe-0.061
lus-0.060
ARA-0.058
Token ke
Token ID885
Feature activation+2.34e-01

Pos logit contributions
aith+0.756
ipe+0.719
 toxicity+0.715
lus+0.686
ARA+0.677
Neg logit contributions
���-1.078
abouts-0.910
 THESE-0.906
retty-0.885
 Zup-0.875
Token Cup
Token ID5454
Feature activation-8.96e-02

Pos logit contributions
aith+0.462
ipe+0.440
 toxicity+0.437
lus+0.431
ARA+0.414
Neg logit contributions
���-0.489
abouts-0.413
 THESE-0.409
retty-0.390
 Zup-0.390
Token instead
Token ID2427
Feature activation-1.40e-01

Pos logit contributions
aith+0.156
ipe+0.148
 toxicity+0.143
lus+0.142
ARA+0.136
Neg logit contributions
���-0.196
abouts-0.173
 THESE-0.169
retty-0.163
bable-0.162
Token of
Token ID286
Feature activation+7.77e-03

Pos logit contributions
aith+0.224
ipe+0.214
 toxicity+0.208
lus+0.204
ARA+0.197
Neg logit contributions
���-0.336
 THESE-0.283
abouts-0.281
retty-0.274
 Zup-0.270
Token the
Token ID262
Feature activation+7.60e-03

Pos logit contributions
���+0.016
abouts+0.014
 THESE+0.014
 Zup+0.014
 peas+0.014
Neg logit contributions
aith-0.014
ipe-0.013
lus-0.013
 toxicity-0.013
ARA-0.012
Token Super
Token ID3115
Feature activation-1.53e-01

Pos logit contributions
���+0.016
abouts+0.014
 THESE+0.014
 peas+0.013
 wrists+0.013
Neg logit contributions
aith-0.014
ipe-0.014
lus-0.013
 toxicity-0.013
ARA-0.013
Token Bowl
Token ID8693
Feature activation-8.77e-01

Pos logit contributions
aith+0.267
ipe+0.251
 toxicity+0.248
lus+0.239
ARA+0.234
Neg logit contributions
���-0.347
abouts-0.294
 THESE-0.289
retty-0.281
 Zup-0.278
Token.
Token ID13
Feature activation-4.18e-02

Pos logit contributions
aith+1.383
ipe+1.315
 toxicity+1.271
lus+1.252
ARA+1.211
Neg logit contributions
���-2.116
abouts-1.806
 THESE-1.795
retty-1.745
 Zup-1.718
Token But
Token ID887
Feature activation-2.51e-02

Pos logit contributions
aith+0.081
ipe+0.077
 toxicity+0.076
lus+0.074
ARA+0.072
Neg logit contributions
���-0.085
 THESE-0.072
abouts-0.071
 Zup-0.069
retty-0.067
Token after
Token ID706
Feature activation-3.26e-02

Pos logit contributions
aith+0.046
ipe+0.044
lus+0.042
 toxicity+0.041
ARA+0.041
Neg logit contributions
���-0.053
 THESE-0.045
abouts-0.045
 peas-0.044
 Zup-0.043
Token leaving
Token ID4305
Feature activation-2.35e-02

Pos logit contributions
aith+0.057
ipe+0.055
lus+0.052
 toxicity+0.052
ARA+0.051
Neg logit contributions
���-0.072
abouts-0.061
 THESE-0.061
 peas-0.059
 Zup-0.059
Token Kentucky
Token ID11867
Feature activation+2.04e-01

Pos logit contributions
aith+0.041
ipe+0.039
lus+0.037
 toxicity+0.037
ARA+0.036
Neg logit contributions
���-0.052
 THESE-0.044
abouts-0.044
 wrists-0.042
 Zup-0.042
Token?
Token ID30
Feature activation-1.77e-01

Pos logit contributions
aith+0.215
ipe+0.207
 toxicity+0.201
lus+0.198
ARA+0.194
Neg logit contributions
���-0.265
abouts-0.227
 THESE-0.224
 wrists-0.215
retty-0.213
Token
Token ID447
Feature activation-4.65e-02

Pos logit contributions
aith+0.312
ipe+0.296
 toxicity+0.289
lus+0.284
ARA+0.275
Neg logit contributions
���-0.387
 THESE-0.338
abouts-0.334
 Zup-0.320
retty-0.318
Token
Token ID251
Feature activation-5.48e-02

Pos logit contributions
aith+0.105
ipe+0.101
 toxicity+0.099
lus+0.098
ARA+0.096
Neg logit contributions
���-0.081
 THESE-0.064
abouts-0.063
retty-0.061
 Zup-0.060
Token Ichigo
Token ID44547
Feature activation-2.60e-02

Pos logit contributions
aith+0.099
ipe+0.096
 toxicity+0.093
lus+0.091
ARA+0.088
Neg logit contributions
���-0.117
 THESE-0.101
abouts-0.100
 Zup-0.096
retty-0.095
Token m
Token ID285
Feature activation+2.78e-02

Pos logit contributions
aith+0.052
ipe+0.050
 toxicity+0.049
lus+0.048
ARA+0.046
Neg logit contributions
���-0.051
 THESE-0.042
abouts-0.042
 wrists-0.040
 peas-0.040
Tokenumbled
Token ID11137
Feature activation-8.75e-01

Pos logit contributions
���+0.065
abouts+0.055
 THESE+0.055
retty+0.055
 Zup+0.053
Neg logit contributions
aith-0.044
 toxicity-0.042
ipe-0.042
ARA-0.040
lus-0.040
Token to
Token ID284
Feature activation-4.84e-02

Pos logit contributions
aith+1.485
ipe+1.422
 toxicity+1.375
lus+1.355
ARA+1.320
Neg logit contributions
���-1.995
 THESE-1.687
abouts-1.686
retty-1.624
 Zup-1.608
Token himself
Token ID2241
Feature activation+8.31e-02

Pos logit contributions
aith+0.085
ipe+0.082
 toxicity+0.080
lus+0.079
ARA+0.076
Neg logit contributions
���-0.105
abouts-0.090
 THESE-0.089
 Zup-0.086
 wrists-0.085
Token as
Token ID355
Feature activation-8.36e-02

Pos logit contributions
���+0.187
abouts+0.160
 THESE+0.160
 peas+0.153
retty+0.153
Neg logit contributions
aith-0.140
ipe-0.135
 toxicity-0.129
lus-0.128
ARA-0.125
Token he
Token ID339
Feature activation+6.19e-02

Pos logit contributions
aith+0.150
ipe+0.145
 toxicity+0.139
lus+0.138
ARA+0.133
Neg logit contributions
���-0.181
abouts-0.153
 THESE-0.153
retty-0.145
 wrists-0.145
Token watched
Token ID7342
Feature activation-3.17e-01

Pos logit contributions
���+0.132
abouts+0.111
 THESE+0.111
 wrists+0.105
 peas+0.105
Neg logit contributions
aith-0.114
ipe-0.110
 toxicity-0.107
lus-0.105
ARA-0.103
Token
Token ID198
Feature activation+1.81e-01

Pos logit contributions
 THESE+0.189
abouts+0.187
���+0.177
 Zup+0.177
 Shining+0.170
Neg logit contributions
aith-0.317
 toxicity-0.304
ipe-0.300
 sympt-0.289
lus-0.289
TokenMind
Token ID28478
Feature activation+2.70e-01

Pos logit contributions
���+0.394
abouts+0.338
 THESE+0.336
retty+0.324
 Zup+0.323
Neg logit contributions
aith-0.314
ipe-0.298
 toxicity-0.296
lus-0.285
 sympt-0.277
Tokenclaw
Token ID43143
Feature activation+1.63e-01

Pos logit contributions
���+0.567
abouts+0.481
 THESE+0.479
retty+0.460
bable+0.456
Neg logit contributions
aith-0.493
ipe-0.463
 toxicity-0.459
lus-0.452
ARA-0.442
Token Shaman
Token ID34805
Feature activation-6.48e-02

Pos logit contributions
���+0.333
 THESE+0.280
abouts+0.276
 Zup+0.267
retty+0.263
Neg logit contributions
aith-0.312
ipe-0.293
 toxicity-0.291
lus-0.284
ARA-0.283
Token vs
Token ID3691
Feature activation+1.65e-01

Pos logit contributions
aith+0.136
ipe+0.132
 toxicity+0.128
lus+0.127
ARA+0.123
Neg logit contributions
���-0.124
abouts-0.099
 THESE-0.099
retty-0.095
 Zup-0.093
Token Pyro
Token ID44954
Feature activation-8.73e-01

Pos logit contributions
���+0.335
 THESE+0.279
abouts+0.278
 Zup+0.268
retty+0.264
Neg logit contributions
aith-0.320
ipe-0.307
 toxicity-0.300
lus-0.296
ARA-0.289
Tokenblast
Token ID39806
Feature activation-1.99e-01

Pos logit contributions
aith+1.645
ipe+1.576
 toxicity+1.548
lus+1.514
ARA+1.496
Neg logit contributions
���-1.829
abouts-1.509
 THESE-1.503
retty-1.459
bable-1.436
Token
Token ID198
Feature activation+2.56e-01

Pos logit contributions
aith+0.355
ipe+0.343
 toxicity+0.333
lus+0.328
ARA+0.316
Neg logit contributions
���-0.440
abouts-0.363
 THESE-0.362
retty-0.351
bable-0.344
Token
Token ID198
Feature activation+2.83e-01

Pos logit contributions
 THESE+0.345
abouts+0.340
 Zup+0.321
���+0.316
 Shining+0.311
Neg logit contributions
aith-0.588
 toxicity-0.566
ipe-0.557
 sympt-0.538
lus-0.536
TokenPrim
Token ID23828
Feature activation+1.85e-01

Pos logit contributions
���+0.611
abouts+0.527
 THESE+0.524
retty+0.505
 Zup+0.502
Neg logit contributions
aith-0.494
ipe-0.469
 toxicity-0.466
lus-0.449
 sympt-0.437
Tokenal
Token ID282
Feature activation+3.72e-01

Pos logit contributions
���+0.336
abouts+0.276
 THESE+0.271
retty+0.263
 Zup+0.259
Neg logit contributions
aith-0.390
ipe-0.374
 toxicity-0.371
lus-0.362
ARA-0.359
Tokenat
Token ID265
Feature activation-4.69e-01

Pos logit contributions
���+0.077
abouts+0.065
 THESE+0.064
retty+0.063
bable+0.062
Neg logit contributions
aith-0.071
ipe-0.068
 toxicity-0.067
lus-0.065
ARA-0.065
Tokenoder
Token ID12342
Feature activation+1.81e-01

Pos logit contributions
aith+0.915
ipe+0.876
 toxicity+0.855
lus+0.849
ARA+0.832
Neg logit contributions
���-0.950
abouts-0.783
 THESE-0.779
retty-0.752
 Zup-0.740
Tokenma
Token ID2611
Feature activation-4.25e-02

Pos logit contributions
���+0.407
 THESE+0.346
abouts+0.344
retty+0.332
 peas+0.328
Neg logit contributions
aith-0.307
ipe-0.292
 toxicity-0.284
lus-0.281
ARA-0.280
Token with
Token ID351
Feature activation+1.92e-02

Pos logit contributions
aith+0.069
ipe+0.065
lus+0.063
 toxicity+0.063
ARA+0.061
Neg logit contributions
���-0.099
 THESE-0.084
abouts-0.084
retty-0.081
 Zup-0.081
Token variable
Token ID7885
Feature activation+3.39e-01

Pos logit contributions
���+0.045
 THESE+0.039
abouts+0.038
 peas+0.037
 wrists+0.037
Neg logit contributions
aith-0.031
ipe-0.029
lus-0.028
 toxicity-0.028
ARA-0.028
Token penet
Token ID11958
Feature activation-8.72e-01

Pos logit contributions
���+0.773
 THESE+0.671
abouts+0.667
 wrists+0.653
 peas+0.647
Neg logit contributions
aith-0.551
ipe-0.519
lus-0.503
ARA-0.491
 toxicity-0.474
Tokenrance
Token ID8132
Feature activation-1.51e-01

Pos logit contributions
aith+1.168
ipe+1.102
 toxicity+1.066
lus+1.028
ARA+1.010
Neg logit contributions
���-2.287
 THESE-1.980
abouts-1.977
retty-1.939
 Zup-1.911
Token,
Token ID11
Feature activation-8.18e-02

Pos logit contributions
aith+0.256
ipe+0.243
 toxicity+0.234
lus+0.233
ARA+0.228
Neg logit contributions
���-0.343
abouts-0.291
 THESE-0.290
retty-0.280
 Zup-0.279
Token characterized
Token ID16264
Feature activation+5.32e-02

Pos logit contributions
aith+0.139
ipe+0.130
lus+0.126
 toxicity+0.125
ARA+0.123
Neg logit contributions
���-0.188
 THESE-0.158
abouts-0.157
retty-0.152
 wrists-0.151
Token clinically
Token ID36688
Feature activation-3.39e-01

Pos logit contributions
���+0.095
abouts+0.076
 THESE+0.076
retty+0.073
 Zup+0.071
Neg logit contributions
aith-0.117
ipe-0.112
 toxicity-0.109
lus-0.109
ARA-0.107
Token by
Token ID416
Feature activation+2.79e-03

Pos logit contributions
aith+0.696
ipe+0.668
 toxicity+0.650
lus+0.647
ARA+0.632
Neg logit contributions
���-0.651
abouts-0.537
 THESE-0.530
retty-0.511
 Zup-0.507
Token and
Token ID290
Feature activation-7.38e-02

Pos logit contributions
���+0.123
 THESE+0.105
abouts+0.105
retty+0.100
 Zup+0.100
Neg logit contributions
aith-0.092
ipe-0.087
 toxicity-0.085
lus-0.084
ARA-0.082
Token J
Token ID449
Feature activation-2.87e-02

Pos logit contributions
aith+0.123
ipe+0.117
 toxicity+0.113
lus+0.112
ARA+0.108
Neg logit contributions
���-0.171
 THESE-0.147
abouts-0.143
 peas-0.138
 Zup-0.138
Tokenansson
Token ID44038
Feature activation-1.68e-01

Pos logit contributions
aith+0.052
 toxicity+0.049
ipe+0.049
lus+0.048
 sympt+0.047
Neg logit contributions
���-0.061
abouts-0.051
 THESE-0.051
retty-0.050
 Zup-0.049
Token's
Token ID338
Feature activation+2.02e-02

Pos logit contributions
aith+0.296
ipe+0.283
 toxicity+0.276
lus+0.270
ARA+0.264
Neg logit contributions
���-0.370
 THESE-0.310
abouts-0.308
retty-0.298
 Zup-0.295
Token pict
Token ID2862
Feature activation-4.22e-01

Pos logit contributions
���+0.046
 THESE+0.039
abouts+0.039
 Zup+0.038
 wrists+0.037
Neg logit contributions
aith-0.034
ipe-0.033
 toxicity-0.031
lus-0.031
ARA-0.030
Tokenorial
Token ID5132
Feature activation-8.65e-01

Pos logit contributions
aith+0.779
ipe+0.745
 toxicity+0.735
lus+0.718
ARA+0.704
Neg logit contributions
���-0.896
abouts-0.746
 THESE-0.744
retty-0.719
 Zup-0.709
Token and
Token ID290
Feature activation-1.02e-01

Pos logit contributions
aith+1.569
ipe+1.511
 toxicity+1.468
lus+1.449
ARA+1.419
Neg logit contributions
���-1.865
 THESE-1.552
abouts-1.551
retty-1.494
 wrists-1.480
Token descriptive
Token ID35644
Feature activation-2.18e-01

Pos logit contributions
aith+0.168
ipe+0.162
lus+0.156
 toxicity+0.156
ARA+0.150
Neg logit contributions
���-0.235
 THESE-0.198
abouts-0.196
retty-0.191
 Zup-0.190
Token,
Token ID11
Feature activation-1.25e-01

Pos logit contributions
aith+0.387
ipe+0.371
 toxicity+0.358
lus+0.356
ARA+0.350
Neg logit contributions
���-0.477
 THESE-0.402
abouts-0.398
retty-0.383
 wrists-0.383
Token yet
Token ID1865
Feature activation+3.30e-01

Pos logit contributions
aith+0.214
ipe+0.204
 toxicity+0.197
lus+0.196
ARA+0.191
Neg logit contributions
���-0.283
 THESE-0.239
abouts-0.236
retty-0.230
 Zup-0.228
Token rhy
Token ID40005
Feature activation-2.88e-02

Pos logit contributions
���+0.718
 THESE+0.609
abouts+0.601
retty+0.580
 peas+0.579
Neg logit contributions
aith-0.583
ipe-0.562
 toxicity-0.541
lus-0.538
ARA-0.530
Token.
Token ID13
Feature activation-1.48e-01

Pos logit contributions
aith+0.317
ipe+0.301
 toxicity+0.292
lus+0.286
ARA+0.281
Neg logit contributions
���-0.395
 THESE-0.339
abouts-0.335
 Zup-0.325
retty-0.324
Token For
Token ID1114
Feature activation-3.32e-01

Pos logit contributions
aith+0.282
ipe+0.268
 toxicity+0.262
lus+0.246
NL+0.243
Neg logit contributions
���-0.294
 THESE-0.273
 Zup-0.256
abouts-0.255
 Pagan-0.251
Token more
Token ID517
Feature activation-1.74e-01

Pos logit contributions
aith+0.621
ipe+0.588
 toxicity+0.582
lus+0.579
ARA+0.570
Neg logit contributions
���-0.649
 THESE-0.582
abouts-0.575
 Zup-0.560
 peas-0.557
Token from
Token ID422
Feature activation-2.04e-01

Pos logit contributions
aith+0.347
ipe+0.324
ARA+0.320
lus+0.320
 toxicity+0.318
Neg logit contributions
���-0.314
abouts-0.278
 THESE-0.276
 peas-0.271
 wrists-0.268
Token The
Token ID383
Feature activation-8.11e-02

Pos logit contributions
aith+0.401
ipe+0.376
 toxicity+0.375
ARA+0.361
lus+0.358
Neg logit contributions
���-0.404
 THESE-0.346
abouts-0.331
 Zup-0.324
retty-0.321
Token Nation
Token ID8741
Feature activation-8.64e-01

Pos logit contributions
aith+0.137
ipe+0.126
 toxicity+0.123
ARA+0.119
lus+0.119
Neg logit contributions
���-0.179
 THESE-0.160
abouts-0.160
 Zup-0.154
bable-0.153
Token,
Token ID11
Feature activation-2.14e-01

Pos logit contributions
aith+1.362
ipe+1.249
 toxicity+1.248
ARA+1.221
lus+1.214
Neg logit contributions
���-1.959
abouts-1.798
 THESE-1.769
bable-1.703
 Zup-1.695
Token check
Token ID2198
Feature activation+1.42e-01

Pos logit contributions
aith+0.412
ipe+0.386
 toxicity+0.381
ARA+0.375
lus+0.374
Neg logit contributions
���-0.426
abouts-0.370
 THESE-0.364
 wrists-0.351
bable-0.349
Token out
Token ID503
Feature activation-1.33e-01

Pos logit contributions
���+0.232
 THESE+0.210
abouts+0.198
 peas+0.193
bable+0.187
Neg logit contributions
aith-0.315
ipe-0.298
ARA-0.291
 toxicity-0.289
lus-0.282
Token our
Token ID674
Feature activation-2.14e-01

Pos logit contributions
aith+0.313
ipe+0.297
 toxicity+0.293
ARA+0.289
lus+0.286
Neg logit contributions
���-0.212
 THESE-0.181
abouts-0.176
 wrists-0.160
 Zup-0.159
Token latest
Token ID3452
Feature activation-2.71e-01

Pos logit contributions
aith+0.330
ipe+0.305
 toxicity+0.303
lus+0.301
ARA+0.288
Neg logit contributions
���-0.516
abouts-0.455
 THESE-0.450
 Zup-0.431
retty-0.429
Token with
Token ID351
Feature activation+2.12e-02

Pos logit contributions
aith+0.146
ipe+0.137
lus+0.133
 toxicity+0.132
ARA+0.129
Neg logit contributions
���-0.180
 THESE-0.154
abouts-0.154
 wrists-0.149
 peas-0.148
Token little
Token ID1310
Feature activation-5.36e-01

Pos logit contributions
���+0.048
abouts+0.041
 THESE+0.041
 wrists+0.040
 peas+0.040
Neg logit contributions
aith-0.036
ipe-0.034
 toxicity-0.033
lus-0.032
ARA-0.032
Token pressing
Token ID12273
Feature activation+3.55e-02

Pos logit contributions
aith+0.891
ipe+0.852
lus+0.818
 toxicity+0.805
ARA+0.801
Neg logit contributions
���-1.222
abouts-1.056
 THESE-1.049
 wrists-1.010
 peas-1.010
Token and
Token ID290
Feature activation-4.89e-02

Pos logit contributions
���+0.076
abouts+0.067
 THESE+0.066
 wrists+0.064
 peas+0.063
Neg logit contributions
aith-0.062
ipe-0.060
lus-0.057
 toxicity-0.056
ARA-0.056
Token few
Token ID1178
Feature activation-1.39e-01

Pos logit contributions
aith+0.085
ipe+0.081
lus+0.078
 toxicity+0.077
ARA+0.076
Neg logit contributions
���-0.108
abouts-0.094
 THESE-0.092
 wrists-0.090
 peas-0.089
Token penet
Token ID11958
Feature activation-8.60e-01

Pos logit contributions
aith+0.240
ipe+0.230
lus+0.221
ARA+0.217
 toxicity+0.217
Neg logit contributions
���-0.303
abouts-0.272
 THESE-0.265
 wrists-0.260
 peas-0.256
Tokenrative
Token ID13260
Feature activation-2.20e-01

Pos logit contributions
aith+1.398
ipe+1.334
 toxicity+1.305
lus+1.266
ARA+1.242
Neg logit contributions
���-2.018
abouts-1.718
 THESE-1.708
retty-1.663
 Zup-1.636
Token runs
Token ID4539
Feature activation+3.71e-02

Pos logit contributions
aith+0.372
ipe+0.359
lus+0.343
 toxicity+0.342
ARA+0.338
Neg logit contributions
���-0.481
abouts-0.430
 THESE-0.423
 wrists-0.415
 peas-0.406
Token.
Token ID13
Feature activation-1.42e-02

Pos logit contributions
���+0.084
abouts+0.074
 THESE+0.072
bable+0.069
retty+0.069
Neg logit contributions
aith-0.062
ipe-0.059
 toxicity-0.058
lus-0.057
ARA-0.056
Token
Token ID198
Feature activation-6.95e-02

Pos logit contributions
aith+0.028
ipe+0.027
 toxicity+0.026
lus+0.026
ARA+0.025
Neg logit contributions
���-0.028
 THESE-0.024
abouts-0.024
 Zup-0.023
retty-0.022
Token
Token ID198
Feature activation-4.13e-02

Pos logit contributions
aith+0.155
 toxicity+0.151
ipe+0.148
lus+0.143
 sympt+0.142
Neg logit contributions
 THESE-0.098
abouts-0.096
���-0.095
 Zup-0.090
bable-0.089
Token-
Token ID12
Feature activation+4.63e-01

Pos logit contributions
���+0.164
abouts+0.139
 THESE+0.139
retty+0.134
 Zup+0.133
Neg logit contributions
aith-0.108
ipe-0.102
 toxicity-0.100
lus-0.097
ARA-0.095
TokenPat
Token ID12130
Feature activation-1.04e-01

Pos logit contributions
���+0.990
abouts+0.856
 THESE+0.853
retty+0.808
 Zup+0.807
Neg logit contributions
aith-0.819
ipe-0.775
 toxicity-0.772
lus-0.741
 sympt-0.723
Tokenrices
Token ID45977
Feature activation+5.51e-02

Pos logit contributions
aith+0.191
ipe+0.182
 toxicity+0.179
lus+0.174
ARA+0.171
Neg logit contributions
���-0.223
abouts-0.185
 THESE-0.184
retty-0.179
 Zup-0.176
Token 3
Token ID513
Feature activation+3.67e-01

Pos logit contributions
���+0.124
abouts+0.104
 THESE+0.104
retty+0.100
 Zup+0.099
Neg logit contributions
aith-0.095
ipe-0.091
 toxicity-0.089
lus-0.087
ARA-0.085
Tokenway
Token ID1014
Feature activation-2.30e-01

Pos logit contributions
���+0.837
abouts+0.703
 THESE+0.701
retty+0.680
 Zup+0.670
Neg logit contributions
aith-0.623
ipe-0.596
 toxicity-0.582
lus-0.571
ARA-0.555
Token-
Token ID12
Feature activation+5.40e-01

Pos logit contributions
aith+0.413
ipe+0.396
 toxicity+0.387
lus+0.380
ARA+0.370
Neg logit contributions
���-0.503
abouts-0.418
 THESE-0.417
retty-0.403
 Zup-0.397
TokenAnt
Token ID13217
Feature activation+4.84e-02

Pos logit contributions
���+1.156
abouts+1.000
 THESE+0.994
retty+0.945
 Zup+0.941
Neg logit contributions
aith-0.952
ipe-0.900
 toxicity-0.898
lus-0.861
 sympt-0.841
Tokens
Token ID82
Feature activation+1.56e-01

Pos logit contributions
���+0.109
abouts+0.092
 THESE+0.091
retty+0.088
bable+0.087
Neg logit contributions
aith-0.083
ipe-0.079
 toxicity-0.078
lus-0.076
ARA-0.075
Token Party
Token ID3615
Feature activation-3.66e-01

Pos logit contributions
���+0.318
abouts+0.263
 THESE+0.263
retty+0.251
 Zup+0.250
Neg logit contributions
aith-0.298
ipe-0.287
 toxicity-0.280
lus-0.276
ARA-0.270
Tokenin
Token ID259
Feature activation+4.22e-02

Pos logit contributions
aith+0.545
ipe+0.518
 toxicity+0.502
lus+0.492
ARA+0.476
Neg logit contributions
���-0.916
abouts-0.780
 THESE-0.779
retty-0.758
 Zup-0.748
Token-
Token ID12
Feature activation+5.40e-01

Pos logit contributions
���+0.094
abouts+0.079
 THESE+0.079
retty+0.076
 Zup+0.075
Neg logit contributions
aith-0.074
ipe-0.071
 toxicity-0.069
lus-0.068
ARA-0.066
Token.
Token ID13
Feature activation+2.74e-02

Pos logit contributions
aith+0.181
ipe+0.173
 toxicity+0.168
lus+0.166
ARA+0.162
Neg logit contributions
���-0.243
abouts-0.209
 THESE-0.208
retty-0.201
 wrists-0.200
Token
Token ID198
Feature activation-1.22e-01

Pos logit contributions
���+0.054
 THESE+0.046
abouts+0.045
 Zup+0.044
retty+0.043
Neg logit contributions
aith-0.054
ipe-0.052
 toxicity-0.051
lus-0.049
ARA-0.048
Token
Token ID198
Feature activation-4.27e-02

Pos logit contributions
aith+0.277
 toxicity+0.266
ipe+0.261
 sympt+0.252
lus+0.251
Neg logit contributions
 THESE-0.171
abouts-0.169
���-0.168
 Zup-0.159
bable-0.155
TokenShop
Token ID29917
Feature activation+1.46e-01

Pos logit contributions
aith+0.079
 toxicity+0.076
ipe+0.075
lus+0.072
 sympt+0.071
Neg logit contributions
���-0.087
abouts-0.075
 THESE-0.075
 Zup-0.071
retty-0.071
Tokenify
Token ID1958
Feature activation-1.49e-01

Pos logit contributions
���+0.232
abouts+0.191
 THESE+0.188
bable+0.179
retty+0.178
Neg logit contributions
aith-0.340
ipe-0.324
 toxicity-0.324
lus-0.316
ARA-0.315
Token POS
Token ID28069
Feature activation+7.82e-01

Pos logit contributions
aith+0.261
ipe+0.246
 toxicity+0.240
lus+0.235
ARA+0.229
Neg logit contributions
���-0.335
 THESE-0.283
abouts-0.281
 Zup-0.272
retty-0.271
Token delivers
Token ID16316
Feature activation+5.76e-02

Pos logit contributions
���+1.725
 THESE+1.459
abouts+1.452
 Zup+1.411
 wrists+1.408
Neg logit contributions
aith-1.366
ipe-1.277
 toxicity-1.256
lus-1.230
ARA-1.193
Token in
Token ID287
Feature activation+9.06e-02

Pos logit contributions
���+0.131
 THESE+0.112
abouts+0.112
 wrists+0.108
 Zup+0.107
Neg logit contributions
aith-0.097
ipe-0.092
 toxicity-0.089
lus-0.088
ARA-0.086
Token-
Token ID12
Feature activation+2.79e-02

Pos logit contributions
���+0.222
 THESE+0.190
abouts+0.189
 Zup+0.185
retty+0.184
Neg logit contributions
aith-0.136
ipe-0.129
lus-0.125
 toxicity-0.124
ARA-0.119
Tokenstore
Token ID8095
Feature activation-1.12e-01

Pos logit contributions
���+0.048
abouts+0.039
 THESE+0.039
retty+0.037
bable+0.037
Neg logit contributions
aith-0.061
ipe-0.059
 toxicity-0.059
lus-0.057
ARA-0.056
Token and
Token ID290
Feature activation+7.66e-02

Pos logit contributions
aith+0.211
ipe+0.202
lus+0.197
 toxicity+0.196
ARA+0.191
Neg logit contributions
���-0.226
abouts-0.192
 THESE-0.191
 wrists-0.185
 peas-0.184
Token in
Token ID287
Feature activation+2.85e-02

Pos logit contributions
���+0.186
abouts+0.160
 THESE+0.157
 Zup+0.152
 peas+0.151
Neg logit contributions
aith-0.155
ipe-0.150
lus-0.145
 toxicity-0.144
ARA-0.142
Token its
Token ID663
Feature activation+1.21e-01

Pos logit contributions
���+0.070
 THESE+0.061
abouts+0.060
 Zup+0.059
 peas+0.058
Neg logit contributions
aith-0.042
ipe-0.041
lus-0.039
 toxicity-0.038
NL-0.037
Token own
Token ID898
Feature activation+1.30e-01

Pos logit contributions
���+0.270
 THESE+0.231
abouts+0.228
 wrists+0.223
 peas+0.222
Neg logit contributions
aith-0.206
ipe-0.199
lus-0.192
ARA-0.186
 toxicity-0.186
Token long
Token ID890
Feature activation-5.87e-02

Pos logit contributions
���+0.287
 THESE+0.251
abouts+0.249
 wrists+0.244
 peas+0.243
Neg logit contributions
aith-0.215
ipe-0.209
lus-0.201
 toxicity-0.195
NL-0.192
Token-
Token ID12
Feature activation+3.38e-02

Pos logit contributions
aith+0.088
ipe+0.084
lus+0.081
 toxicity+0.081
ARA+0.078
Neg logit contributions
���-0.143
 THESE-0.123
abouts-0.123
retty-0.120
 wrists-0.118
Tokenterm
Token ID4354
Feature activation+6.63e-01

Pos logit contributions
���+0.076
abouts+0.066
 THESE+0.065
retty+0.063
bable+0.063
Neg logit contributions
aith-0.056
ipe-0.054
 toxicity-0.053
lus-0.052
ARA-0.051
Token interest
Token ID1393
Feature activation+1.53e-01

Pos logit contributions
���+1.203
 THESE+1.029
abouts+1.027
 wrists+0.995
 peas+0.990
Neg logit contributions
aith-1.314
ipe-1.306
lus-1.274
ARA-1.235
NL-1.232
Token.
Token ID13
Feature activation-2.09e-02

Pos logit contributions
���+0.329
abouts+0.286
 THESE+0.280
 Zup+0.271
 peas+0.267
Neg logit contributions
aith-0.267
ipe-0.259
lus-0.251
 toxicity-0.249
ARA-0.245
Token It
Token ID632
Feature activation-1.03e-02

Pos logit contributions
aith+0.040
ipe+0.038
 toxicity+0.038
lus+0.036
ARA+0.035
Neg logit contributions
���-0.042
 THESE-0.037
abouts-0.036
 Zup-0.035
 Shining-0.034
Token seized
Token ID12000
Feature activation+9.34e-02

Pos logit contributions
aith+0.018
ipe+0.017
 toxicity+0.017
lus+0.017
ARA+0.016
Neg logit contributions
���-0.023
abouts-0.020
 THESE-0.020
 wrists-0.019
 peas-0.019
Token control
Token ID1630
Feature activation+1.20e-01

Pos logit contributions
���+0.214
 THESE+0.183
abouts+0.182
 Zup+0.177
 peas+0.176
Neg logit contributions
aith-0.155
ipe-0.147
 toxicity-0.142
lus-0.141
ARA-0.138
Token attitudes
Token ID14479
Feature activation+1.54e-01

Pos logit contributions
aith+0.010
ipe+0.010
lus+0.010
 toxicity+0.009
ARA+0.009
Neg logit contributions
���-0.014
abouts-0.012
 THESE-0.012
 wrists-0.012
 peas-0.012
Token,
Token ID11
Feature activation+2.52e-02

Pos logit contributions
���+0.330
abouts+0.290
 THESE+0.287
retty+0.272
 Zup+0.270
Neg logit contributions
aith-0.275
ipe-0.258
 toxicity-0.251
lus-0.251
ARA-0.245
Token but
Token ID475
Feature activation+3.76e-02

Pos logit contributions
���+0.056
 THESE+0.049
abouts+0.048
 wrists+0.046
retty+0.046
Neg logit contributions
aith-0.043
ipe-0.041
lus-0.039
 toxicity-0.039
ARA-0.038
Token it
Token ID340
Feature activation+8.97e-02

Pos logit contributions
���+0.080
 THESE+0.071
abouts+0.069
 Zup+0.067
 wrists+0.066
Neg logit contributions
aith-0.067
ipe-0.064
lus-0.062
 toxicity-0.061
ARA-0.059
Token seems
Token ID2331
Feature activation+2.24e-01

Pos logit contributions
���+0.179
 THESE+0.156
abouts+0.153
 wrists+0.147
 peas+0.147
Neg logit contributions
aith-0.171
ipe-0.166
lus-0.159
 toxicity-0.157
ARA-0.157
Token rude
Token ID22066
Feature activation+6.29e-01

Pos logit contributions
���+0.471
 THESE+0.406
abouts+0.395
 Zup+0.379
 peas+0.378
Neg logit contributions
aith-0.416
ipe-0.399
lus-0.381
 toxicity-0.377
ARA-0.370
Token to
Token ID284
Feature activation+6.51e-02

Pos logit contributions
���+1.297
 THESE+1.127
abouts+1.113
retty+1.067
 Zup+1.063
Neg logit contributions
aith-1.145
ipe-1.098
 toxicity-1.066
lus-1.060
ARA-1.032
Token send
Token ID3758
Feature activation-2.43e-01

Pos logit contributions
���+0.140
 THESE+0.126
abouts+0.121
 Zup+0.119
 wrists+0.117
Neg logit contributions
aith-0.115
ipe-0.107
lus-0.105
 toxicity-0.102
ARA-0.101
Token someone
Token ID2130
Feature activation+4.30e-02

Pos logit contributions
aith+0.402
ipe+0.378
lus+0.367
 toxicity+0.359
ARA+0.354
Neg logit contributions
���-0.560
 THESE-0.494
abouts-0.480
 Zup-0.465
 wrists-0.462
Token a
Token ID257
Feature activation+5.35e-03

Pos logit contributions
���+0.090
 THESE+0.080
abouts+0.078
 Zup+0.075
 wrists+0.075
Neg logit contributions
aith-0.078
ipe-0.074
lus-0.072
 toxicity-0.070
ARA-0.070
Token link
Token ID2792
Feature activation-1.06e-01

Pos logit contributions
���+0.013
 THESE+0.011
abouts+0.011
 Zup+0.011
 wrists+0.011
Neg logit contributions
aith-0.008
ipe-0.008
lus-0.007
 toxicity-0.007
ARA-0.007
Token the
Token ID262
Feature activation-4.31e-02

Pos logit contributions
aith+0.086
ipe+0.082
lus+0.079
 toxicity+0.079
ARA+0.078
Neg logit contributions
���-0.093
 THESE-0.079
abouts-0.078
 Zup-0.075
 peas-0.075
Token traditional
Token ID4569
Feature activation+3.84e-01

Pos logit contributions
aith+0.079
ipe+0.075
lus+0.073
 toxicity+0.072
ARA+0.072
Neg logit contributions
���-0.091
 THESE-0.077
abouts-0.076
 peas-0.075
 Zup-0.074
Token flavors
Token ID17361
Feature activation-5.03e-02

Pos logit contributions
���+0.634
 THESE+0.523
abouts+0.522
 peas+0.503
 Zup+0.499
Neg logit contributions
aith-0.855
ipe-0.822
lus-0.803
 toxicity-0.800
ARA-0.794
Token of
Token ID286
Feature activation+3.85e-02

Pos logit contributions
aith+0.087
ipe+0.082
 toxicity+0.081
lus+0.080
ARA+0.078
Neg logit contributions
���-0.112
abouts-0.095
 THESE-0.095
retty-0.092
 Zup-0.091
Token Vietnamese
Token ID23618
Feature activation+2.29e-01

Pos logit contributions
���+0.061
 THESE+0.050
abouts+0.049
 peas+0.047
 Zup+0.046
Neg logit contributions
aith-0.091
ipe-0.087
 toxicity-0.085
lus-0.085
ARA-0.084
Token cuisine
Token ID33072
Feature activation+6.46e-01

Pos logit contributions
���+0.492
 THESE+0.420
abouts+0.420
 peas+0.409
 Zup+0.401
Neg logit contributions
aith-0.408
ipe-0.385
 toxicity-0.375
lus-0.373
ARA-0.368
Token in
Token ID287
Feature activation-1.45e-02

Pos logit contributions
���+1.463
 THESE+1.270
abouts+1.269
retty+1.211
 Zup+1.201
Neg logit contributions
aith-1.071
ipe-1.010
 toxicity-0.988
lus-0.979
ARA-0.957
Token New
Token ID968
Feature activation-1.45e-01

Pos logit contributions
aith+0.027
ipe+0.027
lus+0.025
 toxicity+0.025
ARA+0.025
Neg logit contributions
���-0.030
 THESE-0.026
abouts-0.025
 Zup-0.024
 peas-0.024
Token Orleans
Token ID12255
Feature activation+2.58e-01

Pos logit contributions
aith+0.357
ipe+0.345
 toxicity+0.339
lus+0.334
ARA+0.331
Neg logit contributions
���-0.217
 THESE-0.167
abouts-0.166
retty-0.158
 Zup-0.155
Token.
Token ID13
Feature activation-1.23e-01

Pos logit contributions
���+0.587
abouts+0.510
 THESE+0.506
retty+0.487
 Zup+0.480
Neg logit contributions
aith-0.430
ipe-0.404
 toxicity-0.397
lus-0.390
ARA-0.381
Token To
Token ID1675
Feature activation-1.81e-01

Pos logit contributions
aith+0.237
ipe+0.226
 toxicity+0.222
lus+0.217
ARA+0.211
Neg logit contributions
���-0.248
 THESE-0.214
abouts-0.210
 Zup-0.205
retty-0.199
Token to
Token ID284
Feature activation+3.55e-02

Pos logit contributions
aith+0.164
ipe+0.157
lus+0.149
 toxicity+0.146
ARA+0.144
Neg logit contributions
���-0.225
 THESE-0.197
abouts-0.194
 Zup-0.186
retty-0.186
Token accept
Token ID2453
Feature activation-3.57e-02

Pos logit contributions
���+0.071
 THESE+0.060
abouts+0.060
 Zup+0.057
 peas+0.056
Neg logit contributions
aith-0.070
ipe-0.067
lus-0.065
 toxicity-0.065
ARA-0.063
Token a
Token ID257
Feature activation-4.40e-02

Pos logit contributions
aith+0.060
ipe+0.058
lus+0.055
 toxicity+0.054
ARA+0.053
Neg logit contributions
���-0.080
 THESE-0.070
abouts-0.068
 Zup-0.067
retty-0.066
Token complaint
Token ID8224
Feature activation+3.36e-01

Pos logit contributions
aith+0.082
ipe+0.079
lus+0.076
 toxicity+0.075
ARA+0.073
Neg logit contributions
���-0.091
 THESE-0.077
abouts-0.076
 Zup-0.074
retty-0.073
Token.
Token ID13
Feature activation+3.41e-02

Pos logit contributions
���+0.719
abouts+0.624
 THESE+0.621
 Zup+0.597
retty+0.590
Neg logit contributions
aith-0.592
ipe-0.566
lus-0.546
 toxicity-0.546
ARA-0.536
Token
Token ID198
Feature activation+1.22e-01

Pos logit contributions
���+0.070
 THESE+0.060
abouts+0.059
 Zup+0.057
retty+0.056
Neg logit contributions
aith-0.065
ipe-0.062
 toxicity-0.061
lus-0.059
ARA-0.057
Token
Token ID198
Feature activation+1.16e-01

Pos logit contributions
 THESE+0.165
abouts+0.163
 Zup+0.153
���+0.153
bable+0.147
Neg logit contributions
aith-0.279
 toxicity-0.269
ipe-0.264
 sympt-0.255
lus-0.254
TokenFive
Token ID20029
Feature activation+1.03e-01

Pos logit contributions
���+0.239
abouts+0.210
 THESE+0.210
retty+0.201
 Zup+0.200
Neg logit contributions
aith-0.207
ipe-0.197
 toxicity-0.196
lus-0.189
 sympt-0.183
Token community
Token ID2055
Feature activation+2.12e-01

Pos logit contributions
���+0.236
 THESE+0.201
abouts+0.199
retty+0.196
 peas+0.194
Neg logit contributions
aith-0.170
ipe-0.158
lus-0.156
 toxicity-0.155
ARA-0.151
Token groups
Token ID2628
Feature activation+1.42e-01

Pos logit contributions
���+0.516
abouts+0.450
 THESE+0.449
 Zup+0.437
retty+0.434
Neg logit contributions
aith-0.307
ipe-0.294
lus-0.285
ARA-0.275
 toxicity-0.273
Token sued
Token ID16334
Feature activation+2.40e-01

Pos logit contributions
���+0.313
abouts+0.271
 THESE+0.269
 Zup+0.259
retty+0.256
Neg logit contributions
aith-0.243
ipe-0.234
 toxicity-0.228
lus-0.224
ARA-0.218
Token We
Token ID775
Feature activation-1.20e-01

Pos logit contributions
aith+0.170
ipe+0.162
 toxicity+0.159
lus+0.155
ARA+0.150
Neg logit contributions
���-0.166
 THESE-0.147
abouts-0.139
 Zup-0.138
 Shining-0.134
Token are
Token ID389
Feature activation+1.58e-01

Pos logit contributions
aith+0.215
ipe+0.206
lus+0.200
 toxicity+0.200
ARA+0.199
Neg logit contributions
���-0.257
 THESE-0.217
abouts-0.217
 peas-0.208
 wrists-0.208
Token in
Token ID287
Feature activation+2.99e-02

Pos logit contributions
���+0.332
 THESE+0.278
abouts+0.278
 wrists+0.269
 peas+0.266
Neg logit contributions
aith-0.294
ipe-0.287
lus-0.274
 toxicity-0.273
ARA-0.267
Token the
Token ID262
Feature activation-1.67e-02

Pos logit contributions
���+0.072
 THESE+0.062
abouts+0.061
retty+0.060
bable+0.059
Neg logit contributions
aith-0.046
ipe-0.044
 toxicity-0.042
lus-0.042
ARA-0.041
Token process
Token ID1429
Feature activation+1.83e-01

Pos logit contributions
aith+0.025
ipe+0.024
 toxicity+0.023
lus+0.022
ARA+0.022
Neg logit contributions
���-0.040
 THESE-0.035
abouts-0.035
 wrists-0.034
retty-0.034
Token of
Token ID286
Feature activation+2.43e-02

Pos logit contributions
���+0.365
abouts+0.319
 THESE+0.314
 Zup+0.302
retty+0.301
Neg logit contributions
aith-0.350
ipe-0.329
 toxicity-0.323
lus-0.317
ARA-0.316
Token reim
Token ID21123
Feature activation-1.25e-01

Pos logit contributions
���+0.051
 THESE+0.042
abouts+0.042
 peas+0.040
 wrists+0.040
Neg logit contributions
aith-0.046
ipe-0.044
lus-0.043
 toxicity-0.042
ARA-0.042
Tokenplement
Token ID26908
Feature activation+7.90e-01

Pos logit contributions
aith+0.274
ipe+0.264
 toxicity+0.260
lus+0.255
ARA+0.250
Neg logit contributions
���-0.222
abouts-0.177
 THESE-0.176
retty-0.170
 Zup-0.165
Tokening
Token ID278
Feature activation+1.47e-01

Pos logit contributions
���+1.455
 THESE+1.181
abouts+1.173
retty+1.118
bable+1.107
Neg logit contributions
aith-1.667
ipe-1.609
lus-1.568
 toxicity-1.568
ARA-1.523
Token our
Token ID674
Feature activation+7.67e-03

Pos logit contributions
���+0.329
 THESE+0.278
abouts+0.275
retty+0.265
 peas+0.264
Neg logit contributions
aith-0.258
ipe-0.243
 toxicity-0.236
lus-0.235
ARA-0.231
Token message
Token ID3275
Feature activation-8.87e-02

Pos logit contributions
���+0.018
 THESE+0.015
abouts+0.015
retty+0.015
 peas+0.015
Neg logit contributions
aith-0.012
ipe-0.011
lus-0.011
 toxicity-0.011
ARA-0.011
Token The
Token ID383
Feature activation+9.77e-02

Pos logit contributions
���+0.178
 THESE+0.154
abouts+0.147
 Zup+0.142
 wrists+0.142
Neg logit contributions
aith-0.170
ipe-0.161
 toxicity-0.159
lus-0.156
ARA-0.154
Token number
Token ID1271
Feature activation+8.48e-02

Pos logit contributions
���+0.217
 THESE+0.190
abouts+0.185
 wrists+0.181
 peas+0.180
Neg logit contributions
aith-0.167
ipe-0.156
lus-0.154
 toxicity-0.152
ARA-0.150
Token of
Token ID286
Feature activation-3.60e-02

Pos logit contributions
���+0.184
 THESE+0.158
abouts+0.155
 Zup+0.149
 peas+0.149
Neg logit contributions
aith-0.150
ipe-0.143
 toxicity-0.140
lus-0.140
ARA-0.136
Token lines
Token ID3951
Feature activation-2.12e-01

Pos logit contributions
aith+0.069
ipe+0.066
lus+0.065
 toxicity+0.064
ARA+0.062
Neg logit contributions
���-0.074
 THESE-0.063
abouts-0.061
 wrists-0.059
 peas-0.059
Token that
Token ID326
Feature activation+1.02e-01

Pos logit contributions
aith+0.384
ipe+0.365
lus+0.359
 toxicity+0.358
ARA+0.345
Neg logit contributions
���-0.456
 THESE-0.389
abouts-0.383
 wrists-0.368
 peas-0.367
Token are
Token ID389
Feature activation+2.50e-02

Pos logit contributions
���+0.230
 THESE+0.200
abouts+0.193
 wrists+0.187
 Zup+0.187
Neg logit contributions
aith-0.177
ipe-0.167
lus-0.162
 toxicity-0.162
ARA-0.156
Token checked
Token ID10667
Feature activation+9.82e-03

Pos logit contributions
���+0.053
 THESE+0.045
abouts+0.044
 wrists+0.042
 peas+0.042
Neg logit contributions
aith-0.047
ipe-0.044
lus-0.044
 toxicity-0.043
ARA-0.041
Token can
Token ID460
Feature activation-1.70e-01

Pos logit contributions
���+0.020
 THESE+0.018
abouts+0.018
 wrists+0.017
 peas+0.017
Neg logit contributions
aith-0.018
ipe-0.017
 toxicity-0.017
lus-0.017
ARA-0.016
Token be
Token ID307
Feature activation-2.65e-02

Pos logit contributions
aith+0.292
ipe+0.277
 toxicity+0.270
lus+0.266
ARA+0.260
Neg logit contributions
���-0.387
abouts-0.326
 THESE-0.326
retty-0.313
 Zup-0.312
Token set
Token ID900
Feature activation-3.19e-01

Pos logit contributions
aith+0.049
ipe+0.046
lus+0.045
 toxicity+0.044
ARA+0.043
Neg logit contributions
���-0.056
 THESE-0.048
abouts-0.048
 peas-0.045
 wrists-0.045
Token with
Token ID351
Feature activation-1.79e-02

Pos logit contributions
aith+0.642
ipe+0.608
 toxicity+0.596
lus+0.588
ARA+0.577
Neg logit contributions
���-0.615
 THESE-0.525
abouts-0.522
 wrists-0.494
 Zup-0.493
Token the
Token ID262
Feature activation+3.16e-02

Pos logit contributions
���+0.050
 THESE+0.044
abouts+0.044
retty+0.042
 peas+0.042
Neg logit contributions
aith-0.035
ipe-0.034
 toxicity-0.032
lus-0.032
ARA-0.031
Token camera
Token ID4676
Feature activation+1.17e-02

Pos logit contributions
���+0.069
 THESE+0.060
abouts+0.060
 wrists+0.058
 peas+0.057
Neg logit contributions
aith-0.054
ipe-0.052
lus-0.050
 toxicity-0.049
ARA-0.049
Token,
Token ID11
Feature activation-5.82e-02

Pos logit contributions
���+0.026
 THESE+0.022
abouts+0.022
retty+0.021
 wrists+0.021
Neg logit contributions
aith-0.020
ipe-0.019
 toxicity-0.018
lus-0.018
ARA-0.018
Token causing
Token ID6666
Feature activation+1.61e-01

Pos logit contributions
aith+0.102
ipe+0.097
lus+0.093
 toxicity+0.093
ARA+0.091
Neg logit contributions
���-0.128
 THESE-0.109
abouts-0.107
 wrists-0.104
retty-0.104
Token Aut
Token ID5231
Feature activation+1.94e-01

Pos logit contributions
���+0.363
 THESE+0.315
abouts+0.313
 peas+0.302
 wrists+0.302
Neg logit contributions
aith-0.268
ipe-0.254
lus-0.242
ARA-0.238
 toxicity-0.238
Tokenon
Token ID261
Feature activation+1.36e-02

Pos logit contributions
���+0.482
abouts+0.437
 THESE+0.427
retty+0.423
bable+0.422
Neg logit contributions
aith-0.257
 toxicity-0.245
ipe-0.241
ARA-0.230
lus-0.229
Tokenav
Token ID615
Feature activation+1.88e-02

Pos logit contributions
���+0.030
 THESE+0.025
abouts+0.024
retty+0.024
 Zup+0.023
Neg logit contributions
aith-0.025
ipe-0.024
 toxicity-0.023
lus-0.023
ARA-0.022
Token to
Token ID284
Feature activation-4.32e-03

Pos logit contributions
���+0.042
abouts+0.036
 THESE+0.035
retty+0.034
 wrists+0.034
Neg logit contributions
aith-0.032
ipe-0.031
 toxicity-0.029
lus-0.029
ARA-0.029
Token mis
Token ID2984
Feature activation-2.67e-01

Pos logit contributions
aith+0.007
ipe+0.007
lus+0.007
 toxicity+0.007
ARA+0.007
Neg logit contributions
���-0.010
 THESE-0.008
abouts-0.008
 peas-0.008
 Zup-0.008
Tokenident
Token ID738
Feature activation+3.07e-01

Pos logit contributions
aith+0.552
 toxicity+0.529
ipe+0.529
lus+0.514
ARA+0.508
Neg logit contributions
���-0.499
abouts-0.414
 THESE-0.407
retty-0.392
bable-0.387
Tokenify
Token ID1958
Feature activation-1.83e-01

Pos logit contributions
���+0.415
 THESE+0.362
abouts+0.349
 Zup+0.348
bable+0.345
Neg logit contributions
aith-0.731
 toxicity-0.707
ARA-0.691
lus-0.685
 sympt-0.677
Token of
Token ID286
Feature activation+8.35e-02

Pos logit contributions
���+0.690
abouts+0.586
 THESE+0.578
retty+0.558
 Zup+0.554
Neg logit contributions
aith-0.560
ipe-0.535
 toxicity-0.518
lus-0.512
ARA-0.503
Token 5
Token ID642
Feature activation+7.52e-02

Pos logit contributions
���+0.171
abouts+0.151
 THESE+0.148
 peas+0.142
 wrists+0.142
Neg logit contributions
aith-0.154
ipe-0.148
lus-0.142
 toxicity-0.141
ARA-0.140
Token.
Token ID13
Feature activation+9.42e-02

Pos logit contributions
���+0.166
abouts+0.146
 THESE+0.143
retty+0.139
 wrists+0.137
Neg logit contributions
aith-0.127
ipe-0.122
 toxicity-0.120
lus-0.119
ARA-0.115
Token37
Token ID2718
Feature activation-8.86e-02

Pos logit contributions
���+0.197
 THESE+0.173
abouts+0.170
 Zup+0.165
retty+0.161
Neg logit contributions
aith-0.171
ipe-0.163
 toxicity-0.162
lus-0.157
 sympt-0.150
Token an
Token ID281
Feature activation+9.65e-03

Pos logit contributions
aith+0.148
ipe+0.142
 toxicity+0.139
lus+0.138
ARA+0.133
Neg logit contributions
���-0.200
abouts-0.173
 THESE-0.171
retty-0.165
bable-0.164
Token over
Token ID625
Feature activation+3.23e-02

Pos logit contributions
���+0.020
abouts+0.016
 THESE+0.016
 peas+0.015
retty+0.015
Neg logit contributions
aith-0.019
ipe-0.018
 toxicity-0.018
lus-0.018
ARA-0.017
Token.
Token ID13
Feature activation+6.58e-02

Pos logit contributions
���+0.073
abouts+0.064
 THESE+0.063
retty+0.061
bable+0.060
Neg logit contributions
aith-0.054
ipe-0.051
 toxicity-0.050
lus-0.049
ARA-0.048
Token
Token ID198
Feature activation+1.66e-02

Pos logit contributions
���+0.129
 THESE+0.110
abouts+0.108
 Zup+0.104
retty+0.103
Neg logit contributions
aith-0.130
ipe-0.125
 toxicity-0.123
lus-0.119
ARA-0.116
Token
Token ID198
Feature activation+1.68e-02

Pos logit contributions
 THESE+0.023
abouts+0.022
 Zup+0.021
���+0.021
 Shining+0.020
Neg logit contributions
aith-0.038
 toxicity-0.037
ipe-0.036
 sympt-0.035
lus-0.035
Token
Token ID447
Feature activation+3.76e-02

Pos logit contributions
���+0.034
 THESE+0.030
abouts+0.030
 Zup+0.029
bable+0.028
Neg logit contributions
aith-0.031
 toxicity-0.029
ipe-0.029
lus-0.028
 sympt-0.027
Token
Token ID246
Feature activation-3.82e-01

Pos logit contributions
���+0.082
 THESE+0.071
abouts+0.069
retty+0.067
 peas+0.067
Neg logit contributions
aith-0.066
ipe-0.063
 toxicity-0.063
ARA-0.060
lus-0.059
Token of
Token ID286
Feature activation-4.36e-02

Pos logit contributions
���+0.007
abouts+0.006
 THESE+0.006
 Zup+0.006
retty+0.006
Neg logit contributions
aith-0.006
ipe-0.005
lus-0.005
 toxicity-0.005
ARA-0.005
Token coward
Token ID26769
Feature activation+2.25e-01

Pos logit contributions
aith+0.073
ipe+0.071
lus+0.068
ARA+0.067
 toxicity+0.067
Neg logit contributions
���-0.098
 THESE-0.084
abouts-0.084
 wrists-0.080
 peas-0.080
Tokenice
Token ID501
Feature activation-5.03e-02

Pos logit contributions
���+0.287
abouts+0.221
 THESE+0.217
retty+0.210
bable+0.206
Neg logit contributions
aith-0.578
ipe-0.567
 toxicity-0.557
lus-0.555
ARA-0.554
Token on
Token ID319
Feature activation+3.52e-03

Pos logit contributions
aith+0.089
ipe+0.085
lus+0.083
 toxicity+0.082
ARA+0.081
Neg logit contributions
���-0.107
abouts-0.095
 THESE-0.093
 peas-0.089
 wrists-0.088
Token social
Token ID1919
Feature activation-3.32e-01

Pos logit contributions
���+0.008
abouts+0.007
 THESE+0.007
 wrists+0.007
retty+0.007
Neg logit contributions
aith-0.006
ipe-0.005
 toxicity-0.005
lus-0.005
ARA-0.005
Token media
Token ID2056
Feature activation-4.29e-01

Pos logit contributions
aith+0.474
ipe+0.447
lus+0.430
 toxicity+0.430
ARA+0.415
Neg logit contributions
���-0.841
 THESE-0.726
abouts-0.725
retty-0.703
 wrists-0.698
Token.
Token ID13
Feature activation+7.03e-02

Pos logit contributions
aith+0.726
ipe+0.688
lus+0.668
 toxicity+0.663
ARA+0.657
Neg logit contributions
���-0.959
abouts-0.830
 THESE-0.829
retty-0.794
 wrists-0.794
Token
Token ID198
Feature activation+3.96e-02

Pos logit contributions
���+0.139
 THESE+0.118
abouts+0.117
 Zup+0.112
retty+0.110
Neg logit contributions
aith-0.139
ipe-0.133
 toxicity-0.130
lus-0.127
ARA-0.125
Token
Token ID198
Feature activation+5.03e-02

Pos logit contributions
 THESE+0.056
abouts+0.055
���+0.054
 Zup+0.051
bable+0.050
Neg logit contributions
aith-0.089
 toxicity-0.086
ipe-0.084
lus-0.081
 sympt-0.080
TokenBut
Token ID1537
Feature activation-2.18e-02

Pos logit contributions
���+0.099
abouts+0.088
 THESE+0.088
retty+0.083
bable+0.083
Neg logit contributions
aith-0.093
 toxicity-0.090
ipe-0.089
lus-0.085
 sympt-0.083
Token what
Token ID644
Feature activation-1.11e-01

Pos logit contributions
aith+0.037
ipe+0.035
lus+0.034
ARA+0.034
 toxicity+0.033
Neg logit contributions
���-0.049
 THESE-0.043
abouts-0.042
 peas-0.041
 wrists-0.041
Token pres
Token ID906
Feature activation+4.84e-01

Pos logit contributions
���+0.453
 THESE+0.388
abouts+0.384
retty+0.375
 peas+0.373
Neg logit contributions
aith-0.288
ipe-0.269
 toxicity-0.266
ARA-0.255
lus-0.254
Token­
Token ID3907
Feature activation+1.25e-01

Pos logit contributions
���+1.012
abouts+0.890
 THESE+0.873
bable+0.855
 peas+0.854
Neg logit contributions
aith-0.845
 toxicity-0.809
ipe-0.793
ARA-0.778
lus-0.747
Tokenid
Token ID312
Feature activation+1.70e-01

Pos logit contributions
���+0.231
abouts+0.199
 THESE+0.193
retty+0.188
bable+0.188
Neg logit contributions
aith-0.252
 toxicity-0.245
ipe-0.239
ARA-0.239
lus-0.233
Token­
Token ID3907
Feature activation+1.39e-01

Pos logit contributions
���+0.341
abouts+0.306
 THESE+0.306
bable+0.299
 peas+0.295
Neg logit contributions
aith-0.303
 toxicity-0.293
ipe-0.281
ARA-0.275
lus-0.272
Tokenent
Token ID298
Feature activation-1.57e-01

Pos logit contributions
���+0.295
abouts+0.270
 THESE+0.265
bable+0.262
retty+0.254
Neg logit contributions
aith-0.229
 toxicity-0.225
ipe-0.216
ARA-0.213
 sympt-0.210
Token since
Token ID1201
Feature activation-8.67e-02

Pos logit contributions
aith+0.249
ipe+0.236
 toxicity+0.232
lus+0.226
ARA+0.222
Neg logit contributions
���-0.370
abouts-0.320
 THESE-0.319
retty-0.305
 Zup-0.303
Token at
Token ID379
Feature activation-3.04e-02

Pos logit contributions
aith+0.156
ipe+0.149
 toxicity+0.145
lus+0.141
ARA+0.140
Neg logit contributions
���-0.190
 THESE-0.159
abouts-0.159
 peas-0.152
retty-0.151
Token least
Token ID1551
Feature activation+3.47e-01

Pos logit contributions
aith+0.053
ipe+0.051
 toxicity+0.050
lus+0.048
ARA+0.048
Neg logit contributions
���-0.067
abouts-0.056
 THESE-0.056
retty-0.054
 wrists-0.053
Token Jimmy
Token ID12963
Feature activation+2.64e-02

Pos logit contributions
���+0.782
 THESE+0.669
abouts+0.660
 peas+0.643
retty+0.633
Neg logit contributions
aith-0.595
ipe-0.565
 toxicity-0.551
ARA-0.529
lus-0.529
Token Carter
Token ID10831
Feature activation+2.53e-02

Pos logit contributions
���+0.056
 THESE+0.046
abouts+0.046
retty+0.045
 Zup+0.044
Neg logit contributions
aith-0.049
ipe-0.047
 toxicity-0.046
lus-0.045
ARA-0.044
Token,
Token ID11
Feature activation-3.16e-02

Pos logit contributions
���+0.060
abouts+0.051
 THESE+0.051
retty+0.049
 peas+0.049
Neg logit contributions
aith-0.041
ipe-0.038
 toxicity-0.038
lus-0.037
ARA-0.036
Token were
Token ID547
Feature activation+1.20e-01

Pos logit contributions
aith+0.189
ipe+0.182
 toxicity+0.175
lus+0.174
ARA+0.171
Neg logit contributions
���-0.212
abouts-0.183
 THESE-0.181
 wrists-0.172
 peas-0.172
Token targeted
Token ID7977
Feature activation+4.57e-02

Pos logit contributions
���+0.258
abouts+0.224
 THESE+0.222
 peas+0.211
 wrists+0.211
Neg logit contributions
aith-0.213
ipe-0.207
lus-0.198
 toxicity-0.196
ARA-0.191
Token.
Token ID13
Feature activation+9.59e-02

Pos logit contributions
���+0.102
abouts+0.090
 THESE+0.089
retty+0.085
 wrists+0.084
Neg logit contributions
aith-0.077
ipe-0.074
 toxicity-0.072
lus-0.070
ARA-0.069
Token The
Token ID383
Feature activation+1.83e-01

Pos logit contributions
���+0.190
 THESE+0.163
abouts+0.160
 Zup+0.153
retty+0.151
Neg logit contributions
aith-0.189
ipe-0.181
 toxicity-0.178
lus-0.173
ARA-0.168
Token cops
Token ID14073
Feature activation-6.04e-02

Pos logit contributions
���+0.396
abouts+0.336
 THESE+0.335
 wrists+0.325
 peas+0.321
Neg logit contributions
aith-0.324
ipe-0.312
lus-0.295
 toxicity-0.295
ARA-0.291
Token
Token ID851
Feature activation-7.22e-02

Pos logit contributions
aith+0.104
ipe+0.100
 toxicity+0.097
lus+0.096
ARA+0.095
Neg logit contributions
���-0.132
abouts-0.115
 THESE-0.114
 wrists-0.110
 Zup-0.109
Token identified
Token ID5174
Feature activation-4.83e-03

Pos logit contributions
aith+0.131
ipe+0.125
 toxicity+0.122
lus+0.120
ARA+0.117
Neg logit contributions
���-0.154
abouts-0.131
 THESE-0.131
retty-0.126
 wrists-0.124
Token as
Token ID355
Feature activation+7.77e-02

Pos logit contributions
aith+0.009
ipe+0.008
 toxicity+0.008
lus+0.008
ARA+0.008
Neg logit contributions
���-0.011
abouts-0.009
 THESE-0.009
retty-0.009
 Zup-0.009
Token Joshua
Token ID20700
Feature activation+3.84e-01

Pos logit contributions
���+0.159
abouts+0.133
 THESE+0.133
retty+0.127
 Zup+0.127
Neg logit contributions
aith-0.149
ipe-0.143
 toxicity-0.139
lus-0.139
ARA-0.134
Token E
Token ID412
Feature activation+8.58e-02

Pos logit contributions
���+0.775
abouts+0.648
 THESE+0.646
 Zup+0.632
retty+0.620
Neg logit contributions
aith-0.732
ipe-0.696
 toxicity-0.690
lus-0.676
ARA-0.660
Tokenern
Token ID1142
Feature activation+2.76e-01

Pos logit contributions
���+0.190
abouts+0.160
 THESE+0.159
retty+0.155
 Zup+0.153
Neg logit contributions
aith-0.149
ipe-0.142
 toxicity-0.141
lus-0.137
ARA-0.133
Token companies
Token ID2706
Feature activation+3.39e-02

Pos logit contributions
aith+0.072
ipe+0.069
lus+0.065
 toxicity+0.064
ARA+0.062
Neg logit contributions
���-0.111
 THESE-0.096
abouts-0.095
 wrists-0.092
 peas-0.092
Token were
Token ID547
Feature activation+1.09e-01

Pos logit contributions
���+0.074
abouts+0.062
 THESE+0.062
 Zup+0.060
 peas+0.059
Neg logit contributions
aith-0.060
ipe-0.058
 toxicity-0.056
lus-0.055
ARA-0.054
Token granted
Token ID7520
Feature activation+3.28e-01

Pos logit contributions
���+0.232
abouts+0.195
 THESE+0.194
 wrists+0.186
 peas+0.186
Neg logit contributions
aith-0.200
ipe-0.193
 toxicity-0.187
lus-0.185
ARA-0.177
Token retro
Token ID12175
Feature activation-5.30e-02

Pos logit contributions
���+0.844
 THESE+0.735
abouts+0.734
retty+0.707
 wrists+0.705
Neg logit contributions
aith-0.450
ipe-0.431
 toxicity-0.407
lus-0.406
NL-0.379
Tokenactive
Token ID5275
Feature activation+2.85e-01

Pos logit contributions
aith+0.101
ipe+0.097
 toxicity+0.095
lus+0.094
ARA+0.091
Neg logit contributions
���-0.110
abouts-0.091
 THESE-0.090
retty-0.088
bable-0.087
Token immunity
Token ID16989
Feature activation-5.13e-01

Pos logit contributions
���+0.876
abouts+0.787
 THESE+0.783
 peas+0.766
 wrists+0.764
Neg logit contributions
aith-0.242
ipe-0.221
lus-0.205
 toxicity-0.201
ARA-0.183
Token for
Token ID329
Feature activation+2.58e-02

Pos logit contributions
aith+0.934
ipe+0.894
 toxicity+0.867
lus+0.858
ARA+0.841
Neg logit contributions
���-1.093
abouts-0.925
 THESE-0.924
retty-0.886
 Zup-0.880
Token violating
Token ID18134
Feature activation+2.90e-01

Pos logit contributions
���+0.055
 THESE+0.047
abouts+0.046
retty+0.044
 peas+0.044
Neg logit contributions
aith-0.047
ipe-0.045
lus-0.043
 toxicity-0.043
ARA-0.042
Token the
Token ID262
Feature activation-3.96e-02

Pos logit contributions
���+0.648
 THESE+0.548
abouts+0.538
 peas+0.519
bable+0.518
Neg logit contributions
aith-0.499
ipe-0.482
 toxicity-0.467
lus-0.459
ARA-0.439
Token rights
Token ID2489
Feature activation-1.30e-01

Pos logit contributions
aith+0.079
ipe+0.077
 toxicity+0.075
lus+0.074
NL+0.071
Neg logit contributions
���-0.078
 THESE-0.064
abouts-0.063
bable-0.061
 wrists-0.061
Token and
Token ID290
Feature activation+3.26e-03

Pos logit contributions
aith+0.205
ipe+0.195
 toxicity+0.190
lus+0.185
ARA+0.180
Neg logit contributions
���-0.312
abouts-0.266
 THESE-0.265
retty-0.260
 Zup-0.255
Token helped
Token ID4193
Feature activation-7.25e-02

Pos logit contributions
aith+0.037
ipe+0.036
lus+0.035
 toxicity+0.034
ARA+0.033
Neg logit contributions
���-0.046
abouts-0.039
 THESE-0.039
 peas-0.038
 wrists-0.037
Token him
Token ID683
Feature activation+3.44e-02

Pos logit contributions
aith+0.127
ipe+0.122
 toxicity+0.118
lus+0.118
ARA+0.113
Neg logit contributions
���-0.158
 THESE-0.136
abouts-0.135
 peas-0.131
 Zup-0.130
Token grow
Token ID1663
Feature activation-2.51e-01

Pos logit contributions
���+0.072
abouts+0.063
 THESE+0.062
 wrists+0.060
 peas+0.060
Neg logit contributions
aith-0.062
ipe-0.059
lus-0.057
 toxicity-0.057
ARA-0.056
Token since
Token ID1201
Feature activation-2.93e-01

Pos logit contributions
aith+0.459
ipe+0.435
 toxicity+0.422
lus+0.421
ARA+0.409
Neg logit contributions
���-0.532
abouts-0.455
 THESE-0.450
 wrists-0.434
retty-0.432
Token returning
Token ID8024
Feature activation-2.89e-01

Pos logit contributions
aith+0.452
ipe+0.430
 toxicity+0.414
lus+0.409
ARA+0.389
Neg logit contributions
���-0.707
abouts-0.617
 THESE-0.611
 wrists-0.587
 peas-0.587
Token to
Token ID284
Feature activation-5.59e-02

Pos logit contributions
aith+0.522
ipe+0.497
 toxicity+0.482
lus+0.474
ARA+0.464
Neg logit contributions
���-0.625
abouts-0.532
 THESE-0.528
retty-0.507
 Zup-0.505
Token Europe
Token ID2031
Feature activation-1.97e-01

Pos logit contributions
aith+0.102
ipe+0.097
 toxicity+0.094
lus+0.093
ARA+0.090
Neg logit contributions
���-0.119
 THESE-0.101
abouts-0.099
 Zup-0.096
 wrists-0.096
Token.
Token ID13
Feature activation+6.58e-02

Pos logit contributions
aith+0.337
ipe+0.318
 toxicity+0.311
lus+0.307
ARA+0.299
Neg logit contributions
���-0.444
abouts-0.385
 THESE-0.378
retty-0.366
bable-0.362
Token
Token ID447
Feature activation-3.21e-02

Pos logit contributions
���+0.146
 THESE+0.127
abouts+0.126
retty+0.121
 Zup+0.121
Neg logit contributions
aith-0.113
ipe-0.107
 toxicity-0.106
lus-0.103
ARA-0.097
Token
Token ID250
Feature activation-1.15e-01

Pos logit contributions
aith+0.065
ipe+0.062
 toxicity+0.061
lus+0.060
ARA+0.059
Neg logit contributions
���-0.062
abouts-0.052
 THESE-0.051
retty-0.049
 Zup-0.048
TokenHe
Token ID1544
Feature activation-1.22e-01

Pos logit contributions
aith+0.202
 toxicity+0.193
ipe+0.193
lus+0.185
 sympt+0.178
Neg logit contributions
���-0.244
abouts-0.216
 THESE-0.213
retty-0.204
bable-0.202
TokenEP
Token ID8905
Feature activation-2.35e-01

Pos logit contributions
aith+0.011
 toxicity+0.011
 sympt+0.010
lus+0.010
ipe+0.010
Neg logit contributions
���-0.013
abouts-0.012
 THESE-0.012
retty-0.012
 peas-0.012
Token uses
Token ID3544
Feature activation+1.07e-01

Pos logit contributions
aith+0.417
ipe+0.389
lus+0.379
 toxicity+0.378
ARA+0.362
Neg logit contributions
���-0.522
abouts-0.440
 THESE-0.438
retty-0.422
 Zup-0.421
Token the
Token ID262
Feature activation-4.14e-02

Pos logit contributions
���+0.248
 THESE+0.212
abouts+0.212
 Zup+0.204
 wrists+0.202
Neg logit contributions
aith-0.180
ipe-0.167
lus-0.164
 toxicity-0.161
ARA-0.157
Token stream
Token ID4269
Feature activation+1.80e-01

Pos logit contributions
aith+0.069
ipe+0.063
lus+0.062
 toxicity+0.061
ARA+0.059
Neg logit contributions
���-0.097
 THESE-0.083
abouts-0.082
 Zup-0.080
 wrists-0.080
Token cipher
Token ID38012
Feature activation-7.96e-02

Pos logit contributions
���+0.303
abouts+0.244
 THESE+0.240
retty+0.232
 Zup+0.229
Neg logit contributions
aith-0.412
ipe-0.390
 toxicity-0.384
lus-0.383
ARA-0.375
Token RC
Token ID13987
Feature activation-6.15e-01

Pos logit contributions
aith+0.139
ipe+0.130
 toxicity+0.127
lus+0.126
ARA+0.125
Neg logit contributions
���-0.176
abouts-0.149
 THESE-0.149
 Zup-0.144
retty-0.144
Token4
Token ID19
Feature activation-1.02e-01

Pos logit contributions
aith+1.213
ipe+1.144
lus+1.124
 toxicity+1.123
ARA+1.081
Neg logit contributions
���-1.250
 THESE-1.029
abouts-1.016
retty-0.976
 Zup-0.974
Token for
Token ID329
Feature activation+2.07e-02

Pos logit contributions
aith+0.184
ipe+0.173
 toxicity+0.168
lus+0.167
ARA+0.163
Neg logit contributions
���-0.223
abouts-0.188
 THESE-0.187
retty-0.180
 Zup-0.180
Token confidentiality
Token ID32554
Feature activation-6.02e-01

Pos logit contributions
���+0.047
abouts+0.040
 THESE+0.040
retty+0.038
 wrists+0.038
Neg logit contributions
aith-0.035
ipe-0.033
lus-0.033
 toxicity-0.032
ARA-0.031
Token,[
Token ID17414
Feature activation-6.00e-02

Pos logit contributions
aith+0.960
ipe+0.910
 toxicity+0.882
lus+0.874
ARA+0.850
Neg logit contributions
���-1.435
abouts-1.220
 THESE-1.213
retty-1.181
 Zup-1.165
Token9
Token ID24
Feature activation+2.96e-02

Pos logit contributions
aith+0.100
ipe+0.096
 toxicity+0.093
lus+0.092
ARA+0.089
Neg logit contributions
���-0.139
abouts-0.117
 THESE-0.117
retty-0.113
 Zup-0.112
Token side
Token ID1735
Feature activation+1.35e-01

Pos logit contributions
aith+0.049
ipe+0.045
lus+0.044
 toxicity+0.044
ARA+0.043
Neg logit contributions
���-0.058
abouts-0.051
 THESE-0.050
 wrists-0.048
 peas-0.048
Token question
Token ID1808
Feature activation-1.98e-01

Pos logit contributions
���+0.303
abouts+0.261
 THESE+0.259
retty+0.249
 peas+0.248
Neg logit contributions
aith-0.229
ipe-0.216
lus-0.210
 toxicity-0.209
ARA-0.205
Token,
Token ID11
Feature activation+6.23e-02

Pos logit contributions
aith+0.339
ipe+0.314
 toxicity+0.312
lus+0.308
ARA+0.299
Neg logit contributions
���-0.439
abouts-0.386
 THESE-0.381
retty-0.365
 Zup-0.362
Token in
Token ID287
Feature activation+1.50e-02

Pos logit contributions
���+0.133
 THESE+0.118
abouts+0.117
 wrists+0.112
retty+0.111
Neg logit contributions
aith-0.112
ipe-0.104
lus-0.101
 toxicity-0.099
ARA-0.099
Token an
Token ID281
Feature activation+2.84e-02

Pos logit contributions
���+0.034
 THESE+0.030
abouts+0.030
retty+0.029
 Zup+0.028
Neg logit contributions
aith-0.025
ipe-0.023
lus-0.022
 toxicity-0.022
ARA-0.022
Token interview
Token ID2720
Feature activation-5.39e-01

Pos logit contributions
���+0.068
abouts+0.057
 THESE+0.056
retty+0.054
 Pagan+0.054
Neg logit contributions
aith-0.045
ipe-0.043
lus-0.042
 toxicity-0.041
ARA-0.040
Token with
Token ID351
Feature activation+4.54e-02

Pos logit contributions
aith+1.005
ipe+0.946
 toxicity+0.927
lus+0.921
ARA+0.902
Neg logit contributions
���-1.144
abouts-0.960
 THESE-0.956
retty-0.928
 Zup-0.908
Token you
Token ID345
Feature activation+3.70e-02

Pos logit contributions
���+0.095
 THESE+0.081
abouts+0.081
 Zup+0.078
retty+0.077
Neg logit contributions
aith-0.084
ipe-0.080
 toxicity-0.077
lus-0.077
ARA-0.075
Token a
Token ID257
Feature activation+7.29e-02

Pos logit contributions
���+0.085
abouts+0.076
 THESE+0.075
retty+0.073
 Zup+0.072
Neg logit contributions
aith-0.059
ipe-0.055
 toxicity-0.054
lus-0.054
ARA-0.053
Token few
Token ID1178
Feature activation-5.33e-02

Pos logit contributions
���+0.171
abouts+0.146
 THESE+0.145
retty+0.142
 Zup+0.139
Neg logit contributions
aith-0.119
ipe-0.111
 toxicity-0.108
lus-0.106
ARA-0.104
Token months
Token ID1933
Feature activation-1.11e-01

Pos logit contributions
aith+0.102
ipe+0.097
 toxicity+0.094
lus+0.093
ARA+0.092
Neg logit contributions
���-0.109
 THESE-0.092
abouts-0.092
 wrists-0.088
retty-0.088
Token all
Token ID477
Feature activation-6.62e-02

Pos logit contributions
���+0.005
 THESE+0.004
abouts+0.004
 Zup+0.004
 wrists+0.004
Neg logit contributions
aith-0.004
ipe-0.003
 toxicity-0.003
lus-0.003
ARA-0.003
Token my
Token ID616
Feature activation-1.16e-01

Pos logit contributions
aith+0.113
ipe+0.107
 toxicity+0.104
lus+0.103
ARA+0.101
Neg logit contributions
���-0.151
 THESE-0.129
abouts-0.128
retty-0.123
 Zup-0.123
Token Nak
Token ID22255
Feature activation+3.75e-01

Pos logit contributions
aith+0.199
ipe+0.188
 toxicity+0.182
lus+0.181
ARA+0.178
Neg logit contributions
���-0.262
 THESE-0.223
abouts-0.221
 wrists-0.214
 Zup-0.214
Token Mu
Token ID8252
Feature activation-3.77e-01

Pos logit contributions
���+0.794
abouts+0.656
 THESE+0.650
 Zup+0.629
retty+0.629
Neg logit contributions
aith-0.698
ipe-0.663
 toxicity-0.649
lus-0.636
ARA-0.631
Tokenay
Token ID323
Feature activation+1.42e-01

Pos logit contributions
aith+0.684
 toxicity+0.669
ipe+0.667
lus+0.650
ARA+0.633
Neg logit contributions
���-0.730
abouts-0.655
 THESE-0.646
retty-0.629
 Zup-0.625
Token Nation
Token ID8741
Feature activation-5.72e-01

Pos logit contributions
���+0.347
abouts+0.294
 THESE+0.294
 Zup+0.284
retty+0.284
Neg logit contributions
aith-0.218
ipe-0.208
 toxicity-0.200
lus-0.195
ARA-0.192
Token features
Token ID3033
Feature activation-8.00e-02

Pos logit contributions
aith+1.010
ipe+0.964
 toxicity+0.934
lus+0.926
ARA+0.905
Neg logit contributions
���-1.268
abouts-1.068
 THESE-1.067
retty-1.021
 wrists-1.017
Token here
Token ID994
Feature activation+1.43e-01

Pos logit contributions
aith+0.140
ipe+0.133
 toxicity+0.130
lus+0.129
ARA+0.127
Neg logit contributions
���-0.173
 THESE-0.150
abouts-0.149
retty-0.143
 wrists-0.142
Token.
Token ID13
Feature activation-2.59e-02

Pos logit contributions
���+0.320
abouts+0.274
 THESE+0.272
retty+0.262
 Zup+0.260
Neg logit contributions
aith-0.248
ipe-0.235
 toxicity-0.230
lus-0.226
ARA-0.220
Token
Token ID198
Feature activation-1.36e-01

Pos logit contributions
aith+0.051
ipe+0.049
 toxicity+0.048
lus+0.046
ARA+0.045
Neg logit contributions
���-0.052
 THESE-0.045
abouts-0.043
 Zup-0.042
retty-0.041
Token
Token ID198
Feature activation-9.17e-02

Pos logit contributions
aith+0.307
 toxicity+0.296
ipe+0.291
 sympt+0.282
lus+0.281
Neg logit contributions
 THESE-0.190
abouts-0.188
���-0.181
 Zup-0.176
bable-0.170
Tokenath
Token ID776
Feature activation-1.60e-01

Pos logit contributions
aith+0.018
 toxicity+0.017
ipe+0.017
lus+0.016
ARA+0.016
Neg logit contributions
���-0.028
abouts-0.025
 THESE-0.024
retty-0.024
 wrists-0.023
Tokenons
Token ID684
Feature activation+5.97e-02

Pos logit contributions
aith+0.336
ipe+0.323
 toxicity+0.317
lus+0.315
ARA+0.305
Neg logit contributions
���-0.295
abouts-0.247
 THESE-0.240
retty-0.232
 wrists-0.232
Token to
Token ID284
Feature activation-7.86e-02

Pos logit contributions
���+0.131
abouts+0.114
 THESE+0.112
retty+0.108
 peas+0.106
Neg logit contributions
aith-0.105
ipe-0.099
 toxicity-0.097
lus-0.096
ARA-0.094
Token promote
Token ID7719
Feature activation+1.53e-01

Pos logit contributions
aith+0.137
ipe+0.129
 toxicity+0.126
lus+0.124
ARA+0.122
Neg logit contributions
���-0.171
 THESE-0.149
abouts-0.148
 peas-0.144
 wrists-0.144
Token j
Token ID474
Feature activation+9.14e-03

Pos logit contributions
���+0.348
 THESE+0.307
abouts+0.304
 wrists+0.297
 peas+0.297
Neg logit contributions
aith-0.245
ipe-0.233
lus-0.226
 toxicity-0.222
ARA-0.219
Tokenogging
Token ID30853
Feature activation-5.84e-01

Pos logit contributions
���+0.017
abouts+0.014
 THESE+0.014
retty+0.014
 peas+0.014
Neg logit contributions
aith-0.018
 toxicity-0.018
ipe-0.017
lus-0.017
ARA-0.017
Token
Token ID960
Feature activation+1.72e-02

Pos logit contributions
aith+0.978
ipe+0.934
 toxicity+0.898
lus+0.889
ARA+0.873
Neg logit contributions
���-1.333
abouts-1.154
 THESE-1.134
retty-1.098
 wrists-1.092
Tokenit
Token ID270
Feature activation-1.15e-01

Pos logit contributions
���+0.037
abouts+0.032
 THESE+0.031
retty+0.030
bable+0.030
Neg logit contributions
aith-0.031
 toxicity-0.029
ipe-0.029
lus-0.028
ARA-0.027
Token gets
Token ID3011
Feature activation+1.37e-01

Pos logit contributions
aith+0.209
ipe+0.200
 toxicity+0.197
lus+0.193
ARA+0.188
Neg logit contributions
���-0.247
abouts-0.209
 THESE-0.208
retty-0.200
 wrists-0.199
Token more
Token ID517
Feature activation+1.04e-01

Pos logit contributions
���+0.324
 THESE+0.287
abouts+0.283
 peas+0.278
 wrists+0.275
Neg logit contributions
aith-0.213
ipe-0.199
lus-0.190
 toxicity-0.190
ARA-0.185
Token people
Token ID661
Feature activation-8.51e-02

Pos logit contributions
���+0.240
 THESE+0.216
abouts+0.212
 peas+0.211
 wrists+0.209
Neg logit contributions
aith-0.164
ipe-0.156
lus-0.148
ARA-0.143
 toxicity-0.143
Token permanently
Token ID15043
Feature activation-3.31e-01

Pos logit contributions
aith+0.030
ipe+0.028
 toxicity+0.028
lus+0.027
ARA+0.027
Neg logit contributions
���-0.034
abouts-0.030
 THESE-0.030
 peas-0.028
 wrists-0.028
Token face
Token ID1986
Feature activation-3.77e-02

Pos logit contributions
aith+0.593
ipe+0.557
 toxicity+0.547
lus+0.545
ARA+0.537
Neg logit contributions
���-0.705
abouts-0.616
 THESE-0.614
 wrists-0.587
 peas-0.586
Token away
Token ID1497
Feature activation-1.09e-01

Pos logit contributions
aith+0.059
ipe+0.056
 toxicity+0.054
lus+0.053
ARA+0.051
Neg logit contributions
���-0.090
abouts-0.078
 THESE-0.078
 peas-0.075
 wrists-0.074
Token.
Token ID13
Feature activation+5.37e-02

Pos logit contributions
aith+0.186
ipe+0.177
 toxicity+0.173
lus+0.170
ARA+0.165
Neg logit contributions
���-0.245
abouts-0.208
 THESE-0.207
retty-0.199
 Zup-0.197
Token By
Token ID2750
Feature activation-2.12e-01

Pos logit contributions
���+0.100
 THESE+0.088
abouts+0.085
 Zup+0.082
 Pagan+0.080
Neg logit contributions
aith-0.110
ipe-0.106
 toxicity-0.104
lus-0.101
 sympt-0.098
Token comparison
Token ID7208
Feature activation-5.41e-01

Pos logit contributions
aith+0.362
ipe+0.346
lus+0.332
 toxicity+0.332
ARA+0.322
Neg logit contributions
���-0.476
 THESE-0.410
abouts-0.408
 peas-0.393
 wrists-0.387
Token:
Token ID25
Feature activation-7.56e-02

Pos logit contributions
aith+0.607
ipe+0.580
 toxicity+0.578
lus+0.550
ARA+0.518
Neg logit contributions
���-1.531
abouts-1.325
 THESE-1.317
retty-1.295
 Zup-1.283
Token Earth
Token ID3668
Feature activation-6.85e-02

Pos logit contributions
aith+0.146
ipe+0.138
 toxicity+0.134
lus+0.131
ARA+0.130
Neg logit contributions
���-0.156
 THESE-0.131
abouts-0.131
 peas-0.126
 wrists-0.124
Token's
Token ID338
Feature activation+2.33e-02

Pos logit contributions
aith+0.123
ipe+0.117
 toxicity+0.114
lus+0.113
ARA+0.109
Neg logit contributions
���-0.149
abouts-0.128
 THESE-0.125
 peas-0.121
 wrists-0.121
Token moon
Token ID8824
Feature activation+1.03e-02

Pos logit contributions
���+0.051
abouts+0.044
 THESE+0.044
 wrists+0.043
 peas+0.043
Neg logit contributions
aith-0.041
ipe-0.039
lus-0.037
ARA-0.037
 toxicity-0.037
Token is
Token ID318
Feature activation+1.56e-02

Pos logit contributions
���+0.023
abouts+0.020
 THESE+0.020
 peas+0.019
 wrists+0.019
Neg logit contributions
aith-0.018
ipe-0.017
 toxicity-0.016
lus-0.016
ARA-0.016