TOP ACTIVATIONS
MAX = 4.144

cl ojure . files ] WHERE path LIKE ' % project
for ensuring compatibility with V OD services . When
. Ab ig ail van den No ort ,
, particularly its failure to pass a bill to replace Democratic
than 10 years in prison MORE and tick ed through her
8 192 MB 8 192 MB Texture Units 144 64 104
re pos . files ] WHERE RIGHT ( path , 4
6 144 MB 8 192 MB 8 192 MB Texture Units
than 10 years in prison MORE and Hillary Clinton Hillary Diane
look elsewhere , as successful GP 2 and Indy Car teams
of Ancient Greece focused un ash amed ly on the naked
popular ized by Anita D iam ant , The Red Tent
ojure | 208 34 | | 2 | org . cl
us nerve stimulation ( V NS ), an FDA - approved
Republicans are furiously whipping to pass ahead of September 30 ,
and Reporting System ( W IS Q ARS ). Atlanta ,
As yu K ong Ch oya
, also said Congress would pass legislation to raise the ceiling
cl ojure . contents ] WHERE id IN ( SELECT id
Director at Wave form Entertainment | Streaming Every Es port Under

MIN ACTIVATIONS
MIN = -4.521

sound system . Rated 5 out of 5 by
and night ! Rated 5 out of 5 by
- December 13 , 2010 Subject : boys beware
Flag Reply 27 Recommend Recommend Share this
Flag Reply 55 Recommend Recommend Share this
body . I feel as though Eve was like ,
You don 't feel as though you want to go down
the Kodi home screen , select Add ons -> My Add
ON OM ICS AND TR ADE The Wednesday -
- July 6 , 2010 Subject : R iff
December 13 , 2010 Subject : boys beware This
the Kodi homes creen , select Add ons
but it is not as though he himself is immune from
to enter the app , tap a virtual Android button ,
the Amazon search bar , select " M ovies & TV
in terms of gender , race , class , etc .
celebrated , and seemed as though they had to un ce
Flag Reply 14 Recommend Recommend Share this
27 Recommend Recommend Share this comment on Facebook
Guard ado obliged spor ad ically , considering the heat

INTERVAL 1.977 - 4.144
CONTAINS 0.189%

ll notice a strange camera in the EX IF
oliath TBA TBA CBS All Access All
Be safe pic . twitter . com /
? No more bad news pl s . Good updates ?
and an anonymous source will suffice . Any mistake sparks talk

INTERVAL -0.189 - 1.977
CONTAINS 63.418%

s offshore oil fields and the laying of a gas pipeline
who added that his client denied the allegations .
can say , see or hear ? Who decides what is
story in a different city , eight months ago .
Shelby W add ell (@ s hel by y

INTERVAL -2.355 - -0.189
CONTAINS 36.338%

- K hun i , B akh te gan , An
Syrian frontier near al - A fs iq and split into
last fall . Court heard that the U .
the problem completely . The terrain would either a ) still
writing them , often changing the content and order of lyrics

INTERVAL -4.521 - -2.355
CONTAINS 0.055%

ieu , along with things like the pul py space opera
. This is only the beginning , an early
aris s signature charm and un ce asing pos
your husband before you march anywhere again . If
School Choice at Vanderbilt University compared charter schools with traditional public
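
The INTERVAL boundaries listed above look consistent with four equal-width bins spanning MIN to MAX, and the CONTAINS percentages appear to add up to 100%. The listing does not state how the bins were built, so the following is only a minimal sketch under that equal-width assumption. The names `bin_edges`, `MIN_ACT`, `MAX_ACT` and `N_BINS` are hypothetical and do not come from the dashboard.

```python
# Assumption: the four INTERVAL rows are equal-width bins over [MIN, MAX].
MIN_ACT = -4.521  # MIN value from the listing
MAX_ACT = 4.144   # MAX value from the listing
N_BINS = 4        # number of INTERVAL rows in the listing


def bin_edges(lo, hi, n):
    """Return n + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


edges = bin_edges(MIN_ACT, MAX_ACT, N_BINS)

# Interior boundaries as printed in the INTERVAL headers, lowest first.
listed = [-2.355, -0.189, 1.977]

# MIN and MAX are themselves rounded to three decimals, so compare
# with a small tolerance instead of demanding exact equality.
matches = all(abs(e - l) < 1e-3 for e, l in zip(edges[1:-1], listed))

# CONTAINS percentages from the listing, lowest interval first.
shares = [0.055, 36.338, 63.418, 0.189]
total = sum(shares)

print("edges:", [round(e, 3) for e in edges])
print("listed boundaries match equal-width bins:", matches)
print("sum of CONTAINS percentages:", round(total, 3))
```

If the check holds, almost all of the mass (63.418% + 36.338%) lies in the two middle intervals, and the top and bottom intervals together contain under 0.25% of the tokens.
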
Token cl
Token ID537
Feature activation+1.19e-01

Pos logit contributions
���+0.302
 Lennon+0.288
 lapt+0.287
 Gohan+0.267
FML+0.264
Neg logit contributions
otrop-0.312
udos-0.289
acia-0.288
usting-0.286
 safety-0.278
Tokenojure
Token ID32511
Feature activation-3.57e-01

Pos logit contributions
otrop+0.087
udos+0.080
acia+0.080
usting+0.080
 safety+0.078
Neg logit contributions
���-0.080
 Lennon-0.076
 lapt-0.076
 Gohan-0.071
 Yose-0.070
Token.
Token ID764
Feature activation+8.95e-02

Pos logit contributions
���+0.347
 Lennon+0.344
 lapt+0.340
 Gohan+0.326
FML+0.324
Neg logit contributions
otrop-0.143
udos-0.126
acia-0.124
usting-0.124
 safety-0.115
Token files
Token ID3696
Feature activation+9.60e-02

Pos logit contributions
otrop+0.054
acia+0.049
udos+0.049
usting+0.048
 safety+0.046
Neg logit contributions
���-0.072
 Lennon-0.069
 lapt-0.069
 Gohan-0.065
FML-0.064
Token ]
Token ID2361
Feature activation+2.95e-01

Pos logit contributions
otrop+0.044
udos+0.039
acia+0.039
usting+0.038
 safety+0.036
Neg logit contributions
���-0.091
 lapt-0.088
 Lennon-0.088
 Gohan-0.083
FML-0.082
Token WHERE
Token ID33411
Feature activation+4.14e+00

Pos logit contributions
otrop+0.186
udos+0.170
acia+0.170
usting+0.169
 safety+0.164
Neg logit contributions
���-0.227
 Lennon-0.217
 lapt-0.216
 Gohan-0.204
 Yose-0.201
Token path
Token ID3108
Feature activation+7.63e-01

Pos logit contributions
otrop+2.410
acia+2.222
 safety+2.207
usting+2.206
udos+2.197
Neg logit contributions
���-3.404
 Lennon-3.135
 lapt-3.122
 Gohan-2.998
 Yose-2.986
Token LIKE
Token ID34178
Feature activation+7.51e-01

Pos logit contributions
otrop+0.482
udos+0.442
acia+0.442
usting+0.438
 safety+0.425
Neg logit contributions
���-0.591
 Lennon-0.560
 lapt-0.558
 Gohan-0.527
 Yose-0.522
Token '
Token ID705
Feature activation-7.40e-01

Pos logit contributions
otrop+0.457
udos+0.416
acia+0.416
usting+0.413
 safety+0.401
Neg logit contributions
���-0.599
 Lennon-0.570
 lapt-0.568
 Gohan-0.538
 Yose-0.532
Token%
Token ID4
Feature activation-1.82e+00

Pos logit contributions
���+0.548
 Lennon+0.524
 lapt+0.521
 Gohan+0.490
FML+0.486
Neg logit contributions
otrop-0.488
udos-0.448
acia-0.447
usting-0.444
 safety-0.430
Tokenproject
Token ID16302
Feature activation-2.22e-01

Pos logit contributions
���+1.240
 Lennon+1.195
 lapt+1.183
 Gohan+1.107
FML+1.104
Neg logit contributions
otrop-1.293
udos-1.191
acia-1.190
usting-1.185
 safety-1.148
Token for
Token ID329
Feature activation+6.81e-02

Pos logit contributions
���+0.038
 Lennon+0.037
 lapt+0.037
 Gohan+0.035
 Yose+0.034
Neg logit contributions
otrop-0.026
udos-0.023
acia-0.023
usting-0.023
 safety-0.022
Token ensuring
Token ID13359
Feature activation+6.65e-01

Pos logit contributions
otrop+0.045
udos+0.042
acia+0.041
usting+0.041
 safety+0.040
Neg logit contributions
���-0.050
 Lennon-0.048
 lapt-0.048
 Gohan-0.045
 Yose-0.044
Token compatibility
Token ID17764
Feature activation-7.20e-01

Pos logit contributions
otrop+0.387
udos+0.351
acia+0.350
usting+0.347
 safety+0.334
Neg logit contributions
���-0.547
 Lennon-0.524
 lapt-0.522
 Gohan-0.495
 Yose-0.488
Token with
Token ID351
Feature activation-1.72e-01

Pos logit contributions
���+0.605
 Lennon+0.591
 lapt+0.584
 Gohan+0.559
 Yose+0.550
Neg logit contributions
otrop-0.398
udos-0.360
acia-0.357
usting-0.353
 safety-0.334
Token V
Token ID569
Feature activation-1.42e-01

Pos logit contributions
���+0.131
 Lennon+0.125
 lapt+0.124
 Gohan+0.117
 Yose+0.116
Neg logit contributions
otrop-0.111
udos-0.102
acia-0.102
usting-0.101
 safety-0.098
TokenOD
Token ID3727
Feature activation+3.79e+00

Pos logit contributions
���+0.101
 Lennon+0.096
 lapt+0.096
 Gohan+0.090
 Yose+0.088
Neg logit contributions
otrop-0.099
udos-0.091
acia-0.091
usting-0.091
 safety-0.088
Token services
Token ID2594
Feature activation-3.24e-01

Pos logit contributions
otrop+2.434
udos+2.260
usting+2.239
acia+2.237
 safety+2.215
Neg logit contributions
���-2.878
 Lennon-2.719
 lapt-2.701
 Gohan-2.523
FML-2.501
Token.
Token ID13
Feature activation+7.25e-02

Pos logit contributions
���+0.277
 Lennon+0.268
 lapt+0.266
 Gohan+0.254
 Yose+0.251
Neg logit contributions
otrop-0.176
udos-0.158
acia-0.158
usting-0.156
 safety-0.149
Token
Token ID198
Feature activation+2.95e-01

Pos logit contributions
otrop+0.046
udos+0.042
acia+0.042
usting+0.042
 safety+0.041
Neg logit contributions
���-0.056
 Lennon-0.053
 lapt-0.053
 Gohan-0.050
 Yose-0.049
Token
Token ID198
Feature activation+1.43e-01

Pos logit contributions
otrop+0.175
udos+0.159
acia+0.159
usting+0.158
 safety+0.152
Neg logit contributions
���-0.239
 Lennon-0.229
 lapt-0.228
 Gohan-0.216
 Yose-0.214
TokenWhen
Token ID2215
Feature activation+2.09e-01

Pos logit contributions
otrop+0.102
udos+0.094
acia+0.094
usting+0.094
 safety+0.091
Neg logit contributions
���-0.099
 Lennon-0.094
 lapt-0.093
 Gohan-0.087
 Yose-0.086
Token.
Token ID13
Feature activation+8.05e-01

Pos logit contributions
���+0.653
 Lennon+0.630
 lapt+0.625
 Gohan+0.595
FML+0.588
Neg logit contributions
otrop-0.436
udos-0.394
acia-0.391
usting-0.388
 safety-0.373
Token
Token ID198
Feature activation+5.11e-02

Pos logit contributions
otrop+0.493
udos+0.449
acia+0.448
usting+0.445
 safety+0.430
Neg logit contributions
���-0.636
 Lennon-0.611
 lapt-0.605
 Gohan-0.575
 Yose-0.567
Token
Token ID198
Feature activation-3.65e-02

Pos logit contributions
otrop+0.031
udos+0.028
acia+0.028
usting+0.028
 safety+0.027
Neg logit contributions
���-0.039
 Lennon-0.039
 lapt-0.038
 Gohan-0.037
FML-0.036
TokenAb
Token ID4826
Feature activation+4.22e-01

Pos logit contributions
���+0.026
 Lennon+0.025
 lapt+0.024
 Gohan+0.023
FML+0.023
Neg logit contributions
otrop-0.025
udos-0.023
acia-0.023
usting-0.023
 safety-0.022
Tokenig
Token ID328
Feature activation+2.19e-01

Pos logit contributions
otrop+0.312
acia+0.290
udos+0.290
usting+0.288
 safety+0.280
Neg logit contributions
���-0.281
 Lennon-0.265
 lapt-0.264
 Gohan-0.247
 Yose-0.243
Tokenail
Token ID603
Feature activation+3.58e+00

Pos logit contributions
otrop+0.124
udos+0.112
acia+0.111
usting+0.110
 safety+0.107
Neg logit contributions
���-0.184
 Lennon-0.177
 lapt-0.175
 Gohan-0.167
 Yose-0.165
Token van
Token ID5719
Feature activation+1.43e-01

Pos logit contributions
otrop+2.363
udos+2.190
acia+2.184
usting+2.180
 safety+2.134
Neg logit contributions
���-2.641
 lapt-2.522
 Lennon-2.434
 Gohan-2.308
FML-2.302
Token den
Token ID2853
Feature activation-1.27e+00

Pos logit contributions
otrop+0.092
udos+0.085
acia+0.084
usting+0.084
 safety+0.081
Neg logit contributions
���-0.108
 Lennon-0.103
 lapt-0.103
 Gohan-0.097
 Yose-0.096
Token No
Token ID1400
Feature activation+4.24e-01

Pos logit contributions
���+0.910
 Lennon+0.879
 lapt+0.862
 Gohan+0.822
 Yose+0.811
Neg logit contributions
otrop-0.873
udos-0.806
acia-0.804
usting-0.799
 safety-0.773
Tokenort
Token ID419
Feature activation+6.99e-01

Pos logit contributions
otrop+0.313
udos+0.290
acia+0.289
usting+0.287
 safety+0.279
Neg logit contributions
���-0.286
 Lennon-0.269
 lapt-0.269
 Gohan-0.251
FML-0.247
Token,
Token ID11
Feature activation-4.47e-01

Pos logit contributions
otrop+0.438
udos+0.401
acia+0.401
usting+0.397
 safety+0.385
Neg logit contributions
���-0.546
 Lennon-0.518
 lapt-0.517
 Gohan-0.488
 Yose-0.481
Token,
Token ID11
Feature activation-2.47e-01

Pos logit contributions
���+0.375
 Lennon+0.361
 lapt+0.359
 Gohan+0.342
 Yose+0.337
Neg logit contributions
otrop-0.233
udos-0.210
acia-0.209
usting-0.207
 safety-0.198
Token particularly
Token ID3573
Feature activation+4.61e-01

Pos logit contributions
���+0.194
 Lennon+0.187
 lapt+0.186
 Gohan+0.176
 Yose+0.173
Neg logit contributions
otrop-0.151
udos-0.138
acia-0.138
usting-0.136
 safety-0.131
Token its
Token ID663
Feature activation+7.97e-02

Pos logit contributions
otrop+0.276
udos+0.252
acia+0.251
usting+0.249
 safety+0.237
Neg logit contributions
���-0.370
 Lennon-0.356
 lapt-0.354
 Gohan-0.336
FML-0.332
Token failure
Token ID5287
Feature activation+8.16e-01

Pos logit contributions
otrop+0.050
udos+0.046
acia+0.046
usting+0.045
 safety+0.043
Neg logit contributions
���-0.062
 Lennon-0.059
 lapt-0.059
 Gohan-0.056
FML-0.055
Token to
Token ID284
Feature activation+4.06e-01

Pos logit contributions
otrop+0.485
udos+0.442
acia+0.439
usting+0.436
 safety+0.417
Neg logit contributions
���-0.663
 Lennon-0.633
 lapt-0.628
 Gohan-0.596
FML-0.592
Token pass
Token ID1208
Feature activation+3.42e+00

Pos logit contributions
otrop+0.230
acia+0.208
udos+0.208
usting+0.206
 safety+0.197
Neg logit contributions
���-0.340
 Lennon-0.326
 lapt-0.324
 Gohan-0.308
FML-0.304
Token a
Token ID257
Feature activation+1.38e-01

Pos logit contributions
otrop+2.002
usting+1.810
 safety+1.803
acia+1.803
udos+1.802
Neg logit contributions
���-2.795
 lapt-2.671
 Lennon-2.640
 Gohan-2.489
 Yose-2.468
Token bill
Token ID2855
Feature activation-4.94e-01

Pos logit contributions
otrop+0.086
udos+0.079
acia+0.079
usting+0.078
 safety+0.075
Neg logit contributions
���-0.108
 Lennon-0.104
 lapt-0.103
 Gohan-0.097
 Yose-0.096
Token to
Token ID284
Feature activation+1.46e-01

Pos logit contributions
���+0.402
 Lennon+0.388
 lapt+0.384
 Gohan+0.366
 Yose+0.362
Neg logit contributions
otrop-0.289
udos-0.264
acia-0.263
usting-0.259
 safety-0.248
Token replace
Token ID6330
Feature activation+1.73e+00

Pos logit contributions
otrop+0.090
acia+0.082
udos+0.082
usting+0.081
 safety+0.077
Neg logit contributions
���-0.114
 Lennon-0.110
 lapt-0.108
 Gohan-0.103
FML-0.102
Token Democratic
Token ID4390
Feature activation-3.66e-01

Pos logit contributions
otrop+1.142
udos+1.047
acia+1.045
usting+1.042
 safety+1.018
Neg logit contributions
���-1.284
 lapt-1.217
 Lennon-1.217
 Gohan-1.141
 Yose-1.127
Token than
Token ID621
Feature activation+2.30e-01

Pos logit contributions
���+0.355
 Lennon+0.340
 lapt+0.339
 Gohan+0.322
 Yose+0.318
Neg logit contributions
otrop-0.228
udos-0.208
acia-0.207
usting-0.204
 safety-0.193
Token 10
Token ID838
Feature activation-3.40e-01

Pos logit contributions
otrop+0.154
acia+0.141
udos+0.141
usting+0.140
 safety+0.135
Neg logit contributions
���-0.169
 Lennon-0.161
 lapt-0.160
 Gohan-0.151
 Yose-0.149
Token years
Token ID812
Feature activation-5.52e-01

Pos logit contributions
���+0.319
 lapt+0.301
 Lennon+0.300
 agre+0.292
emort+0.288
Neg logit contributions
otrop-0.161
 safety-0.143
udos-0.143
usting-0.142
acia-0.141
Token in
Token ID287
Feature activation-2.04e-01

Pos logit contributions
���+0.464
 Lennon+0.442
 lapt+0.441
 Gohan+0.418
 Yose+0.414
Neg logit contributions
otrop-0.313
udos-0.284
acia-0.283
usting-0.281
 safety-0.272
Token prison
Token ID3770
Feature activation+1.82e-02

Pos logit contributions
���+0.177
 Lennon+0.169
 lapt+0.169
 Gohan+0.160
 Yose+0.159
Neg logit contributions
otrop-0.110
udos-0.099
acia-0.099
usting-0.098
 safety-0.094
Token MORE
Token ID12011
Feature activation+3.39e+00

Pos logit contributions
otrop+0.011
acia+0.010
usting+0.010
udos+0.010
 mine+0.009
Neg logit contributions
 Lennon-0.015
���-0.015
 lapt-0.014
 Gohan-0.014
 Yose-0.013
Token and
Token ID290
Feature activation+1.37e+00

Pos logit contributions
otrop+1.873
acia+1.702
usting+1.691
udos+1.691
 safety+1.679
Neg logit contributions
���-2.896
 lapt-2.723
 Lennon-2.702
 Gohan-2.558
FML-2.546
Token tick
Token ID4378
Feature activation+9.40e-01

Pos logit contributions
otrop+0.852
udos+0.778
acia+0.776
usting+0.771
 safety+0.740
Neg logit contributions
���-1.075
 Lennon-1.032
 lapt-1.024
 Gohan-0.970
 Yose-0.957
Tokened
Token ID276
Feature activation+1.45e-01

Pos logit contributions
otrop+0.585
udos+0.535
acia+0.534
usting+0.531
 safety+0.515
Neg logit contributions
���-0.734
 Lennon-0.700
 lapt-0.698
 Gohan-0.659
 Yose-0.650
Token through
Token ID832
Feature activation+8.23e-01

Pos logit contributions
otrop+0.093
udos+0.085
acia+0.085
usting+0.084
 safety+0.081
Neg logit contributions
���-0.111
 Lennon-0.107
 lapt-0.106
 Gohan-0.100
 Yose-0.099
Token her
Token ID607
Feature activation+6.71e-01

Pos logit contributions
otrop+0.502
udos+0.462
acia+0.459
usting+0.455
 safety+0.431
Neg logit contributions
���-0.652
 Lennon-0.629
 lapt-0.624
 Gohan-0.591
 Yose-0.584
Token 8
Token ID807
Feature activation+7.17e-01

Pos logit contributions
otrop+1.816
udos+1.652
acia+1.633
usting+1.624
 safety+1.597
Neg logit contributions
���-2.488
 lapt-2.342
 Lennon-2.334
 Gohan-2.199
FML-2.188
Token192
Token ID17477
Feature activation+7.20e-01

Pos logit contributions
otrop+0.374
acia+0.336
udos+0.335
usting+0.332
 safety+0.317
Neg logit contributions
���-0.630
 Lennon-0.606
 lapt-0.604
 Gohan-0.575
FML-0.570
Token MB
Token ID10771
Feature activation+3.31e+00

Pos logit contributions
otrop+0.441
udos+0.403
acia+0.401
usting+0.398
 safety+0.387
Neg logit contributions
���-0.573
 lapt-0.544
 Lennon-0.543
 Gohan-0.511
 Yose-0.506
Token 8
Token ID807
Feature activation+1.67e+00

Pos logit contributions
otrop+1.947
udos+1.776
acia+1.749
usting+1.738
 safety+1.720
Neg logit contributions
���-2.720
 lapt-2.551
 Lennon-2.538
 Gohan-2.399
FML-2.387
Token192
Token ID17477
Feature activation+8.52e-01

Pos logit contributions
otrop+0.855
acia+0.766
udos+0.765
usting+0.756
 safety+0.725
Neg logit contributions
���-1.483
 Lennon-1.428
 lapt-1.419
 Gohan-1.354
FML-1.342
Token MB
Token ID10771
Feature activation+3.33e+00

Pos logit contributions
otrop+0.529
udos+0.483
acia+0.483
usting+0.479
 safety+0.464
Neg logit contributions
���-0.667
 Lennon-0.637
 lapt-0.634
 Gohan-0.600
 Yose-0.592
Token Texture
Token ID44409
Feature activation+3.58e-01

Pos logit contributions
otrop+1.994
udos+1.823
acia+1.801
usting+1.786
 safety+1.769
Neg logit contributions
���-2.700
 lapt-2.533
 Lennon-2.519
 Gohan-2.378
FML-2.368
Token Units
Token ID27719
Feature activation-1.81e-01

Pos logit contributions
otrop+0.224
udos+0.206
acia+0.206
usting+0.204
 safety+0.194
Neg logit contributions
���-0.279
 Lennon-0.266
 lapt-0.266
 Gohan-0.251
FML-0.248
Token 144
Token ID20224
Feature activation+1.34e+00

Pos logit contributions
���+0.149
 Lennon+0.144
 lapt+0.143
 Gohan+0.136
 Yose+0.134
Neg logit contributions
otrop-0.105
udos-0.096
acia-0.096
usting-0.096
 safety-0.091
Token 64
Token ID5598
Feature activation+1.36e+00

Pos logit contributions
otrop+0.741
udos+0.669
acia+0.668
usting+0.659
 safety+0.641
Neg logit contributions
���-1.148
 Lennon-1.095
 lapt-1.093
 Gohan-1.035
 Yose-1.025
Token 104
Token ID14436
Feature activation+1.83e+00

Pos logit contributions
otrop+0.796
udos+0.723
acia+0.721
usting+0.716
 safety+0.692
Neg logit contributions
���-1.115
 Lennon-1.066
 lapt-1.062
 Gohan-1.006
 Yose-0.994
Tokenre
Token ID260
Feature activation-4.54e-01

Pos logit contributions
���+0.076
 Lennon+0.073
 lapt+0.073
 Gohan+0.068
FML+0.068
Neg logit contributions
otrop-0.079
acia-0.073
udos-0.073
usting-0.073
 safety-0.071
Tokenpos
Token ID1930
Feature activation-4.76e-01

Pos logit contributions
���+0.358
 Lennon+0.345
 lapt+0.343
 Gohan+0.323
FML+0.322
Neg logit contributions
otrop-0.273
acia-0.249
usting-0.248
udos-0.247
 safety-0.237
Token.
Token ID764
Feature activation-3.71e-01

Pos logit contributions
���+0.312
 Lennon+0.296
 lapt+0.294
 Gohan+0.275
FML+0.271
Neg logit contributions
otrop-0.355
udos-0.329
acia-0.329
usting-0.327
 safety-0.318
Token files
Token ID3696
Feature activation+5.05e-01

Pos logit contributions
���+0.269
 Lennon+0.258
 lapt+0.258
 Gohan+0.241
FML+0.238
Neg logit contributions
otrop-0.250
acia-0.230
udos-0.230
usting-0.228
 safety-0.221
Token ]
Token ID2361
Feature activation+5.46e-01

Pos logit contributions
otrop+0.270
udos+0.243
acia+0.243
usting+0.241
 safety+0.231
Neg logit contributions
���-0.438
 Lennon-0.421
 lapt-0.420
 Gohan-0.398
FML-0.394
Token WHERE
Token ID33411
Feature activation+3.32e+00

Pos logit contributions
otrop+0.357
udos+0.327
acia+0.327
usting+0.325
 safety+0.314
Neg logit contributions
���-0.409
 Lennon-0.391
 lapt-0.389
 Gohan-0.366
 Yose-0.361
Token RIGHT
Token ID33621
Feature activation+7.28e-01

Pos logit contributions
otrop+2.058
udos+1.883
acia+1.875
usting+1.865
 safety+1.841
Neg logit contributions
���-2.612
 Lennon-2.457
 lapt-2.440
 Gohan-2.321
 Yose-2.313
Token (
Token ID357
Feature activation-1.10e+00

Pos logit contributions
otrop+0.449
udos+0.410
acia+0.409
usting+0.406
 safety+0.393
Neg logit contributions
���-0.573
 Lennon-0.548
 lapt-0.546
 Gohan-0.515
 Yose-0.509
Token path
Token ID3108
Feature activation+1.10e+00

Pos logit contributions
���+0.844
 Lennon+0.811
 lapt+0.809
 Gohan+0.761
FML+0.753
Neg logit contributions
otrop-0.699
acia-0.637
udos-0.636
usting-0.631
 safety-0.608
Token,
Token ID837
Feature activation-1.17e-01

Pos logit contributions
otrop+0.641
udos+0.582
acia+0.581
usting+0.577
 safety+0.557
Neg logit contributions
���-0.903
 Lennon-0.865
 lapt-0.862
 Gohan-0.816
FML-0.805
Token 4
Token ID604
Feature activation+1.15e+00

Pos logit contributions
���+0.091
 Lennon+0.087
 lapt+0.087
 Gohan+0.082
FML+0.081
Neg logit contributions
otrop-0.073
udos-0.067
acia-0.067
usting-0.066
 safety-0.064
Token 6
Token ID718
Feature activation+1.05e+00

Pos logit contributions
otrop+1.310
udos+1.195
acia+1.188
usting+1.179
 safety+1.151
Neg logit contributions
���-1.758
 Lennon-1.670
 lapt-1.666
 Gohan-1.575
 Yose-1.561
Token144
Token ID18444
Feature activation+1.63e+00

Pos logit contributions
otrop+0.686
udos+0.629
acia+0.628
usting+0.624
 safety+0.605
Neg logit contributions
���-0.793
 Lennon-0.756
 lapt-0.753
 Gohan-0.710
 Yose-0.700
Token MB
Token ID10771
Feature activation+3.05e+00

Pos logit contributions
otrop+1.027
udos+0.942
acia+0.938
usting+0.931
 safety+0.906
Neg logit contributions
���-1.268
 Lennon-1.206
 lapt-1.202
 Gohan-1.132
 Yose-1.120
Token 8
Token ID807
Feature activation+7.17e-01

Pos logit contributions
otrop+1.816
udos+1.652
acia+1.633
usting+1.624
 safety+1.597
Neg logit contributions
���-2.488
 lapt-2.342
 Lennon-2.334
 Gohan-2.199
FML-2.188
Token192
Token ID17477
Feature activation+7.20e-01

Pos logit contributions
otrop+0.374
acia+0.336
udos+0.335
usting+0.332
 safety+0.317
Neg logit contributions
���-0.630
 Lennon-0.606
 lapt-0.604
 Gohan-0.575
FML-0.570
Token MB
Token ID10771
Feature activation+3.31e+00

Pos logit contributions
otrop+0.441
udos+0.403
acia+0.401
usting+0.398
 safety+0.387
Neg logit contributions
���-0.573
 lapt-0.544
 Lennon-0.543
 Gohan-0.511
 Yose-0.506
Token 8
Token ID807
Feature activation+1.67e+00

Pos logit contributions
otrop+1.947
udos+1.776
acia+1.749
usting+1.738
 safety+1.720
Neg logit contributions
���-2.720
 lapt-2.551
 Lennon-2.538
 Gohan-2.399
FML-2.387
Token192
Token ID17477
Feature activation+8.52e-01

Pos logit contributions
otrop+0.855
acia+0.766
udos+0.765
usting+0.756
 safety+0.725
Neg logit contributions
���-1.483
 Lennon-1.428
 lapt-1.419
 Gohan-1.354
FML-1.342
Token MB
Token ID10771
Feature activation+3.33e+00

Pos logit contributions
otrop+0.529
udos+0.483
acia+0.483
usting+0.479
 safety+0.464
Neg logit contributions
���-0.667
 Lennon-0.637
 lapt-0.634
 Gohan-0.600
 Yose-0.592
Token Texture
Token ID44409
Feature activation+3.58e-01

Pos logit contributions
otrop+1.994
udos+1.823
acia+1.801
usting+1.786
 safety+1.769
Neg logit contributions
���-2.700
 lapt-2.533
 Lennon-2.519
 Gohan-2.378
FML-2.368
Token Units
Token ID27719
Feature activation-1.81e-01

Pos logit contributions
otrop+0.224
udos+0.206
acia+0.206
usting+0.204
 safety+0.194
Neg logit contributions
���-0.279
 Lennon-0.266
 lapt-0.266
 Gohan-0.251
FML-0.248
Token than
Token ID621
Feature activation+1.22e-01

Pos logit contributions
���+0.402
 Lennon+0.385
 lapt+0.384
 Gohan+0.365
 Yose+0.361
Neg logit contributions
otrop-0.251
udos-0.227
acia-0.226
usting-0.224
 safety-0.214
Token 10
Token ID838
Feature activation-2.93e-01

Pos logit contributions
otrop+0.080
udos+0.073
acia+0.073
usting+0.073
 safety+0.071
Neg logit contributions
���-0.092
 Lennon-0.087
 lapt-0.087
 Gohan-0.082
FML-0.081
Token years
Token ID812
Feature activation-5.90e-01

Pos logit contributions
���+0.273
 lapt+0.258
 Lennon+0.257
 agre+0.250
emort+0.247
Neg logit contributions
otrop-0.141
 safety-0.126
udos-0.125
usting-0.124
acia-0.123
Token in
Token ID287
Feature activation-2.51e-01

Pos logit contributions
���+0.500
 Lennon+0.477
 lapt+0.477
 Gohan+0.452
 Yose+0.447
Neg logit contributions
otrop-0.330
udos-0.299
acia-0.298
usting-0.296
 safety-0.287
Token prison
Token ID3770
Feature activation-1.35e-01

Pos logit contributions
���+0.216
 Lennon+0.207
 lapt+0.206
 Gohan+0.196
 Yose+0.194
Neg logit contributions
otrop-0.136
udos-0.122
acia-0.122
usting-0.121
 safety-0.116
Token MORE
Token ID12011
Feature activation+3.28e+00

Pos logit contributions
 Lennon+0.108
���+0.108
 lapt+0.103
 Gohan+0.100
 Yose+0.099
Neg logit contributions
otrop-0.079
acia-0.073
usting-0.072
udos-0.071
 mine-0.067
Token and
Token ID290
Feature activation+1.31e+00

Pos logit contributions
otrop+1.849
udos+1.686
acia+1.686
usting+1.680
 safety+1.642
Neg logit contributions
���-2.764
 lapt-2.613
 Lennon-2.607
 Gohan-2.465
 Yose-2.439
Token Hillary
Token ID5073
Feature activation+2.15e-01

Pos logit contributions
otrop+0.828
udos+0.759
acia+0.759
usting+0.754
 safety+0.734
Neg logit contributions
���-1.010
 Lennon-0.958
 lapt-0.957
 Gohan-0.901
FML-0.891
Token Clinton
Token ID2605
Feature activation-5.63e-01

Pos logit contributions
otrop+0.163
udos+0.152
acia+0.151
usting+0.150
 safety+0.146
Neg logit contributions
���-0.138
 Lennon-0.132
 lapt-0.130
 Gohan-0.122
 Yose-0.120
Token Hillary
Token ID5073
Feature activation+1.23e+00

Pos logit contributions
���+0.398
 Lennon+0.393
 lapt+0.386
 Gohan+0.365
 Yose+0.363
Neg logit contributions
otrop-0.383
udos-0.353
acia-0.350
usting-0.347
 safety-0.328
Token Diane
Token ID26542
Feature activation-9.72e-02

Pos logit contributions
otrop+0.884
acia+0.822
udos+0.819
usting+0.816
 safety+0.799
Neg logit contributions
���-0.857
 lapt-0.804
 Lennon-0.798
 Gohan-0.748
 Yose-0.736
Token look
Token ID804
Feature activation+5.13e-01

Pos logit contributions
otrop+0.011
udos+0.010
acia+0.010
usting+0.010
 safety+0.010
Neg logit contributions
���-0.013
 Lennon-0.012
 lapt-0.012
 Gohan-0.012
 Yose-0.012
Token elsewhere
Token ID8057
Feature activation-6.95e-01

Pos logit contributions
otrop+0.299
udos+0.273
acia+0.272
usting+0.269
 safety+0.256
Neg logit contributions
���-0.420
 Lennon-0.405
 lapt-0.401
 Gohan-0.381
 Yose-0.376
Token,
Token ID11
Feature activation+2.92e-01

Pos logit contributions
���+0.595
 Lennon+0.586
 lapt+0.581
 Gohan+0.555
FML+0.546
Neg logit contributions
otrop-0.369
acia-0.333
udos-0.333
usting-0.328
 safety-0.309
Token as
Token ID355
Feature activation+2.08e-01

Pos logit contributions
otrop+0.158
udos+0.143
acia+0.142
usting+0.141
 safety+0.134
Neg logit contributions
���-0.251
 Lennon-0.242
 lapt-0.241
 Gohan-0.229
 Yose-0.226
Token successful
Token ID4388
Feature activation+5.37e-01

Pos logit contributions
otrop+0.119
udos+0.108
acia+0.108
usting+0.107
 safety+0.100
Neg logit contributions
���-0.172
 Lennon-0.167
 lapt-0.167
 Gohan-0.158
 Yose-0.156
Token GP
Token ID14714
Feature activation+3.25e+00

Pos logit contributions
otrop+0.333
udos+0.303
acia+0.302
usting+0.300
 safety+0.288
Neg logit contributions
���-0.419
 Lennon-0.405
 lapt-0.404
 Gohan-0.381
 Yose-0.375
Token2
Token ID17
Feature activation+6.72e-01

Pos logit contributions
otrop+2.187
udos+2.051
acia+2.047
usting+2.037
 safety+2.032
Neg logit contributions
���-2.351
 Lennon-2.210
 lapt-2.162
 Gohan-2.050
 Yose-2.035
Token and
Token ID290
Feature activation-5.06e-01

Pos logit contributions
otrop+0.449
udos+0.413
acia+0.413
usting+0.410
 safety+0.400
Neg logit contributions
���-0.495
 Lennon-0.469
 lapt-0.467
 Gohan-0.440
 Yose-0.434
Token Indy
Token ID26023
Feature activation-5.25e-01

Pos logit contributions
���+0.388
 lapt+0.380
 Lennon+0.378
 Gohan+0.355
 Yose+0.349
Neg logit contributions
otrop-0.320
udos-0.292
acia-0.290
usting-0.290
 safety-0.272
TokenCar
Token ID9914
Feature activation+1.40e+00

Pos logit contributions
���+0.320
 lapt+0.309
 Lennon+0.306
 Gohan+0.284
FML+0.280
Neg logit contributions
otrop-0.416
acia-0.384
udos-0.384
usting-0.381
 safety-0.367
Token teams
Token ID3466
Feature activation+3.54e-01

Pos logit contributions
otrop+0.925
acia+0.853
udos+0.852
usting+0.847
 safety+0.829
Neg logit contributions
���-1.047
 Lennon-0.991
 lapt-0.982
 Gohan-0.930
 Yose-0.918
Token of
Token ID286
Feature activation+1.10e+00

Pos logit contributions
otrop+0.353
udos+0.325
acia+0.324
usting+0.321
 safety+0.312
Neg logit contributions
���-0.413
 Lennon-0.392
 lapt-0.391
 Gohan-0.369
 Yose-0.364
Token Ancient
Token ID13406
Feature activation-5.38e-03

Pos logit contributions
otrop+0.702
udos+0.644
acia+0.641
usting+0.638
 safety+0.621
Neg logit contributions
���-0.841
 Lennon-0.798
 lapt-0.795
 Gohan-0.750
FML-0.741
Token Greece
Token ID10315
Feature activation+7.47e-01

Pos logit contributions
���+0.003
 Lennon+0.003
 lapt+0.003
 Gohan+0.003
 Yose+0.003
Neg logit contributions
otrop-0.004
udos-0.004
acia-0.004
usting-0.004
 safety-0.004
Token focused
Token ID5670
Feature activation-3.11e-01

Pos logit contributions
otrop+0.471
udos+0.430
acia+0.430
usting+0.426
 safety+0.412
Neg logit contributions
���-0.579
 Lennon-0.553
 lapt-0.550
 Gohan-0.520
 Yose-0.513
Token un
Token ID555
Feature activation+1.45e-01

Pos logit contributions
���+0.240
 Lennon+0.233
 lapt+0.231
 Gohan+0.218
 Yose+0.215
Neg logit contributions
otrop-0.194
acia-0.178
udos-0.178
usting-0.176
 safety-0.168
Tokenash
Token ID1077
Feature activation+3.25e+00

Pos logit contributions
otrop+0.096
udos+0.089
acia+0.089
usting+0.088
 safety+0.085
Neg logit contributions
���-0.106
 Lennon-0.101
 lapt-0.101
 Gohan-0.095
FML-0.094
Tokenamed
Token ID2434
Feature activation+6.63e-01

Pos logit contributions
otrop+1.232
usting+1.049
acia+1.043
udos+1.041
 mine+0.975
Neg logit contributions
���-3.395
 lapt-3.248
 Lennon-3.219
 Gohan-3.070
 Yose-3.047
Tokenly
Token ID306
Feature activation+1.11e+00

Pos logit contributions
otrop+0.475
udos+0.441
acia+0.440
usting+0.438
 safety+0.429
Neg logit contributions
���-0.458
 lapt-0.428
 Lennon-0.428
 Gohan-0.400
 Yose-0.396
Token on
Token ID319
Feature activation+7.22e-02

Pos logit contributions
otrop+0.678
udos+0.618
acia+0.617
usting+0.612
 safety+0.592
Neg logit contributions
���-0.880
 Lennon-0.842
 lapt-0.837
 Gohan-0.792
 Yose-0.782
Token the
Token ID262
Feature activation+2.23e-01

Pos logit contributions
otrop+0.042
udos+0.038
acia+0.038
usting+0.038
 safety+0.036
Neg logit contributions
���-0.059
 Lennon-0.057
 lapt-0.057
 Gohan-0.054
 Yose-0.053
Token naked
Token ID12105
Feature activation-9.10e-01

Pos logit contributions
otrop+0.134
udos+0.122
acia+0.121
usting+0.120
 safety+0.116
Neg logit contributions
���-0.179
 Lennon-0.172
 lapt-0.171
 Gohan-0.162
 Yose-0.160
Token popular
Token ID2968
Feature activation-6.80e-01

Pos logit contributions
otrop+0.544
udos+0.494
acia+0.493
usting+0.489
 safety+0.471
Neg logit contributions
���-0.761
 Lennon-0.729
 lapt-0.724
 Gohan-0.687
 Yose-0.681
Tokenized
Token ID1143
Feature activation-1.17e+00

Pos logit contributions
���+0.564
 Lennon+0.551
 lapt+0.545
 Gohan+0.519
 Yose+0.514
Neg logit contributions
otrop-0.382
udos-0.348
acia-0.344
usting-0.344
 safety-0.323
Token by
Token ID416
Feature activation-6.40e-02

Pos logit contributions
���+0.937
 Lennon+0.911
 lapt+0.897
 Gohan+0.856
 Yose+0.851
Neg logit contributions
otrop-0.699
udos-0.641
acia-0.640
usting-0.631
 safety-0.601
Token Anita
Token ID38992
Feature activation-2.43e-01

Pos logit contributions
���+0.047
 Lennon+0.045
 lapt+0.044
 Gohan+0.042
 Yose+0.042
Neg logit contributions
otrop-0.043
udos-0.040
acia-0.040
usting-0.039
 safety-0.038
Token D
Token ID360
Feature activation+1.89e-01

Pos logit contributions
���+0.180
 Lennon+0.172
 lapt+0.171
 Gohan+0.161
 Yose+0.159
Neg logit contributions
otrop-0.161
udos-0.148
acia-0.148
usting-0.147
 safety-0.142
Tokeniam
Token ID1789
Feature activation+3.24e+00

Pos logit contributions
otrop+0.118
udos+0.108
acia+0.107
usting+0.107
 safety+0.104
Neg logit contributions
���-0.146
 Lennon-0.141
 lapt-0.139
 Gohan-0.132
 Yose-0.131
Tokenant
Token ID415
Feature activation+8.13e-01

Pos logit contributions
otrop+2.040
acia+1.855
udos+1.854
usting+1.803
 safety+1.755
Neg logit contributions
���-2.541
 lapt-2.419
 Lennon-2.401
 Gohan-2.269
FML-2.254
Token,
Token ID11
Feature activation-1.44e-01

Pos logit contributions
otrop+0.518
udos+0.475
acia+0.474
usting+0.469
 safety+0.456
Neg logit contributions
���-0.625
 Lennon-0.593
 lapt-0.592
 Gohan-0.559
FML-0.551
Token The
Token ID383
Feature activation+8.34e-02

Pos logit contributions
���+0.115
 Lennon+0.110
 lapt+0.110
 Gohan+0.104
 Yose+0.103
Neg logit contributions
otrop-0.087
udos-0.079
acia-0.079
usting-0.079
 safety-0.076
Token Red
Token ID2297
Feature activation+4.58e-01

Pos logit contributions
otrop+0.057
udos+0.053
acia+0.053
usting+0.053
 safety+0.051
Neg logit contributions
���-0.060
 Lennon-0.057
 lapt-0.056
 Gohan-0.053
 Yose-0.052
Token Tent
Token ID30765
Feature activation-3.81e-01

Pos logit contributions
otrop+0.312
udos+0.287
acia+0.287
usting+0.285
 safety+0.277
Neg logit contributions
���-0.333
 lapt-0.315
 Lennon-0.315
 Gohan-0.295
FML-0.291
Tokenojure
Token ID32511
Feature activation+1.92e-02

Pos logit contributions
otrop+0.778
udos+0.719
acia+0.718
usting+0.711
 safety+0.688
Neg logit contributions
���-0.793
 lapt-0.736
 Lennon-0.729
 Gohan-0.683
 agre-0.677
Token |
Token ID930
Feature activation+1.83e+00

Pos logit contributions
otrop+0.012
udos+0.011
acia+0.011
usting+0.010
 safety+0.010
Neg logit contributions
���-0.015
 Lennon-0.015
 lapt-0.015
 Gohan-0.014
 Yose-0.014
Token 208
Token ID27121
Feature activation+1.76e-02

Pos logit contributions
otrop+1.174
udos+1.073
acia+1.071
usting+1.064
 safety+1.039
Neg logit contributions
���-1.397
 Lennon-1.321
 lapt-1.318
 Gohan-1.244
 Yose-1.229
Token34
Token ID2682
Feature activation+5.19e-01

Pos logit contributions
otrop+0.011
udos+0.010
acia+0.010
usting+0.010
 safety+0.010
Neg logit contributions
���-0.014
 Lennon-0.013
 lapt-0.013
 Gohan-0.012
 Yose-0.012
Token |
Token ID930
Feature activation+2.37e+00

Pos logit contributions
otrop+0.324
udos+0.295
acia+0.295
usting+0.293
 safety+0.282
Neg logit contributions
���-0.405
 Lennon-0.388
 lapt-0.386
 Gohan-0.365
 Yose-0.360
Token |
Token ID930
Feature activation+3.21e+00

Pos logit contributions
otrop+1.404
udos+1.281
acia+1.276
usting+1.265
 safety+1.232
Neg logit contributions
���-1.925
 Lennon-1.830
 lapt-1.824
 Gohan-1.731
FML-1.708
Token 2
Token ID362
Feature activation+8.09e-01

Pos logit contributions
otrop+1.921
udos+1.755
acia+1.747
usting+1.724
 safety+1.702
Neg logit contributions
���-2.580
 Lennon-2.434
 lapt-2.429
 Gohan-2.318
 Yose-2.279
Token |
Token ID930
Feature activation+2.25e+00

Pos logit contributions
otrop+0.486
udos+0.442
acia+0.441
usting+0.438
 safety+0.423
Neg logit contributions
���-0.650
 Lennon-0.622
 lapt-0.619
 Gohan-0.585
FML-0.579
Token org
Token ID8745
Feature activation+6.03e-01

Pos logit contributions
otrop+1.204
udos+1.083
acia+1.077
usting+1.067
 safety+1.039
Neg logit contributions
���-1.965
 Lennon-1.870
 lapt-1.864
 Gohan-1.784
 Yose-1.763
Token.
Token ID764
Feature activation-8.75e-02

Pos logit contributions
otrop+0.229
udos+0.197
acia+0.196
usting+0.194
 safety+0.183
Neg logit contributions
���-0.615
 Lennon-0.597
 lapt-0.594
 Gohan-0.570
 Yose-0.565
Token cl
Token ID537
Feature activation+1.20e+00

Pos logit contributions
���+0.058
 lapt+0.053
 Lennon+0.053
 Gohan+0.050
 Yose+0.049
Neg logit contributions
otrop-0.066
acia-0.061
udos-0.061
usting-0.061
 safety-0.060
Tokenus
Token ID385
Feature activation-5.77e-03

Pos logit contributions
otrop+0.455
udos+0.418
acia+0.416
usting+0.415
 safety+0.404
Neg logit contributions
���-0.532
 Lennon-0.509
 lapt-0.506
 Gohan-0.477
 Yose-0.470
Token nerve
Token ID16384
Feature activation-1.01e-01

Pos logit contributions
���+0.004
 Lennon+0.004
 lapt+0.004
 Gohan+0.003
 Yose+0.003
Neg logit contributions
otrop-0.004
udos-0.004
usting-0.004
acia-0.004
 mine-0.004
Token stimulation
Token ID20087
Feature activation+5.62e-01

Pos logit contributions
���+0.068
 Lennon+0.065
 lapt+0.064
 Gohan+0.060
 Yose+0.059
Neg logit contributions
otrop-0.073
udos-0.069
usting-0.067
acia-0.067
 mine-0.064
Token (
Token ID357
Feature activation+6.86e-01

Pos logit contributions
otrop+0.343
udos+0.319
acia+0.318
usting+0.313
 mine+0.296
Neg logit contributions
���-0.436
 Lennon-0.422
 lapt-0.418
 Gohan-0.396
FML-0.393
TokenV
Token ID53
Feature activation+5.05e-01

Pos logit contributions
otrop+0.431
udos+0.395
acia+0.393
usting+0.391
 safety+0.379
Neg logit contributions
���-0.534
 Lennon-0.508
 lapt-0.506
 Gohan-0.478
 Yose-0.471
TokenNS
Token ID8035
Feature activation+3.20e+00

Pos logit contributions
otrop+0.408
udos+0.381
acia+0.381
usting+0.378
 safety+0.369
Neg logit contributions
���-0.300
 Lennon-0.283
 lapt-0.281
 Gohan-0.261
FML-0.257
Token),
Token ID828
Feature activation+3.14e-01

Pos logit contributions
otrop+2.107
acia+1.928
udos+1.924
usting+1.918
 safety+1.888
Neg logit contributions
���-2.394
 lapt-2.273
 Lennon-2.253
 Gohan-2.124
 Yose-2.109
Token an
Token ID281
Feature activation-1.91e-01

Pos logit contributions
otrop+0.182
udos+0.167
acia+0.166
usting+0.164
 safety+0.156
Neg logit contributions
���-0.258
 Lennon-0.248
 lapt-0.246
 Gohan-0.233
 Yose-0.231
Token FDA
Token ID14672
Feature activation+3.04e-01

Pos logit contributions
���+0.153
 Lennon+0.146
 lapt+0.145
 Gohan+0.137
 Yose+0.136
Neg logit contributions
otrop-0.115
udos-0.105
acia-0.105
usting-0.104
 safety-0.101
Token-
Token ID12
Feature activation-3.54e-01

Pos logit contributions
otrop+0.113
udos+0.097
acia+0.097
usting+0.095
 safety+0.088
Neg logit contributions
���-0.313
 Lennon-0.302
 lapt-0.302
 Gohan-0.289
FML-0.287
Tokenapproved
Token ID29137
Feature activation+1.58e-01

Pos logit contributions
���+0.393
 Lennon+0.380
 lapt+0.379
 Gohan+0.365
FML+0.363
Neg logit contributions
otrop-0.104
acia-0.085
udos-0.085
usting-0.083
 safety-0.076
Token Republicans
Token ID4734
Feature activation+4.62e-01

Pos logit contributions
otrop+0.377
udos+0.340
acia+0.339
usting+0.336
 safety+0.323
Neg logit contributions
���-0.602
 Lennon-0.579
 lapt-0.576
 Gohan-0.548
 Yose-0.541
Token are
Token ID389
Feature activation+3.17e-01

Pos logit contributions
otrop+0.267
udos+0.242
acia+0.241
usting+0.238
 safety+0.229
Neg logit contributions
���-0.382
 Lennon-0.367
 lapt-0.364
 Gohan-0.346
 Yose-0.342
Token furiously
Token ID44896
Feature activation+8.13e-01

Pos logit contributions
otrop+0.197
udos+0.180
acia+0.180
usting+0.179
 safety+0.173
Neg logit contributions
���-0.248
 Lennon-0.236
 lapt-0.236
 Gohan-0.223
 Yose-0.220
Token whipping
Token ID44637
Feature activation+1.06e+00

Pos logit contributions
otrop+0.517
udos+0.475
usting+0.475
acia+0.474
 safety+0.461
Neg logit contributions
���-0.624
 Lennon-0.591
 lapt-0.590
 Gohan-0.556
 Yose-0.550
Token to
Token ID284
Feature activation+3.56e-01

Pos logit contributions
otrop+0.605
udos+0.548
acia+0.546
usting+0.545
 safety+0.530
Neg logit contributions
���-0.887
 Lennon-0.842
 lapt-0.841
 Gohan-0.794
 Yose-0.787
Token pass
Token ID1208
Feature activation+3.17e+00

Pos logit contributions
otrop+0.206
acia+0.187
udos+0.185
usting+0.184
 safety+0.174
Neg logit contributions
���-0.291
 Lennon-0.283
 lapt-0.281
 Gohan-0.266
 Yose-0.263
Token ahead
Token ID4058
Feature activation+1.31e+00

Pos logit contributions
otrop+1.686
udos+1.524
acia+1.519
usting+1.513
 safety+1.473
Neg logit contributions
���-2.770
 lapt-2.633
 Lennon-2.630
 Gohan-2.490
 Yose-2.473
Token of
Token ID286
Feature activation+1.04e-01

Pos logit contributions
otrop+0.792
udos+0.722
acia+0.722
usting+0.717
 safety+0.705
Neg logit contributions
���-1.074
 lapt-1.008
 Lennon-0.997
 Gohan-0.941
 agre-0.931
Token September
Token ID2693
Feature activation+1.33e-01

Pos logit contributions
otrop+0.061
udos+0.055
acia+0.055
usting+0.055
 safety+0.052
Neg logit contributions
���-0.085
 Lennon-0.083
 lapt-0.082
 Gohan-0.078
 Yose-0.077
Token 30
Token ID1542
Feature activation-8.36e-01

Pos logit contributions
otrop+0.088
udos+0.081
acia+0.081
usting+0.080
 safety+0.077
Neg logit contributions
���-0.099
 Lennon-0.095
 lapt-0.094
 Gohan-0.089
 Yose-0.088
Token,
Token ID11
Feature activation-7.08e-02

Pos logit contributions
���+0.646
 Lennon+0.623
 lapt+0.620
 Gohan+0.587
FML+0.580
Neg logit contributions
otrop-0.520
acia-0.476
udos-0.473
usting-0.464
 safety-0.447
Token and
Token ID290
Feature activation+6.42e-01

Pos logit contributions
otrop+0.545
udos+0.501
acia+0.497
usting+0.496
 safety+0.485
Neg logit contributions
���-0.663
 lapt-0.629
 Lennon-0.628
 Gohan-0.593
 Yose-0.585
Token Reporting
Token ID29595
Feature activation+5.98e-01

Pos logit contributions
otrop+0.575
udos+0.540
acia+0.540
usting+0.536
 safety+0.523
Neg logit contributions
���-0.327
 Lennon-0.305
 lapt-0.301
 Gohan-0.277
 Yose-0.272
Token System
Token ID4482
Feature activation+7.22e-01

Pos logit contributions
otrop+0.496
udos+0.464
acia+0.463
usting+0.461
 safety+0.451
Neg logit contributions
���-0.345
 Lennon-0.322
 lapt-0.322
 Gohan-0.297
 Yose-0.291
Token (
Token ID357
Feature activation+3.83e-01

Pos logit contributions
otrop+0.399
udos+0.360
acia+0.359
usting+0.356
 safety+0.341
Neg logit contributions
���-0.613
 Lennon-0.590
 lapt-0.586
 Gohan-0.557
 Yose-0.551
TokenW
Token ID54
Feature activation+2.92e-01

Pos logit contributions
otrop+0.271
acia+0.251
udos+0.250
usting+0.249
 safety+0.241
Neg logit contributions
���-0.265
 Lennon-0.252
 lapt-0.251
 Gohan-0.235
FML-0.234
TokenIS
Token ID1797
Feature activation+3.16e+00

Pos logit contributions
otrop+0.182
udos+0.166
acia+0.166
usting+0.165
 safety+0.160
Neg logit contributions
���-0.228
 Lennon-0.218
 lapt-0.217
 Gohan-0.205
 Yose-0.202
TokenQ
Token ID48
Feature activation-4.61e-02

Pos logit contributions
otrop+2.142
udos+1.994
usting+1.958
 safety+1.957
acia+1.940
Neg logit contributions
���-2.318
 lapt-2.194
 Lennon-2.143
 Gohan-2.011
 Yose-2.002
TokenARS
Token ID27415
Feature activation+1.10e+00

Pos logit contributions
���+0.031
 lapt+0.029
 Lennon+0.029
 Gohan+0.027
 Yose+0.027
Neg logit contributions
otrop-0.034
udos-0.031
acia-0.031
usting-0.031
 safety-0.030
Token).
Token ID737
Feature activation+6.10e-01

Pos logit contributions
otrop+0.719
udos+0.661
acia+0.658
usting+0.655
 safety+0.638
Neg logit contributions
���-0.831
 Lennon-0.789
 lapt-0.788
 Gohan-0.741
 Yose-0.731
Token Atlanta
Token ID9371
Feature activation+9.26e-01

Pos logit contributions
otrop+0.389
udos+0.357
acia+0.356
usting+0.354
 safety+0.342
Neg logit contributions
���-0.466
 Lennon-0.447
 lapt-0.443
 Gohan-0.420
 Yose-0.414
Token,
Token ID11
Feature activation+9.09e-01

Pos logit contributions
otrop+0.614
udos+0.566
acia+0.564
usting+0.561
 safety+0.539
Neg logit contributions
���-0.684
 Lennon-0.652
 lapt-0.648
 Gohan-0.610
FML-0.603
TokenAs
Token ID1722
Feature activation+4.72e-01

Pos logit contributions
otrop+0.790
udos+0.733
acia+0.727
usting+0.721
 safety+0.697
Neg logit contributions
���-0.875
 Lennon-0.824
 lapt-0.820
 Gohan-0.772
 Yose-0.760
Tokenyu
Token ID24767
Feature activation+3.23e-01

Pos logit contributions
otrop+0.327
acia+0.304
udos+0.303
usting+0.301
 safety+0.292
Neg logit contributions
���-0.338
 Lennon-0.318
 lapt-0.317
 Gohan-0.297
 Yose-0.294
Token
Token ID198
Feature activation+1.18e+00

Pos logit contributions
otrop+0.202
udos+0.185
acia+0.184
usting+0.183
 safety+0.177
Neg logit contributions
���-0.252
 Lennon-0.240
 lapt-0.239
 Gohan-0.226
 Yose-0.223
Token
Token ID198
Feature activation+1.08e+00

Pos logit contributions
otrop+0.671
udos+0.608
acia+0.607
usting+0.601
 safety+0.584
Neg logit contributions
���-1.003
 lapt-0.947
 Lennon-0.937
 Gohan-0.885
 agre-0.879
TokenK
Token ID42
Feature activation+7.14e-01

Pos logit contributions
otrop+0.786
udos+0.728
acia+0.728
usting+0.722
 safety+0.703
Neg logit contributions
���-0.742
 Lennon-0.698
 lapt-0.694
 Gohan-0.649
 Yose-0.639
Tokenong
Token ID506
Feature activation+3.14e+00

Pos logit contributions
otrop+0.475
udos+0.441
acia+0.437
usting+0.433
 safety+0.419
Neg logit contributions
���-0.536
 Lennon-0.505
 lapt-0.502
 Gohan-0.472
 Yose-0.465
Token
Token ID198
Feature activation+1.55e+00

Pos logit contributions
otrop+2.179
acia+2.044
udos+2.039
usting+2.022
 safety+1.970
Neg logit contributions
���-2.237
 Lennon-2.087
 lapt-2.072
 Gohan-1.921
 Yose-1.904
Token
Token ID198
Feature activation+1.24e+00

Pos logit contributions
otrop+0.865
udos+0.783
acia+0.781
usting+0.773
 safety+0.753
Neg logit contributions
���-1.350
 lapt-1.268
 Lennon-1.244
 agre-1.184
 Gohan-1.178
TokenCh
Token ID1925
Feature activation+1.33e+00

Pos logit contributions
otrop+0.899
udos+0.834
acia+0.833
usting+0.826
 safety+0.805
Neg logit contributions
���-0.850
 Lennon-0.799
 lapt-0.794
 Gohan-0.743
 Yose-0.731
Tokenoya
Token ID23790
Feature activation+2.20e-01

Pos logit contributions
otrop+0.900
acia+0.837
udos+0.836
usting+0.828
 safety+0.795
Neg logit contributions
���-0.994
 Lennon-0.924
 lapt-0.924
 Gohan-0.863
FML-0.850
Token
Token ID198
Feature activation+1.18e+00

Pos logit contributions
otrop+0.137
udos+0.125
acia+0.125
usting+0.124
 safety+0.120
Neg logit contributions
���-0.173
 Lennon-0.165
 lapt-0.164
 Gohan-0.155
 Yose-0.153
Token,
Token ID11
Feature activation-5.79e-01

Pos logit contributions
otrop+0.736
udos+0.675
acia+0.674
usting+0.669
 safety+0.650
Neg logit contributions
���-0.863
 Lennon-0.821
 lapt-0.820
 Gohan-0.772
 Yose-0.762
Token also
Token ID635
Feature activation+1.13e-01

Pos logit contributions
���+0.499
 Lennon+0.480
 lapt+0.476
 Gohan+0.454
 Yose+0.448
Neg logit contributions
otrop-0.314
udos-0.283
acia-0.282
usting-0.280
 safety-0.267
Token said
Token ID531
Feature activation+2.11e-01

Pos logit contributions
otrop+0.065
udos+0.059
acia+0.059
usting+0.058
 safety+0.056
Neg logit contributions
���-0.094
 Lennon-0.090
 lapt-0.090
 Gohan-0.085
 Yose-0.084
Token Congress
Token ID3162
Feature activation-2.77e-01

Pos logit contributions
otrop+0.127
udos+0.116
acia+0.115
usting+0.115
 safety+0.110
Neg logit contributions
���-0.169
 Lennon-0.162
 lapt-0.161
 Gohan-0.153
 Yose-0.151
Token would
Token ID561
Feature activation-3.36e-01

Pos logit contributions
���+0.213
 Lennon+0.204
 lapt+0.203
 Gohan+0.191
FML+0.190
Neg logit contributions
otrop-0.176
udos-0.160
acia-0.160
usting-0.158
 safety-0.152
Token pass
Token ID1208
Feature activation+3.14e+00

Pos logit contributions
���+0.261
 Lennon+0.252
 lapt+0.249
 Gohan+0.237
FML+0.233
Neg logit contributions
otrop-0.209
acia-0.189
udos-0.188
usting-0.188
 safety-0.177
Token legislation
Token ID5520
Feature activation-8.79e-01

Pos logit contributions
otrop+1.782
udos+1.609
acia+1.606
usting+1.601
 safety+1.563
Neg logit contributions
���-2.626
 lapt-2.506
 Lennon-2.502
 Gohan-2.366
 Yose-2.350
Token to
Token ID284
Feature activation+3.83e-01

Pos logit contributions
���+0.714
 Lennon+0.696
 lapt+0.687
 Gohan+0.656
FML+0.647
Neg logit contributions
otrop-0.513
udos-0.468
acia-0.464
usting-0.457
 safety-0.434
Token raise
Token ID5298
Feature activation+3.79e-01

Pos logit contributions
otrop+0.237
acia+0.217
udos+0.216
usting+0.214
 safety+0.205
Neg logit contributions
���-0.298
 Lennon-0.287
 lapt-0.284
 Gohan-0.270
FML-0.266
Token the
Token ID262
Feature activation+2.53e-01

Pos logit contributions
otrop+0.218
acia+0.202
udos+0.202
usting+0.197
 mine+0.186
Neg logit contributions
���-0.308
 Lennon-0.299
 lapt-0.295
 Gohan-0.281
FML-0.279
Token ceiling
Token ID13387
Feature activation-9.04e-01

Pos logit contributions
otrop+0.168
udos+0.157
acia+0.156
usting+0.154
 mine+0.147
Neg logit contributions
���-0.185
 Lennon-0.178
 lapt-0.176
 Gohan-0.166
FML-0.164
Token cl
Token ID537
Feature activation-4.54e-01

Pos logit contributions
���+0.557
 Lennon+0.538
 lapt+0.536
 Gohan+0.506
FML+0.502
Neg logit contributions
otrop-0.372
udos-0.337
acia-0.336
usting-0.332
 safety-0.318
Tokenojure
Token ID32511
Feature activation-6.96e-01

Pos logit contributions
���+0.299
 Lennon+0.286
 lapt+0.284
 Gohan+0.266
FML+0.263
Neg logit contributions
otrop-0.334
udos-0.309
acia-0.308
usting-0.307
 safety-0.300
Token.
Token ID764
Feature activation+4.40e-03

Pos logit contributions
���+0.668
 lapt+0.653
 Lennon+0.650
FML+0.612
 Gohan+0.609
Neg logit contributions
otrop-0.295
udos-0.264
acia-0.261
usting-0.256
 safety-0.244
Token contents
Token ID10154
Feature activation+3.26e-01

Pos logit contributions
otrop+0.003
acia+0.003
udos+0.003
usting+0.002
 safety+0.002
Neg logit contributions
���-0.003
 Lennon-0.003
 lapt-0.003
 Gohan-0.003
FML-0.003
Token ]
Token ID2361
Feature activation+1.33e-01

Pos logit contributions
otrop+0.162
udos+0.144
acia+0.144
usting+0.142
 safety+0.136
Neg logit contributions
���-0.295
 Lennon-0.285
 lapt-0.285
 Gohan-0.270
FML-0.267
Token WHERE
Token ID33411
Feature activation+3.13e+00

Pos logit contributions
otrop+0.089
udos+0.082
acia+0.082
usting+0.082
 safety+0.079
Neg logit contributions
���-0.097
 Lennon-0.092
 lapt-0.092
 Gohan-0.086
 Yose-0.085
Token id
Token ID4686
Feature activation+2.54e-01

Pos logit contributions
otrop+2.000
acia+1.836
udos+1.831
usting+1.825
 safety+1.804
Neg logit contributions
���-2.404
 Lennon-2.233
 lapt-2.228
 Gohan-2.118
 Yose-2.107
Token IN
Token ID3268
Feature activation+3.35e-01

Pos logit contributions
otrop+0.173
udos+0.160
acia+0.159
usting+0.158
 safety+0.154
Neg logit contributions
���-0.184
 Lennon-0.175
 lapt-0.174
 Gohan-0.163
FML-0.161
Token (
Token ID357
Feature activation-9.16e-01

Pos logit contributions
otrop+0.207
udos+0.190
acia+0.189
usting+0.188
 safety+0.181
Neg logit contributions
���-0.262
 Lennon-0.251
 lapt-0.250
 Gohan-0.236
FML-0.233
Token SELECT
Token ID33493
Feature activation+9.52e-01

Pos logit contributions
���+0.637
 Lennon+0.610
 lapt+0.606
 Gohan+0.567
FML+0.559
Neg logit contributions
otrop-0.647
udos-0.597
acia-0.596
usting-0.592
 safety-0.572
Token id
Token ID4686
Feature activation+5.86e-01

Pos logit contributions
otrop+0.601
udos+0.550
acia+0.549
usting+0.545
 safety+0.528
Neg logit contributions
���-0.736
 Lennon-0.702
 lapt-0.699
 Gohan-0.659
 Yose-0.651
Token Director
Token ID5890
Feature activation-6.16e-02

Pos logit contributions
otrop+1.123
acia+1.058
udos+1.053
usting+1.035
 safety+1.022
Neg logit contributions
���-0.925
 lapt-0.855
 Lennon-0.855
 Gohan-0.792
FML-0.785
Token at
Token ID379
Feature activation-8.15e-01

Pos logit contributions
���+0.050
 Lennon+0.048
 lapt+0.048
 Gohan+0.045
 Yose+0.045
Neg logit contributions
otrop-0.037
udos-0.033
acia-0.033
usting-0.033
 safety-0.032
Token Wave
Token ID17084
Feature activation+7.87e-01

Pos logit contributions
���+0.605
 Lennon+0.579
 lapt+0.577
 Gohan+0.543
 Yose+0.536
Neg logit contributions
otrop-0.537
udos-0.494
acia-0.492
usting-0.489
 safety-0.473
Tokenform
Token ID687
Feature activation-1.82e-01

Pos logit contributions
otrop+0.546
udos+0.504
acia+0.503
usting+0.500
 safety+0.487
Neg logit contributions
���-0.561
 Lennon-0.532
 lapt-0.529
 Gohan-0.497
 Yose-0.489
Token Entertainment
Token ID11058
Feature activation+6.00e-01

Pos logit contributions
���+0.125
 Lennon+0.119
 lapt+0.119
 Gohan+0.111
 Yose+0.110
Neg logit contributions
otrop-0.130
udos-0.120
acia-0.120
usting-0.119
 safety-0.116
Token |
Token ID930
Feature activation+3.13e+00

Pos logit contributions
otrop+0.358
udos+0.326
acia+0.325
usting+0.323
 safety+0.313
Neg logit contributions
���-0.485
 Lennon-0.462
 lapt-0.461
 Gohan-0.436
FML-0.431
Token Streaming
Token ID43124
Feature activation+1.14e+00

Pos logit contributions
otrop+2.137
acia+1.965
udos+1.962
usting+1.956
 safety+1.938
Neg logit contributions
���-2.268
 lapt-2.090
 Lennon-2.087
 Gohan-1.948
FML-1.942
Token Every
Token ID3887
Feature activation+6.55e-01

Pos logit contributions
otrop+0.743
acia+0.684
udos+0.682
usting+0.677
 safety+0.662
Neg logit contributions
���-0.864
 Lennon-0.817
 lapt-0.813
 Gohan-0.765
FML-0.756
Token Es
Token ID8678
Feature activation+7.81e-01

Pos logit contributions
otrop+0.458
acia+0.422
udos+0.422
usting+0.419
 safety+0.410
Neg logit contributions
���-0.463
 Lennon-0.437
 lapt-0.436
 Gohan-0.408
 Yose-0.404
Tokenport
Token ID634
Feature activation+1.52e-01

Pos logit contributions
otrop+0.379
acia+0.337
udos+0.336
usting+0.331
 mine+0.317
Neg logit contributions
���-0.724
 Lennon-0.695
 lapt-0.692
 Gohan-0.659
 Yose-0.653
Token Under
Token ID4698
Feature activation+1.78e+00

Pos logit contributions
otrop+0.101
udos+0.093
acia+0.093
usting+0.092
 safety+0.090
Neg logit contributions
���-0.113
 lapt-0.106
 Lennon-0.106
 Gohan-0.100
FML-0.098
Token sound
Token ID2128
Feature activation+2.96e-01

Pos logit contributions
���+0.201
 Lennon+0.193
 lapt+0.192
 Gohan+0.181
 Yose+0.179
Neg logit contributions
otrop-0.156
udos-0.142
acia-0.142
usting-0.141
 safety-0.135
Token system
Token ID1080
Feature activation+1.05e+00

Pos logit contributions
otrop+0.182
udos+0.166
acia+0.166
usting+0.165
 safety+0.160
Neg logit contributions
���-0.234
 Lennon-0.223
 lapt-0.222
 Gohan-0.210
 Yose-0.207
Token.
Token ID13
Feature activation-5.33e-01

Pos logit contributions
otrop+0.565
udos+0.508
acia+0.507
usting+0.503
 safety+0.484
Neg logit contributions
���-0.916
 Lennon-0.879
 lapt-0.875
 Gohan-0.832
 Yose-0.823
Token
Token ID198
Feature activation-2.31e+00

Pos logit contributions
���+0.405
 Lennon+0.390
 lapt+0.386
 Gohan+0.366
 Yose+0.361
Neg logit contributions
otrop-0.342
acia-0.313
udos-0.311
usting-0.310
 safety-0.300
Token
Token ID198
Feature activation-2.29e+00

Pos logit contributions
 Lennon+1.651
FML+1.546
 Gohan+1.543
���+1.542
 lapt+1.534
Neg logit contributions
otrop-1.516
acia-1.394
udos-1.385
usting-1.379
 safety-1.313
TokenRated
Token ID15322
Feature activation-4.52e+00

Pos logit contributions
 Lennon+1.314
���+1.275
 lapt+1.259
FML+1.239
 Gohan+1.210
Neg logit contributions
otrop-1.805
acia-1.687
usting-1.658
udos-1.657
 safety-1.621
Token 5
Token ID642
Feature activation-2.83e+00

Pos logit contributions
���+3.104
 Lennon+3.033
 lapt+2.968
 Gohan+2.823
 Yose+2.790
Neg logit contributions
otrop-3.154
acia-2.900
udos-2.877
usting-2.818
oras-2.694
Token out
Token ID503
Feature activation-1.55e+00

Pos logit contributions
 Lennon+2.121
���+2.060
 lapt+2.024
 Gohan+1.995
FML+1.951
Neg logit contributions
otrop-1.761
acia-1.608
udos-1.589
usting-1.580
 safety-1.489
Token of
Token ID286
Feature activation-3.13e+00

Pos logit contributions
���+1.134
 Lennon+1.121
 lapt+1.088
 Gohan+1.049
FML+1.039
Neg logit contributions
otrop-1.008
acia-0.925
udos-0.923
usting-0.919
 safety-0.886
Token 5
Token ID642
Feature activation-2.30e+00

Pos logit contributions
 Lennon+2.121
���+2.054
 lapt+1.994
 Gohan+1.954
FML+1.935
Neg logit contributions
otrop-2.194
acia-2.039
udos-2.013
usting-2.000
 safety-1.898
Token by
Token ID416
Feature activation-1.31e+00

Pos logit contributions
 Lennon+1.719
���+1.634
 Gohan+1.615
 lapt+1.600
FML+1.580
Neg logit contributions
otrop-1.456
acia-1.327
udos-1.309
usting-1.304
 safety-1.237
Token and
Token ID290
Feature activation+1.27e-02

Pos logit contributions
otrop+0.312
udos+0.280
acia+0.279
usting+0.277
 safety+0.266
Neg logit contributions
���-0.519
 Lennon-0.497
 lapt-0.496
 Gohan-0.471
 Yose-0.466
Token night
Token ID1755
Feature activation+1.17e+00

Pos logit contributions
otrop+0.008
udos+0.007
acia+0.007
usting+0.007
 safety+0.007
Neg logit contributions
���-0.010
 Lennon-0.010
 lapt-0.010
 Gohan-0.009
 Yose-0.009
Token!
Token ID0
Feature activation-1.70e-01

Pos logit contributions
otrop+0.601
udos+0.539
acia+0.536
usting+0.534
 safety+0.514
Neg logit contributions
���-1.049
 Lennon-1.000
 lapt-0.998
 Gohan-0.949
FML-0.939
Token
Token ID198
Feature activation-2.01e+00

Pos logit contributions
���+0.131
 Lennon+0.125
 lapt+0.124
 Gohan+0.118
 Yose+0.116
Neg logit contributions
otrop-0.108
acia-0.099
udos-0.099
usting-0.098
 safety-0.095
Token
Token ID198
Feature activation-2.09e+00

Pos logit contributions
 Lennon+1.455
���+1.380
 lapt+1.365
 Gohan+1.360
FML+1.359
Neg logit contributions
otrop-1.297
acia-1.191
udos-1.185
usting-1.180
 safety-1.126
TokenRated
Token ID15322
Feature activation-4.06e+00

Pos logit contributions
 Lennon+1.205
���+1.173
 lapt+1.155
FML+1.134
 Gohan+1.109
Neg logit contributions
otrop-1.636
acia-1.531
usting-1.505
udos-1.504
 safety-1.470
Token 5
Token ID642
Feature activation-2.58e+00

Pos logit contributions
���+2.806
 Lennon+2.735
 lapt+2.679
 Gohan+2.545
 Yose+2.518
Neg logit contributions
otrop-2.824
acia-2.596
udos-2.577
usting-2.531
oras-2.410
Token out
Token ID503
Feature activation-1.39e+00

Pos logit contributions
 Lennon+1.955
���+1.918
 lapt+1.877
 Gohan+1.840
FML+1.801
Neg logit contributions
otrop-1.592
acia-1.452
udos-1.438
usting-1.430
 safety-1.349
Token of
Token ID286
Feature activation-2.92e+00

Pos logit contributions
���+1.024
 Lennon+1.009
 lapt+0.982
 Gohan+0.944
FML+0.935
Neg logit contributions
otrop-0.902
acia-0.827
udos-0.826
usting-0.822
 safety-0.792
Token 5
Token ID642
Feature activation-2.19e+00

Pos logit contributions
 Lennon+2.004
���+1.966
 lapt+1.905
 Gohan+1.853
FML+1.832
Neg logit contributions
otrop-2.014
acia-1.869
udos-1.850
usting-1.839
 safety-1.751
Token by
Token ID416
Feature activation-1.13e+00

Pos logit contributions
 Lennon+1.645
���+1.582
 Gohan+1.547
 lapt+1.546
FML+1.514
Neg logit contributions
otrop-1.365
acia-1.243
udos-1.230
usting-1.224
 safety-1.162
Token -
Token ID532
Feature activation-7.36e-01

Pos logit contributions
���+0.778
 Lennon+0.743
 lapt+0.739
 Gohan+0.696
 Yose+0.686
Neg logit contributions
otrop-0.685
acia-0.629
udos-0.627
usting-0.622
 safety-0.603
Token December
Token ID3426
Feature activation-2.08e+00

Pos logit contributions
���+0.591
 Lennon+0.563
 lapt+0.561
 Gohan+0.530
 Yose+0.524
Neg logit contributions
otrop-0.444
udos-0.405
acia-0.404
usting-0.401
 safety-0.389
Token 13
Token ID1511
Feature activation-1.59e+00

Pos logit contributions
���+1.509
 Lennon+1.502
 lapt+1.455
 Gohan+1.405
 Yose+1.386
Neg logit contributions
otrop-1.366
acia-1.251
udos-1.249
usting-1.241
 safety-1.192
Token,
Token ID11
Feature activation-1.20e+00

Pos logit contributions
���+1.175
 Lennon+1.120
 lapt+1.114
 Gohan+1.050
 Yose+1.035
Neg logit contributions
otrop-1.062
udos-0.976
acia-0.975
usting-0.968
 safety-0.940
Token 2010
Token ID3050
Feature activation-2.22e+00

Pos logit contributions
���+0.905
 Lennon+0.863
 lapt+0.858
 Gohan+0.809
 Yose+0.799
Neg logit contributions
otrop-0.774
udos-0.710
acia-0.709
usting-0.704
 safety-0.683
Token
Token ID198
Feature activation-3.77e+00

Pos logit contributions
 Lennon+1.644
���+1.596
 lapt+1.559
 Gohan+1.539
FML+1.523
Neg logit contributions
otrop-1.413
acia-1.300
usting-1.284
udos-1.278
 safety-1.223
Token
Token ID198
Feature activation-3.34e+00

Pos logit contributions
 Lennon+2.662
 Gohan+2.474
FML+2.472
 lapt+2.450
���+2.443
Neg logit contributions
otrop-2.514
acia-2.306
udos-2.290
usting-2.280
 safety-2.165
TokenSubject
Token ID19776
Feature activation-2.95e+00

Pos logit contributions
���+1.954
 lapt+1.792
 Lennon+1.784
 Gohan+1.648
 Yose+1.629
Neg logit contributions
otrop-2.778
udos-2.636
usting-2.609
acia-2.603
 safety-2.538
Token:
Token ID25
Feature activation-2.78e+00

Pos logit contributions
 Lennon+2.382
���+2.344
 lapt+2.293
 Gohan+2.240
FML+2.228
Neg logit contributions
otrop-1.675
acia-1.512
udos-1.487
usting-1.482
 safety-1.418
Token boys
Token ID6510
Feature activation-6.71e-03

Pos logit contributions
���+2.061
 Lennon+2.003
 lapt+1.973
 Gohan+1.869
 Yose+1.845
Neg logit contributions
otrop-1.824
acia-1.680
udos-1.659
usting-1.654
 safety-1.580
Token beware
Token ID40600
Feature activation+3.46e-01

Pos logit contributions
���+0.005
 Lennon+0.005
 lapt+0.005
 Gohan+0.005
 Yose+0.005
Neg logit contributions
otrop-0.004
udos-0.004
acia-0.004
usting-0.004
 safety-0.004
TokenFlag
Token ID34227
Feature activation-2.11e+00

Pos logit contributions
���+1.055
 Lennon+1.054
 lapt+1.029
FML+1.000
 Gohan+0.991
Neg logit contributions
otrop-0.878
acia-0.799
udos-0.792
usting-0.790
 safety-0.761
Token Reply
Token ID14883
Feature activation-8.08e-01

Pos logit contributions
���+1.502
 Lennon+1.502
 lapt+1.456
 Gohan+1.406
 Yose+1.389
Neg logit contributions
otrop-1.404
udos-1.279
acia-1.278
usting-1.274
 safety-1.211
Token
Token ID198
Feature activation-8.23e-01

Pos logit contributions
���+0.621
 Lennon+0.614
 lapt+0.599
 Gohan+0.578
FML+0.571
Neg logit contributions
otrop-0.493
udos-0.449
acia-0.449
usting-0.447
 safety-0.429
Token
Token ID198
Feature activation-6.40e-01

Pos logit contributions
���+0.637
 Lennon+0.627
 lapt+0.614
 Gohan+0.590
FML+0.584
Neg logit contributions
otrop-0.501
udos-0.457
acia-0.456
usting-0.453
 safety-0.435
Token27
Token ID1983
Feature activation-1.10e+00

Pos logit contributions
���+0.407
 Lennon+0.395
 lapt+0.386
 Gohan+0.366
FML+0.363
Neg logit contributions
otrop-0.484
acia-0.448
udos-0.447
usting-0.445
 safety-0.433
Token Recommend
Token ID30361
Feature activation-3.63e+00

Pos logit contributions
���+0.838
 Lennon+0.817
 lapt+0.803
 Gohan+0.766
 Yose+0.758
Neg logit contributions
otrop-0.695
acia-0.633
udos-0.631
usting-0.629
 safety-0.606
Token
Token ID198
Feature activation-1.10e+00

Pos logit contributions
 Lennon+2.599
���+2.440
 lapt+2.424
 Gohan+2.420
FML+2.390
Neg logit contributions
otrop-2.381
acia-2.166
usting-2.159
udos-2.158
 safety-2.047
Token
Token ID198
Feature activation-1.05e+00

Pos logit contributions
���+0.840
 Lennon+0.830
 lapt+0.811
 Gohan+0.781
FML+0.773
Neg logit contributions
otrop-0.671
udos-0.612
acia-0.611
usting-0.607
 safety-0.582
TokenRecommend
Token ID41248
Feature activation-3.20e+00

Pos logit contributions
���+0.646
 Lennon+0.621
 lapt+0.612
 Gohan+0.573
FML+0.570
Neg logit contributions
otrop-0.821
acia-0.760
udos-0.756
usting-0.755
 safety-0.735
Token Share
Token ID8734
Feature activation-1.58e+00

Pos logit contributions
���+2.340
 Lennon+2.307
 lapt+2.238
 Gohan+2.156
 Yose+2.131
Neg logit contributions
otrop-2.088
acia-1.895
udos-1.891
usting-1.885
 safety-1.810
Token this
Token ID428
Feature activation+1.22e-01

Pos logit contributions
���+1.217
 Lennon+1.176
 lapt+1.162
 Gohan+1.105
 Yose+1.090
Neg logit contributions
otrop-0.993
udos-0.908
acia-0.908
usting-0.902
 safety-0.867
TokenFlag
Token ID34227
Feature activation-1.67e+00

Pos logit contributions
���+0.378
 Lennon+0.366
 lapt+0.362
 Gohan+0.345
FML+0.342
Neg logit contributions
otrop-0.285
acia-0.258
udos-0.258
usting-0.256
 safety-0.247
Token Reply
Token ID14883
Feature activation-9.76e-01

Pos logit contributions
���+1.222
 Lennon+1.199
 lapt+1.175
 Gohan+1.123
 Yose+1.111
Neg logit contributions
otrop-1.109
udos-1.012
acia-1.009
usting-1.003
 safety-0.957
Token
Token ID198
Feature activation-6.01e-02

Pos logit contributions
���+0.771
 Lennon+0.750
 lapt+0.738
 Gohan+0.706
 Yose+0.697
Neg logit contributions
otrop-0.588
udos-0.535
acia-0.534
usting-0.531
 safety-0.511
Token
Token ID198
Feature activation-3.25e-03

Pos logit contributions
���+0.048
 Lennon+0.047
 lapt+0.046
 Gohan+0.044
FML+0.043
Neg logit contributions
otrop-0.036
udos-0.033
acia-0.032
usting-0.032
 safety-0.031
Token55
Token ID2816
Feature activation+2.87e-01

Pos logit contributions
���+0.002
 Lennon+0.002
 lapt+0.002
 Gohan+0.002
FML+0.002
Neg logit contributions
otrop-0.002
udos-0.002
acia-0.002
usting-0.002
 safety-0.002
Token Recommend
Token ID30361
Feature activation-3.58e+00

Pos logit contributions
otrop+0.176
udos+0.160
acia+0.160
usting+0.159
 safety+0.154
Neg logit contributions
���-0.228
 Lennon-0.217
 lapt-0.217
 Gohan-0.205
 Yose-0.202
Token
Token ID198
Feature activation-6.88e-01

Pos logit contributions
 Lennon+2.547
���+2.371
 Gohan+2.370
 lapt+2.358
FML+2.336
Neg logit contributions
otrop-2.365
acia-2.149
usting-2.143
udos-2.143
 safety-2.033
Token
Token ID198
Feature activation-7.11e-01

Pos logit contributions
���+0.533
 Lennon+0.524
 lapt+0.513
 Gohan+0.493
FML+0.488
Neg logit contributions
otrop-0.419
udos-0.381
acia-0.381
usting-0.378
 safety-0.364
TokenRecommend
Token ID41248
Feature activation-3.07e+00

Pos logit contributions
���+0.448
 Lennon+0.426
 lapt+0.422
 Gohan+0.394
FML+0.389
Neg logit contributions
otrop-0.548
acia-0.508
udos-0.507
usting-0.504
 safety-0.492
Token Share
Token ID8734
Feature activation-1.52e+00

Pos logit contributions
���+2.249
 Lennon+2.205
 lapt+2.137
 Gohan+2.066
 Yose+2.033
Neg logit contributions
otrop-2.014
acia-1.823
udos-1.819
usting-1.814
 safety-1.747
Token this
Token ID428
Feature activation+3.53e-01

Pos logit contributions
���+1.162
 Lennon+1.123
 lapt+1.110
 Gohan+1.055
 Yose+1.040
Neg logit contributions
otrop-0.954
udos-0.873
acia-0.873
usting-0.866
 safety-0.832
Token body
Token ID1767
Feature activation-9.39e-01

Pos logit contributions
���+0.333
 Lennon+0.318
 lapt+0.316
 Gohan+0.297
 Yose+0.293
Neg logit contributions
otrop-0.302
udos-0.278
acia-0.277
usting-0.276
 safety-0.267
Token.
Token ID13
Feature activation+4.05e-01

Pos logit contributions
���+0.769
 Lennon+0.754
 lapt+0.743
 Gohan+0.707
 Yose+0.697
Neg logit contributions
otrop-0.534
udos-0.492
acia-0.484
usting-0.479
 safety-0.457
Token I
Token ID314
Feature activation-9.85e-02

Pos logit contributions
otrop+0.267
udos+0.245
acia+0.245
usting+0.244
 safety+0.236
Neg logit contributions
���-0.299
 Lennon-0.289
 lapt-0.284
 Gohan-0.270
 Yose-0.267
Token feel
Token ID1254
Feature activation-4.79e-01

Pos logit contributions
���+0.081
 Lennon+0.078
 lapt+0.078
 Gohan+0.074
FML+0.073
Neg logit contributions
otrop-0.057
udos-0.051
acia-0.051
usting-0.051
 safety-0.049
Token as
Token ID355
Feature activation+9.55e-02

Pos logit contributions
���+0.384
 Lennon+0.373
 lapt+0.369
 Gohan+0.351
 Yose+0.346
Neg logit contributions
otrop-0.286
acia-0.261
udos-0.260
usting-0.257
 safety-0.244
Token though
Token ID996
Feature activation-3.53e+00

Pos logit contributions
otrop+0.053
udos+0.048
acia+0.048
usting+0.047
 safety+0.045
Neg logit contributions
���-0.081
 Lennon-0.078
 lapt-0.077
 Gohan-0.074
 Yose-0.073
Token Eve
Token ID12882
Feature activation+2.26e-01

Pos logit contributions
 Lennon+2.749
���+2.698
 lapt+2.666
 Gohan+2.556
 Yose+2.543
Neg logit contributions
otrop-2.128
udos-1.945
acia-1.921
usting-1.919
oras-1.786
Token was
Token ID373
Feature activation-3.52e-01

Pos logit contributions
otrop+0.141
udos+0.129
acia+0.129
usting+0.128
 safety+0.123
Neg logit contributions
���-0.176
 Lennon-0.168
 lapt-0.167
 Gohan-0.158
 Yose-0.156
Token like
Token ID588
Feature activation+3.11e-01

Pos logit contributions
���+0.277
 Lennon+0.265
 lapt+0.263
 Gohan+0.249
 Yose+0.246
Neg logit contributions
otrop-0.217
udos-0.198
acia-0.197
usting-0.196
 safety-0.189
Token,
Token ID11
Feature activation+7.93e-01

Pos logit contributions
otrop+0.194
udos+0.177
acia+0.177
usting+0.175
 safety+0.169
Neg logit contributions
���-0.243
 Lennon-0.233
 lapt-0.231
 Gohan-0.219
 Yose-0.216
Token
Token ID564
Feature activation+1.89e-01

Pos logit contributions
otrop+0.418
acia+0.375
udos+0.375
usting+0.371
 safety+0.355
Neg logit contributions
���-0.695
 Lennon-0.669
 lapt-0.665
 Gohan-0.634
 Yose-0.627
TokenYou
Token ID1639
Feature activation+9.99e-01

Pos logit contributions
otrop+0.415
udos+0.383
acia+0.383
usting+0.380
 safety+0.369
Neg logit contributions
���-0.431
 Lennon-0.412
 lapt-0.409
 Gohan-0.386
FML-0.380
Token don
Token ID836
Feature activation+7.08e-01

Pos logit contributions
otrop+0.592
udos+0.538
acia+0.537
usting+0.533
 safety+0.512
Neg logit contributions
���-0.808
 Lennon-0.777
 lapt-0.773
 Gohan-0.733
 Yose-0.724
Token't
Token ID470
Feature activation-1.64e-01

Pos logit contributions
otrop+0.499
udos+0.461
acia+0.461
usting+0.458
 safety+0.445
Neg logit contributions
���-0.489
 Lennon-0.471
 lapt-0.465
 Gohan-0.439
FML-0.433
Token feel
Token ID1254
Feature activation+1.59e+00

Pos logit contributions
���+0.137
 Lennon+0.133
 lapt+0.132
 Gohan+0.126
 Yose+0.124
Neg logit contributions
otrop-0.091
acia-0.083
udos-0.082
usting-0.082
 safety-0.077
Token as
Token ID355
Feature activation+1.33e-01

Pos logit contributions
otrop+0.787
udos+0.702
acia+0.700
usting+0.693
 safety+0.664
Neg logit contributions
���-1.451
 Lennon-1.394
 lapt-1.390
 Gohan-1.324
 Yose-1.310
Token though
Token ID996
Feature activation-3.49e+00

Pos logit contributions
otrop+0.072
acia+0.065
udos+0.065
usting+0.064
 safety+0.061
Neg logit contributions
���-0.114
 Lennon-0.111
 lapt-0.110
 Gohan-0.105
 Yose-0.103
Token you
Token ID345
Feature activation+5.94e-01

Pos logit contributions
 Lennon+2.606
 lapt+2.591
���+2.581
 Gohan+2.494
 Yose+2.425
Neg logit contributions
otrop-2.146
acia-1.996
udos-1.983
usting-1.967
onyms-1.813
Token want
Token ID765
Feature activation+7.89e-02

Pos logit contributions
otrop+0.340
acia+0.307
udos+0.307
usting+0.305
 safety+0.290
Neg logit contributions
���-0.491
 Lennon-0.474
 lapt-0.471
 Gohan-0.448
 Yose-0.443
Token to
Token ID284
Feature activation-4.46e-01

Pos logit contributions
otrop+0.047
usting+0.043
udos+0.043
acia+0.043
 safety+0.040
Neg logit contributions
���-0.063
 Lennon-0.061
 lapt-0.061
 Gohan-0.058
 Yose-0.057
Token go
Token ID467
Feature activation+3.19e-01

Pos logit contributions
���+0.339
 Lennon+0.326
 lapt+0.325
 Gohan+0.307
 Yose+0.302
Neg logit contributions
otrop-0.284
udos-0.260
acia-0.260
usting-0.259
 safety-0.248
Token down
Token ID866
Feature activation+8.13e-01

Pos logit contributions
otrop+0.188
udos+0.172
acia+0.171
usting+0.170
 safety+0.162
Neg logit contributions
���-0.258
 Lennon-0.249
 lapt-0.248
 Gohan-0.235
 Yose-0.231
Token the
Token ID262
Feature activation+3.78e-01

Pos logit contributions
otrop+0.106
udos+0.095
acia+0.095
usting+0.094
 safety+0.090
Neg logit contributions
���-0.168
 Lennon-0.161
 lapt-0.161
 Gohan-0.153
 Yose-0.151
Token Kodi
Token ID44129
Feature activation-7.04e-01

Pos logit contributions
otrop+0.262
udos+0.242
acia+0.241
usting+0.240
 safety+0.232
Neg logit contributions
���-0.268
 Lennon-0.255
 lapt-0.254
 Gohan-0.238
 Yose-0.235
Token home
Token ID1363
Feature activation-8.44e-01

Pos logit contributions
���+0.460
 Lennon+0.436
 lapt+0.434
 Gohan+0.405
 Yose+0.399
Neg logit contributions
otrop-0.528
udos-0.491
acia-0.490
usting-0.486
 safety-0.472
Token screen
Token ID3159
Feature activation-6.72e-01

Pos logit contributions
���+0.623
 Lennon+0.597
 lapt+0.596
 Gohan+0.563
 Yose+0.553
Neg logit contributions
otrop-0.559
udos-0.513
acia-0.510
usting-0.507
 safety-0.486
Token,
Token ID11
Feature activation-4.95e-01

Pos logit contributions
���+0.503
 Lennon+0.479
 lapt+0.477
 Gohan+0.449
 Yose+0.443
Neg logit contributions
otrop-0.441
udos-0.406
acia-0.404
usting-0.402
 safety-0.389
Token select
Token ID2922
Feature activation-3.44e+00

Pos logit contributions
���+0.405
 Lennon+0.388
 lapt+0.386
 Gohan+0.366
 Yose+0.361
Neg logit contributions
otrop-0.290
udos-0.264
acia-0.263
usting-0.261
 safety-0.251
Token Add
Token ID3060
Feature activation+1.20e-01

Pos logit contributions
���+2.568
 Lennon+2.506
 lapt+2.481
 Gohan+2.350
 Yose+2.311
Neg logit contributions
otrop-2.182
udos-2.020
acia-2.001
usting-1.969
 safety-1.863
Tokenons
Token ID684
Feature activation+1.40e+00

Pos logit contributions
otrop+0.069
udos+0.063
acia+0.062
usting+0.062
 safety+0.059
Neg logit contributions
���-0.100
 Lennon-0.096
 lapt-0.095
 Gohan-0.091
 Yose-0.090
Token ->
Token ID4613
Feature activation+1.48e+00

Pos logit contributions
otrop+0.812
udos+0.738
acia+0.737
usting+0.732
 safety+0.709
Neg logit contributions
���-1.150
 Lennon-1.098
 lapt-1.094
 Gohan-1.035
 Yose-1.023
Token My
Token ID2011
Feature activation+4.58e-01

Pos logit contributions
otrop+0.994
udos+0.914
acia+0.912
usting+0.909
 safety+0.885
Neg logit contributions
���-1.090
 Lennon-1.031
 lapt-1.031
 Gohan-0.967
 Yose-0.953
Token Add
Token ID3060
Feature activation+3.05e-01

Pos logit contributions
otrop+0.328
udos+0.303
acia+0.302
usting+0.301
 safety+0.293
Neg logit contributions
���-0.315
 Lennon-0.299
 lapt-0.297
 Gohan-0.278
 Yose-0.274
TokenON
Token ID1340
Feature activation+9.45e-02

Pos logit contributions
otrop+0.704
udos+0.650
acia+0.649
usting+0.644
 safety+0.627
Neg logit contributions
���-0.722
 Lennon-0.685
 lapt-0.682
 Gohan-0.640
 Yose-0.631
TokenOM
Token ID2662
Feature activation+3.42e-01

Pos logit contributions
otrop+0.059
acia+0.054
udos+0.054
usting+0.054
 safety+0.052
Neg logit contributions
���-0.073
 Lennon-0.070
 lapt-0.070
 Gohan-0.066
FML-0.065
TokenICS
Token ID19505
Feature activation-1.16e+00

Pos logit contributions
otrop+0.201
udos+0.183
acia+0.182
usting+0.181
 safety+0.175
Neg logit contributions
���-0.279
 Lennon-0.267
 lapt-0.266
 Gohan-0.252
FML-0.249
Token AND
Token ID5357
Feature activation+6.91e-01

Pos logit contributions
���+0.835
 Lennon+0.815
 lapt+0.802
 Gohan+0.763
FML+0.751
Neg logit contributions
otrop-0.776
acia-0.714
udos-0.713
usting-0.710
 safety-0.676
Token TR
Token ID7579
Feature activation-2.30e-01

Pos logit contributions
otrop+0.481
udos+0.444
acia+0.444
usting+0.441
 safety+0.427
Neg logit contributions
���-0.488
 Lennon-0.465
 lapt-0.462
 Gohan-0.435
 Yose-0.428
TokenADE
Token ID19266
Feature activation-3.43e+00

Pos logit contributions
���+0.128
 Lennon+0.121
 lapt+0.119
 Gohan+0.111
FML+0.109
Neg logit contributions
otrop-0.194
acia-0.181
udos-0.181
usting-0.180
 safety-0.176
Token
Token ID198
Feature activation-1.43e+00

Pos logit contributions
 Lennon+2.443
���+2.393
 lapt+2.325
 Gohan+2.273
FML+2.236
Neg logit contributions
otrop-2.270
acia-2.089
usting-2.068
udos-2.060
onyms-1.987
Token
Token ID198
Feature activation-1.36e+00

Pos logit contributions
 Lennon+1.062
���+1.043
 lapt+1.017
 Gohan+0.996
FML+0.989
Neg logit contributions
otrop-0.901
udos-0.824
acia-0.824
usting-0.819
 safety-0.782
TokenThe
Token ID464
Feature activation-3.94e-01

Pos logit contributions
���+0.906
 Lennon+0.887
 lapt+0.874
FML+0.833
 Gohan+0.827
Neg logit contributions
otrop-0.973
acia-0.897
udos-0.897
usting-0.892
 safety-0.868
Token Wednesday
Token ID3583
Feature activation-7.93e-01

Pos logit contributions
���+0.305
 Lennon+0.293
 lapt+0.291
 Gohan+0.275
 Yose+0.272
Neg logit contributions
otrop-0.247
udos-0.227
acia-0.224
usting-0.223
 safety-0.214
Token-
Token ID12
Feature activation-1.16e+00

Pos logit contributions
���+0.512
 Lennon+0.489
 lapt+0.487
 Gohan+0.453
FML+0.446
Neg logit contributions
otrop-0.597
udos-0.555
acia-0.553
usting-0.549
 safety-0.531
Token -
Token ID532
Feature activation-8.33e-01

Pos logit contributions
���+0.920
 Lennon+0.879
 lapt+0.874
 Gohan+0.823
 Yose+0.810
Neg logit contributions
otrop-0.840
acia-0.774
udos-0.769
usting-0.764
 safety-0.741
Token July
Token ID2901
Feature activation-1.29e+00

Pos logit contributions
���+0.676
 Lennon+0.643
 lapt+0.643
 Gohan+0.606
 Yose+0.599
Neg logit contributions
otrop-0.496
udos-0.451
acia-0.450
usting-0.447
 safety-0.433
Token 6
Token ID718
Feature activation-2.03e+00

Pos logit contributions
���+0.977
 Lennon+0.949
 lapt+0.934
 Gohan+0.891
 Yose+0.879
Neg logit contributions
otrop-0.827
udos-0.757
acia-0.756
usting-0.751
 safety-0.725
Token,
Token ID11
Feature activation-1.20e+00

Pos logit contributions
 Lennon+1.362
���+1.341
 lapt+1.305
 Gohan+1.267
FML+1.259
Neg logit contributions
otrop-1.427
acia-1.317
usting-1.306
udos-1.306
 safety-1.254
Token 2010
Token ID3050
Feature activation-2.89e+00

Pos logit contributions
���+0.899
 Lennon+0.854
 lapt+0.840
 Gohan+0.794
 Yose+0.785
Neg logit contributions
otrop-0.786
acia-0.716
usting-0.713
udos-0.712
 safety-0.684
Token
Token ID198
Feature activation-3.39e+00

Pos logit contributions
 Lennon+2.103
���+2.007
 lapt+1.971
 Gohan+1.962
FML+1.955
Neg logit contributions
otrop-1.871
acia-1.718
usting-1.703
udos-1.696
 safety-1.618
Token
Token ID198
Feature activation-2.93e+00

Pos logit contributions
 Lennon+2.411
 Gohan+2.241
FML+2.241
 lapt+2.229
���+2.228
Neg logit contributions
otrop-2.239
acia-2.057
udos-2.046
usting-2.041
 safety-1.936
TokenSubject
Token ID19776
Feature activation-3.08e+00

Pos logit contributions
���+1.583
 Lennon+1.481
 lapt+1.471
 Gohan+1.351
 Yose+1.325
Neg logit contributions
otrop-2.533
udos-2.375
acia-2.372
usting-2.360
 safety-2.308
Token:
Token ID25
Feature activation-2.78e+00

Pos logit contributions
 Lennon+2.421
���+2.311
 lapt+2.283
FML+2.274
 Gohan+2.271
Neg logit contributions
otrop-1.803
acia-1.628
udos-1.604
usting-1.602
 safety-1.525
Token R
Token ID371
Feature activation-9.06e-01

Pos logit contributions
���+2.063
 Lennon+2.006
 lapt+1.975
 Gohan+1.873
 Yose+1.849
Neg logit contributions
otrop-1.807
acia-1.665
udos-1.644
usting-1.641
 safety-1.560
Tokeniff
Token ID733
Feature activation+1.07e+00

Pos logit contributions
���+0.703
 Lennon+0.676
 lapt+0.670
 Gohan+0.635
 Yose+0.628
Neg logit contributions
otrop-0.564
acia-0.513
udos-0.512
usting-0.508
 safety-0.490
Token December
Token ID3426
Feature activation-2.08e+00

Pos logit contributions
���+0.591
 Lennon+0.563
 lapt+0.561
 Gohan+0.530
 Yose+0.524
Neg logit contributions
otrop-0.444
udos-0.405
acia-0.404
usting-0.401
 safety-0.389
Token 13
Token ID1511
Feature activation-1.59e+00

Pos logit contributions
���+1.509
 Lennon+1.502
 lapt+1.455
 Gohan+1.405
 Yose+1.386
Neg logit contributions
otrop-1.366
acia-1.251
udos-1.249
usting-1.241
 safety-1.192
Token,
Token ID11
Feature activation-1.20e+00

Pos logit contributions
���+1.175
 Lennon+1.120
 lapt+1.114
 Gohan+1.050
 Yose+1.035
Neg logit contributions
otrop-1.062
udos-0.976
acia-0.975
usting-0.968
 safety-0.940
Token 2010
Token ID3050
Feature activation-2.22e+00

Pos logit contributions
���+0.905
 Lennon+0.863
 lapt+0.858
 Gohan+0.809
 Yose+0.799
Neg logit contributions
otrop-0.774
udos-0.710
acia-0.709
usting-0.704
 safety-0.683
Token
Token ID198
Feature activation-3.77e+00

Pos logit contributions
 Lennon+1.644
���+1.596
 lapt+1.559
 Gohan+1.539
FML+1.523
Neg logit contributions
otrop-1.413
acia-1.300
usting-1.284
udos-1.278
 safety-1.223
Token
Token ID198
Feature activation-3.34e+00

Pos logit contributions
 Lennon+2.662
 Gohan+2.474
FML+2.472
 lapt+2.450
���+2.443
Neg logit contributions
otrop-2.514
acia-2.306
udos-2.290
usting-2.280
 safety-2.165
TokenSubject
Token ID19776
Feature activation-2.95e+00

Pos logit contributions
���+1.954
 lapt+1.792
 Lennon+1.784
 Gohan+1.648
 Yose+1.629
Neg logit contributions
otrop-2.778
udos-2.636
usting-2.609
acia-2.603
 safety-2.538
Token:
Token ID25
Feature activation-2.78e+00

Pos logit contributions
 Lennon+2.382
���+2.344
 lapt+2.293
 Gohan+2.240
FML+2.228
Neg logit contributions
otrop-1.675
acia-1.512
udos-1.487
usting-1.482
 safety-1.418
Token boys
Token ID6510
Feature activation-6.71e-03

Pos logit contributions
���+2.061
 Lennon+2.003
 lapt+1.973
 Gohan+1.869
 Yose+1.845
Neg logit contributions
otrop-1.824
acia-1.680
udos-1.659
usting-1.654
 safety-1.580
Token beware
Token ID40600
Feature activation+3.46e-01

Pos logit contributions
���+0.005
 Lennon+0.005
 lapt+0.005
 Gohan+0.005
 Yose+0.005
Neg logit contributions
otrop-0.004
udos-0.004
acia-0.004
usting-0.004
 safety-0.004
Token This
Token ID770
Feature activation-5.51e-01

Pos logit contributions
otrop+0.212
udos+0.195
acia+0.193
usting+0.192
 safety+0.188
Neg logit contributions
���-0.274
 Lennon-0.259
 lapt-0.259
 Gohan-0.244
FML-0.242
Token the
Token ID262
Feature activation+3.35e-01

Pos logit contributions
otrop+0.121
udos+0.111
acia+0.111
usting+0.110
 safety+0.107
Neg logit contributions
���-0.131
 Lennon-0.125
 lapt-0.124
 Gohan-0.117
 Yose-0.115
Token Kodi
Token ID44129
Feature activation-9.53e-01

Pos logit contributions
otrop+0.232
udos+0.214
acia+0.214
usting+0.213
 safety+0.208
Neg logit contributions
���-0.239
 Lennon-0.226
 lapt-0.225
 Gohan-0.211
FML-0.209
Token homes
Token ID5682
Feature activation+1.05e-01

Pos logit contributions
���+0.825
 Lennon+0.796
 lapt+0.794
 Gohan+0.755
 Yose+0.746
Neg logit contributions
otrop-0.511
udos-0.460
acia-0.460
usting-0.454
 safety-0.431
Tokencreen
Token ID32060
Feature activation-6.20e-01

Pos logit contributions
otrop+0.088
udos+0.083
acia+0.083
usting+0.082
 safety+0.080
Neg logit contributions
���-0.058
 Lennon-0.055
 lapt-0.054
 Gohan-0.050
 Yose-0.049
Token,
Token ID11
Feature activation-2.51e-01

Pos logit contributions
���+0.491
 Lennon+0.474
 lapt+0.472
 Gohan+0.447
 Yose+0.441
Neg logit contributions
otrop-0.377
udos-0.343
acia-0.341
usting-0.338
 safety-0.323
Token select
Token ID2922
Feature activation-3.33e+00

Pos logit contributions
���+0.207
 Lennon+0.199
 lapt+0.198
 Gohan+0.187
 Yose+0.185
Neg logit contributions
otrop-0.146
udos-0.133
acia-0.132
usting-0.131
 safety-0.126
Token
Token ID564
Feature activation+8.56e-01

Pos logit contributions
���+2.403
 Lennon+2.351
 lapt+2.324
 Gohan+2.202
 Yose+2.169
Neg logit contributions
otrop-2.202
udos-2.041
acia-2.026
usting-1.989
 safety-1.886
Token
Token ID246
Feature activation+2.56e-01

Pos logit contributions
otrop+0.859
acia+0.817
udos+0.814
usting+0.813
 safety+0.799
Neg logit contributions
���-0.345
 lapt-0.315
 Lennon-0.310
 Gohan-0.272
 Yose-0.263
TokenAdd
Token ID4550
Feature activation-5.16e-01

Pos logit contributions
otrop+0.179
udos+0.165
acia+0.165
usting+0.164
 safety+0.159
Neg logit contributions
���-0.181
 Lennon-0.172
 lapt-0.171
 Gohan-0.160
 Yose-0.158
Tokenons
Token ID684
Feature activation+1.08e+00

Pos logit contributions
���+0.411
 Lennon+0.394
 lapt+0.391
 Gohan+0.372
 Yose+0.368
Neg logit contributions
otrop-0.312
udos-0.285
acia-0.284
usting-0.281
 safety-0.271
Token
Token ID447
Feature activation+9.18e-02

Pos logit contributions
otrop+0.783
acia+0.727
udos+0.726
usting+0.723
 safety+0.707
Neg logit contributions
���-0.736
 Lennon-0.692
 lapt-0.690
 Gohan-0.644
 Yose-0.635
Token but
Token ID475
Feature activation+2.72e-01

Pos logit contributions
otrop+0.061
udos+0.055
acia+0.055
usting+0.055
 safety+0.053
Neg logit contributions
���-0.086
 Lennon-0.082
 lapt-0.082
 Gohan-0.077
 Yose-0.076
Token it
Token ID340
Feature activation-5.87e-01

Pos logit contributions
otrop+0.168
udos+0.154
acia+0.153
usting+0.152
 safety+0.149
Neg logit contributions
���-0.214
 lapt-0.203
 Lennon-0.203
 Gohan-0.191
FML-0.189
Token is
Token ID318
Feature activation-4.94e-01

Pos logit contributions
���+0.454
 Lennon+0.432
 lapt+0.431
 Gohan+0.406
 Yose+0.401
Neg logit contributions
otrop-0.372
udos-0.341
acia-0.340
usting-0.338
 safety-0.328
Token not
Token ID407
Feature activation+7.65e-02

Pos logit contributions
���+0.392
 Lennon+0.374
 lapt+0.373
 Gohan+0.352
 Yose+0.348
Neg logit contributions
otrop-0.302
udos-0.275
acia-0.275
usting-0.272
 safety-0.265
Token as
Token ID355
Feature activation-4.34e-01

Pos logit contributions
otrop+0.046
udos+0.042
acia+0.042
usting+0.042
 safety+0.040
Neg logit contributions
���-0.061
 Lennon-0.058
 lapt-0.058
 Gohan-0.055
 Yose-0.054
Token though
Token ID996
Feature activation-3.33e+00

Pos logit contributions
���+0.365
 Lennon+0.349
 lapt+0.348
 Gohan+0.330
 Yose+0.326
Neg logit contributions
otrop-0.245
udos-0.222
acia-0.221
usting-0.219
 safety-0.212
Token he
Token ID339
Feature activation-4.42e-01

Pos logit contributions
���+2.472
 Lennon+2.405
 lapt+2.361
 Gohan+2.247
 Yose+2.215
Neg logit contributions
otrop-2.155
udos-1.998
acia-1.990
usting-1.989
 safety-1.860
Token himself
Token ID2241
Feature activation-7.70e-02

Pos logit contributions
���+0.336
 Lennon+0.320
 lapt+0.319
 Gohan+0.301
 Yose+0.297
Neg logit contributions
otrop-0.285
udos-0.261
acia-0.261
usting-0.259
 safety-0.251
Token is
Token ID318
Feature activation-2.16e-01

Pos logit contributions
���+0.059
 Lennon+0.056
 lapt+0.056
 Gohan+0.053
 Yose+0.052
Neg logit contributions
otrop-0.049
udos-0.045
acia-0.045
usting-0.045
 safety-0.043
Token immune
Token ID10900
Feature activation+6.95e-01

Pos logit contributions
���+0.167
 Lennon+0.159
 lapt+0.159
 Gohan+0.150
 Yose+0.148
Neg logit contributions
otrop-0.137
udos-0.125
acia-0.125
usting-0.124
 safety-0.120
Token from
Token ID422
Feature activation+2.64e-01

Pos logit contributions
otrop+0.394
udos+0.357
acia+0.356
usting+0.353
 safety+0.339
Neg logit contributions
���-0.579
 Lennon-0.558
 lapt-0.554
 Gohan-0.527
FML-0.521
Token to
Token ID284
Feature activation+3.26e-01

Pos logit contributions
���+0.403
 Lennon+0.385
 lapt+0.385
 Gohan+0.363
 Yose+0.359
Neg logit contributions
otrop-0.302
udos-0.274
acia-0.273
usting-0.271
 safety-0.258
Token enter
Token ID3802
Feature activation-1.57e+00

Pos logit contributions
otrop+0.211
udos+0.194
acia+0.194
usting+0.192
 safety+0.187
Neg logit contributions
���-0.247
 Lennon-0.236
 lapt-0.235
 Gohan-0.221
 Yose-0.218
Token the
Token ID262
Feature activation+7.94e-02

Pos logit contributions
���+1.280
 Lennon+1.247
 lapt+1.246
 Gohan+1.174
 Yose+1.161
Neg logit contributions
otrop-0.896
udos-0.823
acia-0.818
usting-0.811
 safety-0.757
Token app
Token ID598
Feature activation+6.21e-01

Pos logit contributions
otrop+0.054
udos+0.049
acia+0.049
usting+0.049
 safety+0.047
Neg logit contributions
���-0.058
 Lennon-0.055
 lapt-0.055
 Gohan-0.052
 Yose-0.051
Token,
Token ID11
Feature activation-2.19e-01

Pos logit contributions
otrop+0.380
udos+0.347
acia+0.346
usting+0.343
 safety+0.331
Neg logit contributions
���-0.490
 Lennon-0.470
 lapt-0.468
 Gohan-0.442
 Yose-0.437
Token tap
Token ID9814
Feature activation-3.31e+00

Pos logit contributions
���+0.178
 Lennon+0.170
 lapt+0.170
 Gohan+0.160
 Yose+0.159
Neg logit contributions
otrop-0.129
udos-0.117
acia-0.117
usting-0.116
 safety-0.112
Token a
Token ID257
Feature activation-2.98e-01

Pos logit contributions
���+2.568
 Lennon+2.483
 lapt+2.470
 Gohan+2.326
 Yose+2.290
Neg logit contributions
otrop-2.038
udos-1.844
acia-1.825
usting-1.815
 safety-1.710
Token virtual
Token ID7166
Feature activation-3.59e-01

Pos logit contributions
���+0.228
 Lennon+0.218
 lapt+0.218
 Gohan+0.205
 Yose+0.202
Neg logit contributions
otrop-0.189
udos-0.174
acia-0.172
usting-0.171
 safety-0.165
Token Android
Token ID5565
Feature activation+4.97e-01

Pos logit contributions
���+0.260
 Lennon+0.248
 lapt+0.247
 Gohan+0.232
 Yose+0.229
Neg logit contributions
otrop-0.244
udos-0.225
acia-0.224
usting-0.223
 safety-0.215
Token button
Token ID4936
Feature activation-4.86e-01

Pos logit contributions
otrop+0.355
udos+0.329
acia+0.329
usting+0.326
 safety+0.319
Neg logit contributions
���-0.343
 Lennon-0.324
 lapt-0.323
 Gohan-0.302
 Yose-0.298
Token,
Token ID11
Feature activation+2.69e-02

Pos logit contributions
���+0.385
 Lennon+0.370
 lapt+0.368
 Gohan+0.349
FML+0.344
Neg logit contributions
otrop-0.296
udos-0.270
acia-0.268
usting-0.267
 safety-0.257
Token the
Token ID262
Feature activation-7.76e-02

Pos logit contributions
���+0.059
 Lennon+0.057
 lapt+0.056
 Gohan+0.053
FML+0.053
Neg logit contributions
otrop-0.039
udos-0.035
acia-0.035
usting-0.035
 safety-0.033
Token Amazon
Token ID6186
Feature activation-6.39e-01

Pos logit contributions
���+0.060
 Lennon+0.057
 lapt+0.057
 Gohan+0.054
 Yose+0.053
Neg logit contributions
otrop-0.049
udos-0.045
acia-0.045
usting-0.044
 safety-0.043
Token search
Token ID2989
Feature activation-4.34e-01

Pos logit contributions
���+0.470
 Lennon+0.450
 lapt+0.449
 Gohan+0.422
 Yose+0.416
Neg logit contributions
otrop-0.424
udos-0.389
acia-0.389
usting-0.386
 safety-0.372
Token bar
Token ID2318
Feature activation+1.42e-01

Pos logit contributions
���+0.339
 Lennon+0.324
 lapt+0.323
 Gohan+0.306
 Yose+0.301
Neg logit contributions
otrop-0.270
udos-0.246
acia-0.246
usting-0.244
 safety-0.235
Token,
Token ID11
Feature activation+1.22e-02

Pos logit contributions
otrop+0.090
udos+0.082
acia+0.082
usting+0.082
 safety+0.079
Neg logit contributions
���-0.110
 Lennon-0.105
 lapt-0.104
 Gohan-0.098
 Yose-0.097
Token select
Token ID2922
Feature activation-3.26e+00

Pos logit contributions
otrop+0.007
udos+0.006
acia+0.006
usting+0.006
 safety+0.006
Neg logit contributions
���-0.010
 Lennon-0.010
 lapt-0.010
 Gohan-0.009
 Yose-0.009
Token "
Token ID366
Feature activation-3.32e-01

Pos logit contributions
���+2.621
 Lennon+2.592
 lapt+2.541
 Gohan+2.431
FML+2.389
Neg logit contributions
otrop-1.869
udos-1.683
acia-1.683
usting-1.658
 safety-1.575
TokenM
Token ID44
Feature activation+2.76e-01

Pos logit contributions
���+0.225
 Lennon+0.214
 lapt+0.213
 Gohan+0.199
FML+0.197
Neg logit contributions
otrop-0.240
udos-0.221
acia-0.221
usting-0.220
 safety-0.213
Tokenovies
Token ID20526
Feature activation+4.36e-01

Pos logit contributions
otrop+0.177
udos+0.162
acia+0.162
usting+0.161
 safety+0.156
Neg logit contributions
���-0.213
 Lennon-0.201
 lapt-0.201
 Gohan-0.189
 Yose-0.187
Token &
Token ID1222
Feature activation+1.83e-01

Pos logit contributions
otrop+0.269
udos+0.246
acia+0.245
usting+0.243
 safety+0.235
Neg logit contributions
���-0.342
 Lennon-0.327
 lapt-0.326
 Gohan-0.308
 Yose-0.304
Token TV
Token ID3195
Feature activation+3.91e-01

Pos logit contributions
otrop+0.129
acia+0.120
udos+0.120
usting+0.119
 safety+0.115
Neg logit contributions
���-0.127
 Lennon-0.121
 lapt-0.120
 Gohan-0.113
 Yose-0.111
Token in
Token ID287
Feature activation-4.60e-01

Pos logit contributions
���+0.294
 Lennon+0.283
 lapt+0.281
 Gohan+0.267
 Yose+0.264
Neg logit contributions
otrop-0.211
udos-0.192
acia-0.191
usting-0.189
 safety-0.182
Token terms
Token ID2846
Feature activation+6.92e-01

Pos logit contributions
���+0.372
 Lennon+0.357
 lapt+0.355
 Gohan+0.337
 Yose+0.333
Neg logit contributions
otrop-0.272
udos-0.248
acia-0.247
usting-0.246
 safety-0.235
Token of
Token ID286
Feature activation+3.12e-01

Pos logit contributions
otrop+0.434
udos+0.397
acia+0.396
usting+0.394
 safety+0.381
Neg logit contributions
���-0.537
 Lennon-0.513
 lapt-0.511
 Gohan-0.482
 Yose-0.476
Token gender
Token ID5279
Feature activation-5.42e-01

Pos logit contributions
otrop+0.206
udos+0.189
acia+0.189
usting+0.188
 safety+0.182
Neg logit contributions
���-0.233
 Lennon-0.222
 lapt-0.221
 Gohan-0.208
 Yose-0.205
Token,
Token ID11
Feature activation-6.50e-02

Pos logit contributions
���+0.411
 Lennon+0.394
 lapt+0.392
 Gohan+0.370
 Yose+0.364
Neg logit contributions
otrop-0.348
udos-0.319
acia-0.317
usting-0.315
 safety-0.303
Token race
Token ID3234
Feature activation-3.23e+00

Pos logit contributions
���+0.049
 Lennon+0.046
 lapt+0.046
 Gohan+0.043
 Yose+0.043
Neg logit contributions
otrop-0.043
udos-0.039
acia-0.039
usting-0.039
 safety-0.038
Token,
Token ID11
Feature activation-1.16e+00

Pos logit contributions
���+2.233
 Lennon+2.144
 lapt+2.123
 Gohan+1.995
FML+1.956
Neg logit contributions
otrop-2.256
udos-2.100
acia-2.071
usting-2.067
 mine-1.971
Token class
Token ID1398
Feature activation-9.43e-01

Pos logit contributions
���+0.818
 Lennon+0.787
 lapt+0.781
 Gohan+0.735
 Yose+0.724
Neg logit contributions
otrop-0.805
udos-0.743
acia-0.741
usting-0.737
 safety-0.703
Token,
Token ID11
Feature activation-1.45e+00

Pos logit contributions
���+0.718
 Lennon+0.688
 lapt+0.685
 Gohan+0.646
FML+0.637
Neg logit contributions
otrop-0.602
udos-0.552
acia-0.549
usting-0.547
 safety-0.524
Token etc
Token ID3503
Feature activation+9.05e-02

Pos logit contributions
���+1.035
 Lennon+0.996
 lapt+0.990
 Gohan+0.932
 Yose+0.917
Neg logit contributions
otrop-0.995
udos-0.916
acia-0.915
usting-0.908
 safety-0.865
Token.
Token ID13
Feature activation+1.78e-01

Pos logit contributions
otrop+0.044
udos+0.040
acia+0.039
usting+0.039
 safety+0.037
Neg logit contributions
���-0.082
 Lennon-0.079
 lapt-0.079
 Gohan-0.075
 Yose-0.074
Token celebrated
Token ID13943
Feature activation-1.54e-01

Pos logit contributions
���+0.402
 Lennon+0.388
 lapt+0.386
 Gohan+0.363
 Yose+0.360
Neg logit contributions
otrop-0.309
acia-0.281
udos-0.280
usting-0.279
 safety-0.266
Token,
Token ID11
Feature activation-4.21e-01

Pos logit contributions
���+0.132
 Lennon+0.127
 lapt+0.127
 Gohan+0.120
 Yose+0.119
Neg logit contributions
otrop-0.084
acia-0.076
udos-0.076
usting-0.075
 safety-0.072
Token and
Token ID290
Feature activation+1.24e-01

Pos logit contributions
���+0.345
 Lennon+0.331
 lapt+0.329
 Gohan+0.312
 Yose+0.308
Neg logit contributions
otrop-0.246
udos-0.223
acia-0.223
usting-0.221
 safety-0.213
Token seemed
Token ID3947
Feature activation+3.58e-01

Pos logit contributions
otrop+0.068
udos+0.061
acia+0.061
usting+0.061
 safety+0.058
Neg logit contributions
���-0.106
 Lennon-0.102
 lapt-0.101
 Gohan-0.096
 Yose-0.095
Token as
Token ID355
Feature activation-3.07e-02

Pos logit contributions
otrop+0.213
acia+0.194
udos+0.194
usting+0.192
 safety+0.185
Neg logit contributions
���-0.289
 Lennon-0.277
 lapt-0.276
 Gohan-0.261
 Yose-0.258
Token though
Token ID996
Feature activation-3.21e+00

Pos logit contributions
���+0.026
 Lennon+0.025
 lapt+0.025
 Gohan+0.024
 Yose+0.024
Neg logit contributions
otrop-0.017
udos-0.015
acia-0.015
usting-0.015
 safety-0.014
Token they
Token ID484
Feature activation-3.99e-01

Pos logit contributions
���+2.705
 Lennon+2.651
 lapt+2.636
 Gohan+2.483
 Yose+2.477
Neg logit contributions
otrop-1.768
acia-1.611
usting-1.584
udos-1.581
oras-1.450
Token had
Token ID550
Feature activation-2.36e-01

Pos logit contributions
���+0.327
 Lennon+0.314
 lapt+0.312
 Gohan+0.296
 Yose+0.292
Neg logit contributions
otrop-0.233
acia-0.212
udos-0.211
usting-0.210
 safety-0.202
Token to
Token ID284
Feature activation+7.13e-02

Pos logit contributions
���+0.194
 Lennon+0.186
 lapt+0.186
 Gohan+0.176
 Yose+0.174
Neg logit contributions
otrop-0.137
udos-0.124
acia-0.124
usting-0.123
 safety-0.117
Token un
Token ID555
Feature activation+5.00e-01

Pos logit contributions
otrop+0.047
udos+0.043
acia+0.043
usting+0.043
 safety+0.041
Neg logit contributions
���-0.053
 Lennon-0.051
 lapt-0.051
 Gohan-0.048
 Yose-0.047
Tokence
Token ID344
Feature activation+1.13e+00

Pos logit contributions
otrop+0.316
udos+0.289
acia+0.288
usting+0.286
 safety+0.277
Neg logit contributions
���-0.386
 Lennon-0.368
 lapt-0.367
 Gohan-0.346
 Yose-0.342
TokenFlag
Token ID34227
Feature activation-1.78e+00

Pos logit contributions
���+0.905
 Lennon+0.874
 lapt+0.871
 Gohan+0.822
FML+0.817
Neg logit contributions
otrop-0.748
acia-0.681
udos-0.679
usting-0.675
 safety-0.653
Token Reply
Token ID14883
Feature activation-5.03e-01

Pos logit contributions
���+1.333
 Lennon+1.306
 lapt+1.293
 Gohan+1.227
 Yose+1.212
Neg logit contributions
otrop-1.147
udos-1.048
acia-1.042
usting-1.039
 safety-0.987
Token
Token ID198
Feature activation+4.48e-01

Pos logit contributions
���+0.396
 Lennon+0.382
 lapt+0.375
 Gohan+0.359
 Yose+0.355
Neg logit contributions
otrop-0.309
acia-0.281
udos-0.280
usting-0.279
 safety-0.268
Token
Token ID198
Feature activation+5.00e-01

Pos logit contributions
otrop+0.267
udos+0.243
acia+0.243
usting+0.241
 safety+0.232
Neg logit contributions
���-0.358
 Lennon-0.346
 lapt-0.343
 Gohan-0.327
FML-0.323
Token14
Token ID1415
Feature activation-3.04e-01

Pos logit contributions
otrop+0.354
udos+0.326
acia+0.326
usting+0.324
 safety+0.314
Neg logit contributions
���-0.347
 Lennon-0.331
 lapt-0.328
 Gohan-0.310
FML-0.306
Token Recommend
Token ID30361
Feature activation-3.20e+00

Pos logit contributions
���+0.227
 Lennon+0.222
 lapt+0.218
 Gohan+0.208
 Yose+0.204
Neg logit contributions
otrop-0.198
acia-0.179
usting-0.179
udos-0.178
 safety-0.171
Token
Token ID198
Feature activation-3.30e-01

Pos logit contributions
 Lennon+2.274
���+2.136
 Gohan+2.110
 lapt+2.095
FML+2.075
Neg logit contributions
otrop-2.134
acia-1.929
usting-1.926
udos-1.923
 safety-1.837
Token
Token ID198
Feature activation-3.57e-01

Pos logit contributions
���+0.256
 Lennon+0.252
 lapt+0.246
 Gohan+0.237
FML+0.234
Neg logit contributions
otrop-0.201
udos-0.183
acia-0.183
usting-0.181
 safety-0.174
TokenRecommend
Token ID41248
Feature activation-2.82e+00

Pos logit contributions
���+0.230
 Lennon+0.217
 lapt+0.216
 Gohan+0.202
 Yose+0.198
Neg logit contributions
otrop-0.272
udos-0.253
acia-0.252
usting-0.251
 safety-0.244
Token Share
Token ID8734
Feature activation-1.53e+00

Pos logit contributions
���+2.062
 Lennon+2.011
 lapt+1.954
 Gohan+1.888
 Yose+1.846
Neg logit contributions
otrop-1.864
udos-1.684
acia-1.678
usting-1.675
 safety-1.618
Token this
Token ID428
Feature activation+3.95e-01

Pos logit contributions
���+1.223
 Lennon+1.162
 lapt+1.161
 Gohan+1.094
FML+1.083
Neg logit contributions
otrop-0.930
udos-0.849
acia-0.846
usting-0.840
 safety-0.816
Token
Token ID198
Feature activation-6.40e-01

Pos logit contributions
���+0.637
 Lennon+0.627
 lapt+0.614
 Gohan+0.590
FML+0.584
Neg logit contributions
otrop-0.501
udos-0.457
acia-0.456
usting-0.453
 safety-0.435
Token27
Token ID1983
Feature activation-1.10e+00

Pos logit contributions
���+0.407
 Lennon+0.395
 lapt+0.386
 Gohan+0.366
FML+0.363
Neg logit contributions
otrop-0.484
acia-0.448
udos-0.447
usting-0.445
 safety-0.433
Token Recommend
Token ID30361
Feature activation-3.63e+00

Pos logit contributions
���+0.838
 Lennon+0.817
 lapt+0.803
 Gohan+0.766
 Yose+0.758
Neg logit contributions
otrop-0.695
acia-0.633
udos-0.631
usting-0.629
 safety-0.606
Token
Token ID198
Feature activation-1.10e+00

Pos logit contributions
 Lennon+2.599
���+2.440
 lapt+2.424
 Gohan+2.420
FML+2.390
Neg logit contributions
otrop-2.381
acia-2.166
usting-2.159
udos-2.158
 safety-2.047
Token\n
Token ID198
Feature activation-1.05e+00

Pos logit contributions
���+0.840
 Lennon+0.830
 lapt+0.811
 Gohan+0.781
FML+0.773
Neg logit contributions
otrop-0.671
udos-0.612
acia-0.611
usting-0.607
 safety-0.582
TokenRecommend
Token ID41248
Feature activation-3.20e+00

Pos logit contributions
���+0.646
 Lennon+0.621
 lapt+0.612
 Gohan+0.573
FML+0.570
Neg logit contributions
otrop-0.821
acia-0.760
udos-0.756
usting-0.755
 safety-0.735
Token Share
Token ID8734
Feature activation-1.58e+00

Pos logit contributions
���+2.340
 Lennon+2.307
 lapt+2.238
 Gohan+2.156
 Yose+2.131
Neg logit contributions
otrop-2.088
acia-1.895
udos-1.891
usting-1.885
 safety-1.810
Token this
Token ID428
Feature activation+1.22e-01

Pos logit contributions
���+1.217
 Lennon+1.176
 lapt+1.162
 Gohan+1.105
 Yose+1.090
Neg logit contributions
otrop-0.993
udos-0.908
acia-0.908
usting-0.902
 safety-0.867
Token comment
Token ID2912
Feature activation-4.10e-01

Pos logit contributions
otrop+0.067
udos+0.060
acia+0.060
usting+0.060
 safety+0.057
Neg logit contributions
���-0.104
 Lennon-0.100
 lapt-0.099
 Gohan-0.095
 Yose-0.093
Token on
Token ID319
Feature activation-3.26e-01

Pos logit contributions
���+0.315
 Lennon+0.307
 lapt+0.301
 Gohan+0.289
FML+0.284
Neg logit contributions
otrop-0.256
udos-0.234
acia-0.233
usting-0.232
 safety-0.223
Token Facebook
Token ID3203
Feature activation-9.61e-01

Pos logit contributions
���+0.266
 Lennon+0.253
 lapt+0.253
 Gohan+0.239
 Yose+0.236
Neg logit contributions
otrop-0.192
udos-0.175
acia-0.174
usting-0.173
 safety-0.168
Token Guard
Token ID4932
Feature activation+1.44e+00

Pos logit contributions
���+0.703
 Lennon+0.675
 lapt+0.658
 Gohan+0.625
 Yose+0.618
Neg logit contributions
otrop-0.762
acia-0.704
udos-0.703
usting-0.702
 safety-0.681
Tokenado
Token ID4533
Feature activation+3.85e-03

Pos logit contributions
otrop+0.808
acia+0.746
udos+0.744
usting+0.724
 mine+0.702
Neg logit contributions
���-1.223
 Lennon-1.151
 lapt-1.151
 Gohan-1.088
 Yose-1.081
Token obliged
Token ID23278
Feature activation+1.41e-01

Pos logit contributions
otrop+0.002
udos+0.002
acia+0.002
usting+0.002
 safety+0.002
Neg logit contributions
���-0.003
 Lennon-0.003
 lapt-0.003
 Gohan-0.003
FML-0.003
Token
Token ID784
Feature activation+5.09e-02

Pos logit contributions
otrop+0.083
acia+0.076
udos+0.075
usting+0.075
 safety+0.071
Neg logit contributions
���-0.114
 Lennon-0.110
 lapt-0.109
 Gohan-0.104
 Yose-0.103
Token spor
Token ID35810
Feature activation+6.99e-02

Pos logit contributions
otrop+0.030
udos+0.027
acia+0.027
usting+0.027
 safety+0.026
Neg logit contributions
���-0.041
 Lennon-0.040
 lapt-0.039
 Gohan-0.037
FML-0.037
Tokenad
Token ID324
Feature activation-3.18e+00

Pos logit contributions
otrop+0.047
udos+0.044
acia+0.043
usting+0.043
 safety+0.042
Neg logit contributions
���-0.050
 Lennon-0.048
 lapt-0.048
 Gohan-0.045
 Yose-0.044
Tokenically
Token ID1146
Feature activation-1.30e+00

Pos logit contributions
���+2.616
 Lennon+2.538
 lapt+2.522
 Gohan+2.417
 Yose+2.377
Neg logit contributions
otrop-1.764
acia-1.625
udos-1.621
usting-1.581
 safety-1.546
Token,
Token ID11
Feature activation+1.24e-02

Pos logit contributions
���+1.011
 Lennon+0.981
 lapt+0.970
 Gohan+0.919
 Yose+0.908
Neg logit contributions
otrop-0.806
udos-0.738
acia-0.738
usting-0.729
 safety-0.699
Token considering
Token ID6402
Feature activation-4.06e-01

Pos logit contributions
otrop+0.007
udos+0.007
acia+0.007
usting+0.007
 safety+0.006
Neg logit contributions
���-0.010
 Lennon-0.010
 lapt-0.010
 Gohan-0.009
 Yose-0.009
Token the
Token ID262
Feature activation+1.81e-01

Pos logit contributions
���+0.336
 Lennon+0.331
 lapt+0.326
 Gohan+0.311
 Yose+0.307
Neg logit contributions
otrop-0.230
acia-0.209
udos-0.208
usting-0.206
 safety-0.191
Token heat
Token ID4894
Feature activation-4.12e-01

Pos logit contributions
otrop+0.110
udos+0.100
acia+0.100
usting+0.099
 safety+0.095
Neg logit contributions
���-0.144
 Lennon-0.139
 lapt-0.138
 Gohan-0.130
 Yose-0.129
Token
Token ID447
Feature activation+7.89e-01

Pos logit contributions
otrop+0.195
udos+0.178
acia+0.178
usting+0.176
 safety+0.170
Neg logit contributions
���-0.249
 Lennon-0.238
 lapt-0.238
 Gohan-0.224
 Yose-0.222
Token
Token ID247
Feature activation+7.21e-01

Pos logit contributions
otrop+0.699
acia+0.660
udos+0.659
usting+0.656
 safety+0.643
Neg logit contributions
���-0.407
 lapt-0.377
 Lennon-0.376
 Gohan-0.342
 Yose-0.336
Tokenll
Token ID297
Feature activation-7.30e-01

Pos logit contributions
otrop+0.527
udos+0.490
acia+0.488
usting+0.488
 safety+0.474
Neg logit contributions
���-0.493
 Lennon-0.461
 lapt-0.458
 Gohan-0.428
 Yose-0.422
Token notice
Token ID4003
Feature activation+2.45e-01

Pos logit contributions
���+0.560
 Lennon+0.541
 lapt+0.541
 Gohan+0.509
 Yose+0.501
Neg logit contributions
otrop-0.457
udos-0.417
acia-0.416
usting-0.413
 safety-0.395
Token a
Token ID257
Feature activation+5.21e-01

Pos logit contributions
otrop+0.137
udos+0.124
acia+0.124
usting+0.123
 safety+0.117
Neg logit contributions
���-0.206
 Lennon-0.199
 lapt-0.198
 Gohan-0.188
 Yose-0.185
Token strange
Token ID6283
Feature activation+2.36e+00

Pos logit contributions
otrop+0.316
udos+0.288
acia+0.287
usting+0.285
 safety+0.274
Neg logit contributions
���-0.416
 Lennon-0.399
 lapt-0.398
 Gohan-0.375
 Yose-0.370
Token camera
Token ID4676
Feature activation-3.74e-01

Pos logit contributions
otrop+1.533
acia+1.407
udos+1.401
usting+1.391
 safety+1.366
Neg logit contributions
���-1.785
 Lennon-1.682
 lapt-1.675
 Gohan-1.583
 Yose-1.564
Token in
Token ID287
Feature activation+6.80e-01

Pos logit contributions
���+0.273
 Lennon+0.263
 lapt+0.263
 Gohan+0.247
 Yose+0.244
Neg logit contributions
otrop-0.248
udos-0.229
acia-0.228
usting-0.226
 safety-0.216
Token the
Token ID262
Feature activation-7.56e-02

Pos logit contributions
otrop+0.389
udos+0.353
acia+0.351
usting+0.351
 safety+0.336
Neg logit contributions
���-0.565
 Lennon-0.543
 lapt-0.540
 Gohan-0.512
 Yose-0.506
Token EX
Token ID7788
Feature activation+4.93e-01

Pos logit contributions
���+0.057
 Lennon+0.055
 lapt+0.055
 Gohan+0.051
 Yose+0.051
Neg logit contributions
otrop-0.049
udos-0.045
acia-0.045
usting-0.044
 safety-0.043
TokenIF
Token ID5064
Feature activation+2.48e-01

Pos logit contributions
otrop+0.371
acia+0.344
udos+0.344
usting+0.342
 safety+0.332
Neg logit contributions
���-0.322
 Lennon-0.305
 lapt-0.303
 Gohan-0.282
 Yose-0.278
Tokenoliath
Token ID43009
Feature activation+3.60e-01

Pos logit contributions
otrop+0.269
udos+0.226
acia+0.226
usting+0.223
 safety+0.208
Neg logit contributions
���-0.868
 Lennon-0.838
 lapt-0.836
 Gohan-0.803
 Yose-0.796
Token TBA
Token ID48591
Feature activation+1.00e+00

Pos logit contributions
otrop+0.174
udos+0.155
acia+0.154
usting+0.152
 safety+0.146
Neg logit contributions
���-0.331
 Lennon-0.319
 lapt-0.318
 Gohan-0.304
 Yose-0.300
Token TBA
Token ID48591
Feature activation+1.48e+00

Pos logit contributions
otrop+0.504
udos+0.451
acia+0.450
usting+0.446
 safety+0.428
Neg logit contributions
���-0.901
 Lennon-0.865
 lapt-0.862
 Gohan-0.820
 Yose-0.812
Token\n
Token ID198
Feature activation-8.71e-02

Pos logit contributions
otrop+0.965
udos+0.886
acia+0.884
usting+0.878
 safety+0.854
Neg logit contributions
���-1.119
 lapt-1.061
 Lennon-1.059
 Gohan-0.993
FML-0.982
Token\n
Token ID198
Feature activation+3.95e-02

Pos logit contributions
���+0.071
 Lennon+0.068
 lapt+0.068
 Gohan+0.064
 Yose+0.063
Neg logit contributions
otrop-0.051
udos-0.047
acia-0.047
usting-0.046
 safety-0.045
TokenCBS
Token ID22923
Feature activation+2.03e+00

Pos logit contributions
otrop+0.029
udos+0.027
acia+0.027
usting+0.027
 safety+0.026
Neg logit contributions
���-0.026
 Lennon-0.025
 lapt-0.025
 Gohan-0.023
FML-0.023
Token All
Token ID1439
Feature activation+9.05e-02

Pos logit contributions
otrop+1.339
udos+1.230
acia+1.226
usting+1.219
 safety+1.190
Neg logit contributions
���-1.518
 Lennon-1.444
 lapt-1.444
 Gohan-1.352
 Yose-1.336
Token Access
Token ID8798
Feature activation+1.14e+00

Pos logit contributions
otrop+0.072
udos+0.068
acia+0.067
usting+0.067
 safety+0.065
Neg logit contributions
���-0.055
 Lennon-0.052
 lapt-0.051
 Gohan-0.048
 Yose-0.047
Token\n
Token ID198
Feature activation+4.23e-01

Pos logit contributions
otrop+0.742
udos+0.682
acia+0.681
usting+0.675
 safety+0.659
Neg logit contributions
���-0.856
 Lennon-0.812
 lapt-0.810
 Gohan-0.762
FML-0.753
Token\n
Token ID198
Feature activation+2.76e-01

Pos logit contributions
otrop+0.249
udos+0.226
acia+0.226
usting+0.224
 safety+0.217
Neg logit contributions
���-0.345
 Lennon-0.330
 lapt-0.328
 Gohan-0.311
 Yose-0.307
TokenAll
Token ID3237
Feature activation+1.32e+00

Pos logit contributions
otrop+0.203
udos+0.187
acia+0.187
usting+0.186
 safety+0.181
Neg logit contributions
���-0.184
 Lennon-0.175
 lapt-0.174
 Gohan-0.163
FML-0.162
Token
Token ID14519
Feature activation+1.28e+00

Pos logit contributions
���+0.383
 Lennon+0.368
 lapt+0.367
 Gohan+0.346
 Yose+0.341
Neg logit contributions
otrop-0.304
udos-0.279
acia-0.278
usting-0.275
 safety-0.264
Token
Token ID244
Feature activation-5.90e-01

Pos logit contributions
otrop+1.093
acia+1.032
udos+1.026
usting+1.019
 safety+1.004
Neg logit contributions
���-0.686
 Lennon-0.647
 lapt-0.645
 Gohan-0.592
 Yose-0.581
Token
Token ID37929
Feature activation-2.69e-01

Pos logit contributions
���+0.341
 Lennon+0.323
 lapt+0.320
 Gohan+0.296
 Yose+0.290
Neg logit contributions
otrop-0.487
udos-0.455
acia-0.454
usting-0.451
 safety-0.439
Token Be
Token ID1355
Feature activation+1.80e-01

Pos logit contributions
���+0.207
 Lennon+0.198
 lapt+0.197
 Gohan+0.186
 Yose+0.184
Neg logit contributions
otrop-0.170
udos-0.156
acia-0.155
usting-0.154
 safety-0.149
Token safe
Token ID3338
Feature activation+5.47e-01

Pos logit contributions
otrop+0.105
udos+0.096
acia+0.096
usting+0.095
 safety+0.091
Neg logit contributions
���-0.147
 Lennon-0.141
 lapt-0.140
 Gohan-0.132
 Yose-0.131
Token pic
Token ID8301
Feature activation+2.14e+00

Pos logit contributions
otrop+0.321
udos+0.292
acia+0.291
usting+0.289
 safety+0.277
Neg logit contributions
���-0.447
 Lennon-0.429
 lapt-0.427
 Gohan-0.405
 Yose-0.400
Token.
Token ID13
Feature activation+1.20e+00

Pos logit contributions
otrop+0.942
udos+0.830
acia+0.822
usting+0.805
 safety+0.786
Neg logit contributions
���-2.121
 lapt-2.003
 Lennon-1.968
 Gohan-1.879
 agre-1.875
Tokentwitter
Token ID6956
Feature activation+4.36e-01

Pos logit contributions
otrop+0.844
udos+0.783
acia+0.782
usting+0.774
 safety+0.757
Neg logit contributions
���-0.856
 lapt-0.797
 Lennon-0.793
 Gohan-0.740
 Yose-0.733
Token.
Token ID13
Feature activation+6.26e-02

Pos logit contributions
otrop+0.217
acia+0.193
udos+0.193
usting+0.193
 safety+0.182
Neg logit contributions
 Lennon-0.382
���-0.380
 lapt-0.371
 Gohan-0.362
FML-0.359
Tokencom
Token ID785
Feature activation+4.40e-01

Pos logit contributions
otrop+0.044
udos+0.041
acia+0.041
usting+0.040
 safety+0.039
Neg logit contributions
���-0.043
 Lennon-0.042
 lapt-0.041
 Gohan-0.039
FML-0.038
Token/
Token ID14
Feature activation+2.71e-01

Pos logit contributions
otrop+0.259
acia+0.236
udos+0.236
usting+0.234
 safety+0.224
Neg logit contributions
���-0.349
 Lennon-0.346
 lapt-0.337
 Gohan-0.325
FML-0.322
Token?
Token ID30
Feature activation-6.57e-01

Pos logit contributions
otrop+0.207
udos+0.166
acia+0.164
usting+0.161
 safety+0.151
Neg logit contributions
���-0.912
 lapt-0.873
 Lennon-0.872
 Gohan-0.836
 Yose-0.831
Token No
Token ID1400
Feature activation-2.36e-01

Pos logit contributions
���+0.539
 Lennon+0.522
 lapt+0.515
 Gohan+0.494
 Yose+0.487
Neg logit contributions
otrop-0.380
udos-0.344
acia-0.344
usting-0.342
 safety-0.328
Token more
Token ID517
Feature activation+1.77e-01

Pos logit contributions
���+0.230
 Lennon+0.229
 lapt+0.225
 Gohan+0.218
 Yose+0.216
Neg logit contributions
otrop-0.093
acia-0.082
udos-0.082
usting-0.080
 safety-0.073
Token bad
Token ID2089
Feature activation+1.83e-02

Pos logit contributions
otrop+0.113
udos+0.104
acia+0.104
usting+0.103
 safety+0.099
Neg logit contributions
���-0.134
 Lennon-0.129
 lapt-0.129
 Gohan-0.121
 Yose-0.120
Token news
Token ID1705
Feature activation+1.56e-01

Pos logit contributions
otrop+0.013
udos+0.012
acia+0.012
usting+0.012
 safety+0.012
Neg logit contributions
���-0.012
 Lennon-0.012
 lapt-0.012
 Gohan-0.011
 Yose-0.011
Token pl
Token ID458
Feature activation+2.31e+00

Pos logit contributions
otrop+0.130
udos+0.121
acia+0.121
usting+0.121
 safety+0.119
Neg logit contributions
���-0.091
 Lennon-0.083
 lapt-0.083
 Gohan-0.076
 Yose-0.075
Tokens
Token ID82
Feature activation+6.21e-01

Pos logit contributions
otrop+1.722
udos+1.635
acia+1.618
usting+1.602
 safety+1.575
Neg logit contributions
���-1.654
 lapt-1.488
 Lennon-1.420
 corrid-1.367
 agre-1.355
Token.
Token ID13
Feature activation+5.78e-01

Pos logit contributions
otrop+0.292
udos+0.259
acia+0.258
usting+0.255
 safety+0.244
Neg logit contributions
���-0.578
 Lennon-0.558
 lapt-0.555
 Gohan-0.531
 Yose-0.525
Token Good
Token ID4599
Feature activation+8.21e-01

Pos logit contributions
otrop+0.385
udos+0.355
usting+0.352
acia+0.352
 safety+0.343
Neg logit contributions
���-0.433
 lapt-0.408
 Lennon-0.402
 Gohan-0.376
 agre-0.374
Token updates
Token ID5992
Feature activation+7.61e-01

Pos logit contributions
otrop+0.653
udos+0.610
acia+0.608
usting+0.604
 safety+0.592
Neg logit contributions
���-0.506
 Lennon-0.471
 lapt-0.470
 Gohan-0.435
 Yose-0.427
Token?
Token ID30
Feature activation+1.89e-01

Pos logit contributions
otrop+0.214
udos+0.173
acia+0.172
usting+0.169
 safety+0.155
Neg logit contributions
���-0.852
 Lennon-0.829
 lapt-0.825
 Gohan-0.796
 Yose-0.788
Token and
Token ID290
Feature activation+2.02e-01

Pos logit contributions
otrop+0.041
udos+0.037
acia+0.037
usting+0.037
 safety+0.036
Neg logit contributions
���-0.052
 Lennon-0.050
 lapt-0.050
 Gohan-0.047
 Yose-0.047
Token an
Token ID281
Feature activation+2.25e-01

Pos logit contributions
otrop+0.126
udos+0.116
acia+0.116
usting+0.115
 safety+0.110
Neg logit contributions
���-0.157
 Lennon-0.151
 lapt-0.150
 Gohan-0.142
 Yose-0.140
Token anonymous
Token ID11614
Feature activation+1.15e-01

Pos logit contributions
otrop+0.145
udos+0.133
acia+0.133
usting+0.132
 safety+0.129
Neg logit contributions
���-0.170
 lapt-0.162
 Lennon-0.162
 Gohan-0.152
FML-0.150
Token source
Token ID2723
Feature activation+8.80e-02

Pos logit contributions
otrop+0.077
udos+0.071
acia+0.070
usting+0.070
 safety+0.068
Neg logit contributions
���-0.085
 Lennon-0.081
 lapt-0.080
 Gohan-0.076
FML-0.075
Token will
Token ID481
Feature activation+3.26e-01

Pos logit contributions
otrop+0.053
udos+0.048
acia+0.048
usting+0.048
 safety+0.046
Neg logit contributions
���-0.070
 Lennon-0.068
 lapt-0.067
 Gohan-0.064
 Yose-0.063
Token suffice
Token ID34302
Feature activation+2.72e+00

Pos logit contributions
otrop+0.201
udos+0.183
acia+0.183
usting+0.182
 safety+0.175
Neg logit contributions
���-0.257
 Lennon-0.246
 lapt-0.245
 Gohan-0.232
 Yose-0.229
Token.
Token ID13
Feature activation+2.10e-02

Pos logit contributions
otrop+1.359
udos+1.212
acia+1.209
usting+1.199
 safety+1.155
Neg logit contributions
���-2.466
 Lennon-2.362
 lapt-2.356
 Gohan-2.242
 Yose-2.218
Token Any
Token ID4377
Feature activation-4.16e-02

Pos logit contributions
otrop+0.014
udos+0.012
acia+0.012
usting+0.012
 safety+0.012
Neg logit contributions
���-0.016
 Lennon-0.015
 lapt-0.015
 Gohan-0.014
 Yose-0.014
Token mistake
Token ID7457
Feature activation+3.00e-01

Pos logit contributions
���+0.033
 Lennon+0.032
 lapt+0.032
 Gohan+0.030
 Yose+0.030
Neg logit contributions
otrop-0.025
udos-0.023
acia-0.023
usting-0.022
 safety-0.022
Token sparks
Token ID38306
Feature activation+7.32e-01

Pos logit contributions
otrop+0.186
udos+0.170
acia+0.170
usting+0.168
 safety+0.162
Neg logit contributions
���-0.234
 Lennon-0.225
 lapt-0.223
 Gohan-0.211
 Yose-0.208
Token talk
Token ID1561
Feature activation-5.12e-01

Pos logit contributions
otrop+0.453
udos+0.413
acia+0.412
usting+0.408
 safety+0.392
Neg logit contributions
���-0.575
 Lennon-0.552
 lapt-0.548
 Gohan-0.519
 Yose-0.512
Tokens
Token ID82
Feature activation+6.41e-01

Pos logit contributions
otrop+0.238
udos+0.222
acia+0.221
usting+0.220
 safety+0.215
Neg logit contributions
���-0.190
 Lennon-0.180
 lapt-0.179
 Gohan-0.166
 Yose-0.163
Token offshore
Token ID18479
Feature activation+1.10e+00

Pos logit contributions
otrop+0.383
udos+0.348
acia+0.347
usting+0.345
 safety+0.330
Neg logit contributions
���-0.515
 Lennon-0.495
 lapt-0.492
 Gohan-0.467
 Yose-0.462
Token oil
Token ID3056
Feature activation+3.01e-01

Pos logit contributions
otrop+0.667
udos+0.609
acia+0.608
usting+0.603
 safety+0.580
Neg logit contributions
���-0.870
 Lennon-0.835
 lapt-0.831
 Gohan-0.786
 Yose-0.776
Token fields
Token ID7032
Feature activation+3.00e-01

Pos logit contributions
otrop+0.176
udos+0.160
acia+0.159
usting+0.158
 safety+0.151
Neg logit contributions
���-0.244
 Lennon-0.236
 lapt-0.234
 Gohan-0.221
FML-0.219
Token and
Token ID290
Feature activation+2.56e-01

Pos logit contributions
otrop+0.165
udos+0.150
acia+0.149
usting+0.147
 safety+0.141
Neg logit contributions
���-0.255
 Lennon-0.246
 lapt-0.244
 Gohan-0.232
FML-0.230
Token the
Token ID262
Feature activation+6.07e-01

Pos logit contributions
otrop+0.164
udos+0.150
acia+0.150
usting+0.149
 safety+0.143
Neg logit contributions
���-0.195
 Lennon-0.186
 lapt-0.185
 Gohan-0.175
 Yose-0.173
Token laying
Token ID16299
Feature activation+6.24e-01

Pos logit contributions
otrop+0.376
udos+0.344
acia+0.343
usting+0.340
 safety+0.328
Neg logit contributions
���-0.476
 Lennon-0.456
 lapt-0.453
 Gohan-0.428
 Yose-0.424
Token of
Token ID286
Feature activation+5.55e-01

Pos logit contributions
otrop+0.416
acia+0.383
udos+0.383
usting+0.377
 safety+0.358
Neg logit contributions
���-0.463
 Lennon-0.440
 lapt-0.434
 Gohan-0.411
 Yose-0.406
Token a
Token ID257
Feature activation-1.74e-01

Pos logit contributions
otrop+0.344
acia+0.315
udos+0.314
usting+0.311
 safety+0.295
Neg logit contributions
���-0.433
 Lennon-0.417
 lapt-0.413
 Gohan-0.391
 Yose-0.389
Token gas
Token ID3623
Feature activation+1.43e+00

Pos logit contributions
���+0.132
 Lennon+0.128
 lapt+0.125
 Gohan+0.119
 Yose+0.119
Neg logit contributions
otrop-0.112
acia-0.102
udos-0.102
usting-0.101
oras-0.095
Token pipeline
Token ID11523
Feature activation+3.32e-01

Pos logit contributions
otrop+0.954
udos+0.878
acia+0.876
usting+0.870
 safety+0.845
Neg logit contributions
���-1.049
 Lennon-0.998
 lapt-0.994
 Gohan-0.935
 Yose-0.922
Token who
Token ID508
Feature activation-8.84e-02

Pos logit contributions
otrop+0.066
udos+0.059
acia+0.059
usting+0.058
 safety+0.056
Neg logit contributions
���-0.105
 Lennon-0.102
 lapt-0.101
 Gohan-0.097
 Yose-0.095
Token added
Token ID2087
Feature activation+3.29e-01

Pos logit contributions
���+0.070
 Lennon+0.068
 lapt+0.067
 Gohan+0.064
 Yose+0.063
Neg logit contributions
otrop-0.054
udos-0.049
acia-0.049
usting-0.049
 safety-0.047
Token that
Token ID326
Feature activation+5.99e-01

Pos logit contributions
otrop+0.163
udos+0.146
acia+0.145
usting+0.144
 safety+0.135
Neg logit contributions
���-0.297
 Lennon-0.288
 lapt-0.287
 Gohan-0.274
 Yose-0.270
Token his
Token ID465
Feature activation+2.08e-01

Pos logit contributions
otrop+0.338
udos+0.307
acia+0.305
usting+0.303
 safety+0.289
Neg logit contributions
���-0.502
 Lennon-0.484
 lapt-0.481
 Gohan-0.457
 Yose-0.451
Token client
Token ID5456
Feature activation+3.26e-01

Pos logit contributions
otrop+0.178
udos+0.167
acia+0.167
usting+0.166
 safety+0.162
Neg logit contributions
���-0.114
 Lennon-0.107
 lapt-0.106
 Gohan-0.098
 Yose-0.096
Token denied
Token ID6699
Feature activation+8.71e-01

Pos logit contributions
otrop+0.199
udos+0.181
acia+0.181
usting+0.179
 safety+0.174
Neg logit contributions
���-0.260
 Lennon-0.248
 lapt-0.247
 Gohan-0.234
 Yose-0.231
Token the
Token ID262
Feature activation-4.29e-01

Pos logit contributions
otrop+0.506
udos+0.459
acia+0.458
usting+0.454
 safety+0.438
Neg logit contributions
���-0.718
 Lennon-0.688
 lapt-0.685
 Gohan-0.649
 Yose-0.641
Token allegations
Token ID7636
Feature activation-4.52e-01

Pos logit contributions
���+0.351
 Lennon+0.338
 lapt+0.336
 Gohan+0.319
 Yose+0.315
Neg logit contributions
otrop-0.250
udos-0.227
acia-0.226
usting-0.224
 safety-0.215
Token.
Token ID13
Feature activation+3.10e-01

Pos logit contributions
���+0.388
 Lennon+0.378
 lapt+0.372
 Gohan+0.359
 Yose+0.353
Neg logit contributions
otrop-0.244
udos-0.221
acia-0.219
usting-0.214
 safety-0.204
Token\n
Token ID198
Feature activation-2.76e-01

Pos logit contributions
otrop+0.191
udos+0.175
acia+0.174
usting+0.173
 safety+0.167
Neg logit contributions
���-0.243
 Lennon-0.236
 lapt-0.231
 Gohan-0.221
FML-0.219
Token\n
Token ID198
Feature activation-2.07e-01

Pos logit contributions
���+0.217
 Lennon+0.212
 lapt+0.209
 Gohan+0.200
FML+0.198
Neg logit contributions
otrop-0.167
udos-0.152
acia-0.152
usting-0.151
 safety-0.145
Token can
Token ID460
Feature activation-3.01e-01

Pos logit contributions
���+0.081
 Lennon+0.078
 lapt+0.077
 Gohan+0.073
FML+0.072
Neg logit contributions
otrop-0.068
udos-0.062
acia-0.062
usting-0.061
 safety-0.059
Token say
Token ID910
Feature activation+4.15e-02

Pos logit contributions
���+0.241
 Lennon+0.231
 lapt+0.230
 Gohan+0.218
FML+0.215
Neg logit contributions
otrop-0.182
udos-0.165
acia-0.165
usting-0.164
 safety-0.157
Token,
Token ID11
Feature activation+2.59e-01

Pos logit contributions
otrop+0.024
udos+0.022
acia+0.021
usting+0.021
 safety+0.020
Neg logit contributions
���-0.034
 Lennon-0.033
 lapt-0.033
 Gohan-0.031
FML-0.031
Token see
Token ID766
Feature activation-1.49e+00

Pos logit contributions
otrop+0.131
udos+0.118
acia+0.117
usting+0.116
 safety+0.111
Neg logit contributions
���-0.231
 Lennon-0.223
 lapt-0.222
 Gohan-0.211
 Yose-0.209
Token or
Token ID393
Feature activation-5.01e-01

Pos logit contributions
���+1.079
 Lennon+1.029
 lapt+1.021
 Gohan+0.962
FML+0.949
Neg logit contributions
otrop-1.011
udos-0.936
acia-0.928
usting-0.924
 safety-0.889
Token hear
Token ID3285
Feature activation-3.44e-02

Pos logit contributions
���+0.450
 Lennon+0.438
 lapt+0.433
 Gohan+0.413
FML+0.409
Neg logit contributions
otrop-0.251
udos-0.224
acia-0.223
usting-0.221
 safety-0.210
Token?
Token ID30
Feature activation+2.14e-01

Pos logit contributions
���+0.034
 Lennon+0.033
 lapt+0.033
 Gohan+0.032
FML+0.031
Neg logit contributions
otrop-0.014
udos-0.012
acia-0.012
usting-0.012
 safety-0.011
Token Who
Token ID5338
Feature activation-4.85e-01

Pos logit contributions
otrop+0.137
acia+0.125
udos+0.125
usting+0.124
 safety+0.120
Neg logit contributions
���-0.163
 Lennon-0.157
 lapt-0.155
 Gohan-0.148
 Yose-0.145
Token decides
Token ID13267
Feature activation+1.01e-01

Pos logit contributions
���+0.372
 Lennon+0.358
 lapt+0.356
 Gohan+0.336
FML+0.331
Neg logit contributions
otrop-0.306
acia-0.281
udos-0.280
usting-0.278
 safety-0.267
Token what
Token ID644
Feature activation-2.57e-01

Pos logit contributions
otrop+0.052
udos+0.047
acia+0.046
usting+0.046
 safety+0.043
Neg logit contributions
���-0.089
 Lennon-0.087
 lapt-0.086
 Gohan-0.082
FML-0.081
Token is
Token ID318
Feature activation-4.11e-01

Pos logit contributions
���+0.215
 Lennon+0.208
 lapt+0.207
 Gohan+0.197
FML+0.193
Neg logit contributions
otrop-0.144
udos-0.131
acia-0.131
usting-0.130
 safety-0.122
Token story
Token ID1621
Feature activation+5.36e-01

Pos logit contributions
otrop+0.847
udos+0.779
acia+0.778
usting+0.775
 safety+0.759
Neg logit contributions
���-0.936
 Lennon-0.884
 lapt-0.882
 Gohan-0.830
 Yose-0.820
Token in
Token ID287
Feature activation-8.62e-02

Pos logit contributions
otrop+0.307
udos+0.278
acia+0.277
usting+0.274
 safety+0.264
Neg logit contributions
���-0.446
 Lennon-0.429
 lapt-0.426
 Gohan-0.405
 Yose-0.400
Token a
Token ID257
Feature activation-8.79e-02

Pos logit contributions
���+0.069
 Lennon+0.066
 lapt+0.066
 Gohan+0.062
 Yose+0.061
Neg logit contributions
otrop-0.052
udos-0.048
acia-0.047
usting-0.047
 safety-0.045
Token different
Token ID1180
Feature activation+5.73e-01

Pos logit contributions
���+0.071
 Lennon+0.068
 lapt+0.067
 Gohan+0.064
 Yose+0.063
Neg logit contributions
otrop-0.053
udos-0.048
acia-0.048
usting-0.048
 safety-0.046
Token city
Token ID1748
Feature activation+3.93e-01

Pos logit contributions
otrop+0.342
udos+0.310
acia+0.310
usting+0.308
 safety+0.301
Neg logit contributions
���-0.464
 Lennon-0.441
 lapt-0.440
 Gohan-0.416
FML-0.412
Token,
Token ID11
Feature activation+9.06e-01

Pos logit contributions
otrop+0.221
udos+0.202
acia+0.199
usting+0.197
 safety+0.188
Neg logit contributions
���-0.328
 Lennon-0.319
 lapt-0.316
 Gohan-0.301
 Yose-0.296
Token eight
Token ID3624
Feature activation+3.60e-01

Pos logit contributions
otrop+0.521
udos+0.472
acia+0.471
usting+0.467
 safety+0.452
Neg logit contributions
���-0.752
 Lennon-0.719
 lapt-0.717
 Gohan-0.679
 Yose-0.671
Token months
Token ID1933
Feature activation-3.48e-01

Pos logit contributions
otrop+0.244
udos+0.225
acia+0.225
usting+0.223
 safety+0.216
Neg logit contributions
���-0.261
 Lennon-0.249
 lapt-0.248
 Gohan-0.233
 Yose-0.230
Token ago
Token ID2084
Feature activation+7.17e-01

Pos logit contributions
���+0.292
 Lennon+0.283
 lapt+0.280
 Gohan+0.266
 Yose+0.263
Neg logit contributions
otrop-0.195
udos-0.177
acia-0.176
usting-0.175
 safety-0.166
Token.
Token ID13
Feature activation+7.26e-01

Pos logit contributions
otrop+0.392
udos+0.354
acia+0.352
usting+0.348
 safety+0.334
Neg logit contributions
���-0.613
 Lennon-0.593
 lapt-0.588
 Gohan-0.560
FML-0.553
Token\n
Token ID198
Feature activation+2.99e-01

Pos logit contributions
otrop+0.469
acia+0.430
udos+0.429
usting+0.427
 safety+0.413
Neg logit contributions
���-0.549
 Lennon-0.529
 lapt-0.521
 Gohan-0.496
 Yose-0.489
Token Shelby
Token ID35053
Feature activation+1.13e+00

Pos logit contributions
otrop+1.148
udos+1.060
acia+1.059
usting+1.051
 safety+1.030
Neg logit contributions
���-1.159
 lapt-1.087
 Lennon-1.086
 Gohan-1.016
FML-1.008
Token W
Token ID370
Feature activation-6.55e-01

Pos logit contributions
otrop+0.736
udos+0.676
acia+0.675
usting+0.670
 safety+0.651
Neg logit contributions
���-0.845
 Lennon-0.803
 lapt-0.801
 Gohan-0.754
FML-0.744
Tokenadd
Token ID2860
Feature activation-9.31e-01

Pos logit contributions
���+0.528
 Lennon+0.508
 lapt+0.505
 Gohan+0.478
FML+0.473
Neg logit contributions
otrop-0.386
acia-0.352
udos-0.350
usting-0.347
 safety-0.336
Tokenell
Token ID695
Feature activation-1.18e-01

Pos logit contributions
���+0.584
 Lennon+0.556
 lapt+0.553
 Gohan+0.516
 Yose+0.507
Neg logit contributions
otrop-0.717
udos-0.667
acia-0.664
usting-0.661
 safety-0.644
Token
Token ID14519
Feature activation+1.24e+00

Pos logit contributions
���+0.102
 Lennon+0.099
 lapt+0.098
 Gohan+0.093
 Yose+0.092
Neg logit contributions
otrop-0.064
udos-0.057
acia-0.057
usting-0.057
 safety-0.054
Token
Token ID234
Feature activation+1.38e+00

Pos logit contributions
otrop+1.089
acia+1.025
udos+1.023
usting+1.018
 safety+0.997
Neg logit contributions
���-0.641
 Lennon-0.598
 lapt-0.597
 Gohan-0.545
FML-0.533
Token (@
Token ID4275
Feature activation-5.92e-01

Pos logit contributions
otrop+1.006
udos+0.934
acia+0.932
usting+0.926
 safety+0.904
Neg logit contributions
���-0.936
 Lennon-0.881
 lapt-0.880
 Gohan-0.821
 Yose-0.809
Tokens
Token ID82
Feature activation-3.91e-01

Pos logit contributions
���+0.364
 Lennon+0.344
 lapt+0.343
 Gohan+0.318
FML+0.314
Neg logit contributions
otrop-0.466
udos-0.434
acia-0.433
usting-0.431
 safety-0.420
Tokenhel
Token ID2978
Feature activation-4.54e-01

Pos logit contributions
���+0.338
 Lennon+0.328
 lapt+0.328
 Gohan+0.310
 Yose+0.308
Neg logit contributions
otrop-0.204
udos-0.183
usting-0.181
acia-0.181
 safety-0.175
Tokenby
Token ID1525
Feature activation-1.14e+00

Pos logit contributions
���+0.431
 Lennon+0.422
 lapt+0.422
 Gohan+0.402
 Yose+0.399
Neg logit contributions
otrop-0.194
udos-0.169
acia-0.168
usting-0.167
 safety-0.160
Tokeny
Token ID88
Feature activation-1.95e-01

Pos logit contributions
���+0.687
 Lennon+0.661
 lapt+0.659
FML+0.606
 Gohan+0.604
Neg logit contributions
otrop-0.890
acia-0.831
udos-0.828
usting-0.823
 safety-0.806
Token-
Token ID12
Feature activation+2.37e-01

Pos logit contributions
otrop+0.433
acia+0.400
udos+0.400
usting+0.396
 safety+0.385
Neg logit contributions
���-0.450
 Lennon-0.425
 lapt-0.425
 Gohan-0.396
FML-0.392
TokenK
Token ID42
Feature activation-7.98e-01

Pos logit contributions
otrop+0.168
udos+0.155
acia+0.155
usting+0.154
 safety+0.150
Neg logit contributions
���-0.166
 Lennon-0.156
 lapt-0.156
 Gohan-0.146
 Yose-0.143
Tokenhun
Token ID20088
Feature activation-3.06e-01

Pos logit contributions
���+0.596
 Lennon+0.571
 lapt+0.569
 Gohan+0.537
 Yose+0.531
Neg logit contributions
otrop-0.517
acia-0.473
udos-0.472
usting-0.469
 safety-0.458
Tokeni
Token ID72
Feature activation-8.63e-01

Pos logit contributions
���+0.227
 Lennon+0.216
 lapt+0.215
 Gohan+0.203
 Yose+0.200
Neg logit contributions
otrop-0.203
udos-0.186
acia-0.186
usting-0.185
 safety-0.179
Token,
Token ID11
Feature activation-1.07e-01

Pos logit contributions
���+0.654
 Lennon+0.625
 lapt+0.622
 Gohan+0.587
 Yose+0.581
Neg logit contributions
otrop-0.555
udos-0.510
acia-0.507
usting-0.505
 safety-0.489
Token B
Token ID347
Feature activation-7.88e-01

Pos logit contributions
���+0.084
 Lennon+0.080
 lapt+0.080
 Gohan+0.076
 Yose+0.075
Neg logit contributions
otrop-0.067
udos-0.061
acia-0.061
usting-0.061
 safety-0.058
Tokenakh
Token ID11322
Feature activation-5.61e-01

Pos logit contributions
���+0.569
 Lennon+0.542
 lapt+0.539
 Gohan+0.507
 Yose+0.500
Neg logit contributions
otrop-0.535
udos-0.492
acia-0.491
usting-0.488
 safety-0.476
Tokente
Token ID660
Feature activation+4.43e-01

Pos logit contributions
���+0.405
 Lennon+0.389
 lapt+0.387
 Gohan+0.364
 Yose+0.360
Neg logit contributions
otrop-0.377
udos-0.347
acia-0.346
usting-0.344
 safety-0.336
Tokengan
Token ID1030
Feature activation-1.38e-01

Pos logit contributions
otrop+0.295
udos+0.272
acia+0.271
usting+0.269
 safety+0.262
Neg logit contributions
���-0.325
 Lennon-0.310
 lapt-0.309
 Gohan-0.291
 Yose-0.287
Token,
Token ID11
Feature activation+1.91e-01

Pos logit contributions
���+0.105
 Lennon+0.100
 lapt+0.099
 Gohan+0.093
 Yose+0.092
Neg logit contributions
otrop-0.090
acia-0.083
udos-0.083
usting-0.082
 safety-0.079
Token An
Token ID1052
Feature activation-1.32e-01

Pos logit contributions
otrop+0.118
udos+0.108
acia+0.108
usting+0.107
 safety+0.104
Neg logit contributions
���-0.150
 Lennon-0.143
 lapt-0.142
 Gohan-0.134
 Yose-0.133
Token Syrian
Token ID6318
Feature activation-4.63e-01

Pos logit contributions
otrop+0.034
udos+0.031
acia+0.031
usting+0.031
 safety+0.030
Neg logit contributions
���-0.034
 Lennon-0.033
 lapt-0.033
 Gohan-0.031
 Yose-0.030
Token frontier
Token ID27580
Feature activation+3.20e-01

Pos logit contributions
���+0.364
 Lennon+0.351
 lapt+0.348
 Gohan+0.329
 Yose+0.326
Neg logit contributions
otrop-0.282
udos-0.260
acia-0.256
usting-0.256
 safety-0.242
Token near
Token ID1474
Feature activation-7.02e-01

Pos logit contributions
otrop+0.179
udos+0.164
acia+0.163
usting+0.160
 safety+0.153
Neg logit contributions
���-0.268
 Lennon-0.260
 lapt-0.257
 Gohan-0.245
 Yose-0.243
Token al
Token ID435
Feature activation+6.97e-01

Pos logit contributions
���+0.522
 Lennon+0.502
 lapt+0.496
 Gohan+0.470
 Yose+0.464
Neg logit contributions
otrop-0.461
udos-0.424
acia-0.424
usting-0.420
 safety-0.404
Token-
Token ID12
Feature activation-1.59e-01

Pos logit contributions
otrop+0.365
udos+0.328
acia+0.325
usting+0.323
 safety+0.311
Neg logit contributions
���-0.615
 Lennon-0.591
 lapt-0.588
 Gohan-0.560
 Yose-0.555
TokenA
Token ID32
Feature activation-6.52e-01

Pos logit contributions
���+0.109
 Lennon+0.103
 lapt+0.103
 Gohan+0.096
 Yose+0.095
Neg logit contributions
otrop-0.115
udos-0.106
acia-0.106
usting-0.105
 safety-0.103
Tokenfs
Token ID9501
Feature activation+7.13e-01

Pos logit contributions
���+0.423
 Lennon+0.400
 lapt+0.398
 Gohan+0.371
 Yose+0.366
Neg logit contributions
otrop-0.492
udos-0.457
acia-0.455
usting-0.453
 safety-0.442
Tokeniq
Token ID25011
Feature activation+1.40e-01

Pos logit contributions
otrop+0.461
udos+0.423
acia+0.422
usting+0.419
 safety+0.407
Neg logit contributions
���-0.541
 Lennon-0.515
 lapt-0.513
 Gohan-0.484
 Yose-0.477
Token and
Token ID290
Feature activation+8.10e-01

Pos logit contributions
otrop+0.083
udos+0.076
acia+0.075
usting+0.075
 safety+0.072
Neg logit contributions
���-0.113
 Lennon-0.108
 lapt-0.108
 Gohan-0.102
FML-0.101
Token split
Token ID6626
Feature activation+9.85e-01

Pos logit contributions
otrop+0.516
udos+0.472
acia+0.469
usting+0.469
 safety+0.461
Neg logit contributions
���-0.623
 Lennon-0.588
 lapt-0.588
 Gohan-0.553
FML-0.549
Token into
Token ID656
Feature activation-4.33e-01

Pos logit contributions
otrop+0.597
udos+0.544
acia+0.543
usting+0.539
 safety+0.523
Neg logit contributions
���-0.787
 Lennon-0.750
 lapt-0.748
 Gohan-0.708
FML-0.698
Token last
Token ID938
Feature activation+7.38e-01

Pos logit contributions
���+0.419
 Lennon+0.408
 lapt+0.402
 Gohan+0.382
 Yose+0.379
Neg logit contributions
otrop-0.299
udos-0.274
acia-0.270
usting-0.267
 mine-0.253
Token fall
Token ID2121
Feature activation-1.46e+00

Pos logit contributions
otrop+0.502
udos+0.463
acia+0.462
usting+0.459
 safety+0.446
Neg logit contributions
���-0.534
 Lennon-0.508
 lapt-0.505
 Gohan-0.475
 Yose-0.468
Token.
Token ID13
Feature activation+5.80e-02

Pos logit contributions
���+1.192
 Lennon+1.171
 lapt+1.148
 Gohan+1.097
FML+1.088
Neg logit contributions
otrop-0.833
udos-0.762
acia-0.750
usting-0.737
 safety-0.705
Token\n
Token ID198
Feature activation+2.07e-03

Pos logit contributions
otrop+0.039
udos+0.036
acia+0.035
usting+0.035
 safety+0.034
Neg logit contributions
���-0.043
 Lennon-0.041
 lapt-0.040
 Gohan-0.038
 Yose-0.038
Token\n
Token ID198
Feature activation-1.65e-01

Pos logit contributions
otrop+0.001
udos+0.001
acia+0.001
usting+0.001
 safety+0.001
Neg logit contributions
���-0.002
 Lennon-0.002
 lapt-0.002
 Gohan-0.002
FML-0.001
TokenCourt
Token ID36699
Feature activation-3.49e-01

Pos logit contributions
���+0.115
 Lennon+0.111
 lapt+0.110
FML+0.104
 Gohan+0.103
Neg logit contributions
otrop-0.115
udos-0.106
acia-0.105
usting-0.105
 safety-0.102
Token heard
Token ID2982
Feature activation+6.16e-01

Pos logit contributions
���+0.276
 Lennon+0.264
 lapt+0.263
 Gohan+0.248
 Yose+0.245
Neg logit contributions
otrop-0.214
udos-0.195
acia-0.195
usting-0.194
 safety-0.188
Token that
Token ID326
Feature activation+2.44e-01

Pos logit contributions
otrop+0.354
udos+0.321
acia+0.320
usting+0.317
 safety+0.304
Neg logit contributions
���-0.512
 Lennon-0.492
 lapt-0.488
 Gohan-0.463
 Yose-0.459
Token the
Token ID262
Feature activation+1.11e-02

Pos logit contributions
otrop+0.143
udos+0.130
acia+0.129
usting+0.128
 safety+0.123
Neg logit contributions
���-0.200
 Lennon-0.193
 lapt-0.191
 Gohan-0.181
 Yose-0.180
Token U
Token ID471
Feature activation-1.23e-01

Pos logit contributions
otrop+0.006
udos+0.005
acia+0.005
usting+0.005
 safety+0.005
Neg logit contributions
���-0.010
 Lennon-0.009
 lapt-0.009
 Gohan-0.009
 Yose-0.009
Token.
Token ID13
Feature activation+1.42e-01

Pos logit contributions
���+0.111
 Lennon+0.109
 lapt+0.108
 Gohan+0.103
FML+0.103
Neg logit contributions
otrop-0.060
acia-0.053
udos-0.053
usting-0.052
 safety-0.050
Token the
Token ID262
Feature activation-1.01e-02

Pos logit contributions
���+0.362
 Lennon+0.348
 lapt+0.347
 Gohan+0.328
 Yose+0.323
Neg logit contributions
otrop-0.278
udos-0.256
acia-0.252
usting-0.251
 safety-0.238
Token problem
Token ID1917
Feature activation-9.23e-02

Pos logit contributions
���+0.008
 Lennon+0.008
 lapt+0.008
 Gohan+0.007
 Yose+0.007
Neg logit contributions
otrop-0.006
udos-0.006
acia-0.006
usting-0.006
 safety-0.005
Token completely
Token ID3190
Feature activation+6.47e-01

Pos logit contributions
���+0.073
 Lennon+0.071
 lapt+0.070
 Gohan+0.067
 Yose+0.066
Neg logit contributions
otrop-0.056
udos-0.051
acia-0.051
usting-0.050
 safety-0.048
Token.
Token ID13
Feature activation-6.54e-01

Pos logit contributions
otrop+0.393
udos+0.359
acia+0.358
usting+0.354
 safety+0.341
Neg logit contributions
���-0.513
 Lennon-0.493
 lapt-0.491
 Gohan-0.465
 Yose-0.458
Token The
Token ID383
Feature activation-4.73e-01

Pos logit contributions
���+0.460
 Lennon+0.442
 lapt+0.436
 Gohan+0.413
 Yose+0.406
Neg logit contributions
otrop-0.456
udos-0.421
acia-0.420
usting-0.418
 safety-0.404
Token terrain
Token ID15510
Feature activation-4.90e-01

Pos logit contributions
���+0.371
 Lennon+0.356
 lapt+0.355
 Gohan+0.335
FML+0.331
Neg logit contributions
otrop-0.291
udos-0.267
acia-0.265
usting-0.263
 safety-0.251
Token would
Token ID561
Feature activation-1.18e-01

Pos logit contributions
���+0.387
 Lennon+0.372
 lapt+0.370
 Gohan+0.350
FML+0.345
Neg logit contributions
otrop-0.301
udos-0.275
acia-0.274
usting-0.271
 safety-0.260
Token either
Token ID2035
Feature activation-1.88e-01

Pos logit contributions
���+0.090
 Lennon+0.086
 lapt+0.086
 Gohan+0.081
FML+0.080
Neg logit contributions
otrop-0.075
udos-0.069
acia-0.069
usting-0.069
 safety-0.066
Token a
Token ID257
Feature activation+2.56e-01

Pos logit contributions
���+0.145
 Lennon+0.139
 lapt+0.139
 Gohan+0.131
FML+0.129
Neg logit contributions
otrop-0.119
udos-0.109
acia-0.109
usting-0.108
 safety-0.104
Token)
Token ID8
Feature activation-9.85e-01

Pos logit contributions
otrop+0.165
udos+0.152
acia+0.151
usting+0.150
 safety+0.146
Neg logit contributions
���-0.193
 Lennon-0.184
 lapt-0.184
 Gohan-0.173
 Yose-0.171
Token still
Token ID991
Feature activation+1.51e-01

Pos logit contributions
���+0.760
 Lennon+0.728
 lapt+0.724
 Gohan+0.688
FML+0.677
Neg logit contributions
otrop-0.618
udos-0.568
acia-0.566
usting-0.562
 safety-0.541
Token writing
Token ID3597
Feature activation+1.12e-01

Pos logit contributions
otrop+0.004
udos+0.004
acia+0.004
usting+0.004
 safety+0.004
Neg logit contributions
���-0.004
 Lennon-0.004
 lapt-0.004
 Gohan-0.004
 Yose-0.004
Token them
Token ID606
Feature activation-5.20e-01

Pos logit contributions
otrop+0.068
udos+0.063
acia+0.063
usting+0.062
 safety+0.059
Neg logit contributions
���-0.088
 Lennon-0.085
 lapt-0.084
 Gohan-0.080
 Yose-0.079
Token,
Token ID11
Feature activation-1.70e-01

Pos logit contributions
���+0.428
 Lennon+0.416
 lapt+0.409
 Gohan+0.390
 Yose+0.383
Neg logit contributions
otrop-0.299
udos-0.273
acia-0.272
usting-0.268
 safety-0.257
Token often
Token ID1690
Feature activation+5.84e-02

Pos logit contributions
���+0.130
 Lennon+0.124
 lapt+0.124
 Gohan+0.117
 Yose+0.115
Neg logit contributions
otrop-0.108
udos-0.099
acia-0.099
usting-0.098
 safety-0.095
Token changing
Token ID5609
Feature activation+5.37e-01

Pos logit contributions
otrop+0.038
udos+0.034
acia+0.034
usting+0.034
 safety+0.033
Neg logit contributions
���-0.044
 Lennon-0.042
 lapt-0.042
 Gohan-0.040
 Yose-0.039
Token the
Token ID262
Feature activation-6.16e-01

Pos logit contributions
otrop+0.324
udos+0.296
acia+0.295
usting+0.293
 safety+0.282
Neg logit contributions
���-0.429
 Lennon-0.412
 lapt-0.410
 Gohan-0.387
 Yose-0.382
Token content
Token ID2695
Feature activation-1.47e+00

Pos logit contributions
���+0.438
 Lennon+0.419
 lapt+0.417
 Gohan+0.391
 Yose+0.385
Neg logit contributions
otrop-0.424
udos-0.393
acia-0.392
usting-0.388
 safety-0.376
Token and
Token ID290
Feature activation-2.92e-01

Pos logit contributions
���+1.150
 Lennon+1.117
 lapt+1.098
 Gohan+1.043
 Yose+1.030
Neg logit contributions
otrop-0.907
udos-0.837
acia-0.831
usting-0.820
 safety-0.785
Token order
Token ID1502
Feature activation-5.89e-01

Pos logit contributions
���+0.216
 Lennon+0.207
 lapt+0.206
 Gohan+0.193
 Yose+0.190
Neg logit contributions
otrop-0.194
udos-0.179
acia-0.178
usting-0.176
 safety-0.169
Token of
Token ID286
Feature activation-8.71e-01

Pos logit contributions
���+0.460
 Lennon+0.441
 lapt+0.437
 Gohan+0.414
 Yose+0.409
Neg logit contributions
otrop-0.367
udos-0.336
acia-0.335
usting-0.332
 safety-0.320
Token lyrics
Token ID15844
Feature activation-1.62e+00

Pos logit contributions
���+0.623
 Lennon+0.605
 lapt+0.592
 Gohan+0.556
 Yose+0.547
Neg logit contributions
otrop-0.597
udos-0.555
acia-0.555
usting-0.549
 safety-0.526
Tokenieu
Token ID22304
Feature activation-6.93e-01

Pos logit contributions
���+0.334
 Lennon+0.311
 lapt+0.310
 Gohan+0.287
 Yose+0.282
Neg logit contributions
otrop-0.457
udos-0.427
acia-0.426
usting-0.424
 safety-0.414
Token,
Token ID11
Feature activation-2.66e-01

Pos logit contributions
���+0.570
 Lennon+0.549
 lapt+0.544
 Gohan+0.518
FML+0.512
Neg logit contributions
otrop-0.400
udos-0.364
acia-0.363
usting-0.359
 safety-0.344
Token along
Token ID1863
Feature activation-1.15e+00

Pos logit contributions
���+0.217
 Lennon+0.208
 lapt+0.207
 Gohan+0.196
 Yose+0.194
Neg logit contributions
otrop-0.156
udos-0.142
acia-0.142
usting-0.141
 safety-0.136
Token with
Token ID351
Feature activation-4.35e-01

Pos logit contributions
���+0.998
 Lennon+0.959
 lapt+0.953
 Gohan+0.904
FML+0.897
Neg logit contributions
otrop-0.622
acia-0.562
udos-0.559
usting-0.553
 safety-0.526
Token things
Token ID1243
Feature activation-9.71e-02

Pos logit contributions
���+0.339
 Lennon+0.324
 lapt+0.322
 Gohan+0.304
 Yose+0.300
Neg logit contributions
otrop-0.272
udos-0.249
acia-0.249
usting-0.246
 safety-0.238
Token like
Token ID588
Feature activation-2.36e+00

Pos logit contributions
���+0.081
 Lennon+0.078
 lapt+0.077
 Gohan+0.073
 Yose+0.072
Neg logit contributions
otrop-0.055
udos-0.050
acia-0.050
usting-0.050
 safety-0.048
Token the
Token ID262
Feature activation-2.41e-01

Pos logit contributions
���+1.827
 Lennon+1.761
 lapt+1.743
 Gohan+1.654
 Yose+1.627
Neg logit contributions
otrop-1.461
udos-1.352
acia-1.352
usting-1.331
 safety-1.255
Token pul
Token ID17472
Feature activation-4.39e-01

Pos logit contributions
���+0.184
 Lennon+0.176
 lapt+0.175
 Gohan+0.165
 Yose+0.163
Neg logit contributions
otrop-0.153
udos-0.141
acia-0.140
usting-0.139
 safety-0.134
Tokenpy
Token ID9078
Feature activation-1.50e+00

Pos logit contributions
���+0.356
 Lennon+0.341
 lapt+0.340
 Gohan+0.322
 Yose+0.319
Neg logit contributions
otrop-0.257
acia-0.234
udos-0.233
usting-0.230
 safety-0.223
Token space
Token ID2272
Feature activation-1.15e+00

Pos logit contributions
���+1.126
 Lennon+1.082
 lapt+1.076
 Gohan+1.016
 Yose+1.001
Neg logit contributions
otrop-0.968
udos-0.891
acia-0.887
usting-0.882
 safety-0.840
Token opera
Token ID27296
Feature activation+5.30e-01

Pos logit contributions
���+0.829
 Lennon+0.796
 lapt+0.790
 Gohan+0.743
FML+0.733
Neg logit contributions
otrop-0.782
udos-0.721
acia-0.719
usting-0.714
 safety-0.677
Token.
Token ID13
Feature activation-2.07e-01

Pos logit contributions
otrop+0.384
udos+0.345
acia+0.344
usting+0.341
 safety+0.328
Neg logit contributions
���-0.656
 Lennon-0.629
 lapt-0.627
 Gohan-0.596
 Yose-0.590
Token This
Token ID770
Feature activation+3.46e-01

Pos logit contributions
���+0.156
 Lennon+0.151
 lapt+0.149
 Gohan+0.141
 Yose+0.140
Neg logit contributions
otrop-0.133
udos-0.121
acia-0.121
usting-0.121
 safety-0.117
Token is
Token ID318
Feature activation-2.63e-01

Pos logit contributions
otrop+0.216
udos+0.197
acia+0.197
usting+0.195
 safety+0.190
Neg logit contributions
���-0.272
 Lennon-0.259
 lapt-0.258
 Gohan-0.244
 Yose-0.241
Token only
Token ID691
Feature activation+2.84e-01

Pos logit contributions
���+0.218
 Lennon+0.209
 lapt+0.208
 Gohan+0.197
 Yose+0.195
Neg logit contributions
otrop-0.151
udos-0.137
acia-0.136
usting-0.135
 safety-0.130
Token the
Token ID262
Feature activation-2.98e-01

Pos logit contributions
otrop+0.160
udos+0.145
acia+0.145
usting+0.143
 safety+0.138
Neg logit contributions
���-0.239
 Lennon-0.229
 lapt-0.228
 Gohan-0.217
 Yose-0.214
Token beginning
Token ID3726
Feature activation-2.36e+00

Pos logit contributions
���+0.218
 Lennon+0.208
 lapt+0.207
 Gohan+0.194
 Yose+0.192
Neg logit contributions
otrop-0.200
udos-0.185
acia-0.184
usting-0.183
 safety-0.177
Token,
Token ID11
Feature activation-5.27e-02

Pos logit contributions
���+1.950
 Lennon+1.920
 lapt+1.895
 Gohan+1.816
FML+1.793
Neg logit contributions
otrop-1.320
udos-1.184
acia-1.176
usting-1.158
 safety-1.093
Token
Token ID447
Feature activation+3.78e-01

Pos logit contributions
���+0.042
 Lennon+0.040
 lapt+0.040
 Gohan+0.038
 Yose+0.038
Neg logit contributions
otrop-0.032
udos-0.029
acia-0.029
usting-0.029
 safety-0.028
Token
Token ID251
Feature activation+8.30e-01

Pos logit contributions
otrop+0.336
udos+0.316
acia+0.315
usting+0.314
 safety+0.307
Neg logit contributions
���-0.195
 Lennon-0.181
 lapt-0.180
 Gohan-0.164
 Yose-0.161
Token an
Token ID281
Feature activation-4.31e-01

Pos logit contributions
otrop+0.452
udos+0.407
acia+0.406
usting+0.403
 safety+0.388
Neg logit contributions
���-0.714
 Lennon-0.685
 lapt-0.682
 Gohan-0.648
 Yose-0.641
Token early
Token ID1903
Feature activation-1.40e-01

Pos logit contributions
���+0.312
 Lennon+0.297
 lapt+0.296
 Gohan+0.278
 Yose+0.274
Neg logit contributions
otrop-0.294
udos-0.271
acia-0.270
usting-0.268
 safety-0.261
Tokenaris
Token ID20066
Feature activation-7.56e-01

Pos logit contributions
���+0.807
 Lennon+0.776
 lapt+0.774
 Gohan+0.738
 Yose+0.730
Neg logit contributions
otrop-0.397
udos-0.352
acia-0.351
usting-0.347
 safety-0.332
Token
Token ID447
Feature activation+2.72e-01

Pos logit contributions
���+0.553
 Lennon+0.533
 lapt+0.529
 Gohan+0.497
 Yose+0.491
Neg logit contributions
otrop-0.503
udos-0.463
acia-0.461
usting-0.455
 safety-0.440
Token
Token ID247
Feature activation+3.80e-01

Pos logit contributions
otrop+0.247
acia+0.234
udos+0.233
usting+0.232
 safety+0.228
Neg logit contributions
���-0.135
 lapt-0.125
 Lennon-0.125
 Gohan-0.113
 Yose-0.111
Tokens
Token ID82
Feature activation+3.92e-01

Pos logit contributions
otrop+0.302
udos+0.282
acia+0.282
usting+0.281
 safety+0.274
Neg logit contributions
���-0.233
 Lennon-0.218
 lapt-0.218
 Gohan-0.202
FML-0.198
Token signature
Token ID9877
Feature activation-2.49e-01

Pos logit contributions
otrop+0.266
udos+0.245
usting+0.245
acia+0.245
 safety+0.240
Neg logit contributions
���-0.286
 lapt-0.268
 Lennon-0.267
 Gohan-0.252
FML-0.249
Token charm
Token ID20024
Feature activation-2.49e+00

Pos logit contributions
���+0.177
 Lennon+0.166
 lapt+0.166
 Gohan+0.156
FML+0.154
Neg logit contributions
otrop-0.173
udos-0.160
acia-0.159
usting-0.159
 safety-0.156
Token and
Token ID290
Feature activation-5.27e-01

Pos logit contributions
���+2.122
 Lennon+2.075
 lapt+2.038
 Gohan+1.957
FML+1.941
Neg logit contributions
otrop-1.312
udos-1.198
acia-1.190
usting-1.160
 safety-1.114
Token un
Token ID555
Feature activation-3.35e-01

Pos logit contributions
���+0.371
 Lennon+0.351
 lapt+0.349
 Gohan+0.328
 Yose+0.323
Neg logit contributions
otrop-0.370
udos-0.342
acia-0.341
usting-0.340
 safety-0.331
Tokence
Token ID344
Feature activation+6.10e-01

Pos logit contributions
���+0.235
 Lennon+0.226
 lapt+0.225
 Gohan+0.211
FML+0.210
Neg logit contributions
otrop-0.231
udos-0.214
acia-0.213
usting-0.210
 safety-0.206
Tokenasing
Token ID2313
Feature activation+3.69e-01

Pos logit contributions
otrop+0.353
acia+0.321
udos+0.318
usting+0.317
 safety+0.306
Neg logit contributions
���-0.501
 Lennon-0.480
 lapt-0.479
 Gohan-0.454
 Yose-0.450
Token pos
Token ID1426
Feature activation-1.04e+00

Pos logit contributions
otrop+0.258
udos+0.238
acia+0.238
usting+0.238
 safety+0.235
Neg logit contributions
���-0.262
 lapt-0.244
 Lennon-0.243
 Gohan-0.229
 Yose-0.226
Token your
Token ID534
Feature activation+2.50e-01

Pos logit contributions
���+0.801
 Lennon+0.786
 lapt+0.768
 Gohan+0.736
 Yose+0.726
Neg logit contributions
otrop-0.616
udos-0.565
acia-0.560
usting-0.558
 safety-0.524
Token husband
Token ID5229
Feature activation-6.97e-01

Pos logit contributions
otrop+0.148
udos+0.135
acia+0.134
usting+0.133
 safety+0.128
Neg logit contributions
���-0.203
 Lennon-0.195
 lapt-0.194
 Gohan-0.183
FML-0.181
Token before
Token ID878
Feature activation+1.79e-01

Pos logit contributions
���+0.561
 Lennon+0.542
 lapt+0.535
 Gohan+0.511
FML+0.504
Neg logit contributions
otrop-0.412
udos-0.376
acia-0.374
usting-0.372
 safety-0.353
Token you
Token ID345
Feature activation+4.46e-01

Pos logit contributions
otrop+0.109
udos+0.099
acia+0.099
usting+0.098
 safety+0.095
Neg logit contributions
���-0.143
 Lennon-0.137
 lapt-0.136
 Gohan-0.129
 Yose-0.127
Token march
Token ID9960
Feature activation-9.78e-01

Pos logit contributions
otrop+0.285
udos+0.261
acia+0.261
usting+0.259
 safety+0.251
Neg logit contributions
���-0.341
 Lennon-0.325
 lapt-0.324
 Gohan-0.305
 Yose-0.301
Token anywhere
Token ID6609
Feature activation-2.47e+00

Pos logit contributions
���+0.813
 Lennon+0.800
 lapt+0.786
 Gohan+0.754
FML+0.741
Neg logit contributions
otrop-0.549
udos-0.498
acia-0.492
usting-0.486
 safety-0.459
Token again
Token ID757
Feature activation-4.57e-01

Pos logit contributions
���+2.002
 Lennon+1.982
 lapt+1.941
 Gohan+1.870
FML+1.845
Neg logit contributions
otrop-1.408
udos-1.285
acia-1.274
usting-1.267
 mine-1.171
Token.
Token ID13
Feature activation-1.56e-01

Pos logit contributions
���+0.397
 Lennon+0.389
 lapt+0.382
 Gohan+0.367
FML+0.363
Neg logit contributions
otrop-0.240
udos-0.215
acia-0.213
usting-0.212
 safety-0.200
Token\n
Token ID198
Feature activation-3.61e-01

Pos logit contributions
���+0.118
 Lennon+0.114
 lapt+0.112
 Gohan+0.107
FML+0.105
Neg logit contributions
otrop-0.101
udos-0.092
acia-0.092
usting-0.091
 safety-0.088
Token\n
Token ID198
Feature activation-3.24e-01

Pos logit contributions
���+0.279
 Lennon+0.275
 lapt+0.269
 Gohan+0.259
FML+0.256
Neg logit contributions
otrop-0.219
udos-0.200
acia-0.200
usting-0.199
 safety-0.191
TokenIf
Token ID1532
Feature activation-9.06e-01

Pos logit contributions
���+0.233
 Lennon+0.224
 lapt+0.222
 Gohan+0.209
FML+0.209
Neg logit contributions
otrop-0.220
udos-0.202
acia-0.202
usting-0.200
 safety-0.195
Token School
Token ID3961
Feature activation-8.48e-01

Pos logit contributions
���+0.432
 Lennon+0.409
 lapt+0.406
 Gohan+0.380
 Yose+0.373
Neg logit contributions
otrop-0.500
udos-0.464
acia-0.462
usting-0.460
 safety-0.448
Token Choice
Token ID18502
Feature activation-2.17e+00

Pos logit contributions
���+0.529
 Lennon+0.497
 lapt+0.492
 Gohan+0.467
FML+0.456
Neg logit contributions
otrop-0.657
udos-0.618
acia-0.608
usting-0.602
 mine-0.587
Token at
Token ID379
Feature activation-3.77e-01

Pos logit contributions
���+1.669
 Lennon+1.600
 lapt+1.593
 Gohan+1.496
FML+1.489
Neg logit contributions
otrop-1.367
udos-1.247
acia-1.246
usting-1.238
 safety-1.167
Token Vanderbilt
Token ID38684
Feature activation-5.09e-01

Pos logit contributions
���+0.296
 Lennon+0.285
 lapt+0.281
 Gohan+0.268
 Yose+0.265
Neg logit contributions
otrop-0.233
udos-0.213
acia-0.212
usting-0.211
 safety-0.201
Token University
Token ID2059
Feature activation-1.08e+00

Pos logit contributions
���+0.426
 Lennon+0.409
 lapt+0.406
 Gohan+0.386
 Yose+0.381
Neg logit contributions
otrop-0.289
acia-0.261
udos-0.261
usting-0.259
 safety-0.248
Token compared
Token ID3688
Feature activation-2.38e+00

Pos logit contributions
���+0.837
 Lennon+0.801
 lapt+0.795
 Gohan+0.752
 Yose+0.742
Neg logit contributions
otrop-0.680
udos-0.621
acia-0.621
usting-0.617
 safety-0.593
Token charter
Token ID17422
Feature activation-4.81e-01

Pos logit contributions
���+1.871
 lapt+1.810
 Lennon+1.808
 Gohan+1.709
 Yose+1.684
Neg logit contributions
otrop-1.453
acia-1.332
udos-1.327
usting-1.317
 mine-1.215
Token schools
Token ID4266
Feature activation-6.48e-01

Pos logit contributions
���+0.378
 lapt+0.363
 Lennon+0.361
 Gohan+0.343
FML+0.337
Neg logit contributions
otrop-0.296
udos-0.269
acia-0.269
usting-0.266
 safety-0.252
Token with
Token ID351
Feature activation-1.04e-01

Pos logit contributions
���+0.534
 Lennon+0.514
 lapt+0.514
 Gohan+0.487
FML+0.482
Neg logit contributions
otrop-0.374
udos-0.339
acia-0.338
usting-0.333
 mine-0.315
Token traditional
Token ID4569
Feature activation-7.62e-01

Pos logit contributions
���+0.095
 Lennon+0.092
 lapt+0.092
 Gohan+0.087
 Yose+0.086
Neg logit contributions
otrop-0.051
acia-0.045
udos-0.045
usting-0.045
 safety-0.042
Token public
Token ID1171
Feature activation-6.31e-01

Pos logit contributions
���+0.703
 Lennon+0.678
 lapt+0.677
 Gohan+0.646
 Yose+0.638
Neg logit contributions
otrop-0.365
acia-0.324
udos-0.323
usting-0.320
 safety-0.296