TOP ACTIVATIONS
MAX = 0.205

box full of beautiful or nam ents .
take one of the or nam ents outside for a walk
park filled with colourful or nam ents . Jimmy looked out
All of these lovely or nam ents reminded him of the
a box of pretty or nam ents that she wanted to
went she searched for or nam ents , and the more
who liked to collect or nam ents . Everywhere she went
and lots of shiny or nam ents . L ila wanted
she had with her or nam ents . <|endoftext|> Once upon
a tree with shiny or nam ents hanging from its branches
he especially liked the or nam ents he kept around .
hill and put her or nam ents all around . She
many colors . Some or nam ents were round , some
to look at the or nam ents on the tree .
can we have some or nam ents for our snow man
He quickly hung colorful or nam ents and tw ink ling
beautiful garden with the or nam ents . The
to help sort the or nam ents . She picked up
on it . The or nam ents were shiny and colorful
the tree with regular or nam ents !" <|endoftext|> Once upon

MIN ACTIVATIONS
MIN = -0.232

and Cass ie celebrate their accomplishment of finding the special pie
proud of himself and his accomplishment . <|endoftext|> Once upon a
baby , celebrating their special accomplishment . Then , they had
correctly . She smiled with accomplishment and hugged her mom my
smiled , proud of his accomplishment , and admired his reflection
hard work . What an accomplishment ! <|endoftext|> Once there was
They make many z ig z ags and circles and lines
proudly showed her mom her accomplishment . Her mom
was so proud of her accomplishment ! <|endoftext|> Nick was very
, feeling content with his accomplishment . <|endoftext|> Once upon a
was very proud of her accomplishment , and went home to
on , whenever she un z ipped her cup it would
The friends smiled at their accomplishment . The
happy and proud of her accomplishment . She had kept dreaming
took turns making z ig z ags with their feet on
was so proud of his accomplishment and his mom was too
. He wrote z ig z ags on walls , floors
to the vine and her accomplishment . From
proud of themselves for their accomplishment . From then
so proud of his great accomplishment . <|endoftext|> Once upon a

INTERVAL 0.095 to 0.205
CONTAINS 0.505%

They were getting tired and hungry . " Tom
€ � � Is he someone special to you ? â
and a big , juicy oy ster . The oy ster
. He would often look at other animals in the forest
upon a time , there lived a little bunny called Bella

INTERVAL -0.014 to 0.095
CONTAINS 70.951%

, " Do you believe I can help you ?".
ara thought for a moment . She liked her pe b
it felt so good . The pink dog decided
some candy and asked her brother for some . But her
the <|endoftext|> S ara and Tom were playing in the park

INTERVAL -0.123 to -0.014
CONTAINS 28.449%

hug their mom . They hug their dad , their grandma
love you ." L una feels happy too .
The rain is beautiful and wonderful and fun . Can you
them . " OK , OK . I won
the forest was destroyed . Sad indeed . <|endoftext|> Once upon

INTERVAL -0.232 to -0.123
CONTAINS 0.095%

, Jenny took a deep breath and parted the water with
he noticed some other cater pill ars stretching towards it too
bottle and took a big breath of the oxygen . But
watched as the rabbit hopped circles around him . He was
hill she was out of breath . Her clean shoe was
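
The four intervals above look like equal-width bins spanning MIN = -0.232 to MAX = 0.205, and the four CONTAINS percentages sum to 100%. The sketch below shows one way such bins and shares could be computed. It is an assumption about the binning, not the dashboard's actual code; the names bin_edges and interval_shares are hypothetical. With the rounded MIN and MAX, the computed edges agree with the displayed ones to within about 0.001.

```python
from typing import List, Sequence, Tuple


def bin_edges(lo: float, hi: float, n_bins: int = 4) -> List[float]:
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def interval_shares(
    activations: Sequence[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Return (lower, upper, percent) per bin, highest interval first."""
    lo, hi = min(activations), max(activations)
    edges = bin_edges(lo, hi, n_bins)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp so the maximum value lands in the top bin.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(activations)
    return [
        (edges[i], edges[i + 1], 100.0 * counts[i] / total)
        for i in range(n_bins - 1, -1, -1)
    ]


if __name__ == "__main__":
    # Edges implied by the displayed MIN and MAX.
    print([round(e, 3) for e in bin_edges(-0.232, 0.205)])
```

Listing the highest interval first mirrors the order in which the intervals appear above.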
Token box
Token ID3091
Feature activation-2.46e-02

Pos logit contributions
 Alexa+0.001
ula+0.001
 creation+0.001
ela+0.001
hetically+0.001
Neg logit contributions
 respected-0.002
 wanted-0.002
 okay-0.002
 crank-0.002
 avoided-0.002
Token full
Token ID1336
Feature activation+3.70e-02

Pos logit contributions
 Alexa+0.056
hetically+0.052
ula+0.052
ela+0.052
 Tracy+0.052
Neg logit contributions
 wanted-0.023
 like-0.021
 decided-0.020
 seen-0.020
 worn-0.020
Token of
Token ID286
Feature activation-2.80e-02

Pos logit contributions
 wanted+0.058
 respected+0.057
 crank+0.057
 okay+0.057
 dangerous+0.057
Neg logit contributions
 Alexa-0.051
ula-0.051
hetically-0.049
 creation-0.048
ela-0.048
Token beautiful
Token ID4950
Feature activation+5.89e-03

Pos logit contributions
 Alexa+0.048
ula+0.047
 creation+0.047
hetically+0.046
 curtain+0.046
Neg logit contributions
 okay-0.034
 crank-0.034
 wanted-0.034
 respected-0.033
 avoided-0.033
Token or
Token ID393
Feature activation+9.10e-02

Pos logit contributions
 wanted+0.007
 okay+0.007
 dangerous+0.007
 respected+0.007
 crank+0.007
Neg logit contributions
 Alexa-0.011
ula-0.010
hetically-0.010
ela-0.010
 creation-0.010
Tokennam
Token ID7402
Feature activation+2.05e-01

Pos logit contributions
uch+0.100
 avoided+0.090
 crank+0.087
 quit+0.086
 respected+0.086
Neg logit contributions
 Alexa-0.177
 creation-0.171
ula-0.169
 curtain-0.165
 poster-0.163
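
Each token record lists five positive and five negative logit contributions. The source does not say how the contribution values themselves are computed, so the sketch below covers only the ranking step: given a mapping from output token to contribution, pick the k largest positive and k most negative entries. The function name top_contributions is hypothetical. The sample values are the ten shown for the "nam" record directly above, with leading spaces kept where the dump shows them.

```python
from typing import Dict, List, Tuple

# Values copied from the "nam" record above (Token ID 7402).
NAM_CONTRIBUTIONS: Dict[str, float] = {
    "uch": 0.100,
    " avoided": 0.090,
    " crank": 0.087,
    " quit": 0.086,
    " respected": 0.086,
    " Alexa": -0.177,
    " creation": -0.171,
    "ula": -0.169,
    " curtain": -0.165,
    " poster": -0.163,
}


def top_contributions(
    contributions: Dict[str, float], k: int = 5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) entries, strongest first."""
    positive = [(t, v) for t, v in contributions.items() if v > 0]
    negative = [(t, v) for t, v in contributions.items() if v < 0]
    positive.sort(key=lambda tv: tv[1], reverse=True)  # largest first
    negative.sort(key=lambda tv: tv[1])  # most negative first
    return positive[:k], negative[:k]


if __name__ == "__main__":
    pos, neg = top_contributions(NAM_CONTRIBUTIONS, k=3)
    print("Pos:", pos)
    print("Neg:", neg)
```

In a real setting the mapping would cover the whole vocabulary rather than ten entries; the ranking logic stays the same.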
Tokenents
Token ID658
Feature activation+6.60e-02

Pos logit contributions
 wanted+0.240
 seen+0.238
 worn+0.236
 respected+0.231
 crank+0.228
Neg logit contributions
 Alexa-0.365
hetically-0.359
ula-0.356
ela-0.347
 creation-0.340
Token.
Token ID13
Feature activation+2.87e-02

Pos logit contributions
 wanted+0.062
 decided+0.058
 dangerous+0.058
 seen+0.058
 belong+0.057
Neg logit contributions
 Alexa-0.142
ula-0.137
hetically-0.133
ela-0.133
oris-0.132
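
The eight records above cover the first top-activating context, "box full of beautiful or nam ents .". As a small check, the sketch below finds the token with the highest feature activation in that context, using the activation values from those records. The helper name peak_token is hypothetical, and the leading spaces on tokens are an assumption inferred from how the record labels are written.

```python
from typing import Sequence, Tuple


def peak_token(
    tokens: Sequence[str], activations: Sequence[float]
) -> Tuple[int, str, float]:
    """Return (index, token, activation) of the highest-activating token."""
    if not tokens or len(tokens) != len(activations):
        raise ValueError("tokens and activations must be non-empty and aligned")
    idx = max(range(len(activations)), key=lambda i: activations[i])
    return idx, tokens[idx], activations[idx]


# Tokens and activations copied from the first eight records.
TOKENS = [" box", " full", " of", " beautiful", " or", "nam", "ents", "."]
ACTIVATIONS = [-2.46e-02, 3.70e-02, -2.80e-02, 5.89e-03,
               9.10e-02, 2.05e-01, 6.60e-02, 2.87e-02]

if __name__ == "__main__":
    print(peak_token(TOKENS, ACTIVATIONS))
```

The peak falls on "nam" at 0.205, which matches the MAX value reported under TOP ACTIVATIONS.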
Token
Token ID220
Feature activation-2.54e-02

Pos logit contributions
 wanted+0.041
 okay+0.039
 crank+0.039
 dangerous+0.038
 quit+0.038
Neg logit contributions
ula-0.045
 Alexa-0.045
hetically-0.044
 creation-0.044
ela-0.044
Token
Token ID198
Feature activation+3.06e-02

Pos logit contributions
hetically+0.054
 Alexa+0.053
 creation+0.052
ula+0.052
 poster+0.052
Neg logit contributions
 okay-0.022
 wanted-0.022
 OK-0.020
 dangerous-0.020
?"-0.020
Token
Token ID198
Feature activation+3.08e-02

Pos logit contributions
 wanted+0.029
 okay+0.028
 dangerous+0.027
 crank+0.027
 respected+0.026
Neg logit contributions
hetically-0.063
 creation-0.062
 Alexa-0.062
ula-0.061
ela-0.060
Token take
Token ID1011
Feature activation+6.21e-04

Pos logit contributions
 crank+0.015
 dangerous+0.015
 okay+0.015
 seen+0.014
 avoided+0.014
Neg logit contributions
 Alexa-0.019
 creation-0.019
 curtain-0.019
 poster-0.018
hetically-0.018
Token one
Token ID530
Feature activation+2.14e-02

Pos logit contributions
 respected+0.001
 wanted+0.001
 crank+0.001
 forgot+0.001
 shouldn+0.001
Neg logit contributions
 Alexa-0.001
 creation-0.001
ula-0.001
 Tracy-0.001
ela-0.001
Token of
Token ID286
Feature activation-2.56e-02

Pos logit contributions
 wanted+0.032
 respected+0.031
 crank+0.031
 fought+0.031
 okay+0.031
Neg logit contributions
ula-0.031
 Alexa-0.031
 creation-0.031
hetically-0.030
ela-0.030
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
 Alexa+0.053
ula+0.053
hetically+0.053
ela+0.052
 creation+0.051
Neg logit contributions
 wanted-0.022
 dangerous-0.022
 okay-0.022
 respected-0.022
 crank-0.021
Token or
Token ID393
Feature activation+8.73e-02

Pos logit contributions
 respected+0.027
 wanted+0.027
 crank+0.027
 avoided+0.027
 quit+0.027
Neg logit contributions
 creation-0.032
 Alexa-0.032
hetically-0.032
 poster-0.031
 curtain-0.030
Tokennam
Token ID7402
Feature activation+2.04e-01

Pos logit contributions
uch+0.121
w+0.102
 crank+0.100
 respected+0.098
aw+0.098
Neg logit contributions
 Alexa-0.150
 poster-0.133
 creation-0.132
 Julie-0.128
 curtain-0.126
Tokenents
Token ID658
Feature activation+6.40e-02

Pos logit contributions
 wanted+0.260
 respected+0.253
 crank+0.250
 okay+0.250
 dangerous+0.249
Neg logit contributions
 Alexa-0.345
ula-0.340
hetically-0.334
ela-0.328
 creation-0.324
Token outside
Token ID2354
Feature activation-9.29e-03

Pos logit contributions
 wanted+0.078
 crank+0.077
 okay+0.077
 dangerous+0.076
 respected+0.076
Neg logit contributions
 Alexa-0.109
ula-0.107
hetically-0.107
 creation-0.105
ela-0.105
Token for
Token ID329
Feature activation-8.22e-03

Pos logit contributions
 Alexa+0.017
ula+0.016
hetically+0.016
 creation+0.016
ela+0.016
Neg logit contributions
 wanted-0.011
 okay-0.010
 crank-0.010
 dangerous-0.010
 respected-0.010
Token a
Token ID257
Feature activation+1.41e-02

Pos logit contributions
 Alexa+0.013
 creation+0.013
ula+0.013
hetically+0.012
 Tracy+0.012
Neg logit contributions
 crank-0.011
 wanted-0.011
 stole-0.010
 shouldn-0.010
uch-0.010
Token walk
Token ID2513
Feature activation-2.33e-02

Pos logit contributions
 okay+0.017
 dangerous+0.017
 wanted+0.017
 quit+0.016
 worn+0.016
Neg logit contributions
hetically-0.026
ula-0.026
 Alexa-0.026
ela-0.025
 creation-0.024
Token park
Token ID3952
Feature activation-1.48e-02

Pos logit contributions
 wanted+0.013
 okay+0.013
 respected+0.013
 crank+0.013
 dangerous+0.013
Neg logit contributions
 Alexa-0.017
ula-0.016
ela-0.016
hetically-0.016
 creation-0.016
Token filled
Token ID5901
Feature activation-9.45e-03

Pos logit contributions
 Alexa+0.025
ula+0.025
ela+0.025
 creation+0.024
hetically+0.024
Neg logit contributions
 okay-0.018
 crank-0.018
 dangerous-0.018
 respected-0.017
 wanted-0.017
Token with
Token ID351
Feature activation-9.67e-03

Pos logit contributions
 Alexa+0.018
ula+0.018
hetically+0.017
ela+0.017
 creation+0.017
Neg logit contributions
 wanted-0.010
 okay-0.010
 crank-0.010
 respected-0.010
 dangerous-0.010
Token colourful
Token ID44799
Feature activation-2.27e-02

Pos logit contributions
 circles+0.015
 creation+0.015
ula+0.014
 puppet+0.014
 curtain+0.014
Neg logit contributions
uch-0.013
 crank-0.013
 respected-0.012
 lied-0.012
 stole-0.012
Token or
Token ID393
Feature activation+8.26e-02

Pos logit contributions
 creation+0.034
ula+0.033
 Alexa+0.033
 curtain+0.033
hetically+0.032
Neg logit contributions
 okay-0.033
 crank-0.033
 respected-0.032
 avoided-0.032
 wanted-0.032
Tokennam
Token ID7402
Feature activation+2.02e-01

Pos logit contributions
uch+0.128
 avoided+0.119
 respected+0.117
asure+0.116
 shouldn+0.114
Neg logit contributions
 Alexa-0.114
 puppet-0.112
 cheering-0.112
 circles-0.111
 poster-0.109
Tokenents
Token ID658
Feature activation+5.90e-02

Pos logit contributions
 wanted+0.247
 worn+0.240
 seen+0.240
 respected+0.239
 crank+0.238
Neg logit contributions
 Alexa-0.352
ula-0.343
hetically-0.341
ela-0.335
 creation-0.327
Token.
Token ID13
Feature activation+2.04e-02

Pos logit contributions
 wanted+0.057
 decided+0.053
 dangerous+0.053
 okay+0.053
 seen+0.052
Neg logit contributions
 Alexa-0.123
ula-0.118
hetically-0.116
ela-0.115
oris-0.113
Token Jimmy
Token ID12963
Feature activation+6.09e-03

Pos logit contributions
 respected+0.027
 dangerous+0.027
 wanted+0.027
 belong+0.026
 okay+0.026
Neg logit contributions
 Alexa-0.034
ula-0.033
hetically-0.032
ela-0.031
 creation-0.031
Token looked
Token ID3114
Feature activation-1.55e-02

Pos logit contributions
 okay+0.006
 dangerous+0.006
 crank+0.006
 belong+0.006
 OK+0.005
Neg logit contributions
 Alexa-0.012
ela-0.012
hetically-0.012
 creation-0.012
ula-0.012
Token out
Token ID503
Feature activation+8.75e-03

Pos logit contributions
 Alexa+0.028
ula+0.028
hetically+0.027
ela+0.027
 creation+0.027
Neg logit contributions
 wanted-0.018
 okay-0.018
 dangerous-0.018
 crank-0.017
 respected-0.017
Token All
Token ID1439
Feature activation+3.14e-02

Pos logit contributions
 dangerous+0.005
 respected+0.005
 wanted+0.005
 crank+0.005
 okay+0.005
Neg logit contributions
 Alexa-0.007
ula-0.006
hetically-0.006
ela-0.006
 creation-0.006
Token of
Token ID286
Feature activation-3.51e-02

Pos logit contributions
 okay+0.042
 wanted+0.042
 crank+0.041
 dangerous+0.041
 respected+0.041
Neg logit contributions
 Alexa-0.050
hetically-0.048
ula-0.048
ela-0.048
 creation-0.048
Token these
Token ID777
Feature activation-6.04e-03

Pos logit contributions
 Alexa+0.063
ula+0.061
 creation+0.059
hetically+0.059
ela+0.059
Neg logit contributions
 crank-0.042
 wanted-0.041
 okay-0.040
 belong-0.040
 quit-0.039
Token lovely
Token ID14081
Feature activation-9.26e-03

Pos logit contributions
 creation+0.010
hetically+0.010
 curtain+0.009
 curtains+0.009
 Alexa+0.009
Neg logit contributions
 crank-0.007
 OK-0.007
 okay-0.007
uch-0.007
 quit-0.007
Token or
Token ID393
Feature activation+8.11e-02

Pos logit contributions
 creation+0.015
 curtain+0.013
hetically+0.013
 poster+0.013
 Alexa+0.013
Neg logit contributions
uch-0.013
 okay-0.013
 crank-0.013
 OK-0.013
Please-0.013
Tokennam
Token ID7402
Feature activation+2.01e-01

Pos logit contributions
uch+0.099
 quit+0.082
 avoided+0.081
Please+0.081
asure+0.081
Neg logit contributions
 creation-0.146
 Alexa-0.146
hetically-0.142
 poster-0.133
 circles-0.133
Tokenents
Token ID658
Feature activation+5.72e-02

Pos logit contributions
 wanted+0.272
 okay+0.265
 respected+0.264
 dangerous+0.264
 crank+0.263
Neg logit contributions
 Alexa-0.323
ula-0.318
hetically-0.311
ela-0.306
 creation-0.304
Token reminded
Token ID14516
Feature activation-1.58e-02

Pos logit contributions
 crank+0.062
 okay+0.059
Please+0.059
 dangerous+0.058
 Please+0.057
Neg logit contributions
hetically-0.106
 creation-0.105
 Alexa-0.105
ula-0.102
ela-0.102
Token him
Token ID683
Feature activation+1.87e-02

Pos logit contributions
 Alexa+0.028
ula+0.027
hetically+0.027
ela+0.026
 creation+0.026
Neg logit contributions
 crank-0.019
 wanted-0.019
 respected-0.019
 dangerous-0.019
 worn-0.018
Token of
Token ID286
Feature activation-3.09e-02

Pos logit contributions
 wanted+0.022
 crank+0.022
 okay+0.022
 dangerous+0.021
 respected+0.021
Neg logit contributions
 Alexa-0.033
ula-0.033
hetically-0.032
 creation-0.032
ela-0.032
Token the
Token ID262
Feature activation+1.41e-02

Pos logit contributions
 Alexa+0.050
 creation+0.050
hetically+0.048
ula+0.047
ela+0.047
Neg logit contributions
 crank-0.042
uch-0.039
 okay-0.039
 stole-0.038
 quit-0.038
Token a
Token ID257
Feature activation+1.23e-02

Pos logit contributions
 respected+0.038
 crank+0.038
 okay+0.038
 belong+0.038
 fought+0.037
Neg logit contributions
 Alexa-0.066
ula-0.064
 creation-0.062
ela-0.061
 curtain-0.060
Token box
Token ID3091
Feature activation-2.91e-02

Pos logit contributions
 dangerous+0.016
 wanted+0.016
 okay+0.015
 crank+0.015
 worn+0.015
Neg logit contributions
hetically-0.021
 Alexa-0.021
ula-0.021
ela-0.020
 creation-0.019
Token of
Token ID286
Feature activation-3.25e-02

Pos logit contributions
 Alexa+0.057
ula+0.056
hetically+0.056
ela+0.054
 creation+0.054
Neg logit contributions
 wanted-0.031
 worn-0.030
 okay-0.029
 dangerous-0.029
 respected-0.029
Token pretty
Token ID2495
Feature activation+4.00e-02

Pos logit contributions
 Alexa+0.049
 creation+0.049
ula+0.048
 curtain+0.048
 puppet+0.046
Neg logit contributions
 respected-0.046
 crank-0.046
 avoided-0.046
 okay-0.045
 wanted-0.045
Token or
Token ID393
Feature activation+8.45e-02

Pos logit contributions
 wanted+0.052
 okay+0.051
 dangerous+0.050
 respected+0.050
 crank+0.050
Neg logit contributions
 Alexa-0.067
ula-0.066
hetically-0.065
ela-0.064
 creation-0.063
Tokennam
Token ID7402
Feature activation+2.00e-01

Pos logit contributions
uch+0.114
 avoided+0.108
 respected+0.101
 quit+0.101
 crank+0.100
Neg logit contributions
 Alexa-0.149
 puppet-0.134
ula-0.133
 curtain-0.131
 creation-0.131
Tokenents
Token ID658
Feature activation+6.18e-02

Pos logit contributions
 wanted+0.238
 seen+0.234
 worn+0.234
 respected+0.230
 crank+0.229
Neg logit contributions
 Alexa-0.352
hetically-0.346
ula-0.345
ela-0.337
 creation-0.330
Token that
Token ID326
Feature activation-2.01e-02

Pos logit contributions
 wanted+0.065
 dangerous+0.062
 okay+0.062
 respected+0.062
 seen+0.060
Neg logit contributions
 Alexa-0.120
ula-0.118
hetically-0.116
ela-0.114
 creation-0.113
Token she
Token ID673
Feature activation-8.57e-03

Pos logit contributions
 Alexa+0.027
 creation+0.025
ela+0.025
ula+0.024
 Isabel+0.024
Neg logit contributions
 crank-0.033
 okay-0.033
 OK-0.032
 dangerous-0.032
 respected-0.031
Token wanted
Token ID2227
Feature activation+8.19e-02

Pos logit contributions
 Alexa+0.019
 creation+0.018
ela+0.018
 puppet+0.018
ail+0.017
Neg logit contributions
 okay-0.007
 OK-0.007
 crank-0.006
 dangerous-0.006
 waste-0.006
Token to
Token ID284
Feature activation+9.87e-03

Pos logit contributions
 crank+0.110
 wanted+0.109
 respected+0.109
 okay+0.109
 dangerous+0.108
Neg logit contributions
 Alexa-0.131
ula-0.129
 creation-0.125
ela-0.123
hetically-0.123
Token went
Token ID1816
Feature activation+6.91e-02

Pos logit contributions
 Alexa+0.012
 creation+0.011
hetically+0.011
ela+0.011
ula+0.011
Neg logit contributions
 okay-0.003
 OK-0.003
 dangerous-0.003
 crank-0.003
 respected-0.003
Token she
Token ID673
Feature activation-4.57e-03

Pos logit contributions
 respected+0.090
 crank+0.089
 okay+0.089
 dangerous+0.088
 worn+0.087
Neg logit contributions
 Alexa-0.115
ela-0.109
ula-0.109
hetically-0.107
 creation-0.106
Token searched
Token ID16499
Feature activation-1.59e-03

Pos logit contributions
 Alexa+0.010
 creation+0.010
ela+0.010
hetically+0.010
 puppet+0.009
Neg logit contributions
 okay-0.004
 OK-0.004
 crank-0.003
 dangerous-0.003
 Please-0.003
Token for
Token ID329
Feature activation-1.05e-02

Pos logit contributions
 Alexa+0.003
ula+0.003
hetically+0.003
ela+0.003
 creation+0.003
Neg logit contributions
 wanted-0.002
 okay-0.002
 crank-0.002
 respected-0.002
 dangerous-0.002
Token or
Token ID393
Feature activation+9.12e-02

Pos logit contributions
 creation+0.015
 curtain+0.013
 curtains+0.013
 circles+0.013
 poster+0.013
Neg logit contributions
uch-0.017
 crank-0.016
 Please-0.015
 shouldn-0.015
 quit-0.015
Tokennam
Token ID7402
Feature activation+1.99e-01

Pos logit contributions
uch+0.088
 OK+0.082
cream+0.081
 quit+0.079
 shouldn+0.079
Neg logit contributions
 Alexa-0.175
 creation-0.169
 poster-0.167
hetically-0.167
 curtain-0.162
Tokenents
Token ID658
Feature activation+6.43e-02

Pos logit contributions
 wanted+0.243
 seen+0.237
 respected+0.236
 worn+0.236
 okay+0.234
Neg logit contributions
 Alexa-0.347
ula-0.342
hetically-0.337
ela-0.332
 creation-0.324
Token,
Token ID11
Feature activation+3.00e-02

Pos logit contributions
 wanted+0.073
 okay+0.070
 dangerous+0.070
 respected+0.070
 crank+0.069
Neg logit contributions
 Alexa-0.119
ula-0.117
hetically-0.115
ela-0.113
 creation-0.113
Token and
Token ID290
Feature activation+3.25e-02

Pos logit contributions
 wanted+0.040
 respected+0.039
 crank+0.039
 dangerous+0.039
 okay+0.039
Neg logit contributions
 Alexa-0.049
ula-0.048
hetically-0.047
ela-0.047
 creation-0.046
Token the
Token ID262
Feature activation+2.23e-02

Pos logit contributions
 crank+0.045
 okay+0.044
 respected+0.043
 OK+0.043
 quit+0.043
Neg logit contributions
 Alexa-0.051
hetically-0.049
 creation-0.048
ela-0.048
ula-0.047
Token more
Token ID517
Feature activation+5.12e-02

Pos logit contributions
 respected+0.032
 avoided+0.032
 seen+0.031
 fought+0.031
 crank+0.031
Neg logit contributions
hetically-0.032
 creation-0.032
 Alexa-0.031
 curtain-0.031
 poster-0.031
Token who
Token ID508
Feature activation+5.38e-02

Pos logit contributions
 Alexa+0.027
ula+0.026
ela+0.026
 puppet+0.026
 curtain+0.026
Neg logit contributions
 okay-0.010
 crank-0.010
 belong-0.010
 quit-0.010
 avoided-0.009
Token liked
Token ID8288
Feature activation+5.78e-02

Pos logit contributions
 okay+0.038
asure+0.036
 OK+0.035
 lied+0.034
 crank+0.034
Neg logit contributions
 Alexa-0.117
 puppet-0.111
 creation-0.110
hetically-0.109
 curtain-0.109
Token to
Token ID284
Feature activation+9.25e-03

Pos logit contributions
 wanted+0.069
 dangerous+0.066
 crank+0.064
 okay+0.064
 respected+0.064
Neg logit contributions
 Alexa-0.106
hetically-0.104
ula-0.104
ela-0.102
 creation-0.099
Token collect
Token ID2824
Feature activation+1.34e-02

Pos logit contributions
 dangerous+0.011
 avoided+0.011
 okay+0.011
 respected+0.011
 crank+0.011
Neg logit contributions
 Alexa-0.016
 creation-0.016
ula-0.016
 curtain-0.016
hetically-0.015
Token or
Token ID393
Feature activation+8.85e-02

Pos logit contributions
 crank+0.018
 respected+0.018
 okay+0.018
 quit+0.018
 avoided+0.018
Neg logit contributions
ula-0.021
 creation-0.021
ela-0.021
 Alexa-0.020
 curtain-0.020
Tokennam
Token ID7402
Feature activation+1.99e-01

Pos logit contributions
uch+0.123
 OK+0.109
 avoided+0.106
aw+0.104
 okay+0.102
Neg logit contributions
 Alexa-0.142
 creation-0.139
 curtain-0.136
 poster-0.133
 curtains-0.131
Tokenents
Token ID658
Feature activation+6.31e-02

Pos logit contributions
 wanted+0.249
 respected+0.243
 worn+0.241
 seen+0.240
 dangerous+0.239
Neg logit contributions
 Alexa-0.339
ula-0.334
hetically-0.330
ela-0.324
 creation-0.315
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
 wanted+0.071
 crank+0.070
 okay+0.070
 dangerous+0.069
 respected+0.069
Neg logit contributions
 Alexa-0.114
ula-0.113
hetically-0.111
 creation-0.110
ela-0.109
Token Everywhere
Token ID48201
Feature activation-4.41e-02

Pos logit contributions
 wanted+0.033
 okay+0.032
 respected+0.032
 crank+0.032
 dangerous+0.032
Neg logit contributions
 Alexa-0.038
ula-0.038
hetically-0.037
ela-0.037
 creation-0.036
Token she
Token ID673
Feature activation-5.09e-03

Pos logit contributions
 Alexa+0.033
 poster+0.029
 Ana+0.029
 curtain+0.028
ela+0.027
Neg logit contributions
asure-0.095
 OK-0.095
 okay-0.094
 property-0.094
 crank-0.093
Token went
Token ID1816
Feature activation+6.91e-02

Pos logit contributions
 Alexa+0.012
 creation+0.011
hetically+0.011
ela+0.011
ula+0.011
Neg logit contributions
 okay-0.003
 OK-0.003
 dangerous-0.003
 crank-0.003
 respected-0.003
Token and
Token ID290
Feature activation+2.00e-02

Pos logit contributions
 wanted+0.051
 decided+0.047
 dangerous+0.046
 seen+0.046
 like+0.045
Neg logit contributions
 Alexa-0.099
ula-0.097
hetically-0.094
ela-0.094
 creation-0.090
Token lots
Token ID6041
Feature activation+1.42e-02

Pos logit contributions
 crank+0.028
 okay+0.028
 respected+0.027
 OK+0.027
 avoided+0.027
Neg logit contributions
 Alexa-0.032
ula-0.029
 creation-0.029
 puppet-0.029
 curtain-0.029
Token of
Token ID286
Feature activation-3.08e-02

Pos logit contributions
 wanted+0.021
 dangerous+0.020
 respected+0.020
 okay+0.020
 quit+0.019
Neg logit contributions
 Alexa-0.022
ula-0.021
hetically-0.021
ela-0.021
 creation-0.020
Token shiny
Token ID22441
Feature activation+8.30e-03

Pos logit contributions
 creation+0.046
 curtain+0.046
 Alexa+0.045
 circles+0.044
ela+0.044
Neg logit contributions
 okay-0.043
uch-0.043
 crank-0.043
 OK-0.042
 avoided-0.041
Token or
Token ID393
Feature activation+8.21e-02

Pos logit contributions
 okay+0.014
 crank+0.014
 OK+0.013
 quit+0.013
 respected+0.013
Neg logit contributions
 creation-0.012
 curtain-0.011
ela-0.011
 poster-0.011
ula-0.011
Tokennam
Token ID7402
Feature activation+1.98e-01

Pos logit contributions
uch+0.117
 avoided+0.114
 crank+0.113
 okay+0.112
 respected+0.111
Neg logit contributions
 Alexa-0.127
 creation-0.118
 curtain-0.117
 puppet-0.117
 poster-0.115
Tokenents
Token ID658
Feature activation+5.90e-02

Pos logit contributions
 wanted+0.242
 seen+0.236
 worn+0.234
 respected+0.232
 dangerous+0.228
Neg logit contributions
 Alexa-0.348
hetically-0.339
ula-0.339
ela-0.329
 creation-0.324
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
 wanted+0.063
 dangerous+0.059
 okay+0.059
 respected+0.058
 seen+0.058
Neg logit contributions
 Alexa-0.116
ula-0.113
hetically-0.111
ela-0.109
 creation-0.107
Token L
Token ID406
Feature activation-1.72e-02

Pos logit contributions
 wanted+0.018
 okay+0.017
 crank+0.017
 dangerous+0.016
 quit+0.016
Neg logit contributions
hetically-0.026
 Alexa-0.026
 creation-0.026
ula-0.025
 poster-0.025
Tokenila
Token ID10102
Feature activation-3.66e-05

Pos logit contributions
hetically+0.033
 creation+0.032
 Alexa+0.031
 doorway+0.030
 poster+0.029
Neg logit contributions
 wanted-0.020
?"-0.019
 okay-0.019
 quit-0.019
 decided-0.019
Token wanted
Token ID2227
Feature activation+7.73e-02

Pos logit contributions
 Alexa+0.000
 creation+0.000
ela+0.000
hetically+0.000
ula+0.000
Neg logit contributions
 okay-0.000
 crank-0.000
 dangerous-0.000
 belong-0.000
 OK-0.000
Token she
Token ID673
Feature activation-1.83e-02

Pos logit contributions
 wanted+0.009
 dangerous+0.009
 okay+0.008
 respected+0.008
 decided+0.008
Neg logit contributions
 Alexa-0.012
ula-0.012
hetically-0.012
ela-0.011
oris-0.011
Token had
Token ID550
Feature activation+2.63e-02

Pos logit contributions
 Alexa+0.036
ela+0.034
 puppet+0.034
 creation+0.033
 poster+0.033
Neg logit contributions
 okay-0.019
 OK-0.018
 waste-0.017
 crank-0.017
 dangerous-0.017
Token with
Token ID351
Feature activation-1.71e-02

Pos logit contributions
 crank+0.030
 wasteful+0.028
 okay+0.027
 dangerous+0.027
 unsafe+0.026
Neg logit contributions
 Alexa-0.046
 creation-0.046
ela-0.045
 circles-0.044
ula-0.044
Token her
Token ID607
Feature activation-8.83e-03

Pos logit contributions
 Alexa+0.025
ula+0.024
 creation+0.023
hetically+0.023
ela+0.023
Neg logit contributions
 crank-0.026
 wanted-0.025
 respected-0.025
 okay-0.025
 quit-0.025
Token or
Token ID393
Feature activation+7.83e-02

Pos logit contributions
 Alexa+0.014
 creation+0.014
ula+0.013
hetically+0.013
 poster+0.013
Neg logit contributions
 crank-0.012
 wanted-0.012
 respected-0.012
 okay-0.012
 avoided-0.012
Tokennam
Token ID7402
Feature activation+1.97e-01

Pos logit contributions
uch+0.111
 crank+0.110
Please+0.105
 okay+0.103
 OK+0.102
Neg logit contributions
 Alexa-0.125
 creation-0.113
 Julie-0.110
 curtain-0.108
 poster-0.105
Tokenents
Token ID658
Feature activation+5.60e-02

Pos logit contributions
 wanted+0.252
 respected+0.245
 okay+0.243
 crank+0.243
 worn+0.242
Neg logit contributions
 Alexa-0.332
ula-0.327
hetically-0.323
ela-0.317
 creation-0.312
Token.
Token ID13
Feature activation+1.04e-02

Pos logit contributions
 wanted+0.064
 okay+0.062
 dangerous+0.062
 respected+0.061
 crank+0.060
Neg logit contributions
 Alexa-0.103
ula-0.101
hetically-0.099
ela-0.098
 creation-0.097
Token<|endoftext|>
Token ID50256
Feature activation-1.91e-02

Pos logit contributions
 wanted+0.012
 okay+0.011
 respected+0.011
 crank+0.011
 dangerous+0.011
Neg logit contributions
 Alexa-0.019
ula-0.019
hetically-0.019
 creation-0.019
ela-0.019
TokenOnce
Token ID7454
Feature activation+2.54e-02

Pos logit contributions
hetically+0.038
ula+0.038
 Alexa+0.038
 creation+0.038
ela+0.037
Neg logit contributions
 wanted-0.019
 lived-0.018
 okay-0.017
?"-0.017
 dangerous-0.017
Token upon
Token ID2402
Feature activation-3.10e-04

Pos logit contributions
 wanted+0.032
 okay+0.031
?"+0.030
 dangerous+0.029
 lived+0.029
Neg logit contributions
hetically-0.047
ula-0.046
 Alexa-0.045
 puppet-0.043
 creation-0.043
Token a
Token ID257
Feature activation+1.69e-02

Pos logit contributions
 crank+0.037
 fought+0.035
 respected+0.035
 okay+0.035
 quit+0.035
Neg logit contributions
 Alexa-0.058
ula-0.058
hetically-0.057
 curtain-0.057
 creation-0.056
Token tree
Token ID5509
Feature activation-1.79e-02

Pos logit contributions
 wanted+0.022
 respected+0.022
 okay+0.022
 crank+0.022
 avoided+0.022
Neg logit contributions
 Alexa-0.027
ula-0.027
ela-0.026
 creation-0.026
hetically-0.026
Token with
Token ID351
Feature activation-8.24e-03

Pos logit contributions
 Alexa+0.035
ula+0.035
hetically+0.034
ela+0.034
 creation+0.034
Neg logit contributions
 wanted-0.019
 dangerous-0.018
 respected-0.018
 okay-0.017
 crank-0.017
Token shiny
Token ID22441
Feature activation+9.81e-03

Pos logit contributions
 Alexa+0.013
ula+0.013
 creation+0.013
hetically+0.013
 curtain+0.013
Neg logit contributions
 crank-0.011
 okay-0.010
 respected-0.010
 avoided-0.010
 quit-0.010
Token or
Token ID393
Feature activation+8.89e-02

Pos logit contributions
 crank+0.017
 okay+0.017
 OK+0.016
 respected+0.016
 avoided+0.016
Neg logit contributions
 creation-0.014
 curtain-0.013
 poster-0.012
 circles-0.012
 curtains-0.011
Tokennam
Token ID7402
Feature activation+1.97e-01

Pos logit contributions
uch+0.128
 avoided+0.123
 respected+0.120
 belong+0.117
 fought+0.117
Neg logit contributions
 Alexa-0.132
 curtain-0.126
 circles-0.123
 Ana-0.122
 poster-0.122
Tokenents
Token ID658
Feature activation+6.31e-02

Pos logit contributions
 wanted+0.242
 worn+0.234
 seen+0.233
 respected+0.232
 okay+0.229
Neg logit contributions
 Alexa-0.346
hetically-0.337
ula-0.336
ela-0.328
 creation-0.322
Token hanging
Token ID10938
Feature activation+1.25e-02

Pos logit contributions
 wanted+0.066
 dangerous+0.061
 decided+0.061
 seen+0.061
 respected+0.061
Neg logit contributions
 Alexa-0.129
ula-0.124
hetically-0.122
ela-0.121
oris-0.118
Token from
Token ID422
Feature activation+5.14e-02

Pos logit contributions
 wanted+0.015
 okay+0.014
 dangerous+0.014
 respected+0.014
 crank+0.014
Neg logit contributions
 Alexa-0.023
ula-0.022
hetically-0.022
ela-0.022
 creation-0.021
Token its
Token ID663
Feature activation+5.23e-02

Pos logit contributions
 crank+0.072
 okay+0.071
 respected+0.069
 quit+0.068
 OK+0.067
Neg logit contributions
 creation-0.078
 Alexa-0.078
 curtain-0.077
 poster-0.076
hetically-0.074
Token branches
Token ID13737
Feature activation+5.96e-02

Pos logit contributions
 crank+0.089
 respected+0.087
 avoided+0.086
 delayed+0.085
 okay+0.084
Neg logit contributions
 creation-0.071
 curtain-0.067
 doorway-0.066
 poster-0.063
hetically-0.062
Token he
Token ID339
Feature activation+3.02e-02

Pos logit contributions
uch+0.006
asure+0.006
aw+0.006
RR+0.006
Fin+0.006
Neg logit contributions
 creation-0.007
hetically-0.007
 doorway-0.007
 accomplishment-0.006
 cheering-0.006
Token especially
Token ID2592
Feature activation+7.01e-02

Pos logit contributions
 okay+0.029
 OK+0.028
Please+0.027
 waste+0.027
 Please+0.026
Neg logit contributions
 creation-0.058
 Alexa-0.058
hetically-0.056
 puppet-0.056
ela-0.055
Token liked
Token ID8288
Feature activation+4.65e-02

Pos logit contributions
asure+0.069
 ours+0.068
uch+0.067
 OK+0.066
 Please+0.064
Neg logit contributions
hetically-0.123
 Alexa-0.119
 creation-0.116
 puppet-0.110
 cheering-0.109
Token the
Token ID262
Feature activation+5.70e-03

Pos logit contributions
 crank+0.063
 okay+0.063
 respected+0.063
 wanted+0.062
 avoided+0.062
Neg logit contributions
 Alexa-0.071
ula-0.070
 creation-0.070
hetically-0.068
ela-0.067
Token or
Token ID393
Feature activation+7.48e-02

Pos logit contributions
 wanted+0.008
 respected+0.008
 avoided+0.008
 okay+0.007
 fought+0.007
Neg logit contributions
 Alexa-0.009
 creation-0.009
hetically-0.009
ula-0.009
 curtain-0.008
Tokennam
Token ID7402
Feature activation+1.97e-01

Pos logit contributions
uch+0.122
w+0.111
asure+0.105
aw+0.099
Please+0.099
Neg logit contributions
 creation-0.103
 Alexa-0.101
 circles-0.099
 curtain-0.098
 cheering-0.095
Tokenents
Token ID658
Feature activation+5.10e-02

Pos logit contributions
 wanted+0.258
 respected+0.250
 crank+0.249
 okay+0.249
 dangerous+0.248
Neg logit contributions
 Alexa-0.327
ula-0.321
hetically-0.315
ela-0.311
 creation-0.305
Token he
Token ID339
Feature activation+3.77e-02

Pos logit contributions
 crank+0.060
 wanted+0.059
 okay+0.059
 dangerous+0.059
 respected+0.058
Neg logit contributions
 Alexa-0.088
ula-0.088
hetically-0.087
 creation-0.086
ela-0.085
Token kept
Token ID4030
Feature activation+3.75e-02

Pos logit contributions
 okay+0.038
 OK+0.038
 crank+0.036
 waste+0.033
 yourselves+0.033
Neg logit contributions
 creation-0.070
 puppet-0.067
 Alexa-0.066
hetically-0.064
ela-0.063
Token around
Token ID1088
Feature activation+5.56e-03

Pos logit contributions
 wanted+0.048
 crank+0.048
 respected+0.047
 quit+0.047
 avoided+0.046
Neg logit contributions
 creation-0.060
 Alexa-0.060
ula-0.060
hetically-0.058
ela-0.057
Token.
Token ID13
Feature activation+3.90e-03

Pos logit contributions
 wanted+0.006
 dangerous+0.006
 okay+0.006
 respected+0.005
 crank+0.005
Neg logit contributions
 Alexa-0.011
ula-0.011
hetically-0.011
ela-0.010
 creation-0.010
Token hill
Token ID12788
Feature activation-4.51e-02

Pos logit contributions
 wanted+0.016
 respected+0.015
 okay+0.015
 crank+0.015
 avoided+0.015
Neg logit contributions
 Alexa-0.016
ula-0.015
 creation-0.015
ela-0.015
hetically-0.015
Token and
Token ID290
Feature activation+2.22e-02

Pos logit contributions
 Alexa+0.090
ula+0.089
hetically+0.086
ela+0.085
 creation+0.084
Neg logit contributions
 wanted-0.049
 decided-0.044
 dangerous-0.044
 respected-0.044
 seen-0.044
Token put
Token ID1234
Feature activation+1.47e-02

Pos logit contributions
 okay+0.022
 OK+0.021
 crank+0.021
 belong+0.019
 dangerous+0.019
Neg logit contributions
 Alexa-0.046
ela-0.044
 Isabel-0.044
 Julie-0.044
 creation-0.043
Token her
Token ID607
Feature activation-4.26e-03

Pos logit contributions
 wanted+0.018
 respected+0.017
 crank+0.017
 okay+0.017
 dangerous+0.017
Neg logit contributions
 Alexa-0.026
ula-0.026
hetically-0.025
ela-0.025
 creation-0.025
Token or
Token ID393
Feature activation+8.01e-02

Pos logit contributions
 creation+0.006
 curtain+0.006
hetically+0.006
 Alexa+0.006
 poster+0.006
Neg logit contributions
 respected-0.006
 crank-0.006
 avoided-0.006
 okay-0.006
 quit-0.006
Tokennam
Token ID7402
Feature activation+1.97e-01

Pos logit contributions
uch+0.106
aw+0.103
Ble+0.097
Fin+0.094
 shouldn+0.094
Neg logit contributions
 Alexa-0.137
 curtain-0.135
 curtains-0.131
 Julie-0.128
 creation-0.127
Tokenents
Token ID658
Feature activation+5.75e-02

Pos logit contributions
 wanted+0.244
 worn+0.238
 seen+0.237
 crank+0.237
 respected+0.237
Neg logit contributions
 Alexa-0.339
ula-0.332
hetically-0.327
ela-0.323
 creation-0.316
Token all
Token ID477
Feature activation+3.08e-02

Pos logit contributions
 wanted+0.058
 okay+0.056
 dangerous+0.055
 decided+0.055
 seen+0.055
Neg logit contributions
 Alexa-0.116
ula-0.112
hetically-0.110
ela-0.108
oris-0.106
Token around
Token ID1088
Feature activation+9.32e-03

Pos logit contributions
 wanted+0.030
 worn+0.029
 seen+0.029
 belong+0.028
 respected+0.027
Neg logit contributions
hetically-0.065
 Alexa-0.065
ula-0.064
oris-0.061
 poster-0.061
Token.
Token ID13
Feature activation+1.39e-02

Pos logit contributions
 wanted+0.007
 like+0.007
 dangerous+0.006
 decided+0.006
 waste+0.006
Neg logit contributions
hetically-0.021
 Alexa-0.021
ula-0.020
oris-0.020
ela-0.020
Token She
Token ID1375
Feature activation+1.98e-03

Pos logit contributions
 wanted+0.018
 okay+0.017
 quit+0.017
 crank+0.017
 respected+0.017
Neg logit contributions
 Alexa-0.024
ula-0.024
hetically-0.023
ela-0.023
 creation-0.023
Token many
Token ID867
Feature activation+5.44e-02

Pos logit contributions
 crank+0.034
 okay+0.034
 wanted+0.034
 respected+0.033
 dangerous+0.033
Neg logit contributions
 Alexa-0.058
ula-0.057
 creation-0.055
hetically-0.055
ela-0.055
Token colors
Token ID7577
Feature activation+2.74e-02

Pos logit contributions
 crank+0.084
uch+0.083
 okay+0.081
 quit+0.080
 stole+0.079
Neg logit contributions
 creation-0.079
 curtain-0.072
hetically-0.071
 circles-0.069
 Alexa-0.069
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
 wanted+0.023
 like+0.022
 decided+0.021
 seen+0.020
?"+0.020
Neg logit contributions
 Alexa-0.061
hetically-0.060
ela-0.059
ula-0.058
oris-0.057
Token Some
Token ID2773
Feature activation+8.28e-03

Pos logit contributions
 wanted+0.014
 respected+0.014
 dangerous+0.014
 crank+0.013
 okay+0.013
Neg logit contributions
 Alexa-0.029
ula-0.029
hetically-0.028
ela-0.027
 creation-0.027
Token or
Token ID393
Feature activation+7.79e-02

Pos logit contributions
 crank+0.013
 OK+0.012
 okay+0.012
uch+0.012
 quit+0.011
Neg logit contributions
hetically-0.011
 creation-0.011
 curtains-0.011
 curtain-0.010
 poster-0.010
Tokennam
Token ID7402
Feature activation+1.97e-01

Pos logit contributions
uch+0.110
 avoided+0.104
 respected+0.104
 crank+0.101
 lived+0.100
Neg logit contributions
 Alexa-0.125
 creation-0.120
 curtains-0.116
 poster-0.116
 curtain-0.116
Tokenents
Token ID658
Feature activation+5.55e-02

Pos logit contributions
 wanted+0.266
 okay+0.259
 respected+0.259
 dangerous+0.258
 crank+0.257
Neg logit contributions
 Alexa-0.316
ula-0.312
hetically-0.306
ela-0.301
 creation-0.298
Token were
Token ID547
Feature activation+2.97e-02

Pos logit contributions
 crank+0.063
Please+0.059
 Please+0.059
 okay+0.059
 dangerous+0.058
Neg logit contributions
 creation-0.100
hetically-0.098
 Alexa-0.098
ula-0.097
ela-0.096
Token round
Token ID2835
Feature activation-4.32e-02

Pos logit contributions
 crank+0.039
Please+0.038
 avoided+0.038
 quit+0.038
 wanted+0.037
Neg logit contributions
 Alexa-0.048
 creation-0.047
ula-0.045
ela-0.045
 poster-0.044
Token,
Token ID11
Feature activation+9.46e-03

Pos logit contributions
 Alexa+0.071
ula+0.070
hetically+0.069
 creation+0.069
ela+0.067
Neg logit contributions
 crank-0.056
 okay-0.056
 respected-0.055
 wanted-0.055
 dangerous-0.055
Token some
Token ID617
Feature activation+4.09e-02

Pos logit contributions
 respected+0.012
 seen+0.012
 crank+0.012
uch+0.012
 quit+0.012
Neg logit contributions
ula-0.016
 Alexa-0.015
 circles-0.015
hetically-0.015
 creation-0.014
Token to
Token ID284
Feature activation-2.06e-03

Pos logit contributions
 wanted+0.033
 dangerous+0.032
 crank+0.031
 respected+0.031
 okay+0.031
Neg logit contributions
 Alexa-0.052
ula-0.051
hetically-0.050
ela-0.050
 creation-0.048
Token look
Token ID804
Feature activation-4.23e-02

Pos logit contributions
 Alexa+0.004
ula+0.004
hetically+0.004
ela+0.004
 creation+0.004
Neg logit contributions
 wanted-0.002
 okay-0.002
 dangerous-0.002
 respected-0.002
 crank-0.002
Token at
Token ID379
Feature activation+8.64e-02

Pos logit contributions
 Alexa+0.092
ula+0.091
ela+0.089
 creation+0.089
hetically+0.088
Neg logit contributions
 wanted-0.034
 respected-0.034
 crank-0.033
 avoided-0.032
 fought-0.032
Token the
Token ID262
Feature activation+9.21e-03

Pos logit contributions
 okay+0.104
 wanted+0.103
 crank+0.103
 respected+0.101
 quit+0.099
Neg logit contributions
 Alexa-0.152
ula-0.148
 creation-0.147
hetically-0.146
ela-0.145
Token or
Token ID393
Feature activation+8.14e-02

Pos logit contributions
 respected+0.014
 okay+0.014
 avoided+0.014
 wanted+0.014
 crank+0.014
Neg logit contributions
 Alexa-0.013
ula-0.013
 creation-0.012
 curtain-0.012
 poster-0.012
Tokennam
Token ID7402
Feature activation+1.96e-01

Pos logit contributions
uch+0.101
 avoided+0.091
asure+0.089
 owe+0.087
 delayed+0.087
Neg logit contributions
 Alexa-0.154
 poster-0.138
 curtain-0.132
 Julie-0.131
 Ana-0.131
Tokenents
Token ID658
Feature activation+5.49e-02

Pos logit contributions
 wanted+0.244
 respected+0.239
 worn+0.236
 crank+0.236
 seen+0.235
Neg logit contributions
 Alexa-0.333
ula-0.327
hetically-0.325
ela-0.319
 creation-0.313
Token on
Token ID319
Feature activation+2.94e-02

Pos logit contributions
 wanted+0.051
 respected+0.049
 okay+0.048
 dangerous+0.048
 decided+0.048
Neg logit contributions
 Alexa-0.114
ula-0.113
ela-0.109
hetically-0.108
oris-0.108
Token the
Token ID262
Feature activation+1.27e-02

Pos logit contributions
 wanted+0.034
 okay+0.033
 dangerous+0.033
 respected+0.033
 crank+0.033
Neg logit contributions
 Alexa-0.053
ula-0.052
hetically-0.051
ela-0.051
 creation-0.050
Token tree
Token ID5509
Feature activation-2.75e-02

Pos logit contributions
 wanted+0.018
 dangerous+0.018
 okay+0.017
 crank+0.017
 respected+0.017
Neg logit contributions
 Alexa-0.020
ula-0.020
ela-0.019
hetically-0.019
 creation-0.018
Token.
Token ID13
Feature activation+1.48e-02

Pos logit contributions
 Alexa+0.045
 creation+0.044
hetically+0.044
ula+0.043
 curtain+0.042
Neg logit contributions
 crank-0.035
 okay-0.035
 avoided-0.034
 quit-0.034
 wanted-0.034
Token can
Token ID460
Feature activation-1.31e-02

Pos logit contributions
 Alexa+0.008
 curtain+0.008
 poster+0.008
ula+0.008
 creation+0.008
Neg logit contributions
 respected-0.005
 avoided-0.005
 crank-0.005
 fought-0.005
Please-0.005
Token we
Token ID356
Feature activation+7.27e-03

Pos logit contributions
 Alexa+0.026
 poster+0.025
 Isabel+0.025
ula+0.025
 Ana+0.024
Neg logit contributions
 respected-0.014
 avoided-0.014
 dangerous-0.014
 seen-0.014
 harmful-0.013
Token have
Token ID423
Feature activation-1.68e-02

Pos logit contributions
 crank+0.010
 avoided+0.010
 dangerous+0.010
 risky+0.010
 wasteful+0.010
Neg logit contributions
 poster-0.011
 creation-0.011
 curtain-0.011
 Alexa-0.010
 Isabel-0.010
Token some
Token ID617
Feature activation+3.14e-02

Pos logit contributions
 creation+0.024
 poster+0.022
 Alexa+0.022
 curtains+0.022
 curtain+0.021
Neg logit contributions
 respected-0.025
 crank-0.025
uch-0.025
 avoided-0.024
Please-0.024
Token or
Token ID393
Feature activation+7.15e-02

Pos logit contributions
 avoided+0.050
 forgot+0.050
 missed+0.050
 crank+0.049
 fought+0.049
Neg logit contributions
 creation-0.045
 curtain-0.043
 poster-0.042
 curtains-0.040
 calendar-0.039
Tokennam
Token ID7402
Feature activation+1.95e-01

Pos logit contributions
 avoided+0.103
 respected+0.102
uch+0.100
 crank+0.098
 resisted+0.094
Neg logit contributions
 Alexa-0.114
 poster-0.111
 curtain-0.110
 curtains-0.109
 creation-0.105
Tokenents
Token ID658
Feature activation+5.13e-02

Pos logit contributions
 wanted+0.265
 okay+0.258
 dangerous+0.257
 respected+0.257
 crank+0.257
Neg logit contributions
 Alexa-0.314
ula-0.309
hetically-0.303
ela-0.299
 creation-0.296
Token for
Token ID329
Feature activation-2.19e-02

Pos logit contributions
 crank+0.052
Please+0.052
 respected+0.051
 wanted+0.051
 wished+0.050
Neg logit contributions
 creation-0.096
 Alexa-0.095
ela-0.094
ula-0.093
 poster-0.093
Token our
Token ID674
Feature activation+3.21e-02

Pos logit contributions
 Alexa+0.036
ula+0.036
 creation+0.034
 curtain+0.034
 poster+0.033
Neg logit contributions
 wanted-0.030
 respected-0.030
 avoided-0.029
 crank-0.029
 forgot-0.029
Token snow
Token ID6729
Feature activation+4.17e-02

Pos logit contributions
 wanted+0.044
 respected+0.043
 okay+0.043
 crank+0.043
 dangerous+0.042
Neg logit contributions
 Alexa-0.050
 creation-0.050
ula-0.049
hetically-0.049
ela-0.048
Tokenman
Token ID805
Feature activation-8.28e-02

Pos logit contributions
 wanted+0.078
 respected+0.078
 dangerous+0.077
 crank+0.077
 okay+0.077
Neg logit contributions
 Alexa-0.045
 creation-0.043
ula-0.043
hetically-0.043
ela-0.041
Token He
Token ID679
Feature activation+3.31e-02

Pos logit contributions
 wanted+0.008
 okay+0.008
 crank+0.008
 dangerous+0.008
 respected+0.008
Neg logit contributions
 Alexa-0.014
ula-0.014
hetically-0.014
ela-0.013
 creation-0.013
Token quickly
Token ID2952
Feature activation-5.16e-02

Pos logit contributions
 okay+0.030
 crank+0.029
 dangerous+0.029
 belong+0.029
 OK+0.028
Neg logit contributions
 Alexa-0.068
ela-0.066
 creation-0.066
ula-0.066
hetically-0.065
Token hung
Token ID9174
Feature activation-1.21e-04

Pos logit contributions
 Alexa+0.106
ula+0.105
 creation+0.103
hetically+0.103
ela+0.103
Neg logit contributions
 okay-0.046
 crank-0.046
 dangerous-0.044
 respected-0.044
 wanted-0.043
Token colorful
Token ID20239
Feature activation-1.38e-02

Pos logit contributions
 Alexa+0.000
hetically+0.000
oris+0.000
ula+0.000
ela+0.000
Neg logit contributions
 okay-0.000
 like-0.000
 wanted-0.000
 dangerous-0.000
 decided-0.000
Token or
Token ID393
Feature activation+7.18e-02

Pos logit contributions
 Alexa+0.021
hetically+0.021
ula+0.021
ela+0.020
oris+0.020
Neg logit contributions
 wanted-0.020
 dangerous-0.019
 waste-0.019
 decided-0.019
 like-0.019
Tokennam
Token ID7402
Feature activation+1.95e-01

Pos logit contributions
uch+0.087
 shouldn+0.079
 lived+0.078
 stole+0.078
 belong+0.078
Neg logit contributions
 Alexa-0.137
 creation-0.122
 poster-0.121
 curtain-0.121
 Julie-0.120
Tokenents
Token ID658
Feature activation+5.01e-02

Pos logit contributions
 wanted+0.220
 seen+0.219
 worn+0.217
 respected+0.211
 crank+0.209
Neg logit contributions
 Alexa-0.352
ula-0.344
hetically-0.343
ela-0.341
 poster-0.330
Token and
Token ID290
Feature activation+1.27e-02

Pos logit contributions
 like+0.048
 wanted+0.046
 decided+0.045
 okay+0.044
 seen+0.043
Neg logit contributions
 Alexa-0.105
ula-0.100
oris-0.100
ela-0.098
 Connie-0.098
Token tw
Token ID665
Feature activation-2.40e-02

Pos logit contributions
 okay+0.016
 crank+0.016
 belong+0.015
 OK+0.015
uch+0.015
Neg logit contributions
 Alexa-0.023
 creation-0.022
 Isabel-0.021
hetically-0.021
 Julie-0.021
Tokenink
Token ID676
Feature activation+1.54e-02

Pos logit contributions
 Alexa+0.035
ula+0.034
hetically+0.034
 curtain+0.032
 poster+0.032
Neg logit contributions
 wanted-0.036
 respected-0.036
 avoided-0.035
 dangerous-0.034
 seen-0.034
Tokenling
Token ID1359
Feature activation-2.16e-02

Pos logit contributions
 wanted+0.024
 okay+0.024
 dangerous+0.024
 crank+0.024
 respected+0.024
Neg logit contributions
 Alexa-0.021
ula-0.021
hetically-0.021
ela-0.020
 creation-0.020
Token beautiful
Token ID4950
Feature activation-2.43e-03

Pos logit contributions
 dangerous+0.008
 waste+0.007
 quit+0.007
 okay+0.007
 wanted+0.007
Neg logit contributions
ula-0.013
hetically-0.013
 Alexa-0.013
ela-0.013
oris-0.012
Token garden
Token ID11376
Feature activation-3.92e-05

Pos logit contributions
ula+0.004
 Alexa+0.004
hetically+0.004
ela+0.004
oris+0.004
Neg logit contributions
 wanted-0.003
 dangerous-0.003
 quit-0.003
 okay-0.003
 waste-0.003
Token with
Token ID351
Feature activation-1.59e-02

Pos logit contributions
 Alexa+0.000
hetically+0.000
ula+0.000
oris+0.000
ela+0.000
Neg logit contributions
 like-0.000
 wanted-0.000
 left-0.000
 waste-0.000
 decided-0.000
Token the
Token ID262
Feature activation+1.05e-02

Pos logit contributions
 Alexa+0.027
ula+0.027
hetically+0.027
ela+0.026
 creation+0.026
Neg logit contributions
 wanted-0.020
 okay-0.019
 dangerous-0.019
 respected-0.019
 crank-0.019
Token or
Token ID393
Feature activation+7.70e-02

Pos logit contributions
 respected+0.015
 crank+0.015
 wanted+0.015
 avoided+0.014
 okay+0.014
Neg logit contributions
 Alexa-0.016
 creation-0.016
hetically-0.016
ula-0.016
 curtain-0.015
Tokennam
Token ID7402
Feature activation+1.93e-01

Pos logit contributions
uch+0.091
 crank+0.078
 lied+0.074
 avoided+0.074
 quit+0.074
Neg logit contributions
 Alexa-0.152
 curtain-0.142
 curtains-0.139
 creation-0.138
 circles-0.138
Tokenents
Token ID658
Feature activation+5.45e-02

Pos logit contributions
 wanted+0.224
 seen+0.219
 worn+0.218
 respected+0.214
 crank+0.212
Neg logit contributions
 Alexa-0.349
hetically-0.343
ula-0.340
ela-0.333
 puppet-0.326
Token.
Token ID13
Feature activation+9.96e-03

Pos logit contributions
 wanted+0.056
 decided+0.052
 belong+0.052
 dangerous+0.052
 okay+0.051
Neg logit contributions
 Alexa-0.111
ula-0.108
hetically-0.106
ela-0.104
oris-0.104
Token\n
Token ID198
Feature activation+1.62e-02

Pos logit contributions
 wanted+0.012
 okay+0.011
 lived+0.010
 quit+0.010
 decided+0.010
Neg logit contributions
hetically-0.018
ela-0.018
ula-0.018
 Alexa-0.018
 creation-0.018
Token\n
Token ID198
Feature activation+1.65e-02

Pos logit contributions
 wanted+0.012
 lived+0.011
?"+0.011
 respected+0.010
 okay+0.010
Neg logit contributions
hetically-0.038
 creation-0.037
 Alexa-0.036
ula-0.036
 poster-0.035
TokenThe
Token ID464
Feature activation+1.77e-02

Pos logit contributions
 wanted+0.016
 okay+0.016
 dangerous+0.016
 respected+0.016
 crank+0.016
Neg logit contributions
 Alexa-0.033
ula-0.033
hetically-0.032
ela-0.032
 creation-0.031
Token to
Token ID284
Feature activation-1.87e-03

Pos logit contributions
 wanted+0.017
 okay+0.016
 respected+0.016
 dangerous+0.016
 decided+0.015
Neg logit contributions
 Alexa-0.033
ula-0.033
hetically-0.033
ela-0.032
 creation-0.031
Token help
Token ID1037
Feature activation-8.14e-02

Pos logit contributions
 Alexa+0.004
ula+0.004
 creation+0.003
ela+0.003
hetically+0.003
Neg logit contributions
 dangerous-0.002
 crank-0.002
 respected-0.002
 wanted-0.002
 avoided-0.002
Token sort
Token ID3297
Feature activation-3.76e-02

Pos logit contributions
hetically+0.162
 Alexa+0.159
ela+0.155
ula+0.155
 creation+0.154
Neg logit contributions
 okay-0.083
 wanted-0.081
 dangerous-0.080
 waste-0.080
 quit-0.079
Token the
Token ID262
Feature activation+4.66e-03

Pos logit contributions
 Alexa+0.074
hetically+0.073
ula+0.071
ela+0.070
 creation+0.069
Neg logit contributions
 waste-0.039
 dangerous-0.038
 wanted-0.038
 okay-0.037
 like-0.037
Token or
Token ID393
Feature activation+7.26e-02

Pos logit contributions
 avoided+0.007
 respected+0.007
 crank+0.007
 okay+0.007
 wanted+0.007
Neg logit contributions
 creation-0.007
 Alexa-0.007
ula-0.007
 curtain-0.007
 poster-0.006
Tokennam
Token ID7402
Feature activation+1.93e-01

Pos logit contributions
uch+0.104
aw+0.091
w+0.089
 crank+0.084
Ble+0.084
Neg logit contributions
 Alexa-0.122
 creation-0.118
 Julie-0.109
 instructions-0.109
 poster-0.106
Tokenents
Token ID658
Feature activation+4.74e-02

Pos logit contributions
 wanted+0.252
 respected+0.245
 okay+0.242
 crank+0.241
 worn+0.241
Neg logit contributions
 Alexa-0.319
ula-0.314
hetically-0.311
ela-0.304
 creation-0.299
Token.
Token ID13
Feature activation+7.07e-03

Pos logit contributions
 wanted+0.051
 okay+0.049
 dangerous+0.049
 respected+0.049
 decided+0.048
Neg logit contributions
 Alexa-0.091
ula-0.089
hetically-0.087
ela-0.086
 creation-0.085
Token She
Token ID1375
Feature activation-8.95e-03

Pos logit contributions
 wanted+0.009
 okay+0.009
 decided+0.008
Please+0.008
 quit+0.008
Neg logit contributions
ula-0.012
hetically-0.012
ela-0.012
 Alexa-0.012
 creation-0.012
Token picked
Token ID6497
Feature activation-1.39e-02

Pos logit contributions
 Alexa+0.019
ula+0.019
 creation+0.019
ela+0.019
hetically+0.019
Neg logit contributions
 okay-0.007
 crank-0.007
 dangerous-0.007
 belong-0.007
 OK-0.007
Token up
Token ID510
Feature activation-4.88e-02

Pos logit contributions
ula+0.026
 Alexa+0.026
hetically+0.025
ela+0.025
oris+0.024
Neg logit contributions
 wanted-0.016
 okay-0.015
 dangerous-0.015
 decided-0.015
 waste-0.014
Token on
Token ID319
Feature activation+2.81e-02

Pos logit contributions
 wanted+0.057
 dangerous+0.053
 respected+0.053
 seen+0.052
 decided+0.052
Neg logit contributions
 Alexa-0.108
ula-0.105
hetically-0.103
ela-0.102
oris-0.099
Token it
Token ID340
Feature activation+2.17e-02

Pos logit contributions
 wanted+0.035
 dangerous+0.034
 respected+0.033
 okay+0.033
 crank+0.033
Neg logit contributions
 Alexa-0.049
ula-0.049
hetically-0.048
ela-0.047
 creation-0.046
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
 crank+0.027
 okay+0.027
 respected+0.027
 dangerous+0.026
 avoided+0.026
Neg logit contributions
 Alexa-0.036
ula-0.036
 creation-0.035
hetically-0.035
ela-0.034
Token The
Token ID383
Feature activation+3.10e-02

Pos logit contributions
 respected+0.020
 belong+0.019
 dangerous+0.019
 fought+0.019
 avoided+0.019
Neg logit contributions
ula-0.024
 Alexa-0.023
ela-0.021
 Isabel-0.021
 Ana-0.021
Token or
Token ID393
Feature activation+7.73e-02

Pos logit contributions
 respected+0.048
 avoided+0.048
 seen+0.048
 crank+0.048
 okay+0.048
Neg logit contributions
 creation-0.041
 poster-0.041
hetically-0.040
 curtain-0.040
 Alexa-0.040
Tokennam
Token ID7402
Feature activation+1.92e-01

Pos logit contributions
 okay+0.071
 quit+0.071
 avoided+0.071
 respected+0.071
 crank+0.070
Neg logit contributions
 Alexa-0.156
hetically-0.148
 poster-0.146
 creation-0.146
 curtain-0.146
Tokenents
Token ID658
Feature activation+5.35e-02

Pos logit contributions
 wanted+0.241
 respected+0.232
 seen+0.231
 worn+0.230
 crank+0.229
Neg logit contributions
 Alexa-0.330
ula-0.324
hetically-0.321
ela-0.314
 creation-0.309
Token were
Token ID547
Feature activation+2.99e-02

Pos logit contributions
 crank+0.055
 okay+0.053
 Please+0.052
 dangerous+0.051
 OK+0.051
Neg logit contributions
 creation-0.102
hetically-0.101
 Alexa-0.098
ela-0.098
ula-0.097
Token shiny
Token ID22441
Feature activation+4.50e-03

Pos logit contributions
 crank+0.036
 wanted+0.035
 okay+0.035
 respected+0.035
 avoided+0.035
Neg logit contributions
 Alexa-0.051
ula-0.049
 creation-0.049
hetically-0.048
ela-0.048
Token and
Token ID290
Feature activation+1.63e-02

Pos logit contributions
 okay+0.006
 crank+0.006
 respected+0.006
 quit+0.006
 avoided+0.006
Neg logit contributions
 creation-0.007
 Alexa-0.007
hetically-0.007
ula-0.007
 poster-0.006
Token colorful
Token ID20239
Feature activation-7.54e-03

Pos logit contributions
 okay+0.024
 crank+0.024
uch+0.023
 OK+0.023
 avoided+0.023
Neg logit contributions
 Alexa-0.024
 creation-0.023
hetically-0.022
 poster-0.022
 Ana-0.021
Token the
Token ID262
Feature activation+9.00e-03

Pos logit contributions
 Alexa+0.082
ula+0.081
hetically+0.081
ela+0.079
 creation+0.078
Neg logit contributions
 wanted-0.037
 okay-0.036
 dangerous-0.036
 respected-0.035
 crank-0.035
Token tree
Token ID5509
Feature activation-2.91e-02

Pos logit contributions
 respected+0.015
 avoided+0.015
 fought+0.015
 crank+0.015
 wanted+0.015
Neg logit contributions
 creation-0.012
 curtain-0.011
 poster-0.011
 Alexa-0.011
hetically-0.011
Token with
Token ID351
Feature activation-1.66e-02

Pos logit contributions
 Alexa+0.056
ula+0.055
hetically+0.054
 creation+0.054
ela+0.054
Neg logit contributions
 wanted-0.030
 respected-0.029
 okay-0.029
 dangerous-0.029
 crank-0.029
Token regular
Token ID3218
Feature activation+2.04e-02

Pos logit contributions
 creation+0.026
ula+0.025
 curtain+0.025
 Alexa+0.025
 circles+0.024
Neg logit contributions
 crank-0.023
 respected-0.023
 avoided-0.023
 missed-0.022
 fought-0.022
Token or
Token ID393
Feature activation+7.41e-02

Pos logit contributions
 crank+0.032
 okay+0.031
 respected+0.031
 avoided+0.031
 dangerous+0.031
Neg logit contributions
 creation-0.029
 curtain-0.027
 Alexa-0.027
hetically-0.026
 poster-0.026
Tokennam
Token ID7402
Feature activation+1.92e-01

Pos logit contributions
uch+0.094
 avoided+0.089
 respected+0.086
Fin+0.086
 stole+0.084
Neg logit contributions
 Alexa-0.130
 creation-0.123
 poster-0.117
 curtains-0.117
 curtain-0.116
Tokenents
Token ID658
Feature activation+5.09e-02

Pos logit contributions
 wanted+0.249
 okay+0.242
 respected+0.241
 crank+0.240
 dangerous+0.240
Neg logit contributions
 Alexa-0.319
ula-0.314
hetically-0.310
ela-0.304
 creation-0.299
Token!"
Token ID2474
Feature activation-8.82e-03

Pos logit contributions
 wanted+0.052
 respected+0.051
 crank+0.051
 okay+0.051
 dangerous+0.051
Neg logit contributions
 Alexa-0.097
ula-0.096
 creation-0.095
ela-0.094
hetically-0.094
Token<|endoftext|>
Token ID50256
Feature activation-1.87e-02

Pos logit contributions
ela+0.016
 Alexa+0.016
ula+0.016
hetically+0.016
 creation+0.016
Neg logit contributions
 wanted-0.010
 okay-0.010
 decided-0.010
 lived-0.010
 respected-0.009
TokenOnce
Token ID7454
Feature activation+2.55e-02

Pos logit contributions
hetically+0.037
ula+0.037
 Alexa+0.037
 creation+0.036
ela+0.036
Neg logit contributions
 wanted-0.019
 okay-0.018
 dangerous-0.017
?"-0.017
 lived-0.017
Token upon
Token ID2402
Feature activation-1.67e-03

Pos logit contributions
 wanted+0.032
?"+0.031
 okay+0.031
 dangerous+0.029
 lived+0.029
Neg logit contributions
hetically-0.048
ula-0.046
 Alexa-0.045
 puppet-0.044
 poster-0.043
Token and
Token ID290
Feature activation-1.28e-02

Pos logit contributions
 Alexa+0.163
ula+0.156
hetically+0.155
ela+0.155
 creation+0.154
Neg logit contributions
 okay-0.097
 dangerous-0.097
 crank-0.096
 belong-0.094
 wanted-0.093
Token Cass
Token ID14154
Feature activation-1.04e-01

Pos logit contributions
hetically+0.016
oming+0.015
ula+0.015
 creation+0.014
ipping+0.013
Neg logit contributions
 wanted-0.025
 decided-0.023
 quit-0.023
 asked-0.023
 lived-0.023
Tokenie
Token ID494
Feature activation-7.46e-02

Pos logit contributions
ula+0.141
 Alexa+0.141
hetically+0.135
 creation+0.131
ela+0.130
Neg logit contributions
 wanted-0.172
 respected-0.167
 decided-0.163
 dangerous-0.162
 quit-0.161
Token celebrate
Token ID10648
Feature activation+2.43e-03

Pos logit contributions
 Alexa+0.164
ula+0.161
hetically+0.160
ela+0.158
 creation+0.158
Neg logit contributions
 okay-0.055
 crank-0.055
 dangerous-0.054
 wanted-0.054
 respected-0.052
Token their
Token ID511
Feature activation-1.13e-02

Pos logit contributions
 wanted+0.003
 crank+0.003
 okay+0.003
 dangerous+0.003
 respected+0.003
Neg logit contributions
 Alexa-0.004
ula-0.004
hetically-0.004
ela-0.004
 creation-0.004
Token accomplishment
Token ID29340
Feature activation-2.08e-01

Pos logit contributions
 Alexa+0.020
ula+0.020
hetically+0.019
 creation+0.019
ela+0.019
Neg logit contributions
 wanted-0.014
 crank-0.013
 okay-0.013
 respected-0.013
 dangerous-0.013
Token of
Token ID286
Feature activation-6.01e-02

Pos logit contributions
 Alexa+0.405
ula+0.396
hetically+0.388
ela+0.384
 creation+0.377
Neg logit contributions
 wanted-0.224
 respected-0.212
 okay-0.210
 dangerous-0.207
 avoided-0.204
Token finding
Token ID4917
Feature activation-7.65e-02

Pos logit contributions
 creation+0.091
hetically+0.081
 curtain+0.079
 poster+0.078
 doorway+0.077
Neg logit contributions
 crank-0.092
 Please-0.087
uch-0.086
Please-0.085
 quit-0.083
Token the
Token ID262
Feature activation-1.90e-02

Pos logit contributions
 Alexa+0.130
 creation+0.129
ula+0.128
hetically+0.126
ela+0.124
Neg logit contributions
 crank-0.094
 wanted-0.092
 okay-0.089
Please-0.089
 respected-0.089
Token special
Token ID2041
Feature activation-1.86e-02

Pos logit contributions
 Alexa+0.032
ula+0.032
hetically+0.032
 creation+0.031
ela+0.031
Neg logit contributions
 wanted-0.024
 crank-0.023
 respected-0.023
 okay-0.023
 dangerous-0.023
Token pie
Token ID2508
Feature activation+2.62e-03

Pos logit contributions
 creation+0.029
ula+0.027
hetically+0.026
 Alexa+0.026
 curtain+0.026
Neg logit contributions
 crank-0.027
Please-0.026
 respected-0.025
 wanted-0.025
 worn-0.025
Token proud
Token ID6613
Feature activation-1.78e-02

Pos logit contributions
 Alexa+0.022
ula+0.021
hetically+0.021
 creation+0.021
ela+0.020
Neg logit contributions
 dangerous-0.021
 wanted-0.020
 okay-0.020
 respected-0.020
 crank-0.020
Token of
Token ID286
Feature activation-5.88e-02

Pos logit contributions
 Alexa+0.029
hetically+0.028
ula+0.028
ela+0.027
 creation+0.027
Neg logit contributions
 wanted-0.025
 okay-0.024
 dangerous-0.024
 respected-0.024
 crank-0.024
Token himself
Token ID2241
Feature activation-1.99e-02

Pos logit contributions
 Alexa+0.104
ula+0.103
hetically+0.102
ela+0.100
 creation+0.098
Neg logit contributions
 wanted-0.071
 crank-0.071
 dangerous-0.069
 okay-0.068
 respected-0.067
Token and
Token ID290
Feature activation-1.32e-02

Pos logit contributions
 Alexa+0.046
hetically+0.045
 poster+0.045
ela+0.045
 doorway+0.045
Neg logit contributions
 okay-0.016
 like-0.014
 wanted-0.014
 because-0.014
 OK-0.013
Token his
Token ID465
Feature activation+8.35e-03

Pos logit contributions
 Alexa+0.024
 creation+0.024
hetically+0.024
 puppet+0.023
ula+0.023
Neg logit contributions
Please-0.014
 crank-0.014
 okay-0.014
 dangerous-0.013
 harmful-0.013
Token accomplishment
Token ID29340
Feature activation-2.08e-01

Pos logit contributions
 wanted+0.010
 dangerous+0.009
 okay+0.009
 crank+0.009
 quit+0.009
Neg logit contributions
 Alexa-0.016
ula-0.015
hetically-0.015
ela-0.015
oris-0.014
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
 Alexa+0.398
ula+0.394
hetically+0.387
ela+0.382
 creation+0.380
Neg logit contributions
 wanted-0.218
 okay-0.212
 dangerous-0.211
 respected-0.210
 crank-0.210
Token<|endoftext|>
Token ID50256
Feature activation-4.18e-02

Pos logit contributions
 Alexa+0.023
ula+0.022
hetically+0.022
ela+0.022
 creation+0.021
Neg logit contributions
 dangerous-0.017
 crank-0.017
 okay-0.016
 wanted-0.016
 respected-0.016
TokenOnce
Token ID7454
Feature activation+4.71e-03

Pos logit contributions
hetically+0.090
 Alexa+0.088
 creation+0.087
ula+0.087
ela+0.086
Neg logit contributions
 wanted-0.038
?"-0.035
 lived-0.035
 okay-0.034
 decided-0.033
Token upon
Token ID2402
Feature activation-2.84e-02

Pos logit contributions
 wanted+0.006
 okay+0.006
?"+0.006
 dangerous+0.006
 lived+0.005
Neg logit contributions
hetically-0.009
 Alexa-0.008
ula-0.008
 poster-0.008
 puppet-0.008
Token a
Token ID257
Feature activation+1.57e-03

Pos logit contributions
 Alexa+0.052
ula+0.050
 creation+0.049
hetically+0.048
 puppet+0.048
Neg logit contributions
 respected-0.030
 crank-0.030
 fought-0.030
 avoided-0.029
 dangerous-0.029
Token baby
Token ID5156
Feature activation+5.69e-03

Pos logit contributions
ula+0.032
ela+0.032
 Alexa+0.031
hetically+0.030
oris+0.030
Neg logit contributions
 wanted-0.021
 okay-0.021
 dangerous-0.021
 crank-0.020
 quit-0.020
Token,
Token ID11
Feature activation-2.07e-02

Pos logit contributions
 wanted+0.006
 respected+0.006
 decided+0.006
 okay+0.006
 quit+0.006
Neg logit contributions
ula-0.011
 Alexa-0.011
ela-0.011
hetically-0.011
 poster-0.010
Token celebrating
Token ID17499
Feature activation-1.90e-02

Pos logit contributions
 Alexa+0.037
ula+0.037
hetically+0.036
ela+0.036
 creation+0.035
Neg logit contributions
 wanted-0.025
 okay-0.024
 respected-0.024
 dangerous-0.024
 crank-0.024
Token their
Token ID511
Feature activation-1.27e-02

Pos logit contributions
 Alexa+0.035
ula+0.035
hetically+0.035
ela+0.034
 creation+0.034
Neg logit contributions
 wanted-0.021
 okay-0.021
 respected-0.020
 dangerous-0.020
 crank-0.020
Token special
Token ID2041
Feature activation-1.63e-02

Pos logit contributions
 Alexa+0.022
ula+0.022
ela+0.022
hetically+0.021
oris+0.020
Neg logit contributions
 wanted-0.017
 dangerous-0.017
 okay-0.017
 quit-0.016
 waste-0.016
Token accomplishment
Token ID29340
Feature activation-2.07e-01

Pos logit contributions
 Alexa+0.025
ula+0.025
ela+0.024
hetically+0.024
 Isabel+0.023
Neg logit contributions
 wanted-0.025
 dangerous-0.024
 okay-0.024
 quit-0.023
 waste-0.023
Token.
Token ID13
Feature activation-2.14e-02

Pos logit contributions
 Alexa+0.415
ula+0.404
hetically+0.401
ela+0.395
 poster+0.385
Neg logit contributions
 wanted-0.214
 respected-0.201
 okay-0.198
 dangerous-0.196
 decided-0.192
Token Then
Token ID3244
Feature activation+2.54e-02

Pos logit contributions
ela+0.044
hetically+0.043
ula+0.043
 creation+0.042
 Alexa+0.042
Neg logit contributions
 wanted-0.021
 okay-0.020
 lived-0.019
 respected-0.019
 OK-0.019
Token,
Token ID11
Feature activation-2.30e-02

Pos logit contributions
 crank+0.033
 wanted+0.033
 dangerous+0.033
 respected+0.032
 seen+0.032
Neg logit contributions
 Alexa-0.043
ula-0.041
 creation-0.040
hetically-0.040
ela-0.040
Token they
Token ID484
Feature activation-4.44e-02

Pos logit contributions
 Alexa+0.036
ula+0.034
 creation+0.033
hetically+0.032
ela+0.032
Neg logit contributions
 crank-0.033
 wanted-0.033
 seen-0.033
 dangerous-0.033
 respected-0.032
Token had
Token ID550
Feature activation-4.65e-03

Pos logit contributions
 Alexa+0.089
 creation+0.087
hetically+0.085
ela+0.084
 curtain+0.083
Neg logit contributions
 crank-0.045
 dangerous-0.044
 okay-0.043
 OK-0.040
 unsafe-0.040
Token correctly
Token ID9380
Feature activation-1.97e-02

Pos logit contributions
 Alexa+0.068
ula+0.067
hetically+0.066
ela+0.066
 creation+0.065
Neg logit contributions
 wanted-0.038
 okay-0.037
 dangerous-0.037
 respected-0.036
 crank-0.036
Token.
Token ID13
Feature activation-1.20e-02

Pos logit contributions
 Alexa+0.040
hetically+0.039
ula+0.039
ela+0.038
 creation+0.037
Neg logit contributions
 wanted-0.020
 okay-0.018
 dangerous-0.018
 decided-0.018
 respected-0.018
Token She
Token ID1375
Feature activation-2.61e-02

Pos logit contributions
hetically+0.022
ula+0.022
ela+0.022
 Alexa+0.022
 creation+0.022
Neg logit contributions
 wanted-0.014
 okay-0.013
 lived-0.012
 crank-0.012
 worn-0.012
Token smiled
Token ID13541
Feature activation-4.37e-02

Pos logit contributions
 Alexa+0.053
ula+0.052
 creation+0.052
hetically+0.051
ela+0.051
Neg logit contributions
 okay-0.024
 crank-0.024
 dangerous-0.023
 belong-0.023
 OK-0.022
Token with
Token ID351
Feature activation-4.12e-02

Pos logit contributions
ula+0.086
 Alexa+0.086
ela+0.085
hetically+0.085
 poster+0.082
Neg logit contributions
 okay-0.046
 wanted-0.046
 decided-0.043
 respected-0.043
 like-0.042
Token accomplishment
Token ID29340
Feature activation-2.06e-01

Pos logit contributions
ula+0.072
ela+0.072
 Alexa+0.071
hetically+0.071
 creation+0.069
Neg logit contributions
 wanted-0.051
 okay-0.050
 respected-0.050
 belong-0.048
 dangerous-0.048
Token and
Token ID290
Feature activation-1.44e-02

Pos logit contributions
 Alexa+0.374
ula+0.366
hetically+0.360
ela+0.357
 creation+0.353
Neg logit contributions
 wanted-0.246
 okay-0.237
 respected-0.234
 dangerous-0.233
 crank-0.227
Token hugged
Token ID41130
Feature activation-4.01e-03

Pos logit contributions
 Alexa+0.028
ula+0.027
hetically+0.027
ela+0.027
 creation+0.027
Neg logit contributions
 dangerous-0.015
 crank-0.015
 okay-0.014
 belong-0.014
Please-0.014
Token her
Token ID607
Feature activation-3.66e-02

Pos logit contributions
 Alexa+0.007
ula+0.007
hetically+0.007
ela+0.007
 creation+0.007
Neg logit contributions
 okay-0.005
 wanted-0.005
 respected-0.005
 quit-0.005
 dangerous-0.005
Token mom
Token ID1995
Feature activation-4.16e-03

Pos logit contributions
ela+0.085
ula+0.085
 Alexa+0.085
pill+0.083
hetically+0.082
Neg logit contributions
 okay-0.028
 wanted-0.025
 quit-0.024
 OK-0.023
 dangerous-0.023
Tokenmy
Token ID1820
Feature activation-3.09e-02

Pos logit contributions
ula+0.009
 Alexa+0.009
 poster+0.009
ela+0.009
 creation+0.008
Neg logit contributions
 okay-0.004
 wanted-0.004
 like-0.004
 OK-0.004
 decided-0.004
Token smiled
Token ID13541
Feature activation-3.95e-02

Pos logit contributions
 okay+0.023
 belong+0.023
 OK+0.022
 crank+0.022
 waste+0.022
Neg logit contributions
 Alexa-0.029
 creation-0.028
 puppet-0.028
hetically-0.027
 Isabel-0.027
Token,
Token ID11
Feature activation-1.47e-02

Pos logit contributions
 Alexa+0.081
ula+0.081
ela+0.079
hetically+0.079
 poster+0.077
Neg logit contributions
 wanted-0.039
 okay-0.039
 like-0.036
 decided-0.036
 respected-0.036
Token proud
Token ID6613
Feature activation-1.53e-02

Pos logit contributions
 Alexa+0.024
hetically+0.023
 creation+0.022
ula+0.022
ela+0.022
Neg logit contributions
 belong-0.020
 dangerous-0.019
 waste-0.019
 crank-0.019
 quit-0.019
Token of
Token ID286
Feature activation-5.53e-02

Pos logit contributions
 Alexa+0.026
ula+0.025
hetically+0.025
ela+0.024
 creation+0.024
Neg logit contributions
 wanted-0.021
 okay-0.020
 respected-0.020
 dangerous-0.019
 quit-0.019
Token his
Token ID465
Feature activation+9.75e-03

Pos logit contributions
 Alexa+0.109
ula+0.108
hetically+0.106
ela+0.105
 creation+0.104
Neg logit contributions
 wanted-0.057
 okay-0.055
 respected-0.055
 dangerous-0.055
 crank-0.053
Token accomplishment
Token ID29340
Feature activation-2.03e-01

Pos logit contributions
 okay+0.013
 dangerous+0.013
 wanted+0.013
 quit+0.013
 waste+0.012
Neg logit contributions
 Alexa-0.017
ula-0.017
ela-0.016
hetically-0.016
oris-0.015
Token,
Token ID11
Feature activation-1.62e-02

Pos logit contributions
 Alexa+0.409
ula+0.394
hetically+0.386
ela+0.384
 poster+0.379
Neg logit contributions
 wanted-0.213
 respected-0.197
 okay-0.196
 dangerous-0.191
 kept-0.191
Token and
Token ID290
Feature activation-1.19e-02

Pos logit contributions
 Alexa+0.026
ula+0.026
hetically+0.026
 creation+0.026
ela+0.025
Neg logit contributions
 wanted-0.021
 respected-0.021
 crank-0.020
 belong-0.020
 okay-0.020
Token admired
Token ID29382
Feature activation-1.95e-02

Pos logit contributions
 Alexa+0.020
hetically+0.020
 creation+0.020
 puppet+0.019
ela+0.019
Neg logit contributions
 crank-0.014
Please-0.014
uch-0.014
 waste-0.014
 okay-0.014
Token his
Token ID465
Feature activation+7.11e-03

Pos logit contributions
 Alexa+0.043
ula+0.042
hetically+0.042
ela+0.042
 poster+0.041
Neg logit contributions
 wanted-0.015
 dangerous-0.015
 okay-0.015
 respected-0.015
 like-0.013
Token reflection
Token ID14580
Feature activation+3.03e-03

Pos logit contributions
 wanted+0.010
 dangerous+0.010
 okay+0.010
 quit+0.009
 crank+0.009
Neg logit contributions
ula-0.012
 Alexa-0.012
hetically-0.011
ela-0.011
oris-0.011
Token hard
Token ID1327
Feature activation-5.86e-02

Pos logit contributions
ula+0.056
ela+0.054
oming+0.054
 Alexa+0.052
 doorway+0.051
Neg logit contributions
 okay-0.035
 wanted-0.033
 dangerous-0.033
 respected-0.031
 quit-0.031
Token work
Token ID670
Feature activation-8.10e-02

Pos logit contributions
ula+0.105
 Alexa+0.104
ela+0.103
hetically+0.098
 curtain+0.097
Neg logit contributions
 wanted-0.075
 respected-0.072
 quit-0.071
 okay-0.070
 dangerous-0.069
Token.
Token ID13
Feature activation-2.10e-02

Pos logit contributions
 Alexa+0.180
ula+0.178
ela+0.175
oris+0.174
 poster+0.173
Neg logit contributions
 wanted-0.066
 okay-0.063
 respected-0.060
 like-0.058
 dangerous-0.058
Token What
Token ID1867
Feature activation-2.81e-02

Pos logit contributions
ela+0.038
 creation+0.038
hetically+0.038
ula+0.038
 Alexa+0.037
Neg logit contributions
 wanted-0.024
 respected-0.023
 okay-0.023
 lived-0.022
 avoided-0.022
Token an
Token ID281
Feature activation-1.75e-02

Pos logit contributions
 Alexa+0.056
ula+0.053
 creation+0.053
hetically+0.052
ela+0.052
Neg logit contributions
 crank-0.028
 wanted-0.027
Please-0.027
 respected-0.027
 quit-0.026
Token accomplishment
Token ID29340
Feature activation-2.02e-01

Pos logit contributions
ula+0.031
 Alexa+0.030
hetically+0.030
 creation+0.030
ela+0.029
Neg logit contributions
 wanted-0.021
 crank-0.020
 respected-0.020
 worn-0.020
 dangerous-0.020
Token!
Token ID0
Feature activation-6.79e-02

Pos logit contributions
 Alexa+0.366
ula+0.365
ela+0.360
 creation+0.358
hetically+0.351
Neg logit contributions
 crank-0.224
 okay-0.216
 dangerous-0.216
 respected-0.211
 avoided-0.208
Token<|endoftext|>
Token ID50256
Feature activation-3.91e-02

Pos logit contributions
 Alexa+0.124
ula+0.122
hetically+0.119
ela+0.118
 creation+0.117
Neg logit contributions
 wanted-0.078
 okay-0.076
 dangerous-0.076
 respected-0.076
 crank-0.076
TokenOnce
Token ID7454
Feature activation+7.31e-03

Pos logit contributions
hetically+0.074
 Alexa+0.073
ula+0.073
 creation+0.072
ela+0.072
Neg logit contributions
 wanted-0.043
 okay-0.040
 lived-0.040
 dangerous-0.039
?"-0.039
Token there
Token ID612
Feature activation+2.33e-02

Pos logit contributions
 wanted+0.009
 okay+0.009
?"+0.009
 dangerous+0.009
 lived+0.009
Neg logit contributions
hetically-0.013
 Alexa-0.013
ula-0.013
 puppet-0.012
 creation-0.012
Token was
Token ID373
Feature activation-1.56e-03

Pos logit contributions
 crank+0.029
 dangerous+0.028
Please+0.027
 okay+0.027
 respected+0.027
Neg logit contributions
 Alexa-0.039
ela-0.038
hetically-0.038
 creation-0.038
ula-0.038
Token They
Token ID1119
Feature activation-6.01e-03

Pos logit contributions
ula+0.015
 Alexa+0.014
 Isabel+0.014
 Tracy+0.013
 Ana+0.013
Neg logit contributions
 belong-0.015
uch-0.014
 respected-0.014
 dangerous-0.014
 stole-0.014
Token make
Token ID787
Feature activation-6.42e-02

Pos logit contributions
 Alexa+0.010
 poster+0.009
hetically+0.009
 curtain+0.009
 Isabel+0.009
Neg logit contributions
 okay-0.008
 crank-0.007
 dangerous-0.007
 worn-0.007
 OK-0.007
Token many
Token ID867
Feature activation+2.90e-02

Pos logit contributions
hetically+0.146
 Alexa+0.142
ula+0.141
ela+0.140
oris+0.139
Neg logit contributions
 dangerous-0.050
 wanted-0.048
 okay-0.048
 like-0.045
 quit-0.045
Token z
Token ID1976
Feature activation-8.93e-02

Pos logit contributions
 dangerous+0.035
 wanted+0.034
 waste+0.032
 like+0.031
 seen+0.031
Neg logit contributions
ula-0.055
hetically-0.055
 Alexa-0.053
ela-0.052
oris-0.051
Tokenig
Token ID328
Feature activation-9.95e-02

Pos logit contributions
 doorway+0.120
hetically+0.117
 Alexa+0.115
ela+0.110
ula+0.110
Neg logit contributions
uch-0.145
?"-0.137
 fought-0.135
 okay-0.135
 yourself-0.133
Tokenz
Token ID89
Feature activation-2.01e-01

Pos logit contributions
joy+0.150
 doorway+0.147
 Alexa+0.137
 Tracy+0.136
ela+0.135
Neg logit contributions
 belong-0.149
 like-0.148
 did-0.146
 when-0.144
 wanted-0.142
Tokenags
Token ID3775
Feature activation-2.03e-02

Pos logit contributions
 Alexa+0.358
hetically+0.343
ula+0.339
ela+0.338
 doorway+0.334
Neg logit contributions
 okay-0.242
?"-0.236
 quit-0.230
 did-0.227
 wanted-0.226
Token and
Token ID290
Feature activation-1.15e-02

Pos logit contributions
 Alexa+0.040
hetically+0.040
ula+0.040
ela+0.039
oris+0.037
Neg logit contributions
 wanted-0.021
 dangerous-0.020
 respected-0.020
 like-0.019
 lived-0.019
Token circles
Token ID13332
Feature activation-1.66e-01

Pos logit contributions
 Alexa+0.020
 creation+0.018
 puppet+0.018
 poster+0.018
 Isabel+0.018
Neg logit contributions
 okay-0.015
 crank-0.014
uch-0.014
 respected-0.014
 delayed-0.014
Token and
Token ID290
Feature activation-1.28e-02

Pos logit contributions
 Alexa+0.375
ela+0.372
oris+0.365
 doorway+0.364
hetically+0.363
Neg logit contributions
 like-0.138
 okay-0.122
 wanted-0.118
 did-0.112
 decided-0.104
Token lines
Token ID3951
Feature activation-4.42e-02

Pos logit contributions
 Alexa+0.020
 creation+0.018
hetically+0.018
ula+0.018
 poster+0.018
Neg logit contributions
 okay-0.019
 crank-0.019
 dangerous-0.018
 respected-0.018
 wanted-0.018
Token proudly
Token ID23017
Feature activation-2.14e-02

Pos logit contributions
 Alexa+0.033
 creation+0.032
ula+0.031
 puppet+0.031
ela+0.031
Neg logit contributions
 okay-0.018
 crank-0.017
 belong-0.017
 dangerous-0.017
 OK-0.017
Token showed
Token ID3751
Feature activation-2.48e-02

Pos logit contributions
 Alexa+0.043
ula+0.042
hetically+0.042
 creation+0.041
ela+0.041
Neg logit contributions
 okay-0.020
 wanted-0.020
 crank-0.020
 dangerous-0.020
 respected-0.020
Token her
Token ID607
Feature activation-2.26e-02

Pos logit contributions
hetically+0.048
ula+0.048
ela+0.047
 Alexa+0.047
 poster+0.046
Neg logit contributions
 wanted-0.026
 okay-0.026
 quit-0.025
 dangerous-0.024
 decided-0.024
Token mom
Token ID1995
Feature activation+4.24e-03

Pos logit contributions
hetically+0.045
 Alexa+0.044
ula+0.043
oris+0.042
ela+0.042
Neg logit contributions
 wanted-0.026
 dangerous-0.025
 okay-0.023
 worn-0.023
 decided-0.023
Token her
Token ID607
Feature activation-2.26e-02

Pos logit contributions
 wanted+0.003
 decided+0.003
 like+0.003
 waste+0.003
 respected+0.003
Neg logit contributions
ula-0.010
hetically-0.010
 Alexa-0.009
oris-0.009
oming-0.009
Token accomplishment
Token ID29340
Feature activation-1.99e-01

Pos logit contributions
 Alexa+0.040
hetically+0.040
ula+0.039
ela+0.039
oris+0.037
Neg logit contributions
 dangerous-0.028
 wanted-0.028
 okay-0.028
 waste-0.026
 quit-0.026
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
 Alexa+0.371
ula+0.363
hetically+0.360
ela+0.353
 creation+0.349
Neg logit contributions
 wanted-0.228
 okay-0.217
 respected-0.216
 dangerous-0.215
 crank-0.212
Token
Token ID198
Feature activation-7.87e-03

Pos logit contributions
 Alexa+0.024
ula+0.024
hetically+0.024
ela+0.024
 creation+0.023
Neg logit contributions
 wanted-0.022
 okay-0.021
 respected-0.021
 crank-0.020
 quit-0.020
Token
Token ID198
Feature activation-7.15e-03

Pos logit contributions
hetically+0.019
 creation+0.018
 Alexa+0.018
ula+0.017
 puppet+0.017
Neg logit contributions
 wanted-0.005
?"-0.005
 lived-0.005
 asked-0.005
 respected-0.005
TokenHer
Token ID9360
Feature activation-1.14e-03

Pos logit contributions
 Alexa+0.014
ula+0.014
hetically+0.014
 creation+0.014
ela+0.014
Neg logit contributions
 wanted-0.007
 respected-0.007
 okay-0.007
 dangerous-0.007
 crank-0.007
Token mom
Token ID1995
Feature activation+3.98e-03

Pos logit contributions
hetically+0.002
 Alexa+0.002
ela+0.002
ula+0.002
oris+0.002
Neg logit contributions
 okay-0.001
 wanted-0.001
 dangerous-0.001
 quit-0.001
 OK-0.001
Token was
Token ID373
Feature activation-2.74e-02

Pos logit contributions
 Alexa+0.073
 creation+0.071
 puppet+0.069
 curtain+0.068
hetically+0.068
Neg logit contributions
 okay-0.038
 belong-0.036
 OK-0.036
 crank-0.035
 dangerous-0.033
Token so
Token ID523
Feature activation-1.53e-02

Pos logit contributions
 Alexa+0.047
ula+0.045
 creation+0.044
ela+0.043
hetically+0.043
Neg logit contributions
 wanted-0.033
 belong-0.033
 respected-0.033
 stole-0.033
Please-0.033
Token proud
Token ID6613
Feature activation-2.44e-02

Pos logit contributions
 Alexa+0.020
 creation+0.019
ela+0.019
ula+0.019
oming+0.018
Neg logit contributions
 stole-0.024
Please-0.024
 belong-0.022
 avoided-0.022
 fought-0.022
Token of
Token ID286
Feature activation-5.98e-02

Pos logit contributions
 Alexa+0.038
hetically+0.038
ula+0.037
ela+0.036
 creation+0.036
Neg logit contributions
 wanted-0.035
 dangerous-0.034
 okay-0.033
 respected-0.033
 crank-0.033
Token her
Token ID607
Feature activation-2.99e-02

Pos logit contributions
 Alexa+0.096
ula+0.095
hetically+0.093
ela+0.091
 creation+0.091
Neg logit contributions
 wanted-0.083
 okay-0.081
 dangerous-0.081
 respected-0.081
 crank-0.080
Token accomplishment
Token ID29340
Feature activation-1.99e-01

Pos logit contributions
 Alexa+0.050
ula+0.050
ela+0.049
hetically+0.049
oris+0.046
Neg logit contributions
 wanted-0.040
 dangerous-0.039
 okay-0.039
 respected-0.038
 crank-0.037
Token!
Token ID0
Feature activation-6.39e-02

Pos logit contributions
 Alexa+0.372
ula+0.366
hetically+0.360
ela+0.355
 creation+0.352
Neg logit contributions
 wanted-0.222
 okay-0.213
 respected-0.211
 dangerous-0.211
 crank-0.209
Token<|endoftext|>
Token ID50256
Feature activation-3.86e-02

Pos logit contributions
 Alexa+0.108
ula+0.106
hetically+0.102
ela+0.101
 creation+0.100
Neg logit contributions
 wanted-0.082
 dangerous-0.082
 crank-0.081
 okay-0.081
 respected-0.081
TokenNick
Token ID23609
Feature activation-1.80e-02

Pos logit contributions
 Alexa+0.071
hetically+0.071
ula+0.071
 creation+0.069
ela+0.069
Neg logit contributions
 wanted-0.044
 okay-0.041
 dangerous-0.041
 respected-0.041
 crank-0.041
Token was
Token ID373
Feature activation+1.49e-03

Pos logit contributions
 Alexa+0.031
ela+0.030
 creation+0.030
hetically+0.029
 puppet+0.029
Neg logit contributions
 okay-0.023
 OK-0.022
 dangerous-0.022
 crank-0.021
 harmful-0.021
Token very
Token ID845
Feature activation+4.75e-03

Pos logit contributions
 wanted+0.002
 okay+0.002
 dangerous+0.002
 crank+0.002
 respected+0.002
Neg logit contributions
 Alexa-0.003
ula-0.003
hetically-0.003
ela-0.003
 creation-0.003
Token,
Token ID11
Feature activation-1.14e-02

Pos logit contributions
 Alexa+0.121
ula+0.116
 poster+0.113
hetically+0.113
 creation+0.110
Neg logit contributions
 wanted-0.078
 like-0.076
 okay-0.075
 decided-0.074
 dangerous-0.073
Token feeling
Token ID4203
Feature activation-6.21e-03

Pos logit contributions
ula+0.019
 Alexa+0.019
ela+0.018
hetically+0.018
 creation+0.018
Neg logit contributions
 wanted-0.016
 okay-0.015
 respected-0.015
 avoided-0.015
 dangerous-0.015
Token content
Token ID2695
Feature activation+6.01e-02

Pos logit contributions
 Alexa+0.011
ula+0.011
hetically+0.011
ela+0.011
 creation+0.011
Neg logit contributions
 wanted-0.007
 crank-0.007
 dangerous-0.007
 okay-0.006
 respected-0.006
Token with
Token ID351
Feature activation-3.29e-02

Pos logit contributions
 belong+0.079
 crank+0.078
 stole+0.078
 dangerous+0.077
 worn+0.077
Neg logit contributions
hetically-0.096
 creation-0.095
ula-0.094
 Alexa-0.093
ela-0.090
Token his
Token ID465
Feature activation+1.37e-02

Pos logit contributions
 Alexa+0.067
ula+0.066
hetically+0.065
ela+0.064
 creation+0.063
Neg logit contributions
 wanted-0.032
 okay-0.031
 respected-0.031
 dangerous-0.031
 crank-0.030
Token accomplishment
Token ID29340
Feature activation-1.98e-01

Pos logit contributions
 wanted+0.020
 okay+0.019
 dangerous+0.019
 respected+0.019
 crank+0.019
Neg logit contributions
 Alexa-0.021
ula-0.021
hetically-0.020
ela-0.020
 creation-0.020
Token.
Token ID13
Feature activation-1.31e-02

Pos logit contributions
 Alexa+0.390
ula+0.379
hetically+0.372
ela+0.369
 creation+0.363
Neg logit contributions
 wanted-0.208
 okay-0.199
 respected-0.197
 dangerous-0.195
 quit-0.191
Token<|endoftext|>
Token ID50256
Feature activation-3.41e-02

Pos logit contributions
 Alexa+0.025
ula+0.025
hetically+0.025
 creation+0.024
ela+0.024
Neg logit contributions
 wanted-0.014
 okay-0.014
 respected-0.013
 dangerous-0.013
 crank-0.013
TokenOnce
Token ID7454
Feature activation+1.22e-02

Pos logit contributions
hetically+0.067
ula+0.066
 Alexa+0.066
 creation+0.066
ela+0.065
Neg logit contributions
 wanted-0.036
 lived-0.033
?"-0.033
 okay-0.032
 decided-0.032
Token upon
Token ID2402
Feature activation-1.93e-02

Pos logit contributions
 wanted+0.015
?"+0.014
 okay+0.014
 lived+0.014
 dangerous+0.014
Neg logit contributions
hetically-0.023
ula-0.023
 Alexa-0.023
 puppet-0.021
 creation-0.021
Token a
Token ID257
Feature activation+7.08e-03

Pos logit contributions
 Alexa+0.039
ula+0.038
hetically+0.038
 creation+0.037
ela+0.037
Neg logit contributions
 wanted-0.018
 respected-0.017
 crank-0.017
 dangerous-0.017
 okay-0.017
Token was
Token ID373
Feature activation-1.06e-02

Pos logit contributions
 Alexa+0.025
hetically+0.024
 puppet+0.024
 creation+0.024
ela+0.024
Neg logit contributions
 crank-0.014
 okay-0.014
 belong-0.014
 dangerous-0.014
 OK-0.013
Token very
Token ID845
Feature activation+4.98e-03

Pos logit contributions
 Alexa+0.020
ula+0.019
hetically+0.019
ela+0.019
 creation+0.019
Neg logit contributions
 wanted-0.012
 dangerous-0.011
 crank-0.011
 respected-0.011
 okay-0.011
Token proud
Token ID6613
Feature activation-7.15e-03

Pos logit contributions
Please+0.007
 stole+0.007
 fought+0.007
 avoided+0.006
 shouldn+0.006
Neg logit contributions
ela-0.008
ula-0.008
 Alexa-0.008
 creation-0.007
oming-0.007
Token of
Token ID286
Feature activation-4.88e-02

Pos logit contributions
 Alexa+0.012
hetically+0.012
ula+0.012
ela+0.012
 creation+0.012
Neg logit contributions
 wanted-0.010
 okay-0.009
 dangerous-0.009
 respected-0.009
 seen-0.009
Token her
Token ID607
Feature activation-2.21e-02

Pos logit contributions
 Alexa+0.080
ula+0.079
hetically+0.077
 creation+0.076
ela+0.076
Neg logit contributions
 wanted-0.067
 okay-0.066
 respected-0.066
 dangerous-0.066
 avoided-0.063
Token accomplishment
Token ID29340
Feature activation-1.98e-01

Pos logit contributions
 Alexa+0.043
ela+0.042
hetically+0.040
 Isabel+0.040
ula+0.039
Neg logit contributions
 okay-0.028
 dangerous-0.028
 wanted-0.027
 waste-0.026
 OK-0.026
Token,
Token ID11
Feature activation-7.50e-03

Pos logit contributions
 Alexa+0.397
ula+0.385
hetically+0.381
ela+0.375
 creation+0.368
Neg logit contributions
 wanted-0.206
 okay-0.191
 respected-0.191
 dangerous-0.188
 decided-0.184
Token and
Token ID290
Feature activation+3.40e-04

Pos logit contributions
 Alexa+0.013
ula+0.013
hetically+0.013
ela+0.013
 creation+0.012
Neg logit contributions
 wanted-0.009
 okay-0.009
 dangerous-0.009
 respected-0.009
 crank-0.009
Token went
Token ID1816
Feature activation+3.85e-02

Pos logit contributions
 crank+0.000
 okay+0.000
 dangerous+0.000
Please+0.000
 waste+0.000
Neg logit contributions
 Alexa-0.001
ela-0.001
hetically-0.001
ula-0.001
 creation-0.001
Token home
Token ID1363
Feature activation+2.36e-02

Pos logit contributions
 left+0.037
 like+0.033
 seen+0.033
 dangerous+0.032
 met+0.032
Neg logit contributions
hetically-0.084
 Alexa-0.081
ula-0.076
 creation-0.076
oris-0.075
Token to
Token ID284
Feature activation-1.26e-02

Pos logit contributions
 wanted+0.025
 okay+0.024
 respected+0.024
 dangerous+0.024
 crank+0.023
Neg logit contributions
 Alexa-0.046
ula-0.046
hetically-0.045
ela-0.044
 creation-0.044
Token on
Token ID319
Feature activation-3.25e-02

Pos logit contributions
 Alexa+0.044
ula+0.040
 creation+0.040
ela+0.040
hetically+0.039
Neg logit contributions
 dangerous-0.029
 respected-0.029
 crank-0.029
 belong-0.028
 avoided-0.028
Token,
Token ID11
Feature activation-4.06e-02

Pos logit contributions
 Alexa+0.062
ula+0.062
hetically+0.060
ela+0.060
 creation+0.059
Neg logit contributions
 wanted-0.034
 okay-0.034
 dangerous-0.033
 respected-0.033
 crank-0.032
Token whenever
Token ID8797
Feature activation-6.00e-02

Pos logit contributions
 Alexa+0.069
ula+0.064
ela+0.063
hetically+0.062
 curtain+0.061
Neg logit contributions
 crank-0.053
 respected-0.053
 wanted-0.052
 dangerous-0.052
 belong-0.051
Token she
Token ID673
Feature activation-4.77e-02

Pos logit contributions
 Alexa+0.084
 Julie+0.077
 Anne+0.074
 creation+0.074
 Isabel+0.074
Neg logit contributions
 crank-0.096
 respected-0.094
uch-0.093
 worn-0.092
asure-0.091
Token un
Token ID555
Feature activation+2.85e-02

Pos logit contributions
 Alexa+0.086
 creation+0.085
 puppet+0.082
 poster+0.080
hetically+0.080
Neg logit contributions
 okay-0.056
 OK-0.053
 ours-0.053
 crank-0.049
 badly-0.049
Tokenz
Token ID89
Feature activation-1.97e-01

Pos logit contributions
 wanted+0.043
 respected+0.043
 okay+0.042
 crank+0.042
 worn+0.042
Neg logit contributions
 Alexa-0.042
ula-0.041
hetically-0.039
ela-0.039
 creation-0.038
Tokenipped
Token ID3949
Feature activation-6.34e-02

Pos logit contributions
 Alexa+0.264
ula+0.255
hetically+0.248
ela+0.242
 creation+0.237
Neg logit contributions
 wanted-0.321
 okay-0.321
 quit-0.315
 respected-0.314
 dangerous-0.314
Token her
Token ID607
Feature activation-2.40e-02

Pos logit contributions
 Alexa+0.104
ula+0.104
ela+0.099
 creation+0.099
hetically+0.098
Neg logit contributions
 respected-0.084
 wanted-0.083
 crank-0.083
 avoided-0.081
 okay-0.080
Token cup
Token ID6508
Feature activation-1.30e-02

Pos logit contributions
 Alexa+0.034
ula+0.034
 creation+0.034
hetically+0.033
ela+0.032
Neg logit contributions
 respected-0.036
 wanted-0.036
 crank-0.036
 okay-0.036
 avoided-0.035
Token it
Token ID340
Feature activation+5.56e-03

Pos logit contributions
ula+0.019
 Alexa+0.019
hetically+0.019
 creation+0.018
ela+0.018
Neg logit contributions
 wanted-0.019
 respected-0.019
 dangerous-0.019
 crank-0.019
 okay-0.019
Token would
Token ID561
Feature activation-5.89e-02

Pos logit contributions
 respected+0.007
 crank+0.007
Please+0.007
 avoided+0.006
 fought+0.006
Neg logit contributions
 Alexa-0.009
 creation-0.009
hetically-0.009
ula-0.009
 curtain-0.009
Token The
Token ID383
Feature activation+1.38e-02

Pos logit contributions
ula+0.090
 Alexa+0.089
hetically+0.082
ela+0.081
 Isabel+0.080
Neg logit contributions
 dangerous-0.073
 respected-0.073
 okay-0.073
 crank-0.072
 wanted-0.071
Token friends
Token ID2460
Feature activation+2.92e-02

Pos logit contributions
 belong+0.020
Please+0.020
 respected+0.020
 okay+0.020
 avoided+0.020
Neg logit contributions
 creation-0.020
 curtain-0.019
 poster-0.019
hetically-0.019
 Alexa-0.019
Token smiled
Token ID13541
Feature activation-3.68e-02

Pos logit contributions
 okay+0.028
 crank+0.028
 dangerous+0.028
Please+0.026
 wanted+0.026
Neg logit contributions
 Alexa-0.059
hetically-0.057
ula-0.057
 creation-0.056
ela-0.056
Token at
Token ID379
Feature activation+7.53e-02

Pos logit contributions
ula+0.074
 Alexa+0.074
ela+0.073
hetically+0.073
 poster+0.071
Neg logit contributions
 wanted-0.039
 okay-0.038
 decided-0.036
 like-0.035
 respected-0.035
Token their
Token ID511
Feature activation-8.84e-04

Pos logit contributions
 crank+0.083
 wanted+0.081
 okay+0.080
 worn+0.079
 dangerous+0.078
Neg logit contributions
 Alexa-0.140
hetically-0.137
ula-0.137
 creation-0.136
ela-0.132
Token accomplishment
Token ID29340
Feature activation-1.97e-01

Pos logit contributions
ula+0.002
 Alexa+0.002
hetically+0.002
ela+0.002
oris+0.001
Neg logit contributions
 wanted-0.001
 dangerous-0.001
 respected-0.001
 waste-0.001
 okay-0.001
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
 Alexa+0.379
ula+0.369
hetically+0.362
ela+0.358
 poster+0.350
Neg logit contributions
 wanted-0.222
 respected-0.209
 okay-0.205
 dangerous-0.204
 decided-0.200
Token
Token ID220
Feature activation-5.99e-02

Pos logit contributions
 Alexa+0.021
ula+0.020
hetically+0.020
ela+0.020
 creation+0.019
Neg logit contributions
 wanted-0.014
 dangerous-0.013
 okay-0.013
 crank-0.013
 respected-0.013
Token
Token ID198
Feature activation-6.65e-03

Pos logit contributions
 Alexa+0.104
ula+0.103
hetically+0.097
 creation+0.096
ela+0.096
Neg logit contributions
 wanted-0.074
 crank-0.074
 dangerous-0.073
 respected-0.072
 worn-0.071
Token
Token ID198
Feature activation-5.97e-03

Pos logit contributions
hetically+0.014
 Alexa+0.014
 creation+0.013
ula+0.013
ela+0.013
Neg logit contributions
 wanted-0.006
 respected-0.006
 lived-0.006
 okay-0.006
 dangerous-0.006
TokenThe
Token ID464
Feature activation-1.18e-03

Pos logit contributions
 Alexa+0.012
ula+0.011
hetically+0.011
ela+0.011
 creation+0.011
Neg logit contributions
 wanted-0.006
 okay-0.006
 dangerous-0.006
 respected-0.006
 crank-0.006
Token happy
Token ID3772
Feature activation+1.04e-02

Pos logit contributions
 wanted+0.003
 dangerous+0.003
 okay+0.003
 crank+0.003
 respected+0.003
Neg logit contributions
 Alexa-0.005
ula-0.005
hetically-0.005
ela-0.005
 creation-0.004
Token and
Token ID290
Feature activation+3.43e-03

Pos logit contributions
 wanted+0.011
 like+0.009
 okay+0.009
 decided+0.009
 respected+0.009
Neg logit contributions
hetically-0.022
ula-0.022
 Alexa-0.021
ela-0.021
oris-0.020
Token proud
Token ID6613
Feature activation-1.27e-02

Pos logit contributions
 dangerous+0.005
 okay+0.004
 crank+0.004
 respected+0.004
 waste+0.004
Neg logit contributions
 Alexa-0.006
ula-0.005
ela-0.005
hetically-0.005
 creation-0.005
Token of
Token ID286
Feature activation-5.01e-02

Pos logit contributions
hetically+0.024
 Alexa+0.024
ula+0.024
ela+0.023
 poster+0.023
Neg logit contributions
 wanted-0.015
 okay-0.014
 respected-0.013
 decided-0.013
 like-0.013
Token her
Token ID607
Feature activation-2.42e-02

Pos logit contributions
hetically+0.089
 Alexa+0.086
ula+0.085
 creation+0.085
ela+0.084
Neg logit contributions
 okay-0.061
 respected-0.060
 wanted-0.060
 dangerous-0.059
 like-0.057
Token accomplishment
Token ID29340
Feature activation-1.97e-01

Pos logit contributions
hetically+0.043
 Alexa+0.042
ela+0.039
ula+0.039
oris+0.038
Neg logit contributions
 wanted-0.032
 okay-0.031
 quit-0.031
 worn-0.030
 dangerous-0.030
Token.
Token ID13
Feature activation-7.56e-03

Pos logit contributions
 Alexa+0.391
ula+0.383
hetically+0.378
ela+0.371
 creation+0.365
Neg logit contributions
 wanted-0.205
 respected-0.191
 okay-0.191
 dangerous-0.186
 decided-0.185
Token She
Token ID1375
Feature activation-1.90e-02

Pos logit contributions
hetically+0.016
 creation+0.015
 puppet+0.015
ela+0.015
ula+0.014
Neg logit contributions
 wanted-0.008
 lived-0.007
 Please-0.007
 shouldn-0.007
 decided-0.007
Token had
Token ID550
Feature activation+5.96e-03

Pos logit contributions
 Alexa+0.039
 creation+0.038
ela+0.038
ula+0.037
hetically+0.037
Neg logit contributions
 okay-0.017
 crank-0.016
 dangerous-0.016
 waste-0.016
 OK-0.015
Token kept
Token ID4030
Feature activation+2.19e-02

Pos logit contributions
 crank+0.005
 dangerous+0.005
 waste+0.005
Please+0.005
 okay+0.005
Neg logit contributions
 Alexa-0.012
 creation-0.011
ela-0.011
ula-0.011
 Julie-0.011
Token dreaming
Token ID28368
Feature activation-3.46e-02

Pos logit contributions
 okay+0.026
 wanted+0.026
 worn+0.026
 dangerous+0.025
 respected+0.025
Neg logit contributions
hetically-0.040
 Alexa-0.037
ula-0.037
ela-0.037
 curtain-0.037
Token took
Token ID1718
Feature activation-2.59e-02

Pos logit contributions
 Alexa+0.025
 puppet+0.024
ela+0.024
hetically+0.024
 creation+0.024
Neg logit contributions
 okay-0.012
 crank-0.011
 OK-0.011
 Please-0.011
 dangerous-0.010
Token turns
Token ID4962
Feature activation-5.62e-02

Pos logit contributions
 Alexa+0.044
ula+0.043
hetically+0.042
ela+0.042
 creation+0.042
Neg logit contributions
 wanted-0.033
 respected-0.032
 okay-0.032
 dangerous-0.032
 crank-0.032
Token making
Token ID1642
Feature activation-7.92e-02

Pos logit contributions
hetically+0.101
 Alexa+0.100
ela+0.096
ula+0.095
oris+0.094
Neg logit contributions
 wanted-0.072
 like-0.069
 fought-0.069
 seen-0.067
 respected-0.067
Token z
Token ID1976
Feature activation-8.71e-02

Pos logit contributions
hetically+0.168
 Alexa+0.161
ela+0.160
ula+0.160
oris+0.157
Neg logit contributions
 dangerous-0.076
 wanted-0.072
 like-0.071
 waste-0.069
 okay-0.069
Tokenig
Token ID328
Feature activation-9.95e-02

Pos logit contributions
 doorway+0.099
 Alexa+0.095
hetically+0.091
 curtain+0.083
 Anne+0.082
Neg logit contributions
uch-0.160
?"-0.154
 okay-0.153
 fought-0.152
 avoided-0.152
Tokenz
Token ID89
Feature activation-1.96e-01

Pos logit contributions
 doorway+0.121
 Alexa+0.112
 curtain+0.105
joy+0.102
ula+0.102
Neg logit contributions
uch-0.171
 okay-0.163
 wanted-0.163
 touched-0.161
 stayed-0.160
Tokenags
Token ID3775
Feature activation-1.68e-02

Pos logit contributions
 Alexa+0.307
ula+0.294
hetically+0.292
ela+0.287
 creation+0.282
Neg logit contributions
 okay-0.277
 wanted-0.275
 dangerous-0.270
 respected-0.270
 crank-0.269
Token with
Token ID351
Feature activation-4.13e-02

Pos logit contributions
 Alexa+0.030
ula+0.030
hetically+0.029
ela+0.029
 creation+0.029
Neg logit contributions
 wanted-0.020
 dangerous-0.019
 respected-0.019
 okay-0.019
 crank-0.019
Token their
Token ID511
Feature activation-8.95e-03

Pos logit contributions
 Alexa+0.086
hetically+0.085
ula+0.084
ela+0.083
oris+0.080
Neg logit contributions
 wanted-0.040
 dangerous-0.038
 waste-0.037
 respected-0.036
 seen-0.036
Token feet
Token ID3625
Feature activation+6.65e-03

Pos logit contributions
 Alexa+0.015
ula+0.015
hetically+0.015
ela+0.014
oris+0.014
Neg logit contributions
 wanted-0.012
 dangerous-0.011
 worn-0.011
 respected-0.011
 waste-0.011
Token on
Token ID319
Feature activation+2.64e-03

Pos logit contributions
 wanted+0.007
 okay+0.007
 dangerous+0.007
 respected+0.007
 crank+0.006
Neg logit contributions
 Alexa-0.013
ula-0.013
ela-0.013
hetically-0.012
 creation-0.012
Token was
Token ID373
Feature activation-1.08e-02

Pos logit contributions
 okay+0.024
 crank+0.023
 dangerous+0.023
 belong+0.023
 OK+0.023
Neg logit contributions
 Alexa-0.049
 creation-0.046
ela-0.046
ula-0.046
hetically-0.046
Token so
Token ID523
Feature activation+4.47e-03

Pos logit contributions
hetically+0.022
ula+0.022
 Alexa+0.021
ela+0.021
 poster+0.021
Neg logit contributions
 wanted-0.012
 okay-0.011
 respected-0.011
 like-0.011
 quit-0.011
Token proud
Token ID6613
Feature activation-9.94e-03

Pos logit contributions
 wanted+0.005
 okay+0.005
 respected+0.005
 dangerous+0.005
 crank+0.005
Neg logit contributions
 Alexa-0.009
ula-0.008
hetically-0.008
ela-0.008
 creation-0.008
Token of
Token ID286
Feature activation-4.87e-02

Pos logit contributions
 Alexa+0.016
hetically+0.016
ula+0.016
ela+0.016
 creation+0.015
Neg logit contributions
 wanted-0.014
 dangerous-0.013
 okay-0.013
 respected-0.013
 crank-0.013
Token his
Token ID465
Feature activation+1.92e-02

Pos logit contributions
 Alexa+0.099
ula+0.098
hetically+0.097
ela+0.096
 creation+0.095
Neg logit contributions
 wanted-0.046
 okay-0.045
 dangerous-0.045
 respected-0.045
 quit-0.043
Token accomplishment
Token ID29340
Feature activation-1.96e-01

Pos logit contributions
 dangerous+0.026
 wanted+0.026
 okay+0.026
 quit+0.026
 crank+0.025
Neg logit contributions
hetically-0.032
 Alexa-0.032
ula-0.030
ela-0.030
oris-0.029
Token and
Token ID290
Feature activation-3.86e-04

Pos logit contributions
 Alexa+0.353
ula+0.349
hetically+0.342
ela+0.337
 creation+0.337
Neg logit contributions
 wanted-0.227
 okay-0.223
 dangerous-0.222
 crank-0.222
 respected-0.221
Token his
Token ID465
Feature activation+1.80e-02

Pos logit contributions
 Alexa+0.001
 creation+0.001
hetically+0.001
 puppet+0.001
 Julie+0.001
Neg logit contributions
Please-0.000
 crank-0.000
uch-0.000
 okay-0.000
 OK-0.000
Token mom
Token ID1995
Feature activation+7.88e-03

Pos logit contributions
 wanted+0.023
 dangerous+0.023
 okay+0.023
 crank+0.023
 quit+0.022
Neg logit contributions
 Alexa-0.031
hetically-0.030
ula-0.030
ela-0.030
oris-0.028
Token was
Token ID373
Feature activation-1.52e-02

Pos logit contributions
 wanted+0.007
 respected+0.007
 okay+0.007
 dangerous+0.007
 crank+0.007
Neg logit contributions
 Alexa-0.016
ula-0.016
hetically-0.016
ela-0.016
 creation-0.016
Token too
Token ID1165
Feature activation+4.36e-02

Pos logit contributions
 Alexa+0.028
ula+0.028
hetically+0.028
ela+0.027
 creation+0.027
Neg logit contributions
 wanted-0.017
 okay-0.017
 respected-0.017
 crank-0.016
 dangerous-0.016
Token.
Token ID13
Feature activation+3.40e-03

Pos logit contributions
 Alexa+0.035
ula+0.035
hetically+0.034
ela+0.034
 creation+0.034
Neg logit contributions
 wanted-0.022
 okay-0.021
 dangerous-0.021
 respected-0.021
 crank-0.021
Token He
Token ID679
Feature activation+3.38e-02

Pos logit contributions
 dangerous+0.005
uch+0.005
 harmful+0.005
 respected+0.005
 avoided+0.005
Neg logit contributions
 Alexa-0.005
ula-0.005
 circles-0.005
 Isabel-0.005
 creation-0.005
Token wrote
Token ID2630
Feature activation-2.09e-02

Pos logit contributions
 okay+0.037
 OK+0.035
 crank+0.032
 waste+0.032
 belong+0.031
Neg logit contributions
 Alexa-0.066
 creation-0.065
 puppet-0.064
 Isabel-0.062
 poster-0.061
Token z
Token ID1976
Feature activation-7.74e-02

Pos logit contributions
 Alexa+0.037
ula+0.036
 creation+0.035
ela+0.035
hetically+0.034
Neg logit contributions
 crank-0.025
 respected-0.025
 wanted-0.024
 dangerous-0.024
 avoided-0.024
Tokenig
Token ID328
Feature activation-8.93e-02

Pos logit contributions
 Alexa+0.076
ula+0.072
hetically+0.071
ela+0.068
 creation+0.067
Neg logit contributions
 wanted-0.154
 okay-0.152
 respected-0.151
 crank-0.150
 quit-0.149
Tokenz
Token ID89
Feature activation-1.96e-01

Pos logit contributions
 doorway+0.110
 Alexa+0.108
ula+0.108
 curtain+0.103
hetically+0.102
Neg logit contributions
 wanted-0.155
 did-0.147
 kept-0.145
?"-0.144
 decided-0.144
Tokenags
Token ID3775
Feature activation-2.16e-02

Pos logit contributions
 Alexa+0.295
ula+0.292
hetically+0.284
ela+0.280
 creation+0.278
Neg logit contributions
 wanted-0.288
 dangerous-0.282
 okay-0.282
 respected-0.281
 crank-0.281
Token on
Token ID319
Feature activation+1.07e-02

Pos logit contributions
 Alexa+0.041
ula+0.041
hetically+0.040
ela+0.039
 creation+0.039
Neg logit contributions
 wanted-0.023
 okay-0.022
 dangerous-0.022
 respected-0.022
 crank-0.022
Token walls
Token ID7714
Feature activation+8.76e-03

Pos logit contributions
 okay+0.013
 respected+0.012
 crank+0.012
 wanted+0.012
 avoided+0.012
Neg logit contributions
 Alexa-0.019
ula-0.019
 creation-0.018
ela-0.018
 puppet-0.018
Token,
Token ID11
Feature activation-7.27e-03

Pos logit contributions
 like+0.008
 wanted+0.007
 today+0.007
 when+0.007
 kept+0.007
Neg logit contributions
ula-0.019
 Alexa-0.019
hetically-0.018
ela-0.018
oris-0.018
Token floors
Token ID18570
Feature activation+5.77e-02

Pos logit contributions
 Alexa+0.012
ula+0.012
hetically+0.012
ela+0.011
 creation+0.011
Neg logit contributions
 wanted-0.010
 okay-0.009
 dangerous-0.009
 respected-0.009
 crank-0.009
Token to
Token ID284
Feature activation-9.50e-03

Pos logit contributions
 like+0.033
 okay+0.033
 wanted+0.031
 belong+0.030
?"+0.030
Neg logit contributions
hetically-0.060
 Alexa-0.059
oris-0.058
ula-0.058
ela-0.058
Token the
Token ID262
Feature activation-3.81e-03

Pos logit contributions
 Alexa+0.016
 creation+0.015
ula+0.015
 curtain+0.015
 circles+0.014
Neg logit contributions
 crank-0.013
 delayed-0.012
uch-0.011
 Please-0.011
 okay-0.011
Token vine
Token ID17793
Feature activation-4.41e-02

Pos logit contributions
 Alexa+0.006
ula+0.006
 creation+0.006
hetically+0.006
ela+0.006
Neg logit contributions
 wanted-0.005
 crank-0.005
 respected-0.005
 okay-0.005
 worn-0.005
Token and
Token ID290
Feature activation-9.35e-05

Pos logit contributions
 Alexa+0.081
ula+0.081
hetically+0.079
ela+0.078
 creation+0.077
Neg logit contributions
 wanted-0.050
 okay-0.048
 dangerous-0.048
 respected-0.048
 crank-0.047
Token her
Token ID607
Feature activation-2.25e-02

Pos logit contributions
 Alexa+0.000
 creation+0.000
 curtain+0.000
ula+0.000
 puppet+0.000
Neg logit contributions
 crank-0.000
uch-0.000
 okay-0.000
Please-0.000
 dangerous-0.000
Token accomplishment
Token ID29340
Feature activation-1.95e-01

Pos logit contributions
 Alexa+0.040
ula+0.040
hetically+0.039
ela+0.038
 creation+0.038
Neg logit contributions
 wanted-0.027
 okay-0.026
 dangerous-0.026
 respected-0.026
 crank-0.026
Token.
Token ID13
Feature activation-9.06e-03

Pos logit contributions
ula+0.353
 Alexa+0.352
hetically+0.342
 creation+0.341
ela+0.340
Neg logit contributions
 crank-0.218
 wanted-0.216
 dangerous-0.215
 okay-0.214
 respected-0.211
Token
Token ID220
Feature activation-5.71e-02

Pos logit contributions
hetically+0.016
 Alexa+0.016
 creation+0.016
ula+0.016
ela+0.016
Neg logit contributions
 wanted-0.011
 okay-0.010
 respected-0.010
 crank-0.010
 worn-0.010
Token
Token ID198
Feature activation-2.12e-03

Pos logit contributions
 Alexa+0.110
hetically+0.109
ula+0.108
ela+0.107
 creation+0.107
Neg logit contributions
 wanted-0.058
 okay-0.058
 respected-0.055
 dangerous-0.055
 crank-0.054
Token
Token ID198
Feature activation-1.33e-03

Pos logit contributions
 Alexa+0.004
hetically+0.004
 creation+0.004
ula+0.004
ela+0.004
Neg logit contributions
 wanted-0.002
 respected-0.002
 okay-0.002
 dangerous-0.002
 lived-0.002
TokenFrom
Token ID4863
Feature activation-1.46e-02

Pos logit contributions
 Alexa+0.003
hetically+0.003
ula+0.003
 creation+0.003
ela+0.003
Neg logit contributions
 wanted-0.001
 respected-0.001
 okay-0.001
 dangerous-0.001
 crank-0.001
Token proud
Token ID6613
Feature activation-6.28e-03

Pos logit contributions
Please+0.009
 seen+0.009
 stole+0.009
 lived+0.008
asure+0.008
Neg logit contributions
ela-0.009
 Alexa-0.009
ula-0.009
 creation-0.009
oming-0.008
Token of
Token ID286
Feature activation-4.62e-02

Pos logit contributions
hetically+0.012
 Alexa+0.012
ula+0.011
ela+0.011
 poster+0.011
Neg logit contributions
 wanted-0.008
 like-0.007
 okay-0.007
 respected-0.007
 dangerous-0.007
Token themselves
Token ID2405
Feature activation-5.85e-03

Pos logit contributions
 Alexa+0.086
ula+0.085
hetically+0.084
ela+0.082
 creation+0.082
Neg logit contributions
 wanted-0.052
 okay-0.051
 dangerous-0.050
 respected-0.050
 crank-0.050
Token for
Token ID329
Feature activation-2.82e-02

Pos logit contributions
 Alexa+0.013
hetically+0.013
ela+0.013
 poster+0.013
ula+0.012
Neg logit contributions
 okay-0.005
 wanted-0.005
 like-0.005
 decided-0.005
 dangerous-0.004
Token their
Token ID511
Feature activation+1.80e-03

Pos logit contributions
 Alexa+0.038
ula+0.037
hetically+0.036
 creation+0.036
ela+0.035
Neg logit contributions
 wanted-0.046
 crank-0.046
 okay-0.044
 dangerous-0.044
 respected-0.044
Token accomplishment
Token ID29340
Feature activation-1.95e-01

Pos logit contributions
 wanted+0.002
 okay+0.002
 respected+0.002
 crank+0.002
 dangerous+0.002
Neg logit contributions
 Alexa-0.003
ula-0.003
hetically-0.003
 creation-0.003
ela-0.003
Token.
Token ID13
Feature activation-2.25e-03

Pos logit contributions
 Alexa+0.382
ula+0.372
hetically+0.369
ela+0.361
 creation+0.355
Neg logit contributions
 wanted-0.206
 okay-0.195
 respected-0.194
 dangerous-0.193
 quit-0.187
Token
Token ID198
Feature activation-7.08e-04

Pos logit contributions
 Alexa+0.004
ula+0.004
hetically+0.004
ela+0.004
 creation+0.004
Neg logit contributions
 wanted-0.003
 okay-0.003
 respected-0.003
 dangerous-0.003
 crank-0.003
Token
Token ID198
Feature activation-6.30e-04

Pos logit contributions
hetically+0.002
 Alexa+0.002
 creation+0.002
ela+0.002
 poster+0.002
Neg logit contributions
 lived-0.001
 wanted-0.001
 respected-0.000
 dangerous-0.000
?"-0.000
TokenFrom
Token ID4863
Feature activation-1.41e-02

Pos logit contributions
 Alexa+0.001
hetically+0.001
 creation+0.001
ula+0.001
ela+0.001
Neg logit contributions
 respected-0.001
 wanted-0.001
 decided-0.001
Please-0.001
 quit-0.001
Token then
Token ID788
Feature activation+1.16e-02

Pos logit contributions
ela+0.022
ula+0.021
 Alexa+0.021
 poster+0.021
 creation+0.020
Neg logit contributions
 crank-0.020
 belong-0.019
 quit-0.019
 seen-0.018
 okay-0.018
Token so
Token ID523
Feature activation+4.20e-03

Pos logit contributions
 Alexa+0.021
ula+0.021
hetically+0.020
ela+0.020
 creation+0.020
Neg logit contributions
 wanted-0.012
 dangerous-0.011
 crank-0.011
 okay-0.011
 respected-0.011
Token proud
Token ID6613
Feature activation-7.69e-03

Pos logit contributions
 wanted+0.005
 dangerous+0.005
 okay+0.005
 respected+0.005
 crank+0.005
Neg logit contributions
 Alexa-0.008
ula-0.008
hetically-0.007
ela-0.007
 creation-0.007
Token of
Token ID286
Feature activation-4.83e-02

Pos logit contributions
 Alexa+0.013
hetically+0.013
ula+0.013
ela+0.012
 creation+0.012
Neg logit contributions
 wanted-0.010
 okay-0.010
 dangerous-0.010
 respected-0.010
 decided-0.010
Token his
Token ID465
Feature activation+1.94e-02

Pos logit contributions
 Alexa+0.096
ula+0.095
hetically+0.093
ela+0.092
 creation+0.092
Neg logit contributions
 wanted-0.049
 okay-0.048
 respected-0.047
 dangerous-0.047
 crank-0.046
Token great
Token ID1049
Feature activation-1.22e-02

Pos logit contributions
 dangerous+0.026
 okay+0.025
 wanted+0.025
 waste+0.023
 belong+0.023
Neg logit contributions
 Alexa-0.037
ula-0.035
hetically-0.034
ela-0.034
 Isabel-0.033
Token accomplishment
Token ID29340
Feature activation-1.94e-01

Pos logit contributions
 Alexa+0.028
hetically+0.027
ula+0.027
 Isabel+0.026
 Ana+0.025
Neg logit contributions
 wanted-0.012
 dangerous-0.012
 waste-0.011
 quit-0.011
 belong-0.011
Token.
Token ID13
Feature activation-3.20e-03

Pos logit contributions
 Alexa+0.385
ula+0.373
hetically+0.371
ela+0.364
 creation+0.360
Neg logit contributions
 wanted-0.203
 okay-0.191
 respected-0.191
 dangerous-0.189
 crank-0.185
Token<|endoftext|>
Token ID50256
Feature activation-3.04e-02

Pos logit contributions
 Alexa+0.006
ula+0.006
hetically+0.006
ela+0.006
 creation+0.006
Neg logit contributions
 wanted-0.004
 respected-0.003
 okay-0.003
 dangerous-0.003
 crank-0.003
TokenOnce
Token ID7454
Feature activation+1.44e-02

Pos logit contributions
hetically+0.066
 Alexa+0.065
ula+0.064
 creation+0.064
ela+0.063
Neg logit contributions
 wanted-0.027
 lived-0.024
?"-0.024
 okay-0.024
 seen-0.024
Token upon
Token ID2402
Feature activation-1.55e-02

Pos logit contributions
 wanted+0.018
?"+0.017
 okay+0.017
 lived+0.017
 dangerous+0.017
Neg logit contributions
hetically-0.026
ula-0.026
 Alexa-0.026
 puppet-0.024
 creation-0.024
Token a
Token ID257
Feature activation+1.13e-02

Pos logit contributions
 Alexa+0.029
ula+0.028
 creation+0.028
hetically+0.027
 puppet+0.027
Neg logit contributions
 respected-0.016
 crank-0.016
 wanted-0.015
 fought-0.015
 dangerous-0.015
Token They
Token ID1119
Feature activation+8.32e-03

Pos logit contributions
 Alexa+0.007
ula+0.006
 Isabel+0.006
 Julie+0.006
ela+0.006
Neg logit contributions
 belong-0.006
 respected-0.006
 wanted-0.006
 dangerous-0.006
 crank-0.005
Token were
Token ID547
Feature activation+2.20e-02

Pos logit contributions
 crank+0.009
 yourself+0.009
 okay+0.009
 OK+0.008
 Please+0.008
Neg logit contributions
ela-0.016
 Alexa-0.015
hetically-0.015
 puppet-0.015
 Isabel-0.015
Token getting
Token ID1972
Feature activation+2.74e-02

Pos logit contributions
 wanted+0.023
 belong+0.023
 respected+0.022
 stole+0.022
Please+0.022
Neg logit contributions
 Alexa-0.041
ula-0.040
ela-0.039
 creation-0.038
 curtain-0.038
Token tired
Token ID10032
Feature activation+1.12e-01

Pos logit contributions
 lived+0.034
 wanted+0.033
 respected+0.032
 belong+0.031
 avoided+0.031
Neg logit contributions
 Alexa-0.046
ula-0.045
ela-0.044
 creation-0.043
hetically-0.043
Token and
Token ID290
Feature activation+4.87e-03

Pos logit contributions
 respected+0.151
 wanted+0.148
 avoided+0.147
 dangerous+0.145
 crank+0.145
Neg logit contributions
ula-0.180
 Alexa-0.177
hetically-0.173
ela-0.170
 creation-0.168
Token hungry
Token ID14720
Feature activation+1.23e-01

Pos logit contributions
uch+0.006
asure+0.006
 property+0.006
Twe+0.006
 yourself+0.006
Neg logit contributions
 Alexa-0.007
hetically-0.007
 puppet-0.007
 cheering-0.007
ela-0.007
Token.
Token ID13
Feature activation-5.38e-03

Pos logit contributions
 respected+0.151
 crank+0.148
 wanted+0.148
 avoided+0.147
 dangerous+0.146
Neg logit contributions
ula-0.209
 Alexa-0.208
hetically-0.203
 creation-0.200
ela-0.198
Token
Token ID198
Feature activation-2.33e-03

Pos logit contributions
 Alexa+0.008
ula+0.008
 Julie+0.007
 Isabel+0.007
 Ana+0.007
Neg logit contributions
 belong-0.008
 respected-0.008
uch-0.008
 stole-0.008
 quit-0.008
Token
Token ID198
Feature activation-1.48e-03

Pos logit contributions
 Alexa+0.004
ula+0.004
ela+0.004
hetically+0.004
 creation+0.004
Neg logit contributions
 crank-0.003
 wanted-0.003
 belong-0.003
 okay-0.003
 respected-0.003
Token"
Token ID1
Feature activation-7.26e-03

Pos logit contributions
ela+0.002
ula+0.002
 Alexa+0.002
z+0.002
 poster+0.002
Neg logit contributions
 belong-0.002
 crank-0.002
 shouldn-0.002
 quit-0.002
 seen-0.002
TokenTom
Token ID13787
Feature activation+3.08e-02

Pos logit contributions
ula+0.012
 Alexa+0.012
ela+0.011
 puppet+0.011
 poster+0.011
Neg logit contributions
 belong-0.010
 dangerous-0.010
?"-0.010
 seen-0.009
 shouldn-0.009
Token€
Token ID26391
Feature activation+1.04e-02

Pos logit contributions
 wanted+0.018
 okay+0.017
 dangerous+0.017
 respected+0.017
 worn+0.017
Neg logit contributions
 Alexa-0.020
ula-0.019
hetically-0.018
ela-0.018
 poster-0.018
Token�
Token ID129
Feature activation+1.12e-02

Pos logit contributions
 respected+0.014
 dangerous+0.014
 belong+0.013
 waste+0.013
Please+0.013
Neg logit contributions
 Alexa-0.017
 poster-0.016
ela-0.016
 creation-0.015
ula-0.015
Token�
Token ID241
Feature activation+2.89e-03

Pos logit contributions
 wanted+0.015
 okay+0.014
 dangerous+0.014
 respected+0.014
 crank+0.014
Neg logit contributions
 Alexa-0.019
ula-0.018
hetically-0.018
ela-0.018
 creation-0.018
TokenIs
Token ID3792
Feature activation+4.65e-04

Pos logit contributions
 okay+0.003
 dangerous+0.003
 wanted+0.003
 respected+0.003
 crank+0.003
Neg logit contributions
ula-0.005
 Alexa-0.005
ela-0.005
hetically-0.005
 creation-0.005
Token he
Token ID339
Feature activation+4.65e-02

Pos logit contributions
 respected+0.001
 crank+0.001
 avoided+0.001
 waste+0.001
 cancel+0.001
Neg logit contributions
 Alexa-0.001
ela-0.001
ula-0.001
 Isabel-0.001
 Julia-0.001
Token someone
Token ID2130
Feature activation+1.34e-01

Pos logit contributions
Please+0.047
 fought+0.046
 avoided+0.046
 crank+0.046
 respected+0.046
Neg logit contributions
 Alexa-0.084
 creation-0.081
 puppet-0.080
ela-0.076
ail-0.075
Token special
Token ID2041
Feature activation+1.72e-02

Pos logit contributions
Please+0.137
 avoided+0.136
 respected+0.134
 crank+0.132
 fought+0.131
Neg logit contributions
 Alexa-0.236
 puppet-0.220
 creation-0.219
 poster-0.218
ela-0.216
Token to
Token ID284
Feature activation+1.51e-02

Pos logit contributions
 avoided+0.018
 respected+0.017
 crank+0.017
 fought+0.016
 dangerous+0.016
Neg logit contributions
 Alexa-0.033
 poster-0.032
 puppet-0.032
 creation-0.032
 curtain-0.032
Token you
Token ID345
Feature activation+6.30e-02

Pos logit contributions
 crank+0.021
 avoided+0.020
 delayed+0.020
 fought+0.020
 respected+0.019
Neg logit contributions
 Alexa-0.025
ula-0.023
 Isabel-0.023
 Tracy-0.023
 poster-0.023
Token?
Token ID30
Feature activation+3.76e-02

Pos logit contributions
 avoided+0.062
 fought+0.061
 crank+0.061
 respected+0.058
cream+0.058
Neg logit contributions
 Alexa-0.121
 curtain-0.115
 poster-0.114
ela-0.110
 Julie-0.110
Tokenâ
Token ID22940
Feature activation+9.92e-03

Pos logit contributions
 respected+0.049
 dangerous+0.049
 avoided+0.047
 wanted+0.047
 crank+0.046
Neg logit contributions
 Alexa-0.065
ula-0.064
hetically-0.059
ela-0.059
 Isabel-0.058
Token and
Token ID290
Feature activation+2.00e-02

Pos logit contributions
 wanted+0.025
 dangerous+0.024
 okay+0.024
 respected+0.024
 decided+0.023
Neg logit contributions
 Alexa-0.042
hetically-0.041
ula-0.041
ela-0.040
 creation-0.039
Token a
Token ID257
Feature activation+7.48e-03

Pos logit contributions
 crank+0.025
 okay+0.025
Please+0.024
uch+0.024
 respected+0.024
Neg logit contributions
 creation-0.034
 Alexa-0.033
 puppet-0.032
ula-0.032
 curtain-0.031
Token big
Token ID1263
Feature activation-6.94e-03

Pos logit contributions
 wanted+0.009
 okay+0.009
 respected+0.009
 crank+0.009
 dangerous+0.009
Neg logit contributions
 Alexa-0.013
ula-0.013
hetically-0.012
ela-0.012
 creation-0.012
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 Alexa+0.010
 creation+0.010
ula+0.010
ela+0.010
hetically+0.010
Neg logit contributions
 respected-0.010
 avoided-0.010
 okay-0.010
 crank-0.010
 wanted-0.010
Token juicy
Token ID35075
Feature activation+2.90e-02

Pos logit contributions
 respected+0.015
 avoided+0.015
 fought+0.015
 quit+0.014
 belong+0.014
Neg logit contributions
 Alexa-0.017
ula-0.017
 puppet-0.016
 creation-0.016
 curtain-0.016
Token oy
Token ID35104
Feature activation+1.11e-01

Pos logit contributions
 okay+0.035
 respected+0.035
 crank+0.035
 avoided+0.034
 dangerous+0.034
Neg logit contributions
 creation-0.050
ula-0.050
 Alexa-0.049
ela-0.049
hetically-0.047
Tokenster
Token ID1706
Feature activation+5.74e-02

Pos logit contributions
 wanted+0.177
 okay+0.173
 dangerous+0.172
 respected+0.171
 crank+0.171
Neg logit contributions
 Alexa-0.152
ula-0.148
hetically-0.146
ela-0.142
 creation-0.140
Token.
Token ID13
Feature activation+7.13e-03

Pos logit contributions
 wanted+0.061
 dangerous+0.056
 okay+0.055
 decided+0.055
 respected+0.055
Neg logit contributions
 Alexa-0.113
hetically-0.111
ula-0.110
ela-0.107
 poster-0.104
Token The
Token ID383
Feature activation+3.19e-02

Pos logit contributions
 wanted+0.008
 dangerous+0.008
 respected+0.008
 okay+0.008
 crank+0.008
Neg logit contributions
 Alexa-0.013
ula-0.013
hetically-0.012
ela-0.012
 creation-0.012
Token oy
Token ID35104
Feature activation+1.10e-01

Pos logit contributions
 avoided+0.036
 okay+0.035
 respected+0.034
 fought+0.034
 crank+0.034
Neg logit contributions
 creation-0.058
 Alexa-0.057
ula-0.057
ela-0.056
 puppet-0.055
Tokenster
Token ID1706
Feature activation+5.66e-02

Pos logit contributions
 dangerous+0.213
 crank+0.213
 okay+0.210
 avoided+0.208
 quit+0.208
Neg logit contributions
 creation-0.110
 Alexa-0.110
ela-0.107
ula-0.106
hetically-0.103
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
 crank+0.024
asure+0.023
 respected+0.023
uch+0.023
 worn+0.022
Neg logit contributions
ula-0.020
 Alexa-0.019
 curtain-0.018
hetically-0.018
 creation-0.018
Token He
Token ID679
Feature activation+5.59e-02

Pos logit contributions
 belong+0.032
 respected+0.032
 dangerous+0.031
 avoided+0.031
 fought+0.031
Neg logit contributions
 Alexa-0.040
ula-0.037
 Isabel-0.037
hetically-0.036
 Ana-0.036
Token would
Token ID561
Feature activation-3.92e-02

Pos logit contributions
 ours+0.064
 OK+0.062
 okay+0.061
 yourselves+0.060
 belong+0.058
Neg logit contributions
 Alexa-0.101
 puppet-0.099
sey-0.097
 creation-0.095
hetically-0.092
Token often
Token ID1690
Feature activation+3.55e-02

Pos logit contributions
 Alexa+0.076
hetically+0.074
 creation+0.073
 curtain+0.071
ula+0.071
Neg logit contributions
 okay-0.038
 dangerous-0.038
 crank-0.038
 seen-0.037
 avoided-0.037
Token look
Token ID804
Feature activation-2.91e-02

Pos logit contributions
 dangerous+0.044
 crank+0.044
 avoided+0.043
 okay+0.043
Please+0.043
Neg logit contributions
 Alexa-0.058
 curtain-0.054
 creation-0.054
 poster-0.052
hetically-0.052
Token at
Token ID379
Feature activation+1.01e-01

Pos logit contributions
 Alexa+0.045
ula+0.043
 creation+0.042
ela+0.042
hetically+0.041
Neg logit contributions
 wanted-0.042
 respected-0.042
 crank-0.041
 fought-0.040
 avoided-0.040
Token other
Token ID584
Feature activation+4.67e-02

Pos logit contributions
 okay+0.125
 crank+0.125
 wanted+0.122
 respected+0.121
Please+0.121
Neg logit contributions
 Alexa-0.171
hetically-0.165
 creation-0.165
ula-0.164
 curtain-0.162
Token animals
Token ID4695
Feature activation+6.70e-02

Pos logit contributions
 respected+0.064
 crank+0.062
 worn+0.062
 avoided+0.062
 okay+0.062
Neg logit contributions
 Alexa-0.073
ela-0.070
 creation-0.069
hetically-0.069
 puppet-0.069
Token in
Token ID287
Feature activation+4.77e-02

Pos logit contributions
 wanted+0.074
 okay+0.071
 dangerous+0.071
 respected+0.070
 crank+0.069
Neg logit contributions
 Alexa-0.126
ula-0.126
hetically-0.122
ela-0.121
 creation-0.120
Token the
Token ID262
Feature activation+2.24e-02

Pos logit contributions
 wanted+0.049
 dangerous+0.048
 okay+0.047
 respected+0.046
 crank+0.046
Neg logit contributions
ula-0.094
 Alexa-0.093
hetically-0.092
ela-0.091
 creation-0.090
Token forest
Token ID8222
Feature activation+2.86e-02

Pos logit contributions
 fought+0.042
 respected+0.040
 lied+0.040
 belong+0.039
asure+0.039
Neg logit contributions
 doorway-0.025
 Alexa-0.025
 curtain-0.024
 circles-0.024
 puppet-0.023
Token upon
Token ID2402
Feature activation-3.74e-03

Pos logit contributions
 wanted+0.018
 okay+0.017
?"+0.017
 dangerous+0.017
 lived+0.016
Neg logit contributions
hetically-0.026
ula-0.026
 Alexa-0.026
 puppet-0.025
 poster-0.025
Token a
Token ID257
Feature activation+1.75e-02

Pos logit contributions
 Alexa+0.007
ula+0.007
 creation+0.007
hetically+0.007
ela+0.007
Neg logit contributions
 respected-0.004
 crank-0.004
 wanted-0.004
 fought-0.004
 avoided-0.004
Token time
Token ID640
Feature activation+3.67e-02

Pos logit contributions
 wanted+0.012
 respected+0.012
 okay+0.012
 dangerous+0.012
 crank+0.012
Neg logit contributions
 Alexa-0.039
ula-0.039
hetically-0.038
ela-0.038
 creation-0.038
Token,
Token ID11
Feature activation+1.73e-02

Pos logit contributions
 okay+0.034
 wanted+0.034
 dangerous+0.031
 OK+0.031
 lived+0.030
Neg logit contributions
 Alexa-0.077
ula-0.076
hetically-0.076
 creation-0.074
ela-0.073
Token there
Token ID612
Feature activation+4.22e-02

Pos logit contributions
 wanted+0.012
 crank+0.012
 respected+0.011
 dangerous+0.011
 okay+0.011
Neg logit contributions
 Alexa-0.039
ula-0.039
hetically-0.038
ela-0.038
 creation-0.037
Token lived
Token ID5615
Feature activation+1.49e-01

Pos logit contributions
 crank+0.047
 okay+0.046
 dangerous+0.046
 respected+0.045
 wanted+0.045
Neg logit contributions
 Alexa-0.077
ula-0.076
hetically-0.074
ela-0.074
 creation-0.074
Token a
Token ID257
Feature activation+1.80e-02

Pos logit contributions
 wanted+0.157
 crank+0.152
 respected+0.150
 dangerous+0.150
 belong+0.149
Neg logit contributions
 Alexa-0.287
ula-0.284
 creation-0.267
ela-0.266
hetically-0.265
Token little
Token ID1310
Feature activation+9.52e-03

Pos logit contributions
 wanted+0.019
 dangerous+0.018
 okay+0.018
 crank+0.017
 respected+0.017
Neg logit contributions
hetically-0.036
 Alexa-0.035
ula-0.035
ela-0.034
 creation-0.034
Token bunny
Token ID44915
Feature activation+3.71e-03

Pos logit contributions
 wanted+0.014
 dangerous+0.014
 crank+0.013
 okay+0.013
 respected+0.013
Neg logit contributions
 Alexa-0.015
ula-0.015
hetically-0.014
ela-0.014
 creation-0.014
Token called
Token ID1444
Feature activation+5.79e-02

Pos logit contributions
 okay+0.003
 crank+0.002
 quit+0.002
 dangerous+0.002
 wanted+0.002
Neg logit contributions
 Alexa-0.008
ula-0.008
ela-0.008
hetically-0.008
 creation-0.008
Token Bella
Token ID39304
Feature activation-9.28e-02

Pos logit contributions
 okay+0.067
 wanted+0.066
 dangerous+0.064
 OK+0.062
 crank+0.062
Neg logit contributions
hetically-0.104
 Alexa-0.103
ula-0.103
 creation-0.103
ela-0.101
Token,
Token ID11
Feature activation-6.71e-03

Pos logit contributions
 wanted+0.001
 okay+0.001
 respected+0.001
 dangerous+0.001
 quit+0.001
Neg logit contributions
ula-0.003
 Alexa-0.002
hetically-0.002
ela-0.002
 creation-0.002
Token "
Token ID366
Feature activation-1.84e-02

Pos logit contributions
 Alexa+0.009
hetically+0.009
ula+0.009
 creation+0.009
ela+0.009
Neg logit contributions
 okay-0.011
 wanted-0.011
 dangerous-0.010
 respected-0.010
 OK-0.010
TokenDo
Token ID5211
Feature activation+4.81e-03

Pos logit contributions
 Alexa+0.033
hetically+0.032
ula+0.032
ela+0.032
 poster+0.031
Neg logit contributions
 dangerous-0.022
 okay-0.021
 crank-0.021
 belong-0.021
 wanted-0.020
Token you
Token ID345
Feature activation+4.52e-02

Pos logit contributions
 crank+0.004
 respected+0.004
 worn+0.004
 wanted+0.003
 avoided+0.003
Neg logit contributions
 Alexa-0.010
ula-0.010
hetically-0.010
ela-0.010
 creation-0.010
Token believe
Token ID1975
Feature activation-6.15e-02

Pos logit contributions
Please+0.050
cream+0.048
 dangerous+0.048
 harmful+0.048
 wasteful+0.047
Neg logit contributions
 Alexa-0.079
 Julie-0.073
 performance-0.073
 creation-0.072
 circles-0.072
Token I
Token ID314
Feature activation+4.47e-02

Pos logit contributions
 Alexa+0.132
ula+0.130
ela+0.127
hetically+0.126
 creation+0.125
Neg logit contributions
 wanted-0.051
 crank-0.050
 respected-0.050
 dangerous-0.050
 okay-0.049
Token can
Token ID460
Feature activation-1.99e-02

Pos logit contributions
 crank+0.056
 badly+0.054
Please+0.051
 harmful+0.051
nam+0.050
Neg logit contributions
 Alexa-0.075
 creation-0.073
 curtain-0.073
 poster-0.073
 puppet-0.073
Token help
Token ID1037
Feature activation-8.67e-02

Pos logit contributions
 Alexa+0.037
ula+0.037
 creation+0.036
ela+0.035
 curtain+0.035
Neg logit contributions
 crank-0.023
 avoided-0.022
 dangerous-0.022
 respected-0.021
 worn-0.021
Token you
Token ID345
Feature activation+4.31e-02

Pos logit contributions
 Alexa+0.179
ula+0.172
 Julie+0.165
ela+0.165
 creation+0.164
Neg logit contributions
 respected-0.086
 missed-0.085
 avoided-0.085
 wanted-0.082
 crank-0.082
Token?".
Token ID43634
Feature activation+5.13e-02

Pos logit contributions
 fought+0.040
 avoided+0.040
 wanted+0.039
 crank+0.039
Please+0.039
Neg logit contributions
 Alexa-0.088
 creation-0.087
ula-0.084
 curtain-0.084
 poster-0.081
Token
Token ID198
Feature activation+9.75e-04

Pos logit contributions
 wanted+0.059
 okay+0.057
 respected+0.056
 crank+0.056
 dangerous+0.056
Neg logit contributions
 Alexa-0.093
ula-0.093
hetically-0.091
ela-0.090
 creation-0.090
Tokenara
Token ID3301
Feature activation+2.69e-02

Pos logit contributions
 wanted+0.115
 crank+0.115
 okay+0.114
 dangerous+0.114
 respected+0.113
Neg logit contributions
 Alexa-0.117
ula-0.117
hetically-0.115
 creation-0.114
ela-0.113
Token thought
Token ID1807
Feature activation-9.18e-02

Pos logit contributions
 belong+0.031
 dangerous+0.030
 crank+0.029
 okay+0.029
 waste+0.029
Neg logit contributions
 Alexa-0.050
hetically-0.049
ela-0.048
 creation-0.048
ula-0.047
Token for
Token ID329
Feature activation-1.81e-02

Pos logit contributions
 Alexa+0.202
hetically+0.196
 poster+0.196
 doorway+0.196
oris+0.193
Neg logit contributions
 okay-0.087
 like-0.083
 dangerous-0.077
?"-0.075
 did-0.074
Token a
Token ID257
Feature activation+2.16e-03

Pos logit contributions
 Alexa+0.038
ula+0.037
hetically+0.037
ela+0.036
 creation+0.036
Neg logit contributions
 wanted-0.016
 respected-0.015
 okay-0.015
 crank-0.015
 dangerous-0.015
Token moment
Token ID2589
Feature activation-1.17e-01

Pos logit contributions
 wanted+0.004
 okay+0.004
 dangerous+0.004
 quit+0.004
 did+0.004
Neg logit contributions
hetically-0.003
ula-0.003
 Alexa-0.003
oming-0.003
 Isabel-0.003
Token.
Token ID13
Feature activation-2.00e-03

Pos logit contributions
 Alexa+0.207
ula+0.205
hetically+0.200
ela+0.198
 creation+0.197
Neg logit contributions
 wanted-0.138
 dangerous-0.135
 crank-0.135
 respected-0.135
 okay-0.135
Token She
Token ID1375
Feature activation-1.14e-02

Pos logit contributions
 Alexa+0.003
ula+0.003
hetically+0.003
ela+0.003
 creation+0.003
Neg logit contributions
 wanted-0.003
 dangerous-0.003
 respected-0.003
 crank-0.003
 okay-0.003
Token liked
Token ID8288
Feature activation+4.42e-02

Pos logit contributions
 Alexa+0.020
 puppet+0.020
ela+0.019
 creation+0.019
hetically+0.018
Neg logit contributions
 belong-0.015
 yourselves-0.014
 ours-0.014
 waste-0.014
 property-0.014
Token her
Token ID607
Feature activation-1.14e-02

Pos logit contributions
 wanted+0.050
 dangerous+0.048
 respected+0.047
 okay+0.047
 worn+0.046
Neg logit contributions
 Alexa-0.083
ula-0.082
hetically-0.081
ela-0.079
 creation-0.078
Token pe
Token ID613
Feature activation+3.15e-02

Pos logit contributions
 Alexa+0.021
ula+0.020
hetically+0.020
ela+0.019
 creation+0.019
Neg logit contributions
 wanted-0.014
 dangerous-0.013
 okay-0.013
 respected-0.013
 crank-0.013
Tokenb
Token ID65
Feature activation+2.32e-02

Pos logit contributions
 wanted+0.036
 dangerous+0.035
 okay+0.035
 respected+0.035
 crank+0.035
Neg logit contributions
 Alexa-0.058
ula-0.057
hetically-0.056
ela-0.055
 creation-0.054
Token it
Token ID340
Feature activation+3.00e-03

Pos logit contributions
 Alexa+0.012
hetically+0.012
ela+0.011
ula+0.011
 creation+0.011
Neg logit contributions
 okay-0.008
 respected-0.007
 crank-0.007
 belong-0.007
Please-0.007
Token felt
Token ID2936
Feature activation-9.12e-03

Pos logit contributions
 crank+0.003
 okay+0.003
 respected+0.003
 dangerous+0.003
 waste+0.003
Neg logit contributions
 Alexa-0.006
hetically-0.006
ula-0.006
ela-0.006
 creation-0.006
Token so
Token ID523
Feature activation+5.97e-04

Pos logit contributions
ula+0.015
oming+0.014
 Alexa+0.014
 downward+0.014
 curtain+0.014
Neg logit contributions
asure-0.011
Please-0.011
uch-0.011
 property-0.011
 fought-0.011
Token good
Token ID922
Feature activation+1.64e-02

Pos logit contributions
asure+0.001
Please+0.001
 stole+0.001
 seen+0.001
 fought+0.001
Neg logit contributions
oming-0.001
hetically-0.001
ula-0.001
 creation-0.001
 accomplishment-0.001
Token.
Token ID13
Feature activation-7.23e-03

Pos logit contributions
 wanted+0.018
 okay+0.018
 dangerous+0.018
 respected+0.018
 crank+0.017
Neg logit contributions
 Alexa-0.031
ula-0.030
hetically-0.030
ela-0.029
 creation-0.029
Token
Token ID198
Feature activation-3.39e-03

Pos logit contributions
 Alexa+0.013
ula+0.013
hetically+0.012
ela+0.012
 creation+0.012
Neg logit contributions
 wanted-0.009
 dangerous-0.009
 respected-0.009
 okay-0.009
 crank-0.009
Token
Token ID198
Feature activation-3.11e-03

Pos logit contributions
hetically+0.007
 Alexa+0.007
 creation+0.007
ula+0.007
ela+0.007
Neg logit contributions
 wanted-0.003
 okay-0.003
 respected-0.003
 dangerous-0.003
 quit-0.003
TokenThe
Token ID464
Feature activation-1.88e-03

Pos logit contributions
 Alexa+0.006
ula+0.006
ela+0.006
hetically+0.006
 creation+0.006
Neg logit contributions
 wanted-0.003
 okay-0.003
 crank-0.003
 dangerous-0.003
 seen-0.003
Token pink
Token ID11398
Feature activation-3.83e-02

Pos logit contributions
 Alexa+0.003
ula+0.003
hetically+0.003
ela+0.003
 creation+0.003
Neg logit contributions
 wanted-0.002
 dangerous-0.002
 okay-0.002
 crank-0.002
 respected-0.002
Token dog
Token ID3290
Feature activation-1.87e-02

Pos logit contributions
 Alexa+0.065
ula+0.064
hetically+0.063
ela+0.062
 creation+0.061
Neg logit contributions
 wanted-0.049
 okay-0.048
 dangerous-0.048
 respected-0.048
 crank-0.048
Token decided
Token ID3066
Feature activation+3.61e-02

Pos logit contributions
 Alexa+0.034
 puppet+0.033
hetically+0.032
ela+0.032
 creation+0.031
Neg logit contributions
 belong-0.020
 okay-0.020
 crank-0.020
 OK-0.020
 waste-0.019
Token some
Token ID617
Feature activation+4.35e-02

Pos logit contributions
 respected+0.131
 avoided+0.129
 fought+0.129
 crank+0.128
 okay+0.128
Neg logit contributions
 Alexa-0.100
ula-0.099
 creation-0.094
 curtain-0.091
 poster-0.089
Token candy
Token ID18550
Feature activation+8.64e-02

Pos logit contributions
 avoided+0.060
 respected+0.060
 missed+0.058
 forgot+0.058
 fought+0.056
Neg logit contributions
ula-0.070
 curtain-0.068
 creation-0.068
 puppet-0.065
 poster-0.065
Token and
Token ID290
Feature activation+3.03e-02

Pos logit contributions
 respected+0.119
 avoided+0.118
 crank+0.117
 quit+0.116
 wanted+0.114
Neg logit contributions
ula-0.138
 Alexa-0.134
ela-0.129
 creation-0.128
 puppet-0.128
Token asked
Token ID1965
Feature activation+1.07e-01

Pos logit contributions
 crank+0.048
uch+0.048
 OK+0.047
 okay+0.046
 respected+0.046
Neg logit contributions
 Alexa-0.039
 creation-0.037
 curtain-0.035
 curtains-0.035
 puppet-0.035
Token her
Token ID607
Feature activation+2.60e-04

Pos logit contributions
 crank+0.183
 avoided+0.179
 worn+0.179
 dangerous+0.179
 respected+0.178
Neg logit contributions
 Alexa-0.140
ula-0.136
ela-0.127
 creation-0.125
 Isabel-0.124
Token brother
Token ID3956
Feature activation+2.60e-02

Pos logit contributions
 wanted+0.000
 okay+0.000
 dangerous+0.000
?"+0.000
 quit+0.000
Neg logit contributions
hetically-0.001
 Alexa-0.001
ula-0.001
ela-0.001
 creation-0.001
Token for
Token ID329
Feature activation-1.11e-02

Pos logit contributions
 wanted+0.026
 okay+0.026
 dangerous+0.024
 respected+0.024
 quit+0.024
Neg logit contributions
 Alexa-0.051
hetically-0.051
ula-0.051
ela-0.050
 creation-0.050
Token some
Token ID617
Feature activation+4.52e-02

Pos logit contributions
ula+0.019
 Alexa+0.018
 creation+0.018
 puppet+0.017
hetically+0.017
Neg logit contributions
 wanted-0.014
 respected-0.014
 avoided-0.014
 crank-0.013
 shouldn-0.013
Token.
Token ID13
Feature activation+1.31e-02

Pos logit contributions
 avoided+0.070
 respected+0.069
uch+0.065
 crank+0.063
 fought+0.063
Neg logit contributions
 creation-0.064
ula-0.062
 recipe-0.055
 puppet-0.054
 Alexa-0.054
Token But
Token ID887
Feature activation+4.76e-02

Pos logit contributions
 wanted+0.017
 okay+0.017
 dangerous+0.017
 respected+0.016
 crank+0.016
Neg logit contributions
 Alexa-0.022
ula-0.022
hetically-0.021
ela-0.021
 creation-0.021
Token her
Token ID607
Feature activation+1.02e-04

Pos logit contributions
 crank+0.071
 dangerous+0.071
 deserve+0.070
 seen+0.070
 avoided+0.070
Neg logit contributions
 Alexa-0.072
ula-0.067
 Julie-0.065
ela-0.065
 Isabel-0.064
Token the
Token ID262
Feature activation-1.79e-02

Pos logit contributions
 Alexa+0.016
 curtain+0.015
 poster+0.015
 creation+0.015
 circles+0.015
Neg logit contributions
 okay-0.017
uch-0.016
 crank-0.016
 OK-0.016
 respected-0.015
Token<|endoftext|>
Token ID50256
Feature activation-3.94e-02

Pos logit contributions
 creation+0.027
 curtain+0.027
 poster+0.026
 Alexa+0.026
hetically+0.026
Neg logit contributions
 respected-0.025
 wanted-0.025
 avoided-0.025
 okay-0.025
 crank-0.024
TokenS
Token ID50
Feature activation+7.49e-02

Pos logit contributions
 Alexa+0.065
ula+0.064
hetically+0.063
ela+0.062
 creation+0.062
Neg logit contributions
 wanted-0.052
 okay-0.050
 dangerous-0.050
 respected-0.050
 crank-0.050
Tokenara
Token ID3301
Feature activation+2.75e-02

Pos logit contributions
 wanted+0.111
 okay+0.107
 respected+0.106
 seen+0.106
 dangerous+0.105
Neg logit contributions
ula-0.113
 Alexa-0.112
hetically-0.110
 creation-0.105
ela-0.103
Token and
Token ID290
Feature activation+1.15e-02

Pos logit contributions
 crank+0.045
 okay+0.045
 dangerous+0.044
 belong+0.044
 worn+0.043
Neg logit contributions
 Alexa-0.036
ela-0.034
 creation-0.034
 puppet-0.032
hetically-0.032
Token Tom
Token ID4186
Feature activation+1.08e-02

Pos logit contributions
uch+0.017
 crank+0.017
 dangerous+0.017
 harmful+0.016
 unsafe+0.016
Neg logit contributions
 Alexa-0.017
 Julie-0.015
 Julia-0.015
ela-0.014
 Ana-0.014
Token were
Token ID547
Feature activation+2.05e-02

Pos logit contributions
 crank+0.013
 dangerous+0.012
 harmful+0.012
 wasteful+0.012
 Please+0.012
Neg logit contributions
 Alexa-0.017
 curtain-0.016
hetically-0.016
 creation-0.016
 Julie-0.016
Token playing
Token ID2712
Feature activation-2.17e-02

Pos logit contributions
 okay+0.026
 wanted+0.025
 dangerous+0.025
 crank+0.024
 respected+0.024
Neg logit contributions
 Alexa-0.036
ula-0.036
hetically-0.036
ela-0.035
 creation-0.035
Token in
Token ID287
Feature activation+2.83e-02

Pos logit contributions
 Alexa+0.045
ula+0.045
hetically+0.044
ela+0.044
 creation+0.043
Neg logit contributions
 wanted-0.019
 okay-0.019
 dangerous-0.019
 respected-0.018
 crank-0.018
Token the
Token ID262
Feature activation+5.65e-03

Pos logit contributions
 wanted+0.017
 waste+0.017
 dangerous+0.016
 seen+0.015
 worn+0.014
Neg logit contributions
ula-0.072
hetically-0.072
 Alexa-0.068
ail-0.068
oming-0.067
Token park
Token ID3952
Feature activation-2.88e-02

Pos logit contributions
 wanted+0.008
 dangerous+0.008
 crank+0.007
 respected+0.007
 okay+0.007
Neg logit contributions
 Alexa-0.009
ula-0.009
hetically-0.009
ela-0.009
 creation-0.009
Token hug
Token ID16225
Feature activation-5.07e-02

Pos logit contributions
ula+0.001
 Alexa+0.001
hetically+0.001
ela+0.001
 creation+0.001
Neg logit contributions
 wanted-0.001
 respected-0.001
 okay-0.001
 quit-0.001
 dangerous-0.001
Token their
Token ID511
Feature activation+6.05e-04

Pos logit contributions
 Alexa+0.089
ula+0.086
hetically+0.083
 creation+0.081
 curtain+0.081
Neg logit contributions
 seen-0.066
 stole-0.066
 worn-0.064
 belong-0.064
 dangerous-0.063
Token mom
Token ID1995
Feature activation+5.38e-03

Pos logit contributions
 wanted+0.001
 dangerous+0.001
 waste+0.001
 quit+0.001
 crank+0.001
Neg logit contributions
 Alexa-0.001
ula-0.001
hetically-0.001
ela-0.001
oris-0.001
Token.
Token ID13
Feature activation-4.09e-03

Pos logit contributions
 wanted+0.004
 okay+0.004
 like+0.004
 left+0.004
 decided+0.004
Neg logit contributions
ula-0.012
 Alexa-0.012
oris-0.012
ela-0.012
 Tracy-0.012
Token They
Token ID1119
Feature activation-9.06e-03

Pos logit contributions
 Alexa+0.007
ula+0.007
 curtain+0.006
 Tracy+0.006
ela+0.006
Neg logit contributions
 dangerous-0.006
 belong-0.006
 seen-0.006
 stole-0.006
 crank-0.006
Token hug
Token ID16225
Feature activation-5.29e-02

Pos logit contributions
 Alexa+0.017
hetically+0.016
ula+0.016
 creation+0.016
ela+0.016
Neg logit contributions
 crank-0.010
 dangerous-0.009
 okay-0.009
 wanted-0.009
 worn-0.009
Token their
Token ID511
Feature activation-2.77e-03

Pos logit contributions
 Alexa+0.090
ula+0.088
hetically+0.085
 creation+0.083
ela+0.083
Neg logit contributions
 stole-0.069
 dangerous-0.068
 seen-0.068
 wanted-0.068
 worn-0.067
Token dad
Token ID9955
Feature activation+3.73e-02

Pos logit contributions
 Alexa+0.006
ula+0.006
ela+0.006
hetically+0.006
oris+0.005
Neg logit contributions
 wanted-0.003
 dangerous-0.003
 waste-0.003
 crank-0.002
 quit-0.002
Token,
Token ID11
Feature activation-1.56e-02

Pos logit contributions
 wanted+0.031
 okay+0.030
 crank+0.028
 decided+0.027
 like+0.027
Neg logit contributions
ula-0.083
 Alexa-0.081
oris-0.080
ela-0.080
 doorway-0.077
Token their
Token ID511
Feature activation-2.89e-03

Pos logit contributions
 Alexa+0.028
ula+0.028
hetically+0.027
 creation+0.027
ela+0.026
Neg logit contributions
 wanted-0.018
 dangerous-0.017
 crank-0.017
 okay-0.017
 respected-0.017
Token grandma
Token ID49890
Feature activation+2.77e-02

Pos logit contributions
 Alexa+0.006
ula+0.006
ela+0.005
hetically+0.005
oris+0.005
Neg logit contributions
 wanted-0.003
 dangerous-0.003
 waste-0.003
 crank-0.003
 respected-0.003
Token love
Token ID1842
Feature activation-1.54e-02

Pos logit contributions
 Alexa+0.028
 puppet+0.028
 curtain+0.027
 creation+0.026
 poster+0.026
Neg logit contributions
 crank-0.018
 Please-0.018
 worn-0.018
Please-0.018
 badly-0.017
Token you
Token ID345
Feature activation+3.24e-02

Pos logit contributions
 Alexa+0.033
ula+0.032
 creation+0.032
 curtain+0.031
hetically+0.031
Neg logit contributions
 respected-0.012
 crank-0.012
 quit-0.012
 wanted-0.012
 worn-0.012
Token."
Token ID526
Feature activation+2.67e-02

Pos logit contributions
 worn+0.039
 avoided+0.038
 seen+0.038
 stole+0.038
 respected+0.038
Neg logit contributions
 Alexa-0.057
ula-0.054
 puppet-0.053
 creation-0.053
 curtain-0.053
Token
Token ID198
Feature activation-1.41e-02

Pos logit contributions
 wanted+0.020
 lived+0.016
 okay+0.016
 did+0.016
 Please+0.016
Neg logit contributions
hetically-0.063
 creation-0.062
ela-0.061
ula-0.061
oming-0.059
Token
Token ID198
Feature activation-1.30e-02

Pos logit contributions
hetically+0.034
 creation+0.033
 Alexa+0.033
ula+0.032
ela+0.032
Neg logit contributions
 wanted-0.009
 lived-0.009
?"-0.009
 dangerous-0.008
 respected-0.008
TokenL
Token ID43
Feature activation-2.04e-02

Pos logit contributions
 Alexa+0.029
 creation+0.029
ula+0.028
hetically+0.028
ela+0.027
Neg logit contributions
 wanted-0.010
 respected-0.010
Please-0.010
 dangerous-0.010
 decided-0.010
Tokenuna
Token ID9613
Feature activation-7.31e-02

Pos logit contributions
hetically+0.037
 creation+0.036
 poster+0.035
 Alexa+0.034
 doorway+0.034
Neg logit contributions
?"-0.024
 okay-0.024
 wanted-0.024
 did-0.024
 OK-0.023
Token feels
Token ID5300
Feature activation-4.46e-02

Pos logit contributions
ula+0.135
hetically+0.128
ela+0.118
 Alexa+0.118
oris+0.115
Neg logit contributions
 wanted-0.102
 did-0.096
?"-0.095
 lied-0.092
 lived-0.091
Token happy
Token ID3772
Feature activation+3.82e-03

Pos logit contributions
 Alexa+0.070
 puppet+0.066
ula+0.065
 downward+0.063
 curtain+0.062
Neg logit contributions
 fought-0.061
asure-0.060
uch-0.060
 stole-0.058
Please-0.057
Token too
Token ID1165
Feature activation+3.78e-02

Pos logit contributions
 wanted+0.004
 okay+0.004
 crank+0.003
 respected+0.003
 dangerous+0.003
Neg logit contributions
 Alexa-0.008
ula-0.008
hetically-0.008
ela-0.008
oris-0.007
Token.
Token ID13
Feature activation-1.32e-02

Pos logit contributions
 respected+0.054
 avoided+0.054
 stole+0.054
 fought+0.053
Please+0.053
Neg logit contributions
ula-0.057
 Alexa-0.055
hetically-0.052
 creation-0.052
ela-0.051
Token The
Token ID383
Feature activation+9.48e-04

Pos logit contributions
 Alexa+0.036
ula+0.034
 Julie+0.032
 Isabel+0.032
 Tracy+0.031
Neg logit contributions
 dangerous-0.030
 respected-0.030
 fought-0.030
 stole-0.029
 wanted-0.029
Token rain
Token ID6290
Feature activation+3.62e-02

Pos logit contributions
 respected+0.001
 crank+0.001
 wanted+0.001
 okay+0.001
 avoided+0.001
Neg logit contributions
 Alexa-0.002
 creation-0.002
ula-0.002
 curtain-0.002
 poster-0.002
Token is
Token ID318
Feature activation-4.16e-02

Pos logit contributions
 wanted+0.045
 okay+0.044
 dangerous+0.044
 respected+0.043
 crank+0.043
Neg logit contributions
 Alexa-0.062
ula-0.062
hetically-0.060
ela-0.060
 creation-0.059
Token beautiful
Token ID4950
Feature activation-2.88e-02

Pos logit contributions
 creation+0.067
 curtain+0.067
 Alexa+0.064
oming+0.063
ula+0.063
Neg logit contributions
 stole-0.056
Please-0.053
 fought-0.052
Fine-0.052
uch-0.051
Token and
Token ID290
Feature activation-1.62e-02

Pos logit contributions
 creation+0.042
 Alexa+0.041
 curtain+0.040
ula+0.040
 puppet+0.039
Neg logit contributions
 respected-0.042
 avoided-0.042
 stole-0.042
 worn-0.042
 belong-0.042
Token wonderful
Token ID7932
Feature activation-5.92e-02

Pos logit contributions
 Alexa+0.025
 curtain+0.025
 creation+0.024
 Ana+0.024
 puppet+0.023
Neg logit contributions
asure-0.022
Please-0.022
uch-0.022
 stole-0.021
 fought-0.021
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
 creation+0.096
 Alexa+0.094
 curtain+0.092
ula+0.092
 puppet+0.090
Neg logit contributions
 respected-0.078
 crank-0.078
 avoided-0.077
 okay-0.077
 stole-0.077
Token fun
Token ID1257
Feature activation-1.72e-02

Pos logit contributions
 Alexa+0.028
 Julie+0.025
 Ana+0.025
 curtain+0.025
 puppet+0.025
Neg logit contributions
Please-0.022
asure-0.021
uch-0.021
 badly-0.021
 stole-0.021
Token.
Token ID13
Feature activation-2.15e-02

Pos logit contributions
 Alexa+0.029
 creation+0.029
ula+0.029
 curtain+0.028
ela+0.028
Neg logit contributions
 crank-0.021
 respected-0.021
 avoided-0.021
 stole-0.021
 dangerous-0.021
Token Can
Token ID1680
Feature activation+3.43e-02

Pos logit contributions
 Alexa+0.036
ula+0.035
 Julie+0.032
 Isabel+0.032
 Tracy+0.032
Neg logit contributions
 dangerous-0.029
 respected-0.029
 wanted-0.028
 fought-0.028
 stole-0.028
Token you
Token ID345
Feature activation+2.38e-02

Pos logit contributions
 seen+0.044
 respected+0.043
 dangerous+0.043
 avoided+0.042
 harmful+0.042
Neg logit contributions
 Alexa-0.059
 poster-0.057
ula-0.056
 Isabel-0.055
hetically-0.054
Token them
Token ID606
Feature activation-3.36e-02

Pos logit contributions
 Alexa+0.062
ula+0.061
hetically+0.060
ela+0.059
 creation+0.059
Neg logit contributions
 wanted-0.046
 crank-0.045
 respected-0.044
 okay-0.044
 dangerous-0.044
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
 Alexa+0.062
ula+0.062
hetically+0.060
ela+0.060
 creation+0.059
Neg logit contributions
 wanted-0.037
 okay-0.036
 dangerous-0.036
 respected-0.036
 crank-0.036
Token
Token ID198
Feature activation-1.70e-02

Pos logit contributions
 Alexa+0.026
ula+0.025
 Julie+0.023
 Isabel+0.023
ela+0.023
Neg logit contributions
 belong-0.023
uch-0.022
 stole-0.022
 crank-0.022
 dangerous-0.022
Token
Token ID198
Feature activation-1.59e-02

Pos logit contributions
 Alexa+0.030
ula+0.030
hetically+0.029
ela+0.029
 creation+0.028
Neg logit contributions
 wanted-0.020
 crank-0.020
 okay-0.020
 respected-0.020
 dangerous-0.020
Token"
Token ID1
Feature activation-2.16e-02

Pos logit contributions
 Alexa+0.027
ula+0.027
ela+0.026
hetically+0.025
 poster+0.025
Neg logit contributions
 belong-0.020
 crank-0.020
 okay-0.020
 quit-0.019
 wanted-0.019
TokenOK
Token ID11380
Feature activation-1.81e-02

Pos logit contributions
ula+0.042
 Alexa+0.042
hetically+0.040
 poster+0.040
ela+0.039
Neg logit contributions
 seen-0.022
 belong-0.022
 dangerous-0.021
 okay-0.021
 crank-0.020
Token,
Token ID11
Feature activation-2.17e-02

Pos logit contributions
 Alexa+0.031
ula+0.031
ela+0.030
 creation+0.030
hetically+0.030
Neg logit contributions
 respected-0.022
 wanted-0.022
 crank-0.022
 avoided-0.022
 worn-0.021
Token OK
Token ID7477
Feature activation+8.15e-02

Pos logit contributions
ula+0.036
 Alexa+0.035
 poster+0.034
ela+0.033
 curtain+0.033
Neg logit contributions
 lived-0.029
 seen-0.029
 respected-0.028
uch-0.028
 avoided-0.028
Token.
Token ID13
Feature activation-1.33e-02

Pos logit contributions
 avoided+0.095
 respected+0.094
 worn+0.092
 crank+0.092
 seen+0.089
Neg logit contributions
 Alexa-0.143
 creation-0.138
ula-0.137
ela-0.135
 poster-0.135
Token I
Token ID314
Feature activation+3.05e-02

Pos logit contributions
 Alexa+0.024
hetically+0.023
 Ana+0.022
 Isabel+0.022
 Julie+0.022
Neg logit contributions
 seen-0.016
 dangerous-0.016
 worn-0.015
 decided-0.015
 harmful-0.015
Token won
Token ID1839
Feature activation-2.25e-02

Pos logit contributions
nam+0.050
chard+0.046
 yourselves+0.044
Please+0.043
asure+0.043
Neg logit contributions
 puppet-0.039
 creation-0.038
ela-0.036
ail-0.036
ula-0.035
Token the
Token ID262
Feature activation+9.80e-04

Pos logit contributions
 Alexa+0.045
ela+0.043
 creation+0.043
 Julie+0.041
 curtain+0.041
Neg logit contributions
Please-0.035
 crank-0.035
 worn-0.032
 respected-0.032
 dangerous-0.032
Token forest
Token ID8222
Feature activation+1.16e-02

Pos logit contributions
 respected+0.001
 wanted+0.001
 crank+0.001
 seen+0.001
 avoided+0.001
Neg logit contributions
 creation-0.001
hetically-0.001
ula-0.001
 curtain-0.001
 Alexa-0.001
Token was
Token ID373
Feature activation-4.97e-03

Pos logit contributions
 crank+0.012
 okay+0.012
 wanted+0.012
Please+0.012
 worn+0.011
Neg logit contributions
 creation-0.021
 Alexa-0.021
hetically-0.021
ela-0.021
ula-0.021
Token destroyed
Token ID6572
Feature activation+3.15e-02

Pos logit contributions
 creation+0.008
 doorway+0.008
 curtain+0.007
oming+0.007
ula+0.007
Neg logit contributions
Please-0.006
 wanted-0.006
 Please-0.006
Fine-0.006
 crank-0.006
Token.
Token ID13
Feature activation-7.70e-05

Pos logit contributions
 wanted+0.038
 okay+0.037
 respected+0.037
 crank+0.037
 dangerous+0.037
Neg logit contributions
 Alexa-0.055
ula-0.054
hetically-0.053
ela-0.053
 creation-0.052
Token Sad
Token ID14668
Feature activation-1.12e-01

Pos logit contributions
 Alexa+0.000
ula+0.000
hetically+0.000
ela+0.000
 creation+0.000
Neg logit contributions
 wanted-0.000
 dangerous-0.000
 respected-0.000
 crank-0.000
 belong-0.000
Token indeed
Token ID5600
Feature activation-1.04e-02

Pos logit contributions
ula+0.188
 Alexa+0.185
ela+0.174
 poster+0.173
hetically+0.172
Neg logit contributions
 wanted-0.156
 respected-0.147
 dangerous-0.146
 quit-0.145
 okay-0.144
Token.
Token ID13
Feature activation-9.05e-04

Pos logit contributions
 creation+0.019
 Alexa+0.019
ela+0.019
hetically+0.019
 puppet+0.018
Neg logit contributions
Please-0.010
 crank-0.010
 waste-0.010
 OK-0.009
asure-0.009
Token<|endoftext|>
Token ID50256
Feature activation-2.79e-02

Pos logit contributions
 Alexa+0.001
ula+0.001
hetically+0.001
ela+0.001
 creation+0.001
Neg logit contributions
 wanted-0.001
 dangerous-0.001
 okay-0.001
 crank-0.001
 respected-0.001
TokenOnce
Token ID7454
Feature activation+1.72e-02

Pos logit contributions
ula+0.054
hetically+0.054
 Alexa+0.054
 creation+0.053
ela+0.053
Neg logit contributions
 wanted-0.029
 lived-0.027
 okay-0.027
 dangerous-0.027
 quit-0.026
Token upon
Token ID2402
Feature activation-1.01e-02

Pos logit contributions
 wanted+0.021
 okay+0.021
?"+0.021
 dangerous+0.020
 lived+0.020
Neg logit contributions
hetically-0.032
ula-0.031
 Alexa-0.031
 puppet-0.030
 poster-0.029
Token,
Token ID11
Feature activation+5.73e-03

Pos logit contributions
 wanted+0.077
 okay+0.075
 dangerous+0.074
 respected+0.074
 crank+0.073
Neg logit contributions
 Alexa-0.090
ula-0.089
hetically-0.088
ela-0.085
 creation-0.085
Token Jenny
Token ID28437
Feature activation-5.24e-02

Pos logit contributions
 wanted+0.008
 dangerous+0.008
 respected+0.008
 okay+0.008
 crank+0.008
Neg logit contributions
 Alexa-0.009
ula-0.009
hetically-0.008
ela-0.008
 creation-0.008
Token took
Token ID1718
Feature activation-5.05e-03

Pos logit contributions
 Alexa+0.106
ela+0.103
ula+0.102
 creation+0.101
hetically+0.101
Neg logit contributions
 okay-0.050
 dangerous-0.049
 crank-0.049
 belong-0.049
 waste-0.047
Token a
Token ID257
Feature activation+3.46e-03

Pos logit contributions
 Alexa+0.009
hetically+0.009
ula+0.009
 creation+0.009
ela+0.009
Neg logit contributions
 wanted-0.005
 okay-0.005
 dangerous-0.005
 quit-0.005
 respected-0.005
Token deep
Token ID2769
Feature activation+1.71e-02

Pos logit contributions
 dangerous+0.004
 worn+0.004
 wanted+0.004
 waste+0.004
 quit+0.004
Neg logit contributions
hetically-0.007
 Alexa-0.006
ula-0.006
 Isabel-0.006
ela-0.006
Token breath
Token ID8033
Feature activation-1.25e-01

Pos logit contributions
 like+0.029
 touched+0.028
 left+0.028
 kept+0.028
 wanted+0.028
Neg logit contributions
ula-0.024
 Alexa-0.024
hetically-0.022
ilda-0.022
pill-0.021
Token and
Token ID290
Feature activation+1.54e-02

Pos logit contributions
 Alexa+0.256
 creation+0.243
hetically+0.240
 poster+0.240
ula+0.237
Neg logit contributions
 wanted-0.122
 like-0.122
 okay-0.120
 decided-0.116
 seen-0.113
Token parted
Token ID37813
Feature activation+8.33e-02

Pos logit contributions
 okay+0.018
 belong+0.018
 crank+0.018
 property+0.018
 OK+0.017
Neg logit contributions
 Alexa-0.028
ela-0.027
 Julie-0.027
 puppet-0.026
 creation-0.026
Token the
Token ID262
Feature activation+6.09e-03

Pos logit contributions
 wanted+0.101
 okay+0.099
 dangerous+0.099
 respected+0.098
 crank+0.097
Neg logit contributions
 Alexa-0.147
ula-0.145
hetically-0.143
 creation-0.139
ela-0.139
Token water
Token ID1660
Feature activation+6.30e-02

Pos logit contributions
 respected+0.008
 wanted+0.008
 okay+0.008
 avoided+0.008
 crank+0.008
Neg logit contributions
 creation-0.009
 Alexa-0.009
ula-0.009
hetically-0.009
ela-0.009
Token with
Token ID351
Feature activation-1.60e-02

Pos logit contributions
 wanted+0.062
 okay+0.058
 decided+0.057
 like+0.056
 dangerous+0.056
Neg logit contributions
 Alexa-0.129
ula-0.124
hetically-0.121
ela-0.121
 poster-0.120
Token he
Token ID339
Feature activation+3.56e-02

Pos logit contributions
 wanted+0.013
 respected+0.013
 crank+0.013
 okay+0.013
 belong+0.013
Neg logit contributions
 Alexa-0.021
ula-0.020
hetically-0.019
ela-0.019
 creation-0.019
Token noticed
Token ID6810
Feature activation+1.10e-02

Pos logit contributions
 belong+0.044
 OK+0.041
 okay+0.041
 ours+0.039
 crank+0.039
Neg logit contributions
 Alexa-0.065
 creation-0.062
 puppet-0.062
hetically-0.059
ela-0.059
Token some
Token ID617
Feature activation+4.10e-02

Pos logit contributions
 crank+0.014
 belong+0.014
 okay+0.013
 Please+0.013
uch+0.013
Neg logit contributions
 Alexa-0.019
ula-0.019
 creation-0.018
hetically-0.018
 puppet-0.018
Token other
Token ID584
Feature activation+3.99e-02

Pos logit contributions
 crank+0.051
 okay+0.050
 wanted+0.050
 respected+0.049
 avoided+0.049
Neg logit contributions
 Alexa-0.067
ula-0.067
 creation-0.066
hetically-0.066
ela-0.064
Token cater
Token ID20825
Feature activation-3.31e-02

Pos logit contributions
 wanted+0.045
 dangerous+0.043
 okay+0.043
 respected+0.042
 crank+0.042
Neg logit contributions
 Alexa-0.075
ula-0.074
hetically-0.073
ela-0.071
 creation-0.070
Tokenpill
Token ID27215
Feature activation-1.42e-01

Pos logit contributions
 Alexa+0.024
ula+0.022
ela+0.022
hetically+0.021
 creation+0.021
Neg logit contributions
 wanted-0.075
 dangerous-0.074
 respected-0.073
 avoided-0.073
 seen-0.073
Tokenars
Token ID945
Feature activation+6.83e-02

Pos logit contributions
 Alexa+0.267
ula+0.263
hetically+0.259
ela+0.255
 creation+0.254
Neg logit contributions
 wanted-0.153
 okay-0.149
 dangerous-0.148
 crank-0.148
 respected-0.148
Token stretching
Token ID20880
Feature activation-3.40e-02

Pos logit contributions
 wanted+0.084
 okay+0.082
 dangerous+0.081
 respected+0.081
 crank+0.080
Neg logit contributions
 Alexa-0.119
ula-0.117
hetically-0.115
ela-0.113
 creation-0.112
Token towards
Token ID3371
Feature activation+1.67e-02

Pos logit contributions
 Alexa+0.060
ula+0.059
hetically+0.058
 creation+0.057
ela+0.057
Neg logit contributions
 wanted-0.040
 crank-0.040
 okay-0.040
 respected-0.039
 dangerous-0.039
Token it
Token ID340
Feature activation+2.65e-02

Pos logit contributions
 wanted+0.020
 okay+0.020
 crank+0.020
 dangerous+0.020
 respected+0.020
Neg logit contributions
 Alexa-0.029
ula-0.029
hetically-0.028
ela-0.028
 creation-0.028
Token too
Token ID1165
Feature activation+6.23e-02

Pos logit contributions
 wanted+0.030
 okay+0.030
 crank+0.030
 respected+0.030
 dangerous+0.029
Neg logit contributions
 Alexa-0.048
ula-0.047
hetically-0.047
 creation-0.046
ela-0.046
Token bottle
Token ID9294
Feature activation+1.96e-02

Pos logit contributions
 wanted+0.011
 respected+0.011
 okay+0.011
 crank+0.011
 dangerous+0.011
Neg logit contributions
 Alexa-0.007
ula-0.007
hetically-0.006
 creation-0.006
ela-0.006
Token and
Token ID290
Feature activation+1.30e-02

Pos logit contributions
 decided+0.016
 wanted+0.016
 left+0.015
 like+0.015
 kept+0.014
Neg logit contributions
 Alexa-0.043
ula-0.042
 Tracy-0.041
ela-0.040
 creation-0.040
Token took
Token ID1718
Feature activation-5.58e-03

Pos logit contributions
 belong+0.014
 property+0.014
 crank+0.014
 okay+0.014
 OK+0.014
Neg logit contributions
 Alexa-0.026
 Julie-0.024
ela-0.024
 creation-0.024
 puppet-0.024
Token a
Token ID257
Feature activation+5.05e-04

Pos logit contributions
 Alexa+0.010
ula+0.010
 creation+0.009
hetically+0.009
 curtain+0.009
Neg logit contributions
 respected-0.007
 wanted-0.007
 crank-0.007
 dangerous-0.007
Please-0.007
Token big
Token ID1263
Feature activation-1.27e-02

Pos logit contributions
 dangerous+0.001
 wanted+0.001
 waste+0.001
 okay+0.001
 worn+0.001
Neg logit contributions
hetically-0.001
 Alexa-0.001
ula-0.001
ela-0.001
 Isabel-0.001
Token breath
Token ID8033
Feature activation-1.26e-01

Pos logit contributions
ula+0.025
 Alexa+0.025
hetically+0.024
ela+0.024
 Isabel+0.023
Neg logit contributions
 waste-0.014
 dangerous-0.013
 wanted-0.013
?"-0.013
 stole-0.013
Token of
Token ID286
Feature activation-3.73e-02

Pos logit contributions
 Alexa+0.280
hetically+0.268
 creation+0.263
 poster+0.261
ela+0.260
Neg logit contributions
 like-0.110
 wanted-0.106
 okay-0.106
 seen-0.100
 dangerous-0.097
Token the
Token ID262
Feature activation+4.21e-03

Pos logit contributions
 Alexa+0.063
 creation+0.062
 curtain+0.061
ula+0.060
hetically+0.060
Neg logit contributions
 wanted-0.047
 respected-0.046
 fought-0.046
 okay-0.045
 crank-0.045
Token oxygen
Token ID11863
Feature activation+2.65e-02

Pos logit contributions
 respected+0.006
 wanted+0.006
 seen+0.006
 avoided+0.006
 okay+0.006
Neg logit contributions
 Alexa-0.006
ula-0.006
 creation-0.006
hetically-0.006
 curtain-0.006
Token.
Token ID13
Feature activation+1.84e-03

Pos logit contributions
 wanted+0.036
 dangerous+0.035
 okay+0.035
 respected+0.035
 crank+0.034
Neg logit contributions
 Alexa-0.043
ula-0.042
hetically-0.042
ela-0.041
 creation-0.040
Token But
Token ID887
Feature activation+3.37e-02

Pos logit contributions
 wanted+0.003
 okay+0.002
 dangerous+0.002
 respected+0.002
 crank+0.002
Neg logit contributions
 Alexa-0.003
ula-0.003
hetically-0.003
ela-0.003
 creation-0.003
Token watched
Token ID7342
Feature activation-3.35e-02

Pos logit contributions
 crank+0.040
 belong+0.040
 okay+0.039
 waste+0.038
 OK+0.037
Neg logit contributions
 Alexa-0.076
ela-0.073
hetically-0.072
 puppet-0.071
 creation-0.071
Token as
Token ID355
Feature activation+2.36e-02

Pos logit contributions
 Alexa+0.074
ela+0.073
ula+0.073
hetically+0.072
 poster+0.071
Neg logit contributions
 okay-0.027
 wanted-0.026
 like-0.025
 dangerous-0.025
 quit-0.024
Token the
Token ID262
Feature activation+3.78e-03

Pos logit contributions
 wanted+0.022
 dangerous+0.021
 decided+0.019
 respected+0.019
 okay+0.019
Neg logit contributions
ula-0.051
 Alexa-0.050
hetically-0.050
 creation-0.049
ela-0.049
Token rabbit
Token ID22746
Feature activation-4.19e-02

Pos logit contributions
 wanted+0.006
 dangerous+0.005
 okay+0.005
 worn+0.005
 decided+0.005
Neg logit contributions
ula-0.006
 Alexa-0.006
ela-0.006
 Isabel-0.006
 Tracy-0.006
Token hopped
Token ID45230
Feature activation-1.08e-03

Pos logit contributions
 Alexa+0.075
hetically+0.074
 creation+0.072
 puppet+0.071
ela+0.071
Neg logit contributions
 crank-0.050
 okay-0.048
 dangerous-0.046
 waste-0.046
 belong-0.046
Token circles
Token ID13332
Feature activation-1.57e-01

Pos logit contributions
 Alexa+0.002
 creation+0.002
hetically+0.002
 poster+0.002
ula+0.002
Neg logit contributions
 like-0.001
 decided-0.001
 wanted-0.001
 after-0.001
 okay-0.001
Token around
Token ID1088
Feature activation-4.79e-03

Pos logit contributions
 Alexa+0.342
 poster+0.339
ula+0.334
ela+0.333
oris+0.326
Neg logit contributions
 like-0.149
 wanted-0.125
 okay-0.124
 decided-0.122
 when-0.120
Token him
Token ID683
Feature activation+3.92e-04

Pos logit contributions
ula+0.011
 Alexa+0.010
ela+0.010
hetically+0.010
 poster+0.010
Neg logit contributions
 wanted-0.004
 like-0.004
 decided-0.004
 fought-0.004
 dangerous-0.004
Token.
Token ID13
Feature activation+7.77e-03

Pos logit contributions
 wanted+0.000
 okay+0.000
 crank+0.000
 dangerous+0.000
 respected+0.000
Neg logit contributions
 Alexa-0.001
ula-0.001
hetically-0.001
ela-0.001
 creation-0.001
Token He
Token ID679
Feature activation+3.55e-02

Pos logit contributions
 dangerous+0.010
 wanted+0.010
 respected+0.010
 okay+0.010
 crank+0.010
Neg logit contributions
 Alexa-0.013
ula-0.013
hetically-0.013
ela-0.012
 creation-0.012
Token was
Token ID373
Feature activation-6.18e-03

Pos logit contributions
 belong+0.045
 crank+0.043
 okay+0.043
 OK+0.042
 waste+0.041
Neg logit contributions
 Alexa-0.064
 puppet-0.061
ela-0.061
 creation-0.060
hetically-0.059
Token hill
Token ID12788
Feature activation-5.28e-02

Pos logit contributions
 dangerous+0.010
 waste+0.009
 wanted+0.009
 quit+0.009
 worn+0.009
Neg logit contributions
 Alexa-0.010
ula-0.010
hetically-0.009
aby-0.009
ela-0.009
Token she
Token ID673
Feature activation-2.35e-02

Pos logit contributions
 Alexa+0.106
ula+0.105
hetically+0.102
 creation+0.100
ela+0.100
Neg logit contributions
 wanted-0.053
 okay-0.050
 dangerous-0.050
 respected-0.049
 quit-0.049
Token was
Token ID373
Feature activation+3.67e-04

Pos logit contributions
 Alexa+0.046
ela+0.045
 creation+0.044
 puppet+0.044
 Isabel+0.043
Neg logit contributions
 okay-0.025
 OK-0.023
 belong-0.023
 crank-0.022
 waste-0.022
Token out
Token ID503
Feature activation-3.50e-04

Pos logit contributions
 wanted+0.000
 respected+0.000
 dangerous+0.000
 crank+0.000
 okay+0.000
Neg logit contributions
 Alexa-0.001
ula-0.001
ela-0.001
hetically-0.001
 creation-0.001
Token of
Token ID286
Feature activation-3.75e-02

Pos logit contributions
 Alexa+0.001
ula+0.001
hetically+0.001
oris+0.001
ela+0.001
Neg logit contributions
 wanted-0.000
 seen-0.000
 like-0.000
 dangerous-0.000
 okay-0.000
Token breath
Token ID8033
Feature activation-1.26e-01

Pos logit contributions
 Alexa+0.070
ula+0.070
hetically+0.069
ela+0.066
 creation+0.065
Neg logit contributions
 wanted-0.044
 dangerous-0.043
 worn-0.042
 seen-0.042
 respected-0.041
Token.
Token ID13
Feature activation+1.61e-03

Pos logit contributions
 Alexa+0.219
ula+0.218
hetically+0.213
ela+0.210
 creation+0.208
Neg logit contributions
 wanted-0.151
 respected-0.149
 crank-0.148
 dangerous-0.148
 okay-0.147
Token Her
Token ID2332
Feature activation+3.20e-02

Pos logit contributions
 wanted+0.002
 dangerous+0.002
 okay+0.002
 respected+0.002
 crank+0.002
Neg logit contributions
 Alexa-0.003
ula-0.003
hetically-0.002
ela-0.002
 creation-0.002
Token clean
Token ID3424
Feature activation+8.20e-03

Pos logit contributions
 wanted+0.039
 okay+0.039
 dangerous+0.038
 crank+0.038
 respected+0.038
Neg logit contributions
 Alexa-0.056
ula-0.055
hetically-0.054
ela-0.053
 creation-0.053
Token shoe
Token ID17292
Feature activation+8.16e-03

Pos logit contributions
 wanted+0.010
 quit+0.010
 dangerous+0.010
 worn+0.010
 decided+0.010
Neg logit contributions
 Alexa-0.015
ula-0.014
hetically-0.014
oris-0.013
 poster-0.013
Token was
Token ID373
Feature activation-2.49e-03

Pos logit contributions
 crank+0.009
 respected+0.009
 okay+0.009
 dangerous+0.009
 wanted+0.009
Neg logit contributions
 Alexa-0.015
hetically-0.014
ula-0.014
 creation-0.014
ela-0.014