TOP ACTIVATIONS
MAX = 0.098

a lick and a w ag . He was nice .
other plants ." The f auc et said , " Yes
had a long , w ag gy tail and loved to
a cookie . They w ag their tails . They are
each other . They w ag their tails and lick each
't work . The f auc et seemed broken .
." The f auc et was happy again ,
brown dog with a w ag gy tail . Tim my
Lily and Tom and w ag his tail .
a bucket to the f auc et and filled it with
day , a kind f auc et saw the sad boy
, silly ! I w ag my tail when I am
said , " The f auc et is too tight .
strong body and a w ag gy tail . One day
Lucky the dog would w ag his tail around and it
She turned on the f auc et and filled the sink
him , he would w ag his tail happily and act
nothing happened . The f auc et just sat there ,
flew off with a w ag of his tail ! <|endoftext|>
She turned on the f auc et and icy water came

MIN ACTIVATIONS
MIN = -0.088

mom was right â she was happy being
she realized something â the yacht wasn 't
who can tell me what ' we igh ' means ?"
't believe it â this one feather seemed
Steve was amazed â his wings started to
have a label that says ' cars '. I will put
have a label that says ' anim als '. I will
creature was right â B enny had gotten
they were happy â they had prepared a
something very special â the rise of the
not real ice â it was fake !
what to do â S oph ie used
something was wrong â all the water had
of the animal that says ' wo of ' ?"
thought about the night  ´ s adventure as she drifted
a dirty animal that says ' s que ak '. And
ly colors and it says ' Welcome ' at the top
was so relieved â he knew that if
is a sign that says ' Hot el ' with a
something else too â it was a sour

INTERVAL 0.052 - 0.098
CONTAINS 0.071%

One day as she che wed she saw something new
and started to use her cr ay ons to scrib ble
But then they heard a loud noise . They looked outside
can use the red past el first , and then give
quiet like you ." Ch ir py decided to help Wh

INTERVAL 0.005 - 0.052
CONTAINS 36.455%

dark cloud was just a picture and not something to be
. His mom heard him and came to help
<|endoftext|> Once upon a time there lived a little girl .
played together in the park . They were very happy .
showed it to her brother and they started to play .

INTERVAL -0.041 - 0.005
CONTAINS 62.929%

that hotel every weekend to explore and to have fun .
the nation had been transformed into something new and wonderful .
She watered it every day and sang to it . She
treasure . The sun slowly sank below the
fallen before . It was a long way down , and

INTERVAL -0.088 - -0.041
CONTAINS 0.544%

when she first arrived . <|endoftext|> Once there was a little
magnet , feeling lonely . <|endoftext|> Once upon a time ,
fun to play with . <|endoftext|> Once there was a little
wear for too long . <|endoftext|> Once upon a time ,
house nice and bright . <|endoftext|> Once upon a time there
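
The four INTERVAL ranges above are consistent with splitting [MIN, MAX] = [-0.088, 0.098] into four equal-width bins (interior edges near -0.0415, 0.005 and 0.0515), and the CONTAINS percentages sum to roughly 100%. A minimal Python sketch of that kind of bucketing is given here. The function name interval_percentages and the equal-width rule are assumptions made for illustration; they are not taken from the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def interval_percentages(
    activations: Sequence[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Split [min, max] into n_bins equal-width intervals (an assumed rule)
    and return (low_edge, high_edge, percent_of_values) for each interval,
    ordered from the lowest interval to the highest."""
    lo, hi = min(activations), max(activations)
    if hi == lo:
        # Degenerate case: every value is identical, so one interval holds all.
        return [(lo, hi, 100.0)]

    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # The maximum value would index one past the end, so clamp it
        # into the last interval.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1

    total = len(activations)
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in range(n_bins)
    ]


if __name__ == "__main__":
    # Toy activations, not data from the dashboard.
    toy = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]
    for low, high, pct in interval_percentages(toy):
        print(f"INTERVAL {low:.3f} - {high:.3f}  CONTAINS {pct:.3f}%")
```

The sketch returns intervals in ascending order, whereas the listing above runs from the highest interval down to the lowest; reversing the returned list gives the same ordering.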
Token a
Token ID257
Feature activation+1.40e-02

Pos logit contributions
 aids+0.025
 echoed+0.023
 lull+0.023
 face+0.022
tr+0.022
Neg logit contributions
-0.024
iw-0.022
ll-0.020
 least-0.020
ve-0.019
Token lick
Token ID36728
Feature activation+2.58e-02

Pos logit contributions
 aids+0.025
 jokes+0.023
 echoed+0.022
iet+0.021
ena+0.021
Neg logit contributions
-0.028
iw-0.024
ve-0.022
ll-0.021
field-0.020
Token and
Token ID290
Feature activation+5.20e-03

Pos logit contributions
 aids+0.058
 jokes+0.054
 lull+0.052
 face+0.051
 repeat+0.051
Neg logit contributions
-0.040
iw-0.032
ve-0.027
ll-0.027
ner-0.026
Token a
Token ID257
Feature activation+2.84e-02

Pos logit contributions
 aids+0.010
iet+0.009
tr+0.009
 lull+0.009
 face+0.008
Neg logit contributions
-0.009
iw-0.008
 searching-0.007
 least-0.007
field-0.006
Token w
Token ID266
Feature activation+3.68e-02

Pos logit contributions
 aids+0.051
iet+0.047
ena+0.046
 jokes+0.044
 echoed+0.044
Neg logit contributions
-0.058
iw-0.049
ve-0.043
 busy-0.038
 exhausted-0.037
Tokenag
Token ID363
Feature activation+9.20e-02

Pos logit contributions
 aids+0.055
 lull+0.051
 echoed+0.051
 face+0.051
 jokes+0.050
Neg logit contributions
-0.085
ve-0.070
iw-0.070
ll-0.067
ner-0.061
Token.
Token ID13
Feature activation+1.96e-02

Pos logit contributions
 jokes+0.146
 aids+0.144
 repeat+0.137
 face+0.130
 lull+0.128
Neg logit contributions
-0.183
ve-0.157
iw-0.140
field-0.138
mer-0.138
Token He
Token ID679
Feature activation+1.11e-03

Pos logit contributions
 aids+0.030
 face+0.028
 jokes+0.028
 lull+0.027
 repeat+0.027
Neg logit contributions
-0.046
iw-0.038
ve-0.034
ll-0.034
ner-0.033
Token was
Token ID373
Feature activation+1.75e-02

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.002
 lull+0.002
 repeat+0.002
Neg logit contributions
-0.002
iw-0.002
ll-0.002
ve-0.002
field-0.002
Token nice
Token ID3621
Feature activation-8.77e-03

Pos logit contributions
 aids+0.037
 face+0.035
tr+0.035
 jokes+0.034
 lull+0.034
Neg logit contributions
-0.027
iw-0.022
ve-0.020
ll-0.020
 least-0.019
Token.
Token ID13
Feature activation+1.42e-02

Pos logit contributions
+0.020
iw+0.016
ve+0.016
ll+0.015
field+0.015
Neg logit contributions
 face-0.014
 aids-0.014
 jokes-0.013
 echoed-0.012
 lull-0.012
Token other
Token ID584
Feature activation-8.30e-03

Pos logit contributions
+0.045
iw+0.037
ll+0.037
ve+0.036
ner+0.035
Neg logit contributions
 face-0.027
 aids-0.025
 jokes-0.024
 repeat-0.022
 song-0.022
Token plants
Token ID6134
Feature activation+7.07e-03

Pos logit contributions
+0.020
iw+0.017
ll+0.016
ve+0.016
ner+0.016
Neg logit contributions
 face-0.012
 aids-0.011
 jokes-0.011
 repeat-0.010
 echoed-0.010
Token."
Token ID526
Feature activation-5.43e-03

Pos logit contributions
 aids+0.013
 face+0.013
 jokes+0.012
 echoed+0.012
 repeat+0.012
Neg logit contributions
-0.014
iw-0.011
ve-0.010
ll-0.010
field-0.010
Token The
Token ID383
Feature activation-6.18e-03

Pos logit contributions
+0.012
iw+0.010
ve+0.010
ll+0.010
field+0.009
Neg logit contributions
 aids-0.008
 face-0.008
 jokes-0.008
 echoed-0.007
 lull-0.007
Token f
Token ID277
Feature activation+3.18e-02

Pos logit contributions
+0.015
iw+0.013
ve+0.012
ll+0.012
ner+0.012
Neg logit contributions
 face-0.009
 aids-0.008
 jokes-0.008
 repeat-0.007
 song-0.007
Tokenauc
Token ID14272
Feature activation+8.79e-02

Pos logit contributions
 aids+0.046
 face+0.045
 lull+0.044
 jokes+0.043
 repeat+0.041
Neg logit contributions
-0.076
iw-0.066
ve-0.063
ll-0.061
ner-0.057
Tokenet
Token ID316
Feature activation+3.90e-02

Pos logit contributions
 aids+0.119
 face+0.106
 lull+0.106
 repeat+0.103
 echoed+0.101
Neg logit contributions
-0.173
iw-0.172
mer-0.168
ll-0.165
ve-0.163
Token said
Token ID531
Feature activation-2.07e-03

Pos logit contributions
 aids+0.071
 face+0.064
 jokes+0.062
 lull+0.062
 repeat+0.059
Neg logit contributions
-0.077
iw-0.065
ll-0.064
ve-0.062
field-0.059
Token,
Token ID11
Feature activation+1.25e-02

Pos logit contributions
+0.003
iw+0.002
ve+0.002
ll+0.002
ner+0.002
Neg logit contributions
 aids-0.005
tr-0.004
iet-0.004
 face-0.004
 jokes-0.004
Token "
Token ID366
Feature activation-1.90e-02

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.014
 lull+0.014
 echoed+0.014
Neg logit contributions
-0.032
iw-0.027
ve-0.026
ll-0.025
field-0.025
TokenYes
Token ID5297
Feature activation+1.26e-02

Pos logit contributions
+0.042
iw+0.034
ve+0.034
ll+0.033
field+0.032
Neg logit contributions
 aids-0.029
 face-0.028
 jokes-0.028
 lull-0.027
 repeat-0.027
Token had
Token ID550
Feature activation-1.18e-02

Pos logit contributions
 aids+0.011
 face+0.011
 jokes+0.010
 echoed+0.010
 lull+0.010
Neg logit contributions
-0.024
iw-0.021
ve-0.019
ll-0.019
ner-0.019
Token a
Token ID257
Feature activation+8.57e-03

Pos logit contributions
+0.016
 searching+0.013
iw+0.013
 lost+0.013
ve+0.012
Neg logit contributions
 aids-0.026
 lull-0.024
iet-0.024
 jokes-0.023
 echoed-0.023
Token long
Token ID890
Feature activation+4.19e-02

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.015
 echoed+0.014
 lull+0.014
Neg logit contributions
-0.017
iw-0.014
ve-0.013
ll-0.012
field-0.012
Token,
Token ID11
Feature activation+1.42e-03

Pos logit contributions
 aids+0.068
 echoed+0.059
 jokes+0.057
 lull+0.056
 erase+0.054
Neg logit contributions
-0.093
 searching-0.075
field-0.070
ve-0.068
iw-0.067
Token w
Token ID266
Feature activation+2.58e-02

Pos logit contributions
 aids+0.003
 jokes+0.002
iet+0.002
 lull+0.002
 echoed+0.002
Neg logit contributions
-0.003
iw-0.002
 searching-0.002
ve-0.002
 busy-0.002
Tokenag
Token ID363
Feature activation+8.73e-02

Pos logit contributions
 aids+0.036
 face+0.034
 jokes+0.033
 echoed+0.031
 lull+0.031
Neg logit contributions
-0.063
iw-0.054
ve-0.051
ll-0.050
field-0.049
Tokengy
Token ID1360
Feature activation+2.20e-02

Pos logit contributions
 aids+0.154
 jokes+0.147
 repeat+0.140
 lull+0.137
 face+0.137
Neg logit contributions
-0.171
ve-0.141
iw-0.132
ll-0.122
field-0.120
Token tail
Token ID7894
Feature activation+1.96e-02

Pos logit contributions
 aids+0.037
 echoed+0.034
 lull+0.032
 repeat+0.032
 jokes+0.032
Neg logit contributions
-0.046
iw-0.037
field-0.035
ve-0.034
 searching-0.033
Token and
Token ID290
Feature activation-7.02e-03

Pos logit contributions
 aids+0.037
 face+0.035
 jokes+0.034
 lull+0.033
 repeat+0.033
Neg logit contributions
-0.038
iw-0.030
ve-0.028
ll-0.028
field-0.027
Token loved
Token ID6151
Feature activation-3.11e-02

Pos logit contributions
+0.013
iw+0.010
 searching+0.010
field+0.010
ve+0.009
Neg logit contributions
 aids-0.013
iet-0.012
 repeat-0.011
 lull-0.011
tr-0.011
Token to
Token ID284
Feature activation+4.19e-03

Pos logit contributions
+0.068
iw+0.056
ve+0.054
ll+0.052
ner+0.050
Neg logit contributions
 aids-0.052
 jokes-0.051
 face-0.050
 lull-0.047
 echoed-0.046
Token a
Token ID257
Feature activation+1.39e-02

Pos logit contributions
 aids+0.054
 echoed+0.050
 lull+0.049
tr+0.048
 jokes+0.047
Neg logit contributions
-0.052
iw-0.045
ve-0.041
ll-0.039
ner-0.037
Token cookie
Token ID19751
Feature activation+2.12e-03

Pos logit contributions
 aids+0.027
 echoed+0.025
iet+0.025
 jokes+0.024
ena+0.023
Neg logit contributions
-0.027
iw-0.023
ve-0.020
ll-0.018
ner-0.017
Token.
Token ID13
Feature activation+1.53e-02

Pos logit contributions
 aids+0.004
 face+0.004
 jokes+0.004
 lull+0.004
 echoed+0.004
Neg logit contributions
-0.004
iw-0.003
ve-0.003
ll-0.003
field-0.003
Token They
Token ID1119
Feature activation+1.64e-02

Pos logit contributions
 aids+0.025
 lull+0.024
 face+0.024
 jokes+0.023
iet+0.023
Neg logit contributions
-0.034
iw-0.027
ner-0.024
ll-0.023
field-0.023
Token w
Token ID266
Feature activation+2.25e-02

Pos logit contributions
 aids+0.021
 face+0.017
 lull+0.017
 echoed+0.017
 jokes+0.017
Neg logit contributions
-0.042
iw-0.036
ll-0.034
 least-0.034
ve-0.034
Tokenag
Token ID363
Feature activation+8.72e-02

Pos logit contributions
 aids+0.025
 face+0.022
 lull+0.021
tr+0.021
 echoed+0.021
Neg logit contributions
-0.062
iw-0.053
ve-0.051
ll-0.050
ner-0.048
Token their
Token ID511
Feature activation+2.69e-02

Pos logit contributions
 aids+0.126
 repeat+0.122
 lull+0.120
tr+0.120
 jokes+0.114
Neg logit contributions
-0.175
ve-0.156
iw-0.146
ll-0.145
field-0.135
Token tails
Token ID30514
Feature activation+2.29e-02

Pos logit contributions
 aids+0.033
 echoed+0.030
 lull+0.028
tr+0.027
 sounded+0.027
Neg logit contributions
-0.068
iw-0.060
ve-0.056
ll-0.054
field-0.053
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
 aids+0.049
 jokes+0.044
 face+0.043
tr+0.043
 lull+0.043
Neg logit contributions
-0.038
iw-0.029
ve-0.027
ll-0.027
field-0.026
Token They
Token ID1119
Feature activation+1.24e-02

Pos logit contributions
 aids+0.019
 face+0.018
 lull+0.018
 jokes+0.017
 repeat+0.017
Neg logit contributions
-0.025
iw-0.020
ner-0.018
ll-0.017
ve-0.017
Token are
Token ID389
Feature activation+9.03e-03

Pos logit contributions
 aids+0.017
 lull+0.014
tr+0.014
 echoed+0.014
 face+0.014
Neg logit contributions
-0.031
iw-0.026
ll-0.025
 least-0.025
ve-0.024
Token each
Token ID1123
Feature activation+1.40e-02

Pos logit contributions
 aids+0.010
 face+0.009
tr+0.009
 echoed+0.009
 lull+0.008
Neg logit contributions
-0.011
iw-0.010
ll-0.009
 searching-0.009
ve-0.009
Token other
Token ID584
Feature activation-4.31e-03

Pos logit contributions
 aids+0.025
 face+0.023
tr+0.023
 jokes+0.023
 echoed+0.022
Neg logit contributions
-0.028
iw-0.022
ve-0.022
ll-0.021
ner-0.020
Token.
Token ID13
Feature activation+1.54e-02

Pos logit contributions
+0.008
iw+0.006
ve+0.006
ll+0.006
 least+0.005
Neg logit contributions
 aids-0.008
 face-0.008
 jokes-0.008
tr-0.008
 lull-0.008
Token They
Token ID1119
Feature activation+6.31e-03

Pos logit contributions
 aids+0.027
 face+0.026
 jokes+0.025
 lull+0.024
 repeat+0.024
Neg logit contributions
-0.033
iw-0.027
ll-0.024
ve-0.024
field-0.023
Token w
Token ID266
Feature activation+1.88e-02

Pos logit contributions
 aids+0.009
 face+0.008
tr+0.007
 jokes+0.007
 echoed+0.007
Neg logit contributions
-0.015
iw-0.013
ll-0.012
ve-0.012
 least-0.012
Tokenag
Token ID363
Feature activation+8.57e-02

Pos logit contributions
 aids+0.020
 face+0.019
tr+0.018
 lull+0.018
 echoed+0.017
Neg logit contributions
-0.051
iw-0.043
ve-0.042
ll-0.041
 least-0.039
Token their
Token ID511
Feature activation+2.23e-02

Pos logit contributions
 aids+0.127
tr+0.123
 repeat+0.120
 lull+0.117
 face+0.115
Neg logit contributions
-0.168
ve-0.149
iw-0.140
ll-0.137
mer-0.131
Token tails
Token ID30514
Feature activation+2.26e-02

Pos logit contributions
 aids+0.027
 echoed+0.024
tr+0.023
 lull+0.022
 repeat+0.022
Neg logit contributions
-0.057
iw-0.049
ve-0.048
ll-0.046
field-0.044
Token and
Token ID290
Feature activation-9.63e-04

Pos logit contributions
 aids+0.047
tr+0.042
 face+0.041
 lull+0.041
iet+0.041
Neg logit contributions
-0.039
iw-0.029
ll-0.028
ve-0.027
field-0.027
Token lick
Token ID36728
Feature activation+1.55e-02

Pos logit contributions
+0.002
iw+0.002
field+0.001
 least+0.001
 searching+0.001
Neg logit contributions
 aids-0.002
iet-0.002
tr-0.002
ont-0.001
 echoed-0.001
Token each
Token ID1123
Feature activation+2.78e-02

Pos logit contributions
 aids+0.027
tr+0.027
 repeat+0.024
 echoed+0.024
 face+0.023
Neg logit contributions
-0.029
iw-0.024
ve-0.020
ll-0.020
 anytime-0.020
Token't
Token ID470
Feature activation+2.67e-02

Pos logit contributions
+0.002
iw+0.001
ll+0.001
ve+0.001
field+0.001
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
tr-0.003
 echoed-0.003
Token work
Token ID670
Feature activation-2.77e-03

Pos logit contributions
 aids+0.060
 face+0.056
 jokes+0.055
 lull+0.053
 echoed+0.053
Neg logit contributions
-0.042
iw-0.032
ve-0.031
ll-0.030
ner-0.028
Token.
Token ID13
Feature activation+9.33e-03

Pos logit contributions
+0.007
iw+0.006
field+0.005
ll+0.005
ve+0.005
Neg logit contributions
 aids-0.004
 face-0.004
 echoed-0.004
 lull-0.004
 jokes-0.004
Token The
Token ID383
Feature activation-1.04e-03

Pos logit contributions
 aids+0.019
 jokes+0.018
 repeat+0.017
 lull+0.017
 echoed+0.017
Neg logit contributions
-0.017
iw-0.013
ll-0.012
ve-0.012
ner-0.011
Token f
Token ID277
Feature activation+3.24e-02

Pos logit contributions
+0.002
iw+0.002
ve+0.002
ll+0.002
 least+0.002
Neg logit contributions
 aids-0.002
 echoed-0.002
 jokes-0.002
 face-0.002
 lull-0.002
Tokenauc
Token ID14272
Feature activation+8.52e-02

Pos logit contributions
 face+0.050
 lull+0.050
 repeat+0.049
 aids+0.049
 jokes+0.048
Neg logit contributions
-0.073
iw-0.062
ve-0.061
ll-0.061
mer-0.054
Tokenet
Token ID316
Feature activation+3.52e-02

Pos logit contributions
 aids+0.114
 face+0.105
 lull+0.101
 repeat+0.100
 echoed+0.093
Neg logit contributions
-0.176
iw-0.164
mer-0.162
ll-0.161
ve-0.158
Token seemed
Token ID3947
Feature activation-1.85e-02

Pos logit contributions
 aids+0.061
 face+0.056
 jokes+0.056
 lull+0.055
 repeat+0.052
Neg logit contributions
-0.074
iw-0.060
ve-0.059
ll-0.059
field-0.054
Token broken
Token ID5445
Feature activation+1.64e-03

Pos logit contributions
+0.036
iw+0.030
ve+0.028
ll+0.027
field+0.026
Neg logit contributions
 aids-0.035
 face-0.033
 jokes-0.032
 lull-0.031
 echoed-0.031
Token.
Token ID13
Feature activation+1.33e-02

Pos logit contributions
 face+0.002
 aids+0.002
 echoed+0.002
 jokes+0.002
 lull+0.002
Neg logit contributions
-0.004
iw-0.003
ve-0.003
ll-0.003
field-0.003
Token\n
Token ID198
Feature activation+1.72e-02

Pos logit contributions
 aids+0.027
 jokes+0.026
 face+0.025
 repeat+0.025
 lull+0.025
Neg logit contributions
-0.024
iw-0.019
ll-0.017
ve-0.017
ner-0.016
Token."
Token ID526
Feature activation-8.49e-03

Pos logit contributions
+0.015
iw+0.012
ve+0.011
ll+0.011
field+0.011
Neg logit contributions
 aids-0.014
 face-0.014
 jokes-0.013
 echoed-0.013
 lull-0.013
Token\n
Token ID198
Feature activation+3.89e-03

Pos logit contributions
+0.019
iw+0.017
ve+0.017
ll+0.016
field+0.016
Neg logit contributions
 face-0.013
 aids-0.012
 jokes-0.011
 repeat-0.011
 echoed-0.011
Token\n
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 aids+0.007
 face+0.007
 jokes+0.007
 lull+0.006
 repeat+0.006
Neg logit contributions
-0.008
iw-0.006
ve-0.006
ll-0.006
field-0.006
TokenThe
Token ID464
Feature activation-9.88e-03

Pos logit contributions
+0.027
iw+0.023
ll+0.022
ve+0.022
field+0.021
Neg logit contributions
 aids-0.013
 face-0.012
 jokes-0.012
 echoed-0.011
 lull-0.011
Token f
Token ID277
Feature activation+3.32e-02

Pos logit contributions
+0.022
iw+0.018
ve+0.017
ll+0.017
field+0.016
Neg logit contributions
 aids-0.016
 face-0.015
 jokes-0.015
 lull-0.014
 echoed-0.014
Tokenauc
Token ID14272
Feature activation+8.44e-02

Pos logit contributions
 aids+0.047
 face+0.046
 lull+0.046
 jokes+0.045
 repeat+0.044
Neg logit contributions
-0.078
iw-0.069
ll-0.067
ve-0.066
mer-0.058
Tokenet
Token ID316
Feature activation+3.08e-02

Pos logit contributions
 aids+0.112
 face+0.099
 lull+0.099
 repeat+0.098
 echoed+0.096
Neg logit contributions
-0.170
ll-0.167
iw-0.166
mer-0.160
ve-0.151
Token was
Token ID373
Feature activation+2.30e-02

Pos logit contributions
 aids+0.055
 lull+0.048
 jokes+0.048
 face+0.047
 Watch+0.047
Neg logit contributions
-0.061
ll-0.050
iw-0.048
ve-0.046
 least-0.045
Token happy
Token ID3772
Feature activation-4.81e-03

Pos logit contributions
 aids+0.049
 jokes+0.045
tr+0.045
 face+0.044
 repeat+0.044
Neg logit contributions
-0.034
iw-0.026
ll-0.025
 least-0.025
 searching-0.024
Token again
Token ID757
Feature activation-1.74e-02

Pos logit contributions
+0.010
iw+0.008
ve+0.007
ll+0.007
field+0.007
Neg logit contributions
 aids-0.009
 face-0.008
 jokes-0.008
 lull-0.008
 echoed-0.008
Token,
Token ID11
Feature activation+6.68e-03

Pos logit contributions
+0.035
iw+0.029
ve+0.027
ll+0.027
field+0.026
Neg logit contributions
 aids-0.031
 face-0.030
 jokes-0.029
 echoed-0.028
 lull-0.028
Token brown
Token ID7586
Feature activation+2.48e-02

Pos logit contributions
+0.008
iw+0.007
ve+0.006
ll+0.006
 searching+0.006
Neg logit contributions
 aids-0.008
 face-0.007
 jokes-0.007
 echoed-0.007
 lull-0.007
Token dog
Token ID3290
Feature activation+3.99e-03

Pos logit contributions
 aids+0.047
 echoed+0.041
 lull+0.040
 jokes+0.040
 repeat+0.038
Neg logit contributions
-0.050
iw-0.040
field-0.036
 Smith-0.035
ve-0.035
Token with
Token ID351
Feature activation-6.42e-03

Pos logit contributions
 aids+0.005
 face+0.005
 jokes+0.005
tr+0.005
 lull+0.005
Neg logit contributions
-0.010
iw-0.008
ve-0.008
ll-0.008
ner-0.007
Token a
Token ID257
Feature activation+1.83e-04

Pos logit contributions
+0.009
iw+0.007
ve+0.007
 searching+0.007
 lost+0.006
Neg logit contributions
 aids-0.014
 lull-0.013
 repeat-0.012
iet-0.012
 nod-0.012
Token w
Token ID266
Feature activation+2.23e-02

Pos logit contributions
 aids+0.000
 jokes+0.000
iet+0.000
 echoed+0.000
ena+0.000
Neg logit contributions
-0.000
ve-0.000
iw-0.000
 searching-0.000
field-0.000
Tokenag
Token ID363
Feature activation+8.39e-02

Pos logit contributions
 aids+0.035
 lull+0.032
 echoed+0.032
 face+0.032
 repeat+0.031
Neg logit contributions
-0.051
iw-0.042
ve-0.041
ll-0.039
ner-0.036
Tokengy
Token ID1360
Feature activation+1.86e-02

Pos logit contributions
 aids+0.150
 jokes+0.146
 repeat+0.138
 lull+0.135
 face+0.131
Neg logit contributions
-0.162
ve-0.131
iw-0.111
mer-0.108
 searching-0.106
Token tail
Token ID7894
Feature activation+1.73e-02

Pos logit contributions
 aids+0.025
 echoed+0.022
iet+0.020
 repeat+0.020
 lull+0.020
Neg logit contributions
-0.044
iw-0.035
field-0.035
ve-0.035
 searching-0.033
Token.
Token ID13
Feature activation+1.32e-02

Pos logit contributions
 aids+0.036
 jokes+0.033
 face+0.032
 lull+0.032
 repeat+0.032
Neg logit contributions
-0.031
iw-0.023
ve-0.022
field-0.021
ll-0.021
Token Tim
Token ID5045
Feature activation-4.26e-03

Pos logit contributions
 aids+0.023
 face+0.022
 jokes+0.021
 lull+0.020
 echoed+0.020
Neg logit contributions
-0.028
iw-0.023
ll-0.021
ve-0.021
field-0.020
Tokenmy
Token ID1820
Feature activation-3.47e-03

Pos logit contributions
+0.010
iw+0.009
ve+0.008
ll+0.008
field+0.008
Neg logit contributions
 aids-0.006
 face-0.006
 jokes-0.006
 lull-0.005
 echoed-0.005
Token Lily
Token ID20037
Feature activation-9.67e-03

Pos logit contributions
 aids+0.030
tr+0.027
 lull+0.027
 repeat+0.026
 echoed+0.026
Neg logit contributions
-0.023
iw-0.020
ve-0.017
us-0.017
ll-0.016
Token and
Token ID290
Feature activation-1.45e-03

Pos logit contributions
+0.020
iw+0.018
ve+0.017
ll+0.017
 least+0.016
Neg logit contributions
 aids-0.014
 echoed-0.012
iet-0.012
tr-0.011
 jokes-0.011
Token Tom
Token ID4186
Feature activation-8.34e-03

Pos logit contributions
+0.003
iw+0.002
 least+0.002
 searching+0.002
field+0.002
Neg logit contributions
 aids-0.002
iet-0.002
tr-0.002
 lull-0.002
 echoed-0.002
Token and
Token ID290
Feature activation+7.26e-03

Pos logit contributions
+0.014
iw+0.012
ve+0.011
ll+0.011
 least+0.011
Neg logit contributions
 aids-0.016
tr-0.015
iet-0.014
 echoed-0.014
 face-0.014
Token w
Token ID266
Feature activation+2.54e-02

Pos logit contributions
 aids+0.013
iet+0.013
ainer+0.011
 lull+0.011
ity+0.011
Neg logit contributions
iw-0.012
-0.012
 least-0.011
 Smith-0.009
 searching-0.009
Tokenag
Token ID363
Feature activation+8.37e-02

Pos logit contributions
tr+0.034
 aids+0.033
 lull+0.030
 repeat+0.030
 echoed+0.029
Neg logit contributions
-0.060
ve-0.052
iw-0.050
ll-0.049
 least-0.044
Token his
Token ID465
Feature activation+2.77e-02

Pos logit contributions
 aids+0.129
 repeat+0.124
tr+0.123
 lull+0.119
 face+0.118
Neg logit contributions
-0.151
ve-0.143
iw-0.131
ll-0.128
mer-0.125
Token tail
Token ID7894
Feature activation+8.65e-03

Pos logit contributions
 aids+0.037
 echoed+0.032
 jokes+0.031
tr+0.030
 repeat+0.030
Neg logit contributions
-0.063
iw-0.055
ve-0.055
field-0.051
 searching-0.050
Token.
Token ID13
Feature activation+6.15e-03

Pos logit contributions
 aids+0.020
iet+0.018
 jokes+0.018
 echoed+0.018
tr+0.018
Neg logit contributions
-0.011
iw-0.009
ve-0.008
field-0.007
ll-0.007
Token\n
Token ID198
Feature activation+5.62e-03

Pos logit contributions
 aids+0.013
 face+0.013
 jokes+0.012
 lull+0.012
iet+0.012
Neg logit contributions
-0.010
iw-0.008
ll-0.007
ner-0.007
ve-0.006
Token\n
Token ID198
Feature activation-1.14e-02

Pos logit contributions
 aids+0.011
 face+0.010
 jokes+0.010
 lull+0.010
 echoed+0.009
Neg logit contributions
-0.011
iw-0.009
ve-0.008
ll-0.008
 least-0.008
Token a
Token ID257
Feature activation+9.12e-03

Pos logit contributions
 aids+0.025
 face+0.025
 jokes+0.023
 echoed+0.021
 repeat+0.021
Neg logit contributions
-0.038
iw-0.032
ll-0.030
ve-0.030
field-0.029
Token bucket
Token ID19236
Feature activation+1.75e-02

Pos logit contributions
 aids+0.016
 jokes+0.015
 face+0.015
 echoed+0.015
 lull+0.014
Neg logit contributions
-0.018
iw-0.015
ve-0.014
ll-0.014
field-0.013
Token to
Token ID284
Feature activation+9.88e-03

Pos logit contributions
 aids+0.027
 face+0.026
 jokes+0.025
 lull+0.024
 echoed+0.024
Neg logit contributions
-0.039
iw-0.033
ve-0.031
ll-0.031
field-0.030
Token the
Token ID262
Feature activation-5.36e-03

Pos logit contributions
 face+0.016
 aids+0.016
 jokes+0.015
 repeat+0.014
 echoed+0.014
Neg logit contributions
-0.022
iw-0.019
ve-0.018
ll-0.017
field-0.017
Token f
Token ID277
Feature activation+3.13e-02

Pos logit contributions
+0.013
iw+0.011
ll+0.010
ve+0.010
ner+0.010
Neg logit contributions
 face-0.008
 aids-0.008
 jokes-0.007
 repeat-0.007
tr-0.007
Tokenauc
Token ID14272
Feature activation+8.36e-02

Pos logit contributions
 aids+0.038
 face+0.036
 jokes+0.035
 lull+0.034
 repeat+0.033
Neg logit contributions
-0.082
iw-0.072
ve-0.068
ll-0.067
 least-0.064
Tokenet
Token ID316
Feature activation+3.29e-02

Pos logit contributions
 aids+0.109
 echoed+0.104
 lull+0.100
 repeat+0.095
 face+0.093
Neg logit contributions
-0.186
iw-0.176
mer-0.167
 least-0.161
ll-0.160
Token and
Token ID290
Feature activation+9.66e-03

Pos logit contributions
 aids+0.057
 face+0.055
 jokes+0.053
 echoed+0.051
 lull+0.051
Neg logit contributions
-0.068
iw-0.056
ve-0.053
ll-0.052
field-0.049
Token filled
Token ID5901
Feature activation+4.43e-03

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.014
 lull+0.014
 echoed+0.014
Neg logit contributions
-0.021
iw-0.018
ve-0.016
ll-0.016
field-0.016
Token it
Token ID340
Feature activation-1.52e-02

Pos logit contributions
 aids+0.007
 face+0.007
 jokes+0.007
 repeat+0.006
 lull+0.006
Neg logit contributions
-0.010
iw-0.008
ll-0.008
ve-0.007
field-0.007
Token with
Token ID351
Feature activation+1.39e-02

Pos logit contributions
+0.033
iw+0.028
ve+0.026
ll+0.026
ner+0.025
Neg logit contributions
 aids-0.024
 face-0.024
 jokes-0.023
 lull-0.022
 echoed-0.022
Token day
Token ID1110
Feature activation+8.48e-03

Pos logit contributions
 aids+0.038
 face+0.035
 jokes+0.035
 echoed+0.034
tr+0.034
Neg logit contributions
-0.019
iw-0.014
ll-0.013
ve-0.012
field-0.012
Token,
Token ID11
Feature activation+1.97e-02

Pos logit contributions
 aids+0.017
 face+0.017
 jokes+0.016
 lull+0.016
 echoed+0.016
Neg logit contributions
-0.015
iw-0.012
ve-0.011
ll-0.011
field-0.010
Token a
Token ID257
Feature activation-5.53e-03

Pos logit contributions
 face+0.028
 aids+0.027
 jokes+0.027
 echoed+0.025
 repeat+0.025
Neg logit contributions
-0.047
iw-0.041
ll-0.041
ve-0.040
field-0.037
Token kind
Token ID1611
Feature activation+2.12e-02

Pos logit contributions
+0.013
iw+0.011
ve+0.010
ll+0.010
field+0.010
Neg logit contributions
 aids-0.008
 face-0.008
 jokes-0.008
 lull-0.007
 repeat-0.007
Token f
Token ID277
Feature activation+3.22e-02

Pos logit contributions
 aids+0.038
 jokes+0.035
 face+0.035
 echoed+0.034
 lull+0.034
Neg logit contributions
-0.042
iw-0.035
ve-0.032
ll-0.031
ner-0.030
Tokenauc
Token ID14272
Feature activation+8.36e-02

Pos logit contributions
 aids+0.045
 face+0.044
 jokes+0.042
 lull+0.042
 repeat+0.040
Neg logit contributions
-0.078
iw-0.069
ll-0.066
ve-0.066
ner-0.060
Tokenet
Token ID316
Feature activation+3.64e-02

Pos logit contributions
 aids+0.112
 face+0.098
 lull+0.094
 repeat+0.090
 echoed+0.089
Neg logit contributions
-0.174
iw-0.165
mer-0.163
ll-0.161
 least-0.156
Token saw
Token ID2497
Feature activation-3.33e-03

Pos logit contributions
 aids+0.063
 jokes+0.056
 face+0.056
 lull+0.055
 repeat+0.051
Neg logit contributions
-0.077
iw-0.063
ll-0.062
ve-0.061
 least-0.059
Token the
Token ID262
Feature activation-3.67e-02

Pos logit contributions
+0.008
iw+0.007
ve+0.006
ll+0.006
field+0.006
Neg logit contributions
 aids-0.005
 face-0.005
 jokes-0.005
 echoed-0.004
 sounded-0.004
Token sad
Token ID6507
Feature activation-1.58e-03

Pos logit contributions
+0.090
iw+0.078
ve+0.073
field+0.072
ll+0.072
Neg logit contributions
 face-0.054
 jokes-0.048
 aids-0.047
 repeat-0.044
tr-0.043
Token boy
Token ID2933
Feature activation-4.18e-03

Pos logit contributions
+0.003
iw+0.002
ve+0.002
ll+0.002
ner+0.002
Neg logit contributions
 aids-0.003
 lull-0.003
tr-0.003
 jokes-0.003
 repeat-0.003
Token,
Token ID11
Feature activation+1.04e-02

Pos logit contributions
+0.013
iw+0.010
ve+0.009
ll+0.009
ner+0.009
Neg logit contributions
 aids-0.017
 face-0.016
 jokes-0.016
 lull-0.016
tr-0.016
Token silly
Token ID14397
Feature activation+1.74e-02

Pos logit contributions
 aids+0.021
iet+0.019
tr+0.019
 face+0.019
 echoed+0.018
Neg logit contributions
-0.018
iw-0.015
ll-0.014
 least-0.013
ner-0.013
Token!
Token ID0
Feature activation+2.57e-03

Pos logit contributions
 aids+0.026
tr+0.024
 repeat+0.022
iet+0.022
 erase+0.022
Neg logit contributions
-0.036
ve-0.030
iw-0.030
ll-0.030
ner-0.028
Token I
Token ID314
Feature activation+1.76e-02

Pos logit contributions
 face+0.005
 aids+0.005
iet+0.005
 jokes+0.005
 lull+0.005
Neg logit contributions
-0.005
iw-0.004
ner-0.004
ll-0.003
ve-0.003
Token w
Token ID266
Feature activation+2.02e-02

Pos logit contributions
 aids+0.031
 face+0.029
 jokes+0.028
 lull+0.026
 echoed+0.026
Neg logit contributions
-0.037
iw-0.031
ve-0.030
ll-0.030
 least-0.027
Tokenag
Token ID363
Feature activation+8.36e-02

Pos logit contributions
 aids+0.028
 face+0.026
tr+0.025
 lull+0.025
 jokes+0.025
Neg logit contributions
-0.049
iw-0.041
ve-0.040
ll-0.039
ner-0.037
Token my
Token ID616
Feature activation+3.16e-02

Pos logit contributions
 aids+0.120
 lull+0.120
tr+0.119
 repeat+0.119
 jokes+0.115
Neg logit contributions
-0.157
ve-0.148
iw-0.137
ll-0.134
mer-0.132
Token tail
Token ID7894
Feature activation+1.30e-02

Pos logit contributions
 aids+0.034
 echoed+0.029
 jokes+0.028
 lull+0.027
 face+0.027
Neg logit contributions
-0.085
iw-0.074
ve-0.073
ll-0.070
field-0.068
Token when
Token ID618
Feature activation-2.44e-02

Pos logit contributions
 aids+0.027
 face+0.026
 jokes+0.025
 lull+0.024
 echoed+0.024
Neg logit contributions
-0.022
iw-0.017
ve-0.016
ll-0.016
field-0.015
Token I
Token ID314
Feature activation-2.10e-03

Pos logit contributions
+0.045
iw+0.036
ve+0.033
ll+0.033
field+0.031
Neg logit contributions
 aids-0.048
 face-0.047
 jokes-0.045
 echoed-0.044
 lull-0.044
Token am
Token ID716
Feature activation-2.52e-03

Pos logit contributions
+0.005
iw+0.004
ll+0.004
ve+0.004
 least+0.004
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 lull-0.003
tr-0.003
Token said
Token ID531
Feature activation-8.48e-03

Pos logit contributions
 aids+0.023
 jokes+0.020
 face+0.020
 lull+0.020
tr+0.019
Neg logit contributions
-0.031
iw-0.026
ll-0.025
ve-0.025
field-0.024
Token,
Token ID11
Feature activation+1.61e-02

Pos logit contributions
+0.015
iw+0.012
ve+0.012
ll+0.011
ner+0.011
Neg logit contributions
 aids-0.017
 face-0.016
 jokes-0.016
 echoed-0.015
 lull-0.015
Token "
Token ID366
Feature activation-1.68e-02

Pos logit contributions
 aids+0.020
 face+0.019
 jokes+0.018
 lull+0.017
 echoed+0.017
Neg logit contributions
-0.041
iw-0.036
ve-0.034
ll-0.034
field-0.033
TokenThe
Token ID464
Feature activation-1.63e-02

Pos logit contributions
+0.031
iw+0.025
ve+0.023
ll+0.023
field+0.021
Neg logit contributions
 aids-0.034
 face-0.033
 jokes-0.032
 lull-0.031
 repeat-0.031
Token f
Token ID277
Feature activation+3.06e-02

Pos logit contributions
+0.037
iw+0.031
ve+0.029
ll+0.029
field+0.028
Neg logit contributions
 aids-0.026
 face-0.024
 jokes-0.023
 lull-0.022
 echoed-0.022
Tokenauc
Token ID14272
Feature activation+8.35e-02

Pos logit contributions
 aids+0.039
 face+0.038
 lull+0.036
 jokes+0.036
 repeat+0.034
Neg logit contributions
-0.078
iw-0.068
ve-0.065
ll-0.064
ner-0.060
Tokenet
Token ID316
Feature activation+2.75e-02

Pos logit contributions
 aids+0.109
 lull+0.102
 echoed+0.097
 face+0.096
 repeat+0.096
Neg logit contributions
-0.174
iw-0.169
ll-0.163
ve-0.161
mer-0.157
Token is
Token ID318
Feature activation+1.35e-02

Pos logit contributions
 aids+0.050
 lull+0.045
 face+0.045
 jokes+0.044
tr+0.042
Neg logit contributions
-0.055
iw-0.047
ve-0.046
ll-0.046
 least-0.042
Token too
Token ID1165
Feature activation-5.71e-03

Pos logit contributions
 aids+0.027
 face+0.025
 jokes+0.025
 lull+0.024
 echoed+0.024
Neg logit contributions
-0.025
iw-0.020
ve-0.019
ll-0.018
 least-0.018
Token tight
Token ID5381
Feature activation-5.96e-03

Pos logit contributions
+0.013
iw+0.011
ner+0.010
field+0.010
 least+0.010
Neg logit contributions
 face-0.009
 aids-0.008
 jokes-0.008
 echoed-0.008
 lull-0.008
Token.
Token ID13
Feature activation-4.51e-03

Pos logit contributions
+0.013
iw+0.011
ve+0.010
ll+0.010
field+0.010
Neg logit contributions
 aids-0.010
 face-0.009
 jokes-0.009
 echoed-0.009
 lull-0.009
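
Each record in this dump follows the same flattened layout: a `Token` line, a `Token ID` line, a `Feature activation` line, then the top positive and negative logit contributions, where the token text is fused directly to a signed value (an empty token text leaves only the number). The following is a minimal Python sketch for reading that layout back into structured records, assuming the layout holds throughout. `TokenRecord` and `parse_records` are hypothetical names introduced here for illustration, not part of the tool that produced this output.

```python
import re
from dataclasses import dataclass, field

# Token text (possibly empty, possibly with a leading space) fused to a signed decimal.
LOGIT_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


@dataclass
class TokenRecord:  # hypothetical container, not from the original tool
    token: str
    token_id: int
    activation: float
    pos: list = field(default_factory=list)  # (token text, contribution) pairs
    neg: list = field(default_factory=list)


def parse_records(text):
    """Parse the flattened per-token dump into a list of TokenRecord objects."""
    lines = text.splitlines()
    records, current, bucket = [], None, None
    i = 0
    while i < len(lines):
        line = lines[i]
        # A record starts with "Token<text>", then "Token ID<int>", then the activation.
        starts_record = (
            line.startswith("Token")
            and i + 2 < len(lines)
            and re.fullmatch(r"Token ID\d+", lines[i + 1].strip()) is not None
            and lines[i + 2].startswith("Feature activation")
        )
        if starts_record:
            current = TokenRecord(
                token=line[len("Token"):],  # keep the leading space: it is part of the token
                token_id=int(lines[i + 1].strip()[len("Token ID"):]),
                activation=float(lines[i + 2][len("Feature activation"):]),
            )
            records.append(current)
            bucket = None
            i += 3
            continue
        if current is not None:
            if line.strip() == "Pos logit contributions":
                bucket = current.pos
            elif line.strip() == "Neg logit contributions":
                bucket = current.neg
            elif bucket is not None:
                m = LOGIT_RE.match(line)
                if m:  # blank separator lines simply fail to match
                    bucket.append((m.group(1), float(m.group(2))))
        i += 1
    return records


sample = (
    "Token am\nToken ID716\nFeature activation-2.52e-03\n\n"
    "Pos logit contributions\n+0.005\niw+0.004\n"
    "Neg logit contributions\n aids-0.003\n"
)
records = parse_records(sample)
```

With records in this form, it becomes straightforward to, for example, sort tokens by `activation` or collect which output tokens recur in the `pos` and `neg` lists across contexts.
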
Token strong
Token ID1913
Feature activation+2.10e-02

Pos logit contributions
 aids+0.007
 echoed+0.007
 jokes+0.006
 lull+0.006
 face+0.006
Neg logit contributions
-0.007
iw-0.006
ve-0.006
ll-0.005
field-0.005
Token body
Token ID1767
Feature activation+2.95e-02

Pos logit contributions
 aids+0.033
 echoed+0.029
 lull+0.028
 jokes+0.027
iet+0.026
Neg logit contributions
-0.048
iw-0.040
field-0.036
 searching-0.036
ve-0.036
Token and
Token ID290
Feature activation-1.60e-02

Pos logit contributions
 aids+0.063
 jokes+0.059
 face+0.058
 lull+0.057
 echoed+0.057
Neg logit contributions
-0.050
iw-0.038
ve-0.035
ll-0.035
field-0.034
Token a
Token ID257
Feature activation+1.54e-02

Pos logit contributions
+0.028
iw+0.023
 searching+0.022
ve+0.022
ll+0.021
Neg logit contributions
 aids-0.032
iet-0.028
 lull-0.027
tr-0.027
 repeat-0.026
Token w
Token ID266
Feature activation+2.33e-02

Pos logit contributions
 aids+0.030
 echoed+0.026
 jokes+0.026
 lull+0.025
iet+0.025
Neg logit contributions
-0.030
iw-0.024
ve-0.023
field-0.021
ll-0.021
Tokenag
Token ID363
Feature activation+8.31e-02

Pos logit contributions
 aids+0.035
 face+0.033
 jokes+0.032
 lull+0.032
 echoed+0.031
Neg logit contributions
-0.054
iw-0.045
ve-0.044
ll-0.043
field-0.040
Tokengy
Token ID1360
Feature activation+1.48e-02

Pos logit contributions
 aids+0.150
 jokes+0.145
 repeat+0.135
 face+0.133
 lull+0.132
Neg logit contributions
-0.159
ve-0.130
iw-0.119
field-0.114
mer-0.112
Token tail
Token ID7894
Feature activation+1.24e-02

Pos logit contributions
 aids+0.024
 echoed+0.022
 jokes+0.021
 repeat+0.021
 lull+0.021
Neg logit contributions
-0.031
iw-0.025
field-0.025
ve-0.024
ll-0.023
Token.
Token ID13
Feature activation+6.89e-03

Pos logit contributions
 aids+0.025
 face+0.024
 jokes+0.024
 echoed+0.023
 repeat+0.023
Neg logit contributions
-0.022
iw-0.017
ve-0.016
field-0.015
ll-0.015
Token One
Token ID1881
Feature activation+7.60e-03

Pos logit contributions
 aids+0.012
 face+0.011
 lull+0.010
 echoed+0.010
 jokes+0.010
Neg logit contributions
-0.015
iw-0.012
ner-0.011
field-0.011
ll-0.011
Token day
Token ID1110
Feature activation+8.49e-03

Pos logit contributions
 aids+0.022
 jokes+0.020
 erase+0.019
 sentences+0.019
 face+0.019
Neg logit contributions
-0.006
iw-0.004
ll-0.004
field-0.003
 busy-0.003
Token Lucky
Token ID27963
Feature activation+6.49e-03

Pos logit contributions
 aids+0.037
 face+0.034
tr+0.034
 jokes+0.033
 echoed+0.033
Neg logit contributions
-0.031
iw-0.023
ll-0.021
ve-0.021
ner-0.021
Token the
Token ID262
Feature activation-1.65e-02

Pos logit contributions
 aids+0.010
 face+0.010
 jokes+0.009
 lull+0.009
tr+0.008
Neg logit contributions
-0.014
iw-0.012
ll-0.012
ve-0.011
field-0.011
Token dog
Token ID3290
Feature activation+1.03e-03

Pos logit contributions
+0.032
iw+0.024
field+0.023
ll+0.023
 least+0.023
Neg logit contributions
 aids-0.032
 echoed-0.027
 lull-0.026
iet-0.026
tr-0.026
Token would
Token ID561
Feature activation-7.36e-03

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.001
 lull+0.001
 echoed+0.001
Neg logit contributions
-0.002
iw-0.002
ll-0.002
ve-0.002
field-0.002
Token w
Token ID266
Feature activation+1.28e-02

Pos logit contributions
+0.017
field+0.013
ll+0.013
iw+0.013
 searching+0.013
Neg logit contributions
 aids-0.011
tr-0.009
 echoed-0.009
iet-0.009
ena-0.008
Tokenag
Token ID363
Feature activation+8.24e-02

Pos logit contributions
 aids+0.014
tr+0.013
 face+0.013
 lull+0.012
 echoed+0.012
Neg logit contributions
-0.035
ve-0.029
iw-0.029
ll-0.028
ner-0.026
Token his
Token ID465
Feature activation+1.52e-02

Pos logit contributions
 aids+0.131
 repeat+0.122
tr+0.122
 lull+0.120
 face+0.117
Neg logit contributions
-0.155
ve-0.139
mer-0.123
ll-0.123
iw-0.122
Token tail
Token ID7894
Feature activation+1.12e-02

Pos logit contributions
 aids+0.017
 echoed+0.014
 jokes+0.014
 lull+0.013
 repeat+0.013
Neg logit contributions
-0.041
iw-0.035
ve-0.034
field-0.033
ll-0.033
Token around
Token ID1088
Feature activation-1.05e-02

Pos logit contributions
 aids+0.021
 face+0.020
 jokes+0.019
 echoed+0.019
 lull+0.019
Neg logit contributions
-0.021
iw-0.016
field-0.015
ll-0.015
 least-0.015
Token and
Token ID290
Feature activation+1.27e-04

Pos logit contributions
+0.024
iw+0.020
ve+0.019
ll+0.019
field+0.018
Neg logit contributions
 aids-0.016
 face-0.016
 jokes-0.015
 lull-0.014
 echoed-0.014
Token it
Token ID340
Feature activation-5.01e-03

Pos logit contributions
 aids+0.000
iet+0.000
tr+0.000
ena+0.000
ia+0.000
Neg logit contributions
-0.000
iw-0.000
 searching-0.000
field-0.000
 least-0.000
Token She
Token ID1375
Feature activation-3.41e-03

Pos logit contributions
 aids+0.004
 lull+0.004
 jokes+0.004
 face+0.004
 echoed+0.004
Neg logit contributions
-0.005
iw-0.004
ll-0.004
ve-0.004
field-0.004
Token turned
Token ID2900
Feature activation+1.70e-03

Pos logit contributions
+0.007
iw+0.006
ll+0.006
ve+0.006
field+0.005
Neg logit contributions
 aids-0.006
 face-0.005
 jokes-0.005
 lull-0.005
 echoed-0.005
Token on
Token ID319
Feature activation+4.66e-03

Pos logit contributions
 aids+0.003
 jokes+0.003
 face+0.003
 repeat+0.003
 lull+0.003
Neg logit contributions
-0.003
iw-0.003
ve-0.002
ner-0.002
ll-0.002
Token the
Token ID262
Feature activation+1.25e-02

Pos logit contributions
 aids+0.008
 face+0.007
 jokes+0.007
 lull+0.007
 echoed+0.007
Neg logit contributions
-0.010
iw-0.008
ve-0.008
ll-0.007
field-0.007
Token f
Token ID277
Feature activation+3.51e-02

Pos logit contributions
 aids+0.023
 echoed+0.021
 jokes+0.021
 repeat+0.021
 lull+0.020
Neg logit contributions
-0.024
ve-0.018
iw-0.018
ll-0.017
ner-0.017
Tokenauc
Token ID14272
Feature activation+8.21e-02

Pos logit contributions
 aids+0.051
 face+0.050
 lull+0.049
 jokes+0.049
 repeat+0.049
Neg logit contributions
-0.082
iw-0.071
ll-0.067
ve-0.065
ner-0.061
Tokenet
Token ID316
Feature activation+2.85e-02

Pos logit contributions
 aids+0.115
 lull+0.109
 echoed+0.105
 repeat+0.104
 face+0.100
Neg logit contributions
-0.164
iw-0.153
mer-0.145
ll-0.143
 least-0.141
Token and
Token ID290
Feature activation-6.56e-03

Pos logit contributions
 aids+0.059
 jokes+0.054
 face+0.054
 lull+0.053
 echoed+0.051
Neg logit contributions
-0.052
iw-0.038
ll-0.038
 least-0.036
ve-0.035
Token filled
Token ID5901
Feature activation+7.49e-03

Pos logit contributions
+0.011
iw+0.010
 least+0.009
 searching+0.009
ll+0.009
Neg logit contributions
 aids-0.013
 lull-0.012
 jokes-0.012
iet-0.011
 song-0.011
Token the
Token ID262
Feature activation-5.13e-03

Pos logit contributions
 aids+0.013
 echoed+0.012
 face+0.012
 lull+0.012
tr+0.012
Neg logit contributions
-0.015
iw-0.013
ve-0.012
ll-0.011
ner-0.011
Token sink
Token ID14595
Feature activation+5.47e-03

Pos logit contributions
+0.009
iw+0.007
ve+0.007
ll+0.006
 least+0.006
Neg logit contributions
 aids-0.010
 echoed-0.010
 jokes-0.009
 repeat-0.009
 lull-0.009
Token him
Token ID683
Feature activation+3.74e-03

Pos logit contributions
+0.007
iw+0.006
ll+0.006
ve+0.005
field+0.005
Neg logit contributions
 aids-0.005
 face-0.005
 jokes-0.005
 echoed-0.005
 repeat-0.005
Token,
Token ID11
Feature activation+4.90e-04

Pos logit contributions
 aids+0.007
 face+0.007
 jokes+0.006
 lull+0.006
 echoed+0.006
Neg logit contributions
-0.007
iw-0.006
ve-0.006
ll-0.006
field-0.005
Token he
Token ID339
Feature activation-1.60e-02

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 lull+0.001
tr+0.001
Neg logit contributions
-0.001
iw-0.001
ve-0.001
ll-0.001
field-0.001
Token would
Token ID561
Feature activation-5.62e-03

Pos logit contributions
+0.035
iw+0.029
ll+0.029
ve+0.028
field+0.027
Neg logit contributions
 aids-0.026
 face-0.024
 jokes-0.023
 lull-0.022
tr-0.022
Token w
Token ID266
Feature activation+1.60e-02

Pos logit contributions
+0.014
ll+0.011
iw+0.011
ve+0.011
 least+0.011
Neg logit contributions
 aids-0.008
tr-0.007
 face-0.007
 jokes-0.007
 echoed-0.007
Tokenag
Token ID363
Feature activation+8.21e-02

Pos logit contributions
tr+0.018
 aids+0.018
 face+0.016
 lull+0.016
 echoed+0.016
Neg logit contributions
-0.042
ve-0.037
ll-0.035
iw-0.034
ner-0.032
Token his
Token ID465
Feature activation+2.05e-02

Pos logit contributions
 aids+0.128
tr+0.125
 repeat+0.120
 jokes+0.117
 lull+0.117
Neg logit contributions
-0.154
ve-0.143
mer-0.125
ll-0.123
field-0.122
Token tail
Token ID7894
Feature activation+7.02e-03

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.013
 lull+0.012
 echoed+0.012
Neg logit contributions
-0.063
iw-0.056
ve-0.053
ll-0.053
ner-0.051
Token happily
Token ID18177
Feature activation-1.75e-02

Pos logit contributions
 aids+0.014
 face+0.013
 jokes+0.013
tr+0.012
 repeat+0.012
Neg logit contributions
-0.013
iw-0.010
ll-0.009
ve-0.009
field-0.009
Token and
Token ID290
Feature activation-1.28e-03

Pos logit contributions
+0.032
iw+0.026
ve+0.025
ll+0.025
field+0.023
Neg logit contributions
 aids-0.035
 face-0.032
 jokes-0.032
 lull-0.031
tr-0.031
Token act
Token ID719
Feature activation+2.60e-02

Pos logit contributions
+0.003
 least+0.002
iw+0.002
field+0.002
 searching+0.002
Neg logit contributions
 aids-0.002
iet-0.002
tr-0.002
 face-0.002
 lull-0.002
Token nothing
Token ID2147
Feature activation+4.88e-03

Pos logit contributions
 aids+0.019
 face+0.017
 jokes+0.017
 lull+0.017
 echoed+0.016
Neg logit contributions
-0.019
iw-0.015
ve-0.014
ll-0.014
field-0.014
Token happened
Token ID3022
Feature activation-2.23e-02

Pos logit contributions
 aids+0.007
 face+0.007
 jokes+0.007
 echoed+0.006
 lull+0.006
Neg logit contributions
-0.012
iw-0.010
ve-0.009
ll-0.009
field-0.009
Token.
Token ID13
Feature activation+6.15e-03

Pos logit contributions
+0.052
iw+0.044
ve+0.039
ll+0.039
field+0.038
Neg logit contributions
 aids-0.033
 face-0.033
 jokes-0.032
 repeat-0.030
 echoed-0.029
Token The
Token ID383
Feature activation-6.91e-03

Pos logit contributions
 aids+0.009
 face+0.009
 jokes+0.008
 echoed+0.008
 lull+0.008
Neg logit contributions
-0.014
iw-0.012
ve-0.012
ll-0.012
field-0.011
Token f
Token ID277
Feature activation+2.88e-02

Pos logit contributions
+0.016
iw+0.013
ve+0.012
ll+0.012
field+0.012
Neg logit contributions
 aids-0.011
 face-0.010
 jokes-0.010
 lull-0.010
 echoed-0.010
Tokenauc
Token ID14272
Feature activation+8.20e-02

Pos logit contributions
 lull+0.042
 aids+0.042
 face+0.042
 jokes+0.041
 repeat+0.040
Neg logit contributions
-0.067
iw-0.058
ll-0.057
ve-0.057
mer-0.050
Tokenet
Token ID316
Feature activation+3.39e-02

Pos logit contributions
 aids+0.110
 face+0.103
 lull+0.103
 repeat+0.100
 nod+0.093
Neg logit contributions
-0.163
mer-0.155
iw-0.151
ll-0.151
ve-0.150
Token just
Token ID655
Feature activation+1.81e-02

Pos logit contributions
 aids+0.058
 face+0.054
 jokes+0.053
 lull+0.053
 repeat+0.049
Neg logit contributions
-0.071
iw-0.058
ve-0.057
ll-0.057
field-0.054
Token sat
Token ID3332
Feature activation-2.15e-02

Pos logit contributions
 aids+0.031
iet+0.028
 lull+0.027
 face+0.027
tr+0.026
Neg logit contributions
-0.033
 least-0.029
 searching-0.028
iw-0.028
ll-0.027
Token there
Token ID612
Feature activation-2.68e-02

Pos logit contributions
+0.054
iw+0.046
irst+0.043
ner+0.043
ve+0.042
Neg logit contributions
 face-0.027
iet-0.024
 sounded-0.023
 echoed-0.023
 aids-0.023
Token,
Token ID11
Feature activation-6.94e-03

Pos logit contributions
+0.055
iw+0.046
ve+0.043
ll+0.042
field+0.040
Neg logit contributions
 aids-0.047
 face-0.046
 jokes-0.044
 echoed-0.042
 lull-0.042
Token flew
Token ID13112
Feature activation-2.49e-03

Pos logit contributions
+0.039
iw+0.033
ll+0.031
ve+0.030
 least+0.030
Neg logit contributions
 aids-0.037
 face-0.035
 jokes-0.034
tr-0.033
 lull-0.033
Token off
Token ID572
Feature activation-1.50e-02

Pos logit contributions
+0.005
iw+0.004
ve+0.004
ll+0.004
field+0.004
Neg logit contributions
 aids-0.005
 face-0.004
 jokes-0.004
 lull-0.004
tr-0.004
Token with
Token ID351
Feature activation+1.68e-02

Pos logit contributions
+0.033
iw+0.028
ll+0.026
ve+0.026
 least+0.025
Neg logit contributions
 aids-0.024
 face-0.024
 jokes-0.022
 echoed-0.022
 lull-0.021
Token a
Token ID257
Feature activation+7.10e-03

Pos logit contributions
 aids+0.024
 face+0.024
 jokes+0.023
 echoed+0.021
 lull+0.021
Neg logit contributions
-0.040
iw-0.035
ve-0.033
ll-0.033
field-0.031
Token w
Token ID266
Feature activation+2.51e-02

Pos logit contributions
 aids+0.012
 face+0.011
 jokes+0.011
 lull+0.010
 echoed+0.010
Neg logit contributions
-0.016
iw-0.013
ve-0.012
ll-0.012
field-0.012
Tokenag
Token ID363
Feature activation+8.19e-02

Pos logit contributions
 aids+0.041
 face+0.039
 jokes+0.037
 lull+0.036
 echoed+0.035
Neg logit contributions
-0.055
iw-0.047
ll-0.043
ve-0.043
field-0.042
Token of
Token ID286
Feature activation+3.87e-02

Pos logit contributions
 aids+0.142
 jokes+0.142
 repeat+0.135
 face+0.131
 Watch+0.130
Neg logit contributions
-0.159
ve-0.131
field-0.118
ll-0.113
iw-0.111
Token his
Token ID465
Feature activation+1.95e-02

Pos logit contributions
 aids+0.065
tr+0.064
 echoed+0.060
 face+0.059
 repeat+0.059
Neg logit contributions
-0.077
iw-0.064
 searching-0.058
 least-0.057
ve-0.056
Token tail
Token ID7894
Feature activation+8.97e-03

Pos logit contributions
 aids+0.028
 face+0.028
 jokes+0.026
 lull+0.025
 echoed+0.024
Neg logit contributions
-0.046
iw-0.040
ve-0.037
ll-0.037
ner-0.036
Token!
Token ID0
Feature activation+1.75e-02

Pos logit contributions
 aids+0.018
 jokes+0.017
 face+0.017
 repeat+0.016
 lull+0.016
Neg logit contributions
-0.016
iw-0.012
ve-0.012
ll-0.011
field-0.011
Token<|endoftext|>
Token ID50256
Feature activation-4.11e-02

Pos logit contributions
 aids+0.032
 face+0.030
 jokes+0.029
 echoed+0.028
tr+0.028
Neg logit contributions
-0.034
iw-0.029
ve-0.028
ll-0.027
 least-0.026
Token She
Token ID1375
Feature activation+1.16e-02

Pos logit contributions
 aids+0.010
 jokes+0.010
 lull+0.010
 echoed+0.009
iet+0.009
Neg logit contributions
-0.012
iw-0.009
ner-0.008
 Smith-0.008
 searching-0.008
Token turned
Token ID2900
Feature activation-1.26e-03

Pos logit contributions
 aids+0.020
 jokes+0.018
 face+0.018
 lull+0.017
 echoed+0.017
Neg logit contributions
-0.024
iw-0.020
ll-0.020
ve-0.020
field-0.018
Token on
Token ID319
Feature activation+1.48e-02

Pos logit contributions
+0.002
iw+0.002
ve+0.002
ner+0.002
ll+0.002
Neg logit contributions
 aids-0.002
 jokes-0.002
 face-0.002
tr-0.002
 repeat-0.002
Token the
Token ID262
Feature activation+4.17e-03

Pos logit contributions
 aids+0.030
 echoed+0.028
tr+0.028
 jokes+0.028
 lull+0.027
Neg logit contributions
-0.025
iw-0.020
ve-0.018
ner-0.017
ll-0.017
Token f
Token ID277
Feature activation+3.52e-02

Pos logit contributions
 aids+0.007
 repeat+0.007
 echoed+0.007
 jokes+0.007
 lull+0.006
Neg logit contributions
-0.009
ve-0.007
iw-0.007
ll-0.006
ner-0.006
Tokenauc
Token ID14272
Feature activation+8.18e-02

Pos logit contributions
 aids+0.051
 lull+0.050
 face+0.049
 jokes+0.049
 repeat+0.048
Neg logit contributions
-0.083
iw-0.072
ve-0.067
ll-0.066
ner-0.062
Tokenet
Token ID316
Feature activation+3.23e-02

Pos logit contributions
 aids+0.114
 echoed+0.100
 lull+0.100
 repeat+0.098
 face+0.095
Neg logit contributions
-0.169
iw-0.156
mer-0.148
ll-0.147
 least-0.144
Token and
Token ID290
Feature activation+2.43e-03

Pos logit contributions
 aids+0.069
 jokes+0.062
 face+0.061
 lull+0.060
iet+0.057
Neg logit contributions
-0.059
iw-0.043
ll-0.043
ve-0.042
ner-0.040
Token icy
Token ID30284
Feature activation+2.16e-02

Pos logit contributions
 aids+0.005
 jokes+0.004
 lull+0.004
 sentence+0.004
tr+0.004
Neg logit contributions
-0.004
iw-0.003
ve-0.003
ner-0.003
ll-0.003
Token water
Token ID1660
Feature activation+4.38e-03

Pos logit contributions
 aids+0.038
 repeat+0.034
tr+0.032
 jokes+0.032
 lull+0.032
Neg logit contributions
-0.044
iw-0.035
ve-0.034
ll-0.033
field-0.032
Token came
Token ID1625
Feature activation-7.31e-04

Pos logit contributions
 aids+0.009
 jokes+0.008
 lull+0.008
 repeat+0.008
 face+0.008
Neg logit contributions
-0.008
iw-0.006
ve-0.006
ll-0.006
field-0.006
Token mom
Token ID1995
Feature activation-2.86e-02

Pos logit contributions
 aids+0.020
 echoed+0.018
 face+0.017
tr+0.017
 jokes+0.017
Neg logit contributions
-0.029
iw-0.023
ll-0.023
ve-0.022
 least-0.022
Token was
Token ID373
Feature activation-5.87e-03

Pos logit contributions
+0.060
iw+0.051
ll+0.050
ve+0.049
field+0.046
Neg logit contributions
 aids-0.048
 face-0.044
 jokes-0.043
 lull-0.040
 echoed-0.040
Token right
Token ID826
Feature activation-7.88e-03

Pos logit contributions
+0.011
iw+0.009
ve+0.008
ll+0.008
field+0.008
Neg logit contributions
 aids-0.011
 face-0.011
 jokes-0.011
 lull-0.010
 echoed-0.010
Tokenâ
Token ID22940
Feature activation-2.96e-02

Pos logit contributions
+0.016
iw+0.013
ve+0.012
ll+0.012
field+0.011
Neg logit contributions
 aids-0.014
 face-0.014
 jokes-0.013
 lull-0.013
 echoed-0.013
Token
Token ID26391
Feature activation-4.37e-02

Pos logit contributions
+0.068
iw+0.058
ve+0.054
ll+0.054
field+0.052
Neg logit contributions
 aids-0.045
 face-0.043
 jokes-0.041
 lull-0.039
 echoed-0.039
Token
Token ID447
Feature activation-8.78e-02

Pos logit contributions
+0.088
iw+0.080
 least+0.071
field+0.069
ve+0.069
Neg logit contributions
 aids-0.072
 face-0.069
 echoed-0.068
 jokes-0.065
tr-0.064
Token
Token ID251
Feature activation-2.01e-02

Pos logit contributions
iw+0.022
+0.012
ve+0.006
 searching+0.005
lves+0.004
Neg logit contributions
 face-0.274
tr-0.263
 aids-0.262
 repeat-0.262
 echoed-0.255
Tokenshe
Token ID7091
Feature activation+3.90e-03

Pos logit contributions
+0.037
iw+0.030
ve+0.027
ll+0.027
field+0.026
Neg logit contributions
 aids-0.040
 face-0.038
 jokes-0.037
 lull-0.036
tr-0.036
Token was
Token ID373
Feature activation+5.44e-03

Pos logit contributions
 aids+0.007
 jokes+0.006
 songs+0.006
icks+0.006
 face+0.006
Neg logit contributions
-0.007
ve-0.006
ll-0.006
 least-0.005
field-0.005
Token happy
Token ID3772
Feature activation-1.34e-02

Pos logit contributions
 aids+0.012
 jokes+0.011
 face+0.011
tr+0.011
 lull+0.011
Neg logit contributions
-0.009
iw-0.006
ve-0.006
 least-0.006
ll-0.006
Token being
Token ID852
Feature activation+7.48e-03

Pos logit contributions
+0.024
iw+0.020
ll+0.018
ve+0.018
field+0.017
Neg logit contributions
 aids-0.027
 face-0.025
 jokes-0.025
tr-0.024
 echoed-0.024
Token she
Token ID673
Feature activation-1.84e-03

Pos logit contributions
 aids+0.018
 face+0.017
 jokes+0.017
 lull+0.017
tr+0.017
Neg logit contributions
-0.017
iw-0.014
ve-0.013
ll-0.013
field-0.012
Token realized
Token ID6939
Feature activation-4.11e-03

Pos logit contributions
+0.004
iw+0.003
ve+0.003
ll+0.003
field+0.003
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 lull-0.003
tr-0.003
Token something
Token ID1223
Feature activation-2.07e-03

Pos logit contributions
+0.007
iw+0.005
ve+0.005
ll+0.005
field+0.004
Neg logit contributions
 aids-0.009
 face-0.009
 jokes-0.008
 lull-0.008
tr-0.008
Tokenâ
Token ID22940
Feature activation-2.48e-02

Pos logit contributions
+0.004
iw+0.003
ve+0.003
ll+0.003
 least+0.003
Neg logit contributions
 aids-0.004
 face-0.004
 jokes-0.003
 lull-0.003
tr-0.003
Token
Token ID26391
Feature activation-3.25e-02

Pos logit contributions
+0.059
iw+0.050
ve+0.047
ll+0.047
field+0.045
Neg logit contributions
 aids-0.036
 face-0.034
 jokes-0.033
 lull-0.031
 echoed-0.031
Token
Token ID447
Feature activation-8.55e-02

Pos logit contributions
+0.066
iw+0.063
 least+0.057
ll+0.054
field+0.053
Neg logit contributions
 aids-0.050
 face-0.048
 echoed-0.046
tr-0.044
 jokes-0.043
Token
Token ID251
Feature activation-2.80e-02

Pos logit contributions
iw+0.023
 searching+0.012
ll+0.010
irst+0.009
ve+0.003
Neg logit contributions
 face-0.254
tr-0.252
 aids-0.252
 repeat-0.245
 echoed-0.243
Tokenthe
Token ID1169
Feature activation-1.58e-02

Pos logit contributions
+0.055
iw+0.043
ve+0.040
ll+0.039
field+0.038
Neg logit contributions
 aids-0.053
 jokes-0.049
 face-0.049
 lull-0.048
tr-0.048
Token yacht
Token ID43207
Feature activation-8.60e-03

Pos logit contributions
+0.033
iw+0.029
ll+0.027
ve+0.026
field+0.025
Neg logit contributions
 aids-0.027
 face-0.027
 jokes-0.026
 lull-0.024
 echoed-0.024
Token wasn
Token ID2492
Feature activation-2.13e-02

Pos logit contributions
+0.019
iw+0.015
ve+0.015
ll+0.015
field+0.014
Neg logit contributions
 aids-0.014
 jokes-0.013
 face-0.013
 lull-0.012
tr-0.012
Token't
Token ID470
Feature activation+1.81e-02

Pos logit contributions
+0.052
iw+0.046
ve+0.043
ll+0.043
field+0.042
Neg logit contributions
 aids-0.028
 face-0.026
 jokes-0.025
 echoed-0.024
 repeat-0.024
Tokenwho
Token ID8727
Feature activation+2.90e-02

Pos logit contributions
+0.076
iw+0.062
ve+0.060
ll+0.059
field+0.058
Neg logit contributions
 aids-0.048
 face-0.046
 repeat-0.046
 jokes-0.045
 lull-0.044
Token can
Token ID460
Feature activation-3.64e-02

Pos logit contributions
 aids+0.042
 lull+0.041
 face+0.039
tr+0.038
 jokes+0.036
Neg logit contributions
-0.062
iw-0.054
ve-0.054
ll-0.051
ner-0.050
Token tell
Token ID1560
Feature activation-1.10e-03

Pos logit contributions
+0.069
iw+0.055
ve+0.052
ll+0.051
ner+0.049
Neg logit contributions
 aids-0.070
 face-0.067
 jokes-0.065
 lull-0.064
 echoed-0.063
Token me
Token ID502
Feature activation+5.82e-03

Pos logit contributions
+0.002
iw+0.001
ve+0.001
ll+0.001
 least+0.001
Neg logit contributions
 aids-0.003
 face-0.002
 lull-0.002
tr-0.002
 echoed-0.002
Token what
Token ID644
Feature activation-2.47e-02

Pos logit contributions
 aids+0.010
tr+0.010
 echoed+0.010
 face+0.010
 lull+0.010
Neg logit contributions
-0.011
iw-0.009
ll-0.008
ve-0.008
 least-0.008
Token '
Token ID705
Feature activation-8.49e-02

Pos logit contributions
+0.039
iw+0.032
ll+0.029
ve+0.029
ner+0.027
Neg logit contributions
 aids-0.052
 face-0.051
 lull-0.049
tr-0.048
 jokes-0.048
Tokenwe
Token ID732
Feature activation+8.78e-03

Pos logit contributions
+0.162
iw+0.133
ll+0.127
ve+0.124
field+0.124
Neg logit contributions
 aids-0.151
 repeat-0.138
 face-0.138
 jokes-0.136
 echoed-0.136
Tokenigh
Token ID394
Feature activation-9.82e-03

Pos logit contributions
 aids+0.015
 face+0.014
 jokes+0.014
tr+0.014
 lull+0.014
Neg logit contributions
-0.018
ve-0.015
iw-0.015
ll-0.014
ner-0.014
Token'
Token ID6
Feature activation-3.65e-02

Pos logit contributions
+0.019
ve+0.015
iw+0.015
ll+0.015
ner+0.014
Neg logit contributions
 aids-0.017
 face-0.017
 lull-0.016
 jokes-0.016
tr-0.016
Token means
Token ID1724
Feature activation-1.44e-02

Pos logit contributions
+0.075
iw+0.062
ve+0.058
ll+0.058
field+0.055
Neg logit contributions
 aids-0.064
 face-0.061
 jokes-0.059
 lull-0.057
 echoed-0.057
Token?"
Token ID1701
Feature activation-1.12e-02

Pos logit contributions
+0.028
iw+0.024
ll+0.021
ve+0.021
field+0.020
Neg logit contributions
 aids-0.027
 face-0.026
 jokes-0.025
 repeat-0.024
 echoed-0.024
Token't
Token ID470
Feature activation+2.85e-02

Pos logit contributions
+0.035
iw+0.030
ll+0.025
ve+0.025
field+0.023
Neg logit contributions
 aids-0.062
 face-0.060
 jokes-0.058
 repeat-0.057
 echoed-0.057
Token believe
Token ID1975
Feature activation+6.62e-03

Pos logit contributions
 aids+0.042
 face+0.038
 jokes+0.038
tr+0.036
 lull+0.036
Neg logit contributions
-0.067
iw-0.055
ve-0.054
ll-0.052
 least-0.052
Token it
Token ID340
Feature activation-2.81e-02

Pos logit contributions
 aids+0.011
tr+0.011
 face+0.010
 lull+0.010
 jokes+0.010
Neg logit contributions
-0.014
iw-0.011
ve-0.011
ll-0.011
ner-0.011
Tokenâ
Token ID22940
Feature activation-1.40e-02

Pos logit contributions
+0.051
iw+0.039
ve+0.039
ll+0.037
field+0.035
Neg logit contributions
 aids-0.057
 face-0.053
 jokes-0.052
 lull-0.051
tr-0.050
Token
Token ID26391
Feature activation-3.23e-02

Pos logit contributions
+0.034
iw+0.028
ve+0.027
ll+0.026
 least+0.026
Neg logit contributions
 aids-0.020
 face-0.019
 jokes-0.019
 echoed-0.017
 lull-0.017
Token
Token ID447
Feature activation-8.33e-02

Pos logit contributions
+0.064
iw+0.059
 least+0.052
ll+0.052
ve+0.051
Neg logit contributions
 aids-0.054
 face-0.050
tr-0.049
 echoed-0.048
 jokes-0.047
Token
Token ID251
Feature activation-3.00e-02

Pos logit contributions
iw+0.028
 searching+0.025
lves+0.021
irst+0.020
ll+0.019
Neg logit contributions
 face-0.231
tr-0.231
 repeat-0.228
 aids-0.224
 nod-0.217
Tokenthis
Token ID5661
Feature activation+3.60e-03

Pos logit contributions
+0.058
iw+0.047
ve+0.043
ll+0.042
field+0.041
Neg logit contributions
 aids-0.058
 face-0.055
 jokes-0.054
 lull-0.052
 echoed-0.052
Token one
Token ID530
Feature activation-8.22e-03

Pos logit contributions
 aids+0.006
 face+0.006
 jokes+0.006
tr+0.005
 echoed+0.005
Neg logit contributions
-0.008
ve-0.006
iw-0.006
field-0.006
ner-0.005
Token feather
Token ID17972
Feature activation+6.21e-03

Pos logit contributions
+0.016
iw+0.013
ve+0.012
ll+0.012
field+0.012
Neg logit contributions
 aids-0.015
tr-0.014
 jokes-0.013
 face-0.013
 lull-0.013
Token seemed
Token ID3947
Feature activation-3.38e-02

Pos logit contributions
 aids+0.012
 jokes+0.011
 face+0.010
tr+0.010
 lull+0.010
Neg logit contributions
-0.012
ve-0.010
ll-0.009
field-0.009
iw-0.009
Token Steve
Token ID6542
Feature activation-1.82e-03

Pos logit contributions
 aids+0.003
 face+0.003
 jokes+0.003
 lull+0.003
 echoed+0.003
Neg logit contributions
-0.005
iw-0.004
ve-0.004
ll-0.004
field-0.004
Token was
Token ID373
Feature activation-4.46e-03

Pos logit contributions
+0.004
iw+0.003
ve+0.003
ll+0.003
field+0.003
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 lull-0.002
 echoed-0.002
Token amazed
Token ID24562
Feature activation-3.36e-02

Pos logit contributions
+0.008
iw+0.006
ve+0.006
ll+0.006
field+0.005
Neg logit contributions
 aids-0.009
 face-0.009
 jokes-0.009
 echoed-0.009
 lull-0.008
Tokenâ
Token ID22940
Feature activation-2.92e-02

Pos logit contributions
+0.078
iw+0.066
ve+0.061
ner+0.059
ll+0.058
Neg logit contributions
 face-0.053
 jokes-0.050
 aids-0.050
 echoed-0.048
 lull-0.046
Token
Token ID26391
Feature activation-2.94e-02

Pos logit contributions
+0.066
iw+0.056
ve+0.052
ll+0.052
field+0.050
Neg logit contributions
 aids-0.046
 face-0.043
 jokes-0.042
 lull-0.040
 echoed-0.040
Token
Token ID447
Feature activation-8.23e-02

Pos logit contributions
+0.059
iw+0.054
 least+0.050
ner+0.045
irst+0.045
Neg logit contributions
 aids-0.049
 face-0.047
 echoed-0.047
 jokes-0.044
tr-0.044
Token
Token ID251
Feature activation-2.36e-02

Pos logit contributions
iw+0.076
irst+0.065
ll+0.064
ge+0.060
handed+0.059
Neg logit contributions
 face-0.200
tr-0.188
 repeat-0.186
 echoed-0.182
 sounds-0.181
Tokenhis
Token ID14363
Feature activation+2.93e-03

Pos logit contributions
+0.050
iw+0.042
ve+0.040
ll+0.039
field+0.037
Neg logit contributions
 aids-0.040
 face-0.038
 jokes-0.037
 echoed-0.035
 lull-0.035
Token wings
Token ID12098
Feature activation+2.55e-02

Pos logit contributions
 aids+0.005
 face+0.004
 jokes+0.004
 lull+0.004
 echoed+0.004
Neg logit contributions
-0.006
iw-0.005
ve-0.005
ll-0.005
field-0.005
Token started
Token ID2067
Feature activation-2.08e-02

Pos logit contributions
 aids+0.049
 jokes+0.045
tr+0.043
 face+0.041
 lull+0.040
Neg logit contributions
-0.049
ve-0.040
ll-0.038
field-0.037
iw-0.037
Token to
Token ID284
Feature activation-1.97e-02

Pos logit contributions
+0.033
iw+0.026
ve+0.025
ll+0.024
 searching+0.024
Neg logit contributions
 aids-0.044
tr-0.041
 jokes-0.041
 face-0.041
 lull-0.040
Token have
Token ID423
Feature activation-9.28e-03

Pos logit contributions
 aids+0.007
 face+0.007
 repeat+0.006
 jokes+0.006
tr+0.006
Neg logit contributions
-0.013
iw-0.011
ll-0.010
field-0.010
ve-0.010
Token a
Token ID257
Feature activation+2.74e-03

Pos logit contributions
+0.021
iw+0.018
ve+0.017
ll+0.016
field+0.016
Neg logit contributions
 aids-0.015
 face-0.014
 jokes-0.013
 lull-0.013
 echoed-0.013
Token label
Token ID6167
Feature activation+1.84e-02

Pos logit contributions
 aids+0.004
 face+0.004
 jokes+0.004
 echoed+0.004
 lull+0.004
Neg logit contributions
-0.006
iw-0.005
ve-0.005
ll-0.005
field-0.005
Token that
Token ID326
Feature activation-1.65e-02

Pos logit contributions
 aids+0.039
 face+0.037
 jokes+0.037
 lull+0.036
 echoed+0.036
Neg logit contributions
-0.030
iw-0.024
ve-0.023
ll-0.021
ner-0.021
Token says
Token ID1139
Feature activation-9.23e-03

Pos logit contributions
+0.036
iw+0.032
ve+0.029
ll+0.029
 least+0.028
Neg logit contributions
 aids-0.025
 lull-0.024
 face-0.023
 jokes-0.022
 echoed-0.022
Token '
Token ID705
Feature activation-8.20e-02

Pos logit contributions
+0.018
iw+0.014
ve+0.014
ner+0.013
ll+0.013
Neg logit contributions
 aids-0.016
 lull-0.016
 face-0.015
 jokes-0.015
 echoed-0.015
Tokencars
Token ID37993
Feature activation-7.06e-03

Pos logit contributions
+0.163
iw+0.135
ll+0.131
ve+0.126
field+0.124
Neg logit contributions
 aids-0.143
 face-0.131
tr-0.127
 repeat-0.127
 jokes-0.126
Token'.
Token ID4458
Feature activation+5.55e-04

Pos logit contributions
+0.013
iw+0.011
ve+0.010
ll+0.010
ner+0.009
Neg logit contributions
 aids-0.014
 face-0.013
 jokes-0.013
 lull-0.012
 echoed-0.012
Token I
Token ID314
Feature activation+2.80e-02

Pos logit contributions
 aids+0.001
 face+0.001
 lull+0.001
 jokes+0.001
tr+0.001
Neg logit contributions
-0.001
iw-0.001
ner-0.001
ve-0.001
ll-0.001
Token will
Token ID481
Feature activation+1.54e-03

Pos logit contributions
 aids+0.046
 face+0.043
 jokes+0.042
 lull+0.041
 echoed+0.040
Neg logit contributions
-0.062
iw-0.052
ve-0.050
ll-0.048
 least-0.046
Token put
Token ID1234
Feature activation-4.68e-03

Pos logit contributions
 aids+0.002
 echoed+0.002
 jokes+0.002
 face+0.002
 lull+0.002
Neg logit contributions
-0.004
iw-0.003
ve-0.003
ll-0.003
 least-0.003
Token have
Token ID423
Feature activation-3.27e-04

Pos logit contributions
 aids+0.037
 face+0.035
 jokes+0.033
 echoed+0.032
 lull+0.032
Neg logit contributions
-0.053
iw-0.045
ve-0.043
ll-0.042
field-0.040
Token a
Token ID257
Feature activation+9.90e-03

Pos logit contributions
+0.001
iw+0.001
ve+0.001
ll+0.001
field+0.001
Neg logit contributions
 aids-0.001
 face-0.001
 jokes-0.000
 lull-0.000
 echoed-0.000
Token label
Token ID6167
Feature activation+1.96e-02

Pos logit contributions
 aids+0.015
 face+0.014
 jokes+0.014
 echoed+0.013
 lull+0.013
Neg logit contributions
-0.023
iw-0.019
ve-0.018
ll-0.018
field-0.017
Token that
Token ID326
Feature activation-1.15e-02

Pos logit contributions
 aids+0.039
 face+0.037
 jokes+0.037
 lull+0.036
 echoed+0.036
Neg logit contributions
-0.034
iw-0.029
ve-0.027
ll-0.025
ner-0.025
Token says
Token ID1139
Feature activation-5.07e-03

Pos logit contributions
+0.024
iw+0.021
ll+0.019
ve+0.019
 least+0.019
Neg logit contributions
 aids-0.019
 lull-0.018
 face-0.018
 jokes-0.017
 echoed-0.017
Token '
Token ID705
Feature activation-8.17e-02

Pos logit contributions
+0.012
iw+0.010
ve+0.009
ner+0.009
ll+0.009
Neg logit contributions
 aids-0.007
 face-0.007
 lull-0.007
 jokes-0.007
 echoed-0.007
Tokenanim
Token ID11227
Feature activation-2.09e-02

Pos logit contributions
+0.164
iw+0.136
ll+0.132
ve+0.127
field+0.126
Neg logit contributions
 aids-0.140
 face-0.128
 repeat-0.125
tr-0.125
 jokes-0.124
Tokenals
Token ID874
Feature activation+1.89e-02

Pos logit contributions
+0.046
iw+0.039
ve+0.036
ll+0.036
field+0.034
Neg logit contributions
 aids-0.034
 face-0.032
 jokes-0.031
 lull-0.030
 echoed-0.030
Token'.
Token ID4458
Feature activation+3.87e-03

Pos logit contributions
 aids+0.038
 jokes+0.036
 face+0.036
 lull+0.034
 echoed+0.034
Neg logit contributions
-0.034
iw-0.026
ve-0.026
ll-0.025
ner-0.024
Token I
Token ID314
Feature activation+3.06e-02

Pos logit contributions
 aids+0.006
 face+0.006
 jokes+0.006
 lull+0.006
 echoed+0.006
Neg logit contributions
-0.009
iw-0.007
ve-0.007
ner-0.006
ll-0.006
Token will
Token ID481
Feature activation+2.92e-03

Pos logit contributions
 aids+0.050
 face+0.046
 jokes+0.046
 echoed+0.044
 lull+0.044
Neg logit contributions
-0.068
iw-0.056
ve-0.055
ll-0.052
 least-0.051
Token creature
Token ID7185
Feature activation+2.87e-02

Pos logit contributions
+0.037
iw+0.031
ll+0.029
ve+0.028
ner+0.027
Neg logit contributions
 aids-0.027
 face-0.027
 jokes-0.026
 lull-0.025
 repeat-0.024
Token was
Token ID373
Feature activation+4.75e-04

Pos logit contributions
 aids+0.049
 face+0.045
 jokes+0.044
tr+0.043
 repeat+0.041
Neg logit contributions
-0.057
iw-0.048
ll-0.048
ve-0.047
field-0.047
Token right
Token ID826
Feature activation-1.14e-02

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
tr+0.001
 echoed+0.001
Neg logit contributions
-0.001
iw-0.001
ve-0.001
ll-0.001
field-0.001
Tokenâ
Token ID22940
Feature activation-1.19e-02

Pos logit contributions
+0.024
iw+0.020
ve+0.018
ll+0.018
ner+0.017
Neg logit contributions
 face-0.020
 aids-0.019
 jokes-0.018
 lull-0.018
 repeat-0.018
Token
Token ID26391
Feature activation-3.08e-02

Pos logit contributions
+0.027
iw+0.022
ve+0.021
ll+0.021
field+0.020
Neg logit contributions
 aids-0.019
 face-0.018
 jokes-0.017
 echoed-0.016
 lull-0.016
Token
Token ID447
Feature activation-8.12e-02

Pos logit contributions
iw+0.061
+0.059
 least+0.053
ner+0.048
irst+0.048
Neg logit contributions
 aids-0.046
 face-0.046
 echoed-0.046
tr-0.044
 repeat-0.041
Token
Token ID251
Feature activation-2.23e-02

Pos logit contributions
iw+0.053
irst+0.048
ge+0.043
lves+0.042
 searching+0.041
Neg logit contributions
 face-0.211
 repeat-0.200
tr-0.198
 nod-0.193
 sounded-0.186
TokenB
Token ID33
Feature activation-9.60e-03

Pos logit contributions
+0.046
iw+0.040
ve+0.037
ll+0.037
field+0.035
Neg logit contributions
 aids-0.037
 face-0.037
 jokes-0.035
 echoed-0.034
 lull-0.034
Tokenenny
Token ID11870
Feature activation+8.32e-03

Pos logit contributions
+0.019
iw+0.016
field+0.015
ve+0.014
ll+0.014
Neg logit contributions
 aids-0.017
 face-0.017
 jokes-0.016
 echoed-0.016
 lull-0.015
Token had
Token ID550
Feature activation-2.03e-02

Pos logit contributions
 aids+0.014
 jokes+0.013
tr+0.013
 face+0.012
 echoed+0.012
Neg logit contributions
-0.016
ve-0.013
ll-0.013
iw-0.013
field-0.013
Token gotten
Token ID7891
Feature activation-1.45e-02

Pos logit contributions
+0.040
iw+0.033
ve+0.031
ll+0.031
 least+0.030
Neg logit contributions
 aids-0.036
 face-0.034
 jokes-0.033
tr-0.033
 repeat-0.032
Token they
Token ID484
Feature activation-1.98e-03

Pos logit contributions
+0.033
iw+0.028
ve+0.026
ll+0.026
field+0.025
Neg logit contributions
 aids-0.027
 face-0.025
 jokes-0.025
 echoed-0.024
 lull-0.024
Token were
Token ID547
Feature activation-1.05e-02

Pos logit contributions
+0.004
iw+0.004
ve+0.003
ll+0.003
ner+0.003
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 echoed-0.003
 lull-0.003
Token happy
Token ID3772
Feature activation-1.74e-02

Pos logit contributions
+0.021
iw+0.017
ve+0.016
ll+0.015
ner+0.015
Neg logit contributions
 aids-0.020
 face-0.020
 jokes-0.018
 echoed-0.018
 lull-0.018
Tokenâ
Token ID22940
Feature activation-2.08e-02

Pos logit contributions
+0.038
iw+0.033
ve+0.031
ner+0.029
ll+0.029
Neg logit contributions
 face-0.029
 jokes-0.028
 aids-0.027
 lull-0.026
 echoed-0.026
Token
Token ID26391
Feature activation-3.21e-02

Pos logit contributions
+0.045
iw+0.038
ve+0.035
ll+0.035
field+0.034
Neg logit contributions
 aids-0.034
 face-0.033
 jokes-0.031
 lull-0.030
 echoed-0.030
Token
Token ID447
Feature activation-8.11e-02

Pos logit contributions
+0.072
iw+0.067
 least+0.061
ner+0.060
ve+0.058
Neg logit contributions
 aids-0.045
 face-0.044
 echoed-0.042
 jokes-0.040
tr-0.040
Token
Token ID251
Feature activation-2.27e-02

Pos logit contributions
iw+0.031
+0.026
ner+0.025
irst+0.018
lves+0.018
Neg logit contributions
 face-0.244
tr-0.232
 aids-0.223
 repeat-0.221
 echoed-0.221
Tokenthey
Token ID9930
Feature activation+6.59e-03

Pos logit contributions
+0.048
iw+0.041
ve+0.039
ll+0.038
field+0.036
Neg logit contributions
 aids-0.037
 face-0.037
 jokes-0.035
 lull-0.034
 echoed-0.033
Token had
Token ID550
Feature activation-2.05e-02

Pos logit contributions
 aids+0.011
 face+0.010
 jokes+0.010
tr+0.010
 lull+0.010
Neg logit contributions
-0.014
iw-0.012
ve-0.011
ll-0.011
field-0.011
Token prepared
Token ID5597
Feature activation-2.05e-02

Pos logit contributions
+0.040
iw+0.033
ll+0.031
ve+0.031
 least+0.031
Neg logit contributions
 aids-0.037
tr-0.033
 face-0.033
 jokes-0.032
 repeat-0.032
Token a
Token ID257
Feature activation+3.35e-03

Pos logit contributions
+0.048
iw+0.043
ve+0.040
ll+0.039
ner+0.038
Neg logit contributions
 face-0.031
 aids-0.030
 jokes-0.029
 lull-0.027
 repeat-0.025
Token something
Token ID1223
Feature activation-1.72e-03

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.002
 echoed+0.002
tr+0.002
Neg logit contributions
-0.003
iw-0.002
ve-0.002
ll-0.002
field-0.002
Token very
Token ID845
Feature activation+2.24e-03

Pos logit contributions
+0.003
iw+0.003
ve+0.003
ll+0.002
field+0.002
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 lull-0.003
 echoed-0.003
Token special
Token ID2041
Feature activation+1.43e-02

Pos logit contributions
 aids+0.004
 face+0.004
 jokes+0.004
 lull+0.004
 echoed+0.004
Neg logit contributions
-0.005
iw-0.004
ve-0.003
ll-0.003
ner-0.003
Tokenâ
Token ID22940
Feature activation-1.74e-02

Pos logit contributions
 aids+0.028
 face+0.026
 jokes+0.025
 lull+0.025
 echoed+0.025
Neg logit contributions
-0.027
iw-0.022
ve-0.020
ll-0.020
field-0.019
Token
Token ID26391
Feature activation-2.85e-02

Pos logit contributions
+0.042
iw+0.035
ve+0.033
ll+0.033
field+0.032
Neg logit contributions
 aids-0.025
 face-0.024
 jokes-0.023
 echoed-0.022
 lull-0.021
Token
Token ID447
Feature activation-8.02e-02

Pos logit contributions
+0.059
iw+0.052
 least+0.048
ve+0.046
ll+0.046
Neg logit contributions
 aids-0.048
 face-0.045
 jokes-0.043
 echoed-0.043
 lull-0.041
Token
Token ID251
Feature activation-2.59e-02

Pos logit contributions
iw+0.024
ll+0.014
irst+0.012
 miles+0.012
ve+0.011
Neg logit contributions
 face-0.231
 aids-0.229
tr-0.223
 repeat-0.223
 nod-0.220
Tokenthe
Token ID1169
Feature activation-1.56e-02

Pos logit contributions
+0.052
iw+0.043
ve+0.040
ll+0.040
field+0.038
Neg logit contributions
 aids-0.047
 face-0.045
 jokes-0.043
 echoed-0.042
 lull-0.042
Token rise
Token ID4485
Feature activation+1.88e-02

Pos logit contributions
+0.030
iw+0.025
ve+0.024
ll+0.023
ner+0.023
Neg logit contributions
 aids-0.028
 face-0.027
 jokes-0.026
 echoed-0.026
tr-0.026
Token of
Token ID286
Feature activation-2.70e-03

Pos logit contributions
 aids+0.034
 jokes+0.033
 face+0.031
 repeat+0.031
tr+0.031
Neg logit contributions
-0.034
ve-0.027
field-0.026
iw-0.025
ll-0.025
Token the
Token ID262
Feature activation-1.86e-02

Pos logit contributions
+0.006
iw+0.005
ve+0.005
ll+0.005
field+0.005
Neg logit contributions
 aids-0.004
 face-0.004
 jokes-0.004
 lull-0.004
 echoed-0.004
Token not
Token ID407
Feature activation+2.26e-02

Pos logit contributions
+0.000
iw+0.000
ve+0.000
ll+0.000
ner+0.000
Neg logit contributions
 aids-0.000
 face-0.000
 jokes-0.000
 echoed-0.000
tr-0.000
Token real
Token ID1103
Feature activation-1.00e-02

Pos logit contributions
 face+0.037
 echoed+0.035
 jokes+0.034
 aids+0.034
 repeat+0.033
Neg logit contributions
-0.053
iw-0.043
ll-0.039
ve-0.038
field-0.037
Token ice
Token ID4771
Feature activation-7.34e-03

Pos logit contributions
+0.021
iw+0.018
ve+0.017
ll+0.016
field+0.016
Neg logit contributions
 face-0.017
 aids-0.017
 jokes-0.016
 echoed-0.016
 repeat-0.015
Tokenâ
Token ID22940
Feature activation-1.19e-02

Pos logit contributions
+0.016
iw+0.013
ner+0.013
ve+0.012
field+0.012
Neg logit contributions
 face-0.012
 jokes-0.011
tr-0.011
 aids-0.010
 echoed-0.010
Token
Token ID26391
Feature activation-2.45e-02

Pos logit contributions
+0.029
iw+0.024
ll+0.023
ve+0.023
 least+0.023
Neg logit contributions
 aids-0.017
 face-0.016
 jokes-0.015
 echoed-0.015
 lull-0.014
Token
Token ID447
Feature activation-8.02e-02

Pos logit contributions
+0.053
iw+0.051
 least+0.046
field+0.044
ner+0.044
Neg logit contributions
 aids-0.035
 face-0.034
tr-0.033
 echoed-0.032
 jokes-0.031
Token
Token ID251
Feature activation-1.73e-02

Pos logit contributions
iw+0.047
irst+0.042
years+0.036
 searching+0.035
joy+0.031
Neg logit contributions
 face-0.221
tr-0.220
 go-0.209
 repeat-0.205
 sounded-0.196
Tokenit
Token ID270
Feature activation+2.84e-02

Pos logit contributions
+0.033
iw+0.026
ve+0.025
ll+0.024
field+0.023
Neg logit contributions
 aids-0.033
 face-0.031
 jokes-0.031
 lull-0.030
tr-0.030
Token was
Token ID373
Feature activation+7.41e-03

Pos logit contributions
 aids+0.055
 jokes+0.050
 names+0.046
 languages+0.046
icks+0.045
Neg logit contributions
-0.054
ve-0.045
mer-0.040
ll-0.038
field-0.038
Token fake
Token ID8390
Feature activation-3.95e-05

Pos logit contributions
 aids+0.017
 face+0.016
 jokes+0.016
 lull+0.015
 repeat+0.015
Neg logit contributions
-0.011
iw-0.008
ve-0.008
ll-0.007
 least-0.007
Token!
Token ID0
Feature activation+1.18e-02

Pos logit contributions
+0.000
iw+0.000
ve+0.000
ll+0.000
field+0.000
Neg logit contributions
 face-0.000
 aids-0.000
 jokes-0.000
 echoed-0.000
 repeat-0.000
Token what
Token ID644
Feature activation-4.05e-03

Pos logit contributions
 aids+0.026
 face+0.024
tr+0.023
 lull+0.023
 jokes+0.023
Neg logit contributions
-0.033
iw-0.028
ve-0.027
ll-0.026
 searching-0.026
Token to
Token ID284
Feature activation-7.66e-03

Pos logit contributions
+0.007
iw+0.006
ve+0.006
ll+0.005
field+0.005
Neg logit contributions
 aids-0.008
 face-0.008
 jokes-0.007
 lull-0.007
 echoed-0.007
Token do
Token ID466
Feature activation-1.42e-02

Pos logit contributions
+0.018
ve+0.015
ll+0.015
 searching+0.014
 least+0.014
Neg logit contributions
 aids-0.011
tr-0.011
 lull-0.010
iet-0.010
 echoed-0.009
Tokenâ
Token ID22940
Feature activation-1.49e-02

Pos logit contributions
+0.030
iw+0.025
ve+0.022
field+0.022
ll+0.022
Neg logit contributions
 aids-0.024
 face-0.024
 jokes-0.023
 echoed-0.022
 repeat-0.022
Token
Token ID26391
Feature activation-3.14e-02

Pos logit contributions
+0.035
iw+0.030
ve+0.028
ll+0.028
 least+0.027
Neg logit contributions
 aids-0.021
 face-0.020
 jokes-0.020
 echoed-0.019
 lull-0.019
Token
Token ID447
Feature activation-8.00e-02

Pos logit contributions
+0.064
iw+0.062
 least+0.054
ll+0.052
field+0.052
Neg logit contributions
 aids-0.049
 face-0.046
 echoed-0.044
 jokes-0.042
tr-0.041
Token
Token ID251
Feature activation-2.76e-02

Pos logit contributions
iw+0.028
 searching+0.011
irst+0.010
ll+0.009
lves+0.006
Neg logit contributions
 face-0.238
 aids-0.236
tr-0.230
 repeat-0.228
 echoed-0.227
TokenS
Token ID50
Feature activation-2.96e-02

Pos logit contributions
+0.054
iw+0.045
ve+0.042
ll+0.042
field+0.040
Neg logit contributions
 aids-0.050
 face-0.049
 jokes-0.047
 echoed-0.045
 repeat-0.045
Tokenoph
Token ID2522
Feature activation+1.69e-02

Pos logit contributions
+0.063
iw+0.052
ve+0.049
ll+0.048
ner+0.048
Neg logit contributions
 aids-0.049
 face-0.048
 jokes-0.045
 repeat-0.043
 lull-0.043
Tokenie
Token ID494
Feature activation+9.76e-03

Pos logit contributions
 aids+0.037
 jokes+0.034
tr+0.034
 face+0.034
 repeat+0.034
Neg logit contributions
-0.024
iw-0.021
ll-0.020
ve-0.020
 least-0.018
Token used
Token ID973
Feature activation-1.22e-05

Pos logit contributions
 aids+0.017
 lull+0.016
 jokes+0.016
tr+0.016
 face+0.015
Neg logit contributions
-0.018
ve-0.016
ll-0.015
iw-0.015
 least-0.014
Token something
Token ID1223
Feature activation+1.94e-03

Pos logit contributions
+0.012
iw+0.010
ve+0.009
field+0.009
 least+0.009
Neg logit contributions
 aids-0.013
tr-0.013
 face-0.013
iet-0.012
 jokes-0.012
Token was
Token ID373
Feature activation-7.24e-03

Pos logit contributions
 aids+0.004
 face+0.003
tr+0.003
 repeat+0.003
iet+0.003
Neg logit contributions
-0.003
iw-0.003
ve-0.003
ll-0.003
 searching-0.002
Token wrong
Token ID2642
Feature activation-5.86e-03

Pos logit contributions
+0.012
iw+0.009
ve+0.009
field+0.009
 searching+0.009
Neg logit contributions
 aids-0.015
 face-0.014
 jokes-0.014
 lull-0.014
 echoed-0.013
Tokenâ
Token ID22940
Feature activation-1.56e-02

Pos logit contributions
+0.010
iw+0.008
ve+0.008
ll+0.007
field+0.007
Neg logit contributions
 aids-0.012
 face-0.011
 jokes-0.011
 lull-0.011
 repeat-0.011
Token
Token ID26391
Feature activation-2.93e-02

Pos logit contributions
+0.037
iw+0.031
ve+0.030
ll+0.030
field+0.029
Neg logit contributions
 aids-0.022
 face-0.021
 jokes-0.021
 echoed-0.020
 lull-0.020
Token
Token ID447
Feature activation-8.00e-02

Pos logit contributions
+0.055
iw+0.049
 least+0.044
ll+0.043
ve+0.042
Neg logit contributions
 aids-0.054
 face-0.051
 echoed-0.048
 jokes-0.047
tr-0.047
Token
Token ID251
Feature activation-3.44e-02

Pos logit contributions
iw+0.017
 searching+0.007
ll+0.007
irst+0.006
+0.006
Neg logit contributions
 face-0.243
 aids-0.241
tr-0.237
 repeat-0.236
 nod-0.229
Tokenall
Token ID439
Feature activation-1.97e-02

Pos logit contributions
+0.065
iw+0.052
ve+0.049
ll+0.047
field+0.046
Neg logit contributions
 aids-0.066
 face-0.063
 jokes-0.062
 lull-0.060
tr-0.060
Token the
Token ID262
Feature activation-1.94e-02

Pos logit contributions
+0.042
iw+0.035
ve+0.032
ll+0.032
field+0.030
Neg logit contributions
 aids-0.034
 face-0.032
 jokes-0.031
 echoed-0.030
 lull-0.030
Token water
Token ID1660
Feature activation+5.72e-04

Pos logit contributions
+0.041
iw+0.033
ve+0.032
ll+0.031
field+0.030
Neg logit contributions
 aids-0.033
 face-0.031
 jokes-0.030
 echoed-0.030
 lull-0.030
Token had
Token ID550
Feature activation-2.46e-02

Pos logit contributions
 aids+0.001
tr+0.001
 jokes+0.001
 lull+0.001
 face+0.001
Neg logit contributions
-0.001
ve-0.001
ll-0.001
field-0.001
iw-0.001
Token of
Token ID286
Feature activation-8.47e-03

Pos logit contributions
+0.026
iw+0.020
ve+0.019
ner+0.019
ll+0.018
Neg logit contributions
 aids-0.027
 face-0.027
 jokes-0.026
 lull-0.025
 echoed-0.024
Token the
Token ID262
Feature activation-1.30e-02

Pos logit contributions
+0.016
iw+0.013
ve+0.012
field+0.012
 least+0.012
Neg logit contributions
 aids-0.015
 face-0.015
 echoed-0.014
 lull-0.014
 jokes-0.014
Token animal
Token ID5044
Feature activation+1.57e-02

Pos logit contributions
+0.028
iw+0.024
ve+0.023
ll+0.022
field+0.022
Neg logit contributions
 aids-0.021
 face-0.019
 echoed-0.019
 jokes-0.018
 lull-0.018
Token that
Token ID326
Feature activation-1.57e-02

Pos logit contributions
 aids+0.030
 face+0.028
 jokes+0.027
 lull+0.027
 echoed+0.027
Neg logit contributions
-0.030
iw-0.024
ve-0.023
ll-0.022
field-0.021
Token says
Token ID1139
Feature activation-1.57e-02

Pos logit contributions
+0.035
iw+0.030
field+0.028
ve+0.028
ll+0.027
Neg logit contributions
 aids-0.024
 face-0.022
 jokes-0.021
 lull-0.021
iet-0.020
Token '
Token ID705
Feature activation-7.98e-02

Pos logit contributions
+0.027
iw+0.021
ve+0.020
ll+0.019
ner+0.019
Neg logit contributions
 aids-0.032
 face-0.031
 jokes-0.030
 lull-0.029
tr-0.029
Tokenwo
Token ID21638
Feature activation+5.48e-04

Pos logit contributions
+0.159
iw+0.131
ll+0.126
ve+0.123
field+0.119
Neg logit contributions
 aids-0.138
 face-0.125
 repeat-0.124
 jokes-0.123
tr-0.122
Tokenof
Token ID1659
Feature activation+2.72e-02

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 echoed+0.001
tr+0.001
Neg logit contributions
-0.001
iw-0.001
ve-0.001
ll-0.001
ner-0.001
Token'
Token ID6
Feature activation-1.11e-02

Pos logit contributions
 aids+0.055
 face+0.055
 echoed+0.053
 jokes+0.053
tr+0.052
Neg logit contributions
-0.045
iw-0.035
ve-0.035
ll-0.032
ner-0.031
Token?"
Token ID1701
Feature activation-1.93e-02

Pos logit contributions
+0.022
iw+0.018
ve+0.017
ll+0.017
field+0.016
Neg logit contributions
 aids-0.019
 face-0.018
 echoed-0.018
 jokes-0.017
 lull-0.017
Token
Token ID198
Feature activation-2.84e-03

Pos logit contributions
+0.038
iw+0.031
ve+0.030
ll+0.029
field+0.028
Neg logit contributions
 aids-0.035
 face-0.034
 jokes-0.033
 echoed-0.031
 repeat-0.031
Token thought
Token ID1807
Feature activation-6.61e-03

Pos logit contributions
+0.006
iw+0.005
ve+0.005
ll+0.005
field+0.005
Neg logit contributions
 aids-0.004
 face-0.004
 jokes-0.004
 lull-0.004
 echoed-0.004
Token about
Token ID546
Feature activation-3.65e-03

Pos logit contributions
+0.012
ve+0.010
iw+0.009
ll+0.009
ner+0.009
Neg logit contributions
 aids-0.012
 face-0.011
tr-0.011
 jokes-0.011
 lull-0.011
Token the
Token ID262
Feature activation-1.41e-02

Pos logit contributions
+0.007
iw+0.006
ve+0.006
ll+0.006
 least+0.006
Neg logit contributions
 aids-0.006
 face-0.006
tr-0.006
 echoed-0.006
 jokes-0.006
Token night
Token ID1755
Feature activation-3.63e-02

Pos logit contributions
+0.031
iw+0.026
ve+0.024
ll+0.024
field+0.023
Neg logit contributions
 aids-0.024
 face-0.023
 jokes-0.022
 lull-0.021
 echoed-0.021
TokenÂ
Token ID5523
Feature activation+1.39e-02

Pos logit contributions
+0.069
iw+0.056
ve+0.052
ll+0.051
field+0.050
Neg logit contributions
 aids-0.068
 face-0.064
 jokes-0.063
 lull-0.061
 repeat-0.060
Token´
Token ID18265
Feature activation-7.98e-02

Pos logit contributions
 aids+0.028
 face+0.027
 jokes+0.026
 echoed+0.025
 repeat+0.025
Neg logit contributions
-0.025
iw-0.020
ve-0.019
ll-0.018
field-0.018
Tokens
Token ID82
Feature activation+1.09e-02

Pos logit contributions
+0.147
 least+0.140
iw+0.128
field+0.127
 downfall+0.125
Neg logit contributions
 sounded-0.115
 echoed-0.114
tr-0.105
 aids-0.103
 repeat-0.101
Token adventure
Token ID8855
Feature activation-2.94e-02

Pos logit contributions
 aids+0.019
 face+0.017
tr+0.017
 echoed+0.017
 jokes+0.017
Neg logit contributions
-0.021
iw-0.017
ve-0.016
 least-0.016
ll-0.016
Token as
Token ID355
Feature activation-2.53e-03

Pos logit contributions
+0.059
iw+0.049
ve+0.046
ll+0.045
field+0.043
Neg logit contributions
 aids-0.053
 face-0.051
 jokes-0.049
 echoed-0.048
 lull-0.047
Token she
Token ID673
Feature activation-8.50e-03

Pos logit contributions
+0.006
iw+0.005
ve+0.004
ll+0.004
field+0.004
Neg logit contributions
 aids-0.004
 jokes-0.004
 face-0.004
 echoed-0.004
 lull-0.004
Token drifted
Token ID38648
Feature activation+1.41e-02

Pos logit contributions
+0.018
iw+0.015
ll+0.015
ve+0.015
field+0.014
Neg logit contributions
 aids-0.014
 face-0.013
 jokes-0.013
 lull-0.012
tr-0.012
Token a
Token ID257
Feature activation+7.84e-03

Pos logit contributions
+0.003
iw+0.003
ve+0.003
ll+0.003
field+0.002
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 lull-0.003
 echoed-0.003
Token dirty
Token ID11841
Feature activation+3.14e-02

Pos logit contributions
 aids+0.011
 echoed+0.010
 jokes+0.010
 face+0.010
 lull+0.009
Neg logit contributions
-0.018
iw-0.016
ve-0.015
field-0.014
ll-0.014
Token animal
Token ID5044
Feature activation+2.26e-02

Pos logit contributions
 aids+0.049
 echoed+0.046
 jokes+0.042
 lull+0.041
iet+0.041
Neg logit contributions
-0.067
iw-0.057
field-0.054
ve-0.052
mer-0.049
Token that
Token ID326
Feature activation-1.03e-02

Pos logit contributions
 aids+0.044
 face+0.042
 jokes+0.041
 lull+0.040
 echoed+0.040
Neg logit contributions
-0.040
iw-0.033
ve-0.031
ll-0.030
field-0.029
Token says
Token ID1139
Feature activation-5.96e-03

Pos logit contributions
+0.022
iw+0.019
ve+0.017
field+0.017
ll+0.017
Neg logit contributions
 aids-0.016
 face-0.016
 lull-0.015
 jokes-0.015
 echoed-0.014
Token '
Token ID705
Feature activation-7.93e-02

Pos logit contributions
+0.014
ve+0.011
iw+0.011
ner+0.010
field+0.010
Neg logit contributions
 aids-0.008
 face-0.008
 jokes-0.008
 echoed-0.008
 lull-0.008
Tokens
Token ID82
Feature activation+2.29e-02

Pos logit contributions
+0.155
iw+0.130
ll+0.126
ve+0.121
field+0.118
Neg logit contributions
 aids-0.138
 face-0.125
 repeat-0.123
tr-0.122
 jokes-0.122
Tokenque
Token ID4188
Feature activation+1.60e-02

Pos logit contributions
 aids+0.034
 echoed+0.034
 face+0.033
 lull+0.033
 jokes+0.033
Neg logit contributions
-0.049
ve-0.040
iw-0.040
field-0.038
ll-0.037
Tokenak
Token ID461
Feature activation-1.85e-03

Pos logit contributions
 aids+0.021
 face+0.020
 jokes+0.019
 lull+0.018
 echoed+0.018
Neg logit contributions
-0.040
iw-0.034
ve-0.032
ll-0.032
field-0.031
Token'.
Token ID4458
Feature activation-7.58e-03

Pos logit contributions
+0.003
iw+0.003
ve+0.002
ll+0.002
field+0.002
Neg logit contributions
 aids-0.004
 face-0.004
 jokes-0.004
 echoed-0.003
 lull-0.003
Token And
Token ID843
Feature activation-2.28e-03

Pos logit contributions
+0.018
iw+0.015
ve+0.014
ll+0.014
field+0.014
Neg logit contributions
 aids-0.011
 face-0.011
 jokes-0.010
 echoed-0.010
 repeat-0.010
Tokenly
Token ID306
Feature activation-1.26e-02

Pos logit contributions
 aids+0.027
tr+0.025
 lull+0.025
 face+0.025
 jokes+0.024
Neg logit contributions
-0.026
ve-0.023
iw-0.022
ner-0.021
ll-0.020
Token colors
Token ID7577
Feature activation+1.70e-02

Pos logit contributions
+0.024
iw+0.021
ve+0.020
field+0.018
ner+0.018
Neg logit contributions
 aids-0.021
 lull-0.020
 repeat-0.019
 jokes-0.019
 echoed-0.019
Token and
Token ID290
Feature activation-3.13e-02

Pos logit contributions
 aids+0.036
 face+0.033
 jokes+0.033
 lull+0.032
 repeat+0.031
Neg logit contributions
-0.028
iw-0.023
ve-0.022
ll-0.021
ner-0.020
Token it
Token ID340
Feature activation-2.04e-02

Pos logit contributions
+0.058
iw+0.053
 searching+0.047
ve+0.046
field+0.045
Neg logit contributions
 aids-0.051
iet-0.048
 lull-0.048
tr-0.045
 face-0.044
Token says
Token ID1139
Feature activation-1.17e-02

Pos logit contributions
+0.046
iw+0.040
ve+0.039
ll+0.038
 least+0.037
Neg logit contributions
 aids-0.029
 face-0.027
 jokes-0.026
 lull-0.026
iet-0.024
Token '
Token ID705
Feature activation-7.92e-02

Pos logit contributions
+0.021
iw+0.016
ve+0.016
ner+0.015
ll+0.014
Neg logit contributions
 aids-0.022
 lull-0.021
iet-0.020
tr-0.020
 echoed-0.020
TokenWelcome
Token ID14618
Feature activation-2.05e-02

Pos logit contributions
+0.159
iw+0.131
ll+0.127
ve+0.124
field+0.121
Neg logit contributions
 aids-0.135
 face-0.124
tr-0.121
 repeat-0.121
 jokes-0.120
Token'
Token ID6
Feature activation-2.90e-02

Pos logit contributions
+0.034
iw+0.027
ve+0.025
ll+0.023
field+0.023
Neg logit contributions
 aids-0.043
 face-0.041
 jokes-0.041
 echoed-0.041
 lull-0.040
Token at
Token ID379
Feature activation+1.40e-03

Pos logit contributions
+0.052
iw+0.042
ve+0.039
ll+0.038
field+0.036
Neg logit contributions
 aids-0.058
 face-0.056
 jokes-0.054
 echoed-0.053
 lull-0.053
Token the
Token ID262
Feature activation+9.27e-03

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.002
 repeat+0.002
 lull+0.002
Neg logit contributions
-0.003
iw-0.002
ll-0.002
ve-0.002
ner-0.002
Token top
Token ID1353
Feature activation-1.31e-02

Pos logit contributions
 aids+0.018
 face+0.017
 echoed+0.017
 jokes+0.016
 lull+0.016
Neg logit contributions
-0.017
iw-0.014
ve-0.013
field-0.013
 least-0.013
Token was
Token ID373
Feature activation+7.26e-03

Pos logit contributions
 aids+0.022
 face+0.020
 jokes+0.020
 lull+0.019
 echoed+0.019
Neg logit contributions
-0.030
iw-0.025
ve-0.024
ll-0.024
field-0.023
Token so
Token ID523
Feature activation-1.62e-02

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.015
tr+0.015
 lull+0.015
Neg logit contributions
-0.011
iw-0.009
ve-0.008
ll-0.008
 least-0.007
Token relieved
Token ID22598
Feature activation-9.61e-03

Pos logit contributions
+0.031
iw+0.025
ve+0.023
ll+0.023
field+0.021
Neg logit contributions
 aids-0.032
 face-0.030
 jokes-0.029
 echoed-0.029
 lull-0.028
Tokenâ
Token ID22940
Feature activation-2.59e-02

Pos logit contributions
+0.018
iw+0.014
ve+0.013
ll+0.013
field+0.012
Neg logit contributions
 aids-0.019
 face-0.018
 jokes-0.018
 lull-0.017
 echoed-0.017
Token
Token ID26391
Feature activation-2.91e-02

Pos logit contributions
+0.061
iw+0.052
ve+0.049
ll+0.048
field+0.046
Neg logit contributions
 aids-0.038
 face-0.036
 jokes-0.035
 lull-0.033
 echoed-0.033
Token
Token ID447
Feature activation-7.89e-02

Pos logit contributions
+0.060
iw+0.056
 least+0.052
field+0.049
ve+0.049
Neg logit contributions
 aids-0.044
 face-0.044
 echoed-0.043
 jokes-0.039
tr-0.038
Token
Token ID251
Feature activation-2.36e-02

Pos logit contributions
iw+0.093
+0.085
ll+0.083
irst+0.082
 searching+0.080
Neg logit contributions
 face-0.170
tr-0.164
 aids-0.160
 repeat-0.158
 echoed-0.157
Tokenhe
Token ID258
Feature activation+1.72e-03

Pos logit contributions
+0.046
iw+0.038
ve+0.035
ll+0.035
field+0.033
Neg logit contributions
 aids-0.044
 face-0.042
 jokes-0.041
 lull-0.039
 echoed-0.039
Token knew
Token ID2993
Feature activation-1.24e-02

Pos logit contributions
 aids+0.003
tr+0.003
 jokes+0.003
 face+0.003
 lull+0.003
Neg logit contributions
-0.003
ve-0.003
iw-0.003
ner-0.003
field-0.002
Token that
Token ID326
Feature activation+3.24e-03

Pos logit contributions
+0.022
iw+0.018
ve+0.017
ll+0.016
field+0.016
Neg logit contributions
 aids-0.025
 face-0.024
 jokes-0.023
 lull-0.022
 echoed-0.022
Token if
Token ID611
Feature activation-2.25e-02

Pos logit contributions
 aids+0.007
 face+0.007
 jokes+0.006
tr+0.006
iet+0.006
Neg logit contributions
-0.005
iw-0.004
ve-0.004
ll-0.004
 least-0.004
Token is
Token ID318
Feature activation+9.35e-03

Pos logit contributions
+0.065
iw+0.056
ve+0.054
ll+0.053
field+0.052
Neg logit contributions
 aids-0.030
 face-0.028
 jokes-0.027
 lull-0.026
 echoed-0.025
Token a
Token ID257
Feature activation-6.05e-03

Pos logit contributions
 aids+0.015
 face+0.014
 jokes+0.013
 lull+0.013
 echoed+0.013
Neg logit contributions
-0.021
iw-0.018
ve-0.016
ll-0.016
field-0.016
Token sign
Token ID1051
Feature activation+1.10e-02

Pos logit contributions
+0.013
iw+0.011
ve+0.010
ll+0.010
ner+0.009
Neg logit contributions
 aids-0.010
 face-0.010
 jokes-0.010
 lull-0.009
 repeat-0.009
Token that
Token ID326
Feature activation-2.42e-02

Pos logit contributions
 aids+0.021
 face+0.020
 jokes+0.019
 lull+0.019
 echoed+0.019
Neg logit contributions
-0.021
iw-0.017
ve-0.016
ll-0.016
field-0.015
Token says
Token ID1139
Feature activation-9.45e-03

Pos logit contributions
+0.054
iw+0.047
ll+0.045
ve+0.044
field+0.044
Neg logit contributions
 aids-0.035
 face-0.033
 lull-0.032
 jokes-0.031
 echoed-0.029
Token '
Token ID705
Feature activation-7.86e-02

Pos logit contributions
+0.019
ve+0.015
iw+0.015
ner+0.014
ll+0.014
Neg logit contributions
 aids-0.016
 face-0.015
 lull-0.015
 jokes-0.015
 echoed-0.014
TokenHot
Token ID21352
Feature activation+4.02e-03

Pos logit contributions
+0.158
iw+0.133
ll+0.128
ve+0.123
field+0.120
Neg logit contributions
 aids-0.134
 face-0.121
tr-0.121
 repeat-0.120
 echoed-0.118
Tokenel
Token ID417
Feature activation+4.71e-02

Pos logit contributions
 aids+0.008
 face+0.007
 jokes+0.007
 echoed+0.007
 lull+0.007
Neg logit contributions
-0.008
iw-0.006
ve-0.006
ll-0.006
field-0.006
Token'
Token ID6
Feature activation-9.79e-03

Pos logit contributions
 aids+0.095
 jokes+0.091
 face+0.090
 lull+0.089
 echoed+0.088
Neg logit contributions
-0.079
ve-0.063
iw-0.062
field-0.058
ll-0.057
Token with
Token ID351
Feature activation+8.94e-03

Pos logit contributions
+0.020
iw+0.017
ve+0.015
field+0.015
ll+0.015
Neg logit contributions
 aids-0.017
 face-0.016
 jokes-0.015
 echoed-0.015
 lull-0.015
Token a
Token ID257
Feature activation+1.16e-03

Pos logit contributions
 aids+0.015
 echoed+0.014
 face+0.014
 lull+0.014
 jokes+0.014
Neg logit contributions
-0.017
iw-0.015
ve-0.014
 least-0.013
field-0.013
Token something
Token ID1223
Feature activation-3.61e-04

Pos logit contributions
+0.010
iw+0.008
ve+0.007
ll+0.007
field+0.007
Neg logit contributions
 aids-0.009
 face-0.009
 lull-0.008
tr-0.008
 repeat-0.008
Token else
Token ID2073
Feature activation-1.20e-02

Pos logit contributions
+0.001
iw+0.000
ve+0.000
 searching+0.000
ll+0.000
Neg logit contributions
 aids-0.001
 face-0.001
 jokes-0.001
 lull-0.001
tr-0.001
Token too
Token ID1165
Feature activation-4.65e-03

Pos logit contributions
+0.023
iw+0.019
ve+0.017
ll+0.017
field+0.016
Neg logit contributions
 aids-0.023
 face-0.022
 jokes-0.021
 lull-0.021
 echoed-0.021
Tokenâ
Token ID22940
Feature activation-2.08e-02

Pos logit contributions
+0.010
iw+0.009
ll+0.008
ve+0.008
 least+0.007
Neg logit contributions
 face-0.008
 aids-0.008
 jokes-0.007
 echoed-0.007
 repeat-0.007
Token
Token ID26391
Feature activation-3.05e-02

Pos logit contributions
+0.050
iw+0.041
ve+0.039
ll+0.039
field+0.038
Neg logit contributions
 aids-0.031
 face-0.029
 jokes-0.029
 lull-0.027
 echoed-0.027
Token
Token ID447
Feature activation-7.84e-02

Pos logit contributions
+0.062
iw+0.056
 least+0.051
ll+0.049
ve+0.049
Neg logit contributions
 aids-0.049
 face-0.047
 echoed-0.045
tr-0.044
 jokes-0.044
Token
Token ID251
Feature activation-2.30e-02

Pos logit contributions
iw+0.027
irst+0.020
years+0.018
ll+0.015
lves+0.015
Neg logit contributions
 face-0.227
tr-0.225
 repeat-0.218
 aids-0.216
 name-0.215
Tokenit
Token ID270
Feature activation+2.33e-02

Pos logit contributions
+0.043
iw+0.034
ve+0.032
ll+0.031
field+0.030
Neg logit contributions
 aids-0.046
 face-0.043
 jokes-0.042
 lull-0.041
tr-0.041
Token was
Token ID373
Feature activation+6.49e-03

Pos logit contributions
 aids+0.044
 jokes+0.041
 languages+0.038
 face+0.038
 lull+0.038
Neg logit contributions
-0.042
ve-0.037
field-0.032
mer-0.031
ll-0.030
Token a
Token ID257
Feature activation-5.37e-03

Pos logit contributions
 aids+0.012
 lull+0.011
 face+0.011
tr+0.011
 jokes+0.011
Neg logit contributions
-0.012
iw-0.010
 searching-0.009
field-0.009
ve-0.009
Token sour
Token ID11348
Feature activation+1.49e-02

Pos logit contributions
+0.011
iw+0.009
ve+0.009
ll+0.008
field+0.008
Neg logit contributions
 aids-0.009
 face-0.009
 jokes-0.009
 echoed-0.008
 lull-0.008
Token
Token ID198
Feature activation-7.00e-03

Pos logit contributions
 aids+0.023
 face+0.022
 lull+0.021
 jokes+0.021
 echoed+0.020
Neg logit contributions
-0.031
iw-0.027
 least-0.024
ve-0.024
field-0.023
TokenOne
Token ID3198
Feature activation+1.17e-02

Pos logit contributions
+0.011
iw+0.009
ve+0.008
ll+0.008
ner+0.008
Neg logit contributions
 aids-0.015
 face-0.015
 jokes-0.014
 echoed-0.014
 repeat-0.013
Token day
Token ID1110
Feature activation+6.42e-03

Pos logit contributions
 aids+0.031
 jokes+0.028
 face+0.028
 echoed+0.027
tr+0.027
Neg logit contributions
-0.013
iw-0.010
ll-0.009
field-0.009
 least-0.007
Token as
Token ID355
Feature activation+1.33e-02

Pos logit contributions
 aids+0.014
 face+0.013
 jokes+0.013
 lull+0.013
tr+0.012
Neg logit contributions
-0.011
iw-0.008
ve-0.008
ner-0.007
ll-0.007
Token she
Token ID673
Feature activation+1.65e-02

Pos logit contributions
 face+0.019
 aids+0.019
 echoed+0.019
 jokes+0.019
 repeat+0.018
Neg logit contributions
-0.030
ve-0.025
iw-0.024
ll-0.024
field-0.023
Token che
Token ID1125
Feature activation+5.18e-02

Pos logit contributions
 aids+0.025
 face+0.023
 jokes+0.023
 lull+0.021
 echoed+0.021
Neg logit contributions
-0.038
iw-0.032
ve-0.030
ll-0.030
field-0.029
Tokenwed
Token ID19103
Feature activation-1.00e-03

Pos logit contributions
 aids+0.074
 face+0.072
 jokes+0.067
 repeat+0.064
tr+0.063
Neg logit contributions
-0.120
iw-0.108
ve-0.103
ll-0.101
field-0.095
Token she
Token ID673
Feature activation-1.63e-02

Pos logit contributions
+0.002
iw+0.002
ve+0.002
ll+0.002
ner+0.001
Neg logit contributions
 aids-0.002
 face-0.002
 jokes-0.002
 lull-0.002
 echoed-0.002
Token saw
Token ID2497
Feature activation-3.08e-03

Pos logit contributions
+0.039
iw+0.033
ve+0.031
ll+0.031
ner+0.030
Neg logit contributions
 aids-0.023
 face-0.023
 jokes-0.022
 repeat-0.021
 echoed-0.020
Token something
Token ID1223
Feature activation-5.12e-03

Pos logit contributions
+0.006
iw+0.005
 least+0.004
ner+0.004
 searching+0.004
Neg logit contributions
 aids-0.006
 face-0.005
 repeat-0.005
 lull-0.005
 echoed-0.005
Token new
Token ID649
Feature activation+6.36e-03

Pos logit contributions
+0.009
iw+0.008
ve+0.007
ll+0.007
field+0.007
Neg logit contributions
 aids-0.010
 face-0.009
 jokes-0.009
 lull-0.009
tr-0.009
Token and
Token ID290
Feature activation-1.20e-02

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.002
 lull+0.002
 echoed+0.002
Neg logit contributions
-0.003
iw-0.002
ve-0.002
ll-0.002
field-0.002
Token started
Token ID2067
Feature activation-1.04e-02

Pos logit contributions
+0.023
iw+0.021
ve+0.019
ll+0.019
 least+0.018
Neg logit contributions
 aids-0.022
 jokes-0.020
 face-0.020
 lull-0.020
 echoed-0.019
Token to
Token ID284
Feature activation-9.78e-03

Pos logit contributions
+0.021
iw+0.018
ve+0.017
ll+0.016
field+0.016
Neg logit contributions
 aids-0.019
 face-0.018
 jokes-0.017
 repeat-0.016
 lull-0.016
Token use
Token ID779
Feature activation+1.27e-02

Pos logit contributions
+0.021
iw+0.018
 least+0.017
field+0.016
 searching+0.016
Neg logit contributions
 aids-0.016
iet-0.014
 echoed-0.013
tr-0.013
 jokes-0.013
Token her
Token ID607
Feature activation+2.21e-02

Pos logit contributions
 aids+0.020
 face+0.020
 jokes+0.019
 echoed+0.018
 repeat+0.018
Neg logit contributions
-0.029
iw-0.024
ve-0.023
ll-0.023
field-0.022
Token cr
Token ID1067
Feature activation+5.22e-02

Pos logit contributions
 aids+0.033
 face+0.030
 jokes+0.029
 lull+0.029
 echoed+0.029
Neg logit contributions
-0.052
iw-0.044
ve-0.041
ll-0.041
field-0.039
Tokenay
Token ID323
Feature activation+3.42e-02

Pos logit contributions
 aids+0.092
 jokes+0.090
 lull+0.086
 face+0.086
 repeat+0.086
Neg logit contributions
-0.107
ve-0.086
iw-0.084
ll-0.084
ner-0.073
Tokenons
Token ID684
Feature activation+2.20e-02

Pos logit contributions
 aids+0.079
 face+0.076
 lull+0.076
 echoed+0.076
 jokes+0.074
Neg logit contributions
-0.047
iw-0.038
ve-0.037
ll-0.035
ner-0.028
Token to
Token ID284
Feature activation+1.08e-02

Pos logit contributions
 aids+0.042
 face+0.040
 jokes+0.039
 lull+0.038
 echoed+0.038
Neg logit contributions
-0.042
iw-0.034
ve-0.032
ll-0.031
field-0.029
Token scrib
Token ID44661
Feature activation+2.96e-02

Pos logit contributions
 aids+0.015
 echoed+0.014
 jokes+0.014
 lull+0.014
iet+0.013
Neg logit contributions
-0.026
iw-0.022
ve-0.020
 least-0.020
field-0.020
Tokenble
Token ID903
Feature activation-6.99e-03

Pos logit contributions
 aids+0.064
 jokes+0.061
 lull+0.061
 face+0.061
 repeat+0.058
Neg logit contributions
-0.046
ve-0.040
iw-0.034
ner-0.033
ll-0.032
TokenBut
Token ID1537
Feature activation+7.88e-03

Pos logit contributions
+0.028
iw+0.023
ve+0.022
ll+0.022
field+0.021
Neg logit contributions
 aids-0.023
 face-0.022
 jokes-0.021
 echoed-0.021
 lull-0.020
Token then
Token ID788
Feature activation+7.08e-04

Pos logit contributions
 aids+0.015
 face+0.015
 jokes+0.014
 lull+0.014
 echoed+0.014
Neg logit contributions
-0.015
iw-0.012
ve-0.011
ll-0.011
field-0.011
Token they
Token ID484
Feature activation+8.14e-03

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 echoed+0.001
 repeat+0.001
Neg logit contributions
-0.001
iw-0.001
ve-0.001
ll-0.001
field-0.001
Token heard
Token ID2982
Feature activation+2.16e-02

Pos logit contributions
 aids+0.013
 jokes+0.012
 face+0.012
 lull+0.012
tr+0.011
Neg logit contributions
-0.018
iw-0.015
ve-0.014
ll-0.014
field-0.014
Token a
Token ID257
Feature activation+1.64e-02

Pos logit contributions
 repeat+0.042
tr+0.042
 lull+0.041
 aids+0.041
ems+0.040
Neg logit contributions
-0.029
 late-0.021
 least-0.020
 Smith-0.020
 sight-0.019
Token loud
Token ID7812
Feature activation+5.39e-02

Pos logit contributions
 aids+0.022
 jokes+0.019
 echoed+0.019
tr+0.019
 face+0.018
Neg logit contributions
-0.039
ve-0.033
iw-0.033
ll-0.031
field-0.031
Token noise
Token ID7838
Feature activation+4.44e-02

Pos logit contributions
 aids+0.069
tr+0.063
iet+0.059
 face+0.059
 echoed+0.058
Neg logit contributions
-0.125
ve-0.106
iw-0.102
ner-0.100
 searching-0.098
Token.
Token ID13
Feature activation+1.11e-03

Pos logit contributions
 aids+0.091
 face+0.086
 jokes+0.084
tr+0.081
 lull+0.081
Neg logit contributions
-0.076
iw-0.059
ve-0.056
ll-0.054
ner-0.052
Token They
Token ID1119
Feature activation+6.53e-03

Pos logit contributions
 aids+0.002
 lull+0.002
 repeat+0.002
 face+0.001
 jokes+0.001
Neg logit contributions
-0.003
iw-0.002
ner-0.002
ve-0.002
field-0.002
Token looked
Token ID3114
Feature activation-4.87e-03

Pos logit contributions
 aids+0.012
 lull+0.011
 jokes+0.011
tr+0.010
 face+0.010
Neg logit contributions
-0.012
ve-0.010
iw-0.010
ll-0.010
field-0.009
Token outside
Token ID2354
Feature activation-1.92e-02

Pos logit contributions
+0.009
iw+0.008
ve+0.007
ll+0.007
 least+0.007
Neg logit contributions
 aids-0.009
 face-0.008
tr-0.008
 lull-0.008
 jokes-0.008
Token can
Token ID460
Feature activation-4.09e-03

Pos logit contributions
 face+0.044
 aids+0.039
 repeat+0.039
 jokes+0.038
 nod+0.034
Neg logit contributions
-0.103
iw-0.094
field-0.090
ve-0.087
ll-0.085
Token use
Token ID779
Feature activation+1.91e-02

Pos logit contributions
+0.010
iw+0.009
ve+0.008
ll+0.008
field+0.008
Neg logit contributions
 aids-0.005
 face-0.005
 jokes-0.005
 echoed-0.005
 lull-0.005
Token the
Token ID262
Feature activation+2.35e-02

Pos logit contributions
 aids+0.027
 face+0.026
 jokes+0.025
 repeat+0.023
 lull+0.023
Neg logit contributions
-0.046
iw-0.039
ve-0.038
ll-0.037
field-0.035
Token red
Token ID2266
Feature activation+3.79e-02

Pos logit contributions
 aids+0.036
 echoed+0.034
 lull+0.033
 face+0.032
 jokes+0.032
Neg logit contributions
-0.052
iw-0.044
ve-0.040
field-0.039
ll-0.039
Token past
Token ID1613
Feature activation+2.10e-02

Pos logit contributions
 aids+0.062
 lull+0.059
 echoed+0.059
iet+0.056
 jokes+0.054
Neg logit contributions
-0.082
iw-0.070
field-0.060
ner-0.060
ve-0.059
Tokenel
Token ID417
Feature activation+6.02e-02

Pos logit contributions
 aids+0.038
 lull+0.037
 face+0.037
 jokes+0.035
 echoed+0.035
Neg logit contributions
-0.041
iw-0.034
ve-0.030
field-0.030
 least-0.030
Token first
Token ID717
Feature activation+9.42e-03

Pos logit contributions
 lull+0.111
 aids+0.110
 echoed+0.106
 jokes+0.101
 face+0.099
Neg logit contributions
-0.103
iw-0.094
ve-0.082
 least-0.078
ll-0.078
Token,
Token ID11
Feature activation+1.57e-02

Pos logit contributions
 aids+0.016
 echoed+0.015
 lull+0.015
 jokes+0.015
iet+0.015
Neg logit contributions
-0.018
iw-0.015
field-0.014
 least-0.013
ve-0.013
Token and
Token ID290
Feature activation-1.17e-02

Pos logit contributions
 aids+0.032
 lull+0.029
tr+0.029
 echoed+0.029
iet+0.028
Neg logit contributions
-0.027
iw-0.023
 least-0.020
ner-0.019
ve-0.018
Token then
Token ID788
Feature activation-1.64e-02

Pos logit contributions
+0.020
iw+0.017
 least+0.015
ll+0.014
ner+0.014
Neg logit contributions
 aids-0.024
 lull-0.023
iet-0.022
 echoed-0.022
 face-0.022
Token give
Token ID1577
Feature activation+2.38e-03

Pos logit contributions
+0.026
iw+0.020
ve+0.018
ll+0.018
ner+0.017
Neg logit contributions
 aids-0.037
 face-0.035
 jokes-0.035
 lull-0.034
 echoed-0.034
Token quiet
Token ID5897
Feature activation+3.82e-04

Pos logit contributions
 aids+0.052
tr+0.047
 face+0.047
 lull+0.046
 echoed+0.046
Neg logit contributions
-0.048
iw-0.041
ve-0.038
 least-0.038
ll-0.037
Token like
Token ID588
Feature activation-6.23e-03

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 lull+0.001
 echoed+0.001
Neg logit contributions
-0.001
iw-0.001
ve-0.001
ll-0.001
field-0.001
Token you
Token ID345
Feature activation+1.59e-02

Pos logit contributions
+0.013
iw+0.010
ve+0.010
ll+0.010
field+0.009
Neg logit contributions
 aids-0.011
 face-0.011
 jokes-0.011
 echoed-0.010
 lull-0.010
Token."
Token ID526
Feature activation+7.87e-03

Pos logit contributions
 aids+0.031
 face+0.030
 jokes+0.029
 echoed+0.028
 repeat+0.028
Neg logit contributions
-0.030
iw-0.025
ve-0.022
ll-0.022
field-0.021
Token Ch
Token ID609
Feature activation+1.91e-03

Pos logit contributions
 aids+0.015
 face+0.014
 jokes+0.013
 lull+0.013
 echoed+0.013
Neg logit contributions
-0.016
iw-0.013
ve-0.012
ll-0.012
field-0.011
Tokenir
Token ID343
Feature activation+6.79e-02

Pos logit contributions
 aids+0.003
 face+0.003
 jokes+0.003
iet+0.003
tr+0.003
Neg logit contributions
-0.004
iw-0.003
field-0.003
 lifetime-0.003
ve-0.003
Tokenpy
Token ID9078
Feature activation+4.09e-02

Pos logit contributions
 aids+0.085
 face+0.077
 lull+0.076
 jokes+0.073
 repeat+0.071
Neg logit contributions
-0.165
ve-0.149
iw-0.148
ll-0.146
ner-0.136
Token decided
Token ID3066
Feature activation-1.93e-02

Pos logit contributions
 aids+0.079
 face+0.069
tr+0.068
 lull+0.066
 jokes+0.066
Neg logit contributions
-0.071
ll-0.063
ve-0.061
iw-0.059
 least-0.054
Token to
Token ID284
Feature activation+1.11e-02

Pos logit contributions
+0.038
iw+0.031
ve+0.029
ll+0.028
field+0.027
Neg logit contributions
 aids-0.036
 face-0.035
 jokes-0.034
 echoed-0.032
 lull-0.032
Token help
Token ID1037
Feature activation+1.72e-02

Pos logit contributions
 aids+0.018
 echoed+0.016
tr+0.016
iet+0.015
 lull+0.015
Neg logit contributions
-0.023
 least-0.019
ll-0.018
field-0.017
ner-0.017
Token Wh
Token ID854
Feature activation+1.36e-02

Pos logit contributions
 aids+0.030
 face+0.029
 jokes+0.028
 repeat+0.027
 echoed+0.027
Neg logit contributions
-0.036
iw-0.030
ve-0.028
ll-0.027
field-0.026
Token dark
Token ID3223
Feature activation+5.06e-03

Pos logit contributions
+0.049
iw+0.042
ve+0.040
ll+0.040
ner+0.038
Neg logit contributions
 aids-0.028
 face-0.028
 jokes-0.026
 lull-0.024
 repeat-0.024
Token cloud
Token ID6279
Feature activation+2.94e-03

Pos logit contributions
 aids+0.009
 lull+0.008
 jokes+0.008
 repeat+0.008
 face+0.008
Neg logit contributions
-0.010
iw-0.008
field-0.008
ve-0.008
ll-0.007
Token was
Token ID373
Feature activation-3.74e-03

Pos logit contributions
 aids+0.005
 face+0.004
 jokes+0.004
 lull+0.004
 repeat+0.004
Neg logit contributions
-0.006
iw-0.005
ve-0.005
ll-0.005
field-0.005
Token just
Token ID655
Feature activation-6.64e-03

Pos logit contributions
+0.007
iw+0.005
ll+0.005
ve+0.005
field+0.005
Neg logit contributions
 aids-0.007
 face-0.007
 jokes-0.007
 echoed-0.007
tr-0.007
Token a
Token ID257
Feature activation-1.99e-02

Pos logit contributions
+0.012
iw+0.010
ve+0.010
 least+0.009
ll+0.009
Neg logit contributions
 aids-0.012
 face-0.012
 jokes-0.011
tr-0.011
 lull-0.011
Token picture
Token ID4286
Feature activation+6.53e-03

Pos logit contributions
+0.044
iw+0.037
ve+0.035
ll+0.035
ner+0.033
Neg logit contributions
 aids-0.032
 face-0.031
 jokes-0.030
 lull-0.030
 repeat-0.028
Token and
Token ID290
Feature activation-1.79e-02

Pos logit contributions
 aids+0.011
 face+0.010
 jokes+0.010
 lull+0.010
 repeat+0.009
Neg logit contributions
-0.014
iw-0.011
ve-0.010
ll-0.010
ner-0.010
Token not
Token ID407
Feature activation+2.87e-02

Pos logit contributions
+0.028
iw+0.023
 least+0.021
field+0.020
ll+0.020
Neg logit contributions
 aids-0.037
 face-0.035
tr-0.034
 lull-0.034
 jokes-0.034
Token something
Token ID1223
Feature activation-1.44e-02

Pos logit contributions
 aids+0.054
 face+0.049
 jokes+0.048
 lull+0.047
iet+0.047
Neg logit contributions
-0.052
iw-0.044
ll-0.040
ve-0.040
 least-0.040
Token to
Token ID284
Feature activation+5.79e-03

Pos logit contributions
+0.026
iw+0.022
ve+0.021
ll+0.020
field+0.019
Neg logit contributions
 aids-0.028
 face-0.026
 jokes-0.025
 lull-0.024
tr-0.024
Token be
Token ID307
Feature activation+1.93e-02

Pos logit contributions
 aids+0.008
tr+0.007
 echoed+0.007
 jokes+0.007
iet+0.007
Neg logit contributions
-0.013
iw-0.011
 least-0.011
field-0.011
ve-0.010
Token.
Token ID13
Feature activation+1.66e-02

Pos logit contributions
 aids+0.046
 jokes+0.042
 face+0.042
 repeat+0.041
 lull+0.041
Neg logit contributions
-0.036
iw-0.028
ve-0.027
ll-0.026
ner-0.024
Token\n
Token ID198
Feature activation+1.41e-02

Pos logit contributions
 aids+0.029
 face+0.027
 jokes+0.026
 echoed+0.025
 lull+0.025
Neg logit contributions
-0.035
iw-0.029
ve-0.028
ll-0.027
field-0.026
Token\n
Token ID198
Feature activation-6.57e-03

Pos logit contributions
 aids+0.023
 face+0.022
 jokes+0.020
 Their+0.019
tr+0.019
Neg logit contributions
-0.031
iw-0.027
ve-0.024
field-0.024
 least-0.024
TokenHis
Token ID6653
Feature activation+2.14e-02

Pos logit contributions
+0.016
iw+0.013
ve+0.013
ll+0.013
ner+0.012
Neg logit contributions
 aids-0.010
 face-0.009
 jokes-0.009
 lull-0.009
 echoed-0.009
Token mom
Token ID1995
Feature activation-1.12e-02

Pos logit contributions
 aids+0.038
 echoed+0.035
 lull+0.033
 jokes+0.033
 repeat+0.032
Neg logit contributions
-0.043
iw-0.035
ll-0.033
ve-0.032
 least-0.031
Token heard
Token ID2982
Feature activation+1.04e-02

Pos logit contributions
+0.025
iw+0.021
ve+0.021
ll+0.020
ner+0.019
Neg logit contributions
 aids-0.017
 jokes-0.016
 face-0.016
 lull-0.015
 echoed-0.015
Token him
Token ID683
Feature activation-2.53e-03

Pos logit contributions
 aids+0.019
 face+0.018
tr+0.018
 lull+0.018
 jokes+0.018
Neg logit contributions
-0.020
iw-0.016
ve-0.015
ll-0.014
ner-0.014
Token and
Token ID290
Feature activation-1.04e-02

Pos logit contributions
+0.005
iw+0.004
ve+0.004
ll+0.004
ner+0.003
Neg logit contributions
 aids-0.005
 jokes-0.004
 face-0.004
 lull-0.004
tr-0.004
Token came
Token ID1625
Feature activation-3.22e-03

Pos logit contributions
+0.022
iw+0.019
ve+0.019
ll+0.018
 searching+0.017
Neg logit contributions
 aids-0.017
 jokes-0.015
 lull-0.015
tr-0.015
 face-0.014
Token to
Token ID284
Feature activation-2.19e-02

Pos logit contributions
+0.007
iw+0.006
field+0.005
ve+0.005
ll+0.005
Neg logit contributions
 face-0.005
 aids-0.005
 jokes-0.005
 repeat-0.005
 echoed-0.004
Token help
Token ID1037
Feature activation-1.62e-02

Pos logit contributions
+0.055
iw+0.046
field+0.045
ll+0.045
ve+0.045
Neg logit contributions
 face-0.029
 aids-0.028
 jokes-0.026
 repeat-0.026
 nod-0.024
Token<|endoftext|>
Token ID50256
Feature activation-4.52e-02

Pos logit contributions
 aids+0.012
 face+0.012
 jokes+0.011
 echoed+0.011
tr+0.011
Neg logit contributions
-0.018
iw-0.015
ve-0.015
ll-0.015
field-0.014
TokenOnce
Token ID7454
Feature activation-1.04e-02

Pos logit contributions
+0.078
iw+0.071
ve+0.064
ll+0.062
 least+0.062
Neg logit contributions
 aids-0.081
 face-0.079
 echoed-0.078
 sounded-0.077
tr-0.075
Token upon
Token ID2402
Feature activation+4.08e-02

Pos logit contributions
+0.022
iw+0.018
ve+0.016
field+0.015
 least+0.015
Neg logit contributions
 face-0.018
 aids-0.018
 echoed-0.017
 jokes-0.016
 repeat-0.016
Token a
Token ID257
Feature activation+2.61e-02

Pos logit contributions
 aids+0.071
 face+0.065
 jokes+0.063
tr+0.061
 echoed+0.061
Neg logit contributions
-0.084
iw-0.068
ve-0.066
ll-0.064
ner-0.061
Token time
Token ID640
Feature activation-2.90e-03

Pos logit contributions
 aids+0.036
 face+0.034
 jokes+0.033
 lull+0.031
 echoed+0.031
Neg logit contributions
-0.063
iw-0.055
ve-0.052
ll-0.051
ner-0.049
Token there
Token ID612
Feature activation+5.76e-03

Pos logit contributions
+0.006
iw+0.005
field+0.005
ve+0.005
 least+0.005
Neg logit contributions
 aids-0.005
 face-0.005
 jokes-0.004
 echoed-0.004
 repeat-0.004
Token lived
Token ID5615
Feature activation-1.09e-02

Pos logit contributions
 aids+0.009
 face+0.008
 jokes+0.008
 lull+0.008
 echoed+0.008
Neg logit contributions
-0.013
iw-0.011
ve-0.010
ll-0.010
field-0.010
Token a
Token ID257
Feature activation-2.78e-03

Pos logit contributions
+0.025
iw+0.021
ve+0.020
ll+0.019
field+0.019
Neg logit contributions
 aids-0.017
 face-0.016
 jokes-0.016
 echoed-0.015
 lull-0.015
Token little
Token ID1310
Feature activation-3.32e-02

Pos logit contributions
+0.007
iw+0.006
ll+0.006
ve+0.006
 least+0.006
Neg logit contributions
 face-0.004
 jokes-0.003
 repeat-0.003
 lull-0.003
 song-0.003
Token girl
Token ID2576
Feature activation+1.81e-02

Pos logit contributions
+0.086
ve+0.071
ll+0.071
iw+0.069
 least+0.067
Neg logit contributions
 face-0.042
 jokes-0.038
 aids-0.036
 song-0.035
 repeat-0.035
Token.
Token ID13
Feature activation-5.88e-03

Pos logit contributions
 aids+0.033
 face+0.030
 jokes+0.030
 echoed+0.029
 lull+0.029
Neg logit contributions
-0.036
iw-0.031
ll-0.028
ve-0.028
field-0.027
Token played
Token ID2826
Feature activation-5.32e-04

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.002
 lull+0.002
 echoed+0.002
Neg logit contributions
-0.003
iw-0.002
ll-0.002
ve-0.002
field-0.002
Token together
Token ID1978
Feature activation+7.77e-04

Pos logit contributions
+0.001
iw+0.001
ve+0.001
ll+0.001
field+0.000
Neg logit contributions
 aids-0.001
 face-0.001
 lull-0.001
 jokes-0.001
 echoed-0.001
Token in
Token ID287
Feature activation-2.22e-03

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 echoed+0.001
 lull+0.001
Neg logit contributions
-0.002
iw-0.001
ve-0.001
ll-0.001
field-0.001
Token the
Token ID262
Feature activation+1.27e-02

Pos logit contributions
+0.005
iw+0.005
ve+0.005
ll+0.004
ner+0.004
Neg logit contributions
 face-0.003
 aids-0.003
 jokes-0.003
 repeat-0.003
 lull-0.003
Token park
Token ID3952
Feature activation-1.59e-02

Pos logit contributions
 aids+0.029
 echoed+0.026
 jokes+0.026
 face+0.026
 repeat+0.025
Neg logit contributions
-0.020
iw-0.015
 least-0.014
field-0.014
ll-0.014
Token.
Token ID13
Feature activation+6.42e-03

Pos logit contributions
+0.032
iw+0.027
ve+0.025
ll+0.024
ner+0.023
Neg logit contributions
 aids-0.028
 face-0.028
 jokes-0.027
 lull-0.026
 echoed-0.026
Token They
Token ID1119
Feature activation+4.74e-03

Pos logit contributions
 aids+0.009
 face+0.009
 jokes+0.009
 lull+0.008
 echoed+0.008
Neg logit contributions
-0.015
iw-0.013
ve-0.012
ll-0.012
field-0.012
Token were
Token ID547
Feature activation-6.39e-03

Pos logit contributions
 aids+0.007
 face+0.007
 jokes+0.006
 lull+0.006
 echoed+0.006
Neg logit contributions
-0.011
iw-0.009
ve-0.009
ll-0.009
field-0.008
Token very
Token ID845
Feature activation-2.42e-02

Pos logit contributions
+0.012
iw+0.010
ve+0.009
ll+0.009
field+0.008
Neg logit contributions
 aids-0.013
 face-0.012
 jokes-0.012
 echoed-0.012
 repeat-0.012
Token happy
Token ID3772
Feature activation-1.31e-02

Pos logit contributions
+0.058
iw+0.045
 least+0.044
field+0.044
ner+0.042
Neg logit contributions
 face-0.038
 jokes-0.035
 aids-0.034
 repeat-0.033
 echoed-0.033
Token.
Token ID13
Feature activation+7.82e-03

Pos logit contributions
+0.034
iw+0.030
ner+0.028
ve+0.028
irst+0.027
Neg logit contributions
 face-0.017
 jokes-0.016
 echoed-0.015
 songs-0.015
 song-0.014
Token showed
Token ID3751
Feature activation+2.57e-02

Pos logit contributions
 aids+0.007
 face+0.006
 jokes+0.006
 lull+0.006
tr+0.006
Neg logit contributions
-0.008
iw-0.007
ll-0.007
ve-0.007
field-0.006
Token it
Token ID340
Feature activation-3.11e-03

Pos logit contributions
tr+0.049
 aids+0.049
 lull+0.046
 echoed+0.045
 face+0.044
Neg logit contributions
-0.042
iw-0.034
ve-0.033
ner-0.033
 least-0.033
Token to
Token ID284
Feature activation+1.03e-02

Pos logit contributions
+0.004
iw+0.003
ll+0.003
 searching+0.003
ve+0.003
Neg logit contributions
 aids-0.007
 lull-0.007
 echoed-0.007
 face-0.007
 repeat-0.007
Token her
Token ID607
Feature activation+1.72e-02

Pos logit contributions
 aids+0.016
tr+0.015
 lull+0.014
 echoed+0.014
 face+0.013
Neg logit contributions
-0.022
 least-0.018
iw-0.017
ner-0.017
ve-0.017
Token brother
Token ID3956
Feature activation+5.47e-03

Pos logit contributions
 face+0.020
 aids+0.019
 jokes+0.018
 repeat+0.016
 lull+0.016
Neg logit contributions
-0.047
iw-0.041
ve-0.039
ll-0.038
field-0.037
Token and
Token ID290
Feature activation+9.49e-03

Pos logit contributions
 aids+0.011
 face+0.011
tr+0.010
 lull+0.010
 echoed+0.010
Neg logit contributions
-0.009
iw-0.007
ll-0.007
 least-0.006
 searching-0.006
Token they
Token ID484
Feature activation-3.86e-03

Pos logit contributions
 aids+0.021
tr+0.020
iet+0.019
 lull+0.019
 face+0.018
Neg logit contributions
-0.014
ll-0.011
iw-0.011
 least-0.011
ner-0.010
Token started
Token ID2067
Feature activation-1.88e-02

Pos logit contributions
+0.008
iw+0.006
ve+0.006
ll+0.006
field+0.006
Neg logit contributions
 aids-0.007
 face-0.006
 jokes-0.006
 lull-0.006
tr-0.006
Token to
Token ID284
Feature activation-1.13e-02

Pos logit contributions
+0.035
iw+0.028
ve+0.026
ll+0.025
field+0.024
Neg logit contributions
 aids-0.037
 face-0.035
 jokes-0.034
 echoed-0.033
 lull-0.033
Token play
Token ID711
Feature activation-1.62e-02

Pos logit contributions
+0.026
iw+0.020
ll+0.019
 least+0.019
ner+0.019
Neg logit contributions
 aids-0.019
tr-0.017
 face-0.016
 echoed-0.016
 lull-0.016
Token.
Token ID13
Feature activation+1.30e-02

Pos logit contributions
+0.032
iw+0.026
ve+0.024
ll+0.024
field+0.023
Neg logit contributions
 aids-0.031
 face-0.029
 jokes-0.028
 lull-0.027
 echoed-0.027
Token that
Token ID326
Feature activation+1.10e-02

Pos logit contributions
+0.026
iw+0.021
 least+0.020
ve+0.019
field+0.019
Neg logit contributions
 aids-0.027
 face-0.025
 echoed-0.025
tr-0.025
 lull-0.024
Token hotel
Token ID7541
Feature activation-1.06e-02

Pos logit contributions
 face+0.018
 aids+0.017
 repeat+0.016
 jokes+0.016
 lull+0.016
Neg logit contributions
-0.026
iw-0.021
ve-0.020
ll-0.019
ner-0.019
Token every
Token ID790
Feature activation-4.10e-03

Pos logit contributions
+0.023
iw+0.020
ve+0.018
ll+0.018
ner+0.017
Neg logit contributions
 aids-0.017
 face-0.017
 jokes-0.016
 echoed-0.016
 repeat-0.015
Token weekend
Token ID5041
Feature activation-3.12e-02

Pos logit contributions
+0.007
iw+0.006
ve+0.005
ll+0.005
field+0.005
Neg logit contributions
 aids-0.009
 face-0.008
 jokes-0.008
 echoed-0.008
 lull-0.008
Token to
Token ID284
Feature activation-1.67e-03

Pos logit contributions
+0.063
iw+0.052
ve+0.049
ll+0.048
field+0.046
Neg logit contributions
 aids-0.056
 face-0.054
 jokes-0.052
 echoed-0.050
 lull-0.050
Token explore
Token ID7301
Feature activation-2.28e-02

Pos logit contributions
+0.004
 least+0.003
iw+0.003
field+0.003
ll+0.003
Neg logit contributions
 aids-0.003
 echoed-0.002
tr-0.002
 lull-0.002
 face-0.002
Token and
Token ID290
Feature activation-7.20e-03

Pos logit contributions
+0.051
iw+0.043
ll+0.040
ve+0.040
field+0.038
Neg logit contributions
 aids-0.036
 face-0.035
 jokes-0.034
 lull-0.032
 echoed-0.032
Token to
Token ID284
Feature activation+5.77e-03

Pos logit contributions
+0.014
iw+0.012
 least+0.011
field+0.011
ll+0.011
Neg logit contributions
 aids-0.012
 echoed-0.011
tr-0.011
iet-0.011
 lull-0.011
Token have
Token ID423
Feature activation-4.54e-03

Pos logit contributions
 aids+0.010
tr+0.010
 echoed+0.010
 lull+0.010
 face+0.009
Neg logit contributions
-0.011
 least-0.009
iw-0.009
field-0.008
 searching-0.008
Token fun
Token ID1257
Feature activation-1.05e-03

Pos logit contributions
+0.008
iw+0.006
ve+0.006
ll+0.006
 least+0.006
Neg logit contributions
 aids-0.009
 face-0.009
tr-0.008
 lull-0.008
 jokes-0.008
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
+0.002
iw+0.002
ve+0.002
ll+0.002
field+0.001
Neg logit contributions
 aids-0.002
 face-0.002
 jokes-0.002
 lull-0.002
 echoed-0.002
Token the
Token ID262
Feature activation-1.42e-02

Pos logit contributions
 aids+0.027
 face+0.026
 jokes+0.026
 echoed+0.025
 lull+0.025
Neg logit contributions
-0.026
iw-0.022
ve-0.020
ll-0.020
field-0.019
Token nation
Token ID3277
Feature activation+1.96e-02

Pos logit contributions
+0.030
iw+0.025
ve+0.024
ll+0.023
field+0.023
Neg logit contributions
 aids-0.024
 face-0.022
 jokes-0.022
 echoed-0.021
 lull-0.021
Token had
Token ID550
Feature activation-1.99e-02

Pos logit contributions
 aids+0.033
 jokes+0.031
 face+0.031
 lull+0.029
tr+0.029
Neg logit contributions
-0.040
iw-0.033
ve-0.033
ll-0.032
field-0.031
Token been
Token ID587
Feature activation-1.93e-03

Pos logit contributions
+0.038
iw+0.031
ve+0.030
ll+0.029
 searching+0.028
Neg logit contributions
 aids-0.035
 face-0.033
tr-0.032
 jokes-0.032
 lull-0.032
Token transformed
Token ID14434
Feature activation-1.47e-02

Pos logit contributions
+0.004
iw+0.003
ll+0.003
ve+0.003
field+0.003
Neg logit contributions
 aids-0.004
 face-0.004
 jokes-0.003
 echoed-0.003
 lull-0.003
Token into
Token ID656
Feature activation-3.69e-03

Pos logit contributions
+0.030
iw+0.025
ve+0.023
ll+0.022
ner+0.022
Neg logit contributions
 aids-0.026
 face-0.026
 jokes-0.024
 echoed-0.024
 lull-0.023
Token something
Token ID1223
Feature activation-7.21e-03

Pos logit contributions
+0.008
iw+0.007
ll+0.007
ve+0.007
ner+0.006
Neg logit contributions
 aids-0.006
 face-0.006
 jokes-0.006
 echoed-0.005
 lull-0.005
Token new
Token ID649
Feature activation+1.62e-02

Pos logit contributions
+0.015
iw+0.012
ve+0.011
ll+0.011
ner+0.010
Neg logit contributions
 aids-0.013
 face-0.013
 jokes-0.012
 echoed-0.012
 lull-0.012
Token and
Token ID290
Feature activation-2.36e-02

Pos logit contributions
 aids+0.030
 face+0.029
 jokes+0.028
 echoed+0.027
 lull+0.027
Neg logit contributions
-0.032
iw-0.026
ve-0.024
ll-0.024
ner-0.023
Token wonderful
Token ID7932
Feature activation+3.08e-03

Pos logit contributions
+0.040
iw+0.034
ll+0.031
ve+0.031
field+0.030
Neg logit contributions
 aids-0.048
 face-0.044
 jokes-0.043
tr-0.042
 echoed-0.042
Token.
Token ID13
Feature activation+4.65e-03

Pos logit contributions
 aids+0.006
 face+0.006
 jokes+0.006
 lull+0.006
 echoed+0.006
Neg logit contributions
-0.006
iw-0.004
ve-0.004
ll-0.004
field-0.004
Token She
Token ID1375
Feature activation-1.66e-03

Pos logit contributions
+0.008
iw+0.006
ve+0.006
ll+0.006
ner+0.006
Neg logit contributions
 aids-0.005
 face-0.005
 jokes-0.005
 lull-0.005
 echoed-0.004
Token watered
Token ID48024
Feature activation+6.21e-04

Pos logit contributions
+0.004
iw+0.003
ve+0.003
ll+0.003
field+0.003
Neg logit contributions
 aids-0.002
 face-0.002
 jokes-0.002
 echoed-0.002
 lull-0.002
Token it
Token ID340
Feature activation-1.94e-03

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 echoed+0.001
 lull+0.001
Neg logit contributions
-0.001
iw-0.001
ve-0.001
ll-0.001
field-0.001
Token every
Token ID790
Feature activation-1.65e-02

Pos logit contributions
+0.003
iw+0.003
ve+0.002
ll+0.002
field+0.002
Neg logit contributions
 aids-0.004
 face-0.004
 jokes-0.004
 lull-0.004
 echoed-0.004
Token day
Token ID1110
Feature activation-2.16e-02

Pos logit contributions
+0.028
iw+0.023
ve+0.021
ll+0.020
field+0.019
Neg logit contributions
 aids-0.035
 face-0.033
 jokes-0.033
 echoed-0.032
 lull-0.032
Token and
Token ID290
Feature activation+1.62e-04

Pos logit contributions
+0.048
iw+0.041
ve+0.039
ll+0.038
 least+0.036
Neg logit contributions
 face-0.034
 aids-0.034
 jokes-0.031
 lull-0.030
 repeat-0.030
Token sang
Token ID25889
Feature activation+3.35e-02

Pos logit contributions
 aids+0.000
tr+0.000
 face+0.000
 echoed+0.000
 jokes+0.000
Neg logit contributions
-0.000
iw-0.000
field-0.000
ll-0.000
ve-0.000
Token to
Token ID284
Feature activation+2.88e-03

Pos logit contributions
 aids+0.055
tr+0.054
 echoed+0.049
 repeat+0.048
 erase+0.046
Neg logit contributions
-0.061
ve-0.055
ner-0.049
 searching-0.049
iw-0.048
Token it
Token ID340
Feature activation-6.71e-03

Pos logit contributions
 aids+0.006
tr+0.005
 echoed+0.005
 face+0.005
 lull+0.005
Neg logit contributions
-0.005
iw-0.004
ve-0.004
 least-0.004
field-0.004
Token.
Token ID13
Feature activation+7.87e-03

Pos logit contributions
+0.010
iw+0.007
ve+0.007
field+0.007
ll+0.007
Neg logit contributions
 aids-0.015
 echoed-0.014
 face-0.013
iet-0.013
tr-0.013
Token She
Token ID1375
Feature activation+5.56e-03

Pos logit contributions
 aids+0.013
 face+0.012
 jokes+0.011
 echoed+0.011
 lull+0.011
Neg logit contributions
-0.018
iw-0.015
ve-0.013
ll-0.013
ner-0.013
Token treasure
Token ID14068
Feature activation+1.57e-04

Pos logit contributions
 aids+0.017
 face+0.015
 jokes+0.015
 echoed+0.015
 repeat+0.014
Neg logit contributions
-0.019
iw-0.015
ve-0.014
ll-0.014
field-0.014
Token.
Token ID13
Feature activation+8.74e-03

Pos logit contributions
 aids+0.000
 face+0.000
 jokes+0.000
 lull+0.000
 echoed+0.000
Neg logit contributions
-0.000
iw-0.000
ve-0.000
ll-0.000
field-0.000
Token␣
Token ID220
Feature activation-3.19e-02

Pos logit contributions
 aids+0.015
 face+0.014
 jokes+0.014
 echoed+0.013
 lull+0.013
Neg logit contributions
-0.018
iw-0.015
ve-0.014
ll-0.014
field-0.013
Token\n
Token ID198
Feature activation+9.28e-04

Pos logit contributions
+0.057
iw+0.045
ve+0.042
ll+0.041
field+0.039
Neg logit contributions
 aids-0.065
 face-0.062
 jokes-0.061
 lull-0.059
 echoed-0.059
Token\n
Token ID198
Feature activation-8.82e-03

Pos logit contributions
 aids+0.002
 face+0.002
 jokes+0.002
 lull+0.002
 echoed+0.002
Neg logit contributions
-0.002
iw-0.002
ve-0.001
ll-0.001
field-0.001
TokenThe
Token ID464
Feature activation-3.09e-03

Pos logit contributions
+0.018
iw+0.015
ve+0.014
ll+0.014
field+0.013
Neg logit contributions
 aids-0.016
 face-0.015
 jokes-0.015
 echoed-0.014
tr-0.014
Token sun
Token ID4252
Feature activation-4.60e-03

Pos logit contributions
+0.007
iw+0.006
ve+0.005
ll+0.005
field+0.005
Neg logit contributions
 aids-0.005
 face-0.005
 jokes-0.005
 lull-0.004
 echoed-0.004
Token slowly
Token ID6364
Feature activation-4.13e-03

Pos logit contributions
+0.009
field+0.007
ll+0.007
ve+0.007
iw+0.007
Neg logit contributions
 aids-0.009
 jokes-0.008
 face-0.008
iet-0.007
 Watch-0.007
Token sank
Token ID30895
Feature activation-4.36e-03

Pos logit contributions
+0.009
iw+0.007
ve+0.007
field+0.007
ll+0.007
Neg logit contributions
 aids-0.007
 jokes-0.007
 face-0.006
tr-0.006
iet-0.006
Token below
Token ID2174
Feature activation-4.71e-03

Pos logit contributions
+0.009
iw+0.007
ve+0.007
ll+0.007
field+0.006
Neg logit contributions
 aids-0.008
 face-0.007
 jokes-0.007
 lull-0.007
 echoed-0.007
Token the
Token ID262
Feature activation-1.98e-02

Pos logit contributions
+0.009
iw+0.007
ve+0.007
ll+0.006
field+0.006
Neg logit contributions
 aids-0.009
 face-0.008
 jokes-0.008
 lull-0.008
tr-0.008
Token fallen
Token ID9292
Feature activation-2.04e-03

Pos logit contributions
 aids+0.000
 jokes+0.000
 face+0.000
 lull+0.000
iet+0.000
Neg logit contributions
-0.000
iw-0.000
ll-0.000
 searching-0.000
 least-0.000
Token before
Token ID878
Feature activation-1.30e-02

Pos logit contributions
+0.005
iw+0.004
ve+0.004
ll+0.004
field+0.004
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 echoed-0.003
tr-0.003
Token.
Token ID13
Feature activation+1.32e-02

Pos logit contributions
iw+0.035
+0.034
irst+0.031
ll+0.031
ve+0.030
Neg logit contributions
 face-0.014
 echoed-0.013
 aids-0.012
 jokes-0.012
 sounded-0.010
Token It
Token ID632
Feature activation+1.00e-02

Pos logit contributions
 aids+0.022
 face+0.022
 jokes+0.020
tr+0.019
 echoed+0.019
Neg logit contributions
-0.028
iw-0.024
ve-0.023
ll-0.022
 least-0.021
Token was
Token ID373
Feature activation+8.25e-03

Pos logit contributions
 aids+0.018
 jokes+0.017
 face+0.017
 repeat+0.016
 lull+0.015
Neg logit contributions
-0.019
ve-0.016
iw-0.016
ll-0.015
 least-0.015
Token a
Token ID257
Feature activation-3.51e-03

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.015
 repeat+0.014
 echoed+0.014
Neg logit contributions
-0.015
iw-0.012
ve-0.011
ll-0.011
 least-0.011
Token long
Token ID890
Feature activation+2.88e-02

Pos logit contributions
+0.008
iw+0.006
ve+0.006
ll+0.006
ner+0.006
Neg logit contributions
 face-0.006
 aids-0.006
 jokes-0.005
 lull-0.005
tr-0.005
Token way
Token ID835
Feature activation-4.12e-03

Pos logit contributions
 aids+0.057
 jokes+0.053
 echoed+0.051
 face+0.050
 lull+0.049
Neg logit contributions
-0.053
iw-0.041
ve-0.039
field-0.038
 searching-0.038
Token down
Token ID866
Feature activation-1.92e-02

Pos logit contributions
+0.008
iw+0.007
ve+0.006
ll+0.006
field+0.006
Neg logit contributions
 aids-0.008
 face-0.007
 jokes-0.007
 lull-0.007
 echoed-0.007
Token,
Token ID11
Feature activation+5.22e-03

Pos logit contributions
+0.053
ve+0.048
ll+0.046
iw+0.045
otte+0.045
Neg logit contributions
 echoed-0.016
 sounded-0.016
 face-0.016
 go-0.014
 aids-0.014
Token and
Token ID290
Feature activation+3.55e-03

Pos logit contributions
 aids+0.009
 face+0.009
 jokes+0.008
 echoed+0.008
 repeat+0.008
Neg logit contributions
-0.011
iw-0.009
ve-0.009
ll-0.008
ner-0.008
Token when
Token ID618
Feature activation-2.73e-02

Pos logit contributions
+0.037
iw+0.031
ve+0.029
ll+0.028
field+0.027
Neg logit contributions
 aids-0.031
 face-0.030
 jokes-0.029
 lull-0.028
 echoed-0.028
Token she
Token ID673
Feature activation-2.80e-02

Pos logit contributions
+0.052
iw+0.043
ve+0.040
ll+0.039
field+0.037
Neg logit contributions
 aids-0.052
 face-0.050
 jokes-0.049
 echoed-0.047
 lull-0.047
Token first
Token ID717
Feature activation-1.29e-02

Pos logit contributions
+0.064
iw+0.054
ve+0.051
ll+0.050
field+0.048
Neg logit contributions
 aids-0.043
 face-0.041
 jokes-0.039
 echoed-0.038
 lull-0.038
Token arrived
Token ID5284
Feature activation-3.88e-02

Pos logit contributions
+0.027
iw+0.022
ve+0.021
ll+0.021
field+0.020
Neg logit contributions
 aids-0.022
tr-0.020
 jokes-0.020
 face-0.019
 lull-0.019
Token.
Token ID13
Feature activation+5.21e-03

Pos logit contributions
+0.081
iw+0.068
ve+0.063
ll+0.063
field+0.060
Neg logit contributions
 aids-0.066
 face-0.064
 jokes-0.061
 lull-0.059
 echoed-0.059
Token<|endoftext|>
Token ID50256
Feature activation-4.65e-02

Pos logit contributions
 aids+0.010
 face+0.009
 jokes+0.009
 echoed+0.009
 lull+0.009
Neg logit contributions
-0.010
iw-0.008
ve-0.008
ll-0.008
field-0.007
TokenOnce
Token ID7454
Feature activation-4.38e-03

Pos logit contributions
+0.074
iw+0.065
ve+0.057
ll+0.057
 least+0.056
Neg logit contributions
 aids-0.093
 face-0.090
 echoed-0.089
 sounded-0.087
 jokes-0.086
Token there
Token ID612
Feature activation+1.48e-02

Pos logit contributions
+0.009
iw+0.007
ve+0.007
field+0.006
ner+0.006
Neg logit contributions
 face-0.008
 aids-0.007
 echoed-0.007
 jokes-0.007
 repeat-0.007
Token was
Token ID373
Feature activation+1.01e-02

Pos logit contributions
 aids+0.019
 face+0.018
 jokes+0.017
 echoed+0.017
 repeat+0.016
Neg logit contributions
-0.037
iw-0.032
ve-0.031
ll-0.030
field-0.029
Token a
Token ID257
Feature activation+5.70e-03

Pos logit contributions
 aids+0.016
 face+0.015
 jokes+0.015
 echoed+0.014
 lull+0.014
Neg logit contributions
-0.022
iw-0.019
ve-0.018
ll-0.017
 least-0.017
Token little
Token ID1310
Feature activation-2.39e-02

Pos logit contributions
 face+0.007
 lull+0.006
 repeat+0.006
 aids+0.006
 jokes+0.006
Neg logit contributions
-0.014
iw-0.012
ll-0.012
ve-0.012
 least-0.012
Token magnet
Token ID19972
Feature activation-1.64e-03

Pos logit contributions
 aids+0.039
 echoed+0.035
 face+0.035
 jokes+0.035
tr+0.034
Neg logit contributions
-0.050
iw-0.041
ve-0.039
ll-0.038
field-0.037
Token,
Token ID11
Feature activation+9.46e-03

Pos logit contributions
+0.003
iw+0.002
ve+0.002
ll+0.002
field+0.002
Neg logit contributions
 aids-0.003
 face-0.003
 jokes-0.003
 lull-0.003
 echoed-0.003
Token feeling
Token ID4203
Feature activation+3.27e-04

Pos logit contributions
 aids+0.018
 face+0.017
 jokes+0.016
 lull+0.016
 echoed+0.016
Neg logit contributions
-0.018
iw-0.015
ve-0.014
ll-0.014
field-0.013
Token lonely
Token ID21757
Feature activation-3.40e-03

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 lull+0.001
 echoed+0.001
Neg logit contributions
-0.001
iw-0.001
ll-0.000
ve-0.000
field-0.000
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
+0.008
iw+0.007
ve+0.007
ll+0.006
ner+0.006
Neg logit contributions
 face-0.005
 aids-0.005
 jokes-0.005
 echoed-0.004
 lull-0.004
Token<|endoftext|>
Token ID50256
Feature activation-4.38e-02

Pos logit contributions
 aids+0.017
 face+0.016
 echoed+0.015
 Their+0.015
 jokes+0.014
Neg logit contributions
-0.026
iw-0.023
ve-0.023
ll-0.022
ized-0.021
TokenOnce
Token ID7454
Feature activation-6.03e-03

Pos logit contributions
+0.079
iw+0.073
ve+0.065
 least+0.065
ll+0.064
Neg logit contributions
 aids-0.075
 face-0.071
 echoed-0.071
 sounded-0.070
tr-0.068
Token upon
Token ID2402
Feature activation+4.47e-02

Pos logit contributions
+0.013
iw+0.011
ve+0.009
field+0.009
 least+0.009
Neg logit contributions
 face-0.010
 aids-0.010
 echoed-0.009
 jokes-0.009
ia-0.009
Token a
Token ID257
Feature activation+3.00e-02

Pos logit contributions
 aids+0.079
 face+0.073
 jokes+0.071
tr+0.069
 echoed+0.069
Neg logit contributions
-0.091
iw-0.073
ve-0.070
ll-0.067
ner-0.065
Token time
Token ID640
Feature activation+5.18e-04

Pos logit contributions
 aids+0.040
 face+0.038
 jokes+0.036
 lull+0.035
 echoed+0.034
Neg logit contributions
-0.073
iw-0.064
ve-0.060
ll-0.059
ner-0.058
Token,
Token ID11
Feature activation-6.74e-03

Pos logit contributions
 aids+0.001
 face+0.001
 jokes+0.001
 echoed+0.001
 repeat+0.001
Neg logit contributions
-0.001
iw-0.001
field-0.001
ve-0.001
 least-0.001
Token fun
Token ID1257
Feature activation+2.29e-03

Pos logit contributions
+0.013
iw+0.011
ve+0.010
ll+0.010
field+0.009
Neg logit contributions
 aids-0.013
 face-0.012
 jokes-0.012
 lull-0.011
 echoed-0.011
Token to
Token ID284
Feature activation-5.80e-05

Pos logit contributions
 aids+0.005
 face+0.004
 jokes+0.004
 lull+0.004
 echoed+0.004
Neg logit contributions
-0.004
iw-0.003
ve-0.003
ll-0.003
field-0.003
Token play
Token ID711
Feature activation-1.58e-02

Pos logit contributions
+0.000
iw+0.000
ve+0.000
 least+0.000
ner+0.000
Neg logit contributions
 aids-0.000
tr-0.000
 jokes-0.000
 echoed-0.000
 lull-0.000
Token with
Token ID351
Feature activation-6.93e-03

Pos logit contributions
+0.032
iw+0.026
ve+0.024
ll+0.024
field+0.023
Neg logit contributions
 aids-0.029
 face-0.027
 jokes-0.026
 echoed-0.026
 lull-0.026
Token.
Token ID13
Feature activation+1.08e-02

Pos logit contributions
+0.016
iw+0.014
ll+0.013
ve+0.013
field+0.013
Neg logit contributions
 aids-0.010
 face-0.010
 jokes-0.010
 lull-0.009
 echoed-0.009
Token<|endoftext|>
Token ID50256
Feature activation-4.63e-02

Pos logit contributions
 aids+0.018
 face+0.017
 jokes+0.016
 echoed+0.016
 lull+0.016
Neg logit contributions
-0.023
iw-0.019
ve-0.019
ll-0.018
field-0.017
TokenOnce
Token ID7454
Feature activation-1.22e-02

Pos logit contributions
+0.079
iw+0.071
ve+0.065
ll+0.063
 least+0.063
Neg logit contributions
 aids-0.085
 face-0.081
 echoed-0.079
 sounded-0.079
 jokes-0.078
Token there
Token ID612
Feature activation+9.24e-03

Pos logit contributions
+0.026
iw+0.021
ve+0.019
field+0.019
 least+0.018
Neg logit contributions
 face-0.021
 aids-0.020
 jokes-0.019
 echoed-0.019
ia-0.018
Token was
Token ID373
Feature activation+7.63e-03

Pos logit contributions
 face+0.010
 aids+0.010
 jokes+0.009
 echoed+0.009
 repeat+0.009
Neg logit contributions
-0.024
iw-0.021
ve-0.020
ll-0.019
ner-0.019
Token a
Token ID257
Feature activation+6.85e-03

Pos logit contributions
 aids+0.012
 face+0.011
 jokes+0.011
 lull+0.010
 echoed+0.010
Neg logit contributions
-0.017
iw-0.015
ve-0.014
ll-0.014
field-0.013
Token little
Token ID1310
Feature activation-2.49e-02

Pos logit contributions
 face+0.009
 aids+0.008
 jokes+0.008
 lull+0.008
 repeat+0.008
Neg logit contributions
-0.017
iw-0.014
ll-0.014
ve-0.014
 least-0.014
Token wear
Token ID5806
Feature activation-1.86e-02

Pos logit contributions
 aids+0.017
tr+0.015
 echoed+0.015
 face+0.014
 jokes+0.014
Neg logit contributions
-0.026
iw-0.020
 least-0.020
ve-0.020
ll-0.020
Token for
Token ID329
Feature activation+2.11e-02

Pos logit contributions
+0.037
iw+0.030
ve+0.028
ll+0.028
 least+0.027
Neg logit contributions
 aids-0.034
 face-0.032
 jokes-0.031
 echoed-0.030
 lull-0.030
Token too
Token ID1165
Feature activation+1.25e-02

Pos logit contributions
 aids+0.031
 jokes+0.031
 face+0.031
 lull+0.027
 repeat+0.027
Neg logit contributions
-0.050
iw-0.044
ll-0.042
ve-0.040
field-0.040
Token long
Token ID890
Feature activation+1.23e-02

Pos logit contributions
 aids+0.019
 face+0.019
 jokes+0.018
 echoed+0.018
 lull+0.017
Neg logit contributions
-0.028
iw-0.024
ll-0.022
field-0.021
ve-0.021
Token.
Token ID13
Feature activation+1.54e-02

Pos logit contributions
 aids+0.022
 face+0.021
 jokes+0.021
 lull+0.020
 echoed+0.020
Neg logit contributions
-0.025
iw-0.021
ve-0.019
ll-0.019
field-0.018
Token<|endoftext|>
Token ID50256
Feature activation-4.42e-02

Pos logit contributions
 aids+0.029
 face+0.027
 jokes+0.026
 lull+0.025
tr+0.025
Neg logit contributions
-0.030
iw-0.025
ve-0.024
ll-0.023
field-0.022
TokenOnce
Token ID7454
Feature activation-1.30e-02

Pos logit contributions
+0.072
iw+0.063
ve+0.057
ll+0.056
 least+0.055
Neg logit contributions
 aids-0.087
 face-0.083
 echoed-0.082
tr-0.080
 sounded-0.080
Token upon
Token ID2402
Feature activation+4.25e-02

Pos logit contributions
+0.026
iw+0.021
ve+0.019
field+0.019
 least+0.018
Neg logit contributions
 face-0.023
 aids-0.023
 echoed-0.021
 jokes-0.021
 repeat-0.020
Token a
Token ID257
Feature activation+2.69e-02

Pos logit contributions
 aids+0.074
 face+0.068
 jokes+0.066
tr+0.065
 echoed+0.065
Neg logit contributions
-0.087
iw-0.071
ve-0.069
ll-0.066
ner-0.064
Token time
Token ID640
Feature activation-9.54e-04

Pos logit contributions
 aids+0.038
 face+0.036
 jokes+0.035
 lull+0.033
 echoed+0.033
Neg logit contributions
-0.064
iw-0.055
ve-0.052
ll-0.051
field-0.050
Token,
Token ID11
Feature activation-1.12e-02

Pos logit contributions
+0.002
iw+0.002
ve+0.002
field+0.002
ll+0.001
Neg logit contributions
 aids-0.002
 face-0.002
 jokes-0.002
 echoed-0.001
 repeat-0.001
Token house
Token ID2156
Feature activation-4.09e-03

Pos logit contributions
+0.018
iw+0.015
ve+0.014
ll+0.014
field+0.013
Neg logit contributions
 aids-0.013
 face-0.013
 jokes-0.012
 lull-0.012
 echoed-0.012
Token nice
Token ID3621
Feature activation-4.41e-03

Pos logit contributions
+0.008
iw+0.007
ve+0.006
ll+0.006
field+0.006
Neg logit contributions
 aids-0.007
 face-0.007
 jokes-0.007
 echoed-0.006
 lull-0.006
Token and
Token ID290
Feature activation-1.45e-02

Pos logit contributions
+0.009
iw+0.007
ve+0.007
ll+0.007
field+0.006
Neg logit contributions
 aids-0.008
 face-0.008
 jokes-0.007
 lull-0.007
 echoed-0.007
Token bright
Token ID6016
Feature activation+8.17e-03

Pos logit contributions
+0.027
iw+0.023
ll+0.021
ve+0.021
field+0.021
Neg logit contributions
 aids-0.028
 face-0.025
 jokes-0.025
tr-0.024
 echoed-0.024
Token.
Token ID13
Feature activation+1.42e-02

Pos logit contributions
 aids+0.015
 face+0.014
 jokes+0.014
 echoed+0.014
 lull+0.013
Neg logit contributions
-0.016
iw-0.013
ve-0.012
ll-0.012
field-0.011
Token<|endoftext|>
Token ID50256
Feature activation-4.49e-02

Pos logit contributions
 aids+0.022
 face+0.022
 jokes+0.020
 echoed+0.020
 lull+0.019
Neg logit contributions
-0.030
ve-0.026
iw-0.026
ll-0.025
field-0.024
TokenOnce
Token ID7454
Feature activation-9.62e-03

Pos logit contributions
+0.075
iw+0.068
ve+0.061
ll+0.060
 least+0.059
Neg logit contributions
 aids-0.083
 face-0.079
 echoed-0.079
 sounded-0.078
tr-0.076
Token upon
Token ID2402
Feature activation+4.38e-02

Pos logit contributions
+0.020
iw+0.017
ve+0.015
field+0.014
ner+0.014
Neg logit contributions
 face-0.017
 aids-0.016
 echoed-0.015
 jokes-0.015
 repeat-0.014
Token a
Token ID257
Feature activation+2.71e-02

Pos logit contributions
 aids+0.077
 face+0.071
 jokes+0.068
tr+0.067
 echoed+0.067
Neg logit contributions
-0.089
iw-0.072
ve-0.070
ll-0.067
ner-0.065
Token time
Token ID640
Feature activation-9.71e-04

Pos logit contributions
 aids+0.038
 face+0.036
 jokes+0.034
 lull+0.033
 echoed+0.033
Neg logit contributions
-0.065
iw-0.056
ve-0.053
ll-0.052
field-0.050
Token there
Token ID612
Feature activation+7.18e-03

Pos logit contributions
+0.002
iw+0.002
field+0.002
ve+0.002
ll+0.002
Neg logit contributions
 aids-0.002
 face-0.002
 jokes-0.001
 echoed-0.001
 repeat-0.001