TOP ACTIVATIONS
MAX = 8.158

Mary asked Tom , " what 's in it ?" and
what is the object you are
closer and asked , " what do you do ?" The
asked his mom , " what is the nurse ?" She
She asked her mom " what 's this ?".
thought to himself , " what can I do to feel
sky and wondered , " what is that black thing in
owner was saying : " What a thoughtful cat it is
hero asked the bear " what can I do to help
ger and asked , " What are you doing ?"
He asked it , " what 's wrong ?"
. He said , " Who did this ? Who cut
back . She says " what are you doing , my
. Jack said , " what do you think that noise
," she said , " what are you doing ? You
. It replied , " what can you do ?". The
's backpack and asks " what do you have in there
not happy and says " what is going on here ?
closely . He asked " What do you see ?"
. He shouted , " Who are you ? What are

MIN ACTIVATIONS
MIN = -10.694

the rain and take a com fy nap . The tiger
soon nothing but a small spe ck in the sky .
and curled up in a com fy spot to take a
She lay down in a com fy position and soon fell
c uddled up in a com fy blanket and soon he
the spider had loved his com fy cob web home .
away , a bright blue spe ck in the sky .
She had a com fy bed and lots of
for food and made a com fy home in the woods
they had not taken the com bs . They learned a
to make it even more com fy !" They both gigg
almost fell asleep in his com fy chair . When he
, the bird had a com fy home , and the
Soon , she had a com fy home of her own
a modern cave with a com fy bed . One day
away . They had no com bs and no fun .
wanted to give her a com fy mattress . When the
to carry supplies and a com fy seat for him to
but only in his big com fy chair upstairs . He
around and saw a big com fy couch . The little

INTERVAL 3.445 - 8.158
CONTAINS 4.971%

Why are you fighting over the red past el ?"
can make a noise . What do you think ?" <|endoftext|>
there was a little girl named Lily . Lily loved to
curious . He sees the prints and wants to touch them
make it wrong . She made it how she liked it

INTERVAL -1.268 - 3.445
CONTAINS 67.846%

Sam are friends . They like to play with cars and
They learned a hard lesson . They should have listened to
his adventure . <|endoftext|> Once there was a big , fat
smiled and said , " You can eat as much as
hard and fast . Soon , the garage is clean and

INTERVAL -5.981 - -1.268
CONTAINS 26.366%

they were being polite to each other . She watched them
and the k idd o soon began to do much better
for a moment and looked up at the sky . He
wet , but it was okay . She was so happy
" We are going to the beach ," said her

INTERVAL -10.694 - -5.981
CONTAINS 0.817%

The little boy worked really hard and soon the toilet was
everyone was happy and having fun . Sally saw her friend
lesson and never felt guilty again . <|endoftext|> One day ,
After that , they continued on their adventure and found the
that giving peace was even better . She had made the
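
The four INTERVAL ranges above appear to be equal-width bins spanning MIN to MAX. The sketch below checks that arithmetic. It assumes four equal-width bins, which the listed boundaries are consistent with but the dump does not state, and the helper name `interval_edges` is hypothetical, not part of whatever tool produced this output.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi (3 decimals).

    Assumption: the dashboard splits [MIN, MAX] into equal-width bins.
    """
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX as reported in the dump.
edges = interval_edges(-10.694, 8.158)
print(edges)

# CONTAINS percentages, ordered from the lowest interval to the highest.
shares = [0.817, 26.366, 67.846, 4.971]
print(round(sum(shares), 3))
```

Under that assumption each bin is (8.158 - (-10.694)) / 4 = 4.713 wide, and the CONTAINS percentages should sum to roughly 100, so the four intervals cover the full activation range.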

Token Mary
Token ID5335
Feature activation-1.68e+00

Pos logit contributions
Did+0.015
 lying+0.014
pped+0.014
cher+0.014
boards+0.014
Neg logit contributions
 breath-0.013
 common-0.012
 purpose-0.011
 job-0.011
 accomplishment-0.011
Token asked
Token ID1965
Feature activation+1.50e+00

Pos logit contributions
 breath+0.043
 common+0.041
 purpose+0.039
 job+0.038
 accomplishment+0.037
Neg logit contributions
Did-0.032
cher-0.032
pped-0.031
 lying-0.031
ter-0.030
Token Tom
Token ID4186
Feature activation+1.03e+00

Pos logit contributions
Did+0.046
ode+0.046
cher+0.046
auc+0.044
pped+0.044
Neg logit contributions
 breath-0.022
 purpose-0.018
 common-0.018
 home-0.018
 work-0.017
Token,
Token ID11
Feature activation+8.65e-01

Pos logit contributions
Did+0.029
pped+0.029
cher+0.027
ode+0.027
ter+0.027
Neg logit contributions
 breath-0.016
 common-0.015
 own-0.015
 work-0.015
 home-0.015
Token "
Token ID366
Feature activation+5.30e+00

Pos logit contributions
ode+0.022
pped+0.021
cher+0.021
boards+0.021
 register+0.020
Neg logit contributions
 breath-0.017
 common-0.015
 long-0.015
 purpose-0.014
 accomplishment-0.014
Tokenwhat
Token ID10919
Feature activation+8.14e+00

Pos logit contributions
auc+0.109
icked+0.108
cher+0.107
onel+0.105
washer+0.103
Neg logit contributions
-0.107
 purpose-0.101
 breath-0.100
 common-0.100
After-0.097
Token's
Token ID338
Feature activation+2.70e+00

Pos logit contributions
ting+0.218
pped+0.215
ter+0.205
bled+0.201
cher+0.200
Neg logit contributions
 common-0.167
 own-0.158
 job-0.157
 long-0.155
 purpose-0.151
Token in
Token ID287
Feature activation-1.93e+00

Pos logit contributions
Did+0.063
cher+0.062
pped+0.061
boards+0.061
ode+0.059
Neg logit contributions
 common-0.061
 breath-0.060
 job-0.058
 purpose-0.057
 long-0.054
Token it
Token ID340
Feature activation-1.32e-01

Pos logit contributions
 breath+0.038
 common+0.036
 purpose+0.034
 job+0.032
 accomplishment+0.030
Neg logit contributions
Did-0.049
cher-0.049
pped-0.048
boards-0.047
ting-0.047
Token?"
Token ID1701
Feature activation-2.28e+00

Pos logit contributions
 breath+0.002
 common+0.002
 purpose+0.002
 job+0.002
 own+0.002
Neg logit contributions
Did-0.004
pped-0.004
cher-0.004
ting-0.004
ode-0.004
Token and
Token ID290
Feature activation-9.94e-01

Pos logit contributions
 breath+0.043
 common+0.040
 purpose+0.037
 job+0.036
 accomplishment+0.035
Neg logit contributions
Did-0.060
 lying-0.058
cher-0.058
pped-0.058
ting-0.057
Token
Token ID6184
Feature activation-1.41e+00

Pos logit contributions
pped+0.017
cher+0.016
ode+0.016
 register+0.016
boards+0.015
Neg logit contributions
 long-0.010
 breath-0.010
 work-0.009
 common-0.009
 own-0.009
Token
Token ID95
Feature activation+9.23e-01

Pos logit contributions
 breath+0.034
 common+0.032
 purpose+0.031
 job+0.029
 accomplishment+0.029
Neg logit contributions
Did-0.029
cher-0.029
pped-0.028
 lying-0.028
ter-0.027
Token
Token ID26391
Feature activation-4.08e+00

Pos logit contributions
Did+0.020
cher+0.019
pped+0.019
 lying+0.019
ter+0.019
Neg logit contributions
 common-0.021
 breath-0.021
 purpose-0.020
 job-0.019
 accomplishment-0.018
Token
Token ID129
Feature activation-6.01e-01

Pos logit contributions
 breath+0.095
 common+0.087
 purpose+0.083
 accomplishment+0.082
 job+0.079
Neg logit contributions
Did-0.091
cher-0.087
pped-0.086
 lying-0.085
boards-0.083
Token
Token ID241
Feature activation+5.31e+00

Pos logit contributions
 breath+0.015
 common+0.013
 purpose+0.013
 tant+0.012
+0.012
Neg logit contributions
 lying-0.012
pped-0.012
ting-0.012
pping-0.011
ter-0.011
Tokenwhat
Token ID10919
Feature activation+8.02e+00

Pos logit contributions
Did+0.097
cher+0.095
pped+0.094
 lying+0.093
ter+0.091
Neg logit contributions
 breath-0.140
 common-0.134
 purpose-0.128
 job-0.123
 accomplishment-0.121
Token is
Token ID318
Feature activation+3.64e-01

Pos logit contributions
ting+0.210
ter+0.202
pped+0.201
ode+0.200
 register+0.199
Neg logit contributions
 common-0.172
 long-0.154
 job-0.153
 own-0.153
 idea-0.149
Token the
Token ID262
Feature activation+7.57e-01

Pos logit contributions
Did+0.009
cher+0.009
pped+0.009
 register+0.008
ode+0.008
Neg logit contributions
 breath-0.007
 common-0.007
 purpose-0.007
 job-0.007
 long-0.006
Token object
Token ID2134
Feature activation+3.54e+00

Pos logit contributions
Did+0.019
pped+0.019
cher+0.019
ting+0.018
 lying+0.018
Neg logit contributions
 breath-0.015
 common-0.015
 purpose-0.014
 job-0.013
 own-0.012
Token you
Token ID345
Feature activation-6.08e-01

Pos logit contributions
Did+0.095
pped+0.094
ode+0.089
cher+0.089
 register+0.089
Neg logit contributions
 breath-0.063
 common-0.060
 job-0.059
 purpose-0.056
 own-0.056
Token are
Token ID389
Feature activation+1.44e+00

Pos logit contributions
 breath+0.012
 common+0.012
 purpose+0.011
 job+0.011
 work+0.011
Neg logit contributions
Did-0.015
cher-0.014
 register-0.014
pped-0.014
ting-0.014
Token closer
Token ID5699
Feature activation-4.58e+00

Pos logit contributions
Did+0.043
cher+0.042
ting+0.042
 lying+0.042
pped+0.042
Neg logit contributions
 breath-0.057
 common-0.056
 purpose-0.055
 job-0.051
 accomplishment-0.051
Token and
Token ID290
Feature activation-5.52e-01

Pos logit contributions
 common+0.096
 breath+0.092
 purpose+0.090
 job+0.085
 accomplishment+0.082
Neg logit contributions
 lying-0.107
ting-0.104
ter-0.102
Did-0.100
boards-0.098
Token asked
Token ID1965
Feature activation+1.25e+00

Pos logit contributions
 breath+0.011
 common+0.011
 purpose+0.010
 job+0.009
 work+0.009
Neg logit contributions
pped-0.013
Did-0.013
cher-0.013
ode-0.013
 lying-0.013
Token,
Token ID11
Feature activation+7.19e-01

Pos logit contributions
pped+0.037
Did+0.037
ode+0.036
cher+0.036
 lying+0.034
Neg logit contributions
 breath-0.020
 common-0.018
 purpose-0.017
 job-0.016
 long-0.015
Token "
Token ID366
Feature activation+5.32e+00

Pos logit contributions
pped+0.017
ode+0.017
cher+0.017
Did+0.016
boards+0.016
Neg logit contributions
 breath-0.014
 common-0.013
 long-0.013
 purpose-0.012
 job-0.012
Tokenwhat
Token ID10919
Feature activation+8.01e+00

Pos logit contributions
icked+0.104
oll+0.103
 lying+0.102
auc+0.102
pled+0.101
Neg logit contributions
-0.113
 breath-0.112
 common-0.107
Ar-0.104
away-0.099
Token do
Token ID466
Feature activation+2.90e-01

Pos logit contributions
pped+0.196
ting+0.195
 register+0.184
ter+0.184
cher+0.184
Neg logit contributions
 common-0.179
 job-0.164
 own-0.162
 purpose-0.159
 long-0.157
Token you
Token ID345
Feature activation-1.39e+00

Pos logit contributions
Did+0.006
pped+0.006
cher+0.006
 lying+0.006
ting+0.006
Neg logit contributions
 breath-0.006
 common-0.006
 purpose-0.006
 job-0.006
 accomplishment-0.005
Token do
Token ID466
Feature activation-1.81e-01

Pos logit contributions
 breath+0.024
 common+0.023
 purpose+0.021
 job+0.021
 work+0.020
Neg logit contributions
Did-0.039
cher-0.038
pped-0.038
 register-0.037
ting-0.037
Token?"
Token ID1701
Feature activation-2.74e+00

Pos logit contributions
 breath+0.003
 common+0.003
 purpose+0.003
 job+0.003
 accomplishment+0.003
Neg logit contributions
Did-0.005
pped-0.005
cher-0.005
ting-0.004
 lying-0.004
Token The
Token ID383
Feature activation+1.93e+00

Pos logit contributions
 breath+0.048
 common+0.045
 purpose+0.042
 job+0.040
 accomplishment+0.039
Neg logit contributions
Did-0.075
cher-0.074
 lying-0.073
pped-0.073
ting-0.072
Token asked
Token ID1965
Feature activation+1.32e+00

Pos logit contributions
 breath+0.021
 common+0.020
 purpose+0.019
 job+0.018
 accomplishment+0.018
Neg logit contributions
Did-0.026
cher-0.026
pped-0.026
 lying-0.025
ter-0.025
Token his
Token ID465
Feature activation+1.32e+00

Pos logit contributions
Did+0.035
pped+0.035
cher+0.035
ode+0.034
ting+0.034
Neg logit contributions
 breath-0.024
 common-0.022
 purpose-0.021
 job-0.020
 work-0.019
Token mom
Token ID1995
Feature activation+1.45e+00

Pos logit contributions
 lying+0.035
cher+0.035
 register+0.034
Did+0.034
ter+0.033
Neg logit contributions
 breath-0.023
 common-0.022
 purpose-0.020
 accomplishment-0.018
 job-0.017
Token,
Token ID11
Feature activation+9.29e-01

Pos logit contributions
pped+0.041
Did+0.039
cher+0.038
coon+0.036
bridge+0.036
Neg logit contributions
 home-0.025
 work-0.024
 own-0.024
,"-0.024
 common-0.023
Token "
Token ID366
Feature activation+4.87e+00

Pos logit contributions
pped+0.027
ode+0.026
boards+0.025
cher+0.024
bridge+0.024
Neg logit contributions
 breath-0.014
 long-0.014
 common-0.013
,"-0.013
 home-0.013
Tokenwhat
Token ID10919
Feature activation+7.98e+00

Pos logit contributions
cher+0.081
 lying+0.081
auc+0.078
icked+0.076
pled+0.076
Neg logit contributions
 breath-0.124
 common-0.116
-0.115
 purpose-0.112
away-0.108
Token is
Token ID318
Feature activation+4.96e-01

Pos logit contributions
ting+0.219
pped+0.216
ter+0.215
bled+0.209
cher+0.206
Neg logit contributions
 common-0.157
 job-0.142
 own-0.140
 long-0.140
 purpose-0.137
Token the
Token ID262
Feature activation+1.67e+00

Pos logit contributions
Did+0.010
cher+0.010
pped+0.010
ode+0.010
 register+0.010
Neg logit contributions
 breath-0.012
 common-0.011
 purpose-0.011
 job-0.011
 long-0.010
Token nurse
Token ID15849
Feature activation+7.71e-01

Pos logit contributions
Did+0.039
cher+0.038
pped+0.038
 lying+0.038
ter+0.037
Neg logit contributions
 breath-0.035
 common-0.034
 purpose-0.032
 job-0.030
 accomplishment-0.030
Token?"
Token ID1701
Feature activation-2.72e+00

Pos logit contributions
Did+0.020
pped+0.020
ode+0.019
boards+0.019
cher+0.019
Neg logit contributions
 breath-0.013
 job-0.013
 common-0.013
 own-0.012
 work-0.012
Token She
Token ID1375
Feature activation+4.99e-02

Pos logit contributions
 breath+0.066
 common+0.064
powers+0.057
 accomplishment+0.057
 purpose+0.055
Neg logit contributions
Did-0.058
 lying-0.056
 Why-0.051
ter-0.051
My-0.050
Token She
Token ID1375
Feature activation+2.63e-01

Pos logit contributions
Did+0.009
 lying+0.008
pped+0.008
ting+0.008
cher+0.008
Neg logit contributions
 breath-0.009
 common-0.009
 purpose-0.008
 job-0.008
 accomplishment-0.008
Token asked
Token ID1965
Feature activation+1.66e+00

Pos logit contributions
Did+0.006
cher+0.006
pped+0.006
 lying+0.006
ter+0.006
Neg logit contributions
 breath-0.006
 common-0.006
 purpose-0.005
 job-0.005
 accomplishment-0.005
Token her
Token ID607
Feature activation+1.06e+00

Pos logit contributions
cher+0.047
Did+0.047
pped+0.046
ode+0.046
ter+0.045
Neg logit contributions
 breath-0.028
 common-0.025
 purpose-0.024
 job-0.023
 accomplishment-0.022
Token mom
Token ID1995
Feature activation+1.70e+00

Pos logit contributions
Did+0.034
pped+0.034
cher+0.034
ting+0.033
 lying+0.033
Neg logit contributions
 breath-0.014
 common-0.013
 purpose-0.012
 job-0.011
 accomplishment-0.010
Token "
Token ID366
Feature activation+5.06e+00

Pos logit contributions
pped+0.058
cher+0.057
Did+0.055
bridge+0.054
coon+0.053
Neg logit contributions
,-0.022
 home-0.020
,"-0.020
 work-0.019
 own-0.017
Tokenwhat
Token ID10919
Feature activation+7.97e+00

Pos logit contributions
auc+0.089
cher+0.086
 lying+0.084
washer+0.082
otion+0.078
Neg logit contributions
 breath-0.129
-0.121
 purpose-0.117
 common-0.117
 sight-0.112
Token's
Token ID338
Feature activation+2.62e+00

Pos logit contributions
ting+0.239
pped+0.231
ter+0.230
ode+0.219
cher+0.218
Neg logit contributions
 common-0.148
 own-0.133
 long-0.129
,"-0.128
 job-0.128
Token this
Token ID428
Feature activation+2.97e+00

Pos logit contributions
Did+0.057
pped+0.055
cher+0.055
boards+0.053
ode+0.053
Neg logit contributions
 common-0.061
 breath-0.061
 job-0.058
 purpose-0.057
 work-0.054
Token?".
Token ID43634
Feature activation-2.79e+00

Pos logit contributions
Did+0.070
cher+0.070
 lying+0.069
pped+0.068
ter+0.067
Neg logit contributions
 breath-0.061
 common-0.057
 purpose-0.053
 accomplishment-0.050
 job-0.050
Token
Token ID198
Feature activation-2.68e+00

Pos logit contributions
 breath+0.067
 common+0.063
 accomplishment+0.058
 choice+0.056
 purpose+0.055
Neg logit contributions
Did-0.060
 lying-0.057
pped-0.054
ting-0.054
 Why-0.054
Token
Token ID198
Feature activation-2.21e+00

Pos logit contributions
 breath+0.069
 accomplishment+0.061
 common+0.060
 job+0.058
 choice+0.057
Neg logit contributions
Did-0.056
pped-0.053
can-0.053
 lying-0.051
 Why-0.050
Token thought
Token ID1807
Feature activation+2.40e+00

Pos logit contributions
 breath+0.014
 common+0.013
 purpose+0.012
 job+0.012
 accomplishment+0.012
Neg logit contributions
Did-0.011
cher-0.011
pped-0.011
 lying-0.011
ter-0.010
Token to
Token ID284
Feature activation+9.91e-02

Pos logit contributions
cher+0.073
Did+0.072
ode+0.071
pped+0.070
ter+0.069
Neg logit contributions
 breath-0.036
 long-0.034
 common-0.033
 purpose-0.033
 work-0.031
Token himself
Token ID2241
Feature activation+1.01e+00

Pos logit contributions
cher+0.002
Did+0.002
pped+0.002
ting+0.002
 register+0.002
Neg logit contributions
 breath-0.002
 common-0.002
 purpose-0.002
 job-0.002
 own-0.002
Token,
Token ID11
Feature activation+1.15e+00

Pos logit contributions
Did+0.032
cher+0.031
pped+0.031
 lying+0.030
ter+0.030
Neg logit contributions
 breath-0.013
 common-0.012
 purpose-0.011
 job-0.011
 accomplishment-0.010
Token "
Token ID366
Feature activation+5.09e+00

Pos logit contributions
pped+0.029
cher+0.029
ode+0.028
 register+0.028
auc+0.027
Neg logit contributions
 breath-0.021
 long-0.019
 work-0.019
 own-0.018
 common-0.018
Tokenwhat
Token ID10919
Feature activation+7.96e+00

Pos logit contributions
 register+0.069
cher+0.069
 lying+0.065
pled+0.062
boards+0.060
Neg logit contributions
 common-0.148
 purpose-0.147
 breath-0.146
 own-0.138
 job-0.135
Token can
Token ID460
Feature activation+2.15e-01

Pos logit contributions
ting+0.215
pped+0.204
ter+0.203
ode+0.199
uit+0.198
Neg logit contributions
 common-0.159
 job-0.150
 own-0.149
 lesson-0.146
 long-0.142
Token I
Token ID314
Feature activation+7.87e-01

Pos logit contributions
Did+0.005
pped+0.004
cher+0.004
ting+0.004
 register+0.004
Neg logit contributions
 breath-0.005
 common-0.005
 purpose-0.005
 job-0.004
 own-0.004
Token do
Token ID466
Feature activation+1.49e-01

Pos logit contributions
Did+0.017
cher+0.017
pped+0.017
 lying+0.017
ter+0.016
Neg logit contributions
 breath-0.018
 common-0.017
 purpose-0.016
 job-0.015
 accomplishment-0.015
Token to
Token ID284
Feature activation-1.64e+00

Pos logit contributions
Did+0.004
cher+0.004
pped+0.004
 lying+0.004
ter+0.003
Neg logit contributions
 breath-0.003
 common-0.003
 purpose-0.003
 job-0.003
 accomplishment-0.003
Token feel
Token ID1254
Feature activation-7.55e-01

Pos logit contributions
 breath+0.037
 common+0.036
 purpose+0.033
 job+0.032
 accomplishment+0.032
Neg logit contributions
 lying-0.034
pped-0.034
cher-0.034
Did-0.034
ter-0.033
Token sky
Token ID6766
Feature activation-8.41e-01

Pos logit contributions
cher+0.067
pped+0.066
Did+0.065
 register+0.061
ting+0.061
Neg logit contributions
 breath-0.049
 common-0.046
 purpose-0.046
 job-0.043
 sight-0.042
Token and
Token ID290
Feature activation-3.99e-01

Pos logit contributions
 breath+0.011
 common+0.010
 purpose+0.009
 job+0.009
 long+0.008
Neg logit contributions
cher-0.028
Did-0.027
pped-0.027
 register-0.026
ode-0.026
Token wondered
Token ID14028
Feature activation+3.69e-01

Pos logit contributions
 breath+0.009
 common+0.009
 purpose+0.008
 job+0.008
 accomplishment+0.008
Neg logit contributions
cher-0.009
Did-0.008
pped-0.008
 lying-0.008
 register-0.008
Token,
Token ID11
Feature activation+3.15e-01

Pos logit contributions
cher+0.008
pped+0.008
Did+0.008
auc+0.008
ode+0.008
Neg logit contributions
 breath-0.008
,"-0.008
,-0.008
 long-0.008
 home-0.008
Token "
Token ID366
Feature activation+4.92e+00

Pos logit contributions
cher+0.010
pped+0.009
 register+0.009
auc+0.009
ode+0.009
Neg logit contributions
 long-0.005
 breath-0.004
 But-0.004
 own-0.004
 work-0.004
Tokenwhat
Token ID10919
Feature activation+7.96e+00

Pos logit contributions
cher+0.094
 register+0.091
 lying+0.090
pled+0.086
auc+0.082
Neg logit contributions
 common-0.116
 purpose-0.115
 breath-0.115
-0.107
 own-0.106
Token is
Token ID318
Feature activation+5.38e-01

Pos logit contributions
ting+0.222
pped+0.217
uit+0.210
ter+0.210
 register+0.208
Neg logit contributions
 common-0.149
 own-0.139
 long-0.137
 job-0.135
 purpose-0.134
Token that
Token ID326
Feature activation+2.28e+00

Pos logit contributions
 lying+0.011
Did+0.011
cher+0.011
pped+0.011
ter+0.011
Neg logit contributions
 breath-0.013
 common-0.012
 purpose-0.012
 accomplishment-0.011
 job-0.011
Token black
Token ID2042
Feature activation-9.73e-02

Pos logit contributions
 lying+0.053
ter+0.052
Did+0.051
cher+0.050
ting+0.050
Neg logit contributions
 breath-0.049
 common-0.047
 purpose-0.042
 accomplishment-0.041
 job-0.038
Token thing
Token ID1517
Feature activation+3.70e+00

Pos logit contributions
 breath+0.002
 purpose+0.002
 common+0.001
 job+0.001
 sight+0.001
Neg logit contributions
cher-0.003
Did-0.003
pped-0.003
 register-0.003
 lying-0.003
Token in
Token ID287
Feature activation-1.59e+00

Pos logit contributions
Did+0.099
pped+0.098
cher+0.095
ode+0.093
 register+0.093
Neg logit contributions
 breath-0.067
 common-0.064
 purpose-0.061
 job-0.059
 own-0.056
Token owner
Token ID4870
Feature activation+2.73e+00

Pos logit contributions
pped+0.024
Did+0.023
cher+0.023
ting+0.022
 lying+0.022
Neg logit contributions
 breath-0.018
 common-0.016
 purpose-0.016
 job-0.015
 accomplishment-0.014
Token was
Token ID373
Feature activation+1.58e+00

Pos logit contributions
Did+0.063
cher+0.062
pped+0.062
 lying+0.061
ting+0.061
Neg logit contributions
 breath-0.058
 common-0.055
 purpose-0.051
 job-0.049
 accomplishment-0.048
Token saying
Token ID2282
Feature activation+2.18e+00

Pos logit contributions
Did+0.037
 lying+0.036
cher+0.036
pped+0.036
ter+0.035
Neg logit contributions
 breath-0.033
 common-0.032
 purpose-0.030
 job-0.029
 accomplishment-0.028
Token:
Token ID25
Feature activation+3.04e+00

Pos logit contributions
pped+0.058
ode+0.057
Did+0.057
cher+0.056
 register+0.054
Neg logit contributions
 breath-0.038
 common-0.036
 purpose-0.035
 long-0.034
 job-0.033
Token "
Token ID366
Feature activation+5.05e+00

Pos logit contributions
Did+0.060
cher+0.059
pped+0.059
 lying+0.058
ter+0.057
Neg logit contributions
 breath-0.075
 common-0.072
 purpose-0.068
 job-0.065
 accomplishment-0.064
TokenWhat
Token ID2061
Feature activation+7.94e+00

Pos logit contributions
cher+0.081
 lying+0.080
pped+0.079
Did+0.079
 register+0.077
Neg logit contributions
 breath-0.141
 common-0.138
 purpose-0.132
 job-0.126
 accomplishment-0.123
Token a
Token ID257
Feature activation-8.90e-02

Pos logit contributions
pped+0.208
ting+0.207
bled+0.197
ter+0.195
ode+0.195
Neg logit contributions
 common-0.165
 job-0.152
 purpose-0.145
 own-0.140
,"-0.140
Token thoughtful
Token ID22677
Feature activation-2.63e+00

Pos logit contributions
 breath+0.002
 common+0.001
 purpose+0.001
 job+0.001
 accomplishment+0.001
Neg logit contributions
pped-0.002
Did-0.002
cher-0.002
ting-0.002
 lying-0.002
Token cat
Token ID3797
Feature activation+1.67e+00

Pos logit contributions
 breath+0.049
 common+0.046
 purpose+0.042
 accomplishment+0.040
 job+0.039
Neg logit contributions
Did-0.067
 lying-0.066
cher-0.066
ter-0.066
pped-0.065
Token it
Token ID340
Feature activation+2.17e-01

Pos logit contributions
Did+0.041
ode+0.041
pped+0.040
boards+0.040
 register+0.039
Neg logit contributions
 common-0.030
 breath-0.029
 job-0.029
-0.027
 work-0.027
Token is
Token ID318
Feature activation+7.28e-01

Pos logit contributions
Did+0.005
pped+0.005
ode+0.005
cher+0.005
 register+0.005
Neg logit contributions
 common-0.004
 breath-0.004
 job-0.004
 own-0.004
 work-0.004
Token hero
Token ID4293
Feature activation+6.41e-03

Pos logit contributions
 breath+0.098
 common+0.097
 purpose+0.090
 job+0.083
 accomplishment+0.082
Neg logit contributions
ter-0.102
 lying-0.101
cher-0.101
ting-0.099
Did-0.098
Token asked
Token ID1965
Feature activation+1.29e+00

Pos logit contributions
Did+0.000
pped+0.000
cher+0.000
 lying+0.000
 register+0.000
Neg logit contributions
 breath-0.000
 common-0.000
 purpose-0.000
 job-0.000
 accomplishment-0.000
Token the
Token ID262
Feature activation+2.66e+00

Pos logit contributions
Did+0.038
pped+0.037
ode+0.037
cher+0.037
 register+0.036
Neg logit contributions
 breath-0.020
 common-0.018
 purpose-0.017
 job-0.016
 accomplishment-0.016
Token bear
Token ID6842
Feature activation+2.83e+00

Pos logit contributions
pped+0.073
Did+0.069
cher+0.067
pled+0.067
ting+0.066
Neg logit contributions
 breath-0.051
 purpose-0.044
 common-0.044
 job-0.043
 long-0.042
Token "
Token ID366
Feature activation+5.23e+00

Pos logit contributions
pped+0.086
Did+0.086
cher+0.080
ode+0.079
ting+0.078
Neg logit contributions
 breath-0.041
 common-0.037
 home-0.037
 own-0.036
 job-0.035
Tokenwhat
Token ID10919
Feature activation+7.91e+00

Pos logit contributions
auc+0.079
boards+0.078
oll+0.077
washer+0.076
 lying+0.075
Neg logit contributions
 breath-0.147
-0.128
 work-0.126
 common-0.125
 sight-0.121
Token can
Token ID460
Feature activation-2.86e-01

Pos logit contributions
ting+0.193
pped+0.184
ter+0.182
 register+0.179
uit+0.178
Neg logit contributions
 common-0.174
 job-0.161
 own-0.161
 work-0.155
 purpose-0.154
Token I
Token ID314
Feature activation+5.80e-01

Pos logit contributions
 breath+0.007
 common+0.006
 purpose+0.006
 job+0.006
 work+0.006
Neg logit contributions
Did-0.006
pped-0.006
 register-0.006
cher-0.006
ting-0.006
Token do
Token ID466
Feature activation-1.20e-01

Pos logit contributions
Did+0.014
pped+0.014
ting+0.013
cher+0.013
 register+0.013
Neg logit contributions
 breath-0.012
 common-0.012
 job-0.011
 purpose-0.011
 work-0.011
Token to
Token ID284
Feature activation-1.56e+00

Pos logit contributions
 breath+0.002
 common+0.002
 purpose+0.002
 job+0.002
 accomplishment+0.002
Neg logit contributions
Did-0.003
pped-0.003
cher-0.003
 lying-0.003
ting-0.003
Token help
Token ID1037
Feature activation+9.81e-01

Pos logit contributions
 breath+0.035
 common+0.034
 purpose+0.032
 job+0.030
 accomplishment+0.030
Neg logit contributions
Did-0.034
cher-0.034
pped-0.034
 lying-0.033
ter-0.033
Tokenger
Token ID1362
Feature activation+1.90e+00

Pos logit contributions
 breath+0.015
 common+0.014
 purpose+0.014
 accomplishment+0.013
 job+0.012
Neg logit contributions
 lying-0.015
ter-0.015
ting-0.015
pped-0.015
cher-0.015
Token and
Token ID290
Feature activation-4.75e-01

Pos logit contributions
Did+0.056
pped+0.054
cher+0.054
 lying+0.053
ode+0.053
Neg logit contributions
 breath-0.029
 common-0.026
 purpose-0.024
 job-0.023
 own-0.022
Token asked
Token ID1965
Feature activation+1.64e+00

Pos logit contributions
 breath+0.009
 common+0.009
 purpose+0.008
 job+0.008
 own+0.008
Neg logit contributions
pped-0.012
Did-0.012
ode-0.011
pled-0.011
cher-0.011
Token,
Token ID11
Feature activation+1.09e+00

Pos logit contributions
Did+0.045
pped+0.044
ode+0.043
cher+0.043
 lying+0.042
Neg logit contributions
 breath-0.029
 common-0.028
 purpose-0.026
 job-0.024
 home-0.023
Token "
Token ID366
Feature activation+5.43e+00

Pos logit contributions
pped+0.024
Did+0.024
ode+0.024
cher+0.024
boards+0.023
Neg logit contributions
 breath-0.024
 common-0.023
 purpose-0.021
 job-0.021
 long-0.020
TokenWhat
Token ID2061
Feature activation+7.91e+00

Pos logit contributions
auc+0.098
 lying+0.093
icked+0.092
pled+0.092
washer+0.092
Neg logit contributions
 breath-0.133
-0.128
 common-0.127
Ar-0.121
 purpose-0.117
Token are
Token ID389
Feature activation+1.56e+00

Pos logit contributions
pped+0.220
ting+0.214
Did+0.206
 register+0.204
cher+0.202
Neg logit contributions
 common-0.153
 job-0.141
 purpose-0.138
 own-0.137
 long-0.133
Token you
Token ID345
Feature activation-1.37e+00

Pos logit contributions
Did+0.039
pped+0.038
cher+0.038
ode+0.037
 register+0.036
Neg logit contributions
 breath-0.031
 common-0.030
 purpose-0.028
 job-0.027
 long-0.026
Token doing
Token ID1804
Feature activation+1.57e+00

Pos logit contributions
 breath+0.034
 common+0.033
 job+0.031
 purpose+0.031
 work+0.030
Neg logit contributions
Did-0.028
cher-0.026
pped-0.026
boards-0.026
ode-0.025
Token?"
Token ID1701
Feature activation-2.26e+00

Pos logit contributions
Did+0.040
cher+0.039
pped+0.039
 register+0.038
ode+0.037
Neg logit contributions
 breath-0.031
 common-0.030
 purpose-0.028
 job-0.027
 work-0.025
Token
Token ID198
Feature activation-2.33e+00

Pos logit contributions
 breath+0.046
 common+0.043
 purpose+0.040
 accomplishment+0.039
 job+0.039
Neg logit contributions
Did-0.056
 lying-0.054
cher-0.054
ter-0.054
ting-0.053
Token He
Token ID679
Feature activation-3.97e-02

Pos logit contributions
 breath+0.003
 common+0.003
 purpose+0.002
 job+0.002
 accomplishment+0.002
Neg logit contributions
Did-0.004
 lying-0.003
cher-0.003
ting-0.003
boards-0.003
Token asked
Token ID1965
Feature activation+1.53e+00

Pos logit contributions
 breath+0.001
 common+0.001
 purpose+0.001
 job+0.001
 accomplishment+0.001
Neg logit contributions
Did-0.001
cher-0.001
pped-0.001
 lying-0.001
 register-0.001
Token it
Token ID340
Feature activation+2.49e+00

Pos logit contributions
Did+0.041
pped+0.041
cher+0.040
 lying+0.039
ode+0.039
Neg logit contributions
 breath-0.028
 common-0.026
 purpose-0.024
 job-0.023
 accomplishment-0.023
Token,
Token ID11
Feature activation+9.55e-01

Pos logit contributions
pped+0.064
Did+0.062
cher+0.060
 register+0.058
ode+0.058
Neg logit contributions
 breath-0.049
 common-0.045
 purpose-0.043
 job-0.042
 own-0.042
Token "
Token ID366
Feature activation+5.08e+00

Pos logit contributions
pped+0.024
ode+0.024
cher+0.024
boards+0.024
 register+0.023
Neg logit contributions
 breath-0.018
 long-0.016
 common-0.016
 accomplishment-0.015
 home-0.015
Tokenwhat
Token ID10919
Feature activation+7.89e+00

Pos logit contributions
auc+0.095
icked+0.092
washer+0.091
 register+0.089
oll+0.087
Neg logit contributions
 breath-0.121
-0.114
 common-0.107
away-0.106
Ar-0.106
Token's
Token ID338
Feature activation+2.81e+00

Pos logit contributions
ting+0.200
pped+0.195
ter+0.190
 register+0.188
uit+0.184
Neg logit contributions
 common-0.167
 job-0.156
 own-0.154
 long-0.150
 purpose-0.149
Token wrong
Token ID2642
Feature activation-2.69e+00

Pos logit contributions
Did+0.060
cher+0.059
boards+0.059
pped+0.058
 register+0.058
Neg logit contributions
 breath-0.068
 common-0.066
 job-0.062
 purpose-0.061
 work-0.058
Token?"
Token ID1701
Feature activation-2.97e+00

Pos logit contributions
 breath+0.056
 common+0.054
 purpose+0.050
 accomplishment+0.047
 job+0.047
Neg logit contributions
Did-0.063
cher-0.062
 lying-0.062
ter-0.061
pped-0.061
Token
Token ID220
Feature activation-1.62e+00

Pos logit contributions
 breath+0.061
 common+0.060
 purpose+0.054
 accomplishment+0.051
 job+0.050
Neg logit contributions
Did-0.074
 lying-0.070
cher-0.070
ting-0.069
ter-0.069
Token
Token ID198
Feature activation-2.99e+00

Pos logit contributions
 breath+0.030
 common+0.028
 purpose+0.026
 job+0.025
 accomplishment+0.024
Neg logit contributions
Did-0.042
 lying-0.042
cher-0.042
pped-0.042
ter-0.040
Token.
Token ID13
Feature activation+5.48e-01

Pos logit contributions
 breath+0.048
 common+0.043
 purpose+0.040
 job+0.037
 accomplishment+0.034
Neg logit contributions
cher-0.109
Did-0.109
pped-0.108
ode-0.105
 register-0.105
Token He
Token ID679
Feature activation-7.94e-01

Pos logit contributions
Did+0.015
cher+0.014
 lying+0.014
pped+0.014
ter+0.014
Neg logit contributions
 breath-0.010
 common-0.009
 purpose-0.009
 job-0.008
 accomplishment-0.008
Token said
Token ID531
Feature activation+3.11e+00

Pos logit contributions
 breath+0.017
 common+0.016
 purpose+0.015
 job+0.014
 work+0.014
Neg logit contributions
cher-0.018
Did-0.018
 register-0.018
pped-0.018
ter-0.018
Token,
Token ID11
Feature activation+2.22e+00

Pos logit contributions
Did+0.100
cher+0.099
pped+0.098
 lying+0.098
ter+0.097
Neg logit contributions
 breath-0.039
 common-0.036
 purpose-0.032
 job-0.029
 accomplishment-0.028
Token "
Token ID366
Feature activation+5.42e+00

Pos logit contributions
pped+0.051
cher+0.051
ode+0.051
 register+0.050
Did+0.049
Neg logit contributions
 breath-0.046
 common-0.043
 purpose-0.041
 long-0.040
 job-0.039
TokenWho
Token ID8241
Feature activation+7.89e+00

Pos logit contributions
Did+0.080
pped+0.076
ter+0.072
ting+0.071
cher+0.070
Neg logit contributions
 breath-0.169
 common-0.162
 accomplishment-0.156
 purpose-0.155
 job-0.153
Token did
Token ID750
Feature activation+2.63e-01

Pos logit contributions
ting+0.190
Did+0.182
ter+0.178
ode+0.174
bling+0.168
Neg logit contributions
 common-0.166
,"-0.151
 breath-0.150
 work-0.150
 own-0.150
Token this
Token ID428
Feature activation+2.98e+00

Pos logit contributions
Did+0.006
cher+0.006
pped+0.006
 lying+0.006
ter+0.005
Neg logit contributions
 breath-0.006
 common-0.006
 purpose-0.005
 job-0.005
 accomplishment-0.005
Token?
Token ID30
Feature activation+3.20e+00

Pos logit contributions
Did+0.071
pped+0.069
cher+0.068
ting+0.066
ode+0.066
Neg logit contributions
 breath-0.063
 common-0.062
 purpose-0.059
 job-0.057
 work-0.053
Token Who
Token ID5338
Feature activation+7.34e+00

Pos logit contributions
ode+0.061
pped+0.061
Did+0.059
cher+0.059
ting+0.058
Neg logit contributions
 common-0.072
-0.068
 purpose-0.068
 job-0.067
 breath-0.067
Token cut
Token ID2005
Feature activation+3.46e+00

Pos logit contributions
ting+0.173
Did+0.170
ter+0.164
ode+0.158
 register+0.154
Neg logit contributions
 common-0.162
,"-0.143
 own-0.143
 breath-0.143
 work-0.142
Token back
Token ID736
Feature activation-2.70e+00

Pos logit contributions
 lying+0.043
Did+0.042
ting+0.042
cher+0.041
ter+0.041
Neg logit contributions
 common-0.028
 breath-0.027
 purpose-0.026
 job-0.023
 accomplishment-0.023
Token.
Token ID13
Feature activation+6.29e-01

Pos logit contributions
 breath+0.058
 common+0.053
 purpose+0.051
 accomplishment+0.051
 job+0.048
Neg logit contributions
 lying-0.063
ting-0.062
Did-0.061
ter-0.060
cher-0.058
Token She
Token ID1375
Feature activation-2.53e-01

Pos logit contributions
Did+0.015
cher+0.015
pped+0.015
 lying+0.015
ter+0.015
Neg logit contributions
 breath-0.013
 common-0.012
 purpose-0.011
 job-0.011
 accomplishment-0.010
Token says
Token ID1139
Feature activation+2.61e+00

Pos logit contributions
 breath+0.004
 common+0.004
 work+0.003
 own+0.003
 choice+0.003
Neg logit contributions
cher-0.008
pped-0.007
Did-0.007
ter-0.007
ting-0.007
Token "
Token ID366
Feature activation+5.85e+00

Pos logit contributions
cher+0.083
Did+0.083
pped+0.083
ting+0.081
ter+0.081
Neg logit contributions
 breath-0.033
 common-0.030
 purpose-0.027
 job-0.024
 accomplishment-0.023
Tokenwhat
Token ID10919
Feature activation+7.87e+00

Pos logit contributions
cher+0.082
pped+0.080
 lying+0.080
 register+0.080
ode+0.078
Neg logit contributions
 breath-0.172
 common-0.170
 purpose-0.162
 job-0.151
-0.151
Token are
Token ID389
Feature activation+1.25e+00

Pos logit contributions
ting+0.204
ter+0.198
ode+0.191
pped+0.190
cher+0.190
Neg logit contributions
 common-0.171
 job-0.153
 purpose-0.151
 long-0.150
 lesson-0.148
Token you
Token ID345
Feature activation-1.38e+00

Pos logit contributions
Did+0.033
pped+0.033
cher+0.032
ode+0.032
pled+0.031
Neg logit contributions
 common-0.022
 breath-0.022
 job-0.020
 purpose-0.020
 long-0.019
Token doing
Token ID1804
Feature activation+1.24e+00

Pos logit contributions
 breath+0.038
 common+0.036
 purpose+0.034
 job+0.034
 work+0.033
Neg logit contributions
Did-0.024
cher-0.024
boards-0.024
ode-0.023
pped-0.022
Token,
Token ID11
Feature activation+1.65e+00

Pos logit contributions
Did+0.032
cher+0.031
pped+0.030
ode+0.030
boards+0.030
Neg logit contributions
 breath-0.024
 common-0.023
 purpose-0.022
 job-0.022
 accomplishment-0.020
Token my
Token ID616
Feature activation+1.66e+00

Pos logit contributions
Did+0.042
cher+0.041
pped+0.041
 lying+0.041
ter+0.040
Neg logit contributions
 breath-0.032
 common-0.030
 purpose-0.028
 job-0.027
 accomplishment-0.026
Token.
Token ID13
Feature activation+6.14e-01

Pos logit contributions
Did+0.006
ode+0.006
pped+0.006
cher+0.006
 register+0.006
Neg logit contributions
 breath-0.004
 common-0.004
 long-0.004
 work-0.003
 purpose-0.003
Token Jack
Token ID3619
Feature activation-2.21e-01

Pos logit contributions
cher+0.016
pped+0.016
ode+0.016
Did+0.015
 register+0.015
Neg logit contributions
 common-0.011
 breath-0.011
 purpose-0.010
 long-0.010
 job-0.009
Token said
Token ID531
Feature activation+1.96e+00

Pos logit contributions
 breath+0.005
 common+0.005
 purpose+0.004
 job+0.004
 work+0.004
Neg logit contributions
Did-0.005
cher-0.005
pped-0.005
 register-0.005
ter-0.005
Token,
Token ID11
Feature activation+8.72e-01

Pos logit contributions
Did+0.066
pped+0.065
cher+0.065
ode+0.064
ting+0.064
Neg logit contributions
 breath-0.022
 common-0.020
 purpose-0.018
 job-0.016
 long-0.015
Token "
Token ID366
Feature activation+5.28e+00

Pos logit contributions
pped+0.020
ode+0.020
cher+0.020
 register+0.020
Did+0.020
Neg logit contributions
 breath-0.018
 common-0.017
 purpose-0.016
 long-0.016
 job-0.015
Tokenwhat
Token ID10919
Feature activation+7.87e+00

Pos logit contributions
cher+0.090
 lying+0.089
 register+0.086
pped+0.082
pled+0.081
Neg logit contributions
 common-0.141
 breath-0.138
 purpose-0.134
 sight-0.122
-0.122
Token do
Token ID466
Feature activation+2.98e-01

Pos logit contributions
ting+0.222
ter+0.211
pped+0.209
 register+0.205
ode+0.201
Neg logit contributions
 common-0.163
 job-0.144
 own-0.140
 long-0.139
 purpose-0.138
Token you
Token ID345
Feature activation-1.10e+00

Pos logit contributions
Did+0.007
pped+0.007
cher+0.007
 lying+0.007
ting+0.007
Neg logit contributions
 breath-0.006
 common-0.006
 purpose-0.006
 job-0.005
 accomplishment-0.005
Token think
Token ID892
Feature activation+2.66e-01

Pos logit contributions
 breath+0.020
 common+0.019
 purpose+0.017
 job+0.017
 work+0.017
Neg logit contributions
Did-0.030
cher-0.029
pped-0.029
ting-0.029
 register-0.029
Token that
Token ID326
Feature activation+3.56e+00

Pos logit contributions
Did+0.008
ode+0.007
pped+0.007
cher+0.007
 register+0.007
Neg logit contributions
 common-0.004
 breath-0.004
 long-0.004
 purpose-0.004
 job-0.004
Token noise
Token ID7838
Feature activation+5.48e-01

Pos logit contributions
Did+0.093
pped+0.093
cher+0.090
 register+0.089
ting+0.089
Neg logit contributions
 breath-0.067
 common-0.064
 purpose-0.061
 job-0.059
 own-0.055
Token,"
Token ID553
Feature activation-1.43e+00

Pos logit contributions
 breath+0.031
 common+0.029
 purpose+0.026
 job+0.023
 accomplishment+0.023
Neg logit contributions
cher-0.074
 lying-0.074
Did-0.074
pped-0.074
ter-0.073
Token she
Token ID673
Feature activation-9.28e-01

Pos logit contributions
 breath+0.031
 common+0.030
 purpose+0.028
 accomplishment+0.026
 job+0.026
Neg logit contributions
Did-0.033
 lying-0.032
ter-0.032
cher-0.031
ting-0.031
Token said
Token ID531
Feature activation+3.90e-01

Pos logit contributions
 breath+0.020
 common+0.019
 purpose+0.018
 accomplishment+0.017
 job+0.017
Neg logit contributions
cher-0.021
 lying-0.021
Did-0.021
ter-0.021
pped-0.020
Token,
Token ID11
Feature activation+4.46e-01

Pos logit contributions
 lying+0.011
cher+0.011
ter+0.011
pped+0.010
Did+0.010
Neg logit contributions
 common-0.007
 breath-0.006
 purpose-0.006
 job-0.005
 accomplishment-0.005
Token "
Token ID366
Feature activation+4.66e+00

Pos logit contributions
Did+0.009
 lying+0.009
pped+0.009
cher+0.009
ter+0.008
Neg logit contributions
 breath-0.011
 common-0.011
 purpose-0.010
 job-0.010
 accomplishment-0.009
Tokenwhat
Token ID10919
Feature activation+7.85e+00

Pos logit contributions
cher+0.072
 register+0.070
 lying+0.070
pped+0.068
ode+0.068
Neg logit contributions
 breath-0.125
 common-0.125
 purpose-0.121
-0.111
 own-0.111
Token are
Token ID389
Feature activation+1.39e+00

Pos logit contributions
ting+0.213
ter+0.201
pped+0.199
uit+0.192
ode+0.190
Neg logit contributions
 common-0.164
 job-0.155
 own-0.151
 long-0.150
 lesson-0.146
Token you
Token ID345
Feature activation-1.28e+00

Pos logit contributions
Did+0.038
pped+0.037
ode+0.037
cher+0.036
uit+0.036
Neg logit contributions
 common-0.024
 breath-0.024
 job-0.023
 purpose-0.022
 long-0.022
Token doing
Token ID1804
Feature activation+1.29e+00

Pos logit contributions
 breath+0.034
 common+0.032
 job+0.032
 purpose+0.031
 work+0.031
Neg logit contributions
Did-0.024
cher-0.023
boards-0.023
pped-0.022
ode-0.022
Token?
Token ID30
Feature activation+2.26e+00

Pos logit contributions
Did+0.031
cher+0.031
pped+0.030
ode+0.030
 register+0.030
Neg logit contributions
 breath-0.026
 common-0.025
 purpose-0.024
 job-0.024
 work-0.022
Token You
Token ID921
Feature activation+2.30e+00

Pos logit contributions
pped+0.045
ode+0.044
cher+0.043
 register+0.043
 lying+0.043
Neg logit contributions
 common-0.050
 breath-0.048
 purpose-0.047
 Today-0.047
-0.047
Token.
Token ID13
Feature activation+6.10e-01

Pos logit contributions
 breath+0.061
 common+0.057
 purpose+0.052
 job+0.048
 accomplishment+0.046
Neg logit contributions
Did-0.117
cher-0.116
pped-0.115
 lying-0.114
ter-0.113
Token It
Token ID632
Feature activation+2.71e+00

Pos logit contributions
Did+0.016
cher+0.016
pped+0.016
 lying+0.015
ter+0.015
Neg logit contributions
 breath-0.011
 common-0.011
 purpose-0.010
 job-0.009
 accomplishment-0.009
Token replied
Token ID8712
Feature activation+1.81e+00

Pos logit contributions
Did+0.053
cher+0.052
pped+0.051
 lying+0.051
ter+0.050
Neg logit contributions
 breath-0.067
 common-0.065
 purpose-0.062
 job-0.059
 accomplishment-0.058
Token,
Token ID11
Feature activation+6.95e-01

Pos logit contributions
Did+0.060
cher+0.060
pped+0.060
 lying+0.059
ter+0.059
Neg logit contributions
 breath-0.020
 common-0.019
 purpose-0.016
 job-0.014
 accomplishment-0.014
Token "
Token ID366
Feature activation+5.01e+00

Pos logit contributions
Did+0.013
cher+0.013
pped+0.013
 lying+0.013
ter+0.013
Neg logit contributions
 breath-0.018
 common-0.017
 purpose-0.016
 job-0.015
 accomplishment-0.015
Tokenwhat
Token ID10919
Feature activation+7.84e+00

Pos logit contributions
 lying+0.087
cher+0.086
 register+0.085
pped+0.085
boards+0.081
Neg logit contributions
 breath-0.132
 common-0.131
 purpose-0.124
 job-0.117
 own-0.115
Token can
Token ID460
Feature activation+2.56e-01

Pos logit contributions
ting+0.193
pped+0.186
ter+0.181
 register+0.181
uit+0.177
Neg logit contributions
 common-0.176
 job-0.161
 own-0.158
 lesson-0.156
 purpose-0.155
Token you
Token ID345
Feature activation-3.38e-01

Pos logit contributions
Did+0.005
pped+0.005
cher+0.005
 register+0.005
ting+0.005
Neg logit contributions
 breath-0.006
 common-0.006
 purpose-0.006
 job-0.006
 own-0.006
Token do
Token ID466
Feature activation+1.24e-01

Pos logit contributions
 breath+0.007
 common+0.007
 purpose+0.007
 job+0.006
 accomplishment+0.006
Neg logit contributions
Did-0.008
cher-0.008
pped-0.008
 lying-0.008
ting-0.007
Token?".
Token ID43634
Feature activation-2.24e+00

Pos logit contributions
Did+0.003
cher+0.003
pped+0.003
 lying+0.003
ting+0.003
Neg logit contributions
 breath-0.002
 common-0.002
 purpose-0.002
 job-0.002
 accomplishment-0.002
Token The
Token ID383
Feature activation+1.62e+00

Pos logit contributions
 breath+0.040
 common+0.038
 purpose+0.035
 job+0.033
 accomplishment+0.032
Neg logit contributions
Did-0.061
cher-0.060
pped-0.059
 lying-0.059
ter-0.058
Token's
Token ID338
Feature activation+1.84e+00

Pos logit contributions
Did+0.008
cher+0.008
boards+0.007
 register+0.007
ter+0.007
Neg logit contributions
 breath-0.006
 common-0.005
 work-0.005
 sleep-0.005
 purpose-0.005
Token backpack
Token ID22526
Feature activation+2.41e+00

Pos logit contributions
 lying+0.045
Did+0.045
cher+0.045
pped+0.044
ter+0.043
Neg logit contributions
 breath-0.035
 common-0.034
 purpose-0.032
 job-0.029
 accomplishment-0.029
Token and
Token ID290
Feature activation-8.72e-01

Pos logit contributions
ter+0.077
ode+0.077
Did+0.076
cher+0.076
ting+0.076
Neg logit contributions
 breath-0.029
 long-0.026
 work-0.024
 common-0.022
 job-0.021
Token asks
Token ID7893
Feature activation+7.64e-01

Pos logit contributions
 breath+0.017
 common+0.016
 purpose+0.015
 job+0.014
 accomplishment+0.014
Neg logit contributions
Did-0.022
cher-0.022
pped-0.022
 lying-0.021
ter-0.021
Token "
Token ID366
Feature activation+4.93e+00

Pos logit contributions
ode+0.021
pped+0.021
Did+0.021
cher+0.021
bridge+0.020
Neg logit contributions
 breath-0.012
 common-0.012
 home-0.011
 purpose-0.011
 long-0.011
Tokenwhat
Token ID10919
Feature activation+7.84e+00

Pos logit contributions
auc+0.085
icked+0.082
washer+0.081
ovable+0.080
pled+0.080
Neg logit contributions
 breath-0.123
-0.113
Ar-0.103
away-0.102
 common-0.102
Token do
Token ID466
Feature activation+1.05e-01

Pos logit contributions
ting+0.216
ter+0.210
pped+0.206
ode+0.195
bled+0.194
Neg logit contributions
 common-0.157
 job-0.143
 long-0.140
 own-0.138
 purpose-0.137
Token you
Token ID345
Feature activation-1.40e+00

Pos logit contributions
Did+0.002
pped+0.002
cher+0.002
 lying+0.002
ting+0.002
Neg logit contributions
 breath-0.002
 common-0.002
 purpose-0.002
 job-0.002
 accomplishment-0.002
Token have
Token ID423
Feature activation+1.69e+00

Pos logit contributions
 breath+0.027
 common+0.027
 work+0.025
 job+0.025
 own+0.024
Neg logit contributions
Did-0.036
ting-0.036
cher-0.035
boards-0.034
 register-0.034
Token in
Token ID287
Feature activation-3.55e+00

Pos logit contributions
Did+0.045
cher+0.044
ode+0.044
ter+0.043
ting+0.042
Neg logit contributions
 breath-0.030
 common-0.029
 job-0.027
 purpose-0.027
 work-0.026
Token there
Token ID612
Feature activation-2.15e-01

Pos logit contributions
 breath+0.074
 common+0.070
 purpose+0.066
 accomplishment+0.062
 job+0.062
Neg logit contributions
pped-0.082
 lying-0.082
Did-0.081
cher-0.080
ter-0.079
Token not
Token ID407
Feature activation-1.33e+00

Pos logit contributions
ode+0.091
boards+0.088
cher+0.085
ting+0.085
pped+0.082
Neg logit contributions
 breath-0.047
 common-0.044
 long-0.040
 own-0.037
 purpose-0.037
Token happy
Token ID3772
Feature activation-4.79e+00

Pos logit contributions
 breath+0.016
 common+0.014
 purpose+0.012
 job+0.011
 work+0.011
Neg logit contributions
Did-0.044
pped-0.044
cher-0.043
ting-0.043
ode-0.043
Token and
Token ID290
Feature activation-2.47e+00

Pos logit contributions
 breath+0.065
 common+0.060
 purpose+0.056
 job+0.050
 accomplishment+0.047
Neg logit contributions
cher-0.150
Did-0.149
pped-0.148
ter-0.143
 lying-0.143
Token says
Token ID1139
Feature activation+2.55e+00

Pos logit contributions
 breath+0.040
 common+0.038
 purpose+0.035
 job+0.033
 accomplishment+0.031
Neg logit contributions
Did-0.071
cher-0.070
pped-0.070
 lying-0.068
ter-0.068
Token "
Token ID366
Feature activation+5.05e+00

Pos logit contributions
pped+0.080
ode+0.080
Did+0.078
cher+0.077
ting+0.077
Neg logit contributions
 common-0.033
 purpose-0.032
 breath-0.031
 long-0.030
 job-0.028
Tokenwhat
Token ID10919
Feature activation+7.83e+00

Pos logit contributions
Did+0.093
cher+0.091
pped+0.091
 lying+0.089
ter+0.088
Neg logit contributions
 breath-0.132
 common-0.127
 purpose-0.121
 job-0.117
 accomplishment-0.114
Token is
Token ID318
Feature activation+3.59e-01

Pos logit contributions
ter+0.219
ting+0.217
ode+0.205
 register+0.193
pped+0.193
Neg logit contributions
 common-0.165
 job-0.142
 long-0.139
 own-0.138
 purpose-0.136
Token going
Token ID1016
Feature activation-5.84e-02

Pos logit contributions
Did+0.008
cher+0.007
ode+0.007
pped+0.007
ting+0.007
Neg logit contributions
 breath-0.009
 common-0.008
 purpose-0.008
 job-0.008
 work-0.007
Token on
Token ID319
Feature activation-2.93e+00

Pos logit contributions
 breath+0.001
 common+0.001
 job+0.001
 purpose+0.001
 home+0.001
Neg logit contributions
Did-0.001
ode-0.001
pped-0.001
cher-0.001
boards-0.001
Token here
Token ID994
Feature activation-1.87e+00

Pos logit contributions
 breath+0.062
 common+0.059
 purpose+0.056
 job+0.054
 accomplishment+0.051
Neg logit contributions
Did-0.070
cher-0.068
pped-0.066
 lying-0.066
ode-0.065
Token?
Token ID30
Feature activation+2.43e+00

Pos logit contributions
 common+0.031
 breath+0.031
 purpose+0.030
 job+0.030
+0.028
Neg logit contributions
Did-0.052
cher-0.050
ode-0.050
ting-0.048
 register-0.048
Token closely
Token ID7173
Feature activation-2.68e+00

Pos logit contributions
 breath+0.109
 purpose+0.092
 common+0.092
 job+0.086
 accomplishment+0.085
Neg logit contributions
 lying-0.124
ter-0.115
Did-0.113
cher-0.112
 register-0.112
Token.
Token ID13
Feature activation+3.75e-01

Pos logit contributions
 breath+0.034
 common+0.031
 purpose+0.028
 job+0.025
 accomplishment+0.023
Neg logit contributions
cher-0.086
Did-0.086
pped-0.084
ode-0.082
 register-0.082
Token He
Token ID679
Feature activation-1.06e-01

Pos logit contributions
Did+0.009
 lying+0.008
pped+0.008
cher+0.008
ting+0.008
Neg logit contributions
 breath-0.008
 common-0.008
 purpose-0.007
 job-0.007
 accomplishment-0.007
Token asked
Token ID1965
Feature activation+1.70e+00

Pos logit contributions
 breath+0.002
 common+0.002
 purpose+0.002
 job+0.002
 accomplishment+0.002
Neg logit contributions
cher-0.002
Did-0.002
pped-0.002
 lying-0.002
ting-0.002
Token "
Token ID366
Feature activation+5.54e+00

Pos logit contributions
Did+0.049
cher+0.047
pped+0.047
ode+0.047
ter+0.046
Neg logit contributions
 breath-0.028
 common-0.026
 purpose-0.024
 job-0.022
 accomplishment-0.022
TokenWhat
Token ID2061
Feature activation+7.83e+00

Pos logit contributions
auc+0.114
washer+0.107
icked+0.104
adders+0.100
hes+0.099
Neg logit contributions
 breath-0.124
-0.117
Ar-0.110
 common-0.110
 purpose-0.107
Token do
Token ID466
Feature activation+6.77e-01

Pos logit contributions
pped+0.233
ting+0.229
bled+0.215
ter+0.214
 register+0.213
Neg logit contributions
 common-0.136
,"-0.129
 long-0.126
 own-0.124
 purpose-0.123
Token you
Token ID345
Feature activation-1.09e+00

Pos logit contributions
Did+0.014
cher+0.014
 lying+0.014
pped+0.014
ter+0.013
Neg logit contributions
 breath-0.016
 common-0.015
 purpose-0.015
 job-0.014
 accomplishment-0.014
Token see
Token ID766
Feature activation+3.72e+00

Pos logit contributions
 common+0.017
 breath+0.017
 work+0.016
 job+0.016
 own+0.015
Neg logit contributions
Did-0.032
cher-0.031
ting-0.031
 register-0.031
pped-0.030
Token?"
Token ID1701
Feature activation-2.52e+00

Pos logit contributions
Did+0.110
cher+0.108
 register+0.104
ode+0.103
pped+0.103
Neg logit contributions
 breath-0.054
 common-0.052
 purpose-0.050
,"-0.048
 job-0.047
Token
Token ID198
Feature activation-2.51e+00

Pos logit contributions
 breath+0.052
 common+0.049
 purpose+0.045
 job+0.044
 accomplishment+0.044
Neg logit contributions
Did-0.062
 lying-0.061
cher-0.058
ter-0.057
ting-0.057
Token.
Token ID13
Feature activation+8.12e-01

Pos logit contributions
Did+0.062
pped+0.061
cher+0.061
ode+0.060
 lying+0.060
Neg logit contributions
 breath-0.041
 common-0.038
 purpose-0.035
 job-0.033
 accomplishment-0.032
Token He
Token ID679
Feature activation-1.13e-01

Pos logit contributions
Did+0.019
 lying+0.018
ting+0.017
boards+0.017
cher+0.017
Neg logit contributions
 breath-0.019
 common-0.016
 accomplishment-0.016
 job-0.016
 purpose-0.015
Token shouted
Token ID17293
Feature activation+2.11e+00

Pos logit contributions
 breath+0.002
 common+0.002
 work+0.002
 long+0.002
 purpose+0.002
Neg logit contributions
 register-0.003
cher-0.003
ode-0.003
pped-0.003
ting-0.003
Token,
Token ID11
Feature activation+1.99e+00

Pos logit contributions
 lying+0.057
ting+0.056
pped+0.055
ter+0.055
cher+0.055
Neg logit contributions
 common-0.037
 breath-0.037
 purpose-0.033
 accomplishment-0.031
 job-0.030
Token "
Token ID366
Feature activation+5.13e+00

Pos logit contributions
cher+0.040
Did+0.039
pped+0.039
 register+0.038
 lying+0.038
Neg logit contributions
 breath-0.048
 common-0.046
 purpose-0.044
 job-0.042
 accomplishment-0.041
TokenWho
Token ID8241
Feature activation+7.82e+00

Pos logit contributions
Did+0.076
pped+0.073
cher+0.073
 lying+0.070
ter+0.070
Neg logit contributions
 breath-0.155
 common-0.150
 purpose-0.144
 job-0.140
 accomplishment-0.138
Token are
Token ID389
Feature activation+1.67e+00

Pos logit contributions
ting+0.175
Did+0.171
ter+0.171
ode+0.166
 register+0.157
Neg logit contributions
 common-0.174
 breath-0.162
 own-0.159
 work-0.159
,"-0.156
Token you
Token ID345
Feature activation-1.22e+00

Pos logit contributions
Did+0.048
pped+0.046
ode+0.046
cher+0.045
pled+0.045
Neg logit contributions
 breath-0.026
 common-0.025
 job-0.024
 purpose-0.023
 work-0.023
Token?
Token ID30
Feature activation+3.12e+00

Pos logit contributions
 breath+0.021
 common+0.021
 job+0.019
 purpose+0.019
 work+0.018
Neg logit contributions
Did-0.032
boards-0.032
ode-0.031
cher-0.031
auc-0.031
Token What
Token ID1867
Feature activation+6.85e+00

Pos logit contributions
ode+0.059
pped+0.055
cher+0.054
boards+0.054
ter+0.053
Neg logit contributions
 common-0.073
 breath-0.070
-0.069
 job-0.068
 purpose-0.067
Token are
Token ID389
Feature activation+1.96e+00

Pos logit contributions
ting+0.183
pped+0.174
ode+0.174
ter+0.171
 register+0.170
Neg logit contributions
 common-0.145
,"-0.127
 own-0.125
 long-0.122
 work-0.119
Token the
Token ID262
Feature activation+1.54e+00

Pos logit contributions
 breath+0.039
 common+0.036
 purpose+0.034
 job+0.032
 accomplishment+0.032
Neg logit contributions
cher-0.047
Did-0.047
pped-0.046
 lying-0.045
ter-0.045
Token rain
Token ID6290
Feature activation-1.03e+00

Pos logit contributions
Did+0.038
 lying+0.037
pped+0.037
ter+0.036
cher+0.036
Neg logit contributions
 breath-0.031
 common-0.029
 purpose-0.027
 job-0.025
 accomplishment-0.025
Token and
Token ID290
Feature activation-8.30e-01

Pos logit contributions
 breath+0.017
 common+0.016
 purpose+0.015
 job+0.014
 accomplishment+0.014
Neg logit contributions
Did-0.029
cher-0.028
 lying-0.028
pped-0.028
ter-0.028
Token take
Token ID1011
Feature activation+2.13e+00

Pos logit contributions
 breath+0.017
 common+0.015
 purpose+0.014
 job+0.014
 own+0.013
Neg logit contributions
Did-0.021
pped-0.021
cher-0.021
 register-0.020
 lying-0.020
Token a
Token ID257
Feature activation-1.23e+00

Pos logit contributions
Did+0.044
cher+0.043
pped+0.042
 lying+0.042
ter+0.041
Neg logit contributions
 breath-0.052
 common-0.050
 purpose-0.047
 job-0.045
 accomplishment-0.044
Token com
Token ID401
Feature activation-1.07e+01

Pos logit contributions
 common+0.022
 purpose+0.021
 accomplishment+0.020
 breath+0.020
 choice+0.019
Neg logit contributions
 lying-0.030
cher-0.030
Did-0.028
pped-0.028
washer-0.027
Tokenfy
Token ID24928
Feature activation-7.83e+00

Pos logit contributions
 purpose+0.197
 breath+0.192
 common+0.189
 sight+0.172
 accomplishment+0.159
Neg logit contributions
cher-0.258
ode-0.247
pped-0.245
boards-0.234
washer-0.233
Token nap
Token ID25422
Feature activation-3.41e+00

Pos logit contributions
 common+0.157
 purpose+0.133
 accomplishment+0.130
 choice+0.129
 breath+0.128
Neg logit contributions
 lying-0.183
pped-0.178
uit-0.172
ode-0.167
ter-0.167
Token.
Token ID13
Feature activation+1.65e-01

Pos logit contributions
 breath+0.051
 common+0.045
 purpose+0.043
 job+0.040
 long+0.038
Neg logit contributions
Did-0.104
pped-0.100
cher-0.099
 register-0.098
ode-0.097
Token The
Token ID383
Feature activation+1.55e+00

Pos logit contributions
cher+0.005
Did+0.005
pped+0.005
 lying+0.005
ter+0.005
Neg logit contributions
 breath-0.003
 common-0.002
 purpose-0.002
 job-0.002
 accomplishment-0.002
Token tiger
Token ID26241
Feature activation+5.08e-02

Pos logit contributions
Did+0.038
cher+0.037
pped+0.037
 lying+0.037
ter+0.036
Neg logit contributions
 breath-0.032
 common-0.030
 purpose-0.028
 job-0.027
 accomplishment-0.026
Token soon
Token ID2582
Feature activation-5.20e+00

Pos logit contributions
Did+0.054
pped+0.053
 lying+0.053
cher+0.053
ter+0.051
Neg logit contributions
 breath-0.059
 common-0.057
 purpose-0.054
 job-0.051
 accomplishment-0.050
Token nothing
Token ID2147
Feature activation-1.69e+00

Pos logit contributions
 breath+0.124
 common+0.118
 purpose+0.112
 job+0.107
 accomplishment+0.105
Neg logit contributions
Did-0.109
cher-0.108
pped-0.106
 lying-0.103
ter-0.103
Token but
Token ID475
Feature activation-4.14e-01

Pos logit contributions
 breath+0.039
 common+0.038
 purpose+0.035
 job+0.034
 accomplishment+0.033
Neg logit contributions
Did-0.036
pped-0.036
cher-0.035
ode-0.035
 register-0.035
Token a
Token ID257
Feature activation+4.99e-02

Pos logit contributions
 breath+0.008
 common+0.007
 accomplishment+0.007
 purpose+0.007
 job+0.006
Neg logit contributions
cher-0.011
Did-0.011
pped-0.011
ode-0.011
 register-0.011
Token small
Token ID1402
Feature activation-4.20e+00

Pos logit contributions
Did+0.001
cher+0.001
 lying+0.001
pped+0.001
ter+0.001
Neg logit contributions
 breath-0.001
 common-0.001
 purpose-0.001
 job-0.001
 accomplishment-0.001
Token spe
Token ID693
Feature activation-1.07e+01

Pos logit contributions
 breath+0.080
 common+0.079
 purpose+0.072
 accomplishment+0.067
 job+0.067
Neg logit contributions
 lying-0.102
ter-0.101
Did-0.100
cher-0.097
ting-0.097
Tokenck
Token ID694
Feature activation+9.30e-01

Pos logit contributions
 purpose+0.221
 job+0.220
powers+0.220
+0.218
 breath+0.218
Neg logit contributions
pped-0.228
ter-0.195
Did-0.191
pled-0.185
ode-0.182
Token in
Token ID287
Feature activation-1.78e+00

Pos logit contributions
Did+0.021
cher+0.020
ode+0.020
boards+0.019
 register+0.019
Neg logit contributions
 breath-0.021
 common-0.018
 purpose-0.017
 job-0.017
 sight-0.017
Token the
Token ID262
Feature activation+1.46e+00

Pos logit contributions
 breath+0.032
 common+0.029
 job+0.029
 purpose+0.029
powers+0.028
Neg logit contributions
Did-0.045
pped-0.044
 lying-0.044
ter-0.042
ting-0.042
Token sky
Token ID6766
Feature activation-1.41e+00

Pos logit contributions
Did+0.035
cher+0.034
pped+0.034
 lying+0.034
ter+0.033
Neg logit contributions
 breath-0.031
 common-0.029
 purpose-0.028
 job-0.026
 accomplishment-0.026
Token.
Token ID13
Feature activation+2.88e-01

Pos logit contributions
 breath+0.019
 common+0.018
 purpose+0.016
 job+0.015
 accomplishment+0.014
Neg logit contributions
cher-0.044
Did-0.044
pped-0.043
 register-0.042
ode-0.042
Token and
Token ID290
Feature activation-2.85e+00

Pos logit contributions
 breath+0.132
 common+0.125
 purpose+0.115
 job+0.107
 accomplishment+0.103
Neg logit contributions
Did-0.228
cher-0.226
pped-0.225
 lying-0.224
ter-0.220
Token curled
Token ID41108
Feature activation-1.82e+00

Pos logit contributions
 breath+0.064
 common+0.062
 purpose+0.059
 job+0.056
 accomplishment+0.054
Neg logit contributions
Did-0.062
cher-0.061
 lying-0.061
pped-0.060
ter-0.059
Token up
Token ID510
Feature activation-3.75e+00

Pos logit contributions
 breath+0.041
 common+0.041
 purpose+0.038
 job+0.036
 accomplishment+0.035
Neg logit contributions
pped-0.038
ter-0.038
 lying-0.038
Did-0.038
cher-0.038
Token in
Token ID287
Feature activation-1.11e+00

Pos logit contributions
 breath+0.089
 common+0.088
 purpose+0.084
 job+0.079
 accomplishment+0.077
Neg logit contributions
 lying-0.075
Did-0.074
pped-0.073
cher-0.072
ter-0.072
Token a
Token ID257
Feature activation-7.57e-01

Pos logit contributions
 breath+0.023
 common+0.023
 purpose+0.021
 job+0.021
 accomplishment+0.020
Neg logit contributions
Did-0.025
 lying-0.025
pped-0.025
cher-0.024
ting-0.024
Token com
Token ID401
Feature activation-1.03e+01

Pos logit contributions
 common+0.015
 breath+0.015
 purpose+0.013
 job+0.013
 accomplishment+0.012
Neg logit contributions
 lying-0.018
Did-0.017
cher-0.017
 register-0.016
ter-0.016
Tokenfy
Token ID24928
Feature activation-7.77e+00

Pos logit contributions
 breath+0.212
 purpose+0.210
 common+0.196
 sight+0.178
 avail+0.171
Neg logit contributions
cher-0.226
washer-0.211
ode-0.207
boards-0.202
auc-0.201
Token spot
Token ID4136
Feature activation-2.84e-01

Pos logit contributions
 common+0.164
 choice+0.142
 breath+0.138
powers+0.138
 judgement+0.134
Neg logit contributions
 lying-0.173
ter-0.156
uit-0.152
pped-0.152
ode-0.150
Token to
Token ID284
Feature activation-2.08e+00

Pos logit contributions
 breath+0.004
 common+0.004
 purpose+0.004
 job+0.003
 accomplishment+0.003
Neg logit contributions
Did-0.008
cher-0.008
pped-0.008
ting-0.008
 register-0.008
Token take
Token ID1011
Feature activation+2.09e+00

Pos logit contributions
 breath+0.039
 common+0.036
 purpose+0.033
 job+0.031
 accomplishment+0.031
Neg logit contributions
Did-0.055
cher-0.054
pped-0.054
 register-0.053
 lying-0.053
Token a
Token ID257
Feature activation-1.24e+00

Pos logit contributions
pped+0.044
cher+0.044
Did+0.044
 lying+0.043
ting+0.043
Neg logit contributions
 breath-0.047
 common-0.046
 purpose-0.043
 job-0.041
 accomplishment-0.041
Token She
Token ID1375
Feature activation-3.14e-02

Pos logit contributions
 breath+0.006
 common+0.006
 job+0.005
 purpose+0.005
 accomplishment+0.005
Neg logit contributions
Did-0.005
 register-0.005
boards-0.005
 Why-0.005
 lying-0.005
Token lay
Token ID3830
Feature activation-6.91e-01

Pos logit contributions
 breath+0.001
 common+0.001
 purpose+0.001
 job+0.001
 accomplishment+0.001
Neg logit contributions
Did-0.001
cher-0.001
pped-0.001
ting-0.001
 lying-0.001
Token down
Token ID866
Feature activation-2.01e+00

Pos logit contributions
 breath+0.014
 common+0.012
 job+0.011
 home+0.011
 purpose+0.011
Neg logit contributions
ode-0.017
cher-0.017
Did-0.017
pped-0.016
 register-0.016
Token in
Token ID287
Feature activation-2.26e+00

Pos logit contributions
 breath+0.044
 common+0.042
 purpose+0.039
 job+0.038
 accomplishment+0.037
Neg logit contributions
Did-0.046
cher-0.045
pped-0.045
 lying-0.044
ter-0.043
Token a
Token ID257
Feature activation-1.16e+00

Pos logit contributions
 breath+0.046
 common+0.044
 purpose+0.041
 job+0.039
 accomplishment+0.038
Neg logit contributions
Did-0.054
pped-0.053
 lying-0.053
cher-0.053
ting-0.052
Token com
Token ID401
Feature activation-1.03e+01

Pos logit contributions
 breath+0.023
 common+0.022
 purpose+0.021
 job+0.019
 accomplishment+0.019
Neg logit contributions
Did-0.028
 lying-0.028
cher-0.028
pped-0.027
 register-0.027
Tokenfy
Token ID24928
Feature activation-7.97e+00

Pos logit contributions
 purpose+0.227
 breath+0.221
 common+0.217
 sight+0.198
 avail+0.190
Neg logit contributions
washer-0.208
boards-0.201
cher-0.197
auc-0.194
oll-0.192
Token position
Token ID2292
Feature activation-7.55e-01

Pos logit contributions
 common+0.165
 breath+0.144
 choice+0.141
 purpose+0.139
 accomplishment+0.137
Neg logit contributions
 lying-0.178
uit-0.163
pped-0.162
box-0.162
boards-0.159
Token and
Token ID290
Feature activation-1.37e+00

Pos logit contributions
 breath+0.012
 common+0.011
 purpose+0.010
 job+0.009
 accomplishment+0.009
Neg logit contributions
Did-0.022
cher-0.022
pped-0.022
 lying-0.021
ter-0.021
Token soon
Token ID2582
Feature activation-5.25e+00

Pos logit contributions
 breath+0.031
 common+0.029
 purpose+0.027
 job+0.026
 accomplishment+0.026
Neg logit contributions
Did-0.030
cher-0.030
pped-0.030
ter-0.029
 lying-0.029
Token fell
Token ID3214
Feature activation-9.58e-01

Pos logit contributions
 breath+0.112
 common+0.108
 purpose+0.101
 job+0.096
 accomplishment+0.094
Neg logit contributions
Did-0.121
cher-0.119
pped-0.119
 lying-0.118
ter-0.116
Token c
Token ID269
Feature activation-9.30e-01

Pos logit contributions
 breath+0.023
 common+0.022
 purpose+0.021
 job+0.020
 accomplishment+0.019
Neg logit contributions
Did-0.020
cher-0.019
pped-0.019
 lying-0.019
ting-0.019
Tokenuddled
Token ID32745
Feature activation+1.16e-01

Pos logit contributions
 common+0.028
 breath+0.028
 purpose+0.026
 accomplishment+0.025
 job+0.024
Neg logit contributions
pped-0.014
Did-0.013
 lying-0.013
boards-0.013
cher-0.013
Token up
Token ID510
Feature activation-4.55e+00

Pos logit contributions
Did+0.003
cher+0.003
pped+0.002
 register+0.002
ode+0.002
Neg logit contributions
 breath-0.003
 common-0.003
 purpose-0.002
 job-0.002
 accomplishment-0.002
Token in
Token ID287
Feature activation-2.01e+00

Pos logit contributions
 breath+0.097
 common+0.094
 purpose+0.090
 job+0.084
 accomplishment+0.082
Neg logit contributions
Did-0.103
 lying-0.103
pped-0.102
cher-0.101
ter-0.099
Token a
Token ID257
Feature activation-9.58e-01

Pos logit contributions
 breath+0.041
 common+0.039
 purpose+0.037
 job+0.035
 accomplishment+0.034
Neg logit contributions
Did-0.048
cher-0.048
pped-0.048
 lying-0.047
ter-0.046
Token com
Token ID401
Feature activation-1.02e+01

Pos logit contributions
 common+0.019
 breath+0.019
 purpose+0.017
 accomplishment+0.017
 job+0.016
Neg logit contributions
 lying-0.023
Did-0.022
cher-0.022
 register-0.021
washer-0.021
Tokenfy
Token ID24928
Feature activation-7.92e+00

Pos logit contributions
 purpose+0.217
 breath+0.215
 common+0.200
 sight+0.191
 accomplishment+0.186
Neg logit contributions
washer-0.219
cher-0.213
boards-0.199
oll-0.197
auc-0.195
Token blanket
Token ID18447
Feature activation-1.60e+00

Pos logit contributions
 common+0.176
 accomplishment+0.154
 choice+0.147
 judgement+0.145
 lesson+0.141
Neg logit contributions
 lying-0.170
box-0.152
uit-0.151
pped-0.149
ode-0.146
Token and
Token ID290
Feature activation-1.27e+00

Pos logit contributions
 breath+0.021
 common+0.019
 purpose+0.017
 job+0.016
 accomplishment+0.015
Neg logit contributions
Did-0.051
pped-0.049
cher-0.049
 lying-0.048
ode-0.048
Token soon
Token ID2582
Feature activation-5.06e+00

Pos logit contributions
 breath+0.028
 common+0.026
 purpose+0.024
 job+0.024
 work+0.023
Neg logit contributions
Did-0.029
cher-0.029
pped-0.029
 register-0.028
ting-0.028
Token he
Token ID339
Feature activation-6.49e-01

Pos logit contributions
 breath+0.106
 common+0.098
 purpose+0.093
 job+0.089
 accomplishment+0.087
Neg logit contributions
Did-0.121
cher-0.118
pped-0.116
ode-0.115
ting-0.114
Token the
Token ID262
Feature activation+2.65e+00

Pos logit contributions
 breath+0.067
 common+0.064
 purpose+0.059
 job+0.056
 accomplishment+0.054
Neg logit contributions
Did-0.103
cher-0.103
pped-0.102
 register-0.100
ting-0.098
Token spider
Token ID19230
Feature activation+1.17e+00

Pos logit contributions
 lying+0.059
Did+0.059
ter+0.057
cher+0.057
pped+0.056
Neg logit contributions
 breath-0.058
 common-0.056
 purpose-0.051
 job-0.048
 accomplishment-0.048
Token had
Token ID550
Feature activation+2.87e+00

Pos logit contributions
Did+0.024
 register+0.023
pped+0.023
cher+0.023
ode+0.022
Neg logit contributions
 breath-0.028
 common-0.026
 own-0.024
 purpose-0.024
 work-0.024
Token loved
Token ID6151
Feature activation-9.25e-02

Pos logit contributions
Did+0.062
cher+0.061
 register+0.059
ting+0.059
ode+0.058
Neg logit contributions
 breath-0.068
 common-0.064
 purpose-0.063
 job-0.058
 work-0.058
Token his
Token ID465
Feature activation-3.45e-02

Pos logit contributions
 breath+0.002
 common+0.002
 purpose+0.002
 job+0.001
 long+0.001
Neg logit contributions
cher-0.002
Did-0.002
pped-0.002
 register-0.002
ode-0.002
Token com
Token ID401
Feature activation-1.01e+01

Pos logit contributions
 common+0.001
 breath+0.001
 purpose+0.001
+0.001
 accomplishment+0.001
Neg logit contributions
 lying-0.001
ter-0.001
cher-0.001
Did-0.001
pled-0.001
Tokenfy
Token ID24928
Feature activation-5.93e+00

Pos logit contributions
 breath+0.181
 common+0.156
 purpose+0.148
 sight+0.140
 accomplishment+0.136
Neg logit contributions
pled-0.254
ode-0.251
cher-0.241
pped-0.240
oll-0.238
Token cob
Token ID22843
Feature activation-1.78e+00

Pos logit contributions
 common+0.126
 breath+0.118
 accomplishment+0.106
 purpose+0.104
+0.103
Neg logit contributions
 lying-0.132
ter-0.127
pped-0.125
ode-0.123
Did-0.122
Tokenweb
Token ID12384
Feature activation-6.13e-01

Pos logit contributions
 breath+0.054
 common+0.050
 purpose+0.050
 job+0.048
 accomplishment+0.047
Neg logit contributions
Did-0.026
pped-0.024
cher-0.024
 register-0.024
 lying-0.023
Token home
Token ID1363
Feature activation-1.53e+00

Pos logit contributions
 breath+0.009
 common+0.008
 purpose+0.008
 job+0.007
 accomplishment+0.007
Neg logit contributions
Did-0.018
pped-0.018
cher-0.018
 lying-0.018
ting-0.018
Token.
Token ID13
Feature activation+2.41e-01

Pos logit contributions
 breath+0.017
 purpose+0.014
 common+0.014
 job+0.013
 accomplishment+0.012
Neg logit contributions
Did-0.052
pped-0.051
cher-0.050
ode-0.049
ting-0.048
Token away
Token ID1497
Feature activation-3.22e+00

Pos logit contributions
ting+0.023
ter+0.023
 lying+0.023
cher+0.023
pped+0.023
Neg logit contributions
 breath-0.018
 common-0.017
 purpose-0.016
 job-0.015
 accomplishment-0.015
Token,
Token ID11
Feature activation+6.12e-01

Pos logit contributions
 breath+0.058
 common+0.058
 purpose+0.055
 job+0.051
 accomplishment+0.048
Neg logit contributions
 lying-0.082
ter-0.081
cher-0.080
ting-0.080
Did-0.079
Token a
Token ID257
Feature activation-1.13e+00

Pos logit contributions
Did+0.014
cher+0.014
pped+0.014
 lying+0.014
ter+0.014
Neg logit contributions
 breath-0.013
 common-0.012
 purpose-0.011
 job-0.011
 accomplishment-0.011
Token bright
Token ID6016
Feature activation-4.84e+00

Pos logit contributions
 breath+0.020
 common+0.019
 purpose+0.017
 job+0.016
 accomplishment+0.016
Neg logit contributions
Did-0.030
 lying-0.030
cher-0.029
ter-0.028
pped-0.028
Token blue
Token ID4171
Feature activation-3.02e+00

Pos logit contributions
 common+0.086
 breath+0.082
 purpose+0.077
powers+0.073
 job+0.071
Neg logit contributions
 lying-0.124
ter-0.120
Did-0.117
pped-0.115
pled-0.113
Token spe
Token ID693
Feature activation-9.91e+00

Pos logit contributions
 breath+0.061
 common+0.056
 purpose+0.054
 job+0.051
 accomplishment+0.049
Neg logit contributions
cher-0.075
Did-0.075
pped-0.073
 lying-0.071
 register-0.071
Tokenck
Token ID694
Feature activation+1.43e+00

Pos logit contributions
 Today+0.216
 job+0.215
+0.212
 breath+0.209
 purpose+0.208
Neg logit contributions
pped-0.179
ode-0.164
pled-0.157
ter-0.150
coon-0.149
Token in
Token ID287
Feature activation-5.92e-01

Pos logit contributions
Did+0.032
cher+0.029
ode+0.029
boards+0.028
 register+0.028
Neg logit contributions
 breath-0.035
 common-0.028
 home-0.028
 long-0.028
 purpose-0.027
Token the
Token ID262
Feature activation+1.75e+00

Pos logit contributions
 breath+0.011
 common+0.010
 purpose+0.010
 job+0.010
 accomplishment+0.009
Neg logit contributions
Did-0.015
pped-0.014
 lying-0.014
ting-0.014
ter-0.014
Token sky
Token ID6766
Feature activation-1.21e+00

Pos logit contributions
Did+0.041
pped+0.040
cher+0.040
 lying+0.040
ter+0.039
Neg logit contributions
 breath-0.037
 common-0.035
 purpose-0.033
 job-0.031
 accomplishment-0.031
Token.
Token ID13
Feature activation+5.33e-01

Pos logit contributions
 breath+0.017
 common+0.016
 purpose+0.014
 job+0.013
 accomplishment+0.013
Neg logit contributions
Did-0.037
cher-0.036
pped-0.036
 lying-0.036
ter-0.035
Token\n
Token ID198
Feature activation-2.18e+00

Pos logit contributions
Did+0.000
 lying+0.000
ting+0.000
pped+0.000
boards+0.000
Neg logit contributions
 breath-0.000
 common-0.000
 purpose-0.000
 accomplishment-0.000
 job-0.000
Token\n
Token ID198
Feature activation-1.99e+00

Pos logit contributions
 breath+0.049
 common+0.044
 job+0.042
 accomplishment+0.042
 purpose+0.042
Neg logit contributions
Did-0.051
 lying-0.048
pped-0.048
boards-0.046
cher-0.046
TokenShe
Token ID3347
Feature activation-1.47e-01

Pos logit contributions
 breath+0.041
 common+0.038
 purpose+0.036
 job+0.034
 accomplishment+0.034
Neg logit contributions
Did-0.050
pped-0.048
 lying-0.048
cher-0.048
ting-0.046
Token had
Token ID550
Feature activation+3.19e+00

Pos logit contributions
 breath+0.004
 common+0.003
 purpose+0.003
 job+0.003
 own+0.003
Neg logit contributions
Did-0.003
cher-0.003
pped-0.003
ting-0.003
ter-0.003
Token a
Token ID257
Feature activation-6.13e-01

Pos logit contributions
cher+0.069
ter+0.069
ting+0.068
 register+0.066
pped+0.066
Neg logit contributions
 breath-0.075
 common-0.069
 long-0.069
 purpose-0.067
 own-0.064
Token com
Token ID401
Feature activation-9.80e+00

Pos logit contributions
 breath+0.010
 common+0.009
 purpose+0.009
 accomplishment+0.008
 job+0.008
Neg logit contributions
Did-0.016
 lying-0.016
cher-0.016
ode-0.016
 register-0.016
Tokenfy
Token ID24928
Feature activation-6.41e+00

Pos logit contributions
 purpose+0.184
 breath+0.178
 common+0.157
 sight+0.153
 avail+0.141
Neg logit contributions
cher-0.240
ode-0.221
pled-0.220
boards-0.218
asing-0.218
Token bed
Token ID3996
Feature activation-8.99e-01

Pos logit contributions
 common+0.127
 breath+0.116
 purpose+0.114
 accomplishment+0.109
powers+0.105
Neg logit contributions
 lying-0.147
pped-0.139
ode-0.136
uit-0.134
box-0.134
Token and
Token ID290
Feature activation-5.88e-01

Pos logit contributions
 breath+0.014
 job+0.012
 common+0.012
 purpose+0.012
 long+0.012
Neg logit contributions
Did-0.026
pped-0.026
cher-0.025
ode-0.025
ter-0.025
Token lots
Token ID6041
Feature activation-2.37e+00

Pos logit contributions
 breath+0.013
 common+0.012
 job+0.012
 long+0.012
 purpose+0.012
Neg logit contributions
cher-0.013
Did-0.013
pped-0.013
ting-0.013
ter-0.013
Token of
Token ID286
Feature activation+5.33e-01

Pos logit contributions
 breath+0.058
 common+0.056
 purpose+0.053
 job+0.050
 accomplishment+0.049
Neg logit contributions
Did-0.048
pped-0.047
cher-0.047
 lying-0.047
ting-0.046
Token for
Token ID329
Feature activation+1.96e+00

Pos logit contributions
Did+0.119
pped+0.118
cher+0.115
 register+0.115
ode+0.113
Neg logit contributions
 breath-0.069
 common-0.063
 purpose-0.056
 long-0.055
 job-0.054
Token food
Token ID2057
Feature activation+1.28e+00

Pos logit contributions
cher+0.054
Did+0.054
pped+0.054
 register+0.052
ter+0.052
Neg logit contributions
 breath-0.036
 common-0.033
 purpose-0.031
 job-0.029
 accomplishment-0.029
Token and
Token ID290
Feature activation-8.31e-01

Pos logit contributions
Did+0.041
pped+0.041
cher+0.039
 register+0.039
ting+0.039
Neg logit contributions
 breath-0.016
 common-0.015
 purpose-0.013
 job-0.012
 long-0.012
Token made
Token ID925
Feature activation+2.91e+00

Pos logit contributions
 breath+0.017
 common+0.016
 long+0.015
 purpose+0.014
 work+0.014
Neg logit contributions
pped-0.021
 register-0.021
cher-0.021
Did-0.020
ter-0.020
Token a
Token ID257
Feature activation-1.48e-01

Pos logit contributions
Did+0.063
cher+0.062
pped+0.061
 lying+0.060
ter+0.059
Neg logit contributions
 breath-0.067
 common-0.064
 purpose-0.061
 job-0.058
 accomplishment-0.057
Token com
Token ID401
Feature activation-9.77e+00

Pos logit contributions
 breath+0.003
 common+0.003
 purpose+0.002
 accomplishment+0.002
 job+0.002
Neg logit contributions
Did-0.004
 lying-0.004
cher-0.003
ode-0.003
 register-0.003
Tokenfy
Token ID24928
Feature activation-6.57e+00

Pos logit contributions
 breath+0.185
 purpose+0.182
 common+0.173
 sight+0.158
 Wednesday+0.142
Neg logit contributions
cher-0.236
ode-0.227
vell-0.217
washer-0.215
pled-0.212
Token home
Token ID1363
Feature activation-1.23e+00

Pos logit contributions
 common+0.122
 breath+0.112
 purpose+0.103
powers+0.099
+0.097
Neg logit contributions
 lying-0.157
ode-0.146
ter-0.144
uit-0.140
pped-0.140
Token in
Token ID287
Feature activation-1.83e+00

Pos logit contributions
 breath+0.016
 common+0.014
 purpose+0.013
 job+0.012
 accomplishment+0.012
Neg logit contributions
Did-0.040
pped-0.039
cher-0.038
ting-0.037
ode-0.037
Token the
Token ID262
Feature activation+1.21e+00

Pos logit contributions
 breath+0.037
 common+0.035
 purpose+0.033
 job+0.032
 accomplishment+0.031
Neg logit contributions
Did-0.044
cher-0.043
pped-0.043
 lying-0.043
ter-0.042
Token woods
Token ID16479
Feature activation-5.38e-01

Pos logit contributions
Did+0.030
cher+0.030
pped+0.029
 lying+0.029
ter+0.029
Neg logit contributions
 breath-0.024
 common-0.023
 purpose-0.022
 job-0.021
 accomplishment-0.020
Token they
Token ID484
Feature activation+2.02e+00

Pos logit contributions
 breath+0.017
 common+0.016
 long+0.014
 purpose+0.014
 accomplishment+0.014
Neg logit contributions
cher-0.037
ode-0.037
pped-0.036
Did-0.035
 register-0.035
Token had
Token ID550
Feature activation+3.99e+00

Pos logit contributions
auc+0.049
ode+0.047
ting+0.046
 register+0.046
umbers+0.045
Neg logit contributions
 breath-0.041
 common-0.038
 long-0.037
 own-0.037
 care-0.036
Token not
Token ID407
Feature activation-1.54e+00

Pos logit contributions
auc+0.089
ting+0.089
boards+0.088
cher+0.087
 register+0.086
Neg logit contributions
 breath-0.090
 common-0.089
 purpose-0.084
 job-0.079
 own-0.078
Token taken
Token ID2077
Feature activation+7.38e-01

Pos logit contributions
 breath+0.037
 common+0.036
 purpose+0.033
 job+0.032
 work+0.032
Neg logit contributions
cher-0.032
Did-0.032
ting-0.031
 register-0.031
pped-0.031
Token the
Token ID262
Feature activation+2.34e+00

Pos logit contributions
ting+0.015
 lying+0.015
ter+0.015
Did+0.015
pped+0.015
Neg logit contributions
 breath-0.016
 common-0.016
 purpose-0.015
 accomplishment-0.015
 job-0.014
Token com
Token ID401
Feature activation-9.69e+00

Pos logit contributions
Did+0.055
cher+0.055
 lying+0.054
pped+0.054
ter+0.053
Neg logit contributions
 breath-0.049
 common-0.047
 purpose-0.044
 job-0.041
 accomplishment-0.041
Tokenbs
Token ID1443
Feature activation-8.71e-01

Pos logit contributions
 breath+0.174
 purpose+0.154
 common+0.145
 sight+0.139
 accomplishment+0.135
Neg logit contributions
cher-0.259
pled-0.249
 register-0.248
ting-0.245
boards-0.244
Token.
Token ID13
Feature activation+8.16e-01

Pos logit contributions
 breath+0.013
 common+0.012
 purpose+0.011
 job+0.011
 own+0.010
Neg logit contributions
Did-0.026
pped-0.025
cher-0.025
 lying-0.025
ode-0.024
Token They
Token ID1119
Feature activation-3.21e-01

Pos logit contributions
Did+0.022
 lying+0.020
 Why+0.020
ter+0.020
pped+0.020
Neg logit contributions
 breath-0.017
 accomplishment-0.013
 common-0.013
 purpose-0.013
 job-0.013
Token learned
Token ID4499
Feature activation+2.37e+00

Pos logit contributions
 breath+0.007
 common+0.007
 purpose+0.007
 job+0.006
 own+0.006
Neg logit contributions
cher-0.007
Did-0.007
pped-0.007
 lying-0.007
ting-0.007
Token a
Token ID257
Feature activation-6.02e-02

Pos logit contributions
Did+0.050
cher+0.049
pped+0.049
 lying+0.048
ter+0.047
Neg logit contributions
 breath-0.056
 common-0.054
 purpose-0.051
 job-0.049
 accomplishment-0.048
Token to
Token ID284
Feature activation-1.10e+00

Pos logit contributions
 breath+0.005
 common+0.005
 purpose+0.005
 job+0.004
 accomplishment+0.004
Neg logit contributions
Did-0.008
cher-0.008
pped-0.008
 lying-0.008
ter-0.008
Token make
Token ID787
Feature activation+3.71e+00

Pos logit contributions
 breath+0.019
 common+0.018
 purpose+0.017
 job+0.016
 own+0.016
Neg logit contributions
Did-0.031
cher-0.030
pped-0.030
boards-0.029
 register-0.029
Token it
Token ID340
Feature activation-6.62e-01

Pos logit contributions
cher+0.076
 lying+0.076
Did+0.076
pped+0.076
ter+0.074
Neg logit contributions
 breath-0.087
 common-0.085
 purpose-0.078
 job-0.075
 accomplishment-0.074
Token even
Token ID772
Feature activation-2.81e+00

Pos logit contributions
 breath+0.014
 common+0.014
 purpose+0.013
 accomplishment+0.012
 job+0.012
Neg logit contributions
 lying-0.015
cher-0.015
Did-0.014
ter-0.014
pped-0.014
Token more
Token ID517
Feature activation-4.15e+00

Pos logit contributions
 breath+0.067
 common+0.064
 purpose+0.060
 job+0.058
 accomplishment+0.056
Neg logit contributions
Did-0.059
cher-0.058
pped-0.058
 lying-0.057
ter-0.056
Token com
Token ID401
Feature activation-9.66e+00

Pos logit contributions
 breath+0.080
 common+0.072
eth+0.068
by+0.063
 purpose+0.063
Neg logit contributions
 lying-0.106
pled-0.095
cher-0.095
ter-0.095
Did-0.093
Tokenfy
Token ID24928
Feature activation-5.79e+00

Pos logit contributions
 breath+0.225
 purpose+0.208
 common+0.206
 sight+0.192
 accomplishment+0.176
Neg logit contributions
cher-0.204
pled-0.193
ode-0.190
ting-0.190
pped-0.189
Token!"
Token ID2474
Feature activation-2.39e+00

Pos logit contributions
 breath+0.083
 common+0.082
 purpose+0.071
 accomplishment+0.066
 job+0.063
Neg logit contributions
 lying-0.171
ter-0.170
cher-0.169
ting-0.168
pped-0.166
Token They
Token ID1119
Feature activation+1.77e-02

Pos logit contributions
 breath+0.046
 common+0.043
 purpose+0.040
 job+0.038
 accomplishment+0.037
Neg logit contributions
Did-0.061
cher-0.060
pped-0.060
 lying-0.059
ting-0.059
Token both
Token ID1111
Feature activation-2.68e+00

Pos logit contributions
Did+0.000
cher+0.000
ting+0.000
 register+0.000
 lying+0.000
Neg logit contributions
 breath-0.000
 common-0.000
 work-0.000
 job-0.000
 long-0.000
Token gigg
Token ID30442
Feature activation-3.08e+00

Pos logit contributions
 breath+0.067
 common+0.064
 purpose+0.061
 job+0.059
 accomplishment+0.057
Neg logit contributions
Did-0.053
cher-0.052
pped-0.052
 lying-0.051
ter-0.050
Token almost
Token ID2048
Feature activation-4.02e+00

Pos logit contributions
 breath+0.036
 common+0.034
 purpose+0.033
 job+0.032
 accomplishment+0.031
Neg logit contributions
Did-0.020
cher-0.020
pped-0.019
 register-0.019
 lying-0.019
Token fell
Token ID3214
Feature activation-1.21e+00

Pos logit contributions
 breath+0.091
 common+0.086
 purpose+0.080
 job+0.078
 accomplishment+0.076
Neg logit contributions
Did-0.089
cher-0.088
pped-0.085
 register-0.085
ode-0.084
Token asleep
Token ID16039
Feature activation-4.86e+00

Pos logit contributions
 breath+0.026
 common+0.025
 purpose+0.023
 job+0.022
 accomplishment+0.022
Neg logit contributions
 lying-0.028
Did-0.027
pped-0.027
cher-0.027
ter-0.027
Token in
Token ID287
Feature activation-2.31e+00

Pos logit contributions
 breath+0.086
 common+0.085
 purpose+0.078
 job+0.071
 accomplishment+0.069
Neg logit contributions
cher-0.128
Did-0.128
 lying-0.128
pped-0.126
ter-0.125
Token his
Token ID465
Feature activation-9.60e-01

Pos logit contributions
 common+0.043
 breath+0.042
 purpose+0.040
 job+0.039
 accomplishment+0.036
Neg logit contributions
Did-0.056
 lying-0.056
pped-0.056
cher-0.054
ting-0.053
Token com
Token ID401
Feature activation-9.66e+00

Pos logit contributions
 common+0.019
 breath+0.017
 purpose+0.017
 accomplishment+0.015
away+0.015
Neg logit contributions
 lying-0.023
ode-0.020
 register-0.019
washer-0.019
Did-0.019
Tokenfy
Token ID24928
Feature activation-6.88e+00

Pos logit contributions
 purpose+0.207
 common+0.206
 breath+0.199
 accomplishment+0.181
 sight+0.178
Neg logit contributions
ode-0.185
cher-0.180
boards-0.180
pled-0.179
washer-0.178
Token chair
Token ID5118
Feature activation-1.43e+00

Pos logit contributions
 common+0.157
 accomplishment+0.122
 breath+0.122
+0.120
 choice+0.119
Neg logit contributions
 lying-0.150
uit-0.138
box-0.135
pped-0.134
boards-0.134
Token.
Token ID13
Feature activation-2.26e-01

Pos logit contributions
 breath+0.020
 common+0.019
 purpose+0.017
 job+0.016
 accomplishment+0.015
Neg logit contributions
Did-0.043
cher-0.043
pped-0.042
 lying-0.042
ter-0.042
Token When
Token ID1649
Feature activation-1.41e+00

Pos logit contributions
 breath+0.004
 common+0.004
 purpose+0.004
 job+0.003
 accomplishment+0.003
Neg logit contributions
Did-0.006
 lying-0.006
pped-0.006
cher-0.006
ting-0.006
Token he
Token ID339
Feature activation-5.64e-01

Pos logit contributions
 common+0.037
 breath+0.037
 purpose+0.034
powers+0.033
 own+0.032
Neg logit contributions
 lying-0.026
pped-0.024
Did-0.023
 register-0.023
ting-0.022
Token,
Token ID11
Feature activation+1.05e+00

Pos logit contributions
 breath+0.029
 common+0.027
 purpose+0.025
 job+0.024
 accomplishment+0.023
Neg logit contributions
Did-0.050
cher-0.049
pped-0.049
ode-0.048
 register-0.048
Token the
Token ID262
Feature activation+1.70e+00

Pos logit contributions
cher+0.028
pped+0.028
 register+0.028
Did+0.028
boards+0.027
Neg logit contributions
 breath-0.019
 common-0.017
 purpose-0.015
 accomplishment-0.015
 job-0.015
Token bird
Token ID6512
Feature activation+1.77e+00

Pos logit contributions
Did+0.043
cher+0.042
pped+0.042
 lying+0.041
ting+0.041
Neg logit contributions
 breath-0.034
 common-0.032
 purpose-0.030
 job-0.028
 accomplishment-0.027
Token had
Token ID550
Feature activation+3.08e+00

Pos logit contributions
Did+0.035
cher+0.034
pped+0.034
 lying+0.034
ter+0.033
Neg logit contributions
 breath-0.044
 common-0.042
 purpose-0.040
 job-0.038
 accomplishment-0.038
Token a
Token ID257
Feature activation-3.39e-01

Pos logit contributions
Did+0.060
cher+0.059
pped+0.059
ting+0.058
 register+0.058
Neg logit contributions
 breath-0.078
 common-0.075
 purpose-0.072
 job-0.067
 long-0.066
Token com
Token ID401
Feature activation-9.63e+00

Pos logit contributions
 breath+0.006
 common+0.006
 purpose+0.006
 accomplishment+0.005
 job+0.005
Neg logit contributions
Did-0.008
 lying-0.008
cher-0.008
ode-0.008
ter-0.007
Tokenfy
Token ID24928
Feature activation-6.34e+00

Pos logit contributions
 purpose+0.174
 breath+0.170
 common+0.158
 avail+0.137
 sight+0.136
Neg logit contributions
cher-0.233
ode-0.227
pled-0.224
pped-0.216
Did-0.212
Token home
Token ID1363
Feature activation-6.67e-01

Pos logit contributions
 common+0.121
 breath+0.106
 purpose+0.101
 accomplishment+0.096
powers+0.095
Neg logit contributions
 lying-0.154
pped-0.145
ode-0.145
ter-0.144
uit-0.142
Token,
Token ID11
Feature activation+8.65e-01

Pos logit contributions
 breath+0.008
 common+0.007
 job+0.007
 purpose+0.006
 home+0.006
Neg logit contributions
Did-0.022
pped-0.021
cher-0.021
ode-0.021
ting-0.020
Token and
Token ID290
Feature activation-9.38e-01

Pos logit contributions
cher+0.024
 register+0.024
pped+0.024
Did+0.023
ting+0.023
Neg logit contributions
 breath-0.015
 common-0.013
 purpose-0.012
 long-0.012
 accomplishment-0.012
Token the
Token ID262
Feature activation+1.37e+00

Pos logit contributions
 breath+0.018
 common+0.017
 purpose+0.016
 job+0.016
 accomplishment+0.015
Neg logit contributions
pped-0.024
Did-0.024
cher-0.023
ode-0.023
 register-0.023
Token Soon
Token ID15894
Feature activation-3.70e+00

Pos logit contributions
Did+0.009
 lying+0.008
pped+0.008
ting+0.008
boards+0.008
Neg logit contributions
 breath-0.007
 common-0.006
 purpose-0.006
 job-0.005
 accomplishment-0.005
Token,
Token ID11
Feature activation+1.05e+00

Pos logit contributions
 breath+0.047
 common+0.046
 purpose+0.040
 job+0.034
 own+0.034
Neg logit contributions
pped-0.115
Did-0.115
 lying-0.115
ter-0.113
ting-0.113
Token she
Token ID673
Feature activation-2.66e-01

Pos logit contributions
cher+0.032
ode+0.031
auc+0.030
pped+0.030
 register+0.029
Neg logit contributions
 breath-0.017
 long-0.016
 job-0.015
 common-0.015
 work-0.014
Token had
Token ID550
Feature activation+2.84e+00

Pos logit contributions
 breath+0.006
 common+0.005
 work+0.005
 happily+0.005
 own+0.005
Neg logit contributions
 register-0.006
ting-0.006
cher-0.006
Did-0.006
 lying-0.006
Token a
Token ID257
Feature activation-1.83e-01

Pos logit contributions
cher+0.054
Did+0.054
pped+0.053
ting+0.052
ter+0.052
Neg logit contributions
 breath-0.073
 common-0.069
 purpose-0.067
 job-0.064
 accomplishment-0.062
Token com
Token ID401
Feature activation-9.60e+00

Pos logit contributions
 breath+0.003
 common+0.003
 purpose+0.003
 accomplishment+0.002
 job+0.002
Neg logit contributions
Did-0.005
 lying-0.005
cher-0.005
pped-0.005
ode-0.005
Tokenfy
Token ID24928
Feature activation-6.34e+00

Pos logit contributions
 purpose+0.175
 breath+0.175
 common+0.160
 sight+0.143
 avail+0.137
Neg logit contributions
cher-0.234
ode-0.225
pled-0.223
pped-0.217
boards-0.216
Token home
Token ID1363
Feature activation-5.16e-01

Pos logit contributions
 common+0.129
 breath+0.114
 purpose+0.111
 accomplishment+0.107
 own+0.104
Neg logit contributions
 lying-0.145
pped-0.138
ter-0.136
uit-0.133
ode-0.133
Token of
Token ID286
Feature activation+1.28e+00

Pos logit contributions
 breath+0.007
 common+0.006
 job+0.006
 purpose+0.005
 home+0.005
Neg logit contributions
Did-0.017
pped-0.016
cher-0.016
ode-0.016
ting-0.016
Token her
Token ID607
Feature activation-1.81e-01

Pos logit contributions
cher+0.034
ode+0.033
Did+0.033
 register+0.033
pped+0.033
Neg logit contributions
 breath-0.025
 purpose-0.023
 common-0.023
 work-0.022
 long-0.021
Token own
Token ID898
Feature activation+1.17e+00

Pos logit contributions
 common+0.003
 breath+0.003
 purpose+0.003
 accomplishment+0.002
away+0.002
Neg logit contributions
 lying-0.005
ter-0.005
ode-0.004
Did-0.004
 register-0.004
Token a
Token ID257
Feature activation-7.78e-01

Pos logit contributions
 breath+0.037
 common+0.035
 purpose+0.034
 job+0.032
 accomplishment+0.031
Neg logit contributions
cher-0.027
Did-0.027
pped-0.026
 register-0.025
ting-0.025
Token modern
Token ID3660
Feature activation-3.63e+00

Pos logit contributions
 breath+0.018
 common+0.017
 accomplishment+0.017
 choice+0.015
 purpose+0.015
Neg logit contributions
Did-0.017
 lying-0.017
ode-0.015
 living-0.015
ter-0.015
Token cave
Token ID11527
Feature activation-1.34e+00

Pos logit contributions
 common+0.085
 breath+0.084
 accomplishment+0.080
powers+0.070
 care+0.070
Neg logit contributions
 lying-0.074
ter-0.071
ode-0.070
box-0.067
Did-0.067
Token with
Token ID351
Feature activation+3.72e+00

Pos logit contributions
 breath+0.027
 common+0.026
 purpose+0.024
 job+0.023
 accomplishment+0.022
Neg logit contributions
Did-0.033
cher-0.033
pped-0.032
 register-0.031
ting-0.031
Token a
Token ID257
Feature activation+8.80e-02

Pos logit contributions
cher+0.103
Did+0.100
pped+0.100
 register+0.099
ting+0.098
Neg logit contributions
 breath-0.067
 common-0.061
 purpose-0.059
 long-0.056
 job-0.054
Token com
Token ID401
Feature activation-9.58e+00

Pos logit contributions
 lying+0.002
Did+0.002
cher+0.002
ter+0.002
ode+0.002
Neg logit contributions
 breath-0.002
 common-0.002
 accomplishment-0.002
 purpose-0.002
away-0.001
Tokenfy
Token ID24928
Feature activation-6.45e+00

Pos logit contributions
 breath+0.184
 purpose+0.175
 common+0.156
 avail+0.147
 accomplishment+0.146
Neg logit contributions
cher-0.218
ode-0.212
washer-0.211
vell-0.208
 lying-0.208
Token bed
Token ID3996
Feature activation-1.54e+00

Pos logit contributions
 common+0.137
 breath+0.119
 accomplishment+0.117
 choice+0.114
 purpose+0.110
Neg logit contributions
 lying-0.148
ter-0.140
ode-0.135
uit-0.132
pped-0.129
Token.
Token ID13
Feature activation+1.53e-03

Pos logit contributions
 breath+0.016
 job+0.014
 common+0.014
 purpose+0.013
 long+0.013
Neg logit contributions
Did-0.052
cher-0.050
ode-0.050
pped-0.050
 register-0.048
Token One
Token ID1881
Feature activation-7.69e-01

Pos logit contributions
Did+0.000
 lying+0.000
boards+0.000
ter+0.000
pped+0.000
Neg logit contributions
 breath-0.000
 common-0.000
 purpose-0.000
 accomplishment-0.000
 job-0.000
Token day
Token ID1110
Feature activation-3.64e+00

Pos logit contributions
 breath+0.011
 common+0.010
 purpose+0.009
 accomplishment+0.008
 judgement+0.007
Neg logit contributions
 lying-0.024
Did-0.023
ter-0.023
pped-0.022
washer-0.022
Token away
Token ID1497
Feature activation-3.84e+00

Pos logit contributions
ting+0.006
 lying+0.006
 register+0.006
pped+0.006
cher+0.006
Neg logit contributions
 common-0.007
 purpose-0.007
 accomplishment-0.007
 breath-0.007
inkle-0.007
Token.
Token ID13
Feature activation+3.73e-01

Pos logit contributions
anna+0.098
+0.095
 purpose+0.094
 breath+0.090
 avail+0.089
Neg logit contributions
 lying-0.061
ter-0.058
ting-0.054
 who-0.053
 coming-0.051
Token They
Token ID1119
Feature activation-2.81e-01

Pos logit contributions
Did+0.009
 Why+0.009
 lying+0.008
 How+0.008
ting+0.008
Neg logit contributions
 breath-0.009
 accomplishment-0.007
 choice-0.007
 job-0.007
 work-0.007
Token had
Token ID550
Feature activation+3.08e+00

Pos logit contributions
 breath+0.007
 common+0.006
 purpose+0.005
 own+0.005
 job+0.005
Neg logit contributions
cher-0.006
Did-0.006
ter-0.006
pped-0.006
ting-0.006
Token no
Token ID645
Feature activation-1.67e+00

Pos logit contributions
Did+0.061
pped+0.060
cher+0.060
 lying+0.059
ter+0.058
Neg logit contributions
 breath-0.076
 common-0.073
 purpose-0.070
 job-0.067
 accomplishment-0.066
Token com
Token ID401
Feature activation-9.57e+00

Pos logit contributions
 common+0.035
 breath+0.033
 purpose+0.031
away+0.030
eth+0.029
Neg logit contributions
 lying-0.037
 register-0.035
Did-0.033
cher-0.033
ter-0.032
Tokenbs
Token ID1443
Feature activation+1.52e-01

Pos logit contributions
 breath+0.187
 purpose+0.181
 common+0.174
 sight+0.158
 job+0.158
Neg logit contributions
cher-0.236
Did-0.232
pped-0.230
pled-0.227
ode-0.221
Token and
Token ID290
Feature activation-2.00e+00

Pos logit contributions
ode+0.005
Did+0.005
cher+0.005
pped+0.004
boards+0.004
Neg logit contributions
 breath-0.002
 common-0.002
 home-0.002
 purpose-0.002
 work-0.002
Token no
Token ID645
Feature activation-1.72e+00

Pos logit contributions
 breath+0.051
 common+0.048
 long+0.045
 purpose+0.044
 job+0.044
Neg logit contributions
pped-0.040
ode-0.040
ting-0.039
cher-0.039
uit-0.039
Token fun
Token ID1257
Feature activation-6.05e+00

Pos logit contributions
 common+0.034
 breath+0.034
 purpose+0.031
 accomplishment+0.030
+0.030
Neg logit contributions
 lying-0.039
ter-0.037
Did-0.037
cher-0.037
 register-0.036
Token.
Token ID13
Feature activation+5.17e-01

Pos logit contributions
 breath+0.113
 common+0.110
 purpose+0.101
 job+0.094
 accomplishment+0.093
Neg logit contributions
 lying-0.156
Did-0.150
pped-0.149
ting-0.147
cher-0.146
Token wanted
Token ID2227
Feature activation+3.33e+00

Pos logit contributions
Did+0.006
ting+0.006
cher+0.006
 register+0.006
ode+0.006
Neg logit contributions
 breath-0.005
 work-0.005
 common-0.005
 happily-0.004
 care-0.004
Token to
Token ID284
Feature activation-1.05e+00

Pos logit contributions
cher+0.113
pped+0.110
Did+0.109
ode+0.105
ter+0.105
Neg logit contributions
 breath-0.042
 purpose-0.039
 common-0.038
 work-0.033
 long-0.032
Token give
Token ID1577
Feature activation+3.54e+00

Pos logit contributions
 breath+0.023
 common+0.022
 purpose+0.021
 job+0.020
 accomplishment+0.020
Neg logit contributions
Did-0.023
cher-0.023
pped-0.023
 lying-0.023
ter-0.022
Token her
Token ID607
Feature activation+8.48e-01

Pos logit contributions
Did+0.092
cher+0.091
pped+0.090
 lying+0.088
ter+0.086
Neg logit contributions
 breath-0.068
 common-0.064
 purpose-0.061
 job-0.057
 own-0.055
Token a
Token ID257
Feature activation+2.94e-01

Pos logit contributions
Did+0.019
cher+0.018
pped+0.018
ting+0.017
ter+0.017
Neg logit contributions
 breath-0.021
 common-0.020
 purpose-0.019
 job-0.018
 own-0.018
Token com
Token ID401
Feature activation-9.55e+00

Pos logit contributions
Did+0.008
pped+0.008
cher+0.008
 lying+0.008
ting+0.008
Neg logit contributions
 breath-0.005
 common-0.005
 purpose-0.004
 job-0.004
 accomplishment-0.004
Tokenfy
Token ID24928
Feature activation-6.14e+00

Pos logit contributions
 breath+0.167
 purpose+0.157
 common+0.142
 sight+0.139
 accomplishment+0.128
Neg logit contributions
cher-0.235
ode-0.224
washer-0.221
boards-0.220
oll-0.217
Token mattress
Token ID33388
Feature activation-1.48e+00

Pos logit contributions
 common+0.107
 breath+0.097
 purpose+0.093
 accomplishment+0.090
 own+0.084
Neg logit contributions
 lying-0.155
ter-0.145
ode-0.145
pped-0.145
cher-0.144
Token.
Token ID13
Feature activation-2.67e-01

Pos logit contributions
 breath+0.020
 common+0.018
 job+0.017
 purpose+0.016
 work+0.015
Neg logit contributions
Did-0.047
pped-0.046
ting-0.045
cher-0.045
ode-0.044
Token When
Token ID1649
Feature activation-1.61e+00

Pos logit contributions
 breath+0.005
 common+0.005
 purpose+0.004
 accomplishment+0.004
 job+0.004
Neg logit contributions
Did-0.007
boards-0.007
ting-0.007
ter-0.007
 lying-0.007
Token the
Token ID262
Feature activation+1.53e+00

Pos logit contributions
 breath+0.035
 common+0.034
 purpose+0.032
 accomplishment+0.030
 job+0.029
Neg logit contributions
Did-0.036
 lying-0.035
ter-0.034
pped-0.034
ting-0.034
Token to
Token ID284
Feature activation-1.01e+00

Pos logit contributions
Did+0.035
pped+0.033
cher+0.033
ter+0.032
ode+0.032
Neg logit contributions
 breath-0.017
 job-0.014
 long-0.014
 common-0.014
 purpose-0.014
Token carry
Token ID3283
Feature activation+2.81e+00

Pos logit contributions
 breath+0.020
 common+0.018
 purpose+0.017
 work+0.017
 job+0.017
Neg logit contributions
Did-0.027
pped-0.026
cher-0.026
 register-0.026
boards-0.026
Token supplies
Token ID9416
Feature activation+1.92e+00

Pos logit contributions
Did+0.076
cher+0.072
pped+0.071
ode+0.070
 lying+0.069
Neg logit contributions
 breath-0.055
 common-0.050
 purpose-0.048
 job-0.046
 work-0.045
Token and
Token ID290
Feature activation-9.34e-01

Pos logit contributions
Did+0.055
pped+0.053
cher+0.053
ter+0.052
 lying+0.051
Neg logit contributions
 breath-0.032
 common-0.029
 purpose-0.027
 job-0.026
 accomplishment-0.025
Token a
Token ID257
Feature activation-2.09e-01

Pos logit contributions
 breath+0.020
 common+0.018
 work+0.018
 job+0.018
 purpose+0.017
Neg logit contributions
Did-0.022
pped-0.022
cher-0.021
ter-0.021
 register-0.021
Token com
Token ID401
Feature activation-9.53e+00

Pos logit contributions
 breath+0.004
 common+0.004
 purpose+0.004
 job+0.003
 accomplishment+0.003
Neg logit contributions
Did-0.005
cher-0.005
 lying-0.005
pped-0.005
ter-0.005
Tokenfy
Token ID24928
Feature activation-6.49e+00

Pos logit contributions
 breath+0.176
 purpose+0.172
 common+0.163
 sight+0.149
 Wednesday+0.129
Neg logit contributions
cher-0.243
ode-0.223
washer-0.222
boards-0.221
pled-0.218
Token seat
Token ID5852
Feature activation+1.01e-01

Pos logit contributions
 common+0.129
 breath+0.122
 purpose+0.111
 accomplishment+0.105
 job+0.103
Neg logit contributions
 lying-0.152
ter-0.148
pped-0.148
ode-0.147
Did-0.145
Token for
Token ID329
Feature activation+1.52e+00

Pos logit contributions
Did+0.003
pped+0.003
ode+0.003
ting+0.003
cher+0.003
Neg logit contributions
 breath-0.002
 common-0.001
 job-0.001
 purpose-0.001
 work-0.001
Token him
Token ID683
Feature activation-1.13e+00

Pos logit contributions
Did+0.039
cher+0.038
pped+0.037
ode+0.036
ter+0.036
Neg logit contributions
 breath-0.031
 common-0.028
 purpose-0.028
 job-0.026
 accomplishment-0.026
Token to
Token ID284
Feature activation-2.45e+00

Pos logit contributions
 breath+0.017
 common+0.016
 purpose+0.015
 job+0.014
 accomplishment+0.013
Neg logit contributions
Did-0.034
pped-0.033
cher-0.033
ter-0.032
ting-0.032
Token but
Token ID475
Feature activation+4.54e-01

Pos logit contributions
pped+0.020
cher+0.020
Did+0.020
 register+0.020
ter+0.019
Neg logit contributions
 breath-0.018
 common-0.016
 purpose-0.015
 accomplishment-0.015
 job-0.015
Token only
Token ID691
Feature activation-2.47e+00

Pos logit contributions
Did+0.011
pped+0.011
cher+0.011
ode+0.010
 register+0.010
Neg logit contributions
 breath-0.009
 common-0.009
 purpose-0.008
 job-0.008
 accomplishment-0.008
Token in
Token ID287
Feature activation-2.69e+00

Pos logit contributions
 breath+0.048
 long+0.046
 job+0.044
 common+0.043
 work+0.042
Neg logit contributions
pped-0.064
auc-0.062
boards-0.062
Did-0.061
ode-0.061
Token his
Token ID465
Feature activation-9.70e-01

Pos logit contributions
 breath+0.053
 common+0.049
 purpose+0.046
 job+0.044
 accomplishment+0.043
Neg logit contributions
cher-0.069
Did-0.069
pped-0.068
ter-0.067
ting-0.066
Token big
Token ID1263
Feature activation-4.57e+00

Pos logit contributions
 common+0.015
 breath+0.015
 purpose+0.013
+0.012
 accomplishment+0.012
Neg logit contributions
 lying-0.026
cher-0.025
ode-0.025
Did-0.025
 register-0.025
Token com
Token ID401
Feature activation-9.51e+00

Pos logit contributions
 common+0.080
 purpose+0.072
 breath+0.070
 job+0.062
After+0.062
Neg logit contributions
 lying-0.113
ode-0.111
pled-0.109
Did-0.107
box-0.106
Tokenfy
Token ID24928
Feature activation-6.22e+00

Pos logit contributions
 purpose+0.192
 breath+0.186
 common+0.183
 sight+0.160
 choice+0.154
Neg logit contributions
cher-0.213
ode-0.212
asing-0.208
boards-0.202
pled-0.200
Token chair
Token ID5118
Feature activation-8.21e-01

Pos logit contributions
 common+0.120
 breath+0.107
 purpose+0.099
+0.098
 accomplishment+0.096
Neg logit contributions
 lying-0.148
ode-0.143
boards-0.142
box-0.140
cher-0.140
Token upstairs
Token ID26148
Feature activation-2.19e+00

Pos logit contributions
 breath+0.012
 common+0.011
 purpose+0.010
 job+0.010
 accomplishment+0.009
Neg logit contributions
Did-0.024
pped-0.024
cher-0.024
ter-0.023
ting-0.023
Token.
Token ID13
Feature activation+2.51e-01

Pos logit contributions
 breath+0.034
 common+0.031
 purpose+0.029
 job+0.027
 accomplishment+0.026
Neg logit contributions
Did-0.064
cher-0.063
pped-0.063
ter-0.062
 lying-0.062
Token He
Token ID679
Feature activation+1.84e-03

Pos logit contributions
Did+0.007
pped+0.007
cher+0.007
 lying+0.007
ter+0.007
Neg logit contributions
 breath-0.004
 common-0.004
 purpose-0.003
 job-0.003
 accomplishment-0.003
Token around
Token ID1088
Feature activation-1.45e+00

Pos logit contributions
 lying+0.034
ting+0.034
Did+0.034
pped+0.033
ter+0.032
Neg logit contributions
 common-0.043
 purpose-0.042
 breath-0.042
 job-0.038
 accomplishment-0.037
Token and
Token ID290
Feature activation-1.07e+00

Pos logit contributions
 common+0.026
 breath+0.026
 purpose+0.024
 job+0.023
 accomplishment+0.021
Neg logit contributions
Did-0.037
 lying-0.037
pped-0.036
ting-0.036
ter-0.036
Token saw
Token ID2497
Feature activation+4.50e+00

Pos logit contributions
 breath+0.024
 common+0.023
 purpose+0.021
 job+0.021
 accomplishment+0.020
Neg logit contributions
cher-0.024
pped-0.023
Did-0.023
 register-0.023
ode-0.022
Token a
Token ID257
Feature activation-3.90e-01

Pos logit contributions
cher+0.125
ode+0.124
Did+0.118
pped+0.117
ologist+0.113
Neg logit contributions
 breath-0.083
 long-0.078
 purpose-0.075
 sight-0.074
 home-0.072
Token big
Token ID1263
Feature activation-3.82e+00

Pos logit contributions
 breath+0.007
 common+0.006
 purpose+0.006
 job+0.006
 accomplishment+0.005
Neg logit contributions
Did-0.011
cher-0.010
pped-0.010
 lying-0.010
ter-0.010
Token com
Token ID401
Feature activation-9.45e+00

Pos logit contributions
 breath+0.067
 common+0.065
 purpose+0.060
 job+0.056
 accomplishment+0.055
Neg logit contributions
Did-0.100
 lying-0.099
cher-0.098
pped-0.098
ter-0.097
Tokenfy
Token ID24928
Feature activation-6.18e+00

Pos logit contributions
 breath+0.183
 purpose+0.175
 common+0.164
 choice+0.143
 accomplishment+0.138
Neg logit contributions
cher-0.223
washer-0.213
ode-0.210
pled-0.209
vell-0.209
Token couch
Token ID18507
Feature activation+3.13e-01

Pos logit contributions
 common+0.123
 breath+0.118
 accomplishment+0.106
 purpose+0.106
powers+0.105
Neg logit contributions
 lying-0.145
ter-0.137
pped-0.134
ting-0.134
Did-0.132
Token.
Token ID13
Feature activation+8.79e-02

Pos logit contributions
cher+0.010
Did+0.010
ode+0.009
pped+0.009
ava+0.009
Neg logit contributions
 breath-0.004
 long-0.003
 job-0.003
 common-0.003
 purpose-0.003
Token The
Token ID383
Feature activation+1.64e+00

Pos logit contributions
Did+0.002
cher+0.002
pped+0.002
 lying+0.002
ter+0.002
Neg logit contributions
 breath-0.002
 common-0.002
 purpose-0.002
 job-0.001
 accomplishment-0.001
Token little
Token ID1310
Feature activation-2.38e+00

Pos logit contributions
Did+0.046
cher+0.045
pped+0.045
 lying+0.045
ting+0.044
Neg logit contributions
 breath-0.028
 common-0.026
 purpose-0.024
 job-0.022
 accomplishment-0.022
Token Why
Token ID4162
Feature activation+3.35e+00

Pos logit contributions
Did+0.025
 lying+0.024
cher+0.024
pped+0.024
ter+0.024
Neg logit contributions
 breath-0.021
 common-0.019
 purpose-0.018
 job-0.017
 accomplishment-0.017
Token are
Token ID389
Feature activation+2.86e+00

Pos logit contributions
Did+0.072
pped+0.071
ting+0.070
cher+0.070
 register+0.069
Neg logit contributions
 breath-0.079
 common-0.078
 purpose-0.073
 job-0.071
 work-0.070
Token you
Token ID345
Feature activation-7.62e-01

Pos logit contributions
Did+0.073
pped+0.072
cher+0.071
ting+0.071
ode+0.069
Neg logit contributions
 breath-0.055
 common-0.053
 purpose-0.049
 job-0.048
 own-0.046
Token fighting
Token ID4330
Feature activation+2.07e+00

Pos logit contributions
 breath+0.017
 common+0.016
 purpose+0.016
 job+0.015
 work+0.014
Neg logit contributions
Did-0.017
cher-0.017
pped-0.017
ting-0.016
boards-0.016
Token over
Token ID625
Feature activation-1.02e+00

Pos logit contributions
cher+0.051
Did+0.051
pped+0.050
 register+0.049
ting+0.049
Neg logit contributions
 breath-0.041
 common-0.040
 purpose-0.039
 job-0.036
 work-0.035
Token the
Token ID262
Feature activation+3.62e+00

Pos logit contributions
 breath+0.019
 common+0.018
 purpose+0.017
 job+0.016
 own+0.015
Neg logit contributions
cher-0.027
Did-0.027
pped-0.026
 register-0.026
 lying-0.026
Token red
Token ID2266
Feature activation-1.17e+00

Pos logit contributions
pped+0.092
Did+0.091
cher+0.089
ting+0.087
 register+0.085
Neg logit contributions
 breath-0.073
 common-0.069
 purpose-0.067
 job-0.064
 own-0.060
Token past
Token ID1613
Feature activation+1.30e+00

Pos logit contributions
 breath+0.024
 common+0.023
 purpose+0.022
 job+0.021
 own+0.020
Neg logit contributions
Did-0.028
cher-0.028
pped-0.028
 lying-0.027
ting-0.026
Tokenel
Token ID417
Feature activation+9.74e-01

Pos logit contributions
cher+0.032
Did+0.031
pped+0.030
 register+0.030
 lying+0.030
Neg logit contributions
 breath-0.026
 common-0.026
 purpose-0.025
 job-0.023
 own-0.022
Token?"
Token ID1701
Feature activation-2.75e+00

Pos logit contributions
Did+0.024
cher+0.024
pped+0.024
 lying+0.023
ting+0.023
Neg logit contributions
 breath-0.019
 common-0.018
 purpose-0.017
 job-0.016
 accomplishment-0.016
Token\n
Token ID198
Feature activation-2.45e+00

Pos logit contributions
 breath+0.066
 common+0.060
 job+0.057
 purpose+0.056
 accomplishment+0.055
Neg logit contributions
Did-0.062
 lying-0.059
 Why-0.057
ter-0.055
ting-0.053
Token can
Token ID460
Feature activation-8.85e-01

Pos logit contributions
Did+0.025
pped+0.024
cher+0.024
 lying+0.024
ting+0.024
Neg logit contributions
 breath-0.043
 common-0.042
 purpose-0.039
 own-0.039
 job-0.039
Token make
Token ID787
Feature activation+3.49e+00

Pos logit contributions
 breath+0.020
 common+0.020
 purpose+0.019
 accomplishment+0.018
 job+0.018
Neg logit contributions
pped-0.018
 lying-0.018
cher-0.018
ter-0.018
Did-0.018
Token a
Token ID257
Feature activation-6.83e-02

Pos logit contributions
 lying+0.060
Did+0.060
ter+0.059
pped+0.059
cher+0.059
Neg logit contributions
 breath-0.093
 common-0.090
 purpose-0.084
 job-0.082
 accomplishment-0.081
Token noise
Token ID7838
Feature activation-1.09e+00

Pos logit contributions
 breath+0.001
 common+0.001
 purpose+0.001
 accomplishment+0.001
 job+0.001
Neg logit contributions
 lying-0.002
Did-0.002
cher-0.002
ter-0.001
ode-0.001
Token.
Token ID13
Feature activation+2.10e+00

Pos logit contributions
 breath+0.021
 common+0.020
 purpose+0.019
 job+0.018
 accomplishment+0.017
Neg logit contributions
Did-0.027
cher-0.027
pped-0.027
 lying-0.027
ter-0.026
Token What
Token ID1867
Feature activation+6.54e+00

Pos logit contributions
pped+0.046
cher+0.046
 lying+0.045
Did+0.045
 register+0.044
Neg logit contributions
 breath-0.046
 common-0.045
 purpose-0.042
 job-0.040
 own-0.039
Token do
Token ID466
Feature activation+4.50e-01

Pos logit contributions
pped+0.148
ting+0.143
cher+0.140
 register+0.139
ter+0.138
Neg logit contributions
 common-0.151
 breath-0.147
 job-0.137
 purpose-0.137
 own-0.136
Token you
Token ID345
Feature activation-1.08e+00

Pos logit contributions
Did+0.010
cher+0.010
pped+0.010
 lying+0.010
ter+0.010
Neg logit contributions
 breath-0.010
 common-0.009
 purpose-0.009
 job-0.008
 accomplishment-0.008
Token think
Token ID892
Feature activation+1.51e-01

Pos logit contributions
 breath+0.020
 common+0.019
 purpose+0.017
 job+0.017
 accomplishment+0.016
Neg logit contributions
Did-0.028
cher-0.028
pped-0.028
 lying-0.028
ting-0.027
Token?"
Token ID1701
Feature activation-3.40e+00

Pos logit contributions
Did+0.004
ode+0.004
pped+0.004
cher+0.004
 register+0.004
Neg logit contributions
 common-0.003
 breath-0.003
 purpose-0.002
 job-0.002
 long-0.002
Token<|endoftext|>
Token ID50256
Feature activation+4.23e-01

Pos logit contributions
 breath+0.077
 common+0.068
 purpose+0.068
 job+0.067
 accomplishment+0.067
Neg logit contributions
Did-0.082
 lying-0.077
 Why-0.072
ter-0.072
cher-0.069
Token there
Token ID612
Feature activation+1.11e+00

Pos logit contributions
cher+0.052
 register+0.050
ting+0.049
ode+0.049
pped+0.049
Neg logit contributions
 breath-0.067
 common-0.065
 purpose-0.060
 job-0.060
 long-0.059
Token was
Token ID373
Feature activation+1.60e+00

Pos logit contributions
cher+0.025
Did+0.023
ode+0.023
 register+0.023
ting+0.023
Neg logit contributions
 breath-0.026
 common-0.025
 purpose-0.024
 job-0.023
 own-0.022
Token a
Token ID257
Feature activation-1.61e+00

Pos logit contributions
cher+0.029
Did+0.029
pped+0.028
ting+0.027
 lying+0.027
Neg logit contributions
 breath-0.042
 common-0.041
 purpose-0.039
 job-0.037
 accomplishment-0.036
Token little
Token ID1310
Feature activation-4.75e+00

Pos logit contributions
 breath+0.027
 common+0.026
 purpose+0.024
 accomplishment+0.022
 job+0.021
Neg logit contributions
Did-0.044
 lying-0.044
ter-0.043
cher-0.043
pped-0.042
Token girl
Token ID2576
Feature activation-9.20e-01

Pos logit contributions
 common+0.093
 breath+0.092
 purpose+0.087
 accomplishment+0.082
 job+0.076
Neg logit contributions
ter-0.117
 lying-0.114
Did-0.113
ting-0.110
cher-0.109
Token named
Token ID3706
Feature activation+5.03e+00

Pos logit contributions
 breath+0.026
 common+0.025
 purpose+0.024
 accomplishment+0.024
 job+0.023
Neg logit contributions
Did-0.015
 lying-0.015
pped-0.014
ter-0.014
cher-0.014
Token Lily
Token ID20037
Feature activation+1.29e+00

Pos logit contributions
Did+0.107
 lying+0.107
pped+0.103
ter+0.102
cher+0.102
Neg logit contributions
 breath-0.118
 common-0.112
 purpose-0.106
 accomplishment-0.102
 job-0.100
Token.
Token ID13
Feature activation+4.39e-01

Pos logit contributions
Did+0.037
 lying+0.036
pped+0.036
cher+0.035
ter+0.035
Neg logit contributions
 breath-0.021
 common-0.020
 purpose-0.019
 job-0.017
 accomplishment-0.017
Token Lily
Token ID20037
Feature activation+1.33e+00

Pos logit contributions
Did+0.010
 lying+0.009
pped+0.009
boards+0.009
ting+0.009
Neg logit contributions
 breath-0.010
 common-0.010
 purpose-0.009
 accomplishment-0.009
 job-0.009
Token loved
Token ID6151
Feature activation-2.24e-01

Pos logit contributions
Did+0.022
 lying+0.022
pped+0.021
cher+0.021
ter+0.020
Neg logit contributions
 breath-0.038
 common-0.036
 purpose-0.034
 accomplishment-0.033
 job-0.033
Token to
Token ID284
Feature activation-1.22e+00

Pos logit contributions
 breath+0.004
 common+0.004
 purpose+0.004
 job+0.003
 accomplishment+0.003
Neg logit contributions
Did-0.006
 lying-0.006
pped-0.006
cher-0.006
ting-0.006
Token curious
Token ID11040
Feature activation-3.85e+00

Pos logit contributions
 breath+0.095
 purpose+0.094
 accomplishment+0.088
 common+0.087
+0.083
Neg logit contributions
 lying-0.092
cher-0.082
auc-0.081
washer-0.080
Did-0.079
Token.
Token ID13
Feature activation+5.92e-01

Pos logit contributions
 breath+0.059
 common+0.056
 purpose+0.050
 job+0.047
 accomplishment+0.046
Neg logit contributions
Did-0.113
 lying-0.111
cher-0.111
pped-0.111
ting-0.109
Token He
Token ID679
Feature activation-4.63e-01

Pos logit contributions
Did+0.013
 lying+0.012
boards+0.012
 Why+0.012
ting+0.012
Neg logit contributions
 breath-0.015
 common-0.013
 accomplishment-0.012
 purpose-0.012
 job-0.012
Token sees
Token ID7224
Feature activation+4.92e+00

Pos logit contributions
 breath+0.010
 common+0.009
 purpose+0.009
 job+0.008
 accomplishment+0.008
Neg logit contributions
Did-0.011
cher-0.011
pped-0.011
 lying-0.011
ter-0.011
Token the
Token ID262
Feature activation+2.51e+00

Pos logit contributions
ode+0.181
cher+0.180
vell+0.164
can+0.162
 register+0.161
Neg logit contributions
 breath-0.051
 common-0.051
 purpose-0.047
 sight-0.044
 long-0.043
Token prints
Token ID20842
Feature activation+6.67e+00

Pos logit contributions
Did+0.060
pped+0.059
cher+0.059
 lying+0.057
ting+0.057
Neg logit contributions
 breath-0.054
 common-0.051
 purpose-0.048
 job-0.046
 accomplishment-0.044
Token and
Token ID290
Feature activation+1.70e-01

Pos logit contributions
Did+0.190
cher+0.188
ode+0.186
ter+0.182
pped+0.178
Neg logit contributions
 breath-0.113
 work-0.097
 common-0.087
 long-0.087
 home-0.083
Token wants
Token ID3382
Feature activation+2.84e+00

Pos logit contributions
Did+0.004
 lying+0.004
cher+0.004
ting+0.004
pped+0.004
Neg logit contributions
 breath-0.003
 common-0.003
 purpose-0.003
 job-0.003
 accomplishment-0.003
Token to
Token ID284
Feature activation-9.40e-01

Pos logit contributions
pped+0.088
cher+0.088
Did+0.087
ode+0.086
ter+0.085
Neg logit contributions
 breath-0.040
 common-0.039
 purpose-0.036
 long-0.033
 own-0.032
Token touch
Token ID3638
Feature activation+2.70e+00

Pos logit contributions
 breath+0.021
 accomplishment+0.020
 surroundings+0.019
 purpose+0.019
 common+0.019
Neg logit contributions
pped-0.018
Did-0.018
 lying-0.017
cher-0.017
ter-0.017
Token them
Token ID606
Feature activation+1.66e+00

Pos logit contributions
cher+0.078
Did+0.077
ode+0.077
pped+0.076
 register+0.075
Neg logit contributions
 breath-0.045
 common-0.044
 purpose-0.039
 own-0.037
 long-0.036
Token make
Token ID787
Feature activation+3.18e+00

Pos logit contributions
 breath+0.049
 common+0.047
 purpose+0.044
 job+0.042
 accomplishment+0.040
Neg logit contributions
Did-0.058
cher-0.057
pped-0.057
 lying-0.056
ting-0.056
Token it
Token ID340
Feature activation-3.35e-01

Pos logit contributions
 lying+0.064
pped+0.059
ter+0.059
 register+0.058
ting+0.057
Neg logit contributions
 common-0.075
 breath-0.074
 accomplishment-0.065
-0.064
 purpose-0.064
Token wrong
Token ID2642
Feature activation-2.76e+00

Pos logit contributions
 breath+0.007
 common+0.007
 purpose+0.006
 accomplishment+0.006
 job+0.006
Neg logit contributions
 lying-0.008
cher-0.007
pped-0.007
Did-0.007
ter-0.007
Token.
Token ID13
Feature activation+6.88e-01

Pos logit contributions
 breath+0.049
 common+0.046
 purpose+0.041
 accomplishment+0.040
 job+0.039
Neg logit contributions
 lying-0.074
Did-0.072
cher-0.070
ter-0.070
pped-0.070
Token She
Token ID1375
Feature activation-7.80e-01

Pos logit contributions
Did+0.017
cher+0.016
pped+0.016
 lying+0.016
ter+0.016
Neg logit contributions
 breath-0.014
 common-0.013
 purpose-0.013
 job-0.012
 accomplishment-0.012
Token made
Token ID925
Feature activation+3.47e+00

Pos logit contributions
 breath+0.015
 common+0.015
 work+0.013
 own+0.013
 happily+0.013
Neg logit contributions
cher-0.019
 register-0.019
ting-0.019
ter-0.018
ode-0.018
Token it
Token ID340
Feature activation+7.23e-01

Pos logit contributions
Did+0.065
 lying+0.065
pped+0.064
cher+0.064
ter+0.062
Neg logit contributions
 breath-0.088
 common-0.086
 purpose-0.081
 job-0.078
 accomplishment-0.075
Token how
Token ID703
Feature activation+1.90e+00

Pos logit contributions
Did+0.016
cher+0.016
 lying+0.016
pped+0.016
ter+0.015
Neg logit contributions
 breath-0.016
 common-0.015
 purpose-0.014
 job-0.014
 accomplishment-0.013
Token she
Token ID673
Feature activation+1.38e+00

Pos logit contributions
Did+0.049
cher+0.048
pped+0.048
 lying+0.048
ter+0.047
Neg logit contributions
 breath-0.036
 common-0.034
 purpose-0.032
 job-0.030
 accomplishment-0.029
Token liked
Token ID8288
Feature activation+1.47e+00

Pos logit contributions
 register+0.033
Did+0.032
ting+0.031
cher+0.031
 lying+0.031
Neg logit contributions
 breath-0.028
 work-0.027
 common-0.027
 own-0.026
 happily-0.025
Token it
Token ID340
Feature activation+3.55e-01

Pos logit contributions
cher+0.041
Did+0.040
pped+0.040
 register+0.039
ode+0.039
Neg logit contributions
 breath-0.026
 common-0.024
 purpose-0.023
 job-0.021
 work-0.020
Token Sam
Token ID3409
Feature activation+8.92e-01

Pos logit contributions
 breath+0.020
 common+0.020
 purpose+0.018
 job+0.017
 work+0.017
Neg logit contributions
cher-0.029
Did-0.028
pped-0.028
ting-0.028
 register-0.027
Token are
Token ID389
Feature activation+2.52e+00

Pos logit contributions
cher+0.017
Did+0.017
 register+0.016
ting+0.016
pped+0.016
Neg logit contributions
 breath-0.023
 common-0.022
 work-0.020
 purpose-0.020
 own-0.020
Token friends
Token ID2460
Feature activation+1.57e+00

Pos logit contributions
Did+0.064
cher+0.064
pped+0.063
ting+0.062
 lying+0.062
Neg logit contributions
 breath-0.049
 common-0.046
 purpose-0.042
 job-0.041
 accomplishment-0.040
Token.
Token ID13
Feature activation+4.46e-01

Pos logit contributions
Did+0.045
cher+0.044
pped+0.044
 lying+0.044
ter+0.043
Neg logit contributions
 breath-0.025
 common-0.024
 purpose-0.022
 job-0.021
 accomplishment-0.020
Token They
Token ID1119
Feature activation+5.23e-01

Pos logit contributions
Did+0.012
 lying+0.012
pped+0.012
cher+0.012
ting+0.012
Neg logit contributions
 breath-0.008
 common-0.007
 purpose-0.007
 job-0.006
 accomplishment-0.006
Token like
Token ID588
Feature activation+2.65e+00

Pos logit contributions
Did+0.012
cher+0.012
pped+0.011
ting+0.011
 register+0.011
Neg logit contributions
 breath-0.012
 common-0.012
 own-0.010
 work-0.010
 job-0.010
Token to
Token ID284
Feature activation-1.44e+00

Pos logit contributions
pped+0.084
cher+0.083
 register+0.081
Did+0.081
ode+0.079
Neg logit contributions
 breath-0.036
 common-0.034
 purpose-0.031
 job-0.029
 work-0.029
Token play
Token ID711
Feature activation+1.25e+00

Pos logit contributions
 breath+0.024
 common+0.022
 purpose+0.021
 job+0.019
 accomplishment+0.019
Neg logit contributions
Did-0.040
cher-0.040
pped-0.040
 lying-0.039
ter-0.039
Token with
Token ID351
Feature activation+2.65e+00

Pos logit contributions
 register+0.029
cher+0.029
Did+0.029
pped+0.028
 lying+0.028
Neg logit contributions
 breath-0.027
 common-0.026
 purpose-0.024
 job-0.024
 own-0.023
Token cars
Token ID5006
Feature activation+1.62e+00

Pos logit contributions
cher+0.069
Did+0.068
pped+0.068
 register+0.066
ter+0.065
Neg logit contributions
 breath-0.052
 common-0.050
 purpose-0.046
 job-0.043
 own-0.042
Token and
Token ID290
Feature activation-1.02e+00

Pos logit contributions
Did+0.050
pped+0.050
cher+0.050
 lying+0.049
ter+0.048
Neg logit contributions
 breath-0.022
 common-0.020
 purpose-0.018
 job-0.017
 own-0.016
Token They
Token ID1119
Feature activation-1.33e-01

Pos logit contributions
Did+0.029
 Why+0.026
 lying+0.026
ter+0.025
pped+0.025
Neg logit contributions
 breath-0.022
 accomplishment-0.018
 common-0.017
 job-0.017
 purpose-0.017
Token learned
Token ID4499
Feature activation+2.40e+00

Pos logit contributions
 breath+0.003
 common+0.003
 purpose+0.003
 job+0.003
 own+0.003
Neg logit contributions
cher-0.003
Did-0.003
pped-0.003
ting-0.003
 lying-0.003
Token a
Token ID257
Feature activation+1.73e-01

Pos logit contributions
Did+0.052
cher+0.051
pped+0.051
 lying+0.050
ter+0.049
Neg logit contributions
 breath-0.055
 common-0.053
 purpose-0.050
 job-0.048
 accomplishment-0.047
Token hard
Token ID1327
Feature activation-7.07e+00

Pos logit contributions
 lying+0.005
Did+0.005
cher+0.005
washer+0.005
ter+0.005
Neg logit contributions
 breath-0.002
 accomplishment-0.002
 common-0.002
 purpose-0.001
-0.001
Token lesson
Token ID11483
Feature activation-4.48e+00

Pos logit contributions
 breath+0.075
 common+0.058
 accomplishment+0.049
 purpose+0.049
 own+0.047
Neg logit contributions
 lying-0.231
Did-0.217
boards-0.216
cher-0.216
pped-0.214
Token.
Token ID13
Feature activation+7.53e-01

Pos logit contributions
 common+0.055
 breath+0.053
 purpose+0.045
 job+0.041
 work+0.040
Neg logit contributions
ode-0.143
cher-0.142
Did-0.142
pped-0.141
ter-0.138
Token They
Token ID1119
Feature activation-4.63e-01

Pos logit contributions
Did+0.021
 lying+0.019
ter+0.019
pped+0.019
 Why+0.019
Neg logit contributions
 breath-0.015
 accomplishment-0.012
 common-0.012
 job-0.012
 purpose-0.012
Token should
Token ID815
Feature activation-2.03e-02

Pos logit contributions
 breath+0.011
 common+0.011
 purpose+0.010
 job+0.010
 own+0.010
Neg logit contributions
cher-0.009
Did-0.009
pped-0.009
 lying-0.009
ting-0.009
Token have
Token ID423
Feature activation+2.13e+00

Pos logit contributions
 breath+0.001
 common+0.001
 sight+0.001
 accomplishment+0.001
 purpose+0.001
Neg logit contributions
 lying-0.000
Did-0.000
pped-0.000
ode-0.000
ter-0.000
Token listened
Token ID16399
Feature activation-2.03e+00

Pos logit contributions
cher+0.041
pped+0.040
ting+0.040
boards+0.040
 register+0.040
Neg logit contributions
 breath-0.054
 common-0.052
 purpose-0.049
 job-0.047
 work-0.046
Token to
Token ID284
Feature activation-1.18e+00

Pos logit contributions
 breath+0.037
 common+0.034
 purpose+0.032
 job+0.030
 accomplishment+0.030
Neg logit contributions
Did-0.054
cher-0.053
 lying-0.053
pped-0.052
ting-0.051
Token his
Token ID465
Feature activation-4.08e-01

Pos logit contributions
Did+0.036
cher+0.035
pped+0.035
ode+0.034
 lying+0.034
Neg logit contributions
 breath-0.028
 common-0.025
 purpose-0.024
 job-0.022
 accomplishment-0.022
Token adventure
Token ID8855
Feature activation-2.22e+00

Pos logit contributions
 breath+0.007
 common+0.007
 purpose+0.006
 job+0.005
 accomplishment+0.005
Neg logit contributions
cher-0.011
 lying-0.011
Did-0.011
pped-0.011
ter-0.010
Token.
Token ID13
Feature activation+5.40e-01

Pos logit contributions
 breath+0.029
 common+0.026
 purpose+0.024
 job+0.022
 accomplishment+0.021
Neg logit contributions
Did-0.071
pped-0.070
cher-0.069
 lying-0.068
ter-0.067
Token<|endoftext|>
Token ID50256
Feature activation+9.92e-01

Pos logit contributions
Did+0.015
cher+0.015
pped+0.015
 lying+0.015
ter+0.014
Neg logit contributions
 breath-0.009
 common-0.009
 purpose-0.008
 job-0.008
 accomplishment-0.007
TokenOnce
Token ID7454
Feature activation+6.29e-01

Pos logit contributions
cher+0.028
ode+0.028
Did+0.027
pped+0.027
 register+0.027
Neg logit contributions
 breath-0.015
 common-0.015
 purpose-0.013
-0.013
 sight-0.012
Token there
Token ID612
Feature activation+1.11e+00

Pos logit contributions
 lying+0.011
ava+0.010
washer+0.010
Did+0.010
pped+0.010
Neg logit contributions
 breath-0.017
 purpose-0.016
 common-0.016
 job-0.015
 accomplishment-0.015
Token was
Token ID373
Feature activation+1.42e+00

Pos logit contributions
cher+0.021
Did+0.021
pped+0.021
ting+0.020
 register+0.020
Neg logit contributions
 breath-0.029
 common-0.027
 purpose-0.026
 job-0.025
 own-0.025
Token a
Token ID257
Feature activation-1.61e+00

Pos logit contributions
cher+0.025
Did+0.025
pped+0.025
 lying+0.024
ting+0.024
Neg logit contributions
 breath-0.038
 common-0.037
 purpose-0.035
 job-0.034
 accomplishment-0.033
Token big
Token ID1263
Feature activation-4.25e+00

Pos logit contributions
 breath+0.029
 common+0.027
 purpose+0.025
 accomplishment+0.023
 job+0.022
Neg logit contributions
Did-0.043
 lying-0.043
ter-0.042
cher-0.041
ting-0.040
Token,
Token ID11
Feature activation+1.64e+00

Pos logit contributions
 common+0.084
 breath+0.083
 purpose+0.078
 accomplishment+0.070
powers+0.069
Neg logit contributions
 lying-0.102
ter-0.100
Did-0.098
ting-0.097
pped-0.095
Token fat
Token ID3735
Feature activation-2.99e+00

Pos logit contributions
cher+0.048
Did+0.048
pped+0.048
ting+0.046
 register+0.046
Neg logit contributions
 breath-0.025
 common-0.024
 purpose-0.021
 job-0.021
 accomplishment-0.019
Token smiled
Token ID13541
Feature activation-2.28e+00

Pos logit contributions
 breath+0.023
 common+0.023
 purpose+0.022
 job+0.021
 accomplishment+0.020
Neg logit contributions
Did-0.020
cher-0.020
pped-0.020
 lying-0.020
ter-0.019
Token and
Token ID290
Feature activation-2.30e+00

Pos logit contributions
 breath+0.042
 common+0.041
 purpose+0.039
 job+0.036
 accomplishment+0.035
Neg logit contributions
Did-0.059
 lying-0.059
ter-0.058
pped-0.057
cher-0.057
Token said
Token ID531
Feature activation+1.36e+00

Pos logit contributions
 breath+0.047
 common+0.045
 purpose+0.043
 job+0.040
 accomplishment+0.038
Neg logit contributions
Did-0.055
cher-0.055
 lying-0.054
pped-0.054
ter-0.054
Token,
Token ID11
Feature activation+3.41e-01

Pos logit contributions
cher+0.044
Did+0.043
 lying+0.043
ter+0.043
pped+0.042
Neg logit contributions
 breath-0.017
 common-0.016
 purpose-0.014
 job-0.012
 accomplishment-0.012
Token "
Token ID366
Feature activation+5.09e+00

Pos logit contributions
pped+0.007
Did+0.007
cher+0.007
 lying+0.007
 register+0.007
Neg logit contributions
 breath-0.008
 common-0.008
 purpose-0.007
 job-0.007
 accomplishment-0.007
TokenYou
Token ID1639
Feature activation+2.31e+00

Pos logit contributions
cher+0.102
 lying+0.101
 register+0.099
pped+0.096
washer+0.093
Neg logit contributions
 common-0.121
 breath-0.114
 purpose-0.112
-0.107
bye-0.107
Token can
Token ID460
Feature activation-3.08e-01

Pos logit contributions
Did+0.044
ting+0.042
pped+0.041
cher+0.040
 register+0.039
Neg logit contributions
 breath-0.062
 common-0.059
 job-0.057
 work-0.057
 care-0.056
Token eat
Token ID4483
Feature activation+1.83e+00

Pos logit contributions
 breath+0.006
 common+0.006
 purpose+0.006
 job+0.006
 own+0.005
Neg logit contributions
Did-0.007
cher-0.007
pped-0.007
 lying-0.007
ting-0.007
Token as
Token ID355
Feature activation-3.26e+00

Pos logit contributions
cher+0.045
Did+0.045
pped+0.043
 register+0.043
ode+0.042
Neg logit contributions
 breath-0.037
 common-0.036
 purpose-0.034
 job-0.032
 own-0.031
Token much
Token ID881
Feature activation-5.74e+00

Pos logit contributions
 breath+0.063
 common+0.059
 purpose+0.056
 accomplishment+0.051
 job+0.051
Neg logit contributions
 lying-0.082
pped-0.077
Did-0.077
ter-0.075
cher-0.074
Token as
Token ID355
Feature activation-3.48e+00

Pos logit contributions
 breath+0.100
 common+0.097
eth+0.095
 purpose+0.093
by+0.086
Neg logit contributions
 lying-0.150
pped-0.142
ting-0.139
ter-0.138
Did-0.131
Token hard
Token ID1327
Feature activation-6.02e+00

Pos logit contributions
Did+0.023
cher+0.023
pped+0.023
ode+0.022
ter+0.022
Neg logit contributions
 breath-0.016
 common-0.015
 purpose-0.014
 job-0.014
 accomplishment-0.013
Token and
Token ID290
Feature activation-1.76e+00

Pos logit contributions
 breath+0.117
 common+0.113
 purpose+0.106
 job+0.097
 accomplishment+0.096
Neg logit contributions
Did-0.148
 lying-0.148
pped-0.147
cher-0.146
ting-0.145
Token fast
Token ID3049
Feature activation-5.29e+00

Pos logit contributions
 breath+0.037
 common+0.036
 purpose+0.033
 job+0.032
 accomplishment+0.031
Neg logit contributions
Did-0.042
pped-0.041
cher-0.041
ter-0.040
ting-0.040
Token.
Token ID13
Feature activation+7.01e-01

Pos logit contributions
 breath+0.080
 common+0.076
 purpose+0.069
 job+0.064
 accomplishment+0.062
Neg logit contributions
Did-0.155
cher-0.153
pped-0.153
 lying-0.152
ting-0.150
Token Soon
Token ID15894
Feature activation-3.67e+00

Pos logit contributions
Did+0.021
cher+0.021
pped+0.021
 lying+0.020
ter+0.020
Neg logit contributions
 breath-0.010
 common-0.010
 purpose-0.009
 job-0.008
 accomplishment-0.008
Token,
Token ID11
Feature activation+1.64e+00

Pos logit contributions
 breath+0.047
 common+0.043
 purpose+0.039
 job+0.035
 accomplishment+0.034
Neg logit contributions
Did-0.116
cher-0.115
pped-0.115
 lying-0.114
ter-0.113
Token the
Token ID262
Feature activation+1.57e+00

Pos logit contributions
cher+0.053
ode+0.052
pped+0.051
umper+0.049
coon+0.049
Neg logit contributions
 breath-0.024
 long-0.023
 common-0.023
 work-0.020
 job-0.020
Token garage
Token ID15591
Feature activation+2.50e+00

Pos logit contributions
pped+0.037
Did+0.036
cher+0.036
ting+0.035
ter+0.034
Neg logit contributions
 breath-0.035
 common-0.032
 purpose-0.031
 job-0.031
 work-0.030
Token is
Token ID318
Feature activation+2.41e+00

Pos logit contributions
Did+0.055
cher+0.054
pped+0.054
ting+0.052
 lying+0.052
Neg logit contributions
 breath-0.057
 common-0.055
 purpose-0.052
 job-0.050
 accomplishment-0.048
Token clean
Token ID3424
Feature activation-1.17e+00

Pos logit contributions
Did+0.055
cher+0.054
ting+0.053
pped+0.053
ter+0.052
Neg logit contributions
 breath-0.054
 common-0.051
 purpose-0.048
 job-0.047
 accomplishment-0.045
Token and
Token ID290
Feature activation-2.70e-01

Pos logit contributions
 breath+0.016
 common+0.014
 purpose+0.014
 job+0.013
 work+0.012
Neg logit contributions
Did-0.037
cher-0.036
pped-0.036
ode-0.035
 lying-0.034
Token they
Token ID484
Feature activation+1.04e+00

Pos logit contributions
cher+0.062
pped+0.062
Did+0.061
ode+0.061
 register+0.059
Neg logit contributions
 breath-0.039
 common-0.038
 purpose-0.035
 job-0.033
 long-0.032
Token were
Token ID547
Feature activation+3.26e+00

Pos logit contributions
ode+0.020
 register+0.020
cher+0.020
uit+0.020
Did+0.020
Neg logit contributions
 breath-0.026
 work-0.025
 common-0.025
 own-0.024
 care-0.024
Token being
Token ID852
Feature activation+2.10e+00

Pos logit contributions
Did+0.077
cher+0.077
pped+0.075
ode+0.074
boards+0.074
Neg logit contributions
 breath-0.070
 common-0.067
 purpose-0.063
 job-0.061
 long-0.059
Token polite
Token ID23507
Feature activation-3.96e+00

Pos logit contributions
Did+0.054
pped+0.053
ode+0.053
boards+0.053
cher+0.052
Neg logit contributions
 breath-0.040
 common-0.038
 purpose-0.035
 job-0.034
 long-0.033
Token to
Token ID284
Feature activation-5.07e-01

Pos logit contributions
 breath+0.050
 common+0.046
 purpose+0.043
 job+0.040
 work+0.038
Neg logit contributions
cher-0.126
Did-0.126
pped-0.125
ode-0.122
 register-0.120
Token each
Token ID1123
Feature activation-1.29e+00

Pos logit contributions
 breath+0.011
 common+0.010
 purpose+0.010
 job+0.009
 accomplishment+0.009
Neg logit contributions
Did-0.012
cher-0.012
pped-0.012
 lying-0.012
ter-0.011
Token other
Token ID584
Feature activation+2.89e-01

Pos logit contributions
 breath+0.026
 common+0.025
 purpose+0.023
 job+0.023
 own+0.022
Neg logit contributions
cher-0.032
Did-0.032
ode-0.031
pped-0.031
 lying-0.030
Token.
Token ID13
Feature activation-2.06e-01

Pos logit contributions
 lying+0.008
Did+0.008
cher+0.008
pped+0.008
ting+0.008
Neg logit contributions
 breath-0.005
 common-0.004
 purpose-0.004
 job-0.004
 accomplishment-0.004
Token She
Token ID1375
Feature activation-5.23e-01

Pos logit contributions
 breath+0.005
 common+0.004
 purpose+0.004
 accomplishment+0.004
 job+0.004
Neg logit contributions
Did-0.005
ting-0.005
ter-0.005
 lying-0.005
boards-0.004
Token watched
Token ID7342
Feature activation+1.52e+00

Pos logit contributions
 breath+0.012
 common+0.011
 purpose+0.010
 job+0.010
 own+0.010
Neg logit contributions
cher-0.011
Did-0.011
pped-0.011
ting-0.011
 register-0.011
Token them
Token ID606
Feature activation+1.92e+00

Pos logit contributions
Did+0.037
cher+0.037
ode+0.036
pped+0.035
 register+0.035
Neg logit contributions
 breath-0.032
 common-0.029
 purpose-0.027
 accomplishment-0.026
 job-0.026
Token and
Token ID290
Feature activation-1.30e+00

Pos logit contributions
cher+0.043
pped+0.043
Did+0.042
ode+0.041
ter+0.041
Neg logit contributions
 breath-0.035
 common-0.033
 purpose-0.031
 job-0.031
 accomplishment-0.030
Token the
Token ID262
Feature activation+1.39e+00

Pos logit contributions
 breath+0.028
 common+0.026
 purpose+0.025
 job+0.024
 accomplishment+0.024
Neg logit contributions
pped-0.031
Did-0.031
cher-0.031
ode-0.030
ter-0.030
Token k
Token ID479
Feature activation-1.87e+00

Pos logit contributions
Did+0.035
pped+0.035
cher+0.034
 lying+0.033
ter+0.033
Neg logit contributions
 breath-0.027
 common-0.026
 purpose-0.025
 job-0.023
 accomplishment-0.023
Tokenidd
Token ID1638
Feature activation+5.14e-01

Pos logit contributions
 breath+0.036
 common+0.033
 purpose+0.031
 job+0.030
 own+0.028
Neg logit contributions
Did-0.048
pped-0.046
 lying-0.046
cher-0.046
boards-0.045
Tokeno
Token ID78
Feature activation-1.55e+00

Pos logit contributions
Did+0.011
pped+0.010
 lying+0.010
cher+0.010
 register+0.010
Neg logit contributions
 breath-0.012
 common-0.011
 job-0.011
 purpose-0.011
 own-0.010
Token soon
Token ID2582
Feature activation-4.23e+00

Pos logit contributions
 breath+0.040
 common+0.038
 purpose+0.037
 job+0.036
 own+0.035
Neg logit contributions
Did-0.029
pped-0.028
cher-0.028
 lying-0.027
ter-0.027
Token began
Token ID2540
Feature activation+1.95e-02

Pos logit contributions
 breath+0.103
 common+0.098
 purpose+0.094
 job+0.091
 accomplishment+0.090
Neg logit contributions
Did-0.086
cher-0.084
pped-0.082
ter-0.082
ode-0.081
Token to
Token ID284
Feature activation-3.58e+00

Pos logit contributions
 lying+0.000
Did+0.000
cher+0.000
pped+0.000
ting+0.000
Neg logit contributions
 breath-0.000
 common-0.000
 purpose-0.000
 job-0.000
 accomplishment-0.000
Token do
Token ID466
Feature activation-5.43e-01

Pos logit contributions
 breath+0.060
 common+0.057
 purpose+0.052
 job+0.049
 accomplishment+0.047
Neg logit contributions
Did-0.098
cher-0.097
pped-0.097
 lying-0.097
ter-0.094
Token much
Token ID881
Feature activation-6.46e+00

Pos logit contributions
 breath+0.011
 common+0.010
 purpose+0.010
 job+0.009
 accomplishment+0.009
Neg logit contributions
Did-0.013
cher-0.013
pped-0.013
ter-0.013
ode-0.013
Token better
Token ID1365
Feature activation-7.70e+00

Pos logit contributions
 breath+0.114
 common+0.110
 purpose+0.098
 job+0.089
 own+0.087
Neg logit contributions
 lying-0.170
Did-0.168
cher-0.165
pped-0.164
ting-0.163
Token for
Token ID329
Feature activation+2.54e-01

Pos logit contributions
cher+0.010
pped+0.010
Did+0.010
 register+0.010
ode+0.010
Neg logit contributions
 breath-0.006
 common-0.005
 purpose-0.005
 accomplishment-0.005
 long-0.005
Token a
Token ID257
Feature activation-1.37e+00

Pos logit contributions
cher+0.005
Did+0.005
pped+0.005
 register+0.005
ting+0.005
Neg logit contributions
 breath-0.007
 common-0.006
 purpose-0.006
 job-0.006
 accomplishment-0.006
Token moment
Token ID2589
Feature activation-3.64e+00

Pos logit contributions
 breath+0.016
 common+0.015
 purpose+0.015
 accomplishment+0.014
 choice+0.013
Neg logit contributions
 lying-0.042
Did-0.042
cher-0.041
ter-0.039
ode-0.039
Token and
Token ID290
Feature activation-1.47e+00

Pos logit contributions
 breath+0.059
 common+0.055
 purpose+0.051
 job+0.047
 accomplishment+0.046
Neg logit contributions
Did-0.103
cher-0.102
pped-0.102
 lying-0.100
ter-0.099
Token looked
Token ID3114
Feature activation+8.29e-01

Pos logit contributions
 breath+0.033
 common+0.031
 purpose+0.029
 job+0.028
 accomplishment+0.028
Neg logit contributions
cher-0.032
Did-0.032
pped-0.032
 register-0.031
 lying-0.031
Token up
Token ID510
Feature activation-3.87e+00

Pos logit contributions
 lying+0.017
Did+0.017
ting+0.017
pped+0.017
cher+0.017
Neg logit contributions
 breath-0.019
 common-0.019
 purpose-0.018
 job-0.017
 accomplishment-0.016
Token at
Token ID379
Feature activation+5.36e-01

Pos logit contributions
 purpose+0.098
 common+0.098
 breath+0.091
 job+0.090
 choice+0.090
Neg logit contributions
 lying-0.075
ting-0.067
Did-0.066
pped-0.066
ter-0.065
Token the
Token ID262
Feature activation+2.49e+00

Pos logit contributions
cher+0.016
Did+0.016
 register+0.015
ode+0.015
boards+0.015
Neg logit contributions
 breath-0.010
 common-0.008
 purpose-0.008
 sight-0.008
 home-0.007
Token sky
Token ID6766
Feature activation-1.23e+00

Pos logit contributions
Did+0.057
cher+0.056
pped+0.056
 lying+0.055
ter+0.054
Neg logit contributions
 breath-0.054
 common-0.052
 purpose-0.049
 job-0.046
 accomplishment-0.045
Token.
Token ID13
Feature activation+1.51e-01

Pos logit contributions
 breath+0.019
 common+0.018
 purpose+0.016
 job+0.015
 accomplishment+0.014
Neg logit contributions
Did-0.036
 lying-0.036
cher-0.035
pped-0.035
ter-0.035
Token He
Token ID679
Feature activation-2.06e-02

Pos logit contributions
cher+0.004
pped+0.004
Did+0.004
 register+0.004
ter+0.004
Neg logit contributions
 breath-0.002
 purpose-0.002
 common-0.002
 long-0.002
 job-0.002
Token wet
Token ID9583
Feature activation-3.96e+00

Pos logit contributions
 common+0.123
 breath+0.116
 purpose+0.113
 job+0.102
 accomplishment+0.098
Neg logit contributions
 lying-0.142
ter-0.138
Did-0.138
ting-0.136
cher-0.134
Token,
Token ID11
Feature activation+2.80e-01

Pos logit contributions
 breath+0.069
 common+0.064
 purpose+0.060
 job+0.057
 accomplishment+0.054
Neg logit contributions
Did-0.108
cher-0.108
pped-0.106
 lying-0.104
 register-0.103
Token but
Token ID475
Feature activation-2.09e-01

Pos logit contributions
cher+0.008
 register+0.007
ode+0.007
boards+0.007
pped+0.007
Neg logit contributions
 breath-0.005
 common-0.005
 long-0.005
 sight-0.004
 purpose-0.004
Token it
Token ID340
Feature activation+1.08e+00

Pos logit contributions
 breath+0.004
 common+0.004
 accomplishment+0.004
 content+0.003
 purpose+0.003
Neg logit contributions
cher-0.005
ode-0.005
boards-0.005
Did-0.005
pped-0.005
Token was
Token ID373
Feature activation+2.40e+00

Pos logit contributions
 register+0.023
cher+0.023
Did+0.023
pped+0.022
ting+0.022
Neg logit contributions
 breath-0.026
 common-0.024
 purpose-0.023
 own-0.022
 job-0.022
Token okay
Token ID8788
Feature activation-3.00e+00

Pos logit contributions
Did+0.056
cher+0.056
pped+0.055
ting+0.054
 register+0.054
Neg logit contributions
 breath-0.051
 common-0.049
 purpose-0.046
 job-0.044
 accomplishment-0.043
Token.
Token ID13
Feature activation+2.74e-01

Pos logit contributions
 breath+0.034
 purpose+0.028
 common+0.027
 job+0.025
 long+0.024
Neg logit contributions
cher-0.104
pped-0.102
Did-0.101
 register-0.099
ode-0.098
Token She
Token ID1375
Feature activation-4.47e-01

Pos logit contributions
Did+0.007
 lying+0.006
ting+0.006
pped+0.006
ter+0.006
Neg logit contributions
 breath-0.006
 common-0.006
 purpose-0.005
 job-0.005
 accomplishment-0.005
Token was
Token ID373
Feature activation+2.60e+00

Pos logit contributions
 breath+0.010
 common+0.010
 purpose+0.009
 accomplishment+0.009
 own+0.009
Neg logit contributions
cher-0.010
Did-0.009
 register-0.009
pped-0.009
ting-0.009
Token so
Token ID523
Feature activation-4.21e+00

Pos logit contributions
Did+0.063
cher+0.063
pped+0.061
boards+0.060
ting+0.060
Neg logit contributions
 breath-0.054
 common-0.050
 purpose-0.047
 job-0.045
 accomplishment-0.045
Token happy
Token ID3772
Feature activation-5.48e+00

Pos logit contributions
 breath+0.073
 common+0.071
 purpose+0.069
 job+0.061
 choice+0.059
Neg logit contributions
 lying-0.116
Did-0.110
pped-0.109
ter-0.106
pled-0.106
Token\n
Token ID198
Feature activation-2.35e+00

Pos logit contributions
 breath+0.077
 accomplishment+0.069
 common+0.066
 choice+0.064
 job+0.064
Neg logit contributions
Did-0.049
pped-0.047
can-0.045
My-0.045
 lying-0.045
Token"
Token ID1
Feature activation+5.51e+00

Pos logit contributions
 breath+0.059
 accomplishment+0.052
 common+0.051
 choice+0.050
 sight+0.047
Neg logit contributions
Did-0.052
pped-0.051
 lying-0.048
My-0.047
can-0.047
TokenWe
Token ID1135
Feature activation+1.20e+00

Pos logit contributions
cher+0.095
ode+0.094
pled+0.092
pped+0.092
ting+0.090
Neg logit contributions
 common-0.150
 purpose-0.138
 long-0.134
 breath-0.132
 own-0.131
Token are
Token ID389
Feature activation+2.42e+00

Pos logit contributions
Did+0.019
cher+0.018
pped+0.018
ting+0.018
 lying+0.018
Neg logit contributions
 breath-0.035
 common-0.033
 job-0.031
 purpose-0.031
 work-0.031
Token going
Token ID1016
Feature activation-3.60e-01

Pos logit contributions
Did+0.026
cher+0.026
pped+0.026
 lying+0.025
ter+0.024
Neg logit contributions
 breath-0.081
 common-0.079
 purpose-0.076
 job-0.073
 accomplishment-0.073
Token to
Token ID284
Feature activation-1.73e+00

Pos logit contributions
 breath+0.007
 common+0.006
 purpose+0.006
 job+0.006
 accomplishment+0.005
Neg logit contributions
Did-0.009
cher-0.009
pped-0.009
 lying-0.009
ter-0.009
Token the
Token ID262
Feature activation+3.05e+00

Pos logit contributions
 breath+0.036
 common+0.035
 purpose+0.033
 job+0.031
 accomplishment+0.030
Neg logit contributions
Did-0.040
cher-0.040
pped-0.040
 lying-0.039
ter-0.039
Token beach
Token ID10481
Feature activation-1.52e-01

Pos logit contributions
cher+0.074
pped+0.072
Did+0.072
pled+0.070
ter+0.068
Neg logit contributions
 breath-0.062
 common-0.062
 purpose-0.061
 job-0.060
 own-0.057
Token,"
Token ID553
Feature activation-2.08e+00

Pos logit contributions
 breath+0.002
 common+0.002
 purpose+0.002
 accomplishment+0.002
 job+0.002
Neg logit contributions
cher-0.004
 lying-0.004
Did-0.004
ter-0.004
pped-0.004
Token said
Token ID531
Feature activation+4.94e-01

Pos logit contributions
 breath+0.045
 common+0.043
 purpose+0.040
 accomplishment+0.039
powers+0.038
Neg logit contributions
 lying-0.046
Did-0.046
ter-0.045
cher-0.044
pped-0.043
Token her
Token ID607
Feature activation-4.74e-01

Pos logit contributions
 lying+0.010
ter+0.010
cher+0.010
ting+0.009
 register+0.009
Neg logit contributions
 common-0.011
 breath-0.011
 purpose-0.010
powers-0.010
 accomplishment-0.010
TokenThe
Token ID464
Feature activation+1.18e+00

Pos logit contributions
 breath+0.038
 common+0.036
 purpose+0.034
 job+0.032
 accomplishment+0.032
Neg logit contributions
Did-0.042
pped-0.040
cher-0.040
 lying-0.039
ting-0.039
Token little
Token ID1310
Feature activation-2.55e+00

Pos logit contributions
Did+0.028
pped+0.028
cher+0.028
ting+0.027
 lying+0.027
Neg logit contributions
 breath-0.025
 common-0.024
 purpose-0.022
 job-0.022
 accomplishment-0.021
Token boy
Token ID2933
Feature activation+7.25e-01

Pos logit contributions
 breath+0.045
 common+0.045
 purpose+0.041
 job+0.038
 accomplishment+0.038
Neg logit contributions
Did-0.066
 lying-0.066
cher-0.066
ter-0.066
ting-0.065
Token worked
Token ID3111
Feature activation-5.43e-01

Pos logit contributions
Did+0.016
cher+0.015
pped+0.015
 lying+0.015
ter+0.015
Neg logit contributions
 breath-0.017
 common-0.016
 purpose-0.015
 job-0.014
 accomplishment-0.014
Token really
Token ID1107
Feature activation-3.96e+00

Pos logit contributions
 breath+0.009
 common+0.009
 purpose+0.008
 job+0.008
 accomplishment+0.007
Neg logit contributions
Did-0.015
cher-0.014
pped-0.014
 lying-0.014
ting-0.014
Token hard
Token ID1327
Feature activation-7.06e+00

Pos logit contributions
 breath+0.056
 common+0.052
 purpose+0.048
 job+0.043
 accomplishment+0.041
Neg logit contributions
 lying-0.116
pped-0.116
Did-0.115
cher-0.115
ting-0.111
Token and
Token ID290
Feature activation-1.98e+00

Pos logit contributions
 breath+0.129
 common+0.126
 purpose+0.112
 accomplishment+0.101
 job+0.101
Neg logit contributions
 lying-0.179
ting-0.176
pped-0.174
boards-0.174
cher-0.170
Token soon
Token ID2582
Feature activation-5.57e+00

Pos logit contributions
 breath+0.041
 common+0.039
 purpose+0.037
 job+0.036
 accomplishment+0.035
Neg logit contributions
Did-0.047
cher-0.046
pped-0.046
ode-0.045
ter-0.044
Token the
Token ID262
Feature activation+1.65e+00

Pos logit contributions
 breath+0.118
 common+0.112
 purpose+0.106
 job+0.101
 accomplishment+0.099
Neg logit contributions
Did-0.131
cher-0.128
pped-0.127
 lying-0.125
ter-0.124
Token toilet
Token ID16146
Feature activation+1.30e+00

Pos logit contributions
Did+0.034
pped+0.033
cher+0.033
ting+0.032
 lying+0.031
Neg logit contributions
 breath-0.041
 common-0.039
 purpose-0.038
 job-0.037
 accomplishment-0.035
Token was
Token ID373
Feature activation+1.67e+00

Pos logit contributions
Did+0.030
pped+0.028
cher+0.028
ode+0.027
ting+0.027
Neg logit contributions
 breath-0.029
 common-0.027
 job-0.027
 purpose-0.027
 work-0.026
Token everyone
Token ID2506
Feature activation+1.95e+00

Pos logit contributions
cher+0.058
ode+0.057
Did+0.055
pped+0.055
 register+0.054
Neg logit contributions
 breath-0.032
 common-0.030
 job-0.029
 purpose-0.028
 long-0.027
Token was
Token ID373
Feature activation+1.19e+00

Pos logit contributions
Did+0.042
cher+0.042
ode+0.042
pped+0.041
uit+0.040
Neg logit contributions
 breath-0.044
 common-0.041
 work-0.037
 own-0.037
 job-0.037
Token happy
Token ID3772
Feature activation-5.44e+00

Pos logit contributions
 lying+0.028
Did+0.028
pped+0.028
cher+0.028
ter+0.027
Neg logit contributions
 breath-0.024
 common-0.024
 purpose-0.022
 job-0.021
 accomplishment-0.020
Token and
Token ID290
Feature activation-2.98e+00

Pos logit contributions
 breath+0.092
 common+0.087
 purpose+0.080
 job+0.075
 accomplishment+0.072
Neg logit contributions
Did-0.151
cher-0.148
 lying-0.148
pped-0.148
ter-0.146
Token having
Token ID1719
Feature activation+2.84e+00

Pos logit contributions
 breath+0.058
 common+0.054
 purpose+0.049
 job+0.049
 accomplishment+0.048
Neg logit contributions
pped-0.077
Did-0.077
cher-0.075
ode-0.074
 register-0.073
Token fun
Token ID1257
Feature activation-6.49e+00

Pos logit contributions
Did+0.074
cher+0.074
pped+0.073
 lying+0.071
ter+0.071
Neg logit contributions
 breath-0.053
 common-0.051
 purpose-0.048
 job-0.044
 accomplishment-0.043
Token.
Token ID13
Feature activation-1.03e-01

Pos logit contributions
 breath+0.102
 common+0.095
 purpose+0.087
 job+0.081
 accomplishment+0.078
Neg logit contributions
Did-0.188
cher-0.187
pped-0.186
 lying-0.183
ter-0.181
Token Sally
Token ID25737
Feature activation+1.31e+00

Pos logit contributions
 breath+0.002
 common+0.002
 purpose+0.002
 job+0.002
 accomplishment+0.002
Neg logit contributions
Did-0.003
pped-0.003
 lying-0.003
cher-0.002
ter-0.002
Token saw
Token ID2497
Feature activation+4.68e+00

Pos logit contributions
Did+0.027
cher+0.026
pped+0.026
 lying+0.026
ter+0.025
Neg logit contributions
 breath-0.031
 common-0.030
 purpose-0.029
 job-0.027
 accomplishment-0.027
Token her
Token ID607
Feature activation+7.07e-01

Pos logit contributions
cher+0.156
ode+0.152
Did+0.140
ologist+0.139
can+0.139
Neg logit contributions
 home-0.065
 sight-0.063
 breath-0.063
 purpose-0.060
 long-0.058
Token friend
Token ID1545
Feature activation+2.17e+00

Pos logit contributions
Did+0.020
cher+0.019
 lying+0.019
pped+0.019
ter+0.019
Neg logit contributions
 breath-0.012
 common-0.011
 purpose-0.010
 job-0.009
 accomplishment-0.009
Token lesson
Token ID11483
Feature activation-4.38e+00

Pos logit contributions
Did+0.008
cher+0.008
 lying+0.008
pped+0.008
ter+0.008
Neg logit contributions
 breath-0.002
 common-0.002
 purpose-0.002
 job-0.002
 accomplishment-0.001
Token and
Token ID290
Feature activation-2.25e+00

Pos logit contributions
 breath+0.077
 common+0.075
 purpose+0.070
 job+0.067
 accomplishment+0.064
Neg logit contributions
Did-0.118
cher-0.117
pped-0.115
ode-0.112
ter-0.112
Token never
Token ID1239
Feature activation-2.81e+00

Pos logit contributions
 breath+0.053
 common+0.051
 purpose+0.048
 job+0.047
 accomplishment+0.046
Neg logit contributions
Did-0.048
cher-0.047
pped-0.047
ode-0.046
boards-0.045
Token felt
Token ID2936
Feature activation+9.39e-02

Pos logit contributions
 breath+0.066
 common+0.063
 purpose+0.059
 job+0.057
 accomplishment+0.055
Neg logit contributions
Did-0.060
cher-0.059
pped-0.058
 lying-0.058
ter-0.057
Token guilty
Token ID6717
Feature activation-3.90e+00

Pos logit contributions
Did+0.002
cher+0.002
pped+0.002
 lying+0.002
ter+0.002
Neg logit contributions
 breath-0.002
 common-0.002
 purpose-0.002
 job-0.001
 accomplishment-0.001
Token again
Token ID757
Feature activation-6.62e+00

Pos logit contributions
 breath+0.058
 purpose+0.055
 common+0.053
 job+0.051
 accomplishment+0.049
Neg logit contributions
cher-0.115
Did-0.114
ode-0.112
pped-0.112
 register-0.106
Token.
Token ID13
Feature activation+3.82e-01

Pos logit contributions
 breath+0.106
 common+0.098
 purpose+0.090
 job+0.084
 accomplishment+0.080
Neg logit contributions
Did-0.189
 lying-0.188
pped-0.185
cher-0.185
ting-0.184
Token<|endoftext|>
Token ID50256
Feature activation+9.52e-01

Pos logit contributions
Did+0.010
 lying+0.010
cher+0.010
pped+0.010
ting+0.009
Neg logit contributions
 breath-0.007
 common-0.007
 purpose-0.006
 job-0.006
 accomplishment-0.006
TokenOne
Token ID3198
Feature activation-1.13e+00

Pos logit contributions
cher+0.028
Did+0.028
ode+0.028
pped+0.028
 register+0.027
Neg logit contributions
 breath-0.014
 common-0.013
 purpose-0.012
-0.011
 job-0.011
Token day
Token ID1110
Feature activation-3.52e+00

Pos logit contributions
 breath+0.013
 common+0.012
 purpose+0.011
 accomplishment+0.009
 job+0.009
Neg logit contributions
 lying-0.037
Did-0.035
pped-0.035
ter-0.034
cher-0.034
Token,
Token ID11
Feature activation+2.19e+00

Pos logit contributions
 breath+0.052
 common+0.048
 purpose+0.045
 accomplishment+0.044
 choice+0.040
Neg logit contributions
 lying-0.106
pped-0.104
Did-0.100
cher-0.100
ter-0.100
TokenAfter
Token ID3260
Feature activation-1.49e+00

Pos logit contributions
 breath+0.047
 common+0.045
 purpose+0.043
 job+0.040
 accomplishment+0.040
Neg logit contributions
cher-0.049
Did-0.049
pped-0.049
 lying-0.049
ter-0.048
Token that
Token ID326
Feature activation+1.05e-02

Pos logit contributions
 breath+0.033
 common+0.030
 purpose+0.029
 long+0.029
 job+0.029
Neg logit contributions
ode-0.035
cher-0.034
Did-0.033
 register-0.033
pped-0.033
Token,
Token ID11
Feature activation+4.39e-02

Pos logit contributions
 lying+0.000
ting+0.000
ter+0.000
Did+0.000
 Why+0.000
Neg logit contributions
 breath-0.000
 common-0.000
powers-0.000
 purpose-0.000
 own-0.000
Token they
Token ID484
Feature activation-6.75e-01

Pos logit contributions
Did+0.001
cher+0.001
pped+0.001
 lying+0.001
ter+0.001
Neg logit contributions
 breath-0.001
 common-0.001
 purpose-0.001
 job-0.001
 accomplishment-0.001
Token continued
Token ID3767
Feature activation-1.36e+00

Pos logit contributions
 breath+0.016
 common+0.014
 home+0.014
 work+0.014
 happily+0.013
Neg logit contributions
Did-0.014
cher-0.014
boards-0.014
 register-0.014
 lying-0.014
Token on
Token ID319
Feature activation-6.28e+00

Pos logit contributions
 common+0.030
 breath+0.029
 purpose+0.028
 choice+0.027
+0.026
Neg logit contributions
 lying-0.030
ting-0.028
Did-0.028
pped-0.027
cher-0.026
Token their
Token ID511
Feature activation-1.60e+00

Pos logit contributions
 breath+0.119
 common+0.114
 purpose+0.107
+0.101
 job+0.100
Neg logit contributions
 lying-0.155
Did-0.149
pped-0.145
ting-0.142
pled-0.136
Token adventure
Token ID8855
Feature activation-2.64e+00

Pos logit contributions
 common+0.026
 breath+0.025
 purpose+0.023
+0.022
eth+0.021
Neg logit contributions
 lying-0.041
ode-0.038
Did-0.037
 register-0.036
ting-0.036
Token and
Token ID290
Feature activation-2.08e+00

Pos logit contributions
 breath+0.041
 common+0.040
 purpose+0.036
 job+0.032
 accomplishment+0.031
Neg logit contributions
 lying-0.074
ting-0.073
cher-0.073
Did-0.072
pped-0.072
Token found
Token ID1043
Feature activation+4.82e+00

Pos logit contributions
 breath+0.047
 common+0.045
 job+0.041
 purpose+0.041
 accomplishment+0.040
Neg logit contributions
pped-0.047
cher-0.046
Did-0.046
ode-0.045
 register-0.045
Token the
Token ID262
Feature activation+2.40e+00

Pos logit contributions
Did+0.136
ode+0.135
pped+0.135
cher+0.131
can+0.127
Neg logit contributions
 breath-0.086
 home-0.079
 common-0.079
 purpose-0.078
 job-0.075
Token that
Token ID326
Feature activation+2.49e+00

Pos logit contributions
Did+0.132
ode+0.130
pped+0.128
cher+0.127
 register+0.122
Neg logit contributions
 breath-0.097
 common-0.089
 purpose-0.088
 home-0.085
 sight-0.084
Token giving
Token ID3501
Feature activation+4.23e+00

Pos logit contributions
pped+0.064
cher+0.064
Did+0.063
ode+0.061
 register+0.061
Neg logit contributions
 breath-0.048
 common-0.046
 purpose-0.044
 job-0.041
 accomplishment-0.040
Token peace
Token ID4167
Feature activation-2.34e+00

Pos logit contributions
Did+0.109
cher+0.107
pped+0.106
ode+0.103
 register+0.103
Neg logit contributions
 breath-0.082
 common-0.078
 purpose-0.074
 job-0.069
 accomplishment-0.066
Token was
Token ID373
Feature activation+1.57e+00

Pos logit contributions
 breath+0.037
 common+0.035
 purpose+0.032
 job+0.030
 accomplishment+0.028
Neg logit contributions
Did-0.068
pped-0.068
cher-0.067
 register-0.065
 lying-0.065
Token even
Token ID772
Feature activation-2.95e+00

Pos logit contributions
Did+0.036
cher+0.036
ode+0.035
pped+0.035
boards+0.035
Neg logit contributions
 breath-0.034
 common-0.033
 purpose-0.031
 job-0.030
 accomplishment-0.029
Token better
Token ID1365
Feature activation-6.82e+00

Pos logit contributions
 breath+0.061
 common+0.060
 purpose+0.054
 job+0.054
 accomplishment+0.053
Neg logit contributions
pped-0.071
cher-0.069
Did-0.069
 register-0.067
boards-0.067
Token.
Token ID13
Feature activation+2.71e-01

Pos logit contributions
 breath+0.099
 common+0.090
 purpose+0.083
 accomplishment+0.074
 job+0.074
Neg logit contributions
 lying-0.200
Did-0.200
cher-0.198
ter-0.197
ting-0.196
Token She
Token ID1375
Feature activation-1.74e-01

Pos logit contributions
Did+0.007
 lying+0.006
cher+0.006
pped+0.006
ting+0.006
Neg logit contributions
 breath-0.006
 common-0.005
 purpose-0.005
 job-0.005
 accomplishment-0.004
Token had
Token ID550
Feature activation+2.59e+00

Pos logit contributions
 breath+0.004
 common+0.004
 purpose+0.004
 accomplishment+0.004
 own+0.004
Neg logit contributions
Did-0.004
cher-0.003
pped-0.003
ting-0.003
 register-0.003
Token made
Token ID925
Feature activation+3.21e+00

Pos logit contributions
cher+0.050
Did+0.050
pped+0.049
ter+0.048
ting+0.048
Neg logit contributions
 breath-0.066
 common-0.063
 purpose-0.060
 job-0.057
 accomplishment-0.056
Token the
Token ID262
Feature activation+2.41e+00

Pos logit contributions
Did+0.067
cher+0.066
pped+0.065
 lying+0.064
ter+0.063
Neg logit contributions
 breath-0.076
 common-0.073
 purpose-0.069
 job-0.066
 accomplishment-0.064