TOP ACTIVATIONS
MAX = 0.128

but be careful . Ost ric hes can be gr umpy
ostr ich pen . Ost ric hes are big birds with
ostr ich pen . Ost ric hes are big birds with
should not tease the ost ric hes . They are not
. They were called ost ric hes . Lily and Ben
cried . " The ost ric hes are mean ! They
too close . The ost ric hes might pe ck you
and run to the ost ric hes . "
and looked at the ost ric hes . They saw one
shouted . " The ost ric hes are so funny !"
bitter and made her face scr unch up .
wanted to touch the ost ric hes . They reached out
ie . Unlike other ost ric hes , his feathers gl
mom and to the ost ric hes . They learned their
you go , enjoy your pizz as ." "
But the ost ric hes did not like that
Yes , they are ost ric hes ," mom said .
. He was carrying their pizz as on a big tray
They do not pack ost ric hes anymore . They pack
. They made their own pizz as with their favorite things

MIN ACTIVATIONS
MIN = -0.118

day ever ! <|endoftext|> Mor ris was very excited . Today
up high , beyond where Little Ted could see . Ted
from now on . <|endoftext|> Little Jack saw something special on
show off her new <|endoftext|> Little Jimmy was very proud of
the perfect present ! <|endoftext|> Little John was very excited for
get back soon !" <|endoftext|> Little Dan was playing in the
The End . <|endoftext|> Little Tommy was so persistent !
in the sky ! <|endoftext|> Little Jake was a happy ,
The end . <|endoftext|> Little Jake was so excited !
began to set . <|endoftext|> Little Johnny was so excited .
didn 't return . <|endoftext|> Little John spent his days playing
a bad ending . <|endoftext|> Little Sandra was so excited that
worked very well ! <|endoftext|> Little Jack and his mum were
ever happening again . <|endoftext|> Little Tommy was getting ready for
The end . <|endoftext|> Little Max was playing in the
and hopped away . <|endoftext|> Little Harry was very curious .
to support him . <|endoftext|> Little Ralph ie was very excited
on the ground . <|endoftext|> Little Tim my was playing by
the zoo again . <|endoftext|> Little Cor a was a graceful
across the land . <|endoftext|> Little Jack was playing in the

INTERVAL 0.067 - 0.128
CONTAINS 3.928%

so she was scared . Then , a fish
floor was slippery . This made her g
in , please . This is my daughter , Lily .
with her mom . " Dear Sara , I
. I 'm glad you are not hurt . But you

INTERVAL 0.005 - 0.067
CONTAINS 30.042%

voice on it ? You did not ask me . You
my sack and when I come back , we will be
. She is not a robot . She is a princess
time there was a bar ber who lived in a small
books he lf and found some more books . She grabbed

INTERVAL -0.056 - 0.005
CONTAINS 65.077%

serious face . She looked at the bracelet and saw how
might be from something , but she couldn 't tell .
. <|endoftext|> Once upon a time , there was a little
they got her license from the car .
that there are kind people who can help restore them if

INTERVAL -0.118 - -0.056
CONTAINS 0.953%

to spread it around . Together , they built a beautiful
. The wealthy farmer saw his opportunity to
are friends . The moral of the story is to be
dish washer ." Together they opened the mysterious dish
. The duck follows their ship . L ila
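
NOTE ON THE INTERVALS
The four interval edges above (-0.118, -0.056, 0.005, 0.067, 0.128) look consistent with splitting the range from MIN to MAX into four equal-width bins, with each CONTAINS value giving the share of tokens whose activation falls in that bin. This is an inference from the numbers shown, not something the dashboard states. The sketch below illustrates that reading; the function names and sample values are hypothetical.

```python
def equal_width_bins(lo, hi, n_bins):
    """Return n_bins (start, end) pairs of equal width covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]


def bin_percentages(values, lo, hi, n_bins):
    """Percentage of values in each equal-width bin (lowest bin first).

    Assumes every value lies in [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so that v == hi is counted in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [100.0 * c / len(values) for c in counts]


# MIN and MAX as displayed in the dashboard (already rounded to 3 decimals).
bins = equal_width_bins(-0.118, 0.128, 4)

# Hypothetical activations, for illustration only.
sample = [-0.118, -0.1, 0.0, 0.01, 0.128]
percentages = bin_percentages(sample, -0.118, 0.128, 4)
```

With the displayed MIN and MAX the computed inner edges are about -0.0565, 0.005 and 0.0665, which agree with the listed edges up to rounding. The dashboard lists the bins from highest to lowest, whereas the sketch returns them lowest first.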
Token but
Token ID475
Feature activation-1.75e-03

Pos logit contributions
ened+0.047
cy+0.043
en+0.039
neath+0.038
ted+0.038
Neg logit contributions
ggy-0.047
 badly-0.041
vern-0.040
 liar-0.039
ggie-0.037
Token be
Token ID307
Feature activation+6.59e-02

Pos logit contributions
ened+0.004
cy+0.003
en+0.003
ted+0.003
neath+0.003
Neg logit contributions
ggy-0.003
 badly-0.003
vern-0.003
 liar-0.003
vere-0.003
Token careful
Token ID8161
Feature activation-1.54e-02

Pos logit contributions
ggy+0.107
vern+0.095
 badly+0.094
 liar+0.093
vere+0.090
Neg logit contributions
ened-0.141
cy-0.134
 1-0.128
en-0.127
 Sarah-0.127
Token.
Token ID13
Feature activation-6.35e-03

Pos logit contributions
ened+0.027
cy+0.024
en+0.022
ted+0.022
 Sarah+0.021
Neg logit contributions
ggy-0.036
 badly-0.032
vern-0.032
 liar-0.031
vere-0.029
Token Ost
Token ID38919
Feature activation-1.83e-02

Pos logit contributions
ened+0.010
cy+0.008
en+0.007
ted+0.007
 Sarah+0.007
Neg logit contributions
ggy-0.016
 badly-0.015
vern-0.015
 liar-0.014
unch-0.014
Tokenric
Token ID1173
Feature activation+1.28e-01

Pos logit contributions
ened+0.038
cy+0.034
en+0.031
ted+0.031
 Sarah+0.031
Neg logit contributions
ggy-0.037
 badly-0.033
vern-0.032
 liar-0.031
vere-0.029
Tokenhes
Token ID956
Feature activation+1.93e-03

Pos logit contributions
ggy+0.173
 badly+0.169
vern+0.158
 liar+0.149
vere+0.135
Neg logit contributions
ened-0.304
 means-0.285
cy-0.283
ted-0.278
 Let-0.277
Token can
Token ID460
Feature activation+5.20e-02

Pos logit contributions
ggy+0.005
 badly+0.004
vern+0.004
 liar+0.004
ggie+0.004
Neg logit contributions
ened-0.003
cy-0.003
 Jenna-0.003
 Sarah-0.003
ted-0.003
Token be
Token ID307
Feature activation+7.32e-02

Pos logit contributions
ggy+0.109
vern+0.105
 liar+0.103
 badly+0.100
vere+0.097
Neg logit contributions
ened-0.085
cy-0.082
en-0.078
neath-0.075
ted-0.075
Token gr
Token ID1036
Feature activation+3.59e-02

Pos logit contributions
ggy+0.127
vern+0.114
 badly+0.109
 liar+0.107
vere+0.104
Neg logit contributions
ened-0.167
cy-0.152
en-0.143
ted-0.141
 Sarah-0.138
Tokenumpy
Token ID32152
Feature activation+6.79e-05

Pos logit contributions
ggy+0.067
 badly+0.059
vern+0.058
 liar+0.057
gh+0.054
Neg logit contributions
ened-0.077
en-0.072
cy-0.072
elling-0.067
 means-0.064
Token ostr
Token ID42387
Feature activation+1.86e-02

Pos logit contributions
ened+0.029
cy+0.026
en+0.024
ted+0.023
 Sarah+0.023
Neg logit contributions
ggy-0.033
 badly-0.029
vern-0.029
 liar-0.028
vere-0.027
Tokenich
Token ID488
Feature activation+1.46e-02

Pos logit contributions
ggy+0.027
vern+0.027
 badly+0.024
 liar+0.024
 gag+0.022
Neg logit contributions
cy-0.042
ened-0.042
ted-0.039
en-0.038
als-0.038
Token pen
Token ID3112
Feature activation+2.56e-02

Pos logit contributions
ggy+0.035
 badly+0.032
vern+0.030
 liar+0.029
ggie+0.029
Neg logit contributions
ened-0.025
cy-0.021
en-0.019
neath-0.019
 Jenna-0.019
Token.
Token ID13
Feature activation-1.45e-02

Pos logit contributions
ggy+0.070
vern+0.064
 badly+0.064
 liar+0.062
vere+0.060
Neg logit contributions
ened-0.033
cy-0.028
en-0.025
ted-0.025
 Let-0.024
Token Ost
Token ID38919
Feature activation-2.21e-02

Pos logit contributions
ened+0.027
en+0.022
cy+0.022
ening+0.020
les+0.020
Neg logit contributions
ggy-0.033
 badly-0.029
 liar-0.028
 much-0.027
 not-0.027
Tokenric
Token ID1173
Feature activation+1.26e-01

Pos logit contributions
ened+0.047
cy+0.042
ted+0.038
en+0.038
 Sarah+0.038
Neg logit contributions
ggy-0.044
 badly-0.039
vern-0.038
 liar-0.036
ggie-0.035
Tokenhes
Token ID956
Feature activation+6.43e-03

Pos logit contributions
 badly+0.176
ggy+0.175
vern+0.168
 liar+0.157
vere+0.144
Neg logit contributions
ened-0.283
 means-0.266
cy-0.264
ted-0.261
 smiles-0.261
Token are
Token ID389
Feature activation+8.03e-02

Pos logit contributions
ggy+0.015
 badly+0.014
vern+0.013
 liar+0.013
ggie+0.012
Neg logit contributions
ened-0.011
cy-0.010
 Jenna-0.009
ted-0.009
 Sarah-0.009
Token big
Token ID1263
Feature activation+2.46e-04

Pos logit contributions
ggy+0.145
vern+0.132
 badly+0.125
 liar+0.122
vere+0.119
Neg logit contributions
ened-0.172
cy-0.152
 Sarah-0.147
en-0.147
 1-0.146
Token birds
Token ID10087
Feature activation+4.67e-02

Pos logit contributions
ggy+0.001
 badly+0.001
 liar+0.001
vern+0.000
 gag+0.000
Neg logit contributions
ened-0.000
cy-0.000
neath-0.000
 Jenna-0.000
 Sarah-0.000
Token with
Token ID351
Feature activation+9.85e-03

Pos logit contributions
ggy+0.119
vern+0.111
 liar+0.110
 badly+0.107
vere+0.104
Neg logit contributions
ened-0.064
cy-0.059
en-0.055
 Let-0.050
ted-0.049
Token ostr
Token ID42387
Feature activation+2.05e-02

Pos logit contributions
ened+0.024
cy+0.021
en+0.020
ted+0.019
 Sarah+0.019
Neg logit contributions
ggy-0.028
 badly-0.025
vern-0.025
 liar-0.024
vere-0.023
Tokenich
Token ID488
Feature activation+1.41e-02

Pos logit contributions
ggy+0.030
vern+0.030
 badly+0.027
 liar+0.027
 gag+0.025
Neg logit contributions
cy-0.047
ened-0.046
ted-0.043
en-0.042
als-0.042
Token pen
Token ID3112
Feature activation+2.32e-02

Pos logit contributions
ggy+0.032
 badly+0.029
vern+0.027
 liar+0.026
ggie+0.026
Neg logit contributions
ened-0.027
cy-0.022
neath-0.020
 Jenna-0.020
en-0.020
Token.
Token ID13
Feature activation-1.95e-02

Pos logit contributions
ggy+0.060
 badly+0.055
vern+0.055
 liar+0.053
vere+0.051
Neg logit contributions
ened-0.034
cy-0.029
en-0.026
ted-0.026
 Sarah-0.025
Token Ost
Token ID38919
Feature activation-2.24e-02

Pos logit contributions
ened+0.036
cy+0.030
en+0.030
ening+0.027
les+0.026
Neg logit contributions
ggy-0.044
 badly-0.039
 liar-0.038
unch-0.036
 much-0.036
Tokenric
Token ID1173
Feature activation+1.25e-01

Pos logit contributions
ened+0.048
cy+0.042
ted+0.039
 Sarah+0.038
 Let+0.038
Neg logit contributions
ggy-0.044
 badly-0.038
vern-0.038
 liar-0.036
ggie-0.035
Tokenhes
Token ID956
Feature activation+3.80e-03

Pos logit contributions
ggy+0.172
 badly+0.171
vern+0.165
 liar+0.154
vere+0.141
Neg logit contributions
ened-0.286
cy-0.264
neath-0.262
ted-0.262
 means-0.261
Token are
Token ID389
Feature activation+7.72e-02

Pos logit contributions
ggy+0.009
 badly+0.008
 liar+0.007
ggie+0.007
vern+0.007
Neg logit contributions
ened-0.007
cy-0.006
 Jenna-0.006
 Sarah-0.005
ted-0.005
Token big
Token ID1263
Feature activation-1.95e-03

Pos logit contributions
ggy+0.139
vern+0.127
 badly+0.120
 liar+0.116
vere+0.114
Neg logit contributions
ened-0.169
cy-0.151
 Sarah-0.145
en-0.144
 1-0.142
Token birds
Token ID10087
Feature activation+4.40e-02

Pos logit contributions
ened+0.003
cy+0.003
neath+0.003
 Jenna+0.003
 First+0.003
Neg logit contributions
ggy-0.004
 badly-0.004
 liar-0.004
vern-0.003
 choke-0.003
Token with
Token ID351
Feature activation+8.69e-03

Pos logit contributions
ggy+0.116
vern+0.108
 liar+0.106
 badly+0.104
vere+0.101
Neg logit contributions
ened-0.059
cy-0.052
en-0.049
ted-0.044
 Sarah-0.043
Token should
Token ID815
Feature activation+3.11e-02

Pos logit contributions
ggy+0.032
 badly+0.030
vern+0.029
 liar+0.028
vere+0.026
Neg logit contributions
ened-0.028
cy-0.025
en-0.023
ted-0.023
neath-0.022
Token not
Token ID407
Feature activation+4.37e-02

Pos logit contributions
ggy+0.055
vern+0.049
 badly+0.048
 liar+0.047
vere+0.047
Neg logit contributions
ened-0.066
cy-0.062
en-0.060
 Sarah-0.059
 Let-0.058
Token tease
Token ID34691
Feature activation-3.23e-02

Pos logit contributions
ggy+0.105
vern+0.102
 liar+0.094
vere+0.094
 badly+0.093
Neg logit contributions
ened-0.069
cy-0.064
en-0.060
 Let-0.057
 Sarah-0.055
Token the
Token ID262
Feature activation-2.17e-03

Pos logit contributions
ened+0.060
cy+0.053
neath+0.048
 Jenna+0.048
ted+0.046
Neg logit contributions
ggy-0.073
 badly-0.067
 liar-0.061
vern-0.061
ggie-0.059
Token ost
Token ID23619
Feature activation+2.61e-02

Pos logit contributions
ened+0.004
cy+0.004
en+0.004
ted+0.003
 Let+0.003
Neg logit contributions
ggy-0.005
vern-0.004
 badly-0.004
 liar-0.004
vere-0.004
Tokenric
Token ID1173
Feature activation+1.24e-01

Pos logit contributions
ggy+0.054
 badly+0.048
vern+0.047
 liar+0.045
vere+0.043
Neg logit contributions
ened-0.053
cy-0.048
en-0.044
ted-0.044
neath-0.043
Tokenhes
Token ID956
Feature activation-1.27e-03

Pos logit contributions
ggy+0.156
 badly+0.143
vern+0.135
 liar+0.127
vere+0.115
Neg logit contributions
ened-0.322
cy-0.301
ted-0.292
en-0.288
 means-0.286
Token.
Token ID13
Feature activation-5.97e-03

Pos logit contributions
ened+0.003
cy+0.003
neath+0.002
 Jenna+0.002
ted+0.002
Neg logit contributions
ggy-0.003
 badly-0.002
 liar-0.002
vern-0.002
 not-0.002
Token They
Token ID1119
Feature activation+4.56e-03

Pos logit contributions
ened+0.009
cy+0.007
ted+0.007
en+0.006
neath+0.006
Neg logit contributions
ggy-0.016
 badly-0.014
vern-0.014
 liar-0.014
vere-0.013
Token are
Token ID389
Feature activation+7.67e-02

Pos logit contributions
ggy+0.010
 badly+0.009
vern+0.009
 liar+0.009
vere+0.008
Neg logit contributions
ened-0.008
cy-0.007
en-0.007
ted-0.007
 Sarah-0.007
Token not
Token ID407
Feature activation+6.38e-02

Pos logit contributions
ggy+0.102
vern+0.085
 badly+0.082
 liar+0.077
vere+0.074
Neg logit contributions
ened-0.206
cy-0.189
en-0.181
 Sarah-0.181
ted-0.178
Token.
Token ID13
Feature activation+4.03e-04

Pos logit contributions
ggy+0.075
 badly+0.067
vern+0.067
 liar+0.064
vere+0.061
Neg logit contributions
ened-0.062
cy-0.055
en-0.051
ted-0.050
 Sarah-0.049
Token They
Token ID1119
Feature activation+4.14e-03

Pos logit contributions
ggy+0.001
 badly+0.001
 liar+0.001
 much+0.001
 not+0.001
Neg logit contributions
ened-0.001
cy-0.001
en-0.001
ted-0.001
ening-0.001
Token were
Token ID547
Feature activation+6.89e-02

Pos logit contributions
ggy+0.009
 badly+0.008
 liar+0.008
vern+0.008
 not+0.008
Neg logit contributions
ened-0.007
cy-0.006
elling-0.006
en-0.006
 Let-0.006
Token called
Token ID1444
Feature activation-1.75e-02

Pos logit contributions
ggy+0.125
vern+0.112
 badly+0.110
 liar+0.104
vere+0.099
Neg logit contributions
ened-0.154
cy-0.139
en-0.132
 Sarah-0.131
ted-0.129
Token ost
Token ID23619
Feature activation+8.73e-03

Pos logit contributions
ened+0.036
cy+0.033
en+0.030
neath+0.028
ted+0.028
Neg logit contributions
ggy-0.037
 badly-0.033
vern-0.031
 liar-0.031
ggie-0.029
Tokenric
Token ID1173
Feature activation+1.21e-01

Pos logit contributions
ggy+0.016
 badly+0.014
vern+0.014
 liar+0.012
ggie+0.012
Neg logit contributions
ened-0.020
cy-0.017
 Let-0.016
 Sarah-0.016
 Jenny-0.016
Tokenhes
Token ID956
Feature activation-1.55e-03

Pos logit contributions
ggy+0.162
 badly+0.148
vern+0.143
 liar+0.133
vere+0.120
Neg logit contributions
ened-0.306
cy-0.280
ted-0.274
en-0.273
neath-0.272
Token.
Token ID13
Feature activation+8.61e-04

Pos logit contributions
ened+0.003
cy+0.003
 Jenna+0.002
neath+0.002
 Let+0.002
Neg logit contributions
ggy-0.004
 badly-0.003
ggie-0.003
 liar-0.003
gh-0.003
Token Lily
Token ID20037
Feature activation-2.49e-02

Pos logit contributions
ggy+0.002
 badly+0.002
 liar+0.002
vern+0.002
unch+0.002
Neg logit contributions
ened-0.002
cy-0.001
en-0.001
ted-0.001
ening-0.001
Token and
Token ID290
Feature activation-2.20e-02

Pos logit contributions
ened+0.044
elling+0.039
neath+0.037
cy+0.037
 Jenna+0.036
Neg logit contributions
ggy-0.060
 badly-0.049
ggie-0.048
 not-0.047
 liar-0.044
Token Ben
Token ID3932
Feature activation-9.01e-03

Pos logit contributions
ened+0.035
cy+0.031
en+0.028
 Let+0.028
neath+0.027
Neg logit contributions
ggy-0.056
 badly-0.049
 liar-0.048
ggie-0.046
vern-0.046
Token cried
Token ID16896
Feature activation-2.39e-03

Pos logit contributions
ened+0.023
cy+0.020
ted+0.019
en+0.019
 Sarah+0.018
Neg logit contributions
ggy-0.034
vern-0.032
 badly-0.031
 liar-0.030
gh-0.029
Token.
Token ID13
Feature activation-9.33e-03

Pos logit contributions
ened+0.004
cy+0.004
en+0.003
ted+0.003
 Sarah+0.003
Neg logit contributions
ggy-0.006
vern-0.005
 badly-0.005
 liar-0.005
ggie-0.005
Token "
Token ID366
Feature activation+2.30e-03

Pos logit contributions
ened+0.015
cy+0.013
en+0.012
neath+0.011
ted+0.011
Neg logit contributions
ggy-0.024
 badly-0.022
 liar-0.021
vern-0.021
vere-0.020
TokenThe
Token ID464
Feature activation-7.43e-03

Pos logit contributions
ggy+0.005
vern+0.005
 badly+0.005
 liar+0.005
ggie+0.005
Neg logit contributions
cy-0.004
ened-0.003
neath-0.003
ted-0.003
 Let-0.003
Token ost
Token ID23619
Feature activation+1.81e-02

Pos logit contributions
ened+0.013
cy+0.011
ted+0.011
 Girl+0.011
 Let+0.010
Neg logit contributions
ggy-0.016
vern-0.015
 badly-0.014
unch-0.013
vere-0.013
Tokenric
Token ID1173
Feature activation+1.20e-01

Pos logit contributions
ggy+0.037
 badly+0.033
vern+0.032
 liar+0.031
vere+0.029
Neg logit contributions
ened-0.038
cy-0.034
en-0.031
ted-0.031
 Sarah-0.031
Tokenhes
Token ID956
Feature activation-7.96e-03

Pos logit contributions
ggy+0.175
 badly+0.172
vern+0.172
 liar+0.154
vere+0.144
Neg logit contributions
ened-0.262
 means-0.248
ted-0.243
cy-0.241
 sets-0.238
Token are
Token ID389
Feature activation+6.92e-02

Pos logit contributions
ened+0.014
cy+0.012
 Jenna+0.011
ted+0.011
en+0.011
Neg logit contributions
ggy-0.019
 badly-0.017
 liar-0.016
vern-0.016
ggie-0.015
Token mean
Token ID1612
Feature activation-1.19e-02

Pos logit contributions
ggy+0.109
vern+0.099
 liar+0.088
 badly+0.088
unch+0.085
Neg logit contributions
ened-0.168
cy-0.143
en-0.143
 Sarah-0.142
ted-0.141
Token!
Token ID0
Feature activation+6.56e-03

Pos logit contributions
ened+0.023
cy+0.021
neath+0.020
 Jenna+0.019
ted+0.019
Neg logit contributions
ggy-0.026
 badly-0.023
vern-0.022
 liar-0.022
vere-0.020
Token They
Token ID1119
Feature activation+1.16e-02

Pos logit contributions
ggy+0.018
 badly+0.016
 liar+0.015
vern+0.015
unch+0.014
Neg logit contributions
ened-0.010
cy-0.008
en-0.007
ted-0.007
neath-0.007
Token too
Token ID1165
Feature activation+5.90e-02

Pos logit contributions
ggy+0.016
vern+0.014
 badly+0.014
 liar+0.013
vere+0.013
Neg logit contributions
ened-0.011
cy-0.010
en-0.009
ted-0.009
 Sarah-0.009
Token close
Token ID1969
Feature activation-4.45e-02

Pos logit contributions
ggy+0.114
vern+0.101
 liar+0.100
gh+0.096
unch+0.092
Neg logit contributions
ened-0.118
cy-0.116
 Sarah-0.105
en-0.102
 Jenny-0.099
Token.
Token ID13
Feature activation-7.45e-03

Pos logit contributions
ened+0.088
cy+0.079
 Girl+0.074
ted+0.073
neath+0.072
Neg logit contributions
ggy-0.089
 badly-0.088
 not-0.077
vern-0.076
gh-0.069
Token The
Token ID383
Feature activation-1.73e-02

Pos logit contributions
ened+0.013
en+0.011
cy+0.010
 Jenna+0.010
ening+0.009
Neg logit contributions
ggy-0.017
 badly-0.016
 liar-0.015
ggie-0.014
vern-0.014
Token ost
Token ID23619
Feature activation+1.09e-02

Pos logit contributions
ened+0.031
cy+0.028
en+0.025
ted+0.025
neath+0.024
Neg logit contributions
ggy-0.040
 badly-0.036
vern-0.036
 liar-0.034
vere-0.033
Tokenric
Token ID1173
Feature activation+1.18e-01

Pos logit contributions
ggy+0.020
 badly+0.017
vern+0.017
 liar+0.016
ggie+0.015
Neg logit contributions
ened-0.025
cy-0.022
 Sarah-0.021
ted-0.020
 Jenny-0.020
Tokenhes
Token ID956
Feature activation-1.47e-02

Pos logit contributions
ggy+0.149
 badly+0.138
vern+0.134
 liar+0.122
vere+0.109
Neg logit contributions
ened-0.304
cy-0.285
ted-0.277
 means-0.274
 Let-0.274
Token might
Token ID1244
Feature activation+5.34e-02

Pos logit contributions
ened+0.029
cy+0.025
 Jenna+0.024
 Sarah+0.023
 Girl+0.023
Neg logit contributions
ggy-0.033
 badly-0.029
 liar-0.026
ggie-0.026
vern-0.025
Token pe
Token ID613
Feature activation+4.30e-02

Pos logit contributions
ggy+0.122
vern+0.112
 liar+0.110
oll+0.109
 badly+0.107
Neg logit contributions
ened-0.078
cy-0.074
en-0.073
 Let-0.067
 means-0.067
Tokenck
Token ID694
Feature activation-1.84e-02

Pos logit contributions
ggy+0.064
 badly+0.056
vern+0.056
 liar+0.056
gh+0.048
Neg logit contributions
ened-0.108
cy-0.098
en-0.098
ted-0.093
 means-0.091
Token you
Token ID345
Feature activation+1.84e-03

Pos logit contributions
ened+0.032
cy+0.030
 Jenna+0.029
 Sarah+0.029
neath+0.029
Neg logit contributions
ggy-0.038
 badly-0.036
 liar-0.032
vern-0.031
ggie-0.030
Token and
Token ID290
Feature activation-2.79e-02

Pos logit contributions
ggy+0.024
 badly+0.023
vern+0.021
 not+0.021
vere+0.020
Neg logit contributions
ened-0.019
cy-0.017
 Jenna-0.016
 Girl-0.015
 Tina-0.015
Token run
Token ID1057
Feature activation-1.85e-02

Pos logit contributions
ened+0.061
cy+0.053
elling+0.051
 Jenna+0.051
en+0.050
Neg logit contributions
ggy-0.050
 badly-0.043
 not-0.041
 choke-0.040
vere-0.040
Token to
Token ID284
Feature activation-2.85e-02

Pos logit contributions
ened+0.032
cy+0.027
 Jenna+0.025
neath+0.025
en+0.025
Neg logit contributions
ggy-0.044
 badly-0.041
vern-0.039
 liar-0.038
vere-0.036
Token the
Token ID262
Feature activation-1.56e-02

Pos logit contributions
ened+0.036
cy+0.029
ted+0.026
en+0.025
neath+0.024
Neg logit contributions
ggy-0.082
 badly-0.075
vern-0.073
 liar-0.071
ggie-0.070
Token ost
Token ID23619
Feature activation+1.19e-02

Pos logit contributions
ened+0.026
cy+0.023
ted+0.021
en+0.021
 Let+0.020
Neg logit contributions
ggy-0.038
 badly-0.034
vern-0.034
 liar-0.032
vere-0.031
Tokenric
Token ID1173
Feature activation+1.17e-01

Pos logit contributions
ggy+0.025
 badly+0.022
vern+0.022
 liar+0.021
vere+0.020
Neg logit contributions
ened-0.024
cy-0.021
en-0.020
ted-0.019
neath-0.019
Tokenhes
Token ID956
Feature activation-8.01e-03

Pos logit contributions
 badly+0.155
ggy+0.154
vern+0.144
 liar+0.140
vere+0.122
Neg logit contributions
ened-0.278
cy-0.261
ted-0.258
neath-0.255
 means-0.255
Token.
Token ID13
Feature activation-3.34e-03

Pos logit contributions
ened+0.017
cy+0.016
 Girl+0.015
neath+0.015
 Jenna+0.015
Neg logit contributions
ggy-0.016
 badly-0.014
vern-0.013
ggie-0.012
 liar-0.012
Token
Token ID198
Feature activation+9.36e-02

Pos logit contributions
ened+0.006
en+0.005
cy+0.005
ening+0.005
 Girl+0.005
Neg logit contributions
ggy-0.008
 badly-0.007
 not-0.007
 liar-0.007
vern-0.007
Token
Token ID198
Feature activation+7.34e-02

Pos logit contributions
ggy+0.231
 badly+0.216
 liar+0.216
vern+0.212
vere+0.200
Neg logit contributions
ened-0.128
cy-0.120
 Let-0.111
 Sarah-0.111
neath-0.108
Token"
Token ID1
Feature activation-3.80e-03

Pos logit contributions
ggy+0.160
vern+0.147
 badly+0.147
 liar+0.144
vere+0.135
Neg logit contributions
ened-0.126
cy-0.120
 Let-0.108
neath-0.107
ted-0.106
Token and
Token ID290
Feature activation-1.52e-02

Pos logit contributions
ggy+0.009
 badly+0.008
vern+0.008
 liar+0.008
vere+0.008
Neg logit contributions
ened-0.006
cy-0.006
en-0.005
ted-0.005
 Sarah-0.005
Token looked
Token ID3114
Feature activation-1.13e-05

Pos logit contributions
ened+0.026
elling+0.024
cy+0.022
en+0.021
 Let+0.021
Neg logit contributions
ggy-0.034
 badly-0.031
 not-0.029
 gag-0.028
ggie-0.027
Token at
Token ID379
Feature activation-2.86e-02

Pos logit contributions
ened+0.000
cy+0.000
neath+0.000
 Let+0.000
en+0.000
Neg logit contributions
 badly-0.000
ggy-0.000
vern-0.000
vere-0.000
 much-0.000
Token the
Token ID262
Feature activation-1.08e-02

Pos logit contributions
ened+0.044
 Let+0.036
 draws+0.034
cy+0.034
 Jenna+0.034
Neg logit contributions
ggy-0.070
 badly-0.068
vern-0.058
ggie-0.057
 liar-0.057
Token ost
Token ID23619
Feature activation+1.25e-02

Pos logit contributions
ened+0.021
cy+0.019
en+0.018
ted+0.017
neath+0.017
Neg logit contributions
ggy-0.023
vern-0.021
 badly-0.021
 liar-0.020
vere-0.019
Tokenric
Token ID1173
Feature activation+1.15e-01

Pos logit contributions
ggy+0.025
 badly+0.022
vern+0.021
 liar+0.020
vere+0.019
Neg logit contributions
ened-0.026
cy-0.024
en-0.022
ted-0.022
 Sarah-0.022
Tokenhes
Token ID956
Feature activation-9.93e-03

Pos logit contributions
ggy+0.146
 badly+0.140
vern+0.133
 liar+0.126
vere+0.111
Neg logit contributions
ened-0.293
cy-0.272
ted-0.267
neath-0.265
en-0.262
Token.
Token ID13
Feature activation-2.07e-02

Pos logit contributions
ened+0.022
cy+0.021
 Jenna+0.019
 exp+0.019
neath+0.019
Neg logit contributions
ggy-0.019
 badly-0.016
vern-0.014
ggie-0.014
gh-0.014
Token They
Token ID1119
Feature activation-1.25e-03

Pos logit contributions
ened+0.038
cy+0.035
en+0.033
ted+0.030
isher+0.029
Neg logit contributions
ggy-0.040
 badly-0.040
 gag-0.038
 liar-0.038
 not-0.037
Token saw
Token ID2497
Feature activation-1.60e-02

Pos logit contributions
ened+0.002
cy+0.002
en+0.002
 Let+0.002
ted+0.001
Neg logit contributions
ggy-0.003
 badly-0.003
vern-0.003
 liar-0.003
vere-0.003
Token one
Token ID530
Feature activation+1.94e-02

Pos logit contributions
ened+0.032
cy+0.030
en+0.027
 Let+0.026
neath+0.026
Neg logit contributions
ggy-0.034
 badly-0.030
 liar-0.028
vern-0.028
ggie-0.026
Token shouted
Token ID17293
Feature activation-1.71e-02

Pos logit contributions
ened+0.034
cy+0.030
neath+0.028
en+0.027
elling+0.025
Neg logit contributions
ggy-0.055
 badly-0.049
vern-0.046
ggie-0.045
 liar-0.045
Token.
Token ID13
Feature activation-1.66e-02

Pos logit contributions
ened+0.034
cy+0.032
neath+0.030
en+0.029
ted+0.029
Neg logit contributions
ggy-0.035
 badly-0.033
vern-0.030
 liar-0.029
ggie-0.028
Token "
Token ID366
Feature activation-1.76e-03

Pos logit contributions
ened+0.025
cy+0.021
en+0.019
ted+0.019
neath+0.018
Neg logit contributions
ggy-0.044
 badly-0.040
vern-0.039
 liar-0.038
vere-0.037
TokenThe
Token ID464
Feature activation-1.24e-02

Pos logit contributions
ened+0.003
cy+0.003
 Let+0.003
ted+0.003
neath+0.003
Neg logit contributions
ggy-0.004
vern-0.003
 badly-0.003
 liar-0.003
vere-0.003
Token ost
Token ID23619
Feature activation+1.08e-02

Pos logit contributions
ened+0.021
cy+0.019
ted+0.017
en+0.017
 Let+0.017
Neg logit contributions
ggy-0.029
 badly-0.027
vern-0.027
 liar-0.025
vere-0.025
Tokenric
Token ID1173
Feature activation+1.13e-01

Pos logit contributions
ggy+0.022
 badly+0.019
vern+0.019
 liar+0.018
vere+0.017
Neg logit contributions
ened-0.022
cy-0.020
en-0.019
ted-0.018
 Sarah-0.018
Tokenhes
Token ID956
Feature activation-1.60e-02

Pos logit contributions
 badly+0.152
ggy+0.151
vern+0.144
 liar+0.134
 numb+0.122
Neg logit contributions
ened-0.267
 means-0.248
cy-0.247
ted-0.247
neath-0.243
Token are
Token ID389
Feature activation+6.63e-02

Pos logit contributions
ened+0.030
cy+0.026
 Jenna+0.025
ted+0.024
 Girl+0.024
Neg logit contributions
ggy-0.037
 badly-0.032
ggie-0.030
 liar-0.030
vern-0.029
Token so
Token ID523
Feature activation+5.31e-02

Pos logit contributions
ggy+0.112
vern+0.099
 badly+0.096
 liar+0.092
vere+0.087
Neg logit contributions
ened-0.155
cy-0.139
en-0.134
 Sarah-0.132
ted-0.130
Token funny
Token ID8258
Feature activation-9.12e-03

Pos logit contributions
ggy+0.078
vern+0.069
 liar+0.066
 badly+0.066
gh+0.063
Neg logit contributions
ened-0.129
cy-0.122
 Sarah-0.113
ted-0.112
 Let-0.111
Token!"
Token ID2474
Feature activation-2.32e-03

Pos logit contributions
ened+0.019
cy+0.017
neath+0.016
ted+0.016
en+0.015
Neg logit contributions
ggy-0.019
 badly-0.017
vern-0.016
 liar-0.015
ggie-0.015
Token bitter
Token ID12922
Feature activation+3.86e-03

Pos logit contributions
ggy+0.061
vern+0.055
 badly+0.053
 liar+0.051
vere+0.049
Neg logit contributions
ened-0.068
cy-0.061
 Sarah-0.057
 Let-0.056
ted-0.056
Token and
Token ID290
Feature activation+2.76e-02

Pos logit contributions
ggy+0.009
vern+0.008
 badly+0.008
 liar+0.008
vere+0.007
Neg logit contributions
ened-0.007
cy-0.006
en-0.006
ted-0.006
 Sarah-0.005
Token made
Token ID925
Feature activation+2.29e-02

Pos logit contributions
ggy+0.049
vern+0.043
 badly+0.043
 liar+0.040
vere+0.039
Neg logit contributions
ened-0.063
cy-0.057
en-0.053
ted-0.053
 Sarah-0.053
Token her
Token ID607
Feature activation+2.15e-02

Pos logit contributions
ggy+0.058
 badly+0.053
vern+0.053
 liar+0.051
vere+0.049
Neg logit contributions
ened-0.035
cy-0.030
en-0.027
ted-0.027
 Sarah-0.027
Token face
Token ID1986
Feature activation+4.87e-02

Pos logit contributions
ggy+0.044
vern+0.044
vere+0.040
 liar+0.038
 badly+0.038
Neg logit contributions
ened-0.042
cy-0.038
 smiles-0.035
en-0.033
 Sarah-0.033
Token scr
Token ID6040
Feature activation+1.13e-01

Pos logit contributions
ggy+0.095
vern+0.090
 liar+0.081
vere+0.081
 badly+0.078
Neg logit contributions
ened-0.105
cy-0.091
en-0.090
ted-0.081
 Sarah-0.081
Tokenunch
Token ID3316
Feature activation+4.35e-02

Pos logit contributions
ggy+0.187
 liar+0.176
 villains+0.160
ggie+0.151
vern+0.149
Neg logit contributions
elling-0.226
ened-0.217
cy-0.211
en-0.207
lee-0.196
Token up
Token ID510
Feature activation-1.07e-02

Pos logit contributions
ggy+0.091
 badly+0.086
vern+0.077
 liar+0.074
gh+0.073
Neg logit contributions
ened-0.084
cy-0.073
 Jenna-0.071
 Sarah-0.071
 exp-0.069
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
ened+0.018
cy+0.016
en+0.015
ted+0.014
neath+0.014
Neg logit contributions
ggy-0.026
vern-0.023
 liar-0.022
 badly-0.022
ggie-0.022
Token
Token ID198
Feature activation+9.19e-02

Pos logit contributions
ggy+0.023
 badly+0.020
vern+0.020
 liar+0.019
unch+0.018
Neg logit contributions
ened-0.021
cy-0.018
en-0.017
ted-0.016
neath-0.016
Token
Token ID198
Feature activation+7.45e-02

Pos logit contributions
ggy+0.226
 badly+0.212
 liar+0.212
vern+0.209
vere+0.200
Neg logit contributions
ened-0.123
cy-0.118
 Sarah-0.110
 Jenny-0.106
 Let-0.105
Token wanted
Token ID2227
Feature activation+7.63e-03

Pos logit contributions
ggy+0.011
 badly+0.010
vern+0.010
 liar+0.010
ggie+0.009
Neg logit contributions
ened-0.008
cy-0.007
en-0.006
ted-0.006
neath-0.006
Token to
Token ID284
Feature activation+2.79e-02

Pos logit contributions
ggy+0.019
 badly+0.017
vern+0.016
 liar+0.016
ggie+0.015
Neg logit contributions
ened-0.013
cy-0.012
en-0.010
neath-0.010
ted-0.010
Token touch
Token ID3638
Feature activation-4.73e-02

Pos logit contributions
ggy+0.068
vern+0.068
 badly+0.065
 liar+0.064
vere+0.063
Neg logit contributions
ened-0.038
cy-0.037
en-0.033
 Sarah-0.032
 Let-0.031
Token the
Token ID262
Feature activation-7.40e-03

Pos logit contributions
ened+0.073
cy+0.063
 Jenna+0.058
rie+0.058
 Let+0.057
Neg logit contributions
ggy-0.119
 badly-0.113
 liar-0.099
ggie-0.097
vern-0.096
Token ost
Token ID23619
Feature activation+1.53e-02

Pos logit contributions
ened+0.013
cy+0.011
ted+0.010
en+0.010
neath+0.010
Neg logit contributions
ggy-0.018
 badly-0.016
vern-0.016
 liar-0.015
vere-0.015
Tokenric
Token ID1173
Feature activation+1.12e-01

Pos logit contributions
ggy+0.031
 badly+0.028
vern+0.027
 liar+0.026
vere+0.025
Neg logit contributions
ened-0.031
cy-0.028
en-0.026
ted-0.026
neath-0.025
Tokenhes
Token ID956
Feature activation-9.89e-03

Pos logit contributions
ggy+0.143
 badly+0.136
vern+0.127
 liar+0.124
vere+0.109
Neg logit contributions
ened-0.283
cy-0.265
ted-0.260
neath-0.255
 means-0.254
Token.
Token ID13
Feature activation-6.30e-03

Pos logit contributions
ened+0.021
cy+0.019
neath+0.018
 exp+0.017
 Jenna+0.017
Neg logit contributions
ggy-0.020
 badly-0.018
 liar-0.016
ggie-0.016
gh-0.016
Token They
Token ID1119
Feature activation-1.88e-03

Pos logit contributions
ened+0.011
en+0.009
cy+0.009
ted+0.008
neath+0.008
Neg logit contributions
ggy-0.016
 badly-0.014
vern-0.013
 liar-0.013
vere-0.013
Token reached
Token ID4251
Feature activation-3.54e-02

Pos logit contributions
ened+0.003
cy+0.003
en+0.002
ted+0.002
neath+0.002
Neg logit contributions
ggy-0.005
 badly-0.004
vern-0.004
 liar-0.004
vere-0.004
Token out
Token ID503
Feature activation-4.06e-02

Pos logit contributions
ened+0.064
cy+0.052
ted+0.050
 Girl+0.050
en+0.049
Neg logit contributions
ggy-0.081
 badly-0.072
vern-0.067
 not-0.066
ggie-0.065
Tokenie
Token ID494
Feature activation-2.94e-02

Pos logit contributions
ened+0.003
cy+0.003
en+0.002
ted+0.002
neath+0.002
Neg logit contributions
ggy-0.004
 badly-0.004
vern-0.004
 liar-0.004
ggie-0.004
Token.
Token ID13
Feature activation-5.59e-03

Pos logit contributions
ened+0.054
cy+0.048
en+0.044
ted+0.044
 Sarah+0.043
Neg logit contributions
ggy-0.066
 badly-0.059
vern-0.059
 liar-0.056
vere-0.054
Token Unlike
Token ID12101
Feature activation-1.35e-02

Pos logit contributions
ened+0.009
cy+0.007
en+0.007
ted+0.006
neath+0.006
Neg logit contributions
ggy-0.014
 badly-0.013
vern-0.013
 liar-0.012
vere-0.012
Token other
Token ID584
Feature activation-6.63e-02

Pos logit contributions
ened+0.023
cy+0.020
en+0.018
ted+0.018
neath+0.018
Neg logit contributions
ggy-0.032
 badly-0.029
vern-0.029
 liar-0.028
vere-0.027
Token ost
Token ID23619
Feature activation+1.35e-02

Pos logit contributions
ened+0.124
cy+0.111
en+0.100
neath+0.100
ted+0.099
Neg logit contributions
ggy-0.147
 badly-0.132
vern-0.129
 liar-0.125
vere-0.119
Tokenric
Token ID1173
Feature activation+1.12e-01

Pos logit contributions
ggy+0.029
 badly+0.026
vern+0.025
 liar+0.024
vere+0.023
Neg logit contributions
ened-0.027
cy-0.024
en-0.022
ted-0.022
 Sarah-0.021
Tokenhes
Token ID956
Feature activation-3.38e-03

Pos logit contributions
ggy+0.147
 badly+0.133
vern+0.127
 liar+0.119
vere+0.108
Neg logit contributions
ened-0.287
cy-0.264
ted-0.258
neath-0.255
en-0.254
Token,
Token ID11
Feature activation-2.25e-04

Pos logit contributions
ened+0.006
cy+0.006
 Jenna+0.005
neath+0.005
 Let+0.005
Neg logit contributions
ggy-0.008
 badly-0.007
 liar-0.006
vern-0.006
ggie-0.006
Token his
Token ID465
Feature activation-4.15e-03

Pos logit contributions
ened+0.000
cy+0.000
en+0.000
ted+0.000
neath+0.000
Neg logit contributions
ggy-0.001
 badly-0.000
vern-0.000
 liar-0.000
vere-0.000
Token feathers
Token ID27400
Feature activation+4.17e-02

Pos logit contributions
ened+0.008
cy+0.007
en+0.007
ted+0.007
 Sarah+0.007
Neg logit contributions
ggy-0.009
 badly-0.008
vern-0.008
 liar-0.007
vere-0.007
Token gl
Token ID1278
Feature activation+1.83e-02

Pos logit contributions
ggy+0.105
 badly+0.096
vern+0.092
 liar+0.089
ggie+0.086
Neg logit contributions
ened-0.066
cy-0.058
en-0.051
neath-0.051
 Sarah-0.051
Token mom
Token ID1995
Feature activation-2.11e-02

Pos logit contributions
ened+0.019
cy+0.015
en+0.013
ted+0.013
neath+0.013
Neg logit contributions
ggy-0.035
 badly-0.032
vern-0.030
 liar-0.030
 not-0.029
Token and
Token ID290
Feature activation-1.06e-02

Pos logit contributions
ened+0.040
cy+0.036
neath+0.035
 exp+0.031
 Jenna+0.031
Neg logit contributions
ggy-0.045
 badly-0.043
vern-0.041
 not-0.039
 liar-0.039
Token to
Token ID284
Feature activation-1.15e-02

Pos logit contributions
ened+0.021
en+0.018
cy+0.018
neath+0.018
ted+0.016
Neg logit contributions
ggy-0.023
 badly-0.021
 liar-0.020
vern-0.020
 not-0.019
Token the
Token ID262
Feature activation-2.28e-02

Pos logit contributions
ened+0.016
cy+0.013
en+0.012
ted+0.011
neath+0.011
Neg logit contributions
ggy-0.032
 badly-0.029
vern-0.028
 liar-0.028
ggie-0.027
Token ost
Token ID23619
Feature activation+1.24e-02

Pos logit contributions
ened+0.041
cy+0.038
en+0.035
ted+0.034
 Girl+0.034
Neg logit contributions
ggy-0.048
vern-0.045
 badly-0.044
 liar-0.041
gh-0.040
Tokenric
Token ID1173
Feature activation+1.11e-01

Pos logit contributions
ggy+0.026
 badly+0.023
vern+0.023
 liar+0.022
vere+0.021
Neg logit contributions
ened-0.024
cy-0.022
en-0.020
ted-0.020
neath-0.019
Tokenhes
Token ID956
Feature activation-1.49e-02

Pos logit contributions
ggy+0.147
 badly+0.141
vern+0.134
 liar+0.128
vere+0.115
Neg logit contributions
ened-0.269
ted-0.249
cy-0.248
 means-0.243
neath-0.241
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
ened+0.033
cy+0.030
neath+0.028
 exp+0.027
 Jenna+0.027
Neg logit contributions
ggy-0.029
 badly-0.026
 not-0.023
vern-0.022
 liar-0.022
Token They
Token ID1119
Feature activation-3.38e-03

Pos logit contributions
ened+0.021
cy+0.017
en+0.017
neath+0.015
ted+0.015
Neg logit contributions
ggy-0.039
 badly-0.036
vern-0.035
 liar-0.034
 not-0.032
Token learned
Token ID4499
Feature activation-2.50e-03

Pos logit contributions
ened+0.005
cy+0.004
en+0.004
ted+0.004
neath+0.004
Neg logit contributions
ggy-0.009
 badly-0.008
vern-0.008
 liar-0.008
vere-0.007
Token their
Token ID511
Feature activation-2.58e-03

Pos logit contributions
ened+0.003
 Sarah+0.002
cy+0.002
ted+0.002
 means+0.002
Neg logit contributions
ggy-0.007
 liar-0.006
vern-0.006
 badly-0.006
gh-0.006
Token you
Token ID345
Feature activation+1.02e-02

Pos logit contributions
ened+0.083
cy+0.072
en+0.065
ted+0.064
neath+0.064
Neg logit contributions
ggy-0.116
 badly-0.100
vern-0.097
 liar-0.093
ggie-0.093
Token go
Token ID467
Feature activation+5.00e-03

Pos logit contributions
ggy+0.020
 badly+0.018
vern+0.017
 liar+0.017
gh+0.016
Neg logit contributions
ened-0.022
cy-0.019
 Girl-0.017
ted-0.017
neath-0.017
Token,
Token ID11
Feature activation-1.60e-02

Pos logit contributions
ggy+0.010
 badly+0.010
gh+0.008
 liar+0.008
vern+0.008
Neg logit contributions
ened-0.010
neath-0.008
ted-0.008
 Jenna-0.008
cy-0.008
Token enjoy
Token ID2883
Feature activation-9.85e-03

Pos logit contributions
ened+0.034
cy+0.030
ted+0.027
neath+0.027
en+0.027
Neg logit contributions
ggy-0.032
 badly-0.028
 liar-0.028
vern-0.027
vere-0.025
Token your
Token ID534
Feature activation+5.39e-03

Pos logit contributions
ened+0.018
cy+0.015
ted+0.014
neath+0.013
en+0.013
Neg logit contributions
ggy-0.024
 badly-0.021
vern-0.021
 liar-0.020
ggie-0.019
Token pizz
Token ID38748
Feature activation+1.10e-01

Pos logit contributions
ggy+0.010
 badly+0.009
vern+0.009
 liar+0.008
vere+0.008
Neg logit contributions
ened-0.012
cy-0.011
en-0.010
ted-0.010
 Sarah-0.010
Tokenas
Token ID292
Feature activation+2.24e-02

Pos logit contributions
ggy+0.224
 badly+0.218
 liar+0.197
vern+0.196
 gag+0.195
Neg logit contributions
cy-0.198
ened-0.173
als-0.170
 Let-0.170
ted-0.156
Token."
Token ID526
Feature activation-2.07e-03

Pos logit contributions
ggy+0.049
 badly+0.044
vern+0.043
 liar+0.041
ggie+0.039
Neg logit contributions
ened-0.044
cy-0.038
ted-0.036
neath-0.035
en-0.035
Token\n
Token ID198
Feature activation+8.72e-02

Pos logit contributions
ened+0.004
cy+0.004
en+0.004
ted+0.004
neath+0.004
Neg logit contributions
ggy-0.004
 badly-0.004
vern-0.004
 liar-0.003
vere-0.003
Token\n
Token ID198
Feature activation+7.12e-02

Pos logit contributions
ggy+0.212
 badly+0.194
vern+0.190
 liar+0.190
ggie+0.179
Neg logit contributions
ened-0.134
cy-0.122
 Sarah-0.113
 Jenny-0.110
en-0.109
Token"
Token ID1
Feature activation-2.85e-03

Pos logit contributions
ggy+0.145
 badly+0.130
vern+0.128
 liar+0.124
vere+0.118
Neg logit contributions
ened-0.138
cy-0.131
en-0.119
 Let-0.117
 Jenny-0.117
Token\n
Token ID198
Feature activation+9.08e-02

Pos logit contributions
ened+0.028
en+0.024
cy+0.024
ening+0.021
ted+0.021
Neg logit contributions
ggy-0.035
 badly-0.031
vern-0.029
 liar-0.029
 not-0.028
Token\n
Token ID198
Feature activation+7.19e-02

Pos logit contributions
ggy+0.223
 badly+0.208
 liar+0.208
vern+0.205
vere+0.193
Neg logit contributions
ened-0.125
cy-0.115
 Sarah-0.109
 Let-0.107
 Jenny-0.104
TokenBut
Token ID1537
Feature activation-1.46e-02

Pos logit contributions
ggy+0.164
vern+0.153
 badly+0.151
 liar+0.150
vere+0.141
Neg logit contributions
ened-0.111
cy-0.107
 Let-0.097
 Sarah-0.094
ted-0.093
Token the
Token ID262
Feature activation-9.59e-03

Pos logit contributions
ened+0.026
cy+0.023
en+0.022
elling+0.021
 exp+0.021
Neg logit contributions
ggy-0.036
 badly-0.032
 not-0.031
ggie-0.029
 liar-0.027
Token ost
Token ID23619
Feature activation+8.28e-03

Pos logit contributions
ened+0.017
cy+0.015
en+0.014
ted+0.014
 Let+0.013
Neg logit contributions
ggy-0.021
vern-0.020
 badly-0.020
 liar-0.019
vere-0.018
Tokenric
Token ID1173
Feature activation+1.10e-01

Pos logit contributions
ggy+0.017
 badly+0.016
vern+0.015
 liar+0.015
vere+0.014
Neg logit contributions
ened-0.016
cy-0.015
en-0.014
ted-0.013
neath-0.013
Tokenhes
Token ID956
Feature activation-2.62e-02

Pos logit contributions
ggy+0.146
 badly+0.143
vern+0.142
 liar+0.133
vere+0.121
Neg logit contributions
ened-0.257
ted-0.236
cy-0.233
 means-0.232
neath-0.230
Token did
Token ID750
Feature activation+3.69e-02

Pos logit contributions
ened+0.041
cy+0.037
 Jenna+0.034
 Sarah+0.033
elling+0.033
Neg logit contributions
ggy-0.068
 badly-0.059
ggie-0.055
gh-0.054
 liar-0.054
Token not
Token ID407
Feature activation+4.66e-02

Pos logit contributions
ggy+0.034
vern+0.031
 liar+0.031
unch+0.028
vere+0.026
Neg logit contributions
ened-0.106
cy-0.097
en-0.095
 smiles-0.095
 Sarah-0.095
Token like
Token ID588
Feature activation+1.60e-02

Pos logit contributions
ggy+0.111
 badly+0.104
vern+0.102
 liar+0.098
vere+0.096
Neg logit contributions
ened-0.073
cy-0.071
en-0.066
 Sarah-0.060
 Jenny-0.060
Token that
Token ID326
Feature activation-4.54e-03

Pos logit contributions
ggy+0.043
 badly+0.039
vern+0.037
 liar+0.037
ggie+0.036
Neg logit contributions
ened-0.023
cy-0.022
neath-0.018
en-0.018
 Let-0.017
TokenYes
Token ID5297
Feature activation-2.79e-02

Pos logit contributions
ened+0.011
cy+0.010
en+0.008
ted+0.008
neath+0.008
Neg logit contributions
ggy-0.024
 badly-0.022
vern-0.022
 liar-0.021
vere-0.021
Token,
Token ID11
Feature activation-3.06e-02

Pos logit contributions
ened+0.047
cy+0.041
neath+0.038
ted+0.038
en+0.036
Neg logit contributions
ggy-0.070
 badly-0.064
vern-0.059
 liar-0.057
gh-0.057
Token they
Token ID484
Feature activation-9.35e-03

Pos logit contributions
ened+0.061
cy+0.055
en+0.050
neath+0.049
ted+0.047
Neg logit contributions
ggy-0.068
 badly-0.060
 not-0.056
 liar-0.056
vern-0.055
Token are
Token ID389
Feature activation+6.71e-02

Pos logit contributions
ened+0.017
cy+0.015
en+0.013
ted+0.013
 Sarah+0.013
Neg logit contributions
ggy-0.022
 badly-0.020
vern-0.019
 liar-0.019
vere-0.018
Token ost
Token ID23619
Feature activation+1.54e-02

Pos logit contributions
ggy+0.151
 badly+0.138
vern+0.136
 liar+0.132
vere+0.125
Neg logit contributions
ened-0.119
cy-0.106
en-0.098
 Sarah-0.098
 Jenna-0.095
Tokenric
Token ID1173
Feature activation+1.09e-01

Pos logit contributions
ggy+0.031
 badly+0.028
vern+0.028
 liar+0.026
vere+0.025
Neg logit contributions
ened-0.032
cy-0.028
en-0.026
ted-0.026
 Sarah-0.026
Tokenhes
Token ID956
Feature activation-7.60e-03

Pos logit contributions
ggy+0.141
 badly+0.130
vern+0.120
 liar+0.114
gh+0.102
Neg logit contributions
ened-0.284
cy-0.261
ted-0.254
en-0.252
 means-0.250
Token,"
Token ID553
Feature activation-1.80e-02

Pos logit contributions
ened+0.015
cy+0.013
neath+0.012
ted+0.012
en+0.012
Neg logit contributions
ggy-0.017
 badly-0.015
vern-0.014
 liar-0.014
ggie-0.013
Token mom
Token ID1995
Feature activation-1.96e-02

Pos logit contributions
ened+0.029
cy+0.024
en+0.024
neath+0.023
ted+0.021
Neg logit contributions
ggy-0.047
 badly-0.040
 liar-0.038
ggie-0.038
 not-0.037
Token said
Token ID531
Feature activation-2.69e-02

Pos logit contributions
ened+0.036
cy+0.030
neath+0.030
en+0.027
 exp+0.027
Neg logit contributions
ggy-0.046
 badly-0.037
ggie-0.037
vern-0.036
unch-0.035
Token.
Token ID13
Feature activation-2.16e-02

Pos logit contributions
ened+0.055
cy+0.051
neath+0.050
en+0.048
ted+0.046
Neg logit contributions
ggy-0.055
 badly-0.049
vern-0.045
 liar-0.044
ggie-0.042
Token.
Token ID13
Feature activation-1.83e-02

Pos logit contributions
ened+0.033
cy+0.031
neath+0.030
 Jenna+0.029
ted+0.028
Neg logit contributions
ggy-0.031
 badly-0.028
 liar-0.026
vern-0.025
ggie-0.024
Token He
Token ID679
Feature activation+2.54e-03

Pos logit contributions
ened+0.035
cy+0.031
en+0.029
ted+0.028
neath+0.028
Neg logit contributions
ggy-0.039
 badly-0.036
 liar-0.036
vern-0.035
vere-0.033
Token was
Token ID373
Feature activation+7.09e-02

Pos logit contributions
ggy+0.006
 badly+0.005
 liar+0.005
ggie+0.005
vern+0.005
Neg logit contributions
ened-0.004
cy-0.004
en-0.004
 Jenna-0.004
neath-0.004
Token carrying
Token ID6872
Feature activation-2.19e-02

Pos logit contributions
ggy+0.120
vern+0.115
 badly+0.109
vere+0.102
unch+0.100
Neg logit contributions
ened-0.157
 Jenny-0.142
 Sarah-0.142
cy-0.138
 smiles-0.135
Token their
Token ID511
Feature activation-2.13e-03

Pos logit contributions
ened+0.040
cy+0.039
 Let+0.037
 exp+0.036
 Girl+0.035
Neg logit contributions
ggy-0.045
 badly-0.042
 liar-0.038
vern-0.036
ggie-0.036
Token pizz
Token ID38748
Feature activation+1.09e-01

Pos logit contributions
ened+0.004
cy+0.003
en+0.003
neath+0.003
ted+0.003
Neg logit contributions
ggy-0.005
 liar-0.004
vern-0.004
gh-0.004
 badly-0.004
Tokenas
Token ID292
Feature activation+1.55e-02

Pos logit contributions
ggy+0.243
 badly+0.228
 liar+0.213
 gag+0.210
vern+0.209
Neg logit contributions
cy-0.177
als-0.161
ened-0.158
 Let-0.145
ted-0.144
Token on
Token ID319
Feature activation-2.83e-02

Pos logit contributions
ggy+0.034
 badly+0.033
vern+0.029
 liar+0.028
 not+0.027
Neg logit contributions
ened-0.029
cy-0.026
neath-0.023
elling-0.023
 Girl-0.023
Token a
Token ID257
Feature activation+3.30e-02

Pos logit contributions
ened+0.041
cy+0.037
ted+0.032
 Let+0.031
neath+0.030
Neg logit contributions
ggy-0.074
 badly-0.068
vern-0.064
 liar-0.064
ggie-0.062
Token big
Token ID1263
Feature activation+1.18e-03

Pos logit contributions
ggy+0.073
vern+0.069
 badly+0.068
unch+0.064
vere+0.064
Neg logit contributions
ened-0.055
cy-0.049
 Sarah-0.045
 Let-0.044
 Jenny-0.044
Token tray
Token ID26473
Feature activation+6.12e-02

Pos logit contributions
ggy+0.002
 badly+0.002
vern+0.002
 liar+0.002
vere+0.002
Neg logit contributions
ened-0.002
cy-0.002
en-0.002
ted-0.002
neath-0.002
Token They
Token ID1119
Feature activation-2.95e-03

Pos logit contributions
ened+0.032
cy+0.027
ted+0.025
en+0.024
neath+0.023
Neg logit contributions
ggy-0.052
 badly-0.048
 liar-0.047
vern-0.046
 not-0.044
Token do
Token ID466
Feature activation+1.97e-02

Pos logit contributions
ened+0.005
cy+0.005
en+0.004
ted+0.004
neath+0.004
Neg logit contributions
ggy-0.007
 badly-0.006
vern-0.006
 liar-0.006
gh-0.005
Token not
Token ID407
Feature activation+4.91e-02

Pos logit contributions
ggy+0.019
 liar+0.015
 badly+0.015
vern+0.014
vere+0.013
Neg logit contributions
ened-0.057
cy-0.054
en-0.053
 means-0.052
 Let-0.052
Token pack
Token ID2353
Feature activation-8.64e-03

Pos logit contributions
ggy+0.117
vern+0.115
vere+0.108
 liar+0.104
unch+0.103
Neg logit contributions
ened-0.074
cy-0.072
en-0.067
 Let-0.061
 Sarah-0.061
Token ost
Token ID23619
Feature activation+1.34e-02

Pos logit contributions
ened+0.015
cy+0.013
neath+0.013
 Girl+0.013
en+0.012
Neg logit contributions
ggy-0.020
 badly-0.019
vern-0.017
 liar-0.017
 not-0.016
Tokenric
Token ID1173
Feature activation+1.08e-01

Pos logit contributions
ggy+0.027
 badly+0.023
vern+0.023
 liar+0.022
vere+0.021
Neg logit contributions
ened-0.028
cy-0.025
en-0.023
 Sarah-0.023
ted-0.023
Tokenhes
Token ID956
Feature activation-6.62e-03

Pos logit contributions
ggy+0.131
 badly+0.112
vern+0.106
 liar+0.100
vere+0.088
Neg logit contributions
ened-0.298
cy-0.277
ted-0.266
en-0.263
neath-0.262
Token anymore
Token ID7471
Feature activation+1.19e-02

Pos logit contributions
ened+0.012
cy+0.011
 Girl+0.010
neath+0.010
 Jenna+0.010
Neg logit contributions
ggy-0.015
 badly-0.014
vern-0.012
 not-0.012
 liar-0.012
Token.
Token ID13
Feature activation-9.44e-03

Pos logit contributions
ggy+0.028
vern+0.027
 liar+0.026
 badly+0.025
vere+0.025
Neg logit contributions
ened-0.019
cy-0.016
en-0.015
 Sarah-0.015
ted-0.014
Token They
Token ID1119
Feature activation-4.78e-04

Pos logit contributions
ened+0.013
cy+0.011
en+0.010
ted+0.010
neath+0.009
Neg logit contributions
ggy-0.026
 badly-0.024
vern-0.023
 liar-0.023
vere-0.022
Token pack
Token ID2353
Feature activation-2.20e-02

Pos logit contributions
ened+0.001
cy+0.001
en+0.001
ted+0.001
neath+0.001
Neg logit contributions
ggy-0.001
 badly-0.001
vern-0.001
 liar-0.001
gh-0.001
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
vern+0.058
ggy+0.058
 liar+0.056
vere+0.054
 badly+0.052
Neg logit contributions
ened-0.046
en-0.038
cy-0.036
 Sarah-0.035
 Jack-0.034
Token They
Token ID1119
Feature activation-7.60e-03

Pos logit contributions
ened+0.020
cy+0.016
en+0.016
ted+0.016
neath+0.015
Neg logit contributions
ggy-0.023
 badly-0.021
 not-0.020
vern-0.019
 much-0.019
Token made
Token ID925
Feature activation-3.57e-03

Pos logit contributions
ened+0.015
cy+0.013
en+0.012
ted+0.012
elling+0.012
Neg logit contributions
ggy-0.016
 badly-0.014
 liar-0.014
vern-0.013
ggie-0.013
Token their
Token ID511
Feature activation+5.78e-03

Pos logit contributions
ened+0.007
cy+0.007
en+0.006
ted+0.006
neath+0.006
Neg logit contributions
ggy-0.007
 badly-0.006
vern-0.006
 liar-0.006
ggie-0.006
Token own
Token ID898
Feature activation-1.26e-02

Pos logit contributions
ggy+0.014
 badly+0.012
vern+0.012
 liar+0.012
ggie+0.011
Neg logit contributions
ened-0.010
cy-0.008
ted-0.008
en-0.008
neath-0.008
Token pizz
Token ID38748
Feature activation+1.07e-01

Pos logit contributions
ened+0.028
cy+0.025
ted+0.024
en+0.024
neath+0.023
Neg logit contributions
ggy-0.024
 badly-0.021
vern-0.020
 liar-0.019
ggie-0.018
Tokenas
Token ID292
Feature activation+1.84e-02

Pos logit contributions
ggy+0.222
 badly+0.216
 liar+0.204
vern+0.199
 gag+0.188
Neg logit contributions
cy-0.190
ened-0.174
als-0.167
 Let-0.157
en-0.156
Token with
Token ID351
Feature activation-1.06e-02

Pos logit contributions
ggy+0.047
 badly+0.043
vern+0.041
 liar+0.039
ggie+0.038
Neg logit contributions
ened-0.029
cy-0.025
ted-0.023
neath-0.022
 Let-0.022
Token their
Token ID511
Feature activation-2.65e-03

Pos logit contributions
ened+0.016
cy+0.015
 Girl+0.013
 Let+0.013
en+0.013
Neg logit contributions
ggy-0.026
 badly-0.024
vern-0.023
 liar-0.022
gh-0.022
Token favorite
Token ID4004
Feature activation-2.31e-02

Pos logit contributions
ened+0.004
cy+0.004
en+0.003
ted+0.003
neath+0.003
Neg logit contributions
ggy-0.007
 badly-0.006
vern-0.006
 liar-0.006
gh-0.006
Token things
Token ID1243
Feature activation+4.80e-02

Pos logit contributions
ened+0.050
cy+0.046
neath+0.044
 Jenna+0.044
ted+0.044
Neg logit contributions
ggy-0.044
 badly-0.036
 liar-0.035
gh-0.034
vern-0.034
Token day
Token ID1110
Feature activation+4.66e-03

Pos logit contributions
ened+0.028
cy+0.025
 Sarah+0.023
en+0.022
 Let+0.022
Neg logit contributions
ggy-0.036
vern-0.034
 badly-0.033
 liar-0.032
vere-0.032
Token ever
Token ID1683
Feature activation+1.28e-02

Pos logit contributions
ggy+0.006
 badly+0.004
vern+0.004
 liar+0.004
ggie+0.004
Neg logit contributions
ened-0.014
cy-0.013
en-0.012
neath-0.012
ted-0.012
Token!
Token ID0
Feature activation+3.01e-03

Pos logit contributions
ggy+0.027
 badly+0.025
vern+0.024
 liar+0.022
ggie+0.022
Neg logit contributions
ened-0.025
cy-0.022
ted-0.021
neath-0.020
en-0.020
Token<|endoftext|>
Token ID50256
Feature activation+1.37e-02

Pos logit contributions
ggy+0.006
 badly+0.006
vern+0.006
 liar+0.005
vere+0.005
Neg logit contributions
ened-0.006
cy-0.005
en-0.005
ted-0.005
neath-0.005
TokenMor
Token ID20044
Feature activation-5.97e-02

Pos logit contributions
ggy+0.030
 badly+0.026
vern+0.026
 liar+0.025
ggie+0.024
Neg logit contributions
ened-0.026
cy-0.022
en-0.021
ted-0.020
 Jenna-0.020
Tokenris
Token ID2442
Feature activation-1.18e-01

Pos logit contributions
ened+0.112
 Let+0.101
 Sarah+0.099
 Girl+0.096
elling+0.095
Neg logit contributions
ggy-0.113
vern-0.097
gh-0.093
 badly-0.093
ggie-0.092
Token was
Token ID373
Feature activation+3.28e-02

Pos logit contributions
elling+0.192
 Jenna+0.182
ened+0.180
 Tina+0.179
 Let+0.173
Neg logit contributions
ggy-0.276
 badly-0.231
ggie-0.217
gh-0.212
 not-0.191
Token very
Token ID845
Feature activation+3.56e-02

Pos logit contributions
ggy+0.069
 badly+0.061
vern+0.059
 liar+0.057
ggie+0.055
Neg logit contributions
ened-0.067
cy-0.059
en-0.055
ted-0.054
 Let-0.053
Token excited
Token ID6568
Feature activation-4.22e-02

Pos logit contributions
ggy+0.059
 badly+0.051
vern+0.051
 liar+0.047
vere+0.045
Neg logit contributions
ened-0.086
cy-0.079
en-0.074
ted-0.073
 Sarah-0.073
Token.
Token ID13
Feature activation-1.44e-02

Pos logit contributions
ened+0.077
cy+0.072
neath+0.064
en+0.062
ted+0.061
Neg logit contributions
ggy-0.096
 badly-0.086
vern-0.082
 liar-0.079
ggie-0.078
Token Today
Token ID6288
Feature activation-2.67e-02

Pos logit contributions
ened+0.027
cy+0.021
ted+0.021
en+0.021
 roles+0.020
Neg logit contributions
ggy-0.032
 badly-0.028
vern-0.028
 liar-0.027
unch-0.027
Token up
Token ID510
Feature activation-3.85e-02

Pos logit contributions
ened+0.063
cy+0.058
en+0.054
ted+0.053
 Jenna+0.053
Neg logit contributions
ggy-0.040
 badly-0.037
vern-0.034
vere-0.031
 liar-0.031
Token high
Token ID1029
Feature activation-4.85e-02

Pos logit contributions
ened+0.082
cy+0.074
en+0.069
 Jenna+0.068
ted+0.068
Neg logit contributions
ggy-0.072
 badly-0.068
vern-0.062
 liar-0.058
vere-0.058
Token,
Token ID11
Feature activation-2.00e-02

Pos logit contributions
ened+0.084
cy+0.075
 Jenna+0.073
 Sarah+0.072
 Jenny+0.071
Neg logit contributions
ggy-0.106
 badly-0.102
vern-0.088
ggie-0.086
vere-0.085
Token beyond
Token ID3675
Feature activation-3.38e-02

Pos logit contributions
ened+0.044
cy+0.041
en+0.037
ted+0.037
 Jenna+0.036
Neg logit contributions
ggy-0.035
 badly-0.033
vern-0.030
 not-0.030
vere-0.029
Token where
Token ID810
Feature activation-2.05e-02

Pos logit contributions
ened+0.051
cy+0.050
ted+0.042
 Girl+0.042
oured+0.040
Neg logit contributions
ggy-0.084
 badly-0.079
vern-0.074
 not-0.074
 liar-0.072
Token Little
Token ID7703
Feature activation-1.14e-01

Pos logit contributions
ened+0.037
cy+0.034
en+0.031
ted+0.030
 Girl+0.029
Neg logit contributions
ggy-0.046
 badly-0.042
vern-0.040
 liar-0.039
vere-0.038
Token Ted
Token ID11396
Feature activation-4.73e-02

Pos logit contributions
ened+0.176
cy+0.155
elling+0.154
neath+0.153
en+0.146
Neg logit contributions
ggy-0.262
ggie-0.218
 badly-0.210
gh-0.207
 not-0.195
Token could
Token ID714
Feature activation+2.40e-02

Pos logit contributions
ened+0.077
cy+0.068
 Sarah+0.064
 Jenna+0.063
 Jenny+0.062
Neg logit contributions
ggy-0.116
 badly-0.103
vern-0.095
ggie-0.095
 liar-0.094
Token see
Token ID766
Feature activation-3.23e-02

Pos logit contributions
ggy+0.048
 liar+0.047
vern+0.046
 badly+0.045
istered+0.044
Neg logit contributions
ened-0.042
cy-0.036
en-0.035
 Let-0.035
neath-0.035
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
ened+0.045
cy+0.040
en+0.034
ted+0.033
neath+0.033
Neg logit contributions
ggy-0.087
 badly-0.080
vern-0.078
 liar-0.076
vere-0.073
Token Ted
Token ID11396
Feature activation-2.69e-02

Pos logit contributions
ened+0.031
cy+0.028
en+0.026
ted+0.025
neath+0.023
Neg logit contributions
ggy-0.031
 badly-0.027
vern-0.026
unch-0.026
ggie-0.025
Token from
Token ID422
Feature activation-9.83e-03

Pos logit contributions
ggy+0.074
 badly+0.067
vern+0.066
 liar+0.064
vere+0.061
Neg logit contributions
ened-0.057
cy-0.051
en-0.046
ted-0.046
 Sarah-0.045
Token now
Token ID783
Feature activation-1.79e-02

Pos logit contributions
ened+0.014
cy+0.012
en+0.011
ted+0.011
 Sarah+0.011
Neg logit contributions
ggy-0.026
vern-0.024
 badly-0.024
 liar-0.023
vere-0.022
Token on
Token ID319
Feature activation-2.28e-02

Pos logit contributions
ened+0.032
ted+0.027
cy+0.026
en+0.025
 Sarah+0.025
Neg logit contributions
ggy-0.041
vern-0.041
 liar-0.037
 badly-0.037
unch-0.036
Token.
Token ID13
Feature activation-2.94e-03

Pos logit contributions
ened+0.042
cy+0.037
en+0.034
ted+0.034
 Let+0.033
Neg logit contributions
ggy-0.052
 badly-0.047
vern-0.046
 liar-0.044
ggie-0.043
Token<|endoftext|>
Token ID50256
Feature activation+1.33e-02

Pos logit contributions
ened+0.006
cy+0.005
en+0.005
neath+0.005
ted+0.005
Neg logit contributions
ggy-0.006
 badly-0.005
vern-0.005
unch-0.005
 liar-0.005
TokenLittle
Token ID22253
Feature activation-1.13e-01

Pos logit contributions
ggy+0.032
 badly+0.030
vern+0.029
 liar+0.029
vere+0.027
Neg logit contributions
ened-0.021
cy-0.020
 Sarah-0.017
ted-0.017
 Let-0.017
Token Jack
Token ID3619
Feature activation-6.46e-02

Pos logit contributions
ened+0.220
isher+0.191
en+0.186
elling+0.186
 roles+0.185
Neg logit contributions
ggy-0.236
ggie-0.183
 not-0.181
 badly-0.178
gh-0.174
Token saw
Token ID2497
Feature activation-3.77e-02

Pos logit contributions
ened+0.114
elling+0.098
 roles+0.098
 1+0.095
 Sarah+0.095
Neg logit contributions
ggy-0.138
 badly-0.116
ggie-0.106
 liar-0.105
 not-0.105
Token something
Token ID1223
Feature activation+4.64e-02

Pos logit contributions
ened+0.084
cy+0.078
en+0.071
ted+0.069
 Let+0.069
Neg logit contributions
ggy-0.071
 badly-0.063
vern-0.060
 liar-0.058
ggie-0.055
Token special
Token ID2041
Feature activation-2.57e-02

Pos logit contributions
ggy+0.090
vern+0.084
 liar+0.081
 badly+0.077
vere+0.075
Neg logit contributions
ened-0.093
cy-0.083
 Sarah-0.081
 Jenny-0.077
 Johnny-0.076
Token on
Token ID319
Feature activation-4.88e-02

Pos logit contributions
ened+0.056
cy+0.051
neath+0.047
en+0.047
 Jenna+0.046
Neg logit contributions
ggy-0.050
 badly-0.044
vern-0.039
ggie-0.039
 liar-0.038
Token show
Token ID905
Feature activation-4.13e-02

Pos logit contributions
ggy+0.011
 badly+0.010
vern+0.010
 liar+0.009
vere+0.009
Neg logit contributions
ened-0.007
cy-0.006
en-0.006
 Sarah-0.006
 Jenna-0.006
Token off
Token ID572
Feature activation-2.37e-02

Pos logit contributions
ened+0.064
cy+0.060
neath+0.050
ted+0.050
 Let+0.049
Neg logit contributions
ggy-0.106
 badly-0.096
vern-0.094
 liar-0.089
ggie-0.089
Token her
Token ID607
Feature activation-9.33e-03

Pos logit contributions
ened+0.033
cy+0.030
en+0.025
ted+0.025
neath+0.025
Neg logit contributions
ggy-0.064
 badly-0.059
vern-0.057
 liar-0.055
ggie-0.054
Token new
Token ID649
Feature activation-2.45e-02

Pos logit contributions
ened+0.019
cy+0.017
en+0.015
ted+0.015
 Sarah+0.015
Neg logit contributions
ggy-0.020
 badly-0.017
vern-0.017
 liar-0.017
vere-0.016
Token<|endoftext|>
Token ID50256
Feature activation+1.06e-02

Pos logit contributions
ened+0.054
cy+0.048
en+0.045
ted+0.044
 Sarah+0.044
Neg logit contributions
ggy-0.046
 badly-0.040
vern-0.040
 liar-0.038
vere-0.037
TokenLittle
Token ID22253
Feature activation-1.12e-01

Pos logit contributions
ggy+0.024
 badly+0.021
vern+0.021
 liar+0.020
ggie+0.020
Neg logit contributions
ened-0.019
cy-0.016
en-0.015
ted-0.014
 Jenna-0.014
Token Jimmy
Token ID12963
Feature activation-4.86e-02

Pos logit contributions
ened+0.190
isher+0.175
elling+0.166
ays+0.160
 roles+0.159
Neg logit contributions
ggy-0.220
ggie-0.181
 not-0.171
gh-0.157
 badly-0.156
Token was
Token ID373
Feature activation+5.23e-02

Pos logit contributions
elling+0.090
ened+0.088
 roles+0.086
mith+0.081
isher+0.078
Neg logit contributions
ggy-0.092
 badly-0.077
 not-0.074
ggie-0.070
gh-0.067
Token very
Token ID845
Feature activation+5.92e-02

Pos logit contributions
ggy+0.107
vern+0.096
 badly+0.094
 liar+0.091
vere+0.087
Neg logit contributions
ened-0.106
cy-0.096
en-0.088
 Sarah-0.088
neath-0.087
Token proud
Token ID6613
Feature activation-2.08e-02

Pos logit contributions
ggy+0.118
vern+0.103
 badly+0.101
 liar+0.097
vere+0.093
Neg logit contributions
ened-0.125
cy-0.115
 Sarah-0.104
en-0.103
 Jenny-0.103
Token of
Token ID286
Feature activation-2.30e-02

Pos logit contributions
ened+0.043
cy+0.039
neath+0.036
en+0.036
ted+0.035
Neg logit contributions
ggy-0.042
 badly-0.038
vern-0.036
 liar-0.035
vere-0.033
Token the
Token ID262
Feature activation+6.67e-03

Pos logit contributions
ened+0.016
cy+0.015
 exp+0.015
 Let+0.013
en+0.013
Neg logit contributions
ggy-0.021
ggie-0.017
vern-0.017
 badly-0.017
 not-0.017
Token perfect
Token ID2818
Feature activation-3.20e-02

Pos logit contributions
ggy+0.015
 badly+0.014
vern+0.013
 liar+0.013
ggie+0.012
Neg logit contributions
ened-0.012
cy-0.011
en-0.010
ted-0.010
 Jenna-0.010
Token present
Token ID1944
Feature activation+3.65e-02

Pos logit contributions
ened+0.059
cy+0.053
 Jenna+0.051
neath+0.051
ted+0.051
Neg logit contributions
ggy-0.071
 badly-0.064
 liar-0.059
vern-0.059
ggie-0.058
Token!
Token ID0
Feature activation-2.43e-03

Pos logit contributions
ggy+0.078
 badly+0.069
ggie+0.062
vern+0.062
 liar+0.060
Neg logit contributions
ened-0.076
cy-0.066
 exp-0.064
neath-0.064
en-0.063
Token<|endoftext|>
Token ID50256
Feature activation+2.14e-02

Pos logit contributions
ened+0.006
cy+0.005
en+0.005
ted+0.004
neath+0.004
Neg logit contributions
ggy-0.005
vern-0.004
 badly-0.004
 liar-0.004
vere-0.004
TokenLittle
Token ID22253
Feature activation-1.11e-01

Pos logit contributions
ggy+0.049
 badly+0.043
vern+0.043
 liar+0.041
vere+0.040
Neg logit contributions
ened-0.039
cy-0.033
en-0.031
ted-0.030
 Jenna-0.029
Token John
Token ID1757
Feature activation-6.25e-02

Pos logit contributions
ened+0.195
elling+0.181
 stations+0.177
isher+0.174
 automobiles+0.173
Neg logit contributions
ggy-0.200
ggie-0.161
 not-0.145
gh-0.141
uch-0.131
Token was
Token ID373
Feature activation+5.23e-02

Pos logit contributions
ened+0.106
 roles+0.100
elling+0.099
 Jenna+0.097
 Sarah+0.095
Neg logit contributions
ggy-0.126
 badly-0.107
ggie-0.098
 not-0.096
 liar-0.094
Token very
Token ID845
Feature activation+6.41e-02

Pos logit contributions
ggy+0.115
vern+0.106
 badly+0.101
 liar+0.099
vere+0.095
Neg logit contributions
ened-0.097
cy-0.087
 Sarah-0.082
 Jenny-0.079
en-0.079
Token excited
Token ID6568
Feature activation-2.96e-02

Pos logit contributions
ggy+0.124
vern+0.109
 badly+0.107
 liar+0.102
vere+0.097
Neg logit contributions
ened-0.138
cy-0.127
 Sarah-0.115
en-0.114
ted-0.113
Token for
Token ID329
Feature activation-2.83e-03

Pos logit contributions
ened+0.057
cy+0.052
neath+0.048
ted+0.047
en+0.047
Neg logit contributions
ggy-0.064
 badly-0.058
vern-0.055
 liar-0.052
ggie-0.051
Token get
Token ID651
Feature activation+3.46e-02

Pos logit contributions
ggy+0.041
 badly+0.038
vern+0.037
 liar+0.037
vere+0.035
Neg logit contributions
ened-0.033
cy-0.031
en-0.029
 Sarah-0.028
 Let-0.028
Token back
Token ID736
Feature activation-1.62e-02

Pos logit contributions
ggy+0.069
 badly+0.063
vern+0.060
 liar+0.057
vere+0.055
Neg logit contributions
ened-0.072
cy-0.065
en-0.061
neath-0.059
ted-0.059
Token soon
Token ID2582
Feature activation-2.29e-02

Pos logit contributions
ened+0.034
cy+0.028
neath+0.028
 Girl+0.027
en+0.027
Neg logit contributions
ggy-0.033
 badly-0.031
vern-0.028
ggie-0.026
 not-0.026
Token!"
Token ID2474
Feature activation-3.73e-03

Pos logit contributions
ened+0.047
cy+0.043
ted+0.040
neath+0.039
en+0.039
Neg logit contributions
ggy-0.046
 badly-0.042
vern-0.039
 liar-0.038
gh-0.036
Token<|endoftext|>
Token ID50256
Feature activation+8.24e-03

Pos logit contributions
ened+0.008
cy+0.007
en+0.007
ener+0.006
 1+0.006
Neg logit contributions
ggy-0.007
 badly-0.006
unch-0.006
 not-0.006
 much-0.006
TokenLittle
Token ID22253
Feature activation-1.11e-01

Pos logit contributions
ggy+0.019
 badly+0.017
vern+0.017
 liar+0.016
vere+0.015
Neg logit contributions
ened-0.015
cy-0.013
en-0.012
ted-0.012
 Sarah-0.012
Token Dan
Token ID6035
Feature activation-5.58e-02

Pos logit contributions
ened+0.163
elling+0.144
en+0.137
 draws+0.134
rie+0.134
Neg logit contributions
ggy-0.264
ggie-0.223
 not-0.222
gh-0.222
 badly-0.219
Token was
Token ID373
Feature activation+4.30e-02

Pos logit contributions
ened+0.096
 roles+0.090
elling+0.090
riage+0.084
 First+0.083
Neg logit contributions
ggy-0.112
 badly-0.096
 not-0.086
ggie-0.085
gh-0.080
Token playing
Token ID2712
Feature activation-1.67e-02

Pos logit contributions
ggy+0.094
vern+0.087
 badly+0.083
 liar+0.081
vere+0.078
Neg logit contributions
ened-0.081
cy-0.073
 Sarah-0.068
en-0.067
neath-0.066
Token in
Token ID287
Feature activation-4.54e-02

Pos logit contributions
ened+0.034
cy+0.030
en+0.028
ted+0.027
 Jenna+0.027
Neg logit contributions
ggy-0.035
 badly-0.031
vern-0.030
 liar-0.029
ggie-0.028
Token the
Token ID262
Feature activation-2.63e-02

Pos logit contributions
ened+0.071
cy+0.061
als+0.060
ted+0.059
elling+0.058
Neg logit contributions
ggy-0.108
 badly-0.099
 not-0.093
ggie-0.089
 much-0.088
Token\n
Token ID198
Feature activation+7.43e-02

Pos logit contributions
ggy+0.151
 badly+0.141
 liar+0.137
vern+0.134
vere+0.123
Neg logit contributions
ened-0.194
cy-0.182
 Sarah-0.174
ted-0.171
neath-0.171
TokenThe
Token ID464
Feature activation-3.15e-02

Pos logit contributions
ggy+0.146
 badly+0.136
 liar+0.132
vern+0.131
 gag+0.122
Neg logit contributions
ened-0.141
cy-0.132
neath-0.123
 Sarah-0.121
 Let-0.121
Token End
Token ID5268
Feature activation+3.41e-02

Pos logit contributions
ened+0.067
cy+0.060
en+0.057
ted+0.056
neath+0.055
Neg logit contributions
ggy-0.063
 badly-0.055
vern-0.054
 liar-0.052
vere-0.049
Token.
Token ID13
Feature activation-5.74e-03

Pos logit contributions
ggy+0.080
vern+0.075
 badly+0.074
 liar+0.074
vere+0.069
Neg logit contributions
ened-0.052
cy-0.047
 Sarah-0.044
 Jenny-0.043
 Let-0.043
Token<|endoftext|>
Token ID50256
Feature activation+7.94e-03

Pos logit contributions
ened+0.013
cy+0.011
en+0.010
ted+0.010
neath+0.009
Neg logit contributions
ggy-0.011
 badly-0.010
vern-0.010
 liar-0.009
ggie-0.009
TokenLittle
Token ID22253
Feature activation-1.11e-01

Pos logit contributions
ggy+0.017
 badly+0.014
vern+0.014
unch+0.013
ggie+0.013
Neg logit contributions
ened-0.016
cy-0.013
en-0.013
 Jenna-0.012
ted-0.012
Token Tommy
Token ID19919
Feature activation-5.85e-02

Pos logit contributions
ened+0.181
 draws+0.175
elling+0.167
 channels+0.160
 paints+0.157
Neg logit contributions
ggy-0.226
ggie-0.187
 not-0.173
 badly-0.170
gh-0.164
Token was
Token ID373
Feature activation+4.39e-02

Pos logit contributions
ened+0.104
 roles+0.102
elling+0.099
mith+0.093
isher+0.092
Neg logit contributions
ggy-0.103
 badly-0.091
 not-0.087
ggie-0.077
 choke-0.076
Token so
Token ID523
Feature activation+4.06e-02

Pos logit contributions
ggy+0.088
 badly+0.078
vern+0.077
 liar+0.074
vere+0.070
Neg logit contributions
ened-0.091
cy-0.082
en-0.076
ted-0.076
neath-0.074
Token persistent
Token ID16218
Feature activation-3.97e-02

Pos logit contributions
ggy+0.093
vern+0.082
 badly+0.082
 liar+0.079
vere+0.075
Neg logit contributions
ened-0.075
cy-0.066
en-0.059
 Sarah-0.059
ted-0.059
Token!
Token ID0
Feature activation-7.69e-03

Pos logit contributions
ened+0.081
cy+0.078
neath+0.071
 Let+0.069
 draws+0.069
Neg logit contributions
ggy-0.080
 badly-0.072
ggie-0.065
vern-0.063
gh-0.060
Token in
Token ID287
Feature activation-2.16e-02

Pos logit contributions
ened+0.002
cy+0.001
neath+0.001
ted+0.001
 Girl+0.001
Neg logit contributions
ggy-0.002
 badly-0.001
vern-0.001
gh-0.001
 liar-0.001
Token the
Token ID262
Feature activation-8.95e-03

Pos logit contributions
ened+0.028
cy+0.024
ted+0.021
en+0.020
 Let+0.020
Neg logit contributions
ggy-0.060
 badly-0.056
vern-0.055
 liar-0.053
vere-0.051
Token sky
Token ID6766
Feature activation+3.50e-02

Pos logit contributions
ened+0.017
ted+0.014
cy+0.014
en+0.014
 means+0.013
Neg logit contributions
ggy-0.019
 badly-0.017
vern-0.017
 liar-0.016
vere-0.015
Token!
Token ID0
Feature activation+2.97e-03

Pos logit contributions
ggy+0.075
 badly+0.068
vern+0.065
 liar+0.062
ggie+0.060
Neg logit contributions
ened-0.068
cy-0.060
ted-0.056
neath-0.055
en-0.054
Token<|endoftext|>
Token ID50256
Feature activation+1.26e-02

Pos logit contributions
ggy+0.006
 badly+0.006
vern+0.006
 liar+0.005
unch+0.005
Neg logit contributions
ened-0.006
cy-0.005
en-0.005
ted-0.005
neath-0.005
TokenLittle
Token ID22253
Feature activation-1.10e-01

Pos logit contributions
ggy+0.028
 badly+0.025
vern+0.024
 liar+0.023
vere+0.022
Neg logit contributions
ened-0.024
cy-0.021
en-0.019
ted-0.019
 Sarah-0.019
Token Jake
Token ID14757
Feature activation-5.59e-02

Pos logit contributions
ened+0.174
 roles+0.153
 brushes+0.146
 channels+0.146
 stations+0.146
Neg logit contributions
ggy-0.242
gh-0.194
ggie-0.193
 badly-0.193
 not-0.192
Token was
Token ID373
Feature activation+5.33e-02

Pos logit contributions
ened+0.096
elling+0.091
 roles+0.088
riage+0.084
 Jenna+0.083
Neg logit contributions
ggy-0.116
 badly-0.102
ggie-0.090
 not-0.088
gh-0.086
Token a
Token ID257
Feature activation+3.64e-02

Pos logit contributions
ggy+0.113
vern+0.105
 badly+0.098
 liar+0.097
vere+0.093
Neg logit contributions
ened-0.104
cy-0.094
 Sarah-0.087
en-0.086
neath-0.085
Token happy
Token ID3772
Feature activation-3.25e-02

Pos logit contributions
ggy+0.069
 badly+0.061
vern+0.059
 liar+0.058
ggie+0.054
Neg logit contributions
ened-0.080
cy-0.070
ted-0.068
en-0.067
neath-0.065
Token,
Token ID11
Feature activation-1.73e-02

Pos logit contributions
ened+0.062
ted+0.053
neath+0.052
cy+0.051
en+0.049
Neg logit contributions
ggy-0.069
 badly-0.065
 liar-0.060
vern-0.056
ggie-0.056
Token\n
Token ID198
Feature activation+7.07e-02

Pos logit contributions
 liar+0.158
ggy+0.156
 badly+0.155
vern+0.144
vere+0.141
Neg logit contributions
ened-0.159
cy-0.154
 Sarah-0.153
ted-0.150
 Jenny-0.150
TokenThe
Token ID464
Feature activation-2.52e-02

Pos logit contributions
ggy+0.158
 liar+0.152
 badly+0.150
 gag+0.145
vern+0.144
Neg logit contributions
ened-0.103
cy-0.097
neath-0.095
ted-0.091
en-0.090
Token end
Token ID886
Feature activation+3.60e-02

Pos logit contributions
ened+0.045
cy+0.042
 smiles+0.038
 Sarah+0.037
 Let+0.037
Neg logit contributions
ggy-0.053
vern-0.050
 badly-0.049
unch-0.047
gh-0.045
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
ggy+0.079
 badly+0.071
vern+0.070
 liar+0.067
vere+0.064
Neg logit contributions
ened-0.069
cy-0.062
en-0.057
ted-0.056
neath-0.055
Token<|endoftext|>
Token ID50256
Feature activation+7.14e-03

Pos logit contributions
ened+0.023
cy+0.021
 Sarah+0.019
en+0.019
ted+0.019
Neg logit contributions
ggy-0.032
 badly-0.028
vern-0.028
 liar-0.027
vere-0.026
TokenLittle
Token ID22253
Feature activation-1.10e-01

Pos logit contributions
ggy+0.016
 badly+0.014
vern+0.014
 liar+0.013
vere+0.013
Neg logit contributions
ened-0.014
cy-0.012
en-0.011
ted-0.011
 Sarah-0.010
Token Jake
Token ID14757
Feature activation-5.64e-02

Pos logit contributions
ened+0.202
 draws+0.182
elling+0.180
en+0.179
 roles+0.177
Neg logit contributions
ggy-0.217
ggie-0.176
 badly-0.173
 not-0.171
gh-0.169
Token was
Token ID373
Feature activation+4.46e-02

Pos logit contributions
ened+0.093
elling+0.089
 roles+0.085
 Let+0.082
 1+0.081
Neg logit contributions
ggy-0.119
 badly-0.102
ggie-0.093
 not-0.091
gh-0.089
Token so
Token ID523
Feature activation+3.84e-02

Pos logit contributions
ggy+0.096
vern+0.087
 badly+0.083
 liar+0.082
vere+0.079
Neg logit contributions
ened-0.087
cy-0.078
 Sarah-0.072
en-0.071
neath-0.071
Token excited
Token ID6568
Feature activation-3.72e-02

Pos logit contributions
ggy+0.093
vern+0.083
 liar+0.081
 badly+0.080
vere+0.077
Neg logit contributions
ened-0.065
cy-0.059
 Sarah-0.052
 Johnny-0.050
 Jenna-0.050
Token!
Token ID0
Feature activation-1.22e-02

Pos logit contributions
ened+0.070
cy+0.066
neath+0.059
en+0.057
ted+0.056
Neg logit contributions
ggy-0.082
 badly-0.073
vern-0.068
ggie-0.066
gh-0.065
Token began
Token ID2540
Feature activation-1.50e-02

Pos logit contributions
ened+0.037
elling+0.034
cy+0.033
 Jenna+0.033
 Jenny+0.032
Neg logit contributions
ggy-0.055
 badly-0.045
gh-0.045
ggie-0.044
unch-0.043
Token to
Token ID284
Feature activation+1.23e-02

Pos logit contributions
ened+0.028
cy+0.027
 Girl+0.024
 draws+0.023
neath+0.023
Neg logit contributions
ggy-0.030
 badly-0.029
vern-0.027
vere-0.026
gh-0.025
Token set
Token ID900
Feature activation-2.66e-02

Pos logit contributions
ggy+0.027
 badly+0.024
vern+0.024
vere+0.023
gh+0.022
Neg logit contributions
ened-0.022
cy-0.020
 Jenny-0.018
 Sarah-0.017
ted-0.017
Token.
Token ID13
Feature activation-5.45e-03

Pos logit contributions
ened+0.054
cy+0.052
ted+0.046
 Let+0.046
 Girl+0.046
Neg logit contributions
ggy-0.053
 badly-0.049
vern-0.045
gh-0.042
ggie-0.042
Token<|endoftext|>
Token ID50256
Feature activation+8.04e-03

Pos logit contributions
ened+0.011
en+0.009
cy+0.009
ted+0.008
 Girl+0.008
Neg logit contributions
ggy-0.011
 badly-0.010
unch-0.010
vern-0.010
 much-0.009
TokenLittle
Token ID22253
Feature activation-1.09e-01

Pos logit contributions
ggy+0.016
 badly+0.014
vern+0.013
unch+0.013
vere+0.013
Neg logit contributions
ened-0.017
cy-0.014
en-0.013
 Jenna-0.013
elling-0.013
Token Johnny
Token ID15470
Feature activation-6.09e-02

Pos logit contributions
ened+0.184
elling+0.176
 stations+0.168
 roles+0.168
 channels+0.167
Neg logit contributions
ggy-0.214
ggie-0.169
 not-0.164
gh-0.161
 badly-0.159
Token was
Token ID373
Feature activation+4.57e-02

Pos logit contributions
ened+0.104
elling+0.103
 roles+0.102
riage+0.093
 Girl+0.091
Neg logit contributions
ggy-0.109
 badly-0.094
 not-0.087
ggie-0.082
 never-0.080
Token so
Token ID523
Feature activation+4.18e-02

Pos logit contributions
ggy+0.094
vern+0.084
 badly+0.083
 liar+0.080
vere+0.076
Neg logit contributions
ened-0.093
cy-0.084
en-0.077
 Sarah-0.076
ted-0.076
Token excited
Token ID6568
Feature activation-3.56e-02

Pos logit contributions
ggy+0.097
vern+0.086
 badly+0.085
 liar+0.083
ggie+0.079
Neg logit contributions
ened-0.075
cy-0.067
 Sarah-0.060
en-0.059
neath-0.058
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
ened+0.069
cy+0.066
neath+0.059
ted+0.056
 Let+0.056
Neg logit contributions
ggy-0.076
 badly-0.068
vern-0.062
ggie-0.061
gh-0.059
Token didn
Token ID1422
Feature activation+4.72e-02

Pos logit contributions
ened+0.005
cy+0.004
 1+0.004
neath+0.004
elling+0.004
Neg logit contributions
ggy-0.007
 badly-0.006
vern-0.006
ggie-0.005
 not-0.005
Token't
Token ID470
Feature activation+4.00e-02

Pos logit contributions
ggy+0.077
 liar+0.070
vern+0.065
 badly+0.063
unch+0.058
Neg logit contributions
ened-0.111
en-0.099
cy-0.099
neath-0.098
 Sarah-0.096
Token return
Token ID1441
Feature activation-1.72e-02

Pos logit contributions
ggy+0.085
vern+0.075
 badly+0.071
 liar+0.070
vere+0.068
Neg logit contributions
ened-0.082
cy-0.070
en-0.068
 Sarah-0.067
 Jenny-0.063
Token.
Token ID13
Feature activation-1.95e-03

Pos logit contributions
ened+0.034
cy+0.033
neath+0.032
 draws+0.030
ted+0.030
Neg logit contributions
 badly-0.032
ggy-0.032
vern-0.029
 not-0.027
gh-0.027
Token<|endoftext|>
Token ID50256
Feature activation+7.47e-03

Pos logit contributions
ened+0.004
cy+0.003
ted+0.003
en+0.003
neath+0.003
Neg logit contributions
ggy-0.004
 badly-0.004
vern-0.004
 liar-0.004
unch-0.004
TokenLittle
Token ID22253
Feature activation-1.09e-01

Pos logit contributions
ggy+0.015
 badly+0.013
vern+0.013
ggie+0.012
gh+0.012
Neg logit contributions
ened-0.015
cy-0.012
en-0.012
 Jenna-0.012
elling-0.012
Token John
Token ID1757
Feature activation-6.48e-02

Pos logit contributions
ened+0.187
elling+0.178
 draws+0.174
 roles+0.171
angles+0.169
Neg logit contributions
ggy-0.222
ggie-0.180
 badly-0.171
 not-0.170
gh-0.161
Token spent
Token ID3377
Feature activation-3.34e-02

Pos logit contributions
ened+0.105
 roles+0.102
elling+0.100
riage+0.093
 Girl+0.093
Neg logit contributions
ggy-0.130
 badly-0.119
 not-0.103
ggie-0.102
 choke-0.101
Token his
Token ID465
Feature activation+3.00e-03

Pos logit contributions
ened+0.055
cy+0.049
en+0.044
ted+0.044
 Let+0.043
Neg logit contributions
ggy-0.082
 badly-0.074
vern-0.072
 liar-0.069
ggie-0.067
Token days
Token ID1528
Feature activation+4.46e-03

Pos logit contributions
ggy+0.006
 badly+0.005
vern+0.005
 liar+0.005
ggie+0.005
Neg logit contributions
ened-0.006
cy-0.006
en-0.005
ted-0.005
neath-0.005
Token playing
Token ID2712
Feature activation-2.11e-02

Pos logit contributions
ggy+0.010
vern+0.009
 badly+0.008
 liar+0.008
vere+0.008
Neg logit contributions
ened-0.009
cy-0.008
en-0.007
ted-0.007
 Sarah-0.007
Token a
Token ID257
Feature activation+4.56e-02

Pos logit contributions
ggy+0.109
 badly+0.098
vern+0.096
 liar+0.091
vere+0.087
Neg logit contributions
ened-0.111
cy-0.101
ted-0.093
en-0.092
neath-0.090
Token bad
Token ID2089
Feature activation-1.07e-02

Pos logit contributions
ggy+0.081
vern+0.070
 badly+0.069
 liar+0.064
vere+0.063
Neg logit contributions
ened-0.102
cy-0.095
 Sarah-0.088
en-0.086
 Let-0.086
Token ending
Token ID7464
Feature activation+1.52e-02

Pos logit contributions
ened+0.024
cy+0.022
en+0.020
ted+0.020
neath+0.020
Neg logit contributions
ggy-0.020
 badly-0.018
vern-0.017
 liar-0.016
vere-0.015
Token.
Token ID13
Feature activation+9.38e-03

Pos logit contributions
ggy+0.033
 badly+0.030
vern+0.028
 liar+0.026
vere+0.026
Neg logit contributions
ened-0.030
cy-0.026
neath-0.025
ted-0.024
en-0.024
Token<|endoftext|>
Token ID50256
Feature activation+6.78e-03

Pos logit contributions
ggy+0.021
 badly+0.019
vern+0.019
 liar+0.018
vere+0.017
Neg logit contributions
ened-0.018
cy-0.015
en-0.014
ted-0.014
neath-0.014
TokenLittle
Token ID22253
Feature activation-1.09e-01

Pos logit contributions
ggy+0.014
 badly+0.012
vern+0.012
 liar+0.011
ggie+0.011
Neg logit contributions
ened-0.014
cy-0.012
en-0.011
ted-0.011
 Jenna-0.011
Token Sandra
Token ID32464
Feature activation-5.39e-02

Pos logit contributions
ened+0.167
 draws+0.152
elling+0.152
en+0.150
 paints+0.150
Neg logit contributions
ggy-0.237
 badly-0.222
ggie-0.196
gh-0.192
 not-0.190
Token was
Token ID373
Feature activation+4.12e-02

Pos logit contributions
ened+0.089
 roles+0.081
elling+0.080
cy+0.077
 First+0.074
Neg logit contributions
ggy-0.114
 badly-0.102
 not-0.091
ggie-0.089
 liar-0.088
Token so
Token ID523
Feature activation+3.41e-02

Pos logit contributions
ggy+0.085
vern+0.077
 badly+0.074
 liar+0.072
vere+0.069
Neg logit contributions
ened-0.083
cy-0.075
 Sarah-0.070
en-0.069
neath-0.068
Token excited
Token ID6568
Feature activation-4.02e-02

Pos logit contributions
ggy+0.072
vern+0.063
 badly+0.063
 liar+0.061
vere+0.058
Neg logit contributions
ened-0.068
cy-0.061
 Sarah-0.055
en-0.055
ted-0.055
Token that
Token ID326
Feature activation-1.76e-02

Pos logit contributions
ened+0.077
cy+0.071
neath+0.065
en+0.062
 Let+0.061
Neg logit contributions
ggy-0.087
 badly-0.078
vern-0.071
ggie-0.071
gh-0.068
Token worked
Token ID3111
Feature activation-1.12e-02

Pos logit contributions
ggy+0.045
 badly+0.039
vern+0.039
 liar+0.037
ggie+0.035
Neg logit contributions
ened-0.042
cy-0.038
en-0.035
ted-0.035
neath-0.034
Token very
Token ID845
Feature activation+4.07e-02

Pos logit contributions
ened+0.022
cy+0.021
 Let+0.020
ted+0.020
neath+0.019
Neg logit contributions
ggy-0.022
 badly-0.021
vern-0.020
gh-0.019
ggie-0.018
Token well
Token ID880
Feature activation-1.59e-03

Pos logit contributions
ggy+0.071
vern+0.061
 liar+0.060
 badly+0.059
vere+0.056
Neg logit contributions
ened-0.093
cy-0.085
en-0.079
ted-0.077
neath-0.076
Token!
Token ID0
Feature activation-2.12e-03

Pos logit contributions
ened+0.003
cy+0.002
 Let+0.002
ted+0.002
neath+0.002
Neg logit contributions
ggy-0.004
 badly-0.004
vern-0.003
gh-0.003
ggie-0.003
Token<|endoftext|>
Token ID50256
Feature activation+8.69e-03

Pos logit contributions
ened+0.005
cy+0.004
en+0.004
ted+0.004
ener+0.004
Neg logit contributions
ggy-0.004
vern-0.003
 much-0.003
 badly-0.003
vere-0.003
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.019
 badly+0.017
vern+0.016
 liar+0.015
ggie+0.015
Neg logit contributions
ened-0.017
cy-0.014
en-0.013
 Jenna-0.013
ted-0.013
Token Jack
Token ID3619
Feature activation-6.22e-02

Pos logit contributions
ened+0.190
elling+0.171
isher+0.158
en+0.157
ening+0.154
Neg logit contributions
ggy-0.238
ggie-0.194
 not-0.184
gh-0.179
 badly-0.167
Token and
Token ID290
Feature activation-3.46e-02

Pos logit contributions
ened+0.107
elling+0.096
 roles+0.093
riage+0.089
 Girl+0.088
Neg logit contributions
ggy-0.131
 badly-0.114
ggie-0.101
 liar-0.101
 not-0.101
Token his
Token ID465
Feature activation-3.27e-02

Pos logit contributions
ened+0.061
cy+0.053
en+0.051
ted+0.048
neath+0.048
Neg logit contributions
ggy-0.082
 badly-0.073
 liar-0.071
ggie-0.067
vern-0.065
Token mum
Token ID25682
Feature activation-3.40e-02

Pos logit contributions
ened+0.065
en+0.055
cy+0.054
neath+0.051
ted+0.051
Neg logit contributions
ggy-0.072
 badly-0.063
vern-0.062
 liar-0.062
ggie-0.058
Token were
Token ID547
Feature activation+4.25e-02

Pos logit contributions
ened+0.061
 roles+0.054
elling+0.051
rie+0.051
cy+0.050
Neg logit contributions
ggy-0.062
 badly-0.055
unch-0.053
 not-0.051
 much-0.051
Token ever
Token ID1683
Feature activation+5.48e-03

Pos logit contributions
ened+0.043
cy+0.041
neath+0.039
elling+0.036
oured+0.036
Neg logit contributions
ggy-0.052
 badly-0.047
gh-0.042
 liar-0.042
ggie-0.042
Token happening
Token ID5836
Feature activation-4.42e-02

Pos logit contributions
ggy+0.011
 badly+0.010
vern+0.009
 not+0.009
vere+0.009
Neg logit contributions
ened-0.010
cy-0.009
 First-0.009
neath-0.009
ted-0.008
Token again
Token ID757
Feature activation-9.03e-03

Pos logit contributions
ened+0.082
cy+0.077
 Let+0.071
ted+0.070
neath+0.070
Neg logit contributions
ggy-0.088
 badly-0.087
vern-0.075
gh-0.074
vere-0.073
Token.
Token ID13
Feature activation+7.31e-03

Pos logit contributions
ened+0.016
cy+0.015
ted+0.013
neath+0.013
en+0.013
Neg logit contributions
ggy-0.021
 badly-0.019
vern-0.018
 liar-0.017
ggie-0.017
Token<|endoftext|>
Token ID50256
Feature activation+1.04e-02

Pos logit contributions
ggy+0.013
 badly+0.011
vern+0.011
unch+0.010
 much+0.010
Neg logit contributions
ened-0.017
en-0.015
cy-0.014
ted-0.014
 roles-0.014
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.022
 badly+0.019
vern+0.019
 liar+0.018
ggie+0.018
Neg logit contributions
ened-0.021
cy-0.017
en-0.017
ted-0.016
 Jenna-0.016
Token Tommy
Token ID19919
Feature activation-5.56e-02

Pos logit contributions
ened+0.188
 channels+0.173
elling+0.172
 roles+0.172
 stations+0.169
Neg logit contributions
ggy-0.215
ggie-0.172
 not-0.164
 badly-0.157
gh-0.154
Token was
Token ID373
Feature activation+4.62e-02

Pos logit contributions
ened+0.101
 roles+0.096
elling+0.095
mith+0.090
isher+0.089
Neg logit contributions
ggy-0.104
 badly-0.090
 not-0.085
ggie-0.079
 choke-0.078
Token getting
Token ID1972
Feature activation+4.90e-02

Pos logit contributions
ggy+0.096
vern+0.086
 badly+0.084
 liar+0.081
vere+0.078
Neg logit contributions
ened-0.093
cy-0.083
en-0.077
 Sarah-0.077
neath-0.076
Token ready
Token ID3492
Feature activation-3.69e-02

Pos logit contributions
ggy+0.096
 badly+0.087
vern+0.085
 liar+0.080
vere+0.077
Neg logit contributions
ened-0.103
cy-0.093
en-0.087
ted-0.085
 Let-0.084
Token for
Token ID329
Feature activation-1.21e-02

Pos logit contributions
ened+0.069
cy+0.067
ted+0.058
 Let+0.056
en+0.056
Neg logit contributions
ggy-0.078
 badly-0.073
vern-0.069
gh-0.067
vere-0.065
Token\n
Token ID198
Feature activation+7.40e-02

Pos logit contributions
ggy+0.199
 liar+0.195
 badly+0.191
vern+0.185
 gag+0.177
Neg logit contributions
ened-0.135
cy-0.126
 Sarah-0.124
neath-0.120
 Jenny-0.118
TokenThe
Token ID464
Feature activation-2.57e-02

Pos logit contributions
ggy+0.162
 badly+0.152
 liar+0.152
vern+0.150
 gag+0.144
Neg logit contributions
ened-0.117
cy-0.109
neath-0.107
 Let-0.100
ted-0.099
Token end
Token ID886
Feature activation+3.85e-02

Pos logit contributions
ened+0.051
cy+0.047
en+0.041
 smiles+0.041
neath+0.041
Neg logit contributions
ggy-0.052
vern-0.048
 badly-0.047
unch-0.044
gh-0.043
Token.
Token ID13
Feature activation-7.73e-03

Pos logit contributions
ggy+0.081
 badly+0.070
vern+0.068
ggie+0.063
 liar+0.062
Neg logit contributions
ened-0.081
cy-0.072
ted-0.067
neath-0.066
en-0.066
Token<|endoftext|>
Token ID50256
Feature activation+1.24e-02

Pos logit contributions
ened+0.016
cy+0.014
en+0.013
ted+0.013
neath+0.012
Neg logit contributions
ggy-0.016
 badly-0.015
vern-0.014
 liar-0.014
vere-0.013
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.027
 badly+0.023
vern+0.022
 liar+0.021
ggie+0.021
Neg logit contributions
ened-0.024
cy-0.020
en-0.019
 Jenna-0.019
ted-0.018
Token Max
Token ID5436
Feature activation-5.14e-02

Pos logit contributions
ened+0.173
 draws+0.165
elling+0.163
 channels+0.161
angles+0.159
Neg logit contributions
ggy-0.222
ggie-0.181
 badly-0.173
 not-0.169
gh-0.158
Token was
Token ID373
Feature activation+4.46e-02

Pos logit contributions
ened+0.090
 roles+0.088
elling+0.086
isher+0.083
mith+0.082
Neg logit contributions
ggy-0.097
 badly-0.085
 not-0.078
ggie-0.074
 choke-0.070
Token playing
Token ID2712
Feature activation-1.13e-02

Pos logit contributions
ggy+0.093
vern+0.084
 badly+0.081
 liar+0.079
vere+0.076
Neg logit contributions
ened-0.089
cy-0.080
 Sarah-0.074
en-0.074
neath-0.073
Token in
Token ID287
Feature activation-3.74e-02

Pos logit contributions
ened+0.023
cy+0.021
ted+0.019
en+0.019
 Girl+0.019
Neg logit contributions
ggy-0.023
 badly-0.021
vern-0.019
 liar-0.019
ggie-0.018
Token the
Token ID262
Feature activation-2.20e-02

Pos logit contributions
ened+0.062
als+0.053
ted+0.053
cy+0.053
elling+0.052
Neg logit contributions
ggy-0.085
 badly-0.079
 not-0.073
 much-0.069
ggie-0.069
Token and
Token ID290
Feature activation-3.05e-02

Pos logit contributions
ened+0.043
cy+0.040
neath+0.039
elling+0.036
en+0.036
Neg logit contributions
ggy-0.052
 badly-0.050
 not-0.044
vern-0.043
ggie-0.042
Token hopped
Token ID45230
Feature activation-2.05e-02

Pos logit contributions
ened+0.050
cy+0.044
en+0.043
elling+0.043
neath+0.042
Neg logit contributions
ggy-0.071
 badly-0.063
ggie-0.057
 not-0.056
vern-0.055
Token away
Token ID1497
Feature activation-2.10e-02

Pos logit contributions
ened+0.035
cy+0.030
 Let+0.028
 draws+0.028
 means+0.028
Neg logit contributions
ggy-0.048
 badly-0.044
vern-0.042
gh-0.039
ggie-0.039
Token.
Token ID13
Feature activation+4.31e-03

Pos logit contributions
ened+0.035
cy+0.030
en+0.027
ted+0.027
 Sarah+0.027
Neg logit contributions
ggy-0.051
 badly-0.046
vern-0.046
 liar-0.045
vere-0.043
Token<|endoftext|>
Token ID50256
Feature activation+1.24e-02

Pos logit contributions
ggy+0.008
 badly+0.007
vern+0.007
 much+0.007
unch+0.007
Neg logit contributions
ened-0.009
en-0.008
cy-0.007
isher-0.007
elling-0.007
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.025
 badly+0.021
vern+0.020
ggie+0.019
vere+0.019
Neg logit contributions
ened-0.026
cy-0.022
en-0.021
 Jenna-0.021
elling-0.021
Token Harry
Token ID5850
Feature activation-5.86e-02

Pos logit contributions
ened+0.185
 channels+0.173
elling+0.171
 programs+0.167
 stations+0.167
Neg logit contributions
ggy-0.212
ggie-0.171
 badly-0.168
 not-0.156
gh-0.150
Token was
Token ID373
Feature activation+4.40e-02

Pos logit contributions
ened+0.101
elling+0.094
 roles+0.090
cy+0.086
 Let+0.084
Neg logit contributions
ggy-0.122
 badly-0.106
 not-0.097
ggie-0.096
gh-0.090
Token very
Token ID845
Feature activation+5.63e-02

Pos logit contributions
ggy+0.092
vern+0.084
 badly+0.081
 liar+0.079
vere+0.075
Neg logit contributions
ened-0.087
cy-0.078
 Sarah-0.072
en-0.072
neath-0.071
Token curious
Token ID11040
Feature activation-4.04e-02

Pos logit contributions
ggy+0.113
vern+0.099
 badly+0.098
 liar+0.094
vere+0.090
Neg logit contributions
ened-0.118
cy-0.107
en-0.097
 Sarah-0.097
ted-0.096
Token.
Token ID13
Feature activation-1.29e-02

Pos logit contributions
ened+0.076
cy+0.075
neath+0.068
ted+0.065
en+0.064
Neg logit contributions
ggy-0.083
 badly-0.079
vern-0.070
ggie-0.069
 liar-0.068
Token to
Token ID284
Feature activation+1.46e-02

Pos logit contributions
ggy+0.093
 badly+0.083
vern+0.082
 liar+0.078
vere+0.075
Neg logit contributions
ened-0.084
cy-0.075
en-0.069
ted-0.069
neath-0.068
Token support
Token ID1104
Feature activation-4.78e-02

Pos logit contributions
ggy+0.032
 badly+0.029
vern+0.029
 liar+0.029
vere+0.027
Neg logit contributions
ened-0.025
cy-0.022
 Sarah-0.022
en-0.021
 Jenna-0.021
Token him
Token ID683
Feature activation+7.49e-03

Pos logit contributions
ened+0.079
cy+0.069
ted+0.062
en+0.061
neath+0.061
Neg logit contributions
ggy-0.117
 badly-0.107
vern-0.104
 liar-0.100
vere-0.097
Token.
Token ID13
Feature activation+1.73e-03

Pos logit contributions
ggy+0.016
 badly+0.015
vern+0.014
 liar+0.014
vere+0.013
Neg logit contributions
ened-0.014
cy-0.013
ted-0.012
en-0.012
neath-0.011
Token<|endoftext|>
Token ID50256
Feature activation+9.16e-03

Pos logit contributions
ggy+0.003
 badly+0.003
vern+0.003
vere+0.003
 liar+0.003
Neg logit contributions
ened-0.004
cy-0.003
en-0.003
ted-0.003
neath-0.003
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.019
 badly+0.016
vern+0.015
ggie+0.015
vere+0.015
Neg logit contributions
ened-0.019
cy-0.015
en-0.015
elling-0.015
 Jenna-0.015
Token Ralph
Token ID20993
Feature activation-4.86e-02

Pos logit contributions
ened+0.168
elling+0.159
 brushes+0.152
 draws+0.151
 paints+0.151
Neg logit contributions
ggy-0.234
ggie-0.193
 badly-0.189
gh-0.183
 not-0.181
Tokenie
Token ID494
Feature activation-4.08e-02

Pos logit contributions
ened+0.084
elling+0.076
 roles+0.076
ted+0.073
 Girl+0.072
Neg logit contributions
ggy-0.096
 badly-0.087
 not-0.079
ggie-0.073
 much-0.073
Token was
Token ID373
Feature activation+4.90e-02

Pos logit contributions
ened+0.071
elling+0.066
cy+0.063
ted+0.060
 Girl+0.059
Neg logit contributions
ggy-0.089
 badly-0.076
 not-0.070
ggie-0.068
 liar-0.066
Token very
Token ID845
Feature activation+5.61e-02

Pos logit contributions
ggy+0.099
vern+0.089
 badly+0.085
 liar+0.083
vere+0.080
Neg logit contributions
ened-0.101
cy-0.092
 Sarah-0.085
en-0.084
neath-0.083
Token excited
Token ID6568
Feature activation-3.53e-02

Pos logit contributions
ggy+0.115
vern+0.101
 badly+0.099
 liar+0.096
vere+0.092
Neg logit contributions
ened-0.115
cy-0.105
 Sarah-0.095
en-0.094
 Jenny-0.093
Token on
Token ID319
Feature activation-2.37e-02

Pos logit contributions
ggy+0.055
vern+0.047
 liar+0.047
 badly+0.046
ggie+0.044
Neg logit contributions
ened-0.056
cy-0.049
en-0.047
 Sarah-0.045
neath-0.045
Token the
Token ID262
Feature activation-7.97e-03

Pos logit contributions
ened+0.037
cy+0.033
ted+0.029
 Let+0.029
elling+0.028
Neg logit contributions
ggy-0.060
 badly-0.056
vern-0.052
gh-0.051
 liar-0.050
Token ground
Token ID2323
Feature activation+3.87e-02

Pos logit contributions
ened+0.015
cy+0.013
en+0.012
ted+0.012
neath+0.012
Neg logit contributions
ggy-0.017
vern-0.016
 badly-0.016
 liar-0.015
vere-0.014
Token.
Token ID13
Feature activation+1.13e-03

Pos logit contributions
ggy+0.089
vern+0.079
 badly+0.078
 liar+0.076
vere+0.073
Neg logit contributions
ened-0.069
cy-0.061
en-0.057
ted-0.055
 Sarah-0.054
Token<|endoftext|>
Token ID50256
Feature activation+1.04e-02

Pos logit contributions
ggy+0.002
 badly+0.002
vern+0.002
vere+0.002
unch+0.002
Neg logit contributions
ened-0.002
cy-0.002
en-0.002
ted-0.002
neath-0.002
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.023
 badly+0.020
vern+0.020
 liar+0.019
vere+0.018
Neg logit contributions
ened-0.020
cy-0.017
en-0.016
ted-0.016
 Jenna-0.015
Token Tim
Token ID5045
Feature activation-6.89e-02

Pos logit contributions
ened+0.171
 roles+0.156
elling+0.154
 programs+0.146
 draws+0.146
Neg logit contributions
ggy-0.236
 badly-0.194
 not-0.191
ggie-0.188
gh-0.186
Tokenmy
Token ID1820
Feature activation-4.56e-02

Pos logit contributions
ened+0.101
 roles+0.100
als+0.092
ted+0.089
elling+0.089
Neg logit contributions
ggy-0.138
 badly-0.134
 choke-0.117
 not-0.117
gh-0.112
Token was
Token ID373
Feature activation+4.41e-02

Pos logit contributions
ened+0.076
cy+0.070
elling+0.069
 1+0.066
 roles+0.066
Neg logit contributions
ggy-0.100
 badly-0.091
 liar-0.079
 not-0.078
 choke-0.077
Token playing
Token ID2712
Feature activation-1.39e-02

Pos logit contributions
ggy+0.097
vern+0.091
 liar+0.084
 badly+0.083
vere+0.081
Neg logit contributions
ened-0.082
cy-0.073
 Sarah-0.071
neath-0.068
 Johnny-0.067
Token by
Token ID416
Feature activation-3.63e-02

Pos logit contributions
ened+0.027
cy+0.024
en+0.022
ted+0.022
neath+0.021
Neg logit contributions
ggy-0.030
 badly-0.027
vern-0.026
 liar-0.025
ggie-0.024
Token the
Token ID262
Feature activation-1.78e-02

Pos logit contributions
ened+0.040
cy+0.035
ted+0.030
en+0.030
neath+0.029
Neg logit contributions
ggy-0.068
 badly-0.062
vern-0.061
 liar-0.059
vere-0.057
Token zoo
Token ID26626
Feature activation+9.37e-03

Pos logit contributions
ened+0.033
cy+0.030
en+0.027
ted+0.027
 Sarah+0.026
Neg logit contributions
ggy-0.039
 badly-0.035
vern-0.035
 liar-0.033
vere-0.032
Token again
Token ID757
Feature activation-7.62e-03

Pos logit contributions
ggy+0.021
 badly+0.019
vern+0.018
 liar+0.017
ggie+0.017
Neg logit contributions
ened-0.018
cy-0.016
neath-0.015
ted-0.014
en-0.014
Token.
Token ID13
Feature activation+5.97e-03

Pos logit contributions
ened+0.014
cy+0.012
en+0.011
ted+0.011
neath+0.011
Neg logit contributions
ggy-0.017
 badly-0.015
vern-0.015
 liar-0.015
vere-0.014
Token<|endoftext|>
Token ID50256
Feature activation+1.18e-02

Pos logit contributions
ggy+0.013
 badly+0.012
vern+0.011
 liar+0.011
vere+0.011
Neg logit contributions
ened-0.012
cy-0.010
en-0.009
ted-0.009
neath-0.009
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.024
 badly+0.021
vern+0.020
 liar+0.019
ggie+0.019
Neg logit contributions
ened-0.024
cy-0.020
en-0.020
 Jenna-0.019
ted-0.019
Token Cor
Token ID2744
Feature activation-3.36e-02

Pos logit contributions
ened+0.188
elling+0.174
isher+0.157
en+0.156
 roles+0.156
Neg logit contributions
ggy-0.230
ggie-0.191
gh-0.179
 not-0.174
 badly-0.173
Tokena
Token ID64
Feature activation-3.66e-02

Pos logit contributions
ened+0.074
ted+0.062
 Sarah+0.062
 Let+0.062
cy+0.061
Neg logit contributions
ggy-0.066
 badly-0.054
vern-0.053
ggie-0.051
gh-0.051
Token was
Token ID373
Feature activation+4.64e-02

Pos logit contributions
ened+0.063
elling+0.057
cy+0.053
neath+0.052
ted+0.051
Neg logit contributions
ggy-0.085
 badly-0.069
ggie-0.068
 not-0.065
 liar-0.063
Token a
Token ID257
Feature activation+3.42e-02

Pos logit contributions
ggy+0.096
vern+0.086
 badly+0.084
 liar+0.081
vere+0.077
Neg logit contributions
ened-0.094
cy-0.085
 Sarah-0.078
en-0.078
neath-0.077
Token graceful
Token ID44363
Feature activation-3.36e-02

Pos logit contributions
ggy+0.069
 badly+0.061
vern+0.059
 liar+0.058
ggie+0.054
Neg logit contributions
ened-0.072
cy-0.062
ted-0.060
en-0.060
neath-0.058
Token across
Token ID1973
Feature activation-3.78e-02

Pos logit contributions
ened+0.004
cy+0.004
neath+0.003
 Girl+0.003
en+0.003
Neg logit contributions
ggy-0.005
 badly-0.004
vern-0.004
vere-0.004
 liar-0.004
Token the
Token ID262
Feature activation-1.74e-02

Pos logit contributions
ened+0.055
cy+0.052
ted+0.044
 Girl+0.044
 Let+0.043
Neg logit contributions
ggy-0.094
 badly-0.086
vern-0.085
 liar-0.081
gh-0.079
Token land
Token ID1956
Feature activation+3.86e-02

Pos logit contributions
ened+0.036
cy+0.032
ted+0.030
en+0.030
 Sarah+0.030
Neg logit contributions
ggy-0.035
 badly-0.031
vern-0.031
 liar-0.029
vere-0.028
Token.
Token ID13
Feature activation-3.63e-03

Pos logit contributions
ggy+0.082
 badly+0.074
vern+0.071
 liar+0.066
vere+0.066
Neg logit contributions
ened-0.076
cy-0.069
ted-0.063
en-0.062
neath-0.062
Token<|endoftext|>
Token ID50256
Feature activation+1.16e-02

Pos logit contributions
ened+0.007
cy+0.006
en+0.006
ted+0.006
neath+0.005
Neg logit contributions
ggy-0.008
vern-0.007
 badly-0.007
 liar-0.007
unch-0.007
TokenLittle
Token ID22253
Feature activation-1.08e-01

Pos logit contributions
ggy+0.025
 badly+0.022
vern+0.021
 liar+0.020
ggie+0.020
Neg logit contributions
ened-0.023
cy-0.019
en-0.018
 Jenna-0.017
ted-0.017
Token Jack
Token ID3619
Feature activation-5.90e-02

Pos logit contributions
ened+0.175
elling+0.161
 roles+0.158
 brushes+0.153
 channels+0.151
Neg logit contributions
ggy-0.231
ggie-0.188
gh-0.178
 not-0.175
 badly-0.173
Token was
Token ID373
Feature activation+4.66e-02

Pos logit contributions
ened+0.101
elling+0.095
 roles+0.094
riage+0.089
 Girl+0.087
Neg logit contributions
ggy-0.114
 badly-0.100
 not-0.090
ggie-0.087
 choke-0.085
Token playing
Token ID2712
Feature activation-1.29e-02

Pos logit contributions
ggy+0.096
vern+0.086
 badly+0.084
 liar+0.081
vere+0.078
Neg logit contributions
ened-0.095
cy-0.085
 Sarah-0.079
en-0.079
neath-0.078
Token in
Token ID287
Feature activation-4.03e-02

Pos logit contributions
ened+0.026
cy+0.023
ted+0.021
en+0.021
 Jenna+0.021
Neg logit contributions
ggy-0.027
 badly-0.024
vern-0.023
 liar-0.022
ggie-0.021
Token the
Token ID262
Feature activation-2.28e-02

Pos logit contributions
ened+0.065
cy+0.057
ted+0.055
als+0.054
elling+0.053
Neg logit contributions
ggy-0.096
 badly-0.088
 not-0.080
vern-0.079
ggie-0.079
Token so
Token ID523
Feature activation+1.56e-03

Pos logit contributions
ggy+0.006
 badly+0.006
 liar+0.005
vern+0.005
 not+0.005
Neg logit contributions
ened-0.005
cy-0.005
en-0.004
ted-0.004
elling-0.004
Token she
Token ID673
Feature activation-1.77e-02

Pos logit contributions
ggy+0.003
 badly+0.003
 liar+0.003
vern+0.003
ggie+0.003
Neg logit contributions
ened-0.003
cy-0.003
en-0.003
neath-0.002
ted-0.002
Token was
Token ID373
Feature activation+5.82e-02

Pos logit contributions
ened+0.028
cy+0.024
en+0.022
neath+0.022
 Sarah+0.021
Neg logit contributions
ggy-0.045
 badly-0.041
vern-0.039
 liar-0.038
ggie-0.038
Token scared
Token ID12008
Feature activation-5.79e-03

Pos logit contributions
ggy+0.113
vern+0.109
gh+0.099
 liar+0.099
vere+0.098
Neg logit contributions
ened-0.113
 over-0.098
 Sarah-0.097
 down-0.097
cy-0.096
Token.
Token ID13
Feature activation-1.07e-03

Pos logit contributions
ened+0.010
cy+0.009
en+0.008
ted+0.008
 Sarah+0.008
Neg logit contributions
ggy-0.013
vern-0.012
 badly-0.012
 liar-0.012
vere-0.011
Token
Token ID198
Feature activation+7.38e-02

Pos logit contributions
ened+0.002
cy+0.002
en+0.002
isher+0.002
ted+0.002
Neg logit contributions
ggy-0.002
 not-0.002
vern-0.001
 badly-0.001
vere-0.001
Token
Token ID198
Feature activation+5.77e-02

Pos logit contributions
ggy+0.186
 badly+0.176
 liar+0.176
vern+0.172
 gag+0.163
Neg logit contributions
ened-0.099
cy-0.094
 Sarah-0.089
 Jenny-0.086
 Jenna-0.085
TokenThen
Token ID6423
Feature activation-4.21e-02

Pos logit contributions
ggy+0.107
 badly+0.095
vern+0.093
 liar+0.090
vere+0.084
Neg logit contributions
ened-0.128
cy-0.117
en-0.109
ted-0.108
 Sarah-0.107
Token,
Token ID11
Feature activation-1.95e-02

Pos logit contributions
ened+0.075
cy+0.065
en+0.063
elling+0.063
oured+0.061
Neg logit contributions
ggy-0.094
 badly-0.080
 not-0.080
ggie-0.076
gh-0.076
Token a
Token ID257
Feature activation+1.05e-02

Pos logit contributions
ened+0.039
cy+0.033
en+0.033
ted+0.031
elling+0.031
Neg logit contributions
ggy-0.041
 badly-0.035
 not-0.035
ggie-0.032
vere-0.032
Token fish
Token ID5916
Feature activation+3.03e-02

Pos logit contributions
ggy+0.021
 badly+0.019
vern+0.019
 liar+0.018
vere+0.017
Neg logit contributions
ened-0.021
cy-0.019
en-0.018
 Sarah-0.018
ted-0.018
Token floor
Token ID4314
Feature activation+2.80e-02

Pos logit contributions
ened+0.018
cy+0.016
en+0.014
ted+0.014
 Sarah+0.014
Neg logit contributions
ggy-0.024
 badly-0.021
vern-0.021
 liar-0.020
vere-0.019
Token was
Token ID373
Feature activation+5.93e-02

Pos logit contributions
ggy+0.067
 badly+0.064
 not+0.057
ggie+0.054
gh+0.053
Neg logit contributions
ened-0.045
cy-0.042
 Jenna-0.039
ted-0.039
 exp-0.038
Token slippery
Token ID32911
Feature activation-3.39e-03

Pos logit contributions
ggy+0.101
vern+0.086
 badly+0.084
 liar+0.082
vere+0.077
Neg logit contributions
ened-0.142
cy-0.128
en-0.121
 Sarah-0.120
ted-0.119
Token.
Token ID13
Feature activation-8.12e-03

Pos logit contributions
ened+0.005
cy+0.005
 Jenna+0.004
ted+0.004
 Girl+0.004
Neg logit contributions
ggy-0.008
 badly-0.008
vern-0.007
gh-0.007
 liar-0.007
Token
Token ID220
Feature activation+3.50e-02

Pos logit contributions
ened+0.013
cy+0.012
en+0.011
ted+0.009
neath+0.009
Neg logit contributions
ggy-0.020
 badly-0.018
vern-0.018
vere-0.017
 liar-0.017
Token
Token ID198
Feature activation+7.84e-02

Pos logit contributions
ggy+0.080
 badly+0.071
vern+0.071
 liar+0.069
ggie+0.065
Neg logit contributions
ened-0.062
cy-0.056
en-0.051
 Sarah-0.050
ted-0.050
Token
Token ID198
Feature activation+5.98e-02

Pos logit contributions
ggy+0.187
 badly+0.171
vern+0.167
 liar+0.167
ggie+0.155
Neg logit contributions
ened-0.126
cy-0.115
 Sarah-0.106
ted-0.103
 Let-0.103
TokenThis
Token ID1212
Feature activation-4.56e-02

Pos logit contributions
ggy+0.114
vern+0.101
 badly+0.100
 liar+0.100
vere+0.091
Neg logit contributions
ened-0.124
cy-0.116
 Sarah-0.108
neath-0.107
 Let-0.107
Token made
Token ID925
Feature activation-9.10e-03

Pos logit contributions
ened+0.086
cy+0.077
en+0.071
ted+0.070
neath+0.068
Neg logit contributions
ggy-0.101
 badly-0.090
vern-0.089
 liar-0.085
vere-0.082
Token her
Token ID607
Feature activation-9.67e-04

Pos logit contributions
ened+0.012
cy+0.010
 Let+0.009
en+0.009
 exp+0.009
Neg logit contributions
ggy-0.025
 badly-0.023
vern-0.022
ggie-0.021
vere-0.021
Token g
Token ID308
Feature activation+1.74e-02

Pos logit contributions
ened+0.002
cy+0.002
en+0.002
ted+0.002
 Sarah+0.002
Neg logit contributions
ggy-0.002
 badly-0.002
vern-0.002
 liar-0.002
vere-0.002
Token in
Token ID287
Feature activation-2.85e-02

Pos logit contributions
ened+0.086
 exp+0.079
 Jenna+0.068
 Let+0.067
neath+0.067
Neg logit contributions
ggy-0.108
 badly-0.105
 much-0.094
 liar-0.092
 not-0.091
Token,
Token ID11
Feature activation-2.01e-02

Pos logit contributions
ened+0.041
cy+0.036
ted+0.032
en+0.031
 Let+0.030
Neg logit contributions
ggy-0.075
 badly-0.068
vern-0.067
 liar-0.064
ggie-0.063
Token please
Token ID3387
Feature activation+1.44e-02

Pos logit contributions
ened+0.039
cy+0.035
en+0.032
elling+0.031
ted+0.031
Neg logit contributions
ggy-0.044
 badly-0.041
 liar-0.037
vern-0.036
 not-0.036
Token.
Token ID13
Feature activation+2.21e-03

Pos logit contributions
ggy+0.031
 badly+0.028
vern+0.027
 liar+0.026
ggie+0.025
Neg logit contributions
ened-0.029
cy-0.026
ted-0.024
en-0.024
neath-0.023
Token This
Token ID770
Feature activation-1.41e-02

Pos logit contributions
ggy+0.005
 badly+0.005
 liar+0.004
vern+0.004
ggie+0.004
Neg logit contributions
ened-0.004
en-0.003
cy-0.003
ted-0.003
neath-0.003
Token is
Token ID318
Feature activation+7.54e-02

Pos logit contributions
ened+0.023
cy+0.020
ted+0.018
en+0.018
 Jenna+0.017
Neg logit contributions
ggy-0.035
 badly-0.031
vern-0.031
 liar-0.030
ggie-0.029
Token my
Token ID616
Feature activation+1.74e-02

Pos logit contributions
ggy+0.133
 badly+0.116
vern+0.110
 liar+0.106
ggie+0.101
Neg logit contributions
ened-0.180
cy-0.165
ted-0.154
en-0.153
neath-0.150
Token daughter
Token ID4957
Feature activation+2.67e-02

Pos logit contributions
ggy+0.035
 badly+0.032
vern+0.031
 liar+0.030
ggie+0.028
Neg logit contributions
ened-0.036
cy-0.032
en-0.030
ted-0.030
neath-0.030
Token,
Token ID11
Feature activation+1.75e-03

Pos logit contributions
ggy+0.057
 badly+0.052
 liar+0.046
vern+0.046
ggie+0.045
Neg logit contributions
ened-0.055
cy-0.048
ted-0.046
neath-0.046
en-0.044
Token Lily
Token ID20037
Feature activation-9.55e-03

Pos logit contributions
ggy+0.004
 badly+0.004
 liar+0.003
vern+0.003
 not+0.003
Neg logit contributions
ened-0.003
cy-0.003
ted-0.003
en-0.003
neath-0.003
Token.
Token ID13
Feature activation+3.61e-03

Pos logit contributions
ened+0.022
ted+0.019
cy+0.019
neath+0.019
elling+0.018
Neg logit contributions
ggy-0.018
 badly-0.016
vern-0.014
 liar-0.014
ggie-0.014
Token with
Token ID351
Feature activation-2.02e-02

Pos logit contributions
ggy+0.061
 badly+0.058
 much+0.052
 not+0.052
vern+0.052
Neg logit contributions
ened-0.058
cy-0.057
 exp-0.050
neath-0.049
en-0.048
Token her
Token ID607
Feature activation-8.48e-03

Pos logit contributions
ened+0.033
cy+0.032
 Let+0.027
neath+0.027
en+0.026
Neg logit contributions
ggy-0.049
 badly-0.043
vern-0.042
 liar-0.042
ggie-0.040
Token mom
Token ID1995
Feature activation+1.11e-02

Pos logit contributions
ened+0.016
cy+0.015
en+0.014
neath+0.014
ted+0.013
Neg logit contributions
ggy-0.019
 badly-0.017
vern-0.016
 liar-0.016
ggie-0.015
Token.
Token ID13
Feature activation+3.89e-04

Pos logit contributions
ggy+0.024
 badly+0.022
 not+0.021
ggie+0.020
 much+0.020
Neg logit contributions
ened-0.019
neath-0.016
cy-0.016
 exp-0.016
ted-0.015
Token
Token ID198
Feature activation+8.87e-02

Pos logit contributions
ggy+0.001
 badly+0.001
 liar+0.001
unch+0.001
vern+0.001
Neg logit contributions
ened-0.001
cy-0.001
en-0.001
ted-0.001
isher-0.001
Token
Token ID198
Feature activation+7.15e-02

Pos logit contributions
ggy+0.209
 badly+0.194
 liar+0.192
vern+0.190
vere+0.180
Neg logit contributions
ened-0.134
cy-0.126
 Sarah-0.119
 Jenny-0.115
 Let-0.114
Token"
Token ID1
Feature activation-1.74e-03

Pos logit contributions
ggy+0.143
 badly+0.130
vern+0.129
 liar+0.127
vere+0.119
Neg logit contributions
ened-0.133
cy-0.128
 Sarah-0.117
neath-0.116
 Let-0.116
TokenDear
Token ID20266
Feature activation-2.24e-02

Pos logit contributions
ened+0.004
cy+0.003
en+0.003
 Sarah+0.003
ted+0.003
Neg logit contributions
ggy-0.004
 badly-0.003
vern-0.003
 liar-0.003
ggie-0.003
Token Sara
Token ID24799
Feature activation-3.48e-02

Pos logit contributions
ened+0.038
cy+0.033
ted+0.029
neath+0.029
en+0.029
Neg logit contributions
ggy-0.057
 badly-0.050
vern-0.049
 liar-0.049
ggie-0.047
Token,
Token ID11
Feature activation-2.30e-02

Pos logit contributions
ened+0.048
cy+0.040
ted+0.035
neath+0.035
en+0.034
Neg logit contributions
ggy-0.097
 badly-0.089
vern-0.086
 liar-0.085
ggie-0.082
Token I
Token ID314
Feature activation+2.95e-02

Pos logit contributions
ened+0.044
cy+0.039
en+0.036
ted+0.035
neath+0.033
Neg logit contributions
ggy-0.052
 badly-0.047
vern-0.046
 liar-0.046
vere-0.043
Token.
Token ID13
Feature activation-1.45e-02

Pos logit contributions
ened+0.032
cy+0.029
neath+0.027
ted+0.027
en+0.026
Neg logit contributions
ggy-0.035
 badly-0.031
vern-0.030
 liar-0.029
vere-0.027
Token I
Token ID314
Feature activation+4.34e-02

Pos logit contributions
ened+0.028
en+0.023
cy+0.023
ted+0.022
neath+0.021
Neg logit contributions
ggy-0.032
 badly-0.029
 liar-0.028
vern-0.027
 not-0.026
Token'm
Token ID1101
Feature activation+7.54e-02

Pos logit contributions
ggy+0.092
 badly+0.081
vern+0.081
 liar+0.077
vere+0.075
Neg logit contributions
ened-0.086
cy-0.078
en-0.071
ted-0.070
 Sarah-0.069
Token glad
Token ID9675
Feature activation-7.84e-03

Pos logit contributions
ggy+0.123
gh+0.109
vern+0.108
 liar+0.106
 gag+0.100
Neg logit contributions
 Sarah-0.162
 Johnny-0.153
 Jenny-0.153
ened-0.150
 Jack-0.149
Token you
Token ID345
Feature activation+2.18e-02

Pos logit contributions
ened+0.015
cy+0.014
en+0.012
ted+0.012
 Sarah+0.012
Neg logit contributions
ggy-0.017
vern-0.015
 badly-0.015
 liar-0.014
vere-0.014
Token are
Token ID389
Feature activation+7.81e-02

Pos logit contributions
ggy+0.045
 badly+0.043
 liar+0.040
vern+0.039
vere+0.036
Neg logit contributions
ened-0.042
cy-0.036
en-0.035
neath-0.035
 Jenna-0.034
Token not
Token ID407
Feature activation+6.45e-02

Pos logit contributions
ggy+0.109
 liar+0.099
vern+0.098
 gag+0.095
unch+0.093
Neg logit contributions
ened-0.181
 Sarah-0.177
 Johnny-0.165
 down-0.164
 Jack-0.162
Token hurt
Token ID5938
Feature activation+1.82e-02

Pos logit contributions
ggy+0.117
vern+0.104
vere+0.095
 liar+0.094
gh+0.092
Neg logit contributions
ened-0.143
 Sarah-0.127
cy-0.125
en-0.121
 Johnny-0.120
Token.
Token ID13
Feature activation+3.44e-03

Pos logit contributions
ggy+0.043
vern+0.039
 liar+0.036
ggie+0.036
vere+0.035
Neg logit contributions
ened-0.032
cy-0.028
en-0.027
 Sarah-0.025
 Let-0.025
Token But
Token ID887
Feature activation+1.47e-02

Pos logit contributions
ggy+0.008
 badly+0.008
vern+0.007
 liar+0.007
vere+0.007
Neg logit contributions
ened-0.006
cy-0.005
en-0.005
ted-0.004
neath-0.004
Token you
Token ID345
Feature activation+4.41e-02

Pos logit contributions
ggy+0.030
 badly+0.027
vern+0.026
 liar+0.025
vere+0.024
Neg logit contributions
ened-0.030
cy-0.027
en-0.025
ted-0.025
neath-0.024
Token voice
Token ID3809
Feature activation+4.36e-02

Pos logit contributions
ened+0.005
cy+0.005
neath+0.004
ted+0.004
en+0.004
Neg logit contributions
ggy-0.006
 badly-0.005
vern-0.005
 liar-0.005
ggie-0.005
Token on
Token ID319
Feature activation-3.07e-02

Pos logit contributions
ggy+0.082
 badly+0.072
vern+0.065
 liar+0.064
ggie+0.064
Neg logit contributions
ened-0.098
cy-0.089
ted-0.084
neath-0.083
en-0.079
Token it
Token ID340
Feature activation+2.96e-02

Pos logit contributions
ened+0.047
cy+0.044
ted+0.038
 Let+0.036
neath+0.036
Neg logit contributions
ggy-0.079
 badly-0.072
vern-0.067
 liar-0.066
ggie-0.066
Token?
Token ID30
Feature activation-8.13e-03

Pos logit contributions
ggy+0.059
 badly+0.053
vern+0.051
 liar+0.048
vere+0.047
Neg logit contributions
ened-0.063
cy-0.056
ted-0.052
en-0.051
neath-0.050
Token You
Token ID921
Feature activation+2.15e-02

Pos logit contributions
ened+0.015
cy+0.013
en+0.012
ted+0.011
neath+0.011
Neg logit contributions
ggy-0.019
 badly-0.017
 liar-0.016
vern-0.015
vere-0.015
Token did
Token ID750
Feature activation+5.06e-02

Pos logit contributions
ggy+0.049
 badly+0.044
vern+0.043
 liar+0.042
vere+0.039
Neg logit contributions
ened-0.039
cy-0.034
ted-0.032
en-0.031
neath-0.030
Token not
Token ID407
Feature activation+5.03e-02

Pos logit contributions
ggy+0.107
vern+0.098
 liar+0.096
 badly+0.095
unch+0.093
Neg logit contributions
ened-0.094
 Sarah-0.081
en-0.080
cy-0.079
 Jenny-0.078
Token ask
Token ID1265
Feature activation-2.72e-02

Pos logit contributions
ggy+0.108
vern+0.099
 badly+0.097
vere+0.091
unch+0.090
Neg logit contributions
ened-0.094
en-0.085
cy-0.084
 Sarah-0.083
 Jenny-0.081
Token me
Token ID502
Feature activation+7.17e-03

Pos logit contributions
ened+0.048
cy+0.045
neath+0.040
ted+0.039
 Let+0.036
Neg logit contributions
ggy-0.064
 badly-0.058
vern-0.055
 liar-0.054
ggie-0.053
Token.
Token ID13
Feature activation-1.79e-03

Pos logit contributions
ggy+0.016
 badly+0.015
vern+0.014
 liar+0.014
ggie+0.013
Neg logit contributions
ened-0.014
cy-0.012
neath-0.011
ted-0.011
en-0.010
Token You
Token ID921
Feature activation+2.34e-02

Pos logit contributions
ened+0.003
cy+0.003
en+0.002
neath+0.002
ted+0.002
Neg logit contributions
ggy-0.004
 badly-0.004
 liar-0.004
vern-0.004
ggie-0.004
Token my
Token ID616
Feature activation-8.54e-03

Pos logit contributions
ened+0.053
cy+0.045
ted+0.040
 Let+0.039
 Girl+0.039
Neg logit contributions
ggy-0.091
 badly-0.084
vern-0.080
 liar-0.076
ggie-0.076
Token sack
Token ID23704
Feature activation+6.20e-02

Pos logit contributions
ened+0.015
cy+0.013
en+0.013
ted+0.012
neath+0.012
Neg logit contributions
ggy-0.020
 badly-0.018
vern-0.017
 liar-0.017
vere-0.016
Token and
Token ID290
Feature activation-4.43e-03

Pos logit contributions
ggy+0.140
vern+0.124
 badly+0.123
 liar+0.119
vere+0.113
Neg logit contributions
ened-0.113
cy-0.102
en-0.093
ted-0.090
 Sarah-0.090
Token when
Token ID618
Feature activation-1.05e-02

Pos logit contributions
ened+0.008
cy+0.007
en+0.007
ted+0.007
 Sarah+0.006
Neg logit contributions
ggy-0.010
vern-0.009
 badly-0.009
 liar-0.008
vere-0.008
Token I
Token ID314
Feature activation+4.96e-03

Pos logit contributions
ened+0.020
cy+0.018
en+0.017
ted+0.016
neath+0.016
Neg logit contributions
ggy-0.023
 badly-0.021
vern-0.020
 liar-0.019
vere-0.019
Token come
Token ID1282
Feature activation+1.06e-02

Pos logit contributions
ggy+0.011
 badly+0.010
vern+0.010
 liar+0.010
vere+0.009
Neg logit contributions
ened-0.009
cy-0.008
en-0.007
ted-0.007
 Sarah-0.007
Token back
Token ID736
Feature activation-9.23e-03

Pos logit contributions
ggy+0.022
 badly+0.022
vern+0.019
 not+0.019
 liar+0.018
Neg logit contributions
ened-0.021
cy-0.017
ted-0.016
 Girl-0.016
en-0.016
Token,
Token ID11
Feature activation-1.35e-02

Pos logit contributions
ened+0.020
cy+0.016
ted+0.016
neath+0.016
 Girl+0.015
Neg logit contributions
ggy-0.018
 badly-0.017
vern-0.015
 not-0.014
ggie-0.014
Token we
Token ID356
Feature activation+1.20e-02

Pos logit contributions
ened+0.029
cy+0.026
en+0.024
ted+0.024
neath+0.022
Neg logit contributions
ggy-0.027
 badly-0.025
 liar-0.023
vern-0.023
 not-0.023
Token will
Token ID481
Feature activation+1.83e-02

Pos logit contributions
ggy+0.030
 badly+0.027
vern+0.027
 liar+0.026
vere+0.025
Neg logit contributions
ened-0.019
cy-0.016
en-0.015
ted-0.015
 Sarah-0.014
Token be
Token ID307
Feature activation+7.04e-02

Pos logit contributions
ggy+0.044
vern+0.039
 badly+0.039
 liar+0.039
vere+0.037
Neg logit contributions
ened-0.029
cy-0.026
en-0.025
 Sarah-0.024
neath-0.024
Token.
Token ID13
Feature activation-6.47e-03

Pos logit contributions
ggy+0.116
 badly+0.104
vern+0.099
 liar+0.096
ggie+0.092
Neg logit contributions
ened-0.107
cy-0.093
ted-0.088
en-0.087
neath-0.087
Token She
Token ID1375
Feature activation-1.50e-03

Pos logit contributions
ened+0.012
cy+0.010
en+0.010
neath+0.010
ted+0.009
Neg logit contributions
ggy-0.015
 badly-0.013
 liar-0.013
vern-0.012
ggie-0.012
Token is
Token ID318
Feature activation+7.78e-02

Pos logit contributions
ened+0.002
cy+0.002
 Jenna+0.002
neath+0.002
en+0.002
Neg logit contributions
ggy-0.004
 badly-0.003
 liar-0.003
ggie-0.003
vern-0.003
Token not
Token ID407
Feature activation+6.69e-02

Pos logit contributions
ggy+0.149
vern+0.139
 badly+0.132
unch+0.126
vere+0.125
Neg logit contributions
ened-0.159
 Sarah-0.141
cy-0.140
 Jenny-0.140
en-0.134
Token a
Token ID257
Feature activation+4.83e-02

Pos logit contributions
ggy+0.130
vern+0.116
 badly+0.113
 liar+0.106
vere+0.106
Neg logit contributions
ened-0.140
cy-0.125
 Sarah-0.120
en-0.118
 Jenny-0.117
Token robot
Token ID9379
Feature activation+3.73e-02

Pos logit contributions
ggy+0.117
vern+0.106
 badly+0.104
unch+0.099
vere+0.098
Neg logit contributions
ened-0.077
cy-0.070
 Sarah-0.062
en-0.062
 Let-0.061
Token.
Token ID13
Feature activation-2.75e-03

Pos logit contributions
ggy+0.075
 badly+0.067
 liar+0.064
vern+0.063
ggie+0.059
Neg logit contributions
ened-0.079
cy-0.071
neath-0.068
 Jenna-0.065
ted-0.065
Token She
Token ID1375
Feature activation+2.35e-03

Pos logit contributions
ened+0.005
cy+0.004
en+0.004
neath+0.004
ted+0.004
Neg logit contributions
ggy-0.006
 badly-0.006
 liar-0.006
vern-0.005
vere-0.005
Token is
Token ID318
Feature activation+7.70e-02

Pos logit contributions
ggy+0.006
 badly+0.005
 liar+0.005
vern+0.005
ggie+0.005
Neg logit contributions
ened-0.004
cy-0.003
 Jenna-0.003
neath-0.003
en-0.003
Token a
Token ID257
Feature activation+5.18e-02

Pos logit contributions
ggy+0.146
vern+0.133
 badly+0.129
 liar+0.120
vere+0.119
Neg logit contributions
ened-0.163
cy-0.145
 Sarah-0.140
 Jenny-0.139
en-0.137
Token princess
Token ID21752
Feature activation+3.34e-02

Pos logit contributions
ggy+0.117
vern+0.108
 badly+0.104
unch+0.103
vere+0.099
Neg logit contributions
ened-0.086
cy-0.079
 Sarah-0.073
 Jenny-0.071
 Let-0.070
Token time
Token ID640
Feature activation-1.94e-02

Pos logit contributions
 liar+0.007
ggy+0.007
vere+0.006
vern+0.006
aches+0.006
Neg logit contributions
cy-0.004
ened-0.004
 Sarah-0.004
 Jack-0.004
 Little-0.003
Token there
Token ID612
Feature activation-4.71e-02

Pos logit contributions
ened+0.026
cy+0.022
en+0.019
ted+0.019
neath+0.018
Neg logit contributions
ggy-0.054
 badly-0.050
vern-0.049
 liar-0.047
vere-0.046
Token was
Token ID373
Feature activation+8.74e-03

Pos logit contributions
ened+0.080
elling+0.066
cy+0.065
ted+0.064
rie+0.064
Neg logit contributions
ggy-0.117
 badly-0.102
ggie-0.095
gh-0.092
vern-0.088
Token a
Token ID257
Feature activation+1.89e-02

Pos logit contributions
ggy+0.020
vern+0.018
 badly+0.017
 liar+0.017
vere+0.017
Neg logit contributions
ened-0.016
cy-0.014
 Sarah-0.013
neath-0.013
en-0.013
Token bar
Token ID2318
Feature activation+4.19e-02

Pos logit contributions
ggy+0.047
vern+0.044
 badly+0.042
vere+0.040
 liar+0.040
Neg logit contributions
ened-0.028
cy-0.027
 Sarah-0.023
 Let-0.022
 Jenny-0.022
Tokenber
Token ID527
Feature activation+6.32e-03

Pos logit contributions
ggy+0.077
vern+0.076
 badly+0.069
gh+0.068
vere+0.067
Neg logit contributions
cy-0.083
ened-0.083
ted-0.073
 Let-0.071
 Jenny-0.071
Token who
Token ID508
Feature activation-1.87e-02

Pos logit contributions
ggy+0.013
 badly+0.011
vern+0.011
 liar+0.011
ggie+0.010
Neg logit contributions
ened-0.013
cy-0.012
en-0.011
neath-0.011
ted-0.011
Token lived
Token ID5615
Feature activation+1.58e-02

Pos logit contributions
ened+0.031
cy+0.027
en+0.025
ted+0.024
neath+0.024
Neg logit contributions
ggy-0.046
 badly-0.041
vern-0.040
 liar-0.039
ggie-0.038
Token in
Token ID287
Feature activation-3.24e-02

Pos logit contributions
ggy+0.041
vern+0.039
 liar+0.037
 gag+0.035
gh+0.035
Neg logit contributions
cy-0.020
ened-0.019
 down-0.017
 over-0.016
ted-0.016
Token a
Token ID257
Feature activation+1.90e-02

Pos logit contributions
ened+0.052
cy+0.045
 Sarah+0.044
neath+0.042
 Jenny+0.042
Neg logit contributions
ggy-0.078
vern-0.073
 badly-0.071
 liar-0.069
vere-0.068
Token small
Token ID1402
Feature activation-2.52e-02

Pos logit contributions
ggy+0.037
vern+0.034
 badly+0.032
vere+0.031
 liar+0.030
Neg logit contributions
ened-0.039
cy-0.037
 Sarah-0.033
en-0.033
neath-0.033
Token books
Token ID3835
Feature activation+1.84e-02

Pos logit contributions
ened+0.041
cy+0.037
en+0.034
ted+0.034
 Sarah+0.033
Neg logit contributions
ggy-0.048
 badly-0.043
vern-0.043
 liar-0.041
vere-0.039
Tokenhe
Token ID258
Feature activation-2.08e-02

Pos logit contributions
ggy+0.030
 badly+0.027
vern+0.025
 liar+0.023
vere+0.022
Neg logit contributions
ened-0.045
cy-0.040
en-0.038
ted-0.037
 Let-0.037
Tokenlf
Token ID1652
Feature activation+8.72e-03

Pos logit contributions
ened+0.051
cy+0.047
en+0.044
ted+0.044
neath+0.043
Neg logit contributions
ggy-0.035
 badly-0.030
vern-0.029
 liar-0.027
ggie-0.026
Token and
Token ID290
Feature activation-3.43e-02

Pos logit contributions
ggy+0.022
 badly+0.021
vern+0.020
 liar+0.019
ggie+0.019
Neg logit contributions
ened-0.013
cy-0.011
ted-0.010
en-0.010
neath-0.010
Token found
Token ID1043
Feature activation-1.70e-02

Pos logit contributions
ened+0.063
cy+0.058
elling+0.056
en+0.054
neath+0.054
Neg logit contributions
ggy-0.074
 badly-0.064
 not-0.063
ggie-0.059
 liar-0.057
Token some
Token ID617
Feature activation+1.11e-02

Pos logit contributions
ened+0.033
cy+0.031
en+0.028
 Let+0.028
 exp+0.027
Neg logit contributions
ggy-0.036
 badly-0.032
vern-0.031
 liar-0.030
ggie-0.029
Token more
Token ID517
Feature activation+2.21e-02

Pos logit contributions
ggy+0.022
 badly+0.020
vern+0.019
 liar+0.018
vere+0.018
Neg logit contributions
ened-0.023
cy-0.021
en-0.019
neath-0.019
ted-0.019
Token books
Token ID3835
Feature activation+4.72e-02

Pos logit contributions
ggy+0.047
 badly+0.041
vern+0.041
 liar+0.039
ggie+0.037
Neg logit contributions
ened-0.044
cy-0.040
neath-0.036
en-0.036
ted-0.036
Token.
Token ID13
Feature activation-1.73e-02

Pos logit contributions
ggy+0.104
 badly+0.094
vern+0.092
 liar+0.087
vere+0.085
Neg logit contributions
ened-0.089
cy-0.079
en-0.072
ted-0.072
neath-0.070
Token She
Token ID1375
Feature activation-1.23e-02

Pos logit contributions
ened+0.031
cy+0.027
en+0.027
ted+0.024
neath+0.023
Neg logit contributions
ggy-0.038
 badly-0.036
 much-0.033
vern-0.033
unch-0.032
Token grabbed
Token ID13176
Feature activation-4.89e-02

Pos logit contributions
ened+0.020
cy+0.018
en+0.016
elling+0.016
neath+0.016
Neg logit contributions
ggy-0.030
 badly-0.027
vern-0.025
ggie-0.025
 liar-0.024
Token serious
Token ID2726
Feature activation-5.17e-02

Pos logit contributions
ened+0.015
cy+0.014
en+0.013
ted+0.012
 Sarah+0.012
Neg logit contributions
ggy-0.017
vern-0.015
 badly-0.015
 liar-0.014
vere-0.014
Token face
Token ID1986
Feature activation+1.65e-02

Pos logit contributions
ened+0.101
cy+0.097
neath+0.090
ted+0.089
en+0.086
Neg logit contributions
ggy-0.105
 badly-0.097
vern-0.090
 liar-0.089
gh-0.083
Token.
Token ID13
Feature activation-2.24e-03

Pos logit contributions
ggy+0.032
 badly+0.030
ggie+0.025
gh+0.024
 not+0.024
Neg logit contributions
ened-0.034
cy-0.033
 Jenna-0.030
neath-0.030
 exp-0.029
Token She
Token ID1375
Feature activation+5.23e-04

Pos logit contributions
ened+0.004
cy+0.004
en+0.004
ted+0.003
neath+0.003
Neg logit contributions
ggy-0.005
 badly-0.005
vern-0.004
 liar-0.004
vere-0.004
Token looked
Token ID3114
Feature activation+5.47e-03

Pos logit contributions
ggy+0.001
 badly+0.001
 liar+0.001
vern+0.001
ggie+0.001
Neg logit contributions
ened-0.001
cy-0.001
en-0.001
neath-0.001
 Let-0.001
Token at
Token ID379
Feature activation-3.36e-02

Pos logit contributions
 badly+0.009
ggy+0.008
 liar+0.007
 much+0.007
vern+0.007
Neg logit contributions
ened-0.012
cy-0.012
en-0.011
 Let-0.011
neath-0.011
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
ened+0.058
cy+0.052
 exp+0.050
 Let+0.050
oured+0.046
Neg logit contributions
 badly-0.071
ggy-0.069
 not-0.064
 liar-0.059
vern-0.058
Token bracelet
Token ID42893
Feature activation+4.41e-02

Pos logit contributions
ened+0.020
cy+0.017
 smiles+0.016
 Sarah+0.016
ted+0.016
Neg logit contributions
ggy-0.024
vern-0.021
 badly-0.020
 liar-0.019
ggie-0.019
Token and
Token ID290
Feature activation-2.03e-02

Pos logit contributions
ggy+0.115
 badly+0.107
vern+0.101
 liar+0.096
vere+0.095
Neg logit contributions
ened-0.067
cy-0.058
ted-0.051
en-0.051
neath-0.050
Token saw
Token ID2497
Feature activation-1.47e-02

Pos logit contributions
cy+0.033
ened+0.031
elling+0.030
neath+0.029
en+0.029
Neg logit contributions
ggy-0.045
 badly-0.043
 not-0.041
 liar-0.040
 gag-0.036
Token how
Token ID703
Feature activation+3.53e-02

Pos logit contributions
ened+0.030
cy+0.029
en+0.025
 exp+0.025
 Let+0.024
Neg logit contributions
ggy-0.031
 badly-0.028
 liar-0.026
vern-0.026
vere-0.024
Token might
Token ID1244
Feature activation+4.15e-02

Pos logit contributions
ened+0.046
cy+0.040
elling+0.036
 Jenny+0.036
en+0.036
Neg logit contributions
ggy-0.072
 badly-0.065
gh-0.061
ggie-0.060
 not-0.059
Token be
Token ID307
Feature activation+5.47e-02

Pos logit contributions
ggy+0.106
vern+0.096
 liar+0.094
 badly+0.093
oll+0.092
Neg logit contributions
ened-0.059
cy-0.050
en-0.049
 Sarah-0.049
 Jenna-0.046
Token from
Token ID422
Feature activation-1.46e-02

Pos logit contributions
ggy+0.087
 badly+0.074
vern+0.073
 liar+0.068
vere+0.064
Neg logit contributions
ened-0.138
cy-0.127
en-0.119
ted-0.118
neath-0.116
Token something
Token ID1223
Feature activation+4.49e-02

Pos logit contributions
ened+0.028
cy+0.025
ted+0.023
en+0.023
 Let+0.023
Neg logit contributions
ggy-0.032
 badly-0.029
vern-0.028
 liar-0.027
vere-0.025
Token,
Token ID11
Feature activation-8.47e-03

Pos logit contributions
ggy+0.090
 badly+0.081
vern+0.078
 liar+0.073
vere+0.071
Neg logit contributions
ened-0.095
cy-0.087
en-0.080
ted-0.079
neath-0.079
Token but
Token ID475
Feature activation-1.21e-02

Pos logit contributions
ened+0.018
cy+0.017
en+0.015
neath+0.015
ted+0.015
Neg logit contributions
ggy-0.016
 badly-0.015
vern-0.014
 liar-0.013
vere-0.013
Token she
Token ID673
Feature activation-5.67e-03

Pos logit contributions
ened+0.022
cy+0.021
en+0.019
 Let+0.018
elling+0.018
Neg logit contributions
ggy-0.026
 badly-0.026
 not-0.022
vern-0.022
vere-0.021
Token couldn
Token ID3521
Feature activation+4.20e-02

Pos logit contributions
ened+0.008
cy+0.007
en+0.006
ted+0.006
 Sarah+0.006
Neg logit contributions
ggy-0.015
vern-0.014
 badly-0.014
 liar-0.013
vere-0.013
Token't
Token ID470
Feature activation+4.13e-02

Pos logit contributions
ggy+0.067
 liar+0.060
vern+0.059
 badly+0.058
vere+0.051
Neg logit contributions
ened-0.098
cy-0.092
neath-0.090
 Sarah-0.088
 Let-0.087
Token tell
Token ID1560
Feature activation-2.33e-02

Pos logit contributions
ggy+0.104
vern+0.099
 liar+0.096
 badly+0.093
unch+0.092
Neg logit contributions
ened-0.057
cy-0.054
en-0.052
 Jenna-0.046
 Sarah-0.044
Token.
Token ID13
Feature activation+8.21e-04

Pos logit contributions
ened+0.047
 Sarah+0.040
cy+0.039
en+0.039
ted+0.038
Neg logit contributions
ggy-0.047
vern-0.043
 badly-0.042
 liar-0.041
vere-0.039
Token.
Token ID13
Feature activation+8.29e-03

Pos logit contributions
ggy+0.006
 badly+0.006
vern+0.005
 liar+0.005
vere+0.005
Neg logit contributions
ened-0.006
cy-0.005
ted-0.005
en-0.005
neath-0.005
Token<|endoftext|>
Token ID50256
Feature activation+2.24e-02

Pos logit contributions
ggy+0.018
 badly+0.016
vern+0.016
 liar+0.015
vere+0.015
Neg logit contributions
ened-0.016
cy-0.014
en-0.013
ted-0.013
neath-0.012
TokenOnce
Token ID7454
Feature activation-7.10e-03

Pos logit contributions
ggy+0.051
 badly+0.047
vern+0.046
 liar+0.045
vere+0.043
Neg logit contributions
ened-0.039
cy-0.036
en-0.032
ted-0.032
 Sarah-0.031
Token upon
Token ID2402
Feature activation+8.80e-03

Pos logit contributions
ened+0.011
cy+0.010
en+0.009
ted+0.009
neath+0.009
Neg logit contributions
ggy-0.018
 badly-0.016
vern-0.016
 liar-0.015
vere-0.015
Token a
Token ID257
Feature activation+6.26e-03

Pos logit contributions
vern+0.024
ggy+0.024
 liar+0.024
vere+0.024
gh+0.023
Neg logit contributions
 Sarah-0.007
 The-0.007
 the-0.007
 Jack-0.007
ened-0.006
Token time
Token ID640
Feature activation-1.29e-02

Pos logit contributions
 liar+0.014
vere+0.014
ggy+0.014
vern+0.014
 badly+0.013
Neg logit contributions
cy-0.009
ened-0.008
 Sarah-0.008
 Jack-0.007
 Little-0.007
Token,
Token ID11
Feature activation-1.13e-02

Pos logit contributions
ened+0.013
cy+0.011
 Sarah+0.010
en+0.009
 Let+0.009
Neg logit contributions
ggy-0.038
vern-0.035
 badly-0.035
 liar-0.035
vere-0.033
Token there
Token ID612
Feature activation-3.98e-02

Pos logit contributions
ened+0.023
cy+0.021
 Sarah+0.019
neath+0.019
en+0.019
Neg logit contributions
ggy-0.023
vern-0.021
 badly-0.020
 liar-0.020
vere-0.019
Token was
Token ID373
Feature activation+1.98e-02

Pos logit contributions
ened+0.055
cy+0.045
ted+0.041
en+0.041
 Jenna+0.039
Neg logit contributions
ggy-0.110
 badly-0.100
vern-0.095
ggie-0.092
 liar-0.092
Token a
Token ID257
Feature activation+2.49e-02

Pos logit contributions
ggy+0.046
vern+0.043
 badly+0.041
 liar+0.041
vere+0.041
Neg logit contributions
ened-0.033
cy-0.031
 Sarah-0.029
neath-0.027
ted-0.027
Token little
Token ID1310
Feature activation-3.86e-02

Pos logit contributions
ggy+0.062
vern+0.059
 badly+0.057
vere+0.055
gh+0.054
Neg logit contributions
ened-0.035
cy-0.034
 Sarah-0.029
 Girl-0.028
 Jenny-0.028
Token they
Token ID484
Feature activation+6.21e-04

Pos logit contributions
ened+0.010
cy+0.009
en+0.008
ted+0.008
neath+0.008
Neg logit contributions
ggy-0.010
 badly-0.009
vern-0.009
 liar-0.008
vere-0.008
Token got
Token ID1392
Feature activation+3.14e-02

Pos logit contributions
ggy+0.001
 badly+0.001
vern+0.001
 liar+0.001
 not+0.001
Neg logit contributions
ened-0.001
en-0.001
cy-0.001
ted-0.001
elling-0.001
Token her
Token ID607
Feature activation-5.76e-03

Pos logit contributions
ggy+0.060
 badly+0.058
vern+0.052
 liar+0.050
vere+0.048
Neg logit contributions
ened-0.066
cy-0.060
en-0.058
 Let-0.056
ted-0.055
Token license
Token ID5964
Feature activation+3.42e-02

Pos logit contributions
ened+0.013
en+0.012
cy+0.012
ted+0.012
neath+0.012
Neg logit contributions
ggy-0.010
 badly-0.009
vern-0.008
 liar-0.008
 not-0.007
Token from
Token ID422
Feature activation-3.98e-02

Pos logit contributions
ggy+0.077
 badly+0.072
vern+0.065
gh+0.063
ggie+0.062
Neg logit contributions
ened-0.063
cy-0.058
ted-0.054
neath-0.053
 exp-0.053
Token the
Token ID262
Feature activation-1.30e-02

Pos logit contributions
ened+0.061
cy+0.053
 Let+0.053
als+0.051
elling+0.049
Neg logit contributions
ggy-0.096
 badly-0.096
 not-0.082
 liar-0.082
ggie-0.080
Token car
Token ID1097
Feature activation+1.43e-02

Pos logit contributions
ened+0.023
cy+0.021
en+0.019
ted+0.019
 Jenna+0.018
Neg logit contributions
ggy-0.030
 badly-0.027
vern-0.027
 liar-0.026
vere-0.025
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
ggy+0.030
 badly+0.029
 not+0.024
ggie+0.024
vern+0.023
Neg logit contributions
ened-0.029
cy-0.025
elling-0.024
en-0.023
 smiles-0.023
Token
Token ID220
Feature activation+3.76e-02

Pos logit contributions
ened+0.021
en+0.018
cy+0.017
ening+0.016
isher+0.016
Neg logit contributions
 much-0.020
 badly-0.019
ggy-0.019
 not-0.018
unch-0.017
Token
Token ID198
Feature activation+7.56e-02

Pos logit contributions
ggy+0.078
 badly+0.069
vern+0.068
 liar+0.063
vere+0.063
Neg logit contributions
ened-0.079
cy-0.068
en-0.065
ted-0.063
neath-0.061
Token
Token ID198
Feature activation+5.99e-02

Pos logit contributions
ggy+0.177
 badly+0.161
vern+0.159
 liar+0.156
vere+0.146
Neg logit contributions
ened-0.128
cy-0.116
 Sarah-0.104
en-0.104
 Let-0.103
Token that
Token ID326
Feature activation-1.87e-02

Pos logit contributions
ggy+0.016
 badly+0.015
vern+0.014
 liar+0.014
vere+0.013
Neg logit contributions
ened-0.016
cy-0.014
en-0.013
ted-0.013
neath-0.013
Token there
Token ID612
Feature activation-1.34e-02

Pos logit contributions
ened+0.032
cy+0.028
en+0.025
ted+0.025
 Sarah+0.024
Neg logit contributions
ggy-0.045
 badly-0.041
vern-0.040
 liar-0.039
vere-0.037
Token are
Token ID389
Feature activation+6.09e-02

Pos logit contributions
ened+0.025
cy+0.023
ted+0.022
oured+0.021
en+0.020
Neg logit contributions
ggy-0.029
 badly-0.027
vern-0.026
 not-0.025
vere-0.024
Token kind
Token ID1611
Feature activation-1.65e-02

Pos logit contributions
ggy+0.107
 badly+0.095
vern+0.091
 liar+0.087
vere+0.081
Neg logit contributions
ened-0.142
cy-0.131
en-0.121
ted-0.121
neath-0.119
Token people
Token ID661
Feature activation+4.10e-02

Pos logit contributions
ened+0.035
neath+0.034
cy+0.033
 Jenna+0.030
ple+0.030
Neg logit contributions
ggy-0.029
 badly-0.026
 liar-0.025
vern-0.023
 villains-0.022
Token who
Token ID508
Feature activation-4.17e-04

Pos logit contributions
ggy+0.082
 badly+0.078
vern+0.073
 liar+0.068
vere+0.065
Neg logit contributions
ened-0.083
cy-0.073
neath-0.069
ted-0.069
en-0.067
Token can
Token ID460
Feature activation+6.63e-02

Pos logit contributions
ened+0.001
cy+0.001
en+0.001
ted+0.001
neath+0.001
Neg logit contributions
ggy-0.001
 badly-0.001
vern-0.001
 liar-0.001
vere-0.001
Token help
Token ID1037
Feature activation-2.87e-02

Pos logit contributions
ggy+0.162
 liar+0.149
 badly+0.141
vern+0.139
ggie+0.137
Neg logit contributions
ened-0.103
en-0.092
cy-0.090
neath-0.085
 means-0.081
Token restore
Token ID11169
Feature activation-3.72e-02

Pos logit contributions
ened+0.055
cy+0.050
ted+0.046
neath+0.045
en+0.044
Neg logit contributions
ggy-0.063
 badly-0.057
vern-0.055
 liar-0.052
vere-0.051
Token them
Token ID606
Feature activation+2.33e-02

Pos logit contributions
ened+0.069
cy+0.062
ted+0.059
neath+0.058
 Let+0.058
Neg logit contributions
ggy-0.079
 badly-0.076
vern-0.070
 liar-0.064
 much-0.064
Token if
Token ID611
Feature activation-1.58e-02

Pos logit contributions
ggy+0.051
 badly+0.047
vern+0.046
 liar+0.043
vere+0.042
Neg logit contributions
ened-0.044
cy-0.039
ted-0.036
en-0.035
neath-0.035
Token to
Token ID284
Feature activation-1.45e-03

Pos logit contributions
ggy+0.076
 badly+0.070
vern+0.068
 liar+0.064
gh+0.062
Neg logit contributions
ened-0.056
cy-0.051
en-0.045
neath-0.045
 Girl-0.045
Token spread
Token ID4104
Feature activation-2.67e-02

Pos logit contributions
ened+0.003
cy+0.003
ted+0.003
 Girl+0.003
elling+0.002
Neg logit contributions
ggy-0.003
 badly-0.003
vern-0.003
 liar-0.002
gh-0.002
Token it
Token ID340
Feature activation+2.35e-02

Pos logit contributions
ened+0.043
cy+0.041
 Girl+0.038
 Let+0.036
ted+0.035
Neg logit contributions
ggy-0.061
 badly-0.057
vern-0.054
 liar-0.051
gh-0.050
Token around
Token ID1088
Feature activation-4.78e-02

Pos logit contributions
ggy+0.058
 badly+0.054
vern+0.052
 liar+0.049
vere+0.049
Neg logit contributions
ened-0.036
cy-0.032
ted-0.029
en-0.028
 Girl-0.028
Token.
Token ID13
Feature activation-7.19e-03

Pos logit contributions
ened+0.077
cy+0.072
elling+0.067
ted+0.064
 Let+0.064
Neg logit contributions
ggy-0.112
 badly-0.105
gh-0.093
vern-0.093
 liar-0.093
Token Together
Token ID17083
Feature activation-6.90e-02

Pos logit contributions
ened+0.015
cy+0.013
en+0.013
ted+0.012
 roles+0.012
Neg logit contributions
ggy-0.014
 badly-0.013
vern-0.012
unch-0.012
vere-0.011
Token,
Token ID11
Feature activation-1.22e-02

Pos logit contributions
ened+0.117
cy+0.109
elling+0.102
 Let+0.098
neath+0.096
Neg logit contributions
ggy-0.163
 badly-0.150
vern-0.134
ggie-0.132
 not-0.132
Token they
Token ID484
Feature activation-8.57e-03

Pos logit contributions
ened+0.023
cy+0.019
elling+0.019
en+0.018
 exp+0.018
Neg logit contributions
ggy-0.027
 badly-0.024
 much-0.022
unch-0.021
 not-0.021
Token built
Token ID3170
Feature activation-2.90e-02

Pos logit contributions
ened+0.015
cy+0.014
 Let+0.013
 Girl+0.013
elling+0.013
Neg logit contributions
ggy-0.018
 badly-0.016
vern-0.015
 liar-0.015
gh-0.014
Token a
Token ID257
Feature activation+3.06e-02

Pos logit contributions
ened+0.057
cy+0.053
 Girl+0.050
ted+0.049
 Let+0.048
Neg logit contributions
ggy-0.061
 badly-0.053
vern-0.051
gh-0.048
ggie-0.048
Token beautiful
Token ID4950
Feature activation-1.15e-02

Pos logit contributions
ggy+0.062
vern+0.055
 badly+0.055
 liar+0.052
vere+0.050
Neg logit contributions
ened-0.060
cy-0.056
 Sarah-0.051
en-0.051
 Jenny-0.050
Token.
Token ID13
Feature activation-7.08e-03

Pos logit contributions
ened+0.034
cy+0.029
neath+0.028
ted+0.028
 Let+0.027
Neg logit contributions
ggy-0.029
 badly-0.027
vern-0.024
ggie-0.023
gh-0.023
Token
Token ID220
Feature activation+3.64e-02

Pos logit contributions
ened+0.017
en+0.015
ener+0.014
cy+0.014
elling+0.014
Neg logit contributions
ggy-0.012
 badly-0.010
ggie-0.009
 much-0.009
 not-0.009
Token\n
Token ID198
Feature activation+7.02e-02

Pos logit contributions
ggy+0.087
 badly+0.079
vern+0.078
 liar+0.078
gh+0.072
Neg logit contributions
ened-0.058
cy-0.053
 Sarah-0.049
 Let-0.048
 Jenny-0.047
Token\n
Token ID198
Feature activation+5.34e-02

Pos logit contributions
ggy+0.172
 liar+0.162
 badly+0.161
vern+0.160
vere+0.148
Neg logit contributions
ened-0.100
cy-0.094
 Sarah-0.089
 Jenny-0.084
 Let-0.084
TokenThe
Token ID464
Feature activation-2.95e-02

Pos logit contributions
ggy+0.104
vern+0.097
 badly+0.095
 liar+0.094
vere+0.088
Neg logit contributions
ened-0.103
cy-0.097
 Sarah-0.091
 Let-0.090
neath-0.089
Token wealthy
Token ID11574
Feature activation-5.94e-02

Pos logit contributions
ened+0.054
cy+0.048
en+0.044
ted+0.043
 Sarah+0.042
Neg logit contributions
ggy-0.067
 badly-0.061
vern-0.060
 liar-0.058
vere-0.055
Token farmer
Token ID18739
Feature activation-1.38e-02

Pos logit contributions
ened+0.127
cy+0.113
neath+0.107
en+0.107
ted+0.105
Neg logit contributions
ggy-0.119
 badly-0.101
vern-0.098
 liar-0.095
ggie-0.092
Token saw
Token ID2497
Feature activation-3.51e-02

Pos logit contributions
ened+0.024
cy+0.021
 roles+0.021
elling+0.020
neath+0.020
Neg logit contributions
ggy-0.031
 badly-0.027
ggie-0.025
 not-0.024
 liar-0.024
Token his
Token ID465
Feature activation-2.20e-02

Pos logit contributions
ened+0.066
cy+0.064
 Let+0.053
en+0.053
rie+0.052
Neg logit contributions
ggy-0.079
 badly-0.069
 liar-0.064
ggie-0.063
 not-0.063
Token opportunity
Token ID3663
Feature activation-2.16e-02

Pos logit contributions
ened+0.043
cy+0.039
en+0.036
ted+0.035
 Sarah+0.035
Neg logit contributions
ggy-0.047
 badly-0.042
vern-0.041
 liar-0.039
vere-0.038
Token to
Token ID284
Feature activation+2.09e-03

Pos logit contributions
ened+0.036
cy+0.033
en+0.028
ted+0.028
neath+0.028
Neg logit contributions
ggy-0.053
 badly-0.048
vern-0.047
 liar-0.045
ggie-0.044
Token are
Token ID389
Feature activation+7.36e-02

Pos logit contributions
ggy+0.043
 badly+0.039
vern+0.036
 liar+0.036
ggie+0.034
Neg logit contributions
ened-0.032
cy-0.027
 Jenna-0.026
 Girl-0.025
elling-0.025
Token friends
Token ID2460
Feature activation+2.57e-02

Pos logit contributions
ggy+0.134
vern+0.116
 badly+0.114
 liar+0.112
vere+0.108
Neg logit contributions
ened-0.165
cy-0.147
en-0.140
 Sarah-0.139
ted-0.134
Token.
Token ID13
Feature activation-9.24e-03

Pos logit contributions
ggy+0.054
 badly+0.049
vern+0.048
 liar+0.046
vere+0.043
Neg logit contributions
ened-0.052
cy-0.046
en-0.042
ted-0.042
neath-0.042
Token The
Token ID383
Feature activation-1.75e-02

Pos logit contributions
ened+0.016
cy+0.013
en+0.012
ted+0.011
neath+0.011
Neg logit contributions
ggy-0.023
vern-0.020
 badly-0.020
 liar-0.019
unch-0.019
Token moral
Token ID6573
Feature activation+5.79e-03

Pos logit contributions
ened+0.031
cy+0.027
 smiles+0.026
 Let+0.026
 Sarah+0.025
Neg logit contributions
ggy-0.038
vern-0.035
 badly-0.034
unch-0.033
ggie-0.032
Token of
Token ID286
Feature activation-7.04e-02

Pos logit contributions
ggy+0.014
vern+0.013
 badly+0.013
istered+0.012
 liar+0.012
Neg logit contributions
ened-0.008
cy-0.008
 Let-0.007
ted-0.007
 means-0.007
Token the
Token ID262
Feature activation-3.96e-02

Pos logit contributions
ened+0.081
cy+0.063
ted+0.053
en+0.053
 Let+0.049
Neg logit contributions
ggy-0.211
 badly-0.192
vern-0.191
 liar-0.184
ggie-0.181
Token story
Token ID1621
Feature activation+2.15e-02

Pos logit contributions
ened+0.083
 Let+0.081
 Sarah+0.078
 smiles+0.077
cy+0.077
Neg logit contributions
 badly-0.062
ggy-0.061
istered-0.058
vern-0.057
 gag-0.053
Token is
Token ID318
Feature activation+4.10e-02

Pos logit contributions
ggy+0.061
 badly+0.056
vern+0.055
 liar+0.053
vere+0.052
Neg logit contributions
ened-0.028
cy-0.023
en-0.020
ted-0.020
neath-0.019
Token to
Token ID284
Feature activation+3.08e-03

Pos logit contributions
ggy+0.100
 badly+0.090
vern+0.090
 liar+0.088
vere+0.084
Neg logit contributions
ened-0.064
cy-0.057
en-0.052
 Sarah-0.052
ted-0.051
Token be
Token ID307
Feature activation+5.83e-02

Pos logit contributions
ggy+0.006
 badly+0.006
vern+0.006
 liar+0.006
vere+0.005
Neg logit contributions
ened-0.006
cy-0.005
en-0.005
 Sarah-0.005
 Jenna-0.005
Token dish
Token ID9433
Feature activation+4.57e-02

Pos logit contributions
ened+0.033
cy+0.029
en+0.027
ted+0.027
 Jenna+0.026
Neg logit contributions
ggy-0.036
 badly-0.034
 liar-0.032
vern-0.032
vere-0.030
Tokenwasher
Token ID45146
Feature activation+1.28e-02

Pos logit contributions
ggy+0.080
 badly+0.076
vern+0.070
 liar+0.066
vere+0.064
Neg logit contributions
ened-0.105
ted-0.085
cy-0.083
en-0.082
 Sarah-0.081
Token."
Token ID526
Feature activation+1.02e-02

Pos logit contributions
ggy+0.022
 badly+0.021
vern+0.019
 not+0.018
vere+0.018
Neg logit contributions
ened-0.028
cy-0.025
ted-0.024
neath-0.023
 Jenna-0.023
Token\n
Token ID198
Feature activation+8.46e-02

Pos logit contributions
ggy+0.019
 badly+0.016
unch+0.014
 not+0.014
 liar+0.014
Neg logit contributions
ened-0.024
en-0.021
ening-0.020
neath-0.019
cy-0.019
Token\n
Token ID198
Feature activation+6.86e-02

Pos logit contributions
ggy+0.193
 badly+0.177
vern+0.174
 liar+0.170
vere+0.161
Neg logit contributions
ened-0.144
cy-0.133
 Sarah-0.120
en-0.118
 Let-0.118
TokenTogether
Token ID41631
Feature activation-6.67e-02

Pos logit contributions
ggy+0.135
 badly+0.122
vern+0.119
 liar+0.116
vere+0.109
Neg logit contributions
ened-0.137
cy-0.130
 Let-0.117
 Sarah-0.117
en-0.117
Token they
Token ID484
Feature activation-8.83e-03

Pos logit contributions
ened+0.123
cy+0.110
neath+0.109
elling+0.108
en+0.107
Neg logit contributions
ggy-0.148
 badly-0.131
 not-0.118
 much-0.117
ggie-0.117
Token opened
Token ID4721
Feature activation-5.62e-02

Pos logit contributions
ened+0.016
elling+0.014
en+0.013
 Let+0.013
cy+0.013
Neg logit contributions
ggy-0.018
 badly-0.016
vern-0.015
oll-0.014
aches-0.014
Token the
Token ID262
Feature activation-1.99e-02

Pos logit contributions
elling+0.099
 Let+0.096
ened+0.096
 Girl+0.093
 exp+0.089
Neg logit contributions
 badly-0.103
ggy-0.099
 not-0.095
 much-0.091
 long-0.082
Token mysterious
Token ID11428
Feature activation-4.61e-02

Pos logit contributions
ened+0.040
cy+0.036
ted+0.035
en+0.034
 Jenna+0.033
Neg logit contributions
ggy-0.034
 badly-0.034
vern-0.034
 liar-0.031
vere-0.031
Token dish
Token ID9433
Feature activation+3.01e-02

Pos logit contributions
ened+0.087
cy+0.082
ted+0.081
 Jenna+0.078
 Let+0.076
Neg logit contributions
ggy-0.082
 badly-0.081
 liar-0.073
vern-0.070
gh-0.068
Token.
Token ID13
Feature activation+5.06e-03

Pos logit contributions
ggy+0.105
vern+0.099
 liar+0.094
vere+0.093
 badly+0.093
Neg logit contributions
ened-0.049
cy-0.041
en-0.041
 Let-0.040
 smiles-0.038
Token\n
Token ID198
Feature activation+8.77e-02

Pos logit contributions
ggy+0.013
 badly+0.011
vern+0.011
 liar+0.011
vere+0.011
Neg logit contributions
ened-0.008
cy-0.007
en-0.006
ted-0.006
neath-0.006
Token\n
Token ID198
Feature activation+6.68e-02

Pos logit contributions
ggy+0.223
 badly+0.212
 liar+0.210
vern+0.207
vere+0.196
Neg logit contributions
ened-0.109
cy-0.107
 Let-0.098
 Jenny-0.095
 Sarah-0.092
TokenThe
Token ID464
Feature activation-1.79e-02

Pos logit contributions
ggy+0.144
 badly+0.131
vern+0.131
 liar+0.126
vere+0.119
Neg logit contributions
ened-0.122
cy-0.115
en-0.103
 Let-0.102
ted-0.102
Token duck
Token ID22045
Feature activation+1.51e-02

Pos logit contributions
ened+0.026
cy+0.024
ted+0.021
en+0.021
 Let+0.020
Neg logit contributions
ggy-0.045
vern-0.043
 badly-0.042
vere-0.039
 liar-0.039
Token follows
Token ID5679
Feature activation-6.45e-02

Pos logit contributions
ggy+0.037
 badly+0.033
vern+0.032
 liar+0.032
ggie+0.030
Neg logit contributions
ened-0.025
cy-0.021
 Jenna-0.020
 Sarah-0.020
neath-0.019
Token their
Token ID511
Feature activation-1.04e-02

Pos logit contributions
ened+0.108
neath+0.097
cy+0.091
elling+0.091
oured+0.086
Neg logit contributions
ggy-0.151
 badly-0.128
ggie-0.122
gh-0.116
 not-0.114
Token ship
Token ID4074
Feature activation+2.20e-02

Pos logit contributions
ened+0.018
cy+0.016
en+0.015
ted+0.014
 smiles+0.014
Neg logit contributions
ggy-0.024
 badly-0.022
vern-0.022
 liar-0.021
vere-0.021
Token.
Token ID13
Feature activation-2.68e-02

Pos logit contributions
ggy+0.052
 badly+0.047
vern+0.046
 liar+0.044
vere+0.043
Neg logit contributions
ened-0.038
cy-0.034
en-0.031
ted-0.030
neath-0.030
Token L
Token ID406
Feature activation-2.14e-02

Pos logit contributions
ened+0.041
cy+0.039
en+0.033
 Girl+0.032
 Jenna+0.030
Neg logit contributions
ggy-0.066
 badly-0.060
 liar-0.059
 gag-0.056
unch-0.055
Tokenila
Token ID10102
Feature activation-2.11e-02

Pos logit contributions
ened+0.041
cy+0.038
en+0.035
ted+0.035
neath+0.034
Neg logit contributions
ggy-0.044
 badly-0.041
vern-0.040
 liar-0.039
vere-0.037