TOP ACTIVATIONS
MAX = 9.634

asked him , " What did you do with your chain
my asked , " What did you find down there ,
ily asked , " What did the brown bear do ?"
asked Jill , " What do you want to play ?"
and said , " What did you get me for my
" What do you recommend we do today
asked Bob , " What do you like to do with
her mom , " What can I do ?" Her mom
" What can you do with it ?"
What should we do for the event
my asked , " What did you pick up ?" Lily
to play . " What did you find ?" he asked
picnic . " What did you plan to take with
y asked : " What should I bring ?"
asked her mom " What does it mean to be independent
. " What did you do , sweet heart
. " What should we play ?" asked Sam
" Mom , why do the flowers spring ?" Tim
asked Mike : " What would you like best in the
What should we do now ?

MIN ACTIVATIONS
MIN = -7.988

Ron ny nodded . Tears filled his eyes .
solve his problem . From then on ,
being so helpful . The guard smiled and
his mom my . The two black ducks
the bear again . The boy also remembers
for their kindness . The tired bee flew
excited and happy . Now the bear had
apped and cheered . The k itty had
quickly ran away . Lu cy was very
hive unsc athed . Th ump was relieved
very brave and proud . The moral of this
was so happy . B unny thanked the
proud and special . But not everyone at
sad and scared . The next day Buzz
by many people . He went back to
pattern was fixed . The end . <|endoftext|>
and thanked the owl . The sw an flew
they became friends . From that day on
his mom too . He wanted to take
for his kindness . The boy was glad

INTERVAL 5.229 - 9.634
CONTAINS 0.633%

should never take something that doesn 't belong to you .
there was a furry dog called R uf us . R
" Hello , new dog . What is your name ?"
an ignorant goat . He thought he could get away with
give us our ball back . We love our ball .

INTERVAL 0.823 - 5.229
CONTAINS 35.416%

<|endoftext|> Once upon a time , there was an attractive beetle
a plane outside the window . He was so excited !
too , mom . I want to organize things like the
up the music so she could listen more closely .
That 's right !

INTERVAL -3.582 - 0.823
CONTAINS 57.554%

gave him a big hug and said he was a very
torn ear . Lily picked it up and decided to take
wanted to improve your bike so you can ride faster and
because the inside of the mailbox was filled with mail .
a silly clown who made them laugh . The clown had

INTERVAL -7.988 - -3.582
CONTAINS 6.397%

bed and ran outside . At the park ,
the air . Alice tried the game and
She would line them all up , give them cups and
l oll ipop . Her eyes lit up with
doing inside the freezer ?" Mom my smiled and
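
The four INTERVAL boundaries above are consistent with splitting the range from MIN (-7.988) to MAX (9.634) into four equal-width bins. The sketch below reproduces those boundaries under that assumption. The function name `equal_width_edges` is hypothetical and is not taken from the tool that produced this dump. The CONTAINS percentages cannot be recomputed here, because they depend on the full set of activations, which this dump does not include.

```python
def equal_width_edges(lo, hi, n_bins=4):
    """Return the n_bins + 1 boundaries of equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as reported in the dump above.
edges = equal_width_edges(-7.988, 9.634)

# Print each bin from highest to lowest, matching the order used in the dump.
for low, high in reversed(list(zip(edges[:-1], edges[1:]))):
    print(f"INTERVAL {low:.3f} - {high:.3f}")
```

The computed boundaries agree with the listed ones up to rounding in the third decimal place.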
Token asked
Token ID1965
Feature activation+1.89e+00

Pos logit contributions
ized+0.013
led+0.012
ls+0.012
ling+0.012
lights+0.011
Neg logit contributions
 fault-0.025
 loser-0.023
 liar-0.022
 seeker-0.022
 robber-0.021
Token him
Token ID683
Feature activation+2.43e+00

Pos logit contributions
 fault+0.051
 loser+0.045
 seeker+0.043
 liar+0.043
 supposed+0.043
Neg logit contributions
led-0.030
ling-0.030
ized-0.029
ls-0.029
 deep-0.029
Token,
Token ID11
Feature activation+8.69e-01

Pos logit contributions
 fault+0.067
 loser+0.063
 seeker+0.061
 supposed+0.060
 liar+0.059
Neg logit contributions
ls-0.034
 slowly-0.031
 deep-0.031
s-0.031
 into-0.029
Token "
Token ID366
Feature activation+1.11e+00

Pos logit contributions
 fault+0.025
 seeker+0.022
 loser+0.021
 liar+0.021
 supposed+0.021
Neg logit contributions
 deep-0.012
ized-0.012
 bub-0.011
bel-0.011
lights-0.011
TokenWhat
Token ID2061
Feature activation+4.99e+00

Pos logit contributions
 fault+0.031
 supposed+0.025
 liar+0.025
 loser+0.024
 seeker+0.024
Neg logit contributions
ized-0.020
cking-0.018
ling-0.017
The-0.017
ised-0.017
Token did
Token ID750
Feature activation+9.63e+00

Pos logit contributions
 fault+0.141
 loser+0.131
 robber+0.130
 seeker+0.129
 inconvenience+0.122
Neg logit contributions
s-0.063
 deep-0.060
ls-0.055
 big-0.053
 high-0.048
Token you
Token ID345
Feature activation+5.68e+00

Pos logit contributions
 fault+0.224
 loser+0.195
 seeker+0.193
 supposed+0.192
 liar+0.191
Neg logit contributions
lights-0.192
ized-0.189
ls-0.186
cking-0.184
led-0.174
Token do
Token ID466
Feature activation+7.08e+00

Pos logit contributions
 fault+0.123
 liar+0.103
 loser+0.101
 robber+0.099
 supposed+0.096
Neg logit contributions
ized-0.122
cking-0.120
 deep-0.112
ised-0.110
ling-0.109
Token with
Token ID351
Feature activation+1.99e-01

Pos logit contributions
 fault+0.206
 liar+0.183
 loser+0.179
epad+0.175
 robber+0.175
Neg logit contributions
ized-0.098
lights-0.088
bel-0.087
ls-0.084
 deep-0.084
Token your
Token ID534
Feature activation+2.06e+00

Pos logit contributions
 fault+0.005
 robber+0.004
 supposed+0.004
 loser+0.004
 seeker+0.004
Neg logit contributions
lights-0.004
ized-0.003
cking-0.003
ls-0.003
led-0.003
Token chain
Token ID6333
Feature activation+1.42e+00

Pos logit contributions
 fault+0.052
 robber+0.045
 supposed+0.045
 loser+0.045
 seeker+0.045
Neg logit contributions
lights-0.038
ized-0.037
cking-0.037
ls-0.036
led-0.035
Tokenmy
Token ID1820
Feature activation+2.42e-01

Pos logit contributions
 fault+0.009
 loser+0.008
 liar+0.008
 robber+0.008
 seeker+0.008
Neg logit contributions
cking-0.009
ling-0.009
ls-0.008
ized-0.008
led-0.008
Token asked
Token ID1965
Feature activation+6.16e-01

Pos logit contributions
 fault+0.006
 loser+0.006
 seeker+0.005
 supposed+0.005
 robber+0.005
Neg logit contributions
ized-0.004
ls-0.004
lights-0.004
cking-0.004
led-0.004
Token,
Token ID11
Feature activation+7.51e-01

Pos logit contributions
 fault+0.017
 loser+0.016
 supposed+0.015
 liar+0.015
 seeker+0.015
Neg logit contributions
ls-0.009
ling-0.009
 deep-0.009
led-0.009
ized-0.008
Token "
Token ID366
Feature activation+4.79e-01

Pos logit contributions
 fault+0.019
 loser+0.017
 supposed+0.017
 seeker+0.016
 liar+0.016
Neg logit contributions
ized-0.012
cking-0.012
 deep-0.012
ls-0.012
lights-0.012
TokenWhat
Token ID2061
Feature activation+4.53e+00

Pos logit contributions
 fault+0.012
 supposed+0.010
 loser+0.010
 liar+0.010
 robber+0.009
Neg logit contributions
ized-0.010
cking-0.009
lights-0.009
ls-0.009
ling-0.009
Token did
Token ID750
Feature activation+9.56e+00

Pos logit contributions
 fault+0.118
 robber+0.112
 loser+0.108
epad+0.105
 seeker+0.103
Neg logit contributions
ls-0.068
 deep-0.066
s-0.065
 lands-0.059
led-0.056
Token you
Token ID345
Feature activation+5.43e+00

Pos logit contributions
 fault+0.196
 robber+0.174
 loser+0.170
 liar+0.169
 supposed+0.168
Neg logit contributions
lights-0.213
ized-0.209
cking-0.208
ls-0.200
led-0.195
Token find
Token ID1064
Feature activation+1.82e+00

Pos logit contributions
 fault+0.111
 loser+0.093
 robber+0.093
 liar+0.093
 supposed+0.092
Neg logit contributions
ized-0.122
cking-0.121
lights-0.120
ls-0.118
led-0.113
Token down
Token ID866
Feature activation-2.26e-01

Pos logit contributions
 fault+0.032
 seeker+0.028
 supposed+0.028
 robber+0.028
 loser+0.026
Neg logit contributions
lights-0.047
cking-0.045
ized-0.044
ls-0.043
bel-0.042
Token there
Token ID612
Feature activation+1.41e+00

Pos logit contributions
lights+0.006
cking+0.006
ized+0.006
led+0.005
ls+0.005
Neg logit contributions
 fault-0.004
 seeker-0.004
 loser-0.003
 robber-0.003
 supposed-0.003
Token,
Token ID11
Feature activation+3.03e+00

Pos logit contributions
 fault+0.028
 robber+0.024
 loser+0.024
 supposed+0.024
 seeker+0.024
Neg logit contributions
lights-0.033
ized-0.032
cking-0.032
ls-0.031
led-0.030
Tokenily
Token ID813
Feature activation-1.65e+00

Pos logit contributions
 fault+0.003
 robber+0.003
 loser+0.003
 liar+0.003
 seeker+0.003
Neg logit contributions
led-0.004
ls-0.003
lights-0.003
ized-0.003
ling-0.003
Token asked
Token ID1965
Feature activation+1.24e+00

Pos logit contributions
lights+0.037
cking+0.034
ls+0.033
ized+0.032
cks+0.032
Neg logit contributions
 fault-0.034
 robber-0.031
 supposed-0.030
 liar-0.030
 loser-0.030
Token,
Token ID11
Feature activation-8.51e-02

Pos logit contributions
 fault+0.034
 supposed+0.030
 loser+0.030
 robber+0.030
 liar+0.029
Neg logit contributions
lights-0.019
ized-0.019
ls-0.019
cking-0.018
led-0.018
Token "
Token ID366
Feature activation+9.94e-01

Pos logit contributions
lights+0.002
ized+0.002
cking+0.002
ls+0.002
led+0.002
Neg logit contributions
 fault-0.002
 loser-0.002
 supposed-0.002
 robber-0.002
 seeker-0.002
TokenWhat
Token ID2061
Feature activation+4.81e+00

Pos logit contributions
 fault+0.023
 supposed+0.019
 robber+0.019
 loser+0.019
 seeker+0.019
Neg logit contributions
ized-0.021
lights-0.021
cking-0.020
ls-0.020
led-0.019
Token did
Token ID750
Feature activation+9.46e+00

Pos logit contributions
 fault+0.129
 robber+0.122
 loser+0.120
 seeker+0.115
epad+0.114
Neg logit contributions
ls-0.069
s-0.064
 deep-0.064
 high-0.054
led-0.054
Token the
Token ID262
Feature activation+1.31e+00

Pos logit contributions
 fault+0.197
 robber+0.174
 loser+0.170
 supposed+0.170
 liar+0.169
Neg logit contributions
lights-0.206
ized-0.205
cking-0.203
ls-0.195
led-0.191
Token brown
Token ID7586
Feature activation+2.43e+00

Pos logit contributions
 fault+0.031
 loser+0.027
 liar+0.027
 supposed+0.027
 robber+0.027
Neg logit contributions
lights-0.024
ls-0.023
cking-0.023
ized-0.023
ling-0.022
Token bear
Token ID6842
Feature activation+4.30e+00

Pos logit contributions
 fault+0.063
 supposed+0.057
 liar+0.055
 loser+0.055
epad+0.054
Neg logit contributions
ls-0.042
lights-0.041
ling-0.039
ized-0.039
led-0.038
Token do
Token ID466
Feature activation+5.82e+00

Pos logit contributions
 fault+0.086
 robber+0.074
 loser+0.073
 supposed+0.073
 seeker+0.073
Neg logit contributions
lights-0.098
ized-0.097
cking-0.096
ls-0.095
led-0.091
Token?"
Token ID1701
Feature activation-1.17e+00

Pos logit contributions
 fault+0.139
 liar+0.119
 loser+0.119
epad+0.118
 robber+0.117
Neg logit contributions
ized-0.108
ls-0.107
lights-0.105
cking-0.101
bel-0.100
Token asked
Token ID1965
Feature activation+2.23e+00

Pos logit contributions
 fault+0.053
 loser+0.048
 robber+0.047
 liar+0.047
 seeker+0.047
Neg logit contributions
ized-0.030
ls-0.029
lights-0.028
cking-0.028
led-0.028
Token Jill
Token ID22772
Feature activation+2.67e+00

Pos logit contributions
 fault+0.052
 supposed+0.045
 loser+0.045
 liar+0.044
 seeker+0.043
Neg logit contributions
lights-0.046
ls-0.045
ized-0.043
led-0.042
 deep-0.041
Token,
Token ID11
Feature activation+9.34e-01

Pos logit contributions
 fault+0.058
 loser+0.055
 seeker+0.055
 robber+0.053
 liar+0.053
Neg logit contributions
ls-0.054
ling-0.051
ized-0.050
lights-0.050
cking-0.049
Token "
Token ID366
Feature activation+1.49e+00

Pos logit contributions
 fault+0.028
 seeker+0.024
 loser+0.024
 supposed+0.023
 liar+0.023
Neg logit contributions
 deep-0.012
ized-0.012
 low-0.011
 bub-0.011
bel-0.011
TokenWhat
Token ID2061
Feature activation+5.37e+00

Pos logit contributions
 fault+0.044
 supposed+0.036
 liar+0.036
 robber+0.034
 loser+0.034
Neg logit contributions
ized-0.024
The-0.022
cking-0.021
ling-0.021
Later-0.021
Token do
Token ID466
Feature activation+9.46e+00

Pos logit contributions
 fault+0.150
 robber+0.147
 seeker+0.141
 loser+0.140
epad+0.134
Neg logit contributions
s-0.068
 deep-0.062
ls-0.058
 big-0.053
 high-0.049
Token you
Token ID345
Feature activation+5.63e+00

Pos logit contributions
 fault+0.230
epad+0.194
 supposed+0.194
 seeker+0.193
 liar+0.188
Neg logit contributions
ls-0.182
lights-0.178
ized-0.177
cking-0.167
bel-0.166
Token want
Token ID765
Feature activation+5.47e+00

Pos logit contributions
 fault+0.115
 liar+0.095
 reader+0.094
 loser+0.093
 seeker+0.089
Neg logit contributions
ling-0.124
 deep-0.122
ized-0.122
cking-0.121
 bub-0.118
Token to
Token ID284
Feature activation+4.86e+00

Pos logit contributions
 fault+0.130
 loser+0.116
 liar+0.114
 seeker+0.114
epad+0.110
Neg logit contributions
ized-0.102
ls-0.099
lights-0.097
 deep-0.096
led-0.092
Token play
Token ID711
Feature activation+3.09e+00

Pos logit contributions
 fault+0.115
 liar+0.102
 loser+0.100
 seeker+0.098
 robber+0.097
Neg logit contributions
ized-0.094
lights-0.087
cking-0.086
ised-0.084
 deep-0.084
Token?"
Token ID1701
Feature activation-1.06e-01

Pos logit contributions
 fault+0.078
 supposed+0.068
 liar+0.067
 loser+0.067
 robber+0.066
Neg logit contributions
ized-0.056
lights-0.055
ls-0.052
cking-0.052
ised-0.050
Token and
Token ID290
Feature activation-1.48e+00

Pos logit contributions
lights+0.011
ized+0.011
cking+0.011
ls+0.010
led+0.010
Neg logit contributions
 fault-0.015
 supposed-0.013
 robber-0.013
 loser-0.013
 seeker-0.013
Token said
Token ID531
Feature activation+4.14e+00

Pos logit contributions
lights+0.023
ized+0.022
cking+0.022
ls+0.022
led+0.020
Neg logit contributions
 fault-0.041
 robber-0.036
 loser-0.036
 supposed-0.036
 seeker-0.036
Token,
Token ID11
Feature activation+8.80e-01

Pos logit contributions
 fault+0.125
 supposed+0.113
 robber+0.113
 loser+0.113
 seeker+0.112
Neg logit contributions
ls-0.056
lights-0.054
ized-0.052
led-0.048
 deep-0.046
Token "
Token ID366
Feature activation+1.20e+00

Pos logit contributions
 fault+0.019
 loser+0.017
 seeker+0.016
 robber+0.016
 supposed+0.016
Neg logit contributions
lights-0.019
ized-0.018
cking-0.018
ls-0.018
led-0.017
TokenWhat
Token ID2061
Feature activation+4.74e+00

Pos logit contributions
 fault+0.037
 liar+0.032
 robber+0.032
 loser+0.032
 supposed+0.031
Neg logit contributions
ized-0.017
lights-0.016
cking-0.015
bel-0.014
ised-0.014
Token did
Token ID750
Feature activation+9.42e+00

Pos logit contributions
 fault+0.130
 robber+0.124
 loser+0.121
 seeker+0.114
 liar+0.111
Neg logit contributions
 deep-0.064
s-0.061
 big-0.057
 high-0.056
ls-0.055
Token you
Token ID345
Feature activation+5.76e+00

Pos logit contributions
 fault+0.212
 loser+0.184
 robber+0.182
 seeker+0.182
 supposed+0.181
Neg logit contributions
lights-0.194
ized-0.192
ls-0.188
cking-0.187
led-0.177
Token get
Token ID651
Feature activation+2.94e+00

Pos logit contributions
 fault+0.144
 liar+0.118
 loser+0.114
 robber+0.109
 supposed+0.106
Neg logit contributions
ized-0.110
 deep-0.106
cking-0.104
 bub-0.097
ised-0.097
Token me
Token ID502
Feature activation+2.55e+00

Pos logit contributions
 fault+0.075
 loser+0.067
 liar+0.066
 robber+0.065
 seeker+0.063
Neg logit contributions
ized-0.053
ls-0.051
lights-0.051
cking-0.051
led-0.048
Token for
Token ID329
Feature activation+1.66e-01

Pos logit contributions
 fault+0.057
 loser+0.050
 robber+0.049
 liar+0.049
 supposed+0.049
Neg logit contributions
lights-0.052
ized-0.052
cking-0.051
ls-0.050
led-0.048
Token my
Token ID616
Feature activation+2.68e+00

Pos logit contributions
 fault+0.004
 liar+0.004
 loser+0.004
 robber+0.004
epad+0.003
Neg logit contributions
ized-0.003
lights-0.003
ls-0.003
cking-0.003
ised-0.003
Token
Token ID220
Feature activation+8.90e-01

Pos logit contributions
 fault+0.055
 loser+0.051
 liar+0.050
 robber+0.048
 seeker+0.048
Neg logit contributions
lights-0.048
ized-0.044
ls-0.043
 grass-0.042
bel-0.042
Token
Token ID198
Feature activation-4.58e+00

Pos logit contributions
 fault+0.023
 supposed+0.021
 loser+0.020
 robber+0.020
 seeker+0.020
Neg logit contributions
ized-0.015
lights-0.015
cking-0.015
ls-0.014
led-0.014
Token
Token ID198
Feature activation-3.37e+00

Pos logit contributions
lights+0.083
ized+0.082
cking+0.081
led+0.076
ls+0.076
Neg logit contributions
 fault-0.113
 supposed-0.101
 seeker-0.099
 robber-0.099
 loser-0.099
Token"
Token ID1
Feature activation+1.70e+00

Pos logit contributions
lights+0.057
ized+0.056
cking+0.055
ls+0.054
led+0.051
Neg logit contributions
 fault-0.089
 loser-0.078
 robber-0.078
 supposed-0.078
 seeker-0.078
TokenWhat
Token ID2061
Feature activation+5.40e+00

Pos logit contributions
 fault+0.048
 supposed+0.039
 liar+0.038
 loser+0.037
 robber+0.036
Neg logit contributions
ized-0.029
lights-0.027
The-0.027
Later-0.026
cking-0.026
Token do
Token ID466
Feature activation+9.42e+00

Pos logit contributions
 fault+0.155
 robber+0.150
 loser+0.144
 seeker+0.140
 liar+0.134
Neg logit contributions
s-0.063
 deep-0.059
ls-0.054
 big-0.051
 ch-0.049
Token you
Token ID345
Feature activation+5.94e+00

Pos logit contributions
 fault+0.221
 supposed+0.186
 loser+0.184
 seeker+0.184
 liar+0.182
Neg logit contributions
ls-0.188
lights-0.188
ized-0.186
cking-0.177
led-0.171
Token recommend
Token ID4313
Feature activation+5.77e+00

Pos logit contributions
 fault+0.124
 liar+0.100
 loser+0.096
 robber+0.087
 Thief+0.086
Neg logit contributions
 deep-0.131
ling-0.124
 bub-0.122
ized-0.122
cking-0.120
Token we
Token ID356
Feature activation+4.74e+00

Pos logit contributions
 fault+0.144
 loser+0.124
 liar+0.124
 seeker+0.119
epad+0.119
Neg logit contributions
ized-0.105
ls-0.102
lights-0.101
 deep-0.097
cking-0.095
Token do
Token ID466
Feature activation+5.52e+00

Pos logit contributions
 fault+0.127
 liar+0.108
 loser+0.103
 robber+0.097
oyal+0.096
Neg logit contributions
 deep-0.078
ling-0.074
 bub-0.073
 row-0.071
 ch-0.069
Token today
Token ID1909
Feature activation+2.91e+00

Pos logit contributions
 fault+0.147
 liar+0.133
 loser+0.131
 robber+0.127
epad+0.126
Neg logit contributions
ized-0.085
ls-0.081
 deep-0.081
lights-0.080
ised-0.077
Token asked
Token ID1965
Feature activation+2.32e+00

Pos logit contributions
 fault+0.047
 loser+0.042
 robber+0.041
 liar+0.041
 supposed+0.041
Neg logit contributions
ized-0.028
lights-0.028
ls-0.027
cking-0.027
led-0.026
Token Bob
Token ID5811
Feature activation+1.95e+00

Pos logit contributions
 fault+0.062
 supposed+0.056
epad+0.054
 loser+0.054
 liar+0.053
Neg logit contributions
ls-0.038
lights-0.036
 deep-0.034
led-0.033
ling-0.032
Token,
Token ID11
Feature activation+9.10e-01

Pos logit contributions
 fault+0.050
 loser+0.048
 seeker+0.047
 liar+0.046
 robber+0.045
Neg logit contributions
s-0.031
ls-0.030
 deep-0.028
ling-0.028
bel-0.026
Token "
Token ID366
Feature activation+1.21e+00

Pos logit contributions
 fault+0.026
 seeker+0.023
 loser+0.023
 supposed+0.023
 liar+0.023
Neg logit contributions
 deep-0.012
 low-0.011
 bub-0.010
ls-0.010
ized-0.010
TokenWhat
Token ID2061
Feature activation+5.08e+00

Pos logit contributions
 fault+0.033
 supposed+0.027
 liar+0.026
epad+0.025
 robber+0.025
Neg logit contributions
ized-0.020
ling-0.020
cking-0.020
 distance-0.019
The-0.019
Token do
Token ID466
Feature activation+9.42e+00

Pos logit contributions
 fault+0.139
 robber+0.130
 seeker+0.126
 loser+0.126
epad+0.122
Neg logit contributions
s-0.069
 deep-0.066
ls-0.065
 big-0.055
 high-0.055
Token you
Token ID345
Feature activation+5.45e+00

Pos logit contributions
 fault+0.209
 supposed+0.177
 seeker+0.177
 loser+0.174
 robber+0.174
Neg logit contributions
lights-0.196
ls-0.195
ized-0.193
cking-0.187
led-0.180
Token like
Token ID588
Feature activation+3.34e+00

Pos logit contributions
 fault+0.100
 liar+0.084
 robber+0.081
 loser+0.080
epad+0.075
Neg logit contributions
ized-0.134
cking-0.131
ling-0.126
 deep-0.126
ised-0.124
Token to
Token ID284
Feature activation+4.84e+00

Pos logit contributions
 fault+0.083
 loser+0.074
 liar+0.074
 robber+0.073
 supposed+0.072
Neg logit contributions
lights-0.058
ized-0.058
ls-0.056
cking-0.055
ised-0.052
Token do
Token ID466
Feature activation+5.67e+00

Pos logit contributions
 fault+0.114
 loser+0.099
 liar+0.099
 robber+0.098
 seeker+0.097
Neg logit contributions
ized-0.094
lights-0.092
cking-0.091
ls-0.087
ling-0.084
Token with
Token ID351
Feature activation+1.01e-01

Pos logit contributions
 fault+0.150
 liar+0.127
 loser+0.127
epad+0.126
 supposed+0.123
Neg logit contributions
ized-0.097
ls-0.092
lights-0.092
ised-0.087
bel-0.086
Token her
Token ID607
Feature activation+2.01e+00

Pos logit contributions
 fault+0.058
 loser+0.052
 robber+0.051
 liar+0.051
 seeker+0.051
Neg logit contributions
lights-0.031
ized-0.031
ls-0.031
cking-0.029
led-0.029
Token mom
Token ID1995
Feature activation+8.76e-02

Pos logit contributions
 fault+0.058
 loser+0.053
 supposed+0.051
 liar+0.051
 robber+0.051
Neg logit contributions
lights-0.028
ls-0.028
ized-0.027
cking-0.026
bel-0.024
Token,
Token ID11
Feature activation+4.78e-01

Pos logit contributions
 fault+0.002
 loser+0.002
 liar+0.002
 robber+0.002
 seeker+0.002
Neg logit contributions
lights-0.001
ling-0.001
ls-0.001
s-0.001
ized-0.001
Token "
Token ID366
Feature activation+1.22e+00

Pos logit contributions
 fault+0.013
 loser+0.011
 seeker+0.011
 liar+0.011
 supposed+0.011
Neg logit contributions
ized-0.008
 deep-0.007
cking-0.007
ls-0.007
lights-0.007
TokenWhat
Token ID2061
Feature activation+4.91e+00

Pos logit contributions
 fault+0.039
 liar+0.034
 supposed+0.034
 loser+0.033
 robber+0.033
Neg logit contributions
ized-0.015
lights-0.015
cking-0.014
ised-0.013
bel-0.013
Token can
Token ID460
Feature activation+9.41e+00

Pos logit contributions
 fault+0.138
 robber+0.132
 loser+0.131
 seeker+0.124
 liar+0.120
Neg logit contributions
s-0.064
 deep-0.062
ls-0.061
 big-0.054
 high-0.054
Token I
Token ID314
Feature activation+6.04e+00

Pos logit contributions
 fault+0.157
 loser+0.129
 robber+0.128
 seeker+0.128
 supposed+0.127
Neg logit contributions
lights-0.249
ized-0.247
ls-0.243
cking-0.242
led-0.232
Token do
Token ID466
Feature activation+6.20e+00

Pos logit contributions
 fault+0.133
 loser+0.110
 liar+0.110
 robber+0.105
 seeker+0.104
Neg logit contributions
ized-0.125
cking-0.120
ls-0.120
 deep-0.120
lights-0.116
Token?"
Token ID1701
Feature activation-9.95e-01

Pos logit contributions
 fault+0.180
 liar+0.155
 loser+0.154
 robber+0.149
 seeker+0.147
Neg logit contributions
ized-0.090
bel-0.084
ls-0.082
 deep-0.082
lights-0.081
Token Her
Token ID2332
Feature activation+2.98e+00

Pos logit contributions
cking+0.025
lights+0.025
ized+0.024
ls+0.024
led+0.023
Neg logit contributions
 fault-0.018
 supposed-0.015
 seeker-0.015
 robber-0.015
 loser-0.015
Token mom
Token ID1995
Feature activation-2.08e+00

Pos logit contributions
 fault+0.078
 robber+0.069
 loser+0.069
 supposed+0.068
 seeker+0.068
Neg logit contributions
lights-0.051
ized-0.051
cking-0.050
ls-0.049
led-0.046
Token
Token ID220
Feature activation+6.66e-01

Pos logit contributions
 fault+0.041
 loser+0.035
 robber+0.035
 supposed+0.035
 seeker+0.035
Neg logit contributions
lights-0.046
ized-0.045
cking-0.044
ls-0.044
led-0.042
Token
Token ID198
Feature activation-5.02e+00

Pos logit contributions
 fault+0.017
 supposed+0.015
 robber+0.015
 loser+0.015
 seeker+0.015
Neg logit contributions
ized-0.012
lights-0.012
cking-0.011
ls-0.011
led-0.011
Token
Token ID198
Feature activation-3.93e+00

Pos logit contributions
lights+0.102
ized+0.099
cking+0.099
led+0.095
ls+0.092
Neg logit contributions
 fault-0.110
 supposed-0.101
 robber-0.100
 loser-0.099
 seeker-0.098
Token"
Token ID1
Feature activation+1.44e+00

Pos logit contributions
lights+0.084
cking+0.081
ized+0.080
ls+0.080
led+0.077
Neg logit contributions
 fault-0.079
 robber-0.076
 loser-0.074
 supposed-0.074
 seeker-0.073
TokenWhat
Token ID2061
Feature activation+5.04e+00

Pos logit contributions
 fault+0.038
 loser+0.033
 robber+0.033
 supposed+0.033
 liar+0.033
Neg logit contributions
ized-0.025
lights-0.025
cking-0.024
ls-0.024
led-0.023
Token can
Token ID460
Feature activation+9.38e+00

Pos logit contributions
 fault+0.136
 robber+0.136
 loser+0.129
 seeker+0.123
 liar+0.119
Neg logit contributions
 deep-0.065
s-0.063
ls-0.063
 lands-0.054
 ch-0.054
Token you
Token ID345
Feature activation+5.88e+00

Pos logit contributions
 fault+0.174
 seeker+0.146
 loser+0.145
 robber+0.144
 liar+0.142
Neg logit contributions
ized-0.234
lights-0.233
ls-0.231
cking-0.225
led-0.215
Token do
Token ID466
Feature activation+7.13e+00

Pos logit contributions
 fault+0.138
 liar+0.114
 loser+0.113
 supposed+0.112
epad+0.109
Neg logit contributions
ized-0.119
cking-0.114
 deep-0.109
ised-0.104
ling-0.103
Token with
Token ID351
Feature activation+5.33e-01

Pos logit contributions
 fault+0.211
 loser+0.181
 liar+0.179
epad+0.178
 supposed+0.176
Neg logit contributions
ized-0.103
ls-0.098
 deep-0.092
bel-0.090
lights-0.089
Token it
Token ID340
Feature activation+2.84e+00

Pos logit contributions
 fault+0.013
 robber+0.012
 supposed+0.011
 seeker+0.011
 loser+0.011
Neg logit contributions
lights-0.010
ized-0.010
cking-0.010
ls-0.009
led-0.009
Token?"
Token ID1701
Feature activation-1.37e+00

Pos logit contributions
 fault+0.056
 robber+0.048
 loser+0.048
 supposed+0.048
 seeker+0.048
Neg logit contributions
lights-0.065
ized-0.065
cking-0.064
ls-0.063
led-0.061
Token
Token ID95
Feature activation+2.87e+00

Pos logit contributions
 fault+0.102
 supposed+0.087
 loser+0.087
 liar+0.086
 robber+0.085
Neg logit contributions
ized-0.091
lights-0.087
cking-0.086
ls-0.085
ised-0.080
Token
Token ID26391
Feature activation+5.29e+00

Pos logit contributions
 fault+0.066
 liar+0.058
 loser+0.057
 robber+0.056
 supposed+0.052
Neg logit contributions
ized-0.059
cking-0.055
lights-0.055
ised-0.052
cks-0.051
Token
Token ID129
Feature activation+1.74e+00

Pos logit contributions
 fault+0.125
 robber+0.110
 loser+0.110
 liar+0.108
 supposed+0.108
Neg logit contributions
lights-0.102
ized-0.102
ls-0.099
cking-0.099
led-0.093
Token
Token ID241
Feature activation+1.89e+00

Pos logit contributions
 fault+0.041
 loser+0.036
 robber+0.036
 liar+0.036
 seeker+0.036
Neg logit contributions
ized-0.033
lights-0.033
cking-0.033
ls-0.032
led-0.030
TokenWhat
Token ID2061
Feature activation+4.00e+00

Pos logit contributions
 fault+0.047
 robber+0.044
 Thief+0.042
 liar+0.042
 loser+0.040
Neg logit contributions
The-0.031
s-0.030
bel-0.030
 deep-0.030
ized-0.030
Token should
Token ID815
Feature activation+9.35e+00

Pos logit contributions
 fault+0.110
 loser+0.102
 robber+0.101
 seeker+0.095
epad+0.092
Neg logit contributions
 deep-0.058
ls-0.057
s-0.057
 high-0.049
 big-0.049
Token we
Token ID356
Feature activation+5.22e+00

Pos logit contributions
 fault+0.260
 loser+0.228
 liar+0.227
 seeker+0.226
epad+0.225
Neg logit contributions
lights-0.146
ls-0.145
ized-0.133
 deep-0.129
bel-0.124
Token do
Token ID466
Feature activation+5.77e+00

Pos logit contributions
 fault+0.119
 liar+0.099
 loser+0.098
 robber+0.094
 seeker+0.091
Neg logit contributions
ized-0.106
 deep-0.103
cking-0.102
lights-0.099
ls-0.098
Token for
Token ID329
Feature activation+4.04e-01

Pos logit contributions
 fault+0.153
 liar+0.141
 robber+0.139
 loser+0.136
 Thief+0.133
Neg logit contributions
ized-0.091
ls-0.085
lights-0.085
bel-0.084
 deep-0.082
Token the
Token ID262
Feature activation+7.57e-01

Pos logit contributions
 fault+0.010
 liar+0.009
 robber+0.009
 loser+0.009
epad+0.009
Neg logit contributions
ized-0.007
lights-0.007
ls-0.007
cking-0.007
ised-0.007
Token event
Token ID1785
Feature activation+2.68e+00

Pos logit contributions
 fault+0.016
 loser+0.014
 liar+0.013
 robber+0.013
epad+0.013
Neg logit contributions
lights-0.017
ized-0.017
ls-0.016
cking-0.016
 deep-0.015
Tokenmy
Token ID1820
Feature activation-1.06e+00

Pos logit contributions
lights+0.040
ized+0.040
cking+0.039
ls+0.039
led+0.038
Neg logit contributions
 fault-0.025
 robber-0.021
 supposed-0.021
 loser-0.021
 seeker-0.020
Token asked
Token ID1965
Feature activation+5.46e-01

Pos logit contributions
lights+0.023
cking+0.022
ized+0.020
cks+0.020
ls+0.020
Neg logit contributions
 fault-0.022
 robber-0.020
 supposed-0.020
 indeed-0.019
 liar-0.019
Token,
Token ID11
Feature activation-1.43e-01

Pos logit contributions
 fault+0.015
 loser+0.013
 supposed+0.013
 seeker+0.013
 liar+0.012
Neg logit contributions
lights-0.009
ls-0.009
ized-0.009
cking-0.008
led-0.008
Token "
Token ID366
Feature activation+8.61e-01

Pos logit contributions
lights+0.003
ized+0.003
cking+0.003
ls+0.003
led+0.003
Neg logit contributions
 fault-0.003
 loser-0.003
 seeker-0.003
 supposed-0.003
 robber-0.003
TokenWhat
Token ID2061
Feature activation+4.63e+00

Pos logit contributions
 fault+0.023
 supposed+0.019
 loser+0.018
 liar+0.018
 seeker+0.018
Neg logit contributions
ized-0.016
lights-0.016
cking-0.016
ls-0.014
bel-0.014
Token did
Token ID750
Feature activation+9.31e+00

Pos logit contributions
 fault+0.127
 robber+0.117
 loser+0.116
 seeker+0.111
uries+0.106
Neg logit contributions
 deep-0.061
ls-0.060
s-0.060
 ch-0.053
 big-0.053
Token you
Token ID345
Feature activation+5.72e+00

Pos logit contributions
 fault+0.200
 loser+0.173
 robber+0.173
 supposed+0.172
 seeker+0.171
Neg logit contributions
lights-0.200
ized-0.198
cking-0.194
ls-0.192
led-0.184
Token pick
Token ID2298
Feature activation+9.76e-02

Pos logit contributions
 fault+0.132
 loser+0.109
 liar+0.109
 robber+0.106
 supposed+0.102
Neg logit contributions
ized-0.118
cking-0.115
 deep-0.108
ised-0.106
ls-0.104
Token up
Token ID510
Feature activation-7.23e-01

Pos logit contributions
 fault+0.002
 robber+0.002
 seeker+0.002
 supposed+0.002
 loser+0.002
Neg logit contributions
lights-0.002
cking-0.002
ized-0.002
ls-0.002
led-0.002
Token?"
Token ID1701
Feature activation-1.16e+00

Pos logit contributions
lights+0.016
cking+0.015
ized+0.015
ls+0.015
led+0.014
Neg logit contributions
 fault-0.015
 robber-0.013
 seeker-0.013
 loser-0.013
 supposed-0.013
Token Lily
Token ID20037
Feature activation-1.73e+00

Pos logit contributions
lights+0.027
ized+0.027
cking+0.027
ls+0.027
led+0.025
Neg logit contributions
 fault-0.022
 robber-0.020
 loser-0.019
 supposed-0.019
 liar-0.019
Token to
Token ID284
Feature activation-1.52e+00

Pos logit contributions
ized+0.054
lights+0.053
ls+0.053
cking+0.050
 deep+0.048
Neg logit contributions
 fault-0.088
 liar-0.077
 loser-0.077
 supposed-0.076
 robber-0.076
Token play
Token ID711
Feature activation+1.58e+00

Pos logit contributions
ized+0.027
cking+0.026
lights+0.026
ls+0.025
led+0.024
Neg logit contributions
 fault-0.040
 loser-0.035
 supposed-0.034
 robber-0.034
 liar-0.034
Token.
Token ID13
Feature activation+2.32e+00

Pos logit contributions
 fault+0.041
 robber+0.038
 loser+0.038
 seeker+0.037
 liar+0.037
Neg logit contributions
lights-0.027
cking-0.025
ized-0.025
ls-0.025
led-0.023
Token "
Token ID366
Feature activation+6.19e-01

Pos logit contributions
 fault+0.062
 loser+0.055
 seeker+0.055
 supposed+0.054
 liar+0.054
Neg logit contributions
lights-0.041
ized-0.037
ls-0.036
 bub-0.035
bel-0.035
TokenWhat
Token ID2061
Feature activation+4.75e+00

Pos logit contributions
 fault+0.015
 supposed+0.013
 robber+0.013
 loser+0.013
 seeker+0.013
Neg logit contributions
ized-0.012
cking-0.012
lights-0.011
ls-0.011
led-0.011
Token did
Token ID750
Feature activation+9.22e+00

Pos logit contributions
 fault+0.132
 robber+0.129
 loser+0.121
 seeker+0.119
epad+0.117
Neg logit contributions
s-0.060
ls-0.057
 deep-0.056
 big-0.046
led-0.045
Token you
Token ID345
Feature activation+5.53e+00

Pos logit contributions
 fault+0.195
 robber+0.169
 loser+0.168
 supposed+0.167
 seeker+0.166
Neg logit contributions
lights-0.201
ized-0.199
cking-0.196
ls-0.193
led-0.185
Token find
Token ID1064
Feature activation+1.78e+00

Pos logit contributions
 fault+0.125
 liar+0.107
 loser+0.106
 robber+0.103
 reader+0.102
Neg logit contributions
cking-0.113
ized-0.113
 deep-0.109
ling-0.106
 bub-0.103
Token?"
Token ID1701
Feature activation-1.33e+00

Pos logit contributions
 fault+0.033
 supposed+0.029
 robber+0.029
 seeker+0.029
 loser+0.027
Neg logit contributions
lights-0.043
cking-0.042
ized-0.042
ls-0.040
bel-0.039
Token he
Token ID339
Feature activation-8.27e-01

Pos logit contributions
lights+0.027
ized+0.026
cking+0.026
ls+0.025
led+0.024
Neg logit contributions
 fault-0.031
 robber-0.027
 loser-0.027
 supposed-0.027
 seeker-0.026
Token asked
Token ID1965
Feature activation+3.35e-01

Pos logit contributions
lights+0.019
ized+0.017
cking+0.017
ls+0.017
bel+0.016
Neg logit contributions
 fault-0.016
 robber-0.016
 supposed-0.015
 loser-0.015
 seeker-0.015
Token picnic
Token ID35715
Feature activation+4.52e-01

Pos logit contributions
 fault+0.037
 loser+0.033
 liar+0.032
 seeker+0.032
 robber+0.032
Neg logit contributions
lights-0.021
ized-0.021
ls-0.020
cking-0.020
 deep-0.018
Token.
Token ID13
Feature activation+2.25e+00

Pos logit contributions
 fault+0.012
 robber+0.011
 loser+0.011
 seeker+0.011
 supposed+0.011
Neg logit contributions
lights-0.007
ized-0.007
cking-0.007
ls-0.007
led-0.007
Token
Token ID198
Feature activation-2.91e+00

Pos logit contributions
 fault+0.054
 liar+0.048
 supposed+0.047
 loser+0.047
 robber+0.046
Neg logit contributions
lights-0.045
ized-0.042
ls-0.040
cking-0.039
ised-0.039
Token"
Token ID1
Feature activation+1.32e+00

Pos logit contributions
lights+0.048
ized+0.048
cking+0.046
ls+0.045
led+0.044
Neg logit contributions
 fault-0.076
 supposed-0.069
 loser-0.068
 robber-0.068
 seeker-0.067
TokenWhat
Token ID2061
Feature activation+4.98e+00

Pos logit contributions
 fault+0.040
 liar+0.034
 robber+0.033
 supposed+0.033
 loser+0.032
Neg logit contributions
lights-0.019
ized-0.018
The-0.018
 bub-0.017
 pans-0.017
Token did
Token ID750
Feature activation+9.21e+00

Pos logit contributions
 fault+0.142
 robber+0.135
 loser+0.133
 seeker+0.129
epad+0.124
Neg logit contributions
 deep-0.064
ls-0.062
s-0.061
 big-0.052
 high-0.051
Token you
Token ID345
Feature activation+5.85e+00

Pos logit contributions
 fault+0.226
 loser+0.198
 seeker+0.196
 supposed+0.196
 liar+0.195
Neg logit contributions
lights-0.173
ized-0.169
ls-0.167
cking-0.164
led-0.154
Token plan
Token ID1410
Feature activation+4.95e+00

Pos logit contributions
 fault+0.152
 liar+0.125
 loser+0.122
 robber+0.113
 seeker+0.111
Neg logit contributions
 deep-0.105
ized-0.103
cking-0.103
 bub-0.099
ling-0.095
Token to
Token ID284
Feature activation+2.55e+00

Pos logit contributions
 fault+0.106
 loser+0.092
 robber+0.091
 liar+0.091
 seeker+0.090
Neg logit contributions
lights-0.106
ized-0.105
cking-0.102
ls-0.102
led-0.097
Token take
Token ID1011
Feature activation-3.23e-02

Pos logit contributions
 fault+0.059
 liar+0.051
 loser+0.050
 robber+0.049
 supposed+0.049
Neg logit contributions
ized-0.051
lights-0.050
cking-0.050
ls-0.048
 deep-0.047
Token with
Token ID351
Feature activation-1.14e+00

Pos logit contributions
lights+0.001
ized+0.001
cking+0.001
ls+0.001
led+0.001
Neg logit contributions
 fault-0.001
 loser-0.001
 robber-0.001
 supposed-0.001
 liar-0.001
Tokeny
Token ID88
Feature activation-1.26e+00

Pos logit contributions
lights+0.012
ized+0.012
cking+0.012
ls+0.012
led+0.011
Neg logit contributions
 fault-0.018
 robber-0.016
 loser-0.016
 supposed-0.016
 seeker-0.016
Token asked
Token ID1965
Feature activation+1.57e+00

Pos logit contributions
lights+0.021
ized+0.021
cking+0.020
ls+0.020
led+0.019
Neg logit contributions
 fault-0.033
 loser-0.029
 robber-0.029
 supposed-0.029
 seeker-0.029
Token:
Token ID25
Feature activation+9.93e-01

Pos logit contributions
 fault+0.045
 seeker+0.040
 loser+0.040
 supposed+0.039
 robber+0.038
Neg logit contributions
lights-0.023
ized-0.023
ls-0.022
led-0.021
bel-0.020
Token "
Token ID366
Feature activation+1.25e+00

Pos logit contributions
 fault+0.023
 robber+0.019
 loser+0.019
 seeker+0.019
 supposed+0.019
Neg logit contributions
lights-0.020
ized-0.020
ls-0.019
cking-0.019
led-0.018
TokenWhat
Token ID2061
Feature activation+5.33e+00

Pos logit contributions
 fault+0.037
 supposed+0.031
 liar+0.030
 seeker+0.030
 loser+0.030
Neg logit contributions
ized-0.019
lights-0.019
bel-0.018
cking-0.018
 pans-0.017
Token should
Token ID815
Feature activation+9.17e+00

Pos logit contributions
 fault+0.148
 loser+0.143
 robber+0.142
 seeker+0.138
 liar+0.132
Neg logit contributions
s-0.071
 deep-0.064
ls-0.064
 big-0.055
 high-0.053
Token I
Token ID314
Feature activation+6.26e+00

Pos logit contributions
 fault+0.214
 seeker+0.182
 loser+0.181
 supposed+0.174
 liar+0.174
Neg logit contributions
lights-0.187
ls-0.183
ized-0.176
 deep-0.165
 pans-0.164
Token bring
Token ID2222
Feature activation+1.24e+00

Pos logit contributions
 fault+0.147
 liar+0.126
 loser+0.125
 robber+0.120
epad+0.118
Neg logit contributions
ized-0.117
 deep-0.116
cking-0.114
ls-0.112
lights-0.112
Token?"
Token ID1701
Feature activation-7.71e-01

Pos logit contributions
 fault+0.028
 loser+0.025
 liar+0.024
 robber+0.024
 seeker+0.023
Neg logit contributions
ized-0.025
lights-0.025
ls-0.025
cking-0.024
led-0.023
Token
Token ID198
Feature activation-4.58e+00

Pos logit contributions
ized+0.013
cking+0.013
lights+0.013
ls+0.013
led+0.012
Neg logit contributions
 fault-0.020
 robber-0.018
 loser-0.017
 supposed-0.017
 liar-0.017
Token
Token ID198
Feature activation-4.04e+00

Pos logit contributions
ized+0.091
lights+0.091
cking+0.090
led+0.088
ls+0.085
Neg logit contributions
 fault-0.104
 loser-0.093
 robber-0.093
 supposed-0.093
 liar-0.091
Token asked
Token ID1965
Feature activation+1.94e+00

Pos logit contributions
 fault+0.005
 loser+0.004
 robber+0.004
 liar+0.004
 supposed+0.004
Neg logit contributions
ized-0.003
lights-0.003
ls-0.003
cking-0.003
led-0.003
Token her
Token ID607
Feature activation+1.95e+00

Pos logit contributions
 fault+0.055
 loser+0.050
 liar+0.048
 robber+0.048
 seeker+0.048
Neg logit contributions
lights-0.029
ized-0.028
ls-0.028
cking-0.027
led-0.027
Token mom
Token ID1995
Feature activation-5.89e-02

Pos logit contributions
 fault+0.055
 loser+0.051
 supposed+0.050
 robber+0.049
 liar+0.049
Neg logit contributions
lights-0.027
ls-0.027
ized-0.025
cking-0.024
bel-0.023
Token "
Token ID366
Feature activation+1.31e+00

Pos logit contributions
lights+0.001
ling+0.001
ls+0.001
ized+0.001
s+0.001
Neg logit contributions
 fault-0.001
 loser-0.001
 liar-0.001
 seeker-0.001
 robber-0.001
TokenWhat
Token ID2061
Feature activation+4.96e+00

Pos logit contributions
 fault+0.030
 robber+0.025
 loser+0.025
 liar+0.025
 supposed+0.025
Neg logit contributions
ized-0.027
lights-0.027
cking-0.026
ls-0.025
led-0.025
Token does
Token ID857
Feature activation+9.16e+00

Pos logit contributions
 fault+0.131
 loser+0.124
 robber+0.122
 seeker+0.120
epad+0.111
Neg logit contributions
ls-0.076
s-0.075
 deep-0.069
 high-0.061
 big-0.060
Token it
Token ID340
Feature activation+5.04e+00

Pos logit contributions
 fault+0.239
 robber+0.212
 loser+0.212
 supposed+0.210
 seeker+0.210
Neg logit contributions
lights-0.155
ized-0.153
cking-0.150
ls-0.147
led-0.139
Token mean
Token ID1612
Feature activation+2.97e+00

Pos logit contributions
 fault+0.094
 supposed+0.081
 robber+0.080
 loser+0.078
 seeker+0.077
Neg logit contributions
lights-0.125
ized-0.122
cking-0.122
ls-0.118
led-0.115
Token to
Token ID284
Feature activation+3.47e+00

Pos logit contributions
 fault+0.072
 loser+0.064
 liar+0.064
 seeker+0.064
 supposed+0.063
Neg logit contributions
ized-0.051
ls-0.050
lights-0.049
ling-0.048
 deep-0.047
Token be
Token ID307
Feature activation+3.09e+00

Pos logit contributions
 fault+0.066
 loser+0.055
 seeker+0.055
 robber+0.054
 supposed+0.054
Neg logit contributions
ized-0.082
cking-0.081
lights-0.081
ls-0.080
led-0.075
Token independent
Token ID4795
Feature activation+1.85e+00

Pos logit contributions
 fault+0.066
 loser+0.057
 seeker+0.056
 robber+0.056
 liar+0.055
Neg logit contributions
ized-0.066
cking-0.065
lights-0.064
ls-0.063
led-0.061
Token.
Token ID13
Feature activation+1.59e+00

Pos logit contributions
cking+0.027
lights+0.026
ized+0.025
ls+0.025
cks+0.024
Neg logit contributions
 fault-0.027
 robber-0.024
 supposed-0.023
 liar-0.023
epad-0.023
Token
Token ID198
Feature activation-4.76e+00

Pos logit contributions
 fault+0.032
 robber+0.028
 loser+0.028
 supposed+0.027
 seeker+0.027
Neg logit contributions
lights-0.036
ized-0.036
cking-0.035
ls-0.035
led-0.033
Token
Token ID198
Feature activation-4.07e+00

Pos logit contributions
lights+0.112
cking+0.106
ized+0.104
led+0.102
ls+0.101
Neg logit contributions
 fault-0.084
 robber-0.084
 loser-0.082
 supposed-0.080
 liar-0.080
Token"
Token ID1
Feature activation+7.71e-01

Pos logit contributions
lights+0.096
cking+0.087
ls+0.087
ized+0.086
led+0.084
Neg logit contributions
 robber-0.072
 loser-0.071
 fault-0.069
 liar-0.069
 supposed-0.068
TokenWhat
Token ID2061
Feature activation+4.49e+00

Pos logit contributions
 fault+0.017
 supposed+0.014
 seeker+0.014
 loser+0.014
 robber+0.014
Neg logit contributions
ized-0.017
lights-0.017
cking-0.016
ls-0.016
led-0.015
Token did
Token ID750
Feature activation+9.12e+00

Pos logit contributions
 fault+0.118
 robber+0.111
 loser+0.107
 seeker+0.106
uries+0.099
Neg logit contributions
 deep-0.064
s-0.060
ls-0.057
 ch-0.056
 big-0.055
Token you
Token ID345
Feature activation+5.77e+00

Pos logit contributions
 fault+0.214
 loser+0.185
 seeker+0.185
 supposed+0.183
 robber+0.180
Neg logit contributions
lights-0.180
ized-0.180
ls-0.177
cking-0.174
led-0.165
Token do
Token ID466
Feature activation+6.62e+00

Pos logit contributions
 fault+0.132
 loser+0.109
 liar+0.109
 robber+0.104
 reader+0.103
Neg logit contributions
ized-0.118
cking-0.118
 deep-0.111
ling-0.109
ised-0.105
Token,
Token ID11
Feature activation+4.17e+00

Pos logit contributions
 fault+0.186
 loser+0.163
 liar+0.160
epad+0.156
 supposed+0.156
Neg logit contributions
ized-0.106
ls-0.099
 deep-0.094
bel-0.094
ised-0.093
Token sweet
Token ID6029
Feature activation+1.00e+00

Pos logit contributions
 fault+0.095
 supposed+0.083
epad+0.082
 loser+0.081
 seeker+0.081
Neg logit contributions
lights-0.084
ized-0.084
ls-0.083
cking-0.080
ised-0.076
Tokenheart
Token ID11499
Feature activation+3.91e+00

Pos logit contributions
 fault+0.030
 loser+0.027
 robber+0.026
 seeker+0.026
 supposed+0.026
Neg logit contributions
lights-0.014
ized-0.014
ls-0.014
cking-0.013
ling-0.013
Token.
Token ID13
Feature activation+1.30e+00

Pos logit contributions
 loser+0.048
 fault+0.048
 liar+0.047
 robber+0.046
 seeker+0.046
Neg logit contributions
lights-0.021
ized-0.021
ls-0.019
led-0.018
ling-0.018
Token
Token ID198
Feature activation-4.30e+00

Pos logit contributions
 fault+0.033
 liar+0.030
 loser+0.030
 robber+0.030
 supposed+0.029
Neg logit contributions
lights-0.023
ized-0.022
ls-0.022
cking-0.021
ised-0.020
Token
Token ID198
Feature activation-3.15e+00

Pos logit contributions
lights+0.070
ized+0.069
cking+0.068
ls+0.065
led+0.063
Neg logit contributions
 fault-0.113
 supposed-0.102
 loser-0.101
 robber-0.101
 seeker-0.101
Token"
Token ID1
Feature activation+1.42e+00

Pos logit contributions
lights+0.048
ized+0.047
cking+0.046
ls+0.045
led+0.043
Neg logit contributions
 fault-0.088
 robber-0.078
 loser-0.078
 supposed-0.078
 seeker-0.077
TokenWhat
Token ID2061
Feature activation+4.82e+00

Pos logit contributions
 fault+0.037
 liar+0.031
 robber+0.030
 supposed+0.030
 loser+0.030
Neg logit contributions
ized-0.025
lights-0.025
cking-0.024
ised-0.023
led-0.023
Token should
Token ID815
Feature activation+9.11e+00

Pos logit contributions
 robber+0.130
 fault+0.129
 loser+0.122
 seeker+0.119
 liar+0.115
Neg logit contributions
 deep-0.066
ls-0.063
s-0.063
 high-0.056
 grass-0.055
Token we
Token ID356
Feature activation+5.33e+00

Pos logit contributions
 fault+0.226
 supposed+0.197
 seeker+0.197
 robber+0.195
epad+0.195
Neg logit contributions
lights-0.169
ls-0.165
ized-0.164
cking-0.151
 deep-0.148
Token play
Token ID711
Feature activation+3.08e+00

Pos logit contributions
 fault+0.138
 liar+0.120
 loser+0.113
 robber+0.107
 supposed+0.104
Neg logit contributions
 deep-0.095
ized-0.089
 bub-0.088
 low-0.086
 row-0.084
Token?"
Token ID1701
Feature activation-1.17e-01

Pos logit contributions
 fault+0.082
 supposed+0.072
 liar+0.071
 loser+0.070
epad+0.070
Neg logit contributions
ized-0.052
lights-0.051
ls-0.049
cking-0.049
ised-0.047
Token asked
Token ID1965
Feature activation+7.27e-01

Pos logit contributions
lights+0.002
ized+0.002
cking+0.002
ls+0.002
led+0.002
Neg logit contributions
 fault-0.003
 robber-0.002
 loser-0.002
 supposed-0.002
 seeker-0.002
Token Sam
Token ID3409
Feature activation+1.06e+00

Pos logit contributions
 fault+0.014
 loser+0.012
 supposed+0.012
 robber+0.012
 liar+0.012
Neg logit contributions
lights-0.017
ized-0.017
ls-0.017
cking-0.016
led-0.016
Token
Token ID198
Feature activation-2.90e+00

Pos logit contributions
ized+0.089
cking+0.088
lights+0.087
led+0.083
ls+0.082
Neg logit contributions
 fault-0.097
 supposed-0.089
 seeker-0.084
 robber-0.084
 loser-0.084
Token"
Token ID1
Feature activation+1.47e+00

Pos logit contributions
lights+0.053
ized+0.053
cking+0.052
ls+0.052
led+0.049
Neg logit contributions
 fault-0.068
 supposed-0.064
 robber-0.064
 loser-0.063
 seeker-0.062
TokenMom
Token ID29252
Feature activation+2.35e+00

Pos logit contributions
 fault+0.048
 liar+0.042
 supposed+0.041
 robber+0.040
 loser+0.040
Neg logit contributions
lights-0.017
ized-0.017
cking-0.016
led-0.014
ling-0.014
Token,
Token ID11
Feature activation+3.17e+00

Pos logit contributions
 fault+0.057
 liar+0.054
 robber+0.053
 loser+0.052
 supposed+0.050
Neg logit contributions
ized-0.040
lights-0.039
cking-0.039
ls-0.038
ling-0.037
Token why
Token ID1521
Feature activation+4.03e+00

Pos logit contributions
 fault+0.081
epad+0.072
 supposed+0.071
 liar+0.071
 loser+0.070
Neg logit contributions
 deep-0.055
ls-0.050
lights-0.050
 pans-0.049
 bub-0.048
Token do
Token ID466
Feature activation+9.11e+00

Pos logit contributions
 fault+0.078
 loser+0.067
 robber+0.066
 supposed+0.065
 liar+0.065
Neg logit contributions
lights-0.094
ized-0.093
ls-0.093
cking-0.091
led-0.088
Token the
Token ID262
Feature activation+1.93e+00

Pos logit contributions
 fault+0.174
 robber+0.154
 loser+0.151
 supposed+0.149
 liar+0.149
Neg logit contributions
lights-0.212
ized-0.209
cking-0.208
ls-0.200
led-0.198
Token flowers
Token ID12734
Feature activation+9.00e-02

Pos logit contributions
 fault+0.054
 seeker+0.049
 supposed+0.048
 loser+0.048
 robber+0.048
Neg logit contributions
lights-0.029
ized-0.029
cking-0.029
ls-0.027
led-0.026
Token spring
Token ID6076
Feature activation+3.94e-01

Pos logit contributions
 fault+0.002
 supposed+0.002
 seeker+0.001
 loser+0.001
 robber+0.001
Neg logit contributions
lights-0.002
cking-0.002
ized-0.002
ls-0.002
led-0.002
Token?"
Token ID1701
Feature activation-6.36e-01

Pos logit contributions
 fault+0.007
 supposed+0.006
 seeker+0.006
 robber+0.006
 loser+0.006
Neg logit contributions
cking-0.009
lights-0.009
ized-0.009
cks-0.008
bel-0.008
Token Tim
Token ID5045
Feature activation+3.64e-01

Pos logit contributions
lights+0.018
cking+0.018
ized+0.018
ls+0.018
led+0.017
Neg logit contributions
 fault-0.008
 robber-0.007
 supposed-0.007
 liar-0.006
 seeker-0.006
Token asked
Token ID1965
Feature activation+2.36e+00

Pos logit contributions
 fault+0.020
 loser+0.018
 robber+0.017
 seeker+0.017
 liar+0.017
Neg logit contributions
ized-0.012
ls-0.011
lights-0.011
cking-0.011
led-0.011
Token Mike
Token ID4995
Feature activation+2.12e+00

Pos logit contributions
 fault+0.060
 supposed+0.054
 loser+0.054
 robber+0.053
 seeker+0.053
Neg logit contributions
lights-0.041
ls-0.039
ized-0.039
cking-0.038
led-0.037
Token:
Token ID25
Feature activation+1.94e+00

Pos logit contributions
 fault+0.052
 seeker+0.049
 loser+0.048
 supposed+0.048
 robber+0.047
Neg logit contributions
ls-0.037
lights-0.036
ling-0.036
ized-0.035
cking-0.035
Token "
Token ID366
Feature activation+1.14e+00

Pos logit contributions
 fault+0.051
 supposed+0.045
 robber+0.044
 seeker+0.044
 loser+0.044
Neg logit contributions
ls-0.031
lights-0.030
ized-0.030
 deep-0.030
cking-0.029
TokenWhat
Token ID2061
Feature activation+5.30e+00

Pos logit contributions
 fault+0.028
 supposed+0.024
 liar+0.023
 robber+0.023
 loser+0.022
Neg logit contributions
ized-0.022
cking-0.022
lights-0.021
bel-0.020
ling-0.020
Token would
Token ID561
Feature activation+9.08e+00

Pos logit contributions
 fault+0.140
 seeker+0.131
 loser+0.130
 robber+0.130
epad+0.125
Neg logit contributions
s-0.075
 deep-0.071
ls-0.068
 big-0.060
 high-0.060
Token you
Token ID345
Feature activation+5.70e+00

Pos logit contributions
 fault+0.241
 seeker+0.219
 loser+0.209
 liar+0.207
epad+0.205
Neg logit contributions
ized-0.148
cking-0.137
ls-0.135
 deep-0.135
lights-0.134
Token like
Token ID588
Feature activation+3.42e+00

Pos logit contributions
 fault+0.133
 liar+0.111
 loser+0.110
 supposed+0.104
 robber+0.103
Neg logit contributions
ized-0.109
 deep-0.106
cking-0.103
ling-0.100
 bub-0.098
Token best
Token ID1266
Feature activation+4.33e+00

Pos logit contributions
 fault+0.082
 loser+0.075
 liar+0.072
 seeker+0.072
 supposed+0.070
Neg logit contributions
ized-0.060
lights-0.058
ls-0.057
 deep-0.057
cking-0.055
Token in
Token ID287
Feature activation+5.55e-01

Pos logit contributions
 fault+0.121
 supposed+0.111
epad+0.107
 liar+0.102
 loser+0.102
Neg logit contributions
lights-0.058
ls-0.056
 deep-0.054
 slowly-0.049
ling-0.048
Token the
Token ID262
Feature activation-1.53e+00

Pos logit contributions
 fault+0.017
 robber+0.015
 loser+0.015
 supposed+0.015
 liar+0.015
Neg logit contributions
lights-0.007
ized-0.007
cking-0.007
ls-0.007
led-0.006
Token
Token ID95
Feature activation+2.37e+00

Pos logit contributions
 fault+0.091
 robber+0.082
 seeker+0.081
 loser+0.081
 supposed+0.080
Neg logit contributions
lights-0.080
ized-0.076
cking-0.076
ls-0.076
led-0.073
Token
Token ID26391
Feature activation+4.60e+00

Pos logit contributions
 fault+0.051
 robber+0.044
 loser+0.044
 supposed+0.044
 seeker+0.044
Neg logit contributions
lights-0.051
ized-0.050
cking-0.049
ls-0.049
led-0.047
Token
Token ID129
Feature activation+8.30e-01

Pos logit contributions
 fault+0.114
 seeker+0.100
 supposed+0.100
 loser+0.100
 robber+0.099
Neg logit contributions
lights-0.083
ized-0.081
cking-0.081
ls-0.077
led-0.076
Token
Token ID241
Feature activation+1.56e+00

Pos logit contributions
 fault+0.018
 supposed+0.017
 robber+0.016
 seeker+0.015
 loser+0.015
Neg logit contributions
lights-0.019
ls-0.017
ized-0.017
bel-0.016
led-0.016
TokenWhat
Token ID2061
Feature activation+3.32e+00

Pos logit contributions
 fault+0.042
 robber+0.038
 seeker+0.037
epad+0.036
 loser+0.036
Neg logit contributions
ized-0.024
lights-0.023
bel-0.023
The-0.023
 deep-0.022
Token should
Token ID815
Feature activation+9.07e+00

Pos logit contributions
 fault+0.088
 robber+0.082
 loser+0.081
 seeker+0.077
epad+0.076
Neg logit contributions
ls-0.051
 deep-0.050
s-0.049
led-0.043
ized-0.042
Token we
Token ID356
Feature activation+4.95e+00

Pos logit contributions
 fault+0.264
 seeker+0.236
 loser+0.236
 supposed+0.236
epad+0.234
Neg logit contributions
ls-0.128
lights-0.126
ized-0.120
cking-0.112
 deep-0.106
Token do
Token ID466
Feature activation+5.04e+00

Pos logit contributions
 fault+0.107
 liar+0.092
 loser+0.090
 robber+0.087
epad+0.086
Neg logit contributions
ized-0.101
 deep-0.101
cking-0.099
ls-0.098
ling-0.096
Token now
Token ID783
Feature activation+4.13e+00

Pos logit contributions
 fault+0.128
 liar+0.119
 robber+0.117
epad+0.115
 loser+0.114
Neg logit contributions
ized-0.084
ls-0.082
lights-0.079
 deep-0.076
ised-0.075
Token?
Token ID30
Feature activation+2.31e+00

Pos logit contributions
 fault+0.086
 robber+0.074
 loser+0.074
 supposed+0.073
 liar+0.073
Neg logit contributions
lights-0.091
ized-0.090
cking-0.089
ls-0.088
led-0.084
Tokenâ
Token ID22940
Feature activation+1.01e+00

Pos logit contributions
 fault+0.055
 supposed+0.048
 seeker+0.046
 liar+0.045
 loser+0.045
Neg logit contributions
lights-0.046
ized-0.045
ls-0.045
ised-0.043
cking-0.042
TokenRon
Token ID23672
Feature activation+1.00e+00

Pos logit contributions
lights+0.115
ized+0.100
led+0.100
ls+0.097
cking+0.096
Neg logit contributions
 fault-0.113
 loser-0.106
 robber-0.105
 liar-0.103
 seeker-0.102
Tokenny
Token ID3281
Feature activation-2.00e+00

Pos logit contributions
 fault+0.021
 robber+0.018
 supposed+0.018
 liar+0.018
 seeker+0.018
Neg logit contributions
ized-0.021
lights-0.021
led-0.021
cking-0.021
ls-0.021
Token nodded
Token ID14464
Feature activation-2.96e+00

Pos logit contributions
lights+0.037
cking+0.036
ized+0.034
ls+0.034
cks+0.032
Neg logit contributions
 fault-0.049
 robber-0.044
 loser-0.043
 liar-0.043
 supposed-0.042
Token.
Token ID13
Feature activation+1.35e+00

Pos logit contributions
cking+0.059
lights+0.059
ls+0.058
ized+0.058
cks+0.055
Neg logit contributions
 fault-0.062
 loser-0.056
epad-0.055
 robber-0.055
 liar-0.054
Token Tears
Token ID40328
Feature activation-2.72e+00

Pos logit contributions
 fault+0.026
 robber+0.022
 loser+0.022
 seeker+0.022
 supposed+0.022
Neg logit contributions
ized-0.032
lights-0.032
cking-0.032
ls-0.031
led-0.030
Token filled
Token ID5901
Feature activation-7.99e+00

Pos logit contributions
cking+0.054
cks+0.054
lights+0.051
ls+0.047
 pans+0.046
Neg logit contributions
 loser-0.052
 fault-0.052
 indeed-0.051
 robber-0.050
 liar-0.049
Token his
Token ID465
Feature activation-1.10e+00

Pos logit contributions
lights+0.140
cking+0.135
ized+0.134
ls+0.133
led+0.122
Neg logit contributions
 fault-0.197
 seeker-0.179
 robber-0.178
 loser-0.177
 supposed-0.173
Token eyes
Token ID2951
Feature activation-4.24e+00

Pos logit contributions
ized+0.026
lights+0.026
cking+0.026
ls+0.024
bel+0.024
Neg logit contributions
 fault-0.022
 loser-0.019
 liar-0.017
 robber-0.017
 seeker-0.016
Token.
Token ID13
Feature activation+5.71e-01

Pos logit contributions
cks+0.095
lights+0.094
cking+0.093
ls+0.090
bel+0.088
Neg logit contributions
 fault-0.079
 indeed-0.071
epad-0.069
 loser-0.067
 ok-0.067
Token
Token ID198
Feature activation-6.31e+00

Pos logit contributions
 fault+0.012
 loser+0.010
 robber+0.010
 supposed+0.010
 seeker+0.010
Neg logit contributions
ized-0.013
lights-0.013
cking-0.013
ls-0.012
led-0.012
Token
Token ID198
Feature activation-5.48e+00

Pos logit contributions
lights+0.152
ized+0.151
cking+0.143
led+0.142
ls+0.132
Neg logit contributions
 fault-0.120
 Where-0.110
 supposed-0.108
 loser-0.106
 liar-0.102
Token solve
Token ID8494
Feature activation-3.35e+00

Pos logit contributions
 fault+0.023
 robber+0.021
 loser+0.021
 seeker+0.021
 liar+0.021
Neg logit contributions
lights-0.015
ized-0.014
cking-0.014
ls-0.014
led-0.013
Token his
Token ID465
Feature activation-1.22e+00

Pos logit contributions
lights+0.048
ized+0.047
cking+0.046
ls+0.045
led+0.043
Neg logit contributions
 fault-0.097
 supposed-0.086
 robber-0.086
 loser-0.086
 seeker-0.085
Token problem
Token ID1917
Feature activation-9.60e-01

Pos logit contributions
lights+0.030
ized+0.029
cking+0.029
ls+0.029
led+0.028
Neg logit contributions
 fault-0.023
 robber-0.019
 loser-0.019
 supposed-0.019
 seeker-0.019
Token.
Token ID13
Feature activation-6.01e-01

Pos logit contributions
lights+0.015
ized+0.015
cking+0.014
ls+0.014
led+0.013
Neg logit contributions
 fault-0.026
 robber-0.024
 seeker-0.023
 loser-0.023
 liar-0.023
Token
Token ID220
Feature activation-1.15e+00

Pos logit contributions
cking+0.013
lights+0.013
ized+0.013
ls+0.012
led+0.012
Neg logit contributions
 fault-0.013
 robber-0.012
 loser-0.012
 supposed-0.011
 liar-0.011
Token
Token ID198
Feature activation-7.85e+00

Pos logit contributions
lights+0.021
ized+0.021
cking+0.020
ls+0.019
led+0.019
Neg logit contributions
 fault-0.029
 supposed-0.026
 loser-0.025
 robber-0.025
 liar-0.025
Token
Token ID198
Feature activation-6.36e+00

Pos logit contributions
lights+0.211
ized+0.200
cking+0.192
led+0.188
ls+0.186
Neg logit contributions
 fault-0.123
 Do-0.117
 supposed-0.116
 Enjoy-0.112
 Where-0.111
TokenFrom
Token ID4863
Feature activation-3.21e-01

Pos logit contributions
lights+0.106
ls+0.098
led+0.093
cking+0.087
ized+0.087
Neg logit contributions
 loser-0.164
 robber-0.162
 liar-0.156
 fault-0.155
oyal-0.152
Token then
Token ID788
Feature activation+5.44e-03

Pos logit contributions
lights+0.007
cking+0.007
bel+0.007
ls+0.006
ized+0.006
Neg logit contributions
 fault-0.006
 seeker-0.006
 robber-0.006
 indeed-0.005
 supposed-0.005
Token on
Token ID319
Feature activation-3.05e+00

Pos logit contributions
 fault+0.000
 loser+0.000
 seeker+0.000
 robber+0.000
 supposed+0.000
Neg logit contributions
lights-0.000
ized-0.000
cking-0.000
ls-0.000
led-0.000
Token,
Token ID11
Feature activation-4.77e-01

Pos logit contributions
lights+0.063
ized+0.063
cking+0.063
ls+0.060
led+0.058
Neg logit contributions
 fault-0.068
 loser-0.060
 robber-0.060
 seeker-0.059
 liar-0.058
Token being
Token ID852
Feature activation+2.84e-01

Pos logit contributions
lights+0.067
ls+0.065
ized+0.064
led+0.061
cking+0.061
Neg logit contributions
 fault-0.048
 supposed-0.044
 robber-0.043
 liar-0.042
 loser-0.041
Token so
Token ID523
Feature activation-1.68e+00

Pos logit contributions
 fault+0.007
 robber+0.006
 loser+0.006
 seeker+0.006
 liar+0.006
Neg logit contributions
ized-0.006
cking-0.005
lights-0.005
ls-0.005
led-0.005
Token helpful
Token ID7613
Feature activation-2.68e-01

Pos logit contributions
lights+0.036
ized+0.033
cking+0.033
ls+0.032
led+0.031
Neg logit contributions
 fault-0.037
 supposed-0.033
 liar-0.033
 loser-0.033
 robber-0.033
Token.
Token ID13
Feature activation-6.29e-01

Pos logit contributions
led+0.003
ls+0.003
ized+0.003
lights+0.003
ling+0.003
Neg logit contributions
 fault-0.008
 loser-0.008
 liar-0.008
 seeker-0.008
 robber-0.008
Token
Token ID220
Feature activation-1.21e+00

Pos logit contributions
cking+0.012
ized+0.012
lights+0.012
ls+0.011
led+0.011
Neg logit contributions
 fault-0.015
 loser-0.013
 supposed-0.012
 seeker-0.012
 robber-0.012
Token
Token ID198
Feature activation-7.79e+00

Pos logit contributions
lights+0.022
ized+0.022
cking+0.021
ls+0.020
led+0.020
Neg logit contributions
 fault-0.031
 supposed-0.027
 loser-0.026
 robber-0.026
 seeker-0.026
Token
Token ID198
Feature activation-6.38e+00

Pos logit contributions
lights+0.202
ized+0.195
cking+0.191
led+0.182
ls+0.174
Neg logit contributions
 fault-0.123
 Do-0.116
 supposed-0.113
 Where-0.109
 Yes-0.109
TokenThe
Token ID464
Feature activation-1.99e-01

Pos logit contributions
lights+0.110
ls+0.098
ized+0.095
led+0.095
cking+0.094
Neg logit contributions
 loser-0.155
 robber-0.147
 fault-0.144
 liar-0.144
 seeker-0.142
Token guard
Token ID4860
Feature activation-2.16e+00

Pos logit contributions
lights+0.004
ized+0.004
cking+0.004
ls+0.004
led+0.004
Neg logit contributions
 fault-0.004
 supposed-0.004
 robber-0.004
 loser-0.004
 seeker-0.004
Token smiled
Token ID13541
Feature activation-1.01e+00

Pos logit contributions
lights+0.037
cking+0.036
ized+0.035
ls+0.034
cks+0.032
Neg logit contributions
 fault-0.054
 robber-0.050
 loser-0.049
 liar-0.049
 seeker-0.048
Token and
Token ID290
Feature activation-3.70e+00

Pos logit contributions
lights+0.019
cking+0.019
ized+0.018
ls+0.018
cks+0.017
Neg logit contributions
 fault-0.025
 loser-0.022
 robber-0.022
 seeker-0.022
 supposed-0.021
Token his
Token ID465
Feature activation+1.13e+00

Pos logit contributions
lights+0.019
cking+0.019
ized+0.019
ls+0.017
bel+0.017
Neg logit contributions
 fault-0.021
 robber-0.018
 liar-0.018
 seeker-0.018
 supposed-0.018
Token mom
Token ID1995
Feature activation-1.01e+00

Pos logit contributions
 fault+0.024
 seeker+0.019
 loser+0.019
 supposed+0.019
 robber+0.019
Neg logit contributions
lights-0.027
ized-0.026
cking-0.026
led-0.025
ls-0.025
Tokenmy
Token ID1820
Feature activation+2.15e-01

Pos logit contributions
lights+0.024
ized+0.024
cking+0.023
ls+0.023
led+0.022
Neg logit contributions
 fault-0.020
 loser-0.017
 robber-0.017
 liar-0.017
 seeker-0.017
Token.
Token ID13
Feature activation-5.40e-01

Pos logit contributions
 fault+0.006
 loser+0.005
 robber+0.005
 liar+0.005
 seeker+0.005
Neg logit contributions
ized-0.003
lights-0.003
cking-0.003
ls-0.003
led-0.003
Token
Token ID220
Feature activation-1.05e+00

Pos logit contributions
cking+0.012
ized+0.012
lights+0.011
ls+0.011
led+0.011
Neg logit contributions
 fault-0.012
 robber-0.010
 supposed-0.010
 loser-0.010
 Where-0.010
Token
Token ID198
Feature activation-7.71e+00

Pos logit contributions
lights+0.020
ized+0.019
cking+0.019
ls+0.018
led+0.018
Neg logit contributions
 fault-0.026
 supposed-0.023
 loser-0.022
 robber-0.022
 liar-0.022
Token
Token ID198
Feature activation-6.29e+00

Pos logit contributions
lights+0.204
ized+0.197
cking+0.184
led+0.184
ls+0.181
Neg logit contributions
 supposed-0.115
 fault-0.115
 Do-0.112
 Where-0.107
 Enjoy-0.107
TokenThe
Token ID464
Feature activation-7.03e-01

Pos logit contributions
lights+0.095
ls+0.089
led+0.083
ized+0.082
cking+0.078
Neg logit contributions
 robber-0.166
 loser-0.165
 fault-0.161
 liar-0.160
 supposed-0.158
Token two
Token ID734
Feature activation-1.03e+00

Pos logit contributions
lights+0.015
ized+0.015
cking+0.014
ls+0.014
bel+0.014
Neg logit contributions
 fault-0.015
 loser-0.014
 robber-0.014
 liar-0.013
 seeker-0.013
Token black
Token ID2042
Feature activation+1.25e+00

Pos logit contributions
lights+0.024
ized+0.022
cking+0.022
bel+0.022
cks+0.022
Neg logit contributions
 fault-0.018
 loser-0.017
 robber-0.016
 liar-0.016
 supposed-0.015
Token ducks
Token ID39694
Feature activation-3.51e+00

Pos logit contributions
 fault+0.034
 robber+0.031
 loser+0.031
 seeker+0.030
 supposed+0.030
Neg logit contributions
lights-0.020
ized-0.020
cking-0.019
ls-0.019
led-0.018
Token the
Token ID262
Feature activation-9.36e-01

Pos logit contributions
lights+0.014
cking+0.012
ls+0.012
bel+0.012
led+0.012
Neg logit contributions
 robber-0.007
 fault-0.006
 liar-0.006
 loser-0.006
 Stop-0.006
Token bear
Token ID6842
Feature activation-1.36e+00

Pos logit contributions
lights+0.015
ized+0.015
ls+0.014
cking+0.014
led+0.013
Neg logit contributions
 fault-0.025
 supposed-0.023
 loser-0.022
 seeker-0.022
 robber-0.022
Token again
Token ID757
Feature activation-2.51e+00

Pos logit contributions
lights+0.026
ized+0.026
cking+0.025
ls+0.025
led+0.024
Neg logit contributions
 fault-0.032
 loser-0.028
 robber-0.028
 seeker-0.028
 supposed-0.028
Token.
Token ID13
Feature activation-1.39e-01

Pos logit contributions
lights+0.045
ized+0.044
cking+0.043
ls+0.042
led+0.040
Neg logit contributions
 fault-0.063
 supposed-0.056
 loser-0.056
 robber-0.056
 seeker-0.055
Token
Token ID220
Feature activation-1.15e+00

Pos logit contributions
cking+0.003
ized+0.003
lights+0.003
ls+0.003
led+0.003
Neg logit contributions
 fault-0.003
 supposed-0.003
 robber-0.003
 seeker-0.003
 loser-0.003
Token
Token ID198
Feature activation-7.71e+00

Pos logit contributions
lights+0.021
ized+0.021
cking+0.020
ls+0.019
led+0.019
Neg logit contributions
 fault-0.029
 supposed-0.026
 loser-0.025
 robber-0.025
 liar-0.025
Token
Token ID198
Feature activation-6.39e+00

Pos logit contributions
lights+0.209
ized+0.198
cking+0.191
led+0.187
ls+0.185
Neg logit contributions
 fault-0.116
 Do-0.112
 supposed-0.109
 Where-0.106
 Enjoy-0.105
TokenThe
Token ID464
Feature activation-1.23e+00

Pos logit contributions
lights+0.104
ls+0.093
led+0.090
cking+0.085
ized+0.085
Neg logit contributions
 robber-0.163
 loser-0.162
 fault-0.161
 liar-0.158
 supposed-0.156
Token boy
Token ID2933
Feature activation-2.16e+00

Pos logit contributions
lights+0.025
cking+0.025
ized+0.025
ls+0.024
led+0.023
Neg logit contributions
 fault-0.028
 robber-0.025
 loser-0.025
 seeker-0.025
 liar-0.024
Token also
Token ID635
Feature activation+2.87e-02

Pos logit contributions
lights+0.043
cking+0.042
ized+0.041
ls+0.040
ised+0.038
Neg logit contributions
 fault-0.050
 loser-0.043
 robber-0.043
 liar-0.043
 seeker-0.043
Token remembers
Token ID18140
Feature activation+1.25e+00

Pos logit contributions
 fault+0.001
 robber+0.001
 supposed+0.001
 seeker+0.001
 liar+0.001
Neg logit contributions
lights-0.001
cking-0.001
ized-0.001
ls-0.001
ised-0.001
Token for
Token ID329
Feature activation-2.43e+00

Pos logit contributions
lights+0.053
ized+0.052
cking+0.052
ls+0.050
led+0.048
Neg logit contributions
 fault-0.052
 robber-0.045
 supposed-0.045
 loser-0.045
 seeker-0.044
Token their
Token ID511
Feature activation+1.55e-02

Pos logit contributions
lights+0.058
ls+0.056
ized+0.055
cking+0.053
led+0.052
Neg logit contributions
 fault-0.048
 supposed-0.043
 robber-0.042
 liar-0.042
 loser-0.041
Token kindness
Token ID23887
Feature activation-1.06e+00

Pos logit contributions
 fault+0.000
 robber+0.000
 loser+0.000
 seeker+0.000
 supposed+0.000
Neg logit contributions
ized-0.000
lights-0.000
cking-0.000
ls-0.000
led-0.000
Token.
Token ID13
Feature activation-7.35e-01

Pos logit contributions
lights+0.015
ized+0.015
ls+0.014
led+0.014
cking+0.014
Neg logit contributions
 fault-0.030
 seeker-0.027
 robber-0.027
 loser-0.027
 liar-0.027
Token
Token ID220
Feature activation-9.69e-01

Pos logit contributions
lights+0.015
ized+0.015
cking+0.015
led+0.014
ls+0.014
Neg logit contributions
 fault-0.016
 loser-0.014
 supposed-0.014
 Do-0.014
 Where-0.013
Token
Token ID198
Feature activation-7.70e+00

Pos logit contributions
lights+0.018
ized+0.018
cking+0.017
ls+0.017
led+0.016
Neg logit contributions
 fault-0.024
 supposed-0.021
 loser-0.021
 robber-0.021
 liar-0.020
Token
Token ID198
Feature activation-6.54e+00

Pos logit contributions
lights+0.202
ized+0.190
cking+0.182
led+0.179
ls+0.174
Neg logit contributions
 fault-0.118
 Do-0.116
 Where-0.110
 supposed-0.109
 Enjoy-0.107
TokenThe
Token ID464
Feature activation-1.20e+00

Pos logit contributions
lights+0.118
led+0.101
ized+0.100
ls+0.100
cking+0.096
Neg logit contributions
 loser-0.147
 Do-0.141
 indeed-0.140
 fault-0.138
 liar-0.137
Token tired
Token ID10032
Feature activation+1.07e+00

Pos logit contributions
lights+0.027
ized+0.027
cking+0.025
ls+0.025
led+0.024
Neg logit contributions
 fault-0.024
 loser-0.023
 robber-0.023
 liar-0.022
 seeker-0.022
Token bee
Token ID20697
Feature activation-2.95e+00

Pos logit contributions
 fault+0.025
 loser+0.024
 robber+0.023
 liar+0.023
 seeker+0.023
Neg logit contributions
cking-0.020
ized-0.020
lights-0.019
cks-0.018
ised-0.017
Token flew
Token ID13112
Feature activation-2.79e+00

Pos logit contributions
lights+0.061
cking+0.058
ized+0.055
ls+0.054
cks+0.054
Neg logit contributions
 fault-0.062
 indeed-0.058
 loser-0.056
 robber-0.056
 liar-0.055
Token excited
Token ID6568
Feature activation-1.24e-01

Pos logit contributions
lights+0.018
ls+0.017
ized+0.016
cking+0.016
bing+0.015
Neg logit contributions
 fault-0.020
 supposed-0.019
 liar-0.018
 loser-0.018
 seeker-0.017
Token and
Token ID290
Feature activation-3.18e+00

Pos logit contributions
ized+0.002
lights+0.002
ls+0.002
cking+0.002
led+0.002
Neg logit contributions
 fault-0.003
 loser-0.003
 liar-0.003
 seeker-0.003
 robber-0.003
Token happy
Token ID3772
Feature activation-3.73e-02

Pos logit contributions
lights+0.067
ls+0.063
cking+0.062
ized+0.060
cks+0.058
Neg logit contributions
 supposed-0.065
 fault-0.065
 robber-0.061
 indeed-0.060
 liar-0.060
Token.
Token ID13
Feature activation-1.84e-01

Pos logit contributions
ized+0.001
lights+0.001
ls+0.001
cking+0.000
led+0.000
Neg logit contributions
 fault-0.001
 loser-0.001
 robber-0.001
 liar-0.001
 seeker-0.001
Token
Token ID220
Feature activation-1.07e+00

Pos logit contributions
ized+0.003
lights+0.003
cking+0.003
ls+0.003
led+0.003
Neg logit contributions
 fault-0.005
 robber-0.004
 loser-0.004
 supposed-0.004
 seeker-0.004
Token
Token ID198
Feature activation-7.70e+00

Pos logit contributions
lights+0.018
ized+0.018
cking+0.018
ls+0.017
led+0.016
Neg logit contributions
 fault-0.028
 supposed-0.025
 loser-0.024
 robber-0.024
 seeker-0.024
Token
Token ID198
Feature activation-6.19e+00

Pos logit contributions
lights+0.193
ized+0.183
cking+0.177
led+0.176
ls+0.171
Neg logit contributions
 fault-0.135
 supposed-0.126
 Do-0.121
 Where-0.119
 loser-0.118
TokenNow
Token ID3844
Feature activation+6.82e-01

Pos logit contributions
lights+0.106
ls+0.100
ized+0.096
led+0.094
cking+0.094
Neg logit contributions
 fault-0.155
 loser-0.153
 robber-0.151
 supposed-0.147
 liar-0.147
Token the
Token ID262
Feature activation+3.08e-01

Pos logit contributions
 fault+0.017
 robber+0.015
 seeker+0.015
 loser+0.015
 supposed+0.015
Neg logit contributions
lights-0.013
cking-0.012
ized-0.012
ls-0.012
led-0.011
Token bear
Token ID6842
Feature activation-2.02e-01

Pos logit contributions
 fault+0.008
 supposed+0.007
 loser+0.007
 liar+0.007
 seeker+0.007
Neg logit contributions
lights-0.006
ized-0.006
cking-0.005
ls-0.005
led-0.005
Token had
Token ID550
Feature activation+2.44e+00

Pos logit contributions
lights+0.004
ized+0.004
cking+0.004
ls+0.004
led+0.003
Neg logit contributions
 fault-0.005
 loser-0.004
 robber-0.004
 seeker-0.004
 supposed-0.004
Tokenapped
Token ID6320
Feature activation-2.59e+00

Pos logit contributions
lights+0.013
ized+0.012
cking+0.012
ls+0.012
led+0.011
Neg logit contributions
 fault-0.015
 robber-0.013
 loser-0.013
 supposed-0.013
 seeker-0.013
Token and
Token ID290
Feature activation-4.41e+00

Pos logit contributions
lights+0.044
ized+0.043
cking+0.043
ls+0.042
led+0.040
Neg logit contributions
 fault-0.067
 robber-0.060
 loser-0.060
 supposed-0.059
 seeker-0.059
Token cheered
Token ID34767
Feature activation-3.66e-01

Pos logit contributions
lights+0.080
ized+0.078
cking+0.078
ls+0.076
cks+0.070
Neg logit contributions
 fault-0.109
 seeker-0.098
 supposed-0.097
 robber-0.097
 loser-0.096
Token.
Token ID13
Feature activation-9.85e-02

Pos logit contributions
lights+0.005
ized+0.005
ls+0.005
led+0.005
cking+0.005
Neg logit contributions
 fault-0.011
 liar-0.010
 seeker-0.010
 supposed-0.009
 robber-0.009
Token
Token ID220
Feature activation-9.65e-01

Pos logit contributions
cking+0.002
lights+0.002
ized+0.002
ls+0.002
led+0.002
Neg logit contributions
 fault-0.002
 supposed-0.002
 robber-0.002
 loser-0.002
 seeker-0.002
Token
Token ID198
Feature activation-7.69e+00

Pos logit contributions
lights+0.018
ized+0.018
cking+0.017
ls+0.017
led+0.016
Neg logit contributions
 fault-0.024
 supposed-0.021
 loser-0.020
 robber-0.020
 seeker-0.020
Token
Token ID198
Feature activation-6.40e+00

Pos logit contributions
lights+0.195
ized+0.183
ls+0.177
cking+0.176
led+0.173
Neg logit contributions
 supposed-0.123
 fault-0.123
 Do-0.120
 Where-0.114
 Enjoy-0.113
TokenThe
Token ID464
Feature activation-5.73e-01

Pos logit contributions
lights+0.101
ls+0.090
led+0.086
ized+0.082
cking+0.082
Neg logit contributions
 fault-0.165
 loser-0.163
 robber-0.162
 supposed-0.161
 liar-0.158
Token k
Token ID479
Feature activation-6.52e-01

Pos logit contributions
lights+0.013
ized+0.012
cking+0.012
ls+0.012
led+0.012
Neg logit contributions
 fault-0.012
 robber-0.011
 loser-0.011
 seeker-0.011
 supposed-0.011
Tokenitty
Token ID9760
Feature activation-5.84e-01

Pos logit contributions
lights+0.016
cking+0.015
ls+0.015
ized+0.015
led+0.015
Neg logit contributions
 fault-0.012
 loser-0.011
 robber-0.010
 seeker-0.010
 liar-0.010
Token had
Token ID550
Feature activation+2.01e+00

Pos logit contributions
lights+0.010
ized+0.010
cking+0.010
ls+0.010
led+0.009
Neg logit contributions
 fault-0.015
 robber-0.013
 loser-0.013
 supposed-0.013
 seeker-0.013
Token quickly
Token ID2952
Feature activation-6.84e-01

Pos logit contributions
cking+0.078
ls+0.077
cks+0.077
lights+0.076
ized+0.076
Neg logit contributions
 fault-0.085
 supposed-0.076
 robber-0.075
 indeed-0.074
 seeker-0.073
Token ran
Token ID4966
Feature activation-6.75e-01

Pos logit contributions
lights+0.014
cking+0.013
ls+0.013
ized+0.013
cks+0.013
Neg logit contributions
 fault-0.015
 supposed-0.013
 robber-0.013
 liar-0.012
 indeed-0.012
Token away
Token ID1497
Feature activation-3.91e+00

Pos logit contributions
lights+0.010
ized+0.010
ls+0.009
cking+0.009
led+0.009
Neg logit contributions
 fault-0.020
 loser-0.018
 liar-0.018
 robber-0.017
 seeker-0.017
Token.
Token ID13
Feature activation-1.23e-01

Pos logit contributions
lights+0.078
cking+0.074
ized+0.073
ls+0.072
cks+0.070
Neg logit contributions
 fault-0.091
 supposed-0.081
 loser-0.080
 liar-0.079
 robber-0.079
Token
Token ID220
Feature activation-1.12e+00

Pos logit contributions
ized+0.002
lights+0.002
cking+0.002
ls+0.002
led+0.002
Neg logit contributions
 fault-0.003
 robber-0.003
 supposed-0.003
 loser-0.003
 liar-0.003
Token
Token ID198
Feature activation-7.67e+00

Pos logit contributions
lights+0.019
ized+0.018
cking+0.018
ls+0.017
led+0.017
Neg logit contributions
 fault-0.030
 supposed-0.026
 loser-0.026
 robber-0.026
 liar-0.026
Token
Token ID198
Feature activation-6.31e+00

Pos logit contributions
lights+0.182
ized+0.180
cking+0.178
led+0.166
ls+0.166
Neg logit contributions
 fault-0.137
 supposed-0.128
 Do-0.122
 Where-0.122
 loser-0.120
TokenLu
Token ID25596
Feature activation-1.40e+00

Pos logit contributions
lights+0.114
ls+0.111
ized+0.107
cking+0.107
led+0.102
Neg logit contributions
 fault-0.144
 loser-0.140
 robber-0.138
 supposed-0.137
 liar-0.137
Tokency
Token ID948
Feature activation-2.03e+00

Pos logit contributions
lights+0.028
ized+0.027
cking+0.027
ls+0.026
led+0.025
Neg logit contributions
 fault-0.033
 loser-0.029
 robber-0.029
 supposed-0.028
 seeker-0.028
Token was
Token ID373
Feature activation+1.09e+00

Pos logit contributions
lights+0.039
cking+0.038
ized+0.037
ls+0.036
cks+0.034
Neg logit contributions
 fault-0.048
 supposed-0.042
 loser-0.042
 robber-0.041
 liar-0.041
Token very
Token ID845
Feature activation-1.28e+00

Pos logit contributions
 fault+0.024
 supposed+0.022
 loser+0.021
 seeker+0.021
 liar+0.021
Neg logit contributions
lights-0.024
ls-0.023
ized-0.023
cking-0.022
led-0.021
Token hive
Token ID35881
Feature activation-4.13e+00

Pos logit contributions
lights+0.041
ized+0.039
cking+0.038
ls+0.036
led+0.036
Neg logit contributions
 fault-0.076
 loser-0.069
 robber-0.069
 seeker-0.069
 supposed-0.068
Token unsc
Token ID28594
Feature activation-2.81e+00

Pos logit contributions
lights+0.085
cking+0.084
ls+0.081
ized+0.080
cks+0.078
Neg logit contributions
 fault-0.090
 supposed-0.076
 loser-0.076
epad-0.075
 indeed-0.075
Tokenathed
Token ID35932
Feature activation-1.74e+00

Pos logit contributions
lights+0.057
bel+0.048
ized+0.048
ls+0.048
cking+0.047
Neg logit contributions
 fault-0.065
 robber-0.059
 seeker-0.059
 loser-0.058
 liar-0.056
Token.
Token ID13
Feature activation-3.29e-01

Pos logit contributions
lights+0.028
ized+0.028
cking+0.027
ls+0.027
led+0.025
Neg logit contributions
 fault-0.047
 robber-0.042
 loser-0.042
 supposed-0.041
 seeker-0.041
Token
Token ID220
Feature activation-1.36e+00

Pos logit contributions
lights+0.006
cking+0.006
ized+0.006
ls+0.006
led+0.006
Neg logit contributions
 fault-0.008
 robber-0.007
 loser-0.007
 supposed-0.007
 seeker-0.007
Token
Token ID198
Feature activation-7.65e+00

Pos logit contributions
lights+0.025
ized+0.025
cking+0.024
ls+0.023
led+0.022
Neg logit contributions
 fault-0.035
 supposed-0.030
 loser-0.030
 robber-0.030
 liar-0.029
Token
Token ID198
Feature activation-6.08e+00

Pos logit contributions
lights+0.201
ized+0.186
cking+0.184
led+0.183
ls+0.177
Neg logit contributions
 fault-0.130
 supposed-0.120
 Do-0.116
 robber-0.114
 Where-0.113
TokenTh
Token ID817
Feature activation+2.53e-01

Pos logit contributions
lights+0.103
ls+0.092
led+0.088
cking+0.088
cks+0.084
Neg logit contributions
 fault-0.156
 robber-0.155
 loser-0.151
 supposed-0.149
 liar-0.145
Tokenump
Token ID931
Feature activation-1.79e+00

Pos logit contributions
 fault+0.006
 seeker+0.005
 robber+0.005
 loser+0.005
 liar+0.005
Neg logit contributions
ized-0.005
cking-0.005
lights-0.005
ls-0.005
led-0.005
Token was
Token ID373
Feature activation+4.79e-01

Pos logit contributions
ized+0.030
ls+0.029
lights+0.029
led+0.029
cking+0.028
Neg logit contributions
 fault-0.048
 supposed-0.043
 seeker-0.042
epad-0.042
 loser-0.041
Token relieved
Token ID22598
Feature activation-1.79e-01

Pos logit contributions
 fault+0.010
 supposed+0.009
 indeed+0.008
 robber+0.008
 loser+0.008
Neg logit contributions
lights-0.011
ls-0.011
ized-0.011
cking-0.010
led-0.010
Token very
Token ID845
Feature activation-7.67e-01

Pos logit contributions
lights+0.007
ls+0.007
cking+0.007
ized+0.007
cks+0.006
Neg logit contributions
 fault-0.009
 supposed-0.008
 liar-0.008
 loser-0.008
 robber-0.008
Token brave
Token ID14802
Feature activation-1.67e-01

Pos logit contributions
lights+0.017
ls+0.015
cking+0.015
cks+0.015
 pans+0.013
Neg logit contributions
 fault-0.016
 supposed-0.016
 liar-0.015
 indeed-0.014
 ok-0.014
Token and
Token ID290
Feature activation-1.73e+00

Pos logit contributions
ized+0.002
lights+0.002
led+0.002
ling+0.002
ls+0.002
Neg logit contributions
 fault-0.005
 liar-0.005
 robber-0.005
 loser-0.005
 seeker-0.004
Token proud
Token ID6613
Feature activation-7.77e-01

Pos logit contributions
lights+0.032
ls+0.030
cking+0.030
ized+0.030
cks+0.027
Neg logit contributions
 fault-0.042
 supposed-0.040
 seeker-0.038
 liar-0.038
epad-0.038
Token.
Token ID13
Feature activation-1.10e-01

Pos logit contributions
lights+0.012
ized+0.011
ls+0.011
cking+0.010
led+0.010
Neg logit contributions
 fault-0.022
 robber-0.020
 loser-0.020
 seeker-0.020
 liar-0.020
Token
Token ID198
Feature activation-7.65e+00

Pos logit contributions
lights+0.002
ized+0.002
cking+0.002
ls+0.002
led+0.002
Neg logit contributions
 fault-0.003
 robber-0.003
 loser-0.003
 supposed-0.003
 seeker-0.003
Token
Token ID198
Feature activation-6.20e+00

Pos logit contributions
lights+0.157
ized+0.152
cking+0.143
ls+0.138
led+0.138
Neg logit contributions
 fault-0.160
 supposed-0.158
 Do-0.142
 Where-0.141
 indeed-0.137
TokenThe
Token ID464
Feature activation-1.32e+00

Pos logit contributions
lights+0.101
ls+0.097
ized+0.090
cking+0.090
led+0.088
Neg logit contributions
 fault-0.150
 supposed-0.148
 loser-0.147
 robber-0.144
 liar-0.143
Token moral
Token ID6573
Feature activation+1.40e-01

Pos logit contributions
lights+0.028
ized+0.027
cking+0.027
ls+0.027
led+0.025
Neg logit contributions
 fault-0.029
 robber-0.026
 loser-0.026
 seeker-0.026
 supposed-0.025
Token of
Token ID286
Feature activation-6.23e+00

Pos logit contributions
 fault+0.004
 robber+0.003
 loser+0.003
 seeker+0.003
 liar+0.003
Neg logit contributions
lights-0.002
ized-0.002
ls-0.002
cking-0.002
 deep-0.002
Token this
Token ID428
Feature activation-1.91e+00

Pos logit contributions
lights+0.096
ized+0.091
ls+0.090
cking+0.089
led+0.082
Neg logit contributions
 fault-0.172
 supposed-0.159
 seeker-0.155
 robber-0.153
 loser-0.151
Token was
Token ID373
Feature activation+8.74e-01

Pos logit contributions
lights+0.051
cking+0.048
ized+0.047
ls+0.046
cks+0.044
Neg logit contributions
 fault-0.051
 supposed-0.049
 liar-0.046
 indeed-0.046
 loser-0.045
Token so
Token ID523
Feature activation-1.02e+00

Pos logit contributions
 fault+0.021
 supposed+0.019
 loser+0.018
 robber+0.018
 seeker+0.018
Neg logit contributions
lights-0.017
ized-0.016
ls-0.016
cking-0.016
led-0.015
Token happy
Token ID3772
Feature activation+1.65e-01

Pos logit contributions
lights+0.024
ls+0.021
cking+0.020
ized+0.020
cks+0.020
Neg logit contributions
 supposed-0.020
 fault-0.019
 liar-0.019
 indeed-0.018
 loser-0.018
Token.
Token ID13
Feature activation-5.07e-02

Pos logit contributions
 fault+0.005
 robber+0.004
 loser+0.004
 liar+0.004
 seeker+0.004
Neg logit contributions
ized-0.002
ls-0.002
lights-0.002
cking-0.002
led-0.002
Token
Token ID220
Feature activation-8.72e-01

Pos logit contributions
cking+0.001
lights+0.001
ized+0.001
bel+0.001
ls+0.001
Neg logit contributions
 fault-0.001
 Where-0.001
 Do-0.001
 robber-0.001
 loser-0.001
Token
Token ID198
Feature activation-7.64e+00

Pos logit contributions
lights+0.016
ized+0.016
cking+0.015
ls+0.015
led+0.014
Neg logit contributions
 fault-0.022
 supposed-0.019
 robber-0.019
 loser-0.019
 liar-0.018
Token
Token ID198
Feature activation-6.33e+00

Pos logit contributions
lights+0.214
ized+0.197
cking+0.192
led+0.185
bel+0.180
Neg logit contributions
 Do-0.105
 Where-0.101
 supposed-0.098
 fault-0.097
 Enjoy-0.097
TokenB
Token ID33
Feature activation+1.66e+00

Pos logit contributions
lights+0.134
ls+0.108
led+0.104
cking+0.104
cks+0.103
Neg logit contributions
 robber-0.129
 loser-0.128
 indeed-0.123
 Do-0.122
 Enjoy-0.121
Tokenunny
Token ID16948
Feature activation-2.53e+00

Pos logit contributions
 fault+0.040
 robber+0.035
 loser+0.035
 supposed+0.035
 seeker+0.034
Neg logit contributions
lights-0.032
ized-0.031
cking-0.031
ls-0.030
led-0.029
Token thanked
Token ID26280
Feature activation-1.52e+00

Pos logit contributions
lights+0.050
cking+0.047
ized+0.045
ls+0.045
cks+0.042
Neg logit contributions
 fault-0.058
 robber-0.051
 supposed-0.051
 loser-0.050
 seeker-0.050
Token the
Token ID262
Feature activation+3.94e-01

Pos logit contributions
lights+0.034
ized+0.033
cking+0.033
ls+0.031
bel+0.031
Neg logit contributions
 fault-0.030
 robber-0.028
 seeker-0.026
 loser-0.026
 liar-0.026
Token proud
Token ID6613
Feature activation-5.93e-01

Pos logit contributions
 fault+0.018
 robber+0.016
 loser+0.016
 seeker+0.016
 supposed+0.016
Neg logit contributions
lights-0.013
ized-0.013
cking-0.013
ls-0.013
led-0.012
Token and
Token ID290
Feature activation-2.97e+00

Pos logit contributions
lights+0.009
ized+0.009
ls+0.009
cking+0.008
led+0.008
Neg logit contributions
 fault-0.016
 robber-0.015
 loser-0.015
 liar-0.015
 seeker-0.015
Token special
Token ID2041
Feature activation-3.70e-01

Pos logit contributions
lights+0.053
cking+0.051
ls+0.050
ized+0.049
cks+0.046
Neg logit contributions
 fault-0.073
 supposed-0.070
 seeker-0.067
epad-0.066
 liar-0.066
Token.
Token ID13
Feature activation-2.21e-01

Pos logit contributions
lights+0.005
ized+0.004
ls+0.004
led+0.004
bel+0.004
Neg logit contributions
 fault-0.011
 robber-0.010
 loser-0.010
 seeker-0.010
 liar-0.010
Token
Token ID220
Feature activation-8.16e-01

Pos logit contributions
lights+0.003
ized+0.003
cking+0.003
ls+0.003
led+0.003
Neg logit contributions
 fault-0.006
 robber-0.005
 loser-0.005
 supposed-0.005
 seeker-0.005
Token
Token ID198
Feature activation-7.62e+00

Pos logit contributions
lights+0.013
ized+0.013
cking+0.013
ls+0.013
led+0.012
Neg logit contributions
 fault-0.022
 supposed-0.020
 loser-0.019
 robber-0.019
 seeker-0.019
Token
Token ID198
Feature activation-6.03e+00

Pos logit contributions
lights+0.187
ized+0.176
cking+0.170
led+0.168
ls+0.166
Neg logit contributions
 fault-0.139
 supposed-0.133
 Do-0.122
 Where-0.120
 Yes-0.120
TokenBut
Token ID1537
Feature activation-3.35e-01

Pos logit contributions
lights+0.094
ls+0.087
led+0.082
cking+0.080
ized+0.080
Neg logit contributions
 fault-0.160
 loser-0.155
 supposed-0.154
 robber-0.151
 liar-0.149
Token not
Token ID407
Feature activation+2.11e+00

Pos logit contributions
ized+0.006
lights+0.006
cking+0.005
ls+0.005
led+0.005
Neg logit contributions
 fault-0.009
 robber-0.008
 loser-0.008
 supposed-0.008
 liar-0.008
Token everyone
Token ID2506
Feature activation-3.93e-02

Pos logit contributions
 fault+0.050
 robber+0.046
 loser+0.045
 liar+0.045
 seeker+0.044
Neg logit contributions
ized-0.039
lights-0.039
cking-0.038
ls-0.037
led-0.035
Token at
Token ID379
Feature activation-3.74e+00

Pos logit contributions
ized+0.001
ls+0.001
lights+0.001
cking+0.001
led+0.001
Neg logit contributions
 fault-0.001
 robber-0.001
 seeker-0.001
 loser-0.001
 liar-0.001
Token sad
Token ID6507
Feature activation+1.06e+00

Pos logit contributions
lights+0.000
ized+0.000
cking+0.000
ls+0.000
led+0.000
Neg logit contributions
 fault-0.000
 robber-0.000
 loser-0.000
 supposed-0.000
 liar-0.000
Token and
Token ID290
Feature activation-4.05e+00

Pos logit contributions
 fault+0.028
 robber+0.026
 supposed+0.026
 seeker+0.026
 loser+0.026
Neg logit contributions
lights-0.017
ized-0.017
ls-0.016
cking-0.015
ling-0.015
Token scared
Token ID12008
Feature activation-4.14e-01

Pos logit contributions
lights+0.084
cking+0.078
ized+0.078
ls+0.075
cks+0.072
Neg logit contributions
 fault-0.089
 robber-0.081
 liar-0.081
 supposed-0.080
 seeker-0.078
Token.
Token ID13
Feature activation-2.06e-01

Pos logit contributions
ized+0.006
lights+0.006
ls+0.006
led+0.005
ling+0.005
Neg logit contributions
 fault-0.012
 supposed-0.011
 seeker-0.011
 loser-0.011
 robber-0.011
Token
Token ID220
Feature activation-1.61e+00

Pos logit contributions
lights+0.004
ized+0.004
cking+0.004
ls+0.004
led+0.003
Neg logit contributions
 fault-0.005
 loser-0.004
 robber-0.004
 liar-0.004
 supposed-0.004
Token
Token ID198
Feature activation-7.62e+00

Pos logit contributions
lights+0.031
ized+0.030
cking+0.030
ls+0.028
led+0.028
Neg logit contributions
 fault-0.040
 supposed-0.034
 loser-0.033
 robber-0.033
 liar-0.033
Token
Token ID198
Feature activation-6.13e+00

Pos logit contributions
lights+0.194
ized+0.181
cking+0.176
led+0.168
ls+0.166
Neg logit contributions
 fault-0.131
 Where-0.117
 loser-0.117
 supposed-0.115
 Do-0.114
TokenThe
Token ID464
Feature activation-6.28e-01

Pos logit contributions
lights+0.113
ls+0.099
ized+0.098
cking+0.096
cks+0.090
Neg logit contributions
 loser-0.145
 liar-0.138
 fault-0.137
 robber-0.135
 indeed-0.132
Token next
Token ID1306
Feature activation+8.36e-01

Pos logit contributions
lights+0.011
ized+0.011
cking+0.011
ls+0.010
led+0.010
Neg logit contributions
 fault-0.016
 loser-0.015
 robber-0.014
 liar-0.014
 seeker-0.014
Token day
Token ID1110
Feature activation-4.90e-01

Pos logit contributions
 fault+0.022
 supposed+0.019
 robber+0.019
 loser+0.019
 liar+0.019
Neg logit contributions
lights-0.014
ized-0.014
ls-0.014
cking-0.014
led-0.013
Token Buzz
Token ID15753
Feature activation-2.03e+00

Pos logit contributions
lights+0.010
ized+0.009
cking+0.009
ls+0.009
led+0.009
Neg logit contributions
 fault-0.011
 loser-0.010
 robber-0.010
 seeker-0.010
 supposed-0.010
Token by
Token ID416
Feature activation-2.82e+00

Pos logit contributions
lights+0.003
ized+0.003
cking+0.002
ls+0.002
led+0.002
Neg logit contributions
 fault-0.004
 seeker-0.004
 loser-0.004
 robber-0.004
 liar-0.004
Token many
Token ID867
Feature activation-4.08e+00

Pos logit contributions
lights+0.041
ized+0.041
led+0.039
ls+0.039
cking+0.038
Neg logit contributions
 fault-0.080
 robber-0.072
 supposed-0.071
 liar-0.070
 loser-0.070
Token people
Token ID661
Feature activation-7.83e-01

Pos logit contributions
lights+0.072
ized+0.071
cking+0.070
ls+0.067
led+0.064
Neg logit contributions
 fault-0.101
 robber-0.091
 loser-0.090
 supposed-0.090
 liar-0.090
Token.
Token ID13
Feature activation-2.67e-01

Pos logit contributions
ized+0.012
lights+0.011
cking+0.011
ls+0.011
ling+0.010
Neg logit contributions
 fault-0.022
 seeker-0.020
 loser-0.020
 robber-0.020
 liar-0.020
Token
Token ID220
Feature activation-1.11e+00

Pos logit contributions
lights+0.005
ized+0.005
cking+0.005
ls+0.005
led+0.005
Neg logit contributions
 fault-0.006
 supposed-0.006
 robber-0.006
 loser-0.005
 seeker-0.005
Token
Token ID198
Feature activation-7.62e+00

Pos logit contributions
lights+0.020
ized+0.019
cking+0.019
ls+0.018
led+0.018
Neg logit contributions
 fault-0.029
 supposed-0.025
 loser-0.025
 robber-0.025
 seeker-0.024
Token
Token ID198
Feature activation-6.25e+00

Pos logit contributions
lights+0.181
ized+0.170
cking+0.162
led+0.160
ls+0.160
Neg logit contributions
 fault-0.138
 supposed-0.132
 Do-0.125
 liar-0.125
 loser-0.124
TokenHe
Token ID1544
Feature activation-1.54e+00

Pos logit contributions
lights+0.090
ls+0.083
led+0.077
ized+0.076
cking+0.076
Neg logit contributions
 fault-0.171
 loser-0.169
 robber-0.167
 supposed-0.164
 liar-0.163
Token went
Token ID1816
Feature activation-6.38e-01

Pos logit contributions
lights+0.033
cking+0.032
ized+0.030
ls+0.030
cks+0.029
Neg logit contributions
 fault-0.032
 robber-0.030
 supposed-0.029
 indeed-0.029
 loser-0.029
Token back
Token ID736
Feature activation-2.20e+00

Pos logit contributions
lights+0.012
ized+0.012
cking+0.012
ls+0.011
led+0.011
Neg logit contributions
 fault-0.015
 robber-0.014
 seeker-0.014
 supposed-0.013
 loser-0.013
Token to
Token ID284
Feature activation-3.54e+00

Pos logit contributions
lights+0.060
cking+0.058
ized+0.058
ls+0.054
led+0.054
Neg logit contributions
 fault-0.036
 seeker-0.032
 supposed-0.031
 robber-0.030
 loser-0.028
Token pattern
Token ID3912
Feature activation-2.74e+00

Pos logit contributions
 fault+0.035
 robber+0.032
 supposed+0.032
 loser+0.032
 liar+0.032
Neg logit contributions
lights-0.012
ized-0.012
cking-0.012
ls-0.011
led-0.010
Token was
Token ID373
Feature activation+5.98e-01

Pos logit contributions
cking+0.053
lights+0.053
ized+0.052
ls+0.050
cks+0.048
Neg logit contributions
 fault-0.063
 supposed-0.055
 robber-0.054
 loser-0.054
 liar-0.054
Token fixed
Token ID5969
Feature activation-2.09e+00

Pos logit contributions
 fault+0.013
 supposed+0.012
epad+0.010
 indeed+0.010
 robber+0.010
Neg logit contributions
ls-0.013
lights-0.013
ized-0.013
cking-0.012
led-0.012
Token.
Token ID13
Feature activation-3.97e-01

Pos logit contributions
lights+0.034
ized+0.034
cking+0.032
ls+0.032
led+0.030
Neg logit contributions
 fault-0.056
 robber-0.050
 loser-0.050
 liar-0.050
 seeker-0.050
Token
Token ID220
Feature activation-1.00e+00

Pos logit contributions
cking+0.009
lights+0.009
ized+0.009
ls+0.009
ling+0.008
Neg logit contributions
 fault-0.008
 supposed-0.007
 Where-0.007
 Do-0.007
 Yes-0.007
Token
Token ID198
Feature activation-7.61e+00

Pos logit contributions
lights+0.019
ized+0.018
cking+0.018
ls+0.017
led+0.017
Neg logit contributions
 fault-0.025
 supposed-0.022
 loser-0.022
 seeker-0.021
 robber-0.021
Token
Token ID198
Feature activation-6.44e+00

Pos logit contributions
lights+0.208
ized+0.193
cking+0.187
ls+0.186
led+0.183
Neg logit contributions
 supposed-0.111
 Do-0.109
 fault-0.108
 Where-0.105
 Enjoy-0.103
TokenThe
Token ID464
Feature activation-9.18e-01

Pos logit contributions
lights+0.113
ls+0.102
led+0.097
cking+0.091
ized+0.089
Neg logit contributions
 loser-0.153
 fault-0.151
 supposed-0.151
 indeed-0.148
 liar-0.147
Token end
Token ID886
Feature activation-4.61e-01

Pos logit contributions
lights+0.019
ized+0.019
cking+0.019
ls+0.018
led+0.018
Neg logit contributions
 fault-0.020
 loser-0.018
 robber-0.018
 seeker-0.018
 supposed-0.018
Token.
Token ID13
Feature activation+6.50e-01

Pos logit contributions
ized+0.009
cking+0.009
lights+0.008
ls+0.008
led+0.008
Neg logit contributions
 fault-0.011
 supposed-0.010
 loser-0.010
 robber-0.010
 liar-0.009
Token<|endoftext|>
Token ID50256
Feature activation-4.42e-01

Pos logit contributions
 fault+0.015
 loser+0.013
 supposed+0.013
 seeker+0.013
 liar+0.012
Neg logit contributions
cking-0.013
lights-0.013
ized-0.013
ls-0.013
led-0.013
Token and
Token ID290
Feature activation-2.80e+00

Pos logit contributions
 fault+0.009
 robber+0.009
 loser+0.009
 seeker+0.009
 liar+0.008
Neg logit contributions
ized-0.005
lights-0.005
ls-0.005
cking-0.005
led-0.004
Token thanked
Token ID26280
Feature activation-1.86e+00

Pos logit contributions
lights+0.059
cking+0.054
cks+0.049
ized+0.049
ls+0.048
Neg logit contributions
 fault-0.059
 supposed-0.057
 indeed-0.057
 liar-0.055
 robber-0.054
Token the
Token ID262
Feature activation-6.21e-01

Pos logit contributions
lights+0.028
ized+0.026
cking+0.026
ls+0.025
led+0.025
Neg logit contributions
 fault-0.052
 robber-0.047
 loser-0.047
 liar-0.046
 seeker-0.046
Token owl
Token ID39610
Feature activation-1.32e+00

Pos logit contributions
lights+0.012
ized+0.012
cking+0.012
ls+0.011
led+0.011
Neg logit contributions
 fault-0.015
 loser-0.013
 robber-0.013
 liar-0.013
 seeker-0.013
Token.
Token ID13
Feature activation-3.47e-01

Pos logit contributions
lights+0.026
ized+0.026
cking+0.025
ls+0.025
led+0.024
Neg logit contributions
 fault-0.031
 loser-0.027
 robber-0.027
 supposed-0.026
 seeker-0.026
Token
Token ID198
Feature activation-7.61e+00

Pos logit contributions
cking+0.008
lights+0.008
ized+0.008
led+0.008
bel+0.008
Neg logit contributions
 fault-0.007
 Do-0.006
 loser-0.006
 Where-0.006
 robber-0.006
Token
Token ID198
Feature activation-6.32e+00

Pos logit contributions
lights+0.190
ized+0.175
cking+0.167
 pans+0.167
led+0.164
Neg logit contributions
 fault-0.132
 Do-0.118
 supposed-0.117
 Where-0.113
 Enjoy-0.112
TokenThe
Token ID464
Feature activation-1.01e+00

Pos logit contributions
lights+0.115
led+0.100
ls+0.099
cking+0.089
 pans+0.089
Neg logit contributions
 loser-0.148
 liar-0.141
 robber-0.140
 fault-0.140
 indeed-0.140
Token sw
Token ID1509
Feature activation-4.10e-01

Pos logit contributions
lights+0.025
ized+0.023
led+0.022
cking+0.022
ls+0.022
Neg logit contributions
 fault-0.020
 loser-0.019
 seeker-0.018
 robber-0.018
 liar-0.018
Tokenan
Token ID272
Feature activation-3.04e+00

Pos logit contributions
lights+0.006
ized+0.005
ls+0.005
cking+0.005
 pans+0.004
Neg logit contributions
 fault-0.013
 loser-0.011
 robber-0.011
 seeker-0.011
 supposed-0.011
Token flew
Token ID13112
Feature activation-2.85e+00

Pos logit contributions
lights+0.066
cking+0.061
cks+0.057
 pans+0.057
ized+0.056
Neg logit contributions
 fault-0.061
 indeed-0.059
 robber-0.055
 liar-0.055
 loser-0.054
Token they
Token ID484
Feature activation-2.31e+00

Pos logit contributions
lights+0.071
cking+0.070
ized+0.069
ls+0.067
cks+0.064
Neg logit contributions
 fault-0.098
 robber-0.089
 supposed-0.088
 liar-0.087
 seeker-0.087
Token became
Token ID2627
Feature activation-2.35e-01

Pos logit contributions
lights+0.052
ized+0.049
cking+0.049
ls+0.048
cks+0.047
Neg logit contributions
 seeker-0.044
 fault-0.044
 robber-0.042
 supposed-0.040
 loser-0.040
Token friends
Token ID2460
Feature activation-1.52e+00

Pos logit contributions
lights+0.004
ized+0.004
ls+0.004
cking+0.004
 deep+0.003
Neg logit contributions
 fault-0.006
 liar-0.005
 robber-0.005
 supposed-0.005
epad-0.005
Token.
Token ID13
Feature activation-5.82e-01

Pos logit contributions
ized+0.023
lights+0.023
ls+0.022
cking+0.021
led+0.020
Neg logit contributions
 fault-0.042
 robber-0.039
 supposed-0.038
 loser-0.038
 seeker-0.038
Token
Token ID220
Feature activation-1.14e+00

Pos logit contributions
cking+0.012
ized+0.012
lights+0.012
ls+0.011
bel+0.011
Neg logit contributions
 fault-0.013
 supposed-0.011
 robber-0.011
 loser-0.011
 seeker-0.011
Token
Token ID198
Feature activation-7.60e+00

Pos logit contributions
lights+0.022
ized+0.022
cking+0.021
ls+0.021
led+0.020
Neg logit contributions
 fault-0.028
 supposed-0.025
 robber-0.023
 loser-0.023
 liar-0.023
Token
Token ID198
Feature activation-5.94e+00

Pos logit contributions
lights+0.180
ized+0.176
cking+0.174
ls+0.164
led+0.164
Neg logit contributions
 fault-0.133
 supposed-0.127
 Do-0.121
 Yes-0.116
 Where-0.114
TokenFrom
Token ID4863
Feature activation-3.29e-01

Pos logit contributions
lights+0.085
ls+0.082
cking+0.080
ized+0.079
led+0.074
Neg logit contributions
 fault-0.162
 loser-0.153
 robber-0.152
 supposed-0.152
 liar-0.149
Token that
Token ID326
Feature activation-6.47e-01

Pos logit contributions
lights+0.008
cking+0.007
bel+0.007
ls+0.007
 valleys+0.007
Neg logit contributions
 fault-0.006
 seeker-0.006
 robber-0.005
 supposed-0.005
 indeed-0.005
Token day
Token ID1110
Feature activation-2.11e-01

Pos logit contributions
lights+0.014
ized+0.014
cking+0.014
ls+0.013
led+0.013
Neg logit contributions
 fault-0.014
 supposed-0.013
 loser-0.013
 seeker-0.012
 robber-0.012
Token on
Token ID319
Feature activation-2.23e+00

Pos logit contributions
lights+0.003
ized+0.003
cking+0.003
ls+0.003
bel+0.003
Neg logit contributions
 fault-0.006
 seeker-0.005
 loser-0.005
 robber-0.005
 supposed-0.005
Token his
Token ID465
Feature activation+1.46e+00

Pos logit contributions
ized+0.047
lights+0.047
cking+0.045
ls+0.045
led+0.043
Neg logit contributions
 fault-0.072
 supposed-0.064
 seeker-0.063
 loser-0.063
 robber-0.063
Token mom
Token ID1995
Feature activation+4.10e-01

Pos logit contributions
 fault+0.034
 supposed+0.031
 robber+0.031
 loser+0.030
 liar+0.030
Neg logit contributions
lights-0.027
ized-0.027
ls-0.026
cking-0.026
ling-0.024
Token too
Token ID1165
Feature activation-5.80e-01

Pos logit contributions
 fault+0.010
 loser+0.009
 seeker+0.009
 liar+0.009
 robber+0.009
Neg logit contributions
lights-0.007
ized-0.007
cking-0.007
ls-0.007
ling-0.007
Token.
Token ID13
Feature activation+2.16e-02

Pos logit contributions
ized+0.009
lights+0.008
ls+0.008
cking+0.008
led+0.008
Neg logit contributions
 fault-0.016
 robber-0.014
 seeker-0.014
 loser-0.014
 liar-0.014
Token
Token ID220
Feature activation-6.65e-01

Pos logit contributions
 fault+0.001
 loser+0.000
 supposed+0.000
 seeker+0.000
 robber+0.000
Neg logit contributions
ized-0.000
cking-0.000
lights-0.000
ls-0.000
led-0.000
Token
Token ID198
Feature activation-7.58e+00

Pos logit contributions
ized+0.012
lights+0.012
cking+0.012
ls+0.011
led+0.011
Neg logit contributions
 fault-0.017
 supposed-0.015
 loser-0.014
 robber-0.014
 seeker-0.014
Token
Token ID198
Feature activation-5.97e+00

Pos logit contributions
lights+0.212
ized+0.212
cking+0.200
led+0.195
ls+0.188
Neg logit contributions
 Where-0.105
 Do-0.104
 fault-0.103
 supposed-0.097
 Yes-0.095
TokenHe
Token ID1544
Feature activation-1.43e+00

Pos logit contributions
lights+0.110
ls+0.108
ized+0.104
led+0.102
cking+0.101
Neg logit contributions
 loser-0.140
 fault-0.130
 seeker-0.126
 robber-0.125
 liar-0.125
Token wanted
Token ID2227
Feature activation+2.23e+00

Pos logit contributions
lights+0.026
ized+0.025
cking+0.025
ls+0.024
cks+0.022
Neg logit contributions
 fault-0.035
 robber-0.032
 liar-0.032
 supposed-0.031
 seeker-0.031
Token to
Token ID284
Feature activation+4.21e-01

Pos logit contributions
 fault+0.048
 loser+0.044
 robber+0.043
 seeker+0.043
 liar+0.043
Neg logit contributions
lights-0.045
ized-0.044
ls-0.043
cking-0.043
 deep-0.041
Token take
Token ID1011
Feature activation-1.62e+00

Pos logit contributions
 fault+0.011
 loser+0.010
epad+0.009
 liar+0.009
 supposed+0.009
Neg logit contributions
ized-0.007
cking-0.007
lights-0.007
ls-0.006
 deep-0.006
Token for
Token ID329
Feature activation-2.31e+00

Pos logit contributions
 fault+0.006
 loser+0.005
 robber+0.005
 supposed+0.005
 seeker+0.005
Neg logit contributions
lights-0.005
ized-0.004
cking-0.004
ls-0.004
led-0.004
Token his
Token ID465
Feature activation+1.20e+00

Pos logit contributions
lights+0.050
ls+0.049
ized+0.049
cking+0.047
led+0.046
Neg logit contributions
 fault-0.050
 supposed-0.044
 robber-0.044
 liar-0.043
 loser-0.043
Token kindness
Token ID23887
Feature activation-1.21e+00

Pos logit contributions
 fault+0.026
 seeker+0.024
 loser+0.024
 robber+0.024
 liar+0.023
Neg logit contributions
ized-0.023
lights-0.022
 deep-0.022
ls-0.022
cking-0.022
Token.
Token ID13
Feature activation-7.22e-01

Pos logit contributions
ized+0.015
lights+0.015
ls+0.014
led+0.013
ling+0.013
Neg logit contributions
 fault-0.035
 seeker-0.034
 loser-0.033
 liar-0.033
 robber-0.033
Token
Token ID220
Feature activation-1.26e+00

Pos logit contributions
cking+0.014
lights+0.014
ized+0.014
ls+0.013
led+0.013
Neg logit contributions
 fault-0.017
 robber-0.015
 loser-0.015
 supposed-0.014
 seeker-0.014
Token
Token ID198
Feature activation-7.58e+00

Pos logit contributions
lights+0.023
ized+0.023
cking+0.022
ls+0.021
led+0.021
Neg logit contributions
 fault-0.032
 supposed-0.028
 loser-0.027
 robber-0.027
 liar-0.026
Token
Token ID198
Feature activation-6.37e+00

Pos logit contributions
lights+0.188
ized+0.178
cking+0.173
ls+0.169
led+0.169
Neg logit contributions
 fault-0.132
 supposed-0.122
 Do-0.120
 robber-0.119
 Yes-0.115
TokenThe
Token ID464
Feature activation-5.77e-01

Pos logit contributions
lights+0.099
ls+0.093
led+0.088
ized+0.087
cking+0.087
Neg logit contributions
 fault-0.167
 robber-0.165
 loser-0.164
 liar-0.159
 supposed-0.157
Token boy
Token ID2933
Feature activation-1.88e+00

Pos logit contributions
lights+0.011
ized+0.011
cking+0.011
ls+0.011
led+0.010
Neg logit contributions
 fault-0.014
 robber-0.012
 loser-0.012
 supposed-0.012
 seeker-0.012
Token was
Token ID373
Feature activation+1.27e+00

Pos logit contributions
lights+0.037
cking+0.035
ized+0.034
ls+0.034
cks+0.032
Neg logit contributions
 fault-0.044
 robber-0.039
 loser-0.039
 liar-0.039
 supposed-0.038
Token glad
Token ID9675
Feature activation+3.66e-01

Pos logit contributions
 fault+0.029
 supposed+0.025
 loser+0.025
 robber+0.025
 seeker+0.025
Neg logit contributions
lights-0.026
ized-0.026
ls-0.025
cking-0.025
led-0.024
Token should
Token ID815
Feature activation+3.31e+00

Pos logit contributions
 fault+0.039
 robber+0.036
epad+0.036
 liar+0.036
 loser+0.036
Neg logit contributions
ls-0.048
ized-0.047
s-0.047
ling-0.046
 deep-0.046
Token never
Token ID1239
Feature activation+8.10e-01

Pos logit contributions
 fault+0.072
 robber+0.067
 liar+0.066
 loser+0.066
epad+0.066
Neg logit contributions
ized-0.058
cking-0.056
 slowly-0.055
ling-0.055
 deep-0.054
Token take
Token ID1011
Feature activation-5.04e-01

Pos logit contributions
 fault+0.019
 seeker+0.016
 loser+0.016
 robber+0.016
 liar+0.016
Neg logit contributions
ized-0.015
cking-0.015
ls-0.014
ling-0.014
lights-0.014
Token something
Token ID1223
Feature activation+2.02e-01

Pos logit contributions
ls+0.009
ized+0.009
cking+0.009
lights+0.009
led+0.008
Neg logit contributions
 fault-0.012
 seeker-0.011
 loser-0.011
 robber-0.011
 liar-0.011
Token that
Token ID326
Feature activation+7.13e-01

Pos logit contributions
 fault+0.005
 supposed+0.004
 robber+0.004
 liar+0.004
 loser+0.004
Neg logit contributions
lights-0.004
cking-0.004
ized-0.004
ls-0.003
cks-0.003
Token doesn
Token ID1595
Feature activation+5.55e+00

Pos logit contributions
 fault+0.013
 supposed+0.011
 robber+0.010
 loser+0.010
 liar+0.010
Neg logit contributions
lights-0.019
cking-0.018
ized-0.018
ls-0.017
led-0.017
Token't
Token ID470
Feature activation+2.32e+00

Pos logit contributions
 fault+0.080
 seeker+0.063
 liar+0.062
 loser+0.062
 robber+0.062
Neg logit contributions
ized-0.158
lights-0.158
ls-0.156
cking-0.154
led-0.147
Token belong
Token ID5594
Feature activation+1.29e+00

Pos logit contributions
 fault+0.042
 seeker+0.037
 loser+0.035
 liar+0.034
 robber+0.032
Neg logit contributions
ized-0.056
cking-0.055
ls-0.053
lights-0.053
 deep-0.053
Token to
Token ID284
Feature activation-7.77e-01

Pos logit contributions
 robber+0.030
 fault+0.030
 loser+0.029
 liar+0.029
 seeker+0.029
Neg logit contributions
ls-0.023
ized-0.021
led-0.021
lights-0.021
 deep-0.020
Token you
Token ID345
Feature activation+2.51e+00

Pos logit contributions
lights+0.017
ized+0.016
cking+0.016
ls+0.016
led+0.015
Neg logit contributions
 fault-0.017
 loser-0.015
 robber-0.015
 seeker-0.014
 supposed-0.014
Token.
Token ID13
Feature activation+3.01e+00

Pos logit contributions
 fault+0.067
 seeker+0.061
 robber+0.061
 loser+0.060
 liar+0.059
Neg logit contributions
ized-0.039
ls-0.038
cking-0.036
lights-0.036
ling-0.035
Token there
Token ID612
Feature activation-1.02e+00

Pos logit contributions
cking+0.021
ized+0.021
lights+0.020
led+0.019
cks+0.019
Neg logit contributions
 fault-0.016
 loser-0.013
 supposed-0.013
 seeker-0.013
 robber-0.012
Token was
Token ID373
Feature activation+4.07e+00

Pos logit contributions
lights+0.018
ized+0.017
cking+0.017
ls+0.017
led+0.016
Neg logit contributions
 fault-0.026
 loser-0.023
 robber-0.023
 supposed-0.023
 seeker-0.023
Token a
Token ID257
Feature activation+1.99e+00

Pos logit contributions
 fault+0.097
 robber+0.089
 loser+0.088
 liar+0.088
epad+0.086
Neg logit contributions
ized-0.074
cking-0.073
ls-0.071
lights-0.071
 deep-0.070
Token furry
Token ID40857
Feature activation+2.43e+00

Pos logit contributions
 fault+0.049
 robber+0.045
 loser+0.045
 seeker+0.045
 liar+0.044
Neg logit contributions
lights-0.037
ized-0.035
cking-0.035
ls-0.033
led-0.033
Token dog
Token ID3290
Feature activation+2.88e+00

Pos logit contributions
 fault+0.063
 supposed+0.058
epad+0.057
 loser+0.053
 liar+0.052
Neg logit contributions
ls-0.040
lights-0.039
ized-0.039
ling-0.039
led-0.037
Token called
Token ID1444
Feature activation+6.36e+00

Pos logit contributions
 fault+0.057
 robber+0.051
 liar+0.050
 loser+0.050
 seeker+0.050
Neg logit contributions
lights-0.066
ized-0.064
cking-0.064
ls-0.062
led-0.060
Token R
Token ID371
Feature activation+3.98e+00

Pos logit contributions
 fault+0.151
 supposed+0.133
 loser+0.131
epad+0.130
 liar+0.130
Neg logit contributions
ls-0.117
lights-0.116
ized-0.113
cking-0.111
 deep-0.110
Tokenuf
Token ID3046
Feature activation+3.21e+00

Pos logit contributions
 fault+0.089
 loser+0.076
 robber+0.076
 supposed+0.075
 seeker+0.075
Neg logit contributions
lights-0.084
ized-0.084
cking-0.082
ls-0.079
led-0.077
Tokenus
Token ID385
Feature activation+3.99e+00

Pos logit contributions
 fault+0.074
 supposed+0.065
 robber+0.065
 loser+0.065
 liar+0.064
Neg logit contributions
lights-0.064
ized-0.063
cking-0.061
ls-0.060
led-0.058
Token.
Token ID13
Feature activation+3.86e+00

Pos logit contributions
 fault+0.116
 robber+0.106
 loser+0.106
 seeker+0.105
 liar+0.103
Neg logit contributions
ls-0.050
ized-0.050
cking-0.050
lights-0.050
led-0.047
Token R
Token ID371
Feature activation+3.64e+00

Pos logit contributions
 fault+0.087
 seeker+0.080
 loser+0.078
 robber+0.078
 liar+0.075
Neg logit contributions
ized-0.082
lights-0.080
cking-0.078
led-0.074
ls-0.074
Token"
Token ID1
Feature activation+1.62e+00

Pos logit contributions
lights+0.066
cking+0.064
ized+0.063
ls+0.062
led+0.059
Neg logit contributions
 fault-0.077
 robber-0.074
 loser-0.073
 liar-0.071
 seeker-0.071
TokenHello
Token ID15496
Feature activation+4.22e+00

Pos logit contributions
 fault+0.027
 robber+0.024
 loser+0.024
 seeker+0.023
 liar+0.023
Neg logit contributions
lights-0.042
ized-0.041
cking-0.041
ls-0.041
led-0.039
Token,
Token ID11
Feature activation+4.13e+00

Pos logit contributions
 fault+0.112
 supposed+0.099
epad+0.097
 liar+0.097
 loser+0.097
Neg logit contributions
ls-0.073
ized-0.070
lights-0.070
cking-0.067
led-0.064
Token new
Token ID649
Feature activation+8.94e-01

Pos logit contributions
 fault+0.104
 supposed+0.090
 loser+0.089
epad+0.089
 seeker+0.088
Neg logit contributions
ls-0.074
ized-0.074
lights-0.073
cking-0.071
led-0.066
Token dog
Token ID3290
Feature activation+3.44e+00

Pos logit contributions
 fault+0.022
 supposed+0.019
epad+0.019
 loser+0.018
 liar+0.018
Neg logit contributions
ls-0.018
ized-0.017
lights-0.016
ling-0.016
 deep-0.016
Token.
Token ID13
Feature activation+6.43e+00

Pos logit contributions
 fault+0.081
 loser+0.069
epad+0.068
 supposed+0.068
 seeker+0.067
Neg logit contributions
ls-0.072
ized-0.068
cking-0.066
lights-0.066
led-0.065
Token What
Token ID1867
Feature activation+3.75e+00

Pos logit contributions
 fault+0.140
 supposed+0.119
epad+0.103
 liar+0.097
 loser+0.096
Neg logit contributions
lights-0.134
 deep-0.133
 low-0.132
s-0.128
bel-0.127
Token is
Token ID318
Feature activation+5.48e+00

Pos logit contributions
 fault+0.100
 robber+0.089
 loser+0.088
 seeker+0.087
epad+0.084
Neg logit contributions
ls-0.058
 deep-0.057
s-0.056
 lands-0.050
 ch-0.047
Token your
Token ID534
Feature activation+3.29e+00

Pos logit contributions
 fault+0.148
 loser+0.132
 liar+0.126
epad+0.125
 seeker+0.124
Neg logit contributions
ls-0.093
cking-0.089
ized-0.088
 deep-0.084
lights-0.082
Token name
Token ID1438
Feature activation+3.52e+00

Pos logit contributions
 fault+0.054
 liar+0.049
 loser+0.049
 supposed+0.047
epad+0.047
Neg logit contributions
ls-0.078
ized-0.078
lights-0.077
cking-0.075
 deep-0.075
Token?"
Token ID1701
Feature activation-9.91e-01

Pos logit contributions
 fault+0.070
 loser+0.060
 robber+0.060
 supposed+0.059
 seeker+0.059
Neg logit contributions
lights-0.081
ized-0.080
cking-0.079
ls-0.078
led-0.075
Token an
Token ID281
Feature activation+1.75e+00

Pos logit contributions
 fault+0.094
 loser+0.083
 robber+0.083
 liar+0.082
 seeker+0.080
Neg logit contributions
ized-0.075
cking-0.074
lights-0.074
ls-0.071
led-0.069
Token ignorant
Token ID20109
Feature activation+4.50e+00

Pos logit contributions
 fault+0.044
 robber+0.039
 supposed+0.039
 loser+0.039
 seeker+0.038
Neg logit contributions
lights-0.031
ized-0.031
cking-0.030
ls-0.030
led-0.028
Token goat
Token ID26917
Feature activation+2.67e+00

Pos logit contributions
 fault+0.117
 supposed+0.109
epad+0.102
 loser+0.101
 seeker+0.100
Neg logit contributions
lights-0.072
ized-0.068
ling-0.067
ls-0.064
 deep-0.062
Token.
Token ID13
Feature activation+3.33e+00

Pos logit contributions
 fault+0.080
 loser+0.071
 seeker+0.070
 liar+0.070
 robber+0.070
Neg logit contributions
ized-0.036
lights-0.034
cking-0.034
ls-0.033
ling-0.032
Token He
Token ID679
Feature activation+1.09e+00

Pos logit contributions
 fault+0.081
 supposed+0.074
 liar+0.074
 seeker+0.074
 loser+0.073
Neg logit contributions
lights-0.060
ized-0.058
ised-0.053
cking-0.052
ls-0.051
Token thought
Token ID1807
Feature activation+5.76e+00

Pos logit contributions
 fault+0.027
 loser+0.023
 seeker+0.023
 supposed+0.023
 robber+0.023
Neg logit contributions
ized-0.020
lights-0.020
ls-0.020
cking-0.020
led-0.019
Token he
Token ID339
Feature activation+2.26e+00

Pos logit contributions
 fault+0.154
 seeker+0.138
 loser+0.138
 supposed+0.136
 robber+0.135
Neg logit contributions
lights-0.094
ized-0.091
ls-0.091
cking-0.088
led-0.084
Token could
Token ID714
Feature activation+3.70e+00

Pos logit contributions
 fault+0.063
 loser+0.058
 seeker+0.056
 robber+0.056
 liar+0.056
Neg logit contributions
led-0.033
ized-0.033
ls-0.031
ling-0.031
lights-0.031
Token get
Token ID651
Feature activation+9.82e-01

Pos logit contributions
 fault+0.086
 loser+0.074
 robber+0.074
 supposed+0.074
 seeker+0.074
Neg logit contributions
lights-0.072
ized-0.072
cking-0.071
ls-0.069
led-0.065
Token away
Token ID1497
Feature activation-1.66e-01

Pos logit contributions
 fault+0.023
 loser+0.020
 robber+0.020
 seeker+0.020
 supposed+0.020
Neg logit contributions
lights-0.019
ized-0.019
cking-0.018
ls-0.018
led-0.017
Token with
Token ID351
Feature activation-1.66e+00

Pos logit contributions
lights+0.003
ized+0.003
cking+0.003
ls+0.003
led+0.003
Neg logit contributions
 fault-0.004
 robber-0.004
 loser-0.004
 supposed-0.004
 seeker-0.004
Token give
Token ID1577
Feature activation-4.20e-01

Pos logit contributions
 fault+0.063
 supposed+0.058
epad+0.055
 liar+0.051
 seeker+0.050
Neg logit contributions
lights-0.052
ls-0.050
cking-0.050
ized-0.050
 deep-0.050
Token us
Token ID514
Feature activation-7.20e-01

Pos logit contributions
lights+0.008
ized+0.008
cking+0.007
ls+0.007
led+0.007
Neg logit contributions
 fault-0.010
 robber-0.009
 loser-0.009
 supposed-0.009
 seeker-0.009
Token our
Token ID674
Feature activation-2.68e-01

Pos logit contributions
lights+0.012
ized+0.012
ls+0.012
cking+0.012
led+0.011
Neg logit contributions
 fault-0.018
 supposed-0.017
 loser-0.016
 liar-0.016
 seeker-0.016
Token ball
Token ID2613
Feature activation-3.72e-01

Pos logit contributions
lights+0.005
ized+0.005
cking+0.005
ls+0.005
led+0.005
Neg logit contributions
 fault-0.006
 loser-0.006
 robber-0.006
 seeker-0.005
 supposed-0.005
Token back
Token ID736
Feature activation-1.32e+00

Pos logit contributions
lights+0.006
ized+0.006
ls+0.006
cking+0.006
led+0.006
Neg logit contributions
 fault-0.009
 supposed-0.008
 robber-0.008
 seeker-0.008
 liar-0.008
Token.
Token ID13
Feature activation+5.58e+00

Pos logit contributions
lights+0.027
ized+0.027
cking+0.026
ls+0.026
led+0.025
Neg logit contributions
 fault-0.030
 robber-0.026
 loser-0.026
 supposed-0.026
 seeker-0.026
Token We
Token ID775
Feature activation+3.07e+00

Pos logit contributions
 fault+0.124
 supposed+0.123
epad+0.115
 liar+0.106
 robber+0.102
Neg logit contributions
lights-0.105
cks-0.096
ls-0.094
ized-0.091
rac-0.091
Token love
Token ID1842
Feature activation+4.91e-01

Pos logit contributions
 fault+0.061
epad+0.055
 supposed+0.052
 robber+0.051
 loser+0.051
Neg logit contributions
ls-0.064
ized-0.062
led-0.061
cking-0.061
ck-0.060
Token our
Token ID674
Feature activation+6.20e-01

Pos logit contributions
 fault+0.012
 loser+0.010
 seeker+0.010
 robber+0.010
 liar+0.010
Neg logit contributions
ized-0.010
lights-0.009
cking-0.009
ls-0.009
led-0.009
Token ball
Token ID2613
Feature activation-2.26e-01

Pos logit contributions
 fault+0.015
 robber+0.014
 loser+0.014
 supposed+0.013
 seeker+0.013
Neg logit contributions
lights-0.011
ized-0.011
cking-0.011
ls-0.011
led-0.010
Token.
Token ID13
Feature activation+4.80e+00

Pos logit contributions
lights+0.004
ized+0.004
ls+0.004
cking+0.004
led+0.004
Neg logit contributions
 fault-0.006
 supposed-0.005
 robber-0.005
 seeker-0.005
 loser-0.005
Token<|endoftext|>
Token ID50256
Feature activation-7.08e-01

Pos logit contributions
 fault+0.015
 robber+0.014
 loser+0.014
 supposed+0.013
 seeker+0.013
Neg logit contributions
ized-0.009
lights-0.009
cking-0.009
ls-0.009
led-0.009
TokenOnce
Token ID7454
Feature activation-8.91e-01

Pos logit contributions
lights+0.009
ized+0.009
ls+0.009
cking+0.009
led+0.008
Neg logit contributions
 fault-0.021
 robber-0.019
 loser-0.019
 liar-0.019
 supposed-0.019
Token upon
Token ID2402
Feature activation-3.41e-01

Pos logit contributions
cking+0.021
ized+0.020
lights+0.020
led+0.019
ls+0.018
Neg logit contributions
 fault-0.018
 supposed-0.015
 seeker-0.015
 loser-0.015
 robber-0.014
Token a
Token ID257
Feature activation+1.35e+00

Pos logit contributions
 deep+0.005
cking+0.005
ized+0.005
lights+0.005
ls+0.005
Neg logit contributions
 fault-0.009
 loser-0.008
 robber-0.008
 seeker-0.008
 liar-0.008
Token time
Token ID640
Feature activation+4.91e+00

Pos logit contributions
 fault+0.025
 supposed+0.021
 loser+0.021
 robber+0.021
 seeker+0.021
Neg logit contributions
lights-0.033
ized-0.033
cking-0.032
ls-0.032
led-0.031
Token,
Token ID11
Feature activation+3.63e+00

Pos logit contributions
 fault+0.133
 robber+0.120
 supposed+0.120
 liar+0.119
epad+0.118
Neg logit contributions
ls-0.077
lights-0.076
ized-0.074
cking-0.070
 deep-0.068
Token there
Token ID612
Feature activation+1.28e-01

Pos logit contributions
 fault+0.099
 robber+0.088
 liar+0.087
 supposed+0.087
 loser+0.087
Neg logit contributions
lights-0.058
ized-0.057
ls-0.056
cking-0.056
led-0.054
Token was
Token ID373
Feature activation+4.47e+00

Pos logit contributions
 fault+0.003
 robber+0.003
 loser+0.003
 supposed+0.003
 seeker+0.003
Neg logit contributions
lights-0.002
ized-0.002
cking-0.002
ls-0.002
led-0.002
Token an
Token ID281
Feature activation+1.82e+00

Pos logit contributions
 fault+0.124
 loser+0.111
 liar+0.111
 robber+0.108
epad+0.107
Neg logit contributions
 deep-0.068
ized-0.066
lights-0.062
cking-0.062
 grass-0.061
Token attractive
Token ID10966
Feature activation+2.74e+00

Pos logit contributions
 fault+0.045
 robber+0.041
 supposed+0.041
 seeker+0.041
 loser+0.040
Neg logit contributions
lights-0.032
ls-0.031
ized-0.031
cking-0.031
led-0.029
Token beetle
Token ID45489
Feature activation+1.82e+00

Pos logit contributions
 fault+0.072
 supposed+0.064
 loser+0.061
epad+0.061
 liar+0.060
Neg logit contributions
ized-0.046
lights-0.045
ls-0.045
cking-0.042
 deep-0.042
Token a
Token ID257
Feature activation-4.18e-01

Pos logit contributions
lights+0.020
ized+0.020
cking+0.020
ls+0.019
led+0.018
Neg logit contributions
 fault-0.023
 robber-0.020
 loser-0.020
 supposed-0.020
 liar-0.020
Token plane
Token ID6614
Feature activation-1.37e+00

Pos logit contributions
lights+0.005
ized+0.005
ls+0.005
cking+0.005
bel+0.005
Neg logit contributions
 fault-0.012
 supposed-0.011
 loser-0.011
 seeker-0.011
 robber-0.011
Token outside
Token ID2354
Feature activation-3.30e+00

Pos logit contributions
lights+0.027
ls+0.026
led+0.025
ized+0.025
 pans+0.025
Neg logit contributions
 fault-0.029
 supposed-0.027
 robber-0.026
epad-0.026
 seeker-0.026
Token the
Token ID262
Feature activation-3.37e+00

Pos logit contributions
lights+0.050
ized+0.050
cking+0.049
ls+0.048
led+0.045
Neg logit contributions
 fault-0.091
 loser-0.081
 robber-0.081
 supposed-0.081
 seeker-0.081
Token window
Token ID4324
Feature activation-3.38e+00

Pos logit contributions
lights+0.059
led+0.059
cking+0.059
ized+0.059
ls+0.056
Neg logit contributions
 fault-0.085
 seeker-0.078
 loser-0.077
 robber-0.076
 supposed-0.074
Token.
Token ID13
Feature activation+1.70e+00

Pos logit contributions
ls+0.068
cking+0.067
ized+0.065
lights+0.064
led+0.062
Neg logit contributions
 fault-0.076
 supposed-0.067
epad-0.066
 loser-0.066
 indeed-0.064
Token He
Token ID679
Feature activation-1.86e+00

Pos logit contributions
 fault+0.038
 loser+0.034
 seeker+0.034
 supposed+0.034
 liar+0.033
Neg logit contributions
lights-0.036
ized-0.034
ls-0.032
ised-0.031
cking-0.031
Token was
Token ID373
Feature activation+1.48e+00

Pos logit contributions
lights+0.045
cking+0.041
ized+0.041
ls+0.039
ling+0.038
Neg logit contributions
 fault-0.032
 robber-0.030
 supposed-0.030
 liar-0.030
 indeed-0.029
Token so
Token ID523
Feature activation-4.37e-01

Pos logit contributions
 fault+0.034
 supposed+0.031
 seeker+0.030
 robber+0.029
 loser+0.029
Neg logit contributions
lights-0.031
ized-0.030
ls-0.029
cking-0.028
led-0.028
Token excited
Token ID6568
Feature activation+2.62e-01

Pos logit contributions
lights+0.010
ized+0.009
ls+0.009
led+0.009
bing+0.009
Neg logit contributions
 supposed-0.008
 fault-0.008
 liar-0.007
 loser-0.007
 seeker-0.007
Token!
Token ID0
Feature activation+7.07e-01

Pos logit contributions
 fault+0.007
 loser+0.007
 liar+0.007
 seeker+0.007
 robber+0.006
Neg logit contributions
ized-0.004
lights-0.004
ls-0.004
led-0.004
cking-0.004
Token too
Token ID1165
Feature activation+1.09e+00

Pos logit contributions
 fault+0.028
 loser+0.024
 robber+0.024
 supposed+0.023
 seeker+0.023
Neg logit contributions
lights-0.031
ized-0.031
cking-0.031
ls-0.030
led-0.029
Token,
Token ID11
Feature activation+2.58e+00

Pos logit contributions
 fault+0.025
 loser+0.022
 supposed+0.022
 robber+0.022
 liar+0.021
Neg logit contributions
lights-0.022
ized-0.022
cking-0.022
ls-0.021
led-0.020
Token mom
Token ID1995
Feature activation+1.64e+00

Pos logit contributions
 fault+0.061
 supposed+0.057
epad+0.054
 liar+0.052
 seeker+0.051
Neg logit contributions
lights-0.050
ls-0.048
ized-0.047
cking-0.046
 deep-0.044
Token.
Token ID13
Feature activation+4.86e+00

Pos logit contributions
 fault+0.049
 robber+0.047
 loser+0.046
 liar+0.046
 seeker+0.045
Neg logit contributions
lights-0.020
ls-0.019
ized-0.019
cking-0.018
ling-0.018
Token I
Token ID314
Feature activation+3.03e+00

Pos logit contributions
 fault+0.093
 supposed+0.085
epad+0.078
 robber+0.077
 liar+0.076
Neg logit contributions
lights-0.114
ls-0.105
ized-0.104
cking-0.103
cks-0.102
Token want
Token ID765
Feature activation+2.51e+00

Pos logit contributions
 fault+0.058
 robber+0.053
epad+0.052
 liar+0.049
 loser+0.049
Neg logit contributions
 breathed-0.063
ls-0.062
 deep-0.062
s-0.062
 cre-0.062
Token to
Token ID284
Feature activation+1.69e+00

Pos logit contributions
 fault+0.051
 robber+0.044
 loser+0.044
 seeker+0.044
 supposed+0.043
Neg logit contributions
lights-0.056
ized-0.056
ls-0.055
cking-0.055
led-0.052
Token organize
Token ID16481
Feature activation-1.01e+00

Pos logit contributions
 fault+0.038
 robber+0.033
 loser+0.033
 supposed+0.033
 liar+0.033
Neg logit contributions
lights-0.034
ized-0.034
cking-0.033
ls-0.033
led-0.031
Token things
Token ID1243
Feature activation-1.16e+00

Pos logit contributions
lights+0.020
ized+0.020
cking+0.020
ls+0.019
led+0.018
Neg logit contributions
 fault-0.023
 robber-0.020
 loser-0.020
 seeker-0.020
 supposed-0.020
Token like
Token ID588
Feature activation-2.90e-01

Pos logit contributions
lights+0.020
ized+0.020
cking+0.019
ls+0.019
led+0.018
Neg logit contributions
 fault-0.030
 robber-0.027
 loser-0.027
 supposed-0.026
 seeker-0.026
Token the
Token ID262
Feature activation-1.44e+00

Pos logit contributions
lights+0.005
ized+0.005
cking+0.005
ls+0.005
led+0.005
Neg logit contributions
 fault-0.007
 robber-0.006
 loser-0.006
 supposed-0.006
 seeker-0.006
Token up
Token ID510
Feature activation-4.16e+00

Pos logit contributions
lights+0.034
cking+0.034
ized+0.033
cks+0.032
bel+0.031
Neg logit contributions
 fault-0.036
 supposed-0.031
 robber-0.030
 seeker-0.029
 loser-0.029
Token the
Token ID262
Feature activation-2.10e+00

Pos logit contributions
cking+0.090
lights+0.088
cks+0.085
ized+0.083
ls+0.082
Neg logit contributions
 fault-0.085
 indeed-0.071
 seeker-0.070
 supposed-0.069
 loser-0.069
Token music
Token ID2647
Feature activation-2.26e-01

Pos logit contributions
lights+0.036
ized+0.036
cking+0.036
ls+0.035
led+0.033
Neg logit contributions
 fault-0.054
 robber-0.048
 loser-0.047
 supposed-0.047
 seeker-0.047
Token so
Token ID523
Feature activation-1.04e+00

Pos logit contributions
cking+0.004
ized+0.004
lights+0.004
ls+0.004
led+0.004
Neg logit contributions
 fault-0.006
 robber-0.005
 loser-0.005
 supposed-0.005
 seeker-0.005
Token she
Token ID673
Feature activation-4.75e-01

Pos logit contributions
cking+0.022
cks+0.021
lights+0.021
ls+0.019
ized+0.019
Neg logit contributions
 fault-0.021
 loser-0.018
 indeed-0.018
 supposed-0.018
 liar-0.018
Token could
Token ID714
Feature activation+2.78e+00

Pos logit contributions
lights+0.010
cking+0.010
ized+0.010
ls+0.009
cks+0.009
Neg logit contributions
 fault-0.010
 robber-0.009
 supposed-0.009
 loser-0.009
 seeker-0.009
Token listen
Token ID6004
Feature activation-6.79e-01

Pos logit contributions
 fault+0.060
 loser+0.051
 seeker+0.051
 robber+0.051
 supposed+0.051
Neg logit contributions
lights-0.060
ized-0.059
cking-0.058
ls-0.058
led-0.056
Token more
Token ID517
Feature activation-2.29e+00

Pos logit contributions
lights+0.015
cking+0.015
ized+0.015
ls+0.014
led+0.014
Neg logit contributions
 fault-0.014
 supposed-0.012
 loser-0.012
epad-0.011
 robber-0.011
Token closely
Token ID7173
Feature activation-2.13e+00

Pos logit contributions
ized+0.046
cking+0.045
lights+0.045
ls+0.045
led+0.044
Neg logit contributions
 fault-0.052
 supposed-0.045
epad-0.043
 loser-0.043
 robber-0.043
Token.
Token ID13
Feature activation+1.96e-01

Pos logit contributions
cking+0.038
lights+0.038
ized+0.038
ls+0.036
led+0.034
Neg logit contributions
 fault-0.053
 supposed-0.046
 loser-0.045
 robber-0.045
 seeker-0.044
Token
Token ID198
Feature activation-5.62e+00

Pos logit contributions
 fault+0.005
 loser+0.005
 robber+0.005
 liar+0.005
 supposed+0.005
Neg logit contributions
lights-0.003
ized-0.003
ls-0.003
cking-0.003
ised-0.003
Token
Token ID6184
Feature activation+3.81e+00

Pos logit contributions
 fault+0.059
 loser+0.050
 robber+0.050
 supposed+0.050
 seeker+0.050
Neg logit contributions
lights-0.063
ized-0.062
cking-0.061
ls-0.061
led-0.058
Token
Token ID95
Feature activation+2.17e+00

Pos logit contributions
 fault+0.088
 loser+0.076
 robber+0.076
 supposed+0.076
 seeker+0.075
Neg logit contributions
ized-0.076
lights-0.076
cking-0.074
ls-0.073
led-0.070
Token
Token ID26391
Feature activation+4.17e+00

Pos logit contributions
 fault+0.048
 loser+0.041
 liar+0.041
 robber+0.041
 seeker+0.040
Neg logit contributions
ized-0.045
cking-0.044
lights-0.044
ls-0.042
led-0.041
Token
Token ID129
Feature activation+1.01e+00

Pos logit contributions
 fault+0.094
 seeker+0.083
 supposed+0.083
 robber+0.082
 loser+0.082
Neg logit contributions
lights-0.085
ized-0.083
cking-0.082
ls-0.080
led-0.078
Token
Token ID241
Feature activation+1.85e+00

Pos logit contributions
 fault+0.021
 supposed+0.019
 robber+0.018
 seeker+0.017
 liar+0.017
Neg logit contributions
lights-0.025
ls-0.023
ized-0.022
cking-0.022
bel-0.021
TokenThat
Token ID2504
Feature activation+3.60e+00

Pos logit contributions
 fault+0.050
 robber+0.045
 loser+0.043
 liar+0.043
 seeker+0.042
Neg logit contributions
ized-0.027
 deep-0.026
lights-0.026
bel-0.026
The-0.026
Token's
Token ID338
Feature activation+3.29e+00

Pos logit contributions
 fault+0.092
 robber+0.083
 loser+0.082
 liar+0.082
 supposed+0.081
Neg logit contributions
ized-0.061
lights-0.060
ls-0.060
cking-0.057
led-0.054
Token right
Token ID826
Feature activation+1.99e+00

Pos logit contributions
 fault+0.076
 loser+0.068
 liar+0.068
 robber+0.067
epad+0.065
Neg logit contributions
ized-0.061
lights-0.061
cking-0.061
 deep-0.057
ls-0.057
Token!
Token ID0
Feature activation+3.24e+00

Pos logit contributions
 fault+0.057
epad+0.052
 robber+0.051
 supposed+0.051
 liar+0.051
Neg logit contributions
ls-0.026
lights-0.025
 deep-0.025
ized-0.024
 cre-0.024
Tokenâ
Token ID22940
Feature activation+1.89e+00

Pos logit contributions
 fault+0.066
 supposed+0.053
epad+0.052
 liar+0.052
 seeker+0.051
Neg logit contributions
lights-0.072
ized-0.069
ised-0.067
ls-0.067
 deep-0.066
Token
Token ID26391
Feature activation+3.52e+00

Pos logit contributions
 fault+0.031
 supposed+0.027
 robber+0.027
 seeker+0.027
 loser+0.027
Neg logit contributions
lights-0.050
ls-0.048
ized-0.048
cking-0.048
bel-0.045
Token gave
Token ID2921
Feature activation-2.23e+00

Pos logit contributions
lights+0.027
cking+0.025
ized+0.024
ls+0.024
cks+0.023
Neg logit contributions
 fault-0.026
 supposed-0.024
 robber-0.023
 liar-0.022
 indeed-0.022
Token him
Token ID683
Feature activation-3.89e-01

Pos logit contributions
ized+0.052
lights+0.050
ls+0.049
cking+0.049
led+0.049
Neg logit contributions
 fault-0.046
 loser-0.041
 liar-0.040
 seeker-0.040
 robber-0.039
Token a
Token ID257
Feature activation-1.72e+00

Pos logit contributions
lights+0.008
ized+0.008
cking+0.007
ls+0.007
led+0.007
Neg logit contributions
 fault-0.009
 loser-0.008
 robber-0.008
 supposed-0.008
 seeker-0.008
Token big
Token ID1263
Feature activation-1.65e+00

Pos logit contributions
lights+0.022
ized+0.021
cking+0.021
ls+0.019
led+0.019
Neg logit contributions
 fault-0.053
 robber-0.049
 loser-0.048
 supposed-0.048
 liar-0.048
Token hug
Token ID16225
Feature activation+4.84e-01

Pos logit contributions
lights+0.027
ized+0.027
ls+0.027
 deep+0.026
cking+0.025
Neg logit contributions
 fault-0.041
 supposed-0.039
 liar-0.036
 seeker-0.036
 loser-0.035
Token and
Token ID290
Feature activation-3.22e+00

Pos logit contributions
 fault+0.015
 liar+0.014
 supposed+0.014
 loser+0.013
 seeker+0.013
Neg logit contributions
lights-0.005
ized-0.005
led-0.005
ling-0.005
ls-0.005
Token said
Token ID531
Feature activation+2.67e+00

Pos logit contributions
lights+0.050
cking+0.050
ized+0.049
ls+0.047
ling+0.044
Neg logit contributions
 fault-0.087
 robber-0.079
 supposed-0.078
 seeker-0.077
 liar-0.077
Token he
Token ID339
Feature activation+1.24e+00

Pos logit contributions
 fault+0.074
 supposed+0.067
 seeker+0.067
 loser+0.066
 robber+0.065
Neg logit contributions
lights-0.041
ized-0.041
ls-0.041
led-0.039
cking-0.037
Token was
Token ID373
Feature activation+2.28e+00

Pos logit contributions
 fault+0.036
 loser+0.034
 seeker+0.033
 robber+0.033
 liar+0.032
Neg logit contributions
led-0.018
ized-0.017
ls-0.016
ling-0.015
lights-0.015
Token a
Token ID257
Feature activation+9.06e-01

Pos logit contributions
 fault+0.066
 loser+0.059
 robber+0.059
 liar+0.058
 seeker+0.057
Neg logit contributions
ized-0.032
cking-0.032
lights-0.031
ls-0.029
led-0.028
Token very
Token ID845
Feature activation+1.52e+00

Pos logit contributions
 fault+0.023
 supposed+0.020
 seeker+0.020
epad+0.020
 loser+0.020
Neg logit contributions
lights-0.014
ized-0.014
cking-0.014
 deep-0.014
ls-0.014
Token torn
Token ID12445
Feature activation+1.24e+00

Pos logit contributions
 fault+0.017
 supposed+0.015
 loser+0.015
 robber+0.015
 seeker+0.015
Neg logit contributions
lights-0.006
ized-0.006
ls-0.006
cking-0.006
led-0.005
Token ear
Token ID1027
Feature activation+2.93e+00

Pos logit contributions
 fault+0.034
 supposed+0.031
 loser+0.030
 liar+0.030
 robber+0.030
Neg logit contributions
lights-0.019
ized-0.018
ls-0.018
cking-0.017
led-0.016
Token.
Token ID13
Feature activation+8.36e-01

Pos logit contributions
 fault+0.085
 robber+0.079
 liar+0.077
 loser+0.077
 seeker+0.076
Neg logit contributions
lights-0.040
ls-0.039
cking-0.038
ized-0.038
led-0.036
Token Lily
Token ID20037
Feature activation-8.58e-01

Pos logit contributions
 fault+0.018
 robber+0.016
 seeker+0.016
 loser+0.016
 supposed+0.015
Neg logit contributions
cking-0.018
lights-0.017
ized-0.017
ls-0.016
cks-0.016
Token picked
Token ID6497
Feature activation-7.82e-01

Pos logit contributions
lights+0.017
ized+0.017
cking+0.016
ls+0.016
led+0.015
Neg logit contributions
 fault-0.020
 robber-0.018
 loser-0.018
 supposed-0.018
 seeker-0.017
Token it
Token ID340
Feature activation-6.34e-01

Pos logit contributions
ls+0.012
ized+0.012
cking+0.011
lights+0.011
led+0.011
Neg logit contributions
 fault-0.022
 loser-0.020
 liar-0.020
 robber-0.019
 supposed-0.019
Token up
Token ID510
Feature activation-2.93e+00

Pos logit contributions
lights+0.012
cking+0.011
ized+0.010
cks+0.010
ls+0.010
Neg logit contributions
 fault-0.016
 seeker-0.014
 supposed-0.014
 robber-0.014
 loser-0.013
Token and
Token ID290
Feature activation-2.78e+00

Pos logit contributions
lights+0.067
cking+0.065
cks+0.061
ized+0.060
ls+0.060
Neg logit contributions
 fault-0.059
 seeker-0.053
 robber-0.051
 supposed-0.050
 indeed-0.050
Token decided
Token ID3066
Feature activation+3.84e+00

Pos logit contributions
lights+0.054
cking+0.051
ized+0.050
ls+0.048
cks+0.047
Neg logit contributions
 fault-0.062
 robber-0.057
 supposed-0.057
 seeker-0.056
 liar-0.055
Token to
Token ID284
Feature activation+2.84e-01

Pos logit contributions
 fault+0.092
 loser+0.087
 seeker+0.083
 robber+0.081
 liar+0.080
Neg logit contributions
ized-0.074
ls-0.073
cking-0.069
lights-0.068
bel-0.067
Token take
Token ID1011
Feature activation-9.09e-01

Pos logit contributions
 fault+0.007
 loser+0.006
 supposed+0.006
 liar+0.006
 robber+0.006
Neg logit contributions
ized-0.005
lights-0.005
cking-0.005
ls-0.005
led-0.005
Token wanted
Token ID2227
Feature activation+3.89e+00

Pos logit contributions
 fault+0.062
 robber+0.056
epad+0.054
 liar+0.053
 supposed+0.053
Neg logit contributions
ized-0.068
 slowly-0.066
 cre-0.066
bel-0.065
ling-0.065
Token to
Token ID284
Feature activation+1.82e+00

Pos logit contributions
 fault+0.086
 seeker+0.079
 loser+0.079
 liar+0.077
 robber+0.077
Neg logit contributions
lights-0.076
ized-0.076
ls-0.076
cking-0.073
 deep-0.070
Token improve
Token ID2987
Feature activation-1.96e-01

Pos logit contributions
 fault+0.044
 loser+0.038
 liar+0.038
 supposed+0.038
 robber+0.037
Neg logit contributions
ized-0.034
cking-0.033
lights-0.032
ls-0.031
bel-0.030
Token your
Token ID534
Feature activation+1.23e+00

Pos logit contributions
lights+0.004
ized+0.004
cking+0.004
ls+0.004
led+0.003
Neg logit contributions
 fault-0.005
 robber-0.004
 loser-0.004
 supposed-0.004
 seeker-0.004
Token bike
Token ID7161
Feature activation-4.14e-01

Pos logit contributions
 fault+0.030
 robber+0.026
 loser+0.025
 seeker+0.025
 supposed+0.025
Neg logit contributions
lights-0.024
ized-0.023
cking-0.023
ls-0.023
led-0.022
Token so
Token ID523
Feature activation+2.19e-01

Pos logit contributions
lights+0.008
ized+0.008
cking+0.008
ls+0.008
led+0.007
Neg logit contributions
 fault-0.010
 robber-0.009
 loser-0.009
 supposed-0.008
 seeker-0.008
Token you
Token ID345
Feature activation+1.61e+00

Pos logit contributions
 fault+0.004
 loser+0.003
 supposed+0.003
 liar+0.003
 robber+0.003
Neg logit contributions
cking-0.005
lights-0.005
ized-0.005
ls-0.005
led-0.005
Token can
Token ID460
Feature activation+2.78e+00

Pos logit contributions
 fault+0.032
 robber+0.027
 loser+0.027
 liar+0.027
 supposed+0.027
Neg logit contributions
ized-0.037
lights-0.036
led-0.035
ls-0.035
cking-0.035
Token ride
Token ID6594
Feature activation-1.66e+00

Pos logit contributions
 fault+0.064
 liar+0.055
 robber+0.055
 loser+0.054
 supposed+0.054
Neg logit contributions
ized-0.057
cking-0.054
lights-0.053
ls-0.053
ised-0.050
Token faster
Token ID5443
Feature activation-2.15e+00

Pos logit contributions
lights+0.031
ized+0.030
cking+0.030
ls+0.029
led+0.028
Neg logit contributions
 fault-0.040
 robber-0.036
 loser-0.036
 seeker-0.035
 supposed-0.035
Token and
Token ID290
Feature activation-2.61e+00

Pos logit contributions
lights+0.040
ized+0.040
cking+0.039
ls+0.038
led+0.036
Neg logit contributions
 fault-0.052
 robber-0.046
 loser-0.046
 supposed-0.046
 seeker-0.046
Token because
Token ID780
Feature activation+1.05e+00

Pos logit contributions
 fault+0.029
 loser+0.028
 robber+0.027
 liar+0.027
 seeker+0.026
Neg logit contributions
ized-0.017
lights-0.016
ls-0.016
cking-0.015
led-0.015
Token the
Token ID262
Feature activation+8.20e-01

Pos logit contributions
 fault+0.027
 robber+0.024
 loser+0.024
 liar+0.023
 supposed+0.023
Neg logit contributions
lights-0.019
ized-0.019
ls-0.018
cking-0.018
led-0.017
Token inside
Token ID2641
Feature activation-1.72e+00

Pos logit contributions
 fault+0.019
 loser+0.017
 robber+0.017
 seeker+0.016
 supposed+0.016
Neg logit contributions
ized-0.016
lights-0.016
cking-0.016
ls-0.015
led-0.015
Token of
Token ID286
Feature activation-2.62e+00

Pos logit contributions
lights+0.030
ized+0.030
cking+0.029
ls+0.029
led+0.027
Neg logit contributions
 fault-0.044
 robber-0.039
 loser-0.039
 supposed-0.039
 seeker-0.039
Token the
Token ID262
Feature activation-1.63e+00

Pos logit contributions
ls+0.043
lights+0.042
cking+0.041
ized+0.040
led+0.040
Neg logit contributions
 fault-0.069
 supposed-0.063
 seeker-0.062
 robber-0.062
 loser-0.060
Token mailbox
Token ID37282
Feature activation-1.83e+00

Pos logit contributions
lights+0.038
cking+0.037
ized+0.037
ls+0.036
led+0.035
Neg logit contributions
 fault-0.032
 supposed-0.028
 robber-0.027
 loser-0.027
 liar-0.027
Token was
Token ID373
Feature activation+2.14e+00

Pos logit contributions
lights+0.033
cking+0.032
ized+0.032
ls+0.031
led+0.029
Neg logit contributions
 fault-0.046
 supposed-0.040
 robber-0.040
 loser-0.040
 seeker-0.040
Token filled
Token ID5901
Feature activation-3.40e+00

Pos logit contributions
 fault+0.052
 supposed+0.047
 seeker+0.045
 robber+0.045
 liar+0.044
Neg logit contributions
lights-0.040
ized-0.040
ls-0.040
cking-0.039
led-0.037
Token with
Token ID351
Feature activation-2.36e+00

Pos logit contributions
lights+0.053
cking+0.050
ls+0.050
ized+0.049
led+0.045
Neg logit contributions
 fault-0.090
 supposed-0.084
 seeker-0.083
 robber-0.081
 loser-0.081
Token mail
Token ID6920
Feature activation+4.82e-01

Pos logit contributions
lights+0.042
ized+0.041
cking+0.040
ls+0.040
led+0.037
Neg logit contributions
 fault-0.060
 robber-0.053
 supposed-0.053
 loser-0.053
 seeker-0.052
Token.
Token ID13
Feature activation+3.44e-01

Pos logit contributions
 fault+0.014
 loser+0.013
 liar+0.013
 robber+0.013
 seeker+0.013
Neg logit contributions
lights-0.006
ized-0.006
ls-0.006
led-0.006
cking-0.005
Token a
Token ID257
Feature activation+6.68e-01

Pos logit contributions
lights+0.005
ized+0.005
cking+0.004
ls+0.004
led+0.004
Neg logit contributions
 fault-0.005
 loser-0.004
 robber-0.004
 supposed-0.004
 seeker-0.004
Token silly
Token ID14397
Feature activation+3.22e+00

Pos logit contributions
 fault+0.020
 robber+0.018
 loser+0.018
 seeker+0.018
 liar+0.018
Neg logit contributions
lights-0.009
ized-0.009
cking-0.008
ls-0.008
led-0.008
Token clown
Token ID25573
Feature activation+2.08e+00

Pos logit contributions
 fault+0.082
 supposed+0.073
epad+0.071
 loser+0.070
 seeker+0.068
Neg logit contributions
ized-0.056
ls-0.055
lights-0.053
ling-0.052
cking-0.052
Token who
Token ID508
Feature activation+1.36e+00

Pos logit contributions
 fault+0.058
 loser+0.051
 robber+0.051
 liar+0.050
 seeker+0.050
Neg logit contributions
ized-0.033
ls-0.033
cking-0.033
lights-0.031
ling-0.030
Token made
Token ID925
Feature activation+8.89e-01

Pos logit contributions
 fault+0.030
 robber+0.027
 supposed+0.027
 liar+0.027
 seeker+0.026
Neg logit contributions
lights-0.028
cking-0.027
ized-0.026
ls-0.026
ised-0.024
Token them
Token ID606
Feature activation+6.78e-01

Pos logit contributions
 fault+0.021
 loser+0.018
 seeker+0.018
 robber+0.018
 supposed+0.018
Neg logit contributions
lights-0.017
ized-0.017
cking-0.017
ls-0.017
led-0.016
Token laugh
Token ID6487
Feature activation-1.68e-01

Pos logit contributions
 fault+0.017
 robber+0.015
 loser+0.015
 seeker+0.015
 supposed+0.015
Neg logit contributions
lights-0.012
ized-0.012
cking-0.012
ls-0.012
led-0.011
Token.
Token ID13
Feature activation+7.37e-01

Pos logit contributions
lights+0.002
ized+0.002
ls+0.002
cking+0.002
led+0.002
Neg logit contributions
 fault-0.005
 liar-0.004
 loser-0.004
 seeker-0.004
 robber-0.004
Token The
Token ID383
Feature activation+8.98e-01

Pos logit contributions
 fault+0.016
 robber+0.015
 seeker+0.014
 loser+0.014
 supposed+0.014
Neg logit contributions
cking-0.015
ized-0.015
lights-0.015
ls-0.014
led-0.014
Token clown
Token ID25573
Feature activation-5.90e-01

Pos logit contributions
 fault+0.013
 robber+0.012
 loser+0.011
 liar+0.011
 seeker+0.011
Neg logit contributions
lights-0.025
ized-0.025
cking-0.025
ls-0.024
ised-0.023
Token had
Token ID550
Feature activation+2.63e+00

Pos logit contributions
lights+0.011
cking+0.011
ized+0.010
ls+0.010
cks+0.010
Neg logit contributions
 fault-0.014
 robber-0.013
 supposed-0.012
 liar-0.012
 loser-0.012
Token bed
Token ID3996
Feature activation-1.00e+00

Pos logit contributions
led+0.076
ling+0.075
ized+0.075
lights+0.073
bel+0.072
Neg logit contributions
 fault-0.089
 seeker-0.086
 robber-0.083
 supposed-0.081
 yours-0.077
Token and
Token ID290
Feature activation-2.61e+00

Pos logit contributions
cking+0.021
cks+0.020
lights+0.020
ls+0.020
ized+0.019
Neg logit contributions
 fault-0.021
 seeker-0.019
 robber-0.019
 supposed-0.019
epad-0.018
Token ran
Token ID4966
Feature activation+7.84e-02

Pos logit contributions
lights+0.053
cking+0.051
ized+0.051
ised+0.047
cks+0.047
Neg logit contributions
 seeker-0.057
 fault-0.055
 supposed-0.054
 robber-0.053
epad-0.050
Token outside
Token ID2354
Feature activation-3.31e+00

Pos logit contributions
 fault+0.002
 loser+0.002
 liar+0.002
 robber+0.002
 supposed+0.002
Neg logit contributions
lights-0.002
ized-0.002
ls-0.002
cking-0.001
led-0.001
Token.
Token ID13
Feature activation+8.32e-02

Pos logit contributions
lights+0.072
ized+0.071
cking+0.070
ls+0.069
led+0.067
Neg logit contributions
 fault-0.070
 supposed-0.061
 loser-0.060
 seeker-0.060
 robber-0.059
Token
Token ID198
Feature activation-5.44e+00

Pos logit contributions
 fault+0.002
 supposed+0.002
 robber+0.002
 seeker+0.002
 loser+0.002
Neg logit contributions
lights-0.002
ized-0.001
cking-0.001
ls-0.001
led-0.001
Token
Token ID198
Feature activation-3.84e+00

Pos logit contributions
lights+0.115
ized+0.115
cking+0.114
led+0.108
ls+0.103
Neg logit contributions
 fault-0.114
 supposed-0.106
 seeker-0.100
 robber-0.098
 loser-0.097
TokenAt
Token ID2953
Feature activation-3.66e+00

Pos logit contributions
lights+0.063
ized+0.061
cking+0.060
ls+0.059
led+0.056
Neg logit contributions
 fault-0.101
 robber-0.092
 supposed-0.092
 loser-0.092
 seeker-0.091
Token the
Token ID262
Feature activation-4.13e-01

Pos logit contributions
lights+0.073
led+0.072
ized+0.071
cking+0.069
ling+0.068
Neg logit contributions
 seeker-0.088
 fault-0.086
 supposed-0.076
 robber-0.073
 loser-0.072
Token park
Token ID3952
Feature activation-1.32e+00

Pos logit contributions
lights+0.009
cking+0.009
ized+0.009
led+0.008
ls+0.008
Neg logit contributions
 fault-0.009
 seeker-0.009
 robber-0.008
 loser-0.008
 supposed-0.008
Token,
Token ID11
Feature activation+1.24e+00

Pos logit contributions
lights+0.028
cking+0.027
ized+0.027
ls+0.026
led+0.025
Neg logit contributions
 fault-0.029
 seeker-0.025
 loser-0.025
 robber-0.025
 supposed-0.025
Token the
Token ID262
Feature activation-3.39e+00

Pos logit contributions
lights+0.062
ized+0.061
cking+0.059
ls+0.058
led+0.053
Neg logit contributions
 fault-0.147
 robber-0.133
 loser-0.132
 supposed-0.132
 seeker-0.131
Token air
Token ID1633
Feature activation-1.52e+00

Pos logit contributions
lights+0.059
ized+0.058
cking+0.057
ls+0.056
led+0.053
Neg logit contributions
 fault-0.088
 robber-0.080
 loser-0.079
 seeker-0.079
 liar-0.078
Token.
Token ID13
Feature activation+7.38e-01

Pos logit contributions
lights+0.026
ized+0.026
cking+0.026
ls+0.026
led+0.024
Neg logit contributions
 fault-0.038
 loser-0.034
 robber-0.034
 supposed-0.034
 seeker-0.033
Token
Token ID220
Feature activation-2.34e-02

Pos logit contributions
 fault+0.018
 supposed+0.017
 seeker+0.016
 loser+0.016
 liar+0.016
Neg logit contributions
lights-0.013
ized-0.013
ls-0.012
cking-0.012
ised-0.011
Token
Token ID198
Feature activation-5.64e+00

Pos logit contributions
ized+0.000
lights+0.000
cking+0.000
ls+0.000
led+0.000
Neg logit contributions
 fault-0.001
 supposed-0.001
 loser-0.001
 robber-0.001
 liar-0.001
Token
Token ID198
Feature activation-4.58e+00

Pos logit contributions
lights+0.100
ized+0.097
cking+0.097
ls+0.093
led+0.091
Neg logit contributions
 fault-0.142
 loser-0.126
 robber-0.126
 supposed-0.125
 liar-0.124
TokenAlice
Token ID44484
Feature activation-1.30e+00

Pos logit contributions
lights+0.060
ized+0.060
cking+0.058
ls+0.057
led+0.053
Neg logit contributions
 fault-0.137
 robber-0.123
 supposed-0.122
 loser-0.122
 seeker-0.122
Token tried
Token ID3088
Feature activation+2.14e+00

Pos logit contributions
lights+0.023
ized+0.022
cking+0.022
ls+0.022
led+0.020
Neg logit contributions
 fault-0.033
 robber-0.029
 loser-0.029
 supposed-0.029
 liar-0.029
Token the
Token ID262
Feature activation-8.55e-02

Pos logit contributions
 fault+0.049
 loser+0.045
 seeker+0.045
 liar+0.045
 robber+0.044
Neg logit contributions
ized-0.040
lights-0.039
ls-0.039
cking-0.039
 deep-0.037
Token game
Token ID983
Feature activation-2.16e-01

Pos logit contributions
ized+0.002
 deep+0.002
cking+0.002
ls+0.002
lights+0.002
Neg logit contributions
 fault-0.002
 supposed-0.001
 liar-0.001
 seeker-0.001
 loser-0.001
Token and
Token ID290
Feature activation-3.52e+00

Pos logit contributions
lights+0.004
ized+0.004
ls+0.004
cking+0.004
led+0.004
Neg logit contributions
 fault-0.005
 seeker-0.005
 supposed-0.005
 robber-0.005
 loser-0.005
Token She
Token ID1375
Feature activation-1.09e+00

Pos logit contributions
 fault+0.034
 liar+0.030
 supposed+0.030
 loser+0.030
 seeker+0.030
Neg logit contributions
lights-0.024
ized-0.022
ls-0.022
cking-0.021
ised-0.021
Token would
Token ID561
Feature activation+2.97e+00

Pos logit contributions
lights+0.023
ized+0.023
cking+0.022
ls+0.022
led+0.021
Neg logit contributions
 fault-0.024
 robber-0.021
 supposed-0.021
 loser-0.021
 seeker-0.021
Token line
Token ID1627
Feature activation-2.42e+00

Pos logit contributions
 fault+0.069
 loser+0.060
 liar+0.059
 robber+0.058
 supposed+0.058
Neg logit contributions
lights-0.059
ized-0.057
cking-0.057
ls-0.054
bel-0.053
Token them
Token ID606
Feature activation+1.60e-02

Pos logit contributions
cking+0.044
ized+0.043
lights+0.043
ls+0.042
led+0.039
Neg logit contributions
 fault-0.059
 supposed-0.052
 robber-0.052
 loser-0.051
 seeker-0.051
Token all
Token ID477
Feature activation-3.80e+00

Pos logit contributions
 fault+0.000
 supposed+0.000
 seeker+0.000
 robber+0.000
 loser+0.000
Neg logit contributions
lights-0.000
ized-0.000
cking-0.000
ls-0.000
led-0.000
Token up
Token ID510
Feature activation-4.20e+00

Pos logit contributions
ized+0.072
ls+0.071
cking+0.070
lights+0.070
ised+0.063
Neg logit contributions
 fault-0.093
 supposed-0.085
 seeker-0.078
epad-0.077
 loser-0.076
Token,
Token ID11
Feature activation-2.29e-01

Pos logit contributions
cking+0.074
lights+0.074
ized+0.073
ls+0.070
led+0.068
Neg logit contributions
 fault-0.105
 supposed-0.093
 seeker-0.092
 robber-0.092
 loser-0.091
Token give
Token ID1577
Feature activation-9.91e-01

Pos logit contributions
lights+0.005
ls+0.005
ized+0.005
cking+0.004
led+0.004
Neg logit contributions
 fault-0.005
 supposed-0.005
 robber-0.004
 seeker-0.004
epad-0.004
Token them
Token ID606
Feature activation-8.45e-01

Pos logit contributions
lights+0.018
ized+0.018
cking+0.017
ls+0.017
led+0.016
Neg logit contributions
 fault-0.025
 loser-0.022
 robber-0.022
 supposed-0.022
 seeker-0.022
Token cups
Token ID14180
Feature activation-1.87e+00

Pos logit contributions
lights+0.016
ized+0.016
cking+0.016
ls+0.016
led+0.015
Neg logit contributions
 fault-0.020
 loser-0.018
 robber-0.018
 liar-0.017
 supposed-0.017
Token and
Token ID290
Feature activation-2.44e+00

Pos logit contributions
cking+0.034
ized+0.034
lights+0.034
ls+0.033
led+0.031
Neg logit contributions
 fault-0.045
 robber-0.040
 loser-0.040
 supposed-0.039
 seeker-0.039
Token l
Token ID300
Feature activation+1.48e+00

Pos logit contributions
 fault+0.014
 supposed+0.013
 seeker+0.013
 robber+0.013
 loser+0.013
Neg logit contributions
lights-0.008
ized-0.007
ls-0.007
cking-0.007
 deep-0.007
Tokenoll
Token ID692
Feature activation+1.70e+00

Pos logit contributions
 fault+0.028
 robber+0.023
 supposed+0.023
 loser+0.022
 liar+0.022
Neg logit contributions
ized-0.036
ls-0.035
cking-0.035
lights-0.035
led-0.034
Tokenipop
Token ID42800
Feature activation+1.39e+00

Pos logit contributions
 fault+0.041
 robber+0.036
 loser+0.035
 supposed+0.035
 seeker+0.035
Neg logit contributions
ls-0.032
lights-0.032
ized-0.032
led-0.031
cking-0.031
Token.
Token ID13
Feature activation+1.23e-01

Pos logit contributions
 fault+0.037
 robber+0.033
 loser+0.033
 liar+0.033
 seeker+0.033
Neg logit contributions
ized-0.023
lights-0.022
ls-0.022
cking-0.021
led-0.020
Token
Token ID198
Feature activation-6.35e+00

Pos logit contributions
 fault+0.003
 robber+0.003
 loser+0.003
 supposed+0.003
 seeker+0.003
Neg logit contributions
lights-0.002
cking-0.002
ized-0.002
ls-0.002
led-0.002
Token
Token ID198
Feature activation-4.90e+00

Pos logit contributions
lights+0.141
cking+0.134
ized+0.134
led+0.128
ls+0.123
Neg logit contributions
 fault-0.132
 supposed-0.120
 seeker-0.113
 loser-0.112
 robber-0.111
TokenHer
Token ID9360
Feature activation+2.97e+00

Pos logit contributions
lights+0.067
ls+0.063
cking+0.062
ized+0.060
led+0.059
Neg logit contributions
 fault-0.137
 loser-0.133
 robber-0.132
 supposed-0.130
 seeker-0.130
Token eyes
Token ID2951
Feature activation-3.03e+00

Pos logit contributions
 fault+0.071
 robber+0.062
 loser+0.062
 supposed+0.062
 seeker+0.062
Neg logit contributions
lights-0.057
ized-0.057
cking-0.056
ls-0.055
led-0.052
Token lit
Token ID6578
Feature activation-2.45e+00

Pos logit contributions
lights+0.061
ls+0.060
cks+0.059
ang+0.058
cking+0.057
Neg logit contributions
 indeed-0.058
 supposed-0.052
 like-0.051
 fault-0.050
epad-0.050
Token up
Token ID510
Feature activation-4.11e+00

Pos logit contributions
lights+0.042
ls+0.038
cking+0.038
bel+0.035
cks+0.035
Neg logit contributions
 fault-0.060
 supposed-0.053
 indeed-0.053
 seeker-0.052
epad-0.051
Token with
Token ID351
Feature activation-3.38e+00

Pos logit contributions
lights+0.092
cking+0.090
cks+0.088
ls+0.086
led+0.082
Neg logit contributions
 fault-0.075
 indeed-0.074
 seeker-0.067
 supposed-0.066
 loser-0.066
Token doing
Token ID1804
Feature activation+3.89e+00

Pos logit contributions
 fault+0.059
 loser+0.050
 liar+0.049
 robber+0.049
 seeker+0.048
Neg logit contributions
ized-0.048
ls-0.047
cking-0.046
lights-0.045
ling-0.045
Token inside
Token ID2641
Feature activation-8.98e-01

Pos logit contributions
 fault+0.104
 liar+0.093
 loser+0.092
 robber+0.090
epad+0.090
Neg logit contributions
ized-0.063
lights-0.060
ls-0.058
ling-0.057
cking-0.057
Token the
Token ID262
Feature activation-1.90e+00

Pos logit contributions
lights+0.020
ized+0.020
cking+0.020
ls+0.020
led+0.019
Neg logit contributions
 fault-0.018
 robber-0.015
 loser-0.015
 seeker-0.015
 supposed-0.015
Token freezer
Token ID30967
Feature activation-4.06e-01

Pos logit contributions
lights+0.037
ized+0.037
cking+0.036
ls+0.035
led+0.034
Neg logit contributions
 fault-0.045
 robber-0.040
 loser-0.039
 seeker-0.039
 supposed-0.039
Token?"
Token ID1701
Feature activation-2.32e+00

Pos logit contributions
lights+0.009
ized+0.009
cking+0.009
ls+0.009
led+0.009
Neg logit contributions
 fault-0.008
 robber-0.007
 loser-0.007
 supposed-0.007
 seeker-0.007
Token
Token ID198
Feature activation-5.94e+00

Pos logit contributions
lights+0.038
ized+0.038
cking+0.038
ls+0.037
led+0.035
Neg logit contributions
 fault-0.061
 robber-0.054
 loser-0.054
 supposed-0.054
 liar-0.054
Token
Token ID198
Feature activation-5.22e+00

Pos logit contributions
ized+0.123
cking+0.122
lights+0.122
led+0.117
ls+0.115
Neg logit contributions
 fault-0.124
 loser-0.115
 robber-0.112
 supposed-0.111
 liar-0.110
TokenMom
Token ID29252
Feature activation-2.63e+00

Pos logit contributions
lights+0.076
ized+0.074
ls+0.073
cking+0.073
led+0.069
Neg logit contributions
 fault-0.142
 loser-0.137
 robber-0.135
 liar-0.132
 supposed-0.131
Tokenmy
Token ID1820
Feature activation-2.06e+00

Pos logit contributions
lights+0.066
ized+0.066
cking+0.065
ls+0.064
led+0.062
Neg logit contributions
 fault-0.047
 loser-0.039
 robber-0.039
 seeker-0.039
 supposed-0.039
Token smiled
Token ID13541
Feature activation-1.29e+00

Pos logit contributions
lights+0.035
ized+0.034
cking+0.034
ls+0.033
led+0.030
Neg logit contributions
 fault-0.053
 robber-0.048
 supposed-0.047
 liar-0.046
 loser-0.046
Token and
Token ID290
Feature activation-3.65e+00

Pos logit contributions
cking+0.024
lights+0.024
ized+0.024
ls+0.023
cks+0.022
Neg logit contributions
 fault-0.030
 robber-0.027
 loser-0.026
 seeker-0.026
 supposed-0.026