TOP ACTIVATIONS
MAX = 4.610

. Would the pool stay closed , he wondered ?
to sleep . He sn ugg led into bed and put
he could have it . M ummy said yes , and
. <|endoftext|> It was a warm summer day when Billy and
. The moral of the story is that hugs
that morning . M ummy and Daddy smiled at each other
Mom my and Daddy made sure to give
! <|endoftext|> It was a warm day , so Angela decided
. M ummy and Daddy were very happy that
The sweater was very sn ugg ly and warm . Lily
The moral of the story is that it
great !" Kat ie was happy and said
! <|endoftext|> It was a bright day when Mary and her
to do . M ummy said : " We
! Mom my and Daddy said it was time
The moral of this story is that we
my asked . M ummy smiled . She said
The moral of the story is that it
The moral of the story is that if
!" The moral of the story is that we

MIN ACTIVATIONS
MIN = -5.925

more cautious and not take unnecessary risks . The memory of
be careful and not take unnecessary risks . The moral of
be prepared and not take unnecessary risks . <|endoftext|> Once upon
and friends and not take unnecessary risks . We should always
your parents and not take unnecessary risks . <|endoftext|> One day
friends , and never take unnecessary risks . <|endoftext|> Once upon
instinct can often save us unnecessary danger . <|endoftext|> Once upon
the man if she could have a ride . The man
her mom if she could have a slice of pie .
, I won 't let anyone find you ! I 'll
her mom if she could have one too . Her mom
cautiously went closer to take a look . Inside there was
the tree if she could have some . The tree replied
went anywhere without her brave policeman there to keep her safe
“ let ’ s explore together !
't like all of the unnecessary extravag ance .
and he would never let anyone else get near it .
her mom if she could have one too . Her mom
mom my if she could have it . Her mom my
shop keeper if they could have a little pie .

INTERVAL 1.976 - 4.610
CONTAINS 1.947%

. They put the dirty plates and cups inside the dish
anything ! Sarah and Jack kept joking until the
boy named Tim found a big box . It was a
in a towel . She dried him and dressed him and
. Next to the patch , lived a small girl

INTERVAL -0.658 - 1.976
CONTAINS 74.106%

garden often , and the wise old bird always warned him
found a good hiding spot , but he accidentally fell and
little girl called Clara . Clara had a little brother who
not so terrible ." Lily whispered to Max .
listened and stopped . They all became friends and played happily

INTERVAL -3.291 - -0.658
CONTAINS 23.828%

" What happened ?" Dad asked . He saw Tom
She ran over to it and asked her mom , "
angry . " Hey , that 's not nice
that they were going to a big building with an elevator
lips and tongue to make the same funny noise . Sammy

INTERVAL -5.925 - -3.291
CONTAINS 0.120%

and said , “ We won !
she said he couldn 't have one yet . Randy didn
Lu cy took a few steps closer and looked
. " Ha ha , Ben , you can
peeled the banana and took a big bite . Suddenly ,
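
NOTE ON THE INTERVALS
The dump does not say how the four interval bounds were chosen. The printed bounds are consistent with splitting the range from MIN to MAX into four equal-width bins, and the four CONTAINS shares add up to roughly 100%. The sketch below checks both observations. The helper name interval_edges and the equal-width assumption are illustrative and not taken from the tool that produced this dump.

```python
def interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# MIN and MAX as printed in the dump (already rounded to 3 decimals).
edges = interval_edges(-5.925, 4.610, 4)

# Interval bounds as printed in the dump, lowest to highest.
printed = [-5.925, -3.291, -0.658, 1.976, 4.610]

# Largest disagreement between the computed and the printed bounds.
max_gap = max(abs(e - p) for e, p in zip(edges, printed))

# Shares printed under each CONTAINS line, in percent, lowest interval first.
shares = [0.120, 23.828, 74.106, 1.947]
total = sum(shares)

print([round(e, 3) for e in edges])
print(max_gap, total)
```

Because MIN and MAX are themselves rounded in the dump, small disagreements in the last printed digit are expected and do not contradict the equal-width reading.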
Token.
Token ID13
Feature activation+3.05e-01

Pos logit contributions
ves+0.006
 liar+0.006
ggy+0.006
ork+0.006
fire+0.006
Neg logit contributions
efully-0.008
ummy-0.007
 forth-0.007
el-0.007
ipp-0.007
Token Would
Token ID10928
Feature activation+7.17e-01

Pos logit contributions
efully+0.006
ipp+0.005
ummy+0.005
inding+0.005
 forth+0.005
Neg logit contributions
ves-0.009
fire-0.009
ggy-0.009
 liar-0.009
ork-0.008
Token the
Token ID262
Feature activation+2.60e-01

Pos logit contributions
efully+0.019
ipp+0.018
inding+0.018
ummy+0.018
 forth+0.018
Neg logit contributions
ves-0.016
ggy-0.014
fire-0.014
ork-0.014
 liar-0.014
Token pool
Token ID5933
Feature activation+1.28e+00

Pos logit contributions
efully+0.005
ummy+0.005
ipp+0.005
 forth+0.005
inding+0.005
Neg logit contributions
ves-0.007
ork-0.007
fire-0.007
ggy-0.007
 liar-0.007
Token stay
Token ID2652
Feature activation+3.00e+00

Pos logit contributions
efully+0.027
inding+0.025
ipp+0.025
el+0.024
 forth+0.024
Neg logit contributions
ves-0.035
fire-0.034
ggy-0.033
ork-0.033
 liar-0.032
Token closed
Token ID4838
Feature activation+4.61e+00

Pos logit contributions
efully+0.069
ipp+0.069
ummy+0.065
inding+0.064
aleb+0.061
Neg logit contributions
fire-0.072
ves-0.069
ggy-0.067
 jail-0.066
ork-0.063
Token,
Token ID11
Feature activation+1.56e+00

Pos logit contributions
efully+0.104
ipp+0.097
inding+0.095
 forth+0.090
aleb+0.089
Neg logit contributions
ves-0.114
fire-0.110
ork-0.108
 jail-0.104
ggy-0.104
Token he
Token ID339
Feature activation+1.47e+00

Pos logit contributions
efully+0.034
ipp+0.031
inding+0.030
ummy+0.029
aleb+0.029
Neg logit contributions
ves-0.042
fire-0.040
ggy-0.039
ork-0.039
 jail-0.038
Token wondered
Token ID14028
Feature activation+4.27e-01

Pos logit contributions
efully+0.028
ipp+0.025
ummy+0.025
inding+0.025
 forth+0.024
Neg logit contributions
ves-0.044
fire-0.041
ork-0.040
ggy-0.040
 liar-0.040
Token?
Token ID30
Feature activation-1.87e-01

Pos logit contributions
efully+0.010
ipp+0.009
inding+0.009
ummy+0.009
el+0.009
Neg logit contributions
ves-0.011
fire-0.010
ggy-0.010
ork-0.010
 liar-0.010
Token
Token ID220
Feature activation-6.09e-01

Pos logit contributions
ves+0.005
ggy+0.005
fire+0.005
ork+0.005
 liar+0.005
Neg logit contributions
efully-0.004
ipp-0.004
ummy-0.004
inding-0.004
el-0.004
Token to
Token ID284
Feature activation+1.18e+00

Pos logit contributions
efully+0.050
ipp+0.046
 forth+0.045
inding+0.045
asured+0.044
Neg logit contributions
ves-0.051
ork-0.047
fire-0.047
ggy-0.047
 liar-0.045
Token sleep
Token ID3993
Feature activation+1.42e+00

Pos logit contributions
efully+0.026
ipp+0.025
asured+0.024
inding+0.024
ishing+0.023
Neg logit contributions
ves-0.031
ggy-0.029
 jail-0.029
ork-0.029
fire-0.029
Token.
Token ID13
Feature activation+8.57e-01

Pos logit contributions
efully+0.029
ummy+0.029
ipp+0.028
 forth+0.028
inding+0.027
Neg logit contributions
ves-0.037
 liar-0.036
ork-0.034
ggy-0.034
fire-0.034
Token He
Token ID679
Feature activation+2.15e+00

Pos logit contributions
efully+0.018
 forth+0.018
ipp+0.017
inding+0.017
ummy+0.017
Neg logit contributions
ves-0.022
fire-0.021
ork-0.021
ggy-0.021
 jail-0.020
Token sn
Token ID3013
Feature activation+2.43e+00

Pos logit contributions
efully+0.049
ummy+0.047
inding+0.047
ipp+0.047
 forth+0.045
Neg logit contributions
ves-0.055
ggy-0.051
fire-0.050
ork-0.050
 liar-0.048
Tokenugg
Token ID6837
Feature activation+4.53e+00

Pos logit contributions
 forth+0.057
efully+0.057
inding+0.056
 prett+0.055
 oldest+0.054
Neg logit contributions
ves-0.057
ork-0.055
ggy-0.053
fire-0.050
con-0.048
Tokenled
Token ID992
Feature activation+2.76e+00

Pos logit contributions
 definitely+0.128
inding+0.125
initely+0.125
ipp+0.123
efully+0.123
Neg logit contributions
ves-0.091
l-0.082
ork-0.077
fire-0.075
ger-0.074
Token into
Token ID656
Feature activation+9.69e-01

Pos logit contributions
efully+0.065
ipp+0.064
aleb+0.062
inding+0.060
ummy+0.057
Neg logit contributions
ves-0.062
fire-0.060
l-0.057
ork-0.057
 traps-0.055
Token bed
Token ID3996
Feature activation+2.55e+00

Pos logit contributions
efully+0.026
ipp+0.025
inding+0.024
 forth+0.024
aleb+0.024
Neg logit contributions
ves-0.020
 jail-0.019
fire-0.018
ork-0.018
ggy-0.018
Token and
Token ID290
Feature activation+8.87e-02

Pos logit contributions
efully+0.054
 forth+0.049
ipp+0.048
inding+0.048
el+0.047
Neg logit contributions
ves-0.073
fire-0.068
ork-0.067
ggy-0.067
 liar-0.063
Token put
Token ID1234
Feature activation+8.40e-01

Pos logit contributions
efully+0.002
ipp+0.002
inding+0.002
ummy+0.002
aleb+0.002
Neg logit contributions
ves-0.002
fire-0.002
ggy-0.002
ork-0.002
 liar-0.002
Token he
Token ID339
Feature activation-1.28e+00

Pos logit contributions
efully+0.037
ipp+0.035
ummy+0.035
inding+0.035
 forth+0.035
Neg logit contributions
ves-0.027
fire-0.024
ork-0.024
ggy-0.024
 liar-0.023
Token could
Token ID714
Feature activation+3.38e-01

Pos logit contributions
ves+0.042
fire+0.040
ork+0.039
ggy+0.038
 liar+0.038
Neg logit contributions
efully-0.020
ipp-0.018
ummy-0.017
inding-0.017
 both-0.017
Token have
Token ID423
Feature activation-3.98e+00

Pos logit contributions
efully+0.006
ipp+0.005
ummy+0.005
inding+0.005
el+0.005
Neg logit contributions
ves-0.010
fire-0.010
 liar-0.010
ork-0.010
ggy-0.010
Token it
Token ID340
Feature activation-2.21e-01

Pos logit contributions
fire+0.125
ves+0.121
 liar+0.118
 troll+0.114
ork+0.114
Neg logit contributions
 both-0.072
 each-0.069
 all-0.067
efully-0.067
 how-0.064
Token.
Token ID13
Feature activation+2.82e+00

Pos logit contributions
ves+0.004
fire+0.004
 liar+0.004
ork+0.004
ggy+0.004
Neg logit contributions
efully-0.006
ipp-0.006
ummy-0.006
 forth-0.006
inding-0.006
Token M
Token ID337
Feature activation+4.52e+00

Pos logit contributions
asured+0.061
 forth+0.060
efully+0.058
inding+0.057
aleb+0.055
Neg logit contributions
fire-0.066
ork-0.064
ves-0.063
 troll-0.060
 jail-0.060
Tokenummy
Token ID13513
Feature activation+1.51e+00

Pos logit contributions
 forth+0.063
initely+0.059
 prett+0.055
Sounds+0.053
efully+0.053
Neg logit contributions
ggy-0.129
ork-0.129
th-0.125
ves-0.123
,"-0.118
Token said
Token ID531
Feature activation+8.29e-01

Pos logit contributions
efully+0.038
inding+0.036
ipp+0.036
ummy+0.036
 forth+0.035
Neg logit contributions
ves-0.035
ggy-0.032
ork-0.031
fire-0.031
 liar-0.030
Token yes
Token ID3763
Feature activation+4.28e-01

Pos logit contributions
efully+0.017
ummy+0.016
ipp+0.016
inding+0.016
 forth+0.015
Neg logit contributions
ves-0.023
fire-0.021
ork-0.021
 liar-0.021
ggy-0.021
Token,
Token ID11
Feature activation+5.82e-01

Pos logit contributions
efully+0.009
ipp+0.008
 forth+0.008
inding+0.008
ummy+0.008
Neg logit contributions
ves-0.012
ggy-0.011
ork-0.011
fire-0.011
 liar-0.011
Token and
Token ID290
Feature activation-5.69e-01

Pos logit contributions
efully+0.012
ipp+0.011
 forth+0.011
inding+0.011
asured+0.011
Neg logit contributions
ves-0.016
ggy-0.014
 liar-0.014
fire-0.014
ork-0.014
Token.
Token ID13
Feature activation+4.16e-01

Pos logit contributions
efully+0.022
ipp+0.021
inding+0.020
 forth+0.020
ummy+0.020
Neg logit contributions
ves-0.033
fire-0.031
ork-0.031
ggy-0.031
 liar-0.031
Token<|endoftext|>
Token ID50256
Feature activation-1.22e+00

Pos logit contributions
efully+0.008
ipp+0.008
ummy+0.008
inding+0.007
 forth+0.007
Neg logit contributions
ves-0.012
ggy-0.011
fire-0.011
ork-0.011
 liar-0.011
TokenIt
Token ID1026
Feature activation+2.05e+00

Pos logit contributions
 liar+0.038
ves+0.037
 villains+0.034
ggy+0.034
 inconvenience+0.034
Neg logit contributions
efully-0.023
ummy-0.021
el-0.021
 both-0.021
ipp-0.020
Token was
Token ID373
Feature activation+2.76e+00

Pos logit contributions
efully+0.043
ipp+0.040
ummy+0.039
inding+0.039
 forth+0.039
Neg logit contributions
ves-0.057
ggy-0.052
fire-0.052
ork-0.052
 liar-0.051
Token a
Token ID257
Feature activation+2.11e+00

Pos logit contributions
efully+0.083
 forth+0.080
inding+0.079
ipp+0.077
aleb+0.077
Neg logit contributions
ves-0.043
 jail-0.043
ggy-0.041
fire-0.039
ork-0.038
Token warm
Token ID5814
Feature activation+4.51e+00

Pos logit contributions
efully+0.049
inding+0.044
ipp+0.044
asured+0.044
el+0.042
Neg logit contributions
ves-0.053
fire-0.053
ggy-0.052
 jail-0.051
 liar-0.051
Token summer
Token ID3931
Feature activation+1.89e+00

Pos logit contributions
efully+0.121
inding+0.109
ipp+0.108
asured+0.106
 nic+0.098
Neg logit contributions
fire-0.101
 jail-0.098
ves-0.094
ggy-0.093
ork-0.093
Token day
Token ID1110
Feature activation+4.18e-01

Pos logit contributions
efully+0.039
ipp+0.036
inding+0.036
 forth+0.035
ummy+0.034
Neg logit contributions
ves-0.053
fire-0.050
ggy-0.049
ork-0.048
 liar-0.047
Token when
Token ID618
Feature activation-4.22e-02

Pos logit contributions
efully+0.009
ipp+0.008
ummy+0.008
inding+0.008
 forth+0.008
Neg logit contributions
ves-0.011
fire-0.011
ork-0.011
ggy-0.011
 liar-0.010
Token Billy
Token ID15890
Feature activation-1.03e-01

Pos logit contributions
ves+0.001
ork+0.001
 liar+0.001
fire+0.001
ata+0.001
Neg logit contributions
ummy-0.001
efully-0.001
 forth-0.001
 both-0.001
el-0.001
Token and
Token ID290
Feature activation+1.00e+00

Pos logit contributions
ves+0.003
fire+0.003
ggy+0.003
ork+0.003
 liar+0.003
Neg logit contributions
efully-0.002
ipp-0.002
 forth-0.002
inding-0.002
el-0.002
Token.
Token ID13
Feature activation+8.08e-01

Pos logit contributions
efully+0.030
ummy+0.029
ipp+0.029
 forth+0.029
inding+0.029
Neg logit contributions
ves-0.019
 liar-0.016
fire-0.016
ggy-0.016
ork-0.016
Token
Token ID198
Feature activation+1.37e+00

Pos logit contributions
efully+0.022
ipp+0.021
ummy+0.021
inding+0.021
 forth+0.021
Neg logit contributions
ves-0.017
fire-0.015
ggy-0.015
ork-0.015
 liar-0.015
Token
Token ID198
Feature activation+2.00e+00

Pos logit contributions
efully+0.036
ipp+0.034
inding+0.034
 forth+0.034
ummy+0.034
Neg logit contributions
ves-0.031
fire-0.028
ggy-0.028
ork-0.027
 liar-0.027
TokenThe
Token ID464
Feature activation+1.41e+00

Pos logit contributions
 forth+0.056
asured+0.053
efully+0.053
ummy+0.052
el+0.052
Neg logit contributions
ves-0.045
ggy-0.040
ork-0.039
fire-0.038
l-0.034
Token moral
Token ID6573
Feature activation-1.19e-01

Pos logit contributions
efully+0.029
ipp+0.028
inding+0.027
asured+0.027
ummy+0.026
Neg logit contributions
ves-0.038
ggy-0.036
fire-0.036
 troll-0.035
 liar-0.035
Token of
Token ID286
Feature activation+4.51e+00

Pos logit contributions
ves+0.003
 liar+0.003
fire+0.003
ggy+0.003
ork+0.003
Neg logit contributions
efully-0.002
ummy-0.002
ipp-0.002
el-0.002
inding-0.002
Token the
Token ID262
Feature activation+2.38e+00

Pos logit contributions
efully+0.124
ipp+0.117
inding+0.116
 forth+0.115
 favourites+0.113
Neg logit contributions
ves-0.093
fire-0.087
ggy-0.086
ork-0.080
 troll-0.076
Token story
Token ID1621
Feature activation+4.41e-01

Pos logit contributions
efully+0.061
ipp+0.058
inding+0.057
 forth+0.057
asured+0.056
Neg logit contributions
ves-0.055
ggy-0.051
ork-0.050
fire-0.050
 liar-0.049
Token is
Token ID318
Feature activation+4.67e-01

Pos logit contributions
efully+0.011
 forth+0.010
ipp+0.010
aleb+0.010
asured+0.010
Neg logit contributions
ves-0.011
ork-0.010
ggy-0.010
fire-0.010
 liar-0.009
Token that
Token ID326
Feature activation+4.04e-01

Pos logit contributions
efully+0.013
 forth+0.012
ipp+0.011
inding+0.011
aleb+0.011
Neg logit contributions
ves-0.010
ggy-0.010
ork-0.010
fire-0.009
 jail-0.009
Token hugs
Token ID40657
Feature activation+2.48e+00

Pos logit contributions
efully+0.009
ipp+0.009
ummy+0.009
inding+0.009
 forth+0.009
Neg logit contributions
ves-0.010
fire-0.009
ggy-0.009
ork-0.009
 liar-0.009
Token that
Token ID326
Feature activation-5.15e-01

Pos logit contributions
ves+0.063
 liar+0.061
fire+0.061
 villains+0.058
ggy+0.058
Neg logit contributions
ipp-0.047
efully-0.045
 definitely-0.045
 both-0.045
asured-0.045
Token morning
Token ID3329
Feature activation-1.77e-01

Pos logit contributions
ves+0.014
fire+0.013
ggy+0.013
ork+0.013
 liar+0.013
Neg logit contributions
efully-0.011
ipp-0.010
 forth-0.010
ummy-0.010
inding-0.010
Token.
Token ID13
Feature activation+1.37e+00

Pos logit contributions
ves+0.004
 liar+0.003
fire+0.003
ggy+0.003
 villains+0.003
Neg logit contributions
ummy-0.005
efully-0.005
ipp-0.005
 forth-0.004
inding-0.004
Token M
Token ID337
Feature activation+3.68e+00

Pos logit contributions
 forth+0.038
asured+0.036
efully+0.036
inding+0.035
ishing+0.035
Neg logit contributions
ves-0.024
ork-0.024
fire-0.023
 jail-0.022
ata-0.021
Tokenummy
Token ID13513
Feature activation+2.28e+00

Pos logit contributions
 forth+0.050
efully+0.047
 prett+0.043
initely+0.039
 nic+0.038
Neg logit contributions
ork-0.115
ves-0.108
ggy-0.107
handed-0.106
th-0.106
Token and
Token ID290
Feature activation+4.49e+00

Pos logit contributions
efully+0.049
ummy+0.046
inding+0.046
ipp+0.044
 forth+0.044
Neg logit contributions
ves-0.061
ggy-0.057
fire-0.055
ata-0.053
ork-0.053
Token Daddy
Token ID26926
Feature activation+6.13e-01

Pos logit contributions
efully+0.109
asured+0.104
ust+0.102
initely+0.100
amb+0.100
Neg logit contributions
,"-0.088
.-0.082
aster-0.082
th-0.074
 someone-0.072
Token smiled
Token ID13541
Feature activation+2.83e+00

Pos logit contributions
ummy+0.014
ires+0.014
efully+0.014
inding+0.013
iring+0.013
Neg logit contributions
ves-0.015
fire-0.013
ggy-0.013
ata-0.012
l-0.012
Token at
Token ID379
Feature activation+2.04e+00

Pos logit contributions
initely+0.071
aleb+0.070
 forth+0.069
asured+0.068
efully+0.067
Neg logit contributions
fire-0.051
l-0.048
ves-0.048
.-0.048
,"-0.047
Token each
Token ID1123
Feature activation+1.65e-01

Pos logit contributions
ipp+0.024
asured+0.023
efully+0.023
inding+0.023
 forth+0.022
Neg logit contributions
ves-0.068
ggy-0.065
fire-0.065
 jail-0.063
ork-0.063
Token other
Token ID584
Feature activation+8.78e-01

Pos logit contributions
efully+0.005
ipp+0.004
 forth+0.004
 both+0.004
ummy+0.004
Neg logit contributions
ves-0.004
fire-0.003
ork-0.003
 liar-0.003
con-0.003
Token
Token ID220
Feature activation-1.42e+00

Pos logit contributions
efully+0.011
 forth+0.010
inding+0.010
asured+0.010
ipp+0.010
Neg logit contributions
ves-0.007
ork-0.006
fire-0.006
 jail-0.006
ggy-0.006
Token
Token ID198
Feature activation-3.78e-02

Pos logit contributions
 liar+0.035
ves+0.035
 troll+0.034
fire+0.033
 villains+0.032
Neg logit contributions
ipp-0.034
efully-0.033
ummy-0.032
 both-0.032
inding-0.031
Token
Token ID198
Feature activation+1.43e+00

Pos logit contributions
ves+0.001
 liar+0.001
fire+0.001
 troll+0.001
ork+0.001
Neg logit contributions
efully-0.001
ipp-0.001
ummy-0.001
inding-0.001
 both-0.001
TokenMom
Token ID29252
Feature activation+3.36e+00

Pos logit contributions
 forth+0.038
ish+0.038
ummy+0.038
efully+0.038
el+0.037
Neg logit contributions
ves-0.032
fire-0.028
ggy-0.028
ork-0.027
l-0.024
Tokenmy
Token ID1820
Feature activation+1.59e+00

Pos logit contributions
initely+0.077
ires+0.076
 favourite+0.075
efully+0.074
 nic+0.072
Neg logit contributions
l-0.065
,"-0.064
ata-0.063
ves-0.061
ggy-0.058
Token and
Token ID290
Feature activation+4.47e+00

Pos logit contributions
efully+0.034
initely+0.033
ummy+0.032
inding+0.032
ires+0.032
Neg logit contributions
ves-0.040
ggy-0.038
fire-0.037
ork-0.035
 troll-0.035
Token Daddy
Token ID26926
Feature activation+9.39e-01

Pos logit contributions
efully+0.105
initely+0.097
imm+0.096
ags+0.094
asured+0.094
Neg logit contributions
,"-0.086
aster-0.082
 jail-0.075
.-0.074
th-0.074
Token made
Token ID925
Feature activation+1.76e+00

Pos logit contributions
ires+0.021
ummy+0.021
inding+0.020
efully+0.020
iring+0.020
Neg logit contributions
ves-0.021
fire-0.021
ggy-0.020
l-0.019
 killed-0.019
Token sure
Token ID1654
Feature activation+5.67e-01

Pos logit contributions
efully+0.034
ipp+0.032
 forth+0.032
inding+0.032
ummy+0.031
Neg logit contributions
ves-0.051
ggy-0.047
ork-0.047
fire-0.047
 liar-0.045
Token to
Token ID284
Feature activation+1.03e+00

Pos logit contributions
efully+0.014
ipp+0.013
inding+0.013
ummy+0.013
 forth+0.013
Neg logit contributions
ves-0.014
ggy-0.013
ork-0.013
fire-0.013
 liar-0.012
Token give
Token ID1577
Feature activation+8.38e-01

Pos logit contributions
ummy+0.022
efully+0.022
inding+0.020
ipp+0.020
 forth+0.020
Neg logit contributions
fire-0.026
ves-0.026
ggy-0.026
 jail-0.025
ork-0.024
Token!
Token ID0
Feature activation-1.18e-01

Pos logit contributions
efully+0.003
ipp+0.003
inding+0.003
ummy+0.003
 forth+0.003
Neg logit contributions
ves-0.004
fire-0.004
ork-0.004
ggy-0.004
 liar-0.003
Token<|endoftext|>
Token ID50256
Feature activation-1.13e+00

Pos logit contributions
ves+0.003
fire+0.003
 liar+0.003
ggy+0.003
 troll+0.003
Neg logit contributions
efully-0.003
ipp-0.003
ummy-0.003
 both-0.002
el-0.002
TokenIt
Token ID1026
Feature activation+1.81e+00

Pos logit contributions
 liar+0.032
ves+0.032
 villains+0.029
ggy+0.029
fire+0.029
Neg logit contributions
efully-0.024
el-0.023
ummy-0.022
ipp-0.022
 both-0.021
Token was
Token ID373
Feature activation+2.35e+00

Pos logit contributions
efully+0.038
ipp+0.036
inding+0.035
ummy+0.035
 forth+0.035
Neg logit contributions
ves-0.049
ggy-0.045
fire-0.045
ork-0.045
 liar-0.044
Token a
Token ID257
Feature activation+2.05e+00

Pos logit contributions
efully+0.065
inding+0.064
 forth+0.064
ipp+0.062
ummy+0.061
Neg logit contributions
ves-0.044
ggy-0.041
fire-0.040
 jail-0.039
ork-0.037
Token warm
Token ID5814
Feature activation+4.45e+00

Pos logit contributions
efully+0.046
asured+0.043
inding+0.043
ipp+0.042
el+0.040
Neg logit contributions
fire-0.052
ggy-0.051
ves-0.051
 jail-0.049
 liar-0.049
Token day
Token ID1110
Feature activation+4.57e-01

Pos logit contributions
efully+0.125
inding+0.115
asured+0.114
ipp+0.112
ires+0.104
Neg logit contributions
fire-0.094
 jail-0.089
ger-0.084
ves-0.083
ggy-0.083
Token,
Token ID11
Feature activation+4.55e-01

Pos logit contributions
efully+0.011
ipp+0.010
inding+0.010
 forth+0.010
asured+0.010
Neg logit contributions
ves-0.011
fire-0.011
ggy-0.010
ork-0.010
 liar-0.010
Token so
Token ID523
Feature activation-1.11e-01

Pos logit contributions
efully+0.010
ipp+0.009
 forth+0.009
inding+0.009
ummy+0.009
Neg logit contributions
ves-0.012
ggy-0.011
fire-0.011
ork-0.011
 liar-0.010
Token Angela
Token ID17799
Feature activation+5.84e-03

Pos logit contributions
ves+0.003
ggy+0.003
fire+0.003
ork+0.003
 liar+0.002
Neg logit contributions
efully-0.003
ipp-0.002
inding-0.002
 forth-0.002
ummy-0.002
Token decided
Token ID3066
Feature activation-9.16e-01

Pos logit contributions
efully+0.000
ipp+0.000
inding+0.000
ummy+0.000
 forth+0.000
Neg logit contributions
ves-0.000
ork-0.000
 liar-0.000
fire-0.000
ggy-0.000
Token.
Token ID13
Feature activation+1.53e+00

Pos logit contributions
efully+0.037
ipp+0.035
ummy+0.035
inding+0.035
 forth+0.035
Neg logit contributions
ves-0.023
ggy-0.020
fire-0.020
ork-0.020
 liar-0.020
Token
Token ID198
Feature activation+1.72e+00

Pos logit contributions
efully+0.036
 forth+0.036
inding+0.034
asured+0.033
ipp+0.033
Neg logit contributions
ves-0.036
ork-0.035
fire-0.034
ggy-0.033
 liar-0.032
Token
Token ID198
Feature activation+2.69e+00

Pos logit contributions
efully+0.051
 forth+0.050
inding+0.048
ipp+0.048
asured+0.047
Neg logit contributions
ves-0.034
ggy-0.031
ork-0.030
fire-0.029
l-0.027
TokenM
Token ID44
Feature activation+3.72e+00

Pos logit contributions
 forth+0.080
efully+0.076
ish+0.075
ishing+0.074
asured+0.074
Neg logit contributions
ves-0.058
fire-0.050
ggy-0.049
ork-0.049
l-0.044
Tokenummy
Token ID13513
Feature activation+2.83e+00

Pos logit contributions
 forth+0.044
efully+0.041
initely+0.032
 nic+0.031
 prett+0.027
Neg logit contributions
ves-0.138
ork-0.137
ggy-0.129
l-0.128
th-0.127
Token and
Token ID290
Feature activation+4.45e+00

Pos logit contributions
efully+0.061
 forth+0.057
initely+0.057
inding+0.057
ipp+0.056
Neg logit contributions
ves-0.074
ggy-0.070
fire-0.068
 troll-0.067
ork-0.067
Token Daddy
Token ID26926
Feature activation+8.56e-01

Pos logit contributions
efully+0.110
initely+0.105
ust+0.100
asured+0.100
 definitely+0.097
Neg logit contributions
,"-0.088
.-0.077
aster-0.075
."-0.072
l-0.071
Token were
Token ID547
Feature activation+1.41e+00

Pos logit contributions
ires+0.020
ummy+0.019
efully+0.019
inding+0.019
iring+0.019
Neg logit contributions
ves-0.019
ggy-0.018
fire-0.018
 killed-0.017
ata-0.017
Token very
Token ID845
Feature activation+2.62e-01

Pos logit contributions
ummy+0.030
aleb+0.029
efully+0.029
ishing+0.029
 forth+0.029
Neg logit contributions
ggy-0.036
 jail-0.034
ves-0.033
 killed-0.032
 liar-0.032
Token happy
Token ID3772
Feature activation+1.49e+00

Pos logit contributions
efully+0.006
el+0.006
 forth+0.006
inding+0.005
initely+0.005
Neg logit contributions
ves-0.007
ggy-0.006
fire-0.006
 liar-0.006
 jail-0.006
Token that
Token ID326
Feature activation+3.18e-01

Pos logit contributions
efully+0.041
ipp+0.039
initely+0.038
inding+0.038
 forth+0.038
Neg logit contributions
ves-0.030
fire-0.028
ggy-0.028
ork-0.027
 liar-0.026
TokenThe
Token ID464
Feature activation+1.49e+00

Pos logit contributions
efully+0.046
 forth+0.044
el+0.044
ummy+0.043
asured+0.043
Neg logit contributions
ves-0.041
ork-0.037
ggy-0.036
fire-0.034
ata-0.033
Token sweater
Token ID35685
Feature activation+1.22e+00

Pos logit contributions
efully+0.031
ipp+0.030
inding+0.030
asured+0.029
el+0.028
Neg logit contributions
ves-0.040
ggy-0.038
ork-0.037
fire-0.036
l-0.036
Token was
Token ID373
Feature activation+2.54e+00

Pos logit contributions
efully+0.027
ummy+0.025
ipp+0.025
inding+0.025
asured+0.024
Neg logit contributions
ves-0.033
ggy-0.031
ork-0.031
fire-0.030
 liar-0.029
Token very
Token ID845
Feature activation+1.59e+00

Pos logit contributions
efully+0.059
aleb+0.056
amb+0.053
ipp+0.052
inding+0.052
Neg logit contributions
ves-0.065
ggy-0.064
 killed-0.060
ork-0.058
 jail-0.058
Token sn
Token ID3013
Feature activation+1.98e+00

Pos logit contributions
efully+0.043
 forth+0.039
inding+0.039
aleb+0.038
ipp+0.038
Neg logit contributions
ves-0.036
ggy-0.035
l-0.033
ork-0.033
fire-0.030
Tokenugg
Token ID6837
Feature activation+4.41e+00

Pos logit contributions
efully+0.038
ipp+0.035
ummy+0.035
inding+0.034
 forth+0.034
Neg logit contributions
ves-0.059
fire-0.054
ggy-0.054
ork-0.054
 liar-0.054
Tokenly
Token ID306
Feature activation+6.57e-01

Pos logit contributions
efully+0.102
inding+0.099
ipp+0.098
 definitely+0.096
initely+0.096
Neg logit contributions
ves-0.110
ork-0.101
fire-0.100
l-0.100
ger-0.096
Token and
Token ID290
Feature activation-1.26e-01

Pos logit contributions
efully+0.012
ipp+0.011
ummy+0.010
inding+0.010
 forth+0.010
Neg logit contributions
ves-0.020
fire-0.019
ork-0.019
ggy-0.019
 liar-0.018
Token warm
Token ID5814
Feature activation+1.91e+00

Pos logit contributions
ves+0.003
fire+0.003
ggy+0.003
ork+0.003
con+0.003
Neg logit contributions
efully-0.003
inding-0.003
ipp-0.003
ummy-0.003
aleb-0.003
Token.
Token ID13
Feature activation+1.49e+00

Pos logit contributions
efully+0.048
inding+0.045
ipp+0.045
asured+0.044
el+0.043
Neg logit contributions
ves-0.044
fire-0.042
ggy-0.040
ork-0.039
con-0.038
Token Lily
Token ID20037
Feature activation+1.05e+00

Pos logit contributions
efully+0.042
 forth+0.041
aleb+0.040
inding+0.040
el+0.040
Neg logit contributions
ves-0.027
ork-0.027
 jail-0.025
 liar-0.024
ggy-0.024
Token
Token ID220
Feature activation-1.26e+00

Pos logit contributions
efully+0.029
 forth+0.028
inding+0.027
el+0.026
ipp+0.026
Neg logit contributions
ves-0.028
ork-0.026
fire-0.026
ggy-0.025
 liar-0.024
Token
Token ID198
Feature activation+8.01e-01

Pos logit contributions
ves+0.031
 liar+0.031
fire+0.030
 troll+0.029
ork+0.029
Neg logit contributions
ipp-0.030
efully-0.030
ummy-0.028
 both-0.028
inding-0.028
Token
Token ID198
Feature activation+2.08e+00

Pos logit contributions
efully+0.021
ipp+0.020
ummy+0.020
inding+0.020
 both+0.019
Neg logit contributions
ves-0.017
 liar-0.016
fire-0.016
ggy-0.016
ork-0.016
TokenThe
Token ID464
Feature activation+1.41e+00

Pos logit contributions
 forth+0.050
efully+0.048
el+0.048
asured+0.046
ummy+0.046
Neg logit contributions
ves-0.054
fire-0.049
ork-0.047
ggy-0.045
l-0.045
Token moral
Token ID6573
Feature activation+6.62e-01

Pos logit contributions
efully+0.031
ipp+0.028
inding+0.028
ummy+0.028
 forth+0.027
Neg logit contributions
ves-0.038
ggy-0.035
fire-0.035
 liar-0.035
ork-0.035
Token of
Token ID286
Feature activation+4.39e+00

Pos logit contributions
efully+0.013
ummy+0.012
ipp+0.012
inding+0.012
el+0.012
Neg logit contributions
ves-0.019
fire-0.017
 liar-0.017
ggy-0.017
ork-0.017
Token the
Token ID262
Feature activation+2.45e+00

Pos logit contributions
efully+0.127
ipp+0.119
inding+0.119
 forth+0.117
 favourites+0.114
Neg logit contributions
ves-0.084
ggy-0.078
fire-0.078
ork-0.073
 jail-0.072
Token story
Token ID1621
Feature activation+7.88e-01

Pos logit contributions
efully+0.066
inding+0.061
ipp+0.061
 forth+0.060
asured+0.059
Neg logit contributions
ves-0.055
ggy-0.050
ork-0.050
fire-0.049
 liar-0.048
Token is
Token ID318
Feature activation+1.03e+00

Pos logit contributions
efully+0.021
 forth+0.020
ipp+0.019
aleb+0.019
 nic+0.019
Neg logit contributions
ves-0.018
ork-0.018
fire-0.016
ggy-0.016
 jail-0.016
Token that
Token ID326
Feature activation+5.67e-01

Pos logit contributions
efully+0.028
 forth+0.027
ipp+0.025
inding+0.025
aleb+0.025
Neg logit contributions
ves-0.023
ork-0.021
fire-0.021
ggy-0.021
 jail-0.020
Token it
Token ID340
Feature activation+1.27e+00

Pos logit contributions
efully+0.013
ipp+0.012
ummy+0.012
inding+0.012
 forth+0.012
Neg logit contributions
ves-0.015
fire-0.013
ggy-0.013
 liar-0.013
ork-0.013
Token great
Token ID1049
Feature activation-1.22e-01

Pos logit contributions
efully+0.007
inding+0.006
ipp+0.006
 forth+0.006
ummy+0.006
Neg logit contributions
ves-0.006
ggy-0.005
fire-0.005
ork-0.005
 liar-0.005
Token!"
Token ID2474
Feature activation+1.12e+00

Pos logit contributions
ves+0.002
ork+0.002
fire+0.002
 liar+0.002
ggy+0.002
Neg logit contributions
efully-0.003
ummy-0.003
 forth-0.003
ipp-0.003
inding-0.003
Token
Token ID220
Feature activation-1.09e+00

Pos logit contributions
 forth+0.033
efully+0.032
asured+0.032
inding+0.031
ishing+0.031
Neg logit contributions
ves-0.019
ork-0.018
fire-0.016
 jail-0.016
ggy-0.016
Token
Token ID198
Feature activation+9.16e-01

Pos logit contributions
 liar+0.028
ves+0.028
fire+0.027
 villains+0.027
 troll+0.026
Neg logit contributions
ipp-0.026
ummy-0.025
efully-0.025
inding-0.024
 both-0.023
Token
Token ID198
Feature activation+2.03e+00

Pos logit contributions
efully+0.025
ipp+0.024
ummy+0.024
inding+0.024
el+0.023
Neg logit contributions
ves-0.019
 liar-0.017
fire-0.017
ggy-0.017
ork-0.017
TokenKat
Token ID25881
Feature activation+4.38e+00

Pos logit contributions
 forth+0.066
ish+0.063
efully+0.062
el+0.061
ishing+0.061
Neg logit contributions
ves-0.037
ork-0.033
ggy-0.031
fire-0.029
l-0.027
Tokenie
Token ID494
Feature activation+1.05e+00

Pos logit contributions
initely+0.085
 nic+0.081
 forth+0.079
 definitely+0.078
 favourite+0.077
Neg logit contributions
ves-0.117
l-0.112
ork-0.111
ggy-0.105
ata-0.103
Token was
Token ID373
Feature activation+9.11e-01

Pos logit contributions
ummy+0.024
efully+0.023
ires+0.023
initely+0.022
inding+0.022
Neg logit contributions
ves-0.026
ggy-0.025
ork-0.023
fire-0.022
con-0.022
Token happy
Token ID3772
Feature activation+1.66e+00

Pos logit contributions
ummy+0.018
el+0.017
efully+0.017
aleb+0.017
 forth+0.017
Neg logit contributions
ggy-0.025
ves-0.024
 liar-0.023
 jail-0.023
 killed-0.023
Token and
Token ID290
Feature activation-2.18e-01

Pos logit contributions
efully+0.048
ipp+0.045
initely+0.044
inding+0.043
aleb+0.043
Neg logit contributions
ves-0.031
ggy-0.030
ork-0.029
 jail-0.029
fire-0.029
Token said
Token ID531
Feature activation+3.47e-01

Pos logit contributions
ves+0.005
ggy+0.005
 killed+0.004
con+0.004
fire+0.004
Neg logit contributions
efully-0.005
el-0.005
ummy-0.005
ipp-0.005
inding-0.005
Token!
Token ID0
Feature activation+1.71e-01

Pos logit contributions
efully+0.009
 forth+0.009
inding+0.008
ummy+0.008
ipp+0.008
Neg logit contributions
ves-0.007
ork-0.006
ggy-0.006
 liar-0.006
fire-0.006
Token<|endoftext|>
Token ID50256
Feature activation-1.20e+00

Pos logit contributions
efully+0.004
ummy+0.004
ipp+0.004
inding+0.004
el+0.004
Neg logit contributions
ves-0.004
ggy-0.004
 liar-0.004
fire-0.004
ork-0.004
TokenIt
Token ID1026
Feature activation+2.12e+00

Pos logit contributions
 liar+0.039
ves+0.038
 villains+0.036
 troll+0.036
fire+0.035
Neg logit contributions
efully-0.019
 both-0.019
ummy-0.019
 all-0.018
ipp-0.018
Token was
Token ID373
Feature activation+2.58e+00

Pos logit contributions
efully+0.043
ipp+0.040
inding+0.039
 forth+0.039
ummy+0.039
Neg logit contributions
ves-0.060
fire-0.055
ork-0.055
ggy-0.055
 liar-0.054
Token a
Token ID257
Feature activation+2.29e+00

Pos logit contributions
efully+0.069
inding+0.066
 forth+0.065
ipp+0.064
ummy+0.063
Neg logit contributions
ves-0.054
ggy-0.050
fire-0.050
ork-0.048
 jail-0.048
Token bright
Token ID6016
Feature activation+4.37e+00

Pos logit contributions
efully+0.052
inding+0.048
asured+0.048
ipp+0.048
el+0.046
Neg logit contributions
fire-0.058
ves-0.058
ggy-0.057
 jail-0.055
 liar-0.055
Token day
Token ID1110
Feature activation+7.30e-01

Pos logit contributions
efully+0.110
ipp+0.101
inding+0.101
asured+0.096
aleb+0.091
Neg logit contributions
 jail-0.097
fire-0.097
ork-0.089
ger-0.089
ves-0.089
Token when
Token ID618
Feature activation+1.72e-01

Pos logit contributions
efully+0.020
ipp+0.018
inding+0.018
 forth+0.018
ummy+0.018
Neg logit contributions
ves-0.016
fire-0.015
ggy-0.014
ork-0.014
 liar-0.014
Token Mary
Token ID5335
Feature activation+4.24e-01

Pos logit contributions
efully+0.003
ummy+0.003
 both+0.003
 forth+0.003
el+0.003
Neg logit contributions
ves-0.005
ork-0.004
 liar-0.004
fire-0.004
ata-0.004
Token and
Token ID290
Feature activation+9.21e-01

Pos logit contributions
efully+0.007
ipp+0.006
el+0.006
inding+0.006
 forth+0.006
Neg logit contributions
ves-0.014
fire-0.013
 liar-0.013
ggy-0.012
ork-0.012
Token her
Token ID607
Feature activation+1.73e-01

Pos logit contributions
efully+0.018
ipp+0.017
inding+0.017
ummy+0.016
 forth+0.016
Neg logit contributions
ves-0.027
ggy-0.025
fire-0.025
ork-0.024
 liar-0.024
Token to
Token ID284
Feature activation+8.18e-02

Pos logit contributions
ves+0.012
fire+0.011
ork+0.011
ggy+0.011
 liar+0.011
Neg logit contributions
efully-0.012
ipp-0.012
ummy-0.012
inding-0.012
 forth-0.012
Token do
Token ID466
Feature activation-8.69e-01

Pos logit contributions
efully+0.002
ipp+0.002
asured+0.002
inding+0.002
aleb+0.002
Neg logit contributions
ves-0.002
ggy-0.002
ork-0.002
 jail-0.002
fire-0.002
Token.
Token ID13
Feature activation+2.33e+00

Pos logit contributions
ves+0.014
fire+0.012
ork+0.012
 liar+0.012
ggy+0.012
Neg logit contributions
efully-0.028
ummy-0.027
ipp-0.027
inding-0.027
 forth-0.027
Token
Token ID198
Feature activation+1.27e+00

Pos logit contributions
asured+0.056
 forth+0.055
efully+0.055
inding+0.052
ishing+0.051
Neg logit contributions
fire-0.051
ves-0.051
ork-0.049
ggy-0.047
 jail-0.047
Token
Token ID198
Feature activation+2.21e+00

Pos logit contributions
efully+0.040
 forth+0.038
ipp+0.038
asured+0.038
inding+0.038
Neg logit contributions
ves-0.022
ggy-0.019
ork-0.019
fire-0.019
 liar-0.017
TokenM
Token ID44
Feature activation+4.37e+00

Pos logit contributions
asured+0.077
ipp+0.073
 prett+0.073
efully+0.072
ishment+0.071
Neg logit contributions
ves-0.032
fire-0.030
ggy-0.028
ork-0.023
th-0.020
Tokenummy
Token ID13513
Feature activation+1.89e+00

Pos logit contributions
 forth+0.056
initely+0.054
efully+0.049
 nic+0.046
 prett+0.045
Neg logit contributions
ves-0.146
ork-0.144
ggy-0.142
l-0.136
fire-0.135
Token said
Token ID531
Feature activation+8.26e-01

Pos logit contributions
efully+0.045
ipp+0.043
inding+0.043
ummy+0.042
 forth+0.042
Neg logit contributions
ves-0.046
ggy-0.042
ork-0.041
fire-0.041
 troll-0.040
Token:
Token ID25
Feature activation+9.41e-01

Pos logit contributions
efully+0.019
ummy+0.018
ipp+0.018
inding+0.018
 forth+0.018
Neg logit contributions
ves-0.021
fire-0.019
ork-0.019
ggy-0.019
 liar-0.019
Token "
Token ID366
Feature activation+1.71e+00

Pos logit contributions
efully+0.017
ipp+0.016
ummy+0.016
inding+0.016
 forth+0.015
Neg logit contributions
ves-0.028
fire-0.026
ggy-0.026
ork-0.026
 liar-0.025
TokenWe
Token ID1135
Feature activation+1.57e+00

Pos logit contributions
efully+0.043
ipp+0.042
ummy+0.042
inding+0.041
 forth+0.041
Neg logit contributions
ves-0.038
fire-0.036
ork-0.034
l-0.033
ggy-0.033
Token!
Token ID0
Feature activation+1.32e+00

Pos logit contributions
efully+0.053
ipp+0.048
inding+0.047
el+0.046
aleb+0.046
Neg logit contributions
ves-0.036
fire-0.032
ggy-0.032
ork-0.032
 jail-0.031
Token
Token ID198
Feature activation+1.16e+00

Pos logit contributions
efully+0.032
 forth+0.032
inding+0.030
ipp+0.030
asured+0.030
Neg logit contributions
ves-0.030
ork-0.028
fire-0.028
 jail-0.027
ggy-0.027
Token
Token ID198
Feature activation+2.33e+00

Pos logit contributions
efully+0.029
ipp+0.027
ummy+0.027
inding+0.027
 forth+0.026
Neg logit contributions
ves-0.027
 liar-0.025
fire-0.025
ork-0.024
ggy-0.024
TokenMom
Token ID29252
Feature activation+2.90e+00

Pos logit contributions
 forth+0.064
ish+0.063
efully+0.062
el+0.061
ishing+0.060
Neg logit contributions
ves-0.053
fire-0.047
ork-0.045
ggy-0.045
l-0.041
Tokenmy
Token ID1820
Feature activation+1.67e+00

Pos logit contributions
ires+0.063
initely+0.062
efully+0.061
 favourite+0.060
 absolute+0.059
Neg logit contributions
ata-0.063
ves-0.061
l-0.060
,"-0.060
ggy-0.057
Token and
Token ID290
Feature activation+4.35e+00

Pos logit contributions
efully+0.038
ummy+0.035
ipp+0.034
initely+0.034
ires+0.034
Neg logit contributions
ves-0.043
ggy-0.040
fire-0.040
ork-0.038
ata-0.037
Token Daddy
Token ID26926
Feature activation+8.71e-01

Pos logit contributions
efully+0.105
imm+0.100
 definitely+0.098
ags+0.097
 certainly+0.097
Neg logit contributions
,"-0.085
aster-0.076
 someone-0.075
 jail-0.073
 troll-0.071
Token said
Token ID531
Feature activation+9.28e-01

Pos logit contributions
ires+0.022
ummy+0.022
efully+0.021
iring+0.021
inding+0.020
Neg logit contributions
ves-0.018
fire-0.017
ggy-0.016
ata-0.016
 defeated-0.015
Token it
Token ID340
Feature activation+1.67e+00

Pos logit contributions
efully+0.025
ipp+0.023
inding+0.023
 forth+0.023
ummy+0.023
Neg logit contributions
ves-0.021
ggy-0.019
fire-0.019
ork-0.019
 liar-0.018
Token was
Token ID373
Feature activation+2.21e+00

Pos logit contributions
efully+0.040
ipp+0.038
ummy+0.038
 forth+0.037
inding+0.037
Neg logit contributions
ves-0.043
ggy-0.039
fire-0.038
ork-0.038
 liar-0.035
Token time
Token ID640
Feature activation+7.79e-01

Pos logit contributions
efully+0.054
ipp+0.050
inding+0.050
 forth+0.049
ummy+0.049
Neg logit contributions
ves-0.053
ggy-0.049
fire-0.048
ork-0.048
 liar-0.047
Token
Token ID220
Feature activation-1.48e+00

Pos logit contributions
efully+0.009
ipp+0.009
ummy+0.009
inding+0.008
 forth+0.008
Neg logit contributions
ves-0.011
fire-0.010
ggy-0.010
ork-0.010
 liar-0.010
Token
Token ID198
Feature activation+2.35e-01

Pos logit contributions
 liar+0.043
fire+0.042
 villains+0.041
ves+0.041
 troll+0.040
Neg logit contributions
ipp-0.030
ummy-0.030
 both-0.028
efully-0.028
inding-0.027
Token
Token ID198
Feature activation+1.21e+00

Pos logit contributions
efully+0.006
ipp+0.006
ummy+0.006
 both+0.005
inding+0.005
Neg logit contributions
ves-0.005
 liar-0.005
fire-0.005
con-0.005
 troll-0.005
TokenThe
Token ID464
Feature activation+8.40e-01

Pos logit contributions
 forth+0.031
efully+0.031
asured+0.030
ummy+0.030
ipp+0.030
Neg logit contributions
ves-0.028
ggy-0.026
fire-0.025
ork-0.025
 liar-0.023
Token moral
Token ID6573
Feature activation-1.68e-01

Pos logit contributions
efully+0.018
 forth+0.016
ummy+0.016
el+0.015
inding+0.015
Neg logit contributions
ves-0.024
ork-0.022
ggy-0.022
 liar-0.022
ata-0.021
Token of
Token ID286
Feature activation+4.35e+00

Pos logit contributions
ves+0.005
 liar+0.005
fire+0.005
ggy+0.005
ork+0.005
Neg logit contributions
efully-0.003
ummy-0.003
el-0.003
ipp-0.003
inding-0.003
Token this
Token ID428
Feature activation+2.55e+00

Pos logit contributions
efully+0.125
ipp+0.121
 forth+0.120
inding+0.120
 favourites+0.115
Neg logit contributions
ves-0.080
fire-0.077
ggy-0.075
ork-0.068
 jail-0.067
Token story
Token ID1621
Feature activation+5.09e-01

Pos logit contributions
efully+0.071
ipp+0.067
 forth+0.067
inding+0.066
asured+0.065
Neg logit contributions
ves-0.056
ggy-0.051
ork-0.051
fire-0.048
l-0.047
Token is
Token ID318
Feature activation+5.78e-01

Pos logit contributions
efully+0.011
ipp+0.010
 forth+0.010
inding+0.010
ummy+0.010
Neg logit contributions
ves-0.014
ork-0.013
ggy-0.013
fire-0.013
 liar-0.013
Token that
Token ID326
Feature activation+2.01e-01

Pos logit contributions
efully+0.015
 forth+0.014
ipp+0.014
inding+0.014
asured+0.013
Neg logit contributions
ves-0.013
ggy-0.012
fire-0.012
ork-0.012
 jail-0.012
Token we
Token ID356
Feature activation+1.79e+00

Pos logit contributions
efully+0.005
ipp+0.004
ummy+0.004
inding+0.004
 forth+0.004
Neg logit contributions
ves-0.005
fire-0.005
ggy-0.005
ork-0.005
 liar-0.005
Tokenmy
Token ID1820
Feature activation-1.42e-01

Pos logit contributions
efully+0.016
ipp+0.015
ummy+0.014
inding+0.014
initely+0.013
Neg logit contributions
ves-0.046
ggy-0.044
 liar-0.043
ork-0.043
fire-0.043
Token asked
Token ID1965
Feature activation+7.04e-01

Pos logit contributions
ves+0.004
ggy+0.003
ork+0.003
fire+0.003
 liar+0.003
Neg logit contributions
efully-0.003
ipp-0.003
ummy-0.003
inding-0.003
 forth-0.003
Token.
Token ID13
Feature activation+2.43e+00

Pos logit contributions
 forth+0.017
 both+0.017
el+0.017
 Daddy+0.017
ummy+0.017
Neg logit contributions
ves-0.018
ork-0.017
 inconvenience-0.016
 liar-0.016
fire-0.016
Token
Token ID198
Feature activation+1.15e+00

Pos logit contributions
efully+0.055
inding+0.052
 forth+0.051
ipp+0.051
asured+0.050
Neg logit contributions
ves-0.062
ork-0.058
fire-0.058
ggy-0.057
 liar-0.056
Token
Token ID198
Feature activation+2.29e+00

Pos logit contributions
ummy+0.026
efully+0.026
ipp+0.025
el+0.025
 both+0.024
Neg logit contributions
 liar-0.028
ves-0.027
fire-0.027
 troll-0.026
 villains-0.025
TokenM
Token ID44
Feature activation+4.34e+00

Pos logit contributions
efully+0.069
 forth+0.067
asured+0.067
ipp+0.066
inding+0.066
Neg logit contributions
ves-0.043
ggy-0.040
ork-0.038
fire-0.038
 liar-0.032
Tokenummy
Token ID13513
Feature activation+1.52e+00

Pos logit contributions
 forth+0.047
efully+0.045
initely+0.041
 nic+0.040
 prett+0.038
Neg logit contributions
ork-0.154
ves-0.150
l-0.148
ggy-0.148
 troll-0.144
Token smiled
Token ID13541
Feature activation+2.80e+00

Pos logit contributions
efully+0.033
ummy+0.032
inding+0.031
ipp+0.031
 forth+0.030
Neg logit contributions
ves-0.040
ggy-0.039
fire-0.036
ork-0.036
 troll-0.035
Token.
Token ID13
Feature activation+8.43e-01

Pos logit contributions
initely+0.068
ipp+0.067
efully+0.067
inding+0.065
 favourites+0.064
Neg logit contributions
l-0.055
ves-0.052
ork-0.052
ger-0.052
 troll-0.051
Token She
Token ID1375
Feature activation+2.72e-01

Pos logit contributions
efully+0.018
 forth+0.017
asured+0.017
ipp+0.017
inding+0.017
Neg logit contributions
ves-0.021
ork-0.020
fire-0.020
con-0.020
ggy-0.020
Token said
Token ID531
Feature activation+7.63e-01

Pos logit contributions
efully+0.007
ummy+0.007
 forth+0.007
inding+0.007
ipp+0.007
Neg logit contributions
ves-0.006
ggy-0.006
ork-0.005
fire-0.005
l-0.005
Token
Token ID220
Feature activation-1.39e+00

Pos logit contributions
efully+0.009
ipp+0.008
ummy+0.008
inding+0.008
 forth+0.008
Neg logit contributions
ves-0.007
fire-0.006
ggy-0.006
ork-0.006
 liar-0.006
Token
Token ID198
Feature activation+3.33e-01

Pos logit contributions
 liar+0.040
ves+0.039
fire+0.038
 troll+0.037
 villains+0.037
Neg logit contributions
ipp-0.029
ummy-0.027
inding-0.026
efully-0.026
 both-0.025
Token
Token ID198
Feature activation+1.24e+00

Pos logit contributions
efully+0.008
ipp+0.008
ummy+0.008
inding+0.007
 both+0.007
Neg logit contributions
 liar-0.008
ves-0.008
fire-0.007
 inconvenience-0.007
 jail-0.007
TokenThe
Token ID464
Feature activation+6.82e-01

Pos logit contributions
efully+0.031
 forth+0.030
ummy+0.030
inding+0.029
ipp+0.029
Neg logit contributions
ves-0.030
ggy-0.027
ork-0.027
fire-0.026
 liar-0.025
Token moral
Token ID6573
Feature activation+4.27e-01

Pos logit contributions
efully+0.012
ipp+0.011
 forth+0.011
inding+0.011
el+0.011
Neg logit contributions
ves-0.021
fire-0.020
ork-0.020
ggy-0.019
 liar-0.019
Token of
Token ID286
Feature activation+4.32e+00

Pos logit contributions
efully+0.008
ummy+0.008
ipp+0.008
inding+0.008
el+0.008
Neg logit contributions
ves-0.012
fire-0.011
ggy-0.011
 liar-0.011
ork-0.011
Token the
Token ID262
Feature activation+1.97e+00

Pos logit contributions
efully+0.118
ipp+0.113
 forth+0.111
inding+0.109
asured+0.108
Neg logit contributions
ves-0.092
ggy-0.085
fire-0.084
ork-0.079
 troll-0.076
Token story
Token ID1621
Feature activation+6.07e-01

Pos logit contributions
efully+0.047
ipp+0.044
inding+0.044
ummy+0.043
 forth+0.043
Neg logit contributions
ves-0.049
fire-0.045
ggy-0.045
ork-0.045
 liar-0.044
Token is
Token ID318
Feature activation+5.89e-01

Pos logit contributions
efully+0.015
 forth+0.015
ipp+0.014
aleb+0.014
asured+0.014
Neg logit contributions
ves-0.015
ork-0.014
ggy-0.013
 jail-0.013
fire-0.013
Token that
Token ID326
Feature activation+8.10e-01

Pos logit contributions
efully+0.016
 forth+0.015
ipp+0.014
 nic+0.014
asured+0.014
Neg logit contributions
ves-0.013
ggy-0.012
ork-0.012
fire-0.012
 jail-0.012
Token it
Token ID340
Feature activation+1.44e+00

Pos logit contributions
efully+0.021
ipp+0.019
inding+0.018
aleb+0.018
ummy+0.018
Neg logit contributions
ves-0.019
ggy-0.017
fire-0.017
ork-0.017
 liar-0.017
Token
Token ID220
Feature activation-1.53e+00

Pos logit contributions
efully+0.026
 forth+0.025
inding+0.024
asured+0.024
ipp+0.024
Neg logit contributions
ves-0.021
fire-0.020
ork-0.019
 jail-0.019
ggy-0.018
Token
Token ID198
Feature activation+6.69e-01

Pos logit contributions
ves+0.038
 liar+0.038
 troll+0.036
fire+0.036
 villains+0.035
Neg logit contributions
ipp-0.036
efully-0.036
ummy-0.035
inding-0.034
 both-0.034
Token
Token ID198
Feature activation+1.92e+00

Pos logit contributions
efully+0.018
ipp+0.017
ummy+0.017
inding+0.017
 both+0.016
Neg logit contributions
ves-0.014
 liar-0.014
ggy-0.013
fire-0.013
ork-0.013
TokenThe
Token ID464
Feature activation+3.99e-01

Pos logit contributions
 forth+0.053
efully+0.051
el+0.050
ipp+0.049
asured+0.049
Neg logit contributions
ves-0.041
fire-0.039
ork-0.037
ggy-0.036
l-0.033
Token moral
Token ID6573
Feature activation+3.56e-01

Pos logit contributions
efully+0.007
ipp+0.007
ummy+0.007
inding+0.007
 forth+0.007
Neg logit contributions
ves-0.012
fire-0.011
ork-0.011
ggy-0.011
 liar-0.011
Token of
Token ID286
Feature activation+4.32e+00

Pos logit contributions
efully+0.007
ipp+0.007
ummy+0.007
inding+0.007
el+0.007
Neg logit contributions
ves-0.010
fire-0.009
ggy-0.009
ork-0.009
 liar-0.009
Token the
Token ID262
Feature activation+2.17e+00

Pos logit contributions
efully+0.129
ipp+0.123
inding+0.120
 forth+0.120
outed+0.118
Neg logit contributions
ves-0.077
fire-0.075
ggy-0.073
aster-0.066
ork-0.064
Token story
Token ID1621
Feature activation+5.43e-01

Pos logit contributions
efully+0.060
ipp+0.056
inding+0.056
asured+0.055
 forth+0.055
Neg logit contributions
ves-0.047
ggy-0.043
fire-0.042
ork-0.042
l-0.040
Token is
Token ID318
Feature activation+9.01e-01

Pos logit contributions
efully+0.014
 forth+0.013
ipp+0.013
aleb+0.013
asured+0.013
Neg logit contributions
ves-0.013
ork-0.012
ggy-0.011
fire-0.011
 jail-0.011
Token that
Token ID326
Feature activation+6.58e-01

Pos logit contributions
efully+0.027
 forth+0.025
 nic+0.023
inding+0.023
initely+0.023
Neg logit contributions
ves-0.018
ork-0.017
ggy-0.017
fire-0.017
 jail-0.016
Token if
Token ID611
Feature activation-3.48e-01

Pos logit contributions
efully+0.018
ipp+0.017
aleb+0.016
inding+0.016
ummy+0.016
Neg logit contributions
ves-0.014
fire-0.013
ggy-0.013
 jail-0.013
 liar-0.013
Token!"
Token ID2474
Feature activation+7.12e-01

Pos logit contributions
ves+0.002
 liar+0.002
ggy+0.002
fire+0.002
 troll+0.002
Neg logit contributions
ummy-0.002
efully-0.002
ipp-0.002
inding-0.002
 definitely-0.002
Token
Token ID198
Feature activation+1.10e+00

Pos logit contributions
efully+0.017
ipp+0.016
ummy+0.016
inding+0.016
 forth+0.015
Neg logit contributions
ves-0.018
 liar-0.016
ggy-0.016
ork-0.016
fire-0.016
Token
Token ID198
Feature activation+1.53e+00

Pos logit contributions
efully+0.029
ipp+0.027
ummy+0.027
inding+0.027
el+0.026
Neg logit contributions
ves-0.024
 liar-0.023
fire-0.022
ork-0.022
ggy-0.021
TokenThe
Token ID464
Feature activation+1.02e+00

Pos logit contributions
 forth+0.040
ummy+0.039
efully+0.038
ipp+0.038
asured+0.038
Neg logit contributions
ves-0.036
fire-0.032
ggy-0.031
ork-0.030
l-0.028
Token moral
Token ID6573
Feature activation+9.77e-02

Pos logit contributions
efully+0.026
 forth+0.025
ummy+0.025
el+0.025
ipp+0.024
Neg logit contributions
ves-0.024
ork-0.021
ggy-0.021
 liar-0.021
fire-0.020
Token of
Token ID286
Feature activation+4.31e+00

Pos logit contributions
efully+0.002
ummy+0.002
ipp+0.002
el+0.002
inding+0.002
Neg logit contributions
ves-0.003
 liar-0.003
fire-0.003
ggy-0.003
ork-0.003
Token the
Token ID262
Feature activation+2.22e+00

Pos logit contributions
efully+0.121
 forth+0.115
ipp+0.114
inding+0.112
asured+0.110
Neg logit contributions
ves-0.088
fire-0.083
ggy-0.079
ork-0.076
 jail-0.071
Token story
Token ID1621
Feature activation+4.79e-01

Pos logit contributions
efully+0.056
ipp+0.053
inding+0.052
 forth+0.052
asured+0.051
Neg logit contributions
ves-0.052
fire-0.047
ork-0.047
ggy-0.047
 liar-0.046
Token is
Token ID318
Feature activation+6.25e-01

Pos logit contributions
efully+0.011
 forth+0.010
ipp+0.010
asured+0.010
inding+0.010
Neg logit contributions
ves-0.013
ork-0.012
ggy-0.011
fire-0.011
 liar-0.011
Token that
Token ID326
Feature activation+6.03e-01

Pos logit contributions
efully+0.016
 forth+0.015
ipp+0.015
inding+0.014
asured+0.014
Neg logit contributions
ves-0.015
ggy-0.014
ork-0.014
fire-0.013
 liar-0.013
Token we
Token ID356
Feature activation+1.91e+00

Pos logit contributions
efully+0.015
ipp+0.014
ummy+0.013
inding+0.013
 forth+0.013
Neg logit contributions
ves-0.015
fire-0.013
ggy-0.013
ork-0.013
 liar-0.013
Token more
Token ID517
Feature activation+7.52e-01

Pos logit contributions
ves+0.026
fire+0.024
ork+0.024
ggy+0.024
 liar+0.024
Neg logit contributions
efully-0.019
ipp-0.018
inding-0.018
 forth-0.018
ummy-0.018
Token cautious
Token ID21205
Feature activation+5.40e-01

Pos logit contributions
efully+0.018
ipp+0.018
ummy+0.017
el+0.017
 forth+0.017
Neg logit contributions
ves-0.018
ggy-0.017
 liar-0.017
fire-0.017
ork-0.016
Token and
Token ID290
Feature activation-1.56e+00

Pos logit contributions
efully+0.009
ipp+0.008
ummy+0.008
inding+0.008
 forth+0.008
Neg logit contributions
ves-0.017
fire-0.016
ggy-0.016
ork-0.016
 liar-0.016
Token not
Token ID407
Feature activation-1.80e+00

Pos logit contributions
ves+0.042
fire+0.039
ggy+0.039
 liar+0.038
ork+0.038
Neg logit contributions
efully-0.034
ummy-0.032
ipp-0.032
inding-0.031
el-0.031
Token take
Token ID1011
Feature activation-2.29e+00

Pos logit contributions
ves+0.049
fire+0.045
ork+0.045
ggy+0.045
 liar+0.044
Neg logit contributions
efully-0.038
ipp-0.036
ummy-0.035
inding-0.035
 forth-0.035
Token unnecessary
Token ID13114
Feature activation-5.93e+00

Pos logit contributions
ves+0.058
fire+0.055
ork+0.052
 liar+0.051
ata+0.050
Neg logit contributions
efully-0.054
 forth-0.053
 both-0.053
 each-0.052
el-0.050
Token risks
Token ID7476
Feature activation-2.67e+00

Pos logit contributions
ves+0.123
fire+0.114
ork+0.108
 troll+0.106
ata+0.106
Neg logit contributions
 each-0.158
 how-0.156
 both-0.152
efully-0.150
 all-0.147
Token.
Token ID13
Feature activation+8.26e-01

Pos logit contributions
ves+0.061
ggy+0.057
ork+0.056
 troll+0.056
 liar+0.055
Neg logit contributions
 both-0.065
 each-0.064
 forth-0.063
ummy-0.062
efully-0.062
Token The
Token ID383
Feature activation+6.09e-01

Pos logit contributions
ummy+0.017
ipp+0.016
efully+0.016
inding+0.015
 definitely+0.015
Neg logit contributions
ves-0.025
fire-0.022
ggy-0.022
 liar-0.021
ork-0.021
Token memory
Token ID4088
Feature activation+1.68e+00

Pos logit contributions
efully+0.012
ipp+0.011
ummy+0.011
inding+0.011
 forth+0.011
Neg logit contributions
ves-0.018
fire-0.017
ork-0.017
ggy-0.017
 liar-0.016
Token of
Token ID286
Feature activation+1.95e+00

Pos logit contributions
efully+0.042
ipp+0.039
inding+0.038
ummy+0.037
 forth+0.037
Neg logit contributions
ves-0.043
ggy-0.039
fire-0.038
ork-0.038
 liar-0.036
Token be
Token ID307
Feature activation-1.69e+00

Pos logit contributions
ves+0.026
fire+0.025
ork+0.024
 troll+0.024
 liar+0.024
Neg logit contributions
 forth-0.012
efully-0.011
 both-0.011
ipp-0.011
inding-0.010
Token careful
Token ID8161
Feature activation+1.35e+00

Pos logit contributions
ves+0.055
ork+0.052
fire+0.051
con+0.049
 troll+0.049
Neg logit contributions
efully-0.030
 forth-0.028
 both-0.027
asured-0.027
ipp-0.026
Token and
Token ID290
Feature activation-1.25e+00

Pos logit contributions
efully+0.030
ipp+0.028
inding+0.027
ummy+0.027
 forth+0.027
Neg logit contributions
ves-0.035
fire-0.032
ggy-0.032
ork-0.032
 liar-0.031
Token not
Token ID407
Feature activation-1.73e+00

Pos logit contributions
ves+0.031
ggy+0.030
fire+0.030
 liar+0.029
 jail+0.028
Neg logit contributions
efully-0.029
ummy-0.027
ipp-0.027
el-0.027
inding-0.026
Token take
Token ID1011
Feature activation-1.85e+00

Pos logit contributions
ves+0.047
fire+0.043
ork+0.043
ggy+0.042
 liar+0.042
Neg logit contributions
efully-0.037
 forth-0.035
ipp-0.035
inding-0.035
ummy-0.034
Token unnecessary
Token ID13114
Feature activation-5.86e+00

Pos logit contributions
ves+0.053
fire+0.052
 liar+0.048
ork+0.048
 troll+0.047
Neg logit contributions
efully-0.039
 both-0.038
 forth-0.038
 each-0.035
el-0.035
Token risks
Token ID7476
Feature activation-2.44e+00

Pos logit contributions
ves+0.129
fire+0.120
 troll+0.112
ork+0.110
ata+0.110
Neg logit contributions
 each-0.154
 how-0.151
 both-0.150
efully-0.145
 all-0.145
Token.
Token ID13
Feature activation+9.73e-02

Pos logit contributions
ves+0.051
ggy+0.047
 troll+0.046
 liar+0.046
ork+0.046
Neg logit contributions
 both-0.064
 each-0.063
ummy-0.062
 forth-0.062
efully-0.062
Token The
Token ID383
Feature activation+4.06e-01

Pos logit contributions
ummy+0.002
efully+0.002
 both+0.002
ipp+0.002
inding+0.002
Neg logit contributions
ves-0.003
fire-0.003
ggy-0.003
 liar-0.003
 troll-0.003
Token moral
Token ID6573
Feature activation-3.34e-03

Pos logit contributions
efully+0.007
 forth+0.006
ummy+0.006
ipp+0.006
inding+0.006
Neg logit contributions
ves-0.013
fire-0.012
ork-0.012
ggy-0.012
 liar-0.012
Token of
Token ID286
Feature activation+3.23e+00

Pos logit contributions
ves+0.000
fire+0.000
 liar+0.000
 troll+0.000
ggy+0.000
Neg logit contributions
efully-0.000
ummy-0.000
el-0.000
ipp-0.000
inding-0.000
Token be
Token ID307
Feature activation-1.55e+00

Pos logit contributions
efully+0.008
ummy+0.007
ipp+0.007
inding+0.007
 forth+0.007
Neg logit contributions
ves-0.010
fire-0.009
ggy-0.009
ork-0.009
 liar-0.009
Token prepared
Token ID5597
Feature activation+1.87e+00

Pos logit contributions
ves+0.046
ork+0.043
con+0.042
ggy+0.041
fire+0.041
Neg logit contributions
efully-0.031
inding-0.028
asured-0.027
 forth-0.027
 both-0.027
Token and
Token ID290
Feature activation-1.20e+00

Pos logit contributions
efully+0.033
ipp+0.030
ummy+0.029
inding+0.029
 forth+0.029
Neg logit contributions
ves-0.058
fire-0.054
ork-0.054
ggy-0.054
 liar-0.053
Token not
Token ID407
Feature activation-1.56e+00

Pos logit contributions
ves+0.031
fire+0.030
ggy+0.029
 liar+0.028
 jail+0.028
Neg logit contributions
efully-0.027
ipp-0.026
ummy-0.025
inding-0.025
el-0.024
Token take
Token ID1011
Feature activation-1.68e+00

Pos logit contributions
ves+0.047
ork+0.043
ggy+0.042
fire+0.042
 liar+0.041
Neg logit contributions
efully-0.029
 forth-0.027
ipp-0.027
inding-0.027
ummy-0.027
Token unnecessary
Token ID13114
Feature activation-5.83e+00

Pos logit contributions
ves+0.042
fire+0.038
ggy+0.037
 liar+0.037
ork+0.037
Neg logit contributions
efully-0.041
 both-0.039
 forth-0.038
el-0.038
 each-0.037
Token risks
Token ID7476
Feature activation-2.57e+00

Pos logit contributions
ves+0.117
fire+0.101
ork+0.100
ggy+0.100
ata+0.100
Neg logit contributions
 each-0.157
efully-0.155
 both-0.154
 how-0.153
 forth-0.150
Token.
Token ID13
Feature activation-4.18e-01

Pos logit contributions
ves+0.057
ggy+0.053
 liar+0.051
ork+0.051
 troll+0.051
Neg logit contributions
 both-0.064
ummy-0.063
efully-0.063
 each-0.063
 forth-0.062
Token<|endoftext|>
Token ID50256
Feature activation-1.27e+00

Pos logit contributions
ves+0.010
ggy+0.009
 liar+0.008
ork+0.008
fire+0.008
Neg logit contributions
efully-0.011
ummy-0.010
inding-0.010
ipp-0.010
el-0.010
TokenOnce
Token ID7454
Feature activation+1.43e+00

Pos logit contributions
 liar+0.038
ves+0.037
 villains+0.034
ggy+0.034
fire+0.033
Neg logit contributions
efully-0.025
ummy-0.024
ipp-0.023
el-0.023
 both-0.023
Token upon
Token ID2402
Feature activation-7.60e-01

Pos logit contributions
efully+0.016
ipp+0.014
inding+0.013
 forth+0.013
ummy+0.013
Neg logit contributions
ves-0.053
ork-0.050
ggy-0.050
fire-0.050
 liar-0.049
Token and
Token ID290
Feature activation-1.20e-01

Pos logit contributions
efully+0.022
ipp+0.020
inding+0.020
ummy+0.020
 forth+0.020
Neg logit contributions
ves-0.036
fire-0.034
ggy-0.033
ork-0.033
 liar-0.033
Token friends
Token ID2460
Feature activation+1.93e-01

Pos logit contributions
ves+0.003
fire+0.003
ggy+0.003
 liar+0.003
 villains+0.003
Neg logit contributions
efully-0.003
ipp-0.003
ummy-0.003
inding-0.003
el-0.002
Token and
Token ID290
Feature activation-6.67e-01

Pos logit contributions
efully+0.005
ipp+0.004
inding+0.004
ummy+0.004
 forth+0.004
Neg logit contributions
ves-0.005
fire-0.004
ork-0.004
ggy-0.004
 liar-0.004
Token not
Token ID407
Feature activation-1.69e+00

Pos logit contributions
ves+0.017
fire+0.016
ggy+0.016
 liar+0.016
ork+0.016
Neg logit contributions
efully-0.015
ipp-0.014
ummy-0.014
inding-0.014
el-0.014
Token take
Token ID1011
Feature activation-1.56e+00

Pos logit contributions
ves+0.045
fire+0.042
ggy+0.042
ork+0.042
 liar+0.042
Neg logit contributions
efully-0.037
ipp-0.034
ummy-0.034
inding-0.033
 forth-0.032
Token unnecessary
Token ID13114
Feature activation-5.69e+00

Pos logit contributions
ves+0.038
fire+0.036
ork+0.034
 liar+0.034
ggy+0.034
Neg logit contributions
efully-0.039
 forth-0.036
 both-0.036
ipp-0.035
el-0.035
Token risks
Token ID7476
Feature activation-2.29e+00

Pos logit contributions
ves+0.118
 troll+0.108
fire+0.108
ork+0.107
ata+0.106
Neg logit contributions
 each-0.148
 how-0.146
 both-0.143
efully-0.141
 all-0.138
Token.
Token ID13
Feature activation-8.62e-02

Pos logit contributions
ves+0.051
ggy+0.047
ork+0.047
 troll+0.046
 liar+0.046
Neg logit contributions
 both-0.057
 each-0.056
 forth-0.055
efully-0.055
ummy-0.055
Token We
Token ID775
Feature activation+1.13e+00

Pos logit contributions
ves+0.002
fire+0.002
ggy+0.002
 liar+0.002
ork+0.002
Neg logit contributions
efully-0.002
ummy-0.002
inding-0.002
ipp-0.002
el-0.002
Token should
Token ID815
Feature activation+1.21e+00

Pos logit contributions
efully+0.024
ummy+0.024
ipp+0.023
 forth+0.023
inding+0.023
Neg logit contributions
ves-0.030
fire-0.028
ork-0.028
ggy-0.028
 liar-0.027
Token always
Token ID1464
Feature activation-1.03e+00

Pos logit contributions
efully+0.026
ummy+0.023
ipp+0.023
inding+0.023
 forth+0.022
Neg logit contributions
ves-0.033
ggy-0.032
 jail-0.030
fire-0.030
 liar-0.030
Token your
Token ID534
Feature activation+1.51e+00

Pos logit contributions
efully+0.017
ipp+0.015
ummy+0.015
inding+0.015
 forth+0.015
Neg logit contributions
ves-0.031
fire-0.029
ggy-0.029
ork-0.029
 liar-0.029
Token parents
Token ID3397
Feature activation+4.13e-01

Pos logit contributions
efully+0.027
ipp+0.024
 forth+0.024
inding+0.023
ummy+0.023
Neg logit contributions
ves-0.047
ggy-0.045
ork-0.044
fire-0.044
 liar-0.043
Token and
Token ID290
Feature activation+1.95e-01

Pos logit contributions
efully+0.009
ipp+0.009
ummy+0.008
inding+0.008
 forth+0.008
Neg logit contributions
ves-0.011
fire-0.010
ork-0.010
ggy-0.010
 liar-0.010
Token not
Token ID407
Feature activation-1.27e+00

Pos logit contributions
efully+0.005
ipp+0.005
ummy+0.004
inding+0.004
el+0.004
Neg logit contributions
ves-0.005
ggy-0.005
fire-0.004
 liar-0.004
 jail-0.004
Token take
Token ID1011
Feature activation-1.31e+00

Pos logit contributions
ves+0.034
ggy+0.033
fire+0.032
 liar+0.032
ork+0.032
Neg logit contributions
efully-0.027
ipp-0.025
ummy-0.025
inding-0.024
el-0.023
Token unnecessary
Token ID13114
Feature activation-5.29e+00

Pos logit contributions
ves+0.036
fire+0.034
ork+0.033
 liar+0.032
ggy+0.032
Neg logit contributions
efully-0.028
 both-0.026
 forth-0.026
el-0.026
ipp-0.025
Token risks
Token ID7476
Feature activation-2.15e+00

Pos logit contributions
ves+0.115
 troll+0.110
fire+0.106
ork+0.105
ata+0.104
Neg logit contributions
 each-0.131
 both-0.126
 how-0.123
efully-0.122
 all-0.121
Token.
Token ID13
Feature activation-2.84e-02

Pos logit contributions
ves+0.046
ggy+0.042
 liar+0.042
ork+0.042
 troll+0.042
Neg logit contributions
 both-0.054
efully-0.054
 each-0.053
ummy-0.053
 forth-0.053
Token<|endoftext|>
Token ID50256
Feature activation-1.72e+00

Pos logit contributions
ves+0.001
fire+0.001
ggy+0.001
 liar+0.001
ork+0.001
Neg logit contributions
efully-0.001
ummy-0.001
inding-0.001
ipp-0.001
el-0.001
TokenOne
Token ID3198
Feature activation+9.89e-01

Pos logit contributions
 liar+0.057
ves+0.055
 villains+0.053
 troll+0.052
fire+0.052
Neg logit contributions
efully-0.026
 both-0.026
ummy-0.025
el-0.025
ipp-0.025
Token day
Token ID1110
Feature activation-1.21e+00

Pos logit contributions
efully+0.019
ipp+0.017
inding+0.017
 forth+0.017
ummy+0.017
Neg logit contributions
ves-0.029
ggy-0.027
fire-0.027
ork-0.027
 liar-0.027
Token friends
Token ID2460
Feature activation+1.08e+00

Pos logit contributions
efully+0.032
ipp+0.030
 forth+0.029
inding+0.029
ummy+0.028
Neg logit contributions
ves-0.043
ork-0.040
ggy-0.040
fire-0.039
 liar-0.039
Token,
Token ID11
Feature activation+1.38e+00

Pos logit contributions
efully+0.023
ipp+0.021
ummy+0.021
inding+0.020
 forth+0.020
Neg logit contributions
ves-0.029
fire-0.028
ggy-0.027
ork-0.027
 liar-0.027
Token and
Token ID290
Feature activation-1.04e-01

Pos logit contributions
efully+0.032
ipp+0.029
ummy+0.029
el+0.028
 forth+0.027
Neg logit contributions
ves-0.035
fire-0.033
ggy-0.033
 liar-0.033
 jail-0.033
Token never
Token ID1239
Feature activation-2.37e+00

Pos logit contributions
ves+0.002
fire+0.002
ggy+0.002
 jail+0.002
 villains+0.002
Neg logit contributions
efully-0.003
ipp-0.003
ummy-0.003
el-0.002
aleb-0.002
Token take
Token ID1011
Feature activation-1.19e+00

Pos logit contributions
ves+0.065
fire+0.060
ggy+0.060
ork+0.060
 liar+0.059
Neg logit contributions
efully-0.050
ipp-0.046
ummy-0.046
inding-0.046
 forth-0.045
Token unnecessary
Token ID13114
Feature activation-5.10e+00

Pos logit contributions
ves+0.027
fire+0.025
ork+0.024
 liar+0.024
ggy+0.023
Neg logit contributions
efully-0.031
 forth-0.029
ipp-0.029
inding-0.029
el-0.029
Token risks
Token ID7476
Feature activation-1.54e+00

Pos logit contributions
ves+0.098
ork+0.087
fire+0.086
 troll+0.082
ggy+0.081
Neg logit contributions
efully-0.144
 each-0.141
 how-0.140
 forth-0.139
 both-0.139
Token.
Token ID13
Feature activation-2.02e-01

Pos logit contributions
ves+0.038
ggy+0.035
ork+0.035
 liar+0.034
fire+0.034
Neg logit contributions
efully-0.035
 forth-0.034
ummy-0.034
ipp-0.034
inding-0.033
Token<|endoftext|>
Token ID50256
Feature activation-1.02e+00

Pos logit contributions
ves+0.005
fire+0.004
 liar+0.004
ggy+0.004
ork+0.004
Neg logit contributions
efully-0.005
ummy-0.005
ipp-0.005
inding-0.005
 definitely-0.005
TokenOnce
Token ID7454
Feature activation+1.68e+00

Pos logit contributions
 liar+0.034
ves+0.033
 villains+0.031
fire+0.030
 inconvenience+0.030
Neg logit contributions
ummy-0.017
efully-0.016
 both-0.016
ipp-0.016
el-0.016
Token upon
Token ID2402
Feature activation-4.50e-01

Pos logit contributions
efully+0.018
ipp+0.016
ummy+0.015
inding+0.015
 forth+0.015
Neg logit contributions
ves-0.063
fire-0.060
ggy-0.060
ork-0.060
 liar-0.059
Token instinct
Token ID13311
Feature activation+7.94e-01

Pos logit contributions
efully+0.013
ipp+0.012
inding+0.012
 forth+0.012
ummy+0.012
Neg logit contributions
ves-0.020
ggy-0.019
ork-0.018
fire-0.018
 liar-0.018
Token can
Token ID460
Feature activation+1.00e+00

Pos logit contributions
efully+0.016
ipp+0.015
ummy+0.015
 forth+0.015
inding+0.014
Neg logit contributions
ves-0.022
ork-0.021
fire-0.021
ggy-0.021
 liar-0.021
Token often
Token ID1690
Feature activation-1.02e+00

Pos logit contributions
efully+0.020
ipp+0.018
ummy+0.018
inding+0.018
 forth+0.018
Neg logit contributions
ves-0.029
fire-0.027
ggy-0.027
ork-0.027
 liar-0.026
Token save
Token ID3613
Feature activation-7.59e-01

Pos logit contributions
ves+0.031
ork+0.029
ggy+0.029
fire+0.029
 liar+0.028
Neg logit contributions
efully-0.018
 forth-0.017
ipp-0.017
inding-0.017
ummy-0.016
Token us
Token ID514
Feature activation-1.07e+00

Pos logit contributions
ves+0.023
 liar+0.022
ork+0.022
fire+0.021
 troll+0.021
Neg logit contributions
efully-0.013
 both-0.013
 forth-0.013
 each-0.013
el-0.013
Token unnecessary
Token ID13114
Feature activation-5.04e+00

Pos logit contributions
ves+0.028
ork+0.026
 liar+0.026
ggy+0.026
fire+0.025
Neg logit contributions
efully-0.023
 forth-0.022
inding-0.022
ummy-0.022
ipp-0.022
Token danger
Token ID3514
Feature activation-2.61e+00

Pos logit contributions
ves+0.114
ork+0.110
ata+0.107
 troll+0.105
 liar+0.100
Neg logit contributions
 each-0.117
 both-0.114
 all-0.113
efully-0.112
 how-0.110
Token.
Token ID13
Feature activation-1.31e-01

Pos logit contributions
ves+0.060
ork+0.057
 liar+0.056
 troll+0.054
ggy+0.054
Neg logit contributions
 both-0.063
 each-0.062
 forth-0.061
efully-0.061
 all-0.060
Token<|endoftext|>
Token ID50256
Feature activation-1.34e+00

Pos logit contributions
ves+0.003
fire+0.003
ggy+0.003
 liar+0.003
ork+0.003
Neg logit contributions
efully-0.003
ipp-0.003
ummy-0.003
inding-0.003
el-0.003
TokenOnce
Token ID7454
Feature activation+1.44e+00

Pos logit contributions
 liar+0.040
ves+0.039
 villains+0.036
ggy+0.036
 inconvenience+0.035
Neg logit contributions
efully-0.027
el-0.025
ummy-0.024
ipp-0.024
 both-0.024
Token upon
Token ID2402
Feature activation-7.31e-01

Pos logit contributions
efully+0.016
ipp+0.013
inding+0.013
 forth+0.013
ummy+0.012
Neg logit contributions
ves-0.054
ork-0.051
ggy-0.051
fire-0.051
 liar-0.050
Token the
Token ID262
Feature activation-2.15e-01

Pos logit contributions
efully+0.007
 forth+0.007
 both+0.007
ipp+0.007
ummy+0.007
Neg logit contributions
ves-0.015
fire-0.015
ork-0.014
 liar-0.014
ggy-0.014
Token man
Token ID582
Feature activation-9.67e-01

Pos logit contributions
ves+0.005
fire+0.004
ork+0.004
ggy+0.004
 liar+0.004
Neg logit contributions
efully-0.006
ipp-0.005
ummy-0.005
inding-0.005
 forth-0.005
Token if
Token ID611
Feature activation+1.01e-01

Pos logit contributions
ves+0.029
fire+0.027
ork+0.027
ggy+0.027
 liar+0.026
Neg logit contributions
efully-0.018
ipp-0.017
ummy-0.017
 both-0.016
 forth-0.016
Token she
Token ID673
Feature activation-1.24e+00

Pos logit contributions
 forth+0.002
 both+0.002
efully+0.002
 nic+0.002
 each+0.002
Neg logit contributions
ves-0.003
ork-0.003
fire-0.003
ata-0.003
 liar-0.002
Token could
Token ID714
Feature activation-5.38e-01

Pos logit contributions
ves+0.041
fire+0.040
ork+0.038
ggy+0.037
 liar+0.037
Neg logit contributions
efully-0.019
ipp-0.017
inding-0.017
el-0.016
asured-0.016
Token have
Token ID423
Feature activation-4.88e+00

Pos logit contributions
ves+0.016
fire+0.016
 troll+0.016
ork+0.015
 liar+0.015
Neg logit contributions
ipp-0.009
 both-0.009
efully-0.009
 forth-0.009
 definitely-0.008
Token a
Token ID257
Feature activation-2.80e+00

Pos logit contributions
fire+0.155
ves+0.144
 troll+0.143
ork+0.142
 liar+0.142
Neg logit contributions
 both-0.090
 each-0.087
efully-0.086
 all-0.083
 forth-0.079
Token ride
Token ID6594
Feature activation-8.42e-01

Pos logit contributions
ves+0.077
ork+0.071
ggy+0.071
fire+0.070
 liar+0.070
Neg logit contributions
efully-0.060
ipp-0.056
inding-0.055
ummy-0.055
el-0.053
Token.
Token ID13
Feature activation+1.22e-01

Pos logit contributions
ves+0.019
fire+0.017
ork+0.017
 liar+0.017
 troll+0.016
Neg logit contributions
efully-0.022
 forth-0.021
ipp-0.021
el-0.020
ummy-0.020
Token The
Token ID383
Feature activation-1.38e-02

Pos logit contributions
efully+0.002
ipp+0.002
ummy+0.002
 both+0.002
el+0.002
Neg logit contributions
ves-0.004
 liar-0.004
ggy-0.004
fire-0.003
 villains-0.003
Token man
Token ID582
Feature activation-3.31e-01

Pos logit contributions
ves+0.000
fire+0.000
ggy+0.000
ork+0.000
 liar+0.000
Neg logit contributions
efully-0.000
ipp-0.000
ummy-0.000
inding-0.000
 forth-0.000
Token her
Token ID607
Feature activation+3.74e-01

Pos logit contributions
efully+0.005
ummy+0.005
ipp+0.005
 forth+0.005
 both+0.005
Neg logit contributions
ves-0.010
fire-0.009
ork-0.009
ggy-0.009
 liar-0.009
Token mom
Token ID1995
Feature activation+4.99e-01

Pos logit contributions
efully+0.006
ummy+0.005
ipp+0.005
 forth+0.005
inding+0.005
Neg logit contributions
ves-0.013
fire-0.012
ork-0.012
ggy-0.012
 liar-0.011
Token if
Token ID611
Feature activation+2.33e-01

Pos logit contributions
efully+0.009
ipp+0.008
inding+0.008
 forth+0.008
ummy+0.008
Neg logit contributions
ves-0.015
fire-0.014
ork-0.014
ggy-0.014
 liar-0.014
Token she
Token ID673
Feature activation-1.63e+00

Pos logit contributions
efully+0.005
 forth+0.005
ummy+0.005
ipp+0.005
 both+0.005
Neg logit contributions
ves-0.006
fire-0.006
ork-0.005
 liar-0.005
ggy-0.005
Token could
Token ID714
Feature activation-4.25e-01

Pos logit contributions
ves+0.053
fire+0.051
ork+0.049
ggy+0.049
 liar+0.048
Neg logit contributions
efully-0.025
ipp-0.024
ummy-0.023
inding-0.023
el-0.022
Token have
Token ID423
Feature activation-4.86e+00

Pos logit contributions
ves+0.013
fire+0.012
 liar+0.012
ggy+0.011
ork+0.011
Neg logit contributions
efully-0.008
ipp-0.008
ummy-0.008
 forth-0.008
inding-0.007
Token a
Token ID257
Feature activation-3.27e+00

Pos logit contributions
ves+0.144
fire+0.143
 villains+0.136
 liar+0.135
ork+0.131
Neg logit contributions
 both-0.095
efully-0.093
 each-0.090
el-0.086
 all-0.085
Token slice
Token ID16416
Feature activation+1.07e+00

Pos logit contributions
ves+0.095
fire+0.088
ggy+0.086
 villains+0.086
 liar+0.084
Neg logit contributions
efully-0.060
ummy-0.060
ipp-0.059
 forth-0.057
inding-0.057
Token of
Token ID286
Feature activation-5.75e-01

Pos logit contributions
efully+0.024
ipp+0.022
ummy+0.022
 forth+0.022
inding+0.022
Neg logit contributions
ves-0.028
fire-0.026
ork-0.026
ggy-0.026
 liar-0.025
Token pie
Token ID2508
Feature activation-7.31e-01

Pos logit contributions
ves+0.013
fire+0.012
ork+0.012
ggy+0.012
 liar+0.012
Neg logit contributions
efully-0.014
ipp-0.014
ummy-0.014
inding-0.013
 forth-0.013
Token.
Token ID13
Feature activation+9.92e-01

Pos logit contributions
ves+0.015
 liar+0.013
ork+0.013
fire+0.013
ggy+0.013
Neg logit contributions
efully-0.021
ipp-0.020
ummy-0.020
 forth-0.020
inding-0.019
Token,
Token ID11
Feature activation+3.47e-01

Pos logit contributions
ves+0.067
ork+0.062
 liar+0.059
con+0.057
fire+0.057
Neg logit contributions
efully-0.047
 forth-0.047
ummy-0.047
ipp-0.046
inding-0.046
Token I
Token ID314
Feature activation+3.53e-01

Pos logit contributions
efully+0.008
ipp+0.008
inding+0.008
ummy+0.007
 forth+0.007
Neg logit contributions
ves-0.009
fire-0.008
ggy-0.008
ork-0.008
 liar-0.008
Token won
Token ID1839
Feature activation-2.10e+00

Pos logit contributions
efully+0.005
ipp+0.005
 definitely+0.005
inding+0.005
 certainly+0.005
Neg logit contributions
ves-0.012
fire-0.011
ork-0.011
 liar-0.011
ggy-0.011
Token't
Token ID470
Feature activation-1.99e+00

Pos logit contributions
 liar+0.065
ves+0.063
fire+0.062
 troll+0.061
ork+0.060
Neg logit contributions
 both-0.038
el-0.035
 all-0.034
 each-0.033
efully-0.032
Token let
Token ID1309
Feature activation-1.58e+00

Pos logit contributions
ves+0.054
ork+0.051
fire+0.048
 liar+0.047
con+0.046
Neg logit contributions
 forth-0.042
 both-0.042
 each-0.041
efully-0.040
inding-0.039
Token anyone
Token ID2687
Feature activation-4.78e+00

Pos logit contributions
ves+0.054
fire+0.053
 liar+0.052
 troll+0.050
ork+0.049
Neg logit contributions
 both-0.023
 each-0.022
 forth-0.021
el-0.021
efully-0.020
Token find
Token ID1064
Feature activation-3.03e+00

Pos logit contributions
ves+0.144
ork+0.136
 liar+0.130
fire+0.128
ggy+0.126
Neg logit contributions
 forth-0.088
efully-0.087
 both-0.084
el-0.082
inding-0.079
Token you
Token ID345
Feature activation-2.76e+00

Pos logit contributions
ves+0.110
ork+0.106
fire+0.106
 liar+0.105
ata+0.101
Neg logit contributions
 both-0.041
el-0.040
 each-0.039
efully-0.036
 all-0.035
Token!
Token ID0
Feature activation-9.55e-01

Pos logit contributions
ves+0.060
ork+0.054
fire+0.052
 liar+0.052
ggy+0.052
Neg logit contributions
efully-0.076
 both-0.073
 forth-0.071
ipp-0.071
inding-0.071
Token I
Token ID314
Feature activation+2.10e-01

Pos logit contributions
ves+0.027
ork+0.025
fire+0.025
ggy+0.025
 liar+0.025
Neg logit contributions
efully-0.019
ipp-0.018
inding-0.018
ummy-0.018
 forth-0.017
Token'll
Token ID1183
Feature activation+6.27e-01

Pos logit contributions
efully+0.004
ipp+0.004
inding+0.004
 forth+0.004
ummy+0.004
Neg logit contributions
ves-0.006
fire-0.006
ggy-0.006
ork-0.006
 liar-0.006
Token her
Token ID607
Feature activation+4.34e-01

Pos logit contributions
efully+0.007
ummy+0.007
ipp+0.007
 forth+0.006
inding+0.006
Neg logit contributions
ves-0.020
fire-0.019
ork-0.019
ggy-0.019
 liar-0.019
Token mom
Token ID1995
Feature activation+7.81e-01

Pos logit contributions
ummy+0.007
efully+0.007
ipp+0.007
 both+0.007
inding+0.007
Neg logit contributions
ves-0.014
fire-0.013
ork-0.012
ggy-0.012
 liar-0.012
Token if
Token ID611
Feature activation+3.93e-02

Pos logit contributions
efully+0.012
inding+0.010
ipp+0.010
 forth+0.010
ummy+0.010
Neg logit contributions
ves-0.026
fire-0.024
ork-0.024
ggy-0.024
 liar-0.024
Token she
Token ID673
Feature activation-1.45e+00

Pos logit contributions
 forth+0.001
efully+0.001
ummy+0.001
 both+0.001
ipp+0.001
Neg logit contributions
ves-0.001
ork-0.001
fire-0.001
 liar-0.001
ata-0.001
Token could
Token ID714
Feature activation-3.18e-01

Pos logit contributions
ves+0.048
fire+0.046
ork+0.044
ggy+0.043
 liar+0.043
Neg logit contributions
efully-0.022
ipp-0.021
ummy-0.020
inding-0.020
 both-0.019
Token have
Token ID423
Feature activation-4.78e+00

Pos logit contributions
ves+0.010
fire+0.010
 liar+0.009
ork+0.009
 troll+0.009
Neg logit contributions
efully-0.005
ipp-0.005
 forth-0.005
ummy-0.005
inding-0.005
Token one
Token ID530
Feature activation-8.07e-01

Pos logit contributions
fire+0.153
ves+0.142
 troll+0.139
 liar+0.139
ork+0.135
Neg logit contributions
 both-0.091
efully-0.088
 each-0.086
 forth-0.084
 all-0.081
Token too
Token ID1165
Feature activation-2.85e+00

Pos logit contributions
ves+0.018
ggy+0.016
ork+0.016
fire+0.016
 liar+0.015
Neg logit contributions
efully-0.022
ipp-0.020
inding-0.020
ummy-0.020
 forth-0.020
Token.
Token ID13
Feature activation+9.17e-01

Pos logit contributions
ves+0.062
fire+0.060
con+0.054
ork+0.052
 villains+0.052
Neg logit contributions
ummy-0.081
efully-0.078
 both-0.078
ipp-0.076
inding-0.075
Token Her
Token ID2332
Feature activation+9.66e-01

Pos logit contributions
efully+0.022
ipp+0.020
inding+0.020
 forth+0.020
ummy+0.020
Neg logit contributions
ves-0.023
ork-0.021
ggy-0.020
fire-0.020
 liar-0.020
Token mom
Token ID1995
Feature activation+4.45e-02

Pos logit contributions
efully+0.024
ipp+0.022
asured+0.021
inding+0.021
el+0.021
Neg logit contributions
ves-0.023
ggy-0.022
ork-0.022
 liar-0.021
 jail-0.021
Token cautiously
Token ID39640
Feature activation-2.72e+00

Pos logit contributions
ves+0.021
fire+0.019
 liar+0.019
ork+0.019
ggy+0.019
Neg logit contributions
efully-0.018
ipp-0.017
inding-0.017
 forth-0.017
ummy-0.017
Token went
Token ID1816
Feature activation-1.50e+00

Pos logit contributions
ves+0.087
 liar+0.086
ggy+0.081
 villains+0.081
fire+0.081
Neg logit contributions
efully-0.046
el-0.045
 forth-0.042
 both-0.040
 each-0.038
Token closer
Token ID5699
Feature activation-1.71e+00

Pos logit contributions
ves+0.046
 liar+0.043
ggy+0.043
fire+0.042
ork+0.042
Neg logit contributions
efully-0.027
 forth-0.025
ipp-0.024
ummy-0.024
inding-0.024
Token to
Token ID284
Feature activation-1.39e+00

Pos logit contributions
ves+0.051
 liar+0.050
ggy+0.047
fire+0.047
ork+0.046
Neg logit contributions
efully-0.033
ummy-0.031
 forth-0.030
el-0.030
 both-0.030
Token take
Token ID1011
Feature activation-2.56e+00

Pos logit contributions
ves+0.043
 liar+0.039
ggy+0.039
ork+0.038
fire+0.038
Neg logit contributions
efully-0.026
 forth-0.025
 both-0.025
el-0.024
 each-0.024
Token a
Token ID257
Feature activation-4.71e+00

Pos logit contributions
 liar+0.090
ves+0.089
 villains+0.086
fire+0.086
 inconvenience+0.083
Neg logit contributions
el-0.039
 both-0.039
efully-0.038
 all-0.036
 each-0.035
Token look
Token ID804
Feature activation-1.61e+00

Pos logit contributions
ves+0.158
 liar+0.147
 villains+0.146
fire+0.144
ggy+0.143
Neg logit contributions
 forth-0.062
efully-0.058
 absolutely-0.057
ummy-0.057
 favourite-0.056
Token.
Token ID13
Feature activation-1.08e+00

Pos logit contributions
 liar+0.035
 villains+0.034
ves+0.034
fire+0.033
ggy+0.032
Neg logit contributions
ummy-0.041
el-0.041
 forth-0.040
efully-0.040
 both-0.039
Token Inside
Token ID14384
Feature activation+2.00e+00

Pos logit contributions
ves+0.037
 liar+0.035
ggy+0.034
fire+0.034
 villains+0.033
Neg logit contributions
efully-0.017
ummy-0.015
ipp-0.015
 both-0.015
inding-0.015
Token there
Token ID612
Feature activation+5.53e-01

Pos logit contributions
efully+0.043
ummy+0.041
ipp+0.041
el+0.040
 forth+0.040
Neg logit contributions
ves-0.053
fire-0.050
 liar-0.050
ggy-0.049
ork-0.048
Token was
Token ID373
Feature activation-1.43e+00

Pos logit contributions
efully+0.010
ipp+0.010
ummy+0.010
el+0.010
inding+0.009
Neg logit contributions
ves-0.016
 liar-0.015
fire-0.015
ggy-0.015
ork-0.015
Token the
Token ID262
Feature activation-3.32e-01

Pos logit contributions
efully+0.005
ipp+0.005
ummy+0.004
 forth+0.004
 both+0.004
Neg logit contributions
ves-0.020
fire-0.020
ork-0.019
ggy-0.019
 liar-0.019
Token tree
Token ID5509
Feature activation-1.53e-01

Pos logit contributions
ves+0.009
fire+0.008
ork+0.008
ggy+0.008
 liar+0.008
Neg logit contributions
efully-0.007
ipp-0.007
ummy-0.007
inding-0.006
 forth-0.006
Token if
Token ID611
Feature activation+5.64e-02

Pos logit contributions
ves+0.005
fire+0.004
ggy+0.004
 liar+0.004
ork+0.004
Neg logit contributions
ummy-0.003
efully-0.003
ipp-0.002
inding-0.002
el-0.002
Token she
Token ID673
Feature activation-8.02e-01

Pos logit contributions
 both+0.001
 forth+0.001
 each+0.001
 all+0.001
 nic+0.001
Neg logit contributions
ves-0.002
fire-0.002
ork-0.002
ata-0.002
ggy-0.002
Token could
Token ID714
Feature activation-5.82e-02

Pos logit contributions
ves+0.025
fire+0.023
ork+0.023
ggy+0.023
 liar+0.023
Neg logit contributions
efully-0.014
ipp-0.013
ummy-0.013
inding-0.013
 forth-0.012
Token have
Token ID423
Feature activation-4.70e+00

Pos logit contributions
ves+0.002
fire+0.002
 liar+0.002
ork+0.002
 villains+0.002
Neg logit contributions
efully-0.001
ipp-0.001
ummy-0.001
 both-0.001
el-0.001
Token some
Token ID617
Feature activation-1.85e+00

Pos logit contributions
fire+0.146
ves+0.145
 villains+0.137
ork+0.136
 liar+0.136
Neg logit contributions
 both-0.087
efully-0.084
 each-0.084
 all-0.080
el-0.077
Token.
Token ID13
Feature activation+3.64e-01

Pos logit contributions
ves+0.048
fire+0.045
ork+0.044
ggy+0.044
 liar+0.043
Neg logit contributions
efully-0.042
ummy-0.040
ipp-0.040
 forth-0.039
inding-0.039
Token The
Token ID383
Feature activation-4.68e-02

Pos logit contributions
efully+0.007
ipp+0.007
ummy+0.007
 both+0.007
el+0.007
Neg logit contributions
ves-0.011
 liar-0.010
ggy-0.010
fire-0.010
 villains-0.010
Token tree
Token ID5509
Feature activation+4.83e-01

Pos logit contributions
ves+0.001
fire+0.001
ork+0.001
ggy+0.001
 liar+0.001
Neg logit contributions
efully-0.001
ipp-0.001
ummy-0.001
 forth-0.001
inding-0.001
Token replied
Token ID8712
Feature activation+1.03e+00

Pos logit contributions
efully+0.011
ipp+0.010
ummy+0.010
inding+0.010
 forth+0.009
Neg logit contributions
ves-0.013
ggy-0.012
ork-0.012
fire-0.012
 liar-0.011
Token went
Token ID1816
Feature activation-1.14e+00

Pos logit contributions
ves+0.072
ork+0.065
fire+0.063
ggy+0.063
 liar+0.060
Neg logit contributions
 forth-0.048
efully-0.047
ipp-0.045
asured-0.044
inding-0.043
Token anywhere
Token ID6609
Feature activation-2.98e+00

Pos logit contributions
ves+0.040
fire+0.037
 liar+0.036
ggy+0.036
ork+0.036
Neg logit contributions
 forth-0.016
efully-0.016
 both-0.015
ummy-0.014
asured-0.014
Token without
Token ID1231
Feature activation-1.76e+00

Pos logit contributions
ves+0.090
fire+0.084
 troll+0.084
 liar+0.082
ork+0.082
Neg logit contributions
 both-0.052
 all-0.051
 forth-0.050
 especially-0.049
 each-0.049
Token her
Token ID607
Feature activation-1.86e-01

Pos logit contributions
ves+0.062
ork+0.058
fire+0.057
ggy+0.055
 liar+0.054
Neg logit contributions
efully-0.025
 forth-0.025
 both-0.024
el-0.023
asured-0.023
Token brave
Token ID14802
Feature activation-5.63e-01

Pos logit contributions
ves+0.006
fire+0.005
ork+0.005
ggy+0.005
 liar+0.005
Neg logit contributions
efully-0.003
ummy-0.003
 forth-0.003
inding-0.003
ipp-0.003
Token policeman
Token ID32166
Feature activation-4.69e+00

Pos logit contributions
ves+0.019
fire+0.017
ggy+0.017
ork+0.016
ata+0.016
Neg logit contributions
ummy-0.009
efully-0.009
 forth-0.009
el-0.009
inding-0.008
Token there
Token ID612
Feature activation-9.29e-01

Pos logit contributions
ves+0.126
ork+0.111
fire+0.109
 liar+0.106
 troll+0.106
Neg logit contributions
 forth-0.101
 both-0.099
 absolutely-0.095
efully-0.094
asured-0.094
Token to
Token ID284
Feature activation-4.97e-01

Pos logit contributions
ves+0.022
ork+0.019
 liar+0.019
ggy+0.019
fire+0.019
Neg logit contributions
efully-0.022
 forth-0.022
ummy-0.022
 both-0.022
ipp-0.022
Token keep
Token ID1394
Feature activation-6.47e-01

Pos logit contributions
ves+0.015
ork+0.014
 liar+0.014
ata+0.013
fire+0.013
Neg logit contributions
efully-0.009
 forth-0.009
 both-0.008
 definitely-0.008
el-0.008
Token her
Token ID607
Feature activation+1.76e-01

Pos logit contributions
ves+0.023
ork+0.022
 liar+0.022
fire+0.022
ggy+0.021
Neg logit contributions
efully-0.008
 forth-0.008
el-0.007
ummy-0.007
inding-0.007
Token safe
Token ID3338
Feature activation+8.91e-01

Pos logit contributions
efully+0.003
 forth+0.003
ummy+0.003
inding+0.003
ipp+0.003
Neg logit contributions
ves-0.005
ork-0.005
fire-0.005
ggy-0.005
 liar-0.005
Token€
Token ID26391
Feature activation-3.92e+00

Pos logit contributions
ves+0.035
 liar+0.032
ork+0.031
l+0.031
 traps+0.030
Neg logit contributions
efully-0.025
ipp-0.023
 definitely-0.022
initely-0.022
inding-0.022
Token�
Token ID447
Feature activation-7.85e-01

Pos logit contributions
 liar+0.103
ves+0.100
 traps+0.099
 inconvenience+0.098
fire+0.096
Neg logit contributions
 definitely-0.076
ipp-0.074
 both-0.074
inding-0.073
efully-0.072
Token�
Token ID250
Feature activation-1.83e-01

Pos logit contributions
ves+0.017
ggy+0.015
ork+0.015
 liar+0.015
fire+0.015
Neg logit contributions
efully-0.022
ipp-0.021
ummy-0.020
 forth-0.020
inding-0.020
Token let
Token ID1309
Feature activation-2.79e-01

Pos logit contributions
ves+0.005
ork+0.005
ggy+0.005
fire+0.005
 liar+0.005
Neg logit contributions
efully-0.004
ipp-0.004
inding-0.003
 forth-0.003
ummy-0.003
Tokenâ
Token ID22940
Feature activation-1.76e+00

Pos logit contributions
ves+0.009
fire+0.009
 liar+0.009
 villains+0.008
ork+0.008
Neg logit contributions
efully-0.004
el-0.004
ipp-0.004
 forth-0.004
 both-0.004
Token€
Token ID26391
Feature activation-4.67e+00

Pos logit contributions
ves+0.046
ork+0.042
 liar+0.042
 troll+0.040
ggy+0.040
Neg logit contributions
efully-0.041
ipp-0.038
 forth-0.038
initely-0.037
inding-0.036
Token™
Token ID8151
Feature activation-2.85e+00

Pos logit contributions
 liar+0.127
ves+0.126
 traps+0.123
fire+0.121
 bomb+0.121
Neg logit contributions
 both-0.083
 definitely-0.083
ipp-0.081
efully-0.080
inding-0.078
Tokens
Token ID82
Feature activation+3.45e-01

Pos logit contributions
ves+0.083
fire+0.083
 villains+0.081
 traps+0.081
 troll+0.080
Neg logit contributions
ipp-0.057
ummy-0.056
efully-0.051
 both-0.051
el-0.049
Token explore
Token ID7301
Feature activation-1.00e+00

Pos logit contributions
efully+0.007
ipp+0.006
ummy+0.006
inding+0.006
 forth+0.006
Neg logit contributions
ves-0.010
ggy-0.009
fire-0.009
ork-0.009
 liar-0.009
Token together
Token ID1978
Feature activation-1.49e-01

Pos logit contributions
ves+0.026
 liar+0.024
ork+0.024
ggy+0.024
fire+0.024
Neg logit contributions
efully-0.023
ummy-0.021
 forth-0.021
ipp-0.021
el-0.021
Token!
Token ID0
Feature activation-1.64e+00

Pos logit contributions
ves+0.003
ork+0.003
 liar+0.003
ggy+0.003
fire+0.003
Neg logit contributions
efully-0.004
ipp-0.004
ummy-0.004
 forth-0.004
inding-0.004
Token't
Token ID470
Feature activation-7.93e-02

Pos logit contributions
efully+0.011
ipp+0.010
ummy+0.010
inding+0.010
el+0.010
Neg logit contributions
ves-0.009
fire-0.008
 liar-0.008
ggy-0.008
ork-0.008
Token like
Token ID588
Feature activation+3.27e-01

Pos logit contributions
ves+0.002
fire+0.002
ork+0.002
con+0.002
ggy+0.002
Neg logit contributions
 both-0.001
efully-0.001
 forth-0.001
asured-0.001
ipp-0.001
Token all
Token ID477
Feature activation-5.34e-01

Pos logit contributions
efully+0.007
ipp+0.007
inding+0.007
ummy+0.007
 forth+0.007
Neg logit contributions
ves-0.009
ork-0.008
ggy-0.008
fire-0.008
 liar-0.008
Token of
Token ID286
Feature activation+5.88e-01

Pos logit contributions
ves+0.016
fire+0.015
ork+0.015
ggy+0.015
 liar+0.014
Neg logit contributions
efully-0.010
ummy-0.009
 forth-0.009
ipp-0.009
inding-0.009
Token the
Token ID262
Feature activation-1.68e+00

Pos logit contributions
efully+0.010
ipp+0.010
ummy+0.009
inding+0.009
 forth+0.009
Neg logit contributions
ves-0.018
fire-0.017
ork-0.017
ggy-0.017
 liar-0.017
Token unnecessary
Token ID13114
Feature activation-4.67e+00

Pos logit contributions
ves+0.051
fire+0.047
ork+0.047
ggy+0.047
 liar+0.045
Neg logit contributions
efully-0.030
 forth-0.029
ummy-0.028
ipp-0.027
 both-0.027
Token extravag
Token ID36827
Feature activation+2.83e-02

Pos logit contributions
ves+0.127
fire+0.121
ork+0.120
ggy+0.115
ata+0.112
Neg logit contributions
 both-0.087
 each-0.087
 all-0.087
efully-0.082
 forth-0.080
Tokenance
Token ID590
Feature activation-1.71e-01

Pos logit contributions
efully+0.001
ipp+0.001
ummy+0.001
inding+0.001
 forth+0.001
Neg logit contributions
ves-0.001
fire-0.001
ggy-0.001
ork-0.001
 liar-0.001
Token.
Token ID13
Feature activation+7.14e-01

Pos logit contributions
ves+0.004
fire+0.004
ork+0.004
ggy+0.004
 liar+0.004
Neg logit contributions
efully-0.004
ipp-0.004
ummy-0.004
 forth-0.004
inding-0.004
Token
Token ID198
Feature activation+2.44e-01

Pos logit contributions
ummy+0.016
efully+0.016
ipp+0.015
el+0.015
inding+0.014
Neg logit contributions
ves-0.020
 liar-0.019
fire-0.019
ggy-0.018
 killed-0.017
Token
Token ID198
Feature activation+1.14e+00

Pos logit contributions
ummy+0.005
efully+0.004
 both+0.004
ipp+0.004
inding+0.004
Neg logit contributions
 liar-0.007
fire-0.007
 troll-0.007
 killed-0.007
 traps-0.007
Token and
Token ID290
Feature activation-7.47e-01

Pos logit contributions
ves+0.023
 liar+0.021
ork+0.020
fire+0.020
ggy+0.020
Neg logit contributions
efully-0.029
ipp-0.028
 forth-0.028
inding-0.028
ummy-0.027
Token he
Token ID339
Feature activation-1.55e-02

Pos logit contributions
ves+0.021
fire+0.019
ggy+0.019
ork+0.019
 liar+0.019
Neg logit contributions
efully-0.016
ipp-0.015
ummy-0.015
inding-0.014
el-0.014
Token would
Token ID561
Feature activation+1.74e+00

Pos logit contributions
ves+0.000
ggy+0.000
fire+0.000
ork+0.000
 liar+0.000
Neg logit contributions
efully-0.000
ummy-0.000
ipp-0.000
inding-0.000
 forth-0.000
Token never
Token ID1239
Feature activation-1.98e+00

Pos logit contributions
efully+0.038
ummy+0.035
ipp+0.034
inding+0.034
aleb+0.033
Neg logit contributions
ves-0.046
ggy-0.044
fire-0.043
ork-0.043
 liar-0.042
Token let
Token ID1309
Feature activation-1.81e+00

Pos logit contributions
ves+0.066
ork+0.060
fire+0.060
ggy+0.059
 liar+0.057
Neg logit contributions
efully-0.031
 forth-0.030
ipp-0.028
inding-0.028
asured-0.028
Token anyone
Token ID2687
Feature activation-4.64e+00

Pos logit contributions
ves+0.056
fire+0.056
 liar+0.055
 troll+0.052
ork+0.052
Neg logit contributions
 both-0.031
 each-0.029
 all-0.029
el-0.027
ipp-0.026
Token else
Token ID2073
Feature activation-2.35e+00

Pos logit contributions
ves+0.112
ork+0.102
fire+0.100
 liar+0.100
ggy+0.099
Neg logit contributions
efully-0.112
 forth-0.108
ummy-0.105
ipp-0.105
inding-0.105
Token get
Token ID651
Feature activation-1.79e+00

Pos logit contributions
ves+0.056
ggy+0.051
fire+0.051
ork+0.051
 liar+0.050
Neg logit contributions
efully-0.058
ipp-0.055
inding-0.054
ummy-0.054
 forth-0.053
Token near
Token ID1474
Feature activation-8.42e-01

Pos logit contributions
ves+0.052
fire+0.048
ggy+0.048
ork+0.047
 liar+0.047
Neg logit contributions
efully-0.035
 both-0.035
el-0.034
 forth-0.034
ipp-0.034
Token it
Token ID340
Feature activation-1.31e+00

Pos logit contributions
ves+0.023
fire+0.022
 liar+0.022
ata+0.022
ork+0.021
Neg logit contributions
el-0.018
 both-0.017
ummy-0.017
 each-0.017
 forth-0.017
Token.
Token ID13
Feature activation-3.94e-01

Pos logit contributions
ves+0.026
 liar+0.023
ggy+0.023
fire+0.022
ork+0.022
Neg logit contributions
efully-0.038
 forth-0.037
 both-0.037
ummy-0.037
el-0.037
Token her
Token ID607
Feature activation+7.13e-01

Pos logit contributions
efully+0.012
ummy+0.012
ipp+0.012
 both+0.012
 forth+0.012
Neg logit contributions
ves-0.020
fire-0.019
ork-0.019
ggy-0.019
 liar-0.018
Token mom
Token ID1995
Feature activation+3.84e-01

Pos logit contributions
efully+0.011
ipp+0.011
ummy+0.010
inding+0.010
 forth+0.010
Neg logit contributions
ves-0.023
fire-0.022
ork-0.022
ggy-0.021
 liar-0.021
Token if
Token ID611
Feature activation+2.40e-01

Pos logit contributions
efully+0.008
ipp+0.008
inding+0.007
 forth+0.007
ummy+0.007
Neg logit contributions
ves-0.010
fire-0.010
ork-0.010
ggy-0.010
 liar-0.009
Token she
Token ID673
Feature activation-1.17e+00

Pos logit contributions
efully+0.005
 forth+0.005
ipp+0.005
ummy+0.005
 both+0.005
Neg logit contributions
ves-0.006
fire-0.005
ork-0.005
ggy-0.005
 liar-0.005
Token could
Token ID714
Feature activation-3.27e-01

Pos logit contributions
ves+0.036
fire+0.034
ork+0.033
ggy+0.033
 liar+0.033
Neg logit contributions
efully-0.021
ipp-0.019
ummy-0.019
inding-0.019
 forth-0.018
Token have
Token ID423
Feature activation-4.63e+00

Pos logit contributions
ves+0.009
fire+0.009
ork+0.009
 liar+0.009
ggy+0.009
Neg logit contributions
efully-0.006
ipp-0.006
ummy-0.006
 forth-0.006
inding-0.006
Token one
Token ID530
Feature activation-4.47e-01

Pos logit contributions
fire+0.137
ves+0.133
 troll+0.130
 liar+0.129
ork+0.127
Neg logit contributions
 both-0.089
efully-0.087
 each-0.086
ipp-0.082
el-0.081
Token too
Token ID1165
Feature activation-2.67e+00

Pos logit contributions
ves+0.010
ggy+0.009
ork+0.009
fire+0.009
 liar+0.008
Neg logit contributions
efully-0.012
inding-0.011
ipp-0.011
 forth-0.011
ummy-0.011
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
ves+0.058
fire+0.055
con+0.050
 villains+0.050
ork+0.050
Neg logit contributions
ummy-0.075
 both-0.073
efully-0.073
ipp-0.071
inding-0.070
Token Her
Token ID2332
Feature activation+1.48e+00

Pos logit contributions
efully+0.029
 forth+0.026
ipp+0.026
inding+0.026
ummy+0.026
Neg logit contributions
ves-0.042
ork-0.039
ggy-0.039
fire-0.039
 liar-0.038
Token mom
Token ID1995
Feature activation+1.42e-01

Pos logit contributions
efully+0.039
ipp+0.035
inding+0.035
initely+0.034
asured+0.034
Neg logit contributions
ggy-0.034
ves-0.034
ork-0.031
 liar-0.031
 jail-0.029
Token mom
Token ID1995
Feature activation+1.70e+00

Pos logit contributions
efully+0.026
ipp+0.024
inding+0.023
 forth+0.023
el+0.022
Neg logit contributions
ves-0.036
ggy-0.034
ork-0.034
 liar-0.033
 jail-0.032
Tokenmy
Token ID1820
Feature activation+1.28e+00

Pos logit contributions
efully+0.033
inding+0.027
initely+0.027
aleb+0.026
ipp+0.026
Neg logit contributions
l-0.047
ves-0.047
,"-0.044
ork-0.043
ata-0.043
Token if
Token ID611
Feature activation+3.88e-01

Pos logit contributions
efully+0.033
 forth+0.030
inding+0.030
ipp+0.029
asured+0.029
Neg logit contributions
ves-0.029
ork-0.026
fire-0.026
l-0.026
ggy-0.026
Token she
Token ID673
Feature activation-1.19e+00

Pos logit contributions
efully+0.009
 forth+0.009
ipp+0.009
ummy+0.009
inding+0.009
Neg logit contributions
ves-0.009
fire-0.009
ork-0.008
ggy-0.008
 liar-0.008
Token could
Token ID714
Feature activation-1.83e-01

Pos logit contributions
ves+0.035
fire+0.032
ggy+0.032
ork+0.032
 liar+0.031
Neg logit contributions
efully-0.023
ipp-0.021
ummy-0.021
inding-0.021
 forth-0.021
Token have
Token ID423
Feature activation-4.62e+00

Pos logit contributions
ves+0.005
fire+0.005
 liar+0.005
ork+0.005
ggy+0.005
Neg logit contributions
efully-0.004
ipp-0.004
ummy-0.003
inding-0.003
 forth-0.003
Token it
Token ID340
Feature activation-7.83e-01

Pos logit contributions
fire+0.138
ves+0.130
 liar+0.125
 troll+0.123
ork+0.122
Neg logit contributions
 both-0.092
efully-0.089
 each-0.086
 forth-0.085
 all-0.083
Token.
Token ID13
Feature activation+1.39e+00

Pos logit contributions
ves+0.015
fire+0.014
 liar+0.014
ork+0.014
 troll+0.013
Neg logit contributions
efully-0.022
ipp-0.022
ummy-0.022
 both-0.022
 forth-0.022
Token Her
Token ID2332
Feature activation+1.93e+00

Pos logit contributions
efully+0.037
asured+0.035
 forth+0.034
inding+0.034
ipp+0.034
Neg logit contributions
ves-0.030
ork-0.029
ggy-0.027
 jail-0.025
con-0.025
Token mom
Token ID1995
Feature activation+1.58e+00

Pos logit contributions
efully+0.056
asured+0.052
ipp+0.052
initely+0.051
ishing+0.050
Neg logit contributions
ggy-0.038
ves-0.038
ork-0.036
 liar-0.032
 jail-0.032
Tokenmy
Token ID1820
Feature activation+5.58e-01

Pos logit contributions
ires+0.037
efully+0.037
umm+0.037
initely+0.035
ummy+0.034
Neg logit contributions
ves-0.033
ggy-0.032
l-0.032
,"-0.032
ata-0.031
Token shop
Token ID6128
Feature activation+1.80e-01

Pos logit contributions
ves+0.017
ork+0.015
fire+0.015
ggy+0.015
ata+0.014
Neg logit contributions
efully-0.014
ummy-0.013
ipp-0.013
 forth-0.012
inding-0.012
Tokenkeeper
Token ID13884
Feature activation-4.21e-02

Pos logit contributions
efully+0.005
ipp+0.005
ummy+0.005
inding+0.005
 forth+0.005
Neg logit contributions
ves-0.004
ggy-0.003
ork-0.003
fire-0.003
 liar-0.003
Token if
Token ID611
Feature activation+3.50e-01

Pos logit contributions
ves+0.001
ggy+0.001
fire+0.001
ork+0.001
 liar+0.001
Neg logit contributions
efully-0.001
ipp-0.001
ummy-0.001
inding-0.001
 forth-0.001
Token they
Token ID484
Feature activation-7.00e-01

Pos logit contributions
efully+0.008
 forth+0.008
ummy+0.008
ipp+0.008
inding+0.008
Neg logit contributions
ves-0.009
fire-0.008
ork-0.008
ggy-0.008
 liar-0.008
Token could
Token ID714
Feature activation-8.33e-02

Pos logit contributions
ves+0.020
fire+0.018
ork+0.018
ggy+0.018
 liar+0.018
Neg logit contributions
efully-0.014
ipp-0.013
ummy-0.013
inding-0.013
 forth-0.013
Token have
Token ID423
Feature activation-4.60e+00

Pos logit contributions
ves+0.002
fire+0.002
ork+0.002
ggy+0.002
 liar+0.002
Neg logit contributions
efully-0.002
ipp-0.002
ummy-0.002
inding-0.002
 forth-0.002
Token a
Token ID257
Feature activation-3.32e+00

Pos logit contributions
fire+0.136
ves+0.136
 liar+0.131
 villains+0.130
ork+0.127
Neg logit contributions
 both-0.089
 each-0.086
efully-0.084
 all-0.079
el-0.078
Token little
Token ID1310
Feature activation-2.49e+00

Pos logit contributions
ves+0.098
 villains+0.089
fire+0.089
ggy+0.088
 liar+0.085
Neg logit contributions
ummy-0.060
ipp-0.058
efully-0.058
 forth-0.057
 favourite-0.056
Token pie
Token ID2508
Feature activation-3.36e-01

Pos logit contributions
ves+0.073
ggy+0.064
ork+0.063
fire+0.062
 villains+0.061
Neg logit contributions
 forth-0.050
efully-0.050
ummy-0.049
ish-0.047
el-0.046
Token.
Token ID13
Feature activation+9.61e-01

Pos logit contributions
ves+0.008
 liar+0.007
ork+0.007
ggy+0.007
fire+0.007
Neg logit contributions
efully-0.008
ipp-0.008
 forth-0.008
ummy-0.008
inding-0.008
Token
Token ID220
Feature activation-1.33e+00

Pos logit contributions
efully+0.024
ipp+0.023
 forth+0.023
inding+0.023
asured+0.022
Neg logit contributions
ves-0.022
ork-0.020
fire-0.020
ggy-0.019
 jail-0.019
Token.
Token ID13
Feature activation+6.79e-02

Pos logit contributions
efully+0.023
ipp+0.022
inding+0.022
ummy+0.022
 forth+0.022
Neg logit contributions
ves-0.018
fire-0.016
ork-0.016
ggy-0.016
 liar-0.016
Token They
Token ID1119
Feature activation+9.73e-01

Pos logit contributions
efully+0.001
ipp+0.001
ummy+0.001
inding+0.001
 forth+0.001
Neg logit contributions
ves-0.002
ggy-0.002
fire-0.002
ork-0.002
 liar-0.002
Token put
Token ID1234
Feature activation+1.03e+00

Pos logit contributions
ummy+0.025
efully+0.025
inding+0.024
ipp+0.023
el+0.023
Neg logit contributions
ves-0.021
fire-0.020
 liar-0.019
 killed-0.019
 troll-0.019
Token the
Token ID262
Feature activation+8.26e-01

Pos logit contributions
efully+0.024
inding+0.023
ipp+0.023
ummy+0.023
asured+0.022
Neg logit contributions
ves-0.025
fire-0.024
ork-0.023
ggy-0.022
 liar-0.022
Token dirty
Token ID11841
Feature activation+2.43e+00

Pos logit contributions
efully+0.021
ipp+0.020
inding+0.020
amb+0.019
asured+0.019
Neg logit contributions
ves-0.020
fire-0.019
 liar-0.019
ork-0.018
 jail-0.018
Token plates
Token ID13375
Feature activation+2.11e+00

Pos logit contributions
efully+0.057
inding+0.052
ipp+0.052
asured+0.050
 definitely+0.049
Neg logit contributions
ves-0.063
fire-0.061
ork-0.060
 liar-0.057
 jail-0.056
Token and
Token ID290
Feature activation+2.94e-01

Pos logit contributions
efully+0.047
inding+0.044
ipp+0.043
aleb+0.042
asured+0.042
Neg logit contributions
ves-0.055
fire-0.054
ork-0.052
l-0.050
ata-0.049
Token cups
Token ID14180
Feature activation+1.53e+00

Pos logit contributions
efully+0.007
ipp+0.007
inding+0.007
ummy+0.007
amb+0.007
Neg logit contributions
ves-0.007
fire-0.007
 traps-0.006
 jail-0.006
 killed-0.006
Token inside
Token ID2641
Feature activation+1.18e+00

Pos logit contributions
efully+0.029
ipp+0.026
inding+0.026
ummy+0.026
 forth+0.026
Neg logit contributions
ves-0.046
fire-0.043
ork-0.042
ggy-0.042
 liar-0.041
Token the
Token ID262
Feature activation+3.17e-01

Pos logit contributions
efully+0.031
ummy+0.030
ipp+0.030
 forth+0.030
 both+0.030
Neg logit contributions
ves-0.025
ggy-0.023
 liar-0.023
ork-0.023
fire-0.022
Token dish
Token ID9433
Feature activation+1.51e+00

Pos logit contributions
efully+0.007
ipp+0.007
inding+0.007
asured+0.006
 forth+0.006
Neg logit contributions
ves-0.008
fire-0.008
ork-0.008
 jail-0.008
 liar-0.008
Token anything
Token ID1997
Feature activation-1.16e+00

Pos logit contributions
fire+0.019
ves+0.019
 liar+0.018
con+0.018
 troll+0.018
Neg logit contributions
efully-0.016
 both-0.016
ipp-0.015
 absolutely-0.015
inding-0.015
Token!
Token ID0
Feature activation+3.22e-01

Pos logit contributions
ves+0.032
ork+0.030
 liar+0.029
ggy+0.029
fire+0.029
Neg logit contributions
efully-0.022
 both-0.022
 absolutely-0.021
ipp-0.021
 forth-0.021
Token
Token ID198
Feature activation+4.36e-01

Pos logit contributions
efully+0.007
 forth+0.007
ipp+0.007
inding+0.007
ummy+0.007
Neg logit contributions
ves-0.008
ork-0.008
fire-0.007
ggy-0.007
 liar-0.007
Token
Token ID198
Feature activation+1.39e+00

Pos logit contributions
efully+0.011
ummy+0.010
ipp+0.010
inding+0.010
 both+0.010
Neg logit contributions
 liar-0.010
ves-0.010
fire-0.009
 troll-0.009
ork-0.009
TokenSarah
Token ID29284
Feature activation+9.68e-01

Pos logit contributions
efully+0.033
ipp+0.032
ummy+0.032
 forth+0.031
el+0.031
Neg logit contributions
ves-0.036
ork-0.032
ggy-0.032
fire-0.031
ata-0.029
Token and
Token ID290
Feature activation+3.04e+00

Pos logit contributions
efully+0.021
ummy+0.020
ipp+0.019
inding+0.019
initely+0.019
Neg logit contributions
ves-0.025
ggy-0.024
ork-0.023
fire-0.023
ata-0.022
Token Jack
Token ID3619
Feature activation+4.03e-02

Pos logit contributions
efully+0.077
asured+0.076
ipp+0.076
 prett+0.073
iring+0.073
Neg logit contributions
 jail-0.060
 troll-0.056
ggy-0.054
 arrest-0.053
ves-0.053
Token kept
Token ID4030
Feature activation-2.98e-02

Pos logit contributions
ummy+0.001
efully+0.001
inding+0.001
ires+0.001
iring+0.001
Neg logit contributions
ves-0.001
ggy-0.001
fire-0.001
 liar-0.001
ata-0.001
Token joking
Token ID29711
Feature activation+3.36e-01

Pos logit contributions
ggy+0.001
ves+0.001
 jail+0.001
 crashes+0.001
 killed+0.001
Neg logit contributions
ummy-0.001
ipp-0.001
efully-0.001
ust-0.001
iring-0.001
Token until
Token ID1566
Feature activation+6.63e-01

Pos logit contributions
efully+0.006
ipp+0.006
inding+0.006
ummy+0.006
 forth+0.006
Neg logit contributions
ves-0.010
fire-0.009
ork-0.009
ggy-0.009
 liar-0.009
Token the
Token ID262
Feature activation-3.11e-01

Pos logit contributions
efully+0.016
 forth+0.015
ipp+0.015
ummy+0.015
inding+0.015
Neg logit contributions
ves-0.016
fire-0.015
ggy-0.015
ork-0.015
 liar-0.015
Token boy
Token ID2933
Feature activation-1.39e+00

Pos logit contributions
efully+0.015
ipp+0.014
inding+0.014
ummy+0.014
 forth+0.014
Neg logit contributions
ves-0.018
fire-0.016
ggy-0.016
ork-0.016
 liar-0.016
Token named
Token ID3706
Feature activation-5.66e-01

Pos logit contributions
ves+0.030
fire+0.027
ork+0.026
con+0.026
 liar+0.026
Neg logit contributions
el-0.038
efully-0.037
 forth-0.036
ummy-0.036
inding-0.035
Token Tim
Token ID5045
Feature activation+2.65e-01

Pos logit contributions
ves+0.014
ggy+0.013
fire+0.013
ork+0.012
 liar+0.012
Neg logit contributions
efully-0.014
ipp-0.013
inding-0.013
 forth-0.013
asured-0.012
Token found
Token ID1043
Feature activation-1.66e-01

Pos logit contributions
efully+0.005
ipp+0.005
 forth+0.005
inding+0.005
ummy+0.005
Neg logit contributions
ves-0.008
fire-0.007
ork-0.007
ggy-0.007
 liar-0.007
Token a
Token ID257
Feature activation-4.28e-01

Pos logit contributions
ves+0.005
fire+0.004
ork+0.004
 liar+0.004
ggy+0.004
Neg logit contributions
efully-0.004
ipp-0.003
el-0.003
 both-0.003
ummy-0.003
Token big
Token ID1263
Feature activation+2.23e+00

Pos logit contributions
ves+0.011
ork+0.010
 liar+0.010
ggy+0.010
fire+0.010
Neg logit contributions
efully-0.010
ipp-0.009
inding-0.009
 forth-0.009
asured-0.009
Token box
Token ID3091
Feature activation+4.37e-01

Pos logit contributions
efully+0.066
ipp+0.058
asured+0.057
inding+0.057
ires+0.053
Neg logit contributions
 jail-0.048
ork-0.045
 liar-0.045
 troll-0.044
ves-0.043
Token.
Token ID13
Feature activation+9.90e-01

Pos logit contributions
efully+0.009
ipp+0.008
inding+0.008
ummy+0.008
 forth+0.008
Neg logit contributions
ves-0.012
fire-0.011
ggy-0.011
ork-0.011
 liar-0.011
Token It
Token ID632
Feature activation+1.39e+00

Pos logit contributions
efully+0.027
 forth+0.026
ipp+0.025
aleb+0.025
inding+0.024
Neg logit contributions
ves-0.020
 jail-0.020
ork-0.019
fire-0.018
ggy-0.018
Token was
Token ID373
Feature activation+2.23e+00

Pos logit contributions
efully+0.040
ipp+0.038
inding+0.037
aleb+0.037
 forth+0.037
Neg logit contributions
ves-0.030
ork-0.028
ggy-0.027
fire-0.026
 killed-0.025
Token a
Token ID257
Feature activation+5.56e-01

Pos logit contributions
efully+0.069
aleb+0.064
ipp+0.064
amb+0.061
 forth+0.061
Neg logit contributions
 jail-0.044
ves-0.044
ork-0.042
ggy-0.040
fire-0.039
Token in
Token ID287
Feature activation+4.12e-01

Pos logit contributions
efully+0.007
ummy+0.007
ipp+0.007
inding+0.007
 forth+0.006
Neg logit contributions
ves-0.014
ggy-0.013
fire-0.013
ork-0.013
 liar-0.013
Token a
Token ID257
Feature activation-1.04e+00

Pos logit contributions
efully+0.008
ipp+0.008
ummy+0.008
inding+0.008
el+0.007
Neg logit contributions
ves-0.012
fire-0.011
ggy-0.011
ork-0.011
 liar-0.011
Token towel
Token ID24808
Feature activation+1.64e+00

Pos logit contributions
ves+0.030
fire+0.027
ggy+0.027
ork+0.027
 liar+0.027
Neg logit contributions
efully-0.020
ipp-0.019
ummy-0.019
 forth-0.019
inding-0.019
Token.
Token ID13
Feature activation+2.89e-01

Pos logit contributions
efully+0.041
ipp+0.039
ummy+0.039
 forth+0.039
inding+0.038
Neg logit contributions
ves-0.038
 liar-0.035
ggy-0.035
ork-0.034
fire-0.033
Token She
Token ID1375
Feature activation+7.32e-01

Pos logit contributions
efully+0.005
ummy+0.005
ipp+0.005
inding+0.005
 both+0.005
Neg logit contributions
ves-0.009
fire-0.008
ggy-0.008
ork-0.008
 liar-0.008
Token dried
Token ID16577
Feature activation+2.29e+00

Pos logit contributions
efully+0.015
ummy+0.014
ipp+0.014
inding+0.014
 forth+0.014
Neg logit contributions
ves-0.021
ggy-0.019
fire-0.019
ork-0.019
 liar-0.018
Token him
Token ID683
Feature activation+1.43e+00

Pos logit contributions
efully+0.042
ipp+0.039
ummy+0.038
inding+0.038
 forth+0.038
Neg logit contributions
ves-0.069
fire-0.064
ork-0.064
ggy-0.064
 liar-0.063
Token and
Token ID290
Feature activation-5.95e-01

Pos logit contributions
efully+0.030
ummy+0.027
ipp+0.027
aleb+0.027
 definitely+0.025
Neg logit contributions
ves-0.041
ggy-0.039
ork-0.038
fire-0.038
 jail-0.035
Token dressed
Token ID12049
Feature activation+2.36e+00

Pos logit contributions
ves+0.017
fire+0.016
ork+0.016
ggy+0.016
 liar+0.015
Neg logit contributions
efully-0.012
ipp-0.011
ummy-0.011
inding-0.011
 forth-0.011
Token him
Token ID683
Feature activation+1.30e+00

Pos logit contributions
efully+0.052
inding+0.049
ummy+0.049
ipp+0.048
aleb+0.046
Neg logit contributions
ves-0.062
fire-0.056
ork-0.056
ggy-0.055
l-0.054
Token and
Token ID290
Feature activation-2.80e-01

Pos logit contributions
efully+0.024
ummy+0.023
ipp+0.022
inding+0.022
aleb+0.021
Neg logit contributions
ves-0.039
ggy-0.037
fire-0.036
ork-0.036
l-0.034
Token.
Token ID13
Feature activation-3.00e-01

Pos logit contributions
 liar+0.022
ves+0.021
con+0.021
ork+0.020
 villains+0.020
Neg logit contributions
 both-0.023
 all-0.023
 each-0.023
ummy-0.022
el-0.021
Token
Token ID198
Feature activation-3.69e-01

Pos logit contributions
ves+0.008
 liar+0.008
fire+0.008
ggy+0.008
 troll+0.008
Neg logit contributions
efully-0.006
ummy-0.006
 both-0.006
ipp-0.006
el-0.006
TokenNext
Token ID10019
Feature activation+6.63e-01

Pos logit contributions
 liar+0.011
fire+0.011
con+0.011
 troll+0.010
 traps+0.010
Neg logit contributions
ummy-0.006
 both-0.006
efully-0.006
 each-0.006
el-0.006
Token to
Token ID284
Feature activation+1.03e+00

Pos logit contributions
efully+0.015
ipp+0.014
ummy+0.014
inding+0.014
 forth+0.014
Neg logit contributions
ves-0.017
fire-0.016
ork-0.015
 liar-0.015
ggy-0.015
Token the
Token ID262
Feature activation+2.98e-01

Pos logit contributions
efully+0.022
ipp+0.020
ummy+0.020
inding+0.020
 forth+0.020
Neg logit contributions
ves-0.029
fire-0.027
ggy-0.027
ork-0.026
 liar-0.026
Token patch
Token ID8529
Feature activation+1.98e+00

Pos logit contributions
efully+0.007
asured+0.007
ipp+0.007
inding+0.006
ummy+0.006
Neg logit contributions
ves-0.008
fire-0.007
ork-0.007
ggy-0.007
 troll-0.007
Token,
Token ID11
Feature activation-5.97e-01

Pos logit contributions
efully+0.038
ummy+0.037
ipp+0.037
 forth+0.037
el+0.036
Neg logit contributions
ves-0.053
 liar-0.052
fire-0.051
ork-0.050
ggy-0.050
Token lived
Token ID5615
Feature activation+5.49e-01

Pos logit contributions
ves+0.017
ork+0.016
 liar+0.016
ggy+0.015
fire+0.015
Neg logit contributions
efully-0.012
 forth-0.011
 both-0.011
ummy-0.011
el-0.011
Token a
Token ID257
Feature activation-5.17e-01

Pos logit contributions
efully+0.008
ipp+0.007
inding+0.007
 both+0.007
asured+0.007
Neg logit contributions
ves-0.019
con-0.018
ork-0.018
 liar-0.017
ggy-0.017
Token small
Token ID1402
Feature activation+9.24e-01

Pos logit contributions
ves+0.018
ata+0.016
ork+0.016
con+0.016
seek+0.016
Neg logit contributions
ipp-0.006
ummy-0.006
 favourite-0.006
 favorite-0.006
 adorable-0.006
Token girl
Token ID2576
Feature activation-1.66e+00

Pos logit contributions
ummy+0.018
efully+0.018
 forth+0.018
el+0.018
ipp+0.017
Neg logit contributions
ves-0.027
ggy-0.024
ork-0.024
fire-0.023
 liar-0.023
Token garden
Token ID11376
Feature activation+1.29e+00

Pos logit contributions
efully+0.025
ipp+0.023
 forth+0.022
inding+0.022
asured+0.022
Neg logit contributions
ves-0.032
fire-0.030
ork-0.030
ggy-0.029
 liar-0.029
Token often
Token ID1690
Feature activation-1.45e-02

Pos logit contributions
efully+0.022
ipp+0.020
ummy+0.020
inding+0.020
 forth+0.020
Neg logit contributions
ves-0.040
ggy-0.037
ork-0.037
fire-0.037
 liar-0.037
Token,
Token ID11
Feature activation+5.53e-01

Pos logit contributions
ves+0.000
ggy+0.000
ork+0.000
fire+0.000
 liar+0.000
Neg logit contributions
efully-0.000
 forth-0.000
ipp-0.000
ummy-0.000
inding-0.000
Token and
Token ID290
Feature activation-3.91e-01

Pos logit contributions
efully+0.014
ipp+0.012
ummy+0.012
el+0.012
aleb+0.012
Neg logit contributions
fire-0.013
ves-0.013
 jail-0.013
 liar-0.012
 killed-0.012
Token the
Token ID262
Feature activation+2.96e-01

Pos logit contributions
ves+0.009
fire+0.008
ggy+0.008
ork+0.008
 liar+0.008
Neg logit contributions
efully-0.010
ipp-0.009
ummy-0.009
inding-0.009
el-0.009
Token wise
Token ID10787
Feature activation+6.25e-01

Pos logit contributions
efully+0.005
ummy+0.005
ipp+0.005
 forth+0.005
inding+0.005
Neg logit contributions
ves-0.009
ggy-0.008
ork-0.008
fire-0.008
 liar-0.008
Token old
Token ID1468
Feature activation+5.66e-01

Pos logit contributions
ummy+0.011
ipp+0.010
el+0.010
inding+0.009
efully+0.009
Neg logit contributions
ves-0.020
ata-0.019
fire-0.019
ggy-0.019
 villains-0.018
Token bird
Token ID6512
Feature activation-7.91e-02

Pos logit contributions
ummy+0.013
efully+0.013
el+0.013
 forth+0.013
ipp+0.013
Neg logit contributions
ves-0.014
ata-0.012
ggy-0.012
 villains-0.012
fire-0.012
Token always
Token ID1464
Feature activation-1.74e+00

Pos logit contributions
ves+0.002
ork+0.002
 liar+0.002
fire+0.002
ggy+0.002
Neg logit contributions
efully-0.001
 forth-0.001
el-0.001
 both-0.001
ipp-0.001
Token warned
Token ID7728
Feature activation+1.12e+00

Pos logit contributions
ves+0.050
fire+0.046
ggy+0.046
ork+0.046
 liar+0.046
Neg logit contributions
efully-0.034
ipp-0.032
ummy-0.031
inding-0.031
 forth-0.031
Token him
Token ID683
Feature activation+1.94e-01

Pos logit contributions
 both+0.015
efully+0.014
 forth+0.014
 each+0.014
el+0.013
Neg logit contributions
ves-0.039
 liar-0.038
ork-0.038
fire-0.038
ata-0.037
Token found
Token ID1043
Feature activation+2.69e-01

Pos logit contributions
ves+0.008
fire+0.007
ork+0.007
ggy+0.007
 liar+0.007
Neg logit contributions
efully-0.006
ipp-0.005
inding-0.005
ummy-0.005
 forth-0.005
Token a
Token ID257
Feature activation-1.14e+00

Pos logit contributions
efully+0.006
ipp+0.006
ummy+0.006
inding+0.006
 forth+0.006
Neg logit contributions
ves-0.007
fire-0.006
ork-0.006
ggy-0.006
 liar-0.006
Token good
Token ID922
Feature activation+1.23e+00

Pos logit contributions
ves+0.033
fire+0.030
ggy+0.030
ork+0.030
 liar+0.029
Neg logit contributions
efully-0.021
ummy-0.020
 forth-0.020
ipp-0.020
inding-0.019
Token hiding
Token ID11816
Feature activation+1.01e+00

Pos logit contributions
efully+0.026
ipp+0.024
ummy+0.024
 forth+0.024
inding+0.024
Neg logit contributions
ves-0.034
fire-0.031
ggy-0.030
ork-0.030
 liar-0.030
Token spot
Token ID4136
Feature activation+1.56e+00

Pos logit contributions
efully+0.018
 forth+0.017
ipp+0.016
inding+0.016
ummy+0.016
Neg logit contributions
ves-0.030
 liar-0.028
fire-0.027
ggy-0.027
ork-0.027
Token,
Token ID11
Feature activation-1.99e-01

Pos logit contributions
efully+0.027
ipp+0.025
ummy+0.025
 forth+0.025
inding+0.025
Neg logit contributions
ves-0.048
ork-0.044
ggy-0.044
fire-0.044
 liar-0.044
Token but
Token ID475
Feature activation+7.14e-01

Pos logit contributions
ves+0.006
fire+0.005
ggy+0.005
 liar+0.005
ork+0.005
Neg logit contributions
efully-0.004
ipp-0.004
ummy-0.004
inding-0.004
el-0.003
Token he
Token ID339
Feature activation+2.34e-01

Pos logit contributions
efully+0.020
ipp+0.020
ummy+0.019
inding+0.019
el+0.019
Neg logit contributions
ves-0.014
fire-0.013
ggy-0.013
 jail-0.012
 liar-0.012
Token accidentally
Token ID14716
Feature activation-2.68e+00

Pos logit contributions
efully+0.004
ipp+0.004
ummy+0.004
inding+0.004
 forth+0.004
Neg logit contributions
ves-0.007
ggy-0.007
fire-0.007
ork-0.007
 liar-0.007
Token fell
Token ID3214
Feature activation+6.16e-01

Pos logit contributions
ves+0.067
fire+0.067
ork+0.062
 inconvenience+0.060
con+0.060
Neg logit contributions
 both-0.059
 forth-0.057
 all-0.055
 each-0.054
efully-0.053
Token and
Token ID290
Feature activation-1.40e+00

Pos logit contributions
efully+0.011
ipp+0.010
ummy+0.010
 forth+0.010
inding+0.010
Neg logit contributions
ves-0.019
fire-0.018
ggy-0.018
ork-0.018
 liar-0.018
Token little
Token ID1310
Feature activation+4.12e-01

Pos logit contributions
ummy+0.018
efully+0.018
ipp+0.018
 forth+0.017
el+0.017
Neg logit contributions
ves-0.022
fire-0.020
ork-0.019
ggy-0.019
ata-0.018
Token girl
Token ID2576
Feature activation-1.84e+00

Pos logit contributions
efully+0.008
ummy+0.008
 forth+0.008
ipp+0.008
el+0.008
Neg logit contributions
ves-0.012
ork-0.011
ggy-0.011
fire-0.010
 liar-0.010
Token called
Token ID1444
Feature activation-1.72e+00

Pos logit contributions
ves+0.053
con+0.050
 inconvenience+0.049
ork+0.048
 liar+0.047
Neg logit contributions
el-0.038
 each-0.038
 both-0.035
 forth-0.034
 at-0.033
Token Clara
Token ID27443
Feature activation-5.84e-01

Pos logit contributions
ves+0.057
fire+0.052
 liar+0.051
ns+0.050
 defeated+0.050
Neg logit contributions
 Daddy-0.029
ummy-0.029
 both-0.027
 each-0.025
ipp-0.025
Token.
Token ID13
Feature activation+1.72e+00

Pos logit contributions
con+0.014
ves+0.013
 inconvenience+0.013
 liar+0.012
 crashes+0.012
Neg logit contributions
 both-0.015
 each-0.015
el-0.014
efully-0.014
inding-0.014
Token Clara
Token ID27443
Feature activation-3.06e-02

Pos logit contributions
 forth+0.037
efully+0.035
inding+0.034
ishing+0.033
asured+0.033
Neg logit contributions
ves-0.045
ork-0.043
 jail-0.041
 liar-0.041
ggy-0.041
Token had
Token ID550
Feature activation-1.26e+00

Pos logit contributions
ves+0.001
ggy+0.001
fire+0.001
ork+0.001
 liar+0.001
Neg logit contributions
efully-0.000
ipp-0.000
ummy-0.000
inding-0.000
 forth-0.000
Token a
Token ID257
Feature activation-3.51e-01

Pos logit contributions
ves+0.034
fire+0.032
ork+0.031
 liar+0.030
 troll+0.030
Neg logit contributions
efully-0.027
 both-0.025
ipp-0.025
inding-0.024
 forth-0.024
Token little
Token ID1310
Feature activation-2.42e-01

Pos logit contributions
ves+0.009
ork+0.009
ggy+0.009
 liar+0.009
fire+0.008
Neg logit contributions
efully-0.008
inding-0.008
ipp-0.008
 forth-0.007
ummy-0.007
Token brother
Token ID3956
Feature activation-1.72e+00

Pos logit contributions
ves+0.007
fire+0.006
ork+0.006
ggy+0.006
 liar+0.006
Neg logit contributions
efully-0.005
ipp-0.005
ummy-0.005
 forth-0.005
inding-0.005
Token who
Token ID508
Feature activation+4.10e-01

Pos logit contributions
ves+0.039
fire+0.035
ork+0.035
con+0.032
 liar+0.031
Neg logit contributions
efully-0.043
ummy-0.043
el-0.043
 each-0.042
ipp-0.042
Token not
Token ID407
Feature activation-1.67e+00

Pos logit contributions
efully+0.028
 forth+0.026
ipp+0.026
inding+0.026
ummy+0.026
Neg logit contributions
ves-0.045
ork-0.042
fire-0.042
ggy-0.041
 liar-0.041
Token so
Token ID523
Feature activation-1.69e+00

Pos logit contributions
ves+0.054
ork+0.050
fire+0.049
ggy+0.047
con+0.046
Neg logit contributions
efully-0.029
 forth-0.028
 both-0.027
asured-0.027
ipp-0.026
Token terrible
Token ID7818
Feature activation-7.19e-01

Pos logit contributions
ves+0.058
ork+0.054
fire+0.053
con+0.052
drivers+0.050
Neg logit contributions
 adorable-0.026
 both-0.025
 all-0.022
 each-0.022
 favorite-0.022
Token."
Token ID526
Feature activation-1.04e+00

Pos logit contributions
ves+0.017
ork+0.016
fire+0.015
 liar+0.015
ata+0.015
Neg logit contributions
ummy-0.018
 forth-0.017
efully-0.017
 each-0.017
 both-0.016
Token Lily
Token ID20037
Feature activation-1.43e+00

Pos logit contributions
ves+0.039
fire+0.036
ggy+0.036
 liar+0.034
th+0.033
Neg logit contributions
 Daddy-0.016
 both-0.015
 each-0.014
 Dad-0.013
 Mom-0.013
Token whispered
Token ID25029
Feature activation-1.84e-01

Pos logit contributions
ves+0.044
fire+0.042
ork+0.041
 liar+0.041
ggy+0.039
Neg logit contributions
efully-0.027
 forth-0.025
inding-0.024
 both-0.024
ipp-0.023
Token to
Token ID284
Feature activation+3.50e-01

Pos logit contributions
ves+0.006
ork+0.005
 liar+0.005
 inconvenience+0.005
fire+0.005
Neg logit contributions
efully-0.003
 both-0.003
 each-0.003
 forth-0.003
ummy-0.003
Token Max
Token ID5436
Feature activation-1.36e+00

Pos logit contributions
 each+0.005
 both+0.005
efully+0.005
 forth+0.005
el+0.005
Neg logit contributions
ves-0.012
ork-0.012
fire-0.011
 liar-0.011
ata-0.011
Token.
Token ID13
Feature activation-5.13e-01

Pos logit contributions
ves+0.030
ork+0.026
 liar+0.026
fire+0.025
 inconvenience+0.025
Neg logit contributions
efully-0.036
 both-0.036
 forth-0.036
inding-0.035
ipp-0.035
Token
Token ID198
Feature activation-1.01e-01

Pos logit contributions
ves+0.016
fire+0.015
ggy+0.015
 liar+0.015
ork+0.014
Neg logit contributions
efully-0.008
ummy-0.008
 both-0.008
inding-0.008
 each-0.008
Token
Token ID198
Feature activation-6.50e-02

Pos logit contributions
 liar+0.003
 troll+0.003
fire+0.003
 inconvenience+0.003
ves+0.003
Neg logit contributions
 both-0.002
efully-0.002
 each-0.002
ummy-0.002
inding-0.002
Token listened
Token ID16399
Feature activation+1.63e+00

Pos logit contributions
ves+0.025
ork+0.024
fire+0.024
con+0.023
 liar+0.023
Neg logit contributions
efully-0.013
 both-0.013
 forth-0.012
inding-0.012
 each-0.012
Token and
Token ID290
Feature activation-1.65e+00

Pos logit contributions
efully+0.032
ipp+0.030
inding+0.029
ummy+0.029
 forth+0.029
Neg logit contributions
ves-0.047
fire-0.043
ggy-0.043
 liar-0.043
ork-0.043
Token stopped
Token ID5025
Feature activation-1.41e-01

Pos logit contributions
ves+0.052
ork+0.051
ggy+0.048
 liar+0.048
fire+0.047
Neg logit contributions
efully-0.028
 forth-0.028
 both-0.026
 each-0.025
inding-0.024
Token.
Token ID13
Feature activation-7.70e-01

Pos logit contributions
ves+0.004
ork+0.004
fire+0.004
ggy+0.003
 liar+0.003
Neg logit contributions
efully-0.003
 forth-0.003
ipp-0.003
inding-0.003
ummy-0.003
Token They
Token ID1119
Feature activation+3.18e-01

Pos logit contributions
ves+0.027
ggy+0.026
 liar+0.025
fire+0.025
th+0.024
Neg logit contributions
efully-0.011
 both-0.010
 Daddy-0.010
ummy-0.010
 Every-0.010
Token all
Token ID477
Feature activation-3.67e-01

Pos logit contributions
efully+0.006
ipp+0.006
ummy+0.005
inding+0.005
 forth+0.005
Neg logit contributions
ves-0.009
fire-0.009
ggy-0.009
ork-0.009
 liar-0.009
Token became
Token ID2627
Feature activation-1.31e-01

Pos logit contributions
ves+0.011
ork+0.010
fire+0.010
ggy+0.010
 liar+0.010
Neg logit contributions
efully-0.007
ipp-0.007
 forth-0.006
inding-0.006
ummy-0.006
Token friends
Token ID2460
Feature activation-1.20e+00

Pos logit contributions
ves+0.004
ork+0.004
fire+0.004
ggy+0.004
con+0.004
Neg logit contributions
efully-0.002
 forth-0.002
 both-0.002
ipp-0.002
inding-0.002
Token and
Token ID290
Feature activation-7.98e-01

Pos logit contributions
ves+0.031
ork+0.029
con+0.028
fire+0.028
 liar+0.027
Neg logit contributions
efully-0.027
 both-0.027
inding-0.026
 forth-0.025
 each-0.025
Token played
Token ID2826
Feature activation-6.99e-01

Pos logit contributions
ves+0.025
ork+0.024
fire+0.024
ggy+0.023
 liar+0.023
Neg logit contributions
efully-0.013
 forth-0.012
ipp-0.012
inding-0.012
 both-0.011
Token happily
Token ID18177
Feature activation-1.65e+00

Pos logit contributions
ves+0.024
ork+0.023
 liar+0.023
con+0.022
ggy+0.022
Neg logit contributions
efully-0.010
 both-0.009
 forth-0.009
 each-0.008
inding-0.008
Token
Token ID198
Feature activation-5.71e-01

Pos logit contributions
 liar+0.021
con+0.020
 inconvenience+0.020
fire+0.019
 troll+0.019
Neg logit contributions
 both-0.011
ummy-0.010
efully-0.010
 each-0.010
 Daddy-0.010
Token"
Token ID1
Feature activation+2.26e-01

Pos logit contributions
ves+0.014
 liar+0.014
con+0.013
ork+0.013
ggy+0.013
Neg logit contributions
efully-0.013
ummy-0.012
 both-0.012
inding-0.012
ipp-0.012
TokenWhat
Token ID2061
Feature activation-1.13e+00

Pos logit contributions
efully+0.005
ipp+0.005
ummy+0.005
inding+0.005
 forth+0.004
Neg logit contributions
ves-0.006
fire-0.006
ork-0.006
ggy-0.006
 liar-0.006
Token happened
Token ID3022
Feature activation+3.46e-01

Pos logit contributions
ves+0.033
fire+0.032
ork+0.031
con+0.031
ggy+0.030
Neg logit contributions
efully-0.021
ipp-0.019
inding-0.019
ummy-0.019
el-0.018
Token?"
Token ID1701
Feature activation-1.11e+00

Pos logit contributions
efully+0.007
ummy+0.007
ipp+0.007
inding+0.007
 forth+0.007
Neg logit contributions
ves-0.010
fire-0.009
ork-0.009
ggy-0.009
 liar-0.009
Token Dad
Token ID17415
Feature activation-1.42e+00

Pos logit contributions
ves+0.035
fire+0.033
ork+0.033
ggy+0.032
 liar+0.032
Neg logit contributions
efully-0.019
ummy-0.017
inding-0.017
el-0.017
 both-0.017
Token asked
Token ID1965
Feature activation-1.06e+00

Pos logit contributions
ves+0.034
fire+0.031
ggy+0.031
ork+0.031
 liar+0.030
Neg logit contributions
efully-0.034
ipp-0.033
ummy-0.032
inding-0.032
 forth-0.031
Token.
Token ID13
Feature activation-6.22e-01

Pos logit contributions
ves+0.030
con+0.029
 liar+0.028
ork+0.028
fire+0.028
Neg logit contributions
 both-0.024
 Daddy-0.023
 each-0.023
efully-0.023
el-0.023
Token He
Token ID679
Feature activation-7.38e-01

Pos logit contributions
ves+0.020
ggy+0.019
con+0.019
 inconvenience+0.019
fire+0.018
Neg logit contributions
 Daddy-0.009
efully-0.009
 both-0.009
ummy-0.009
 Dad-0.009
Token saw
Token ID2497
Feature activation-2.02e-01

Pos logit contributions
ves+0.019
ggy+0.018
ork+0.018
fire+0.017
 liar+0.017
Neg logit contributions
efully-0.017
ummy-0.016
ipp-0.016
 forth-0.016
inding-0.016
Token Tom
Token ID4186
Feature activation-4.59e-01

Pos logit contributions
ves+0.006
fire+0.006
ork+0.006
ggy+0.006
 liar+0.006
Neg logit contributions
efully-0.004
 both-0.004
ummy-0.003
el-0.003
 each-0.003
Token She
Token ID1375
Feature activation+2.74e-01

Pos logit contributions
efully+0.013
 forth+0.012
inding+0.012
ipp+0.012
asured+0.012
Neg logit contributions
ves-0.014
ork-0.014
ggy-0.013
 jail-0.013
fire-0.013
Token ran
Token ID4966
Feature activation-1.46e+00

Pos logit contributions
ummy+0.007
efully+0.007
inding+0.007
initely+0.007
el+0.007
Neg logit contributions
ves-0.006
ggy-0.006
ork-0.005
l-0.005
 killed-0.005
Token over
Token ID625
Feature activation-8.04e-01

Pos logit contributions
ves+0.041
ork+0.038
ggy+0.038
fire+0.038
 jail+0.037
Neg logit contributions
efully-0.029
ummy-0.028
inding-0.027
ipp-0.027
el-0.027
Token to
Token ID284
Feature activation-6.97e-01

Pos logit contributions
ves+0.024
fire+0.022
ggy+0.022
ork+0.022
 liar+0.022
Neg logit contributions
efully-0.016
ipp-0.014
ummy-0.014
 forth-0.014
inding-0.014
Token it
Token ID340
Feature activation+1.07e+00

Pos logit contributions
ves+0.018
ggy+0.017
fire+0.017
ork+0.017
 liar+0.016
Neg logit contributions
efully-0.016
ipp-0.015
ummy-0.015
inding-0.015
 forth-0.014
Token and
Token ID290
Feature activation-1.80e+00

Pos logit contributions
efully+0.021
ipp+0.019
inding+0.019
ummy+0.019
aleb+0.018
Neg logit contributions
ves-0.030
ork-0.029
fire-0.028
ggy-0.028
con-0.027
Token asked
Token ID1965
Feature activation+3.69e-01

Pos logit contributions
ves+0.047
fire+0.043
ggy+0.043
ork+0.043
 liar+0.043
Neg logit contributions
efully-0.040
ipp-0.038
ummy-0.037
inding-0.037
 forth-0.037
Token her
Token ID607
Feature activation+9.72e-01

Pos logit contributions
efully+0.005
ipp+0.005
ummy+0.005
 forth+0.005
 both+0.005
Neg logit contributions
ves-0.012
fire-0.012
ork-0.012
con-0.012
ggy-0.012
Token mom
Token ID1995
Feature activation+5.52e-01

Pos logit contributions
efully+0.018
ipp+0.017
inding+0.016
 forth+0.016
ummy+0.016
Neg logit contributions
ves-0.029
ggy-0.027
ork-0.027
fire-0.027
 liar-0.027
Token,
Token ID11
Feature activation-2.88e-01

Pos logit contributions
efully+0.013
ipp+0.012
inding+0.012
 forth+0.012
ummy+0.012
Neg logit contributions
ves-0.014
fire-0.013
ork-0.013
ggy-0.013
 liar-0.013
Token "
Token ID366
Feature activation+1.93e+00

Pos logit contributions
ves+0.007
fire+0.007
ggy+0.007
 liar+0.007
ork+0.007
Neg logit contributions
efully-0.007
ipp-0.006
inding-0.006
ummy-0.006
 forth-0.006
Token angry
Token ID7954
Feature activation-1.57e+00

Pos logit contributions
ves+0.028
ork+0.026
fire+0.026
ggy+0.025
 liar+0.025
Neg logit contributions
efully-0.021
 forth-0.020
ipp-0.019
inding-0.019
ummy-0.019
Token.
Token ID13
Feature activation-9.54e-01

Pos logit contributions
ves+0.040
fire+0.037
ork+0.036
 liar+0.034
con+0.034
Neg logit contributions
 both-0.038
efully-0.038
 each-0.038
ummy-0.037
 forth-0.036
Token
Token ID198
Feature activation-9.74e-01

Pos logit contributions
ves+0.039
fire+0.038
ggy+0.036
 liar+0.036
ork+0.035
Neg logit contributions
efully-0.008
ummy-0.007
 both-0.007
 Daddy-0.007
ipp-0.006
Token
Token ID198
Feature activation-9.56e-01

Pos logit contributions
 liar+0.032
fire+0.031
 troll+0.030
ves+0.029
 inconvenience+0.029
Neg logit contributions
 both-0.016
efully-0.015
ummy-0.014
 each-0.014
el-0.013
Token"
Token ID1
Feature activation-5.03e-02

Pos logit contributions
ves+0.025
 liar+0.025
con+0.024
ork+0.024
fire+0.024
Neg logit contributions
efully-0.021
 both-0.019
inding-0.019
 forth-0.018
el-0.018
TokenHey
Token ID10814
Feature activation-2.12e+00

Pos logit contributions
ves+0.001
ork+0.001
fire+0.001
con+0.001
ggy+0.001
Neg logit contributions
efully-0.001
 forth-0.001
el-0.001
inding-0.001
ipp-0.001
Token,
Token ID11
Feature activation-3.57e-01

Pos logit contributions
ves+0.071
fire+0.067
ork+0.065
con+0.064
 villains+0.061
Neg logit contributions
 forth-0.037
 both-0.033
ummy-0.032
 Daddy-0.030
efully-0.030
Token that
Token ID326
Feature activation-2.37e-01

Pos logit contributions
ves+0.010
fire+0.009
ork+0.009
con+0.009
ggy+0.009
Neg logit contributions
efully-0.007
 forth-0.007
ipp-0.007
 both-0.007
ummy-0.007
Token's
Token ID338
Feature activation-2.05e-01

Pos logit contributions
ves+0.008
fire+0.008
 villains+0.007
ork+0.007
 liar+0.007
Neg logit contributions
efully-0.003
el-0.003
ummy-0.003
ipp-0.003
 forth-0.003
Token not
Token ID407
Feature activation-2.24e+00

Pos logit contributions
ves+0.006
fire+0.006
ggy+0.006
ork+0.006
 liar+0.006
Neg logit contributions
efully-0.004
ipp-0.003
ummy-0.003
inding-0.003
 forth-0.003
Token nice
Token ID3621
Feature activation-3.57e-01

Pos logit contributions
ves+0.081
fire+0.075
ork+0.074
con+0.071
l+0.070
Neg logit contributions
efully-0.031
 forth-0.030
ipp-0.028
 both-0.028
inding-0.027
Token that
Token ID326
Feature activation-6.91e-02

Pos logit contributions
ummy+0.015
ipp+0.014
efully+0.014
inding+0.014
 both+0.014
Neg logit contributions
fire-0.025
ves-0.025
con-0.024
 liar-0.024
ns-0.024
Token they
Token ID484
Feature activation+6.87e-01

Pos logit contributions
ves+0.002
fire+0.002
ork+0.002
ggy+0.002
 liar+0.001
Neg logit contributions
efully-0.002
 forth-0.002
ummy-0.002
ipp-0.002
inding-0.001
Token were
Token ID547
Feature activation+1.33e+00

Pos logit contributions
efully+0.017
ummy+0.016
ipp+0.015
inding+0.015
aleb+0.015
Neg logit contributions
ves-0.017
ork-0.015
ggy-0.015
 liar-0.015
 killed-0.015
Token going
Token ID1016
Feature activation+5.99e-01

Pos logit contributions
efully+0.020
ipp+0.018
inding+0.018
ummy+0.018
 forth+0.017
Neg logit contributions
ves-0.045
fire-0.042
ggy-0.042
ork-0.042
 liar-0.042
Token to
Token ID284
Feature activation-1.29e-01

Pos logit contributions
efully+0.012
ipp+0.011
 forth+0.011
ummy+0.011
inding+0.011
Neg logit contributions
ves-0.017
fire-0.016
ggy-0.016
ork-0.015
 liar-0.015
Token a
Token ID257
Feature activation-6.80e-01

Pos logit contributions
ves+0.004
 liar+0.004
fire+0.004
ork+0.004
 troll+0.004
Neg logit contributions
 both-0.002
 each-0.002
efully-0.002
 definitely-0.002
ipp-0.002
Token big
Token ID1263
Feature activation+1.88e+00

Pos logit contributions
ves+0.020
fire+0.018
 liar+0.018
ggy+0.018
con+0.018
Neg logit contributions
efully-0.012
ummy-0.012
ipp-0.012
 forth-0.011
inding-0.011
Token building
Token ID2615
Feature activation-1.48e+00

Pos logit contributions
efully+0.042
ipp+0.039
inding+0.039
ummy+0.038
 forth+0.038
Neg logit contributions
ves-0.049
fire-0.046
ork-0.045
ggy-0.045
 liar-0.045
Token with
Token ID351
Feature activation-1.23e+00

Pos logit contributions
 liar+0.039
 troll+0.038
ves+0.038
con+0.037
 villains+0.036
Neg logit contributions
 all-0.030
 each-0.029
 both-0.029
 forth-0.029
 at-0.028
Token an
Token ID281
Feature activation+5.81e-01

Pos logit contributions
ves+0.038
fire+0.037
ggy+0.036
 liar+0.036
ork+0.035
Neg logit contributions
 both-0.021
 each-0.021
efully-0.020
 all-0.020
el-0.019
Token elevator
Token ID20932
Feature activation-2.10e+00

Pos logit contributions
efully+0.015
ipp+0.014
inding+0.013
 forth+0.013
ummy+0.013
Neg logit contributions
ves-0.013
ggy-0.013
 jail-0.012
 liar-0.012
ork-0.012
Token lips
Token ID11914
Feature activation+5.49e-01

Pos logit contributions
ves+0.032
ggy+0.029
ork+0.029
fire+0.029
 liar+0.029
Neg logit contributions
efully-0.026
ipp-0.024
ummy-0.024
inding-0.024
 forth-0.023
Token and
Token ID290
Feature activation-9.85e-01

Pos logit contributions
efully+0.013
inding+0.012
ipp+0.012
 forth+0.012
ummy+0.012
Neg logit contributions
ves-0.014
ork-0.013
fire-0.013
ggy-0.012
 liar-0.012
Token tongue
Token ID11880
Feature activation-9.68e-02

Pos logit contributions
ves+0.028
ork+0.026
fire+0.026
ggy+0.025
 liar+0.025
Neg logit contributions
efully-0.020
 forth-0.019
ipp-0.018
inding-0.018
ummy-0.018
Token to
Token ID284
Feature activation+4.44e-01

Pos logit contributions
ves+0.003
fire+0.002
ggy+0.002
 liar+0.002
ork+0.002
Neg logit contributions
efully-0.002
ipp-0.002
 forth-0.002
inding-0.002
el-0.002
Token make
Token ID787
Feature activation-1.56e-01

Pos logit contributions
efully+0.009
 forth+0.008
ipp+0.008
inding+0.008
ummy+0.008
Neg logit contributions
ves-0.013
ork-0.012
fire-0.012
ggy-0.011
 liar-0.011
Token the
Token ID262
Feature activation-8.77e-01

Pos logit contributions
ves+0.005
fire+0.005
 liar+0.004
ork+0.004
ggy+0.004
Neg logit contributions
efully-0.003
 both-0.003
ipp-0.003
el-0.003
 forth-0.003
Token same
Token ID976
Feature activation-6.24e-01

Pos logit contributions
ves+0.023
ork+0.021
fire+0.020
ggy+0.020
ata+0.020
Neg logit contributions
 forth-0.020
efully-0.020
el-0.019
ummy-0.019
 prett-0.018
Token funny
Token ID8258
Feature activation-1.04e+00

Pos logit contributions
ves+0.019
fire+0.017
ork+0.017
 liar+0.017
ggy+0.017
Neg logit contributions
efully-0.011
 forth-0.011
ummy-0.011
el-0.010
ipp-0.010
Token noise
Token ID7838
Feature activation-1.08e+00

Pos logit contributions
ves+0.030
fire+0.028
ork+0.027
 liar+0.027
ggy+0.026
Neg logit contributions
efully-0.020
 forth-0.019
ummy-0.019
el-0.019
ipp-0.018
Token.
Token ID13
Feature activation-3.08e-03

Pos logit contributions
ves+0.021
ork+0.019
 liar+0.019
ggy+0.018
fire+0.018
Neg logit contributions
efully-0.030
 forth-0.030
ummy-0.029
ipp-0.029
el-0.029
Token Sammy
Token ID44067
Feature activation+5.14e-02

Pos logit contributions
ves+0.000
ggy+0.000
fire+0.000
 liar+0.000
ork+0.000
Neg logit contributions
efully-0.000
ipp-0.000
ummy-0.000
inding-0.000
el-0.000
Token and
Token ID290
Feature activation-4.28e-01

Pos logit contributions
efully+0.003
ipp+0.003
ummy+0.003
inding+0.003
 forth+0.003
Neg logit contributions
ves-0.004
fire-0.004
ork-0.004
ggy-0.004
 liar-0.004
Token said
Token ID531
Feature activation-6.30e-02

Pos logit contributions
ves+0.011
ggy+0.010
fire+0.010
ork+0.010
 liar+0.010
Neg logit contributions
efully-0.009
ummy-0.009
ipp-0.009
inding-0.009
 forth-0.009
Token,
Token ID11
Feature activation+3.47e-01

Pos logit contributions
ves+0.002
fire+0.002
ork+0.002
ggy+0.001
 liar+0.001
Neg logit contributions
efully-0.001
ipp-0.001
ummy-0.001
inding-0.001
 forth-0.001
Token �
Token ID6184
Feature activation-5.13e-01

Pos logit contributions
efully+0.006
ipp+0.006
ummy+0.006
inding+0.006
 forth+0.006
Neg logit contributions
ves-0.010
ggy-0.009
ork-0.009
fire-0.009
 liar-0.009
Token�
Token ID95
Feature activation-7.08e-01

Pos logit contributions
ves+0.009
fire+0.008
ork+0.008
 liar+0.008
ggy+0.008
Neg logit contributions
efully-0.015
ipp-0.015
inding-0.015
ummy-0.015
 forth-0.014
Token€
Token ID26391
Feature activation-3.44e+00

Pos logit contributions
ves+0.018
 liar+0.017
ork+0.016
 troll+0.016
fire+0.016
Neg logit contributions
efully-0.016
ipp-0.015
inding-0.015
 definitely-0.014
ummy-0.014
Token�
Token ID129
Feature activation+5.96e-01

Pos logit contributions
 liar+0.076
 troll+0.074
fire+0.073
 inconvenience+0.070
ves+0.070
Neg logit contributions
 both-0.082
inding-0.082
ipp-0.081
 definitely-0.080
efully-0.080
Token�
Token ID241
Feature activation-1.49e-01

Pos logit contributions
efully+0.012
ipp+0.011
inding+0.011
ummy+0.011
 forth+0.010
Neg logit contributions
ves-0.017
fire-0.016
ork-0.016
 liar-0.016
ggy-0.016
TokenWe
Token ID1135
Feature activation+7.76e-01

Pos logit contributions
ves+0.005
ork+0.004
fire+0.004
 liar+0.004
 troll+0.004
Neg logit contributions
efully-0.003
ipp-0.002
 definitely-0.002
 absolutely-0.002
inding-0.002
Token won
Token ID1839
Feature activation-1.22e+00

Pos logit contributions
efully+0.013
ipp+0.012
inding+0.012
ummy+0.012
 forth+0.012
Neg logit contributions
ves-0.024
fire-0.023
ork-0.023
ggy-0.023
 liar-0.022
Token!
Token ID0
Feature activation-1.29e+00

Pos logit contributions
ves+0.030
 liar+0.029
fire+0.028
 troll+0.028
con+0.027
Neg logit contributions
 both-0.027
efully-0.027
inding-0.027
ipp-0.027
 definitely-0.026
Token she
Token ID673
Feature activation-2.64e-01

Pos logit contributions
efully+0.050
ummy+0.047
inding+0.046
ipp+0.045
ishing+0.045
Neg logit contributions
ves-0.030
ggy-0.028
 liar-0.027
 jail-0.025
ork-0.025
Token said
Token ID531
Feature activation+2.11e-01

Pos logit contributions
ves+0.008
ggy+0.007
ork+0.007
fire+0.007
 liar+0.007
Neg logit contributions
efully-0.005
ipp-0.005
ummy-0.005
inding-0.005
 forth-0.005
Token he
Token ID339
Feature activation+1.50e-01

Pos logit contributions
efully+0.005
ummy+0.005
ipp+0.005
inding+0.005
 forth+0.005
Neg logit contributions
ves-0.005
fire-0.005
ork-0.005
ggy-0.004
con-0.004
Token couldn
Token ID3521
Feature activation+6.33e-02

Pos logit contributions
efully+0.003
ipp+0.002
inding+0.002
ummy+0.002
 forth+0.002
Neg logit contributions
ves-0.005
fire-0.004
ork-0.004
ggy-0.004
 liar-0.004
Token't
Token ID470
Feature activation-3.27e-01

Pos logit contributions
efully+0.001
ummy+0.001
el+0.001
ipp+0.001
inding+0.001
Neg logit contributions
ves-0.002
fire-0.001
 liar-0.001
 troll-0.001
 villains-0.001
Token have
Token ID423
Feature activation-3.82e+00

Pos logit contributions
ves+0.009
 liar+0.008
fire+0.008
ork+0.008
ggy+0.008
Neg logit contributions
efully-0.006
ipp-0.006
ummy-0.006
 forth-0.006
inding-0.006
Token one
Token ID530
Feature activation-1.11e+00

Pos logit contributions
fire+0.116
ves+0.109
 troll+0.106
 liar+0.103
con+0.100
Neg logit contributions
 both-0.074
 each-0.072
efully-0.069
 all-0.068
el-0.067
Token yet
Token ID1865
Feature activation-1.62e+00

Pos logit contributions
ves+0.024
fire+0.022
ork+0.021
ggy+0.021
 liar+0.020
Neg logit contributions
efully-0.030
ummy-0.030
ipp-0.029
 forth-0.028
el-0.028
Token.
Token ID13
Feature activation+3.42e-01

Pos logit contributions
ves+0.033
fire+0.029
 liar+0.028
 troll+0.028
ork+0.028
Neg logit contributions
ummy-0.043
 both-0.043
efully-0.043
asured-0.042
 each-0.042
Token Randy
Token ID21248
Feature activation-4.79e-01

Pos logit contributions
efully+0.008
 forth+0.007
inding+0.007
ummy+0.007
ipp+0.007
Neg logit contributions
ves-0.009
ork-0.008
ggy-0.008
fire-0.008
 liar-0.008
Token didn
Token ID1422
Feature activation+5.50e-01

Pos logit contributions
ves+0.014
ggy+0.013
ork+0.013
fire+0.012
 liar+0.012
Neg logit contributions
efully-0.009
ummy-0.009
ipp-0.009
 forth-0.009
inding-0.009
Token\n
Token ID198
Feature activation+4.49e-01

Pos logit contributions
 liar+0.014
fire+0.014
 troll+0.013
 villains+0.013
ves+0.013
Neg logit contributions
ipp-0.011
ummy-0.011
efully-0.011
 both-0.010
inding-0.010
Token\n
Token ID198
Feature activation+3.42e-01

Pos logit contributions
efully+0.009
ipp+0.008
 both+0.008
ummy+0.008
inding+0.007
Neg logit contributions
 liar-0.014
 killed-0.013
fire-0.013
 traps-0.013
 troll-0.013
TokenLu
Token ID25596
Feature activation+6.51e-01

Pos logit contributions
efully+0.008
ipp+0.007
inding+0.007
ummy+0.007
 forth+0.007
Neg logit contributions
ves-0.009
 liar-0.008
fire-0.008
ork-0.008
 troll-0.008
Tokency
Token ID948
Feature activation+1.30e+00

Pos logit contributions
ummy+0.007
efully+0.007
ipp+0.006
inding+0.006
 definitely+0.006
Neg logit contributions
ves-0.022
fire-0.022
ork-0.022
con-0.021
ggy-0.021
Token took
Token ID1718
Feature activation-8.25e-01

Pos logit contributions
efully+0.027
ummy+0.027
 forth+0.026
ipp+0.025
inding+0.025
Neg logit contributions
ves-0.036
ggy-0.033
ork-0.032
fire-0.032
 liar-0.030
Token a
Token ID257
Feature activation-3.52e+00

Pos logit contributions
ves+0.025
 liar+0.024
fire+0.024
ork+0.023
ggy+0.022
Neg logit contributions
efully-0.016
 both-0.014
ipp-0.014
 forth-0.014
el-0.014
Token few
Token ID1178
Feature activation-1.40e+00

Pos logit contributions
ves+0.097
 liar+0.088
fire+0.088
ggy+0.087
 villains+0.085
Neg logit contributions
efully-0.067
 forth-0.066
ipp-0.065
ummy-0.065
el-0.064
Token steps
Token ID4831
Feature activation+6.87e-03

Pos logit contributions
ves+0.033
fire+0.031
ggy+0.030
 liar+0.030
ork+0.030
Neg logit contributions
efully-0.035
 forth-0.033
el-0.032
ummy-0.032
ipp-0.031
Token closer
Token ID5699
Feature activation-4.12e-01

Pos logit contributions
efully+0.000
inding+0.000
ipp+0.000
ummy+0.000
asured+0.000
Neg logit contributions
ves-0.000
fire-0.000
ork-0.000
ggy-0.000
 liar-0.000
Token and
Token ID290
Feature activation-1.65e+00

Pos logit contributions
ves+0.012
 liar+0.011
fire+0.011
ggy+0.011
ork+0.011
Neg logit contributions
efully-0.008
ipp-0.007
ummy-0.007
 forth-0.007
inding-0.007
Token looked
Token ID3114
Feature activation-4.39e-01

Pos logit contributions
ves+0.048
 liar+0.044
ggy+0.044
ork+0.043
fire+0.042
Neg logit contributions
efully-0.034
 forth-0.033
 both-0.031
ipp-0.031
inding-0.030
Token.
Token ID13
Feature activation-9.38e-01

Pos logit contributions
 both+0.002
efully+0.002
 each+0.001
el+0.001
 forth+0.001
Neg logit contributions
ves-0.002
 liar-0.001
ggy-0.001
con-0.001
fire-0.001
Token\n
Token ID198
Feature activation-5.55e-01

Pos logit contributions
ves+0.032
ggy+0.031
fire+0.030
 liar+0.030
con+0.030
Neg logit contributions
 both-0.013
ipp-0.013
 Daddy-0.013
ummy-0.012
efully-0.012
Token\n
Token ID198
Feature activation-3.16e-01

Pos logit contributions
 liar+0.019
 troll+0.019
ger+0.018
con+0.018
 traps+0.018
Neg logit contributions
 both-0.008
ipp-0.007
 each-0.007
efully-0.007
ummy-0.006
Token"
Token ID1
Feature activation-6.51e-02

Pos logit contributions
con+0.009
 liar+0.009
 troll+0.009
ves+0.009
 traps+0.009
Neg logit contributions
efully-0.006
 both-0.005
ipp-0.005
inding-0.005
 each-0.005
TokenHa
Token ID23303
Feature activation-1.79e+00

Pos logit contributions
ves+0.002
ggy+0.002
ork+0.002
 liar+0.002
fire+0.002
Neg logit contributions
efully-0.001
 forth-0.001
ipp-0.001
inding-0.001
el-0.001
Token ha
Token ID387
Feature activation-3.84e+00

Pos logit contributions
ves+0.043
con+0.040
fire+0.040
ork+0.040
ata+0.038
Neg logit contributions
 forth-0.046
efully-0.044
el-0.044
inding-0.040
 both-0.040
Token,
Token ID11
Feature activation-3.34e-01

Pos logit contributions
fire+0.131
 villains+0.124
ves+0.118
 troll+0.117
 traps+0.115
Neg logit contributions
ipp-0.068
 forth-0.059
el-0.058
 both-0.057
ummy-0.057
Token Ben
Token ID3932
Feature activation-1.95e+00

Pos logit contributions
ves+0.009
fire+0.008
ork+0.008
ggy+0.008
con+0.008
Neg logit contributions
efully-0.007
ipp-0.007
 forth-0.007
 both-0.007
inding-0.007
Token,
Token ID11
Feature activation-7.75e-02

Pos logit contributions
ves+0.064
 villains+0.063
 conclusions+0.061
ork+0.060
 liar+0.060
Neg logit contributions
 forth-0.029
ummy-0.028
el-0.028
ish-0.027
inding-0.027
Token you
Token ID345
Feature activation-1.48e+00

Pos logit contributions
ves+0.002
fire+0.002
ork+0.002
ggy+0.002
 liar+0.002
Neg logit contributions
efully-0.002
ipp-0.002
ummy-0.001
inding-0.001
 forth-0.001
Token can
Token ID460
Feature activation+4.56e-01

Pos logit contributions
ves+0.046
fire+0.044
ork+0.042
con+0.042
ggy+0.041
Neg logit contributions
 both-0.025
efully-0.025
ipp-0.024
inding-0.024
el-0.023
Token peeled
Token ID42617
Feature activation+2.79e+00

Pos logit contributions
efully+0.015
ummy+0.014
ipp+0.014
inding+0.014
 forth+0.014
Neg logit contributions
ves-0.019
ggy-0.017
ork-0.017
fire-0.017
 liar-0.017
Token the
Token ID262
Feature activation-1.17e+00

Pos logit contributions
efully+0.060
ipp+0.057
ummy+0.056
inding+0.056
 forth+0.055
Neg logit contributions
ves-0.075
fire-0.069
ggy-0.069
ork-0.068
 liar-0.068
Token banana
Token ID25996
Feature activation-1.90e-01

Pos logit contributions
ves+0.026
fire+0.023
ggy+0.023
ork+0.023
 liar+0.023
Neg logit contributions
efully-0.031
ipp-0.029
ummy-0.029
 forth-0.029
inding-0.029
Token and
Token ID290
Feature activation-7.85e-01

Pos logit contributions
ves+0.005
ork+0.005
fire+0.005
ggy+0.005
 liar+0.005
Neg logit contributions
efully-0.004
ipp-0.004
inding-0.004
 forth-0.004
asured-0.004
Token took
Token ID1718
Feature activation-1.17e+00

Pos logit contributions
ves+0.023
ork+0.021
 liar+0.021
ggy+0.021
fire+0.021
Neg logit contributions
efully-0.015
 forth-0.014
ipp-0.014
 both-0.013
el-0.013
Token a
Token ID257
Feature activation-3.53e+00

Pos logit contributions
ves+0.034
 liar+0.034
fire+0.033
 villains+0.033
 inconvenience+0.032
Neg logit contributions
 both-0.023
efully-0.023
el-0.022
ipp-0.021
 each-0.021
Token big
Token ID1263
Feature activation-6.35e-01

Pos logit contributions
ves+0.083
 liar+0.075
 villains+0.075
ggy+0.074
fire+0.074
Neg logit contributions
ummy-0.082
ipp-0.081
efully-0.081
 forth-0.079
el-0.079
Token bite
Token ID13197
Feature activation+1.02e+00

Pos logit contributions
ves+0.012
fire+0.010
ork+0.010
ggy+0.010
 liar+0.010
Neg logit contributions
efully-0.019
ipp-0.018
ummy-0.018
inding-0.018
 forth-0.018
Token.
Token ID13
Feature activation+2.49e-01

Pos logit contributions
efully+0.030
ummy+0.030
el+0.029
ipp+0.029
 forth+0.028
Neg logit contributions
ves-0.019
 liar-0.017
fire-0.017
ork-0.017
ggy-0.017
Token Suddenly
Token ID24975
Feature activation-1.05e+00

Pos logit contributions
efully+0.004
ummy+0.004
ipp+0.004
inding+0.003
 both+0.003
Neg logit contributions
ves-0.008
 liar-0.008
fire-0.008
ggy-0.008
ork-0.007
Token,
Token ID11
Feature activation-4.76e-01

Pos logit contributions
ves+0.035
 liar+0.034
fire+0.034
handed+0.034
 inconvenience+0.033
Neg logit contributions
 all-0.015
 both-0.015
 Daddy-0.014
 eyes-0.013
el-0.013