TOP ACTIVATIONS
MAX = 0.070

but be careful . Ost ric hes can be gr umpy
. They were called ost ric hes . Lily and Ben
should not tease the ost ric hes . They are not
. It sees other ost ric hes . It makes new
wanted to touch the ost ric hes . They reached out
ostr ich pen . Ost ric hes are big birds with
They do not pack ost ric hes anymore . They pack
ostr ich pen . Ost ric hes are big birds with
. One of the ost ric hes looked different from the
too close . The ost ric hes might pe ck you
you count how many ost ric hes there are ?"
that said " O st ric hes ". Sara and Tom
did not know what ost ric hes were . They looked
a big box of ch ocol ates to share with the
ie . Unlike other ost ric hes , his feathers gl
a large selection of ch ocol ates . Ell a picked
shouted . " The ost ric hes are so funny !"
Yes , they are ost ric hes ," mom said .
a big box of ch ocol ates . It was so
was a box of ch ocol ates . " For me

MIN ACTIVATIONS
MIN = -0.056

she was very su pr ised by how tasty it was
and he was su pr ised at how quickly he started
just yet -- but trust me , it
and offered him a y ummy treat as his prize .
never expected -- <|endoftext|> Once upon a time
ep inside and su pr ised himself when he saw that
gave them all a y ummy snack . They were all
, he was su pr ised to see a gray bird
of all -- if she only follows the
lady gave Tim a y ummy treat as a reward .
new toy or a y ummy treat . Anna and Ben
them and make a y ummy salad for lunch !"
and it looked so y ummy ! The wild pig ate
led up -- and before she knew it
a horse -- " John was
some trouble -- May we come in ?"
man handed Tim a y ummy cookie and thanked him .
round and looked so y ummy . The frog carefully took
made a big , y ummy w affle . Mia was
Pat ches saw a y ummy piece of paper blowing across

INTERVAL 0.039 - 0.070
CONTAINS 0.015%

, and made noises like hon k and be ep .
around . He built sand cast les and collected seas hell
touch them . He craw ls to the castle and grabs
Tim and Jane built sand cast les and played in the
is called a sailor ," her dad said .

INTERVAL 0.007 - 0.039
CONTAINS 19.609%

that day on , Lucy and the little girl were best
Mom my . He loved his bike and he rode it
to dress up as a fairy . She had a pink
lion . He would roar and jump and scare everyone .
for help . Her mom told her to polish her shoes

INTERVAL -0.025 - 0.007
CONTAINS 80.160%

walk together . Grace was very stubborn and wanted to go
told her mom all about her new friend and how comfortable
tour of his place . Tom and Sue are grateful and
Tom and Lily are friends . They like to play outside
the roof and wave to the people in the street below

INTERVAL -0.056 - -0.025
CONTAINS 0.216%

to share and play together nicely . They both agreed and
kitchen and she found a packet of tea . She put
. They like to play with their toys and share them
much . The girl cl apped her hands and the dog
a big ball to play with . The hipp o was
Token but
Token ID475
Feature activation+1.65e-03

Pos logit contributions
 year+0.004
um+0.003
ally+0.003
Mom+0.003
heart+0.003
Neg logit contributions
 pret-0.004
ips-0.004
ses-0.004
 swords-0.003
drivers-0.003
Token be
Token ID307
Feature activation+1.36e-02

Pos logit contributions
ips+0.003
 pret+0.003
 swords+0.003
ids+0.003
 loses+0.003
Neg logit contributions
 year-0.003
 world-0.002
Mom-0.002
um-0.002
ally-0.002
Token careful
Token ID8161
Feature activation+1.74e-03

Pos logit contributions
 pret+0.019
ips+0.018
ses+0.018
 swords+0.018
 loses+0.018
Neg logit contributions
 year-0.030
Mom-0.022
 world-0.022
um-0.022
ini-0.022
Token.
Token ID13
Feature activation+1.87e-04

Pos logit contributions
 swords+0.003
ips+0.003
ps+0.003
 pret+0.003
ids+0.003
Neg logit contributions
 year-0.003
urious-0.002
ally-0.002
ini-0.002
spe-0.002
Token Ost
Token ID38919
Feature activation-1.02e-02

Pos logit contributions
 pret+0.000
 swords+0.000
ips+0.000
ses+0.000
 loses+0.000
Neg logit contributions
 year-0.000
ini-0.000
 world-0.000
um-0.000
Mom-0.000
Tokenric
Token ID1173
Feature activation+6.79e-02

Pos logit contributions
 year+0.024
ini+0.018
ally+0.018
heart+0.018
 world+0.018
Neg logit contributions
 pret-0.012
ips-0.012
 swords-0.011
ses-0.011
 Stri-0.011
Tokenhes
Token ID956
Feature activation+1.97e-02

Pos logit contributions
 pret+0.073
 swords+0.068
ses+0.068
ips+0.067
 loses+0.064
Neg logit contributions
 year-0.169
 world-0.131
ini-0.129
um-0.129
ide-0.128
Token can
Token ID460
Feature activation+2.48e-02

Pos logit contributions
 pret+0.030
 swords+0.029
ips+0.029
ses+0.028
 loses+0.028
Neg logit contributions
 year-0.039
ini-0.029
ally-0.029
 world-0.028
um-0.028
Token be
Token ID307
Feature activation+2.41e-02

Pos logit contributions
 pret+0.040
ips+0.038
 swords+0.038
ses+0.037
drivers+0.036
Neg logit contributions
 year-0.049
Mom-0.035
um-0.035
 world-0.035
ini-0.035
Token gr
Token ID1036
Feature activation+9.50e-03

Pos logit contributions
 pret+0.040
ips+0.039
ses+0.038
 swords+0.037
rob+0.036
Neg logit contributions
 year-0.046
 world-0.032
Mom-0.032
ini-0.032
um-0.031
Tokenumpy
Token ID32152
Feature activation+1.13e-02

Pos logit contributions
ips+0.016
 pret+0.016
 swords+0.016
ses+0.015
 loses+0.015
Neg logit contributions
 year-0.017
ini-0.012
 world-0.012
ally-0.012
Mom-0.012
Token.
Token ID13
Feature activation+4.69e-03

Pos logit contributions
 pret+0.007
 swords+0.007
ips+0.007
ses+0.007
 loses+0.007
Neg logit contributions
 year-0.004
 world-0.002
ide-0.002
ini-0.002
um-0.002
Token They
Token ID1119
Feature activation+6.36e-03

Pos logit contributions
 pret+0.007
 Stri+0.007
 swords+0.007
ids+0.007
ses+0.007
Neg logit contributions
 year-0.010
ini-0.008
ie-0.007
rated-0.007
heart-0.007
Token were
Token ID547
Feature activation+9.76e-03

Pos logit contributions
 pret+0.012
ips+0.012
 swords+0.012
ses+0.011
 loses+0.011
Neg logit contributions
 year-0.011
 world-0.007
ini-0.007
ally-0.007
um-0.007
Token called
Token ID1444
Feature activation+1.24e-02

Pos logit contributions
 pret+0.018
ips+0.017
 swords+0.017
drivers+0.017
 loses+0.017
Neg logit contributions
 year-0.017
 world-0.012
um-0.011
Mom-0.011
ide-0.011
Token ost
Token ID23619
Feature activation+2.96e-02

Pos logit contributions
 pret+0.018
ips+0.018
 swords+0.017
 loses+0.017
drivers+0.017
Neg logit contributions
 year-0.026
 world-0.019
um-0.019
ide-0.018
Mom-0.018
Tokenric
Token ID1173
Feature activation+6.78e-02

Pos logit contributions
 pret+0.044
 swords+0.041
ips+0.040
 loses+0.039
 superheroes+0.039
Neg logit contributions
 year-0.062
ini-0.047
ide-0.046
 world-0.046
um-0.046
Tokenhes
Token ID956
Feature activation+2.07e-02

Pos logit contributions
 pret+0.068
 swords+0.064
ips+0.064
ses+0.062
 loses+0.060
Neg logit contributions
 year-0.173
ini-0.135
 world-0.135
um-0.134
ally-0.132
Token.
Token ID13
Feature activation-9.32e-06

Pos logit contributions
 pret+0.037
 swords+0.036
ips+0.036
ses+0.035
ids+0.035
Neg logit contributions
 year-0.035
ini-0.025
 world-0.024
ally-0.024
urious-0.024
Token Lily
Token ID20037
Feature activation-1.74e-03

Feature not active on previous token; no predictions were affected.
Token and
Token ID290
Feature activation+4.98e-04

Pos logit contributions
 year+0.003
ini+0.002
 world+0.002
um+0.002
ally+0.002
Neg logit contributions
 pret-0.003
 swords-0.003
ips-0.003
ses-0.003
 loses-0.003
Token Ben
Token ID3932
Feature activation+3.07e-03

Pos logit contributions
ips+0.001
 swords+0.001
 pret+0.001
 loses+0.001
ids+0.001
Neg logit contributions
 year-0.001
ally-0.001
Mom-0.001
ide-0.001
ie-0.001
Token should
Token ID815
Feature activation+7.12e-03

Pos logit contributions
ips+0.004
 pret+0.004
ses+0.004
 swords+0.004
ids+0.004
Neg logit contributions
 year-0.004
Mom-0.003
ini-0.003
 world-0.003
ally-0.003
Token not
Token ID407
Feature activation+3.18e-03

Pos logit contributions
 pret+0.010
 swords+0.010
ses+0.010
ips+0.010
 loses+0.010
Neg logit contributions
 year-0.015
ini-0.011
urious-0.011
spe-0.011
 world-0.011
Token tease
Token ID34691
Feature activation+8.32e-03

Pos logit contributions
 pret+0.004
 loses+0.004
 swords+0.004
 training+0.004
ips+0.004
Neg logit contributions
 year-0.007
ini-0.005
heart-0.005
 world-0.005
um-0.005
Token the
Token ID262
Feature activation+1.09e-02

Pos logit contributions
 pret+0.012
 swords+0.012
 loses+0.011
ips+0.011
 superheroes+0.011
Neg logit contributions
 year-0.018
ini-0.014
ally-0.013
um-0.013
spe-0.013
Token ost
Token ID23619
Feature activation+1.70e-02

Pos logit contributions
 pret+0.013
ses+0.012
 swords+0.011
ips+0.011
 Stri+0.011
Neg logit contributions
 year-0.026
ally-0.021
um-0.020
ini-0.020
urious-0.020
Tokenric
Token ID1173
Feature activation+6.26e-02

Pos logit contributions
 pret+0.023
 swords+0.022
ips+0.021
ses+0.021
 loses+0.020
Neg logit contributions
 year-0.038
ini-0.029
 world-0.029
um-0.028
ally-0.028
Tokenhes
Token ID956
Feature activation+2.00e-02

Pos logit contributions
 pret+0.059
 swords+0.057
ips+0.057
ses+0.054
 loses+0.052
Neg logit contributions
 year-0.163
ini-0.129
 world-0.128
ally-0.127
um-0.127
Token.
Token ID13
Feature activation-5.28e-03

Pos logit contributions
 pret+0.038
 swords+0.037
ips+0.037
 loses+0.036
ids+0.035
Neg logit contributions
 year-0.032
ini-0.023
ally-0.022
urious-0.022
 world-0.022
Token They
Token ID1119
Feature activation+5.74e-03

Pos logit contributions
 year+0.009
ini+0.006
 world+0.006
um+0.006
ally+0.006
Neg logit contributions
 pret-0.010
 swords-0.010
ips-0.010
ses-0.009
 loses-0.009
Token are
Token ID389
Feature activation+6.80e-03

Pos logit contributions
 pret+0.008
ips+0.008
 swords+0.008
ses+0.008
ids+0.008
Neg logit contributions
 year-0.012
 world-0.009
ini-0.009
ally-0.008
Mom-0.008
Token not
Token ID407
Feature activation+7.25e-03

Pos logit contributions
ips+0.013
 pret+0.013
rob+0.013
ids+0.012
drivers+0.012
Neg logit contributions
 year-0.011
 world-0.007
ini-0.007
Mom-0.007
 sound-0.006
Token.
Token ID13
Feature activation+5.74e-03

Pos logit contributions
 pret+0.007
ses+0.007
 swords+0.007
drivers+0.007
ips+0.007
Neg logit contributions
 year-0.003
 world-0.001
Mom-0.001
um-0.001
time-0.001
Token It
Token ID632
Feature activation-6.17e-04

Pos logit contributions
 pret+0.011
 swords+0.011
ips+0.011
ses+0.011
 loses+0.010
Neg logit contributions
 year-0.009
ini-0.006
 world-0.006
um-0.006
ally-0.006
Token sees
Token ID7224
Feature activation+5.17e-03

Pos logit contributions
 year+0.001
 world+0.001
urious+0.001
ide+0.001
ie+0.001
Neg logit contributions
ips-0.001
drivers-0.001
 training-0.001
ids-0.001
 swords-0.001
Token other
Token ID584
Feature activation+1.24e-02

Pos logit contributions
 pret+0.007
ips+0.007
 swords+0.007
ses+0.007
 loses+0.006
Neg logit contributions
 year-0.011
ini-0.008
 world-0.008
um-0.008
ally-0.008
Token ost
Token ID23619
Feature activation+1.28e-02

Pos logit contributions
 pret+0.022
 swords+0.021
ips+0.021
ses+0.020
 loses+0.020
Neg logit contributions
 year-0.022
ini-0.016
um-0.016
ally-0.016
 world-0.015
Tokenric
Token ID1173
Feature activation+6.26e-02

Pos logit contributions
 pret+0.019
 swords+0.018
 loses+0.017
ses+0.017
ips+0.017
Neg logit contributions
 year-0.028
ini-0.021
 world-0.021
ally-0.020
um-0.020
Tokenhes
Token ID956
Feature activation+1.74e-02

Pos logit contributions
 pret+0.056
ips+0.054
 swords+0.054
ses+0.049
ids+0.048
Neg logit contributions
 year-0.165
ini-0.133
ally-0.132
 world-0.131
um-0.130
Token.
Token ID13
Feature activation+4.61e-03

Pos logit contributions
 pret+0.027
 swords+0.026
ips+0.026
ses+0.025
 loses+0.025
Neg logit contributions
 year-0.035
 world-0.025
ini-0.025
um-0.025
ally-0.025
Token It
Token ID632
Feature activation+2.41e-03

Pos logit contributions
 pret+0.008
 swords+0.008
ips+0.008
ses+0.008
 loses+0.008
Neg logit contributions
 year-0.008
 world-0.005
um-0.005
ini-0.005
Mom-0.005
Token makes
Token ID1838
Feature activation+6.24e-03

Pos logit contributions
ips+0.004
drivers+0.003
 swords+0.003
ids+0.003
ses+0.003
Neg logit contributions
 year-0.005
 world-0.004
urious-0.004
ide-0.003
ie-0.003
Token new
Token ID649
Feature activation-7.28e-04

Pos logit contributions
 pret+0.009
 swords+0.008
ips+0.008
ses+0.008
 loses+0.008
Neg logit contributions
 year-0.014
ini-0.010
 world-0.010
um-0.010
ally-0.010
Token wanted
Token ID2227
Feature activation-3.78e-03

Pos logit contributions
 pret+0.003
 swords+0.003
ses+0.003
 loses+0.003
ips+0.002
Neg logit contributions
 year-0.003
ini-0.002
 world-0.002
um-0.002
ally-0.002
Token to
Token ID284
Feature activation+2.72e-03

Pos logit contributions
 year+0.007
Mom+0.005
 world+0.005
time+0.005
um+0.005
Neg logit contributions
ips-0.007
 pret-0.006
rob-0.006
ses-0.006
ids-0.006
Token touch
Token ID3638
Feature activation+4.95e-03

Pos logit contributions
 pret+0.003
 loses+0.003
 swords+0.003
ses+0.003
 training+0.003
Neg logit contributions
 year-0.006
ini-0.005
urious-0.005
heart-0.005
spe-0.005
Token the
Token ID262
Feature activation+8.93e-03

Pos logit contributions
 pret+0.007
ips+0.007
 swords+0.007
ses+0.007
 loses+0.007
Neg logit contributions
 year-0.011
 world-0.008
ini-0.008
um-0.008
Mom-0.008
Token ost
Token ID23619
Feature activation+1.98e-02

Pos logit contributions
 pret+0.013
 swords+0.013
ses+0.013
ips+0.013
 loses+0.012
Neg logit contributions
 year-0.019
ini-0.014
ally-0.014
um-0.014
urious-0.013
Tokenric
Token ID1173
Feature activation+6.25e-02

Pos logit contributions
 pret+0.026
 swords+0.025
ips+0.024
ses+0.024
 loses+0.023
Neg logit contributions
 year-0.045
ini-0.034
 world-0.034
um-0.034
ally-0.033
Tokenhes
Token ID956
Feature activation+1.37e-02

Pos logit contributions
ips+0.048
 pret+0.047
 swords+0.045
ids+0.042
 Stri+0.040
Neg logit contributions
 year-0.172
ally-0.141
ini-0.141
heart-0.139
 world-0.139
Token.
Token ID13
Feature activation-7.27e-03

Pos logit contributions
 pret+0.027
 swords+0.026
ips+0.026
ses+0.026
 loses+0.025
Neg logit contributions
 year-0.021
ini-0.015
ally-0.014
 world-0.014
urious-0.014
Token They
Token ID1119
Feature activation+2.79e-03

Pos logit contributions
 year+0.014
ini+0.012
ie+0.012
heart+0.011
rated+0.011
Neg logit contributions
 pret-0.012
ses-0.011
 Robo-0.011
 Stri-0.011
 loses-0.011
Token reached
Token ID4251
Feature activation+1.48e-02

Pos logit contributions
 pret+0.004
 swords+0.004
ses+0.004
ips+0.004
 loses+0.004
Neg logit contributions
 year-0.006
ini-0.004
 world-0.004
um-0.004
ally-0.004
Token out
Token ID503
Feature activation+9.38e-03

Pos logit contributions
drivers+0.025
rob+0.025
 pret+0.025
 badges+0.025
 swords+0.025
Neg logit contributions
 year-0.025
 corner-0.018
 world-0.018
um-0.018
ie-0.018
Token ostr
Token ID42387
Feature activation+4.21e-03

Pos logit contributions
 year+0.021
heart+0.019
ally+0.018
ini+0.018
um+0.017
Neg logit contributions
 swords-0.012
 pret-0.012
 superheroes-0.011
ses-0.011
 scar-0.011
Tokenich
Token ID488
Feature activation+4.71e-04

Pos logit contributions
 pret+0.005
 swords+0.005
ips+0.005
ses+0.005
 loses+0.005
Neg logit contributions
 year-0.010
 world-0.007
ini-0.007
ally-0.007
um-0.007
Token pen
Token ID3112
Feature activation+8.83e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
 loses+0.001
ids+0.001
Neg logit contributions
 year-0.001
ini-0.001
um-0.001
ally-0.001
Mom-0.001
Token.
Token ID13
Feature activation+5.88e-03

Pos logit contributions
 pret+0.018
 swords+0.017
ips+0.017
ses+0.017
 loses+0.016
Neg logit contributions
 year-0.014
ini-0.009
 world-0.009
um-0.009
ally-0.009
Token Ost
Token ID38919
Feature activation-1.24e-02

Pos logit contributions
 pret+0.011
 swords+0.010
ses+0.010
ips+0.010
 Stri+0.010
Neg logit contributions
 year-0.010
ini-0.007
ally-0.007
 world-0.007
um-0.007
Tokenric
Token ID1173
Feature activation+6.25e-02

Pos logit contributions
 year+0.030
ini+0.023
heart+0.023
um+0.023
ally+0.023
Neg logit contributions
ips-0.014
 pret-0.013
ses-0.013
 swords-0.012
ps-0.012
Tokenhes
Token ID956
Feature activation+1.62e-02

Pos logit contributions
 pret+0.059
 swords+0.056
ips+0.055
ses+0.053
 loses+0.051
Neg logit contributions
 year-0.163
ini-0.129
 world-0.128
ally-0.128
um-0.127
Token are
Token ID389
Feature activation+2.14e-02

Pos logit contributions
 pret+0.023
 swords+0.022
ips+0.022
ses+0.021
 loses+0.021
Neg logit contributions
 year-0.034
ini-0.026
ally-0.026
um-0.025
urious-0.025
Token big
Token ID1263
Feature activation+7.09e-03

Pos logit contributions
ips+0.042
 pret+0.041
rob+0.041
ids+0.039
ses+0.039
Neg logit contributions
 year-0.034
 world-0.023
Mom-0.022
ide-0.020
um-0.020
Token birds
Token ID10087
Feature activation+7.66e-03

Pos logit contributions
ips+0.014
ids+0.013
Aud+0.013
ses+0.012
 pret+0.012
Neg logit contributions
 year-0.012
 world-0.009
ie-0.007
heart-0.007
ini-0.007
Token with
Token ID351
Feature activation-3.17e-03

Pos logit contributions
 pret+0.017
 swords+0.016
ips+0.016
ses+0.016
 loses+0.016
Neg logit contributions
 year-0.010
ini-0.006
 world-0.006
um-0.006
ally-0.006
Token They
Token ID1119
Feature activation+7.34e-03

Pos logit contributions
 pret+0.002
 swords+0.002
ips+0.002
ses+0.002
ids+0.002
Neg logit contributions
 year-0.002
 world-0.001
um-0.001
Mom-0.001
ini-0.001
Token do
Token ID466
Feature activation+2.64e-03

Pos logit contributions
ips+0.012
 pret+0.012
 swords+0.012
ses+0.011
ids+0.011
Neg logit contributions
 year-0.014
 world-0.010
um-0.010
Mom-0.010
ally-0.010
Token not
Token ID407
Feature activation+7.03e-03

Pos logit contributions
 pret+0.005
 swords+0.005
ips+0.005
ses+0.005
 loses+0.004
Neg logit contributions
 year-0.005
ini-0.003
 world-0.003
um-0.003
ally-0.003
Token pack
Token ID2353
Feature activation-3.18e-03

Pos logit contributions
 pret+0.010
 swords+0.010
ips+0.010
ses+0.009
 loses+0.009
Neg logit contributions
 year-0.015
ini-0.011
 world-0.011
um-0.011
ally-0.011
Token ost
Token ID23619
Feature activation+1.69e-02

Pos logit contributions
 year+0.007
ini+0.005
 world+0.005
ally+0.005
um+0.005
Neg logit contributions
 pret-0.004
 swords-0.004
ips-0.004
ses-0.004
 loses-0.004
Tokenric
Token ID1173
Feature activation+6.22e-02

Pos logit contributions
 pret+0.029
 swords+0.025
 loses+0.025
 superheroes+0.025
ses+0.025
Neg logit contributions
 year-0.032
ini-0.026
urious-0.025
ie-0.024
ide-0.024
Tokenhes
Token ID956
Feature activation+2.12e-02

Pos logit contributions
 pret+0.074
ses+0.071
 swords+0.066
 loses+0.066
 scar+0.066
Neg logit contributions
 year-0.148
 world-0.113
ini-0.111
ide-0.111
um-0.111
Token anymore
Token ID7471
Feature activation+1.56e-02

Pos logit contributions
 pret+0.032
ips+0.031
 swords+0.031
ses+0.031
 loses+0.030
Neg logit contributions
 year-0.043
 world-0.031
ini-0.031
um-0.031
Mom-0.030
Token.
Token ID13
Feature activation+5.04e-04

Pos logit contributions
 pret+0.035
 swords+0.034
ips+0.033
ses+0.033
 loses+0.033
Neg logit contributions
 year-0.021
ini-0.013
um-0.013
ally-0.013
 world-0.012
Token They
Token ID1119
Feature activation+9.39e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
ses+0.001
 loses+0.001
Neg logit contributions
 year-0.001
 world-0.001
ini-0.000
um-0.000
Mom-0.000
Token pack
Token ID2353
Feature activation-4.63e-03

Pos logit contributions
ips+0.015
 pret+0.015
 swords+0.014
ses+0.014
drivers+0.014
Neg logit contributions
 year-0.018
 world-0.013
Mom-0.013
um-0.013
ini-0.013
Token ostr
Token ID42387
Feature activation+1.94e-03

Pos logit contributions
 year+0.025
heart+0.022
ini+0.021
ally+0.021
rated+0.020
Neg logit contributions
 swords-0.017
 pret-0.017
 superheroes-0.016
ses-0.016
 scar-0.016
Tokenich
Token ID488
Feature activation-3.37e-03

Pos logit contributions
 swords+0.002
 pret+0.002
ips+0.002
ses+0.002
 loses+0.002
Neg logit contributions
 year-0.004
 world-0.003
ally-0.003
ini-0.003
um-0.003
Token pen
Token ID3112
Feature activation+4.28e-03

Pos logit contributions
 year+0.006
ini+0.005
um+0.005
Mom+0.005
ally+0.005
Neg logit contributions
 pret-0.005
 swords-0.005
ips-0.005
 loses-0.005
ids-0.005
Token.
Token ID13
Feature activation+7.26e-03

Pos logit contributions
 pret+0.008
 swords+0.008
ips+0.008
ses+0.008
ids+0.007
Neg logit contributions
 year-0.007
ini-0.005
ally-0.005
um-0.005
urious-0.005
Token Ost
Token ID38919
Feature activation-1.34e-02

Pos logit contributions
 pret+0.013
 swords+0.012
ses+0.012
ips+0.012
 Stri+0.012
Neg logit contributions
 year-0.013
ini-0.010
 world-0.009
ally-0.009
um-0.009
Tokenric
Token ID1173
Feature activation+6.22e-02

Pos logit contributions
 year+0.032
ini+0.025
heart+0.025
spe+0.025
um+0.025
Neg logit contributions
ips-0.015
 pret-0.014
ps-0.014
ses-0.013
 Stri-0.013
Tokenhes
Token ID956
Feature activation+1.63e-02

Pos logit contributions
 pret+0.052
ips+0.051
 swords+0.050
ses+0.046
ids+0.045
Neg logit contributions
 year-0.167
ini-0.135
ally-0.135
heart-0.133
 world-0.132
Token are
Token ID389
Feature activation+2.21e-02

Pos logit contributions
 pret+0.027
 swords+0.025
ips+0.025
 loses+0.025
ids+0.024
Neg logit contributions
 year-0.030
ini-0.023
ally-0.023
urious-0.022
Mom-0.022
Token big
Token ID1263
Feature activation+9.38e-03

Pos logit contributions
ips+0.043
 pret+0.042
rob+0.042
ids+0.040
 loses+0.040
Neg logit contributions
 year-0.035
 world-0.024
Mom-0.022
um-0.021
ide-0.021
Token birds
Token ID10087
Feature activation+8.06e-03

Pos logit contributions
ips+0.018
ids+0.017
Aud+0.017
 pret+0.017
ses+0.017
Neg logit contributions
 year-0.015
 world-0.011
ie-0.009
heart-0.009
ini-0.009
Token with
Token ID351
Feature activation-3.85e-04

Pos logit contributions
 pret+0.017
 swords+0.016
ips+0.016
ses+0.016
 loses+0.016
Neg logit contributions
 year-0.012
ini-0.008
 world-0.008
um-0.007
ally-0.007
Token.
Token ID13
Feature activation+8.97e-03

Pos logit contributions
 year+0.002
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 loses-0.002
Token One
Token ID1881
Feature activation-9.12e-03

Pos logit contributions
 pret+0.014
ses+0.014
ips+0.014
 swords+0.014
ids+0.013
Neg logit contributions
 year-0.018
ini-0.013
ally-0.013
 world-0.012
heart-0.012
Token of
Token ID286
Feature activation+1.66e-02

Pos logit contributions
 year+0.015
ini+0.011
ally+0.011
Mom+0.011
urious+0.011
Neg logit contributions
 pret-0.015
ses-0.015
 swords-0.015
ips-0.015
 loses-0.015
Token the
Token ID262
Feature activation+4.25e-03

Pos logit contributions
 pret+0.027
ses+0.026
 loses+0.026
ips+0.026
 swords+0.026
Neg logit contributions
 year-0.033
Mom-0.023
 world-0.023
um-0.023
ide-0.023
Token ost
Token ID23619
Feature activation+1.09e-02

Pos logit contributions
 pret+0.006
 swords+0.005
 scar+0.005
ses+0.005
ips+0.005
Neg logit contributions
 year-0.009
ally-0.007
um-0.007
ini-0.007
ide-0.007
Tokenric
Token ID1173
Feature activation+6.08e-02

Pos logit contributions
 pret+0.016
 swords+0.015
 loses+0.014
ips+0.014
ses+0.014
Neg logit contributions
 year-0.023
ini-0.017
 world-0.017
um-0.017
ally-0.017
Tokenhes
Token ID956
Feature activation+1.38e-02

Pos logit contributions
ips+0.049
 pret+0.049
 swords+0.047
ids+0.044
ses+0.043
Neg logit contributions
 year-0.164
ini-0.134
ally-0.133
 world-0.131
heart-0.131
Token looked
Token ID3114
Feature activation+2.36e-03

Pos logit contributions
 pret+0.026
ips+0.025
 swords+0.025
ids+0.025
 loses+0.025
Neg logit contributions
 year-0.022
ini-0.016
 world-0.015
urious-0.015
ally-0.015
Token different
Token ID1180
Feature activation+1.52e-02

Pos logit contributions
 pret+0.004
 swords+0.003
ips+0.003
 loses+0.003
ses+0.003
Neg logit contributions
 year-0.005
ini-0.004
ally-0.004
um-0.004
Mom-0.004
Token from
Token ID422
Feature activation+7.20e-03

Pos logit contributions
 pret+0.027
 swords+0.026
ips+0.026
ses+0.025
 loses+0.025
Neg logit contributions
 year-0.027
ini-0.019
ally-0.019
Mom-0.019
um-0.019
Token the
Token ID262
Feature activation+7.29e-03

Pos logit contributions
 pret+0.010
 swords+0.010
ips+0.010
ses+0.010
 loses+0.009
Neg logit contributions
 year-0.015
ini-0.011
 world-0.011
um-0.011
ally-0.011
Token too
Token ID1165
Feature activation-3.03e-03

Pos logit contributions
 pret+0.008
ips+0.008
 swords+0.008
 loses+0.008
ses+0.008
Neg logit contributions
 year-0.010
ini-0.007
 world-0.007
um-0.007
ally-0.007
Token close
Token ID1969
Feature activation+5.23e-04

Pos logit contributions
 year+0.005
ini+0.005
ally+0.004
 world+0.004
heart+0.004
Neg logit contributions
 loses-0.005
 swords-0.005
ips-0.004
 training-0.004
 pret-0.004
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
 pret+0.001
 swords+0.001
ses+0.001
ips+0.001
ids+0.001
Neg logit contributions
 year-0.001
 world-0.001
um-0.001
ini-0.001
ally-0.001
Token The
Token ID383
Feature activation-7.46e-03

Pos logit contributions
 year+0.022
ini+0.015
 world+0.015
um+0.015
Mom+0.015
Neg logit contributions
 pret-0.023
 swords-0.022
ips-0.022
ses-0.022
 loses-0.021
Token ost
Token ID23619
Feature activation+1.18e-02

Pos logit contributions
 year+0.014
ally+0.012
ide+0.011
ini+0.011
heart+0.011
Neg logit contributions
ses-0.012
 pret-0.012
 swords-0.011
 tickets-0.011
 scar-0.011
Tokenric
Token ID1173
Feature activation+6.01e-02

Pos logit contributions
 pret+0.016
 swords+0.015
ips+0.015
ses+0.015
 loses+0.014
Neg logit contributions
 year-0.026
ini-0.020
 world-0.020
um-0.019
ally-0.019
Tokenhes
Token ID956
Feature activation+1.30e-02

Pos logit contributions
 pret+0.053
ips+0.051
 swords+0.051
ses+0.048
 loses+0.046
Neg logit contributions
 year-0.159
ini-0.128
ally-0.127
 world-0.126
um-0.125
Token might
Token ID1244
Feature activation+1.45e-02

Pos logit contributions
 pret+0.020
ips+0.019
 swords+0.019
 loses+0.019
ids+0.018
Neg logit contributions
 year-0.025
ini-0.019
ally-0.019
urious-0.019
um-0.019
Token pe
Token ID613
Feature activation+2.35e-02

Pos logit contributions
 pret+0.023
 swords+0.023
ips+0.022
ses+0.022
 loses+0.022
Neg logit contributions
 year-0.028
ini-0.021
 world-0.020
um-0.020
ally-0.020
Tokenck
Token ID694
Feature activation+1.72e-02

Pos logit contributions
 pret+0.034
ses+0.034
 Robo+0.032
 swords+0.031
 superheroes+0.031
Neg logit contributions
 year-0.048
ie-0.037
ini-0.036
urious-0.035
 world-0.034
Token you
Token ID345
Feature activation+1.09e-02

Pos logit contributions
 pret+0.030
ips+0.028
 swords+0.028
ses+0.028
 loses+0.027
Neg logit contributions
 year-0.032
 world-0.022
um-0.022
ini-0.022
heart-0.021
Token you
Token ID345
Feature activation-1.07e-02

Pos logit contributions
 year+0.007
ini+0.006
 world+0.006
um+0.005
ally+0.005
Neg logit contributions
 pret-0.005
 swords-0.005
ips-0.005
ses-0.005
 loses-0.005
Token count
Token ID954
Feature activation-1.40e-02

Pos logit contributions
 year+0.022
um+0.016
 world+0.016
Mom+0.016
ini+0.016
Neg logit contributions
 pret-0.016
ips-0.016
 swords-0.016
ses-0.016
drivers-0.015
Token how
Token ID703
Feature activation+2.60e-03

Pos logit contributions
 year+0.027
 world+0.019
Mom+0.018
um+0.018
ide+0.018
Neg logit contributions
 pret-0.023
ses-0.023
ips-0.022
 swords-0.022
 loses-0.022
Token many
Token ID867
Feature activation+1.95e-03

Pos logit contributions
 pret+0.004
 swords+0.004
ips+0.004
ses+0.003
 loses+0.003
Neg logit contributions
 year-0.006
ini-0.004
 world-0.004
ally-0.004
um-0.004
Token ost
Token ID23619
Feature activation+8.59e-03

Pos logit contributions
 pret+0.003
ses+0.003
ips+0.003
 swords+0.003
ids+0.003
Neg logit contributions
 year-0.004
 world-0.003
ini-0.003
um-0.003
Mom-0.003
Tokenric
Token ID1173
Feature activation+5.99e-02

Pos logit contributions
 pret+0.013
 swords+0.013
 superheroes+0.012
 loses+0.012
 scar+0.012
Neg logit contributions
 year-0.017
ini-0.013
 world-0.013
ide-0.013
um-0.013
Tokenhes
Token ID956
Feature activation+1.61e-02

Pos logit contributions
 pret+0.068
 swords+0.064
ses+0.064
ips+0.062
 scar+0.060
Neg logit contributions
 year-0.146
 world-0.112
ini-0.111
um-0.111
ide-0.109
Token there
Token ID612
Feature activation-7.63e-03

Pos logit contributions
 pret+0.024
 swords+0.024
ips+0.023
ses+0.023
 loses+0.022
Neg logit contributions
 year-0.033
ini-0.024
 world-0.024
um-0.023
ally-0.023
Token are
Token ID389
Feature activation+4.59e-03

Pos logit contributions
 year+0.012
ini+0.008
 world+0.008
um+0.008
ally+0.008
Neg logit contributions
 pret-0.015
 swords-0.014
ips-0.014
ses-0.014
 loses-0.014
Token?"
Token ID1701
Feature activation+1.51e-03

Pos logit contributions
 pret+0.008
ips+0.008
ses+0.008
 swords+0.008
ids+0.007
Neg logit contributions
 year-0.008
 world-0.006
um-0.005
ini-0.005
Mom-0.005
Token\n
Token ID198
Feature activation-1.22e-02

Pos logit contributions
 pret+0.002
ips+0.002
ses+0.002
 loses+0.002
 Stri+0.002
Neg logit contributions
 year-0.003
ini-0.003
spe-0.002
rated-0.002
 world-0.002
Token that
Token ID326
Feature activation+9.00e-03

Pos logit contributions
 year+0.001
ini+0.001
ally+0.001
 world+0.001
urious+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ses-0.001
 loses-0.001
Token said
Token ID531
Feature activation+3.31e-03

Pos logit contributions
 pret+0.013
 swords+0.012
ips+0.012
ses+0.012
 loses+0.011
Neg logit contributions
 year-0.019
ini-0.015
 world-0.014
um-0.014
ally-0.014
Token "
Token ID366
Feature activation-2.40e-03

Pos logit contributions
 pret+0.007
ips+0.007
 swords+0.007
ses+0.007
 loses+0.007
Neg logit contributions
 year-0.005
 world-0.003
um-0.003
time-0.003
Mom-0.003
TokenO
Token ID46
Feature activation-8.22e-04

Pos logit contributions
 year+0.005
ini+0.003
 world+0.003
um+0.003
ally+0.003
Neg logit contributions
 pret-0.004
 swords-0.004
ips-0.004
ses-0.004
 loses-0.004
Tokenst
Token ID301
Feature activation+1.83e-02

Pos logit contributions
 year+0.002
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ses-0.001
 loses-0.001
Tokenric
Token ID1173
Feature activation+5.85e-02

Pos logit contributions
 pret+0.026
 swords+0.025
ips+0.025
ses+0.024
 loses+0.024
Neg logit contributions
 year-0.039
ini-0.029
 world-0.029
um-0.029
ide-0.028
Tokenhes
Token ID956
Feature activation+1.96e-02

Pos logit contributions
 pret+0.060
 swords+0.057
ips+0.057
ses+0.054
 loses+0.052
Neg logit contributions
 year-0.148
ini-0.116
 world-0.115
ally-0.114
um-0.114
Token".
Token ID1911
Feature activation+9.00e-03

Pos logit contributions
 pret+0.031
ips+0.031
 swords+0.030
ids+0.030
ses+0.030
Neg logit contributions
 year-0.037
ini-0.027
urious-0.027
ally-0.027
Mom-0.026
Token Sara
Token ID24799
Feature activation+7.70e-03

Pos logit contributions
 pret+0.012
ips+0.011
ses+0.011
 swords+0.011
 loses+0.011
Neg logit contributions
 year-0.020
ini-0.016
 world-0.015
ally-0.015
heart-0.015
Token and
Token ID290
Feature activation+5.02e-04

Pos logit contributions
 pret+0.014
 swords+0.014
ips+0.014
ses+0.014
 loses+0.013
Neg logit contributions
 year-0.013
ini-0.009
 world-0.009
um-0.009
ally-0.009
Token Tom
Token ID4186
Feature activation-9.05e-04

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
drivers+0.001
 loses+0.001
Neg logit contributions
 year-0.001
um-0.001
ally-0.001
Mom-0.001
urious-0.001
Token did
Token ID750
Feature activation-8.38e-03

Pos logit contributions
 year+0.001
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 loses-0.002
Token not
Token ID407
Feature activation-4.58e-03

Pos logit contributions
 year+0.015
 world+0.010
ini+0.010
um+0.010
ally+0.010
Neg logit contributions
 pret-0.015
ips-0.015
 swords-0.015
ses-0.015
ids-0.014
Token know
Token ID760
Feature activation-9.56e-03

Pos logit contributions
 year+0.009
 world+0.006
ide+0.006
ie+0.006
Mom+0.006
Neg logit contributions
ips-0.008
ses-0.008
 swords-0.008
 pret-0.008
drivers-0.008
Token what
Token ID644
Feature activation-9.80e-03

Pos logit contributions
 year+0.018
 world+0.012
ini+0.012
um+0.012
ide+0.012
Neg logit contributions
 pret-0.017
ips-0.016
 swords-0.016
ses-0.016
 loses-0.015
Token ost
Token ID23619
Feature activation+1.26e-02

Pos logit contributions
 year+0.019
 world+0.013
time+0.012
ini+0.012
Mom+0.012
Neg logit contributions
ips-0.017
 pret-0.017
ses-0.016
 swords-0.016
 loses-0.016
Tokenric
Token ID1173
Feature activation+5.85e-02

Pos logit contributions
 pret+0.018
 swords+0.017
ips+0.016
 loses+0.016
ses+0.016
Neg logit contributions
 year-0.027
ini-0.021
 world-0.020
um-0.020
ally-0.020
Tokenhes
Token ID956
Feature activation+2.25e-02

Pos logit contributions
 pret+0.062
 swords+0.059
ips+0.058
ses+0.057
 loses+0.054
Neg logit contributions
 year-0.146
ini-0.114
 world-0.113
um-0.113
ally-0.112
Token were
Token ID547
Feature activation-9.24e-03

Pos logit contributions
 pret+0.036
 swords+0.035
ips+0.034
ses+0.034
 loses+0.033
Neg logit contributions
 year-0.044
ini-0.032
 world-0.032
um-0.031
ally-0.031
Token.
Token ID13
Feature activation+2.54e-03

Pos logit contributions
 year+0.015
 world+0.009
um+0.009
ini+0.009
ide+0.009
Neg logit contributions
 pret-0.018
ips-0.018
 swords-0.018
ses-0.018
 Stri-0.017
Token They
Token ID1119
Feature activation+3.70e-04

Pos logit contributions
 pret+0.005
ips+0.004
ses+0.004
 swords+0.004
 Stri+0.004
Neg logit contributions
 year-0.004
ini-0.003
ally-0.003
 world-0.003
heart-0.003
Token looked
Token ID3114
Feature activation+1.90e-03

Pos logit contributions
 pret+0.001
ips+0.001
 swords+0.001
ses+0.001
ids+0.001
Neg logit contributions
 year-0.001
 world-0.000
ally-0.000
um-0.000
Mom-0.000
Token a
Token ID257
Feature activation-1.96e-03

Pos logit contributions
 year+0.005
um+0.003
 world+0.003
Mom+0.003
ie+0.003
Neg logit contributions
 pret-0.005
ses-0.005
 loses-0.005
ips-0.005
ids-0.005
Token big
Token ID1263
Feature activation-4.82e-03

Pos logit contributions
 year+0.004
heart+0.003
ini+0.003
um+0.003
ally+0.003
Neg logit contributions
 swords-0.003
 pret-0.003
 scar-0.003
 superheroes-0.003
ses-0.003
Token box
Token ID3091
Feature activation-7.73e-03

Pos logit contributions
 year+0.009
ini+0.007
 world+0.007
um+0.007
ally+0.007
Neg logit contributions
 pret-0.008
 swords-0.008
ips-0.008
ses-0.007
 loses-0.007
Token of
Token ID286
Feature activation+2.19e-03

Pos logit contributions
 year+0.014
ini+0.010
 world+0.010
ally+0.010
urious+0.010
Neg logit contributions
 swords-0.013
 pret-0.013
ips-0.013
ses-0.012
ids-0.012
Token ch
Token ID442
Feature activation-5.63e-03

Pos logit contributions
 pret+0.002
 swords+0.002
 scar+0.002
 superheroes+0.002
 badges+0.002
Neg logit contributions
 year-0.005
ally-0.004
urious-0.004
um-0.004
ini-0.004
Tokenocol
Token ID4668
Feature activation+5.78e-02

Pos logit contributions
 year+0.013
ini+0.010
um+0.010
 world+0.010
ally+0.010
Neg logit contributions
 pret-0.007
ses-0.007
 swords-0.006
ips-0.006
 loses-0.006
Tokenates
Token ID689
Feature activation-6.56e-03

Pos logit contributions
 pret+0.079
 loses+0.076
ses+0.069
 Robo+0.068
 Stri+0.067
Neg logit contributions
 year-0.132
um-0.107
ini-0.104
ide-0.103
Mom-0.102
Token to
Token ID284
Feature activation+1.39e-03

Pos logit contributions
 year+0.011
um+0.007
 world+0.007
ide+0.007
ie+0.007
Neg logit contributions
 pret-0.013
ses-0.012
ips-0.012
 swords-0.012
 loses-0.012
Token share
Token ID2648
Feature activation-6.44e-03

Pos logit contributions
 pret+0.002
 swords+0.002
 Stri+0.002
 superheroes+0.002
 scar+0.002
Neg logit contributions
 year-0.003
ini-0.002
urious-0.002
spe-0.002
um-0.002
Token with
Token ID351
Feature activation-1.57e-02

Pos logit contributions
 year+0.011
ini+0.009
urious+0.009
ally+0.008
rated+0.008
Neg logit contributions
 swords-0.012
 pret-0.012
 Stri-0.011
ips-0.011
 Robo-0.011
Token the
Token ID262
Feature activation+9.67e-03

Pos logit contributions
 year+0.037
ini+0.033
rated+0.032
urious+0.032
spe+0.031
Neg logit contributions
 swords-0.018
 Stri-0.016
 superheroes-0.016
 pret-0.015
 Robo-0.015
Tokenie
Token ID494
Feature activation+6.92e-03

Pos logit contributions
 pret+0.045
 swords+0.044
ips+0.043
 Stri+0.042
ses+0.042
Neg logit contributions
 year-0.036
 world-0.024
Mom-0.024
um-0.023
urious-0.023
Token.
Token ID13
Feature activation+3.21e-03

Pos logit contributions
 Stri+0.012
 swords+0.012
 pret+0.012
 scar+0.011
 whist+0.011
Neg logit contributions
 year-0.012
rated-0.009
urious-0.009
ini-0.009
spe-0.009
Token Unlike
Token ID12101
Feature activation+7.49e-03

Pos logit contributions
 Stri+0.004
 pret+0.004
ips+0.004
 swords+0.003
ses+0.003
Neg logit contributions
 year-0.007
ini-0.006
urious-0.006
ally-0.006
spe-0.006
Token other
Token ID584
Feature activation+2.36e-02

Pos logit contributions
ips+0.013
Aud+0.012
 swords+0.012
 pret+0.012
ses+0.012
Neg logit contributions
 year-0.014
ie-0.010
 world-0.010
heart-0.010
ide-0.009
Token ost
Token ID23619
Feature activation+1.69e-02

Pos logit contributions
 pret+0.037
 swords+0.036
ips+0.036
ses+0.035
 loses+0.034
Neg logit contributions
 year-0.047
ini-0.034
 world-0.034
um-0.033
ally-0.033
Tokenric
Token ID1173
Feature activation+5.69e-02

Pos logit contributions
 pret+0.024
 swords+0.022
ips+0.022
ses+0.021
 loses+0.021
Neg logit contributions
 year-0.037
ini-0.028
 world-0.028
ally-0.028
um-0.028
Tokenhes
Token ID956
Feature activation+1.44e-02

Pos logit contributions
 pret+0.053
ips+0.052
 swords+0.051
ses+0.048
 loses+0.047
Neg logit contributions
 year-0.148
ini-0.118
ally-0.117
 world-0.116
um-0.116
Token,
Token ID11
Feature activation+7.48e-03

Pos logit contributions
 pret+0.027
 swords+0.026
ips+0.026
ses+0.026
 loses+0.025
Neg logit contributions
 year-0.024
ini-0.016
 world-0.016
um-0.016
ally-0.016
Token his
Token ID465
Feature activation+3.94e-03

Pos logit contributions
 pret+0.011
 swords+0.010
 Stri+0.010
ses+0.010
ips+0.010
Neg logit contributions
 year-0.016
um-0.012
ally-0.012
urious-0.012
ini-0.012
Token feathers
Token ID27400
Feature activation-8.94e-04

Pos logit contributions
 pret+0.006
ses+0.006
 swords+0.006
ips+0.006
 scar+0.006
Neg logit contributions
 year-0.008
ini-0.006
ally-0.006
ide-0.005
Mom-0.005
Token gl
Token ID1278
Feature activation+1.12e-02

Pos logit contributions
 year+0.001
ini+0.001
um+0.001
Mom+0.001
ally+0.001
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 loses-0.002
Token a
Token ID257
Feature activation+3.46e-03

Pos logit contributions
 year+0.002
 world+0.002
ini+0.002
um+0.002
Mom+0.002
Neg logit contributions
 pret-0.002
ips-0.002
ses-0.002
 swords-0.002
 loses-0.002
Token large
Token ID1588
Feature activation-7.18e-03

Pos logit contributions
 pret+0.006
 swords+0.006
 scar+0.006
 training+0.005
ses+0.005
Neg logit contributions
 year-0.006
urious-0.005
ide-0.005
ally-0.005
heart-0.005
Token selection
Token ID6356
Feature activation-1.74e-02

Pos logit contributions
 year+0.014
ini+0.010
urious+0.010
ally+0.010
ide+0.010
Neg logit contributions
 pret-0.012
 swords-0.011
ses-0.011
ips-0.011
 training-0.010
Token of
Token ID286
Feature activation+8.85e-03

Pos logit contributions
 year+0.027
ini+0.018
urious+0.018
ally+0.018
 world+0.018
Neg logit contributions
 pret-0.033
ips-0.033
 swords-0.033
ids-0.032
drivers-0.031
Token ch
Token ID442
Feature activation+1.04e-03

Pos logit contributions
 pret+0.013
ips+0.012
 swords+0.012
ses+0.012
 loses+0.012
Neg logit contributions
 year-0.019
ini-0.014
 world-0.014
um-0.014
ally-0.014
Tokenocol
Token ID4668
Feature activation+5.62e-02

Pos logit contributions
 pret+0.001
ses+0.001
 swords+0.001
 loses+0.001
 Stri+0.001
Neg logit contributions
 year-0.003
um-0.002
ini-0.002
 world-0.002
heart-0.002
Tokenates
Token ID689
Feature activation-6.06e-04

Pos logit contributions
 pret+0.076
 loses+0.076
ses+0.074
 Robo+0.069
 superheroes+0.068
Neg logit contributions
 year-0.125
um-0.103
ide-0.097
ini-0.097
Mom-0.095
Token.
Token ID13
Feature activation-9.62e-04

Pos logit contributions
 year+0.001
 world+0.000
um+0.000
ini+0.000
ide+0.000
Neg logit contributions
 pret-0.001
 swords-0.001
ses-0.001
ips-0.001
 loses-0.001
Token Ell
Token ID7122
Feature activation-1.19e-02

Pos logit contributions
 year+0.002
ini+0.002
ie+0.002
rated+0.001
urious+0.001
Neg logit contributions
 pret-0.001
ids-0.001
ips-0.001
 Stri-0.001
 Robo-0.001
Tokena
Token ID64
Feature activation-8.20e-03

Pos logit contributions
 year+0.025
 world+0.021
umm+0.021
time+0.020
um+0.020
Neg logit contributions
 pret-0.014
ips-0.014
etchup-0.013
ids-0.013
 loses-0.012
Token picked
Token ID6497
Feature activation-3.56e-04

Pos logit contributions
 year+0.013
ini+0.009
 world+0.009
heart+0.009
rated+0.009
Neg logit contributions
 pret-0.016
 swords-0.015
ips-0.014
 loses-0.014
ids-0.014
Token shouted
Token ID17293
Feature activation+9.53e-04

Pos logit contributions
ips+0.015
 pret+0.014
 swords+0.014
ses+0.014
ids+0.014
Neg logit contributions
 year-0.012
Mom-0.008
um-0.008
urious-0.008
ally-0.008
Token.
Token ID13
Feature activation+1.66e-03

Pos logit contributions
 pret+0.002
ips+0.002
 swords+0.002
ses+0.002
 loses+0.002
Neg logit contributions
 year-0.001
um-0.001
 world-0.001
ini-0.001
ally-0.001
Token "
Token ID366
Feature activation-2.05e-03

Pos logit contributions
 pret+0.003
 Robo+0.002
ses+0.002
 swords+0.002
 Stri+0.002
Neg logit contributions
 year-0.003
ini-0.003
heart-0.003
rated-0.003
ally-0.003
TokenThe
Token ID464
Feature activation-6.50e-03

Pos logit contributions
 year+0.003
ini+0.002
ally+0.002
um+0.002
heart+0.002
Neg logit contributions
 pret-0.004
ips-0.004
 swords-0.004
ses-0.004
 loses-0.004
Token ost
Token ID23619
Feature activation+8.11e-03

Pos logit contributions
 year+0.013
 world+0.010
ini+0.010
um+0.010
Mom+0.010
Neg logit contributions
 pret-0.010
ips-0.010
 swords-0.010
ids-0.009
 loses-0.009
Tokenric
Token ID1173
Feature activation+5.62e-02

Pos logit contributions
 pret+0.012
 swords+0.011
 superheroes+0.011
 loses+0.011
ips+0.011
Neg logit contributions
 year-0.017
ini-0.013
um-0.013
 world-0.013
ide-0.013
Tokenhes
Token ID956
Feature activation+1.36e-02

Pos logit contributions
ips+0.045
 pret+0.045
 swords+0.043
ses+0.040
ids+0.039
Neg logit contributions
 year-0.153
ini-0.125
ally-0.125
heart-0.122
 world-0.122
Token are
Token ID389
Feature activation+1.02e-02

Pos logit contributions
 pret+0.018
ips+0.018
 loses+0.018
 swords+0.018
ses+0.017
Neg logit contributions
 year-0.028
ini-0.023
ally-0.022
urious-0.022
Mom-0.022
Token so
Token ID523
Feature activation+4.25e-04

Pos logit contributions
ips+0.018
 pret+0.018
rob+0.018
drivers+0.017
 swords+0.017
Neg logit contributions
 year-0.018
 world-0.013
 sound-0.011
Mom-0.011
ide-0.011
Token funny
Token ID8258
Feature activation-7.92e-03

Pos logit contributions
 swords+0.001
 loses+0.001
ips+0.001
 pret+0.001
 Stri+0.001
Neg logit contributions
 year-0.001
ally-0.001
ini-0.001
heart-0.001
urious-0.001
Token!"
Token ID2474
Feature activation-1.78e-03

Pos logit contributions
 year+0.013
ini+0.008
 world+0.008
um+0.008
ally+0.008
Neg logit contributions
 pret-0.016
 swords-0.015
ips-0.015
ses-0.015
 loses-0.015
TokenYes
Token ID5297
Feature activation+3.82e-03

Pos logit contributions
 pret+0.026
ips+0.025
 swords+0.025
ses+0.025
 Stri+0.025
Neg logit contributions
 year-0.016
ini-0.010
um-0.009
ally-0.009
 world-0.009
Token,
Token ID11
Feature activation-9.29e-04

Pos logit contributions
 swords+0.005
 pret+0.005
ses+0.005
 loses+0.005
ids+0.005
Neg logit contributions
 year-0.008
ini-0.006
 world-0.006
urious-0.006
spe-0.006
Token they
Token ID484
Feature activation+1.30e-02

Pos logit contributions
 year+0.002
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ses-0.001
 loses-0.001
Token are
Token ID389
Feature activation+3.35e-03

Pos logit contributions
 pret+0.020
ips+0.020
drivers+0.020
 Stri+0.019
ses+0.019
Neg logit contributions
 year-0.026
 world-0.020
urious-0.019
ally-0.019
um-0.019
Token ost
Token ID23619
Feature activation+7.23e-03

Pos logit contributions
 pret+0.007
ips+0.007
 swords+0.007
ses+0.007
 loses+0.007
Neg logit contributions
 year-0.005
 world-0.003
um-0.003
ini-0.003
ide-0.003
Tokenric
Token ID1173
Feature activation+5.61e-02

Pos logit contributions
 pret+0.011
 swords+0.010
 loses+0.010
 superheroes+0.010
ips+0.010
Neg logit contributions
 year-0.015
ini-0.012
ally-0.011
ide-0.011
um-0.011
Tokenhes
Token ID956
Feature activation+1.12e-02

Pos logit contributions
 pret+0.057
 swords+0.054
ips+0.053
ses+0.052
 loses+0.050
Neg logit contributions
 year-0.143
 world-0.111
ini-0.111
um-0.110
Mom-0.109
Token,"
Token ID553
Feature activation+8.06e-03

Pos logit contributions
 pret+0.019
 swords+0.018
ips+0.018
ses+0.017
 loses+0.017
Neg logit contributions
 year-0.021
ini-0.016
ally-0.015
 world-0.015
urious-0.015
Token mom
Token ID1995
Feature activation+6.35e-03

Pos logit contributions
 pret+0.012
 swords+0.012
ips+0.011
ses+0.011
 loses+0.011
Neg logit contributions
 year-0.017
ini-0.012
 world-0.012
um-0.012
ally-0.012
Token said
Token ID531
Feature activation+7.42e-03

Pos logit contributions
 pret+0.008
 swords+0.008
ips+0.008
 Stri+0.008
ses+0.007
Neg logit contributions
 year-0.014
 world-0.011
ini-0.011
urious-0.011
um-0.010
Token.
Token ID13
Feature activation+6.40e-03

Pos logit contributions
 pret+0.016
 swords+0.016
ips+0.016
ses+0.016
 loses+0.016
Neg logit contributions
 year-0.010
um-0.006
ini-0.006
ally-0.006
 world-0.006
Token a
Token ID257
Feature activation+4.34e-03

Pos logit contributions
 year+0.013
ini+0.011
urious+0.011
ally+0.011
rated+0.011
Neg logit contributions
 swords-0.007
 pret-0.006
 Stri-0.006
 superheroes-0.006
 badges-0.006
Token big
Token ID1263
Feature activation-3.40e-03

Pos logit contributions
 pret+0.008
 swords+0.007
 scar+0.007
 superheroes+0.007
 badges+0.007
Neg logit contributions
 year-0.008
urious-0.006
ide-0.006
rated-0.006
heart-0.006
Token box
Token ID3091
Feature activation-8.19e-03

Pos logit contributions
 year+0.006
ini+0.004
um+0.004
ally+0.004
 world+0.004
Neg logit contributions
 pret-0.006
 swords-0.006
ips-0.006
ses-0.005
 loses-0.005
Token of
Token ID286
Feature activation+4.52e-03

Pos logit contributions
 year+0.014
ini+0.010
urious+0.010
ally+0.010
heart+0.010
Neg logit contributions
 swords-0.014
ips-0.014
 pret-0.014
 training-0.013
ids-0.013
Token ch
Token ID442
Feature activation-5.75e-03

Pos logit contributions
 pret+0.006
 swords+0.006
 superheroes+0.005
 scar+0.005
 badges+0.005
Neg logit contributions
 year-0.010
ally-0.008
um-0.008
urious-0.008
ini-0.008
Tokenocol
Token ID4668
Feature activation+5.58e-02

Pos logit contributions
 year+0.013
um+0.010
ini+0.010
 world+0.010
ally+0.010
Neg logit contributions
 pret-0.008
ses-0.008
 Stri-0.007
rob-0.007
drivers-0.007
Tokenates
Token ID689
Feature activation-7.50e-03

Pos logit contributions
 pret+0.077
 loses+0.073
 Robo+0.069
ses+0.069
 superheroes+0.067
Neg logit contributions
 year-0.125
um-0.103
ide-0.098
ini-0.097
Mom-0.096
Token.
Token ID13
Feature activation+6.07e-03

Pos logit contributions
 year+0.010
 world+0.006
um+0.006
ini+0.006
ally+0.006
Neg logit contributions
 pret-0.017
 swords-0.016
ips-0.016
ses-0.016
 loses-0.016
Token It
Token ID632
Feature activation-1.57e-04

Pos logit contributions
 pret+0.009
 Stri+0.009
 swords+0.009
 Robo+0.009
ids+0.009
Neg logit contributions
 year-0.012
ini-0.010
ie-0.009
rated-0.009
urious-0.009
Token was
Token ID373
Feature activation+6.52e-03

Pos logit contributions
 year+0.000
ini+0.000
 world+0.000
um+0.000
urious+0.000
Neg logit contributions
 pret-0.000
 swords-0.000
 Stri-0.000
 scar-0.000
drivers-0.000
Token so
Token ID523
Feature activation+1.66e-03

Pos logit contributions
 pret+0.010
ips+0.009
 swords+0.009
ses+0.009
 loses+0.009
Neg logit contributions
 year-0.013
 world-0.010
ini-0.010
um-0.010
ally-0.010
Token was
Token ID373
Feature activation+8.73e-03

Pos logit contributions
 pret+0.006
 swords+0.006
ips+0.006
ses+0.005
 loses+0.005
Neg logit contributions
 year-0.006
ini-0.004
 world-0.004
um-0.004
ally-0.004
Token a
Token ID257
Feature activation+2.96e-03

Pos logit contributions
 pret+0.011
 swords+0.010
ips+0.010
ses+0.010
 loses+0.010
Neg logit contributions
 year-0.020
ini-0.016
 world-0.015
um-0.015
ally-0.015
Token box
Token ID3091
Feature activation-5.16e-03

Pos logit contributions
 pret+0.005
 swords+0.005
 scar+0.005
 superheroes+0.005
 training+0.005
Neg logit contributions
 year-0.005
ide-0.004
urious-0.004
ally-0.004
ini-0.004
Token of
Token ID286
Feature activation+7.59e-03

Pos logit contributions
 year+0.009
urious+0.006
ini+0.006
ally+0.006
 world+0.006
Neg logit contributions
 swords-0.009
 pret-0.009
ips-0.009
ids-0.009
ses-0.009
Token ch
Token ID442
Feature activation-2.37e-03

Pos logit contributions
 pret+0.011
 swords+0.010
ips+0.010
ses+0.010
 scar+0.009
Neg logit contributions
 year-0.016
ini-0.012
ally-0.012
um-0.012
 world-0.012
Tokenocol
Token ID4668
Feature activation+5.56e-02

Pos logit contributions
 year+0.006
um+0.004
ini+0.004
 world+0.004
heart+0.004
Neg logit contributions
 pret-0.003
ses-0.003
 swords-0.003
rob-0.003
 Stri-0.003
Tokenates
Token ID689
Feature activation-6.38e-03

Pos logit contributions
 pret+0.084
 loses+0.082
 Robo+0.079
ses+0.077
 superheroes+0.076
Neg logit contributions
 year-0.116
um-0.097
Mom-0.090
ide-0.090
ini-0.089
Token.
Token ID13
Feature activation+1.75e-03

Pos logit contributions
 year+0.009
um+0.006
 world+0.006
ini+0.006
ide+0.006
Neg logit contributions
 pret-0.014
ses-0.013
ips-0.013
 swords-0.013
 loses-0.013
Token "
Token ID366
Feature activation+3.40e-03

Pos logit contributions
 pret+0.003
 Stri+0.003
ips+0.003
 swords+0.003
ids+0.003
Neg logit contributions
 year-0.003
ini-0.002
ie-0.002
urious-0.002
 world-0.002
TokenFor
Token ID1890
Feature activation+1.32e-02

Pos logit contributions
 pret+0.006
ips+0.006
 training+0.006
Aud+0.006
 Stri+0.006
Neg logit contributions
 year-0.005
ini-0.004
um-0.004
ally-0.004
ie-0.004
Token me
Token ID502
Feature activation-4.03e-03

Pos logit contributions
 pret+0.020
 swords+0.019
ips+0.019
ses+0.019
 loses+0.018
Neg logit contributions
 year-0.027
 world-0.020
ini-0.020
um-0.019
ally-0.019
Token she
Token ID673
Feature activation-1.48e-02

Pos logit contributions
 year+0.030
ini+0.022
 world+0.022
um+0.022
Mom+0.021
Neg logit contributions
 pret-0.026
 swords-0.024
ips-0.024
 loses-0.024
ses-0.023
Token was
Token ID373
Feature activation-6.36e-03

Pos logit contributions
 year+0.025
ini+0.019
 world+0.019
um+0.018
ally+0.017
Neg logit contributions
 pret-0.027
 loses-0.025
 swords-0.025
 scar-0.024
ses-0.024
Token very
Token ID845
Feature activation-1.62e-02

Pos logit contributions
 year+0.011
 world+0.008
ini+0.008
um+0.007
ally+0.007
Neg logit contributions
 pret-0.012
 swords-0.011
ips-0.011
ses-0.011
 loses-0.011
Token su
Token ID424
Feature activation-9.36e-03

Pos logit contributions
 year+0.032
ally+0.025
ini+0.025
ide+0.025
 world+0.024
Neg logit contributions
 pret-0.024
 loses-0.023
ips-0.023
 swords-0.022
 training-0.022
Tokenpr
Token ID1050
Feature activation-2.47e-02

Pos logit contributions
 year+0.022
ini+0.016
ally+0.016
um+0.016
 world+0.016
Neg logit contributions
 pret-0.012
 swords-0.012
ses-0.011
ips-0.011
 loses-0.011
Tokenised
Token ID1417
Feature activation-5.40e-02

Pos logit contributions
 year+0.041
 world+0.030
Mom+0.029
ally+0.028
ini+0.028
Neg logit contributions
ips-0.044
 swords-0.042
 loses-0.041
 pret-0.041
ids-0.040
Token by
Token ID416
Feature activation-2.34e-02

Pos logit contributions
 year+0.084
Mom+0.066
ini+0.065
ally+0.065
urious+0.063
Neg logit contributions
ips-0.096
 pret-0.094
 training-0.092
 swords-0.091
 enough-0.090
Token how
Token ID703
Feature activation+3.83e-03

Pos logit contributions
 year+0.052
ini+0.040
 world+0.039
um+0.039
urious+0.039
Neg logit contributions
 pret-0.031
 swords-0.030
ips-0.030
ses-0.029
 Stri-0.028
Token tasty
Token ID25103
Feature activation-4.49e-03

Pos logit contributions
 swords+0.005
ips+0.005
 loses+0.005
 pret+0.005
etchup+0.005
Neg logit contributions
 year-0.007
ide-0.006
ini-0.006
um-0.006
ally-0.005
Token it
Token ID340
Feature activation+4.65e-03

Pos logit contributions
 year+0.009
ini+0.007
urious+0.007
 world+0.007
um+0.007
Neg logit contributions
 pret-0.007
 swords-0.006
ips-0.006
ses-0.006
 loses-0.006
Token was
Token ID373
Feature activation-8.14e-03

Pos logit contributions
 pret+0.010
 swords+0.010
ips+0.010
 loses+0.010
ses+0.010
Neg logit contributions
 year-0.006
um-0.004
ini-0.004
Mom-0.004
 world-0.004
Token and
Token ID290
Feature activation-1.28e-02

Pos logit contributions
 year+0.008
ini+0.006
ally+0.006
Mom+0.006
heart+0.006
Neg logit contributions
 pret-0.006
 swords-0.006
 training-0.006
ips-0.006
 loses-0.006
Token he
Token ID339
Feature activation-1.62e-02

Pos logit contributions
 year+0.023
 world+0.016
ally+0.016
ini+0.016
um+0.016
Neg logit contributions
 pret-0.022
ips-0.022
ses-0.021
 swords-0.021
ids-0.021
Token was
Token ID373
Feature activation-7.67e-03

Pos logit contributions
 year+0.029
ini+0.020
 world+0.020
um+0.020
ally+0.020
Neg logit contributions
 pret-0.028
 swords-0.027
ips-0.027
ses-0.027
 loses-0.026
Token su
Token ID424
Feature activation-6.22e-03

Pos logit contributions
 year+0.012
 world+0.008
spe+0.008
time+0.007
um+0.007
Neg logit contributions
ses-0.015
 pret-0.015
 swords-0.015
rob-0.015
ips-0.015
Tokenpr
Token ID1050
Feature activation-2.41e-02

Pos logit contributions
 year+0.014
ini+0.011
 world+0.011
um+0.011
ally+0.011
Neg logit contributions
 pret-0.008
 swords-0.008
ips-0.008
ses-0.007
 loses-0.007
Tokenised
Token ID1417
Feature activation-5.23e-02

Pos logit contributions
 year+0.038
ini+0.025
 world+0.025
ally+0.025
Mom+0.025
Neg logit contributions
 pret-0.047
ips-0.046
 swords-0.046
 loses-0.044
ses-0.044
Token at
Token ID379
Feature activation-2.56e-02

Pos logit contributions
 year+0.086
ini+0.061
ally+0.060
um+0.059
Mom+0.059
Neg logit contributions
 pret-0.097
ips-0.096
 swords-0.096
 loses-0.091
 training-0.091
Token how
Token ID703
Feature activation-2.74e-04

Pos logit contributions
 year+0.053
ini+0.039
 world+0.038
um+0.038
ally+0.038
Neg logit contributions
 pret-0.038
 swords-0.037
ips-0.037
ses-0.036
 loses-0.035
Token quickly
Token ID2952
Feature activation-9.18e-04

Pos logit contributions
 year+0.001
ally+0.000
ini+0.000
um+0.000
ide+0.000
Neg logit contributions
 swords-0.000
 pret-0.000
ips-0.000
 loses-0.000
 training-0.000
Token he
Token ID339
Feature activation-1.13e-02

Pos logit contributions
 year+0.002
ini+0.001
um+0.001
ally+0.001
 world+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
 loses-0.001
ses-0.001
Token started
Token ID2067
Feature activation-6.86e-03

Pos logit contributions
 year+0.019
ini+0.013
 world+0.013
um+0.012
ally+0.012
Neg logit contributions
 pret-0.021
 swords-0.021
ips-0.021
ses-0.020
 loses-0.020
Token just
Token ID655
Feature activation-2.01e-03

Pos logit contributions
 year+0.038
ini+0.027
ally+0.026
um+0.026
urious+0.026
Neg logit contributions
 pret-0.043
 swords-0.042
ips-0.041
 loses-0.041
ids-0.040
Token yet
Token ID1865
Feature activation-2.43e-02

Pos logit contributions
 year+0.004
ini+0.003
 world+0.003
um+0.003
ally+0.003
Neg logit contributions
 pret-0.003
 swords-0.003
ips-0.003
ses-0.003
 loses-0.003
Tokenâ
Token ID22940
Feature activation-3.31e-02

Pos logit contributions
 year+0.040
ini+0.029
ally+0.028
Mom+0.027
um+0.027
Neg logit contributions
 pret-0.046
 swords-0.044
ips-0.044
 loses-0.043
 training-0.042
Token
Token ID26391
Feature activation-2.69e-02

Pos logit contributions
 year+0.062
ini+0.045
 world+0.044
um+0.044
Mom+0.044
Neg logit contributions
 pret-0.055
ips-0.054
ses-0.050
 Stri-0.050
 loses-0.050
Token
Token ID126
Feature activation-2.84e-02

Pos logit contributions
 year+0.052
ini+0.038
urious+0.037
um+0.037
ally+0.037
Neg logit contributions
 pret-0.044
ips-0.042
 swords-0.042
 loses-0.041
ids-0.041
Token
Token ID99
Feature activation-5.22e-02

Pos logit contributions
 year+0.054
ini+0.039
ally+0.038
um+0.038
 world+0.038
Neg logit contributions
 pret-0.047
ips-0.045
 swords-0.045
ses-0.044
 loses-0.043
Tokenbut
Token ID4360
Feature activation-1.76e-02

Pos logit contributions
 year+0.106
ie+0.081
ini+0.081
ally+0.079
rated+0.079
Neg logit contributions
ips-0.077
 pret-0.075
 Stri-0.072
ids-0.071
 swords-0.071
Token trust
Token ID3774
Feature activation+2.98e-03

Pos logit contributions
 year+0.035
ini+0.026
urious+0.026
um+0.025
ally+0.025
Neg logit contributions
 pret-0.027
 swords-0.026
ips-0.026
 loses-0.025
 Stri-0.025
Token me
Token ID502
Feature activation-1.04e-02

Pos logit contributions
 pret+0.005
ips+0.005
 swords+0.004
 loses+0.004
 Stri+0.004
Neg logit contributions
 year-0.006
ini-0.004
ally-0.004
urious-0.004
um-0.004
Token,
Token ID11
Feature activation-6.16e-03

Pos logit contributions
 year+0.017
ini+0.012
ally+0.012
urious+0.012
um+0.011
Neg logit contributions
 pret-0.019
ips-0.019
 swords-0.019
ids-0.018
 loses-0.018
Token it
Token ID340
Feature activation-9.25e-03

Pos logit contributions
 year+0.012
ini+0.009
ally+0.009
urious+0.009
um+0.008
Neg logit contributions
 pret-0.010
 swords-0.010
ips-0.009
 loses-0.009
ses-0.009
Token and
Token ID290
Feature activation-2.56e-03

Pos logit contributions
 year+0.016
 world+0.010
um+0.009
ini+0.009
ally+0.009
Neg logit contributions
 pret-0.023
 swords-0.022
ips-0.022
ses-0.022
 loses-0.021
Token offered
Token ID4438
Feature activation-1.94e-03

Pos logit contributions
 year+0.004
Mom+0.003
ally+0.003
um+0.003
 was+0.003
Neg logit contributions
ips-0.005
ids-0.004
 pret-0.004
 swords-0.004
ses-0.004
Token him
Token ID683
Feature activation-1.02e-02

Pos logit contributions
 year+0.005
ini+0.004
 world+0.004
urious+0.004
um+0.003
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 loses-0.002
Token a
Token ID257
Feature activation-2.99e-03

Pos logit contributions
 year+0.020
 world+0.014
um+0.014
time+0.014
Mom+0.014
Neg logit contributions
 pret-0.016
ses-0.016
ips-0.015
drivers-0.015
rob-0.015
Token y
Token ID331
Feature activation-2.41e-02

Pos logit contributions
 year+0.005
 world+0.004
um+0.003
Mom+0.003
ally+0.003
Neg logit contributions
ips-0.005
 pret-0.005
 swords-0.005
ids-0.005
ses-0.005
Tokenummy
Token ID13513
Feature activation-5.13e-02

Pos logit contributions
 year+0.027
spe+0.019
ini+0.017
Lu+0.017
Mom+0.016
Neg logit contributions
ips-0.053
 swords-0.053
ses-0.051
 loses-0.050
 scar-0.050
Token treat
Token ID2190
Feature activation-2.30e-02

Pos logit contributions
 year+0.105
 world+0.076
um+0.076
ini+0.075
ally+0.075
Neg logit contributions
 pret-0.078
ips-0.076
 swords-0.075
ses-0.073
 loses-0.072
Token as
Token ID355
Feature activation-9.01e-03

Pos logit contributions
 year+0.032
ini+0.020
 world+0.020
ally+0.019
um+0.019
Neg logit contributions
 pret-0.049
 swords-0.048
ips-0.048
ses-0.047
 loses-0.046
Token his
Token ID465
Feature activation+1.44e-03

Pos logit contributions
 year+0.018
um+0.013
 world+0.013
time+0.013
ally+0.013
Neg logit contributions
 pret-0.014
ips-0.013
ses-0.013
rob-0.013
drivers-0.013
Token prize
Token ID11596
Feature activation-5.75e-03

Pos logit contributions
 pret+0.002
 swords+0.002
ses+0.002
ips+0.002
 loses+0.002
Neg logit contributions
 year-0.003
ini-0.002
heart-0.002
ide-0.002
ally-0.002
Token.
Token ID13
Feature activation-2.53e-03

Pos logit contributions
 year+0.007
ini+0.004
 world+0.004
um+0.004
ally+0.004
Neg logit contributions
 pret-0.013
 swords-0.013
ips-0.013
ses-0.013
 loses-0.012
Token never
Token ID1239
Feature activation+1.10e-02

Pos logit contributions
 pret+0.009
 swords+0.009
 loses+0.008
 training+0.008
ids+0.008
Neg logit contributions
 year-0.009
ini-0.007
um-0.007
ally-0.006
urious-0.006
Token expected
Token ID2938
Feature activation-1.88e-03

Pos logit contributions
 pret+0.012
ips+0.012
 swords+0.012
ses+0.011
ids+0.011
Neg logit contributions
 year-0.027
 world-0.021
ini-0.021
ally-0.020
heart-0.020
Tokenâ
Token ID22940
Feature activation-1.89e-02

Pos logit contributions
 year+0.003
ini+0.002
urious+0.002
ally+0.002
Mom+0.002
Neg logit contributions
 pret-0.004
ips-0.004
 swords-0.003
ids-0.003
 loses-0.003
Token
Token ID26391
Feature activation-2.15e-02

Pos logit contributions
 year+0.033
ini+0.023
 world+0.023
um+0.022
ally+0.022
Neg logit contributions
 pret-0.034
ips-0.033
 swords-0.033
ses-0.032
 loses-0.032
Token
Token ID126
Feature activation-2.39e-02

Pos logit contributions
 year+0.045
ini+0.034
um+0.033
 world+0.033
heart+0.033
Neg logit contributions
 pret-0.031
ips-0.030
 swords-0.030
ses-0.029
 loses-0.029
Token
Token ID99
Feature activation-5.11e-02

Pos logit contributions
 year+0.043
ini+0.030
 world+0.029
heart+0.029
ally+0.029
Neg logit contributions
 pret-0.042
ips-0.041
 swords-0.041
ses-0.040
 loses-0.039
Token<|endoftext|>
Token ID50256
Feature activation+3.98e-03

Pos logit contributions
 year+0.101
ini+0.074
ie+0.074
heart+0.073
rated+0.072
Neg logit contributions
ips-0.079
 pret-0.075
 swords-0.075
ids-0.074
 loses-0.072
TokenOnce
Token ID7454
Feature activation+7.91e-03

Pos logit contributions
ips+0.005
 pret+0.005
 Stri+0.005
ids+0.004
 swords+0.004
Neg logit contributions
 year-0.009
ini-0.007
urious-0.007
spe-0.007
ally-0.007
Token upon
Token ID2402
Feature activation-3.75e-04

Pos logit contributions
ips+0.011
ids+0.010
 pret+0.010
 swords+0.010
 Robo+0.010
Neg logit contributions
 year-0.016
ini-0.013
umm-0.013
urious-0.012
 world-0.012
Token a
Token ID257
Feature activation-4.37e-03

Pos logit contributions
 year+0.001
ini+0.001
um+0.001
ally+0.001
 world+0.001
Neg logit contributions
 pret-0.000
 swords-0.000
ips-0.000
 loses-0.000
ses-0.000
Token time
Token ID640
Feature activation-4.33e-04

Pos logit contributions
spe+0.007
ini+0.007
 year+0.007
keeper+0.006
heart+0.006
Neg logit contributions
Aud-0.007
 pret-0.007
 scar-0.006
 tries-0.006
etchup-0.006
Tokenep
Token ID538
Feature activation+9.04e-03

Pos logit contributions
ses+0.015
 pret+0.014
 superheroes+0.014
ips+0.014
rob+0.013
Neg logit contributions
 year-0.018
ie-0.014
urious-0.014
um-0.014
ini-0.013
Token inside
Token ID2641
Feature activation+5.01e-03

Pos logit contributions
 pret+0.015
ips+0.014
 swords+0.014
ses+0.014
 loses+0.013
Neg logit contributions
 year-0.018
 world-0.013
ini-0.013
um-0.013
ally-0.012
Token and
Token ID290
Feature activation-5.98e-03

Pos logit contributions
 pret+0.007
 swords+0.007
ips+0.007
 Stri+0.007
 loses+0.007
Neg logit contributions
 year-0.010
ini-0.008
ally-0.008
um-0.008
Mom-0.008
Token su
Token ID424
Feature activation-4.29e-03

Pos logit contributions
 year+0.012
ini+0.009
ally+0.009
um+0.009
 world+0.009
Neg logit contributions
 pret-0.009
 swords-0.009
 loses-0.008
ses-0.008
ips-0.008
Tokenpr
Token ID1050
Feature activation-2.15e-02

Pos logit contributions
 year+0.009
ally+0.007
um+0.007
Mom+0.007
ini+0.007
Neg logit contributions
ses-0.007
 pret-0.007
 swords-0.006
 loses-0.006
 Stri-0.006
Tokenised
Token ID1417
Feature activation-4.99e-02

Pos logit contributions
 year+0.031
 world+0.020
Mom+0.020
ini+0.020
ally+0.019
Neg logit contributions
ips-0.044
 swords-0.043
 pret-0.043
 loses-0.042
ids-0.041
Token himself
Token ID2241
Feature activation-1.49e-02

Pos logit contributions
 year+0.084
 world+0.056
ini+0.056
um+0.055
ally+0.054
Neg logit contributions
 pret-0.094
 swords-0.092
ips-0.091
ses-0.090
 loses-0.088
Token when
Token ID618
Feature activation-1.48e-02

Pos logit contributions
 year+0.024
 world+0.015
spe+0.015
um+0.015
ide+0.015
Neg logit contributions
ses-0.029
ips-0.029
 pret-0.029
rob-0.028
 swords-0.028
Token he
Token ID339
Feature activation-7.24e-03

Pos logit contributions
 year+0.032
ini+0.024
 world+0.024
um+0.024
ally+0.024
Neg logit contributions
 pret-0.020
 swords-0.020
ips-0.020
ses-0.019
 loses-0.019
Token saw
Token ID2497
Feature activation-6.19e-03

Pos logit contributions
 year+0.012
ini+0.007
 world+0.007
ally+0.007
Mom+0.007
Neg logit contributions
 pret-0.014
ips-0.014
ses-0.014
 swords-0.014
drivers-0.014
Token that
Token ID326
Feature activation-8.31e-03

Pos logit contributions
 year+0.014
ini+0.010
 world+0.010
ally+0.010
um+0.010
Neg logit contributions
 pret-0.009
 swords-0.008
ips-0.008
ses-0.008
 loses-0.008
Token gave
Token ID2921
Feature activation-1.51e-02

Pos logit contributions
 year+0.036
 world+0.027
ini+0.027
um+0.026
urious+0.026
Neg logit contributions
 pret-0.025
 swords-0.023
ips-0.022
 Stri-0.022
 loses-0.022
Token them
Token ID606
Feature activation+2.09e-03

Pos logit contributions
 year+0.035
ini+0.028
spe+0.027
urious+0.027
um+0.026
Neg logit contributions
 pret-0.019
ips-0.018
 swords-0.018
 superheroes-0.017
 scar-0.017
Token all
Token ID477
Feature activation-1.99e-03

Pos logit contributions
 pret+0.003
ses+0.003
ips+0.003
 loses+0.003
 swords+0.003
Neg logit contributions
 year-0.004
um-0.003
Mom-0.003
ally-0.003
 world-0.003
Token a
Token ID257
Feature activation-5.78e-03

Pos logit contributions
 year+0.004
 world+0.003
um+0.003
ini+0.003
ally+0.003
Neg logit contributions
 pret-0.003
 swords-0.003
ses-0.003
ips-0.003
 loses-0.003
Token y
Token ID331
Feature activation-2.54e-02

Pos logit contributions
 year+0.010
ini+0.008
ide+0.008
ally+0.008
urious+0.008
Neg logit contributions
 pret-0.010
 swords-0.010
 scar-0.009
ses-0.009
 training-0.009
Tokenummy
Token ID13513
Feature activation-4.97e-02

Pos logit contributions
 year+0.026
ini+0.012
 world+0.012
Mom+0.012
spe+0.011
Neg logit contributions
ips-0.063
 swords-0.062
 pret-0.062
ses-0.060
 loses-0.060
Token snack
Token ID26906
Feature activation-1.20e-02

Pos logit contributions
 year+0.104
 world+0.076
ini+0.075
um+0.075
ally+0.074
Neg logit contributions
 pret-0.073
ips-0.072
 swords-0.071
ses-0.070
 loses-0.068
Token.
Token ID13
Feature activation+9.89e-05

Pos logit contributions
 year+0.019
ini+0.013
urious+0.012
 world+0.012
ally+0.012
Neg logit contributions
 pret-0.024
 swords-0.023
ips-0.023
ses-0.023
ids-0.022
Token They
Token ID1119
Feature activation+1.33e-03

Pos logit contributions
 pret+0.000
ips+0.000
ses+0.000
ids+0.000
 Robo+0.000
Neg logit contributions
 year-0.000
ini-0.000
rated-0.000
heart-0.000
spe-0.000
Token were
Token ID547
Feature activation+5.69e-03

Pos logit contributions
 pret+0.002
 swords+0.002
ips+0.002
ses+0.002
 loses+0.002
Neg logit contributions
 year-0.002
ini-0.002
 world-0.002
um-0.002
ally-0.002
Token all
Token ID477
Feature activation-3.86e-03

Pos logit contributions
 pret+0.011
 swords+0.010
ips+0.010
ses+0.010
 Stri+0.010
Neg logit contributions
 year-0.010
ini-0.007
 world-0.006
ally-0.006
urious-0.006
Token,
Token ID11
Feature activation-8.50e-05

Pos logit contributions
 pret+0.006
 swords+0.006
ses+0.006
ips+0.006
drivers+0.006
Neg logit contributions
 year-0.006
um-0.004
 world-0.004
ally-0.004
Mom-0.004
Token he
Token ID339
Feature activation-6.93e-03

Pos logit contributions
 year+0.000
ini+0.000
 world+0.000
um+0.000
ally+0.000
Neg logit contributions
 pret-0.000
 swords-0.000
ips-0.000
ses-0.000
 loses-0.000
Token was
Token ID373
Feature activation-4.33e-03

Pos logit contributions
 year+0.014
ini+0.010
 world+0.010
ide+0.010
um+0.010
Neg logit contributions
 pret-0.011
 swords-0.011
 loses-0.011
 Stri-0.010
 scar-0.010
Token su
Token ID424
Feature activation-4.79e-03

Pos logit contributions
 year+0.008
 world+0.006
um+0.006
Mom+0.006
ini+0.006
Neg logit contributions
 pret-0.007
 swords-0.007
ips-0.007
ses-0.007
drivers-0.007
Tokenpr
Token ID1050
Feature activation-2.11e-02

Pos logit contributions
 year+0.010
 world+0.008
ini+0.008
um+0.008
ally+0.008
Neg logit contributions
ips-0.006
 pret-0.006
 swords-0.006
 loses-0.006
ids-0.006
Tokenised
Token ID1417
Feature activation-4.93e-02

Pos logit contributions
 year+0.034
 world+0.024
Mom+0.023
ini+0.023
ally+0.023
Neg logit contributions
ips-0.039
 swords-0.038
 pret-0.038
 loses-0.037
ses-0.037
Token to
Token ID284
Feature activation-1.10e-02

Pos logit contributions
 year+0.086
ini+0.060
um+0.059
 world+0.058
ally+0.058
Neg logit contributions
 pret-0.089
 swords-0.087
ips-0.086
ses-0.084
 loses-0.083
Token see
Token ID766
Feature activation-1.53e-02

Pos logit contributions
 year+0.024
ini+0.019
um+0.018
ally+0.018
ide+0.018
Neg logit contributions
 pret-0.016
 swords-0.015
ses-0.014
 loses-0.014
 superheroes-0.014
Token a
Token ID257
Feature activation-6.94e-03

Pos logit contributions
 year+0.036
ini+0.028
um+0.028
ally+0.028
 world+0.027
Neg logit contributions
 swords-0.019
 pret-0.018
 Stri-0.017
 superheroes-0.017
ips-0.017
Token gray
Token ID12768
Feature activation-1.46e-02

Pos logit contributions
 year+0.013
urious+0.010
ide+0.010
um+0.010
ally+0.010
Neg logit contributions
 pret-0.011
 scar-0.011
 swords-0.010
 superheroes-0.010
 whist-0.010
Token bird
Token ID6512
Feature activation-5.93e-03

Pos logit contributions
 year+0.032
urious+0.029
spe+0.028
rated+0.027
Mom+0.026
Neg logit contributions
 pret-0.016
ses-0.016
 whist-0.015
 scar-0.014
 swords-0.014
Token of
Token ID286
Feature activation+2.00e-03

Pos logit contributions
 year+0.003
ally+0.002
ini+0.002
urious+0.002
um+0.002
Neg logit contributions
 pret-0.003
 swords-0.003
ips-0.003
ids-0.003
 loses-0.003
Token all
Token ID477
Feature activation+1.10e-02

Pos logit contributions
 pret+0.003
 swords+0.003
ips+0.003
 superheroes+0.003
ses+0.003
Neg logit contributions
 year-0.004
ini-0.003
ally-0.003
um-0.003
urious-0.003
Tokenâ
Token ID22940
Feature activation-1.81e-02

Pos logit contributions
 pret+0.018
ips+0.018
 swords+0.018
 loses+0.017
 superheroes+0.017
Neg logit contributions
 year-0.018
ally-0.015
Mom-0.015
ini-0.014
um-0.014
Token
Token ID26391
Feature activation-1.05e-02

Pos logit contributions
 year+0.032
ini+0.022
 world+0.022
ally+0.022
um+0.022
Neg logit contributions
 pret-0.032
 swords-0.032
ips-0.031
ses-0.031
 loses-0.030
Token
Token ID126
Feature activation-1.73e-02

Pos logit contributions
 year+0.019
ini+0.014
um+0.014
heart+0.014
urious+0.014
Neg logit contributions
 pret-0.017
ips-0.017
 swords-0.016
 loses-0.016
ids-0.016
Token
Token ID99
Feature activation-4.88e-02

Pos logit contributions
 year+0.033
ini+0.023
 world+0.023
um+0.023
ally+0.023
Neg logit contributions
 pret-0.029
 swords-0.028
ips-0.028
ses-0.027
 loses-0.027
Tokenif
Token ID361
Feature activation-5.31e-03

Pos logit contributions
 year+0.100
ini+0.074
ally+0.073
 world+0.072
ie+0.072
Neg logit contributions
ips-0.072
 pret-0.071
 swords-0.069
ids-0.067
 loses-0.066
Token she
Token ID673
Feature activation+9.02e-03

Pos logit contributions
 year+0.010
ini+0.007
ally+0.007
 world+0.007
um+0.007
Neg logit contributions
 pret-0.009
ips-0.008
 swords-0.008
 loses-0.008
ses-0.008
Token only
Token ID691
Feature activation+1.49e-02

Pos logit contributions
 pret+0.014
 swords+0.014
ips+0.014
ses+0.013
 loses+0.013
Neg logit contributions
 year-0.018
ini-0.013
ally-0.013
um-0.013
 world-0.013
Token follows
Token ID5679
Feature activation-7.23e-03

Pos logit contributions
 pret+0.023
ips+0.022
 swords+0.022
ses+0.022
 Stri+0.021
Neg logit contributions
 year-0.030
 world-0.022
ini-0.021
um-0.021
heart-0.021
Token the
Token ID262
Feature activation-1.60e-03

Pos logit contributions
 year+0.014
urious+0.011
ini+0.011
ally+0.010
spe+0.010
Neg logit contributions
 pret-0.011
 Stri-0.011
ips-0.011
 training-0.011
 swords-0.010
Token lady
Token ID10846
Feature activation-1.75e-02

Pos logit contributions
 year+0.024
ini+0.020
ide+0.020
heart+0.020
urious+0.019
Neg logit contributions
 pret-0.023
ses-0.023
ips-0.022
 training-0.022
 tickets-0.022
Token gave
Token ID2921
Feature activation-5.69e-03

Pos logit contributions
 year+0.034
ini+0.025
 world+0.025
um+0.024
ally+0.024
Neg logit contributions
 pret-0.028
 swords-0.027
ips-0.027
ses-0.027
 loses-0.026
Token Tim
Token ID5045
Feature activation-2.17e-03

Pos logit contributions
 year+0.013
ini+0.011
urious+0.010
 world+0.010
spe+0.010
Neg logit contributions
ips-0.006
 pret-0.006
 Stri-0.006
 swords-0.006
 scar-0.006
Token a
Token ID257
Feature activation-4.37e-05

Pos logit contributions
 year+0.004
ini+0.003
 world+0.003
ally+0.003
um+0.003
Neg logit contributions
 pret-0.003
 swords-0.003
ips-0.003
ses-0.003
 loses-0.003
Token y
Token ID331
Feature activation-2.16e-02

Pos logit contributions
 year+0.000
 world+0.000
um+0.000
ini+0.000
Mom+0.000
Neg logit contributions
ips-0.000
 pret-0.000
ses-0.000
 swords-0.000
 loses-0.000
Tokenummy
Token ID13513
Feature activation-4.87e-02

Pos logit contributions
 year+0.023
spe+0.013
Mom+0.013
ini+0.013
 world+0.013
Neg logit contributions
ips-0.051
 swords-0.050
 pret-0.049
ses-0.049
 scar-0.048
Token treat
Token ID2190
Feature activation-1.98e-02

Pos logit contributions
 year+0.103
um+0.075
 world+0.074
ini+0.073
ally+0.073
Neg logit contributions
ips-0.071
 pret-0.070
 swords-0.068
 loses-0.067
 Stri-0.067
Token as
Token ID355
Feature activation-6.75e-03

Pos logit contributions
 year+0.030
ini+0.020
urious+0.020
 world+0.020
ally+0.019
Neg logit contributions
 pret-0.040
 swords-0.039
ips-0.038
ses-0.038
ids-0.037
Token a
Token ID257
Feature activation+2.59e-03

Pos logit contributions
 year+0.014
 world+0.010
um+0.010
time+0.010
ini+0.010
Neg logit contributions
 pret-0.010
ses-0.010
ips-0.010
 loses-0.010
 swords-0.009
Token reward
Token ID6721
Feature activation+1.22e-03

Pos logit contributions
 pret+0.004
 swords+0.004
ses+0.004
ips+0.004
 scar+0.004
Neg logit contributions
 year-0.005
ini-0.004
urious-0.004
ally-0.004
 world-0.004
Token.
Token ID13
Feature activation+5.23e-03

Pos logit contributions
 pret+0.003
 swords+0.002
ips+0.002
ses+0.002
 loses+0.002
Neg logit contributions
 year-0.002
ini-0.001
 world-0.001
um-0.001
ally-0.001
Token new
Token ID649
Feature activation-7.08e-03

Pos logit contributions
 pret+0.000
ips+0.000
 swords+0.000
ses+0.000
 loses+0.000
Neg logit contributions
 year-0.000
ini-0.000
 world-0.000
um-0.000
ally-0.000
Token toy
Token ID13373
Feature activation+4.40e-03

Pos logit contributions
 year+0.012
ini+0.009
ally+0.009
um+0.009
Mom+0.009
Neg logit contributions
 pret-0.013
 swords-0.012
ips-0.012
ses-0.012
 loses-0.012
Token or
Token ID393
Feature activation-5.01e-03

Pos logit contributions
ips+0.008
 pret+0.008
ses+0.008
 swords+0.008
 Stri+0.008
Neg logit contributions
 year-0.007
 world-0.005
time-0.004
um-0.004
ide-0.004
Token a
Token ID257
Feature activation-2.06e-03

Pos logit contributions
 year+0.010
 world+0.007
Mom+0.007
um+0.007
ini+0.007
Neg logit contributions
ips-0.008
 pret-0.007
ses-0.007
ids-0.007
 swords-0.007
Token y
Token ID331
Feature activation-2.06e-02

Pos logit contributions
 year+0.004
ini+0.003
ally+0.003
um+0.003
urious+0.003
Neg logit contributions
 pret-0.003
 swords-0.003
ips-0.003
ses-0.003
 loses-0.003
Tokenummy
Token ID13513
Feature activation-4.81e-02

Pos logit contributions
 year+0.024
spe+0.016
Lu+0.015
Mom+0.015
ini+0.014
Neg logit contributions
ips-0.045
 swords-0.044
 enough-0.043
 pret-0.043
 loses-0.042
Token treat
Token ID2190
Feature activation-2.04e-02

Pos logit contributions
 year+0.096
ini+0.073
ally+0.072
urious+0.072
um+0.071
Neg logit contributions
 pret-0.076
 swords-0.072
ses-0.070
ips-0.069
 loses-0.067
Token.
Token ID13
Feature activation-6.39e-03

Pos logit contributions
 year+0.030
 world+0.018
um+0.018
ini+0.018
Mom+0.018
Neg logit contributions
 pret-0.043
ips-0.042
 swords-0.042
ses-0.041
 loses-0.040
Token Anna
Token ID11735
Feature activation+1.06e-03

Pos logit contributions
 year+0.011
 world+0.008
Mom+0.008
um+0.007
ally+0.007
Neg logit contributions
 pret-0.012
 swords-0.011
ips-0.011
ses-0.011
 loses-0.011
Token and
Token ID290
Feature activation+2.20e-03

Pos logit contributions
drivers+0.002
 swords+0.002
 Stri+0.002
 training+0.002
rob+0.002
Neg logit contributions
 year-0.001
 world-0.001
 was-0.001
um-0.001
Mom-0.001
Token Ben
Token ID3932
Feature activation+2.55e-03

Pos logit contributions
drivers+0.004
rob+0.004
cher+0.003
icians+0.003
 pret+0.003
Neg logit contributions
 year-0.004
Mom-0.003
um-0.003
 Mom-0.003
 end-0.003
Token them
Token ID606
Feature activation+6.57e-03

Pos logit contributions
 year+0.012
ini+0.010
urious+0.010
ally+0.009
spe+0.009
Neg logit contributions
 pret-0.006
 swords-0.006
ips-0.005
ps-0.005
 scar-0.005
Token and
Token ID290
Feature activation-1.02e-02

Pos logit contributions
 pret+0.012
 swords+0.012
ips+0.011
 loses+0.011
 scar+0.011
Neg logit contributions
 year-0.011
ini-0.008
ally-0.008
 world-0.008
urious-0.008
Token make
Token ID787
Feature activation-5.45e-03

Pos logit contributions
 year+0.022
ini+0.016
ally+0.016
um+0.016
urious+0.016
Neg logit contributions
 pret-0.015
 swords-0.015
ses-0.014
 loses-0.014
ips-0.014
Token a
Token ID257
Feature activation-5.15e-03

Pos logit contributions
 year+0.012
ini+0.009
ally+0.009
spe+0.009
urious+0.009
Neg logit contributions
 pret-0.007
 swords-0.007
 superheroes-0.007
ips-0.006
 scar-0.006
Token y
Token ID331
Feature activation-2.02e-02

Pos logit contributions
 year+0.010
ally+0.008
ide+0.008
urious+0.008
Lu+0.008
Neg logit contributions
 swords-0.008
 scar-0.008
 pret-0.008
 superheroes-0.007
 padd-0.007
Tokenummy
Token ID13513
Feature activation-4.79e-02

Pos logit contributions
 year+0.024
Lu+0.018
spe+0.018
Me+0.017
 corner+0.017
Neg logit contributions
ips-0.040
 swords-0.038
 enough-0.037
 padd-0.036
etchup-0.036
Token salad
Token ID20698
Feature activation-1.43e-02

Pos logit contributions
 year+0.102
urious+0.086
ally+0.085
spe+0.084
Mom+0.084
Neg logit contributions
 pret-0.065
 swords-0.060
 scar-0.056
 superheroes-0.053
 ro-0.052
Token for
Token ID329
Feature activation-5.34e-03

Pos logit contributions
 year+0.025
ini+0.017
ally+0.017
urious+0.017
 world+0.017
Neg logit contributions
 pret-0.026
 swords-0.026
ips-0.025
ses-0.024
 loses-0.024
Token lunch
Token ID9965
Feature activation-1.49e-02

Pos logit contributions
 year+0.011
ini+0.008
 world+0.008
urious+0.008
ally+0.008
Neg logit contributions
 pret-0.008
 swords-0.008
ips-0.008
 superheroes-0.008
ses-0.008
Token!"
Token ID2474
Feature activation-7.06e-03

Pos logit contributions
 year+0.025
ini+0.020
urious+0.019
ally+0.019
heart+0.018
Neg logit contributions
 swords-0.026
 pret-0.026
ips-0.025
 superheroes-0.024
 loses-0.024
Token\n
Token ID198
Feature activation-4.51e-03

Pos logit contributions
 year+0.012
ini+0.009
rated+0.009
urious+0.008
heart+0.008
Neg logit contributions
ips-0.013
 pret-0.013
 Robo-0.013
 Stri-0.013
ids-0.013
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
 year+0.001
ini+0.000
ally+0.000
 world+0.000
um+0.000
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ses-0.001
 loses-0.001
Token it
Token ID340
Feature activation-1.39e-02

Pos logit contributions
 year+0.027
ini+0.020
 world+0.019
um+0.019
ally+0.019
Neg logit contributions
 pret-0.022
 swords-0.021
ips-0.021
ses-0.020
 loses-0.020
Token looked
Token ID3114
Feature activation-1.16e-02

Pos logit contributions
 year+0.023
ini+0.016
 world+0.016
um+0.016
ally+0.016
Neg logit contributions
 pret-0.026
 swords-0.025
ips-0.025
 loses-0.024
ses-0.024
Token so
Token ID523
Feature activation-1.44e-02

Pos logit contributions
 year+0.024
ini+0.021
urious+0.020
ally+0.020
ide+0.020
Neg logit contributions
 swords-0.016
 pret-0.016
 enough-0.015
 scar-0.015
 superheroes-0.014
Token y
Token ID331
Feature activation-2.84e-02

Pos logit contributions
 year+0.029
ini+0.023
urious+0.023
 world+0.023
ally+0.023
Neg logit contributions
 pret-0.019
 swords-0.019
 loses-0.018
 scar-0.018
 superheroes-0.018
Tokenummy
Token ID13513
Feature activation-4.77e-02

Pos logit contributions
 year+0.034
spe+0.021
 world+0.019
Mom+0.019
 corner+0.018
Neg logit contributions
ips-0.064
 swords-0.060
 loses-0.059
etchup-0.059
 pret-0.058
Token!
Token ID0
Feature activation-8.81e-03

Pos logit contributions
 year+0.072
ini+0.047
 world+0.045
urious+0.045
ally+0.045
Neg logit contributions
 pret-0.097
 swords-0.095
ips-0.094
ses-0.092
 loses-0.091
Token The
Token ID383
Feature activation-1.48e-02

Pos logit contributions
 year+0.017
ini+0.013
heart+0.012
ide+0.012
ie+0.012
Neg logit contributions
 pret-0.014
 Stri-0.014
ses-0.014
ips-0.014
 swords-0.013
Token wild
Token ID4295
Feature activation+8.83e-03

Pos logit contributions
 year+0.035
ide+0.029
urious+0.029
ally+0.028
heart+0.028
Neg logit contributions
 pret-0.016
 swords-0.016
 scar-0.015
 badges-0.014
ips-0.014
Token pig
Token ID12967
Feature activation-1.18e-02

Pos logit contributions
 pret+0.010
 swords+0.010
ips+0.010
ses+0.010
 Stri+0.009
Neg logit contributions
 year-0.021
ini-0.016
ally-0.016
urious-0.016
um-0.016
Token ate
Token ID15063
Feature activation-3.76e-04

Pos logit contributions
 year+0.018
 world+0.012
um+0.012
ini+0.012
Mom+0.012
Neg logit contributions
 pret-0.023
 swords-0.022
ses-0.021
 Stri-0.021
 loses-0.021
Tokenled
Token ID992
Feature activation-9.63e-03

Pos logit contributions
 year+0.033
ini+0.024
ally+0.024
 world+0.024
Mom+0.024
Neg logit contributions
 pret-0.023
ips-0.023
 swords-0.022
ses-0.021
 loses-0.021
Token up
Token ID510
Feature activation+4.98e-03

Pos logit contributions
 year+0.017
ini+0.014
ally+0.013
urious+0.012
Mom+0.012
Neg logit contributions
 pret-0.017
 swords-0.016
ips-0.016
 scar-0.015
 loses-0.015
Tokenâ
Token ID22940
Feature activation-1.76e-02

Pos logit contributions
 pret+0.008
 swords+0.008
ips+0.008
ses+0.008
 loses+0.008
Neg logit contributions
 year-0.009
ini-0.007
 world-0.007
ally-0.007
um-0.007
Token
Token ID26391
Feature activation-8.08e-03

Pos logit contributions
 year+0.031
ini+0.021
 world+0.021
um+0.021
ally+0.021
Neg logit contributions
 pret-0.032
ips-0.031
 swords-0.030
ses-0.030
 loses-0.029
Token
Token ID126
Feature activation-1.54e-02

Pos logit contributions
 year+0.016
ini+0.012
um+0.012
heart+0.012
ally+0.012
Neg logit contributions
 pret-0.012
ips-0.012
 swords-0.012
 loses-0.012
ses-0.012
Token
Token ID99
Feature activation-4.77e-02

Pos logit contributions
 year+0.028
ini+0.020
 world+0.020
um+0.020
ally+0.020
Neg logit contributions
 pret-0.027
 swords-0.026
ips-0.026
ses-0.025
 loses-0.025
Token and
Token ID290
Feature activation-3.99e-03

Pos logit contributions
 year+0.091
ini+0.066
ally+0.064
 world+0.064
heart+0.063
Neg logit contributions
 pret-0.077
ips-0.077
 swords-0.075
ses-0.073
ids-0.072
Token before
Token ID878
Feature activation-1.66e-02

Pos logit contributions
 year+0.008
ini+0.006
 world+0.006
um+0.006
ally+0.006
Neg logit contributions
 pret-0.006
 swords-0.006
ips-0.006
ses-0.005
 loses-0.005
Token she
Token ID673
Feature activation-5.76e-03

Pos logit contributions
 year+0.032
time+0.022
 world+0.022
Mom+0.021
um+0.021
Neg logit contributions
ses-0.028
 swords-0.028
 pret-0.028
ips-0.027
ids-0.027
Token knew
Token ID2993
Feature activation+4.37e-04

Pos logit contributions
 year+0.010
ini+0.008
 world+0.007
ally+0.007
ide+0.007
Neg logit contributions
 pret-0.010
 swords-0.010
 loses-0.010
ses-0.009
ips-0.009
Token it
Token ID340
Feature activation-1.57e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
ses+0.001
 loses+0.001
Neg logit contributions
 year-0.001
 world-0.001
um-0.001
ini-0.001
ally-0.001
Token a
Token ID257
Feature activation-6.23e-03

Pos logit contributions
 pret+0.019
 swords+0.018
ips+0.018
 loses+0.017
ses+0.017
Neg logit contributions
 year-0.024
ini-0.018
ally-0.018
um-0.018
urious-0.018
Token horse
Token ID8223
Feature activation-7.65e-04

Pos logit contributions
 year+0.012
ally+0.010
um+0.010
ini+0.010
urious+0.010
Neg logit contributions
 swords-0.009
 pret-0.009
 scar-0.009
 superheroes-0.008
 training-0.008
Tokenâ
Token ID22940
Feature activation-1.93e-02

Pos logit contributions
 year+0.001
ini+0.001
ally+0.001
um+0.001
urious+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
 loses-0.001
ses-0.001
Token
Token ID26391
Feature activation-1.35e-02

Pos logit contributions
 year+0.035
ini+0.024
 world+0.024
um+0.024
ally+0.024
Neg logit contributions
 pret-0.034
ips-0.033
 swords-0.032
ses-0.032
ids-0.031
Token
Token ID126
Feature activation-1.95e-02

Pos logit contributions
 year+0.026
ini+0.020
um+0.019
heart+0.019
ally+0.019
Neg logit contributions
 pret-0.021
ips-0.020
 swords-0.020
 loses-0.020
ids-0.020
Token
Token ID99
Feature activation-4.76e-02

Pos logit contributions
 year+0.037
ini+0.027
heart+0.026
Mom+0.026
ally+0.026
Neg logit contributions
 pret-0.033
 swords-0.031
ips-0.031
ses-0.030
ids-0.030
Token"
Token ID1
Feature activation+3.90e-03

Pos logit contributions
 year+0.097
ini+0.072
ally+0.071
ie+0.070
heart+0.070
Neg logit contributions
 pret-0.070
ips-0.070
 swords-0.068
 Stri-0.066
ids-0.065
Token\n
Token ID198
Feature activation-1.38e-02

Pos logit contributions
 pret+0.006
ips+0.006
 swords+0.006
 Stri+0.006
 loses+0.005
Neg logit contributions
 year-0.008
ini-0.006
ally-0.006
urious-0.006
 world-0.005
Token\n
Token ID198
Feature activation-6.77e-03

Pos logit contributions
 year+0.019
rated+0.015
ini+0.014
heart+0.014
urious+0.014
Neg logit contributions
 Stri-0.028
ips-0.027
 pret-0.026
rob-0.026
 Robo-0.026
TokenJohn
Token ID7554
Feature activation-6.95e-03

Pos logit contributions
 year+0.010
ini+0.007
rated+0.007
 world+0.007
heart+0.007
Neg logit contributions
 Stri-0.013
 pret-0.013
ips-0.013
 swords-0.012
Aud-0.012
Token was
Token ID373
Feature activation+2.46e-03

Pos logit contributions
 year+0.012
ini+0.008
 world+0.008
um+0.008
ally+0.008
Neg logit contributions
 pret-0.013
 swords-0.012
ips-0.012
ses-0.012
 loses-0.012
Token some
Token ID617
Feature activation+1.30e-02

Pos logit contributions
 pret+0.000
 swords+0.000
ips+0.000
ses+0.000
 loses+0.000
Neg logit contributions
 year-0.000
ini-0.000
ally-0.000
um-0.000
urious-0.000
Token trouble
Token ID5876
Feature activation+1.85e-02

Pos logit contributions
 swords+0.019
 pret+0.019
 superheroes+0.018
 training+0.018
 loses+0.018
Neg logit contributions
 year-0.025
ally-0.021
ini-0.021
ide-0.020
urious-0.020
Tokenâ
Token ID22940
Feature activation-1.66e-02

Pos logit contributions
 pret+0.036
 swords+0.035
ips+0.035
 loses+0.034
ses+0.034
Neg logit contributions
 year-0.029
ini-0.020
ally-0.020
um-0.020
urious-0.019
Token
Token ID26391
Feature activation-5.86e-03

Pos logit contributions
 year+0.030
ini+0.021
 world+0.021
um+0.021
ally+0.021
Neg logit contributions
 pret-0.028
ips-0.028
 swords-0.027
ses-0.027
ids-0.026
Token
Token ID126
Feature activation-1.77e-02

Pos logit contributions
 year+0.011
ini+0.008
heart+0.008
um+0.008
ie+0.008
Neg logit contributions
 pret-0.009
 swords-0.009
 loses-0.009
ips-0.009
ids-0.009
Token
Token ID99
Feature activation-4.75e-02

Pos logit contributions
 year+0.034
ini+0.025
Mom+0.024
heart+0.024
ally+0.024
Neg logit contributions
 pret-0.029
ips-0.028
 swords-0.027
ses-0.027
ids-0.027
Token May
Token ID1737
Feature activation-5.49e-03

Pos logit contributions
 year+0.095
ini+0.070
ie+0.069
ally+0.069
heart+0.069
Neg logit contributions
ips-0.072
 pret-0.071
 swords-0.069
ids-0.068
 loses-0.067
Token we
Token ID356
Feature activation-4.69e-03

Pos logit contributions
 year+0.009
ini+0.006
ally+0.006
um+0.006
 world+0.006
Neg logit contributions
 pret-0.011
 swords-0.010
ips-0.010
ses-0.010
 loses-0.010
Token come
Token ID1282
Feature activation+5.08e-03

Pos logit contributions
 year+0.009
ini+0.007
um+0.007
 world+0.007
ally+0.007
Neg logit contributions
 pret-0.007
 swords-0.007
ips-0.007
 loses-0.007
ses-0.007
Token in
Token ID287
Feature activation-1.39e-02

Pos logit contributions
 pret+0.007
 swords+0.007
ips+0.007
 training+0.006
ids+0.006
Neg logit contributions
 year-0.011
ally-0.008
ini-0.008
urious-0.008
um-0.008
Token?"
Token ID1701
Feature activation+1.14e-02

Pos logit contributions
 year+0.024
ini+0.018
ally+0.017
um+0.017
urious+0.017
Neg logit contributions
 pret-0.025
 swords-0.024
ips-0.024
 loses-0.023
 superheroes-0.023
Token man
Token ID582
Feature activation-5.19e-03

Pos logit contributions
 year+0.002
ide+0.002
urious+0.002
ini+0.002
ally+0.002
Neg logit contributions
 pret-0.002
 swords-0.002
ses-0.002
 buck-0.002
 whist-0.002
Token handed
Token ID10158
Feature activation-9.05e-04

Pos logit contributions
 year+0.009
ini+0.007
 world+0.006
um+0.006
ally+0.006
Neg logit contributions
 pret-0.009
 swords-0.009
ips-0.009
 loses-0.008
ses-0.008
Token Tim
Token ID5045
Feature activation+4.48e-04

Pos logit contributions
 year+0.002
ini+0.002
 world+0.002
urious+0.002
ally+0.002
Neg logit contributions
 swords-0.001
 pret-0.001
ips-0.001
 Stri-0.001
ses-0.001
Token a
Token ID257
Feature activation-3.41e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
 Stri+0.001
ses+0.001
Neg logit contributions
 year-0.001
ini-0.001
 world-0.001
um-0.001
ally-0.001
Token y
Token ID331
Feature activation-2.22e-02

Pos logit contributions
 year+0.006
ini+0.005
 world+0.004
um+0.004
ally+0.004
Neg logit contributions
 pret-0.006
 swords-0.006
ips-0.006
ses-0.005
 loses-0.005
Tokenummy
Token ID13513
Feature activation-4.75e-02

Pos logit contributions
 year+0.025
spe+0.015
Mom+0.015
Lu+0.015
 world+0.014
Neg logit contributions
ips-0.050
 swords-0.050
ses-0.048
 pret-0.048
etchup-0.047
Token cookie
Token ID19751
Feature activation-1.37e-02

Pos logit contributions
 year+0.100
ini+0.075
 world+0.074
ally+0.073
um+0.073
Neg logit contributions
 pret-0.069
 swords-0.066
ips-0.065
ses-0.064
 loses-0.062
Token and
Token ID290
Feature activation-5.29e-03

Pos logit contributions
 year+0.025
ini+0.018
urious+0.018
 world+0.017
ally+0.017
Neg logit contributions
 pret-0.024
 swords-0.023
ips-0.023
ses-0.022
ids-0.022
Token thanked
Token ID26280
Feature activation+1.25e-02

Pos logit contributions
 year+0.012
 world+0.009
ini+0.009
urious+0.009
ide+0.009
Neg logit contributions
 pret-0.007
 swords-0.007
 Stri-0.007
 scar-0.007
 loses-0.007
Token him
Token ID683
Feature activation+3.81e-03

Pos logit contributions
 Stri+0.013
 pret+0.012
 swords+0.012
 Robo+0.012
 padd+0.010
Neg logit contributions
 year-0.030
urious-0.026
ini-0.026
rated-0.025
 world-0.025
Token.
Token ID13
Feature activation+7.90e-03

Pos logit contributions
 pret+0.006
 swords+0.005
ids+0.005
ips+0.005
 Stri+0.005
Neg logit contributions
 year-0.008
ini-0.006
ally-0.006
urious-0.006
Mom-0.006
Token round
Token ID2835
Feature activation-9.43e-03

Pos logit contributions
 pret+0.011
 swords+0.011
ses+0.010
 scar+0.010
 loses+0.010
Neg logit contributions
 year-0.013
urious-0.010
ini-0.010
um-0.010
ally-0.009
Token and
Token ID290
Feature activation+1.25e-02

Pos logit contributions
 year+0.019
ini+0.014
Mom+0.014
urious+0.013
ally+0.013
Neg logit contributions
 pret-0.015
 swords-0.015
ips-0.014
ses-0.014
 loses-0.013
Token looked
Token ID3114
Feature activation-9.27e-03

Pos logit contributions
 pret+0.018
 swords+0.017
ses+0.016
 Stri+0.016
 scar+0.016
Neg logit contributions
 year-0.028
urious-0.021
ini-0.021
heart-0.021
ide-0.020
Token so
Token ID523
Feature activation-1.19e-02

Pos logit contributions
 year+0.019
ini+0.016
ally+0.015
Mom+0.015
ide+0.015
Neg logit contributions
 pret-0.013
 swords-0.013
 scar-0.012
 loses-0.012
 training-0.012
Token y
Token ID331
Feature activation-2.39e-02

Pos logit contributions
 year+0.026
ini+0.020
urious+0.019
 world+0.019
 doorstep+0.019
Neg logit contributions
 swords-0.017
 pret-0.016
 loses-0.015
 scar-0.015
ses-0.015
Tokenummy
Token ID13513
Feature activation-4.74e-02

Pos logit contributions
 year+0.029
spe+0.016
 world+0.016
ini+0.015
Mom+0.015
Neg logit contributions
ips-0.055
 swords-0.053
 pret-0.052
ses-0.052
 loses-0.052
Token.
Token ID13
Feature activation+3.07e-03

Pos logit contributions
 year+0.069
ini+0.043
 world+0.043
um+0.042
ally+0.042
Neg logit contributions
 pret-0.099
 swords-0.097
ips-0.097
ses-0.095
 loses-0.093
Token The
Token ID383
Feature activation-1.20e-02

Pos logit contributions
 pret+0.005
 swords+0.005
ses+0.005
 Stri+0.005
ips+0.005
Neg logit contributions
 year-0.006
ini-0.004
ie-0.004
heart-0.004
 world-0.004
Token frog
Token ID21264
Feature activation-4.80e-03

Pos logit contributions
 year+0.027
spe+0.024
ide+0.022
heart+0.022
Lu+0.022
Neg logit contributions
ses-0.014
 pret-0.014
 swords-0.014
 mustard-0.014
etchup-0.013
Token carefully
Token ID7773
Feature activation+3.79e-03

Pos logit contributions
 year+0.008
ini+0.006
 world+0.005
um+0.005
heart+0.005
Neg logit contributions
 pret-0.009
 swords-0.009
 loses-0.008
 padd-0.008
 tries-0.008
Token took
Token ID1718
Feature activation+1.86e-03

Pos logit contributions
ips+0.007
ses+0.007
 pret+0.006
 Robo+0.006
 swords+0.006
Neg logit contributions
 year-0.007
ally-0.005
um-0.005
ie-0.004
ide-0.004
Token made
Token ID925
Feature activation-1.40e-02

Pos logit contributions
 year+0.020
ini+0.015
 world+0.014
um+0.014
ally+0.014
Neg logit contributions
 pret-0.015
ips-0.015
 swords-0.015
ses-0.015
 loses-0.014
Token a
Token ID257
Feature activation-6.13e-03

Pos logit contributions
 year+0.032
ini+0.025
 world+0.024
ally+0.024
um+0.024
Neg logit contributions
 pret-0.018
 swords-0.017
ips-0.017
ses-0.016
 Stri-0.016
Token big
Token ID1263
Feature activation-1.76e-02

Pos logit contributions
 year+0.011
ini+0.009
ally+0.009
ide+0.009
urious+0.009
Neg logit contributions
 pret-0.010
 scar-0.010
 swords-0.010
 loses-0.009
 superheroes-0.009
Token,
Token ID11
Feature activation-4.29e-03

Pos logit contributions
 year+0.033
ini+0.025
urious+0.024
ally+0.023
Mom+0.023
Neg logit contributions
 pret-0.030
 swords-0.028
 scar-0.028
ses-0.027
 loses-0.027
Token y
Token ID331
Feature activation-2.27e-02

Pos logit contributions
 year+0.008
ini+0.006
ally+0.006
urious+0.006
ide+0.006
Neg logit contributions
 pret-0.007
 swords-0.007
ses-0.006
 scar-0.006
 loses-0.006
Tokenummy
Token ID13513
Feature activation-4.73e-02

Pos logit contributions
 year+0.026
spe+0.016
 world+0.015
ini+0.015
Mom+0.014
Neg logit contributions
ips-0.052
 swords-0.050
 pret-0.048
 loses-0.048
ses-0.048
Token w
Token ID266
Feature activation-7.42e-03

Pos logit contributions
 year+0.102
urious+0.081
ini+0.081
ally+0.079
Mom+0.079
Neg logit contributions
 pret-0.066
 swords-0.061
 scar-0.058
ses-0.056
ips-0.056
Tokenaffle
Token ID30697
Feature activation-1.14e-02

Pos logit contributions
 year+0.016
ini+0.012
 world+0.012
ally+0.012
um+0.012
Neg logit contributions
 pret-0.010
 swords-0.010
ips-0.010
ses-0.009
 loses-0.009
Token.
Token ID13
Feature activation-8.44e-03

Pos logit contributions
 year+0.019
ini+0.013
 world+0.013
um+0.013
ally+0.013
Neg logit contributions
 pret-0.021
 swords-0.021
ips-0.021
ses-0.020
 loses-0.020
Token Mia
Token ID32189
Feature activation-2.52e-03

Pos logit contributions
 year+0.019
ini+0.016
rated+0.015
spe+0.015
 world+0.015
Neg logit contributions
ips-0.011
 pret-0.011
ses-0.011
ids-0.010
 Stri-0.010
Token was
Token ID373
Feature activation-4.08e-03

Pos logit contributions
 year+0.004
ini+0.003
 world+0.003
um+0.002
ally+0.002
Neg logit contributions
 pret-0.005
 swords-0.005
ips-0.005
 loses-0.005
ses-0.005
Token Pat
Token ID3208
Feature activation+1.52e-04

Pos logit contributions
 year+0.009
 world+0.007
ini+0.007
ally+0.007
um+0.007
Neg logit contributions
 pret-0.007
ips-0.007
 swords-0.007
ses-0.007
 loses-0.006
Tokenches
Token ID2052
Feature activation-7.76e-03

Pos logit contributions
 pret+0.000
 swords+0.000
ips+0.000
 loses+0.000
 Stri+0.000
Neg logit contributions
 year-0.000
ini-0.000
 world-0.000
um-0.000
Mom-0.000
Token saw
Token ID2497
Feature activation-1.33e-02

Pos logit contributions
 year+0.014
ini+0.011
um+0.010
urious+0.010
 world+0.010
Neg logit contributions
 pret-0.013
 loses-0.012
 swords-0.012
ips-0.012
ked-0.012
Token a
Token ID257
Feature activation-3.00e-03

Pos logit contributions
 year+0.032
ini+0.025
um+0.025
 world+0.025
ally+0.025
Neg logit contributions
 pret-0.015
 swords-0.015
ips-0.014
ses-0.014
 Stri-0.014
Token y
Token ID331
Feature activation-1.84e-02

Pos logit contributions
 year+0.005
urious+0.004
um+0.004
ide+0.004
ally+0.004
Neg logit contributions
 pret-0.005
 swords-0.005
 scar-0.005
 superheroes-0.005
ses-0.005
Tokenummy
Token ID13513
Feature activation-4.72e-02

Pos logit contributions
 year+0.020
spe+0.012
Mom+0.012
Lu+0.011
rated+0.011
Neg logit contributions
ips-0.042
 swords-0.041
ses-0.040
 pret-0.040
 loses-0.039
Token piece
Token ID3704
Feature activation-3.82e-03

Pos logit contributions
 year+0.100
urious+0.084
spe+0.083
Mom+0.081
ini+0.080
Neg logit contributions
 pret-0.064
 swords-0.058
 scar-0.057
ses-0.056
 whist-0.053
Token of
Token ID286
Feature activation+6.50e-04

Pos logit contributions
 year+0.006
urious+0.004
ini+0.004
ally+0.004
 world+0.004
Neg logit contributions
 swords-0.007
 pret-0.007
ips-0.007
ses-0.007
ids-0.007
Token paper
Token ID3348
Feature activation+4.18e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ses+0.001
 scar+0.001
ips+0.001
Neg logit contributions
 year-0.002
ini-0.001
urious-0.001
ally-0.001
um-0.001
Token blowing
Token ID19280
Feature activation+2.00e-03

Pos logit contributions
 pret+0.009
 swords+0.008
ips+0.008
ses+0.008
 loses+0.008
Neg logit contributions
 year-0.006
ini-0.004
 world-0.004
ally-0.004
um-0.004
Token across
Token ID1973
Feature activation+1.18e-03

Pos logit contributions
 swords+0.003
 pret+0.003
 scar+0.002
ips+0.002
ps+0.002
Neg logit contributions
 year-0.004
ini-0.004
ally-0.003
urious-0.003
um-0.003
Token,
Token ID11
Feature activation+4.76e-03

Pos logit contributions
ses+0.000
 pret+0.000
ips+0.000
 loses+0.000
 swords+0.000
Neg logit contributions
 year-0.000
 world-0.000
ide-0.000
spe-0.000
time-0.000
Token and
Token ID290
Feature activation+9.22e-03

Pos logit contributions
ips+0.009
ses+0.009
 pret+0.009
etchup+0.008
rob+0.008
Neg logit contributions
 year-0.008
 world-0.005
ide-0.005
spe-0.005
time-0.005
Token made
Token ID925
Feature activation-1.28e-03

Pos logit contributions
ips+0.014
 pret+0.014
ses+0.014
 swords+0.013
 loses+0.013
Neg logit contributions
 year-0.019
 world-0.013
Mom-0.013
ini-0.013
ally-0.013
Token noises
Token ID26782
Feature activation-2.33e-03

Pos logit contributions
 year+0.003
ini+0.002
ally+0.002
um+0.002
 world+0.002
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 loses-0.002
Token like
Token ID588
Feature activation-1.78e-04

Pos logit contributions
 year+0.004
 world+0.002
ide+0.002
ini+0.002
um+0.002
Neg logit contributions
 pret-0.005
ips-0.005
ses-0.005
 swords-0.004
ids-0.004
Token hon
Token ID3032
Feature activation+4.65e-02

Pos logit contributions
 year+0.000
 world+0.000
spe+0.000
time+0.000
ide+0.000
Neg logit contributions
ips-0.000
ses-0.000
 pret-0.000
ids-0.000
 loses-0.000
Tokenk
Token ID74
Feature activation+1.06e-02

Pos logit contributions
 pret+0.060
ips+0.060
ses+0.060
 elbow+0.057
 swords+0.056
Neg logit contributions
 year-0.101
 world-0.077
ide-0.076
urious-0.076
ini-0.075
Token and
Token ID290
Feature activation+1.67e-02

Pos logit contributions
 pret+0.021
ses+0.020
ips+0.020
 swords+0.020
 loses+0.020
Neg logit contributions
 year-0.017
 world-0.011
um-0.011
ini-0.011
ide-0.010
Token be
Token ID307
Feature activation+1.59e-02

Pos logit contributions
ips+0.022
ses+0.020
 pret+0.020
ids+0.020
etchup+0.019
Neg logit contributions
 year-0.036
 world-0.028
spe-0.027
Mom-0.027
 sound-0.026
Tokenep
Token ID538
Feature activation+1.30e-02

Pos logit contributions
ses+0.025
 pret+0.025
ips+0.024
 loses+0.023
ids+0.023
Neg logit contributions
 year-0.032
 world-0.023
ini-0.022
Mom-0.022
spe-0.022
Token.
Token ID13
Feature activation-6.58e-03

Pos logit contributions
 pret+0.026
 swords+0.026
ips+0.026
ses+0.025
 loses+0.025
Neg logit contributions
 year-0.020
ini-0.013
 world-0.013
um-0.012
ally-0.012
Token around
Token ID1088
Feature activation+4.37e-03

Pos logit contributions
 year+0.028
ini+0.022
urious+0.021
ally+0.021
spe+0.021
Neg logit contributions
 swords-0.021
 pret-0.021
ips-0.021
 scar-0.019
 padd-0.019
Token.
Token ID13
Feature activation-8.34e-04

Pos logit contributions
 pret+0.009
 swords+0.008
ips+0.008
 superheroes+0.008
ses+0.008
Neg logit contributions
 year-0.007
ini-0.005
urious-0.005
ally-0.005
um-0.005
Token He
Token ID679
Feature activation-7.44e-04

Pos logit contributions
 year+0.002
ie+0.001
heart+0.001
ini+0.001
urious+0.001
Neg logit contributions
 pret-0.001
 Stri-0.001
ses-0.001
 Robo-0.001
ips-0.001
Token built
Token ID3170
Feature activation-5.27e-03

Pos logit contributions
 year+0.001
 world+0.001
ini+0.001
ally+0.001
Mom+0.001
Neg logit contributions
ips-0.001
 pret-0.001
ses-0.001
 swords-0.001
ids-0.001
Token sand
Token ID6450
Feature activation+1.14e-02

Pos logit contributions
 year+0.011
 world+0.008
um+0.008
Mom+0.008
ini+0.008
Neg logit contributions
ips-0.008
 pret-0.008
ses-0.008
 loses-0.007
ids-0.007
Tokencast
Token ID2701
Feature activation+4.42e-02

Pos logit contributions
ses+0.020
 pret+0.020
ips+0.019
ids+0.019
 loses+0.019
Neg logit contributions
 year-0.020
ide-0.014
 world-0.014
um-0.014
ie-0.014
Tokenles
Token ID829
Feature activation+1.25e-02

Pos logit contributions
ses+0.084
 pret+0.079
ips+0.077
 swords+0.075
 superheroes+0.074
Neg logit contributions
 year-0.080
ie-0.056
ide-0.054
um-0.052
ally-0.052
Token and
Token ID290
Feature activation+3.22e-03

Pos logit contributions
 pret+0.025
ses+0.024
ips+0.024
 swords+0.023
 loses+0.023
Neg logit contributions
 year-0.020
 world-0.013
ide-0.013
um-0.012
time-0.012
Token collected
Token ID7723
Feature activation-8.05e-03

Pos logit contributions
ips+0.004
 pret+0.004
ses+0.004
 swords+0.004
ids+0.004
Neg logit contributions
 year-0.007
ini-0.005
 world-0.005
ally-0.005
Mom-0.005
Token seas
Token ID21547
Feature activation+1.66e-02

Pos logit contributions
 year+0.018
 world+0.013
ini+0.013
um+0.013
ally+0.013
Neg logit contributions
 pret-0.011
ips-0.011
ses-0.011
 swords-0.010
 loses-0.010
Tokenhell
Token ID12758
Feature activation+1.23e-02

Pos logit contributions
 pret+0.034
 swords+0.033
ips+0.033
ses+0.032
 loses+0.032
Neg logit contributions
 year-0.025
ini-0.016
 world-0.016
Mom-0.016
urious-0.016
Token touch
Token ID3638
Feature activation+2.19e-03

Pos logit contributions
 year+0.012
ini+0.008
 world+0.008
um+0.008
ally+0.008
Neg logit contributions
 pret-0.009
ips-0.009
 swords-0.009
ses-0.009
 loses-0.009
Token them
Token ID606
Feature activation+8.37e-03

Pos logit contributions
 pret+0.003
 swords+0.003
ips+0.003
 loses+0.003
ses+0.003
Neg logit contributions
 year-0.005
ini-0.004
ally-0.003
um-0.003
urious-0.003
Token.
Token ID13
Feature activation-3.76e-03

Pos logit contributions
 pret+0.019
 swords+0.019
ips+0.019
ses+0.018
 loses+0.018
Neg logit contributions
 year-0.011
 world-0.006
ini-0.006
um-0.006
ally-0.006
Token He
Token ID679
Feature activation+1.95e-03

Pos logit contributions
 year+0.007
ini+0.005
ie+0.005
ally+0.005
heart+0.005
Neg logit contributions
 pret-0.006
 swords-0.006
 Stri-0.006
ips-0.006
 loses-0.006
Token craw
Token ID27784
Feature activation+1.27e-04

Pos logit contributions
ips+0.003
ids+0.002
 swords+0.002
drivers+0.002
cher+0.002
Neg logit contributions
 year-0.004
urious-0.003
um-0.003
 world-0.003
Mom-0.003
Tokenls
Token ID7278
Feature activation+4.06e-02

Pos logit contributions
ses+0.000
 pret+0.000
drivers+0.000
 training+0.000
 superheroes+0.000
Neg logit contributions
 year-0.000
ie-0.000
um-0.000
ide-0.000
 world-0.000
Token to
Token ID284
Feature activation-1.81e-03

Pos logit contributions
rob+0.080
ses+0.080
 pret+0.079
drivers+0.079
 elbow+0.078
Neg logit contributions
 year-0.058
 corner-0.039
ide-0.038
 was-0.038
 world-0.037
Token the
Token ID262
Feature activation+7.06e-04

Pos logit contributions
 year+0.004
ini+0.003
um+0.003
 world+0.003
urious+0.003
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 Stri-0.002
Token castle
Token ID16669
Feature activation+8.96e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ses+0.001
ips+0.001
 superheroes+0.001
Neg logit contributions
 year-0.001
ally-0.001
heart-0.001
urious-0.001
um-0.001
Token and
Token ID290
Feature activation-1.04e-02

Pos logit contributions
 pret+0.016
 swords+0.016
ips+0.015
ses+0.015
 loses+0.015
Neg logit contributions
 year-0.016
ini-0.011
 world-0.011
um-0.011
ally-0.011
Token grabs
Token ID22378
Feature activation-1.67e-02

Pos logit contributions
 year+0.019
 was+0.015
 edge+0.015
 me+0.015
urious+0.014
Neg logit contributions
ips-0.015
drivers-0.014
ids-0.014
iers-0.013
 training-0.013
Token Tim
Token ID5045
Feature activation-7.53e-03

Pos logit contributions
 year+0.015
ie+0.012
ini+0.012
heart+0.011
ide+0.011
Neg logit contributions
 Stri-0.008
 padd-0.008
ses-0.007
 pret-0.007
ips-0.007
Token and
Token ID290
Feature activation+4.51e-04

Pos logit contributions
 year+0.012
ini+0.008
 world+0.007
um+0.007
ally+0.007
Neg logit contributions
 pret-0.015
 swords-0.015
ips-0.014
ses-0.014
 loses-0.014
Token Jane
Token ID12091
Feature activation-1.85e-03

Pos logit contributions
 pret+0.001
 tries+0.001
ips+0.001
 hesitation+0.001
 Robo+0.001
Neg logit contributions
 year-0.001
ini-0.001
 world-0.001
Twe-0.000
urious-0.000
Token built
Token ID3170
Feature activation-8.12e-03

Pos logit contributions
 year+0.003
ini+0.002
 world+0.002
um+0.002
ally+0.002
Neg logit contributions
 pret-0.003
 swords-0.003
ips-0.003
ses-0.003
 loses-0.003
Token sand
Token ID6450
Feature activation+4.67e-03

Pos logit contributions
 year+0.016
 world+0.012
um+0.011
time+0.011
ide+0.011
Neg logit contributions
ips-0.013
ses-0.013
 pret-0.012
ids-0.012
 loses-0.012
Tokencast
Token ID2701
Feature activation+4.24e-02

Pos logit contributions
ses+0.008
 pret+0.008
ips+0.007
 loses+0.007
ids+0.007
Neg logit contributions
 year-0.009
 world-0.006
ide-0.006
um-0.006
ini-0.006
Tokenles
Token ID829
Feature activation+9.19e-03

Pos logit contributions
ses+0.076
 pret+0.073
ips+0.071
 swords+0.070
 loses+0.067
Neg logit contributions
 year-0.080
ie-0.055
ide-0.055
um-0.053
ally-0.053
Token and
Token ID290
Feature activation-5.39e-04

Pos logit contributions
 pret+0.018
ses+0.017
ips+0.017
 swords+0.017
 loses+0.017
Neg logit contributions
 year-0.015
 world-0.010
ide-0.010
um-0.009
ini-0.009
Token played
Token ID2826
Feature activation-1.05e-02

Pos logit contributions
 year+0.001
 world+0.001
ini+0.001
ally+0.001
Mom+0.001
Neg logit contributions
ips-0.001
 pret-0.001
ses-0.001
 swords-0.001
 loses-0.001
Token in
Token ID287
Feature activation-1.07e-02

Pos logit contributions
 year+0.019
 world+0.013
ini+0.013
um+0.013
ally+0.013
Neg logit contributions
 pret-0.018
ips-0.018
ses-0.018
 swords-0.017
 loses-0.017
Token the
Token ID262
Feature activation+2.39e-03

Pos logit contributions
 year+0.024
ini+0.019
um+0.019
ally+0.018
urious+0.018
Neg logit contributions
 pret-0.014
 swords-0.013
ips-0.013
ses-0.012
 loses-0.012
Token is
Token ID318
Feature activation-2.46e-03

Pos logit contributions
 year+0.039
ini+0.029
ide+0.028
Lu+0.028
ally+0.027
Neg logit contributions
 pret-0.036
 loses-0.035
 swords-0.034
 enough-0.033
 tries-0.033
Token called
Token ID1444
Feature activation+1.34e-02

Pos logit contributions
 year+0.005
ini+0.004
 world+0.004
ally+0.003
um+0.003
Neg logit contributions
 pret-0.004
 swords-0.004
ips-0.004
ses-0.004
 loses-0.004
Token a
Token ID257
Feature activation+1.42e-02

Pos logit contributions
 pret+0.017
ips+0.017
 swords+0.016
ses+0.016
 loses+0.016
Neg logit contributions
 year-0.030
 world-0.023
um-0.023
Mom-0.022
ini-0.022
Token sailor
Token ID43272
Feature activation+1.55e-04

Pos logit contributions
 pret+0.024
 swords+0.022
ses+0.021
 superheroes+0.021
 scar+0.021
Neg logit contributions
 year-0.027
ally-0.021
urious-0.021
um-0.021
ini-0.021
Token,"
Token ID553
Feature activation+1.68e-02

Pos logit contributions
 pret+0.000
 swords+0.000
ips+0.000
ses+0.000
 loses+0.000
Neg logit contributions
 year-0.000
ally-0.000
ini-0.000
um-0.000
urious-0.000
Token her
Token ID607
Feature activation+3.97e-02

Pos logit contributions
 pret+0.021
 Stri+0.020
 swords+0.020
 Robo+0.019
 scar+0.018
Neg logit contributions
 year-0.037
ini-0.032
urious-0.030
 world-0.030
rated-0.030
Token dad
Token ID9955
Feature activation+1.05e-02

Pos logit contributions
 pret+0.075
 swords+0.073
ses+0.072
ips+0.070
 superheroes+0.068
Neg logit contributions
 year-0.068
ini-0.047
ally-0.046
 world-0.046
urious-0.045
Token said
Token ID531
Feature activation+1.49e-02

Pos logit contributions
 pret+0.014
 swords+0.013
ips+0.013
 loses+0.013
ses+0.013
Neg logit contributions
 year-0.023
ini-0.018
 world-0.018
ally-0.017
urious-0.017
Token.
Token ID13
Feature activation+1.23e-02

Pos logit contributions
 pret+0.028
ids+0.027
 swords+0.027
ips+0.027
ses+0.026
Neg logit contributions
 year-0.024
ini-0.017
urious-0.017
 world-0.017
ally-0.017
Token
Token ID198
Feature activation+5.41e-03

Pos logit contributions
 pret+0.022
 Stri+0.021
 Robo+0.021
ips+0.020
ids+0.020
Neg logit contributions
 year-0.022
ini-0.017
urious-0.016
rated-0.016
ie-0.015
Token
Token ID198
Feature activation+9.83e-03

Pos logit contributions
 Stri+0.008
 Robo+0.008
Aud+0.008
 Fay+0.008
 pret+0.007
Neg logit contributions
 year-0.010
umm-0.009
rated-0.009
ini-0.009
heart-0.008
Token that
Token ID326
Feature activation+6.24e-03

Pos logit contributions
 year+0.010
ini+0.007
 world+0.007
um+0.007
ally+0.007
Neg logit contributions
 pret-0.009
 swords-0.008
ips-0.008
ses-0.008
 loses-0.008
Token day
Token ID1110
Feature activation+1.27e-02

Pos logit contributions
 pret+0.015
ips+0.015
 swords+0.014
ses+0.014
 training+0.014
Neg logit contributions
 year-0.006
ini-0.004
ally-0.004
urious-0.004
heart-0.003
Token on
Token ID319
Feature activation+1.85e-02

Pos logit contributions
 pret+0.022
 swords+0.021
ses+0.021
 badges+0.020
 loses+0.020
Neg logit contributions
 year-0.026
um-0.019
Mom-0.019
time-0.019
 day-0.017
Token,
Token ID11
Feature activation+1.23e-02

Pos logit contributions
 pret+0.032
 swords+0.032
ips+0.031
ses+0.031
 loses+0.030
Neg logit contributions
 year-0.034
ini-0.024
 world-0.023
um-0.023
ally-0.023
Token Lucy
Token ID22162
Feature activation-1.14e-03

Pos logit contributions
 pret+0.021
 superheroes+0.020
ips+0.020
 swords+0.020
 Stri+0.019
Neg logit contributions
 year-0.021
ini-0.017
um-0.016
urious-0.016
ally-0.015
Token and
Token ID290
Feature activation+8.62e-03

Pos logit contributions
 year+0.002
 world+0.001
ini+0.001
ie+0.001
ally+0.001
Neg logit contributions
 swords-0.002
ses-0.002
ips-0.002
 pret-0.002
drivers-0.002
Token the
Token ID262
Feature activation-1.56e-03

Pos logit contributions
 pret+0.013
 swords+0.013
ips+0.013
ses+0.012
ids+0.012
Neg logit contributions
 year-0.018
 world-0.013
ally-0.013
ini-0.013
um-0.013
Token little
Token ID1310
Feature activation-4.46e-03

Pos logit contributions
 year+0.002
heart+0.002
ide+0.002
um+0.002
rated+0.002
Neg logit contributions
 superheroes-0.003
ses-0.003
 pret-0.003
 scissors-0.003
 scar-0.003
Token girl
Token ID2576
Feature activation-1.28e-02

Pos logit contributions
 year+0.005
ini+0.003
um+0.003
 world+0.003
ally+0.003
Neg logit contributions
 pret-0.011
 swords-0.011
ips-0.011
ses-0.010
 loses-0.010
Token were
Token ID547
Feature activation-1.39e-03

Pos logit contributions
 year+0.020
 world+0.013
ie+0.012
ini+0.012
ide+0.012
Neg logit contributions
 pret-0.026
ips-0.026
ses-0.025
 swords-0.025
drivers-0.024
Token best
Token ID1266
Feature activation-5.47e-03

Pos logit contributions
 year+0.002
 world+0.002
time+0.001
ie+0.001
Mom+0.001
Neg logit contributions
ses-0.003
 pret-0.003
drivers-0.003
 swords-0.003
 loses-0.003
Token Mom
Token ID11254
Feature activation+2.77e-03

Pos logit contributions
 pret+0.024
 Robo+0.023
 swords+0.023
 Stri+0.023
ips+0.022
Neg logit contributions
 year-0.034
ini-0.028
urious-0.027
rated-0.026
 world-0.026
Tokenmy
Token ID1820
Feature activation-6.38e-04

Pos logit contributions
 pret+0.005
 scar+0.005
ips+0.005
 swords+0.005
 training+0.005
Neg logit contributions
 year-0.005
ini-0.003
urious-0.003
 world-0.003
um-0.003
Token.
Token ID13
Feature activation-4.00e-03

Pos logit contributions
 year+0.001
ini+0.001
ally+0.001
urious+0.001
 world+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ids-0.001
 training-0.001
Token He
Token ID679
Feature activation+2.22e-03

Pos logit contributions
 year+0.007
ini+0.006
heart+0.005
ide+0.005
urious+0.005
Neg logit contributions
 pret-0.006
ses-0.006
 Robo-0.006
ips-0.006
 Stri-0.006
Token loved
Token ID6151
Feature activation-6.22e-03

Pos logit contributions
ips+0.005
 pret+0.005
 swords+0.005
ses+0.005
ids+0.004
Neg logit contributions
 year-0.003
um-0.002
 world-0.002
Mom-0.002
ini-0.002
Token his
Token ID465
Feature activation+8.61e-03

Pos logit contributions
 year+0.011
 world+0.008
 day+0.008
Mom+0.008
time+0.007
Neg logit contributions
ses-0.012
ids-0.011
 loses-0.011
ips-0.011
ked-0.010
Token bike
Token ID7161
Feature activation+1.16e-02

Pos logit contributions
 pret+0.014
 swords+0.014
ips+0.013
ses+0.013
 training+0.013
Neg logit contributions
 year-0.016
ini-0.012
ally-0.012
ide-0.012
heart-0.012
Token and
Token ID290
Feature activation+6.68e-04

Pos logit contributions
 pret+0.021
 swords+0.020
ips+0.020
ses+0.020
 loses+0.019
Neg logit contributions
 year-0.020
ini-0.014
 world-0.014
um-0.014
ally-0.014
Token he
Token ID339
Feature activation-4.07e-03

Pos logit contributions
ips+0.001
 swords+0.001
 pret+0.001
ses+0.001
ids+0.001
Neg logit contributions
 year-0.001
 world-0.001
ini-0.001
ally-0.001
spe-0.001
Token rode
Token ID22075
Feature activation-1.27e-03

Pos logit contributions
 year+0.006
 world+0.004
ini+0.004
um+0.004
ally+0.004
Neg logit contributions
 pret-0.008
 swords-0.008
ips-0.008
ses-0.008
ids-0.008
Token it
Token ID340
Feature activation+2.23e-03

Pos logit contributions
 year+0.003
 world+0.002
um+0.002
Mom+0.002
ini+0.002
Neg logit contributions
 pret-0.002
ses-0.002
ips-0.002
 swords-0.002
ids-0.002
Token to
Token ID284
Feature activation+7.82e-03

Pos logit contributions
 year+0.024
 world+0.017
ie+0.016
 little+0.015
 day+0.015
Neg logit contributions
ses-0.025
ips-0.024
 loses-0.023
ids-0.023
ked-0.022
Token dress
Token ID6576
Feature activation+2.92e-03

Pos logit contributions
 pret+0.012
 swords+0.011
ips+0.011
ses+0.011
 loses+0.011
Neg logit contributions
 year-0.016
ini-0.012
 world-0.012
ally-0.012
um-0.012
Token up
Token ID510
Feature activation+2.13e-02

Pos logit contributions
 pret+0.005
 swords+0.005
ips+0.005
ses+0.005
 loses+0.005
Neg logit contributions
 year-0.005
ini-0.003
 world-0.003
um-0.003
ally-0.003
Token as
Token ID355
Feature activation-1.10e-02

Pos logit contributions
 pret+0.036
 swords+0.036
ips+0.035
 scar+0.034
 training+0.033
Neg logit contributions
 year-0.038
ini-0.029
ally-0.028
urious-0.027
 world-0.027
Token a
Token ID257
Feature activation-2.71e-03

Pos logit contributions
 year+0.022
ini+0.016
 world+0.016
um+0.016
Mom+0.016
Neg logit contributions
 pret-0.017
 swords-0.016
ips-0.016
ses-0.016
 loses-0.016
Token fairy
Token ID25607
Feature activation+1.33e-02

Pos logit contributions
 year+0.006
urious+0.005
ally+0.005
ini+0.005
ide+0.005
Neg logit contributions
 pret-0.004
 swords-0.004
 scar-0.004
 superheroes-0.003
 training-0.003
Token.
Token ID13
Feature activation+1.22e-03

Pos logit contributions
ips+0.025
ses+0.025
 pret+0.024
 swords+0.024
 loses+0.024
Neg logit contributions
 year-0.022
ie-0.015
 world-0.015
ini-0.014
time-0.014
Token She
Token ID1375
Feature activation+5.63e-04

Pos logit contributions
 pret+0.002
ips+0.002
ids+0.002
 Stri+0.002
 Robo+0.002
Neg logit contributions
 year-0.002
ini-0.002
spe-0.002
rated-0.002
urious-0.002
Token had
Token ID550
Feature activation-5.54e-05

Pos logit contributions
 pret+0.001
ips+0.001
 swords+0.001
ses+0.001
 loses+0.001
Neg logit contributions
 year-0.001
ini-0.001
 world-0.001
um-0.001
Mom-0.001
Token a
Token ID257
Feature activation+1.26e-02

Pos logit contributions
 year+0.000
 world+0.000
Mom+0.000
um+0.000
time+0.000
Neg logit contributions
 pret-0.000
ips-0.000
ses-0.000
 swords-0.000
 loses-0.000
Token pink
Token ID11398
Feature activation-1.29e-04

Pos logit contributions
 pret+0.021
 swords+0.020
ses+0.020
ips+0.020
 scar+0.019
Neg logit contributions
 year-0.024
ini-0.017
ally-0.017
urious-0.017
ide-0.017
Token lion
Token ID18744
Feature activation+8.26e-03

Pos logit contributions
 pret+0.010
 swords+0.010
ses+0.010
 loses+0.009
 superheroes+0.009
Neg logit contributions
 year-0.015
ini-0.011
ally-0.011
um-0.011
urious-0.011
Token.
Token ID13
Feature activation+2.05e-03

Pos logit contributions
ips+0.016
 pret+0.015
ses+0.015
 swords+0.015
drivers+0.015
Neg logit contributions
 year-0.013
 world-0.009
ide-0.008
time-0.008
ie-0.008
Token He
Token ID679
Feature activation+4.74e-03

Pos logit contributions
 pret+0.003
 Stri+0.003
 swords+0.003
ips+0.003
ses+0.003
Neg logit contributions
 year-0.004
ini-0.003
ie-0.003
ide-0.003
ally-0.003
Token would
Token ID561
Feature activation+1.11e-02

Pos logit contributions
ips+0.009
 swords+0.008
 pret+0.008
ses+0.008
drivers+0.008
Neg logit contributions
 year-0.008
 world-0.006
Mom-0.005
urious-0.005
ie-0.005
Token roar
Token ID34532
Feature activation+9.06e-03

Pos logit contributions
ips+0.016
 pret+0.016
 swords+0.016
ses+0.015
drivers+0.015
Neg logit contributions
 year-0.023
 world-0.017
ini-0.016
Mom-0.016
ide-0.016
Token and
Token ID290
Feature activation+1.02e-02

Pos logit contributions
 pret+0.016
ips+0.016
ses+0.016
 swords+0.016
ids+0.015
Neg logit contributions
 year-0.016
 world-0.011
ide-0.010
um-0.010
Mom-0.010
Token jump
Token ID4391
Feature activation+1.35e-02

Pos logit contributions
ips+0.014
 pret+0.013
 swords+0.013
ses+0.013
drivers+0.013
Neg logit contributions
 year-0.022
 world-0.017
Mom-0.016
heart-0.016
urious-0.016
Token and
Token ID290
Feature activation+4.09e-03

Pos logit contributions
 pret+0.024
ips+0.023
 swords+0.023
ses+0.023
 loses+0.022
Neg logit contributions
 year-0.024
 world-0.017
ini-0.017
um-0.017
ally-0.016
Token scare
Token ID19437
Feature activation-2.50e-03

Pos logit contributions
ips+0.006
 pret+0.005
 swords+0.005
ses+0.005
drivers+0.005
Neg logit contributions
 year-0.009
 world-0.007
Mom-0.006
ide-0.006
ini-0.006
Token everyone
Token ID2506
Feature activation+1.46e-03

Pos logit contributions
 year+0.006
ini+0.005
ally+0.004
um+0.004
urious+0.004
Neg logit contributions
 pret-0.003
 swords-0.003
 loses-0.003
ips-0.003
 Stri-0.003
Token.
Token ID13
Feature activation-5.29e-03

Pos logit contributions
 pret+0.003
 swords+0.003
ips+0.003
ses+0.003
 loses+0.003
Neg logit contributions
 year-0.002
ini-0.002
ally-0.002
um-0.002
 world-0.002
Token for
Token ID329
Feature activation+7.90e-03

Pos logit contributions
 year+0.012
ini+0.010
ally+0.009
urious+0.009
spe+0.009
Neg logit contributions
 pret-0.008
 swords-0.007
ips-0.007
 superheroes-0.007
ids-0.007
Token help
Token ID1037
Feature activation+6.67e-04

Pos logit contributions
ses+0.016
 fault+0.014
 pret+0.014
rob+0.014
 loses+0.014
Neg logit contributions
 year-0.015
Mom-0.011
time-0.010
lf-0.010
um-0.009
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
 pret+0.001
ips+0.001
 training+0.001
 swords+0.001
ids+0.001
Neg logit contributions
 year-0.001
ini-0.001
spe-0.001
ally-0.001
urious-0.001
Token Her
Token ID2332
Feature activation-3.79e-03

Pos logit contributions
 pret+0.017
ses+0.015
ips+0.015
ids+0.015
 Robo+0.015
Neg logit contributions
 year-0.023
ini-0.019
heart-0.018
ie-0.017
rated-0.017
Token mom
Token ID1995
Feature activation+4.37e-03

Pos logit contributions
 year+0.005
ini+0.003
 world+0.003
ally+0.003
um+0.003
Neg logit contributions
 pret-0.008
 swords-0.008
ips-0.008
ses-0.008
 loses-0.008
Token told
Token ID1297
Feature activation+1.18e-02

Pos logit contributions
 pret+0.006
 swords+0.005
ips+0.005
 scar+0.005
 Stri+0.005
Neg logit contributions
 year-0.010
 world-0.008
ini-0.007
ide-0.007
urious-0.007
Token her
Token ID607
Feature activation+7.86e-03

Pos logit contributions
 pret+0.015
 swords+0.014
ips+0.014
 scar+0.014
 loses+0.013
Neg logit contributions
 year-0.027
ini-0.022
spe-0.021
 world-0.021
um-0.020
Token to
Token ID284
Feature activation+9.62e-03

Pos logit contributions
 pret+0.012
 swords+0.012
ips+0.012
ses+0.012
 loses+0.011
Neg logit contributions
 year-0.016
ini-0.011
 world-0.011
um-0.011
ally-0.011
Token polish
Token ID25245
Feature activation+1.36e-02

Pos logit contributions
 pret+0.012
 training+0.012
 swords+0.012
 Stri+0.011
 scar+0.011
Neg logit contributions
 year-0.020
ini-0.017
spe-0.016
urious-0.016
ally-0.015
Token her
Token ID607
Feature activation+9.77e-03

Pos logit contributions
ses+0.022
 swords+0.022
 pret+0.022
ips+0.021
 loses+0.021
Neg logit contributions
 year-0.026
 world-0.018
um-0.018
ie-0.018
heart-0.018
Token shoes
Token ID10012
Feature activation+4.75e-04

Pos logit contributions
 pret+0.014
 swords+0.014
ips+0.014
ses+0.013
 loses+0.013
Neg logit contributions
 year-0.021
ini-0.015
ally-0.015
um-0.015
 world-0.015
Token walk
Token ID2513
Feature activation+3.46e-03

Pos logit contributions
 year+0.005
um+0.004
heart+0.004
ini+0.004
urious+0.004
Neg logit contributions
 pret-0.005
 swords-0.005
 scar-0.004
 superheroes-0.004
 training-0.004
Token together
Token ID1978
Feature activation-9.40e-03

Pos logit contributions
ips+0.005
 pret+0.005
 swords+0.005
 superheroes+0.005
ids+0.005
Neg logit contributions
 year-0.006
ini-0.005
urious-0.005
ally-0.005
um-0.005
Token.
Token ID13
Feature activation-3.02e-03

Pos logit contributions
 year+0.016
ini+0.012
ally+0.011
Mom+0.011
um+0.011
Neg logit contributions
 pret-0.016
 swords-0.016
ips-0.016
 training-0.016
ids-0.016
Token Grace
Token ID16156
Feature activation-8.14e-03

Pos logit contributions
 year+0.005
ini+0.004
ally+0.004
urious+0.004
um+0.004
Neg logit contributions
 pret-0.005
ids-0.005
ips-0.005
 superheroes-0.005
 swords-0.005
Token was
Token ID373
Feature activation+2.24e-03

Pos logit contributions
 year+0.011
ini+0.007
um+0.007
 world+0.007
Mom+0.007
Neg logit contributions
 pret-0.018
 swords-0.017
ips-0.017
 loses-0.016
ses-0.016
Token very
Token ID845
Feature activation-3.79e-03

Pos logit contributions
 pret+0.004
 swords+0.004
ips+0.004
ses+0.004
 loses+0.004
Neg logit contributions
 year-0.004
 world-0.003
ini-0.002
Mom-0.002
um-0.002
Token stubborn
Token ID24444
Feature activation-2.83e-03

Pos logit contributions
 year+0.007
ini+0.005
ally+0.005
 world+0.005
um+0.005
Neg logit contributions
 pret-0.006
ips-0.006
 swords-0.006
 loses-0.006
ses-0.006
Token and
Token ID290
Feature activation-9.24e-03

Pos logit contributions
 year+0.005
 world+0.003
ini+0.003
um+0.003
ally+0.003
Neg logit contributions
 pret-0.005
 swords-0.005
ips-0.005
ses-0.005
 loses-0.005
Token wanted
Token ID2227
Feature activation-8.14e-03

Pos logit contributions
 year+0.017
ini+0.013
um+0.012
 world+0.012
ally+0.012
Neg logit contributions
 pret-0.016
 loses-0.015
 swords-0.015
ses-0.015
 Stri-0.014
Token to
Token ID284
Feature activation-6.85e-03

Pos logit contributions
 year+0.017
 world+0.012
Mom+0.012
time+0.012
Lu+0.012
Neg logit contributions
 pret-0.012
ips-0.012
ses-0.012
 swords-0.012
ids-0.011
Token go
Token ID467
Feature activation-1.69e-03

Pos logit contributions
 year+0.015
ini+0.012
urious+0.012
ally+0.011
um+0.011
Neg logit contributions
 pret-0.009
 training-0.009
 loses-0.009
 swords-0.008
 Stri-0.008
Token told
Token ID1297
Feature activation+7.58e-03

Pos logit contributions
 year+0.002
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.002
ips-0.002
 swords-0.002
ses-0.002
 loses-0.002
Token her
Token ID607
Feature activation-7.28e-04

Pos logit contributions
 pret+0.011
 swords+0.011
ips+0.011
ses+0.010
 loses+0.010
Neg logit contributions
 year-0.016
ini-0.012
 world-0.012
ally-0.012
urious-0.012
Token mom
Token ID1995
Feature activation-4.23e-03

Pos logit contributions
 year+0.001
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.002
 swords-0.002
ips-0.002
ses-0.002
 loses-0.002
Token all
Token ID477
Feature activation+1.06e-02

Pos logit contributions
 year+0.008
ini+0.007
urious+0.006
 world+0.006
Mom+0.006
Neg logit contributions
 pret-0.006
 swords-0.006
ips-0.006
 scar-0.006
 superheroes-0.006
Token about
Token ID546
Feature activation-4.49e-03

Pos logit contributions
 pret+0.017
 swords+0.016
ips+0.016
ses+0.016
 loses+0.016
Neg logit contributions
 year-0.020
ini-0.015
Mom-0.015
um-0.015
 world-0.015
Token her
Token ID607
Feature activation+1.80e-03

Pos logit contributions
 year+0.009
 world+0.006
ally+0.006
Mom+0.006
ini+0.006
Neg logit contributions
 pret-0.007
ips-0.007
 swords-0.007
ses-0.007
 loses-0.007
Token new
Token ID649
Feature activation+6.71e-03

Pos logit contributions
 pret+0.003
ips+0.003
 swords+0.003
ses+0.003
 loses+0.003
Neg logit contributions
 year-0.003
 world-0.002
ini-0.002
um-0.002
ally-0.002
Token friend
Token ID1545
Feature activation+8.09e-03

Pos logit contributions
 pret+0.011
ips+0.011
 swords+0.011
ses+0.011
 loses+0.011
Neg logit contributions
 year-0.013
 world-0.009
ini-0.009
um-0.009
ally-0.009
Token and
Token ID290
Feature activation-1.02e-02

Pos logit contributions
 pret+0.016
ips+0.016
 swords+0.015
ses+0.015
 loses+0.015
Neg logit contributions
 year-0.013
 world-0.009
ally-0.008
ide-0.008
ie-0.008
Token how
Token ID703
Feature activation+4.20e-03

Pos logit contributions
 year+0.021
ini+0.015
 world+0.015
um+0.015
ally+0.015
Neg logit contributions
 pret-0.016
 swords-0.015
ips-0.015
ses-0.015
 loses-0.014
Token comfortable
Token ID6792
Feature activation+1.49e-02

Pos logit contributions
 pret+0.006
 swords+0.006
ips+0.005
 loses+0.005
 superheroes+0.005
Neg logit contributions
 year-0.009
ini-0.007
um-0.007
ally-0.007
Mom-0.007
Token tour
Token ID4205
Feature activation-3.20e-03

Pos logit contributions
 year+0.000
ini+0.000
 world+0.000
um+0.000
ally+0.000
Neg logit contributions
 pret-0.000
ips-0.000
 swords-0.000
ses-0.000
 loses-0.000
Token of
Token ID286
Feature activation-4.59e-04

Pos logit contributions
 year+0.005
ini+0.004
urious+0.004
ally+0.004
heart+0.004
Neg logit contributions
 pret-0.006
 swords-0.006
ips-0.006
ids-0.005
ses-0.005
Token his
Token ID465
Feature activation+1.40e-02

Pos logit contributions
 year+0.001
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ses-0.001
 loses-0.001
Token place
Token ID1295
Feature activation-9.45e-03

Pos logit contributions
 swords+0.023
 pret+0.023
ses+0.021
 superheroes+0.021
 badges+0.020
Neg logit contributions
 year-0.027
ally-0.021
ide-0.021
heart-0.021
um-0.020
Token.
Token ID13
Feature activation-2.11e-03

Pos logit contributions
 year+0.016
ini+0.012
ally+0.011
urious+0.011
Mom+0.011
Neg logit contributions
 swords-0.017
 pret-0.017
ips-0.016
ids-0.016
 loses-0.016
Token Tom
Token ID4186
Feature activation+1.75e-03

Pos logit contributions
 year+0.004
ini+0.002
ie+0.002
ally+0.002
heart+0.002
Neg logit contributions
 pret-0.004
ips-0.004
 swords-0.004
 Stri-0.004
 Robo-0.004
Token and
Token ID290
Feature activation+1.18e-03

Pos logit contributions
ips+0.004
 swords+0.003
ses+0.003
 pret+0.003
ids+0.003
Neg logit contributions
 year-0.003
 world-0.002
um-0.002
ie-0.002
urious-0.002
Token Sue
Token ID26089
Feature activation+2.27e-03

Pos logit contributions
 pret+0.002
 swords+0.002
ips+0.002
 Stri+0.002
ses+0.002
Neg logit contributions
 year-0.002
ini-0.002
 world-0.002
um-0.002
urious-0.002
Token are
Token ID389
Feature activation+1.85e-03

Pos logit contributions
ips+0.003
 pret+0.003
 swords+0.003
ses+0.003
drivers+0.003
Neg logit contributions
 year-0.005
 world-0.003
ide-0.003
Mom-0.003
um-0.003
Token grateful
Token ID14066
Feature activation+1.05e-03

Pos logit contributions
 pret+0.004
ips+0.004
 swords+0.004
ses+0.004
drivers+0.004
Neg logit contributions
 year-0.003
 world-0.002
um-0.002
Mom-0.002
ini-0.002
Token and
Token ID290
Feature activation+1.93e-04

Pos logit contributions
 swords+0.002
 pret+0.002
ips+0.001
ses+0.001
 loses+0.001
Neg logit contributions
 year-0.002
ini-0.002
ally-0.002
urious-0.002
Mom-0.002
TokenTom
Token ID13787
Feature activation+2.81e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
ses+0.001
 loses+0.000
Neg logit contributions
 year-0.001
ini-0.000
 world-0.000
um-0.000
ally-0.000
Token and
Token ID290
Feature activation+7.39e-04

Pos logit contributions
 pret+0.006
 swords+0.006
ips+0.006
ses+0.006
 loses+0.006
Neg logit contributions
 year-0.004
 world-0.002
ini-0.002
ally-0.002
um-0.002
Token Lily
Token ID20037
Feature activation-8.68e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
ses+0.001
 loses+0.001
Neg logit contributions
 year-0.002
ini-0.001
 world-0.001
um-0.001
ally-0.001
Token are
Token ID389
Feature activation-1.48e-02

Pos logit contributions
 year+0.013
 world+0.008
ini+0.008
ally+0.008
um+0.008
Neg logit contributions
 pret-0.018
ips-0.018
 swords-0.018
ses-0.017
ids-0.017
Token friends
Token ID2460
Feature activation+2.29e-03

Pos logit contributions
 year+0.032
ini+0.024
 world+0.023
ally+0.023
um+0.023
Neg logit contributions
 pret-0.021
ips-0.020
 swords-0.020
ses-0.020
 loses-0.019
Token.
Token ID13
Feature activation-1.81e-03

Pos logit contributions
ips+0.004
 whist+0.004
 swords+0.004
 pret+0.004
 superheroes+0.004
Neg logit contributions
 year-0.004
rated-0.003
ini-0.003
urious-0.003
ally-0.003
Token They
Token ID1119
Feature activation-2.48e-03

Pos logit contributions
 year+0.003
ini+0.002
 world+0.002
heart+0.002
urious+0.002
Neg logit contributions
 pret-0.003
ips-0.003
 swords-0.003
ses-0.003
ids-0.003
Token like
Token ID588
Feature activation+1.88e-03

Pos logit contributions
 year+0.005
ini+0.004
ally+0.004
um+0.004
 world+0.004
Neg logit contributions
 pret-0.004
 swords-0.004
ips-0.004
ses-0.003
 loses-0.003
Token to
Token ID284
Feature activation+1.29e-02

Pos logit contributions
ips+0.004
ids+0.004
drivers+0.004
 loses+0.004
ps+0.004
Neg logit contributions
 year-0.003
 world-0.002
 day-0.002
 little-0.002
time-0.002
Token play
Token ID711
Feature activation-2.01e-03

Pos logit contributions
 pret+0.023
 swords+0.022
ips+0.021
ses+0.021
ids+0.021
Neg logit contributions
 year-0.024
ini-0.017
 world-0.016
urious-0.016
ally-0.016
Token outside
Token ID2354
Feature activation-2.54e-03

Pos logit contributions
 year+0.003
 world+0.002
time+0.002
ini+0.002
Mom+0.002
Neg logit contributions
 pret-0.004
ses-0.004
ips-0.004
 loses-0.004
 swords-0.004
Token the
Token ID262
Feature activation+1.13e-03

Pos logit contributions
 year+0.017
ini+0.013
 world+0.013
um+0.013
ally+0.013
Neg logit contributions
 pret-0.011
 swords-0.011
ips-0.011
ses-0.011
 loses-0.010
Token roof
Token ID9753
Feature activation+1.27e-02

Pos logit contributions
 tickets+0.002
 padd+0.002
 scar+0.002
 pret+0.002
 swords+0.001
Neg logit contributions
 year-0.002
ally-0.002
heart-0.002
ini-0.002
ie-0.002
Token and
Token ID290
Feature activation-1.01e-02

Pos logit contributions
ips+0.020
 pret+0.019
 swords+0.019
ids+0.018
 Stri+0.018
Neg logit contributions
 year-0.024
ini-0.018
ally-0.018
Mom-0.018
urious-0.018
Token wave
Token ID6769
Feature activation+3.81e-03

Pos logit contributions
 year+0.021
ini+0.016
 world+0.015
um+0.015
ally+0.015
Neg logit contributions
 pret-0.015
 swords-0.015
ips-0.014
ses-0.014
 loses-0.014
Token to
Token ID284
Feature activation+3.31e-03

Pos logit contributions
 pret+0.006
ips+0.006
 swords+0.006
ses+0.006
drivers+0.006
Neg logit contributions
 year-0.008
um-0.006
 world-0.005
Mom-0.005
ally-0.005
Token the
Token ID262
Feature activation+1.60e-03

Pos logit contributions
 pret+0.004
 swords+0.003
 training+0.003
 scar+0.003
 tries+0.003
Neg logit contributions
 year-0.008
ini-0.007
urious-0.006
ally-0.006
spe-0.006
Token people
Token ID661
Feature activation+1.34e-03

Pos logit contributions
 padd+0.002
 swords+0.002
 scar+0.002
 pret+0.002
 pilots+0.002
Neg logit contributions
 year-0.003
ally-0.003
ide-0.002
heart-0.002
date-0.002
Token in
Token ID287
Feature activation+4.09e-04

Pos logit contributions
 pret+0.003
 swords+0.003
ips+0.003
ses+0.003
 loses+0.002
Neg logit contributions
 year-0.002
ini-0.001
 world-0.001
um-0.001
ally-0.001
Token the
Token ID262
Feature activation+5.52e-03

Pos logit contributions
 pret+0.001
 swords+0.001
ips+0.001
ses+0.001
 loses+0.001
Neg logit contributions
 year-0.001
ini-0.001
um-0.001
 world-0.001
ally-0.001
Token street
Token ID4675
Feature activation-2.51e-03

Pos logit contributions
 pret+0.008
 padd+0.008
 swords+0.008
 scar+0.008
 buck+0.008
Neg logit contributions
 year-0.009
ally-0.008
ide-0.008
um-0.008
heart-0.008
Token below
Token ID2174
Feature activation+1.96e-02

Pos logit contributions
 year+0.004
ini+0.003
ally+0.003
um+0.003
Mom+0.003
Neg logit contributions
 pret-0.005
ips-0.005
 swords-0.005
 loses-0.005
ids-0.005
Token to
Token ID284
Feature activation-6.56e-03

Pos logit contributions
 year+0.001
Mom+0.001
time+0.001
um+0.001
 world+0.001
Neg logit contributions
ses-0.001
 pret-0.001
ips-0.001
rob-0.001
ids-0.001
Token share
Token ID2648
Feature activation-1.01e-02

Pos logit contributions
 year+0.015
ini+0.013
heart+0.012
umm+0.012
 world+0.012
Neg logit contributions
 pret-0.008
 training-0.007
 swords-0.007
 loses-0.007
 padd-0.007
Token and
Token ID290
Feature activation-6.37e-03

Pos logit contributions
 year+0.021
ini+0.016
um+0.016
urious+0.016
ally+0.016
Neg logit contributions
 pret-0.015
 swords-0.014
ips-0.014
 loses-0.014
 scar-0.013
Token play
Token ID711
Feature activation-9.26e-03

Pos logit contributions
 year+0.014
ini+0.011
um+0.011
urious+0.011
 world+0.011
Neg logit contributions
 pret-0.009
 swords-0.008
 loses-0.008
 superheroes-0.008
 training-0.008
Token together
Token ID1978
Feature activation-1.75e-02

Pos logit contributions
 year+0.021
ini+0.018
heart+0.018
ally+0.018
um+0.018
Neg logit contributions
 pret-0.012
 swords-0.011
ips-0.011
 superheroes-0.011
 loses-0.010
Token nicely
Token ID16576
Feature activation-2.63e-02

Pos logit contributions
 year+0.029
ini+0.023
heart+0.021
ally+0.021
urious+0.021
Neg logit contributions
 pret-0.033
ips-0.032
 swords-0.031
 loses-0.030
ids-0.030
Token.
Token ID13
Feature activation-7.75e-03

Pos logit contributions
 year+0.038
ini+0.025
ally+0.025
um+0.024
 world+0.024
Neg logit contributions
 pret-0.056
 swords-0.054
ips-0.054
 loses-0.052
ses-0.051
Token They
Token ID1119
Feature activation+4.41e-03

Pos logit contributions
 year+0.017
ini+0.013
ide+0.013
heart+0.013
rated+0.013
Neg logit contributions
 Robo-0.011
ips-0.011
 pret-0.011
etchup-0.010
 Stri-0.010
Token both
Token ID1111
Feature activation+1.37e-02

Pos logit contributions
 pret+0.008
 swords+0.007
ips+0.007
ses+0.007
 loses+0.007
Neg logit contributions
 year-0.008
ini-0.006
 world-0.006
um-0.006
ally-0.006
Token agreed
Token ID4987
Feature activation-1.93e-02

Pos logit contributions
 pret+0.022
 swords+0.020
 loses+0.020
ses+0.019
ips+0.019
Neg logit contributions
 year-0.027
ini-0.021
 world-0.021
um-0.020
ally-0.020
Token and
Token ID290
Feature activation-5.68e-03

Pos logit contributions
 year+0.040
ini+0.031
 world+0.030
urious+0.029
um+0.029
Neg logit contributions
 pret-0.029
 swords-0.027
ips-0.027
 loses-0.026
ses-0.026
Token kitchen
Token ID9592
Feature activation-1.26e-02

Pos logit contributions
 year+0.014
ally+0.012
heart+0.012
ini+0.012
urious+0.012
Neg logit contributions
 pret-0.010
 superheroes-0.009
 scar-0.009
 swords-0.009
 asleep-0.009
Token and
Token ID290
Feature activation-9.13e-03

Pos logit contributions
 year+0.023
ini+0.019
urious+0.017
ally+0.017
rated+0.017
Neg logit contributions
 pret-0.020
 swords-0.019
ips-0.019
ids-0.018
 scar-0.018
Token she
Token ID673
Feature activation-1.12e-02

Pos logit contributions
 year+0.018
ini+0.013
 world+0.013
um+0.013
ally+0.013
Neg logit contributions
 pret-0.015
 swords-0.014
ips-0.014
ses-0.014
 loses-0.014
Token found
Token ID1043
Feature activation-1.69e-02

Pos logit contributions
 year+0.023
ini+0.018
 world+0.017
um+0.017
heart+0.017
Neg logit contributions
 pret-0.017
 swords-0.016
ids-0.015
ips-0.015
 loses-0.015
Token a
Token ID257
Feature activation-7.18e-03

Pos logit contributions
 year+0.041
ini+0.032
 world+0.031
urious+0.031
ally+0.031
Neg logit contributions
 pret-0.020
 swords-0.019
ips-0.019
ses-0.018
ids-0.018
Token packet
Token ID19638
Feature activation-2.68e-02

Pos logit contributions
 year+0.014
urious+0.011
ini+0.011
ide+0.011
ally+0.011
Neg logit contributions
 pret-0.011
 swords-0.011
 scar-0.011
 superheroes-0.010
ses-0.010
Token of
Token ID286
Feature activation+2.87e-03

Pos logit contributions
 year+0.047
urious+0.039
ini+0.036
rated+0.036
spe+0.035
Neg logit contributions
ips-0.044
ids-0.044
 swords-0.042
 pret-0.042
etchup-0.041
Token tea
Token ID8887
Feature activation-1.67e-03

Pos logit contributions
 pret+0.004
 swords+0.003
ips+0.003
ses+0.003
 loses+0.003
Neg logit contributions
 year-0.007
ini-0.005
 world-0.005
ally-0.005
um-0.005
Token.
Token ID13
Feature activation+5.32e-03

Pos logit contributions
 year+0.003
urious+0.002
ini+0.002
spe+0.002
heart+0.002
Neg logit contributions
 pret-0.003
ips-0.003
ids-0.003
 swords-0.003
 scar-0.003
Token She
Token ID1375
Feature activation-1.29e-03

Pos logit contributions
ids+0.008
 pret+0.007
ips+0.007
ses+0.007
 Stri+0.007
Neg logit contributions
 year-0.012
ini-0.009
spe-0.009
ide-0.009
ie-0.009
Token put
Token ID1234
Feature activation-3.48e-03

Pos logit contributions
 year+0.002
 world+0.002
ini+0.002
urious+0.002
heart+0.002
Neg logit contributions
 pret-0.002
 swords-0.002
 scar-0.002
ids-0.002
 Stri-0.002
Token.
Token ID13
Feature activation-1.20e-03

Pos logit contributions
 year+0.029
ini+0.023
rated+0.021
heart+0.021
ally+0.021
Neg logit contributions
 pret-0.032
ips-0.031
 swords-0.031
 loses-0.030
 superheroes-0.030
Token They
Token ID1119
Feature activation-3.34e-03

Pos logit contributions
 year+0.002
ini+0.002
ally+0.002
 world+0.002
heart+0.001
Neg logit contributions
 pret-0.002
ips-0.002
ses-0.002
 swords-0.002
ids-0.002
Token like
Token ID588
Feature activation-1.61e-03

Pos logit contributions
 year+0.007
ini+0.006
ally+0.006
urious+0.006
um+0.006
Neg logit contributions
 pret-0.004
 swords-0.004
ips-0.004
ses-0.004
ids-0.004
Token to
Token ID284
Feature activation+1.02e-02

Pos logit contributions
 year+0.003
 world+0.002
 day+0.002
Mom+0.002
time+0.002
Neg logit contributions
ids-0.003
ips-0.003
ses-0.003
ps-0.003
 loses-0.003
Token play
Token ID711
Feature activation-5.65e-03

Pos logit contributions
 pret+0.018
 swords+0.017
ips+0.017
ses+0.016
ids+0.016
Neg logit contributions
 year-0.019
ini-0.014
 world-0.013
heart-0.013
urious-0.013
Token with
Token ID351
Feature activation-2.48e-02

Pos logit contributions
 year+0.010
 world+0.007
time+0.007
Mom+0.007
um+0.007
Neg logit contributions
 pret-0.010
ses-0.010
ips-0.009
 loses-0.009
 swords-0.009
Token their
Token ID511
Feature activation-6.12e-04

Pos logit contributions
 year+0.053
Mom+0.039
 world+0.038
ie+0.038
ide+0.038
Neg logit contributions
ses-0.035
ips-0.035
ids-0.035
cher-0.034
 loses-0.034
Token toys
Token ID14958
Feature activation-1.25e-02

Pos logit contributions
 year+0.001
ini+0.001
 world+0.001
um+0.001
ally+0.001
Neg logit contributions
 pret-0.001
 swords-0.001
ips-0.001
ses-0.001
 loses-0.001
Token and
Token ID290
Feature activation-7.61e-03

Pos logit contributions
 year+0.016
 world+0.009
um+0.008
ide+0.008
Mom+0.008
Neg logit contributions
 pret-0.029
ses-0.028
ips-0.028
 swords-0.028
 loses-0.028
Token share
Token ID2648
Feature activation-1.34e-02

Pos logit contributions
 year+0.016
 world+0.011
ini+0.011
Mom+0.011
ally+0.011
Neg logit contributions
ips-0.011
 pret-0.011
 swords-0.011
ses-0.011
ids-0.011
Token them
Token ID606
Feature activation-1.03e-03

Pos logit contributions
 year+0.029
ini+0.025
heart+0.023
ally+0.023
urious+0.023
Neg logit contributions
 pret-0.019
 swords-0.018
 scar-0.017
 badges-0.017
 superheroes-0.016
Token much
Token ID881
Feature activation+3.99e-03

Pos logit contributions
 year+0.020
ini+0.017
heart+0.016
 world+0.015
urious+0.015
Neg logit contributions
 swords-0.017
 loses-0.017
 pret-0.017
ips-0.016
ses-0.015
Token.
Token ID13
Feature activation+3.28e-03

Pos logit contributions
 pret+0.010
ips+0.009
 swords+0.009
ses+0.009
ids+0.009
Neg logit contributions
 year-0.005
 world-0.002
um-0.002
ide-0.002
ally-0.002
Token The
Token ID383
Feature activation-8.70e-03

Pos logit contributions
 pret+0.006
 swords+0.006
ips+0.006
ses+0.006
 Stri+0.005
Neg logit contributions
 year-0.006
ini-0.004
 world-0.004
ally-0.004
um-0.004
Token girl
Token ID2576
Feature activation-1.01e-02

Pos logit contributions
 year+0.016
ide+0.014
heart+0.013
spe+0.013
ini+0.013
Neg logit contributions
ses-0.014
 scar-0.013
 superheroes-0.013
 pret-0.013
 swords-0.013
Token cl
Token ID537
Feature activation+7.01e-03

Pos logit contributions
 year+0.016
 world+0.010
ally+0.010
um+0.010
ini+0.010
Neg logit contributions
ips-0.020
 pret-0.020
 swords-0.020
ses-0.019
ids-0.019
Tokenapped
Token ID6320
Feature activation-2.88e-02

Pos logit contributions
 pret+0.012
 swords+0.012
ips+0.011
ses+0.011
 Stri+0.011
Neg logit contributions
 year-0.013
um-0.009
ini-0.009
ally-0.009
 world-0.009
Token her
Token ID607
Feature activation+1.65e-02

Pos logit contributions
 year+0.067
ini+0.058
ally+0.053
urious+0.053
spe+0.053
Neg logit contributions
 swords-0.034
 pret-0.034
 scar-0.031
ips-0.030
 superheroes-0.030
Token hands
Token ID2832
Feature activation+4.11e-05

Pos logit contributions
 pret+0.024
 swords+0.023
ips+0.023
ses+0.022
 loses+0.022
Neg logit contributions
 year-0.036
ini-0.026
 world-0.026
um-0.026
ally-0.026
Token and
Token ID290
Feature activation-1.93e-03

Pos logit contributions
 pret+0.000
ips+0.000
ses+0.000
 swords+0.000
 loses+0.000
Neg logit contributions
 year-0.000
um-0.000
 world-0.000
ide-0.000
time-0.000
Token the
Token ID262
Feature activation-3.55e-04

Pos logit contributions
 year+0.003
ally+0.002
ie+0.002
Mom+0.002
 sound+0.002
Neg logit contributions
ips-0.003
ids-0.003
drivers-0.003
 swords-0.003
 training-0.003
Token dog
Token ID3290
Feature activation-1.27e-02

Pos logit contributions
 year+0.001
ide+0.001
heart+0.001
spe+0.001
ini+0.001
Neg logit contributions
ses-0.000
 swords-0.000
 pret-0.000
 elbow-0.000
 training-0.000
Token a
Token ID257
Feature activation-7.94e-03

Pos logit contributions
 year+0.027
um+0.020
 world+0.020
Mom+0.020
ally+0.020
Neg logit contributions
 pret-0.018
ips-0.018
ses-0.018
 loses-0.017
ids-0.017
Token big
Token ID1263
Feature activation-1.28e-02

Pos logit contributions
 year+0.015
 world+0.010
ini+0.010
um+0.010
Mom+0.009
Neg logit contributions
ips-0.014
 pret-0.014
ids-0.013
ses-0.013
 loses-0.013
Token ball
Token ID2613
Feature activation-1.66e-02

Pos logit contributions
 year+0.025
 world+0.018
um+0.017
ally+0.017
ini+0.017
Neg logit contributions
ips-0.021
 pret-0.020
ids-0.020
 loses-0.020
drivers-0.020
Token to
Token ID284
Feature activation-7.44e-03

Pos logit contributions
 year+0.028
ini+0.019
 world+0.019
ally+0.019
um+0.019
Neg logit contributions
 pret-0.030
 swords-0.030
ips-0.030
ses-0.029
 loses-0.028
Token play
Token ID711
Feature activation-9.44e-03

Pos logit contributions
 year+0.015
ini+0.011
 world+0.011
um+0.011
ally+0.011
Neg logit contributions
 pret-0.011
 swords-0.011
ips-0.011
ses-0.011
 loses-0.010
Token with
Token ID351
Feature activation-2.51e-02

Pos logit contributions
 year+0.014
ini+0.010
ally+0.009
urious+0.009
heart+0.009
Neg logit contributions
 pret-0.019
 swords-0.019
ips-0.019
 superheroes-0.018
 Stri-0.018
Token.
Token ID13
Feature activation+9.08e-03

Pos logit contributions
 year+0.043
ini+0.038
rated+0.035
urious+0.034
umm+0.034
Neg logit contributions
 swords-0.045
 pret-0.042
ips-0.041
 Stri-0.041
 superheroes-0.040
Token The
Token ID383
Feature activation-6.23e-03

Pos logit contributions
 pret+0.020
ips+0.020
 swords+0.019
 loses+0.019
 training+0.019
Neg logit contributions
 year-0.012
um-0.007
 world-0.007
time-0.007
ally-0.007
Token hipp
Token ID18568
Feature activation+1.99e-03

Pos logit contributions
 year+0.014
heart+0.012
urious+0.012
ally+0.012
rated+0.012
Neg logit contributions
 swords-0.007
 padd-0.007
 ro-0.007
 pret-0.007
 tickets-0.006
Tokeno
Token ID78
Feature activation+2.99e-03

Pos logit contributions
ips+0.002
etchup+0.001
 swords+0.001
 Stri+0.001
 pret+0.001
Neg logit contributions
 year-0.005
rated-0.004
heart-0.004
um-0.004
 world-0.004
Token was
Token ID373
Feature activation+3.94e-03

Pos logit contributions
 pret+0.006
 swords+0.006
ips+0.006
ses+0.006
 loses+0.006
Neg logit contributions
 year-0.004
ini-0.003
 world-0.003
um-0.003
ally-0.003