TOP ACTIVATIONS
MAX = 0.097

would whisper " thank you " to whoever put it there
naughty and did not share " and included a smile y
my waving keeps me happy " and she carried on waving
for explaining what " trade " meant . From that day
he said " thank you " to his mom .
need to learn a lot " said Mum .
, " You 're welcome " before returning to his stable
know what " mod est " meant , but he knew
Mia said " thank you " for the fun she was
me to finish my bath " said her mom .
understand what " mod est " meant , but he was
Daisy said " Thank you " and hugged the girl .
it said " thank you " to the hipp o .
all said " thank you " to the sun for giving
" It was my pleasure " The little boy then went
he said " Thank you " and put it inside his
said " Safe to Eat " and knew the berries were
prince said " thank you " over and over again .
bird said " thank you " to its friends . They
family said " thank you " before they left . They

MIN ACTIVATIONS
MIN = -0.081

, sauce , cheese and topp ings . She showed them
of cheese and sauce and topp ings . The people who
saw five fish and seven seaw eed . They were happy
added shells and stones and seaw eed to make it look
of shells , rocks and seaw eed , but Jane was
ate it with shells and seaw eed . They find a
they can add their own topp ings too . Sara chooses
noisy and scary . He bar ks and grow ls and
buckets and shovel s and seaw eed . Then we can
cover your work . Ask for help when you need it
help cook hot dogs and hamb urg ers on the grill
decorated it with shells and seaw eed . They had a
like fish and shells and seaw eed . But they did
Like shells and rocks and seaw eed ." M
They went past rocks and seaw eed . They saw other
salad , they asked for dessert again . "
some shells and rocks and seaw eed . He picked them
sund ae with lots of topp ings . Tim my was
are welcome , kids . Enjoy your ol ives . But
Every day , m ama oy ster taught her baby oy

INTERVAL 0.053 - 0.097
CONTAINS 0.033%

it ." Tim my thought that was a good idea and
tell her dad what she had read . She said ,
happy again and hugged his mom . <|endoftext|> Once upon a
happy and proud of her mom 's plan . The end
tasted ! " T asty " was the perfect word to

INTERVAL 0.008 - 0.053
CONTAINS 26.715%

in the park . <|endoftext|> Tim was an adventurous boy who
. Benny was scared , but he answered , " It
other one was named Hudson . They lived together in a
. He began to run as fast as he could .
a big white bump in the ground ! Mill

INTERVAL -0.037 - 0.008
CONTAINS 72.666%

egg hopped happily back home . It had a wonderful time
ran away . The girl was very happy .
put them in the bin . Then she went to a
. She was happy because her hair was dry again .
driver gave Tim my a surprise . He opened one of

INTERVAL -0.081 - -0.037
CONTAINS 0.586%

She does not want to stop singing . "
She said , " Mom , our carpet is dirty .
monkey ." Then , the tiger waved goodbye and disappeared into
, watching her . The magic pencil feels sorry and lonely
opened the ig loo and pe ered inside . She found
Token would
Token ID561
Feature activation-1.68e-02

Pos logit contributions
 asks+0.007
 knocks+0.007
 walks+0.006
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.009
he-0.009
ill-0.008
 Suz-0.008
smanship-0.007
Token whisper
Token ID31992
Feature activation-1.36e-02

Pos logit contributions
 asks+0.025
 walks+0.023
 knocks+0.022
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.038
he-0.036
ill-0.033
 Suz-0.031
smanship-0.031
Token "
Token ID366
Feature activation+4.96e-02

Pos logit contributions
 asks+0.021
 walks+0.019
 knocks+0.019
 trips+0.018
 points+0.018
Neg logit contributions
lessness-0.029
he-0.028
ill-0.026
 Suz-0.024
smanship-0.024
Tokenthank
Token ID40716
Feature activation+2.45e-02

Pos logit contributions
lessness+0.095
he+0.086
ill+0.079
 Suz+0.077
smanship+0.077
Neg logit contributions
 asks-0.087
 walks-0.080
 knocks-0.077
 trips-0.077
 points-0.076
Token you
Token ID345
Feature activation+2.23e-02

Pos logit contributions
lessness+0.057
he+0.054
ill+0.050
 Suz+0.048
smanship+0.048
Neg logit contributions
 asks-0.034
 walks-0.031
 knocks-0.030
 trips-0.029
 points-0.028
Token"
Token ID1
Feature activation+9.73e-02

Pos logit contributions
lessness+0.052
he+0.049
ill+0.046
 Suz+0.044
smanship+0.044
Neg logit contributions
 asks-0.030
 walks-0.027
 knocks-0.026
 trips-0.026
 points-0.025
Token to
Token ID284
Feature activation+9.86e-03

Pos logit contributions
lessness+0.218
he+0.200
smanship+0.188
 Suz+0.187
ill+0.187
Neg logit contributions
 asks-0.135
 walks-0.118
 trips-0.113
 points-0.111
 knocks-0.107
Token whoever
Token ID16958
Feature activation-1.10e-02

Pos logit contributions
lessness+0.020
he+0.019
ill+0.018
 Suz+0.017
smanship+0.016
Neg logit contributions
 asks-0.016
 knocks-0.015
 walks-0.014
 trips-0.014
 points-0.014
Token put
Token ID1234
Feature activation+1.46e-02

Pos logit contributions
 asks+0.016
 knocks+0.015
 walks+0.015
 trips+0.014
 requests+0.014
Neg logit contributions
lessness-0.023
he-0.022
ill-0.021
 Suz-0.020
smanship-0.018
Token it
Token ID340
Feature activation+1.93e-02

Pos logit contributions
lessness+0.033
he+0.031
ill+0.029
smanship+0.028
 Suz+0.028
Neg logit contributions
 asks-0.021
 walks-0.019
 knocks-0.018
 trips-0.018
 points-0.018
Token there
Token ID612
Feature activation+2.76e-02

Pos logit contributions
lessness+0.037
he+0.035
ill+0.032
 Suz+0.030
smanship+0.029
Neg logit contributions
 asks-0.034
 knocks-0.032
 walks-0.032
 trips-0.031
 points-0.030
Token naughty
Token ID45128
Feature activation+1.81e-02

Pos logit contributions
lessness+0.020
he+0.019
ill+0.017
 Suz+0.017
taker+0.016
Neg logit contributions
 asks-0.018
 walks-0.017
 knocks-0.017
 trips-0.017
 points-0.015
Token and
Token ID290
Feature activation-2.84e-03

Pos logit contributions
lessness+0.036
he+0.035
ill+0.032
 Suz+0.031
smanship+0.030
Neg logit contributions
 asks-0.030
 knocks-0.027
 walks-0.027
 trips-0.027
 points-0.026
Token did
Token ID750
Feature activation-1.36e-02

Pos logit contributions
 asks+0.004
 walks+0.004
 knocks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
 Suz-0.005
smanship-0.005
Token not
Token ID407
Feature activation-1.24e-02

Pos logit contributions
 asks+0.019
 knocks+0.017
 walks+0.016
 trips+0.016
 points+0.016
Neg logit contributions
lessness-0.029
he-0.028
ill-0.025
 Suz-0.025
taker-0.024
Token share
Token ID2648
Feature activation-5.00e-03

Pos logit contributions
 asks+0.019
 knocks+0.018
 walks+0.018
 trips+0.018
 points+0.017
Neg logit contributions
lessness-0.025
he-0.024
ill-0.022
 Suz-0.021
smanship-0.020
Token"
Token ID1
Feature activation+9.49e-02

Pos logit contributions
 asks+0.008
 knocks+0.008
 walks+0.007
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.010
he-0.010
ill-0.009
 Suz-0.008
smanship-0.008
Token and
Token ID290
Feature activation-1.35e-02

Pos logit contributions
lessness+0.198
he+0.181
 Suz+0.172
ill+0.171
smanship+0.166
Neg logit contributions
 asks-0.149
 walks-0.129
 trips-0.128
 knocks-0.127
 points-0.124
Token included
Token ID3017
Feature activation-8.75e-03

Pos logit contributions
 asks+0.021
 walks+0.019
 knocks+0.019
 trips+0.019
 points+0.018
Neg logit contributions
lessness-0.029
he-0.027
ill-0.025
 Suz-0.024
smanship-0.023
Token a
Token ID257
Feature activation-1.16e-02

Pos logit contributions
 asks+0.014
 walks+0.013
 knocks+0.013
 trips+0.013
 points+0.012
Neg logit contributions
lessness-0.018
he-0.017
ill-0.016
smanship-0.015
 Suz-0.015
Token smile
Token ID8212
Feature activation+2.75e-02

Pos logit contributions
 asks+0.019
 walks+0.017
 knocks+0.017
 trips+0.017
 points+0.016
Neg logit contributions
lessness-0.024
he-0.023
ill-0.021
 Suz-0.020
smanship-0.020
Tokeny
Token ID88
Feature activation+1.39e-02

Pos logit contributions
lessness+0.059
he+0.056
ill+0.051
 Suz+0.049
smanship+0.048
Neg logit contributions
 asks-0.043
 walks-0.039
 knocks-0.039
 trips-0.038
 points-0.036
Token my
Token ID616
Feature activation-2.38e-02

Pos logit contributions
 asks+0.000
 walks+0.000
 knocks+0.000
 trips+0.000
 requests+0.000
Neg logit contributions
lessness-0.001
he-0.000
ill-0.000
 Suz-0.000
taker-0.000
Token waving
Token ID25849
Feature activation-1.54e-02

Pos logit contributions
 asks+0.036
 knocks+0.032
 walks+0.032
 trips+0.031
 points+0.029
Neg logit contributions
lessness-0.052
he-0.050
ill-0.046
 Suz-0.044
smanship-0.043
Token keeps
Token ID7622
Feature activation+7.58e-03

Pos logit contributions
 asks+0.025
 knocks+0.023
 walks+0.023
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.032
he-0.030
ill-0.028
 Suz-0.027
smanship-0.026
Token me
Token ID502
Feature activation+2.79e-03

Pos logit contributions
lessness+0.014
he+0.014
 Suz+0.012
ill+0.012
smanship+0.011
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.011
 trips-0.011
 points-0.010
Token happy
Token ID3772
Feature activation+6.52e-03

Pos logit contributions
lessness+0.005
he+0.005
ill+0.005
 Suz+0.004
smanship+0.004
Neg logit contributions
 asks-0.005
 walks-0.005
 trips-0.004
 knocks-0.004
 points-0.004
Token"
Token ID1
Feature activation+9.46e-02

Pos logit contributions
lessness+0.013
he+0.012
ill+0.011
 Suz+0.011
smanship+0.010
Neg logit contributions
 asks-0.011
 knocks-0.010
 walks-0.010
 trips-0.010
 points-0.010
Token and
Token ID290
Feature activation-8.73e-04

Pos logit contributions
lessness+0.202
he+0.187
 Suz+0.171
smanship+0.170
ill+0.170
Neg logit contributions
 asks-0.145
 walks-0.125
 knocks-0.122
 trips-0.121
 points-0.120
Token she
Token ID673
Feature activation-2.98e-03

Pos logit contributions
 asks+0.001
 trips+0.001
 walks+0.001
 knocks+0.001
 strangers+0.001
Neg logit contributions
lessness-0.002
he-0.002
 Suz-0.002
ill-0.002
taker-0.001
Token carried
Token ID5281
Feature activation-6.96e-04

Pos logit contributions
 asks+0.004
 knocks+0.004
 walks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.006
he-0.006
ill-0.006
 Suz-0.006
smanship-0.005
Token on
Token ID319
Feature activation-1.72e-02

Pos logit contributions
 asks+0.001
 walks+0.001
 knocks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.002
he-0.001
ill-0.001
 Suz-0.001
smanship-0.001
Token waving
Token ID25849
Feature activation-2.92e-02

Pos logit contributions
 asks+0.028
 knocks+0.025
 walks+0.023
 trips+0.023
 points+0.023
Neg logit contributions
lessness-0.036
he-0.034
ill-0.031
 Suz-0.030
smanship-0.029
Token for
Token ID329
Feature activation-1.54e-02

Pos logit contributions
lessness+0.000
he+0.000
ill+0.000
 Suz+0.000
smanship+0.000
Neg logit contributions
 asks-0.000
 walks-0.000
 knocks-0.000
 trips-0.000
 points-0.000
Token explaining
Token ID11170
Feature activation-1.76e-02

Pos logit contributions
 asks+0.022
 knocks+0.020
 walks+0.019
 trips+0.019
 points+0.018
Neg logit contributions
lessness-0.036
he-0.034
ill-0.031
 Suz-0.030
smanship-0.029
Token what
Token ID644
Feature activation-6.41e-03

Pos logit contributions
 asks+0.030
 knocks+0.028
 walks+0.026
 trips+0.026
 points+0.025
Neg logit contributions
lessness-0.035
he-0.033
ill-0.030
 Suz-0.029
smanship-0.027
Token "
Token ID366
Feature activation+6.22e-02

Pos logit contributions
 asks+0.011
 knocks+0.011
 walks+0.010
 trips+0.010
 charges+0.010
Neg logit contributions
lessness-0.012
he-0.011
ill-0.010
 Suz-0.010
smanship-0.009
Tokentrade
Token ID25351
Feature activation+4.83e-04

Pos logit contributions
lessness+0.115
he+0.099
ill+0.096
 Suz+0.095
smanship+0.089
Neg logit contributions
 asks-0.115
 walks-0.104
 knocks-0.101
 trips-0.101
 points-0.097
Token"
Token ID1
Feature activation+9.31e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.001
 knocks-0.001
 walks-0.001
 trips-0.001
 points-0.001
Token meant
Token ID4001
Feature activation+3.02e-02

Pos logit contributions
lessness+0.202
he+0.184
ill+0.177
 Suz+0.177
smanship+0.173
Neg logit contributions
 asks-0.137
 walks-0.117
 knocks-0.115
 trips-0.115
 points-0.109
Token.
Token ID13
Feature activation-1.61e-03

Pos logit contributions
lessness+0.064
he+0.061
ill+0.056
 Suz+0.053
smanship+0.053
Neg logit contributions
 asks-0.047
 knocks-0.043
 walks-0.043
 trips-0.042
 points-0.041
Token From
Token ID3574
Feature activation+7.83e-03

Pos logit contributions
 asks+0.003
 knocks+0.003
 walks+0.003
 trips+0.002
 points+0.002
Neg logit contributions
lessness-0.003
he-0.003
ill-0.003
 Suz-0.003
smanship-0.003
Token that
Token ID326
Feature activation+2.23e-02

Pos logit contributions
lessness+0.016
he+0.015
ill+0.014
 Suz+0.013
smanship+0.013
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.012
 trips-0.012
 points-0.011
Token day
Token ID1110
Feature activation+1.64e-02

Pos logit contributions
lessness+0.037
he+0.034
ill+0.031
 Suz+0.029
smanship+0.028
Neg logit contributions
 asks-0.046
 knocks-0.043
 walks-0.043
 trips-0.042
 points-0.041
Token he
Token ID339
Feature activation+2.25e-02

Pos logit contributions
lessness+0.002
he+0.002
ill+0.002
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.001
 walks-0.001
 knocks-0.001
 trips-0.001
 points-0.001
Token said
Token ID531
Feature activation+1.24e-02

Pos logit contributions
lessness+0.052
he+0.049
ill+0.046
 Suz+0.044
smanship+0.043
Neg logit contributions
 asks-0.032
 walks-0.029
 knocks-0.029
 trips-0.028
 points-0.027
Token "
Token ID366
Feature activation+4.44e-02

Pos logit contributions
lessness+0.027
he+0.025
ill+0.023
 Suz+0.022
smanship+0.022
Neg logit contributions
 asks-0.019
 walks-0.018
 knocks-0.018
 trips-0.017
 points-0.016
Tokenthank
Token ID40716
Feature activation+1.95e-02

Pos logit contributions
lessness+0.089
he+0.073
taker+0.071
 Suz+0.070
smanship+0.070
Neg logit contributions
 asks-0.073
 walks-0.067
 trips-0.064
 points-0.063
 knocks-0.061
Token you
Token ID345
Feature activation+2.42e-02

Pos logit contributions
lessness+0.048
he+0.045
ill+0.042
 Suz+0.040
smanship+0.040
Neg logit contributions
 asks-0.025
 walks-0.023
 knocks-0.022
 trips-0.022
 points-0.021
Token"
Token ID1
Feature activation+9.31e-02

Pos logit contributions
lessness+0.059
he+0.053
smanship+0.051
ill+0.050
 Suz+0.050
Neg logit contributions
 asks-0.029
 trips-0.026
 walks-0.025
 knocks-0.025
 points-0.024
Token to
Token ID284
Feature activation+1.44e-02

Pos logit contributions
lessness+0.208
he+0.189
ill+0.181
smanship+0.178
 Suz+0.178
Neg logit contributions
 asks-0.132
 walks-0.114
 trips-0.112
 points-0.107
 knocks-0.107
Token his
Token ID465
Feature activation-1.34e-02

Pos logit contributions
lessness+0.031
he+0.029
ill+0.027
 Suz+0.026
smanship+0.025
Neg logit contributions
 asks-0.022
 knocks-0.020
 walks-0.020
 trips-0.020
 points-0.019
Token mom
Token ID1995
Feature activation+3.51e-02

Pos logit contributions
 asks+0.016
 knocks+0.014
 walks+0.014
 trips+0.014
 points+0.013
Neg logit contributions
lessness-0.033
he-0.032
ill-0.030
 Suz-0.029
smanship-0.028
Token.
Token ID13
Feature activation-5.48e-03

Pos logit contributions
lessness+0.085
he+0.079
ill+0.072
smanship+0.072
 Suz+0.071
Neg logit contributions
 asks-0.045
 walks-0.040
 knocks-0.038
 trips-0.037
 points-0.036
Token\n
Token ID198
Feature activation+4.65e-04

Pos logit contributions
 asks+0.009
 knocks+0.009
 walks+0.009
 trips+0.008
 points+0.008
Neg logit contributions
lessness-0.011
he-0.010
ill-0.010
 Suz-0.009
smanship-0.009
Token need
Token ID761
Feature activation-1.39e-02

Pos logit contributions
 asks+0.008
 walks+0.007
 knocks+0.007
 trips+0.007
 points+0.006
Neg logit contributions
lessness-0.009
he-0.008
ill-0.007
 Suz-0.007
smanship-0.007
Token to
Token ID284
Feature activation-1.41e-02

Pos logit contributions
 asks+0.025
 knocks+0.023
 walks+0.021
 throws+0.021
 trips+0.021
Neg logit contributions
lessness-0.025
he-0.021
 Suz-0.021
ill-0.019
taker-0.018
Token learn
Token ID2193
Feature activation-1.58e-02

Pos logit contributions
 asks+0.023
 knocks+0.022
 points+0.021
 throws+0.020
 trips+0.020
Neg logit contributions
he-0.026
lessness-0.024
ill-0.023
 Suz-0.023
smanship-0.020
Token a
Token ID257
Feature activation-1.61e-02

Pos logit contributions
 asks+0.028
 knocks+0.026
 walks+0.025
 trips+0.024
 throws+0.024
Neg logit contributions
lessness-0.029
he-0.027
ill-0.024
 Suz-0.024
taker-0.022
Token lot
Token ID1256
Feature activation+4.44e-03

Pos logit contributions
 asks+0.029
 knocks+0.027
 walks+0.027
 throws+0.025
 trips+0.025
Neg logit contributions
lessness-0.029
he-0.027
 Suz-0.024
smanship-0.023
ill-0.023
Token"
Token ID1
Feature activation+9.28e-02

Pos logit contributions
lessness+0.008
he+0.008
ill+0.007
 Suz+0.006
smanship+0.006
Neg logit contributions
 asks-0.008
 walks-0.008
 knocks-0.008
 trips-0.008
 points-0.007
Token said
Token ID531
Feature activation+2.26e-02

Pos logit contributions
lessness+0.211
he+0.197
ill+0.184
 Suz+0.180
smanship+0.175
Neg logit contributions
 asks-0.134
 walks-0.118
 knocks-0.116
 trips-0.114
 points-0.109
Token Mum
Token ID38367
Feature activation+2.65e-02

Pos logit contributions
lessness+0.048
he+0.046
ill+0.042
 Suz+0.040
smanship+0.039
Neg logit contributions
 asks-0.035
 knocks-0.032
 walks-0.032
 trips-0.031
 points-0.030
Token.
Token ID13
Feature activation-5.09e-03

Pos logit contributions
lessness+0.063
he+0.061
ill+0.054
 Suz+0.053
smanship+0.053
Neg logit contributions
 asks-0.035
 walks-0.030
 knocks-0.030
 trips-0.029
 points-0.028
Token\n
Token ID198
Feature activation-4.93e-03

Pos logit contributions
 asks+0.009
 walks+0.009
 knocks+0.009
 trips+0.009
 points+0.008
Neg logit contributions
lessness-0.009
he-0.009
ill-0.008
 Suz-0.008
taker-0.007
Token\n
Token ID198
Feature activation+8.58e-03

Pos logit contributions
 asks+0.008
 knocks+0.007
 walks+0.007
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.010
he-0.010
ill-0.009
 Suz-0.009
smanship-0.008
Token,
Token ID11
Feature activation-5.49e-03

Pos logit contributions
lessness+0.029
he+0.028
ill+0.025
 Suz+0.024
smanship+0.023
Neg logit contributions
 asks-0.025
 knocks-0.023
 walks-0.023
 trips-0.022
 points-0.021
Token "
Token ID366
Feature activation+2.04e-02

Pos logit contributions
 asks+0.007
 knocks+0.006
 walks+0.006
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.013
he-0.013
ill-0.012
 Suz-0.011
smanship-0.011
TokenYou
Token ID1639
Feature activation+1.10e-03

Pos logit contributions
lessness+0.045
he+0.041
ill+0.038
smanship+0.038
taker+0.037
Neg logit contributions
 asks-0.028
 walks-0.026
 points-0.024
Go-0.024
 requests-0.022
Token're
Token ID821
Feature activation+1.11e-02

Pos logit contributions
lessness+0.002
he+0.002
ill+0.002
smanship+0.002
 Suz+0.002
Neg logit contributions
 asks-0.002
 walks-0.002
 trips-0.002
 points-0.002
 knocks-0.002
Token welcome
Token ID7062
Feature activation-6.04e-03

Pos logit contributions
lessness+0.022
he+0.020
ill+0.020
smanship+0.020
ingle+0.018
Neg logit contributions
 asks-0.018
 walks-0.017
 knocks-0.016
 trips-0.016
 grown-0.015
Token"
Token ID1
Feature activation+9.24e-02

Pos logit contributions
 asks+0.009
 walks+0.008
 trips+0.008
 knocks+0.008
 points+0.008
Neg logit contributions
lessness-0.014
he-0.013
smanship-0.012
ill-0.012
ini-0.011
Token before
Token ID878
Feature activation-6.20e-04

Pos logit contributions
lessness+0.191
he+0.179
smanship+0.169
igsaw+0.166
ill+0.162
Neg logit contributions
 asks-0.151
 walks-0.126
 points-0.124
 trips-0.119
 knocks-0.113
Token returning
Token ID8024
Feature activation-2.83e-02

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
 Suz-0.001
smanship-0.001
Token to
Token ID284
Feature activation+5.15e-03

Pos logit contributions
 asks+0.046
 walks+0.042
 knocks+0.042
 trips+0.041
 points+0.039
Neg logit contributions
lessness-0.060
he-0.056
ill-0.052
 Suz-0.049
smanship-0.048
Token his
Token ID465
Feature activation-1.34e-02

Pos logit contributions
lessness+0.009
he+0.009
 Suz+0.008
ill+0.008
smanship+0.007
Neg logit contributions
 asks-0.009
 knocks-0.008
 points-0.008
 trips-0.008
 walks-0.008
Token stable
Token ID8245
Feature activation-1.39e-02

Pos logit contributions
 asks+0.024
 knocks+0.022
 walks+0.021
 points+0.021
 trips+0.021
Neg logit contributions
lessness-0.023
he-0.022
 Suz-0.020
ill-0.020
smanship-0.018
Token know
Token ID760
Feature activation-5.61e-03

Pos logit contributions
lessness+0.020
he+0.019
ill+0.018
 Suz+0.018
smanship+0.018
Neg logit contributions
 asks-0.014
 walks-0.012
 knocks-0.012
 points-0.011
 trips-0.011
Token what
Token ID644
Feature activation-9.72e-03

Pos logit contributions
 asks+0.007
 knocks+0.007
 walks+0.007
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.013
he-0.013
ill-0.012
 Suz-0.011
smanship-0.011
Token "
Token ID366
Feature activation+5.35e-02

Pos logit contributions
 asks+0.017
 knocks+0.015
 walks+0.015
 points+0.014
 trips+0.014
Neg logit contributions
lessness-0.019
he-0.017
 Suz-0.016
ill-0.015
smanship-0.015
Tokenmod
Token ID4666
Feature activation+2.61e-02

Pos logit contributions
lessness+0.103
he+0.094
ill+0.088
 Suz+0.085
smanship+0.082
Neg logit contributions
 asks-0.094
 knocks-0.085
 walks-0.085
 trips-0.084
 points-0.081
Tokenest
Token ID395
Feature activation+4.68e-03

Pos logit contributions
lessness+0.034
he+0.033
ill+0.028
 Suz+0.026
smanship+0.026
Neg logit contributions
 asks-0.060
 knocks-0.055
 trips-0.054
 walks-0.054
 points-0.054
Token"
Token ID1
Feature activation+9.17e-02

Pos logit contributions
lessness+0.009
he+0.008
ill+0.007
 Suz+0.007
smanship+0.007
Neg logit contributions
 asks-0.009
 knocks-0.008
 walks-0.008
 trips-0.008
 throws-0.007
Token meant
Token ID4001
Feature activation+3.04e-02

Pos logit contributions
lessness+0.209
he+0.194
ill+0.183
 Suz+0.181
smanship+0.176
Neg logit contributions
 asks-0.124
 trips-0.106
 walks-0.106
 knocks-0.104
 points-0.100
Token,
Token ID11
Feature activation+1.83e-02

Pos logit contributions
lessness+0.059
he+0.055
ill+0.051
 Suz+0.048
smanship+0.046
Neg logit contributions
 asks-0.054
 walks-0.050
 knocks-0.050
 trips-0.049
 points-0.046
Token but
Token ID475
Feature activation+7.01e-03

Pos logit contributions
he+0.030
lessness+0.030
ill+0.027
smanship+0.025
 Suz+0.023
Neg logit contributions
 asks-0.037
 walks-0.034
 knocks-0.033
 points-0.033
 trips-0.032
Token he
Token ID339
Feature activation-8.61e-03

Pos logit contributions
lessness+0.014
he+0.013
ill+0.012
 Suz+0.011
smanship+0.011
Neg logit contributions
 asks-0.012
 knocks-0.011
 walks-0.011
 trips-0.011
 points-0.011
Token knew
Token ID2993
Feature activation-6.44e-03

Pos logit contributions
 asks+0.014
 knocks+0.014
 walks+0.013
 trips+0.013
 strangers+0.012
Neg logit contributions
lessness-0.017
he-0.016
ill-0.016
 Suz-0.015
smanship-0.013
Token Mia
Token ID32189
Feature activation-1.46e-02

Pos logit contributions
lessness+0.015
he+0.014
smanship+0.013
ini+0.013
igsaw+0.012
Neg logit contributions
 asks-0.008
 walks-0.008
 knocks-0.007
 trips-0.007
 points-0.007
Token said
Token ID531
Feature activation+1.49e-02

Pos logit contributions
 asks+0.020
 knocks+0.019
 walks+0.018
 requests+0.018
 points+0.017
Neg logit contributions
he-0.030
lessness-0.029
ill-0.028
 Suz-0.025
smanship-0.024
Token "
Token ID366
Feature activation+3.11e-02

Pos logit contributions
lessness+0.028
he+0.025
ill+0.023
 Suz+0.023
taker+0.021
Neg logit contributions
 asks-0.027
 knocks-0.026
 walks-0.025
 trips-0.025
 points-0.023
Tokenthank
Token ID40716
Feature activation+2.01e-02

Pos logit contributions
lessness+0.067
he+0.058
 Suz+0.055
ill+0.054
smanship+0.054
Neg logit contributions
 asks-0.048
 walks-0.043
 trips-0.042
 knocks-0.042
 points-0.041
Token you
Token ID345
Feature activation+2.57e-02

Pos logit contributions
lessness+0.053
he+0.050
ill+0.048
smanship+0.046
 Suz+0.044
Neg logit contributions
 asks-0.022
 walks-0.021
 trips-0.019
 knocks-0.018
 points-0.018
Token"
Token ID1
Feature activation+9.16e-02

Pos logit contributions
lessness+0.067
he+0.062
smanship+0.059
ill+0.057
 Suz+0.056
Neg logit contributions
 asks-0.024
 knocks-0.022
 trips-0.021
 walks-0.021
 points-0.019
Token for
Token ID329
Feature activation-9.61e-03

Pos logit contributions
lessness+0.226
he+0.209
smanship+0.200
 Suz+0.194
igsaw+0.194
Neg logit contributions
 asks-0.110
 walks-0.085
 points-0.085
 trips-0.082
 asked-0.079
Token the
Token ID262
Feature activation-4.96e-03

Pos logit contributions
 asks+0.015
 knocks+0.014
 trips+0.014
 walks+0.013
 points+0.013
Neg logit contributions
lessness-0.020
he-0.020
 Suz-0.018
ill-0.018
smanship-0.016
Token fun
Token ID1257
Feature activation+1.18e-02

Pos logit contributions
 asks+0.008
 knocks+0.007
 walks+0.007
 trips+0.007
 points+0.006
Neg logit contributions
lessness-0.011
he-0.011
ill-0.009
 Suz-0.009
smanship-0.009
Token she
Token ID673
Feature activation+2.86e-02

Pos logit contributions
lessness+0.023
he+0.022
ill+0.020
 Suz+0.019
smanship+0.018
Neg logit contributions
 asks-0.021
 knocks-0.019
 walks-0.019
 trips-0.019
 points-0.018
Token was
Token ID373
Feature activation+2.85e-02

Pos logit contributions
lessness+0.064
he+0.064
ill+0.059
 Suz+0.054
smanship+0.053
Neg logit contributions
 asks-0.037
 knocks-0.035
 walks-0.034
 trips-0.034
 points-0.032
Token me
Token ID502
Feature activation-6.49e-03

Pos logit contributions
 asks+0.050
 knocks+0.046
 points+0.044
 walks+0.042
 trips+0.042
Neg logit contributions
lessness-0.056
he-0.053
 Suz-0.053
ill-0.050
 Sus-0.044
Token to
Token ID284
Feature activation+1.24e-03

Pos logit contributions
 asks+0.010
 walks+0.009
 knocks+0.009
 trips+0.009
 points+0.008
Neg logit contributions
lessness-0.014
he-0.014
ill-0.013
 Suz-0.012
smanship-0.012
Token finish
Token ID5461
Feature activation-3.35e-02

Pos logit contributions
lessness+0.003
he+0.003
ill+0.002
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.002
 knocks-0.002
 walks-0.002
 points-0.002
 trips-0.002
Token my
Token ID616
Feature activation-3.05e-03

Pos logit contributions
 asks+0.056
 knocks+0.051
 walks+0.049
 points+0.047
 trips+0.046
Neg logit contributions
lessness-0.059
he-0.059
 Suz-0.054
ill-0.053
smanship-0.048
Token bath
Token ID7837
Feature activation-8.79e-04

Pos logit contributions
 asks+0.006
 points+0.005
 challenged+0.005
 charges+0.005
 strangers+0.004
Neg logit contributions
lessness-0.005
he-0.005
 Suz-0.004
smanship-0.004
igsaw-0.004
Token"
Token ID1
Feature activation+9.15e-02

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.002
he-0.002
ill-0.002
 Suz-0.002
smanship-0.002
Token said
Token ID531
Feature activation+1.89e-02

Pos logit contributions
lessness+0.199
he+0.185
ill+0.173
 Suz+0.167
smanship+0.163
Neg logit contributions
 asks-0.142
 walks-0.125
 knocks-0.125
 trips-0.122
 points-0.117
Token her
Token ID607
Feature activation-2.00e-02

Pos logit contributions
lessness+0.039
he+0.036
ill+0.033
 Suz+0.032
smanship+0.031
Neg logit contributions
 asks-0.031
 walks-0.029
 knocks-0.029
 trips-0.028
 points-0.027
Token mom
Token ID1995
Feature activation+2.71e-02

Pos logit contributions
 asks+0.022
 knocks+0.020
 walks+0.020
 trips+0.019
 points+0.018
Neg logit contributions
lessness-0.052
he-0.050
ill-0.047
 Suz-0.045
smanship-0.044
Token.
Token ID13
Feature activation-7.99e-03

Pos logit contributions
lessness+0.043
he+0.040
ill+0.035
 Suz+0.033
smanship+0.032
Neg logit contributions
 asks-0.057
 knocks-0.053
 walks-0.052
 trips-0.051
 points-0.050
Token\n
Token ID198
Feature activation+2.34e-04

Pos logit contributions
 asks+0.014
 walks+0.013
 trips+0.013
 knocks+0.013
 points+0.013
Neg logit contributions
lessness-0.015
he-0.015
ill-0.013
 Suz-0.013
smanship-0.011
Token understand
Token ID1833
Feature activation+1.99e-02

Pos logit contributions
lessness+0.017
he+0.016
ill+0.015
 Suz+0.014
smanship+0.014
Neg logit contributions
 asks-0.014
 walks-0.013
 knocks-0.013
 trips-0.012
 points-0.012
Token what
Token ID644
Feature activation-7.42e-03

Pos logit contributions
lessness+0.048
he+0.046
ill+0.042
 Suz+0.041
smanship+0.040
Neg logit contributions
 asks-0.026
 walks-0.024
 knocks-0.023
 trips-0.023
 points-0.022
Token "
Token ID366
Feature activation+5.71e-02

Pos logit contributions
 asks+0.012
 knocks+0.011
 walks+0.011
 trips+0.011
 points+0.011
Neg logit contributions
lessness-0.016
he-0.014
 Suz-0.013
ill-0.013
smanship-0.012
Tokenmod
Token ID4666
Feature activation+2.28e-02

Pos logit contributions
lessness+0.109
he+0.097
ill+0.092
 Suz+0.090
smanship+0.086
Neg logit contributions
 asks-0.101
 walks-0.090
 knocks-0.090
 trips-0.089
 points-0.087
Tokenest
Token ID395
Feature activation+7.12e-03

Pos logit contributions
lessness+0.030
he+0.029
ill+0.024
 Suz+0.023
smanship+0.022
Neg logit contributions
 asks-0.052
 knocks-0.048
 walks-0.048
 trips-0.048
 points-0.048
Token"
Token ID1
Feature activation+9.13e-02

Pos logit contributions
lessness+0.014
he+0.012
ill+0.011
 Suz+0.011
smanship+0.011
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.012
 trips-0.012
 throws-0.011
Token meant
Token ID4001
Feature activation+2.44e-02

Pos logit contributions
lessness+0.204
he+0.190
ill+0.177
 Suz+0.176
smanship+0.170
Neg logit contributions
 asks-0.130
 walks-0.111
 trips-0.110
 knocks-0.109
 points-0.105
Token,
Token ID11
Feature activation+1.73e-02

Pos logit contributions
lessness+0.047
he+0.043
ill+0.040
 Suz+0.038
taker+0.036
Neg logit contributions
 asks-0.043
 knocks-0.040
 walks-0.040
 trips-0.039
 throws-0.036
Token but
Token ID475
Feature activation+7.81e-03

Pos logit contributions
lessness+0.029
he+0.029
ill+0.026
smanship+0.024
 Suz+0.023
Neg logit contributions
 asks-0.034
 walks-0.031
 knocks-0.031
 points-0.031
 trips-0.030
Token he
Token ID339
Feature activation-6.34e-03

Pos logit contributions
lessness+0.015
he+0.014
ill+0.013
 Suz+0.013
smanship+0.012
Neg logit contributions
 asks-0.014
 knocks-0.013
 walks-0.013
 trips-0.012
 points-0.012
Token was
Token ID373
Feature activation-2.09e-03

Pos logit contributions
 asks+0.010
 knocks+0.009
 walks+0.009
 trips+0.009
 points+0.008
Neg logit contributions
lessness-0.013
he-0.012
ill-0.012
 Suz-0.011
smanship-0.011
Token Daisy
Token ID40355
Feature activation+4.64e-03

Pos logit contributions
lessness+0.029
he+0.028
ill+0.025
smanship+0.025
ini+0.024
Neg logit contributions
 asks-0.024
 walks-0.023
 trips-0.021
 throws-0.020
 points-0.020
Token said
Token ID531
Feature activation+2.45e-02

Pos logit contributions
lessness+0.009
he+0.009
ill+0.008
 Suz+0.008
smanship+0.007
Neg logit contributions
 asks-0.008
 walks-0.007
 knocks-0.007
 trips-0.007
 points-0.007
Token "
Token ID366
Feature activation+3.79e-02

Pos logit contributions
lessness+0.050
he+0.047
ill+0.044
 Suz+0.042
smanship+0.041
Neg logit contributions
 asks-0.040
 walks-0.037
 knocks-0.037
 trips-0.036
 points-0.035
TokenThank
Token ID10449
Feature activation+1.07e-02

Pos logit contributions
lessness+0.074
smanship+0.062
 Suz+0.061
he+0.061
taker+0.060
Neg logit contributions
 asks-0.056
 walks-0.054
Go-0.052
 points-0.051
 trips-0.049
Token you
Token ID345
Feature activation+3.15e-02

Pos logit contributions
lessness+0.027
he+0.025
ill+0.024
smanship+0.023
 Suz+0.023
Neg logit contributions
 asks-0.012
 walks-0.011
 trips-0.010
 knocks-0.010
 points-0.010
Token"
Token ID1
Feature activation+9.11e-02

Pos logit contributions
lessness+0.071
he+0.069
smanship+0.065
imm+0.063
ill+0.061
Neg logit contributions
 asks-0.033
 trips-0.031
 strangers-0.031
 walks-0.031
 knocks-0.029
Token and
Token ID290
Feature activation-1.01e-02

Pos logit contributions
lessness+0.201
he+0.187
smanship+0.176
 Suz+0.174
ill+0.169
Neg logit contributions
 asks-0.126
 walks-0.106
 points-0.100
 asked-0.099
 trips-0.096
Token hugged
Token ID41130
Feature activation-6.12e-03

Pos logit contributions
 asks+0.015
 knocks+0.014
 trips+0.014
 walks+0.014
 points+0.013
Neg logit contributions
lessness-0.021
he-0.020
 Suz-0.019
ill-0.019
taker-0.017
Token the
Token ID262
Feature activation+4.89e-04

Pos logit contributions
 asks+0.009
 walks+0.008
 trips+0.008
 knocks+0.008
 strangers+0.007
Neg logit contributions
lessness-0.014
he-0.013
ill-0.012
smanship-0.012
 Suz-0.011
Token girl
Token ID2576
Feature activation+2.51e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.001
 knocks-0.001
 walks-0.001
 trips-0.001
 points-0.001
Token.
Token ID13
Feature activation+2.41e-03

Pos logit contributions
lessness+0.060
he+0.057
smanship+0.052
ill+0.051
 Suz+0.050
Neg logit contributions
 asks-0.031
 walks-0.029
 knocks-0.027
 trips-0.026
 points-0.026
Token it
Token ID340
Feature activation-1.01e-02

Pos logit contributions
lessness+0.004
he+0.004
ill+0.004
 Suz+0.004
smanship+0.004
Neg logit contributions
 asks-0.003
 walks-0.003
 knocks-0.003
 trips-0.003
 points-0.003
Token said
Token ID531
Feature activation+2.48e-03

Pos logit contributions
 asks+0.014
 knocks+0.013
 walks+0.013
 trips+0.013
 points+0.012
Neg logit contributions
lessness-0.023
he-0.021
ill-0.020
 Suz-0.019
smanship-0.018
Token "
Token ID366
Feature activation+2.71e-02

Pos logit contributions
lessness+0.005
he+0.005
ill+0.004
 Suz+0.004
smanship+0.004
Neg logit contributions
 asks-0.004
 knocks-0.004
 walks-0.004
 trips-0.004
 points-0.004
Tokenthank
Token ID40716
Feature activation+1.35e-02

Pos logit contributions
lessness+0.054
he+0.046
ill+0.045
 Suz+0.044
smanship+0.043
Neg logit contributions
 asks-0.046
 walks-0.042
 trips-0.039
 points-0.038
 knocks-0.038
Token you
Token ID345
Feature activation+2.38e-02

Pos logit contributions
lessness+0.034
he+0.032
ill+0.030
smanship+0.029
 Suz+0.029
Neg logit contributions
 asks-0.017
 walks-0.015
 knocks-0.015
 trips-0.014
 points-0.014
Token"
Token ID1
Feature activation+9.06e-02

Pos logit contributions
lessness+0.061
he+0.058
smanship+0.055
 Suz+0.052
ill+0.052
Neg logit contributions
 asks-0.024
 trips-0.021
 knocks-0.021
 walks-0.020
 strangers-0.020
Token to
Token ID284
Feature activation+9.95e-03

Pos logit contributions
lessness+0.222
he+0.204
smanship+0.193
ill+0.190
 Suz+0.188
Neg logit contributions
 asks-0.114
 trips-0.092
 walks-0.091
 points-0.087
 knocks-0.087
Token the
Token ID262
Feature activation-3.77e-03

Pos logit contributions
lessness+0.020
he+0.019
ill+0.018
 Suz+0.017
smanship+0.016
Neg logit contributions
 asks-0.017
 knocks-0.015
 walks-0.015
 trips-0.015
 points-0.014
Token hipp
Token ID18568
Feature activation-1.59e-03

Pos logit contributions
 asks+0.006
 walks+0.006
 knocks+0.006
 trips+0.006
 points+0.005
Neg logit contributions
lessness-0.008
he-0.007
ill-0.007
 Suz-0.007
smanship-0.006
Tokeno
Token ID78
Feature activation+1.23e-03

Pos logit contributions
 asks+0.003
 knocks+0.002
 walks+0.002
 trips+0.002
 points+0.002
Neg logit contributions
lessness-0.003
he-0.003
smanship-0.002
ill-0.002
 Suz-0.002
Token.
Token ID13
Feature activation+4.33e-04

Pos logit contributions
lessness+0.003
he+0.003
ill+0.002
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.002
 walks-0.002
 knocks-0.002
 trips-0.001
 points-0.001
Token all
Token ID477
Feature activation+1.50e-02

Pos logit contributions
lessness+0.011
he+0.010
ill+0.009
 Suz+0.009
smanship+0.009
Neg logit contributions
 asks-0.007
 knocks-0.006
 walks-0.006
 trips-0.006
 points-0.006
Token said
Token ID531
Feature activation+8.03e-03

Pos logit contributions
lessness+0.036
he+0.034
ill+0.032
 Suz+0.031
smanship+0.030
Neg logit contributions
 asks-0.020
 knocks-0.018
 walks-0.018
 trips-0.017
 points-0.016
Token "
Token ID366
Feature activation+3.45e-02

Pos logit contributions
lessness+0.014
he+0.012
ill+0.011
 Suz+0.011
taker+0.011
Neg logit contributions
 asks-0.016
 knocks-0.015
 walks-0.015
 trips-0.014
 requests-0.014
Tokenthank
Token ID40716
Feature activation+1.75e-02

Pos logit contributions
lessness+0.072
he+0.063
ill+0.059
taker+0.058
smanship+0.058
Neg logit contributions
 asks-0.054
 walks-0.050
 trips-0.048
 points-0.047
 knocks-0.047
Token you
Token ID345
Feature activation+2.41e-02

Pos logit contributions
lessness+0.043
he+0.041
ill+0.038
smanship+0.036
 Suz+0.036
Neg logit contributions
 asks-0.023
 walks-0.020
 knocks-0.020
 trips-0.019
 points-0.019
Token"
Token ID1
Feature activation+9.06e-02

Pos logit contributions
lessness+0.060
he+0.057
smanship+0.053
ill+0.052
 Suz+0.049
Neg logit contributions
 asks-0.026
 walks-0.024
 trips-0.023
 knocks-0.023
 points-0.022
Token to
Token ID284
Feature activation+9.18e-03

Pos logit contributions
lessness+0.210
he+0.199
smanship+0.181
ill+0.181
 Suz+0.176
Neg logit contributions
 asks-0.120
 walks-0.103
 trips-0.101
 points-0.098
 knocks-0.097
Token the
Token ID262
Feature activation-8.12e-03

Pos logit contributions
lessness+0.019
he+0.018
 Suz+0.018
ill+0.017
smanship+0.016
Neg logit contributions
 asks-0.014
 trips-0.013
 knocks-0.013
 walks-0.013
 points-0.012
Token sun
Token ID4252
Feature activation+1.07e-02

Pos logit contributions
 asks+0.013
 knocks+0.012
 walks+0.012
 trips+0.012
 points+0.011
Neg logit contributions
lessness-0.017
he-0.016
ill-0.014
 Suz-0.014
smanship-0.013
Token for
Token ID329
Feature activation-6.12e-03

Pos logit contributions
lessness+0.022
he+0.021
ill+0.018
smanship+0.018
igsaw+0.018
Neg logit contributions
 asks-0.017
 walks-0.016
 knocks-0.016
 trips-0.015
 points-0.015
Token giving
Token ID3501
Feature activation-1.01e-02

Pos logit contributions
 asks+0.010
 knocks+0.009
 trips+0.009
 walks+0.009
 points+0.008
Neg logit contributions
lessness-0.013
he-0.012
 Suz-0.011
ill-0.011
smanship-0.010
Token "
Token ID366
Feature activation+2.94e-02

Pos logit contributions
lessness+0.004
he+0.004
 Suz+0.003
ill+0.003
taker+0.003
Neg logit contributions
 asks-0.003
 knocks-0.003
 walks-0.003
 trips-0.003
 throws-0.003
TokenIt
Token ID1026
Feature activation+1.19e-02

Pos logit contributions
lessness+0.061
he+0.054
ill+0.054
 Suz+0.053
smanship+0.051
Neg logit contributions
 asks-0.045
 walks-0.039
 points-0.037
Go-0.037
 requests-0.036
Token was
Token ID373
Feature activation+6.65e-03

Pos logit contributions
lessness+0.032
he+0.030
ill+0.028
 Suz+0.028
smanship+0.027
Neg logit contributions
 asks-0.012
 walks-0.010
 knocks-0.010
 trips-0.010
 points-0.009
Token my
Token ID616
Feature activation-5.11e-03

Pos logit contributions
lessness+0.015
he+0.014
ill+0.013
smanship+0.012
 Suz+0.012
Neg logit contributions
 asks-0.010
 walks-0.009
 knocks-0.009
 trips-0.009
 points-0.008
Token pleasure
Token ID9476
Feature activation+4.14e-03

Pos logit contributions
 asks+0.010
 walks+0.010
 knocks+0.010
 trips+0.009
 points+0.009
Neg logit contributions
lessness-0.009
he-0.009
ill-0.008
 Suz-0.007
smanship-0.007
Token"
Token ID1
Feature activation+9.03e-02

Pos logit contributions
he+0.010
lessness+0.010
ill+0.009
smanship+0.009
 Suz+0.008
Neg logit contributions
 asks-0.006
 knocks-0.005
 walks-0.005
 points-0.005
 trips-0.004
Token The
Token ID383
Feature activation+5.07e-04

Pos logit contributions
lessness+0.211
he+0.196
smanship+0.185
 Suz+0.184
ill+0.184
Neg logit contributions
 asks-0.119
 walks-0.097
 trips-0.093
 points-0.091
 knocks-0.088
Token little
Token ID1310
Feature activation-7.93e-04

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.001
 knocks-0.001
 walks-0.001
 trips-0.001
 points-0.001
Token boy
Token ID2933
Feature activation-6.62e-03

Pos logit contributions
 asks+0.001
 walks+0.001
 knocks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
 Suz-0.001
smanship-0.001
Token then
Token ID788
Feature activation+1.96e-02

Pos logit contributions
 asks+0.009
 walks+0.008
 knocks+0.008
 trips+0.008
 points+0.007
Neg logit contributions
lessness-0.016
he-0.015
ill-0.014
 Suz-0.013
smanship-0.013
Token went
Token ID1816
Feature activation-2.03e-03

Pos logit contributions
lessness+0.046
he+0.044
 Suz+0.040
ill+0.040
smanship+0.038
Neg logit contributions
 asks-0.024
 knocks-0.023
 trips-0.022
 walks-0.021
 points-0.021
Token he
Token ID339
Feature activation-4.97e-03

Pos logit contributions
lessness+0.026
he+0.024
ill+0.022
 Suz+0.021
smanship+0.021
Neg logit contributions
 asks-0.020
 knocks-0.018
 walks-0.018
 trips-0.018
 points-0.017
Token said
Token ID531
Feature activation+1.40e-02

Pos logit contributions
 asks+0.007
 knocks+0.007
 walks+0.007
 trips+0.007
 points+0.006
Neg logit contributions
lessness-0.011
he-0.010
ill-0.010
 Suz-0.009
smanship-0.009
Token "
Token ID366
Feature activation+3.31e-02

Pos logit contributions
lessness+0.028
he+0.026
ill+0.024
 Suz+0.023
smanship+0.022
Neg logit contributions
 asks-0.024
 knocks-0.022
 walks-0.022
 trips-0.021
 points-0.020
TokenThank
Token ID10449
Feature activation+6.97e-03

Pos logit contributions
lessness+0.067
he+0.060
 Suz+0.056
ill+0.056
smanship+0.054
Neg logit contributions
 asks-0.054
 walks-0.049
 points-0.047
 trips-0.047
 knocks-0.046
Token you
Token ID345
Feature activation+2.90e-02

Pos logit contributions
lessness+0.016
he+0.015
ill+0.014
 Suz+0.013
smanship+0.013
Neg logit contributions
 asks-0.010
 knocks-0.009
 walks-0.009
 trips-0.009
 points-0.008
Token"
Token ID1
Feature activation+9.02e-02

Pos logit contributions
lessness+0.062
he+0.061
smanship+0.055
ill+0.053
imm+0.052
Neg logit contributions
 asks-0.038
 trips-0.035
 walks-0.035
 points-0.033
 strangers-0.033
Token and
Token ID290
Feature activation-8.31e-03

Pos logit contributions
lessness+0.190
he+0.172
 Suz+0.163
smanship+0.161
igsaw+0.158
Neg logit contributions
 asks-0.147
 points-0.123
 walks-0.122
 trips-0.120
 knocks-0.110
Token put
Token ID1234
Feature activation-2.73e-04

Pos logit contributions
 asks+0.012
 trips+0.012
 knocks+0.012
 walks+0.012
 strangers+0.011
Neg logit contributions
lessness-0.018
he-0.016
 Suz-0.015
ill-0.015
taker-0.014
Token it
Token ID340
Feature activation+1.98e-02

Pos logit contributions
 asks+0.000
 walks+0.000
 trips+0.000
 points+0.000
 knocks+0.000
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
smanship-0.001
 Suz-0.001
Token inside
Token ID2641
Feature activation-1.44e-03

Pos logit contributions
lessness+0.040
he+0.039
ill+0.036
smanship+0.034
 Suz+0.034
Neg logit contributions
 asks-0.031
 walks-0.030
 trips-0.029
 knocks-0.028
 points-0.028
Token his
Token ID465
Feature activation-9.99e-03

Pos logit contributions
 asks+0.002
 walks+0.002
 trips+0.002
 knocks+0.002
 points+0.002
Neg logit contributions
lessness-0.003
he-0.003
ill-0.003
 Suz-0.003
smanship-0.003
Token said
Token ID531
Feature activation+2.20e-03

Pos logit contributions
lessness+0.084
he+0.079
ill+0.073
 Suz+0.071
smanship+0.068
Neg logit contributions
 asks-0.058
 knocks-0.053
 walks-0.053
 trips-0.051
 points-0.049
Token "
Token ID366
Feature activation+4.17e-02

Pos logit contributions
lessness+0.005
he+0.004
ill+0.004
 Suz+0.004
smanship+0.004
Neg logit contributions
 asks-0.003
 knocks-0.003
 walks-0.003
 trips-0.003
 points-0.003
TokenSafe
Token ID31511
Feature activation-2.63e-03

Pos logit contributions
lessness+0.084
he+0.077
ill+0.071
 Suz+0.069
smanship+0.068
Neg logit contributions
 asks-0.070
 walks-0.064
 knocks-0.062
 trips-0.061
 points-0.060
Token to
Token ID284
Feature activation+1.07e-02

Pos logit contributions
 asks+0.005
 knocks+0.005
 walks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.005
he-0.005
ill-0.004
 Suz-0.004
smanship-0.004
Token Eat
Token ID27574
Feature activation-1.46e-02

Pos logit contributions
lessness+0.020
he+0.019
 Suz+0.018
ill+0.018
smanship+0.016
Neg logit contributions
 asks-0.018
 knocks-0.017
 trips-0.016
 points-0.016
 walks-0.016
Token"
Token ID1
Feature activation+9.00e-02

Pos logit contributions
 asks+0.028
 knocks+0.026
 walks+0.025
 trips+0.023
 points+0.023
Neg logit contributions
he-0.025
lessness-0.024
ill-0.021
 Suz-0.020
smanship-0.019
Token and
Token ID290
Feature activation-1.54e-02

Pos logit contributions
lessness+0.185
he+0.168
ill+0.160
 Suz+0.158
smanship+0.157
Neg logit contributions
 asks-0.140
 walks-0.124
 trips-0.120
 knocks-0.119
 points-0.115
Token knew
Token ID2993
Feature activation-1.52e-02

Pos logit contributions
 asks+0.023
 knocks+0.022
 trips+0.021
 walks+0.021
 points+0.020
Neg logit contributions
lessness-0.033
he-0.031
ill-0.028
 Suz-0.028
smanship-0.026
Token the
Token ID262
Feature activation-2.67e-03

Pos logit contributions
 asks+0.026
 knocks+0.024
 walks+0.024
 trips+0.023
 points+0.022
Neg logit contributions
lessness-0.030
he-0.028
ill-0.026
 Suz-0.026
smanship-0.024
Token berries
Token ID36322
Feature activation-1.25e-02

Pos logit contributions
 asks+0.005
 walks+0.004
 knocks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.005
he-0.005
ill-0.005
 Suz-0.004
smanship-0.004
Token were
Token ID547
Feature activation+6.79e-03

Pos logit contributions
 asks+0.019
 walks+0.018
 knocks+0.018
 trips+0.017
 points+0.017
Neg logit contributions
lessness-0.026
he-0.024
ill-0.023
 Suz-0.022
smanship-0.021
Token prince
Token ID19716
Feature activation-2.38e-02

Pos logit contributions
 asks+0.002
 knocks+0.002
 walks+0.002
 trips+0.002
 throws+0.001
Neg logit contributions
lessness-0.002
he-0.002
ill-0.001
smanship-0.001
 Suz-0.001
Token said
Token ID531
Feature activation+3.19e-03

Pos logit contributions
 asks+0.033
 knocks+0.029
 walks+0.029
 trips+0.028
 points+0.027
Neg logit contributions
lessness-0.055
he-0.052
ill-0.048
 Suz-0.046
smanship-0.045
Token "
Token ID366
Feature activation+2.53e-02

Pos logit contributions
lessness+0.006
he+0.005
 Suz+0.005
ill+0.005
taker+0.005
Neg logit contributions
 asks-0.006
 knocks-0.006
 walks-0.005
 trips-0.005
 points-0.005
Tokenthank
Token ID40716
Feature activation+1.31e-02

Pos logit contributions
lessness+0.053
he+0.048
ill+0.046
 Suz+0.044
smanship+0.043
Neg logit contributions
 asks-0.041
 walks-0.036
 knocks-0.035
 trips-0.035
 points-0.034
Token you
Token ID345
Feature activation+2.50e-02

Pos logit contributions
lessness+0.030
he+0.029
ill+0.027
 Suz+0.026
smanship+0.025
Neg logit contributions
 asks-0.018
 knocks-0.017
 walks-0.017
 trips-0.016
 points-0.015
Token"
Token ID1
Feature activation+8.96e-02

Pos logit contributions
lessness+0.061
he+0.057
ill+0.055
smanship+0.054
 Suz+0.054
Neg logit contributions
 asks-0.028
 trips-0.024
 walks-0.024
 knocks-0.024
 strangers-0.023
Token over
Token ID625
Feature activation-1.12e-02

Pos logit contributions
lessness+0.207
he+0.189
smanship+0.180
ill+0.178
 Suz+0.177
Neg logit contributions
 asks-0.123
 walks-0.102
 trips-0.099
 points-0.097
 knocks-0.094
Token and
Token ID290
Feature activation-1.86e-03

Pos logit contributions
 asks+0.019
 walks+0.018
 knocks+0.018
 trips+0.017
 points+0.016
Neg logit contributions
lessness-0.023
he-0.021
ill-0.020
 Suz-0.019
smanship-0.018
Token over
Token ID625
Feature activation-2.31e-02

Pos logit contributions
 asks+0.002
 trips+0.002
 walks+0.002
 knocks+0.002
 strangers+0.002
Neg logit contributions
lessness-0.004
he-0.004
 Suz-0.004
ill-0.004
taker-0.003
Token again
Token ID757
Feature activation-2.47e-02

Pos logit contributions
 asks+0.032
 walks+0.031
 trips+0.029
 knocks+0.028
 points+0.027
Neg logit contributions
lessness-0.052
he-0.051
smanship-0.048
ini-0.046
imm-0.046
Token.
Token ID13
Feature activation-9.24e-04

Pos logit contributions
 asks+0.038
 knocks+0.035
 walks+0.035
 trips+0.034
 points+0.032
Neg logit contributions
lessness-0.054
he-0.051
ill-0.047
 Suz-0.045
smanship-0.044
Token bird
Token ID6512
Feature activation+3.71e-04

Pos logit contributions
lessness+0.012
he+0.011
ill+0.010
 Suz+0.010
smanship+0.010
Neg logit contributions
 asks-0.008
 knocks-0.007
 walks-0.007
 trips-0.007
 points-0.007
Token said
Token ID531
Feature activation+8.36e-03

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
igsaw+0.001
Neg logit contributions
 asks-0.000
 walks-0.000
 knocks-0.000
 trips-0.000
 points-0.000
Token "
Token ID366
Feature activation+2.64e-02

Pos logit contributions
lessness+0.017
he+0.015
ill+0.014
 Suz+0.014
smanship+0.013
Neg logit contributions
 asks-0.014
 knocks-0.013
 walks-0.013
 trips-0.013
 points-0.012
Tokenthank
Token ID40716
Feature activation+1.71e-02

Pos logit contributions
lessness+0.055
 Suz+0.046
he+0.046
ill+0.046
igsaw+0.045
Neg logit contributions
 asks-0.041
 walks-0.036
Go-0.034
 points-0.034
 trips-0.033
Token you
Token ID345
Feature activation+2.47e-02

Pos logit contributions
lessness+0.043
he+0.041
ill+0.039
smanship+0.037
 Suz+0.037
Neg logit contributions
 asks-0.021
 walks-0.019
 trips-0.018
 knocks-0.018
 points-0.017
Token"
Token ID1
Feature activation+8.94e-02

Pos logit contributions
lessness+0.063
he+0.059
smanship+0.056
ill+0.055
 Suz+0.054
Neg logit contributions
 asks-0.025
 trips-0.022
 walks-0.022
 knocks-0.022
 points-0.020
Token to
Token ID284
Feature activation+9.64e-03

Pos logit contributions
lessness+0.223
he+0.208
smanship+0.197
ill+0.193
 Suz+0.190
Neg logit contributions
 asks-0.105
 walks-0.084
 trips-0.083
 points-0.080
 knocks-0.079
Token its
Token ID663
Feature activation-7.76e-03

Pos logit contributions
lessness+0.020
he+0.019
ill+0.017
 Suz+0.016
smanship+0.016
Neg logit contributions
 asks-0.016
 knocks-0.015
 walks-0.015
 trips-0.014
 points-0.014
Token friends
Token ID2460
Feature activation+1.54e-02

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.010
 points+0.009
Neg logit contributions
lessness-0.018
he-0.016
ill-0.015
 Suz-0.015
smanship-0.015
Token.
Token ID13
Feature activation+1.52e-03

Pos logit contributions
lessness+0.036
he+0.034
ill+0.032
 Suz+0.031
smanship+0.030
Neg logit contributions
 asks-0.021
 walks-0.019
 knocks-0.018
 trips-0.018
 points-0.017
Token They
Token ID1119
Feature activation+1.07e-02

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.002
 knocks-0.002
 walks-0.002
 trips-0.002
 points-0.002
Token family
Token ID1641
Feature activation+2.70e-03

Pos logit contributions
lessness+0.005
he+0.005
ill+0.004
 Suz+0.004
smanship+0.004
Neg logit contributions
 asks-0.003
 knocks-0.003
 walks-0.003
 trips-0.003
 points-0.003
Token said
Token ID531
Feature activation+3.32e-03

Pos logit contributions
lessness+0.007
he+0.006
ill+0.006
 Suz+0.006
smanship+0.006
Neg logit contributions
 asks-0.003
 knocks-0.003
 walks-0.003
 trips-0.003
 points-0.003
Token "
Token ID366
Feature activation+3.75e-02

Pos logit contributions
lessness+0.007
he+0.006
ill+0.006
 Suz+0.005
taker+0.005
Neg logit contributions
 asks-0.006
 knocks-0.005
 walks-0.005
 trips-0.005
 points-0.005
Tokenthank
Token ID40716
Feature activation+1.76e-02

Pos logit contributions
lessness+0.071
he+0.063
ill+0.058
 Suz+0.057
smanship+0.056
Neg logit contributions
 asks-0.066
 walks-0.060
 knocks-0.058
 trips-0.058
 points-0.058
Token you
Token ID345
Feature activation+2.04e-02

Pos logit contributions
lessness+0.039
he+0.037
ill+0.035
 Suz+0.033
smanship+0.032
Neg logit contributions
 asks-0.026
 knocks-0.023
 walks-0.023
 trips-0.022
 points-0.021
Token"
Token ID1
Feature activation+8.92e-02

Pos logit contributions
lessness+0.049
he+0.046
ill+0.043
smanship+0.042
 Suz+0.042
Neg logit contributions
 asks-0.025
 knocks-0.022
 walks-0.022
 trips-0.022
 points-0.021
Token before
Token ID878
Feature activation+5.35e-04

Pos logit contributions
lessness+0.206
he+0.191
smanship+0.178
 Suz+0.176
ill+0.173
Neg logit contributions
 asks-0.120
 walks-0.100
 points-0.098
 trips-0.098
 knocks-0.095
Token they
Token ID484
Feature activation+1.93e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.001
 knocks-0.001
 walks-0.001
 trips-0.001
 points-0.001
Token left
Token ID1364
Feature activation-1.13e-02

Pos logit contributions
lessness+0.046
he+0.045
ill+0.041
 Suz+0.039
smanship+0.038
Neg logit contributions
 asks-0.024
 knocks-0.023
 walks-0.022
 trips-0.022
 points-0.021
Token.
Token ID13
Feature activation-8.94e-03

Pos logit contributions
 asks+0.017
 knocks+0.016
 walks+0.016
 trips+0.015
 points+0.015
Neg logit contributions
lessness-0.024
he-0.023
ill-0.021
 Suz-0.020
smanship-0.020
Token They
Token ID1119
Feature activation+7.29e-03

Pos logit contributions
 asks+0.015
 walks+0.014
 knocks+0.014
 trips+0.014
 points+0.013
Neg logit contributions
lessness-0.018
he-0.017
ill-0.015
 Suz-0.014
smanship-0.014
Token,
Token ID11
Feature activation+8.67e-03

Pos logit contributions
 asks+0.024
 knocks+0.022
 walks+0.022
 trips+0.021
 points+0.020
Neg logit contributions
lessness-0.030
he-0.028
ill-0.026
 Suz-0.025
smanship-0.024
Token sauce
Token ID10746
Feature activation-1.59e-02

Pos logit contributions
lessness+0.019
he+0.018
ill+0.017
 Suz+0.016
smanship+0.016
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.012
 trips-0.011
 points-0.011
Token,
Token ID11
Feature activation+1.67e-04

Pos logit contributions
 asks+0.031
 walks+0.029
 knocks+0.028
 trips+0.027
 requests+0.026
Neg logit contributions
lessness-0.029
he-0.027
ill-0.025
 Suz-0.022
smanship-0.022
Token cheese
Token ID9891
Feature activation-2.66e-02

Pos logit contributions
lessness+0.000
he+0.000
ill+0.000
 Suz+0.000
smanship+0.000
Neg logit contributions
 asks-0.000
 knocks-0.000
 walks-0.000
 trips-0.000
 points-0.000
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
 asks+0.054
 walks+0.050
 knocks+0.049
 trips+0.048
 requests+0.047
Neg logit contributions
lessness-0.043
he-0.041
ill-0.038
 Suz-0.033
smanship-0.032
Token topp
Token ID23126
Feature activation-8.14e-02

Pos logit contributions
 asks+0.018
 knocks+0.017
 trips+0.017
 walks+0.017
 strangers+0.017
Neg logit contributions
lessness-0.020
he-0.018
ill-0.018
 Suz-0.017
smanship-0.016
Tokenings
Token ID654
Feature activation-1.66e-02

Pos logit contributions
 asks+0.157
 walks+0.150
 knocks+0.145
 strangers+0.139
 trips+0.138
Neg logit contributions
lessness-0.139
he-0.136
ill-0.130
 Suz-0.115
smanship-0.110
Token.
Token ID13
Feature activation-2.92e-03

Pos logit contributions
 asks+0.028
 knocks+0.026
 walks+0.025
 trips+0.025
 points+0.024
Neg logit contributions
lessness-0.034
he-0.032
ill-0.029
 Suz-0.028
smanship-0.026
Token She
Token ID1375
Feature activation-3.96e-03

Pos logit contributions
 asks+0.005
 walks+0.004
 knocks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
 Suz-0.005
smanship-0.005
Token showed
Token ID3751
Feature activation-1.01e-02

Pos logit contributions
 strangers+0.006
 walks+0.006
 asks+0.006
 trips+0.006
 knocks+0.005
Neg logit contributions
he-0.007
lessness-0.007
ill-0.007
 Suz-0.007
smanship-0.006
Token them
Token ID606
Feature activation-1.13e-02

Pos logit contributions
 asks+0.012
 knocks+0.011
 walks+0.011
 trips+0.010
 points+0.010
Neg logit contributions
lessness-0.025
he-0.024
ill-0.023
 Suz-0.022
taker-0.021
Token of
Token ID286
Feature activation-2.76e-03

Pos logit contributions
lessness+0.007
he+0.007
ill+0.006
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.016
 walks-0.015
 knocks-0.015
 trips-0.015
 points-0.014
Token cheese
Token ID9891
Feature activation-1.25e-02

Pos logit contributions
 asks+0.004
 knocks+0.004
 walks+0.003
 trips+0.003
 strangers+0.003
Neg logit contributions
lessness-0.006
he-0.005
 Suz-0.005
taker-0.004
 sorts-0.004
Token and
Token ID290
Feature activation-1.06e-03

Pos logit contributions
 asks+0.024
 walks+0.023
 knocks+0.022
 requests+0.022
 trips+0.021
Neg logit contributions
lessness-0.022
he-0.021
ill-0.019
 Suz-0.016
taker-0.016
Token sauce
Token ID10746
Feature activation-8.60e-03

Pos logit contributions
 asks+0.002
 knocks+0.002
 trips+0.001
 walks+0.001
 requests+0.001
Neg logit contributions
lessness-0.002
he-0.002
 Suz-0.002
ill-0.002
smanship-0.002
Token and
Token ID290
Feature activation-8.81e-03

Pos logit contributions
 asks+0.016
 walks+0.015
 knocks+0.014
 requests+0.014
 trips+0.014
Neg logit contributions
lessness-0.016
he-0.015
ill-0.014
taker-0.012
 Suz-0.011
Token topp
Token ID23126
Feature activation-8.09e-02

Pos logit contributions
 asks+0.014
 knocks+0.013
 trips+0.012
 requests+0.012
 walks+0.012
Neg logit contributions
lessness-0.016
he-0.014
 Suz-0.014
ill-0.013
smanship-0.012
Tokenings
Token ID654
Feature activation-1.62e-02

Pos logit contributions
 asks+0.161
 walks+0.153
 knocks+0.149
 strangers+0.143
 trips+0.138
Neg logit contributions
he-0.128
lessness-0.127
ill-0.121
ay-0.107
 Suz-0.102
Token.
Token ID13
Feature activation-1.77e-02

Pos logit contributions
 asks+0.030
 knocks+0.027
 walks+0.027
 requests+0.026
 trips+0.025
Neg logit contributions
lessness-0.029
he-0.028
ill-0.024
free-0.022
 Suz-0.022
Token The
Token ID383
Feature activation-8.11e-03

Pos logit contributions
 asks+0.029
 walks+0.028
 trips+0.027
 knocks+0.027
 points+0.026
Neg logit contributions
lessness-0.035
he-0.032
ill-0.030
 Suz-0.029
smanship-0.027
Token people
Token ID661
Feature activation-1.40e-02

Pos logit contributions
 asks+0.012
 walks+0.011
 knocks+0.011
 trips+0.010
 points+0.010
Neg logit contributions
lessness-0.018
he-0.017
ill-0.016
 Suz-0.015
smanship-0.015
Token who
Token ID508
Feature activation-9.18e-05

Pos logit contributions
 asks+0.024
 walks+0.022
 knocks+0.022
 trips+0.021
 points+0.020
Neg logit contributions
lessness-0.027
he-0.027
ill-0.023
 Suz-0.022
smanship-0.021
Token saw
Token ID2497
Feature activation+4.41e-03

Pos logit contributions
 asks+0.003
 walks+0.003
 knocks+0.003
 trips+0.003
 points+0.003
Neg logit contributions
lessness-0.004
he-0.004
ill-0.004
 Suz-0.003
smanship-0.003
Token five
Token ID1936
Feature activation-1.85e-02

Pos logit contributions
lessness+0.009
he+0.008
ill+0.008
 Suz+0.007
taker+0.007
Neg logit contributions
 asks-0.007
 knocks-0.007
 walks-0.006
 trips-0.006
 requests-0.006
Token fish
Token ID5916
Feature activation-2.37e-02

Pos logit contributions
 asks+0.029
 knocks+0.027
 walks+0.026
 trips+0.025
 requests+0.024
Neg logit contributions
lessness-0.039
he-0.036
ill-0.034
 Suz-0.032
smanship-0.031
Token and
Token ID290
Feature activation-1.12e-02

Pos logit contributions
 asks+0.043
 knocks+0.040
 walks+0.040
 trips+0.038
 requests+0.037
Neg logit contributions
lessness-0.045
he-0.042
ill-0.039
 Suz-0.036
smanship-0.035
Token seven
Token ID3598
Feature activation-2.33e-02

Pos logit contributions
 asks+0.017
 requests+0.016
 knocks+0.016
 trips+0.015
 walks+0.015
Neg logit contributions
lessness-0.021
ill-0.020
 Suz-0.019
he-0.018
taker-0.017
Token seaw
Token ID32085
Feature activation-7.55e-02

Pos logit contributions
 asks+0.037
 knocks+0.036
 dial+0.032
 knocked+0.032
 walks+0.032
Neg logit contributions
lessness-0.044
ill-0.041
he-0.040
 Suz-0.035
smanship-0.033
Tokeneed
Token ID2308
Feature activation-4.40e-02

Pos logit contributions
 asks+0.136
 knocks+0.126
 walks+0.123
 trips+0.118
 strangers+0.114
Neg logit contributions
lessness-0.140
he-0.131
ill-0.128
 Suz-0.113
smanship-0.110
Token.
Token ID13
Feature activation-1.38e-02

Pos logit contributions
 asks+0.079
 walks+0.073
 knocks+0.072
 trips+0.070
 requests+0.067
Neg logit contributions
lessness-0.084
he-0.079
ill-0.076
 Suz-0.066
smanship-0.065
Token They
Token ID1119
Feature activation-1.54e-03

Pos logit contributions
 asks+0.021
 walks+0.020
 knocks+0.019
 trips+0.019
 points+0.018
Neg logit contributions
lessness-0.030
he-0.028
ill-0.026
 Suz-0.025
smanship-0.024
Token were
Token ID547
Feature activation-7.49e-03

Pos logit contributions
 asks+0.002
 walks+0.002
 knocks+0.002
 trips+0.002
 points+0.002
Neg logit contributions
lessness-0.003
he-0.003
ill-0.003
 Suz-0.003
smanship-0.003
Token happy
Token ID3772
Feature activation-5.29e-03

Pos logit contributions
 asks+0.014
 knocks+0.013
 walks+0.013
 trips+0.013
 points+0.012
Neg logit contributions
lessness-0.014
he-0.013
ill-0.012
 Suz-0.011
smanship-0.011
Token added
Token ID2087
Feature activation-1.81e-02

Pos logit contributions
 asks+0.005
 knocks+0.004
 walks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.008
he-0.007
ill-0.007
 Suz-0.007
smanship-0.007
Token shells
Token ID19679
Feature activation-6.29e-04

Pos logit contributions
 asks+0.024
 walks+0.022
 knocks+0.022
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.043
he-0.041
ill-0.037
 Suz-0.036
smanship-0.036
Token and
Token ID290
Feature activation-9.33e-03

Pos logit contributions
 asks+0.001
 walks+0.001
 trips+0.001
 points+0.001
 knocks+0.001
Neg logit contributions
lessness-0.001
he-0.001
smanship-0.001
 Suz-0.001
ill-0.001
Token stones
Token ID14966
Feature activation-1.53e-03

Pos logit contributions
 asks+0.016
 knocks+0.015
 trips+0.014
 walks+0.014
 requests+0.014
Neg logit contributions
lessness-0.017
ill-0.016
he-0.015
 Suz-0.015
smanship-0.014
Token and
Token ID290
Feature activation-1.77e-02

Pos logit contributions
 asks+0.002
 walks+0.002
 knocks+0.002
 trips+0.002
 points+0.002
Neg logit contributions
lessness-0.003
he-0.003
ill-0.003
 Suz-0.003
smanship-0.003
Token seaw
Token ID32085
Feature activation-7.43e-02

Pos logit contributions
 asks+0.029
 knocks+0.027
 trips+0.025
 walks+0.025
 requests+0.025
Neg logit contributions
lessness-0.035
ill-0.031
he-0.031
 Suz-0.029
smanship-0.027
Tokeneed
Token ID2308
Feature activation-3.45e-02

Pos logit contributions
 asks+0.130
 knocks+0.120
 walks+0.115
 trips+0.112
 strangers+0.111
Neg logit contributions
lessness-0.143
he-0.134
ill-0.129
 Suz-0.117
smanship-0.113
Token to
Token ID284
Feature activation-2.07e-02

Pos logit contributions
 asks+0.062
 knocks+0.057
 walks+0.056
 trips+0.055
 requests+0.052
Neg logit contributions
lessness-0.066
he-0.062
ill-0.058
 Suz-0.053
smanship-0.051
Token make
Token ID787
Feature activation-2.88e-03

Pos logit contributions
 asks+0.028
 walks+0.026
 knocks+0.025
 trips+0.024
 points+0.023
Neg logit contributions
lessness-0.051
he-0.046
ill-0.041
smanship-0.041
taker-0.041
Token it
Token ID340
Feature activation+5.63e-03

Pos logit contributions
 asks+0.004
 knocks+0.004
 walks+0.004
 trips+0.004
 points+0.003
Neg logit contributions
lessness-0.006
he-0.006
ill-0.006
 Suz-0.005
smanship-0.005
Token look
Token ID804
Feature activation+5.91e-03

Pos logit contributions
lessness+0.012
he+0.012
ill+0.010
 Suz+0.010
smanship+0.010
Neg logit contributions
 asks-0.009
 walks-0.008
 knocks-0.008
 trips-0.008
 points-0.008
Token of
Token ID286
Feature activation-9.79e-03

Pos logit contributions
 asks+0.004
 walks+0.004
 points+0.004
 trips+0.003
 knocks+0.003
Neg logit contributions
he-0.003
lessness-0.003
ill-0.002
smanship-0.002
 Suz-0.002
Token shells
Token ID19679
Feature activation+5.74e-03

Pos logit contributions
 asks+0.016
 knocks+0.015
 walks+0.014
 trips+0.014
 throws+0.013
Neg logit contributions
lessness-0.020
he-0.019
ill-0.017
 Suz-0.017
smanship-0.016
Token,
Token ID11
Feature activation+1.17e-02

Pos logit contributions
lessness+0.012
he+0.011
ill+0.010
 Suz+0.010
smanship+0.010
Neg logit contributions
 asks-0.009
 walks-0.009
 trips-0.008
 knocks-0.008
 points-0.008
Token rocks
Token ID12586
Feature activation-3.41e-03

Pos logit contributions
lessness+0.025
he+0.023
ill+0.021
 Suz+0.020
smanship+0.020
Neg logit contributions
 asks-0.019
 walks-0.017
 knocks-0.017
 trips-0.017
 points-0.016
Token and
Token ID290
Feature activation-5.37e-03

Pos logit contributions
 asks+0.006
 walks+0.005
 knocks+0.005
 trips+0.005
 points+0.005
Neg logit contributions
lessness-0.007
he-0.006
ill-0.006
 Suz-0.006
smanship-0.006
Token seaw
Token ID32085
Feature activation-7.08e-02

Pos logit contributions
 asks+0.009
 knocks+0.009
 walks+0.008
 trips+0.008
 requests+0.008
Neg logit contributions
lessness-0.010
he-0.010
ill-0.009
 Suz-0.009
smanship-0.008
Tokeneed
Token ID2308
Feature activation-2.91e-02

Pos logit contributions
 asks+0.135
 knocks+0.126
 walks+0.117
 strangers+0.114
 trips+0.113
Neg logit contributions
lessness-0.125
he-0.117
ill-0.115
 Suz-0.100
smanship-0.099
Token,
Token ID11
Feature activation+6.71e-03

Pos logit contributions
 asks+0.048
 knocks+0.044
 walks+0.044
 trips+0.043
 points+0.041
Neg logit contributions
lessness-0.060
he-0.056
ill-0.052
 Suz-0.049
smanship-0.048
Token but
Token ID475
Feature activation-4.73e-04

Pos logit contributions
lessness+0.013
he+0.012
ill+0.011
 Suz+0.010
smanship+0.010
Neg logit contributions
 asks-0.012
 walks-0.011
 knocks-0.011
 trips-0.011
 points-0.011
Token Jane
Token ID12091
Feature activation-7.81e-03

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
 Suz-0.001
smanship-0.001
Token was
Token ID373
Feature activation+3.80e-03

Pos logit contributions
 asks+0.012
 knocks+0.011
 walks+0.011
 trips+0.011
 points+0.010
Neg logit contributions
lessness-0.016
he-0.016
ill-0.015
 Suz-0.014
smanship-0.013
Tokenate
Token ID378
Feature activation-1.22e-02

Pos logit contributions
lessness+0.007
he+0.007
ill+0.006
 Suz+0.006
smanship+0.006
Neg logit contributions
 asks-0.004
 knocks-0.004
 walks-0.004
 trips-0.004
 points-0.003
Token it
Token ID340
Feature activation+8.72e-03

Pos logit contributions
 asks+0.014
 walks+0.013
 knocks+0.013
 trips+0.012
 points+0.012
Neg logit contributions
lessness-0.031
he-0.030
ill-0.027
smanship-0.027
 Suz-0.027
Token with
Token ID351
Feature activation-2.36e-02

Pos logit contributions
lessness+0.018
he+0.017
ill+0.015
 Suz+0.015
smanship+0.014
Neg logit contributions
 asks-0.015
 walks-0.013
 knocks-0.013
 trips-0.013
 points-0.013
Token shells
Token ID19679
Feature activation-1.26e-03

Pos logit contributions
 asks+0.037
 knocks+0.034
 walks+0.032
 trips+0.031
 requests+0.030
Neg logit contributions
lessness-0.050
he-0.047
ill-0.045
 Suz-0.044
smanship-0.040
Token and
Token ID290
Feature activation-1.80e-02

Pos logit contributions
 asks+0.002
 knocks+0.002
 walks+0.002
 trips+0.002
 points+0.002
Neg logit contributions
lessness-0.003
he-0.002
ill-0.002
 Suz-0.002
smanship-0.002
Token seaw
Token ID32085
Feature activation-7.08e-02

Pos logit contributions
 asks+0.030
 knocks+0.027
 trips+0.026
 walks+0.025
 requests+0.025
Neg logit contributions
lessness-0.035
ill-0.031
he-0.031
 Suz-0.030
smanship-0.027
Tokeneed
Token ID2308
Feature activation-3.76e-02

Pos logit contributions
 asks+0.138
 knocks+0.130
 strangers+0.122
 walks+0.121
 trips+0.117
Neg logit contributions
lessness-0.120
he-0.111
ill-0.109
ay-0.094
 Suz-0.093
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 asks+0.068
 knocks+0.063
 walks+0.062
 trips+0.060
 strangers+0.058
Neg logit contributions
lessness-0.071
he-0.065
ill-0.063
 Suz-0.055
taker-0.053
Token They
Token ID1119
Feature activation-5.11e-03

Pos logit contributions
 asks+0.024
 walks+0.022
 knocks+0.022
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.031
he-0.029
ill-0.027
 Suz-0.026
smanship-0.024
Token find
Token ID1064
Feature activation-1.60e-02

Pos logit contributions
 asks+0.008
 knocks+0.007
 walks+0.007
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.011
he-0.011
ill-0.010
 Suz-0.010
smanship-0.009
Token a
Token ID257
Feature activation-1.29e-02

Pos logit contributions
 asks+0.023
 knocks+0.021
 walks+0.020
 trips+0.020
 points+0.019
Neg logit contributions
lessness-0.037
he-0.035
ill-0.032
 Suz-0.031
smanship-0.030
Token they
Token ID484
Feature activation+7.88e-03

Pos logit contributions
lessness+0.014
he+0.013
ill+0.012
 Suz+0.012
smanship+0.011
Neg logit contributions
 asks-0.013
 walks-0.012
 knocks-0.012
 trips-0.011
 points-0.011
Token can
Token ID460
Feature activation-6.29e-03

Pos logit contributions
lessness+0.018
he+0.016
ill+0.016
 Suz+0.016
smanship+0.015
Neg logit contributions
 asks-0.012
 knocks-0.011
 walks-0.010
 trips-0.010
 throws-0.009
Token add
Token ID751
Feature activation-1.82e-02

Pos logit contributions
 asks+0.009
 knocks+0.008
 walks+0.008
 trips+0.008
 points+0.007
Neg logit contributions
lessness-0.015
he-0.014
ill-0.013
 Suz-0.012
smanship-0.012
Token their
Token ID511
Feature activation-8.01e-03

Pos logit contributions
 asks+0.028
 knocks+0.026
 walks+0.025
 trips+0.024
 points+0.023
Neg logit contributions
lessness-0.038
he-0.037
ill-0.034
 Suz-0.032
smanship-0.031
Token own
Token ID898
Feature activation-2.36e-02

Pos logit contributions
 asks+0.013
 knocks+0.011
 walks+0.011
 strangers+0.011
 trips+0.010
Neg logit contributions
lessness-0.015
he-0.014
ill-0.013
smanship-0.013
 Suz-0.012
Token topp
Token ID23126
Feature activation-7.05e-02

Pos logit contributions
 asks+0.040
 knocks+0.034
 walks+0.034
 strangers+0.033
 knocked+0.031
Neg logit contributions
lessness-0.042
he-0.041
ill-0.038
smanship-0.037
ay-0.035
Tokenings
Token ID654
Feature activation+1.20e-02

Pos logit contributions
 asks+0.134
 walks+0.129
 knocks+0.125
 trips+0.118
 strangers+0.118
Neg logit contributions
lessness-0.121
he-0.120
ill-0.114
smanship-0.101
 Suz-0.098
Token too
Token ID1165
Feature activation-5.65e-03

Pos logit contributions
lessness+0.024
he+0.023
ill+0.021
 Suz+0.020
smanship+0.020
Neg logit contributions
 asks-0.020
 walks-0.018
 knocks-0.018
 trips-0.018
 points-0.017
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
 asks+0.009
 walks+0.008
 knocks+0.008
 trips+0.008
 points+0.008
Neg logit contributions
lessness-0.012
he-0.011
ill-0.010
 Suz-0.010
smanship-0.010
Token Sara
Token ID24799
Feature activation+2.64e-03

Pos logit contributions
 walks+0.018
 knocks+0.018
 strangers+0.018
 asks+0.018
 trips+0.018
Neg logit contributions
he-0.017
lessness-0.016
ill-0.015
 Suz-0.013
ay-0.012
Token chooses
Token ID19769
Feature activation+1.28e-02

Pos logit contributions
lessness+0.006
he+0.006
ill+0.005
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.004
 walks-0.003
 knocks-0.003
 trips-0.003
 strangers-0.003
Token noisy
Token ID31210
Feature activation-7.33e-03

Pos logit contributions
 asks+0.035
 walks+0.031
 requests+0.031
 trips+0.030
 knocks+0.030
Neg logit contributions
lessness-0.031
he-0.027
 Suz-0.025
ill-0.025
taker-0.023
Token and
Token ID290
Feature activation+1.33e-02

Pos logit contributions
 asks+0.014
 knocks+0.013
 walks+0.013
 trips+0.013
 points+0.012
Neg logit contributions
lessness-0.013
he-0.012
ill-0.011
 Suz-0.010
smanship-0.010
Token scary
Token ID14343
Feature activation-2.01e-02

Pos logit contributions
lessness+0.026
he+0.024
ill+0.021
 Suz+0.021
smanship+0.020
Neg logit contributions
 asks-0.023
 walks-0.021
 knocks-0.021
 trips-0.020
 points-0.020
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
 asks+0.039
 knocks+0.035
 walks+0.035
 trips+0.033
 requests+0.033
Neg logit contributions
lessness-0.034
he-0.028
ill-0.026
 Suz-0.024
free-0.023
Token He
Token ID679
Feature activation-2.04e-02

Pos logit contributions
 asks+0.024
 walks+0.022
 knocks+0.022
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.022
he-0.021
ill-0.019
 Suz-0.018
smanship-0.017
Token bar
Token ID2318
Feature activation-7.03e-02

Pos logit contributions
 asks+0.024
 passports+0.024
 Sale+0.023
 bills+0.023
 requests+0.022
Neg logit contributions
lessness-0.038
he-0.036
ill-0.031
taker-0.031
 Suz-0.030
Tokenks
Token ID591
Feature activation-2.41e-02

Pos logit contributions
 walks+0.095
 trips+0.089
 asks+0.086
 knocks+0.084
 throws+0.081
Neg logit contributions
lessness-0.158
he-0.148
ill-0.139
lessly-0.117
taker-0.115
Token and
Token ID290
Feature activation-1.96e-03

Pos logit contributions
 asks+0.048
 knocks+0.046
 walks+0.044
 trips+0.044
 requests+0.041
Neg logit contributions
lessness-0.041
he-0.036
ill-0.032
taker-0.029
lessly-0.029
Token grow
Token ID1663
Feature activation-1.87e-02

Pos logit contributions
ves+0.002
 passports+0.002
 trips+0.002
 asks+0.002
 bills+0.002
Neg logit contributions
lessness-0.004
 Suz-0.004
he-0.004
ill-0.003
taker-0.003
Tokenls
Token ID7278
Feature activation-1.05e-02

Pos logit contributions
 asks+0.027
 knocks+0.024
 walks+0.024
 trips+0.023
 points+0.022
Neg logit contributions
lessness-0.043
he-0.040
ill-0.037
 Suz-0.035
smanship-0.034
Token and
Token ID290
Feature activation-1.85e-03

Pos logit contributions
 asks+0.018
 knocks+0.017
 walks+0.017
 trips+0.016
 points+0.015
Neg logit contributions
lessness-0.021
he-0.019
ill-0.018
 Suz-0.016
smanship-0.016
Token buckets
Token ID38674
Feature activation-1.10e-03

Pos logit contributions
 asks+0.031
 knocks+0.029
 walks+0.028
 trips+0.026
 requests+0.026
Neg logit contributions
lessness-0.039
he-0.036
ill-0.034
 Suz-0.034
smanship-0.031
Token and
Token ID290
Feature activation-1.52e-02

Pos logit contributions
 asks+0.002
 trips+0.001
 walks+0.001
 points+0.001
 knocks+0.001
Neg logit contributions
lessness-0.002
he-0.002
smanship-0.002
 Suz-0.002
ill-0.002
Token shovel
Token ID33677
Feature activation-2.09e-02

Pos logit contributions
 asks+0.025
 knocks+0.024
 walks+0.023
 trips+0.023
 strangers+0.023
Neg logit contributions
lessness-0.026
 Suz-0.024
ill-0.024
he-0.022
taker-0.020
Tokens
Token ID82
Feature activation-6.72e-03

Pos logit contributions
 asks+0.028
 trips+0.025
 knocks+0.024
 points+0.024
 walks+0.024
Neg logit contributions
lessness-0.049
he-0.046
ill-0.042
 Suz-0.042
smanship-0.041
Token and
Token ID290
Feature activation-2.32e-02

Pos logit contributions
 asks+0.012
 walks+0.011
 knocks+0.011
 trips+0.010
 points+0.010
Neg logit contributions
lessness-0.013
he-0.012
ill-0.011
 Suz-0.011
smanship-0.010
Token seaw
Token ID32085
Feature activation-7.00e-02

Pos logit contributions
 asks+0.037
 knocks+0.034
 walks+0.033
 trips+0.033
 requests+0.032
Neg logit contributions
lessness-0.044
ill-0.039
 Suz-0.039
he-0.039
taker-0.034
Tokeneed
Token ID2308
Feature activation-3.36e-02

Pos logit contributions
 asks+0.142
 knocks+0.137
 walks+0.130
 strangers+0.128
 knocked+0.121
Neg logit contributions
lessness-0.109
he-0.101
ill-0.100
imm-0.085
ay-0.084
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
 asks+0.056
 walks+0.052
 knocks+0.052
 trips+0.050
 points+0.048
Neg logit contributions
lessness-0.068
he-0.064
ill-0.059
 Suz-0.056
smanship-0.054
Token Then
Token ID3244
Feature activation-6.33e-03

Pos logit contributions
 asks+0.026
 walks+0.025
 knocks+0.024
 trips+0.024
 points+0.023
Neg logit contributions
lessness-0.030
he-0.027
ill-0.026
 Suz-0.025
smanship-0.023
Token we
Token ID356
Feature activation-7.51e-03

Pos logit contributions
 asks+0.010
 knocks+0.010
 walks+0.010
 trips+0.009
 points+0.009
Neg logit contributions
lessness-0.011
 Suz-0.010
he-0.010
ill-0.010
smanship-0.009
Token can
Token ID460
Feature activation-9.21e-03

Pos logit contributions
 asks+0.013
 walks+0.012
 knocks+0.012
 trips+0.012
 points+0.012
Neg logit contributions
lessness-0.014
he-0.014
ill-0.012
 Suz-0.012
smanship-0.011
Token cover
Token ID3002
Feature activation-2.33e-02

Pos logit contributions
 walks+0.033
 asks+0.033
 knocks+0.033
 trips+0.032
 points+0.030
Neg logit contributions
lessness-0.038
he-0.037
ill-0.036
 Suz-0.036
free-0.031
Token your
Token ID534
Feature activation-6.05e-03

Pos logit contributions
 asks+0.033
 knocks+0.030
 walks+0.030
 trips+0.029
 points+0.027
Neg logit contributions
lessness-0.052
he-0.050
ill-0.047
 Suz-0.045
smanship-0.043
Token work
Token ID670
Feature activation+2.83e-02

Pos logit contributions
 asks+0.009
 knocks+0.008
 walks+0.007
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.013
he-0.012
ill-0.011
 Suz-0.011
smanship-0.011
Token.
Token ID13
Feature activation-2.06e-02

Pos logit contributions
lessness+0.061
he+0.056
 Suz+0.053
ill+0.053
smanship+0.050
Neg logit contributions
 asks-0.043
 knocks-0.040
 walks-0.039
 trips-0.039
 points-0.037
Token Ask
Token ID16981
Feature activation-2.10e-02

Pos logit contributions
 asks+0.037
 walks+0.036
 trips+0.035
 knocks+0.035
 points+0.034
Neg logit contributions
lessness-0.037
he-0.035
ill-0.032
 Suz-0.031
smanship-0.028
Token for
Token ID329
Feature activation-6.90e-02

Pos logit contributions
 asks+0.036
 knocks+0.034
 walks+0.033
 points+0.032
 trips+0.032
Neg logit contributions
lessness-0.039
he-0.036
ill-0.034
 Suz-0.033
smanship-0.030
Token help
Token ID1037
Feature activation+1.62e-03

Pos logit contributions
 asks+0.134
 knocks+0.132
 trips+0.123
 walks+0.117
 hides+0.117
Neg logit contributions
lessness-0.109
 Suz-0.086
he-0.085
ill-0.080
taker-0.077
Token when
Token ID618
Feature activation-1.67e-02

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.003
 knocks-0.003
 walks-0.003
 trips-0.002
 points-0.002
Token you
Token ID345
Feature activation-9.43e-04

Pos logit contributions
 asks+0.033
 knocks+0.032
 charges+0.031
 walks+0.031
 dial+0.030
Neg logit contributions
lessness-0.022
 Suz-0.019
he-0.016
ill-0.016
 Maria-0.015
Token need
Token ID761
Feature activation-2.48e-02

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
he-0.002
lessness-0.002
ill-0.002
 Suz-0.002
smanship-0.002
Token it
Token ID340
Feature activation+3.79e-04

Pos logit contributions
 asks+0.043
 knocks+0.041
 hides+0.038
 walks+0.037
 trips+0.037
Neg logit contributions
lessness-0.042
he-0.039
 Suz-0.036
ill-0.036
taker-0.032
Token help
Token ID1037
Feature activation+1.98e-02

Pos logit contributions
he+0.002
lessness+0.002
 Suz+0.002
ill+0.002
smanship+0.002
Neg logit contributions
 asks-0.002
 knocks-0.002
 trips-0.002
 points-0.002
 walks-0.002
Token cook
Token ID4255
Feature activation-1.58e-02

Pos logit contributions
he+0.038
lessness+0.037
ill+0.034
 Suz+0.032
smanship+0.031
Neg logit contributions
 asks-0.035
 knocks-0.032
 walks-0.031
 trips-0.031
 points-0.031
Token hot
Token ID3024
Feature activation-2.12e-02

Pos logit contributions
 asks+0.025
 knocks+0.023
 walks+0.023
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.033
he-0.032
ill-0.029
 Suz-0.028
smanship-0.027
Token dogs
Token ID6844
Feature activation-8.93e-03

Pos logit contributions
 asks+0.032
 knocks+0.030
 walks+0.027
 trips+0.026
 points+0.025
Neg logit contributions
lessness-0.045
he-0.043
ill-0.039
 Suz-0.039
smanship-0.036
Token and
Token ID290
Feature activation-5.39e-03

Pos logit contributions
 asks+0.015
 knocks+0.013
 walks+0.013
 trips+0.013
 points+0.013
Neg logit contributions
lessness-0.018
he-0.018
ill-0.016
 Suz-0.015
smanship-0.015
Token hamb
Token ID41478
Feature activation-6.84e-02

Pos logit contributions
 asks+0.008
 knocks+0.008
 trips+0.007
 walks+0.007
 points+0.007
Neg logit contributions
lessness-0.011
he-0.010
 Suz-0.010
ill-0.009
smanship-0.009
Tokenurg
Token ID3686
Feature activation-1.51e-02

Pos logit contributions
 asks+0.136
 knocks+0.126
 walks+0.117
 trips+0.115
 strangers+0.113
Neg logit contributions
lessness-0.110
ill-0.102
he-0.101
 Suz-0.098
imm-0.094
Tokeners
Token ID364
Feature activation-1.63e-02

Pos logit contributions
 asks+0.023
 walks+0.022
 knocks+0.021
 trips+0.021
 points+0.021
Neg logit contributions
lessness-0.031
he-0.029
ill-0.028
 Suz-0.027
smanship-0.026
Token on
Token ID319
Feature activation-1.92e-02

Pos logit contributions
 asks+0.026
 knocks+0.023
 walks+0.023
 trips+0.023
 points+0.022
Neg logit contributions
lessness-0.035
he-0.033
ill-0.030
 Suz-0.029
smanship-0.028
Token the
Token ID262
Feature activation+1.23e-02

Pos logit contributions
 asks+0.035
 knocks+0.031
 walks+0.030
 trips+0.030
 points+0.030
Neg logit contributions
lessness-0.035
he-0.034
ill-0.031
 Suz-0.030
smanship-0.028
Token grill
Token ID29901
Feature activation-1.22e-02

Pos logit contributions
lessness+0.027
he+0.026
ill+0.022
smanship+0.022
 Suz+0.022
Neg logit contributions
 asks-0.019
 knocks-0.016
 walks-0.015
 points-0.015
 trips-0.015
Token decorated
Token ID24789
Feature activation-4.59e-03

Pos logit contributions
 asks+0.011
 walks+0.010
 trips+0.010
 requests+0.009
 knocks+0.009
Neg logit contributions
lessness-0.014
he-0.014
ill-0.013
 Suz-0.013
taker-0.012
Token it
Token ID340
Feature activation+8.74e-03

Pos logit contributions
 asks+0.007
 knocks+0.006
 walks+0.006
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.010
he-0.010
ill-0.009
 Suz-0.009
smanship-0.008
Token with
Token ID351
Feature activation-2.27e-02

Pos logit contributions
lessness+0.017
he+0.016
ill+0.015
 Suz+0.014
smanship+0.013
Neg logit contributions
 asks-0.015
 walks-0.014
 knocks-0.014
 trips-0.014
 points-0.013
Token shells
Token ID19679
Feature activation-2.20e-03

Pos logit contributions
 asks+0.036
 knocks+0.032
 walks+0.031
 trips+0.030
 requests+0.029
Neg logit contributions
lessness-0.048
he-0.045
ill-0.043
 Suz-0.042
smanship-0.039
Token and
Token ID290
Feature activation-1.71e-02

Pos logit contributions
 asks+0.004
 knocks+0.003
 walks+0.003
 trips+0.003
 points+0.003
Neg logit contributions
lessness-0.005
he-0.004
ill-0.004
 Suz-0.004
smanship-0.004
Token seaw
Token ID32085
Feature activation-6.83e-02

Pos logit contributions
 asks+0.029
 knocks+0.027
 trips+0.026
 requests+0.025
 walks+0.025
Neg logit contributions
lessness-0.032
ill-0.028
he-0.027
 Suz-0.027
smanship-0.024
Tokeneed
Token ID2308
Feature activation-3.40e-02

Pos logit contributions
 asks+0.128
 knocks+0.119
 walks+0.113
 strangers+0.109
 trips+0.109
Neg logit contributions
lessness-0.124
he-0.114
ill-0.112
 Suz-0.097
smanship-0.095
Token.
Token ID13
Feature activation-1.99e-02

Pos logit contributions
 asks+0.059
 knocks+0.054
 walks+0.054
 trips+0.052
 requests+0.049
Neg logit contributions
lessness-0.068
he-0.063
ill-0.060
 Suz-0.054
smanship-0.052
Token They
Token ID1119
Feature activation-1.58e-03

Pos logit contributions
 asks+0.033
 walks+0.032
 knocks+0.032
 trips+0.031
 strangers+0.030
Neg logit contributions
lessness-0.038
he-0.035
ill-0.034
 Suz-0.033
taker-0.028
Token had
Token ID550
Feature activation-4.51e-03

Pos logit contributions
 asks+0.002
 knocks+0.002
 walks+0.002
 trips+0.002
 points+0.002
Neg logit contributions
lessness-0.004
he-0.003
ill-0.003
 Suz-0.003
smanship-0.003
Token a
Token ID257
Feature activation-1.40e-02

Pos logit contributions
 asks+0.007
 knocks+0.007
 walks+0.006
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.009
he-0.009
ill-0.008
 Suz-0.008
smanship-0.008
Token like
Token ID588
Feature activation-2.38e-02

Pos logit contributions
lessness+0.034
he+0.031
ill+0.028
smanship+0.026
 Suz+0.026
Neg logit contributions
 asks-0.044
 walks-0.042
 knocks-0.042
 trips-0.041
 points-0.040
Token fish
Token ID5916
Feature activation-1.15e-02

Pos logit contributions
 asks+0.040
 knocks+0.036
 requests+0.033
 trips+0.033
 walks+0.033
Neg logit contributions
lessness-0.046
he-0.041
ill-0.041
 Suz-0.041
taker-0.035
Token and
Token ID290
Feature activation-1.04e-02

Pos logit contributions
 asks+0.023
 knocks+0.022
 walks+0.022
 requests+0.021
 trips+0.020
Neg logit contributions
lessness-0.018
he-0.017
ill-0.016
 Suz-0.014
smanship-0.013
Token shells
Token ID19679
Feature activation+5.63e-03

Pos logit contributions
 asks+0.017
 knocks+0.016
 requests+0.015
 trips+0.015
 walks+0.014
Neg logit contributions
lessness-0.019
 Suz-0.017
ill-0.017
he-0.016
taker-0.014
Token and
Token ID290
Feature activation-1.34e-02

Pos logit contributions
lessness+0.011
he+0.011
ill+0.010
 Suz+0.009
smanship+0.009
Neg logit contributions
 asks-0.010
 knocks-0.009
 walks-0.009
 trips-0.008
 points-0.008
Token seaw
Token ID32085
Feature activation-6.79e-02

Pos logit contributions
 asks+0.022
 knocks+0.020
 trips+0.019
 walks+0.019
 requests+0.018
Neg logit contributions
lessness-0.026
ill-0.023
 Suz-0.022
he-0.022
smanship-0.019
Tokeneed
Token ID2308
Feature activation-2.78e-02

Pos logit contributions
 asks+0.132
 knocks+0.123
 walks+0.116
 strangers+0.114
 trips+0.111
Neg logit contributions
lessness-0.116
he-0.108
ill-0.106
 Suz-0.091
imm-0.090
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
 asks+0.051
 knocks+0.047
 walks+0.046
 trips+0.044
 requests+0.043
Neg logit contributions
lessness-0.052
he-0.048
ill-0.045
 Suz-0.040
smanship-0.039
Token But
Token ID887
Feature activation-4.58e-03

Pos logit contributions
 asks+0.023
 walks+0.023
 knocks+0.023
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.026
he-0.025
ill-0.024
 Suz-0.023
smanship-0.019
Token they
Token ID484
Feature activation+9.43e-04

Pos logit contributions
 asks+0.008
 walks+0.008
 knocks+0.008
 trips+0.008
 points+0.008
Neg logit contributions
lessness-0.007
 Suz-0.007
he-0.007
ill-0.007
cy-0.006
Token did
Token ID750
Feature activation-1.48e-02

Pos logit contributions
lessness+0.002
he+0.002
ill+0.002
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.001
 walks-0.001
 knocks-0.001
 trips-0.001
 points-0.001
Token Like
Token ID4525
Feature activation-4.18e-02

Pos logit contributions
 asks+0.008
 walks+0.008
 knocks+0.008
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.009
he-0.008
ill-0.008
 Suz-0.007
smanship-0.007
Token shells
Token ID19679
Feature activation-4.89e-03

Pos logit contributions
 asks+0.076
 knocks+0.070
 requests+0.065
 begs+0.063
 walks+0.062
Neg logit contributions
lessness-0.067
he-0.062
 Suz-0.061
ill-0.059
taker-0.049
Token and
Token ID290
Feature activation-2.21e-02

Pos logit contributions
 asks+0.008
 knocks+0.008
 walks+0.008
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.010
he-0.009
ill-0.008
 Suz-0.008
smanship-0.008
Token rocks
Token ID12586
Feature activation-8.13e-03

Pos logit contributions
 asks+0.036
 knocks+0.035
 trips+0.032
 walks+0.031
 begs+0.031
Neg logit contributions
lessness-0.041
ill-0.035
 Suz-0.034
he-0.033
taker-0.031
Token and
Token ID290
Feature activation-1.72e-02

Pos logit contributions
 asks+0.014
 knocks+0.013
 walks+0.013
 trips+0.012
 points+0.012
Neg logit contributions
lessness-0.016
he-0.015
ill-0.014
 Suz-0.013
smanship-0.013
Token seaw
Token ID32085
Feature activation-6.74e-02

Pos logit contributions
 asks+0.028
 knocks+0.027
 trips+0.025
 requests+0.024
 walks+0.024
Neg logit contributions
lessness-0.032
ill-0.028
he-0.027
 Suz-0.026
taker-0.024
Tokeneed
Token ID2308
Feature activation-2.89e-02

Pos logit contributions
 asks+0.132
 knocks+0.126
 walks+0.118
 knocked+0.115
 strangers+0.114
Neg logit contributions
lessness-0.111
he-0.104
ill-0.101
smanship-0.086
ay-0.086
Token."
Token ID526
Feature activation-4.23e-03

Pos logit contributions
 asks+0.050
 knocks+0.047
 walks+0.046
 trips+0.044
 points+0.042
Neg logit contributions
lessness-0.057
he-0.053
ill-0.049
 Suz-0.044
smanship-0.044
Token\n
Token ID198
Feature activation-3.39e-03

Pos logit contributions
 asks+0.006
 knocks+0.006
 walks+0.006
 trips+0.006
 points+0.005
Neg logit contributions
lessness-0.009
he-0.009
ill-0.008
 Suz-0.008
smanship-0.007
Token\n
Token ID198
Feature activation+1.30e-03

Pos logit contributions
 asks+0.005
 knocks+0.005
 walks+0.005
 trips+0.005
 points+0.005
Neg logit contributions
lessness-0.007
he-0.007
ill-0.006
 Suz-0.006
smanship-0.006
TokenM
Token ID44
Feature activation-3.26e-03

Pos logit contributions
lessness+0.003
he+0.003
ill+0.002
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.002
 knocks-0.002
 walks-0.002
 trips-0.002
 points-0.002
Token They
Token ID1119
Feature activation+1.43e-03

Pos logit contributions
lessness+0.006
he+0.005
ill+0.005
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.004
 knocks-0.004
 walks-0.004
 trips-0.004
 points-0.003
Token went
Token ID1816
Feature activation-1.04e-02

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.002
 walks-0.002
 knocks-0.002
 trips-0.002
 points-0.002
Token past
Token ID1613
Feature activation-1.05e-02

Pos logit contributions
 walks+0.013
 trips+0.012
 asks+0.012
 knocks+0.011
 throws+0.010
Neg logit contributions
lessness-0.026
he-0.025
ill-0.023
smanship-0.023
umm-0.022
Token rocks
Token ID12586
Feature activation+2.15e-03

Pos logit contributions
 asks+0.016
 walks+0.015
 knocks+0.015
 trips+0.015
 points+0.014
Neg logit contributions
lessness-0.023
he-0.021
ill-0.019
 Suz-0.019
smanship-0.018
Token and
Token ID290
Feature activation+4.04e-03

Pos logit contributions
lessness+0.005
he+0.005
 Suz+0.004
ini+0.004
smanship+0.004
Neg logit contributions
 walks-0.003
 trips-0.002
 asks-0.002
 points-0.002
 knocks-0.002
Token seaw
Token ID32085
Feature activation-6.73e-02

Pos logit contributions
lessness+0.009
he+0.009
ill+0.008
 Suz+0.008
smanship+0.008
Neg logit contributions
 asks-0.006
 knocks-0.005
 walks-0.005
 trips-0.005
 points-0.005
Tokeneed
Token ID2308
Feature activation-3.47e-02

Pos logit contributions
 asks+0.125
 knocks+0.118
 walks+0.112
 trips+0.108
 strangers+0.105
Neg logit contributions
lessness-0.124
he-0.115
ill-0.111
 Suz-0.100
smanship-0.097
Token.
Token ID13
Feature activation-6.52e-03

Pos logit contributions
 asks+0.059
 knocks+0.054
 walks+0.054
 trips+0.053
 points+0.050
Neg logit contributions
lessness-0.070
he-0.066
ill-0.060
 Suz-0.057
smanship-0.056
Token They
Token ID1119
Feature activation+8.62e-05

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.010
 points+0.010
Neg logit contributions
lessness-0.013
he-0.012
ill-0.011
 Suz-0.011
smanship-0.010
Token saw
Token ID2497
Feature activation+9.79e-03

Pos logit contributions
lessness+0.000
he+0.000
ill+0.000
 Suz+0.000
smanship+0.000
Neg logit contributions
 asks-0.000
 walks-0.000
 knocks-0.000
 trips-0.000
 points-0.000
Token other
Token ID584
Feature activation-1.56e-02

Pos logit contributions
lessness+0.021
he+0.020
ill+0.019
 Suz+0.018
smanship+0.017
Neg logit contributions
 asks-0.015
 knocks-0.014
 walks-0.013
 trips-0.013
 points-0.012
Token salad
Token ID20698
Feature activation-2.11e-02

Pos logit contributions
lessness+0.012
he+0.012
ill+0.010
 Suz+0.010
smanship+0.010
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.012
 trips-0.012
 points-0.012
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
 asks+0.035
 knocks+0.032
 walks+0.032
 trips+0.031
 points+0.030
Neg logit contributions
lessness-0.044
he-0.041
ill-0.038
 Suz-0.036
smanship-0.035
Token they
Token ID484
Feature activation-7.71e-03

Pos logit contributions
lessness+0.029
he+0.028
ill+0.026
 Suz+0.026
smanship+0.024
Neg logit contributions
 asks-0.018
 knocks-0.016
 walks-0.016
 trips-0.015
 points-0.014
Token asked
Token ID1965
Feature activation-1.42e-02

Pos logit contributions
 asks+0.012
 walks+0.011
 knocks+0.011
 trips+0.011
 points+0.011
Neg logit contributions
lessness-0.016
he-0.015
ill-0.014
 Suz-0.013
smanship-0.012
Token for
Token ID329
Feature activation-5.60e-02

Pos logit contributions
 asks+0.020
 knocks+0.019
 trips+0.017
 walks+0.017
 points+0.017
Neg logit contributions
lessness-0.031
he-0.029
ill-0.028
 Suz-0.027
taker-0.024
Token dessert
Token ID23084
Feature activation-6.72e-02

Pos logit contributions
 asks+0.099
 knocks+0.093
 trips+0.091
 walks+0.087
 points+0.086
Neg logit contributions
lessness-0.106
he-0.095
ill-0.090
 Suz-0.085
smanship-0.080
Token again
Token ID757
Feature activation-2.10e-02

Pos logit contributions
 asks+0.117
 walks+0.110
 knocks+0.108
 trips+0.104
 points+0.100
Neg logit contributions
lessness-0.132
he-0.124
ill-0.118
 Suz-0.105
smanship-0.104
Token.
Token ID13
Feature activation-1.62e-02

Pos logit contributions
 asks+0.038
 knocks+0.036
 walks+0.036
 trips+0.034
 throws+0.033
Neg logit contributions
lessness-0.038
he-0.036
ill-0.035
 Suz-0.031
smanship-0.029
Token\n
Token ID198
Feature activation-1.28e-02

Pos logit contributions
 walks+0.027
 asks+0.026
 trips+0.026
 knocks+0.025
 strangers+0.025
Neg logit contributions
lessness-0.030
he-0.028
ill-0.028
 Suz-0.027
taker-0.023
Token\n
Token ID198
Feature activation-6.96e-03

Pos logit contributions
 walks+0.026
 knocks+0.025
 asks+0.025
 hides+0.025
 trips+0.025
Neg logit contributions
he-0.020
lessness-0.018
ill-0.017
 Suz-0.016
taker-0.013
Token"
Token ID1
Feature activation+6.90e-03

Pos logit contributions
 asks+0.012
 knocks+0.012
 walks+0.012
 trips+0.011
 points+0.011
Neg logit contributions
he-0.013
lessness-0.013
ill-0.012
 Suz-0.011
taker-0.010
Token some
Token ID617
Feature activation-3.76e-03

Pos logit contributions
lessness+0.006
he+0.006
ill+0.005
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.004
 knocks-0.003
 walks-0.003
 trips-0.003
 points-0.003
Token shells
Token ID19679
Feature activation+1.05e-02

Pos logit contributions
 asks+0.005
 knocks+0.005
 walks+0.005
 trips+0.005
 throws+0.004
Neg logit contributions
lessness-0.009
he-0.008
ill-0.007
 Suz-0.007
smanship-0.007
Token and
Token ID290
Feature activation-9.55e-03

Pos logit contributions
lessness+0.021
he+0.019
ill+0.018
 Suz+0.017
smanship+0.016
Neg logit contributions
 asks-0.018
 walks-0.017
 knocks-0.017
 trips-0.016
 points-0.016
Token rocks
Token ID12586
Feature activation-3.01e-03

Pos logit contributions
 asks+0.016
 knocks+0.015
 trips+0.014
 requests+0.014
 walks+0.014
Neg logit contributions
lessness-0.017
ill-0.015
 Suz-0.015
he-0.014
smanship-0.013
Token and
Token ID290
Feature activation-1.38e-02

Pos logit contributions
 asks+0.005
 knocks+0.005
 walks+0.005
 trips+0.005
 points+0.005
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
 Suz-0.005
smanship-0.005
Token seaw
Token ID32085
Feature activation-6.72e-02

Pos logit contributions
 asks+0.023
 knocks+0.021
 trips+0.020
 requests+0.020
 walks+0.020
Neg logit contributions
lessness-0.026
ill-0.023
 Suz-0.022
he-0.021
taker-0.019
Tokeneed
Token ID2308
Feature activation-2.90e-02

Pos logit contributions
 asks+0.130
 knocks+0.122
 walks+0.114
 strangers+0.112
 trips+0.111
Neg logit contributions
lessness-0.116
he-0.106
ill-0.104
 Suz-0.090
smanship-0.089
Token.
Token ID13
Feature activation-6.61e-03

Pos logit contributions
 asks+0.053
 knocks+0.048
 walks+0.047
 trips+0.046
 requests+0.044
Neg logit contributions
lessness-0.055
he-0.051
ill-0.048
 Suz-0.043
smanship-0.042
Token He
Token ID679
Feature activation-9.58e-03

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.010
 points+0.010
Neg logit contributions
lessness-0.013
he-0.013
smanship-0.012
ill-0.011
 Suz-0.010
Token picked
Token ID6497
Feature activation+7.73e-03

Pos logit contributions
 asks+0.016
 knocks+0.015
 walks+0.015
 strangers+0.015
 requests+0.014
Neg logit contributions
lessness-0.017
ill-0.016
he-0.016
 Suz-0.015
taker-0.012
Token them
Token ID606
Feature activation+1.75e-02

Pos logit contributions
lessness+0.016
he+0.016
ill+0.015
 Suz+0.014
smanship+0.013
Neg logit contributions
 asks-0.011
 knocks-0.011
 walks-0.010
 trips-0.010
 strangers-0.009
Token sund
Token ID37437
Feature activation-1.06e-02

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.010
 points+0.009
Neg logit contributions
lessness-0.014
he-0.013
ill-0.012
 Suz-0.012
smanship-0.011
Tokenae
Token ID3609
Feature activation-1.48e-02

Pos logit contributions
 asks+0.018
 knocks+0.017
 trips+0.017
 walks+0.017
 points+0.016
Neg logit contributions
lessness-0.020
he-0.018
 Suz-0.017
ill-0.017
smanship-0.016
Token with
Token ID351
Feature activation-9.95e-03

Pos logit contributions
 asks+0.025
 walks+0.022
 knocks+0.022
 trips+0.022
 points+0.021
Neg logit contributions
lessness-0.030
he-0.029
ill-0.026
 Suz-0.025
smanship-0.024
Token lots
Token ID6041
Feature activation-5.11e-03

Pos logit contributions
 asks+0.018
 knocks+0.018
 walks+0.016
 trips+0.016
 hides+0.016
Neg logit contributions
lessness-0.017
he-0.017
 Suz-0.015
ill-0.015
ay-0.014
Token of
Token ID286
Feature activation+3.64e-03

Pos logit contributions
 asks+0.013
 knocks+0.012
 walks+0.012
 trips+0.012
 points+0.012
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
 Suz-0.004
smanship-0.004
Token topp
Token ID23126
Feature activation-6.70e-02

Pos logit contributions
he+0.007
lessness+0.007
 Suz+0.006
ill+0.006
smanship+0.005
Neg logit contributions
 asks-0.006
 knocks-0.006
 walks-0.006
 trips-0.005
 strangers-0.005
Tokenings
Token ID654
Feature activation-2.49e-03

Pos logit contributions
 asks+0.140
 walks+0.132
 knocks+0.130
 strangers+0.123
 trips+0.119
Neg logit contributions
he-0.106
lessness-0.104
ill-0.101
ay-0.087
 Suz-0.084
Token.
Token ID13
Feature activation-2.11e-03

Pos logit contributions
 asks+0.004
 knocks+0.004
 walks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.005
he-0.005
ill-0.004
 Suz-0.004
taker-0.004
Token Tim
Token ID5045
Feature activation+1.52e-02

Pos logit contributions
 asks+0.004
 knocks+0.003
 walks+0.003
 trips+0.003
 points+0.003
Neg logit contributions
lessness-0.004
he-0.004
ill-0.004
 Suz-0.003
smanship-0.003
Tokenmy
Token ID1820
Feature activation-2.03e-02

Pos logit contributions
lessness+0.024
he+0.020
 Suz+0.017
ill+0.017
taker+0.017
Neg logit contributions
 asks-0.035
 knocks-0.032
 walks-0.032
 trips-0.031
 throws-0.031
Token was
Token ID373
Feature activation-1.21e-02

Pos logit contributions
 asks+0.030
 strangers+0.030
 walks+0.030
 knocks+0.029
 trips+0.028
Neg logit contributions
he-0.038
lessness-0.037
ill-0.034
 Suz-0.033
smanship-0.031
Token are
Token ID389
Feature activation-8.52e-03

Pos logit contributions
 asks+0.004
 knocks+0.004
 walks+0.004
 trips+0.004
 points+0.003
Neg logit contributions
lessness-0.004
he-0.004
ill-0.003
 Suz-0.003
smanship-0.003
Token welcome
Token ID7062
Feature activation-1.37e-02

Pos logit contributions
 asks+0.018
 walks+0.017
 knocks+0.017
 trips+0.016
 points+0.016
Neg logit contributions
lessness-0.014
he-0.013
ill-0.011
 Suz-0.011
smanship-0.011
Token,
Token ID11
Feature activation+6.97e-03

Pos logit contributions
 asks+0.024
 knocks+0.022
 walks+0.022
 trips+0.021
 points+0.021
Neg logit contributions
lessness-0.026
he-0.025
ill-0.023
 Suz-0.022
smanship-0.021
Token kids
Token ID3988
Feature activation-9.36e-03

Pos logit contributions
lessness+0.017
he+0.016
ill+0.015
 Suz+0.014
smanship+0.014
Neg logit contributions
 asks-0.009
 knocks-0.008
 walks-0.008
 trips-0.008
 points-0.007
Token.
Token ID13
Feature activation-5.61e-03

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.009
 points+0.009
Neg logit contributions
lessness-0.023
he-0.022
 Suz-0.021
ill-0.021
smanship-0.020
Token Enjoy
Token ID18179
Feature activation-6.68e-02

Pos logit contributions
 asks+0.011
 walks+0.010
 knocks+0.010
 trips+0.010
 points+0.009
Neg logit contributions
lessness-0.010
he-0.009
ill-0.008
 Suz-0.008
smanship-0.007
Token your
Token ID534
Feature activation-8.10e-03

Pos logit contributions
 asks+0.108
 knocks+0.102
 walks+0.097
 knocked+0.094
 trips+0.092
Neg logit contributions
lessness-0.125
he-0.125
ill-0.110
 Suz-0.100
ay-0.094
Token ol
Token ID25776
Feature activation+9.28e-03

Pos logit contributions
 asks+0.015
 knocks+0.012
 knocked+0.012
 challenged+0.011
 begged+0.011
Neg logit contributions
lessness-0.012
he-0.012
igsaw-0.011
ay-0.010
 Suz-0.010
Tokenives
Token ID1083
Feature activation+1.23e-02

Pos logit contributions
lessness+0.015
he+0.014
ill+0.013
 Suz+0.012
smanship+0.012
Neg logit contributions
 asks-0.019
 knocks-0.018
 walks-0.018
 trips-0.017
 points-0.017
Token.
Token ID13
Feature activation-8.14e-03

Pos logit contributions
lessness+0.024
he+0.023
ill+0.021
 Suz+0.019
smanship+0.019
Neg logit contributions
 asks-0.021
 knocks-0.020
 walks-0.020
 trips-0.019
 points-0.018
Token But
Token ID887
Feature activation-6.32e-03

Pos logit contributions
 asks+0.014
 walks+0.014
 knocks+0.014
 trips+0.014
 points+0.013
Neg logit contributions
lessness-0.015
he-0.014
ill-0.013
 Suz-0.012
smanship-0.011
TokenEvery
Token ID6109
Feature activation+1.22e-02

Pos logit contributions
lessness+0.005
he+0.005
ill+0.005
 Suz+0.005
smanship+0.004
Neg logit contributions
 asks-0.004
 knocks-0.004
 walks-0.004
 trips-0.004
 points-0.004
Token day
Token ID1110
Feature activation+2.83e-03

Pos logit contributions
lessness+0.020
he+0.019
ill+0.017
 Suz+0.016
smanship+0.015
Neg logit contributions
 asks-0.025
 walks-0.023
 knocks-0.023
 trips-0.023
 points-0.022
Token,
Token ID11
Feature activation+2.06e-02

Pos logit contributions
lessness+0.006
he+0.005
ill+0.005
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.005
 walks-0.004
 knocks-0.004
 trips-0.004
 points-0.004
Token m
Token ID285
Feature activation-2.60e-02

Pos logit contributions
lessness+0.051
he+0.048
ill+0.045
smanship+0.043
igsaw+0.042
Neg logit contributions
 asks-0.028
 walks-0.024
 knocks-0.022
 trips-0.021
 strangers-0.021
Tokenama
Token ID1689
Feature activation-2.18e-02

Pos logit contributions
 asks+0.052
 knocks+0.048
 walks+0.047
 trips+0.047
 points+0.045
Neg logit contributions
lessness-0.044
he-0.041
ill-0.037
 Suz-0.034
smanship-0.034
Token oy
Token ID35104
Feature activation-6.68e-02

Pos logit contributions
 asks+0.038
 knocks+0.037
 walks+0.036
 trips+0.036
 points+0.034
Neg logit contributions
lessness-0.039
he-0.037
ill-0.036
 Suz-0.034
smanship-0.031
Tokenster
Token ID1706
Feature activation-4.03e-02

Pos logit contributions
 asks+0.108
 knocks+0.106
 walks+0.103
 trips+0.099
 throws+0.096
Neg logit contributions
lessness-0.134
he-0.124
ill-0.120
 Suz-0.112
smanship-0.109
Token taught
Token ID7817
Feature activation+7.38e-03

Pos logit contributions
 asks+0.067
 walks+0.064
 knocks+0.064
 trips+0.062
 points+0.061
Neg logit contributions
lessness-0.074
he-0.069
ill-0.065
 Suz-0.061
smanship-0.059
Token her
Token ID607
Feature activation-8.73e-03

Pos logit contributions
lessness+0.017
he+0.016
ill+0.015
 Suz+0.014
smanship+0.014
Neg logit contributions
 asks-0.011
 knocks-0.010
 walks-0.010
 trips-0.010
 points-0.009
Token baby
Token ID5156
Feature activation-2.26e-02

Pos logit contributions
 asks+0.012
 knocks+0.010
 walks+0.010
 trips+0.010
 points+0.010
Neg logit contributions
lessness-0.021
he-0.020
ill-0.018
 Suz-0.018
smanship-0.017
Token oy
Token ID35104
Feature activation-4.52e-02

Pos logit contributions
 asks+0.033
 knocks+0.030
 walks+0.029
 trips+0.028
 points+0.027
Neg logit contributions
lessness-0.051
he-0.047
ill-0.045
 Suz-0.043
smanship-0.041
Token it
Token ID340
Feature activation+1.59e-02

Pos logit contributions
 asks+0.020
 knocks+0.019
 walks+0.017
 trips+0.017
 requests+0.016
Neg logit contributions
lessness-0.021
he-0.020
 Suz-0.019
ill-0.018
taker-0.016
Token."
Token ID526
Feature activation+7.54e-03

Pos logit contributions
lessness+0.031
he+0.030
ill+0.027
 Suz+0.026
smanship+0.025
Neg logit contributions
 asks-0.027
 walks-0.025
 knocks-0.025
 trips-0.025
 points-0.024
Token Tim
Token ID5045
Feature activation+3.22e-02

Pos logit contributions
lessness+0.016
he+0.015
ill+0.014
smanship+0.013
taker+0.013
Neg logit contributions
 asks-0.012
 knocks-0.011
 walks-0.011
 trips-0.010
 throws-0.010
Tokenmy
Token ID1820
Feature activation+1.50e-03

Pos logit contributions
lessness+0.062
he+0.056
taker+0.049
igsaw+0.048
umm+0.048
Neg logit contributions
 asks-0.061
 requests-0.051
 walks-0.050
 asked-0.050
 begged-0.049
Token thought
Token ID1807
Feature activation+2.65e-02

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.002
 knocks-0.002
 walks-0.002
 trips-0.002
 points-0.002
Token that
Token ID326
Feature activation+5.27e-02

Pos logit contributions
lessness+0.057
he+0.054
ill+0.052
 Suz+0.047
smanship+0.047
Neg logit contributions
 walks-0.040
 asks-0.039
 trips-0.038
 knocks-0.037
 points-0.035
Token was
Token ID373
Feature activation+7.90e-03

Pos logit contributions
lessness+0.129
he+0.123
ill+0.115
 Suz+0.111
smanship+0.108
Neg logit contributions
 asks-0.067
 knocks-0.060
 walks-0.060
 trips-0.058
 points-0.055
Token a
Token ID257
Feature activation+5.85e-03

Pos logit contributions
lessness+0.016
he+0.015
 Suz+0.014
ill+0.014
smanship+0.013
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.012
 trips-0.012
 points-0.011
Token good
Token ID922
Feature activation+2.46e-02

Pos logit contributions
lessness+0.011
he+0.010
ill+0.009
 Suz+0.009
smanship+0.009
Neg logit contributions
 asks-0.010
 walks-0.009
 knocks-0.009
 trips-0.009
 points-0.009
Token idea
Token ID2126
Feature activation+3.67e-02

Pos logit contributions
lessness+0.039
he+0.036
ill+0.033
 Suz+0.032
smanship+0.030
Neg logit contributions
 asks-0.051
 knocks-0.048
 walks-0.047
 trips-0.046
 points-0.044
Token and
Token ID290
Feature activation+7.66e-03

Pos logit contributions
lessness+0.078
he+0.076
ill+0.070
 Suz+0.067
smanship+0.066
Neg logit contributions
 asks-0.054
 walks-0.050
 knocks-0.049
 trips-0.049
 points-0.048
Token tell
Token ID1560
Feature activation-1.31e-02

Pos logit contributions
he+0.017
lessness+0.017
ill+0.016
 Suz+0.016
smanship+0.014
Neg logit contributions
 asks-0.014
 knocks-0.014
 trips-0.013
 walks-0.013
 points-0.013
Token her
Token ID607
Feature activation-1.24e-02

Pos logit contributions
 asks+0.022
 knocks+0.021
 trips+0.020
 walks+0.020
 points+0.019
Neg logit contributions
lessness-0.026
he-0.024
ill-0.023
 Suz-0.022
taker-0.021
Token dad
Token ID9955
Feature activation+6.44e-03

Pos logit contributions
 asks+0.017
 walks+0.016
 knocks+0.015
 trips+0.015
 points+0.014
Neg logit contributions
lessness-0.029
he-0.028
ill-0.026
 Suz-0.025
smanship-0.024
Token what
Token ID644
Feature activation-1.38e-02

Pos logit contributions
lessness+0.015
he+0.014
ill+0.013
 Suz+0.012
smanship+0.012
Neg logit contributions
 asks-0.009
 walks-0.008
 knocks-0.008
 trips-0.008
 points-0.008
Token she
Token ID673
Feature activation+2.59e-02

Pos logit contributions
 asks+0.024
 knocks+0.023
 walks+0.022
 trips+0.021
 throws+0.021
Neg logit contributions
lessness-0.026
he-0.024
ill-0.022
 Suz-0.022
smanship-0.020
Token had
Token ID550
Feature activation+5.74e-02

Pos logit contributions
lessness+0.056
he+0.054
ill+0.050
 Suz+0.047
smanship+0.046
Neg logit contributions
 asks-0.038
 knocks-0.035
 walks-0.035
 trips-0.034
 points-0.032
Token read
Token ID1100
Feature activation+2.50e-02

Pos logit contributions
lessness+0.105
he+0.104
ill+0.094
 Suz+0.083
smanship+0.083
Neg logit contributions
 asks-0.102
 knocks-0.100
 walks-0.097
 trips-0.092
 throws-0.090
Token.
Token ID13
Feature activation+1.82e-03

Pos logit contributions
lessness+0.054
he+0.050
ill+0.046
 Suz+0.045
smanship+0.044
Neg logit contributions
 asks-0.038
 walks-0.036
 knocks-0.035
 trips-0.035
 points-0.033
Token She
Token ID1375
Feature activation+1.05e-03

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
 Suz+0.002
taker+0.002
Neg logit contributions
 walks-0.004
 knocks-0.004
 asks-0.004
 trips-0.004
 strangers-0.003
Token said
Token ID531
Feature activation+6.35e-03

Pos logit contributions
he+0.002
lessness+0.002
ill+0.002
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.001
 knocks-0.001
 strangers-0.001
 trips-0.001
 walks-0.001
Token,
Token ID11
Feature activation-2.43e-02

Pos logit contributions
lessness+0.011
he+0.010
ill+0.009
 Suz+0.009
taker+0.009
Neg logit contributions
 asks-0.012
 knocks-0.012
 walks-0.011
 trips-0.011
 throws-0.010
Token happy
Token ID3772
Feature activation+1.19e-02

Pos logit contributions
 asks+0.001
 walks+0.001
 knocks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
 Suz-0.001
smanship-0.001
Token again
Token ID757
Feature activation+1.43e-02

Pos logit contributions
lessness+0.025
he+0.025
ill+0.023
smanship+0.022
 Suz+0.022
Neg logit contributions
 asks-0.019
 walks-0.018
 knocks-0.017
 trips-0.017
 points-0.016
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
lessness+0.031
he+0.029
ill+0.027
 Suz+0.026
smanship+0.026
Neg logit contributions
 asks-0.022
 walks-0.020
 trips-0.020
 knocks-0.020
 points-0.019
Token hugged
Token ID41130
Feature activation+1.20e-03

Pos logit contributions
lessness+0.023
he+0.022
ill+0.020
 Suz+0.020
smanship+0.019
Neg logit contributions
 asks-0.016
 knocks-0.015
 walks-0.014
 trips-0.014
 points-0.014
Token his
Token ID465
Feature activation-7.47e-04

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
smanship+0.003
taker+0.003
Neg logit contributions
 asks-0.001
 walks-0.001
 knocks-0.001
 trips-0.001
 strangers-0.001
Token mom
Token ID1995
Feature activation+5.56e-02

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.002
he-0.002
ill-0.002
 Suz-0.002
smanship-0.002
Token.
Token ID13
Feature activation+9.63e-03

Pos logit contributions
lessness+0.142
 Suz+0.131
he+0.131
imm+0.129
smanship+0.125
Neg logit contributions
 asks-0.054
 walks-0.050
's-0.045
 trips-0.045
 knocks-0.043
Token<|endoftext|>
Token ID50256
Feature activation+1.49e-03

Pos logit contributions
lessness+0.019
he+0.018
ill+0.017
 Suz+0.016
smanship+0.016
Neg logit contributions
 asks-0.016
 knocks-0.015
 walks-0.015
 trips-0.015
 points-0.014
TokenOnce
Token ID7454
Feature activation+1.01e-02

Pos logit contributions
lessness+0.004
he+0.003
ill+0.003
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.002
 walks-0.002
 trips-0.002
 knocks-0.002
 points-0.002
Token upon
Token ID2402
Feature activation-3.08e-02

Pos logit contributions
lessness+0.016
he+0.015
ill+0.013
smanship+0.013
 Suz+0.013
Neg logit contributions
 asks-0.021
 knocks-0.019
 trips-0.019
 walks-0.018
 points-0.018
Token a
Token ID257
Feature activation-1.15e-02

Pos logit contributions
 asks+0.044
 knocks+0.040
 walks+0.039
 trips+0.038
 points+0.036
Neg logit contributions
lessness-0.070
he-0.067
ill-0.062
 Suz-0.059
smanship-0.057
Token happy
Token ID3772
Feature activation+9.13e-03

Pos logit contributions
 asks+0.009
 knocks+0.008
 walks+0.008
 trips+0.008
 points+0.007
Neg logit contributions
lessness-0.008
he-0.007
ill-0.007
smanship-0.006
 Suz-0.006
Token and
Token ID290
Feature activation+2.68e-02

Pos logit contributions
lessness+0.021
he+0.020
ill+0.018
smanship+0.018
 Suz+0.017
Neg logit contributions
 asks-0.013
 walks-0.012
 knocks-0.011
 trips-0.011
 points-0.011
Token proud
Token ID6613
Feature activation+1.01e-02

Pos logit contributions
lessness+0.055
he+0.052
ill+0.049
smanship+0.045
 Suz+0.045
Neg logit contributions
 asks-0.045
 walks-0.041
 knocks-0.041
 trips-0.039
 points-0.038
Token of
Token ID286
Feature activation+8.38e-04

Pos logit contributions
lessness+0.017
he+0.016
ill+0.014
 Suz+0.014
smanship+0.013
Neg logit contributions
 asks-0.021
 walks-0.019
 knocks-0.019
 trips-0.019
 points-0.018
Token her
Token ID607
Feature activation+1.04e-02

Pos logit contributions
lessness+0.002
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.002
 knocks-0.001
 walks-0.001
 trips-0.001
 points-0.001
Token mom
Token ID1995
Feature activation+5.30e-02

Pos logit contributions
lessness+0.023
he+0.022
ill+0.021
 Suz+0.020
smanship+0.019
Neg logit contributions
 asks-0.015
 walks-0.014
 knocks-0.014
 trips-0.014
 points-0.013
Token's
Token ID338
Feature activation+1.32e-02

Pos logit contributions
lessness+0.134
he+0.120
 Suz+0.119
ill+0.118
imm+0.115
Neg logit contributions
 asks-0.051
 walks-0.050
 trips-0.044
's-0.044
 strangers-0.043
Token plan
Token ID1410
Feature activation+1.87e-02

Pos logit contributions
lessness+0.028
he+0.027
ill+0.025
 Suz+0.023
smanship+0.022
Neg logit contributions
 asks-0.020
 walks-0.020
 knocks-0.019
 trips-0.019
 points-0.018
Token.
Token ID13
Feature activation+1.82e-03

Pos logit contributions
lessness+0.046
he+0.044
ill+0.042
 Suz+0.041
smanship+0.040
Neg logit contributions
 walks-0.020
 asks-0.019
 trips-0.018
 knocks-0.018
 points-0.017
Token The
Token ID383
Feature activation+4.67e-03

Pos logit contributions
lessness+0.004
he+0.004
ill+0.004
smanship+0.004
 Suz+0.004
Neg logit contributions
 asks-0.003
 knocks-0.002
 walks-0.002
 trips-0.002
 points-0.002
Token end
Token ID886
Feature activation+1.24e-03

Pos logit contributions
lessness+0.010
he+0.010
ill+0.009
 Suz+0.009
smanship+0.008
Neg logit contributions
 asks-0.007
 walks-0.007
 knocks-0.007
 trips-0.006
 points-0.006
Token tasted
Token ID29187
Feature activation+2.19e-02

Pos logit contributions
lessness+0.018
he+0.016
ill+0.016
 Suz+0.014
smanship+0.013
Neg logit contributions
 asks-0.023
 knocks-0.022
 walks-0.021
 trips-0.021
 requests-0.020
Token!
Token ID0
Feature activation+4.61e-03

Pos logit contributions
lessness+0.044
he+0.042
ill+0.038
 Suz+0.037
smanship+0.036
Neg logit contributions
 asks-0.037
 knocks-0.034
 walks-0.034
 trips-0.033
 points-0.031
Token "
Token ID366
Feature activation+1.10e-02

Pos logit contributions
lessness+0.009
he+0.009
ill+0.008
smanship+0.008
 Suz+0.008
Neg logit contributions
 asks-0.008
 walks-0.007
 knocks-0.007
 trips-0.007
 points-0.007
TokenT
Token ID51
Feature activation-8.68e-03

Pos logit contributions
lessness+0.022
he+0.021
ill+0.019
 Suz+0.018
smanship+0.018
Neg logit contributions
 asks-0.019
 walks-0.017
 knocks-0.017
 trips-0.016
 points-0.016
Tokenasty
Token ID7833
Feature activation+4.25e-03

Pos logit contributions
 asks+0.017
 walks+0.016
 trips+0.016
 knocks+0.016
 points+0.015
Neg logit contributions
lessness-0.015
he-0.013
 Suz-0.012
ill-0.012
smanship-0.011
Token"
Token ID1
Feature activation+6.97e-02

Pos logit contributions
lessness+0.008
he+0.008
ill+0.007
 Suz+0.007
smanship+0.007
Neg logit contributions
 asks-0.007
 walks-0.007
 knocks-0.007
 trips-0.007
 points-0.006
Token was
Token ID373
Feature activation+5.66e-03

Pos logit contributions
lessness+0.165
he+0.154
ill+0.145
 Suz+0.140
smanship+0.139
Neg logit contributions
 asks-0.096
 walks-0.083
 knocks-0.081
 trips-0.080
 points-0.078
Token the
Token ID262
Feature activation+2.39e-02

Pos logit contributions
lessness+0.009
he+0.009
ill+0.008
 Suz+0.008
taker+0.007
Neg logit contributions
 asks-0.011
 knocks-0.011
 walks-0.011
 trips-0.010
 points-0.010
Token perfect
Token ID2818
Feature activation+5.00e-03

Pos logit contributions
lessness+0.048
ill+0.044
he+0.043
 Suz+0.040
ingle+0.037
Neg logit contributions
 asks-0.040
 walks-0.038
 trips-0.037
 knocks-0.037
 points-0.036
Token word
Token ID1573
Feature activation+2.79e-03

Pos logit contributions
lessness+0.010
he+0.009
ill+0.008
 Suz+0.008
smanship+0.008
Neg logit contributions
 asks-0.009
 walks-0.008
 knocks-0.008
 trips-0.008
 points-0.007
Token to
Token ID284
Feature activation+2.21e-02

Pos logit contributions
lessness+0.005
he+0.005
ill+0.005
 Suz+0.004
smanship+0.004
Neg logit contributions
 asks-0.005
 walks-0.005
 knocks-0.005
 trips-0.004
 points-0.004
Token in
Token ID287
Feature activation+4.88e-03

Pos logit contributions
lessness+0.040
he+0.038
ill+0.035
 Suz+0.033
smanship+0.032
Neg logit contributions
 asks-0.036
 knocks-0.033
 walks-0.033
 trips-0.032
 points-0.031
Token the
Token ID262
Feature activation+1.45e-02

Pos logit contributions
lessness+0.012
he+0.011
ill+0.010
smanship+0.010
 Suz+0.010
Neg logit contributions
 asks-0.007
 walks-0.006
 knocks-0.006
 trips-0.006
 points-0.005
Token park
Token ID3952
Feature activation+7.79e-03

Pos logit contributions
lessness+0.029
he+0.028
ill+0.024
 Suz+0.024
smanship+0.023
Neg logit contributions
 asks-0.024
 knocks-0.021
 walks-0.021
 points-0.020
 trips-0.020
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
lessness+0.018
he+0.017
ill+0.015
 Suz+0.015
smanship+0.015
Neg logit contributions
 asks-0.011
 walks-0.010
 knocks-0.010
 trips-0.010
 points-0.009
Token<|endoftext|>
Token ID50256
Feature activation+4.17e-04

Pos logit contributions
 asks+0.016
 walks+0.015
 knocks+0.015
 trips+0.014
 points+0.014
Neg logit contributions
lessness-0.023
he-0.022
ill-0.020
 Suz-0.020
smanship-0.019
TokenTim
Token ID14967
Feature activation+1.58e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.001
 knocks-0.001
 walks-0.001
 trips-0.001
 points-0.001
Token was
Token ID373
Feature activation-1.31e-02

Pos logit contributions
lessness+0.033
he+0.032
ill+0.030
 Suz+0.027
smanship+0.027
Neg logit contributions
 asks-0.023
 knocks-0.021
 walks-0.021
 trips-0.021
 points-0.020
Token an
Token ID281
Feature activation+1.82e-02

Pos logit contributions
 asks+0.022
 knocks+0.021
 walks+0.020
 trips+0.020
 points+0.020
Neg logit contributions
he-0.025
lessness-0.025
ill-0.021
 Suz-0.021
taker-0.019
Token adventurous
Token ID39770
Feature activation+8.69e-03

Pos logit contributions
lessness+0.039
he+0.036
ill+0.035
 Suz+0.033
smanship+0.032
Neg logit contributions
 asks-0.030
 knocks-0.026
 walks-0.026
 trips-0.025
 strangers-0.024
Token boy
Token ID2933
Feature activation+1.05e-02

Pos logit contributions
lessness+0.020
he+0.018
ill+0.017
 Suz+0.016
smanship+0.016
Neg logit contributions
 asks-0.013
 walks-0.011
 knocks-0.011
 trips-0.011
 points-0.010
Token who
Token ID508
Feature activation+3.84e-04

Pos logit contributions
he+0.018
lessness+0.018
ill+0.016
 Suz+0.015
smanship+0.014
Neg logit contributions
 asks-0.019
 knocks-0.018
 walks-0.017
 trips-0.017
 points-0.017
Token.
Token ID13
Feature activation-7.65e-04

Pos logit contributions
lessness+0.035
he+0.033
ill+0.031
 Suz+0.029
smanship+0.029
Neg logit contributions
 asks-0.025
 knocks-0.022
 walks-0.022
 trips-0.022
 points-0.021
Token Benny
Token ID44275
Feature activation-3.97e-03

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
 Suz-0.001
ill-0.001
taker-0.001
Token was
Token ID373
Feature activation-7.62e-03

Pos logit contributions
 asks+0.006
 knocks+0.006
 strangers+0.006
 trips+0.006
 walks+0.006
Neg logit contributions
lessness-0.007
he-0.007
ill-0.007
 Suz-0.007
taker-0.006
Token scared
Token ID12008
Feature activation+2.08e-02

Pos logit contributions
 asks+0.015
 knocks+0.014
 walks+0.014
 trips+0.014
 points+0.013
Neg logit contributions
lessness-0.013
he-0.012
ill-0.011
 Suz-0.010
smanship-0.009
Token,
Token ID11
Feature activation+2.04e-02

Pos logit contributions
he+0.046
lessness+0.045
smanship+0.041
ill+0.040
 Suz+0.038
Neg logit contributions
 asks-0.030
 walks-0.027
 points-0.026
 trips-0.026
 strangers-0.025
Token but
Token ID475
Feature activation+1.35e-02

Pos logit contributions
lessness+0.030
he+0.030
ill+0.026
smanship+0.024
 Suz+0.023
Neg logit contributions
 asks-0.044
 walks-0.042
 knocks-0.041
 trips-0.041
 points-0.040
Token he
Token ID339
Feature activation-5.07e-03

Pos logit contributions
lessness+0.029
he+0.027
ill+0.025
 Suz+0.024
smanship+0.023
Neg logit contributions
 asks-0.022
 walks-0.020
 knocks-0.020
 trips-0.019
 points-0.019
Token answered
Token ID9373
Feature activation+5.88e-03

Pos logit contributions
 asks+0.009
 knocks+0.008
 walks+0.008
 trips+0.008
 throws+0.008
Neg logit contributions
lessness-0.010
he-0.009
ill-0.009
 Suz-0.008
taker-0.007
Token,
Token ID11
Feature activation-1.28e-03

Pos logit contributions
lessness+0.012
he+0.012
ill+0.011
 Suz+0.010
smanship+0.010
Neg logit contributions
 asks-0.009
 walks-0.009
 knocks-0.008
 trips-0.008
 points-0.008
Token "
Token ID366
Feature activation+2.13e-02

Pos logit contributions
 asks+0.002
 knocks+0.002
 trips+0.002
 walks+0.002
 throws+0.002
Neg logit contributions
lessness-0.003
he-0.003
ill-0.002
 Suz-0.002
taker-0.002
TokenIt
Token ID1026
Feature activation+1.20e-02

Pos logit contributions
lessness+0.045
he+0.042
ill+0.038
smanship+0.037
 Suz+0.037
Neg logit contributions
 asks-0.034
 walks-0.030
 knocks-0.030
 trips-0.029
 points-0.029
Token other
Token ID584
Feature activation-2.79e-02

Pos logit contributions
 knocks+0.019
 asks+0.019
 walks+0.018
 trips+0.017
 requests+0.016
Neg logit contributions
lessness-0.030
he-0.030
ill-0.027
 Suz-0.025
 tallest-0.024
Token one
Token ID530
Feature activation+2.99e-03

Pos logit contributions
 knocks+0.036
 asks+0.035
 walks+0.034
 requests+0.030
 trips+0.029
Neg logit contributions
lessness-0.060
he-0.059
ill-0.054
 Suz-0.050
cy-0.048
Token was
Token ID373
Feature activation+2.23e-03

Pos logit contributions
lessness+0.006
he+0.006
ill+0.005
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.005
 walks-0.005
 knocks-0.005
 trips-0.004
 requests-0.004
Token named
Token ID3706
Feature activation+3.27e-02

Pos logit contributions
lessness+0.004
 Suz+0.004
he+0.003
ill+0.003
taker+0.003
Neg logit contributions
 knocks-0.004
 asks-0.004
 walks-0.004
 trips-0.004
 points-0.004
Token Hudson
Token ID19995
Feature activation-1.22e-02

Pos logit contributions
lessness+0.061
he+0.058
ill+0.055
 Suz+0.055
taker+0.048
Neg logit contributions
 asks-0.057
 knocks-0.053
 walks-0.052
 trips-0.051
 points-0.049
Token.
Token ID13
Feature activation+8.69e-03

Pos logit contributions
 asks+0.022
 knocks+0.022
 walks+0.021
 trips+0.020
 throws+0.019
Neg logit contributions
lessness-0.023
he-0.021
ill-0.020
 Suz-0.018
smanship-0.018
Token They
Token ID1119
Feature activation+4.38e-03

Pos logit contributions
he+0.015
lessness+0.015
ill+0.013
 Suz+0.013
smanship+0.011
Neg logit contributions
 asks-0.015
 walks-0.014
 knocks-0.014
 trips-0.014
 points-0.014
Token lived
Token ID5615
Feature activation-1.15e-02

Pos logit contributions
lessness+0.008
he+0.008
ill+0.007
 Suz+0.007
smanship+0.006
Neg logit contributions
 asks-0.008
 knocks-0.008
 walks-0.008
 trips-0.007
 points-0.007
Token together
Token ID1978
Feature activation+3.64e-03

Pos logit contributions
 asks+0.019
 walks+0.018
 knocks+0.018
 trips+0.017
 points+0.017
Neg logit contributions
lessness-0.023
he-0.022
ill-0.020
 Suz-0.019
smanship-0.019
Token in
Token ID287
Feature activation+2.90e-03

Pos logit contributions
lessness+0.006
he+0.005
ill+0.005
 Suz+0.005
smanship+0.004
Neg logit contributions
 asks-0.007
 knocks-0.007
 walks-0.007
 requests-0.007
 yells-0.006
Token a
Token ID257
Feature activation+1.50e-04

Pos logit contributions
lessness+0.006
he+0.006
ill+0.005
 Suz+0.005
smanship+0.005
Neg logit contributions
 asks-0.005
 knocks-0.004
 walks-0.004
 trips-0.004
 points-0.004
Token.
Token ID13
Feature activation+8.73e-03

Pos logit contributions
lessness+0.004
he+0.004
ill+0.004
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.003
 walks-0.003
 knocks-0.003
 trips-0.003
 points-0.002
Token He
Token ID679
Feature activation-8.90e-03

Pos logit contributions
lessness+0.015
he+0.014
ill+0.013
 Suz+0.013
smanship+0.011
Neg logit contributions
 asks-0.017
 knocks-0.016
 walks-0.016
 trips-0.016
 points-0.015
Token began
Token ID2540
Feature activation-1.79e-02

Pos logit contributions
 asks+0.014
 knocks+0.014
 trips+0.013
 throws+0.013
 walks+0.013
Neg logit contributions
lessness-0.016
 Suz-0.015
he-0.015
ill-0.014
taker-0.012
Token to
Token ID284
Feature activation-1.30e-02

Pos logit contributions
 asks+0.034
 knocks+0.031
 throws+0.029
 happens+0.029
 points+0.029
Neg logit contributions
lessness-0.028
 Suz-0.026
he-0.024
ill-0.023
free-0.020
Token run
Token ID1057
Feature activation-1.90e-02

Pos logit contributions
 asks+0.021
 knocks+0.019
 trips+0.018
 points+0.018
 walks+0.018
Neg logit contributions
lessness-0.025
he-0.024
 Suz-0.023
ill-0.022
smanship-0.020
Token as
Token ID355
Feature activation+1.01e-02

Pos logit contributions
 walks+0.030
 trips+0.029
 asks+0.029
 knocks+0.028
 points+0.026
Neg logit contributions
lessness-0.041
he-0.037
smanship-0.035
ill-0.035
ini-0.033
Token fast
Token ID3049
Feature activation-2.35e-02

Pos logit contributions
lessness+0.021
ill+0.020
he+0.020
smanship+0.019
imm+0.019
Neg logit contributions
 asks-0.015
 walks-0.015
 trips-0.015
 knocks-0.014
 strangers-0.013
Token as
Token ID355
Feature activation+1.07e-02

Pos logit contributions
 asks+0.049
 knocks+0.044
 requests+0.042
 points+0.041
 trips+0.041
Neg logit contributions
lessness-0.037
he-0.033
 Suz-0.029
ill-0.027
free-0.027
Token he
Token ID339
Feature activation-9.46e-03

Pos logit contributions
lessness+0.024
he+0.024
ill+0.022
smanship+0.021
 Suz+0.020
Neg logit contributions
 asks-0.014
 walks-0.014
 trips-0.013
 knocks-0.013
 points-0.012
Token could
Token ID714
Feature activation+1.32e-02

Pos logit contributions
 asks+0.018
 knocks+0.017
 walks+0.017
 trips+0.016
 points+0.016
Neg logit contributions
lessness-0.017
he-0.015
ill-0.014
 Suz-0.014
smanship-0.012
Token.
Token ID13
Feature activation+9.07e-03

Pos logit contributions
lessness+0.029
he+0.027
ill+0.025
 Suz+0.024
smanship+0.024
Neg logit contributions
 asks-0.020
 walks-0.019
 knocks-0.019
 trips-0.018
 points-0.017
Token a
Token ID257
Feature activation-9.76e-03

Pos logit contributions
 asks+0.005
 knocks+0.004
 walks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
 Suz-0.005
smanship-0.005
Token big
Token ID1263
Feature activation+1.41e-02

Pos logit contributions
 asks+0.016
 walks+0.015
 knocks+0.015
 trips+0.015
 points+0.014
Neg logit contributions
lessness-0.020
he-0.019
ill-0.017
 Suz-0.016
smanship-0.016
Token white
Token ID2330
Feature activation-4.62e-03

Pos logit contributions
lessness+0.031
he+0.029
ill+0.027
smanship+0.025
 Suz+0.024
Neg logit contributions
 asks-0.022
 knocks-0.021
 walks-0.020
 trips-0.020
 points-0.019
Token bump
Token ID13852
Feature activation+1.77e-02

Pos logit contributions
 asks+0.008
 knocks+0.008
 walks+0.008
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.009
he-0.008
ill-0.008
 Suz-0.007
smanship-0.007
Token in
Token ID287
Feature activation+1.88e-02

Pos logit contributions
lessness+0.034
he+0.031
ill+0.029
 Suz+0.028
smanship+0.027
Neg logit contributions
 asks-0.030
 trips-0.028
 knocks-0.028
 points-0.027
 walks-0.027
Token the
Token ID262
Feature activation+3.02e-02

Pos logit contributions
lessness+0.039
he+0.037
ill+0.034
 Suz+0.032
smanship+0.031
Neg logit contributions
 asks-0.031
 knocks-0.028
 walks-0.028
 trips-0.027
 points-0.026
Token ground
Token ID2323
Feature activation+4.80e-03

Pos logit contributions
lessness+0.064
he+0.061
ill+0.057
smanship+0.054
 Suz+0.054
Neg logit contributions
 asks-0.045
 knocks-0.043
 walks-0.043
 trips-0.043
 strangers-0.040
Token!
Token ID0
Feature activation-3.83e-03

Pos logit contributions
lessness+0.009
he+0.009
ill+0.008
 Suz+0.008
smanship+0.007
Neg logit contributions
 asks-0.008
 knocks-0.008
 trips-0.008
 walks-0.008
 points-0.007
Token\n
Token ID198
Feature activation-6.67e-03

Pos logit contributions
 asks+0.007
 knocks+0.006
 walks+0.006
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.008
he-0.007
ill-0.007
 Suz-0.006
smanship-0.006
Token\n
Token ID198
Feature activation+5.38e-03

Pos logit contributions
 asks+0.008
 walks+0.007
 trips+0.007
 knocks+0.006
 points+0.006
Neg logit contributions
lessness-0.017
he-0.015
ill-0.015
smanship-0.015
taker-0.014
TokenMill
Token ID22603
Feature activation+1.08e-02

Pos logit contributions
lessness+0.007
he+0.006
ill+0.006
smanship+0.005
 Suz+0.005
Neg logit contributions
 asks-0.013
 walks-0.012
 knocks-0.012
 trips-0.012
 points-0.012
Token egg
Token ID5935
Feature activation-3.01e-02

Pos logit contributions
lessness+0.025
he+0.025
 Suz+0.021
ill+0.020
smanship+0.020
Neg logit contributions
 asks-0.023
 knocks-0.021
 walks-0.021
 trips-0.020
 points-0.019
Token hopped
Token ID45230
Feature activation-9.70e-04

Pos logit contributions
 asks+0.042
 knocks+0.038
 walks+0.037
 trips+0.036
 points+0.035
Neg logit contributions
lessness-0.069
he-0.066
ill-0.061
 Suz-0.059
smanship-0.057
Token happily
Token ID18177
Feature activation+1.68e-03

Pos logit contributions
 walks+0.001
 trips+0.001
 asks+0.001
 knocks+0.001
 points+0.001
Neg logit contributions
lessness-0.002
ill-0.002
smanship-0.002
 Suz-0.002
he-0.002
Token back
Token ID736
Feature activation-1.56e-02

Pos logit contributions
lessness+0.003
he+0.003
ill+0.003
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.003
 knocks-0.003
 walks-0.003
 trips-0.003
 points-0.003
Token home
Token ID1363
Feature activation-2.46e-02

Pos logit contributions
 asks+0.028
 knocks+0.026
 walks+0.025
 trips+0.025
 points+0.024
Neg logit contributions
lessness-0.030
he-0.028
ill-0.026
 Suz-0.025
smanship-0.024
Token.
Token ID13
Feature activation+3.29e-03

Pos logit contributions
 asks+0.031
 knocks+0.029
 walks+0.028
 trips+0.027
 points+0.026
Neg logit contributions
lessness-0.059
he-0.056
ill-0.052
smanship-0.050
 Suz-0.050
Token It
Token ID632
Feature activation+1.07e-02

Pos logit contributions
lessness+0.007
he+0.007
ill+0.006
smanship+0.006
ini+0.006
Neg logit contributions
 asks-0.005
 walks-0.005
 knocks-0.005
 trips-0.004
 points-0.004
Token had
Token ID550
Feature activation+2.63e-02

Pos logit contributions
lessness+0.026
he+0.025
ill+0.023
 Suz+0.022
smanship+0.022
Neg logit contributions
 asks-0.013
 knocks-0.012
 walks-0.012
 trips-0.011
 points-0.011
Token a
Token ID257
Feature activation-2.17e-04

Pos logit contributions
lessness+0.057
he+0.054
ill+0.051
 Suz+0.048
smanship+0.046
Neg logit contributions
 asks-0.040
 knocks-0.037
 walks-0.035
 trips-0.035
 points-0.033
Token wonderful
Token ID7932
Feature activation+8.03e-03

Pos logit contributions
 walks+0.000
 asks+0.000
 trips+0.000
 knocks+0.000
 throws+0.000
Neg logit contributions
lessness-0.001
he-0.000
ill-0.000
smanship-0.000
 Suz-0.000
Token time
Token ID640
Feature activation-5.24e-03

Pos logit contributions
lessness+0.016
he+0.014
ill+0.013
 Suz+0.013
smanship+0.012
Neg logit contributions
 asks-0.014
 walks-0.013
 knocks-0.013
 trips-0.013
 points-0.013
Token ran
Token ID4966
Feature activation-4.49e-03

Pos logit contributions
lessness+0.056
he+0.053
ill+0.049
 Suz+0.048
smanship+0.045
Neg logit contributions
 asks-0.029
 knocks-0.026
 walks-0.026
 trips-0.026
 points-0.025
Token away
Token ID1497
Feature activation-1.64e-02

Pos logit contributions
 walks+0.007
 trips+0.007
 asks+0.007
 knocks+0.007
 strangers+0.006
Neg logit contributions
lessness-0.009
he-0.008
ill-0.008
smanship-0.008
ini-0.008
Token.
Token ID13
Feature activation-2.50e-03

Pos logit contributions
 asks+0.019
 walks+0.018
 trips+0.018
 knocks+0.018
 points+0.017
Neg logit contributions
lessness-0.040
he-0.038
ill-0.035
smanship-0.035
ini-0.034
Token
Token ID198
Feature activation-7.02e-03

Pos logit contributions
 asks+0.004
 walks+0.004
 knocks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.005
he-0.005
ill-0.004
 Suz-0.004
smanship-0.004
Token
Token ID198
Feature activation+7.98e-03

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.009
 points+0.009
Neg logit contributions
lessness-0.015
he-0.015
ill-0.013
 Suz-0.013
smanship-0.013
TokenThe
Token ID464
Feature activation+8.51e-04

Pos logit contributions
lessness+0.014
he+0.013
ill+0.012
 Suz+0.011
smanship+0.010
Neg logit contributions
 asks-0.016
 knocks-0.015
 walks-0.015
 trips-0.015
 points-0.014
Token girl
Token ID2576
Feature activation-1.04e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.002
 knocks-0.002
 walks-0.002
 trips-0.002
 points-0.002
Token was
Token ID373
Feature activation+8.43e-04

Pos logit contributions
 asks+0.014
 knocks+0.013
 walks+0.013
 trips+0.013
 points+0.012
Neg logit contributions
lessness-0.023
he-0.022
ill-0.020
 Suz-0.019
smanship-0.018
Token very
Token ID845
Feature activation-7.53e-04

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.002
 knocks-0.002
 walks-0.002
 trips-0.002
 points-0.001
Token happy
Token ID3772
Feature activation+6.33e-03

Pos logit contributions
 asks+0.001
 walks+0.001
 trips+0.001
 knocks+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
 Suz-0.001
ill-0.001
taker-0.001
Token.
Token ID13
Feature activation+2.12e-02

Pos logit contributions
lessness+0.015
he+0.014
ill+0.013
smanship+0.013
 Suz+0.013
Neg logit contributions
 asks-0.009
 walks-0.008
 knocks-0.008
 trips-0.008
 points-0.007
Token put
Token ID1234
Feature activation-1.01e-02

Pos logit contributions
 asks+0.003
 knocks+0.003
 trips+0.003
 walks+0.003
 points+0.003
Neg logit contributions
lessness-0.004
he-0.003
 Suz-0.003
ill-0.003
smanship-0.003
Token them
Token ID606
Feature activation+5.88e-03

Pos logit contributions
 asks+0.007
 walks+0.006
 knocks+0.006
 trips+0.006
 points+0.005
Neg logit contributions
lessness-0.030
he-0.029
ill-0.028
 Suz-0.027
smanship-0.026
Token in
Token ID287
Feature activation+7.03e-03

Pos logit contributions
lessness+0.013
he+0.012
ill+0.011
smanship+0.011
 Suz+0.011
Neg logit contributions
 walks-0.008
 asks-0.008
 trips-0.008
 knocks-0.008
 points-0.007
Token the
Token ID262
Feature activation+1.36e-02

Pos logit contributions
lessness+0.019
he+0.017
ill+0.016
ini+0.016
taker+0.016
Neg logit contributions
 walks-0.007
 asks-0.007
 trips-0.006
 knocks-0.006
 points-0.006
Token bin
Token ID9874
Feature activation+7.07e-03

Pos logit contributions
lessness+0.028
he+0.026
ill+0.024
 Suz+0.023
smanship+0.023
Neg logit contributions
 asks-0.022
 walks-0.021
 knocks-0.021
 trips-0.020
 points-0.019
Token.
Token ID13
Feature activation-4.56e-04

Pos logit contributions
lessness+0.018
he+0.017
 Suz+0.015
ill+0.015
igsaw+0.015
Neg logit contributions
 asks-0.008
 trips-0.007
 walks-0.007
 knocks-0.007
 points-0.006
Token Then
Token ID3244
Feature activation+8.64e-04

Pos logit contributions
 asks+0.001
 walks+0.001
 knocks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
smanship-0.001
 Suz-0.001
Token she
Token ID673
Feature activation-1.66e-02

Pos logit contributions
lessness+0.002
he+0.002
ill+0.002
smanship+0.002
ini+0.002
Neg logit contributions
 asks-0.001
 walks-0.001
 trips-0.001
 knocks-0.001
 points-0.001
Token went
Token ID1816
Feature activation-2.42e-03

Pos logit contributions
 asks+0.025
 knocks+0.024
 walks+0.023
 strangers+0.023
 trips+0.023
Neg logit contributions
lessness-0.032
he-0.031
ill-0.031
 Suz-0.029
smanship-0.026
Token to
Token ID284
Feature activation-3.00e-03

Pos logit contributions
 asks+0.003
 walks+0.003
 knocks+0.003
 trips+0.003
 points+0.003
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
smanship-0.005
 Suz-0.005
Token a
Token ID257
Feature activation-1.08e-02

Pos logit contributions
 asks+0.005
 walks+0.004
 knocks+0.004
 trips+0.004
 points+0.004
Neg logit contributions
lessness-0.007
he-0.006
ill-0.006
 Suz-0.005
smanship-0.005
Token.
Token ID13
Feature activation-8.10e-03

Pos logit contributions
 asks+0.006
 knocks+0.005
 walks+0.005
 trips+0.005
 points+0.005
Neg logit contributions
lessness-0.011
he-0.010
 Suz-0.010
ill-0.010
smanship-0.009
Token She
Token ID1375
Feature activation+1.69e-03

Pos logit contributions
 asks+0.015
 knocks+0.014
 walks+0.014
 trips+0.013
 points+0.013
Neg logit contributions
lessness-0.015
he-0.014
ill-0.013
 Suz-0.012
smanship-0.012
Token was
Token ID373
Feature activation-4.07e-03

Pos logit contributions
lessness+0.004
he+0.003
ill+0.003
 Suz+0.003
smanship+0.003
Neg logit contributions
 asks-0.003
 walks-0.002
 knocks-0.002
 trips-0.002
 strangers-0.002
Token happy
Token ID3772
Feature activation+4.22e-03

Pos logit contributions
 asks+0.008
 knocks+0.007
 walks+0.007
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.007
he-0.007
ill-0.006
 Suz-0.006
smanship-0.006
Token because
Token ID780
Feature activation+1.20e-02

Pos logit contributions
lessness+0.009
he+0.009
ill+0.008
smanship+0.008
 Suz+0.008
Neg logit contributions
 asks-0.007
 walks-0.006
 knocks-0.006
 trips-0.006
 points-0.006
Token her
Token ID607
Feature activation-1.40e-02

Pos logit contributions
lessness+0.024
he+0.023
ill+0.021
smanship+0.020
 Suz+0.020
Neg logit contributions
 asks-0.021
 knocks-0.019
 walks-0.019
 trips-0.018
 points-0.017
Token hair
Token ID4190
Feature activation-8.29e-03

Pos logit contributions
 asks+0.019
 knocks+0.017
 walks+0.016
 trips+0.016
 points+0.015
Neg logit contributions
lessness-0.032
he-0.031
ill-0.027
 Suz-0.027
smanship-0.026
Token was
Token ID373
Feature activation+2.96e-02

Pos logit contributions
 asks+0.008
 walks+0.007
 knocks+0.007
 trips+0.007
 points+0.007
Neg logit contributions
lessness-0.022
he-0.021
ill-0.020
 Suz-0.020
smanship-0.019
Token dry
Token ID5894
Feature activation+3.55e-02

Pos logit contributions
lessness+0.057
he+0.054
ill+0.050
 Suz+0.047
smanship+0.047
Neg logit contributions
 asks-0.051
 walks-0.048
 knocks-0.047
 trips-0.046
 points-0.045
Token again
Token ID757
Feature activation+1.23e-02

Pos logit contributions
lessness+0.087
 Suz+0.084
he+0.080
ill+0.078
igsaw+0.076
Neg logit contributions
 trips-0.033
 walks-0.032
 asks-0.032
 knocks-0.030
 points-0.030
Token.
Token ID13
Feature activation-7.02e-03

Pos logit contributions
lessness+0.030
he+0.028
ill+0.026
 Suz+0.026
taker+0.025
Neg logit contributions
 asks-0.014
 walks-0.013
 trips-0.013
 knocks-0.012
 points-0.012
Token driver
Token ID4639
Feature activation-1.60e-02

Pos logit contributions
 asks+0.000
 knocks+0.000
 walks+0.000
 trips+0.000
 points+0.000
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
 Suz-0.001
smanship-0.001
Token gave
Token ID2921
Feature activation-8.99e-04

Pos logit contributions
 asks+0.021
 knocks+0.018
 walks+0.018
 trips+0.017
 points+0.017
Neg logit contributions
lessness-0.039
he-0.037
ill-0.034
 Suz-0.033
smanship-0.033
Token Tim
Token ID5045
Feature activation+4.13e-02

Pos logit contributions
 asks+0.001
 walks+0.001
 knocks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.002
he-0.002
ill-0.002
smanship-0.002
 Suz-0.002
Tokenmy
Token ID1820
Feature activation+2.39e-02

Pos logit contributions
lessness+0.069
he+0.060
umm+0.054
 Suz+0.053
smanship+0.052
Neg logit contributions
 walks-0.079
 asks-0.079
 trips-0.073
 throws-0.072
 knocks-0.072
Token a
Token ID257
Feature activation-6.50e-03

Pos logit contributions
lessness+0.058
he+0.054
ill+0.051
 Suz+0.049
smanship+0.048
Neg logit contributions
 asks-0.030
 walks-0.028
 knocks-0.027
 trips-0.027
 points-0.025
Token surprise
Token ID5975
Feature activation-4.24e-03

Pos logit contributions
 asks+0.010
 walks+0.009
 knocks+0.009
 trips+0.009
 points+0.009
Neg logit contributions
lessness-0.014
he-0.013
ill-0.012
 Suz-0.012
smanship-0.012
Token.
Token ID13
Feature activation-8.74e-04

Pos logit contributions
 asks+0.007
 walks+0.006
 knocks+0.006
 trips+0.006
 points+0.006
Neg logit contributions
lessness-0.009
he-0.008
ill-0.008
 Suz-0.007
smanship-0.007
Token He
Token ID679
Feature activation-3.57e-03

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.002
he-0.002
ill-0.002
smanship-0.002
 Suz-0.001
Token opened
Token ID4721
Feature activation+5.61e-03

Pos logit contributions
 asks+0.005
 strangers+0.005
 walks+0.005
 knocks+0.005
 trips+0.005
Neg logit contributions
lessness-0.007
he-0.007
ill-0.007
 Suz-0.006
smanship-0.006
Token one
Token ID530
Feature activation+2.23e-02

Pos logit contributions
lessness+0.014
he+0.012
ill+0.011
smanship+0.011
 Suz+0.011
Neg logit contributions
 asks-0.008
 walks-0.007
 trips-0.007
 knocks-0.007
 points-0.007
Token of
Token ID286
Feature activation+1.78e-02

Pos logit contributions
lessness+0.048
 Suz+0.039
ill+0.038
he+0.038
free+0.038
Neg logit contributions
 asks-0.035
 trips-0.034
 knocks-0.032
 points-0.032
 walks-0.032
Token She
Token ID1375
Feature activation-8.90e-03

Pos logit contributions
 walks+0.013
 asks+0.012
 knocks+0.012
 trips+0.012
 throws+0.012
Neg logit contributions
lessness-0.006
he-0.006
ill-0.005
 Suz-0.004
ay-0.003
Token does
Token ID857
Feature activation-2.01e-02

Pos logit contributions
Un+0.009
ves+0.009
 marriage+0.009
 passports+0.009
 strangers+0.008
Neg logit contributions
he-0.017
lessness-0.016
ill-0.015
cy-0.014
 tallest-0.014
Token not
Token ID407
Feature activation-1.70e-02

Pos logit contributions
 asks+0.029
 trips+0.025
 walks+0.025
 knocks+0.025
 cows+0.023
Neg logit contributions
he-0.041
lessness-0.041
ill-0.036
taker-0.034
 Suz-0.033
Token want
Token ID765
Feature activation-2.21e-02

Pos logit contributions
 asks+0.027
 walks+0.025
 knocks+0.025
 trips+0.024
 points+0.023
Neg logit contributions
lessness-0.036
he-0.034
ill-0.031
 Suz-0.030
smanship-0.030
Token to
Token ID284
Feature activation-6.75e-03

Pos logit contributions
 asks+0.039
 knocks+0.037
 walks+0.035
 points+0.035
 trips+0.035
Neg logit contributions
lessness-0.034
ill-0.033
he-0.032
 Suz-0.032
taker-0.026
Token stop
Token ID2245
Feature activation-3.70e-02

Pos logit contributions
 asks+0.011
 walks+0.010
 trips+0.010
 points+0.010
 knocks+0.010
Neg logit contributions
he-0.013
lessness-0.012
ill-0.012
 Suz-0.012
smanship-0.010
Token singing
Token ID13777
Feature activation-1.21e-02

Pos logit contributions
 asks+0.058
 knocks+0.055
 trips+0.053
 walks+0.053
 points+0.051
Neg logit contributions
lessness-0.072
he-0.066
ill-0.064
 Suz-0.063
smanship-0.053
Token.
Token ID13
Feature activation-1.00e-02

Pos logit contributions
 asks+0.018
 walks+0.016
 knocks+0.016
 trips+0.016
 points+0.015
Neg logit contributions
lessness-0.027
he-0.025
ill-0.024
 Suz-0.023
smanship-0.022
Token
Token ID198
Feature activation-1.30e-02

Pos logit contributions
 walks+0.022
 asks+0.021
 knocks+0.021
 trips+0.021
 throws+0.020
Neg logit contributions
lessness-0.013
he-0.013
ill-0.011
 Suz-0.010
ay-0.008
Token
Token ID198
Feature activation-4.49e-03

Pos logit contributions
 asks+0.023
 knocks+0.022
 walks+0.022
 points+0.021
 trips+0.021
Neg logit contributions
he-0.024
lessness-0.024
ill-0.020
 Suz-0.020
taker-0.017
Token"
Token ID1
Feature activation+9.39e-03

Pos logit contributions
 asks+0.008
 knocks+0.008
 walks+0.008
 trips+0.008
 points+0.008
Neg logit contributions
he-0.008
lessness-0.008
ill-0.006
 Suz-0.006
lessly-0.005
Token She
Token ID1375
Feature activation+2.41e-03

Pos logit contributions
lessness+0.012
he+0.011
ill+0.010
 Suz+0.010
smanship+0.009
Neg logit contributions
 asks-0.010
 knocks-0.009
 walks-0.009
 trips-0.009
 points-0.008
Token said
Token ID531
Feature activation-5.29e-03

Pos logit contributions
lessness+0.005
he+0.005
ill+0.004
 Suz+0.004
taker+0.004
Neg logit contributions
 asks-0.003
 knocks-0.003
 strangers-0.003
 trips-0.003
 walks-0.003
Token,
Token ID11
Feature activation-2.21e-02

Pos logit contributions
 asks+0.010
 knocks+0.009
 trips+0.009
 walks+0.009
 requests+0.008
Neg logit contributions
lessness-0.010
he-0.008
 Suz-0.008
ill-0.008
taker-0.007
Token "
Token ID366
Feature activation+1.13e-02

Pos logit contributions
 asks+0.033
 walks+0.031
 trips+0.031
 knocks+0.031
 points+0.028
Neg logit contributions
lessness-0.047
he-0.044
ill-0.042
 Suz-0.041
taker-0.037
TokenMom
Token ID29252
Feature activation-6.27e-04

Pos logit contributions
lessness+0.023
he+0.021
ill+0.019
 Suz+0.019
smanship+0.018
Neg logit contributions
 asks-0.020
 walks-0.018
 knocks-0.018
 trips-0.017
 points-0.017
Token,
Token ID11
Feature activation-4.82e-02

Pos logit contributions
 asks+0.001
 knocks+0.001
 walks+0.001
 trips+0.001
 points+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
smanship-0.001
 Suz-0.001
Token our
Token ID674
Feature activation-4.62e-02

Pos logit contributions
 asks+0.079
 walks+0.073
 knocks+0.072
 trips+0.071
 points+0.068
Neg logit contributions
lessness-0.099
he-0.093
ill-0.086
 Suz-0.083
smanship-0.078
Token carpet
Token ID20710
Feature activation-3.88e-02

Pos logit contributions
 asks+0.071
 knocks+0.061
 walks+0.060
 trips+0.057
 points+0.057
Neg logit contributions
lessness-0.091
he-0.087
 Suz-0.082
ill-0.079
taker-0.074
Token is
Token ID318
Feature activation+1.07e-04

Pos logit contributions
 asks+0.054
 knocks+0.049
 walks+0.049
 trips+0.047
 points+0.045
Neg logit contributions
lessness-0.089
he-0.085
ill-0.078
 Suz-0.075
smanship-0.074
Token dirty
Token ID11841
Feature activation+8.72e-03

Pos logit contributions
lessness+0.000
he+0.000
ill+0.000
 Suz+0.000
smanship+0.000
Neg logit contributions
 asks-0.000
 walks-0.000
 knocks-0.000
 trips-0.000
 points-0.000
Token.
Token ID13
Feature activation-7.43e-03

Pos logit contributions
lessness+0.019
he+0.018
ill+0.017
 Suz+0.017
smanship+0.016
Neg logit contributions
 asks-0.012
 knocks-0.011
 trips-0.011
 walks-0.011
 points-0.010
Token monkey
Token ID21657
Feature activation-3.18e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
 Suz+0.001
smanship+0.001
Neg logit contributions
 asks-0.000
 walks-0.000
 knocks-0.000
 trips-0.000
 points-0.000
Token."
Token ID526
Feature activation+5.55e-03

Pos logit contributions
 asks+0.054
 walks+0.049
 knocks+0.049
 trips+0.048
 points+0.046
Neg logit contributions
lessness-0.064
he-0.061
ill-0.055
 Suz-0.053
smanship-0.052
Token Then
Token ID3244
Feature activation+5.53e-03

Pos logit contributions
lessness+0.013
he+0.012
ill+0.011
smanship+0.011
 Suz+0.011
Neg logit contributions
 asks-0.008
 walks-0.007
 trips-0.007
 knocks-0.007
 points-0.007
Token,
Token ID11
Feature activation+1.08e-03

Pos logit contributions
lessness+0.011
he+0.011
ill+0.010
 Suz+0.009
smanship+0.009
Neg logit contributions
 asks-0.009
 knocks-0.008
 walks-0.008
 trips-0.008
 points-0.008
Token the
Token ID262
Feature activation+4.01e-03

Pos logit contributions
lessness+0.002
he+0.002
ill+0.002
 Suz+0.002
smanship+0.002
Neg logit contributions
 asks-0.002
 walks-0.002
 knocks-0.002
 trips-0.002
 points-0.001
Token tiger
Token ID26241
Feature activation-3.68e-02

Pos logit contributions
lessness+0.008
he+0.007
ill+0.006
 Suz+0.006
smanship+0.006
Neg logit contributions
 asks-0.007
 knocks-0.007
 walks-0.006
 trips-0.006
 points-0.006
Token waved
Token ID26834
Feature activation+4.63e-03

Pos logit contributions
 asks+0.053
 knocks+0.050
 walks+0.048
 trips+0.047
 points+0.045
Neg logit contributions
lessness-0.079
he-0.076
ill-0.071
 Suz-0.068
smanship-0.065
Token goodbye
Token ID24829
Feature activation+5.70e-04

Pos logit contributions
lessness+0.010
he+0.009
ill+0.008
smanship+0.008
 Suz+0.008
Neg logit contributions
 asks-0.007
 walks-0.007
 knocks-0.007
 points-0.007
 trips-0.007
Token and
Token ID290
Feature activation+2.24e-02

Pos logit contributions
lessness+0.001
he+0.001
ill+0.001
smanship+0.001
 Suz+0.001
Neg logit contributions
 asks-0.001
 walks-0.001
 knocks-0.001
 trips-0.001
 points-0.001
Token disappeared
Token ID12120
Feature activation-8.15e-03

Pos logit contributions
lessness+0.050
he+0.048
ill+0.044
 Suz+0.043
smanship+0.041
Neg logit contributions
 asks-0.032
 knocks-0.029
 walks-0.029
 trips-0.028
 points-0.027
Token into
Token ID656
Feature activation+7.13e-03

Pos logit contributions
 asks+0.011
 walks+0.011
 trips+0.010
 points+0.010
 knocks+0.010
Neg logit contributions
lessness-0.018
he-0.018
ill-0.016
smanship-0.015
 Suz-0.015
Token,
Token ID11
Feature activation+1.29e-02

Pos logit contributions
lessness+0.010
he+0.009
 Suz+0.009
smanship+0.009
taker+0.009
Neg logit contributions
 asks-0.005
 knocks-0.005
 walks-0.005
 trips-0.005
 points-0.005
Token watching
Token ID4964
Feature activation-1.46e-02

Pos logit contributions
lessness+0.027
he+0.026
ill+0.024
smanship+0.023
 Suz+0.023
Neg logit contributions
 asks-0.021
 knocks-0.019
 walks-0.019
 trips-0.018
 points-0.018
Token her
Token ID607
Feature activation-7.12e-03

Pos logit contributions
 asks+0.023
 knocks+0.021
 walks+0.021
 trips+0.020
 points+0.019
Neg logit contributions
lessness-0.031
he-0.030
ill-0.028
 Suz-0.026
smanship-0.025
Token.
Token ID13
Feature activation-8.16e-03

Pos logit contributions
 asks+0.011
 knocks+0.010
 walks+0.010
 trips+0.009
 points+0.009
Neg logit contributions
lessness-0.016
he-0.015
ill-0.014
 Suz-0.013
smanship-0.013
Token The
Token ID383
Feature activation-3.82e-03

Pos logit contributions
 asks+0.016
 knocks+0.016
 walks+0.015
 trips+0.015
 points+0.015
Neg logit contributions
lessness-0.014
he-0.013
ill-0.012
 Suz-0.011
smanship-0.010
Token magic
Token ID5536
Feature activation-3.88e-02

Pos logit contributions
 asks+0.006
 walks+0.006
 knocks+0.006
 trips+0.006
 points+0.005
Neg logit contributions
lessness-0.008
he-0.008
ill-0.007
 Suz-0.007
smanship-0.007
Token pencil
Token ID21613
Feature activation-1.57e-02

Pos logit contributions
 asks+0.061
 knocks+0.056
 walks+0.056
 trips+0.054
 points+0.052
Neg logit contributions
lessness-0.083
he-0.078
ill-0.072
 Suz-0.069
smanship-0.067
Token feels
Token ID5300
Feature activation-1.88e-03

Pos logit contributions
 asks+0.019
 knocks+0.017
 walks+0.017
 trips+0.017
 points+0.016
Neg logit contributions
lessness-0.039
he-0.037
ill-0.034
 Suz-0.033
smanship-0.032
Token sorry
Token ID7926
Feature activation+2.01e-02

Pos logit contributions
 asks+0.004
 knocks+0.003
 walks+0.003
 trips+0.003
 points+0.003
Neg logit contributions
lessness-0.003
he-0.003
ill-0.003
 Suz-0.003
smanship-0.002
Token and
Token ID290
Feature activation+9.84e-03

Pos logit contributions
lessness+0.040
he+0.039
ill+0.036
 Suz+0.035
smanship+0.034
Neg logit contributions
 asks-0.033
 knocks-0.031
 walks-0.031
 trips-0.030
 points-0.029
Token lonely
Token ID21757
Feature activation-9.27e-03

Pos logit contributions
lessness+0.020
he+0.018
ill+0.016
 Suz+0.016
taker+0.015
Neg logit contributions
 asks-0.016
 knocks-0.015
 trips-0.015
 walks-0.015
 strangers-0.014
Token opened
Token ID4721
Feature activation+3.64e-03

Pos logit contributions
 asks+0.049
 knocks+0.045
 walks+0.045
 trips+0.044
 points+0.041
Neg logit contributions
lessness-0.077
he-0.073
ill-0.068
 Suz-0.065
smanship-0.063
Token the
Token ID262
Feature activation+4.92e-03

Pos logit contributions
lessness+0.008
he+0.008
ill+0.007
smanship+0.007
 Suz+0.007
Neg logit contributions
 asks-0.005
 walks-0.005
 knocks-0.005
 trips-0.005
 points-0.005
Token ig
Token ID45329
Feature activation-9.20e-04

Pos logit contributions
lessness+0.010
he+0.010
ill+0.009
 Suz+0.008
smanship+0.008
Neg logit contributions
 asks-0.008
 walks-0.008
 knocks-0.008
 trips-0.007
 points-0.007
Tokenloo
Token ID29680
Feature activation-8.16e-03

Pos logit contributions
 asks+0.002
 knocks+0.002
 trips+0.002
 points+0.001
am+0.001
Neg logit contributions
lessness-0.001
he-0.001
ill-0.001
igsaw-0.001
smanship-0.001
Token and
Token ID290
Feature activation+3.27e-03

Pos logit contributions
 asks+0.010
 knocks+0.010
 trips+0.009
 walks+0.009
 points+0.009
Neg logit contributions
lessness-0.019
he-0.017
 Suz-0.016
ill-0.016
smanship-0.015
Token pe
Token ID613
Feature activation-5.89e-02

Pos logit contributions
lessness+0.007
he+0.007
ill+0.006
 Suz+0.006
taker+0.006
Neg logit contributions
 asks-0.005
 knocks-0.004
 walks-0.004
 trips-0.004
 points-0.004
Tokenered
Token ID1068
Feature activation-2.09e-02

Pos logit contributions
 asks+0.089
 knocks+0.082
 walks+0.082
 trips+0.080
 strangers+0.075
Neg logit contributions
lessness-0.127
he-0.120
ill-0.114
 Suz-0.108
smanship-0.105
Token inside
Token ID2641
Feature activation-1.74e-02

Pos logit contributions
 asks+0.032
 walks+0.031
 trips+0.029
 knocks+0.029
 points+0.029
Neg logit contributions
lessness-0.045
he-0.043
ill-0.039
smanship-0.038
 Suz-0.036
Token.
Token ID13
Feature activation-3.04e-03

Pos logit contributions
 asks+0.023
 walks+0.020
 knocks+0.020
 points+0.020
 trips+0.019
Neg logit contributions
lessness-0.041
he-0.039
ill-0.036
smanship-0.035
 Suz-0.034
Token She
Token ID1375
Feature activation-1.98e-03

Pos logit contributions
 asks+0.006
 knocks+0.006
 walks+0.006
 trips+0.006
 throws+0.006
Neg logit contributions
lessness-0.005
he-0.005
 Suz-0.004
ill-0.004
taker-0.004
Token found
Token ID1043
Feature activation-5.08e-04

Pos logit contributions
 asks+0.003
 knocks+0.003
 walks+0.003
 trips+0.003
 strangers+0.003
Neg logit contributions
lessness-0.004
he-0.004
ill-0.004
 Suz-0.004
smanship-0.003