TOP ACTIVATIONS
MAX = 3.324

they played in the garden and had lots of fun .
and wrapped one of their arms around Bobby 's shoulders .
mouth and run around the yard . One day
to have fun , it should never come before taking care
the story is : You should always ask your parents before
Tim my learned that he should complete his math and not
snow man and wrapped his arms around him .
toys and run around the yard . One day , Max
of the story : kids should never try to use grown
the story is : You should not be ignorant and selfish
Every day they played together and joked around . But one
and looking for a carrot trail ," replied Benny .
so he hid a special camera . The bear searched high
hug sound and wraps his arms around the paper .
cricket was sitting on a leaf outside . Suddenly , it
and your mind . You should not play too much or
hug sound and wraps her arms around the paper .
said that Lily and Ben should say sorry and thank you
an important lesson - we should respect the opinion of others
. They always played together and had lots of fun .

MIN ACTIVATIONS
MIN = -3.546

was all broken . Wil ma felt a big
suddenly Lucy heard a loud " SC REAM "! She felt
. It was surrounded by other weak trees in the forest
. One day , Wil ma tri pped and
, the tree shook and an apple fell to the ground
in a jungle with many other animals like lions , tigers
, he heard a loud " Z oom !"
a big tree with many other cher ries . The smooth
Help !" Tom screamed . " There 's a lion in
. She wanted to fill her basket with the needles .
can 't just play with other people 's things without asking
lived on a farm with other animals . The pony liked
One day , while they were walking , Tim my
big tree with lots of other squirrel s . They were
that day : never load your wagon with p ears near
in a big field with other animals . One
. One day , while they were walking , Lily found
in a big field with other animals . One
any friends . All the other trees were big and strong
" Don 't eat too many ." L ily

INTERVAL 1.606 - 3.324
CONTAINS 0.610%

and sometimes they miss . One day , a
OK , OK , you can play with us . But
He felt sick and he wanted to cry . He said
and wrapped it in a box . Sarah said
and promised herself that she would always be open to trying

INTERVAL -0.111 - 1.606
CONTAINS 55.954%

cry . " O w , ow , ow
capt ivated , she decided to stay and rest for a
day . The end . <|endoftext|> Once upon a time ,
future . <|endoftext|> Once upon a time , there was a
and learned many things . She was a gifted girl who

INTERVAL -1.828 - -0.111
CONTAINS 43.156%

who wanted to buy a toy . He asked his mum
so the puppy could run and play .
a smile on his face , the j olly pirate set
the end , Tom helped Kitty unt angle the knot .
deserve them ." L ily and Ben are sad

INTERVAL -3.546 - -1.828
CONTAINS 0.280%

new friend . They share their food with Polly and listen
she wanted to make her daughter happy .
Tom . They were both three years old and loved to
you ." Twe ety flew out of the cage and landed
duck !" Mom says . " She is not food ,
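
The four interval headings above are consistent with splitting the range from MIN = -3.546 to MAX = 3.324 into four equal-width bins, and the four CONTAINS shares sum to 100%. The sketch below reproduces that arithmetic. It assumes equal-width binning, which the dump does not state, and the names `interval_edges`, `MIN_ACT`, `MAX_ACT` and `shares` are illustrative only.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Values copied from the MIN / MAX lines of the dump.
MIN_ACT, MAX_ACT = -3.546, 3.324
edges = interval_edges(MIN_ACT, MAX_ACT)

# List the bins from highest to lowest, matching the order used in the dump.
for lower, upper in reversed(list(zip(edges[:-1], edges[1:]))):
    print(f"INTERVAL {lower:.3f} - {upper:.3f}")

# CONTAINS percentages copied from the dump, highest interval first.
shares = [0.610, 55.954, 43.156, 0.280]
total_share = sum(shares)
```

The printed edges may differ from the dump in the last digit, since the dump's own rounding rule is unknown.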
Token they
Token ID484
Feature activation+1.48e+00

Pos logit contributions
abella+0.031
top+0.031
 wrong+0.030
rant+0.027
bench+0.027
Neg logit contributions
hr-0.023
packs-0.022
rum-0.022
cest-0.022
ment-0.022
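
Each per-token record in this dump uses the same flattened layout: the token string, its ID, the feature activation, then five positive and five negative logit contributions, with each label fused to its signed value. Below is a minimal sketch of how one record could be read back into a structured form. It assumes that layout holds throughout; `parse_record` and `VALUE_RE` are hypothetical helpers, not part of any tool that produced this dump. Leading spaces in token strings are kept because they are part of the token.

```python
import re

# A label followed directly by a signed decimal, optionally in e-notation.
VALUE_RE = re.compile(r"^(.*?)([+-]\d+\.\d+(?:e[+-]\d+)?)$")


def parse_record(lines):
    """Parse one flattened token record into a dict (layout is an assumption)."""
    token = lines[0][len("Token"):]              # keeps any leading space
    token_id = int(lines[1][len("Token ID"):])
    activation = float(VALUE_RE.match(lines[2]).group(2))
    pos, neg, current = [], [], None
    for line in lines[3:]:
        if line.startswith("Pos logit contributions"):
            current = pos
        elif line.startswith("Neg logit contributions"):
            current = neg
        elif line.strip() and current is not None:
            m = VALUE_RE.match(line)
            if m:
                current.append((m.group(1), float(m.group(2))))
    return {"token": token, "id": token_id, "activation": activation,
            "pos": pos, "neg": neg}


# The first record of the dump, copied line by line.
record = parse_record([
    "Token they", "Token ID484", "Feature activation+1.48e+00", "",
    "Pos logit contributions",
    "abella+0.031", "top+0.031", " wrong+0.030", "rant+0.027", "bench+0.027",
    "Neg logit contributions",
    "hr-0.023", "packs-0.022", "rum-0.022", "cest-0.022", "ment-0.022",
])
```
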
Token played
Token ID2826
Feature activation+3.92e-03

Pos logit contributions
abella+0.050
 wrong+0.043
top+0.042
bench+0.035
gh+0.034
Neg logit contributions
 exc-0.061
cest-0.057
packs-0.055
oc-0.054
 mis-0.054
Token in
Token ID287
Feature activation-1.32e-01

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
rant+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token the
Token ID262
Feature activation+1.27e+00

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token garden
Token ID11376
Feature activation+1.43e+00

Pos logit contributions
abella+0.063
top+0.053
bench+0.053
 versa+0.052
 wrong+0.052
Neg logit contributions
cest-0.029
AAAA-0.029
hr-0.028
 tou-0.027
 oak-0.027
Token and
Token ID290
Feature activation+3.32e+00

Pos logit contributions
abella+0.076
 versa+0.062
 wrong+0.061
top+0.060
bench+0.058
Neg logit contributions
packs-0.033
hr-0.033
cest-0.031
 exc-0.030
oc-0.029
Token had
Token ID550
Feature activation+1.46e+00

Pos logit contributions
bench+0.149
 wrong+0.146
top+0.142
inner+0.137
 virtue+0.136
Neg logit contributions
 ge-0.072
 squirrel-0.071
packs-0.065
 pets-0.064
 palm-0.063
Token lots
Token ID6041
Feature activation-1.11e+00

Pos logit contributions
abella+0.064
top+0.060
bench+0.057
rant+0.057
 wrong+0.056
Neg logit contributions
hr-0.038
cest-0.035
Upon-0.034
packs-0.034
AAAA-0.033
Token of
Token ID286
Feature activation-3.43e-01

Pos logit contributions
rum+0.041
hr+0.041
AAAA+0.040
cest+0.037
oc+0.035
Neg logit contributions
 wrong-0.038
top-0.037
abella-0.036
 contest-0.030
 weekend-0.030
Token fun
Token ID1257
Feature activation-6.24e-01

Pos logit contributions
hr+0.016
rum+0.016
cest+0.016
AAAA+0.015
ment+0.015
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.008
bench-0.006
 weekend-0.005
Token.
Token ID13
Feature activation+1.84e+00

Pos logit contributions
hr+0.024
rum+0.023
AAAA+0.023
cest+0.023
 tou+0.022
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.019
bench-0.015
 weekend-0.014
Token and
Token ID290
Feature activation+6.35e-02

Pos logit contributions
hr+0.013
rum+0.012
AAAA+0.011
cest+0.011
oc+0.011
Neg logit contributions
abella-0.019
 wrong-0.019
top-0.019
bench-0.016
rant-0.015
Token wrapped
Token ID12908
Feature activation+1.84e+00

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.002
rant+0.002
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token one
Token ID530
Feature activation+6.95e-01

Pos logit contributions
abella+0.086
top+0.078
 contest+0.078
 wrong+0.075
bench+0.075
Neg logit contributions
ment-0.046
iring-0.041
rum-0.040
adoes-0.037
ries-0.037
Token of
Token ID286
Feature activation+1.10e-01

Pos logit contributions
abella+0.029
 wrong+0.025
top+0.025
 versa+0.023
bench+0.023
Neg logit contributions
hr-0.021
rum-0.021
cest-0.020
ment-0.020
oc-0.019
Token their
Token ID511
Feature activation-1.24e+00

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.005
bench+0.004
 versa+0.004
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
ment-0.002
Token arms
Token ID5101
Feature activation+3.29e+00

Pos logit contributions
hr+0.041
rum+0.038
cest+0.038
AAAA+0.038
oc+0.036
Neg logit contributions
 wrong-0.049
abella-0.046
top-0.045
 would-0.038
 weekend-0.038
Token around
Token ID1088
Feature activation-1.15e-01

Pos logit contributions
abella+0.177
 versa+0.146
 cinema+0.145
 Dame+0.145
 halftime+0.143
Neg logit contributions
 as-0.066
 from-0.065
iring-0.060
 behind-0.057
 than-0.057
Token Bobby
Token ID17155
Feature activation+1.05e+00

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token's
Token ID338
Feature activation+1.22e-02

Pos logit contributions
abella+0.047
 wrong+0.041
top+0.040
 contest+0.037
bench+0.036
Neg logit contributions
hr-0.030
rum-0.029
oc-0.028
cest-0.028
ment-0.027
Token shoulders
Token ID12450
Feature activation+2.34e+00

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.000
rant+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token.
Token ID13
Feature activation+4.65e-01

Pos logit contributions
abella+0.126
 wrong+0.104
 Dame+0.101
 Luke+0.100
 versa+0.099
Neg logit contributions
rum-0.045
iring-0.045
hr-0.044
 exc-0.043
ment-0.043
Token mouth
Token ID5422
Feature activation+8.82e-01

Pos logit contributions
hr+0.033
rum+0.032
cest+0.030
ment+0.030
AAAA+0.029
Neg logit contributions
 wrong-0.031
abella-0.030
top-0.029
bench-0.024
 Luke-0.024
Token and
Token ID290
Feature activation-7.09e-02

Pos logit contributions
abella+0.046
 wrong+0.040
top+0.039
 versa+0.037
 Dame+0.035
Neg logit contributions
hr-0.019
packs-0.018
cest-0.018
rum-0.017
ment-0.017
Token run
Token ID1057
Feature activation+7.85e-01

Pos logit contributions
hr+0.003
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.003
 wrong-0.002
top-0.002
bench-0.002
rant-0.002
Token around
Token ID1088
Feature activation+6.51e-01

Pos logit contributions
abella+0.029
 wrong+0.026
top+0.026
bench+0.022
 Luke+0.021
Neg logit contributions
oc-0.027
hr-0.026
cest-0.026
rum-0.026
AAAA-0.025
Token the
Token ID262
Feature activation+1.63e-01

Pos logit contributions
abella+0.031
top+0.028
 wrong+0.026
bench+0.025
rant+0.024
Neg logit contributions
hr-0.017
rum-0.016
oc-0.016
 exc-0.015
cest-0.015
Token yard
Token ID12699
Feature activation+3.10e+00

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.005
cest-0.005
rum-0.005
AAAA-0.005
oc-0.005
Token.
Token ID13
Feature activation+5.38e-01

Pos logit contributions
abella+0.179
acey+0.149
 versa+0.144
rant+0.141
 Dame+0.139
Neg logit contributions
 than-0.054
 exc-0.050
hr-0.050
cest-0.049
packs-0.047
Token
Token ID198
Feature activation+9.37e-01

Pos logit contributions
abella+0.023
 wrong+0.021
top+0.021
rant+0.018
bench+0.018
Neg logit contributions
oc-0.016
rum-0.015
hr-0.015
AAAA-0.015
 exc-0.014
Token
Token ID198
Feature activation+6.02e-01

Pos logit contributions
abella+0.040
 wrong+0.039
top+0.038
rant+0.033
work+0.033
Neg logit contributions
hr-0.025
rum-0.025
oc-0.025
AAAA-0.024
 exc-0.023
TokenOne
Token ID3198
Feature activation-6.83e-01

Pos logit contributions
abella+0.018
top+0.016
 wrong+0.016
work+0.013
bench+0.013
Neg logit contributions
oc-0.026
AAAA-0.025
rum-0.025
hr-0.024
cest-0.024
Token day
Token ID1110
Feature activation-6.60e-01

Pos logit contributions
hr+0.037
rum+0.037
AAAA+0.035
cest+0.035
ment+0.035
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.010
 weekend-0.008
 morning-0.006
Token to
Token ID284
Feature activation+1.35e+00

Pos logit contributions
abella+0.028
top+0.025
 wrong+0.022
rant+0.021
 versa+0.020
Neg logit contributions
rum-0.033
hr-0.032
ment-0.032
cest-0.031
packs-0.030
Token have
Token ID423
Feature activation+5.14e-01

Pos logit contributions
abella+0.059
top+0.053
 wrong+0.051
bench+0.047
 projector+0.045
Neg logit contributions
hr-0.040
rum-0.039
 exc-0.039
 mis-0.036
cest-0.035
Token fun
Token ID1257
Feature activation-3.36e-01

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
rant+0.014
bench+0.014
Neg logit contributions
hr-0.019
rum-0.018
cest-0.018
AAAA-0.018
ment-0.017
Token,
Token ID11
Feature activation-2.52e-01

Pos logit contributions
hr+0.011
rum+0.011
cest+0.011
AAAA+0.010
oc+0.010
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
 versa-0.010
Token it
Token ID340
Feature activation+1.11e+00

Pos logit contributions
hr+0.010
rum+0.010
cest+0.009
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.008
bench-0.006
 versa-0.006
Token should
Token ID815
Feature activation+3.09e+00

Pos logit contributions
abella+0.042
top+0.037
 wrong+0.034
bench+0.033
 versa+0.032
Neg logit contributions
hr-0.038
rum-0.036
 exc-0.036
oc-0.036
cest-0.035
Token never
Token ID1239
Feature activation+5.62e-01

Pos logit contributions
abella+0.134
top+0.117
 projector+0.108
rant+0.106
bench+0.105
Neg logit contributions
 exc-0.097
rum-0.087
 mis-0.086
 tou-0.083
 fear-0.082
Token come
Token ID1282
Feature activation-8.63e-01

Pos logit contributions
abella+0.025
top+0.022
 wrong+0.022
bench+0.020
rant+0.019
Neg logit contributions
hr-0.017
cest-0.015
rum-0.015
 exc-0.015
AAAA-0.015
Token before
Token ID878
Feature activation-6.85e-01

Pos logit contributions
hr+0.038
rum+0.037
AAAA+0.037
cest+0.036
oc+0.036
Neg logit contributions
 wrong-0.023
abella-0.023
top-0.022
rant-0.016
work-0.016
Token taking
Token ID2263
Feature activation+1.18e-01

Pos logit contributions
hr+0.026
rum+0.025
cest+0.025
AAAA+0.024
oc+0.024
Neg logit contributions
abella-0.023
 wrong-0.023
top-0.022
bench-0.017
 Luke-0.017
Token care
Token ID1337
Feature activation-2.62e-02

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
ment-0.004
Token the
Token ID262
Feature activation+1.19e+00

Pos logit contributions
abella+0.034
top+0.034
 wrong+0.033
bench+0.029
 versa+0.028
Neg logit contributions
hr-0.020
rum-0.018
oc-0.018
cest-0.018
AAAA-0.017
Token story
Token ID1621
Feature activation+1.01e+00

Pos logit contributions
top+0.056
abella+0.052
rant+0.047
 wrong+0.046
bench+0.045
Neg logit contributions
 tou-0.032
hr-0.031
 exc-0.031
AAAA-0.030
 mis-0.028
Token is
Token ID318
Feature activation+8.56e-01

Pos logit contributions
abella+0.043
top+0.039
 wrong+0.036
 versa+0.035
rant+0.034
Neg logit contributions
rum-0.032
oc-0.029
ment-0.029
packs-0.029
hr-0.028
Token:
Token ID25
Feature activation+8.53e-01

Pos logit contributions
abella+0.029
top+0.028
bench+0.025
 wrong+0.025
 versa+0.024
Neg logit contributions
hr-0.032
rum-0.031
iring-0.030
ment-0.029
 exc-0.029
Token You
Token ID921
Feature activation+2.81e-01

Pos logit contributions
abella+0.034
 wrong+0.031
top+0.031
bench+0.028
rant+0.027
Neg logit contributions
hr-0.029
rum-0.028
 exc-0.027
oc-0.027
AAAA-0.026
Token should
Token ID815
Feature activation+3.08e+00

Pos logit contributions
abella+0.010
top+0.010
 wrong+0.009
bench+0.008
rant+0.008
Neg logit contributions
hr-0.010
rum-0.010
cest-0.009
 exc-0.009
AAAA-0.009
Token always
Token ID1464
Feature activation+1.33e+00

Pos logit contributions
abella+0.144
top+0.124
 projector+0.119
rant+0.116
work+0.110
Neg logit contributions
 exc-0.089
rum-0.086
 mis-0.072
 worry-0.072
 fear-0.070
Token ask
Token ID1265
Feature activation+1.30e-01

Pos logit contributions
abella+0.064
top+0.055
 wrong+0.052
rant+0.047
side+0.046
Neg logit contributions
rum-0.038
 exc-0.038
hr-0.038
 mis-0.035
cest-0.033
Token your
Token ID534
Feature activation+3.97e-01

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
 versa+0.003
Neg logit contributions
hr-0.005
rum-0.005
cest-0.004
AAAA-0.004
ment-0.004
Token parents
Token ID3397
Feature activation+7.73e-01

Pos logit contributions
abella+0.019
top+0.018
 wrong+0.017
bench+0.016
rant+0.016
Neg logit contributions
hr-0.010
rum-0.009
cest-0.009
 exc-0.009
packs-0.009
Token before
Token ID878
Feature activation+4.92e-02

Pos logit contributions
abella+0.032
top+0.029
 wrong+0.028
bench+0.026
 versa+0.025
Neg logit contributions
hr-0.024
rum-0.022
cest-0.022
oc-0.021
packs-0.021
Token Tim
Token ID5045
Feature activation+6.79e-01

Pos logit contributions
rum+0.056
hr+0.055
AAAA+0.054
cest+0.054
ment+0.054
Neg logit contributions
 wrong-0.026
top-0.023
abella-0.023
 Jack-0.019
 only-0.017
Tokenmy
Token ID1820
Feature activation+1.04e+00

Pos logit contributions
abella+0.021
top+0.020
 wrong+0.018
 versa+0.016
bench+0.015
Neg logit contributions
hr-0.027
cest-0.027
rum-0.026
oc-0.026
 exc-0.025
Token learned
Token ID4499
Feature activation+2.62e-01

Pos logit contributions
abella+0.038
top+0.035
 wrong+0.033
rant+0.030
 versa+0.029
Neg logit contributions
cest-0.035
hr-0.035
oc-0.035
rum-0.034
 exc-0.034
Token that
Token ID326
Feature activation+1.06e+00

Pos logit contributions
abella+0.007
top+0.007
 wrong+0.006
rant+0.005
bench+0.005
Neg logit contributions
hr-0.012
rum-0.011
cest-0.011
AAAA-0.011
ment-0.011
Token he
Token ID339
Feature activation+1.11e+00

Pos logit contributions
abella+0.042
top+0.038
rant+0.036
 wrong+0.034
 Dame+0.030
Neg logit contributions
hr-0.035
 exc-0.035
rum-0.034
cest-0.033
 accidents-0.032
Token should
Token ID815
Feature activation+3.08e+00

Pos logit contributions
abella+0.046
top+0.044
 wrong+0.039
bench+0.039
side+0.038
Neg logit contributions
hr-0.031
 exc-0.031
rum-0.030
oc-0.030
cest-0.030
Token complete
Token ID1844
Feature activation+5.35e-01

Pos logit contributions
abella+0.146
top+0.127
 projector+0.123
rant+0.123
 cinema+0.116
Neg logit contributions
 exc-0.091
rum-0.081
 mis-0.074
 worry-0.068
 heel-0.068
Token his
Token ID465
Feature activation-1.15e+00

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.019
rant+0.017
bench+0.017
Neg logit contributions
hr-0.017
rum-0.016
cest-0.015
packs-0.015
ment-0.015
Token math
Token ID10688
Feature activation-2.92e-01

Pos logit contributions
cest+0.061
hr+0.061
rum+0.060
AAAA+0.059
ment+0.059
Neg logit contributions
 wrong-0.023
abella-0.021
top-0.018
 contest-0.013
bench-0.013
Token and
Token ID290
Feature activation+7.96e-01

Pos logit contributions
hr+0.013
rum+0.013
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.007
bench-0.006
 versa-0.005
Token not
Token ID407
Feature activation+1.11e+00

Pos logit contributions
abella+0.027
top+0.024
 wrong+0.022
rant+0.021
 versa+0.020
Neg logit contributions
hr-0.031
 exc-0.029
cest-0.029
rum-0.029
ment-0.028
Token snow
Token ID6729
Feature activation+1.19e+00

Pos logit contributions
abella+0.023
top+0.021
 wrong+0.020
bench+0.018
 versa+0.018
Neg logit contributions
hr-0.014
cest-0.014
rum-0.014
AAAA-0.014
oc-0.013
Tokenman
Token ID805
Feature activation+2.25e-01

Pos logit contributions
abella+0.044
 wrong+0.037
top+0.035
 versa+0.032
bench+0.029
Neg logit contributions
oc-0.045
cest-0.045
hr-0.044
ment-0.043
rum-0.043
Token and
Token ID290
Feature activation-5.70e-01

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.009
 versa+0.009
Neg logit contributions
hr-0.006
rum-0.005
cest-0.005
oc-0.005
packs-0.005
Token wrapped
Token ID12908
Feature activation+1.94e+00

Pos logit contributions
hr+0.022
rum+0.022
cest+0.021
AAAA+0.020
iring+0.020
Neg logit contributions
abella-0.020
top-0.018
 wrong-0.018
bench-0.014
 Luke-0.014
Token his
Token ID465
Feature activation-1.64e+00

Pos logit contributions
abella+0.089
 wrong+0.085
top+0.082
bench+0.080
 weekend+0.079
Neg logit contributions
ment-0.045
ries-0.041
rum-0.040
hr-0.038
iring-0.037
Token arms
Token ID5101
Feature activation+3.07e+00

Pos logit contributions
hr+0.065
Upon+0.057
cest+0.057
AAAA+0.056
iring+0.055
Neg logit contributions
 wrong-0.056
abella-0.048
top-0.048
 would-0.042
gh-0.042
Token around
Token ID1088
Feature activation+6.92e-01

Pos logit contributions
abella+0.164
 versa+0.136
 halftime+0.132
 Dame+0.132
 cinema+0.132
Neg logit contributions
 than-0.061
 from-0.061
 as-0.059
 behind-0.057
rum-0.055
Token him
Token ID683
Feature activation+3.80e-02

Pos logit contributions
abella+0.031
top+0.029
 wrong+0.029
bench+0.026
 versa+0.025
Neg logit contributions
hr-0.018
rum-0.018
ment-0.016
oc-0.016
cest-0.016
Token.
Token ID13
Feature activation-2.99e-01

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token
Token ID198
Feature activation+3.46e-01

Pos logit contributions
hr+0.011
rum+0.011
cest+0.011
AAAA+0.010
ment+0.010
Neg logit contributions
abella-0.011
top-0.010
 wrong-0.010
bench-0.008
 Luke-0.008
Token
Token ID198
Feature activation-6.01e-01

Pos logit contributions
abella+0.014
 wrong+0.013
top+0.013
bench+0.011
rant+0.011
Neg logit contributions
hr-0.011
rum-0.010
AAAA-0.010
cest-0.010
oc-0.010
Token toys
Token ID14958
Feature activation+5.53e-01

Pos logit contributions
hr+0.067
rum+0.065
ment+0.065
AAAA+0.060
cest+0.057
Neg logit contributions
 wrong-0.067
top-0.054
abella-0.053
 would-0.051
 only-0.051
Token and
Token ID290
Feature activation+9.46e-02

Pos logit contributions
abella+0.029
top+0.026
 wrong+0.025
 versa+0.023
rant+0.023
Neg logit contributions
hr-0.011
cest-0.011
packs-0.010
 exc-0.010
rum-0.010
Token run
Token ID1057
Feature activation+4.58e-01

Pos logit contributions
abella+0.004
top+0.003
 wrong+0.003
bench+0.003
rant+0.003
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
 exc-0.003
Token around
Token ID1088
Feature activation+7.84e-01

Pos logit contributions
abella+0.015
top+0.013
 wrong+0.013
bench+0.010
rant+0.010
Neg logit contributions
hr-0.018
rum-0.018
oc-0.018
cest-0.017
AAAA-0.017
Token the
Token ID262
Feature activation+8.81e-01

Pos logit contributions
abella+0.037
top+0.032
 wrong+0.031
rant+0.028
bench+0.028
Neg logit contributions
hr-0.021
oc-0.020
rum-0.019
 mis-0.018
 exc-0.018
Token yard
Token ID12699
Feature activation+3.07e+00

Pos logit contributions
abella+0.042
top+0.037
 wrong+0.036
 versa+0.034
rant+0.034
Neg logit contributions
hr-0.022
cest-0.021
oc-0.021
 tou-0.021
AAAA-0.020
Token.
Token ID13
Feature activation+8.72e-01

Pos logit contributions
abella+0.173
acey+0.141
 versa+0.138
 Dame+0.136
rant+0.136
Neg logit contributions
packs-0.056
 than-0.055
hr-0.055
oc-0.053
cest-0.052
Token One
Token ID1881
Feature activation-3.96e-01

Pos logit contributions
abella+0.030
 wrong+0.028
top+0.027
rant+0.023
work+0.022
Neg logit contributions
oc-0.032
rum-0.030
packs-0.030
AAAA-0.030
hr-0.029
Token day
Token ID1110
Feature activation-7.60e-02

Pos logit contributions
hr+0.021
rum+0.021
AAAA+0.020
cest+0.020
ment+0.020
Neg logit contributions
abella-0.007
 wrong-0.007
top-0.007
 weekend-0.004
bench-0.004
Token,
Token ID11
Feature activation-2.42e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.003
top-0.003
 wrong-0.003
bench-0.002
rant-0.002
Token Max
Token ID5436
Feature activation-7.75e-01

Pos logit contributions
hr+0.012
rum+0.012
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.004
bench-0.003
 Luke-0.003
Token of
Token ID286
Feature activation+1.36e+00

Pos logit contributions
abella+0.081
 versa+0.075
 wrong+0.075
top+0.075
 Dame+0.074
Neg logit contributions
ment-0.019
hr-0.018
oc-0.016
iring-0.015
Upon-0.015
Token the
Token ID262
Feature activation+9.68e-01

Pos logit contributions
abella+0.062
top+0.059
 wrong+0.059
bench+0.052
 versa+0.051
Neg logit contributions
hr-0.036
rum-0.032
ment-0.031
oc-0.030
cest-0.027
Token story
Token ID1621
Feature activation+7.33e-01

Pos logit contributions
abella+0.038
top+0.038
 wrong+0.034
rant+0.031
bench+0.031
Neg logit contributions
hr-0.032
 exc-0.029
 mis-0.027
AAAA-0.027
rum-0.027
Token:
Token ID25
Feature activation+1.76e+00

Pos logit contributions
abella+0.028
top+0.026
 wrong+0.024
 versa+0.022
rant+0.022
Neg logit contributions
rum-0.024
hr-0.024
packs-0.023
ment-0.023
cest-0.022
Token kids
Token ID3988
Feature activation+2.33e+00

Pos logit contributions
abella+0.074
 wrong+0.063
top+0.062
bench+0.060
rant+0.056
Neg logit contributions
hr-0.059
rum-0.056
 exc-0.054
oc-0.054
ment-0.050
Token should
Token ID815
Feature activation+3.04e+00

Pos logit contributions
abella+0.091
top+0.075
bench+0.067
rant+0.065
 versa+0.065
Neg logit contributions
rum-0.080
 exc-0.077
hr-0.077
cest-0.074
 mis-0.074
Token never
Token ID1239
Feature activation+4.49e-01

Pos logit contributions
abella+0.123
top+0.103
 projector+0.097
rant+0.095
 wrong+0.093
Neg logit contributions
 exc-0.105
rum-0.101
 mis-0.093
hr-0.089
 tou-0.087
Token try
Token ID1949
Feature activation-1.32e-01

Pos logit contributions
abella+0.020
top+0.018
 wrong+0.017
bench+0.015
rant+0.015
Neg logit contributions
hr-0.014
rum-0.013
cest-0.012
 exc-0.012
AAAA-0.012
Token to
Token ID284
Feature activation+5.34e-01

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.005
Neg logit contributions
abella-0.004
 wrong-0.003
top-0.003
bench-0.003
 versa-0.002
Token use
Token ID779
Feature activation-1.31e-01

Pos logit contributions
abella+0.022
 wrong+0.019
top+0.019
bench+0.017
 weekend+0.017
Neg logit contributions
hr-0.018
rum-0.017
cest-0.017
 exc-0.016
oc-0.016
Token grown
Token ID7334
Feature activation-3.55e-01

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token the
Token ID262
Feature activation+1.08e+00

Pos logit contributions
abella+0.014
top+0.014
 wrong+0.013
bench+0.012
 versa+0.011
Neg logit contributions
hr-0.010
rum-0.009
cest-0.009
AAAA-0.009
oc-0.009
Token story
Token ID1621
Feature activation+5.77e-01

Pos logit contributions
top+0.047
abella+0.046
rant+0.040
 wrong+0.039
bench+0.039
Neg logit contributions
hr-0.032
 tou-0.030
 exc-0.030
AAAA-0.030
 mis-0.028
Token is
Token ID318
Feature activation+1.05e+00

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.019
 versa+0.017
rant+0.017
Neg logit contributions
rum-0.019
hr-0.019
packs-0.019
ment-0.019
cest-0.019
Token:
Token ID25
Feature activation+1.02e+00

Pos logit contributions
abella+0.038
top+0.035
bench+0.033
 versa+0.031
rant+0.031
Neg logit contributions
hr-0.037
iring-0.036
 tou-0.034
 exc-0.034
rum-0.034
Token You
Token ID921
Feature activation+1.71e-01

Pos logit contributions
abella+0.041
top+0.037
 wrong+0.035
bench+0.034
rant+0.031
Neg logit contributions
hr-0.035
 exc-0.033
rum-0.033
AAAA-0.032
oc-0.032
Token should
Token ID815
Feature activation+3.01e+00

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
AAAA-0.006
 exc-0.006
Token not
Token ID407
Feature activation+1.70e+00

Pos logit contributions
abella+0.139
top+0.118
rant+0.111
 projector+0.111
bench+0.101
Neg logit contributions
 exc-0.095
rum-0.086
 tou-0.075
 mis-0.074
cest-0.069
Token be
Token ID307
Feature activation-4.24e-01

Pos logit contributions
abella+0.083
top+0.068
 wrong+0.067
bench+0.062
 projector+0.061
Neg logit contributions
 exc-0.048
hr-0.047
rum-0.046
oc-0.045
 tou-0.044
Token ignorant
Token ID20109
Feature activation+1.03e-01

Pos logit contributions
hr+0.014
rum+0.013
AAAA+0.013
cest+0.013
oc+0.013
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.013
 Luke-0.012
Token and
Token ID290
Feature activation+5.22e-01

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
ment-0.003
AAAA-0.003
Token selfish
Token ID20363
Feature activation-3.82e-01

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.019
bench+0.017
 versa+0.017
Neg logit contributions
hr-0.017
rum-0.015
 exc-0.015
cest-0.015
AAAA-0.014
Token Every
Token ID3887
Feature activation+1.67e+00

Pos logit contributions
abella+0.017
 wrong+0.016
top+0.015
bench+0.013
rant+0.012
Neg logit contributions
hr-0.015
cest-0.015
AAAA-0.015
rum-0.015
packs-0.015
Token day
Token ID1110
Feature activation+4.06e-01

Pos logit contributions
abella+0.068
top+0.057
 versa+0.055
 wrong+0.055
 Dame+0.051
Neg logit contributions
AAAA-0.052
oc-0.051
ment-0.050
 exc-0.050
packs-0.050
Token they
Token ID484
Feature activation+1.25e+00

Pos logit contributions
abella+0.015
 wrong+0.014
top+0.014
bench+0.012
rant+0.012
Neg logit contributions
hr-0.014
cest-0.013
AAAA-0.013
rum-0.013
ment-0.013
Token played
Token ID2826
Feature activation+1.44e-01

Pos logit contributions
abella+0.037
 wrong+0.033
top+0.032
bench+0.027
gh+0.024
Neg logit contributions
 exc-0.053
packs-0.051
cest-0.050
hr-0.049
 tou-0.049
Token together
Token ID1978
Feature activation+1.01e+00

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
rant+0.003
Neg logit contributions
hr-0.006
rum-0.005
cest-0.005
AAAA-0.005
oc-0.005
Token and
Token ID290
Feature activation+3.00e+00

Pos logit contributions
abella+0.050
top+0.043
 wrong+0.043
bench+0.039
 versa+0.039
Neg logit contributions
hr-0.023
cest-0.022
packs-0.022
rum-0.022
oc-0.022
Token joked
Token ID27582
Feature activation+8.12e-01

Pos logit contributions
abella+0.132
 wrong+0.128
top+0.126
bench+0.121
inner+0.121
Neg logit contributions
cest-0.075
 exc-0.070
packs-0.067
 squirrel-0.065
ies-0.063
Token around
Token ID1088
Feature activation+1.98e-01

Pos logit contributions
abella+0.033
top+0.031
 wrong+0.029
bench+0.027
rant+0.026
Neg logit contributions
hr-0.024
rum-0.024
cest-0.023
ment-0.022
oc-0.022
Token.
Token ID13
Feature activation+1.34e+00

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.007
rum-0.006
cest-0.006
oc-0.006
AAAA-0.006
Token But
Token ID887
Feature activation+1.10e+00

Pos logit contributions
abella+0.055
 wrong+0.053
top+0.049
bench+0.045
rant+0.044
Neg logit contributions
AAAA-0.039
oc-0.038
packs-0.038
rum-0.037
cest-0.037
Token one
Token ID530
Feature activation-1.91e-01

Pos logit contributions
abella+0.033
top+0.029
 wrong+0.028
bench+0.027
rant+0.024
Neg logit contributions
hr-0.046
cest-0.045
rum-0.043
AAAA-0.043
packs-0.043
Token and
Token ID290
Feature activation+7.54e-01

Pos logit contributions
abella+0.060
 wrong+0.051
top+0.051
bench+0.047
 versa+0.046
Neg logit contributions
hr-0.031
cest-0.029
oc-0.028
rum-0.028
packs-0.027
Token looking
Token ID2045
Feature activation-9.88e-01

Pos logit contributions
abella+0.032
top+0.028
 wrong+0.027
bench+0.026
rant+0.024
Neg logit contributions
hr-0.025
cest-0.022
 exc-0.022
AAAA-0.022
rum-0.022
Token for
Token ID329
Feature activation-9.34e-01

Pos logit contributions
hr+0.049
rum+0.048
AAAA+0.047
cest+0.046
ment+0.045
Neg logit contributions
 wrong-0.023
abella-0.021
top-0.019
 Luke-0.013
rant-0.013
Token a
Token ID257
Feature activation+1.19e-01

Pos logit contributions
hr+0.033
rum+0.033
AAAA+0.031
cest+0.031
packs+0.030
Neg logit contributions
 wrong-0.033
abella-0.032
top-0.030
 Luke-0.025
 versa-0.024
Token carrot
Token ID40562
Feature activation+1.26e+00

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.005
bench+0.004
rant+0.004
Neg logit contributions
hr-0.003
cest-0.003
rum-0.003
AAAA-0.003
oc-0.003
Token trail
Token ID8025
Feature activation+3.00e+00

Pos logit contributions
abella+0.059
top+0.051
 wrong+0.048
bench+0.044
rant+0.044
Neg logit contributions
cest-0.036
hr-0.034
rum-0.034
packs-0.034
oc-0.033
Token,"
Token ID553
Feature activation+5.30e-01

Pos logit contributions
abella+0.155
top+0.121
 Dame+0.119
acey+0.117
 versa+0.116
Neg logit contributions
 than-0.072
cest-0.069
packs-0.065
 from-0.061
 behind-0.060
Token replied
Token ID8712
Feature activation+1.27e+00

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
bench+0.014
rant+0.014
Neg logit contributions
hr-0.020
AAAA-0.019
oc-0.019
rum-0.019
cest-0.018
Token Benny
Token ID44275
Feature activation+6.14e-01

Pos logit contributions
abella+0.051
top+0.048
 wrong+0.044
bench+0.043
rant+0.040
Neg logit contributions
hr-0.039
 mis-0.037
rum-0.036
AAAA-0.036
 exc-0.035
Token.
Token ID13
Feature activation+5.15e-01

Pos logit contributions
abella+0.027
 wrong+0.024
top+0.024
 weekend+0.021
bench+0.020
Neg logit contributions
hr-0.019
rum-0.018
cest-0.017
oc-0.017
ment-0.017
Token
Token ID198
Feature activation+5.03e-01

Pos logit contributions
abella+0.020
 wrong+0.019
top+0.018
bench+0.015
 versa+0.015
Neg logit contributions
hr-0.017
AAAA-0.016
rum-0.016
oc-0.016
 exc-0.016
Token so
Token ID523
Feature activation-3.00e-01

Pos logit contributions
rum+0.066
ment+0.062
hr+0.062
AAAA+0.061
oc+0.061
Neg logit contributions
abella-0.042
 wrong-0.039
top-0.035
 Luke-0.032
 Jack-0.032
Token he
Token ID339
Feature activation+1.61e-02

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.012
cest+0.012
ment+0.011
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 Luke-0.006
Token hid
Token ID24519
Feature activation-4.02e-01

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.000
rant+0.000
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.000
Token a
Token ID257
Feature activation-2.28e-01

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
oc+0.011
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.014
rant-0.014
Token special
Token ID2041
Feature activation+3.55e-01

Pos logit contributions
hr+0.007
rum+0.007
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.009
 wrong-0.009
top-0.009
bench-0.008
 versa-0.007
Token camera
Token ID4676
Feature activation+2.99e+00

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.014
rant+0.012
bench+0.012
Neg logit contributions
hr-0.010
rum-0.010
cest-0.010
AAAA-0.010
oc-0.009
Token.
Token ID13
Feature activation+2.55e-01

Pos logit contributions
abella+0.163
top+0.131
 Dame+0.131
liv+0.124
 seventeen+0.123
Neg logit contributions
 behind-0.053
 from-0.051
cest-0.047
packs-0.045
iring-0.045
Token The
Token ID383
Feature activation-2.11e-01

Pos logit contributions
abella+0.010
 wrong+0.010
top+0.009
bench+0.008
rant+0.008
Neg logit contributions
hr-0.008
rum-0.008
oc-0.007
AAAA-0.007
cest-0.007
Token bear
Token ID6842
Feature activation-3.21e-01

Pos logit contributions
hr+0.006
rum+0.006
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.009
 wrong-0.009
top-0.009
bench-0.008
 versa-0.007
Token searched
Token ID16499
Feature activation-4.38e-01

Pos logit contributions
hr+0.012
rum+0.012
cest+0.012
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.008
 versa-0.008
Token high
Token ID1029
Feature activation+1.03e+00

Pos logit contributions
hr+0.017
rum+0.016
cest+0.016
AAAA+0.016
ment+0.015
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
rant-0.011
Token hug
Token ID16225
Feature activation-3.52e-01

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token sound
Token ID2128
Feature activation-8.65e-02

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.012
cest+0.011
 exc+0.011
Neg logit contributions
abella-0.013
 wrong-0.013
top-0.012
bench-0.010
rant-0.010
Token and
Token ID290
Feature activation-2.52e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.003
 versa-0.003
Token wraps
Token ID27521
Feature activation+1.78e+00

Pos logit contributions
hr+0.009
rum+0.009
cest+0.009
AAAA+0.008
oc+0.008
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
 versa-0.007
Token his
Token ID465
Feature activation-7.22e-01

Pos logit contributions
abella+0.083
 wrong+0.074
top+0.073
 versa+0.072
bench+0.071
Neg logit contributions
ment-0.047
rum-0.040
oc-0.038
cest-0.037
hr-0.036
Token arms
Token ID5101
Feature activation+2.98e+00

Pos logit contributions
hr+0.023
cest+0.022
rum+0.021
AAAA+0.021
oc+0.021
Neg logit contributions
 wrong-0.029
abella-0.029
top-0.027
bench-0.024
 Luke-0.023
Token around
Token ID1088
Feature activation+7.14e-01

Pos logit contributions
abella+0.171
 versa+0.141
 halftime+0.137
 Dame+0.136
 cinema+0.135
Neg logit contributions
 than-0.051
 from-0.050
packs-0.047
 as-0.044
rum-0.044
Token the
Token ID262
Feature activation+3.94e-01

Pos logit contributions
abella+0.033
top+0.031
 wrong+0.030
bench+0.027
 versa+0.026
Neg logit contributions
hr-0.018
rum-0.017
ment-0.017
oc-0.016
cest-0.016
Token paper
Token ID3348
Feature activation+8.18e-01

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.013
bench+0.011
 versa+0.011
Neg logit contributions
hr-0.013
cest-0.013
rum-0.013
AAAA-0.012
oc-0.012
Token.
Token ID13
Feature activation+2.62e-01

Pos logit contributions
abella+0.036
 wrong+0.032
top+0.032
 versa+0.027
 weekend+0.027
Neg logit contributions
hr-0.023
rum-0.023
cest-0.023
packs-0.022
ment-0.022
Token
Token ID198
Feature activation+7.10e-01

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.009
rant+0.009
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.008
ment-0.007
cest-0.007
Token cricket
Token ID18836
Feature activation+2.78e-01

Pos logit contributions
abella+0.031
 wrong+0.028
top+0.028
bench+0.025
 versa+0.024
Neg logit contributions
hr-0.025
cest-0.023
oc-0.022
ment-0.022
rum-0.022
Token was
Token ID373
Feature activation+1.05e+00

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
AAAA-0.009
oc-0.009
Token sitting
Token ID5586
Feature activation-5.70e-02

Pos logit contributions
abella+0.044
top+0.040
 wrong+0.038
 versa+0.035
bench+0.034
Neg logit contributions
hr-0.031
 exc-0.030
rum-0.030
cest-0.030
 mis-0.029
Token on
Token ID319
Feature activation+5.18e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token a
Token ID257
Feature activation+1.35e+00

Pos logit contributions
abella+0.021
top+0.019
 wrong+0.019
bench+0.017
rant+0.016
Neg logit contributions
hr-0.016
rum-0.015
AAAA-0.015
cest-0.014
oc-0.014
Token leaf
Token ID12835
Feature activation+2.98e+00

Pos logit contributions
abella+0.064
 wrong+0.054
top+0.054
bench+0.050
rant+0.049
Neg logit contributions
hr-0.036
cest-0.033
 exc-0.032
oc-0.032
 oak-0.031
Token outside
Token ID2354
Feature activation-7.70e-01

Pos logit contributions
abella+0.154
 Dame+0.117
 versa+0.116
top+0.116
 wrong+0.115
Neg logit contributions
hr-0.068
oc-0.063
packs-0.060
cest-0.060
iring-0.057
Token.
Token ID13
Feature activation+7.62e-01

Pos logit contributions
hr+0.028
AAAA+0.027
rum+0.027
cest+0.026
 tou+0.025
Neg logit contributions
top-0.026
abella-0.026
 wrong-0.026
 Luke-0.020
 weekend-0.020
Token Suddenly
Token ID24975
Feature activation+6.87e-01

Pos logit contributions
abella+0.032
 wrong+0.031
top+0.029
bench+0.026
rant+0.025
Neg logit contributions
hr-0.022
rum-0.021
AAAA-0.020
Upon-0.019
 exc-0.019
Token,
Token ID11
Feature activation+4.91e-01

Pos logit contributions
abella+0.029
top+0.027
 wrong+0.026
bench+0.024
rant+0.024
Neg logit contributions
hr-0.020
AAAA-0.019
ment-0.019
cest-0.019
rum-0.019
Token it
Token ID340
Feature activation+6.73e-02

Pos logit contributions
abella+0.019
top+0.017
 wrong+0.017
bench+0.015
rant+0.015
Neg logit contributions
hr-0.017
AAAA-0.016
cest-0.015
ment-0.015
rum-0.015
Token and
Token ID290
Feature activation+2.72e-01

Pos logit contributions
hr+0.008
rum+0.007
cest+0.007
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.009
rant-0.008
Token your
Token ID534
Feature activation-5.49e-01

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
ment-0.008
AAAA-0.008
Token mind
Token ID2000
Feature activation+2.55e-01

Pos logit contributions
hr+0.017
rum+0.017
AAAA+0.016
cest+0.016
oc+0.015
Neg logit contributions
abella-0.022
 wrong-0.021
top-0.021
bench-0.017
 weekend-0.017
Token.
Token ID13
Feature activation+2.89e-01

Pos logit contributions
abella+0.009
top+0.009
 wrong+0.008
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
ment-0.008
packs-0.008
Token You
Token ID921
Feature activation+6.19e-01

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.010
rum-0.010
AAAA-0.009
cest-0.009
oc-0.009
Token should
Token ID815
Feature activation+2.98e+00

Pos logit contributions
abella+0.024
top+0.021
 wrong+0.021
bench+0.018
rant+0.017
Neg logit contributions
hr-0.021
 exc-0.021
rum-0.021
oc-0.020
AAAA-0.020
Token not
Token ID407
Feature activation+1.63e+00

Pos logit contributions
abella+0.130
top+0.113
rant+0.103
 wrong+0.102
 projector+0.102
Neg logit contributions
 exc-0.097
rum-0.089
 mis-0.082
 tou-0.079
hr-0.077
Token play
Token ID711
Feature activation-3.74e-01

Pos logit contributions
abella+0.079
top+0.067
 wrong+0.067
bench+0.062
 projector+0.061
Neg logit contributions
hr-0.044
 exc-0.043
rum-0.041
 mis-0.040
 tou-0.040
Token too
Token ID1165
Feature activation-2.04e-01

Pos logit contributions
hr+0.012
rum+0.012
cest+0.012
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
 Luke-0.011
Token much
Token ID881
Feature activation-2.81e-01

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.004
 versa-0.004
Token or
Token ID393
Feature activation+3.60e-01

Pos logit contributions
hr+0.009
rum+0.009
cest+0.009
AAAA+0.009
ment+0.008
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.008
 versa-0.008
Token hug
Token ID16225
Feature activation-3.13e-02

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
rant+0.004
Neg logit contributions
hr-0.003
cest-0.003
rum-0.003
AAAA-0.003
oc-0.003
Token sound
Token ID2128
Feature activation+3.63e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token and
Token ID290
Feature activation-3.49e-01

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
ment-0.000
AAAA-0.000
Token wraps
Token ID27521
Feature activation+1.87e+00

Pos logit contributions
hr+0.013
rum+0.013
cest+0.012
AAAA+0.012
oc+0.012
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.011
bench-0.009
 Luke-0.009
Token her
Token ID607
Feature activation-6.51e-01

Pos logit contributions
abella+0.085
 wrong+0.076
top+0.075
 versa+0.075
bench+0.073
Neg logit contributions
ment-0.048
rum-0.040
packs-0.039
hr-0.039
oc-0.038
Token arms
Token ID5101
Feature activation+2.98e+00

Pos logit contributions
hr+0.021
cest+0.020
rum+0.020
oc+0.019
AAAA+0.019
Neg logit contributions
 wrong-0.026
abella-0.026
top-0.024
bench-0.021
 Luke-0.020
Token around
Token ID1088
Feature activation+7.93e-01

Pos logit contributions
abella+0.162
 versa+0.138
 cinema+0.134
 halftime+0.133
scar+0.132
Neg logit contributions
 than-0.055
 from-0.055
packs-0.053
 as-0.048
 behind-0.047
Token the
Token ID262
Feature activation+5.42e-01

Pos logit contributions
abella+0.037
top+0.035
 wrong+0.035
bench+0.031
 versa+0.030
Neg logit contributions
hr-0.019
rum-0.018
ment-0.017
packs-0.016
cest-0.016
Token paper
Token ID3348
Feature activation+7.72e-01

Pos logit contributions
abella+0.021
top+0.019
 wrong+0.017
 versa+0.015
bench+0.015
Neg logit contributions
hr-0.019
cest-0.018
rum-0.017
AAAA-0.017
 exc-0.017
Token.
Token ID13
Feature activation+9.33e-02

Pos logit contributions
abella+0.034
top+0.030
 wrong+0.030
 versa+0.026
rant+0.025
Neg logit contributions
hr-0.022
rum-0.022
cest-0.021
packs-0.021
ment-0.021
Token
Token ID198
Feature activation+7.95e-02

Pos logit contributions
abella+0.004
 wrong+0.003
top+0.003
bench+0.003
rant+0.003
Neg logit contributions
hr-0.003
rum-0.003
AAAA-0.003
cest-0.003
ment-0.003
Token said
Token ID531
Feature activation+6.89e-01

Pos logit contributions
abella+0.036
top+0.035
 wrong+0.035
bench+0.030
gh+0.030
Neg logit contributions
 exc-0.027
hr-0.026
cest-0.025
 tou-0.024
AAAA-0.024
Token that
Token ID326
Feature activation+1.10e+00

Pos logit contributions
abella+0.023
top+0.022
 wrong+0.019
bench+0.019
rant+0.018
Neg logit contributions
hr-0.026
ment-0.026
 exc-0.024
rum-0.024
AAAA-0.024
Token Lily
Token ID20037
Feature activation+1.83e-01

Pos logit contributions
abella+0.039
top+0.038
 wrong+0.034
rant+0.033
bench+0.032
Neg logit contributions
hr-0.039
 exc-0.038
ment-0.036
cest-0.036
packs-0.035
Token and
Token ID290
Feature activation+8.29e-01

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.007
rum-0.006
cest-0.006
AAAA-0.006
oc-0.006
Token Ben
Token ID3932
Feature activation+1.73e-01

Pos logit contributions
abella+0.022
top+0.021
 wrong+0.020
bench+0.017
rant+0.016
Neg logit contributions
hr-0.035
ment-0.033
packs-0.033
cest-0.033
 mis-0.033
Token should
Token ID815
Feature activation+2.97e+00

Pos logit contributions
abella+0.006
top+0.005
 wrong+0.005
bench+0.004
rant+0.004
Neg logit contributions
hr-0.007
rum-0.006
cest-0.006
AAAA-0.006
 exc-0.006
Token say
Token ID910
Feature activation+7.39e-01

Pos logit contributions
abella+0.144
top+0.130
rant+0.117
 projector+0.117
bench+0.113
Neg logit contributions
 exc-0.082
 tou-0.065
rum-0.064
 mis-0.064
 fear-0.061
Token sorry
Token ID7926
Feature activation+2.14e-01

Pos logit contributions
abella+0.031
top+0.030
bench+0.026
 wrong+0.026
rant+0.025
Neg logit contributions
hr-0.021
rum-0.020
ment-0.020
 exc-0.020
AAAA-0.019
Token and
Token ID290
Feature activation+9.89e-01

Pos logit contributions
abella+0.009
top+0.009
 wrong+0.009
bench+0.008
 versa+0.007
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
ment-0.006
AAAA-0.005
Token thank
Token ID5875
Feature activation+6.10e-01

Pos logit contributions
abella+0.043
top+0.038
 wrong+0.035
rant+0.033
 versa+0.032
Neg logit contributions
hr-0.031
 exc-0.030
ment-0.028
rum-0.028
cest-0.028
Token you
Token ID345
Feature activation-9.39e-02

Pos logit contributions
abella+0.022
top+0.021
 wrong+0.020
bench+0.018
rant+0.017
Neg logit contributions
hr-0.022
rum-0.020
ment-0.019
 exc-0.019
AAAA-0.019
Token an
Token ID281
Feature activation-3.76e-01

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.020
bench+0.019
 versa+0.018
Neg logit contributions
hr-0.029
rum-0.028
ment-0.028
cest-0.027
 exc-0.026
Token important
Token ID1593
Feature activation+1.41e+00

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.014
packs+0.014
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.011
bench-0.009
rant-0.008
Token lesson
Token ID11483
Feature activation+3.25e-01

Pos logit contributions
abella+0.069
top+0.063
rant+0.063
 wrong+0.058
 versa+0.057
Neg logit contributions
rum-0.034
oc-0.032
hr-0.032
AAAA-0.032
 exc-0.030
Token -
Token ID532
Feature activation+9.23e-01

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.012
rum-0.012
cest-0.012
ment-0.012
AAAA-0.011
Token we
Token ID356
Feature activation+1.08e+00

Pos logit contributions
abella+0.033
top+0.030
 wrong+0.029
bench+0.026
rant+0.025
Neg logit contributions
hr-0.034
rum-0.033
 exc-0.032
cest-0.032
ment-0.031
Token should
Token ID815
Feature activation+2.97e+00

Pos logit contributions
abella+0.043
top+0.039
 wrong+0.037
bench+0.033
rant+0.032
Neg logit contributions
hr-0.035
rum-0.035
 exc-0.034
cest-0.033
oc-0.032
Token respect
Token ID2461
Feature activation-4.40e-01

Pos logit contributions
abella+0.125
top+0.106
 projector+0.100
bench+0.098
rant+0.097
Neg logit contributions
 exc-0.097
rum-0.089
hr-0.088
 mis-0.087
 tou-0.082
Token the
Token ID262
Feature activation-3.10e-02

Pos logit contributions
hr+0.016
rum+0.015
AAAA+0.015
cest+0.015
ment+0.014
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.012
 Luke-0.012
Token opinion
Token ID4459
Feature activation-8.68e-02

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
rant-0.001
Token of
Token ID286
Feature activation-1.29e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token others
Token ID1854
Feature activation-8.13e-01

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token.
Token ID13
Feature activation+3.86e-01

Pos logit contributions
abella+0.017
 wrong+0.015
top+0.015
bench+0.013
 versa+0.013
Neg logit contributions
hr-0.012
rum-0.011
cest-0.011
packs-0.011
oc-0.011
Token They
Token ID1119
Feature activation+6.74e-01

Pos logit contributions
abella+0.014
 wrong+0.013
top+0.013
bench+0.010
 versa+0.010
Neg logit contributions
hr-0.013
rum-0.013
packs-0.013
cest-0.013
AAAA-0.012
Token always
Token ID1464
Feature activation+1.50e+00

Pos logit contributions
abella+0.027
top+0.025
 wrong+0.025
bench+0.022
 versa+0.020
Neg logit contributions
 exc-0.021
hr-0.021
cest-0.020
rum-0.020
AAAA-0.020
Token played
Token ID2826
Feature activation+3.96e-02

Pos logit contributions
abella+0.065
 wrong+0.058
top+0.058
bench+0.050
side+0.049
Neg logit contributions
 exc-0.045
cest-0.044
hr-0.044
AAAA-0.042
packs-0.041
Token together
Token ID1978
Feature activation+1.04e+00

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
oc-0.001
Token and
Token ID290
Feature activation+2.97e+00

Pos logit contributions
abella+0.047
top+0.042
 wrong+0.041
bench+0.038
 versa+0.037
Neg logit contributions
hr-0.027
rum-0.025
cest-0.025
oc-0.024
 exc-0.024
Token had
Token ID550
Feature activation+1.18e+00

Pos logit contributions
abella+0.138
top+0.131
 wrong+0.128
inner+0.122
bench+0.119
Neg logit contributions
 exc-0.069
cest-0.068
hr-0.060
oc-0.059
packs-0.057
Token lots
Token ID6041
Feature activation-1.07e+00

Pos logit contributions
abella+0.049
top+0.046
 wrong+0.043
rant+0.042
bench+0.041
Neg logit contributions
hr-0.035
cest-0.032
packs-0.030
AAAA-0.030
Upon-0.030
Token of
Token ID286
Feature activation-1.23e-01

Pos logit contributions
AAAA+0.037
hr+0.037
rum+0.037
cest+0.034
 tou+0.033
Neg logit contributions
 wrong-0.038
abella-0.037
top-0.037
 contest-0.030
bench-0.030
Token fun
Token ID1257
Feature activation-3.65e-01

Pos logit contributions
hr+0.006
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
 versa-0.002
Token.
Token ID13
Feature activation+1.18e+00

Pos logit contributions
hr+0.014
rum+0.013
AAAA+0.013
cest+0.013
ment+0.012
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
 versa-0.009
Token was
Token ID373
Feature activation+4.38e-02

Pos logit contributions
hr+0.042
AAAA+0.041
rum+0.041
oc+0.040
cest+0.039
Neg logit contributions
 wrong-0.041
top-0.038
abella-0.038
bench-0.030
 contest-0.030
Token all
Token ID477
Feature activation+1.54e-01

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.002
rum-0.001
cest-0.001
AAAA-0.001
 exc-0.001
Token broken
Token ID5445
Feature activation-3.97e-01

Pos logit contributions
abella+0.006
top+0.005
 wrong+0.005
 versa+0.004
bench+0.004
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
 exc-0.005
Token.
Token ID13
Feature activation-2.08e+00

Pos logit contributions
hr+0.014
rum+0.014
AAAA+0.013
cest+0.013
oc+0.013
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
rant-0.010
Token
Token ID198
Feature activation-2.15e+00

Pos logit contributions
cest+0.099
hr+0.097
rum+0.093
packs+0.093
 prey+0.087
Neg logit contributions
abella-0.055
top-0.053
 wrong-0.046
 However-0.045
 only-0.036
Token
Token ID198
Feature activation-3.55e+00

Pos logit contributions
cest+0.110
hr+0.109
packs+0.104
 amaz+0.103
ment+0.101
Neg logit contributions
abella-0.047
top-0.040
 wrong-0.035
 However-0.029
 never-0.028
TokenWil
Token ID22327
Feature activation-2.99e-01

Pos logit contributions
packs+0.145
 amaz+0.131
 prey+0.128
 discoveries+0.126
yards+0.126
Neg logit contributions
abella-0.110
top-0.106
 never-0.102
 However-0.094
 always-0.091
Tokenma
Token ID2611
Feature activation+2.44e-01

Pos logit contributions
hr+0.014
rum+0.013
cest+0.013
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.005
rant-0.005
Token felt
Token ID2936
Feature activation-4.43e-01

Pos logit contributions
abella+0.009
 wrong+0.009
top+0.009
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.008
cest-0.008
oc-0.007
Token a
Token ID257
Feature activation-1.06e+00

Pos logit contributions
hr+0.016
rum+0.016
cest+0.016
packs+0.015
AAAA+0.015
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.012
 versa-0.012
Token big
Token ID1263
Feature activation+6.97e-01

Pos logit contributions
hr+0.030
packs+0.030
rum+0.029
cest+0.028
ment+0.027
Neg logit contributions
 wrong-0.046
abella-0.044
top-0.043
bench-0.037
 only-0.036
Token suddenly
Token ID6451
Feature activation+1.30e-01

Pos logit contributions
abella+0.029
top+0.027
 wrong+0.026
bench+0.024
rant+0.023
Neg logit contributions
hr-0.028
AAAA-0.027
packs-0.026
 exc-0.026
cest-0.025
Token Lucy
Token ID22162
Feature activation-2.11e-01

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.004
bench+0.004
rant+0.004
Neg logit contributions
hr-0.004
rum-0.004
AAAA-0.004
cest-0.004
ment-0.004
Token heard
Token ID2982
Feature activation-2.50e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.007
Neg logit contributions
abella-0.007
 wrong-0.007
top-0.007
bench-0.005
 versa-0.005
Token a
Token ID257
Feature activation-1.11e+00

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
rant-0.001
Token loud
Token ID7812
Feature activation-5.32e-01

Pos logit contributions
rum+0.038
packs+0.037
hr+0.035
ment+0.035
cest+0.034
Neg logit contributions
 wrong-0.043
abella-0.041
top-0.040
 contest-0.033
bench-0.032
Token "
Token ID366
Feature activation-3.49e+00

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
packs+0.014
AAAA+0.014
Neg logit contributions
abella-0.023
 wrong-0.022
top-0.021
bench-0.018
 versa-0.018
TokenSC
Token ID6173
Feature activation-1.94e-01

Pos logit contributions
 fortunate+0.138
 discoveries+0.135
rum+0.135
ment+0.135
cest+0.131
Neg logit contributions
abella-0.102
top-0.102
bo-0.093
V-0.090
t-0.085
TokenREAM
Token ID32235
Feature activation-3.98e-01

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.006
rant-0.006
Token"!
Token ID40484
Feature activation-2.90e-02

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.013
 versa-0.013
Token She
Token ID1375
Feature activation-1.11e+00

Pos logit contributions
hr+0.001
rum+0.001
AAAA+0.001
cest+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
rant-0.001
Token felt
Token ID2936
Feature activation-8.24e-01

Pos logit contributions
rum+0.045
hr+0.044
cest+0.043
packs+0.042
ment+0.041
Neg logit contributions
abella-0.038
top-0.031
 wrong-0.031
 would-0.026
 Luke-0.025
Token.
Token ID13
Feature activation-3.53e-01

Pos logit contributions
abella+0.028
top+0.024
 wrong+0.023
 versa+0.022
bench+0.021
Neg logit contributions
hr-0.019
rum-0.018
cest-0.018
oc-0.017
 exc-0.017
Token It
Token ID632
Feature activation-3.35e-01

Pos logit contributions
hr+0.014
rum+0.014
cest+0.014
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.011
top-0.011
 wrong-0.010
bench-0.008
 Luke-0.008
Token was
Token ID373
Feature activation-4.85e-01

Pos logit contributions
hr+0.013
rum+0.013
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.008
rant-0.008
Token surrounded
Token ID11191
Feature activation+3.21e-01

Pos logit contributions
hr+0.021
rum+0.020
cest+0.020
AAAA+0.020
oc+0.020
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.010
 weekend-0.010
Token by
Token ID416
Feature activation-2.16e+00

Pos logit contributions
abella+0.015
top+0.014
 wrong+0.013
 versa+0.012
bench+0.012
Neg logit contributions
hr-0.009
rum-0.008
cest-0.008
iring-0.007
 exc-0.007
Token other
Token ID584
Feature activation-3.37e+00

Pos logit contributions
rum+0.085
AAAA+0.082
packs+0.081
cest+0.081
hr+0.080
Neg logit contributions
 wrong-0.065
abella-0.063
top-0.062
 never-0.053
 always-0.052
Token weak
Token ID4939
Feature activation-2.07e+00

Pos logit contributions
rum+0.123
AAAA+0.119
packs+0.118
ment+0.111
hr+0.109
Neg logit contributions
 wrong-0.115
top-0.107
abella-0.098
 only-0.097
 always-0.096
Token trees
Token ID7150
Feature activation-1.45e-03

Pos logit contributions
hr+0.067
AAAA+0.066
rum+0.062
packs+0.058
cest+0.058
Neg logit contributions
 wrong-0.080
top-0.075
abella-0.069
 night-0.061
 would-0.061
Token in
Token ID287
Feature activation-8.66e-01

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
oc+0.000
Neg logit contributions
abella-0.000
 wrong-0.000
top-0.000
bench-0.000
 versa-0.000
Token the
Token ID262
Feature activation-3.61e-01

Pos logit contributions
hr+0.033
rum+0.032
cest+0.031
ment+0.031
AAAA+0.031
Neg logit contributions
 wrong-0.028
abella-0.028
top-0.027
bench-0.021
 weekend-0.021
Token forest
Token ID8222
Feature activation+8.82e-02

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
ment+0.011
Neg logit contributions
abella-0.013
 wrong-0.013
top-0.013
bench-0.010
 weekend-0.010
Token.
Token ID13
Feature activation-1.59e+00

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.007
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
 versa-0.006
Token
Token ID198
Feature activation-8.25e-01

Pos logit contributions
hr+0.069
cest+0.065
rum+0.062
AAAA+0.062
ment+0.060
Neg logit contributions
abella-0.048
top-0.048
 wrong-0.041
 However-0.040
 weekend-0.036
Token
Token ID198
Feature activation-2.30e+00

Pos logit contributions
hr+0.034
cest+0.033
rum+0.033
ment+0.032
AAAA+0.032
Neg logit contributions
abella-0.027
top-0.024
 wrong-0.023
bench-0.019
 weekend-0.019
TokenOne
Token ID3198
Feature activation-8.77e-01

Pos logit contributions
ment+0.121
hr+0.120
cest+0.116
packs+0.116
rum+0.115
Neg logit contributions
abella-0.044
top-0.042
 weekend-0.030
One-0.030
 wrong-0.029
Token day
Token ID1110
Feature activation-2.45e+00

Pos logit contributions
rum+0.051
AAAA+0.050
hr+0.050
cest+0.048
Upon+0.047
Neg logit contributions
abella-0.012
top-0.011
 wrong-0.011
 weekend-0.008
 morning-0.006
Token,
Token ID11
Feature activation-3.28e+00

Pos logit contributions
rum+0.125
hr+0.123
cest+0.118
 amaz+0.113
 prey+0.112
Neg logit contributions
abella-0.055
top-0.048
 wrong-0.047
 one-0.037
 only-0.036
Token Wil
Token ID5187
Feature activation-1.42e-01

Pos logit contributions
rum+0.149
hr+0.138
AAAA+0.133
cest+0.132
 ey+0.131
Neg logit contributions
abella-0.099
top-0.091
 only-0.087
 one-0.086
 always-0.083
Tokenma
Token ID2611
Feature activation-7.51e-01

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
rant-0.003
Token tri
Token ID1333
Feature activation-1.51e-01

Pos logit contributions
hr+0.029
rum+0.029
AAAA+0.027
cest+0.027
ment+0.027
Neg logit contributions
abella-0.026
top-0.024
 wrong-0.023
bench-0.018
 Luke-0.018
Tokenpped
Token ID1496
Feature activation-9.23e-02

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token and
Token ID290
Feature activation-1.06e+00

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token,
Token ID11
Feature activation-1.31e+00

Pos logit contributions
hr+0.023
rum+0.022
AAAA+0.022
cest+0.021
oc+0.021
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.016
 Luke-0.013
rant-0.012
Token the
Token ID262
Feature activation-1.53e+00

Pos logit contributions
rum+0.062
hr+0.058
cest+0.057
AAAA+0.056
oc+0.055
Neg logit contributions
 wrong-0.034
abella-0.033
top-0.031
 Jack-0.028
 Luke-0.027
Token tree
Token ID5509
Feature activation-1.91e-01

Pos logit contributions
hr+0.066
rum+0.062
cest+0.061
packs+0.060
ment+0.060
Neg logit contributions
 wrong-0.051
top-0.042
 only-0.038
abella-0.036
 always-0.035
Token shook
Token ID14682
Feature activation-3.21e-01

Pos logit contributions
hr+0.007
rum+0.007
cest+0.007
AAAA+0.007
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
rant-0.005
Token and
Token ID290
Feature activation+1.27e-01

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
oc+0.007
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.012
rant-0.012
Token an
Token ID281
Feature activation-3.21e+00

Pos logit contributions
abella+0.005
 wrong+0.004
top+0.004
bench+0.004
rant+0.003
Neg logit contributions
hr-0.005
rum-0.004
cest-0.004
AAAA-0.004
ment-0.004
Token apple
Token ID17180
Feature activation-4.72e-02

Pos logit contributions
packs+0.120
hr+0.118
AAAA+0.110
ment+0.108
bel+0.102
Neg logit contributions
 wrong-0.109
top-0.098
abella-0.098
 weekend-0.082
bench-0.081
Token fell
Token ID3214
Feature activation-2.74e-01

Pos logit contributions
hr+0.002
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.001
 versa-0.001
Token to
Token ID284
Feature activation-5.79e-01

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
oc+0.008
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.011
bench-0.009
rant-0.009
Token the
Token ID262
Feature activation-4.21e-01

Pos logit contributions
hr+0.021
rum+0.021
cest+0.021
ment+0.020
packs+0.020
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.019
rant-0.015
 Luke-0.015
Token ground
Token ID2323
Feature activation+6.54e-01

Pos logit contributions
hr+0.016
rum+0.015
cest+0.015
AAAA+0.014
ment+0.014
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
 weekend-0.011
Token in
Token ID287
Feature activation-1.07e+00

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token a
Token ID257
Feature activation-1.56e-01

Pos logit contributions
hr+0.051
rum+0.051
ment+0.049
cest+0.049
packs+0.048
Neg logit contributions
abella-0.025
 wrong-0.023
top-0.022
 Luke-0.017
 never-0.016
Token jungle
Token ID20712
Feature activation+5.32e-01

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token with
Token ID351
Feature activation-5.18e-01

Pos logit contributions
abella+0.026
top+0.023
 wrong+0.023
 versa+0.021
rant+0.021
Neg logit contributions
hr-0.014
oc-0.012
cest-0.012
rum-0.012
 exc-0.012
Token many
Token ID867
Feature activation-2.62e+00

Pos logit contributions
hr+0.016
rum+0.015
AAAA+0.015
cest+0.015
ment+0.014
Neg logit contributions
abella-0.021
 wrong-0.020
top-0.020
bench-0.016
 Luke-0.016
Token other
Token ID584
Feature activation-3.21e+00

Pos logit contributions
hr+0.114
rum+0.108
ment+0.107
AAAA+0.106
 amaz+0.099
Neg logit contributions
 wrong-0.077
abella-0.057
top-0.056
 often-0.056
 would-0.055
Token animals
Token ID4695
Feature activation-5.50e-01

Pos logit contributions
ment+0.124
rum+0.124
hr+0.122
 amaz+0.121
AAAA+0.113
Neg logit contributions
 wrong-0.098
 always-0.086
 would-0.080
 only-0.080
abella-0.079
Token like
Token ID588
Feature activation-8.84e-01

Pos logit contributions
hr+0.021
rum+0.020
AAAA+0.020
cest+0.019
ment+0.019
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.013
rant-0.013
Token lions
Token ID31420
Feature activation-1.22e+00

Pos logit contributions
hr+0.027
rum+0.027
cest+0.025
ment+0.025
AAAA+0.025
Neg logit contributions
 wrong-0.036
abella-0.036
top-0.034
 contest-0.029
 Luke-0.028
Token,
Token ID11
Feature activation-6.25e-01

Pos logit contributions
hr+0.051
rum+0.049
AAAA+0.048
ment+0.047
 amaz+0.046
Neg logit contributions
 wrong-0.037
abella-0.034
top-0.034
 contest-0.027
 would-0.027
Token tigers
Token ID43114
Feature activation-9.29e-01

Pos logit contributions
rum+0.018
hr+0.018
ment+0.017
cest+0.017
AAAA+0.016
Neg logit contributions
abella-0.027
 wrong-0.026
top-0.025
 Luke-0.021
 versa-0.021
Token,
Token ID11
Feature activation-4.03e-01

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token he
Token ID339
Feature activation+1.78e-01

Pos logit contributions
hr+0.017
rum+0.016
cest+0.016
AAAA+0.015
oc+0.015
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.009
 Luke-0.009
Token heard
Token ID2982
Feature activation+9.34e-02

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.006
cest-0.006
AAAA-0.006
rum-0.006
ment-0.006
Token a
Token ID257
Feature activation-8.92e-01

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.004
rum-0.003
cest-0.003
AAAA-0.003
ment-0.003
Token loud
Token ID7812
Feature activation-3.73e-01

Pos logit contributions
rum+0.027
packs+0.027
hr+0.026
cest+0.025
ment+0.025
Neg logit contributions
 wrong-0.036
abella-0.036
top-0.034
bench-0.028
 contest-0.028
Token "
Token ID366
Feature activation-3.20e+00

Pos logit contributions
hr+0.011
rum+0.010
cest+0.010
packs+0.010
AAAA+0.010
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.013
rant-0.012
TokenZ
Token ID57
Feature activation-1.04e+00

Pos logit contributions
rum+0.125
packs+0.122
cest+0.119
ment+0.119
 discoveries+0.118
Neg logit contributions
top-0.096
abella-0.096
bo-0.088
t-0.081
 wrong-0.078
Tokenoom
Token ID4207
Feature activation-1.09e+00

Pos logit contributions
rum+0.045
cest+0.044
hr+0.044
 exc+0.043
 fortunate+0.042
Neg logit contributions
abella-0.029
top-0.026
 wrong-0.025
 contest-0.019
gh-0.019
Token!"
Token ID2474
Feature activation+3.03e-01

Pos logit contributions
hr+0.036
rum+0.034
packs+0.034
ment+0.034
cest+0.034
Neg logit contributions
abella-0.042
 wrong-0.039
top-0.039
 Luke-0.031
bench-0.031
Token
Token ID198
Feature activation+7.83e-01

Pos logit contributions
abella+0.012
 wrong+0.011
top+0.011
bench+0.009
rant+0.009
Neg logit contributions
hr-0.010
AAAA-0.009
rum-0.009
ment-0.009
cest-0.009
Token
Token ID198
Feature activation-2.36e-01

Pos logit contributions
abella+0.033
top+0.033
 wrong+0.033
rant+0.029
work+0.029
Neg logit contributions
hr-0.022
AAAA-0.021
rum-0.021
oc-0.020
Upon-0.020
Token a
Token ID257
Feature activation+7.53e-01

Pos logit contributions
abella+0.003
top+0.002
 wrong+0.002
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.002
rum-0.002
AAAA-0.002
cest-0.002
 exc-0.002
Token big
Token ID1263
Feature activation+3.86e-01

Pos logit contributions
abella+0.036
top+0.032
 wrong+0.031
bench+0.029
rant+0.028
Neg logit contributions
hr-0.020
 exc-0.018
cest-0.018
oc-0.018
AAAA-0.018
Token tree
Token ID5509
Feature activation+1.51e+00

Pos logit contributions
abella+0.018
top+0.016
 wrong+0.016
bench+0.014
 versa+0.014
Neg logit contributions
hr-0.011
cest-0.010
rum-0.010
oc-0.010
AAAA-0.010
Token with
Token ID351
Feature activation-4.82e-01

Pos logit contributions
abella+0.082
top+0.067
 versa+0.066
 wrong+0.065
 Dame+0.065
Neg logit contributions
hr-0.034
oc-0.032
 exc-0.032
packs-0.031
cest-0.030
Token many
Token ID867
Feature activation-2.60e+00

Pos logit contributions
hr+0.021
rum+0.021
cest+0.020
AAAA+0.020
ment+0.019
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.012
bench-0.009
 Luke-0.009
Token other
Token ID584
Feature activation-3.17e+00

Pos logit contributions
hr+0.117
AAAA+0.109
rum+0.108
ment+0.105
cess+0.105
Neg logit contributions
 wrong-0.080
top-0.057
abella-0.053
 only-0.053
 always-0.053
Token cher
Token ID21382
Feature activation-9.21e-01

Pos logit contributions
rum+0.126
hr+0.125
ment+0.118
cest+0.116
AAAA+0.116
Neg logit contributions
 wrong-0.107
 only-0.086
top-0.085
 always-0.083
 one-0.081
Tokenries
Token ID1678
Feature activation-4.99e-01

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.011
cest+0.010
oc+0.009
Neg logit contributions
abella-0.053
 wrong-0.053
top-0.052
bench-0.045
rant-0.044
Token.
Token ID13
Feature activation+8.10e-01

Pos logit contributions
hr+0.019
rum+0.019
AAAA+0.018
cest+0.018
oc+0.017
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.013
rant-0.012
Token The
Token ID383
Feature activation-1.63e+00

Pos logit contributions
abella+0.032
 wrong+0.029
top+0.028
bench+0.024
rant+0.024
Neg logit contributions
AAAA-0.026
hr-0.026
oc-0.025
rum-0.025
 exc-0.025
Token smooth
Token ID7209
Feature activation-1.50e+00

Pos logit contributions
rum+0.067
hr+0.063
ment+0.062
cest+0.061
packs+0.060
Neg logit contributions
 wrong-0.060
top-0.050
abella-0.048
 only-0.048
 contest-0.046
TokenHelp
Token ID22087
Feature activation-3.29e-01

Pos logit contributions
rum+0.050
hr+0.048
packs+0.046
ment+0.046
iring+0.046
Neg logit contributions
abella-0.043
top-0.039
 wrong-0.036
V-0.033
 Luke-0.032
Token!"
Token ID2474
Feature activation+1.23e-01

Pos logit contributions
hr+0.011
rum+0.010
cest+0.010
AAAA+0.010
 exc+0.010
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
 Luke-0.010
Token Tom
Token ID4186
Feature activation-9.52e-01

Pos logit contributions
abella+0.004
top+0.003
 wrong+0.003
bench+0.003
rant+0.003
Neg logit contributions
hr-0.005
rum-0.005
AAAA-0.005
cest-0.005
ment-0.005
Token screamed
Token ID25421
Feature activation-9.87e-03

Pos logit contributions
hr+0.032
cest+0.029
rum+0.029
AAAA+0.029
ment+0.028
Neg logit contributions
abella-0.037
 wrong-0.036
top-0.035
 Luke-0.029
bench-0.029
Token.
Token ID13
Feature activation-1.05e+00

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
ment+0.000
Neg logit contributions
abella-0.000
 wrong-0.000
top-0.000
bench-0.000
 versa-0.000
Token "
Token ID366
Feature activation-3.17e+00

Pos logit contributions
hr+0.044
cest+0.043
packs+0.040
rum+0.040
 fortunate+0.039
Neg logit contributions
abella-0.032
 wrong-0.031
top-0.030
 Luke-0.025
bench-0.024
TokenThere
Token ID1858
Feature activation-5.65e-01

Pos logit contributions
 discoveries+0.148
yt+0.146
ment+0.145
 fortunate+0.143
hr+0.139
Neg logit contributions
top-0.069
bo-0.068
abella-0.065
 never-0.060
V-0.059
Token's
Token ID338
Feature activation-2.03e-01

Pos logit contributions
hr+0.025
rum+0.024
cest+0.023
AAAA+0.023
ment+0.023
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.011
 Luke-0.011
Token a
Token ID257
Feature activation+5.05e-01

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
rant-0.004
Token lion
Token ID18744
Feature activation-1.39e+00

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.021
bench+0.019
 versa+0.019
Neg logit contributions
hr-0.013
cest-0.013
AAAA-0.013
oc-0.012
rum-0.012
Token in
Token ID287
Feature activation-1.38e+00

Pos logit contributions
hr+0.048
ment+0.047
rum+0.046
iring+0.044
AAAA+0.044
Neg logit contributions
 wrong-0.051
abella-0.048
top-0.046
 contest-0.039
 Luke-0.038
Token.
Token ID13
Feature activation+1.85e-01

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.014
 versa+0.012
bench+0.012
Neg logit contributions
hr-0.011
rum-0.010
cest-0.010
packs-0.010
 exc-0.009
Token She
Token ID1375
Feature activation+7.16e-02

Pos logit contributions
abella+0.007
 wrong+0.007
top+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
rum-0.006
AAAA-0.006
cest-0.006
packs-0.006
Token wanted
Token ID2227
Feature activation+7.88e-01

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.003
rant+0.003
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
 exc-0.002
Token to
Token ID284
Feature activation-1.93e+00

Pos logit contributions
abella+0.027
top+0.024
 wrong+0.024
bench+0.023
 versa+0.022
Neg logit contributions
hr-0.028
 tou-0.026
ries-0.026
packs-0.026
ment-0.026
Token fill
Token ID6070
Feature activation+4.32e-01

Pos logit contributions
ment+0.084
AAAA+0.081
cest+0.081
rum+0.080
hr+0.079
Neg logit contributions
top-0.055
abella-0.049
 wrong-0.046
 only-0.046
 rest-0.045
Token her
Token ID607
Feature activation-3.16e+00

Pos logit contributions
abella+0.016
 wrong+0.015
top+0.015
bench+0.013
 weekend+0.012
Neg logit contributions
hr-0.015
rum-0.014
cest-0.014
ment-0.014
oc-0.013
Token basket
Token ID7988
Feature activation+8.29e-01

Pos logit contributions
AAAA+0.143
hr+0.134
rum+0.132
smart+0.131
Get+0.127
Neg logit contributions
 wrong-0.080
 one-0.076
 rest-0.072
top-0.069
 only-0.067
Token with
Token ID351
Feature activation-1.01e+00

Pos logit contributions
abella+0.042
 wrong+0.037
top+0.035
 versa+0.033
bench+0.032
Neg logit contributions
hr-0.020
rum-0.018
cest-0.018
packs-0.018
 exc-0.017
Token the
Token ID262
Feature activation-1.64e+00

Pos logit contributions
hr+0.042
rum+0.041
AAAA+0.040
cest+0.040
oc+0.039
Neg logit contributions
abella-0.030
 wrong-0.030
top-0.030
bench-0.022
 only-0.021
Token needles
Token ID25209
Feature activation+1.65e+00

Pos logit contributions
hr+0.049
rum+0.047
AAAA+0.045
ment+0.044
packs+0.042
Neg logit contributions
 wrong-0.072
abella-0.069
top-0.067
 would-0.062
 only-0.060
Token.
Token ID13
Feature activation+1.12e-01

Pos logit contributions
abella+0.084
top+0.069
 wrong+0.068
 versa+0.068
 projector+0.065
Neg logit contributions
cest-0.036
rum-0.035
hr-0.034
packs-0.033
 exc-0.033
Token can
Token ID460
Feature activation+1.62e+00

Pos logit contributions
hr+0.017
rum+0.017
cest+0.016
ment+0.016
AAAA+0.016
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.010
 versa-0.010
Token't
Token ID470
Feature activation+1.00e+00

Pos logit contributions
abella+0.066
 wrong+0.059
top+0.059
bench+0.056
rant+0.055
Neg logit contributions
 exc-0.050
hr-0.048
 mis-0.047
 tou-0.045
rum-0.045
Token just
Token ID655
Feature activation-5.31e-01

Pos logit contributions
abella+0.045
top+0.039
 wrong+0.039
bench+0.035
rant+0.034
Neg logit contributions
hr-0.029
 exc-0.028
rum-0.028
oc-0.027
 tou-0.027
Token play
Token ID711
Feature activation-9.23e-01

Pos logit contributions
hr+0.020
rum+0.019
cest+0.019
AAAA+0.018
oc+0.018
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.017
bench-0.014
 versa-0.013
Token with
Token ID351
Feature activation-5.55e-01

Pos logit contributions
hr+0.034
rum+0.033
cest+0.033
ment+0.032
AAAA+0.032
Neg logit contributions
 wrong-0.033
abella-0.031
top-0.030
 contest-0.024
 Luke-0.024
Token other
Token ID584
Feature activation-3.15e+00

Pos logit contributions
hr+0.022
rum+0.021
cest+0.021
AAAA+0.021
ment+0.020
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
 Luke-0.013
 versa-0.013
Token people
Token ID661
Feature activation-3.46e-01

Pos logit contributions
rum+0.106
AAAA+0.106
ment+0.103
hr+0.103
iring+0.099
Neg logit contributions
 wrong-0.114
top-0.088
 will-0.086
 one-0.086
abella-0.085
Token's
Token ID338
Feature activation-1.77e+00

Pos logit contributions
hr+0.012
rum+0.011
AAAA+0.011
cest+0.011
ment+0.011
Neg logit contributions
abella-0.013
 wrong-0.013
top-0.012
bench-0.010
rant-0.010
Token things
Token ID1243
Feature activation+5.79e-01

Pos logit contributions
AAAA+0.048
hr+0.048
rum+0.047
Upon+0.045
cest+0.045
Neg logit contributions
 wrong-0.082
top-0.068
abella-0.067
 weekend-0.062
 would-0.060
Token without
Token ID1231
Feature activation+2.72e-01

Pos logit contributions
abella+0.025
top+0.022
 wrong+0.021
bench+0.019
 versa+0.019
Neg logit contributions
hr-0.017
rum-0.017
cest-0.017
 exc-0.015
AAAA-0.015
Token asking
Token ID4737
Feature activation+7.03e-03

Pos logit contributions
abella+0.011
top+0.011
 wrong+0.010
bench+0.009
rant+0.009
Neg logit contributions
hr-0.008
rum-0.008
cest-0.008
AAAA-0.007
ment-0.007
Token lived
Token ID5615
Feature activation-2.80e-01

Pos logit contributions
abella+0.007
top+0.007
 wrong+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
oc-0.006
AAAA-0.006
Token on
Token ID319
Feature activation-5.88e-01

Pos logit contributions
hr+0.010
rum+0.010
cest+0.009
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.009
bench-0.008
rant-0.007
Token a
Token ID257
Feature activation+5.21e-01

Pos logit contributions
hr+0.026
rum+0.025
cest+0.025
AAAA+0.024
ment+0.024
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
 versa-0.011
 weekend-0.011
Token farm
Token ID5318
Feature activation+1.14e+00

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.021
bench+0.019
rant+0.019
Neg logit contributions
hr-0.014
cest-0.014
AAAA-0.013
 exc-0.013
oc-0.013
Token with
Token ID351
Feature activation-9.28e-01

Pos logit contributions
abella+0.062
top+0.052
 wrong+0.051
rant+0.049
 versa+0.049
Neg logit contributions
hr-0.025
oc-0.022
cest-0.022
 exc-0.022
rum-0.021
Token other
Token ID584
Feature activation-3.15e+00

Pos logit contributions
hr+0.043
rum+0.041
cest+0.040
AAAA+0.040
ment+0.038
Neg logit contributions
abella-0.025
 wrong-0.023
top-0.021
 Luke-0.016
 weekend-0.016
Token animals
Token ID4695
Feature activation-4.43e-01

Pos logit contributions
rum+0.133
hr+0.132
ment+0.125
 amaz+0.122
AAAA+0.118
Neg logit contributions
 always-0.087
 wrong-0.087
 would-0.086
 one-0.085
abella-0.082
Token.
Token ID13
Feature activation+9.03e-02

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.016
ment+0.015
Neg logit contributions
abella-0.015
 wrong-0.015
top-0.014
bench-0.011
rant-0.011
Token The
Token ID383
Feature activation-1.61e+00

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.003
AAAA-0.003
cest-0.003
oc-0.003
Token pony
Token ID26902
Feature activation+3.28e-01

Pos logit contributions
rum+0.063
hr+0.062
ment+0.059
packs+0.055
run+0.053
Neg logit contributions
 wrong-0.060
abella-0.056
 weekend-0.054
 always-0.053
 morning-0.050
Token liked
Token ID8288
Feature activation-1.07e+00

Pos logit contributions
abella+0.013
top+0.012
 wrong+0.012
bench+0.010
rant+0.010
Neg logit contributions
hr-0.011
cest-0.010
oc-0.010
rum-0.010
AAAA-0.010
Token
Token ID198
Feature activation+8.91e-02

Pos logit contributions
abella+0.031
 wrong+0.029
top+0.029
rant+0.025
bench+0.025
Neg logit contributions
hr-0.022
rum-0.021
oc-0.021
cest-0.020
AAAA-0.020
TokenOne
Token ID3198
Feature activation-6.36e-01

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
oc-0.004
Token day
Token ID1110
Feature activation-7.55e-01

Pos logit contributions
hr+0.034
rum+0.034
AAAA+0.033
cest+0.033
ment+0.032
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
 weekend-0.007
 morning-0.005
Token,
Token ID11
Feature activation-1.05e+00

Pos logit contributions
hr+0.033
rum+0.031
cest+0.031
ment+0.029
 exc+0.029
Neg logit contributions
abella-0.022
 wrong-0.022
top-0.020
 contest-0.015
 Luke-0.015
Token while
Token ID981
Feature activation-6.24e-01

Pos logit contributions
rum+0.053
hr+0.052
cest+0.050
ment+0.048
oc+0.048
Neg logit contributions
abella-0.024
 wrong-0.024
top-0.023
 Jack-0.018
 weekend-0.016
Token they
Token ID484
Feature activation-3.15e+00

Pos logit contributions
hr+0.028
rum+0.027
cest+0.027
oc+0.026
ment+0.025
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
 weekend-0.012
 Luke-0.012
Token were
Token ID547
Feature activation+1.52e-01

Pos logit contributions
rum+0.191
cest+0.183
 wounds+0.180
adoes+0.179
ults+0.175
Neg logit contributions
 would-0.037
 only-0.033
 night-0.033
 made-0.030
top-0.030
Token walking
Token ID6155
Feature activation+1.53e-01

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.005
rum-0.004
AAAA-0.004
cest-0.004
packs-0.004
Token,
Token ID11
Feature activation-1.94e-01

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.005
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
ment-0.005
Token Tim
Token ID5045
Feature activation+7.79e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.005
bench-0.004
 versa-0.004
Tokenmy
Token ID1820
Feature activation-3.53e-02

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.003
packs-0.003
Token big
Token ID1263
Feature activation-4.81e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.003
 versa-0.002
Token tree
Token ID5509
Feature activation+1.47e+00

Pos logit contributions
hr+0.016
rum+0.016
cest+0.015
ment+0.015
AAAA+0.015
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.014
 weekend-0.014
Token with
Token ID351
Feature activation-2.46e-01

Pos logit contributions
abella+0.082
top+0.067
 versa+0.067
 wrong+0.067
 Dame+0.066
Neg logit contributions
hr-0.031
oc-0.029
 exc-0.028
packs-0.027
rum-0.027
Token lots
Token ID6041
Feature activation-1.78e+00

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.008
 versa-0.007
Token of
Token ID286
Feature activation-1.72e+00

Pos logit contributions
AAAA+0.075
hr+0.075
rum+0.071
cest+0.068
EX+0.066
Neg logit contributions
 wrong-0.051
top-0.048
abella-0.044
 contest-0.036
 night-0.035
Token other
Token ID584
Feature activation-3.14e+00

Pos logit contributions
hr+0.066
rum+0.060
cest+0.060
packs+0.056
AAAA+0.056
Neg logit contributions
 wrong-0.063
abella-0.054
top-0.051
 always-0.047
 morning-0.046
Token squirrel
Token ID33039
Feature activation-5.76e-01

Pos logit contributions
rum+0.126
hr+0.125
ment+0.122
cess+0.112
 amaz+0.111
Neg logit contributions
 wrong-0.101
 always-0.084
 only-0.083
 would-0.080
 never-0.074
Tokens
Token ID82
Feature activation-1.59e+00

Pos logit contributions
hr+0.020
AAAA+0.019
rum+0.019
cest+0.018
ment+0.018
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.020
bench-0.016
rant-0.016
Token.
Token ID13
Feature activation+3.71e-01

Pos logit contributions
hr+0.060
AAAA+0.057
rum+0.056
Upon+0.054
cest+0.054
Neg logit contributions
 wrong-0.058
top-0.049
abella-0.043
 only-0.042
 always-0.042
Token They
Token ID1119
Feature activation+1.14e-01

Pos logit contributions
abella+0.012
 wrong+0.010
top+0.010
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.015
rum-0.015
AAAA-0.015
cest-0.015
oc-0.014
Token were
Token ID547
Feature activation-2.19e-01

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
 exc-0.004
Token that
Token ID326
Feature activation+4.69e-01

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
bench+0.014
rant+0.014
Neg logit contributions
hr-0.018
ment-0.018
rum-0.018
cest-0.018
packs-0.017
Token day
Token ID1110
Feature activation-1.33e-01

Pos logit contributions
abella+0.013
top+0.012
 wrong+0.011
bench+0.009
rant+0.009
Neg logit contributions
hr-0.021
 exc-0.020
cest-0.020
rum-0.020
AAAA-0.020
Token:
Token ID25
Feature activation+8.27e-02

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token never
Token ID1239
Feature activation-4.09e-01

Pos logit contributions
abella+0.003
top+0.002
 wrong+0.002
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
 exc-0.003
Token load
Token ID3440
Feature activation-7.90e-01

Pos logit contributions
hr+0.015
rum+0.015
cest+0.014
AAAA+0.014
oc+0.014
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
 versa-0.010
Token your
Token ID534
Feature activation-3.14e+00

Pos logit contributions
hr+0.026
rum+0.024
cest+0.024
AAAA+0.024
oc+0.022
Neg logit contributions
abella-0.031
 wrong-0.031
top-0.030
rant-0.025
bench-0.024
Token wagon
Token ID31860
Feature activation-5.90e-01

Pos logit contributions
hr+0.126
cest+0.125
AAAA+0.121
rum+0.120
oc+0.118
Neg logit contributions
 wrong-0.112
 tea-0.080
 first-0.079
 only-0.078
oe-0.074
Token with
Token ID351
Feature activation-3.94e-01

Pos logit contributions
hr+0.021
rum+0.020
AAAA+0.020
cest+0.020
ment+0.019
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.020
bench-0.016
rant-0.016
Token p
Token ID279
Feature activation-1.27e+00

Pos logit contributions
hr+0.014
rum+0.013
cest+0.013
AAAA+0.013
oc+0.013
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
rant-0.011
Tokenears
Token ID4127
Feature activation-3.28e-01

Pos logit contributions
hr+0.038
AAAA+0.038
cest+0.038
rum+0.036
 tou+0.034
Neg logit contributions
abella-0.052
top-0.051
 wrong-0.050
rant-0.041
oe-0.039
Token near
Token ID1474
Feature activation-6.71e-01

Pos logit contributions
hr+0.011
rum+0.010
AAAA+0.010
cest+0.010
oc+0.009
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
rant-0.010
Token in
Token ID287
Feature activation-9.61e-01

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.007
top-0.007
bench-0.005
rant-0.005
Token a
Token ID257
Feature activation+2.24e-01

Pos logit contributions
hr+0.044
rum+0.044
cest+0.043
AAAA+0.042
ment+0.041
Neg logit contributions
abella-0.024
 wrong-0.024
top-0.022
bench-0.017
 Luke-0.017
Token big
Token ID1263
Feature activation-3.57e-01

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.009
bench+0.008
rant+0.008
Neg logit contributions
hr-0.006
cest-0.006
AAAA-0.006
rum-0.006
oc-0.006
Token field
Token ID2214
Feature activation+1.67e+00

Pos logit contributions
hr+0.012
rum+0.012
cest+0.011
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.011
 Luke-0.010
Token with
Token ID351
Feature activation-8.27e-01

Pos logit contributions
abella+0.092
 versa+0.075
top+0.073
 wrong+0.072
 Dame+0.072
Neg logit contributions
hr-0.035
 exc-0.032
cest-0.031
packs-0.030
rum-0.030
Token other
Token ID584
Feature activation-3.12e+00

Pos logit contributions
hr+0.037
rum+0.037
AAAA+0.036
cest+0.035
oc+0.034
Neg logit contributions
abella-0.023
 wrong-0.021
top-0.020
 Luke-0.015
bench-0.014
Token animals
Token ID4695
Feature activation-4.59e-01

Pos logit contributions
rum+0.131
hr+0.129
AAAA+0.120
ment+0.118
 amaz+0.111
Neg logit contributions
 wrong-0.094
 always-0.088
 would-0.087
 one-0.083
 only-0.082
Token.
Token ID13
Feature activation+2.60e-01

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.016
ment+0.015
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.012
 versa-0.012
Token
Token ID198
Feature activation+1.21e+00

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.008
oc-0.008
 exc-0.007
Token
Token ID198
Feature activation+7.65e-01

Pos logit contributions
abella+0.055
top+0.052
 wrong+0.052
rant+0.047
work+0.045
Neg logit contributions
hr-0.030
oc-0.030
rum-0.029
 exc-0.028
AAAA-0.028
TokenOne
Token ID3198
Feature activation-9.34e-01

Pos logit contributions
abella+0.024
 wrong+0.021
top+0.021
bench+0.018
work+0.018
Neg logit contributions
AAAA-0.031
hr-0.031
oc-0.030
 exc-0.029
Upon-0.029
Token.
Token ID13
Feature activation+1.02e+00

Pos logit contributions
abella+0.012
 wrong+0.011
top+0.011
bench+0.009
 versa+0.009
Neg logit contributions
hr-0.008
rum-0.008
cest-0.008
packs-0.008
oc-0.007
Token One
Token ID1881
Feature activation-1.30e+00

Pos logit contributions
abella+0.034
 wrong+0.032
top+0.030
bench+0.026
 versa+0.025
Neg logit contributions
packs-0.037
 exc-0.035
hr-0.035
AAAA-0.035
Upon-0.034
Token day
Token ID1110
Feature activation-1.59e-01

Pos logit contributions
rum+0.082
hr+0.081
AAAA+0.077
cest+0.076
ment+0.076
Neg logit contributions
 wrong-0.010
abella-0.010
top-0.010
 weekend-0.010
 morning-0.007
Token,
Token ID11
Feature activation-3.60e-01

Pos logit contributions
hr+0.006
rum+0.006
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token while
Token ID981
Feature activation-3.37e-01

Pos logit contributions
hr+0.016
rum+0.016
cest+0.015
AAAA+0.015
oc+0.015
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 Luke-0.007
Token they
Token ID484
Feature activation-3.12e+00

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.014
oc+0.014
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token were
Token ID547
Feature activation-5.77e-01

Pos logit contributions
rum+0.195
cest+0.181
adoes+0.181
hr+0.180
 wounds+0.175
Neg logit contributions
 would-0.043
 weekend-0.035
 night-0.035
abella-0.033
 made-0.033
Token walking
Token ID6155
Feature activation+7.10e-02

Pos logit contributions
rum+0.022
hr+0.022
cest+0.021
oc+0.021
ment+0.020
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
 weekend-0.015
 versa-0.015
Token,
Token ID11
Feature activation+1.44e-01

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token Lily
Token ID20037
Feature activation-8.30e-02

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.006
AAAA-0.006
cest-0.005
rum-0.005
ment-0.005
Token found
Token ID1043
Feature activation+3.17e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
rant-0.002
Token in
Token ID287
Feature activation-7.91e-01

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.011
bench-0.009
rant-0.009
Token a
Token ID257
Feature activation+3.00e-02

Pos logit contributions
hr+0.036
rum+0.036
cest+0.034
ment+0.034
AAAA+0.033
Neg logit contributions
abella-0.020
 wrong-0.020
top-0.019
 never-0.014
bench-0.013
Token big
Token ID1263
Feature activation-2.60e-01

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token field
Token ID2214
Feature activation+1.63e+00

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.010
bench-0.008
rant-0.008
Token with
Token ID351
Feature activation-8.88e-01

Pos logit contributions
abella+0.091
 versa+0.075
top+0.073
 wrong+0.073
 Dame+0.072
Neg logit contributions
hr-0.035
 exc-0.031
cest-0.031
rum-0.029
packs-0.029
Token other
Token ID584
Feature activation-3.10e+00

Pos logit contributions
hr+0.041
rum+0.040
AAAA+0.039
cest+0.038
packs+0.037
Neg logit contributions
abella-0.023
 wrong-0.022
top-0.021
 never-0.015
 Luke-0.014
Token animals
Token ID4695
Feature activation-2.39e-01

Pos logit contributions
rum+0.121
hr+0.108
 amaz+0.105
AAAA+0.104
Upon+0.104
Neg logit contributions
 wrong-0.099
 always-0.088
 only-0.087
 one-0.086
 never-0.084
Token.
Token ID13
Feature activation+5.15e-01

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
rant-0.007
Token
Token ID198
Feature activation+1.24e+00

Pos logit contributions
abella+0.021
 wrong+0.019
top+0.018
bench+0.016
rant+0.016
Neg logit contributions
hr-0.016
AAAA-0.016
oc-0.016
 exc-0.016
rum-0.015
Token
Token ID198
Feature activation+5.67e-01

Pos logit contributions
abella+0.056
 wrong+0.054
top+0.053
rant+0.048
work+0.047
Neg logit contributions
hr-0.031
oc-0.030
 exc-0.028
rum-0.028
AAAA-0.028
TokenOne
Token ID3198
Feature activation-1.07e+00

Pos logit contributions
abella+0.017
 wrong+0.015
top+0.015
work+0.012
bench+0.012
Neg logit contributions
hr-0.024
AAAA-0.023
oc-0.023
 exc-0.023
Upon-0.022
Token any
Token ID597
Feature activation-1.85e+00

Pos logit contributions
hr+0.027
rum+0.026
cest+0.025
AAAA+0.025
oc+0.025
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
bench-0.014
rant-0.013
Token friends
Token ID2460
Feature activation-6.93e-01

Pos logit contributions
hr+0.063
AAAA+0.062
oc+0.058
cest+0.058
rum+0.058
Neg logit contributions
 wrong-0.076
top-0.064
abella-0.061
 only-0.056
 contest-0.055
Token.
Token ID13
Feature activation+1.58e-01

Pos logit contributions
hr+0.028
rum+0.027
AAAA+0.027
cest+0.026
oc+0.026
Neg logit contributions
 wrong-0.022
abella-0.022
top-0.020
bench-0.016
 versa-0.015
Token All
Token ID1439
Feature activation-5.75e-01

Pos logit contributions
abella+0.006
 wrong+0.006
top+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.005
rum-0.005
AAAA-0.005
cest-0.005
oc-0.005
Token the
Token ID262
Feature activation-1.97e+00

Pos logit contributions
hr+0.022
rum+0.021
cest+0.020
AAAA+0.020
oc+0.020
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.019
 weekend-0.015
 contest-0.015
Token other
Token ID584
Feature activation-3.10e+00

Pos logit contributions
rum+0.083
ment+0.078
hr+0.075
cest+0.074
packs+0.073
Neg logit contributions
 wrong-0.067
 always-0.055
abella-0.055
 only-0.054
 never-0.054
Token trees
Token ID7150
Feature activation-1.88e-01

Pos logit contributions
rum+0.130
ment+0.111
hr+0.110
cest+0.109
AAAA+0.109
Neg logit contributions
 wrong-0.104
 only-0.095
 always-0.093
top-0.089
 never-0.087
Token were
Token ID547
Feature activation-1.01e-01

Pos logit contributions
hr+0.007
rum+0.007
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
rant-0.005
Token big
Token ID1263
Feature activation-8.78e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.004
top-0.004
 wrong-0.004
bench-0.003
 versa-0.003
Token and
Token ID290
Feature activation+3.29e-01

Pos logit contributions
hr+0.027
rum+0.026
AAAA+0.025
ment+0.024
cest+0.024
Neg logit contributions
 wrong-0.036
abella-0.034
top-0.034
bench-0.028
 contest-0.028
Token strong
Token ID1913
Feature activation-3.24e-01

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.012
bench+0.011
 versa+0.010
Neg logit contributions
hr-0.010
cest-0.009
 exc-0.009
rum-0.009
AAAA-0.009
Token "
Token ID366
Feature activation-2.04e-01

Pos logit contributions
hr+0.016
rum+0.015
cest+0.015
AAAA+0.014
ment+0.014
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.011
 Luke-0.010
TokenDon
Token ID3987
Feature activation-5.48e-02

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.008
top-0.007
 wrong-0.007
bench-0.006
rant-0.006
Token't
Token ID470
Feature activation-1.56e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
rant-0.001
Token eat
Token ID4483
Feature activation-1.82e+00

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token too
Token ID1165
Feature activation-3.34e-01

Pos logit contributions
rum+0.084
hr+0.083
cest+0.083
oc+0.080
AAAA+0.080
Neg logit contributions
 wrong-0.053
abella-0.047
top-0.043
 contest-0.033
 only-0.032
Token many
Token ID867
Feature activation-3.10e+00

Pos logit contributions
hr+0.014
rum+0.014
cest+0.013
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.009
bench-0.007
rant-0.007
Token."
Token ID526
Feature activation-6.48e-02

Pos logit contributions
AAAA+0.109
hr+0.109
rum+0.106
 amaz+0.103
cest+0.100
Neg logit contributions
 wrong-0.114
abella-0.098
top-0.096
 only-0.088
 always-0.086
Token
Token ID198
Feature activation+5.35e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
rant-0.002
Token
Token ID198
Feature activation-3.56e-01

Pos logit contributions
abella+0.022
top+0.021
 wrong+0.021
rant+0.018
bench+0.018
Neg logit contributions
hr-0.016
rum-0.015
AAAA-0.015
oc-0.015
cest-0.014
TokenL
Token ID43
Feature activation-4.66e-01

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
ment+0.012
AAAA+0.012
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.012
bench-0.010
 versa-0.010
Tokenily
Token ID813
Feature activation+4.68e-02

Pos logit contributions
hr+0.024
rum+0.022
cest+0.022
oc+0.022
AAAA+0.022
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.009
bench-0.007
rant-0.006
Token and
Token ID290
Feature activation+2.14e+00

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
 versa+0.008
bench+0.008
Neg logit contributions
hr-0.007
rum-0.007
cest-0.007
AAAA-0.007
packs-0.007
Token sometimes
Token ID3360
Feature activation+4.41e-01

Pos logit contributions
abella+0.099
top+0.085
 wrong+0.082
rant+0.077
ugg+0.077
Neg logit contributions
 exc-0.058
hr-0.057
 mis-0.053
cest-0.052
 amaz-0.050
Token they
Token ID484
Feature activation+1.56e+00

Pos logit contributions
abella+0.015
top+0.013
 wrong+0.013
rant+0.011
bench+0.011
Neg logit contributions
hr-0.017
cest-0.017
rum-0.016
 exc-0.016
AAAA-0.016
Token miss
Token ID2051
Feature activation+3.64e-01

Pos logit contributions
abella+0.068
top+0.060
 wrong+0.057
side+0.051
rant+0.051
Neg logit contributions
 exc-0.047
 mis-0.043
hr-0.043
cest-0.041
oc-0.041
Token.
Token ID13
Feature activation+1.83e+00

Pos logit contributions
abella+0.013
top+0.012
 wrong+0.012
bench+0.010
rant+0.010
Neg logit contributions
hr-0.013
rum-0.012
cest-0.012
packs-0.012
AAAA-0.012
Token\n
Token ID198
Feature activation+2.15e+00

Pos logit contributions
 wrong+0.085
abella+0.085
bench+0.082
top+0.080
work+0.077
Neg logit contributions
packs-0.037
AAAA-0.035
yards-0.034
oc-0.033
Upon-0.030
Token\n
Token ID198
Feature activation+1.69e+00

Pos logit contributions
abella+0.111
 wrong+0.110
work+0.108
top+0.108
rant+0.103
Neg logit contributions
Upon-0.032
AAAA-0.032
oc-0.030
 exc-0.028
Conf-0.026
TokenOne
Token ID3198
Feature activation+4.82e-02

Pos logit contributions
abella+0.061
 wrong+0.060
work+0.058
bench+0.056
top+0.056
Neg logit contributions
AAAA-0.055
Upon-0.052
Conf-0.050
S-0.050
hr-0.049
Token day
Token ID1110
Feature activation+1.69e-01

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
 exc-0.002
Token,
Token ID11
Feature activation+1.68e-01

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
rum-0.005
AAAA-0.005
cest-0.005
ment-0.005
Token a
Token ID257
Feature activation+1.08e+00

Pos logit contributions
abella+0.005
 wrong+0.004
top+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.007
cest-0.007
AAAA-0.007
rum-0.007
ment-0.007
TokenOK
Token ID11380
Feature activation+2.24e-01

Pos logit contributions
abella+0.007
 wrong+0.006
top+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
AAAA-0.006
cest-0.006
rum-0.006
oc-0.005
Token,
Token ID11
Feature activation-1.23e-01

Pos logit contributions
abella+0.009
top+0.009
 wrong+0.008
bench+0.008
rant+0.007
Neg logit contributions
hr-0.007
rum-0.007
cest-0.007
AAAA-0.006
 exc-0.006
Token OK
Token ID7477
Feature activation+3.31e-01

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token,
Token ID11
Feature activation-4.21e-02

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.012
bench+0.011
 versa+0.011
Neg logit contributions
hr-0.010
rum-0.010
cest-0.010
AAAA-0.009
oc-0.009
Token you
Token ID345
Feature activation+2.18e-01

Pos logit contributions
hr+0.002
rum+0.001
cest+0.001
AAAA+0.001
 exc+0.001
Neg logit contributions
abella-0.002
top-0.001
 wrong-0.001
bench-0.001
rant-0.001
Token can
Token ID460
Feature activation+1.73e+00

Pos logit contributions
abella+0.007
top+0.007
 wrong+0.006
bench+0.006
rant+0.005
Neg logit contributions
hr-0.009
rum-0.008
cest-0.008
AAAA-0.008
 exc-0.008
Token play
Token ID711
Feature activation-1.62e-01

Pos logit contributions
abella+0.085
bench+0.076
top+0.072
 weekend+0.071
 wrong+0.071
Neg logit contributions
 exc-0.042
rum-0.039
 mis-0.037
hr-0.037
oc-0.035
Token with
Token ID351
Feature activation-2.11e-01

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token us
Token ID514
Feature activation-3.71e-01

Pos logit contributions
hr+0.007
rum+0.007
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.008
bench-0.007
 versa-0.006
Token.
Token ID13
Feature activation+1.20e+00

Pos logit contributions
hr+0.014
rum+0.013
AAAA+0.013
cest+0.013
ment+0.013
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.009
rant-0.009
Token But
Token ID887
Feature activation+1.49e+00

Pos logit contributions
abella+0.056
 wrong+0.051
top+0.050
bench+0.047
gh+0.043
Neg logit contributions
oc-0.034
AAAA-0.032
rum-0.032
 exc-0.031
hr-0.030
Token He
Token ID679
Feature activation+2.48e-01

Pos logit contributions
abella+0.017
 wrong+0.017
top+0.016
bench+0.014
rant+0.014
Neg logit contributions
hr-0.013
rum-0.012
AAAA-0.012
ment-0.011
 exc-0.011
Token felt
Token ID2936
Feature activation-2.29e-01

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.009
bench+0.008
 weekend+0.007
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.008
cest-0.008
oc-0.007
Token sick
Token ID6639
Feature activation+7.43e-01

Pos logit contributions
hr+0.007
rum+0.007
cest+0.007
AAAA+0.007
ment+0.006
Neg logit contributions
abella-0.009
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token and
Token ID290
Feature activation+2.26e-04

Pos logit contributions
abella+0.042
top+0.038
 wrong+0.037
 versa+0.034
rant+0.034
Neg logit contributions
rum-0.013
cest-0.013
ment-0.012
hr-0.011
oc-0.011
Token he
Token ID339
Feature activation+1.99e-01

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token wanted
Token ID2227
Feature activation+1.65e+00

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
AAAA-0.006
 exc-0.006
Token to
Token ID284
Feature activation+4.21e-01

Pos logit contributions
abella+0.066
top+0.064
bench+0.058
 Dame+0.056
rant+0.056
Neg logit contributions
ment-0.045
rum-0.044
 ac-0.044
hr-0.043
 exc-0.043
Token cry
Token ID3960
Feature activation+2.19e-01

Pos logit contributions
abella+0.017
top+0.015
 wrong+0.015
bench+0.013
rant+0.013
Neg logit contributions
hr-0.014
rum-0.013
 exc-0.013
cest-0.013
AAAA-0.013
Token.
Token ID13
Feature activation+2.85e-01

Pos logit contributions
abella+0.009
top+0.008
 wrong+0.008
bench+0.007
 versa+0.007
Neg logit contributions
rum-0.007
hr-0.007
cest-0.007
ment-0.006
oc-0.006
Token He
Token ID679
Feature activation+6.52e-01

Pos logit contributions
abella+0.011
 wrong+0.011
top+0.011
bench+0.009
rant+0.009
Neg logit contributions
hr-0.009
rum-0.009
AAAA-0.008
oc-0.008
cest-0.008
Token said
Token ID531
Feature activation+7.74e-01

Pos logit contributions
abella+0.027
top+0.027
 wrong+0.026
bench+0.023
work+0.023
Neg logit contributions
AAAA-0.017
hr-0.017
 exc-0.017
rum-0.017
oc-0.017
Token and
Token ID290
Feature activation-2.38e-01

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token wrapped
Token ID12908
Feature activation+1.52e+00

Pos logit contributions
hr+0.009
rum+0.009
cest+0.009
AAAA+0.009
ment+0.008
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.006
 versa-0.006
Token it
Token ID340
Feature activation+1.76e+00

Pos logit contributions
abella+0.057
 wrong+0.052
top+0.051
 versa+0.048
bench+0.048
Neg logit contributions
ment-0.048
hr-0.046
rum-0.045
iring-0.043
ries-0.043
Token in
Token ID287
Feature activation-4.12e-01

Pos logit contributions
abella+0.073
top+0.062
 versa+0.059
 wrong+0.058
 Dame+0.057
Neg logit contributions
hr-0.047
rum-0.046
iring-0.045
 exc-0.045
packs-0.043
Token a
Token ID257
Feature activation-2.38e-01

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.014
ment+0.013
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.012
 Luke-0.011
Token box
Token ID3091
Feature activation+1.84e+00

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.007
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
rant-0.007
Token.
Token ID13
Feature activation-3.96e-01

Pos logit contributions
abella+0.089
 wrong+0.075
 versa+0.073
top+0.072
 Dame+0.072
Neg logit contributions
cest-0.044
hr-0.044
rum-0.042
oc-0.040
packs-0.040
Token\n
Token ID198
Feature activation+6.87e-01

Pos logit contributions
hr+0.016
rum+0.016
cest+0.016
AAAA+0.015
ment+0.015
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.011
bench-0.009
 Luke-0.009
Token\n
Token ID198
Feature activation-7.21e-01

Pos logit contributions
abella+0.028
 wrong+0.028
top+0.027
rant+0.024
 versa+0.024
Neg logit contributions
hr-0.020
rum-0.019
AAAA-0.018
oc-0.018
Upon-0.018
TokenSarah
Token ID29284
Feature activation-3.50e-01

Pos logit contributions
hr+0.027
rum+0.027
cest+0.026
ment+0.026
packs+0.025
Neg logit contributions
abella-0.025
top-0.023
 wrong-0.022
bench-0.019
 Luke-0.018
Token said
Token ID531
Feature activation+6.23e-01

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.011
bench-0.009
rant-0.009
Token and
Token ID290
Feature activation+4.13e-01

Pos logit contributions
abella+0.004
top+0.003
 wrong+0.003
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.004
rum-0.004
cest-0.003
AAAA-0.003
ment-0.003
Token promised
Token ID8072
Feature activation+6.63e-01

Pos logit contributions
abella+0.015
top+0.014
 wrong+0.014
bench+0.012
 versa+0.011
Neg logit contributions
hr-0.014
rum-0.014
cest-0.014
ment-0.013
AAAA-0.013
Token herself
Token ID5223
Feature activation+1.79e-02

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.020
bench+0.017
rant+0.017
Neg logit contributions
hr-0.025
ment-0.024
rum-0.024
cest-0.023
 tou-0.023
Token that
Token ID326
Feature activation+2.85e-01

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.000
bench+0.000
rant+0.000
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token she
Token ID673
Feature activation+1.02e+00

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.010
rum-0.009
cest-0.009
ment-0.009
 exc-0.009
Token would
Token ID561
Feature activation+1.86e+00

Pos logit contributions
abella+0.032
top+0.028
 wrong+0.027
bench+0.025
 Luke+0.024
Neg logit contributions
 exc-0.039
hr-0.039
oc-0.039
cest-0.038
ment-0.037
Token always
Token ID1464
Feature activation+2.02e-01

Pos logit contributions
abella+0.072
top+0.058
 wrong+0.058
bench+0.053
 projector+0.050
Neg logit contributions
 exc-0.067
hr-0.063
rum-0.062
 mis-0.062
oc-0.060
Token be
Token ID307
Feature activation-1.51e-02

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
rant+0.006
Neg logit contributions
hr-0.007
rum-0.006
cest-0.006
 exc-0.006
AAAA-0.006
Token open
Token ID1280
Feature activation-3.06e-01

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
ment+0.000
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
 versa-0.001
Token to
Token ID284
Feature activation-3.64e-01

Pos logit contributions
hr+0.011
rum+0.010
cest+0.010
AAAA+0.010
ment+0.010
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.011
bench-0.009
rant-0.008
Token trying
Token ID2111
Feature activation-5.48e-02

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
oc+0.011
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.010
rant-0.010
Token cry
Token ID3960
Feature activation-5.25e-01

Pos logit contributions
hr+0.014
rum+0.014
cest+0.014
ment+0.014
AAAA+0.013
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.020
bench-0.017
 versa-0.017
Token.
Token ID13
Feature activation+3.94e-01

Pos logit contributions
hr+0.020
rum+0.019
AAAA+0.018
cest+0.018
packs+0.017
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.013
rant-0.013
Token\n
Token ID198
Feature activation+4.47e-02

Pos logit contributions
abella+0.016
top+0.015
 wrong+0.015
bench+0.013
rant+0.013
Neg logit contributions
hr-0.012
AAAA-0.012
rum-0.011
ment-0.011
 exc-0.011
Token\n
Token ID198
Feature activation+2.65e-02

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.001
rant+0.001
Neg logit contributions
hr-0.002
rum-0.001
AAAA-0.001
cest-0.001
ment-0.001
Token"
Token ID1
Feature activation-6.80e-01

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
AAAA-0.001
cest-0.001
ment-0.000
TokenO
Token ID46
Feature activation+1.01e+00

Pos logit contributions
hr+0.023
rum+0.023
cest+0.023
packs+0.022
ment+0.022
Neg logit contributions
abella-0.026
 wrong-0.024
top-0.023
 Luke-0.019
 versa-0.019
Tokenw
Token ID86
Feature activation+3.96e-02

Pos logit contributions
abella+0.038
top+0.033
 wrong+0.033
 versa+0.031
 projector+0.029
Neg logit contributions
rum-0.036
hr-0.036
oc-0.036
cest-0.035
ment-0.034
Token,
Token ID11
Feature activation-7.38e-01

Pos logit contributions
abella+0.002
top+0.001
 wrong+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token ow
Token ID12334
Feature activation-5.37e-01

Pos logit contributions
hr+0.020
rum+0.019
cest+0.018
packs+0.018
oc+0.017
Neg logit contributions
abella-0.034
 wrong-0.033
top-0.030
 Luke-0.027
 versa-0.026
Token,
Token ID11
Feature activation-1.03e+00

Pos logit contributions
hr+0.021
rum+0.020
AAAA+0.019
cest+0.019
ment+0.019
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.016
bench-0.012
 Luke-0.012
Token ow
Token ID12334
Feature activation-4.99e-01

Pos logit contributions
hr+0.029
packs+0.028
rum+0.028
cest+0.027
oc+0.026
Neg logit contributions
 wrong-0.045
abella-0.045
top-0.040
 Luke-0.037
 never-0.035
Token capt
Token ID3144
Feature activation+6.07e-01

Pos logit contributions
rum+0.039
oc+0.036
AAAA+0.035
cest+0.035
hr+0.034
Neg logit contributions
abella-0.044
 wrong-0.043
top-0.042
 only-0.035
 never-0.035
Tokenivated
Token ID30829
Feature activation-3.62e-01

Pos logit contributions
abella+0.033
top+0.031
 wrong+0.030
bench+0.028
 Luke+0.027
Neg logit contributions
hr-0.012
AAAA-0.010
iring-0.010
 exc-0.009
rum-0.009
Token,
Token ID11
Feature activation-9.71e-01

Pos logit contributions
hr+0.012
rum+0.012
cest+0.011
AAAA+0.011
oc+0.011
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
rant-0.011
Token she
Token ID673
Feature activation-2.88e-02

Pos logit contributions
rum+0.041
oc+0.038
hr+0.038
cest+0.038
AAAA+0.037
Neg logit contributions
abella-0.031
 wrong-0.030
top-0.029
 weekend-0.023
 Luke-0.022
Token decided
Token ID3066
Feature activation-1.18e-01

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
oc+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
rant-0.001
Token to
Token ID284
Feature activation+1.10e-02

Pos logit contributions
hr+0.006
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
rant-0.002
Token stay
Token ID2652
Feature activation-1.05e+00

Pos logit contributions
abella+0.000
 wrong+0.000
top+0.000
bench+0.000
rant+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
 exc-0.000
Token and
Token ID290
Feature activation+3.57e-01

Pos logit contributions
rum+0.043
AAAA+0.041
hr+0.041
oc+0.039
cest+0.039
Neg logit contributions
 wrong-0.033
abella-0.032
top-0.031
rant-0.024
 weekend-0.023
Token rest
Token ID1334
Feature activation-7.77e-02

Pos logit contributions
abella+0.014
 wrong+0.012
top+0.012
bench+0.010
rant+0.010
Neg logit contributions
hr-0.013
 exc-0.012
cest-0.012
rum-0.012
ment-0.011
Token for
Token ID329
Feature activation+6.10e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.003
top-0.003
 wrong-0.003
bench-0.002
 versa-0.002
Token a
Token ID257
Feature activation+7.17e-01

Pos logit contributions
abella+0.023
top+0.022
 wrong+0.021
bench+0.018
rant+0.018
Neg logit contributions
hr-0.021
rum-0.019
AAAA-0.019
ment-0.019
 exc-0.019
Token day
Token ID1110
Feature activation+3.32e-01

Pos logit contributions
abella+0.040
top+0.034
 wrong+0.030
rant+0.028
 versa+0.028
Neg logit contributions
hr-0.040
 exc-0.037
 mis-0.036
oc-0.036
cest-0.036
Token.
Token ID13
Feature activation+3.45e-01

Pos logit contributions
abella+0.014
 wrong+0.013
top+0.012
 versa+0.011
rant+0.010
Neg logit contributions
hr-0.011
cest-0.010
rum-0.010
packs-0.009
oc-0.009
Token The
Token ID383
Feature activation+7.84e-01

Pos logit contributions
abella+0.013
 wrong+0.013
top+0.012
bench+0.011
 versa+0.010
Neg logit contributions
hr-0.011
rum-0.011
AAAA-0.010
oc-0.010
cest-0.010
Token end
Token ID886
Feature activation+9.18e-02

Pos logit contributions
abella+0.029
top+0.025
 wrong+0.023
onna+0.022
rant+0.021
Neg logit contributions
hr-0.028
 mis-0.026
 exc-0.026
 amaz-0.026
cest-0.026
Token.
Token ID13
Feature activation-1.10e-01

Pos logit contributions
abella+0.004
top+0.003
 wrong+0.003
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
ment-0.003
Token<|endoftext|>
Token ID50256
Feature activation+7.68e-01

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
TokenOnce
Token ID7454
Feature activation+4.32e-01

Pos logit contributions
abella+0.034
 wrong+0.032
top+0.032
rant+0.027
 versa+0.027
Neg logit contributions
hr-0.020
AAAA-0.020
rum-0.020
oc-0.019
packs-0.019
Token upon
Token ID2402
Feature activation-4.61e-01

Pos logit contributions
abella+0.024
top+0.023
 wrong+0.022
bench+0.020
 versa+0.020
Neg logit contributions
hr-0.007
rum-0.006
cest-0.006
packs-0.006
 exc-0.006
Token a
Token ID257
Feature activation+1.08e+00

Pos logit contributions
hr+0.020
rum+0.019
cest+0.019
AAAA+0.019
ment+0.018
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.009
 Luke-0.009
Token time
Token ID640
Feature activation-7.84e-01

Pos logit contributions
abella+0.039
top+0.035
 wrong+0.034
work+0.029
bench+0.029
Neg logit contributions
hr-0.038
AAAA-0.037
oc-0.037
rum-0.037
 exc-0.036
Token,
Token ID11
Feature activation-4.31e-01

Pos logit contributions
hr+0.032
cest+0.031
rum+0.031
ment+0.030
AAAA+0.029
Neg logit contributions
abella-0.024
 wrong-0.023
top-0.021
 Luke-0.017
 contest-0.017
Token future
Token ID2003
Feature activation-1.75e-02

Pos logit contributions
abella+0.052
top+0.042
 versa+0.041
 Luke+0.041
bench+0.039
Neg logit contributions
cest-0.046
hr-0.046
oc-0.045
 exc-0.043
 amaz-0.042
Token.
Token ID13
Feature activation-3.83e-01

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token<|endoftext|>
Token ID50256
Feature activation+1.11e+00

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.012
bench-0.010
 Luke-0.010
TokenOnce
Token ID7454
Feature activation+3.55e-01

Pos logit contributions
abella+0.051
 wrong+0.048
top+0.048
rant+0.042
 versa+0.042
Neg logit contributions
hr-0.026
AAAA-0.026
oc-0.025
rum-0.025
packs-0.025
Token upon
Token ID2402
Feature activation-1.40e-01

Pos logit contributions
abella+0.019
top+0.019
 wrong+0.018
bench+0.017
 versa+0.016
Neg logit contributions
hr-0.006
rum-0.005
cest-0.005
packs-0.005
AAAA-0.005
Token a
Token ID257
Feature activation+1.27e+00

Pos logit contributions
hr+0.006
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token time
Token ID640
Feature activation-5.57e-01

Pos logit contributions
abella+0.048
top+0.044
 wrong+0.042
work+0.038
bench+0.037
Neg logit contributions
hr-0.042
oc-0.042
AAAA-0.041
 exc-0.041
rum-0.041
Token,
Token ID11
Feature activation-2.58e-01

Pos logit contributions
hr+0.022
rum+0.021
cest+0.021
AAAA+0.020
ment+0.020
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.016
 Luke-0.013
bench-0.013
Token there
Token ID612
Feature activation-8.97e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.008
rant-0.008
Token was
Token ID373
Feature activation+1.49e-01

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.003
rant-0.002
Token a
Token ID257
Feature activation-3.70e-01

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.005
bench+0.004
rant+0.004
Neg logit contributions
hr-0.006
rum-0.005
AAAA-0.005
cest-0.005
 exc-0.005
Token and
Token ID290
Feature activation+4.25e-01

Pos logit contributions
abella+0.008
 wrong+0.007
top+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
ment-0.005
packs-0.005
Token learned
Token ID4499
Feature activation-3.28e-01

Pos logit contributions
abella+0.015
top+0.014
 wrong+0.014
bench+0.012
rant+0.011
Neg logit contributions
hr-0.015
cest-0.015
rum-0.014
ment-0.014
 exc-0.014
Token many
Token ID867
Feature activation-1.43e+00

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
oc+0.011
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.011
bench-0.009
 versa-0.009
Token things
Token ID1243
Feature activation+2.77e-01

Pos logit contributions
AAAA+0.055
hr+0.052
cest+0.050
rum+0.049
EX+0.047
Neg logit contributions
 wrong-0.052
abella-0.049
top-0.044
 would-0.037
bench-0.036
Token.
Token ID13
Feature activation+1.10e+00

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.009
 versa+0.008
Neg logit contributions
hr-0.009
rum-0.008
cest-0.008
ment-0.008
packs-0.008
Token She
Token ID1375
Feature activation+1.18e+00

Pos logit contributions
abella+0.041
 wrong+0.037
top+0.037
 versa+0.032
bench+0.032
Neg logit contributions
 exc-0.035
packs-0.035
rum-0.035
oc-0.035
AAAA-0.033
Token was
Token ID373
Feature activation+1.74e-01

Pos logit contributions
 wrong+0.044
top+0.043
abella+0.043
bench+0.037
rant+0.036
Neg logit contributions
 exc-0.037
hr-0.036
cest-0.035
oc-0.035
packs-0.033
Token a
Token ID257
Feature activation-8.68e-03

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
rum-0.005
cest-0.005
 exc-0.005
AAAA-0.005
Token gifted
Token ID22527
Feature activation+3.38e-01

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
oc+0.000
Neg logit contributions
abella-0.000
top-0.000
 wrong-0.000
bench-0.000
rant-0.000
Token girl
Token ID2576
Feature activation-2.76e-01

Pos logit contributions
abella+0.016
top+0.015
 wrong+0.015
bench+0.013
rant+0.013
Neg logit contributions
hr-0.008
cest-0.008
rum-0.007
oc-0.007
 exc-0.007
Token who
Token ID508
Feature activation+3.16e-01

Pos logit contributions
hr+0.009
rum+0.009
AAAA+0.008
cest+0.008
ment+0.008
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.009
 versa-0.008
Token who
Token ID508
Feature activation+6.74e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token wanted
Token ID2227
Feature activation+1.43e+00

Pos logit contributions
abella+0.027
top+0.025
 wrong+0.023
rant+0.021
bench+0.021
Neg logit contributions
 exc-0.021
oc-0.021
hr-0.021
cest-0.021
rum-0.021
Token to
Token ID284
Feature activation-4.56e-01

Pos logit contributions
abella+0.060
rant+0.053
top+0.052
bench+0.050
 Dame+0.049
Neg logit contributions
hr-0.043
 prey-0.039
ment-0.039
packs-0.038
 exc-0.038
Token buy
Token ID2822
Feature activation+4.48e-01

Pos logit contributions
hr+0.018
rum+0.018
cest+0.018
AAAA+0.017
ment+0.017
Neg logit contributions
abella-0.014
top-0.013
 wrong-0.013
bench-0.011
 Luke-0.010
Token a
Token ID257
Feature activation-1.27e-01

Pos logit contributions
abella+0.017
top+0.016
 wrong+0.015
bench+0.014
rant+0.013
Neg logit contributions
hr-0.015
rum-0.014
cest-0.014
ment-0.014
oc-0.013
Token toy
Token ID13373
Feature activation-7.49e-01

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.003
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token.
Token ID13
Feature activation+2.12e-01

Pos logit contributions
hr+0.030
rum+0.029
AAAA+0.028
cest+0.027
ment+0.027
Neg logit contributions
abella-0.024
 wrong-0.024
top-0.023
bench-0.018
 contest-0.018
Token He
Token ID679
Feature activation+1.41e-01

Pos logit contributions
abella+0.008
 wrong+0.007
top+0.007
rant+0.006
 versa+0.006
Neg logit contributions
hr-0.007
rum-0.007
cest-0.007
AAAA-0.007
oc-0.006
Token asked
Token ID1965
Feature activation-5.19e-02

Pos logit contributions
abella+0.006
 wrong+0.005
top+0.005
bench+0.004
rant+0.004
Neg logit contributions
hr-0.005
rum-0.004
cest-0.004
AAAA-0.004
oc-0.004
Token his
Token ID465
Feature activation-4.67e-01

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.003
top-0.002
 wrong-0.002
bench-0.002
rant-0.002
Token mum
Token ID25682
Feature activation+1.22e+00

Pos logit contributions
hr+0.017
rum+0.017
cest+0.016
AAAA+0.016
ment+0.016
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
bench-0.012
 weekend-0.012
Token so
Token ID523
Feature activation-3.45e-01

Pos logit contributions
rum+0.028
hr+0.028
cest+0.026
AAAA+0.026
oc+0.026
Neg logit contributions
abella-0.030
 wrong-0.029
top-0.028
 Luke-0.023
bench-0.023
Token the
Token ID262
Feature activation+3.36e-01

Pos logit contributions
hr+0.013
rum+0.013
cest+0.013
AAAA+0.013
oc+0.012
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.011
bench-0.008
 Luke-0.008
Token puppy
Token ID26188
Feature activation-9.40e-01

Pos logit contributions
abella+0.016
top+0.015
 wrong+0.015
bench+0.013
rant+0.013
Neg logit contributions
hr-0.008
cest-0.007
rum-0.007
AAAA-0.007
ment-0.007
Token could
Token ID714
Feature activation-1.08e-01

Pos logit contributions
hr+0.036
rum+0.036
cest+0.034
AAAA+0.034
ment+0.033
Neg logit contributions
abella-0.032
 wrong-0.030
top-0.029
 Luke-0.023
 would-0.023
Token run
Token ID1057
Feature activation+2.50e-01

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.003
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token and
Token ID290
Feature activation-1.57e-01

Pos logit contributions
abella+0.010
 wrong+0.009
top+0.009
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.008
rum-0.008
cest-0.008
AAAA-0.008
oc-0.008
Token play
Token ID711
Feature activation+5.02e-01

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token.
Token ID13
Feature activation-3.81e-01

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.020
bench+0.017
rant+0.017
Neg logit contributions
hr-0.014
rum-0.014
oc-0.013
cest-0.013
AAAA-0.013
Token
Token ID220
Feature activation+5.60e-02

Pos logit contributions
hr+0.015
rum+0.015
cest+0.014
AAAA+0.014
ment+0.014
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.011
bench-0.009
 Luke-0.009
Token\n
Token ID198
Feature activation+2.83e-01

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.002
rant+0.002
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token\n
Token ID198
Feature activation-8.04e-01

Pos logit contributions
abella+0.011
 wrong+0.011
top+0.010
bench+0.009
rant+0.009
Neg logit contributions
hr-0.009
rum-0.009
AAAA-0.008
cest-0.008
oc-0.008
Token a
Token ID257
Feature activation-1.07e+00

Pos logit contributions
abella+0.021
top+0.020
 wrong+0.020
bench+0.017
rant+0.016
Neg logit contributions
hr-0.013
cest-0.013
rum-0.013
ment-0.013
AAAA-0.013
Token smile
Token ID8212
Feature activation-8.87e-01

Pos logit contributions
rum+0.041
hr+0.040
ries+0.038
packs+0.037
cest+0.037
Neg logit contributions
abella-0.035
 wrong-0.035
top-0.034
bench-0.028
 contest-0.028
Token on
Token ID319
Feature activation-6.57e-01

Pos logit contributions
hr+0.034
rum+0.032
AAAA+0.032
cest+0.031
 exc+0.031
Neg logit contributions
abella-0.030
top-0.028
 wrong-0.028
bench-0.022
 Luke-0.022
Token his
Token ID465
Feature activation-1.07e+00

Pos logit contributions
hr+0.020
cest+0.018
AAAA+0.018
packs+0.017
rum+0.017
Neg logit contributions
abella-0.028
 wrong-0.027
top-0.027
 Luke-0.023
bench-0.022
Token face
Token ID1986
Feature activation-7.78e-01

Pos logit contributions
hr+0.037
packs+0.035
rum+0.035
cest+0.034
oc+0.034
Neg logit contributions
 wrong-0.039
abella-0.037
top-0.036
 Luke-0.031
gh-0.031
Token,
Token ID11
Feature activation-2.48e-01

Pos logit contributions
hr+0.031
rum+0.030
AAAA+0.029
cest+0.029
 exc+0.028
Neg logit contributions
abella-0.025
top-0.024
 wrong-0.023
bench-0.018
 Luke-0.018
Token the
Token ID262
Feature activation+3.19e-01

Pos logit contributions
hr+0.009
rum+0.009
cest+0.009
AAAA+0.009
ment+0.008
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.006
 Luke-0.006
Token j
Token ID474
Feature activation+5.83e-01

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.012
bench+0.011
rant+0.010
Neg logit contributions
hr-0.010
cest-0.009
AAAA-0.009
rum-0.009
 exc-0.009
Tokenolly
Token ID5098
Feature activation-6.16e-01

Pos logit contributions
abella+0.020
 wrong+0.018
top+0.017
 versa+0.015
bench+0.015
Neg logit contributions
hr-0.023
ment-0.022
rum-0.022
cest-0.022
oc-0.021
Token pirate
Token ID25868
Feature activation-1.84e-01

Pos logit contributions
rum+0.021
hr+0.021
AAAA+0.020
packs+0.019
cest+0.019
Neg logit contributions
abella-0.023
 wrong-0.022
top-0.021
 Luke-0.017
rant-0.017
Token set
Token ID900
Feature activation-2.56e-02

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
rant-0.005
Token the
Token ID262
Feature activation+8.68e-01

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.020
rant+0.018
 versa+0.018
Neg logit contributions
hr-0.012
rum-0.012
AAAA-0.011
cest-0.011
packs-0.011
Token end
Token ID886
Feature activation-1.12e+00

Pos logit contributions
abella+0.036
top+0.028
rant+0.028
work+0.025
onna+0.025
Neg logit contributions
hr-0.029
 mis-0.029
AAAA-0.029
 tou-0.028
 amaz-0.027
Token,
Token ID11
Feature activation-9.64e-01

Pos logit contributions
hr+0.052
cest+0.049
rum+0.049
AAAA+0.048
EX+0.048
Neg logit contributions
 wrong-0.027
abella-0.027
top-0.024
 contest-0.019
 Jack-0.018
Token Tom
Token ID4186
Feature activation-2.40e-01

Pos logit contributions
rum+0.045
hr+0.045
cest+0.044
AAAA+0.043
ment+0.042
Neg logit contributions
abella-0.025
 wrong-0.024
top-0.022
 Luke-0.016
 only-0.016
Token helped
Token ID4193
Feature activation-1.29e+00

Pos logit contributions
hr+0.008
rum+0.008
cest+0.007
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token Kitty
Token ID21168
Feature activation-5.85e-01

Pos logit contributions
hr+0.048
AAAA+0.047
cest+0.045
rum+0.044
 fortunate+0.044
Neg logit contributions
abella-0.046
 wrong-0.044
top-0.043
 only-0.036
 never-0.035
Token unt
Token ID1418
Feature activation+2.99e-01

Pos logit contributions
hr+0.021
rum+0.021
AAAA+0.021
cest+0.021
ment+0.020
Neg logit contributions
abella-0.020
 wrong-0.020
top-0.019
bench-0.015
 versa-0.015
Tokenangle
Token ID9248
Feature activation-6.92e-01

Pos logit contributions
abella+0.012
top+0.011
 wrong+0.011
bench+0.009
 versa+0.009
Neg logit contributions
hr-0.010
rum-0.010
cest-0.009
packs-0.009
ment-0.009
Token the
Token ID262
Feature activation-8.73e-01

Pos logit contributions
hr+0.025
cest+0.024
rum+0.024
AAAA+0.024
oc+0.023
Neg logit contributions
abella-0.025
 wrong-0.024
top-0.023
bench-0.018
 Luke-0.018
Token knot
Token ID29654
Feature activation+7.37e-01

Pos logit contributions
hr+0.032
rum+0.031
cest+0.030
ment+0.030
oc+0.029
Neg logit contributions
 wrong-0.032
abella-0.029
top-0.029
bench-0.025
 contest-0.023
Token.
Token ID13
Feature activation-8.91e-01

Pos logit contributions
abella+0.034
top+0.029
 wrong+0.029
 versa+0.027
rant+0.025
Neg logit contributions
hr-0.020
rum-0.020
cest-0.019
 exc-0.018
packs-0.018
Token deserve
Token ID10925
Feature activation+1.04e+00

Pos logit contributions
abella+0.045
top+0.041
 wrong+0.039
bench+0.035
side+0.035
Neg logit contributions
hr-0.025
rum-0.024
cest-0.024
 exc-0.023
AAAA-0.022
Token them
Token ID606
Feature activation-4.75e-01

Pos logit contributions
abella+0.037
top+0.035
 wrong+0.031
bench+0.030
rant+0.027
Neg logit contributions
hr-0.036
rum-0.035
packs-0.034
AAAA-0.034
cest-0.033
Token."
Token ID526
Feature activation-5.37e-02

Pos logit contributions
hr+0.018
AAAA+0.017
rum+0.017
cest+0.017
ment+0.016
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
bench-0.012
 versa-0.012
Token\n
Token ID198
Feature activation+4.69e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
rant-0.002
Token\n
Token ID198
Feature activation+4.91e-02

Pos logit contributions
abella+0.020
top+0.019
 wrong+0.018
rant+0.016
 versa+0.016
Neg logit contributions
hr-0.014
rum-0.013
oc-0.013
AAAA-0.013
cest-0.013
TokenL
Token ID43
Feature activation-1.64e-01

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.002
rant+0.002
Neg logit contributions
hr-0.002
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Tokenily
Token ID813
Feature activation+4.40e-01

Pos logit contributions
hr+0.008
rum+0.007
cest+0.007
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
rant-0.003
Token and
Token ID290
Feature activation+9.66e-01

Pos logit contributions
abella+0.023
top+0.021
 wrong+0.021
bench+0.019
 weekend+0.019
Neg logit contributions
hr-0.009
AAAA-0.008
rum-0.008
cest-0.008
oc-0.008
Token Ben
Token ID3932
Feature activation+3.68e-01

Pos logit contributions
abella+0.031
top+0.029
 wrong+0.026
bench+0.025
rant+0.024
Neg logit contributions
hr-0.036
 mis-0.035
packs-0.034
 exc-0.034
 thieves-0.034
Token are
Token ID389
Feature activation-2.40e-01

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.013
bench+0.011
 weekend+0.011
Neg logit contributions
hr-0.013
rum-0.012
AAAA-0.012
cest-0.011
 exc-0.011
Token sad
Token ID6507
Feature activation-3.44e-01

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.005
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.010
bench-0.009
 versa-0.009
Token new
Token ID649
Feature activation+9.31e-03

Pos logit contributions
hr+0.010
rum+0.009
cest+0.009
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.011
 versa-0.011
Token friend
Token ID1545
Feature activation+4.49e-02

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
rant+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
oc-0.000
Token.
Token ID13
Feature activation+2.62e-01

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token They
Token ID1119
Feature activation-5.90e-01

Pos logit contributions
abella+0.010
 wrong+0.009
top+0.009
bench+0.008
rant+0.007
Neg logit contributions
hr-0.009
AAAA-0.009
rum-0.009
cest-0.008
ment-0.008
Token share
Token ID2648
Feature activation-3.97e-01

Pos logit contributions
hr+0.023
rum+0.023
cest+0.022
AAAA+0.021
ment+0.021
Neg logit contributions
abella-0.020
 wrong-0.018
top-0.018
bench-0.014
 versa-0.014
Token their
Token ID511
Feature activation-2.00e+00

Pos logit contributions
hr+0.015
rum+0.015
cest+0.014
AAAA+0.014
ment+0.013
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.010
 Luke-0.010
Token food
Token ID2057
Feature activation-3.73e-01

Pos logit contributions
rum+0.078
cest+0.076
iring+0.076
hr+0.076
AAAA+0.072
Neg logit contributions
 wrong-0.065
abella-0.055
top-0.054
 weekend-0.053
 contest-0.052
Token with
Token ID351
Feature activation+2.26e-01

Pos logit contributions
hr+0.011
rum+0.010
AAAA+0.010
cest+0.010
ment+0.009
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.013
 versa-0.012
Token Polly
Token ID36898
Feature activation-7.73e-01

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
 versa+0.005
Neg logit contributions
hr-0.009
rum-0.008
cest-0.008
AAAA-0.008
ment-0.008
Token and
Token ID290
Feature activation-7.81e-01

Pos logit contributions
hr+0.029
AAAA+0.028
rum+0.028
cest+0.027
ment+0.026
Neg logit contributions
abella-0.026
 wrong-0.026
top-0.025
bench-0.020
 Luke-0.020
Token listen
Token ID6004
Feature activation+9.66e-01

Pos logit contributions
hr+0.032
rum+0.031
cest+0.030
AAAA+0.030
iring+0.029
Neg logit contributions
abella-0.024
 wrong-0.023
top-0.022
 Luke-0.019
bench-0.017
Token she
Token ID673
Feature activation+7.83e-01

Pos logit contributions
hr+0.022
rum+0.022
cest+0.021
AAAA+0.021
packs+0.020
Neg logit contributions
abella-0.028
 wrong-0.027
top-0.026
bench-0.022
 Luke-0.021
Token wanted
Token ID2227
Feature activation+1.42e+00

Pos logit contributions
abella+0.030
top+0.029
 wrong+0.028
bench+0.025
rant+0.024
Neg logit contributions
hr-0.025
rum-0.023
 mis-0.023
 exc-0.023
cest-0.023
Token to
Token ID284
Feature activation+3.42e-02

Pos logit contributions
abella+0.051
top+0.048
bench+0.044
rant+0.044
 wrong+0.043
Neg logit contributions
hr-0.043
ment-0.042
 mis-0.041
rum-0.041
 mut-0.040
Token make
Token ID787
Feature activation+3.07e-01

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
 exc-0.001
Token her
Token ID607
Feature activation-1.42e+00

Pos logit contributions
abella+0.012
top+0.012
 wrong+0.012
bench+0.010
rant+0.010
Neg logit contributions
hr-0.010
rum-0.009
cest-0.009
ment-0.009
AAAA-0.008
Token daughter
Token ID4957
Feature activation-1.98e+00

Pos logit contributions
AAAA+0.058
rum+0.054
hr+0.053
cest+0.051
iring+0.051
Neg logit contributions
 wrong-0.048
abella-0.046
top-0.045
 weekend-0.036
 Luke-0.035
Token happy
Token ID3772
Feature activation-4.55e-01

Pos logit contributions
AAAA+0.083
rum+0.077
cest+0.076
hr+0.076
iring+0.075
Neg logit contributions
 wrong-0.063
top-0.059
abella-0.055
 Luke-0.044
bench-0.042
Token.
Token ID13
Feature activation-6.02e-01

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.016
oc+0.016
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.014
bench-0.012
 Luke-0.011
Token
Token ID220
Feature activation-3.96e-01

Pos logit contributions
hr+0.027
cest+0.027
rum+0.026
AAAA+0.025
packs+0.025
Neg logit contributions
abella-0.017
top-0.015
 wrong-0.015
 However-0.010
bench-0.010
Token\n
Token ID198
Feature activation-6.89e-01

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.014
oc+0.013
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.010
 versa-0.010
Token\n
Token ID198
Feature activation-1.52e+00

Pos logit contributions
hr+0.028
cest+0.026
rum+0.026
packs+0.025
AAAA+0.025
Neg logit contributions
abella-0.023
top-0.020
 wrong-0.020
bench-0.016
 versa-0.016
Token Tom
Token ID4186
Feature activation-5.30e-02

Pos logit contributions
abella+0.052
top+0.052
 wrong+0.051
rant+0.047
bench+0.045
Neg logit contributions
cest-0.051
hr-0.050
 exc-0.050
packs-0.047
 mis-0.046
Token.
Token ID13
Feature activation+2.50e-01

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token They
Token ID1119
Feature activation+3.89e-01

Pos logit contributions
abella+0.009
 wrong+0.008
top+0.008
bench+0.007
 versa+0.006
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
AAAA-0.009
packs-0.008
Token were
Token ID547
Feature activation+1.92e-01

Pos logit contributions
abella+0.016
 wrong+0.015
top+0.015
bench+0.013
 versa+0.012
Neg logit contributions
hr-0.012
cest-0.011
 exc-0.011
rum-0.011
AAAA-0.011
Token both
Token ID1111
Feature activation+5.27e-01

Pos logit contributions
abella+0.007
top+0.007
 wrong+0.006
bench+0.006
rant+0.005
Neg logit contributions
hr-0.007
cest-0.006
rum-0.006
 exc-0.006
AAAA-0.006
Token three
Token ID1115
Feature activation-2.12e+00

Pos logit contributions
abella+0.022
top+0.019
 wrong+0.018
bench+0.017
 versa+0.016
Neg logit contributions
hr-0.018
cest-0.017
 exc-0.016
rum-0.016
 mis-0.015
Token years
Token ID812
Feature activation+3.57e-01

Pos logit contributions
AAAA+0.075
 tou+0.070
EX+0.069
hr+0.068
Upon+0.067
Neg logit contributions
 wrong-0.076
abella-0.073
top-0.071
 never-0.058
 often-0.057
Token old
Token ID1468
Feature activation-4.35e-01

Pos logit contributions
abella+0.016
top+0.015
 wrong+0.015
 versa+0.013
bench+0.012
Neg logit contributions
hr-0.010
cest-0.009
rum-0.009
 exc-0.009
ment-0.009
Token and
Token ID290
Feature activation+1.06e+00

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.011
cest+0.011
ment+0.010
Neg logit contributions
abella-0.019
 wrong-0.018
top-0.018
bench-0.015
 Luke-0.015
Token loved
Token ID6151
Feature activation-9.01e-02

Pos logit contributions
abella+0.039
top+0.037
 wrong+0.037
bench+0.033
rant+0.033
Neg logit contributions
 exc-0.034
hr-0.033
cest-0.033
rum-0.031
 mis-0.030
Token to
Token ID284
Feature activation-1.92e-01

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
rant-0.002
Token you
Token ID345
Feature activation-1.10e+00

Pos logit contributions
packs+0.072
hr+0.072
cest+0.072
rum+0.072
 amaz+0.071
Neg logit contributions
 wrong-0.043
abella-0.034
top-0.033
 never-0.025
 only-0.024
Token."
Token ID526
Feature activation-7.62e-01

Pos logit contributions
AAAA+0.046
hr+0.046
ment+0.043
rum+0.043
cest+0.042
Neg logit contributions
 wrong-0.035
top-0.031
abella-0.030
 never-0.024
 weekend-0.022
Token Twe
Token ID24205
Feature activation-1.64e+00

Pos logit contributions
hr+0.031
cest+0.030
rum+0.030
packs+0.029
AAAA+0.029
Neg logit contributions
abella-0.023
 wrong-0.023
top-0.023
 However-0.018
bench-0.018
Tokenety
Token ID2963
Feature activation-2.15e-01

Pos logit contributions
 amaz+0.075
hr+0.074
rum+0.072
ment+0.072
AAAA+0.072
Neg logit contributions
top-0.038
abella-0.037
 wrong-0.034
bo-0.027
oes-0.026
Token flew
Token ID13112
Feature activation-1.84e-01

Pos logit contributions
hr+0.008
rum+0.008
cest+0.007
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.006
rant-0.006
Token out
Token ID503
Feature activation-1.88e+00

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
rant-0.005
Token of
Token ID286
Feature activation-1.15e+00

Pos logit contributions
AAAA+0.075
rum+0.072
cest+0.072
hr+0.071
oc+0.069
Neg logit contributions
top-0.053
 wrong-0.053
abella-0.051
 Jack-0.044
 over-0.044
Token the
Token ID262
Feature activation-9.75e-01

Pos logit contributions
hr+0.045
cest+0.044
rum+0.044
oc+0.043
ment+0.041
Neg logit contributions
 wrong-0.037
abella-0.036
top-0.033
bench-0.030
 weekend-0.028
Token cage
Token ID17186
Feature activation+1.23e-01

Pos logit contributions
rum+0.033
hr+0.030
ment+0.030
cest+0.029
AAAA+0.028
Neg logit contributions
 wrong-0.040
abella-0.036
top-0.036
 contest-0.030
 weekend-0.030
Token and
Token ID290
Feature activation-3.88e-01

Pos logit contributions
abella+0.006
 wrong+0.006
top+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.003
rum-0.003
cest-0.002
ment-0.002
oc-0.002
Token landed
Token ID11406
Feature activation-7.33e-01

Pos logit contributions
rum+0.015
hr+0.015
cest+0.014
AAAA+0.014
ment+0.013
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.012
bench-0.010
 versa-0.010
Token duck
Token ID22045
Feature activation-3.86e-01

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.002
top-0.002
 wrong-0.002
bench-0.002
rant-0.002
Token!"
Token ID2474
Feature activation-8.97e-01

Pos logit contributions
hr+0.013
rum+0.012
AAAA+0.012
cest+0.012
ment+0.012
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
rant-0.011
Token Mom
Token ID11254
Feature activation+2.28e-01

Pos logit contributions
hr+0.046
rum+0.045
cest+0.045
oc+0.044
AAAA+0.042
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.017
 Luke-0.012
 only-0.012
Token says
Token ID1139
Feature activation+3.91e-01

Pos logit contributions
abella+0.009
top+0.009
 wrong+0.008
bench+0.007
 weekend+0.007
Neg logit contributions
hr-0.007
rum-0.007
AAAA-0.007
cest-0.007
 exc-0.007
Token.
Token ID13
Feature activation-1.30e+00

Pos logit contributions
abella+0.017
top+0.015
 wrong+0.015
 weekend+0.013
 versa+0.013
Neg logit contributions
hr-0.012
rum-0.012
ment-0.011
cest-0.011
 exc-0.011
Token "
Token ID366
Feature activation-1.99e+00

Pos logit contributions
hr+0.062
cest+0.059
rum+0.059
oc+0.056
packs+0.056
Neg logit contributions
abella-0.034
 wrong-0.033
top-0.031
 only-0.024
 However-0.024
TokenShe
Token ID3347
Feature activation-5.26e-01

Pos logit contributions
packs+0.091
rum+0.088
ment+0.086
 amaz+0.085
cest+0.084
Neg logit contributions
top-0.050
abella-0.047
 wrong-0.046
 never-0.037
her-0.037
Token is
Token ID318
Feature activation-1.92e-01

Pos logit contributions
hr+0.022
rum+0.021
cest+0.021
ment+0.021
AAAA+0.021
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.014
bench-0.011
rant-0.011
Token not
Token ID407
Feature activation-1.37e+00

Pos logit contributions
hr+0.007
rum+0.007
cest+0.007
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
rant-0.005
Token food
Token ID2057
Feature activation+1.45e-01

Pos logit contributions
oc+0.054
 amaz+0.053
hr+0.052
ment+0.052
rum+0.051
Neg logit contributions
 wrong-0.049
abella-0.041
top-0.035
 always-0.031
bench-0.031
Token,
Token ID11
Feature activation-7.47e-01

Pos logit contributions
abella+0.006
top+0.005
 wrong+0.005
bench+0.004
 versa+0.004
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.004
oc-0.004