TOP ACTIVATIONS
MAX = 0.082

they played in the garden and had lots of fun .
and wrapped one of their arms around Bobby 's shoulders .
mouth and run around the yard . One day
to have fun , it should never come before taking care
Tim my learned that he should complete his math and not
the story is : You should always ask your parents before
snow man and wrapped his arms around him .
toys and run around the yard . One day , Max
of the story : kids should never try to use grown
the story is : You should not be ignorant and selfish
and looking for a carrot trail ," replied Benny .
Every day they played together and joked around . But one
so he hid a special camera . The bear searched high
and your mind . You should not play too much or
said that Lily and Ben should say sorry and thank you
hug sound and wraps his arms around the paper .
hug sound and wraps her arms around the paper .
an important lesson - we should respect the opinion of others
. They always played together and had lots of fun .
cricket was sitting on a leaf outside . Suddenly , it

MIN ACTIVATIONS
MIN = -0.087

was all broken . Wil ma felt a big
suddenly Lucy heard a loud " SC REAM "! She felt
. It was surrounded by other weak trees in the forest
. One day , Wil ma tri pped and
, the tree shook and an apple fell to the ground
in a jungle with many other animals like lions , tigers
, he heard a loud " Z oom !"
a big tree with many other cher ries . The smooth
Help !" Tom screamed . " There 's a lion in
can 't just play with other people 's things without asking
One day , while they were walking , Tim my
lived on a farm with other animals . The pony liked
. She wanted to fill her basket with the needles .
big tree with lots of other squirrel s . They were
that day : never load your wagon with p ears near
in a big field with other animals . One
. One day , while they were walking , Lily found
any friends . All the other trees were big and strong
" Don 't eat too many ." L ily
in a big field with other animals . One

INTERVAL 0.040 - 0.082
CONTAINS 0.613%

please !" Her mom looked at the price tag
skill to have . You can play with Max anytime you
Tim my . Do you want to play with me ?"
. They continued their walk and the boy was happy to
decided to go on a journey . All around

INTERVAL -0.003 - 0.040
CONTAINS 55.963%

and agreed . The voice pulled the little girl out of
person said " It 's ok , you can come out
smiled and said , " Let 's go to the park
place had many old things to look at . Lucy liked
mom went back to the gate . Tom 's mom put

INTERVAL -0.045 - -0.003
CONTAINS 43.143%

to a party with all of his friends and family .
her room was a mess . She didn 't want her
was sorry . The vendor smiled . He said
From that day forward , the little bird always
we stumble , we have friends to help us up .

INTERVAL -0.087 - -0.045
CONTAINS 0.281%

a castle that was one hundred inches high . She said
his classmates . He brought his toy plane and could not
. One day , while he was playing , he found
lesson . They should not feed the animals in the pond
pick up the herb with its be ak to eat it
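
The four INTERVAL ranges above look like equal-width bins spanning MIN (-0.087) to MAX (0.082). The source does not say how the dashboard derives them, so the sketch below is an assumption: it recomputes the bin edges that way and checks that the CONTAINS percentages add up to 100%. The names `bin_edges`, `N_BINS`, and `contains_pct` are hypothetical and do not come from the dashboard.

```python
# Assumption: the INTERVAL edges are four equal-width bins between MIN and MAX.
MIN_ACT = -0.087  # from "MIN = -0.087"
MAX_ACT = 0.082   # from "MAX = 0.082"
N_BINS = 4        # hypothetical; inferred from the four INTERVAL blocks


def bin_edges(lo, hi, n):
    """Return n + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


edges = bin_edges(MIN_ACT, MAX_ACT, N_BINS)

# Percentages copied from the CONTAINS lines, highest interval first.
contains_pct = [0.613, 55.963, 43.143, 0.281]

for lo, hi in zip(edges[:-1], edges[1:]):
    print(f"{lo:+.4f} to {hi:+.4f}")
print(f"total: {sum(contains_pct):.3f}%")
```

If the assumption holds, the computed edges agree with the listed ones to the three decimals the dashboard prints. The two outer bins together hold under 1% of activations.
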
Token they
Token ID484
Feature activation+3.65e-02

Pos logit contributions
abella+0.031
top+0.031
 wrong+0.030
bench+0.027
rant+0.027
Neg logit contributions
hr-0.023
packs-0.022
rum-0.022
cest-0.022
ment-0.022
Token played
Token ID2826
Feature activation+1.76e-04

Pos logit contributions
abella+0.050
 wrong+0.043
top+0.042
bench+0.035
gh+0.034
Neg logit contributions
 exc-0.061
cest-0.057
packs-0.055
oc-0.055
 mis-0.054
Token in
Token ID287
Feature activation-3.23e-03

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
rant+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
oc-0.000
Token the
Token ID262
Feature activation+3.11e-02

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token garden
Token ID11376
Feature activation+3.51e-02

Pos logit contributions
abella+0.062
top+0.053
bench+0.053
 versa+0.052
 wrong+0.051
Neg logit contributions
cest-0.029
AAAA-0.029
hr-0.028
 tou-0.027
 oak-0.027
Token and
Token ID290
Feature activation+8.18e-02

Pos logit contributions
abella+0.076
 versa+0.062
 wrong+0.060
top+0.059
bench+0.058
Neg logit contributions
packs-0.033
hr-0.033
cest-0.031
 exc-0.030
oc-0.029
Token had
Token ID550
Feature activation+3.58e-02

Pos logit contributions
bench+0.149
 wrong+0.145
top+0.142
inner+0.137
 virtue+0.135
Neg logit contributions
 ge-0.072
 squirrel-0.071
packs-0.065
 pets-0.064
 cor-0.063
Token lots
Token ID6041
Feature activation-2.76e-02

Pos logit contributions
abella+0.064
top+0.060
bench+0.057
rant+0.056
 wrong+0.056
Neg logit contributions
hr-0.038
cest-0.035
Upon-0.034
packs-0.034
AAAA-0.033
Token of
Token ID286
Feature activation-8.31e-03

Pos logit contributions
rum+0.041
hr+0.041
AAAA+0.041
cest+0.038
oc+0.036
Neg logit contributions
 wrong-0.038
top-0.037
abella-0.036
 weekend-0.030
 contest-0.030
Token fun
Token ID1257
Feature activation-1.52e-02

Pos logit contributions
hr+0.016
rum+0.016
cest+0.015
AAAA+0.015
ment+0.015
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.007
bench-0.006
 weekend-0.005
Token.
Token ID13
Feature activation+4.51e-02

Pos logit contributions
hr+0.024
rum+0.023
AAAA+0.023
cest+0.022
 tou+0.022
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.019
bench-0.015
 weekend-0.014
Token and
Token ID290
Feature activation+1.35e-03

Pos logit contributions
hr+0.013
rum+0.012
AAAA+0.012
cest+0.011
oc+0.011
Neg logit contributions
abella-0.019
 wrong-0.019
top-0.019
bench-0.016
rant-0.015
Token wrapped
Token ID12908
Feature activation+4.54e-02

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token one
Token ID530
Feature activation+1.71e-02

Pos logit contributions
abella+0.086
top+0.078
 contest+0.077
 wrong+0.075
bench+0.075
Neg logit contributions
ment-0.046
iring-0.041
rum-0.040
adoes-0.037
ries-0.037
Token of
Token ID286
Feature activation+2.71e-03

Pos logit contributions
abella+0.029
 wrong+0.025
top+0.025
 versa+0.023
bench+0.023
Neg logit contributions
hr-0.021
rum-0.021
cest-0.020
ment-0.020
oc-0.020
Token their
Token ID511
Feature activation-3.04e-02

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.005
bench+0.004
 versa+0.004
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
ment-0.002
Token arms
Token ID5101
Feature activation+8.10e-02

Pos logit contributions
hr+0.041
rum+0.038
cest+0.038
AAAA+0.038
oc+0.036
Neg logit contributions
 wrong-0.049
abella-0.045
top-0.044
 weekend-0.038
 would-0.037
Token around
Token ID1088
Feature activation-2.95e-03

Pos logit contributions
abella+0.177
 versa+0.146
 Dame+0.145
 cinema+0.144
 halftime+0.143
Neg logit contributions
 as-0.066
 from-0.065
iring-0.060
 behind-0.058
rum-0.058
Token Bobby
Token ID17155
Feature activation+2.59e-02

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token's
Token ID338
Feature activation+3.76e-04

Pos logit contributions
abella+0.047
 wrong+0.041
top+0.040
 contest+0.037
bench+0.037
Neg logit contributions
hr-0.030
rum-0.030
oc-0.028
cest-0.028
ment-0.027
Token shoulders
Token ID12450
Feature activation+5.77e-02

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token.
Token ID13
Feature activation+1.13e-02

Pos logit contributions
abella+0.126
 wrong+0.104
 Dame+0.100
 Luke+0.099
 versa+0.099
Neg logit contributions
rum-0.046
iring-0.045
hr-0.044
 exc-0.043
cest-0.043
Token mouth
Token ID5422
Feature activation+2.17e-02

Pos logit contributions
hr+0.033
rum+0.032
cest+0.030
ment+0.030
AAAA+0.029
Neg logit contributions
 wrong-0.031
abella-0.030
top-0.030
bench-0.024
 Luke-0.024
Token and
Token ID290
Feature activation-1.62e-03

Pos logit contributions
abella+0.045
 wrong+0.040
top+0.039
 versa+0.037
bench+0.035
Neg logit contributions
hr-0.019
packs-0.018
cest-0.018
rum-0.018
ment-0.017
Token run
Token ID1057
Feature activation+1.96e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token around
Token ID1088
Feature activation+1.61e-02

Pos logit contributions
abella+0.030
 wrong+0.026
top+0.026
bench+0.023
 Luke+0.021
Neg logit contributions
oc-0.027
hr-0.026
rum-0.026
cest-0.026
AAAA-0.025
Token the
Token ID262
Feature activation+3.92e-03

Pos logit contributions
abella+0.031
top+0.028
 wrong+0.026
bench+0.025
rant+0.024
Neg logit contributions
hr-0.017
rum-0.016
oc-0.016
 exc-0.015
cest-0.015
Token yard
Token ID12699
Feature activation+7.63e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
oc-0.005
Token.
Token ID13
Feature activation+1.31e-02

Pos logit contributions
abella+0.178
acey+0.149
 versa+0.144
rant+0.140
 Dame+0.139
Neg logit contributions
 than-0.054
 exc-0.050
hr-0.050
cest-0.049
packs-0.047
Token\n
Token ID198
Feature activation+2.31e-02

Pos logit contributions
abella+0.022
 wrong+0.021
top+0.021
bench+0.018
rant+0.018
Neg logit contributions
oc-0.016
rum-0.015
hr-0.015
AAAA-0.015
 exc-0.014
Token\n
Token ID198
Feature activation+1.47e-02

Pos logit contributions
abella+0.040
 wrong+0.039
top+0.038
rant+0.033
 versa+0.033
Neg logit contributions
hr-0.025
rum-0.025
oc-0.025
AAAA-0.024
packs-0.023
TokenOne
Token ID3198
Feature activation-1.68e-02

Pos logit contributions
abella+0.017
top+0.016
 wrong+0.016
work+0.013
bench+0.013
Neg logit contributions
oc-0.025
AAAA-0.025
rum-0.025
hr-0.024
cest-0.024
Token day
Token ID1110
Feature activation-1.64e-02

Pos logit contributions
hr+0.037
rum+0.037
AAAA+0.035
cest+0.035
ment+0.035
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.010
 weekend-0.008
 morning-0.006
Token to
Token ID284
Feature activation+3.34e-02

Pos logit contributions
abella+0.028
top+0.025
 wrong+0.023
rant+0.021
 versa+0.020
Neg logit contributions
rum-0.033
hr-0.032
ment-0.032
cest-0.031
packs-0.030
Token have
Token ID423
Feature activation+1.26e-02

Pos logit contributions
abella+0.059
top+0.053
 wrong+0.051
bench+0.047
 projector+0.046
Neg logit contributions
hr-0.041
rum-0.039
 exc-0.039
 mis-0.036
cest-0.036
Token fun
Token ID1257
Feature activation-8.52e-03

Pos logit contributions
abella+0.018
top+0.016
 wrong+0.015
rant+0.014
bench+0.013
Neg logit contributions
hr-0.019
rum-0.018
cest-0.018
AAAA-0.018
ment-0.017
Token,
Token ID11
Feature activation-6.10e-03

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
oc+0.010
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
 versa-0.010
Token it
Token ID340
Feature activation+2.72e-02

Pos logit contributions
hr+0.010
rum+0.010
cest+0.009
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.006
 versa-0.006
Token should
Token ID815
Feature activation+7.62e-02

Pos logit contributions
abella+0.042
top+0.037
 wrong+0.034
bench+0.033
 versa+0.032
Neg logit contributions
hr-0.038
rum-0.036
 exc-0.036
oc-0.036
cest-0.035
Token never
Token ID1239
Feature activation+1.36e-02

Pos logit contributions
abella+0.134
top+0.117
 projector+0.108
rant+0.106
bench+0.105
Neg logit contributions
 exc-0.098
rum-0.088
 mis-0.086
 tou-0.083
 fear-0.083
Token come
Token ID1282
Feature activation-2.15e-02

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.021
bench+0.019
rant+0.019
Neg logit contributions
hr-0.017
cest-0.015
rum-0.015
 exc-0.015
AAAA-0.015
Token before
Token ID878
Feature activation-1.68e-02

Pos logit contributions
hr+0.039
rum+0.038
AAAA+0.038
oc+0.036
cest+0.036
Neg logit contributions
 wrong-0.023
abella-0.023
top-0.022
rant-0.016
work-0.016
Token taking
Token ID2263
Feature activation+2.88e-03

Pos logit contributions
hr+0.026
rum+0.025
cest+0.025
AAAA+0.024
oc+0.024
Neg logit contributions
abella-0.023
 wrong-0.022
top-0.021
bench-0.017
 weekend-0.017
Token care
Token ID1337
Feature activation-9.67e-04

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
ment-0.004
Token Tim
Token ID5045
Feature activation+1.68e-02

Pos logit contributions
rum+0.056
hr+0.056
AAAA+0.055
cest+0.054
ment+0.054
Neg logit contributions
 wrong-0.026
top-0.023
abella-0.023
 Jack-0.019
 only-0.017
Tokenmy
Token ID1820
Feature activation+2.58e-02

Pos logit contributions
abella+0.021
top+0.020
 wrong+0.018
 versa+0.016
bench+0.015
Neg logit contributions
hr-0.027
cest-0.027
rum-0.026
oc-0.026
 exc-0.025
Token learned
Token ID4499
Feature activation+6.28e-03

Pos logit contributions
abella+0.038
top+0.035
 wrong+0.033
rant+0.030
bench+0.029
Neg logit contributions
cest-0.035
oc-0.035
hr-0.035
rum-0.034
 exc-0.034
Token that
Token ID326
Feature activation+2.62e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.011
rum-0.011
cest-0.011
AAAA-0.011
ment-0.011
Token he
Token ID339
Feature activation+2.75e-02

Pos logit contributions
abella+0.043
top+0.039
rant+0.036
 wrong+0.034
bench+0.031
Neg logit contributions
hr-0.036
 exc-0.035
rum-0.035
cest-0.033
 accidents-0.033
Token should
Token ID815
Feature activation+7.61e-02

Pos logit contributions
abella+0.046
top+0.045
 wrong+0.039
bench+0.039
side+0.038
Neg logit contributions
hr-0.031
 exc-0.031
oc-0.030
rum-0.030
cest-0.030
Token complete
Token ID1844
Feature activation+1.29e-02

Pos logit contributions
abella+0.147
top+0.127
 projector+0.124
rant+0.123
 cinema+0.115
Neg logit contributions
 exc-0.091
rum-0.081
 mis-0.075
 worry-0.068
 heel-0.068
Token his
Token ID465
Feature activation-2.84e-02

Pos logit contributions
abella+0.021
top+0.020
 wrong+0.019
bench+0.017
rant+0.017
Neg logit contributions
hr-0.016
rum-0.016
cest-0.015
packs-0.014
ment-0.014
Token math
Token ID10688
Feature activation-7.34e-03

Pos logit contributions
cest+0.061
hr+0.061
rum+0.060
AAAA+0.059
ment+0.059
Neg logit contributions
 wrong-0.023
abella-0.021
top-0.018
 contest-0.013
bench-0.013
Token and
Token ID290
Feature activation+1.95e-02

Pos logit contributions
hr+0.013
rum+0.013
cest+0.013
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.007
bench-0.006
 versa-0.005
Token not
Token ID407
Feature activation+2.73e-02

Pos logit contributions
abella+0.027
top+0.024
 wrong+0.022
rant+0.021
 versa+0.020
Neg logit contributions
hr-0.030
 exc-0.029
rum-0.029
cest-0.029
ment-0.028
Token the
Token ID262
Feature activation+2.99e-02

Pos logit contributions
abella+0.034
top+0.034
 wrong+0.033
bench+0.030
 versa+0.028
Neg logit contributions
hr-0.020
rum-0.019
oc-0.018
cest-0.018
AAAA-0.017
Token story
Token ID1621
Feature activation+2.51e-02

Pos logit contributions
top+0.056
abella+0.053
rant+0.048
 wrong+0.047
bench+0.046
Neg logit contributions
 tou-0.032
hr-0.032
 exc-0.031
AAAA-0.031
 mis-0.029
Token is
Token ID318
Feature activation+2.10e-02

Pos logit contributions
abella+0.043
top+0.039
 wrong+0.036
 versa+0.035
rant+0.034
Neg logit contributions
rum-0.032
oc-0.030
ment-0.029
packs-0.029
hr-0.028
Token:
Token ID25
Feature activation+2.12e-02

Pos logit contributions
abella+0.029
top+0.028
bench+0.025
 wrong+0.025
 versa+0.024
Neg logit contributions
hr-0.031
rum-0.031
iring-0.030
ment-0.029
 exc-0.028
Token You
Token ID921
Feature activation+7.08e-03

Pos logit contributions
abella+0.035
top+0.031
 wrong+0.031
bench+0.028
rant+0.027
Neg logit contributions
hr-0.029
rum-0.028
 exc-0.027
oc-0.027
AAAA-0.026
Token should
Token ID815
Feature activation+7.61e-02

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.010
rum-0.010
cest-0.010
 exc-0.010
AAAA-0.010
Token always
Token ID1464
Feature activation+3.29e-02

Pos logit contributions
abella+0.145
top+0.124
 projector+0.119
rant+0.116
bench+0.110
Neg logit contributions
 exc-0.090
rum-0.087
 mis-0.073
 worry-0.072
 fear-0.070
Token ask
Token ID1265
Feature activation+3.21e-03

Pos logit contributions
abella+0.064
top+0.054
 wrong+0.051
rant+0.047
side+0.046
Neg logit contributions
rum-0.038
hr-0.038
 exc-0.038
 mis-0.035
cest-0.033
Token your
Token ID534
Feature activation+9.98e-03

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
 versa+0.003
Neg logit contributions
hr-0.005
rum-0.005
cest-0.004
AAAA-0.004
ment-0.004
Token parents
Token ID3397
Feature activation+1.89e-02

Pos logit contributions
abella+0.019
top+0.018
 wrong+0.017
bench+0.016
rant+0.016
Neg logit contributions
hr-0.010
rum-0.010
cest-0.009
 exc-0.009
packs-0.009
Token before
Token ID878
Feature activation+1.52e-03

Pos logit contributions
abella+0.032
top+0.029
 wrong+0.028
bench+0.026
 versa+0.024
Neg logit contributions
hr-0.024
rum-0.022
cest-0.022
oc-0.021
packs-0.021
Token snow
Token ID6729
Feature activation+2.93e-02

Pos logit contributions
abella+0.023
top+0.021
 wrong+0.020
bench+0.018
 versa+0.018
Neg logit contributions
hr-0.014
cest-0.014
rum-0.014
AAAA-0.013
oc-0.013
Tokenman
Token ID805
Feature activation+5.58e-03

Pos logit contributions
abella+0.044
 wrong+0.037
top+0.035
 versa+0.032
bench+0.030
Neg logit contributions
oc-0.045
cest-0.045
hr-0.044
ment-0.043
rum-0.043
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.009
 versa+0.009
Neg logit contributions
hr-0.006
rum-0.005
cest-0.005
oc-0.005
packs-0.005
Token wrapped
Token ID12908
Feature activation+4.81e-02

Pos logit contributions
hr+0.022
rum+0.022
cest+0.021
AAAA+0.021
iring+0.020
Neg logit contributions
abella-0.020
top-0.018
 wrong-0.018
bench-0.014
 Luke-0.014
Token his
Token ID465
Feature activation-4.01e-02

Pos logit contributions
abella+0.090
 wrong+0.085
top+0.082
bench+0.081
 weekend+0.079
Neg logit contributions
ment-0.045
ries-0.041
rum-0.040
hr-0.039
iring-0.037
Token arms
Token ID5101
Feature activation+7.57e-02

Pos logit contributions
hr+0.064
Upon+0.057
cest+0.057
AAAA+0.056
rum+0.055
Neg logit contributions
 wrong-0.056
abella-0.048
top-0.047
gh-0.042
 would-0.041
Token around
Token ID1088
Feature activation+1.72e-02

Pos logit contributions
abella+0.164
 versa+0.136
 halftime+0.133
 Dame+0.132
 cinema+0.132
Neg logit contributions
 from-0.061
 than-0.061
 as-0.059
 behind-0.057
rum-0.056
Token him
Token ID683
Feature activation+1.09e-03

Pos logit contributions
abella+0.031
top+0.029
 wrong+0.029
bench+0.026
 versa+0.025
Neg logit contributions
hr-0.018
rum-0.018
ment-0.017
oc-0.017
cest-0.016
Token.
Token ID13
Feature activation-7.38e-03

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token\n
Token ID198
Feature activation+8.67e-03

Pos logit contributions
hr+0.011
rum+0.011
cest+0.011
AAAA+0.010
ment+0.010
Neg logit contributions
abella-0.011
top-0.010
 wrong-0.010
bench-0.008
 Luke-0.008
Token\n
Token ID198
Feature activation-1.48e-02

Pos logit contributions
abella+0.014
 wrong+0.013
top+0.013
bench+0.011
 versa+0.011
Neg logit contributions
hr-0.011
rum-0.011
AAAA-0.010
oc-0.010
cest-0.010
Token toys
Token ID14958
Feature activation+1.35e-02

Pos logit contributions
hr+0.067
rum+0.065
ment+0.064
AAAA+0.060
cest+0.057
Neg logit contributions
 wrong-0.066
top-0.053
abella-0.052
 would-0.051
 only-0.050
Token and
Token ID290
Feature activation+2.36e-03

Pos logit contributions
abella+0.029
top+0.025
 wrong+0.025
 versa+0.023
rant+0.022
Neg logit contributions
hr-0.011
cest-0.011
packs-0.010
 exc-0.010
rum-0.010
Token run
Token ID1057
Feature activation+1.14e-02

Pos logit contributions
abella+0.004
top+0.003
 wrong+0.003
bench+0.003
rant+0.003
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
 exc-0.003
Token around
Token ID1088
Feature activation+1.93e-02

Pos logit contributions
abella+0.015
top+0.013
 wrong+0.013
bench+0.010
rant+0.010
Neg logit contributions
hr-0.018
oc-0.018
rum-0.018
cest-0.018
AAAA-0.017
Token the
Token ID262
Feature activation+2.16e-02

Pos logit contributions
abella+0.037
top+0.032
 wrong+0.031
rant+0.028
bench+0.028
Neg logit contributions
hr-0.021
oc-0.020
rum-0.019
 mis-0.019
AAAA-0.018
Token yard
Token ID12699
Feature activation+7.55e-02

Pos logit contributions
abella+0.041
top+0.036
 wrong+0.035
 versa+0.034
bench+0.033
Neg logit contributions
hr-0.022
cest-0.021
oc-0.021
 tou-0.020
AAAA-0.020
Token.
Token ID13
Feature activation+2.14e-02

Pos logit contributions
abella+0.172
acey+0.141
 versa+0.137
 Dame+0.136
rant+0.135
Neg logit contributions
packs-0.056
 than-0.055
hr-0.055
oc-0.053
cest-0.052
Token One
Token ID1881
Feature activation-9.70e-03

Pos logit contributions
abella+0.030
 wrong+0.028
top+0.027
rant+0.023
work+0.022
Neg logit contributions
oc-0.032
rum-0.030
packs-0.030
AAAA-0.030
hr-0.029
Token day
Token ID1110
Feature activation-1.98e-03

Pos logit contributions
hr+0.021
rum+0.021
AAAA+0.020
cest+0.020
ment+0.020
Neg logit contributions
abella-0.007
 wrong-0.007
top-0.006
 weekend-0.004
bench-0.004
Token,
Token ID11
Feature activation-5.95e-03

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
top-0.003
 wrong-0.003
bench-0.002
rant-0.002
Token Max
Token ID5436
Feature activation-1.91e-02

Pos logit contributions
hr+0.012
rum+0.012
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.004
bench-0.003
 Luke-0.003
Token of
Token ID286
Feature activation+3.36e-02

Pos logit contributions
abella+0.081
 versa+0.076
top+0.075
 wrong+0.075
 Dame+0.074
Neg logit contributions
ment-0.019
hr-0.018
oc-0.016
iring-0.015
Upon-0.015
Token the
Token ID262
Feature activation+2.43e-02

Pos logit contributions
abella+0.062
top+0.060
 wrong+0.059
bench+0.053
 versa+0.052
Neg logit contributions
hr-0.036
rum-0.033
ment-0.031
oc-0.030
 amaz-0.027
Token story
Token ID1621
Feature activation+1.81e-02

Pos logit contributions
abella+0.039
top+0.039
 wrong+0.035
bench+0.032
rant+0.032
Neg logit contributions
hr-0.033
 exc-0.029
 mis-0.028
rum-0.028
AAAA-0.028
Token:
Token ID25
Feature activation+4.37e-02

Pos logit contributions
abella+0.028
top+0.026
 wrong+0.024
 versa+0.022
rant+0.022
Neg logit contributions
rum-0.025
hr-0.024
packs-0.023
ment-0.023
oc-0.023
Token kids
Token ID3988
Feature activation+5.76e-02

Pos logit contributions
abella+0.074
 wrong+0.063
top+0.062
bench+0.061
 versa+0.056
Neg logit contributions
hr-0.059
rum-0.057
 exc-0.055
oc-0.054
ment-0.051
Token should
Token ID815
Feature activation+7.50e-02

Pos logit contributions
abella+0.091
top+0.075
bench+0.068
 versa+0.065
side+0.065
Neg logit contributions
rum-0.080
 exc-0.078
hr-0.077
cest-0.075
 mis-0.074
Token never
Token ID1239
Feature activation+1.08e-02

Pos logit contributions
abella+0.123
top+0.103
 projector+0.097
rant+0.095
 wrong+0.093
Neg logit contributions
 exc-0.106
rum-0.101
 mis-0.093
hr-0.089
 fear-0.087
Token try
Token ID1949
Feature activation-3.59e-03

Pos logit contributions
abella+0.019
top+0.017
 wrong+0.017
bench+0.015
rant+0.015
Neg logit contributions
hr-0.013
rum-0.012
cest-0.012
 exc-0.012
AAAA-0.012
Token to
Token ID284
Feature activation+1.30e-02

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token use
Token ID779
Feature activation-3.38e-03

Pos logit contributions
abella+0.022
 wrong+0.019
top+0.019
bench+0.017
 weekend+0.017
Neg logit contributions
hr-0.018
rum-0.017
cest-0.017
 exc-0.016
oc-0.016
Token grown
Token ID7334
Feature activation-8.77e-03

Pos logit contributions
hr+0.005
rum+0.005
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token the
Token ID262
Feature activation+2.70e-02

Pos logit contributions
abella+0.014
top+0.014
 wrong+0.013
bench+0.012
 versa+0.011
Neg logit contributions
hr-0.010
rum-0.009
cest-0.009
AAAA-0.009
oc-0.009
Token story
Token ID1621
Feature activation+1.43e-02

Pos logit contributions
top+0.048
abella+0.047
rant+0.041
 wrong+0.040
bench+0.039
Neg logit contributions
hr-0.032
 tou-0.031
 exc-0.030
AAAA-0.030
 mis-0.028
Token is
Token ID318
Feature activation+2.58e-02

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.019
 versa+0.017
rant+0.017
Neg logit contributions
rum-0.020
hr-0.019
packs-0.019
ment-0.019
cest-0.019
Token:
Token ID25
Feature activation+2.53e-02

Pos logit contributions
abella+0.038
top+0.035
bench+0.033
 versa+0.031
rant+0.030
Neg logit contributions
hr-0.036
iring-0.035
 tou-0.034
rum-0.034
 exc-0.034
Token You
Token ID921
Feature activation+4.32e-03

Pos logit contributions
abella+0.042
top+0.037
 wrong+0.035
bench+0.034
rant+0.032
Neg logit contributions
hr-0.035
 exc-0.033
rum-0.033
AAAA-0.032
oc-0.032
Token should
Token ID815
Feature activation+7.42e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
AAAA-0.006
 exc-0.006
Token not
Token ID407
Feature activation+4.18e-02

Pos logit contributions
abella+0.139
top+0.118
rant+0.111
 projector+0.111
bench+0.101
Neg logit contributions
 exc-0.095
rum-0.086
 tou-0.075
 mis-0.074
cest-0.069
Token be
Token ID307
Feature activation-1.07e-02

Pos logit contributions
abella+0.083
top+0.067
 wrong+0.066
bench+0.062
 projector+0.061
Neg logit contributions
 exc-0.048
hr-0.047
rum-0.046
oc-0.045
 tou-0.044
Token ignorant
Token ID20109
Feature activation+2.37e-03

Pos logit contributions
hr+0.014
rum+0.014
AAAA+0.013
cest+0.013
oc+0.013
Neg logit contributions
abella-0.017
 wrong-0.017
top-0.016
bench-0.013
 Luke-0.013
Token and
Token ID290
Feature activation+1.28e-02

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
ment-0.003
Token selfish
Token ID20363
Feature activation-9.48e-03

Pos logit contributions
abella+0.022
top+0.020
 wrong+0.019
bench+0.017
 versa+0.017
Neg logit contributions
hr-0.017
rum-0.015
 exc-0.015
cest-0.015
AAAA-0.014
Token and
Token ID290
Feature activation+1.86e-02

Pos logit contributions
abella+0.060
 wrong+0.051
top+0.051
bench+0.047
 versa+0.046
Neg logit contributions
hr-0.032
cest-0.030
oc-0.029
rum-0.029
packs-0.028
Token looking
Token ID2045
Feature activation-2.43e-02

Pos logit contributions
abella+0.032
top+0.028
 wrong+0.027
bench+0.026
rant+0.024
Neg logit contributions
hr-0.025
cest-0.022
 exc-0.022
AAAA-0.022
rum-0.022
Token for
Token ID329
Feature activation-2.32e-02

Pos logit contributions
hr+0.049
rum+0.048
AAAA+0.047
cest+0.046
ment+0.045
Neg logit contributions
 wrong-0.023
abella-0.021
top-0.019
 Luke-0.013
 weekend-0.013
Token a
Token ID257
Feature activation+2.81e-03

Pos logit contributions
hr+0.034
rum+0.034
AAAA+0.032
cest+0.032
packs+0.031
Neg logit contributions
 wrong-0.033
abella-0.032
top-0.030
 Luke-0.025
 versa-0.024
Token carrot
Token ID40562
Feature activation+3.13e-02

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.005
bench+0.004
rant+0.004
Neg logit contributions
hr-0.003
cest-0.003
rum-0.003
AAAA-0.003
oc-0.003
Token trail
Token ID8025
Feature activation+7.39e-02

Pos logit contributions
abella+0.059
top+0.051
 wrong+0.049
bench+0.045
rant+0.044
Neg logit contributions
cest-0.036
hr-0.034
packs-0.034
rum-0.034
oc-0.034
Token,"
Token ID553
Feature activation+1.31e-02

Pos logit contributions
abella+0.154
top+0.121
 Dame+0.118
acey+0.117
 versa+0.116
Neg logit contributions
 than-0.072
cest-0.070
packs-0.065
 from-0.061
 behind-0.060
Token replied
Token ID8712
Feature activation+3.12e-02

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
bench+0.014
rant+0.014
Neg logit contributions
hr-0.020
oc-0.019
AAAA-0.019
rum-0.019
cest-0.018
Token Benny
Token ID44275
Feature activation+1.52e-02

Pos logit contributions
abella+0.051
top+0.047
 wrong+0.044
bench+0.043
rant+0.039
Neg logit contributions
hr-0.039
 mis-0.037
rum-0.036
AAAA-0.036
 exc-0.035
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
abella+0.027
 wrong+0.024
top+0.024
 weekend+0.021
bench+0.021
Neg logit contributions
hr-0.019
rum-0.018
cest-0.017
oc-0.017
ment-0.017
Token\n
Token ID198
Feature activation+1.24e-02

Pos logit contributions
abella+0.020
 wrong+0.018
top+0.018
bench+0.016
 versa+0.015
Neg logit contributions
hr-0.017
rum-0.016
AAAA-0.016
oc-0.016
 exc-0.016
Token Every
Token ID3887
Feature activation+4.12e-02

Pos logit contributions
abella+0.017
 wrong+0.016
top+0.016
bench+0.013
 versa+0.013
Neg logit contributions
hr-0.016
cest-0.015
AAAA-0.015
rum-0.015
packs-0.015
Token day
Token ID1110
Feature activation+1.02e-02

Pos logit contributions
abella+0.067
top+0.056
 versa+0.055
 wrong+0.054
 Dame+0.051
Neg logit contributions
AAAA-0.053
oc-0.052
ment-0.051
packs-0.050
 exc-0.050
Token they
Token ID484
Feature activation+3.06e-02

Pos logit contributions
abella+0.016
 wrong+0.015
top+0.014
bench+0.013
rant+0.012
Neg logit contributions
hr-0.014
cest-0.014
AAAA-0.014
rum-0.013
ment-0.013
Token played
Token ID2826
Feature activation+3.64e-03

Pos logit contributions
abella+0.037
 wrong+0.032
top+0.032
bench+0.027
gh+0.024
Neg logit contributions
 exc-0.053
packs-0.051
cest-0.050
hr-0.049
 mis-0.049
Token together
Token ID1978
Feature activation+2.47e-02

Pos logit contributions
abella+0.005
top+0.005
 wrong+0.004
bench+0.004
rant+0.004
Neg logit contributions
hr-0.006
rum-0.006
cest-0.005
AAAA-0.005
oc-0.005
Token and
Token ID290
Feature activation+7.39e-02

Pos logit contributions
abella+0.049
top+0.043
 wrong+0.042
bench+0.039
 versa+0.039
Neg logit contributions
hr-0.023
rum-0.022
packs-0.022
cest-0.022
oc-0.022
Token joked
Token ID27582
Feature activation+2.01e-02

Pos logit contributions
abella+0.132
 wrong+0.127
top+0.126
bench+0.121
inner+0.121
Neg logit contributions
cest-0.076
 exc-0.071
packs-0.067
 squirrel-0.065
ies-0.063
Token around
Token ID1088
Feature activation+4.83e-03

Pos logit contributions
abella+0.033
top+0.031
 wrong+0.029
bench+0.028
rant+0.026
Neg logit contributions
hr-0.025
rum-0.024
cest-0.023
ment-0.022
oc-0.022
Token.
Token ID13
Feature activation+3.31e-02

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
oc-0.006
AAAA-0.006
Token But
Token ID887
Feature activation+2.71e-02

Pos logit contributions
abella+0.055
 wrong+0.053
top+0.049
bench+0.045
 versa+0.044
Neg logit contributions
AAAA-0.040
oc-0.039
packs-0.038
rum-0.037
cest-0.037
Token one
Token ID530
Feature activation-4.72e-03

Pos logit contributions
abella+0.033
top+0.029
 wrong+0.028
bench+0.027
rant+0.024
Neg logit contributions
hr-0.046
cest-0.045
rum-0.043
AAAA-0.043
packs-0.043
Token so
Token ID523
Feature activation-7.17e-03

Pos logit contributions
rum+0.066
ment+0.062
hr+0.061
AAAA+0.061
oc+0.061
Neg logit contributions
abella-0.041
 wrong-0.039
top-0.034
 Luke-0.031
 Jack-0.031
Token he
Token ID339
Feature activation+6.80e-04

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.011
cest+0.011
ment+0.011
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.006
 Luke-0.006
Token hid
Token ID24519
Feature activation-9.66e-03

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token a
Token ID257
Feature activation-5.68e-03

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
oc+0.010
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.013
rant-0.013
Token special
Token ID2041
Feature activation+8.79e-03

Pos logit contributions
hr+0.007
rum+0.007
cest+0.007
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.008
 versa-0.007
Token camera
Token ID4676
Feature activation+7.37e-02

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.014
bench+0.012
rant+0.012
Neg logit contributions
hr-0.011
rum-0.010
cest-0.010
AAAA-0.010
oc-0.009
Token.
Token ID13
Feature activation+6.38e-03

Pos logit contributions
abella+0.163
 Dame+0.131
top+0.131
liv+0.123
V+0.123
Neg logit contributions
 behind-0.053
 from-0.051
cest-0.047
packs-0.045
iring-0.045
Token The
Token ID383
Feature activation-4.88e-03

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.008
rum-0.008
oc-0.007
AAAA-0.007
cest-0.007
Token bear
Token ID6842
Feature activation-7.75e-03

Pos logit contributions
hr+0.006
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
 versa-0.007
Token searched
Token ID16499
Feature activation-1.04e-02

Pos logit contributions
hr+0.012
rum+0.012
cest+0.011
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.008
 versa-0.008
Token high
Token ID1029
Feature activation+2.54e-02

Pos logit contributions
hr+0.016
rum+0.016
cest+0.015
AAAA+0.015
ment+0.015
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
rant-0.011
Token and
Token ID290
Feature activation+6.78e-03

Pos logit contributions
hr+0.008
rum+0.007
cest+0.007
AAAA+0.007
oc+0.007
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.009
 versa-0.008
Token your
Token ID534
Feature activation-1.33e-02

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.009
rant+0.008
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
ment-0.008
AAAA-0.008
Token mind
Token ID2000
Feature activation+6.17e-03

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.016
oc+0.015
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.020
bench-0.017
 weekend-0.016
Token.
Token ID13
Feature activation+7.02e-03

Pos logit contributions
abella+0.009
top+0.008
 wrong+0.008
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
ment-0.008
packs-0.008
Token You
Token ID921
Feature activation+1.55e-02

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.010
rum-0.010
AAAA-0.009
cest-0.009
oc-0.009
Token should
Token ID815
Feature activation+7.36e-02

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.021
bench+0.018
rant+0.018
Neg logit contributions
hr-0.022
 exc-0.021
rum-0.021
oc-0.021
AAAA-0.020
Token not
Token ID407
Feature activation+4.01e-02

Pos logit contributions
abella+0.130
top+0.113
rant+0.103
 projector+0.103
 wrong+0.102
Neg logit contributions
 exc-0.097
rum-0.090
 mis-0.083
 tou-0.079
hr-0.078
Token play
Token ID711
Feature activation-9.49e-03

Pos logit contributions
abella+0.078
top+0.067
 wrong+0.067
bench+0.062
 projector+0.061
Neg logit contributions
hr-0.044
 exc-0.043
rum-0.042
 mis-0.040
 tou-0.040
Token too
Token ID1165
Feature activation-5.16e-03

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
ment+0.011
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.012
 Luke-0.011
Token much
Token ID881
Feature activation-7.19e-03

Pos logit contributions
hr+0.009
rum+0.009
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.004
Token or
Token ID393
Feature activation+8.77e-03

Pos logit contributions
hr+0.010
rum+0.009
cest+0.009
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.011
bench-0.009
 versa-0.008
Token said
Token ID531
Feature activation+1.70e-02

Pos logit contributions
abella+0.036
top+0.035
 wrong+0.034
bench+0.030
gh+0.030
Neg logit contributions
 exc-0.027
hr-0.026
cest-0.025
 tou-0.024
AAAA-0.024
Token that
Token ID326
Feature activation+2.72e-02

Pos logit contributions
abella+0.023
top+0.022
 wrong+0.019
bench+0.019
rant+0.018
Neg logit contributions
hr-0.026
ment-0.026
 exc-0.025
rum-0.024
AAAA-0.024
Token Lily
Token ID20037
Feature activation+4.81e-03

Pos logit contributions
abella+0.039
top+0.038
 wrong+0.034
rant+0.033
bench+0.032
Neg logit contributions
hr-0.039
 exc-0.038
ment-0.036
cest-0.036
packs-0.036
Token and
Token ID290
Feature activation+2.03e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.007
rum-0.007
cest-0.007
AAAA-0.006
oc-0.006
Token Ben
Token ID3932
Feature activation+4.63e-03

Pos logit contributions
abella+0.022
top+0.021
 wrong+0.020
bench+0.017
rant+0.016
Neg logit contributions
hr-0.035
packs-0.033
cest-0.033
ment-0.033
 mis-0.033
Token should
Token ID815
Feature activation+7.35e-02

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
rant+0.004
Neg logit contributions
hr-0.007
rum-0.007
cest-0.007
AAAA-0.007
 exc-0.007
Token say
Token ID910
Feature activation+1.82e-02

Pos logit contributions
abella+0.144
top+0.130
 projector+0.117
rant+0.117
bench+0.114
Neg logit contributions
 exc-0.082
 tou-0.065
rum-0.065
 mis-0.065
 fear-0.062
Token sorry
Token ID7926
Feature activation+5.41e-03

Pos logit contributions
abella+0.031
top+0.030
bench+0.026
 wrong+0.026
rant+0.025
Neg logit contributions
hr-0.021
rum-0.020
ment-0.020
 exc-0.020
AAAA-0.019
Token and
Token ID290
Feature activation+2.43e-02

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.009
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
ment-0.006
oc-0.005
Token thank
Token ID5875
Feature activation+1.52e-02

Pos logit contributions
abella+0.042
top+0.037
 wrong+0.034
rant+0.033
bench+0.032
Neg logit contributions
hr-0.031
 exc-0.030
rum-0.028
ment-0.028
cest-0.028
Token you
Token ID345
Feature activation-2.19e-03

Pos logit contributions
abella+0.022
top+0.021
 wrong+0.020
bench+0.019
rant+0.017
Neg logit contributions
hr-0.022
rum-0.020
ment-0.020
 exc-0.019
AAAA-0.019
Token hug
Token ID16225
Feature activation-8.61e-03

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token sound
Token ID2128
Feature activation-2.21e-03

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.012
cest+0.011
 exc+0.011
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
rant-0.010
Token and
Token ID290
Feature activation-6.47e-03

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.003
 versa-0.003
Token wraps
Token ID27521
Feature activation+4.41e-02

Pos logit contributions
hr+0.010
rum+0.009
cest+0.009
AAAA+0.009
oc+0.009
Neg logit contributions
abella-0.009
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token his
Token ID465
Feature activation-1.76e-02

Pos logit contributions
abella+0.083
 wrong+0.074
top+0.073
 versa+0.073
bench+0.071
Neg logit contributions
ment-0.047
rum-0.040
oc-0.038
cest-0.037
packs-0.037
Token arms
Token ID5101
Feature activation+7.34e-02

Pos logit contributions
hr+0.023
cest+0.021
rum+0.021
AAAA+0.020
oc+0.020
Neg logit contributions
 wrong-0.029
abella-0.028
top-0.027
bench-0.023
 Luke-0.022
Token around
Token ID1088
Feature activation+1.76e-02

Pos logit contributions
abella+0.171
 versa+0.141
 halftime+0.136
 Dame+0.135
 cinema+0.135
Neg logit contributions
 than-0.051
 from-0.050
packs-0.048
rum-0.045
 as-0.045
Token the
Token ID262
Feature activation+9.70e-03

Pos logit contributions
abella+0.033
top+0.031
 wrong+0.030
bench+0.027
 versa+0.026
Neg logit contributions
hr-0.018
rum-0.017
ment-0.017
oc-0.016
cest-0.016
Token paper
Token ID3348
Feature activation+2.04e-02

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.013
bench+0.011
 versa+0.011
Neg logit contributions
hr-0.013
cest-0.013
rum-0.013
AAAA-0.012
oc-0.012
Token.
Token ID13
Feature activation+6.35e-03

Pos logit contributions
abella+0.037
 wrong+0.032
top+0.032
 versa+0.028
 weekend+0.027
Neg logit contributions
hr-0.024
rum-0.024
cest-0.023
packs-0.023
ment-0.022
Token
Token ID198
Feature activation+1.75e-02

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.007
ment-0.007
cest-0.007
Token hug
Token ID16225
Feature activation-6.48e-04

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
 versa+0.004
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
oc-0.003
Token sound
Token ID2128
Feature activation+2.23e-05

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token and
Token ID290
Feature activation-8.86e-03

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token wraps
Token ID27521
Feature activation+4.62e-02

Pos logit contributions
hr+0.013
rum+0.013
cest+0.013
AAAA+0.012
oc+0.012
Neg logit contributions
abella-0.012
 wrong-0.012
top-0.011
bench-0.009
 Luke-0.009
Token her
Token ID607
Feature activation-1.59e-02

Pos logit contributions
abella+0.085
 wrong+0.077
top+0.075
 versa+0.075
bench+0.073
Neg logit contributions
ment-0.048
rum-0.040
packs-0.040
hr-0.040
oc-0.038
Token arms
Token ID5101
Feature activation+7.34e-02

Pos logit contributions
hr+0.021
cest+0.020
rum+0.020
oc+0.019
AAAA+0.019
Neg logit contributions
 wrong-0.026
abella-0.026
top-0.024
bench-0.021
 Luke-0.020
Token around
Token ID1088
Feature activation+1.95e-02

Pos logit contributions
abella+0.162
 versa+0.138
 cinema+0.134
 halftime+0.133
scar+0.132
Neg logit contributions
 than-0.055
 from-0.055
packs-0.053
 as-0.048
 behind-0.047
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
abella+0.037
top+0.035
 wrong+0.035
bench+0.031
 versa+0.030
Neg logit contributions
hr-0.019
rum-0.018
ment-0.017
packs-0.016
cest-0.016
Token paper
Token ID3348
Feature activation+1.93e-02

Pos logit contributions
abella+0.021
top+0.019
 wrong+0.017
bench+0.015
 versa+0.015
Neg logit contributions
hr-0.019
cest-0.018
rum-0.018
AAAA-0.017
 exc-0.017
Token.
Token ID13
Feature activation+2.19e-03

Pos logit contributions
abella+0.034
top+0.030
 wrong+0.030
 versa+0.026
 weekend+0.026
Neg logit contributions
hr-0.022
rum-0.022
cest-0.022
packs-0.021
ment-0.021
Token
Token ID198
Feature activation+1.94e-03

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.003
rant+0.003
Neg logit contributions
hr-0.003
rum-0.003
AAAA-0.003
cest-0.003
ment-0.003
Token an
Token ID281
Feature activation-9.31e-03

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.020
bench+0.019
 versa+0.018
Neg logit contributions
hr-0.029
rum-0.028
ment-0.027
cest-0.027
 exc-0.026
Token important
Token ID1593
Feature activation+3.49e-02

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.014
packs+0.014
Neg logit contributions
abella-0.012
 wrong-0.011
top-0.011
bench-0.009
 weekend-0.008
Token lesson
Token ID11483
Feature activation+7.97e-03

Pos logit contributions
abella+0.069
top+0.063
rant+0.063
 wrong+0.058
 versa+0.057
Neg logit contributions
rum-0.035
oc-0.033
hr-0.032
AAAA-0.032
 exc-0.031
Token -
Token ID532
Feature activation+2.27e-02

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.012
rum-0.012
cest-0.012
ment-0.012
oc-0.011
Token we
Token ID356
Feature activation+2.70e-02

Pos logit contributions
abella+0.033
top+0.029
 wrong+0.029
bench+0.026
rant+0.025
Neg logit contributions
hr-0.034
rum-0.033
 exc-0.032
cest-0.032
ment-0.031
Token should
Token ID815
Feature activation+7.33e-02

Pos logit contributions
abella+0.043
top+0.039
 wrong+0.037
bench+0.033
rant+0.032
Neg logit contributions
hr-0.036
rum-0.035
 exc-0.034
cest-0.033
oc-0.033
Token respect
Token ID2461
Feature activation-1.11e-02

Pos logit contributions
abella+0.125
top+0.106
 projector+0.100
bench+0.098
rant+0.097
Neg logit contributions
 exc-0.097
rum-0.089
hr-0.088
 mis-0.087
 tou-0.082
Token the
Token ID262
Feature activation-5.71e-04

Pos logit contributions
hr+0.016
rum+0.015
AAAA+0.015
cest+0.015
ment+0.014
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
bench-0.013
 Luke-0.012
Token opinion
Token ID4459
Feature activation-2.32e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
rant-0.001
Token of
Token ID286
Feature activation-3.40e-03

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token others
Token ID1854
Feature activation-2.02e-02

Pos logit contributions
hr+0.005
rum+0.005
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token.
Token ID13
Feature activation+9.50e-03

Pos logit contributions
abella+0.017
 wrong+0.015
top+0.015
bench+0.013
 versa+0.013
Neg logit contributions
hr-0.011
rum-0.011
cest-0.011
packs-0.011
oc-0.010
Token They
Token ID1119
Feature activation+1.66e-02

Pos logit contributions
abella+0.014
 wrong+0.013
top+0.013
bench+0.010
 versa+0.010
Neg logit contributions
hr-0.013
rum-0.013
packs-0.013
cest-0.013
AAAA-0.012
Token always
Token ID1464
Feature activation+3.70e-02

Pos logit contributions
abella+0.027
top+0.025
 wrong+0.025
bench+0.022
 versa+0.020
Neg logit contributions
hr-0.021
 exc-0.021
cest-0.021
rum-0.020
AAAA-0.020
Token played
Token ID2826
Feature activation+1.11e-03

Pos logit contributions
abella+0.065
 wrong+0.058
top+0.058
bench+0.050
side+0.049
Neg logit contributions
 exc-0.045
cest-0.045
hr-0.044
AAAA-0.042
packs-0.041
Token together
Token ID1978
Feature activation+2.55e-02

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
oc-0.002
Token and
Token ID290
Feature activation+7.32e-02

Pos logit contributions
abella+0.047
top+0.042
 wrong+0.041
bench+0.038
 versa+0.037
Neg logit contributions
hr-0.027
rum-0.025
cest-0.025
oc-0.025
 exc-0.024
Token had
Token ID550
Feature activation+2.88e-02

Pos logit contributions
abella+0.138
top+0.131
 wrong+0.128
inner+0.122
bench+0.119
Neg logit contributions
 exc-0.069
cest-0.069
hr-0.060
oc-0.059
packs-0.057
Token lots
Token ID6041
Feature activation-2.65e-02

Pos logit contributions
abella+0.049
top+0.045
 wrong+0.042
rant+0.042
bench+0.041
Neg logit contributions
hr-0.035
cest-0.032
packs-0.030
AAAA-0.030
Upon-0.030
Token of
Token ID286
Feature activation-2.95e-03

Pos logit contributions
AAAA+0.038
hr+0.038
rum+0.038
cest+0.035
 tou+0.033
Neg logit contributions
 wrong-0.038
abella-0.037
top-0.037
 contest-0.030
bench-0.030
Token fun
Token ID1257
Feature activation-8.90e-03

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
 versa-0.002
Token.
Token ID13
Feature activation+2.89e-02

Pos logit contributions
hr+0.013
rum+0.013
AAAA+0.013
cest+0.013
ment+0.012
Neg logit contributions
abella-0.012
 wrong-0.012
top-0.012
bench-0.010
 versa-0.009
Token cricket
Token ID18836
Feature activation+6.67e-03

Pos logit contributions
abella+0.031
 wrong+0.028
top+0.028
bench+0.025
 versa+0.024
Neg logit contributions
hr-0.025
cest-0.023
oc-0.023
ment-0.023
rum-0.022
Token was
Token ID373
Feature activation+2.60e-02

Pos logit contributions
abella+0.010
 wrong+0.009
top+0.009
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.009
rum-0.009
cest-0.009
AAAA-0.009
oc-0.008
Token sitting
Token ID5586
Feature activation-1.30e-03

Pos logit contributions
abella+0.044
top+0.040
 wrong+0.038
 versa+0.035
bench+0.034
Neg logit contributions
hr-0.031
rum-0.030
 exc-0.030
cest-0.030
 mis-0.029
Token on
Token ID319
Feature activation+1.27e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
top-0.002
 wrong-0.002
bench-0.002
 versa-0.002
Token a
Token ID257
Feature activation+3.30e-02

Pos logit contributions
abella+0.021
top+0.019
 wrong+0.019
bench+0.017
 versa+0.016
Neg logit contributions
hr-0.016
rum-0.015
AAAA-0.015
cest-0.014
oc-0.014
Token leaf
Token ID12835
Feature activation+7.31e-02

Pos logit contributions
abella+0.063
 wrong+0.054
top+0.053
bench+0.050
 versa+0.049
Neg logit contributions
hr-0.036
cest-0.033
oc-0.032
 exc-0.032
 oak-0.031
Token outside
Token ID2354
Feature activation-1.89e-02

Pos logit contributions
abella+0.153
 Dame+0.116
 versa+0.116
top+0.115
 wrong+0.114
Neg logit contributions
hr-0.068
oc-0.064
packs-0.061
cest-0.060
iring-0.057
Token.
Token ID13
Feature activation+1.88e-02

Pos logit contributions
hr+0.028
AAAA+0.027
rum+0.027
cest+0.026
 tou+0.025
Neg logit contributions
top-0.026
abella-0.026
 wrong-0.026
 weekend-0.020
 Luke-0.020
Token Suddenly
Token ID24975
Feature activation+1.70e-02

Pos logit contributions
abella+0.032
 wrong+0.031
top+0.029
bench+0.026
 versa+0.025
Neg logit contributions
hr-0.022
rum-0.021
AAAA-0.020
Upon-0.020
oc-0.020
Token,
Token ID11
Feature activation+1.21e-02

Pos logit contributions
abella+0.029
top+0.027
 wrong+0.026
bench+0.024
rant+0.024
Neg logit contributions
hr-0.020
AAAA-0.020
ment-0.019
cest-0.019
rum-0.019
Token it
Token ID340
Feature activation+1.71e-03

Pos logit contributions
abella+0.019
top+0.017
 wrong+0.017
bench+0.015
rant+0.015
Neg logit contributions
hr-0.017
AAAA-0.016
cest-0.015
rum-0.015
ment-0.015
Token was
Token ID373
Feature activation+8.82e-04

Pos logit contributions
hr+0.042
AAAA+0.041
rum+0.041
oc+0.040
cest+0.039
Neg logit contributions
 wrong-0.041
top-0.038
abella-0.038
bench-0.031
 contest-0.030
Token all
Token ID477
Feature activation+3.57e-03

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token broken
Token ID5445
Feature activation-9.97e-03

Pos logit contributions
abella+0.006
top+0.005
 wrong+0.005
 versa+0.004
bench+0.004
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
 exc-0.004
Token.
Token ID13
Feature activation-5.11e-02

Pos logit contributions
hr+0.015
rum+0.014
AAAA+0.014
cest+0.014
oc+0.013
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.014
bench-0.011
rant-0.011
Token
Token ID198
Feature activation-5.27e-02

Pos logit contributions
cest+0.099
hr+0.097
rum+0.093
packs+0.093
 prey+0.087
Neg logit contributions
abella-0.054
top-0.052
 wrong-0.046
 However-0.045
 only-0.036
Token
Token ID198
Feature activation-8.73e-02

Pos logit contributions
cest+0.110
hr+0.109
packs+0.104
 amaz+0.103
ment+0.101
Neg logit contributions
abella-0.047
top-0.040
 wrong-0.035
 However-0.029
 never-0.028
TokenWil
Token ID22327
Feature activation-7.43e-03

Pos logit contributions
packs+0.145
 amaz+0.132
 prey+0.127
 discoveries+0.126
yards+0.126
Neg logit contributions
abella-0.110
top-0.106
 never-0.102
 However-0.094
 always-0.091
Tokenma
Token ID2611
Feature activation+5.99e-03

Pos logit contributions
hr+0.014
rum+0.014
cest+0.013
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.005
rant-0.005
Token felt
Token ID2936
Feature activation-1.11e-02

Pos logit contributions
abella+0.009
 wrong+0.009
top+0.009
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.008
cest-0.008
oc-0.008
Token a
Token ID257
Feature activation-2.64e-02

Pos logit contributions
hr+0.017
rum+0.016
cest+0.016
packs+0.015
AAAA+0.015
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.012
 versa-0.012
Token big
Token ID1263
Feature activation+1.71e-02

Pos logit contributions
hr+0.031
packs+0.030
rum+0.030
cest+0.028
ment+0.028
Neg logit contributions
 wrong-0.046
abella-0.044
top-0.043
bench-0.037
 only-0.036
Token suddenly
Token ID6451
Feature activation+3.07e-03

Pos logit contributions
abella+0.028
top+0.027
 wrong+0.025
bench+0.024
rant+0.022
Neg logit contributions
hr-0.028
AAAA-0.027
packs-0.026
 exc-0.026
cest-0.025
Token Lucy
Token ID22162
Feature activation-5.27e-03

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
rant+0.003
Neg logit contributions
hr-0.004
rum-0.004
AAAA-0.004
cest-0.004
ment-0.004
Token heard
Token ID2982
Feature activation-9.04e-04

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.007
 wrong-0.007
top-0.007
bench-0.005
 versa-0.005
Token a
Token ID257
Feature activation-2.76e-02

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token loud
Token ID7812
Feature activation-1.30e-02

Pos logit contributions
rum+0.038
packs+0.037
hr+0.036
ment+0.035
cest+0.035
Neg logit contributions
 wrong-0.043
abella-0.041
top-0.040
 contest-0.033
bench-0.032
Token "
Token ID366
Feature activation-8.59e-02

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
packs+0.014
AAAA+0.014
Neg logit contributions
abella-0.022
 wrong-0.021
top-0.021
bench-0.018
 versa-0.017
TokenSC
Token ID6173
Feature activation-4.95e-03

Pos logit contributions
 fortunate+0.138
 discoveries+0.135
rum+0.135
ment+0.135
cest+0.131
Neg logit contributions
abella-0.102
top-0.101
bo-0.093
V-0.090
t-0.086
TokenREAM
Token ID32235
Feature activation-9.77e-03

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.008
top-0.007
 wrong-0.007
bench-0.006
 versa-0.006
Token"!
Token ID40484
Feature activation-7.68e-04

Pos logit contributions
hr+0.012
rum+0.011
cest+0.011
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.013
 versa-0.013
Token She
Token ID1375
Feature activation-2.73e-02

Pos logit contributions
hr+0.001
rum+0.001
AAAA+0.001
cest+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
rant-0.001
Token felt
Token ID2936
Feature activation-2.04e-02

Pos logit contributions
rum+0.045
hr+0.044
cest+0.043
packs+0.042
ment+0.041
Neg logit contributions
abella-0.038
top-0.031
 wrong-0.031
 would-0.026
 Luke-0.025
Token.
Token ID13
Feature activation-8.53e-03

Pos logit contributions
abella+0.027
top+0.024
 wrong+0.023
 versa+0.022
bench+0.021
Neg logit contributions
hr-0.019
rum-0.018
cest-0.018
oc-0.017
 exc-0.017
Token It
Token ID632
Feature activation-8.18e-03

Pos logit contributions
hr+0.014
rum+0.013
cest+0.013
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.011
top-0.010
 wrong-0.010
bench-0.008
 Luke-0.008
Token was
Token ID373
Feature activation-1.17e-02

Pos logit contributions
hr+0.013
rum+0.013
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
bench-0.008
rant-0.008
Token surrounded
Token ID11191
Feature activation+8.26e-03

Pos logit contributions
hr+0.020
rum+0.020
cest+0.019
AAAA+0.019
oc+0.019
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.010
 weekend-0.010
Token by
Token ID416
Feature activation-5.32e-02

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.014
 versa+0.013
bench+0.013
Neg logit contributions
hr-0.009
rum-0.008
cest-0.008
packs-0.008
 exc-0.008
Token other
Token ID584
Feature activation-8.31e-02

Pos logit contributions
rum+0.085
AAAA+0.082
packs+0.081
cest+0.081
hr+0.080
Neg logit contributions
 wrong-0.064
abella-0.062
top-0.061
 never-0.053
 always-0.052
Token weak
Token ID4939
Feature activation-5.09e-02

Pos logit contributions
rum+0.124
AAAA+0.119
packs+0.118
ment+0.111
hr+0.110
Neg logit contributions
 wrong-0.114
top-0.107
abella-0.098
 only-0.097
 always-0.095
Token trees
Token ID7150
Feature activation+1.92e-05

Pos logit contributions
hr+0.067
AAAA+0.066
rum+0.063
packs+0.059
cest+0.058
Neg logit contributions
 wrong-0.080
top-0.075
abella-0.068
 night-0.061
 would-0.060
Token in
Token ID287
Feature activation-2.13e-02

Pos logit contributions
abella+0.000
 wrong+0.000
top+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
oc-0.000
Token the
Token ID262
Feature activation-8.94e-03

Pos logit contributions
hr+0.033
rum+0.032
cest+0.031
ment+0.031
AAAA+0.031
Neg logit contributions
 wrong-0.028
abella-0.027
top-0.027
bench-0.021
 weekend-0.021
Token forest
Token ID8222
Feature activation+2.25e-03

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.013
 wrong-0.013
top-0.013
bench-0.010
 weekend-0.010
Token.
Token ID13
Feature activation-3.94e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.007
Neg logit contributions
abella-0.008
 wrong-0.008
top-0.008
bench-0.006
 versa-0.006
Token
Token ID198
Feature activation-2.03e-02

Pos logit contributions
hr+0.069
cest+0.065
rum+0.063
AAAA+0.062
ment+0.060
Neg logit contributions
abella-0.048
top-0.048
 wrong-0.041
 However-0.040
 weekend-0.037
Token
Token ID198
Feature activation-5.68e-02

Pos logit contributions
hr+0.034
cest+0.033
rum+0.033
ment+0.032
AAAA+0.032
Neg logit contributions
abella-0.026
top-0.024
 wrong-0.022
bench-0.019
 weekend-0.019
TokenOne
Token ID3198
Feature activation-2.15e-02

Pos logit contributions
ment+0.121
hr+0.120
packs+0.116
cest+0.116
rum+0.116
Neg logit contributions
abella-0.043
top-0.042
 weekend-0.031
One-0.030
 wrong-0.029
Token day
Token ID1110
Feature activation-6.05e-02

Pos logit contributions
rum+0.051
AAAA+0.050
hr+0.050
cest+0.049
Upon+0.047
Neg logit contributions
abella-0.012
top-0.011
 wrong-0.011
 weekend-0.008
 morning-0.006
Token,
Token ID11
Feature activation-8.05e-02

Pos logit contributions
rum+0.125
hr+0.123
cest+0.118
 amaz+0.114
oc+0.112
Neg logit contributions
abella-0.055
top-0.048
 wrong-0.047
 one-0.037
 only-0.036
Token Wil
Token ID5187
Feature activation-3.51e-03

Pos logit contributions
rum+0.149
hr+0.137
AAAA+0.133
cest+0.132
smart+0.131
Neg logit contributions
abella-0.099
top-0.091
 only-0.087
 one-0.086
 always-0.083
Tokenma
Token ID2611
Feature activation-1.84e-02

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token tri
Token ID1333
Feature activation-3.85e-03

Pos logit contributions
hr+0.029
rum+0.029
AAAA+0.027
cest+0.027
ment+0.027
Neg logit contributions
abella-0.026
top-0.023
 wrong-0.023
bench-0.018
 Luke-0.018
Tokenpped
Token ID1496
Feature activation-2.20e-03

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token and
Token ID290
Feature activation-2.62e-02

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.002
Neg logit contributions
abella-0.004
 wrong-0.003
top-0.003
bench-0.003
 versa-0.003
Token,
Token ID11
Feature activation-3.24e-02

Pos logit contributions
hr+0.023
rum+0.022
AAAA+0.022
cest+0.022
oc+0.021
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.016
 Luke-0.013
rant-0.012
Token the
Token ID262
Feature activation-3.75e-02

Pos logit contributions
rum+0.062
hr+0.059
cest+0.057
AAAA+0.056
oc+0.055
Neg logit contributions
 wrong-0.034
abella-0.033
top-0.030
 Jack-0.028
 Luke-0.026
Token tree
Token ID5509
Feature activation-4.57e-03

Pos logit contributions
hr+0.066
rum+0.062
cest+0.061
packs+0.060
ment+0.060
Neg logit contributions
 wrong-0.051
top-0.041
 only-0.038
abella-0.036
 always-0.035
Token shook
Token ID14682
Feature activation-7.89e-03

Pos logit contributions
hr+0.007
rum+0.007
cest+0.007
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token and
Token ID290
Feature activation+2.85e-03

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
oc+0.008
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.014
bench-0.012
rant-0.011
Token an
Token ID281
Feature activation-7.90e-02

Pos logit contributions
abella+0.004
 wrong+0.004
top+0.004
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
ment-0.004
Token apple
Token ID17180
Feature activation-1.09e-03

Pos logit contributions
packs+0.121
hr+0.118
AAAA+0.110
ment+0.108
ults+0.102
Neg logit contributions
 wrong-0.108
top-0.097
abella-0.097
 weekend-0.083
bench-0.082
Token fell
Token ID3214
Feature activation-6.77e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.002
top-0.002
 wrong-0.002
bench-0.001
 versa-0.001
Token to
Token ID284
Feature activation-1.45e-02

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
oc+0.008
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.011
bench-0.009
rant-0.009
Token the
Token ID262
Feature activation-1.05e-02

Pos logit contributions
hr+0.022
rum+0.022
cest+0.021
ment+0.020
packs+0.020
Neg logit contributions
abella-0.020
 wrong-0.020
top-0.019
rant-0.016
 Luke-0.015
Token ground
Token ID2323
Feature activation+1.60e-02

Pos logit contributions
hr+0.016
rum+0.015
cest+0.015
AAAA+0.015
ment+0.014
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
 weekend-0.011
Token in
Token ID287
Feature activation-2.62e-02

Pos logit contributions
hr+0.005
rum+0.005
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token a
Token ID257
Feature activation-4.00e-03

Pos logit contributions
hr+0.050
rum+0.050
ment+0.048
cest+0.048
packs+0.048
Neg logit contributions
abella-0.025
 wrong-0.023
top-0.022
 Luke-0.017
 never-0.016
Token jungle
Token ID20712
Feature activation+1.31e-02

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token with
Token ID351
Feature activation-1.26e-02

Pos logit contributions
abella+0.026
top+0.023
 wrong+0.023
 versa+0.021
bench+0.021
Neg logit contributions
hr-0.014
oc-0.013
cest-0.012
rum-0.012
 exc-0.012
Token many
Token ID867
Feature activation-6.42e-02

Pos logit contributions
hr+0.016
rum+0.015
AAAA+0.015
cest+0.015
ment+0.014
Neg logit contributions
abella-0.021
 wrong-0.020
top-0.020
bench-0.016
 Luke-0.016
Token other
Token ID584
Feature activation-7.89e-02

Pos logit contributions
hr+0.114
rum+0.109
ment+0.107
AAAA+0.106
 amaz+0.099
Neg logit contributions
 wrong-0.076
abella-0.057
top-0.056
 often-0.055
 would-0.054
Token animals
Token ID4695
Feature activation-1.36e-02

Pos logit contributions
rum+0.125
ment+0.125
hr+0.123
 amaz+0.123
AAAA+0.113
Neg logit contributions
 wrong-0.097
 always-0.085
 would-0.080
 only-0.079
abella-0.079
Token like
Token ID588
Feature activation-2.18e-02

Pos logit contributions
hr+0.021
rum+0.020
AAAA+0.020
cest+0.020
ment+0.019
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.014
rant-0.013
Token lions
Token ID31420
Feature activation-3.01e-02

Pos logit contributions
hr+0.027
rum+0.027
cest+0.025
ment+0.025
AAAA+0.025
Neg logit contributions
 wrong-0.036
abella-0.036
top-0.033
 contest-0.029
 Luke-0.028
Token,
Token ID11
Feature activation-1.55e-02

Pos logit contributions
hr+0.051
rum+0.049
AAAA+0.048
ment+0.047
 amaz+0.047
Neg logit contributions
 wrong-0.037
abella-0.034
top-0.034
 contest-0.027
 would-0.026
Token tigers
Token ID43114
Feature activation-2.30e-02

Pos logit contributions
rum+0.019
hr+0.019
ment+0.018
cest+0.017
AAAA+0.017
Neg logit contributions
abella-0.027
 wrong-0.026
top-0.025
 Luke-0.021
 versa-0.021
Token,
Token ID11
Feature activation-9.93e-03

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token he
Token ID339
Feature activation+4.43e-03

Pos logit contributions
hr+0.017
rum+0.016
cest+0.016
AAAA+0.015
oc+0.015
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.009
 Luke-0.009
Token heard
Token ID2982
Feature activation+2.13e-03

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.006
cest-0.006
AAAA-0.006
rum-0.006
oc-0.006
Token a
Token ID257
Feature activation-2.20e-02

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
ment-0.003
Token loud
Token ID7812
Feature activation-9.00e-03

Pos logit contributions
rum+0.027
packs+0.027
hr+0.027
cest+0.025
ment+0.025
Neg logit contributions
 wrong-0.036
abella-0.036
top-0.034
bench-0.028
 contest-0.028
Token "
Token ID366
Feature activation-7.87e-02

Pos logit contributions
hr+0.010
rum+0.010
cest+0.010
packs+0.009
AAAA+0.009
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.013
 versa-0.012
TokenZ
Token ID57
Feature activation-2.56e-02

Pos logit contributions
rum+0.125
packs+0.122
cest+0.119
ment+0.119
 amaz+0.118
Neg logit contributions
top-0.096
abella-0.096
bo-0.088
t-0.082
 wrong-0.078
Tokenoom
Token ID4207
Feature activation-2.67e-02

Pos logit contributions
rum+0.045
cest+0.045
hr+0.044
 exc+0.043
oc+0.042
Neg logit contributions
abella-0.029
top-0.026
 wrong-0.025
 contest-0.019
gh-0.019
Token!"
Token ID2474
Feature activation+7.62e-03

Pos logit contributions
hr+0.036
rum+0.034
packs+0.034
ment+0.034
cest+0.033
Neg logit contributions
abella-0.041
 wrong-0.039
top-0.038
bench-0.031
 Luke-0.031
Token
Token ID198
Feature activation+1.95e-02

Pos logit contributions
abella+0.012
 wrong+0.011
top+0.011
bench+0.009
rant+0.009
Neg logit contributions
hr-0.010
AAAA-0.010
rum-0.010
ment-0.009
cest-0.009
Token
Token ID198
Feature activation-5.73e-03

Pos logit contributions
abella+0.033
top+0.033
 wrong+0.033
rant+0.029
work+0.029
Neg logit contributions
hr-0.022
AAAA-0.022
rum-0.021
oc-0.020
Upon-0.020
Token a
Token ID257
Feature activation+1.84e-02

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.003
rum-0.002
AAAA-0.002
cest-0.002
 exc-0.002
Token big
Token ID1263
Feature activation+9.50e-03

Pos logit contributions
abella+0.035
top+0.032
 wrong+0.031
bench+0.029
rant+0.028
Neg logit contributions
hr-0.020
cest-0.018
 exc-0.018
oc-0.018
AAAA-0.018
Token tree
Token ID5509
Feature activation+3.73e-02

Pos logit contributions
abella+0.017
top+0.016
 wrong+0.015
bench+0.014
 versa+0.013
Neg logit contributions
hr-0.011
cest-0.010
rum-0.010
oc-0.010
AAAA-0.010
Token with
Token ID351
Feature activation-1.17e-02

Pos logit contributions
abella+0.082
top+0.066
 versa+0.066
 wrong+0.065
 Dame+0.064
Neg logit contributions
hr-0.034
oc-0.032
 exc-0.032
packs-0.031
cest-0.030
Token many
Token ID867
Feature activation-6.39e-02

Pos logit contributions
hr+0.021
rum+0.020
cest+0.020
AAAA+0.020
oc+0.019
Neg logit contributions
abella-0.013
 wrong-0.013
top-0.012
bench-0.009
 weekend-0.009
Token other
Token ID584
Feature activation-7.80e-02

Pos logit contributions
hr+0.117
AAAA+0.109
rum+0.108
cess+0.106
ment+0.105
Neg logit contributions
 wrong-0.079
top-0.057
 only-0.052
abella-0.052
 always-0.052
Token cher
Token ID21382
Feature activation-2.27e-02

Pos logit contributions
rum+0.126
hr+0.125
ment+0.117
cest+0.116
AAAA+0.116
Neg logit contributions
 wrong-0.106
 only-0.087
top-0.085
 always-0.083
 one-0.082
Tokenries
Token ID1678
Feature activation-1.22e-02

Pos logit contributions
hr+0.011
rum+0.011
AAAA+0.011
cest+0.010
oc+0.009
Neg logit contributions
abella-0.053
 wrong-0.053
top-0.052
bench-0.046
rant-0.044
Token.
Token ID13
Feature activation+1.99e-02

Pos logit contributions
hr+0.019
rum+0.019
AAAA+0.018
cest+0.018
oc+0.017
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.016
bench-0.013
rant-0.012
Token The
Token ID383
Feature activation-3.98e-02

Pos logit contributions
abella+0.032
 wrong+0.029
top+0.028
bench+0.024
rant+0.024
Neg logit contributions
AAAA-0.026
hr-0.026
oc-0.025
rum-0.025
 exc-0.025
Token smooth
Token ID7209
Feature activation-3.67e-02

Pos logit contributions
rum+0.066
hr+0.063
ment+0.061
cest+0.060
packs+0.060
Neg logit contributions
 wrong-0.060
top-0.049
 only-0.047
abella-0.047
 contest-0.046
TokenHelp
Token ID22087
Feature activation-7.99e-03

Pos logit contributions
rum+0.050
hr+0.048
packs+0.046
ment+0.046
cest+0.046
Neg logit contributions
abella-0.043
top-0.039
 wrong-0.036
V-0.033
 Luke-0.032
Token!"
Token ID2474
Feature activation+3.25e-03

Pos logit contributions
hr+0.011
rum+0.010
cest+0.010
AAAA+0.010
 exc+0.010
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
 Luke-0.010
Token Tom
Token ID4186
Feature activation-2.32e-02

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.005
rum-0.005
AAAA-0.005
cest-0.005
ment-0.005
Token screamed
Token ID25421
Feature activation-2.18e-04

Pos logit contributions
hr+0.032
cest+0.029
rum+0.029
AAAA+0.028
ment+0.028
Neg logit contributions
abella-0.036
 wrong-0.035
top-0.034
 Luke-0.029
bench-0.028
Token.
Token ID13
Feature activation-2.59e-02

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
ment+0.000
Neg logit contributions
abella-0.000
top-0.000
 wrong-0.000
bench-0.000
 versa-0.000
Token "
Token ID366
Feature activation-7.80e-02

Pos logit contributions
hr+0.044
cest+0.043
packs+0.040
rum+0.040
 fortunate+0.039
Neg logit contributions
abella-0.032
 wrong-0.031
top-0.030
 Luke-0.025
bench-0.024
TokenThere
Token ID1858
Feature activation-1.39e-02

Pos logit contributions
 discoveries+0.148
yt+0.146
ment+0.145
 fortunate+0.143
hr+0.139
Neg logit contributions
top-0.069
bo-0.068
abella-0.065
 never-0.060
V-0.060
Token's
Token ID338
Feature activation-5.34e-03

Pos logit contributions
hr+0.025
rum+0.024
cest+0.023
AAAA+0.023
ment+0.023
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.011
 Luke-0.011
Token a
Token ID257
Feature activation+1.23e-02

Pos logit contributions
hr+0.009
rum+0.009
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
rant-0.005
Token lion
Token ID18744
Feature activation-3.44e-02

Pos logit contributions
abella+0.023
top+0.022
 wrong+0.020
bench+0.019
 versa+0.018
Neg logit contributions
hr-0.013
cest-0.013
AAAA-0.013
oc-0.012
rum-0.012
Token in
Token ID287
Feature activation-3.39e-02

Pos logit contributions
hr+0.049
ment+0.047
rum+0.046
AAAA+0.045
iring+0.045
Neg logit contributions
 wrong-0.051
abella-0.048
top-0.046
 contest-0.039
 Luke-0.038
Token can
Token ID460
Feature activation+3.97e-02

Pos logit contributions
hr+0.017
rum+0.016
cest+0.016
ment+0.016
AAAA+0.016
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.012
bench-0.010
 versa-0.010
Token't
Token ID470
Feature activation+2.48e-02

Pos logit contributions
abella+0.066
 wrong+0.058
top+0.058
bench+0.055
rant+0.054
Neg logit contributions
 exc-0.050
hr-0.048
 mis-0.047
 tou-0.046
rum-0.045
Token just
Token ID655
Feature activation-1.31e-02

Pos logit contributions
abella+0.045
top+0.039
 wrong+0.039
bench+0.035
rant+0.034
Neg logit contributions
hr-0.030
 exc-0.028
rum-0.028
oc-0.027
cest-0.027
Token play
Token ID711
Feature activation-2.30e-02

Pos logit contributions
hr+0.020
rum+0.019
cest+0.019
AAAA+0.018
oc+0.018
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.017
bench-0.014
 versa-0.013
Token with
Token ID351
Feature activation-1.39e-02

Pos logit contributions
hr+0.035
rum+0.033
cest+0.033
ment+0.033
AAAA+0.032
Neg logit contributions
 wrong-0.034
abella-0.031
top-0.030
 contest-0.025
bench-0.024
Token other
Token ID584
Feature activation-7.79e-02

Pos logit contributions
hr+0.022
rum+0.022
cest+0.021
AAAA+0.021
oc+0.021
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
 versa-0.013
bench-0.013
Token people
Token ID661
Feature activation-8.79e-03

Pos logit contributions
rum+0.108
AAAA+0.107
ment+0.104
hr+0.104
iring+0.100
Neg logit contributions
 wrong-0.114
top-0.087
 will-0.086
 one-0.085
abella-0.084
Token's
Token ID338
Feature activation-4.35e-02

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.011
cest+0.011
ment+0.011
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.011
 versa-0.010
Token things
Token ID1243
Feature activation+1.40e-02

Pos logit contributions
AAAA+0.048
hr+0.048
rum+0.047
Upon+0.046
cest+0.045
Neg logit contributions
 wrong-0.081
top-0.067
abella-0.066
 weekend-0.062
bench-0.060
Token without
Token ID1231
Feature activation+6.68e-03

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.021
bench+0.019
 versa+0.018
Neg logit contributions
hr-0.017
rum-0.017
cest-0.016
 exc-0.015
AAAA-0.015
Token asking
Token ID4737
Feature activation+2.68e-04

Pos logit contributions
abella+0.011
top+0.011
 wrong+0.010
bench+0.009
rant+0.009
Neg logit contributions
hr-0.008
rum-0.008
cest-0.008
AAAA-0.007
ment-0.007
Token
Token ID198
Feature activation+2.05e-03

Pos logit contributions
abella+0.031
 wrong+0.029
top+0.029
rant+0.025
bench+0.025
Neg logit contributions
hr-0.022
rum-0.021
oc-0.021
cest-0.020
AAAA-0.020
TokenOne
Token ID3198
Feature activation-1.56e-02

Pos logit contributions
abella+0.002
 wrong+0.002
top+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
oc-0.004
Token day
Token ID1110
Feature activation-1.88e-02

Pos logit contributions
hr+0.034
rum+0.034
AAAA+0.033
cest+0.033
ment+0.032
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
 weekend-0.007
 morning-0.005
Token,
Token ID11
Feature activation-2.59e-02

Pos logit contributions
hr+0.033
rum+0.031
cest+0.031
ment+0.029
packs+0.029
Neg logit contributions
abella-0.022
 wrong-0.022
top-0.020
 contest-0.015
 weekend-0.015
Token while
Token ID981
Feature activation-1.54e-02

Pos logit contributions
rum+0.053
hr+0.052
cest+0.050
ment+0.048
oc+0.048
Neg logit contributions
abella-0.024
 wrong-0.024
top-0.023
 Jack-0.018
 weekend-0.016
Token they
Token ID484
Feature activation-7.75e-02

Pos logit contributions
hr+0.028
rum+0.027
cest+0.027
oc+0.026
ment+0.026
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
 weekend-0.012
 Luke-0.012
Token were
Token ID547
Feature activation+3.64e-03

Pos logit contributions
rum+0.191
cest+0.184
 wounds+0.180
adoes+0.179
ults+0.175
Neg logit contributions
 would-0.036
 night-0.034
 only-0.033
 made-0.031
top-0.030
Token walking
Token ID6155
Feature activation+3.86e-03

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.005
bench+0.005
rant+0.005
Neg logit contributions
hr-0.005
rum-0.004
AAAA-0.004
cest-0.004
packs-0.004
Token,
Token ID11
Feature activation-4.89e-03

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
ment-0.005
Token Tim
Token ID5045
Feature activation+1.91e-03

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
oc+0.008
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.004
 versa-0.004
Tokenmy
Token ID1820
Feature activation-7.99e-04

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.004
rum-0.004
cest-0.003
AAAA-0.003
packs-0.003
Token lived
Token ID5615
Feature activation-6.48e-03

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.007
rum-0.006
cest-0.006
oc-0.006
AAAA-0.006
Token on
Token ID319
Feature activation-1.44e-02

Pos logit contributions
hr+0.009
rum+0.009
cest+0.009
AAAA+0.009
ment+0.008
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token a
Token ID257
Feature activation+1.27e-02

Pos logit contributions
hr+0.026
rum+0.025
cest+0.025
AAAA+0.024
ment+0.024
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
 weekend-0.011
bench-0.011
Token farm
Token ID5318
Feature activation+2.83e-02

Pos logit contributions
abella+0.023
top+0.021
 wrong+0.021
bench+0.019
rant+0.018
Neg logit contributions
hr-0.014
cest-0.014
AAAA-0.013
 exc-0.013
oc-0.013
Token with
Token ID351
Feature activation-2.27e-02

Pos logit contributions
abella+0.062
top+0.053
 wrong+0.051
 versa+0.050
rant+0.050
Neg logit contributions
hr-0.025
oc-0.023
cest-0.022
 exc-0.022
rum-0.021
Token other
Token ID584
Feature activation-7.75e-02

Pos logit contributions
hr+0.043
rum+0.041
cest+0.040
AAAA+0.040
ment+0.038
Neg logit contributions
abella-0.025
 wrong-0.022
top-0.021
 weekend-0.016
 Luke-0.016
Token animals
Token ID4695
Feature activation-1.09e-02

Pos logit contributions
rum+0.133
hr+0.133
ment+0.125
 amaz+0.122
AAAA+0.118
Neg logit contributions
 always-0.087
 wrong-0.086
 would-0.085
 one-0.084
abella-0.081
Token.
Token ID13
Feature activation+2.25e-03

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.015
ment+0.015
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
 versa-0.011
Token The
Token ID383
Feature activation-3.92e-02

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.003
AAAA-0.003
cest-0.003
oc-0.003
Token pony
Token ID26902
Feature activation+8.23e-03

Pos logit contributions
rum+0.062
hr+0.061
ment+0.058
packs+0.054
run+0.053
Neg logit contributions
 wrong-0.059
abella-0.056
 weekend-0.054
 always-0.053
 morning-0.050
Token liked
Token ID8288
Feature activation-2.62e-02

Pos logit contributions
abella+0.013
top+0.012
 wrong+0.012
bench+0.010
 versa+0.010
Neg logit contributions
hr-0.011
cest-0.011
oc-0.011
rum-0.011
AAAA-0.010
Token.
Token ID13
Feature activation+4.71e-03

Pos logit contributions
abella+0.016
top+0.014
 wrong+0.014
bench+0.012
 versa+0.012
Neg logit contributions
hr-0.011
rum-0.010
cest-0.010
packs-0.010
 exc-0.010
Token She
Token ID1375
Feature activation+1.87e-03

Pos logit contributions
abella+0.007
 wrong+0.007
top+0.007
bench+0.006
rant+0.006
Neg logit contributions
hr-0.006
rum-0.006
AAAA-0.006
cest-0.006
packs-0.006
Token wanted
Token ID2227
Feature activation+1.94e-02

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
 exc-0.002
Token to
Token ID284
Feature activation-4.75e-02

Pos logit contributions
abella+0.027
top+0.024
 wrong+0.024
bench+0.023
 versa+0.022
Neg logit contributions
hr-0.028
 tou-0.026
ries-0.026
packs-0.026
ment-0.026
Token fill
Token ID6070
Feature activation+1.10e-02

Pos logit contributions
ment+0.084
AAAA+0.081
cest+0.081
rum+0.080
hr+0.079
Neg logit contributions
top-0.055
abella-0.049
 only-0.046
 wrong-0.046
 rest-0.045
Token her
Token ID607
Feature activation-7.74e-02

Pos logit contributions
abella+0.017
 wrong+0.015
top+0.015
bench+0.013
 weekend+0.012
Neg logit contributions
hr-0.016
rum-0.015
cest-0.014
ment-0.014
oc-0.014
Token basket
Token ID7988
Feature activation+2.08e-02

Pos logit contributions
AAAA+0.143
hr+0.134
rum+0.132
smart+0.131
Get+0.126
Neg logit contributions
 wrong-0.080
 one-0.075
 rest-0.072
top-0.069
 only-0.067
Token with
Token ID351
Feature activation-2.47e-02

Pos logit contributions
abella+0.042
 wrong+0.037
top+0.036
 versa+0.034
bench+0.033
Neg logit contributions
hr-0.020
rum-0.019
cest-0.018
packs-0.018
 exc-0.018
Token the
Token ID262
Feature activation-4.01e-02

Pos logit contributions
hr+0.042
rum+0.040
AAAA+0.040
cest+0.040
oc+0.039
Neg logit contributions
abella-0.030
 wrong-0.030
top-0.029
bench-0.021
 only-0.021
Token needles
Token ID25209
Feature activation+4.09e-02

Pos logit contributions
hr+0.049
rum+0.047
AAAA+0.044
ment+0.043
packs+0.041
Neg logit contributions
 wrong-0.071
abella-0.068
top-0.067
 would-0.062
 only-0.060
Token.
Token ID13
Feature activation+2.93e-03

Pos logit contributions
abella+0.085
top+0.070
 wrong+0.068
 versa+0.068
 projector+0.065
Neg logit contributions
cest-0.036
rum-0.036
hr-0.034
packs-0.034
 exc-0.033
Token big
Token ID1263
Feature activation-1.19e-02

Pos logit contributions
hr+0.003
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.003
 versa-0.003
Token tree
Token ID5509
Feature activation+3.61e-02

Pos logit contributions
hr+0.016
rum+0.016
cest+0.015
ment+0.015
AAAA+0.015
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.014
 weekend-0.014
Token with
Token ID351
Feature activation-5.94e-03

Pos logit contributions
abella+0.082
top+0.067
 versa+0.067
 wrong+0.066
 Dame+0.065
Neg logit contributions
hr-0.031
oc-0.029
 exc-0.028
packs-0.027
rum-0.027
Token lots
Token ID6041
Feature activation-4.39e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.007
AAAA+0.007
oc+0.007
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.008
 versa-0.007
Token of
Token ID286
Feature activation-4.22e-02

Pos logit contributions
AAAA+0.076
hr+0.076
rum+0.072
cest+0.068
EX+0.066
Neg logit contributions
 wrong-0.051
top-0.048
abella-0.044
 contest-0.035
 night-0.035
Token other
Token ID584
Feature activation-7.73e-02

Pos logit contributions
hr+0.066
rum+0.060
cest+0.060
packs+0.056
AAAA+0.056
Neg logit contributions
 wrong-0.062
abella-0.053
top-0.051
 always-0.046
 morning-0.046
Token squirrel
Token ID33039
Feature activation-1.42e-02

Pos logit contributions
rum+0.127
hr+0.125
ment+0.122
 amaz+0.112
cess+0.112
Neg logit contributions
 wrong-0.100
 always-0.083
 only-0.083
 would-0.079
 made-0.074
Tokens
Token ID82
Feature activation-3.89e-02

Pos logit contributions
hr+0.020
AAAA+0.019
rum+0.019
cest+0.018
 amaz+0.018
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.020
bench-0.016
rant-0.016
Token.
Token ID13
Feature activation+9.26e-03

Pos logit contributions
hr+0.060
AAAA+0.057
rum+0.056
Upon+0.054
cest+0.054
Neg logit contributions
 wrong-0.058
top-0.049
abella-0.043
 only-0.042
 always-0.041
Token They
Token ID1119
Feature activation+2.94e-03

Pos logit contributions
abella+0.012
 wrong+0.010
top+0.010
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.015
rum-0.015
AAAA-0.015
cest-0.015
oc-0.015
Token were
Token ID547
Feature activation-5.04e-03

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.004
 versa+0.003
Neg logit contributions
hr-0.004
rum-0.004
cest-0.004
AAAA-0.004
 exc-0.004
Token that
Token ID326
Feature activation+1.19e-02

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
bench+0.014
rant+0.014
Neg logit contributions
hr-0.019
ment-0.018
rum-0.018
cest-0.018
packs-0.017
Token day
Token ID1110
Feature activation-3.25e-03

Pos logit contributions
abella+0.014
top+0.012
 wrong+0.011
bench+0.009
rant+0.009
Neg logit contributions
hr-0.022
 exc-0.021
cest-0.021
rum-0.021
AAAA-0.021
Token:
Token ID25
Feature activation+2.14e-03

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token never
Token ID1239
Feature activation-1.03e-02

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
 exc-0.003
Token load
Token ID3440
Feature activation-1.94e-02

Pos logit contributions
hr+0.015
rum+0.015
cest+0.015
AAAA+0.014
oc+0.014
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.013
bench-0.011
 weekend-0.011
Token your
Token ID534
Feature activation-7.73e-02

Pos logit contributions
hr+0.026
rum+0.024
cest+0.024
AAAA+0.024
oc+0.022
Neg logit contributions
abella-0.031
 wrong-0.031
top-0.030
rant-0.024
bench-0.024
Token wagon
Token ID31860
Feature activation-1.48e-02

Pos logit contributions
hr+0.126
cest+0.125
AAAA+0.120
rum+0.120
oc+0.119
Neg logit contributions
 wrong-0.111
 tea-0.080
 first-0.079
 only-0.078
oe-0.074
Token with
Token ID351
Feature activation-9.77e-03

Pos logit contributions
hr+0.022
rum+0.021
AAAA+0.021
cest+0.020
ment+0.019
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.021
bench-0.017
rant-0.016
Token p
Token ID279
Feature activation-3.14e-02

Pos logit contributions
hr+0.014
rum+0.014
cest+0.013
AAAA+0.013
oc+0.013
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
rant-0.011
Tokenears
Token ID4127
Feature activation-8.38e-03

Pos logit contributions
hr+0.039
AAAA+0.039
cest+0.038
rum+0.037
oc+0.035
Neg logit contributions
abella-0.052
top-0.051
 wrong-0.050
rant-0.041
oe-0.040
Token near
Token ID1474
Feature activation-1.66e-02

Pos logit contributions
hr+0.011
rum+0.011
AAAA+0.010
cest+0.010
oc+0.010
Neg logit contributions
abella-0.013
 wrong-0.013
top-0.013
bench-0.011
rant-0.010
Token in
Token ID287
Feature activation-2.36e-02

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token a
Token ID257
Feature activation+5.40e-03

Pos logit contributions
hr+0.044
rum+0.043
cest+0.042
AAAA+0.041
ment+0.041
Neg logit contributions
abella-0.024
 wrong-0.023
top-0.022
bench-0.017
 Luke-0.017
Token big
Token ID1263
Feature activation-8.84e-03

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.009
bench+0.008
rant+0.008
Neg logit contributions
hr-0.006
cest-0.006
AAAA-0.006
rum-0.006
oc-0.006
Token field
Token ID2214
Feature activation+4.11e-02

Pos logit contributions
hr+0.012
rum+0.012
cest+0.011
AAAA+0.011
ment+0.011
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.011
 weekend-0.010
Token with
Token ID351
Feature activation-2.03e-02

Pos logit contributions
abella+0.092
 versa+0.075
top+0.073
 wrong+0.072
 Dame+0.072
Neg logit contributions
hr-0.036
 exc-0.032
cest-0.031
packs-0.031
rum-0.031
Token other
Token ID584
Feature activation-7.68e-02

Pos logit contributions
hr+0.037
rum+0.036
AAAA+0.035
cest+0.035
oc+0.034
Neg logit contributions
abella-0.022
 wrong-0.021
top-0.020
 Luke-0.014
bench-0.014
Token animals
Token ID4695
Feature activation-1.13e-02

Pos logit contributions
rum+0.132
hr+0.130
AAAA+0.120
ment+0.118
 amaz+0.112
Neg logit contributions
 wrong-0.093
 always-0.088
 would-0.087
 one-0.082
 only-0.081
Token.
Token ID13
Feature activation+6.40e-03

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.016
ment+0.015
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.015
bench-0.012
 versa-0.012
Token
Token ID198
Feature activation+3.01e-02

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.008
rum-0.008
AAAA-0.008
oc-0.008
cest-0.007
Token
Token ID198
Feature activation+1.88e-02

Pos logit contributions
abella+0.055
top+0.053
 wrong+0.053
rant+0.047
bench+0.046
Neg logit contributions
hr-0.031
oc-0.030
rum-0.029
 exc-0.029
AAAA-0.028
TokenOne
Token ID3198
Feature activation-2.29e-02

Pos logit contributions
abella+0.023
 wrong+0.021
top+0.021
bench+0.018
work+0.018
Neg logit contributions
AAAA-0.031
hr-0.031
oc-0.030
 exc-0.029
Upon-0.029
Token.
Token ID13
Feature activation+2.50e-02

Pos logit contributions
abella+0.012
 wrong+0.011
top+0.011
bench+0.010
 versa+0.009
Neg logit contributions
hr-0.008
rum-0.008
cest-0.008
packs-0.008
oc-0.008
Token One
Token ID1881
Feature activation-3.18e-02

Pos logit contributions
abella+0.034
 wrong+0.032
top+0.029
bench+0.026
 versa+0.025
Neg logit contributions
packs-0.037
 exc-0.035
hr-0.035
AAAA-0.034
Upon-0.034
Token day
Token ID1110
Feature activation-3.99e-03

Pos logit contributions
rum+0.082
hr+0.081
AAAA+0.077
cest+0.076
 amaz+0.076
Neg logit contributions
abella-0.010
 wrong-0.010
 weekend-0.010
top-0.009
 morning-0.007
Token,
Token ID11
Feature activation-8.77e-03

Pos logit contributions
hr+0.006
rum+0.006
cest+0.006
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token while
Token ID981
Feature activation-8.27e-03

Pos logit contributions
hr+0.016
rum+0.016
cest+0.015
AAAA+0.015
oc+0.015
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 Luke-0.007
Token they
Token ID484
Feature activation-7.67e-02

Pos logit contributions
hr+0.015
rum+0.014
cest+0.014
AAAA+0.014
oc+0.013
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token were
Token ID547
Feature activation-1.41e-02

Pos logit contributions
rum+0.195
cest+0.181
adoes+0.181
hr+0.180
 wounds+0.174
Neg logit contributions
 would-0.043
 weekend-0.035
 night-0.035
 made-0.033
 only-0.033
Token walking
Token ID6155
Feature activation+1.91e-03

Pos logit contributions
rum+0.022
hr+0.022
cest+0.021
oc+0.021
ment+0.020
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
 weekend-0.015
 versa-0.015
Token,
Token ID11
Feature activation+3.48e-03

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.002
AAAA-0.002
ment-0.002
Token Lily
Token ID20037
Feature activation-1.99e-03

Pos logit contributions
abella+0.004
top+0.004
 wrong+0.004
bench+0.003
rant+0.003
Neg logit contributions
hr-0.006
AAAA-0.005
rum-0.005
cest-0.005
ment-0.005
Token found
Token ID1043
Feature activation+7.64e-03

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
oc+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
 versa-0.002
Token any
Token ID597
Feature activation-4.57e-02

Pos logit contributions
hr+0.027
rum+0.026
cest+0.026
oc+0.026
AAAA+0.026
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
bench-0.014
 weekend-0.013
Token friends
Token ID2460
Feature activation-1.73e-02

Pos logit contributions
hr+0.064
AAAA+0.062
oc+0.059
cest+0.058
rum+0.058
Neg logit contributions
 wrong-0.076
top-0.063
abella-0.060
 only-0.056
 contest-0.055
Token.
Token ID13
Feature activation+3.99e-03

Pos logit contributions
hr+0.028
rum+0.027
AAAA+0.027
cest+0.027
oc+0.026
Neg logit contributions
 wrong-0.022
abella-0.022
top-0.020
bench-0.016
 versa-0.015
Token All
Token ID1439
Feature activation-1.41e-02

Pos logit contributions
abella+0.006
 wrong+0.006
top+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.005
rum-0.005
AAAA-0.005
cest-0.005
oc-0.005
Token the
Token ID262
Feature activation-4.83e-02

Pos logit contributions
hr+0.022
rum+0.021
cest+0.020
AAAA+0.020
oc+0.020
Neg logit contributions
abella-0.019
 wrong-0.019
top-0.018
 weekend-0.015
bench-0.015
Token other
Token ID584
Feature activation-7.63e-02

Pos logit contributions
rum+0.082
ment+0.077
hr+0.075
cest+0.074
packs+0.073
Neg logit contributions
 wrong-0.066
 always-0.054
abella-0.054
 only-0.054
 never-0.054
Token trees
Token ID7150
Feature activation-4.54e-03

Pos logit contributions
rum+0.130
ment+0.111
hr+0.110
cest+0.109
AAAA+0.109
Neg logit contributions
 wrong-0.103
 only-0.094
 always-0.092
top-0.088
 never-0.087
Token were
Token ID547
Feature activation-2.33e-03

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token big
Token ID1263
Feature activation-2.16e-02

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.002
Neg logit contributions
abella-0.004
top-0.004
 wrong-0.004
bench-0.003
 versa-0.003
Token and
Token ID290
Feature activation+8.32e-03

Pos logit contributions
hr+0.028
rum+0.027
AAAA+0.025
ment+0.024
cest+0.024
Neg logit contributions
 wrong-0.036
abella-0.034
top-0.034
bench-0.028
 contest-0.027
Token strong
Token ID1913
Feature activation-7.94e-03

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.012
bench+0.011
 versa+0.011
Neg logit contributions
hr-0.010
cest-0.010
 exc-0.010
rum-0.010
AAAA-0.010
Token "
Token ID366
Feature activation-4.89e-03

Pos logit contributions
hr+0.015
rum+0.015
cest+0.015
AAAA+0.014
ment+0.014
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.010
 Luke-0.010
TokenDon
Token ID3987
Feature activation-1.10e-03

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.008
top-0.007
 wrong-0.007
bench-0.006
 versa-0.006
Token't
Token ID470
Feature activation-3.63e-03

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
 versa-0.000
Token eat
Token ID4483
Feature activation-4.48e-02

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token too
Token ID1165
Feature activation-8.00e-03

Pos logit contributions
rum+0.084
hr+0.083
cest+0.083
oc+0.080
AAAA+0.080
Neg logit contributions
 wrong-0.053
abella-0.047
top-0.043
 contest-0.033
 only-0.032
Token many
Token ID867
Feature activation-7.62e-02

Pos logit contributions
hr+0.014
rum+0.013
cest+0.013
AAAA+0.013
oc+0.013
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
rant-0.007
Token."
Token ID526
Feature activation-1.44e-03

Pos logit contributions
AAAA+0.110
hr+0.109
rum+0.106
 amaz+0.104
cest+0.100
Neg logit contributions
 wrong-0.113
abella-0.098
top-0.095
 only-0.088
 always-0.086
Token
Token ID198
Feature activation+1.33e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token
Token ID198
Feature activation-8.79e-03

Pos logit contributions
abella+0.022
top+0.021
 wrong+0.021
rant+0.018
bench+0.018
Neg logit contributions
hr-0.016
rum-0.015
AAAA-0.015
oc-0.015
cest-0.014
TokenL
Token ID43
Feature activation-1.16e-02

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
ment+0.012
AAAA+0.012
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.012
bench-0.010
 versa-0.010
Tokenily
Token ID813
Feature activation+1.07e-03

Pos logit contributions
hr+0.024
rum+0.023
cest+0.022
oc+0.022
AAAA+0.022
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.009
bench-0.007
rant-0.006
Token in
Token ID287
Feature activation-1.94e-02

Pos logit contributions
hr+0.011
rum+0.011
cest+0.011
AAAA+0.011
ment+0.010
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.011
bench-0.009
rant-0.008
Token a
Token ID257
Feature activation+6.36e-04

Pos logit contributions
hr+0.036
rum+0.036
cest+0.034
ment+0.034
AAAA+0.033
Neg logit contributions
abella-0.019
 wrong-0.019
top-0.019
 never-0.014
bench-0.014
Token big
Token ID1263
Feature activation-6.42e-03

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token field
Token ID2214
Feature activation+4.02e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.007
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.010
bench-0.008
rant-0.008
Token with
Token ID351
Feature activation-2.18e-02

Pos logit contributions
abella+0.091
 versa+0.075
top+0.073
 wrong+0.072
 Dame+0.072
Neg logit contributions
hr-0.035
 exc-0.031
cest-0.031
rum-0.030
packs-0.030
Token other
Token ID584
Feature activation-7.62e-02

Pos logit contributions
hr+0.041
rum+0.040
AAAA+0.039
cest+0.038
packs+0.037
Neg logit contributions
abella-0.022
 wrong-0.021
top-0.020
 never-0.014
 Luke-0.014
Token animals
Token ID4695
Feature activation-5.86e-03

Pos logit contributions
rum+0.121
hr+0.108
 amaz+0.106
Upon+0.104
AAAA+0.104
Neg logit contributions
 wrong-0.098
 always-0.088
 only-0.086
 one-0.086
 never-0.083
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
hr+0.008
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
 versa-0.007
Token
Token ID198
Feature activation+3.08e-02

Pos logit contributions
abella+0.021
 wrong+0.019
top+0.018
bench+0.016
rant+0.016
Neg logit contributions
hr-0.016
AAAA-0.016
oc-0.016
 exc-0.016
rum-0.015
Token
Token ID198
Feature activation+1.40e-02

Pos logit contributions
abella+0.057
 wrong+0.055
top+0.054
rant+0.048
work+0.047
Neg logit contributions
hr-0.031
oc-0.030
 exc-0.029
rum-0.028
AAAA-0.028
TokenOne
Token ID3198
Feature activation-2.63e-02

Pos logit contributions
abella+0.017
 wrong+0.015
top+0.015
bench+0.012
work+0.012
Neg logit contributions
hr-0.024
AAAA-0.023
oc-0.023
 exc-0.023
Upon-0.022
Token please
Token ID3387
Feature activation+7.47e-03

Pos logit contributions
hr+0.010
rum+0.010
cest+0.010
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.006
 versa-0.006
Token!"
Token ID2474
Feature activation+4.68e-04

Pos logit contributions
abella+0.012
top+0.011
 wrong+0.011
bench+0.010
 versa+0.009
Neg logit contributions
hr-0.010
rum-0.009
cest-0.009
 exc-0.009
AAAA-0.009
Token
Token ID198
Feature activation+1.83e-03

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.000
AAAA-0.000
cest-0.000
ment-0.000
Token
Token ID198
Feature activation-6.56e-03

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.002
rum-0.002
AAAA-0.002
cest-0.002
oc-0.002
TokenHer
Token ID9360
Feature activation-2.20e-02

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.013
top-0.012
 wrong-0.012
bench-0.010
 versa-0.010
Token mom
Token ID1995
Feature activation+4.12e-02

Pos logit contributions
hr+0.042
rum+0.041
AAAA+0.040
oc+0.040
cest+0.039
Neg logit contributions
abella-0.024
 wrong-0.023
top-0.019
bench-0.015
 only-0.015
Token looked
Token ID3114
Feature activation-7.84e-03

Pos logit contributions
abella+0.068
top+0.066
 wrong+0.065
 weekend+0.063
 versa+0.060
Neg logit contributions
hr-0.043
oc-0.039
 mis-0.039
 panicked-0.039
 mut-0.038
Token at
Token ID379
Feature activation+2.19e-04

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.014
bench-0.012
rant-0.011
Token the
Token ID262
Feature activation-1.26e-02

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token price
Token ID2756
Feature activation+4.55e-03

Pos logit contributions
rum+0.020
hr+0.019
cest+0.018
oc+0.018
AAAA+0.018
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.016
bench-0.013
 versa-0.013
Token tag
Token ID7621
Feature activation+9.72e-03

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
rant+0.006
Neg logit contributions
hr-0.006
rum-0.005
cest-0.005
packs-0.005
ment-0.005
Token skill
Token ID5032
Feature activation+8.25e-04

Pos logit contributions
abella+0.075
top+0.070
 wrong+0.065
rant+0.065
side+0.060
Neg logit contributions
cest-0.027
 exc-0.024
oc-0.024
rum-0.023
hr-0.022
Token to
Token ID284
Feature activation+1.25e-02

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token have
Token ID423
Feature activation-4.24e-03

Pos logit contributions
abella+0.019
top+0.018
 wrong+0.017
bench+0.015
rant+0.014
Neg logit contributions
hr-0.018
rum-0.017
cest-0.016
 exc-0.016
AAAA-0.016
Token.
Token ID13
Feature activation+6.38e-03

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.006
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token You
Token ID921
Feature activation+4.50e-03

Pos logit contributions
abella+0.010
 wrong+0.009
top+0.009
bench+0.008
 versa+0.007
Neg logit contributions
hr-0.009
rum-0.008
AAAA-0.008
cest-0.008
oc-0.008
Token can
Token ID460
Feature activation+4.35e-02

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
rant+0.004
Neg logit contributions
hr-0.007
rum-0.007
cest-0.007
AAAA-0.007
 exc-0.006
Token play
Token ID711
Feature activation-1.12e-02

Pos logit contributions
abella+0.079
 wrong+0.070
bench+0.068
top+0.068
 weekend+0.064
Neg logit contributions
 exc-0.050
hr-0.047
 mis-0.046
 tou-0.044
rum-0.044
Token with
Token ID351
Feature activation-1.03e-02

Pos logit contributions
hr+0.016
rum+0.015
cest+0.015
AAAA+0.015
ment+0.014
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.013
 versa-0.013
Token Max
Token ID5436
Feature activation-8.21e-03

Pos logit contributions
hr+0.015
rum+0.015
cest+0.014
AAAA+0.014
ment+0.014
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
 Luke-0.011
Token anytime
Token ID17949
Feature activation-6.17e-03

Pos logit contributions
hr+0.010
rum+0.009
cest+0.009
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.014
 wrong-0.014
top-0.013
bench-0.011
 versa-0.011
Token you
Token ID345
Feature activation-8.39e-03

Pos logit contributions
hr+0.009
rum+0.008
cest+0.008
AAAA+0.008
ment+0.008
Neg logit contributions
abella-0.009
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token Tim
Token ID5045
Feature activation-8.32e-03

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.009
bench+0.008
rant+0.007
Neg logit contributions
hr-0.013
rum-0.013
cest-0.013
AAAA-0.012
oc-0.012
Tokenmy
Token ID1820
Feature activation-1.73e-02

Pos logit contributions
hr+0.017
rum+0.017
AAAA+0.016
cest+0.016
ment+0.016
Neg logit contributions
abella-0.007
 wrong-0.007
top-0.006
bench-0.004
rant-0.004
Token.
Token ID13
Feature activation+7.97e-03

Pos logit contributions
hr+0.027
AAAA+0.027
rum+0.026
ment+0.025
cest+0.025
Neg logit contributions
abella-0.023
 wrong-0.023
top-0.022
bench-0.017
 Luke-0.017
Token Do
Token ID2141
Feature activation+7.11e-03

Pos logit contributions
abella+0.013
 wrong+0.012
top+0.012
bench+0.010
 versa+0.010
Neg logit contributions
hr-0.010
rum-0.010
cest-0.010
AAAA-0.010
oc-0.010
Token you
Token ID345
Feature activation+1.29e-02

Pos logit contributions
abella+0.010
top+0.010
 wrong+0.009
bench+0.009
rant+0.008
Neg logit contributions
hr-0.010
rum-0.010
AAAA-0.010
cest-0.010
 exc-0.009
Token want
Token ID765
Feature activation+3.99e-02

Pos logit contributions
abella+0.025
top+0.023
 wrong+0.023
bench+0.021
side+0.019
Neg logit contributions
hr-0.013
 exc-0.013
rum-0.013
cest-0.012
oc-0.012
Token to
Token ID284
Feature activation+7.80e-03

Pos logit contributions
abella+0.064
top+0.060
bench+0.058
rant+0.053
 wrong+0.053
Neg logit contributions
hr-0.048
 squirrel-0.046
 tou-0.046
 ac-0.046
 prey-0.045
Token play
Token ID711
Feature activation+4.46e-03

Pos logit contributions
abella+0.012
top+0.011
 wrong+0.011
bench+0.009
rant+0.009
Neg logit contributions
hr-0.011
cest-0.010
rum-0.010
 exc-0.010
AAAA-0.010
Token with
Token ID351
Feature activation+4.25e-03

Pos logit contributions
abella+0.008
top+0.007
 wrong+0.007
bench+0.006
rant+0.006
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
oc-0.005
Token me
Token ID502
Feature activation+5.75e-03

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
oc-0.006
AAAA-0.006
Token?"
Token ID1701
Feature activation-2.32e-02

Pos logit contributions
abella+0.008
top+0.008
 wrong+0.007
bench+0.006
 versa+0.006
Neg logit contributions
hr-0.009
rum-0.008
cest-0.008
oc-0.008
packs-0.008
Token.
Token ID13
Feature activation-2.87e-04

Pos logit contributions
AAAA+0.048
hr+0.046
rum+0.044
cest+0.044
Upon+0.041
Neg logit contributions
top-0.037
 wrong-0.036
abella-0.034
rant-0.026
work-0.026
Token They
Token ID1119
Feature activation-1.79e-03

Pos logit contributions
hr+0.000
rum+0.000
AAAA+0.000
cest+0.000
oc+0.000
Neg logit contributions
abella-0.000
 wrong-0.000
top-0.000
bench-0.000
 versa-0.000
Token continued
Token ID3767
Feature activation-9.77e-03

Pos logit contributions
hr+0.003
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.003
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token their
Token ID511
Feature activation-1.07e-02

Pos logit contributions
hr+0.018
rum+0.018
cest+0.018
AAAA+0.017
oc+0.017
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 weekend-0.007
Token walk
Token ID2513
Feature activation+1.66e-02

Pos logit contributions
hr+0.015
rum+0.015
cest+0.014
AAAA+0.014
ment+0.014
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.013
 weekend-0.012
Token and
Token ID290
Feature activation+4.02e-02

Pos logit contributions
abella+0.033
 wrong+0.029
top+0.028
 versa+0.026
bench+0.026
Neg logit contributions
hr-0.017
cest-0.016
packs-0.016
oc-0.016
ment-0.016
Token the
Token ID262
Feature activation+9.67e-03

Pos logit contributions
abella+0.065
 wrong+0.064
top+0.059
bench+0.057
inner+0.054
Neg logit contributions
hr-0.049
oc-0.047
 tou-0.045
packs-0.045
cest-0.044
Token boy
Token ID2933
Feature activation-6.54e-04

Pos logit contributions
abella+0.021
top+0.019
 wrong+0.019
bench+0.017
rant+0.017
Neg logit contributions
hr-0.008
cest-0.007
AAAA-0.007
 tou-0.007
 exc-0.007
Token was
Token ID373
Feature activation-4.52e-04

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
oc+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token happy
Token ID3772
Feature activation-1.16e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.000
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
 versa-0.001
Token to
Token ID284
Feature activation-1.78e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.002
top-0.001
 wrong-0.001
bench-0.001
 versa-0.001
Token decided
Token ID3066
Feature activation+7.88e-03

Pos logit contributions
abella+0.020
top+0.019
 wrong+0.019
bench+0.017
rant+0.016
Neg logit contributions
hr-0.014
 exc-0.014
cest-0.014
AAAA-0.013
rum-0.013
Token to
Token ID284
Feature activation+1.13e-02

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.008
bench+0.007
rant+0.007
Neg logit contributions
hr-0.014
cest-0.013
rum-0.013
AAAA-0.013
 exc-0.012
Token go
Token ID467
Feature activation-9.47e-04

Pos logit contributions
abella+0.019
 wrong+0.017
top+0.017
bench+0.014
rant+0.014
Neg logit contributions
hr-0.015
rum-0.014
 exc-0.014
cest-0.014
AAAA-0.013
Token on
Token ID319
Feature activation+1.57e-02

Pos logit contributions
hr+0.002
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token a
Token ID257
Feature activation+3.53e-02

Pos logit contributions
abella+0.026
top+0.024
 wrong+0.022
bench+0.021
rant+0.020
Neg logit contributions
hr-0.020
AAAA-0.019
packs-0.018
rum-0.018
ment-0.018
Token journey
Token ID7002
Feature activation+4.65e-02

Pos logit contributions
abella+0.063
top+0.058
 wrong+0.053
bench+0.052
rant+0.051
Neg logit contributions
hr-0.040
 exc-0.038
cest-0.036
 tou-0.036
oc-0.036
Token.
Token ID13
Feature activation-4.61e-03

Pos logit contributions
abella+0.096
acey+0.078
 versa+0.078
 Salad+0.076
 Dame+0.075
Neg logit contributions
hr-0.044
 than-0.043
cest-0.042
 from-0.041
packs-0.039
Token
Token ID198
Feature activation+1.51e-02

Pos logit contributions
hr+0.007
rum+0.007
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.007
top-0.006
 wrong-0.006
bench-0.005
 versa-0.005
Token
Token ID198
Feature activation-1.43e-02

Pos logit contributions
abella+0.025
 wrong+0.025
top+0.024
rant+0.021
 versa+0.020
Neg logit contributions
hr-0.018
rum-0.017
oc-0.017
AAAA-0.017
cest-0.016
TokenAll
Token ID3237
Feature activation-8.76e-03

Pos logit contributions
hr+0.022
rum+0.021
ment+0.021
cest+0.021
AAAA+0.020
Neg logit contributions
abella-0.021
top-0.019
 wrong-0.018
bench-0.016
 weekend-0.015
Token around
Token ID1088
Feature activation-9.15e-03

Pos logit contributions
hr+0.012
rum+0.012
AAAA+0.012
cest+0.012
oc+0.011
Neg logit contributions
abella-0.013
 wrong-0.012
top-0.012
bench-0.010
 weekend-0.010
Token and
Token ID290
Feature activation+1.75e-03

Pos logit contributions
abella+0.041
 wrong+0.036
top+0.036
bench+0.033
 versa+0.033
Neg logit contributions
hr-0.022
rum-0.022
cest-0.020
ment-0.020
 exc-0.019
Token agreed
Token ID4987
Feature activation+2.30e-02

Pos logit contributions
abella+0.003
top+0.002
 wrong+0.002
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token.
Token ID13
Feature activation-8.39e-04

Pos logit contributions
abella+0.038
top+0.033
 wrong+0.032
rant+0.030
bench+0.030
Neg logit contributions
hr-0.029
rum-0.028
ment-0.028
cest-0.026
 exc-0.026
Token The
Token ID383
Feature activation+5.34e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token voice
Token ID3809
Feature activation-1.24e-02

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.010
bench+0.009
rant+0.009
Neg logit contributions
hr-0.005
AAAA-0.004
rum-0.004
cest-0.004
 exc-0.004
Token pulled
Token ID5954
Feature activation+1.23e-03

Pos logit contributions
hr+0.020
rum+0.019
cest+0.019
AAAA+0.018
oc+0.018
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
bench-0.012
rant-0.012
Token the
Token ID262
Feature activation-1.75e-02

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.002
rum-0.002
cest-0.001
AAAA-0.001
ment-0.001
Token little
Token ID1310
Feature activation+1.01e-02

Pos logit contributions
hr+0.023
rum+0.022
cest+0.021
packs+0.020
oc+0.020
Neg logit contributions
 wrong-0.028
abella-0.028
top-0.027
 weekend-0.022
bench-0.022
Token girl
Token ID2576
Feature activation-3.96e-02

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.022
bench+0.020
 versa+0.020
Neg logit contributions
hr-0.006
cest-0.006
rum-0.005
AAAA-0.005
ment-0.005
Token out
Token ID503
Feature activation-3.11e-02

Pos logit contributions
AAAA+0.073
cest+0.069
hr+0.069
rum+0.068
ment+0.065
Neg logit contributions
 wrong-0.044
top-0.044
abella-0.040
 never-0.030
 Luke-0.029
Token of
Token ID286
Feature activation-9.18e-03

Pos logit contributions
AAAA+0.043
hr+0.043
cest+0.042
rum+0.041
oc+0.040
Neg logit contributions
 wrong-0.046
top-0.045
abella-0.045
bench-0.036
 Luke-0.036
Token person
Token ID1048
Feature activation+8.68e-04

Pos logit contributions
abella+0.034
top+0.032
 wrong+0.030
bench+0.029
rant+0.028
Neg logit contributions
hr-0.015
cest-0.014
 exc-0.014
AAAA-0.014
ment-0.013
Token said
Token ID531
Feature activation+2.27e-02

Pos logit contributions
abella+0.001
 wrong+0.001
top+0.001
bench+0.001
 versa+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
oc-0.001
Token "
Token ID366
Feature activation-2.26e-02

Pos logit contributions
abella+0.042
top+0.039
bench+0.036
 wrong+0.036
rant+0.034
Neg logit contributions
rum-0.025
ment-0.024
hr-0.024
cest-0.022
AAAA-0.022
TokenIt
Token ID1026
Feature activation-7.32e-03

Pos logit contributions
rum+0.035
hr+0.034
cest+0.033
AAAA+0.032
ment+0.032
Neg logit contributions
abella-0.031
top-0.030
 wrong-0.027
bench-0.023
 weekend-0.023
Token's
Token ID338
Feature activation+2.04e-03

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
ment+0.012
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.006
rant-0.006
Token ok
Token ID12876
Feature activation+1.82e-02

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
 exc-0.003
Token,
Token ID11
Feature activation+2.43e-05

Pos logit contributions
abella+0.031
top+0.029
 wrong+0.027
bench+0.026
 versa+0.024
Neg logit contributions
hr-0.023
rum-0.022
cest-0.021
ment-0.020
 exc-0.019
Token you
Token ID345
Feature activation-1.99e-04

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
ment-0.000
Token can
Token ID460
Feature activation+2.58e-02

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
oc+0.000
Neg logit contributions
abella-0.000
top-0.000
 wrong-0.000
bench-0.000
 versa-0.000
Token come
Token ID1282
Feature activation-3.74e-02

Pos logit contributions
abella+0.043
 wrong+0.038
top+0.038
bench+0.037
 weekend+0.034
Neg logit contributions
rum-0.033
 exc-0.032
hr-0.032
 mis-0.031
oc-0.030
Token out
Token ID503
Feature activation-2.83e-02

Pos logit contributions
hr+0.068
rum+0.068
AAAA+0.066
ment+0.065
Conf+0.064
Neg logit contributions
 wrong-0.038
abella-0.036
top-0.035
 never-0.027
rant-0.027
Token smiled
Token ID13541
Feature activation-1.57e-02

Pos logit contributions
abella+0.038
 wrong+0.036
top+0.036
 weekend+0.032
bench+0.032
Neg logit contributions
hr-0.029
oc-0.027
rum-0.027
AAAA-0.027
 mis-0.027
Token and
Token ID290
Feature activation+2.02e-03

Pos logit contributions
hr+0.019
rum+0.018
AAAA+0.017
cest+0.017
oc+0.016
Neg logit contributions
abella-0.027
 wrong-0.027
top-0.026
bench-0.022
 Luke-0.022
Token said
Token ID531
Feature activation+2.35e-02

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
ment-0.003
Token,
Token ID11
Feature activation-5.53e-03

Pos logit contributions
abella+0.044
top+0.042
bench+0.039
 wrong+0.038
rant+0.037
Neg logit contributions
ment-0.024
rum-0.023
hr-0.023
AAAA-0.021
 exc-0.021
Token "
Token ID366
Feature activation+5.53e-03

Pos logit contributions
hr+0.008
rum+0.007
cest+0.007
AAAA+0.007
ment+0.007
Neg logit contributions
abella-0.009
 wrong-0.008
top-0.008
bench-0.007
 versa-0.007
TokenLet
Token ID5756
Feature activation+2.73e-03

Pos logit contributions
abella+0.009
 wrong+0.009
top+0.008
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.007
cest-0.007
rum-0.007
AAAA-0.007
oc-0.006
Token's
Token ID338
Feature activation+2.05e-02

Pos logit contributions
abella+0.004
top+0.003
 wrong+0.003
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.004
rum-0.004
AAAA-0.004
cest-0.004
oc-0.004
Token go
Token ID467
Feature activation+4.34e-03

Pos logit contributions
abella+0.035
 wrong+0.031
top+0.031
bench+0.028
rant+0.026
Neg logit contributions
hr-0.026
 exc-0.025
rum-0.024
cest-0.024
oc-0.023
Token to
Token ID284
Feature activation-1.78e-03

Pos logit contributions
abella+0.007
 wrong+0.006
top+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.006
rum-0.006
cest-0.006
oc-0.006
AAAA-0.006
Token the
Token ID262
Feature activation+1.17e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
 versa-0.002
Token park
Token ID3952
Feature activation+1.55e-03

Pos logit contributions
abella+0.017
top+0.015
 wrong+0.014
bench+0.014
rant+0.013
Neg logit contributions
hr-0.017
cest-0.016
 tou-0.016
AAAA-0.016
rum-0.015
Token place
Token ID1295
Feature activation+2.72e-02

Pos logit contributions
hr+0.012
rum+0.012
cest+0.011
AAAA+0.011
oc+0.011
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.010
bench-0.008
 weekend-0.008
Token had
Token ID550
Feature activation+9.27e-03

Pos logit contributions
abella+0.047
top+0.043
 wrong+0.039
 versa+0.038
bench+0.038
Neg logit contributions
hr-0.032
packs-0.031
ment-0.029
 exc-0.029
cest-0.029
Token many
Token ID867
Feature activation-5.25e-02

Pos logit contributions
abella+0.013
top+0.012
 wrong+0.012
bench+0.010
rant+0.010
Neg logit contributions
hr-0.014
cest-0.013
AAAA-0.013
rum-0.012
packs-0.012
Token old
Token ID1468
Feature activation-3.53e-02

Pos logit contributions
rum+0.080
hr+0.079
AAAA+0.077
oc+0.073
cest+0.070
Neg logit contributions
 wrong-0.073
abella-0.063
top-0.062
 weekend-0.055
 contest-0.054
Token things
Token ID1243
Feature activation-2.19e-03

Pos logit contributions
hr+0.043
rum+0.041
AAAA+0.038
oc+0.037
cest+0.036
Neg logit contributions
 wrong-0.060
top-0.056
abella-0.056
 weekend-0.048
bench-0.048
Token to
Token ID284
Feature activation-1.87e-03

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.003
bench-0.002
 versa-0.002
Token look
Token ID804
Feature activation-2.96e-02

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.003
ment+0.003
Neg logit contributions
abella-0.003
 wrong-0.003
top-0.002
bench-0.002
 versa-0.002
Token at
Token ID379
Feature activation-1.71e-02

Pos logit contributions
hr+0.048
rum+0.048
AAAA+0.046
cest+0.044
ment+0.043
Neg logit contributions
 wrong-0.043
abella-0.039
top-0.036
rant-0.031
 would-0.029
Token.
Token ID13
Feature activation-8.34e-03

Pos logit contributions
hr+0.027
rum+0.026
AAAA+0.025
cest+0.025
oc+0.024
Neg logit contributions
abella-0.024
 wrong-0.024
top-0.022
bench-0.018
rant-0.018
Token Lucy
Token ID22162
Feature activation+1.80e-02

Pos logit contributions
hr+0.013
rum+0.013
cest+0.012
AAAA+0.012
oc+0.012
Neg logit contributions
abella-0.011
 wrong-0.011
top-0.011
bench-0.009
 versa-0.008
Token liked
Token ID8288
Feature activation-1.98e-02

Pos logit contributions
abella+0.030
 wrong+0.028
top+0.027
bench+0.024
rant+0.024
Neg logit contributions
hr-0.023
 exc-0.021
cest-0.020
oc-0.020
AAAA-0.020
Token mom
Token ID1995
Feature activation+1.90e-04

Pos logit contributions
abella+0.032
top+0.030
 wrong+0.028
bench+0.026
rant+0.026
Neg logit contributions
hr-0.016
cest-0.015
rum-0.015
 exc-0.014
ment-0.014
Token went
Token ID1816
Feature activation+2.14e-02

Pos logit contributions
abella+0.000
 wrong+0.000
top+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
oc-0.000
Token back
Token ID736
Feature activation-3.06e-02

Pos logit contributions
abella+0.031
top+0.027
 wrong+0.027
 versa+0.023
rant+0.023
Neg logit contributions
hr-0.031
packs-0.030
rum-0.028
ment-0.027
 mut-0.027
Token to
Token ID284
Feature activation-1.69e-02

Pos logit contributions
rum+0.068
cest+0.067
AAAA+0.067
hr+0.066
oc+0.063
Neg logit contributions
top-0.021
abella-0.019
 wrong-0.018
 weekend-0.012
bench-0.012
Token the
Token ID262
Feature activation+1.43e-02

Pos logit contributions
rum+0.026
cest+0.026
hr+0.026
AAAA+0.025
ment+0.025
Neg logit contributions
abella-0.023
 wrong-0.021
top-0.021
 Luke-0.018
bench-0.017
Token gate
Token ID8946
Feature activation+2.76e-02

Pos logit contributions
abella+0.023
top+0.021
 wrong+0.020
bench+0.018
rant+0.018
Neg logit contributions
hr-0.019
cest-0.018
AAAA-0.018
 exc-0.017
oc-0.017
Token.
Token ID13
Feature activation-1.92e-02

Pos logit contributions
abella+0.054
 wrong+0.047
top+0.046
 versa+0.044
rant+0.042
Neg logit contributions
hr-0.028
packs-0.026
rum-0.025
ment-0.025
cest-0.025
Token Tom
Token ID4186
Feature activation-1.65e-02

Pos logit contributions
hr+0.036
cest+0.035
rum+0.034
AAAA+0.033
ment+0.032
Neg logit contributions
abella-0.022
top-0.020
 wrong-0.019
bench-0.015
 However-0.015
Token's
Token ID338
Feature activation-3.20e-03

Pos logit contributions
hr+0.026
rum+0.026
cest+0.025
AAAA+0.025
ment+0.025
Neg logit contributions
abella-0.022
top-0.021
 wrong-0.020
bench-0.016
 would-0.016
Token mom
Token ID1995
Feature activation-6.59e-03

Pos logit contributions
hr+0.005
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token put
Token ID1234
Feature activation+2.07e-03

Pos logit contributions
hr+0.010
rum+0.010
cest+0.010
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.009
top-0.009
 wrong-0.009
bench-0.007
 versa-0.007
Token to
Token ID284
Feature activation-1.37e-02

Pos logit contributions
abella+0.029
top+0.026
 wrong+0.025
 versa+0.022
bench+0.022
Neg logit contributions
hr-0.030
packs-0.028
Upon-0.027
cest-0.027
rum-0.027
Token a
Token ID257
Feature activation+2.34e-02

Pos logit contributions
rum+0.020
hr+0.020
cest+0.020
AAAA+0.019
ment+0.019
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
 Luke-0.015
 weekend-0.015
Token party
Token ID2151
Feature activation-2.23e-03

Pos logit contributions
abella+0.039
top+0.034
 wrong+0.033
bench+0.031
rant+0.031
Neg logit contributions
hr-0.031
oc-0.028
cest-0.027
 tou-0.027
 exc-0.027
Token with
Token ID351
Feature activation+1.71e-03

Pos logit contributions
hr+0.003
rum+0.003
cest+0.003
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token all
Token ID477
Feature activation-3.00e-02

Pos logit contributions
abella+0.003
top+0.003
 wrong+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token of
Token ID286
Feature activation-1.00e-02

Pos logit contributions
AAAA+0.045
hr+0.043
oc+0.041
rum+0.041
cest+0.040
Neg logit contributions
abella-0.042
 wrong-0.041
top-0.040
 weekend-0.035
bench-0.034
Token his
Token ID465
Feature activation-3.03e-02

Pos logit contributions
hr+0.013
rum+0.012
cest+0.012
AAAA+0.012
oc+0.012
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.015
bench-0.013
 Luke-0.013
Token friends
Token ID2460
Feature activation-1.35e-02

Pos logit contributions
rum+0.037
hr+0.035
AAAA+0.035
cest+0.032
ment+0.032
Neg logit contributions
abella-0.054
 wrong-0.053
top-0.049
 weekend-0.044
 contest-0.044
Token and
Token ID290
Feature activation-3.95e-03

Pos logit contributions
hr+0.020
rum+0.019
AAAA+0.019
cest+0.019
oc+0.018
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
bench-0.015
 weekend-0.015
Token family
Token ID1641
Feature activation-1.13e-02

Pos logit contributions
hr+0.006
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.004
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
hr+0.017
AAAA+0.017
rum+0.017
cest+0.016
oc+0.016
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.012
 weekend-0.011
Token her
Token ID607
Feature activation-1.77e-02

Pos logit contributions
abella+0.001
top+0.001
 wrong+0.001
bench+0.001
rant+0.001
Neg logit contributions
hr-0.001
rum-0.001
cest-0.001
AAAA-0.001
ment-0.001
Token room
Token ID2119
Feature activation+1.15e-02

Pos logit contributions
hr+0.031
rum+0.031
AAAA+0.031
cest+0.030
oc+0.029
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.019
 weekend-0.015
 Luke-0.015
Token was
Token ID373
Feature activation-9.41e-04

Pos logit contributions
abella+0.018
top+0.016
 wrong+0.016
bench+0.014
 versa+0.014
Neg logit contributions
hr-0.015
rum-0.015
cest-0.014
ment-0.014
packs-0.014
Token a
Token ID257
Feature activation-3.28e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
 versa-0.001
Token mess
Token ID2085
Feature activation+1.02e-02

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.005
 wrong-0.005
top-0.005
bench-0.004
 versa-0.004
Token.
Token ID13
Feature activation-4.01e-03

Pos logit contributions
abella+0.018
top+0.017
 wrong+0.016
 versa+0.014
bench+0.014
Neg logit contributions
hr-0.012
rum-0.011
cest-0.011
ment-0.011
 exc-0.010
Token She
Token ID1375
Feature activation-1.72e-03

Pos logit contributions
hr+0.005
rum+0.005
cest+0.005
AAAA+0.005
ment+0.005
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token didn
Token ID1422
Feature activation+1.51e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.003
 wrong-0.002
top-0.002
bench-0.002
 versa-0.002
Token't
Token ID470
Feature activation+3.06e-03

Pos logit contributions
abella+0.015
top+0.014
 wrong+0.014
bench+0.010
rant+0.010
Neg logit contributions
hr-0.028
rum-0.028
AAAA-0.027
 exc-0.027
Upon-0.026
Token want
Token ID765
Feature activation+3.13e-02

Pos logit contributions
abella+0.006
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
 exc-0.003
AAAA-0.003
Token her
Token ID607
Feature activation-2.80e-02

Pos logit contributions
abella+0.045
top+0.044
bench+0.040
 wrong+0.039
 versa+0.038
Neg logit contributions
hr-0.038
 mis-0.036
packs-0.036
rum-0.036
 mut-0.035
Token was
Token ID373
Feature activation+6.70e-03

Pos logit contributions
abella+0.043
top+0.041
bench+0.038
 wrong+0.038
 weekend+0.036
Neg logit contributions
oc-0.027
 mis-0.026
rum-0.026
hr-0.025
 exc-0.025
Token sorry
Token ID7926
Feature activation-7.23e-03

Pos logit contributions
abella+0.011
top+0.010
 wrong+0.009
bench+0.008
rant+0.008
Neg logit contributions
hr-0.009
rum-0.009
cest-0.008
AAAA-0.008
 exc-0.008
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
hr+0.011
rum+0.010
AAAA+0.010
cest+0.010
oc+0.010
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.010
bench-0.008
 Luke-0.007
Token
Token ID198
Feature activation-7.38e-03

Pos logit contributions
hr+0.024
cest+0.024
rum+0.024
packs+0.023
AAAA+0.022
Neg logit contributions
abella-0.018
top-0.017
 wrong-0.017
bench-0.013
 However-0.013
Token
Token ID198
Feature activation-3.55e-02

Pos logit contributions
hr+0.011
rum+0.011
cest+0.011
AAAA+0.010
ment+0.010
Neg logit contributions
abella-0.011
top-0.010
 wrong-0.010
bench-0.008
 versa-0.007
TokenThe
Token ID464
Feature activation-1.98e-02

Pos logit contributions
packs+0.064
ment+0.060
cest+0.060
hr+0.059
 amaz+0.059
Neg logit contributions
top-0.043
abella-0.040
 wrong-0.039
her-0.031
 never-0.031
Token vendor
Token ID18371
Feature activation+1.54e-02

Pos logit contributions
rum+0.030
hr+0.030
cest+0.028
ment+0.028
AAAA+0.028
Neg logit contributions
 wrong-0.029
abella-0.028
top-0.027
 weekend-0.023
 contest-0.022
Token smiled
Token ID13541
Feature activation-1.84e-02

Pos logit contributions
abella+0.024
top+0.022
 wrong+0.022
bench+0.019
 versa+0.019
Neg logit contributions
hr-0.021
oc-0.019
rum-0.019
cest-0.019
AAAA-0.018
Token.
Token ID13
Feature activation-1.44e-02

Pos logit contributions
hr+0.026
rum+0.024
AAAA+0.024
cest+0.023
oc+0.022
Neg logit contributions
 wrong-0.029
abella-0.028
top-0.028
bench-0.022
 Luke-0.022
Token He
Token ID679
Feature activation-1.30e-03

Pos logit contributions
hr+0.023
cest+0.022
rum+0.022
packs+0.021
AAAA+0.020
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.019
bench-0.015
 However-0.014
Token said
Token ID531
Feature activation+1.43e-03

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
oc+0.002
Neg logit contributions
abella-0.002
 wrong-0.002
top-0.002
bench-0.001
 versa-0.001
Token
Token ID198
Feature activation-1.34e-02

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token
Token ID198
Feature activation-3.25e-02

Pos logit contributions
hr+0.021
rum+0.020
cest+0.020
AAAA+0.019
packs+0.019
Neg logit contributions
abella-0.019
top-0.017
 wrong-0.017
bench-0.014
 versa-0.013
TokenFrom
Token ID4863
Feature activation+1.84e-02

Pos logit contributions
hr+0.044
packs+0.044
rum+0.043
cest+0.043
ries+0.042
Neg logit contributions
abella-0.053
top-0.045
 wrong-0.041
 never-0.037
 weekend-0.035
Token that
Token ID326
Feature activation+6.68e-03

Pos logit contributions
abella+0.023
top+0.021
 wrong+0.019
bench+0.019
 versa+0.016
Neg logit contributions
hr-0.030
iring-0.028
cest-0.028
ment-0.028
AAAA-0.028
Token day
Token ID1110
Feature activation-1.43e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
 versa+0.005
rant+0.005
Neg logit contributions
hr-0.012
rum-0.012
cest-0.012
oc-0.012
AAAA-0.012
Token forward
Token ID2651
Feature activation-1.89e-02

Pos logit contributions
hr+0.023
cest+0.021
rum+0.021
AAAA+0.021
oc+0.020
Neg logit contributions
abella-0.020
 wrong-0.019
top-0.018
bench-0.015
 Luke-0.014
Token,
Token ID11
Feature activation-3.19e-02

Pos logit contributions
hr+0.031
rum+0.030
AAAA+0.029
cest+0.029
EX+0.028
Neg logit contributions
abella-0.025
 wrong-0.023
top-0.022
 Luke-0.017
 weekend-0.017
Token the
Token ID262
Feature activation+5.52e-03

Pos logit contributions
AAAA+0.052
rum+0.051
hr+0.050
EX+0.047
cest+0.047
Neg logit contributions
abella-0.044
top-0.039
 wrong-0.036
 weekend-0.030
 only-0.030
Token little
Token ID1310
Feature activation+1.39e-03

Pos logit contributions
abella+0.010
top+0.010
 wrong+0.010
bench+0.009
 versa+0.008
Neg logit contributions
hr-0.006
cest-0.006
oc-0.005
AAAA-0.005
rum-0.005
Token bird
Token ID6512
Feature activation-3.22e-02

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.002
 versa+0.002
Neg logit contributions
hr-0.001
cest-0.001
rum-0.001
AAAA-0.001
oc-0.001
Token always
Token ID1464
Feature activation-5.36e-03

Pos logit contributions
hr+0.056
rum+0.055
AAAA+0.054
packs+0.052
ment+0.051
Neg logit contributions
abella-0.036
 wrong-0.036
top-0.034
 would-0.029
 never-0.028
Token we
Token ID356
Feature activation+8.87e-03

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.002
rant+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token stumble
Token ID35174
Feature activation+1.30e-03

Pos logit contributions
abella+0.014
top+0.013
 wrong+0.013
bench+0.011
rant+0.010
Neg logit contributions
hr-0.012
rum-0.011
cest-0.011
 exc-0.011
AAAA-0.011
Token,
Token ID11
Feature activation+1.37e-03

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.002
 versa+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token we
Token ID356
Feature activation+2.22e-02

Pos logit contributions
abella+0.002
top+0.002
 wrong+0.002
bench+0.002
rant+0.001
Neg logit contributions
hr-0.002
rum-0.002
cest-0.002
AAAA-0.002
ment-0.002
Token have
Token ID423
Feature activation+1.83e-02

Pos logit contributions
abella+0.036
top+0.032
 wrong+0.030
bench+0.028
rant+0.026
Neg logit contributions
hr-0.029
 exc-0.029
rum-0.029
cest-0.028
oc-0.027
Token friends
Token ID2460
Feature activation-1.22e-02

Pos logit contributions
abella+0.025
top+0.022
 wrong+0.020
bench+0.019
rant+0.019
Neg logit contributions
hr-0.029
cest-0.027
AAAA-0.027
rum-0.027
packs-0.027
Token to
Token ID284
Feature activation+4.09e-02

Pos logit contributions
hr+0.017
rum+0.016
AAAA+0.016
cest+0.016
ment+0.015
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.014
 versa-0.014
Token help
Token ID1037
Feature activation-1.02e-03

Pos logit contributions
abella+0.078
top+0.069
 wrong+0.064
rant+0.062
bench+0.061
Neg logit contributions
hr-0.046
 exc-0.045
rum-0.044
 mis-0.042
cest-0.040
Token us
Token ID514
Feature activation-1.56e-02

Pos logit contributions
hr+0.002
rum+0.002
cest+0.002
AAAA+0.002
ment+0.002
Neg logit contributions
abella-0.001
top-0.001
 wrong-0.001
bench-0.001
 versa-0.001
Token up
Token ID510
Feature activation-1.35e-02

Pos logit contributions
hr+0.023
AAAA+0.022
rum+0.022
cest+0.022
oc+0.021
Neg logit contributions
abella-0.021
 wrong-0.021
top-0.021
bench-0.017
 versa-0.016
Token.
Token ID13
Feature activation+7.73e-03

Pos logit contributions
hr+0.020
rum+0.019
cest+0.019
AAAA+0.019
oc+0.018
Neg logit contributions
abella-0.019
 wrong-0.019
top-0.018
bench-0.015
rant-0.014
Token a
Token ID257
Feature activation+1.59e-02

Pos logit contributions
hr+0.022
rum+0.021
AAAA+0.021
cest+0.021
ment+0.020
Neg logit contributions
abella-0.018
 wrong-0.017
top-0.017
bench-0.013
 Luke-0.013
Token castle
Token ID16669
Feature activation+6.09e-03

Pos logit contributions
abella+0.028
top+0.026
 wrong+0.025
 versa+0.022
rant+0.022
Neg logit contributions
hr-0.019
oc-0.018
cest-0.018
AAAA-0.017
rum-0.016
Token that
Token ID326
Feature activation+4.99e-03

Pos logit contributions
abella+0.010
top+0.009
 wrong+0.009
bench+0.008
 versa+0.008
Neg logit contributions
hr-0.008
rum-0.007
cest-0.007
oc-0.007
packs-0.007
Token was
Token ID373
Feature activation+1.15e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
 versa+0.005
Neg logit contributions
hr-0.008
cest-0.007
rum-0.007
oc-0.007
AAAA-0.007
Token one
Token ID530
Feature activation-1.20e-02

Pos logit contributions
abella+0.017
top+0.016
 wrong+0.015
bench+0.014
 versa+0.013
Neg logit contributions
hr-0.016
cest-0.015
rum-0.015
 exc-0.015
packs-0.014
Token hundred
Token ID3470
Feature activation-4.53e-02

Pos logit contributions
hr+0.016
rum+0.016
AAAA+0.016
cest+0.015
ment+0.014
Neg logit contributions
abella-0.019
 wrong-0.018
top-0.018
bench-0.015
 weekend-0.014
Token inches
Token ID8331
Feature activation+1.49e-02

Pos logit contributions
AAAA+0.066
hr+0.062
rum+0.058
Upon+0.057
 tou+0.056
Neg logit contributions
 wrong-0.065
abella-0.064
top-0.063
bench-0.052
 would-0.051
Token high
Token ID1029
Feature activation-7.04e-04

Pos logit contributions
abella+0.027
 wrong+0.024
top+0.023
 versa+0.021
 Luke+0.021
Neg logit contributions
hr-0.018
cest-0.017
rum-0.017
ment-0.016
oc-0.016
Token.
Token ID13
Feature activation-6.08e-03

Pos logit contributions
hr+0.001
rum+0.001
cest+0.001
AAAA+0.001
ment+0.001
Neg logit contributions
abella-0.001
 wrong-0.001
top-0.001
bench-0.001
 versa-0.001
Token She
Token ID1375
Feature activation+6.09e-03

Pos logit contributions
hr+0.010
rum+0.010
cest+0.009
AAAA+0.009
oc+0.009
Neg logit contributions
abella-0.008
 wrong-0.007
top-0.007
bench-0.006
 versa-0.006
Token said
Token ID531
Feature activation+2.19e-02

Pos logit contributions
abella+0.009
 wrong+0.008
top+0.008
bench+0.007
 versa+0.007
Neg logit contributions
hr-0.009
cest-0.008
rum-0.008
AAAA-0.008
 exc-0.008
Token his
Token ID465
Feature activation-3.38e-02

Pos logit contributions
abella+0.008
 wrong+0.007
top+0.007
bench+0.006
rant+0.006
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
ment-0.005
packs-0.004
Token classmates
Token ID28999
Feature activation-6.92e-03

Pos logit contributions
hr+0.051
rum+0.050
AAAA+0.049
cest+0.046
ment+0.045
Neg logit contributions
 wrong-0.048
abella-0.048
top-0.045
 weekend-0.040
 would-0.038
Token.
Token ID13
Feature activation+2.00e-03

Pos logit contributions
hr+0.010
rum+0.010
AAAA+0.009
cest+0.009
ment+0.009
Neg logit contributions
abella-0.010
 wrong-0.010
top-0.010
bench-0.008
 versa-0.008
Token He
Token ID679
Feature activation-4.34e-03

Pos logit contributions
abella+0.003
 wrong+0.003
top+0.003
bench+0.002
rant+0.002
Neg logit contributions
hr-0.003
rum-0.003
cest-0.003
AAAA-0.003
oc-0.003
Token brought
Token ID3181
Feature activation+1.38e-02

Pos logit contributions
hr+0.007
rum+0.006
cest+0.006
AAAA+0.006
ment+0.006
Neg logit contributions
abella-0.006
 wrong-0.006
top-0.006
bench-0.005
 versa-0.005
Token his
Token ID465
Feature activation-4.93e-02

Pos logit contributions
abella+0.022
 wrong+0.021
top+0.021
bench+0.018
rant+0.017
Neg logit contributions
hr-0.017
rum-0.017
packs-0.016
ment-0.016
cest-0.016
Token toy
Token ID13373
Feature activation-4.23e-02

Pos logit contributions
rum+0.080
hr+0.077
AAAA+0.075
oc+0.072
cest+0.071
Neg logit contributions
 wrong-0.069
abella-0.064
top-0.062
 weekend-0.059
 contest-0.055
Token plane
Token ID6614
Feature activation-2.93e-03

Pos logit contributions
hr+0.071
AAAA+0.068
rum+0.068
cest+0.061
Conf+0.061
Neg logit contributions
 wrong-0.053
top-0.053
abella-0.051
work-0.043
 contest-0.043
Token and
Token ID290
Feature activation-1.71e-02

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.003
Neg logit contributions
abella-0.005
 wrong-0.004
top-0.004
bench-0.004
 versa-0.004
Token could
Token ID714
Feature activation-6.72e-03

Pos logit contributions
hr+0.026
rum+0.026
AAAA+0.025
cest+0.024
oc+0.024
Neg logit contributions
abella-0.025
top-0.022
 wrong-0.022
bench-0.018
 Luke-0.018
Token not
Token ID407
Feature activation+2.15e-03

Pos logit contributions
hr+0.010
rum+0.010
cest+0.010
AAAA+0.009
ment+0.009
Neg logit contributions
abella-0.010
 wrong-0.009
top-0.009
bench-0.007
 versa-0.007
Token.
Token ID13
Feature activation+1.81e-02

Pos logit contributions
hr+0.000
rum+0.000
cest+0.000
AAAA+0.000
ment+0.000
Neg logit contributions
abella-0.000
 wrong-0.000
top-0.000
bench-0.000
 versa-0.000
Token One
Token ID1881
Feature activation-1.70e-02

Pos logit contributions
abella+0.023
 wrong+0.021
top+0.019
 versa+0.017
bench+0.016
Neg logit contributions
oc-0.030
rum-0.029
hr-0.029
 exc-0.029
packs-0.028
Token day
Token ID1110
Feature activation-1.02e-02

Pos logit contributions
hr+0.039
rum+0.038
AAAA+0.037
cest+0.037
ment+0.036
Neg logit contributions
abella-0.011
 wrong-0.010
top-0.010
 weekend-0.007
 morning-0.005
Token,
Token ID11
Feature activation-2.28e-02

Pos logit contributions
hr+0.016
rum+0.015
cest+0.015
AAAA+0.015
ment+0.015
Neg logit contributions
abella-0.014
 wrong-0.013
top-0.013
bench-0.010
 versa-0.010
Token while
Token ID981
Feature activation-1.24e-02

Pos logit contributions
hr+0.044
rum+0.044
cest+0.042
AAAA+0.040
packs+0.040
Neg logit contributions
abella-0.023
 wrong-0.023
top-0.022
 Jack-0.017
 Luke-0.016
Token he
Token ID339
Feature activation-5.07e-02

Pos logit contributions
hr+0.020
rum+0.020
cest+0.019
AAAA+0.019
oc+0.018
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
bench-0.011
 Luke-0.011
Token was
Token ID373
Feature activation+1.64e-02

Pos logit contributions
rum+0.114
cest+0.107
 fortunate+0.107
hr+0.105
ment+0.105
Neg logit contributions
abella-0.035
 would-0.029
 made-0.028
top-0.027
 only-0.025
Token playing
Token ID2712
Feature activation-1.42e-02

Pos logit contributions
abella+0.029
top+0.027
 wrong+0.025
bench+0.024
rant+0.023
Neg logit contributions
hr-0.019
AAAA-0.018
 exc-0.017
 mis-0.017
 mut-0.017
Token,
Token ID11
Feature activation-1.31e-02

Pos logit contributions
hr+0.023
rum+0.022
cest+0.022
packs+0.022
AAAA+0.021
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.013
 contest-0.013
Token he
Token ID339
Feature activation+4.51e-03

Pos logit contributions
hr+0.022
rum+0.022
cest+0.022
packs+0.021
oc+0.021
Neg logit contributions
abella-0.016
 wrong-0.015
top-0.015
 Luke-0.011
bench-0.011
Token found
Token ID1043
Feature activation+1.19e-02

Pos logit contributions
abella+0.007
top+0.006
 wrong+0.006
bench+0.005
rant+0.005
Neg logit contributions
hr-0.006
cest-0.006
AAAA-0.006
rum-0.006
oc-0.006
Token lesson
Token ID11483
Feature activation+1.30e-02

Pos logit contributions
hr+0.004
rum+0.004
cest+0.004
AAAA+0.004
ment+0.004
Neg logit contributions
abella-0.004
 wrong-0.004
top-0.004
bench-0.003
 versa-0.003
Token.
Token ID13
Feature activation+7.24e-03

Pos logit contributions
abella+0.017
top+0.015
 wrong+0.014
bench+0.013
rant+0.012
Neg logit contributions
hr-0.021
rum-0.021
ment-0.020
cest-0.020
oc-0.019
Token They
Token ID1119
Feature activation+4.48e-03

Pos logit contributions
abella+0.011
 wrong+0.010
top+0.010
bench+0.008
rant+0.008
Neg logit contributions
hr-0.010
rum-0.010
AAAA-0.010
cest-0.010
ment-0.010
Token should
Token ID815
Feature activation+4.70e-02

Pos logit contributions
abella+0.006
 wrong+0.005
top+0.005
bench+0.004
rant+0.004
Neg logit contributions
hr-0.007
cest-0.007
rum-0.007
AAAA-0.007
oc-0.007
Token not
Token ID407
Feature activation+1.93e-02

Pos logit contributions
abella+0.065
top+0.057
 wrong+0.049
rant+0.047
 projector+0.047
Neg logit contributions
 exc-0.078
rum-0.073
 tou-0.073
hr-0.069
cest-0.069
Token feed
Token ID3745
Feature activation-4.90e-02

Pos logit contributions
abella+0.036
top+0.031
 wrong+0.030
bench+0.028
rant+0.027
Neg logit contributions
hr-0.023
 exc-0.022
 tou-0.022
rum-0.021
cest-0.021
Token the
Token ID262
Feature activation-2.36e-02

Pos logit contributions
hr+0.080
rum+0.080
cest+0.077
AAAA+0.077
yt+0.075
Neg logit contributions
 wrong-0.069
top-0.058
abella-0.054
 only-0.047
 would-0.047
Token animals
Token ID4695
Feature activation-3.44e-02

Pos logit contributions
rum+0.031
hr+0.029
packs+0.028
ment+0.028
cest+0.027
Neg logit contributions
 wrong-0.039
abella-0.037
top-0.036
 contest-0.031
 weekend-0.030
Token in
Token ID287
Feature activation-3.20e-02

Pos logit contributions
hr+0.052
AAAA+0.052
rum+0.051
cest+0.048
Upon+0.046
Neg logit contributions
 wrong-0.050
top-0.046
abella-0.045
 contest-0.037
 would-0.036
Token the
Token ID262
Feature activation+2.90e-04

Pos logit contributions
hr+0.050
rum+0.050
AAAA+0.047
cest+0.047
Upon+0.047
Neg logit contributions
 wrong-0.043
top-0.039
abella-0.037
 never-0.032
 only-0.031
Token pond
Token ID16723
Feature activation-8.61e-03

Pos logit contributions
abella+0.000
top+0.000
 wrong+0.000
bench+0.000
 versa+0.000
Neg logit contributions
hr-0.000
rum-0.000
cest-0.000
AAAA-0.000
oc-0.000
Token pick
Token ID2298
Feature activation+3.36e-03

Pos logit contributions
hr+0.014
rum+0.013
cest+0.013
AAAA+0.013
ment+0.013
Neg logit contributions
abella-0.015
 wrong-0.014
top-0.014
bench-0.011
 versa-0.011
Token up
Token ID510
Feature activation-3.88e-02

Pos logit contributions
abella+0.005
top+0.004
 wrong+0.004
bench+0.003
 versa+0.003
Neg logit contributions
hr-0.005
rum-0.005
cest-0.005
AAAA-0.005
oc-0.005
Token the
Token ID262
Feature activation-1.74e-02

Pos logit contributions
hr+0.070
cest+0.068
rum+0.068
AAAA+0.068
EX+0.064
Neg logit contributions
 wrong-0.045
top-0.038
abella-0.038
 Jack-0.031
 only-0.031
Token herb
Token ID21047
Feature activation-1.04e-02

Pos logit contributions
hr+0.024
rum+0.023
cest+0.022
ment+0.022
packs+0.022
Neg logit contributions
 wrong-0.027
abella-0.027
top-0.026
 weekend-0.021
rant-0.021
Token with
Token ID351
Feature activation-1.36e-02

Pos logit contributions
hr+0.014
rum+0.014
AAAA+0.013
cest+0.013
ment+0.012
Neg logit contributions
abella-0.016
 wrong-0.016
top-0.016
bench-0.013
rant-0.012
Token its
Token ID663
Feature activation-4.71e-02

Pos logit contributions
hr+0.015
cest+0.014
rum+0.014
AAAA+0.013
oc+0.013
Neg logit contributions
abella-0.025
 wrong-0.025
top-0.024
 Luke-0.020
bench-0.020
Token be
Token ID307
Feature activation-1.46e-02

Pos logit contributions
hr+0.082
AAAA+0.074
rum+0.074
cest+0.073
ment+0.071
Neg logit contributions
 wrong-0.061
top-0.052
 only-0.051
 never-0.047
 would-0.046
Tokenak
Token ID461
Feature activation-1.54e-02

Pos logit contributions
hr+0.020
rum+0.019
cest+0.019
AAAA+0.019
oc+0.018
Neg logit contributions
abella-0.022
 wrong-0.022
top-0.021
bench-0.018
rant-0.017
Token to
Token ID284
Feature activation-1.24e-02

Pos logit contributions
hr+0.022
AAAA+0.021
rum+0.021
cest+0.021
oc+0.020
Neg logit contributions
abella-0.022
 wrong-0.022
top-0.021
bench-0.017
rant-0.017
Token eat
Token ID4483
Feature activation-1.36e-02

Pos logit contributions
hr+0.018
rum+0.017
cest+0.017
AAAA+0.017
packs+0.016
Neg logit contributions
abella-0.018
 wrong-0.018
top-0.017
bench-0.014
rant-0.014
Token it
Token ID340
Feature activation-2.36e-02

Pos logit contributions
hr+0.023
rum+0.022
cest+0.022
AAAA+0.022
packs+0.021
Neg logit contributions
abella-0.017
 wrong-0.016
top-0.016
bench-0.012
rant-0.012