TOP ACTIVATIONS
MAX = 0.053

be scared , but delaying things won 't make them less
green bush with lots of berries on it .
time there was a little av ac ado . It was
west , where the sun went down . They wanted to
such as massage and what herbs to take for cold s
girl who didn 't like av oc ados . She always
was looking for nuts and berries to eat . Suddenly he
a lot of bright blue berries . He was very excited
, cats , birds , ham sters , and more .
, some corn , some carrots , some apples and some
wanted to learn more about veterin arians . She decided to
frozen bags of peas and carrots . He sighed and decided
Anna loves her bird , Sunny . Sunny is yellow and
a lot of dirt and trash and cans . He sees
of juicy , tasty looking berries . Suddenly he
a warm bed and tasty carrots every day . Peter was
sees something amazing . The seeds that he threw away did
not have any bread or seeds . They only had some
cones , ac orns , nuts , and seeds . They
a tall gir affe named Gerry . Kitty asked Gerry if

MIN ACTIVATIONS
MIN = -0.068

the edge of the tr amp oline and took a big
back to the same tr amp oline again . <|endoftext|> Once
playing outside on the tr amp oline . He was excited
and down on the tr amp oline . "
to jump on the tr amp oline , but Lily wanted
old lady with a mis chie vous tw inkle in her
and jumped on the tr amp oline . Lily was so
went on with his mis chie vous ways , causing problems
bounce on the big tr amp oline in their backyard .
to bounce on the tr amp oline , but Mia was
Jack jumped on the tr amp oline and shouted , "
and higher on the tr amp oline .
, ” said the 3 year old .
fun bouncing on the tr amp oline , but soon it
She falls off the tr amp oline . She lands on
they jumped on the tr amp oline and ran races in
stepped carefully onto the tr amp oline . As soon as
at him with a mis chie vous grin . She said
They climbed off the tr amp oline and hugged each other
Suddenly , the tr amp oline collapsed beneath her with

INTERVAL 0.023 - 0.053
CONTAINS 1.484%

pick a few of the herbs and put them into her
to find one that liked le mons and liked them too
kind girl named Lily . She loved to play outside with
them out . They were vegetables ! They were carrots ,
imagine next . <|endoftext|> Tom and Lily were twins who liked

INTERVAL -0.008 - 0.023
CONTAINS 77.274%

was a little , ugly bear and had only a few
she searched for but never found the same leaves again .
very small . The girl was very thoughtful and she wanted
. Lily smiled and took the medicine . She knew her
t eddy bears and the tickets and said , " Let

INTERVAL -0.038 - -0.008
CONTAINS 21.217%

. As they journey ed through the forest , Grand ma
come back soon , I dream of flying through the sky
OK , m ister ," Tom said . " But we
Sure , let 's go look at them ." They went
the trees and houses , and even the clouds in the

INTERVAL -0.068 - -0.038
CONTAINS 0.025%

big hill . The big round ball could roll very easily
It will spin on the floor and make a nice sound
black board ? It 's very messy ." Her teacher said
on paper and on the wall . Sometimes they painted on
She cut it into the shape of the shirt , then

Token be
Token ID307
Feature activation+1.12e-03

Pos logit contributions
 Honor+0.012
 client+0.012
eddy+0.012
ich+0.012
athy+0.012
Neg logit contributions
 contain-0.012
 Jane-0.010
aney-0.009
 yes-0.009
Jane-0.009
Token scared
Token ID12008
Feature activation-1.60e-05

Pos logit contributions
 regret+0.002
 vet+0.001
inging+0.001
 harm+0.001
 Owl+0.001
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.002
 Jane-0.002
 Kate-0.002
Token,
Token ID11
Feature activation-6.53e-03

Pos logit contributions
aney+0.000
 contain+0.000
 Jane+0.000
 Kate+0.000
Jane+0.000
Neg logit contributions
 vet-0.000
 tou-0.000
 regret-0.000
 client-0.000
 veterinarian-0.000
Token but
Token ID475
Feature activation+3.46e-02

Pos logit contributions
aney+0.010
 contain+0.008
Jane+0.008
 yes+0.008
 Kate+0.007
Neg logit contributions
 regret-0.012
 vet-0.012
 harm-0.011
 veterinarian-0.011
inging-0.011
Token delaying
Token ID35221
Feature activation+2.11e-02

Pos logit contributions
inging+0.058
athy+0.054
 veterinarian+0.054
 client+0.054
 vet+0.053
Neg logit contributions
 contain-0.051
 Jane-0.049
 yes-0.048
aney-0.047
 Jill-0.046
Token things
Token ID1243
Feature activation+5.28e-02

Pos logit contributions
 vet+0.032
 tou+0.031
 veterinarian+0.030
 regret+0.030
inging+0.029
Neg logit contributions
aney-0.035
 contain-0.032
 Kate-0.031
 Jane-0.031
 yes-0.030
Token won
Token ID1839
Feature activation+1.68e-02

Pos logit contributions
 vet+0.089
 client+0.084
 veterinarian+0.084
athy+0.080
 hen+0.078
Neg logit contributions
 contain-0.085
 yes-0.077
aney-0.073
 Jane-0.065
 Kate-0.065
Token't
Token ID470
Feature activation+1.35e-02

Pos logit contributions
 regret+0.026
 vet+0.025
 hunter+0.024
 client+0.024
 tou+0.024
Neg logit contributions
aney-0.028
 contain-0.026
 Kate-0.024
ill-0.024
Jane-0.024
Token make
Token ID787
Feature activation+1.21e-02

Pos logit contributions
athy+0.022
 Honor+0.021
inging+0.020
 client+0.020
 vet+0.020
Neg logit contributions
 contain-0.024
aney-0.021
 Jane-0.018
 yes-0.018
Jane-0.017
Token them
Token ID606
Feature activation-2.94e-03

Pos logit contributions
 vet+0.018
 Owl+0.017
 veterinarian+0.017
 hunter+0.017
 regret+0.017
Neg logit contributions
aney-0.022
 Kate-0.019
 Jane-0.019
 contain-0.019
Jane-0.018
Token less
Token ID1342
Feature activation+2.26e-03

Pos logit contributions
aney+0.006
 contain+0.006
 yes+0.005
Jane+0.005
ill+0.005
Neg logit contributions
 vet-0.004
 veterinarian-0.004
 regret-0.004
athy-0.004
 Owl-0.004
Token green
Token ID4077
Feature activation+4.20e-03

Pos logit contributions
aney+0.004
 contain+0.004
Jane+0.004
 yes+0.003
 Kate+0.003
Neg logit contributions
 regret-0.003
 vet-0.003
 harm-0.003
 Owl-0.003
 lose-0.003
Token bush
Token ID24484
Feature activation+1.56e-02

Pos logit contributions
 regret+0.005
 harm+0.005
 lose+0.005
inging+0.005
 vet+0.005
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.008
 yes-0.007
 Jane-0.007
Token with
Token ID351
Feature activation-7.47e-03

Pos logit contributions
 regret+0.019
 vet+0.018
 veterinarian+0.017
 Owl+0.017
 harm+0.017
Neg logit contributions
aney-0.035
 contain-0.031
Jane-0.029
 Kate-0.028
 yes-0.028
Token lots
Token ID6041
Feature activation-6.15e-04

Pos logit contributions
aney+0.013
 contain+0.011
days+0.010
Jane+0.010
ill+0.009
Neg logit contributions
 regret-0.013
 vet-0.013
 wishes-0.012
 trouble-0.012
 Nut-0.012
Token of
Token ID286
Feature activation+2.37e-03

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 Jane+0.001
 yes+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 lose-0.001
 harm-0.001
 Owl-0.001
Token berries
Token ID36322
Feature activation+5.27e-02

Pos logit contributions
 regret+0.003
 vet+0.003
 harm+0.003
 lose+0.003
 Owl+0.003
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
 yes-0.004
 Kate-0.004
Token on
Token ID319
Feature activation+1.58e-03

Pos logit contributions
 regret+0.083
 vet+0.080
 client+0.076
 veterinarian+0.075
 harm+0.074
Neg logit contributions
aney-0.097
 contain-0.087
Jane-0.079
 yes-0.076
 Jane-0.076
Token it
Token ID340
Feature activation+8.50e-03

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 lose+0.002
 Owl+0.002
Neg logit contributions
aney-0.003
 contain-0.003
Jane-0.003
 yes-0.003
 Kate-0.003
Token.
Token ID13
Feature activation-9.06e-03

Pos logit contributions
 regret+0.010
 vet+0.009
 harm+0.009
 Owl+0.009
 lose+0.009
Neg logit contributions
aney-0.019
 contain-0.017
Jane-0.016
 Kate-0.015
 yes-0.015
Token
Token ID220
Feature activation+8.30e-03

Pos logit contributions
aney+0.020
 contain+0.019
Jane+0.017
 yes+0.016
days+0.016
Neg logit contributions
 regret-0.011
 vet-0.011
 veterinarian-0.010
 Owl-0.010
 Nut-0.009
Token
Token ID198
Feature activation-4.27e-03

Pos logit contributions
 regret+0.010
 vet+0.010
 harm+0.009
 lose+0.009
 Owl+0.009
Neg logit contributions
aney-0.018
 contain-0.016
Jane-0.015
 Kate-0.015
 yes-0.015
Token time
Token ID640
Feature activation-1.06e-02

Pos logit contributions
aney+0.042
 contain+0.040
Jane+0.035
year+0.035
 Kate+0.035
Neg logit contributions
 regret-0.023
 vet-0.020
 harm-0.019
 veterinarian-0.019
inging-0.019
Token there
Token ID612
Feature activation+5.87e-03

Pos logit contributions
aney+0.028
 contain+0.027
year+0.024
shirt+0.024
Jane+0.024
Neg logit contributions
 regret-0.008
 Owl-0.007
 Nut-0.007
 lose-0.006
inging-0.006
Token was
Token ID373
Feature activation-4.24e-03

Pos logit contributions
 regret+0.007
 vet+0.006
 harm+0.006
 lose+0.006
 Owl+0.006
Neg logit contributions
aney-0.013
 contain-0.012
Jane-0.011
 yes-0.011
ill-0.010
Token a
Token ID257
Feature activation+7.42e-03

Pos logit contributions
aney+0.009
 contain+0.009
uits+0.007
days+0.007
ill+0.007
Neg logit contributions
 vet-0.005
 regret-0.005
 trouble-0.005
 Nut-0.005
 veterinarian-0.005
Token little
Token ID1310
Feature activation+1.24e-02

Pos logit contributions
 regret+0.011
 harm+0.011
inging+0.010
 lose+0.010
 vet+0.010
Neg logit contributions
aney-0.013
 contain-0.012
Jane-0.011
 yes-0.011
 Jane-0.011
Token av
Token ID1196
Feature activation+5.26e-02

Pos logit contributions
 regret+0.016
 harm+0.015
inging+0.014
check+0.014
 lose+0.014
Neg logit contributions
aney-0.026
 contain-0.023
 Jane-0.022
Jane-0.022
 yes-0.022
Tokenac
Token ID330
Feature activation-1.43e-05

Pos logit contributions
 regret+0.072
 damage+0.064
 vet+0.064
 lose+0.064
 unnecessary+0.061
Neg logit contributions
aney-0.101
 contain-0.086
mos-0.086
Jane-0.084
year-0.083
Tokenado
Token ID4533
Feature activation+1.72e-03

Pos logit contributions
aney+0.000
 contain+0.000
Jane+0.000
 yes+0.000
 Kate+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 Owl-0.000
 veterinarian-0.000
Token.
Token ID13
Feature activation-1.18e-03

Pos logit contributions
 regret+0.002
 vet+0.002
 Owl+0.002
 harm+0.002
 veterinarian+0.002
Neg logit contributions
aney-0.004
 contain-0.003
Jane-0.003
 yes-0.003
 Kate-0.003
Token It
Token ID632
Feature activation-1.71e-03

Pos logit contributions
aney+0.003
 contain+0.002
 yes+0.002
Jane+0.002
uits+0.002
Neg logit contributions
 regret-0.001
 Owl-0.001
 Nut-0.001
 vet-0.001
 veterinarian-0.001
Token was
Token ID373
Feature activation-1.42e-02

Pos logit contributions
aney+0.004
 contain+0.004
uits+0.004
ill+0.004
Jane+0.004
Neg logit contributions
 regret-0.002
 vet-0.002
 Owl-0.001
 veterinarian-0.001
 harm-0.001
Token west
Token ID7421
Feature activation-2.77e-03

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
 Jane+0.003
 Kate+0.003
Neg logit contributions
 regret-0.002
 harm-0.002
 lose-0.002
 Owl-0.002
inging-0.002
Token,
Token ID11
Feature activation-1.75e-03

Pos logit contributions
aney+0.005
 contain+0.005
Jane+0.005
ill+0.004
 Jane+0.004
Neg logit contributions
 regret-0.004
 vet-0.004
 harm-0.004
 Owl-0.004
 veterinarian-0.004
Token where
Token ID810
Feature activation-1.95e-02

Pos logit contributions
aney+0.004
 contain+0.003
mos+0.003
days+0.003
Jane+0.003
Neg logit contributions
 regret-0.002
 vet-0.002
 lose-0.002
 damage-0.002
 harm-0.002
Token the
Token ID262
Feature activation+4.52e-03

Pos logit contributions
aney+0.040
 contain+0.035
Jane+0.031
ill+0.030
shirt+0.030
Neg logit contributions
 regret-0.029
 vet-0.026
 trouble-0.026
 harm-0.025
 damage-0.025
Token sun
Token ID4252
Feature activation+9.96e-03

Pos logit contributions
 regret+0.006
 vet+0.006
 harm+0.006
inging+0.006
 lose+0.006
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.008
 yes-0.007
 Jane-0.007
Token went
Token ID1816
Feature activation+5.22e-02

Pos logit contributions
 regret+0.017
 vet+0.016
 unnecessary+0.016
 harm+0.015
 veterinarian+0.015
Neg logit contributions
aney-0.017
 contain-0.015
Jane-0.014
 Jane-0.014
 Kate-0.014
Token down
Token ID866
Feature activation+1.06e-02

Pos logit contributions
 regret+0.092
 client+0.086
 vet+0.084
 wishes+0.083
 harm+0.082
Neg logit contributions
aney-0.078
 contain-0.076
 yes-0.068
 Jane-0.068
Jane-0.066
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
 vet+0.009
 regret+0.009
 harm+0.008
 Owl+0.008
 trouble+0.008
Neg logit contributions
aney-0.028
 contain-0.024
Jane-0.023
shirt-0.022
 Kate-0.022
Token They
Token ID1119
Feature activation-6.56e-03

Pos logit contributions
aney+0.026
 contain+0.025
days+0.023
year+0.023
 yes+0.023
Neg logit contributions
 vet-0.011
 regret-0.010
 veterinarian-0.009
inging-0.009
 Nut-0.009
Token wanted
Token ID2227
Feature activation-1.35e-03

Pos logit contributions
aney+0.014
 contain+0.013
rd+0.012
book+0.012
shirt+0.012
Neg logit contributions
 regret-0.008
 should-0.007
 vet-0.006
 Nut-0.006
 lose-0.006
Token to
Token ID284
Feature activation+3.75e-04

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
year+0.003
ill+0.003
Neg logit contributions
 regret-0.001
 vet-0.001
 harm-0.001
 damage-0.001
 trouble-0.001
Token such
Token ID884
Feature activation+2.19e-02

Pos logit contributions
 regret+0.029
 vet+0.029
 veterinarian+0.027
 tou+0.027
 Owl+0.027
Neg logit contributions
aney-0.036
 contain-0.032
Jane-0.029
 yes-0.029
 Kate-0.029
Token as
Token ID355
Feature activation+1.56e-02

Pos logit contributions
 regret+0.032
 vet+0.031
 lose+0.030
inging+0.030
 veterinarian+0.030
Neg logit contributions
aney-0.040
 contain-0.036
Jane-0.033
 yes-0.033
 Jane-0.032
Token massage
Token ID26900
Feature activation+2.55e-02

Pos logit contributions
 regret+0.024
 vet+0.024
inging+0.023
 veterinarian+0.023
 lose+0.023
Neg logit contributions
aney-0.026
 contain-0.023
 yes-0.022
 Jane-0.022
Jane-0.022
Token and
Token ID290
Feature activation+1.27e-02

Pos logit contributions
 regret+0.040
 vet+0.038
inging+0.036
 tou+0.036
 veterinarian+0.035
Neg logit contributions
aney-0.044
 contain-0.041
 yes-0.038
Jane-0.037
 Kate-0.037
Token what
Token ID644
Feature activation+1.91e-02

Pos logit contributions
inging+0.017
 tou+0.017
 veterinarian+0.016
athy+0.016
 vet+0.016
Neg logit contributions
 contain-0.022
aney-0.020
 yes-0.020
 Kate-0.019
 Jane-0.019
Token herbs
Token ID25411
Feature activation+5.22e-02

Pos logit contributions
inging+0.030
 regret+0.030
 vet+0.029
 veterinarian+0.028
 lose+0.028
Neg logit contributions
aney-0.030
 contain-0.028
 yes-0.027
 Jane-0.026
 Kate-0.026
Token to
Token ID284
Feature activation+7.86e-03

Pos logit contributions
 vet+0.073
 regret+0.072
 veterinarian+0.067
 Owl+0.067
 client+0.067
Neg logit contributions
aney-0.101
 contain-0.095
 yes-0.085
Jane-0.083
 Jane-0.082
Token take
Token ID1011
Feature activation+2.27e-02

Pos logit contributions
 vet+0.009
inging+0.009
 regret+0.009
 client+0.009
 Owl+0.009
Neg logit contributions
 contain-0.016
aney-0.015
 yes-0.014
 Jane-0.013
Jane-0.013
Token for
Token ID329
Feature activation+1.36e-02

Pos logit contributions
 regret+0.033
 vet+0.033
inging+0.031
 lose+0.031
 Owl+0.031
Neg logit contributions
aney-0.042
 contain-0.037
Jane-0.035
 yes-0.034
 Jane-0.034
Token cold
Token ID4692
Feature activation+1.03e-03

Pos logit contributions
 regret+0.020
 vet+0.019
 harm+0.018
 Owl+0.018
 lose+0.018
Neg logit contributions
aney-0.026
 contain-0.023
Jane-0.022
 yes-0.021
 Kate-0.021
Tokens
Token ID82
Feature activation+7.59e-03

Pos logit contributions
 regret+0.002
 vet+0.001
inging+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.002
 Jane-0.002
 Kate-0.002
Token girl
Token ID2576
Feature activation+1.18e-02

Pos logit contributions
 regret+0.017
 harm+0.016
inging+0.016
check+0.016
 lose+0.015
Neg logit contributions
aney-0.028
 contain-0.024
Jane-0.024
 Jane-0.024
 yes-0.023
Token who
Token ID508
Feature activation+2.12e-02

Pos logit contributions
 regret+0.021
 vet+0.020
 harm+0.020
 lose+0.020
inging+0.019
Neg logit contributions
aney-0.019
 contain-0.016
Jane-0.015
 Kate-0.014
 Jane-0.014
Token didn
Token ID1422
Feature activation+1.34e-02

Pos logit contributions
 regret+0.037
 vet+0.037
 Owl+0.035
 harm+0.035
inging+0.035
Neg logit contributions
aney-0.034
 contain-0.029
Jane-0.028
 Kate-0.026
 Jane-0.026
Token't
Token ID470
Feature activation-2.43e-03

Pos logit contributions
 regret+0.011
 vet+0.011
 veterinarian+0.010
 harm+0.010
 lose+0.010
Neg logit contributions
aney-0.035
 contain-0.031
Jane-0.030
 yes-0.029
 Kate-0.029
Token like
Token ID588
Feature activation-1.77e-03

Pos logit contributions
aney+0.006
Jane+0.005
 yes+0.005
 contain+0.005
uits+0.005
Neg logit contributions
 regret-0.003
 lose-0.002
 harm-0.002
 vet-0.002
 damage-0.002
Token av
Token ID1196
Feature activation+5.16e-02

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
 yes+0.003
 Kate+0.003
Neg logit contributions
 regret-0.003
 vet-0.002
 harm-0.002
 Owl-0.002
 lose-0.002
Tokenoc
Token ID420
Feature activation+1.82e-02

Pos logit contributions
 regret+0.064
 vet+0.063
 veterinarian+0.056
 gut+0.056
 damage+0.053
Neg logit contributions
aney-0.102
 contain-0.088
mos-0.084
 cubes-0.084
ill-0.084
Tokenados
Token ID22484
Feature activation+2.96e-02

Pos logit contributions
 regret+0.033
 vet+0.030
 lose+0.029
 parent+0.028
 damage+0.028
Neg logit contributions
aney-0.029
 contain-0.023
 Jane-0.023
 yes-0.023
Jane-0.023
Token.
Token ID13
Feature activation+1.37e-03

Pos logit contributions
 regret+0.043
 vet+0.041
 harm+0.039
inging+0.039
 Owl+0.038
Neg logit contributions
aney-0.057
 contain-0.052
Jane-0.047
 yes-0.046
 Kate-0.045
Token She
Token ID1375
Feature activation-8.92e-05

Pos logit contributions
 regret+0.001
 vet+0.001
 veterinarian+0.001
 Owl+0.001
 Nut+0.001
Neg logit contributions
aney-0.004
 contain-0.004
 yes-0.003
Jane-0.003
year-0.003
Token always
Token ID1464
Feature activation-2.57e-03

Pos logit contributions
aney+0.000
 contain+0.000
ill+0.000
uits+0.000
Jane+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 wishes-0.000
 lose-0.000
 veterinarian-0.000
Token was
Token ID373
Feature activation+1.41e-02

Pos logit contributions
 regret+0.012
 vet+0.011
 lose+0.011
 harm+0.011
 veterinarian+0.011
Neg logit contributions
aney-0.018
 contain-0.016
Jane-0.014
 yes-0.014
 Kate-0.014
Token looking
Token ID2045
Feature activation+2.06e-02

Pos logit contributions
 regret+0.020
 vet+0.019
 veterinarian+0.019
 Owl+0.018
 harm+0.018
Neg logit contributions
aney-0.029
 contain-0.025
Jane-0.024
ill-0.023
 Kate-0.023
Token for
Token ID329
Feature activation+1.68e-02

Pos logit contributions
 regret+0.030
 vet+0.029
 harm+0.028
 Owl+0.027
 lose+0.027
Neg logit contributions
aney-0.040
 contain-0.036
Jane-0.033
 yes-0.032
 Kate-0.032
Token nuts
Token ID14380
Feature activation+4.12e-02

Pos logit contributions
 regret+0.030
 vet+0.029
 harm+0.027
 veterinarian+0.027
 Owl+0.027
Neg logit contributions
aney-0.030
 contain-0.026
Jane-0.023
ill-0.022
 Kate-0.022
Token and
Token ID290
Feature activation+1.18e-02

Pos logit contributions
 regret+0.060
 vet+0.057
 harm+0.055
 veterinarian+0.054
 Owl+0.054
Neg logit contributions
aney-0.081
 contain-0.072
Jane-0.067
 yes-0.065
ill-0.063
Token berries
Token ID36322
Feature activation+5.10e-02

Pos logit contributions
 regret+0.018
 vet+0.017
 Owl+0.016
 harm+0.016
 lose+0.016
Neg logit contributions
aney-0.023
 contain-0.020
Jane-0.019
 Kate-0.018
ill-0.018
Token to
Token ID284
Feature activation+5.87e-03

Pos logit contributions
 regret+0.073
 vet+0.069
 harm+0.067
 veterinarian+0.065
 Owl+0.065
Neg logit contributions
aney-0.101
 contain-0.091
Jane-0.083
 yes-0.081
ill-0.080
Token eat
Token ID4483
Feature activation+2.37e-02

Pos logit contributions
 regret+0.008
 vet+0.007
 harm+0.007
 Owl+0.007
inging+0.007
Neg logit contributions
aney-0.012
 contain-0.011
Jane-0.010
 yes-0.010
 Jane-0.010
Token.
Token ID13
Feature activation+6.70e-03

Pos logit contributions
 regret+0.024
 vet+0.023
 harm+0.023
 Owl+0.022
 veterinarian+0.022
Neg logit contributions
aney-0.057
 contain-0.051
Jane-0.048
 Kate-0.047
ill-0.046
Token Suddenly
Token ID24975
Feature activation+1.61e-02

Pos logit contributions
 regret+0.009
 vet+0.009
 harm+0.009
 Owl+0.008
 veterinarian+0.008
Neg logit contributions
aney-0.014
 contain-0.012
Jane-0.011
 yes-0.011
 Kate-0.011
Token he
Token ID339
Feature activation-2.83e-04

Pos logit contributions
 regret+0.024
 vet+0.022
 harm+0.022
inging+0.021
 lose+0.021
Neg logit contributions
aney-0.030
 contain-0.027
Jane-0.026
 yes-0.026
 Jane-0.025
Token a
Token ID257
Feature activation+9.79e-03

Pos logit contributions
aney+0.014
 contain+0.013
days+0.012
Jane+0.011
ill+0.011
Neg logit contributions
 regret-0.014
 vet-0.014
 trouble-0.013
 veterinarian-0.013
 Owl-0.013
Token lot
Token ID1256
Feature activation-4.13e-03

Pos logit contributions
 regret+0.014
 harm+0.014
check+0.013
 vet+0.013
inging+0.013
Neg logit contributions
aney-0.018
 contain-0.016
Jane-0.015
 yes-0.015
 Jane-0.014
Token of
Token ID286
Feature activation-3.09e-04

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.006
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.006
 veterinarian-0.006
 Owl-0.006
Token bright
Token ID6016
Feature activation-2.04e-03

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 yes+0.000
 Kate+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 Owl-0.000
 lose-0.000
Token blue
Token ID4171
Feature activation-2.14e-03

Pos logit contributions
aney+0.004
 contain+0.003
Jane+0.003
 Jane+0.003
 Kate+0.003
Neg logit contributions
 regret-0.003
 vet-0.003
 lose-0.003
 harm-0.003
 veterinarian-0.003
Token berries
Token ID36322
Feature activation+5.07e-02

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
 yes+0.002
 Kate+0.002
Neg logit contributions
 regret-0.004
 vet-0.004
 harm-0.004
 Owl-0.004
 lose-0.004
Token.
Token ID13
Feature activation-6.89e-03

Pos logit contributions
 regret+0.077
 vet+0.074
 veterinarian+0.070
 harm+0.069
 client+0.069
Neg logit contributions
aney-0.096
 contain-0.087
Jane-0.078
 yes-0.077
 Jane-0.076
Token He
Token ID679
Feature activation-6.85e-03

Pos logit contributions
aney+0.015
 contain+0.014
Jane+0.012
 yes+0.012
days+0.012
Neg logit contributions
 regret-0.009
 vet-0.008
 Owl-0.008
 veterinarian-0.008
 Nut-0.008
Token was
Token ID373
Feature activation-5.94e-03

Pos logit contributions
aney+0.015
 contain+0.014
days+0.013
ill+0.012
uits+0.012
Neg logit contributions
 regret-0.008
 vet-0.008
 owners-0.007
 veterinarian-0.007
 Owl-0.007
Token very
Token ID845
Feature activation-5.16e-03

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
ill+0.007
days+0.007
Neg logit contributions
 regret-0.011
 vet-0.011
 Owl-0.010
 veterinarian-0.010
 harm-0.010
Token excited
Token ID6568
Feature activation+9.52e-04

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.006
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.010
 vet-0.010
 harm-0.009
 lose-0.009
 Owl-0.009
Token,
Token ID11
Feature activation-7.24e-03

Pos logit contributions
 regret+0.021
 vet+0.021
 harm+0.020
check+0.019
 unnecessary+0.019
Neg logit contributions
aney-0.026
 contain-0.023
Jane-0.022
 yes-0.021
 Jane-0.021
Token cats
Token ID11875
Feature activation+1.60e-02

Pos logit contributions
aney+0.016
 contain+0.014
Jane+0.014
ill+0.013
 Kate+0.013
Neg logit contributions
 regret-0.008
 vet-0.008
 veterinarian-0.008
 Owl-0.008
 Nut-0.008
Token,
Token ID11
Feature activation+8.36e-03

Pos logit contributions
 regret+0.025
 vet+0.023
check+0.023
 harm+0.023
 unnecessary+0.022
Neg logit contributions
aney-0.029
 contain-0.026
Jane-0.024
 yes-0.024
 Jane-0.024
Token birds
Token ID10087
Feature activation+2.36e-02

Pos logit contributions
 regret+0.009
 vet+0.009
 Owl+0.008
 veterinarian+0.008
 harm+0.008
Neg logit contributions
aney-0.020
 contain-0.018
Jane-0.017
 Kate-0.016
ill-0.016
Token,
Token ID11
Feature activation+1.07e-02

Pos logit contributions
 regret+0.039
 vet+0.037
check+0.036
 harm+0.036
 unnecessary+0.036
Neg logit contributions
aney-0.042
 contain-0.035
 yes-0.034
ill-0.033
 Jane-0.033
Token ham
Token ID8891
Feature activation+5.05e-02

Pos logit contributions
 regret+0.011
 vet+0.011
 Owl+0.010
 veterinarian+0.010
 harm+0.010
Neg logit contributions
aney-0.026
 contain-0.023
Jane-0.022
 Kate-0.021
ill-0.021
Tokensters
Token ID5937
Feature activation+3.14e-02

Pos logit contributions
 regret+0.089
 harm+0.081
 damage+0.076
 vet+0.074
 unnecessary+0.074
Neg logit contributions
aney-0.092
 yes-0.076
ill-0.075
 contain-0.074
mos-0.070
Token,
Token ID11
Feature activation+9.90e-03

Pos logit contributions
 regret+0.043
 vet+0.040
 harm+0.039
 lose+0.038
 veterinarian+0.037
Neg logit contributions
aney-0.065
 contain-0.058
Jane-0.054
 yes-0.053
ill-0.052
Token and
Token ID290
Feature activation-4.06e-03

Pos logit contributions
 regret+0.011
 vet+0.010
 Owl+0.010
 veterinarian+0.010
 harm+0.010
Neg logit contributions
aney-0.023
 contain-0.021
Jane-0.020
 Kate-0.019
 Jane-0.018
Token more
Token ID517
Feature activation+7.51e-03

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.007
ill+0.007
 Kate+0.006
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.005
 veterinarian-0.005
 Owl-0.005
Token.
Token ID13
Feature activation-5.66e-03

Pos logit contributions
 regret+0.011
 vet+0.010
inging+0.010
 lose+0.010
 Owl+0.010
Neg logit contributions
aney-0.014
 contain-0.013
Jane-0.012
 Jane-0.012
 yes-0.012
Token,
Token ID11
Feature activation+3.79e-03

Pos logit contributions
 regret+0.037
 vet+0.034
 harm+0.033
 lose+0.032
 veterinarian+0.032
Neg logit contributions
aney-0.058
 contain-0.052
Jane-0.048
 yes-0.048
ill-0.047
Token some
Token ID617
Feature activation+9.96e-03

Pos logit contributions
 vet+0.006
 regret+0.006
 Owl+0.006
 veterinarian+0.006
 lose+0.006
Neg logit contributions
aney-0.007
 contain-0.006
Jane-0.006
 Kate-0.005
days-0.005
Token corn
Token ID11676
Feature activation+3.38e-02

Pos logit contributions
 regret+0.016
 lose+0.014
inging+0.014
 client+0.014
 Owl+0.014
Neg logit contributions
aney-0.018
 contain-0.016
 yes-0.015
Jane-0.014
 Jane-0.014
Token,
Token ID11
Feature activation+1.23e-02

Pos logit contributions
 regret+0.056
 vet+0.051
 harm+0.050
 unnecessary+0.050
 client+0.050
Neg logit contributions
aney-0.060
 contain-0.052
 yes-0.049
ill-0.049
 Jane-0.048
Token some
Token ID617
Feature activation+1.14e-02

Pos logit contributions
 vet+0.014
 regret+0.014
 Owl+0.013
 veterinarian+0.013
 harm+0.013
Neg logit contributions
aney-0.028
 contain-0.025
Jane-0.024
 Kate-0.023
days-0.022
Token carrots
Token ID34397
Feature activation+5.04e-02

Pos logit contributions
 regret+0.018
 lose+0.016
inging+0.016
 vet+0.016
 Owl+0.016
Neg logit contributions
aney-0.021
 contain-0.018
 yes-0.017
Jane-0.017
 Jane-0.016
Token,
Token ID11
Feature activation+1.62e-02

Pos logit contributions
 regret+0.085
 vet+0.079
 harm+0.078
 unnecessary+0.078
 client+0.078
Neg logit contributions
aney-0.087
 contain-0.077
 yes-0.072
ill-0.072
 Jane-0.068
Token some
Token ID617
Feature activation+1.27e-02

Pos logit contributions
 vet+0.018
 regret+0.018
 Owl+0.017
 veterinarian+0.017
 harm+0.016
Neg logit contributions
aney-0.037
 contain-0.033
Jane-0.032
 Kate-0.030
 Jane-0.030
Token apples
Token ID22514
Feature activation+4.57e-02

Pos logit contributions
 regret+0.020
 vet+0.018
inging+0.018
 lose+0.018
 client+0.018
Neg logit contributions
aney-0.023
 contain-0.020
 yes-0.019
Jane-0.019
 Jane-0.018
Token and
Token ID290
Feature activation+6.54e-03

Pos logit contributions
 regret+0.083
 client+0.076
 unnecessary+0.076
 harm+0.075
 vet+0.074
Neg logit contributions
aney-0.074
 contain-0.066
 yes-0.062
ill-0.061
 Jane-0.057
Token some
Token ID617
Feature activation+1.20e-02

Pos logit contributions
 regret+0.012
 vet+0.012
 harm+0.011
 Owl+0.011
 veterinarian+0.011
Neg logit contributions
aney-0.010
 contain-0.009
Jane-0.008
 Kate-0.007
 yes-0.007
Token wanted
Token ID2227
Feature activation-2.96e-04

Pos logit contributions
 regret+0.030
 vet+0.029
 harm+0.029
inging+0.028
 Owl+0.028
Neg logit contributions
aney-0.032
 contain-0.028
Jane-0.026
 yes-0.025
 Kate-0.025
Token to
Token ID284
Feature activation+5.73e-03

Pos logit contributions
aney+0.001
 contain+0.000
Jane+0.000
 yes+0.000
 Kate+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 lose-0.000
 Owl-0.000
Token learn
Token ID2193
Feature activation+2.97e-03

Pos logit contributions
inging+0.008
 unnecessary+0.008
iring+0.008
athy+0.008
iqu+0.008
Neg logit contributions
 contain-0.011
aney-0.010
 yes-0.009
 Jane-0.009
Jane-0.008
Token more
Token ID517
Feature activation+1.14e-02

Pos logit contributions
 regret+0.005
 vet+0.005
 harm+0.005
 Owl+0.005
 veterinarian+0.005
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
 yes-0.004
 Kate-0.004
Token about
Token ID546
Feature activation+9.81e-03

Pos logit contributions
 regret+0.021
 lose+0.021
check+0.021
 vet+0.021
inging+0.020
Neg logit contributions
aney-0.015
 contain-0.013
 yes-0.012
 Jane-0.012
Jane-0.012
Token veterin
Token ID33647
Feature activation+5.04e-02

Pos logit contributions
inging+0.016
 regret+0.015
 lose+0.015
 vet+0.015
check+0.014
Neg logit contributions
aney-0.016
 contain-0.014
 Jane-0.013
Jane-0.013
 Kate-0.013
Tokenarians
Token ID13517
Feature activation+7.45e-03

Pos logit contributions
 regret+0.080
 trouble+0.069
 hurts+0.067
 wolf+0.067
 wish+0.067
Neg logit contributions
 contain-0.082
aney-0.082
days-0.073
years-0.069
and-0.068
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
 regret+0.010
 vet+0.009
 harm+0.009
 Owl+0.009
 lose+0.009
Neg logit contributions
aney-0.015
 contain-0.014
Jane-0.013
 yes-0.012
 Kate-0.012
Token She
Token ID1375
Feature activation+5.68e-03

Pos logit contributions
 regret+0.016
 harm+0.016
 lose+0.014
 damage+0.014
inging+0.014
Neg logit contributions
aney-0.020
 Jane-0.018
 Kate-0.017
 contain-0.017
 yes-0.017
Token decided
Token ID3066
Feature activation+1.04e-02

Pos logit contributions
 regret+0.009
 vet+0.008
 harm+0.008
 veterinarian+0.008
 Owl+0.008
Neg logit contributions
aney-0.011
 contain-0.009
Jane-0.009
 yes-0.008
 Kate-0.008
Token to
Token ID284
Feature activation+2.08e-03

Pos logit contributions
 regret+0.018
inging+0.017
 harm+0.016
 vet+0.016
 difference+0.016
Neg logit contributions
aney-0.017
 contain-0.015
Jane-0.015
 yes-0.015
 Jane-0.014
Token frozen
Token ID12912
Feature activation+1.22e-03

Pos logit contributions
aney+0.023
 contain+0.020
Jane+0.019
 yes+0.018
 Kate+0.018
Neg logit contributions
 regret-0.014
 vet-0.013
 harm-0.013
 veterinarian-0.012
 Owl-0.012
Token bags
Token ID11668
Feature activation+1.97e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 lose+0.002
 harm+0.002
 veterinarian+0.002
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.002
 Jane-0.002
 Kate-0.002
Token of
Token ID286
Feature activation-2.56e-04

Pos logit contributions
 regret+0.027
 vet+0.025
 harm+0.024
 lose+0.024
 Owl+0.024
Neg logit contributions
aney-0.041
 contain-0.035
Jane-0.034
 Kate-0.033
 yes-0.032
Token peas
Token ID22589
Feature activation+3.56e-02

Pos logit contributions
aney+0.001
 contain+0.000
Jane+0.000
 yes+0.000
 Kate+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 Owl-0.000
 lose-0.000
Token and
Token ID290
Feature activation-2.27e-03

Pos logit contributions
 regret+0.051
 vet+0.048
 harm+0.046
 veterinarian+0.046
 Owl+0.045
Neg logit contributions
aney-0.070
 contain-0.063
Jane-0.058
 yes-0.057
ill-0.055
Token carrots
Token ID34397
Feature activation+5.03e-02

Pos logit contributions
aney+0.004
 contain+0.004
Jane+0.004
ill+0.003
 yes+0.003
Neg logit contributions
 regret-0.004
 vet-0.003
 lose-0.003
 harm-0.003
 Owl-0.003
Token.
Token ID13
Feature activation-3.93e-03

Pos logit contributions
 regret+0.067
 vet+0.062
 harm+0.060
 Owl+0.059
 lose+0.059
Neg logit contributions
aney-0.106
 contain-0.094
Jane-0.089
 yes-0.085
 Kate-0.085
Token He
Token ID679
Feature activation-7.70e-03

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
 yes+0.007
days+0.007
Neg logit contributions
 vet-0.005
 regret-0.005
 Owl-0.004
 veterinarian-0.004
 Nut-0.004
Token sighed
Token ID21893
Feature activation-5.74e-03

Pos logit contributions
aney+0.016
 contain+0.015
days+0.013
ill+0.013
Jane+0.013
Neg logit contributions
 regret-0.010
 vet-0.010
 veterinarian-0.009
 Owl-0.008
 wishes-0.008
Token and
Token ID290
Feature activation-3.76e-03

Pos logit contributions
aney+0.014
 contain+0.012
Jane+0.012
ill+0.011
mos+0.011
Neg logit contributions
 regret-0.006
 vet-0.005
 Owl-0.005
 lose-0.005
 veterinarian-0.005
Token decided
Token ID3066
Feature activation+1.48e-03

Pos logit contributions
aney+0.007
 contain+0.006
Jane+0.006
ill+0.006
days+0.005
Neg logit contributions
 regret-0.006
 vet-0.005
 lose-0.005
 harm-0.005
 Owl-0.005
TokenAnna
Token ID31160
Feature activation+1.70e-02

Pos logit contributions
aney+0.019
 contain+0.017
Jane+0.016
 yes+0.016
 Kate+0.016
Neg logit contributions
 regret-0.008
 vet-0.007
 harm-0.007
 Owl-0.007
 lose-0.006
Token loves
Token ID10408
Feature activation-4.35e-03

Pos logit contributions
 unnecessary+0.028
 regret+0.028
 vet+0.027
inging+0.027
 harm+0.027
Neg logit contributions
aney-0.029
 contain-0.023
Jane-0.023
ill-0.023
 Jane-0.023
Token her
Token ID607
Feature activation+1.42e-02

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.006
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.007
 vet-0.007
 harm-0.006
 Owl-0.006
 lose-0.006
Token bird
Token ID6512
Feature activation+9.35e-03

Pos logit contributions
 regret+0.016
 vet+0.015
 harm+0.014
 Owl+0.014
 lose+0.014
Neg logit contributions
aney-0.033
 contain-0.030
Jane-0.028
 yes-0.027
 Kate-0.027
Token,
Token ID11
Feature activation+7.74e-03

Pos logit contributions
 regret+0.010
 vet+0.009
 Owl+0.009
 veterinarian+0.009
 harm+0.008
Neg logit contributions
aney-0.023
 contain-0.021
Jane-0.020
year-0.019
 Kate-0.019
Token Sunny
Token ID32241
Feature activation+5.01e-02

Pos logit contributions
 vet+0.009
 regret+0.009
 veterinarian+0.009
 Owl+0.009
 Nut+0.009
Neg logit contributions
aney-0.018
 contain-0.016
year-0.015
Jane-0.014
mos-0.014
Token.
Token ID13
Feature activation+6.69e-03

Pos logit contributions
 regret+0.079
 vet+0.075
inging+0.074
 harm+0.073
 damage+0.073
Neg logit contributions
aney-0.089
 contain-0.075
 yes-0.072
Jane-0.072
 Jane-0.070
Token Sunny
Token ID32241
Feature activation+3.88e-02

Pos logit contributions
 regret+0.010
 harm+0.009
 vet+0.009
 lose+0.009
inging+0.009
Neg logit contributions
aney-0.013
 contain-0.011
Jane-0.011
 yes-0.010
 Kate-0.010
Token is
Token ID318
Feature activation-2.25e-03

Pos logit contributions
 regret+0.058
 vet+0.055
 harm+0.053
inging+0.053
 unnecessary+0.052
Neg logit contributions
aney-0.073
 contain-0.062
Jane-0.060
 yes-0.059
 Kate-0.058
Token yellow
Token ID7872
Feature activation-4.64e-03

Pos logit contributions
aney+0.004
 contain+0.004
Jane+0.003
ill+0.003
 yes+0.003
Neg logit contributions
 regret-0.004
 vet-0.004
 Owl-0.003
 veterinarian-0.003
 harm-0.003
Token and
Token ID290
Feature activation+3.79e-03

Pos logit contributions
aney+0.012
 contain+0.011
Jane+0.010
 Kate+0.010
 Jane+0.009
Neg logit contributions
 vet-0.004
 regret-0.004
 veterinarian-0.004
 Owl-0.004
 lose-0.004
Token a
Token ID257
Feature activation+1.68e-02

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 yes+0.001
ill+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 harm-0.001
 Owl-0.001
 veterinarian-0.001
Token lot
Token ID1256
Feature activation+1.10e-03

Pos logit contributions
check+0.023
athy+0.023
 regret+0.023
 harm+0.023
 promises+0.022
Neg logit contributions
aney-0.029
Jane-0.025
 contain-0.025
 Jane-0.025
 seas-0.025
Token of
Token ID286
Feature activation+7.76e-03

Pos logit contributions
 regret+0.002
 vet+0.002
 lose+0.002
 Owl+0.002
 client+0.002
Neg logit contributions
aney-0.002
 contain-0.001
Jane-0.001
ill-0.001
 Jane-0.001
Token dirt
Token ID13647
Feature activation+2.69e-02

Pos logit contributions
inging+0.010
check+0.010
 vet+0.010
athy+0.010
 regret+0.010
Neg logit contributions
aney-0.014
 contain-0.013
Jane-0.013
 Jane-0.013
 Kate-0.012
Token and
Token ID290
Feature activation+9.84e-03

Pos logit contributions
 regret+0.044
 vet+0.041
 client+0.040
 veterinarian+0.040
 unnecessary+0.040
Neg logit contributions
aney-0.046
 contain-0.041
ill-0.038
Jane-0.037
 yes-0.036
Token trash
Token ID13913
Feature activation+5.00e-02

Pos logit contributions
 regret+0.015
 vet+0.014
 harm+0.013
 lose+0.013
 Owl+0.013
Neg logit contributions
aney-0.020
 contain-0.017
Jane-0.016
 yes-0.015
ill-0.015
Token and
Token ID290
Feature activation+3.09e-03

Pos logit contributions
 regret+0.094
athy+0.090
 client+0.089
 vet+0.088
 gut+0.086
Neg logit contributions
aney-0.065
 cubes-0.061
ill-0.060
 contain-0.059
 Jill-0.055
Token cans
Token ID23916
Feature activation+2.51e-02

Pos logit contributions
 regret+0.005
 vet+0.004
 harm+0.004
 lose+0.004
 Owl+0.004
Neg logit contributions
aney-0.006
 contain-0.005
Jane-0.005
 yes-0.005
 Kate-0.005
Token.
Token ID13
Feature activation+5.00e-03

Pos logit contributions
 regret+0.035
 vet+0.033
 harm+0.031
 Owl+0.031
 veterinarian+0.031
Neg logit contributions
aney-0.051
 contain-0.046
Jane-0.043
 yes-0.041
 Kate-0.041
Token He
Token ID679
Feature activation+1.63e-02

Pos logit contributions
 regret+0.008
 harm+0.007
 lose+0.007
inging+0.007
 vet+0.007
Neg logit contributions
aney-0.010
 contain-0.008
Jane-0.008
 Jane-0.008
 Kate-0.008
Token sees
Token ID7224
Feature activation-2.64e-03

Pos logit contributions
 regret+0.019
 harm+0.018
 vet+0.018
 Owl+0.017
 veterinarian+0.017
Neg logit contributions
aney-0.035
 contain-0.031
 Kate-0.030
Jane-0.029
 yes-0.029
Token of
Token ID286
Feature activation+4.03e-03

Pos logit contributions
aney+0.003
 contain+0.002
Jane+0.002
 Kate+0.002
days+0.002
Neg logit contributions
 regret-0.002
 vet-0.002
 harm-0.002
 veterinarian-0.002
 Owl-0.002
Token juicy
Token ID35075
Feature activation+2.05e-02

Pos logit contributions
 regret+0.006
 vet+0.006
inging+0.006
 lose+0.006
 harm+0.006
Neg logit contributions
aney-0.007
 contain-0.006
Jane-0.006
 Jane-0.006
 yes-0.006
Token,
Token ID11
Feature activation-1.19e-02

Pos logit contributions
 regret+0.034
 client+0.032
 vet+0.032
 harm+0.031
 lose+0.031
Neg logit contributions
aney-0.036
 contain-0.030
Jane-0.029
 Jane-0.029
 yes-0.027
Token tasty
Token ID25103
Feature activation+1.72e-02

Pos logit contributions
aney+0.021
 contain+0.018
Jane+0.017
 Kate+0.016
ill+0.016
Neg logit contributions
 regret-0.020
 vet-0.019
 Owl-0.019
 veterinarian-0.018
 harm-0.018
Token looking
Token ID2045
Feature activation+5.09e-03

Pos logit contributions
 regret+0.030
 client+0.029
 vet+0.028
 lose+0.028
 harm+0.028
Neg logit contributions
aney-0.028
 Jane-0.023
 contain-0.023
Jane-0.023
 yes-0.021
Token berries
Token ID36322
Feature activation+4.99e-02

Pos logit contributions
 regret+0.006
 vet+0.006
 harm+0.006
 veterinarian+0.006
 lose+0.006
Neg logit contributions
aney-0.011
 contain-0.010
Jane-0.009
 Kate-0.009
days-0.009
Token.
Token ID13
Feature activation+7.00e-03

Pos logit contributions
 regret+0.073
 vet+0.069
 harm+0.066
 veterinarian+0.065
 Owl+0.064
Neg logit contributions
aney-0.099
 contain-0.088
Jane-0.082
 yes-0.080
ill-0.078
Token
Token ID198
Feature activation+3.98e-03

Pos logit contributions
 regret+0.013
 vet+0.012
 Owl+0.012
 veterinarian+0.012
 Nut+0.012
Neg logit contributions
aney-0.011
 contain-0.010
Jane-0.009
 yes-0.008
ill-0.008
Token
Token ID198
Feature activation+6.19e-03

Pos logit contributions
 regret+0.004
 vet+0.004
 Owl+0.003
 veterinarian+0.003
 harm+0.003
Neg logit contributions
aney-0.010
 contain-0.009
Jane-0.008
 yes-0.008
ill-0.008
TokenSuddenly
Token ID38582
Feature activation+8.25e-03

Pos logit contributions
 vet+0.009
 regret+0.008
 veterinarian+0.008
 Owl+0.008
 harm+0.008
Neg logit contributions
aney-0.013
 contain-0.011
 Kate-0.010
 Jane-0.010
ill-0.010
Token he
Token ID339
Feature activation-3.79e-03

Pos logit contributions
 regret+0.011
 vet+0.010
 harm+0.010
 lose+0.010
 veterinarian+0.010
Neg logit contributions
aney-0.017
 contain-0.015
Jane-0.014
 yes-0.014
 Kate-0.014
Token a
Token ID257
Feature activation+2.29e-02

Pos logit contributions
 regret+0.004
 vet+0.004
 harm+0.004
 Owl+0.004
 lose+0.004
Neg logit contributions
aney-0.004
 contain-0.003
Jane-0.003
ill-0.003
 yes-0.003
Token warm
Token ID5814
Feature activation+2.04e-02

Pos logit contributions
 harm+0.036
check+0.035
athy+0.034
 regret+0.034
inging+0.034
Neg logit contributions
aney-0.036
 yes-0.032
 contain-0.031
Jane-0.029
 Jane-0.028
Token bed
Token ID3996
Feature activation+1.55e-02

Pos logit contributions
 regret+0.029
 harm+0.027
 vet+0.027
 lose+0.026
 client+0.026
Neg logit contributions
aney-0.040
 contain-0.035
Jane-0.033
 Jane-0.032
 yes-0.032
Token and
Token ID290
Feature activation+2.59e-03

Pos logit contributions
 regret+0.018
 vet+0.017
 veterinarian+0.017
 harm+0.017
 Owl+0.016
Neg logit contributions
aney-0.035
 contain-0.031
Jane-0.030
 Kate-0.028
ill-0.028
Token tasty
Token ID25103
Feature activation+1.45e-02

Pos logit contributions
 regret+0.004
 vet+0.004
 harm+0.004
 lose+0.004
 Owl+0.004
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
ill-0.004
 Kate-0.004
Token carrots
Token ID34397
Feature activation+4.98e-02

Pos logit contributions
 regret+0.024
 vet+0.023
inging+0.023
 harm+0.023
 lose+0.022
Neg logit contributions
aney-0.024
 contain-0.021
 yes-0.019
 Jane-0.019
Jane-0.019
Token every
Token ID790
Feature activation-1.18e-02

Pos logit contributions
 regret+0.068
 vet+0.063
 harm+0.061
 Owl+0.060
 lose+0.060
Neg logit contributions
aney-0.103
 contain-0.092
Jane-0.086
 yes-0.083
 Kate-0.082
Token day
Token ID1110
Feature activation+3.37e-03

Pos logit contributions
aney+0.030
 contain+0.027
ill+0.025
Jane+0.025
uits+0.025
Neg logit contributions
 regret-0.012
 vet-0.012
 harm-0.011
 veterinarian-0.011
 Owl-0.010
Token.
Token ID13
Feature activation+8.64e-03

Pos logit contributions
 regret+0.004
 harm+0.003
 vet+0.003
 lose+0.003
 veterinarian+0.003
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.007
ill-0.006
 Kate-0.006
Token Peter
Token ID5613
Feature activation+1.90e-04

Pos logit contributions
 regret+0.012
 vet+0.011
 Owl+0.011
 veterinarian+0.011
 harm+0.011
Neg logit contributions
aney-0.018
 contain-0.016
Jane-0.015
 yes-0.014
ill-0.014
Token was
Token ID373
Feature activation+5.12e-03

Pos logit contributions
 regret+0.000
 vet+0.000
 Owl+0.000
 harm+0.000
 veterinarian+0.000
Neg logit contributions
aney-0.000
 contain-0.000
Jane-0.000
 yes-0.000
ill-0.000
Token sees
Token ID7224
Feature activation+5.46e-04

Pos logit contributions
 vet+0.019
 regret+0.019
inging+0.019
 veterinarian+0.018
 harm+0.018
Neg logit contributions
aney-0.030
 contain-0.029
 Kate-0.028
 Jane-0.027
 Jill-0.027
Token something
Token ID1223
Feature activation+9.95e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
ill-0.001
Token amazing
Token ID4998
Feature activation+1.08e-02

Pos logit contributions
 vet+0.015
 regret+0.015
 Owl+0.015
check+0.015
inging+0.015
Neg logit contributions
aney-0.017
 contain-0.016
Jane-0.015
 yes-0.014
 Jane-0.014
Token.
Token ID13
Feature activation-4.03e-04

Pos logit contributions
 regret+0.015
 vet+0.014
 harm+0.014
 Owl+0.014
 veterinarian+0.013
Neg logit contributions
aney-0.022
 contain-0.020
Jane-0.018
 yes-0.018
 Kate-0.018
Token The
Token ID383
Feature activation-6.35e-04

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 yes+0.001
ill+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 Owl-0.000
 veterinarian-0.000
 harm-0.000
Token seeds
Token ID11904
Feature activation+4.94e-02

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 yes+0.001
 Jane+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
inging-0.001
 lose-0.001
 harm-0.001
Token that
Token ID326
Feature activation+3.01e-03

Pos logit contributions
 vet+0.079
 client+0.076
 unnecessary+0.076
 regret+0.075
 veterinarian+0.075
Neg logit contributions
aney-0.089
 contain-0.084
ill-0.074
 yes-0.073
Jane-0.071
Token he
Token ID339
Feature activation+1.38e-02

Pos logit contributions
 vet+0.004
 regret+0.004
inging+0.004
 veterinarian+0.004
 unnecessary+0.004
Neg logit contributions
aney-0.006
 contain-0.005
 Kate-0.005
 Jane-0.005
 yes-0.005
Token threw
Token ID9617
Feature activation+5.86e-04

Pos logit contributions
 regret+0.016
 vet+0.015
 Owl+0.014
inging+0.014
 veterinarian+0.014
Neg logit contributions
aney-0.030
 contain-0.027
 Kate-0.025
 yes-0.025
Jane-0.025
Token away
Token ID1497
Feature activation+1.06e-02

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
 Kate-0.001
Token did
Token ID750
Feature activation-3.16e-03

Pos logit contributions
 regret+0.010
 Owl+0.009
 vet+0.009
 harm+0.009
 lose+0.008
Neg logit contributions
aney-0.026
 contain-0.024
Jane-0.022
days-0.021
shirt-0.021
Token not
Token ID407
Feature activation-1.12e-02

Pos logit contributions
aney+0.021
 contain+0.018
Jane+0.017
 Kate+0.017
mos+0.016
Neg logit contributions
 regret-0.012
 lose-0.011
 vet-0.011
 harm-0.011
 damage-0.010
Token have
Token ID423
Feature activation-1.13e-02

Pos logit contributions
aney+0.024
 contain+0.022
Jane+0.020
 yes+0.019
ill+0.019
Neg logit contributions
 regret-0.015
 vet-0.014
inging-0.013
 Owl-0.013
 harm-0.013
Token any
Token ID597
Feature activation+1.15e-02

Pos logit contributions
aney+0.025
 contain+0.023
Jane+0.022
days+0.021
year+0.021
Neg logit contributions
 regret-0.015
 trouble-0.014
 harm-0.014
 wishes-0.014
 damage-0.013
Token bread
Token ID8509
Feature activation+2.77e-02

Pos logit contributions
 regret+0.017
 Owl+0.016
 vet+0.016
inging+0.016
 Nut+0.016
Neg logit contributions
aney-0.021
 contain-0.019
Jane-0.017
 yes-0.017
 Jane-0.017
Token or
Token ID393
Feature activation+8.68e-03

Pos logit contributions
 regret+0.040
 vet+0.038
 harm+0.036
 veterinarian+0.036
 Owl+0.036
Neg logit contributions
aney-0.055
 contain-0.048
Jane-0.044
 yes-0.044
ill-0.043
Token seeds
Token ID11904
Feature activation+4.93e-02

Pos logit contributions
 regret+0.014
 vet+0.013
 harm+0.013
 lose+0.012
 Owl+0.012
Neg logit contributions
aney-0.016
 contain-0.014
Jane-0.013
 Kate-0.013
ill-0.012
Token.
Token ID13
Feature activation+1.20e-02

Pos logit contributions
 regret+0.077
 vet+0.073
 client+0.070
inging+0.070
 Owl+0.069
Neg logit contributions
aney-0.089
 contain-0.080
 yes-0.073
ill-0.072
Jane-0.070
Token They
Token ID1119
Feature activation-3.47e-03

Pos logit contributions
 vet+0.012
 regret+0.011
 veterinarian+0.011
 Owl+0.011
 lose+0.010
Neg logit contributions
aney-0.029
 contain-0.027
Jane-0.025
 yes-0.024
shirt-0.024
Token only
Token ID691
Feature activation-1.58e-02

Pos logit contributions
aney+0.007
 contain+0.006
Jane+0.006
 Kate+0.005
days+0.005
Neg logit contributions
 regret-0.005
 vet-0.005
 lose-0.005
 harm-0.005
 veterinarian-0.005
Token had
Token ID550
Feature activation-2.24e-05

Pos logit contributions
aney+0.032
 contain+0.028
Jane+0.027
 Kate+0.026
uits+0.026
Neg logit contributions
 regret-0.023
 lose-0.020
 harm-0.020
 vet-0.020
 damage-0.018
Token some
Token ID617
Feature activation+1.58e-02

Pos logit contributions
aney+0.000
 contain+0.000
Jane+0.000
days+0.000
 Kate+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 trouble-0.000
 wishes-0.000
Token cones
Token ID47314
Feature activation+2.19e-02

Pos logit contributions
 regret+0.050
 vet+0.044
 lose+0.043
 harm+0.043
 client+0.042
Neg logit contributions
aney-0.063
 contain-0.054
Jane-0.052
 yes-0.051
 Jane-0.051
Token,
Token ID11
Feature activation+3.90e-03

Pos logit contributions
 regret+0.021
 lose+0.020
 vet+0.020
 Owl+0.020
inging+0.020
Neg logit contributions
aney-0.050
 contain-0.045
Jane-0.044
 Kate-0.042
 yes-0.041
Token ac
Token ID936
Feature activation+4.31e-02

Pos logit contributions
 vet+0.005
 regret+0.005
 Owl+0.005
 veterinarian+0.005
 tou+0.005
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.007
 Kate-0.007
year-0.006
Tokenorns
Token ID19942
Feature activation+4.04e-02

Pos logit contributions
 regret+0.063
 lose+0.054
 vet+0.053
 harm+0.053
 client+0.053
Neg logit contributions
aney-0.090
Jane-0.077
 contain-0.076
ill-0.073
mos-0.073
Token,
Token ID11
Feature activation+7.19e-03

Pos logit contributions
 regret+0.056
 vet+0.052
 harm+0.049
 veterinarian+0.049
 lose+0.048
Neg logit contributions
aney-0.083
 contain-0.074
Jane-0.069
 yes-0.067
ill-0.067
Token nuts
Token ID14380
Feature activation+4.92e-02

Pos logit contributions
 regret+0.008
 vet+0.008
 Owl+0.008
 veterinarian+0.008
 lose+0.008
Neg logit contributions
aney-0.016
 contain-0.014
Jane-0.014
 Kate-0.013
year-0.013
Token,
Token ID11
Feature activation+1.34e-02

Pos logit contributions
 regret+0.072
 vet+0.066
 harm+0.064
 veterinarian+0.063
 client+0.062
Neg logit contributions
aney-0.099
 contain-0.088
Jane-0.081
 yes-0.080
ill-0.080
Token and
Token ID290
Feature activation+1.79e-03

Pos logit contributions
 regret+0.016
 vet+0.015
 Owl+0.014
 harm+0.014
 lose+0.014
Neg logit contributions
aney-0.030
 contain-0.027
Jane-0.026
 Kate-0.025
 Jane-0.024
Token seeds
Token ID11904
Feature activation+4.34e-02

Pos logit contributions
 regret+0.003
 vet+0.003
 harm+0.003
 Owl+0.003
 lose+0.003
Neg logit contributions
aney-0.003
 contain-0.003
Jane-0.003
 Kate-0.003
ill-0.003
Token.
Token ID13
Feature activation+6.05e-03

Pos logit contributions
 regret+0.058
 vet+0.055
 harm+0.053
 Owl+0.052
 lose+0.052
Neg logit contributions
aney-0.090
 contain-0.080
Jane-0.076
 yes-0.073
 Kate-0.072
Token They
Token ID1119
Feature activation+1.06e-02

Pos logit contributions
 vet+0.007
 regret+0.007
 Owl+0.006
 veterinarian+0.006
 Nut+0.006
Neg logit contributions
aney-0.013
 contain-0.012
Jane-0.011
 yes-0.011
ill-0.011
Token a
Token ID257
Feature activation+7.69e-03

Pos logit contributions
 vet+0.004
 veterinarian+0.004
 regret+0.004
 Owl+0.004
 Nut+0.004
Neg logit contributions
aney-0.006
 contain-0.005
ill-0.004
days-0.004
 seas-0.004
Token tall
Token ID7331
Feature activation-8.14e-03

Pos logit contributions
inging+0.012
 regret+0.012
 harm+0.012
check+0.012
 wishes+0.012
Neg logit contributions
aney-0.012
Jane-0.011
 yes-0.011
 Jane-0.010
 contain-0.010
Token gir
Token ID37370
Feature activation+1.99e-02

Pos logit contributions
aney+0.017
 contain+0.015
Jane+0.015
 Jane+0.015
 Kate+0.014
Neg logit contributions
 regret-0.010
 unnecessary-0.009
 harm-0.009
inging-0.009
check-0.009
Tokenaffe
Token ID21223
Feature activation+7.92e-03

Pos logit contributions
 regret+0.030
 lose+0.027
 unnecessary+0.027
 harm+0.026
check+0.026
Neg logit contributions
aney-0.038
 contain-0.033
Jane-0.031
ill-0.030
 yes-0.030
Token named
Token ID3706
Feature activation+6.86e-03

Pos logit contributions
 regret+0.013
 lose+0.012
inging+0.012
 damage+0.012
 harm+0.012
Neg logit contributions
aney-0.013
 contain-0.012
Jane-0.011
 yes-0.011
 Jane-0.010
Token Gerry
Token ID42196
Feature activation+4.89e-02

Pos logit contributions
 regret+0.012
 vet+0.011
 harm+0.011
 lose+0.011
 veterinarian+0.011
Neg logit contributions
aney-0.012
 contain-0.010
Jane-0.010
 yes-0.009
 Kate-0.009
Token.
Token ID13
Feature activation+4.90e-03

Pos logit contributions
 regret+0.091
 unnecessary+0.085
inging+0.084
 damage+0.083
 trouble+0.083
Neg logit contributions
aney-0.067
 yes-0.059
 contain-0.058
ill-0.056
mos-0.054
Token Kitty
Token ID21168
Feature activation+1.69e-02

Pos logit contributions
 regret+0.008
 harm+0.008
 damage+0.007
 unnecessary+0.007
 lose+0.007
Neg logit contributions
aney-0.008
 Jane-0.007
 contain-0.007
 Lots-0.007
 yes-0.007
Token asked
Token ID1965
Feature activation+1.45e-02

Pos logit contributions
 regret+0.030
 vet+0.028
 harm+0.028
inging+0.028
 unnecessary+0.027
Neg logit contributions
aney-0.027
 contain-0.023
Jane-0.022
 Jane-0.021
 yes-0.021
Token Gerry
Token ID42196
Feature activation+2.46e-02

Pos logit contributions
 regret+0.022
 lose+0.021
 harm+0.020
inging+0.020
 damage+0.020
Neg logit contributions
aney-0.026
 yes-0.023
 Jane-0.023
Jane-0.022
 contain-0.022
Token if
Token ID611
Feature activation+2.19e-02

Pos logit contributions
 regret+0.041
 vet+0.038
 harm+0.037
inging+0.037
 masse+0.037
Neg logit contributions
aney-0.041
 yes-0.035
ill-0.034
 contain-0.034
 Jane-0.034
Token the
Token ID262
Feature activation-7.89e-03

Pos logit contributions
aney+0.014
 contain+0.011
Jane+0.011
 yes+0.011
days+0.010
Neg logit contributions
 regret-0.008
 vet-0.007
 harm-0.007
 lose-0.007
 Owl-0.007
Token edge
Token ID5743
Feature activation-3.23e-02

Pos logit contributions
aney+0.017
 contain+0.015
Jane+0.014
 yes+0.014
mos+0.014
Neg logit contributions
 regret-0.010
 vet-0.010
 veterinarian-0.010
 lose-0.009
 harm-0.009
Token of
Token ID286
Feature activation-1.85e-02

Pos logit contributions
aney+0.073
 contain+0.067
 yes+0.063
 apr+0.062
days+0.060
Neg logit contributions
 vet-0.032
 regret-0.030
 harm-0.030
 trouble-0.028
inging-0.028
Token the
Token ID262
Feature activation-8.12e-03

Pos logit contributions
aney+0.050
 contain+0.042
days+0.041
rd+0.040
year+0.040
Neg logit contributions
 trouble-0.016
 regret-0.016
 harm-0.015
 vet-0.014
 damage-0.014
Token tr
Token ID491
Feature activation-3.07e-02

Pos logit contributions
aney+0.016
 contain+0.014
Jane+0.013
 yes+0.013
ill+0.012
Neg logit contributions
 regret-0.012
 vet-0.012
 veterinarian-0.011
 lose-0.011
 harm-0.011
Tokenamp
Token ID696
Feature activation-6.80e-02

Pos logit contributions
aney+0.059
 contain+0.055
 yes+0.053
Jane+0.048
 Jane+0.046
Neg logit contributions
inging-0.042
 regret-0.042
 vet-0.040
 Owl-0.039
 harm-0.039
Tokenoline
Token ID14453
Feature activation-1.63e-02

Pos logit contributions
aney+0.110
 contain+0.105
 yes+0.105
nam+0.093
year+0.086
Neg logit contributions
 harm-0.105
 vet-0.097
inging-0.096
 dog-0.095
 Owl-0.095
Token and
Token ID290
Feature activation-1.05e-02

Pos logit contributions
aney+0.042
 contain+0.037
Jane+0.036
year+0.035
days+0.034
Neg logit contributions
 vet-0.013
 regret-0.012
 veterinarian-0.012
 Owl-0.011
 lose-0.010
Token took
Token ID1718
Feature activation+3.20e-03

Pos logit contributions
aney+0.021
 contain+0.017
Jane+0.016
days+0.016
ill+0.016
Neg logit contributions
 regret-0.016
 lose-0.015
 vet-0.014
 Nut-0.014
 Owl-0.014
Token a
Token ID257
Feature activation+1.23e-02

Pos logit contributions
 regret+0.004
 vet+0.004
 harm+0.004
 lose+0.004
 wishes+0.004
Neg logit contributions
aney-0.006
 contain-0.006
Jane-0.005
days-0.005
ill-0.005
Token big
Token ID1263
Feature activation+6.13e-03

Pos logit contributions
 regret+0.016
 lose+0.015
 vet+0.014
 harm+0.014
inging+0.014
Neg logit contributions
aney-0.026
 contain-0.023
Jane-0.022
ill-0.021
 Kate-0.021
Token back
Token ID736
Feature activation+9.28e-03

Pos logit contributions
aney+0.002
 contain+0.002
Jane+0.001
 Jane+0.001
 yes+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
inging-0.001
 Owl-0.001
 harm-0.001
Token to
Token ID284
Feature activation-2.48e-03

Pos logit contributions
 regret+0.015
 vet+0.015
inging+0.014
 tou+0.014
 difference+0.014
Neg logit contributions
aney-0.016
 contain-0.014
 Jane-0.013
Jane-0.013
 Kate-0.013
Token the
Token ID262
Feature activation-2.48e-03

Pos logit contributions
aney+0.004
 contain+0.004
 Jane+0.004
Jane+0.004
 Kate+0.004
Neg logit contributions
inging-0.004
 vet-0.003
 regret-0.003
 Owl-0.003
 veterinarian-0.003
Token same
Token ID976
Feature activation-1.95e-02

Pos logit contributions
aney+0.005
 contain+0.005
Jane+0.004
 yes+0.004
 Kate+0.004
Neg logit contributions
 regret-0.003
inging-0.003
 Owl-0.003
 harm-0.003
 vet-0.003
Token tr
Token ID491
Feature activation-2.16e-02

Pos logit contributions
aney+0.042
 contain+0.036
Jane+0.034
 yes+0.032
ill+0.032
Neg logit contributions
 regret-0.028
 vet-0.026
 harm-0.025
 veterinarian-0.024
 lose-0.023
Tokenamp
Token ID696
Feature activation-6.51e-02

Pos logit contributions
aney+0.040
 contain+0.035
 yes+0.032
Jane+0.032
 Jane+0.031
Neg logit contributions
 regret-0.034
 vet-0.032
 harm-0.032
inging-0.032
 lose-0.031
Tokenoline
Token ID14453
Feature activation-1.85e-02

Pos logit contributions
aney+0.098
 contain+0.087
 yes+0.079
uits+0.071
days+0.070
Neg logit contributions
 harm-0.116
 regret-0.114
 veterinarian-0.110
 vet-0.110
 Owl-0.105
Token again
Token ID757
Feature activation+7.36e-03

Pos logit contributions
aney+0.039
 contain+0.033
Jane+0.032
mos+0.031
year+0.031
Neg logit contributions
 regret-0.025
 vet-0.023
 veterinarian-0.022
 harm-0.022
 lose-0.022
Token.
Token ID13
Feature activation+2.06e-03

Pos logit contributions
 regret+0.009
 vet+0.009
 harm+0.008
 lose+0.008
 Owl+0.008
Neg logit contributions
aney-0.016
 contain-0.014
Jane-0.013
 yes-0.013
 Kate-0.013
Token<|endoftext|>
Token ID50256
Feature activation-5.31e-03

Pos logit contributions
 regret+0.003
 vet+0.003
 harm+0.003
 lose+0.003
 Owl+0.003
Neg logit contributions
aney-0.004
 contain-0.003
Jane-0.003
 Jane-0.003
 Kate-0.003
TokenOnce
Token ID7454
Feature activation-1.32e-02

Pos logit contributions
aney+0.012
 contain+0.011
Jane+0.011
 Kate+0.010
 Jane+0.010
Neg logit contributions
 regret-0.006
 vet-0.005
 harm-0.005
 lose-0.005
inging-0.005
Token playing
Token ID2712
Feature activation-2.07e-03

Pos logit contributions
aney+0.010
 contain+0.009
Jane+0.008
ill+0.008
 yes+0.008
Neg logit contributions
 regret-0.009
 vet-0.008
 veterinarian-0.008
 lose-0.008
 Owl-0.007
Token outside
Token ID2354
Feature activation-4.72e-03

Pos logit contributions
aney+0.005
 contain+0.005
ill+0.004
Jane+0.004
year+0.004
Neg logit contributions
 regret-0.002
 vet-0.002
 veterinarian-0.002
 stitch-0.002
 Owl-0.002
Token on
Token ID319
Feature activation-1.03e-02

Pos logit contributions
aney+0.013
 contain+0.011
Jane+0.010
 yes+0.010
uits+0.010
Neg logit contributions
 regret-0.004
 harm-0.004
 vet-0.004
 Owl-0.004
 veterinarian-0.003
Token the
Token ID262
Feature activation+3.03e-03

Pos logit contributions
aney+0.023
 contain+0.020
 yes+0.018
Jane+0.018
year+0.018
Neg logit contributions
 regret-0.014
 vet-0.013
 harm-0.012
 Owl-0.012
 Nut-0.012
Token tr
Token ID491
Feature activation-2.40e-02

Pos logit contributions
 regret+0.004
 vet+0.004
 harm+0.004
 veterinarian+0.004
 Owl+0.004
Neg logit contributions
aney-0.006
 contain-0.006
Jane-0.005
 yes-0.005
 Kate-0.005
Tokenamp
Token ID696
Feature activation-6.32e-02

Pos logit contributions
aney+0.044
 contain+0.040
 yes+0.039
Jane+0.035
 Kate+0.034
Neg logit contributions
 regret-0.036
inging-0.035
 vet-0.033
 harm-0.033
 Owl-0.033
Tokenoline
Token ID14453
Feature activation-1.29e-02

Pos logit contributions
 yes+0.096
aney+0.095
 contain+0.090
nam+0.079
 YES+0.076
Neg logit contributions
 harm-0.107
 regret-0.100
 damage-0.097
inging-0.095
 vet-0.095
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
aney+0.035
 contain+0.030
Jane+0.029
year+0.028
mos+0.027
Neg logit contributions
 vet-0.009
 regret-0.009
 veterinarian-0.009
 harm-0.008
 Owl-0.008
Token He
Token ID679
Feature activation-3.31e-03

Pos logit contributions
aney+0.028
 contain+0.026
 yes+0.024
Jane+0.024
days+0.024
Neg logit contributions
 regret-0.011
 vet-0.011
 Nut-0.010
 veterinarian-0.010
 lose-0.010
Token was
Token ID373
Feature activation-1.94e-03

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.006
 yes+0.006
days+0.006
Neg logit contributions
 regret-0.004
 lose-0.004
 vet-0.004
 harm-0.004
 wishes-0.003
Token excited
Token ID6568
Feature activation-9.18e-03

Pos logit contributions
aney+0.004
 contain+0.004
 yes+0.003
days+0.003
mos+0.003
Neg logit contributions
 regret-0.003
 vet-0.002
 trouble-0.002
 lose-0.002
 Owl-0.002
Token and
Token ID290
Feature activation-1.94e-02

Pos logit contributions
aney+0.020
 contain+0.017
Jane+0.016
mos+0.016
days+0.016
Neg logit contributions
 regret-0.007
 Owl-0.006
 vet-0.006
 veterinarian-0.006
 Nut-0.006
Token down
Token ID866
Feature activation-5.55e-03

Pos logit contributions
aney+0.039
 contain+0.033
Jane+0.032
 yes+0.030
ill+0.030
Neg logit contributions
 regret-0.029
 vet-0.027
 lose-0.026
 harm-0.026
 Owl-0.026
Token on
Token ID319
Feature activation-1.16e-02

Pos logit contributions
aney+0.013
 contain+0.012
Jane+0.011
mos+0.011
shirt+0.010
Neg logit contributions
 regret-0.006
 vet-0.005
 harm-0.005
 lose-0.005
 Owl-0.005
Token the
Token ID262
Feature activation-5.21e-03

Pos logit contributions
aney+0.026
 contain+0.023
Jane+0.021
year+0.021
days+0.021
Neg logit contributions
 regret-0.014
 vet-0.014
 harm-0.013
 trouble-0.013
 Owl-0.013
Token tr
Token ID491
Feature activation-2.77e-02

Pos logit contributions
aney+0.012
Jane+0.010
year+0.010
ill+0.009
mos+0.009
Neg logit contributions
 regret-0.006
 damage-0.006
 lose-0.006
 veterinarian-0.006
 harm-0.006
Tokenamp
Token ID696
Feature activation-6.31e-02

Pos logit contributions
aney+0.052
 contain+0.049
 yes+0.047
Jane+0.042
 Jane+0.041
Neg logit contributions
 regret-0.039
inging-0.039
 vet-0.036
 harm-0.036
 lose-0.036
Tokenoline
Token ID14453
Feature activation-1.75e-02

Pos logit contributions
 yes+0.098
aney+0.097
 contain+0.096
nam+0.087
year+0.078
Neg logit contributions
 harm-0.102
 damage-0.092
 regret-0.092
inging-0.089
 vet-0.089
Token.
Token ID13
Feature activation-3.97e-03

Pos logit contributions
aney+0.043
 contain+0.038
Jane+0.037
year+0.035
mos+0.034
Neg logit contributions
 regret-0.016
 vet-0.015
 harm-0.014
 lose-0.014
 veterinarian-0.014
Token
Token ID198
Feature activation-6.56e-03

Pos logit contributions
aney+0.008
 contain+0.008
 yes+0.008
days+0.007
 seas+0.007
Neg logit contributions
 vet-0.004
 regret-0.004
 Bad-0.004
 wishes-0.004
 Owl-0.004
Token
Token ID198
Feature activation-9.49e-03

Pos logit contributions
aney+0.017
 contain+0.017
 yes+0.015
 apr+0.015
mos+0.015
Neg logit contributions
 vet-0.004
 veterinarian-0.003
 regret-0.003
 harm-0.003
 Owl-0.003
Token"
Token ID1
Feature activation-9.06e-03

Pos logit contributions
aney+0.019
 contain+0.019
 seas+0.017
 yes+0.017
 apr+0.017
Neg logit contributions
 vet-0.011
 veterinarian-0.010
inging-0.009
 owners-0.009
 Bad-0.009
Token to
Token ID284
Feature activation-1.47e-03

Pos logit contributions
 regret+0.007
 vet+0.007
 harm+0.007
 lose+0.007
 veterinarian+0.007
Neg logit contributions
aney-0.011
 contain-0.010
Jane-0.009
ill-0.009
 Kate-0.009
Token jump
Token ID4391
Feature activation-1.34e-02

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
 Kate+0.002
 Jane+0.002
Neg logit contributions
 regret-0.002
 vet-0.002
 Owl-0.002
 harm-0.002
inging-0.002
Token on
Token ID319
Feature activation-1.93e-02

Pos logit contributions
aney+0.036
 contain+0.031
Jane+0.030
year+0.029
days+0.029
Neg logit contributions
 will-0.010
 lose-0.010
 Nut-0.009
 dog-0.009
 veterinarian-0.008
Token the
Token ID262
Feature activation+4.74e-04

Pos logit contributions
aney+0.048
 contain+0.044
year+0.040
ill+0.038
Jane+0.038
Neg logit contributions
 regret-0.019
 vet-0.018
 trouble-0.018
 harm-0.018
 veterinarian-0.018
Token tr
Token ID491
Feature activation-2.13e-02

Pos logit contributions
 dog+0.000
 vet+0.000
 wishes+0.000
 will+0.000
 veterinarian+0.000
Neg logit contributions
aney-0.001
Jane-0.001
 contain-0.001
ill-0.001
year-0.001
Tokenamp
Token ID696
Feature activation-6.29e-02

Pos logit contributions
aney+0.039
 contain+0.036
 yes+0.034
Jane+0.032
 Jane+0.030
Neg logit contributions
inging-0.030
 regret-0.030
 vet-0.029
 harm-0.029
 Owl-0.028
Tokenoline
Token ID14453
Feature activation-2.31e-02

Pos logit contributions
 contain+0.101
 yes+0.100
aney+0.100
nam+0.085
 YES+0.081
Neg logit contributions
 harm-0.100
 veterinarian-0.092
 vet-0.091
 mailbox-0.089
 Owl-0.088
Token,
Token ID11
Feature activation-1.06e-02

Pos logit contributions
aney+0.059
 contain+0.051
Jane+0.050
year+0.048
mos+0.047
Neg logit contributions
 vet-0.019
 regret-0.019
 veterinarian-0.019
 lose-0.018
 harm-0.018
Token but
Token ID475
Feature activation+1.61e-04

Pos logit contributions
aney+0.023
Jane+0.018
 contain+0.018
days+0.017
year+0.017
Neg logit contributions
 regret-0.015
 vet-0.014
 veterinarian-0.014
 lose-0.014
 Owl-0.013
Token Lily
Token ID20037
Feature activation+5.34e-04

Pos logit contributions
 regret+0.000
 vet+0.000
 lose+0.000
 harm+0.000
 veterinarian+0.000
Neg logit contributions
aney-0.000
 contain-0.000
Jane-0.000
ill-0.000
mos-0.000
Token wanted
Token ID2227
Feature activation-3.92e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 difference+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
 Kate-0.001
Token old
Token ID1468
Feature activation-1.49e-02

Pos logit contributions
 regret+0.001
 Owl+0.001
 vet+0.001
 harm+0.001
 lose+0.001
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.002
 yes-0.001
 Jane-0.001
Token lady
Token ID10846
Feature activation-1.30e-02

Pos logit contributions
aney+0.036
 contain+0.032
Jane+0.031
 yes+0.029
 Kate+0.029
Neg logit contributions
 regret-0.015
 vet-0.014
 harm-0.013
 veterinarian-0.013
 Owl-0.013
Token with
Token ID351
Feature activation-3.94e-04

Pos logit contributions
aney+0.028
 contain+0.025
Jane+0.024
 yes+0.022
 Kate+0.022
Neg logit contributions
 regret-0.016
 Owl-0.015
 vet-0.015
 harm-0.015
 veterinarian-0.015
Token a
Token ID257
Feature activation+1.64e-02

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
ill+0.001
 Kate+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 harm-0.001
 lose-0.001
 Owl-0.001
Token mis
Token ID2984
Feature activation+7.74e-03

Pos logit contributions
 regret+0.022
 vet+0.021
 harm+0.020
 veterinarian+0.020
 lose+0.020
Neg logit contributions
aney-0.034
 contain-0.030
Jane-0.029
 Kate-0.027
ill-0.027
Tokenchie
Token ID3043
Feature activation-6.26e-02

Pos logit contributions
 regret+0.014
 harm+0.014
 vet+0.014
 Owl+0.013
 lose+0.013
Neg logit contributions
aney-0.012
 contain-0.010
 Kate-0.010
 yes-0.009
 Jane-0.009
Tokenvous
Token ID31222
Feature activation+8.58e-03

Pos logit contributions
aney+0.120
 contain+0.101
uits+0.101
ill+0.099
 yes+0.099
Neg logit contributions
 harm-0.084
 knows-0.079
 regret-0.078
 wishes-0.077
 will-0.074
Token tw
Token ID665
Feature activation-5.02e-03

Pos logit contributions
 regret+0.012
 vet+0.011
 harm+0.011
 veterinarian+0.011
 lose+0.011
Neg logit contributions
aney-0.018
 contain-0.016
Jane-0.015
 Kate-0.014
ill-0.014
Tokeninkle
Token ID19894
Feature activation+2.58e-03

Pos logit contributions
aney+0.012
 contain+0.011
Jane+0.010
 Kate+0.010
 Jane+0.010
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.005
 lose-0.005
inging-0.005
Token in
Token ID287
Feature activation-5.18e-03

Pos logit contributions
 regret+0.003
 vet+0.003
 harm+0.003
 Owl+0.003
 lose+0.003
Neg logit contributions
aney-0.006
 contain-0.005
Jane-0.005
 Kate-0.004
 yes-0.004
Token her
Token ID607
Feature activation+1.02e-02

Pos logit contributions
aney+0.011
 contain+0.009
Jane+0.009
 yes+0.008
ill+0.008
Neg logit contributions
 regret-0.007
 vet-0.007
 harm-0.007
 lose-0.007
 veterinarian-0.006
Token and
Token ID290
Feature activation-9.64e-03

Pos logit contributions
aney+0.074
 contain+0.066
shirt+0.064
year+0.060
Jane+0.059
Neg logit contributions
 regret-0.010
 trouble-0.010
 pets-0.009
 will-0.009
ising-0.009
Token jumped
Token ID11687
Feature activation+3.50e-03

Pos logit contributions
aney+0.017
 contain+0.014
Jane+0.013
ill+0.013
year+0.012
Neg logit contributions
 regret-0.016
 vet-0.015
 lose-0.015
 Owl-0.015
 Nut-0.015
Token on
Token ID319
Feature activation-1.60e-02

Pos logit contributions
 regret+0.004
 veterinarian+0.003
 vet+0.003
 lose+0.003
 Nut+0.003
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.007
days-0.007
year-0.007
Token the
Token ID262
Feature activation+1.79e-03

Pos logit contributions
aney+0.037
 contain+0.033
Jane+0.030
ill+0.030
year+0.030
Neg logit contributions
 regret-0.019
 vet-0.018
 veterinarian-0.018
 trouble-0.017
 harm-0.017
Token tr
Token ID491
Feature activation-2.26e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 veterinarian+0.002
 lose+0.002
Neg logit contributions
aney-0.004
 contain-0.003
Jane-0.003
 Kate-0.003
 yes-0.003
Tokenamp
Token ID696
Feature activation-6.22e-02

Pos logit contributions
aney+0.040
 contain+0.037
 yes+0.034
Jane+0.032
 Kate+0.031
Neg logit contributions
 regret-0.035
 vet-0.034
inging-0.034
 harm-0.033
 veterinarian-0.032
Tokenoline
Token ID14453
Feature activation-1.64e-02

Pos logit contributions
 contain+0.092
aney+0.091
 yes+0.090
nam+0.074
year+0.073
Neg logit contributions
 harm-0.109
 vet-0.102
 veterinarian-0.100
 damage-0.099
 regret-0.098
Token.
Token ID13
Feature activation-6.25e-03

Pos logit contributions
aney+0.041
 contain+0.036
Jane+0.035
year+0.034
shirt+0.033
Neg logit contributions
 regret-0.014
 vet-0.014
 veterinarian-0.014
 harm-0.013
 lose-0.013
Token Lily
Token ID20037
Feature activation-3.32e-05

Pos logit contributions
aney+0.013
 contain+0.012
Jane+0.011
ill+0.011
 yes+0.011
Neg logit contributions
 regret-0.008
 vet-0.008
 veterinarian-0.007
 lose-0.007
 Nut-0.007
Token was
Token ID373
Feature activation+3.97e-03

Pos logit contributions
aney+0.000
 contain+0.000
Jane+0.000
year+0.000
 yes+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 lose-0.000
 harm-0.000
 wishes-0.000
Token so
Token ID523
Feature activation-1.27e-05

Pos logit contributions
 regret+0.007
 lose+0.006
 vet+0.006
 veterinarian+0.006
 harm+0.006
Neg logit contributions
aney-0.007
 contain-0.006
Jane-0.006
mos-0.005
ill-0.005
Token went
Token ID1816
Feature activation+1.71e-03

Pos logit contributions
aney+0.018
 contain+0.015
Jane+0.014
 yes+0.014
 Kate+0.014
Neg logit contributions
 regret-0.015
 vet-0.014
 harm-0.014
 Owl-0.014
 lose-0.014
Token on
Token ID319
Feature activation+4.20e-04

Pos logit contributions
 regret+0.003
 vet+0.002
 harm+0.002
 Owl+0.002
 lose+0.002
Neg logit contributions
aney-0.003
 contain-0.003
Jane-0.003
 yes-0.003
 Kate-0.003
Token with
Token ID351
Feature activation+4.57e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 veterinarian+0.001
 Owl+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
ill-0.001
 yes-0.001
Token his
Token ID465
Feature activation+1.54e-02

Pos logit contributions
 regret+0.008
 vet+0.007
 harm+0.007
 Owl+0.007
 lose+0.007
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.006
ill-0.006
 yes-0.006
Token mis
Token ID2984
Feature activation-2.93e-03

Pos logit contributions
 regret+0.018
 vet+0.016
 harm+0.015
inging+0.015
 lose+0.015
Neg logit contributions
aney-0.034
 contain-0.031
 yes-0.029
Jane-0.029
 Kate-0.028
Tokenchie
Token ID3043
Feature activation-6.21e-02

Pos logit contributions
aney+0.005
 contain+0.004
 yes+0.004
 Kate+0.004
 Jane+0.004
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.005
inging-0.005
 veterinarian-0.005
Tokenvous
Token ID31222
Feature activation-1.03e-03

Pos logit contributions
aney+0.115
 contain+0.102
 yes+0.099
uits+0.099
 seas+0.098
Neg logit contributions
 harm-0.090
 regret-0.083
 vet-0.079
check-0.078
 trouble-0.076
Token ways
Token ID2842
Feature activation+5.54e-03

Pos logit contributions
aney+0.002
 contain+0.002
Jane+0.002
ill+0.002
 Kate+0.002
Neg logit contributions
 regret-0.001
 vet-0.001
 harm-0.001
 Owl-0.001
 lose-0.001
Token,
Token ID11
Feature activation-9.17e-03

Pos logit contributions
 regret+0.007
 vet+0.007
 harm+0.007
 Owl+0.006
 lose+0.006
Neg logit contributions
aney-0.012
 contain-0.010
Jane-0.010
 yes-0.009
 Kate-0.009
Token causing
Token ID6666
Feature activation-5.74e-04

Pos logit contributions
aney+0.016
 contain+0.014
Jane+0.013
 Kate+0.012
ill+0.012
Neg logit contributions
 regret-0.016
 vet-0.015
 harm-0.014
 Owl-0.014
 lose-0.014
Token problems
Token ID2761
Feature activation+2.05e-02

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 Jane+0.001
 Kate+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
inging-0.001
 veterinarian-0.001
 Owl-0.001
Token bounce
Token ID20110
Feature activation-1.12e-03

Pos logit contributions
aney+0.013
Jane+0.012
year+0.012
 contain+0.011
mos+0.011
Neg logit contributions
 regret-0.004
 vet-0.004
 lose-0.004
 veterinarian-0.004
 Nut-0.004
Token on
Token ID319
Feature activation-1.78e-02

Pos logit contributions
aney+0.003
 contain+0.002
Jane+0.002
year+0.002
 Kate+0.002
Neg logit contributions
 regret-0.001
 lose-0.001
 vet-0.001
 harm-0.001
 Nut-0.001
Token the
Token ID262
Feature activation-1.21e-03

Pos logit contributions
aney+0.040
 contain+0.035
Jane+0.033
year+0.033
days+0.031
Neg logit contributions
 regret-0.022
 vet-0.021
 harm-0.020
 lose-0.019
 trouble-0.019
Token big
Token ID1263
Feature activation-2.13e-02

Pos logit contributions
aney+0.002
 contain+0.002
Jane+0.002
 yes+0.002
 Kate+0.002
Neg logit contributions
 regret-0.002
 vet-0.002
 harm-0.002
 Owl-0.002
 veterinarian-0.002
Token tr
Token ID491
Feature activation-2.69e-02

Pos logit contributions
aney+0.048
 contain+0.041
Jane+0.041
year+0.041
 yes+0.039
Neg logit contributions
 regret-0.025
 vet-0.023
 harm-0.023
 lose-0.023
 veterinarian-0.023
Tokenamp
Token ID696
Feature activation-6.20e-02

Pos logit contributions
aney+0.049
 contain+0.045
 yes+0.042
Jane+0.039
 Kate+0.038
Neg logit contributions
 regret-0.041
inging-0.040
 vet-0.039
 harm-0.038
 Owl-0.038
Tokenoline
Token ID14453
Feature activation-1.45e-02

Pos logit contributions
aney+0.094
 contain+0.090
 yes+0.087
 YES+0.071
nam+0.069
Neg logit contributions
 harm-0.108
 vet-0.101
 regret-0.101
 veterinarian-0.101
 damage-0.099
Token in
Token ID287
Feature activation-1.22e-02

Pos logit contributions
aney+0.041
Jane+0.035
 contain+0.034
 Jane+0.033
 Kate+0.033
Neg logit contributions
 vet-0.009
 regret-0.008
 veterinarian-0.008
 lose-0.007
 harm-0.007
Token their
Token ID511
Feature activation-1.42e-03

Pos logit contributions
aney+0.027
 contain+0.024
 yes+0.022
year+0.021
Jane+0.021
Neg logit contributions
 regret-0.015
 vet-0.014
 harm-0.014
 lose-0.014
 trouble-0.013
Token backyard
Token ID24296
Feature activation-4.32e-03

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.002
 Kate+0.002
 yes+0.002
Neg logit contributions
 regret-0.002
 vet-0.002
 harm-0.002
 Owl-0.002
 veterinarian-0.002
Token.
Token ID13
Feature activation-2.52e-03

Pos logit contributions
aney+0.012
 contain+0.010
Jane+0.010
year+0.010
 Kate+0.009
Neg logit contributions
 vet-0.003
 regret-0.003
 veterinarian-0.003
 difference-0.003
inging-0.003
Token to
Token ID284
Feature activation+7.20e-03

Pos logit contributions
aney+0.012
 contain+0.011
Jane+0.010
days+0.010
year+0.010
Neg logit contributions
 vet-0.004
 regret-0.004
 trouble-0.004
 harm-0.004
 veterinarian-0.004
Token bounce
Token ID20110
Feature activation-1.40e-03

Pos logit contributions
 regret+0.008
 vet+0.007
 lose+0.007
 harm+0.007
 veterinarian+0.007
Neg logit contributions
aney-0.017
 contain-0.015
Jane-0.015
 yes-0.014
 Kate-0.014
Token on
Token ID319
Feature activation-1.44e-02

Pos logit contributions
aney+0.004
year+0.003
 contain+0.003
Jane+0.003
days+0.003
Neg logit contributions
 lose-0.002
 regret-0.001
 Nut-0.001
 harm-0.001
 trouble-0.001
Token the
Token ID262
Feature activation+4.52e-03

Pos logit contributions
aney+0.034
 contain+0.030
year+0.029
days+0.028
Jane+0.027
Neg logit contributions
 regret-0.016
 vet-0.015
 trouble-0.015
 Nut-0.015
 veterinarian-0.014
Token tr
Token ID491
Feature activation-2.42e-02

Pos logit contributions
 regret+0.006
 vet+0.006
 harm+0.006
 veterinarian+0.006
 lose+0.006
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.008
 Kate-0.007
 yes-0.007
Tokenamp
Token ID696
Feature activation-6.19e-02

Pos logit contributions
aney+0.046
 contain+0.043
 yes+0.042
Jane+0.038
 Kate+0.037
Neg logit contributions
inging-0.033
 regret-0.032
 vet-0.031
 harm-0.030
 lose-0.030
Tokenoline
Token ID14453
Feature activation-1.80e-02

Pos logit contributions
 yes+0.102
 contain+0.101
aney+0.100
nam+0.089
year+0.083
Neg logit contributions
 harm-0.096
 vet-0.085
 veterinarian-0.085
 dog-0.083
 damage-0.082
Token,
Token ID11
Feature activation+3.58e-03

Pos logit contributions
aney+0.049
Jane+0.042
 contain+0.041
year+0.041
mos+0.040
Neg logit contributions
 regret-0.012
 vet-0.012
 veterinarian-0.012
 lose-0.011
 harm-0.011
Token but
Token ID475
Feature activation+4.07e-03

Pos logit contributions
 regret+0.005
 lose+0.004
 vet+0.004
 Nut+0.004
 Owl+0.004
Neg logit contributions
aney-0.008
year-0.006
Jane-0.006
mos-0.006
 contain-0.006
Token Mia
Token ID32189
Feature activation-3.78e-03

Pos logit contributions
 regret+0.005
 lose+0.005
 vet+0.005
 will+0.005
 wishes+0.004
Neg logit contributions
aney-0.009
 contain-0.008
year-0.007
mos-0.007
 Lots-0.007
Token was
Token ID373
Feature activation+3.62e-03

Pos logit contributions
aney+0.007
 contain+0.006
Jane+0.006
 yes+0.006
 Kate+0.005
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.005
 Owl-0.005
 veterinarian-0.005
Token Jack
Token ID3619
Feature activation-9.37e-03

Pos logit contributions
 regret+0.004
 vet+0.004
 Owl+0.004
 wishes+0.004
 promises+0.004
Neg logit contributions
aney-0.006
 contain-0.006
year-0.005
 seas-0.005
 yes-0.005
Token jumped
Token ID11687
Feature activation-5.37e-03

Pos logit contributions
aney+0.019
 contain+0.017
Jane+0.015
year+0.015
days+0.015
Neg logit contributions
 regret-0.014
 vet-0.012
 Owl-0.012
 veterinarian-0.012
 lose-0.012
Token on
Token ID319
Feature activation-1.40e-02

Pos logit contributions
aney+0.013
 contain+0.011
Jane+0.011
days+0.011
mos+0.010
Neg logit contributions
 regret-0.005
 vet-0.005
 veterinarian-0.005
 lose-0.005
inging-0.005
Token the
Token ID262
Feature activation-6.41e-03

Pos logit contributions
aney+0.034
 contain+0.029
year+0.028
days+0.027
mos+0.027
Neg logit contributions
 regret-0.015
 vet-0.014
 Owl-0.014
 veterinarian-0.013
 harm-0.013
Token tr
Token ID491
Feature activation-2.19e-02

Pos logit contributions
aney+0.014
year+0.011
Jane+0.011
 contain+0.011
ill+0.011
Neg logit contributions
 regret-0.008
 dog-0.008
 lose-0.008
 hurts-0.008
 damage-0.008
Tokenamp
Token ID696
Feature activation-6.14e-02

Pos logit contributions
aney+0.041
 contain+0.039
 yes+0.037
Jane+0.034
 Jane+0.033
Neg logit contributions
inging-0.031
 regret-0.030
 vet-0.029
 Owl-0.028
 harm-0.028
Tokenoline
Token ID14453
Feature activation-1.89e-02

Pos logit contributions
 contain+0.100
aney+0.099
 yes+0.099
nam+0.091
year+0.082
Neg logit contributions
 harm-0.095
inging-0.082
 dog-0.082
 arrest-0.080
 Owl-0.080
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
aney+0.049
 contain+0.042
Jane+0.042
year+0.040
mos+0.040
Neg logit contributions
 regret-0.015
 vet-0.014
 veterinarian-0.013
 lose-0.013
 Owl-0.013
Token shouted
Token ID17293
Feature activation-1.25e-02

Pos logit contributions
aney+0.031
 contain+0.026
mos+0.024
year+0.024
days+0.024
Neg logit contributions
 regret-0.020
 lose-0.018
 Nut-0.017
 vet-0.017
 Cle-0.017
Token,
Token ID11
Feature activation-6.64e-03

Pos logit contributions
aney+0.032
 contain+0.030
Jane+0.028
ill+0.026
days+0.026
Neg logit contributions
 regret-0.010
 lose-0.009
 veterinarian-0.009
 vet-0.009
inging-0.009
Token "
Token ID366
Feature activation-6.06e-03

Pos logit contributions
aney+0.014
 contain+0.012
Jane+0.011
ill+0.011
days+0.011
Neg logit contributions
 regret-0.009
 vet-0.009
 Owl-0.009
 veterinarian-0.009
inging-0.008
Token and
Token ID290
Feature activation-2.06e-02

Pos logit contributions
aney+0.033
 contain+0.029
Jane+0.028
 yes+0.027
 Kate+0.027
Neg logit contributions
 regret-0.015
 vet-0.014
 harm-0.014
 Owl-0.014
 lose-0.014
Token higher
Token ID2440
Feature activation-5.04e-03

Pos logit contributions
aney+0.040
 contain+0.033
Jane+0.031
 yes+0.030
 Kate+0.030
Neg logit contributions
 regret-0.032
 vet-0.030
 lose-0.030
 harm-0.029
 Owl-0.028
Token on
Token ID319
Feature activation-7.31e-03

Pos logit contributions
aney+0.011
 contain+0.009
Jane+0.009
 yes+0.009
 Kate+0.008
Neg logit contributions
 regret-0.007
 vet-0.006
 harm-0.006
 Owl-0.006
 lose-0.006
Token the
Token ID262
Feature activation-3.78e-03

Pos logit contributions
aney+0.014
 contain+0.013
Jane+0.012
 yes+0.011
 Kate+0.011
Neg logit contributions
 regret-0.011
 vet-0.010
 harm-0.010
 Owl-0.010
 lose-0.010
Token tr
Token ID491
Feature activation-2.34e-02

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.007
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.005
 lose-0.004
 veterinarian-0.004
Tokenamp
Token ID696
Feature activation-6.14e-02

Pos logit contributions
aney+0.044
 contain+0.040
 yes+0.038
Jane+0.035
 Jane+0.034
Neg logit contributions
 regret-0.035
inging-0.034
 vet-0.033
 harm-0.033
 lose-0.032
Tokenoline
Token ID14453
Feature activation-1.70e-02

Pos logit contributions
aney+0.096
 contain+0.089
 yes+0.087
nam+0.077
year+0.072
Neg logit contributions
 harm-0.104
 regret-0.097
 vet-0.095
 veterinarian-0.094
 damage-0.094
Token.
Token ID13
Feature activation+2.27e-03

Pos logit contributions
aney+0.041
 contain+0.036
Jane+0.035
year+0.033
 yes+0.033
Neg logit contributions
 regret-0.016
 vet-0.016
 harm-0.015
 veterinarian-0.015
 lose-0.015
Token
Token ID220
Feature activation+7.24e-03

Pos logit contributions
 regret+0.003
 vet+0.003
 Owl+0.003
 harm+0.003
 veterinarian+0.003
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
 yes-0.004
 Kate-0.004
Token
Token ID198
Feature activation-5.03e-04

Pos logit contributions
 regret+0.011
 harm+0.010
 vet+0.010
 lose+0.009
 Owl+0.009
Neg logit contributions
aney-0.014
 contain-0.012
Jane-0.012
 Kate-0.012
 Jane-0.012
Token
Token ID198
Feature activation+2.68e-03

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 yes+0.001
ill+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 veterinarian-0.000
 Owl-0.000
 harm-0.000
Token,
Token ID11
Feature activation-3.16e-02

Pos logit contributions
aney+0.059
 contain+0.052
Jane+0.050
year+0.047
mos+0.047
Neg logit contributions
 regret-0.033
 lose-0.031
 harm-0.030
 vet-0.029
 Owl-0.029
Tokenâ
Token ID22940
Feature activation+9.61e-03

Pos logit contributions
aney+0.067
 contain+0.059
Jane+0.055
ill+0.052
 Kate+0.052
Neg logit contributions
 regret-0.043
 vet-0.039
 lose-0.038
 harm-0.037
 Owl-0.037
Token€
Token ID26391
Feature activation-9.16e-04

Pos logit contributions
 regret+0.017
 vet+0.016
 harm+0.016
 Owl+0.016
 veterinarian+0.015
Neg logit contributions
aney-0.016
 contain-0.014
Jane-0.013
 yes-0.012
 Jane-0.012
Token said
Token ID531
Feature activation-1.53e-02

Pos logit contributions
aney+0.002
Jane+0.002
 contain+0.002
 yes+0.001
 Jane+0.001
Neg logit contributions
 regret-0.001
 harm-0.001
 vet-0.001
 Owl-0.001
 damage-0.001
Token the
Token ID262
Feature activation-1.48e-02

Pos logit contributions
aney+0.035
 contain+0.031
Jane+0.029
ill+0.028
 Kate+0.028
Neg logit contributions
 regret-0.018
 vet-0.017
 lose-0.016
 harm-0.016
 Owl-0.016
Token 3
Token ID513
Feature activation-6.12e-02

Pos logit contributions
aney+0.034
 contain+0.030
Jane+0.028
 yes+0.028
 Kate+0.028
Neg logit contributions
 regret-0.017
 vet-0.015
inging-0.015
 harm-0.015
 lose-0.015
Token year
Token ID614
Feature activation-2.01e-02

Pos logit contributions
 contain+0.119
aney+0.115
uits+0.111
ill+0.107
 yes+0.105
Neg logit contributions
 wishes-0.075
 regret-0.075
 harm-0.074
 dog-0.073
 wolf-0.071
Token old
Token ID1468
Feature activation-8.75e-03

Pos logit contributions
aney+0.048
 contain+0.043
Jane+0.041
 yes+0.039
ill+0.039
Neg logit contributions
 regret-0.021
 vet-0.020
 harm-0.019
 veterinarian-0.019
 Owl-0.019
Token.
Token ID13
Feature activation-2.19e-02

Pos logit contributions
aney+0.021
 contain+0.019
Jane+0.018
 yes+0.017
ill+0.017
Neg logit contributions
 regret-0.009
 vet-0.008
 harm-0.008
 Owl-0.008
 veterinarian-0.008
Token
Token ID198
Feature activation-1.01e-02

Pos logit contributions
aney+0.048
 contain+0.044
Jane+0.040
 yes+0.038
ill+0.038
Neg logit contributions
 regret-0.027
 vet-0.026
 veterinarian-0.025
 Owl-0.024
 Nut-0.024
Token
Token ID198
Feature activation-1.27e-02

Pos logit contributions
aney+0.023
 contain+0.021
Jane+0.019
 yes+0.019
 Kate+0.019
Neg logit contributions
 regret-0.012
 vet-0.011
 harm-0.010
 veterinarian-0.010
 Owl-0.010
Token fun
Token ID1257
Feature activation+9.30e-03

Pos logit contributions
 regret+0.037
 Owl+0.037
 vet+0.036
 client+0.035
inging+0.035
Neg logit contributions
aney-0.031
 contain-0.028
Jane-0.027
 Jane-0.025
 yes-0.025
Token bouncing
Token ID30713
Feature activation+4.46e-04

Pos logit contributions
 regret+0.013
 vet+0.012
 harm+0.012
 lose+0.012
 Owl+0.012
Neg logit contributions
aney-0.019
 contain-0.017
Jane-0.016
 yes-0.015
 Kate-0.015
Token on
Token ID319
Feature activation-2.98e-03

Pos logit contributions
 regret+0.001
 harm+0.000
 lose+0.000
 vet+0.000
 Nut+0.000
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
days-0.001
mos-0.001
Token the
Token ID262
Feature activation-4.73e-03

Pos logit contributions
aney+0.006
 contain+0.005
Jane+0.005
 yes+0.005
ill+0.005
Neg logit contributions
 regret-0.004
 vet-0.004
 harm-0.004
 Owl-0.004
 veterinarian-0.004
Token tr
Token ID491
Feature activation-2.55e-02

Pos logit contributions
aney+0.009
 contain+0.007
Jane+0.007
year+0.007
ill+0.007
Neg logit contributions
 regret-0.007
 lose-0.007
 harm-0.007
 hurts-0.007
 damage-0.007
Tokenamp
Token ID696
Feature activation-6.11e-02

Pos logit contributions
aney+0.049
 contain+0.046
 yes+0.044
Jane+0.039
 Jane+0.038
Neg logit contributions
inging-0.035
 regret-0.035
 harm-0.033
 Owl-0.032
 vet-0.032
Tokenoline
Token ID14453
Feature activation-1.53e-02

Pos logit contributions
 contain+0.093
aney+0.092
 yes+0.088
nam+0.075
 goggles+0.071
Neg logit contributions
 harm-0.103
 veterinarian-0.094
 regret-0.094
 Owl-0.092
 damage-0.092
Token,
Token ID11
Feature activation-7.47e-03

Pos logit contributions
aney+0.037
 contain+0.032
Jane+0.031
year+0.029
mos+0.029
Neg logit contributions
 regret-0.016
 vet-0.015
 harm-0.014
 veterinarian-0.014
 lose-0.014
Token but
Token ID475
Feature activation+1.31e-02

Pos logit contributions
aney+0.015
 contain+0.013
Jane+0.012
ill+0.012
 Kate+0.012
Neg logit contributions
 regret-0.011
 vet-0.010
 harm-0.010
 lose-0.010
 Owl-0.009
Token soon
Token ID2582
Feature activation+6.56e-04

Pos logit contributions
 regret+0.018
 vet+0.017
 harm+0.017
 veterinarian+0.016
 Owl+0.016
Neg logit contributions
aney-0.027
 contain-0.024
Jane-0.023
 Kate-0.022
 Jane-0.022
Token it
Token ID340
Feature activation-2.33e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 lose+0.001
 harm+0.001
 trouble+0.001
Neg logit contributions
aney-0.002
 contain-0.001
Jane-0.001
days-0.001
mos-0.001
Token She
Token ID1375
Feature activation+1.17e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 Owl+0.002
 lose+0.002
Neg logit contributions
aney-0.004
 contain-0.004
Jane-0.004
 yes-0.003
 Kate-0.003
Token falls
Token ID8953
Feature activation-3.30e-04

Pos logit contributions
 Owl+0.014
 vet+0.014
 veterinarian+0.013
 Nut+0.013
 regret+0.013
Neg logit contributions
aney-0.023
 contain-0.022
 Kate-0.020
 yes-0.020
 goggles-0.020
Token off
Token ID572
Feature activation+5.81e-03

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 Kate+0.001
 yes+0.001
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 veterinarian-0.000
 Owl-0.000
Token the
Token ID262
Feature activation-7.73e-03

Pos logit contributions
 regret+0.008
 vet+0.007
 harm+0.007
 veterinarian+0.007
 Owl+0.007
Neg logit contributions
aney-0.013
 contain-0.011
Jane-0.010
 yes-0.010
mos-0.010
Token tr
Token ID491
Feature activation-2.28e-02

Pos logit contributions
aney+0.015
Jane+0.013
 contain+0.012
year+0.012
ill+0.012
Neg logit contributions
 regret-0.012
 lose-0.011
 harm-0.011
 vet-0.011
 damage-0.011
Tokenamp
Token ID696
Feature activation-6.08e-02

Pos logit contributions
aney+0.042
 contain+0.038
 yes+0.036
Jane+0.034
 Jane+0.032
Neg logit contributions
 regret-0.035
inging-0.033
 vet-0.033
 harm-0.032
 lose-0.032
Tokenoline
Token ID14453
Feature activation-6.55e-03

Pos logit contributions
aney+0.092
 yes+0.088
 contain+0.088
nam+0.073
year+0.071
Neg logit contributions
 harm-0.105
inging-0.096
 regret-0.095
 damage-0.095
 vet-0.095
Token.
Token ID13
Feature activation+4.82e-03

Pos logit contributions
aney+0.014
 contain+0.013
Jane+0.012
 yes+0.012
 Kate+0.012
Neg logit contributions
 regret-0.008
 vet-0.008
 harm-0.007
 lose-0.007
 veterinarian-0.007
Token She
Token ID1375
Feature activation+8.67e-03

Pos logit contributions
 regret+0.006
 harm+0.005
 vet+0.005
 Owl+0.005
 lose+0.005
Neg logit contributions
aney-0.011
 contain-0.010
Jane-0.009
 Jane-0.009
 Kate-0.009
Token lands
Token ID8604
Feature activation+1.13e-02

Pos logit contributions
 vet+0.009
 Owl+0.008
 veterinarian+0.008
 regret+0.008
 Nut+0.008
Neg logit contributions
aney-0.019
 contain-0.018
 Kate-0.017
 yes-0.017
Jane-0.016
Token on
Token ID319
Feature activation-1.48e-02

Pos logit contributions
 regret+0.014
 vet+0.013
 harm+0.013
 lose+0.013
 veterinarian+0.013
Neg logit contributions
aney-0.025
 contain-0.022
Jane-0.021
 Kate-0.020
 yes-0.020
Token they
Token ID484
Feature activation-1.84e-02

Pos logit contributions
 regret+0.021
 vet+0.020
 harm+0.019
 veterinarian+0.019
 lose+0.019
Neg logit contributions
aney-0.025
 contain-0.022
Jane-0.021
 Jane-0.020
 yes-0.020
Token jumped
Token ID11687
Feature activation-8.08e-03

Pos logit contributions
aney+0.037
 contain+0.032
Jane+0.031
 Kate+0.028
 yes+0.028
Neg logit contributions
 regret-0.026
 vet-0.023
 will-0.022
 veterinarian-0.022
 lose-0.022
Token on
Token ID319
Feature activation-1.05e-02

Pos logit contributions
aney+0.020
 contain+0.017
Jane+0.017
days+0.016
ill+0.016
Neg logit contributions
 regret-0.008
 vet-0.008
 lose-0.007
inging-0.007
 veterinarian-0.007
Token the
Token ID262
Feature activation-1.13e-02

Pos logit contributions
aney+0.022
 contain+0.019
Jane+0.018
ill+0.017
days+0.016
Neg logit contributions
 regret-0.015
 vet-0.014
 Owl-0.014
 harm-0.014
 Nut-0.014
Token tr
Token ID491
Feature activation-2.97e-02

Pos logit contributions
aney+0.023
 contain+0.019
Jane+0.019
ill+0.018
 yes+0.017
Neg logit contributions
 regret-0.017
 vet-0.016
 harm-0.015
 lose-0.015
 veterinarian-0.015
Tokenamp
Token ID696
Feature activation-6.08e-02

Pos logit contributions
aney+0.059
 contain+0.055
 yes+0.052
Jane+0.050
 Jane+0.048
Neg logit contributions
 regret-0.040
inging-0.039
 vet-0.037
 harm-0.037
 lose-0.036
Tokenoline
Token ID14453
Feature activation-1.63e-02

Pos logit contributions
aney+0.090
 contain+0.085
 yes+0.084
nam+0.073
year+0.067
Neg logit contributions
 harm-0.108
 regret-0.101
inging-0.098
 vet-0.097
 damage-0.096
Token and
Token ID290
Feature activation-1.47e-02

Pos logit contributions
aney+0.039
 contain+0.035
Jane+0.034
year+0.032
mos+0.031
Neg logit contributions
 regret-0.016
 vet-0.015
 veterinarian-0.015
 Owl-0.014
 harm-0.014
Token ran
Token ID4966
Feature activation-1.05e-02

Pos logit contributions
aney+0.029
 contain+0.024
Jane+0.022
ill+0.022
 yes+0.021
Neg logit contributions
 regret-0.023
 Nut-0.021
 lose-0.020
 vet-0.020
 harm-0.020
Token races
Token ID9558
Feature activation-9.98e-03

Pos logit contributions
aney+0.024
 contain+0.021
Jane+0.020
 yes+0.019
ill+0.019
Neg logit contributions
 regret-0.012
 vet-0.011
 lose-0.011
 veterinarian-0.011
 harm-0.011
Token in
Token ID287
Feature activation-2.09e-02

Pos logit contributions
aney+0.022
 contain+0.019
Jane+0.018
 yes+0.017
mos+0.017
Neg logit contributions
 regret-0.012
 harm-0.011
 lose-0.011
 vet-0.011
inging-0.011
Token stepped
Token ID10764
Feature activation-2.81e-03

Pos logit contributions
aney+0.003
 contain+0.002
Jane+0.002
days+0.002
ill+0.002
Neg logit contributions
 vet-0.002
 regret-0.002
 veterinarian-0.001
 will-0.001
 lose-0.001
Token carefully
Token ID7773
Feature activation-9.34e-04

Pos logit contributions
aney+0.006
 contain+0.006
Jane+0.005
 Kate+0.005
 Jane+0.005
Neg logit contributions
 regret-0.003
 vet-0.003
 veterinarian-0.003
 Owl-0.003
 harm-0.003
Token onto
Token ID4291
Feature activation+3.13e-03

Pos logit contributions
aney+0.002
Jane+0.002
 contain+0.002
 Kate+0.002
days+0.002
Neg logit contributions
 regret-0.001
 vet-0.001
 lose-0.001
 harm-0.001
inging-0.001
Token the
Token ID262
Feature activation+2.96e-04

Pos logit contributions
 regret+0.004
 vet+0.004
 Owl+0.004
 harm+0.004
 veterinarian+0.004
Neg logit contributions
aney-0.007
 contain-0.006
Jane-0.005
days-0.005
year-0.005
Token tr
Token ID491
Feature activation-2.48e-02

Pos logit contributions
 regret+0.000
 lose+0.000
 harm+0.000
 Owl+0.000
 vet+0.000
Neg logit contributions
aney-0.001
Jane-0.000
 contain-0.000
ill-0.000
year-0.000
Tokenamp
Token ID696
Feature activation-6.00e-02

Pos logit contributions
aney+0.047
 contain+0.043
 yes+0.041
Jane+0.038
 Jane+0.037
Neg logit contributions
inging-0.035
 regret-0.035
 vet-0.033
 Owl-0.033
 harm-0.033
Tokenoline
Token ID14453
Feature activation-1.55e-02

Pos logit contributions
aney+0.099
 contain+0.094
 yes+0.093
nam+0.085
year+0.079
Neg logit contributions
 harm-0.095
inging-0.085
 dog-0.085
 Owl-0.084
 vet-0.083
Token.
Token ID13
Feature activation-2.67e-03

Pos logit contributions
aney+0.038
 contain+0.034
Jane+0.033
year+0.031
mos+0.031
Neg logit contributions
 regret-0.014
 vet-0.014
 veterinarian-0.013
 Owl-0.012
 harm-0.012
Token As
Token ID1081
Feature activation-2.25e-04

Pos logit contributions
aney+0.006
 contain+0.005
 yes+0.005
Jane+0.005
days+0.005
Neg logit contributions
 regret-0.003
 vet-0.003
 Owl-0.003
 veterinarian-0.003
 Nut-0.003
Token soon
Token ID2582
Feature activation+1.33e-03

Pos logit contributions
aney+0.000
 contain+0.000
mos+0.000
Jane+0.000
 yes+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 Owl-0.000
 lose-0.000
Token as
Token ID355
Feature activation-7.58e-04

Pos logit contributions
 regret+0.002
 harm+0.002
 vet+0.002
 lose+0.002
 damage+0.002
Neg logit contributions
aney-0.003
 contain-0.002
Jane-0.002
mos-0.002
days-0.002
Token at
Token ID379
Feature activation+2.94e-03

Pos logit contributions
 regret+0.010
 vet+0.010
 harm+0.009
 Owl+0.009
 lose+0.009
Neg logit contributions
aney-0.018
 contain-0.016
Jane-0.015
 yes-0.015
 Kate-0.014
Token him
Token ID683
Feature activation+4.70e-04

Pos logit contributions
 regret+0.005
 vet+0.004
 harm+0.004
 veterinarian+0.004
 lose+0.004
Neg logit contributions
aney-0.005
 contain-0.005
Jane-0.004
 Jane-0.004
 Kate-0.004
Token with
Token ID351
Feature activation+1.11e-02

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
 Kate-0.001
Token a
Token ID257
Feature activation+1.39e-02

Pos logit contributions
 regret+0.018
 vet+0.017
 harm+0.017
 Owl+0.017
 lose+0.016
Neg logit contributions
aney-0.020
 contain-0.017
Jane-0.016
 yes-0.015
 Kate-0.015
Token mis
Token ID2984
Feature activation+6.11e-03

Pos logit contributions
 regret+0.021
 vet+0.019
 harm+0.018
 lose+0.018
 Owl+0.018
Neg logit contributions
aney-0.028
 contain-0.024
Jane-0.023
 Kate-0.022
ill-0.021
Tokenchie
Token ID3043
Feature activation-5.99e-02

Pos logit contributions
 regret+0.012
 vet+0.011
 harm+0.011
inging+0.011
 lose+0.011
Neg logit contributions
aney-0.009
 contain-0.007
 yes-0.007
 Kate-0.007
Jane-0.007
Tokenvous
Token ID31222
Feature activation+1.50e-02

Pos logit contributions
aney+0.107
uits+0.101
 yes+0.098
 contain+0.096
 cubes+0.095
Neg logit contributions
 harm-0.082
 regret-0.077
 wishes-0.076
 knows-0.076
 hurts-0.070
Token grin
Token ID16770
Feature activation+3.63e-03

Pos logit contributions
 regret+0.024
 vet+0.022
 harm+0.022
 Owl+0.021
 lose+0.021
Neg logit contributions
aney-0.028
 contain-0.024
Jane-0.023
 yes-0.021
 Kate-0.021
Token.
Token ID13
Feature activation+3.23e-03

Pos logit contributions
 regret+0.005
 vet+0.005
 harm+0.005
 veterinarian+0.005
 Owl+0.005
Neg logit contributions
aney-0.007
 contain-0.006
Jane-0.006
 yes-0.006
 Kate-0.006
Token She
Token ID1375
Feature activation-6.24e-03

Pos logit contributions
 regret+0.005
 harm+0.004
 lose+0.004
 vet+0.004
inging+0.004
Neg logit contributions
aney-0.006
 contain-0.005
Jane-0.005
 Jane-0.005
 Kate-0.005
Token said
Token ID531
Feature activation+2.81e-03

Pos logit contributions
aney+0.013
 contain+0.011
Jane+0.010
days+0.010
ill+0.010
Neg logit contributions
 regret-0.009
 vet-0.008
 Owl-0.008
 veterinarian-0.008
 harm-0.008
TokenThey
Token ID2990
Feature activation-1.25e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 veterinarian+0.002
 harm+0.002
 Owl+0.002
Neg logit contributions
aney-0.003
 contain-0.003
Jane-0.003
 Kate-0.003
 yes-0.003
Token climbed
Token ID19952
Feature activation+1.24e-02

Pos logit contributions
aney+0.023
 contain+0.020
Jane+0.019
 Kate+0.018
 yes+0.018
Neg logit contributions
 regret-0.020
 vet-0.018
 lose-0.018
 harm-0.018
 Owl-0.017
Token off
Token ID572
Feature activation+1.24e-03

Pos logit contributions
 regret+0.017
 vet+0.016
 harm+0.016
 Owl+0.015
 lose+0.015
Neg logit contributions
aney-0.025
 contain-0.022
Jane-0.021
 yes-0.020
 Kate-0.020
Token the
Token ID262
Feature activation+7.07e-05

Pos logit contributions
 regret+0.002
 vet+0.001
 lose+0.001
 harm+0.001
 Owl+0.001
Neg logit contributions
aney-0.003
 contain-0.002
Jane-0.002
 yes-0.002
mos-0.002
Token tr
Token ID491
Feature activation-2.46e-02

Pos logit contributions
 regret+0.000
 vet+0.000
 harm+0.000
 lose+0.000
 Owl+0.000
Neg logit contributions
aney-0.000
 contain-0.000
Jane-0.000
 yes-0.000
ill-0.000
Tokenamp
Token ID696
Feature activation-5.94e-02

Pos logit contributions
aney+0.047
 contain+0.043
 yes+0.040
Jane+0.038
 Jane+0.037
Neg logit contributions
 regret-0.036
inging-0.035
 vet-0.034
 harm-0.033
 Owl-0.032
Tokenoline
Token ID14453
Feature activation-1.25e-02

Pos logit contributions
aney+0.093
 contain+0.081
 yes+0.079
nam+0.071
year+0.067
Neg logit contributions
 harm-0.105
 regret-0.100
 vet-0.096
 damage-0.096
inging-0.095
Token and
Token ID290
Feature activation-1.86e-02

Pos logit contributions
aney+0.029
 contain+0.026
Jane+0.025
year+0.024
 yes+0.024
Neg logit contributions
 regret-0.013
 vet-0.013
 veterinarian-0.012
 Owl-0.012
 harm-0.011
Token hugged
Token ID41130
Feature activation+5.39e-03

Pos logit contributions
aney+0.034
 contain+0.029
Jane+0.027
 yes+0.026
ill+0.026
Neg logit contributions
 regret-0.030
 vet-0.028
 lose-0.027
 harm-0.026
 Nut-0.026
Token each
Token ID1123
Feature activation-2.12e-02

Pos logit contributions
 regret+0.009
 vet+0.008
 harm+0.008
 Owl+0.008
 lose+0.008
Neg logit contributions
aney-0.010
 contain-0.009
Jane-0.008
 yes-0.008
 Kate-0.008
Token other
Token ID584
Feature activation-2.43e-02

Pos logit contributions
aney+0.053
 contain+0.048
Jane+0.046
days+0.045
ill+0.045
Neg logit contributions
 regret-0.018
 vet-0.017
 harm-0.017
 veterinarian-0.016
 wishes-0.015
Token\n
Token ID198
Feature activation+4.81e-03

Pos logit contributions
 regret+0.005
 vet+0.004
 harm+0.004
 Owl+0.004
 veterinarian+0.004
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.008
 yes-0.008
 Kate-0.007
TokenSuddenly
Token ID38582
Feature activation+4.14e-03

Pos logit contributions
 regret+0.007
 vet+0.007
 veterinarian+0.006
 Owl+0.006
 harm+0.006
Neg logit contributions
aney-0.010
 contain-0.009
 yes-0.008
 Kate-0.008
Jane-0.008
Token,
Token ID11
Feature activation+5.25e-03

Pos logit contributions
 regret+0.005
 vet+0.005
 harm+0.005
 lose+0.005
 Owl+0.005
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.007
 yes-0.007
 Jane-0.007
Token the
Token ID262
Feature activation-4.27e-03

Pos logit contributions
 regret+0.009
 vet+0.008
 Owl+0.008
 harm+0.008
 veterinarian+0.008
Neg logit contributions
aney-0.010
 contain-0.008
Jane-0.008
 yes-0.007
ill-0.007
Token tr
Token ID491
Feature activation-1.38e-02

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.008
 yes+0.007
 Kate+0.007
Neg logit contributions
 regret-0.006
 vet-0.005
 harm-0.005
 Owl-0.005
 veterinarian-0.005
Tokenamp
Token ID696
Feature activation-5.89e-02

Pos logit contributions
aney+0.024
 contain+0.021
Jane+0.019
 yes+0.019
 Kate+0.018
Neg logit contributions
 regret-0.023
 vet-0.022
 harm-0.021
inging-0.021
 Owl-0.021
Tokenoline
Token ID14453
Feature activation-3.32e-03

Pos logit contributions
aney+0.088
 contain+0.083
 yes+0.079
days+0.065
nam+0.064
Neg logit contributions
 harm-0.102
 regret-0.100
 damage-0.094
 Owl-0.094
 vet-0.094
Token collapsed
Token ID14707
Feature activation+1.30e-02

Pos logit contributions
aney+0.007
 contain+0.006
Jane+0.005
 Kate+0.005
 yes+0.005
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.004
 veterinarian-0.004
 lose-0.004
Token beneath
Token ID11061
Feature activation-7.53e-03

Pos logit contributions
 regret+0.016
 vet+0.015
 harm+0.014
 Owl+0.014
 veterinarian+0.014
Neg logit contributions
aney-0.029
 contain-0.026
Jane-0.024
 yes-0.023
 Kate-0.023
Token her
Token ID607
Feature activation+1.34e-02

Pos logit contributions
aney+0.016
 contain+0.014
days+0.012
Jane+0.012
 yes+0.012
Neg logit contributions
 regret-0.011
 vet-0.010
 harm-0.010
 Nut-0.009
 Owl-0.009
Token with
Token ID351
Feature activation-9.19e-03

Pos logit contributions
 regret+0.023
 vet+0.022
 Owl+0.021
 veterinarian+0.021
 harm+0.021
Neg logit contributions
aney-0.023
 contain-0.020
Jane-0.018
 yes-0.018
 Kate-0.017
Token pick
Token ID2298
Feature activation+1.43e-02

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
 yes+0.007
 Kate+0.007
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.005
 Owl-0.005
 veterinarian-0.005
Token a
Token ID257
Feature activation+6.76e-03

Pos logit contributions
 regret+0.021
 vet+0.020
 harm+0.019
 lose+0.019
inging+0.019
Neg logit contributions
aney-0.028
 contain-0.024
Jane-0.023
 Jane-0.023
 Kate-0.023
Token few
Token ID1178
Feature activation-1.19e-02

Pos logit contributions
 regret+0.010
 vet+0.009
 harm+0.009
 Owl+0.009
 veterinarian+0.009
Neg logit contributions
aney-0.013
 contain-0.012
Jane-0.011
 yes-0.011
 Kate-0.011
Token of
Token ID286
Feature activation+7.34e-04

Pos logit contributions
aney+0.026
 contain+0.023
Jane+0.021
ill+0.021
 yes+0.021
Neg logit contributions
 regret-0.015
 harm-0.014
 vet-0.014
 damage-0.014
 wishes-0.013
Token the
Token ID262
Feature activation+4.17e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 veterinarian+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
ill-0.001
Token herbs
Token ID25411
Feature activation+2.54e-02

Pos logit contributions
 regret+0.006
 vet+0.005
 lose+0.005
inging+0.005
 Owl+0.005
Neg logit contributions
aney-0.008
 contain-0.007
 yes-0.007
Jane-0.007
 Jane-0.007
Token and
Token ID290
Feature activation+5.37e-03

Pos logit contributions
 regret+0.033
 vet+0.031
 harm+0.030
 Owl+0.029
 lose+0.029
Neg logit contributions
aney-0.054
 contain-0.048
Jane-0.045
 Kate-0.043
 yes-0.043
Token put
Token ID1234
Feature activation+8.31e-03

Pos logit contributions
 regret+0.008
 vet+0.007
inging+0.007
 harm+0.007
 Owl+0.007
Neg logit contributions
aney-0.010
 contain-0.009
 Jane-0.008
Jane-0.008
 yes-0.008
Token them
Token ID606
Feature activation-6.15e-03

Pos logit contributions
 regret+0.013
 vet+0.012
 harm+0.012
 Owl+0.012
 veterinarian+0.012
Neg logit contributions
aney-0.015
 contain-0.014
Jane-0.012
 yes-0.012
 Kate-0.012
Token into
Token ID656
Feature activation-8.85e-04

Pos logit contributions
aney+0.015
Jane+0.013
 contain+0.013
 Lots+0.012
 Kate+0.012
Neg logit contributions
 regret-0.006
 lose-0.005
 promises-0.005
 harm-0.005
 will-0.005
Token her
Token ID607
Feature activation+6.26e-03

Pos logit contributions
aney+0.002
 contain+0.002
Jane+0.001
days+0.001
ill+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 harm-0.001
 veterinarian-0.001
 Owl-0.001
Token to
Token ID284
Feature activation-3.58e-03

Pos logit contributions
 regret+0.013
inging+0.012
 vet+0.012
 lose+0.012
 veterinarian+0.012
Neg logit contributions
aney-0.013
 contain-0.012
 Jane-0.011
 yes-0.011
Jane-0.011
Token find
Token ID1064
Feature activation+1.84e-02

Pos logit contributions
aney+0.007
 contain+0.007
Jane+0.006
 yes+0.006
 Jane+0.006
Neg logit contributions
inging-0.005
 regret-0.004
 unnecessary-0.004
athy-0.004
 Owl-0.004
Token one
Token ID530
Feature activation-3.83e-03

Pos logit contributions
 regret+0.037
inging+0.036
 lose+0.035
 vet+0.034
 client+0.034
Neg logit contributions
aney-0.024
 contain-0.019
 Jane-0.019
Jane-0.019
 Kate-0.019
Token that
Token ID326
Feature activation+5.59e-04

Pos logit contributions
aney+0.007
 contain+0.006
Jane+0.006
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.005
 Owl-0.005
 veterinarian-0.005
Token liked
Token ID8288
Feature activation-1.83e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
 Kate-0.001
Token le
Token ID443
Feature activation+2.34e-02

Pos logit contributions
aney+0.004
 contain+0.003
Jane+0.003
 yes+0.003
 Kate+0.003
Neg logit contributions
 regret-0.003
 vet-0.003
 harm-0.002
 Owl-0.002
 lose-0.002
Tokenmons
Token ID11567
Feature activation+1.51e-02

Pos logit contributions
 regret+0.036
 difference+0.032
 wishes+0.031
 lose+0.031
 trouble+0.030
Neg logit contributions
aney-0.042
cr-0.039
ill-0.038
mos-0.037
 yes-0.035
Token and
Token ID290
Feature activation+1.32e-02

Pos logit contributions
 regret+0.019
 vet+0.017
 harm+0.017
 lose+0.017
 Owl+0.017
Neg logit contributions
aney-0.033
 contain-0.029
Jane-0.028
 Kate-0.027
 Jane-0.026
Token liked
Token ID8288
Feature activation+3.48e-03

Pos logit contributions
 regret+0.020
inging+0.019
 vet+0.019
 veterinarian+0.018
 harm+0.018
Neg logit contributions
aney-0.023
 contain-0.021
 yes-0.019
Jane-0.019
 Jane-0.019
Token them
Token ID606
Feature activation-2.53e-03

Pos logit contributions
 regret+0.005
 vet+0.005
 harm+0.005
 Owl+0.005
 lose+0.005
Neg logit contributions
aney-0.007
 contain-0.006
Jane-0.005
 yes-0.005
 Kate-0.005
Token too
Token ID1165
Feature activation-8.84e-03

Pos logit contributions
aney+0.006
Jane+0.005
 contain+0.005
 Kate+0.005
 Lots+0.005
Neg logit contributions
 regret-0.003
 Owl-0.003
 veterinarian-0.002
 lose-0.002
 harm-0.002
Token kind
Token ID1611
Feature activation+1.68e-02

Pos logit contributions
 regret+0.012
 harm+0.012
inging+0.012
 lose+0.011
 unnecessary+0.011
Neg logit contributions
aney-0.014
 yes-0.012
 contain-0.012
Jane-0.012
 Jane-0.011
Token girl
Token ID2576
Feature activation+1.34e-02

Pos logit contributions
 regret+0.020
 harm+0.019
 lose+0.018
 unnecessary+0.018
inging+0.018
Neg logit contributions
aney-0.037
 contain-0.033
Jane-0.032
 Jane-0.031
 yes-0.030
Token named
Token ID3706
Feature activation-1.85e-02

Pos logit contributions
 regret+0.024
 vet+0.023
 harm+0.022
 lose+0.022
inging+0.022
Neg logit contributions
aney-0.021
 contain-0.018
Jane-0.017
 Jane-0.016
 Kate-0.016
Token Lily
Token ID20037
Feature activation+2.59e-02

Pos logit contributions
aney+0.037
 contain+0.036
 yes+0.031
uits+0.031
year+0.031
Neg logit contributions
 regret-0.023
 vet-0.022
 veterinarian-0.022
 lose-0.021
 Owl-0.021
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
 regret+0.040
 vet+0.039
inging+0.038
 unnecessary+0.037
 lose+0.037
Neg logit contributions
aney-0.047
 contain-0.040
Jane-0.039
 Jane-0.037
ill-0.037
Token She
Token ID1375
Feature activation+2.70e-02

Pos logit contributions
aney+0.032
 contain+0.030
Jane+0.027
 yes+0.027
mos+0.026
Neg logit contributions
 regret-0.008
 vet-0.007
 veterinarian-0.007
 Owl-0.007
 lose-0.006
Token loved
Token ID6151
Feature activation+1.52e-02

Pos logit contributions
 regret+0.049
 vet+0.047
 harm+0.047
inging+0.047
 difference+0.046
Neg logit contributions
aney-0.041
 contain-0.035
Jane-0.033
 yes-0.032
 Kate-0.031
Token to
Token ID284
Feature activation+3.11e-03

Pos logit contributions
 regret+0.023
 vet+0.022
 harm+0.021
 Owl+0.021
 lose+0.021
Neg logit contributions
aney-0.029
 contain-0.025
Jane-0.024
 yes-0.023
 Kate-0.022
Token play
Token ID711
Feature activation+4.55e-03

Pos logit contributions
 regret+0.003
 vet+0.003
 lose+0.003
 veterinarian+0.003
 harm+0.003
Neg logit contributions
aney-0.008
Jane-0.006
 contain-0.006
ill-0.006
year-0.006
Token outside
Token ID2354
Feature activation-3.76e-03

Pos logit contributions
 regret+0.006
 vet+0.006
 harm+0.006
 veterinarian+0.006
 Owl+0.006
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.008
 yes-0.007
 Kate-0.007
Token with
Token ID351
Feature activation+8.22e-03

Pos logit contributions
aney+0.010
 contain+0.009
Jane+0.008
uits+0.008
ill+0.008
Neg logit contributions
 regret-0.003
 veterinarian-0.003
 Owl-0.003
 vet-0.003
 harm-0.003
Token them
Token ID606
Feature activation+5.81e-03

Pos logit contributions
 regret+0.014
 vet+0.013
 harm+0.012
 Owl+0.012
 lose+0.012
Neg logit contributions
aney-0.017
 contain-0.015
Jane-0.014
 yes-0.013
 Kate-0.013
Token out
Token ID503
Feature activation-9.98e-03

Pos logit contributions
 regret+0.009
 vet+0.009
 veterinarian+0.008
 Owl+0.008
 harm+0.008
Neg logit contributions
aney-0.011
 contain-0.010
 yes-0.009
Jane-0.009
 Kate-0.009
Token.
Token ID13
Feature activation-1.52e-02

Pos logit contributions
aney+0.023
 contain+0.020
Jane+0.019
mos+0.018
shirt+0.018
Neg logit contributions
 regret-0.012
 harm-0.011
 vet-0.010
 Owl-0.010
 damage-0.010
Token They
Token ID1119
Feature activation-1.53e-03

Pos logit contributions
aney+0.037
 contain+0.035
days+0.032
mos+0.032
year+0.031
Neg logit contributions
 vet-0.014
 veterinarian-0.013
 regret-0.012
 Bad-0.011
 Cle-0.011
Token were
Token ID547
Feature activation+1.57e-03

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
shirt+0.002
rd+0.002
Neg logit contributions
 regret-0.002
 lose-0.002
 vet-0.002
 Owl-0.002
 parent-0.002
Token vegetables
Token ID13701
Feature activation+3.97e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 Owl+0.002
 lose+0.002
Neg logit contributions
aney-0.003
 contain-0.003
Jane-0.002
 yes-0.002
 Kate-0.002
Token!
Token ID0
Feature activation-1.27e-02

Pos logit contributions
 regret+0.064
 vet+0.062
check+0.062
 client+0.060
 harm+0.060
Neg logit contributions
aney-0.068
 contain-0.062
 yes-0.058
ill-0.056
 Jane-0.053
Token They
Token ID1119
Feature activation-3.26e-03

Pos logit contributions
aney+0.030
 contain+0.029
 yes+0.025
Jane+0.025
days+0.025
Neg logit contributions
 vet-0.013
 veterinarian-0.012
 regret-0.011
 Cle-0.010
 trouble-0.010
Token were
Token ID547
Feature activation+2.86e-03

Pos logit contributions
aney+0.006
 contain+0.005
Jane+0.005
 Kate+0.005
shirt+0.005
Neg logit contributions
 regret-0.005
 lose-0.004
 vet-0.004
 Owl-0.004
 harm-0.004
Token carrots
Token ID34397
Feature activation+3.93e-02

Pos logit contributions
 regret+0.005
 vet+0.004
 harm+0.004
 Owl+0.004
 lose+0.004
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
 yes-0.004
 Kate-0.004
Token,
Token ID11
Feature activation+6.86e-03

Pos logit contributions
 regret+0.062
 vet+0.057
 harm+0.057
check+0.056
 client+0.055
Neg logit contributions
aney-0.073
 contain-0.065
 yes-0.060
ill-0.059
Jane-0.058
Token imagine
Token ID5967
Feature activation+3.79e-03

Pos logit contributions
 regret+0.008
 vet+0.008
 Owl+0.008
 harm+0.007
 veterinarian+0.007
Neg logit contributions
aney-0.016
 contain-0.015
Jane-0.014
 yes-0.014
ill-0.013
Token next
Token ID1306
Feature activation-1.09e-02

Pos logit contributions
 regret+0.005
 harm+0.004
 vet+0.004
 Owl+0.004
 lose+0.004
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.007
 yes-0.006
ill-0.006
Token.
Token ID13
Feature activation+5.61e-03

Pos logit contributions
aney+0.026
 contain+0.023
Jane+0.022
 yes+0.021
 Kate+0.021
Neg logit contributions
 regret-0.011
 harm-0.011
 vet-0.011
 veterinarian-0.010
 Owl-0.010
Token<|endoftext|>
Token ID50256
Feature activation-1.00e-02

Pos logit contributions
 harm+0.009
 regret+0.009
 lose+0.009
ising+0.008
inging+0.008
Neg logit contributions
aney-0.010
 Jane-0.009
Jane-0.009
 Kate-0.008
 Jill-0.008
TokenTom
Token ID13787
Feature activation+2.77e-03

Pos logit contributions
aney+0.021
 contain+0.019
Jane+0.018
 Kate+0.017
 Jane+0.017
Neg logit contributions
 regret-0.013
 vet-0.012
 harm-0.012
 lose-0.011
 Owl-0.011
Token and
Token ID290
Feature activation+2.33e-02

Pos logit contributions
 regret+0.004
 vet+0.004
 harm+0.004
 veterinarian+0.004
inging+0.004
Neg logit contributions
aney-0.005
 contain-0.005
Jane-0.004
ill-0.004
 Jane-0.004
Token Lily
Token ID20037
Feature activation+1.86e-02

Pos logit contributions
 regret+0.036
 vet+0.035
 Owl+0.033
 veterinarian+0.033
 lose+0.033
Neg logit contributions
aney-0.044
 contain-0.038
Jane-0.035
ill-0.034
 yes-0.034
Token were
Token ID547
Feature activation+1.45e-03

Pos logit contributions
 vet+0.032
 unnecessary+0.031
 regret+0.031
 harm+0.031
 Owl+0.030
Neg logit contributions
aney-0.031
 contain-0.026
 Jane-0.025
ill-0.025
Jane-0.025
Token twins
Token ID20345
Feature activation+1.09e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 veterinarian+0.002
 Owl+0.002
Neg logit contributions
aney-0.003
 contain-0.003
Jane-0.002
 yes-0.002
 Kate-0.002
Token who
Token ID508
Feature activation+2.10e-02

Pos logit contributions
 regret+0.015
 vet+0.014
 harm+0.013
 veterinarian+0.013
 Owl+0.013
Neg logit contributions
aney-0.023
 contain-0.020
Jane-0.019
ill-0.018
 Kate-0.018
Token liked
Token ID8288
Feature activation-2.26e-03

Pos logit contributions
 regret+0.039
 vet+0.037
 harm+0.036
 Owl+0.036
 veterinarian+0.036
Neg logit contributions
aney-0.033
 contain-0.028
Jane-0.026
 yes-0.025
 Kate-0.025
Token was
Token ID373
Feature activation+7.71e-04

Pos logit contributions
aney+0.032
 contain+0.029
Jane+0.026
 yes+0.025
ill+0.025
Neg logit contributions
 regret-0.020
 vet-0.018
 Owl-0.017
 harm-0.017
 veterinarian-0.017
Token a
Token ID257
Feature activation+1.22e-02

Pos logit contributions
 regret+0.001
 vet+0.001
 Owl+0.001
 veterinarian+0.001
 harm+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
ill-0.001
 yes-0.001
Token little
Token ID1310
Feature activation+1.37e-02

Pos logit contributions
 regret+0.020
 harm+0.019
inging+0.019
 vet+0.019
 lose+0.019
Neg logit contributions
aney-0.020
 contain-0.018
 yes-0.017
Jane-0.016
 Kate-0.016
Token,
Token ID11
Feature activation+3.12e-03

Pos logit contributions
check+0.021
 regret+0.020
inging+0.020
 harm+0.020
 vet+0.020
Neg logit contributions
aney-0.023
 contain-0.020
 yes-0.019
 Jane-0.019
 Kate-0.019
Token ugly
Token ID13400
Feature activation-3.51e-03

Pos logit contributions
 regret+0.005
 vet+0.005
 harm+0.005
 Owl+0.005
 veterinarian+0.004
Neg logit contributions
aney-0.006
 contain-0.005
Jane-0.005
 Kate-0.004
ill-0.004
Token bear
Token ID6842
Feature activation+1.68e-02

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
 yes+0.007
 Kate+0.007
Neg logit contributions
 regret-0.003
 vet-0.003
 harm-0.003
 Owl-0.003
 veterinarian-0.003
Token and
Token ID290
Feature activation+4.07e-03

Pos logit contributions
 regret+0.024
 vet+0.023
 harm+0.022
 lose+0.022
inging+0.022
Neg logit contributions
aney-0.032
 contain-0.029
Jane-0.027
 yes-0.026
 Kate-0.026
Token had
Token ID550
Feature activation-1.03e-02

Pos logit contributions
 regret+0.006
 vet+0.006
 Owl+0.006
 harm+0.006
 lose+0.006
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.006
 yes-0.006
ill-0.006
Token only
Token ID691
Feature activation+1.20e-03

Pos logit contributions
aney+0.019
 contain+0.016
Jane+0.015
ill+0.014
days+0.014
Neg logit contributions
 regret-0.018
 vet-0.017
 harm-0.016
 lose-0.016
 veterinarian-0.016
Token a
Token ID257
Feature activation+2.09e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 lose+0.002
 Owl+0.002
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.002
 yes-0.002
 Kate-0.002
Token few
Token ID1178
Feature activation-1.30e-02

Pos logit contributions
 regret+0.031
 harm+0.030
 vet+0.029
check+0.029
inging+0.029
Neg logit contributions
aney-0.037
 contain-0.032
 yes-0.031
Jane-0.029
 Kate-0.029
Token she
Token ID673
Feature activation+5.94e-04

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.007
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.005
 Owl-0.005
 lose-0.005
Token searched
Token ID16499
Feature activation+1.93e-02

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 veterinarian+0.001
Neg logit contributions
aney-0.001
 contain-0.001
Jane-0.001
 yes-0.001
 Kate-0.001
Token for
Token ID329
Feature activation+9.60e-03

Pos logit contributions
 regret+0.025
 harm+0.022
 vet+0.022
 lose+0.021
 veterinarian+0.021
Neg logit contributions
aney-0.043
 contain-0.037
Jane-0.036
 Kate-0.034
ill-0.034
Token but
Token ID475
Feature activation+6.75e-03

Pos logit contributions
 regret+0.014
 harm+0.013
 vet+0.013
 trouble+0.013
 damage+0.012
Neg logit contributions
aney-0.020
 contain-0.017
Jane-0.015
year-0.015
days-0.015
Token never
Token ID1239
Feature activation-7.72e-03

Pos logit contributions
 regret+0.010
 lose+0.008
 vet+0.008
 harm+0.008
 Owl+0.008
Neg logit contributions
aney-0.014
 contain-0.012
Jane-0.011
 yes-0.011
mos-0.011
Token found
Token ID1043
Feature activation+1.56e-02

Pos logit contributions
aney+0.015
 contain+0.013
Jane+0.013
 yes+0.012
 Kate+0.012
Neg logit contributions
 regret-0.012
 vet-0.011
 harm-0.010
 lose-0.010
 Owl-0.010
Token the
Token ID262
Feature activation+1.00e-02

Pos logit contributions
 regret+0.025
 vet+0.024
 harm+0.023
 Owl+0.023
 lose+0.022
Neg logit contributions
aney-0.029
 contain-0.025
Jane-0.023
 yes-0.022
 Kate-0.022
Token same
Token ID976
Feature activation-8.26e-03

Pos logit contributions
 regret+0.014
 vet+0.013
inging+0.013
 lose+0.013
 harm+0.013
Neg logit contributions
aney-0.019
 contain-0.017
Jane-0.016
 yes-0.016
 Jane-0.015
Token leaves
Token ID5667
Feature activation+1.14e-02

Pos logit contributions
aney+0.017
 contain+0.015
Jane+0.014
 yes+0.014
 Kate+0.014
Neg logit contributions
 regret-0.012
 vet-0.011
 harm-0.010
 veterinarian-0.010
 lose-0.010
Token again
Token ID757
Feature activation+1.25e-02

Pos logit contributions
 regret+0.015
 harm+0.014
 vet+0.014
 lose+0.014
 Owl+0.013
Neg logit contributions
aney-0.024
 contain-0.021
Jane-0.020
 Kate-0.019
 yes-0.019
Token.
Token ID13
Feature activation+4.27e-03

Pos logit contributions
 regret+0.017
 vet+0.016
 harm+0.015
 Owl+0.015
 veterinarian+0.015
Neg logit contributions
aney-0.026
 contain-0.023
Jane-0.022
 yes-0.021
 Kate-0.021
Token very
Token ID845
Feature activation-1.16e-02

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
ill+0.003
uits+0.003
Neg logit contributions
 regret-0.003
 vet-0.002
 Owl-0.002
 veterinarian-0.002
 lose-0.002
Token small
Token ID1402
Feature activation+5.90e-03

Pos logit contributions
aney+0.021
 contain+0.018
Jane+0.017
 Jane+0.017
 yes+0.016
Neg logit contributions
 regret-0.019
 vet-0.018
 harm-0.018
 Owl-0.018
 veterinarian-0.017
Token.
Token ID13
Feature activation+7.95e-03

Pos logit contributions
 regret+0.009
 vet+0.008
inging+0.008
 harm+0.008
 Owl+0.008
Neg logit contributions
aney-0.011
 contain-0.010
Jane-0.009
 Kate-0.009
 yes-0.009
Token The
Token ID383
Feature activation-1.21e-02

Pos logit contributions
 regret+0.008
 vet+0.008
 veterinarian+0.008
 Owl+0.007
 lose+0.007
Neg logit contributions
aney-0.018
 contain-0.017
Jane-0.016
 yes-0.016
year-0.015
Token girl
Token ID2576
Feature activation-2.65e-03

Pos logit contributions
aney+0.026
 contain+0.023
Jane+0.022
 yes+0.022
 Jane+0.021
Neg logit contributions
 regret-0.015
inging-0.014
 harm-0.014
 Owl-0.014
 Nut-0.013
Token was
Token ID373
Feature activation+2.91e-03

Pos logit contributions
aney+0.005
 contain+0.004
Jane+0.004
 yes+0.003
 Kate+0.003
Neg logit contributions
 regret-0.005
 vet-0.004
 harm-0.004
 Owl-0.004
 lose-0.004
Token very
Token ID845
Feature activation-5.54e-03

Pos logit contributions
 regret+0.004
 vet+0.004
 harm+0.004
 Owl+0.004
 veterinarian+0.004
Neg logit contributions
aney-0.006
 contain-0.005
Jane-0.005
 yes-0.004
ill-0.004
Token thoughtful
Token ID22677
Feature activation-7.50e-04

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.008
 Jane+0.007
 yes+0.007
Neg logit contributions
 vet-0.009
 regret-0.009
 Owl-0.009
 harm-0.009
check-0.009
Token and
Token ID290
Feature activation+1.19e-02

Pos logit contributions
aney+0.002
 contain+0.001
Jane+0.001
 yes+0.001
 Kate+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 harm-0.001
 Owl-0.001
 lose-0.001
Token she
Token ID673
Feature activation+1.82e-02

Pos logit contributions
 regret+0.019
 vet+0.018
 harm+0.018
inging+0.018
 Owl+0.018
Neg logit contributions
aney-0.021
 contain-0.018
Jane-0.017
 yes-0.017
 Jane-0.017
Token wanted
Token ID2227
Feature activation-2.55e-03

Pos logit contributions
 regret+0.032
 vet+0.030
 harm+0.030
inging+0.029
 Owl+0.029
Neg logit contributions
aney-0.030
 contain-0.026
Jane-0.024
 yes-0.023
 Kate-0.023
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
 regret+0.007
 lose+0.006
 vet+0.006
 harm+0.006
 difference+0.006
Neg logit contributions
aney-0.011
 contain-0.010
Jane-0.009
year-0.009
 Kate-0.009
Token Lily
Token ID20037
Feature activation-5.70e-03

Pos logit contributions
aney+0.022
 contain+0.019
Jane+0.018
 Kate+0.018
 Jane+0.017
Neg logit contributions
 regret-0.013
 vet-0.012
 harm-0.012
 Owl-0.012
 lose-0.012
Token smiled
Token ID13541
Feature activation-4.17e-03

Pos logit contributions
aney+0.010
 contain+0.009
Jane+0.008
 yes+0.008
 Kate+0.008
Neg logit contributions
 regret-0.009
 vet-0.009
 harm-0.009
 Owl-0.008
 veterinarian-0.008
Token and
Token ID290
Feature activation+1.45e-02

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
 yes+0.007
 Kate+0.007
Neg logit contributions
 regret-0.006
 vet-0.005
 harm-0.005
 Owl-0.005
 lose-0.005
Token took
Token ID1718
Feature activation+2.50e-03

Pos logit contributions
 harm+0.024
 regret+0.024
 vet+0.024
inging+0.023
 Owl+0.023
Neg logit contributions
aney-0.023
 contain-0.020
Jane-0.020
 yes-0.019
 Jane-0.019
Token the
Token ID262
Feature activation+1.45e-03

Pos logit contributions
 regret+0.005
inging+0.005
 Owl+0.005
 tou+0.005
 vet+0.005
Neg logit contributions
aney-0.003
 Kate-0.002
 Jane-0.002
 contain-0.002
Jane-0.002
Token medicine
Token ID9007
Feature activation+1.82e-02

Pos logit contributions
inging+0.002
 client+0.002
 Nut+0.002
 Cle+0.002
 regret+0.002
Neg logit contributions
aney-0.003
 yes-0.003
 contain-0.003
 apr-0.002
 rack-0.002
Token.
Token ID13
Feature activation+8.74e-03

Pos logit contributions
 regret+0.026
 vet+0.024
 harm+0.023
 Owl+0.023
 lose+0.023
Neg logit contributions
aney-0.036
 contain-0.032
Jane-0.030
 yes-0.029
 Jane-0.029
Token She
Token ID1375
Feature activation-2.97e-03

Pos logit contributions
 harm+0.013
 regret+0.012
 vet+0.012
 hawk+0.011
 Owl+0.011
Neg logit contributions
aney-0.017
 Jane-0.015
 Kate-0.015
Jane-0.014
 Jill-0.014
Token knew
Token ID2993
Feature activation+6.18e-05

Pos logit contributions
aney+0.005
 contain+0.005
Jane+0.004
 yes+0.004
 Kate+0.004
Neg logit contributions
 regret-0.005
 vet-0.004
 harm-0.004
 Owl-0.004
 veterinarian-0.004
Token her
Token ID607
Feature activation-8.85e-03

Pos logit contributions
inging+0.000
 regret+0.000
 lose+0.000
 harm+0.000
 difference+0.000
Neg logit contributions
aney-0.000
 Jane-0.000
Jane-0.000
 Kate-0.000
 yes-0.000
Token t
Token ID256
Feature activation-7.69e-03

Pos logit contributions
 regret+0.003
 vet+0.003
 veterinarian+0.003
 harm+0.003
 lose+0.003
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
ill-0.004
 Kate-0.004
Tokeneddy
Token ID21874
Feature activation-1.68e-03

Pos logit contributions
aney+0.021
 contain+0.018
Jane+0.018
ill+0.017
 yes+0.017
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.005
 Owl-0.005
 lose-0.005
Token bears
Token ID13062
Feature activation+9.62e-04

Pos logit contributions
aney+0.005
 contain+0.004
Jane+0.004
ill+0.004
 Kate+0.004
Neg logit contributions
 regret-0.001
 Owl-0.001
 vet-0.001
 veterinarian-0.001
 harm-0.001
Token and
Token ID290
Feature activation+7.12e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 veterinarian+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.002
Jane-0.002
 contain-0.002
 Kate-0.002
Kate-0.002
Token the
Token ID262
Feature activation-4.94e-03

Pos logit contributions
 regret+0.011
 vet+0.010
 lose+0.010
 veterinarian+0.010
 harm+0.010
Neg logit contributions
aney-0.014
 contain-0.012
Jane-0.012
ill-0.011
 Kate-0.011
Token tickets
Token ID8587
Feature activation-1.92e-03

Pos logit contributions
aney+0.010
 contain+0.009
Jane+0.009
ill+0.008
 Kate+0.008
Neg logit contributions
 regret-0.007
 vet-0.007
 veterinarian-0.006
 harm-0.006
 lose-0.006
Token and
Token ID290
Feature activation-7.04e-04

Pos logit contributions
aney+0.004
 contain+0.004
Jane+0.004
 Kate+0.004
ill+0.003
Neg logit contributions
 regret-0.002
 vet-0.002
 harm-0.002
 lose-0.002
 veterinarian-0.002
Token said
Token ID531
Feature activation-8.47e-03

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
ill+0.001
 Kate+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 lose-0.001
 veterinarian-0.001
 harm-0.001
Token,
Token ID11
Feature activation-1.30e-02

Pos logit contributions
aney+0.022
 contain+0.020
Jane+0.019
ill+0.019
mos+0.018
Neg logit contributions
 regret-0.006
 wishes-0.006
 vet-0.006
 veterinarian-0.005
 responsibility-0.005
Token "
Token ID366
Feature activation+5.67e-03

Pos logit contributions
aney+0.028
 contain+0.024
Jane+0.023
ill+0.022
 Kate+0.021
Neg logit contributions
 vet-0.017
 Owl-0.017
 veterinarian-0.017
 regret-0.016
 Nut-0.015
TokenLet
Token ID5756
Feature activation+4.04e-03

Pos logit contributions
 regret+0.007
 vet+0.006
 harm+0.006
 Owl+0.006
 lose+0.006
Neg logit contributions
aney-0.013
 contain-0.011
Jane-0.011
 yes-0.010
 Kate-0.010
Token.
Token ID13
Feature activation-3.87e-03

Pos logit contributions
 regret+0.011
 harm+0.011
 Owl+0.011
 lose+0.010
 veterinarian+0.010
Neg logit contributions
aney-0.024
 contain-0.020
Jane-0.020
ill-0.019
 yes-0.018
Token As
Token ID1081
Feature activation+5.11e-03

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
 yes+0.007
ill+0.007
Neg logit contributions
 vet-0.004
 regret-0.004
 veterinarian-0.004
 Owl-0.004
 Nut-0.004
Token they
Token ID484
Feature activation-1.14e-02

Pos logit contributions
 regret+0.009
 vet+0.008
 harm+0.008
 Owl+0.008
 lose+0.008
Neg logit contributions
aney-0.009
 contain-0.008
Jane-0.007
 yes-0.007
 Kate-0.007
Token journey
Token ID7002
Feature activation+1.26e-02

Pos logit contributions
aney+0.024
Jane+0.021
 contain+0.020
 Kate+0.019
 Jane+0.019
Neg logit contributions
 regret-0.015
 will-0.013
 vet-0.013
 lose-0.013
 harm-0.013
Tokened
Token ID276
Feature activation+1.05e-02

Pos logit contributions
 regret+0.017
 vet+0.016
 harm+0.015
 Owl+0.015
 veterinarian+0.015
Neg logit contributions
aney-0.027
 contain-0.024
Jane-0.022
 yes-0.022
 Kate-0.021
Token through
Token ID832
Feature activation-9.16e-03

Pos logit contributions
 regret+0.014
 vet+0.014
 harm+0.013
 Owl+0.013
 lose+0.013
Neg logit contributions
aney-0.022
 contain-0.019
Jane-0.018
 yes-0.017
 Kate-0.017
Token the
Token ID262
Feature activation+8.80e-04

Pos logit contributions
aney+0.021
 contain+0.018
Jane+0.017
ill+0.016
 yes+0.016
Neg logit contributions
 regret-0.012
 harm-0.011
 vet-0.011
 Owl-0.010
 trouble-0.010
Token forest
Token ID8222
Feature activation+6.85e-03

Pos logit contributions
 regret+0.001
 vet+0.001
 harm+0.001
 Owl+0.001
 lose+0.001
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.001
 Kate-0.001
 yes-0.001
Token,
Token ID11
Feature activation-5.78e-03

Pos logit contributions
 regret+0.008
 vet+0.008
 harm+0.007
 Owl+0.007
 lose+0.007
Neg logit contributions
aney-0.015
 contain-0.014
Jane-0.013
 yes-0.012
 Kate-0.012
Token Grand
Token ID5675
Feature activation-5.41e-03

Pos logit contributions
aney+0.012
 contain+0.010
Jane+0.009
ill+0.009
shirt+0.009
Neg logit contributions
 regret-0.008
 vet-0.008
 Owl-0.008
 veterinarian-0.007
 harm-0.007
Tokenma
Token ID2611
Feature activation+4.41e-03

Pos logit contributions
aney+0.013
 contain+0.013
Jane+0.011
 yes+0.011
 Kate+0.011
Neg logit contributions
 regret-0.005
 harm-0.004
 vet-0.004
 Owl-0.004
 difference-0.004
Token come
Token ID1282
Feature activation+6.19e-03

Pos logit contributions
 regret+0.015
 vet+0.015
inging+0.014
 unnecessary+0.014
 harm+0.014
Neg logit contributions
aney-0.021
 contain-0.021
 yes-0.019
 Jane-0.018
Jane-0.018
Token back
Token ID736
Feature activation-1.70e-03

Pos logit contributions
 regret+0.010
 vet+0.009
 harm+0.009
 damage+0.009
 client+0.009
Neg logit contributions
aney-0.011
 contain-0.010
 yes-0.009
ill-0.009
Jane-0.009
Token soon
Token ID2582
Feature activation-1.55e-03

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
 yes+0.003
 Kate+0.003
Neg logit contributions
 regret-0.002
 vet-0.002
 harm-0.002
 Owl-0.002
 lose-0.002
Token,
Token ID11
Feature activation-2.17e-02

Pos logit contributions
aney+0.003
 contain+0.003
Jane+0.003
 Kate+0.003
 yes+0.003
Neg logit contributions
 regret-0.002
 vet-0.002
 harm-0.002
 lose-0.002
 Owl-0.002
Token I
Token ID314
Feature activation-5.12e-03

Pos logit contributions
aney+0.043
 contain+0.038
Jane+0.035
 yes+0.034
 Kate+0.034
Neg logit contributions
 regret-0.031
 vet-0.030
 harm-0.029
 lose-0.028
 veterinarian-0.028
Token dream
Token ID4320
Feature activation-1.43e-02

Pos logit contributions
aney+0.013
 contain+0.012
Jane+0.011
 yes+0.011
ill+0.011
Neg logit contributions
 regret-0.005
 vet-0.004
 Owl-0.004
 harm-0.004
inging-0.004
Token of
Token ID286
Feature activation+5.83e-03

Pos logit contributions
aney+0.028
 contain+0.024
Jane+0.023
 apr+0.022
 Kate+0.022
Neg logit contributions
 regret-0.020
 vet-0.019
 harm-0.019
 veterinarian-0.019
 lose-0.019
Token flying
Token ID7348
Feature activation-1.29e-02

Pos logit contributions
 regret+0.008
 vet+0.007
 harm+0.007
 Owl+0.007
 lose+0.007
Neg logit contributions
aney-0.012
 contain-0.011
Jane-0.010
ill-0.010
 yes-0.010
Token through
Token ID832
Feature activation-1.39e-02

Pos logit contributions
aney+0.029
 contain+0.025
Jane+0.024
year+0.022
 yes+0.022
Neg logit contributions
 regret-0.015
 harm-0.015
 Owl-0.014
 vet-0.014
 lose-0.014
Token the
Token ID262
Feature activation-4.93e-03

Pos logit contributions
aney+0.031
 contain+0.027
Jane+0.025
 yes+0.024
shirt+0.023
Neg logit contributions
 regret-0.018
 harm-0.017
 wishes-0.016
 Owl-0.016
 trouble-0.016
Token sky
Token ID6766
Feature activation-2.78e-02

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.008
 Kate+0.008
 Jane+0.008
Neg logit contributions
 regret-0.007
 vet-0.007
 lose-0.007
 veterinarian-0.007
 harm-0.006
Token OK
Token ID7477
Feature activation-2.56e-04

Pos logit contributions
inging+0.013
 regret+0.012
check+0.012
 harm+0.012
 vet+0.012
Neg logit contributions
aney-0.015
 Jane-0.014
 yes-0.013
 Kate-0.013
 contain-0.013
Token,
Token ID11
Feature activation-1.75e-02

Pos logit contributions
aney+0.001
 contain+0.000
Jane+0.000
 yes+0.000
ill+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 Owl-0.000
 veterinarian-0.000
Token m
Token ID285
Feature activation+6.84e-03

Pos logit contributions
aney+0.036
 contain+0.032
Jane+0.030
 yes+0.029
 Kate+0.029
Neg logit contributions
 regret-0.024
 vet-0.022
 harm-0.022
 lose-0.021
 Owl-0.021
Tokenister
Token ID1694
Feature activation+1.40e-03

Pos logit contributions
 regret+0.007
 vet+0.007
 harm+0.006
 veterinarian+0.006
 lose+0.006
Neg logit contributions
aney-0.016
 contain-0.015
Jane-0.014
 yes-0.014
ill-0.013
Token,"
Token ID553
Feature activation-2.98e-03

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 lose+0.002
inging+0.002
Neg logit contributions
aney-0.003
 contain-0.002
Jane-0.002
 Jane-0.002
 Kate-0.002
Token Tom
Token ID4186
Feature activation-1.95e-02

Pos logit contributions
aney+0.006
 contain+0.005
Jane+0.005
ill+0.005
mos+0.005
Neg logit contributions
 regret-0.004
 vet-0.004
 Owl-0.004
 veterinarian-0.004
 Nut-0.004
Token said
Token ID531
Feature activation-7.00e-03

Pos logit contributions
aney+0.048
 contain+0.045
days+0.042
 seas+0.042
year+0.042
Neg logit contributions
 regret-0.017
 vet-0.016
 Owl-0.016
 veterinarian-0.015
 owl-0.014
Token.
Token ID13
Feature activation-6.76e-03

Pos logit contributions
aney+0.018
 contain+0.017
Jane+0.016
 seas+0.016
mos+0.015
Neg logit contributions
 regret-0.006
 vet-0.005
 harm-0.005
 Owl-0.005
 lose-0.005
Token "
Token ID366
Feature activation-4.93e-05

Pos logit contributions
aney+0.014
 contain+0.013
Jane+0.012
days+0.011
ill+0.011
Neg logit contributions
 vet-0.010
 Owl-0.009
 veterinarian-0.009
 regret-0.009
 Nut-0.008
TokenBut
Token ID1537
Feature activation+1.21e-02

Pos logit contributions
aney+0.000
 contain+0.000
 Kate+0.000
 Jane+0.000
Jane+0.000
Neg logit contributions
 vet-0.000
 regret-0.000
 veterinarian-0.000
 harm-0.000
 Owl-0.000
Token we
Token ID356
Feature activation+3.35e-03

Pos logit contributions
 regret+0.017
 vet+0.016
 harm+0.016
inging+0.016
 veterinarian+0.016
Neg logit contributions
aney-0.024
 contain-0.021
 yes-0.020
Jane-0.020
 Kate-0.020
TokenSure
Token ID19457
Feature activation-3.10e-03

Pos logit contributions
aney+0.007
 contain+0.006
 Kate+0.006
ill+0.006
mos+0.006
Neg logit contributions
 vet-0.004
 regret-0.004
 veterinarian-0.004
 harm-0.004
check-0.004
Token,
Token ID11
Feature activation-1.58e-02

Pos logit contributions
aney+0.007
 contain+0.006
Jane+0.006
ill+0.006
 yes+0.006
Neg logit contributions
 regret-0.004
 vet-0.003
 harm-0.003
 Owl-0.003
 lose-0.003
Token let
Token ID1309
Feature activation+1.30e-02

Pos logit contributions
aney+0.034
 contain+0.030
Jane+0.028
 yes+0.027
 Kate+0.027
Neg logit contributions
 regret-0.021
 vet-0.019
 harm-0.019
 Owl-0.018
 lose-0.018
Token's
Token ID338
Feature activation-4.77e-03

Pos logit contributions
 regret+0.023
 Owl+0.022
 vet+0.022
 affect+0.021
 client+0.021
Neg logit contributions
aney-0.020
 Jane-0.017
Jane-0.017
 Kate-0.016
 Jill-0.015
Token go
Token ID467
Feature activation-3.89e-03

Pos logit contributions
aney+0.009
 contain+0.008
 Jane+0.008
 Kate+0.007
 Jill+0.007
Neg logit contributions
 unnecessary-0.007
 regret-0.007
 vet-0.007
 Owl-0.007
 Bad-0.007
Token look
Token ID804
Feature activation-1.11e-02

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.007
 yes+0.007
 Kate+0.007
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.005
 Owl-0.005
 veterinarian-0.004
Token at
Token ID379
Feature activation+9.23e-03

Pos logit contributions
aney+0.027
 contain+0.025
Jane+0.023
ill+0.022
days+0.021
Neg logit contributions
 regret-0.010
 harm-0.009
 lose-0.009
 will-0.009
 wishes-0.009
Token them
Token ID606
Feature activation-1.69e-02

Pos logit contributions
 regret+0.013
 vet+0.013
 harm+0.012
 Owl+0.012
 veterinarian+0.012
Neg logit contributions
aney-0.019
 contain-0.017
Jane-0.015
 yes-0.015
ill-0.014
Token."
Token ID526
Feature activation+1.34e-02

Pos logit contributions
aney+0.035
 contain+0.032
Jane+0.030
year+0.028
 Kate+0.028
Neg logit contributions
 regret-0.020
 vet-0.020
 veterinarian-0.020
 lose-0.020
 Owl-0.019
Token They
Token ID1119
Feature activation-9.83e-03

Pos logit contributions
 regret+0.024
 vet+0.023
 Owl+0.022
 harm+0.022
 veterinarian+0.022
Neg logit contributions
aney-0.022
 contain-0.019
Jane-0.017
 yes-0.017
 Kate-0.016
Token went
Token ID1816
Feature activation-5.01e-03

Pos logit contributions
aney+0.022
 contain+0.019
Jane+0.018
 Kate+0.017
days+0.017
Neg logit contributions
 regret-0.011
 Nut-0.010
 lose-0.010
 vet-0.010
 mailbox-0.010
Token the
Token ID262
Feature activation-5.89e-04

Pos logit contributions
aney+0.011
 contain+0.010
 yes+0.009
Jane+0.009
mos+0.009
Neg logit contributions
 regret-0.008
 harm-0.008
 vet-0.007
 Owl-0.007
 lose-0.007
Token trees
Token ID7150
Feature activation+5.14e-03

Pos logit contributions
aney+0.001
 contain+0.001
Jane+0.001
 Jane+0.001
 yes+0.001
Neg logit contributions
 regret-0.001
 vet-0.001
 Owl-0.001
 veterinarian-0.001
 lose-0.001
Token and
Token ID290
Feature activation-2.22e-03

Pos logit contributions
 regret+0.007
 vet+0.007
 harm+0.007
 veterinarian+0.007
 Owl+0.006
Neg logit contributions
aney-0.010
 contain-0.009
Jane-0.009
 yes-0.008
ill-0.008
Token houses
Token ID7777
Feature activation+6.55e-03

Pos logit contributions
aney+0.004
 contain+0.003
Jane+0.003
 Kate+0.003
 Jane+0.003
Neg logit contributions
 regret-0.004
 vet-0.004
 veterinarian-0.003
 harm-0.003
inging-0.003
Token,
Token ID11
Feature activation-5.72e-03

Pos logit contributions
 regret+0.008
 harm+0.008
 lose+0.008
 vet+0.008
 Owl+0.007
Neg logit contributions
aney-0.014
 contain-0.012
Jane-0.012
 Kate-0.011
mos-0.011
Token and
Token ID290
Feature activation-1.41e-02

Pos logit contributions
aney+0.011
 contain+0.010
Jane+0.009
 yes+0.009
 Kate+0.009
Neg logit contributions
 regret-0.009
 vet-0.008
 harm-0.008
 Owl-0.008
 lose-0.008
Token even
Token ID772
Feature activation+2.85e-03

Pos logit contributions
aney+0.025
 contain+0.022
Jane+0.020
 yes+0.019
 Kate+0.019
Neg logit contributions
 regret-0.023
 vet-0.022
 harm-0.022
 Owl-0.021
 lose-0.021
Token the
Token ID262
Feature activation-4.85e-03

Pos logit contributions
 regret+0.005
 vet+0.005
inging+0.004
 harm+0.004
 veterinarian+0.004
Neg logit contributions
aney-0.005
 contain-0.004
Jane-0.004
 yes-0.004
 Kate-0.004
Token clouds
Token ID15114
Feature activation-4.43e-03

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.008
 yes+0.008
 Jane+0.007
Neg logit contributions
 regret-0.007
 vet-0.007
 harm-0.007
 Owl-0.006
 lose-0.006
Token in
Token ID287
Feature activation-8.95e-03

Pos logit contributions
aney+0.010
 contain+0.008
Jane+0.008
 Kate+0.008
 yes+0.008
Neg logit contributions
 regret-0.006
 harm-0.005
 lose-0.005
 vet-0.005
 Owl-0.005
Token the
Token ID262
Feature activation-9.25e-03

Pos logit contributions
aney+0.019
 contain+0.017
 yes+0.016
Jane+0.016
days+0.015
Neg logit contributions
 regret-0.012
 harm-0.011
 vet-0.011
 trouble-0.011
 Owl-0.011
Token big
Token ID1263
Feature activation-2.04e-02

Pos logit contributions
aney+0.014
 contain+0.012
Jane+0.012
 Kate+0.011
 Jane+0.011
Neg logit contributions
 regret-0.011
 vet-0.010
 harm-0.010
 Owl-0.010
 veterinarian-0.009
Token hill
Token ID12788
Feature activation-1.99e-02

Pos logit contributions
aney+0.045
 contain+0.040
Jane+0.037
 yes+0.037
year+0.036
Neg logit contributions
 regret-0.025
 vet-0.024
 harm-0.023
 Owl-0.023
 lose-0.023
Token.
Token ID13
Feature activation-7.78e-04

Pos logit contributions
aney+0.052
Jane+0.044
 contain+0.043
 apr+0.041
 yes+0.041
Neg logit contributions
 harm-0.015
inging-0.014
 lose-0.014
 regret-0.014
 vet-0.014
Token The
Token ID383
Feature activation-1.19e-02

Pos logit contributions
aney+0.002
 contain+0.002
year+0.002
Jane+0.002
ill+0.002
Neg logit contributions
 vet-0.001
 veterinarian-0.001
 Hug-0.001
 Nut-0.001
 Owl-0.001
Token big
Token ID1263
Feature activation-2.18e-02

Pos logit contributions
aney+0.025
 contain+0.022
Jane+0.020
 Kate+0.019
 yes+0.019
Neg logit contributions
 regret-0.017
 vet-0.016
 harm-0.015
 veterinarian-0.015
 lose-0.015
Token round
Token ID2835
Feature activation-4.13e-02

Pos logit contributions
aney+0.045
 contain+0.039
Jane+0.037
 yes+0.035
year+0.034
Neg logit contributions
 regret-0.031
 vet-0.029
 lose-0.029
 Owl-0.029
 harm-0.028
Token ball
Token ID2613
Feature activation-3.31e-03

Pos logit contributions
aney+0.108
 contain+0.100
Jane+0.093
 yes+0.091
year+0.091
Neg logit contributions
 dog-0.034
 mailbox-0.032
 vet-0.031
 lock-0.030
 cat-0.029
Token could
Token ID714
Feature activation-5.12e-03

Pos logit contributions
aney+0.009
 contain+0.007
Jane+0.007
ill+0.007
year+0.007
Neg logit contributions
 regret-0.003
 vet-0.003
 lose-0.003
 will-0.003
 harm-0.003
Token roll
Token ID4836
Feature activation-1.97e-04

Pos logit contributions
aney+0.013
Jane+0.011
 contain+0.010
mos+0.010
year+0.010
Neg logit contributions
 lose-0.006
 regret-0.006
 vet-0.005
 harm-0.005
 veterinarian-0.005
Token very
Token ID845
Feature activation-7.60e-03

Pos logit contributions
aney+0.000
 contain+0.000
Jane+0.000
 yes+0.000
 Kate+0.000
Neg logit contributions
 regret-0.000
 vet-0.000
 harm-0.000
 lose-0.000
 Owl-0.000
Token easily
Token ID3538
Feature activation+1.53e-02

Pos logit contributions
aney+0.014
 contain+0.012
Jane+0.012
 yes+0.011
 Kate+0.011
Neg logit contributions
 regret-0.012
 vet-0.011
 harm-0.011
 lose-0.011
 Owl-0.011
Token It
Token ID632
Feature activation-7.42e-03

Pos logit contributions
aney+0.009
 contain+0.008
Jane+0.007
 yes+0.007
ill+0.007
Neg logit contributions
 regret-0.006
 vet-0.006
 Owl-0.005
 veterinarian-0.005
 harm-0.005
Token will
Token ID481
Feature activation+8.32e-03

Pos logit contributions
aney+0.017
 contain+0.015
 yes+0.014
 Kate+0.014
Jane+0.014
Neg logit contributions
 regret-0.008
 vet-0.007
 veterinarian-0.007
 Owl-0.007
 client-0.007
Token spin
Token ID7906
Feature activation-2.02e-02

Pos logit contributions
 regret+0.010
 vet+0.010
 client+0.010
inging+0.010
 veterinarian+0.010
Neg logit contributions
aney-0.016
 contain-0.016
 yes-0.014
Jane-0.014
 Jane-0.014
Token on
Token ID319
Feature activation-8.26e-03

Pos logit contributions
aney+0.043
 contain+0.037
Jane+0.035
ill+0.032
 yes+0.032
Neg logit contributions
 regret-0.028
 harm-0.025
 vet-0.025
 lose-0.025
 Owl-0.025
Token the
Token ID262
Feature activation-2.69e-03

Pos logit contributions
aney+0.016
 contain+0.014
Jane+0.013
 yes+0.012
 Kate+0.012
Neg logit contributions
 regret-0.013
 vet-0.012
 harm-0.012
 Owl-0.012
 lose-0.012
Token floor
Token ID4314
Feature activation-3.91e-02

Pos logit contributions
aney+0.005
 contain+0.005
Jane+0.005
 yes+0.004
 Kate+0.004
Neg logit contributions
 regret-0.004
 vet-0.003
 harm-0.003
 Owl-0.003
 lose-0.003
Token and
Token ID290
Feature activation-4.33e-03

Pos logit contributions
aney+0.094
 contain+0.084
Jane+0.080
year+0.076
shirt+0.073
Neg logit contributions
 regret-0.039
 lose-0.036
 harm-0.035
 vet-0.035
 will-0.034
Token make
Token ID787
Feature activation-4.47e-03

Pos logit contributions
 contain+0.008
aney+0.008
 Kate+0.007
 yes+0.007
 Jane+0.007
Neg logit contributions
 vet-0.006
 regret-0.006
 veterinarian-0.006
inging-0.006
athy-0.006
Token a
Token ID257
Feature activation+1.19e-02

Pos logit contributions
aney+0.008
 contain+0.006
 Kate+0.006
Jane+0.006
 Jane+0.006
Neg logit contributions
 regret-0.007
 vet-0.007
 Owl-0.007
 veterinarian-0.007
 harm-0.007
Token nice
Token ID3621
Feature activation+5.14e-03

Pos logit contributions
 Owl+0.017
athy+0.017
 harm+0.017
 Nut+0.017
 regret+0.017
Neg logit contributions
aney-0.020
 contain-0.019
 yes-0.017
Jane-0.017
 Kate-0.017
Token sound
Token ID2128
Feature activation-1.18e-02

Pos logit contributions
 regret+0.007
 vet+0.007
 harm+0.007
 Owl+0.007
 veterinarian+0.006
Neg logit contributions
aney-0.010
 contain-0.009
Jane-0.008
 Jane-0.008
 yes-0.008
Token black
Token ID2042
Feature activation-3.90e-02

Pos logit contributions
aney+0.030
 contain+0.027
Jane+0.025
 yes+0.024
 Jane+0.023
Neg logit contributions
 regret-0.024
 vet-0.022
 harm-0.022
 lose-0.022
inging-0.022
Tokenboard
Token ID3526
Feature activation-3.41e-02

Pos logit contributions
aney+0.074
 contain+0.068
Jane+0.058
 yes+0.058
year+0.054
Neg logit contributions
 dog-0.058
 client-0.057
 vet-0.057
 mailbox-0.057
 house-0.056
Token?
Token ID30
Feature activation-2.57e-03

Pos logit contributions
aney+0.084
 contain+0.071
Jane+0.070
mos+0.066
uits+0.065
Neg logit contributions
 lose-0.036
 regret-0.035
 vet-0.032
 veterinarian-0.031
 will-0.030
Token It
Token ID632
Feature activation-2.73e-03

Pos logit contributions
aney+0.006
 contain+0.005
Jane+0.005
 yes+0.004
 Kate+0.004
Neg logit contributions
 regret-0.003
 vet-0.003
 harm-0.003
 Owl-0.003
 lose-0.003
Token's
Token ID338
Feature activation-4.08e-03

Pos logit contributions
aney+0.006
 contain+0.005
 Kate+0.005
 yes+0.005
Jane+0.005
Neg logit contributions
 regret-0.003
 vet-0.003
 veterinarian-0.003
inging-0.003
 harm-0.003
Token very
Token ID845
Feature activation-3.95e-02

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.007
ill+0.006
 yes+0.006
Neg logit contributions
 regret-0.006
 vet-0.006
 harm-0.006
 lose-0.005
 damage-0.005
Token messy
Token ID24097
Feature activation-1.92e-02

Pos logit contributions
aney+0.090
 contain+0.080
Jane+0.074
ill+0.074
mos+0.071
Neg logit contributions
 regret-0.049
 lose-0.044
 harm-0.042
 damage-0.041
 vet-0.041
Token."
Token ID526
Feature activation+5.68e-03

Pos logit contributions
aney+0.045
 contain+0.040
Jane+0.038
 yes+0.035
year+0.034
Neg logit contributions
 regret-0.021
 lose-0.020
 harm-0.020
 vet-0.019
 trouble-0.019
Token Her
Token ID2332
Feature activation+7.29e-03

Pos logit contributions
 regret+0.009
 harm+0.009
 vet+0.009
 lose+0.008
 Owl+0.008
Neg logit contributions
aney-0.010
 contain-0.009
Jane-0.008
 Kate-0.008
 Jane-0.008
Token teacher
Token ID4701
Feature activation-2.70e-02

Pos logit contributions
 regret+0.009
 harm+0.009
 vet+0.009
 Owl+0.009
 lose+0.008
Neg logit contributions
aney-0.015
 contain-0.013
Jane-0.013
 yes-0.013
 Kate-0.013
Token said
Token ID531
Feature activation-1.11e-02

Pos logit contributions
aney+0.058
 contain+0.054
uits+0.048
 seas+0.048
Jane+0.047
Neg logit contributions
 regret-0.036
 vet-0.030
 Owl-0.029
 veterinarian-0.029
 lose-0.028
Token on
Token ID319
Feature activation-2.71e-02

Pos logit contributions
aney+0.020
 contain+0.019
Jane+0.016
days+0.016
year+0.016
Neg logit contributions
 vet-0.011
 Owl-0.011
 Nut-0.010
 mailbox-0.010
 wishes-0.010
Token paper
Token ID3348
Feature activation-1.33e-02

Pos logit contributions
aney+0.060
 contain+0.056
year+0.050
Jane+0.049
days+0.049
Neg logit contributions
 damage-0.030
 vet-0.030
 regret-0.030
 mailbox-0.029
 wishes-0.029
Token and
Token ID290
Feature activation-5.70e-03

Pos logit contributions
aney+0.031
 contain+0.028
Jane+0.027
year+0.025
 yes+0.025
Neg logit contributions
 regret-0.013
 vet-0.013
 Owl-0.012
 harm-0.012
 veterinarian-0.012
Token on
Token ID319
Feature activation-2.24e-02

Pos logit contributions
aney+0.011
 contain+0.009
Jane+0.008
ill+0.008
 yes+0.008
Neg logit contributions
 regret-0.009
 vet-0.009
 harm-0.008
 lose-0.008
 Owl-0.008
Token the
Token ID262
Feature activation+4.20e-03

Pos logit contributions
aney+0.052
 contain+0.049
year+0.044
Jane+0.043
days+0.043
Neg logit contributions
 regret-0.023
 damage-0.023
 vet-0.023
 harm-0.022
 mailbox-0.022
Token wall
Token ID3355
Feature activation-4.08e-02

Pos logit contributions
 regret+0.006
 vet+0.006
 harm+0.006
 veterinarian+0.006
 lose+0.006
Neg logit contributions
aney-0.008
 contain-0.007
Jane-0.007
 yes-0.006
 Kate-0.006
Token.
Token ID13
Feature activation-5.94e-03

Pos logit contributions
aney+0.108
 contain+0.101
Jane+0.094
year+0.092
 YES+0.091
Neg logit contributions
 vet-0.022
 harm-0.022
 will-0.022
 regret-0.021
 wishes-0.021
Token Sometimes
Token ID8975
Feature activation+1.11e-03

Pos logit contributions
 contain+0.015
aney+0.015
 yes+0.013
days+0.013
year+0.013
Neg logit contributions
 vet-0.005
 regret-0.004
 veterinarian-0.004
inging-0.004
 Bad-0.004
Token they
Token ID484
Feature activation-1.53e-02

Pos logit contributions
 regret+0.002
 vet+0.002
 harm+0.002
 Owl+0.002
 veterinarian+0.002
Neg logit contributions
aney-0.002
 contain-0.002
Jane-0.002
 yes-0.002
ill-0.002
Token painted
Token ID13055
Feature activation-8.60e-03

Pos logit contributions
aney+0.030
 contain+0.026
Jane+0.025
 yes+0.024
 Kate+0.023
Neg logit contributions
 regret-0.023
 lose-0.021
 harm-0.021
 vet-0.020
 damage-0.020
Token on
Token ID319
Feature activation-2.37e-02

Pos logit contributions
aney+0.019
 contain+0.018
Jane+0.015
year+0.014
days+0.014
Neg logit contributions
 Owl-0.010
 vet-0.010
 regret-0.010
 wishes-0.010
 veterinarian-0.010
Token She
Token ID1375
Feature activation-1.05e-02

Pos logit contributions
aney+0.023
 contain+0.022
Jane+0.019
 yes+0.019
mos+0.018
Neg logit contributions
 regret-0.014
 vet-0.014
 Owl-0.013
 veterinarian-0.013
 Nut-0.013
Token cut
Token ID2005
Feature activation-3.07e-03

Pos logit contributions
aney+0.022
 contain+0.021
Jane+0.018
days+0.018
ill+0.017
Neg logit contributions
 regret-0.014
 vet-0.013
 veterinarian-0.012
 lose-0.012
 Owl-0.012
Token it
Token ID340
Feature activation-1.29e-02

Pos logit contributions
aney+0.006
 contain+0.005
Jane+0.004
 yes+0.004
 Kate+0.004
Neg logit contributions
 regret-0.005
 vet-0.005
 harm-0.005
 Owl-0.005
 lose-0.005
Token into
Token ID656
Feature activation-1.67e-02

Pos logit contributions
aney+0.028
 contain+0.025
Jane+0.024
 Kate+0.022
 Jane+0.022
Neg logit contributions
 regret-0.016
 vet-0.015
 harm-0.015
 lose-0.015
 Owl-0.015
Token the
Token ID262
Feature activation-9.27e-03

Pos logit contributions
aney+0.032
 contain+0.028
Jane+0.025
days+0.024
 yes+0.024
Neg logit contributions
 regret-0.026
 vet-0.025
 trouble-0.024
 harm-0.024
 Owl-0.024
Token shape
Token ID5485
Feature activation-4.24e-02

Pos logit contributions
aney+0.016
 contain+0.013
Jane+0.012
 yes+0.012
 Kate+0.012
Neg logit contributions
 regret-0.016
 vet-0.015
 harm-0.015
 Owl-0.015
 lose-0.015
Token of
Token ID286
Feature activation-5.89e-03

Pos logit contributions
aney+0.098
 contain+0.088
Jane+0.085
days+0.080
 yes+0.080
Neg logit contributions
 regret-0.042
inging-0.039
 vet-0.039
 harm-0.038
 lose-0.038
Token the
Token ID262
Feature activation-4.72e-03

Pos logit contributions
aney+0.012
 contain+0.010
Jane+0.009
 yes+0.009
days+0.009
Neg logit contributions
 regret-0.009
 vet-0.009
 harm-0.008
 Owl-0.008
 lose-0.008
Token shirt
Token ID10147
Feature activation-1.64e-02

Pos logit contributions
aney+0.008
 contain+0.007
Jane+0.006
 yes+0.006
 Kate+0.006
Neg logit contributions
 regret-0.008
 vet-0.008
 harm-0.008
 Owl-0.008
 lose-0.008
Token,
Token ID11
Feature activation-9.01e-03

Pos logit contributions
aney+0.040
 contain+0.036
Jane+0.035
year+0.033
days+0.032
Neg logit contributions
 regret-0.014
 vet-0.014
inging-0.013
 lose-0.013
 Owl-0.013
Token then
Token ID788
Feature activation-1.47e-02

Pos logit contributions
aney+0.019
 contain+0.016
Jane+0.015
days+0.015
ill+0.014
Neg logit contributions
 regret-0.013
 vet-0.012
 lose-0.011
 harm-0.011
 Owl-0.011