TOP ACTIVATIONS
MAX = 2.393

of her mom 's r ags and soap . She washed
at them . It w ags its tail . It likes
its ears . It w ags its tail a lot .
peek out from the r ags . They see a bird
Jack was turning the r ags , he looked out the
beauty lies in one 's own shadow , and it is
is happy . It w ags its tail and l icks
She found some old r ags which were dirty . She
iffs them . It w ags its tail . It l
to Spot . He w ags his long tail .
and showed her the r ags . She said that she
him to turn the r ags she kept in the closet
He wrote z ig z ags on walls , floors ,
it was difficult finding r ags and water . But Bob
the things . He w ags his tail and runs away
tongue out . He w ags his tail and makes a
running around with the r ags , so this kind of
grabbed one of the r ags and started to spin it
mom bought some new r ags . When they got back
is friendly . He w ags his tail and l icks

MIN ACTIVATIONS
MIN = -2.481

man stepped closer and said gru ff ly , " You
, and even went to outer space ! It was the
, so Tim got mad der . He pushed Jack away
Jen feels even sad der . She thinks Tim is
thanked Clim ber the Lad der for its help . And
" Follow me ," the sun flower says . He bends
stay here and enjoy the sun . But can I visit
Lily said , making her gir affe kick the lion away
Lily said , making her gir affe dodge the lion .
even happier and p rou der . He displays his sail
amed down , making the sun flower 's pet als fl
Thank you for making the sun and the stars and the
!" Benny felt even sad der and started to cry .
amed down , making the sun flower 's pet als fl
do that ," said the gen ie . " Watch this
my , why does the sun rise and set every day
old frog felt even sad der and hopped away .
ily insisted on keeping the sun flower in her room .
her mom and took the sun flower with her . That
And I will share my sun fl owers with you .

INTERVAL 1.174 - 2.393
CONTAINS 0.273%

itty went outside on its own . It found a big
big slide with a z ig zag shape . Lily wanted
contained . It was their own special secret ! <|endoftext|> Once
. They escaped the hunter 's view , and were not
. " What 's that ?" asked Emily .

INTERVAL -0.044 - 1.174
CONTAINS 54.802%

it to all her friends . As time went
Once they were all mixed , she put the bowl in
. They said , " No , you are mean and
fly to the moon ." " OK , but
. The plane went up in the air and soon they

INTERVAL -1.263 - -0.044
CONTAINS 44.696%

delicious red sauce . He tasted it and it was y
that are yellow and blue . Add them together and that
big bow in her hair . Sara liked to brush Anna
says to Mia : " Look , Mia . That dog
careful as he picked the squash from the vine . He

INTERVAL -2.481 - -1.263
CONTAINS 0.229%

al felt wet in the sun and suddenly the rabbit heard
very warm outside and the sun was shining bright . The
mom my then took the girl out of the house to
was very curious about the moon and he asked his mom
Can we swing to the moon , mom ?" Lily asks
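
The four interval boundaries listed above appear to be equal-width bins spanning MIN to MAX. The following is a minimal sketch of that arithmetic, not the dashboard's actual code; the function name `equal_width_edges` and the choice of four bins are assumptions made for illustration.

```python
# Hypothetical helper: split [lo, hi] into n_bins equal-width intervals.
def equal_width_edges(lo, hi, n_bins):
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# MIN and MAX as reported in the dump above.
MIN_ACT = -2.481
MAX_ACT = 2.393

# Assumption: four bins, matching the four INTERVAL blocks.
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# Interval boundaries as listed in the dump, lowest to highest.
listed_edges = [-2.481, -1.263, -0.044, 1.174, 2.393]

# "CONTAINS" shares as listed, in the same lowest-to-highest order.
listed_shares = [0.229, 44.696, 54.802, 0.273]
total_share = sum(listed_shares)

for computed, listed in zip(edges, listed_edges):
    print(f"computed {computed:+.4f}   listed {listed:+.3f}")
print(f"sum of shares: {total_share:.3f}%")
```

Under this assumption the computed edges agree with the listed boundaries to within rounding, and the four CONTAINS shares sum to 100%.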
Token of
Token ID286
Feature activation-1.54e-01

Pos logit contributions
ous+0.029
ho+0.026
 Nice+0.025
 tigers+0.024
 Sky+0.024
Neg logit contributions
 promise-0.025
ap-0.024
 agree-0.024
leaning-0.022
 promises-0.022
Token her
Token ID607
Feature activation+5.18e-01

Pos logit contributions
ous+0.013
ho+0.012
 Nice+0.012
 grizz+0.011
pill+0.011
Neg logit contributions
 promise-0.009
 agree-0.009
ap-0.009
 promises-0.009
 advantage-0.008
Token mom
Token ID1995
Feature activation+8.69e-01

Pos logit contributions
ap+0.030
 promise+0.030
 agree+0.030
leaning+0.028
 advantage+0.027
Neg logit contributions
ous-0.044
ho-0.041
 Nice-0.039
 spilling-0.038
 tigers-0.038
Token's
Token ID338
Feature activation+1.32e+00

Pos logit contributions
 promise+0.039
 agree+0.037
ap+0.036
 promises+0.033
 So+0.032
Neg logit contributions
ous-0.091
 Nice-0.084
ho-0.083
 tigers-0.079
 Sky-0.079
Token r
Token ID374
Feature activation+4.44e-01

Pos logit contributions
 promise+0.081
ap+0.081
 agree+0.079
leaning+0.074
 advantage+0.072
Neg logit contributions
ous-0.109
ho-0.100
 Nice-0.096
 tigers-0.092
 spilling-0.092
Tokenags
Token ID3775
Feature activation+2.33e+00

Pos logit contributions
 promise+0.033
 agree+0.032
ap+0.031
 advantage+0.029
 So+0.029
Neg logit contributions
ous-0.032
ho-0.029
 Nice-0.027
 spilling-0.026
pill-0.026
Token and
Token ID290
Feature activation+2.95e-02

Pos logit contributions
 promise+0.093
ap+0.092
 agree+0.089
leaning+0.077
 promises+0.076
Neg logit contributions
ous-0.246
ho-0.227
 Nice-0.223
 tigers-0.216
 Sky-0.215
Token soap
Token ID19533
Feature activation+1.57e-01

Pos logit contributions
 promise+0.002
 agree+0.002
ap+0.002
 promises+0.001
 So+0.001
Neg logit contributions
ous-0.003
 Nice-0.003
ho-0.002
pill-0.002
 tigers-0.002
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
ap+0.005
 promise+0.005
 agree+0.005
 promises+0.004
 So+0.004
Neg logit contributions
ous-0.018
 Nice-0.017
 tigers-0.016
 gor-0.016
 grizz-0.016
Token She
Token ID1375
Feature activation+1.84e-01

Pos logit contributions
ap+0.001
 agree+0.001
 promise+0.001
 So+0.001
leaning+0.001
Neg logit contributions
ous-0.002
ho-0.002
 guarding-0.002
 spilling-0.002
 Nice-0.002
Token washed
Token ID18989
Feature activation+5.10e-01

Pos logit contributions
 promise+0.013
ap+0.013
 agree+0.013
leaning+0.012
 promises+0.012
Neg logit contributions
ous-0.014
ho-0.013
 Nice-0.012
 tigers-0.012
 Sky-0.012
Token at
Token ID379
Feature activation+3.84e-01

Pos logit contributions
 promise+0.010
ap+0.010
 agree+0.010
 promises+0.009
 advantage+0.009
Neg logit contributions
ous-0.016
ho-0.015
 Nice-0.014
 tigers-0.014
 Sky-0.014
Token them
Token ID606
Feature activation+4.90e-03

Pos logit contributions
ap+0.022
 agree+0.021
 promise+0.021
leaning+0.021
 So+0.021
Neg logit contributions
ous-0.033
ho-0.031
 Nice-0.029
pill-0.029
chy-0.028
Token.
Token ID13
Feature activation+2.89e-02

Pos logit contributions
 promise+0.000
 agree+0.000
ap+0.000
leaning+0.000
 advantage+0.000
Neg logit contributions
ous-0.001
ho-0.000
 Nice-0.000
 tigers-0.000
 Sky-0.000
Token It
Token ID632
Feature activation+4.04e-01

Pos logit contributions
 promise+0.002
ap+0.002
 agree+0.002
leaning+0.002
 promises+0.002
Neg logit contributions
ous-0.003
 Nice-0.002
ho-0.002
 Sky-0.002
 tigers-0.002
Token w
Token ID266
Feature activation+3.42e-01

Pos logit contributions
 promise+0.027
 promises+0.025
 agree+0.024
 So+0.024
ap+0.023
Neg logit contributions
ous-0.037
ho-0.031
 Nice-0.029
 spilling-0.029
storm-0.028
Tokenags
Token ID3775
Feature activation+2.27e+00

Pos logit contributions
 promise+0.016
 So+0.013
 promises+0.013
 replace+0.012
 agree+0.012
Neg logit contributions
ous-0.032
sed-0.030
 snakes-0.030
pill-0.030
 tigers-0.030
Token its
Token ID663
Feature activation-4.74e-01

Pos logit contributions
ap+0.125
 agree+0.119
 promise+0.118
 So+0.105
 promises+0.105
Neg logit contributions
ous-0.202
pill-0.189
ho-0.180
 Nice-0.179
 gor-0.176
Token tail
Token ID7894
Feature activation-1.43e-01

Pos logit contributions
ous+0.033
ho+0.029
 Nice+0.029
 tigers+0.027
 Sky+0.027
Neg logit contributions
 promise-0.036
ap-0.035
 agree-0.035
leaning-0.033
 promises-0.032
Token.
Token ID13
Feature activation-7.17e-03

Pos logit contributions
ous+0.017
ho+0.016
 Nice+0.015
 tigers+0.015
 Sky+0.015
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
leaning-0.003
 promises-0.003
Token It
Token ID632
Feature activation+3.50e-01

Pos logit contributions
ous+0.001
 Nice+0.001
ho+0.001
 Sky+0.001
 tigers+0.001
Neg logit contributions
 promise-0.000
ap-0.000
 agree-0.000
leaning-0.000
 promises-0.000
Token likes
Token ID7832
Feature activation-5.79e-02

Pos logit contributions
 promise+0.023
 promises+0.022
 agree+0.021
 So+0.020
 own+0.020
Neg logit contributions
ous-0.033
ho-0.027
 spilling-0.026
 Nice-0.025
storm-0.025
Token its
Token ID663
Feature activation-3.98e-01

Pos logit contributions
 promise+0.007
ap+0.007
 agree+0.007
 promises+0.006
 advantage+0.006
Neg logit contributions
ous-0.011
ho-0.011
 Nice-0.010
 tigers-0.010
 guarding-0.010
Token ears
Token ID11368
Feature activation+7.40e-01

Pos logit contributions
ous+0.038
 Nice+0.033
ho+0.033
pill+0.032
 tigers+0.032
Neg logit contributions
 promise-0.023
ap-0.021
 agree-0.020
 promises-0.019
 advantage-0.019
Token.
Token ID13
Feature activation-2.82e-02

Pos logit contributions
 promise+0.022
ap+0.021
 agree+0.021
 So+0.016
 promises+0.016
Neg logit contributions
ous-0.084
ho-0.079
 Nice-0.079
 grizz-0.077
pill-0.077
Token It
Token ID632
Feature activation+2.58e-01

Pos logit contributions
ous+0.003
ho+0.002
 spilling+0.002
 Nice+0.002
 guarding+0.002
Neg logit contributions
ap-0.002
 agree-0.002
 promise-0.002
 So-0.002
leaning-0.001
Token w
Token ID266
Feature activation+3.53e-01

Pos logit contributions
 promise+0.017
 agree+0.016
 promises+0.016
ap+0.015
 So+0.015
Neg logit contributions
ous-0.022
 Nice-0.019
ho-0.019
 spilling-0.018
 guarding-0.017
Tokenags
Token ID3775
Feature activation+2.22e+00

Pos logit contributions
 promise+0.020
 agree+0.017
 promises+0.017
ap+0.017
leaning+0.016
Neg logit contributions
ous-0.033
pill-0.029
ho-0.029
 tigers-0.028
 Sky-0.028
Token its
Token ID663
Feature activation-6.48e-01

Pos logit contributions
ap+0.120
 promise+0.119
 agree+0.118
 promises+0.105
 So+0.104
Neg logit contributions
ous-0.203
pill-0.186
ho-0.178
 Nice-0.178
 gor-0.175
Token tail
Token ID7894
Feature activation-1.45e-01

Pos logit contributions
ous+0.051
ho+0.045
 Nice+0.043
 tigers+0.041
pill+0.041
Neg logit contributions
 promise-0.045
ap-0.044
 agree-0.043
 promises-0.041
 advantage-0.040
Token a
Token ID257
Feature activation+2.84e-01

Pos logit contributions
ous+0.018
ho+0.016
 Nice+0.016
pill+0.016
 spilling+0.016
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.003
 So-0.003
 promises-0.003
Token lot
Token ID1256
Feature activation+4.02e-01

Pos logit contributions
 promise+0.020
 agree+0.019
ap+0.019
 advantage+0.017
 promises+0.017
Neg logit contributions
ous-0.022
ho-0.020
 Nice-0.019
pill-0.019
 Sky-0.018
Token.
Token ID13
Feature activation-1.00e-01

Pos logit contributions
 promise+0.014
ap+0.014
 agree+0.014
 So+0.011
leaning+0.011
Neg logit contributions
ous-0.044
ho-0.041
 Nice-0.040
 tigers-0.039
 Sky-0.039
Token peek
Token ID27185
Feature activation+3.90e-01

Pos logit contributions
ous+0.009
ho+0.008
 Nice+0.008
pill+0.008
 tigers+0.008
Neg logit contributions
 promise-0.006
 agree-0.006
ap-0.005
 promises-0.005
 So-0.005
Token out
Token ID503
Feature activation+5.49e-01

Pos logit contributions
 promise+0.040
ap+0.040
 agree+0.039
leaning+0.037
 advantage+0.037
Neg logit contributions
ous-0.018
ho-0.014
 Nice-0.013
 tigers-0.012
 Sky-0.012
Token from
Token ID422
Feature activation+8.97e-02

Pos logit contributions
ap+0.023
 promise+0.022
 agree+0.022
 promises+0.020
 So+0.019
Neg logit contributions
ous-0.057
ho-0.053
pill-0.052
 Nice-0.051
storm-0.051
Token the
Token ID262
Feature activation+1.20e-01

Pos logit contributions
 promise+0.006
 agree+0.006
ap+0.006
 promises+0.005
 advantage+0.005
Neg logit contributions
ous-0.008
ho-0.007
 Nice-0.007
pill-0.006
chy-0.006
Token r
Token ID374
Feature activation+1.46e-01

Pos logit contributions
 promise+0.008
ap+0.007
 agree+0.007
 promises+0.007
leaning+0.007
Neg logit contributions
ous-0.010
ho-0.009
 Nice-0.009
pill-0.008
 Sky-0.008
Tokenags
Token ID3775
Feature activation+2.20e+00

Pos logit contributions
 promise+0.010
 agree+0.009
ap+0.009
 promises+0.009
 advantage+0.009
Neg logit contributions
ous-0.012
ho-0.011
pill-0.010
chy-0.010
 spilling-0.010
Token.
Token ID13
Feature activation-4.18e-01

Pos logit contributions
 promise+0.091
ap+0.087
 agree+0.085
leaning+0.077
 promises+0.075
Neg logit contributions
ous-0.233
ho-0.215
 Nice-0.209
 tigers-0.203
 Sky-0.202
Token They
Token ID1119
Feature activation+3.27e-01

Pos logit contributions
ous+0.037
ho+0.033
 Nice+0.033
 tigers+0.031
 Sky+0.031
Neg logit contributions
 promise-0.024
ap-0.024
 agree-0.023
leaning-0.022
 promises-0.021
Token see
Token ID766
Feature activation+1.27e+00

Pos logit contributions
ap+0.017
leaning+0.017
 promise+0.016
ute+0.016
 advantage+0.016
Neg logit contributions
ous-0.027
ho-0.027
 Sky-0.026
chy-0.025
 gn-0.025
Token a
Token ID257
Feature activation+5.18e-01

Pos logit contributions
 agree+0.054
 promises+0.054
ap+0.053
 promise+0.053
 learn+0.048
Neg logit contributions
ous-0.134
 Nice-0.120
ho-0.120
storm-0.117
pill-0.114
Token bird
Token ID6512
Feature activation+9.27e-02

Pos logit contributions
 promise+0.035
ap+0.032
 agree+0.032
 band+0.028
ute+0.028
Neg logit contributions
ous-0.042
 Nice-0.037
ho-0.036
pill-0.036
igator-0.036
Token Jack
Token ID3619
Feature activation+1.72e-01

Pos logit contributions
ap+0.000
 promise+0.000
 agree+0.000
 So+0.000
 promises+0.000
Neg logit contributions
ous-0.000
ho-0.000
 Nice-0.000
pill-0.000
 gor-0.000
Token was
Token ID373
Feature activation-3.27e-01

Pos logit contributions
 promise+0.013
ap+0.013
 agree+0.012
leaning+0.012
 promises+0.012
Neg logit contributions
ous-0.012
ho-0.011
 Nice-0.011
 tigers-0.010
 Sky-0.010
Token turning
Token ID6225
Feature activation-1.68e-01

Pos logit contributions
ous+0.019
ho+0.016
 spilling+0.015
 guarding+0.015
 chasing+0.015
Neg logit contributions
 promise-0.028
 agree-0.028
ap-0.027
leaning-0.027
 So-0.027
Token the
Token ID262
Feature activation-1.71e-01

Pos logit contributions
ous+0.013
ho+0.011
 Nice+0.011
 tigers+0.011
 spilling+0.011
Neg logit contributions
 promise-0.012
 agree-0.012
ap-0.012
leaning-0.011
 promises-0.011
Token r
Token ID374
Feature activation+1.31e-02

Pos logit contributions
ous+0.013
ho+0.011
 Nice+0.011
 spilling+0.010
chy+0.010
Neg logit contributions
ap-0.012
 promise-0.012
 agree-0.012
leaning-0.011
 advantage-0.011
Tokenags
Token ID3775
Feature activation+2.19e+00

Pos logit contributions
 promise+0.001
 agree+0.001
ap+0.001
 advantage+0.001
 promises+0.001
Neg logit contributions
ous-0.001
ho-0.001
chy-0.001
pill-0.001
 stairs-0.001
Token,
Token ID11
Feature activation+1.13e-01

Pos logit contributions
ap+0.103
 promise+0.100
 agree+0.097
 So+0.087
 promises+0.085
Neg logit contributions
ous-0.215
ho-0.202
 Nice-0.198
pill-0.195
 tigers-0.192
Token he
Token ID339
Feature activation-8.55e-02

Pos logit contributions
 agree+0.008
 promise+0.008
ap+0.008
 So+0.007
 promises+0.007
Neg logit contributions
ous-0.009
ho-0.008
pill-0.007
 Nice-0.007
 gor-0.007
Token looked
Token ID3114
Feature activation+3.10e-01

Pos logit contributions
ous+0.007
ho+0.006
 Nice+0.006
 Sky+0.005
 tigers+0.005
Neg logit contributions
 promise-0.006
ap-0.006
 agree-0.006
 So-0.005
 promises-0.005
Token out
Token ID503
Feature activation+5.90e-01

Pos logit contributions
 promise+0.021
ap+0.021
 agree+0.020
leaning+0.019
 promises+0.019
Neg logit contributions
ous-0.024
ho-0.022
 Nice-0.021
 tigers-0.020
 spilling-0.020
Token the
Token ID262
Feature activation+3.35e-01

Pos logit contributions
 promise+0.033
ap+0.033
 agree+0.033
 So+0.030
 promises+0.030
Neg logit contributions
ous-0.053
ho-0.049
 Nice-0.046
pill-0.045
 tigers-0.045
Token beauty
Token ID8737
Feature activation-1.60e-02

Pos logit contributions
 agree+0.043
iver+0.040
leaning+0.040
ags+0.039
our+0.038
Neg logit contributions
ous-0.049
ho-0.041
 Nice-0.039
 guarding-0.036
fly-0.036
Token lies
Token ID7363
Feature activation+3.27e-01

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 tigers+0.001
 Sky+0.001
Neg logit contributions
 promise-0.001
ap-0.001
 agree-0.001
leaning-0.001
 promises-0.001
Token in
Token ID287
Feature activation+9.04e-01

Pos logit contributions
 promise+0.016
ap+0.016
 agree+0.016
 promises+0.014
leaning+0.014
Neg logit contributions
ous-0.031
ho-0.028
 Nice-0.028
 tigers-0.027
 Sky-0.027
Token one
Token ID530
Feature activation-1.69e-01

Pos logit contributions
 agree+0.059
 promise+0.058
ap+0.057
 So+0.055
leaning+0.055
Neg logit contributions
ous-0.069
ho-0.063
 guarding-0.060
 spilling-0.060
 Nice-0.060
Token's
Token ID338
Feature activation+1.74e+00

Pos logit contributions
ous+0.014
ho+0.013
 Nice+0.012
igator+0.012
 Sky+0.012
Neg logit contributions
leaning-0.010
 agree-0.010
ap-0.009
 promise-0.009
ags-0.009
Token own
Token ID898
Feature activation+2.18e+00

Pos logit contributions
 agree+0.090
ap+0.087
leaning+0.085
 promise+0.085
 So+0.079
Neg logit contributions
ous-0.156
ho-0.146
 Nice-0.138
chy-0.133
 guarding-0.133
Token shadow
Token ID9082
Feature activation+6.83e-01

Pos logit contributions
 promise+0.110
ap+0.108
 agree+0.107
leaning+0.099
 promises+0.094
Neg logit contributions
ous-0.208
ho-0.189
 Nice-0.185
 tigers-0.177
 Sky-0.176
Token,
Token ID11
Feature activation+3.44e-01

Pos logit contributions
 promise+0.031
ap+0.030
 agree+0.029
 promises+0.026
 advantage+0.025
Neg logit contributions
ous-0.068
ho-0.064
 Nice-0.062
 tigers-0.060
 grizz-0.060
Token and
Token ID290
Feature activation+4.16e-01

Pos logit contributions
ap+0.019
 promise+0.019
 agree+0.019
leaning+0.018
 So+0.017
Neg logit contributions
ous-0.030
ho-0.027
 Nice-0.027
 guarding-0.026
 spilling-0.026
Token it
Token ID340
Feature activation+2.72e-01

Pos logit contributions
 promise+0.024
ap+0.024
 agree+0.023
leaning+0.022
 So+0.021
Neg logit contributions
ous-0.037
ho-0.033
 Nice-0.032
 tigers-0.031
 Sky-0.031
Token is
Token ID318
Feature activation-3.63e-02

Pos logit contributions
 agree+0.017
 promise+0.017
ap+0.017
leaning+0.016
 advantage+0.015
Neg logit contributions
ous-0.023
ho-0.020
 Nice-0.019
 Sky-0.019
pill-0.019
Token is
Token ID318
Feature activation-2.25e-01

Pos logit contributions
 promise+0.005
 agree+0.004
ap+0.004
 So+0.004
 promises+0.004
Neg logit contributions
ous-0.005
ho-0.004
 Nice-0.004
 spilling-0.004
 guarding-0.004
Token happy
Token ID3772
Feature activation+2.17e-02

Pos logit contributions
ous+0.023
pill+0.020
ho+0.020
storm+0.019
 Nice+0.019
Neg logit contributions
 promise-0.013
ap-0.012
 own-0.011
 promises-0.011
 still-0.011
Token.
Token ID13
Feature activation-2.38e-01

Pos logit contributions
 agree+0.001
 promise+0.001
ap+0.001
 promises+0.001
 So+0.001
Neg logit contributions
ous-0.002
ho-0.002
 Nice-0.002
pill-0.002
 grizz-0.002
Token It
Token ID632
Feature activation+1.17e-01

Pos logit contributions
ous+0.020
ho+0.018
 Nice+0.017
 tigers+0.017
 Sky+0.016
Neg logit contributions
 promise-0.015
ap-0.015
 agree-0.015
leaning-0.014
 advantage-0.014
Token w
Token ID266
Feature activation+2.14e-01

Pos logit contributions
 promise+0.008
 promises+0.008
 agree+0.007
 So+0.007
ap+0.007
Neg logit contributions
ous-0.010
ho-0.009
 spilling-0.008
 Nice-0.008
storm-0.008
Tokenags
Token ID3775
Feature activation+2.17e+00

Pos logit contributions
 promise+0.009
 So+0.007
 promises+0.007
 agree+0.007
our+0.007
Neg logit contributions
ous-0.021
pill-0.020
 tigers-0.020
sed-0.019
 snakes-0.019
Token its
Token ID663
Feature activation-5.86e-01

Pos logit contributions
ap+0.128
 promise+0.127
 agree+0.126
 advantage+0.112
 promises+0.112
Neg logit contributions
ous-0.188
ho-0.168
 Nice-0.165
pill-0.164
 tigers-0.158
Token tail
Token ID7894
Feature activation-2.23e-01

Pos logit contributions
ous+0.041
ho+0.037
 Nice+0.036
 tigers+0.034
 Sky+0.034
Neg logit contributions
 promise-0.044
ap-0.043
 agree-0.043
leaning-0.041
 promises-0.040
Token and
Token ID290
Feature activation-1.52e-02

Pos logit contributions
ous+0.026
ho+0.024
 Nice+0.023
 tigers+0.023
 spilling+0.023
Neg logit contributions
 promise-0.007
ap-0.007
 agree-0.006
 So-0.005
 promises-0.005
Token l
Token ID300
Feature activation-5.07e-01

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 spilling+0.001
lic+0.001
Neg logit contributions
 promise-0.001
 promises-0.001
 agree-0.001
ap-0.001
 So-0.001
Tokenicks
Token ID3378
Feature activation-1.99e-01

Pos logit contributions
ous+0.035
ho+0.031
 Nice+0.030
 tigers+0.028
 Sky+0.028
Neg logit contributions
 promise-0.039
ap-0.039
 agree-0.038
leaning-0.036
 promises-0.036
Token She
Token ID1375
Feature activation+1.86e-01

Pos logit contributions
 promise+0.005
ap+0.005
 agree+0.005
leaning+0.004
 promises+0.004
Neg logit contributions
ous-0.007
ho-0.006
 Nice-0.006
 Sky-0.006
pill-0.006
Token found
Token ID1043
Feature activation+5.86e-01

Pos logit contributions
ap+0.013
 promise+0.013
 agree+0.013
leaning+0.012
 advantage+0.012
Neg logit contributions
ous-0.014
ho-0.013
 Nice-0.012
 spilling-0.012
 guarding-0.012
Token some
Token ID617
Feature activation-5.77e-01

Pos logit contributions
 promise+0.030
ap+0.029
 agree+0.028
 promises+0.028
 advantage+0.027
Neg logit contributions
ous-0.056
 Nice-0.051
ho-0.050
 tigers-0.048
storm-0.047
Token old
Token ID1468
Feature activation-1.74e-01

Pos logit contributions
ous+0.043
ho+0.038
 Nice+0.037
 spilling+0.036
 tigers+0.036
Neg logit contributions
ap-0.041
 agree-0.040
 promise-0.040
leaning-0.037
 advantage-0.036
Token r
Token ID374
Feature activation+3.03e-01

Pos logit contributions
ous+0.014
ho+0.013
 Nice+0.012
 tigers+0.012
 spilling+0.012
Neg logit contributions
ap-0.011
 promise-0.011
 agree-0.011
leaning-0.010
 advantage-0.010
Tokenags
Token ID3775
Feature activation+2.17e+00

Pos logit contributions
 promise+0.022
 agree+0.021
ap+0.020
 So+0.019
 promises+0.019
Neg logit contributions
ous-0.023
ho-0.020
pill-0.019
 spilling-0.019
 Nice-0.018
Token which
Token ID543
Feature activation-9.67e-01

Pos logit contributions
ap+0.062
 promise+0.060
 agree+0.060
leaning+0.047
 So+0.047
Neg logit contributions
ous-0.249
 Nice-0.234
ho-0.230
 grizz-0.227
pill-0.227
Token were
Token ID547
Feature activation-4.32e-01

Pos logit contributions
ous+0.075
 Nice+0.066
ho+0.064
 guarding+0.062
 tigers+0.062
Neg logit contributions
 promise-0.067
 agree-0.064
ap-0.063
 promises-0.059
 So-0.059
Token dirty
Token ID11841
Feature activation+1.83e-01

Pos logit contributions
ous+0.038
 Nice+0.034
ho+0.033
pill+0.032
 Sky+0.032
Neg logit contributions
 promise-0.026
 agree-0.024
ap-0.024
 promises-0.024
 advantage-0.022
Token.
Token ID13
Feature activation+1.30e-01

Pos logit contributions
 promise+0.005
ap+0.005
 agree+0.004
 So+0.004
 promises+0.004
Neg logit contributions
ous-0.022
ho-0.020
 Nice-0.020
 Sky-0.020
 tigers-0.019
Token She
Token ID1375
Feature activation+1.61e-01

Pos logit contributions
 promise+0.008
ap+0.008
 agree+0.008
leaning+0.008
 So+0.008
Neg logit contributions
ous-0.011
ho-0.010
 Nice-0.009
 tigers-0.009
 guarding-0.009
Tokeniffs
Token ID10203
Feature activation+8.24e-02

Pos logit contributions
 promise+0.011
 agree+0.011
ap+0.010
leaning+0.010
 promises+0.010
Neg logit contributions
ho-0.008
ous-0.008
 Nice-0.007
 Sky-0.007
balls-0.006
Token them
Token ID606
Feature activation-1.44e-01

Pos logit contributions
ap+0.005
 promise+0.005
 agree+0.004
 So+0.004
 promises+0.004
Neg logit contributions
ous-0.008
 Nice-0.007
 grizz-0.007
ho-0.007
pill-0.007
Token.
Token ID13
Feature activation-1.58e-01

Pos logit contributions
ous+0.014
ho+0.014
 Nice+0.013
 Sky+0.013
chy+0.013
Neg logit contributions
 promise-0.006
leaning-0.006
ap-0.006
 agree-0.006
 advantage-0.005
Token It
Token ID632
Feature activation+2.35e-01

Pos logit contributions
ous+0.016
ho+0.014
 spilling+0.014
 guarding+0.013
 Nice+0.013
Neg logit contributions
ap-0.007
 agree-0.007
 promise-0.007
 So-0.007
 promises-0.006
Token w
Token ID266
Feature activation+2.89e-01

Pos logit contributions
 promise+0.017
 promises+0.016
 agree+0.015
 own+0.015
 So+0.014
Neg logit contributions
ous-0.021
ho-0.016
 spilling-0.016
storm-0.016
 Nice-0.016
Tokenags
Token ID3775
Feature activation+2.16e+00

Pos logit contributions
 promise+0.013
 So+0.011
 promises+0.010
leaning+0.010
 replace+0.010
Neg logit contributions
ous-0.027
pill-0.026
 snakes-0.026
 tigers-0.026
 Sky-0.026
Token its
Token ID663
Feature activation-5.43e-01

Pos logit contributions
ap+0.122
 agree+0.118
 promise+0.118
 promises+0.105
 So+0.105
Neg logit contributions
ous-0.194
pill-0.174
ho-0.169
 Nice-0.169
 grizz-0.164
Token tail
Token ID7894
Feature activation-2.37e-01

Pos logit contributions
ous+0.038
ho+0.034
 Nice+0.033
 tigers+0.031
 Sky+0.031
Neg logit contributions
 promise-0.041
ap-0.040
 agree-0.040
leaning-0.038
 promises-0.037
Token.
Token ID13
Feature activation-1.50e-01

Pos logit contributions
ous+0.026
ho+0.024
 Nice+0.024
 tigers+0.023
 Sky+0.023
Neg logit contributions
 promise-0.008
ap-0.008
 agree-0.008
leaning-0.007
 promises-0.006
Token It
Token ID632
Feature activation+2.18e-01

Pos logit contributions
ous+0.015
ho+0.014
 spilling+0.013
 Nice+0.013
 guarding+0.013
Neg logit contributions
ap-0.007
 agree-0.007
 promise-0.007
 So-0.006
 advantage-0.006
Token l
Token ID300
Feature activation-1.52e-01

Pos logit contributions
 promise+0.015
 promises+0.014
 agree+0.013
 So+0.013
 own+0.013
Neg logit contributions
ous-0.021
ho-0.016
 spilling-0.016
storm-0.016
 Nice-0.016
Token to
Token ID284
Feature activation+6.68e-01

Pos logit contributions
 promise+0.049
ap+0.048
 agree+0.047
 promises+0.043
 So+0.043
Neg logit contributions
ous-0.066
ho-0.060
 Nice-0.058
 grizz-0.056
 tigers-0.056
Token Spot
Token ID15899
Feature activation+9.90e-01

Pos logit contributions
 promise+0.040
ap+0.040
 agree+0.039
leaning+0.036
 promises+0.035
Neg logit contributions
ous-0.058
ho-0.052
 Nice-0.051
 tigers-0.049
 guarding-0.048
Token.
Token ID13
Feature activation-2.79e-02

Pos logit contributions
 promise+0.041
ap+0.041
leaning+0.040
 advantage+0.039
 agree+0.038
Neg logit contributions
ous-0.099
ho-0.092
 guarding-0.089
 Sky-0.089
 Nice-0.089
Token He
Token ID679
Feature activation+3.03e-01

Pos logit contributions
ous+0.003
ho+0.002
 Nice+0.002
 tigers+0.002
 Sky+0.002
Neg logit contributions
 promise-0.002
ap-0.001
 agree-0.001
leaning-0.001
 advantage-0.001
Token w
Token ID266
Feature activation+3.09e-01

Pos logit contributions
 promise+0.021
 agree+0.020
ap+0.020
 promises+0.019
 So+0.018
Neg logit contributions
ous-0.025
ho-0.022
 Nice-0.022
 spilling-0.020
storm-0.020
Tokenags
Token ID3775
Feature activation+2.16e+00

Pos logit contributions
 promise+0.015
 So+0.015
So+0.014
leaning+0.013
ute+0.013
Neg logit contributions
ous-0.025
 gor-0.023
pill-0.023
balls-0.023
 bees-0.023
Token his
Token ID465
Feature activation+5.51e-02

Pos logit contributions
ap+0.118
 agree+0.117
 promise+0.115
 promises+0.105
 So+0.102
Neg logit contributions
ous-0.194
pill-0.178
ho-0.176
 Nice-0.172
chy-0.167
Token long
Token ID890
Feature activation-8.77e-01

Pos logit contributions
 promise+0.004
ap+0.004
 agree+0.004
leaning+0.004
 advantage+0.004
Neg logit contributions
ous-0.004
ho-0.003
 Nice-0.003
 tigers-0.003
 Sky-0.003
Token tail
Token ID7894
Feature activation-9.12e-03

Pos logit contributions
ous+0.075
ho+0.071
 guarding+0.067
 spilling+0.067
 Nice+0.067
Neg logit contributions
ap-0.048
 promise-0.048
 agree-0.047
leaning-0.044
 advantage-0.043
Token.
Token ID13
Feature activation+7.57e-02

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 tigers+0.001
 Sky+0.001
Neg logit contributions
 promise-0.000
ap-0.000
 agree-0.000
leaning-0.000
 promises-0.000
Token\n
Token ID198
Feature activation+4.22e-01

Pos logit contributions
ap+0.004
 promise+0.004
 agree+0.004
 So+0.004
leaning+0.003
Neg logit contributions
ous-0.007
ho-0.007
 spilling-0.006
 Nice-0.006
igator-0.006
Token and
Token ID290
Feature activation+1.16e-01

Pos logit contributions
 promise+0.002
ap+0.002
 agree+0.002
 promises+0.002
leaning+0.002
Neg logit contributions
ous-0.006
ho-0.006
 Nice-0.006
 tigers-0.006
 Sky-0.006
Token showed
Token ID3751
Feature activation-5.98e-02

Pos logit contributions
 promise+0.007
ap+0.007
 agree+0.007
leaning+0.006
 advantage+0.006
Neg logit contributions
ous-0.010
ho-0.009
 Nice-0.009
 tigers-0.008
 Sky-0.008
Token her
Token ID607
Feature activation+1.25e-01

Pos logit contributions
ous+0.004
ho+0.004
 Nice+0.003
 Sky+0.003
 tigers+0.003
Neg logit contributions
 promise-0.005
ap-0.004
 agree-0.004
leaning-0.004
 So-0.004
Token the
Token ID262
Feature activation+1.69e-01

Pos logit contributions
ap+0.008
 promise+0.008
 agree+0.008
leaning+0.008
 advantage+0.008
Neg logit contributions
ous-0.010
ho-0.009
 Nice-0.008
 spilling-0.008
 Sky-0.008
Token r
Token ID374
Feature activation+2.53e-01

Pos logit contributions
ap+0.012
 agree+0.012
 promise+0.012
leaning+0.011
 advantage+0.011
Neg logit contributions
ous-0.012
ho-0.011
 Nice-0.011
 spilling-0.010
 tigers-0.010
Tokenags
Token ID3775
Feature activation+2.15e+00

Pos logit contributions
 promise+0.019
 agree+0.019
ap+0.019
 promises+0.018
 So+0.018
Neg logit contributions
ous-0.017
ho-0.015
pill-0.014
 Nice-0.014
 spilling-0.014
Token.
Token ID13
Feature activation+5.90e-02

Pos logit contributions
 promise+0.124
ap+0.122
 agree+0.121
leaning+0.109
 promises+0.109
Neg logit contributions
ous-0.190
ho-0.173
 Nice-0.169
 tigers-0.163
 Sky-0.162
Token She
Token ID1375
Feature activation+2.13e-01

Pos logit contributions
ap+0.004
 promise+0.004
 agree+0.004
 So+0.004
 promises+0.003
Neg logit contributions
ous-0.005
ho-0.004
 tigers-0.004
 guarding-0.004
 gor-0.004
Token said
Token ID531
Feature activation-4.90e-03

Pos logit contributions
 promise+0.015
ap+0.014
 agree+0.014
leaning+0.013
 promises+0.013
Neg logit contributions
ous-0.017
ho-0.015
 Nice-0.015
 tigers-0.014
 Sky-0.014
Token that
Token ID326
Feature activation-4.82e-01

Pos logit contributions
ous+0.000
ho+0.000
 Nice+0.000
 tigers+0.000
 grizz+0.000
Neg logit contributions
ap-0.000
 promise-0.000
 agree-0.000
 So-0.000
 advantage-0.000
Token she
Token ID673
Feature activation-3.37e-01

Pos logit contributions
ous+0.044
ho+0.038
 Nice+0.038
 grizz+0.037
sed+0.037
Neg logit contributions
 promise-0.028
ap-0.027
 promises-0.027
 agree-0.027
leaning-0.024
Token him
Token ID683
Feature activation+1.26e-01

Pos logit contributions
 promise+0.009
ap+0.008
 agree+0.008
leaning+0.008
 advantage+0.007
Neg logit contributions
ous-0.011
ho-0.010
 Nice-0.010
 tigers-0.009
 spilling-0.009
Token to
Token ID284
Feature activation-3.41e-01

Pos logit contributions
leaning+0.008
ap+0.007
 promise+0.007
 agree+0.007
ags+0.007
Neg logit contributions
ous-0.010
chy-0.009
 chasing-0.009
 sleeping-0.009
 guarding-0.009
Token turn
Token ID1210
Feature activation+6.07e-01

Pos logit contributions
ous+0.025
 gn+0.022
 chasing+0.022
 spilling+0.022
 sleeping+0.022
Neg logit contributions
leaning-0.021
ute-0.021
ap-0.020
ig-0.020
 advantage-0.020
Token the
Token ID262
Feature activation-1.85e-01

Pos logit contributions
 promise+0.044
ap+0.043
 agree+0.043
leaning+0.040
 So+0.039
Neg logit contributions
ous-0.045
ho-0.039
 Nice-0.038
 spilling-0.038
 tigers-0.037
Token r
Token ID374
Feature activation+2.63e-03

Pos logit contributions
ous+0.016
ho+0.014
 Nice+0.014
 tigers+0.013
 spilling+0.013
Neg logit contributions
 promise-0.011
ap-0.011
 agree-0.011
leaning-0.010
 So-0.010
Tokenags
Token ID3775
Feature activation+2.15e+00

Pos logit contributions
 promise+0.000
 agree+0.000
 So+0.000
 promises+0.000
 advantage+0.000
Neg logit contributions
ous-0.000
ho-0.000
pill-0.000
chy-0.000
balls-0.000
Token she
Token ID673
Feature activation-4.84e-01

Pos logit contributions
 promise+0.118
ap+0.117
 agree+0.115
leaning+0.103
 advantage+0.103
Neg logit contributions
ous-0.195
ho-0.178
 Nice-0.174
 tigers-0.167
 Sky-0.166
Token kept
Token ID4030
Feature activation+5.14e-02

Pos logit contributions
ous+0.035
ho+0.031
 Nice+0.030
 tigers+0.029
 Sky+0.028
Neg logit contributions
 promise-0.036
ap-0.035
 agree-0.035
leaning-0.033
 promises-0.032
Token in
Token ID287
Feature activation+3.50e-01

Pos logit contributions
 promise+0.004
ap+0.004
 agree+0.004
leaning+0.004
 So+0.004
Neg logit contributions
ous-0.004
ho-0.003
 Nice-0.003
 spilling-0.003
 guarding-0.003
Token the
Token ID262
Feature activation+1.99e-01

Pos logit contributions
ap+0.023
 promise+0.022
 agree+0.022
leaning+0.021
 So+0.020
Neg logit contributions
ous-0.028
ho-0.025
 Nice-0.024
 spilling-0.024
 guarding-0.024
Token closet
Token ID21615
Feature activation+7.81e-01

Pos logit contributions
 agree+0.015
 promise+0.014
ap+0.014
leaning+0.014
 So+0.013
Neg logit contributions
ous-0.013
ho-0.012
 spilling-0.011
 stairs-0.011
chy-0.011
Token He
Token ID679
Feature activation+1.86e-01

Pos logit contributions
ous+0.007
 Nice+0.007
ho+0.006
 tigers+0.006
 gor+0.006
Neg logit contributions
 promise-0.006
ap-0.005
 agree-0.005
leaning-0.005
 promises-0.005
Token wrote
Token ID2630
Feature activation-2.57e-01

Pos logit contributions
 promise+0.014
ap+0.014
 agree+0.014
leaning+0.013
 advantage+0.013
Neg logit contributions
ous-0.013
ho-0.011
 Nice-0.011
 tigers-0.011
 Sky-0.010
Token z
Token ID1976
Feature activation-1.11e-01

Pos logit contributions
ous+0.021
ho+0.020
 Nice+0.019
pill+0.018
 grizz+0.018
Neg logit contributions
ap-0.016
 promise-0.016
 agree-0.015
 So-0.014
 advantage-0.014
Tokenig
Token ID328
Feature activation+1.14e+00

Pos logit contributions
ous+0.009
ho+0.008
balls+0.008
sed+0.008
 Nice+0.007
Neg logit contributions
 promise-0.008
 agree-0.007
 So-0.007
leaning-0.007
 replace-0.006
Tokenz
Token ID89
Feature activation-1.66e-01

Pos logit contributions
ap+0.081
 promise+0.077
 agree+0.076
 So+0.074
our+0.074
Neg logit contributions
ous-0.079
 Nice-0.074
ho-0.073
igator-0.073
 spilling-0.072
Tokenags
Token ID3775
Feature activation+2.15e+00

Pos logit contributions
ous+0.014
 Nice+0.012
ho+0.012
 wo+0.011
balls+0.011
Neg logit contributions
 promise-0.010
 agree-0.010
leaning-0.009
 replace-0.008
 learn-0.008
Token on
Token ID319
Feature activation+5.11e-01

Pos logit contributions
ap+0.115
 promise+0.113
 agree+0.111
 advantage+0.100
 So+0.099
Neg logit contributions
ous-0.194
ho-0.181
 Nice-0.175
pill-0.174
 grizz-0.169
Token walls
Token ID7714
Feature activation-4.16e-01

Pos logit contributions
 promise+0.035
ap+0.034
 agree+0.034
leaning+0.031
 promises+0.031
Neg logit contributions
ous-0.040
ho-0.036
 Nice-0.035
 tigers-0.034
 Sky-0.033
Token,
Token ID11
Feature activation+5.34e-02

Pos logit contributions
ous+0.048
ho+0.045
 Nice+0.044
pill+0.043
 grizz+0.043
Neg logit contributions
ap-0.013
 promise-0.012
 agree-0.012
 So-0.009
 promises-0.009
Token floors
Token ID18570
Feature activation+1.67e-01

Pos logit contributions
ap+0.002
 agree+0.002
 promise+0.002
 learn+0.002
 promises+0.002
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
igator-0.005
sed-0.005
Token,
Token ID11
Feature activation+2.09e-01

Pos logit contributions
ap+0.006
 promise+0.006
 agree+0.005
 promises+0.004
 So+0.004
Neg logit contributions
ous-0.019
ho-0.017
 Nice-0.017
 tigers-0.017
 Sky-0.017
Token it
Token ID340
Feature activation+8.84e-02

Pos logit contributions
ous+0.012
ho+0.010
 Nice+0.010
pill+0.010
 grizz+0.009
Neg logit contributions
 promise-0.015
ap-0.015
 agree-0.015
 So-0.014
 advantage-0.014
Token was
Token ID373
Feature activation-3.14e-01

Pos logit contributions
 promise+0.006
ap+0.006
 agree+0.006
 promises+0.005
 So+0.005
Neg logit contributions
ous-0.007
 Nice-0.006
ho-0.006
pill-0.006
 Sky-0.006
Token difficult
Token ID2408
Feature activation+4.75e-01

Pos logit contributions
ous+0.028
ho+0.025
 Nice+0.025
pill+0.024
 Sky+0.023
Neg logit contributions
 promise-0.019
ap-0.018
 agree-0.018
 advantage-0.017
 promises-0.016
Token finding
Token ID4917
Feature activation-6.96e-01

Pos logit contributions
 promise+0.029
ap+0.028
 agree+0.028
leaning+0.026
 promises+0.026
Neg logit contributions
ous-0.041
ho-0.037
 Nice-0.036
 tigers-0.034
 Sky-0.034
Token r
Token ID374
Feature activation+8.32e-02

Pos logit contributions
ous+0.049
 spilling+0.044
 tigers+0.042
 chasing+0.041
 Nice+0.041
Neg logit contributions
 agree-0.051
 promise-0.051
ap-0.051
leaning-0.047
our-0.046
Tokenags
Token ID3775
Feature activation+2.15e+00

Pos logit contributions
 promise+0.006
 agree+0.006
ap+0.006
 promises+0.006
 So+0.006
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
 tigers-0.005
 spilling-0.005
Token and
Token ID290
Feature activation+4.72e-02

Pos logit contributions
 promise+0.104
ap+0.102
 agree+0.100
leaning+0.090
 promises+0.089
Neg logit contributions
ous-0.209
ho-0.192
 Nice-0.188
 tigers-0.181
 Sky-0.180
Token water
Token ID1660
Feature activation-6.20e-01

Pos logit contributions
 promise+0.003
 agree+0.003
 promises+0.002
 care+0.002
ap+0.002
Neg logit contributions
ous-0.004
pill-0.004
 Nice-0.004
ho-0.004
lic-0.004
Token.
Token ID13
Feature activation-4.54e-02

Pos logit contributions
ous+0.059
 Nice+0.055
pill+0.053
 gor+0.052
 Sky+0.052
Neg logit contributions
 promise-0.031
ap-0.029
 agree-0.029
 promises-0.026
 So-0.025
Token But
Token ID887
Feature activation+5.64e-01

Pos logit contributions
ous+0.003
ho+0.002
 Nice+0.002
 tigers+0.002
 spilling+0.002
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
 So-0.003
 advantage-0.003
Token Bob
Token ID5811
Feature activation+1.40e-01

Pos logit contributions
 promise+0.047
ap+0.046
 agree+0.046
 So+0.043
 promises+0.043
Neg logit contributions
ous-0.036
ho-0.030
 Nice-0.030
pill-0.029
 grizz-0.028
Token the
Token ID262
Feature activation-1.13e-01

Pos logit contributions
ous+0.043
ho+0.039
 Nice+0.038
pill+0.036
 grizz+0.036
Neg logit contributions
 promise-0.021
ap-0.021
 agree-0.020
 So-0.018
 advantage-0.018
Token things
Token ID1243
Feature activation-1.82e-01

Pos logit contributions
ous+0.010
ho+0.009
 Nice+0.009
 Sky+0.008
pill+0.008
Neg logit contributions
 promise-0.007
 agree-0.007
ap-0.006
 promises-0.006
 So-0.006
Token.
Token ID13
Feature activation+5.81e-02

Pos logit contributions
ous+0.018
ho+0.016
 Nice+0.016
 tigers+0.015
 Sky+0.015
Neg logit contributions
 promise-0.009
ap-0.009
 agree-0.009
leaning-0.008
 advantage-0.008
Token He
Token ID679
Feature activation+2.49e-01

Pos logit contributions
 promise+0.004
ap+0.004
 agree+0.004
 So+0.004
leaning+0.003
Neg logit contributions
ous-0.005
ho-0.004
 Nice-0.004
 tigers-0.004
 spilling-0.004
Token w
Token ID266
Feature activation+2.31e-01

Pos logit contributions
 promise+0.018
ap+0.018
 agree+0.018
 promises+0.017
 So+0.016
Neg logit contributions
ous-0.019
ho-0.016
 Nice-0.016
 spilling-0.015
 tigers-0.015
Tokenags
Token ID3775
Feature activation+2.14e+00

Pos logit contributions
 promise+0.011
 So+0.009
 replace+0.009
 promises+0.008
 destination+0.008
Neg logit contributions
sed-0.019
ous-0.019
pill-0.019
 snakes-0.018
balls-0.018
Token his
Token ID465
Feature activation-1.95e-01

Pos logit contributions
 agree+0.115
ap+0.114
 promise+0.111
 promises+0.102
 So+0.102
Neg logit contributions
ous-0.197
pill-0.180
ho-0.178
 Nice-0.172
chy-0.165
Token tail
Token ID7894
Feature activation-3.43e-01

Pos logit contributions
ous+0.013
ho+0.011
 Nice+0.011
 Sky+0.011
 tigers+0.011
Neg logit contributions
 promise-0.015
ap-0.015
 agree-0.015
leaning-0.014
 advantage-0.014
Token and
Token ID290
Feature activation-4.43e-02

Pos logit contributions
ous+0.039
ho+0.036
 Nice+0.035
 tigers+0.034
 Sky+0.034
Neg logit contributions
 promise-0.011
ap-0.011
 agree-0.011
leaning-0.009
 promises-0.009
Token runs
Token ID4539
Feature activation-2.50e-01

Pos logit contributions
ous+0.004
ho+0.003
 Nice+0.003
lic+0.003
storm+0.003
Neg logit contributions
 promises-0.003
 promise-0.003
 agree-0.003
 So-0.003
ap-0.003
Token away
Token ID1497
Feature activation+7.03e-01

Pos logit contributions
ous+0.022
ho+0.020
 Nice+0.019
 Sky+0.018
 tigers+0.018
Neg logit contributions
 promise-0.015
ap-0.015
 agree-0.014
leaning-0.013
 promises-0.013
Token tongue
Token ID11880
Feature activation+2.70e-01

Pos logit contributions
 promise+0.037
ap+0.036
 agree+0.035
 So+0.032
 promises+0.032
Neg logit contributions
ous-0.041
ho-0.036
 Nice-0.035
pill-0.033
 tigers-0.033
Token out
Token ID503
Feature activation+4.16e-01

Pos logit contributions
 promise+0.009
ap+0.009
 agree+0.009
 promises+0.008
 So+0.007
Neg logit contributions
ous-0.030
ho-0.028
 Nice-0.027
 tigers-0.027
 Sky-0.026
Token.
Token ID13
Feature activation-1.12e-01

Pos logit contributions
ap+0.020
 promise+0.020
 agree+0.019
 So+0.016
ags+0.016
Neg logit contributions
pill-0.040
ous-0.039
ho-0.036
 Nice-0.035
 gor-0.035
Token He
Token ID679
Feature activation+2.07e-01

Pos logit contributions
ous+0.010
ho+0.009
 Nice+0.008
 tigers+0.008
 spilling+0.008
Neg logit contributions
 promise-0.007
ap-0.007
 agree-0.007
 So-0.006
leaning-0.006
Token w
Token ID266
Feature activation+2.12e-01

Pos logit contributions
 promise+0.015
ap+0.014
 agree+0.014
 promises+0.013
 So+0.013
Neg logit contributions
ous-0.016
ho-0.014
 Nice-0.014
igator-0.013
 spilling-0.013
Tokenags
Token ID3775
Feature activation+2.13e+00

Pos logit contributions
 promise+0.010
 So+0.009
 replace+0.009
 own+0.008
 promises+0.008
Neg logit contributions
sed-0.019
ous-0.018
ses-0.018
pill-0.017
 snakes-0.017
Token his
Token ID465
Feature activation-4.77e-02

Pos logit contributions
ap+0.091
 agree+0.087
 promise+0.080
 promises+0.074
 So+0.073
Neg logit contributions
pill-0.218
ous-0.215
chy-0.190
storm-0.189
 Nice-0.188
Token tail
Token ID7894
Feature activation-1.94e-01

Pos logit contributions
ous+0.003
ho+0.003
 Nice+0.003
 Sky+0.003
sed+0.003
Neg logit contributions
 promise-0.004
 agree-0.004
ap-0.004
leaning-0.004
 advantage-0.003
Token and
Token ID290
Feature activation-3.70e-02

Pos logit contributions
ous+0.023
ho+0.022
 Nice+0.021
 tigers+0.021
pill+0.021
Neg logit contributions
 promise-0.005
ap-0.005
 agree-0.005
 So-0.004
 promises-0.004
Token makes
Token ID1838
Feature activation-2.37e-01

Pos logit contributions
ous+0.003
ho+0.003
 Nice+0.003
storm+0.003
 spilling+0.003
Neg logit contributions
 promise-0.002
 promises-0.002
 agree-0.002
 So-0.002
ap-0.002
Token a
Token ID257
Feature activation-8.79e-02

Pos logit contributions
ous+0.025
ho+0.023
 Nice+0.022
storm+0.021
 guarding+0.021
Neg logit contributions
 agree-0.011
 promise-0.011
ap-0.010
 promises-0.010
 advantage-0.010
Token running
Token ID2491
Feature activation-9.85e-02

Pos logit contributions
ous+0.012
 Nice+0.010
ho+0.010
igator+0.010
pill+0.010
Neg logit contributions
ap-0.013
 promise-0.013
 agree-0.013
 promises-0.012
leaning-0.012
Token around
Token ID1088
Feature activation+6.27e-01

Pos logit contributions
ous+0.009
ho+0.008
 Nice+0.008
 spilling+0.008
 chasing+0.008
Neg logit contributions
 promise-0.006
 agree-0.005
ap-0.005
leaning-0.005
 promises-0.005
Token with
Token ID351
Feature activation-3.95e-01

Pos logit contributions
 promise+0.033
ap+0.032
 agree+0.032
leaning+0.029
 promises+0.028
Neg logit contributions
ous-0.059
ho-0.054
 Nice-0.053
 tigers-0.051
 Sky-0.050
Token the
Token ID262
Feature activation+2.40e-03

Pos logit contributions
ous+0.032
ho+0.029
 Nice+0.027
 spilling+0.027
 tigers+0.027
Neg logit contributions
 promise-0.026
ap-0.025
 agree-0.025
leaning-0.023
 So-0.023
Token r
Token ID374
Feature activation+1.12e-01

Pos logit contributions
ap+0.000
 agree+0.000
 promise+0.000
 So+0.000
 advantage+0.000
Neg logit contributions
ous-0.000
 spilling-0.000
ho-0.000
chy-0.000
 grizz-0.000
Tokenags
Token ID3775
Feature activation+2.13e+00

Pos logit contributions
 promise+0.008
 agree+0.008
ap+0.007
 learn+0.007
 advantage+0.007
Neg logit contributions
ous-0.008
 spilling-0.007
chy-0.007
ho-0.007
pill-0.006
Token,
Token ID11
Feature activation+1.26e-01

Pos logit contributions
 promise+0.089
ap+0.088
 agree+0.085
 promises+0.073
 advantage+0.073
Neg logit contributions
ous-0.220
ho-0.204
 Nice-0.200
 tigers-0.194
 grizz-0.193
Token so
Token ID523
Feature activation-6.93e-02

Pos logit contributions
ap+0.007
 promise+0.007
 agree+0.007
leaning+0.006
 So+0.006
Neg logit contributions
ous-0.011
ho-0.010
 Nice-0.010
 spilling-0.010
 guarding-0.010
Token this
Token ID428
Feature activation+1.22e-01

Pos logit contributions
ous+0.006
 Nice+0.005
ho+0.005
 tigers+0.005
igator+0.005
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
 promises-0.004
 advantage-0.004
Token kind
Token ID1611
Feature activation+6.78e-01

Pos logit contributions
ap+0.009
 promise+0.009
 agree+0.009
leaning+0.008
 advantage+0.008
Neg logit contributions
ous-0.009
ho-0.008
 Nice-0.007
 spilling-0.007
 tigers-0.007
Token of
Token ID286
Feature activation-1.16e-01

Pos logit contributions
ap+0.043
 agree+0.042
 promise+0.041
ags+0.041
our+0.040
Neg logit contributions
ous-0.056
ho-0.050
 Nice-0.046
 spilling-0.044
chy-0.044
Token grabbed
Token ID13176
Feature activation-4.34e-03

Pos logit contributions
ap+0.102
 promise+0.102
 agree+0.099
leaning+0.095
 advantage+0.094
Neg logit contributions
ous-0.111
ho-0.100
 Nice-0.094
 tigers-0.091
 Sky-0.091
Token one
Token ID530
Feature activation-8.71e-01

Pos logit contributions
ous+0.000
ho+0.000
 Nice+0.000
 spilling+0.000
 Sky+0.000
Neg logit contributions
 promise-0.000
ap-0.000
 agree-0.000
leaning-0.000
 promises-0.000
Token of
Token ID286
Feature activation-5.95e-01

Pos logit contributions
ous+0.066
ho+0.061
igator+0.057
 Nice+0.057
chy+0.055
Neg logit contributions
 agree-0.058
ap-0.056
leaning-0.056
 promise-0.053
 promises-0.052
Token the
Token ID262
Feature activation+1.09e-01

Pos logit contributions
ous+0.039
ho+0.035
 Nice+0.034
 spilling+0.034
 Sky+0.033
Neg logit contributions
ap-0.046
 promise-0.046
 agree-0.045
leaning-0.043
our-0.042
Token r
Token ID374
Feature activation+2.63e-01

Pos logit contributions
ap+0.007
 agree+0.007
 promise+0.007
leaning+0.006
 advantage+0.006
Neg logit contributions
ous-0.008
 spilling-0.008
ho-0.007
chy-0.007
 Nice-0.007
Tokenags
Token ID3775
Feature activation+2.13e+00

Pos logit contributions
 promise+0.020
 agree+0.019
 replace+0.018
 learn+0.018
 express+0.018
Neg logit contributions
ous-0.018
drop-0.015
pill-0.015
 spilling-0.014
ho-0.014
Token and
Token ID290
Feature activation+8.58e-02

Pos logit contributions
 promise+0.093
ap+0.090
 agree+0.088
leaning+0.081
 promises+0.079
Neg logit contributions
ous-0.220
ho-0.202
 Nice-0.195
 spilling-0.188
 tigers-0.187
Token started
Token ID2067
Feature activation-1.53e-01

Pos logit contributions
 promise+0.005
 agree+0.005
ap+0.005
 So+0.005
 promises+0.004
Neg logit contributions
ous-0.008
ho-0.007
 Nice-0.007
 gor-0.007
 guarding-0.006
Token to
Token ID284
Feature activation-4.07e-01

Pos logit contributions
ous+0.012
ho+0.011
 Nice+0.010
igator+0.010
 tigers+0.010
Neg logit contributions
ap-0.011
 promise-0.010
 agree-0.010
 promises-0.009
 So-0.009
Token spin
Token ID7906
Feature activation+4.82e-01

Pos logit contributions
ous+0.034
ho+0.030
 Nice+0.030
 spilling+0.028
 tigers+0.028
Neg logit contributions
 promise-0.025
ap-0.025
 agree-0.024
leaning-0.023
 advantage-0.023
Token it
Token ID340
Feature activation+8.18e-03

Pos logit contributions
 promise+0.031
ap+0.030
 agree+0.029
leaning+0.028
 promises+0.027
Neg logit contributions
ous-0.040
ho-0.036
 Nice-0.035
 Sky-0.034
 tigers-0.034
Token mom
Token ID1995
Feature activation+4.88e-01

Pos logit contributions
 agree+0.031
ap+0.031
 promise+0.031
leaning+0.029
 advantage+0.028
Neg logit contributions
ous-0.064
ho-0.059
 Nice-0.057
 spilling-0.056
 tigers-0.055
Token bought
Token ID5839
Feature activation-4.59e-01

Pos logit contributions
 promise+0.031
ap+0.030
 agree+0.030
leaning+0.028
 So+0.027
Neg logit contributions
ous-0.041
ho-0.037
 Nice-0.036
 tigers-0.034
 Sky-0.034
Token some
Token ID617
Feature activation-7.61e-01

Pos logit contributions
ous+0.038
ho+0.034
 Nice+0.033
 tigers+0.032
 Sky+0.032
Neg logit contributions
 promise-0.030
ap-0.029
 agree-0.029
 promises-0.027
 advantage-0.026
Token new
Token ID649
Feature activation-3.53e-01

Pos logit contributions
ous+0.056
ho+0.048
 Nice+0.048
 tigers+0.048
pill+0.047
Neg logit contributions
ap-0.052
 agree-0.052
 promise-0.052
leaning-0.049
 advantage-0.048
Token r
Token ID374
Feature activation+2.38e-01

Pos logit contributions
ous+0.026
 chasing+0.024
pill+0.023
 spilling+0.023
 Nice+0.023
Neg logit contributions
ap-0.022
 agree-0.022
 promise-0.022
leaning-0.021
 advantage-0.020
Tokenags
Token ID3775
Feature activation+2.13e+00

Pos logit contributions
 promise+0.019
 agree+0.018
 advantage+0.016
ap+0.016
 promises+0.016
Neg logit contributions
ous-0.017
pill-0.014
ho-0.014
drop-0.013
balls-0.013
Token.
Token ID13
Feature activation-2.11e-01

Pos logit contributions
 promise+0.100
ap+0.095
 agree+0.095
leaning+0.088
 advantage+0.086
Neg logit contributions
ous-0.214
ho-0.194
 Nice-0.188
 spilling-0.182
 tigers-0.181
Token When
Token ID1649
Feature activation-1.59e-02

Pos logit contributions
ous+0.013
 Nice+0.011
ho+0.010
 spilling+0.010
 grizz+0.010
Neg logit contributions
 promise-0.019
 agree-0.018
ap-0.018
 advantage-0.017
 promises-0.017
Token they
Token ID484
Feature activation-1.06e-01

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 spilling+0.001
 Sky+0.001
Neg logit contributions
 promise-0.001
 agree-0.001
ap-0.001
 advantage-0.001
leaning-0.001
Token got
Token ID1392
Feature activation+2.38e-01

Pos logit contributions
ous+0.008
ho+0.007
 Nice+0.007
 tigers+0.006
 Sky+0.006
Neg logit contributions
ap-0.008
 promise-0.008
 agree-0.007
leaning-0.007
 advantage-0.007
Token back
Token ID736
Feature activation+1.65e-01

Pos logit contributions
 promise+0.015
 agree+0.015
ap+0.015
leaning+0.014
 advantage+0.014
Neg logit contributions
ous-0.018
ho-0.017
 spilling-0.016
 Nice-0.016
 sleeping-0.016
Token is
Token ID318
Feature activation+3.05e-03

Pos logit contributions
ous+0.016
ho+0.014
 Nice+0.013
 spilling+0.013
 tigers+0.012
Neg logit contributions
 promise-0.015
ap-0.014
 agree-0.014
 promises-0.013
 So-0.013
Token friendly
Token ID8030
Feature activation+2.31e-01

Pos logit contributions
 promise+0.000
 promises+0.000
 agree+0.000
ap+0.000
 advantage+0.000
Neg logit contributions
ous-0.000
ho-0.000
pill-0.000
storm-0.000
 Nice-0.000
Token.
Token ID13
Feature activation-1.40e-01

Pos logit contributions
ap+0.006
 promise+0.006
 agree+0.006
 So+0.005
 promises+0.005
Neg logit contributions
ous-0.028
ho-0.026
 spilling-0.025
 Nice-0.025
pill-0.024
Token He
Token ID679
Feature activation+1.21e-01

Pos logit contributions
ous+0.010
ho+0.009
 Nice+0.009
 tigers+0.008
 Sky+0.008
Neg logit contributions
 promise-0.011
ap-0.010
 agree-0.010
leaning-0.010
 So-0.010
Token w
Token ID266
Feature activation+1.69e-01

Pos logit contributions
 promise+0.009
 agree+0.009
ap+0.009
 promises+0.008
 So+0.008
Neg logit contributions
ous-0.009
ho-0.008
 Nice-0.008
 spilling-0.007
storm-0.007
Tokenags
Token ID3775
Feature activation+2.13e+00

Pos logit contributions
 promise+0.008
 So+0.008
 replace+0.007
�+0.007
So+0.007
Neg logit contributions
sed-0.014
ous-0.013
ses-0.013
 snakes-0.013
pill-0.013
Token his
Token ID465
Feature activation-3.38e-02

Pos logit contributions
ap+0.108
 agree+0.106
 promise+0.102
 So+0.093
 promises+0.091
Neg logit contributions
ous-0.198
pill-0.195
ho-0.177
chy-0.174
 Nice-0.174
Token tail
Token ID7894
Feature activation-3.15e-01

Pos logit contributions
ous+0.002
ho+0.002
 Nice+0.002
 Sky+0.002
sed+0.002
Neg logit contributions
 agree-0.003
 promise-0.003
ap-0.003
leaning-0.003
 advantage-0.002
Token and
Token ID290
Feature activation-5.61e-02

Pos logit contributions
ous+0.037
ho+0.035
 Nice+0.034
 tigers+0.033
 Sky+0.033
Neg logit contributions
 promise-0.009
ap-0.009
 agree-0.008
leaning-0.007
 promises-0.007
Token l
Token ID300
Feature activation-5.16e-01

Pos logit contributions
ous+0.005
ho+0.004
 spilling+0.004
 Nice+0.004
storm+0.003
Neg logit contributions
 promise-0.004
 promises-0.004
 agree-0.004
 So-0.004
ap-0.004
Tokenicks
Token ID3378
Feature activation-2.25e-01

Pos logit contributions
ous+0.033
ho+0.029
 Nice+0.028
 Sky+0.027
 tigers+0.027
Neg logit contributions
 promise-0.042
ap-0.041
 agree-0.041
leaning-0.039
 So-0.038
Token man
Token ID582
Feature activation-4.70e-01

Pos logit contributions
 promise+0.018
 agree+0.017
ap+0.016
 promises+0.015
 advantage+0.015
Neg logit contributions
ous-0.022
ho-0.020
 Nice-0.019
 Sky-0.019
 spilling-0.018
Token stepped
Token ID10764
Feature activation-3.52e-01

Pos logit contributions
ous+0.032
ho+0.030
 Nice+0.027
 grizz+0.026
 tigers+0.026
Neg logit contributions
ap-0.035
 promise-0.035
leaning-0.034
 agree-0.034
 advantage-0.033
Token closer
Token ID5699
Feature activation-1.54e-02

Pos logit contributions
ous+0.029
ho+0.027
 Nice+0.026
 spilling+0.025
 grizz+0.025
Neg logit contributions
 promise-0.022
 agree-0.021
ap-0.021
leaning-0.020
 promises-0.019
Token and
Token ID290
Feature activation+1.59e-02

Pos logit contributions
ous+0.002
ho+0.002
 Nice+0.002
 tigers+0.002
pill+0.002
Neg logit contributions
 promise-0.000
 agree-0.000
ap-0.000
 So-0.000
 promises-0.000
Token said
Token ID531
Feature activation-5.67e-01

Pos logit contributions
 agree+0.001
 promise+0.001
ap+0.001
 So+0.001
 promises+0.001
Neg logit contributions
ous-0.001
 Nice-0.001
ho-0.001
 tigers-0.001
pill-0.001
Token gru
Token ID22848
Feature activation-2.48e+00

Pos logit contributions
ous+0.067
pill+0.062
 tigers+0.062
 spilling+0.061
ho+0.061
Neg logit contributions
ap-0.015
 promise-0.013
 agree-0.012
 So-0.012
 advantage-0.011
Tokenff
Token ID487
Feature activation-3.08e-01

Pos logit contributions
ous+0.197
ho+0.179
 Nice+0.174
 grizz+0.162
 tigers+0.162
Neg logit contributions
 promise-0.164
 agree-0.158
ap-0.158
leaning-0.149
 promises-0.145
Tokenly
Token ID306
Feature activation-1.20e+00

Pos logit contributions
ous+0.023
pill+0.023
 Nice+0.023
 tigers+0.023
 spilling+0.023
Neg logit contributions
 promise-0.018
ap-0.018
 agree-0.017
 advantage-0.016
 So-0.016
Token,
Token ID11
Feature activation-2.97e-01

Pos logit contributions
ous+0.135
ho+0.126
 Nice+0.125
pill+0.123
 tigers+0.122
Neg logit contributions
ap-0.036
 promise-0.035
 agree-0.034
 So-0.031
 advantage-0.028
Token "
Token ID366
Feature activation-2.16e-01

Pos logit contributions
ous+0.018
ho+0.016
 Nice+0.015
pill+0.015
igator+0.015
Neg logit contributions
 agree-0.024
 promise-0.023
ap-0.023
 So-0.023
 advantage-0.021
TokenYou
Token ID1639
Feature activation-1.05e-01

Pos logit contributions
ous+0.014
ho+0.013
 Nice+0.012
pill+0.012
 tigers+0.011
Neg logit contributions
 promise-0.018
ap-0.018
 agree-0.017
 promises-0.016
 advantage-0.016
Token,
Token ID11
Feature activation+9.79e-02

Pos logit contributions
 promise+0.009
ap+0.009
 agree+0.008
leaning+0.008
 advantage+0.007
Neg logit contributions
ous-0.026
ho-0.024
 Nice-0.024
 tigers-0.023
 Sky-0.023
Token and
Token ID290
Feature activation+9.30e-02

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
 promises+0.002
leaning+0.002
Neg logit contributions
ous-0.012
ho-0.011
 Nice-0.010
 grizz-0.010
igator-0.010
Token even
Token ID772
Feature activation-1.83e-01

Pos logit contributions
 agree+0.005
 promise+0.005
 care+0.005
 promises+0.004
 learn+0.004
Neg logit contributions
ous-0.009
ho-0.008
sed-0.008
 grizz-0.008
 Nice-0.008
Token went
Token ID1816
Feature activation+2.09e-01

Pos logit contributions
ous+0.019
ho+0.018
sed+0.017
 grizz+0.017
 Nice+0.016
Neg logit contributions
 care-0.008
 promises-0.008
 practice-0.008
ap-0.008
 promise-0.008
Token to
Token ID284
Feature activation-6.47e-01

Pos logit contributions
ap+0.014
 promise+0.014
 agree+0.013
 advantage+0.012
 promises+0.012
Neg logit contributions
ous-0.017
ho-0.015
 Nice-0.015
 tigers-0.014
 Sky-0.014
Token outer
Token ID12076
Feature activation-2.34e+00

Pos logit contributions
ous+0.064
ho+0.061
 Nice+0.057
pill+0.055
igator+0.055
Neg logit contributions
 promise-0.033
 agree-0.032
ap-0.032
 learn-0.028
 promises-0.028
Token space
Token ID2272
Feature activation-3.16e-01

Pos logit contributions
ous+0.151
 Sky+0.144
 unic+0.139
 gor+0.137
 Nice+0.137
Neg logit contributions
 promise-0.177
 agree-0.171
ap-0.168
our-0.165
leaning-0.164
Token!
Token ID0
Feature activation+1.31e-02

Pos logit contributions
ous+0.032
ho+0.029
 Nice+0.028
 tigers+0.027
 spilling+0.027
Neg logit contributions
 promise-0.015
ap-0.015
 agree-0.014
 promises-0.012
 advantage-0.012
Token It
Token ID632
Feature activation+3.13e-01

Pos logit contributions
 promise+0.001
 agree+0.001
ap+0.001
 So+0.001
leaning+0.001
Neg logit contributions
ous-0.001
ho-0.001
 guarding-0.001
 Nice-0.001
 tigers-0.001
Token was
Token ID373
Feature activation-1.55e-01

Pos logit contributions
 promise+0.017
 promises+0.015
ap+0.015
 agree+0.015
 So+0.014
Neg logit contributions
ous-0.029
ho-0.028
 Nice-0.027
sed-0.026
 gor-0.026
Token the
Token ID262
Feature activation-3.77e-02

Pos logit contributions
ous+0.016
ho+0.016
pill+0.014
 Nice+0.014
balls+0.014
Neg logit contributions
 promise-0.007
ap-0.006
 advantage-0.006
 promises-0.006
 agree-0.006
Token,
Token ID11
Feature activation-3.06e-01

Pos logit contributions
 promise+0.002
ap+0.002
 agree+0.002
 promises+0.002
 advantage+0.002
Neg logit contributions
ous-0.005
ho-0.005
 Nice-0.005
 tigers-0.005
 Sky-0.005
Token so
Token ID523
Feature activation-2.80e-01

Pos logit contributions
ous+0.029
ho+0.026
 Nice+0.026
 Sky+0.025
 tigers+0.025
Neg logit contributions
 promise-0.016
ap-0.015
 agree-0.015
 So-0.014
leaning-0.014
Token Tim
Token ID5045
Feature activation+6.42e-02

Pos logit contributions
ous+0.020
ho+0.018
pill+0.017
 Nice+0.017
 gor+0.016
Neg logit contributions
 agree-0.021
 promise-0.021
ap-0.021
 So-0.020
 advantage-0.019
Token got
Token ID1392
Feature activation+1.67e-01

Pos logit contributions
ap+0.004
 promise+0.004
leaning+0.004
 agree+0.004
 advantage+0.004
Neg logit contributions
ous-0.005
ho-0.005
chy-0.004
 Nice-0.004
 guarding-0.004
Token mad
Token ID8805
Feature activation+3.99e-01

Pos logit contributions
 promise+0.012
ap+0.011
 agree+0.011
 promises+0.010
 advantage+0.010
Neg logit contributions
ous-0.013
ho-0.012
 Nice-0.011
 tigers-0.011
 Sky-0.011
Tokender
Token ID1082
Feature activation-2.32e+00

Pos logit contributions
 promise+0.015
ap+0.015
 agree+0.014
leaning+0.012
 promises+0.012
Neg logit contributions
ous-0.043
ho-0.040
 Nice-0.039
 tigers-0.038
 Sky-0.038
Token.
Token ID13
Feature activation-2.17e-01

Pos logit contributions
ous+0.269
ho+0.251
 Nice+0.250
 Sky+0.244
 tigers+0.240
Neg logit contributions
 promise-0.068
 agree-0.068
ap-0.063
 So-0.053
 promises-0.052
Token He
Token ID679
Feature activation+6.01e-03

Pos logit contributions
ous+0.017
ho+0.015
 Nice+0.014
 tigers+0.014
 guarding+0.014
Neg logit contributions
ap-0.015
 promise-0.015
 agree-0.015
 So-0.014
leaning-0.013
Token pushed
Token ID7121
Feature activation+6.69e-02

Pos logit contributions
ap+0.000
 promise+0.000
 agree+0.000
leaning+0.000
 advantage+0.000
Neg logit contributions
ous-0.000
ho-0.000
 Nice-0.000
 tigers-0.000
 guarding-0.000
Token Jack
Token ID3619
Feature activation-3.71e-01

Pos logit contributions
 promise+0.005
ap+0.005
 agree+0.005
leaning+0.005
our+0.004
Neg logit contributions
ous-0.005
ho-0.004
 Nice-0.004
 tigers-0.004
 Sky-0.004
Token away
Token ID1497
Feature activation+7.12e-01

Pos logit contributions
ous+0.034
ho+0.032
 Nice+0.031
 tigers+0.029
 Sky+0.029
Neg logit contributions
 promise-0.020
ap-0.019
 agree-0.019
leaning-0.017
 promises-0.017
Token\n
Token ID198
Feature activation-1.18e-01

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
leaning+0.002
 advantage+0.002
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
 tigers-0.005
 Sky-0.005
TokenJen
Token ID44875
Feature activation-3.23e-02

Pos logit contributions
ous+0.008
ho+0.008
 Nice+0.007
 tigers+0.007
 Sky+0.007
Neg logit contributions
 promise-0.009
ap-0.009
 agree-0.008
 advantage-0.008
 promises-0.008
Token feels
Token ID5300
Feature activation-3.32e-01

Pos logit contributions
ous+0.002
ho+0.002
 Sky+0.002
chy+0.002
 tigers+0.002
Neg logit contributions
ap-0.002
 advantage-0.002
 promise-0.002
ute-0.002
leaning-0.002
Token even
Token ID772
Feature activation-6.19e-01

Pos logit contributions
ous+0.022
ho+0.020
 Nice+0.020
chy+0.019
 spilling+0.019
Neg logit contributions
ap-0.025
 promise-0.024
 agree-0.024
leaning-0.023
our-0.023
Token sad
Token ID6507
Feature activation+1.40e-01

Pos logit contributions
ous+0.067
ho+0.062
 Nice+0.059
 tigers+0.057
 Sky+0.056
Neg logit contributions
 promise-0.028
 agree-0.026
ap-0.025
 promises-0.024
 So-0.022
Tokender
Token ID1082
Feature activation-2.28e+00

Pos logit contributions
 promise+0.006
ap+0.005
 agree+0.005
leaning+0.005
 advantage+0.005
Neg logit contributions
ous-0.015
ho-0.014
 Nice-0.013
 tigers-0.013
 guarding-0.013
Token.
Token ID13
Feature activation-3.04e-01

Pos logit contributions
ous+0.261
ho+0.243
 Nice+0.240
 Sky+0.232
 tigers+0.232
Neg logit contributions
 promise-0.071
 agree-0.070
ap-0.069
 promises-0.056
 So-0.056
Token She
Token ID1375
Feature activation+1.21e-01

Pos logit contributions
ous+0.024
ho+0.021
igator+0.020
 tigers+0.020
 guarding+0.020
Neg logit contributions
ap-0.021
 agree-0.020
 So-0.020
 promise-0.020
leaning-0.019
Token thinks
Token ID6834
Feature activation+3.42e-02

Pos logit contributions
ap+0.010
 promise+0.009
 agree+0.009
 advantage+0.009
leaning+0.009
Neg logit contributions
ous-0.008
ho-0.007
chy-0.007
 grizz-0.006
sed-0.006
Token Tim
Token ID5045
Feature activation+7.55e-02

Pos logit contributions
ap+0.002
 agree+0.002
 promise+0.002
 promises+0.002
 So+0.002
Neg logit contributions
ous-0.003
ho-0.003
 Nice-0.003
 grizz-0.003
pill-0.003
Token is
Token ID318
Feature activation+3.41e-03

Pos logit contributions
ap+0.006
 agree+0.005
 promise+0.005
 advantage+0.005
leaning+0.005
Neg logit contributions
ous-0.005
ho-0.005
chy-0.004
 guarding-0.004
 Nice-0.004
Token thanked
Token ID26280
Feature activation-2.51e-01

Pos logit contributions
ous+0.075
ho+0.067
 Nice+0.067
 Sky+0.063
 tigers+0.063
Neg logit contributions
 promise-0.057
 agree-0.055
ap-0.055
 promises-0.050
 So-0.049
Token Clim
Token ID28573
Feature activation+5.25e-02

Pos logit contributions
ous+0.018
ho+0.016
 Nice+0.016
 tigers+0.016
 Sky+0.016
Neg logit contributions
 promise-0.018
ap-0.018
 agree-0.018
leaning-0.017
 So-0.016
Tokenber
Token ID527
Feature activation-8.28e-01

Pos logit contributions
 promise+0.004
ap+0.004
 agree+0.004
leaning+0.004
 So+0.004
Neg logit contributions
ous-0.004
ho-0.003
 Nice-0.003
 tigers-0.003
 guarding-0.003
Token the
Token ID262
Feature activation-1.19e-01

Pos logit contributions
ous+0.077
ho+0.071
 Nice+0.069
 tigers+0.067
 Sky+0.067
Neg logit contributions
 promise-0.043
ap-0.042
 agree-0.041
leaning-0.040
 advantage-0.037
Token Lad
Token ID12862
Feature activation+3.07e-01

Pos logit contributions
ous+0.010
ho+0.009
 Nice+0.009
 tigers+0.008
 Sky+0.008
Neg logit contributions
 promise-0.007
ap-0.007
 agree-0.007
 promises-0.006
 advantage-0.006
Tokender
Token ID1082
Feature activation-2.28e+00

Pos logit contributions
 promise+0.018
ap+0.018
 agree+0.018
 advantage+0.016
 promises+0.016
Neg logit contributions
ous-0.026
ho-0.025
 Nice-0.024
 spilling-0.023
 tigers-0.023
Token for
Token ID329
Feature activation-6.02e-01

Pos logit contributions
ous+0.217
ho+0.198
 Nice+0.195
 Sky+0.188
 tigers+0.187
Neg logit contributions
 promise-0.117
ap-0.114
 agree-0.113
 promises-0.100
 advantage-0.100
Token its
Token ID663
Feature activation-4.52e-01

Pos logit contributions
ous+0.042
ho+0.037
 Nice+0.036
pill+0.034
 Sky+0.033
Neg logit contributions
 promise-0.047
ap-0.046
 agree-0.046
 advantage-0.042
 promises-0.042
Token help
Token ID1037
Feature activation+5.84e-02

Pos logit contributions
ous+0.044
ho+0.040
 Nice+0.039
 tigers+0.038
 Sky+0.038
Neg logit contributions
ap-0.022
 promise-0.021
 agree-0.021
leaning-0.020
 So-0.019
Token.
Token ID13
Feature activation-9.96e-02

Pos logit contributions
 promise+0.002
ap+0.002
 agree+0.002
leaning+0.002
 promises+0.002
Neg logit contributions
ous-0.006
ho-0.006
 Nice-0.005
 tigers-0.005
 guarding-0.005
Token And
Token ID843
Feature activation+6.62e-02

Pos logit contributions
ous+0.007
 Nice+0.007
ho+0.006
pill+0.006
 Sky+0.006
Neg logit contributions
 promise-0.008
ap-0.007
 agree-0.007
leaning-0.007
our-0.007
Token"
Token ID1
Feature activation+7.58e-01

Pos logit contributions
 promise+0.008
 advantage+0.008
 promises+0.007
ap+0.007
 agree+0.007
Neg logit contributions
ous-0.010
 Sky-0.010
ho-0.010
 Nice-0.009
pill-0.009
TokenFollow
Token ID7155
Feature activation-8.13e-01

Pos logit contributions
 promise+0.053
ap+0.052
 agree+0.052
leaning+0.048
 promises+0.048
Neg logit contributions
ous-0.058
ho-0.052
 Nice-0.050
 tigers-0.048
 Sky-0.048
Token me
Token ID502
Feature activation+5.91e-01

Pos logit contributions
ous+0.052
 Sky+0.051
ho+0.048
 tigers+0.047
igator+0.046
Neg logit contributions
ap-0.059
 promise-0.058
 agree-0.057
leaning-0.057
our-0.055
Token,"
Token ID553
Feature activation+1.25e-03

Pos logit contributions
leaning+0.038
 promise+0.032
ap+0.032
 agree+0.032
 promises+0.032
Neg logit contributions
ous-0.049
 Sky-0.044
ho-0.043
 guarding-0.042
 Nice-0.041
Token the
Token ID262
Feature activation-5.38e-01

Pos logit contributions
 promise+0.000
 agree+0.000
ap+0.000
 So+0.000
 promises+0.000
Neg logit contributions
ous-0.000
ho-0.000
 Nice-0.000
 guarding-0.000
 spilling-0.000
Token sun
Token ID4252
Feature activation-2.28e+00

Pos logit contributions
ous+0.052
pill+0.046
 Nice+0.046
ho+0.046
 spilling+0.045
Neg logit contributions
 promise-0.028
ap-0.026
 agree-0.026
 So-0.024
 promises-0.023
Tokenflower
Token ID25547
Feature activation-9.96e-01

Pos logit contributions
ous+0.146
 Nice+0.130
pill+0.126
 grizz+0.125
 guarding+0.124
Neg logit contributions
ap-0.199
 promise-0.192
 agree-0.180
 promises-0.171
our-0.169
Token says
Token ID1139
Feature activation-2.66e-01

Pos logit contributions
ous+0.085
ho+0.076
 Nice+0.074
 spilling+0.071
 tigers+0.071
Neg logit contributions
 promise-0.062
ap-0.060
 agree-0.060
 promises-0.054
 So-0.054
Token.
Token ID13
Feature activation-3.46e-01

Pos logit contributions
ous+0.029
ho+0.027
 Nice+0.027
 tigers+0.026
pill+0.026
Neg logit contributions
 promise-0.010
ap-0.010
 agree-0.009
 promises-0.008
 So-0.008
Token He
Token ID679
Feature activation-1.09e-01

Pos logit contributions
ous+0.023
ho+0.021
igator+0.019
 Nice+0.019
 tigers+0.019
Neg logit contributions
ap-0.027
 agree-0.027
 promise-0.026
 So-0.026
leaning-0.024
Token bends
Token ID44370
Feature activation+4.07e-02

Pos logit contributions
ous+0.007
ho+0.007
 Nice+0.006
 tigers+0.006
 Sky+0.006
Neg logit contributions
 promise-0.008
ap-0.008
 agree-0.008
leaning-0.008
 advantage-0.008
Token stay
Token ID2652
Feature activation-2.10e-01

Pos logit contributions
ous+0.028
chy+0.025
ho+0.025
igator+0.025
 Sky+0.024
Neg logit contributions
leaning-0.025
 promises-0.020
 So-0.020
ap-0.019
 advantage-0.019
Token here
Token ID994
Feature activation-2.21e-01

Pos logit contributions
 sleeping+0.013
 chasing+0.012
 guarding+0.012
 hiding+0.011
 eating+0.011
Neg logit contributions
leaning-0.017
 So-0.016
So-0.015
 agree-0.015
erve-0.014
Token and
Token ID290
Feature activation-4.84e-01

Pos logit contributions
ous+0.021
ho+0.020
 Nice+0.019
 Sky+0.018
 guarding+0.018
Neg logit contributions
leaning-0.010
 promise-0.010
ap-0.010
 agree-0.010
 advantage-0.009
Token enjoy
Token ID2883
Feature activation-3.82e-01

Pos logit contributions
ous+0.049
ho+0.045
 Nice+0.044
 Sky+0.042
 tigers+0.042
Neg logit contributions
 promise-0.022
ap-0.021
 agree-0.021
leaning-0.019
 advantage-0.018
Token the
Token ID262
Feature activation-8.13e-01

Pos logit contributions
ous+0.036
ho+0.033
 Nice+0.032
 grizz+0.030
pill+0.030
Neg logit contributions
 promise-0.020
ap-0.020
 agree-0.020
 promises-0.018
 advantage-0.018
Token sun
Token ID4252
Feature activation-2.27e+00

Pos logit contributions
ous+0.065
ho+0.056
 Nice+0.055
 tigers+0.053
 grizz+0.052
Neg logit contributions
 promise-0.058
ap-0.056
 agree-0.054
 promises-0.051
 advantage-0.050
Token.
Token ID13
Feature activation-5.34e-02

Pos logit contributions
ous+0.220
ho+0.202
 Nice+0.200
 grizz+0.199
 spilling+0.194
Neg logit contributions
ap-0.113
 promise-0.111
 agree-0.104
 promises-0.094
our-0.091
Token But
Token ID887
Feature activation+5.79e-01

Pos logit contributions
ous+0.004
 Nice+0.004
 Sky+0.004
ho+0.004
 tigers+0.004
Neg logit contributions
 promise-0.004
 agree-0.003
ap-0.003
leaning-0.003
 advantage-0.003
Token can
Token ID460
Feature activation+7.24e-01

Pos logit contributions
 promise+0.034
ap+0.033
 agree+0.032
 promises+0.030
 advantage+0.028
Neg logit contributions
ous-0.052
ho-0.047
 Nice-0.046
pill-0.046
 tigers-0.045
Token I
Token ID314
Feature activation-4.25e-01

Pos logit contributions
 promise+0.051
ap+0.050
 agree+0.049
leaning+0.046
 promises+0.046
Neg logit contributions
ous-0.055
ho-0.049
 Nice-0.048
 tigers-0.045
 guarding-0.045
Token visit
Token ID3187
Feature activation+3.11e-01

Pos logit contributions
ho+0.032
ous+0.031
onna+0.030
chy+0.030
igator+0.028
Neg logit contributions
leaning-0.028
ised-0.026
ags-0.025
ap-0.024
ippers-0.024
Token Lily
Token ID20037
Feature activation-2.81e-01

Pos logit contributions
ous+0.009
 Nice+0.007
ho+0.007
 Sky+0.007
 tigers+0.007
Neg logit contributions
ap-0.019
 promise-0.019
 agree-0.018
leaning-0.018
 advantage-0.017
Token said
Token ID531
Feature activation-3.62e-01

Pos logit contributions
ous+0.020
ho+0.018
 tigers+0.017
 chasing+0.017
 Sky+0.017
Neg logit contributions
 promise-0.020
ap-0.020
leaning-0.020
 agree-0.019
 advantage-0.019
Token,
Token ID11
Feature activation-4.63e-01

Pos logit contributions
ous+0.038
ho+0.035
 Nice+0.034
 Sky+0.033
 spilling+0.033
Neg logit contributions
 promise-0.015
ap-0.014
 agree-0.014
leaning-0.013
 promises-0.012
Token making
Token ID1642
Feature activation-7.16e-01

Pos logit contributions
ous+0.030
pill+0.027
 Nice+0.026
ho+0.025
igator+0.023
Neg logit contributions
 agree-0.038
ap-0.037
 promise-0.036
 So-0.034
 promises-0.034
Token her
Token ID607
Feature activation-9.00e-01

Pos logit contributions
ous+0.066
ho+0.062
 Nice+0.059
 guarding+0.055
igator+0.054
Neg logit contributions
 agree-0.041
 promise-0.041
ap-0.041
 promises-0.040
 advantage-0.038
Token gir
Token ID37370
Feature activation-2.25e+00

Pos logit contributions
ous+0.065
 spilling+0.055
 Nice+0.054
ho+0.054
pill+0.054
Neg logit contributions
 promise-0.074
 agree-0.069
ap-0.067
 advantage-0.065
 own-0.065
Tokenaffe
Token ID21223
Feature activation-1.31e+00

Pos logit contributions
ous+0.161
ho+0.140
 Nice+0.130
 tigers+0.129
 Sky+0.127
Neg logit contributions
 promise-0.174
ap-0.172
 agree-0.169
leaning-0.165
 promises-0.160
Token kick
Token ID4829
Feature activation+2.59e-01

Pos logit contributions
ous+0.114
ho+0.103
 Nice+0.100
 tigers+0.095
 guarding+0.094
Neg logit contributions
 promise-0.081
ap-0.078
 agree-0.077
 promises-0.071
 advantage-0.070
Token the
Token ID262
Feature activation-6.10e-01

Pos logit contributions
 promise+0.014
ap+0.014
 agree+0.014
 advantage+0.013
 promises+0.013
Neg logit contributions
ous-0.025
ho-0.021
 Nice-0.021
 guarding-0.020
 grizz-0.020
Token lion
Token ID18744
Feature activation-1.00e+00

Pos logit contributions
ous+0.047
 Nice+0.039
lic+0.037
onna+0.035
 spilling+0.034
Neg logit contributions
 promise-0.048
 own-0.044
 advantage-0.043
 agree-0.041
ap-0.041
Token away
Token ID1497
Feature activation+8.30e-01

Pos logit contributions
ous+0.104
ho+0.094
 Nice+0.094
pill+0.090
 tigers+0.090
Neg logit contributions
 promise-0.044
ap-0.043
 agree-0.042
 advantage-0.036
 promises-0.036
Token Lily
Token ID20037
Feature activation-3.15e-01

Pos logit contributions
ous+0.008
 Nice+0.006
 Sky+0.006
ho+0.006
 tigers+0.006
Neg logit contributions
ap-0.016
 promise-0.016
 agree-0.015
leaning-0.015
 advantage-0.015
Token said
Token ID531
Feature activation-3.71e-01

Pos logit contributions
ous+0.021
ho+0.019
 chasing+0.018
 tigers+0.018
 Sky+0.017
Neg logit contributions
ap-0.024
 promise-0.024
leaning-0.023
ags-0.023
 advantage-0.023
Token,
Token ID11
Feature activation-4.60e-01

Pos logit contributions
ous+0.039
ho+0.036
 Nice+0.035
 Sky+0.034
 spilling+0.034
Neg logit contributions
 promise-0.015
ap-0.014
 agree-0.014
leaning-0.014
 advantage-0.012
Token making
Token ID1642
Feature activation-6.57e-01

Pos logit contributions
ous+0.029
 Nice+0.026
pill+0.026
ho+0.025
igator+0.022
Neg logit contributions
 agree-0.038
ap-0.038
 promise-0.037
 So-0.034
 promises-0.034
Token her
Token ID607
Feature activation-8.65e-01

Pos logit contributions
ous+0.064
ho+0.060
 Nice+0.058
pill+0.053
storm+0.053
Neg logit contributions
 promise-0.035
 agree-0.035
 promises-0.034
ap-0.034
 advantage-0.033
Token gir
Token ID37370
Feature activation-2.25e+00

Pos logit contributions
ous+0.062
 spilling+0.053
ho+0.052
 Nice+0.051
pill+0.051
Neg logit contributions
 promise-0.072
 agree-0.067
ap-0.065
 advantage-0.064
 promises-0.064
Tokenaffe
Token ID21223
Feature activation-1.30e+00

Pos logit contributions
ous+0.162
ho+0.142
 Nice+0.133
 tigers+0.132
 Sky+0.131
Neg logit contributions
 promise-0.171
ap-0.169
 agree-0.166
leaning-0.161
 promises-0.156
Token dodge
Token ID24885
Feature activation+2.15e-01

Pos logit contributions
ous+0.115
ho+0.104
 Nice+0.102
pill+0.096
 tigers+0.096
Neg logit contributions
 promise-0.080
ap-0.077
 agree-0.076
 promises-0.070
 So-0.069
Token the
Token ID262
Feature activation-2.41e-01

Pos logit contributions
ap+0.011
 promise+0.010
 agree+0.010
 advantage+0.010
 promises+0.009
Neg logit contributions
ous-0.021
 Nice-0.019
ho-0.019
 grizz-0.018
pill-0.018
Token lion
Token ID18744
Feature activation-7.54e-01

Pos logit contributions
ous+0.019
 Nice+0.016
lic+0.014
 spilling+0.014
ho+0.014
Neg logit contributions
 promise-0.019
 agree-0.017
ap-0.017
 advantage-0.017
 own-0.017
Token.
Token ID13
Feature activation-1.23e-01

Pos logit contributions
ous+0.080
 Nice+0.073
ho+0.073
pill+0.070
 grizz+0.069
Neg logit contributions
 promise-0.032
ap-0.030
 agree-0.030
 promises-0.026
 So-0.025
Token even
Token ID772
Feature activation-5.59e-01

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
leaning+0.003
 promises+0.003
Neg logit contributions
ous-0.004
ho-0.004
 Nice-0.004
 tigers-0.003
 Sky-0.003
Token happier
Token ID23030
Feature activation+2.84e-01

Pos logit contributions
ous+0.051
ho+0.046
 Nice+0.045
 tigers+0.043
 Sky+0.043
Neg logit contributions
 promise-0.033
 agree-0.031
ap-0.031
 promises-0.029
 advantage-0.028
Token and
Token ID290
Feature activation+1.11e-01

Pos logit contributions
 promise+0.012
ap+0.011
 agree+0.011
leaning+0.010
 advantage+0.010
Neg logit contributions
ous-0.030
ho-0.028
 Nice-0.027
 tigers-0.026
 guarding-0.026
Token p
Token ID279
Feature activation-2.63e-01

Pos logit contributions
 promise+0.007
 agree+0.006
 promises+0.006
ap+0.006
 So+0.005
Neg logit contributions
ous-0.011
ho-0.010
pill-0.009
 tigers-0.009
 Nice-0.009
Tokenrou
Token ID472
Feature activation-1.01e+00

Pos logit contributions
ous+0.017
ho+0.015
balls+0.013
 Nice+0.013
 guarding+0.012
Neg logit contributions
 promise-0.022
 agree-0.021
leaning-0.021
ap-0.021
 advantage-0.020
Tokender
Token ID1082
Feature activation-2.25e+00

Pos logit contributions
ous+0.059
ho+0.051
 Nice+0.050
 tigers+0.046
 Sky+0.046
Neg logit contributions
ap-0.088
 promise-0.087
 agree-0.087
leaning-0.081
 advantage-0.081
Token.
Token ID13
Feature activation-1.31e-01

Pos logit contributions
ous+0.253
 Nice+0.240
ho+0.236
 Sky+0.232
 grizz+0.228
Neg logit contributions
 agree-0.073
 promise-0.071
ap-0.069
 promises-0.057
 So-0.055
Token He
Token ID679
Feature activation+1.85e-01

Pos logit contributions
ous+0.013
ho+0.012
 spilling+0.011
 guarding+0.011
igator+0.011
Neg logit contributions
 So-0.007
 agree-0.007
ap-0.007
 promise-0.006
 promises-0.006
Token displays
Token ID11298
Feature activation+3.88e-03

Pos logit contributions
ap+0.014
leaning+0.014
 promise+0.013
 agree+0.013
 advantage+0.013
Neg logit contributions
ous-0.012
ho-0.011
 tigers-0.010
 grizz-0.010
 Nice-0.010
Token his
Token ID465
Feature activation+3.89e-01

Pos logit contributions
ap+0.000
 promise+0.000
 agree+0.000
 promises+0.000
 advantage+0.000
Neg logit contributions
ous-0.000
pill-0.000
ho-0.000
 grizz-0.000
storm-0.000
Token sail
Token ID14936
Feature activation+4.80e-01

Pos logit contributions
ap+0.022
 agree+0.022
leaning+0.022
 promise+0.021
 So+0.021
Neg logit contributions
ous-0.033
ho-0.030
 tigers-0.029
 Nice-0.029
 chasing-0.028
Tokenamed
Token ID2434
Feature activation+7.31e-03

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
leaning+0.003
 promises+0.003
Neg logit contributions
ous-0.005
ho-0.005
 Nice-0.005
 tigers-0.004
pill-0.004
Token down
Token ID866
Feature activation+4.18e-01

Pos logit contributions
 promise+0.000
 agree+0.000
ap+0.000
leaning+0.000
 advantage+0.000
Neg logit contributions
ous-0.001
ho-0.001
 Nice-0.001
 guarding-0.001
 Sky-0.001
Token,
Token ID11
Feature activation-1.37e-02

Pos logit contributions
 promise+0.019
ap+0.018
 agree+0.018
leaning+0.016
 promises+0.016
Neg logit contributions
ous-0.042
ho-0.039
 Nice-0.038
 tigers-0.037
 Sky-0.037
Token making
Token ID1642
Feature activation-6.43e-01

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 tigers+0.001
 Sky+0.001
Neg logit contributions
 promise-0.001
ap-0.001
 agree-0.001
leaning-0.001
 promises-0.001
Token the
Token ID262
Feature activation-7.40e-01

Pos logit contributions
ous+0.055
ho+0.050
 Nice+0.048
 tigers+0.046
 guarding+0.046
Neg logit contributions
 promise-0.039
ap-0.039
 agree-0.038
 promises-0.035
 advantage-0.035
Token sun
Token ID4252
Feature activation-2.24e+00

Pos logit contributions
ous+0.061
ho+0.054
 Nice+0.053
 tigers+0.050
pill+0.050
Neg logit contributions
 promise-0.049
ap-0.048
 agree-0.047
 promises-0.043
 advantage-0.043
Tokenflower
Token ID25547
Feature activation-1.04e+00

Pos logit contributions
ous+0.174
ho+0.157
 Nice+0.155
 tigers+0.148
pill+0.148
Neg logit contributions
 promise-0.153
ap-0.151
 agree-0.147
 promises-0.136
 So-0.133
Token's
Token ID338
Feature activation+3.84e-01

Pos logit contributions
ous+0.091
ho+0.081
 Nice+0.078
 Sky+0.076
chy+0.076
Neg logit contributions
ap-0.060
 promise-0.059
 agree-0.059
leaning-0.058
 advantage-0.054
Token pet
Token ID4273
Feature activation+1.47e-01

Pos logit contributions
 promise+0.022
ap+0.021
 agree+0.021
leaning+0.019
 promises+0.019
Neg logit contributions
ous-0.034
ho-0.031
 Nice-0.030
 tigers-0.029
 Sky-0.029
Tokenals
Token ID874
Feature activation-1.04e+00

Pos logit contributions
 promise+0.008
ap+0.008
 agree+0.008
 So+0.007
 promises+0.007
Neg logit contributions
ous-0.013
ho-0.012
 Nice-0.012
 spilling-0.011
 tigers-0.011
Token fl
Token ID781
Feature activation-5.69e-02

Pos logit contributions
ous+0.088
ho+0.079
 Nice+0.076
 Sky+0.073
 spilling+0.073
Neg logit contributions
 promise-0.064
ap-0.063
 agree-0.062
leaning-0.060
 advantage-0.057
Token Thank
Token ID6952
Feature activation-9.45e-01

Pos logit contributions
ous+0.007
ho+0.007
 spilling+0.006
 guarding+0.006
chy+0.006
Neg logit contributions
ap-0.008
 promise-0.008
 agree-0.007
 So-0.007
leaning-0.007
Token you
Token ID345
Feature activation-1.60e-01

Pos logit contributions
ous+0.083
ho+0.076
 Nice+0.074
 Sky+0.072
 tigers+0.071
Neg logit contributions
 promise-0.056
 agree-0.053
ap-0.053
leaning-0.049
 advantage-0.048
Token for
Token ID329
Feature activation-5.14e-01

Pos logit contributions
ous+0.014
ho+0.013
 spilling+0.013
 Nice+0.013
pill+0.013
Neg logit contributions
 promise-0.009
ap-0.009
 agree-0.009
 So-0.008
 promises-0.008
Token making
Token ID1642
Feature activation-7.11e-01

Pos logit contributions
ous+0.040
ho+0.036
 Nice+0.034
pill+0.033
chy+0.032
Neg logit contributions
 promise-0.038
ap-0.038
 agree-0.037
 promises-0.035
 learn-0.034
Token the
Token ID262
Feature activation-8.59e-01

Pos logit contributions
ous+0.074
ho+0.069
 Nice+0.065
storm+0.063
 spilling+0.063
Neg logit contributions
ap-0.033
 promise-0.032
 agree-0.032
 promises-0.030
 advantage-0.028
Token sun
Token ID4252
Feature activation-2.23e+00

Pos logit contributions
ous+0.073
ho+0.066
 Nice+0.064
 tigers+0.062
 Sky+0.062
Neg logit contributions
 promise-0.052
ap-0.051
 agree-0.050
leaning-0.047
 So-0.046
Token and
Token ID290
Feature activation-6.64e-03

Pos logit contributions
ous+0.223
ho+0.208
 Nice+0.204
 grizz+0.201
 spilling+0.201
Neg logit contributions
ap-0.104
 promise-0.104
 agree-0.097
 promises-0.086
 So-0.082
Token the
Token ID262
Feature activation-5.62e-01

Pos logit contributions
ous+0.001
ho+0.001
lic+0.001
chy+0.001
 Nice+0.001
Neg logit contributions
 promises-0.000
 promise-0.000
 agree-0.000
ap-0.000
 learn-0.000
Token stars
Token ID5788
Feature activation-1.09e+00

Pos logit contributions
ous+0.048
ho+0.042
 Nice+0.041
 spilling+0.039
 guarding+0.039
Neg logit contributions
 promise-0.036
ap-0.035
 agree-0.035
 promises-0.032
 advantage-0.031
Token and
Token ID290
Feature activation+1.45e-01

Pos logit contributions
ous+0.118
ho+0.110
 Nice+0.107
 tigers+0.103
 grizz+0.103
Neg logit contributions
 promise-0.042
ap-0.041
 agree-0.040
 promises-0.034
leaning-0.033
Token the
Token ID262
Feature activation-1.93e-01

Pos logit contributions
 promise+0.007
 agree+0.007
 promises+0.007
ap+0.006
 learn+0.006
Neg logit contributions
ous-0.014
ho-0.013
 Nice-0.013
lic-0.013
 spilling-0.012
Token!"
Token ID2474
Feature activation-2.46e-01

Pos logit contributions
 promise+0.006
ap+0.006
 agree+0.006
 promises+0.005
 So+0.005
Neg logit contributions
ous-0.012
ho-0.011
 Nice-0.011
 Sky-0.010
pill-0.010
Token Benny
Token ID44275
Feature activation-2.40e-02

Pos logit contributions
ous+0.014
 Nice+0.013
 tigers+0.012
 grizz+0.011
 chasing+0.011
Neg logit contributions
 promise-0.022
ap-0.021
leaning-0.020
ute-0.020
 agree-0.020
Token felt
Token ID2936
Feature activation-6.83e-02

Pos logit contributions
ous+0.002
ho+0.002
 Nice+0.001
 tigers+0.001
 Sky+0.001
Neg logit contributions
 promise-0.002
ap-0.002
 agree-0.002
leaning-0.002
 advantage-0.002
Token even
Token ID772
Feature activation-6.22e-01

Pos logit contributions
ous+0.005
ho+0.004
 Nice+0.004
igator+0.004
 spilling+0.004
Neg logit contributions
ap-0.005
 promise-0.005
 agree-0.005
leaning-0.005
our-0.005
Token sad
Token ID6507
Feature activation+6.53e-02

Pos logit contributions
ous+0.074
ho+0.066
 Nice+0.063
storm+0.062
pill+0.061
Neg logit contributions
 promise-0.028
 agree-0.025
 promises-0.023
ap-0.022
 So-0.020
Tokender
Token ID1082
Feature activation-2.23e+00

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
leaning+0.002
 advantage+0.002
Neg logit contributions
ous-0.007
ho-0.006
 Nice-0.006
 tigers-0.006
 spilling-0.006
Token and
Token ID290
Feature activation-3.98e-02

Pos logit contributions
ous+0.260
ho+0.244
 Nice+0.242
 Sky+0.236
 tigers+0.233
Neg logit contributions
 agree-0.063
 promise-0.061
ap-0.060
 So-0.048
 promises-0.048
Token started
Token ID2067
Feature activation-2.33e-03

Pos logit contributions
ous+0.004
pill+0.003
 Nice+0.003
ho+0.003
 gor+0.003
Neg logit contributions
 promise-0.002
 agree-0.002
 promises-0.002
 So-0.002
ap-0.002
Token to
Token ID284
Feature activation-3.80e-01

Pos logit contributions
ous+0.000
igator+0.000
 Nice+0.000
 tigers+0.000
ho+0.000
Neg logit contributions
ap-0.000
 agree-0.000
 promise-0.000
 promises-0.000
 advantage-0.000
Token cry
Token ID3960
Feature activation+5.33e-02

Pos logit contributions
ous+0.033
ho+0.030
 Nice+0.029
pill+0.029
 tigers+0.028
Neg logit contributions
 promise-0.024
 agree-0.023
ap-0.023
 promises-0.020
 advantage-0.020
Token.
Token ID13
Feature activation+3.35e-02

Pos logit contributions
 promise+0.002
ap+0.002
 agree+0.002
leaning+0.002
 promises+0.002
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
 tigers-0.005
 spilling-0.005
Tokenamed
Token ID2434
Feature activation+7.66e-03

Pos logit contributions
ap+0.005
 promise+0.005
 agree+0.005
 advantage+0.005
 promises+0.005
Neg logit contributions
ous-0.008
 Nice-0.007
ho-0.007
pill-0.007
 Sky-0.006
Token down
Token ID866
Feature activation+4.45e-01

Pos logit contributions
 promise+0.000
ap+0.000
 agree+0.000
leaning+0.000
 promises+0.000
Neg logit contributions
ous-0.001
ho-0.001
 Nice-0.001
 tigers-0.001
 Sky-0.001
Token,
Token ID11
Feature activation+1.97e-02

Pos logit contributions
 promise+0.020
ap+0.020
 agree+0.020
 So+0.017
 promises+0.017
Neg logit contributions
ous-0.044
ho-0.041
 Nice-0.040
 tigers-0.039
 Sky-0.038
Token making
Token ID1642
Feature activation-6.75e-01

Pos logit contributions
 promise+0.001
ap+0.001
 agree+0.001
 promises+0.001
leaning+0.001
Neg logit contributions
ous-0.002
ho-0.002
 Nice-0.002
 tigers-0.002
pill-0.002
Token the
Token ID262
Feature activation-7.74e-01

Pos logit contributions
ous+0.061
ho+0.057
 Nice+0.056
 grizz+0.052
 guarding+0.052
Neg logit contributions
ap-0.038
 promise-0.038
 agree-0.037
 promises-0.035
 advantage-0.034
Token sun
Token ID4252
Feature activation-2.23e+00

Pos logit contributions
ous+0.061
ho+0.054
 Nice+0.054
pill+0.050
 guarding+0.050
Neg logit contributions
 promise-0.054
ap-0.052
 agree-0.052
 promises-0.048
 advantage-0.047
Tokenflower
Token ID25547
Feature activation-1.09e+00

Pos logit contributions
ous+0.179
ho+0.163
 Nice+0.160
 tigers+0.154
pill+0.153
Neg logit contributions
 promise-0.145
ap-0.144
 agree-0.139
 promises-0.128
 So-0.126
Token's
Token ID338
Feature activation+3.59e-01

Pos logit contributions
ous+0.097
ho+0.088
 Nice+0.086
 tigers+0.082
 Sky+0.082
Neg logit contributions
 promise-0.061
ap-0.060
 agree-0.059
leaning-0.056
 advantage-0.054
Token pet
Token ID4273
Feature activation+2.15e-01

Pos logit contributions
 promise+0.021
ap+0.021
 agree+0.020
 promises+0.018
 advantage+0.018
Neg logit contributions
ous-0.032
ho-0.029
 Nice-0.028
 tigers-0.027
pill-0.027
Tokenals
Token ID874
Feature activation-9.37e-01

Pos logit contributions
 promise+0.011
ap+0.011
 agree+0.011
 So+0.010
 advantage+0.010
Neg logit contributions
ous-0.019
ho-0.018
 Nice-0.018
 spilling-0.017
 guarding-0.017
Token fl
Token ID781
Feature activation-2.21e-02

Pos logit contributions
ous+0.084
ho+0.077
 Nice+0.074
 tigers+0.072
 Sky+0.071
Neg logit contributions
 promise-0.052
ap-0.051
 agree-0.051
leaning-0.047
 promises-0.046
Token do
Token ID466
Feature activation-1.22e-02

Pos logit contributions
leaning+0.042
ags+0.039
 advantage+0.039
 promise+0.039
ap+0.039
Neg logit contributions
ous-0.057
chy-0.050
 Sky-0.048
ho-0.048
onna-0.048
Token that
Token ID326
Feature activation-1.15e+00

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 tigers+0.001
 Sky+0.001
Neg logit contributions
 promise-0.001
ap-0.001
 agree-0.001
leaning-0.001
 promises-0.001
Token,"
Token ID553
Feature activation-7.86e-01

Pos logit contributions
ous+0.116
ho+0.110
 Nice+0.105
pill+0.103
 Sky+0.102
Neg logit contributions
 promise-0.055
ap-0.052
 agree-0.051
 So-0.044
 promises-0.044
Token said
Token ID531
Feature activation-4.80e-01

Pos logit contributions
ous+0.045
ho+0.041
 Nice+0.038
 Sky+0.037
 tigers+0.036
Neg logit contributions
 promise-0.068
ap-0.067
 agree-0.065
leaning-0.064
 advantage-0.063
Token the
Token ID262
Feature activation-5.55e-01

Pos logit contributions
ous+0.034
ho+0.030
 Nice+0.029
 tigers+0.028
pill+0.028
Neg logit contributions
ap-0.036
 promise-0.036
 agree-0.035
 advantage-0.033
 So-0.033
Token gen
Token ID2429
Feature activation-2.23e+00

Pos logit contributions
ous+0.048
 Nice+0.043
ho+0.041
pill+0.040
 spilling+0.040
Neg logit contributions
 promise-0.035
ap-0.034
 agree-0.033
 advantage-0.030
 promises-0.030
Tokenie
Token ID494
Feature activation-4.18e-01

Pos logit contributions
ous+0.172
 Nice+0.153
ho+0.152
 spilling+0.145
 tigers+0.145
Neg logit contributions
 promise-0.158
ap-0.154
 agree-0.151
 promises-0.139
 advantage-0.136
Token.
Token ID13
Feature activation-4.68e-01

Pos logit contributions
ous+0.047
 Nice+0.044
ho+0.044
 spilling+0.043
 tigers+0.043
Neg logit contributions
 promise-0.014
ap-0.013
 agree-0.013
 So-0.011
 promises-0.010
Token "
Token ID366
Feature activation-3.71e-01

Pos logit contributions
ous+0.040
ho+0.036
 Nice+0.035
 tigers+0.034
 Sky+0.034
Neg logit contributions
 promise-0.028
ap-0.028
 agree-0.027
leaning-0.025
 promises-0.025
TokenWatch
Token ID10723
Feature activation-3.94e-01

Pos logit contributions
ous+0.024
ho+0.023
 Nice+0.022
pill+0.021
 Sky+0.021
Neg logit contributions
 promise-0.029
ap-0.029
 agree-0.028
 advantage-0.026
our-0.026
Token this
Token ID428
Feature activation-7.49e-01

Pos logit contributions
ous+0.034
ho+0.031
 Nice+0.030
 tigers+0.029
 Sky+0.029
Neg logit contributions
 promise-0.023
ap-0.023
 agree-0.023
leaning-0.021
 advantage-0.021
Tokenmy
Token ID1820
Feature activation+3.29e-01

Pos logit contributions
 promise+0.012
ap+0.012
 agree+0.011
 So+0.010
 promises+0.010
Neg logit contributions
ous-0.019
ho-0.018
 Nice-0.017
pill-0.017
 Sky-0.017
Token,
Token ID11
Feature activation-2.93e-01

Pos logit contributions
 promise+0.013
 agree+0.013
ap+0.013
leaning+0.012
 advantage+0.012
Neg logit contributions
ous-0.034
ho-0.032
 Nice-0.031
 tigers-0.030
 Sky-0.029
Token why
Token ID1521
Feature activation-4.44e-01

Pos logit contributions
ous+0.025
ho+0.022
 Nice+0.021
pill+0.021
 tigers+0.021
Neg logit contributions
 promise-0.019
ap-0.018
 agree-0.018
leaning-0.016
 promises-0.016
Token does
Token ID857
Feature activation-3.94e-01

Pos logit contributions
ous+0.037
ho+0.034
 Nice+0.033
 tigers+0.031
 Sky+0.031
Neg logit contributions
 promise-0.027
ap-0.027
 agree-0.027
 advantage-0.025
leaning-0.025
Token the
Token ID262
Feature activation-9.09e-01

Pos logit contributions
ous+0.033
ho+0.030
 Nice+0.029
 tigers+0.028
 spilling+0.028
Neg logit contributions
 promise-0.024
ap-0.024
 agree-0.023
leaning-0.021
 promises-0.021
Token sun
Token ID4252
Feature activation-2.23e+00

Pos logit contributions
ous+0.081
ho+0.073
pill+0.073
 Nice+0.073
 guarding+0.072
Neg logit contributions
 promise-0.052
ap-0.048
 agree-0.048
 So-0.044
 promises-0.044
Token rise
Token ID4485
Feature activation-3.65e-01

Pos logit contributions
ous+0.186
ho+0.171
 Nice+0.171
 grizz+0.162
pill+0.162
Neg logit contributions
 promise-0.140
ap-0.137
 agree-0.127
 promises-0.121
our-0.120
Token and
Token ID290
Feature activation-2.60e-01

Pos logit contributions
ous+0.036
ho+0.033
 Nice+0.032
 tigers+0.031
 Sky+0.031
Neg logit contributions
 promise-0.018
ap-0.017
 agree-0.017
leaning-0.015
 promises-0.015
Token set
Token ID900
Feature activation-4.44e-01

Pos logit contributions
ous+0.024
 Nice+0.023
ho+0.022
pill+0.022
 grizz+0.021
Neg logit contributions
 promise-0.014
 agree-0.013
ap-0.013
 promises-0.012
 learn-0.012
Token every
Token ID790
Feature activation-8.34e-01

Pos logit contributions
ous+0.027
 Sky+0.023
 spilling+0.023
 chasing+0.023
 sleeping+0.022
Neg logit contributions
 agree-0.036
leaning-0.036
 promise-0.036
ap-0.034
 So-0.034
Token day
Token ID1110
Feature activation-5.83e-01

Pos logit contributions
ous+0.068
 Nice+0.061
ho+0.060
pill+0.059
 grizz+0.058
Neg logit contributions
 promise-0.057
ap-0.054
 agree-0.052
 advantage-0.048
 promises-0.047
Token old
Token ID1468
Feature activation-6.09e-01

Pos logit contributions
 promise+0.023
 agree+0.022
ap+0.022
 own+0.020
 promises+0.020
Neg logit contributions
ous-0.029
ho-0.024
 Nice-0.023
 spilling-0.023
storm-0.023
Token frog
Token ID21264
Feature activation-2.60e-01

Pos logit contributions
ous+0.048
 spilling+0.040
 Nice+0.038
pill+0.037
ho+0.037
Neg logit contributions
 agree-0.045
 promise-0.045
ap-0.043
 promises-0.041
 So-0.040
Token felt
Token ID2936
Feature activation-3.77e-01

Pos logit contributions
ous+0.018
ho+0.017
 tigers+0.015
 Sky+0.015
chy+0.015
Neg logit contributions
ap-0.019
leaning-0.019
ags-0.019
 promise-0.018
 advantage-0.018
Token even
Token ID772
Feature activation-6.26e-01

Pos logit contributions
ous+0.028
ho+0.025
 Nice+0.025
igator+0.023
 Sky+0.023
Neg logit contributions
ap-0.026
 promise-0.026
 agree-0.026
leaning-0.025
our-0.024
Token sad
Token ID6507
Feature activation+1.53e-01

Pos logit contributions
ous+0.068
ho+0.063
 Nice+0.060
pill+0.058
 tigers+0.057
Neg logit contributions
 promise-0.029
 agree-0.027
ap-0.025
 promises-0.025
 advantage-0.023
Tokender
Token ID1082
Feature activation-2.22e+00

Pos logit contributions
 promise+0.006
ap+0.006
 agree+0.006
leaning+0.005
 promises+0.005
Neg logit contributions
ous-0.017
ho-0.015
 Nice-0.015
 tigers-0.015
 Sky-0.015
Token and
Token ID290
Feature activation-5.87e-02

Pos logit contributions
ous+0.252
ho+0.236
 Nice+0.235
 Sky+0.228
 tigers+0.226
Neg logit contributions
 promise-0.067
 agree-0.067
ap-0.065
 promises-0.053
 So-0.052
Token hopped
Token ID45230
Feature activation-3.23e-01

Pos logit contributions
ous+0.005
 Nice+0.005
pill+0.005
ho+0.004
 gor+0.004
Neg logit contributions
 promise-0.004
 agree-0.003
 promises-0.003
ap-0.003
 So-0.003
Token away
Token ID1497
Feature activation+4.82e-01

Pos logit contributions
ous+0.034
ho+0.032
 Nice+0.031
 Sky+0.030
 guarding+0.030
Neg logit contributions
 promise-0.013
ap-0.013
 agree-0.012
leaning-0.011
 promises-0.011
Token.
Token ID13
Feature activation-6.64e-02

Pos logit contributions
 promise+0.017
 agree+0.017
ap+0.017
 advantage+0.014
 So+0.014
Neg logit contributions
ous-0.053
ho-0.049
 Nice-0.048
pill-0.048
 tigers-0.047
Token\n
Token ID198
Feature activation+3.05e-01

Pos logit contributions
ous+0.006
ho+0.006
 Nice+0.006
 tigers+0.005
 Sky+0.005
Neg logit contributions
 promise-0.003
ap-0.003
 agree-0.003
leaning-0.003
 promises-0.003
Tokenily
Token ID813
Feature activation-9.63e-01

Pos logit contributions
 promise+0.010
ap+0.009
 agree+0.009
leaning+0.009
 promises+0.009
Neg logit contributions
ous-0.008
ho-0.007
 Nice-0.007
 tigers-0.006
 grizz-0.006
Token insisted
Token ID11189
Feature activation+8.81e-02

Pos logit contributions
ous+0.056
ho+0.054
chy+0.049
 chasing+0.046
sed+0.044
Neg logit contributions
 advantage-0.078
ags-0.078
ap-0.075
ippers-0.074
 promise-0.074
Token on
Token ID319
Feature activation-1.04e-01

Pos logit contributions
 promise+0.007
leaning+0.006
 agree+0.006
 promises+0.006
ap+0.006
Neg logit contributions
ous-0.006
 spilling-0.005
 eating-0.005
 chasing-0.005
 biting-0.005
Token keeping
Token ID5291
Feature activation-1.14e+00

Pos logit contributions
ous+0.005
ho+0.004
 Nice+0.004
 spilling+0.004
 tigers+0.004
Neg logit contributions
 promise-0.010
ap-0.010
 agree-0.010
leaning-0.009
 advantage-0.009
Token the
Token ID262
Feature activation-6.88e-01

Pos logit contributions
 chasing+0.050
 eating+0.049
 sleeping+0.049
 guarding+0.047
 spilling+0.046
Neg logit contributions
 agree-0.108
 promise-0.108
ap-0.107
 So-0.105
leaning-0.102
Token sun
Token ID4252
Feature activation-2.22e+00

Pos logit contributions
ous+0.054
ho+0.048
 Nice+0.047
 tigers+0.045
 Sky+0.044
Neg logit contributions
 promise-0.047
ap-0.046
 agree-0.046
leaning-0.043
 advantage-0.042
Tokenflower
Token ID25547
Feature activation-1.01e+00

Pos logit contributions
ous+0.120
ho+0.103
 Nice+0.102
 grizz+0.097
pill+0.096
Neg logit contributions
 promise-0.202
ap-0.202
 agree-0.193
 promises-0.184
our-0.182
Token in
Token ID287
Feature activation+5.05e-01

Pos logit contributions
ous+0.082
ho+0.075
chy+0.071
 Nice+0.070
igator+0.070
Neg logit contributions
 agree-0.063
ap-0.061
leaning-0.061
 promise-0.059
 So-0.055
Token her
Token ID607
Feature activation+2.20e-01

Pos logit contributions
ap+0.041
 agree+0.040
 promise+0.040
leaning+0.039
our+0.038
Neg logit contributions
ous-0.029
ho-0.027
 spilling-0.027
 chasing-0.026
 Sky-0.025
Token room
Token ID2119
Feature activation+4.70e-01

Pos logit contributions
 agree+0.015
ap+0.013
 promise+0.013
leaning+0.013
wing+0.013
Neg logit contributions
ous-0.016
ho-0.015
 Sky-0.015
sed-0.015
 sleeping-0.014
Token.
Token ID13
Feature activation-5.37e-02

Pos logit contributions
ap+0.021
 promise+0.021
 agree+0.021
leaning+0.020
 advantage+0.019
Neg logit contributions
ous-0.046
ho-0.042
 spilling-0.040
 Nice-0.040
chy-0.040
Token her
Token ID607
Feature activation-1.61e-01

Pos logit contributions
ous+0.005
ho+0.005
 Sky+0.004
 chasing+0.004
 spilling+0.004
Neg logit contributions
 promise-0.010
ap-0.010
leaning-0.010
 agree-0.009
 So-0.009
Token mom
Token ID1995
Feature activation+2.36e-01

Pos logit contributions
ous+0.014
ho+0.013
sed+0.012
 Sky+0.012
 spilling+0.012
Neg logit contributions
ap-0.009
leaning-0.009
 advantage-0.009
 agree-0.009
 So-0.008
Token and
Token ID290
Feature activation+2.72e-01

Pos logit contributions
ap+0.011
leaning+0.011
 promise+0.010
 agree+0.010
 advantage+0.009
Neg logit contributions
ous-0.023
ho-0.021
 Nice-0.020
 Sky-0.020
chy-0.020
Token took
Token ID1718
Feature activation-2.90e-01

Pos logit contributions
 promise+0.017
ap+0.017
 agree+0.017
 promises+0.015
leaning+0.015
Neg logit contributions
ous-0.023
ho-0.020
 Nice-0.020
 tigers-0.019
 Sky-0.019
Token the
Token ID262
Feature activation-5.94e-01

Pos logit contributions
ous+0.015
 Sky+0.015
 Annie+0.013
 Amy+0.013
 spilling+0.013
Neg logit contributions
 promise-0.025
leaning-0.024
 agree-0.024
ute-0.023
ap-0.023
Token sun
Token ID4252
Feature activation-2.20e+00

Pos logit contributions
ous+0.049
ho+0.044
 Nice+0.042
 spilling+0.041
 Sky+0.041
Neg logit contributions
ap-0.038
 agree-0.037
 promise-0.037
 advantage-0.034
leaning-0.034
Tokenflower
Token ID25547
Feature activation-1.06e+00

Pos logit contributions
ous+0.119
ho+0.100
 Nice+0.096
 tigers+0.090
 Sky+0.088
Neg logit contributions
 promise-0.204
ap-0.202
 agree-0.199
leaning-0.189
 promises-0.188
Token with
Token ID351
Feature activation-2.64e-01

Pos logit contributions
ous+0.093
ho+0.088
 Sky+0.082
chy+0.081
igator+0.080
Neg logit contributions
leaning-0.058
ap-0.056
 agree-0.055
 promise-0.054
 advantage-0.051
Token her
Token ID607
Feature activation+3.36e-02

Pos logit contributions
ous+0.015
ho+0.014
 Sky+0.013
 Nice+0.013
 spilling+0.013
Neg logit contributions
 promise-0.022
ap-0.022
 agree-0.022
leaning-0.021
our-0.021
Token.
Token ID13
Feature activation+4.65e-02

Pos logit contributions
ap+0.002
leaning+0.002
 agree+0.002
 promise+0.002
erve+0.002
Neg logit contributions
ous-0.003
ho-0.003
 spilling-0.003
 Nice-0.003
 Sky-0.003
Token That
Token ID1320
Feature activation-2.98e-01

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
leaning+0.003
 So+0.003
Neg logit contributions
ous-0.003
ho-0.003
 Nice-0.003
 tigers-0.003
 guarding-0.003
Token And
Token ID843
Feature activation-8.00e-02

Pos logit contributions
ous+0.016
 Nice+0.014
ho+0.014
 Sky+0.013
 grizz+0.012
Neg logit contributions
 promise-0.025
 agree-0.025
ap-0.024
leaning-0.023
 advantage-0.023
Token I
Token ID314
Feature activation-4.47e-01

Pos logit contributions
ous+0.005
ho+0.004
 Nice+0.004
 tigers+0.004
 Sky+0.004
Neg logit contributions
 promise-0.007
ap-0.007
 agree-0.006
leaning-0.006
 advantage-0.006
Token will
Token ID481
Feature activation+6.29e-01

Pos logit contributions
ho+0.035
ous+0.033
chy+0.031
 Nice+0.031
onna+0.031
Neg logit contributions
ags-0.031
ap-0.027
leaning-0.027
ippers-0.027
ised-0.026
Token share
Token ID2648
Feature activation+5.53e-01

Pos logit contributions
leaning+0.040
ippers+0.037
ised+0.037
int+0.037
ute+0.037
Neg logit contributions
ous-0.045
ho-0.043
onna-0.042
chy-0.042
 chasing-0.042
Token my
Token ID616
Feature activation-8.47e-01

Pos logit contributions
 promise+0.036
ap+0.035
 agree+0.035
leaning+0.033
 advantage+0.032
Neg logit contributions
ous-0.044
ho-0.040
 Nice-0.039
 tigers-0.038
 Sky-0.038
Token sun
Token ID4252
Feature activation-2.20e+00

Pos logit contributions
ous+0.058
ho+0.053
 Nice+0.050
 tigers+0.049
 Sky+0.047
Neg logit contributions
ap-0.063
 agree-0.061
 promise-0.060
leaning-0.059
 So-0.058
Tokenfl
Token ID2704
Feature activation-1.55e+00

Pos logit contributions
ous+0.107
ho+0.089
 Nice+0.085
 tigers+0.078
 spilling+0.077
Neg logit contributions
 promise-0.214
ap-0.213
 agree-0.210
leaning-0.199
 promises-0.198
Tokenowers
Token ID3618
Feature activation-3.81e-01

Pos logit contributions
ous+0.106
ho+0.094
 Nice+0.087
pill+0.086
 tigers+0.085
Neg logit contributions
 promise-0.126
 agree-0.123
ap-0.114
 advantage-0.114
leaning-0.112
Token with
Token ID351
Feature activation-2.85e-01

Pos logit contributions
ous+0.038
ho+0.035
 Nice+0.034
 Sky+0.033
igator+0.033
Neg logit contributions
 promise-0.017
ap-0.017
 agree-0.016
leaning-0.016
 advantage-0.015
Token you
Token ID345
Feature activation+1.73e-01

Pos logit contributions
ous+0.026
ho+0.024
 Nice+0.023
 tigers+0.023
 Sky+0.022
Neg logit contributions
 promise-0.015
ap-0.015
 agree-0.015
leaning-0.014
 promises-0.013
Token.
Token ID13
Feature activation-1.28e-01

Pos logit contributions
 promise+0.007
ap+0.007
 agree+0.007
leaning+0.006
 advantage+0.006
Neg logit contributions
ous-0.018
ho-0.017
 Nice-0.016
 tigers-0.016
 Sky-0.016
Tokenitty
Token ID9760
Feature activation+7.69e-02

Pos logit contributions
 promise+0.015
 agree+0.015
ap+0.015
leaning+0.014
 promises+0.014
Neg logit contributions
ous-0.015
ho-0.014
 Nice-0.013
 Sky-0.013
 tigers-0.013
Token went
Token ID1816
Feature activation+2.47e-01

Pos logit contributions
 promise+0.005
ap+0.005
 agree+0.005
leaning+0.005
 promises+0.005
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
 tigers-0.005
 Sky-0.005
Token outside
Token ID2354
Feature activation-1.51e-01

Pos logit contributions
 promise+0.016
ap+0.016
 agree+0.016
leaning+0.015
 promises+0.014
Neg logit contributions
ous-0.020
ho-0.018
 Nice-0.017
 guarding-0.017
 tigers-0.017
Token on
Token ID319
Feature activation+5.14e-01

Pos logit contributions
ous+0.015
ho+0.014
 Nice+0.013
 tigers+0.013
 Sky+0.013
Neg logit contributions
 promise-0.007
ap-0.007
 agree-0.007
 So-0.006
 promises-0.006
Token its
Token ID663
Feature activation-2.98e-01

Pos logit contributions
 promise+0.037
ap+0.036
 agree+0.035
leaning+0.034
 promises+0.033
Neg logit contributions
ous-0.039
ho-0.035
 Nice-0.033
 Sky-0.033
 tigers-0.032
Token own
Token ID898
Feature activation+1.18e+00

Pos logit contributions
ous+0.026
ho+0.024
 Nice+0.023
pill+0.023
 tigers+0.022
Neg logit contributions
 promise-0.018
ap-0.017
 agree-0.017
 advantage-0.015
 promises-0.015
Token.
Token ID13
Feature activation+3.86e-02

Pos logit contributions
ap+0.062
 promise+0.062
 agree+0.060
leaning+0.057
 promises+0.054
Neg logit contributions
ous-0.110
ho-0.100
 Nice-0.097
 Sky-0.094
 tigers-0.094
Token It
Token ID632
Feature activation+1.64e-01

Pos logit contributions
 promise+0.002
ap+0.002
 agree+0.002
leaning+0.002
 promises+0.002
Neg logit contributions
ous-0.003
 Nice-0.003
ho-0.003
 Sky-0.003
 tigers-0.003
Token found
Token ID1043
Feature activation+5.36e-01

Pos logit contributions
 promise+0.010
 agree+0.009
 So+0.009
ap+0.009
 promises+0.009
Neg logit contributions
ous-0.015
 Nice-0.013
ho-0.013
 gor-0.012
pill-0.012
Token a
Token ID257
Feature activation+3.99e-01

Pos logit contributions
 promises+0.021
 advantage+0.020
 So+0.020
ap+0.019
 promise+0.019
Neg logit contributions
ous-0.056
storm-0.050
 Nice-0.049
onna-0.049
ho-0.049
Token big
Token ID1263
Feature activation+1.18e-01

Pos logit contributions
 promise+0.029
ap+0.026
 agree+0.026
 advantage+0.024
 So+0.024
Neg logit contributions
ous-0.031
ho-0.027
 Nice-0.027
pill-0.027
 Sky-0.025
Token big
Token ID1263
Feature activation+1.46e-03

Pos logit contributions
 promise+0.023
 agree+0.021
ap+0.021
 So+0.019
 promises+0.019
Neg logit contributions
ous-0.023
ho-0.020
 Nice-0.020
pill-0.020
 Sky-0.019
Token slide
Token ID10649
Feature activation+1.39e-01

Pos logit contributions
 promise+0.000
ap+0.000
 agree+0.000
leaning+0.000
 promises+0.000
Neg logit contributions
ous-0.000
ho-0.000
 Nice-0.000
 Sky-0.000
 tigers-0.000
Token with
Token ID351
Feature activation+8.19e-02

Pos logit contributions
ap+0.005
 promise+0.005
 agree+0.005
 So+0.004
our+0.004
Neg logit contributions
ous-0.015
 Nice-0.014
ho-0.014
 Sky-0.013
pill-0.013
Token a
Token ID257
Feature activation+5.36e-01

Pos logit contributions
 hair+0.003
 promises+0.003
 advantage+0.003
 agree+0.003
ap+0.003
Neg logit contributions
ous-0.009
 Nice-0.008
ho-0.008
storm-0.008
pill-0.008
Token z
Token ID1976
Feature activation+1.87e-01

Pos logit contributions
 promise+0.035
 agree+0.033
ap+0.032
 advantage+0.029
 So+0.029
Neg logit contributions
ous-0.045
ho-0.040
 Nice-0.039
pill-0.038
 Sky-0.038
Tokenig
Token ID328
Feature activation+1.58e+00

Pos logit contributions
 agree+0.009
 So+0.009
 promise+0.009
leaning+0.009
 promises+0.008
Neg logit contributions
ous-0.018
ho-0.017
sed-0.016
balls-0.015
 spilling-0.015
Tokenzag
Token ID50183
Feature activation-2.16e-01

Pos logit contributions
ap+0.109
 promise+0.105
 agree+0.102
ute+0.099
our+0.099
Neg logit contributions
ous-0.114
 Nice-0.112
pill-0.107
 grizz-0.105
 tigers-0.104
Token shape
Token ID5485
Feature activation+4.96e-01

Pos logit contributions
ous+0.020
 Nice+0.019
ho+0.018
pill+0.018
 grizz+0.018
Neg logit contributions
 promise-0.011
ap-0.011
 agree-0.011
 So-0.010
 advantage-0.009
Token.
Token ID13
Feature activation-4.12e-03

Pos logit contributions
ap+0.017
 promise+0.015
 agree+0.015
our+0.013
ute+0.012
Neg logit contributions
ous-0.053
 Nice-0.051
 Sky-0.050
ho-0.050
 grizz-0.049
Token Lily
Token ID20037
Feature activation-2.43e-02

Pos logit contributions
ous+0.000
ho+0.000
 Nice+0.000
 spilling+0.000
 guarding+0.000
Neg logit contributions
ap-0.000
 agree-0.000
 promise-0.000
 So-0.000
leaning-0.000
Token wanted
Token ID2227
Feature activation-3.31e-01

Pos logit contributions
ous+0.002
ho+0.002
 Nice+0.001
 tigers+0.001
 guarding+0.001
Neg logit contributions
 promise-0.002
ap-0.002
 agree-0.002
leaning-0.002
 advantage-0.002
Token contained
Token ID7763
Feature activation-1.64e-01

Pos logit contributions
 promise+0.037
 agree+0.034
 promises+0.032
ap+0.032
 So+0.031
Neg logit contributions
ous-0.043
ho-0.040
 Nice-0.039
 gor-0.038
 grizz-0.037
Token.
Token ID13
Feature activation+1.52e-01

Pos logit contributions
ous+0.017
ho+0.015
 Nice+0.015
 tigers+0.015
 Sky+0.014
Neg logit contributions
 promise-0.007
ap-0.007
 agree-0.007
 promises-0.006
 advantage-0.006
Token It
Token ID632
Feature activation+2.37e-01

Pos logit contributions
 promise+0.011
 agree+0.010
ap+0.010
 advantage+0.010
wing+0.010
Neg logit contributions
ous-0.011
 Nice-0.011
 Sky-0.010
pill-0.010
ho-0.010
Token was
Token ID373
Feature activation-1.85e-01

Pos logit contributions
 promise+0.017
 agree+0.016
ap+0.015
 promises+0.015
 So+0.015
Neg logit contributions
ous-0.019
ho-0.017
 Nice-0.016
 tigers-0.016
 gor-0.016
Token their
Token ID511
Feature activation+2.33e-01

Pos logit contributions
ous+0.018
ho+0.016
 Nice+0.015
pill+0.015
 Sky+0.015
Neg logit contributions
 promise-0.010
ap-0.009
 agree-0.009
 advantage-0.009
 promises-0.009
Token own
Token ID898
Feature activation+1.24e+00

Pos logit contributions
ap+0.015
 agree+0.015
 promise+0.014
leaning+0.014
our+0.013
Neg logit contributions
ous-0.019
ho-0.017
 Nice-0.016
 Sky-0.016
 tigers-0.016
Token special
Token ID2041
Feature activation-4.37e-01

Pos logit contributions
ap+0.085
 agree+0.084
 promise+0.084
leaning+0.080
 So+0.076
Neg logit contributions
ous-0.095
ho-0.083
 Nice-0.082
 spilling-0.077
 tigers-0.077
Token secret
Token ID3200
Feature activation-5.71e-01

Pos logit contributions
ous+0.035
ho+0.031
 Nice+0.031
 guarding+0.029
 spilling+0.029
Neg logit contributions
ap-0.028
 agree-0.028
 promise-0.027
leaning-0.026
 So-0.025
Token!
Token ID0
Feature activation+2.12e-02

Pos logit contributions
ous+0.055
ho+0.051
 Nice+0.049
 tigers+0.048
 Sky+0.047
Neg logit contributions
 promise-0.028
ap-0.028
 agree-0.027
leaning-0.024
 promises-0.024
Token<|endoftext|>
Token ID50256
Feature activation+3.97e-02

Pos logit contributions
 promise+0.001
ap+0.001
 agree+0.001
leaning+0.001
our+0.001
Neg logit contributions
ous-0.002
 Nice-0.002
 Sky-0.001
pill-0.001
ho-0.001
TokenOnce
Token ID7454
Feature activation-2.06e-01

Pos logit contributions
ap+0.002
 So+0.002
 agree+0.002
 promise+0.002
leaning+0.002
Neg logit contributions
ous-0.004
ho-0.003
 spilling-0.003
sed-0.003
 Nice-0.003
Token.
Token ID13
Feature activation-1.77e-01

Pos logit contributions
ap+0.014
 agree+0.014
 promise+0.013
 So+0.012
 promises+0.012
Neg logit contributions
ous-0.030
ho-0.028
 Nice-0.027
igator-0.027
 Sky-0.027
Token They
Token ID1119
Feature activation+2.18e-01

Pos logit contributions
ous+0.016
ho+0.015
 Nice+0.014
 tigers+0.014
 Sky+0.014
Neg logit contributions
 promise-0.010
ap-0.010
 agree-0.010
leaning-0.009
 promises-0.009
Token escaped
Token ID13537
Feature activation-4.30e-01

Pos logit contributions
 promise+0.014
ap+0.014
 agree+0.014
leaning+0.013
 advantage+0.013
Neg logit contributions
ous-0.018
ho-0.016
 Nice-0.015
 tigers-0.015
 Sky-0.015
Token the
Token ID262
Feature activation+2.30e-01

Pos logit contributions
ous+0.039
ho+0.035
pill+0.034
 Nice+0.034
igator+0.032
Neg logit contributions
 promise-0.024
ap-0.023
 agree-0.023
 advantage-0.021
 promises-0.021
Token hunter
Token ID19177
Feature activation-3.36e-01

Pos logit contributions
 promise+0.017
ap+0.016
 agree+0.016
 promises+0.015
 advantage+0.015
Neg logit contributions
ous-0.018
ho-0.016
 Nice-0.015
pill-0.015
 Sky-0.014
Token's
Token ID338
Feature activation+1.20e+00

Pos logit contributions
ous+0.038
ho+0.035
 Nice+0.035
 tigers+0.033
 Sky+0.033
Neg logit contributions
 promise-0.011
ap-0.011
 agree-0.011
 promises-0.009
leaning-0.009
Token view
Token ID1570
Feature activation-6.51e-01

Pos logit contributions
 promise+0.069
ap+0.066
 agree+0.065
 promises+0.061
 advantage+0.060
Neg logit contributions
ous-0.110
ho-0.098
 Nice-0.097
pill-0.093
 Sky-0.092
Token,
Token ID11
Feature activation+3.36e-01

Pos logit contributions
ous+0.072
 Nice+0.068
ho+0.067
pill+0.065
 grizz+0.065
Neg logit contributions
 promise-0.021
ap-0.021
 agree-0.020
 So-0.016
 promises-0.016
Token and
Token ID290
Feature activation+3.57e-01

Pos logit contributions
 promise+0.016
 agree+0.015
ap+0.015
 promises+0.014
 advantage+0.013
Neg logit contributions
ous-0.034
ho-0.031
 Nice-0.030
pill-0.029
 Sky-0.028
Token were
Token ID547
Feature activation-4.34e-01

Pos logit contributions
 promise+0.019
 agree+0.019
ap+0.018
 promises+0.018
 So+0.016
Neg logit contributions
ous-0.033
 Nice-0.030
pill-0.029
ho-0.029
onna-0.028
Token not
Token ID407
Feature activation-1.96e-01

Pos logit contributions
ous+0.042
ho+0.039
 Nice+0.039
pill+0.038
storm+0.036
Neg logit contributions
 promise-0.022
ap-0.022
 agree-0.021
 promises-0.020
 advantage-0.019
Token.
Token ID13
Feature activation+3.44e-03

Pos logit contributions
ous+0.150
 Nice+0.141
ho+0.140
pill+0.139
 grizz+0.137
Neg logit contributions
ap-0.048
 promise-0.046
 agree-0.043
 So-0.037
our-0.037
Token\n
Token ID198
Feature activation+2.64e-01

Pos logit contributions
 promise+0.000
ap+0.000
 agree+0.000
leaning+0.000
 promises+0.000
Neg logit contributions
ous-0.000
ho-0.000
 Nice-0.000
 tigers-0.000
 Sky-0.000
Token\n
Token ID198
Feature activation+4.59e-02

Pos logit contributions
 promise+0.015
ap+0.014
 agree+0.014
 advantage+0.013
leaning+0.013
Neg logit contributions
ous-0.024
 Nice-0.021
ho-0.021
 Sky-0.021
 tigers-0.020
Token"
Token ID1
Feature activation+6.29e-01

Pos logit contributions
 promise+0.004
ap+0.004
 agree+0.004
 advantage+0.003
 promises+0.003
Neg logit contributions
ous-0.003
 Nice-0.003
ho-0.002
 Sky-0.002
pill-0.002
TokenWhat
Token ID2061
Feature activation+4.08e-01

Pos logit contributions
leaning+0.042
 agree+0.042
 promise+0.042
So+0.041
 So+0.041
Neg logit contributions
ous-0.049
ho-0.041
sed-0.039
balls-0.039
 spilling-0.039
Token's
Token ID338
Feature activation+1.25e+00

Pos logit contributions
 promise+0.021
ap+0.021
 agree+0.020
 So+0.019
 promises+0.018
Neg logit contributions
ous-0.039
ho-0.037
 spilling-0.035
 grizz-0.035
 Nice-0.035
Token that
Token ID326
Feature activation-1.64e-01

Pos logit contributions
 agree+0.089
 promise+0.088
ap+0.086
 So+0.081
leaning+0.079
Neg logit contributions
ous-0.094
ho-0.085
 guarding-0.080
 spilling-0.080
 Sky-0.079
Token?"
Token ID1701
Feature activation+2.45e-01

Pos logit contributions
ous+0.015
ho+0.014
 Nice+0.013
 tigers+0.013
 Sky+0.013
Neg logit contributions
 promise-0.009
ap-0.009
 agree-0.008
leaning-0.008
 advantage-0.008
Token asked
Token ID1965
Feature activation+2.57e-01

Pos logit contributions
 promise+0.021
ap+0.021
 agree+0.020
leaning+0.019
 advantage+0.019
Neg logit contributions
ous-0.015
ho-0.013
 Nice-0.013
 Sky-0.012
 tigers-0.012
Token Emily
Token ID17608
Feature activation+7.08e-02

Pos logit contributions
 promise+0.024
ap+0.024
 agree+0.023
leaning+0.022
 promises+0.022
Neg logit contributions
ous-0.014
ho-0.011
 Nice-0.011
 Sky-0.010
 tigers-0.010
Token.
Token ID13
Feature activation-3.30e-01

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
 So+0.002
 promises+0.002
Neg logit contributions
ous-0.008
ho-0.007
 Nice-0.007
 tigers-0.007
pill-0.007
Token it
Token ID340
Feature activation+4.15e-01

Pos logit contributions
 promise+0.005
ap+0.005
 agree+0.005
leaning+0.005
 So+0.005
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
 Sky-0.005
 tigers-0.005
Token to
Token ID284
Feature activation+3.58e-01

Pos logit contributions
 promise+0.027
ap+0.026
 agree+0.026
 promises+0.024
 advantage+0.024
Neg logit contributions
ous-0.034
ho-0.030
 Nice-0.029
 tigers-0.028
 spilling-0.028
Token all
Token ID477
Feature activation-3.24e-01

Pos logit contributions
 promise+0.023
 agree+0.023
ap+0.023
 promises+0.021
 advantage+0.021
Neg logit contributions
ous-0.030
ho-0.028
 Nice-0.026
pill-0.025
 grizz-0.025
Token her
Token ID607
Feature activation+7.50e-01

Pos logit contributions
ous+0.028
ho+0.026
 Nice+0.024
 grizz+0.024
sed+0.024
Neg logit contributions
 promise-0.021
ap-0.020
 agree-0.020
 promises-0.019
 So-0.018
Token friends
Token ID2460
Feature activation+9.53e-01

Pos logit contributions
ap+0.033
 agree+0.030
 promise+0.029
leaning+0.028
our+0.027
Neg logit contributions
ous-0.075
ho-0.070
 Sky-0.068
 Nice-0.067
 gor-0.067
Token.
Token ID13
Feature activation+9.34e-02

Pos logit contributions
 promise+0.044
ap+0.042
leaning+0.040
 agree+0.040
 advantage+0.039
Neg logit contributions
ous-0.096
ho-0.087
 Nice-0.085
 guarding-0.081
 Sky-0.081
Token\n
Token ID198
Feature activation+3.48e-01

Pos logit contributions
 promise+0.005
ap+0.005
 agree+0.005
leaning+0.004
 advantage+0.004
Neg logit contributions
ous-0.009
ho-0.008
 Nice-0.008
 Sky-0.007
 tigers-0.007
Token\n
Token ID198
Feature activation+1.21e-01

Pos logit contributions
 promise+0.015
 agree+0.015
ap+0.015
leaning+0.013
 So+0.013
Neg logit contributions
ous-0.036
ho-0.033
 Nice-0.032
 tigers-0.031
 spilling-0.031
TokenAs
Token ID1722
Feature activation-1.50e-01

Pos logit contributions
 promise+0.007
ap+0.007
 agree+0.007
 promises+0.007
 advantage+0.007
Neg logit contributions
ous-0.010
ho-0.009
 Nice-0.009
pill-0.009
 spilling-0.009
Token time
Token ID640
Feature activation-3.75e-01

Pos logit contributions
ous+0.013
igator+0.012
 Nice+0.012
ho+0.011
 grizz+0.011
Neg logit contributions
ap-0.009
 promise-0.009
 agree-0.008
 promises-0.008
 practice-0.008
Token went
Token ID1816
Feature activation+2.13e-01

Pos logit contributions
ous+0.028
ho+0.025
 Nice+0.024
 tigers+0.023
 Sky+0.023
Neg logit contributions
ap-0.027
 promise-0.027
 agree-0.026
leaning-0.025
 So-0.024
TokenOnce
Token ID7454
Feature activation-2.50e-01

Pos logit contributions
 agree+0.002
ap+0.002
 promise+0.002
 So+0.002
leaning+0.002
Neg logit contributions
ous-0.004
ho-0.003
 Nice-0.003
 grizz-0.003
sed-0.003
Token they
Token ID484
Feature activation+3.66e-01

Pos logit contributions
 grizz+0.024
ous+0.024
pill+0.024
 tigers+0.024
 Nice+0.023
Neg logit contributions
ap-0.013
 promise-0.013
 agree-0.012
our-0.011
 So-0.011
Token were
Token ID547
Feature activation-5.02e-02

Pos logit contributions
 promise+0.026
ap+0.025
 agree+0.025
leaning+0.023
 promises+0.023
Neg logit contributions
ous-0.028
ho-0.025
 Nice-0.025
 tigers-0.023
 Sky-0.023
Token all
Token ID477
Feature activation-8.04e-01

Pos logit contributions
ous+0.005
 Nice+0.004
ho+0.004
 grizz+0.004
pill+0.004
Neg logit contributions
ap-0.003
 promise-0.003
 agree-0.003
our-0.003
 advantage-0.002
Token mixed
Token ID7668
Feature activation-1.94e-01

Pos logit contributions
ous+0.067
 Nice+0.065
ho+0.062
 grizz+0.059
sed+0.057
Neg logit contributions
 promise-0.055
ap-0.053
 agree-0.052
 promises-0.049
 own-0.047
Token,
Token ID11
Feature activation+7.02e-02

Pos logit contributions
ous+0.016
ho+0.015
 spilling+0.014
 tigers+0.014
 gor+0.014
Neg logit contributions
 promise-0.012
ap-0.012
 agree-0.012
leaning-0.011
 advantage-0.010
Token she
Token ID673
Feature activation-2.19e-01

Pos logit contributions
 agree+0.006
ap+0.006
 promise+0.006
 So+0.006
 promises+0.005
Neg logit contributions
ous-0.004
ho-0.004
 grizz-0.004
 Nice-0.004
 guarding-0.004
Token put
Token ID1234
Feature activation+1.04e-01

Pos logit contributions
ous+0.017
ho+0.015
 Nice+0.015
 tigers+0.014
 Sky+0.014
Neg logit contributions
 promise-0.016
ap-0.015
 agree-0.015
leaning-0.014
 promises-0.014
Token the
Token ID262
Feature activation-3.85e-01

Pos logit contributions
 promise+0.006
ap+0.006
 agree+0.006
leaning+0.006
 promises+0.006
Neg logit contributions
ous-0.009
ho-0.008
 Nice-0.008
 tigers-0.007
 guarding-0.007
Token bowl
Token ID9396
Feature activation+2.36e-01

Pos logit contributions
ous+0.029
ho+0.026
 Nice+0.026
 tigers+0.024
 Sky+0.024
Neg logit contributions
 promise-0.028
ap-0.027
 agree-0.027
leaning-0.025
 promises-0.025
Token in
Token ID287
Feature activation+4.68e-01

Pos logit contributions
ap+0.013
 promise+0.013
 agree+0.012
 So+0.011
 promises+0.011
Neg logit contributions
ous-0.022
 Nice-0.021
 grizz-0.020
 tigers-0.020
 gor-0.019
Token.
Token ID13
Feature activation-2.68e-01

Pos logit contributions
 promise+0.037
 agree+0.036
ap+0.036
 So+0.031
 promises+0.031
Neg logit contributions
ous-0.077
ho-0.071
 Nice-0.069
pill-0.067
 tigers-0.066
Token They
Token ID1119
Feature activation+3.46e-01

Pos logit contributions
ous+0.017
 Nice+0.016
 Sky+0.016
 tigers+0.014
ho+0.014
Neg logit contributions
 promise-0.023
ap-0.021
wing-0.020
leaning-0.020
 promises-0.020
Token said
Token ID531
Feature activation-1.28e-01

Pos logit contributions
ap+0.030
leaning+0.028
ute+0.028
 promise+0.027
ags+0.027
Neg logit contributions
ho-0.020
ous-0.019
chy-0.017
 tigers-0.016
sed-0.016
Token,
Token ID11
Feature activation-3.15e-01

Pos logit contributions
ous+0.014
ho+0.013
 Nice+0.012
 tigers+0.012
 Sky+0.012
Neg logit contributions
 promise-0.005
ap-0.005
 agree-0.005
leaning-0.004
 promises-0.004
Token "
Token ID366
Feature activation-3.02e-01

Pos logit contributions
ous+0.017
ho+0.014
 Nice+0.013
 tigers+0.012
 Sky+0.012
Neg logit contributions
 promise-0.029
ap-0.029
 agree-0.029
leaning-0.027
 So-0.027
TokenNo
Token ID2949
Feature activation+8.63e-02

Pos logit contributions
ho+0.017
ous+0.016
 tigers+0.014
pill+0.014
 Nice+0.014
Neg logit contributions
 promise-0.027
ap-0.026
ags-0.024
 agree-0.024
 promises-0.024
Token,
Token ID11
Feature activation-3.71e-01

Pos logit contributions
ap+0.004
leaning+0.004
 agree+0.004
 promise+0.004
ute+0.004
Neg logit contributions
ous-0.008
 tigers-0.007
 Nice-0.007
ho-0.007
 chasing-0.007
Token you
Token ID345
Feature activation-7.08e-02

Pos logit contributions
ous+0.031
ho+0.028
 Nice+0.027
pill+0.027
 grizz+0.026
Neg logit contributions
 promise-0.025
ap-0.023
 agree-0.023
 promises-0.021
leaning-0.021
Token are
Token ID389
Feature activation-8.65e-02

Pos logit contributions
ous+0.005
ho+0.004
 tigers+0.004
chy+0.004
 chasing+0.004
Neg logit contributions
ags-0.005
ap-0.005
leaning-0.005
ute-0.005
ippers-0.005
Token mean
Token ID1612
Feature activation+5.04e-01

Pos logit contributions
ous+0.008
ho+0.008
 Nice+0.007
pill+0.007
 grizz+0.007
Neg logit contributions
 promise-0.005
ap-0.005
 agree-0.005
 advantage-0.004
 promises-0.004
Token and
Token ID290
Feature activation-2.26e-01

Pos logit contributions
 promise+0.017
ap+0.017
 agree+0.016
 So+0.014
 promises+0.013
Neg logit contributions
ous-0.056
ho-0.052
 Nice-0.051
 Sky-0.050
pill-0.050
Token fly
Token ID6129
Feature activation+1.05e-01

Pos logit contributions
 promise+0.036
ap+0.035
 agree+0.035
leaning+0.032
 promises+0.031
Neg logit contributions
ous-0.063
ho-0.057
 Nice-0.056
 tigers-0.054
 Sky-0.053
Token to
Token ID284
Feature activation-5.53e-01

Pos logit contributions
 promise+0.006
ap+0.006
 agree+0.006
leaning+0.005
 promises+0.005
Neg logit contributions
ous-0.009
ho-0.009
 Nice-0.008
 tigers-0.008
 Sky-0.008
Token the
Token ID262
Feature activation-2.93e-01

Pos logit contributions
ous+0.048
ho+0.043
 Nice+0.042
 tigers+0.041
 Sky+0.040
Neg logit contributions
 promise-0.033
ap-0.032
 agree-0.032
leaning-0.030
 promises-0.029
Token moon
Token ID8824
Feature activation-1.24e+00

Pos logit contributions
ous+0.025
 Nice+0.022
ho+0.022
pill+0.020
 spilling+0.020
Neg logit contributions
 promise-0.019
ap-0.019
 agree-0.018
 advantage-0.017
 promises-0.017
Token."
Token ID526
Feature activation-9.42e-02

Pos logit contributions
ous+0.119
ho+0.107
 tigers+0.103
 Nice+0.103
 Sky+0.102
Neg logit contributions
 promise-0.062
 agree-0.059
ap-0.059
leaning-0.059
 advantage-0.055
Token\n
Token ID198
Feature activation-3.80e-02

Pos logit contributions
 Nice+0.005
 tigers+0.005
ous+0.005
 Sky+0.005
 Amy+0.004
Neg logit contributions
 promise-0.008
ap-0.008
ute-0.008
 advantage-0.008
leaning-0.008
Token\n
Token ID198
Feature activation-2.14e-01

Pos logit contributions
ous+0.003
ho+0.003
 Nice+0.003
 Sky+0.003
 tigers+0.002
Neg logit contributions
 promise-0.003
 advantage-0.002
ap-0.002
 promises-0.002
leaning-0.002
Token"
Token ID1
Feature activation+6.36e-01

Pos logit contributions
ho+0.015
ous+0.015
 tigers+0.014
 Nice+0.014
lic+0.013
Neg logit contributions
 advantage-0.014
 promise-0.014
ap-0.014
 promises-0.014
ags-0.013
TokenOK
Token ID11380
Feature activation+2.62e-01

Pos logit contributions
 promise+0.039
ap+0.038
 agree+0.036
 promises+0.034
 advantage+0.034
Neg logit contributions
ous-0.054
ho-0.050
 Nice-0.048
 tigers-0.047
 Sky-0.046
Token,
Token ID11
Feature activation-5.74e-01

Pos logit contributions
leaning+0.016
ap+0.016
 promise+0.015
ute+0.015
 agree+0.015
Neg logit contributions
ous-0.022
 tigers-0.019
 Sky-0.019
ho-0.019
 Nice-0.019
Token but
Token ID475
Feature activation-2.99e-01

Pos logit contributions
ous+0.042
ho+0.036
 Nice+0.036
 tigers+0.035
 Sky+0.034
Neg logit contributions
ap-0.042
 promise-0.041
 agree-0.040
leaning-0.039
 advantage-0.038
Token.
Token ID13
Feature activation+1.07e-01

Pos logit contributions
 promise+0.013
ap+0.013
 agree+0.012
 promises+0.010
 advantage+0.010
Neg logit contributions
ous-0.039
ho-0.036
 Nice-0.036
 tigers-0.035
 Sky-0.035
Token The
Token ID383
Feature activation+4.52e-01

Pos logit contributions
 promise+0.007
ap+0.007
 agree+0.007
leaning+0.006
 promises+0.006
Neg logit contributions
ous-0.009
ho-0.008
 Nice-0.008
 Sky-0.007
pill-0.007
Token plane
Token ID6614
Feature activation+9.20e-03

Pos logit contributions
ap+0.029
 agree+0.028
 So+0.027
leaning+0.027
 promise+0.026
Neg logit contributions
ous-0.035
ho-0.033
 tigers-0.032
 gor-0.031
 spilling-0.031
Token went
Token ID1816
Feature activation+3.55e-01

Pos logit contributions
ap+0.001
 promise+0.001
 agree+0.001
leaning+0.001
 advantage+0.001
Neg logit contributions
ous-0.001
ho-0.001
 Nice-0.001
 tigers-0.001
 Sky-0.001
Token up
Token ID510
Feature activation-1.75e-01

Pos logit contributions
 promise+0.017
ap+0.017
 agree+0.016
 advantage+0.015
 So+0.015
Neg logit contributions
ous-0.035
ho-0.031
 Nice-0.031
pill-0.031
 tigers-0.030
Token in
Token ID287
Feature activation+3.34e-01

Pos logit contributions
ous+0.021
 Nice+0.019
ho+0.019
pill+0.018
 tigers+0.018
Neg logit contributions
ap-0.005
 promise-0.005
 agree-0.005
 advantage-0.004
 So-0.004
Token the
Token ID262
Feature activation+2.37e-01

Pos logit contributions
 promise+0.021
ap+0.020
 agree+0.020
leaning+0.019
 promises+0.018
Neg logit contributions
ous-0.028
ho-0.026
 Nice-0.025
 tigers-0.024
 Sky-0.024
Token air
Token ID1633
Feature activation-5.58e-01

Pos logit contributions
 agree+0.021
 promise+0.020
ap+0.020
leaning+0.019
our+0.018
Neg logit contributions
ous-0.014
ho-0.012
 tigers-0.011
 grizz-0.011
 stairs-0.011
Token and
Token ID290
Feature activation+1.62e-01

Pos logit contributions
ous+0.066
 Nice+0.061
ho+0.061
pill+0.060
 tigers+0.059
Neg logit contributions
ap-0.015
 promise-0.015
 agree-0.014
 So-0.011
 promises-0.010
Token soon
Token ID2582
Feature activation+3.65e-01

Pos logit contributions
 promise+0.009
 agree+0.009
ap+0.009
 promises+0.009
 band+0.008
Neg logit contributions
ous-0.014
pill-0.013
 Nice-0.013
ho-0.012
 gor-0.011
Token they
Token ID484
Feature activation+2.51e-01

Pos logit contributions
 promise+0.020
 agree+0.020
ap+0.020
 promises+0.019
 So+0.018
Neg logit contributions
ous-0.032
 Nice-0.030
ho-0.029
igator-0.029
 grizz-0.028
Token delicious
Token ID12625
Feature activation-1.36e-01

Pos logit contributions
 promise+0.041
ap+0.039
 agree+0.039
 So+0.036
 advantage+0.036
Neg logit contributions
ous-0.039
ho-0.035
 Nice-0.033
 Sky-0.032
 gor-0.032
Token red
Token ID2266
Feature activation-5.96e-01

Pos logit contributions
ous+0.010
ho+0.009
 Nice+0.009
 Sky+0.008
pill+0.008
Neg logit contributions
 promise-0.010
ap-0.010
 agree-0.010
 promises-0.009
 So-0.009
Token sauce
Token ID10746
Feature activation+4.42e-01

Pos logit contributions
ous+0.044
ho+0.039
 Nice+0.038
 tigers+0.036
 Sky+0.036
Neg logit contributions
 promise-0.043
ap-0.043
 agree-0.042
leaning-0.039
 promises-0.039
Token.
Token ID13
Feature activation+2.12e-01

Pos logit contributions
ap+0.014
 promise+0.014
 agree+0.013
 So+0.011
our+0.010
Neg logit contributions
ous-0.049
 Nice-0.046
pill-0.045
 grizz-0.045
ho-0.045
Token He
Token ID679
Feature activation+1.84e-01

Pos logit contributions
 promise+0.013
ap+0.013
 agree+0.013
 So+0.012
leaning+0.012
Neg logit contributions
ous-0.018
ho-0.017
 Nice-0.016
 tigers-0.015
 guarding-0.015
Token tasted
Token ID29187
Feature activation-3.79e-01

Pos logit contributions
 promise+0.013
 agree+0.012
ap+0.012
 So+0.011
 promises+0.011
Neg logit contributions
ous-0.015
 Nice-0.013
ho-0.013
 Sky-0.012
 tigers-0.012
Token it
Token ID340
Feature activation+3.51e-01

Pos logit contributions
ous+0.036
ho+0.033
 Nice+0.032
 Sky+0.031
 tigers+0.031
Neg logit contributions
 promise-0.019
ap-0.019
 agree-0.019
 promises-0.017
 So-0.017
Token and
Token ID290
Feature activation+4.70e-01

Pos logit contributions
 promise+0.012
ap+0.012
 agree+0.011
leaning+0.010
 promises+0.010
Neg logit contributions
ous-0.039
ho-0.036
 Nice-0.035
 tigers-0.034
 Sky-0.034
Token it
Token ID340
Feature activation+2.33e-01

Pos logit contributions
 agree+0.026
 promise+0.026
 promises+0.024
 So+0.023
ap+0.023
Neg logit contributions
ous-0.044
ho-0.040
 Nice-0.040
pill-0.038
 Sky-0.038
Token was
Token ID373
Feature activation-3.81e-02

Pos logit contributions
 promise+0.015
 agree+0.015
ap+0.014
 promises+0.014
 So+0.014
Neg logit contributions
ous-0.019
 Nice-0.017
ho-0.017
 gor-0.016
 guarding-0.016
Token y
Token ID331
Feature activation-2.54e-01

Pos logit contributions
ous+0.003
ho+0.003
pill+0.003
 Nice+0.003
 Sky+0.003
Neg logit contributions
 promise-0.002
 agree-0.002
ap-0.002
 own-0.002
 So-0.002
Token that
Token ID326
Feature activation-3.12e-01

Pos logit contributions
ous+0.037
ho+0.034
 Nice+0.033
 tigers+0.032
 Sky+0.032
Neg logit contributions
 promise-0.018
ap-0.018
 agree-0.018
leaning-0.016
 promises-0.016
Token are
Token ID389
Feature activation-4.68e-02

Pos logit contributions
ous+0.026
ho+0.023
 Nice+0.023
 spilling+0.022
 tigers+0.022
Neg logit contributions
 promise-0.020
ap-0.020
 agree-0.019
 promises-0.018
leaning-0.018
Token yellow
Token ID7872
Feature activation-7.82e-01

Pos logit contributions
ous+0.004
ho+0.004
 Nice+0.004
pill+0.003
 grizz+0.003
Neg logit contributions
 promise-0.003
ap-0.003
 agree-0.003
 advantage-0.003
 promises-0.002
Token and
Token ID290
Feature activation-4.48e-02

Pos logit contributions
ous+0.091
ho+0.084
 Nice+0.083
 grizz+0.082
 spilling+0.082
Neg logit contributions
ap-0.023
 promise-0.021
 agree-0.019
 So-0.017
our-0.016
Token blue
Token ID4171
Feature activation-8.90e-01

Pos logit contributions
ous+0.004
 Nice+0.003
ho+0.003
pill+0.003
storm+0.003
Neg logit contributions
 promise-0.003
 agree-0.003
ap-0.003
 promises-0.003
 own-0.003
Token.
Token ID13
Feature activation-1.71e-01

Pos logit contributions
ous+0.100
ho+0.093
 Nice+0.091
 grizz+0.090
 spilling+0.090
Neg logit contributions
ap-0.029
 promise-0.027
 agree-0.026
 So-0.023
our-0.022
Token Add
Token ID3060
Feature activation-2.06e-01

Pos logit contributions
ous+0.011
 Nice+0.010
 Sky+0.010
ho+0.009
 tigers+0.009
Neg logit contributions
 promise-0.014
 agree-0.013
ap-0.013
leaning-0.013
 promises-0.013
Token them
Token ID606
Feature activation-8.41e-02

Pos logit contributions
ous+0.017
ho+0.015
 Nice+0.014
 tigers+0.014
 guarding+0.014
Neg logit contributions
 promise-0.013
ap-0.013
 agree-0.013
leaning-0.012
 advantage-0.012
Token together
Token ID1978
Feature activation+2.37e-01

Pos logit contributions
ous+0.007
ho+0.007
 Nice+0.006
 tigers+0.006
 Sky+0.006
Neg logit contributions
 promise-0.005
ap-0.005
 agree-0.005
leaning-0.004
 advantage-0.004
Token and
Token ID290
Feature activation+2.09e-01

Pos logit contributions
 promise+0.014
ap+0.014
 agree+0.013
leaning+0.013
 advantage+0.013
Neg logit contributions
ous-0.020
ho-0.018
 Nice-0.017
 tigers-0.017
 Sky-0.017
Token that
Token ID326
Feature activation-3.32e-01

Pos logit contributions
 promise+0.013
 agree+0.013
ap+0.013
 promises+0.012
 So+0.012
Neg logit contributions
ous-0.018
ho-0.016
 Nice-0.016
 spilling-0.015
 grizz-0.015
Token big
Token ID1263
Feature activation-6.49e-02

Pos logit contributions
 promise+0.023
ap+0.022
 agree+0.022
 So+0.020
 promises+0.020
Neg logit contributions
ous-0.022
ho-0.019
 Nice-0.018
pill-0.017
 gor-0.017
Token bow
Token ID9563
Feature activation+1.45e-01

Pos logit contributions
ous+0.006
ho+0.005
 Nice+0.005
 tigers+0.005
 Sky+0.005
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
leaning-0.003
 advantage-0.003
Token in
Token ID287
Feature activation+9.94e-01

Pos logit contributions
 promise+0.006
ap+0.006
 agree+0.005
 So+0.005
 advantage+0.005
Neg logit contributions
ous-0.015
ho-0.014
 Nice-0.014
 tigers-0.014
pill-0.014
Token her
Token ID607
Feature activation+6.38e-01

Pos logit contributions
 promise+0.074
ap+0.073
 agree+0.072
leaning+0.067
 promises+0.067
Neg logit contributions
ous-0.073
ho-0.064
 Nice-0.062
 tigers-0.059
 Sky-0.058
Token hair
Token ID4190
Feature activation+4.28e-01

Pos logit contributions
 agree+0.037
leaning+0.032
 promise+0.031
wing+0.031
 advantage+0.031
Neg logit contributions
ous-0.051
sed-0.050
 Sky-0.049
ho-0.048
 Nice-0.048
Token.
Token ID13
Feature activation-1.09e-01

Pos logit contributions
 promise+0.021
leaning+0.020
ap+0.020
 agree+0.020
 advantage+0.019
Neg logit contributions
ous-0.042
ho-0.038
 Nice-0.036
chy-0.035
 spilling-0.035
Token Sara
Token ID24799
Feature activation+2.93e-01

Pos logit contributions
ous+0.009
ho+0.008
 Nice+0.008
 grizz+0.008
 spilling+0.008
Neg logit contributions
ap-0.007
 agree-0.007
 promise-0.007
 So-0.007
 promises-0.006
Token liked
Token ID8288
Feature activation-3.84e-02

Pos logit contributions
 promise+0.022
ap+0.022
 agree+0.021
 advantage+0.021
leaning+0.021
Neg logit contributions
ous-0.020
ho-0.018
 Nice-0.017
 Sky-0.016
 tigers-0.016
Token to
Token ID284
Feature activation-5.64e-01

Pos logit contributions
ous+0.004
 grizz+0.004
ho+0.004
storm+0.004
igator+0.004
Neg logit contributions
 own-0.001
 hair-0.001
 practice-0.001
 promises-0.001
 care-0.001
Token brush
Token ID14093
Feature activation+5.36e-01

Pos logit contributions
ous+0.043
chy+0.041
 gn+0.040
igator+0.040
 Nice+0.040
Neg logit contributions
leaning-0.033
ap-0.032
 promises-0.030
wing-0.029
 advantage-0.029
Token Anna
Token ID11735
Feature activation-1.57e-01

Pos logit contributions
ap+0.035
 promise+0.035
 agree+0.034
 promises+0.032
 So+0.032
Neg logit contributions
ous-0.043
ho-0.039
 Nice-0.037
 grizz-0.036
igator-0.036
Token says
Token ID1139
Feature activation-6.00e-02

Pos logit contributions
 promise+0.006
 agree+0.005
ap+0.005
 promises+0.005
 So+0.005
Neg logit contributions
ous-0.006
ho-0.005
 Nice-0.005
 spilling-0.005
 tigers-0.005
Token to
Token ID284
Feature activation-3.00e-01

Pos logit contributions
ous+0.005
ho+0.004
 Nice+0.004
 Sky+0.004
 guarding+0.004
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
leaning-0.004
 promises-0.004
Token Mia
Token ID32189
Feature activation+9.59e-02

Pos logit contributions
ous+0.021
ho+0.018
pill+0.018
chy+0.017
 Nice+0.017
Neg logit contributions
 agree-0.024
 promise-0.024
ap-0.023
 So-0.022
 promises-0.022
Token:
Token ID25
Feature activation-2.70e-01

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
 promises+0.002
 So+0.002
Neg logit contributions
ous-0.011
ho-0.010
 Nice-0.010
 tigers-0.010
pill-0.010
Token "
Token ID366
Feature activation-5.35e-02

Pos logit contributions
ous+0.022
igator+0.019
pill+0.019
ho+0.018
 tigers+0.018
Neg logit contributions
ap-0.018
 agree-0.017
 promise-0.017
 So-0.016
our-0.015
TokenLook
Token ID8567
Feature activation-2.98e-01

Pos logit contributions
ous+0.003
ho+0.003
 Nice+0.003
 Sky+0.003
 tigers+0.003
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
 advantage-0.004
leaning-0.004
Token,
Token ID11
Feature activation-1.03e-01

Pos logit contributions
ous+0.029
ho+0.027
 Nice+0.026
 Sky+0.025
 tigers+0.025
Neg logit contributions
 promise-0.014
 agree-0.014
ap-0.013
leaning-0.013
 advantage-0.012
Token Mia
Token ID32189
Feature activation+3.11e-01

Pos logit contributions
ous+0.008
ho+0.007
pill+0.007
storm+0.006
chy+0.006
Neg logit contributions
 promise-0.009
 agree-0.007
 promises-0.007
ap-0.007
 So-0.007
Token.
Token ID13
Feature activation-1.38e-01

Pos logit contributions
 promise+0.009
ap+0.009
 agree+0.008
 So+0.007
 promises+0.007
Neg logit contributions
ous-0.037
ho-0.034
 Nice-0.034
pill-0.033
 spilling-0.033
Token That
Token ID1320
Feature activation-4.72e-01

Pos logit contributions
ous+0.009
ho+0.008
 spilling+0.008
 Nice+0.008
 tigers+0.007
Neg logit contributions
 promise-0.011
ap-0.011
 agree-0.010
 So-0.010
 promises-0.010
Token dog
Token ID3290
Feature activation+1.91e-01

Pos logit contributions
ous+0.041
ho+0.036
 Nice+0.034
 spilling+0.034
 guarding+0.033
Neg logit contributions
 promise-0.030
ap-0.028
 agree-0.027
 So-0.027
 promises-0.026
Token careful
Token ID8161
Feature activation+1.92e-01

Pos logit contributions
 promise+0.017
 agree+0.016
ap+0.016
 promises+0.013
 So+0.013
Neg logit contributions
ous-0.031
 Nice-0.027
pill-0.027
 tigers-0.027
storm-0.026
Token as
Token ID355
Feature activation-2.58e-01

Pos logit contributions
ap+0.010
 promise+0.009
 agree+0.009
 So+0.008
 advantage+0.008
Neg logit contributions
ous-0.018
ho-0.017
 Nice-0.017
pill-0.017
 tigers-0.016
Token he
Token ID339
Feature activation-3.46e-01

Pos logit contributions
ous+0.020
 grizz+0.020
pill+0.019
 Nice+0.019
storm+0.019
Neg logit contributions
 agree-0.017
ap-0.016
 promise-0.016
 promises-0.015
 own-0.015
Token picked
Token ID6497
Feature activation-4.16e-02

Pos logit contributions
ous+0.024
ho+0.022
 Nice+0.020
pill+0.020
 guarding+0.020
Neg logit contributions
ap-0.026
 promise-0.026
 agree-0.025
 advantage-0.024
leaning-0.024
Token the
Token ID262
Feature activation-1.79e-01

Pos logit contributions
ous+0.003
ho+0.003
 Nice+0.003
 tigers+0.002
 spilling+0.002
Neg logit contributions
 promise-0.003
ap-0.003
 agree-0.003
leaning-0.003
 So-0.003
Token squash
Token ID34613
Feature activation-2.64e-01

Pos logit contributions
ous+0.016
ho+0.014
 Nice+0.014
 grizz+0.014
 tigers+0.014
Neg logit contributions
 promise-0.010
ap-0.010
 agree-0.010
 promises-0.009
 advantage-0.009
Token from
Token ID422
Feature activation-2.07e-02

Pos logit contributions
ous+0.026
ho+0.024
 Nice+0.024
 tigers+0.023
 Sky+0.023
Neg logit contributions
 promise-0.012
ap-0.012
 agree-0.012
leaning-0.011
 advantage-0.010
Token the
Token ID262
Feature activation+1.72e-01

Pos logit contributions
ous+0.001
ho+0.001
 tigers+0.001
 guarding+0.001
 spilling+0.001
Neg logit contributions
 promise-0.002
ap-0.002
 agree-0.001
leaning-0.001
 So-0.001
Token vine
Token ID17793
Feature activation-5.15e-01

Pos logit contributions
 promise+0.010
ap+0.010
 agree+0.010
 So+0.009
 advantage+0.009
Neg logit contributions
ous-0.016
 Nice-0.014
pill-0.013
ho-0.013
 grizz-0.013
Token.
Token ID13
Feature activation-1.63e-02

Pos logit contributions
ous+0.054
ho+0.050
 Nice+0.049
 tigers+0.048
 Sky+0.047
Neg logit contributions
 promise-0.021
ap-0.020
 agree-0.020
leaning-0.017
 promises-0.017
Token He
Token ID679
Feature activation+1.57e-01

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 tigers+0.001
 Sky+0.001
Neg logit contributions
 promise-0.001
ap-0.001
 agree-0.001
leaning-0.001
 promises-0.001
Tokenal
Token ID282
Feature activation-6.60e-01

Pos logit contributions
ap+0.022
 promise+0.022
 agree+0.022
 So+0.021
leaning+0.020
Neg logit contributions
ous-0.022
 Nice-0.019
ho-0.019
 spilling-0.018
 gor-0.018
Token felt
Token ID2936
Feature activation-3.42e-01

Pos logit contributions
ous+0.052
 Nice+0.047
ho+0.044
 gor+0.044
 spilling+0.044
Neg logit contributions
 promise-0.044
ap-0.044
 agree-0.043
 So-0.040
 promises-0.039
Token wet
Token ID9583
Feature activation-1.20e-01

Pos logit contributions
ous+0.027
ho+0.024
 Nice+0.024
 tigers+0.023
 Sky+0.023
Neg logit contributions
 promise-0.023
ap-0.022
 agree-0.022
leaning-0.021
 promises-0.020
Token in
Token ID287
Feature activation+2.60e-01

Pos logit contributions
ous+0.015
ho+0.014
 Nice+0.014
pill+0.014
 Sky+0.014
Neg logit contributions
ap-0.002
 promise-0.002
 agree-0.002
 So-0.001
 promises-0.001
Token the
Token ID262
Feature activation+3.27e-01

Pos logit contributions
 promise+0.014
 agree+0.014
ap+0.014
 promises+0.013
 advantage+0.013
Neg logit contributions
ous-0.025
ho-0.022
pill-0.021
 Nice-0.021
 gor-0.020
Token sun
Token ID4252
Feature activation-1.53e+00

Pos logit contributions
 promise+0.020
ap+0.019
 agree+0.019
 promises+0.017
 advantage+0.017
Neg logit contributions
ous-0.029
ho-0.025
 Nice-0.025
 tigers-0.024
 guarding-0.024
Token and
Token ID290
Feature activation+1.58e-01

Pos logit contributions
ous+0.168
 Nice+0.155
ho+0.155
pill+0.151
 tigers+0.151
Neg logit contributions
ap-0.053
 promise-0.053
 agree-0.049
 So-0.041
 promises-0.041
Token suddenly
Token ID6451
Feature activation-1.47e-01

Pos logit contributions
 promises+0.007
 promise+0.007
 agree+0.006
 care+0.006
 keep+0.006
Neg logit contributions
ous-0.016
pill-0.015
 Nice-0.015
 gor-0.014
lic-0.014
Token the
Token ID262
Feature activation+1.15e-01

Pos logit contributions
ous+0.014
 Nice+0.013
ho+0.013
igator+0.013
pill+0.013
Neg logit contributions
ap-0.007
 promise-0.007
 agree-0.007
 promises-0.006
 So-0.006
Token rabbit
Token ID22746
Feature activation-2.13e-01

Pos logit contributions
 promise+0.008
ap+0.008
 agree+0.008
 promises+0.007
 advantage+0.007
Neg logit contributions
ous-0.009
 Nice-0.008
ho-0.008
 gor-0.008
pill-0.007
Token heard
Token ID2982
Feature activation+5.16e-02

Pos logit contributions
ous+0.017
 Nice+0.015
ho+0.014
 gor+0.014
pill+0.014
Neg logit contributions
 promise-0.014
 agree-0.014
ap-0.013
 So-0.013
 promises-0.012
Token very
Token ID845
Feature activation+3.19e-01

Pos logit contributions
ous+0.034
ho+0.031
pill+0.031
 Nice+0.029
 Sky+0.028
Neg logit contributions
 promise-0.022
ap-0.021
 agree-0.019
 So-0.019
 own-0.019
Token warm
Token ID5814
Feature activation+8.53e-02

Pos logit contributions
 promise+0.024
ap+0.023
 agree+0.022
 So+0.021
 advantage+0.020
Neg logit contributions
ous-0.024
ho-0.021
 Nice-0.021
pill-0.020
 spilling-0.020
Token outside
Token ID2354
Feature activation+2.92e-01

Pos logit contributions
ap+0.002
 promise+0.002
 agree+0.002
 So+0.002
leaning+0.002
Neg logit contributions
ous-0.010
ho-0.009
 Nice-0.009
pill-0.009
 Sky-0.009
Token and
Token ID290
Feature activation+1.71e-01

Pos logit contributions
ap+0.013
 promise+0.012
 agree+0.011
 So+0.011
 promises+0.010
Neg logit contributions
ous-0.030
ho-0.028
 Nice-0.027
pill-0.027
 grizz-0.026
Token the
Token ID262
Feature activation-6.12e-02

Pos logit contributions
 promise+0.007
ap+0.006
 own+0.006
 agree+0.006
 keep+0.006
Neg logit contributions
pill-0.018
ous-0.018
ho-0.016
 Nice-0.016
 grizz-0.016
Token sun
Token ID4252
Feature activation-1.58e+00

Pos logit contributions
ous+0.005
ho+0.005
 Nice+0.005
pill+0.004
 Sky+0.004
Neg logit contributions
 promise-0.004
ap-0.004
 agree-0.004
leaning-0.003
 promises-0.003
Token was
Token ID373
Feature activation-2.24e-01

Pos logit contributions
ous+0.121
 Nice+0.110
pill+0.107
ho+0.107
 Sky+0.104
Neg logit contributions
 promise-0.104
ap-0.104
 agree-0.098
our-0.095
 So-0.094
Token shining
Token ID22751
Feature activation-7.30e-01

Pos logit contributions
ous+0.012
pill+0.011
ho+0.011
 Nice+0.010
 Sky+0.010
Neg logit contributions
ap-0.020
 promise-0.020
 agree-0.019
 promises-0.018
 advantage-0.018
Token bright
Token ID6016
Feature activation-2.50e-01

Pos logit contributions
ous+0.072
ho+0.066
 Nice+0.065
 tigers+0.062
 Sky+0.062
Neg logit contributions
 promise-0.035
ap-0.034
 agree-0.034
leaning-0.030
 promises-0.030
Token.
Token ID13
Feature activation+1.02e-01

Pos logit contributions
ous+0.027
ho+0.025
 Nice+0.024
 tigers+0.023
 Sky+0.023
Neg logit contributions
 promise-0.010
ap-0.010
 agree-0.009
leaning-0.008
 promises-0.008
Token The
Token ID383
Feature activation+5.23e-01

Pos logit contributions
 promise+0.007
ap+0.006
 agree+0.006
 promises+0.006
 advantage+0.006
Neg logit contributions
ous-0.008
 Nice-0.008
 Sky-0.007
 tigers-0.007
ho-0.007
Token mom
Token ID1995
Feature activation+3.04e-01

Pos logit contributions
 promise+0.024
ap+0.022
 agree+0.022
 promises+0.021
leaning+0.021
Neg logit contributions
ous-0.028
ho-0.025
 Nice-0.024
pill-0.024
 Sky-0.022
Tokenmy
Token ID1820
Feature activation+1.66e-01

Pos logit contributions
 promise+0.022
ap+0.022
 agree+0.021
leaning+0.020
 promises+0.020
Neg logit contributions
ous-0.023
ho-0.020
 Nice-0.020
 Sky-0.019
 spilling-0.019
Token then
Token ID788
Feature activation-3.70e-01

Pos logit contributions
ap+0.010
 promise+0.009
 agree+0.009
 advantage+0.009
leaning+0.009
Neg logit contributions
ous-0.015
ho-0.013
 tigers-0.013
 guarding-0.013
 grizz-0.012
Token took
Token ID1718
Feature activation-1.79e-01

Pos logit contributions
ous+0.028
ho+0.025
 Nice+0.023
 guarding+0.023
 tigers+0.023
Neg logit contributions
ap-0.025
 promise-0.025
 agree-0.025
 advantage-0.024
leaning-0.023
Token the
Token ID262
Feature activation-6.65e-01

Pos logit contributions
ous+0.013
ho+0.012
 Nice+0.011
 Sky+0.011
 tigers+0.011
Neg logit contributions
 promise-0.013
ap-0.013
 agree-0.013
leaning-0.012
 promises-0.012
Token girl
Token ID2576
Feature activation-1.41e+00

Pos logit contributions
ous+0.054
ho+0.049
 Nice+0.047
pill+0.045
 Sky+0.044
Neg logit contributions
 promise-0.045
ap-0.043
 agree-0.042
 promises-0.040
 advantage-0.039
Token out
Token ID503
Feature activation+1.15e-01

Pos logit contributions
ous+0.106
ho+0.097
 Nice+0.091
 Sky+0.091
 guarding+0.090
Neg logit contributions
 promise-0.094
 agree-0.093
ap-0.093
leaning-0.089
our-0.085
Token of
Token ID286
Feature activation-7.40e-01

Pos logit contributions
 promise+0.007
ap+0.007
 agree+0.007
leaning+0.006
 promises+0.006
Neg logit contributions
ous-0.010
ho-0.009
 Nice-0.009
 tigers-0.008
 Sky-0.008
Token the
Token ID262
Feature activation+7.61e-02

Pos logit contributions
ous+0.076
ho+0.069
 Nice+0.068
pill+0.066
balls+0.063
Neg logit contributions
 promise-0.036
 agree-0.033
 promises-0.033
ap-0.033
 advantage-0.031
Token house
Token ID2156
Feature activation+4.04e-01

Pos logit contributions
 agree+0.005
 promise+0.005
ap+0.005
leaning+0.005
 advantage+0.005
Neg logit contributions
ous-0.006
ho-0.005
 gor-0.005
 tigers-0.005
 grizz-0.005
Token to
Token ID284
Feature activation-5.32e-03

Pos logit contributions
 promise+0.016
ap+0.015
 agree+0.015
leaning+0.014
 advantage+0.013
Neg logit contributions
ous-0.043
ho-0.040
 Nice-0.039
 spilling-0.038
 guarding-0.038
Token was
Token ID373
Feature activation-5.69e-01

Pos logit contributions
ous+0.054
ho+0.049
 Nice+0.046
 tigers+0.045
 Sky+0.045
Neg logit contributions
 promise-0.053
ap-0.052
 agree-0.051
leaning-0.049
 advantage-0.049
Token very
Token ID845
Feature activation+2.58e-01

Pos logit contributions
ous+0.057
ho+0.053
 Nice+0.051
pill+0.050
storm+0.046
Neg logit contributions
ap-0.030
 promise-0.028
 promises-0.026
 agree-0.026
 advantage-0.025
Token curious
Token ID11040
Feature activation-5.32e-02

Pos logit contributions
 promise+0.015
ap+0.015
 agree+0.015
 promises+0.013
 own+0.013
Neg logit contributions
ous-0.024
ho-0.021
 Nice-0.021
pill-0.021
 tigers-0.021
Token about
Token ID546
Feature activation-9.65e-01

Pos logit contributions
ous+0.005
ho+0.005
 Nice+0.005
 Sky+0.005
 tigers+0.005
Neg logit contributions
 promise-0.002
ap-0.002
 agree-0.002
 So-0.002
leaning-0.002
Token the
Token ID262
Feature activation-1.70e-01

Pos logit contributions
ous+0.087
ho+0.083
 Nice+0.077
pill+0.075
sed+0.074
Neg logit contributions
 promise-0.052
 agree-0.052
ap-0.052
 promises-0.049
 advantage-0.046
Token moon
Token ID8824
Feature activation-1.30e+00

Pos logit contributions
ous+0.011
ho+0.010
pill+0.010
 Nice+0.010
 spilling+0.009
Neg logit contributions
 promise-0.013
ap-0.013
 agree-0.013
 promises-0.012
 advantage-0.012
Token and
Token ID290
Feature activation+1.10e-01

Pos logit contributions
ous+0.137
ho+0.128
 Nice+0.126
 grizz+0.122
pill+0.121
Neg logit contributions
ap-0.051
 promise-0.050
 agree-0.048
 promises-0.041
 So-0.040
Token he
Token ID339
Feature activation-4.20e-01

Pos logit contributions
 agree+0.006
 learn+0.006
 promise+0.006
 own+0.006
 care+0.006
Neg logit contributions
ous-0.010
pill-0.009
 Nice-0.009
ho-0.009
storm-0.009
Token asked
Token ID1965
Feature activation+4.20e-01

Pos logit contributions
ous+0.030
ho+0.027
 Nice+0.026
 tigers+0.025
 Sky+0.024
Neg logit contributions
 promise-0.031
ap-0.031
 agree-0.030
leaning-0.028
 promises-0.028
Token his
Token ID465
Feature activation+1.75e-01

Pos logit contributions
 promise+0.027
ap+0.027
 agree+0.027
 So+0.024
 promises+0.024
Neg logit contributions
ous-0.035
ho-0.032
 Nice-0.030
pill-0.030
 spilling-0.029
Token mom
Token ID1995
Feature activation+3.13e-01

Pos logit contributions
 promise+0.007
ap+0.006
 agree+0.006
 promises+0.005
 So+0.005
Neg logit contributions
ous-0.020
ho-0.018
 Nice-0.018
pill-0.017
 spilling-0.017
TokenCan
Token ID6090
Feature activation-1.60e-02

Pos logit contributions
 promise+0.049
ap+0.048
 agree+0.048
leaning+0.045
 So+0.045
Neg logit contributions
ous-0.046
ho-0.039
 Nice-0.038
 spilling-0.036
 guarding-0.036
Token we
Token ID356
Feature activation-1.93e-02

Pos logit contributions
ous+0.001
ho+0.001
 Nice+0.001
 Sky+0.001
 tigers+0.001
Neg logit contributions
 promise-0.001
 agree-0.001
ap-0.001
leaning-0.001
 advantage-0.001
Token swing
Token ID9628
Feature activation+3.09e-01

Pos logit contributions
ous+0.001
ho+0.001
igator+0.001
 Sky+0.001
 tigers+0.001
Neg logit contributions
leaning-0.001
 promises-0.001
 advantage-0.001
ap-0.001
ippers-0.001
Token to
Token ID284
Feature activation-5.70e-01

Pos logit contributions
 promise+0.019
 agree+0.019
ap+0.019
leaning+0.018
 promises+0.017
Neg logit contributions
ous-0.025
ho-0.023
 Nice-0.022
 tigers-0.022
 Sky-0.022
Token the
Token ID262
Feature activation-4.42e-01

Pos logit contributions
ous+0.041
ho+0.037
 tigers+0.036
 Sky+0.036
igator+0.034
Neg logit contributions
 promise-0.041
ap-0.040
 agree-0.039
leaning-0.039
 advantage-0.037
Token moon
Token ID8824
Feature activation-1.27e+00

Pos logit contributions
ous+0.037
pill+0.036
 Nice+0.034
lic+0.033
 spilling+0.033
Neg logit contributions
 promise-0.026
ap-0.025
 own-0.024
 agree-0.022
 learn-0.022
Token,
Token ID11
Feature activation-2.34e-01

Pos logit contributions
ous+0.128
ho+0.117
 Nice+0.114
 tigers+0.110
 Sky+0.110
Neg logit contributions
 promise-0.058
ap-0.056
 agree-0.055
leaning-0.051
 advantage-0.050
Token mom
Token ID1995
Feature activation+5.79e-01

Pos logit contributions
ous+0.018
ho+0.017
 Nice+0.017
 tigers+0.017
 Sky+0.017
Neg logit contributions
ap-0.014
leaning-0.014
 advantage-0.013
 agree-0.013
 promise-0.013
Token?"
Token ID1701
Feature activation+3.77e-02

Pos logit contributions
 promise+0.027
ap+0.026
 agree+0.026
 advantage+0.024
leaning+0.024
Neg logit contributions
ous-0.057
ho-0.052
 Nice-0.051
 tigers-0.050
 guarding-0.049
Token Lily
Token ID20037
Feature activation-2.65e-01

Pos logit contributions
 promise+0.003
ap+0.003
 agree+0.003
leaning+0.003
 promises+0.003
Neg logit contributions
ous-0.002
ho-0.002
 Nice-0.002
 tigers-0.002
 Sky-0.002
Token asks
Token ID7893
Feature activation+6.77e-02

Pos logit contributions
ous+0.018
 Nice+0.016
ho+0.016
 tigers+0.016
 Sky+0.016
Neg logit contributions
 promise-0.019
ap-0.019
 advantage-0.018
 agree-0.017
leaning-0.017