TOP ACTIVATIONS
MAX = 0.058

be happy . Even though situations can be serious , it
new friend . His name is Sam . Sam comes to
day . When it got dark , they got back in
how frightening the world might seem , everything would work out
knew that sometimes things can seem scary at first , but
learned that sometimes when things seem scary or difficult , it
. But when it got dark , she wanted to go
When it started to get dark , Lily said , "
learned that sometimes things can seem scary , but if we
is that sometimes when things seem too big or expensive ,
before . When it got dark , the little girl had
that even though things may seem dull and boring sometimes ,
like him . She asked uncle Ben , " Can I
is that sometimes things may seem scary or fearful , but
When it started to get dark , Jill went back inside
As it started to get dark , they waved goodbye to
. Even though it may seem easy and fun to do
the park . His name is Leo . Leo sees their
, enjoying the sights and sounds . Eventually , the reached
. When it started getting dark out , Sally realised that

MIN ACTIVATIONS
MIN = -0.084

soon nothing but a small spe ck in the sky .
He was so angry he would squ awk and scree ch
came and stole his paint brush . The painter was sad
. She was so sad she started to cry .
One day , L ott ie 's mom went to
old tire . " O oh , what can I do
did you take all the carrots ?" Benny asked .
P ipp a gigg led and said
the nic est coats he had ever seen !
, surprised voice , " oh my !"
gave Lily a band - aid for her knee and made
. " Wo oh oo !" she said ,
it , a loud noise filled the air .
the most beautiful thing she had ever seen . She slowly
and put a band - aid on his cheek . He
Lucy was feeling tired so she went to the classroom for
for the adventure , he packed his bag . He put
very special toy for L ott ie . It was a
! She was so sad and didn 't know what to
some books off the books he lf . Mom my was

INTERVAL 0.022 - 0.058
CONTAINS 1.576%

. Pe ering around the corner , he saw a little
He asked his mom if he could borrow her m op
They talked , laughed and played games . It was a
big dog named Spot . Spot is brown and white and
lot of different animals and lush green trees .

INTERVAL -0.013 - 0.022
CONTAINS 87.273%

started to laugh and point at her . Amy felt embarrassed
eat broccoli . She said , " I like broccoli because
, even when it feels a bit dizz y . <|endoftext|>
how to tw irl the spaghetti around it without dropping it
flower in his pocket and started walking . He had the

INTERVAL -0.049 - -0.013
CONTAINS 11.148%

they were very brave . They worked together
his pocket . He also filled his bottle with some water
the stickers , cr ay ons , and glitter . They
" Okay , Lily ," Ben says . "
down to Annie and she was so amazed ! The angel

INTERVAL -0.084 - -0.049
CONTAINS 0.003%

for air , he cl amb ered out of the p
the most beautiful apples he had ever seen . His mom
was time for the happy couple to head off . Everyone
so scared that the books would get damaged , so she
the most beautiful purse she had ever seen . It was
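
NOTE ON THE INTERVALS
The four interval bounds listed above look like an equal-width split of the range from MIN (-0.084) to MAX (0.058). That is an inference from the displayed numbers, not something the dashboard states. The sketch below shows how such bounds and the CONTAINS percentages could be computed under that assumption. The function names (interval_edges, interval_shares) are hypothetical and are not taken from whatever tool produced this page.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def interval_shares(activations, lo, hi, n_bins=4):
    """Percentage of activations falling in each equal-width interval.

    Bins are returned from lowest to highest, which is the reverse of the
    order used in the listing above.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp the index so that a == hi lands in the last bin and any
        # value slightly below lo lands in the first one.
        idx = int((a - lo) / width)
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    total = len(activations)
    return [100.0 * c / total for c in counts]


# MIN and MAX as shown in the dashboard header lines.
edges = interval_edges(-0.084, 0.058)
print([round(e, 4) for e in edges])
```

With MIN and MAX taken from the headers, the computed edges agree with the displayed bounds (-0.049, -0.013, 0.022) to within rounding, which is what motivates the equal-width assumption.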
Token be
Token ID307
Feature activation+3.51e-03

Pos logit contributions
 agree+0.006
 know+0.006
 acts+0.006
 creativity+0.006
 supervision+0.006
Neg logit contributions
 upon-0.010
but-0.010
itates-0.010
just-0.010
chard-0.010
Token happy
Token ID3772
Feature activation+2.48e-03

Pos logit contributions
 know+0.004
 creativity+0.004
 agree+0.004
 help+0.004
 thank+0.003
Neg logit contributions
itates-0.006
but-0.006
ied-0.006
 upon-0.006
ained-0.006
Token.
Token ID13
Feature activation-9.14e-04

Pos logit contributions
 know+0.004
 agree+0.004
 creativity+0.004
 acts+0.004
 supervision+0.004
Neg logit contributions
but-0.003
 upon-0.003
itates-0.003
just-0.003
ied-0.003
Token Even
Token ID3412
Feature activation-5.96e-03

Pos logit contributions
itates+0.001
but+0.001
ied+0.001
 upon+0.001
ained+0.001
Neg logit contributions
 know-0.001
 creativity-0.001
 agree-0.001
 acts-0.001
 thank-0.001
Token though
Token ID996
Feature activation+1.08e-02

Pos logit contributions
itates+0.007
but+0.007
ied+0.007
ained+0.007
just+0.007
Neg logit contributions
 know-0.010
 creativity-0.010
 agree-0.010
 thank-0.010
 acts-0.010
Token situations
Token ID7445
Feature activation+5.79e-02

Pos logit contributions
 know+0.018
 agree+0.018
 supervision+0.017
 bisc+0.017
 creativity+0.017
Neg logit contributions
 upon-0.013
 into-0.012
but-0.012
 by-0.012
ied-0.012
Token can
Token ID460
Feature activation-2.14e-03

Pos logit contributions
orse+0.076
 bisc+0.076
 amaz+0.068
uf+0.064
 omega+0.063
Neg logit contributions
 by-0.099
 into-0.094
 upon-0.092
 over-0.088
 up-0.088
Token be
Token ID307
Feature activation+9.99e-03

Pos logit contributions
but+0.004
 upon+0.003
itates+0.003
just+0.003
itating+0.003
Neg logit contributions
 creativity-0.002
 supervision-0.002
 pleasure-0.002
 know-0.002
 agree-0.002
Token serious
Token ID2726
Feature activation+2.64e-02

Pos logit contributions
 know+0.014
 agree+0.014
 creativity+0.014
 amaz+0.013
 acts+0.013
Neg logit contributions
 upon-0.014
itates-0.014
ained-0.014
ied-0.013
chard-0.013
Token,
Token ID11
Feature activation-4.41e-03

Pos logit contributions
 bisc+0.030
 know+0.029
 agree+0.029
orse+0.028
 omega+0.027
Neg logit contributions
 into-0.047
 upon-0.047
 by-0.047
ied-0.045
but-0.044
Token it
Token ID340
Feature activation+1.03e-02

Pos logit contributions
but+0.006
 upon+0.006
itates+0.006
just+0.006
ied+0.006
Neg logit contributions
 know-0.007
 agree-0.006
 creativity-0.006
 acts-0.006
 supervision-0.006
Token new
Token ID649
Feature activation-4.15e-03

Pos logit contributions
 supervision+0.005
 acts+0.005
 generosity+0.004
 amaz+0.004
 know+0.004
Neg logit contributions
 upon-0.004
chard-0.004
itates-0.004
eh-0.003
bel-0.003
Token friend
Token ID1545
Feature activation+1.16e-02

Pos logit contributions
 upon+0.005
but+0.005
eh+0.005
ained+0.005
 into+0.005
Neg logit contributions
 supervision-0.007
 know-0.006
 creativity-0.006
 amaz-0.006
 Something-0.006
Token.
Token ID13
Feature activation+4.02e-03

Pos logit contributions
 amaz+0.018
 creativity+0.018
 pleasure+0.017
 know+0.017
 bravery+0.016
Neg logit contributions
 by-0.016
 into-0.016
 upon-0.016
 over-0.015
but-0.015
Token His
Token ID2399
Feature activation+2.32e-02

Pos logit contributions
 agree+0.006
 creativity+0.006
 supervision+0.006
 generosity+0.006
 believe+0.006
Neg logit contributions
itates-0.005
 upon-0.005
ained-0.005
but-0.005
 into-0.005
Token name
Token ID1438
Feature activation+3.68e-02

Pos logit contributions
ird+0.035
 supervision+0.035
 ah+0.033
itivity+0.033
 agree+0.033
Neg logit contributions
 upon-0.032
 by-0.029
 up-0.028
ained-0.026
 into-0.026
Token is
Token ID318
Feature activation+5.66e-02

Pos logit contributions
 supervision+0.058
 know+0.057
 agree+0.057
 creativity+0.057
 pleasure+0.056
Neg logit contributions
itates-0.047
 upon-0.046
ied-0.046
 into-0.046
 by-0.046
Token Sam
Token ID3409
Feature activation+7.71e-03

Pos logit contributions
 agree+0.074
 amaz+0.073
 creativity+0.072
 wish+0.069
 offer+0.069
Neg logit contributions
chard-0.086
Prin-0.086
 upon-0.085
 by-0.084
ained-0.084
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
 amaz+0.013
 creativity+0.013
 know+0.013
 pleasure+0.012
 supervision+0.012
Neg logit contributions
 upon-0.010
 by-0.009
itates-0.009
 into-0.009
but-0.009
Token Sam
Token ID3409
Feature activation+2.12e-02

Pos logit contributions
 generosity+0.014
 creativity+0.014
 supervision+0.014
 agree+0.014
 believe+0.014
Neg logit contributions
 upon-0.016
itates-0.016
ained-0.015
but-0.015
 by-0.014
Token comes
Token ID2058
Feature activation+8.82e-03

Pos logit contributions
 supervision+0.029
 pleasure+0.029
 amaz+0.029
 creativity+0.029
 know+0.028
Neg logit contributions
 upon-0.032
itates-0.031
 into-0.030
Prin-0.029
 by-0.029
Token to
Token ID284
Feature activation+2.70e-02

Pos logit contributions
 know+0.016
 help+0.015
 agree+0.015
 hop+0.015
 lend+0.014
Neg logit contributions
itates-0.010
ained-0.010
but-0.009
side-0.009
chard-0.009
Token day
Token ID1110
Feature activation+9.97e-03

Pos logit contributions
 know+0.005
 agree+0.005
 help+0.005
 creativity+0.005
 offer+0.005
Neg logit contributions
itates-0.005
but-0.005
chard-0.005
just-0.005
Prin-0.005
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
 know+0.013
 acts+0.013
 creativity+0.013
 generosity+0.012
 supervision+0.012
Neg logit contributions
but-0.016
 upon-0.016
 into-0.015
just-0.015
side-0.015
Token When
Token ID1649
Feature activation+1.23e-02

Pos logit contributions
 acts+0.026
 wish+0.025
 creativity+0.025
 supervision+0.025
 bravery+0.025
Neg logit contributions
ained-0.020
 upon-0.020
but-0.020
itates-0.019
 over-0.018
Token it
Token ID340
Feature activation+2.80e-02

Pos logit contributions
 know+0.015
 agree+0.014
 creativity+0.014
 acts+0.014
 supervision+0.014
Neg logit contributions
 upon-0.020
itates-0.020
but-0.020
ied-0.020
ained-0.020
Token got
Token ID1392
Feature activation+3.26e-02

Pos logit contributions
 know+0.042
 creativity+0.041
 agree+0.040
 supervision+0.039
 acts+0.039
Neg logit contributions
itates-0.039
 upon-0.038
but-0.038
ied-0.038
ained-0.037
Token dark
Token ID3223
Feature activation+5.64e-02

Pos logit contributions
,+0.043
 know+0.040
 Sammy+0.039
 help+0.037
 pleasure+0.035
Neg logit contributions
but-0.055
we-0.054
ross-0.054
ied-0.051
itates-0.051
Token,
Token ID11
Feature activation+9.38e-05

Pos logit contributions
 know+0.064
 agree+0.061
 supervision+0.060
 creativity+0.056
 generosity+0.055
Neg logit contributions
 into-0.109
 upon-0.103
but-0.103
 by-0.102
ied-0.100
Token they
Token ID484
Feature activation+3.92e-03

Pos logit contributions
 know+0.000
 creativity+0.000
 agree+0.000
 acts+0.000
 supervision+0.000
Neg logit contributions
but-0.000
itates-0.000
 upon-0.000
ied-0.000
chard-0.000
Token got
Token ID1392
Feature activation+9.06e-04

Pos logit contributions
 creativity+0.006
 know+0.006
 supervision+0.006
 acts+0.006
 pleasure+0.006
Neg logit contributions
 upon-0.005
itates-0.005
ied-0.005
but-0.005
chard-0.005
Token back
Token ID736
Feature activation-2.45e-03

Pos logit contributions
 know+0.002
 help+0.001
 creativity+0.001
 agree+0.001
 pleasure+0.001
Neg logit contributions
itates-0.001
but-0.001
ied-0.001
Prin-0.001
ross-0.001
Token in
Token ID287
Feature activation-5.47e-03

Pos logit contributions
itates+0.002
but+0.002
ied+0.002
 upon+0.002
ained+0.002
Neg logit contributions
 know-0.005
 creativity-0.005
 agree-0.005
 acts-0.005
 thank-0.005
Token how
Token ID703
Feature activation+1.95e-02

Pos logit contributions
 bisc+0.044
 agree+0.044
 know+0.043
 chew+0.041
 amaz+0.041
Neg logit contributions
 upon-0.056
ied-0.054
 by-0.053
but-0.052
ained-0.051
Token frightening
Token ID23101
Feature activation+2.71e-02

Pos logit contributions
 know+0.027
 creativity+0.026
 agree+0.026
 thank+0.025
 acts+0.025
Neg logit contributions
itates-0.029
but-0.029
 upon-0.029
ied-0.028
just-0.028
Token the
Token ID262
Feature activation-4.95e-04

Pos logit contributions
 agree+0.049
 know+0.047
 bisc+0.046
 amaz+0.045
 pleasure+0.044
Neg logit contributions
 by-0.030
 into-0.030
ied-0.028
 upon-0.028
but-0.027
Token world
Token ID995
Feature activation+2.77e-02

Pos logit contributions
 upon+0.001
chard+0.001
 by+0.001
just+0.001
but+0.001
Neg logit contributions
 know-0.001
 agree-0.001
 Something-0.000
 creativity-0.000
 supervision-0.000
Token might
Token ID1244
Feature activation+2.28e-02

Pos logit contributions
 know+0.040
 agree+0.039
 amaz+0.038
 pleasure+0.038
 creativity+0.038
Neg logit contributions
ied-0.039
 upon-0.039
just-0.038
 into-0.038
but-0.038
Token seem
Token ID1283
Feature activation+5.64e-02

Pos logit contributions
 amaz+0.027
 creativity+0.026
 bisc+0.026
 omega+0.026
 supervision+0.026
Neg logit contributions
 upon-0.039
 into-0.038
 by-0.037
just-0.037
but-0.036
Token,
Token ID11
Feature activation+2.23e-03

Pos logit contributions
 amaz+0.080
 supervision+0.077
 bisc+0.076
 creativity+0.076
 agree+0.074
Neg logit contributions
 upon-0.089
 into-0.086
 by-0.082
itating-0.077
 over-0.076
Token everything
Token ID2279
Feature activation+2.20e-02

Pos logit contributions
 agree+0.004
 know+0.003
 creativity+0.003
 bisc+0.003
 acts+0.003
Neg logit contributions
 upon-0.003
but-0.003
just-0.003
 into-0.003
 by-0.003
Token would
Token ID561
Feature activation-1.22e-02

Pos logit contributions
 know+0.030
 agree+0.029
 creativity+0.028
 supervision+0.028
 pleasure+0.028
Neg logit contributions
 upon-0.033
itates-0.033
ied-0.033
but-0.032
just-0.032
Token work
Token ID670
Feature activation+9.63e-03

Pos logit contributions
 upon+0.018
just+0.017
but+0.017
itating+0.016
 into+0.016
Neg logit contributions
 agree-0.016
 acts-0.016
 supervision-0.016
 amaz-0.015
 creativity-0.015
Token out
Token ID503
Feature activation+5.77e-03

Pos logit contributions
 know+0.015
 agree+0.014
 creativity+0.014
 amaz+0.014
 acts+0.014
Neg logit contributions
 upon-0.013
but-0.013
ied-0.013
just-0.012
itates-0.012
Token knew
Token ID2993
Feature activation+5.40e-04

Pos logit contributions
 know+0.007
 creativity+0.006
 agree+0.006
 pleasure+0.006
 supervision+0.006
Neg logit contributions
but-0.006
itates-0.006
ied-0.006
 upon-0.006
ained-0.006
Token that
Token ID326
Feature activation+9.34e-03

Pos logit contributions
 know+0.001
 agree+0.001
 creativity+0.001
 acts+0.001
 amaz+0.001
Neg logit contributions
but-0.001
ied-0.001
 upon-0.001
just-0.001
itates-0.001
Token sometimes
Token ID3360
Feature activation+1.50e-02

Pos logit contributions
 know+0.016
 agree+0.016
 creativity+0.015
 pleasure+0.015
 offer+0.014
Neg logit contributions
but-0.011
 upon-0.011
just-0.010
 by-0.010
chard-0.010
Token things
Token ID1243
Feature activation+3.86e-02

Pos logit contributions
 bisc+0.026
 agree+0.025
 supervision+0.025
 know+0.025
 amaz+0.024
Neg logit contributions
 by-0.019
 into-0.017
 up-0.016
 upon-0.016
 in-0.015
Token can
Token ID460
Feature activation+1.68e-02

Pos logit contributions
 amaz+0.045
 bisc+0.042
 pleasure+0.040
 agree+0.040
 supervision+0.039
Neg logit contributions
 into-0.068
 upon-0.067
ied-0.067
 by-0.066
itating-0.063
Token seem
Token ID1283
Feature activation+5.58e-02

Pos logit contributions
 amaz+0.022
 supervision+0.021
 pleasure+0.021
 creativity+0.020
 bisc+0.019
Neg logit contributions
 upon-0.027
itating-0.027
 into-0.027
just-0.026
but-0.026
Token scary
Token ID14343
Feature activation+4.14e-02

Pos logit contributions
 amaz+0.093
 supervision+0.089
 agree+0.088
 creativity+0.086
 acts+0.085
Neg logit contributions
 upon-0.070
 into-0.069
 by-0.065
chard-0.063
ained-0.062
Token at
Token ID379
Feature activation+1.16e-02

Pos logit contributions
 amaz+0.065
 know+0.063
 bisc+0.062
 agree+0.062
 pleasure+0.061
Neg logit contributions
 into-0.060
 by-0.058
 upon-0.054
 in-0.052
 over-0.052
Token first
Token ID717
Feature activation+2.02e-02

Pos logit contributions
 agree+0.022
 know+0.022
 amaz+0.021
 bisc+0.021
 Something+0.021
Neg logit contributions
 upon-0.011
 into-0.011
ained-0.011
chard-0.011
 by-0.010
Token,
Token ID11
Feature activation+5.68e-03

Pos logit contributions
 amaz+0.027
 pleasure+0.026
 know+0.026
 agree+0.026
 omega+0.025
Neg logit contributions
 into-0.032
 by-0.031
 upon-0.030
ained-0.029
ied-0.028
Token but
Token ID475
Feature activation+2.16e-02

Pos logit contributions
 know+0.009
 agree+0.009
 supervision+0.008
 offer+0.008
 creativity+0.008
Neg logit contributions
but-0.008
 upon-0.007
just-0.007
 by-0.007
 into-0.007
Token learned
Token ID4499
Feature activation-7.51e-03

Pos logit contributions
 pleasure+0.015
 supervision+0.015
 know+0.014
 creativity+0.014
 acts+0.014
Neg logit contributions
ied-0.012
 upon-0.012
 into-0.012
but-0.012
itates-0.011
Token that
Token ID326
Feature activation+1.04e-04

Pos logit contributions
but+0.012
ied+0.012
 upon+0.012
just+0.012
itates+0.011
Neg logit contributions
 know-0.009
 agree-0.009
 creativity-0.009
 acts-0.009
 pleasure-0.009
Token sometimes
Token ID3360
Feature activation+1.38e-02

Pos logit contributions
 agree+0.000
 know+0.000
 pleasure+0.000
 thank+0.000
 supervision+0.000
Neg logit contributions
 upon-0.000
 by-0.000
but-0.000
ied-0.000
 into-0.000
Token when
Token ID618
Feature activation+1.55e-02

Pos logit contributions
 bisc+0.026
 pleasure+0.023
 agree+0.023
�+0.022
uf+0.022
Neg logit contributions
 by-0.017
 into-0.016
 upon-0.015
 up-0.015
 in-0.014
Token things
Token ID1243
Feature activation+4.21e-02

Pos logit contributions
 agree+0.027
 know+0.026
 believe+0.025
 bisc+0.024
 amaz+0.024
Neg logit contributions
 upon-0.019
 into-0.017
 by-0.017
ained-0.016
just-0.016
Token seem
Token ID1283
Feature activation+5.55e-02

Pos logit contributions
 amaz+0.044
 bisc+0.041
 pleasure+0.040
 supervision+0.040
 Something+0.039
Neg logit contributions
 upon-0.080
 into-0.078
 by-0.077
ied-0.077
itates-0.073
Token scary
Token ID14343
Feature activation+4.81e-02

Pos logit contributions
 amaz+0.088
 supervision+0.084
 agree+0.084
 creativity+0.083
 acts+0.082
Neg logit contributions
 upon-0.075
 into-0.073
ained-0.069
 by-0.068
itates-0.068
Token or
Token ID393
Feature activation+1.67e-02

Pos logit contributions
 amaz+0.067
orse+0.064
 bisc+0.064
 agree+0.064
 know+0.061
Neg logit contributions
 into-0.084
 by-0.079
 up-0.074
 in-0.074
 over-0.073
Token difficult
Token ID2408
Feature activation+2.74e-02

Pos logit contributions
 agree+0.025
 know+0.025
 amaz+0.024
 pleasure+0.024
 acts+0.024
Neg logit contributions
 upon-0.022
chard-0.022
 into-0.021
itates-0.021
ained-0.021
Token,
Token ID11
Feature activation+3.75e-03

Pos logit contributions
 amaz+0.039
 bisc+0.039
 agree+0.037
orse+0.036
 know+0.035
Neg logit contributions
 into-0.047
 by-0.044
 in-0.040
ied-0.040
 upon-0.040
Token it
Token ID340
Feature activation+1.79e-02

Pos logit contributions
 agree+0.006
 know+0.006
 amaz+0.006
 pleasure+0.006
 bisc+0.006
Neg logit contributions
 upon-0.005
 by-0.004
but-0.004
 into-0.004
just-0.004
Token.
Token ID13
Feature activation+2.18e-03

Pos logit contributions
 know+0.005
 amaz+0.005
 creativity+0.005
 pleasure+0.005
 acts+0.005
Neg logit contributions
but-0.005
 upon-0.005
 into-0.005
ained-0.005
just-0.005
Token But
Token ID887
Feature activation+1.90e-02

Pos logit contributions
 know+0.003
 creativity+0.003
 agree+0.003
 acts+0.003
 bravery+0.003
Neg logit contributions
but-0.003
itates-0.003
 upon-0.003
ained-0.003
just-0.003
Token when
Token ID618
Feature activation+1.17e-02

Pos logit contributions
 know+0.027
 creativity+0.026
 agree+0.026
 thank+0.025
 acts+0.025
Neg logit contributions
itates-0.028
but-0.027
ied-0.027
 upon-0.026
ained-0.026
Token it
Token ID340
Feature activation+2.86e-02

Pos logit contributions
 know+0.017
 creativity+0.016
 agree+0.016
 acts+0.015
 thank+0.015
Neg logit contributions
itates-0.017
but-0.017
 upon-0.017
ied-0.017
ained-0.016
Token got
Token ID1392
Feature activation+2.70e-02

Pos logit contributions
 know+0.038
 creativity+0.036
 agree+0.035
 help+0.035
 acts+0.035
Neg logit contributions
but-0.045
itates-0.044
chard-0.043
ained-0.043
ied-0.043
Token dark
Token ID3223
Feature activation+5.48e-02

Pos logit contributions
 help+0.035
 know+0.032
 Sammy+0.032
,+0.032
 care+0.029
Neg logit contributions
but-0.046
ross-0.044
itates-0.044
we-0.044
lett-0.043
Token,
Token ID11
Feature activation-1.68e-04

Pos logit contributions
 know+0.056
 agree+0.053
 creativity+0.052
 supervision+0.050
 generosity+0.049
Neg logit contributions
 into-0.104
but-0.104
 upon-0.103
ied-0.101
ained-0.101
Token she
Token ID673
Feature activation+5.66e-04

Pos logit contributions
 upon+0.000
but+0.000
chard+0.000
ied+0.000
just+0.000
Neg logit contributions
 know-0.000
 agree-0.000
 creativity-0.000
 acts-0.000
 supervision-0.000
Token wanted
Token ID2227
Feature activation+6.06e-03

Pos logit contributions
 know+0.001
 agree+0.001
 creativity+0.001
 acts+0.001
 thank+0.001
Neg logit contributions
itates-0.001
but-0.001
ied-0.001
ained-0.001
just-0.001
Token to
Token ID284
Feature activation+5.39e-03

Pos logit contributions
 know+0.013
 agree+0.013
 creativity+0.013
 acts+0.012
 thank+0.012
Neg logit contributions
but-0.004
itates-0.004
ied-0.004
 upon-0.004
ained-0.004
Token go
Token ID467
Feature activation+6.04e-03

Pos logit contributions
 supervision+0.007
 generosity+0.007
 creativity+0.007
 acts+0.007
 omega+0.006
Neg logit contributions
 into-0.008
 upon-0.008
 over-0.008
 by-0.008
 up-0.007
Token When
Token ID1649
Feature activation+8.44e-03

Pos logit contributions
 know+0.016
 creativity+0.015
 agree+0.015
 thank+0.015
 acts+0.014
Neg logit contributions
itates-0.023
but-0.023
ied-0.022
 upon-0.022
just-0.022
Token it
Token ID340
Feature activation+2.26e-02

Pos logit contributions
 know+0.011
 creativity+0.010
 agree+0.010
 acts+0.010
 offer+0.010
Neg logit contributions
itates-0.014
but-0.013
 upon-0.013
ied-0.013
ained-0.013
Token started
Token ID2067
Feature activation+1.94e-02

Pos logit contributions
 know+0.027
 help+0.025
 thank+0.025
 creativity+0.024
 agree+0.024
Neg logit contributions
chard-0.040
but-0.039
itates-0.038
just-0.037
itating-0.037
Token to
Token ID284
Feature activation+2.33e-02

Pos logit contributions
 know+0.029
 creativity+0.028
 agree+0.028
 thank+0.027
 help+0.027
Neg logit contributions
itates-0.027
but-0.027
ied-0.026
just-0.026
 upon-0.026
Token get
Token ID651
Feature activation+4.38e-02

Pos logit contributions
 supervision+0.030
 creativity+0.029
 acts+0.029
 know+0.029
 bravery+0.027
Neg logit contributions
 upon-0.037
Prin-0.035
ained-0.035
chard-0.034
but-0.034
Token dark
Token ID3223
Feature activation+5.43e-02

Pos logit contributions
 know+0.059
 help+0.057
 creativity+0.054
 pleasure+0.054
 thank+0.052
Neg logit contributions
but-0.072
itates-0.070
just-0.066
itating-0.066
ied-0.064
Token,
Token ID11
Feature activation+4.13e-03

Pos logit contributions
 know+0.065
 agree+0.063
 supervision+0.061
 creativity+0.056
 wish+0.056
Neg logit contributions
 into-0.104
 by-0.096
 upon-0.094
 late-0.093
but-0.093
Token Lily
Token ID20037
Feature activation-5.91e-03

Pos logit contributions
 know+0.004
 creativity+0.004
 agree+0.004
 acts+0.004
 thank+0.004
Neg logit contributions
itates-0.008
but-0.008
 upon-0.008
ied-0.008
ained-0.008
Token said
Token ID531
Feature activation-5.98e-03

Pos logit contributions
 upon+0.008
but+0.008
itates+0.008
ied+0.008
 into+0.008
Neg logit contributions
 know-0.008
 creativity-0.008
 pleasure-0.008
 supervision-0.008
 acts-0.008
Token,
Token ID11
Feature activation-6.80e-03

Pos logit contributions
but+0.013
itates+0.012
ied+0.012
 upon+0.012
just+0.012
Neg logit contributions
 know-0.005
 creativity-0.004
 agree-0.004
 acts-0.004
 amaz-0.004
Token "
Token ID366
Feature activation-5.77e-03

Pos logit contributions
itates+0.011
ied+0.011
ained+0.011
but+0.011
 upon+0.011
Neg logit contributions
 know-0.008
 creativity-0.008
 agree-0.008
 thank-0.008
 help-0.008
Token learned
Token ID4499
Feature activation-9.62e-03

Pos logit contributions
 know+0.001
 creativity+0.001
 pleasure+0.001
 agree+0.001
 supervision+0.001
Neg logit contributions
but-0.001
itates-0.001
 upon-0.001
ied-0.001
ained-0.001
Token that
Token ID326
Feature activation+1.88e-03

Pos logit contributions
itates+0.015
but+0.015
ied+0.014
 upon+0.014
ained+0.014
Neg logit contributions
 know-0.013
 creativity-0.012
 agree-0.012
 thank-0.012
 help-0.012
Token sometimes
Token ID3360
Feature activation+1.66e-02

Pos logit contributions
 agree+0.003
 know+0.003
 creativity+0.003
 supervision+0.003
 pleasure+0.003
Neg logit contributions
but-0.002
 upon-0.002
ained-0.002
 by-0.002
just-0.002
Token things
Token ID1243
Feature activation+3.56e-02

Pos logit contributions
 bisc+0.032
 amaz+0.030
 pleasure+0.029
 supervision+0.029
 agree+0.028
Neg logit contributions
 by-0.020
 into-0.018
 up-0.017
 in-0.016
 upon-0.016
Token can
Token ID460
Feature activation+1.61e-02

Pos logit contributions
 amaz+0.043
 bisc+0.040
 pleasure+0.037
 delight+0.037
orse+0.037
Neg logit contributions
 into-0.063
 by-0.063
 upon-0.062
ied-0.061
itating-0.060
Token seem
Token ID1283
Feature activation+5.39e-02

Pos logit contributions
 amaz+0.024
 pleasure+0.022
 supervision+0.021
 bisc+0.020
 creativity+0.020
Neg logit contributions
 upon-0.025
itating-0.025
 into-0.024
 by-0.024
just-0.023
Token scary
Token ID14343
Feature activation+3.88e-02

Pos logit contributions
 amaz+0.094
 supervision+0.086
 delight+0.084
 creativity+0.084
 agree+0.084
Neg logit contributions
 upon-0.068
 into-0.066
 by-0.063
ained-0.061
 late-0.060
Token,
Token ID11
Feature activation+5.38e-03

Pos logit contributions
 amaz+0.065
 bisc+0.062
 know+0.061
 pleasure+0.060
 agree+0.060
Neg logit contributions
 into-0.055
 by-0.053
 upon-0.049
 in-0.048
 over-0.048
Token but
Token ID475
Feature activation+2.22e-02

Pos logit contributions
 know+0.008
 agree+0.008
 amaz+0.008
 offer+0.008
 thank+0.008
Neg logit contributions
but-0.007
 upon-0.007
just-0.007
 by-0.007
 into-0.007
Token if
Token ID611
Feature activation+2.53e-02

Pos logit contributions
 know+0.033
 agree+0.033
 creativity+0.032
 supervision+0.031
 acts+0.031
Neg logit contributions
 upon-0.030
but-0.029
itates-0.028
just-0.028
ained-0.028
Token we
Token ID356
Feature activation+2.64e-02

Pos logit contributions
 know+0.038
 agree+0.038
 amaz+0.036
 creativity+0.035
 pleasure+0.035
Neg logit contributions
 upon-0.035
but-0.033
chard-0.033
ied-0.032
just-0.032
Token is
Token ID318
Feature activation+9.73e-03

Pos logit contributions
but+0.033
itates+0.033
chard+0.032
ied+0.032
 upon+0.032
Neg logit contributions
 know-0.018
 creativity-0.017
 agree-0.016
 acts-0.016
 thank-0.016
Token that
Token ID326
Feature activation-1.24e-03

Pos logit contributions
 know+0.014
 creativity+0.013
 agree+0.013
 acts+0.013
 amaz+0.013
Neg logit contributions
itates-0.014
but-0.014
ied-0.014
 upon-0.014
ained-0.014
Token sometimes
Token ID3360
Feature activation+1.19e-02

Pos logit contributions
 by+0.002
 upon+0.001
but+0.001
 into+0.001
just+0.001
Neg logit contributions
 know-0.002
 bisc-0.002
 agree-0.002
 amaz-0.002
 supervision-0.002
Token when
Token ID618
Feature activation+2.23e-02

Pos logit contributions
 bisc+0.020
 amaz+0.018
 supervision+0.017
 agree+0.017
 pleasure+0.016
Neg logit contributions
 into-0.016
 by-0.016
 upon-0.015
 up-0.015
 in-0.014
Token things
Token ID1243
Feature activation+4.45e-02

Pos logit contributions
 agree+0.036
 amaz+0.035
 bisc+0.034
 know+0.034
 Something+0.032
Neg logit contributions
 upon-0.029
 into-0.028
 by-0.027
just-0.026
chard-0.026
Token seem
Token ID1283
Feature activation+5.34e-02

Pos logit contributions
 amaz+0.052
 bisc+0.051
 supervision+0.047
 Something+0.045
 pleasure+0.044
Neg logit contributions
 into-0.081
 upon-0.080
 by-0.079
ied-0.075
itates-0.074
Token too
Token ID1165
Feature activation+3.09e-02

Pos logit contributions
 amaz+0.078
 know+0.078
 agree+0.078
 creativity+0.077
 supervision+0.076
Neg logit contributions
 upon-0.073
itates-0.070
 into-0.070
ained-0.069
itating-0.068
Token big
Token ID1263
Feature activation+3.08e-02

Pos logit contributions
 agree+0.047
 amaz+0.047
 supervision+0.047
 creativity+0.046
 know+0.046
Neg logit contributions
 upon-0.043
itates-0.040
 into-0.040
ained-0.039
itating-0.039
Token or
Token ID393
Feature activation+8.71e-03

Pos logit contributions
 amaz+0.041
 agree+0.038
 know+0.038
orse+0.038
 Something+0.036
Neg logit contributions
 into-0.053
 by-0.051
 upon-0.048
 over-0.047
ied-0.047
Token expensive
Token ID5789
Feature activation+3.86e-02

Pos logit contributions
 agree+0.013
 amaz+0.013
 know+0.013
 acts+0.013
 pleasure+0.012
Neg logit contributions
chard-0.011
 upon-0.011
 into-0.011
 by-0.010
 late-0.010
Token,
Token ID11
Feature activation+1.11e-02

Pos logit contributions
 amaz+0.055
orse+0.053
 know+0.050
 bisc+0.050
 agree+0.050
Neg logit contributions
 into-0.069
 by-0.064
 up-0.060
 in-0.059
 down-0.059
Token before
Token ID878
Feature activation+1.70e-02

Pos logit contributions
itates+0.010
but+0.010
ied+0.010
 upon+0.009
ained+0.009
Neg logit contributions
 know-0.015
 creativity-0.015
 agree-0.014
 thank-0.014
 acts-0.014
Token.
Token ID13
Feature activation+5.82e-03

Pos logit contributions
 know+0.026
 agree+0.025
 creativity+0.025
 acts+0.025
 believe+0.024
Neg logit contributions
 upon-0.023
but-0.023
 into-0.023
ained-0.022
just-0.022
Token When
Token ID1649
Feature activation+7.21e-03

Pos logit contributions
 know+0.008
 creativity+0.008
 agree+0.008
 acts+0.008
 thank+0.007
Neg logit contributions
but-0.009
itates-0.009
 upon-0.008
just-0.008
ained-0.008
Token it
Token ID340
Feature activation+2.24e-02

Pos logit contributions
 know+0.011
 agree+0.010
 creativity+0.010
 acts+0.010
 supervision+0.010
Neg logit contributions
but-0.010
 upon-0.010
itates-0.010
ied-0.010
just-0.009
Token got
Token ID1392
Feature activation+2.46e-02

Pos logit contributions
 know+0.032
 creativity+0.030
 agree+0.030
 thank+0.029
 acts+0.029
Neg logit contributions
itates-0.033
but-0.032
ied-0.032
 upon-0.031
ained-0.031
Token dark
Token ID3223
Feature activation+5.32e-02

Pos logit contributions
 help+0.034
 know+0.033
,+0.032
 Sammy+0.031
 thank+0.030
Neg logit contributions
but-0.039
itates-0.039
we-0.038
ross-0.037
ied-0.036
Token,
Token ID11
Feature activation-2.64e-03

Pos logit contributions
 know+0.059
 agree+0.056
 supervision+0.053
 creativity+0.052
 generosity+0.050
Neg logit contributions
 into-0.101
but-0.098
 upon-0.097
ied-0.095
just-0.094
Token the
Token ID262
Feature activation-2.04e-03

Pos logit contributions
itates+0.004
but+0.004
ied+0.004
 upon+0.004
ained+0.004
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.004
 thank-0.004
 acts-0.004
Token little
Token ID1310
Feature activation+1.38e-03

Pos logit contributions
 upon+0.003
but+0.003
itates+0.003
chard+0.003
ied+0.003
Neg logit contributions
 know-0.003
 agree-0.003
 creativity-0.003
 acts-0.003
 supervision-0.003
Token girl
Token ID2576
Feature activation+5.38e-03

Pos logit contributions
 know+0.003
 creativity+0.003
 agree+0.003
 supervision+0.002
 thank+0.002
Neg logit contributions
 upon-0.001
but-0.001
chard-0.001
itates-0.001
ied-0.001
Token had
Token ID550
Feature activation-7.90e-03

Pos logit contributions
 know+0.008
 creativity+0.008
 supervision+0.008
 pleasure+0.008
 agree+0.008
Neg logit contributions
 upon-0.007
but-0.007
itates-0.007
ied-0.007
just-0.007
Token that
Token ID326
Feature activation+2.67e-03

Pos logit contributions
but+0.004
itates+0.004
ied+0.004
 upon+0.004
ained+0.004
Neg logit contributions
 know-0.003
 creativity-0.003
 agree-0.003
 acts-0.003
 thank-0.003
Token even
Token ID772
Feature activation-2.46e-03

Pos logit contributions
 know+0.004
 agree+0.004
 supervision+0.004
 creativity+0.004
 amaz+0.004
Neg logit contributions
 by-0.003
 upon-0.003
but-0.003
just-0.003
ained-0.003
Token though
Token ID996
Feature activation+2.00e-02

Pos logit contributions
itates+0.003
but+0.003
ied+0.003
 upon+0.003
ained+0.003
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.004
 acts-0.004
 thank-0.004
Token things
Token ID1243
Feature activation+4.19e-02

Pos logit contributions
 agree+0.034
 know+0.033
 bisc+0.032
 amaz+0.032
 believe+0.032
Neg logit contributions
 upon-0.025
 by-0.024
 into-0.023
ied-0.022
ained-0.022
Token may
Token ID743
Feature activation+3.28e-02

Pos logit contributions
 amaz+0.048
 bisc+0.047
orse+0.042
 supervision+0.041
 pleasure+0.041
Neg logit contributions
 into-0.076
 by-0.075
 upon-0.073
ied-0.073
 over-0.070
Token seem
Token ID1283
Feature activation+5.31e-02

Pos logit contributions
 amaz+0.041
 bisc+0.040
 supervision+0.038
 pleasure+0.037
 delight+0.035
Neg logit contributions
 upon-0.058
 into-0.054
itating-0.053
 by-0.052
 over-0.051
Token dull
Token ID19222
Feature activation+4.24e-02

Pos logit contributions
 amaz+0.089
 agree+0.084
 supervision+0.083
 bisc+0.082
 delight+0.081
Neg logit contributions
 into-0.066
 upon-0.066
 by-0.062
 late-0.061
ained-0.060
Token and
Token ID290
Feature activation+1.21e-02

Pos logit contributions
 amaz+0.056
 agree+0.056
 know+0.055
 bisc+0.055
orse+0.054
Neg logit contributions
 into-0.073
 by-0.069
 upon-0.067
 over-0.065
 up-0.064
Token boring
Token ID14262
Feature activation+3.64e-02

Pos logit contributions
 agree+0.018
 bisc+0.018
 amaz+0.017
 know+0.017
 pleasure+0.016
Neg logit contributions
 upon-0.016
chard-0.016
 by-0.016
 into-0.016
ained-0.015
Token sometimes
Token ID3360
Feature activation+1.89e-02

Pos logit contributions
 bisc+0.055
 know+0.054
orse+0.054
 amaz+0.053
 agree+0.052
Neg logit contributions
 into-0.056
 by-0.054
 in-0.049
 over-0.049
 up-0.048
Token,
Token ID11
Feature activation+5.90e-03

Pos logit contributions
 bisc+0.033
 amaz+0.032
uf+0.030
 agree+0.030
 pleasure+0.030
Neg logit contributions
 by-0.024
 into-0.022
 up-0.021
 in-0.021
 late-0.020
Token like
Token ID588
Feature activation+2.99e-02

Pos logit contributions
 amaz+0.032
 creativity+0.031
 know+0.031
 acts+0.030
 disl+0.030
Neg logit contributions
 by-0.023
but-0.022
 into-0.022
 upon-0.021
just-0.021
Token him
Token ID683
Feature activation+2.70e-02

Pos logit contributions
 know+0.043
 agree+0.040
 wish+0.039
 amaz+0.039
 acts+0.039
Neg logit contributions
 by-0.043
 into-0.043
but-0.043
chard-0.043
 upon-0.042
Token.
Token ID13
Feature activation+4.63e-03

Pos logit contributions
 creativity+0.042
 amaz+0.041
 know+0.041
 disl+0.040
 acts+0.040
Neg logit contributions
 upon-0.037
 by-0.035
but-0.035
 into-0.034
just-0.033
Token She
Token ID1375
Feature activation+2.12e-03

Pos logit contributions
 bravery+0.008
 creativity+0.008
 generosity+0.008
 know+0.008
 believe+0.008
Neg logit contributions
but-0.005
itates-0.005
 upon-0.005
Prin-0.004
just-0.004
Token asked
Token ID1965
Feature activation+1.26e-02

Pos logit contributions
 know+0.003
 creativity+0.003
 pleasure+0.003
 supervision+0.003
 agree+0.003
Neg logit contributions
itates-0.003
 upon-0.003
but-0.003
ied-0.003
Prin-0.003
Token uncle
Token ID7711
Feature activation+5.29e-02

Pos logit contributions
 know+0.018
 amaz+0.018
 wish+0.017
 pleasure+0.017
 creativity+0.017
Neg logit contributions
 upon-0.018
ied-0.018
but-0.018
 by-0.018
 into-0.017
Token Ben
Token ID3932
Feature activation+2.86e-02

Pos logit contributions
 amaz+0.066
 know+0.066
 creativity+0.065
 wish+0.064
 pleasure+0.063
Neg logit contributions
 upon-0.088
but-0.086
chard-0.083
just-0.083
itates-0.082
Token,
Token ID11
Feature activation+3.29e-02

Pos logit contributions
 amaz+0.050
 creativity+0.048
 pleasure+0.047
 wish+0.047
 know+0.046
Neg logit contributions
 upon-0.037
but-0.034
 by-0.032
 into-0.032
itates-0.032
Token "
Token ID366
Feature activation+6.16e-03

Pos logit contributions
 know+0.049
 creativity+0.048
 acts+0.047
 agree+0.046
 amaz+0.046
Neg logit contributions
but-0.047
 upon-0.047
itates-0.045
just-0.044
chard-0.044
TokenCan
Token ID6090
Feature activation+2.00e-02

Pos logit contributions
 agree+0.013
 know+0.013
 help+0.012
 Something+0.012
 offer+0.012
Neg logit contributions
ied-0.005
ained-0.005
itates-0.005
 upon-0.005
but-0.004
Token I
Token ID314
Feature activation+2.68e-02

Pos logit contributions
 know+0.026
 amaz+0.025
 agree+0.025
 creativity+0.025
 pleasure+0.024
Neg logit contributions
but-0.031
itates-0.031
 upon-0.031
chard-0.030
just-0.030
Token is
Token ID318
Feature activation+1.23e-02

Pos logit contributions
itates+0.011
but+0.011
 upon+0.011
ied+0.011
ained+0.010
Neg logit contributions
 know-0.007
 creativity-0.007
 agree-0.007
 supervision-0.006
 amaz-0.006
Token that
Token ID326
Feature activation-1.10e-03

Pos logit contributions
 know+0.015
 agree+0.015
 amaz+0.015
 creativity+0.015
 acts+0.015
Neg logit contributions
but-0.019
itates-0.019
 upon-0.019
ied-0.019
just-0.019
Token sometimes
Token ID3360
Feature activation+1.85e-02

Pos logit contributions
 by+0.001
 upon+0.001
but+0.001
just+0.001
 into+0.001
Neg logit contributions
 bisc-0.002
 know-0.002
 agree-0.002
 supervision-0.002
 amaz-0.002
Token things
Token ID1243
Feature activation+3.63e-02

Pos logit contributions
 bisc+0.032
 amaz+0.029
 supervision+0.027
 agree+0.026
 Something+0.026
Neg logit contributions
 by-0.025
 into-0.024
 upon-0.023
 up-0.022
 in-0.022
Token may
Token ID743
Feature activation+3.06e-02

Pos logit contributions
 amaz+0.039
 bisc+0.038
 pleasure+0.033
 Something+0.033
 supervision+0.033
Neg logit contributions
 into-0.068
 upon-0.066
 by-0.066
itating-0.065
ied-0.065
Token seem
Token ID1283
Feature activation+5.28e-02

Pos logit contributions
 amaz+0.037
 bisc+0.035
 supervision+0.034
 pleasure+0.033
 creativity+0.033
Neg logit contributions
 upon-0.053
itating-0.052
 into-0.050
 by-0.049
just-0.049
Token scary
Token ID14343
Feature activation+4.12e-02

Pos logit contributions
 amaz+0.083
 supervision+0.079
 creativity+0.077
 agree+0.077
 bisc+0.077
Neg logit contributions
 upon-0.070
 into-0.069
 by-0.064
itating-0.064
chard-0.063
Token or
Token ID393
Feature activation+1.47e-02

Pos logit contributions
 bisc+0.059
 amaz+0.059
 know+0.057
 agree+0.055
orse+0.054
Neg logit contributions
 into-0.065
 by-0.063
 upon-0.059
 in-0.057
 over-0.057
Token fearful
Token ID26844
Feature activation+3.52e-02

Pos logit contributions
 amaz+0.021
 agree+0.020
 bisc+0.020
 pleasure+0.020
 know+0.020
Neg logit contributions
chard-0.021
 upon-0.021
 into-0.020
itates-0.019
itating-0.019
Token,
Token ID11
Feature activation+7.43e-03

Pos logit contributions
 bisc+0.055
 amaz+0.054
 agree+0.052
 pleasure+0.051
 creativity+0.050
Neg logit contributions
 into-0.049
 by-0.048
 upon-0.046
ied-0.045
 over-0.044
Token but
Token ID475
Feature activation+1.79e-02

Pos logit contributions
 know+0.011
 agree+0.011
 bisc+0.010
 creativity+0.010
 amaz+0.010
Neg logit contributions
 upon-0.010
but-0.010
just-0.010
 by-0.010
 into-0.010
Token When
Token ID1649
Feature activation+1.10e-02

Pos logit contributions
 know+0.012
 creativity+0.012
 agree+0.012
 acts+0.012
 thank+0.012
Neg logit contributions
itates-0.012
but-0.012
 upon-0.012
ied-0.012
ained-0.011
Token it
Token ID340
Feature activation+2.80e-02

Pos logit contributions
 know+0.015
 creativity+0.015
 agree+0.014
 acts+0.014
 thank+0.014
Neg logit contributions
itates-0.017
but-0.017
ied-0.016
 upon-0.016
ained-0.016
Token started
Token ID2067
Feature activation+2.28e-02

Pos logit contributions
 know+0.038
 creativity+0.035
 agree+0.035
 thank+0.035
 help+0.034
Neg logit contributions
itates-0.043
but-0.043
chard-0.042
ied-0.041
ained-0.041
Token to
Token ID284
Feature activation+2.52e-02

Pos logit contributions
 know+0.042
 creativity+0.040
 help+0.040
 agree+0.039
 thank+0.039
Neg logit contributions
itates-0.025
but-0.024
ied-0.023
just-0.023
chard-0.022
Token get
Token ID651
Feature activation+4.61e-02

Pos logit contributions
 supervision+0.031
 creativity+0.030
 acts+0.030
 know+0.029
 generosity+0.029
Neg logit contributions
 upon-0.042
Prin-0.040
ained-0.039
itates-0.038
but-0.038
Token dark
Token ID3223
Feature activation+5.28e-02

Pos logit contributions
 know+0.062
 help+0.060
 creativity+0.056
 pleasure+0.054
 thank+0.054
Neg logit contributions
but-0.076
itates-0.074
just-0.070
itating-0.069
ross-0.069
Token,
Token ID11
Feature activation+2.45e-03

Pos logit contributions
 know+0.069
 agree+0.066
 supervision+0.064
 generosity+0.063
 creativity+0.062
Neg logit contributions
 into-0.095
 by-0.087
 upon-0.085
but-0.085
ained-0.085
Token Jill
Token ID22772
Feature activation-2.10e-03

Pos logit contributions
 know+0.003
 creativity+0.003
 agree+0.003
 thank+0.003
 acts+0.003
Neg logit contributions
itates-0.004
but-0.004
ied-0.004
ained-0.004
 upon-0.004
Token went
Token ID1816
Feature activation+2.29e-03

Pos logit contributions
itates+0.003
but+0.003
ied+0.003
 upon+0.003
ained+0.003
Neg logit contributions
 know-0.003
 creativity-0.003
 agree-0.003
 acts-0.003
 thank-0.003
Token back
Token ID736
Feature activation+2.38e-03

Pos logit contributions
 know+0.004
 help+0.004
 agree+0.004
 creativity+0.004
 thank+0.004
Neg logit contributions
itates-0.002
ied-0.002
but-0.002
ained-0.002
Prin-0.002
Token inside
Token ID2641
Feature activation-1.04e-02

Pos logit contributions
 know+0.005
 creativity+0.005
 agree+0.005
 thank+0.005
 acts+0.005
Neg logit contributions
 upon-0.002
but-0.002
 into-0.001
itates-0.001
ied-0.001
Token As
Token ID1081
Feature activation+8.90e-03

Pos logit contributions
 know+0.016
 acts+0.016
 creativity+0.016
 agree+0.016
 help+0.016
Neg logit contributions
but-0.018
 upon-0.017
ained-0.017
itates-0.017
ied-0.017
Token it
Token ID340
Feature activation+1.01e-02

Pos logit contributions
 know+0.012
 acts+0.012
 agree+0.012
 creativity+0.011
 supervision+0.011
Neg logit contributions
 upon-0.013
ied-0.013
but-0.013
ained-0.013
just-0.012
Token started
Token ID2067
Feature activation+1.25e-02

Pos logit contributions
 know+0.014
 creativity+0.013
 agree+0.013
 acts+0.013
 help+0.013
Neg logit contributions
itates-0.015
but-0.015
chard-0.015
just-0.014
ained-0.014
Token to
Token ID284
Feature activation+2.17e-02

Pos logit contributions
 know+0.025
 agree+0.025
 creativity+0.024
 acts+0.024
 supervision+0.024
Neg logit contributions
 upon-0.011
but-0.011
ied-0.010
ained-0.010
itates-0.010
Token get
Token ID651
Feature activation+4.24e-02

Pos logit contributions
 supervision+0.028
 creativity+0.026
 know+0.024
 omega+0.024
 acts+0.024
Neg logit contributions
 upon-0.036
Prin-0.033
 into-0.033
ained-0.033
 over-0.032
Token dark
Token ID3223
Feature activation+5.21e-02

Pos logit contributions
 know+0.064
 help+0.062
 creativity+0.061
 pleasure+0.060
 thank+0.059
Neg logit contributions
but-0.061
itates-0.061
just-0.055
ied-0.055
itating-0.054
Token,
Token ID11
Feature activation+6.18e-04

Pos logit contributions
 know+0.060
 agree+0.058
 supervision+0.056
 creativity+0.053
 Something+0.052
Neg logit contributions
 into-0.099
 upon-0.092
 by-0.091
but-0.091
ained-0.089
Token they
Token ID484
Feature activation-2.88e-04

Pos logit contributions
 know+0.001
 creativity+0.001
 agree+0.001
 acts+0.001
 thank+0.001
Neg logit contributions
itates-0.001
but-0.001
ied-0.001
 upon-0.001
ained-0.001
Token waved
Token ID26834
Feature activation+2.95e-03

Pos logit contributions
 upon+0.000
itates+0.000
but+0.000
ied+0.000
just+0.000
Neg logit contributions
 know-0.000
 creativity-0.000
 acts-0.000
 supervision-0.000
 agree-0.000
Token goodbye
Token ID24829
Feature activation-2.60e-03

Pos logit contributions
 know+0.005
 creativity+0.005
 agree+0.004
 acts+0.004
 supervision+0.004
Neg logit contributions
but-0.004
itates-0.004
 upon-0.004
ied-0.004
just-0.004
Token to
Token ID284
Feature activation-1.73e-02

Pos logit contributions
itates+0.003
but+0.003
ied+0.003
ained+0.003
 upon+0.003
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.004
 thank-0.004
 help-0.004
Token.
Token ID13
Feature activation-3.53e-03

Pos logit contributions
but+0.020
 by+0.020
 into+0.020
 upon+0.019
ied+0.019
Neg logit contributions
 know-0.020
 agree-0.020
 disl-0.020
 creativity-0.020
 pleasure-0.019
Token Even
Token ID3412
Feature activation+1.59e-04

Pos logit contributions
but+0.005
itates+0.004
just+0.004
 upon+0.004
ained+0.004
Neg logit contributions
 agree-0.006
 creativity-0.006
 know-0.005
 bravery-0.005
 believe-0.005
Token though
Token ID996
Feature activation+1.67e-02

Pos logit contributions
 know+0.000
 creativity+0.000
 agree+0.000
 thank+0.000
 acts+0.000
Neg logit contributions
but-0.000
itates-0.000
 upon-0.000
ied-0.000
ained-0.000
Token it
Token ID340
Feature activation+2.48e-02

Pos logit contributions
 agree+0.030
 know+0.028
 amaz+0.028
 creativity+0.027
 bisc+0.027
Neg logit contributions
 upon-0.021
 by-0.018
 into-0.018
ained-0.018
but-0.018
Token may
Token ID743
Feature activation+2.61e-02

Pos logit contributions
 agree+0.035
 know+0.034
 supervision+0.033
 amaz+0.033
 creativity+0.033
Neg logit contributions
 upon-0.037
itates-0.036
ied-0.036
ained-0.035
just-0.035
Token seem
Token ID1283
Feature activation+5.20e-02

Pos logit contributions
 amaz+0.033
 supervision+0.031
 pleasure+0.031
 creativity+0.030
 agree+0.030
Neg logit contributions
 upon-0.044
 into-0.040
itating-0.039
ained-0.039
just-0.039
Token easy
Token ID2562
Feature activation+2.92e-02

Pos logit contributions
 amaz+0.082
 agree+0.080
 creativity+0.080
 supervision+0.079
 know+0.079
Neg logit contributions
 upon-0.068
 into-0.064
itates-0.061
chard-0.061
ained-0.061
Token and
Token ID290
Feature activation+7.57e-04

Pos logit contributions
 amaz+0.038
orse+0.038
 know+0.038
 agree+0.038
 bisc+0.036
Neg logit contributions
 into-0.051
 by-0.049
 upon-0.046
ied-0.045
 up-0.044
Token fun
Token ID1257
Feature activation+1.44e-02

Pos logit contributions
 agree+0.001
 bisc+0.001
orse+0.001
 amaz+0.001
 know+0.001
Neg logit contributions
 upon-0.001
 by-0.001
just-0.001
 into-0.001
but-0.001
Token to
Token ID284
Feature activation+1.03e-02

Pos logit contributions
 know+0.020
 bisc+0.020
 agree+0.020
orse+0.019
 amaz+0.019
Neg logit contributions
 into-0.023
 by-0.022
 upon-0.021
but-0.021
 in-0.020
Token do
Token ID466
Feature activation-9.46e-03

Pos logit contributions
 supervision+0.014
 bisc+0.014
 creativity+0.013
 amaz+0.013
 omega+0.013
Neg logit contributions
 by-0.014
 upon-0.014
 often-0.014
 over-0.014
 up-0.014
Token the
Token ID262
Feature activation-1.56e-03

Pos logit contributions
 agree+0.001
 creativity+0.001
 supervision+0.001
 know+0.001
 pleasure+0.001
Neg logit contributions
 late-0.000
chard-0.000
 into-0.000
 upon-0.000
 by-0.000
Token park
Token ID3952
Feature activation+4.47e-03

Pos logit contributions
itates+0.002
 upon+0.002
chard+0.002
Prin+0.002
ained+0.002
Neg logit contributions
 know-0.002
 creativity-0.002
 agree-0.002
 supervision-0.002
 acts-0.002
Token.
Token ID13
Feature activation+7.73e-03

Pos logit contributions
 know+0.006
 creativity+0.006
 agree+0.005
 thank+0.005
 pleasure+0.005
Neg logit contributions
itates-0.007
but-0.007
 upon-0.007
ied-0.007
ained-0.007
Token His
Token ID2399
Feature activation+2.29e-02

Pos logit contributions
 creativity+0.012
 bravery+0.011
 believe+0.011
 supervision+0.011
 know+0.011
Neg logit contributions
chard-0.011
 upon-0.011
itates-0.010
 into-0.010
ained-0.010
Token name
Token ID1438
Feature activation+3.73e-02

Pos logit contributions
 ah+0.043
 supervision+0.040
orse+0.040
 disl+0.039
ird+0.039
Neg logit contributions
 upon-0.025
 by-0.024
 up-0.023
man-0.020
 into-0.019
Token is
Token ID318
Feature activation+5.19e-02

Pos logit contributions
 supervision+0.069
 know+0.066
 creativity+0.065
 pleasure+0.065
 agree+0.065
Neg logit contributions
 by-0.042
 up-0.041
 over-0.040
 into-0.040
 upon-0.039
Token Leo
Token ID19632
Feature activation+8.54e-03

Pos logit contributions
 agree+0.076
 wish+0.072
 creativity+0.072
 offer+0.071
 know+0.070
Neg logit contributions
Prin-0.076
chard-0.074
 upon-0.071
 by-0.071
 into-0.069
Token.
Token ID13
Feature activation+9.13e-03

Pos logit contributions
 creativity+0.015
 amaz+0.015
 know+0.015
 pleasure+0.015
 supervision+0.015
Neg logit contributions
 upon-0.010
 by-0.010
 into-0.009
ling-0.009
 over-0.008
Token Leo
Token ID19632
Feature activation+2.10e-02

Pos logit contributions
 creativity+0.012
 supervision+0.011
 know+0.011
 agree+0.011
 generosity+0.011
Neg logit contributions
itates-0.015
 upon-0.015
chard-0.014
but-0.014
Prin-0.014
Token sees
Token ID7224
Feature activation+4.66e-03

Pos logit contributions
 supervision+0.034
 pleasure+0.033
 amaz+0.033
 creativity+0.032
 bravery+0.031
Neg logit contributions
 upon-0.029
itates-0.028
 by-0.026
 into-0.026
ling-0.025
Token their
Token ID511
Feature activation+1.63e-02

Pos logit contributions
 agree+0.008
 know+0.007
 creativity+0.007
 supervision+0.007
 pleasure+0.007
Neg logit contributions
 upon-0.006
chard-0.006
ied-0.005
 into-0.005
Prin-0.005
Token,
Token ID11
Feature activation+1.57e-02

Pos logit contributions
 know+0.026
 creativity+0.025
 supervision+0.024
 acts+0.024
 agree+0.024
Neg logit contributions
but-0.023
 upon-0.023
ied-0.022
 into-0.022
just-0.022
Token enjoying
Token ID13226
Feature activation+7.60e-03

Pos logit contributions
 know+0.025
 creativity+0.024
 acts+0.024
 agree+0.023
 wish+0.023
Neg logit contributions
but-0.021
 upon-0.021
just-0.020
chard-0.020
 into-0.020
Token the
Token ID262
Feature activation+6.64e-03

Pos logit contributions
 know+0.014
 supervision+0.014
 lend+0.014
 creativity+0.013
 agree+0.013
Neg logit contributions
 into-0.007
 upon-0.007
 by-0.007
chard-0.007
but-0.007
Token sights
Token ID21343
Feature activation+1.95e-02

Pos logit contributions
 know+0.010
 supervision+0.010
 creativity+0.009
 agree+0.009
 Something+0.009
Neg logit contributions
 upon-0.009
ling-0.009
just-0.009
ied-0.009
 by-0.009
Token and
Token ID290
Feature activation+5.36e-03

Pos logit contributions
 know+0.031
 supervision+0.030
 creativity+0.030
 Something+0.030
 agree+0.030
Neg logit contributions
 into-0.027
but-0.025
 by-0.025
ied-0.025
 upon-0.025
Token sounds
Token ID5238
Feature activation+5.15e-02

Pos logit contributions
 know+0.008
 agree+0.008
 creativity+0.008
 supervision+0.008
 thank+0.008
Neg logit contributions
 upon-0.007
but-0.007
itates-0.007
just-0.007
chard-0.007
Token.
Token ID13
Feature activation+1.94e-02

Pos logit contributions
 know+0.080
 creativity+0.077
 acts+0.076
 agree+0.075
 supervision+0.075
Neg logit contributions
but-0.069
ied-0.067
 upon-0.067
just-0.066
 into-0.065
Token Eventually
Token ID16178
Feature activation+1.29e-02

Pos logit contributions
 know+0.028
 acts+0.027
 creativity+0.027
 supervision+0.027
 wish+0.027
Neg logit contributions
but-0.028
 upon-0.028
itates-0.027
ained-0.026
just-0.026
Token,
Token ID11
Feature activation-3.84e-03

Pos logit contributions
 supervision+0.017
 know+0.017
 acts+0.016
 agree+0.016
 disl+0.015
Neg logit contributions
 into-0.021
 upon-0.020
ied-0.019
 by-0.019
side-0.019
Token the
Token ID262
Feature activation+1.82e-04

Pos logit contributions
itates+0.008
but+0.007
ied+0.007
ained+0.007
just+0.007
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.003
 help-0.003
 thank-0.003
Token reached
Token ID4251
Feature activation-6.42e-03

Pos logit contributions
 know+0.000
 agree+0.000
 supervision+0.000
 creativity+0.000
 disl+0.000
Neg logit contributions
 upon-0.000
chard-0.000
itates-0.000
ied-0.000
but-0.000
Token.
Token ID13
Feature activation+2.42e-03

Pos logit contributions
 know+0.011
 creativity+0.010
 amaz+0.010
 agree+0.010
 pleasure+0.010
Neg logit contributions
ied-0.010
but-0.010
 upon-0.010
ained-0.009
 into-0.009
Token When
Token ID1649
Feature activation+7.68e-03

Pos logit contributions
 know+0.004
 creativity+0.004
 acts+0.004
 agree+0.004
 bravery+0.004
Neg logit contributions
but-0.003
 upon-0.003
ained-0.003
itates-0.003
just-0.003
Token it
Token ID340
Feature activation+2.57e-02

Pos logit contributions
 know+0.012
 agree+0.011
 creativity+0.011
 acts+0.011
 supervision+0.011
Neg logit contributions
but-0.010
 upon-0.010
itates-0.010
ied-0.010
ained-0.010
Token started
Token ID2067
Feature activation+1.96e-02

Pos logit contributions
 know+0.036
 creativity+0.034
 agree+0.033
 thank+0.033
 acts+0.033
Neg logit contributions
itates-0.039
but-0.039
ied-0.037
ained-0.037
 upon-0.037
Token getting
Token ID1972
Feature activation+5.02e-02

Pos logit contributions
 know+0.036
 creativity+0.035
 help+0.035
 thank+0.034
 agree+0.034
Neg logit contributions
itates-0.021
but-0.020
ied-0.020
chard-0.019
just-0.019
Token dark
Token ID3223
Feature activation+5.14e-02

Pos logit contributions
 help+0.065
 know+0.061
 care+0.056
 Sammy+0.054
 thank+0.053
Neg logit contributions
but-0.089
ross-0.083
itates-0.082
we-0.082
ied-0.082
Token out
Token ID503
Feature activation+1.95e-02

Pos logit contributions
 know+0.067
 agree+0.065
 creativity+0.063
 supervision+0.062
 generosity+0.061
Neg logit contributions
 into-0.084
but-0.082
ained-0.081
 upon-0.081
ied-0.079
Token,
Token ID11
Feature activation-6.66e-04

Pos logit contributions
 agree+0.020
 know+0.020
 creativity+0.019
 supervision+0.018
 generosity+0.018
Neg logit contributions
 upon-0.038
 into-0.038
ied-0.036
but-0.036
 by-0.035
Token Sally
Token ID25737
Feature activation-4.87e-03

Pos logit contributions
but+0.001
itates+0.001
 upon+0.001
ied+0.001
ained+0.001
Neg logit contributions
 know-0.001
 creativity-0.001
 agree-0.001
 acts-0.001
 thank-0.001
Token realised
Token ID19169
Feature activation-5.47e-03

Pos logit contributions
itates+0.007
but+0.007
 upon+0.007
ied+0.007
just+0.007
Neg logit contributions
 know-0.007
 creativity-0.007
 agree-0.006
 acts-0.006
 thank-0.006
Token that
Token ID326
Feature activation-9.54e-04

Pos logit contributions
but+0.008
 upon+0.008
ied+0.008
just+0.007
chard+0.007
Neg logit contributions
 know-0.008
 agree-0.008
 creativity-0.008
 pleasure-0.008
 acts-0.008
Token soon
Token ID2582
Feature activation-7.88e-03

Pos logit contributions
 know+0.004
 creativity+0.003
 thank+0.003
 agree+0.003
 pleasure+0.003
Neg logit contributions
itates-0.004
but-0.004
ied-0.004
just-0.003
 upon-0.003
Token nothing
Token ID2147
Feature activation-1.41e-02

Pos logit contributions
itates+0.010
but+0.010
ied+0.010
ained+0.009
 upon+0.009
Neg logit contributions
 know-0.013
 creativity-0.012
 agree-0.012
 thank-0.012
 help-0.012
Token but
Token ID475
Feature activation-5.21e-03

Pos logit contributions
but+0.019
ied+0.018
itates+0.018
 upon+0.018
just+0.018
Neg logit contributions
 know-0.022
 agree-0.021
 creativity-0.021
 amaz-0.020
 acts-0.020
Token a
Token ID257
Feature activation-3.59e-03

Pos logit contributions
 upon+0.006
but+0.006
ied+0.006
 into+0.006
itates+0.006
Neg logit contributions
 know-0.008
 agree-0.008
 supervision-0.008
 creativity-0.008
 acts-0.008
Token small
Token ID1402
Feature activation+7.67e-03

Pos logit contributions
 upon+0.005
chard+0.005
Prin+0.005
itates+0.005
but+0.005
Neg logit contributions
 know-0.005
 acts-0.005
 supervision-0.005
 agree-0.005
 creativity-0.005
Token spe
Token ID693
Feature activation-6.08e-02

Pos logit contributions
 supervision+0.011
 know+0.011
 disl+0.011
drivers+0.010
 creativity+0.010
Neg logit contributions
 upon-0.010
 by-0.010
chard-0.010
 into-0.010
Prin-0.009
Tokenck
Token ID694
Feature activation-1.32e-02

Pos logit contributions
itates+0.124
 upon+0.122
but+0.122
chard+0.117
ained+0.117
Neg logit contributions
 know-0.057
 agree-0.054
 thank-0.047
 wish-0.047
 offer-0.047
Token in
Token ID287
Feature activation-1.37e-02

Pos logit contributions
itates+0.016
but+0.016
ied+0.015
 upon+0.015
ained+0.015
Neg logit contributions
 know-0.022
 creativity-0.021
 agree-0.021
 acts-0.020
 thank-0.020
Token the
Token ID262
Feature activation-9.01e-03

Pos logit contributions
 upon+0.015
ied+0.014
Prin+0.014
but+0.014
itates+0.014
Neg logit contributions
 know-0.024
 agree-0.023
 creativity-0.023
 supervision-0.023
 acts-0.022
Token sky
Token ID6766
Feature activation+2.54e-03

Pos logit contributions
itates+0.011
but+0.011
ied+0.011
ained+0.011
just+0.011
Neg logit contributions
 know-0.015
 thank-0.015
 creativity-0.015
 help-0.014
 offer-0.014
Token.
Token ID13
Feature activation+3.45e-03

Pos logit contributions
 know+0.004
 creativity+0.003
 agree+0.003
 supervision+0.003
 pleasure+0.003
Neg logit contributions
 upon-0.004
but-0.004
ied-0.004
just-0.004
itates-0.004
Token He
Token ID679
Feature activation-2.81e-02

Pos logit contributions
itates+0.033
chard+0.033
 upon+0.032
but+0.032
ained+0.032
Neg logit contributions
 creativity-0.033
 agree-0.033
 know-0.033
 supervision-0.033
 bravery-0.032
Token was
Token ID373
Feature activation-1.43e-02

Pos logit contributions
but+0.052
itating+0.050
just+0.049
itates+0.049
ied+0.049
Neg logit contributions
 know-0.032
 thank-0.028
 wish-0.028
 acts-0.028
 agree-0.028
Token so
Token ID523
Feature activation-9.52e-03

Pos logit contributions
but+0.023
itates+0.022
side+0.022
ied+0.022
just+0.022
Neg logit contributions
 know-0.020
 help-0.018
 thank-0.018
 creativity-0.017
 lend-0.017
Token angry
Token ID7954
Feature activation-7.00e-03

Pos logit contributions
itates+0.014
but+0.014
 upon+0.014
ied+0.014
ained+0.014
Neg logit contributions
 know-0.013
 creativity-0.013
 agree-0.013
 acts-0.012
 thank-0.012
Token he
Token ID339
Feature activation-4.87e-02

Pos logit contributions
but+0.012
itates+0.012
 upon+0.012
ied+0.012
just+0.012
Neg logit contributions
 know-0.008
 creativity-0.008
 agree-0.008
 acts-0.008
 amaz-0.007
Token would
Token ID561
Feature activation-5.92e-02

Pos logit contributions
but+0.087
side+0.084
itates+0.083
ied+0.083
just+0.082
Neg logit contributions
 know-0.056
 thank-0.051
 help-0.050
 agree-0.050
 wish-0.049
Token squ
Token ID2809
Feature activation-1.53e-02

Pos logit contributions
itates+0.103
but+0.102
ied+0.101
ained+0.100
 upon+0.099
Neg logit contributions
 know-0.072
 thank-0.065
 help-0.065
 agree-0.065
 creativity-0.064
Tokenawk
Token ID19301
Feature activation-7.31e-03

Pos logit contributions
but+0.024
itates+0.024
chard+0.024
just+0.023
 upon+0.023
Neg logit contributions
 know-0.021
 agree-0.020
 creativity-0.019
 acts-0.019
 wish-0.018
Token and
Token ID290
Feature activation-6.29e-03

Pos logit contributions
itates+0.009
but+0.009
ained+0.009
ied+0.009
chard+0.009
Neg logit contributions
 know-0.012
 agree-0.011
 creativity-0.011
 thank-0.011
 acts-0.011
Token scree
Token ID43465
Feature activation-4.32e-03

Pos logit contributions
itates+0.013
but+0.013
 upon+0.013
chard+0.012
ied+0.012
Neg logit contributions
 know-0.005
 creativity-0.005
 agree-0.005
 acts-0.004
 pleasure-0.004
Tokench
Token ID354
Feature activation-1.37e-02

Pos logit contributions
chard+0.007
but+0.006
itates+0.006
 upon+0.006
ied+0.006
Neg logit contributions
 know-0.006
 agree-0.006
 acts-0.005
 thank-0.005
 help-0.005
Token came
Token ID1625
Feature activation-2.85e-02

Pos logit contributions
itates+0.005
 upon+0.005
but+0.005
ied+0.005
just+0.005
Neg logit contributions
 know-0.007
 creativity-0.007
 pleasure-0.007
 agree-0.007
 amaz-0.006
Token and
Token ID290
Feature activation-6.55e-03

Pos logit contributions
chard+0.045
Prin+0.044
itates+0.044
abby+0.043
side+0.043
Neg logit contributions
 know-0.038
,-0.035
 help-0.035
 swim-0.032
 hop-0.032
Token stole
Token ID18240
Feature activation-2.38e-02

Pos logit contributions
ained+0.013
but+0.012
itates+0.012
side+0.012
ied+0.012
Neg logit contributions
 know-0.007
 help-0.007
 thank-0.007
 give-0.006
 chew-0.006
Token his
Token ID465
Feature activation-7.55e-03

Pos logit contributions
itates+0.034
but+0.034
ied+0.033
 upon+0.033
ained+0.032
Neg logit contributions
 know-0.035
 creativity-0.033
 agree-0.033
 acts-0.032
 thank-0.032
Token paint
Token ID7521
Feature activation-2.63e-02

Pos logit contributions
 upon+0.011
ied+0.011
but+0.011
 into+0.011
chard+0.011
Neg logit contributions
 know-0.010
 agree-0.009
 creativity-0.009
 acts-0.009
 Something-0.009
Tokenbrush
Token ID32680
Feature activation-5.84e-02

Pos logit contributions
itates+0.048
 upon+0.046
chard+0.045
but+0.045
ied+0.044
Neg logit contributions
 know-0.029
 help-0.027
 creativity-0.027
 thank-0.027
 agree-0.027
Token.
Token ID13
Feature activation-9.79e-03

Pos logit contributions
itates+0.120
Prin+0.115
chard+0.111
ied+0.109
 upon+0.109
Neg logit contributions
 know-0.056
 agree-0.049
 help-0.048
 thank-0.046
 chew-0.046
Token The
Token ID383
Feature activation-1.21e-02

Pos logit contributions
ied+0.018
itates+0.018
ained+0.018
but+0.018
just+0.018
Neg logit contributions
 know-0.011
 agree-0.009
 thank-0.009
 creativity-0.009
 Something-0.009
Token painter
Token ID34537
Feature activation-2.71e-02

Pos logit contributions
ained+0.022
itates+0.022
ied+0.022
but+0.021
itating+0.021
Neg logit contributions
 help-0.015
 know-0.014
 thank-0.014
 agree-0.014
 creativity-0.014
Token was
Token ID373
Feature activation-8.37e-03

Pos logit contributions
itates+0.047
but+0.047
ained+0.047
 upon+0.046
itating+0.045
Neg logit contributions
 know-0.032
 agree-0.029
 creativity-0.029
 acts-0.028
 help-0.028
Token sad
Token ID6507
Feature activation+3.42e-03

Pos logit contributions
but+0.013
itates+0.013
ied+0.013
side+0.013
abby+0.013
Neg logit contributions
 know-0.012
 help-0.011
 thank-0.010
 creativity-0.010
 bravery-0.010
Token.
Token ID13
Feature activation-1.76e-02

Pos logit contributions
itates+0.067
but+0.067
ied+0.065
 upon+0.065
ained+0.065
Neg logit contributions
 know-0.052
 creativity-0.049
 agree-0.048
 acts-0.047
 thank-0.047
Token She
Token ID1375
Feature activation-2.79e-02

Pos logit contributions
itates+0.028
but+0.028
ied+0.027
 upon+0.027
ained+0.027
Neg logit contributions
 know-0.023
 creativity-0.022
 agree-0.021
 thank-0.021
 acts-0.021
Token was
Token ID373
Feature activation-2.31e-02

Pos logit contributions
but+0.048
ained+0.047
itates+0.047
just+0.046
itating+0.046
Neg logit contributions
 know-0.036
 creativity-0.031
 agree-0.031
 acts-0.031
 help-0.031
Token so
Token ID523
Feature activation-2.00e-02

Pos logit contributions
but+0.035
itates+0.034
just+0.033
Prin+0.033
abby+0.033
Neg logit contributions
 know-0.034
 help-0.031
 creativity-0.031
 thank-0.030
 agree-0.030
Token sad
Token ID6507
Feature activation-1.11e-02

Pos logit contributions
but+0.034
itates+0.033
just+0.032
side+0.032
chard+0.032
Neg logit contributions
 know-0.027
 creativity-0.024
 help-0.024
 thank-0.023
 agree-0.023
Token she
Token ID673
Feature activation-5.75e-02

Pos logit contributions
itates+0.019
but+0.019
ied+0.019
chard+0.019
ained+0.019
Neg logit contributions
 know-0.013
 creativity-0.012
 agree-0.012
 thank-0.011
 acts-0.011
Token started
Token ID2067
Feature activation-3.86e-02

Pos logit contributions
but+0.102
ained+0.099
side+0.098
itating+0.096
just+0.096
Neg logit contributions
 know-0.072
 help-0.064
 wish-0.063
 agree-0.061
 thank-0.061
Token to
Token ID284
Feature activation-2.79e-02

Pos logit contributions
itates+0.049
abby+0.046
ained+0.044
chard+0.044
but+0.043
Neg logit contributions
 know-0.069
 help-0.067
 creativity-0.065
 agree-0.062
 chew-0.061
Token cry
Token ID3960
Feature activation-2.91e-03

Pos logit contributions
itates+0.058
 upon+0.058
 into+0.058
ied+0.057
ained+0.057
Neg logit contributions
 creativity-0.019
 pleasure-0.018
 generosity-0.018
 acts-0.018
 supervision-0.017
Token.
Token ID13
Feature activation-1.72e-02

Pos logit contributions
itates+0.005
but+0.005
 upon+0.005
ied+0.004
ained+0.004
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.004
 acts-0.003
 thank-0.003
Token\n
Token ID198
Feature activation-1.82e-02

Pos logit contributions
itates+0.028
but+0.027
ied+0.027
 upon+0.027
ained+0.027
Neg logit contributions
 know-0.022
 creativity-0.021
 agree-0.020
 thank-0.020
 help-0.020
Token\n
Token ID198
Feature activation-2.53e-03

Pos logit contributions
chard+0.025
itates+0.025
ied+0.024
just+0.024
but+0.024
Neg logit contributions
 know-0.013
 agree-0.012
 help-0.010
 thank-0.010
 offer-0.010
TokenOne
Token ID3198
Feature activation+6.54e-03

Pos logit contributions
chard+0.004
itates+0.004
rier+0.003
ied+0.003
bel+0.003
Neg logit contributions
 know-0.004
 agree-0.004
 chew-0.003
 wish-0.003
 help-0.003
Token day
Token ID1110
Feature activation+1.71e-02

Pos logit contributions
 know+0.011
 creativity+0.010
 agree+0.010
 thank+0.010
 help+0.010
Neg logit contributions
itates-0.008
but-0.008
ied-0.008
just-0.008
ained-0.008
Token,
Token ID11
Feature activation-4.70e-04

Pos logit contributions
 know+0.012
 creativity+0.011
 agree+0.011
 supervision+0.011
 acts+0.011
Neg logit contributions
 upon-0.038
but-0.037
itates-0.037
ied-0.037
ained-0.036
Token L
Token ID406
Feature activation-1.64e-02

Pos logit contributions
itates+0.001
ied+0.001
but+0.001
just+0.001
ained+0.001
Neg logit contributions
 know-0.000
 creativity-0.000
 help-0.000
 thank-0.000
 agree-0.000
Tokenott
Token ID1252
Feature activation-5.69e-02

Pos logit contributions
itates+0.025
chard+0.025
ained+0.024
itating+0.024
but+0.024
Neg logit contributions
 know-0.023
 creativity-0.021
 agree-0.020
 help-0.020
 wish-0.020
Tokenie
Token ID494
Feature activation-1.81e-02

Pos logit contributions
chard+0.094
itates+0.083
itating+0.083
ifying+0.078
but+0.078
Neg logit contributions
 know-0.084
 acts-0.074
 agree-0.073
 wishes-0.071
 disl-0.071
Token's
Token ID338
Feature activation-4.08e-03

Pos logit contributions
itates+0.027
but+0.027
ied+0.026
ained+0.026
 upon+0.026
Neg logit contributions
 know-0.025
 creativity-0.024
 agree-0.024
 acts-0.023
 thank-0.023
Token mom
Token ID1995
Feature activation+6.09e-03

Pos logit contributions
 upon+0.005
 by+0.005
ied+0.005
bel+0.005
chard+0.005
Neg logit contributions
 know-0.006
 agree-0.006
 disl-0.006
 acts-0.006
 supervision-0.006
Token went
Token ID1816
Feature activation-1.72e-02

Pos logit contributions
 know+0.008
 agree+0.007
 creativity+0.007
 thank+0.007
 help+0.007
Neg logit contributions
itates-0.010
but-0.010
ained-0.009
ied-0.009
chard-0.009
Token to
Token ID284
Feature activation-1.29e-02

Pos logit contributions
itates+0.024
rated+0.023
abby+0.023
but+0.022
ied+0.022
Neg logit contributions
 help-0.029
 know-0.026
 lend-0.026
 give-0.026
 offer-0.025
Token old
Token ID1468
Feature activation-6.85e-03

Pos logit contributions
chard+0.010
but+0.010
ied+0.010
ained+0.009
 upon+0.009
Neg logit contributions
 know-0.009
 creativity-0.009
 agree-0.009
 Something-0.009
 acts-0.009
Token tire
Token ID15867
Feature activation-2.93e-02

Pos logit contributions
 upon+0.011
but+0.011
chard+0.011
itates+0.011
ied+0.011
Neg logit contributions
 know-0.008
 agree-0.008
 creativity-0.008
 supervision-0.007
 acts-0.007
Token.
Token ID13
Feature activation-1.10e-02

Pos logit contributions
itates+0.045
but+0.043
 upon+0.042
ied+0.042
Prin+0.042
Neg logit contributions
 know-0.039
 agree-0.038
 help-0.037
 creativity-0.037
 acts-0.037
Token "
Token ID366
Feature activation-2.58e-02

Pos logit contributions
but+0.017
 upon+0.016
ained+0.016
just+0.015
 into+0.015
Neg logit contributions
 agree-0.016
 creativity-0.015
 supervision-0.015
 know-0.015
 acts-0.015
TokenO
Token ID46
Feature activation-3.55e-02

Pos logit contributions
ied+0.024
itates+0.024
 upon+0.023
but+0.022
ained+0.022
Neg logit contributions
 know-0.050
 Something-0.047
 agree-0.047
 amaz-0.047
 chew-0.047
Tokenoh
Token ID1219
Feature activation-5.64e-02

Pos logit contributions
but+0.033
ained+0.032
itates+0.032
just+0.031
 upon+0.031
Neg logit contributions
 creativity-0.070
 know-0.069
 thank-0.069
 pleasure-0.069
 amaz-0.069
Token,
Token ID11
Feature activation-2.00e-02

Pos logit contributions
 upon+0.090
but+0.090
just+0.087
itates+0.086
ied+0.086
Neg logit contributions
 creativity-0.069
 know-0.068
 acts-0.067
 agree-0.066
 amaz-0.065
Token what
Token ID644
Feature activation-3.10e-03

Pos logit contributions
 upon+0.032
 by+0.030
but+0.028
just+0.027
ied+0.027
Neg logit contributions
 agree-0.030
 acts-0.027
 creativity-0.026
 generosity-0.025
 enthusiasm-0.024
Token can
Token ID460
Feature activation-3.03e-02

Pos logit contributions
 upon+0.005
 by+0.005
itates+0.005
ied+0.005
ained+0.004
Neg logit contributions
 agree-0.004
 creativity-0.004
 amaz-0.004
 supervision-0.004
 pleasure-0.004
Token I
Token ID314
Feature activation-2.10e-02

Pos logit contributions
 upon+0.036
but+0.036
 into+0.035
ied+0.034
just+0.034
Neg logit contributions
 agree-0.054
 amaz-0.050
 pleasure-0.048
 creativity-0.047
 know-0.047
Token do
Token ID466
Feature activation-4.46e-02

Pos logit contributions
 upon+0.033
 by+0.032
 into+0.032
just+0.031
but+0.030
Neg logit contributions
 creativity-0.026
 amaz-0.026
 supervision-0.026
 acts-0.025
 generosity-0.025
Token did
Token ID750
Feature activation-4.53e-02

Pos logit contributions
 upon+0.031
 by+0.027
ied+0.026
chard+0.026
just+0.026
Neg logit contributions
 agree-0.021
 creativity-0.019
 amaz-0.019
 supervision-0.019
 acts-0.018
Token you
Token ID345
Feature activation-2.45e-02

Pos logit contributions
 upon+0.043
chard+0.042
 by+0.040
just+0.039
 into+0.039
Neg logit contributions
 agree-0.084
 amaz-0.083
 pleasure-0.080
 acts-0.080
 know-0.079
Token take
Token ID1011
Feature activation-2.64e-02

Pos logit contributions
 upon+0.051
itates+0.050
but+0.050
ied+0.050
chard+0.050
Neg logit contributions
 creativity-0.018
 acts-0.017
 know-0.017
 amaz-0.017
 agree-0.017
Token all
Token ID477
Feature activation-3.47e-02

Pos logit contributions
 into+0.030
 by+0.027
ied+0.026
 over+0.026
 up+0.026
Neg logit contributions
 agree-0.049
 amaz-0.048
 supervision-0.045
 acts-0.045
 Something-0.045
Token the
Token ID262
Feature activation-2.20e-02

Pos logit contributions
itates+0.073
itating+0.067
chard+0.064
but+0.063
ators+0.063
Neg logit contributions
 know-0.041
 help-0.040
 care-0.034
 thank-0.034
 kindness-0.034
Token carrots
Token ID34397
Feature activation-5.62e-02

Pos logit contributions
itates+0.038
but+0.038
ied+0.037
ained+0.037
 upon+0.037
Neg logit contributions
 know-0.026
 creativity-0.024
 agree-0.024
 acts-0.023
 help-0.023
Token?"
Token ID1701
Feature activation+5.28e-03

Pos logit contributions
itates+0.077
ied+0.073
but+0.072
Prin+0.071
just+0.070
Neg logit contributions
 know-0.090
 agree-0.084
 help-0.084
 offer-0.081
 thank-0.080
Token Benny
Token ID44275
Feature activation-5.00e-03

Pos logit contributions
 amaz+0.006
 supervision+0.006
 pleasure+0.006
 creativity+0.006
 generosity+0.006
Neg logit contributions
but-0.009
 upon-0.009
chard-0.009
 into-0.009
 by-0.008
Token asked
Token ID1965
Feature activation-7.21e-03

Pos logit contributions
itates+0.006
 upon+0.006
Prin+0.006
chard+0.006
but+0.006
Neg logit contributions
 pleasure-0.008
 creativity-0.008
 amaz-0.008
 know-0.008
 supervision-0.007
Token.
Token ID13
Feature activation+8.55e-03

Pos logit contributions
 upon+0.011
but+0.011
itates+0.010
ied+0.010
 into+0.010
Neg logit contributions
 know-0.010
 creativity-0.010
 amaz-0.009
 acts-0.009
 pleasure-0.009
Token
Token ID220
Feature activation-2.40e-02

Pos logit contributions
 know+0.008
 creativity+0.008
 supervision+0.008
 acts+0.007
 wish+0.007
Neg logit contributions
but-0.017
 upon-0.017
chard-0.016
itates-0.016
just-0.016
Tokenâ
Token ID22940
Feature activation-2.20e-02

Pos logit contributions
but+0.020
 upon+0.019
just+0.019
itates+0.019
Prin+0.018
Neg logit contributions
 creativity-0.021
 bravery-0.020
 amaz-0.020
 wish-0.019
 agree-0.019
Token€
Token ID26391
Feature activation-1.30e-02

Pos logit contributions
itates+0.023
but+0.023
 upon+0.023
ied+0.022
ained+0.022
Neg logit contributions
 know-0.040
 agree-0.039
 creativity-0.038
 thank-0.038
 acts-0.038
Token
Token ID198
Feature activation-1.81e-02

Pos logit contributions
itates+0.016
 upon+0.016
but+0.016
just+0.015
chard+0.014
Neg logit contributions
 creativity-0.021
 generosity-0.021
 kindness-0.021
 acts-0.021
 offer-0.021
Token
Token ID198
Feature activation-6.06e-03

Pos logit contributions
but+0.028
itates+0.027
 upon+0.027
just+0.026
ied+0.026
Neg logit contributions
 creativity-0.025
 know-0.024
 acts-0.024
 supervision-0.024
 agree-0.023
TokenP
Token ID47
Feature activation-2.59e-02

Pos logit contributions
itates+0.006
but+0.006
 upon+0.006
ied+0.006
ained+0.005
Neg logit contributions
 know-0.012
 creativity-0.011
 agree-0.011
 acts-0.011
 thank-0.011
Tokenipp
Token ID3974
Feature activation-5.58e-02

Pos logit contributions
 upon+0.033
ied+0.033
but+0.031
itates+0.031
just+0.030
Neg logit contributions
 creativity-0.042
 know-0.041
 bravery-0.040
 amaz-0.040
 help-0.039
Tokena
Token ID64
Feature activation-1.03e-02

Pos logit contributions
ained+0.058
but+0.055
itates+0.055
chard+0.055
just+0.052
Neg logit contributions
 know-0.106
 agree-0.102
 acts-0.098
 creativity-0.097
 thank-0.097
Token gigg
Token ID30442
Feature activation-1.11e-02

Pos logit contributions
but+0.015
 upon+0.015
itates+0.015
ied+0.015
just+0.015
Neg logit contributions
 know-0.014
 creativity-0.014
 supervision-0.014
 pleasure-0.013
 amaz-0.013
Tokenled
Token ID992
Feature activation-2.49e-03

Pos logit contributions
but+0.009
chard+0.009
bel+0.009
side+0.009
just+0.009
Neg logit contributions
 agree-0.022
 acts-0.022
 know-0.022
 thank-0.021
 creativity-0.021
Token and
Token ID290
Feature activation-9.30e-03

Pos logit contributions
itates+0.004
but+0.004
ied+0.004
ained+0.004
 upon+0.004
Neg logit contributions
 know-0.003
 creativity-0.003
 agree-0.003
 thank-0.003
 help-0.003
Token said
Token ID531
Feature activation-1.51e-03

Pos logit contributions
 into+0.014
but+0.014
 upon+0.014
chard+0.013
ied+0.013
Neg logit contributions
 creativity-0.012
 supervision-0.012
 know-0.012
 acts-0.011
 pleasure-0.011
Token the
Token ID262
Feature activation-2.22e-02

Pos logit contributions
ied+0.035
 upon+0.035
chard+0.035
ained+0.034
but+0.034
Neg logit contributions
 know-0.059
 agree-0.059
 acts-0.058
 creativity-0.057
 amaz-0.057
Token nic
Token ID9200
Feature activation-3.74e-02

Pos logit contributions
 upon+0.028
but+0.028
itates+0.027
ained+0.027
just+0.027
Neg logit contributions
 know-0.035
 agree-0.033
 creativity-0.033
 amaz-0.032
 thank-0.032
Tokenest
Token ID395
Feature activation-1.86e-02

Pos logit contributions
itates+0.057
but+0.056
ied+0.055
 upon+0.055
ained+0.055
Neg logit contributions
 know-0.051
 creativity-0.049
 agree-0.048
 acts-0.047
 thank-0.047
Token coats
Token ID30720
Feature activation-2.71e-02

Pos logit contributions
 by+0.025
 upon+0.025
ling+0.024
 into+0.024
but+0.023
Neg logit contributions
 know-0.028
 disl-0.027
 Something-0.026
 creativity-0.026
 supervision-0.026
Token he
Token ID339
Feature activation-3.70e-02

Pos logit contributions
but+0.037
ied+0.035
 upon+0.035
 into+0.035
just+0.035
Neg logit contributions
 know-0.044
 amaz-0.043
 creativity-0.042
 agree-0.042
 acts-0.041
Token had
Token ID550
Feature activation-5.44e-02

Pos logit contributions
itates+0.050
but+0.049
 upon+0.049
ied+0.048
just+0.048
Neg logit contributions
 know-0.057
 creativity-0.055
 agree-0.054
 acts-0.053
 amaz-0.052
Token ever
Token ID1683
Feature activation-3.07e-02

Pos logit contributions
itates+0.101
 hotter+0.098
itating+0.097
chard+0.092
Prin+0.091
Neg logit contributions
 know-0.063
 help-0.059
 agree-0.057
 believe-0.055
 lend-0.055
Token seen
Token ID1775
Feature activation-2.16e-02

Pos logit contributions
itates+0.043
but+0.042
ied+0.041
 upon+0.041
ained+0.041
Neg logit contributions
 know-0.046
 creativity-0.044
 agree-0.043
 acts-0.042
 thank-0.042
Token!
Token ID0
Feature activation-1.12e-02

Pos logit contributions
itates+0.042
itating+0.039
bel+0.038
ied+0.038
Prin+0.038
Neg logit contributions
 know-0.020
 help-0.020
 agree-0.019
 thank-0.019
 creativity-0.019
Token
Token ID198
Feature activation-7.63e-03

Pos logit contributions
but+0.020
itates+0.020
 upon+0.020
ained+0.019
ied+0.019
Neg logit contributions
 know-0.012
 creativity-0.012
 agree-0.011
 acts-0.011
 thank-0.011
Token
Token ID198
Feature activation-1.28e-03

Pos logit contributions
chard+0.015
ied+0.015
itates+0.015
ained+0.015
rier+0.015
Neg logit contributions
 know-0.007
 agree-0.006
 help-0.005
 Something-0.005
 chew-0.005
Token,
Token ID11
Feature activation-1.99e-02

Pos logit contributions
 by+0.008
ling+0.007
 into+0.007
Prin+0.007
 upon+0.007
Neg logit contributions
 supervision-0.007
 pleasure-0.007
 creativity-0.007
 generosity-0.007
 agree-0.007
Token surprised
Token ID6655
Feature activation-2.65e-02

Pos logit contributions
itates+0.031
but+0.030
ied+0.030
 upon+0.030
ained+0.030
Neg logit contributions
 know-0.027
 creativity-0.025
 agree-0.025
 thank-0.025
 acts-0.024
Token voice
Token ID3809
Feature activation-9.90e-03

Pos logit contributions
itates+0.043
but+0.042
ained+0.041
ied+0.041
 upon+0.041
Neg logit contributions
 know-0.035
 creativity-0.033
 agree-0.033
 thank-0.032
 help-0.032
Token,
Token ID11
Feature activation-1.65e-02

Pos logit contributions
itates+0.015
but+0.015
ied+0.015
 upon+0.015
just+0.015
Neg logit contributions
 know-0.013
 creativity-0.013
 agree-0.013
 acts-0.012
 thank-0.012
Token "
Token ID366
Feature activation-2.58e-02

Pos logit contributions
itates+0.026
but+0.026
ied+0.025
 upon+0.025
ained+0.025
Neg logit contributions
 know-0.021
 creativity-0.020
 agree-0.020
 thank-0.020
 acts-0.019
Tokenoh
Token ID1219
Feature activation-5.44e-02

Pos logit contributions
ied+0.031
chard+0.030
itates+0.028
 upon+0.028
rier+0.028
Neg logit contributions
 know-0.042
 Something-0.039
 acts-0.039
 chew-0.039
 agree-0.037
Token my
Token ID616
Feature activation-9.42e-03

Pos logit contributions
 upon+0.096
but+0.094
just+0.092
itates+0.090
bel+0.090
Neg logit contributions
 creativity-0.059
 acts-0.058
 know-0.057
 agree-0.056
 generosity-0.055
Token!"
Token ID2474
Feature activation-1.09e-02

Pos logit contributions
 upon+0.016
but+0.015
itates+0.015
ied+0.014
just+0.014
Neg logit contributions
 agree-0.012
 know-0.011
 supervision-0.011
 creativity-0.011
 generosity-0.011
Token
Token ID220
Feature activation-2.84e-02

Pos logit contributions
but+0.016
itates+0.015
 upon+0.015
just+0.015
 into+0.015
Neg logit contributions
 creativity-0.015
 supervision-0.015
 know-0.015
 acts-0.015
 amaz-0.014
Token
Token ID198
Feature activation-1.18e-02

Pos logit contributions
itates+0.043
but+0.043
 upon+0.043
ained+0.042
ied+0.041
Neg logit contributions
 creativity-0.037
 know-0.037
 acts-0.036
 agree-0.036
 amaz-0.035
Token
Token ID198
Feature activation-1.54e-03

Pos logit contributions
itates+0.021
ied+0.021
ained+0.021
chard+0.021
but+0.021
Neg logit contributions
 know-0.013
 agree-0.012
 help-0.011
 creativity-0.011
 thank-0.011
Token gave
Token ID2921
Feature activation-1.35e-02

Pos logit contributions
 know+0.016
 creativity+0.015
 agree+0.015
 acts+0.014
 pleasure+0.014
Neg logit contributions
itates-0.021
but-0.021
 upon-0.021
ied-0.021
just-0.020
Token Lily
Token ID20037
Feature activation-2.02e-02

Pos logit contributions
itates+0.024
but+0.023
itating+0.023
ained+0.023
just+0.022
Neg logit contributions
 know-0.017
 help-0.017
 kindness-0.015
 thank-0.015
 creativity-0.015
Token a
Token ID257
Feature activation-2.15e-02

Pos logit contributions
itates+0.033
but+0.033
ied+0.032
 upon+0.032
ained+0.032
Neg logit contributions
 know-0.025
 creativity-0.023
 agree-0.023
 thank-0.022
 acts-0.022
Token band
Token ID4097
Feature activation-4.20e-02

Pos logit contributions
 upon+0.036
itates+0.036
but+0.035
ied+0.035
just+0.034
Neg logit contributions
 know-0.025
 agree-0.024
 creativity-0.024
 acts-0.023
 supervision-0.023
Token-
Token ID12
Feature activation-3.51e-02

Pos logit contributions
but+0.048
ied+0.046
itates+0.046
bel+0.046
just+0.046
Neg logit contributions
 creativity-0.070
 know-0.070
 acts-0.069
 agree-0.069
 amaz-0.069
Tokenaid
Token ID1698
Feature activation-5.40e-02

Pos logit contributions
itates+0.081
itating+0.080
 upon+0.079
ained+0.079
chard+0.079
Neg logit contributions
 know-0.023
 help-0.020
 agree-0.017
 thank-0.017
 creativity-0.017
Token for
Token ID329
Feature activation-2.53e-02

Pos logit contributions
itates+0.092
chard+0.088
itating+0.087
ained+0.085
Prin+0.085
Neg logit contributions
 know-0.067
 help-0.066
 thank-0.065
 kindness-0.062
 agree-0.062
Token her
Token ID607
Feature activation-4.46e-03

Pos logit contributions
chard+0.026
but+0.026
 upon+0.025
 into+0.025
just+0.025
Neg logit contributions
 acts-0.047
 amaz-0.046
 know-0.045
 agree-0.045
 Something-0.044
Token knee
Token ID10329
Feature activation-1.71e-02

Pos logit contributions
 upon+0.006
 by+0.006
but+0.006
 into+0.006
just+0.006
Neg logit contributions
 amaz-0.006
 know-0.006
 Something-0.006
 acts-0.006
 Fin-0.006
Token and
Token ID290
Feature activation+4.73e-04

Pos logit contributions
but+0.023
 into+0.022
ained+0.022
just+0.022
 by+0.021
Neg logit contributions
 amaz-0.027
 know-0.027
 creativity-0.026
 acts-0.026
 pleasure-0.026
Token made
Token ID925
Feature activation-2.25e-02

Pos logit contributions
 know+0.001
 agree+0.001
 pleasure+0.001
 amaz+0.001
 acts+0.001
Neg logit contributions
 into-0.001
but-0.001
ied-0.001
chard-0.001
 upon-0.001
Token.
Token ID13
Feature activation-2.44e-03

Pos logit contributions
but+0.007
 into+0.007
 upon+0.006
ied+0.006
side+0.006
Neg logit contributions
 know-0.009
 amaz-0.008
 creativity-0.008
 pleasure-0.008
 acts-0.008
Token
Token ID198
Feature activation-4.77e-03

Pos logit contributions
itates+0.004
but+0.004
ied+0.004
 upon+0.004
ained+0.004
Neg logit contributions
 know-0.003
 agree-0.003
 creativity-0.003
 thank-0.003
 acts-0.003
Token
Token ID198
Feature activation+5.81e-04

Pos logit contributions
itates+0.009
chard+0.009
ied+0.009
ained+0.009
but+0.009
Neg logit contributions
 know-0.005
 agree-0.004
 help-0.004
 thank-0.004
 Something-0.004
Token"
Token ID1
Feature activation-1.25e-02

Pos logit contributions
 know+0.001
 agree+0.001
 Something+0.001
 chew+0.001
 believe+0.001
Neg logit contributions
ators-0.001
bel-0.001
itates-0.001
ied-0.001
ained-0.001
TokenWo
Token ID49450
Feature activation-3.34e-02

Pos logit contributions
ied+0.015
itates+0.014
bel+0.014
 upon+0.013
 into+0.013
Neg logit contributions
 know-0.021
 Something-0.021
 agree-0.020
 chew-0.020
 amaz-0.020
Tokenoh
Token ID1219
Feature activation-5.38e-02

Pos logit contributions
itates+0.044
but+0.042
chard+0.042
Prin+0.041
ained+0.040
Neg logit contributions
 know-0.053
 agree-0.052
 thank-0.051
 help-0.049
 creativity-0.048
Tokenoo
Token ID2238
Feature activation-1.61e-02

Pos logit contributions
but+0.078
 upon+0.077
itates+0.076
ied+0.076
just+0.075
Neg logit contributions
 know-0.076
 creativity-0.073
 acts-0.072
 agree-0.072
 amaz-0.070
Token!"
Token ID2474
Feature activation+9.66e-04

Pos logit contributions
ied+0.023
but+0.023
 upon+0.022
just+0.022
itates+0.022
Neg logit contributions
 know-0.023
 amaz-0.022
 creativity-0.022
 agree-0.022
 Something-0.021
Token she
Token ID673
Feature activation-8.24e-03

Pos logit contributions
 know+0.001
 agree+0.001
 creativity+0.001
 thank+0.001
 acts+0.001
Neg logit contributions
itates-0.002
but-0.002
ied-0.002
 upon-0.002
ained-0.002
Token said
Token ID531
Feature activation-1.03e-02

Pos logit contributions
but+0.013
 upon+0.012
Prin+0.012
 into+0.012
ied+0.012
Neg logit contributions
 creativity-0.010
 pleasure-0.010
 know-0.010
 amaz-0.010
 supervision-0.010
Token,
Token ID11
Feature activation-3.48e-03

Pos logit contributions
but+0.016
itates+0.015
 upon+0.015
ied+0.015
 into+0.015
Neg logit contributions
 know-0.014
 creativity-0.014
 acts-0.014
 agree-0.013
 amaz-0.013
Token it
Token ID340
Feature activation+8.10e-03

Pos logit contributions
 agree+0.008
 know+0.008
 creativity+0.008
 generosity+0.008
 acts+0.008
Neg logit contributions
 into-0.006
 upon-0.005
ied-0.005
but-0.005
 by-0.005
Token,
Token ID11
Feature activation-1.50e-02

Pos logit contributions
 agree+0.010
 know+0.010
 supervision+0.009
 creativity+0.009
 disl+0.009
Neg logit contributions
 into-0.014
but-0.014
 upon-0.014
 by-0.014
just-0.013
Token a
Token ID257
Feature activation-2.20e-02

Pos logit contributions
 upon+0.022
but+0.021
chard+0.020
ied+0.020
itates+0.020
Neg logit contributions
 know-0.021
 agree-0.021
 creativity-0.020
 pleasure-0.020
 acts-0.020
Token loud
Token ID7812
Feature activation-8.45e-03

Pos logit contributions
 upon+0.030
itates+0.029
Prin+0.029
chard+0.029
but+0.028
Neg logit contributions
 know-0.031
 agree-0.031
 acts-0.031
 supervision-0.031
 creativity-0.030
Token noise
Token ID7838
Feature activation+6.97e-03

Pos logit contributions
 by+0.011
 into+0.011
 upon+0.011
Prin+0.011
ling+0.011
Neg logit contributions
 supervision-0.012
 creativity-0.012
 pleasure-0.012
 generosity-0.012
 agree-0.011
Token filled
Token ID5901
Feature activation-5.37e-02

Pos logit contributions
 know+0.012
 pleasure+0.012
 creativity+0.012
 supervision+0.012
 agree+0.012
Neg logit contributions
ied-0.008
 into-0.008
but-0.008
itates-0.008
 upon-0.008
Token the
Token ID262
Feature activation-4.21e-03

Pos logit contributions
itates+0.085
but+0.081
ained+0.081
 upon+0.079
just+0.079
Neg logit contributions
 know-0.072
 agree-0.068
 amaz-0.067
 creativity-0.067
 help-0.066
Token air
Token ID1633
Feature activation-1.86e-02

Pos logit contributions
ied+0.007
but+0.007
itates+0.007
ained+0.007
chard+0.007
Neg logit contributions
 know-0.005
 wish-0.005
 creativity-0.005
 amaz-0.005
 help-0.005
Token.
Token ID13
Feature activation-4.66e-03

Pos logit contributions
but+0.026
 upon+0.026
 into+0.026
just+0.025
ied+0.025
Neg logit contributions
 know-0.028
 creativity-0.027
 acts-0.026
 agree-0.026
 pleasure-0.026
Token
Token ID198
Feature activation-1.18e-02

Pos logit contributions
itates+0.008
but+0.008
ied+0.008
 upon+0.008
ained+0.008
Neg logit contributions
 know-0.005
 creativity-0.005
 agree-0.005
 thank-0.005
 acts-0.005
Token
Token ID198
Feature activation-4.00e-03

Pos logit contributions
itates+0.019
but+0.019
ied+0.019
 upon+0.019
ained+0.019
Neg logit contributions
 know-0.014
 creativity-0.014
 agree-0.013
 thank-0.013
 acts-0.013
Token the
Token ID262
Feature activation-3.05e-02

Pos logit contributions
 know+0.001
 creativity+0.001
 thank+0.001
 agree+0.001
 help+0.001
Neg logit contributions
itates-0.001
but-0.001
ied-0.001
just-0.001
 upon-0.001
Token most
Token ID749
Feature activation-3.48e-03

Pos logit contributions
itates+0.044
ied+0.043
but+0.043
ained+0.042
 upon+0.042
Neg logit contributions
 know-0.044
 creativity-0.042
 agree-0.042
 thank-0.042
 help-0.041
Token beautiful
Token ID4950
Feature activation-1.18e-02

Pos logit contributions
but+0.005
just+0.005
 by+0.005
 upon+0.005
iring+0.005
Neg logit contributions
 supervision-0.005
 know-0.005
 agree-0.005
 creativity-0.005
 acts-0.005
Token thing
Token ID1517
Feature activation-1.09e-02

Pos logit contributions
itates+0.018
but+0.018
ied+0.018
 upon+0.018
ained+0.018
Neg logit contributions
 know-0.016
 creativity-0.015
 agree-0.015
 thank-0.015
 acts-0.014
Token she
Token ID673
Feature activation-4.62e-02

Pos logit contributions
but+0.014
itates+0.014
ied+0.013
 upon+0.013
just+0.013
Neg logit contributions
 know-0.018
 creativity-0.017
 agree-0.017
 acts-0.016
 thank-0.016
Token had
Token ID550
Feature activation-5.36e-02

Pos logit contributions
itating+0.086
ained+0.086
but+0.085
side+0.084
rier+0.083
Neg logit contributions
 know-0.053
 help-0.047
 agree-0.045
 thank-0.044
 chew-0.044
Token ever
Token ID1683
Feature activation-3.17e-02

Pos logit contributions
 hotter+0.099
itates+0.098
rier+0.098
Prin+0.094
chard+0.094
Neg logit contributions
 know-0.061
 help-0.056
 believe-0.055
 agree-0.054
 give-0.049
Token seen
Token ID1775
Feature activation-2.22e-02

Pos logit contributions
itates+0.044
but+0.043
ied+0.042
 upon+0.042
ained+0.042
Neg logit contributions
 know-0.048
 creativity-0.046
 agree-0.046
 acts-0.044
 thank-0.044
Token.
Token ID13
Feature activation-9.54e-03

Pos logit contributions
itates+0.042
bel+0.041
itating+0.039
ied+0.039
Prin+0.039
Neg logit contributions
 know-0.021
 help-0.021
 agree-0.020
 thank-0.020
 acts-0.019
Token She
Token ID1375
Feature activation-1.11e-02

Pos logit contributions
itates+0.016
but+0.016
ied+0.016
 upon+0.015
ained+0.015
Neg logit contributions
 know-0.012
 creativity-0.011
 agree-0.011
 thank-0.010
 acts-0.010
Token slowly
Token ID6364
Feature activation+7.71e-04

Pos logit contributions
itates+0.018
but+0.018
ained+0.018
ied+0.018
itating+0.018
Neg logit contributions
 know-0.014
 agree-0.013
 creativity-0.013
 thank-0.012
 acts-0.012
Token and
Token ID290
Feature activation-6.56e-03

Pos logit contributions
 into+0.044
 upon+0.042
 by+0.042
but+0.041
ied+0.040
Neg logit contributions
 creativity-0.052
 amaz-0.051
 pleasure-0.051
 know-0.051
 disl-0.050
Token put
Token ID1234
Feature activation-3.21e-02

Pos logit contributions
but+0.010
ied+0.010
 into+0.010
chard+0.010
itates+0.010
Neg logit contributions
 know-0.009
 pleasure-0.008
 agree-0.008
 creativity-0.008
 acts-0.008
Token a
Token ID257
Feature activation-3.24e-02

Pos logit contributions
itates+0.047
but+0.047
ied+0.046
 upon+0.046
just+0.045
Neg logit contributions
 know-0.045
 creativity-0.043
 agree-0.043
 acts-0.042
 thank-0.041
Token band
Token ID4097
Feature activation-4.39e-02

Pos logit contributions
itates+0.062
but+0.061
ained+0.061
ied+0.060
just+0.059
Neg logit contributions
 know-0.033
 creativity-0.031
 help-0.030
 thank-0.030
 agree-0.029
Token-
Token ID12
Feature activation-3.81e-02

Pos logit contributions
itates+0.065
chard+0.063
 upon+0.062
but+0.062
itating+0.061
Neg logit contributions
 know-0.065
 help-0.060
 agree-0.060
 creativity-0.059
 thank-0.058
Tokenaid
Token ID1698
Feature activation-5.34e-02

Pos logit contributions
itates+0.091
itating+0.091
ained+0.090
chard+0.090
 upon+0.089
Neg logit contributions
 know-0.022
 help-0.020
 care-0.015
 chew-0.014
 kindness-0.013
Token on
Token ID319
Feature activation-2.25e-02

Pos logit contributions
itates+0.077
chard+0.072
Prin+0.071
abby+0.070
itating+0.069
Neg logit contributions
 help-0.083
 thank-0.078
 agree-0.075
,-0.074
 know-0.073
Token his
Token ID465
Feature activation-1.25e-02

Pos logit contributions
itates+0.033
but+0.032
ied+0.032
 upon+0.032
ained+0.032
Neg logit contributions
 know-0.032
 creativity-0.031
 agree-0.031
 thank-0.030
 help-0.030
Token cheek
Token ID19353
Feature activation-1.20e-02

Pos logit contributions
but+0.019
itates+0.019
 into+0.018
 upon+0.018
chard+0.018
Neg logit contributions
 know-0.017
 agree-0.016
 creativity-0.015
 acts-0.015
 pleasure-0.015
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
itates+0.022
ied+0.021
chard+0.021
Prin+0.021
but+0.021
Neg logit contributions
 know-0.013
 agree-0.012
 help-0.011
 creativity-0.011
 thank-0.011
Token He
Token ID679
Feature activation-1.26e-02

Pos logit contributions
itates+0.019
but+0.019
 upon+0.018
ied+0.018
ained+0.018
Neg logit contributions
 know-0.014
 creativity-0.013
 agree-0.013
 acts-0.013
 help-0.013
Token Lucy
Token ID22162
Feature activation-2.36e-02

Pos logit contributions
itates+0.045
but+0.045
 upon+0.044
ied+0.044
ained+0.043
Neg logit contributions
 know-0.032
 creativity-0.031
 agree-0.030
 acts-0.029
 thank-0.029
Token was
Token ID373
Feature activation-8.72e-03

Pos logit contributions
 upon+0.031
itates+0.030
ied+0.030
but+0.029
 into+0.029
Neg logit contributions
 creativity-0.037
 pleasure-0.035
 know-0.035
 supervision-0.035
 thank-0.035
Token feeling
Token ID4203
Feature activation-3.08e-03

Pos logit contributions
itates+0.016
but+0.016
ied+0.015
abby+0.015
itating+0.015
Neg logit contributions
 know-0.010
 help-0.010
 creativity-0.009
 lend-0.009
 agree-0.009
Token tired
Token ID10032
Feature activation-1.49e-02

Pos logit contributions
itates+0.004
but+0.004
ied+0.004
 upon+0.004
just+0.004
Neg logit contributions
 know-0.005
 creativity-0.004
 agree-0.004
 help-0.004
 thank-0.004
Token so
Token ID523
Feature activation-1.18e-02

Pos logit contributions
itates+0.024
but+0.024
ied+0.023
 upon+0.023
ained+0.023
Neg logit contributions
 know-0.019
 creativity-0.018
 agree-0.018
 acts-0.017
 thank-0.017
Token she
Token ID673
Feature activation-5.32e-02

Pos logit contributions
itates+0.017
 upon+0.017
but+0.017
ied+0.017
ained+0.016
Neg logit contributions
 know-0.017
 creativity-0.016
 agree-0.016
 acts-0.016
 thank-0.015
Token went
Token ID1816
Feature activation-2.85e-02

Pos logit contributions
but+0.094
itating+0.090
ained+0.088
itates+0.088
just+0.088
Neg logit contributions
 know-0.068
 wish-0.061
 help-0.060
 agree-0.059
 thank-0.059
Token to
Token ID284
Feature activation-2.72e-02

Pos logit contributions
itates+0.029
but+0.028
ied+0.027
chard+0.026
ained+0.026
Neg logit contributions
 know-0.055
 help-0.053
 creativity-0.052
 thank-0.052
 offer-0.051
Token the
Token ID262
Feature activation-1.52e-02

Pos logit contributions
 upon+0.041
itates+0.039
but+0.039
chard+0.039
 into+0.038
Neg logit contributions
 creativity-0.036
 know-0.035
 agree-0.035
 acts-0.034
 supervision-0.034
Token classroom
Token ID15806
Feature activation-2.54e-02

Pos logit contributions
itates+0.028
ied+0.027
but+0.027
just+0.026
ained+0.026
Neg logit contributions
 know-0.016
 help-0.016
 creativity-0.015
 thank-0.015
 chew-0.015
Token for
Token ID329
Feature activation-3.43e-02

Pos logit contributions
itates+0.042
ained+0.041
ied+0.041
but+0.040
chard+0.040
Neg logit contributions
 know-0.031
 agree-0.030
 creativity-0.029
 help-0.029
 thank-0.028
Token for
Token ID329
Feature activation-2.18e-02

Pos logit contributions
itates+0.008
but+0.008
ied+0.007
 upon+0.007
ained+0.007
Neg logit contributions
 know-0.016
 creativity-0.016
 agree-0.015
 thank-0.015
 acts-0.015
Token the
Token ID262
Feature activation-1.32e-02

Pos logit contributions
 into+0.020
 late+0.020
 by+0.018
side+0.018
 upon+0.017
Neg logit contributions
 acts-0.042
 agree-0.041
 Something-0.039
 believe-0.039
 amaz-0.038
Token adventure
Token ID8855
Feature activation-1.30e-02

Pos logit contributions
 upon+0.020
but+0.020
just+0.020
ained+0.019
itates+0.019
Neg logit contributions
 know-0.018
 creativity-0.017
 agree-0.017
 acts-0.016
 Something-0.016
Token,
Token ID11
Feature activation-1.09e-02

Pos logit contributions
but+0.018
ied+0.018
 into+0.018
just+0.017
 by+0.017
Neg logit contributions
 know-0.020
 amaz-0.019
 creativity-0.019
 agree-0.018
 wish-0.018
Token he
Token ID339
Feature activation-1.37e-02

Pos logit contributions
but+0.015
 upon+0.014
just+0.014
chard+0.014
 into+0.014
Neg logit contributions
 know-0.017
 agree-0.016
 acts-0.016
 creativity-0.016
 supervision-0.015
Token packed
Token ID11856
Feature activation-5.32e-02

Pos logit contributions
itates+0.018
but+0.017
ied+0.017
 upon+0.017
ained+0.017
Neg logit contributions
 know-0.022
 creativity-0.021
 agree-0.021
 acts-0.020
 thank-0.020
Token his
Token ID465
Feature activation-1.55e-02

Pos logit contributions
itates+0.091
itating+0.086
ained+0.083
but+0.083
Prin+0.081
Neg logit contributions
 know-0.071
 thank-0.068
 help-0.068
 creativity-0.067
 offer-0.066
Token bag
Token ID6131
Feature activation-3.19e-02

Pos logit contributions
 upon+0.025
 into+0.024
chard+0.024
bel+0.023
ied+0.023
Neg logit contributions
 agree-0.020
 know-0.020
 believe-0.018
 creativity-0.017
 disl-0.017
Token.
Token ID13
Feature activation-1.19e-02

Pos logit contributions
itates+0.048
Prin+0.045
itating+0.043
ained+0.042
Oh+0.042
Neg logit contributions
 know-0.046
 help-0.046
 offer-0.044
 agree-0.044
 creativity-0.043
Token He
Token ID679
Feature activation-7.69e-03

Pos logit contributions
but+0.018
 upon+0.017
itates+0.017
ained+0.017
just+0.016
Neg logit contributions
 creativity-0.017
 know-0.016
 acts-0.016
 bravery-0.016
 supervision-0.016
Token put
Token ID1234
Feature activation-2.63e-02

Pos logit contributions
 upon+0.009
ied+0.009
bel+0.009
itates+0.009
but+0.009
Neg logit contributions
 supervision-0.013
 know-0.012
 amaz-0.012
 creativity-0.012
 pleasure-0.012
Token very
Token ID845
Feature activation-2.14e-02

Pos logit contributions
 upon+0.015
chard+0.015
but+0.015
itates+0.015
ied+0.015
Neg logit contributions
 know-0.013
 acts-0.013
 agree-0.012
 creativity-0.012
 supervision-0.012
Token special
Token ID2041
Feature activation-6.21e-03

Pos logit contributions
itates+0.037
but+0.037
ied+0.036
 upon+0.036
ained+0.036
Neg logit contributions
 know-0.025
 creativity-0.023
 agree-0.023
 thank-0.022
 acts-0.022
Token toy
Token ID13373
Feature activation-2.23e-03

Pos logit contributions
 upon+0.010
 by+0.010
 into+0.009
eh+0.009
but+0.009
Neg logit contributions
 know-0.008
 supervision-0.008
 agree-0.007
 acts-0.007
 amaz-0.007
Token for
Token ID329
Feature activation-2.43e-02

Pos logit contributions
but+0.003
 into+0.003
just+0.003
 by+0.003
ied+0.003
Neg logit contributions
 know-0.004
 amaz-0.004
 creativity-0.004
 agree-0.003
 disl-0.003
Token L
Token ID406
Feature activation-1.39e-02

Pos logit contributions
ied+0.024
 upon+0.024
but+0.024
 into+0.023
ained+0.022
Neg logit contributions
 know-0.045
 acts-0.044
 agree-0.044
 creativity-0.043
 amaz-0.043
Tokenott
Token ID1252
Feature activation-5.31e-02

Pos logit contributions
itates+0.022
chard+0.022
ained+0.021
itating+0.021
but+0.021
Neg logit contributions
 know-0.018
 creativity-0.017
 agree-0.016
 help-0.016
 acts-0.016
Tokenie
Token ID494
Feature activation-2.10e-02

Pos logit contributions
itates+0.068
chard+0.065
itating+0.063
but+0.058
Oh+0.058
Neg logit contributions
 know-0.092
 agree-0.087
 supervision-0.085
 creativity-0.084
 acts-0.084
Token.
Token ID13
Feature activation+1.92e-03

Pos logit contributions
 by+0.024
 into+0.023
 upon+0.023
but+0.022
ied+0.022
Neg logit contributions
 amaz-0.038
 know-0.035
 creativity-0.035
 pleasure-0.035
 Something-0.035
Token It
Token ID632
Feature activation+5.96e-03

Pos logit contributions
 know+0.003
 agree+0.003
 creativity+0.003
 bravery+0.003
 acts+0.003
Neg logit contributions
but-0.003
 upon-0.003
itates-0.003
ained-0.003
ied-0.003
Token was
Token ID373
Feature activation+3.18e-03

Pos logit contributions
 agree+0.011
 supervision+0.011
 know+0.011
 creativity+0.011
 pleasure+0.011
Neg logit contributions
 into-0.006
 upon-0.006
but-0.006
ied-0.006
just-0.006
Token a
Token ID257
Feature activation-1.26e-02

Pos logit contributions
 know+0.005
 agree+0.005
 creativity+0.004
 acts+0.004
 supervision+0.004
Neg logit contributions
 upon-0.004
ied-0.004
but-0.004
ained-0.004
itates-0.004
Token!
Token ID0
Feature activation-1.56e-02

Pos logit contributions
itates+0.036
Prin+0.033
but+0.033
bel+0.032
abby+0.032
Neg logit contributions
 paws-0.029
,-0.029
 creativity-0.028
 amaz-0.028
 help-0.028
Token She
Token ID1375
Feature activation-2.76e-02

Pos logit contributions
itates+0.027
ained+0.027
ied+0.027
just+0.026
ind+0.026
Neg logit contributions
 know-0.019
 creativity-0.017
 agree-0.017
 Something-0.017
 thank-0.017
Token was
Token ID373
Feature activation-1.83e-02

Pos logit contributions
ained+0.046
but+0.046
itates+0.045
just+0.044
 upon+0.044
Neg logit contributions
 know-0.036
 creativity-0.032
 agree-0.032
 help-0.031
 acts-0.031
Token so
Token ID523
Feature activation-1.60e-02

Pos logit contributions
itates+0.024
but+0.023
 upon+0.023
ied+0.023
ained+0.023
Neg logit contributions
 know-0.029
 creativity-0.028
 agree-0.028
 acts-0.027
 thank-0.027
Token sad
Token ID6507
Feature activation-1.09e-02

Pos logit contributions
itates+0.022
but+0.020
 upon+0.020
ied+0.020
ained+0.020
Neg logit contributions
 creativity-0.024
 acts-0.024
 know-0.024
 agree-0.024
 supervision-0.024
Token and
Token ID290
Feature activation-5.30e-02

Pos logit contributions
but+0.018
itates+0.018
 upon+0.018
ied+0.018
just+0.018
Neg logit contributions
 know-0.013
 creativity-0.012
 agree-0.012
 acts-0.012
 thank-0.012
Token didn
Token ID1422
Feature activation-3.65e-02

Pos logit contributions
but+0.094
abby+0.090
just+0.090
ained+0.089
itates+0.089
Neg logit contributions
 know-0.066
 thank-0.059
 help-0.059
 wish-0.058
 creativity-0.057
Token't
Token ID470
Feature activation-4.59e-02

Pos logit contributions
itates+0.042
but+0.041
ied+0.041
chard+0.041
 upon+0.040
Neg logit contributions
 know-0.064
 help-0.060
 creativity-0.060
 thank-0.060
 agree-0.059
Token know
Token ID760
Feature activation-3.06e-04

Pos logit contributions
itates+0.113
 upon+0.113
 into+0.110
but+0.109
just+0.108
Neg logit contributions
 supervision-0.016
 creativity-0.016
 acts-0.016
 pleasure-0.015
 generosity-0.014
Token what
Token ID644
Feature activation-1.12e-02

Pos logit contributions
but+0.000
 upon+0.000
chard+0.000
ied+0.000
just+0.000
Neg logit contributions
 know-0.000
 agree-0.000
 acts-0.000
 pleasure-0.000
 creativity-0.000
Token to
Token ID284
Feature activation-8.96e-03

Pos logit contributions
itates+0.008
but+0.008
ied+0.008
 upon+0.008
just+0.008
Neg logit contributions
 know-0.024
 creativity-0.023
 agree-0.023
 acts-0.023
 thank-0.023
Token some
Token ID617
Feature activation-2.35e-02

Pos logit contributions
bel+0.022
itates+0.022
itating+0.020
abby+0.020
chard+0.019
Neg logit contributions
 know-0.029
 help-0.028
 agree-0.026
 thank-0.026
 Sammy-0.026
Token books
Token ID3835
Feature activation-2.43e-02

Pos logit contributions
itates+0.036
but+0.036
ied+0.035
 upon+0.035
ained+0.034
Neg logit contributions
 know-0.032
 creativity-0.031
 agree-0.030
 thank-0.029
 help-0.029
Token off
Token ID572
Feature activation-2.83e-02

Pos logit contributions
itates+0.027
ind+0.026
abby+0.026
Prin+0.026
chard+0.026
Neg logit contributions
 know-0.043
 help-0.041
 thank-0.041
 agree-0.040
 are-0.039
Token the
Token ID262
Feature activation-2.09e-02

Pos logit contributions
itates+0.049
but+0.047
just+0.046
itating+0.046
ained+0.045
Neg logit contributions
 know-0.034
 help-0.032
 thank-0.032
 creativity-0.032
 acts-0.031
Token books
Token ID3835
Feature activation-3.47e-02

Pos logit contributions
ied+0.036
itates+0.036
ained+0.036
itating+0.035
but+0.035
Neg logit contributions
 know-0.025
 help-0.025
 creativity-0.025
 thank-0.024
 chew-0.023
Tokenhe
Token ID258
Feature activation-5.29e-02

Pos logit contributions
itates+0.048
but+0.045
 upon+0.045
chard+0.045
ied+0.045
Neg logit contributions
 know-0.052
 thank-0.049
 agree-0.049
 help-0.048
 offer-0.046
Tokenlf
Token ID1652
Feature activation-3.71e-02

Pos logit contributions
itates+0.094
itating+0.091
chard+0.085
ained+0.085
Prin+0.083
Neg logit contributions
 know-0.064
 help-0.062
 creativity-0.062
 acts-0.061
 offer-0.060
Token.
Token ID13
Feature activation-7.97e-03

Pos logit contributions
itates+0.071
but+0.067
Prin+0.066
 upon+0.066
just+0.066
Neg logit contributions
 know-0.037
 agree-0.034
 help-0.034
 creativity-0.034
 acts-0.033
Token Mom
Token ID11254
Feature activation-8.33e-04

Pos logit contributions
but+0.013
itates+0.013
 upon+0.012
ied+0.012
ained+0.012
Neg logit contributions
 know-0.010
 creativity-0.010
 agree-0.010
 acts-0.009
 thank-0.009
Tokenmy
Token ID1820
Feature activation+2.72e-03

Pos logit contributions
itates+0.001
but+0.001
chard+0.001
ained+0.001
itating+0.001
Neg logit contributions
 know-0.001
 agree-0.001
 help-0.001
 creativity-0.001
 acts-0.001
Token was
Token ID373
Feature activation-5.01e-03

Pos logit contributions
 pleasure+0.004
 creativity+0.004
 know+0.004
 amaz+0.004
 agree+0.004
Neg logit contributions
but-0.004
itates-0.004
 upon-0.004
ied-0.004
just-0.004
Token.
Token ID13
Feature activation-1.29e-03

Pos logit contributions
 know+0.006
 creativity+0.006
 agree+0.005
 acts+0.005
 thank+0.005
Neg logit contributions
but-0.007
itates-0.007
 upon-0.007
ied-0.007
ained-0.007
Token Pe
Token ID2631
Feature activation-1.86e-02

Pos logit contributions
but+0.002
 upon+0.002
itates+0.002
just+0.002
chard+0.002
Neg logit contributions
 creativity-0.002
 know-0.002
 acts-0.002
 bravery-0.002
 help-0.002
Tokenering
Token ID1586
Feature activation+1.55e-02

Pos logit contributions
but+0.023
ained+0.022
itates+0.022
Prin+0.022
side+0.021
Neg logit contributions
 agree-0.031
 know-0.031
 supervision-0.030
 creativity-0.028
 acts-0.028
Token around
Token ID1088
Feature activation+5.50e-03

Pos logit contributions
 know+0.030
 creativity+0.028
 thank+0.028
 agree+0.028
 help+0.028
Neg logit contributions
itates-0.015
ained-0.015
but-0.014
Prin-0.014
just-0.014
Token the
Token ID262
Feature activation-2.80e-04

Pos logit contributions
 know+0.008
 creativity+0.008
 agree+0.008
 acts+0.008
 thank+0.007
Neg logit contributions
but-0.008
itates-0.008
 upon-0.008
ied-0.007
just-0.007
Token corner
Token ID5228
Feature activation+2.34e-02

Pos logit contributions
itates+0.001
but+0.001
ied+0.001
ained+0.001
just+0.001
Neg logit contributions
 know-0.000
 creativity-0.000
 agree-0.000
 help-0.000
 thank-0.000
Token,
Token ID11
Feature activation+1.40e-02

Pos logit contributions
 know+0.030
 creativity+0.029
 agree+0.028
 acts+0.028
 thank+0.027
Neg logit contributions
but-0.037
 upon-0.037
ied-0.037
itates-0.036
just-0.036
Token he
Token ID339
Feature activation+1.38e-02

Pos logit contributions
 know+0.019
 creativity+0.018
 agree+0.018
 thank+0.017
 acts+0.017
Neg logit contributions
itates-0.021
but-0.021
ied-0.021
ained-0.020
 upon-0.020
Token saw
Token ID2497
Feature activation-5.61e-04

Pos logit contributions
 know+0.021
 creativity+0.020
 agree+0.020
 thank+0.019
 acts+0.019
Neg logit contributions
itates-0.019
but-0.019
ied-0.018
ained-0.018
 upon-0.018
Token a
Token ID257
Feature activation+1.21e-02

Pos logit contributions
but+0.001
ied+0.001
 upon+0.001
 into+0.001
chard+0.001
Neg logit contributions
 agree-0.001
 know-0.001
 creativity-0.001
 acts-0.001
 lend-0.001
Token little
Token ID1310
Feature activation+2.67e-02

Pos logit contributions
 know+0.017
 creativity+0.016
 acts+0.016
 supervision+0.016
 agree+0.016
Neg logit contributions
 upon-0.018
itates-0.017
chard-0.017
but-0.017
Prin-0.016
TokenHe
Token ID1544
Feature activation+9.27e-03

Pos logit contributions
 know+0.017
 agree+0.016
 creativity+0.016
 thank+0.015
 help+0.015
Neg logit contributions
itates-0.009
ied-0.009
but-0.009
ained-0.009
chard-0.009
Token asked
Token ID1965
Feature activation+9.33e-03

Pos logit contributions
 know+0.014
 creativity+0.013
 agree+0.013
 pleasure+0.013
 supervision+0.013
Neg logit contributions
itates-0.013
but-0.013
 upon-0.013
ied-0.013
just-0.012
Token his
Token ID465
Feature activation+2.92e-02

Pos logit contributions
 know+0.016
 creativity+0.016
 agree+0.015
 acts+0.015
 pleasure+0.015
Neg logit contributions
ied-0.011
 upon-0.010
but-0.010
 into-0.010
chard-0.010
Token mom
Token ID1995
Feature activation+3.12e-02

Pos logit contributions
 disl+0.043
 supervision+0.041
 know+0.040
 acts+0.040
 omega+0.040
Neg logit contributions
 upon-0.038
 by-0.035
 into-0.033
 up-0.032
ied-0.031
Token if
Token ID611
Feature activation+2.66e-02

Pos logit contributions
 know+0.043
 creativity+0.042
 agree+0.040
 acts+0.040
 pleasure+0.040
Neg logit contributions
 upon-0.048
but-0.046
ied-0.046
itates-0.045
just-0.045
Token he
Token ID339
Feature activation+2.52e-02

Pos logit contributions
 know+0.037
 agree+0.035
 creativity+0.035
 acts+0.034
 pleasure+0.033
Neg logit contributions
 upon-0.040
but-0.040
ied-0.039
chard-0.039
just-0.038
Token could
Token ID714
Feature activation+9.45e-03

Pos logit contributions
 supervision+0.051
 disl+0.051
 creativity+0.050
 amaz+0.049
 pleasure+0.049
Neg logit contributions
 upon-0.021
itates-0.020
but-0.019
just-0.019
ied-0.019
Token borrow
Token ID8804
Feature activation+1.08e-02

Pos logit contributions
 supervision+0.012
 disl+0.011
 acts+0.011
 amaz+0.011
 creativity+0.011
Neg logit contributions
 into-0.015
but-0.015
 by-0.015
 upon-0.015
 over-0.015
Token her
Token ID607
Feature activation+1.94e-02

Pos logit contributions
 agree+0.024
 know+0.021
 amaz+0.021
 hospitality+0.021
 generosity+0.021
Neg logit contributions
 into-0.010
 up-0.009
 by-0.009
 against-0.008
 over-0.008
Token m
Token ID285
Feature activation+8.55e-03

Pos logit contributions
 know+0.025
 agree+0.025
 amaz+0.023
 thank+0.023
 creativity+0.022
Neg logit contributions
 upon-0.032
 into-0.031
but-0.031
 by-0.030
ied-0.029
Tokenop
Token ID404
Feature activation+7.27e-03

Pos logit contributions
 know+0.011
 thank+0.010
 creativity+0.010
 agree+0.010
 amaz+0.010
Neg logit contributions
but-0.015
ied-0.014
just-0.014
ained-0.014
itates-0.013
Token They
Token ID1119
Feature activation+1.39e-02

Pos logit contributions
 creativity+0.018
 bravery+0.017
 know+0.017
 acts+0.017
 help+0.017
Neg logit contributions
 upon-0.021
but-0.021
itates-0.020
ained-0.020
 into-0.019
Token talked
Token ID6619
Feature activation+1.94e-02

Pos logit contributions
 creativity+0.020
 acts+0.020
 supervision+0.020
 pleasure+0.019
 Something+0.019
Neg logit contributions
 upon-0.019
chard-0.019
ied-0.018
itates-0.018
but-0.018
Token,
Token ID11
Feature activation+9.06e-03

Pos logit contributions
 know+0.030
 creativity+0.029
 agree+0.029
 acts+0.028
 supervision+0.028
Neg logit contributions
but-0.025
itates-0.025
ied-0.025
 upon-0.025
just-0.024
Token laughed
Token ID13818
Feature activation+2.15e-02

Pos logit contributions
 supervision+0.013
 acts+0.013
 wish+0.013
 know+0.013
 pleasure+0.012
Neg logit contributions
but-0.013
just-0.012
 upon-0.012
chard-0.012
 over-0.012
Token and
Token ID290
Feature activation+1.16e-02

Pos logit contributions
 know+0.032
 supervision+0.030
 acts+0.030
 creativity+0.030
 agree+0.029
Neg logit contributions
but-0.031
 upon-0.030
 into-0.030
just-0.030
itates-0.029
Token played
Token ID2826
Feature activation+2.36e-02

Pos logit contributions
 supervision+0.017
 acts+0.017
 know+0.017
 offer+0.017
 creativity+0.017
Neg logit contributions
 into-0.015
chard-0.015
ied-0.014
 over-0.014
but-0.014
Token games
Token ID1830
Feature activation+1.88e-02

Pos logit contributions
 know+0.039
 creativity+0.038
 agree+0.037
 acts+0.037
 thank+0.036
Neg logit contributions
itates-0.029
but-0.029
ied-0.028
 upon-0.028
ained-0.027
Token.
Token ID13
Feature activation+1.49e-02

Pos logit contributions
 know+0.030
 acts+0.029
 creativity+0.028
 amaz+0.028
 Something+0.028
Neg logit contributions
but-0.025
 into-0.024
side-0.023
just-0.023
 upon-0.023
Token It
Token ID632
Feature activation+1.91e-02

Pos logit contributions
 creativity+0.020
 know+0.020
 bravery+0.019
 supervision+0.019
 help+0.019
Neg logit contributions
 upon-0.023
itates-0.023
but-0.023
ained-0.022
ied-0.021
Token was
Token ID373
Feature activation+1.70e-02

Pos logit contributions
 know+0.030
 supervision+0.030
 creativity+0.030
 agree+0.029
 pleasure+0.029
Neg logit contributions
itates-0.024
but-0.023
 upon-0.023
ied-0.023
just-0.023
Token a
Token ID257
Feature activation+7.69e-03

Pos logit contributions
 know+0.023
 creativity+0.022
 agree+0.022
 thank+0.021
 acts+0.021
Neg logit contributions
itates-0.026
but-0.026
ied-0.025
 upon-0.025
just-0.025
Token big
Token ID1263
Feature activation+6.76e-03

Pos logit contributions
 know+0.007
 supervision+0.006
 acts+0.006
 agree+0.006
 generosity+0.006
Neg logit contributions
 upon-0.006
itates-0.005
chard-0.005
Prin-0.005
bel-0.005
Token dog
Token ID3290
Feature activation+1.94e-02

Pos logit contributions
 know+0.011
 supervision+0.010
 Something+0.010
 disl+0.010
 agree+0.010
Neg logit contributions
 upon-0.009
 by-0.008
 up-0.007
oe-0.007
chard-0.007
Token named
Token ID3706
Feature activation+2.75e-03

Pos logit contributions
 amaz+0.035
 pleasure+0.034
 creativity+0.032
 generosity+0.032
 delight+0.031
Neg logit contributions
 by-0.028
 in-0.027
 into-0.026
 over-0.025
 under-0.025
Token Spot
Token ID15899
Feature activation+2.20e-02

Pos logit contributions
 amaz+0.004
 agree+0.004
 creativity+0.004
 wish+0.004
 know+0.004
Neg logit contributions
chard-0.004
 into-0.004
 upon-0.004
 by-0.004
 over-0.004
Token.
Token ID13
Feature activation+4.07e-03

Pos logit contributions
 amaz+0.040
 creativity+0.039
 pleasure+0.037
 determination+0.036
 delight+0.036
Neg logit contributions
 upon-0.029
 by-0.027
 into-0.027
itates-0.026
 over-0.025
Token Spot
Token ID15899
Feature activation+2.81e-02

Pos logit contributions
 creativity+0.005
 believe+0.005
 bravery+0.005
 supervision+0.005
 generosity+0.005
Neg logit contributions
 upon-0.007
itates-0.007
ained-0.007
chard-0.006
but-0.006
Token is
Token ID318
Feature activation+1.77e-02

Pos logit contributions
 amaz+0.050
 supervision+0.050
 creativity+0.048
 pleasure+0.048
 generosity+0.045
Neg logit contributions
itates-0.035
 upon-0.035
 up-0.032
 into-0.032
 by-0.032
Token brown
Token ID7586
Feature activation+1.47e-02

Pos logit contributions
 agree+0.025
 know+0.025
 creativity+0.025
 amaz+0.024
 acts+0.024
Neg logit contributions
itates-0.025
 upon-0.025
ained-0.025
chard-0.025
but-0.024
Token and
Token ID290
Feature activation+1.85e-02

Pos logit contributions
 know+0.022
 creativity+0.021
 agree+0.020
 supervision+0.020
 pleasure+0.020
Neg logit contributions
itates-0.021
but-0.020
 upon-0.020
ained-0.020
 into-0.020
Token white
Token ID2330
Feature activation+2.41e-02

Pos logit contributions
 agree+0.031
 amaz+0.029
 believe+0.028
 supervision+0.028
 know+0.027
Neg logit contributions
chard-0.025
 upon-0.024
itates-0.024
 late-0.023
Please-0.022
Token and
Token ID290
Feature activation+2.09e-02

Pos logit contributions
 know+0.039
 creativity+0.038
 supervision+0.037
 pleasure+0.037
 amaz+0.037
Neg logit contributions
 into-0.031
 by-0.031
itates-0.030
but-0.030
 upon-0.030
Token lot
Token ID1256
Feature activation+2.87e-02

Pos logit contributions
 know+0.018
 supervision+0.018
 acts+0.018
 creativity+0.017
 agree+0.017
Neg logit contributions
 upon-0.018
chard-0.017
itates-0.017
Prin-0.016
itating-0.016
Token of
Token ID286
Feature activation-3.27e-03

Pos logit contributions
 know+0.048
 creativity+0.047
 agree+0.047
 supervision+0.046
 pleasure+0.045
Neg logit contributions
but-0.035
ied-0.034
itates-0.034
 upon-0.033
just-0.033
Token different
Token ID1180
Feature activation+1.15e-02

Pos logit contributions
 upon+0.004
ied+0.004
ained+0.004
Prin+0.004
chard+0.004
Neg logit contributions
 supervision-0.005
 know-0.005
 agree-0.005
 creativity-0.005
 Something-0.004
Token animals
Token ID4695
Feature activation+2.86e-03

Pos logit contributions
 know+0.018
 agree+0.018
 creativity+0.018
 supervision+0.017
 thank+0.017
Neg logit contributions
 upon-0.014
ied-0.014
but-0.014
itates-0.014
ained-0.014
Token and
Token ID290
Feature activation+8.03e-03

Pos logit contributions
 creativity+0.005
 omega+0.005
 Something+0.005
 pleasure+0.005
 know+0.005
Neg logit contributions
 into-0.003
 by-0.003
 upon-0.003
 in-0.003
but-0.003
Token lush
Token ID37408
Feature activation+2.94e-02

Pos logit contributions
 agree+0.012
 supervision+0.012
 pleasure+0.011
 know+0.011
 creativity+0.011
Neg logit contributions
 upon-0.011
chard-0.010
bel-0.010
 into-0.010
ied-0.010
Token green
Token ID4077
Feature activation+2.11e-02

Pos logit contributions
 know+0.049
 supervision+0.047
 agree+0.047
 creativity+0.045
 Something+0.045
Neg logit contributions
 upon-0.036
 into-0.036
ied-0.035
ling-0.035
 by-0.034
Token trees
Token ID7150
Feature activation+1.52e-02

Pos logit contributions
 know+0.036
 agree+0.035
 supervision+0.034
 creativity+0.034
 pleasure+0.033
Neg logit contributions
 upon-0.025
ied-0.024
 into-0.024
but-0.024
ained-0.023
Token.
Token ID13
Feature activation+7.05e-03

Pos logit contributions
 creativity+0.024
 know+0.024
 pleasure+0.023
 supervision+0.023
 Something+0.023
Neg logit contributions
 upon-0.020
 into-0.020
but-0.020
just-0.019
chard-0.019
Token
Token ID198
Feature activation+2.01e-03

Pos logit contributions
 know+0.008
 creativity+0.008
 acts+0.008
 bravery+0.008
 supervision+0.008
Neg logit contributions
 upon-0.012
itates-0.012
but-0.012
ained-0.011
chard-0.011
Token
Token ID198
Feature activation+8.04e-03

Pos logit contributions
 know+0.002
 creativity+0.002
 agree+0.002
 thank+0.002
 acts+0.002
Neg logit contributions
itates-0.003
but-0.003
ied-0.003
 upon-0.003
ained-0.003
Token started
Token ID2067
Feature activation-6.92e-03

Pos logit contributions
 upon+0.001
ied+0.001
itates+0.001
 into+0.001
but+0.001
Neg logit contributions
 creativity-0.001
 pleasure-0.001
 know-0.001
 generosity-0.001
 bravery-0.001
Token to
Token ID284
Feature activation+7.06e-03

Pos logit contributions
 upon+0.005
but+0.005
ied+0.005
 into+0.005
ained+0.005
Neg logit contributions
 know-0.015
 creativity-0.015
 pleasure-0.014
 agree-0.014
 thank-0.014
Token laugh
Token ID6487
Feature activation+1.11e-02

Pos logit contributions
 creativity+0.008
 supervision+0.008
 pleasure+0.007
 omega+0.007
 generosity+0.007
Neg logit contributions
 upon-0.012
Prin-0.012
chard-0.012
 over-0.011
itating-0.011
Token and
Token ID290
Feature activation+4.85e-03

Pos logit contributions
 know+0.016
 creativity+0.015
 agree+0.015
 supervision+0.015
 acts+0.015
Neg logit contributions
itates-0.016
but-0.016
 upon-0.016
ied-0.016
ained-0.015
Token point
Token ID966
Feature activation+5.31e-03

Pos logit contributions
 acts+0.005
 creativity+0.005
 omega+0.005
 supervision+0.005
 pleasure+0.005
Neg logit contributions
 into-0.008
chard-0.008
 upon-0.008
 over-0.008
itates-0.008
Token at
Token ID379
Feature activation-3.78e-03

Pos logit contributions
 know+0.010
 creativity+0.010
 agree+0.010
 acts+0.009
 thank+0.009
Neg logit contributions
itates-0.005
but-0.005
chard-0.005
ained-0.005
just-0.005
Token her
Token ID607
Feature activation-7.65e-04

Pos logit contributions
ied+0.004
 upon+0.004
 into+0.004
 over+0.004
 by+0.004
Neg logit contributions
 know-0.007
 bisc-0.006
 creativity-0.006
 wish-0.006
 pleasure-0.006
Token.
Token ID13
Feature activation+1.47e-03

Pos logit contributions
 upon+0.001
 into+0.001
 by+0.001
bel+0.001
but+0.001
Neg logit contributions
 know-0.001
 omega-0.001
 pleasure-0.001
 creativity-0.001
 fiber-0.001
Token Amy
Token ID14235
Feature activation-5.04e-03

Pos logit contributions
 know+0.002
 creativity+0.002
 supervision+0.002
 acts+0.002
 bravery+0.002
Neg logit contributions
itates-0.002
 upon-0.002
but-0.002
chard-0.002
ied-0.002
Token felt
Token ID2936
Feature activation-3.22e-03

Pos logit contributions
 upon+0.007
itates+0.007
 into+0.006
but+0.006
Prin+0.006
Neg logit contributions
 supervision-0.008
 creativity-0.008
 pleasure-0.008
 know-0.007
 generosity-0.007
Token embarrassed
Token ID21100
Feature activation+6.77e-03

Pos logit contributions
itates+0.004
 into+0.004
 late+0.003
Prin+0.003
chard+0.003
Neg logit contributions
 supervision-0.006
 creativity-0.005
 acts-0.005
 agree-0.005
 bisc-0.005
Token eat
Token ID4483
Feature activation+1.09e-02

Pos logit contributions
 supervision+0.031
 horror+0.027
 pleasure+0.027
 bravery+0.027
 amaz+0.026
Neg logit contributions
chard-0.020
bel-0.020
 early-0.020
pron-0.020
 into-0.020
Token broccoli
Token ID44653
Feature activation-3.33e-03

Pos logit contributions
 know+0.014
 creativity+0.013
 agree+0.013
 supervision+0.013
 pleasure+0.013
Neg logit contributions
but-0.017
 upon-0.017
chard-0.017
ied-0.017
 into-0.017
Token.
Token ID13
Feature activation-4.89e-03

Pos logit contributions
but+0.005
itates+0.005
 upon+0.005
ied+0.005
just+0.005
Neg logit contributions
 know-0.005
 creativity-0.004
 agree-0.004
 acts-0.004
 amaz-0.004
Token She
Token ID1375
Feature activation-1.93e-04

Pos logit contributions
but+0.007
itates+0.007
 upon+0.007
ained+0.007
ied+0.007
Neg logit contributions
 know-0.007
 creativity-0.007
 agree-0.007
 bravery-0.006
 supervision-0.006
Token said
Token ID531
Feature activation+7.60e-04

Pos logit contributions
itating+0.000
itates+0.000
but+0.000
ained+0.000
just+0.000
Neg logit contributions
 know-0.000
 help-0.000
 agree-0.000
 chew-0.000
 thank-0.000
Token,
Token ID11
Feature activation-1.63e-03

Pos logit contributions
 know+0.001
 creativity+0.001
 agree+0.001
 acts+0.001
 thank+0.001
Neg logit contributions
itates-0.001
but-0.001
ied-0.001
 upon-0.001
just-0.001
Token "
Token ID366
Feature activation-6.02e-03

Pos logit contributions
but+0.002
itates+0.002
 upon+0.002
just+0.002
ied+0.002
Neg logit contributions
 know-0.002
 creativity-0.002
 agree-0.002
 acts-0.002
 amaz-0.002
TokenI
Token ID40
Feature activation-5.59e-03

Pos logit contributions
ied+0.008
 upon+0.007
itates+0.007
itating+0.007
 into+0.006
Neg logit contributions
 know-0.010
 chew-0.010
 Something-0.010
 agree-0.010
 help-0.009
Token like
Token ID588
Feature activation+1.52e-02

Pos logit contributions
 upon+0.010
itates+0.009
just+0.009
Prin+0.009
ained+0.009
Neg logit contributions
 creativity-0.006
 supervision-0.006
 bravery-0.006
 amaz-0.006
 generosity-0.006
Token broccoli
Token ID44653
Feature activation+6.10e-03

Pos logit contributions
 know+0.026
 creativity+0.025
 agree+0.025
 thank+0.025
 help+0.025
Neg logit contributions
itates-0.018
but-0.017
ied-0.017
ained-0.017
 upon-0.017
Token because
Token ID780
Feature activation+1.52e-02

Pos logit contributions
 know+0.007
 amaz+0.006
 creativity+0.006
 supervision+0.006
 agree+0.006
Neg logit contributions
but-0.011
 upon-0.011
 into-0.011
chard-0.011
itates-0.011
Token,
Token ID11
Feature activation-9.22e-03

Pos logit contributions
 know+0.008
 agree+0.008
 amaz+0.008
 pleasure+0.007
 chew+0.007
Neg logit contributions
but-0.007
ied-0.007
 upon-0.007
just-0.007
 by-0.007
Token even
Token ID772
Feature activation-1.16e-02

Pos logit contributions
itates+0.014
but+0.014
ied+0.013
 upon+0.013
ained+0.013
Neg logit contributions
 know-0.013
 creativity-0.012
 agree-0.012
 acts-0.012
 thank-0.012
Token when
Token ID618
Feature activation+1.69e-02

Pos logit contributions
itates+0.020
ained+0.019
ied+0.019
abby+0.018
but+0.018
Neg logit contributions
 know-0.015
 help-0.014
 creativity-0.014
 acts-0.014
 thank-0.014
Token it
Token ID340
Feature activation+2.57e-02

Pos logit contributions
 agree+0.026
 know+0.026
 amaz+0.025
 bisc+0.024
 pleasure+0.024
Neg logit contributions
 upon-0.022
but-0.021
just-0.021
ied-0.021
ained-0.021
Token feels
Token ID5300
Feature activation+1.11e-02

Pos logit contributions
 agree+0.033
 pleasure+0.033
 amaz+0.032
 know+0.032
 creativity+0.032
Neg logit contributions
 upon-0.040
itates-0.040
ained-0.039
ied-0.039
 into-0.039
Token a
Token ID257
Feature activation-1.24e-02

Pos logit contributions
 agree+0.016
 know+0.016
 acts+0.015
 amaz+0.015
 creativity+0.015
Neg logit contributions
 upon-0.016
 into-0.015
ained-0.015
ied-0.015
but-0.015
Token bit
Token ID1643
Feature activation+1.76e-02

Pos logit contributions
 upon+0.018
itates+0.018
but+0.018
just+0.018
 into+0.017
Neg logit contributions
 know-0.016
 agree-0.016
 acts-0.015
 creativity-0.015
 supervision-0.015
Token dizz
Token ID29527
Feature activation+1.33e-02

Pos logit contributions
 agree+0.027
 amaz+0.027
 know+0.027
 creativity+0.026
 Something+0.026
Neg logit contributions
 into-0.023
 upon-0.022
ied-0.022
 by-0.021
ling-0.021
Tokeny
Token ID88
Feature activation-5.33e-03

Pos logit contributions
 know+0.031
 creativity+0.031
 amaz+0.031
 agree+0.031
 pleasure+0.031
Neg logit contributions
ied-0.008
ained-0.007
but-0.007
itates-0.007
 upon-0.007
Token.
Token ID13
Feature activation+8.59e-04

Pos logit contributions
 into+0.007
 by+0.007
but+0.007
 upon+0.007
ied+0.007
Neg logit contributions
 know-0.008
 agree-0.008
 pleasure-0.008
 creativity-0.008
 Something-0.008
Token<|endoftext|>
Token ID50256
Feature activation-2.38e-03

Pos logit contributions
 know+0.001
 creativity+0.001
 agree+0.001
 thank+0.001
 acts+0.001
Neg logit contributions
ied-0.001
itates-0.001
but-0.001
chard-0.001
ained-0.001
Token how
Token ID703
Feature activation-6.02e-03

Pos logit contributions
 know+0.011
 agree+0.010
 creativity+0.010
 thank+0.010
 acts+0.010
Neg logit contributions
but-0.014
 upon-0.014
ied-0.014
itates-0.013
just-0.013
Token to
Token ID284
Feature activation+1.65e-02

Pos logit contributions
itates+0.007
ained+0.007
Prin+0.006
itating+0.006
ators+0.006
Neg logit contributions
 know-0.011
 help-0.011
 creativity-0.011
 kindness-0.010
 thank-0.010
Token tw
Token ID665
Feature activation-1.55e-02

Pos logit contributions
 supervision+0.022
 pleasure+0.022
 acts+0.021
 bravery+0.021
 generosity+0.021
Neg logit contributions
 into-0.023
 upon-0.023
chard-0.023
but-0.022
 by-0.022
Tokenirl
Token ID1901
Feature activation+8.27e-03

Pos logit contributions
but+0.022
itates+0.022
ied+0.021
 upon+0.021
ained+0.021
Neg logit contributions
 know-0.023
 creativity-0.022
 agree-0.021
 thank-0.021
 acts-0.021
Token the
Token ID262
Feature activation-1.87e-03

Pos logit contributions
 know+0.010
 creativity+0.009
 agree+0.009
 thank+0.009
 help+0.009
Neg logit contributions
itates-0.014
chard-0.013
but-0.013
ained-0.013
just-0.013
Token spaghetti
Token ID48241
Feature activation-1.24e-02

Pos logit contributions
ained+0.004
itates+0.004
Prin+0.004
bel+0.004
itating+0.004
Neg logit contributions
 help-0.001
 chew-0.001
 creativity-0.001
 paws-0.001
 thank-0.001
Token around
Token ID1088
Feature activation+1.24e-03

Pos logit contributions
but+0.015
itates+0.015
ied+0.015
 upon+0.014
just+0.014
Neg logit contributions
 know-0.021
 creativity-0.020
 agree-0.019
 acts-0.019
 pleasure-0.019
Token it
Token ID340
Feature activation+1.26e-02

Pos logit contributions
 help+0.002
 know+0.002
 creativity+0.002
 agree+0.002
 thank+0.002
Neg logit contributions
itates-0.002
ained-0.002
itating-0.002
but-0.002
ied-0.002
Token without
Token ID1231
Feature activation-1.56e-02

Pos logit contributions
 know+0.019
 generosity+0.018
 pleasure+0.018
 supervision+0.018
 agree+0.018
Neg logit contributions
but-0.018
 into-0.018
just-0.017
 upon-0.017
 by-0.017
Token dropping
Token ID12047
Feature activation+1.93e-03

Pos logit contributions
 into+0.019
 upon+0.018
but+0.018
ied+0.018
chard+0.018
Neg logit contributions
 know-0.024
 acts-0.024
 agree-0.024
 offer-0.024
 Something-0.024
Token it
Token ID340
Feature activation-5.88e-03

Pos logit contributions
 know+0.003
 agree+0.003
 creativity+0.003
 acts+0.003
 pleasure+0.003
Neg logit contributions
but-0.003
 upon-0.003
ied-0.003
itates-0.003
 into-0.003
Token flower
Token ID15061
Feature activation-7.57e-03

Pos logit contributions
 know+0.003
 creativity+0.003
 help+0.003
 offer+0.003
 thank+0.003
Neg logit contributions
itates-0.004
but-0.004
ained-0.004
ied-0.004
just-0.004
Token in
Token ID287
Feature activation-1.15e-02

Pos logit contributions
itates+0.006
but+0.006
ained+0.006
Prin+0.005
ied+0.005
Neg logit contributions
 know-0.015
 help-0.015
 creativity-0.015
 agree-0.015
 offer-0.015
Token his
Token ID465
Feature activation+1.07e-02

Pos logit contributions
 upon+0.012
 into+0.012
but+0.012
chard+0.011
side+0.011
Neg logit contributions
 know-0.021
 agree-0.020
 creativity-0.020
 acts-0.020
 supervision-0.020
Token pocket
Token ID10000
Feature activation-6.25e-03

Pos logit contributions
 know+0.014
 agree+0.013
 acts+0.012
 disl+0.012
 believe+0.012
Neg logit contributions
 into-0.017
 upon-0.017
chard-0.017
 by-0.016
eh-0.016
Token and
Token ID290
Feature activation+9.78e-04

Pos logit contributions
but+0.010
 upon+0.010
ied+0.009
itates+0.009
just+0.009
Neg logit contributions
 know-0.008
 creativity-0.008
 agree-0.008
 acts-0.008
 thank-0.008
Token started
Token ID2067
Feature activation+5.87e-03

Pos logit contributions
 know+0.001
 creativity+0.001
 agree+0.001
 pleasure+0.001
 acts+0.001
Neg logit contributions
but-0.001
ied-0.001
 upon-0.001
chard-0.001
itates-0.001
Token walking
Token ID6155
Feature activation+2.02e-02

Pos logit contributions
 know+0.012
 agree+0.011
 creativity+0.011
 acts+0.011
 pleasure+0.011
Neg logit contributions
but-0.005
 upon-0.005
 into-0.005
ied-0.005
just-0.005
Token.
Token ID13
Feature activation+6.76e-03

Pos logit contributions
 know+0.027
 agree+0.027
 help+0.027
 offer+0.026
 thank+0.025
Neg logit contributions
itates-0.032
ained-0.031
ied-0.030
eh-0.030
Prin-0.030
Token He
Token ID679
Feature activation+5.40e-03

Pos logit contributions
 know+0.010
 agree+0.010
 creativity+0.010
 acts+0.009
 supervision+0.009
Neg logit contributions
but-0.010
itates-0.009
 upon-0.009
ained-0.009
ied-0.009
Token had
Token ID550
Feature activation-3.89e-03

Pos logit contributions
 know+0.008
 creativity+0.008
 supervision+0.008
 pleasure+0.008
 generosity+0.008
Neg logit contributions
 upon-0.007
but-0.007
ied-0.007
itates-0.007
 into-0.007
Token the
Token ID262
Feature activation+9.18e-04

Pos logit contributions
 upon+0.005
but+0.005
itates+0.005
ied+0.005
ained+0.005
Neg logit contributions
 know-0.006
 agree-0.006
 creativity-0.006
 acts-0.006
 thank-0.005
Token they
Token ID484
Feature activation-1.12e-02

Pos logit contributions
 acts+0.006
 creativity+0.006
 pleasure+0.006
 agree+0.006
 know+0.006
Neg logit contributions
chard-0.012
 upon-0.012
 into-0.012
but-0.012
bel-0.012
Token were
Token ID547
Feature activation-1.33e-02

Pos logit contributions
itates+0.016
ied+0.016
 upon+0.016
but+0.016
chard+0.015
Neg logit contributions
 creativity-0.015
 know-0.015
 pleasure-0.015
 acts-0.015
 amaz-0.015
Token very
Token ID845
Feature activation-4.64e-03

Pos logit contributions
itates+0.017
 upon+0.016
but+0.016
ied+0.016
ained+0.016
Neg logit contributions
 know-0.021
 creativity-0.020
 acts-0.020
 agree-0.020
 amaz-0.020
Token brave
Token ID14802
Feature activation-7.89e-03

Pos logit contributions
itates+0.007
 upon+0.006
chard+0.006
ied+0.006
bel+0.006
Neg logit contributions
 acts-0.007
 know-0.006
 creativity-0.006
 amaz-0.006
 agree-0.006
Token.
Token ID13
Feature activation-8.91e-03

Pos logit contributions
 upon+0.011
 into+0.011
itates+0.010
but+0.010
just+0.010
Neg logit contributions
 know-0.012
 creativity-0.011
 Something-0.011
 agree-0.011
 thank-0.011
Token (space)
Token ID220
Feature activation-3.32e-02

Pos logit contributions
but+0.013
itates+0.013
 upon+0.013
ained+0.012
Prin+0.012
Neg logit contributions
 creativity-0.013
 know-0.013
 bravery-0.012
 supervision-0.012
 acts-0.012
Token \n
Token ID198
Feature activation-1.58e-02

Pos logit contributions
itates+0.055
but+0.055
ied+0.054
 upon+0.054
ained+0.053
Neg logit contributions
 know-0.040
 creativity-0.037
 agree-0.037
 thank-0.036
 acts-0.036
Token \n
Token ID198
Feature activation-4.63e-03

Pos logit contributions
itates+0.027
but+0.027
ied+0.026
ained+0.026
 upon+0.026
Neg logit contributions
 know-0.019
 agree-0.018
 creativity-0.017
 thank-0.017
 help-0.017
TokenThey
Token ID2990
Feature activation-5.10e-03

Pos logit contributions
ied+0.004
itates+0.004
but+0.004
 upon+0.004
ained+0.004
Neg logit contributions
 know-0.009
 agree-0.009
 creativity-0.008
 help-0.008
 Something-0.008
Token worked
Token ID3111
Feature activation-2.00e-02

Pos logit contributions
 upon+0.007
ied+0.007
itates+0.007
chard+0.007
but+0.007
Neg logit contributions
 creativity-0.007
 know-0.007
 pleasure-0.007
 supervision-0.007
 acts-0.007
Token together
Token ID1978
Feature activation-7.34e-03

Pos logit contributions
itates+0.032
chard+0.031
Prin+0.031
but+0.031
just+0.030
Neg logit contributions
 know-0.026
 creativity-0.025
 agree-0.025
 help-0.025
 bravery-0.024
Token his
Token ID465
Feature activation+1.20e-02

Pos logit contributions
 upon+0.005
but+0.005
 into+0.005
chard+0.005
side+0.005
Neg logit contributions
 know-0.007
 creativity-0.007
 acts-0.007
 supervision-0.007
 agree-0.007
Token pocket
Token ID10000
Feature activation+2.61e-03

Pos logit contributions
 know+0.016
 agree+0.015
 believe+0.014
 disl+0.014
 expect+0.014
Neg logit contributions
 into-0.020
 upon-0.019
chard-0.018
bel-0.017
 by-0.017
Token.
Token ID13
Feature activation+6.77e-03

Pos logit contributions
 know+0.004
 creativity+0.004
 pleasure+0.004
 amaz+0.004
 agree+0.004
Neg logit contributions
but-0.004
 upon-0.004
ied-0.004
 into-0.004
just-0.004
Token He
Token ID679
Feature activation+9.91e-03

Pos logit contributions
 know+0.010
 creativity+0.010
 acts+0.009
 agree+0.009
 supervision+0.009
Neg logit contributions
but-0.010
itates-0.010
 upon-0.010
ained-0.009
just-0.009
Token also
Token ID635
Feature activation+1.81e-02

Pos logit contributions
 supervision+0.015
 pleasure+0.015
 know+0.015
 creativity+0.014
 thank+0.014
Neg logit contributions
 upon-0.013
ied-0.013
 into-0.013
itates-0.013
chard-0.013
Token filled
Token ID5901
Feature activation-2.65e-02

Pos logit contributions
 pleasure+0.031
 know+0.029
 creativity+0.029
 supervision+0.029
 bravery+0.028
Neg logit contributions
 upon-0.022
chard-0.022
bel-0.021
 into-0.020
ied-0.020
Token his
Token ID465
Feature activation+4.16e-03

Pos logit contributions
itates+0.043
ained+0.042
but+0.041
itating+0.040
just+0.040
Neg logit contributions
 know-0.035
 agree-0.032
 help-0.032
 creativity-0.032
 offer-0.031
Token bottle
Token ID9294
Feature activation+1.01e-02

Pos logit contributions
 know+0.005
 agree+0.004
 creativity+0.004
 thank+0.004
 acts+0.004
Neg logit contributions
 upon-0.007
chard-0.007
 into-0.007
ied-0.007
itates-0.007
Token with
Token ID351
Feature activation-3.61e-03

Pos logit contributions
 know+0.018
 creativity+0.017
 agree+0.017
 thank+0.017
 acts+0.017
Neg logit contributions
itates-0.011
but-0.011
 upon-0.011
ied-0.011
ained-0.011
Token some
Token ID617
Feature activation-7.11e-04

Pos logit contributions
 into+0.005
 upon+0.005
ied+0.004
 by+0.004
chard+0.004
Neg logit contributions
 agree-0.006
 know-0.005
urry-0.005
ird-0.005
 supervision-0.005
Token water
Token ID1660
Feature activation+6.01e-03

Pos logit contributions
 into+0.001
 upon+0.001
ied+0.001
 by+0.001
ling+0.001
Neg logit contributions
 know-0.001
 agree-0.001
 supervision-0.001
 disl-0.001
 Something-0.001
Token the
Token ID262
Feature activation-1.14e-02

Pos logit contributions
but+0.010
 upon+0.010
itates+0.010
ied+0.010
just+0.010
Neg logit contributions
 know-0.009
 creativity-0.008
 agree-0.008
 acts-0.008
 amaz-0.008
Token stickers
Token ID28568
Feature activation-3.51e-02

Pos logit contributions
 upon+0.018
chard+0.016
but+0.016
just+0.016
 by+0.016
Neg logit contributions
 know-0.016
 agree-0.015
 disl-0.014
 supervision-0.014
 amaz-0.014
Token,
Token ID11
Feature activation-7.32e-04

Pos logit contributions
but+0.050
ied+0.049
 upon+0.049
itates+0.048
just+0.047
Neg logit contributions
 know-0.052
 creativity-0.049
 agree-0.049
 acts-0.048
 amaz-0.048
Token cr
Token ID1067
Feature activation+8.37e-03

Pos logit contributions
but+0.001
 upon+0.001
just+0.001
chard+0.001
itates+0.001
Neg logit contributions
 know-0.001
 agree-0.001
 acts-0.001
 creativity-0.001
 supervision-0.001
Tokenay
Token ID323
Feature activation-8.27e-03

Pos logit contributions
 know+0.012
 agree+0.012
 creativity+0.011
 acts+0.011
 supervision+0.011
Neg logit contributions
ied-0.012
but-0.012
itates-0.012
 upon-0.012
ained-0.012
Tokenons
Token ID684
Feature activation-1.42e-02

Pos logit contributions
ied+0.009
eh+0.008
ling+0.008
 upon+0.008
 into+0.008
Neg logit contributions
 amaz-0.015
 pleasure-0.015
 Something-0.015
 know-0.014
 agree-0.014
Token,
Token ID11
Feature activation-5.03e-04

Pos logit contributions
 into+0.020
 upon+0.020
but+0.019
ied+0.019
 by+0.019
Neg logit contributions
 amaz-0.022
 know-0.021
 pleasure-0.021
 acts-0.021
 creativity-0.020
Token and
Token ID290
Feature activation-1.92e-03

Pos logit contributions
 upon+0.001
but+0.001
 into+0.001
just+0.001
chard+0.001
Neg logit contributions
 know-0.001
 agree-0.001
 pleasure-0.001
 supervision-0.001
 acts-0.001
Token glitter
Token ID31133
Feature activation+6.61e-03

Pos logit contributions
 into+0.003
 upon+0.002
chard+0.002
ied+0.002
 late+0.002
Neg logit contributions
 agree-0.003
 know-0.003
 pleasure-0.003
 supervision-0.002
 acts-0.002
Token.
Token ID13
Feature activation+2.87e-03

Pos logit contributions
 know+0.012
 creativity+0.011
 agree+0.011
 amaz+0.011
 pleasure+0.011
Neg logit contributions
ied-0.008
 upon-0.008
but-0.008
itates-0.007
 into-0.007
Token They
Token ID1119
Feature activation+4.43e-03

Pos logit contributions
 know+0.004
 creativity+0.004
 bravery+0.004
 agree+0.004
 acts+0.004
Neg logit contributions
itates-0.004
but-0.004
 upon-0.004
ained-0.004
Prin-0.004
Token \n
Token ID198
Feature activation-4.24e-03

Pos logit contributions
itates+0.007
but+0.007
 upon+0.007
ained+0.006
just+0.006
Neg logit contributions
 know-0.006
 creativity-0.006
 agree-0.005
 acts-0.005
 bravery-0.005
Token \n
Token ID198
Feature activation+7.51e-03

Pos logit contributions
itates+0.007
but+0.006
 upon+0.006
ied+0.006
ained+0.006
Neg logit contributions
 know-0.006
 creativity-0.006
 acts-0.005
 agree-0.005
 pleasure-0.005
Token"
Token ID1
Feature activation-9.44e-03

Pos logit contributions
 know+0.010
 agree+0.010
 help+0.009
 thank+0.009
 chew+0.009
Neg logit contributions
chard-0.012
ained-0.012
ied-0.012
but-0.012
itates-0.011
TokenOkay
Token ID16454
Feature activation-1.05e-02

Pos logit contributions
ied+0.012
ained+0.012
 upon+0.012
chard+0.012
itates+0.012
Neg logit contributions
 agree-0.015
 know-0.015
 creativity-0.014
 help-0.014
 thank-0.014
Token,
Token ID11
Feature activation-2.12e-02

Pos logit contributions
 upon+0.017
 by+0.017
 into+0.017
 up+0.016
oe+0.016
Neg logit contributions
 amaz-0.012
 acts-0.012
 know-0.012
 creativity-0.012
 disl-0.012
Token Lily
Token ID20037
Feature activation-2.21e-02

Pos logit contributions
 upon+0.026
 by+0.025
but+0.025
just+0.023
 into+0.023
Neg logit contributions
 acts-0.033
 know-0.032
 amaz-0.031
 disl-0.030
 creativity-0.030
Token,"
Token ID553
Feature activation-2.92e-04

Pos logit contributions
 upon+0.036
 by+0.035
itates+0.034
but+0.033
just+0.033
Neg logit contributions
 amaz-0.028
 creativity-0.026
 Something-0.026
 know-0.025
 pleasure-0.025
Token Ben
Token ID3932
Feature activation+7.82e-03

Pos logit contributions
 upon+0.000
 by+0.000
 into+0.000
chard+0.000
 late+0.000
Neg logit contributions
 pleasure-0.000
 know-0.000
 creativity-0.000
 supervision-0.000
 believe-0.000
Token says
Token ID1139
Feature activation-9.84e-03

Pos logit contributions
 pleasure+0.007
 creativity+0.007
 know+0.007
 amaz+0.006
 supervision+0.006
Neg logit contributions
itates-0.015
 into-0.015
 upon-0.015
Prin-0.015
 by-0.014
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
itates+0.016
but+0.016
ied+0.015
 upon+0.015
just+0.015
Neg logit contributions
 know-0.012
 creativity-0.012
 agree-0.012
 acts-0.011
 thank-0.011
Token "
Token ID366
Feature activation-1.36e-02

Pos logit contributions
itates+0.018
but+0.018
 upon+0.017
ied+0.017
ained+0.017
Neg logit contributions
 know-0.015
 creativity-0.015
 agree-0.014
 acts-0.014
 thank-0.014
Token down
Token ID866
Feature activation-3.93e-03

Pos logit contributions
ators+0.025
we+0.024
abby+0.023
Acc+0.023
chard+0.023
Neg logit contributions
,-0.028
 help-0.025
 Sammy-0.025
 hop-0.024
 thank-0.024
Token to
Token ID284
Feature activation-1.87e-02

Pos logit contributions
itates+0.008
abby+0.008
ied+0.008
chard+0.008
ained+0.008
Neg logit contributions
 help-0.004
 thank-0.003
 know-0.003
,-0.003
 agree-0.003
Token Annie
Token ID25860
Feature activation-2.11e-02

Pos logit contributions
itates+0.029
but+0.029
ied+0.028
 upon+0.028
ained+0.028
Neg logit contributions
 know-0.025
 creativity-0.024
 agree-0.024
 thank-0.023
 acts-0.023
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
 into+0.030
 upon+0.030
but+0.029
itates+0.028
just+0.028
Neg logit contributions
 creativity-0.031
 know-0.030
 pleasure-0.029
 thank-0.029
 generosity-0.029
Token she
Token ID673
Feature activation-2.38e-02

Pos logit contributions
itates+0.019
but+0.019
ied+0.018
 upon+0.018
ained+0.018
Neg logit contributions
 know-0.013
 creativity-0.013
 agree-0.013
 thank-0.012
 acts-0.012
Token was
Token ID373
Feature activation-1.50e-02

Pos logit contributions
itates+0.033
but+0.033
 upon+0.032
ied+0.032
just+0.031
Neg logit contributions
 know-0.035
 creativity-0.034
 agree-0.034
 acts-0.033
 thank-0.033
Token so
Token ID523
Feature activation-7.81e-03

Pos logit contributions
but+0.022
itates+0.022
ied+0.022
 upon+0.021
just+0.021
Neg logit contributions
 know-0.022
 creativity-0.020
 agree-0.020
 thank-0.019
 help-0.019
Token amazed
Token ID24562
Feature activation-1.96e-02

Pos logit contributions
itates+0.012
 upon+0.012
but+0.012
ied+0.012
chard+0.012
Neg logit contributions
 creativity-0.010
 know-0.010
 acts-0.010
 agree-0.010
 supervision-0.010
Token!
Token ID0
Feature activation-8.90e-03

Pos logit contributions
itates+0.039
chard+0.037
ained+0.037
Prin+0.037
eh+0.036
Neg logit contributions
 know-0.017
 agree-0.017
 acts-0.016
 thank-0.016
 help-0.016
Token The
Token ID383
Feature activation-6.10e-03

Pos logit contributions
ained+0.014
ied+0.014
itates+0.014
but+0.014
chard+0.014
Neg logit contributions
 know-0.012
 creativity-0.011
 thank-0.011
 agree-0.011
 acts-0.011
Token angel
Token ID18304
Feature activation-1.62e-02

Pos logit contributions
ained+0.010
but+0.009
itates+0.009
ied+0.009
just+0.009
Neg logit contributions
 thank-0.009
 know-0.009
 help-0.009
 wish-0.009
 creativity-0.008
Token for
Token ID329
Feature activation-8.42e-03

Pos logit contributions
 know+0.008
 agree+0.007
 creativity+0.007
 supervision+0.007
 acts+0.007
Neg logit contributions
ied-0.005
but-0.005
 upon-0.005
 into-0.005
itates-0.005
Token air
Token ID1633
Feature activation-9.53e-04

Pos logit contributions
 upon+0.011
 into+0.011
ied+0.011
side+0.010
 late+0.010
Neg logit contributions
 acts-0.013
 know-0.012
 supervision-0.012
 agree-0.012
 wish-0.012
Token,
Token ID11
Feature activation-3.85e-03

Pos logit contributions
 into+0.001
 upon+0.001
ied+0.001
but+0.001
 by+0.001
Neg logit contributions
 know-0.001
 agree-0.001
 creativity-0.001
 Something-0.001
 amaz-0.001
Token he
Token ID339
Feature activation-9.37e-03

Pos logit contributions
 upon+0.006
 into+0.005
but+0.005
 late+0.005
 by+0.005
Neg logit contributions
 know-0.006
 acts-0.005
 agree-0.005
 supervision-0.005
 wish-0.005
Token cl
Token ID537
Feature activation-9.78e-03

Pos logit contributions
 upon+0.012
ied+0.012
itates+0.011
but+0.011
just+0.011
Neg logit contributions
 know-0.015
 creativity-0.015
 pleasure-0.014
 supervision-0.014
 amaz-0.014
Tokenamb
Token ID4131
Feature activation-4.93e-02

Pos logit contributions
itates+0.009
but+0.008
ied+0.008
ained+0.008
 upon+0.008
Neg logit contributions
 know-0.020
 creativity-0.020
 help-0.019
 acts-0.019
 bravery-0.019
Tokenered
Token ID1068
Feature activation-1.92e-03

Pos logit contributions
ied+0.064
but+0.062
itates+0.062
 upon+0.061
just+0.060
Neg logit contributions
 know-0.078
 creativity-0.075
 agree-0.074
 amaz-0.073
 Something-0.073
Token out
Token ID503
Feature activation-9.52e-03

Pos logit contributions
itates+0.002
but+0.001
ained+0.001
ied+0.001
Prin+0.001
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.004
 help-0.004
 amaz-0.004
Token of
Token ID286
Feature activation-2.11e-02

Pos logit contributions
itates+0.012
but+0.012
ained+0.012
ied+0.012
chard+0.012
Neg logit contributions
 know-0.015
 creativity-0.015
 agree-0.015
 thank-0.015
 help-0.014
Token the
Token ID262
Feature activation+1.12e-03

Pos logit contributions
 into+0.027
 upon+0.027
ied+0.026
ained+0.026
 by+0.025
Neg logit contributions
 know-0.033
 agree-0.032
 acts-0.031
 creativity-0.031
 Something-0.030
Token p
Token ID279
Feature activation+3.60e-03

Pos logit contributions
 know+0.001
 agree+0.001
 creativity+0.001
 supervision+0.001
 acts+0.001
Neg logit contributions
 upon-0.002
itates-0.002
ained-0.002
but-0.002
ied-0.002
Token the
Token ID262
Feature activation-1.21e-02

Pos logit contributions
but+0.035
 upon+0.035
itates+0.035
ied+0.035
ained+0.034
Neg logit contributions
 know-0.035
 agree-0.034
 creativity-0.034
 acts-0.033
 thank-0.032
Token most
Token ID749
Feature activation+3.82e-05

Pos logit contributions
itates+0.021
ied+0.021
but+0.020
ained+0.020
itating+0.020
Neg logit contributions
 know-0.016
 help-0.014
 creativity-0.014
 thank-0.014
 pleasure-0.014
Token beautiful
Token ID4950
Feature activation-6.69e-03

Pos logit contributions
 know+0.000
 agree+0.000
 creativity+0.000
 supervision+0.000
 thank+0.000
Neg logit contributions
 upon-0.000
but-0.000
itates-0.000
just-0.000
ied-0.000
Token apples
Token ID22514
Feature activation-2.55e-02

Pos logit contributions
itates+0.012
but+0.011
itating+0.011
ied+0.011
ained+0.011
Neg logit contributions
 know-0.008
 creativity-0.008
 thank-0.008
 acts-0.008
 agree-0.007
Token he
Token ID339
Feature activation-3.03e-02

Pos logit contributions
but+0.032
 into+0.032
ied+0.032
chard+0.032
 upon+0.031
Neg logit contributions
 creativity-0.042
 know-0.042
 pleasure-0.041
 amaz-0.041
 agree-0.040
Token had
Token ID550
Feature activation-5.09e-02

Pos logit contributions
itates+0.040
 upon+0.040
but+0.040
ied+0.039
chard+0.039
Neg logit contributions
 know-0.046
 creativity-0.045
 agree-0.044
 thank-0.043
 acts-0.043
Token ever
Token ID1683
Feature activation-2.81e-02

Pos logit contributions
itates+0.086
itating+0.080
Prin+0.079
chard+0.079
ained+0.078
Neg logit contributions
 know-0.065
 agree-0.060
 help-0.059
 believe-0.057
 offer-0.057
Token seen
Token ID1775
Feature activation-2.07e-02

Pos logit contributions
itates+0.041
but+0.040
ied+0.039
ained+0.039
just+0.038
Neg logit contributions
 know-0.041
 creativity-0.039
 agree-0.039
 acts-0.038
 thank-0.037
Token.
Token ID13
Feature activation-4.68e-03

Pos logit contributions
itates+0.038
but+0.036
ied+0.035
itating+0.035
bel+0.035
Neg logit contributions
 know-0.022
 agree-0.020
 creativity-0.020
 acts-0.020
 help-0.020
Token His
Token ID2399
Feature activation-1.77e-03

Pos logit contributions
itates+0.008
but+0.008
ied+0.008
ained+0.008
just+0.008
Neg logit contributions
 know-0.005
 agree-0.005
 creativity-0.005
 thank-0.005
 amaz-0.005
Token mom
Token ID1995
Feature activation+9.99e-03

Pos logit contributions
 upon+0.003
itates+0.003
but+0.003
ied+0.003
chard+0.003
Neg logit contributions
 know-0.002
 agree-0.002
 creativity-0.002
 acts-0.002
 supervision-0.002
Token was
Token ID373
Feature activation-5.09e-04

Pos logit contributions
itates+0.000
 upon+0.000
but+0.000
ied+0.000
ained+0.000
Neg logit contributions
 know-0.000
 creativity-0.000
 agree-0.000
 pleasure-0.000
 supervision-0.000
Token time
Token ID640
Feature activation-3.43e-03

Pos logit contributions
itates+0.001
but+0.001
ied+0.001
just+0.001
bel+0.001
Neg logit contributions
 know-0.001
 creativity-0.001
 pleasure-0.001
 thank-0.001
 agree-0.001
Token for
Token ID329
Feature activation-7.31e-03

Pos logit contributions
Prin+0.005
itates+0.005
chard+0.005
itating+0.005
Moon+0.005
Neg logit contributions
 know-0.005
 help-0.005
 thank-0.005
 creativity-0.005
 agree-0.005
Token the
Token ID262
Feature activation-1.60e-02

Pos logit contributions
 into+0.008
 upon+0.008
 by+0.008
ained+0.007
Prin+0.007
Neg logit contributions
 supervision-0.013
 agree-0.012
 chew-0.012
 acts-0.012
 amaz-0.012
Token happy
Token ID3772
Feature activation-2.07e-02

Pos logit contributions
but+0.025
itates+0.025
 upon+0.025
ied+0.024
just+0.024
Neg logit contributions
 know-0.021
 agree-0.019
 creativity-0.019
 acts-0.019
 amaz-0.018
Token couple
Token ID3155
Feature activation-5.20e-02

Pos logit contributions
 upon+0.029
ied+0.028
but+0.028
just+0.027
itates+0.027
Neg logit contributions
 know-0.029
 supervision-0.028
 Something-0.028
 agree-0.027
 creativity-0.027
Token to
Token ID284
Feature activation-7.13e-03

Pos logit contributions
ied+0.071
 upon+0.071
but+0.070
just+0.068
 into+0.067
Neg logit contributions
 know-0.076
 creativity-0.075
 amaz-0.072
 Something-0.072
 acts-0.072
Token head
Token ID1182
Feature activation-2.89e-03

Pos logit contributions
 over+0.011
 upon+0.011
 into+0.011
Prin+0.010
 by+0.010
Neg logit contributions
 supervision-0.009
 acts-0.009
 creativity-0.008
 omega-0.008
 generosity-0.008
Token off
Token ID572
Feature activation-2.53e-02

Pos logit contributions
itates+0.004
but+0.004
 upon+0.004
ied+0.004
ained+0.004
Neg logit contributions
 know-0.005
 creativity-0.004
 agree-0.004
 acts-0.004
 thank-0.004
Token.
Token ID13
Feature activation+3.48e-03

Pos logit contributions
itates+0.033
but+0.032
ied+0.031
ained+0.031
 upon+0.031
Neg logit contributions
 know-0.040
 creativity-0.039
 agree-0.038
 thank-0.038
 help-0.037
Token Everyone
Token ID11075
Feature activation-6.67e-03

Pos logit contributions
 supervision+0.005
 agree+0.005
 creativity+0.004
 chew+0.004
 know+0.004
Neg logit contributions
 upon-0.005
but-0.005
itates-0.005
ained-0.005
ied-0.005
Token so
Token ID523
Feature activation-1.73e-03

Pos logit contributions
but+0.008
ied+0.008
itates+0.008
 upon+0.008
Prin+0.007
Neg logit contributions
 know-0.007
 help-0.006
 thank-0.006
 agree-0.006
 creativity-0.006
Token scared
Token ID12008
Feature activation+6.42e-03

Pos logit contributions
itates+0.003
but+0.003
 upon+0.003
ied+0.003
ained+0.003
Neg logit contributions
 know-0.002
 creativity-0.002
 agree-0.002
 acts-0.002
 thank-0.002
Token that
Token ID326
Feature activation-2.60e-02

Pos logit contributions
 know+0.007
 creativity+0.007
 agree+0.007
 acts+0.007
 amaz+0.006
Neg logit contributions
but-0.011
itates-0.011
 upon-0.011
ied-0.011
just-0.011
Token the
Token ID262
Feature activation-2.15e-02

Pos logit contributions
but+0.038
itates+0.037
 upon+0.037
ied+0.036
just+0.036
Neg logit contributions
 know-0.037
 creativity-0.036
 agree-0.035
 acts-0.035
 pleasure-0.034
Token books
Token ID3835
Feature activation-2.43e-02

Pos logit contributions
itates+0.036
but+0.035
ied+0.035
ained+0.034
itating+0.034
Neg logit contributions
 know-0.027
 creativity-0.026
 agree-0.025
 help-0.025
 thank-0.025
Token would
Token ID561
Feature activation-5.09e-02

Pos logit contributions
itates+0.035
but+0.035
chard+0.034
Prin+0.034
just+0.034
Neg logit contributions
 know-0.034
 agree-0.032
 thank-0.032
 help-0.032
 creativity-0.031
Token get
Token ID651
Feature activation-1.12e-02

Pos logit contributions
itates+0.078
but+0.078
 upon+0.076
ied+0.076
ained+0.075
Neg logit contributions
 know-0.068
 creativity-0.065
 agree-0.064
 acts-0.063
 thank-0.062
Token damaged
Token ID9694
Feature activation-7.30e-03

Pos logit contributions
but+0.015
itates+0.015
itating+0.014
just+0.014
abby+0.014
Neg logit contributions
 know-0.018
 help-0.017
 creativity-0.017
 thank-0.017
 kindness-0.016
Token,
Token ID11
Feature activation-3.04e-02

Pos logit contributions
but+0.011
itates+0.011
 upon+0.011
ied+0.011
ained+0.011
Neg logit contributions
 know-0.010
 creativity-0.009
 agree-0.009
 acts-0.009
 thank-0.009
Token so
Token ID523
Feature activation-1.57e-02

Pos logit contributions
itates+0.052
but+0.051
ied+0.051
 upon+0.050
ained+0.050
Neg logit contributions
 know-0.036
 creativity-0.034
 agree-0.034
 thank-0.033
 acts-0.032
Token she
Token ID673
Feature activation-3.47e-02

Pos logit contributions
but+0.023
itates+0.023
 upon+0.023
ied+0.023
ained+0.023
Neg logit contributions
 know-0.022
 creativity-0.021
 agree-0.021
 acts-0.020
 pleasure-0.020
Token the
Token ID262
Feature activation-3.30e-02

Pos logit contributions
 know+0.003
 creativity+0.003
 agree+0.003
 thank+0.003
 acts+0.003
Neg logit contributions
itates-0.004
but-0.004
ied-0.004
 upon-0.004
ained-0.004
Token most
Token ID749
Feature activation-4.53e-03

Pos logit contributions
itates+0.054
ied+0.052
but+0.050
ained+0.050
itating+0.049
Neg logit contributions
 know-0.042
 help-0.041
 creativity-0.041
 thank-0.041
 offer-0.040
Token beautiful
Token ID4950
Feature activation-1.33e-02

Pos logit contributions
but+0.006
just+0.006
 upon+0.006
ied+0.006
 by+0.006
Neg logit contributions
 know-0.007
 agree-0.007
 supervision-0.007
 acts-0.007
 creativity-0.007
Token purse
Token ID27517
Feature activation-2.26e-02

Pos logit contributions
but+0.022
itates+0.022
 upon+0.022
ied+0.022
ained+0.021
Neg logit contributions
 know-0.016
 creativity-0.015
 agree-0.015
 acts-0.015
 thank-0.014
Token she
Token ID673
Feature activation-4.63e-02

Pos logit contributions
but+0.031
ied+0.030
just+0.030
itates+0.030
 upon+0.030
Neg logit contributions
 know-0.035
 creativity-0.034
 agree-0.034
 acts-0.033
 pleasure-0.032
Token had
Token ID550
Feature activation-5.11e-02

Pos logit contributions
ained+0.086
itating+0.083
itates+0.082
but+0.082
ied+0.080
Neg logit contributions
 know-0.053
 help-0.047
 thank-0.047
 creativity-0.047
 agree-0.047
Token ever
Token ID1683
Feature activation-3.03e-02

Pos logit contributions
itates+0.092
itating+0.087
 hotter+0.087
Prin+0.085
ained+0.084
Neg logit contributions
 know-0.060
 help-0.055
 believe-0.054
 agree-0.051
 offer-0.051
Token seen
Token ID1775
Feature activation-1.72e-02

Pos logit contributions
itates+0.044
but+0.043
ied+0.042
ained+0.042
just+0.041
Neg logit contributions
 know-0.044
 creativity-0.042
 agree-0.042
 acts-0.040
 thank-0.040
Token.
Token ID13
Feature activation-3.44e-03

Pos logit contributions
itates+0.030
bel+0.028
itating+0.028
ied+0.028
Prin+0.028
Neg logit contributions
 know-0.019
 help-0.018
 agree-0.018
 creativity-0.018
 thank-0.018
Token It
Token ID632
Feature activation+1.53e-03

Pos logit contributions
itates+0.006
but+0.006
ied+0.005
 upon+0.005
ained+0.005
Neg logit contributions
 know-0.004
 creativity-0.004
 agree-0.004
 thank-0.004
 acts-0.004
Token was
Token ID373
Feature activation+5.62e-04

Pos logit contributions
 supervision+0.003
 agree+0.003
 creativity+0.003
 know+0.003
 pleasure+0.003
Neg logit contributions
 upon-0.002
itates-0.002
but-0.002
ied-0.002
chard-0.002