TOP ACTIVATIONS
MAX = 0.033

for being so well - beh aved and Jack was delighted
be happy and well - beh aved . The end .
put their clothes in their clos ets , their books on
became friendly and well - beh aved , just like Buddy
baby . He was a mis chie vous little baby and
could be and never mis beh aved . <|endoftext|> Once upon
a nice home in a terr arium . They learned more
by and imported by and exported by and sold by and
a plan in case of emergencies . <|endoftext|> Once upon a
way she solved the math equations . Every
his friends . They all che wed the gum and blew
saw some other kids mis beh aving and running around .
still be healthy and un har med . Suddenly , her
looked at him with a mis chie vous grin . She
back the village to its former state . With the help
the old lady with a mis chie vous tw inkle in
see what was inside the envelop es ! <|endoftext|> One day
you put them in the compost pile over there ." Little
wagon crashed into a d andel ion bush ! Jack was
tree , he was un har med , but he felt

MIN ACTIVATIONS
MIN = -0.045

you go , my d arl ings . Enjoy your lunch
worked hard . The grass ho pper was spoiled and didn
but be careful . Ost ric hes can be gr umpy
. Rex sn arl s . He does not
's okay , my d arl ings . I forgive you
have fun , my d arl ings ?" Their mother asked
Spot grow ls and sn arl s . He tries to
. They were called ost ric hes . Lily and Ben
forgive you , my d arl ings . And of course
'm sorry , my d arl ings . That was a
ostr ich pen . Ost ric hes are big birds with
you okay , my d arl ings ?" Mom asked .
ostr ich pen . Ost ric hes are big birds with
'm sorry , my d arl ings . That was a
you too , my d arl ings . Now let 's
and looked at the ost ric hes . They saw one
'm sorry , my d arl ings . The tornado took
and run to the ost ric hes . "
you too , my d arl ings ," she said .
's wonderful , my d arl ings !" Mom my says

INTERVAL 0.014 - 0.033
CONTAINS 0.052%

, which seemed to grow bigger and bigger .
elly nation had a wide variety of sm elly foods ,
put the snacks on the convey or belt . Tim my
" Can I touch the books he lf too ?" His
smelled like it was too much . Mia took a sniff

INTERVAL -0.006 - 0.014
CONTAINS 92.232%

a very high fever . You need to rest and drink
very embarrass because she couldn 't reach her ball .
. " What did you do ?" she asks
says . " We love you too , mom
it was too small . She tried to ask her mom

INTERVAL -0.025 - -0.006
CONTAINS 7.713%

The friends were so happy when they saw the strawberries .
, Ben , it is not y ucky . It is
give it a chance . How about we make a deal
spl ashing and kicking his legs . This time he felt
the boy asked in a loud voice . The

INTERVAL -0.045 - -0.025
CONTAINS 0.002%

Then Baby saw some grass ho ppers in the garden .
you too , my d arl ings ," she said .
you go , my d arl ings . Enjoy your lunch
home . He grabbed some seaw eed and started to build
you okay , my d arl ings ?" she asked .
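
The interval edges above (-0.045, -0.025, -0.006, 0.014, 0.033) are consistent, up to rounding, with four equal-width bins between MIN and MAX, and the four CONTAINS shares sum to 99.999%. A minimal sketch of how such shares could be computed from a list of activations follows. It assumes equal-width binning, which the dump itself does not state, and `interval_shares` is a hypothetical helper name, not part of any dashboard tool.

```python
def interval_shares(activations, n_bins=4):
    """Split [min, max] into n_bins equal-width intervals (an assumption)
    and return (low, high, percent) rows, highest interval first."""
    lo, hi = min(activations), max(activations)
    if hi == lo:
        raise ValueError("all activations are equal; no intervals to form")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp the index so the maximum value lands in the top interval.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(activations)
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in reversed(range(n_bins))
    ]


if __name__ == "__main__":
    # Toy values only; these are not the activations behind the dump.
    for low, high, pct in interval_shares([0.0, 1.0, 1.5, 1.6, 2.0, 3.0, 4.0, 4.0]):
        print(f"INTERVAL {low:.3f} - {high:.3f}  CONTAINS {pct:.3f}%")
```

Rows are returned from the highest interval down to match the order of the INTERVAL blocks above.
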
Token for
Token ID329
Feature activation+3.08e-03

Pos logit contributions
gy+0.003
 Majesty+0.003
 majesty+0.003
bank+0.003
asis+0.002
Neg logit contributions
 later-0.002
 feedback-0.002
 tug-0.002
 adjustments-0.002
 longer-0.002
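
The record above, like every per-token record in this dump, has a fixed layout: a `Token` line, a `Token ID` line, a `Feature activation` line, then `Pos logit contributions` and `Neg logit contributions` lists. Labels are fused directly to their values. The sketch below shows one way to parse that layout. It assumes that the text right after the `Token` label is the raw token string, so a leading space belongs to the token; that is an interpretation, not something the dump states. `parse_records` is a hypothetical helper name.

```python
import re

# A contribution line is a token string fused to a signed decimal, e.g. " later-0.002".
CONTRIB = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


def parse_records(text):
    """Parse fused 'Token / Token ID / Feature activation' records into dicts."""
    records, cur, side = [], None, None
    for line in text.splitlines():
        if cur is not None and cur["id"] is None and re.fullmatch(r"Token ID\d+", line):
            cur["id"] = int(line[len("Token ID"):])
        elif cur is not None and line.startswith("Feature activation"):
            cur["activation"] = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            side = "pos"
        elif line == "Neg logit contributions":
            side = "neg"
        elif side and cur is not None and (m := CONTRIB.match(line)):
            cur[side].append((m.group(1), float(m.group(2))))
        elif line.startswith("Token"):
            # Start a new record; keep any leading space as part of the token.
            cur = {"token": line[len("Token"):], "id": None,
                   "activation": None, "pos": [], "neg": []}
            records.append(cur)
            side = None
    return records
```

Keeping the leading space distinguishes word-initial tokens such as ` for` from continuation pieces such as `beh`, under the assumption stated above.
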
Token being
Token ID852
Feature activation+1.84e-03

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 tug+0.004
 grades+0.004
Neg logit contributions
gy-0.005
bank-0.004
man-0.004
 majesty-0.004
 Majesty-0.004
Token so
Token ID523
Feature activation-2.48e-03

Pos logit contributions
 feedback+0.002
 tug+0.002
itement+0.002
 later+0.002
 longer+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
asis-0.003
 majesty-0.003
Token well
Token ID880
Feature activation+5.90e-04

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
asis+0.003
zy+0.003
Neg logit contributions
 feedback-0.004
 tug-0.004
 adjustments-0.003
itement-0.003
 grades-0.003
Token-
Token ID12
Feature activation-3.74e-03

Pos logit contributions
 later+0.001
 feedback+0.001
 tug+0.001
 adjustments+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 majesty-0.001
 Majesty-0.001
Tokenbeh
Token ID20709
Feature activation+3.21e-02

Pos logit contributions
gy+0.007
 majesty+0.007
 Majesty+0.006
bank+0.006
man+0.006
Neg logit contributions
itement-0.004
 feedback-0.004
 later-0.004
 tug-0.004
 adjustments-0.004
Tokenaved
Token ID9586
Feature activation+6.80e-03

Pos logit contributions
 feedback+0.045
 adjustments+0.044
 tug+0.043
 conflict+0.042
oral+0.042
Neg logit contributions
gy-0.052
bank-0.048
o-0.045
man-0.043
asis-0.043
Token and
Token ID290
Feature activation-2.58e-03

Pos logit contributions
 later+0.007
 adjustments+0.007
 tug+0.007
 grades+0.007
 longer+0.007
Neg logit contributions
gy-0.012
 majesty-0.012
 Majesty-0.012
asis-0.011
bank-0.011
Token Jack
Token ID3619
Feature activation+2.24e-03

Pos logit contributions
gy+0.005
man+0.004
an+0.004
asis+0.004
bank+0.004
Neg logit contributions
 tug-0.003
 later-0.003
 adjustments-0.003
 grades-0.003
 feedback-0.003
Token was
Token ID373
Feature activation-1.20e-03

Pos logit contributions
 tug+0.003
 later+0.003
itement+0.003
 grades+0.003
 demonstrated+0.002
Neg logit contributions
gy-0.004
bank-0.004
asis-0.004
 Majesty-0.003
 majesty-0.003
Token delighted
Token ID20707
Feature activation+2.46e-03

Pos logit contributions
gy+0.002
bank+0.002
asis+0.002
 majesty+0.002
man+0.002
Neg logit contributions
 longer-0.002
 tug-0.001
 later-0.001
itement-0.001
 grades-0.001
Token be
Token ID307
Feature activation+1.99e-04

Pos logit contributions
gy+0.026
bank+0.024
man+0.024
zy+0.023
 majesty+0.023
Neg logit contributions
 feedback-0.016
 adjustments-0.016
 tug-0.016
itement-0.015
 grades-0.015
Token happy
Token ID3772
Feature activation+2.63e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 tug+0.000
 later+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
zy-0.000
Token and
Token ID290
Feature activation-4.05e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
 grades+0.003
 tug+0.003
 later+0.003
Neg logit contributions
gy-0.005
bank-0.005
man-0.005
zy-0.004
 majesty-0.004
Token well
Token ID880
Feature activation-1.39e-03

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
zy+0.006
asis+0.006
Neg logit contributions
 feedback-0.005
 tug-0.005
 adjustments-0.005
 grades-0.005
 later-0.005
Token-
Token ID12
Feature activation-8.37e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
 majesty+0.002
 Majesty+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.002
 tug-0.002
 grades-0.002
 later-0.002
Tokenbeh
Token ID20709
Feature activation+3.10e-02

Pos logit contributions
gy+0.015
bank+0.014
 majesty+0.014
man+0.014
 Majesty+0.014
Neg logit contributions
 feedback-0.010
itement-0.009
 adjustments-0.009
 tug-0.009
 grades-0.009
Tokenaved
Token ID9586
Feature activation+6.87e-03

Pos logit contributions
 feedback+0.045
oral+0.044
 adjustments+0.044
 conflict+0.044
 tug+0.043
Neg logit contributions
gy-0.049
bank-0.043
o-0.042
asis-0.038
man-0.038
Token.
Token ID13
Feature activation-3.03e-04

Pos logit contributions
 adjustments+0.008
 feedback+0.008
 later+0.008
 grades+0.007
 tug+0.007
Neg logit contributions
gy-0.012
 majesty-0.012
 Majesty-0.011
bank-0.011
asis-0.011
Token The
Token ID383
Feature activation-6.81e-04

Pos logit contributions
gy+0.001
 Majesty+0.001
 majesty+0.000
bank+0.000
man+0.000
Neg logit contributions
Less-0.000
oral-0.000
 feedback-0.000
 later-0.000
itement-0.000
Token end
Token ID886
Feature activation-2.11e-03

Pos logit contributions
man+0.001
 jungle+0.001
 gorilla+0.001
gy+0.001
 volcano+0.001
Neg logit contributions
ares-0.001
care-0.001
 feedback-0.001
race-0.001
pr-0.001
Token.
Token ID13
Feature activation+1.88e-03

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
zy+0.003
 majesty+0.003
Neg logit contributions
 feedback-0.003
 adjustments-0.003
itement-0.003
 tug-0.003
 grades-0.003
Token put
Token ID1234
Feature activation+7.54e-04

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 shortcuts+0.002
oral+0.002
Neg logit contributions
gy-0.003
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
Token their
Token ID511
Feature activation+3.20e-03

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 tug+0.001
oral+0.001
Neg logit contributions
gy-0.001
man-0.001
bank-0.001
zy-0.001
 majesty-0.001
Token clothes
Token ID8242
Feature activation+8.30e-04

Pos logit contributions
 adjustments+0.004
itement+0.004
 feedback+0.004
 graded+0.004
oral+0.004
Neg logit contributions
gy-0.005
 majesty-0.005
man-0.005
bank-0.005
 Majesty-0.004
Token in
Token ID287
Feature activation-7.73e-04

Pos logit contributions
 feedback+0.001
oral+0.001
Less+0.001
itement+0.001
 adjustments+0.001
Neg logit contributions
gy-0.001
man-0.001
bank-0.001
zy-0.001
o-0.001
Token their
Token ID511
Feature activation+4.41e-03

Pos logit contributions
gy+0.001
man+0.001
bank+0.001
 Arctic+0.001
 Twe+0.001
Neg logit contributions
 feedback-0.001
oral-0.001
 graded-0.001
ares-0.001
 adjustments-0.001
Token clos
Token ID3542
Feature activation+3.00e-02

Pos logit contributions
 feedback+0.005
 adjustments+0.005
itement+0.005
 graded+0.005
Less+0.005
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
 majesty-0.007
 Majesty-0.007
Tokenets
Token ID1039
Feature activation+5.70e-03

Pos logit contributions
 tug+0.039
Less+0.039
 adjustments+0.038
 feedback+0.038
itement+0.038
Neg logit contributions
gy-0.048
bank-0.044
man-0.043
zy-0.042
 majesty-0.041
Token,
Token ID11
Feature activation+3.18e-03

Pos logit contributions
 feedback+0.007
 adjustments+0.007
itement+0.007
 tug+0.007
 grades+0.007
Neg logit contributions
gy-0.010
bank-0.009
man-0.009
 majesty-0.008
zy-0.008
Token their
Token ID511
Feature activation+3.42e-03

Pos logit contributions
 feedback+0.004
itement+0.004
 adjustments+0.004
oral+0.004
Less+0.004
Neg logit contributions
gy-0.005
bank-0.004
man-0.004
 majesty-0.004
zy-0.004
Token books
Token ID3835
Feature activation+1.39e-02

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 tug+0.004
 graded+0.004
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
 majesty-0.005
 Majesty-0.005
Token on
Token ID319
Feature activation+8.57e-03

Pos logit contributions
 feedback+0.021
itement+0.020
 adjustments+0.020
 tug+0.019
 grades+0.019
Neg logit contributions
gy-0.021
man-0.019
bank-0.019
zy-0.018
asis-0.017
Token became
Token ID2627
Feature activation-3.66e-03

Pos logit contributions
gy+0.013
bank+0.012
man+0.012
 Majesty+0.012
 majesty+0.011
Neg logit contributions
itement-0.009
 feedback-0.009
 tug-0.009
 adjustments-0.009
 later-0.009
Token friendly
Token ID8030
Feature activation-6.21e-04

Pos logit contributions
gy+0.005
bank+0.005
man+0.005
 majesty+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.006
itement-0.005
 adjustments-0.005
 tug-0.005
 grades-0.005
Token and
Token ID290
Feature activation-6.40e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
 majesty+0.001
zy+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
 tug-0.001
 grades-0.001
Token well
Token ID880
Feature activation-4.02e-03

Pos logit contributions
gy+0.011
bank+0.011
man+0.010
asis+0.010
 Majesty+0.010
Neg logit contributions
 feedback-0.008
 tug-0.008
 adjustments-0.008
 grades-0.007
itement-0.007
Token-
Token ID12
Feature activation-1.08e-02

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
 majesty+0.006
zy+0.006
Neg logit contributions
 feedback-0.005
 adjustments-0.005
itement-0.005
 tug-0.005
 grades-0.005
Tokenbeh
Token ID20709
Feature activation+2.93e-02

Pos logit contributions
gy+0.019
 majesty+0.018
 Majesty+0.018
man+0.017
bank+0.017
Neg logit contributions
itement-0.013
 feedback-0.013
 adjustments-0.012
 tug-0.012
 later-0.012
Tokenaved
Token ID9586
Feature activation+3.12e-03

Pos logit contributions
 feedback+0.041
 adjustments+0.041
oral+0.040
 conflict+0.040
 tug+0.039
Neg logit contributions
gy-0.047
bank-0.042
o-0.041
man-0.038
asis-0.038
Token,
Token ID11
Feature activation-3.69e-04

Pos logit contributions
 adjustments+0.003
 later+0.003
 tug+0.003
 feedback+0.003
 grades+0.003
Neg logit contributions
gy-0.006
 majesty-0.005
 Majesty-0.005
asis-0.005
bank-0.005
Token just
Token ID655
Feature activation-8.33e-03

Pos logit contributions
gy+0.001
man+0.001
bank+0.001
 majesty+0.001
asis+0.001
Neg logit contributions
 tug-0.000
 feedback-0.000
 adjustments-0.000
 longer-0.000
 later-0.000
Token like
Token ID588
Feature activation-4.20e-03

Pos logit contributions
gy+0.015
man+0.014
bank+0.014
asis+0.014
 majesty+0.013
Neg logit contributions
 tug-0.010
 feedback-0.010
 later-0.009
 adjustments-0.009
 grades-0.009
Token Buddy
Token ID36896
Feature activation+1.19e-02

Pos logit contributions
gy+0.007
asis+0.007
man+0.007
bank+0.007
 majesty+0.007
Neg logit contributions
 feedback-0.005
 later-0.005
 tug-0.005
 adjustments-0.005
 longer-0.005
Token baby
Token ID5156
Feature activation-2.30e-03

Pos logit contributions
bank+0.007
gy+0.007
opl+0.006
man+0.006
zy+0.006
Neg logit contributions
 tug-0.006
 feedback-0.005
 conflict-0.005
 longer-0.005
itement-0.005
Token.
Token ID13
Feature activation-2.53e-04

Pos logit contributions
bank+0.004
 Majesty+0.004
 majesty+0.004
gy+0.004
opl+0.004
Neg logit contributions
 later-0.003
 longer-0.003
 girls-0.003
 feedback-0.002
 tug-0.002
Token He
Token ID679
Feature activation-3.74e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
 Majesty+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
itement-0.000
 adjustments-0.000
 tug-0.000
 grades-0.000
Token was
Token ID373
Feature activation+1.45e-04

Pos logit contributions
 Majesty+0.006
bank+0.006
stack+0.006
 majesty+0.006
gy+0.006
Neg logit contributions
 tug-0.005
 later-0.005
 conflict-0.004
 graded-0.004
itement-0.004
Token a
Token ID257
Feature activation+2.09e-03

Pos logit contributions
 longer+0.000
 tug+0.000
 persu+0.000
 graded+0.000
 feedback+0.000
Neg logit contributions
bank-0.000
asis-0.000
 Majesty-0.000
 majesty-0.000
opl-0.000
Token mis
Token ID2984
Feature activation+2.89e-02

Pos logit contributions
 tug+0.003
 longer+0.003
 feedback+0.003
 conflict+0.003
itement+0.002
Neg logit contributions
bank-0.004
gy-0.003
opl-0.003
zy-0.003
 Majesty-0.003
Tokenchie
Token ID3043
Feature activation-1.06e-02

Pos logit contributions
 feedback+0.047
 conflict+0.047
 adjustments+0.046
 grades+0.046
oral+0.046
Neg logit contributions
gy-0.039
bank-0.034
man-0.033
o-0.031
eh-0.030
Tokenvous
Token ID31222
Feature activation+6.04e-03

Pos logit contributions
 Majesty+0.017
 majesty+0.016
gy+0.015
bank+0.014
stack+0.014
Neg logit contributions
 of-0.014
itement-0.014
 later-0.014
care-0.013
 feedback-0.013
Token little
Token ID1310
Feature activation-4.38e-04

Pos logit contributions
 tug+0.006
 longer+0.005
 girls+0.005
 conflict+0.005
 feedback+0.005
Neg logit contributions
bank-0.012
gy-0.012
opl-0.011
asis-0.011
 Majesty-0.011
Token baby
Token ID5156
Feature activation-2.53e-03

Pos logit contributions
gy+0.001
bank+0.001
 majesty+0.001
man+0.001
 Majesty+0.001
Neg logit contributions
 feedback-0.001
 tug-0.001
 adjustments-0.001
itement-0.000
 longer-0.000
Token and
Token ID290
Feature activation+8.60e-04

Pos logit contributions
bank+0.005
gy+0.005
 Majesty+0.005
 majesty+0.005
asis+0.004
Neg logit contributions
 later-0.002
 longer-0.002
 tug-0.002
 feedback-0.002
 graded-0.002
Token could
Token ID714
Feature activation+7.16e-03

Pos logit contributions
gy+0.004
bank+0.004
 Majesty+0.004
 majesty+0.004
man+0.004
Neg logit contributions
 later-0.003
 tug-0.003
 feedback-0.003
 grades-0.003
 graded-0.003
Token be
Token ID307
Feature activation+3.00e-03

Pos logit contributions
 tug+0.008
 feedback+0.008
 adjustments+0.008
 later+0.008
 grades+0.007
Neg logit contributions
gy-0.013
bank-0.012
man-0.012
 majesty-0.012
 Majesty-0.012
Token and
Token ID290
Feature activation+2.09e-03

Pos logit contributions
 feedback+0.004
 later+0.004
 longer+0.003
 adjustments+0.003
 grades+0.003
Neg logit contributions
gy-0.005
bank-0.005
man-0.005
 majesty-0.005
stack-0.005
Token never
Token ID1239
Feature activation-2.00e-03

Pos logit contributions
 feedback+0.003
 tug+0.002
 later+0.002
 grades+0.002
 longer+0.002
Neg logit contributions
gy-0.004
eh-0.003
asis-0.003
man-0.003
zy-0.003
Token mis
Token ID2984
Feature activation+2.20e-02

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
 majesty+0.003
zy+0.003
Neg logit contributions
 feedback-0.002
 adjustments-0.002
itement-0.002
 tug-0.002
 grades-0.002
Tokenbeh
Token ID20709
Feature activation+2.85e-02

Pos logit contributions
oral+0.035
 feedback+0.033
 recess+0.032
Less+0.032
 adjustments+0.032
Neg logit contributions
gy-0.035
bank-0.028
o-0.028
man-0.026
an-0.024
Tokenaved
Token ID9586
Feature activation+1.19e-02

Pos logit contributions
oral+0.045
 feedback+0.043
Less+0.042
 adjustments+0.042
 conflict+0.041
Neg logit contributions
gy-0.044
o-0.038
bank-0.037
eer-0.033
opl-0.033
Token.
Token ID13
Feature activation-2.90e-03

Pos logit contributions
 later+0.012
 grades+0.012
 adjustments+0.012
 tug+0.012
 shortcuts+0.012
Neg logit contributions
gy-0.022
 majesty-0.022
asis-0.021
 Majesty-0.021
bank-0.021
Token<|endoftext|>
Token ID50256
Feature activation+2.44e-03

Pos logit contributions
gy+0.005
asis+0.005
 majesty+0.005
 Majesty+0.004
man+0.004
Neg logit contributions
Less-0.004
 feedback-0.003
oral-0.003
 grades-0.003
 shortcuts-0.003
TokenOnce
Token ID7454
Feature activation-2.80e-03

Pos logit contributions
 later+0.003
itement+0.003
 graded+0.003
care+0.003
 grades+0.003
Neg logit contributions
gy-0.004
 Majesty-0.004
 majesty-0.004
bank-0.004
man-0.004
Token upon
Token ID2402
Feature activation-4.15e-03

Pos logit contributions
gy+0.006
bank+0.006
 Majesty+0.006
 majesty+0.006
asis+0.006
Neg logit contributions
 later-0.002
 conflict-0.002
oms-0.002
oral-0.002
itement-0.002
Token a
Token ID257
Feature activation+1.82e-04

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
zy-0.003
 majesty-0.003
Token nice
Token ID3621
Feature activation-1.39e-03

Pos logit contributions
 feedback+0.000
 tug+0.000
 adjustments+0.000
itement+0.000
 conflict+0.000
Neg logit contributions
gy-0.000
bank-0.000
zy-0.000
man-0.000
 Majesty-0.000
Token home
Token ID1363
Feature activation+9.55e-04

Pos logit contributions
gy+0.003
bank+0.002
zy+0.002
asis+0.002
 Majesty+0.002
Neg logit contributions
 feedback-0.002
 tug-0.002
 adjustments-0.001
itement-0.001
 conflict-0.001
Token in
Token ID287
Feature activation+4.21e-03

Pos logit contributions
 later+0.001
 feedback+0.001
 longer+0.001
 adjustments+0.001
 grades+0.001
Neg logit contributions
gy-0.002
 Majesty-0.002
bank-0.002
 majesty-0.002
asis-0.002
Token a
Token ID257
Feature activation+3.07e-03

Pos logit contributions
 feedback+0.006
itement+0.005
 adjustments+0.005
 tug+0.005
 later+0.005
Neg logit contributions
gy-0.007
bank-0.006
man-0.006
zy-0.006
 majesty-0.006
Token terr
Token ID8812
Feature activation+2.79e-02

Pos logit contributions
 feedback+0.005
 tug+0.005
itement+0.005
 adjustments+0.005
 conflict+0.005
Neg logit contributions
gy-0.004
bank-0.004
zy-0.004
man-0.004
 majesty-0.004
Tokenarium
Token ID17756
Feature activation-3.49e-03

Pos logit contributions
 feedback+0.036
 adjustments+0.036
itement+0.035
 tug+0.035
 grades+0.034
Neg logit contributions
gy-0.048
bank-0.043
man-0.043
 majesty-0.041
 Majesty-0.041
Token.
Token ID13
Feature activation-9.76e-04

Pos logit contributions
gy+0.007
 Majesty+0.006
bank+0.006
 majesty+0.006
zy+0.006
Neg logit contributions
 later-0.004
 feedback-0.003
itement-0.003
 longer-0.003
 adjustments-0.003
Token They
Token ID1119
Feature activation+4.11e-04

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
 majesty+0.001
zy+0.001
Neg logit contributions
 feedback-0.002
 adjustments-0.001
itement-0.001
 tug-0.001
 grades-0.001
Token learned
Token ID4499
Feature activation+3.61e-03

Pos logit contributions
 feedback+0.000
itement+0.000
 tug+0.000
 adjustments+0.000
 later+0.000
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 Majesty-0.001
zy-0.001
Token more
Token ID517
Feature activation-2.66e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.005
itement+0.005
 tug+0.005
 grades+0.005
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
zy-0.005
 majesty-0.005
Token by
Token ID416
Feature activation+2.08e-03

Pos logit contributions
 feedback+0.007
itement+0.007
 adjustments+0.007
 tug+0.007
 later+0.006
Neg logit contributions
gy-0.011
man-0.010
bank-0.010
 majesty-0.010
 Majesty-0.010
Token and
Token ID290
Feature activation-1.67e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.002
 tug+0.002
itement+0.002
 grades+0.002
Neg logit contributions
gy-0.004
man-0.003
bank-0.003
asis-0.003
 majesty-0.003
Token imported
Token ID17392
Feature activation+1.65e-02

Pos logit contributions
gy+0.003
man+0.003
bank+0.003
asis+0.003
eh+0.003
Neg logit contributions
 tug-0.002
 feedback-0.002
 adjustments-0.002
 grades-0.002
itement-0.002
Token by
Token ID416
Feature activation+3.01e-03

Pos logit contributions
 feedback+0.021
 adjustments+0.021
 tug+0.020
itement+0.020
 grades+0.019
Neg logit contributions
gy-0.029
man-0.026
bank-0.025
zy-0.024
o-0.024
Token and
Token ID290
Feature activation-1.37e-03

Pos logit contributions
 feedback+0.004
 tug+0.004
itement+0.004
 adjustments+0.004
 later+0.004
Neg logit contributions
gy-0.005
man-0.005
bank-0.005
asis-0.004
 majesty-0.004
Token exported
Token ID29050
Feature activation+2.74e-02

Pos logit contributions
gy+0.002
man+0.002
eh+0.002
asis+0.002
bank+0.002
Neg logit contributions
 feedback-0.002
 tug-0.002
 adjustments-0.002
 grades-0.002
 conflict-0.002
Token by
Token ID416
Feature activation+9.40e-04

Pos logit contributions
 adjustments+0.045
 feedback+0.045
 tug+0.045
itement+0.044
 grades+0.043
Neg logit contributions
gy-0.036
man-0.032
bank-0.031
o-0.029
zy-0.029
Token and
Token ID290
Feature activation-8.25e-03

Pos logit contributions
 feedback+0.001
 tug+0.001
 later+0.001
itement+0.001
 adjustments+0.001
Neg logit contributions
gy-0.002
bank-0.001
man-0.001
asis-0.001
 Majesty-0.001
Token sold
Token ID2702
Feature activation+2.26e-03

Pos logit contributions
gy+0.013
man+0.012
bank+0.012
 majesty+0.012
 Majesty+0.011
Neg logit contributions
 tug-0.011
 feedback-0.011
 grades-0.011
itement-0.011
 setup-0.011
Token by
Token ID416
Feature activation+1.06e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
 tug+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 majesty-0.004
zy-0.004
Token and
Token ID290
Feature activation-9.54e-03

Pos logit contributions
 feedback+0.001
 tug+0.001
 later+0.001
itement+0.001
 adjustments+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
asis-0.002
 Majesty-0.002
Token a
Token ID257
Feature activation-2.79e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.005
itement+0.005
 tug+0.005
 grades+0.005
Neg logit contributions
gy-0.007
bank-0.006
man-0.006
 majesty-0.006
zy-0.006
Token plan
Token ID1410
Feature activation+3.38e-03

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
 majesty+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.004
itement-0.004
 adjustments-0.004
 grades-0.004
Less-0.004
Token in
Token ID287
Feature activation+9.22e-05

Pos logit contributions
 feedback+0.005
 adjustments+0.004
itement+0.004
 tug+0.004
 grades+0.004
Neg logit contributions
gy-0.005
bank-0.005
man-0.005
 majesty-0.005
zy-0.005
Token case
Token ID1339
Feature activation+8.98e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 tug+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
zy-0.000
Token of
Token ID286
Feature activation-3.92e-03

Pos logit contributions
 feedback+0.011
 adjustments+0.011
itement+0.011
 later+0.011
 grades+0.011
Neg logit contributions
gy-0.016
bank-0.014
man-0.014
 Majesty-0.014
 majesty-0.014
Token emergencies
Token ID36047
Feature activation+2.72e-02

Pos logit contributions
gy+0.007
bank+0.007
man+0.007
zy+0.006
 majesty+0.006
Neg logit contributions
 feedback-0.005
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token.
Token ID13
Feature activation-4.89e-04

Pos logit contributions
 later+0.031
 grades+0.030
 feedback+0.030
 tug+0.029
 adjustments+0.028
Neg logit contributions
gy-0.047
eh-0.046
 majesty-0.045
bank-0.045
asis-0.045
Token<|endoftext|>
Token ID50256
Feature activation+2.75e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
asis+0.001
 majesty+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
Less-0.001
 tug-0.001
TokenOnce
Token ID7454
Feature activation-1.60e-03

Pos logit contributions
 later+0.003
care+0.003
itement+0.003
 tug+0.003
 graded+0.003
Neg logit contributions
gy-0.005
 Majesty-0.005
 majesty-0.005
bank-0.004
man-0.004
Token upon
Token ID2402
Feature activation-3.41e-03

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
 Majesty+0.003
asis+0.003
Neg logit contributions
 later-0.002
 feedback-0.002
 conflict-0.001
itement-0.001
oral-0.001
Token a
Token ID257
Feature activation-7.86e-03

Pos logit contributions
bank+0.007
gy+0.007
 Majesty+0.007
 majesty+0.007
man+0.006
Neg logit contributions
 later-0.003
 tug-0.003
 feedback-0.003
 adjustments-0.003
 longer-0.003
Token way
Token ID835
Feature activation+2.66e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.003
zy-0.003
Token she
Token ID673
Feature activation-2.75e-04

Pos logit contributions
 feedback+0.004
 adjustments+0.003
 grades+0.003
itement+0.003
 shortcuts+0.003
Neg logit contributions
gy-0.004
 majesty-0.004
bank-0.004
asis-0.004
 Majesty-0.004
Token solved
Token ID16019
Feature activation+1.25e-02

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
 majesty+0.000
 Majesty+0.000
Neg logit contributions
 tug-0.000
 grades-0.000
 feedback-0.000
 graded-0.000
 adjustments-0.000
Token the
Token ID262
Feature activation+6.25e-03

Pos logit contributions
 feedback+0.017
 adjustments+0.016
itement+0.016
 tug+0.016
 grades+0.016
Neg logit contributions
gy-0.020
bank-0.019
man-0.019
 majesty-0.018
zy-0.018
Token math
Token ID10688
Feature activation+1.36e-02

Pos logit contributions
itement+0.007
 feedback+0.007
 tug+0.007
 adjustments+0.007
oms+0.007
Neg logit contributions
gy-0.011
man-0.010
bank-0.010
 Majesty-0.010
zy-0.010
Token equations
Token ID27490
Feature activation+2.71e-02

Pos logit contributions
 feedback+0.018
itement+0.018
 adjustments+0.017
 tug+0.017
 grades+0.017
Neg logit contributions
gy-0.022
bank-0.021
man-0.020
zy-0.019
 majesty-0.019
Token.
Token ID13
Feature activation+3.71e-04

Pos logit contributions
 feedback+0.038
itement+0.037
 adjustments+0.037
 tug+0.036
 grades+0.035
Neg logit contributions
gy-0.042
bank-0.039
man-0.039
zy-0.037
 majesty-0.036
Token
Token ID220
Feature activation-4.28e-03

Pos logit contributions
 feedback+0.001
itement+0.001
 adjustments+0.001
 tug+0.001
 conflict+0.001
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
zy-0.000
 Majesty-0.000
Token
Token ID198
Feature activation-1.39e-05

Pos logit contributions
gy+0.008
bank+0.008
asis+0.007
 majesty+0.007
 Majesty+0.007
Neg logit contributions
 later-0.005
Less-0.005
care-0.005
 feedback-0.005
oral-0.004
Token
Token ID198
Feature activation+8.76e-04

Pos logit contributions
gy+0.000
man+0.000
bank+0.000
zy+0.000
o+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
itement-0.000
 tug-0.000
 grades-0.000
TokenEvery
Token ID6109
Feature activation-4.23e-03

Pos logit contributions
 adjustments+0.001
 halves+0.001
 dollars+0.001
 grades+0.001
itement+0.001
Neg logit contributions
gy-0.001
man-0.001
opl-0.001
zy-0.001
o-0.001
Token his
Token ID465
Feature activation+3.01e-03

Pos logit contributions
gy+0.004
bank+0.004
asis+0.004
man+0.004
 majesty+0.004
Neg logit contributions
 feedback-0.004
 adjustments-0.004
 tug-0.004
 later-0.004
 longer-0.004
Token friends
Token ID2460
Feature activation-3.16e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
 tug+0.003
itement+0.003
 grades+0.003
Neg logit contributions
gy-0.006
bank-0.006
man-0.005
zy-0.005
 majesty-0.005
Token.
Token ID13
Feature activation+5.65e-04

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
zy+0.003
o+0.003
Neg logit contributions
 feedback-0.006
 shortcuts-0.005
itement-0.005
Uh-0.005
 adjustments-0.005
Token They
Token ID1119
Feature activation-1.15e-03

Pos logit contributions
 feedback+0.001
 adjustments+0.001
 grades+0.001
 later+0.001
oral+0.001
Neg logit contributions
gy-0.001
asis-0.001
 Majesty-0.001
man-0.001
bank-0.001
Token all
Token ID477
Feature activation-4.24e-03

Pos logit contributions
gy+0.002
man+0.002
bank+0.002
zy+0.001
o+0.001
Neg logit contributions
 feedback-0.002
 adjustments-0.002
 shortcuts-0.002
 grades-0.002
itement-0.002
Token che
Token ID1125
Feature activation+2.71e-02

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
zy+0.005
 majesty+0.005
Neg logit contributions
 feedback-0.006
 adjustments-0.006
 grades-0.006
 shortcuts-0.006
itement-0.006
Tokenwed
Token ID19103
Feature activation+7.81e-03

Pos logit contributions
 feedback+0.040
 adjustments+0.039
 grades+0.038
oral+0.038
 graded+0.038
Neg logit contributions
gy-0.042
eh-0.037
o-0.037
bank-0.036
at-0.035
Token the
Token ID262
Feature activation+1.77e-03

Pos logit contributions
 feedback+0.011
itement+0.011
oral+0.011
 grades+0.011
 adjustments+0.011
Neg logit contributions
gy-0.012
bank-0.010
man-0.010
zy-0.010
o-0.010
Token gum
Token ID27142
Feature activation+4.67e-04

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
zy-0.003
 majesty-0.003
Token and
Token ID290
Feature activation+3.30e-03

Pos logit contributions
 later+0.000
 adjustments+0.000
 tug+0.000
 halves+0.000
 feedback+0.000
Neg logit contributions
gy-0.001
 majesty-0.001
 Majesty-0.001
bank-0.001
asis-0.001
Token blew
Token ID17948
Feature activation+5.40e-03

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 tug+0.004
 grades+0.004
Neg logit contributions
gy-0.005
bank-0.005
man-0.005
 majesty-0.005
zy-0.005
Token saw
Token ID2497
Feature activation+2.33e-03

Pos logit contributions
 majesty+0.004
gy+0.004
 Majesty+0.004
bank+0.004
stack+0.003
Neg logit contributions
 later-0.003
 graded-0.002
 tug-0.002
 conflict-0.002
 grades-0.002
Token some
Token ID617
Feature activation+6.33e-04

Pos logit contributions
 adjustments+0.002
 feedback+0.002
 longer+0.002
 tug+0.002
 harder+0.002
Neg logit contributions
gy-0.005
asis-0.005
bank-0.004
 majesty-0.004
zy-0.004
Token other
Token ID584
Feature activation-4.64e-03

Pos logit contributions
 feedback+0.001
itement+0.001
 adjustments+0.001
 tug+0.001
Less+0.001
Neg logit contributions
gy-0.001
man-0.001
bank-0.001
 majesty-0.001
zy-0.001
Token kids
Token ID3988
Feature activation-3.35e-03

Pos logit contributions
gy+0.009
bank+0.009
man+0.009
 majesty+0.008
zy+0.008
Neg logit contributions
 feedback-0.004
itement-0.004
 adjustments-0.004
 tug-0.004
 grades-0.004
Token mis
Token ID2984
Feature activation+2.24e-02

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
zy+0.004
 majesty+0.004
Neg logit contributions
 feedback-0.006
 adjustments-0.005
itement-0.005
 tug-0.005
 grades-0.005
Tokenbeh
Token ID20709
Feature activation+2.65e-02

Pos logit contributions
 conflict+0.037
oral+0.037
 feedback+0.037
Less+0.036
 grades+0.036
Neg logit contributions
gy-0.030
bank-0.027
o-0.026
eh-0.023
man-0.023
Tokenaving
Token ID2703
Feature activation-7.77e-04

Pos logit contributions
 adjustments+0.040
 feedback+0.040
 conflict+0.039
 tug+0.039
oral+0.039
Neg logit contributions
gy-0.041
o-0.036
bank-0.036
man-0.032
eer-0.032
Token and
Token ID290
Feature activation+2.32e-03

Pos logit contributions
gy+0.001
 majesty+0.001
asis+0.001
man+0.001
 Majesty+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
 tug-0.001
itement-0.001
 grades-0.001
Token running
Token ID2491
Feature activation-2.06e-03

Pos logit contributions
 tug+0.003
 feedback+0.003
 later+0.003
 adjustments+0.003
 grades+0.003
Neg logit contributions
gy-0.004
asis-0.004
man-0.004
 majesty-0.004
zy-0.004
Token around
Token ID1088
Feature activation+1.11e-03

Pos logit contributions
gy+0.003
man+0.003
bank+0.003
 majesty+0.003
 Majesty+0.003
Neg logit contributions
 feedback-0.003
 adjustments-0.003
 tug-0.003
itement-0.003
 grades-0.003
Token.
Token ID13
Feature activation+6.38e-04

Pos logit contributions
 tug+0.001
 feedback+0.001
 adjustments+0.001
itement+0.001
 later+0.001
Neg logit contributions
gy-0.002
man-0.002
 majesty-0.002
bank-0.002
 Majesty-0.002
Token still
Token ID991
Feature activation+2.11e-04

Pos logit contributions
 longer+0.007
 feedback+0.006
 tug+0.006
 later+0.006
 undo+0.006
Neg logit contributions
gy-0.010
bank-0.009
 Majesty-0.009
zy-0.009
eh-0.009
Token be
Token ID307
Feature activation+1.31e-03

Pos logit contributions
 later+0.000
 tug+0.000
 feedback+0.000
 longer+0.000
 adjustments+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 Majesty-0.000
 majesty-0.000
Token healthy
Token ID5448
Feature activation+4.40e-03

Pos logit contributions
 later+0.002
 longer+0.002
 feedback+0.001
itement+0.001
 adjustments+0.001
Neg logit contributions
bank-0.002
gy-0.002
stack-0.002
ule-0.002
man-0.002
Token and
Token ID290
Feature activation-4.84e-03

Pos logit contributions
 later+0.004
 longer+0.004
 adjustments+0.004
 feedback+0.004
 grades+0.004
Neg logit contributions
gy-0.009
 majesty-0.009
bank-0.009
man-0.008
 Majesty-0.008
Token un
Token ID555
Feature activation+9.34e-03

Pos logit contributions
gy+0.008
bank+0.008
opl+0.007
man+0.007
zy+0.007
Neg logit contributions
 tug-0.007
 feedback-0.007
 longer-0.007
 later-0.007
 adjustments-0.006
Tokenhar
Token ID9869
Feature activation+2.64e-02

Pos logit contributions
 feedback+0.011
itement+0.011
 adjustments+0.011
 tug+0.011
 grades+0.010
Neg logit contributions
gy-0.016
bank-0.015
man-0.015
 majesty-0.015
 Majesty-0.015
Tokenmed
Token ID1150
Feature activation-6.38e-04

Pos logit contributions
 feedback+0.022
itement+0.021
 adjustments+0.021
 tug+0.021
 grades+0.020
Neg logit contributions
gy-0.055
bank-0.053
man-0.052
 majesty-0.051
 Majesty-0.050
Token.
Token ID13
Feature activation-3.52e-04

Pos logit contributions
gy+0.001
bank+0.001
 majesty+0.001
asis+0.001
 Majesty+0.001
Neg logit contributions
 later-0.001
 feedback-0.001
 adjustments-0.001
 tug-0.001
 grades-0.001
Token Suddenly
Token ID24975
Feature activation+1.87e-03

Pos logit contributions
gy+0.001
bank+0.000
man+0.000
 majesty+0.000
 Majesty+0.000
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
 tug-0.001
 grades-0.001
Token,
Token ID11
Feature activation-2.22e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
oral+0.003
itement+0.003
 grades+0.003
Neg logit contributions
gy-0.002
zy-0.002
bank-0.002
 majesty-0.002
man-0.002
Token her
Token ID607
Feature activation+3.36e-03

Pos logit contributions
gy+0.003
zy+0.003
man+0.003
bank+0.003
 Dod+0.002
Neg logit contributions
 feedback-0.004
 adjustments-0.003
oral-0.003
 grades-0.003
itement-0.003
Token looked
Token ID3114
Feature activation-1.86e-04

Pos logit contributions
 tug+0.006
 graded+0.006
itement+0.006
 later+0.006
 feedback+0.006
Neg logit contributions
gy-0.012
bank-0.012
asis-0.012
 Majesty-0.012
 majesty-0.012
Token at
Token ID379
Feature activation+1.32e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
zy+0.000
 Majesty+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
itement-0.000
oral-0.000
 tug-0.000
Token him
Token ID683
Feature activation+3.56e-03

Pos logit contributions
 later+0.001
 tug+0.001
 feedback+0.001
itement+0.001
 shortcuts+0.001
Neg logit contributions
 majesty-0.003
asis-0.003
gy-0.003
eh-0.003
bank-0.003
Token with
Token ID351
Feature activation-2.43e-03

Pos logit contributions
 feedback+0.005
itement+0.005
 adjustments+0.005
 grades+0.004
 tug+0.004
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
zy-0.005
 majesty-0.005
Token a
Token ID257
Feature activation-5.03e-04

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
asis+0.004
 majesty+0.004
Neg logit contributions
 tug-0.003
 feedback-0.003
 adjustments-0.003
itement-0.003
 grades-0.003
Token mis
Token ID2984
Feature activation+2.64e-02

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
zy+0.001
 majesty+0.001
Neg logit contributions
 feedback-0.001
 tug-0.001
 adjustments-0.001
itement-0.001
 conflict-0.001
Tokenchie
Token ID3043
Feature activation-1.02e-02

Pos logit contributions
 grades+0.044
 conflict+0.044
 feedback+0.043
 graded+0.043
 adjustments+0.043
Neg logit contributions
gy-0.033
bank-0.030
eh-0.028
man-0.027
an-0.027
Tokenvous
Token ID31222
Feature activation+5.70e-03

Pos logit contributions
gy+0.015
 Majesty+0.014
 majesty+0.014
stack+0.013
bank+0.012
Neg logit contributions
itement-0.016
 adjustments-0.015
 later-0.015
oral-0.015
 agree-0.015
Token grin
Token ID16770
Feature activation+1.94e-03

Pos logit contributions
 tug+0.006
 adjustments+0.006
 feedback+0.006
itement+0.006
 grades+0.006
Neg logit contributions
gy-0.011
bank-0.010
man-0.010
asis-0.010
zy-0.009
Token.
Token ID13
Feature activation+1.04e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.003
 Majesty-0.003
Token She
Token ID1375
Feature activation-1.38e-03

Pos logit contributions
 adjustments+0.001
 graded+0.001
 grades+0.001
 feedback+0.001
 tug+0.001
Neg logit contributions
gy-0.002
asis-0.002
man-0.001
 majesty-0.001
bank-0.001
Token back
Token ID736
Feature activation+1.20e-04

Pos logit contributions
gy+0.009
bank+0.009
man+0.008
 majesty+0.008
zy+0.008
Neg logit contributions
 feedback-0.007
 adjustments-0.006
itement-0.006
 tug-0.006
 grades-0.006
Token the
Token ID262
Feature activation+2.08e-03

Pos logit contributions
 feedback+0.000
itement+0.000
 adjustments+0.000
 later+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
zy-0.000
 majesty-0.000
Token village
Token ID7404
Feature activation+7.55e-03

Pos logit contributions
 adjustments+0.003
 feedback+0.003
itement+0.003
 grades+0.003
 tug+0.003
Neg logit contributions
gy-0.003
bank-0.003
asis-0.002
man-0.002
 Majesty-0.002
Token to
Token ID284
Feature activation+2.47e-03

Pos logit contributions
 adjustments+0.009
 feedback+0.009
 later+0.009
itement+0.009
 grades+0.009
Neg logit contributions
gy-0.013
bank-0.012
man-0.012
 majesty-0.012
 Majesty-0.012
Token its
Token ID663
Feature activation-3.94e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
 tug+0.003
itement+0.003
 later+0.003
Neg logit contributions
gy-0.005
man-0.004
bank-0.004
zy-0.004
 majesty-0.004
Token former
Token ID1966
Feature activation+2.62e-02

Pos logit contributions
gy+0.006
bank+0.005
man+0.005
zy+0.005
o+0.005
Neg logit contributions
 feedback-0.006
 adjustments-0.006
itement-0.006
 tug-0.006
 grades-0.006
Token state
Token ID1181
Feature activation+9.35e-03

Pos logit contributions
 feedback+0.027
 adjustments+0.027
itement+0.026
 conflict+0.025
 grades+0.025
Neg logit contributions
gy-0.050
man-0.048
bank-0.046
o-0.046
zy-0.045
Token.
Token ID13
Feature activation+1.28e-03

Pos logit contributions
 later+0.009
 feedback+0.009
 adjustments+0.009
 longer+0.009
itement+0.009
Neg logit contributions
gy-0.018
bank-0.017
 majesty-0.017
 Majesty-0.016
man-0.016
Token With
Token ID2080
Feature activation-5.50e-03

Pos logit contributions
Less+0.002
 feedback+0.002
oral+0.002
itement+0.002
 adjustments+0.002
Neg logit contributions
gy-0.002
man-0.002
bank-0.002
 majesty-0.002
asis-0.002
Token the
Token ID262
Feature activation+8.12e-04

Pos logit contributions
gy+0.009
bank+0.009
man+0.009
 majesty+0.008
zy+0.008
Neg logit contributions
 feedback-0.007
 adjustments-0.007
itement-0.007
 tug-0.007
 grades-0.007
Token help
Token ID1037
Feature activation-3.44e-03

Pos logit contributions
 adjustments+0.001
 feedback+0.001
itement+0.001
 graded+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 majesty-0.001
 Majesty-0.001
Token the
Token ID262
Feature activation-3.96e-03

Pos logit contributions
gy+0.005
bank+0.005
man+0.005
asis+0.005
eh+0.005
Neg logit contributions
 tug-0.003
 later-0.003
itement-0.003
 feedback-0.003
 adjustments-0.003
Token old
Token ID1468
Feature activation-1.32e-02

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
zy+0.006
 majesty+0.006
Neg logit contributions
 feedback-0.005
 adjustments-0.005
itement-0.005
 tug-0.005
 grades-0.005
Token lady
Token ID10846
Feature activation+3.11e-03

Pos logit contributions
gy+0.023
bank+0.021
man+0.020
zy+0.020
 Majesty+0.020
Neg logit contributions
 feedback-0.016
 adjustments-0.015
itement-0.015
 grades-0.015
 tug-0.015
Token with
Token ID351
Feature activation-1.38e-03

Pos logit contributions
 later+0.004
 dollars+0.003
itement+0.003
 graded+0.003
oms+0.003
Neg logit contributions
bank-0.005
gy-0.005
asis-0.005
 Majesty-0.005
 majesty-0.005
Token a
Token ID257
Feature activation-5.55e-04

Pos logit contributions
gy+0.003
bank+0.002
man+0.002
asis+0.002
zy+0.002
Neg logit contributions
 feedback-0.002
 tug-0.002
 adjustments-0.002
 shortcuts-0.002
itement-0.002
Token mis
Token ID2984
Feature activation+2.61e-02

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
zy+0.001
asis+0.001
Neg logit contributions
 tug-0.001
 feedback-0.001
itement-0.001
 adjustments-0.001
 conflict-0.001
Tokenchie
Token ID3043
Feature activation-1.52e-02

Pos logit contributions
 feedback+0.041
 adjustments+0.040
 conflict+0.040
 grades+0.040
Less+0.039
Neg logit contributions
gy-0.035
bank-0.033
man-0.031
o-0.029
asis-0.029
Tokenvous
Token ID31222
Feature activation+5.17e-03

Pos logit contributions
gy+0.023
 Majesty+0.021
stack+0.021
 majesty+0.021
bank+0.019
Neg logit contributions
itement-0.024
 adjustments-0.023
 later-0.023
pr-0.022
 shortcuts-0.022
Token tw
Token ID665
Feature activation+5.30e-04

Pos logit contributions
 tug+0.006
 adjustments+0.006
 feedback+0.005
 grades+0.005
itement+0.005
Neg logit contributions
gy-0.010
man-0.009
bank-0.009
asis-0.009
zy-0.008
Tokeninkle
Token ID19894
Feature activation+6.78e-03

Pos logit contributions
 adjustments+0.000
 tug+0.000
itement+0.000
oms+0.000
 grades+0.000
Neg logit contributions
gy-0.001
man-0.001
bank-0.001
stack-0.001
o-0.001
Token in
Token ID287
Feature activation+2.62e-03

Pos logit contributions
 later+0.007
itement+0.006
 adjustments+0.006
 tug+0.006
 feedback+0.006
Neg logit contributions
gy-0.013
bank-0.013
 majesty-0.012
 Majesty-0.012
man-0.012
Token see
Token ID766
Feature activation-2.23e-04

Pos logit contributions
 feedback+0.002
 adjustments+0.002
 tug+0.002
itement+0.002
 grades+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
Token what
Token ID644
Feature activation-2.80e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
asis+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
 later-0.000
 adjustments-0.000
itement-0.000
 tug-0.000
Token was
Token ID373
Feature activation+4.88e-03

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
 majesty+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 graded-0.004
Token inside
Token ID2641
Feature activation+5.28e-03

Pos logit contributions
 feedback+0.006
 later+0.006
 adjustments+0.006
 grades+0.006
 tug+0.006
Neg logit contributions
gy-0.009
bank-0.008
 majesty-0.008
man-0.008
 Majesty-0.007
Token the
Token ID262
Feature activation+3.28e-03

Pos logit contributions
 feedback+0.007
itement+0.006
 adjustments+0.006
 tug+0.006
 later+0.006
Neg logit contributions
gy-0.009
bank-0.008
man-0.008
 majesty-0.008
 Majesty-0.008
Token envelop
Token ID16441
Feature activation+2.61e-02

Pos logit contributions
itement+0.005
 feedback+0.005
 adjustments+0.005
 tug+0.005
 grades+0.005
Neg logit contributions
gy-0.004
man-0.004
bank-0.004
zy-0.004
asis-0.004
Tokenes
Token ID274
Feature activation+1.09e-02

Pos logit contributions
 feedback+0.034
itement+0.033
 adjustments+0.032
 tug+0.032
 grades+0.031
Neg logit contributions
gy-0.045
man-0.043
bank-0.040
zy-0.040
asis-0.038
Token!
Token ID0
Feature activation+1.50e-03

Pos logit contributions
 feedback+0.015
 adjustments+0.014
itement+0.014
 tug+0.014
 grades+0.013
Neg logit contributions
gy-0.018
bank-0.017
man-0.016
zy-0.016
 majesty-0.015
Token<|endoftext|>
Token ID50256
Feature activation+3.31e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 grades+0.002
 tug+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
TokenOne
Token ID3198
Feature activation+8.98e-03

Pos logit contributions
 later+0.004
care+0.004
itement+0.004
 graded+0.004
 tug+0.004
Neg logit contributions
gy-0.006
 Majesty-0.006
 majesty-0.006
bank-0.006
man-0.005
Token day
Token ID1110
Feature activation+2.15e-03

Pos logit contributions
itement+0.014
 feedback+0.014
 later+0.014
 conflict+0.014
 tug+0.014
Neg logit contributions
gy-0.013
bank-0.012
 Majesty-0.011
 majesty-0.011
man-0.011
Token you
Token ID345
Feature activation-1.94e-03

Pos logit contributions
gy+0.005
eer+0.005
bank+0.005
man+0.005
 majesty+0.005
Neg logit contributions
 later-0.004
 adjustments-0.004
 feedback-0.003
 halves-0.003
 harder-0.003
Token put
Token ID1234
Feature activation+1.48e-03

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
zy+0.003
opl+0.003
Neg logit contributions
 feedback-0.002
 later-0.002
 tug-0.002
itement-0.002
 longer-0.002
Token them
Token ID606
Feature activation+2.64e-03

Pos logit contributions
 later+0.002
 feedback+0.001
 longer+0.001
 shortcuts+0.001
 adjustments+0.001
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
lder-0.003
ule-0.003
Token in
Token ID287
Feature activation+9.22e-04

Pos logit contributions
 feedback+0.003
 later+0.003
itement+0.003
 adjustments+0.003
 tug+0.003
Neg logit contributions
bank-0.005
gy-0.005
man-0.004
 majesty-0.004
 Majesty-0.004
Token the
Token ID262
Feature activation+3.10e-03

Pos logit contributions
 feedback+0.001
 later+0.001
itement+0.001
 tug+0.001
 adjustments+0.001
Neg logit contributions
gy-0.002
bank-0.001
man-0.001
zy-0.001
 majesty-0.001
Token compost
Token ID36541
Feature activation+2.59e-02

Pos logit contributions
 feedback+0.005
 adjustments+0.005
 tug+0.005
 shortcuts+0.005
itement+0.005
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 majesty-0.004
zy-0.004
Token pile
Token ID14540
Feature activation+1.44e-02

Pos logit contributions
 later+0.032
itement+0.028
 halves+0.028
 longer+0.028
oms+0.028
Neg logit contributions
 majesty-0.042
gy-0.042
 Majesty-0.041
lot-0.040
lder-0.040
Token over
Token ID625
Feature activation+1.25e-02

Pos logit contributions
 later+0.018
 longer+0.015
 halves+0.014
oms+0.014
itement+0.014
Neg logit contributions
gy-0.026
 Majesty-0.026
 majesty-0.026
bank-0.025
 gra-0.024
Token there
Token ID612
Feature activation-3.89e-03

Pos logit contributions
 later+0.015
 longer+0.014
 feedback+0.014
itement+0.014
 tug+0.013
Neg logit contributions
gy-0.023
man-0.022
bank-0.022
 majesty-0.021
zy-0.021
Token."
Token ID526
Feature activation-1.23e-03

Pos logit contributions
gy+0.007
 majesty+0.006
eh+0.006
bank+0.006
 Majesty+0.006
Neg logit contributions
 later-0.006
 longer-0.005
 again-0.005
 agree-0.004
care-0.004
Token Little
Token ID7703
Feature activation+9.46e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
 Majesty+0.002
 majesty+0.002
Neg logit contributions
 feedback-0.002
 tug-0.002
 adjustments-0.002
 later-0.002
care-0.002
Token wagon
Token ID31860
Feature activation-2.65e-03

Pos logit contributions
gy+0.012
bank+0.011
 majesty+0.010
 Majesty+0.010
zy+0.010
Neg logit contributions
 feedback-0.016
 adjustments-0.016
 tug-0.015
itement-0.015
 grades-0.015
Token crashed
Token ID14997
Feature activation+8.98e-03

Pos logit contributions
gy+0.004
bank+0.003
man+0.003
zy+0.003
 majesty+0.003
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token into
Token ID656
Feature activation-5.14e-03

Pos logit contributions
 feedback+0.010
 adjustments+0.010
 tug+0.010
itement+0.010
 grades+0.010
Neg logit contributions
gy-0.016
bank-0.015
 majesty-0.015
 Majesty-0.015
man-0.015
Token a
Token ID257
Feature activation+1.65e-04

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
zy+0.006
 majesty+0.006
Neg logit contributions
 feedback-0.008
itement-0.008
 adjustments-0.008
 grades-0.008
oral-0.008
Token d
Token ID288
Feature activation+1.38e-02

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 tug+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
zy-0.000
Tokenandel
Token ID33134
Feature activation+2.58e-02

Pos logit contributions
 feedback+0.019
itement+0.019
 tug+0.019
 adjustments+0.019
 grades+0.018
Neg logit contributions
gy-0.022
bank-0.020
man-0.020
zy-0.019
eh-0.018
Tokenion
Token ID295
Feature activation+1.35e-02

Pos logit contributions
 feedback+0.030
 tug+0.029
itement+0.029
 adjustments+0.028
 conflict+0.027
Neg logit contributions
gy-0.046
man-0.045
bank-0.043
zy-0.042
eh-0.041
Token bush
Token ID24484
Feature activation+1.18e-02

Pos logit contributions
 adjustments+0.015
 feedback+0.015
 tug+0.015
itement+0.015
 grades+0.015
Neg logit contributions
gy-0.024
 majesty-0.022
bank-0.022
 Majesty-0.022
man-0.021
Token!
Token ID0
Feature activation+2.17e-03

Pos logit contributions
 adjustments+0.013
 tug+0.013
itement+0.013
 feedback+0.013
 later+0.013
Neg logit contributions
gy-0.021
 majesty-0.020
 Majesty-0.020
bank-0.019
man-0.019
Token Jack
Token ID3619
Feature activation+2.72e-03

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.003
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.002
zy-0.002
Token was
Token ID373
Feature activation+3.33e-03

Pos logit contributions
 tug+0.004
 feedback+0.004
itement+0.004
 adjustments+0.004
 graded+0.004
Neg logit contributions
gy-0.004
bank-0.004
man-0.003
 majesty-0.003
 Majesty-0.003
Token tree
Token ID5509
Feature activation+7.61e-03

Pos logit contributions
 feedback+0.007
 adjustments+0.007
itement+0.007
 tug+0.007
 grades+0.007
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 majesty-0.003
 Majesty-0.003
Token,
Token ID11
Feature activation-3.91e-04

Pos logit contributions
 later+0.009
 adjustments+0.008
itement+0.008
 tug+0.008
 longer+0.008
Neg logit contributions
gy-0.013
 majesty-0.013
 Majesty-0.013
man-0.012
bank-0.012
Token he
Token ID339
Feature activation-2.99e-03

Pos logit contributions
gy+0.001
bank+0.000
man+0.000
 majesty+0.000
 Majesty+0.000
Neg logit contributions
 feedback-0.001
 adjustments-0.001
 tug-0.001
itement-0.001
 grades-0.001
Token was
Token ID373
Feature activation+1.86e-03

Pos logit contributions
gy+0.005
bank+0.005
man+0.005
 majesty+0.005
 Majesty+0.005
Neg logit contributions
 tug-0.004
itement-0.004
 adjustments-0.003
 feedback-0.003
 graded-0.003
Token un
Token ID555
Feature activation+1.04e-02

Pos logit contributions
 tug+0.003
 longer+0.002
itement+0.002
 adjustments+0.002
 later+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.003
 Majesty-0.003
Tokenhar
Token ID9869
Feature activation+2.54e-02

Pos logit contributions
 feedback+0.014
oral+0.014
 conflict+0.014
 adjustments+0.014
Less+0.014
Neg logit contributions
gy-0.018
man-0.016
bank-0.015
o-0.015
zy-0.015
Tokenmed
Token ID1150
Feature activation-1.83e-03

Pos logit contributions
 feedback+0.028
oral+0.026
 adjustments+0.026
 shortcuts+0.025
 agree+0.025
Neg logit contributions
gy-0.050
zy-0.045
man-0.044
bank-0.044
o-0.044
Token,
Token ID11
Feature activation-1.43e-03

Pos logit contributions
gy+0.003
bank+0.003
 majesty+0.003
 Majesty+0.003
man+0.003
Neg logit contributions
 adjustments-0.002
 feedback-0.002
 tug-0.002
itement-0.002
 grades-0.002
Token but
Token ID475
Feature activation+4.82e-03

Pos logit contributions
gy+0.003
man+0.002
 majesty+0.002
asis+0.002
 Majesty+0.002
Neg logit contributions
 tug-0.002
 longer-0.002
 adjustments-0.002
 later-0.002
 harder-0.002
Token he
Token ID339
Feature activation-5.18e-03

Pos logit contributions
 tug+0.007
 later+0.007
 adjustments+0.007
itement+0.007
 feedback+0.007
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
 majesty-0.007
 Majesty-0.007
Token felt
Token ID2936
Feature activation+5.40e-04

Pos logit contributions
gy+0.008
bank+0.008
 majesty+0.007
 Majesty+0.007
man+0.007
Neg logit contributions
 tug-0.007
 later-0.007
itement-0.007
 graded-0.007
 adjustments-0.007
Token you
Token ID345
Feature activation+2.83e-03

Pos logit contributions
 feedback+0.003
itement+0.003
 tug+0.003
 adjustments+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 Majesty-0.003
 majesty-0.003
Token go
Token ID467
Feature activation-3.38e-03

Pos logit contributions
 feedback+0.003
 later+0.003
 grades+0.003
itement+0.003
 agree+0.003
Neg logit contributions
gy-0.005
bank-0.005
man-0.005
zy-0.004
asis-0.004
Token,
Token ID11
Feature activation+2.28e-03

Pos logit contributions
gy+0.006
 majesty+0.006
man+0.006
zy+0.006
lder+0.005
Neg logit contributions
 longer-0.004
 later-0.004
 grades-0.004
itement-0.004
 feedback-0.004
Token my
Token ID616
Feature activation-8.08e-04

Pos logit contributions
 later+0.003
 feedback+0.003
 tug+0.003
 longer+0.003
 agree+0.002
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
asis-0.004
opl-0.004
Token d
Token ID288
Feature activation+8.84e-04

Pos logit contributions
gy+0.001
bank+0.001
zy+0.001
man+0.001
opl+0.001
Neg logit contributions
 tug-0.001
 feedback-0.001
 longer-0.001
 adjustments-0.001
itement-0.001
Tokenarl
Token ID7063
Feature activation-4.04e-02

Pos logit contributions
 feedback+0.001
itement+0.001
 adjustments+0.001
 grades+0.001
oms+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
zy-0.001
asis-0.001
Tokenings
Token ID654
Feature activation-3.04e-03

Pos logit contributions
gy+0.073
 Arctic+0.069
eh+0.069
zy+0.068
 Dod+0.066
Neg logit contributions
oral-0.044
 grades-0.042
 later-0.041
 conflict-0.041
oms-0.040
Token.
Token ID13
Feature activation+4.30e-04

Pos logit contributions
gy+0.005
 Majesty+0.005
bank+0.005
 majesty+0.005
asis+0.005
Neg logit contributions
 later-0.004
 feedback-0.004
 grades-0.003
itement-0.003
 adjustments-0.003
Token Enjoy
Token ID18179
Feature activation+5.67e-04

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 shortcuts+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
asis-0.001
zy-0.001
Token your
Token ID534
Feature activation-5.98e-03

Pos logit contributions
 later+0.001
 feedback+0.001
itement+0.001
 longer+0.001
 tug+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
stack-0.001
chy-0.001
Token lunch
Token ID9965
Feature activation-8.69e-03

Pos logit contributions
gy+0.011
bank+0.011
man+0.010
zy+0.010
o+0.010
Neg logit contributions
 feedback-0.007
 tug-0.007
 adjustments-0.007
 grades-0.007
 conflict-0.006
Token worked
Token ID3111
Feature activation-4.82e-03

Pos logit contributions
gy+0.004
asis+0.004
man+0.004
bank+0.004
zy+0.004
Neg logit contributions
 tug-0.003
 later-0.002
 adjustments-0.002
 feedback-0.002
 grades-0.002
Token hard
Token ID1327
Feature activation+5.84e-03

Pos logit contributions
gy+0.008
bank+0.007
man+0.007
zy+0.006
 majesty+0.006
Neg logit contributions
 feedback-0.007
 adjustments-0.007
itement-0.007
 tug-0.006
 grades-0.006
Token.
Token ID13
Feature activation-1.26e-04

Pos logit contributions
 feedback+0.008
itement+0.008
 adjustments+0.008
 tug+0.007
 grades+0.007
Neg logit contributions
gy-0.010
bank-0.009
man-0.009
zy-0.008
 majesty-0.008
Token The
Token ID383
Feature activation+4.74e-03

Pos logit contributions
gy+0.000
asis+0.000
man+0.000
bank+0.000
 majesty+0.000
Neg logit contributions
 later-0.000
 feedback-0.000
Less-0.000
care-0.000
itement-0.000
Token grass
Token ID8701
Feature activation-9.82e-03

Pos logit contributions
 feedback+0.007
 adjustments+0.006
itement+0.006
 tug+0.006
 grades+0.006
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
zy-0.007
 majesty-0.006
Tokenho
Token ID8873
Feature activation-3.83e-02

Pos logit contributions
 Majesty+0.016
 majesty+0.015
gy+0.015
opl+0.014
bank+0.013
Neg logit contributions
itement-0.012
 tug-0.012
 later-0.012
 halves-0.012
 conflict-0.012
Tokenpper
Token ID2848
Feature activation+4.20e-04

Pos logit contributions
bank+0.070
gy+0.068
 Arctic+0.066
 majesty+0.066
 Majesty+0.065
Neg logit contributions
 later-0.045
 tug-0.040
 halves-0.039
 adjustments-0.038
itement-0.038
Token was
Token ID373
Feature activation-3.42e-03

Pos logit contributions
 later+0.000
 tug+0.000
 demonstrated+0.000
itement+0.000
 adjustments+0.000
Neg logit contributions
gy-0.001
 majesty-0.001
bank-0.001
man-0.001
 Majesty-0.001
Token spoiled
Token ID34781
Feature activation-1.62e-04

Pos logit contributions
gy+0.006
 majesty+0.005
 Majesty+0.005
asis+0.005
bank+0.005
Neg logit contributions
 longer-0.005
 tug-0.005
 harder-0.005
 later-0.004
 conflict-0.004
Token and
Token ID290
Feature activation+3.09e-04

Pos logit contributions
gy+0.000
bank+0.000
 majesty+0.000
man+0.000
 Majesty+0.000
Neg logit contributions
 adjustments-0.000
 later-0.000
 feedback-0.000
 tug-0.000
 grades-0.000
Token didn
Token ID1422
Feature activation-7.15e-03

Pos logit contributions
 tug+0.000
 later+0.000
 feedback+0.000
 conflict+0.000
itement+0.000
Neg logit contributions
gy-0.001
asis-0.001
 Majesty-0.001
man-0.001
 majesty-0.000
Token but
Token ID475
Feature activation+3.76e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.005
 tug+0.005
itement+0.005
 grades+0.005
Neg logit contributions
gy-0.006
bank-0.006
man-0.006
 Majesty-0.005
 majesty-0.005
Token be
Token ID307
Feature activation+4.66e-04

Pos logit contributions
 feedback+0.005
 later+0.005
 tug+0.005
 longer+0.005
itement+0.005
Neg logit contributions
gy-0.006
bank-0.006
man-0.006
zy-0.006
opl-0.005
Token careful
Token ID8161
Feature activation-1.39e-03

Pos logit contributions
 feedback+0.000
 tug+0.000
itement+0.000
 adjustments+0.000
 later+0.000
Neg logit contributions
gy-0.001
bank-0.001
zy-0.001
asis-0.001
stack-0.001
Token.
Token ID13
Feature activation-4.38e-05

Pos logit contributions
 Majesty+0.003
 majesty+0.003
eh+0.003
asis+0.003
 gra+0.003
Neg logit contributions
 later-0.001
 instructions-0.001
 harder-0.001
 of-0.001
 longer-0.001
Token Ost
Token ID38919
Feature activation+2.60e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
zy+0.000
eh+0.000
Neg logit contributions
 feedback-0.000
itement-0.000
 adjustments-0.000
oral-0.000
 tug-0.000
Tokenric
Token ID1173
Feature activation-3.79e-02

Pos logit contributions
itement+0.005
 feedback+0.005
oral+0.005
Less+0.005
 later+0.005
Neg logit contributions
 majesty-0.002
gy-0.002
 Majesty-0.002
bank-0.002
man-0.002
Tokenhes
Token ID956
Feature activation+2.08e-03

Pos logit contributions
gy+0.084
bank+0.075
 Majesty+0.075
 majesty+0.072
zy+0.072
Neg logit contributions
 later-0.033
 feedback-0.031
oral-0.031
itement-0.031
oms-0.030
Token can
Token ID460
Feature activation+4.29e-03

Pos logit contributions
 later+0.003
itement+0.002
 feedback+0.002
oral+0.002
 halves+0.002
Neg logit contributions
gy-0.004
 Majesty-0.003
 majesty-0.003
zy-0.003
bank-0.003
Token be
Token ID307
Feature activation-1.00e-03

Pos logit contributions
 tug+0.005
 feedback+0.005
 longer+0.005
 later+0.005
itement+0.005
Neg logit contributions
 Majesty-0.008
gy-0.007
 majesty-0.007
bank-0.007
man-0.007
Token gr
Token ID1036
Feature activation+6.09e-03

Pos logit contributions
gy+0.002
 Majesty+0.002
bank+0.001
zy+0.001
 majesty+0.001
Neg logit contributions
 tug-0.001
 feedback-0.001
 later-0.001
itement-0.001
 dollars-0.001
Tokenumpy
Token ID32152
Feature activation-7.55e-03

Pos logit contributions
 feedback+0.010
 adjustments+0.009
itement+0.009
 tug+0.009
 grades+0.009
Neg logit contributions
gy-0.008
bank-0.008
man-0.008
 majesty-0.007
 Majesty-0.007
Token.
Token ID13
Feature activation-1.19e-03

Pos logit contributions
 feedback+0.008
itement+0.007
 adjustments+0.007
 tug+0.007
 grades+0.007
Neg logit contributions
gy-0.009
bank-0.008
man-0.008
zy-0.008
 majesty-0.008
Token\n
Token ID198
Feature activation+1.38e-03

Pos logit contributions
gy+0.002
 Majesty+0.002
asis+0.002
bank+0.002
 majesty+0.002
Neg logit contributions
 tug-0.001
 feedback-0.001
 adjustments-0.001
 later-0.001
care-0.001
Token\n
Token ID198
Feature activation+1.90e-03

Pos logit contributions
 later+0.001
 tug+0.001
oral+0.001
Less+0.001
Together+0.001
Neg logit contributions
gy-0.003
 Majesty-0.003
bank-0.003
asis-0.003
 majesty-0.003
TokenRex
Token ID47389
Feature activation+5.71e-03

Pos logit contributions
 feedback+0.002
 tug+0.002
itement+0.002
 adjustments+0.002
 later+0.002
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 Majesty-0.003
 majesty-0.003
Token sn
Token ID3013
Feature activation-3.50e-03

Pos logit contributions
 tug+0.008
 feedback+0.008
 adjustments+0.008
itement+0.008
 grades+0.008
Neg logit contributions
gy-0.008
bank-0.008
man-0.008
 majesty-0.007
asis-0.007
Tokenarl
Token ID7063
Feature activation-3.69e-02

Pos logit contributions
gy+0.006
 Majesty+0.006
bank+0.006
 majesty+0.006
zy+0.006
Neg logit contributions
itement-0.004
 feedback-0.004
 later-0.003
 grades-0.003
 adjustments-0.003
Tokens
Token ID82
Feature activation+4.71e-03

Pos logit contributions
gy+0.060
bank+0.055
man+0.054
 majesty+0.051
zy+0.051
Neg logit contributions
 feedback-0.050
 adjustments-0.049
itement-0.049
 tug-0.048
 grades-0.047
Token.
Token ID13
Feature activation-2.97e-03

Pos logit contributions
 feedback+0.005
 tug+0.005
 adjustments+0.005
itement+0.005
 grades+0.004
Neg logit contributions
gy-0.009
bank-0.009
man-0.008
 majesty-0.008
 Majesty-0.008
Token He
Token ID679
Feature activation+1.89e-03

Pos logit contributions
gy+0.004
bank+0.003
asis+0.003
man+0.003
zy+0.003
Neg logit contributions
 feedback-0.005
 tug-0.005
 adjustments-0.005
 grades-0.005
 shortcuts-0.005
Token does
Token ID857
Feature activation-8.13e-03

Pos logit contributions
 tug+0.002
 feedback+0.002
 adjustments+0.002
itement+0.002
 grades+0.002
Neg logit contributions
gy-0.003
man-0.003
bank-0.003
asis-0.003
zy-0.003
Token not
Token ID407
Feature activation-5.08e-03

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
 majesty+0.006
zy+0.006
Neg logit contributions
 feedback-0.017
 adjustments-0.016
itement-0.016
 tug-0.016
 grades-0.016
Token's
Token ID338
Feature activation+7.85e-03

Pos logit contributions
 adjustments+0.004
 graded+0.004
 feedback+0.004
 conflict+0.003
 grades+0.003
Neg logit contributions
gy-0.005
bank-0.005
asis-0.005
eh-0.005
 Majesty-0.005
Token okay
Token ID8788
Feature activation-1.00e-03

Pos logit contributions
 feedback+0.009
 adjustments+0.009
 later+0.009
itement+0.009
 tug+0.008
Neg logit contributions
gy-0.014
bank-0.014
man-0.013
asis-0.013
zy-0.013
Token,
Token ID11
Feature activation+1.19e-03

Pos logit contributions
bank+0.002
stack+0.002
opl+0.002
asis+0.002
 gra+0.002
Neg logit contributions
 later-0.001
 longer-0.001
 tug-0.001
 feedback-0.001
 distractions-0.001
Token my
Token ID616
Feature activation-3.61e-03

Pos logit contributions
 later+0.001
 feedback+0.001
 tug+0.001
 conflict+0.001
 girls+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
asis-0.002
eh-0.002
Token d
Token ID288
Feature activation+1.84e-04

Pos logit contributions
gy+0.005
bank+0.005
man+0.004
asis+0.004
zy+0.004
Neg logit contributions
 feedback-0.006
 grades-0.006
 tug-0.006
 adjustments-0.006
itement-0.005
Tokenarl
Token ID7063
Feature activation-3.62e-02

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 tug+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
zy-0.000
Tokenings
Token ID654
Feature activation-2.48e-03

Pos logit contributions
gy+0.054
bank+0.049
man+0.048
 majesty+0.046
zy+0.045
Neg logit contributions
 feedback-0.054
 adjustments-0.053
itement-0.052
 tug-0.051
 grades-0.051
Token.
Token ID13
Feature activation-1.56e-03

Pos logit contributions
gy+0.005
 Majesty+0.005
bank+0.005
 majesty+0.005
asis+0.004
Neg logit contributions
 later-0.003
 feedback-0.002
 adjustments-0.002
 distractions-0.002
itement-0.002
Token I
Token ID314
Feature activation-2.15e-05

Pos logit contributions
gy+0.002
bank+0.002
eh+0.002
asis+0.002
man+0.002
Neg logit contributions
 feedback-0.002
 grades-0.002
 distractions-0.002
 shortcuts-0.002
 adjustments-0.002
Token forgive
Token ID20927
Feature activation-2.17e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
 majesty+0.000
zy+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
itement-0.000
 tug-0.000
 grades-0.000
Token you
Token ID345
Feature activation+2.48e-03

Pos logit contributions
gy+0.005
bank+0.005
asis+0.004
stack+0.004
eh+0.004
Neg logit contributions
 later-0.002
 feedback-0.002
 longer-0.002
 tug-0.002
 girls-0.002
Token have
Token ID423
Feature activation+9.58e-03

Pos logit contributions
 feedback+0.001
itement+0.001
 tug+0.001
 adjustments+0.001
 grades+0.000
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 Majesty-0.001
 majesty-0.001
Token fun
Token ID1257
Feature activation+1.87e-03

Pos logit contributions
 feedback+0.013
 adjustments+0.013
itement+0.013
 tug+0.012
 grades+0.012
Neg logit contributions
gy-0.015
bank-0.014
man-0.014
 majesty-0.013
zy-0.013
Token,
Token ID11
Feature activation+2.16e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.003
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
Token my
Token ID616
Feature activation-4.94e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.003
zy-0.003
Token d
Token ID288
Feature activation-2.44e-04

Pos logit contributions
gy+0.008
bank+0.008
man+0.007
asis+0.007
zy+0.007
Neg logit contributions
 feedback-0.007
 tug-0.006
 adjustments-0.006
itement-0.006
 conflict-0.006
Tokenarl
Token ID7063
Feature activation-3.60e-02

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
 majesty+0.000
 Majesty+0.000
Neg logit contributions
 feedback-0.000
itement-0.000
 adjustments-0.000
 tug-0.000
 grades-0.000
Tokenings
Token ID654
Feature activation-3.22e-03

Pos logit contributions
gy+0.058
bank+0.054
zy+0.052
 Majesty+0.052
man+0.051
Neg logit contributions
 feedback-0.047
 adjustments-0.046
itement-0.045
 grades-0.045
oral-0.045
Token?"
Token ID1701
Feature activation+1.73e-03

Pos logit contributions
gy+0.004
 Majesty+0.004
bank+0.004
 majesty+0.004
zy+0.003
Neg logit contributions
 feedback-0.006
itement-0.006
 later-0.006
 adjustments-0.006
 graded-0.005
Token Their
Token ID5334
Feature activation+2.43e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
 shortcuts+0.003
Neg logit contributions
gy-0.002
man-0.002
bank-0.002
 majesty-0.002
zy-0.002
Token mother
Token ID2802
Feature activation+2.33e-03

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
 conflict+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
asis-0.004
man-0.004
eh-0.004
Token asked
Token ID1965
Feature activation+1.63e-03

Pos logit contributions
 feedback+0.004
 shortcuts+0.003
 grades+0.003
oral+0.003
Less+0.003
Neg logit contributions
man-0.003
gy-0.003
o-0.003
zy-0.003
bank-0.003
TokenSpot
Token ID32565
Feature activation+5.19e-03

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 tug+0.004
 grades+0.004
Neg logit contributions
gy-0.006
bank-0.006
man-0.006
 majesty-0.006
zy-0.006
Token grow
Token ID1663
Feature activation+4.53e-04

Pos logit contributions
 feedback+0.007
 tug+0.007
 adjustments+0.007
itement+0.007
 grades+0.007
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
 majesty-0.007
 Majesty-0.007
Tokenls
Token ID7278
Feature activation+4.49e-03

Pos logit contributions
 adjustments+0.001
 feedback+0.001
itement+0.001
 grades+0.001
oms+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 majesty-0.001
 Majesty-0.001
Token and
Token ID290
Feature activation-2.97e-03

Pos logit contributions
 feedback+0.007
itement+0.007
oral+0.006
 adjustments+0.006
Less+0.006
Neg logit contributions
gy-0.007
man-0.006
bank-0.006
zy-0.006
o-0.005
Token sn
Token ID3013
Feature activation-1.52e-03

Pos logit contributions
gy+0.005
man+0.005
bank+0.005
asis+0.005
zy+0.005
Neg logit contributions
 tug-0.004
 adjustments-0.004
 feedback-0.004
 grades-0.003
itement-0.003
Tokenarl
Token ID7063
Feature activation-3.60e-02

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
 Majesty+0.003
 majesty+0.003
Neg logit contributions
itement-0.002
 feedback-0.002
 adjustments-0.002
 grades-0.002
 later-0.002
Tokens
Token ID82
Feature activation+3.13e-03

Pos logit contributions
gy+0.062
bank+0.057
man+0.056
zy+0.054
 Majesty+0.054
Neg logit contributions
 feedback-0.045
 adjustments-0.044
itement-0.043
 tug-0.043
 grades-0.042
Token.
Token ID13
Feature activation-1.88e-03

Pos logit contributions
 tug+0.003
 later+0.003
 feedback+0.003
itement+0.003
 grades+0.003
Neg logit contributions
gy-0.006
 Majesty-0.006
 majesty-0.006
asis-0.006
bank-0.006
Token He
Token ID679
Feature activation+1.63e-03

Pos logit contributions
gy+0.003
asis+0.002
man+0.002
bank+0.002
 Majesty+0.002
Neg logit contributions
 tug-0.003
 adjustments-0.003
 shortcuts-0.003
 grades-0.003
 feedback-0.003
Token tries
Token ID8404
Feature activation-4.08e-03

Pos logit contributions
 tug+0.002
 feedback+0.002
 adjustments+0.002
 grades+0.002
itement+0.002
Neg logit contributions
gy-0.003
man-0.003
bank-0.003
zy-0.003
asis-0.003
Token to
Token ID284
Feature activation-7.10e-03

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
 majesty+0.005
zy+0.005
Neg logit contributions
 feedback-0.006
 adjustments-0.006
itement-0.006
 tug-0.006
 grades-0.006
Token.
Token ID13
Feature activation+2.15e-03

Pos logit contributions
gy+0.013
 majesty+0.013
 Majesty+0.013
bank+0.012
asis+0.012
Neg logit contributions
 later-0.007
 tug-0.006
 adjustments-0.006
oms-0.006
 feedback-0.006
Token They
Token ID1119
Feature activation+2.17e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.003
man-0.003
bank-0.003
asis-0.003
zy-0.003
Token were
Token ID547
Feature activation-2.34e-03

Pos logit contributions
 tug+0.003
itement+0.003
 feedback+0.003
 graded+0.003
 later+0.003
Neg logit contributions
gy-0.004
bank-0.003
man-0.003
 Majesty-0.003
 majesty-0.003
Token called
Token ID1444
Feature activation-5.11e-03

Pos logit contributions
gy+0.004
asis+0.004
bank+0.004
ini+0.004
lder+0.004
Neg logit contributions
 longer-0.003
 groups-0.003
 girls-0.003
 more-0.002
 louder-0.002
Token ost
Token ID23619
Feature activation+1.15e-03

Pos logit contributions
gy+0.008
bank+0.008
asis+0.008
man+0.008
eh+0.008
Neg logit contributions
 feedback-0.006
oms-0.006
 later-0.006
itement-0.006
 shortcuts-0.006
Tokenric
Token ID1173
Feature activation-3.59e-02

Pos logit contributions
oms+0.002
omed+0.002
 feedback+0.002
itement+0.002
 halves+0.002
Neg logit contributions
 majesty-0.001
 Majesty-0.001
eh-0.001
bank-0.001
man-0.001
Tokenhes
Token ID956
Feature activation+6.67e-03

Pos logit contributions
gy+0.075
 Majesty+0.069
bank+0.069
 majesty+0.068
zy+0.065
Neg logit contributions
oms-0.032
 later-0.031
itement-0.030
pr-0.029
 feedback-0.029
Token.
Token ID13
Feature activation-3.06e-04

Pos logit contributions
 later+0.007
itement+0.007
 feedback+0.007
 tug+0.007
 adjustments+0.007
Neg logit contributions
gy-0.012
 Majesty-0.012
 majesty-0.012
bank-0.011
asis-0.011
Token Lily
Token ID20037
Feature activation+7.24e-04

Pos logit contributions
gy+0.000
man+0.000
bank+0.000
asis+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
itement-0.000
 adjustments-0.000
 tug-0.000
 shortcuts-0.000
Token and
Token ID290
Feature activation+3.82e-03

Pos logit contributions
 tug+0.001
 later+0.001
 explained+0.001
 graded+0.001
 demonstrated+0.001
Neg logit contributions
gy-0.001
 Majesty-0.001
bank-0.001
opl-0.001
man-0.001
Token Ben
Token ID3932
Feature activation+6.21e-03

Pos logit contributions
 tug+0.004
 feedback+0.004
 girls+0.004
 later+0.004
 grades+0.004
Neg logit contributions
asis-0.007
man-0.007
gy-0.007
bank-0.007
eh-0.006
Token forgive
Token ID20927
Feature activation-1.18e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
 majesty+0.001
zy+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
 tug-0.001
 grades-0.001
Token you
Token ID345
Feature activation+1.10e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
 majesty+0.002
asis+0.002
Neg logit contributions
 feedback-0.001
 later-0.001
 tug-0.001
 adjustments-0.001
itement-0.001
Token,
Token ID11
Feature activation+4.19e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.001
itement+0.001
 tug+0.001
 grades+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
Token my
Token ID616
Feature activation-5.10e-03

Pos logit contributions
 feedback+0.006
 adjustments+0.006
itement+0.006
 tug+0.006
 grades+0.006
Neg logit contributions
gy-0.006
bank-0.006
man-0.006
 majesty-0.005
zy-0.005
Token d
Token ID288
Feature activation+7.86e-04

Pos logit contributions
gy+0.007
bank+0.007
man+0.006
zy+0.006
asis+0.005
Neg logit contributions
 feedback-0.009
 adjustments-0.008
 tug-0.008
itement-0.008
 grades-0.008
Tokenarl
Token ID7063
Feature activation-3.57e-02

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 tug+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 majesty-0.001
zy-0.001
Tokenings
Token ID654
Feature activation-3.17e-03

Pos logit contributions
gy+0.057
bank+0.053
zy+0.050
 Majesty+0.050
man+0.050
Neg logit contributions
 feedback-0.048
 adjustments-0.046
itement-0.046
 grades-0.046
oral-0.046
Token.
Token ID13
Feature activation-7.09e-04

Pos logit contributions
gy+0.005
bank+0.005
 Majesty+0.005
 majesty+0.005
asis+0.005
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 later-0.004
 tug-0.004
Token And
Token ID843
Feature activation+3.38e-03

Pos logit contributions
gy+0.001
bank+0.001
asis+0.001
man+0.001
eh+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
 grades-0.001
 shortcuts-0.001
 distractions-0.001
Token of
Token ID286
Feature activation-4.82e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.005
itement+0.005
 tug+0.005
 grades+0.005
Neg logit contributions
gy-0.005
bank-0.005
man-0.004
zy-0.004
 majesty-0.004
Token course
Token ID1781
Feature activation-5.23e-03

Pos logit contributions
gy+0.009
bank+0.008
man+0.008
zy+0.008
 majesty+0.008
Neg logit contributions
 feedback-0.006
itement-0.006
 adjustments-0.006
 tug-0.006
 later-0.006
Token'm
Token ID1101
Feature activation-1.33e-03

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 grades+0.004
 shortcuts+0.004
Neg logit contributions
gy-0.003
bank-0.002
man-0.002
zy-0.002
 majesty-0.002
Token sorry
Token ID7926
Feature activation-1.27e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
 majesty+0.002
 Majesty+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.002
 tug-0.002
itement-0.002
 grades-0.002
Token,
Token ID11
Feature activation+1.39e-04

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
asis+0.002
eh+0.002
Neg logit contributions
 feedback-0.001
 later-0.001
 adjustments-0.001
 tug-0.001
itement-0.001
Token my
Token ID616
Feature activation-2.62e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
 tug+0.000
itement+0.000
 later+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
asis-0.000
 Majesty-0.000
Token d
Token ID288
Feature activation+5.34e-04

Pos logit contributions
gy+0.004
bank+0.004
stack+0.003
opl+0.003
man+0.003
Neg logit contributions
 tug-0.004
 feedback-0.004
 conflict-0.004
itement-0.004
oral-0.004
Tokenarl
Token ID7063
Feature activation-3.56e-02

Pos logit contributions
 feedback+0.001
oms+0.001
 adjustments+0.001
itement+0.000
 grades+0.000
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 Majesty-0.001
 majesty-0.001
Tokenings
Token ID654
Feature activation-3.25e-03

Pos logit contributions
gy+0.058
bank+0.054
 Majesty+0.053
eh+0.053
zy+0.053
Neg logit contributions
 feedback-0.044
oral-0.043
 adjustments-0.043
 grades-0.042
itement-0.042
Token.
Token ID13
Feature activation-1.59e-03

Pos logit contributions
gy+0.006
 Majesty+0.006
 majesty+0.006
bank+0.006
asis+0.005
Neg logit contributions
 feedback-0.003
 later-0.003
 adjustments-0.003
itement-0.003
 tug-0.003
Token That
Token ID1320
Feature activation-2.82e-03

Pos logit contributions
bank+0.002
gy+0.002
man+0.002
eh+0.002
asis+0.002
Neg logit contributions
 feedback-0.002
 shortcuts-0.002
 grades-0.002
 distractions-0.002
 adjustments-0.002
Token was
Token ID373
Feature activation-3.23e-04

Pos logit contributions
gy+0.005
 Majesty+0.005
 majesty+0.005
bank+0.004
asis+0.004
Neg logit contributions
 conflict-0.003
 adjustments-0.003
itement-0.003
 graded-0.003
 feedback-0.003
Token a
Token ID257
Feature activation+1.13e-03

Pos logit contributions
gy+0.001
bank+0.001
 Majesty+0.000
asis+0.000
man+0.000
Neg logit contributions
 feedback-0.000
 tug-0.000
 later-0.000
itement-0.000
 adjustments-0.000
Token ostr
Token ID42387
Feature activation-1.31e-02

Pos logit contributions
 feedback+0.001
 conflict+0.001
oral+0.001
 tug+0.001
 setup+0.001
Neg logit contributions
gy-0.001
bank-0.001
zy-0.001
imm-0.001
man-0.001
Tokenich
Token ID488
Feature activation+1.12e-02

Pos logit contributions
chy+0.009
bank+0.009
gy+0.008
igator+0.008
 majesty+0.008
Neg logit contributions
oral-0.026
itement-0.025
 later-0.025
 halves-0.025
 feedback-0.025
Token pen
Token ID3112
Feature activation+3.42e-03

Pos logit contributions
 feedback+0.014
 later+0.014
itement+0.013
 adjustments+0.013
 tug+0.013
Neg logit contributions
gy-0.018
 majesty-0.017
bank-0.017
 Majesty-0.017
man-0.017
Token.
Token ID13
Feature activation-4.50e-04

Pos logit contributions
 later+0.004
 feedback+0.003
 adjustments+0.003
 longer+0.003
itement+0.003
Neg logit contributions
gy-0.006
 majesty-0.006
 Majesty-0.006
bank-0.006
zy-0.006
Token Ost
Token ID38919
Feature activation+4.17e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
zy+0.001
asis+0.001
Neg logit contributions
 feedback-0.001
 tug-0.001
 adjustments-0.001
itement-0.001
 later-0.001
Tokenric
Token ID1173
Feature activation-3.55e-02

Pos logit contributions
itement+0.008
Less+0.008
oral+0.008
 later+0.008
 feedback+0.008
Neg logit contributions
 majesty-0.004
 Majesty-0.004
gy-0.003
bank-0.003
eh-0.003
Tokenhes
Token ID956
Feature activation+1.43e-03

Pos logit contributions
gy+0.079
 Majesty+0.073
bank+0.072
 majesty+0.071
icol+0.070
Neg logit contributions
oms-0.026
 later-0.026
oral-0.026
itement-0.025
Both-0.025
Token are
Token ID389
Feature activation-5.29e-03

Pos logit contributions
 later+0.002
itement+0.002
 feedback+0.001
oms+0.001
 halves+0.001
Neg logit contributions
gy-0.003
 Majesty-0.003
zy-0.002
 majesty-0.002
eh-0.002
Token big
Token ID1263
Feature activation-5.16e-04

Pos logit contributions
gy+0.008
bank+0.008
zy+0.007
 Majesty+0.007
asis+0.007
Neg logit contributions
 tug-0.008
 later-0.007
 feedback-0.007
 longer-0.007
 conflict-0.007
Token birds
Token ID10087
Feature activation-2.01e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
zy+0.001
asis+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
 tug-0.001
 grades-0.001
 later-0.001
Token with
Token ID351
Feature activation-5.06e-04

Pos logit contributions
 Majesty+0.004
gy+0.004
 majesty+0.004
bank+0.004
man+0.004
Neg logit contributions
 later-0.002
 tug-0.002
 feedback-0.002
 halves-0.002
 adjustments-0.002
Token you
Token ID345
Feature activation+1.81e-03

Pos logit contributions
gy+0.008
bank+0.007
man+0.007
asis+0.007
 majesty+0.007
Neg logit contributions
 feedback-0.005
 tug-0.005
 later-0.005
itement-0.005
 adjustments-0.005
Token okay
Token ID8788
Feature activation-2.34e-03

Pos logit contributions
 feedback+0.002
 tug+0.002
 adjustments+0.002
 later+0.002
itement+0.002
Neg logit contributions
bank-0.003
gy-0.003
man-0.003
 majesty-0.003
stack-0.003
Token,
Token ID11
Feature activation+2.43e-03

Pos logit contributions
bank+0.004
gy+0.004
asis+0.004
 majesty+0.004
man+0.004
Neg logit contributions
 later-0.003
 tug-0.002
 feedback-0.002
 conflict-0.002
 graded-0.002
Token my
Token ID616
Feature activation-3.77e-03

Pos logit contributions
itement+0.004
 grades+0.004
 feedback+0.004
 adjustments+0.004
 graded+0.003
Neg logit contributions
gy-0.003
 majesty-0.003
 Majesty-0.003
man-0.003
zy-0.003
Token d
Token ID288
Feature activation+6.81e-04

Pos logit contributions
bank+0.006
gy+0.006
stack+0.006
 gra+0.005
man+0.005
Neg logit contributions
 conflict-0.005
 tug-0.005
oral-0.005
 feedback-0.005
itement-0.005
Tokenarl
Token ID7063
Feature activation-3.51e-02

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 tug+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
 Majesty-0.001
 majesty-0.001
Tokenings
Token ID654
Feature activation-2.41e-03

Pos logit contributions
gy+0.055
bank+0.051
zy+0.049
man+0.049
 Majesty+0.048
Neg logit contributions
 feedback-0.047
 adjustments-0.046
itement-0.046
 grades-0.045
 tug-0.045
Token?"
Token ID1701
Feature activation+1.79e-03

Pos logit contributions
gy+0.003
 Majesty+0.003
bank+0.003
 majesty+0.003
zy+0.003
Neg logit contributions
 feedback-0.004
itement-0.004
 adjustments-0.004
 later-0.004
 tug-0.004
Token Mom
Token ID11254
Feature activation-5.79e-05

Pos logit contributions
 feedback+0.003
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.002
zy-0.002
Token asked
Token ID1965
Feature activation+4.37e-03

Pos logit contributions
gy+0.000
man+0.000
bank+0.000
o+0.000
zy+0.000
Neg logit contributions
 feedback-0.000
itement-0.000
 adjustments-0.000
 shortcuts-0.000
oral-0.000
Token.
Token ID13
Feature activation-6.00e-04

Pos logit contributions
 feedback+0.007
itement+0.007
 adjustments+0.006
 graded+0.006
 grades+0.006
Neg logit contributions
gy-0.006
bank-0.006
man-0.006
zy-0.005
 majesty-0.005
Token ostr
Token ID42387
Feature activation-1.24e-02

Pos logit contributions
 feedback+0.004
 conflict+0.004
 setup+0.004
itement+0.004
 clues+0.004
Neg logit contributions
gy-0.005
bank-0.004
zy-0.004
imm-0.004
 majesty-0.004
Tokenich
Token ID488
Feature activation+1.32e-02

Pos logit contributions
chy+0.009
bank+0.009
igator+0.008
gy+0.008
 gra+0.008
Neg logit contributions
oral-0.024
itement-0.023
 later-0.023
 halves-0.023
 longer-0.023
Token pen
Token ID3112
Feature activation+3.27e-03

Pos logit contributions
 later+0.016
 feedback+0.016
itement+0.015
 adjustments+0.015
 longer+0.014
Neg logit contributions
gy-0.022
 majesty-0.021
 Majesty-0.021
man-0.020
bank-0.020
Token.
Token ID13
Feature activation+7.05e-04

Pos logit contributions
 later+0.003
 feedback+0.003
itement+0.003
 adjustments+0.003
 longer+0.003
Neg logit contributions
gy-0.006
 majesty-0.006
 Majesty-0.006
bank-0.006
zy-0.005
Token Ost
Token ID38919
Feature activation+4.68e-03

Pos logit contributions
 feedback+0.001
 adjustments+0.001
 tug+0.001
itement+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
asis-0.001
zy-0.001
Tokenric
Token ID1173
Feature activation-3.49e-02

Pos logit contributions
itement+0.009
Less+0.009
oral+0.009
 feedback+0.009
 halves+0.009
Neg logit contributions
 majesty-0.004
 Majesty-0.004
gy-0.004
bank-0.003
eh-0.003
Tokenhes
Token ID956
Feature activation+1.14e-03

Pos logit contributions
gy+0.078
 Majesty+0.073
 majesty+0.072
bank+0.071
icol+0.069
Neg logit contributions
 later-0.025
oms-0.024
Both-0.024
oral-0.024
itement-0.024
Token are
Token ID389
Feature activation-4.47e-03

Pos logit contributions
 later+0.001
itement+0.001
oms+0.001
 feedback+0.001
 conflict+0.001
Neg logit contributions
gy-0.002
 Majesty-0.002
 majesty-0.002
eh-0.002
zy-0.002
Token big
Token ID1263
Feature activation-7.29e-04

Pos logit contributions
gy+0.007
asis+0.007
bank+0.007
 Majesty+0.006
zy+0.006
Neg logit contributions
 tug-0.006
 later-0.006
 longer-0.006
 feedback-0.006
 conflict-0.006
Token birds
Token ID10087
Feature activation-2.77e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
asis+0.001
zy+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
 tug-0.001
 grades-0.001
 later-0.001
Token with
Token ID351
Feature activation-1.21e-04

Pos logit contributions
 Majesty+0.006
 majesty+0.006
gy+0.005
bank+0.005
man+0.005
Neg logit contributions
 later-0.002
 tug-0.002
oms-0.002
 longer-0.002
 adjustments-0.002
Token'm
Token ID1101
Feature activation+6.97e-04

Pos logit contributions
 feedback+0.003
 tug+0.003
itement+0.003
 adjustments+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 Majesty-0.004
 majesty-0.004
Token sorry
Token ID7926
Feature activation-1.32e-03

Pos logit contributions
 longer+0.001
 tug+0.001
 later+0.001
 feedback+0.001
 adjustments+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
asis-0.001
 Majesty-0.001
Token,
Token ID11
Feature activation+6.25e-04

Pos logit contributions
gy+0.003
bank+0.003
asis+0.002
man+0.002
eh+0.002
Neg logit contributions
 later-0.001
 feedback-0.001
 adjustments-0.001
 tug-0.001
 longer-0.001
Token my
Token ID616
Feature activation-2.20e-03

Pos logit contributions
 feedback+0.001
 adjustments+0.001
 tug+0.001
itement+0.001
 later+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
asis-0.001
zy-0.001
Token d
Token ID288
Feature activation+1.22e-03

Pos logit contributions
bank+0.004
gy+0.003
stack+0.003
man+0.003
opl+0.003
Neg logit contributions
 tug-0.003
 feedback-0.003
 conflict-0.003
itement-0.003
oral-0.003
Tokenarl
Token ID7063
Feature activation-3.42e-02

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 tug+0.001
 grades+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 Majesty-0.002
 majesty-0.002
Tokenings
Token ID654
Feature activation-3.14e-03

Pos logit contributions
gy+0.052
bank+0.048
man+0.046
zy+0.045
 Majesty+0.045
Neg logit contributions
 feedback-0.049
 adjustments-0.048
itement-0.048
 tug-0.047
 grades-0.046
Token.
Token ID13
Feature activation-2.21e-04

Pos logit contributions
gy+0.006
bank+0.005
 Majesty+0.005
 majesty+0.005
man+0.005
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 later-0.004
Token That
Token ID1320
Feature activation-3.16e-03

Pos logit contributions
gy+0.000
bank+0.000
eh+0.000
asis+0.000
man+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
 shortcuts-0.000
 blame-0.000
 tug-0.000
Token was
Token ID373
Feature activation-2.02e-04

Pos logit contributions
 majesty+0.006
 Majesty+0.006
gy+0.005
bank+0.005
zy+0.005
Neg logit contributions
 conflict-0.004
 explained-0.003
oral-0.003
 adjustments-0.003
 later-0.003
Token a
Token ID257
Feature activation-1.15e-04

Pos logit contributions
gy+0.000
bank+0.000
asis+0.000
 Majesty+0.000
 majesty+0.000
Neg logit contributions
 harder-0.000
 longer-0.000
 tug-0.000
 feedback-0.000
 more-0.000
Token you
Token ID345
Feature activation+1.05e-03

Pos logit contributions
 feedback+0.006
 later+0.006
 tug+0.006
 longer+0.006
itement+0.005
Neg logit contributions
gy-0.011
bank-0.010
man-0.010
asis-0.009
 Majesty-0.009
Token too
Token ID1165
Feature activation+8.00e-03

Pos logit contributions
 adjustments+0.001
 shortcuts+0.001
 grades+0.001
 graded+0.001
 feedback+0.001
Neg logit contributions
gy-0.002
man-0.002
zy-0.002
o-0.002
 majesty-0.001
Token,
Token ID11
Feature activation+1.02e-03

Pos logit contributions
 feedback+0.012
 adjustments+0.011
itement+0.011
 tug+0.011
 grades+0.011
Neg logit contributions
gy-0.012
bank-0.011
man-0.011
 majesty-0.010
zy-0.010
Token my
Token ID616
Feature activation-5.31e-03

Pos logit contributions
itement+0.002
 grades+0.002
ulk+0.002
 graded+0.002
 adjustments+0.002
Neg logit contributions
 majesty-0.001
 Majesty-0.001
zy-0.001
gy-0.001
 gorilla-0.001
Token d
Token ID288
Feature activation+4.04e-03

Pos logit contributions
gy+0.007
bank+0.007
man+0.006
asis+0.006
stack+0.006
Neg logit contributions
 feedback-0.009
 tug-0.009
 adjustments-0.009
itement-0.009
 later-0.008
Tokenarl
Token ID7063
Feature activation-3.40e-02

Pos logit contributions
 feedback+0.005
 adjustments+0.005
itement+0.005
 tug+0.005
 grades+0.005
Neg logit contributions
gy-0.007
bank-0.007
man-0.006
 majesty-0.006
 Majesty-0.006
Tokenings
Token ID654
Feature activation-1.77e-03

Pos logit contributions
gy+0.051
bank+0.047
man+0.046
zy+0.044
 majesty+0.044
Neg logit contributions
 feedback-0.050
 adjustments-0.048
itement-0.048
 tug-0.047
 grades-0.047
Token.
Token ID13
Feature activation-1.11e-03

Pos logit contributions
gy+0.003
bank+0.003
 Majesty+0.003
 majesty+0.003
man+0.003
Neg logit contributions
 feedback-0.002
 adjustments-0.002
itement-0.002
 later-0.002
 tug-0.002
Token Now
Token ID2735
Feature activation-3.58e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
asis+0.002
eh+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.001
 shortcuts-0.001
 grades-0.001
 distractions-0.001
Token let
Token ID1309
Feature activation-5.05e-03

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
asis+0.005
zy+0.005
Neg logit contributions
 feedback-0.004
 tug-0.004
itement-0.004
 adjustments-0.004
 later-0.004
Token's
Token ID338
Feature activation+1.53e-03

Pos logit contributions
gy+0.009
bank+0.008
man+0.008
 majesty+0.008
stack+0.008
Neg logit contributions
 later-0.006
 feedback-0.006
itement-0.006
 conflict-0.006
 adjustments-0.006
Token and
Token ID290
Feature activation-4.33e-04

Pos logit contributions
gy+0.003
bank+0.002
zy+0.002
o+0.002
man+0.002
Neg logit contributions
 feedback-0.003
itement-0.003
 adjustments-0.003
oral-0.003
 graded-0.003
Token looked
Token ID3114
Feature activation-3.71e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
zy+0.001
 majesty+0.000
Neg logit contributions
 feedback-0.001
oral-0.001
 adjustments-0.001
 grades-0.001
 shortcuts-0.001
Token at
Token ID379
Feature activation-1.37e-03

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
 majesty+0.006
zy+0.006
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token the
Token ID262
Feature activation+3.22e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
 majesty+0.002
 Majesty+0.002
Neg logit contributions
 feedback-0.002
itement-0.002
 tug-0.002
 adjustments-0.002
 later-0.001
Token ost
Token ID23619
Feature activation+5.70e-03

Pos logit contributions
 feedback+0.005
itement+0.005
 adjustments+0.005
 tug+0.005
 conflict+0.005
Neg logit contributions
gy-0.005
bank-0.004
man-0.004
 majesty-0.004
 Majesty-0.004
Tokenric
Token ID1173
Feature activation-3.39e-02

Pos logit contributions
itement+0.010
oms+0.010
 feedback+0.010
 halves+0.010
oral+0.010
Neg logit contributions
 majesty-0.005
 Majesty-0.005
bank-0.005
man-0.005
gy-0.005
Tokenhes
Token ID956
Feature activation+8.85e-03

Pos logit contributions
gy+0.075
 Majesty+0.068
bank+0.067
 majesty+0.067
zy+0.063
Neg logit contributions
oral-0.029
 later-0.028
oms-0.028
itement-0.026
Less-0.025
Token.
Token ID13
Feature activation-9.09e-04

Pos logit contributions
 feedback+0.012
 adjustments+0.012
itement+0.012
 tug+0.011
 grades+0.011
Neg logit contributions
gy-0.014
bank-0.013
man-0.013
zy-0.012
 majesty-0.012
Token They
Token ID1119
Feature activation+1.14e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
 majesty+0.001
zy+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
 tug-0.001
 grades-0.001
Token saw
Token ID2497
Feature activation+2.00e-03

Pos logit contributions
 feedback+0.002
itement+0.002
 adjustments+0.002
 tug+0.002
 grades+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 majesty-0.002
 Majesty-0.002
Token one
Token ID530
Feature activation+8.01e-05

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 later+0.002
 tug+0.002
Neg logit contributions
gy-0.004
asis-0.004
bank-0.004
zy-0.004
 majesty-0.004
Token'm
Token ID1101
Feature activation-5.74e-04

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
 tug+0.003
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.003
zy-0.003
Token sorry
Token ID7926
Feature activation-1.74e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
asis+0.001
 majesty+0.001
Neg logit contributions
 feedback-0.001
 tug-0.001
 longer-0.001
 later-0.001
 adjustments-0.001
Token,
Token ID11
Feature activation-7.70e-05

Pos logit contributions
gy+0.003
bank+0.003
asis+0.003
eh+0.003
man+0.003
Neg logit contributions
 later-0.002
 feedback-0.002
 longer-0.002
 girls-0.002
 adjustments-0.002
Token my
Token ID616
Feature activation-3.08e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
asis+0.000
eh+0.000
Neg logit contributions
 feedback-0.000
 tug-0.000
 later-0.000
itement-0.000
 adjustments-0.000
Token d
Token ID288
Feature activation+1.65e-03

Pos logit contributions
bank+0.004
gy+0.004
stack+0.004
asis+0.004
eh+0.003
Neg logit contributions
 tug-0.005
 grades-0.005
 feedback-0.005
 adjustments-0.005
 conflict-0.005
Tokenarl
Token ID7063
Feature activation-3.39e-02

Pos logit contributions
 feedback+0.002
 adjustments+0.002
oms+0.002
 grades+0.002
itement+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 Majesty-0.003
asis-0.003
Tokenings
Token ID654
Feature activation-3.18e-03

Pos logit contributions
gy+0.054
bank+0.051
 Majesty+0.049
zy+0.049
eh+0.048
Neg logit contributions
 feedback-0.043
 adjustments-0.042
 grades-0.042
oral-0.042
itement-0.042
Token.
Token ID13
Feature activation+7.23e-05

Pos logit contributions
gy+0.006
 Majesty+0.006
bank+0.006
 majesty+0.006
asis+0.005
Neg logit contributions
 feedback-0.003
 later-0.003
 adjustments-0.003
itement-0.003
 tug-0.003
Token The
Token ID383
Feature activation+1.40e-05

Pos logit contributions
 feedback+0.000
 shortcuts+0.000
 grades+0.000
 adjustments+0.000
 tug+0.000
Neg logit contributions
gy-0.000
bank-0.000
eh-0.000
asis-0.000
man-0.000
Token tornado
Token ID33718
Feature activation-2.88e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
 tug+0.000
 conflict+0.000
itement+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
zy-0.000
asis-0.000
Token took
Token ID1718
Feature activation-1.76e-03

Pos logit contributions
gy+0.004
bank+0.003
man+0.003
 Majesty+0.003
 majesty+0.003
Neg logit contributions
 feedback-0.005
 tug-0.005
 adjustments-0.005
itement-0.005
 grades-0.004
Token and
Token ID290
Feature activation+2.35e-03

Pos logit contributions
 feedback+0.006
 adjustments+0.006
itement+0.006
 tug+0.006
 later+0.006
Neg logit contributions
gy-0.009
bank-0.008
man-0.008
 majesty-0.008
 Majesty-0.008
Token run
Token ID1057
Feature activation+3.59e-05

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 majesty-0.004
 Majesty-0.004
Token to
Token ID284
Feature activation-2.10e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 tug+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
 Majesty-0.000
Token the
Token ID262
Feature activation+1.12e-03

Pos logit contributions
gy+0.004
man+0.004
asis+0.004
eh+0.004
bank+0.004
Neg logit contributions
 tug-0.002
 feedback-0.002
 later-0.002
 adjustments-0.002
 agree-0.002
Token ost
Token ID23619
Feature activation+6.40e-03

Pos logit contributions
 feedback+0.002
 tug+0.002
 adjustments+0.002
 conflict+0.002
itement+0.002
Neg logit contributions
gy-0.002
bank-0.001
man-0.001
zy-0.001
 majesty-0.001
Tokenric
Token ID1173
Feature activation-3.38e-02

Pos logit contributions
itement+0.011
 feedback+0.011
 halves+0.011
oms+0.010
oral+0.010
Neg logit contributions
 majesty-0.006
 Majesty-0.006
eh-0.005
 Stanley-0.005
 Dod-0.005
Tokenhes
Token ID956
Feature activation+8.79e-03

Pos logit contributions
gy+0.073
 Majesty+0.067
bank+0.065
 majesty+0.065
zy+0.062
Neg logit contributions
 later-0.030
oral-0.030
oms-0.030
itement-0.027
 feedback-0.027
Token.
Token ID13
Feature activation-1.96e-03

Pos logit contributions
 feedback+0.010
 later+0.009
 adjustments+0.009
 tug+0.009
itement+0.009
Neg logit contributions
gy-0.016
 Majesty-0.015
 majesty-0.015
bank-0.015
asis-0.015
Token
Token ID198
Feature activation+2.45e-03

Pos logit contributions
gy+0.003
man+0.003
asis+0.003
 Majesty+0.003
bank+0.003
Neg logit contributions
 tug-0.002
 feedback-0.002
itement-0.002
 adjustments-0.002
 later-0.002
Token
Token ID198
Feature activation+3.41e-03

Pos logit contributions
 later+0.003
oral+0.003
 tug+0.003
itement+0.003
Less+0.003
Neg logit contributions
gy-0.005
 Majesty-0.004
 majesty-0.004
bank-0.004
zy-0.004
Token"
Token ID1
Feature activation+2.90e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.004
 grades+0.004
itement+0.004
 distractions+0.004
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
o-0.005
zy-0.005
Token you
Token ID345
Feature activation+1.90e-03

Pos logit contributions
 feedback+0.005
 later+0.005
 girls+0.005
 tug+0.005
 grades+0.004
Neg logit contributions
gy-0.012
asis-0.011
bank-0.011
eh-0.011
man-0.010
Token too
Token ID1165
Feature activation+8.50e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
zy-0.003
 majesty-0.003
Token,
Token ID11
Feature activation+1.98e-03

Pos logit contributions
 feedback+0.011
 adjustments+0.011
 tug+0.011
itement+0.011
 later+0.010
Neg logit contributions
gy-0.014
bank-0.013
man-0.013
asis-0.013
 Majesty-0.012
Token my
Token ID616
Feature activation-3.62e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
 tug+0.003
Neg logit contributions
gy-0.003
man-0.003
 majesty-0.002
bank-0.002
 Majesty-0.002
Token d
Token ID288
Feature activation+3.07e-03

Pos logit contributions
gy+0.006
bank+0.006
asis+0.005
man+0.005
 Majesty+0.005
Neg logit contributions
 tug-0.005
 feedback-0.005
 adjustments-0.005
 conflict-0.005
itement-0.005
Tokenarl
Token ID7063
Feature activation-3.35e-02

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
 majesty-0.005
 Majesty-0.005
Tokenings
Token ID654
Feature activation-1.48e-03

Pos logit contributions
gy+0.053
bank+0.049
zy+0.046
man+0.046
 Majesty+0.046
Neg logit contributions
 feedback-0.045
 adjustments-0.044
itement-0.044
 grades-0.044
 tug-0.043
Token,"
Token ID553
Feature activation+1.57e-03

Pos logit contributions
gy+0.002
bank+0.002
 Majesty+0.002
 majesty+0.002
asis+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.002
itement-0.002
 later-0.002
 tug-0.002
Token she
Token ID673
Feature activation-9.46e-04

Pos logit contributions
 feedback+0.002
itement+0.002
 adjustments+0.002
 grades+0.002
 tug+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
Token said
Token ID531
Feature activation+1.07e-02

Pos logit contributions
gy+0.001
man+0.001
bank+0.001
zy+0.001
 majesty+0.001
Neg logit contributions
 feedback-0.001
 shortcuts-0.001
itement-0.001
 grades-0.001
 distractions-0.001
Token.
Token ID13
Feature activation+2.27e-04

Pos logit contributions
 feedback+0.014
 adjustments+0.014
itement+0.014
 tug+0.013
 grades+0.013
Neg logit contributions
gy-0.018
bank-0.016
man-0.016
 majesty-0.015
 Majesty-0.015
Token's
Token ID338
Feature activation+9.29e-03

Pos logit contributions
gy+0.005
bank+0.005
eh+0.005
asis+0.005
 majesty+0.005
Neg logit contributions
 conflict-0.004
 adjustments-0.004
 later-0.004
 tug-0.004
itement-0.004
Token wonderful
Token ID7932
Feature activation+8.58e-04

Pos logit contributions
 feedback+0.013
 tug+0.013
 adjustments+0.013
 later+0.013
itement+0.013
Neg logit contributions
gy-0.014
bank-0.013
man-0.013
asis-0.013
zy-0.013
Token,
Token ID11
Feature activation+8.06e-03

Pos logit contributions
 later+0.001
 adjustments+0.001
 grades+0.001
 feedback+0.001
 tug+0.001
Neg logit contributions
gy-0.002
bank-0.002
asis-0.002
stack-0.002
eh-0.002
Token my
Token ID616
Feature activation-2.10e-03

Pos logit contributions
 feedback+0.011
 adjustments+0.011
 tug+0.011
itement+0.011
 later+0.011
Neg logit contributions
gy-0.013
bank-0.012
man-0.012
asis-0.011
zy-0.011
Token d
Token ID288
Feature activation+1.36e-03

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
zy+0.003
asis+0.003
Neg logit contributions
 feedback-0.003
 tug-0.003
 adjustments-0.003
itement-0.003
 grades-0.003
Tokenarl
Token ID7063
Feature activation-3.33e-02

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 grades+0.002
 tug+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
zy-0.002
 majesty-0.002
Tokenings
Token ID654
Feature activation-3.35e-03

Pos logit contributions
gy+0.058
eh+0.054
 Arctic+0.053
bank+0.053
zy+0.053
Neg logit contributions
oral-0.038
 grades-0.038
 feedback-0.037
 conflict-0.036
 adjustments-0.036
Token!"
Token ID2474
Feature activation+2.48e-03

Pos logit contributions
gy+0.006
bank+0.005
 Majesty+0.005
 majesty+0.005
man+0.005
Neg logit contributions
 feedback-0.004
itement-0.004
 adjustments-0.004
 tug-0.004
 later-0.004
Token Mom
Token ID11254
Feature activation+1.52e-04

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
itement+0.003
 later+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
asis-0.004
 Majesty-0.004
Tokenmy
Token ID1820
Feature activation+1.00e-03

Pos logit contributions
 adjustments+0.000
 tug+0.000
itement+0.000
 feedback+0.000
 graded+0.000
Neg logit contributions
gy-0.000
bank-0.000
 Majesty-0.000
asis-0.000
 majesty-0.000
Token says
Token ID1139
Feature activation+6.89e-03

Pos logit contributions
 tug+0.001
 adjustments+0.001
 feedback+0.001
 graded+0.001
itement+0.001
Neg logit contributions
gy-0.002
bank-0.002
asis-0.002
man-0.002
 Majesty-0.002
Token,
Token ID11
Feature activation-2.52e-04

Pos logit contributions
 majesty+0.005
 Majesty+0.005
 Dod+0.005
opl+0.004
lder+0.004
Neg logit contributions
 tug-0.004
 later-0.004
 longer-0.004
 adjustments-0.004
oms-0.004
Token which
Token ID543
Feature activation+7.33e-03

Pos logit contributions
gy+0.000
 majesty+0.000
 Majesty+0.000
man+0.000
asis+0.000
Neg logit contributions
 tug-0.000
 later-0.000
 longer-0.000
 adjustments-0.000
 feedback-0.000
Token seemed
Token ID3947
Feature activation+9.94e-03

Pos logit contributions
 tug+0.009
 adjustments+0.009
 feedback+0.008
 conflict+0.008
itement+0.008
Neg logit contributions
gy-0.012
 Majesty-0.012
 majesty-0.012
zy-0.011
bank-0.011
Token to
Token ID284
Feature activation+1.84e-03

Pos logit contributions
 feedback+0.012
 longer+0.012
 tug+0.012
 conflict+0.011
 shortcuts+0.011
Neg logit contributions
gy-0.017
zy-0.016
man-0.016
 Majesty-0.016
 majesty-0.016
Token grow
Token ID1663
Feature activation+9.71e-03

Pos logit contributions
 tug+0.002
 later+0.002
 feedback+0.002
 longer+0.002
itement+0.002
Neg logit contributions
gy-0.003
 majesty-0.003
 Majesty-0.003
bank-0.003
man-0.003
Token bigger
Token ID5749
Feature activation+1.42e-02

Pos logit contributions
oms+0.011
 later+0.011
 longer+0.010
 adjustments+0.010
 harder+0.010
Neg logit contributions
 majesty-0.018
gy-0.018
 Majesty-0.017
bank-0.017
stack-0.016
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
 later+0.015
 adjustments+0.015
 feedback+0.015
 grades+0.015
 longer+0.014
Neg logit contributions
gy-0.027
man-0.025
bank-0.025
 majesty-0.024
zy-0.024
Token bigger
Token ID5749
Feature activation+1.16e-02

Pos logit contributions
gy+0.022
bank+0.021
man+0.020
 majesty+0.020
zy+0.020
Neg logit contributions
 feedback-0.011
 adjustments-0.011
 tug-0.010
itement-0.010
 longer-0.010
Token.
Token ID13
Feature activation-1.09e-03

Pos logit contributions
 adjustments+0.013
 feedback+0.013
 later+0.013
 grades+0.012
 tug+0.012
Neg logit contributions
gy-0.022
man-0.020
bank-0.020
zy-0.019
 majesty-0.019
Token
Token ID198
Feature activation-1.18e-03

Pos logit contributions
gy+0.002
man+0.002
 majesty+0.001
bank+0.001
 Majesty+0.001
Neg logit contributions
 feedback-0.002
itement-0.002
 adjustments-0.002
 tug-0.002
 grades-0.001
Token
Token ID198
Feature activation-2.16e-04

Pos logit contributions
gy+0.002
 Majesty+0.002
 majesty+0.002
bank+0.002
man+0.002
Neg logit contributions
 tug-0.001
 later-0.001
itement-0.001
oral-0.001
 feedback-0.001
Tokenelly
Token ID6148
Feature activation-1.61e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 grades+0.000
 tug+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
zy-0.000
Token nation
Token ID3277
Feature activation-2.78e-03

Pos logit contributions
gy+0.003
bank+0.002
o+0.002
man+0.002
zy+0.002
Neg logit contributions
 feedback-0.002
 grades-0.002
 conflict-0.002
itement-0.002
 adjustments-0.002
Token had
Token ID550
Feature activation+1.35e-03

Pos logit contributions
gy+0.004
bank+0.003
man+0.003
 majesty+0.003
 Majesty+0.003
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token a
Token ID257
Feature activation+2.08e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
zy-0.002
asis-0.002
Token wide
Token ID3094
Feature activation+1.97e-03

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
itement+0.003
 conflict+0.003
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
zy-0.003
 majesty-0.003
Token variety
Token ID4996
Feature activation+1.55e-02

Pos logit contributions
 feedback+0.002
 tug+0.002
itement+0.002
 adjustments+0.002
 conflict+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
zy-0.003
o-0.003
Token of
Token ID286
Feature activation-3.06e-03

Pos logit contributions
 feedback+0.009
 adjustments+0.009
itement+0.009
 tug+0.008
 grades+0.008
Neg logit contributions
gy-0.037
bank-0.035
man-0.034
 Majesty-0.034
 majesty-0.034
Token sm
Token ID895
Feature activation+7.04e-03

Pos logit contributions
gy+0.005
bank+0.004
man+0.004
 majesty+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Tokenelly
Token ID6148
Feature activation+4.78e-03

Pos logit contributions
 tug+0.010
Less+0.010
 feedback+0.010
 adjustments+0.010
itement+0.009
Neg logit contributions
gy-0.010
bank-0.010
o-0.009
man-0.009
 majesty-0.009
Token foods
Token ID9013
Feature activation+9.46e-03

Pos logit contributions
 feedback+0.006
 adjustments+0.006
itement+0.006
 tug+0.006
 grades+0.006
Neg logit contributions
gy-0.008
bank-0.008
man-0.008
zy-0.007
 majesty-0.007
Token,
Token ID11
Feature activation+3.57e-03

Pos logit contributions
 feedback+0.011
 later+0.010
 adjustments+0.010
 tug+0.010
itement+0.010
Neg logit contributions
gy-0.017
 Majesty-0.016
bank-0.016
 majesty-0.016
man-0.016
Token put
Token ID1234
Feature activation+1.38e-03

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 majesty-0.004
 Majesty-0.004
Token the
Token ID262
Feature activation+2.20e-03

Pos logit contributions
 later+0.001
itement+0.001
 adjustments+0.001
 tug+0.001
 feedback+0.001
Neg logit contributions
gy-0.003
bank-0.002
 majesty-0.002
 Majesty-0.002
man-0.002
Token snacks
Token ID28487
Feature activation+2.45e-03

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
itement+0.003
 conflict+0.003
Neg logit contributions
gy-0.004
bank-0.003
man-0.003
zy-0.003
 majesty-0.003
Token on
Token ID319
Feature activation+9.58e-03

Pos logit contributions
 feedback+0.003
itement+0.003
 later+0.003
 adjustments+0.003
 tug+0.003
Neg logit contributions
gy-0.004
bank-0.004
 Majesty-0.004
 majesty-0.004
man-0.004
Token the
Token ID262
Feature activation+3.23e-03

Pos logit contributions
 tug+0.011
 later+0.011
 feedback+0.011
 adjustments+0.011
itement+0.011
Neg logit contributions
gy-0.016
 majesty-0.016
bank-0.016
 Majesty-0.016
man-0.015
Token convey
Token ID13878
Feature activation+1.47e-02

Pos logit contributions
 feedback+0.005
 tug+0.004
 adjustments+0.004
 grades+0.004
 halves+0.004
Neg logit contributions
gy-0.005
bank-0.005
man-0.004
o-0.004
 majesty-0.004
Tokenor
Token ID273
Feature activation+9.33e-03

Pos logit contributions
 feedback+0.013
itement+0.013
 adjustments+0.013
 later+0.013
 tug+0.012
Neg logit contributions
gy-0.029
bank-0.028
 majesty-0.027
 Majesty-0.027
man-0.027
Token belt
Token ID10999
Feature activation+9.79e-03

Pos logit contributions
 later+0.012
 grades+0.011
 adjustments+0.011
itement+0.010
 graded+0.010
Neg logit contributions
gy-0.016
 majesty-0.015
 Majesty-0.015
 Dod-0.015
bank-0.015
Token.
Token ID13
Feature activation+1.77e-03

Pos logit contributions
 later+0.010
 adjustments+0.009
 grades+0.009
 longer+0.009
itement+0.009
Neg logit contributions
 majesty-0.018
 Majesty-0.018
gy-0.018
asis-0.017
 Dod-0.017
Token Tim
Token ID5045
Feature activation-9.20e-04

Pos logit contributions
 adjustments+0.002
 tug+0.002
 feedback+0.002
 later+0.002
 graded+0.002
Neg logit contributions
gy-0.003
asis-0.003
man-0.003
 majesty-0.002
 Majesty-0.002
Tokenmy
Token ID1820
Feature activation+2.14e-03

Pos logit contributions
bank+0.002
 majesty+0.002
 Majesty+0.002
stack+0.002
asis+0.002
Neg logit contributions
 later-0.001
itement-0.001
 tug-0.001
 graded-0.001
 conflict-0.001
Token "
Token ID366
Feature activation-4.14e-03

Pos logit contributions
gy+0.004
bank+0.004
asis+0.004
 majesty+0.004
man+0.004
Neg logit contributions
 tug-0.005
 feedback-0.005
 later-0.005
 adjustments-0.005
itement-0.005
TokenCan
Token ID6090
Feature activation-7.56e-03

Pos logit contributions
gy+0.008
bank+0.007
man+0.007
 majesty+0.007
zy+0.007
Neg logit contributions
 feedback-0.005
 adjustments-0.005
itement-0.004
 tug-0.004
 grades-0.004
Token I
Token ID314
Feature activation-4.56e-03

Pos logit contributions
gy+0.012
bank+0.012
eh+0.011
opl+0.011
man+0.011
Neg logit contributions
 later-0.011
 feedback-0.010
 tug-0.010
 longer-0.010
itement-0.010
Token touch
Token ID3638
Feature activation+1.25e-03

Pos logit contributions
gy+0.008
bank+0.008
man+0.008
 Majesty+0.008
 majesty+0.008
Neg logit contributions
 feedback-0.005
 tug-0.005
 later-0.005
itement-0.005
 adjustments-0.005
Token the
Token ID262
Feature activation-3.55e-03

Pos logit contributions
 later+0.001
 feedback+0.001
itement+0.001
 adjustments+0.001
 longer+0.001
Neg logit contributions
gy-0.003
bank-0.003
 majesty-0.002
man-0.002
 Majesty-0.002
Token books
Token ID3835
Feature activation+1.47e-02

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
 majesty+0.005
 Majesty+0.005
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Tokenhe
Token ID258
Feature activation-7.63e-03

Pos logit contributions
 later+0.018
itement+0.018
 feedback+0.017
 adjustments+0.017
 longer+0.017
Neg logit contributions
gy-0.026
bank-0.024
 majesty-0.024
 Majesty-0.024
man-0.022
Tokenlf
Token ID1652
Feature activation+7.81e-03

Pos logit contributions
gy+0.013
bank+0.013
 Dod+0.011
 Majesty+0.011
 majesty+0.011
Neg logit contributions
 later-0.010
 grades-0.010
 adjustments-0.009
 graded-0.009
 conflict-0.009
Token too
Token ID1165
Feature activation+1.19e-02

Pos logit contributions
 later+0.011
 longer+0.010
itement+0.010
oms+0.009
 feedback+0.009
Neg logit contributions
bank-0.013
 majesty-0.012
gy-0.012
 Majesty-0.012
zy-0.011
Token?"
Token ID1701
Feature activation-2.59e-04

Pos logit contributions
 later+0.017
 longer+0.016
itement+0.016
care+0.016
 conflict+0.016
Neg logit contributions
gy-0.019
bank-0.018
 majesty-0.018
 Majesty-0.018
zy-0.016
Token His
Token ID2399
Feature activation+4.72e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
 majesty+0.000
asis+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
 tug-0.000
itement-0.000
 grades-0.000
Token smelled
Token ID37609
Feature activation+4.13e-03

Pos logit contributions
 tug+0.002
 feedback+0.002
 adjustments+0.002
 demonstrated+0.002
itement+0.002
Neg logit contributions
 Majesty-0.003
 majesty-0.003
asis-0.003
gy-0.003
opl-0.003
Token like
Token ID588
Feature activation-5.65e-03

Pos logit contributions
 feedback+0.007
 adjustments+0.006
itement+0.006
 tug+0.006
 shortcuts+0.006
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
 majesty-0.005
 Majesty-0.005
Token it
Token ID340
Feature activation+5.98e-04

Pos logit contributions
gy+0.011
bank+0.010
man+0.010
asis+0.009
zy+0.009
Neg logit contributions
 feedback-0.007
 adjustments-0.007
 later-0.006
 tug-0.006
itement-0.006
Token was
Token ID373
Feature activation+3.52e-03

Pos logit contributions
 later+0.001
 tug+0.001
itement+0.001
 adjustments+0.001
 feedback+0.001
Neg logit contributions
gy-0.001
 Majesty-0.001
 majesty-0.001
bank-0.001
man-0.001
Token too
Token ID1165
Feature activation+8.17e-03

Pos logit contributions
 feedback+0.005
 later+0.005
 tug+0.004
 longer+0.004
 conflict+0.004
Neg logit contributions
gy-0.006
man-0.005
bank-0.005
asis-0.005
 Majesty-0.005
Token much
Token ID881
Feature activation+1.83e-02

Pos logit contributions
 feedback+0.012
 tug+0.012
itement+0.012
 adjustments+0.011
 conflict+0.011
Neg logit contributions
gy-0.013
bank-0.012
man-0.012
 Majesty-0.011
 majesty-0.011
Token.
Token ID13
Feature activation-6.83e-04

Pos logit contributions
itement+0.027
 feedback+0.027
oral+0.025
Less+0.025
 adjustments+0.025
Neg logit contributions
gy-0.026
bank-0.025
man-0.024
 majesty-0.022
zy-0.022
Token Mia
Token ID32189
Feature activation+3.66e-03

Pos logit contributions
gy+0.001
man+0.001
bank+0.001
asis+0.001
 Majesty+0.001
Neg logit contributions
 feedback-0.001
 adjustments-0.001
 tug-0.001
itement-0.001
 later-0.001
Token took
Token ID1718
Feature activation+2.85e-03

Pos logit contributions
 tug+0.005
 feedback+0.004
itement+0.004
 adjustments+0.004
 grades+0.004
Neg logit contributions
gy-0.006
bank-0.006
man-0.006
asis-0.006
 Majesty-0.006
Token a
Token ID257
Feature activation-4.14e-03

Pos logit contributions
 feedback+0.004
itement+0.004
 adjustments+0.004
 tug+0.004
 grades+0.004
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
zy-0.004
 majesty-0.004
Token sniff
Token ID26300
Feature activation+6.55e-03

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
 majesty+0.005
zy+0.005
Neg logit contributions
 feedback-0.006
 adjustments-0.006
itement-0.006
 grades-0.006
 tug-0.005
Token a
Token ID257
Feature activation+7.04e-04

Pos logit contributions
 feedback+0.007
 adjustments+0.007
itement+0.007
 tug+0.007
 grades+0.007
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
 majesty-0.007
 Majesty-0.007
Token very
Token ID845
Feature activation-4.23e-03

Pos logit contributions
 feedback+0.001
 tug+0.001
 adjustments+0.001
itement+0.001
 conflict+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
zy-0.001
asis-0.001
Token high
Token ID1029
Feature activation-5.74e-03

Pos logit contributions
gy+0.008
bank+0.007
man+0.007
zy+0.006
stack+0.006
Neg logit contributions
 feedback-0.006
 tug-0.006
 conflict-0.006
 adjustments-0.006
 grades-0.005
Token fever
Token ID17372
Feature activation-1.72e-03

Pos logit contributions
gy+0.009
bank+0.009
zy+0.008
man+0.008
o+0.008
Neg logit contributions
 tug-0.008
itement-0.007
 grades-0.007
 conflict-0.007
 later-0.007
Token.
Token ID13
Feature activation+1.35e-03

Pos logit contributions
gy+0.003
 Majesty+0.002
bank+0.002
 majesty+0.002
 Arctic+0.002
Neg logit contributions
 later-0.002
 grades-0.002
 adjustments-0.002
care-0.002
 tug-0.002
Token You
Token ID921
Feature activation+6.05e-03

Pos logit contributions
 feedback+0.002
 tug+0.002
 adjustments+0.002
 later+0.002
 shortcuts+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
asis-0.002
zy-0.002
Token need
Token ID761
Feature activation+7.40e-03

Pos logit contributions
 feedback+0.007
 adjustments+0.007
itement+0.007
 tug+0.007
 grades+0.006
Neg logit contributions
gy-0.011
bank-0.010
man-0.010
 majesty-0.010
 Majesty-0.010
Token to
Token ID284
Feature activation-2.40e-03

Pos logit contributions
 feedback+0.010
 adjustments+0.009
itement+0.009
 tug+0.009
 later+0.009
Neg logit contributions
gy-0.013
bank-0.012
man-0.012
zy-0.011
 Majesty-0.011
Token rest
Token ID1334
Feature activation+1.98e-03

Pos logit contributions
gy+0.004
man+0.004
bank+0.004
 Majesty+0.004
 majesty+0.004
Neg logit contributions
 tug-0.003
 later-0.003
 feedback-0.003
 longer-0.002
 adjustments-0.002
Token and
Token ID290
Feature activation+3.05e-03

Pos logit contributions
 later+0.002
 longer+0.002
 feedback+0.002
 harder+0.002
itement+0.002
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 Majesty-0.004
zy-0.004
Token drink
Token ID4144
Feature activation+2.15e-03

Pos logit contributions
 tug+0.003
 later+0.003
 longer+0.003
 feedback+0.003
 adjustments+0.003
Neg logit contributions
gy-0.006
man-0.005
bank-0.005
zy-0.005
opl-0.005
Token very
Token ID845
Feature activation-2.51e-03

Pos logit contributions
gy+0.002
man+0.002
bank+0.002
 majesty+0.002
 Majesty+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.002
 tug-0.002
itement-0.002
 grades-0.002
Token embarrass
Token ID9614
Feature activation-3.15e-03

Pos logit contributions
gy+0.005
man+0.005
bank+0.004
stack+0.004
o+0.004
Neg logit contributions
 tug-0.003
 conflict-0.003
 adjustments-0.003
 feedback-0.003
 grades-0.003
Token because
Token ID780
Feature activation+1.99e-04

Pos logit contributions
gy+0.006
 majesty+0.006
bank+0.006
 Majesty+0.006
asis+0.006
Neg logit contributions
 later-0.003
 feedback-0.003
 adjustments-0.003
 longer-0.003
 grades-0.003
Token she
Token ID673
Feature activation-6.00e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
itement+0.000
 tug+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
 Majesty-0.000
Token couldn
Token ID3521
Feature activation+2.52e-03

Pos logit contributions
bank+0.010
gy+0.010
 Majesty+0.010
man+0.010
 majesty+0.010
Neg logit contributions
 tug-0.008
 feedback-0.007
 graded-0.007
itement-0.007
 grades-0.007
Token't
Token ID470
Feature activation-2.83e-03

Pos logit contributions
 later+0.003
 adjustments+0.003
 tug+0.003
Less+0.003
oral+0.003
Neg logit contributions
 majesty-0.005
 Majesty-0.004
gy-0.004
bank-0.004
eer-0.004
Token reach
Token ID3151
Feature activation+7.57e-04

Pos logit contributions
gy+0.005
bank+0.004
man+0.004
 majesty+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token her
Token ID607
Feature activation+1.73e-03

Pos logit contributions
 feedback+0.001
 later+0.001
 tug+0.001
 longer+0.001
 adjustments+0.001
Neg logit contributions
gy-0.001
man-0.001
bank-0.001
 majesty-0.001
asis-0.001
Token ball
Token ID2613
Feature activation-5.15e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
 tug+0.002
 halves+0.002
 conflict+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.003
 Majesty-0.002
Token.
Token ID13
Feature activation-1.05e-04

Pos logit contributions
gy+0.009
bank+0.009
 majesty+0.008
man+0.008
 Majesty+0.008
Neg logit contributions
 feedback-0.006
 adjustments-0.006
 tug-0.006
itement-0.006
 later-0.006
Token
Token ID198
Feature activation-1.32e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
asis+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
 tug-0.000
 grades-0.000
itement-0.000
Token.
Token ID13
Feature activation-2.87e-03

Pos logit contributions
gy+0.008
man+0.008
bank+0.008
zy+0.007
o+0.007
Neg logit contributions
itement-0.010
 feedback-0.010
 adjustments-0.010
 tug-0.010
oral-0.009
Token
Token ID198
Feature activation+1.16e-03

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
asis+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.004
 adjustments-0.004
 tug-0.004
 grades-0.004
itement-0.004
Token
Token ID198
Feature activation+1.95e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
 tug+0.002
itement+0.002
 grades+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 Majesty-0.001
 majesty-0.001
Token"
Token ID1
Feature activation+2.54e-03

Pos logit contributions
itement+0.003
 feedback+0.003
 adjustments+0.003
 grades+0.003
 longer+0.003
Neg logit contributions
gy-0.003
man-0.002
o-0.002
zy-0.002
bank-0.002
TokenWhat
Token ID2061
Feature activation-2.16e-03

Pos logit contributions
 dollars+0.004
 grades+0.004
 efforts+0.004
 adjustments+0.004
 allowance+0.004
Neg logit contributions
man-0.003
ich-0.002
fl-0.002
gy-0.002
o-0.002
Token did
Token ID750
Feature activation+4.20e-03

Pos logit contributions
gy+0.003
man+0.003
bank+0.003
zy+0.003
o+0.003
Neg logit contributions
 feedback-0.003
 adjustments-0.003
itement-0.003
 tug-0.003
 shortcuts-0.003
Token you
Token ID345
Feature activation+2.19e-03

Pos logit contributions
 feedback+0.006
 adjustments+0.006
itement+0.006
 tug+0.006
 grades+0.006
Neg logit contributions
gy-0.006
bank-0.006
man-0.005
 majesty-0.005
zy-0.005
Token do
Token ID466
Feature activation-1.44e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
 shortcuts+0.003
Neg logit contributions
gy-0.003
man-0.003
zy-0.003
bank-0.003
asis-0.003
Token?"
Token ID1701
Feature activation+3.93e-04

Pos logit contributions
gy+0.002
man+0.002
bank+0.002
zy+0.002
 majesty+0.001
Neg logit contributions
 feedback-0.002
itement-0.002
 adjustments-0.002
 tug-0.002
 grades-0.002
Token she
Token ID673
Feature activation-4.46e-03

Pos logit contributions
itement+0.001
 feedback+0.001
 adjustments+0.001
 grades+0.001
 shortcuts+0.001
Neg logit contributions
gy-0.000
man-0.000
zy-0.000
bank-0.000
 majesty-0.000
Token asks
Token ID7893
Feature activation+5.84e-03

Pos logit contributions
gy+0.007
man+0.007
bank+0.007
zy+0.006
 majesty+0.006
Neg logit contributions
itement-0.006
 feedback-0.006
 adjustments-0.005
oral-0.005
Less-0.005
Token says
Token ID1139
Feature activation+6.33e-03

Pos logit contributions
 feedback+0.001
itement+0.001
 shortcuts+0.001
 adjustments+0.001
 tug+0.001
Neg logit contributions
gy-0.002
man-0.002
bank-0.002
zy-0.002
o-0.002
Token.
Token ID13
Feature activation+2.68e-04

Pos logit contributions
 feedback+0.009
itement+0.009
 adjustments+0.009
 shortcuts+0.009
 graded+0.009
Neg logit contributions
gy-0.009
bank-0.009
man-0.008
zy-0.008
 majesty-0.008
Token
Token ID198
Feature activation+2.03e-03

Pos logit contributions
 feedback+0.001
itement+0.001
 adjustments+0.001
 tug+0.000
 grades+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
zy-0.000
 majesty-0.000
Token
Token ID198
Feature activation+2.47e-03

Pos logit contributions
oral+0.002
 feedback+0.002
 later+0.002
Less+0.002
 tug+0.002
Neg logit contributions
gy-0.004
bank-0.003
 Majesty-0.003
man-0.003
 majesty-0.003
Token"
Token ID1
Feature activation+3.41e-03

Pos logit contributions
 grades+0.004
 adjustments+0.004
 feedback+0.003
itement+0.003
 graded+0.003
Neg logit contributions
gy-0.004
bank-0.003
man-0.003
o-0.003
zy-0.003
TokenWe
Token ID1135
Feature activation+1.89e-04

Pos logit contributions
 adjustments+0.005
 grades+0.005
 feedback+0.005
itement+0.005
 tug+0.004
Neg logit contributions
gy-0.005
man-0.005
bank-0.005
zy-0.005
 majesty-0.005
Token love
Token ID1842
Feature activation+4.17e-03

Pos logit contributions
 adjustments+0.000
 feedback+0.000
itement+0.000
 grades+0.000
 tug+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
zy-0.000
Token you
Token ID345
Feature activation+1.61e-03

Pos logit contributions
 feedback+0.005
 later+0.005
 tug+0.005
itement+0.005
 adjustments+0.005
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
 Majesty-0.007
eh-0.007
Token too
Token ID1165
Feature activation+8.13e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 grades+0.002
 shortcuts+0.002
Neg logit contributions
gy-0.003
man-0.003
bank-0.002
zy-0.002
 majesty-0.002
Token,
Token ID11
Feature activation+2.77e-03

Pos logit contributions
 feedback+0.010
 adjustments+0.010
 tug+0.010
itement+0.010
 later+0.010
Neg logit contributions
gy-0.014
bank-0.013
man-0.013
asis-0.012
 Majesty-0.012
Token mom
Token ID1995
Feature activation-3.04e-04

Pos logit contributions
itement+0.004
 feedback+0.004
 adjustments+0.004
 grades+0.004
 tug+0.004
Neg logit contributions
gy-0.004
man-0.003
 majesty-0.003
 Majesty-0.003
bank-0.003
Token it
Token ID340
Feature activation+1.91e-03

Pos logit contributions
 feedback+0.007
 adjustments+0.007
itement+0.007
 grades+0.007
 tug+0.006
Neg logit contributions
gy-0.007
bank-0.006
man-0.006
zy-0.006
 majesty-0.006
Token was
Token ID373
Feature activation-9.29e-04

Pos logit contributions
 tug+0.003
itement+0.003
 adjustments+0.003
 later+0.003
 graded+0.003
Neg logit contributions
 Majesty-0.003
gy-0.003
 majesty-0.003
bank-0.003
opl-0.003
Token too
Token ID1165
Feature activation+5.82e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
zy+0.002
 majesty+0.002
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
 grades-0.001
oral-0.001
Token small
Token ID1402
Feature activation+4.32e-03

Pos logit contributions
 feedback+0.009
 adjustments+0.009
itement+0.009
 tug+0.009
 grades+0.008
Neg logit contributions
gy-0.008
bank-0.008
man-0.007
 majesty-0.007
 Majesty-0.007
Token.
Token ID13
Feature activation-3.59e-04

Pos logit contributions
 feedback+0.006
itement+0.006
 adjustments+0.006
 tug+0.006
oral+0.006
Neg logit contributions
gy-0.007
bank-0.006
man-0.006
zy-0.006
 majesty-0.005
Token She
Token ID1375
Feature activation+1.66e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
zy+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.001
 adjustments-0.001
itement-0.001
 tug-0.001
 grades-0.001
Token tried
Token ID3088
Feature activation-4.47e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
 majesty-0.002
zy-0.002
Token to
Token ID284
Feature activation-6.45e-03

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
zy+0.005
o+0.005
Neg logit contributions
 feedback-0.007
itement-0.007
 adjustments-0.007
 grades-0.006
oral-0.006
Token ask
Token ID1265
Feature activation-6.15e-03

Pos logit contributions
gy+0.011
bank+0.011
man+0.010
zy+0.010
 majesty+0.010
Neg logit contributions
 feedback-0.008
itement-0.008
 adjustments-0.008
 grades-0.007
oral-0.007
Token her
Token ID607
Feature activation-3.86e-03

Pos logit contributions
gy+0.010
bank+0.009
man+0.009
 majesty+0.009
zy+0.009
Neg logit contributions
 feedback-0.008
 adjustments-0.008
itement-0.008
 tug-0.008
 grades-0.008
Token mom
Token ID1995
Feature activation-4.11e-05

Pos logit contributions
gy+0.006
man+0.005
bank+0.005
 majesty+0.005
 Majesty+0.005
Neg logit contributions
 feedback-0.006
itement-0.006
 adjustments-0.006
 tug-0.006
 grades-0.005
Token The
Token ID383
Feature activation+2.51e-03

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 tug+0.001
 grades+0.001
Neg logit contributions
gy-0.001
man-0.001
asis-0.001
bank-0.001
 majesty-0.001
Token friends
Token ID2460
Feature activation-3.86e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
zy-0.004
 majesty-0.004
Token were
Token ID547
Feature activation-2.05e-03

Pos logit contributions
gy+0.006
bank+0.006
man+0.006
zy+0.005
 majesty+0.005
Neg logit contributions
 feedback-0.005
 adjustments-0.005
 shortcuts-0.005
itement-0.005
 grades-0.005
Token so
Token ID523
Feature activation+1.65e-03

Pos logit contributions
gy+0.003
bank+0.003
asis+0.003
man+0.003
eer+0.003
Neg logit contributions
 tug-0.003
 longer-0.003
 adjustments-0.002
 later-0.002
itement-0.002
Token happy
Token ID3772
Feature activation+4.35e-03

Pos logit contributions
 tug+0.002
 feedback+0.002
 adjustments+0.002
itement+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
asis-0.002
stack-0.002
Token when
Token ID618
Feature activation-6.81e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.005
 tug+0.005
 later+0.005
 grades+0.005
Neg logit contributions
gy-0.008
bank-0.007
man-0.007
 majesty-0.007
 Majesty-0.007
Token they
Token ID484
Feature activation-2.94e-03

Pos logit contributions
gy+0.013
bank+0.012
man+0.012
 majesty+0.011
zy+0.011
Neg logit contributions
 feedback-0.008
 adjustments-0.007
itement-0.007
 tug-0.007
 grades-0.007
Token saw
Token ID2497
Feature activation+5.63e-03

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
zy+0.004
 majesty+0.004
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token the
Token ID262
Feature activation+1.22e-04

Pos logit contributions
 later+0.005
 tug+0.005
 feedback+0.005
 adjustments+0.005
itement+0.005
Neg logit contributions
gy-0.012
asis-0.011
man-0.011
 majesty-0.011
eh-0.011
Token strawberries
Token ID36973
Feature activation-2.47e-03

Pos logit contributions
 feedback+0.000
 conflict+0.000
itement+0.000
 tug+0.000
 adjustments+0.000
Neg logit contributions
gy-0.000
bank-0.000
man-0.000
 majesty-0.000
 Majesty-0.000
Token.
Token ID13
Feature activation+9.41e-04

Pos logit contributions
gy+0.004
 majesty+0.004
 Majesty+0.004
bank+0.004
man+0.004
Neg logit contributions
 later-0.003
 adjustments-0.003
 tug-0.003
itement-0.003
 feedback-0.003
Token,
Token ID11
Feature activation+6.32e-03

Pos logit contributions
bank+0.001
gy+0.001
man+0.001
opl+0.001
eh+0.001
Neg logit contributions
 later-0.001
 tug-0.001
 feedback-0.001
 girls-0.001
 pins-0.001
Token Ben
Token ID3932
Feature activation-4.51e-04

Pos logit contributions
 feedback+0.009
 tug+0.008
 later+0.008
itement+0.008
 adjustments+0.008
Neg logit contributions
gy-0.011
bank-0.010
man-0.010
asis-0.009
zy-0.009
Token,
Token ID11
Feature activation+8.05e-03

Pos logit contributions
bank+0.001
stack+0.001
 Arctic+0.001
gy+0.001
opl+0.001
Neg logit contributions
 tug-0.000
 later-0.000
 conflict-0.000
 explained-0.000
 agree-0.000
Token it
Token ID340
Feature activation+3.95e-03

Pos logit contributions
 tug+0.010
 feedback+0.010
 later+0.009
itement+0.009
 conflict+0.009
Neg logit contributions
gy-0.015
bank-0.014
man-0.014
opl-0.013
asis-0.013
Token is
Token ID318
Feature activation-9.17e-04

Pos logit contributions
 tug+0.005
 conflict+0.005
 adjustments+0.004
itement+0.004
 graded+0.004
Neg logit contributions
 Majesty-0.006
gy-0.006
 majesty-0.006
bank-0.006
man-0.006
Token not
Token ID407
Feature activation-5.55e-03

Pos logit contributions
gy+0.001
bank+0.001
 Majesty+0.001
man+0.001
 majesty+0.001
Neg logit contributions
 feedback-0.001
 conflict-0.001
 later-0.001
 longer-0.001
 tug-0.001
Token y
Token ID331
Feature activation+6.46e-03

Pos logit contributions
bank+0.009
gy+0.009
zy+0.009
 Majesty+0.008
man+0.008
Neg logit contributions
 later-0.007
 feedback-0.007
 tug-0.007
 adjustments-0.007
 conflict-0.007
Tokenucky
Token ID5309
Feature activation+2.32e-03

Pos logit contributions
 feedback+0.011
 adjustments+0.010
itement+0.010
 tug+0.010
 grades+0.010
Neg logit contributions
gy-0.009
bank-0.008
man-0.008
zy-0.007
 majesty-0.007
Token.
Token ID13
Feature activation-2.25e-04

Pos logit contributions
 later+0.003
 tug+0.003
 adjustments+0.003
 feedback+0.003
 conflict+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 Majesty-0.004
 majesty-0.004
Token It
Token ID632
Feature activation+3.45e-03

Pos logit contributions
gy+0.000
bank+0.000
asis+0.000
man+0.000
eh+0.000
Neg logit contributions
 feedback-0.000
 shortcuts-0.000
 tug-0.000
 distractions-0.000
 adjustments-0.000
Token is
Token ID318
Feature activation-9.30e-04

Pos logit contributions
 tug+0.004
 conflict+0.004
 halves+0.003
 later+0.003
 turns+0.003
Neg logit contributions
 Majesty-0.007
 majesty-0.006
gy-0.006
bank-0.006
ule-0.006
Token give
Token ID1577
Feature activation+5.88e-04

Pos logit contributions
 feedback+0.007
 tug+0.007
itement+0.007
 adjustments+0.006
 grades+0.006
Neg logit contributions
gy-0.010
bank-0.009
man-0.009
 Majesty-0.009
 majesty-0.009
Token it
Token ID340
Feature activation+2.85e-03

Pos logit contributions
 feedback+0.001
 later+0.001
 tug+0.001
 harder+0.001
 shortcuts+0.001
Neg logit contributions
gy-0.001
bank-0.001
asis-0.001
stack-0.001
 Arctic-0.001
Token a
Token ID257
Feature activation+2.18e-03

Pos logit contributions
 feedback+0.004
 later+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
Neg logit contributions
gy-0.005
bank-0.005
man-0.004
zy-0.004
 Majesty-0.004
Token chance
Token ID2863
Feature activation+2.76e-03

Pos logit contributions
 feedback+0.003
 tug+0.003
 adjustments+0.003
itement+0.003
 conflict+0.003
Neg logit contributions
gy-0.004
bank-0.004
zy-0.003
man-0.003
asis-0.003
Token.
Token ID13
Feature activation-2.11e-03

Pos logit contributions
 feedback+0.003
 later+0.003
 adjustments+0.003
 grades+0.003
 longer+0.003
Neg logit contributions
gy-0.005
bank-0.005
man-0.004
zy-0.004
 Majesty-0.004
Token How
Token ID1374
Feature activation-5.75e-03

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
asis+0.003
zy+0.003
Neg logit contributions
 feedback-0.003
 adjustments-0.003
itement-0.003
 tug-0.003
 shortcuts-0.003
Token about
Token ID546
Feature activation-5.10e-03

Pos logit contributions
gy+0.011
bank+0.010
man+0.010
zy+0.009
asis+0.009
Neg logit contributions
 feedback-0.007
itement-0.007
 adjustments-0.007
 later-0.007
 tug-0.006
Token we
Token ID356
Feature activation-1.48e-03

Pos logit contributions
gy+0.009
man+0.008
bank+0.008
zy+0.008
asis+0.008
Neg logit contributions
 feedback-0.006
 adjustments-0.006
 later-0.006
 tug-0.006
itement-0.006
Token make
Token ID787
Feature activation-3.68e-05

Pos logit contributions
gy+0.003
bank+0.002
man+0.002
 majesty+0.002
zy+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.002
itement-0.002
 tug-0.002
 grades-0.002
Token a
Token ID257
Feature activation-3.81e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
asis+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
 later-0.000
 longer-0.000
itement-0.000
Token deal
Token ID1730
Feature activation+2.50e-03

Pos logit contributions
gy+0.007
bank+0.006
zy+0.006
o+0.005
man+0.005
Neg logit contributions
 feedback-0.005
 conflict-0.005
 adjustments-0.005
 tug-0.005
 longer-0.005
Token spl
Token ID4328
Feature activation+1.40e-03

Pos logit contributions
gy+0.011
man+0.010
 majesty+0.010
 Majesty+0.010
asis+0.010
Neg logit contributions
 tug-0.006
 adjustments-0.006
 later-0.006
 feedback-0.006
 longer-0.006
Tokenashing
Token ID2140
Feature activation-4.15e-04

Pos logit contributions
 feedback+0.002
itement+0.002
 adjustments+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.002
man-0.002
bank-0.002
 majesty-0.002
 Majesty-0.002
Token and
Token ID290
Feature activation-1.87e-03

Pos logit contributions
gy+0.001
 majesty+0.001
 Majesty+0.001
man+0.001
asis+0.001
Neg logit contributions
 tug-0.000
 adjustments-0.000
 later-0.000
itement-0.000
 longer-0.000
Token kicking
Token ID17997
Feature activation+1.80e-03

Pos logit contributions
gy+0.003
man+0.003
 majesty+0.003
 Majesty+0.003
asis+0.003
Neg logit contributions
 tug-0.003
 longer-0.003
 adjustments-0.002
 later-0.002
 feedback-0.002
Token his
Token ID465
Feature activation-6.13e-04

Pos logit contributions
 tug+0.002
 adjustments+0.002
 feedback+0.002
itement+0.002
 later+0.002
Neg logit contributions
gy-0.003
man-0.003
 majesty-0.003
 Majesty-0.003
bank-0.003
Token legs
Token ID7405
Feature activation-5.75e-03

Pos logit contributions
gy+0.001
bank+0.001
man+0.001
zy+0.001
 Majesty+0.001
Neg logit contributions
 tug-0.001
 feedback-0.001
 grades-0.001
 adjustments-0.001
itement-0.001
Token.
Token ID13
Feature activation-6.85e-05

Pos logit contributions
gy+0.011
 majesty+0.011
 Majesty+0.010
man+0.010
asis+0.010
Neg logit contributions
 tug-0.006
 longer-0.006
 later-0.006
 adjustments-0.006
 harder-0.006
Token This
Token ID770
Feature activation-3.30e-03

Pos logit contributions
gy+0.000
man+0.000
bank+0.000
asis+0.000
 majesty+0.000
Neg logit contributions
 adjustments-0.000
 feedback-0.000
 tug-0.000
 grades-0.000
itement-0.000
Token time
Token ID640
Feature activation-1.38e-04

Pos logit contributions
gy+0.006
bank+0.005
man+0.005
 majesty+0.005
 Majesty+0.005
Neg logit contributions
 feedback-0.004
 adjustments-0.004
itement-0.004
 tug-0.004
 grades-0.004
Token he
Token ID339
Feature activation-3.11e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
zy+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
 adjustments-0.000
itement-0.000
 grades-0.000
 tug-0.000
Token felt
Token ID2936
Feature activation+1.57e-03

Pos logit contributions
gy+0.005
man+0.005
bank+0.005
 majesty+0.005
 Majesty+0.005
Neg logit contributions
 tug-0.004
itement-0.004
 later-0.004
 feedback-0.004
 adjustments-0.004
Token the
Token ID262
Feature activation-7.24e-03

Pos logit contributions
 feedback+0.005
 adjustments+0.004
itement+0.004
 tug+0.004
 grades+0.004
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
zy-0.004
 majesty-0.004
Token boy
Token ID2933
Feature activation-5.25e-03

Pos logit contributions
gy+0.011
bank+0.011
opl+0.010
zy+0.010
 majesty+0.010
Neg logit contributions
 tug-0.010
 conflict-0.010
 feedback-0.010
 adjustments-0.010
 girls-0.009
Token asked
Token ID1965
Feature activation+3.66e-03

Pos logit contributions
gy+0.010
bank+0.010
man+0.009
 majesty+0.009
zy+0.009
Neg logit contributions
 feedback-0.005
 adjustments-0.005
itement-0.005
 tug-0.005
 grades-0.005
Token in
Token ID287
Feature activation+1.19e-03

Pos logit contributions
 feedback+0.004
 adjustments+0.004
 tug+0.004
itement+0.004
 later+0.004
Neg logit contributions
gy-0.006
bank-0.006
 majesty-0.006
man-0.006
 Majesty-0.006
Token a
Token ID257
Feature activation-1.19e-04

Pos logit contributions
 feedback+0.002
itement+0.002
oms+0.002
 adjustments+0.002
oral+0.002
Neg logit contributions
gy-0.002
man-0.001
bank-0.001
 majesty-0.001
 Majesty-0.001
Token loud
Token ID7812
Feature activation-5.58e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
zy+0.000
 majesty+0.000
Neg logit contributions
 feedback-0.000
 tug-0.000
 adjustments-0.000
itement-0.000
 grades-0.000
Token voice
Token ID3809
Feature activation-1.60e-03

Pos logit contributions
gy+0.011
 majesty+0.011
 Majesty+0.011
asis+0.011
 tallest+0.011
Neg logit contributions
 tug-0.005
 adjustments-0.005
 feedback-0.004
 blame-0.004
 action-0.004
Token.
Token ID13
Feature activation+6.36e-04

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
zy+0.002
o+0.002
Neg logit contributions
 feedback-0.003
itement-0.002
oral-0.002
 adjustments-0.002
 shortcuts-0.002
Token\n
Token ID198
Feature activation+5.44e-04

Pos logit contributions
 tug+0.001
 adjustments+0.001
 feedback+0.001
care+0.001
itement+0.001
Neg logit contributions
gy-0.001
asis-0.001
 majesty-0.001
 Majesty-0.001
man-0.001
Token\n
Token ID198
Feature activation+8.37e-04

Pos logit contributions
 feedback+0.001
 tug+0.001
itement+0.001
 adjustments+0.001
oral+0.001
Neg logit contributions
gy-0.001
bank-0.001
 Majesty-0.001
 majesty-0.001
man-0.001
TokenThe
Token ID464
Feature activation+1.38e-03

Pos logit contributions
 feedback+0.001
 adjustments+0.001
 grades+0.001
itement+0.001
 tug+0.001
Neg logit contributions
gy-0.001
man-0.001
bank-0.001
zy-0.001
 majesty-0.001
TokenThen
Token ID6423
Feature activation-2.70e-03

Pos logit contributions
 feedback+0.000
 adjustments+0.000
 grades+0.000
itement+0.000
 tug+0.000
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
zy-0.001
asis-0.001
Token Baby
Token ID14801
Feature activation+4.55e-03

Pos logit contributions
gy+0.004
bank+0.004
man+0.004
 majesty+0.004
 Majesty+0.004
Neg logit contributions
 feedback-0.004
itement-0.004
 adjustments-0.004
 tug-0.003
 grades-0.003
Token saw
Token ID2497
Feature activation+5.62e-03

Pos logit contributions
 tug+0.005
 later+0.005
 graded+0.005
itement+0.005
 feedback+0.005
Neg logit contributions
gy-0.008
bank-0.008
opl-0.008
 majesty-0.007
man-0.007
Token some
Token ID617
Feature activation-1.47e-03

Pos logit contributions
 later+0.005
 tug+0.005
itement+0.004
 adjustments+0.004
 longer+0.004
Neg logit contributions
gy-0.012
man-0.012
 majesty-0.012
asis-0.012
 Majesty-0.012
Token grass
Token ID8701
Feature activation-7.40e-03

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
eh+0.002
zy+0.002
Neg logit contributions
 feedback-0.002
 grades-0.002
itement-0.002
Less-0.002
oral-0.002
Tokenho
Token ID8873
Feature activation-2.52e-02

Pos logit contributions
gy+0.011
 majesty+0.011
 Majesty+0.011
bank+0.010
man+0.010
Neg logit contributions
itement-0.010
 feedback-0.010
 tug-0.010
oms-0.010
 later-0.010
Tokenppers
Token ID11799
Feature activation+8.23e-03

Pos logit contributions
 majesty+0.034
bank+0.033
 Majesty+0.033
gy+0.032
 Arctic+0.032
Neg logit contributions
 tug-0.038
 later-0.038
 undo-0.036
itement-0.036
 feedback-0.036
Token in
Token ID287
Feature activation-2.59e-04

Pos logit contributions
 feedback+0.009
 adjustments+0.009
 tug+0.009
itement+0.009
 later+0.009
Neg logit contributions
gy-0.015
bank-0.014
man-0.014
 majesty-0.013
 Majesty-0.013
Token the
Token ID262
Feature activation+2.57e-03

Pos logit contributions
gy+0.000
bank+0.000
man+0.000
 Dod+0.000
zy+0.000
Neg logit contributions
 feedback-0.000
 grades-0.000
oral-0.000
 graded-0.000
oms-0.000
Token garden
Token ID11376
Feature activation+4.44e-03

Pos logit contributions
 feedback+0.004
itement+0.003
 halves+0.003
 tug+0.003
 conflict+0.003
Neg logit contributions
gy-0.004
man-0.004
bank-0.004
 majesty-0.004
 Majesty-0.004
Token.
Token ID13
Feature activation+1.17e-03

Pos logit contributions
 later+0.005
 adjustments+0.004
 feedback+0.004
itement+0.004
 tug+0.004
Neg logit contributions
gy-0.008
 majesty-0.008
 Majesty-0.008
man-0.008
bank-0.007
Token you
Token ID345
Feature activation+1.90e-03

Pos logit contributions
 feedback+0.005
 later+0.005
 girls+0.005
 tug+0.005
 grades+0.004
Neg logit contributions
gy-0.012
asis-0.011
bank-0.011
eh-0.011
man-0.010
Token too
Token ID1165
Feature activation+8.50e-03

Pos logit contributions
 feedback+0.002
 adjustments+0.002
itement+0.002
 tug+0.002
 grades+0.002
Neg logit contributions
gy-0.003
bank-0.003
man-0.003
zy-0.003
 majesty-0.003
Token,
Token ID11
Feature activation+1.98e-03

Pos logit contributions
 feedback+0.011
 adjustments+0.011
 tug+0.011
itement+0.011
 later+0.010
Neg logit contributions
gy-0.014
bank-0.013
man-0.013
asis-0.013
 Majesty-0.012
Token my
Token ID616
Feature activation-3.62e-03

Pos logit contributions
 feedback+0.003
 adjustments+0.003
itement+0.003
 grades+0.003
 tug+0.003
Neg logit contributions
gy-0.003
man-0.003
 majesty-0.002
bank-0.002
 Majesty-0.002
Token d
Token ID288
Feature activation+3.07e-03

Pos logit contributions
gy+0.006
bank+0.006
asis+0.005
man+0.005
 Majesty+0.005
Neg logit contributions
 tug-0.005
 feedback-0.005
 adjustments-0.005
 conflict-0.005
itement-0.005
Tokenarl
Token ID7063
Feature activation-3.35e-02

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
 tug+0.003
 grades+0.003
Neg logit contributions
gy-0.006
bank-0.005
man-0.005
 majesty-0.005
 Majesty-0.005
Tokenings
Token ID654
Feature activation-1.48e-03

Pos logit contributions
gy+0.053
bank+0.049
zy+0.046
man+0.046
 Majesty+0.046
Neg logit contributions
 feedback-0.045
 adjustments-0.044
itement-0.044
 grades-0.044
 tug-0.043
Token,"
Token ID553
Feature activation+1.57e-03

Pos logit contributions
gy+0.002
bank+0.002
 Majesty+0.002
 majesty+0.002
asis+0.002
Neg logit contributions
 feedback-0.002
 adjustments-0.002
itement-0.002
 later-0.002
 tug-0.002
Token she
Token ID673
Feature activation-9.46e-04

Pos logit contributions
 feedback+0.002
itement+0.002
 adjustments+0.002
 grades+0.002
 tug+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
 majesty-0.002
zy-0.002
Token said
Token ID531
Feature activation+1.07e-02

Pos logit contributions
gy+0.001
man+0.001
bank+0.001
zy+0.001
 majesty+0.001
Neg logit contributions
 feedback-0.001
 shortcuts-0.001
itement-0.001
 grades-0.001
 distractions-0.001
Token.
Token ID13
Feature activation+2.27e-04

Pos logit contributions
 feedback+0.014
 adjustments+0.014
itement+0.014
 tug+0.013
 grades+0.013
Neg logit contributions
gy-0.018
bank-0.016
man-0.016
 majesty-0.015
 Majesty-0.015
Token you
Token ID345
Feature activation+2.83e-03

Pos logit contributions
 feedback+0.003
itement+0.003
 tug+0.003
 adjustments+0.003
 grades+0.003
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
 Majesty-0.003
 majesty-0.003
Token go
Token ID467
Feature activation-3.38e-03

Pos logit contributions
 feedback+0.003
 later+0.003
 grades+0.003
itement+0.003
 agree+0.003
Neg logit contributions
gy-0.005
bank-0.005
man-0.005
zy-0.004
asis-0.004
Token,
Token ID11
Feature activation+2.28e-03

Pos logit contributions
gy+0.006
 majesty+0.006
man+0.006
zy+0.006
lder+0.005
Neg logit contributions
 longer-0.004
 later-0.004
 grades-0.004
itement-0.004
 feedback-0.004
Token my
Token ID616
Feature activation-8.08e-04

Pos logit contributions
 later+0.003
 feedback+0.003
 tug+0.003
 longer+0.003
 agree+0.002
Neg logit contributions
gy-0.004
bank-0.004
man-0.004
asis-0.004
opl-0.004
Token d
Token ID288
Feature activation+8.84e-04

Pos logit contributions
gy+0.001
bank+0.001
zy+0.001
man+0.001
opl+0.001
Neg logit contributions
 tug-0.001
 feedback-0.001
 longer-0.001
 adjustments-0.001
itement-0.001
Tokenarl
Token ID7063
Feature activation-4.04e-02

Pos logit contributions
 feedback+0.001
itement+0.001
 adjustments+0.001
 grades+0.001
oms+0.001
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
zy-0.001
asis-0.001
Tokenings
Token ID654
Feature activation-3.04e-03

Pos logit contributions
gy+0.073
 Arctic+0.069
eh+0.069
zy+0.068
 Dod+0.066
Neg logit contributions
oral-0.044
 grades-0.042
 later-0.041
 conflict-0.041
oms-0.040
Token.
Token ID13
Feature activation+4.30e-04

Pos logit contributions
gy+0.005
 Majesty+0.005
bank+0.005
 majesty+0.005
asis+0.005
Neg logit contributions
 later-0.004
 feedback-0.004
 grades-0.003
itement-0.003
 adjustments-0.003
Token Enjoy
Token ID18179
Feature activation+5.67e-04

Pos logit contributions
 feedback+0.001
 adjustments+0.001
itement+0.001
 shortcuts+0.001
 grades+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
asis-0.001
zy-0.001
Token your
Token ID534
Feature activation-5.98e-03

Pos logit contributions
 later+0.001
 feedback+0.001
itement+0.001
 longer+0.001
 tug+0.001
Neg logit contributions
gy-0.001
bank-0.001
man-0.001
stack-0.001
chy-0.001
Token lunch
Token ID9965
Feature activation-8.69e-03

Pos logit contributions
gy+0.011
bank+0.011
man+0.010
zy+0.010
o+0.010
Neg logit contributions
 feedback-0.007
 tug-0.007
 adjustments-0.007
 grades-0.007
 conflict-0.006
Token home
Token ID1363
Feature activation-2.77e-03

Pos logit contributions
 tug+0.000
 feedback+0.000
itement+0.000
 adjustments+0.000
 conflict+0.000
Neg logit contributions
gy-0.001
zy-0.000
bank-0.000
 Majesty-0.000
man-0.000
Token.
Token ID13
Feature activation+1.68e-03

Pos logit contributions
 Majesty+0.006
 majesty+0.006
gy+0.006
bank+0.005
lder+0.005
Neg logit contributions
 later-0.003
 longer-0.002
 grades-0.002
 adjustments-0.002
 feedback-0.002
Token He
Token ID679
Feature activation+2.60e-03

Pos logit contributions
 feedback+0.002
 tug+0.002
 later+0.002
 adjustments+0.002
itement+0.002
Neg logit contributions
gy-0.003
asis-0.002
man-0.002
bank-0.002
zy-0.002
Token grabbed
Token ID13176
Feature activation-3.54e-03

Pos logit contributions
 tug+0.003
itement+0.003
 feedback+0.003
 adjustments+0.003
 later+0.003
Neg logit contributions
gy-0.005
bank-0.004
man-0.004
 Majesty-0.004
asis-0.004
Token some
Token ID617
Feature activation+2.60e-03

Pos logit contributions
gy+0.007
bank+0.006
man+0.006
asis+0.006
 majesty+0.006
Neg logit contributions
 tug-0.004
 feedback-0.004
 adjustments-0.004
itement-0.004
 grades-0.004
Token seaw
Token ID32085
Feature activation-2.62e-02

Pos logit contributions
 feedback+0.004
itement+0.004
Less+0.004
 grades+0.003
 graded+0.003
Neg logit contributions
gy-0.004
bank-0.003
man-0.003
 majesty-0.003
asis-0.003
Tokeneed
Token ID2308
Feature activation-2.27e-03

Pos logit contributions
gy+0.046
bank+0.042
man+0.042
zy+0.041
 Majesty+0.039
Neg logit contributions
 feedback-0.032
 tug-0.031
 adjustments-0.031
itement-0.030
 grades-0.029
Token and
Token ID290
Feature activation-9.99e-04

Pos logit contributions
gy+0.004
 Majesty+0.004
 majesty+0.004
bank+0.004
asis+0.004
Neg logit contributions
 later-0.002
 halves-0.002
 feedback-0.002
 tug-0.002
oms-0.002
Token started
Token ID2067
Feature activation-1.03e-02

Pos logit contributions
gy+0.002
bank+0.002
man+0.002
zy+0.002
 Majesty+0.002
Neg logit contributions
 feedback-0.001
 tug-0.001
 adjustments-0.001
itement-0.001
 later-0.001
Token to
Token ID284
Feature activation-3.75e-03

Pos logit contributions
gy+0.017
man+0.016
bank+0.015
zy+0.015
asis+0.015
Neg logit contributions
 tug-0.014
 feedback-0.014
 adjustments-0.013
 grades-0.013
itement-0.013
Token build
Token ID1382
Feature activation-2.69e-03

Pos logit contributions
man+0.006
gy+0.006
zy+0.006
ich+0.006
 majesty+0.006
Neg logit contributions
 tug-0.005
 adjustments-0.004
 feedback-0.004
itement-0.004
 later-0.004
Token you
Token ID345
Feature activation+3.60e-03

Pos logit contributions
gy+0.003
bank+0.003
man+0.003
 majesty+0.003
asis+0.003
Neg logit contributions
 feedback-0.002
 tug-0.002
itement-0.002
 later-0.002
 adjustments-0.002
Token okay
Token ID8788
Feature activation-9.99e-04

Pos logit contributions
 later+0.003
 longer+0.003
 tug+0.003
 feedback+0.003
 adjustments+0.003
Neg logit contributions
bank-0.008
gy-0.007
man-0.007
stack-0.007
 majesty-0.007
Token,
Token ID11
Feature activation+1.31e-03

Pos logit contributions
bank+0.002
gy+0.002
asis+0.002
stack+0.002
 majesty+0.002
Neg logit contributions
 later-0.001
 tug-0.001
 longer-0.001
 feedback-0.001
 conflict-0.001
Token my
Token ID616
Feature activation-3.04e-03

Pos logit contributions
itement+0.002
 feedback+0.002
 adjustments+0.002
 grades+0.002
 tug+0.002
Neg logit contributions
gy-0.002
man-0.002
 majesty-0.002
zy-0.001
 Majesty-0.001
Token d
Token ID288
Feature activation+3.75e-03

Pos logit contributions
bank+0.005
gy+0.005
stack+0.004
asis+0.004
 gra+0.004
Neg logit contributions
 tug-0.005
 conflict-0.004
 feedback-0.004
itement-0.004
 grades-0.004
Tokenarl
Token ID7063
Feature activation-3.11e-02

Pos logit contributions
 feedback+0.004
 adjustments+0.004
itement+0.004
oms+0.004
 grades+0.004
Neg logit contributions
gy-0.007
bank-0.007
man-0.006
 Majesty-0.006
 majesty-0.006
Tokenings
Token ID654
Feature activation-1.95e-04

Pos logit contributions
gy+0.045
man+0.041
bank+0.041
 majesty+0.038
zy+0.037
Neg logit contributions
 feedback-0.049
 adjustments-0.047
itement-0.047
 tug-0.046
 grades-0.045
Token?"
Token ID1701
Feature activation+1.51e-03

Pos logit contributions
gy+0.000
 Majesty+0.000
 majesty+0.000
bank+0.000
zy+0.000
Neg logit contributions
itement-0.000
 feedback-0.000
 adjustments-0.000
 later-0.000
 graded-0.000
Token she
Token ID673
Feature activation-2.91e-03

Pos logit contributions
 feedback+0.002
itement+0.002
 adjustments+0.002
 grades+0.002
 shortcuts+0.002
Neg logit contributions
gy-0.002
bank-0.002
man-0.002
zy-0.002
 majesty-0.002
Token asked
Token ID1965
Feature activation+4.90e-03

Pos logit contributions
gy+0.005
man+0.004
bank+0.004
zy+0.004
o+0.004
Neg logit contributions
 feedback-0.004
itement-0.004
 grades-0.004
oral-0.004
 adjustments-0.004
Token.
Token ID13
Feature activation-1.25e-03

Pos logit contributions
 feedback+0.007
itement+0.007
 adjustments+0.007
oral+0.007
 grades+0.007
Neg logit contributions
gy-0.007
man-0.006
bank-0.006
zy-0.006
o-0.006