TOP ACTIVATIONS
MAX = 0.066

his knee to help it heal . Later that
on it to help it heal . The next
mosquito bites to help them heal . The medicine made the
the doctor said it would heal if she was careful .
away . His mom asked why he wouldn 't eat it
ases them and tries to n ip their legs .
Lily 's plants started to die . She was sad and
's knee to help it heal . After Tim
band - aid to help heal the boo - bo o
behind would cause it to die . This is a reminder
don 't know . But why do you need to find
. His mom asked him why he was crying and Tim
Tim my needed medicine to heal . But Tim my didn
some medicine to help it heal . Tim my felt better
came outside and asked him why he was crying . He
is better . Now , why don 't you swim together
home ". His friend asked why . Johnny said , "
on it and it will heal soon ." Tim
cabinet and out came a little bunny . Lilly wanted to
to take Max to the veterinarian , who is a person

MIN ACTIVATIONS
MIN = -0.077

never forgot the fast che et ah he had seen .
eth any and her je ep had many adventures . Everywhere
!" Mor ris smiled and replied , "
stopped . Kay lee was excited and said ,
It became the happiest books he lf in the world !
eth any and her je ep were very happy together .
them about the fast che et ah . They were all
had never touched the books he lf . She learned that
. She had many cr ay ons of different colors .
!" " Trump et !" Tom said , swinging
like to write with cr ay ons . They write on
. She had many cr ay ons of different colors that
pony and the 3 - year old girl played together all
He had all his cr ay ons out and was spreading
. She was three - years old and she loved to
real that the 3 - year - old boy could feel
knew that the f auc et was really good , and
was known as the cle ve rest mouse in the whole
was known as the cle ve rest engine in all the
. She had many cr ay ons and papers . She

INTERVAL 0.030 - 0.066
CONTAINS 0.948%

it back to me . I want to throw it in
to be careful . " I will explain why later ,"
time , there was a little cater pillar . She was
everyone started to act a little different . They smiled more
forest , there was a little wolf . The little wolf

INTERVAL -0.005 - 0.030
CONTAINS 67.936%

and grateful for the help he received . He left feeling
use it to decor ate your paper things . And you
rooms !" Tom and Lily felt sad and confused
new friends . They eat and drink and then lie down
in the bright sunshine . One day , he saw a

INTERVAL -0.041 - -0.005
CONTAINS 30.966%

a little boy . He had a pet dog called Bos
little girl named Sue went for a walk . She saw
twins . They like to play together , but sometimes they
a little anxious but kept on bouncing around .
B eth ’ s friends tried to help her

INTERVAL -0.077 - -0.041
CONTAINS 0.150%

The bird ch ir ped and settled in the
was a 3 - year - old who loved to explore
the little boy cheered the loud est . <|endoftext|> John and
It was even more delicious than he remembered . The pirate
home , knowing that their thick chamber was always a place
Token his
Token ID465
Feature activation+6.57e-03

Pos logit contributions
aces+0.016
lf+0.015
anes+0.015
 floors+0.014
keeping+0.014
Neg logit contributions
 better-0.019
 promise-0.018
 forgive-0.018
 sip-0.017
 rinse-0.017
Token knee
Token ID10329
Feature activation+1.91e-02

Pos logit contributions
 promise+0.014
 forgive+0.013
 better+0.013
 agree+0.012
 rinse+0.012
Neg logit contributions
lf-0.010
aces-0.010
anes-0.009
 towers-0.009
 walls-0.009
Token to
Token ID284
Feature activation+2.75e-02

Pos logit contributions
 promise+0.035
 forgive+0.033
 better+0.032
use+0.031
 agree+0.031
Neg logit contributions
aces-0.033
lf-0.033
 walls-0.031
 towers-0.031
 floors-0.030
Token help
Token ID1037
Feature activation+3.96e-02

Pos logit contributions
use+0.040
ackers+0.040
 promise+0.040
 unnecessary+0.039
ory+0.039
Neg logit contributions
lf-0.052
 walls-0.050
 himself-0.049
aces-0.046
 floors-0.046
Token it
Token ID340
Feature activation+3.55e-02

Pos logit contributions
 promise+0.088
 forgive+0.084
 better+0.084
use+0.081
 agree+0.080
Neg logit contributions
aces-0.054
lf-0.053
 walls-0.048
 towers-0.048
 floors-0.047
Token heal
Token ID12035
Feature activation+6.62e-02

Pos logit contributions
use+0.051
 promise+0.049
ackers+0.047
 forgive+0.047
ory+0.046
Neg logit contributions
 walls-0.069
aces-0.068
lf-0.068
 towers-0.066
 himself-0.066
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
 promise+0.139
use+0.133
 sip+0.131
 forgive+0.130
 agree+0.126
Neg logit contributions
lf-0.091
aces-0.091
 himself-0.088
 walls-0.087
keeping-0.086
Token
Token ID198
Feature activation+1.16e-02

Pos logit contributions
 promise+0.028
 forgive+0.028
 better+0.027
 agree+0.026
use+0.025
Neg logit contributions
aces-0.038
lf-0.037
 walls-0.035
 floors-0.034
 towers-0.034
Token
Token ID198
Feature activation+1.48e-02

Pos logit contributions
 promise+0.019
 forgive+0.018
 better+0.018
use+0.017
 rinse+0.016
Neg logit contributions
lf-0.022
aces-0.021
 walls-0.020
 towers-0.020
anes-0.020
TokenLater
Token ID18602
Feature activation+1.61e-02

Pos logit contributions
 promise+0.025
 forgive+0.024
 better+0.024
use+0.023
 agree+0.022
Neg logit contributions
aces-0.028
lf-0.027
 walls-0.026
 towers-0.025
 floors-0.025
Token that
Token ID326
Feature activation+3.35e-02

Pos logit contributions
 better+0.028
 promise+0.027
 forgive+0.027
 sip+0.026
 kiss+0.025
Neg logit contributions
aces-0.033
keeping-0.028
 towers-0.028
 floors-0.027
lf-0.027
Token on
Token ID319
Feature activation-1.50e-02

Pos logit contributions
 better+0.025
 promise+0.023
 forgive+0.023
 safer+0.021
 kiss+0.021
Neg logit contributions
lf-0.025
aces-0.024
 sculptures-0.023
 towers-0.023
 floors-0.023
Token it
Token ID340
Feature activation+6.10e-03

Pos logit contributions
aces+0.028
anes+0.027
lf+0.027
keeping+0.026
 sculptures+0.026
Neg logit contributions
 better-0.026
 forgive-0.024
 promise-0.023
 sip-0.022
 rinse-0.022
Token to
Token ID284
Feature activation+2.08e-02

Pos logit contributions
 promise+0.010
 better+0.010
 forgive+0.010
 sip+0.009
 agree+0.009
Neg logit contributions
lf-0.011
aces-0.011
anes-0.011
 towers-0.011
 floors-0.011
Token help
Token ID1037
Feature activation+3.18e-02

Pos logit contributions
use+0.031
 promise+0.030
ackers+0.029
 unnecessary+0.029
 agree+0.028
Neg logit contributions
lf-0.040
 walls-0.038
 himself-0.037
aces-0.037
 towers-0.036
Token it
Token ID340
Feature activation+2.86e-02

Pos logit contributions
 promise+0.069
 forgive+0.067
 better+0.067
 agree+0.063
use+0.062
Neg logit contributions
aces-0.046
lf-0.044
 walls-0.040
anes-0.040
 towers-0.040
Token heal
Token ID12035
Feature activation+6.37e-02

Pos logit contributions
use+0.040
 promise+0.040
 forgive+0.037
ackers+0.036
 agree+0.036
Neg logit contributions
aces-0.057
 walls-0.057
lf-0.057
 towers-0.055
 floors-0.054
Token.
Token ID13
Feature activation+8.88e-03

Pos logit contributions
 promise+0.131
use+0.124
 sip+0.122
 forgive+0.122
 agree+0.118
Neg logit contributions
lf-0.094
aces-0.092
 walls-0.088
 himself-0.088
keeping-0.087
Token
Token ID198
Feature activation+9.01e-04

Pos logit contributions
 forgive+0.013
 promise+0.012
 better+0.012
 agree+0.012
 sip+0.011
Neg logit contributions
aces-0.019
lf-0.019
 walls-0.018
 floors-0.017
keeping-0.017
Token
Token ID198
Feature activation+4.43e-03

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
use+0.001
 agree+0.001
Neg logit contributions
lf-0.002
aces-0.002
 walls-0.002
 towers-0.002
 floors-0.002
TokenThe
Token ID464
Feature activation+1.25e-02

Pos logit contributions
 promise+0.007
 forgive+0.007
 better+0.006
 agree+0.006
 sip+0.006
Neg logit contributions
aces-0.009
lf-0.008
anes-0.008
 walls-0.008
keeping-0.008
Token next
Token ID1306
Feature activation+7.41e-03

Pos logit contributions
 promise+0.031
 forgive+0.030
 better+0.030
 kiss+0.028
 rinse+0.027
Neg logit contributions
aces-0.017
lf-0.015
anes-0.014
keeping-0.014
 towers-0.014
Token mosquito
Token ID35446
Feature activation+1.90e-02

Pos logit contributions
 promise+0.009
 forgive+0.009
 better+0.009
use+0.008
 agree+0.008
Neg logit contributions
aces-0.007
lf-0.007
 walls-0.007
 towers-0.007
anes-0.007
Token bites
Token ID26081
Feature activation+2.13e-02

Pos logit contributions
 promise+0.038
 forgive+0.036
 better+0.035
use+0.034
 agree+0.034
Neg logit contributions
aces-0.029
lf-0.029
 walls-0.027
 towers-0.027
 floors-0.027
Token to
Token ID284
Feature activation+2.52e-02

Pos logit contributions
 better+0.038
 promise+0.038
 forgive+0.037
 sip+0.035
use+0.034
Neg logit contributions
lf-0.038
aces-0.037
 walls-0.034
 towers-0.033
 floors-0.033
Token help
Token ID1037
Feature activation+3.64e-02

Pos logit contributions
 promise+0.039
 forgive+0.036
use+0.036
 better+0.035
 unnecessary+0.035
Neg logit contributions
lf-0.048
 walls-0.047
aces-0.045
 towers-0.045
 floors-0.044
Token them
Token ID606
Feature activation+7.42e-03

Pos logit contributions
 promise+0.073
 better+0.072
 forgive+0.071
 agree+0.067
use+0.067
Neg logit contributions
aces-0.057
lf-0.054
 walls-0.050
anes-0.049
 towers-0.049
Token heal
Token ID12035
Feature activation+6.25e-02

Pos logit contributions
 promise+0.012
use+0.011
 forgive+0.011
 better+0.010
 unnecessary+0.010
Neg logit contributions
 walls-0.013
aces-0.013
lf-0.013
 floors-0.013
 towers-0.013
Token.
Token ID13
Feature activation+1.38e-02

Pos logit contributions
 promise+0.131
use+0.124
 forgive+0.123
 sip+0.119
 agree+0.117
Neg logit contributions
lf-0.087
aces-0.085
 walls-0.083
 himself-0.082
keeping-0.081
Token The
Token ID383
Feature activation+1.67e-02

Pos logit contributions
 better+0.020
 forgive+0.020
 sip+0.019
 agree+0.019
 please+0.018
Neg logit contributions
aces-0.028
ates-0.027
lf-0.027
keeping-0.027
ator-0.027
Token medicine
Token ID9007
Feature activation+1.87e-02

Pos logit contributions
 promise+0.028
 forgive+0.026
 better+0.026
use+0.024
 agree+0.024
Neg logit contributions
aces-0.032
lf-0.032
 walls-0.030
 towers-0.030
 floors-0.030
Token made
Token ID925
Feature activation-4.73e-04

Pos logit contributions
 promise+0.032
 better+0.031
 forgive+0.030
 sip+0.029
 kiss+0.028
Neg logit contributions
aces-0.035
 towers-0.033
lf-0.032
anes-0.032
keeping-0.031
Token the
Token ID262
Feature activation+1.75e-02

Pos logit contributions
aces+0.001
keeping+0.001
lf+0.001
ates+0.001
pter+0.001
Neg logit contributions
 better-0.001
 forgive-0.001
 sip-0.001
 promise-0.001
 louder-0.001
Token the
Token ID262
Feature activation+1.52e-02

Pos logit contributions
 promise+0.005
use+0.004
 forgive+0.004
 agree+0.004
 better+0.004
Neg logit contributions
lf-0.004
aces-0.004
 walls-0.003
 towers-0.003
 floors-0.003
Token doctor
Token ID6253
Feature activation+3.63e-02

Pos logit contributions
 promise+0.027
 better+0.026
 forgive+0.025
 kiss+0.023
use+0.022
Neg logit contributions
aces-0.028
lf-0.028
anes-0.027
keeping-0.026
 towers-0.026
Token said
Token ID531
Feature activation+1.33e-02

Pos logit contributions
 promise+0.061
 forgive+0.058
 better+0.057
 agree+0.053
use+0.053
Neg logit contributions
aces-0.070
lf-0.068
 towers-0.066
 walls-0.066
 floors-0.065
Token it
Token ID340
Feature activation+1.52e-02

Pos logit contributions
 better+0.023
 forgive+0.023
 promise+0.023
 rinse+0.021
 sip+0.020
Neg logit contributions
aces-0.024
lf-0.023
keeping-0.022
anes-0.022
 floors-0.022
Token would
Token ID561
Feature activation+1.13e-02

Pos logit contributions
 better+0.024
 promise+0.023
 forgive+0.022
use+0.019
 accept+0.019
Neg logit contributions
lf-0.031
 towers-0.031
aces-0.030
 floors-0.030
keeping-0.030
Token heal
Token ID12035
Feature activation+6.23e-02

Pos logit contributions
 promise+0.017
 forgive+0.016
use+0.016
 better+0.016
 agree+0.015
Neg logit contributions
aces-0.022
lf-0.022
 walls-0.021
 towers-0.020
 floors-0.020
Token if
Token ID611
Feature activation+2.15e-02

Pos logit contributions
 promise+0.114
use+0.106
 sip+0.106
 agree+0.105
 forgive+0.103
Neg logit contributions
aces-0.100
lf-0.098
 himself-0.097
 walls-0.095
keeping-0.092
Token she
Token ID673
Feature activation+2.05e-02

Pos logit contributions
 promise+0.040
 forgive+0.038
 better+0.037
use+0.036
 agree+0.035
Neg logit contributions
aces-0.037
lf-0.036
 walls-0.034
 towers-0.034
 floors-0.033
Token was
Token ID373
Feature activation+1.16e-02

Pos logit contributions
 promise+0.037
 forgive+0.035
 better+0.034
use+0.032
 agree+0.032
Neg logit contributions
lf-0.037
aces-0.037
 towers-0.035
 walls-0.035
 floors-0.035
Token careful
Token ID8161
Feature activation-4.87e-03

Pos logit contributions
 promise+0.018
 forgive+0.017
use+0.017
 agree+0.016
 sip+0.016
Neg logit contributions
aces-0.022
lf-0.022
 walls-0.021
 floors-0.020
 towers-0.020
Token.
Token ID13
Feature activation+1.89e-02

Pos logit contributions
lf+0.010
aces+0.009
anes+0.009
 towers+0.009
 floors+0.009
Neg logit contributions
 better-0.008
 promise-0.007
 forgive-0.007
 agree-0.006
 louder-0.006
Token away
Token ID1497
Feature activation+2.08e-02

Pos logit contributions
 promise+0.006
 better+0.006
 forgive+0.005
 sip+0.005
 agree+0.005
Neg logit contributions
lf-0.010
aces-0.010
anes-0.010
 floors-0.009
keeping-0.009
Token.
Token ID13
Feature activation-4.33e-03

Pos logit contributions
 promise+0.035
 better+0.034
 forgive+0.034
 agree+0.031
 sip+0.031
Neg logit contributions
aces-0.039
lf-0.039
anes-0.036
 towers-0.036
 walls-0.036
Token His
Token ID2399
Feature activation-4.83e-05

Pos logit contributions
lf+0.010
aces+0.010
 walls+0.009
 floors+0.009
 towers+0.009
Neg logit contributions
 forgive-0.006
 better-0.006
 promise-0.006
 agree-0.005
 kiss-0.005
Token mom
Token ID1995
Feature activation+1.00e-02

Pos logit contributions
aces+0.000
anes+0.000
lf+0.000
keeping+0.000
 floors+0.000
Neg logit contributions
 promise-0.000
 forgive-0.000
 better-0.000
 kiss-0.000
 sip-0.000
Token asked
Token ID1965
Feature activation+2.26e-02

Pos logit contributions
 promise+0.018
 forgive+0.017
 better+0.017
 agree+0.016
use+0.015
Neg logit contributions
aces-0.019
lf-0.018
 walls-0.017
keeping-0.017
 towers-0.017
Token why
Token ID1521
Feature activation+6.18e-02

Pos logit contributions
 promise+0.049
 forgive+0.048
 better+0.047
use+0.046
 agree+0.045
Neg logit contributions
aces-0.032
lf-0.031
 walls-0.028
anes-0.028
 floors-0.028
Token he
Token ID339
Feature activation+1.76e-02

Pos logit contributions
 promise+0.133
 forgive+0.129
 better+0.128
 agree+0.122
 rinse+0.120
Neg logit contributions
aces-0.086
lf-0.082
 walls-0.078
keeping-0.077
 towers-0.077
Token wouldn
Token ID3636
Feature activation+5.52e-03

Pos logit contributions
 promise+0.035
 forgive+0.033
 better+0.033
use+0.032
 agree+0.031
Neg logit contributions
aces-0.027
lf-0.027
 walls-0.025
 towers-0.025
 floors-0.025
Token't
Token ID470
Feature activation+1.71e-03

Pos logit contributions
 promise+0.014
 forgive+0.014
 better+0.014
use+0.013
 agree+0.013
Neg logit contributions
aces-0.005
lf-0.005
 towers-0.005
 walls-0.004
 floors-0.004
Token eat
Token ID4483
Feature activation+1.52e-03

Pos logit contributions
 promise+0.002
 forgive+0.002
 better+0.002
use+0.002
 agree+0.002
Neg logit contributions
lf-0.004
aces-0.004
 walls-0.004
 towers-0.004
 floors-0.004
Token it
Token ID340
Feature activation+6.17e-03

Pos logit contributions
 better+0.003
 promise+0.003
 forgive+0.003
 sip+0.002
 agree+0.002
Neg logit contributions
aces-0.003
lf-0.003
anes-0.002
keeping-0.002
 floors-0.002
Tokenases
Token ID1386
Feature activation+8.03e-03

Pos logit contributions
 promise+0.023
 forgive+0.022
 better+0.022
use+0.021
 rinse+0.020
Neg logit contributions
aces-0.012
 towers-0.011
lf-0.010
anes-0.010
 walls-0.010
Token them
Token ID606
Feature activation-1.33e-02

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.014
use+0.014
 agree+0.013
Neg logit contributions
aces-0.014
lf-0.013
 walls-0.013
 towers-0.013
 floors-0.012
Token and
Token ID290
Feature activation-6.77e-04

Pos logit contributions
lf+0.023
aces+0.023
 walls+0.022
 towers+0.021
 floors+0.021
Neg logit contributions
 promise-0.024
 forgive-0.023
 better-0.023
 agree-0.021
use-0.021
Token tries
Token ID8404
Feature activation-2.77e-03

Pos logit contributions
 towers+0.001
aces+0.001
 walls+0.001
lf+0.001
 floors+0.001
Neg logit contributions
 promise-0.001
 forgive-0.001
use-0.001
ackers-0.001
 agree-0.001
Token to
Token ID284
Feature activation+2.83e-02

Pos logit contributions
aces+0.003
lf+0.003
 walls+0.003
 towers+0.003
 floors+0.003
Neg logit contributions
 promise-0.006
 forgive-0.006
use-0.006
 better-0.006
 agree-0.006
Token n
Token ID299
Feature activation+6.16e-02

Pos logit contributions
use+0.043
 promise+0.043
 unnecessary+0.041
 better+0.040
 forgive+0.040
Neg logit contributions
lf-0.050
 walls-0.050
 himself-0.049
 towers-0.048
aces-0.048
Tokenip
Token ID541
Feature activation-2.01e-02

Pos logit contributions
 promise+0.118
 better+0.114
use+0.112
 unnecessary+0.108
 cal+0.106
Neg logit contributions
aces-0.096
lf-0.093
keeping-0.092
umbled-0.092
anes-0.087
Token their
Token ID511
Feature activation-1.02e-02

Pos logit contributions
aces+0.034
lf+0.033
 walls+0.031
 towers+0.031
 floors+0.030
Neg logit contributions
 promise-0.038
 forgive-0.036
use-0.035
 better-0.035
 agree-0.034
Token legs
Token ID7405
Feature activation-3.04e-03

Pos logit contributions
lf+0.014
aces+0.014
anes+0.013
 walls+0.012
 floors+0.012
Neg logit contributions
 promise-0.023
 better-0.021
 forgive-0.021
use-0.020
 agree-0.020
Token.
Token ID13
Feature activation+2.83e-03

Pos logit contributions
aces+0.005
lf+0.004
 walls+0.004
 towers+0.004
 floors+0.004
Neg logit contributions
 promise-0.006
use-0.006
 forgive-0.006
 better-0.005
 sip-0.005
Token
Token ID198
Feature activation-5.44e-04

Pos logit contributions
 promise+0.005
 forgive+0.005
 better+0.005
use+0.005
 agree+0.005
Neg logit contributions
aces-0.005
lf-0.005
 walls-0.005
 towers-0.005
 floors-0.004
Token Lily
Token ID20037
Feature activation+1.46e-02

Pos logit contributions
 promise+0.009
 better+0.009
 forgive+0.008
 sip+0.008
 rinse+0.008
Neg logit contributions
aces-0.013
lf-0.013
anes-0.012
 walls-0.012
 towers-0.012
Token's
Token ID338
Feature activation+8.20e-04

Pos logit contributions
 promise+0.026
 forgive+0.024
 better+0.024
 agree+0.023
use+0.022
Neg logit contributions
aces-0.026
lf-0.026
 towers-0.025
 walls-0.025
anes-0.024
Token plants
Token ID6134
Feature activation+1.83e-03

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
 rinse+0.001
 agree+0.001
Neg logit contributions
aces-0.002
lf-0.002
anes-0.002
keeping-0.002
 towers-0.001
Token started
Token ID2067
Feature activation-1.46e-02

Pos logit contributions
 promise+0.003
 forgive+0.003
 better+0.003
 sip+0.003
 agree+0.003
Neg logit contributions
lf-0.003
aces-0.003
anes-0.003
 towers-0.003
 walls-0.003
Token to
Token ID284
Feature activation+7.10e-03

Pos logit contributions
lf+0.026
aces+0.025
anes+0.023
 walls+0.021
 floors+0.021
Neg logit contributions
 promise-0.028
 better-0.027
 forgive-0.027
 sip-0.026
 rinse-0.025
Token die
Token ID4656
Feature activation+6.13e-02

Pos logit contributions
 promise+0.012
use+0.011
 unnecessary+0.011
 forgive+0.011
Elsa+0.010
Neg logit contributions
 walls-0.012
 floors-0.012
lf-0.011
 towers-0.011
 ground-0.011
Token.
Token ID13
Feature activation-3.06e-03

Pos logit contributions
 promise+0.127
 forgive+0.123
use+0.122
 agree+0.117
 sip+0.115
Neg logit contributions
aces-0.087
 walls-0.084
 floors-0.082
lf-0.081
 towers-0.079
Token She
Token ID1375
Feature activation+1.01e-02

Pos logit contributions
lf+0.006
aces+0.006
keeping+0.005
anes+0.005
ates+0.005
Neg logit contributions
 promise-0.005
 forgive-0.005
 better-0.005
 sip-0.004
 agree-0.004
Token was
Token ID373
Feature activation+5.27e-03

Pos logit contributions
 promise+0.019
 forgive+0.018
 better+0.017
use+0.017
 agree+0.017
Neg logit contributions
lf-0.017
aces-0.017
 walls-0.016
 towers-0.016
 floors-0.016
Token sad
Token ID6507
Feature activation+1.59e-02

Pos logit contributions
 promise+0.007
 better+0.007
 forgive+0.007
 sip+0.006
 agree+0.006
Neg logit contributions
lf-0.012
aces-0.012
 walls-0.011
 towers-0.011
anes-0.011
Token and
Token ID290
Feature activation-5.45e-04

Pos logit contributions
 promise+0.035
 forgive+0.033
use+0.032
 better+0.032
 agree+0.031
Neg logit contributions
aces-0.022
lf-0.021
 walls-0.020
 towers-0.019
 floors-0.019
Token's
Token ID338
Feature activation+5.10e-03

Pos logit contributions
 promise+0.021
 better+0.020
 forgive+0.019
use+0.018
 sip+0.018
Neg logit contributions
aces-0.026
lf-0.026
anes-0.025
 towers-0.024
 walls-0.024
Token knee
Token ID10329
Feature activation+5.90e-03

Pos logit contributions
 promise+0.010
 forgive+0.009
 better+0.009
 kiss+0.008
 agree+0.008
Neg logit contributions
anes-0.009
lf-0.009
aces-0.009
umbled-0.009
 sculptures-0.008
Token to
Token ID284
Feature activation+2.44e-02

Pos logit contributions
 better+0.011
 promise+0.011
 forgive+0.010
use+0.009
 kiss+0.009
Neg logit contributions
lf-0.010
aces-0.010
 towers-0.010
 floors-0.009
anes-0.009
Token help
Token ID1037
Feature activation+2.80e-02

Pos logit contributions
 promise+0.036
use+0.034
ackers+0.033
 unnecessary+0.032
 better+0.032
Neg logit contributions
lf-0.049
 walls-0.046
aces-0.045
 himself-0.044
 towers-0.044
Token it
Token ID340
Feature activation+2.83e-02

Pos logit contributions
 better+0.056
 forgive+0.056
 promise+0.055
 agree+0.052
 accept+0.050
Neg logit contributions
aces-0.047
lf-0.043
anes-0.040
 towers-0.039
 floors-0.039
Token heal
Token ID12035
Feature activation+6.11e-02

Pos logit contributions
 promise+0.040
use+0.038
 forgive+0.037
 better+0.036
 agree+0.035
Neg logit contributions
aces-0.059
lf-0.058
 walls-0.057
 towers-0.055
 floors-0.055
Token.
Token ID13
Feature activation+5.88e-03

Pos logit contributions
 promise+0.122
 forgive+0.114
use+0.114
 sip+0.111
 better+0.110
Neg logit contributions
lf-0.095
aces-0.094
 walls-0.089
 towers-0.087
 floors-0.086
Token
Token ID198
Feature activation-2.29e-03

Pos logit contributions
 forgive+0.009
 better+0.009
 promise+0.009
 agree+0.008
 sip+0.008
Neg logit contributions
aces-0.012
lf-0.012
 floors-0.011
 walls-0.011
ates-0.010
Token
Token ID198
Feature activation+4.62e-04

Pos logit contributions
aces+0.005
lf+0.005
 walls+0.005
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
 agree-0.003
use-0.003
TokenAfter
Token ID3260
Feature activation-3.49e-03

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
 agree+0.001
 sip+0.001
Neg logit contributions
aces-0.001
keeping-0.001
 walls-0.001
lf-0.001
 towers-0.001
Token Tim
Token ID5045
Feature activation-3.77e-03

Pos logit contributions
aces+0.006
 floors+0.006
lf+0.006
keeping+0.006
anes+0.006
Neg logit contributions
 better-0.006
 forgive-0.006
 promise-0.006
 sip-0.006
 rinse-0.006
Token band
Token ID4097
Feature activation+4.92e-03

Pos logit contributions
 agree+0.017
use+0.017
ackers+0.017
itionally+0.016
 unnecessary+0.016
Neg logit contributions
 blocks-0.011
lf-0.011
 walls-0.011
 cubes-0.011
 bricks-0.011
Token-
Token ID12
Feature activation-2.71e-02

Pos logit contributions
 promise+0.010
 forgive+0.009
 better+0.009
 agree+0.009
use+0.009
Neg logit contributions
lf-0.008
aces-0.008
 towers-0.007
 walls-0.007
 floors-0.007
Tokenaid
Token ID1698
Feature activation+4.75e-03

Pos logit contributions
lf+0.067
aces+0.067
 walls+0.064
 towers+0.064
 floors+0.063
Neg logit contributions
 promise-0.029
 forgive-0.027
 better-0.027
use-0.024
 agree-0.023
Token to
Token ID284
Feature activation+2.37e-02

Pos logit contributions
 better+0.008
 promise+0.008
 forgive+0.008
 kiss+0.007
 agree+0.007
Neg logit contributions
lf-0.009
 towers-0.009
 floors-0.009
 walls-0.009
aces-0.008
Token help
Token ID1037
Feature activation+3.25e-02

Pos logit contributions
 promise+0.035
use+0.035
ackers+0.035
 unnecessary+0.035
ory+0.032
Neg logit contributions
lf-0.043
 himself-0.041
 walls-0.041
aces-0.039
 floors-0.038
Token heal
Token ID12035
Feature activation+6.08e-02

Pos logit contributions
 promise+0.070
 better+0.068
 forgive+0.068
 agree+0.065
use+0.064
Neg logit contributions
aces-0.047
lf-0.044
 walls-0.041
 towers-0.040
anes-0.040
Token the
Token ID262
Feature activation+1.92e-02

Pos logit contributions
 promise+0.134
use+0.127
 sip+0.125
 forgive+0.125
 agree+0.121
Neg logit contributions
lf-0.082
aces-0.078
 himself-0.076
 walls-0.075
keeping-0.073
Token boo
Token ID22513
Feature activation+4.09e-02

Pos logit contributions
 promise+0.032
 forgive+0.030
 better+0.030
use+0.029
 agree+0.028
Neg logit contributions
aces-0.036
lf-0.036
 walls-0.034
 towers-0.034
 floors-0.033
Token-
Token ID12
Feature activation-1.65e-02

Pos logit contributions
 promise+0.086
 forgive+0.083
use+0.082
 sip+0.080
 agree+0.079
Neg logit contributions
aces-0.059
lf-0.057
 walls-0.053
 floors-0.052
 towers-0.051
Tokenbo
Token ID2127
Feature activation+9.52e-03

Pos logit contributions
aces+0.027
lf+0.026
 walls+0.025
 floors+0.024
 towers+0.024
Neg logit contributions
 promise-0.032
 forgive-0.030
 better-0.029
 agree-0.029
use-0.029
Tokeno
Token ID78
Feature activation+1.39e-02

Pos logit contributions
 promise+0.019
 forgive+0.018
 better+0.018
use+0.017
 agree+0.017
Neg logit contributions
lf-0.015
aces-0.015
 walls-0.014
 towers-0.014
 floors-0.014
Token behind
Token ID2157
Feature activation-2.65e-02

Pos logit contributions
 promise+0.030
 better+0.029
 forgive+0.029
use+0.027
 agree+0.027
Neg logit contributions
aces-0.028
lf-0.027
 towers-0.025
 walls-0.025
 floors-0.025
Token would
Token ID561
Feature activation-4.02e-03

Pos logit contributions
aces+0.049
lf+0.048
 floors+0.045
 towers+0.045
 walls+0.045
Neg logit contributions
 promise-0.046
 forgive-0.044
 better-0.043
use-0.041
 sip-0.040
Token cause
Token ID2728
Feature activation+2.27e-02

Pos logit contributions
aces+0.008
lf+0.008
 walls+0.008
 towers+0.008
 floors+0.008
Neg logit contributions
 promise-0.006
 forgive-0.006
 better-0.006
use-0.005
 agree-0.005
Token it
Token ID340
Feature activation+7.98e-03

Pos logit contributions
 promise+0.041
 forgive+0.039
 better+0.039
use+0.037
 agree+0.037
Neg logit contributions
aces-0.039
lf-0.039
 walls-0.037
 towers-0.036
 floors-0.036
Token to
Token ID284
Feature activation-3.01e-04

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.014
use+0.013
 agree+0.013
Neg logit contributions
aces-0.013
lf-0.013
 walls-0.013
 towers-0.012
 floors-0.012
Token die
Token ID4656
Feature activation+6.00e-02

Pos logit contributions
lf+0.001
aces+0.000
 walls+0.000
 towers+0.000
 floors+0.000
Neg logit contributions
 promise-0.001
 forgive-0.001
use-0.001
 better-0.000
 agree-0.000
Token.
Token ID13
Feature activation+2.13e-03

Pos logit contributions
 promise+0.120
 forgive+0.116
use+0.114
 sip+0.112
 agree+0.111
Neg logit contributions
aces-0.088
 walls-0.085
keeping-0.084
lf-0.083
 floors-0.083
Token This
Token ID770
Feature activation+7.51e-03

Pos logit contributions
 promise+0.004
 better+0.003
 forgive+0.003
use+0.003
 rinse+0.003
Neg logit contributions
aces-0.004
lf-0.004
 walls-0.004
 towers-0.003
anes-0.003
Token is
Token ID318
Feature activation+4.81e-04

Pos logit contributions
 promise+0.013
 better+0.012
 forgive+0.012
use+0.011
 agree+0.011
Neg logit contributions
aces-0.015
lf-0.014
 towers-0.013
 floors-0.013
 walls-0.013
Token a
Token ID257
Feature activation+7.75e-03

Pos logit contributions
 promise+0.001
use+0.001
 agree+0.001
 forgive+0.001
 rinse+0.001
Neg logit contributions
lf-0.001
aces-0.001
 walls-0.001
 towers-0.001
 bricks-0.001
Token reminder
Token ID15438
Feature activation-8.06e-04

Pos logit contributions
use+0.014
bs+0.013
ems+0.013
 agree+0.013
 rinse+0.013
Neg logit contributions
 walls-0.012
lf-0.012
 ground-0.012
 bricks-0.011
 floor-0.011
Token don
Token ID836
Feature activation-8.19e-03

Pos logit contributions
 promise+0.041
 forgive+0.038
 better+0.038
use+0.037
 agree+0.036
Neg logit contributions
aces-0.059
lf-0.058
 walls-0.056
 floors-0.055
anes-0.054
Token't
Token ID470
Feature activation+1.36e-03

Pos logit contributions
aces+0.012
keeping+0.011
 towers+0.011
 floors+0.010
 walls+0.010
Neg logit contributions
 better-0.018
 promise-0.017
 forgive-0.017
 agree-0.016
 sip-0.016
Token know
Token ID760
Feature activation+6.37e-03

Pos logit contributions
 rinse+0.002
 promise+0.002
 unnecessary+0.002
use+0.002
 sip+0.002
Neg logit contributions
aces-0.002
lf-0.002
keeping-0.002
 walls-0.002
 himself-0.002
Token.
Token ID13
Feature activation+1.70e-02

Pos logit contributions
 promise+0.010
 forgive+0.010
 better+0.009
use+0.009
 agree+0.009
Neg logit contributions
aces-0.012
lf-0.012
 walls-0.012
 towers-0.011
 floors-0.011
Token But
Token ID887
Feature activation+2.11e-02

Pos logit contributions
 promise+0.030
 forgive+0.028
 better+0.028
use+0.027
 agree+0.027
Neg logit contributions
aces-0.030
lf-0.030
 walls-0.028
 towers-0.028
 floors-0.028
Token why
Token ID1521
Feature activation+5.97e-02

Pos logit contributions
 promise+0.033
 forgive+0.031
 better+0.031
use+0.030
 agree+0.029
Neg logit contributions
lf-0.041
aces-0.041
 walls-0.039
 towers-0.038
 floors-0.038
Token do
Token ID466
Feature activation-1.24e-02

Pos logit contributions
use+0.075
 promise+0.072
 unnecessary+0.071
 sip+0.070
 forgive+0.067
Neg logit contributions
lf-0.134
aces-0.125
 walls-0.123
 towers-0.118
 floors-0.118
Token you
Token ID345
Feature activation+1.04e-02

Pos logit contributions
lf+0.021
aces+0.021
 walls+0.019
 towers+0.019
 floors+0.019
Neg logit contributions
 promise-0.023
 forgive-0.022
 better-0.021
use-0.021
 agree-0.021
Token need
Token ID761
Feature activation+7.23e-03

Pos logit contributions
 promise+0.011
 better+0.011
use+0.011
 unnecessary+0.011
 forgive+0.010
Neg logit contributions
aces-0.023
lf-0.023
 walls-0.022
 floors-0.021
anes-0.021
Token to
Token ID284
Feature activation+7.53e-03

Pos logit contributions
 promise+0.014
 forgive+0.013
 better+0.013
use+0.013
 agree+0.013
Neg logit contributions
aces-0.012
lf-0.011
 walls-0.011
 towers-0.011
 floors-0.011
Token find
Token ID1064
Feature activation+1.81e-02

Pos logit contributions
use+0.011
 unnecessary+0.011
 promise+0.010
bs+0.010
 sip+0.010
Neg logit contributions
 walls-0.014
lf-0.014
 himself-0.013
aces-0.013
 floors-0.013
Token.
Token ID13
Feature activation-3.37e-03

Pos logit contributions
use+0.092
 promise+0.090
 forgive+0.083
 sip+0.082
 rinse+0.081
Neg logit contributions
aces-0.049
 walls-0.049
 himself-0.047
lf-0.046
 floors-0.045
Token His
Token ID2399
Feature activation+4.29e-04

Pos logit contributions
aces+0.006
lf+0.006
keeping+0.006
anes+0.005
ates+0.005
Neg logit contributions
 forgive-0.006
 promise-0.006
 better-0.006
 agree-0.005
 nod-0.005
Token mom
Token ID1995
Feature activation+1.32e-02

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
 kiss+0.001
 agree+0.001
Neg logit contributions
aces-0.001
anes-0.001
lf-0.001
keeping-0.001
 floors-0.001
Token asked
Token ID1965
Feature activation+3.00e-02

Pos logit contributions
 promise+0.023
 forgive+0.022
 better+0.022
 unnecessary+0.021
use+0.021
Neg logit contributions
lf-0.023
aces-0.022
 walls-0.021
 towers-0.020
 floors-0.020
Token him
Token ID683
Feature activation+1.89e-02

Pos logit contributions
 promise+0.063
 forgive+0.062
 better+0.061
use+0.058
 agree+0.058
Neg logit contributions
aces-0.045
lf-0.043
anes-0.040
 walls-0.039
keeping-0.039
Token why
Token ID1521
Feature activation+5.93e-02

Pos logit contributions
 promise+0.029
 forgive+0.026
use+0.026
 better+0.025
 agree+0.025
Neg logit contributions
aces-0.038
lf-0.038
 walls-0.036
 towers-0.036
 floors-0.035
Token he
Token ID339
Feature activation+2.04e-02

Pos logit contributions
 promise+0.127
 forgive+0.125
 better+0.125
 agree+0.117
 rinse+0.115
Neg logit contributions
aces-0.084
lf-0.077
keeping-0.075
 walls-0.075
 towers-0.074
Token was
Token ID373
Feature activation+7.53e-03

Pos logit contributions
 promise+0.041
 forgive+0.040
 better+0.039
use+0.038
 agree+0.038
Neg logit contributions
lf-0.030
aces-0.030
 walls-0.028
 towers-0.027
 floors-0.027
Token crying
Token ID13774
Feature activation+3.50e-02

Pos logit contributions
 promise+0.013
 better+0.013
 forgive+0.013
 agree+0.012
 sip+0.012
Neg logit contributions
aces-0.014
lf-0.013
 towers-0.012
 walls-0.012
anes-0.012
Token and
Token ID290
Feature activation+1.61e-02

Pos logit contributions
 promise+0.073
use+0.070
 forgive+0.069
 sip+0.068
 rinse+0.067
Neg logit contributions
aces-0.050
lf-0.049
 walls-0.047
 floors-0.046
 towers-0.045
Token Tim
Token ID5045
Feature activation+7.68e-03

Pos logit contributions
 promise+0.030
 forgive+0.029
 better+0.028
use+0.027
 agree+0.026
Neg logit contributions
aces-0.028
lf-0.027
 walls-0.026
 towers-0.025
 floors-0.025
Token Tim
Token ID5045
Feature activation+1.19e-02

Pos logit contributions
 better+0.059
 forgive+0.057
 promise+0.055
 rinse+0.050
 sip+0.050
Neg logit contributions
aces-0.064
lf-0.058
umbled-0.057
keeping-0.056
 floors-0.056
Tokenmy
Token ID1820
Feature activation+1.79e-02

Pos logit contributions
 promise+0.016
 better+0.016
 forgive+0.015
use+0.014
 accept+0.013
Neg logit contributions
lf-0.027
 towers-0.026
 walls-0.026
 floors-0.026
aces-0.026
Token needed
Token ID2622
Feature activation+1.56e-02

Pos logit contributions
 promise+0.030
 better+0.029
 forgive+0.028
use+0.025
 accept+0.025
Neg logit contributions
lf-0.036
aces-0.036
 towers-0.036
 floors-0.034
 walls-0.034
Token medicine
Token ID9007
Feature activation+2.69e-02

Pos logit contributions
 promise+0.029
 better+0.029
 forgive+0.029
use+0.027
 sip+0.026
Neg logit contributions
aces-0.027
lf-0.027
 towers-0.024
anes-0.024
 floors-0.024
Token to
Token ID284
Feature activation+8.83e-03

Pos logit contributions
 promise+0.054
 forgive+0.052
 better+0.051
use+0.050
 agree+0.049
Neg logit contributions
aces-0.041
lf-0.041
 walls-0.038
 towers-0.037
 floors-0.037
Token heal
Token ID12035
Feature activation+5.93e-02

Pos logit contributions
 promise+0.013
use+0.012
 forgive+0.012
 unnecessary+0.012
 agree+0.012
Neg logit contributions
lf-0.017
 walls-0.017
aces-0.016
 himself-0.016
 floors-0.016
Token.
Token ID13
Feature activation+1.13e-02

Pos logit contributions
 promise+0.119
 forgive+0.112
use+0.111
 sip+0.110
 agree+0.109
Neg logit contributions
lf-0.089
aces-0.087
 walls-0.084
 himself-0.082
 floors-0.081
Token But
Token ID887
Feature activation+1.70e-02

Pos logit contributions
 forgive+0.019
 better+0.019
 promise+0.019
 agree+0.018
 sip+0.017
Neg logit contributions
aces-0.021
lf-0.020
ates-0.019
keeping-0.019
ator-0.018
Token Tim
Token ID5045
Feature activation+4.39e-03

Pos logit contributions
 promise+0.034
 better+0.034
 forgive+0.033
 agree+0.030
 sip+0.030
Neg logit contributions
aces-0.029
anes-0.026
lf-0.026
keeping-0.025
 walls-0.024
Tokenmy
Token ID1820
Feature activation+4.24e-03

Pos logit contributions
 promise+0.007
 forgive+0.006
 better+0.006
use+0.006
 agree+0.006
Neg logit contributions
aces-0.009
lf-0.009
 towers-0.009
 walls-0.009
 floors-0.009
Token didn
Token ID1422
Feature activation-4.45e-03

Pos logit contributions
 promise+0.007
 better+0.006
 forgive+0.006
 agree+0.006
use+0.006
Neg logit contributions
aces-0.009
 towers-0.008
lf-0.008
anes-0.008
keeping-0.008
Token some
Token ID617
Feature activation+3.45e-03

Pos logit contributions
 promise+0.027
 forgive+0.026
 better+0.026
use+0.024
 sip+0.024
Neg logit contributions
lf-0.024
aces-0.024
 walls-0.022
 towers-0.022
 floors-0.021
Token medicine
Token ID9007
Feature activation+2.41e-02

Pos logit contributions
 promise+0.006
 forgive+0.006
use+0.006
 agree+0.006
 better+0.005
Neg logit contributions
aces-0.006
 walls-0.006
lf-0.006
 floors-0.005
 towers-0.005
Token to
Token ID284
Feature activation+2.02e-02

Pos logit contributions
 promise+0.045
 better+0.044
 forgive+0.044
use+0.041
 sip+0.041
Neg logit contributions
lf-0.041
aces-0.039
 towers-0.038
 floors-0.038
 walls-0.038
Token help
Token ID1037
Feature activation+3.06e-02

Pos logit contributions
 promise+0.029
 unnecessary+0.028
use+0.027
 better+0.026
 forgive+0.026
Neg logit contributions
lf-0.040
 walls-0.038
 himself-0.037
aces-0.035
 floors-0.035
Token it
Token ID340
Feature activation+2.55e-02

Pos logit contributions
 promise+0.063
 better+0.062
 forgive+0.061
 agree+0.058
use+0.057
Neg logit contributions
aces-0.048
lf-0.045
anes-0.042
 towers-0.041
 walls-0.041
Token heal
Token ID12035
Feature activation+5.92e-02

Pos logit contributions
 promise+0.037
use+0.036
 forgive+0.035
 agree+0.033
 unnecessary+0.033
Neg logit contributions
aces-0.050
 walls-0.049
lf-0.049
 himself-0.047
 towers-0.047
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
 promise+0.118
use+0.112
 forgive+0.112
 sip+0.108
 agree+0.106
Neg logit contributions
lf-0.089
aces-0.087
 walls-0.084
keeping-0.082
 himself-0.082
Token Tim
Token ID5045
Feature activation+4.40e-04

Pos logit contributions
 forgive+0.024
 better+0.024
 promise+0.024
 agree+0.022
 sip+0.022
Neg logit contributions
aces-0.022
lf-0.022
keeping-0.019
 walls-0.019
 floors-0.019
Tokenmy
Token ID1820
Feature activation-5.89e-03

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
use+0.001
 agree+0.001
Neg logit contributions
aces-0.001
lf-0.001
 walls-0.001
 towers-0.001
 floors-0.001
Token felt
Token ID2936
Feature activation+2.30e-02

Pos logit contributions
aces+0.010
lf+0.010
 towers+0.009
 walls+0.009
 floors+0.009
Neg logit contributions
 promise-0.011
 forgive-0.010
 better-0.010
 agree-0.010
use-0.010
Token better
Token ID1365
Feature activation+7.69e-03

Pos logit contributions
use+0.035
 promise+0.029
ems+0.029
 agree+0.027
 unison+0.027
Neg logit contributions
 himself-0.049
 walls-0.043
aces-0.042
 ground-0.042
 floors-0.041
Token came
Token ID1625
Feature activation+7.61e-03

Pos logit contributions
 promise+0.017
 forgive+0.016
 better+0.016
use+0.015
 agree+0.015
Neg logit contributions
aces-0.020
lf-0.019
 walls-0.018
 towers-0.018
 floors-0.018
Token outside
Token ID2354
Feature activation+2.32e-02

Pos logit contributions
 promise+0.015
 better+0.015
 forgive+0.015
 agree+0.014
use+0.013
Neg logit contributions
aces-0.013
lf-0.012
anes-0.011
 towers-0.011
 sculptures-0.011
Token and
Token ID290
Feature activation+2.71e-02

Pos logit contributions
 promise+0.040
 forgive+0.040
 better+0.039
 agree+0.037
 sip+0.037
Neg logit contributions
aces-0.045
lf-0.042
anes-0.040
 floors-0.039
 towers-0.039
Token asked
Token ID1965
Feature activation+2.29e-02

Pos logit contributions
 promise+0.045
 forgive+0.041
 better+0.040
use+0.039
 agree+0.039
Neg logit contributions
lf-0.052
aces-0.050
 walls-0.049
 towers-0.048
 floors-0.048
Token him
Token ID683
Feature activation+2.12e-02

Pos logit contributions
 promise+0.049
 forgive+0.047
 better+0.047
use+0.045
 agree+0.045
Neg logit contributions
aces-0.032
lf-0.032
 walls-0.029
 towers-0.029
 floors-0.029
Token why
Token ID1521
Feature activation+5.90e-02

Pos logit contributions
 promise+0.026
 forgive+0.023
use+0.023
 better+0.022
 agree+0.021
Neg logit contributions
aces-0.049
lf-0.048
 walls-0.047
 towers-0.046
 floors-0.045
Token he
Token ID339
Feature activation+1.95e-02

Pos logit contributions
 promise+0.124
 forgive+0.122
 better+0.121
 agree+0.114
 rinse+0.111
Neg logit contributions
aces-0.086
lf-0.080
 walls-0.077
 floors-0.077
keeping-0.076
Token was
Token ID373
Feature activation+6.43e-03

Pos logit contributions
 promise+0.038
 forgive+0.037
 better+0.036
use+0.035
 agree+0.034
Neg logit contributions
aces-0.032
lf-0.031
 towers-0.030
 walls-0.029
 floors-0.029
Token crying
Token ID13774
Feature activation+3.39e-02

Pos logit contributions
 better+0.012
 promise+0.011
 forgive+0.011
 agree+0.010
 safer+0.010
Neg logit contributions
aces-0.012
lf-0.011
anes-0.011
 towers-0.011
cles-0.010
Token.
Token ID13
Feature activation+8.96e-03

Pos logit contributions
 promise+0.067
 forgive+0.063
use+0.063
 sip+0.061
 agree+0.060
Neg logit contributions
aces-0.053
lf-0.052
 walls-0.050
 floors-0.048
 towers-0.048
Token He
Token ID679
Feature activation+1.03e-02

Pos logit contributions
 forgive+0.015
 better+0.014
 promise+0.014
 agree+0.013
use+0.013
Neg logit contributions
aces-0.017
lf-0.016
 floors-0.015
 towers-0.015
 walls-0.015
Token is
Token ID318
Feature activation+1.33e-02

Pos logit contributions
 promise+0.002
 forgive+0.002
use+0.002
 better+0.002
 agree+0.002
Neg logit contributions
lf-0.001
aces-0.001
 walls-0.001
 floors-0.001
 towers-0.001
Token better
Token ID1365
Feature activation+1.56e-02

Pos logit contributions
use+0.027
 sip+0.026
ackers+0.024
 promise+0.024
bs+0.023
Neg logit contributions
 himself-0.021
aces-0.019
lf-0.019
 towers-0.018
 walls-0.018
Token.
Token ID13
Feature activation+8.94e-03

Pos logit contributions
ackers+0.035
 sip+0.034
use+0.034
 rinse+0.033
 promise+0.033
Neg logit contributions
lf-0.017
aces-0.016
 himself-0.016
 walls-0.015
 towers-0.015
Token Now
Token ID2735
Feature activation+1.29e-02

Pos logit contributions
 promise+0.018
 better+0.017
 forgive+0.016
use+0.016
 agree+0.016
Neg logit contributions
aces-0.014
lf-0.013
 walls-0.013
 towers-0.013
 floors-0.012
Token,
Token ID11
Feature activation+2.15e-04

Pos logit contributions
use+0.024
ackers+0.024
 unnecessary+0.024
 promise+0.024
 sip+0.023
Neg logit contributions
 walls-0.018
lf-0.018
 towers-0.018
aces-0.017
 himself-0.016
Token why
Token ID1521
Feature activation+5.88e-02

Pos logit contributions
 promise+0.000
 forgive+0.000
use+0.000
 better+0.000
 agree+0.000
Neg logit contributions
lf-0.000
aces-0.000
 walls-0.000
 towers-0.000
 floors-0.000
Token don
Token ID836
Feature activation-1.17e-02

Pos logit contributions
use+0.069
 unnecessary+0.065
 promise+0.062
 sip+0.059
 rinse+0.058
Neg logit contributions
lf-0.137
aces-0.127
 walls-0.127
 towers-0.125
 blocks-0.123
Token't
Token ID470
Feature activation-2.45e-03

Pos logit contributions
aces+0.012
lf+0.011
 floors+0.010
 walls+0.010
 towers+0.010
Neg logit contributions
 promise-0.030
 better-0.029
 forgive-0.029
use-0.028
 sip-0.028
Token you
Token ID345
Feature activation+6.64e-03

Pos logit contributions
lf+0.004
aces+0.004
 walls+0.004
 towers+0.004
 floors+0.004
Neg logit contributions
 promise-0.005
 forgive-0.004
 better-0.004
use-0.004
 agree-0.004
Token swim
Token ID9422
Feature activation+2.34e-02

Pos logit contributions
 unnecessary+0.009
use+0.008
 safer+0.008
 rinse+0.008
ackers+0.008
Neg logit contributions
 walls-0.012
aces-0.012
lf-0.012
 himself-0.012
 towers-0.012
Token together
Token ID1978
Feature activation+2.18e-03

Pos logit contributions
 promise+0.048
 rinse+0.048
use+0.047
 unnecessary+0.047
 sip+0.046
Neg logit contributions
aces-0.027
 walls-0.026
 towers-0.026
 cubes-0.025
 himself-0.024
Token home
Token ID1363
Feature activation+2.17e-02

Pos logit contributions
 promise+0.006
 forgive+0.006
 better+0.006
use+0.006
 agree+0.006
Neg logit contributions
aces-0.005
lf-0.005
 walls-0.004
 towers-0.004
 floors-0.004
Token".
Token ID1911
Feature activation+6.83e-03

Pos logit contributions
 promise+0.037
 better+0.036
 forgive+0.035
 agree+0.033
use+0.033
Neg logit contributions
aces-0.040
lf-0.039
 towers-0.037
 walls-0.037
anes-0.036
Token His
Token ID2399
Feature activation+1.65e-03

Pos logit contributions
 promise+0.011
 forgive+0.011
 better+0.011
 agree+0.010
use+0.010
Neg logit contributions
aces-0.013
lf-0.013
 towers-0.012
 walls-0.012
 floors-0.012
Token friend
Token ID1545
Feature activation+1.34e-02

Pos logit contributions
 promise+0.003
 forgive+0.002
use+0.002
 better+0.002
 agree+0.002
Neg logit contributions
aces-0.003
lf-0.003
 walls-0.003
 towers-0.003
 floors-0.003
Token asked
Token ID1965
Feature activation+7.70e-03

Pos logit contributions
 promise+0.023
 forgive+0.022
 better+0.022
use+0.021
 agree+0.021
Neg logit contributions
aces-0.024
lf-0.024
 walls-0.023
 towers-0.022
 floors-0.022
Token why
Token ID1521
Feature activation+5.85e-02

Pos logit contributions
 promise+0.016
 forgive+0.015
 better+0.015
use+0.014
 agree+0.014
Neg logit contributions
aces-0.012
lf-0.012
 walls-0.011
 towers-0.011
 floors-0.011
Token.
Token ID13
Feature activation+1.99e-03

Pos logit contributions
 promise+0.128
 forgive+0.123
 better+0.122
use+0.117
 agree+0.117
Neg logit contributions
aces-0.080
lf-0.078
 walls-0.073
 towers-0.072
 floors-0.071
Token Johnny
Token ID15470
Feature activation-1.49e-02

Pos logit contributions
 promise+0.003
 forgive+0.003
 better+0.003
 agree+0.003
use+0.003
Neg logit contributions
aces-0.004
lf-0.004
 towers-0.003
 walls-0.003
 floors-0.003
Token said
Token ID531
Feature activation-5.71e-03

Pos logit contributions
lf+0.026
aces+0.026
 walls+0.024
 towers+0.024
 floors+0.024
Neg logit contributions
 promise-0.026
 forgive-0.025
 better-0.025
use-0.024
 rinse-0.024
Token,
Token ID11
Feature activation-4.97e-03

Pos logit contributions
aces+0.009
lf+0.009
keeping+0.009
 walls+0.009
 floors+0.009
Neg logit contributions
 promise-0.011
 forgive-0.011
 better-0.011
 agree-0.010
use-0.010
Token "
Token ID366
Feature activation-9.61e-03

Pos logit contributions
aces+0.009
lf+0.009
 walls+0.008
 floors+0.008
 towers+0.008
Neg logit contributions
 promise-0.009
 forgive-0.009
 better-0.008
 agree-0.008
use-0.008
Token on
Token ID319
Feature activation-1.33e-02

Pos logit contributions
 promise+0.025
 better+0.024
 forgive+0.023
use+0.022
 agree+0.022
Neg logit contributions
lf-0.019
aces-0.019
 walls-0.018
 towers-0.018
 floors-0.018
Token it
Token ID340
Feature activation+1.34e-02

Pos logit contributions
aces+0.024
lf+0.024
 walls+0.022
 floors+0.022
 towers+0.022
Neg logit contributions
 promise-0.023
 better-0.022
 forgive-0.022
use-0.021
 agree-0.020
Token and
Token ID290
Feature activation+8.02e-03

Pos logit contributions
 promise+0.024
 forgive+0.023
 better+0.022
use+0.022
 agree+0.022
Neg logit contributions
aces-0.023
lf-0.023
 walls-0.022
 towers-0.021
 floors-0.021
Token it
Token ID340
Feature activation+6.08e-03

Pos logit contributions
 promise+0.011
 forgive+0.010
use+0.010
 better+0.010
 agree+0.010
Neg logit contributions
lf-0.016
aces-0.016
 walls-0.016
 towers-0.015
 floors-0.015
Token will
Token ID481
Feature activation-2.86e-03

Pos logit contributions
 promise+0.007
 forgive+0.007
 agree+0.007
use+0.006
 sip+0.006
Neg logit contributions
aces-0.014
 walls-0.013
lf-0.013
 floors-0.013
 towers-0.013
Token heal
Token ID12035
Feature activation+5.84e-02

Pos logit contributions
 walls+0.005
lf+0.005
aces+0.005
 himself+0.005
 floors+0.005
Neg logit contributions
 promise-0.004
 agree-0.004
use-0.004
 unnecessary-0.004
ackers-0.004
Token soon
Token ID2582
Feature activation+1.20e-02

Pos logit contributions
 promise+0.103
 sip+0.100
use+0.096
 agree+0.095
 forgive+0.094
Neg logit contributions
lf-0.095
aces-0.094
 himself-0.094
 walls-0.094
 towers-0.091
Token."
Token ID526
Feature activation+2.45e-02

Pos logit contributions
 promise+0.022
 forgive+0.022
 sip+0.021
use+0.020
 agree+0.020
Neg logit contributions
aces-0.020
lf-0.019
 walls-0.019
 floors-0.018
 towers-0.018
Token
Token ID198
Feature activation+1.26e-02

Pos logit contributions
 promise+0.033
 forgive+0.032
 better+0.031
 agree+0.030
use+0.029
Neg logit contributions
aces-0.054
lf-0.054
 walls-0.051
 floors-0.050
 towers-0.050
Token
Token ID198
Feature activation+1.45e-02

Pos logit contributions
 promise+0.021
 forgive+0.019
 better+0.019
use+0.019
 agree+0.018
Neg logit contributions
lf-0.024
aces-0.024
 walls-0.022
 towers-0.022
 floors-0.022
TokenTim
Token ID14967
Feature activation-4.73e-03

Pos logit contributions
 promise+0.026
 forgive+0.025
 better+0.024
 agree+0.023
use+0.023
Neg logit contributions
aces-0.026
lf-0.025
 walls-0.024
 towers-0.024
 floors-0.023
Token cabinet
Token ID13447
Feature activation+8.47e-03

Pos logit contributions
 promise+0.001
 better+0.001
 forgive+0.001
use+0.001
 sip+0.001
Neg logit contributions
aces-0.001
lf-0.001
keeping-0.001
anes-0.001
umbled-0.001
Token and
Token ID290
Feature activation+1.13e-02

Pos logit contributions
 promise+0.013
 better+0.013
 forgive+0.011
 sip+0.011
 closer+0.011
Neg logit contributions
aces-0.018
lf-0.017
keeping-0.016
anes-0.016
ped-0.016
Token out
Token ID503
Feature activation+5.61e-03

Pos logit contributions
 promise+0.017
 forgive+0.016
 better+0.016
use+0.015
 sip+0.015
Neg logit contributions
aces-0.024
lf-0.023
anes-0.022
 towers-0.022
keeping-0.022
Token came
Token ID1625
Feature activation-1.38e-04

Pos logit contributions
 sip+0.009
 forgive+0.009
 better+0.009
 promise+0.009
 kiss+0.008
Neg logit contributions
keeping-0.011
aces-0.011
 floors-0.010
lf-0.010
ates-0.010
Token a
Token ID257
Feature activation+2.38e-02

Pos logit contributions
lf+0.000
aces+0.000
anes+0.000
 towers+0.000
 floors+0.000
Neg logit contributions
 promise-0.000
 forgive-0.000
 better-0.000
 agree-0.000
 accept-0.000
Token little
Token ID1310
Feature activation+5.81e-02

Pos logit contributions
 unnecessary+0.049
use+0.048
 promise+0.048
 agree+0.048
bs+0.046
Neg logit contributions
lf-0.036
 walls-0.033
 stairs-0.029
 floors-0.028
 towers-0.028
Token bunny
Token ID44915
Feature activation+4.13e-02

Pos logit contributions
 promise+0.098
 forgive+0.095
 agree+0.093
use+0.092
 accept+0.090
Neg logit contributions
 walls-0.105
 floors-0.099
lf-0.098
 towers-0.098
 bricks-0.095
Token.
Token ID13
Feature activation-2.03e-03

Pos logit contributions
 promise+0.082
 forgive+0.077
 better+0.076
use+0.076
 agree+0.075
Neg logit contributions
lf-0.063
aces-0.062
 walls-0.059
 towers-0.057
keeping-0.055
Token Lilly
Token ID35134
Feature activation-7.11e-03

Pos logit contributions
aces+0.003
lf+0.003
 walls+0.003
 towers+0.003
 floors+0.003
Neg logit contributions
 promise-0.004
 forgive-0.004
use-0.004
 better-0.004
 agree-0.004
Token wanted
Token ID2227
Feature activation-6.46e-03

Pos logit contributions
aces+0.012
lf+0.011
 towers+0.011
 walls+0.011
 floors+0.011
Neg logit contributions
 promise-0.014
 forgive-0.013
 better-0.013
use-0.012
 agree-0.012
Token to
Token ID284
Feature activation+5.44e-04

Pos logit contributions
aces+0.012
lf+0.012
anes+0.011
 floors+0.011
umbled+0.011
Neg logit contributions
 better-0.012
 forgive-0.011
 promise-0.011
 sip-0.011
use-0.010
Token to
Token ID284
Feature activation+8.06e-03

Pos logit contributions
 promise+0.030
 forgive+0.029
 better+0.029
use+0.027
 sip+0.026
Neg logit contributions
aces-0.025
lf-0.025
 towers-0.023
anes-0.023
 floors-0.023
Token take
Token ID1011
Feature activation+1.23e-02

Pos logit contributions
use+0.012
 promise+0.011
ackers+0.011
 unnecessary+0.011
 sip+0.011
Neg logit contributions
lf-0.015
 walls-0.015
 himself-0.014
aces-0.014
 floors-0.014
Token Max
Token ID5436
Feature activation+1.26e-02

Pos logit contributions
 promise+0.023
 forgive+0.022
 better+0.021
use+0.021
 agree+0.021
Neg logit contributions
aces-0.020
lf-0.020
 walls-0.019
 towers-0.019
 floors-0.019
Token to
Token ID284
Feature activation+1.79e-02

Pos logit contributions
 promise+0.025
 forgive+0.024
 better+0.023
use+0.023
 agree+0.022
Neg logit contributions
aces-0.020
lf-0.020
 walls-0.018
 towers-0.018
 floors-0.018
Token the
Token ID262
Feature activation+8.93e-03

Pos logit contributions
use+0.043
 promise+0.043
ackers+0.041
bs+0.040
ealing+0.040
Neg logit contributions
 walls-0.018
 himself-0.018
lf-0.017
 towers-0.016
 floors-0.015
Token veterinarian
Token ID41404
Feature activation+5.80e-02

Pos logit contributions
 promise+0.016
use+0.016
 agree+0.015
 forgive+0.015
bs+0.015
Neg logit contributions
 walls-0.015
 towers-0.014
lf-0.014
 floors-0.014
 ground-0.014
Token,
Token ID11
Feature activation+1.11e-02

Pos logit contributions
 promise+0.115
 rinse+0.114
 sip+0.110
 unnecessary+0.110
 forgive+0.110
Neg logit contributions
aces-0.081
lf-0.079
 walls-0.078
keeping-0.074
 floors-0.071
Token who
Token ID508
Feature activation+3.35e-02

Pos logit contributions
 promise+0.021
 forgive+0.020
 better+0.020
use+0.019
 agree+0.019
Neg logit contributions
aces-0.018
lf-0.018
 walls-0.017
 towers-0.017
 floors-0.016
Token is
Token ID318
Feature activation+4.35e-03

Pos logit contributions
 promise+0.061
 forgive+0.058
 better+0.057
use+0.054
 agree+0.053
Neg logit contributions
aces-0.061
lf-0.059
 towers-0.057
 walls-0.057
 floors-0.056
Token a
Token ID257
Feature activation+6.13e-03

Pos logit contributions
 promise+0.009
 forgive+0.008
 agree+0.008
use+0.008
 sip+0.008
Neg logit contributions
lf-0.006
aces-0.006
 walls-0.006
 floors-0.006
 towers-0.005
Token person
Token ID1048
Feature activation+1.27e-02

Pos logit contributions
ackers+0.011
reci+0.010
bs+0.010
 sip+0.010
 agree+0.010
Neg logit contributions
lf-0.009
 walls-0.009
 workers-0.008
 floors-0.008
 fixing-0.008
Token never
Token ID1239
Feature activation-1.39e-02

Pos logit contributions
aces+0.031
lf+0.031
 towers+0.030
 floors+0.030
 walls+0.029
Neg logit contributions
 promise-0.033
 better-0.031
 forgive-0.030
 agree-0.029
use-0.029
Token forgot
Token ID16453
Feature activation+1.89e-03

Pos logit contributions
aces+0.024
lf+0.023
 walls+0.022
 towers+0.021
 floors+0.021
Neg logit contributions
 promise-0.025
 forgive-0.023
 better-0.023
 rinse-0.023
 sip-0.022
Token the
Token ID262
Feature activation-5.89e-03

Pos logit contributions
 promise+0.004
 forgive+0.004
 better+0.004
 agree+0.003
use+0.003
Neg logit contributions
aces-0.003
lf-0.003
 walls-0.003
 floors-0.003
 towers-0.003
Token fast
Token ID3049
Feature activation-3.69e-02

Pos logit contributions
lf+0.012
aces+0.012
anes+0.011
keeping+0.011
 floors+0.011
Neg logit contributions
 promise-0.010
 better-0.010
 forgive-0.009
 agree-0.009
use-0.008
Token che
Token ID1125
Feature activation-2.59e-02

Pos logit contributions
lf+0.066
aces+0.066
anes+0.062
keeping+0.060
 floors+0.060
Neg logit contributions
 promise-0.068
 better-0.063
 forgive-0.062
 agree-0.060
 kiss-0.058
Tokenet
Token ID316
Feature activation-7.65e-02

Pos logit contributions
lf+0.038
 walls+0.036
anes+0.035
 towers+0.034
 floors+0.034
Neg logit contributions
 promise-0.055
 better-0.050
use-0.050
 forgive-0.049
 agree-0.049
Tokenah
Token ID993
Feature activation-3.68e-02

Pos logit contributions
lf+0.164
 floors+0.164
 walls+0.162
 towers+0.157
keeping+0.155
Neg logit contributions
 promise-0.110
 forgive-0.106
 better-0.103
 agree-0.097
 accept-0.090
Token he
Token ID339
Feature activation-1.36e-02

Pos logit contributions
 floors+0.060
 walls+0.059
lf+0.058
 towers+0.058
 bricks+0.057
Neg logit contributions
 promise-0.071
 forgive-0.067
 better-0.066
 agree-0.065
 accept-0.060
Token had
Token ID550
Feature activation-2.24e-02

Pos logit contributions
lf+0.024
aces+0.024
 walls+0.023
 towers+0.022
 floors+0.022
Neg logit contributions
 promise-0.025
 forgive-0.023
 better-0.023
use-0.022
 agree-0.022
Token seen
Token ID1775
Feature activation-2.89e-03

Pos logit contributions
lf+0.038
aces+0.037
 floors+0.036
anes+0.036
 walls+0.036
Neg logit contributions
 promise-0.042
 forgive-0.041
 better-0.040
 agree-0.038
use-0.037
Token.
Token ID13
Feature activation-3.55e-03

Pos logit contributions
aces+0.006
lf+0.005
 floors+0.005
 walls+0.005
 towers+0.005
Neg logit contributions
 better-0.005
 promise-0.005
 forgive-0.004
 agree-0.004
use-0.004
Tokeneth
Token ID2788
Feature activation-4.70e-02

Pos logit contributions
aces+0.011
lf+0.011
 walls+0.010
 towers+0.010
 floors+0.010
Neg logit contributions
 promise-0.007
 forgive-0.007
 better-0.007
use-0.006
 agree-0.006
Tokenany
Token ID1092
Feature activation-2.49e-02

Pos logit contributions
 towers+0.084
 bricks+0.080
ados+0.077
 sculptures+0.076
 walls+0.076
Neg logit contributions
 promise-0.082
 forgive-0.071
 better-0.071
 agree-0.071
use-0.070
Token and
Token ID290
Feature activation-4.02e-03

Pos logit contributions
aces+0.048
 bricks+0.046
 towers+0.045
lf+0.045
ados+0.044
Neg logit contributions
 promise-0.040
 better-0.038
 agree-0.036
 forgive-0.036
 kiss-0.033
Token her
Token ID607
Feature activation-1.75e-02

Pos logit contributions
aces+0.008
lf+0.007
 towers+0.007
anes+0.007
 walls+0.007
Neg logit contributions
 promise-0.007
 forgive-0.006
 better-0.006
 agree-0.006
use-0.006
Token je
Token ID11223
Feature activation-3.78e-02

Pos logit contributions
aces+0.031
lf+0.029
keeping+0.029
anes+0.028
umbled+0.027
Neg logit contributions
 promise-0.033
 forgive-0.031
 better-0.030
 kiss-0.028
 agree-0.027
Tokenep
Token ID538
Feature activation-7.43e-02

Pos logit contributions
keeping+0.069
aces+0.069
 walls+0.066
lf+0.065
 sculptures+0.065
Neg logit contributions
 promise-0.059
 forgive-0.056
 better-0.054
 unnecessary-0.048
use-0.047
Token had
Token ID550
Feature activation-2.12e-02

Pos logit contributions
lf+0.111
aces+0.107
 towers+0.104
 walls+0.102
 floors+0.101
Neg logit contributions
 promise-0.144
 better-0.138
 forgive-0.137
 agree-0.131
 kiss-0.127
Token many
Token ID867
Feature activation-3.17e-02

Pos logit contributions
lf+0.040
aces+0.037
anes+0.035
ator+0.035
umbled+0.035
Neg logit contributions
 better-0.038
 promise-0.034
 forgive-0.034
 safer-0.032
 happier-0.031
Token adventures
Token ID17545
Feature activation-8.26e-03

Pos logit contributions
aces+0.064
lf+0.064
keeping+0.059
anes+0.059
 walls+0.059
Neg logit contributions
 promise-0.050
 better-0.047
 forgive-0.046
 agree-0.043
use-0.042
Token.
Token ID13
Feature activation-1.58e-02

Pos logit contributions
lf+0.017
aces+0.016
 floors+0.016
 walls+0.016
 towers+0.015
Neg logit contributions
 better-0.012
 promise-0.012
 forgive-0.011
 agree-0.011
 safer-0.010
Token Everywhere
Token ID48201
Feature activation-4.25e-03

Pos logit contributions
lf+0.029
aces+0.028
 towers+0.026
 walls+0.026
 floors+0.026
Neg logit contributions
 promise-0.027
 forgive-0.026
 better-0.026
use-0.024
 agree-0.024
Token!"
Token ID2474
Feature activation+1.31e-03

Pos logit contributions
aces+0.021
lf+0.021
 floors+0.020
 towers+0.019
 walls+0.019
Neg logit contributions
 promise-0.017
 better-0.016
 forgive-0.015
use-0.015
 agree-0.015
Token
Token ID220
Feature activation+1.50e-03

Pos logit contributions
 promise+0.002
 forgive+0.002
 better+0.002
use+0.002
 agree+0.002
Neg logit contributions
aces-0.003
lf-0.003
 walls-0.002
 towers-0.002
 floors-0.002
Token
Token ID198
Feature activation-2.65e-03

Pos logit contributions
 promise+0.002
 forgive+0.002
 better+0.002
use+0.002
 agree+0.002
Neg logit contributions
aces-0.003
lf-0.003
 walls-0.003
 towers-0.003
 floors-0.003
Token
Token ID198
Feature activation-5.63e-04

Pos logit contributions
aces+0.005
lf+0.005
 walls+0.005
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.004
 forgive-0.004
 better-0.004
use-0.004
 agree-0.004
TokenMor
Token ID20044
Feature activation+8.47e-03

Pos logit contributions
aces+0.001
lf+0.001
 walls+0.001
 towers+0.001
 floors+0.001
Neg logit contributions
 promise-0.001
 forgive-0.001
 better-0.001
use-0.001
 agree-0.001
Tokenris
Token ID2442
Feature activation-7.19e-02

Pos logit contributions
 promise+0.014
 better+0.013
 forgive+0.013
 rinse+0.012
 unnecessary+0.012
Neg logit contributions
aces-0.017
lf-0.016
anes-0.014
 walls-0.014
keeping-0.014
Token smiled
Token ID13541
Feature activation-2.13e-03

Pos logit contributions
aces+0.143
anes+0.136
 floors+0.130
 towers+0.130
 bricks+0.128
Neg logit contributions
 promise-0.115
use-0.106
 agree-0.103
 forgive-0.099
 better-0.097
Token and
Token ID290
Feature activation+4.94e-03

Pos logit contributions
aces+0.004
lf+0.003
 walls+0.003
 towers+0.003
 floors+0.003
Neg logit contributions
 promise-0.004
 forgive-0.004
 better-0.004
use-0.004
 agree-0.004
Token replied
Token ID8712
Feature activation-1.72e-02

Pos logit contributions
 promise+0.009
 forgive+0.008
 better+0.008
use+0.008
 agree+0.008
Neg logit contributions
lf-0.009
aces-0.009
 walls-0.008
 towers-0.008
 floors-0.008
Token,
Token ID11
Feature activation-5.85e-03

Pos logit contributions
aces+0.032
lf+0.032
 walls+0.030
 towers+0.030
 floors+0.029
Neg logit contributions
 promise-0.029
 forgive-0.027
 better-0.027
use-0.026
 agree-0.026
Token "
Token ID366
Feature activation-1.97e-02

Pos logit contributions
aces+0.009
lf+0.009
 walls+0.009
 towers+0.008
 floors+0.008
Neg logit contributions
 promise-0.012
 forgive-0.011
 better-0.011
use-0.011
 agree-0.010
Token stopped
Token ID5025
Feature activation-1.38e-02

Pos logit contributions
aces+0.002
lf+0.002
 walls+0.001
 towers+0.001
 floors+0.001
Neg logit contributions
 promise-0.002
 forgive-0.002
 better-0.002
use-0.002
 agree-0.001
Token.
Token ID13
Feature activation-8.60e-03

Pos logit contributions
aces+0.021
lf+0.021
 walls+0.020
 towers+0.020
 floors+0.019
Neg logit contributions
 promise-0.028
 forgive-0.026
 better-0.025
use-0.025
 agree-0.025
Token
Token ID198
Feature activation-6.04e-03

Pos logit contributions
lf+0.017
aces+0.016
 floors+0.015
 towers+0.015
 walls+0.015
Neg logit contributions
 forgive-0.014
 promise-0.014
 better-0.013
 agree-0.013
use-0.012
Token
Token ID198
Feature activation-2.07e-03

Pos logit contributions
aces+0.012
lf+0.012
 walls+0.012
 towers+0.011
 floors+0.011
Neg logit contributions
 promise-0.009
 forgive-0.009
 better-0.008
use-0.008
 agree-0.008
TokenKay
Token ID37247
Feature activation-3.89e-02

Pos logit contributions
aces+0.004
lf+0.004
 walls+0.004
 towers+0.004
 floors+0.004
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
 agree-0.003
use-0.003
Tokenlee
Token ID7197
Feature activation-7.18e-02

Pos logit contributions
aces+0.065
lf+0.064
 towers+0.062
 floors+0.061
 walls+0.061
Neg logit contributions
 promise-0.071
 forgive-0.067
 better-0.065
use-0.065
 agree-0.063
Token was
Token ID373
Feature activation-1.37e-03

Pos logit contributions
aces+0.132
lf+0.131
 towers+0.130
 cubes+0.128
 bricks+0.128
Neg logit contributions
 promise-0.113
 agree-0.111
 forgive-0.105
 better-0.105
 nod-0.098
Token excited
Token ID6568
Feature activation-1.31e-02

Pos logit contributions
lf+0.002
aces+0.002
 walls+0.002
 towers+0.002
 floors+0.002
Neg logit contributions
 promise-0.003
 better-0.003
 forgive-0.003
 agree-0.002
 sip-0.002
Token and
Token ID290
Feature activation-7.69e-03

Pos logit contributions
aces+0.028
lf+0.027
anes+0.026
ates+0.025
 floors+0.025
Neg logit contributions
 promise-0.020
 better-0.020
 forgive-0.019
 agree-0.019
 louder-0.017
Token said
Token ID531
Feature activation-2.22e-02

Pos logit contributions
aces+0.013
lf+0.013
 walls+0.012
 towers+0.012
 floors+0.012
Neg logit contributions
 promise-0.014
 forgive-0.013
 better-0.013
use-0.013
 agree-0.013
Token,
Token ID11
Feature activation-2.26e-02

Pos logit contributions
aces+0.036
lf+0.034
 walls+0.033
 towers+0.033
 floors+0.032
Neg logit contributions
 promise-0.043
use-0.040
 forgive-0.040
 better-0.039
 sip-0.039
Token It
Token ID632
Feature activation-1.06e-03

Pos logit contributions
aces+0.007
lf+0.007
 walls+0.006
 towers+0.006
 floors+0.006
Neg logit contributions
 promise-0.006
 forgive-0.005
 better-0.005
use-0.005
 agree-0.005
Token became
Token ID2627
Feature activation+1.47e-02

Pos logit contributions
aces+0.002
lf+0.002
 towers+0.002
 walls+0.002
 floors+0.002
Neg logit contributions
 promise-0.002
 better-0.002
 forgive-0.002
 agree-0.002
use-0.002
Token the
Token ID262
Feature activation-3.98e-03

Pos logit contributions
 promise+0.031
use+0.031
 sip+0.030
 forgive+0.030
 agree+0.029
Neg logit contributions
aces-0.019
 walls-0.019
 towers-0.018
lf-0.018
 himself-0.018
Token happiest
Token ID49414
Feature activation-4.12e-03

Pos logit contributions
aces+0.007
lf+0.007
keeping+0.006
anes+0.006
umbled+0.006
Neg logit contributions
 promise-0.008
 better-0.007
 forgive-0.007
 kiss-0.006
 safer-0.006
Token books
Token ID3835
Feature activation-3.55e-02

Pos logit contributions
aces+0.008
ped+0.008
umbled+0.007
keeping+0.007
anes+0.007
Neg logit contributions
 better-0.007
 promise-0.007
 forgive-0.006
 kiss-0.006
 safer-0.006
Tokenhe
Token ID258
Feature activation-7.14e-02

Pos logit contributions
aces+0.053
 floors+0.051
 towers+0.051
lf+0.050
 walls+0.049
Neg logit contributions
 promise-0.073
 better-0.072
 forgive-0.070
 agree-0.065
 sip-0.064
Tokenlf
Token ID1652
Feature activation-4.58e-02

Pos logit contributions
aces+0.059
anes+0.056
 markings+0.046
umbled+0.044
 towers+0.042
Neg logit contributions
 promise-0.187
 better-0.178
 forgive-0.177
 agree-0.173
 sip-0.170
Token in
Token ID287
Feature activation-2.22e-02

Pos logit contributions
 floors+0.070
 towers+0.068
aces+0.067
 walls+0.066
anes+0.065
Neg logit contributions
 promise-0.090
 better-0.088
 forgive-0.085
 agree-0.078
 accept-0.077
Token the
Token ID262
Feature activation+4.29e-03

Pos logit contributions
aces+0.038
lf+0.037
 towers+0.034
 walls+0.034
 floors+0.034
Neg logit contributions
 promise-0.041
 forgive-0.039
 better-0.039
use-0.037
 agree-0.037
Token world
Token ID995
Feature activation-3.02e-03

Pos logit contributions
 promise+0.008
 better+0.008
 forgive+0.007
use+0.007
 safer+0.007
Neg logit contributions
aces-0.008
ped-0.007
lf-0.007
anes-0.007
keeping-0.007
Token!
Token ID0
Feature activation-5.04e-03

Pos logit contributions
aces+0.006
lf+0.006
 floors+0.005
 towers+0.005
 walls+0.005
Neg logit contributions
 promise-0.005
 better-0.005
 forgive-0.005
 agree-0.004
use-0.004
Tokeneth
Token ID2788
Feature activation-4.76e-02

Pos logit contributions
lf+0.013
aces+0.013
 walls+0.012
 towers+0.012
 floors+0.012
Neg logit contributions
 promise-0.009
 forgive-0.008
 better-0.008
use-0.008
 agree-0.008
Tokenany
Token ID1092
Feature activation-2.43e-02

Pos logit contributions
 towers+0.083
 bricks+0.080
anes+0.080
aces+0.077
 sculptures+0.076
Neg logit contributions
 promise-0.084
 agree-0.075
 forgive-0.072
 better-0.072
 accept-0.071
Token and
Token ID290
Feature activation-4.84e-03

Pos logit contributions
aces+0.050
anes+0.046
 towers+0.046
 bricks+0.046
keeping+0.045
Neg logit contributions
 promise-0.037
 better-0.034
 agree-0.034
 forgive-0.033
 accept-0.030
Token her
Token ID607
Feature activation-1.72e-02

Pos logit contributions
aces+0.009
lf+0.009
 towers+0.009
anes+0.008
 walls+0.008
Neg logit contributions
 promise-0.008
 forgive-0.008
 better-0.008
 agree-0.007
 rinse-0.007
Token je
Token ID11223
Feature activation-3.59e-02

Pos logit contributions
aces+0.035
lf+0.032
anes+0.032
keeping+0.032
umbled+0.032
Neg logit contributions
 promise-0.029
 forgive-0.027
 better-0.026
 kiss-0.024
 agree-0.023
Tokenep
Token ID538
Feature activation-7.07e-02

Pos logit contributions
keeping+0.066
aces+0.066
 walls+0.062
 sculptures+0.061
lf+0.060
Neg logit contributions
 promise-0.056
 forgive-0.054
 better-0.051
use-0.045
 unnecessary-0.044
Token were
Token ID547
Feature activation-1.07e-02

Pos logit contributions
lf+0.106
aces+0.105
 towers+0.101
 walls+0.098
 floors+0.098
Neg logit contributions
 promise-0.132
 better-0.128
 forgive-0.126
 agree-0.122
 kiss-0.118
Token very
Token ID845
Feature activation-4.89e-03

Pos logit contributions
aces+0.021
lf+0.021
keeping+0.020
anes+0.020
opes+0.018
Neg logit contributions
 better-0.018
 promise-0.016
 happier-0.015
 safer-0.015
 forgive-0.014
Token happy
Token ID3772
Feature activation-4.85e-03

Pos logit contributions
aces+0.011
lf+0.010
 walls+0.010
 towers+0.010
 floors+0.010
Neg logit contributions
 promise-0.007
 better-0.007
 forgive-0.007
 agree-0.006
use-0.006
Token together
Token ID1978
Feature activation-1.75e-02

Pos logit contributions
aces+0.011
lf+0.010
anes+0.010
 floors+0.010
 walls+0.010
Neg logit contributions
 better-0.007
 forgive-0.006
 promise-0.006
 agree-0.006
 happier-0.005
Token.
Token ID13
Feature activation-1.74e-02

Pos logit contributions
aces+0.035
lf+0.034
anes+0.033
 floors+0.032
 towers+0.032
Neg logit contributions
 promise-0.028
 better-0.027
 forgive-0.026
 agree-0.025
 sip-0.024
Token them
Token ID606
Feature activation-2.00e-02

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
 agree+0.001
 accept+0.001
Neg logit contributions
aces-0.001
lf-0.001
anes-0.001
 floors-0.001
umbled-0.001
Token about
Token ID546
Feature activation-2.05e-03

Pos logit contributions
aces+0.037
lf+0.037
keeping+0.034
 floors+0.034
anes+0.034
Neg logit contributions
 forgive-0.034
 promise-0.034
 better-0.034
 agree-0.031
 accept-0.030
Token the
Token ID262
Feature activation-5.53e-03

Pos logit contributions
aces+0.004
lf+0.003
 floors+0.003
 walls+0.003
anes+0.003
Neg logit contributions
 promise-0.004
 forgive-0.004
 better-0.004
 agree-0.003
 sip-0.003
Token fast
Token ID3049
Feature activation-3.35e-02

Pos logit contributions
aces+0.010
lf+0.010
umbled+0.009
keeping+0.009
anes+0.009
Neg logit contributions
 promise-0.011
 better-0.010
 forgive-0.010
use-0.009
 agree-0.009
Token che
Token ID1125
Feature activation-2.36e-02

Pos logit contributions
aces+0.056
lf+0.052
keeping+0.047
anes+0.046
umbled+0.045
Neg logit contributions
 promise-0.068
 better-0.064
 forgive-0.063
use-0.061
 agree-0.060
Tokenet
Token ID316
Feature activation-7.06e-02

Pos logit contributions
lf+0.036
 walls+0.035
 towers+0.034
anes+0.034
 floors+0.034
Neg logit contributions
 promise-0.047
use-0.043
 better-0.043
 forgive-0.041
 agree-0.041
Tokenah
Token ID993
Feature activation-2.99e-02

Pos logit contributions
lf+0.140
 floors+0.139
aces+0.138
 walls+0.136
keeping+0.136
Neg logit contributions
 promise-0.106
 better-0.101
 forgive-0.101
 agree-0.094
use-0.087
Token.
Token ID13
Feature activation-8.57e-03

Pos logit contributions
 floors+0.051
aces+0.051
 walls+0.049
lf+0.049
 towers+0.049
Neg logit contributions
 promise-0.053
 better-0.050
 forgive-0.050
 agree-0.048
 sip-0.045
Token They
Token ID1119
Feature activation-4.06e-03

Pos logit contributions
lf+0.017
aces+0.016
 floors+0.016
 walls+0.016
 towers+0.015
Neg logit contributions
 forgive-0.013
 promise-0.013
 better-0.013
 agree-0.013
use-0.012
Token were
Token ID547
Feature activation-7.35e-03

Pos logit contributions
lf+0.005
aces+0.005
 walls+0.005
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.009
 forgive-0.008
 better-0.008
use-0.008
 sip-0.008
Token all
Token ID477
Feature activation-3.92e-03

Pos logit contributions
aces+0.015
lf+0.015
keeping+0.014
anes+0.014
 floors+0.014
Neg logit contributions
 better-0.012
 promise-0.011
 forgive-0.011
 safer-0.010
 happier-0.010
Token had
Token ID550
Feature activation-9.26e-04

Pos logit contributions
 promise+0.006
 better+0.006
 forgive+0.006
 accept+0.005
use+0.005
Neg logit contributions
anes-0.008
 towers-0.008
aces-0.008
 floors-0.008
lf-0.008
Token never
Token ID1239
Feature activation-2.14e-02

Pos logit contributions
aces+0.001
anes+0.001
lf+0.001
 floors+0.001
 walls+0.001
Neg logit contributions
 forgive-0.002
 promise-0.002
 better-0.002
use-0.002
 accept-0.002
Token touched
Token ID12615
Feature activation-1.51e-03

Pos logit contributions
aces+0.034
lf+0.034
 walls+0.031
 towers+0.031
anes+0.031
Neg logit contributions
 promise-0.042
 forgive-0.041
 better-0.040
use-0.038
 agree-0.038
Token the
Token ID262
Feature activation-1.49e-02

Pos logit contributions
lf+0.003
aces+0.003
anes+0.003
 walls+0.002
 floors+0.002
Neg logit contributions
 better-0.003
 promise-0.002
 forgive-0.002
use-0.002
 sip-0.002
Token books
Token ID3835
Feature activation-2.62e-02

Pos logit contributions
aces+0.023
keeping+0.023
umbled+0.022
ped+0.022
ates+0.021
Neg logit contributions
 better-0.030
 promise-0.029
 forgive-0.028
 safer-0.026
use-0.025
Tokenhe
Token ID258
Feature activation-7.05e-02

Pos logit contributions
aces+0.042
lf+0.041
 walls+0.039
 towers+0.038
 floors+0.038
Neg logit contributions
 promise-0.051
 forgive-0.049
 better-0.048
use-0.047
 agree-0.046
Tokenlf
Token ID1652
Feature activation-3.21e-02

Pos logit contributions
aces+0.076
anes+0.074
umbled+0.058
unes+0.057
ocular+0.056
Neg logit contributions
 promise-0.161
 sip-0.154
 better-0.154
 agree-0.153
 forgive-0.148
Token.
Token ID13
Feature activation-2.75e-03

Pos logit contributions
aces+0.070
anes+0.067
lf+0.067
 towers+0.067
 floors+0.066
Neg logit contributions
 promise-0.044
 better-0.043
 forgive-0.040
 agree-0.036
 sip-0.036
Token She
Token ID1375
Feature activation+2.82e-03

Pos logit contributions
aces+0.005
lf+0.005
 walls+0.005
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.005
 forgive-0.004
 better-0.004
use-0.004
 agree-0.004
Token learned
Token ID4499
Feature activation-2.02e-03

Pos logit contributions
 promise+0.005
 forgive+0.004
 better+0.004
use+0.004
 agree+0.004
Neg logit contributions
aces-0.005
lf-0.005
 walls-0.005
 towers-0.005
anes-0.005
Token that
Token ID326
Feature activation+1.17e-02

Pos logit contributions
aces+0.003
lf+0.003
 towers+0.003
 walls+0.003
 floors+0.003
Neg logit contributions
 promise-0.004
 better-0.004
 forgive-0.004
 agree-0.004
 sip-0.004
Token.
Token ID13
Feature activation+8.74e-03

Pos logit contributions
lf+0.017
aces+0.017
 walls+0.016
 towers+0.016
 floors+0.016
Neg logit contributions
 promise-0.017
 forgive-0.016
 better-0.016
use-0.015
 agree-0.015
Token She
Token ID1375
Feature activation+7.93e-03

Pos logit contributions
 promise+0.016
 forgive+0.015
 better+0.015
use+0.015
 agree+0.014
Neg logit contributions
aces-0.015
lf-0.015
 walls-0.014
 towers-0.014
 floors-0.014
Token had
Token ID550
Feature activation-1.59e-02

Pos logit contributions
 promise+0.011
 forgive+0.011
 better+0.011
 agree+0.010
 sip+0.009
Neg logit contributions
keeping-0.017
anes-0.017
aces-0.016
ados-0.016
 floors-0.016
Token many
Token ID867
Feature activation-2.96e-02

Pos logit contributions
lf+0.034
umbled+0.034
anes+0.033
aces+0.033
ator+0.032
Neg logit contributions
 better-0.023
 forgive-0.020
 safer-0.018
 rinse-0.018
 promise-0.018
Token cr
Token ID1067
Feature activation-4.91e-02

Pos logit contributions
keeping+0.034
aces+0.032
lf+0.032
anes+0.031
umbled+0.031
Neg logit contributions
 promise-0.070
 better-0.068
 forgive-0.067
 agree-0.065
use-0.063
Tokenay
Token ID323
Feature activation-7.04e-02

Pos logit contributions
lf+0.102
keeping+0.102
 floors+0.100
 walls+0.093
 towers+0.093
Neg logit contributions
 forgive-0.060
 promise-0.056
 kiss-0.056
 better-0.055
of-0.054
Tokenons
Token ID684
Feature activation-1.46e-02

Pos logit contributions
keeping+0.082
 floors+0.074
lf+0.069
 walls+0.067
 ground+0.066
Neg logit contributions
 forgive-0.166
 better-0.163
 promise-0.161
bs-0.156
 agree-0.154
Token of
Token ID286
Feature activation-1.05e-02

Pos logit contributions
lf+0.031
 floors+0.030
 towers+0.028
 walls+0.028
keeping+0.028
Neg logit contributions
 better-0.022
 forgive-0.021
 promise-0.021
 agree-0.019
 kiss-0.018
Token different
Token ID1180
Feature activation-1.17e-02

Pos logit contributions
aces+0.020
lf+0.020
 walls+0.018
 floors+0.018
anes+0.018
Neg logit contributions
 promise-0.017
 forgive-0.017
 better-0.016
use-0.015
 agree-0.015
Token colors
Token ID7577
Feature activation-2.08e-02

Pos logit contributions
lf+0.014
anes+0.012
aces+0.012
keeping+0.011
umbled+0.011
Neg logit contributions
 promise-0.027
 better-0.026
use-0.026
 agree-0.026
 forgive-0.026
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
lf+0.045
keeping+0.044
 floors+0.043
aces+0.040
 towers+0.040
Neg logit contributions
 better-0.029
 forgive-0.026
 promise-0.024
 kiss-0.023
 agree-0.023
Token!"
Token ID2474
Feature activation+1.03e-02

Pos logit contributions
 promise+0.010
 better+0.009
 forgive+0.009
 agree+0.009
use+0.008
Neg logit contributions
aces-0.011
lf-0.011
 floors-0.010
anes-0.010
 walls-0.010
Token\n
Token ID198
Feature activation+1.06e-02

Pos logit contributions
 promise+0.020
 forgive+0.019
 better+0.019
 agree+0.018
use+0.018
Neg logit contributions
lf-0.017
aces-0.017
 walls-0.015
 floors-0.015
 towers-0.015
Token\n
Token ID198
Feature activation+1.45e-02

Pos logit contributions
 promise+0.020
use+0.019
 forgive+0.018
 better+0.018
 unnecessary+0.018
Neg logit contributions
lf-0.018
aces-0.017
 walls-0.016
 towers-0.016
anes-0.016
Token"
Token ID1
Feature activation-2.01e-02

Pos logit contributions
 promise+0.026
 forgive+0.024
 better+0.023
use+0.023
 sip+0.023
Neg logit contributions
lf-0.027
aces-0.025
 towers-0.025
 walls-0.025
 floors-0.024
TokenTrump
Token ID6170
Feature activation+7.24e-03

Pos logit contributions
aces+0.041
lf+0.041
 walls+0.039
 towers+0.038
 floors+0.038
Neg logit contributions
 promise-0.031
 forgive-0.029
 better-0.028
use-0.027
 agree-0.026
Tokenet
Token ID316
Feature activation-7.04e-02

Pos logit contributions
 promise+0.018
 forgive+0.017
 better+0.017
use+0.017
 agree+0.017
Neg logit contributions
aces-0.008
lf-0.008
 walls-0.007
 towers-0.007
 floors-0.007
Token!"
Token ID2474
Feature activation+4.08e-03

Pos logit contributions
lf+0.155
 floors+0.152
keeping+0.152
 walls+0.149
aces+0.148
Neg logit contributions
 promise-0.092
 better-0.081
 forgive-0.078
 agree-0.076
use-0.076
Token Tom
Token ID4186
Feature activation-2.28e-02

Pos logit contributions
 promise+0.008
 better+0.008
 forgive+0.008
 agree+0.008
 sip+0.007
Neg logit contributions
aces-0.006
lf-0.006
 floors-0.006
 walls-0.006
 towers-0.006
Token said
Token ID531
Feature activation-7.67e-04

Pos logit contributions
lf+0.046
aces+0.045
anes+0.044
keeping+0.043
 towers+0.043
Neg logit contributions
 promise-0.035
 better-0.033
 agree-0.033
 forgive-0.033
 sip-0.030
Token,
Token ID11
Feature activation-8.44e-03

Pos logit contributions
aces+0.001
lf+0.001
 walls+0.001
 towers+0.001
 floors+0.001
Neg logit contributions
 promise-0.001
 forgive-0.001
use-0.001
 better-0.001
 agree-0.001
Token swinging
Token ID25635
Feature activation-4.62e-03

Pos logit contributions
aces+0.014
anes+0.013
keeping+0.013
lf+0.013
umbled+0.012
Neg logit contributions
 promise-0.016
 better-0.016
 forgive-0.016
 agree-0.015
 sip-0.015
Token like
Token ID588
Feature activation+1.57e-03

Pos logit contributions
aces+0.037
lf+0.035
 walls+0.034
 towers+0.034
 floors+0.033
Neg logit contributions
 promise-0.033
 forgive-0.030
 better-0.030
 unnecessary-0.029
use-0.028
Token to
Token ID284
Feature activation-1.11e-02

Pos logit contributions
 forgive+0.003
 better+0.003
 promise+0.003
 agree+0.003
 kiss+0.003
Neg logit contributions
lf-0.003
aces-0.003
 floors-0.003
 walls-0.003
ator-0.003
Token write
Token ID3551
Feature activation+9.53e-03

Pos logit contributions
aces+0.026
lf+0.026
 walls+0.025
 towers+0.025
 floors+0.025
Neg logit contributions
 promise-0.013
 forgive-0.012
 better-0.012
use-0.011
 agree-0.011
Token with
Token ID351
Feature activation+3.38e-03

Pos logit contributions
 promise+0.020
use+0.018
 forgive+0.018
 sip+0.018
 better+0.017
Neg logit contributions
aces-0.015
lf-0.015
 walls-0.014
 towers-0.014
 floors-0.013
Token cr
Token ID1067
Feature activation-3.92e-02

Pos logit contributions
 promise+0.006
 forgive+0.006
 better+0.006
use+0.006
 agree+0.006
Neg logit contributions
aces-0.006
lf-0.006
 walls-0.005
 towers-0.005
 floors-0.005
Tokenay
Token ID323
Feature activation-7.00e-02

Pos logit contributions
lf+0.077
keeping+0.077
 floors+0.075
 walls+0.071
 towers+0.070
Neg logit contributions
 forgive-0.050
 kiss-0.049
 promise-0.048
of-0.048
 better-0.048
Tokenons
Token ID684
Feature activation-7.56e-03

Pos logit contributions
keeping+0.067
lf+0.058
 floors+0.057
aces+0.055
 walls+0.053
Neg logit contributions
 better-0.174
 forgive-0.174
 promise-0.174
use-0.168
 agree-0.165
Token.
Token ID13
Feature activation+8.81e-03

Pos logit contributions
lf+0.013
aces+0.013
 walls+0.013
 towers+0.012
 floors+0.012
Neg logit contributions
 promise-0.013
 forgive-0.013
 better-0.013
use-0.012
 agree-0.012
Token They
Token ID1119
Feature activation-2.32e-03

Pos logit contributions
 promise+0.018
use+0.016
 unnecessary+0.016
 forgive+0.016
 better+0.016
Neg logit contributions
 walls-0.014
aces-0.014
 towers-0.013
lf-0.013
 floors-0.013
Token write
Token ID3551
Feature activation+1.98e-03

Pos logit contributions
aces+0.005
lf+0.005
 walls+0.005
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
use-0.003
 agree-0.003
Token on
Token ID319
Feature activation-1.72e-02

Pos logit contributions
 promise+0.004
 forgive+0.004
 better+0.004
 agree+0.004
use+0.004
Neg logit contributions
aces-0.003
lf-0.003
 walls-0.003
 floors-0.003
keeping-0.003
Token.
Token ID13
Feature activation+5.13e-03

Pos logit contributions
lf+0.017
aces+0.017
keeping+0.016
 floors+0.015
 walls+0.015
Neg logit contributions
 promise-0.018
 better-0.018
 forgive-0.018
 agree-0.016
use-0.016
Token She
Token ID1375
Feature activation+8.38e-03

Pos logit contributions
 promise+0.009
 forgive+0.009
 better+0.009
use+0.009
 agree+0.008
Neg logit contributions
aces-0.009
lf-0.009
 walls-0.008
 towers-0.008
 floors-0.008
Token had
Token ID550
Feature activation-1.59e-02

Pos logit contributions
 promise+0.013
 forgive+0.013
 better+0.013
 agree+0.012
use+0.012
Neg logit contributions
aces-0.016
keeping-0.016
anes-0.016
lf-0.016
 towers-0.016
Token many
Token ID867
Feature activation-2.80e-02

Pos logit contributions
lf+0.034
ator+0.033
anes+0.033
umbled+0.032
aces+0.031
Neg logit contributions
 better-0.023
 forgive-0.021
 rinse-0.019
 promise-0.019
 safer-0.019
Token cr
Token ID1067
Feature activation-4.81e-02

Pos logit contributions
keeping+0.037
lf+0.036
anes+0.035
aces+0.035
umbled+0.034
Neg logit contributions
 promise-0.062
 better-0.060
 forgive-0.059
 agree-0.056
 sip-0.055
Tokenay
Token ID323
Feature activation-6.97e-02

Pos logit contributions
keeping+0.103
lf+0.103
 floors+0.100
 walls+0.094
 towers+0.093
Neg logit contributions
 forgive-0.056
 kiss-0.053
of-0.051
 promise-0.051
arl-0.050
Tokenons
Token ID684
Feature activation-1.41e-02

Pos logit contributions
keeping+0.096
 floors+0.080
lf+0.075
 ground+0.073
 walls+0.072
Neg logit contributions
 forgive-0.154
 better-0.152
 promise-0.148
bs-0.145
 agree-0.142
Token of
Token ID286
Feature activation-9.71e-03

Pos logit contributions
lf+0.027
 floors+0.025
 towers+0.024
keeping+0.024
 walls+0.024
Neg logit contributions
 better-0.024
 promise-0.023
 forgive-0.023
 kiss-0.020
 agree-0.020
Token different
Token ID1180
Feature activation-8.30e-03

Pos logit contributions
aces+0.017
lf+0.017
 walls+0.016
 floors+0.016
anes+0.016
Neg logit contributions
 promise-0.017
 forgive-0.016
 better-0.016
use-0.015
 agree-0.015
Token colors
Token ID7577
Feature activation-2.04e-02

Pos logit contributions
lf+0.010
anes+0.009
aces+0.008
keeping+0.008
zy+0.008
Neg logit contributions
 promise-0.019
 better-0.018
 forgive-0.018
 agree-0.018
use-0.018
Token that
Token ID326
Feature activation+8.02e-03

Pos logit contributions
lf+0.042
keeping+0.040
 floors+0.040
aces+0.037
 towers+0.037
Neg logit contributions
 better-0.031
 forgive-0.028
 promise-0.027
 kiss-0.025
 sip-0.025
Token pony
Token ID26902
Feature activation-1.34e-02

Pos logit contributions
aces+0.039
lf+0.039
 walls+0.036
 towers+0.036
 floors+0.035
Neg logit contributions
 promise-0.059
 forgive-0.056
 better-0.056
use-0.054
 agree-0.053
Token and
Token ID290
Feature activation-9.78e-03

Pos logit contributions
lf+0.024
aces+0.024
 floors+0.023
 towers+0.023
 walls+0.023
Neg logit contributions
 promise-0.023
 better-0.022
 forgive-0.022
 agree-0.021
use-0.020
Token the
Token ID262
Feature activation+2.09e-03

Pos logit contributions
aces+0.014
lf+0.014
 walls+0.013
 towers+0.013
 floors+0.013
Neg logit contributions
 promise-0.020
 forgive-0.020
use-0.019
 better-0.019
 agree-0.019
Token 3
Token ID513
Feature activation-4.11e-02

Pos logit contributions
 promise+0.004
use+0.003
 forgive+0.003
 agree+0.003
 sip+0.003
Neg logit contributions
lf-0.004
 walls-0.004
aces-0.004
 floors-0.004
 towers-0.003
Token-
Token ID12
Feature activation-5.50e-02

Pos logit contributions
aces+0.067
lf+0.063
 towers+0.062
 walls+0.062
anes+0.062
Neg logit contributions
 promise-0.080
 better-0.074
 forgive-0.073
 agree-0.072
use-0.069
Tokenyear
Token ID1941
Feature activation-6.97e-02

Pos logit contributions
lf+0.100
aces+0.098
 walls+0.095
 floors+0.093
 towers+0.093
Neg logit contributions
 promise-0.095
 forgive-0.093
 better-0.091
use-0.087
 agree-0.086
Token old
Token ID1468
Feature activation-1.46e-02

Pos logit contributions
lf+0.133
aces+0.123
anes+0.119
keeping+0.118
 walls+0.117
Neg logit contributions
 promise-0.123
 better-0.111
 agree-0.107
 forgive-0.105
 accept-0.098
Token girl
Token ID2576
Feature activation-1.86e-04

Pos logit contributions
aces+0.028
lf+0.028
 walls+0.027
 towers+0.026
 floors+0.026
Neg logit contributions
 promise-0.024
 forgive-0.022
 better-0.022
use-0.021
 agree-0.021
Token played
Token ID2826
Feature activation-2.93e-03

Pos logit contributions
lf+0.000
aces+0.000
 walls+0.000
 towers+0.000
 floors+0.000
Neg logit contributions
 promise-0.000
 forgive-0.000
 better-0.000
 agree-0.000
use-0.000
Token together
Token ID1978
Feature activation-8.98e-04

Pos logit contributions
aces+0.006
lf+0.006
 walls+0.005
 floors+0.005
 towers+0.005
Neg logit contributions
 promise-0.005
 forgive-0.005
 better-0.005
 agree-0.004
use-0.004
Token all
Token ID477
Feature activation+9.74e-04

Pos logit contributions
aces+0.001
lf+0.001
anes+0.001
 floors+0.001
keeping+0.001
Neg logit contributions
 promise-0.002
 better-0.002
 forgive-0.002
 agree-0.002
 sip-0.002
Token He
Token ID679
Feature activation-1.21e-02

Pos logit contributions
lf+0.020
aces+0.019
 towers+0.017
pie+0.017
ped+0.016
Neg logit contributions
 forgive-0.020
 promise-0.020
 better-0.020
 agree-0.019
 accept-0.019
Token had
Token ID550
Feature activation-2.02e-02

Pos logit contributions
aces+0.022
lf+0.021
 towers+0.021
keeping+0.020
 floors+0.020
Neg logit contributions
 promise-0.021
 forgive-0.019
 better-0.019
use-0.019
 agree-0.019
Token all
Token ID477
Feature activation-2.54e-02

Pos logit contributions
aces+0.042
ados+0.040
lf+0.039
ator+0.039
anes+0.037
Neg logit contributions
 better-0.031
 forgive-0.024
 notice-0.024
 kiss-0.023
 promise-0.023
Token his
Token ID465
Feature activation-3.41e-02

Pos logit contributions
aces+0.050
lf+0.048
 walls+0.046
 towers+0.046
 floors+0.045
Neg logit contributions
 promise-0.041
 better-0.040
 forgive-0.036
 kiss-0.036
 agree-0.034
Token cr
Token ID1067
Feature activation-4.95e-02

Pos logit contributions
lf+0.062
aces+0.061
keeping+0.058
anes+0.057
 towers+0.056
Neg logit contributions
 promise-0.063
 better-0.057
 forgive-0.055
 kiss-0.051
use-0.051
Tokenay
Token ID323
Feature activation-6.97e-02

Pos logit contributions
keeping+0.106
lf+0.105
 floors+0.105
 towers+0.104
 walls+0.100
Neg logit contributions
 forgive-0.052
 promise-0.051
 kiss-0.050
of-0.049
ench-0.046
Tokenons
Token ID684
Feature activation-2.05e-02

Pos logit contributions
keeping+0.094
 floors+0.091
 ground+0.082
 walls+0.082
 stairs+0.079
Neg logit contributions
 promise-0.146
 forgive-0.145
 better-0.141
bs-0.138
use-0.131
Token out
Token ID503
Feature activation-1.99e-02

Pos logit contributions
lf+0.041
 floors+0.039
 towers+0.038
aces+0.038
 walls+0.037
Neg logit contributions
 better-0.032
 promise-0.030
 forgive-0.029
 agree-0.026
 kiss-0.025
Token and
Token ID290
Feature activation-5.12e-03

Pos logit contributions
aces+0.035
lf+0.034
 floors+0.032
 walls+0.032
 towers+0.032
Neg logit contributions
 promise-0.035
 better-0.034
 forgive-0.034
use-0.032
 agree-0.031
Token was
Token ID373
Feature activation-3.55e-03

Pos logit contributions
aces+0.009
anes+0.009
umbled+0.008
lf+0.008
ator+0.008
Neg logit contributions
 promise-0.009
 better-0.009
 forgive-0.009
 kiss-0.008
 accept-0.008
Token spreading
Token ID14342
Feature activation+7.73e-03

Pos logit contributions
aces+0.007
lf+0.006
anes+0.006
keeping+0.006
umbled+0.006
Neg logit contributions
 better-0.006
 promise-0.006
 forgive-0.006
 happier-0.005
 rinse-0.005
Token.
Token ID13
Feature activation-7.95e-03

Pos logit contributions
aces+0.032
lf+0.031
 floors+0.030
 towers+0.029
keeping+0.029
Neg logit contributions
 promise-0.031
 forgive-0.030
 better-0.029
 sip-0.027
 agree-0.027
Token She
Token ID1375
Feature activation-1.18e-03

Pos logit contributions
lf+0.013
aces+0.013
 floors+0.012
 walls+0.012
 towers+0.012
Neg logit contributions
 promise-0.015
 forgive-0.014
 better-0.014
 sip-0.013
use-0.013
Token was
Token ID373
Feature activation-1.49e-02

Pos logit contributions
aces+0.003
anes+0.003
keeping+0.003
 towers+0.002
ados+0.002
Neg logit contributions
 promise-0.002
 forgive-0.002
 better-0.002
 sip-0.001
use-0.001
Token three
Token ID1115
Feature activation-2.39e-02

Pos logit contributions
lf+0.036
aces+0.034
keeping+0.034
anes+0.033
cles+0.033
Neg logit contributions
 better-0.020
 promise-0.018
 forgive-0.017
 safer-0.016
 rinse-0.015
Token-
Token ID12
Feature activation-5.34e-02

Pos logit contributions
aces+0.038
keeping+0.037
lf+0.037
anes+0.036
 walls+0.034
Neg logit contributions
 promise-0.047
 better-0.043
 forgive-0.042
 agree-0.041
 sip-0.039
Tokenyears
Token ID19002
Feature activation-6.95e-02

Pos logit contributions
lf+0.091
aces+0.089
 walls+0.088
 ground+0.084
 floors+0.083
Neg logit contributions
 forgive-0.097
 promise-0.092
 better-0.092
use-0.087
bs-0.087
Token old
Token ID1468
Feature activation-1.40e-02

Pos logit contributions
lf+0.134
keeping+0.125
 walls+0.123
 towers+0.121
aces+0.121
Neg logit contributions
 promise-0.110
 better-0.101
 forgive-0.096
 agree-0.096
 cal-0.088
Token and
Token ID290
Feature activation-2.05e-03

Pos logit contributions
lf+0.024
aces+0.024
anes+0.022
 walls+0.022
 towers+0.022
Neg logit contributions
 promise-0.025
 better-0.024
 forgive-0.024
 agree-0.023
use-0.022
Token she
Token ID673
Feature activation-2.52e-03

Pos logit contributions
aces+0.004
lf+0.004
anes+0.004
 towers+0.004
 floors+0.004
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
 agree-0.003
 rinse-0.003
Token loved
Token ID6151
Feature activation-7.74e-03

Pos logit contributions
anes+0.006
aces+0.005
 towers+0.005
 floors+0.005
 walls+0.005
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
 agree-0.003
 accept-0.003
Token to
Token ID284
Feature activation-2.39e-02

Pos logit contributions
aces+0.016
lf+0.015
umbled+0.015
ator+0.015
anes+0.015
Neg logit contributions
 better-0.012
 forgive-0.011
 promise-0.010
 agree-0.010
 kiss-0.010
Token real
Token ID1103
Feature activation-5.46e-03

Pos logit contributions
 promise+0.000
 forgive+0.000
use+0.000
 better+0.000
 agree+0.000
Neg logit contributions
aces-0.000
lf-0.000
 walls-0.000
 towers-0.000
 floors-0.000
Token that
Token ID326
Feature activation-2.65e-03

Pos logit contributions
lf+0.011
aces+0.010
 floors+0.010
anes+0.010
keeping+0.010
Neg logit contributions
 better-0.009
 promise-0.008
 forgive-0.008
 agree-0.008
 safer-0.007
Token the
Token ID262
Feature activation-5.97e-03

Pos logit contributions
aces+0.004
lf+0.004
 floors+0.004
keeping+0.004
anes+0.004
Neg logit contributions
 better-0.005
 promise-0.005
 forgive-0.005
 agree-0.005
 accept-0.005
Token 3
Token ID513
Feature activation-3.98e-02

Pos logit contributions
aces+0.010
lf+0.010
 towers+0.009
anes+0.009
 walls+0.009
Neg logit contributions
 promise-0.012
 forgive-0.011
 better-0.011
use-0.010
 agree-0.010
Token-
Token ID12
Feature activation-5.33e-02

Pos logit contributions
aces+0.077
anes+0.073
ados+0.068
 bricks+0.068
keeping+0.068
Neg logit contributions
 promise-0.068
 agree-0.060
 better-0.057
 forgive-0.056
ackers-0.053
Tokenyear
Token ID1941
Feature activation-6.93e-02

Pos logit contributions
lf+0.103
 walls+0.102
aces+0.102
 floors+0.101
 towers+0.099
Neg logit contributions
 forgive-0.083
 better-0.080
 promise-0.079
use-0.074
 agree-0.073
Token-
Token ID12
Feature activation-5.39e-02

Pos logit contributions
lf+0.139
anes+0.134
aces+0.134
keeping+0.128
 bricks+0.127
Neg logit contributions
 promise-0.101
 agree-0.086
 better-0.086
 forgive-0.078
 cal-0.076
Tokenold
Token ID727
Feature activation+6.13e-03

Pos logit contributions
lf+0.127
 walls+0.126
 floors+0.126
aces+0.125
 towers+0.122
Neg logit contributions
 forgive-0.061
 promise-0.056
 better-0.056
 agree-0.053
use-0.051
Token boy
Token ID2933
Feature activation+5.54e-03

Pos logit contributions
 promise+0.011
 better+0.011
 forgive+0.011
 agree+0.010
use+0.010
Neg logit contributions
aces-0.010
lf-0.010
 walls-0.010
 towers-0.010
 floors-0.010
Token could
Token ID714
Feature activation-1.26e-02

Pos logit contributions
 promise+0.010
 forgive+0.009
 better+0.009
 agree+0.009
use+0.009
Neg logit contributions
lf-0.010
aces-0.010
 towers-0.009
 floors-0.009
 walls-0.009
Token feel
Token ID1254
Feature activation+1.49e-02

Pos logit contributions
aces+0.027
lf+0.027
 floors+0.025
 walls+0.025
 towers+0.025
Neg logit contributions
 promise-0.018
 forgive-0.017
 better-0.017
 agree-0.016
use-0.016
Token knew
Token ID2993
Feature activation-4.61e-03

Pos logit contributions
 promise+0.005
 forgive+0.005
 better+0.005
use+0.005
 agree+0.005
Neg logit contributions
aces-0.006
lf-0.006
 walls-0.006
 towers-0.006
 floors-0.006
Token that
Token ID326
Feature activation+1.50e-02

Pos logit contributions
aces+0.008
lf+0.008
 towers+0.007
 walls+0.007
 floors+0.007
Neg logit contributions
 promise-0.009
 forgive-0.009
 better-0.008
use-0.008
 agree-0.008
Token the
Token ID262
Feature activation-4.99e-03

Pos logit contributions
 promise+0.026
 better+0.025
 forgive+0.025
 agree+0.023
 rinse+0.023
Neg logit contributions
aces-0.028
lf-0.027
anes-0.025
 towers-0.025
 walls-0.025
Token f
Token ID277
Feature activation-1.37e-02

Pos logit contributions
aces+0.008
lf+0.008
 walls+0.008
 towers+0.007
 floors+0.007
Neg logit contributions
 promise-0.010
 forgive-0.009
 better-0.009
use-0.009
 agree-0.009
Tokenauc
Token ID14272
Feature activation-4.12e-03

Pos logit contributions
lf+0.028
aces+0.025
 walls+0.025
 towers+0.025
keeping+0.024
Neg logit contributions
 promise-0.021
 forgive-0.021
 better-0.020
use-0.019
 rinse-0.019
Tokenet
Token ID316
Feature activation-6.92e-02

Pos logit contributions
lf+0.005
aces+0.005
 walls+0.004
anes+0.004
 towers+0.004
Neg logit contributions
 promise-0.010
 forgive-0.009
 better-0.009
use-0.009
 agree-0.009
Token was
Token ID373
Feature activation+7.89e-03

Pos logit contributions
lf+0.129
aces+0.129
 towers+0.126
keeping+0.122
 floors+0.122
Neg logit contributions
 promise-0.117
 better-0.109
 forgive-0.108
use-0.099
 accept-0.097
Token really
Token ID1107
Feature activation-1.77e-03

Pos logit contributions
 promise+0.015
 better+0.014
 forgive+0.014
use+0.013
 agree+0.013
Neg logit contributions
aces-0.013
lf-0.013
 walls-0.012
 towers-0.012
 floors-0.012
Token good
Token ID922
Feature activation-3.25e-03

Pos logit contributions
aces+0.003
lf+0.003
 walls+0.003
 towers+0.003
 floors+0.003
Neg logit contributions
 promise-0.003
 forgive-0.003
use-0.003
 agree-0.003
 sip-0.003
Token,
Token ID11
Feature activation-1.17e-02

Pos logit contributions
lf+0.007
aces+0.006
anes+0.006
 towers+0.006
 walls+0.006
Neg logit contributions
 promise-0.005
 better-0.005
 forgive-0.005
 agree-0.005
 accept-0.004
Token and
Token ID290
Feature activation+1.50e-03

Pos logit contributions
lf+0.020
aces+0.020
anes+0.018
 floors+0.018
 towers+0.018
Neg logit contributions
 promise-0.022
 better-0.021
 forgive-0.021
 agree-0.019
 rinse-0.019
Token was
Token ID373
Feature activation+4.65e-03

Pos logit contributions
aces+0.008
lf+0.008
 floors+0.008
 towers+0.008
 walls+0.008
Neg logit contributions
 promise-0.007
 better-0.007
 forgive-0.007
 agree-0.007
 accept-0.006
Token known
Token ID1900
Feature activation+3.35e-04

Pos logit contributions
 promise+0.009
 forgive+0.008
 better+0.008
use+0.008
 agree+0.008
Neg logit contributions
aces-0.008
lf-0.008
 walls-0.007
 towers-0.007
 floors-0.007
Token as
Token ID355
Feature activation-6.97e-03

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
use+0.001
 agree+0.001
Neg logit contributions
aces-0.001
lf-0.001
 walls-0.001
 towers-0.001
 floors-0.001
Token the
Token ID262
Feature activation-3.81e-03

Pos logit contributions
aces+0.011
lf+0.011
 walls+0.010
 towers+0.010
 floors+0.010
Neg logit contributions
 promise-0.014
 forgive-0.013
 better-0.013
use-0.012
 agree-0.012
Token cle
Token ID1190
Feature activation-1.97e-02

Pos logit contributions
aces+0.007
lf+0.006
 walls+0.006
 towers+0.006
 floors+0.006
Neg logit contributions
 promise-0.007
 forgive-0.006
use-0.006
 sip-0.006
 agree-0.006
Tokenve
Token ID303
Feature activation-6.90e-02

Pos logit contributions
lf+0.034
aces+0.033
 walls+0.033
 towers+0.033
 floors+0.032
Neg logit contributions
 promise-0.037
 forgive-0.035
 better-0.034
 agree-0.033
use-0.033
Tokenrest
Token ID2118
Feature activation-2.87e-02

Pos logit contributions
 towers+0.091
 walls+0.088
lf+0.088
aces+0.088
 floors+0.087
Neg logit contributions
 better-0.151
 promise-0.148
 forgive-0.145
 accept-0.139
use-0.138
Token mouse
Token ID10211
Feature activation-3.94e-03

Pos logit contributions
lf+0.053
aces+0.052
 floors+0.049
 towers+0.049
 walls+0.048
Neg logit contributions
 promise-0.049
 forgive-0.048
 better-0.048
use-0.045
 agree-0.043
Token in
Token ID287
Feature activation-7.89e-03

Pos logit contributions
lf+0.007
aces+0.007
 floors+0.007
 walls+0.007
 towers+0.007
Neg logit contributions
 promise-0.007
 forgive-0.007
 better-0.007
 agree-0.006
use-0.006
Token the
Token ID262
Feature activation+3.52e-03

Pos logit contributions
aces+0.013
lf+0.013
 walls+0.012
 towers+0.012
 floors+0.011
Neg logit contributions
 promise-0.015
 forgive-0.015
 better-0.014
use-0.014
 agree-0.014
Token whole
Token ID2187
Feature activation-1.07e-02

Pos logit contributions
 promise+0.007
 forgive+0.007
 agree+0.007
 rinse+0.007
 sip+0.007
Neg logit contributions
 walls-0.005
lf-0.005
aces-0.004
 towers-0.004
 floors-0.004
Token was
Token ID373
Feature activation+6.31e-03

Pos logit contributions
aces+0.066
keeping+0.066
lf+0.065
 towers+0.062
ados+0.062
Neg logit contributions
 better-0.057
 promise-0.056
 forgive-0.054
 agree-0.050
 accept-0.047
Token known
Token ID1900
Feature activation+4.67e-03

Pos logit contributions
 promise+0.012
 forgive+0.012
 better+0.011
use+0.011
 agree+0.011
Neg logit contributions
aces-0.010
lf-0.010
 walls-0.009
 towers-0.009
 floors-0.009
Token as
Token ID355
Feature activation-1.18e-02

Pos logit contributions
 promise+0.008
 forgive+0.008
use+0.008
 better+0.008
 sip+0.008
Neg logit contributions
aces-0.008
lf-0.008
 walls-0.008
 towers-0.008
 floors-0.007
Token the
Token ID262
Feature activation+1.98e-04

Pos logit contributions
aces+0.019
lf+0.019
 walls+0.018
 towers+0.017
 floors+0.017
Neg logit contributions
 promise-0.023
 forgive-0.022
 better-0.022
use-0.021
 agree-0.021
Token cle
Token ID1190
Feature activation-1.83e-02

Pos logit contributions
 promise+0.000
 forgive+0.000
use+0.000
 sip+0.000
 better+0.000
Neg logit contributions
aces-0.000
lf-0.000
 walls-0.000
 towers-0.000
 floors-0.000
Tokenve
Token ID303
Feature activation-6.89e-02

Pos logit contributions
lf+0.031
aces+0.031
 walls+0.029
 towers+0.029
 floors+0.028
Neg logit contributions
 promise-0.035
 forgive-0.032
 better-0.032
 agree-0.031
use-0.031
Tokenrest
Token ID2118
Feature activation-2.82e-02

Pos logit contributions
lf+0.090
 towers+0.089
aces+0.087
 walls+0.086
 floors+0.082
Neg logit contributions
 better-0.151
 promise-0.148
 forgive-0.142
 accept-0.137
 unnecessary-0.136
Token engine
Token ID3113
Feature activation-3.42e-02

Pos logit contributions
lf+0.051
aces+0.050
 towers+0.045
keeping+0.045
 walls+0.045
Neg logit contributions
 promise-0.049
 better-0.048
 forgive-0.048
use-0.043
 agree-0.043
Token in
Token ID287
Feature activation-1.48e-02

Pos logit contributions
aces+0.058
lf+0.058
 floors+0.055
keeping+0.055
 towers+0.054
Neg logit contributions
 better-0.062
 promise-0.061
 forgive-0.060
 agree-0.055
 accept-0.054
Token all
Token ID477
Feature activation-9.24e-03

Pos logit contributions
aces+0.023
lf+0.023
 walls+0.021
 towers+0.021
 floors+0.021
Neg logit contributions
 promise-0.030
 forgive-0.028
 better-0.028
use-0.027
 agree-0.027
Token the
Token ID262
Feature activation+1.42e-03

Pos logit contributions
aces+0.017
lf+0.016
 walls+0.015
 floors+0.015
 towers+0.015
Neg logit contributions
 promise-0.017
 better-0.016
 forgive-0.016
 agree-0.015
use-0.015
Token.
Token ID13
Feature activation+8.08e-03

Pos logit contributions
 promise+0.018
 better+0.017
 forgive+0.017
 agree+0.016
 sip+0.015
Neg logit contributions
lf-0.019
aces-0.019
 floors-0.018
 walls-0.018
keeping-0.018
Token She
Token ID1375
Feature activation+7.99e-03

Pos logit contributions
 promise+0.016
 forgive+0.015
 better+0.015
use+0.014
 agree+0.014
Neg logit contributions
aces-0.013
lf-0.013
 walls-0.012
 towers-0.012
 floors-0.012
Token had
Token ID550
Feature activation-1.38e-02

Pos logit contributions
 promise+0.011
 better+0.010
 forgive+0.010
 agree+0.010
 rinse+0.009
Neg logit contributions
keeping-0.017
anes-0.017
ados-0.017
aces-0.017
umbled-0.016
Token many
Token ID867
Feature activation-2.73e-02

Pos logit contributions
lf+0.031
anes+0.030
umbled+0.030
ator+0.030
aces+0.029
Neg logit contributions
 better-0.019
 forgive-0.016
 rinse-0.016
 promise-0.015
 safer-0.015
Token cr
Token ID1067
Feature activation-4.83e-02

Pos logit contributions
keeping+0.036
anes+0.035
aces+0.034
umbled+0.034
lf+0.033
Neg logit contributions
 promise-0.061
 better-0.059
 forgive-0.056
 agree-0.055
 safer-0.053
Tokenay
Token ID323
Feature activation-6.89e-02

Pos logit contributions
lf+0.104
keeping+0.103
 floors+0.102
 towers+0.096
 walls+0.096
Neg logit contributions
 forgive-0.050
ack-0.049
of-0.049
ackers-0.048
 kiss-0.048
Tokenons
Token ID684
Feature activation-1.31e-02

Pos logit contributions
keeping+0.102
 floors+0.084
 ground+0.080
stack+0.079
jack+0.078
Neg logit contributions
 forgive-0.139
 better-0.137
bs-0.136
 promise-0.133
 agree-0.129
Token and
Token ID290
Feature activation-1.51e-02

Pos logit contributions
lf+0.028
 floors+0.027
 towers+0.026
umbled+0.026
keeping+0.026
Neg logit contributions
 better-0.019
 promise-0.018
 forgive-0.017
 agree-0.016
bs-0.015
Token papers
Token ID9473
Feature activation-2.41e-02

Pos logit contributions
lf+0.019
aces+0.019
anes+0.019
umbled+0.018
keeping+0.018
Neg logit contributions
 promise-0.033
 better-0.032
 forgive-0.032
 rinse-0.030
 agree-0.030
Token.
Token ID13
Feature activation+1.10e-02

Pos logit contributions
lf+0.050
keeping+0.047
umbled+0.047
 floors+0.047
aces+0.047
Neg logit contributions
 better-0.035
 promise-0.034
 forgive-0.033
 agree-0.031
use-0.029
Token She
Token ID1375
Feature activation+1.04e-02

Pos logit contributions
 promise+0.023
 forgive+0.021
 better+0.021
use+0.021
 unnecessary+0.020
Neg logit contributions
aces-0.017
lf-0.016
 walls-0.016
 towers-0.016
 floors-0.015
Token it
Token ID340
Feature activation+6.67e-03

Pos logit contributions
 better+0.031
 promise+0.030
 forgive+0.029
use+0.028
 sip+0.027
Neg logit contributions
aces-0.035
lf-0.034
anes-0.032
 floors-0.031
 walls-0.031
Token back
Token ID736
Feature activation+6.66e-03

Pos logit contributions
 promise+0.012
 forgive+0.011
 better+0.011
use+0.010
 agree+0.010
Neg logit contributions
lf-0.012
aces-0.012
 walls-0.011
 floors-0.011
 towers-0.011
Token to
Token ID284
Feature activation+2.01e-02

Pos logit contributions
 promise+0.012
 forgive+0.011
 better+0.011
use+0.011
 agree+0.010
Neg logit contributions
aces-0.012
lf-0.012
 walls-0.011
 towers-0.011
 floors-0.011
Token me
Token ID502
Feature activation+2.53e-02

Pos logit contributions
 promise+0.038
 forgive+0.035
 better+0.035
use+0.034
 agree+0.033
Neg logit contributions
lf-0.034
aces-0.033
 walls-0.032
 towers-0.031
 floors-0.030
Token.
Token ID13
Feature activation+1.25e-03

Pos logit contributions
 promise+0.044
 forgive+0.042
 better+0.041
use+0.040
 sip+0.039
Neg logit contributions
aces-0.046
lf-0.045
 walls-0.042
 towers-0.042
 floors-0.041
Token I
Token ID314
Feature activation+3.37e-02

Pos logit contributions
 promise+0.002
 forgive+0.002
 better+0.002
use+0.002
 agree+0.002
Neg logit contributions
aces-0.002
lf-0.002
 walls-0.002
 towers-0.002
 floors-0.002
Token want
Token ID765
Feature activation-4.00e-03

Pos logit contributions
 unnecessary+0.037
 louder+0.036
 unison+0.035
 better+0.034
 cal+0.033
Neg logit contributions
aces-0.067
lf-0.062
 walls-0.061
 himself-0.061
 stacked-0.061
Token to
Token ID284
Feature activation+2.89e-03

Pos logit contributions
aces+0.007
lf+0.007
 floors+0.006
anes+0.006
keeping+0.006
Neg logit contributions
 promise-0.007
 better-0.007
 forgive-0.007
 sip-0.007
use-0.007
Token throw
Token ID3714
Feature activation+1.48e-02

Pos logit contributions
 promise+0.004
 forgive+0.003
 better+0.003
use+0.003
 agree+0.003
Neg logit contributions
aces-0.007
lf-0.007
 walls-0.006
 towers-0.006
 floors-0.006
Token it
Token ID340
Feature activation+2.49e-03

Pos logit contributions
 promise+0.031
 forgive+0.030
use+0.029
 better+0.029
 agree+0.028
Neg logit contributions
aces-0.021
lf-0.021
 walls-0.020
 towers-0.020
 floors-0.019
Token in
Token ID287
Feature activation+7.39e-03

Pos logit contributions
 promise+0.004
 better+0.004
 forgive+0.004
use+0.004
 agree+0.004
Neg logit contributions
lf-0.005
aces-0.005
 walls-0.004
 floors-0.004
 towers-0.004
Token to
Token ID284
Feature activation+1.45e-02

Pos logit contributions
 promise+0.005
 better+0.005
 forgive+0.005
 agree+0.004
 accept+0.004
Neg logit contributions
aces-0.006
lf-0.006
anes-0.006
 floors-0.006
umbled-0.006
Token be
Token ID307
Feature activation+2.60e-02

Pos logit contributions
 promise+0.020
 forgive+0.018
 better+0.018
use+0.017
 agree+0.017
Neg logit contributions
aces-0.032
lf-0.032
 walls-0.030
 towers-0.030
 floors-0.030
Token careful
Token ID8161
Feature activation-3.16e-03

Pos logit contributions
 promise+0.045
use+0.044
 sip+0.043
 forgive+0.042
 agree+0.041
Neg logit contributions
aces-0.044
 walls-0.043
lf-0.043
 floors-0.043
 towers-0.042
Token.
Token ID13
Feature activation+4.65e-03

Pos logit contributions
anes+0.007
aces+0.007
 towers+0.007
lf+0.007
 sculptures+0.007
Neg logit contributions
 better-0.004
 promise-0.004
 forgive-0.004
 agree-0.004
 louder-0.003
Token "
Token ID366
Feature activation-8.56e-03

Pos logit contributions
 promise+0.007
 forgive+0.007
 better+0.006
use+0.006
 agree+0.006
Neg logit contributions
aces-0.010
lf-0.010
 walls-0.009
 towers-0.009
 floors-0.009
TokenI
Token ID40
Feature activation+3.61e-02

Pos logit contributions
aces+0.017
lf+0.017
 walls+0.016
 towers+0.016
 floors+0.015
Neg logit contributions
 promise-0.014
 forgive-0.013
 better-0.013
use-0.012
 agree-0.012
Token will
Token ID481
Feature activation-3.04e-03

Pos logit contributions
 unnecessary+0.043
 unison+0.042
 louder+0.040
 conclusions+0.039
 happier+0.038
Neg logit contributions
lf-0.065
aces-0.063
 walls-0.063
 hardly-0.062
zy-0.061
Token explain
Token ID4727
Feature activation+8.92e-03

Pos logit contributions
lf+0.006
aces+0.006
 walls+0.006
 towers+0.006
 floors+0.006
Neg logit contributions
 promise-0.004
 forgive-0.004
 better-0.004
use-0.004
 agree-0.004
Token why
Token ID1521
Feature activation+4.17e-02

Pos logit contributions
 promise+0.018
 forgive+0.017
 better+0.017
use+0.016
 agree+0.016
Neg logit contributions
aces-0.014
lf-0.014
 walls-0.013
 towers-0.013
 floors-0.013
Token later
Token ID1568
Feature activation+5.50e-03

Pos logit contributions
 promise+0.070
 forgive+0.065
 better+0.065
use+0.061
 agree+0.060
Neg logit contributions
aces-0.080
lf-0.078
 walls-0.074
 towers-0.073
 floors-0.073
Token,"
Token ID553
Feature activation+7.71e-03

Pos logit contributions
 promise+0.009
 better+0.008
 forgive+0.008
use+0.008
 agree+0.008
Neg logit contributions
aces-0.011
lf-0.011
 walls-0.010
 floors-0.010
 towers-0.010
Token time
Token ID640
Feature activation+3.22e-03

Pos logit contributions
 promise+0.014
 forgive+0.014
 better+0.014
use+0.014
 unnecessary+0.013
Neg logit contributions
lf-0.011
aces-0.011
 walls-0.010
 towers-0.009
zy-0.009
Token,
Token ID11
Feature activation-1.96e-02

Pos logit contributions
 promise+0.004
 forgive+0.004
 better+0.004
use+0.004
 agree+0.004
Neg logit contributions
aces-0.007
lf-0.007
 floors-0.007
 walls-0.007
keeping-0.007
Token there
Token ID612
Feature activation-1.90e-03

Pos logit contributions
lf+0.026
aces+0.026
 walls+0.024
 towers+0.023
 floors+0.023
Neg logit contributions
 promise-0.044
 forgive-0.042
 better-0.041
use-0.041
 agree-0.040
Token was
Token ID373
Feature activation-1.03e-02

Pos logit contributions
aces+0.004
lf+0.003
 floors+0.003
keeping+0.003
 towers+0.003
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
 sip-0.003
 agree-0.003
Token a
Token ID257
Feature activation+6.66e-03

Pos logit contributions
aces+0.019
lf+0.019
 floors+0.018
 towers+0.017
anes+0.017
Neg logit contributions
 better-0.017
 promise-0.017
 forgive-0.017
 sip-0.016
 agree-0.015
Token little
Token ID1310
Feature activation+3.60e-02

Pos logit contributions
use+0.015
 unnecessary+0.014
severe+0.013
 promise+0.013
forts+0.013
Neg logit contributions
lf-0.010
 walls-0.009
 towers-0.008
 cubes-0.008
 blocks-0.008
Token cater
Token ID20825
Feature activation+5.98e-03

Pos logit contributions
 promise+0.057
 forgive+0.055
use+0.054
 unnecessary+0.052
 agree+0.052
Neg logit contributions
lf-0.068
 walls-0.065
 bricks-0.064
 cubes-0.063
 towers-0.063
Tokenpillar
Token ID41643
Feature activation+1.04e-02

Pos logit contributions
 promise+0.009
 forgive+0.009
 better+0.008
use+0.008
 agree+0.008
Neg logit contributions
aces-0.012
lf-0.012
 walls-0.011
 towers-0.011
 floors-0.011
Token.
Token ID13
Feature activation+3.35e-03

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.014
use+0.013
 agree+0.013
Neg logit contributions
aces-0.022
lf-0.022
 walls-0.021
 towers-0.021
 floors-0.021
Token She
Token ID1375
Feature activation+6.77e-03

Pos logit contributions
 forgive+0.005
 better+0.004
 sip+0.004
 promise+0.004
 nod+0.004
Neg logit contributions
lf-0.007
 floors-0.007
we-0.007
ator-0.007
keeping-0.006
Token was
Token ID373
Feature activation-2.91e-03

Pos logit contributions
 better+0.010
 promise+0.010
 forgive+0.010
 sip+0.009
use+0.008
Neg logit contributions
anes-0.014
keeping-0.014
aces-0.013
 floors-0.013
 towers-0.013
Token everyone
Token ID2506
Feature activation-3.97e-03

Pos logit contributions
 promise+0.024
 forgive+0.022
use+0.021
 better+0.021
 agree+0.021
Neg logit contributions
lf-0.021
aces-0.020
 walls-0.020
 towers-0.019
 floors-0.019
Token started
Token ID2067
Feature activation-2.05e-02

Pos logit contributions
aces+0.006
lf+0.006
 walls+0.006
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.008
 forgive-0.008
 better-0.008
use-0.007
 agree-0.007
Token to
Token ID284
Feature activation-2.90e-03

Pos logit contributions
aces+0.033
lf+0.032
 walls+0.028
 towers+0.028
 floors+0.028
Neg logit contributions
 promise-0.041
 better-0.040
 forgive-0.039
 agree-0.038
 sip-0.037
Token act
Token ID719
Feature activation+2.43e-02

Pos logit contributions
 walls+0.006
lf+0.006
 towers+0.005
 floors+0.005
aces+0.005
Neg logit contributions
 promise-0.004
use-0.004
 unnecessary-0.004
 better-0.004
 forgive-0.004
Token a
Token ID257
Feature activation+4.38e-03

Pos logit contributions
 promise+0.048
use+0.046
 forgive+0.044
 sip+0.043
 rinse+0.043
Neg logit contributions
aces-0.039
 walls-0.037
lf-0.037
 towers-0.035
 floors-0.035
Token little
Token ID1310
Feature activation+3.54e-02

Pos logit contributions
use+0.009
ackers+0.008
.'"+0.008
arl+0.008
 Deal+0.008
Neg logit contributions
 walls-0.006
lf-0.006
 floors-0.005
 floor-0.005
 towers-0.005
Token different
Token ID1180
Feature activation+5.13e-03

Pos logit contributions
use+0.060
 forgive+0.059
 promise+0.058
ackers+0.057
 rinse+0.056
Neg logit contributions
 walls-0.060
lf-0.058
 floors-0.057
 towers-0.057
aces-0.054
Token.
Token ID13
Feature activation-5.42e-03

Pos logit contributions
 promise+0.010
 forgive+0.010
 better+0.009
use+0.009
 agree+0.009
Neg logit contributions
lf-0.008
aces-0.008
 walls-0.008
 towers-0.007
 floors-0.007
Token They
Token ID1119
Feature activation-4.88e-03

Pos logit contributions
lf+0.009
aces+0.009
 walls+0.008
 towers+0.008
 floors+0.008
Neg logit contributions
 promise-0.010
 forgive-0.010
 better-0.009
 agree-0.009
use-0.009
Token smiled
Token ID13541
Feature activation+3.52e-03

Pos logit contributions
lf+0.007
aces+0.007
 walls+0.007
 towers+0.007
 floors+0.007
Neg logit contributions
 promise-0.010
 forgive-0.009
 better-0.009
use-0.009
 agree-0.009
Token more
Token ID517
Feature activation+4.84e-03

Pos logit contributions
 promise+0.007
 forgive+0.007
use+0.007
 better+0.007
 sip+0.007
Neg logit contributions
aces-0.005
lf-0.005
 walls-0.005
 towers-0.005
 floors-0.005
Token forest
Token ID8222
Feature activation+6.15e-03

Pos logit contributions
aces+0.019
lf+0.018
 walls+0.017
keeping+0.017
 floors+0.017
Neg logit contributions
 promise-0.025
 better-0.023
 forgive-0.023
use-0.022
 agree-0.022
Token,
Token ID11
Feature activation-1.95e-02

Pos logit contributions
 promise+0.007
 forgive+0.007
 better+0.006
 sip+0.006
 agree+0.006
Neg logit contributions
aces-0.015
keeping-0.015
anes-0.014
 floors-0.014
lf-0.014
Token there
Token ID612
Feature activation-3.88e-03

Pos logit contributions
aces+0.032
lf+0.031
 floors+0.029
keeping+0.029
 towers+0.029
Neg logit contributions
 promise-0.037
 forgive-0.036
 better-0.036
 sip-0.033
use-0.033
Token was
Token ID373
Feature activation-1.33e-02

Pos logit contributions
aces+0.008
keeping+0.008
 floors+0.007
anes+0.007
lf+0.007
Neg logit contributions
 forgive-0.006
 promise-0.006
 better-0.006
 agree-0.005
 sip-0.005
Token a
Token ID257
Feature activation+4.70e-04

Pos logit contributions
aces+0.025
lf+0.025
anes+0.023
keeping+0.023
 floors+0.023
Neg logit contributions
 better-0.022
 forgive-0.022
 promise-0.022
 sip-0.020
 safer-0.019
Token little
Token ID1310
Feature activation+3.58e-02

Pos logit contributions
use+0.001
severe+0.001
 unnecessary+0.001
forts+0.001
 promise+0.001
Neg logit contributions
lf-0.001
 walls-0.001
 towers-0.000
 bricks-0.000
 ground-0.000
Token wolf
Token ID17481
Feature activation-1.13e-04

Pos logit contributions
 promise+0.066
use+0.066
 forgive+0.066
 rinse+0.064
 agree+0.063
Neg logit contributions
 walls-0.054
lf-0.053
 bricks-0.053
 cubes-0.050
 blocks-0.050
Token.
Token ID13
Feature activation-3.78e-03

Pos logit contributions
aces+0.000
lf+0.000
 floors+0.000
 towers+0.000
 walls+0.000
Neg logit contributions
 promise-0.000
 forgive-0.000
 better-0.000
use-0.000
 agree-0.000
Token The
Token ID383
Feature activation+8.36e-03

Pos logit contributions
lf+0.007
aces+0.007
ator+0.007
 floors+0.007
keeping+0.007
Neg logit contributions
 forgive-0.005
 sip-0.005
 promise-0.005
 kiss-0.005
 better-0.005
Token little
Token ID1310
Feature activation+2.63e-02

Pos logit contributions
 promise+0.013
 better+0.013
 forgive+0.013
use+0.012
 sip+0.011
Neg logit contributions
aces-0.017
lf-0.016
 towers-0.015
 walls-0.015
keeping-0.015
Token wolf
Token ID17481
Feature activation-2.11e-03

Pos logit contributions
 promise+0.042
 forgive+0.039
 better+0.039
use+0.037
 agree+0.037
Neg logit contributions
aces-0.051
lf-0.051
 walls-0.049
 towers-0.048
 floors-0.048
Token and
Token ID290
Feature activation-6.21e-03

Pos logit contributions
 promise+0.046
 forgive+0.045
use+0.044
 sip+0.044
 better+0.043
Neg logit contributions
aces-0.035
lf-0.035
 walls-0.034
 floors-0.032
 towers-0.032
Token grateful
Token ID14066
Feature activation+1.20e-02

Pos logit contributions
lf+0.013
aces+0.013
 walls+0.012
 towers+0.012
 floors+0.012
Neg logit contributions
 promise-0.009
 forgive-0.008
 better-0.008
use-0.008
 agree-0.008
Token for
Token ID329
Feature activation+2.99e-03

Pos logit contributions
 promise+0.020
 forgive+0.019
 better+0.019
 agree+0.017
use+0.017
Neg logit contributions
lf-0.023
aces-0.023
anes-0.022
 towers-0.022
 walls-0.021
Token the
Token ID262
Feature activation-2.19e-03

Pos logit contributions
 promise+0.006
use+0.006
 forgive+0.006
 better+0.006
 sip+0.006
Neg logit contributions
lf-0.004
aces-0.004
 walls-0.004
 towers-0.004
 floors-0.004
Token help
Token ID1037
Feature activation-4.74e-03

Pos logit contributions
aces+0.005
lf+0.004
anes+0.004
umbled+0.004
 towers+0.004
Neg logit contributions
 promise-0.004
 better-0.003
 forgive-0.003
 safer-0.003
 agree-0.003
Token he
Token ID339
Feature activation+8.03e-03

Pos logit contributions
aces+0.008
lf+0.008
 towers+0.007
 walls+0.007
 floors+0.007
Neg logit contributions
 promise-0.009
 better-0.009
 forgive-0.008
 agree-0.008
use-0.008
Token received
Token ID2722
Feature activation-3.77e-03

Pos logit contributions
 promise+0.015
 forgive+0.015
 better+0.014
use+0.014
 sip+0.014
Neg logit contributions
aces-0.013
lf-0.013
 walls-0.012
 floors-0.012
 towers-0.012
Token.
Token ID13
Feature activation-7.40e-03

Pos logit contributions
aces+0.006
lf+0.006
 walls+0.006
 towers+0.006
 floors+0.006
Neg logit contributions
 promise-0.007
 forgive-0.007
 better-0.007
use-0.006
 agree-0.006
Token He
Token ID679
Feature activation-6.48e-03

Pos logit contributions
lf+0.014
aces+0.014
 walls+0.014
 floors+0.013
 towers+0.013
Neg logit contributions
 promise-0.012
 forgive-0.012
 better-0.012
 agree-0.011
 accept-0.010
Token left
Token ID1364
Feature activation-1.76e-03

Pos logit contributions
aces+0.011
lf+0.011
 walls+0.010
 towers+0.010
 floors+0.010
Neg logit contributions
 promise-0.012
 forgive-0.011
 better-0.011
use-0.011
 agree-0.011
Token feeling
Token ID4203
Feature activation+2.18e-02

Pos logit contributions
aces+0.003
anes+0.003
lf+0.003
umbled+0.003
 sculptures+0.003
Neg logit contributions
 better-0.003
 promise-0.003
 forgive-0.003
 safer-0.003
 kiss-0.003
Token use
Token ID779
Feature activation+3.75e-03

Pos logit contributions
 promise+0.012
 forgive+0.011
 better+0.011
use+0.011
 agree+0.011
Neg logit contributions
aces-0.015
lf-0.015
 walls-0.014
 towers-0.014
 floors-0.014
Token it
Token ID340
Feature activation-6.44e-03

Pos logit contributions
 promise+0.007
 forgive+0.006
 better+0.006
use+0.006
 agree+0.006
Neg logit contributions
aces-0.007
lf-0.007
 walls-0.006
 towers-0.006
 floors-0.006
Token to
Token ID284
Feature activation+7.33e-03

Pos logit contributions
lf+0.011
aces+0.011
 walls+0.010
 floors+0.010
 towers+0.010
Neg logit contributions
 promise-0.012
 better-0.011
 forgive-0.011
use-0.011
 agree-0.010
Token decor
Token ID11705
Feature activation+1.01e-02

Pos logit contributions
 promise+0.011
use+0.010
 better+0.010
 forgive+0.010
 unnecessary+0.010
Neg logit contributions
lf-0.015
aces-0.014
 walls-0.014
 towers-0.014
 floors-0.013
Tokenate
Token ID378
Feature activation+1.86e-02

Pos logit contributions
 promise+0.020
 better+0.020
 forgive+0.019
use+0.018
 agree+0.018
Neg logit contributions
aces-0.015
lf-0.015
 walls-0.014
keeping-0.014
 towers-0.013
Token your
Token ID534
Feature activation+1.71e-02

Pos logit contributions
 promise+0.032
 better+0.031
 forgive+0.030
use+0.029
 agree+0.028
Neg logit contributions
aces-0.034
lf-0.033
 walls-0.031
 floors-0.031
anes-0.030
Token paper
Token ID3348
Feature activation-8.53e-03

Pos logit contributions
 promise+0.038
 forgive+0.036
 better+0.036
use+0.034
 agree+0.034
Neg logit contributions
aces-0.023
lf-0.023
 walls-0.021
 towers-0.021
anes-0.021
Token things
Token ID1243
Feature activation+6.82e-03

Pos logit contributions
aces+0.015
lf+0.014
 floors+0.013
 walls+0.013
anes+0.013
Neg logit contributions
 promise-0.016
 better-0.015
 forgive-0.015
use-0.014
 agree-0.014
Token.
Token ID13
Feature activation+7.60e-03

Pos logit contributions
 promise+0.013
 forgive+0.012
 better+0.012
use+0.012
 agree+0.012
Neg logit contributions
aces-0.011
lf-0.011
 walls-0.010
 towers-0.010
 floors-0.010
Token And
Token ID843
Feature activation+1.63e-02

Pos logit contributions
 promise+0.015
 better+0.014
 forgive+0.014
use+0.014
 sip+0.013
Neg logit contributions
aces-0.012
lf-0.011
 walls-0.011
 towers-0.011
anes-0.010
Token you
Token ID345
Feature activation+1.87e-02

Pos logit contributions
 promise+0.028
 forgive+0.027
 better+0.026
use+0.026
 agree+0.025
Neg logit contributions
aces-0.029
lf-0.029
 walls-0.027
 towers-0.027
 floors-0.027
Token rooms
Token ID9519
Feature activation+7.01e-03

Pos logit contributions
 promise+0.034
 forgive+0.032
 better+0.032
 agree+0.030
use+0.030
Neg logit contributions
aces-0.018
lf-0.018
anes-0.016
 walls-0.016
 towers-0.016
Token!"
Token ID2474
Feature activation+6.22e-03

Pos logit contributions
 promise+0.013
 forgive+0.012
 better+0.012
use+0.011
 agree+0.011
Neg logit contributions
aces-0.012
lf-0.012
 walls-0.011
 towers-0.011
 floors-0.011
Token\n
Token ID198
Feature activation+9.99e-03

Pos logit contributions
 promise+0.010
 forgive+0.009
 better+0.009
use+0.009
 agree+0.008
Neg logit contributions
aces-0.012
lf-0.012
 walls-0.012
 towers-0.011
 floors-0.011
Token\n
Token ID198
Feature activation+1.64e-02

Pos logit contributions
 promise+0.019
use+0.018
 better+0.017
 sip+0.017
 unnecessary+0.017
Neg logit contributions
lf-0.017
aces-0.016
 walls-0.015
 towers-0.015
 himself-0.015
TokenTom
Token ID13787
Feature activation-2.53e-02

Pos logit contributions
 promise+0.034
 better+0.032
 sip+0.032
 forgive+0.032
use+0.032
Neg logit contributions
lf-0.024
aces-0.023
 towers-0.022
 walls-0.021
 floors-0.021
Token and
Token ID290
Feature activation+1.01e-02

Pos logit contributions
lf+0.045
aces+0.044
 towers+0.043
 walls+0.043
 floors+0.042
Neg logit contributions
 promise-0.045
 forgive-0.043
 better-0.043
 agree-0.041
use-0.039
Token Lily
Token ID20037
Feature activation-2.17e-02

Pos logit contributions
 promise+0.019
use+0.019
 unnecessary+0.018
 agree+0.017
 better+0.017
Neg logit contributions
lf-0.017
aces-0.015
 walls-0.015
 towers-0.015
 floors-0.014
Token felt
Token ID2936
Feature activation+1.90e-02

Pos logit contributions
aces+0.037
lf+0.037
 walls+0.035
 towers+0.034
 floors+0.034
Neg logit contributions
 promise-0.040
 forgive-0.038
 better-0.037
use-0.036
 agree-0.035
Token sad
Token ID6507
Feature activation+1.23e-02

Pos logit contributions
use+0.033
 promise+0.031
ackers+0.028
 agree+0.028
 sip+0.027
Neg logit contributions
aces-0.035
 himself-0.034
 walls-0.034
 towers-0.033
 floors-0.033
Token and
Token ID290
Feature activation+9.26e-04

Pos logit contributions
 promise+0.027
use+0.026
 forgive+0.025
 rinse+0.024
 unnecessary+0.024
Neg logit contributions
aces-0.016
lf-0.015
 walls-0.015
 towers-0.015
 floors-0.014
Token confused
Token ID10416
Feature activation+5.00e-03

Pos logit contributions
 promise+0.001
use+0.001
 forgive+0.001
 agree+0.001
 sip+0.001
Neg logit contributions
aces-0.002
 walls-0.002
lf-0.002
 towers-0.002
 floors-0.002
Token new
Token ID649
Feature activation-7.30e-03

Pos logit contributions
 promise+0.011
 forgive+0.010
 better+0.010
use+0.010
 agree+0.010
Neg logit contributions
aces-0.009
lf-0.009
 walls-0.009
 floors-0.009
 towers-0.009
Token friends
Token ID2460
Feature activation+2.28e-02

Pos logit contributions
lf+0.012
aces+0.011
anes+0.011
 walls+0.010
 floors+0.010
Neg logit contributions
 promise-0.015
 forgive-0.014
 better-0.014
 agree-0.013
use-0.013
Token.
Token ID13
Feature activation+4.69e-03

Pos logit contributions
 promise+0.042
 forgive+0.040
 better+0.039
use+0.039
 agree+0.038
Neg logit contributions
aces-0.039
lf-0.038
 walls-0.036
 towers-0.035
 floors-0.035
Token They
Token ID1119
Feature activation+1.83e-03

Pos logit contributions
 promise+0.013
use+0.012
 unnecessary+0.012
 forgive+0.012
 sip+0.011
Neg logit contributions
aces-0.004
 himself-0.004
lf-0.004
 walls-0.004
 towers-0.004
Token eat
Token ID4483
Feature activation+1.21e-03

Pos logit contributions
 unnecessary+0.003
use+0.002
 promise+0.002
 louder+0.002
ackers+0.002
Neg logit contributions
aces-0.004
 walls-0.003
lf-0.003
 towers-0.003
 himself-0.003
Token and
Token ID290
Feature activation+3.09e-03

Pos logit contributions
 promise+0.002
use+0.002
 forgive+0.002
 agree+0.002
 unnecessary+0.002
Neg logit contributions
aces-0.002
 towers-0.002
 walls-0.002
lf-0.002
 floors-0.002
Token drink
Token ID4144
Feature activation+1.60e-02

Pos logit contributions
 promise+0.004
 unnecessary+0.004
use+0.004
 forgive+0.004
 better+0.004
Neg logit contributions
lf-0.006
aces-0.006
 walls-0.006
 towers-0.006
keeping-0.006
Token and
Token ID290
Feature activation-3.51e-03

Pos logit contributions
 promise+0.032
use+0.030
 forgive+0.029
 agree+0.028
 unnecessary+0.028
Neg logit contributions
aces-0.025
 walls-0.024
 towers-0.024
lf-0.023
 floors-0.022
Token then
Token ID788
Feature activation-3.17e-03

Pos logit contributions
lf+0.007
aces+0.007
 walls+0.007
 towers+0.007
keeping+0.007
Neg logit contributions
 promise-0.005
use-0.004
 forgive-0.004
 unnecessary-0.004
 better-0.004
Token lie
Token ID6486
Feature activation+3.57e-02

Pos logit contributions
aces+0.006
lf+0.006
 towers+0.006
 walls+0.006
 blocks+0.005
Neg logit contributions
 promise-0.005
 unnecessary-0.005
use-0.004
 forgive-0.004
ackers-0.004
Token down
Token ID866
Feature activation-9.31e-03

Pos logit contributions
 promise+0.078
 forgive+0.075
 agree+0.071
use+0.071
 better+0.071
Neg logit contributions
aces-0.048
lf-0.046
 walls-0.045
 towers-0.045
 floors-0.044
Token in
Token ID287
Feature activation-1.04e-02

Pos logit contributions
keeping+0.008
aces+0.008
umbled+0.007
anes+0.007
ped+0.007
Neg logit contributions
 better-0.006
 forgive-0.005
 sip-0.005
 promise-0.005
 louder-0.005
Token the
Token ID262
Feature activation-9.47e-03

Pos logit contributions
aces+0.023
umbled+0.020
lf+0.020
anes+0.019
ped+0.019
Neg logit contributions
 better-0.016
 promise-0.015
 forgive-0.015
 sip-0.014
 rinse-0.014
Token bright
Token ID6016
Feature activation-5.49e-03

Pos logit contributions
lf+0.014
aces+0.014
 walls+0.013
 towers+0.013
 floors+0.013
Neg logit contributions
 promise-0.020
 forgive-0.019
 better-0.018
use-0.018
 agree-0.018
Token sunshine
Token ID34488
Feature activation+1.89e-02

Pos logit contributions
aces+0.009
lf+0.008
keeping+0.008
anes+0.007
 walls+0.007
Neg logit contributions
 promise-0.011
 better-0.010
 forgive-0.010
use-0.010
 sip-0.010
Token.
Token ID13
Feature activation-5.13e-03

Pos logit contributions
 better+0.024
 promise+0.021
 forgive+0.020
 agree+0.020
 sip+0.019
Neg logit contributions
lf-0.042
aces-0.041
anes-0.040
umbled-0.039
ped-0.039
Token One
Token ID1881
Feature activation-1.73e-03

Pos logit contributions
aces+0.011
lf+0.011
 walls+0.010
 towers+0.010
 floors+0.010
Neg logit contributions
 promise-0.008
 forgive-0.007
 better-0.007
use-0.007
 agree-0.006
Token day
Token ID1110
Feature activation+9.07e-03

Pos logit contributions
aces+0.003
lf+0.003
 walls+0.002
 towers+0.002
 floors+0.002
Neg logit contributions
 promise-0.004
 forgive-0.003
 better-0.003
use-0.003
 agree-0.003
Token,
Token ID11
Feature activation-9.53e-03

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.014
use+0.013
 agree+0.013
Neg logit contributions
aces-0.018
lf-0.017
 walls-0.017
 towers-0.016
 floors-0.016
Token he
Token ID339
Feature activation-4.28e-03

Pos logit contributions
aces+0.017
keeping+0.015
lf+0.014
anes+0.014
 floors+0.014
Neg logit contributions
 better-0.019
 forgive-0.018
 promise-0.018
 sip-0.018
 agree-0.017
Token saw
Token ID2497
Feature activation-2.50e-02

Pos logit contributions
lf+0.007
aces+0.007
 walls+0.007
 floors+0.007
 towers+0.007
Neg logit contributions
 promise-0.008
 forgive-0.007
 better-0.007
use-0.007
 agree-0.007
Token a
Token ID257
Feature activation+9.53e-03

Pos logit contributions
aces+0.048
lf+0.046
keeping+0.044
 walls+0.042
 floors+0.042
Neg logit contributions
 better-0.042
 forgive-0.042
 promise-0.039
 sip-0.037
 safer-0.036
Token a
Token ID257
Feature activation+3.15e-03

Pos logit contributions
aces+0.003
lf+0.003
keeping+0.003
anes+0.003
 towers+0.003
Neg logit contributions
 better-0.003
 promise-0.003
 forgive-0.003
 rinse-0.003
 sip-0.003
Token little
Token ID1310
Feature activation+3.24e-02

Pos logit contributions
use+0.006
 unnecessary+0.006
 promise+0.006
 forgive+0.006
 sip+0.006
Neg logit contributions
lf-0.005
 walls-0.004
 floors-0.004
 towers-0.004
 cubes-0.004
Token boy
Token ID2933
Feature activation-2.25e-03

Pos logit contributions
 promise+0.059
use+0.057
 forgive+0.057
 sip+0.053
 unnecessary+0.053
Neg logit contributions
lf-0.052
 walls-0.052
 floors-0.050
aces-0.050
 towers-0.050
Token.
Token ID13
Feature activation-1.25e-03

Pos logit contributions
aces+0.004
lf+0.004
 walls+0.004
 towers+0.004
 floors+0.004
Neg logit contributions
 promise-0.004
 forgive-0.004
 better-0.004
use-0.004
 agree-0.003
Token He
Token ID679
Feature activation-5.94e-03

Pos logit contributions
lf+0.002
aces+0.002
 walls+0.002
 towers+0.002
 floors+0.002
Neg logit contributions
 promise-0.002
 forgive-0.002
 better-0.002
 agree-0.002
use-0.002
Token had
Token ID550
Feature activation-1.70e-02

Pos logit contributions
aces+0.011
lf+0.010
 towers+0.010
keeping+0.010
anes+0.010
Neg logit contributions
 promise-0.010
 forgive-0.010
 better-0.010
 agree-0.009
use-0.009
Token a
Token ID257
Feature activation-8.48e-03

Pos logit contributions
lf+0.038
ator+0.037
aces+0.036
anes+0.035
ados+0.035
Neg logit contributions
 better-0.024
 forgive-0.021
 promise-0.020
 kiss-0.018
 rinse-0.018
Token pet
Token ID4273
Feature activation+1.16e-03

Pos logit contributions
lf+0.012
 walls+0.011
 blocks+0.010
 cubes+0.010
 towers+0.010
Neg logit contributions
use-0.017
 agree-0.017
 unnecessary-0.017
 forgive-0.017
 promise-0.016
Token dog
Token ID3290
Feature activation-9.66e-03

Pos logit contributions
 promise+0.002
 forgive+0.002
 better+0.002
use+0.002
 agree+0.002
Neg logit contributions
aces-0.002
lf-0.002
 walls-0.002
 towers-0.002
 floors-0.002
Token called
Token ID1444
Feature activation+3.09e-04

Pos logit contributions
aces+0.017
lf+0.017
 walls+0.016
 towers+0.015
 floors+0.015
Neg logit contributions
 promise-0.018
 forgive-0.016
use-0.016
 better-0.016
 agree-0.016
Token Bos
Token ID14548
Feature activation-2.21e-02

Pos logit contributions
 promise+0.001
 forgive+0.000
 better+0.000
use+0.000
 agree+0.000
Neg logit contributions
aces-0.001
lf-0.001
 walls-0.001
 towers-0.001
 floors-0.001
Token little
Token ID1310
Feature activation+3.72e-02

Pos logit contributions
use+0.026
 promise+0.025
 unnecessary+0.025
severe+0.025
 rinse+0.023
Neg logit contributions
lf-0.016
 walls-0.014
 cubes-0.013
 blocks-0.012
 bricks-0.012
Token girl
Token ID2576
Feature activation+1.23e-02

Pos logit contributions
 promise+0.065
 forgive+0.062
use+0.059
 agree+0.058
 unnecessary+0.058
Neg logit contributions
lf-0.064
 walls-0.061
 bricks-0.061
aces-0.060
 cubes-0.060
Token named
Token ID3706
Feature activation-9.11e-03

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.013
use+0.013
 sip+0.012
Neg logit contributions
aces-0.029
lf-0.029
 walls-0.027
 towers-0.027
 floors-0.027
Token Sue
Token ID26089
Feature activation-1.16e-02

Pos logit contributions
aces+0.018
lf+0.018
 walls+0.018
 towers+0.017
 floors+0.017
Neg logit contributions
 promise-0.014
 forgive-0.013
 better-0.013
use-0.012
 agree-0.012
Token went
Token ID1816
Feature activation-8.60e-03

Pos logit contributions
aces+0.022
lf+0.022
 towers+0.021
 walls+0.021
 floors+0.021
Neg logit contributions
 promise-0.019
 forgive-0.018
 better-0.018
use-0.018
 sip-0.017
Token for
Token ID329
Feature activation-1.37e-02

Pos logit contributions
aces+0.015
lf+0.015
 floors+0.014
keeping+0.014
 walls+0.013
Neg logit contributions
 better-0.016
 promise-0.015
 forgive-0.015
use-0.014
 sip-0.014
Token a
Token ID257
Feature activation-8.16e-04

Pos logit contributions
aces+0.023
lf+0.023
 towers+0.021
 walls+0.021
 floors+0.021
Neg logit contributions
 promise-0.025
 better-0.024
 forgive-0.024
use-0.023
 sip-0.023
Token walk
Token ID2513
Feature activation+1.02e-03

Pos logit contributions
lf+0.002
 walls+0.001
 cubes+0.001
 towers+0.001
 bricks+0.001
Neg logit contributions
 promise-0.001
 forgive-0.001
 better-0.001
 cal-0.001
 unnecessary-0.001
Token.
Token ID13
Feature activation-5.37e-03

Pos logit contributions
 promise+0.002
 better+0.002
 forgive+0.002
 sip+0.002
 agree+0.002
Neg logit contributions
lf-0.002
aces-0.002
anes-0.002
 towers-0.002
 floors-0.002
Token She
Token ID1375
Feature activation+2.95e-03

Pos logit contributions
lf+0.010
aces+0.010
 floors+0.009
 walls+0.009
anes+0.009
Neg logit contributions
 promise-0.009
 forgive-0.009
 sip-0.009
 better-0.009
 agree-0.008
Token saw
Token ID2497
Feature activation-2.84e-02

Pos logit contributions
 promise+0.004
 forgive+0.004
 better+0.004
use+0.003
 agree+0.003
Neg logit contributions
aces-0.007
lf-0.006
anes-0.006
keeping-0.006
 towers-0.006
Token twins
Token ID20345
Feature activation-6.34e-04

Pos logit contributions
aces+0.012
lf+0.012
 walls+0.011
 towers+0.011
 floors+0.011
Neg logit contributions
 promise-0.017
 forgive-0.016
 better-0.016
use-0.016
 agree-0.015
Token.
Token ID13
Feature activation+6.38e-03

Pos logit contributions
lf+0.001
aces+0.001
 walls+0.001
 towers+0.001
 floors+0.001
Neg logit contributions
 unnecessary-0.001
 promise-0.001
use-0.001
 sip-0.001
severe-0.001
Token They
Token ID1119
Feature activation-2.22e-03

Pos logit contributions
 promise+0.013
 unnecessary+0.013
 forgive+0.012
 better+0.012
use+0.012
Neg logit contributions
aces-0.009
lf-0.009
 walls-0.009
 towers-0.008
 floors-0.008
Token like
Token ID588
Feature activation-1.11e-03

Pos logit contributions
aces+0.005
 walls+0.004
lf+0.004
 towers+0.004
 floors+0.004
Neg logit contributions
 promise-0.003
 unnecessary-0.003
use-0.003
 better-0.003
 forgive-0.003
Token to
Token ID284
Feature activation-1.16e-02

Pos logit contributions
aces+0.002
lf+0.002
ator+0.002
 walls+0.002
 floors+0.002
Neg logit contributions
 forgive-0.002
 better-0.002
 promise-0.002
 kiss-0.002
 agree-0.002
Token play
Token ID711
Feature activation-6.58e-03

Pos logit contributions
aces+0.026
lf+0.026
 walls+0.025
 towers+0.025
 floors+0.025
Neg logit contributions
 promise-0.014
 forgive-0.013
 better-0.013
use-0.013
 agree-0.012
Token together
Token ID1978
Feature activation+6.04e-03

Pos logit contributions
lf+0.008
aces+0.007
 walls+0.007
 towers+0.007
 floors+0.007
Neg logit contributions
 promise-0.016
use-0.015
 forgive-0.015
 better-0.015
 unnecessary-0.014
Token,
Token ID11
Feature activation+7.95e-05

Pos logit contributions
 promise+0.012
 forgive+0.011
 better+0.011
use+0.011
 agree+0.010
Neg logit contributions
aces-0.010
lf-0.010
 walls-0.009
 towers-0.009
 floors-0.009
Token but
Token ID475
Feature activation+5.66e-03

Pos logit contributions
 promise+0.000
 forgive+0.000
 better+0.000
 agree+0.000
 sip+0.000
Neg logit contributions
aces-0.000
lf-0.000
umbled-0.000
keeping-0.000
 floors-0.000
Token sometimes
Token ID3360
Feature activation+2.49e-03

Pos logit contributions
 promise+0.010
use+0.010
 forgive+0.010
 unnecessary+0.010
 agree+0.010
Neg logit contributions
lf-0.009
aces-0.009
 walls-0.008
 towers-0.008
 floors-0.008
Token they
Token ID484
Feature activation-1.73e-03

Pos logit contributions
 promise+0.004
 forgive+0.004
use+0.004
 better+0.004
 sip+0.004
Neg logit contributions
lf-0.005
aces-0.004
 walls-0.004
 towers-0.004
 floors-0.004
Token a
Token ID257
Feature activation+1.88e-03

Pos logit contributions
use+0.040
 promise+0.039
 forgive+0.036
 agree+0.036
 sip+0.034
Neg logit contributions
aces-0.022
 himself-0.021
 walls-0.020
 floors-0.019
 towers-0.019
Token little
Token ID1310
Feature activation+3.31e-02

Pos logit contributions
use+0.003
 promise+0.003
 forgive+0.003
 agree+0.003
 unnecessary+0.003
Neg logit contributions
lf-0.003
 walls-0.003
 floors-0.003
 ground-0.003
aces-0.003
Token anxious
Token ID18116
Feature activation-1.61e-03

Pos logit contributions
use+0.048
ackers+0.045
 forgive+0.042
?'"+0.040
 promise+0.040
Neg logit contributions
 walls-0.063
aces-0.061
 floors-0.059
 towers-0.059
 ground-0.058
Token but
Token ID475
Feature activation-2.30e-02

Pos logit contributions
aces+0.003
lf+0.003
 walls+0.003
 floors+0.002
 towers+0.002
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
use-0.003
 agree-0.003
Token kept
Token ID4030
Feature activation-4.92e-03

Pos logit contributions
aces+0.038
lf+0.038
 walls+0.036
 towers+0.035
 floors+0.035
Neg logit contributions
 promise-0.042
use-0.040
 forgive-0.039
 better-0.038
 agree-0.037
Token on
Token ID319
Feature activation-2.37e-02

Pos logit contributions
aces+0.007
lf+0.007
 walls+0.006
 towers+0.006
 floors+0.006
Neg logit contributions
 promise-0.011
 better-0.010
 forgive-0.010
use-0.010
 agree-0.010
Token bouncing
Token ID30713
Feature activation+7.25e-03

Pos logit contributions
aces+0.037
lf+0.036
anes+0.035
ator+0.032
 cubes+0.031
Neg logit contributions
 better-0.050
 promise-0.048
 forgive-0.045
 unnecessary-0.044
 agree-0.044
Token around
Token ID1088
Feature activation-7.05e-03

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.014
use+0.013
 agree+0.013
Neg logit contributions
aces-0.011
lf-0.011
 walls-0.010
 towers-0.010
 floors-0.010
Token.
Token ID13
Feature activation-2.68e-03

Pos logit contributions
aces+0.013
lf+0.013
anes+0.013
umbled+0.012
 floors+0.012
Neg logit contributions
 better-0.012
 forgive-0.011
 promise-0.011
 sip-0.010
 agree-0.010
Token
Token ID198
Feature activation-6.72e-03

Pos logit contributions
lf+0.005
aces+0.005
 towers+0.005
 walls+0.005
 floors+0.005
Neg logit contributions
 promise-0.004
 forgive-0.004
 better-0.004
 agree-0.004
use-0.004
Token
Token ID198
Feature activation-2.00e-03

Pos logit contributions
lf+0.013
aces+0.013
 walls+0.012
 towers+0.012
 floors+0.012
Neg logit contributions
 promise-0.011
 forgive-0.010
 better-0.010
use-0.010
 sip-0.010
TokenB
Token ID33
Feature activation-2.27e-03

Pos logit contributions
 promise+0.001
 better+0.001
 forgive+0.001
use+0.001
 sip+0.001
Neg logit contributions
lf-0.001
aces-0.001
 walls-0.001
 floors-0.001
 towers-0.001
Tokeneth
Token ID2788
Feature activation-4.34e-02

Pos logit contributions
lf+0.005
aces+0.005
 walls+0.005
 towers+0.005
 floors+0.005
Neg logit contributions
 promise-0.003
 forgive-0.003
 better-0.003
use-0.003
 agree-0.002
Tokenâ
Token ID22940
Feature activation+3.24e-03

Pos logit contributions
anes+0.074
 bricks+0.074
 towers+0.074
aces+0.073
 cubes+0.072
Neg logit contributions
 promise-0.077
 agree-0.071
 better-0.070
 forgive-0.069
 accept-0.065
Token€
Token ID26391
Feature activation-1.48e-02

Pos logit contributions
 promise+0.005
 forgive+0.005
 better+0.005
use+0.005
 agree+0.005
Neg logit contributions
lf-0.006
aces-0.006
 walls-0.006
 towers-0.006
 floors-0.006
Token™
Token ID8151
Feature activation-1.02e-02

Pos logit contributions
aces+0.029
lf+0.029
 walls+0.028
 floors+0.028
 towers+0.028
Neg logit contributions
 promise-0.022
 forgive-0.021
 agree-0.020
use-0.020
 better-0.020
Tokens
Token ID82
Feature activation-5.77e-03

Pos logit contributions
lf+0.019
aces+0.019
 walls+0.018
 towers+0.018
 floors+0.018
Neg logit contributions
 promise-0.017
 forgive-0.016
 better-0.016
use-0.015
 agree-0.015
Token friends
Token ID2460
Feature activation+9.31e-03

Pos logit contributions
aces+0.010
lf+0.010
anes+0.009
keeping+0.009
 towers+0.009
Neg logit contributions
 promise-0.011
 forgive-0.010
 better-0.010
 agree-0.009
 kiss-0.009
Token tried
Token ID3088
Feature activation+6.48e-03

Pos logit contributions
 promise+0.018
 better+0.017
 forgive+0.017
use+0.016
 sip+0.016
Neg logit contributions
aces-0.015
lf-0.015
 walls-0.014
 towers-0.014
 floors-0.014
Token to
Token ID284
Feature activation+1.97e-02

Pos logit contributions
 promise+0.013
 better+0.013
 forgive+0.013
 agree+0.012
 sip+0.012
Neg logit contributions
aces-0.010
lf-0.010
 walls-0.009
anes-0.009
 floors-0.009
Token help
Token ID1037
Feature activation+2.80e-02

Pos logit contributions
 promise+0.029
 forgive+0.027
 better+0.026
 agree+0.025
use+0.024
Neg logit contributions
aces-0.043
lf-0.042
 walls-0.040
 towers-0.040
 floors-0.040
Token her
Token ID607
Feature activation+2.29e-02

Pos logit contributions
 promise+0.054
 forgive+0.051
 better+0.050
use+0.050
 agree+0.049
Neg logit contributions
lf-0.044
aces-0.044
 walls-0.042
 floors-0.041
 towers-0.041
Token
Token ID198
Feature activation+5.91e-05

Pos logit contributions
 forgive+0.012
 promise+0.012
 better+0.012
 agree+0.012
 sip+0.011
Neg logit contributions
lf-0.017
aces-0.017
 floors-0.016
 walls-0.016
 towers-0.015
Token
Token ID198
Feature activation+1.78e-03

Pos logit contributions
 promise+0.000
 forgive+0.000
 better+0.000
use+0.000
 agree+0.000
Neg logit contributions
lf-0.000
aces-0.000
 walls-0.000
 towers-0.000
 floors-0.000
TokenThe
Token ID464
Feature activation+1.47e-02

Pos logit contributions
 promise+0.003
 forgive+0.003
 better+0.003
use+0.002
 agree+0.002
Neg logit contributions
aces-0.004
lf-0.004
 walls-0.003
 towers-0.003
 floors-0.003
Token bird
Token ID6512
Feature activation+1.47e-02

Pos logit contributions
 promise+0.027
 better+0.025
 forgive+0.024
 agree+0.022
 sip+0.021
Neg logit contributions
aces-0.029
keeping-0.028
lf-0.027
anes-0.026
 floors-0.025
Token ch
Token ID442
Feature activation+8.36e-03

Pos logit contributions
 promise+0.026
 forgive+0.025
 better+0.025
 agree+0.023
 sip+0.023
Neg logit contributions
aces-0.027
lf-0.026
 floors-0.025
 towers-0.025
 walls-0.025
Tokenir
Token ID343
Feature activation-4.36e-02

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.014
use+0.013
 agree+0.013
Neg logit contributions
lf-0.015
aces-0.015
 walls-0.014
 towers-0.014
 floors-0.014
Tokenped
Token ID9124
Feature activation-5.88e-03

Pos logit contributions
lf+0.046
 floors+0.039
keeping+0.035
 sculptures+0.035
 walls+0.035
Neg logit contributions
 promise-0.109
 better-0.109
 forgive-0.108
 accept-0.102
 kiss-0.101
Token and
Token ID290
Feature activation+6.04e-03

Pos logit contributions
lf+0.011
aces+0.011
 walls+0.011
 floors+0.011
 towers+0.010
Neg logit contributions
 promise-0.010
 better-0.009
 forgive-0.009
 agree-0.008
 sip-0.008
Token settled
Token ID10282
Feature activation+1.95e-02

Pos logit contributions
 promise+0.011
 forgive+0.010
 better+0.010
use+0.009
 agree+0.009
Neg logit contributions
aces-0.011
lf-0.011
 walls-0.010
 floors-0.010
 towers-0.010
Token in
Token ID287
Feature activation+1.92e-03

Pos logit contributions
 better+0.033
 promise+0.033
 forgive+0.033
use+0.030
 agree+0.029
Neg logit contributions
lf-0.035
aces-0.035
anes-0.033
 floors-0.032
 walls-0.031
Token the
Token ID262
Feature activation+9.91e-03

Pos logit contributions
 promise+0.004
 forgive+0.004
 better+0.003
use+0.003
 agree+0.003
Neg logit contributions
aces-0.003
lf-0.003
 walls-0.003
 floors-0.003
 towers-0.003
Token was
Token ID373
Feature activation-6.62e-04

Pos logit contributions
aces+0.009
lf+0.009
 walls+0.009
 towers+0.009
 floors+0.009
Neg logit contributions
 promise-0.011
 forgive-0.010
 better-0.010
use-0.010
 agree-0.010
Token a
Token ID257
Feature activation+3.39e-03

Pos logit contributions
aces+0.001
lf+0.001
keeping+0.001
anes+0.001
 towers+0.001
Neg logit contributions
 better-0.001
 promise-0.001
 forgive-0.001
 rinse-0.001
 sip-0.001
Token 3
Token ID513
Feature activation-2.73e-02

Pos logit contributions
use+0.007
 unnecessary+0.007
 forgive+0.006
 promise+0.006
 sip+0.006
Neg logit contributions
lf-0.005
 walls-0.005
 floors-0.004
 towers-0.004
 cubes-0.004
Token-
Token ID12
Feature activation-4.61e-02

Pos logit contributions
aces+0.049
anes+0.046
 towers+0.043
 walls+0.043
keeping+0.042
Neg logit contributions
 promise-0.051
 agree-0.047
 forgive-0.046
 better-0.046
 sip-0.044
Tokenyear
Token ID1941
Feature activation-5.85e-02

Pos logit contributions
aces+0.086
lf+0.085
 walls+0.083
 towers+0.081
 floors+0.080
Neg logit contributions
 promise-0.078
 forgive-0.077
 better-0.075
 agree-0.070
use-0.070
Token-
Token ID12
Feature activation-4.57e-02

Pos logit contributions
aces+0.108
lf+0.108
anes+0.103
 towers+0.102
keeping+0.100
Neg logit contributions
 promise-0.103
 better-0.091
 agree-0.089
 forgive-0.087
 kiss-0.082
Tokenold
Token ID727
Feature activation+7.92e-03

Pos logit contributions
aces+0.108
lf+0.107
 walls+0.107
 towers+0.104
 floors+0.103
Neg logit contributions
 promise-0.054
 forgive-0.054
 better-0.051
 agree-0.048
use-0.045
Token who
Token ID508
Feature activation+1.24e-02

Pos logit contributions
 promise+0.013
 forgive+0.012
 better+0.012
use+0.011
 sip+0.011
Neg logit contributions
aces-0.015
lf-0.015
 walls-0.014
 towers-0.014
 floors-0.014
Token loved
Token ID6151
Feature activation-2.60e-03

Pos logit contributions
 promise+0.022
 forgive+0.021
 better+0.021
 agree+0.019
use+0.019
Neg logit contributions
aces-0.022
lf-0.022
 towers-0.021
 walls-0.021
 floors-0.020
Token to
Token ID284
Feature activation-2.42e-02

Pos logit contributions
aces+0.005
lf+0.004
umbled+0.004
anes+0.004
ator+0.004
Neg logit contributions
 better-0.005
 forgive-0.004
 promise-0.004
 agree-0.004
 kiss-0.004
Token explore
Token ID7301
Feature activation-1.33e-02

Pos logit contributions
lf+0.053
aces+0.052
 walls+0.050
 towers+0.049
 floors+0.049
Neg logit contributions
 promise-0.033
 forgive-0.030
 better-0.030
use-0.029
 agree-0.027
Token the
Token ID262
Feature activation+9.06e-04

Pos logit contributions
 promise+0.021
 forgive+0.020
 better+0.020
use+0.019
 agree+0.019
Neg logit contributions
aces-0.019
lf-0.019
 walls-0.018
 towers-0.018
 floors-0.017
Token little
Token ID1310
Feature activation+1.24e-02

Pos logit contributions
 promise+0.001
 forgive+0.001
 better+0.001
use+0.001
 agree+0.001
Neg logit contributions
aces-0.002
lf-0.002
 walls-0.002
 towers-0.002
 floors-0.002
Token boy
Token ID2933
Feature activation+1.72e-04

Pos logit contributions
 promise+0.019
 forgive+0.018
 better+0.017
use+0.017
 agree+0.016
Neg logit contributions
aces-0.025
lf-0.025
 walls-0.024
 towers-0.024
 floors-0.023
Token cheered
Token ID34767
Feature activation-1.17e-03

Pos logit contributions
 promise+0.000
 forgive+0.000
 better+0.000
use+0.000
 agree+0.000
Neg logit contributions
aces-0.000
lf-0.000
 walls-0.000
 towers-0.000
 floors-0.000
Token the
Token ID262
Feature activation-9.88e-03

Pos logit contributions
aces+0.002
lf+0.002
 walls+0.002
 floors+0.002
 towers+0.002
Neg logit contributions
 promise-0.002
 forgive-0.002
 better-0.002
 agree-0.002
use-0.002
Token loud
Token ID7812
Feature activation-4.58e-02

Pos logit contributions
lf+0.018
aces+0.018
 walls+0.017
 floors+0.016
 towers+0.016
Neg logit contributions
 promise-0.017
 forgive-0.016
use-0.016
 better-0.016
 agree-0.015
Tokenest
Token ID395
Feature activation-1.31e-02

Pos logit contributions
lf+0.080
aces+0.079
 walls+0.075
 towers+0.074
 floors+0.073
Neg logit contributions
 promise-0.082
 forgive-0.078
 better-0.076
use-0.075
 agree-0.073
Token.
Token ID13
Feature activation-1.32e-02

Pos logit contributions
lf+0.024
aces+0.024
 walls+0.023
 towers+0.022
 floors+0.022
Neg logit contributions
 promise-0.022
 forgive-0.021
use-0.021
 better-0.020
 rinse-0.020
Token<|endoftext|>
Token ID50256
Feature activation-3.63e-03

Pos logit contributions
aces+0.022
lf+0.022
 walls+0.020
 towers+0.020
 floors+0.020
Neg logit contributions
 promise-0.025
 forgive-0.024
 better-0.023
use-0.023
 agree-0.022
TokenJohn
Token ID7554
Feature activation-2.79e-02

Pos logit contributions
aces+0.008
lf+0.008
 walls+0.008
 towers+0.007
 floors+0.007
Neg logit contributions
 promise-0.005
 forgive-0.005
 better-0.005
use-0.004
 agree-0.004
Token and
Token ID290
Feature activation-4.43e-03

Pos logit contributions
aces+0.048
lf+0.047
 towers+0.044
 walls+0.044
 floors+0.044
Neg logit contributions
 promise-0.052
 forgive-0.049
 better-0.049
use-0.047
 agree-0.046
Token It
Token ID632
Feature activation-7.20e-04

Pos logit contributions
lf+0.017
aces+0.016
 floors+0.015
 towers+0.015
 walls+0.015
Neg logit contributions
 forgive-0.019
 better-0.019
 promise-0.019
 agree-0.018
 sip-0.018
Token was
Token ID373
Feature activation-2.37e-03

Pos logit contributions
aces+0.001
keeping+0.001
anes+0.001
 towers+0.001
lf+0.001
Neg logit contributions
 better-0.001
 promise-0.001
 forgive-0.001
 sip-0.001
 rinse-0.001
Token even
Token ID772
Feature activation+2.65e-02

Pos logit contributions
aces+0.005
lf+0.004
anes+0.004
keeping+0.004
 floors+0.004
Neg logit contributions
 better-0.004
 promise-0.004
 forgive-0.004
 rinse-0.003
 safer-0.003
Token more
Token ID517
Feature activation-8.02e-03

Pos logit contributions
 promise+0.035
use+0.035
 forgive+0.032
 agree+0.032
 sip+0.030
Neg logit contributions
 walls-0.056
 towers-0.056
lf-0.055
 floors-0.055
aces-0.055
Token delicious
Token ID12625
Feature activation-1.38e-02

Pos logit contributions
aces+0.017
lf+0.017
 walls+0.016
anes+0.016
keeping+0.016
Neg logit contributions
 promise-0.012
 better-0.012
 forgive-0.011
 agree-0.011
 rinse-0.010
Token than
Token ID621
Feature activation-4.34e-02

Pos logit contributions
aces+0.028
anes+0.028
lf+0.027
umbled+0.026
 floors+0.025
Neg logit contributions
 better-0.021
 promise-0.019
 agree-0.018
 sip-0.017
 forgive-0.017
Token he
Token ID339
Feature activation-1.49e-02

Pos logit contributions
lf+0.076
aces+0.076
 floors+0.071
anes+0.070
 walls+0.070
Neg logit contributions
 better-0.079
 forgive-0.074
 promise-0.074
 safer-0.069
 sip-0.067
Token remembered
Token ID12086
Feature activation-1.57e-02

Pos logit contributions
aces+0.025
lf+0.024
 towers+0.024
anes+0.024
 floors+0.024
Neg logit contributions
 promise-0.028
 better-0.027
 forgive-0.026
use-0.026
 agree-0.025
Token.
Token ID13
Feature activation-5.36e-03

Pos logit contributions
aces+0.029
lf+0.028
anes+0.027
 walls+0.027
 floors+0.027
Neg logit contributions
 promise-0.027
 forgive-0.026
 better-0.026
 agree-0.024
use-0.024
Token The
Token ID383
Feature activation+8.90e-03

Pos logit contributions
lf+0.009
aces+0.009
 walls+0.008
 floors+0.008
 towers+0.008
Neg logit contributions
 forgive-0.010
 promise-0.010
 better-0.009
 agree-0.009
use-0.009
Token pirate
Token ID25868
Feature activation-3.06e-03

Pos logit contributions
 promise+0.015
 forgive+0.014
 better+0.013
use+0.013
 agree+0.013
Neg logit contributions
aces-0.018
lf-0.017
 walls-0.016
 towers-0.016
 floors-0.016
Token home
Token ID1363
Feature activation+2.84e-04

Pos logit contributions
 promise+0.011
 forgive+0.010
 better+0.010
use+0.010
 agree+0.010
Neg logit contributions
aces-0.010
lf-0.010
 walls-0.009
 towers-0.009
 floors-0.009
Token,
Token ID11
Feature activation+4.09e-03

Pos logit contributions
 promise+0.001
 forgive+0.000
 better+0.000
 agree+0.000
use+0.000
Neg logit contributions
aces-0.001
lf-0.000
anes-0.000
 towers-0.000
 walls-0.000
Token knowing
Token ID6970
Feature activation-1.27e-02

Pos logit contributions
 promise+0.006
 better+0.006
 forgive+0.006
 agree+0.006
 happier+0.005
Neg logit contributions
aces-0.009
lf-0.008
anes-0.008
 floors-0.008
 walls-0.008
Token that
Token ID326
Feature activation+9.46e-03

Pos logit contributions
aces+0.018
lf+0.018
 walls+0.017
 towers+0.017
 floors+0.016
Neg logit contributions
 promise-0.027
 forgive-0.026
 better-0.025
use-0.024
 agree-0.024
Token their
Token ID511
Feature activation-1.58e-02

Pos logit contributions
 promise+0.017
 better+0.016
 forgive+0.016
 agree+0.015
use+0.015
Neg logit contributions
aces-0.017
lf-0.017
 floors-0.015
anes-0.015
 walls-0.015
Token thick
Token ID6546
Feature activation-4.19e-02

Pos logit contributions
aces+0.030
lf+0.029
anes+0.029
umbled+0.027
 floors+0.027
Neg logit contributions
 promise-0.029
 forgive-0.026
 better-0.026
 agree-0.024
 sip-0.023
Token chamber
Token ID11847
Feature activation-2.26e-02

Pos logit contributions
anes+0.072
aces+0.071
lf+0.070
umbled+0.067
 floors+0.065
Neg logit contributions
 promise-0.082
 forgive-0.074
 better-0.072
use-0.070
 sip-0.067
Token was
Token ID373
Feature activation+5.18e-03

Pos logit contributions
aces+0.040
lf+0.039
anes+0.038
 towers+0.038
 floors+0.037
Neg logit contributions
 promise-0.043
 better-0.039
 forgive-0.039
 agree-0.035
use-0.035
Token always
Token ID1464
Feature activation+7.71e-03

Pos logit contributions
 promise+0.010
 forgive+0.009
 better+0.009
use+0.009
 agree+0.008
Neg logit contributions
aces-0.009
lf-0.009
 walls-0.008
 towers-0.008
 floors-0.008
Token a
Token ID257
Feature activation+2.82e-03

Pos logit contributions
 promise+0.014
use+0.013
 forgive+0.013
 sip+0.013
 agree+0.013
Neg logit contributions
lf-0.013
aces-0.013
 walls-0.012
 towers-0.012
 floors-0.012
Token place
Token ID1295
Feature activation-1.03e-02

Pos logit contributions
use+0.005
ackers+0.004
 sip+0.004
 agree+0.004
 rinse+0.004
Neg logit contributions
lf-0.005
 walls-0.005
 floors-0.004
 towers-0.004
 sculptures-0.004