TOP ACTIVATIONS
MAX = 0.096

at him with a mis chie vous grin . She said
and replied with a mis chie vous gl int in his
and the frog suggested catching tad p oles in a nearby
old lady with a mis chie vous tw inkle in her
glad I could help the tad p oles ". <|endoftext|> One
," and handed him the five grapes . Jimmy
small boy with a mis chie vous tw inkle in his
With a mis chie vous gl int in her
day she decided to reach even higher and tried to grab
to see the vegetable growing even closer , so mum and
deer wanted to find an even stronger fighter than himself .
thought she could make an even higher score and then it
went on with his mis chie vous ways , causing problems
So the princess only added five apples to her basket and
you can stretch and jump even higher !" Tim
look different and some are even colorful ?" Dad replied ,
but then she noticed an even bigger one . It was
, we can build an even bigger tower together !" And
had to use the words ' Please ' and ' Thank
, 3 , 4 , 5 , 6 and 7 sp

MIN ACTIVATIONS
MIN = -0.092

and fell . The ash tr ay flew out of her
not . Lucy and her t eddy bear went back home
anymore . Tim and his t eddy bear were very happy
day , Lily and her t eddy bear went for a
fell out and the ash tr ay broke into many pieces
, it caused the ash tr ay to scatter all over
Lily accidentally dropped the ash tr ay and it broke into
together until it was time for Snow ball to take a
hit her mom 's ash tr ay . The ash tr
tr ay . The ash tr ay fell down and broke
day , Lily and her t eddy bear went on an
L enny saw an ash tr ay on the side of
day , Lucy and her t eddy bear went for a
ambulance . It had a red light and a s iren
She saw the broken ash tr ay and the mess on
bear . Lucy and her t eddy bear lived in a
big scary monster ! The t eddy bear and his friends
. He drops the ash tr ay . It breaks .
aking . He had a red kay ak that he loved
stuck there forever . The t eddy bear wished he had

INTERVAL 0.049 to 0.096
CONTAINS 0.062%

but Ben had only a few . Lily was happy ,
, so he asked his dad for help .
he was always trying to be the one who won .
ay on and add some squ igg les . The boy
ider . Lily counted to ten and then looked for Tom

INTERVAL 0.002 to 0.049
CONTAINS 44.436%

". The creature gigg led and said , "
silly faces in her mirror ever again . <|endoftext|> Once upon
cake . They followed their mom inside , leaving the huge
. Red felt very happy and successful because he
of toy animals and she liked to skip around the garden

INTERVAL -0.045 to 0.002
CONTAINS 55.372%

happy and laughed a lot . They even used their tape
paper , paint , and brushes , and set up a
by . He saw the snow man and the jewel .
and Pat were out in the garden playing .
couch , the table and the chairs . Their

INTERVAL -0.092 to -0.045
CONTAINS 0.130%

One day , Tim my 's mom my told him they
tri pped and fell . Her t eddy bear went flying
to get the fan if Br idget was careful . Br
kept flowing , it ruined T oot 's home and all
and frustrated . She wanted her t eddy bear back .
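
The four INTERVAL ranges above look like equal-width bins spanning MIN (-0.092) to MAX (0.096), and the CONTAINS figures look like the share of token activations falling in each bin. That reading is an assumption, since the dump does not say how the buckets were built. The sketch below reproduces the bin edges under that assumption; the helper names `bin_edges` and `bucket_shares` are hypothetical, and the four sample activations are taken from the per-token entries for " at", " him", " mis" and "chie".

```python
from bisect import bisect_right

# Values copied from the dump above.
MIN_ACT = -0.092
MAX_ACT = 0.096
N_BINS = 4  # assumption: four equal-width buckets between MIN and MAX

# CONTAINS percentages as listed, ordered from the lowest bucket to the highest.
REPORTED_SHARES = [0.130, 55.372, 44.436, 0.062]


def bin_edges(lo, hi, n):
    """Return n + 1 equally spaced edges from lo to hi, rounded to 3 decimals."""
    width = (hi - lo) / n
    return [round(lo + i * width, 3) for i in range(n + 1)]


def bucket_shares(activations, edges):
    """Return the percentage of activations that fall into each bucket.

    Buckets are half-open [edge_i, edge_i+1), except the last one, which
    also includes its upper edge. Out-of-range values are clamped to the
    first or last bucket.
    """
    n = len(edges) - 1
    counts = [0] * n
    for a in activations:
        idx = bisect_right(edges, a) - 1
        idx = max(0, min(idx, n - 1))
        counts[idx] += 1
    total = len(activations)
    return [100.0 * c / total for c in counts]


edges = bin_edges(MIN_ACT, MAX_ACT, N_BINS)

# Four feature activations from the per-token entries in this dump.
sample = [-1.62e-02, 1.56e-02, 6.80e-02, 9.56e-02]
shares = bucket_shares(sample, edges)

print("edges:", edges)
print("sample shares (%):", shares)
print("reported shares sum (%):", round(sum(REPORTED_SHARES), 3))
```

Under this reading the computed edges coincide with the listed interval boundaries, and the listed percentages add up to 100%, which is consistent with the buckets partitioning all token activations.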
Token at
Token ID379
Feature activation-1.62e-02

Pos logit contributions
smart+0.061
 respectful+0.059
 disl+0.058
 cooperative+0.057
 facts+0.057
Neg logit contributions
 once-0.020
 doorstep-0.018
 t-0.016
 were-0.016
 re-0.016
Token him
Token ID683
Feature activation+1.56e-02

Pos logit contributions
 once+0.016
 were+0.014
 t+0.014
 doorstep+0.014
 all+0.014
Neg logit contributions
 respectful-0.037
smart-0.036
 cooperative-0.036
 disl-0.034
 facts-0.034
Token with
Token ID351
Feature activation-5.51e-03

Pos logit contributions
 disl+0.041
smart+0.040
 respectful+0.039
 cooperative+0.039
 facts+0.038
Neg logit contributions
 once-0.013
 doorstep-0.010
 t-0.009
 all-0.009
 were-0.008
Token a
Token ID257
Feature activation+4.53e-03

Pos logit contributions
 once+0.006
 were+0.006
 t+0.006
 doorstep+0.006
 re+0.006
Neg logit contributions
smart-0.011
 respectful-0.011
 cooperative-0.011
 disl-0.010
 facts-0.010
Token mis
Token ID2984
Feature activation+6.80e-02

Pos logit contributions
smart+0.009
 respectful+0.008
 disl+0.008
 cooperative+0.008
 facts+0.008
Neg logit contributions
 once-0.005
 t-0.005
 doorstep-0.005
 were-0.005
 baby-0.005
Tokenchie
Token ID3043
Feature activation+9.56e-02

Pos logit contributions
 disl+0.120
 facts+0.116
smart+0.113
 respectful+0.111
 cooperative+0.106
Neg logit contributions
 doorstep-0.086
el-0.084
 dot-0.080
 once-0.080
eled-0.079
Tokenvous
Token ID31222
Feature activation+7.37e-03

Pos logit contributions
 cooperative+0.141
smart+0.137
 disl+0.134
 respectful+0.131
 wisely+0.130
Neg logit contributions
 once-0.160
 were-0.144
 cloud-0.133
he-0.133
 was-0.133
Token grin
Token ID16770
Feature activation+1.82e-02

Pos logit contributions
smart+0.016
 disl+0.015
 cooperative+0.015
 respectful+0.015
 facts+0.015
Neg logit contributions
 t-0.007
 once-0.007
 re-0.007
 baby-0.007
 doorstep-0.007
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
 respectful+0.038
 disl+0.037
smart+0.037
 cooperative+0.036
 facts+0.036
Neg logit contributions
 once-0.022
 t-0.020
 were-0.019
 re-0.019
 doorstep-0.018
Token She
Token ID1375
Feature activation+9.87e-03

Pos logit contributions
 respectful+0.029
 cooperative+0.027
leness+0.026
smart+0.026
 rewarding+0.026
Neg logit contributions
 Brown-0.012
 once-0.011
 Mom-0.010
 Bobby-0.010
 were-0.010
Token said
Token ID531
Feature activation+3.15e-02

Pos logit contributions
smart+0.021
 respectful+0.021
 disl+0.020
 cooperative+0.020
 facts+0.020
Neg logit contributions
 doorstep-0.009
 once-0.009
 t-0.008
 re-0.008
 were-0.008
Token and
Token ID290
Feature activation-4.40e-03

Pos logit contributions
 once+0.006
 doorstep+0.005
 t+0.005
 re+0.005
 were+0.004
Neg logit contributions
smart-0.015
 respectful-0.015
 disl-0.015
 facts-0.014
 cooperative-0.014
Token replied
Token ID8712
Feature activation+2.07e-02

Pos logit contributions
 once+0.004
 doorstep+0.003
 t+0.003
 re+0.003
 were+0.003
Neg logit contributions
smart-0.010
 respectful-0.010
 disl-0.010
 facts-0.010
 cooperative-0.010
Token with
Token ID351
Feature activation-6.27e-03

Pos logit contributions
 facts+0.052
 disl+0.051
 respectful+0.051
 cooperative+0.051
smart+0.050
Neg logit contributions
 once-0.015
 all-0.013
 t-0.013
 re-0.012
 were-0.011
Token a
Token ID257
Feature activation+3.76e-03

Pos logit contributions
 once+0.007
 t+0.007
 were+0.006
 all+0.006
 doorstep+0.006
Neg logit contributions
smart-0.013
 respectful-0.012
 disl-0.012
 cooperative-0.012
 facts-0.012
Token mis
Token ID2984
Feature activation+6.51e-02

Pos logit contributions
smart+0.007
 disl+0.007
 facts+0.007
 respectful+0.007
 cooperative+0.007
Neg logit contributions
 t-0.004
 re-0.004
 baby-0.004
 doorstep-0.004
 towel-0.004
Tokenchie
Token ID3043
Feature activation+9.48e-02

Pos logit contributions
 disl+0.134
 facts+0.130
 respectful+0.117
smart+0.115
 cooperative+0.113
Neg logit contributions
be-0.062
amb-0.059
el-0.055
eled-0.054
 doorstep-0.054
Tokenvous
Token ID31222
Feature activation+8.25e-03

Pos logit contributions
 cooperative+0.143
 disl+0.139
smart+0.136
 respectful+0.133
 facts+0.132
Neg logit contributions
 once-0.149
 were-0.141
 was-0.129
 cloud-0.125
he-0.124
Token gl
Token ID1278
Feature activation+3.09e-02

Pos logit contributions
smart+0.017
 disl+0.017
 cooperative+0.017
 respectful+0.017
 facts+0.016
Neg logit contributions
 t-0.009
 re-0.008
 once-0.008
 baby-0.008
 cloud-0.007
Tokenint
Token ID600
Feature activation+6.06e-03

Pos logit contributions
 disl+0.055
 cooperative+0.055
 facts+0.054
 respectful+0.054
smart+0.054
Neg logit contributions
 once-0.037
 doorstep-0.036
amb-0.035
el-0.034
ay-0.033
Token in
Token ID287
Feature activation-6.37e-03

Pos logit contributions
 disl+0.013
 cooperative+0.013
 respectful+0.012
 rewarding+0.012
 facts+0.011
Neg logit contributions
 once-0.008
el-0.007
 were-0.007
 re-0.007
 t-0.006
Token his
Token ID465
Feature activation+2.16e-03

Pos logit contributions
 once+0.007
 t+0.006
 doorstep+0.006
 re+0.006
 were+0.006
Neg logit contributions
 respectful-0.014
 cooperative-0.013
smart-0.013
 disl-0.013
 facts-0.013
Token and
Token ID290
Feature activation+2.65e-02

Pos logit contributions
smart+0.039
 disl+0.038
 cooperative+0.038
 respectful+0.038
reetings+0.037
Neg logit contributions
 once-0.013
 all-0.012
 were-0.011
 t-0.010
 just-0.009
Token the
Token ID262
Feature activation+2.00e-02

Pos logit contributions
smart+0.058
 respectful+0.056
 disl+0.055
 cooperative+0.054
 facts+0.052
Neg logit contributions
 once-0.027
 were-0.025
 doorstep-0.024
 t-0.024
 all-0.023
Token frog
Token ID21264
Feature activation+2.89e-02

Pos logit contributions
smart+0.051
 disl+0.045
 respectful+0.044
 cooperative+0.043
 facts+0.043
Neg logit contributions
 baby-0.018
 once-0.018
 t-0.017
 family-0.017
 re-0.016
Token suggested
Token ID5220
Feature activation+2.75e-02

Pos logit contributions
smart+0.067
 disl+0.065
 respectful+0.065
 cooperative+0.063
 facts+0.061
Neg logit contributions
 once-0.027
 were-0.026
 doorstep-0.023
 t-0.022
 re-0.021
Token catching
Token ID16508
Feature activation+6.08e-02

Pos logit contributions
smart+0.071
reetings+0.067
 disl+0.066
 respectful+0.065
 cooperative+0.065
Neg logit contributions
 once-0.024
 all-0.022
 were-0.021
 just-0.018
 both-0.018
Token tad
Token ID36264
Feature activation+8.96e-02

Pos logit contributions
 respectful+0.154
 cooperative+0.150
smart+0.148
 disl+0.146
reetings+0.143
Neg logit contributions
 all-0.060
 once-0.058
 were-0.044
 both-0.042
 duck-0.041
Tokenp
Token ID79
Feature activation+4.36e-02

Pos logit contributions
 cooperative+0.176
 disl+0.171
 respectful+0.166
smart+0.165
 dishonest+0.162
Neg logit contributions
 once-0.104
 all-0.094
 just-0.088
 both-0.082
 baby-0.079
Tokenoles
Token ID4316
Feature activation+4.54e-02

Pos logit contributions
 cooperative+0.102
 respectful+0.101
 disl+0.093
smart+0.092
 dishonest+0.091
Neg logit contributions
 once-0.049
 all-0.045
 were-0.042
 duck-0.038
 both-0.037
Token in
Token ID287
Feature activation+1.52e-02

Pos logit contributions
 disl+0.104
 cooperative+0.103
 respectful+0.102
smart+0.099
 facts+0.095
Neg logit contributions
 all-0.046
 once-0.045
 were-0.039
 both-0.036
 t-0.035
Token a
Token ID257
Feature activation+3.60e-02

Pos logit contributions
 disl+0.036
 respectful+0.036
 cooperative+0.036
aring+0.035
smart+0.035
Neg logit contributions
 once-0.014
 all-0.012
 were-0.011
 t-0.011
 baby-0.010
Token nearby
Token ID6716
Feature activation+3.36e-02

Pos logit contributions
 disl+0.077
smart+0.077
 facts+0.074
 cooperative+0.073
 respectful+0.073
Neg logit contributions
 blanket-0.038
 doorstep-0.036
 t-0.036
 re-0.035
 plane-0.035
Token old
Token ID1468
Feature activation+6.22e-03

Pos logit contributions
 once+0.001
 doorstep+0.001
 baby+0.001
 re+0.001
 t+0.001
Neg logit contributions
smart-0.001
 disl-0.001
 cooperative-0.001
 respectful-0.001
 facts-0.001
Token lady
Token ID10846
Feature activation+3.22e-02

Pos logit contributions
smart+0.016
 disl+0.014
 cooperative+0.014
 rewarding+0.013
leness+0.013
Neg logit contributions
 once-0.006
 t-0.005
 balloon-0.005
 baby-0.005
 plane-0.005
Token with
Token ID351
Feature activation-1.32e-02

Pos logit contributions
 disl+0.073
 cooperative+0.072
 respectful+0.072
 facts+0.067
leness+0.067
Neg logit contributions
 once-0.034
 were-0.029
 t-0.027
 all-0.027
 baby-0.025
Token a
Token ID257
Feature activation-1.98e-02

Pos logit contributions
 t+0.014
 once+0.014
 re+0.013
 were+0.013
 all+0.013
Neg logit contributions
 cooperative-0.027
 respectful-0.027
smart-0.027
 disl-0.026
 facts-0.025
Token mis
Token ID2984
Feature activation+6.09e-02

Pos logit contributions
 t+0.023
 doorstep+0.022
 re+0.022
 baby+0.021
 once+0.021
Neg logit contributions
smart-0.041
 disl-0.038
 cooperative-0.038
 respectful-0.037
 facts-0.037
Tokenchie
Token ID3043
Feature activation+8.93e-02

Pos logit contributions
 disl+0.125
 facts+0.120
 respectful+0.109
 cooperative+0.107
smart+0.107
Neg logit contributions
be-0.063
el-0.059
eled-0.057
 doorstep-0.057
amb-0.055
Tokenvous
Token ID31222
Feature activation-8.46e-03

Pos logit contributions
 cooperative+0.140
 disl+0.132
smart+0.129
 wisely+0.129
 respectful+0.127
Neg logit contributions
 once-0.135
 were-0.125
 set-0.116
 was-0.116
he-0.113
Token tw
Token ID665
Feature activation+2.12e-02

Pos logit contributions
 t+0.009
 once+0.008
 re+0.008
 baby+0.008
 doorstep+0.008
Neg logit contributions
smart-0.018
 cooperative-0.018
 disl-0.017
 respectful-0.017
 facts-0.017
Tokeninkle
Token ID19894
Feature activation-2.49e-02

Pos logit contributions
smart+0.051
 respectful+0.050
 cooperative+0.049
 facts+0.049
 disl+0.049
Neg logit contributions
 once-0.015
el-0.014
 doorstep-0.014
 re-0.013
 t-0.012
Token in
Token ID287
Feature activation-3.64e-03

Pos logit contributions
 once+0.021
 re+0.019
 t+0.018
 were+0.017
 baby+0.017
Neg logit contributions
 respectful-0.059
smart-0.059
 disl-0.058
 cooperative-0.058
 facts-0.057
Token her
Token ID607
Feature activation-2.35e-03

Pos logit contributions
 once+0.004
 t+0.003
 re+0.003
el+0.003
 baby+0.003
Neg logit contributions
 respectful-0.008
 cooperative-0.008
smart-0.008
 facts-0.008
 disl-0.007
Token glad
Token ID9675
Feature activation-1.79e-02

Pos logit contributions
 facts+0.016
smart+0.016
 disl+0.015
 respectful+0.015
 cooperative+0.014
Neg logit contributions
 once-0.011
 all-0.009
 finally-0.009
 t-0.009
 baby-0.009
Token I
Token ID314
Feature activation+1.39e-02

Pos logit contributions
 once+0.022
 t+0.019
 re+0.019
 all+0.019
 baby+0.019
Neg logit contributions
smart-0.036
 disl-0.035
 respectful-0.035
 cooperative-0.034
 facts-0.033
Token could
Token ID714
Feature activation+2.29e-04

Pos logit contributions
smart+0.030
 respectful+0.029
 disl+0.029
 cooperative+0.028
 facts+0.027
Neg logit contributions
 doorstep-0.013
 once-0.013
 re-0.011
 t-0.011
 baby-0.011
Token help
Token ID1037
Feature activation+2.51e-03

Pos logit contributions
smart+0.000
 respectful+0.000
 disl+0.000
 cooperative+0.000
 facts+0.000
Neg logit contributions
 once-0.000
 doorstep-0.000
 t-0.000
 were-0.000
 re-0.000
Token the
Token ID262
Feature activation+1.54e-02

Pos logit contributions
smart+0.005
 disl+0.005
 respectful+0.005
 cooperative+0.005
 facts+0.005
Neg logit contributions
 once-0.003
 doorstep-0.003
 all-0.003
 t-0.003
 baby-0.003
Token tad
Token ID36264
Feature activation+8.65e-02

Pos logit contributions
smart+0.034
 disl+0.031
 respectful+0.030
 cooperative+0.029
 facts+0.029
Neg logit contributions
 doorstep-0.018
 baby-0.016
 once-0.016
 t-0.016
 family-0.015
Tokenp
Token ID79
Feature activation+1.47e-02

Pos logit contributions
smart+0.161
 disl+0.160
 facts+0.156
 cooperative+0.154
 respectful+0.148
Neg logit contributions
 once-0.099
 all-0.086
 just-0.085
 baby-0.084
 cloud-0.083
Tokenoles
Token ID4316
Feature activation+2.36e-02

Pos logit contributions
smart+0.030
 respectful+0.030
 cooperative+0.029
 disl+0.029
 facts+0.028
Neg logit contributions
 once-0.017
 doorstep-0.015
 cloud-0.015
 were-0.014
 all-0.014
Token".
Token ID1911
Feature activation-3.72e-03

Pos logit contributions
smart+0.049
 disl+0.048
 facts+0.048
 respectful+0.047
 cooperative+0.047
Neg logit contributions
 once-0.027
 all-0.025
 were-0.024
 doorstep-0.024
 t-0.023
Token<|endoftext|>
Token ID50256
Feature activation+2.16e-02

Pos logit contributions
 once+0.004
 doorstep+0.003
 were+0.003
 Brown+0.003
 t+0.003
Neg logit contributions
 respectful-0.008
smart-0.008
 disl-0.008
 cooperative-0.008
 facts-0.008
TokenOne
Token ID3198
Feature activation+1.89e-02

Pos logit contributions
 respectful+0.046
smart+0.045
 disl+0.044
 cooperative+0.043
 facts+0.042
Neg logit contributions
 once-0.025
 doorstep-0.022
 were-0.021
 t-0.020
 re-0.020
Token,"
Token ID553
Feature activation+7.74e-03

Pos logit contributions
 facts+0.074
 disl+0.073
 cooperative+0.071
 respectful+0.070
smart+0.067
Neg logit contributions
 once-0.029
 all-0.027
 set-0.025
 house-0.023
 finally-0.023
Token and
Token ID290
Feature activation+4.19e-03

Pos logit contributions
smart+0.018
 respectful+0.017
 disl+0.017
 facts+0.016
 cooperative+0.016
Neg logit contributions
 once-0.008
 were-0.007
 t-0.007
 doorstep-0.006
 baby-0.006
Token handed
Token ID10158
Feature activation-1.67e-02

Pos logit contributions
smart+0.009
 respectful+0.009
 disl+0.008
 cooperative+0.008
 facts+0.008
Neg logit contributions
 doorstep-0.004
 once-0.004
 t-0.004
 re-0.004
 were-0.004
Token him
Token ID683
Feature activation+3.48e-02

Pos logit contributions
 doorstep+0.018
 dot+0.016
 once+0.015
 re+0.015
burn+0.015
Neg logit contributions
smart-0.033
 respectful-0.032
 facts-0.030
 cooperative-0.030
 disl-0.029
Token the
Token ID262
Feature activation+5.43e-02

Pos logit contributions
 respectful+0.086
 cooperative+0.084
 disl+0.084
smart+0.083
 facts+0.081
Neg logit contributions
 once-0.033
 all-0.031
 both-0.026
 t-0.026
 were-0.025
Token five
Token ID1936
Feature activation+8.29e-02

Pos logit contributions
smart+0.112
 disl+0.104
 respectful+0.098
 cooperative+0.095
 facts+0.092
Neg logit contributions
 once-0.073
 all-0.069
 were-0.067
 baby-0.066
 t-0.066
Token grapes
Token ID36656
Feature activation+6.97e-03

Pos logit contributions
 disl+0.197
 respectful+0.183
 cooperative+0.180
smart+0.179
reetings+0.175
Neg logit contributions
 once-0.083
 all-0.080
 were-0.070
 baby-0.064
 both-0.063
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
 disl+0.014
smart+0.014
 respectful+0.014
 cooperative+0.013
 facts+0.013
Neg logit contributions
 once-0.009
 were-0.008
 doorstep-0.008
 all-0.008
 t-0.008
Token
Token ID198
Feature activation-1.36e-02

Pos logit contributions
 once+0.011
 doorstep+0.011
 t+0.010
 were+0.010
 re+0.009
Neg logit contributions
smart-0.022
 respectful-0.022
 disl-0.021
 cooperative-0.020
 facts-0.020
Token
Token ID198
Feature activation-5.44e-03

Pos logit contributions
 doorstep+0.018
 once+0.017
 t+0.016
 re+0.016
 were+0.015
Neg logit contributions
smart-0.025
 respectful-0.024
 disl-0.023
 cooperative-0.023
 facts-0.022
TokenJimmy
Token ID40335
Feature activation+1.81e-02

Pos logit contributions
 doorstep+0.007
 once+0.007
 t+0.006
 re+0.006
 were+0.006
Neg logit contributions
smart-0.010
 respectful-0.010
 disl-0.009
 cooperative-0.009
 facts-0.009
Token small
Token ID1402
Feature activation-1.21e-03

Pos logit contributions
smart+0.001
 respectful+0.001
 disl+0.001
 cooperative+0.001
 facts+0.001
Neg logit contributions
 doorstep-0.001
 once-0.001
 t-0.001
 re-0.001
 were-0.000
Token boy
Token ID2933
Feature activation+1.01e-02

Pos logit contributions
 doorstep+0.001
 once+0.001
 t+0.001
 re+0.001
 were+0.001
Neg logit contributions
smart-0.003
 respectful-0.003
 disl-0.002
 cooperative-0.002
 facts-0.002
Token with
Token ID351
Feature activation-1.92e-02

Pos logit contributions
smart+0.019
 respectful+0.019
 disl+0.018
 cooperative+0.018
 facts+0.018
Neg logit contributions
 once-0.013
 doorstep-0.012
 t-0.011
 re-0.011
 were-0.011
Token a
Token ID257
Feature activation-2.36e-02

Pos logit contributions
 once+0.025
 doorstep+0.024
 t+0.023
 were+0.023
 re+0.022
Neg logit contributions
smart-0.036
 respectful-0.035
 disl-0.034
 cooperative-0.033
 facts-0.032
Token mis
Token ID2984
Feature activation+5.00e-02

Pos logit contributions
 doorstep+0.030
 once+0.030
 t+0.027
 re+0.027
 were+0.027
Neg logit contributions
smart-0.044
 respectful-0.043
 disl-0.041
 cooperative-0.040
 facts-0.040
Tokenchie
Token ID3043
Feature activation+8.28e-02

Pos logit contributions
smart+0.075
 disl+0.075
 respectful+0.074
 facts+0.071
 cooperative+0.069
Neg logit contributions
 doorstep-0.078
 once-0.076
el-0.072
 re-0.071
 t-0.070
Tokenvous
Token ID31222
Feature activation-9.42e-03

Pos logit contributions
 cooperative+0.115
 respectful+0.111
smart+0.110
 disl+0.108
 facts+0.105
Neg logit contributions
 once-0.144
 were-0.134
he-0.128
 doorstep-0.128
 cloud-0.126
Token tw
Token ID665
Feature activation+1.67e-02

Pos logit contributions
 once+0.010
 doorstep+0.009
 t+0.009
 re+0.009
 baby+0.009
Neg logit contributions
smart-0.020
 respectful-0.019
 disl-0.019
 cooperative-0.019
 facts-0.018
Tokeninkle
Token ID19894
Feature activation-2.50e-02

Pos logit contributions
smart+0.036
 respectful+0.035
 cooperative+0.034
 disl+0.034
 facts+0.034
Neg logit contributions
 once-0.016
 doorstep-0.016
 re-0.014
 t-0.014
el-0.014
Token in
Token ID287
Feature activation-1.01e-02

Pos logit contributions
 doorstep+0.021
 once+0.020
 t+0.018
 were+0.017
 re+0.017
Neg logit contributions
smart-0.057
 respectful-0.056
 disl-0.054
 cooperative-0.053
 facts-0.052
Token his
Token ID465
Feature activation-1.11e-02

Pos logit contributions
 once+0.013
 doorstep+0.013
 t+0.013
 re+0.012
 were+0.012
Neg logit contributions
 respectful-0.019
smart-0.019
 cooperative-0.018
 disl-0.017
 facts-0.017
Token
Token ID198
Feature activation-1.28e-03

Pos logit contributions
 respectful+0.046
smart+0.045
 disl+0.044
 facts+0.043
 cooperative+0.043
Neg logit contributions
 once-0.019
 were-0.016
 doorstep-0.015
he-0.014
 Brown-0.013
Token
Token ID198
Feature activation+7.80e-03

Pos logit contributions
 once+0.001
 doorstep+0.001
 were+0.001
he+0.001
 t+0.001
Neg logit contributions
smart-0.003
 respectful-0.003
 disl-0.003
 cooperative-0.003
 facts-0.003
TokenWith
Token ID3152
Feature activation+5.86e-03

Pos logit contributions
smart+0.014
 facts+0.014
 respectful+0.013
 disl+0.013
 cooperative+0.013
Neg logit contributions
 doorstep-0.010
 once-0.010
 were-0.010
 finally-0.009
he-0.009
Token a
Token ID257
Feature activation+6.50e-03

Pos logit contributions
smart+0.012
 respectful+0.011
 disl+0.011
 cooperative+0.011
 facts+0.010
Neg logit contributions
 once-0.007
 t-0.007
 were-0.006
 doorstep-0.006
 re-0.006
Token mis
Token ID2984
Feature activation+5.70e-02

Pos logit contributions
smart+0.012
 respectful+0.012
 disl+0.012
 cooperative+0.011
 facts+0.011
Neg logit contributions
 doorstep-0.008
 once-0.008
 t-0.008
 re-0.008
 were-0.008
Tokenchie
Token ID3043
Feature activation+8.22e-02

Pos logit contributions
 disl+0.094
smart+0.090
 respectful+0.089
 facts+0.089
 cooperative+0.084
Neg logit contributions
 doorstep-0.079
 once-0.074
el-0.073
eled-0.071
 dot-0.070
Tokenvous
Token ID31222
Feature activation+8.24e-03

Pos logit contributions
 disl+0.111
smart+0.111
 cooperative+0.110
 respectful+0.106
 facts+0.105
Neg logit contributions
 once-0.138
 were-0.131
 doorstep-0.124
 set-0.120
he-0.119
Token gl
Token ID1278
Feature activation+2.44e-02

Pos logit contributions
smart+0.018
 disl+0.017
 respectful+0.017
 cooperative+0.016
 facts+0.016
Neg logit contributions
 t-0.009
 re-0.008
 once-0.008
 baby-0.008
 doorstep-0.008
Tokenint
Token ID600
Feature activation+6.75e-03

Pos logit contributions
smart+0.039
 respectful+0.039
 cooperative+0.039
 disl+0.039
 facts+0.038
Neg logit contributions
 doorstep-0.034
 once-0.034
el-0.031
amb-0.030
 t-0.030
Token in
Token ID287
Feature activation+2.68e-03

Pos logit contributions
 disl+0.015
 cooperative+0.015
 respectful+0.015
 rewarding+0.013
 facts+0.013
Neg logit contributions
 once-0.007
el-0.007
 were-0.007
 re-0.006
 airplane-0.006
Token her
Token ID607
Feature activation+5.80e-04

Pos logit contributions
 respectful+0.006
 cooperative+0.006
smart+0.006
 disl+0.006
 facts+0.005
Neg logit contributions
 once-0.003
 t-0.003
 were-0.003
 doorstep-0.003
 all-0.002
Token day
Token ID1110
Feature activation+1.58e-02

Pos logit contributions
smart+0.043
 respectful+0.042
 disl+0.041
 cooperative+0.040
 facts+0.039
Neg logit contributions
 once-0.019
 doorstep-0.018
 were-0.016
 t-0.016
 re-0.016
Token she
Token ID673
Feature activation-2.07e-04

Pos logit contributions
 respectful+0.036
smart+0.036
 cooperative+0.035
 disl+0.035
 facts+0.034
Neg logit contributions
 once-0.015
 were-0.014
 t-0.013
 re-0.013
 baby-0.012
Token decided
Token ID3066
Feature activation+3.03e-02

Pos logit contributions
 doorstep+0.000
eddy+0.000
 once+0.000
ifier+0.000
burn+0.000
Neg logit contributions
smart-0.000
 respectful-0.000
 disl-0.000
 cooperative-0.000
 facts-0.000
Token to
Token ID284
Feature activation+6.04e-03

Pos logit contributions
smart+0.081
 cooperative+0.078
 facts+0.075
leness+0.075
 disl+0.074
Neg logit contributions
 once-0.022
 all-0.020
 just-0.018
 finally-0.017
 couldn-0.016
Token reach
Token ID3151
Feature activation+2.42e-02

Pos logit contributions
smart+0.011
 respectful+0.010
 disl+0.010
 cooperative+0.010
 facts+0.010
Neg logit contributions
 doorstep-0.008
 once-0.008
 t-0.007
 were-0.007
 re-0.007
Token even
Token ID772
Feature activation+8.00e-02

Pos logit contributions
 disl+0.064
 respectful+0.064
 cooperative+0.063
smart+0.061
aring+0.061
Neg logit contributions
 once-0.022
 all-0.021
 just-0.015
 t-0.015
 both-0.014
Token higher
Token ID2440
Feature activation+2.75e-02

Pos logit contributions
 disl+0.164
smart+0.151
 facts+0.148
 cooperative+0.140
vern+0.136
Neg logit contributions
 once-0.110
 all-0.103
 set-0.093
 both-0.092
 t-0.091
Token and
Token ID290
Feature activation+5.49e-02

Pos logit contributions
 disl+0.067
smart+0.066
 respectful+0.064
 cooperative+0.063
rets+0.061
Neg logit contributions
 once-0.028
 all-0.022
 were-0.020
el-0.019
 t-0.018
Token tried
Token ID3088
Feature activation+4.10e-02

Pos logit contributions
smart+0.102
 facts+0.094
 disl+0.091
 cooperative+0.088
 respectful+0.086
Neg logit contributions
 once-0.082
 finally-0.078
 were-0.075
 all-0.075
 set-0.072
Token to
Token ID284
Feature activation+1.97e-02

Pos logit contributions
 disl+0.110
 cooperative+0.106
smart+0.103
leness+0.101
rets+0.101
Neg logit contributions
 all-0.040
 once-0.038
 just-0.028
 both-0.026
 were-0.024
Token grab
Token ID5552
Feature activation+3.63e-02

Pos logit contributions
smart+0.039
 respectful+0.036
 disl+0.035
 cooperative+0.035
 facts+0.035
Neg logit contributions
 doorstep-0.025
 once-0.025
 t-0.023
 were-0.022
 re-0.022
Token to
Token ID284
Feature activation+1.48e-02

Pos logit contributions
smart+0.067
 respectful+0.066
 disl+0.066
 cooperative+0.064
reetings+0.061
Neg logit contributions
 once-0.034
 all-0.031
 were-0.030
 t-0.028
 re-0.027
Token see
Token ID766
Feature activation+3.81e-02

Pos logit contributions
smart+0.029
 respectful+0.028
 disl+0.027
 cooperative+0.026
 facts+0.026
Neg logit contributions
 doorstep-0.018
 once-0.018
 t-0.016
 were-0.016
 re-0.016
Token the
Token ID262
Feature activation+1.78e-02

Pos logit contributions
 respectful+0.077
 disl+0.075
smart+0.074
 cooperative+0.074
 facts+0.069
Neg logit contributions
 once-0.047
 all-0.045
 t-0.044
 were-0.043
 re-0.043
Token vegetable
Token ID20236
Feature activation+2.64e-02

Pos logit contributions
smart+0.032
 respectful+0.029
 disl+0.029
 cooperative+0.027
 facts+0.026
Neg logit contributions
 once-0.026
 doorstep-0.025
 t-0.024
 re-0.024
 were-0.024
Token growing
Token ID3957
Feature activation+2.86e-02

Pos logit contributions
 disl+0.058
smart+0.057
 respectful+0.055
 cooperative+0.053
 facts+0.051
Neg logit contributions
 once-0.031
 all-0.029
 were-0.028
 t-0.026
 re-0.026
Token even
Token ID772
Feature activation+7.97e-02

Pos logit contributions
 disl+0.076
 respectful+0.073
 cooperative+0.070
smart+0.070
 facts+0.068
Neg logit contributions
 once-0.029
 all-0.027
el-0.020
 were-0.020
 just-0.019
Token closer
Token ID5699
Feature activation+2.09e-02

Pos logit contributions
 disl+0.174
vern+0.160
 facts+0.157
smart+0.157
aring+0.153
Neg logit contributions
 all-0.100
 once-0.100
 set-0.087
 t-0.080
 just-0.080
Token,
Token ID11
Feature activation-3.69e-04

Pos logit contributions
 disl+0.052
smart+0.049
 cooperative+0.049
 respectful+0.048
rets+0.048
Neg logit contributions
 once-0.024
 all-0.020
 were-0.018
 just-0.017
 t-0.015
Token so
Token ID523
Feature activation+8.85e-03

Pos logit contributions
 once+0.000
 were+0.000
 t+0.000
 all+0.000
 re+0.000
Neg logit contributions
smart-0.001
 respectful-0.001
 disl-0.001
 cooperative-0.001
 facts-0.001
Token mum
Token ID25682
Feature activation+1.36e-02

Pos logit contributions
smart+0.017
 respectful+0.016
 disl+0.016
 cooperative+0.015
 facts+0.015
Neg logit contributions
 once-0.012
 were-0.011
 doorstep-0.011
 all-0.011
 mom-0.011
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
smart+0.032
 respectful+0.031
 disl+0.031
 cooperative+0.030
 facts+0.029
Neg logit contributions
 once-0.011
 doorstep-0.011
 were-0.010
 t-0.010
 re-0.010
Token deer
Token ID20096
Feature activation-2.06e-04

Pos logit contributions
 doorstep+0.044
 least+0.040
 tub+0.039
burn+0.039
 reprint+0.038
Neg logit contributions
 respectful-0.050
smart-0.049
 facts-0.049
 disl-0.047
 questions-0.046
Token wanted
Token ID2227
Feature activation+9.45e-03

Pos logit contributions
 doorstep+0.000
 once+0.000
 re+0.000
 t+0.000
burn+0.000
Neg logit contributions
smart-0.000
 respectful-0.000
 disl-0.000
 cooperative-0.000
 facts-0.000
Token to
Token ID284
Feature activation+1.26e-02

Pos logit contributions
 disl+0.022
smart+0.022
 respectful+0.022
 cooperative+0.022
leness+0.022
Neg logit contributions
 once-0.009
 all-0.008
 t-0.008
 were-0.007
 Brown-0.007
Token find
Token ID1064
Feature activation+8.47e-03

Pos logit contributions
smart+0.027
 respectful+0.026
 cooperative+0.025
 disl+0.024
 facts+0.024
Neg logit contributions
 doorstep-0.013
 once-0.013
 were-0.012
 t-0.012
 re-0.011
Token an
Token ID281
Feature activation+1.01e-02

Pos logit contributions
smart+0.016
 respectful+0.016
 disl+0.016
 cooperative+0.015
 facts+0.015
Neg logit contributions
 once-0.010
 t-0.010
 doorstep-0.010
 re-0.009
 were-0.009
Token even
Token ID772
Feature activation+7.80e-02

Pos logit contributions
 respectful+0.016
smart+0.016
 disl+0.015
 cooperative+0.015
 facts+0.015
Neg logit contributions
 doorstep-0.015
 once-0.014
 were-0.014
 raged-0.013
 re-0.013
Token stronger
Token ID7387
Feature activation+1.63e-02

Pos logit contributions
 disl+0.157
smart+0.146
reetings+0.138
 facts+0.136
leness+0.135
Neg logit contributions
 once-0.094
 all-0.092
 set-0.092
 t-0.089
 just-0.086
Token fighter
Token ID10543
Feature activation+3.79e-02

Pos logit contributions
smart+0.036
 disl+0.033
 respectful+0.033
 cooperative+0.032
 facts+0.031
Neg logit contributions
 once-0.017
 doorstep-0.016
 t-0.015
 re-0.015
 baby-0.015
Token than
Token ID621
Feature activation+2.71e-03

Pos logit contributions
 cooperative+0.076
smart+0.076
 disl+0.075
 respectful+0.075
reetings+0.071
Neg logit contributions
 once-0.046
 t-0.041
 were-0.039
 all-0.038
 airplane-0.038
Token himself
Token ID2241
Feature activation+2.33e-02

Pos logit contributions
 respectful+0.007
 disl+0.006
 cooperative+0.006
smart+0.006
aring+0.006
Neg logit contributions
 all-0.003
 once-0.002
 were-0.002
 both-0.002
 t-0.002
Token.
Token ID13
Feature activation+3.34e-04

Pos logit contributions
 disl+0.055
smart+0.051
 respectful+0.050
 cooperative+0.050
rets+0.049
Neg logit contributions
 once-0.029
 all-0.025
 were-0.024
 t-0.021
 just-0.020
Token thought
Token ID1807
Feature activation+2.35e-02

Pos logit contributions
smart+0.047
 respectful+0.046
 disl+0.045
 cooperative+0.044
 facts+0.043
Neg logit contributions
 once-0.014
 doorstep-0.014
 were-0.013
 t-0.013
 re-0.011
Token she
Token ID673
Feature activation+3.57e-02

Pos logit contributions
 disl+0.055
 respectful+0.053
 cooperative+0.053
smart+0.052
minded+0.051
Neg logit contributions
 were-0.022
 once-0.022
 all-0.020
 couldn-0.018
 t-0.017
Token could
Token ID714
Feature activation+1.56e-02

Pos logit contributions
smart+0.094
 respectful+0.092
 disl+0.090
 cooperative+0.088
 facts+0.086
Neg logit contributions
 once-0.021
 were-0.021
 doorstep-0.018
 t-0.018
 couldn-0.015
Token make
Token ID787
Feature activation+1.41e-02

Pos logit contributions
smart+0.034
 respectful+0.033
 cooperative+0.032
 disl+0.032
 facts+0.032
Neg logit contributions
 once-0.016
 doorstep-0.015
 were-0.014
 t-0.014
 finally-0.013
Token an
Token ID281
Feature activation+2.92e-02

Pos logit contributions
smart+0.030
 respectful+0.030
 cooperative+0.028
 disl+0.028
reetings+0.028
Neg logit contributions
 once-0.017
 baby-0.015
 all-0.015
 t-0.014
 mom-0.014
Token even
Token ID772
Feature activation+7.77e-02

Pos logit contributions
smart+0.048
 respectful+0.047
 cooperative+0.044
 disl+0.043
 facts+0.042
Neg logit contributions
 once-0.042
 doorstep-0.042
 were-0.041
 t-0.040
 re-0.039
Token higher
Token ID2440
Feature activation+2.90e-02

Pos logit contributions
 disl+0.142
smart+0.139
orts+0.129
reetings+0.129
 facts+0.128
Neg logit contributions
 once-0.099
 t-0.093
 all-0.092
 set-0.091
 just-0.089
Token score
Token ID4776
Feature activation+3.57e-02

Pos logit contributions
smart+0.055
 respectful+0.052
 disl+0.051
 cooperative+0.050
 facts+0.046
Neg logit contributions
 once-0.040
 were-0.034
 t-0.034
 doorstep-0.034
 re-0.034
Token and
Token ID290
Feature activation+4.28e-02

Pos logit contributions
 disl+0.084
 respectful+0.081
 cooperative+0.080
smart+0.075
rets+0.075
Neg logit contributions
 once-0.038
 all-0.034
 were-0.033
 just-0.028
 t-0.026
Token then
Token ID788
Feature activation+2.99e-02

Pos logit contributions
smart+0.090
 respectful+0.085
 cooperative+0.083
 disl+0.082
vern+0.081
Neg logit contributions
 were-0.044
 once-0.044
 all-0.041
 finally-0.040
 t-0.039
Token it
Token ID340
Feature activation+1.49e-02

Pos logit contributions
smart+0.063
 respectful+0.060
 cooperative+0.060
 disl+0.060
 facts+0.058
Neg logit contributions
 were-0.032
 once-0.032
 t-0.030
 all-0.030
 finally-0.029
Token went
Token ID1816
Feature activation+3.69e-03

Pos logit contributions
 doorstep+0.003
 once+0.003
 re+0.003
 were+0.003
 t+0.003
Neg logit contributions
smart-0.006
 respectful-0.006
 disl-0.006
 cooperative-0.005
 facts-0.005
Token on
Token ID319
Feature activation-4.17e-03

Pos logit contributions
 disl+0.009
 facts+0.008
 cooperative+0.008
smart+0.008
 respectful+0.008
Neg logit contributions
 once-0.003
 all-0.003
 t-0.003
 doorstep-0.003
 re-0.003
Token with
Token ID351
Feature activation-3.23e-02

Pos logit contributions
 once+0.005
 all+0.004
 t+0.004
 doorstep+0.004
 re+0.004
Neg logit contributions
 disl-0.009
smart-0.009
 cooperative-0.009
 respectful-0.009
reetings-0.008
Token his
Token ID465
Feature activation-8.34e-03

Pos logit contributions
 doorstep+0.043
 once+0.042
 re+0.038
 t+0.038
 were+0.038
Neg logit contributions
smart-0.059
 respectful-0.057
 disl-0.055
 cooperative-0.053
 facts-0.053
Token mis
Token ID2984
Feature activation+5.18e-02

Pos logit contributions
 once+0.007
 doorstep+0.007
 t+0.007
 re+0.006
 were+0.006
Neg logit contributions
smart-0.020
 respectful-0.019
 disl-0.018
 cooperative-0.018
 facts-0.018
Tokenchie
Token ID3043
Feature activation+7.66e-02

Pos logit contributions
 disl+0.079
smart+0.076
 facts+0.075
 respectful+0.072
 cooperative+0.070
Neg logit contributions
 doorstep-0.079
 once-0.073
el-0.073
eled-0.071
 re-0.067
Tokenvous
Token ID31222
Feature activation+4.08e-03

Pos logit contributions
 disl+0.109
 cooperative+0.109
smart+0.106
 facts+0.101
 respectful+0.099
Neg logit contributions
 once-0.131
 were-0.116
 doorstep-0.108
he-0.107
 was-0.104
Token ways
Token ID2842
Feature activation-4.77e-03

Pos logit contributions
smart+0.009
 cooperative+0.009
 respectful+0.009
 disl+0.008
 facts+0.008
Neg logit contributions
 t-0.004
 once-0.004
 baby-0.004
 re-0.004
 were-0.004
Token,
Token ID11
Feature activation+2.56e-03

Pos logit contributions
 once+0.004
 were+0.004
 all+0.003
 doorstep+0.003
 t+0.003
Neg logit contributions
 disl-0.012
 cooperative-0.011
 respectful-0.011
 facts-0.011
smart-0.011
Token causing
Token ID6666
Feature activation-5.88e-03

Pos logit contributions
smart+0.005
 disl+0.005
 cooperative+0.005
 respectful+0.005
 facts+0.005
Neg logit contributions
 once-0.003
 t-0.003
 were-0.003
 doorstep-0.002
 re-0.002
Token problems
Token ID2761
Feature activation-7.83e-04

Pos logit contributions
 once+0.006
 baby+0.006
 all+0.005
 were+0.005
 t+0.005
Neg logit contributions
smart-0.013
 respectful-0.013
 cooperative-0.013
 disl-0.013
reetings-0.012
TokenSo
Token ID2396
Feature activation-2.98e-03

Pos logit contributions
smart+0.003
 facts+0.003
 disl+0.003
 respectful+0.002
 cooperative+0.002
Neg logit contributions
 once-0.002
 were-0.002
he-0.002
 doorstep-0.001
 re-0.001
Token the
Token ID262
Feature activation+8.60e-03

Pos logit contributions
 once+0.004
 doorstep+0.003
 were+0.003
 t+0.003
 re+0.003
Neg logit contributions
smart-0.006
 respectful-0.006
 disl-0.006
 cooperative-0.006
 facts-0.005
Token princess
Token ID21752
Feature activation+1.10e-02

Pos logit contributions
smart+0.020
 disl+0.020
 respectful+0.019
 facts+0.019
 cooperative+0.018
Neg logit contributions
 once-0.008
 t-0.008
 doorstep-0.007
 re-0.007
 baby-0.006
Token only
Token ID691
Feature activation+4.06e-02

Pos logit contributions
smart+0.026
 respectful+0.025
 disl+0.024
 cooperative+0.024
 facts+0.024
Neg logit contributions
 once-0.009
 doorstep-0.009
 were-0.008
 t-0.008
 re-0.008
Token added
Token ID2087
Feature activation+4.75e-02

Pos logit contributions
 disl+0.103
smart+0.101
 respectful+0.098
 facts+0.097
 cooperative+0.096
Neg logit contributions
 were-0.031
 once-0.030
 t-0.026
 set-0.025
 couldn-0.023
Token five
Token ID1936
Feature activation+7.57e-02

Pos logit contributions
 disl+0.117
 respectful+0.114
 cooperative+0.112
smart+0.110
rets+0.107
Neg logit contributions
 all-0.048
 once-0.047
 were-0.042
 t-0.035
 baby-0.035
Token apples
Token ID22514
Feature activation+1.99e-02

Pos logit contributions
 disl+0.170
 respectful+0.160
 cooperative+0.155
smart+0.154
reetings+0.150
Neg logit contributions
 once-0.081
 all-0.081
 were-0.077
 baby-0.069
 t-0.069
Token to
Token ID284
Feature activation+2.58e-02

Pos logit contributions
 disl+0.050
 respectful+0.048
smart+0.047
 cooperative+0.047
rets+0.045
Neg logit contributions
 once-0.019
 were-0.017
 all-0.017
 t-0.013
 both-0.012
Token her
Token ID607
Feature activation+1.45e-02

Pos logit contributions
smart+0.063
 disl+0.062
 respectful+0.062
aring+0.062
 cooperative+0.061
Neg logit contributions
 all-0.023
 once-0.021
 were-0.019
 t-0.019
 baby-0.019
Token basket
Token ID7988
Feature activation+3.77e-03

Pos logit contributions
smart+0.036
 disl+0.034
 respectful+0.033
 cooperative+0.032
 facts+0.032
Neg logit contributions
 once-0.013
 doorstep-0.013
 t-0.012
 baby-0.011
 blanket-0.010
Token and
Token ID290
Feature activation+2.03e-02

Pos logit contributions
 disl+0.009
smart+0.008
 cooperative+0.008
 respectful+0.008
 facts+0.008
Neg logit contributions
 once-0.005
 were-0.004
 all-0.004
 doorstep-0.004
 set-0.003
Token you
Token ID345
Feature activation-4.04e-04

Pos logit contributions
smart+0.001
 respectful+0.001
 disl+0.001
 cooperative+0.001
 facts+0.001
Neg logit contributions
 once-0.000
 doorstep-0.000
 were-0.000
 t-0.000
 re-0.000
Token can
Token ID460
Feature activation-6.43e-03

Pos logit contributions
 doorstep+0.000
 once+0.000
 re+0.000
 t+0.000
eled+0.000
Neg logit contributions
smart-0.001
 respectful-0.001
 disl-0.001
 cooperative-0.001
 facts-0.001
Token stretch
Token ID7539
Feature activation-9.42e-03

Pos logit contributions
 doorstep+0.008
eled+0.008
 once+0.008
eddy+0.007
 suitcase+0.007
Neg logit contributions
 respectful-0.011
smart-0.011
 disl-0.010
 cooperative-0.010
 facts-0.010
Token and
Token ID290
Feature activation+3.14e-02

Pos logit contributions
 once+0.009
 all+0.007
 were+0.007
 t+0.007
 both+0.007
Neg logit contributions
smart-0.022
 disl-0.022
 respectful-0.022
 cooperative-0.021
 facts-0.021
Token jump
Token ID4391
Feature activation-2.41e-03

Pos logit contributions
smart+0.067
 respectful+0.062
 disl+0.061
 facts+0.060
 cooperative+0.060
Neg logit contributions
 once-0.035
 doorstep-0.034
 t-0.032
 were-0.031
 re-0.030
Token even
Token ID772
Feature activation+7.51e-02

Pos logit contributions
 once+0.002
 all+0.002
 t+0.002
 were+0.002
 re+0.002
Neg logit contributions
 respectful-0.006
smart-0.006
 disl-0.006
 cooperative-0.006
 facts-0.006
Token higher
Token ID2440
Feature activation+1.54e-02

Pos logit contributions
 disl+0.138
 facts+0.129
smart+0.128
vern+0.121
rets+0.119
Neg logit contributions
 once-0.114
 all-0.101
 t-0.097
 set-0.096
 both-0.092
Token!"
Token ID2474
Feature activation-2.65e-03

Pos logit contributions
smart+0.034
 disl+0.033
 respectful+0.032
rets+0.031
 cooperative+0.031
Neg logit contributions
 once-0.018
 all-0.014
 re-0.014
 t-0.013
 were-0.013
Token\n
Token ID198
Feature activation-1.42e-02

Pos logit contributions
 once+0.003
 doorstep+0.003
 were+0.003
 t+0.002
 re+0.002
Neg logit contributions
smart-0.006
 respectful-0.006
 disl-0.005
 cooperative-0.005
 facts-0.005
Token\n
Token ID198
Feature activation-1.08e-03

Pos logit contributions
 doorstep+0.016
 once+0.016
 t+0.015
 re+0.015
 were+0.015
Neg logit contributions
smart-0.028
 respectful-0.028
 disl-0.026
 cooperative-0.026
 facts-0.026
TokenTim
Token ID14967
Feature activation+6.38e-03

Pos logit contributions
 once+0.001
 doorstep+0.001
 were+0.001
 t+0.001
eddy+0.001
Neg logit contributions
smart-0.002
 facts-0.002
 respectful-0.002
 disl-0.002
 cooperative-0.002
Token look
Token ID804
Feature activation+2.61e-02

Pos logit contributions
smart+0.011
 respectful+0.011
 disl+0.010
 cooperative+0.010
 facts+0.010
Neg logit contributions
 once-0.005
 doorstep-0.005
 t-0.005
 were-0.005
 re-0.005
Token different
Token ID1180
Feature activation+1.61e-02

Pos logit contributions
smart+0.052
 disl+0.052
 respectful+0.049
 cooperative+0.048
 facts+0.046
Neg logit contributions
 all-0.033
 once-0.032
 t-0.030
 both-0.029
 just-0.029
Token and
Token ID290
Feature activation+4.72e-02

Pos logit contributions
smart+0.038
 respectful+0.037
 disl+0.037
 cooperative+0.035
leness+0.034
Neg logit contributions
 once-0.015
 all-0.013
 t-0.012
 were-0.012
 re-0.012
Token some
Token ID617
Feature activation+4.37e-02

Pos logit contributions
smart+0.098
aring+0.086
reetings+0.086
lad+0.086
orts+0.085
Neg logit contributions
 were-0.056
 all-0.056
 t-0.052
 family-0.048
 both-0.048
Token are
Token ID389
Feature activation+3.46e-02

Pos logit contributions
smart+0.085
 respectful+0.083
 disl+0.082
 cooperative+0.082
minded+0.079
Neg logit contributions
 were-0.054
 all-0.053
 re-0.050
 t-0.049
 family-0.047
Token even
Token ID772
Feature activation+7.51e-02

Pos logit contributions
smart+0.070
 disl+0.068
minded+0.064
lad+0.064
reetings+0.063
Neg logit contributions
 all-0.044
 t-0.039
 once-0.038
 just-0.038
 both-0.038
Token colorful
Token ID20239
Feature activation+1.31e-02

Pos logit contributions
 disl+0.153
smart+0.147
aring+0.144
vern+0.143
lad+0.141
Neg logit contributions
 all-0.083
 just-0.080
 once-0.079
 t-0.078
 even-0.071
Token?"
Token ID1701
Feature activation-2.96e-03

Pos logit contributions
smart+0.028
 respectful+0.028
 disl+0.027
 cooperative+0.026
rets+0.024
Neg logit contributions
 once-0.014
 t-0.013
 re-0.013
 all-0.012
 baby-0.011
Token Dad
Token ID17415
Feature activation+1.17e-02

Pos logit contributions
 once+0.003
 doorstep+0.003
 were+0.003
 t+0.003
 re+0.003
Neg logit contributions
smart-0.006
 respectful-0.006
 cooperative-0.006
 disl-0.006
 facts-0.006
Token replied
Token ID8712
Feature activation+3.27e-02

Pos logit contributions
smart+0.024
 respectful+0.023
 disl+0.023
 cooperative+0.022
 facts+0.022
Neg logit contributions
 once-0.013
 doorstep-0.012
 t-0.011
 were-0.011
 re-0.011
Token,
Token ID11
Feature activation+1.40e-02

Pos logit contributions
 disl+0.085
 respectful+0.082
 facts+0.082
smart+0.081
 cooperative+0.081
Neg logit contributions
 once-0.026
 were-0.020
 all-0.019
 t-0.019
 baby-0.017
Token but
Token ID475
Feature activation-5.12e-03

Pos logit contributions
smart+0.022
 respectful+0.021
 disl+0.021
 cooperative+0.020
reetings+0.020
Neg logit contributions
 were-0.010
 t-0.010
 once-0.010
 all-0.009
 re-0.008
Token then
Token ID788
Feature activation+1.23e-02

Pos logit contributions
 doorstep+0.006
 once+0.006
 re+0.005
 t+0.005
 were+0.005
Neg logit contributions
smart-0.010
 respectful-0.010
 disl-0.009
 cooperative-0.009
 facts-0.009
Token she
Token ID673
Feature activation-2.68e-03

Pos logit contributions
smart+0.022
 respectful+0.021
 disl+0.021
 cooperative+0.020
 facts+0.020
Neg logit contributions
 once-0.018
 were-0.017
 t-0.017
 doorstep-0.016
 baby-0.016
Token noticed
Token ID6810
Feature activation+1.07e-02

Pos logit contributions
 doorstep+0.003
 once+0.003
 re+0.002
 t+0.002
burn+0.002
Neg logit contributions
smart-0.006
 respectful-0.005
 cooperative-0.005
 disl-0.005
 facts-0.005
Token an
Token ID281
Feature activation+2.54e-02

Pos logit contributions
 respectful+0.023
smart+0.022
 cooperative+0.022
 disl+0.022
aring+0.021
Neg logit contributions
 once-0.014
 were-0.013
 all-0.012
 t-0.012
 baby-0.011
Token even
Token ID772
Feature activation+7.49e-02

Pos logit contributions
smart+0.038
 respectful+0.037
 disl+0.035
 cooperative+0.034
 facts+0.033
Neg logit contributions
 doorstep-0.042
 once-0.041
 t-0.039
 re-0.039
 were-0.038
Token bigger
Token ID5749
Feature activation+2.75e-02

Pos logit contributions
 disl+0.153
smart+0.142
reetings+0.135
vern+0.134
lad+0.132
Neg logit contributions
 once-0.090
 all-0.081
 t-0.080
 just-0.079
 set-0.078
Token one
Token ID530
Feature activation+3.12e-02

Pos logit contributions
smart+0.060
 respectful+0.056
 disl+0.056
 cooperative+0.055
 facts+0.051
Neg logit contributions
 once-0.030
 t-0.028
 doorstep-0.027
 were-0.027
 re-0.026
Token.
Token ID13
Feature activation-8.99e-03

Pos logit contributions
 disl+0.069
smart+0.067
 respectful+0.066
 cooperative+0.066
 facts+0.061
Neg logit contributions
 once-0.036
 were-0.032
 doorstep-0.028
 all-0.028
 t-0.028
Token It
Token ID632
Feature activation+1.57e-02

Pos logit contributions
 once+0.012
 doorstep+0.012
 were+0.011
 t+0.011
 baby+0.010
Neg logit contributions
smart-0.017
 respectful-0.017
 disl-0.016
 cooperative-0.016
 facts-0.016
Token was
Token ID373
Feature activation+1.17e-02

Pos logit contributions
smart+0.041
 respectful+0.040
 disl+0.038
 cooperative+0.038
 facts+0.037
Neg logit contributions
 doorstep-0.009
 once-0.009
 t-0.007
 were-0.007
 re-0.007
Token,
Token ID11
Feature activation-7.27e-03

Pos logit contributions
 once+0.004
 t+0.004
 baby+0.004
 doorstep+0.004
 were+0.004
Neg logit contributions
smart-0.011
 respectful-0.010
 disl-0.010
 cooperative-0.010
 facts-0.010
Token we
Token ID356
Feature activation+9.21e-03

Pos logit contributions
 doorstep+0.009
 once+0.009
 t+0.009
 re+0.008
 were+0.008
Neg logit contributions
smart-0.013
 respectful-0.013
 disl-0.012
 cooperative-0.012
 facts-0.012
Token can
Token ID460
Feature activation-1.22e-03

Pos logit contributions
smart+0.017
 respectful+0.016
 disl+0.016
 cooperative+0.015
 facts+0.015
Neg logit contributions
 doorstep-0.012
 once-0.011
 re-0.011
 t-0.010
 blanket-0.010
Token build
Token ID1382
Feature activation+1.94e-02

Pos logit contributions
eled+0.002
 re+0.002
 doorstep+0.002
 suitcase+0.002
 luggage+0.002
Neg logit contributions
smart-0.002
 respectful-0.002
 disl-0.002
 cooperative-0.002
 smarter-0.001
Token an
Token ID281
Feature activation+1.68e-02

Pos logit contributions
smart+0.037
 respectful+0.036
 disl+0.035
 cooperative+0.034
 facts+0.033
Neg logit contributions
 once-0.025
 doorstep-0.023
 t-0.022
 were-0.022
 re-0.021
Token even
Token ID772
Feature activation+7.48e-02

Pos logit contributions
smart+0.026
 respectful+0.026
 cooperative+0.024
 disl+0.023
 complex+0.023
Neg logit contributions
 were-0.025
burn-0.025
 doorstep-0.024
 re-0.024
 raged-0.024
Token bigger
Token ID5749
Feature activation+1.77e-02

Pos logit contributions
 disl+0.121
smart+0.115
 facts+0.110
leness+0.108
 respectful+0.104
Neg logit contributions
 once-0.114
 all-0.110
 t-0.107
 set-0.106
 airplane-0.106
Token tower
Token ID10580
Feature activation+1.53e-02

Pos logit contributions
smart+0.036
 respectful+0.035
 disl+0.034
 cooperative+0.033
 facts+0.033
Neg logit contributions
 doorstep-0.019
 once-0.018
 re-0.017
 were-0.017
 t-0.016
Token together
Token ID1978
Feature activation+4.72e-03

Pos logit contributions
smart+0.031
 respectful+0.031
 disl+0.030
 facts+0.029
 cooperative+0.029
Neg logit contributions
 once-0.018
 doorstep-0.016
 were-0.015
 t-0.015
 all-0.014
Token!"
Token ID2474
Feature activation+1.42e-03

Pos logit contributions
smart+0.009
 disl+0.009
 respectful+0.009
 facts+0.009
 cooperative+0.008
Neg logit contributions
 once-0.006
 were-0.005
 doorstep-0.005
 all-0.005
 t-0.005
Token And
Token ID843
Feature activation+1.16e-02

Pos logit contributions
smart+0.003
 respectful+0.003
 disl+0.003
 cooperative+0.003
 facts+0.003
Neg logit contributions
 once-0.002
 doorstep-0.002
 were-0.002
 t-0.001
 tub-0.001
Token had
Token ID550
Feature activation+9.16e-03

Pos logit contributions
smart+0.051
 respectful+0.045
 facts+0.044
 cooperative+0.043
 disl+0.040
Neg logit contributions
 doorstep-0.043
 once-0.042
 t-0.038
 re-0.037
 all-0.037
Token to
Token ID284
Feature activation+1.90e-03

Pos logit contributions
smart+0.021
 cooperative+0.019
 respectful+0.019
 disl+0.019
aring+0.019
Neg logit contributions
 once-0.010
 were-0.009
 all-0.008
 finally-0.008
 t-0.008
Token use
Token ID779
Feature activation+1.42e-02

Pos logit contributions
smart+0.004
 respectful+0.004
 cooperative+0.004
 facts+0.004
 disl+0.004
Neg logit contributions
 doorstep-0.002
 once-0.002
 t-0.002
 were-0.002
 all-0.002
Token the
Token ID262
Feature activation+1.33e-02

Pos logit contributions
 respectful+0.034
 cooperative+0.033
aring+0.032
 disl+0.032
smart+0.032
Neg logit contributions
 all-0.014
 once-0.013
 t-0.013
 were-0.012
 blanket-0.011
Token words
Token ID2456
Feature activation-5.86e-03

Pos logit contributions
smart+0.029
 disl+0.027
 respectful+0.027
 cooperative+0.026
aring+0.024
Neg logit contributions
 t-0.015
 once-0.015
 doorstep-0.014
 blanket-0.014
 re-0.014
Token '
Token ID705
Feature activation+7.45e-02

Pos logit contributions
 once+0.006
 were+0.006
 all+0.005
 t+0.005
 both+0.005
Neg logit contributions
 disl-0.013
smart-0.013
 respectful-0.013
 cooperative-0.013
 facts-0.012
TokenPlease
Token ID5492
Feature activation+2.76e-02

Pos logit contributions
 respectful+0.142
 disl+0.138
 facts+0.137
 cooperative+0.133
smart+0.125
Neg logit contributions
 once-0.100
he-0.097
 were-0.096
el-0.090
neath-0.086
Token'
Token ID6
Feature activation+1.10e-02

Pos logit contributions
smart+0.059
 cooperative+0.056
 respectful+0.056
 facts+0.056
 disl+0.056
Neg logit contributions
 once-0.029
 were-0.028
 doorstep-0.028
 all-0.027
 t-0.026
Token and
Token ID290
Feature activation+2.82e-02

Pos logit contributions
smart+0.022
 facts+0.021
 respectful+0.021
 disl+0.021
 cooperative+0.021
Neg logit contributions
 once-0.013
 doorstep-0.012
 t-0.011
 all-0.011
 were-0.011
Token '
Token ID705
Feature activation+6.04e-02

Pos logit contributions
smart+0.059
 respectful+0.055
 cooperative+0.055
 facts+0.054
 disl+0.054
Neg logit contributions
 once-0.029
 were-0.029
 doorstep-0.029
 t-0.028
 all-0.027
TokenThank
Token ID10449
Feature activation+2.54e-02

Pos logit contributions
 facts+0.119
 respectful+0.118
 disl+0.114
 cooperative+0.113
smart+0.111
Neg logit contributions
 once-0.073
he-0.070
 were-0.069
neath-0.064
el-0.063
Token,
Token ID11
Feature activation+5.03e-02

Pos logit contributions
 disl+0.153
 respectful+0.146
 cooperative+0.142
smart+0.140
 rewarding+0.139
Neg logit contributions
 once-0.042
 were-0.035
 all-0.033
el-0.028
 just-0.028
Token3
Token ID18
Feature activation+5.37e-02

Pos logit contributions
 disl+0.127
smart+0.115
 respectful+0.113
leness+0.112
 facts+0.112
Neg logit contributions
 once-0.044
 t-0.044
 all-0.041
 set-0.039
 were-0.038
Token,
Token ID11
Feature activation+6.00e-02

Pos logit contributions
 disl+0.145
 respectful+0.142
 cooperative+0.134
smart+0.133
 rewarding+0.129
Neg logit contributions
 once-0.041
 were-0.037
el-0.036
 all-0.033
 both-0.029
Token4
Token ID19
Feature activation+5.39e-02

Pos logit contributions
 disl+0.134
smart+0.122
 respectful+0.119
leness+0.116
 rewarding+0.115
Neg logit contributions
 once-0.068
 t-0.066
 set-0.065
 finally-0.064
 all-0.063
Token,
Token ID11
Feature activation+5.53e-02

Pos logit contributions
 disl+0.145
 respectful+0.138
 cooperative+0.135
 rewarding+0.132
smart+0.131
Neg logit contributions
 once-0.040
 were-0.037
 all-0.035
el-0.030
 both-0.028
Token5
Token ID20
Feature activation+7.41e-02

Pos logit contributions
 disl+0.127
smart+0.115
 respectful+0.114
leness+0.111
 rewarding+0.109
Neg logit contributions
 once-0.060
 t-0.058
 all-0.056
 set-0.056
 finally-0.055
Token,
Token ID11
Feature activation+5.18e-02

Pos logit contributions
 disl+0.195
 respectful+0.168
smart+0.163
leness+0.161
rets+0.156
Neg logit contributions
 all-0.074
 once-0.069
 both-0.062
 were-0.062
 t-0.059
Token6
Token ID21
Feature activation+6.04e-02

Pos logit contributions
 disl+0.121
smart+0.111
 respectful+0.108
leness+0.107
vern+0.105
Neg logit contributions
 once-0.054
 t-0.051
 all-0.050
 set-0.050
 were-0.047
Token and
Token ID290
Feature activation+5.78e-02

Pos logit contributions
 disl+0.152
smart+0.142
 respectful+0.137
 dishonest+0.129
leness+0.129
Neg logit contributions
 once-0.057
 all-0.055
 both-0.051
 just-0.045
el-0.044
Token 7
Token ID767
Feature activation+5.75e-02

Pos logit contributions
 disl+0.127
smart+0.119
 respectful+0.116
reetings+0.115
vern+0.114
Neg logit contributions
 all-0.065
 were-0.064
 once-0.064
 finally-0.062
 both-0.061
Token sp
Token ID599
Feature activation+2.06e-02

Pos logit contributions
 disl+0.137
 respectful+0.130
 cooperative+0.124
smart+0.124
 rewarding+0.119
Neg logit contributions
 were-0.063
 once-0.060
 all-0.055
 t-0.050
 was-0.050
Token and
Token ID290
Feature activation-1.99e-02

Pos logit contributions
 doorstep+0.049
 once+0.041
 re+0.041
 suitcase+0.038
 were+0.038
Neg logit contributions
smart-0.085
 respectful-0.079
 disl-0.074
 cooperative-0.073
 facts-0.072
Token fell
Token ID3214
Feature activation-3.33e-02

Pos logit contributions
 doorstep+0.021
 once+0.018
 re+0.016
eddy+0.016
 destination+0.016
Neg logit contributions
smart-0.043
 respectful-0.040
 disl-0.039
 cooperative-0.037
 facts-0.036
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 once+0.041
 doorstep+0.039
 t+0.037
 were+0.037
 re+0.036
Neg logit contributions
smart-0.064
 respectful-0.063
 disl-0.061
 cooperative-0.060
 facts-0.060
Token The
Token ID383
Feature activation-9.77e-04

Pos logit contributions
 once+0.012
 were+0.011
 Mom+0.011
 Brown+0.011
 t+0.010
Neg logit contributions
 respectful-0.024
smart-0.023
 cooperative-0.023
 facts-0.022
 rewarding-0.021
Token ash
Token ID12530
Feature activation-5.43e-02

Pos logit contributions
 doorstep+0.001
 once+0.001
 t+0.001
 re+0.001
 baby+0.001
Neg logit contributions
smart-0.002
 respectful-0.002
 disl-0.002
 cooperative-0.002
 facts-0.002
Tokentr
Token ID2213
Feature activation-8.40e-02

Pos logit contributions
 doorstep+0.059
 Bobby+0.054
 re+0.052
 once+0.052
eled+0.051
Neg logit contributions
smart-0.102
 respectful-0.093
 disl-0.088
 facts-0.087
 wisely-0.084
Tokenay
Token ID323
Feature activation-4.41e-02

Pos logit contributions
 re+0.106
 doorstep+0.103
 picnic+0.089
 lake+0.086
 family+0.086
Neg logit contributions
smart-0.175
olds-0.130
ways-0.130
 wisely-0.129
 disl-0.129
Token flew
Token ID13112
Feature activation-2.01e-02

Pos logit contributions
 doorstep+0.043
 once+0.039
 re+0.035
 t+0.034
 were+0.033
Neg logit contributions
smart-0.098
 respectful-0.092
 disl-0.090
 cooperative-0.087
 facts-0.087
Token out
Token ID503
Feature activation-2.15e-02

Pos logit contributions
 once+0.017
 doorstep+0.017
 t+0.015
 were+0.015
 re+0.014
Neg logit contributions
smart-0.046
 respectful-0.045
 disl-0.044
 cooperative-0.043
 facts-0.042
Token of
Token ID286
Feature activation-2.49e-02

Pos logit contributions
 once+0.020
 doorstep+0.018
 were+0.017
 t+0.017
 re+0.015
Neg logit contributions
 respectful-0.049
 disl-0.048
smart-0.048
 cooperative-0.046
 facts-0.046
Token her
Token ID607
Feature activation-2.81e-02

Pos logit contributions
 doorstep+0.033
 once+0.032
 re+0.030
 were+0.029
 t+0.029
Neg logit contributions
smart-0.045
 respectful-0.044
 disl-0.042
 cooperative-0.041
 facts-0.040
Token not
Token ID407
Feature activation-6.98e-03

Pos logit contributions
 doorstep+0.044
 once+0.041
eled+0.041
 suitcase+0.040
 re+0.040
Neg logit contributions
smart-0.045
 respectful-0.045
 disl-0.041
 cooperative-0.039
 facts-0.037
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
 doorstep+0.009
 once+0.009
 re+0.008
eled+0.008
 were+0.008
Neg logit contributions
smart-0.013
 respectful-0.013
 disl-0.012
 cooperative-0.011
 facts-0.011
Token Lucy
Token ID22162
Feature activation-1.64e-02

Pos logit contributions
 doorstep+0.018
 once+0.017
 re+0.016
 t+0.016
 were+0.016
Neg logit contributions
smart-0.031
 respectful-0.030
 disl-0.028
 cooperative-0.028
 facts-0.027
Token and
Token ID290
Feature activation-1.55e-02

Pos logit contributions
 doorstep+0.014
 once+0.013
 re+0.012
 t+0.012
 tub+0.011
Neg logit contributions
smart-0.037
 respectful-0.036
 disl-0.034
 cooperative-0.034
 facts-0.033
Token her
Token ID607
Feature activation-5.00e-02

Pos logit contributions
 doorstep+0.025
 once+0.023
eddy+0.022
 tub+0.022
 dot+0.022
Neg logit contributions
smart-0.024
 respectful-0.023
 disl-0.022
 cooperative-0.021
 facts-0.020
Token t
Token ID256
Feature activation-8.20e-02

Pos logit contributions
 doorstep+0.052
neath+0.050
he+0.046
eb+0.045
 raged+0.044
Neg logit contributions
 respectful-0.095
smart-0.090
 questions-0.089
 cooperative-0.086
 disl-0.083
Tokeneddy
Token ID21874
Feature activation-3.34e-02

Pos logit contributions
 blast+0.108
 picnic+0.103
 doorstep+0.094
 finally+0.093
neath+0.092
Neg logit contributions
smart-0.140
and-0.124
ups-0.119
aring-0.119
 respectful-0.116
Token bear
Token ID6842
Feature activation-1.73e-02

Pos logit contributions
 doorstep+0.037
 once+0.028
he+0.027
 tub+0.026
 comet+0.026
Neg logit contributions
smart-0.071
 respectful-0.066
 facts-0.062
 disl-0.060
 cooperative-0.060
Token went
Token ID1816
Feature activation-1.14e-02

Pos logit contributions
 doorstep+0.013
eddy+0.011
 once+0.011
 tub+0.010
 comet+0.010
Neg logit contributions
smart-0.041
 respectful-0.040
 facts-0.037
 disl-0.037
 cooperative-0.037
Token back
Token ID736
Feature activation-9.31e-03

Pos logit contributions
 once+0.012
 doorstep+0.012
 t+0.011
 were+0.011
 re+0.010
Neg logit contributions
smart-0.024
 respectful-0.024
 disl-0.024
 facts-0.023
 cooperative-0.023
Token home
Token ID1363
Feature activation-2.00e-02

Pos logit contributions
 doorstep+0.012
 once+0.012
 t+0.010
 re+0.010
 were+0.010
Neg logit contributions
smart-0.018
 respectful-0.017
 disl-0.017
 cooperative-0.016
 facts-0.016
Token anymore
Token ID7471
Feature activation-1.12e-02

Pos logit contributions
 once+0.012
 doorstep+0.011
 were+0.010
 t+0.010
 re+0.010
Neg logit contributions
smart-0.025
 respectful-0.025
 disl-0.024
 cooperative-0.024
 facts-0.023
Token.
Token ID13
Feature activation-7.03e-03

Pos logit contributions
 once+0.014
 were+0.013
 t+0.012
 doorstep+0.012
 all+0.012
Neg logit contributions
smart-0.022
 respectful-0.022
 disl-0.021
 cooperative-0.021
 facts-0.020
Token Tim
Token ID5045
Feature activation-1.90e-02

Pos logit contributions
 once+0.006
 were+0.005
 doorstep+0.005
 Brown+0.005
 Bobby+0.004
Neg logit contributions
 respectful-0.017
smart-0.017
 cooperative-0.016
 facts-0.016
 disl-0.015
Token and
Token ID290
Feature activation-1.79e-02

Pos logit contributions
 doorstep+0.018
 once+0.016
 re+0.014
 t+0.014
 were+0.014
Neg logit contributions
smart-0.042
 respectful-0.041
 disl-0.038
 cooperative-0.038
 facts-0.037
Token his
Token ID465
Feature activation-4.22e-02

Pos logit contributions
 doorstep+0.026
 once+0.025
 re+0.023
 were+0.022
 tub+0.022
Neg logit contributions
smart-0.031
 respectful-0.030
 disl-0.029
 facts-0.027
 cooperative-0.027
Token t
Token ID256
Feature activation-8.12e-02

Pos logit contributions
 doorstep+0.037
neath+0.034
 raged+0.030
he+0.029
amb+0.027
Neg logit contributions
 respectful-0.089
smart-0.088
 questions-0.084
 cooperative-0.083
 disl-0.082
Tokeneddy
Token ID21874
Feature activation-4.17e-02

Pos logit contributions
 blast+0.106
 picnic+0.099
neath+0.094
 doorstep+0.093
 finally+0.093
Neg logit contributions
smart-0.136
and-0.121
ups-0.115
 respectful-0.113
aring-0.112
Token bear
Token ID6842
Feature activation-1.92e-02

Pos logit contributions
 doorstep+0.046
 once+0.036
he+0.033
 tub+0.033
neath+0.033
Neg logit contributions
smart-0.087
 respectful-0.081
 disl-0.076
 facts-0.076
 questions-0.074
Token were
Token ID547
Feature activation-3.63e-02

Pos logit contributions
 doorstep+0.014
 once+0.012
eddy+0.011
 tub+0.010
 re+0.010
Neg logit contributions
smart-0.046
 respectful-0.045
 disl-0.042
 facts-0.042
 cooperative-0.042
Token very
Token ID845
Feature activation-1.40e-02

Pos logit contributions
 doorstep+0.068
amb+0.058
ifier+0.058
 dot+0.058
 suitcase+0.058
Neg logit contributions
 respectful-0.050
smart-0.047
 cooperative-0.043
 smarter-0.041
 rewarding-0.040
Token happy
Token ID3772
Feature activation-2.06e-02

Pos logit contributions
 once+0.022
 doorstep+0.021
 were+0.021
 t+0.021
 re+0.020
Neg logit contributions
smart-0.022
 respectful-0.021
 facts-0.020
 disl-0.020
 cooperative-0.019
Token day
Token ID1110
Feature activation+4.86e-03

Pos logit contributions
 respectful+0.001
smart+0.001
 disl+0.001
 facts+0.001
 cooperative+0.001
Neg logit contributions
 doorstep-0.001
 once-0.001
 were-0.001
 raged-0.001
 re-0.001
Token,
Token ID11
Feature activation-6.26e-03

Pos logit contributions
smart+0.008
 respectful+0.008
 disl+0.007
 cooperative+0.007
 facts+0.007
Neg logit contributions
 doorstep-0.007
 once-0.007
 re-0.006
eled-0.006
 airplane-0.006
Token Lily
Token ID20037
Feature activation-2.89e-02

Pos logit contributions
 doorstep+0.010
 once+0.010
 re+0.009
el+0.009
 t+0.009
Neg logit contributions
smart-0.009
 respectful-0.009
 disl-0.009
 cooperative-0.008
 facts-0.008
Token and
Token ID290
Feature activation-1.72e-02

Pos logit contributions
 Stev+0.033
 doorstep+0.033
ifier+0.033
ay+0.032
 once+0.031
Neg logit contributions
 respectful-0.049
smart-0.048
 disl-0.046
 cooperative-0.041
minded-0.041
Token her
Token ID607
Feature activation-5.26e-02

Pos logit contributions
 doorstep+0.036
 once+0.033
 dot+0.032
 tub+0.032
 raged+0.032
Neg logit contributions
 respectful-0.018
 disl-0.018
smart-0.017
 cooperative-0.015
 facts-0.015
Token t
Token ID256
Feature activation-8.05e-02

Pos logit contributions
neath+0.061
 doorstep+0.057
 raged+0.057
storm+0.050
he+0.048
Neg logit contributions
 respectful-0.094
 questions-0.093
smart-0.087
 cooperative-0.085
 facts-0.082
Tokeneddy
Token ID21874
Feature activation-4.00e-02

Pos logit contributions
 blast+0.119
 exploded+0.103
 picnic+0.102
neath+0.100
 finally+0.100
Neg logit contributions
smart-0.120
and-0.116
ups-0.105
olds-0.101
 respectful-0.101
Token bear
Token ID6842
Feature activation-2.33e-02

Pos logit contributions
 doorstep+0.048
he+0.038
 once+0.037
 dot+0.036
ifier+0.036
Neg logit contributions
smart-0.076
 respectful-0.070
 questions-0.067
 facts-0.066
 disl-0.064
Token went
Token ID1816
Feature activation-1.06e-02

Pos logit contributions
 doorstep+0.021
eddy+0.020
 comet+0.020
 dot+0.019
icer+0.018
Neg logit contributions
smart-0.049
 respectful-0.046
 questions-0.042
 facts-0.042
 disl-0.041
Token for
Token ID329
Feature activation-3.30e-02

Pos logit contributions
 doorstep+0.014
 once+0.014
 t+0.012
 re+0.012
 were+0.012
Neg logit contributions
smart-0.020
 respectful-0.019
 disl-0.018
 cooperative-0.018
 facts-0.018
Token a
Token ID257
Feature activation-6.64e-03

Pos logit contributions
 doorstep+0.048
 once+0.046
 were+0.042
 re+0.042
 t+0.042
Neg logit contributions
smart-0.056
 respectful-0.054
 disl-0.051
 cooperative-0.050
 facts-0.050
Token fell
Token ID3214
Feature activation-4.55e-02

Pos logit contributions
 doorstep+0.036
eddy+0.032
 once+0.031
amb+0.029
burn+0.027
Neg logit contributions
smart-0.096
 respectful-0.090
 disl-0.083
 cooperative-0.083
 facts-0.082
Token out
Token ID503
Feature activation-2.06e-02

Pos logit contributions
 doorstep+0.057
 once+0.048
 re+0.046
eled+0.044
 comet+0.043
Neg logit contributions
smart-0.090
 respectful-0.081
 cooperative-0.074
 disl-0.073
 facts-0.073
Token and
Token ID290
Feature activation-1.88e-02

Pos logit contributions
 doorstep+0.021
 once+0.019
 re+0.018
 t+0.017
 were+0.017
Neg logit contributions
smart-0.045
 respectful-0.043
 disl-0.041
 cooperative-0.040
 facts-0.040
Token the
Token ID262
Feature activation-9.71e-03

Pos logit contributions
 doorstep+0.020
 once+0.018
 re+0.016
eddy+0.015
el+0.015
Neg logit contributions
smart-0.040
 respectful-0.038
 disl-0.037
 cooperative-0.036
 facts-0.035
Token ash
Token ID12530
Feature activation-5.36e-02

Pos logit contributions
 doorstep+0.010
 once+0.010
 t+0.009
 re+0.008
 were+0.008
Neg logit contributions
smart-0.021
 respectful-0.021
 disl-0.020
 cooperative-0.020
 facts-0.019
Tokentr
Token ID2213
Feature activation-7.91e-02

Pos logit contributions
 doorstep+0.051
 re+0.050
 once+0.049
eled+0.048
 comet+0.047
Neg logit contributions
smart-0.106
 respectful-0.099
 disl-0.093
 facts-0.092
 wisely-0.089
Tokenay
Token ID323
Feature activation-3.51e-02

Pos logit contributions
 re+0.099
 doorstep+0.090
 picnic+0.085
 comet+0.081
 lake+0.081
Neg logit contributions
smart-0.163
ways-0.128
 wisely-0.124
olds-0.120
 complex-0.117
Token broke
Token ID6265
Feature activation-3.68e-02

Pos logit contributions
 doorstep+0.031
 once+0.030
 re+0.026
 t+0.026
 were+0.025
Neg logit contributions
smart-0.079
 respectful-0.076
 disl-0.073
 cooperative-0.072
 facts-0.072
Token into
Token ID656
Feature activation-5.11e-03

Pos logit contributions
 doorstep+0.051
 once+0.049
 re+0.046
 t+0.046
 were+0.045
Neg logit contributions
smart-0.064
 respectful-0.061
 disl-0.058
 cooperative-0.057
 facts-0.056
Token many
Token ID867
Feature activation+2.90e-03

Pos logit contributions
 doorstep+0.006
 once+0.006
 re+0.005
 t+0.005
 were+0.005
Neg logit contributions
smart-0.010
 respectful-0.010
 disl-0.009
 cooperative-0.009
 facts-0.009
Token pieces
Token ID5207
Feature activation-1.77e-02

Pos logit contributions
smart+0.006
 respectful+0.006
 disl+0.006
 cooperative+0.006
 facts+0.005
Neg logit contributions
 doorstep-0.003
 once-0.003
 t-0.003
 re-0.003
 were-0.003
Token,
Token ID11
Feature activation-1.36e-02

Pos logit contributions
 once+0.012
 were+0.011
 t+0.011
 re+0.010
 all+0.010
Neg logit contributions
 respectful-0.028
smart-0.028
 disl-0.027
 cooperative-0.027
 facts-0.026
Token it
Token ID340
Feature activation-2.23e-02

Pos logit contributions
 once+0.015
 doorstep+0.014
 were+0.014
 t+0.014
 re+0.013
Neg logit contributions
smart-0.028
 respectful-0.028
 disl-0.027
 cooperative-0.026
 facts-0.026
Token caused
Token ID4073
Feature activation-2.06e-02

Pos logit contributions
 doorstep+0.022
 once+0.018
eddy+0.018
ifier+0.018
 suitcase+0.017
Neg logit contributions
smart-0.047
 respectful-0.045
 disl-0.043
 cooperative-0.041
 facts-0.040
Token the
Token ID262
Feature activation-1.04e-02

Pos logit contributions
 once+0.023
 were+0.019
 t+0.019
 Brown+0.018
 baby+0.018
Neg logit contributions
 respectful-0.046
smart-0.045
 cooperative-0.044
 disl-0.044
 facts-0.042
Token ash
Token ID12530
Feature activation-3.17e-02

Pos logit contributions
 doorstep+0.011
 once+0.011
 t+0.010
 re+0.010
 were+0.010
Neg logit contributions
smart-0.022
 respectful-0.021
 disl-0.020
 cooperative-0.020
 facts-0.020
Tokentr
Token ID2213
Feature activation-7.85e-02

Pos logit contributions
 doorstep+0.035
 once+0.034
 re+0.031
 t+0.031
 were+0.031
Neg logit contributions
smart-0.064
 respectful-0.062
 disl-0.059
 cooperative-0.058
 facts-0.057
Tokenay
Token ID323
Feature activation-2.64e-02

Pos logit contributions
 re+0.087
 doorstep+0.077
 stuffed+0.069
 comet+0.069
 baby+0.069
Neg logit contributions
smart-0.172
 respectful-0.135
 wisely-0.134
 disl-0.134
aring-0.132
Token to
Token ID284
Feature activation-4.20e-02

Pos logit contributions
 once+0.030
 were+0.026
 doorstep+0.025
 t+0.025
 Brown+0.024
Neg logit contributions
 respectful-0.058
 cooperative-0.055
smart-0.055
 disl-0.054
 facts-0.053
Token scatter
Token ID41058
Feature activation-1.74e-02

Pos logit contributions
 doorstep+0.048
 once+0.047
 re+0.042
 t+0.041
 were+0.041
Neg logit contributions
smart-0.083
 respectful-0.080
 disl-0.078
 cooperative-0.075
 facts-0.073
Token all
Token ID477
Feature activation+7.31e-03

Pos logit contributions
 once+0.018
 were+0.016
 doorstep+0.015
 t+0.015
 all+0.014
Neg logit contributions
 respectful-0.039
smart-0.038
 disl-0.037
 cooperative-0.037
 facts-0.036
Token over
Token ID625
Feature activation-1.20e-02

Pos logit contributions
smart+0.016
 respectful+0.016
 facts+0.016
 cooperative+0.016
 disl+0.016
Neg logit contributions
 once-0.007
 were-0.006
 doorstep-0.006
 t-0.006
 all-0.005
Token Lily
Token ID20037
Feature activation-2.70e-02

Pos logit contributions
 doorstep+0.033
 re+0.031
eled+0.030
 once+0.029
el+0.029
Neg logit contributions
 respectful-0.023
smart-0.023
 disl-0.022
 cooperative-0.020
 facts-0.019
Token accidentally
Token ID14716
Feature activation-1.86e-02

Pos logit contributions
ifier+0.032
 doorstep+0.032
 Stev+0.031
 once+0.029
stack+0.029
Neg logit contributions
 respectful-0.048
smart-0.046
 disl-0.043
 cooperative-0.041
ups-0.038
Token dropped
Token ID5710
Feature activation-1.28e-02

Pos logit contributions
 doorstep+0.016
 once+0.016
 re+0.013
 were+0.013
 t+0.013
Neg logit contributions
smart-0.042
 respectful-0.041
 disl-0.040
 cooperative-0.039
 facts-0.038
Token the
Token ID262
Feature activation-1.72e-02

Pos logit contributions
 once+0.015
 doorstep+0.015
 t+0.014
 were+0.013
 re+0.013
Neg logit contributions
smart-0.025
 respectful-0.025
 disl-0.024
 cooperative-0.023
 facts-0.023
Token ash
Token ID12530
Feature activation-4.30e-02

Pos logit contributions
 doorstep+0.027
 once+0.026
 were+0.025
 re+0.025
eled+0.024
Neg logit contributions
smart-0.025
 respectful-0.024
 disl-0.023
 facts-0.022
 cooperative-0.021
Tokentr
Token ID2213
Feature activation-7.80e-02

Pos logit contributions
 doorstep+0.046
 re+0.045
eled+0.045
 once+0.042
 comet+0.041
Neg logit contributions
smart-0.083
 respectful-0.073
 disl-0.069
 facts-0.068
 wisely-0.067
Tokenay
Token ID323
Feature activation-3.36e-02

Pos logit contributions
 re+0.094
 doorstep+0.085
 picnic+0.080
oyd+0.077
 comet+0.075
Neg logit contributions
smart-0.165
 wisely-0.124
ways-0.124
 complex-0.121
olds-0.119
Token and
Token ID290
Feature activation-2.86e-02

Pos logit contributions
 once+0.034
 doorstep+0.032
 were+0.030
 t+0.029
 re+0.028
Neg logit contributions
 respectful-0.073
smart-0.073
 disl-0.070
 cooperative-0.070
 facts-0.068
Token it
Token ID340
Feature activation-3.08e-02

Pos logit contributions
 doorstep+0.036
 once+0.035
 re+0.032
eddy+0.031
amb+0.030
Neg logit contributions
smart-0.053
 respectful-0.051
 disl-0.049
 cooperative-0.046
 facts-0.045
Token broke
Token ID6265
Feature activation-3.83e-02

Pos logit contributions
 doorstep+0.037
amb+0.034
eddy+0.032
 re+0.032
 suitcase+0.032
Neg logit contributions
smart-0.059
 respectful-0.054
 disl-0.050
 smarter-0.049
 wisely-0.048
Token into
Token ID656
Feature activation-1.47e-02

Pos logit contributions
 doorstep+0.051
 once+0.048
 re+0.046
 were+0.045
 t+0.044
Neg logit contributions
smart-0.069
 respectful-0.066
 disl-0.062
 cooperative-0.062
 facts-0.059
Token together
Token ID1978
Feature activation-2.69e-03

Pos logit contributions
 doorstep+0.015
el+0.014
eled+0.014
burn+0.013
 dot+0.013
Neg logit contributions
smart-0.017
 respectful-0.016
 cooperative-0.015
 facts-0.014
 disl-0.014
Token until
Token ID1566
Feature activation-5.55e-02

Pos logit contributions
 doorstep+0.003
eddy+0.003
burn+0.003
eled+0.003
 tub+0.003
Neg logit contributions
smart-0.005
 respectful-0.005
 facts-0.005
 cooperative-0.005
 questions-0.004
Token it
Token ID340
Feature activation-2.40e-02

Pos logit contributions
 doorstep+0.100
amb+0.083
sey+0.081
 dot+0.080
eddy+0.076
Neg logit contributions
smart-0.066
 smarter-0.060
 respectful-0.059
 disl-0.054
 risky-0.053
Token was
Token ID373
Feature activation-4.44e-02

Pos logit contributions
 suitcase+0.032
eddy+0.031
 motor+0.031
ifier+0.031
 doorstep+0.031
Neg logit contributions
smart-0.034
 respectful-0.033
 smarter-0.032
ups-0.030
 risky-0.029
Token time
Token ID640
Feature activation-5.04e-03

Pos logit contributions
 doorstep+0.080
amb+0.070
 re+0.067
eddy+0.067
 dot+0.066
Neg logit contributions
 respectful-0.057
smart-0.055
 rewarding-0.050
 smarter-0.049
 risky-0.047
Token for
Token ID329
Feature activation-7.79e-02

Pos logit contributions
 once+0.007
 doorstep+0.007
 t+0.006
 were+0.006
 re+0.006
Neg logit contributions
smart-0.009
 respectful-0.009
 disl-0.009
 cooperative-0.008
 facts-0.008
Token Snow
Token ID7967
Feature activation-1.68e-02

Pos logit contributions
 doorstep+0.094
 dot+0.083
eled+0.081
neath+0.077
he+0.076
Neg logit contributions
smart-0.144
 respectful-0.135
 disl-0.126
 facts-0.124
 cooperative-0.122
Tokenball
Token ID1894
Feature activation-1.35e-02

Pos logit contributions
 once+0.018
 doorstep+0.018
 t+0.016
 were+0.016
 re+0.016
Neg logit contributions
smart-0.035
 respectful-0.034
 disl-0.033
 cooperative-0.033
 facts-0.032
Token to
Token ID284
Feature activation-2.60e-02

Pos logit contributions
 once+0.020
 doorstep+0.019
 were+0.018
 t+0.018
 re+0.017
Neg logit contributions
smart-0.023
 respectful-0.023
 disl-0.023
 cooperative-0.022
 facts-0.022
Token take
Token ID1011
Feature activation-3.00e-02

Pos logit contributions
 doorstep+0.022
 once+0.022
eddy+0.019
eled+0.019
 dot+0.019
Neg logit contributions
smart-0.056
 respectful-0.055
 disl-0.053
 cooperative-0.051
 facts-0.051
Token a
Token ID257
Feature activation-4.44e-03

Pos logit contributions
 doorstep+0.035
 once+0.034
 t+0.031
 re+0.031
 were+0.031
Neg logit contributions
smart-0.059
 respectful-0.058
 disl-0.055
 cooperative-0.054
 facts-0.054
Token hit
Token ID2277
Feature activation-1.79e-02

Pos logit contributions
 once+0.020
 doorstep+0.019
 t+0.018
 were+0.017
 re+0.017
Neg logit contributions
smart-0.056
 respectful-0.055
 disl-0.052
 cooperative-0.052
 facts-0.052
Token her
Token ID607
Feature activation-2.00e-02

Pos logit contributions
 once+0.020
 t+0.017
 all+0.017
 baby+0.017
 Brown+0.016
Neg logit contributions
 respectful-0.042
 cooperative-0.041
smart-0.040
 disl-0.039
 facts-0.038
Token mom
Token ID1995
Feature activation-2.44e-03

Pos logit contributions
 doorstep+0.019
 once+0.018
 were+0.017
 re+0.017
eled+0.016
Neg logit contributions
smart-0.041
 respectful-0.041
 disl-0.039
 cooperative-0.038
 facts-0.037
Token's
Token ID338
Feature activation-7.82e-03

Pos logit contributions
 once+0.002
 doorstep+0.002
 t+0.002
 were+0.002
el+0.002
Neg logit contributions
 respectful-0.005
 disl-0.005
 facts-0.005
 cooperative-0.005
smart-0.005
Token ash
Token ID12530
Feature activation-3.88e-02

Pos logit contributions
 doorstep+0.007
 once+0.006
 t+0.006
 re+0.005
 baby+0.005
Neg logit contributions
smart-0.018
 respectful-0.018
 disl-0.017
 cooperative-0.017
 facts-0.017
Tokentr
Token ID2213
Feature activation-7.71e-02

Pos logit contributions
 doorstep+0.035
 once+0.034
 re+0.032
 were+0.031
 t+0.029
Neg logit contributions
smart-0.084
 respectful-0.080
 disl-0.076
 cooperative-0.074
 facts-0.074
Tokenay
Token ID323
Feature activation-3.42e-02

Pos logit contributions
 re+0.084
 doorstep+0.076
 baby+0.066
 picnic+0.066
 comet+0.065
Neg logit contributions
smart-0.172
 wisely-0.134
 disl-0.133
 respectful-0.131
aring-0.129
Token.
Token ID13
Feature activation-1.70e-02

Pos logit contributions
 once+0.039
 were+0.034
 t+0.034
 doorstep+0.033
 Brown+0.031
Neg logit contributions
 respectful-0.074
smart-0.072
 cooperative-0.071
 disl-0.071
 facts-0.069
Token The
Token ID383
Feature activation-1.14e-02

Pos logit contributions
 once+0.021
 doorstep+0.020
 were+0.019
 t+0.018
 baby+0.018
Neg logit contributions
smart-0.034
 respectful-0.034
 facts-0.032
 cooperative-0.032
 disl-0.031
Token ash
Token ID12530
Feature activation-5.45e-02

Pos logit contributions
 doorstep+0.011
 once+0.011
 t+0.010
 baby+0.010
 re+0.009
Neg logit contributions
smart-0.026
 respectful-0.025
 disl-0.024
 cooperative-0.024
 facts-0.023
Tokentr
Token ID2213
Feature activation-7.70e-02

Pos logit contributions
 doorstep+0.057
 re+0.054
eled+0.051
 tub+0.051
 once+0.051
Neg logit contributions
smart-0.104
 respectful-0.096
 disl-0.091
 facts-0.087
 wisely-0.086
Tokentr
Token ID2213
Feature activation-7.71e-02

Pos logit contributions
 doorstep+0.035
 once+0.034
 re+0.032
 were+0.031
 t+0.029
Neg logit contributions
smart-0.084
 respectful-0.080
 disl-0.076
 cooperative-0.074
 facts-0.074
Tokenay
Token ID323
Feature activation-3.42e-02

Pos logit contributions
 re+0.084
 doorstep+0.076
 baby+0.066
 picnic+0.066
 comet+0.065
Neg logit contributions
smart-0.172
 wisely-0.134
 disl-0.133
 respectful-0.131
aring-0.129
Token.
Token ID13
Feature activation-1.70e-02

Pos logit contributions
 once+0.039
 were+0.034
 t+0.034
 doorstep+0.033
 Brown+0.031
Neg logit contributions
 respectful-0.074
smart-0.072
 cooperative-0.071
 disl-0.071
 facts-0.069
Token The
Token ID383
Feature activation-1.14e-02

Pos logit contributions
 once+0.021
 doorstep+0.020
 were+0.019
 t+0.018
 baby+0.018
Neg logit contributions
smart-0.034
 respectful-0.034
 facts-0.032
 cooperative-0.032
 disl-0.031
Token ash
Token ID12530
Feature activation-5.45e-02

Pos logit contributions
 doorstep+0.011
 once+0.011
 t+0.010
 baby+0.010
 re+0.009
Neg logit contributions
smart-0.026
 respectful-0.025
 disl-0.024
 cooperative-0.024
 facts-0.023
Tokentr
Token ID2213
Feature activation-7.70e-02

Pos logit contributions
 doorstep+0.057
 re+0.054
eled+0.051
 tub+0.051
 once+0.051
Neg logit contributions
smart-0.104
 respectful-0.096
 disl-0.091
 facts-0.087
 wisely-0.086
Tokenay
Token ID323
Feature activation-4.08e-02

Pos logit contributions
 re+0.089
 doorstep+0.085
 picnic+0.073
 lake+0.071
 family+0.069
Neg logit contributions
smart-0.169
 disl-0.130
 wisely-0.129
 respectful-0.126
 complex-0.126
Token fell
Token ID3214
Feature activation-5.53e-02

Pos logit contributions
 doorstep+0.028
 once+0.028
 re+0.023
 t+0.023
 were+0.022
Neg logit contributions
smart-0.100
 respectful-0.097
 disl-0.094
 cooperative-0.092
 facts-0.091
Token down
Token ID866
Feature activation-3.45e-02

Pos logit contributions
 doorstep+0.064
 once+0.054
 re+0.053
eled+0.050
 were+0.049
Neg logit contributions
smart-0.113
 respectful-0.104
 disl-0.096
 cooperative-0.095
 facts-0.093
Token and
Token ID290
Feature activation-3.24e-02

Pos logit contributions
 doorstep+0.031
 once+0.030
 re+0.026
 t+0.026
 were+0.026
Neg logit contributions
smart-0.078
 respectful-0.076
 disl-0.073
 cooperative-0.072
 facts-0.071
Token broke
Token ID6265
Feature activation-3.74e-02

Pos logit contributions
 doorstep+0.042
 once+0.039
 re+0.036
eddy+0.034
 were+0.034
Neg logit contributions
smart-0.061
 respectful-0.058
 disl-0.056
 cooperative-0.054
 facts-0.052
Token day
Token ID1110
Feature activation+5.66e-03

Pos logit contributions
 doorstep+0.003
 once+0.002
 destination+0.002
 raged+0.002
 suitcase+0.002
Neg logit contributions
 respectful-0.003
smart-0.003
 disl-0.003
 cooperative-0.003
 facts-0.003
Token,
Token ID11
Feature activation-6.02e-03

Pos logit contributions
smart+0.010
 respectful+0.010
 disl+0.009
 cooperative+0.009
 facts+0.009
Neg logit contributions
 doorstep-0.008
 once-0.007
 re-0.007
 t-0.007
 airplane-0.007
Token Lily
Token ID20037
Feature activation-2.90e-02

Pos logit contributions
 doorstep+0.011
 once+0.010
 re+0.010
 were+0.009
eled+0.009
Neg logit contributions
smart-0.008
 respectful-0.008
 disl-0.008
 cooperative-0.007
 facts-0.007
Token and
Token ID290
Feature activation-1.78e-02

Pos logit contributions
 doorstep+0.036
 Stev+0.035
ifier+0.035
ay+0.034
stack+0.034
Neg logit contributions
 respectful-0.047
smart-0.045
 disl-0.043
 cooperative-0.039
able-0.038
Token her
Token ID607
Feature activation-4.52e-02

Pos logit contributions
 doorstep+0.039
 once+0.035
 dot+0.035
 raged+0.034
eled+0.034
Neg logit contributions
 respectful-0.018
smart-0.017
 disl-0.017
 cooperative-0.014
 facts-0.014
Token t
Token ID256
Feature activation-7.69e-02

Pos logit contributions
 doorstep+0.049
neath+0.048
 raged+0.045
 once+0.041
eled+0.039
Neg logit contributions
 respectful-0.086
 questions-0.082
 cooperative-0.079
smart-0.078
 disl-0.074
Tokeneddy
Token ID21874
Feature activation-3.94e-02

Pos logit contributions
 blast+0.105
 picnic+0.097
neath+0.092
 doorstep+0.092
 finally+0.091
Neg logit contributions
smart-0.120
and-0.113
ups-0.110
olds-0.103
 respectful-0.103
Token bear
Token ID6842
Feature activation-2.25e-02

Pos logit contributions
 doorstep+0.050
he+0.039
 once+0.037
neath+0.036
 destination+0.036
Neg logit contributions
smart-0.073
 respectful-0.068
 questions-0.064
 facts-0.063
 disl-0.061
Token went
Token ID1816
Feature activation-1.04e-02

Pos logit contributions
 doorstep+0.026
 comet+0.025
eddy+0.024
icer+0.023
 dot+0.022
Neg logit contributions
smart-0.041
 respectful-0.039
 questions-0.035
 facts-0.034
 disl-0.033
Token on
Token ID319
Feature activation-1.73e-02

Pos logit contributions
 doorstep+0.014
 once+0.014
 were+0.013
 re+0.013
 t+0.013
Neg logit contributions
smart-0.019
 respectful-0.018
 disl-0.017
 cooperative-0.017
 facts-0.016
Token an
Token ID281
Feature activation-2.39e-03

Pos logit contributions
 doorstep+0.028
eled+0.025
 dot+0.025
 once+0.025
 were+0.025
Neg logit contributions
smart-0.025
 respectful-0.025
 facts-0.023
 disl-0.023
 cooperative-0.022
Token L
Token ID406
Feature activation-1.39e-02

Pos logit contributions
 doorstep+0.025
 once+0.024
 re+0.023
 t+0.022
 were+0.022
Neg logit contributions
smart-0.022
 respectful-0.021
 disl-0.020
 cooperative-0.019
 facts-0.019
Tokenenny
Token ID11870
Feature activation-9.76e-03

Pos logit contributions
 doorstep+0.024
 once+0.024
 t+0.022
 re+0.022
 were+0.022
Neg logit contributions
smart-0.020
 respectful-0.019
 disl-0.018
 cooperative-0.017
 facts-0.017
Token saw
Token ID2497
Feature activation-6.45e-03

Pos logit contributions
 doorstep+0.009
 once+0.009
 re+0.007
 dot+0.007
burn+0.007
Neg logit contributions
smart-0.021
 respectful-0.021
 disl-0.020
 cooperative-0.019
 facts-0.019
Token an
Token ID281
Feature activation-4.74e-03

Pos logit contributions
 once+0.009
 doorstep+0.008
 t+0.008
 re+0.008
 were+0.008
Neg logit contributions
smart-0.012
 respectful-0.011
 disl-0.011
 cooperative-0.011
 facts-0.011
Token ash
Token ID12530
Feature activation-3.38e-02

Pos logit contributions
 once+0.008
 were+0.008
 doorstep+0.008
burn+0.007
 raged+0.007
Neg logit contributions
 respectful-0.007
smart-0.006
 cooperative-0.006
 disl-0.006
 facts-0.005
Tokentr
Token ID2213
Feature activation-7.68e-02

Pos logit contributions
eled+0.033
 doorstep+0.033
 once+0.032
 Bobby+0.030
 were+0.030
Neg logit contributions
smart-0.065
 respectful-0.060
 facts-0.058
 disl-0.055
 cooperative-0.054
Tokenay
Token ID323
Feature activation-2.64e-02

Pos logit contributions
 re+0.106
 picnic+0.094
 doorstep+0.094
 duck+0.091
 stuffed+0.089
Neg logit contributions
smart-0.149
ways-0.110
 wisely-0.104
 complex-0.103
olds-0.103
Token on
Token ID319
Feature activation-3.44e-02

Pos logit contributions
 doorstep+0.024
 once+0.023
 re+0.020
 t+0.020
 were+0.020
Neg logit contributions
smart-0.059
 respectful-0.057
 disl-0.055
 cooperative-0.054
 facts-0.054
Token the
Token ID262
Feature activation-1.41e-02

Pos logit contributions
 doorstep+0.049
 once+0.045
 were+0.043
 re+0.042
eled+0.042
Neg logit contributions
smart-0.057
 respectful-0.054
 disl-0.053
 facts-0.050
 cooperative-0.050
Token side
Token ID1735
Feature activation-9.41e-03

Pos logit contributions
 once+0.015
eled+0.013
 were+0.013
 doorstep+0.013
 raged+0.012
Neg logit contributions
 respectful-0.028
smart-0.027
 disl-0.026
 cooperative-0.026
 facts-0.025
Token of
Token ID286
Feature activation-1.72e-02

Pos logit contributions
 once+0.008
 doorstep+0.007
 were+0.006
el+0.006
 t+0.005
Neg logit contributions
 respectful-0.023
 disl-0.023
smart-0.023
 cooperative-0.022
 facts-0.022
Token day
Token ID1110
Feature activation-2.46e-03

Pos logit contributions
smart+0.022
 respectful+0.021
 disl+0.020
 cooperative+0.020
 facts+0.020
Neg logit contributions
 doorstep-0.012
 once-0.012
 t-0.011
 re-0.010
 were-0.010
Token,
Token ID11
Feature activation-1.47e-02

Pos logit contributions
 doorstep+0.004
 once+0.003
 airplane+0.003
 dot+0.003
 re+0.003
Neg logit contributions
smart-0.004
 respectful-0.004
 disl-0.003
 cooperative-0.003
 facts-0.003
Token Lucy
Token ID22162
Feature activation-2.25e-02

Pos logit contributions
 doorstep+0.024
 dot+0.020
eled+0.020
el+0.020
eddy+0.020
Neg logit contributions
 respectful-0.022
smart-0.021
 disl-0.020
 cooperative-0.019
 facts-0.019
Token and
Token ID290
Feature activation-2.08e-02

Pos logit contributions
 doorstep+0.024
 once+0.022
ifier+0.022
 comet+0.022
 motor+0.022
Neg logit contributions
smart-0.042
 respectful-0.041
 disl-0.037
 cooperative-0.036
 facts-0.035
Token her
Token ID607
Feature activation-4.42e-02

Pos logit contributions
 doorstep+0.044
 dot+0.041
 once+0.040
 tub+0.040
 raged+0.039
Neg logit contributions
 respectful-0.020
smart-0.020
 disl-0.018
 questions-0.017
 facts-0.017
Token t
Token ID256
Feature activation-7.64e-02

Pos logit contributions
neath+0.057
 raged+0.055
 doorstep+0.055
eled+0.050
Uh+0.048
Neg logit contributions
 respectful-0.070
 questions-0.070
 cooperative-0.065
smart-0.058
 smart-0.058
Tokeneddy
Token ID21874
Feature activation-3.83e-02

Pos logit contributions
 blast+0.112
 picnic+0.097
 exploded+0.096
 finally+0.096
neath+0.095
Neg logit contributions
smart-0.111
ups-0.107
and-0.105
ins-0.097
aring-0.096
Token bear
Token ID6842
Feature activation-1.95e-02

Pos logit contributions
 doorstep+0.046
he+0.037
 once+0.036
 dot+0.034
ifier+0.034
Neg logit contributions
smart-0.073
 respectful-0.068
 questions-0.064
 facts-0.063
 disl-0.062
Token went
Token ID1816
Feature activation+1.25e-03

Pos logit contributions
 doorstep+0.016
eddy+0.016
 comet+0.015
 dot+0.015
ifier+0.014
Neg logit contributions
smart-0.041
 respectful-0.040
 facts-0.037
 questions-0.036
 cooperative-0.036
Token for
Token ID329
Feature activation-3.05e-02

Pos logit contributions
smart+0.003
 disl+0.003
 respectful+0.003
 cooperative+0.003
 facts+0.003
Neg logit contributions
 once-0.001
 doorstep-0.001
 t-0.001
 all-0.001
 were-0.001
Token a
Token ID257
Feature activation+8.95e-04

Pos logit contributions
 doorstep+0.044
 once+0.042
 were+0.039
 re+0.039
 t+0.039
Neg logit contributions
smart-0.052
 respectful-0.050
 disl-0.048
 cooperative-0.046
 facts-0.046
Token ambulance
Token ID22536
Feature activation-1.43e-02

Pos logit contributions
 once+0.045
 doorstep+0.045
 were+0.041
el+0.040
eled+0.040
Neg logit contributions
 respectful-0.082
smart-0.079
 cooperative-0.075
 disl-0.075
 facts-0.074
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 doorstep+0.022
 once+0.021
 t+0.020
 re+0.020
 were+0.019
Neg logit contributions
smart-0.023
 respectful-0.022
 disl-0.021
 cooperative-0.020
 facts-0.020
Token It
Token ID632
Feature activation-2.83e-02

Pos logit contributions
 once+0.011
 doorstep+0.010
 were+0.010
 t+0.010
 re+0.009
Neg logit contributions
smart-0.023
 respectful-0.023
 disl-0.022
 cooperative-0.022
 facts-0.021
Token had
Token ID550
Feature activation-3.43e-02

Pos logit contributions
 doorstep+0.037
amb+0.033
olk+0.031
ifier+0.031
eddy+0.031
Neg logit contributions
smart-0.054
 respectful-0.052
 disl-0.047
 facts-0.044
 cooperative-0.044
Token a
Token ID257
Feature activation-5.23e-02

Pos logit contributions
 doorstep+0.041
 once+0.041
 t+0.037
 re+0.037
 were+0.037
Neg logit contributions
smart-0.066
 respectful-0.064
 disl-0.062
 cooperative-0.061
 facts-0.060
Token red
Token ID2266
Feature activation-7.54e-02

Pos logit contributions
 once+0.074
 doorstep+0.072
el+0.066
 were+0.066
eled+0.065
Neg logit contributions
 respectful-0.089
smart-0.088
 disl-0.082
 cooperative-0.079
 facts-0.078
Token light
Token ID1657
Feature activation-3.94e-02

Pos logit contributions
 doorstep+0.077
 once+0.077
 raged+0.064
neath+0.064
 were+0.064
Neg logit contributions
smart-0.146
 respectful-0.145
 disl-0.138
 facts-0.130
 questions-0.125
Token and
Token ID290
Feature activation-1.19e-02

Pos logit contributions
 doorstep+0.038
 once+0.037
 were+0.033
 t+0.032
 re+0.032
Neg logit contributions
smart-0.086
 respectful-0.083
 disl-0.080
 cooperative-0.078
 facts-0.078
Token a
Token ID257
Feature activation-4.54e-02

Pos logit contributions
 t+0.013
 once+0.012
 were+0.012
 re+0.012
 baby+0.011
Neg logit contributions
smart-0.025
 respectful-0.024
 disl-0.024
 cooperative-0.023
 facts-0.023
Token s
Token ID264
Feature activation+1.01e-02

Pos logit contributions
 doorstep+0.053
 once+0.052
 t+0.047
 re+0.047
 were+0.047
Neg logit contributions
smart-0.090
 respectful-0.087
 disl-0.083
 cooperative-0.082
 facts-0.081
Tokeniren
Token ID24080
Feature activation-2.25e-02

Pos logit contributions
 respectful+0.017
 disl+0.017
 facts+0.016
 rewarding+0.015
 cooperative+0.015
Neg logit contributions
he-0.015
el-0.015
 once-0.014
 doorstep-0.014
ay-0.014
Token She
Token ID1375
Feature activation+4.10e-04

Pos logit contributions
 once+0.004
 doorstep+0.004
 were+0.004
 t+0.003
 baby+0.003
Neg logit contributions
smart-0.005
 respectful-0.005
 cooperative-0.004
 facts-0.004
 disl-0.004
Token saw
Token ID2497
Feature activation-8.79e-03

Pos logit contributions
smart+0.001
 respectful+0.001
 disl+0.001
 cooperative+0.001
 facts+0.001
Neg logit contributions
 doorstep-0.000
 once-0.000
 re-0.000
burn-0.000
 dot-0.000
Token the
Token ID262
Feature activation-2.62e-02

Pos logit contributions
 once+0.010
 doorstep+0.009
 were+0.009
 t+0.009
 re+0.009
Neg logit contributions
smart-0.018
 respectful-0.018
 disl-0.017
 cooperative-0.017
 facts-0.017
Token broken
Token ID5445
Feature activation-3.87e-02

Pos logit contributions
 doorstep+0.036
 once+0.035
amb+0.033
burn+0.033
eled+0.033
Neg logit contributions
 respectful-0.043
smart-0.042
 disl-0.042
 cooperative-0.039
 facts-0.038
Token ash
Token ID12530
Feature activation-3.64e-02

Pos logit contributions
 doorstep+0.054
 once+0.053
 re+0.049
amb+0.049
 raged+0.049
Neg logit contributions
smart-0.062
 respectful-0.060
 disl-0.058
 facts-0.054
 cooperative-0.053
Tokentr
Token ID2213
Feature activation-7.54e-02

Pos logit contributions
 doorstep+0.034
eled+0.032
 once+0.032
 re+0.031
 were+0.029
Neg logit contributions
smart-0.076
 respectful-0.070
 facts-0.067
 disl-0.066
 cooperative-0.064
Tokenay
Token ID323
Feature activation-2.36e-02

Pos logit contributions
 re+0.096
 doorstep+0.084
 picnic+0.083
 duck+0.079
 stuffed+0.079
Neg logit contributions
smart-0.157
ways-0.122
olds-0.117
 wisely-0.114
 complex-0.112
Token and
Token ID290
Feature activation-1.20e-02

Pos logit contributions
 once+0.022
 doorstep+0.021
 t+0.019
 were+0.019
 re+0.019
Neg logit contributions
smart-0.052
 respectful-0.052
 disl-0.050
 cooperative-0.049
 facts-0.048
Token the
Token ID262
Feature activation-1.95e-02

Pos logit contributions
 once+0.011
 were+0.011
 t+0.011
 doorstep+0.011
 re+0.010
Neg logit contributions
smart-0.026
 respectful-0.026
 disl-0.025
 cooperative-0.024
 facts-0.024
Token mess
Token ID2085
Feature activation-1.64e-02

Pos logit contributions
 doorstep+0.021
 once+0.020
 t+0.019
 re+0.018
 were+0.018
Neg logit contributions
smart-0.041
 respectful-0.039
 disl-0.038
 cooperative-0.037
 facts-0.037
Token on
Token ID319
Feature activation-2.66e-02

Pos logit contributions
 once+0.022
 were+0.022
 doorstep+0.021
 t+0.021
 re+0.019
Neg logit contributions
 respectful-0.029
smart-0.029
 disl-0.029
 facts-0.027
 cooperative-0.027
Token bear
Token ID6842
Feature activation+8.62e-03

Pos logit contributions
 doorstep+0.019
 once+0.017
 were+0.015
 re+0.014
eled+0.014
Neg logit contributions
smart-0.059
 respectful-0.058
 disl-0.055
 cooperative-0.055
 facts-0.055
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
smart+0.019
 disl+0.019
 respectful+0.019
 cooperative+0.018
 facts+0.018
Neg logit contributions
 once-0.009
 t-0.008
 doorstep-0.008
 re-0.008
 all-0.007
Token Lucy
Token ID22162
Feature activation-5.36e-03

Pos logit contributions
 doorstep+0.022
 once+0.021
 re+0.020
 were+0.020
 t+0.020
Neg logit contributions
smart-0.021
 respectful-0.020
 disl-0.020
 facts-0.019
 cooperative-0.019
Token and
Token ID290
Feature activation-1.81e-02

Pos logit contributions
 doorstep+0.006
 once+0.006
 dot+0.005
 tub+0.005
 re+0.005
Neg logit contributions
 respectful-0.011
smart-0.011
 disl-0.010
 cooperative-0.010
 facts-0.009
Token her
Token ID607
Feature activation-4.38e-02

Pos logit contributions
 doorstep+0.037
 once+0.034
 dot+0.033
neath+0.032
eled+0.032
Neg logit contributions
 respectful-0.020
smart-0.020
 disl-0.018
 facts-0.018
 cooperative-0.017
Token t
Token ID256
Feature activation-7.54e-02

Pos logit contributions
neath+0.048
 doorstep+0.045
 raged+0.043
eled+0.040
 dot+0.039
Neg logit contributions
 respectful-0.083
 questions-0.080
 cooperative-0.076
 facts-0.074
smart-0.073
Tokeneddy
Token ID21874
Feature activation-3.62e-02

Pos logit contributions
 blast+0.115
 flood+0.103
neath+0.101
 exploded+0.101
 picnic+0.099
Neg logit contributions
smart-0.105
and-0.104
ups-0.093
 respectful-0.089
olds-0.088
Token bear
Token ID6842
Feature activation-6.51e-03

Pos logit contributions
 doorstep+0.043
 once+0.032
he+0.032
neath+0.032
 dot+0.031
Neg logit contributions
smart-0.071
 respectful-0.068
 facts-0.064
 questions-0.063
 disl-0.063
Token lived
Token ID5615
Feature activation-1.80e-02

Pos logit contributions
 doorstep+0.004
 once+0.004
 re+0.003
 t+0.003
 were+0.003
Neg logit contributions
smart-0.016
 respectful-0.016
 disl-0.016
 cooperative-0.015
 facts-0.015
Token in
Token ID287
Feature activation-2.29e-02

Pos logit contributions
 once+0.013
 doorstep+0.011
 t+0.010
 were+0.010
 re+0.010
Neg logit contributions
smart-0.045
 disl-0.043
 respectful-0.043
 facts-0.042
 cooperative-0.041
Token a
Token ID257
Feature activation-3.83e-03

Pos logit contributions
 doorstep+0.035
 once+0.032
 re+0.030
 were+0.030
eled+0.030
Neg logit contributions
smart-0.037
 respectful-0.036
 disl-0.035
 cooperative-0.033
 facts-0.033
Token big
Token ID1263
Feature activation-4.34e-02

Pos logit contributions
 once+0.047
 doorstep+0.046
el+0.043
eled+0.043
 were+0.042
Neg logit contributions
 respectful-0.060
smart-0.059
 disl-0.055
 cooperative-0.054
 facts-0.052
Token scary
Token ID14343
Feature activation-3.24e-02

Pos logit contributions
 doorstep+0.052
eled+0.047
 once+0.047
amb+0.045
 were+0.045
Neg logit contributions
smart-0.083
 respectful-0.082
 disl-0.074
 cooperative-0.072
 facts-0.071
Token monster
Token ID9234
Feature activation-1.86e-02

Pos logit contributions
 doorstep+0.035
 were+0.033
 once+0.032
eled+0.032
neath+0.031
Neg logit contributions
smart-0.063
 respectful-0.062
 disl-0.059
 facts-0.056
 cooperative-0.055
Token!
Token ID0
Feature activation-2.16e-02

Pos logit contributions
 doorstep+0.025
 once+0.024
 were+0.022
 t+0.022
 re+0.022
Neg logit contributions
smart-0.033
 respectful-0.032
 disl-0.030
 cooperative-0.029
 facts-0.029
Token The
Token ID383
Feature activation-3.95e-02

Pos logit contributions
 doorstep+0.031
 once+0.029
 t+0.028
 re+0.028
burn+0.027
Neg logit contributions
smart-0.037
 respectful-0.035
 disl-0.034
 cooperative-0.033
 facts-0.032
Token t
Token ID256
Feature activation-7.53e-02

Pos logit contributions
amb+0.048
neath+0.044
 doorstep+0.042
eb+0.040
icking+0.039
Neg logit contributions
 respectful-0.071
smart-0.068
 questions-0.066
 smarter-0.062
 cooperative-0.062
Tokeneddy
Token ID21874
Feature activation-4.24e-02

Pos logit contributions
 picnic+0.114
 blast+0.112
 finally+0.104
 start+0.102
 set+0.098
Neg logit contributions
smart-0.116
and-0.108
hearted-0.101
aring-0.100
able-0.096
Token bear
Token ID6842
Feature activation-2.59e-02

Pos logit contributions
 doorstep+0.060
 destination+0.054
 Fourth+0.051
 suitcase+0.050
 airport+0.049
Neg logit contributions
smart-0.073
 respectful-0.063
 disl-0.056
 cooperative-0.055
 questions-0.054
Token and
Token ID290
Feature activation-2.03e-02

Pos logit contributions
 doorstep+0.030
eddy+0.029
burn+0.029
ifier+0.028
 tub+0.028
Neg logit contributions
smart-0.050
 respectful-0.047
 disl-0.042
 cooperative-0.041
aring-0.040
Token his
Token ID465
Feature activation-2.76e-02

Pos logit contributions
 t+0.025
 once+0.025
 re+0.024
 baby+0.024
 doorstep+0.024
Neg logit contributions
smart-0.038
 respectful-0.037
 disl-0.036
 cooperative-0.036
 facts-0.035
Token friends
Token ID2460
Feature activation-2.44e-02

Pos logit contributions
neath+0.035
 doorstep+0.034
he+0.032
 once+0.031
 raged+0.031
Neg logit contributions
 respectful-0.048
smart-0.046
 disl-0.046
 facts-0.045
 cooperative-0.044
Token.
Token ID13
Feature activation+7.73e-03

Pos logit contributions
 once+0.011
 doorstep+0.009
 t+0.009
 were+0.009
 re+0.009
Neg logit contributions
smart-0.027
 respectful-0.027
 disl-0.026
 cooperative-0.026
 facts-0.026
Token He
Token ID679
Feature activation+5.32e-03

Pos logit contributions
 respectful+0.019
 cooperative+0.019
 facts+0.018
 rewarding+0.018
smart+0.018
Neg logit contributions
 once-0.007
 Mom-0.006
 Brown-0.006
 Bobby-0.006
 mom-0.005
Token drops
Token ID10532
Feature activation-1.76e-02

Pos logit contributions
smart+0.009
 respectful+0.009
 disl+0.009
 cooperative+0.008
 facts+0.008
Neg logit contributions
 doorstep-0.007
 once-0.007
 t-0.007
 re-0.007
 were-0.007
Token the
Token ID262
Feature activation+2.76e-03

Pos logit contributions
 once+0.020
 were+0.018
 doorstep+0.018
 t+0.018
 re+0.017
Neg logit contributions
 respectful-0.036
smart-0.035
 disl-0.034
 cooperative-0.034
 facts-0.033
Token ash
Token ID12530
Feature activation-2.60e-02

Pos logit contributions
smart+0.006
 respectful+0.006
 disl+0.006
 cooperative+0.005
 facts+0.005
Neg logit contributions
 doorstep-0.003
 once-0.003
 t-0.002
 re-0.002
 were-0.002
Tokentr
Token ID2213
Feature activation-7.53e-02

Pos logit contributions
 doorstep+0.029
 once+0.028
 re+0.026
 were+0.025
eled+0.025
Neg logit contributions
smart-0.051
 respectful-0.049
 disl-0.046
 facts-0.045
 cooperative-0.045
Tokenay
Token ID323
Feature activation-3.21e-02

Pos logit contributions
 re+0.092
 doorstep+0.086
 stuffed+0.076
 picnic+0.075
 comet+0.075
Neg logit contributions
smart-0.158
 wisely-0.120
olds-0.118
ways-0.115
run-0.113
Token.
Token ID13
Feature activation-4.28e-03

Pos logit contributions
 doorstep+0.037
 once+0.036
 re+0.033
 t+0.033
 were+0.032
Neg logit contributions
smart-0.064
 respectful-0.062
 disl-0.060
 cooperative-0.058
 facts-0.058
Token It
Token ID632
Feature activation-2.86e-03

Pos logit contributions
 once+0.004
 Brown+0.003
 Mom+0.003
 were+0.003
 mom+0.003
Neg logit contributions
 respectful-0.011
 cooperative-0.010
smart-0.010
 facts-0.010
 rewarding-0.009
Token breaks
Token ID9457
Feature activation-2.42e-02

Pos logit contributions
 doorstep+0.004
 once+0.004
 re+0.004
eled+0.004
burn+0.004
Neg logit contributions
smart-0.005
 respectful-0.005
 disl-0.005
 facts-0.004
 cooperative-0.004
Token.
Token ID13
Feature activation-3.31e-03

Pos logit contributions
 once+0.034
 t+0.031
 doorstep+0.030
 were+0.029
 re+0.029
Neg logit contributions
 respectful-0.045
smart-0.045
 disl-0.044
 facts-0.042
 cooperative-0.042
Tokenaking
Token ID868
Feature activation-8.00e-03

Pos logit contributions
smart+0.047
 disl+0.046
 respectful+0.045
 facts+0.043
 cooperative+0.041
Neg logit contributions
 once-0.074
 doorstep-0.070
 were-0.068
he-0.068
 re-0.067
Token.
Token ID13
Feature activation+6.73e-04

Pos logit contributions
 once+0.010
 re+0.009
 all+0.009
 t+0.009
 were+0.009
Neg logit contributions
smart-0.016
 respectful-0.016
 disl-0.015
 cooperative-0.015
 facts-0.014
Token He
Token ID679
Feature activation-7.50e-03

Pos logit contributions
smart+0.002
 respectful+0.001
 disl+0.001
 cooperative+0.001
 facts+0.001
Neg logit contributions
 once-0.001
 were-0.001
 t-0.001
 Brown-0.001
 doorstep-0.001
Token had
Token ID550
Feature activation-1.07e-02

Pos logit contributions
 doorstep+0.008
 once+0.007
el+0.006
 t+0.006
 dot+0.006
Neg logit contributions
smart-0.016
 respectful-0.016
 disl-0.015
 cooperative-0.015
 facts-0.015
Token a
Token ID257
Feature activation-2.88e-02

Pos logit contributions
 once+0.014
 doorstep+0.014
 were+0.013
 t+0.013
 re+0.013
Neg logit contributions
smart-0.020
 respectful-0.019
 disl-0.018
 cooperative-0.018
 facts-0.018
Token red
Token ID2266
Feature activation-7.52e-02

Pos logit contributions
 doorstep+0.040
 once+0.040
 were+0.036
el+0.036
eled+0.035
Neg logit contributions
 respectful-0.050
smart-0.048
 disl-0.045
 cooperative-0.045
 facts-0.044
Token kay
Token ID34681
Feature activation+3.55e-02

Pos logit contributions
 doorstep+0.087
neath+0.081
 raged+0.077
 once+0.076
he+0.075
Neg logit contributions
 respectful-0.124
smart-0.115
 questions-0.111
 facts-0.110
 disl-0.106
Tokenak
Token ID461
Feature activation-1.04e-02

Pos logit contributions
smart+0.041
 respectful+0.039
 disl+0.039
 facts+0.036
 cooperative+0.036
Neg logit contributions
 once-0.070
 doorstep-0.068
 were-0.066
he-0.065
 re-0.064
Token that
Token ID326
Feature activation+6.62e-03

Pos logit contributions
 once+0.009
 all+0.008
 were+0.008
 doorstep+0.008
 t+0.008
Neg logit contributions
 disl-0.025
smart-0.025
 cooperative-0.025
 respectful-0.024
 facts-0.023
Token he
Token ID339
Feature activation+7.88e-03

Pos logit contributions
smart+0.013
 respectful+0.012
 disl+0.012
 cooperative+0.012
 facts+0.011
Neg logit contributions
 once-0.008
 doorstep-0.008
 were-0.008
 t-0.008
 re-0.008
Token loved
Token ID6151
Feature activation-1.09e-02

Pos logit contributions
smart+0.017
 respectful+0.016
 disl+0.016
 cooperative+0.015
 facts+0.015
Neg logit contributions
 once-0.008
 doorstep-0.008
 were-0.007
 re-0.007
 t-0.007
Token stuck
Token ID7819
Feature activation-4.65e-02

Pos logit contributions
 doorstep+0.057
eddy+0.051
 dot+0.050
el+0.050
amb+0.050
Neg logit contributions
 respectful-0.048
smart-0.047
 cooperative-0.042
 smarter-0.040
 rewarding-0.037
Token there
Token ID612
Feature activation-2.91e-02

Pos logit contributions
 doorstep+0.041
 once+0.039
 re+0.035
 t+0.034
 were+0.034
Neg logit contributions
smart-0.105
 respectful-0.102
 disl-0.097
 cooperative-0.096
 facts-0.094
Token forever
Token ID8097
Feature activation-4.24e-03

Pos logit contributions
 once+0.033
 were+0.029
 doorstep+0.028
 t+0.026
 all+0.026
Neg logit contributions
 disl-0.064
smart-0.063
 respectful-0.061
 cooperative-0.060
 facts-0.059
Token.
Token ID13
Feature activation-2.40e-02

Pos logit contributions
 once+0.005
 were+0.005
 doorstep+0.004
 t+0.004
 all+0.004
Neg logit contributions
smart-0.009
 disl-0.009
 respectful-0.009
 cooperative-0.008
 facts-0.008
Token The
Token ID383
Feature activation-3.86e-02

Pos logit contributions
 doorstep+0.035
burn+0.034
 re+0.034
 t+0.033
 once+0.033
Neg logit contributions
smart-0.039
 respectful-0.036
 disl-0.036
 cooperative-0.034
 facts-0.034
Token t
Token ID256
Feature activation-7.52e-02

Pos logit contributions
amb+0.040
 doorstep+0.038
neath+0.037
eb+0.034
apped+0.034
Neg logit contributions
 respectful-0.076
smart-0.070
 questions-0.069
 cooperative-0.067
 disl-0.066
Tokeneddy
Token ID21874
Feature activation-4.65e-02

Pos logit contributions
 blast+0.116
 picnic+0.113
 start+0.107
 finally+0.105
 set+0.101
Neg logit contributions
smart-0.116
and-0.110
aring-0.102
hearted-0.100
ups-0.096
Token bear
Token ID6842
Feature activation-2.90e-02

Pos logit contributions
 doorstep+0.061
 destination+0.053
 suitcase+0.051
 Fourth+0.051
ay+0.051
Neg logit contributions
smart-0.083
 respectful-0.075
 cooperative-0.067
 disl-0.067
minded-0.064
Token wished
Token ID16555
Feature activation-3.00e-02

Pos logit contributions
 doorstep+0.030
eddy+0.028
burn+0.027
ifier+0.027
 comet+0.027
Neg logit contributions
smart-0.060
 respectful-0.058
 cooperative-0.052
 disl-0.052
 facts-0.050
Token he
Token ID339
Feature activation-1.56e-03

Pos logit contributions
 once+0.039
 were+0.036
 t+0.034
 all+0.033
 re+0.033
Neg logit contributions
smart-0.058
 respectful-0.058
 disl-0.057
 cooperative-0.055
 facts-0.053
Token had
Token ID550
Feature activation-6.40e-03

Pos logit contributions
 doorstep+0.002
 once+0.002
 tub+0.002
 dot+0.001
el+0.001
Neg logit contributions
smart-0.003
 respectful-0.003
 cooperative-0.003
 disl-0.003
 facts-0.003
Token but
Token ID475
Feature activation+1.91e-02

Pos logit contributions
smart+0.043
 disl+0.040
 respectful+0.038
reetings+0.038
 cooperative+0.037
Neg logit contributions
 t-0.022
 were-0.020
 once-0.020
 stuffed-0.020
 all-0.019
Token Ben
Token ID3932
Feature activation+3.23e-02

Pos logit contributions
smart+0.043
 disl+0.039
 respectful+0.037
 facts+0.036
reetings+0.036
Neg logit contributions
 once-0.023
 were-0.022
 all-0.021
 t-0.021
 both-0.020
Token had
Token ID550
Feature activation+2.66e-02

Pos logit contributions
smart+0.077
 respectful+0.073
 disl+0.072
 cooperative+0.070
 facts+0.070
Neg logit contributions
 once-0.026
 doorstep-0.025
 were-0.024
 t-0.024
 re-0.022
Token only
Token ID691
Feature activation+4.95e-02

Pos logit contributions
smart+0.065
 disl+0.064
reetings+0.061
 cooperative+0.060
aring+0.060
Neg logit contributions
 all-0.025
 once-0.024
 were-0.024
 t-0.022
 both-0.021
Token a
Token ID257
Feature activation+2.25e-02

Pos logit contributions
 disl+0.114
smart+0.114
 respectful+0.109
 cooperative+0.108
vern+0.103
Neg logit contributions
 once-0.047
 were-0.045
 t-0.044
 all-0.044
 set-0.039
Token few
Token ID1178
Feature activation+5.38e-02

Pos logit contributions
smart+0.048
 disl+0.045
 respectful+0.044
 cooperative+0.042
 facts+0.042
Neg logit contributions
 t-0.025
 once-0.024
 baby-0.024
 doorstep-0.023
 blanket-0.022
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
 disl+0.112
reetings+0.104
smart+0.102
 respectful+0.102
lad+0.096
Neg logit contributions
 once-0.060
 all-0.057
 were-0.056
 t-0.056
 stuffed-0.053
Token Lily
Token ID20037
Feature activation+1.91e-02

Pos logit contributions
smart+0.034
 respectful+0.034
 disl+0.033
 facts+0.032
 cooperative+0.032
Neg logit contributions
 once-0.017
 were-0.014
 tub-0.013
 doorstep-0.013
 all-0.013
Token was
Token ID373
Feature activation+2.02e-02

Pos logit contributions
 respectful+0.039
smart+0.039
 disl+0.037
 cooperative+0.036
 facts+0.035
Neg logit contributions
 doorstep-0.020
 once-0.020
 re-0.017
burn-0.017
 t-0.017
Token happy
Token ID3772
Feature activation+5.90e-04

Pos logit contributions
 disl+0.046
smart+0.045
 facts+0.045
reetings+0.044
leness+0.043
Neg logit contributions
 once-0.022
 all-0.020
 t-0.020
 just-0.019
 baby-0.019
Token,
Token ID11
Feature activation+1.40e-03

Pos logit contributions
smart+0.001
 respectful+0.001
 disl+0.001
 cooperative+0.001
 facts+0.001
Neg logit contributions
 once-0.001
 doorstep-0.001
 t-0.001
 were-0.001
 re-0.001
Token,
Token ID11
Feature activation-9.40e-03

Pos logit contributions
 respectful+0.031
 cooperative+0.030
smart+0.030
 disl+0.030
aring+0.029
Neg logit contributions
 all-0.008
 once-0.008
 t-0.007
 were-0.007
 Brown-0.006
Token so
Token ID523
Feature activation-4.92e-03

Pos logit contributions
 once+0.011
 doorstep+0.011
 were+0.011
 t+0.010
 re+0.010
Neg logit contributions
smart-0.018
 respectful-0.018
 disl-0.017
 cooperative-0.017
 facts-0.017
Token he
Token ID339
Feature activation+4.28e-03

Pos logit contributions
 doorstep+0.009
 once+0.008
 re+0.008
 dot+0.008
 t+0.008
Neg logit contributions
smart-0.007
 respectful-0.006
 disl-0.006
 cooperative-0.006
 facts-0.006
Token asked
Token ID1965
Feature activation+1.18e-02

Pos logit contributions
 respectful+0.008
smart+0.007
 cooperative+0.007
 disl+0.006
 smarter+0.006
Neg logit contributions
 doorstep-0.006
 once-0.005
he-0.005
neath-0.005
 dot-0.005
Token his
Token ID465
Feature activation+9.16e-03

Pos logit contributions
 disl+0.028
 respectful+0.026
 facts+0.026
 cooperative+0.026
 rewarding+0.025
Neg logit contributions
 t-0.012
 all-0.012
 once-0.011
 mom-0.011
 were-0.011
Token dad
Token ID9955
Feature activation+5.10e-02

Pos logit contributions
smart+0.024
 respectful+0.024
 disl+0.023
 cooperative+0.023
 facts+0.022
Neg logit contributions
 doorstep-0.005
 once-0.005
 were-0.004
 re-0.003
 t-0.003
Token for
Token ID329
Feature activation+1.84e-03

Pos logit contributions
 disl+0.135
leness+0.124
 cooperative+0.122
 agile+0.120
 facts+0.117
Neg logit contributions
 all-0.048
 once-0.046
 t-0.038
 mom-0.037
 were-0.034
Token help
Token ID1037
Feature activation+7.15e-03

Pos logit contributions
smart+0.004
 disl+0.004
 facts+0.004
 respectful+0.004
reetings+0.004
Neg logit contributions
 t-0.002
 once-0.002
 all-0.002
 re-0.002
 towel-0.002
Token.
Token ID13
Feature activation-1.58e-02

Pos logit contributions
 disl+0.014
 facts+0.014
smart+0.014
 respectful+0.014
 cooperative+0.014
Neg logit contributions
 once-0.009
 t-0.008
 doorstep-0.008
 were-0.008
 all-0.008
Token\n
Token ID198
Feature activation-2.31e-02

Pos logit contributions
 once+0.018
 doorstep+0.018
 were+0.016
 t+0.016
 re+0.016
Neg logit contributions
smart-0.032
 respectful-0.031
 disl-0.029
 cooperative-0.029
 facts-0.029
Token\n
Token ID198
Feature activation-1.47e-02

Pos logit contributions
 doorstep+0.035
 re+0.033
 t+0.032
 dot+0.031
 once+0.031
Neg logit contributions
smart-0.038
 respectful-0.035
 disl-0.033
 cooperative-0.031
 facts-0.031
Token he
Token ID339
Feature activation+1.79e-02

Pos logit contributions
smart+0.017
 respectful+0.016
 disl+0.015
 cooperative+0.015
 facts+0.015
Neg logit contributions
 once-0.011
 were-0.011
 t-0.011
 doorstep-0.011
 re-0.010
Token was
Token ID373
Feature activation+9.37e-03

Pos logit contributions
smart+0.040
 respectful+0.039
 disl+0.037
 cooperative+0.037
 facts+0.036
Neg logit contributions
 once-0.017
 doorstep-0.016
 were-0.015
 t-0.015
 re-0.014
Token always
Token ID1464
Feature activation+4.26e-02

Pos logit contributions
 facts+0.024
smart+0.024
 disl+0.024
leness+0.023
reetings+0.023
Neg logit contributions
 all-0.007
 once-0.007
 just-0.006
 both-0.006
 t-0.006
Token trying
Token ID2111
Feature activation+4.81e-02

Pos logit contributions
 facts+0.097
 disl+0.091
smart+0.090
leness+0.084
ruit+0.084
Neg logit contributions
 all-0.046
 once-0.045
 t-0.044
 both-0.041
 baby-0.038
Token to
Token ID284
Feature activation+3.93e-02

Pos logit contributions
 disl+0.120
reetings+0.117
 cooperative+0.115
 respectful+0.113
smart+0.113
Neg logit contributions
 all-0.049
 once-0.037
 just-0.032
 both-0.030
 t-0.029
Token be
Token ID307
Feature activation+4.92e-02

Pos logit contributions
smart+0.088
 disl+0.082
 facts+0.081
 respectful+0.081
 cooperative+0.079
Neg logit contributions
 once-0.037
 doorstep-0.036
 t-0.036
 were-0.036
 all-0.035
Token the
Token ID262
Feature activation+6.91e-03

Pos logit contributions
 facts+0.102
 disl+0.101
reetings+0.099
aring+0.098
ruit+0.098
Neg logit contributions
 all-0.058
 just-0.053
 both-0.052
 finally-0.048
 once-0.048
Token one
Token ID530
Feature activation+2.79e-02

Pos logit contributions
smart+0.018
 disl+0.017
 respectful+0.015
 facts+0.015
leness+0.015
Neg logit contributions
 once-0.006
 t-0.005
 doorstep-0.005
 baby-0.005
 re-0.005
Token who
Token ID508
Feature activation+3.00e-02

Pos logit contributions
 disl+0.061
smart+0.058
 respectful+0.056
 facts+0.056
leness+0.055
Neg logit contributions
 once-0.032
 baby-0.026
 t-0.026
 all-0.026
 re-0.025
Token won
Token ID1839
Feature activation+8.76e-03

Pos logit contributions
 disl+0.073
smart+0.072
 respectful+0.072
 cooperative+0.070
 facts+0.068
Neg logit contributions
 were-0.026
 all-0.022
 once-0.020
 t-0.019
 couldn-0.017
Token.
Token ID13
Feature activation-5.59e-03

Pos logit contributions
 respectful+0.019
smart+0.019
 disl+0.019
 facts+0.018
 cooperative+0.018
Neg logit contributions
 once-0.010
 all-0.009
 were-0.008
 t-0.008
 both-0.007
Tokenay
Token ID323
Feature activation+1.29e-02

Pos logit contributions
 once+0.002
 doorstep+0.002
 were+0.001
 t+0.001
 re+0.001
Neg logit contributions
smart-0.012
 respectful-0.012
 disl-0.011
 cooperative-0.011
 facts-0.011
Tokenon
Token ID261
Feature activation+5.21e-03

Pos logit contributions
 respectful+0.034
 cooperative+0.032
 disl+0.031
 rewarding+0.031
smart+0.031
Neg logit contributions
 once-0.011
 were-0.010
el-0.010
he-0.009
ay-0.008
Token and
Token ID290
Feature activation+1.72e-02

Pos logit contributions
 disl+0.012
 respectful+0.012
smart+0.012
 cooperative+0.012
 facts+0.011
Neg logit contributions
 once-0.005
 were-0.005
 t-0.004
 all-0.004
 doorstep-0.004
Token add
Token ID751
Feature activation+3.88e-02

Pos logit contributions
smart+0.037
 respectful+0.036
 cooperative+0.035
 disl+0.035
 facts+0.034
Neg logit contributions
 once-0.016
 t-0.016
 were-0.016
 doorstep-0.015
 re-0.015
Token some
Token ID617
Feature activation+3.75e-02

Pos logit contributions
 respectful+0.097
 disl+0.096
 cooperative+0.093
smart+0.092
aring+0.089
Neg logit contributions
 once-0.039
 all-0.036
 were-0.030
 t-0.029
 both-0.027
Token squ
Token ID2809
Feature activation+5.64e-02

Pos logit contributions
 disl+0.086
 respectful+0.085
smart+0.084
 cooperative+0.081
reetings+0.078
Neg logit contributions
 once-0.036
 t-0.033
 all-0.032
 were-0.032
el-0.030
Tokenigg
Token ID6950
Feature activation+4.21e-02

Pos logit contributions
smart+0.111
 respectful+0.109
reetings+0.103
 cooperative+0.102
 disl+0.100
Neg logit contributions
el-0.071
 once-0.062
ay-0.059
ee-0.055
eled-0.055
Tokenles
Token ID829
Feature activation+2.16e-02

Pos logit contributions
 respectful+0.106
smart+0.106
 cooperative+0.103
 cafeteria+0.099
 disl+0.098
Neg logit contributions
 once-0.030
el-0.029
 all-0.022
 were-0.021
sey-0.019
Token.
Token ID13
Feature activation-4.83e-04

Pos logit contributions
 respectful+0.049
 cooperative+0.049
 disl+0.048
reetings+0.047
smart+0.047
Neg logit contributions
 once-0.021
 all-0.021
 were-0.018
 t-0.018
 both-0.017
Token The
Token ID383
Feature activation+4.03e-03

Pos logit contributions
 once+0.001
 were+0.000
 doorstep+0.000
 t+0.000
 Brown+0.000
Neg logit contributions
 respectful-0.001
smart-0.001
 disl-0.001
 cooperative-0.001
 facts-0.001
Token boy
Token ID2933
Feature activation+1.41e-02

Pos logit contributions
smart+0.011
 disl+0.011
 respectful+0.010
 facts+0.010
 cooperative+0.010
Neg logit contributions
 t-0.002
 once-0.002
 baby-0.002
 doorstep-0.002
 re-0.002
Tokenider
Token ID1304
Feature activation+8.53e-03

Pos logit contributions
 doorstep+0.012
 once+0.011
 t+0.011
 were+0.010
 re+0.010
Neg logit contributions
smart-0.007
 respectful-0.007
 disl-0.006
 facts-0.006
 cooperative-0.006
Token.
Token ID13
Feature activation+5.74e-03

Pos logit contributions
 disl+0.019
smart+0.019
 cooperative+0.018
 respectful+0.018
rets+0.017
Neg logit contributions
 once-0.010
 were-0.010
 doorstep-0.008
 t-0.008
 all-0.008
Token Lily
Token ID20037
Feature activation+1.16e-02

Pos logit contributions
smart+0.012
 respectful+0.012
 disl+0.012
 cooperative+0.012
 facts+0.011
Neg logit contributions
 once-0.007
 doorstep-0.006
 were-0.006
 Brown-0.006
 t-0.006
Token counted
Token ID14789
Feature activation+2.53e-02

Pos logit contributions
smart+0.025
 respectful+0.024
 disl+0.023
 cooperative+0.023
 facts+0.022
Neg logit contributions
 doorstep-0.012
 once-0.011
 re-0.010
 t-0.010
burn-0.009
Token to
Token ID284
Feature activation+3.02e-02

Pos logit contributions
 disl+0.067
 respectful+0.061
smart+0.060
 cooperative+0.060
leness+0.059
Neg logit contributions
 once-0.027
 all-0.025
 were-0.022
 t-0.021
 both-0.020
Token ten
Token ID3478
Feature activation+5.27e-02

Pos logit contributions
 disl+0.059
smart+0.058
aring+0.053
leness+0.053
 respectful+0.053
Neg logit contributions
 t-0.042
 once-0.042
 all-0.041
 were-0.041
 doorstep-0.039
Token and
Token ID290
Feature activation+2.73e-02

Pos logit contributions
 disl+0.142
 respectful+0.129
reetings+0.122
 cooperative+0.122
smart+0.121
Neg logit contributions
 once-0.045
 all-0.038
 were-0.036
 t-0.036
el-0.034
Token then
Token ID788
Feature activation+4.19e-02

Pos logit contributions
smart+0.062
 respectful+0.059
 disl+0.058
 cooperative+0.057
 facts+0.056
Neg logit contributions
 once-0.025
 doorstep-0.025
 were-0.022
 t-0.022
 re-0.021
Token looked
Token ID3114
Feature activation+4.22e-02

Pos logit contributions
 disl+0.095
smart+0.091
 cooperative+0.087
 respectful+0.087
leness+0.086
Neg logit contributions
 t-0.044
 were-0.042
 all-0.042
 once-0.040
 finally-0.039
Token for
Token ID329
Feature activation+1.46e-02

Pos logit contributions
 disl+0.110
smart+0.109
 respectful+0.105
 cooperative+0.104
 facts+0.103
Neg logit contributions
 all-0.030
 once-0.027
 t-0.025
 were-0.023
 both-0.022
Token Tom
Token ID4186
Feature activation+4.54e-02

Pos logit contributions
 disl+0.031
smart+0.031
 respectful+0.031
 cooperative+0.029
aring+0.029
Neg logit contributions
 once-0.016
 t-0.016
 all-0.015
 were-0.015
 mom-0.015
Token".
Token ID1911
Feature activation+6.39e-03

Pos logit contributions
smart+0.050
 disl+0.048
 respectful+0.048
 cooperative+0.047
leness+0.046
Neg logit contributions
 once-0.023
 t-0.018
 doorstep-0.018
 all-0.018
 set-0.017
Token\n
Token ID198
Feature activation+2.50e-03

Pos logit contributions
 respectful+0.015
smart+0.015
 disl+0.014
 cooperative+0.014
 facts+0.014
Neg logit contributions
 once-0.006
 were-0.006
 doorstep-0.005
he-0.004
 Brown-0.004
Token\n
Token ID198
Feature activation+1.49e-02

Pos logit contributions
smart+0.006
 respectful+0.005
 disl+0.005
 cooperative+0.005
 facts+0.005
Neg logit contributions
 once-0.002
 doorstep-0.002
 were-0.002
 t-0.002
he-0.002
TokenThe
Token ID464
Feature activation+9.13e-03

Pos logit contributions
smart+0.026
 disl+0.024
 respectful+0.024
 facts+0.024
 cooperative+0.022
Neg logit contributions
 once-0.021
 doorstep-0.021
 were-0.021
he-0.019
 finally-0.018
Token creature
Token ID7185
Feature activation-1.17e-02

Pos logit contributions
smart+0.020
 respectful+0.019
 disl+0.019
 cooperative+0.018
 facts+0.018
Neg logit contributions
 once-0.009
 doorstep-0.009
 t-0.009
 re-0.008
 were-0.008
Token gigg
Token ID30442
Feature activation+5.06e-03

Pos logit contributions
 doorstep+0.011
 once+0.011
burn+0.009
 re+0.009
 t+0.009
Neg logit contributions
smart-0.025
 respectful-0.025
 disl-0.023
 cooperative-0.023
 facts-0.023
Tokenled
Token ID992
Feature activation+1.52e-02

Pos logit contributions
smart+0.010
 respectful+0.010
 disl+0.009
 cooperative+0.009
 facts+0.009
Neg logit contributions
 once-0.006
 doorstep-0.006
 were-0.006
 t-0.006
 re-0.006
Token and
Token ID290
Feature activation+6.57e-04

Pos logit contributions
 disl+0.041
smart+0.039
 facts+0.039
 cooperative+0.038
 respectful+0.038
Neg logit contributions
 once-0.012
 all-0.009
 t-0.009
 were-0.009
 just-0.008
Token said
Token ID531
Feature activation+2.94e-02

Pos logit contributions
smart+0.002
 respectful+0.001
 disl+0.001
 cooperative+0.001
 facts+0.001
Neg logit contributions
 once-0.001
 doorstep-0.001
 were-0.001
 t-0.001
 re-0.001
Token,
Token ID11
Feature activation+1.79e-02

Pos logit contributions
 disl+0.075
 respectful+0.075
 cooperative+0.074
 facts+0.073
 rewarding+0.073
Neg logit contributions
 once-0.021
 all-0.018
 were-0.018
 t-0.017
 baby-0.016
Token "
Token ID366
Feature activation+2.34e-02

Pos logit contributions
smart+0.037
 facts+0.037
 disl+0.036
 respectful+0.036
 rewarding+0.035
Neg logit contributions
 once-0.022
 t-0.020
 finally-0.019
 all-0.019
 baby-0.018
Token silly
Token ID14397
Feature activation+2.59e-03

Pos logit contributions
smart+0.002
 respectful+0.002
 disl+0.002
 cooperative+0.002
reetings+0.002
Neg logit contributions
 once-0.002
 t-0.001
 baby-0.001
 all-0.001
 re-0.001
Token faces
Token ID6698
Feature activation+3.73e-03

Pos logit contributions
smart+0.005
 respectful+0.005
 cooperative+0.005
 disl+0.005
 facts+0.004
Neg logit contributions
 once-0.003
 t-0.003
 were-0.003
 doorstep-0.003
 baby-0.003
Token in
Token ID287
Feature activation-2.14e-03

Pos logit contributions
 disl+0.009
 respectful+0.009
smart+0.009
 facts+0.009
 cooperative+0.009
Neg logit contributions
 once-0.003
 t-0.002
 were-0.002
 all-0.002
 baby-0.002
Token her
Token ID607
Feature activation-2.08e-03

Pos logit contributions
 once+0.003
 t+0.002
 all+0.002
 baby+0.002
 were+0.002
Neg logit contributions
smart-0.004
 respectful-0.004
 disl-0.004
 cooperative-0.004
aring-0.004
Token mirror
Token ID10162
Feature activation+2.82e-03

Pos logit contributions
 once+0.002
 doorstep+0.002
 t+0.002
 re+0.002
 blanket+0.002
Neg logit contributions
smart-0.005
 respectful-0.004
 disl-0.004
 cooperative-0.004
 facts-0.004
Token ever
Token ID1683
Feature activation+3.03e-02

Pos logit contributions
 disl+0.006
smart+0.006
 respectful+0.006
 cooperative+0.006
 facts+0.006
Neg logit contributions
 once-0.003
 doorstep-0.003
 t-0.003
 all-0.003
 were-0.003
Token again
Token ID757
Feature activation+1.21e-02

Pos logit contributions
smart+0.078
 respectful+0.075
 disl+0.075
 cooperative+0.074
 facts+0.072
Neg logit contributions
 once-0.023
 were-0.018
 t-0.017
 doorstep-0.016
 finally-0.014
Token.
Token ID13
Feature activation+4.83e-03

Pos logit contributions
 disl+0.025
smart+0.024
 respectful+0.024
 cooperative+0.024
 facts+0.024
Neg logit contributions
 once-0.015
 all-0.012
 were-0.012
 t-0.012
 doorstep-0.012
Token<|endoftext|>
Token ID50256
Feature activation+1.93e-02

Pos logit contributions
 respectful+0.010
smart+0.010
 cooperative+0.010
 disl+0.010
 facts+0.009
Neg logit contributions
 once-0.006
 were-0.005
 doorstep-0.004
 Brown-0.004
 all-0.004
TokenOnce
Token ID7454
Feature activation-1.59e-02

Pos logit contributions
smart+0.038
 respectful+0.037
 disl+0.035
 cooperative+0.034
 facts+0.034
Neg logit contributions
 once-0.023
 doorstep-0.023
 were-0.021
 t-0.021
 re-0.020
Token upon
Token ID2402
Feature activation-1.02e-02

Pos logit contributions
 doorstep+0.025
 re+0.023
 once+0.023
 t+0.022
burn+0.022
Neg logit contributions
smart-0.025
 respectful-0.023
 disl-0.022
 cooperative-0.021
 facts-0.021
Token cake
Token ID12187
Feature activation-5.72e-03

Pos logit contributions
 once+0.015
 were+0.014
 t+0.014
 doorstep+0.014
 re+0.014
Neg logit contributions
smart-0.027
 respectful-0.026
 disl-0.025
 cooperative-0.025
 facts-0.023
Token.
Token ID13
Feature activation-6.95e-03

Pos logit contributions
 once+0.007
 doorstep+0.006
 were+0.006
 t+0.006
 re+0.006
Neg logit contributions
smart-0.011
 respectful-0.011
 disl-0.011
 cooperative-0.010
 facts-0.010
Token They
Token ID1119
Feature activation-1.62e-04

Pos logit contributions
 doorstep+0.009
 once+0.009
 t+0.008
 re+0.008
 were+0.008
Neg logit contributions
smart-0.013
 respectful-0.013
 disl-0.012
 cooperative-0.012
 facts-0.012
Token followed
Token ID3940
Feature activation-1.20e-02

Pos logit contributions
 doorstep+0.000
 once+0.000
eddy+0.000
el+0.000
burn+0.000
Neg logit contributions
 respectful-0.000
smart-0.000
 disl-0.000
 cooperative-0.000
 facts-0.000
Token their
Token ID511
Feature activation-1.21e-02

Pos logit contributions
 doorstep+0.011
 once+0.010
 were+0.009
 re+0.009
 raged+0.008
Neg logit contributions
smart-0.027
 respectful-0.026
 disl-0.025
 facts-0.025
 cooperative-0.024
Token mom
Token ID1995
Feature activation+2.97e-02

Pos logit contributions
neath+0.011
he+0.011
 once+0.010
amb+0.009
 were+0.009
Neg logit contributions
 respectful-0.027
 cooperative-0.025
 facts-0.025
 questions-0.025
smart-0.025
Token inside
Token ID2641
Feature activation-1.36e-02

Pos logit contributions
 disl+0.064
smart+0.063
 respectful+0.061
 cooperative+0.060
 facts+0.059
Neg logit contributions
 once-0.033
 doorstep-0.031
 t-0.030
 were-0.028
 re-0.028
Token,
Token ID11
Feature activation-1.51e-02

Pos logit contributions
 once+0.015
 doorstep+0.014
 t+0.013
 were+0.013
 re+0.012
Neg logit contributions
smart-0.029
 respectful-0.028
 disl-0.028
 cooperative-0.027
 facts-0.026
Token leaving
Token ID4305
Feature activation-7.75e-03

Pos logit contributions
 once+0.020
 doorstep+0.020
 t+0.018
 re+0.018
 were+0.018
Neg logit contributions
smart-0.028
 respectful-0.027
 disl-0.026
 cooperative-0.025
 facts-0.025
Token the
Token ID262
Feature activation+3.44e-03

Pos logit contributions
 once+0.008
 t+0.007
 doorstep+0.007
 were+0.007
 re+0.007
Neg logit contributions
smart-0.018
 disl-0.017
 respectful-0.017
 cooperative-0.016
 facts-0.016
Token huge
Token ID3236
Feature activation+1.11e-03

Pos logit contributions
smart+0.008
 respectful+0.008
 disl+0.007
 cooperative+0.007
 facts+0.007
Neg logit contributions
 doorstep-0.003
 once-0.003
 t-0.003
 re-0.003
 were-0.002
Token.
Token ID13
Feature activation-1.88e-02

Pos logit contributions
 once+0.018
 were+0.017
 doorstep+0.017
 all+0.016
 t+0.015
Neg logit contributions
 disl-0.032
smart-0.032
 facts-0.031
 cooperative-0.031
 respectful-0.031
Token\n
Token ID198
Feature activation-6.92e-03

Pos logit contributions
 once+0.021
 doorstep+0.020
 were+0.018
 t+0.018
 re+0.018
Neg logit contributions
smart-0.039
 respectful-0.038
 disl-0.036
 facts-0.036
 cooperative-0.036
Token\n
Token ID198
Feature activation-7.76e-03

Pos logit contributions
 once+0.006
 doorstep+0.006
 were+0.005
 re+0.005
 t+0.005
Neg logit contributions
smart-0.016
 respectful-0.016
 disl-0.015
 cooperative-0.015
 facts-0.015
TokenRed
Token ID7738
Feature activation-4.51e-03

Pos logit contributions
 once+0.010
 doorstep+0.010
 were+0.009
 t+0.009
 re+0.009
Neg logit contributions
smart-0.014
 respectful-0.013
 facts-0.013
 disl-0.013
 cooperative-0.013
Token felt
Token ID2936
Feature activation-7.01e-03

Pos logit contributions
 once+0.003
 doorstep+0.003
 were+0.003
 re+0.003
 t+0.003
Neg logit contributions
smart-0.011
 respectful-0.011
 disl-0.011
 facts-0.010
 cooperative-0.010
Token very
Token ID845
Feature activation+1.32e-02

Pos logit contributions
 once+0.009
 were+0.009
 all+0.008
 finally+0.008
 t+0.008
Neg logit contributions
 facts-0.013
smart-0.013
 disl-0.013
 respectful-0.013
 cooperative-0.013
Token happy
Token ID3772
Feature activation-5.87e-03

Pos logit contributions
smart+0.026
 facts+0.025
rets+0.022
 disl+0.022
ruit+0.021
Neg logit contributions
 once-0.018
 were-0.016
 finally-0.016
 couldn-0.015
 all-0.015
Token and
Token ID290
Feature activation+7.53e-03

Pos logit contributions
 once+0.006
 doorstep+0.005
 t+0.005
 were+0.005
 re+0.005
Neg logit contributions
smart-0.013
 respectful-0.013
 disl-0.012
 cooperative-0.012
 facts-0.012
Token successful
Token ID4388
Feature activation-1.17e-05

Pos logit contributions
smart+0.012
 respectful+0.012
 disl+0.011
 facts+0.011
 cooperative+0.011
Neg logit contributions
 once-0.011
 doorstep-0.011
 were-0.011
 t-0.011
 re-0.010
Token because
Token ID780
Feature activation-2.52e-02

Pos logit contributions
 once+0.000
 were+0.000
 t+0.000
 all+0.000
 re+0.000
Neg logit contributions
 disl-0.000
 respectful-0.000
smart-0.000
 facts-0.000
 cooperative-0.000
Token he
Token ID339
Feature activation+1.33e-02

Pos logit contributions
 doorstep+0.038
 once+0.035
 re+0.033
 t+0.033
 were+0.033
Neg logit contributions
smart-0.042
 respectful-0.040
 disl-0.038
 cooperative-0.037
 facts-0.036
Token of
Token ID286
Feature activation-2.45e-03

Pos logit contributions
smart+0.007
 respectful+0.007
 disl+0.007
 cooperative+0.007
rets+0.007
Neg logit contributions
 were-0.001
 once-0.001
 t-0.001
 re-0.001
 all-0.001
Token toy
Token ID13373
Feature activation-2.20e-02

Pos logit contributions
 t+0.003
 baby+0.002
 re+0.002
 all+0.002
 stuffed+0.002
Neg logit contributions
smart-0.006
 respectful-0.005
 cooperative-0.005
reetings-0.005
 disl-0.005
Token animals
Token ID4695
Feature activation+1.11e-02

Pos logit contributions
 doorstep+0.024
 once+0.024
 t+0.021
 re+0.021
 were+0.021
Neg logit contributions
smart-0.044
 respectful-0.043
 disl-0.041
 facts-0.041
 cooperative-0.040
Token and
Token ID290
Feature activation+5.87e-03

Pos logit contributions
 cooperative+0.026
 disl+0.026
smart+0.025
 respectful+0.024
rets+0.024
Neg logit contributions
 were-0.012
 once-0.010
 all-0.009
 t-0.008
 just-0.008
Token she
Token ID673
Feature activation+8.91e-03

Pos logit contributions
smart+0.013
 respectful+0.012
 cooperative+0.012
reetings+0.011
 disl+0.011
Neg logit contributions
 were-0.007
 t-0.006
 baby-0.006
 re-0.006
 all-0.006
Token liked
Token ID8288
Feature activation+5.93e-03

Pos logit contributions
smart+0.019
 respectful+0.019
 disl+0.018
 cooperative+0.018
 facts+0.017
Neg logit contributions
 once-0.009
 doorstep-0.009
 were-0.008
 t-0.008
 re-0.008
Token to
Token ID284
Feature activation+1.20e-02

Pos logit contributions
 respectful+0.013
smart+0.013
reetings+0.013
 cooperative+0.013
 disl+0.013
Neg logit contributions
 once-0.006
 all-0.006
 t-0.006
 re-0.005
 baby-0.005
Token skip
Token ID14267
Feature activation+1.31e-02

Pos logit contributions
smart+0.029
 respectful+0.028
 cooperative+0.027
 facts+0.026
 disl+0.025
Neg logit contributions
 t-0.010
 were-0.010
 doorstep-0.009
 re-0.009
 once-0.009
Token around
Token ID1088
Feature activation+3.51e-03

Pos logit contributions
 cooperative+0.033
 respectful+0.032
 disl+0.031
smart+0.031
 facts+0.030
Neg logit contributions
 once-0.013
 all-0.013
 were-0.011
 t-0.010
 both-0.009
Token the
Token ID262
Feature activation+2.08e-02

Pos logit contributions
 respectful+0.008
smart+0.008
 disl+0.007
 cooperative+0.007
 facts+0.007
Neg logit contributions
 once-0.004
 t-0.003
 all-0.003
 re-0.003
 were-0.003
Token garden
Token ID11376
Feature activation+1.07e-02

Pos logit contributions
smart+0.050
 respectful+0.047
 disl+0.046
 facts+0.045
 cooperative+0.045
Neg logit contributions
 doorstep-0.018
 once-0.017
 t-0.016
 re-0.016
 baby-0.015
Token happy
Token ID3772
Feature activation-2.04e-02

Pos logit contributions
 once+0.005
 baby+0.005
 re+0.005
 were+0.005
 t+0.005
Neg logit contributions
smart-0.007
 facts-0.006
 disl-0.006
rets-0.006
 respectful-0.006
Token and
Token ID290
Feature activation-3.76e-03

Pos logit contributions
 doorstep+0.023
 once+0.023
 t+0.021
 re+0.021
 were+0.021
Neg logit contributions
smart-0.041
 respectful-0.040
 disl-0.038
 cooperative-0.037
 facts-0.037
Token laughed
Token ID13818
Feature activation-1.20e-02

Pos logit contributions
 once+0.004
 doorstep+0.004
 were+0.004
 t+0.004
 re+0.004
Neg logit contributions
smart-0.008
 respectful-0.007
 disl-0.007
 cooperative-0.007
 facts-0.007
Token a
Token ID257
Feature activation-5.64e-03

Pos logit contributions
 once+0.014
 t+0.012
 were+0.011
 all+0.011
 doorstep+0.011
Neg logit contributions
smart-0.026
 disl-0.026
 respectful-0.025
 facts-0.024
 cooperative-0.024
Token lot
Token ID1256
Feature activation+1.99e-02

Pos logit contributions
 doorstep+0.007
 once+0.007
 were+0.006
 re+0.006
 t+0.006
Neg logit contributions
smart-0.011
 respectful-0.011
 disl-0.010
 cooperative-0.010
 facts-0.010
Token.
Token ID13
Feature activation+1.73e-03

Pos logit contributions
 disl+0.044
smart+0.042
rets+0.041
 facts+0.040
 respectful+0.040
Neg logit contributions
 once-0.023
 all-0.019
 were-0.017
 re-0.017
 t-0.017
Token They
Token ID1119
Feature activation-2.77e-02

Pos logit contributions
smart+0.003
 respectful+0.003
 disl+0.003
 cooperative+0.003
 facts+0.003
Neg logit contributions
 doorstep-0.002
 once-0.002
 t-0.002
 re-0.002
 were-0.002
Token even
Token ID772
Feature activation+1.44e-02

Pos logit contributions
 doorstep+0.026
 once+0.024
eddy+0.022
 tub+0.021
burn+0.021
Neg logit contributions
 respectful-0.059
smart-0.059
 disl-0.055
 cooperative-0.054
 facts-0.054
Token used
Token ID973
Feature activation-9.84e-03

Pos logit contributions
smart+0.033
 disl+0.032
 respectful+0.032
 cooperative+0.031
 facts+0.031
Neg logit contributions
 once-0.012
 were-0.011
 t-0.011
 re-0.010
 doorstep-0.010
Token their
Token ID511
Feature activation-1.22e-02

Pos logit contributions
 once+0.011
 t+0.010
 were+0.010
 re+0.010
 all+0.009
Neg logit contributions
smart-0.021
 respectful-0.020
 cooperative-0.020
 disl-0.019
reetings-0.018
Token tape
Token ID9154
Feature activation-3.42e-02

Pos logit contributions
 once+0.015
 doorstep+0.015
 t+0.014
 re+0.014
 were+0.014
Neg logit contributions
smart-0.023
 respectful-0.022
 disl-0.021
 cooperative-0.021
 facts-0.020
Token paper
Token ID3348
Feature activation-1.83e-02

Pos logit contributions
 doorstep+0.000
 once+0.000
 t+0.000
 re+0.000
 were+0.000
Neg logit contributions
smart-0.001
 respectful-0.001
 disl-0.001
 cooperative-0.001
 facts-0.001
Token,
Token ID11
Feature activation-6.67e-03

Pos logit contributions
 doorstep+0.016
 once+0.015
 re+0.013
 were+0.013
 t+0.013
Neg logit contributions
smart-0.041
 respectful-0.040
 disl-0.039
 cooperative-0.038
 facts-0.038
Token paint
Token ID7521
Feature activation-3.16e-03

Pos logit contributions
 once+0.006
 t+0.006
 were+0.006
 doorstep+0.006
 blanket+0.006
Neg logit contributions
smart-0.015
 respectful-0.014
 disl-0.014
 cooperative-0.013
 facts-0.013
Token,
Token ID11
Feature activation-3.44e-03

Pos logit contributions
 once+0.003
 were+0.002
 t+0.002
 all+0.002
 blanket+0.002
Neg logit contributions
 respectful-0.008
smart-0.008
 disl-0.008
 cooperative-0.007
 facts-0.007
Token and
Token ID290
Feature activation+1.75e-02

Pos logit contributions
 once+0.002
 t+0.002
 were+0.002
 doorstep+0.002
 re+0.002
Neg logit contributions
smart-0.009
 respectful-0.008
 disl-0.008
 cooperative-0.008
 facts-0.008
Token brushes
Token ID36377
Feature activation-2.23e-02

Pos logit contributions
smart+0.039
 respectful+0.037
 disl+0.037
 cooperative+0.036
 facts+0.035
Neg logit contributions
 once-0.017
 were-0.016
 t-0.016
 doorstep-0.015
 re-0.015
Token,
Token ID11
Feature activation-4.73e-03

Pos logit contributions
 once+0.023
 were+0.021
 t+0.020
 all+0.018
 doorstep+0.018
Neg logit contributions
smart-0.051
 disl-0.050
 respectful-0.050
 cooperative-0.049
 facts-0.048
Token and
Token ID290
Feature activation+2.08e-02

Pos logit contributions
 once+0.003
 were+0.002
 t+0.002
 all+0.002
 re+0.002
Neg logit contributions
smart-0.013
 disl-0.012
 respectful-0.012
 cooperative-0.012
reetings-0.012
Token set
Token ID900
Feature activation+1.45e-02

Pos logit contributions
smart+0.046
 disl+0.044
 respectful+0.043
 cooperative+0.041
 facts+0.040
Neg logit contributions
 once-0.021
 were-0.021
 t-0.021
 doorstep-0.019
 re-0.019
Token up
Token ID510
Feature activation+2.01e-02

Pos logit contributions
 respectful+0.036
smart+0.035
 disl+0.035
 cooperative+0.034
reetings+0.033
Neg logit contributions
 once-0.013
 all-0.012
 t-0.011
 mom-0.010
 blanket-0.010
Token a
Token ID257
Feature activation+2.36e-02

Pos logit contributions
 respectful+0.048
 disl+0.048
 cooperative+0.046
aring+0.045
smart+0.045
Neg logit contributions
 once-0.021
 all-0.018
 were-0.016
 t-0.016
 blanket-0.014
Token by
Token ID416
Feature activation+8.07e-03

Pos logit contributions
 once+0.017
 t+0.014
 doorstep+0.014
 all+0.014
 baby+0.014
Neg logit contributions
 disl-0.034
 respectful-0.033
 facts-0.032
smart-0.032
 cooperative-0.032
Token.
Token ID13
Feature activation+2.36e-03

Pos logit contributions
 respectful+0.020
 cooperative+0.020
smart+0.019
rets+0.019
 disl+0.019
Neg logit contributions
 once-0.007
 t-0.005
 all-0.005
 doorstep-0.005
 Brown-0.004
Token He
Token ID679
Feature activation+5.45e-03

Pos logit contributions
 respectful+0.006
smart+0.006
 cooperative+0.006
 facts+0.006
 disl+0.006
Neg logit contributions
 Brown-0.002
 once-0.001
 t-0.001
 doorstep-0.001
 Bobby-0.001
Token saw
Token ID2497
Feature activation+4.22e-03

Pos logit contributions
smart+0.014
 respectful+0.013
 cooperative+0.013
 disl+0.013
 facts+0.013
Neg logit contributions
 once-0.004
 were-0.003
 doorstep-0.003
 t-0.003
 re-0.003
Token the
Token ID262
Feature activation-3.50e-03

Pos logit contributions
 respectful+0.010
smart+0.009
 cooperative+0.009
 disl+0.009
 facts+0.009
Neg logit contributions
 once-0.004
 t-0.004
 were-0.004
 all-0.003
 baby-0.003
Token snow
Token ID6729
Feature activation-1.97e-02

Pos logit contributions
 doorstep+0.004
 once+0.004
 t+0.004
 re+0.004
 were+0.004
Neg logit contributions
smart-0.007
 respectful-0.007
 disl-0.006
 cooperative-0.006
 facts-0.006
Tokenman
Token ID805
Feature activation+1.95e-03

Pos logit contributions
 doorstep+0.019
 once+0.018
 re+0.016
 t+0.016
 were+0.016
Neg logit contributions
smart-0.042
 respectful-0.041
 disl-0.039
 facts-0.038
 cooperative-0.038
Token and
Token ID290
Feature activation-9.88e-03

Pos logit contributions
smart+0.004
 respectful+0.004
 disl+0.004
 cooperative+0.004
 facts+0.004
Neg logit contributions
 once-0.002
 doorstep-0.002
 were-0.002
 t-0.002
 re-0.002
Token the
Token ID262
Feature activation-1.22e-02

Pos logit contributions
 t+0.009
 were+0.009
 once+0.008
 re+0.008
 doorstep+0.008
Neg logit contributions
smart-0.022
 respectful-0.022
 cooperative-0.021
 disl-0.021
 facts-0.020
Token jewel
Token ID16927
Feature activation-1.43e-02

Pos logit contributions
 once+0.010
 doorstep+0.010
 t+0.010
 re+0.010
 were+0.009
Neg logit contributions
smart-0.029
 respectful-0.027
 disl-0.026
 cooperative-0.026
 facts-0.026
Token.
Token ID13
Feature activation-5.46e-03

Pos logit contributions
 once+0.020
 doorstep+0.019
 t+0.018
 were+0.018
 re+0.018
Neg logit contributions
smart-0.026
 respectful-0.025
 disl-0.024
 cooperative-0.024
 facts-0.023
Token and
Token ID290
Feature activation-9.36e-03

Pos logit contributions
 doorstep+0.011
 once+0.011
 t+0.009
 re+0.009
 were+0.009
Neg logit contributions
smart-0.032
 respectful-0.031
 disl-0.030
 cooperative-0.030
 facts-0.029
Token Pat
Token ID3208
Feature activation+6.02e-03

Pos logit contributions
 doorstep+0.010
 once+0.010
 t+0.009
 re+0.009
 were+0.009
Neg logit contributions
smart-0.019
 respectful-0.019
 disl-0.018
 cooperative-0.018
 facts-0.017
Token were
Token ID547
Feature activation-1.00e-02

Pos logit contributions
 respectful+0.015
smart+0.015
 cooperative+0.015
 disl+0.014
 facts+0.014
Neg logit contributions
 were-0.004
 once-0.004
 re-0.004
 t-0.004
 doorstep-0.004
Token out
Token ID503
Feature activation-3.35e-02

Pos logit contributions
 doorstep+0.013
 dot+0.012
 re+0.012
 raged+0.012
 cloud+0.012
Neg logit contributions
smart-0.017
 respectful-0.017
 cooperative-0.015
 disl-0.015
 questions-0.014
Token in
Token ID287
Feature activation-1.45e-02

Pos logit contributions
 once+0.034
 doorstep+0.031
 re+0.029
 t+0.029
 were+0.028
Neg logit contributions
smart-0.072
 respectful-0.071
 disl-0.070
 facts-0.067
 cooperative-0.067
Token the
Token ID262
Feature activation-6.29e-03

Pos logit contributions
 doorstep+0.017
 once+0.017
 t+0.015
 re+0.015
 were+0.015
Neg logit contributions
smart-0.028
 respectful-0.028
 disl-0.026
 cooperative-0.026
 facts-0.025
Token garden
Token ID11376
Feature activation+4.94e-03

Pos logit contributions
 doorstep+0.007
 once+0.006
 were+0.006
 re+0.006
 t+0.006
Neg logit contributions
smart-0.012
 respectful-0.012
 disl-0.012
 cooperative-0.011
 facts-0.011
Token playing
Token ID2712
Feature activation+2.98e-02

Pos logit contributions
smart+0.011
 disl+0.010
 cooperative+0.010
 facts+0.010
 respectful+0.010
Neg logit contributions
 once-0.006
 were-0.006
 doorstep-0.006
 all-0.005
 t-0.005
Token.
Token ID13
Feature activation+3.76e-03

Pos logit contributions
smart+0.067
 respectful+0.065
 disl+0.064
 cooperative+0.063
 facts+0.062
Neg logit contributions
 once-0.032
 were-0.028
 all-0.027
 t-0.027
 doorstep-0.026
Token
Token ID220
Feature activation+1.72e-02

Pos logit contributions
smart+0.007
 respectful+0.007
 cooperative+0.007
 disl+0.007
 facts+0.006
Neg logit contributions
 once-0.005
 were-0.005
 doorstep-0.004
 t-0.004
 re-0.004
Token
Token ID198
Feature activation+1.26e-02

Pos logit contributions
 respectful+0.044
smart+0.043
 disl+0.042
 cooperative+0.042
 facts+0.042
Neg logit contributions
 once-0.015
 were-0.013
he-0.010
 doorstep-0.009
 all-0.009
Token couch
Token ID18507
Feature activation-2.12e-02

Pos logit contributions
smart+0.003
 respectful+0.003
 disl+0.003
 cooperative+0.003
 facts+0.003
Neg logit contributions
 doorstep-0.001
 once-0.001
 t-0.001
 re-0.001
 baby-0.001
Token,
Token ID11
Feature activation-1.20e-02

Pos logit contributions
 once+0.016
 were+0.014
 doorstep+0.014
 t+0.013
 re+0.012
Neg logit contributions
smart-0.053
 respectful-0.052
 disl-0.051
 cooperative-0.051
 facts-0.049
Token the
Token ID262
Feature activation-1.04e-04

Pos logit contributions
 once+0.011
 t+0.010
 were+0.010
 doorstep+0.009
 re+0.009
Neg logit contributions
smart-0.028
 respectful-0.027
 disl-0.027
 cooperative-0.027
 facts-0.026
Token table
Token ID3084
Feature activation-1.15e-02

Pos logit contributions
 once+0.000
 doorstep+0.000
 t+0.000
 re+0.000
 blanket+0.000
Neg logit contributions
smart-0.000
 respectful-0.000
 disl-0.000
 cooperative-0.000
 facts-0.000
Token and
Token ID290
Feature activation+4.73e-03

Pos logit contributions
 were+0.010
 once+0.009
 t+0.008
 doorstep+0.008
 blanket+0.008
Neg logit contributions
 disl-0.028
smart-0.028
 respectful-0.027
 cooperative-0.027
 facts-0.026
Token the
Token ID262
Feature activation-7.27e-04

Pos logit contributions
smart+0.010
 respectful+0.010
 disl+0.009
 cooperative+0.009
 facts+0.009
Neg logit contributions
 once-0.005
 doorstep-0.005
 t-0.005
 were-0.005
 re-0.005
Token chairs
Token ID18791
Feature activation-2.47e-02

Pos logit contributions
 doorstep+0.001
 once+0.001
 t+0.001
 re+0.001
 blanket+0.001
Neg logit contributions
smart-0.002
 respectful-0.002
 disl-0.002
 cooperative-0.001
 facts-0.001
Token.
Token ID13
Feature activation+3.42e-03

Pos logit contributions
 once+0.031
 were+0.030
 doorstep+0.029
 t+0.028
 all+0.027
Neg logit contributions
smart-0.049
 disl-0.048
 respectful-0.048
 cooperative-0.046
 facts-0.046
Token
Token ID198
Feature activation+1.13e-02

Pos logit contributions
 respectful+0.008
smart+0.008
 facts+0.007
 cooperative+0.007
 disl+0.007
Neg logit contributions
 once-0.003
 Brown-0.003
 Mom-0.003
 Bobby-0.003
 were-0.003
Token
Token ID198
Feature activation+1.41e-02

Pos logit contributions
 respectful+0.027
smart+0.027
 disl+0.026
 facts+0.026
 cooperative+0.026
Neg logit contributions
 once-0.009
 were-0.008
 doorstep-0.007
he-0.006
 mom-0.006
TokenTheir
Token ID14574
Feature activation-8.12e-03

Pos logit contributions
 facts+0.027
smart+0.026
 disl+0.025
 respectful+0.024
 cooperative+0.024
Neg logit contributions
 once-0.017
Mom-0.017
 were-0.016
he-0.016
 doorstep-0.016
Token One
Token ID1881
Feature activation+1.35e-03

Pos logit contributions
 doorstep+0.012
 once+0.011
 re+0.011
 t+0.010
 were+0.010
Neg logit contributions
smart-0.014
 respectful-0.013
 disl-0.013
 cooperative-0.012
 facts-0.012
Token day
Token ID1110
Feature activation+4.03e-03

Pos logit contributions
 respectful+0.002
smart+0.002
 disl+0.002
 cooperative+0.002
 facts+0.002
Neg logit contributions
 doorstep-0.002
 once-0.002
 raged-0.002
 re-0.002
 suitcase-0.002
Token,
Token ID11
Feature activation-9.44e-03

Pos logit contributions
smart+0.006
 respectful+0.006
 disl+0.006
 facts+0.005
 cooperative+0.005
Neg logit contributions
 doorstep-0.007
 once-0.006
 re-0.006
 airplane-0.006
sey-0.006
Token Tim
Token ID5045
Feature activation-1.99e-02

Pos logit contributions
 doorstep+0.015
 once+0.012
 re+0.012
sey+0.012
 suitcase+0.012
Neg logit contributions
smart-0.016
 respectful-0.015
 disl-0.015
 facts-0.014
 cooperative-0.014
Tokenmy
Token ID1820
Feature activation-2.95e-02

Pos logit contributions
 doorstep+0.037
 lighthouse+0.034
 Stev+0.034
ifier+0.034
 re+0.033
Neg logit contributions
smart-0.024
 respectful-0.022
 disl-0.018
 cooperative-0.017
able-0.016
Token's
Token ID338
Feature activation-5.11e-02

Pos logit contributions
 doorstep+0.038
ifier+0.037
 Stev+0.037
 destination+0.035
ay+0.035
Neg logit contributions
 respectful-0.049
smart-0.045
 disl-0.043
 cooperative-0.040
ups-0.038
Token mom
Token ID1995
Feature activation-1.90e-02

Pos logit contributions
neath+0.074
 raged+0.069
eled+0.063
mine+0.062
hed+0.061
Neg logit contributions
 respectful-0.073
 questions-0.069
 math-0.061
 nice-0.059
 cooperative-0.058
Tokenmy
Token ID1820
Feature activation-1.94e-02

Pos logit contributions
 doorstep+0.031
ifier+0.029
 hull+0.029
 dot+0.029
 suitcase+0.029
Neg logit contributions
 respectful-0.028
smart-0.027
 cooperative-0.023
 disl-0.022
ups-0.021
Token told
Token ID1297
Feature activation-1.99e-02

Pos logit contributions
 doorstep+0.028
 dot+0.026
 once+0.026
ifier+0.025
el+0.025
Neg logit contributions
 respectful-0.031
smart-0.030
 disl-0.027
 cooperative-0.026
 questions-0.024
Token him
Token ID683
Feature activation+1.29e-02

Pos logit contributions
 doorstep+0.024
 once+0.022
 re+0.021
 t+0.020
 dot+0.020
Neg logit contributions
smart-0.039
 respectful-0.038
 disl-0.037
 facts-0.035
 cooperative-0.035
Token they
Token ID484
Feature activation-9.00e-03

Pos logit contributions
smart+0.028
 respectful+0.028
 cooperative+0.027
 disl+0.027
reetings+0.025
Neg logit contributions
 once-0.013
 were-0.012
 all-0.012
 t-0.012
 mom-0.011
Token tri
Token ID1333
Feature activation-6.74e-03

Pos logit contributions
 doorstep+0.012
 once+0.010
 re+0.010
eddy+0.010
 blanket+0.010
Neg logit contributions
smart-0.020
 respectful-0.018
 disl-0.017
 cooperative-0.016
 facts-0.016
Tokenpped
Token ID1496
Feature activation-3.93e-02

Pos logit contributions
 doorstep+0.007
 once+0.006
 were+0.006
 re+0.006
 baby+0.005
Neg logit contributions
smart-0.015
 respectful-0.014
 cooperative-0.013
 disl-0.013
 facts-0.012
Token and
Token ID290
Feature activation-3.14e-02

Pos logit contributions
 doorstep+0.046
 once+0.038
 re+0.038
 were+0.036
 suitcase+0.036
Neg logit contributions
smart-0.080
 respectful-0.073
 disl-0.069
 cooperative-0.067
 facts-0.066
Token fell
Token ID3214
Feature activation-3.09e-02

Pos logit contributions
 doorstep+0.029
 once+0.026
 re+0.023
eddy+0.022
 dot+0.021
Neg logit contributions
smart-0.071
 respectful-0.068
 disl-0.066
 cooperative-0.063
 facts-0.061
Token.
Token ID13
Feature activation-1.31e-02

Pos logit contributions
 once+0.039
 doorstep+0.038
 t+0.036
 were+0.035
 re+0.035
Neg logit contributions
smart-0.058
 respectful-0.057
 disl-0.055
 cooperative-0.053
 facts-0.053
Token Her
Token ID2332
Feature activation-6.05e-02

Pos logit contributions
 once+0.016
 doorstep+0.016
 t+0.015
 were+0.015
 baby+0.014
Neg logit contributions
smart-0.026
 respectful-0.025
 cooperative-0.024
 facts-0.024
 disl-0.024
Token t
Token ID256
Feature activation-6.76e-02

Pos logit contributions
neath+0.070
 raged+0.069
he+0.062
 were+0.062
 doorstep+0.059
Neg logit contributions
 respectful-0.097
 protective-0.088
 questions-0.088
 disl-0.088
 math-0.086
Tokeneddy
Token ID21874
Feature activation-4.29e-02

Pos logit contributions
 picnic+0.095
 blast+0.093
 flood+0.087
 re+0.087
 finally+0.087
Neg logit contributions
smart-0.101
and-0.100
ups-0.090
aring-0.086
rows-0.085
Token bear
Token ID6842
Feature activation-2.92e-02

Pos logit contributions
 doorstep+0.049
he+0.038
 once+0.037
 destination+0.036
 tub+0.036
Neg logit contributions
smart-0.089
 respectful-0.082
 disl-0.073
 cooperative-0.073
 questions-0.071
Token went
Token ID1816
Feature activation-1.21e-02

Pos logit contributions
 doorstep+0.036
eddy+0.032
 once+0.031
 tub+0.030
 suitcase+0.030
Neg logit contributions
smart-0.058
 respectful-0.053
 disl-0.047
 cooperative-0.046
 facts-0.044
Token flying
Token ID7348
Feature activation-3.19e-03

Pos logit contributions
 once+0.012
 doorstep+0.011
 t+0.011
 re+0.010
 were+0.010
Neg logit contributions
 respectful-0.026
smart-0.026
 disl-0.026
 cooperative-0.025
 facts-0.025
Token to
Token ID284
Feature activation-9.37e-04

Pos logit contributions
 disl+0.045
 respectful+0.045
 cooperative+0.045
smart+0.044
 facts+0.042
Neg logit contributions
 once-0.019
 all-0.017
 t-0.014
 just-0.013
 Brown-0.013
Token get
Token ID651
Feature activation+1.31e-02

Pos logit contributions
 once+0.001
 doorstep+0.001
 re+0.001
 were+0.001
 t+0.001
Neg logit contributions
smart-0.002
 respectful-0.002
 disl-0.002
 cooperative-0.002
 facts-0.001
Token the
Token ID262
Feature activation-2.31e-02

Pos logit contributions
smart+0.030
 respectful+0.029
 facts+0.029
 disl+0.029
 cooperative+0.029
Neg logit contributions
 once-0.013
 t-0.012
 all-0.012
 baby-0.011
 were-0.011
Token fan
Token ID4336
Feature activation+5.28e-03

Pos logit contributions
 doorstep+0.017
 once+0.014
eled+0.013
eddy+0.013
amb+0.012
Neg logit contributions
 respectful-0.054
smart-0.051
 cooperative-0.050
 disl-0.049
 questions-0.049
Token if
Token ID611
Feature activation-2.21e-02

Pos logit contributions
smart+0.011
 disl+0.011
 respectful+0.011
 cooperative+0.011
 facts+0.011
Neg logit contributions
 once-0.006
 doorstep-0.005
 t-0.005
 were-0.005
 airplane-0.005
Token Br
Token ID1709
Feature activation-4.73e-02

Pos logit contributions
 once+0.024
 were+0.023
 baby+0.021
 t+0.020
 all+0.020
Neg logit contributions
smart-0.048
 respectful-0.047
 disl-0.046
 cooperative-0.046
 facts-0.045
Tokenidget
Token ID17484
Feature activation-1.02e-02

Pos logit contributions
 doorstep+0.049
 once+0.047
 re+0.044
 were+0.043
 t+0.043
Neg logit contributions
smart-0.098
 respectful-0.094
 disl-0.091
 cooperative-0.089
 facts-0.087
Token was
Token ID373
Feature activation+8.47e-03

Pos logit contributions
 once+0.007
 were+0.007
 t+0.006
 doorstep+0.006
 re+0.006
Neg logit contributions
 respectful-0.026
smart-0.026
 cooperative-0.025
 disl-0.025
 facts-0.024
Token careful
Token ID8161
Feature activation-1.57e-02

Pos logit contributions
 facts+0.019
 disl+0.019
smart+0.017
rets+0.017
minded+0.017
Neg logit contributions
 once-0.010
 t-0.010
 just-0.010
 all-0.009
 baby-0.009
Token.
Token ID13
Feature activation-1.51e-02

Pos logit contributions
 once+0.018
 all+0.014
 were+0.014
 doorstep+0.013
 t+0.013
Neg logit contributions
smart-0.035
 disl-0.035
 facts-0.033
 cooperative-0.033
 respectful-0.033
Token Br
Token ID1709
Feature activation-5.71e-02

Pos logit contributions
 once+0.019
 doorstep+0.018
 were+0.018
 Brown+0.016
 t+0.016
Neg logit contributions
 respectful-0.029
smart-0.029
 cooperative-0.028
 facts-0.028
 disl-0.027
Token kept
Token ID4030
Feature activation-1.28e-02

Pos logit contributions
 doorstep+0.026
 once+0.023
eddy+0.021
burn+0.021
 t+0.020
Neg logit contributions
smart-0.058
 respectful-0.057
 disl-0.053
 cooperative-0.053
 facts-0.052
Token flowing
Token ID17609
Feature activation+3.14e-03

Pos logit contributions
 once+0.014
 t+0.013
 re+0.013
 were+0.012
 all+0.012
Neg logit contributions
 respectful-0.027
smart-0.027
 disl-0.026
 cooperative-0.026
 facts-0.025
Token,
Token ID11
Feature activation-1.97e-02

Pos logit contributions
 respectful+0.007
 disl+0.007
smart+0.007
 cooperative+0.007
 facts+0.007
Neg logit contributions
 once-0.003
 were-0.003
 t-0.002
 re-0.002
 all-0.002
Token it
Token ID340
Feature activation-8.23e-03

Pos logit contributions
 once+0.023
 doorstep+0.022
 t+0.020
 re+0.020
 were+0.020
Neg logit contributions
smart-0.040
 respectful-0.039
 disl-0.037
 cooperative-0.036
 facts-0.036
Token ruined
Token ID20484
Feature activation-3.77e-02

Pos logit contributions
 doorstep+0.007
 once+0.007
 re+0.006
 t+0.006
 were+0.005
Neg logit contributions
smart-0.019
 respectful-0.018
 disl-0.017
 cooperative-0.017
 facts-0.017
Token T
Token ID309
Feature activation-4.57e-02

Pos logit contributions
 once+0.044
 were+0.038
 t+0.038
 doorstep+0.037
 all+0.037
Neg logit contributions
 respectful-0.080
smart-0.080
 disl-0.078
 cooperative-0.077
 facts-0.073
Tokenoot
Token ID1025
Feature activation-7.36e-03

Pos logit contributions
 doorstep+0.076
 once+0.075
 t+0.070
 were+0.070
 re+0.070
Neg logit contributions
smart-0.067
 respectful-0.064
 disl-0.061
 cooperative-0.059
 facts-0.058
Token's
Token ID338
Feature activation-1.53e-02

Pos logit contributions
 once+0.008
 were+0.007
 doorstep+0.007
 t+0.006
 re+0.006
Neg logit contributions
 cooperative-0.016
 respectful-0.016
smart-0.016
 disl-0.016
 facts-0.015
Token home
Token ID1363
Feature activation-2.30e-02

Pos logit contributions
 doorstep+0.017
 once+0.017
 t+0.016
 re+0.016
 were+0.015
Neg logit contributions
smart-0.032
 respectful-0.030
 disl-0.029
 cooperative-0.029
 facts-0.028
Token and
Token ID290
Feature activation-1.64e-02

Pos logit contributions
 once+0.028
 doorstep+0.027
 were+0.026
 t+0.024
 all+0.023
Neg logit contributions
 disl-0.046
smart-0.046
 respectful-0.045
 facts-0.044
 cooperative-0.043
Token all
Token ID477
Feature activation-6.20e-03

Pos logit contributions
 once+0.017
 doorstep+0.017
 t+0.016
 were+0.016
 re+0.016
Neg logit contributions
smart-0.035
 respectful-0.033
 disl-0.032
 cooperative-0.032
 facts-0.031
Token and
Token ID290
Feature activation+1.19e-03

Pos logit contributions
 disl+0.003
smart+0.003
 respectful+0.003
 cooperative+0.003
 facts+0.003
Neg logit contributions
 once-0.001
 all-0.001
 were-0.001
 t-0.001
 just-0.000
Token frustrated
Token ID14718
Feature activation+9.47e-03

Pos logit contributions
smart+0.003
 facts+0.003
reetings+0.002
 disl+0.002
 cooperative+0.002
Neg logit contributions
 t-0.001
 couldn-0.001
 were-0.001
 once-0.001
 all-0.001
Token.
Token ID13
Feature activation+6.18e-03

Pos logit contributions
 disl+0.023
 respectful+0.021
reetings+0.021
rets+0.021
 cooperative+0.021
Neg logit contributions
 once-0.010
 all-0.009
 were-0.009
 t-0.008
 just-0.008
Token She
Token ID1375
Feature activation-1.07e-02

Pos logit contributions
 respectful+0.014
smart+0.014
 facts+0.014
 disl+0.014
 cooperative+0.013
Neg logit contributions
 once-0.007
 were-0.005
 Mom-0.005
 mom-0.005
 doorstep-0.005
Token wanted
Token ID2227
Feature activation-4.36e-03

Pos logit contributions
 doorstep+0.014
 once+0.014
burn+0.013
eddy+0.013
 tub+0.012
Neg logit contributions
 respectful-0.019
smart-0.019
 disl-0.018
 cooperative-0.017
 facts-0.017
Token her
Token ID607
Feature activation-5.32e-02

Pos logit contributions
 once+0.005
 t+0.005
 mom+0.004
 baby+0.004
 all+0.004
Neg logit contributions
smart-0.009
 respectful-0.009
 disl-0.009
 cooperative-0.009
 facts-0.008
Token t
Token ID256
Feature activation-6.40e-02

Pos logit contributions
 doorstep+0.058
neath+0.054
eled+0.053
aud+0.053
he+0.052
Neg logit contributions
 respectful-0.096
smart-0.096
 questions-0.089
 disl-0.088
 smarter-0.087
Tokeneddy
Token ID21874
Feature activation-4.08e-02

Pos logit contributions
 picnic+0.076
neath+0.073
 headed+0.073
 finally+0.072
 flood+0.071
Neg logit contributions
smart-0.113
and-0.107
ups-0.099
aring-0.097
 respectful-0.093
Token bear
Token ID6842
Feature activation+4.17e-03

Pos logit contributions
 doorstep+0.044
eled+0.033
 once+0.032
neath+0.031
 headed+0.031
Neg logit contributions
smart-0.088
 respectful-0.082
 disl-0.077
 facts-0.076
 cooperative-0.075
Token back
Token ID736
Feature activation-1.44e-02

Pos logit contributions
 disl+0.011
 cooperative+0.010
smart+0.010
 facts+0.010
 respectful+0.010
Neg logit contributions
 once-0.003
 t-0.003
 all-0.003
 were-0.003
 baby-0.002
Token.
Token ID13
Feature activation-2.46e-03

Pos logit contributions
 once+0.018
 doorstep+0.017
 were+0.016
 t+0.016
 all+0.015
Neg logit contributions
smart-0.029
 disl-0.028
 respectful-0.028
 cooperative-0.027
 facts-0.027