TOP ACTIVATIONS
MAX = 0.078

big , generous dog named Che wy . Che wy loved
named Fl uffy and a ham ster named Nib bles .
to the backyard and unw ra pped the pastry . The
around the package and unw ra pped it .
join them again , as long as he kept his promise
two friends , Alan and Aub rey . Alan and Aub
under the table and unw ra pped the chocolate .
of candy and carefully unw ra pped it . She popped
and watched it for as long as they could . They
A br ac ad ab ra !" Nothing happens
was a little girl named Rus sel . Rus sel had
They had red eyes and sharp be aks . They did
the park . He unw ra pped the l oll ipop
grew mint and basil and pars ley . One
We can play here as long as we want ."
Aub rey . Alan and Aub rey would play together every
to stay outside for as long as she wanted , as
A br ac ad ab ra , I turn you into
they could do anything as long as they had each other
One day Alan and Aub rey decided to turn the

MIN ACTIVATIONS
MIN = -0.069

very helpless . He felt all alone in the big ,
" Why did you help me ?" The
mom , " What made me so sad ?" His mom
whist ling to make us both happy ."
Lily accidentally stub bed her toe on one of the toys
, " Can you help me shine too ?" The small
a special noise that made everyone around very happy .
said , " Please help me get my ball ." The
hot . It was making everyone very sleepy . The roof
of noise . It made everyone look . J
" Oh no ! You both played with the junk and
either . This made his friends sad and they started to
Billy , " Please help me get my toy ." Billy
" Why did you thank me , little bird ?" The
“ Hello all igator ! ” .
sound . This sound made everyone nearby very happy .
how are you ?" The all igator didn 't reply ,
tool . It made her friends happy , it made them
sad . The treasure made everyone feel bad , and it
And whenever she helped her friends , her special mint was

INTERVAL 0.041 - 0.078
CONTAINS 0.128%

. Even if someone is not like you , unite with
use them to learn about numbers , shapes , patterns ,
, they find a big steel truck in the garage .
but some of them were not fit enough to go .
with his toys all day long . One day , he

INTERVAL 0.004 - 0.041
CONTAINS 36.319%

You should never go through holes in fences . They can
L ily had so much fun playing with the other
played with the white balloon and had so much fun .
dad went to the pond to go fishing . Jane was
to the sound , until he arrived at a pond .

INTERVAL -0.033 - 0.004
CONTAINS 62.710%

used a pipe to make a tunnel for his cars .
cared about doing the right thing . So , she went
Anna and Ben . He was angry . He ran towards
in the garden . But Max was still a bit naughty
very much . She would hug Daisy every day and night

INTERVAL -0.069 - -0.033
CONTAINS 0.842%

look . He saw lots of shiny red apples hanging from
Sarah wanted to play with the basketball too ,
spark before and it made him scared . The
. Tom kicked the ball into the goal and won the
't fall asleep . He started to search for his favorite
Token big
Token ID1263
Feature activation+6.99e-03

Pos logit contributions
 hawk+0.025
 prope+0.024
 valleys+0.023
growth+0.023
 mast+0.022
Neg logit contributions
 welcome-0.009
 helping-0.008
 hugging-0.008
 liking-0.008
 eager-0.007
Token,
Token ID11
Feature activation-3.96e-03

Pos logit contributions
 hawk+0.013
 prope+0.012
growth+0.011
 valleys+0.011
 trigger+0.011
Neg logit contributions
 welcome-0.011
 helping-0.010
 hugging-0.010
 polite-0.010
 eager-0.010
Token generous
Token ID14431
Feature activation+1.75e-02

Pos logit contributions
 liking+0.005
 hugging+0.005
ll+0.004
 myself+0.004
 helping+0.004
Neg logit contributions
 hawk-0.008
 prope-0.008
 bolt-0.007
 valleys-0.007
 mast-0.007
Token dog
Token ID3290
Feature activation+1.79e-02

Pos logit contributions
 hawk+0.030
 prope+0.027
 mast+0.025
 trigger+0.024
growth+0.024
Neg logit contributions
 helping-0.028
 welcome-0.028
 liking-0.028
 hugging-0.028
 eager-0.027
Token named
Token ID3706
Feature activation+9.19e-04

Pos logit contributions
 hawk+0.040
 prope+0.039
 valleys+0.037
growth+0.037
 trigger+0.036
Neg logit contributions
 helping-0.020
 welcome-0.020
 hugging-0.019
 polite-0.019
 sweet-0.019
Token Che
Token ID2580
Feature activation+7.77e-02

Pos logit contributions
 hawk+0.002
growth+0.002
 prope+0.001
 valleys+0.001
 Zed+0.001
Neg logit contributions
 helping-0.001
 hugging-0.001
 welcome-0.001
 liking-0.001
 eager-0.001
Tokenwy
Token ID21768
Feature activation+3.68e-02

Pos logit contributions
 hawk+0.140
 prope+0.137
growth+0.126
 valleys+0.125
 mast+0.124
Neg logit contributions
 welcome-0.122
 polite-0.118
 helping-0.117
 obedient-0.116
 sweet-0.115
Token.
Token ID13
Feature activation+2.46e-02

Pos logit contributions
 hawk+0.083
 prope+0.079
 valleys+0.076
growth+0.076
 trigger+0.074
Neg logit contributions
 welcome-0.041
 helping-0.040
 hugging-0.038
 polite-0.038
 sweet-0.038
Token Che
Token ID2580
Feature activation+6.83e-02

Pos logit contributions
 hawk+0.048
 prope+0.044
growth+0.043
 valleys+0.042
 obstacle+0.041
Neg logit contributions
 welcome-0.037
 helping-0.036
 eager-0.034
 sweet-0.034
 hugging-0.034
Tokenwy
Token ID21768
Feature activation+2.96e-02

Pos logit contributions
 hawk+0.122
 prope+0.119
growth+0.111
 valleys+0.109
 mast+0.108
Neg logit contributions
 welcome-0.109
 polite-0.106
 helping-0.105
 obedient-0.105
 sweet-0.103
Token loved
Token ID6151
Feature activation-7.87e-03

Pos logit contributions
 hawk+0.060
 prope+0.056
growth+0.055
 valleys+0.054
aw+0.053
Neg logit contributions
 welcome-0.038
 helping-0.037
 hugging-0.036
 liking-0.035
 polite-0.035
Token named
Token ID3706
Feature activation+1.78e-02

Pos logit contributions
 hawk+0.073
 prope+0.069
growth+0.066
 valleys+0.065
 mast+0.063
Neg logit contributions
 welcome-0.042
 helping-0.041
 polite-0.040
 eager-0.039
 hugging-0.039
Token Fl
Token ID1610
Feature activation+2.78e-02

Pos logit contributions
 hawk+0.027
growth+0.023
 Zed+0.022
 trigger+0.022
 winds+0.021
Neg logit contributions
 liking-0.032
 eager-0.031
 polite-0.031
 helping-0.031
 obedient-0.031
Tokenuffy
Token ID15352
Feature activation+2.18e-02

Pos logit contributions
 hawk+0.053
 prope+0.047
growth+0.046
 valleys+0.046
aw+0.045
Neg logit contributions
 helping-0.041
 welcome-0.041
 eager-0.039
 polite-0.039
 liking-0.039
Token and
Token ID290
Feature activation+1.89e-02

Pos logit contributions
 hawk+0.043
 prope+0.039
growth+0.038
 valleys+0.038
 trigger+0.037
Neg logit contributions
 welcome-0.030
 helping-0.028
 polite-0.028
 liking-0.028
 hugging-0.028
Token a
Token ID257
Feature activation+2.47e-02

Pos logit contributions
 hawk+0.039
 prope+0.035
growth+0.034
 happens+0.033
 valleys+0.032
Neg logit contributions
 welcome-0.024
 hugging-0.024
 liking-0.024
 helping-0.024
 eager-0.023
Token ham
Token ID8891
Feature activation+7.69e-02

Pos logit contributions
 hawk+0.047
 prope+0.041
 pirate+0.036
 mast+0.035
 hunter+0.035
Neg logit contributions
 myself-0.038
 helping-0.036
pack-0.036
 liking-0.036
 handing-0.035
Tokenster
Token ID1706
Feature activation+3.46e-02

Pos logit contributions
 hawk+0.120
 prope+0.113
 valleys+0.108
growth+0.107
 trigger+0.100
Neg logit contributions
 welcome-0.138
 helping-0.135
 sweet-0.131
 hugging-0.131
 polite-0.131
Token named
Token ID3706
Feature activation+1.53e-02

Pos logit contributions
 hawk+0.072
 prope+0.067
growth+0.065
 valleys+0.064
 trigger+0.063
Neg logit contributions
 welcome-0.044
 helping-0.042
 polite-0.041
 hugging-0.041
 eager-0.041
Token Nib
Token ID49532
Feature activation+5.09e-02

Pos logit contributions
 hawk+0.025
growth+0.023
 prope+0.021
 valleys+0.021
 trigger+0.021
Neg logit contributions
 helping-0.026
 polite-0.025
 eager-0.025
 liking-0.025
 welcome-0.025
Tokenbles
Token ID7689
Feature activation+1.61e-02

Pos logit contributions
 hawk+0.088
 prope+0.079
 valleys+0.076
growth+0.075
 trigger+0.074
Neg logit contributions
 welcome-0.085
 helping-0.083
 eager-0.080
 hugging-0.079
 polite-0.079
Token.
Token ID13
Feature activation+1.87e-02

Pos logit contributions
 hawk+0.033
 prope+0.031
growth+0.030
 valleys+0.030
 trigger+0.029
Neg logit contributions
 welcome-0.021
 helping-0.020
 polite-0.019
 hugging-0.019
 obedient-0.019
Token to
Token ID284
Feature activation+1.10e-04

Pos logit contributions
 polite+0.005
 liking+0.005
 welcome+0.005
 thoughtful+0.005
 sweet+0.005
Neg logit contributions
 hawk-0.006
 valleys-0.005
 prope-0.005
 mountains-0.005
 islands-0.005
Token the
Token ID262
Feature activation+1.03e-02

Pos logit contributions
 hawk+0.000
 prope+0.000
growth+0.000
 valleys+0.000
 islands+0.000
Neg logit contributions
 obedient-0.000
 liking-0.000
ll-0.000
 polite-0.000
 generous-0.000
Token backyard
Token ID24296
Feature activation+6.86e-03

Pos logit contributions
 hawk+0.020
 prope+0.017
 valleys+0.017
 mountains+0.017
 cockpit+0.016
Neg logit contributions
 welcome-0.014
 hugging-0.014
 liking-0.014
 helping-0.013
 eager-0.013
Token and
Token ID290
Feature activation+1.06e-02

Pos logit contributions
 valleys+0.013
 hawk+0.012
 mast+0.012
 islands+0.012
 prope+0.012
Neg logit contributions
 welcome-0.009
 liking-0.008
 polite-0.008
 obedient-0.008
 sweet-0.008
Token unw
Token ID7379
Feature activation+3.35e-02

Pos logit contributions
 hawk+0.018
 valleys+0.017
 prope+0.017
growth+0.017
 mast+0.016
Neg logit contributions
 welcome-0.016
 liking-0.016
 helping-0.015
 polite-0.015
 sweet-0.015
Tokenra
Token ID430
Feature activation+7.57e-02

Pos logit contributions
 hawk+0.064
 prope+0.058
 valleys+0.056
growth+0.056
 happens+0.054
Neg logit contributions
 welcome-0.049
 helping-0.048
 obedient-0.048
 hugging-0.048
 polite-0.048
Tokenpped
Token ID1496
Feature activation+1.32e-02

Pos logit contributions
 hawk+0.094
 prope+0.085
growth+0.080
 valleys+0.078
 trigger+0.074
Neg logit contributions
 welcome-0.161
 helping-0.158
 polite-0.155
 hugging-0.155
 sweet-0.154
Token the
Token ID262
Feature activation+2.44e-03

Pos logit contributions
 hawk+0.023
 prope+0.021
 valleys+0.020
growth+0.020
 trigger+0.019
Neg logit contributions
 welcome-0.022
 polite-0.021
 helping-0.021
 obedient-0.021
 liking-0.021
Token pastry
Token ID41738
Feature activation+6.42e-03

Pos logit contributions
 hawk+0.004
 trigger+0.004
 prope+0.004
 helicopter+0.004
aw+0.004
Neg logit contributions
ll-0.004
 helping-0.003
 allow-0.003
 behaving-0.003
 wanting-0.003
Token.
Token ID13
Feature activation-4.98e-04

Pos logit contributions
 hawk+0.012
 prope+0.011
 valleys+0.011
growth+0.011
 mast+0.010
Neg logit contributions
 welcome-0.009
 polite-0.009
 obedient-0.009
 liking-0.009
 helping-0.009
Token The
Token ID383
Feature activation-2.81e-03

Pos logit contributions
 liking+0.001
 polite+0.001
 obedient+0.001
 helping+0.001
 talking+0.001
Neg logit contributions
growth-0.001
 hawk-0.001
 valleys-0.001
 prope-0.001
 mast-0.001
Token around
Token ID1088
Feature activation-5.46e-04

Pos logit contributions
 hawk+0.012
 prope+0.011
growth+0.011
 mast+0.010
 valleys+0.010
Neg logit contributions
 welcome-0.006
 helping-0.006
 eager-0.006
 sweet-0.006
 polite-0.006
Token the
Token ID262
Feature activation+1.27e-02

Pos logit contributions
 welcome+0.001
 polite+0.001
 liking+0.001
 obedient+0.001
 hugging+0.001
Neg logit contributions
 hawk-0.001
 valleys-0.001
growth-0.001
 prope-0.001
 islands-0.001
Token package
Token ID5301
Feature activation+2.04e-02

Pos logit contributions
 hawk+0.019
 prope+0.017
 valleys+0.016
 trigger+0.016
growth+0.016
Neg logit contributions
 welcome-0.023
 liking-0.022
 helping-0.022
 hugging-0.022
 laughing-0.020
Token and
Token ID290
Feature activation+2.14e-02

Pos logit contributions
 prope+0.035
 mast+0.035
 hawk+0.034
 hull+0.034
 valleys+0.034
Neg logit contributions
 welcome-0.031
 liking-0.027
 polite-0.026
 my-0.026
 obedient-0.026
Token unw
Token ID7379
Feature activation+3.70e-02

Pos logit contributions
 hawk+0.035
 valleys+0.034
 prope+0.034
growth+0.033
 mast+0.033
Neg logit contributions
 welcome-0.035
 liking-0.033
 polite-0.031
 hugging-0.030
 sweet-0.030
Tokenra
Token ID430
Feature activation+7.53e-02

Pos logit contributions
 hawk+0.074
 prope+0.066
 valleys+0.064
growth+0.063
 happens+0.062
Neg logit contributions
 helping-0.050
 obedient-0.050
 welcome-0.050
 polite-0.049
 hugging-0.049
Tokenpped
Token ID1496
Feature activation+1.35e-02

Pos logit contributions
 hawk+0.098
 prope+0.089
growth+0.084
 valleys+0.083
 trigger+0.079
Neg logit contributions
 welcome-0.155
 helping-0.152
 polite-0.149
 hugging-0.148
 sweet-0.148
Token it
Token ID340
Feature activation+6.70e-05

Pos logit contributions
 hawk+0.025
 prope+0.023
growth+0.022
 valleys+0.022
 trigger+0.022
Neg logit contributions
 welcome-0.020
 helping-0.020
 polite-0.019
 obedient-0.019
 liking-0.019
Token.
Token ID13
Feature activation-3.94e-03

Pos logit contributions
 prope+0.000
 hawk+0.000
 valleys+0.000
 trigger+0.000
 mast+0.000
Neg logit contributions
 welcome-0.000
 liking-0.000
 hugging-0.000
 polite-0.000
 obedient-0.000
Token
Token ID198
Feature activation-9.00e-03

Pos logit contributions
 welcome+0.006
 helping+0.006
 polite+0.006
 liking+0.006
 obedient+0.006
Neg logit contributions
 hawk-0.006
growth-0.006
 valleys-0.006
 prope-0.006
 mast-0.006
Token
Token ID198
Feature activation+2.80e-03

Pos logit contributions
 welcome+0.012
 helping+0.012
 hugging+0.011
 obedient+0.011
 polite+0.011
Neg logit contributions
 hawk-0.018
growth-0.017
 prope-0.017
 valleys-0.017
 islands-0.015
Token join
Token ID4654
Feature activation-1.03e-02

Pos logit contributions
 islands+0.009
growth+0.009
 mast+0.009
 hawk+0.009
 prope+0.009
Neg logit contributions
 eager-0.009
 obedient-0.008
ll-0.008
ower-0.008
ve-0.008
Token them
Token ID606
Feature activation-1.24e-02

Pos logit contributions
 hugging+0.019
 polite+0.019
 thoughtful+0.019
 sweet+0.019
 helping+0.019
Neg logit contributions
growth-0.013
 islands-0.013
 hawk-0.013
aw-0.012
 valleys-0.012
Token again
Token ID757
Feature activation+8.10e-03

Pos logit contributions
 welcome+0.017
 liking+0.016
 polite+0.016
 hugging+0.016
 eager+0.015
Neg logit contributions
 hawk-0.021
 islands-0.021
 prope-0.020
 valleys-0.020
 targets-0.019
Token,
Token ID11
Feature activation+3.60e-02

Pos logit contributions
 islands+0.012
 hawk+0.012
 valleys+0.012
growth+0.011
 prope+0.011
Neg logit contributions
 hugging-0.013
 welcome-0.013
 handing-0.013
 liking-0.012
 cheek-0.012
Token as
Token ID355
Feature activation+3.77e-02

Pos logit contributions
 prope+0.062
 valleys+0.061
 hawk+0.060
 islands+0.058
 mast+0.058
Neg logit contributions
 welcome-0.047
ooo-0.046
ummy-0.041
 supposed-0.041
 helpful-0.040
Token long
Token ID890
Feature activation+7.53e-02

Pos logit contributions
 hawk+0.070
 islands+0.065
 prope+0.065
 valleys+0.063
growth+0.062
Neg logit contributions
 welcome-0.053
 hugging-0.051
 liking-0.050
 eager-0.049
 helping-0.049
Token as
Token ID355
Feature activation+3.79e-02

Pos logit contributions
 hawk+0.174
 prope+0.166
growth+0.157
 valleys+0.157
 trigger+0.154
Neg logit contributions
 helping-0.083
 welcome-0.078
 hugging-0.076
 sweet-0.075
 liking-0.074
Token he
Token ID339
Feature activation+2.36e-02

Pos logit contributions
 hawk+0.067
 prope+0.062
 valleys+0.062
 islands+0.062
growth+0.059
Neg logit contributions
 welcome-0.056
 liking-0.055
 hugging-0.055
 eager-0.054
 helping-0.054
Token kept
Token ID4030
Feature activation+2.02e-02

Pos logit contributions
 hawk+0.035
growth+0.033
 prope+0.033
 valleys+0.033
aw+0.032
Neg logit contributions
 welcome-0.043
 helping-0.042
 liking-0.041
 hugging-0.040
 polite-0.040
Token his
Token ID465
Feature activation-3.54e-04

Pos logit contributions
 hawk+0.036
 prope+0.034
 valleys+0.034
 islands+0.033
 mast+0.032
Neg logit contributions
 welcome-0.029
 polite-0.027
ll-0.027
 obedient-0.027
 liking-0.027
Token promise
Token ID6991
Feature activation+1.22e-02

Pos logit contributions
 liking+0.001
 hugging+0.000
 helping+0.000
 inviting+0.000
 handing+0.000
Neg logit contributions
 hawk-0.001
 prope-0.001
 mast-0.001
 obstacle-0.001
 plane-0.001
Token two
Token ID734
Feature activation-8.82e-03

Pos logit contributions
 welcome+0.041
 helping+0.039
 polite+0.037
 sweet+0.037
 hugging+0.037
Neg logit contributions
 hawk-0.091
 prope-0.086
growth-0.084
 valleys-0.083
 trigger-0.081
Token friends
Token ID2460
Feature activation-1.75e-02

Pos logit contributions
 welcome+0.009
 liking+0.009
 helping+0.009
 hugging+0.008
 ready+0.008
Neg logit contributions
 hawk-0.020
 prope-0.019
 valleys-0.018
aw-0.018
 islands-0.018
Token,
Token ID11
Feature activation+2.25e-02

Pos logit contributions
 welcome+0.022
 helping+0.021
 polite+0.020
 sweet+0.020
 hugging+0.020
Neg logit contributions
 hawk-0.037
 prope-0.034
growth-0.033
 valleys-0.033
 mast-0.032
Token Alan
Token ID12246
Feature activation+3.53e-02

Pos logit contributions
 hawk+0.045
 prope+0.042
growth+0.042
 valleys+0.041
 islands+0.040
Neg logit contributions
 liking-0.027
 welcome-0.027
 hugging-0.026
 helping-0.026
 obedient-0.025
Token and
Token ID290
Feature activation+8.15e-03

Pos logit contributions
 hawk+0.087
 prope+0.082
growth+0.080
 valleys+0.080
 trigger+0.077
Neg logit contributions
 welcome-0.031
 helping-0.030
 hugging-0.029
 polite-0.029
 sweet-0.028
Token Aub
Token ID19723
Feature activation+7.51e-02

Pos logit contributions
 hawk+0.018
 prope+0.017
growth+0.017
 valleys+0.016
 mast+0.016
Neg logit contributions
 welcome-0.009
 helping-0.009
 hugging-0.008
 liking-0.008
 polite-0.008
Tokenrey
Token ID4364
Feature activation+3.74e-02

Pos logit contributions
 hawk+0.139
 prope+0.128
 valleys+0.123
 happens+0.119
growth+0.118
Neg logit contributions
 first-0.119
 not-0.112
 welcome-0.111
 polite-0.107
 sweet-0.104
Token.
Token ID13
Feature activation+2.82e-02

Pos logit contributions
 hawk+0.090
 prope+0.085
growth+0.083
 valleys+0.082
 trigger+0.081
Neg logit contributions
 welcome-0.039
 helping-0.038
 eager-0.036
 hugging-0.036
 polite-0.036
Token Alan
Token ID12246
Feature activation+1.68e-02

Pos logit contributions
 hawk+0.071
 prope+0.070
 bolt+0.064
bridge+0.062
 mast+0.062
Neg logit contributions
 welcome-0.026
 eager-0.025
 looking-0.023
 helping-0.023
 first-0.023
Token and
Token ID290
Feature activation+7.31e-03

Pos logit contributions
 hawk+0.034
 prope+0.032
 valleys+0.031
growth+0.031
 happens+0.031
Neg logit contributions
 welcome-0.021
 helping-0.020
 hugging-0.020
 liking-0.020
 polite-0.019
Token Aub
Token ID19723
Feature activation+6.98e-02

Pos logit contributions
 hawk+0.012
 prope+0.011
growth+0.011
 valleys+0.010
 trigger+0.010
Neg logit contributions
 welcome-0.013
 helping-0.012
 polite-0.012
 sweet-0.012
 eager-0.012
Token under
Token ID739
Feature activation+6.49e-03

Pos logit contributions
 hawk+0.031
 prope+0.029
growth+0.028
 valleys+0.028
 trigger+0.027
Neg logit contributions
 welcome-0.018
 helping-0.017
 polite-0.017
 hugging-0.016
 eager-0.016
Token the
Token ID262
Feature activation+6.10e-04

Pos logit contributions
 hawk+0.013
 prope+0.013
growth+0.012
 valleys+0.012
 trigger+0.012
Neg logit contributions
 welcome-0.009
 helping-0.008
 polite-0.008
 obedient-0.008
 eager-0.008
Token table
Token ID3084
Feature activation+1.34e-02

Pos logit contributions
 hawk+0.001
 prope+0.001
growth+0.001
 trigger+0.001
 valleys+0.001
Neg logit contributions
 welcome-0.001
 helping-0.001
 polite-0.001
 obedient-0.001
 eager-0.001
Token and
Token ID290
Feature activation+1.19e-02

Pos logit contributions
 valleys+0.023
growth+0.021
 prope+0.021
 hawk+0.021
 islands+0.021
Neg logit contributions
 welcome-0.020
 polite-0.019
 obedient-0.019
 thoughtful-0.019
 liking-0.018
Token unw
Token ID7379
Feature activation+3.51e-02

Pos logit contributions
 hawk+0.022
 prope+0.021
 valleys+0.021
growth+0.021
 mast+0.020
Neg logit contributions
 welcome-0.018
 helping-0.017
 liking-0.016
 polite-0.016
 sweet-0.016
Tokenra
Token ID430
Feature activation+7.50e-02

Pos logit contributions
 hawk+0.072
 prope+0.064
 valleys+0.062
growth+0.060
 winds+0.060
Neg logit contributions
 obedient-0.047
 hugging-0.046
 helping-0.046
 welcome-0.046
 sweet-0.046
Tokenpped
Token ID1496
Feature activation+1.30e-02

Pos logit contributions
 hawk+0.090
 prope+0.082
growth+0.079
 valleys+0.077
 trigger+0.073
Neg logit contributions
 welcome-0.162
 helping-0.157
 polite-0.155
 eager-0.154
 obedient-0.153
Token the
Token ID262
Feature activation+2.73e-03

Pos logit contributions
 hawk+0.025
 prope+0.023
growth+0.022
 valleys+0.022
 trigger+0.021
Neg logit contributions
 welcome-0.019
 helping-0.019
 polite-0.019
 obedient-0.018
 liking-0.018
Token chocolate
Token ID11311
Feature activation+1.69e-02

Pos logit contributions
 hawk+0.005
 prope+0.005
 trigger+0.005
 valleys+0.004
 winds+0.004
Neg logit contributions
 helping-0.004
 welcome-0.004
 liking-0.004
 obedient-0.004
 eager-0.004
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
 hawk+0.037
 prope+0.034
growth+0.033
 valleys+0.033
 trigger+0.032
Neg logit contributions
 welcome-0.020
 helping-0.020
 polite-0.019
 eager-0.019
 obedient-0.019
Token
Token ID198
Feature activation+1.44e-03

Pos logit contributions
 hawk+0.023
 prope+0.022
growth+0.021
 valleys+0.021
 trigger+0.020
Neg logit contributions
 welcome-0.016
 helping-0.016
 polite-0.015
 obedient-0.015
 liking-0.015
Token of
Token ID286
Feature activation-2.17e-02

Pos logit contributions
 hawk+0.030
 prope+0.027
growth+0.027
aw+0.027
 valleys+0.026
Neg logit contributions
 welcome-0.019
 polite-0.019
 helping-0.019
 hugging-0.019
 obedient-0.019
Token candy
Token ID18550
Feature activation+2.21e-02

Pos logit contributions
ll+0.036
 helping+0.035
 liking+0.035
 obedient+0.035
 hugging+0.035
Neg logit contributions
 hawk-0.032
 prope-0.029
 trigger-0.028
growth-0.028
 winds-0.027
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
 hawk+0.049
 prope+0.046
growth+0.044
 valleys+0.044
 trigger+0.044
Neg logit contributions
 welcome-0.025
 helping-0.024
 polite-0.023
 obedient-0.023
 eager-0.023
Token carefully
Token ID7773
Feature activation+3.52e-02

Pos logit contributions
 hawk+0.022
growth+0.021
 prope+0.021
 valleys+0.020
 happens+0.020
Neg logit contributions
 welcome-0.019
 liking-0.017
 helping-0.017
 hugging-0.017
 polite-0.017
Token unw
Token ID7379
Feature activation+3.43e-02

Pos logit contributions
 hawk+0.071
 prope+0.068
growth+0.067
 valleys+0.065
 mast+0.064
Neg logit contributions
 welcome-0.046
 helping-0.043
 polite-0.042
 obedient-0.042
 eager-0.041
Tokenra
Token ID430
Feature activation+7.43e-02

Pos logit contributions
 hawk+0.070
 prope+0.062
 valleys+0.060
growth+0.060
 winds+0.059
Neg logit contributions
 obedient-0.046
 welcome-0.046
 helping-0.046
 polite-0.045
 hugging-0.045
Tokenpped
Token ID1496
Feature activation+7.76e-03

Pos logit contributions
 hawk+0.092
 prope+0.083
growth+0.077
 valleys+0.076
 trigger+0.072
Neg logit contributions
 welcome-0.158
 helping-0.156
 hugging-0.152
 polite-0.152
 sweet-0.152
Token it
Token ID340
Feature activation-2.85e-03

Pos logit contributions
 hawk+0.011
 prope+0.010
growth+0.010
 valleys+0.010
 trigger+0.010
Neg logit contributions
 welcome-0.015
 polite-0.014
 obedient-0.014
 helping-0.014
 eager-0.014
Token.
Token ID13
Feature activation-7.22e-03

Pos logit contributions
 welcome+0.005
 liking+0.005
 obedient+0.005
 polite+0.005
 hugging+0.005
Neg logit contributions
 hawk-0.004
 valleys-0.004
 prope-0.004
aw-0.004
 happens-0.004
Token She
Token ID1375
Feature activation-1.02e-03

Pos logit contributions
 helping+0.010
 welcome+0.010
 polite+0.010
 liking+0.010
 obedient+0.010
Neg logit contributions
growth-0.013
 hawk-0.013
 valleys-0.013
 prope-0.013
 winds-0.012
Token popped
Token ID22928
Feature activation-8.16e-04

Pos logit contributions
 welcome+0.002
 helping+0.001
 liking+0.001
 hugging+0.001
 polite+0.001
Neg logit contributions
 hawk-0.002
 prope-0.002
growth-0.002
 valleys-0.002
 mast-0.002
Token and
Token ID290
Feature activation+1.18e-02

Pos logit contributions
 hawk+0.015
 prope+0.014
 valleys+0.014
 mast+0.014
growth+0.013
Neg logit contributions
 welcome-0.010
 liking-0.009
 polite-0.009
 obedient-0.009
 sweet-0.009
Token watched
Token ID7342
Feature activation+6.84e-03

Pos logit contributions
 hawk+0.024
 prope+0.022
growth+0.021
 valleys+0.021
 mast+0.021
Neg logit contributions
 welcome-0.016
 helping-0.015
 liking-0.015
 polite-0.015
 hugging-0.015
Token it
Token ID340
Feature activation-1.90e-02

Pos logit contributions
 hawk+0.012
 prope+0.011
growth+0.011
 valleys+0.011
 mast+0.010
Neg logit contributions
 welcome-0.011
 polite-0.010
 obedient-0.010
 helping-0.010
 liking-0.010
Token for
Token ID329
Feature activation+6.53e-03

Pos logit contributions
 welcome+0.028
 helping+0.027
 liking+0.027
 polite+0.027
 obedient+0.027
Neg logit contributions
 hawk-0.035
 prope-0.033
 valleys-0.032
growth-0.031
 mast-0.031
Token as
Token ID355
Feature activation+3.00e-02

Pos logit contributions
 hawk+0.014
growth+0.013
 prope+0.013
 valleys+0.013
 islands+0.012
Neg logit contributions
 welcome-0.008
 hugging-0.008
 eager-0.007
 obedient-0.007
 polite-0.007
Token long
Token ID890
Feature activation+7.34e-02

Pos logit contributions
 hawk+0.053
 prope+0.050
 valleys+0.050
 islands+0.048
growth+0.047
Neg logit contributions
 welcome-0.045
 liking-0.044
 hugging-0.044
 helping-0.042
 obedient-0.042
Token as
Token ID355
Feature activation+3.29e-02

Pos logit contributions
 hawk+0.176
 prope+0.167
 trigger+0.158
 valleys+0.157
growth+0.157
Neg logit contributions
 helping-0.076
 taking-0.069
 welcome-0.068
 hugging-0.068
 playing-0.068
Token they
Token ID484
Feature activation+1.42e-02

Pos logit contributions
 hawk+0.054
 valleys+0.050
 prope+0.050
 islands+0.049
growth+0.047
Neg logit contributions
 welcome-0.053
 liking-0.052
 hugging-0.052
 eager-0.051
 obedient-0.051
Token could
Token ID714
Feature activation+2.23e-02

Pos logit contributions
 hawk+0.024
 prope+0.022
growth+0.021
 trigger+0.021
 valleys+0.021
Neg logit contributions
 welcome-0.023
 helping-0.023
 eager-0.022
 polite-0.022
 obedient-0.022
Token.
Token ID13
Feature activation+5.80e-03

Pos logit contributions
 hawk+0.045
 prope+0.043
growth+0.041
 valleys+0.041
 mast+0.040
Neg logit contributions
 welcome-0.029
 helping-0.029
 polite-0.028
 sweet-0.028
 obedient-0.028
Token They
Token ID1119
Feature activation-2.12e-03

Pos logit contributions
growth+0.010
 valleys+0.009
 hawk+0.009
 prope+0.009
aw+0.009
Neg logit contributions
 helping-0.009
 polite-0.009
 hugging-0.009
 welcome-0.009
 liking-0.009
TokenA
Token ID32
Feature activation-6.85e-03

Pos logit contributions
 polite+0.021
 welcome+0.021
 obedient+0.020
 sweet+0.020
 eager+0.019
Neg logit contributions
 hawk-0.036
 prope-0.033
 trigger-0.032
 mast-0.031
growth-0.031
Tokenbr
Token ID1671
Feature activation+5.97e-03

Pos logit contributions
 welcome+0.015
 helping+0.015
 sweet+0.014
 polite+0.014
 obedient+0.014
Neg logit contributions
 hawk-0.008
growth-0.007
 prope-0.007
 valleys-0.007
 trigger-0.006
Tokenac
Token ID330
Feature activation+9.90e-03

Pos logit contributions
 hawk+0.015
 prope+0.013
growth+0.013
 valleys+0.013
 winds+0.013
Neg logit contributions
 welcome-0.007
 sweet-0.006
 polite-0.006
 first-0.006
 helping-0.006
Tokenad
Token ID324
Feature activation+1.92e-02

Pos logit contributions
 hawk+0.017
 prope+0.016
 valleys+0.016
growth+0.015
 mast+0.015
Neg logit contributions
 welcome-0.017
 helping-0.016
 sweet-0.015
 polite-0.015
 obedient-0.015
Tokenab
Token ID397
Feature activation-1.09e-02

Pos logit contributions
 hawk+0.034
 prope+0.032
growth+0.029
 valleys+0.029
 happens+0.029
Neg logit contributions
 welcome-0.030
 helping-0.030
 hugging-0.030
 eager-0.029
 polite-0.029
Tokenra
Token ID430
Feature activation+7.28e-02

Pos logit contributions
 welcome+0.019
 helping+0.019
 polite+0.019
 eager+0.019
 hugging+0.018
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.015
 valleys-0.015
 trigger-0.015
Token!"
Token ID2474
Feature activation-6.92e-03

Pos logit contributions
 hawk+0.149
 prope+0.141
growth+0.135
 valleys+0.132
 happens+0.132
Neg logit contributions
 welcome-0.093
 helping-0.092
 eager-0.090
 polite-0.089
 obedient-0.088
Token
Token ID198
Feature activation+3.05e-03

Pos logit contributions
 welcome+0.007
 helping+0.007
 sweet+0.006
 polite+0.006
 eager+0.006
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.015
 valleys-0.015
 trigger-0.015
Token
Token ID198
Feature activation+7.94e-03

Pos logit contributions
 hawk+0.007
 prope+0.007
growth+0.007
 valleys+0.007
 trigger+0.007
Neg logit contributions
 welcome-0.003
 helping-0.003
 sweet-0.003
 polite-0.003
 eager-0.003
TokenNothing
Token ID18465
Feature activation-5.98e-03

Pos logit contributions
 hawk+0.019
 prope+0.018
growth+0.017
 trigger+0.017
 valleys+0.017
Neg logit contributions
 welcome-0.008
 polite-0.008
 sweet-0.008
 helping-0.008
 obedient-0.008
Token happens
Token ID4325
Feature activation+1.27e-02

Pos logit contributions
 welcome+0.014
 helping+0.014
 sweet+0.014
 polite+0.013
 eager+0.013
Neg logit contributions
 hawk-0.006
 prope-0.006
growth-0.006
 valleys-0.005
 hull-0.005
Token was
Token ID373
Feature activation-1.28e-02

Pos logit contributions
 welcome+0.009
 helping+0.009
 polite+0.008
 hugging+0.008
 eager+0.008
Neg logit contributions
 hawk-0.019
 prope-0.018
growth-0.017
 valleys-0.017
 trigger-0.016
Token a
Token ID257
Feature activation+6.29e-03

Pos logit contributions
 welcome+0.011
 helping+0.011
 hugging+0.010
 polite+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.032
 prope-0.030
growth-0.029
 valleys-0.029
 trigger-0.028
Token little
Token ID1310
Feature activation-4.47e-03

Pos logit contributions
 hawk+0.016
 prope+0.015
 valleys+0.014
growth+0.014
 mast+0.014
Neg logit contributions
 welcome-0.005
 helping-0.005
 hugging-0.005
 liking-0.005
 eager-0.004
Token girl
Token ID2576
Feature activation+6.46e-03

Pos logit contributions
 welcome+0.007
 helping+0.006
 sweet+0.006
 polite+0.006
 obedient+0.006
Neg logit contributions
 hawk-0.008
 prope-0.008
growth-0.008
 valleys-0.008
 trigger-0.007
Token named
Token ID3706
Feature activation-1.82e-02

Pos logit contributions
 hawk+0.014
 prope+0.013
growth+0.012
 valleys+0.012
 trigger+0.012
Neg logit contributions
 welcome-0.008
 helping-0.008
 polite-0.008
 hugging-0.008
 sweet-0.007
Token Rus
Token ID9223
Feature activation+7.17e-02

Pos logit contributions
 helping+0.028
 encouraging+0.027
 hugging+0.027
 teaching+0.026
 obedient+0.026
Neg logit contributions
 happens-0.024
growth-0.023
 hawk-0.023
 valleys-0.022
 prope-0.021
Tokensel
Token ID741
Feature activation+2.64e-02

Pos logit contributions
 hawk+0.140
 prope+0.128
 winds+0.119
 trigger+0.118
 bolt+0.118
Neg logit contributions
 first-0.105
 looking-0.102
 sweet-0.099
 not-0.098
 obedient-0.098
Token.
Token ID13
Feature activation+1.75e-02

Pos logit contributions
 hawk+0.054
 prope+0.050
growth+0.048
 valleys+0.048
 trigger+0.047
Neg logit contributions
 welcome-0.034
 helping-0.033
 hugging-0.032
 polite-0.031
 liking-0.031
Token Rus
Token ID9223
Feature activation+5.80e-02

Pos logit contributions
 hawk+0.051
 prope+0.047
 bolt+0.045
 mast+0.044
 hull+0.044
Neg logit contributions
 welcome-0.012
 first-0.011
 her-0.011
 eager-0.011
 helping-0.010
Tokensel
Token ID741
Feature activation+1.14e-02

Pos logit contributions
 hawk+0.099
 prope+0.088
 bolt+0.078
 trigger+0.075
 conclusions+0.075
Neg logit contributions
 sweet-0.102
 first-0.102
 not-0.101
 looking-0.099
 thoughtful-0.099
Token had
Token ID550
Feature activation+1.21e-03

Pos logit contributions
 hawk+0.025
 prope+0.024
growth+0.023
 valleys+0.023
aw+0.022
Neg logit contributions
 welcome-0.012
 helping-0.012
 hugging-0.012
 liking-0.011
 polite-0.011
Token They
Token ID1119
Feature activation+8.18e-03

Pos logit contributions
 hawk+0.028
 prope+0.027
growth+0.025
 valleys+0.025
 trigger+0.025
Neg logit contributions
 welcome-0.017
 helping-0.017
 sweet-0.016
 polite-0.016
 hugging-0.016
Token had
Token ID550
Feature activation+3.33e-02

Pos logit contributions
 hawk+0.017
 prope+0.016
growth+0.015
 valleys+0.015
 mast+0.014
Neg logit contributions
 welcome-0.011
 helping-0.011
 hugging-0.010
 polite-0.010
 obedient-0.010
Token red
Token ID2266
Feature activation+3.10e-02

Pos logit contributions
 hawk+0.071
 prope+0.067
 valleys+0.065
growth+0.064
 islands+0.064
Neg logit contributions
 welcome-0.039
 helping-0.038
 hugging-0.038
 obedient-0.036
 liking-0.036
Token eyes
Token ID2951
Feature activation+4.11e-02

Pos logit contributions
 hawk+0.058
 prope+0.054
growth+0.052
 valleys+0.052
 trigger+0.050
Neg logit contributions
 welcome-0.046
 helping-0.045
 polite-0.044
 hugging-0.044
 sweet-0.044
Token and
Token ID290
Feature activation+2.93e-02

Pos logit contributions
 hawk+0.087
 prope+0.084
aw+0.079
growth+0.079
 valleys+0.079
Neg logit contributions
 welcome-0.052
 helping-0.047
 obedient-0.046
 polite-0.046
 eager-0.045
Token sharp
Token ID7786
Feature activation+7.14e-02

Pos logit contributions
 hawk+0.056
 prope+0.053
growth+0.051
 valleys+0.050
 mast+0.049
Neg logit contributions
 welcome-0.043
 helping-0.041
 hugging-0.040
 eager-0.039
 liking-0.039
Token be
Token ID307
Feature activation+1.98e-02

Pos logit contributions
 hawk+0.122
 prope+0.112
growth+0.106
 valleys+0.105
 obstacle+0.101
Neg logit contributions
 welcome-0.118
 sweet-0.117
 helping-0.116
 hugging-0.113
 polite-0.113
Tokenaks
Token ID4730
Feature activation+3.58e-02

Pos logit contributions
aw+0.021
 hawk+0.021
 prope+0.019
 valleys+0.018
 islands+0.018
Neg logit contributions
 welcome-0.043
 helping-0.042
 hugging-0.041
 liking-0.041
 eager-0.040
Token.
Token ID13
Feature activation+1.75e-02

Pos logit contributions
 hawk+0.072
 prope+0.069
 valleys+0.066
growth+0.065
 trigger+0.065
Neg logit contributions
 welcome-0.047
 helping-0.045
 liking-0.043
 hugging-0.043
 polite-0.043
Token They
Token ID1119
Feature activation+1.12e-02

Pos logit contributions
 hawk+0.038
 prope+0.036
growth+0.035
 valleys+0.034
 trigger+0.033
Neg logit contributions
 welcome-0.021
 helping-0.021
 polite-0.020
 sweet-0.020
 hugging-0.020
Token did
Token ID750
Feature activation+2.94e-02

Pos logit contributions
 hawk+0.023
 prope+0.022
growth+0.021
 valleys+0.021
 trigger+0.020
Neg logit contributions
 welcome-0.015
 helping-0.014
 hugging-0.014
 polite-0.014
 obedient-0.014
Token the
Token ID262
Feature activation+1.66e-02

Pos logit contributions
 welcome+0.007
 polite+0.007
 helping+0.007
 obedient+0.007
 eager+0.007
Neg logit contributions
 hawk-0.010
growth-0.009
 valleys-0.009
 prope-0.009
 islands-0.008
Token park
Token ID3952
Feature activation+6.14e-03

Pos logit contributions
 hawk+0.030
 prope+0.027
growth+0.026
 valleys+0.026
 trigger+0.025
Neg logit contributions
 welcome-0.026
 helping-0.026
 hugging-0.025
 polite-0.025
 eager-0.025
Token.
Token ID13
Feature activation+7.21e-03

Pos logit contributions
 hawk+0.011
 valleys+0.010
 prope+0.010
 mast+0.010
aw+0.010
Neg logit contributions
 welcome-0.009
 sweet-0.008
 helping-0.008
 polite-0.008
 hugging-0.008
Token He
Token ID679
Feature activation+2.14e-02

Pos logit contributions
 hawk+0.012
 prope+0.012
growth+0.011
 valleys+0.011
 trigger+0.011
Neg logit contributions
 welcome-0.012
 helping-0.012
 hugging-0.011
 polite-0.011
 obedient-0.011
Token unw
Token ID7379
Feature activation+2.20e-02

Pos logit contributions
 hawk+0.041
 prope+0.037
 valleys+0.036
growth+0.036
aw+0.035
Neg logit contributions
 welcome-0.032
 helping-0.031
 hugging-0.030
 liking-0.030
 sweet-0.029
Tokenra
Token ID430
Feature activation+7.11e-02

Pos logit contributions
 hawk+0.044
 prope+0.040
 valleys+0.039
growth+0.038
 happens+0.038
Neg logit contributions
 obedient-0.030
 helping-0.029
 welcome-0.029
 hugging-0.029
 polite-0.029
Tokenpped
Token ID1496
Feature activation-5.81e-04

Pos logit contributions
 hawk+0.092
 prope+0.084
growth+0.077
 valleys+0.077
 mast+0.073
Neg logit contributions
 welcome-0.147
 helping-0.146
 hugging-0.143
 sweet-0.142
 polite-0.142
Token the
Token ID262
Feature activation+1.00e-02

Pos logit contributions
 welcome+0.001
 helping+0.001
 hugging+0.001
 polite+0.001
 eager+0.001
Neg logit contributions
 hawk-0.001
 prope-0.001
growth-0.001
 valleys-0.001
 mast-0.001
Token l
Token ID300
Feature activation+1.65e-02

Pos logit contributions
 hawk+0.021
 prope+0.019
 trigger+0.019
growth+0.018
 valleys+0.018
Neg logit contributions
 helping-0.013
 welcome-0.012
 eager-0.012
 obedient-0.012
 hugging-0.012
Tokenoll
Token ID692
Feature activation+1.80e-02

Pos logit contributions
 hawk+0.028
 prope+0.027
 valleys+0.025
growth+0.024
 mast+0.024
Neg logit contributions
 welcome-0.028
 helping-0.026
 liking-0.026
 sweet-0.026
 hugging-0.026
Tokenipop
Token ID42800
Feature activation+2.92e-03

Pos logit contributions
 hawk+0.030
 prope+0.028
growth+0.026
 valleys+0.026
 trigger+0.025
Neg logit contributions
 welcome-0.031
 helping-0.030
 polite-0.029
 hugging-0.029
 eager-0.029
Token grew
Token ID6348
Feature activation-1.01e-02

Pos logit contributions
 hawk+0.015
 valleys+0.014
growth+0.014
 prope+0.014
 islands+0.014
Neg logit contributions
 welcome-0.016
 hugging-0.015
 helping-0.015
 liking-0.015
 polite-0.015
Token mint
Token ID21043
Feature activation+1.35e-02

Pos logit contributions
 welcome+0.012
 hugging+0.012
 liking+0.011
 helping+0.011
 polite+0.011
Neg logit contributions
 hawk-0.021
 valleys-0.020
 prope-0.020
 islands-0.020
growth-0.019
Token and
Token ID290
Feature activation+2.16e-02

Pos logit contributions
 hawk+0.024
 valleys+0.023
growth+0.022
 prope+0.022
 islands+0.022
Neg logit contributions
 welcome-0.021
 hugging-0.020
 helping-0.019
 eager-0.019
 polite-0.019
Token basil
Token ID37792
Feature activation+3.04e-02

Pos logit contributions
 hawk+0.038
 prope+0.036
 valleys+0.035
growth+0.034
 mast+0.033
Neg logit contributions
 welcome-0.034
 obedient-0.032
 polite-0.031
 eager-0.031
 thoughtful-0.029
Token and
Token ID290
Feature activation+1.91e-02

Pos logit contributions
 hawk+0.049
 valleys+0.047
growth+0.046
aw+0.045
 islands+0.045
Neg logit contributions
 welcome-0.048
 obedient-0.048
 polite-0.047
 eager-0.047
 hugging-0.046
Token pars
Token ID13544
Feature activation+7.04e-02

Pos logit contributions
 hawk+0.031
 prope+0.029
growth+0.028
 valleys+0.028
 winds+0.027
Neg logit contributions
 welcome-0.033
 obedient-0.031
 eager-0.030
 polite-0.030
 hugging-0.029
Tokenley
Token ID1636
Feature activation+2.47e-02

Pos logit contributions
 hawk+0.079
 prope+0.072
growth+0.064
 valleys+0.064
 mast+0.060
Neg logit contributions
 welcome-0.156
 helping-0.155
 sweet-0.151
 hugging-0.151
 polite-0.150
Token.
Token ID13
Feature activation+1.88e-03

Pos logit contributions
 hawk+0.046
growth+0.043
 prope+0.043
 valleys+0.043
 islands+0.041
Neg logit contributions
 welcome-0.035
 helping-0.033
 polite-0.033
 obedient-0.033
 hugging-0.033
Token
Token ID198
Feature activation+1.33e-02

Pos logit contributions
 hawk+0.005
 prope+0.004
growth+0.004
 valleys+0.004
 trigger+0.004
Neg logit contributions
 welcome-0.002
 helping-0.002
 sweet-0.002
 eager-0.002
 polite-0.002
Token
Token ID198
Feature activation+1.70e-02

Pos logit contributions
 hawk+0.031
 prope+0.029
growth+0.028
 valleys+0.028
 trigger+0.027
Neg logit contributions
 welcome-0.014
 helping-0.014
 hugging-0.013
 polite-0.013
 eager-0.013
TokenOne
Token ID3198
Feature activation-2.09e-02

Pos logit contributions
 hawk+0.028
 prope+0.026
growth+0.025
 valleys+0.025
 trigger+0.024
Neg logit contributions
 welcome-0.029
 helping-0.028
 polite-0.028
 sweet-0.028
 obedient-0.028
Token We
Token ID775
Feature activation+1.01e-02

Pos logit contributions
 hawk+0.015
 prope+0.014
growth+0.013
 valleys+0.013
 mast+0.013
Neg logit contributions
 welcome-0.009
 helping-0.009
 polite-0.009
 sweet-0.008
 hugging-0.008
Token can
Token ID460
Feature activation-4.46e-03

Pos logit contributions
 hawk+0.021
 prope+0.019
growth+0.018
 valleys+0.018
 islands+0.018
Neg logit contributions
 helping-0.013
 hugging-0.013
 welcome-0.013
 liking-0.012
 eager-0.012
Token play
Token ID711
Feature activation-2.34e-02

Pos logit contributions
 helping+0.005
 obedient+0.005
 eager+0.005
 welcome+0.005
 polite+0.005
Neg logit contributions
 hawk-0.009
 prope-0.009
growth-0.008
 mast-0.008
 hull-0.008
Token here
Token ID994
Feature activation+2.61e-03

Pos logit contributions
 welcome+0.022
 eager+0.021
 hugging+0.021
 helping+0.021
 obedient+0.020
Neg logit contributions
 hawk-0.054
 prope-0.052
 valleys-0.051
 islands-0.051
growth-0.049
Token as
Token ID355
Feature activation+2.67e-02

Pos logit contributions
 hawk+0.004
 prope+0.004
 valleys+0.004
 islands+0.004
aw+0.004
Neg logit contributions
 hugging-0.004
 welcome-0.004
 liking-0.004
 helping-0.004
 eager-0.004
Token long
Token ID890
Feature activation+7.03e-02

Pos logit contributions
 hawk+0.057
 prope+0.053
 valleys+0.051
growth+0.051
 islands+0.051
Neg logit contributions
 welcome-0.032
 hugging-0.032
 helping-0.031
 eager-0.030
 polite-0.030
Token as
Token ID355
Feature activation+3.25e-02

Pos logit contributions
 hawk+0.156
 prope+0.148
growth+0.142
 valleys+0.142
 trigger+0.137
Neg logit contributions
 helping-0.080
 welcome-0.079
 liking-0.074
 hugging-0.074
 sweet-0.074
Token we
Token ID356
Feature activation+1.17e-02

Pos logit contributions
 hawk+0.071
 prope+0.066
 valleys+0.065
 islands+0.065
growth+0.065
Neg logit contributions
 hugging-0.036
 helping-0.035
 welcome-0.035
 liking-0.034
 eager-0.034
Token want
Token ID765
Feature activation+2.01e-02

Pos logit contributions
 hawk+0.019
 prope+0.018
 valleys+0.018
growth+0.018
 mast+0.017
Neg logit contributions
 helping-0.019
 hugging-0.019
 sweet-0.018
 obedient-0.018
 welcome-0.018
Token."
Token ID526
Feature activation+1.01e-02

Pos logit contributions
 hawk+0.043
 prope+0.041
growth+0.039
 valleys+0.039
 islands+0.038
Neg logit contributions
 welcome-0.024
 helping-0.023
 hugging-0.023
 eager-0.023
 obedient-0.023
Token
Token ID198
Feature activation+1.11e-02

Pos logit contributions
 hawk+0.027
 prope+0.026
growth+0.025
 trigger+0.024
 valleys+0.024
Neg logit contributions
 welcome-0.008
 helping-0.007
 sweet-0.007
 hugging-0.007
 eager-0.007
Token Aub
Token ID19723
Feature activation+7.51e-02

Pos logit contributions
 hawk+0.018
 prope+0.017
growth+0.017
 valleys+0.016
 mast+0.016
Neg logit contributions
 welcome-0.009
 helping-0.009
 hugging-0.008
 liking-0.008
 polite-0.008
Tokenrey
Token ID4364
Feature activation+3.74e-02

Pos logit contributions
 hawk+0.139
 prope+0.128
 valleys+0.123
 happens+0.119
growth+0.118
Neg logit contributions
 first-0.119
 not-0.112
 welcome-0.111
 polite-0.107
 sweet-0.104
Token.
Token ID13
Feature activation+2.82e-02

Pos logit contributions
 hawk+0.090
 prope+0.085
growth+0.083
 valleys+0.082
 trigger+0.081
Neg logit contributions
 welcome-0.039
 helping-0.038
 eager-0.036
 hugging-0.036
 polite-0.036
Token Alan
Token ID12246
Feature activation+1.68e-02

Pos logit contributions
 hawk+0.071
 prope+0.070
 bolt+0.064
bridge+0.062
 mast+0.062
Neg logit contributions
 welcome-0.026
 eager-0.025
 looking-0.023
 helping-0.023
 first-0.023
Token and
Token ID290
Feature activation+7.31e-03

Pos logit contributions
 hawk+0.034
 prope+0.032
 valleys+0.031
growth+0.031
 happens+0.031
Neg logit contributions
 welcome-0.021
 helping-0.020
 hugging-0.020
 liking-0.020
 polite-0.019
Token Aub
Token ID19723
Feature activation+6.98e-02

Pos logit contributions
 hawk+0.012
 prope+0.011
growth+0.011
 valleys+0.010
 trigger+0.010
Neg logit contributions
 welcome-0.013
 helping-0.012
 polite-0.012
 sweet-0.012
 eager-0.012
Tokenrey
Token ID4364
Feature activation+2.54e-02

Pos logit contributions
 hawk+0.135
 prope+0.126
 valleys+0.124
growth+0.119
isel+0.118
Neg logit contributions
 first-0.102
 welcome-0.097
 not-0.094
 polite-0.091
 obedient-0.089
Token would
Token ID561
Feature activation+3.67e-03

Pos logit contributions
 hawk+0.055
 prope+0.051
growth+0.050
 valleys+0.049
 trigger+0.048
Neg logit contributions
 welcome-0.032
 helping-0.031
 polite-0.030
 eager-0.030
 sweet-0.030
Token play
Token ID711
Feature activation-2.02e-02

Pos logit contributions
 hawk+0.008
 prope+0.007
 mast+0.007
 hull+0.007
growth+0.007
Neg logit contributions
 hugging-0.004
 liking-0.004
 welcome-0.004
 obedient-0.004
 sweet-0.004
Token together
Token ID1978
Feature activation+1.07e-02

Pos logit contributions
 welcome+0.018
 hugging+0.018
 helping+0.017
 eager+0.017
 liking+0.017
Neg logit contributions
 hawk-0.049
 prope-0.046
 valleys-0.045
growth-0.045
 islands-0.044
Token every
Token ID790
Feature activation+2.60e-03

Pos logit contributions
 hawk+0.021
 prope+0.020
 valleys+0.019
growth+0.019
 islands+0.019
Neg logit contributions
 welcome-0.014
 hugging-0.014
 obedient-0.014
 polite-0.013
 sweet-0.013
Token to
Token ID284
Feature activation+5.65e-03

Pos logit contributions
 hawk+0.030
 prope+0.028
 happens+0.028
growth+0.028
 winds+0.026
Neg logit contributions
 welcome-0.014
 hugging-0.014
 liking-0.013
 helping-0.013
 eager-0.013
Token stay
Token ID2652
Feature activation-1.05e-02

Pos logit contributions
 hawk+0.010
 hull+0.010
 mast+0.010
 prope+0.010
growth+0.009
Neg logit contributions
 eager-0.008
 hugging-0.007
 liking-0.007
 helping-0.007
 thoughtful-0.007
Token outside
Token ID2354
Feature activation-2.85e-02

Pos logit contributions
 liking+0.012
 welcome+0.012
 hugging+0.011
 handing+0.011
 allow+0.011
Neg logit contributions
 hawk-0.020
 prope-0.020
 valleys-0.019
 hull-0.019
 islands-0.018
Token for
Token ID329
Feature activation+9.60e-04

Pos logit contributions
 welcome+0.044
 liking+0.044
 polite+0.042
 thoughtful+0.041
 supposed+0.040
Neg logit contributions
 prope-0.043
 valleys-0.043
 hawk-0.040
 mountains-0.039
 winds-0.038
Token as
Token ID355
Feature activation+2.12e-02

Pos logit contributions
 hawk+0.002
growth+0.002
 prope+0.002
 valleys+0.002
 winds+0.002
Neg logit contributions
 eager-0.001
 welcome-0.001
 hugging-0.001
 polite-0.001
 obedient-0.001
Token long
Token ID890
Feature activation+6.97e-02

Pos logit contributions
 hawk+0.038
 prope+0.036
 valleys+0.035
growth+0.034
 islands+0.033
Neg logit contributions
 welcome-0.029
 hugging-0.028
 handing-0.028
 liking-0.027
 cheek-0.026
Token as
Token ID355
Feature activation+2.59e-02

Pos logit contributions
 hawk+0.158
 prope+0.149
growth+0.143
 valleys+0.142
 trigger+0.142
Neg logit contributions
 helping-0.079
 welcome-0.075
 hugging-0.073
 sweet-0.072
 eager-0.072
Token she
Token ID673
Feature activation+1.42e-02

Pos logit contributions
 hawk+0.048
 valleys+0.045
 prope+0.045
growth+0.044
 islands+0.043
Neg logit contributions
 welcome-0.034
 eager-0.033
 liking-0.033
 hugging-0.033
 obedient-0.032
Token wanted
Token ID2227
Feature activation+9.66e-03

Pos logit contributions
 hawk+0.023
 prope+0.021
growth+0.021
 valleys+0.021
 happens+0.020
Neg logit contributions
 welcome-0.025
 helping-0.024
 liking-0.024
 hugging-0.024
 polite-0.023
Token,
Token ID11
Feature activation+1.25e-02

Pos logit contributions
 hawk+0.013
 happens+0.013
growth+0.013
 winds+0.013
 prope+0.013
Neg logit contributions
 eager-0.016
 welcome-0.015
 polite-0.015
 obedient-0.015
 glad-0.015
Token as
Token ID355
Feature activation+2.31e-02

Pos logit contributions
 prope+0.023
 hawk+0.023
 valleys+0.021
 winds+0.021
 mast+0.020
Neg logit contributions
 welcome-0.016
 liking-0.015
pack-0.015
ll-0.014
 hugging-0.014
TokenA
Token ID32
Feature activation-1.11e-02

Pos logit contributions
 welcome+0.008
 polite+0.008
 helping+0.008
 obedient+0.008
 sweet+0.008
Neg logit contributions
 hawk-0.013
 prope-0.013
growth-0.012
 valleys-0.012
 mast-0.012
Tokenbr
Token ID1671
Feature activation+5.47e-03

Pos logit contributions
 hugging+0.022
 helping+0.021
 welcome+0.021
 eager+0.021
 obedient+0.021
Neg logit contributions
 hawk-0.016
 prope-0.014
 trigger-0.013
 obstacle-0.013
 happens-0.013
Tokenac
Token ID330
Feature activation+9.28e-03

Pos logit contributions
 hawk+0.013
 prope+0.012
growth+0.012
 valleys+0.012
 winds+0.012
Neg logit contributions
 welcome-0.006
 sweet-0.005
 polite-0.005
 first-0.005
 helping-0.005
Tokenad
Token ID324
Feature activation+1.69e-02

Pos logit contributions
 hawk+0.015
 prope+0.013
growth+0.013
 valleys+0.013
 trigger+0.012
Neg logit contributions
 welcome-0.017
 helping-0.017
 hugging-0.016
 polite-0.016
 eager-0.016
Tokenab
Token ID397
Feature activation-9.23e-03

Pos logit contributions
 hawk+0.028
 prope+0.026
aw+0.025
 happens+0.024
growth+0.023
Neg logit contributions
 helping-0.028
 welcome-0.028
 hugging-0.028
 eager-0.027
 polite-0.027
Tokenra
Token ID430
Feature activation+6.97e-02

Pos logit contributions
 helping+0.017
 welcome+0.017
 eager+0.017
 polite+0.017
 hugging+0.016
Neg logit contributions
 hawk-0.014
 prope-0.013
growth-0.012
 trigger-0.012
 mast-0.012
Token,
Token ID11
Feature activation-1.14e-02

Pos logit contributions
 hawk+0.142
 prope+0.134
 happens+0.128
growth+0.126
aw+0.125
Neg logit contributions
 welcome-0.089
 helping-0.088
 eager-0.087
 polite-0.086
 obedient-0.084
Token I
Token ID314
Feature activation+5.25e-03

Pos logit contributions
 eager+0.015
 hugging+0.014
 helping+0.014
 welcome+0.014
 obedient+0.014
Neg logit contributions
 hawk-0.023
 prope-0.022
 mast-0.020
aw-0.019
 trigger-0.019
Token turn
Token ID1210
Feature activation-1.29e-02

Pos logit contributions
 hawk+0.009
 prope+0.009
 happens+0.009
growth+0.008
isel+0.008
Neg logit contributions
 helping-0.008
 hugging-0.008
 eager-0.007
 liking-0.007
 welcome-0.007
Token you
Token ID345
Feature activation-6.67e-03

Pos logit contributions
 welcome+0.019
 helping+0.018
 hugging+0.018
 polite+0.018
 liking+0.018
Neg logit contributions
 hawk-0.024
 prope-0.023
 happens-0.021
 valleys-0.021
 trigger-0.021
Token into
Token ID656
Feature activation-3.01e-02

Pos logit contributions
 helping+0.009
 hugging+0.009
 liking+0.009
 handing+0.009
 welcome+0.009
Neg logit contributions
 happens-0.011
 prope-0.011
 hawk-0.011
aw-0.011
 valleys-0.011
Token they
Token ID484
Feature activation-1.63e-02

Pos logit contributions
 hawk+0.025
 prope+0.022
 happens+0.021
growth+0.021
 islands+0.021
Neg logit contributions
 welcome-0.029
 eager-0.028
 polite-0.028
 liking-0.027
 hugging-0.027
Token could
Token ID714
Feature activation-4.13e-03

Pos logit contributions
 welcome+0.030
 hugging+0.030
 helping+0.029
 liking+0.029
 sweet+0.029
Neg logit contributions
 hawk-0.023
 prope-0.021
 valleys-0.020
 happens-0.020
growth-0.020
Token do
Token ID466
Feature activation+2.35e-02

Pos logit contributions
 eager+0.005
 hugging+0.005
 welcome+0.005
 liking+0.005
 obedient+0.005
Neg logit contributions
 hawk-0.008
 prope-0.008
growth-0.008
 valleys-0.008
 islands-0.007
Token anything
Token ID1997
Feature activation+1.90e-02

Pos logit contributions
 hawk+0.039
 prope+0.037
growth+0.035
 valleys+0.034
 trigger+0.033
Neg logit contributions
 welcome-0.040
 helping-0.039
 polite-0.038
 hugging-0.038
 sweet-0.038
Token as
Token ID355
Feature activation+2.87e-02

Pos logit contributions
 hawk+0.035
 happens+0.033
 prope+0.032
 valleys+0.031
 obstacle+0.031
Neg logit contributions
 welcome-0.026
 hugging-0.026
 eager-0.025
 liking-0.024
 helping-0.024
Token long
Token ID890
Feature activation+6.96e-02

Pos logit contributions
 hawk+0.058
 prope+0.053
 islands+0.053
 happens+0.053
growth+0.052
Neg logit contributions
 hugging-0.035
 welcome-0.034
 handing-0.033
 eager-0.032
 liking-0.031
Token as
Token ID355
Feature activation+3.14e-02

Pos logit contributions
 hawk+0.138
 prope+0.129
growth+0.125
 valleys+0.124
 trigger+0.120
Neg logit contributions
 welcome-0.096
 helping-0.091
 polite-0.090
 eager-0.089
 hugging-0.089
Token they
Token ID484
Feature activation+9.95e-03

Pos logit contributions
 hawk+0.050
 islands+0.048
 happens+0.047
 valleys+0.045
 prope+0.045
Neg logit contributions
 hugging-0.049
 welcome-0.049
 eager-0.048
 liking-0.047
 cheek-0.047
Token had
Token ID550
Feature activation+3.35e-02

Pos logit contributions
 hawk+0.015
 prope+0.014
 valleys+0.014
growth+0.014
 happens+0.014
Neg logit contributions
 welcome-0.018
 hugging-0.018
 helping-0.017
 sweet-0.017
 liking-0.017
Token each
Token ID1123
Feature activation-5.29e-03

Pos logit contributions
 happens+0.059
 hawk+0.057
 islands+0.057
aw+0.057
 prope+0.055
Neg logit contributions
 hugging-0.046
 welcome-0.046
 cheek-0.044
 eager-0.043
 obedient-0.042
Token other
Token ID584
Feature activation+1.45e-05

Pos logit contributions
 hugging+0.009
 welcome+0.008
 sweet+0.008
 handing+0.008
 helping+0.008
Neg logit contributions
 hawk-0.008
 targets-0.007
growth-0.007
 happens-0.007
 obstacle-0.007
Token
Token ID198
Feature activation+2.38e-02

Pos logit contributions
 hawk+0.037
 prope+0.037
 trigger+0.036
 winds+0.035
 mast+0.035
Neg logit contributions
 welcome-0.005
 not-0.005
 first-0.005
 eager-0.004
 helping-0.004
TokenOne
Token ID3198
Feature activation-1.44e-02

Pos logit contributions
 prope+0.055
isel+0.051
 hawk+0.051
 mast+0.051
 hull+0.050
Neg logit contributions
 not-0.025
 first-0.025
 polite-0.024
 talking-0.022
 eager-0.022
Token day
Token ID1110
Feature activation+2.08e-02

Pos logit contributions
 hugging+0.022
 helping+0.022
 handing+0.021
 liking+0.021
 welcome+0.020
Neg logit contributions
 hawk-0.026
aw-0.022
 obstacle-0.022
growth-0.022
 prope-0.021
Token Alan
Token ID12246
Feature activation+1.39e-02

Pos logit contributions
 prope+0.059
 hawk+0.057
 cockpit+0.055
 obstacle+0.054
 mast+0.054
Neg logit contributions
 welcome-0.014
 helping-0.013
 first-0.011
 playing-0.011
 looking-0.011
Token and
Token ID290
Feature activation+5.04e-03

Pos logit contributions
 hawk+0.031
 prope+0.030
growth+0.029
 valleys+0.028
 trigger+0.028
Neg logit contributions
 welcome-0.015
 helping-0.015
 polite-0.014
 hugging-0.014
 sweet-0.014
Token Aub
Token ID19723
Feature activation+6.96e-02

Pos logit contributions
 prope+0.008
 hawk+0.008
 valleys+0.008
growth+0.008
 mast+0.007
Neg logit contributions
 welcome-0.009
 helping-0.008
 polite-0.008
 eager-0.008
 obedient-0.008
Tokenrey
Token ID4364
Feature activation+2.11e-02

Pos logit contributions
 hawk+0.137
 prope+0.133
 valleys+0.130
isel+0.124
 mast+0.122
Neg logit contributions
 first-0.096
 welcome-0.089
 not-0.087
 polite-0.083
 obedient-0.082
Token decided
Token ID3066
Feature activation+7.83e-03

Pos logit contributions
 hawk+0.042
 prope+0.039
growth+0.038
 valleys+0.037
 trigger+0.036
Neg logit contributions
 welcome-0.030
 helping-0.029
 polite-0.029
 eager-0.028
 obedient-0.028
Token to
Token ID284
Feature activation-8.18e-04

Pos logit contributions
 hawk+0.021
 prope+0.020
growth+0.019
 valleys+0.019
 trigger+0.019
Neg logit contributions
 welcome-0.006
 helping-0.006
 polite-0.005
 hugging-0.005
 sweet-0.005
Token turn
Token ID1210
Feature activation-1.18e-02

Pos logit contributions
 hugging+0.001
 liking+0.001
 eager+0.001
 helping+0.001
 sweet+0.001
Neg logit contributions
 hawk-0.002
 prope-0.002
growth-0.002
 mast-0.001
 hull-0.001
Token the
Token ID262
Feature activation+4.11e-03

Pos logit contributions
 welcome+0.012
 helping+0.012
 sweet+0.011
 eager+0.011
 polite+0.011
Neg logit contributions
 hawk-0.028
 prope-0.026
growth-0.026
 valleys-0.025
 trigger-0.025
Token very
Token ID845
Feature activation-2.84e-02

Pos logit contributions
 hawk+0.007
 prope+0.007
 valleys+0.006
growth+0.006
 winds+0.006
Neg logit contributions
 hugging-0.002
 welcome-0.002
 liking-0.002
 helping-0.002
 sweet-0.002
Token helpless
Token ID21144
Feature activation-6.81e-03

Pos logit contributions
pack+0.037
 handing+0.036
 hugging+0.036
 telling+0.035
 helping+0.034
Neg logit contributions
 hawk-0.049
 mast-0.046
 prope-0.046
 winds-0.044
 valleys-0.044
Token.
Token ID13
Feature activation+2.27e-03

Pos logit contributions
 welcome+0.010
 polite+0.009
 hugging+0.009
 helping+0.009
 liking+0.009
Neg logit contributions
 hawk-0.013
 prope-0.012
 valleys-0.012
growth-0.012
 winds-0.012
Token He
Token ID679
Feature activation-5.24e-03

Pos logit contributions
 hawk+0.004
 prope+0.003
growth+0.003
 valleys+0.003
 mast+0.003
Neg logit contributions
 helping-0.004
 welcome-0.004
 hugging-0.004
 liking-0.004
 polite-0.004
Token felt
Token ID2936
Feature activation-1.23e-02

Pos logit contributions
 obedient+0.005
 helping+0.005
 eager+0.005
 polite+0.005
 welcome+0.005
Neg logit contributions
 hawk-0.012
 prope-0.012
growth-0.011
 valleys-0.011
 trigger-0.011
Token all
Token ID477
Feature activation-6.94e-02

Pos logit contributions
 welcome+0.011
 helping+0.011
 hugging+0.010
 polite+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.030
 prope-0.029
growth-0.028
 valleys-0.027
 trigger-0.027
Token alone
Token ID3436
Feature activation-2.86e-02

Pos logit contributions
 welcome+0.084
 helping+0.083
 hugging+0.082
 liking+0.082
 polite+0.081
Neg logit contributions
 hawk-0.142
 prope-0.135
 valleys-0.129
growth-0.127
 winds-0.125
Token in
Token ID287
Feature activation-3.15e-02

Pos logit contributions
 welcome+0.036
 hugging+0.033
 polite+0.032
 liking+0.032
 sweet+0.032
Neg logit contributions
 hawk-0.059
 prope-0.056
 valleys-0.054
 winds-0.051
growth-0.051
Token the
Token ID262
Feature activation-1.98e-03

Pos logit contributions
 welcome+0.039
 helping+0.038
 hugging+0.037
 polite+0.037
 obedient+0.036
Neg logit contributions
 hawk-0.067
 prope-0.063
growth-0.062
 valleys-0.061
 islands-0.058
Token big
Token ID1263
Feature activation+5.71e-03

Pos logit contributions
 helping+0.003
 welcome+0.003
 hugging+0.003
 liking+0.003
 polite+0.003
Neg logit contributions
 hawk-0.003
 prope-0.003
 valleys-0.003
growth-0.003
 mast-0.003
Token,
Token ID11
Feature activation-8.93e-03

Pos logit contributions
 hawk+0.010
 prope+0.009
growth+0.009
 valleys+0.009
 trigger+0.009
Neg logit contributions
 welcome-0.009
 helping-0.009
 polite-0.009
 eager-0.009
 sweet-0.009
Token "
Token ID366
Feature activation-2.90e-03

Pos logit contributions
 hawk+0.014
 prope+0.013
 valleys+0.012
 mast+0.012
smith+0.011
Neg logit contributions
 welcome-0.011
 obedient-0.011
 supposed-0.010
 liking-0.010
 clean-0.010
TokenWhy
Token ID5195
Feature activation+6.50e-03

Pos logit contributions
 welcome+0.005
 helping+0.005
 eager+0.005
 hugging+0.005
 obedient+0.005
Neg logit contributions
 hawk-0.005
 prope-0.005
growth-0.005
 valleys-0.004
 trigger-0.004
Token did
Token ID750
Feature activation+3.09e-03

Pos logit contributions
 hawk+0.011
 happens+0.011
 prope+0.010
growth+0.009
aw+0.009
Neg logit contributions
 hugging-0.010
 liking-0.010
 eager-0.010
 helping-0.010
 obedient-0.009
Token you
Token ID345
Feature activation-4.32e-02

Pos logit contributions
 hawk+0.007
 prope+0.007
growth+0.007
 valleys+0.007
 trigger+0.007
Neg logit contributions
 welcome-0.003
 helping-0.003
 hugging-0.003
 eager-0.003
 polite-0.003
Token help
Token ID1037
Feature activation-1.97e-02

Pos logit contributions
 hugging+0.060
 eager+0.058
 loving+0.056
 obedient+0.055
 polite+0.055
Neg logit contributions
 hawk-0.079
growth-0.074
 hull-0.071
isel-0.071
 mast-0.069
Token me
Token ID502
Feature activation-6.71e-02

Pos logit contributions
 eager+0.024
 hugging+0.022
 glad+0.022
 obedient+0.021
 welcome+0.021
Neg logit contributions
 hawk-0.039
growth-0.035
 obstacle-0.035
 conclusions-0.034
 islands-0.034
Token?"
Token ID1701
Feature activation-1.36e-02

Pos logit contributions
 eager+0.093
 cheek+0.088
 hugging+0.086
 liking+0.086
 polite+0.086
Neg logit contributions
 hawk-0.105
growth-0.093
 obstacle-0.092
 anchor-0.092
 hull-0.092
Token
Token ID220
Feature activation-1.02e-02

Pos logit contributions
 welcome+0.020
 helping+0.019
 polite+0.019
 liking+0.019
 obedient+0.019
Neg logit contributions
 hawk-0.025
growth-0.024
 valleys-0.023
 prope-0.023
aw-0.022
Token
Token ID198
Feature activation-1.22e-02

Pos logit contributions
 welcome+0.011
 helping+0.011
 liking+0.011
 polite+0.011
 hugging+0.011
Neg logit contributions
 hawk-0.023
 prope-0.021
growth-0.021
 valleys-0.021
 trigger-0.020
Token
Token ID198
Feature activation-6.17e-03

Pos logit contributions
 welcome+0.015
 helping+0.015
 hugging+0.015
 liking+0.015
 eager+0.015
Neg logit contributions
 hawk-0.025
 prope-0.023
growth-0.023
 valleys-0.023
aw-0.022
TokenThe
Token ID464
Feature activation-8.84e-03

Pos logit contributions
 welcome+0.011
 helping+0.011
 hugging+0.011
 eager+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.010
 prope-0.009
growth-0.009
 valleys-0.009
aw-0.008
Token mom
Token ID1995
Feature activation-8.66e-03

Pos logit contributions
 hawk+0.005
 prope+0.005
 mast+0.004
growth+0.004
 islands+0.004
Neg logit contributions
 welcome-0.003
 helping-0.003
 liking-0.003
 hugging-0.002
 eager-0.002
Token,
Token ID11
Feature activation+1.13e-02

Pos logit contributions
 liking+0.012
 hugging+0.012
 obedient+0.012
 polite+0.012
 welcome+0.012
Neg logit contributions
 hawk-0.013
 valleys-0.013
 islands-0.013
 happens-0.013
smith-0.012
Token "
Token ID366
Feature activation-5.93e-03

Pos logit contributions
 hawk+0.016
 prope+0.015
 mast+0.014
smith+0.013
 hull+0.013
Neg logit contributions
 myself-0.017
 supposed-0.017
 welcome-0.016
 liking-0.015
 obedient-0.015
TokenWhat
Token ID2061
Feature activation+1.29e-02

Pos logit contributions
 welcome+0.010
 eager+0.010
 helping+0.010
 hugging+0.010
 sweet+0.009
Neg logit contributions
 hawk-0.010
growth-0.009
 prope-0.009
 valleys-0.008
smith-0.008
Token made
Token ID925
Feature activation+1.24e-04

Pos logit contributions
 hawk+0.020
 happens+0.020
 prope+0.018
growth+0.017
aw+0.017
Neg logit contributions
 hugging-0.022
 welcome-0.021
 liking-0.021
 helping-0.021
 eager-0.021
Token me
Token ID502
Feature activation-6.65e-02

Pos logit contributions
 hawk+0.000
 happens+0.000
 winds+0.000
aw+0.000
 obstacle+0.000
Neg logit contributions
 hugging-0.000
 eager-0.000
 liking-0.000
 handing-0.000
 obedient-0.000
Token so
Token ID523
Feature activation-2.94e-02

Pos logit contributions
 Deal+0.069
 liking+0.067
 eager+0.066
 Care+0.066
 welcome+0.066
Neg logit contributions
 hawk-0.122
 bolt-0.114
 prope-0.112
 obstacle-0.110
 hull-0.107
Token sad
Token ID6507
Feature activation-1.52e-02

Pos logit contributions
 hugging+0.038
 liking+0.036
 helping+0.036
 welcome+0.034
 handing+0.034
Neg logit contributions
 hawk-0.057
 happens-0.055
 prope-0.054
 winds-0.052
 obstacle-0.051
Token?"
Token ID1701
Feature activation-1.43e-02

Pos logit contributions
 eager+0.020
 hugging+0.020
 welcome+0.020
 liking+0.020
 obedient+0.019
Neg logit contributions
 hawk-0.030
 happens-0.027
 prope-0.026
 obstacle-0.026
 winds-0.026
Token His
Token ID2399
Feature activation+1.42e-03

Pos logit contributions
 supposed+0.024
 sweet+0.024
 liking+0.023
 obedient+0.023
 polite+0.023
Neg logit contributions
growth-0.019
 hawk-0.018
crow-0.017
smith-0.017
aw-0.016
Token mom
Token ID1995
Feature activation-2.26e-02

Pos logit contributions
 hawk+0.003
 prope+0.003
 trigger+0.002
 mast+0.002
growth+0.002
Neg logit contributions
 helping-0.002
 welcome-0.002
 liking-0.002
 myself-0.002
 taking-0.002
Token whist
Token ID32262
Feature activation-2.02e-02

Pos logit contributions
 cheek+0.021
 liking+0.021
ll+0.021
pack+0.020
 sweet+0.019
Neg logit contributions
 hawk-0.045
 prope-0.042
aw-0.037
isel-0.037
 mast-0.037
Tokenling
Token ID1359
Feature activation-2.02e-02

Pos logit contributions
 helping+0.042
 hugging+0.041
 welcome+0.041
 liking+0.040
 cleaning+0.039
Neg logit contributions
 hawk-0.025
 prope-0.023
aw-0.022
 valleys-0.021
 trigger-0.021
Token to
Token ID284
Feature activation-1.42e-02

Pos logit contributions
 hugging+0.024
 liking+0.024
 welcome+0.023
 helping+0.023
 eager+0.022
Neg logit contributions
 hawk-0.041
 prope-0.040
 valleys-0.037
aw-0.037
 winds-0.037
Token make
Token ID787
Feature activation-1.24e-02

Pos logit contributions
 helping+0.016
 hugging+0.015
 eager+0.015
 welcome+0.015
 obedient+0.015
Neg logit contributions
 hawk-0.031
 prope-0.030
growth-0.029
 valleys-0.027
 mast-0.027
Token us
Token ID514
Feature activation-5.45e-02

Pos logit contributions
 welcome+0.013
 hugging+0.013
 helping+0.013
 eager+0.013
 liking+0.013
Neg logit contributions
 hawk-0.027
 prope-0.026
aw-0.025
 happens-0.025
 valleys-0.024
Token both
Token ID1111
Feature activation-6.60e-02

Pos logit contributions
 welcome+0.060
 helping+0.060
 liking+0.060
 hugging+0.058
 obedient+0.056
Neg logit contributions
 prope-0.114
 hawk-0.114
 mast-0.103
 hull-0.103
 valleys-0.101
Token happy
Token ID3772
Feature activation-1.71e-02

Pos logit contributions
 welcome+0.067
 helping+0.066
 liking+0.065
 hugging+0.064
 obedient+0.062
Neg logit contributions
 hawk-0.148
 prope-0.147
 valleys-0.135
 mast-0.132
 winds-0.131
Token."
Token ID526
Feature activation-7.43e-03

Pos logit contributions
 welcome+0.023
 eager+0.023
 obedient+0.022
 hugging+0.022
 polite+0.022
Neg logit contributions
 hawk-0.030
 prope-0.030
 happens-0.030
 winds-0.029
 islands-0.028
Token
Token ID220
Feature activation-1.69e-02

Pos logit contributions
 helping+0.009
 welcome+0.009
 polite+0.009
 liking+0.009
 obedient+0.009
Neg logit contributions
 hawk-0.015
 prope-0.014
growth-0.014
 valleys-0.014
 mast-0.013
Token
Token ID198
Feature activation-7.39e-03

Pos logit contributions
 helping+0.021
 liking+0.021
 polite+0.021
 obedient+0.021
 welcome+0.021
Neg logit contributions
 hawk-0.034
 prope-0.031
growth-0.031
 valleys-0.031
aw-0.030
Token
Token ID198
Feature activation-4.15e-04

Pos logit contributions
 helping+0.009
 welcome+0.009
 hugging+0.009
 liking+0.009
 eager+0.009
Neg logit contributions
 hawk-0.015
 prope-0.014
growth-0.014
 valleys-0.014
aw-0.013
Token Lily
Token ID20037
Feature activation+1.86e-02

Pos logit contributions
 liking+0.011
 welcome+0.010
 obedient+0.010
 hugging+0.010
 helping+0.010
Neg logit contributions
 hawk-0.018
 prope-0.016
 mast-0.015
 valleys-0.015
 islands-0.014
Token accidentally
Token ID14716
Feature activation+1.06e-02

Pos logit contributions
 hawk+0.038
 prope+0.035
 valleys+0.034
aw+0.033
 happens+0.033
Neg logit contributions
 welcome-0.024
 helping-0.023
 liking-0.023
 polite-0.022
 hugging-0.022
Token stub
Token ID17071
Feature activation-2.24e-03

Pos logit contributions
 hawk+0.018
 prope+0.016
growth+0.016
 happens+0.015
 trigger+0.015
Neg logit contributions
 welcome-0.018
 polite-0.017
 eager-0.017
 obedient-0.017
 helping-0.017
Tokenbed
Token ID3077
Feature activation-4.50e-02

Pos logit contributions
 helping+0.003
 welcome+0.003
 hugging+0.003
 sweet+0.003
 liking+0.003
Neg logit contributions
 hawk-0.004
 prope-0.004
 valleys-0.004
growth-0.004
 mast-0.004
Token her
Token ID607
Feature activation-3.06e-02

Pos logit contributions
 welcome+0.042
 helping+0.040
 polite+0.039
 eager+0.039
 liking+0.038
Neg logit contributions
 hawk-0.109
 prope-0.103
growth-0.100
 trigger-0.099
 valleys-0.098
Token toe
Token ID21189
Feature activation-6.57e-02

Pos logit contributions
 welcome+0.047
 polite+0.045
 helping+0.044
 liking+0.044
 eager+0.043
Neg logit contributions
 hawk-0.053
 prope-0.052
 trigger-0.048
growth-0.048
 dot-0.047
Token on
Token ID319
Feature activation-2.62e-02

Pos logit contributions
 welcome+0.094
 helping+0.087
 hugging+0.085
 liking+0.085
 polite+0.084
Neg logit contributions
 hawk-0.126
 prope-0.118
 trigger-0.115
growth-0.109
aw-0.107
Token one
Token ID530
Feature activation-3.92e-02

Pos logit contributions
 helping+0.027
 welcome+0.027
 eager+0.027
 polite+0.027
 obedient+0.026
Neg logit contributions
 hawk-0.060
 prope-0.056
growth-0.055
 trigger-0.053
 valleys-0.052
Token of
Token ID286
Feature activation-2.78e-02

Pos logit contributions
 helping+0.054
 hugging+0.053
 polite+0.053
 obedient+0.052
 liking+0.052
Neg logit contributions
 hawk-0.074
 prope-0.069
growth-0.068
 trigger-0.066
 altitude-0.065
Token the
Token ID262
Feature activation-9.60e-03

Pos logit contributions
 eager+0.032
 obedient+0.031
ll+0.031
 polite+0.030
 liking+0.029
Neg logit contributions
 hawk-0.059
 prope-0.055
growth-0.055
 trigger-0.051
 altitude-0.051
Token toys
Token ID14958
Feature activation-4.22e-02

Pos logit contributions
 welcome+0.015
 helping+0.015
 eager+0.014
 polite+0.014
 hugging+0.014
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.015
 trigger-0.015
 valleys-0.014
Token,
Token ID11
Feature activation-8.13e-04

Pos logit contributions
 welcome+0.027
 hugging+0.027
 liking+0.026
 helping+0.026
 eager+0.026
Neg logit contributions
 hawk-0.043
 prope-0.041
 valleys-0.039
growth-0.039
 islands-0.038
Token "
Token ID366
Feature activation-1.00e-02

Pos logit contributions
 welcome+0.001
 supposed+0.001
 liking+0.001
 obedient+0.001
 myself+0.001
Neg logit contributions
 hawk-0.001
 prope-0.001
 mast-0.001
 valleys-0.001
 hull-0.001
TokenCan
Token ID6090
Feature activation-1.08e-02

Pos logit contributions
 welcome+0.015
 helping+0.014
 hugging+0.014
 polite+0.014
 sweet+0.014
Neg logit contributions
 hawk-0.019
 prope-0.018
growth-0.017
 valleys-0.017
 trigger-0.016
Token you
Token ID345
Feature activation-3.20e-02

Pos logit contributions
 welcome+0.009
 helping+0.008
 eager+0.008
 hugging+0.008
 liking+0.008
Neg logit contributions
 hawk-0.027
 prope-0.026
growth-0.024
 mast-0.024
 valleys-0.024
Token help
Token ID1037
Feature activation-1.87e-02

Pos logit contributions
 hugging+0.036
 eager+0.036
 liking+0.035
 obedient+0.034
 helping+0.034
Neg logit contributions
 hawk-0.064
 prope-0.064
 hull-0.058
 mast-0.058
growth-0.057
Token me
Token ID502
Feature activation-6.54e-02

Pos logit contributions
 eager+0.021
 glad+0.019
 hugging+0.019
 ready+0.018
 welcome+0.018
Neg logit contributions
 hawk-0.040
 prope-0.036
 islands-0.036
growth-0.036
 targets-0.035
Token shine
Token ID18340
Feature activation-3.20e-02

Pos logit contributions
 eager+0.086
 liking+0.083
 obedient+0.080
 hugging+0.079
 polite+0.079
Neg logit contributions
 hawk-0.111
 prope-0.107
 hull-0.102
 mast-0.100
 islands-0.098
Token too
Token ID1165
Feature activation-2.92e-02

Pos logit contributions
 liking+0.041
 hugging+0.041
 eager+0.039
 welcome+0.039
 helping+0.039
Neg logit contributions
 hawk-0.062
 prope-0.061
 valleys-0.057
 islands-0.056
 altitude-0.055
Token?"
Token ID1701
Feature activation-1.48e-02

Pos logit contributions
 liking+0.047
 hugging+0.047
 handing+0.046
 laughing+0.046
 helping+0.044
Neg logit contributions
 hawk-0.048
 prope-0.042
 altitude-0.041
 happens-0.040
 winds-0.039
Token The
Token ID383
Feature activation-3.10e-02

Pos logit contributions
 welcome+0.020
 helping+0.020
 hugging+0.019
 polite+0.019
 sweet+0.019
Neg logit contributions
 hawk-0.029
 prope-0.027
growth-0.027
 valleys-0.026
 trigger-0.025
Token small
Token ID1402
Feature activation-1.29e-02

Pos logit contributions
 welcome+0.035
 hugging+0.034
 helping+0.034
 liking+0.034
 ready+0.033
Neg logit contributions
 hawk-0.068
 prope-0.063
growth-0.060
 trigger-0.059
 valleys-0.059
Token a
Token ID257
Feature activation-3.28e-02

Pos logit contributions
 liking+0.018
 hugging+0.018
 welcome+0.017
 obedient+0.017
 helping+0.017
Neg logit contributions
 hawk-0.023
 prope-0.022
 valleys-0.022
growth-0.022
 winds-0.022
Token special
Token ID2041
Feature activation-2.77e-02

Pos logit contributions
 welcome+0.037
 helping+0.036
 hugging+0.036
 eager+0.034
 obedient+0.034
Neg logit contributions
 hawk-0.073
 prope-0.069
 mast-0.065
 valleys-0.064
 trigger-0.064
Token noise
Token ID7838
Feature activation-2.35e-02

Pos logit contributions
 eager+0.039
 welcome+0.039
 obedient+0.039
 hugging+0.038
 polite+0.037
Neg logit contributions
 hawk-0.050
 prope-0.046
 mast-0.045
 dot-0.043
 obstacle-0.043
Token that
Token ID326
Feature activation-1.56e-02

Pos logit contributions
 welcome+0.032
 polite+0.030
 obedient+0.030
 eager+0.029
 hugging+0.029
Neg logit contributions
 hawk-0.045
 prope-0.043
 happens-0.041
 valleys-0.040
growth-0.040
Token made
Token ID925
Feature activation-8.52e-03

Pos logit contributions
 liking+0.026
 hugging+0.026
 welcome+0.025
 helping+0.024
 eager+0.024
Neg logit contributions
 hawk-0.025
 happens-0.024
 prope-0.024
 valleys-0.023
 mast-0.022
Token everyone
Token ID2506
Feature activation-6.53e-02

Pos logit contributions
 liking+0.014
 hugging+0.014
 eager+0.013
 welcome+0.013
 obedient+0.013
Neg logit contributions
 hawk-0.014
 prope-0.013
 winds-0.013
 valleys-0.013
growth-0.013
Token around
Token ID1088
Feature activation-1.16e-02

Pos logit contributions
 welcome+0.064
 helping+0.060
 hugging+0.059
 liking+0.058
 obedient+0.057
Neg logit contributions
 hawk-0.145
 prope-0.142
 valleys-0.135
 happens-0.132
 mast-0.130
Token very
Token ID845
Feature activation-2.52e-02

Pos logit contributions
 welcome+0.018
 obedient+0.017
 polite+0.017
 sweet+0.017
 hugging+0.016
Neg logit contributions
 hawk-0.020
 prope-0.019
 valleys-0.019
 islands-0.018
growth-0.018
Token happy
Token ID3772
Feature activation-6.03e-03

Pos logit contributions
 welcome+0.012
 helping+0.011
 polite+0.010
 hugging+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.073
 prope-0.070
growth-0.068
 valleys-0.068
 trigger-0.066
Token.
Token ID13
Feature activation-1.89e-02

Pos logit contributions
 welcome+0.009
 hugging+0.009
 obedient+0.009
 polite+0.009
 helping+0.009
Neg logit contributions
 hawk-0.010
 prope-0.010
 valleys-0.010
 islands-0.010
 winds-0.009
Token
Token ID198
Feature activation-7.84e-03

Pos logit contributions
 welcome+0.019
 helping+0.018
 polite+0.017
 eager+0.017
 hugging+0.017
Neg logit contributions
 hawk-0.045
 prope-0.042
growth-0.041
 valleys-0.040
 trigger-0.040
Token said
Token ID531
Feature activation-8.50e-03

Pos logit contributions
 welcome+0.020
 helping+0.019
 polite+0.019
 hugging+0.019
 sweet+0.019
Neg logit contributions
 hawk-0.032
 prope-0.030
growth-0.029
 valleys-0.029
 trigger-0.028
Token,
Token ID11
Feature activation-1.28e-02

Pos logit contributions
 welcome+0.010
 helping+0.009
 hugging+0.009
 polite+0.009
 eager+0.009
Neg logit contributions
 hawk-0.019
 prope-0.018
growth-0.017
 valleys-0.017
 trigger-0.017
Token "
Token ID366
Feature activation-9.79e-03

Pos logit contributions
 welcome+0.018
 obedient+0.016
 polite+0.016
 clean+0.015
 supposed+0.015
Neg logit contributions
 hawk-0.024
 prope-0.024
 mast-0.022
 valleys-0.022
 winds-0.021
TokenPlease
Token ID5492
Feature activation-1.63e-02

Pos logit contributions
 welcome+0.013
 helping+0.013
 polite+0.013
 obedient+0.013
 sweet+0.013
Neg logit contributions
 hawk-0.020
 prope-0.019
growth-0.018
 valleys-0.018
 mast-0.017
Token help
Token ID1037
Feature activation-2.61e-02

Pos logit contributions
 eager+0.023
 laughing+0.022
 hugging+0.021
 liking+0.021
 enthusiastic+0.021
Neg logit contributions
 hawk-0.030
 trigger-0.026
 prope-0.026
 happens-0.025
 mast-0.025
Token me
Token ID502
Feature activation-6.51e-02

Pos logit contributions
 hugging+0.020
 eager+0.020
 helping+0.019
 welcome+0.019
 liking+0.019
Neg logit contributions
 hawk-0.066
 prope-0.060
 trigger-0.059
growth-0.059
aw-0.058
Token get
Token ID651
Feature activation-2.38e-02

Pos logit contributions
 freshly+0.082
 eager+0.082
 hugging+0.081
 liking+0.080
 laughing+0.079
Neg logit contributions
 hawk-0.120
 hull-0.105
 mast-0.105
 prope-0.104
 trigger-0.103
Token my
Token ID616
Feature activation-6.37e-03

Pos logit contributions
 welcome+0.030
 liking+0.029
 hugging+0.029
 helping+0.029
 eager+0.028
Neg logit contributions
 hawk-0.047
 prope-0.044
 valleys-0.042
 trigger-0.042
 happens-0.042
Token ball
Token ID2613
Feature activation-2.21e-02

Pos logit contributions
 helping+0.010
 welcome+0.010
 hugging+0.010
 liking+0.009
 polite+0.009
Neg logit contributions
 hawk-0.011
 prope-0.011
growth-0.010
 trigger-0.010
 valleys-0.010
Token."
Token ID526
Feature activation-1.43e-02

Pos logit contributions
 hugging+0.032
 liking+0.032
 helping+0.031
 loving+0.031
 talking+0.031
Neg logit contributions
 hawk-0.038
 prope-0.035
 trigger-0.034
 mast-0.034
 altitude-0.033
Token The
Token ID383
Feature activation-1.25e-02

Pos logit contributions
 welcome+0.024
 liking+0.024
 hugging+0.024
 helping+0.023
 polite+0.023
Neg logit contributions
 hawk-0.021
growth-0.020
 prope-0.019
 valleys-0.019
 mast-0.019
Token hot
Token ID3024
Feature activation-4.07e-03

Pos logit contributions
 hugging+0.022
 helping+0.020
 handing+0.020
pack+0.020
isk+0.019
Neg logit contributions
 hawk-0.035
 prope-0.033
 happens-0.033
 mast-0.032
aw-0.032
Token.
Token ID13
Feature activation-1.51e-02

Pos logit contributions
 welcome+0.005
 polite+0.005
 liking+0.005
 hugging+0.005
 thoughtful+0.004
Neg logit contributions
 hawk-0.009
 valleys-0.008
 winds-0.008
 prope-0.008
aw-0.008
Token It
Token ID632
Feature activation-1.52e-02

Pos logit contributions
 helping+0.022
 liking+0.022
 polite+0.022
 welcome+0.021
 sweet+0.021
Neg logit contributions
 hawk-0.026
growth-0.024
 valleys-0.024
 prope-0.024
aw-0.023
Token was
Token ID373
Feature activation-3.78e-03

Pos logit contributions
 welcome+0.024
 liking+0.022
 helping+0.022
 hugging+0.022
 polite+0.021
Neg logit contributions
 hawk-0.028
 happens-0.025
 prope-0.025
 valleys-0.024
growth-0.024
Token making
Token ID1642
Feature activation-1.14e-02

Pos logit contributions
ll+0.005
pack+0.004
 liking+0.004
 cheek+0.004
 welcome+0.004
Neg logit contributions
 hawk-0.008
 prope-0.007
 winds-0.006
 valleys-0.006
 altitude-0.006
Token everyone
Token ID2506
Feature activation-6.50e-02

Pos logit contributions
 liking+0.018
 welcome+0.017
 hugging+0.017
 eager+0.017
 handing+0.017
Neg logit contributions
 hawk-0.019
 winds-0.018
 prope-0.018
 valleys-0.018
aw-0.017
Token very
Token ID845
Feature activation-2.76e-02

Pos logit contributions
 welcome+0.068
 liking+0.064
 helping+0.062
 hugging+0.061
 first+0.061
Neg logit contributions
 hawk-0.132
 prope-0.128
 valleys-0.124
 happens-0.121
 obstacle-0.120
Token sleepy
Token ID42023
Feature activation-2.05e-02

Pos logit contributions
 welcome+0.027
 helping+0.026
 hugging+0.026
 liking+0.025
 handing+0.024
Neg logit contributions
 hawk-0.062
 prope-0.060
 valleys-0.058
aw-0.058
 winds-0.058
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
 welcome+0.030
 hugging+0.030
 liking+0.030
 polite+0.029
 thoughtful+0.029
Neg logit contributions
 hawk-0.034
 valleys-0.032
 prope-0.032
 winds-0.031
 islands-0.031
Token The
Token ID383
Feature activation-5.76e-03

Pos logit contributions
 helping+0.017
 liking+0.017
 welcome+0.017
 polite+0.017
 hugging+0.017
Neg logit contributions
 hawk-0.024
 prope-0.022
growth-0.022
 valleys-0.022
aw-0.021
Token roof
Token ID9753
Feature activation-2.58e-02

Pos logit contributions
 helping+0.009
 welcome+0.009
 liking+0.009
 hugging+0.009
 eager+0.008
Neg logit contributions
 hawk-0.011
 prope-0.010
 trigger-0.009
 winds-0.009
 valleys-0.009
Token of
Token ID286
Feature activation-3.75e-02

Pos logit contributions
 welcome+0.036
 helping+0.034
 polite+0.033
 sweet+0.033
 hugging+0.032
Neg logit contributions
 hawk-0.081
 prope-0.077
growth-0.075
 valleys-0.074
 trigger-0.072
Token noise
Token ID7838
Feature activation-1.64e-02

Pos logit contributions
 welcome+0.049
 helping+0.048
 liking+0.047
 hugging+0.047
 eager+0.046
Neg logit contributions
 hawk-0.076
 prope-0.073
 valleys-0.067
 winds-0.066
growth-0.066
Token.
Token ID13
Feature activation-5.08e-03

Pos logit contributions
 welcome+0.021
 eager+0.019
 polite+0.019
 obedient+0.019
 hugging+0.019
Neg logit contributions
 hawk-0.034
 prope-0.032
 valleys-0.031
aw-0.030
growth-0.030
Token It
Token ID632
Feature activation-1.48e-02

Pos logit contributions
 helping+0.008
 welcome+0.008
 obedient+0.008
 polite+0.008
 liking+0.008
Neg logit contributions
 hawk-0.009
 prope-0.008
growth-0.008
 valleys-0.008
 mast-0.007
Token made
Token ID925
Feature activation-6.14e-03

Pos logit contributions
 welcome+0.017
 helping+0.017
 polite+0.016
 sweet+0.016
 hugging+0.016
Neg logit contributions
 hawk-0.033
 prope-0.031
growth-0.030
 valleys-0.030
 trigger-0.029
Token everyone
Token ID2506
Feature activation-6.47e-02

Pos logit contributions
 welcome+0.007
 helping+0.007
 liking+0.007
 hugging+0.007
 eager+0.007
Neg logit contributions
 hawk-0.013
 prope-0.013
growth-0.012
 valleys-0.012
 trigger-0.012
Token look
Token ID804
Feature activation-1.14e-03

Pos logit contributions
 welcome+0.068
 helping+0.064
 liking+0.063
 polite+0.060
 obedient+0.059
Neg logit contributions
 hawk-0.138
 prope-0.135
 valleys-0.125
 dot-0.123
aw-0.123
Token.
Token ID13
Feature activation+8.89e-04

Pos logit contributions
 welcome+0.001
 helping+0.001
 hugging+0.001
 liking+0.001
ll+0.001
Neg logit contributions
 prope-0.002
 hawk-0.002
aw-0.002
 trigger-0.002
 obstacle-0.002
Token
Token ID198
Feature activation+1.32e-04

Pos logit contributions
 hawk+0.002
 prope+0.002
growth+0.002
 valleys+0.002
 mast+0.001
Neg logit contributions
 helping-0.001
 welcome-0.001
 obedient-0.001
 polite-0.001
 liking-0.001
Token
Token ID198
Feature activation+6.25e-03

Pos logit contributions
 hawk+0.000
 prope+0.000
growth+0.000
 valleys+0.000
 trigger+0.000
Neg logit contributions
 welcome-0.000
 helping-0.000
 polite-0.000
 hugging-0.000
 sweet-0.000
TokenJ
Token ID41
Feature activation+1.41e-02

Pos logit contributions
 hawk+0.013
 prope+0.012
growth+0.012
 valleys+0.011
 trigger+0.011
Neg logit contributions
 welcome-0.008
 helping-0.008
 polite-0.008
 obedient-0.008
 sweet-0.008
Token "
Token ID366
Feature activation+1.16e-03

Pos logit contributions
 welcome+0.003
 myself+0.003
pack+0.003
 obedient+0.003
 polite+0.003
Neg logit contributions
 hawk-0.003
 prope-0.003
 winds-0.002
 happens-0.002
der-0.002
TokenOh
Token ID5812
Feature activation-1.03e-02

Pos logit contributions
 hawk+0.002
 prope+0.002
growth+0.002
 valleys+0.002
 mast+0.002
Neg logit contributions
 welcome-0.002
 helping-0.002
 polite-0.001
 sweet-0.001
 obedient-0.001
Token no
Token ID645
Feature activation+1.44e-03

Pos logit contributions
 welcome+0.013
 helping+0.013
 hugging+0.013
 polite+0.012
 eager+0.012
Neg logit contributions
 hawk-0.021
 prope-0.020
growth-0.019
 valleys-0.019
 trigger-0.019
Token!
Token ID0
Feature activation-2.12e-02

Pos logit contributions
 hawk+0.003
 prope+0.002
 winds+0.002
 happens+0.002
 obstacle+0.002
Neg logit contributions
 welcome-0.002
 eager-0.002
 helping-0.002
 hugging-0.002
 polite-0.002
Token You
Token ID921
Feature activation-7.47e-03

Pos logit contributions
 welcome+0.028
 helping+0.027
 polite+0.027
 sweet+0.026
 eager+0.026
Neg logit contributions
 hawk-0.042
 prope-0.040
growth-0.039
 trigger-0.037
 happens-0.037
Token both
Token ID1111
Feature activation-6.45e-02

Pos logit contributions
 hugging+0.012
 welcome+0.012
 liking+0.011
 helping+0.011
 polite+0.011
Neg logit contributions
 hawk-0.013
 prope-0.012
 happens-0.011
 trigger-0.011
 winds-0.011
Token played
Token ID2826
Feature activation-3.60e-02

Pos logit contributions
 hugging+0.097
 welcome+0.091
 helping+0.091
 liking+0.090
 handing+0.090
Neg logit contributions
 happens-0.099
 winds-0.095
 hawk-0.091
 prope-0.091
 obstacle-0.088
Token with
Token ID351
Feature activation-9.74e-03

Pos logit contributions
 hugging+0.038
 eager+0.037
ll+0.035
 talking+0.035
 welcome+0.035
Neg logit contributions
 hawk-0.071
 prope-0.070
aw-0.069
 valleys-0.067
 happens-0.067
Token the
Token ID262
Feature activation+2.37e-03

Pos logit contributions
 welcome+0.014
 eager+0.014
 hugging+0.013
 liking+0.013
 helping+0.013
Neg logit contributions
 hawk-0.018
 prope-0.017
aw-0.016
 winds-0.016
growth-0.016
Token junk
Token ID18556
Feature activation+3.65e-03

Pos logit contributions
 hawk+0.003
 trigger+0.003
 valleys+0.003
 prope+0.003
 mast+0.003
Neg logit contributions
 hugging-0.004
 helping-0.004
 liking-0.004
 eager-0.004
 welcome-0.004
Token and
Token ID290
Feature activation-2.82e-03

Pos logit contributions
 hawk+0.007
 happens+0.006
aw+0.006
 prope+0.006
 valleys+0.006
Neg logit contributions
 hugging-0.005
 liking-0.005
 welcome-0.005
 eager-0.005
 helping-0.005
Token either
Token ID2035
Feature activation-4.30e-03

Pos logit contributions
 welcome+0.044
 hugging+0.042
 liking+0.042
 helping+0.041
 polite+0.041
Neg logit contributions
 hawk-0.048
 prope-0.045
aw-0.043
 happens-0.041
 islands-0.041
Token.
Token ID13
Feature activation-1.47e-02

Pos logit contributions
 polite+0.007
 obedient+0.007
 hugging+0.006
 eager+0.006
 thoughtful+0.006
Neg logit contributions
 hawk-0.007
 happens-0.007
 prope-0.006
crow-0.006
aw-0.006
Token This
Token ID770
Feature activation-5.97e-03

Pos logit contributions
 welcome+0.021
 helping+0.021
 polite+0.020
 hugging+0.020
 liking+0.020
Neg logit contributions
 hawk-0.028
 prope-0.026
growth-0.025
 valleys-0.025
 mast-0.024
Token made
Token ID925
Feature activation-7.56e-03

Pos logit contributions
 welcome+0.008
 helping+0.008
 polite+0.008
 sweet+0.008
 hugging+0.008
Neg logit contributions
 hawk-0.012
 prope-0.011
growth-0.011
 valleys-0.011
 trigger-0.010
Token his
Token ID465
Feature activation-2.66e-02

Pos logit contributions
 liking+0.012
 hugging+0.012
 welcome+0.012
 eager+0.012
 helping+0.012
Neg logit contributions
 hawk-0.012
 prope-0.011
 winds-0.011
 valleys-0.011
 targets-0.011
Token friends
Token ID2460
Feature activation-6.44e-02

Pos logit contributions
 welcome+0.030
 helping+0.030
 liking+0.029
 hugging+0.029
 polite+0.028
Neg logit contributions
 hawk-0.059
 prope-0.056
growth-0.053
 trigger-0.052
 valleys-0.052
Token sad
Token ID6507
Feature activation-1.79e-02

Pos logit contributions
 welcome+0.069
 liking+0.068
 hugging+0.064
 myself+0.063
 my+0.061
Neg logit contributions
 prope-0.124
 hawk-0.120
 winds-0.112
 hull-0.111
 mast-0.111
Token and
Token ID290
Feature activation-1.05e-02

Pos logit contributions
 welcome+0.023
 eager+0.021
 polite+0.021
 hugging+0.021
 helping+0.021
Neg logit contributions
 hawk-0.035
 prope-0.034
 winds-0.033
 happens-0.032
 mast-0.032
Token they
Token ID484
Feature activation-1.44e-02

Pos logit contributions
 welcome+0.015
 liking+0.014
 hugging+0.014
 helping+0.014
 sweet+0.014
Neg logit contributions
 hawk-0.019
 prope-0.018
 happens-0.017
 valleys-0.017
 winds-0.017
Token started
Token ID2067
Feature activation-1.39e-02

Pos logit contributions
 welcome+0.013
 helping+0.013
 eager+0.013
 polite+0.012
 clean+0.012
Neg logit contributions
 hawk-0.036
growth-0.034
 prope-0.033
 valleys-0.033
 trigger-0.032
Token to
Token ID284
Feature activation-7.55e-04

Pos logit contributions
 welcome+0.015
 sweet+0.012
 polite+0.012
 obedient+0.012
 myself+0.011
Neg logit contributions
 hawk-0.030
 prope-0.030
 valleys-0.029
growth-0.028
 mast-0.028
Token Billy
Token ID15890
Feature activation-4.33e-03

Pos logit contributions
 liking+0.007
 welcome+0.007
 eager+0.006
 hugging+0.006
 polite+0.006
Neg logit contributions
 hawk-0.009
 prope-0.008
growth-0.007
 winds-0.007
 valleys-0.007
Token,
Token ID11
Feature activation+2.67e-03

Pos logit contributions
 welcome+0.005
 liking+0.005
 hugging+0.005
 polite+0.005
 obedient+0.005
Neg logit contributions
 hawk-0.008
 prope-0.007
growth-0.007
 trigger-0.007
 valleys-0.007
Token "
Token ID366
Feature activation+1.93e-03

Pos logit contributions
 hawk+0.005
 prope+0.004
smith+0.004
 valleys+0.004
 happens+0.004
Neg logit contributions
 welcome-0.004
 myself-0.003
 tidy-0.003
 liking-0.003
 obedient-0.003
TokenPlease
Token ID5492
Feature activation-1.20e-02

Pos logit contributions
 hawk+0.004
 prope+0.003
growth+0.003
 valleys+0.003
 trigger+0.003
Neg logit contributions
 welcome-0.003
 helping-0.003
 hugging-0.003
 polite-0.003
 sweet-0.003
Token help
Token ID1037
Feature activation-1.60e-02

Pos logit contributions
 eager+0.017
 liking+0.016
 laughing+0.016
 hugging+0.016
 enthusiastic+0.016
Neg logit contributions
 hawk-0.022
 trigger-0.019
 prope-0.019
 happens-0.018
ger-0.018
Token me
Token ID502
Feature activation-6.32e-02

Pos logit contributions
 eager+0.016
 hugging+0.015
 ready+0.015
 liking+0.015
 glad+0.015
Neg logit contributions
 hawk-0.037
 trigger-0.033
growth-0.032
 winds-0.032
 prope-0.032
Token get
Token ID651
Feature activation-2.32e-02

Pos logit contributions
 eager+0.079
 liking+0.077
 hugging+0.075
 polite+0.075
 loving+0.075
Neg logit contributions
 hawk-0.120
 prope-0.104
 trigger-0.103
 bolt-0.103
 obstacle-0.102
Token my
Token ID616
Feature activation-1.41e-02

Pos logit contributions
 welcome+0.026
 liking+0.026
 eager+0.025
 hugging+0.025
 polite+0.025
Neg logit contributions
 hawk-0.050
 prope-0.046
 trigger-0.044
 winds-0.044
 happens-0.044
Token toy
Token ID13373
Feature activation-2.12e-02

Pos logit contributions
 welcome+0.022
 helping+0.022
 liking+0.022
 hugging+0.022
 polite+0.021
Neg logit contributions
 hawk-0.025
 prope-0.023
 trigger-0.022
growth-0.021
 mast-0.021
Token."
Token ID526
Feature activation+5.34e-04

Pos logit contributions
 welcome+0.029
 liking+0.029
 polite+0.028
 helping+0.028
 eager+0.028
Neg logit contributions
 hawk-0.042
 prope-0.039
 trigger-0.037
 mast-0.036
growth-0.036
Token Billy
Token ID15890
Feature activation-6.17e-03

Pos logit contributions
 hawk+0.001
growth+0.001
 prope+0.001
 winds+0.001
 valleys+0.001
Neg logit contributions
 welcome-0.001
 polite-0.001
 liking-0.001
 obedient-0.001
 thoughtful-0.001
Token "
Token ID366
Feature activation+1.34e-03

Pos logit contributions
 welcome+0.002
 liking+0.002
 obedient+0.002
ll+0.002
 polite+0.002
Neg logit contributions
 hawk-0.005
 prope-0.004
 valleys-0.004
 mast-0.004
 winds-0.004
TokenWhy
Token ID5195
Feature activation+1.07e-02

Pos logit contributions
 hawk+0.002
 prope+0.002
growth+0.002
 valleys+0.002
 trigger+0.002
Neg logit contributions
 welcome-0.002
 helping-0.002
 polite-0.002
 hugging-0.002
 sweet-0.002
Token did
Token ID750
Feature activation+6.21e-03

Pos logit contributions
 hawk+0.020
 prope+0.018
 happens+0.018
aw+0.017
growth+0.016
Neg logit contributions
 helping-0.016
 hugging-0.016
 eager-0.016
 liking-0.015
 obedient-0.015
Token you
Token ID345
Feature activation-3.57e-02

Pos logit contributions
 hawk+0.015
 prope+0.015
growth+0.014
 valleys+0.014
 trigger+0.014
Neg logit contributions
 welcome-0.006
 helping-0.005
 polite-0.005
 sweet-0.005
 hugging-0.005
Token thank
Token ID5875
Feature activation-1.82e-02

Pos logit contributions
 hugging+0.050
 helping+0.049
 eager+0.048
 obedient+0.048
 cleaning+0.047
Neg logit contributions
 hawk-0.072
 prope-0.063
growth-0.061
 hull-0.058
 mast-0.057
Token me
Token ID502
Feature activation-6.29e-02

Pos logit contributions
 welcome+0.015
 eager+0.014
 hugging+0.014
 helping+0.014
 obedient+0.014
Neg logit contributions
 hawk-0.045
 prope-0.042
 valleys-0.040
growth-0.040
 winds-0.040
Token,
Token ID11
Feature activation-4.36e-02

Pos logit contributions
 eager+0.091
 freshly+0.090
 liking+0.090
 hugging+0.089
 polite+0.086
Neg logit contributions
 hawk-0.108
 prope-0.091
aw-0.091
smith-0.090
 bolt-0.088
Token little
Token ID1310
Feature activation-7.42e-03

Pos logit contributions
 helping+0.057
 liking+0.056
 hugging+0.055
 welcome+0.055
 eager+0.054
Neg logit contributions
 hawk-0.091
 prope-0.083
 dot-0.073
 bolt-0.073
 obstacle-0.073
Token bird
Token ID6512
Feature activation-1.27e-02

Pos logit contributions
 welcome+0.013
 sweet+0.013
 helping+0.013
 polite+0.013
 obedient+0.013
Neg logit contributions
 hawk-0.011
growth-0.011
 prope-0.011
 valleys-0.010
 trigger-0.010
Token?"
Token ID1701
Feature activation-1.17e-02

Pos logit contributions
 welcome+0.016
 hugging+0.016
 helping+0.016
 eager+0.016
 sweet+0.016
Neg logit contributions
 hawk-0.026
 prope-0.023
 happens-0.022
aw-0.022
 trigger-0.021
Token The
Token ID383
Feature activation-1.60e-02

Pos logit contributions
 welcome+0.016
 helping+0.016
 polite+0.015
 hugging+0.015
 sweet+0.015
Neg logit contributions
 hawk-0.024
 prope-0.022
growth-0.021
 valleys-0.021
 trigger-0.020
Token�
Token ID95
Feature activation+7.07e-03

Pos logit contributions
 welcome+0.005
 helping+0.004
 polite+0.004
 obedient+0.004
 liking+0.004
Neg logit contributions
 hawk-0.005
 prope-0.005
 valleys-0.004
growth-0.004
 trigger-0.004
Token€
Token ID26391
Feature activation+1.67e-02

Pos logit contributions
 hawk+0.013
 prope+0.012
growth+0.012
 valleys+0.012
 trigger+0.011
Neg logit contributions
 welcome-0.011
 helping-0.010
 polite-0.010
 obedient-0.010
 eager-0.010
Token�
Token ID129
Feature activation+2.22e-02

Pos logit contributions
 hawk+0.034
 prope+0.032
growth+0.031
 trigger+0.030
 valleys+0.030
Neg logit contributions
 welcome-0.022
 helping-0.022
 polite-0.021
 sweet-0.021
 eager-0.021
Token�
Token ID241
Feature activation-1.66e-02

Pos logit contributions
 hawk+0.038
 prope+0.035
growth+0.034
 valleys+0.034
 trigger+0.032
Neg logit contributions
 welcome-0.037
 helping-0.036
 polite-0.035
 hugging-0.035
 eager-0.035
TokenHello
Token ID15496
Feature activation-8.70e-03

Pos logit contributions
 welcome+0.022
 helping+0.021
 polite+0.021
 eager+0.021
 sweet+0.021
Neg logit contributions
 hawk-0.034
 prope-0.032
growth-0.031
 valleys-0.030
 trigger-0.030
Token all
Token ID477
Feature activation-6.27e-02

Pos logit contributions
 welcome+0.010
 helping+0.010
 hugging+0.009
 eager+0.009
 liking+0.009
Neg logit contributions
 hawk-0.019
 prope-0.018
 valleys-0.017
growth-0.017
 trigger-0.017
Tokenigator
Token ID23823
Feature activation-1.86e-02

Pos logit contributions
 welcome+0.103
 helping+0.102
 hugging+0.100
 polite+0.099
 liking+0.099
Neg logit contributions
 hawk-0.107
 prope-0.098
 valleys-0.094
growth-0.094
 trigger-0.090
Token!
Token ID0
Feature activation-8.00e-04

Pos logit contributions
 welcome+0.021
 helping+0.020
 hugging+0.019
 polite+0.019
 eager+0.019
Neg logit contributions
 hawk-0.042
 prope-0.039
growth-0.038
 valleys-0.038
 trigger-0.037
Tokenâ
Token ID22940
Feature activation+8.30e-03

Pos logit contributions
 welcome+0.001
 helping+0.001
 polite+0.001
 hugging+0.001
 liking+0.001
Neg logit contributions
 hawk-0.002
 prope-0.002
growth-0.001
 valleys-0.001
 trigger-0.001
Token€
Token ID26391
Feature activation+1.86e-02

Pos logit contributions
 hawk+0.014
 prope+0.013
growth+0.013
 valleys+0.013
 trigger+0.012
Neg logit contributions
 welcome-0.014
 helping-0.013
 polite-0.013
 hugging-0.013
 sweet-0.013
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
 hawk+0.043
 prope+0.040
 trigger+0.039
growth+0.039
 valleys+0.037
Neg logit contributions
 welcome-0.021
 helping-0.020
 eager-0.019
 sweet-0.019
 polite-0.019
Token sound
Token ID2128
Feature activation-7.74e-03

Pos logit contributions
 helping+0.022
 welcome+0.022
 hugging+0.022
 liking+0.021
 eager+0.020
Neg logit contributions
 hawk-0.041
 prope-0.038
 mast-0.037
 valleys-0.036
 obstacle-0.036
Token.
Token ID13
Feature activation-8.93e-03

Pos logit contributions
 welcome+0.010
 obedient+0.009
 helping+0.009
 polite+0.009
 eager+0.009
Neg logit contributions
 hawk-0.015
 prope-0.015
 valleys-0.014
growth-0.014
 mast-0.014
Token This
Token ID770
Feature activation-3.33e-04

Pos logit contributions
 welcome+0.011
 helping+0.011
 polite+0.010
 hugging+0.010
 eager+0.010
Neg logit contributions
 hawk-0.019
 prope-0.018
growth-0.017
 valleys-0.017
 trigger-0.017
Token sound
Token ID2128
Feature activation-1.75e-02

Pos logit contributions
 welcome+0.000
 helping+0.000
 polite+0.000
 sweet+0.000
 eager+0.000
Neg logit contributions
 hawk-0.001
 prope-0.001
growth-0.001
 valleys-0.001
 trigger-0.001
Token made
Token ID925
Feature activation-3.28e-03

Pos logit contributions
 welcome+0.024
 helping+0.023
 hugging+0.023
 liking+0.023
 eager+0.022
Neg logit contributions
 hawk-0.035
 prope-0.032
 happens-0.032
 valleys-0.031
growth-0.031
Token everyone
Token ID2506
Feature activation-6.23e-02

Pos logit contributions
 liking+0.005
 hugging+0.005
 eager+0.005
 welcome+0.005
 helping+0.005
Neg logit contributions
 hawk-0.006
growth-0.005
 valleys-0.005
 prope-0.005
 winds-0.005
Token nearby
Token ID6716
Feature activation-3.87e-03

Pos logit contributions
 welcome+0.061
 helping+0.058
 liking+0.057
 hugging+0.056
 polite+0.054
Neg logit contributions
 hawk-0.137
 prope-0.133
 valleys-0.128
 happens-0.124
 obstacle-0.122
Token very
Token ID845
Feature activation-2.24e-02

Pos logit contributions
 welcome+0.005
 hugging+0.005
 liking+0.005
 obedient+0.005
 polite+0.005
Neg logit contributions
 prope-0.007
 hawk-0.007
 valleys-0.007
 islands-0.007
aw-0.006
Token happy
Token ID3772
Feature activation-8.14e-03

Pos logit contributions
 welcome+0.016
 helping+0.016
 hugging+0.015
 liking+0.015
 cleaning+0.014
Neg logit contributions
 hawk-0.056
 prope-0.055
 valleys-0.053
 happens-0.052
aw-0.052
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
 welcome+0.012
 polite+0.012
 hugging+0.012
 obedient+0.012
 eager+0.011
Neg logit contributions
 hawk-0.013
 islands-0.013
 valleys-0.013
 prope-0.013
 winds-0.013
Token\n
Token ID198
Feature activation-5.77e-03

Pos logit contributions
 welcome+0.012
 helping+0.011
 hugging+0.011
 polite+0.011
 eager+0.011
Neg logit contributions
 hawk-0.024
 prope-0.023
growth-0.022
 valleys-0.022
 trigger-0.022
Token how
Token ID703
Feature activation-1.29e-04

Pos logit contributions
 hawk+0.002
 prope+0.002
 trigger+0.002
 obstacle+0.002
 mast+0.002
Neg logit contributions
 hugging-0.002
 eager-0.002
 liking-0.002
 obedient-0.002
 helping-0.002
Token are
Token ID389
Feature activation-2.19e-02

Pos logit contributions
 hugging+0.000
 helping+0.000
 welcome+0.000
 liking+0.000
 eager+0.000
Neg logit contributions
 hawk-0.000
 prope-0.000
 valleys-0.000
growth-0.000
 happens-0.000
Token you
Token ID345
Feature activation-1.42e-02

Pos logit contributions
 welcome+0.016
 hugging+0.016
 liking+0.015
 eager+0.015
 helping+0.015
Neg logit contributions
 hawk-0.057
 prope-0.051
 valleys-0.051
 happens-0.050
 winds-0.050
Token?"
Token ID1701
Feature activation-9.75e-03

Pos logit contributions
 Deal+0.017
 sweet+0.015
 hugging+0.015
 loving+0.015
 cheek+0.015
Neg logit contributions
 hawk-0.026
 happens-0.024
 islands-0.024
 targets-0.024
 island-0.023
Token The
Token ID383
Feature activation-2.61e-02

Pos logit contributions
 welcome+0.014
 helping+0.014
 polite+0.014
 liking+0.013
 obedient+0.013
Neg logit contributions
 hawk-0.018
growth-0.017
 prope-0.016
 valleys-0.016
aw-0.015
Token all
Token ID477
Feature activation-6.22e-02

Pos logit contributions
 helping+0.028
 hugging+0.028
 liking+0.027
 welcome+0.027
 cleaning+0.025
Neg logit contributions
 hawk-0.060
 prope-0.054
 valleys-0.053
 trigger-0.052
growth-0.052
Tokenigator
Token ID23823
Feature activation-2.18e-02

Pos logit contributions
 helping+0.108
 welcome+0.107
 hugging+0.106
 liking+0.105
 polite+0.103
Neg logit contributions
 hawk-0.100
 prope-0.089
 valleys-0.086
growth-0.085
 trigger-0.082
Token didn
Token ID1422
Feature activation+1.09e-02

Pos logit contributions
 welcome+0.031
 helping+0.029
 liking+0.029
 hugging+0.028
 polite+0.027
Neg logit contributions
 hawk-0.044
 prope-0.039
aw-0.038
 happens-0.037
growth-0.037
Token't
Token ID470
Feature activation+6.56e-03

Pos logit contributions
 hawk+0.028
growth+0.026
 prope+0.026
 trigger+0.025
 valleys+0.024
Neg logit contributions
 welcome-0.009
 polite-0.008
 helping-0.008
 sweet-0.008
 obedient-0.008
Token reply
Token ID10971
Feature activation-4.71e-04

Pos logit contributions
 hawk+0.013
 prope+0.012
growth+0.012
 trigger+0.012
crow+0.011
Neg logit contributions
 helping-0.008
 hugging-0.008
 cleaning-0.008
 eager-0.008
 liking-0.007
Token,
Token ID11
Feature activation-1.81e-03

Pos logit contributions
 welcome+0.001
 obedient+0.001
 eager+0.001
 freshly+0.001
 polite+0.001
Neg logit contributions
 hawk-0.001
aw-0.001
 happens-0.001
 prope-0.001
 islands-0.001
Token tool
Token ID2891
Feature activation+1.09e-02

Pos logit contributions
 welcome+0.018
 helping+0.017
 hugging+0.017
 eager+0.017
 liking+0.017
Neg logit contributions
 hawk-0.022
 prope-0.021
 mast-0.019
growth-0.019
 valleys-0.019
Token.
Token ID13
Feature activation-7.76e-03

Pos logit contributions
 hawk+0.021
 prope+0.019
 mast+0.018
 valleys+0.018
growth+0.018
Neg logit contributions
 welcome-0.015
 polite-0.014
 obedient-0.014
 eager-0.014
 liking-0.014
Token It
Token ID632
Feature activation-1.15e-02

Pos logit contributions
 helping+0.010
 polite+0.010
 liking+0.010
 hugging+0.010
 obedient+0.010
Neg logit contributions
growth-0.014
 valleys-0.014
 hawk-0.014
 prope-0.014
 winds-0.013
Token made
Token ID925
Feature activation-6.75e-03

Pos logit contributions
 welcome+0.018
 liking+0.017
 hugging+0.017
 helping+0.017
 eager+0.016
Neg logit contributions
 hawk-0.020
 prope-0.019
 happens-0.019
 valleys-0.018
growth-0.018
Token her
Token ID607
Feature activation-3.22e-02

Pos logit contributions
 liking+0.009
 hugging+0.008
 welcome+0.008
 eager+0.008
 helping+0.008
Neg logit contributions
 hawk-0.013
 valleys-0.012
 prope-0.012
growth-0.012
 winds-0.012
Token friends
Token ID2460
Feature activation-6.21e-02

Pos logit contributions
 welcome+0.037
 helping+0.036
 liking+0.036
 hugging+0.035
 polite+0.034
Neg logit contributions
 hawk-0.069
 prope-0.067
 valleys-0.063
growth-0.063
 obstacle-0.061
Token happy
Token ID3772
Feature activation-2.15e-02

Pos logit contributions
 welcome+0.058
 liking+0.054
 hugging+0.052
ll+0.052
 helping+0.050
Neg logit contributions
 prope-0.134
 hawk-0.128
 valleys-0.127
 mast-0.124
 hull-0.123
Token,
Token ID11
Feature activation-1.74e-03

Pos logit contributions
 welcome+0.030
 obedient+0.029
 polite+0.028
 eager+0.028
 hugging+0.027
Neg logit contributions
 hawk-0.037
 happens-0.036
 islands-0.035
 prope-0.035
 valleys-0.035
Token it
Token ID340
Feature activation-2.37e-02

Pos logit contributions
 welcome+0.002
pack+0.002
ll+0.002
 liking+0.002
 Deal+0.002
Neg logit contributions
 hawk-0.003
 prope-0.003
 valleys-0.003
 islands-0.003
aw-0.003
Token made
Token ID925
Feature activation+1.40e-03

Pos logit contributions
 welcome+0.038
 liking+0.036
 hugging+0.035
 helping+0.035
 polite+0.034
Neg logit contributions
 hawk-0.040
 prope-0.038
 valleys-0.038
 happens-0.037
growth-0.036
Token them
Token ID606
Feature activation-5.23e-02

Pos logit contributions
aw+0.002
 hawk+0.002
 valleys+0.002
 winds+0.002
 happens+0.002
Neg logit contributions
 liking-0.002
 hugging-0.002
ll-0.002
 eager-0.002
 welcome-0.002
Token sad
Token ID6507
Feature activation+1.99e-02

Pos logit contributions
 welcome+0.003
 Deal+0.003
 sweet+0.003
 freshly+0.003
 myself+0.003
Neg logit contributions
 prope-0.003
 valleys-0.003
 mast-0.002
 islands-0.002
 hawk-0.002
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
 hawk+0.040
 prope+0.038
 valleys+0.036
growth+0.036
 islands+0.035
Neg logit contributions
 welcome-0.027
 helping-0.025
 polite-0.025
 hugging-0.024
 eager-0.024
Token The
Token ID383
Feature activation-2.04e-03

Pos logit contributions
 hawk+0.023
 prope+0.022
growth+0.021
 valleys+0.021
 trigger+0.021
Neg logit contributions
 welcome-0.013
 helping-0.013
 polite-0.012
 hugging-0.012
 sweet-0.012
Token treasure
Token ID14068
Feature activation+4.80e-03

Pos logit contributions
 welcome+0.003
 helping+0.003
 polite+0.003
 sweet+0.003
 eager+0.003
Neg logit contributions
 hawk-0.004
 prope-0.004
growth-0.003
 valleys-0.003
 trigger-0.003
Token made
Token ID925
Feature activation+1.78e-02

Pos logit contributions
 hawk+0.010
 prope+0.009
 valleys+0.008
growth+0.008
 mast+0.008
Neg logit contributions
 welcome-0.007
 helping-0.006
 liking-0.006
 polite-0.006
 hugging-0.006
Token everyone
Token ID2506
Feature activation-6.21e-02

Pos logit contributions
 hawk+0.033
 prope+0.031
growth+0.030
 valleys+0.029
aw+0.029
Neg logit contributions
 welcome-0.026
 helping-0.026
 hugging-0.025
 liking-0.025
 eager-0.025
Token feel
Token ID1254
Feature activation+1.06e-02

Pos logit contributions
 welcome+0.057
 helping+0.055
 liking+0.052
 hugging+0.052
 polite+0.051
Neg logit contributions
 hawk-0.149
 prope-0.141
growth-0.133
 valleys-0.133
 trigger-0.131
Token bad
Token ID2089
Feature activation+3.58e-03

Pos logit contributions
 hawk+0.024
 prope+0.023
growth+0.022
 valleys+0.022
 trigger+0.022
Neg logit contributions
 helping-0.010
 welcome-0.010
 hugging-0.010
 liking-0.010
 polite-0.010
Token,
Token ID11
Feature activation-1.76e-03

Pos logit contributions
 hawk+0.008
 prope+0.007
growth+0.007
 valleys+0.007
 trigger+0.007
Neg logit contributions
 welcome-0.004
 helping-0.004
 polite-0.004
 hugging-0.004
 eager-0.004
Token and
Token ID290
Feature activation+1.04e-02

Pos logit contributions
 welcome+0.002
pack+0.002
 helping+0.002
 hugging+0.002
 liking+0.002
Neg logit contributions
 hawk-0.004
 prope-0.003
 mast-0.003
 islands-0.003
 winds-0.003
Token it
Token ID340
Feature activation-1.17e-02

Pos logit contributions
 hawk+0.018
 prope+0.017
 happens+0.016
 valleys+0.016
aw+0.016
Neg logit contributions
 welcome-0.016
 hugging-0.016
 liking-0.015
 helping-0.015
 polite-0.015
Token And
Token ID843
Feature activation-1.26e-02

Pos logit contributions
 helping+0.018
 polite+0.018
 liking+0.018
 obedient+0.018
 hugging+0.018
Neg logit contributions
 hawk-0.029
growth-0.028
 valleys-0.027
aw-0.026
 prope-0.026
Token whenever
Token ID8797
Feature activation-3.77e-03

Pos logit contributions
 hugging+0.017
 welcome+0.017
 liking+0.017
 polite+0.017
 cheek+0.017
Neg logit contributions
 hawk-0.022
growth-0.019
 prope-0.019
 valleys-0.019
aw-0.019
Token she
Token ID673
Feature activation-1.85e-02

Pos logit contributions
 hugging+0.005
 liking+0.004
 cheek+0.004
 eager+0.004
 helping+0.004
Neg logit contributions
 hawk-0.008
 happens-0.007
growth-0.007
 valleys-0.006
 winds-0.006
Token helped
Token ID4193
Feature activation-9.28e-03

Pos logit contributions
 welcome+0.030
 helping+0.029
 hugging+0.028
 sweet+0.028
 liking+0.028
Neg logit contributions
 hawk-0.032
 happens-0.030
growth-0.029
 prope-0.028
 valleys-0.028
Token her
Token ID607
Feature activation-3.94e-02

Pos logit contributions
 eager+0.010
ll+0.010
 liking+0.010
 hugging+0.009
 obedient+0.009
Neg logit contributions
 hawk-0.019
crow-0.018
 prope-0.017
 altitude-0.017
 obstacle-0.017
Token friends
Token ID2460
Feature activation-6.20e-02

Pos logit contributions
 eager+0.046
 welcome+0.045
 liking+0.045
 ready+0.043
 polite+0.042
Neg logit contributions
 hawk-0.072
 prope-0.063
growth-0.063
 obstacle-0.063
 happens-0.062
Token,
Token ID11
Feature activation-9.60e-03

Pos logit contributions
 liking+0.084
 hugging+0.080
 obedient+0.078
 supposed+0.077
 eager+0.076
Neg logit contributions
 hawk-0.101
 valleys-0.092
 happens-0.090
 prope-0.090
 mast-0.088
Token her
Token ID607
Feature activation-1.83e-02

Pos logit contributions
 liking+0.013
 cheek+0.013
 hugging+0.013
 welcome+0.013
pack+0.013
Neg logit contributions
 hawk-0.017
 happens-0.015
 winds-0.015
 valleys-0.015
 prope-0.014
Token special
Token ID2041
Feature activation-1.31e-02

Pos logit contributions
 welcome+0.021
 liking+0.020
 myself+0.019
 ready+0.019
 eager+0.019
Neg logit contributions
 hawk-0.038
 prope-0.035
 winds-0.033
 obstacle-0.031
aw-0.031
Token mint
Token ID21043
Feature activation-1.02e-02

Pos logit contributions
 welcome+0.018
 eager+0.018
 helping+0.018
 liking+0.018
 obedient+0.018
Neg logit contributions
 hawk-0.023
 prope-0.021
 trigger-0.020
 winds-0.020
 dot-0.019
Token was
Token ID373
Feature activation+1.02e-02

Pos logit contributions
 welcome+0.016
 liking+0.016
 hugging+0.015
 eager+0.015
 helping+0.015
Neg logit contributions
 hawk-0.017
 happens-0.015
 winds-0.015
 valleys-0.015
 prope-0.015
Token.
Token ID13
Feature activation-6.03e-03

Pos logit contributions
 welcome+0.006
 eager+0.006
 obedient+0.006
 liking+0.006
 hugging+0.006
Neg logit contributions
 islands-0.008
 hawk-0.008
 prope-0.008
 happens-0.008
 valleys-0.008
Token Even
Token ID3412
Feature activation+3.59e-03

Pos logit contributions
 welcome+0.007
 helping+0.007
 polite+0.007
 sweet+0.007
 hugging+0.007
Neg logit contributions
 hawk-0.013
 prope-0.012
growth-0.012
 valleys-0.012
 trigger-0.011
Token if
Token ID611
Feature activation+4.53e-03

Pos logit contributions
 hawk+0.007
 prope+0.006
 valleys+0.006
 islands+0.006
 happens+0.006
Neg logit contributions
 welcome-0.005
 hugging-0.005
 liking-0.005
 cheek-0.005
 eager-0.005
Token someone
Token ID2130
Feature activation-7.80e-03

Pos logit contributions
 hawk+0.010
 prope+0.009
 happens+0.008
 islands+0.008
 valleys+0.008
Neg logit contributions
 welcome-0.006
 hugging-0.006
 eager-0.005
 liking-0.005
 helping-0.005
Token is
Token ID318
Feature activation+7.61e-03

Pos logit contributions
 hugging+0.012
 liking+0.012
 helping+0.011
 sweet+0.011
 eager+0.011
Neg logit contributions
 hawk-0.014
 happens-0.013
 prope-0.012
growth-0.011
 islands-0.011
Token not
Token ID407
Feature activation+4.50e-02

Pos logit contributions
 hawk+0.017
 prope+0.015
 happens+0.015
 islands+0.014
 altitude+0.014
Neg logit contributions
pack-0.007
 hugging-0.007
ll-0.007
 cheek-0.007
 liking-0.007
Token like
Token ID588
Feature activation+4.22e-03

Pos logit contributions
 hawk+0.098
 prope+0.088
 happens+0.086
 winds+0.083
 altitude+0.083
Neg logit contributions
 hugging-0.047
 liking-0.043
 cleaning-0.043
pack-0.043
 handing-0.042
Token you
Token ID345
Feature activation+4.77e-03

Pos logit contributions
 hawk+0.008
 prope+0.007
 happens+0.007
 islands+0.007
 valleys+0.007
Neg logit contributions
 hugging-0.006
 welcome-0.006
 obedient-0.006
pack-0.006
 eager-0.006
Token,
Token ID11
Feature activation-9.76e-03

Pos logit contributions
 hawk+0.008
 happens+0.007
 islands+0.007
 targets+0.007
aw+0.007
Neg logit contributions
 hugging-0.007
 cleaning-0.007
 obedient-0.007
 handing-0.007
 helping-0.007
Token unite
Token ID24558
Feature activation+1.88e-03

Pos logit contributions
 hugging+0.012
 welcome+0.012
 eager+0.012
 liking+0.012
 obedient+0.011
Neg logit contributions
 hawk-0.020
 prope-0.019
 valleys-0.018
 islands-0.018
growth-0.017
Token with
Token ID351
Feature activation-6.08e-03

Pos logit contributions
 hawk+0.003
 islands+0.003
 happens+0.003
 targets+0.003
 prope+0.003
Neg logit contributions
 hugging-0.003
 liking-0.002
 eager-0.002
 obedient-0.002
 handing-0.002
Token use
Token ID779
Feature activation+1.39e-02

Pos logit contributions
 welcome+0.004
 eager+0.004
 helping+0.004
 hugging+0.004
 obedient+0.004
Neg logit contributions
 hawk-0.008
 prope-0.008
growth-0.007
 mast-0.007
 hull-0.007
Token them
Token ID606
Feature activation-4.77e-03

Pos logit contributions
 hawk+0.025
 prope+0.025
 valleys+0.023
growth+0.023
 islands+0.023
Neg logit contributions
 welcome-0.020
 eager-0.019
 hugging-0.019
 obedient-0.019
 helping-0.019
Token to
Token ID284
Feature activation+8.50e-03

Pos logit contributions
 hugging+0.006
 welcome+0.006
 eager+0.006
 liking+0.006
 helping+0.006
Neg logit contributions
 prope-0.009
 hawk-0.008
 islands-0.008
 valleys-0.008
 happens-0.008
Token learn
Token ID2193
Feature activation+3.43e-02

Pos logit contributions
 prope+0.016
 mast+0.015
 hawk+0.015
growth+0.015
 hull+0.015
Neg logit contributions
 hugging-0.011
 eager-0.011
 obedient-0.010
 helping-0.010
 drinking-0.010
Token about
Token ID546
Feature activation+2.03e-02

Pos logit contributions
 prope+0.066
 hawk+0.066
 happens+0.063
 islands+0.062
aw+0.060
Neg logit contributions
 eager-0.043
 welcome-0.043
 hugging-0.042
 helping-0.042
 obedient-0.041
Token numbers
Token ID3146
Feature activation+4.23e-02

Pos logit contributions
 hawk+0.032
 prope+0.031
 islands+0.030
 valleys+0.029
 targets+0.029
Neg logit contributions
 welcome-0.034
 eager-0.034
 obedient-0.033
 hugging-0.032
 ready-0.031
Token,
Token ID11
Feature activation+1.67e-02

Pos logit contributions
 hawk+0.080
 prope+0.079
 valleys+0.076
 happens+0.075
 islands+0.075
Neg logit contributions
 welcome-0.055
 hugging-0.054
 eager-0.053
 helping-0.053
 liking-0.053
Token shapes
Token ID15268
Feature activation+2.35e-02

Pos logit contributions
 prope+0.027
 hawk+0.026
 islands+0.025
 valleys+0.024
 mast+0.024
Neg logit contributions
 welcome-0.027
 eager-0.026
 hugging-0.026
 helping-0.025
 obedient-0.025
Token,
Token ID11
Feature activation+2.08e-02

Pos logit contributions
 hawk+0.045
 prope+0.043
 valleys+0.041
 islands+0.041
 happens+0.041
Neg logit contributions
 eager-0.031
 obedient-0.030
 polite-0.029
 hugging-0.029
 welcome-0.029
Token patterns
Token ID7572
Feature activation+1.73e-02

Pos logit contributions
 prope+0.037
 hawk+0.036
 valleys+0.035
 islands+0.034
 mast+0.033
Neg logit contributions
 welcome-0.032
 eager-0.031
 hugging-0.030
 obedient-0.030
 helping-0.030
Token,
Token ID11
Feature activation+2.15e-02

Pos logit contributions
 hawk+0.031
 prope+0.030
 valleys+0.029
 islands+0.029
aw+0.029
Neg logit contributions
 eager-0.024
 obedient-0.023
 polite-0.023
 welcome-0.023
 thoughtful-0.022
Token,
Token ID11
Feature activation-2.40e-03

Pos logit contributions
 hawk+0.011
 prope+0.010
growth+0.010
 valleys+0.010
 happens+0.009
Neg logit contributions
 welcome-0.007
 polite-0.007
 liking-0.007
 obedient-0.007
 sweet-0.007
Token they
Token ID484
Feature activation+6.44e-03

Pos logit contributions
 welcome+0.003
 liking+0.003
 hugging+0.003
 helping+0.003
 polite+0.003
Neg logit contributions
 hawk-0.005
 prope-0.004
 valleys-0.004
growth-0.004
 islands-0.004
Token find
Token ID1064
Feature activation+1.04e-02

Pos logit contributions
 hawk+0.013
 prope+0.012
growth+0.011
 valleys+0.011
 mast+0.011
Neg logit contributions
 helping-0.009
 welcome-0.009
 hugging-0.009
 liking-0.009
 polite-0.009
Token a
Token ID257
Feature activation+1.11e-02

Pos logit contributions
 hawk+0.024
 prope+0.023
growth+0.022
 valleys+0.022
 trigger+0.021
Neg logit contributions
 welcome-0.011
 helping-0.011
 liking-0.010
 hugging-0.010
 polite-0.010
Token big
Token ID1263
Feature activation+6.98e-03

Pos logit contributions
 hawk+0.025
 prope+0.022
 mast+0.021
 trigger+0.021
 obstacle+0.020
Neg logit contributions
 myself-0.013
 hugging-0.013
 welcome-0.012
 helping-0.012
 liking-0.012
Token steel
Token ID7771
Feature activation+4.45e-02

Pos logit contributions
 hawk+0.012
 prope+0.011
 mast+0.010
 trigger+0.010
 obstacle+0.010
Neg logit contributions
 helping-0.011
 liking-0.011
 welcome-0.011
 hugging-0.011
 eager-0.011
Token truck
Token ID7779
Feature activation+1.36e-02

Pos logit contributions
 hawk+0.084
 prope+0.080
growth+0.079
 valleys+0.079
 islands+0.073
Neg logit contributions
 helping-0.063
 welcome-0.063
 sweet-0.059
 liking-0.059
 polite-0.059
Token in
Token ID287
Feature activation-1.09e-02

Pos logit contributions
 hawk+0.026
 prope+0.024
 mast+0.023
 trigger+0.023
growth+0.023
Neg logit contributions
 welcome-0.019
 sweet-0.018
 polite-0.018
 obedient-0.018
 thoughtful-0.018
Token the
Token ID262
Feature activation+1.83e-02

Pos logit contributions
 welcome+0.014
 helping+0.014
 hugging+0.014
 liking+0.014
 polite+0.014
Neg logit contributions
 hawk-0.022
 prope-0.021
growth-0.021
 valleys-0.020
 islands-0.019
Token garage
Token ID15591
Feature activation+2.03e-02

Pos logit contributions
 hawk+0.036
 prope+0.034
growth+0.032
 valleys+0.031
 trigger+0.031
Neg logit contributions
 welcome-0.027
 helping-0.025
 polite-0.025
 eager-0.024
 sweet-0.024
Token.
Token ID13
Feature activation+1.95e-03

Pos logit contributions
 hawk+0.033
 prope+0.032
 mast+0.032
 valleys+0.032
 hull+0.031
Neg logit contributions
 welcome-0.032
 sweet-0.030
 obedient-0.030
 liking-0.030
 polite-0.030
Token but
Token ID475
Feature activation+3.41e-02

Pos logit contributions
 hawk+0.046
 valleys+0.044
 islands+0.042
 prope+0.042
 mountains+0.041
Neg logit contributions
 welcome-0.028
 hugging-0.027
 liking-0.026
 helping-0.025
 polite-0.025
Token some
Token ID617
Feature activation-3.63e-03

Pos logit contributions
 hawk+0.066
 prope+0.062
growth+0.059
 valleys+0.059
 trigger+0.057
Neg logit contributions
 welcome-0.049
 helping-0.048
 hugging-0.046
 polite-0.046
 sweet-0.046
Token of
Token ID286
Feature activation-9.09e-03

Pos logit contributions
 welcome+0.005
 helping+0.005
 hugging+0.004
 polite+0.004
 sweet+0.004
Neg logit contributions
 hawk-0.008
 prope-0.007
growth-0.007
 valleys-0.007
 trigger-0.007
Token them
Token ID606
Feature activation-4.28e-03

Pos logit contributions
 welcome+0.014
 helping+0.014
 hugging+0.013
 polite+0.013
 eager+0.013
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.015
 valleys-0.015
 trigger-0.014
Token were
Token ID547
Feature activation+2.05e-02

Pos logit contributions
 welcome+0.007
 hugging+0.007
 liking+0.007
 helping+0.007
 sweet+0.007
Neg logit contributions
 hawk-0.007
 prope-0.006
 valleys-0.006
aw-0.006
 happens-0.006
Token not
Token ID407
Feature activation+4.82e-02

Pos logit contributions
 hawk+0.047
 prope+0.042
 valleys+0.040
 winds+0.039
 islands+0.039
Neg logit contributions
 welcome-0.023
 hugging-0.021
 liking-0.020
 handing-0.019
 helping-0.019
Token fit
Token ID4197
Feature activation+8.78e-03

Pos logit contributions
 hawk+0.109
 prope+0.098
 valleys+0.093
 islands+0.093
 altitude+0.093
Neg logit contributions
 hugging-0.050
 liking-0.046
 welcome-0.045
 handing-0.044
 helping-0.044
Token enough
Token ID1576
Feature activation+1.73e-02

Pos logit contributions
 hawk+0.016
 prope+0.016
 valleys+0.015
 islands+0.015
 happens+0.014
Neg logit contributions
 welcome-0.012
 hugging-0.011
 liking-0.011
 polite-0.011
 sweet-0.011
Token to
Token ID284
Feature activation+7.27e-03

Pos logit contributions
 hawk+0.032
 islands+0.030
 prope+0.030
 winds+0.030
 mast+0.030
Neg logit contributions
 hugging-0.022
 welcome-0.021
 polite-0.021
 obedient-0.021
 eager-0.021
Token go
Token ID467
Feature activation-4.26e-03

Pos logit contributions
 hawk+0.013
 prope+0.013
 mast+0.013
 hull+0.013
growth+0.012
Neg logit contributions
 hugging-0.009
 eager-0.009
 sweet-0.009
 helping-0.009
 liking-0.009
Token.
Token ID13
Feature activation+2.92e-03

Pos logit contributions
 welcome+0.006
 sweet+0.006
 hugging+0.006
 polite+0.006
 liking+0.006
Neg logit contributions
 hawk-0.008
 valleys-0.007
 prope-0.007
 islands-0.007
 mast-0.007
Token with
Token ID351
Feature activation-7.17e-03

Pos logit contributions
 welcome+0.018
 hugging+0.016
 eager+0.016
 helping+0.016
 polite+0.016
Neg logit contributions
 hawk-0.049
 prope-0.047
growth-0.045
 valleys-0.045
 mast-0.044
Token his
Token ID465
Feature activation+9.01e-03

Pos logit contributions
 welcome+0.009
 helping+0.009
 polite+0.009
 eager+0.009
 hugging+0.009
Neg logit contributions
 hawk-0.015
 prope-0.014
growth-0.014
 valleys-0.013
 trigger-0.013
Token toys
Token ID14958
Feature activation-9.76e-03

Pos logit contributions
 hawk+0.019
 prope+0.018
 mast+0.017
 valleys+0.017
 trigger+0.017
Neg logit contributions
 welcome-0.011
 liking-0.011
 helping-0.010
 hugging-0.010
 polite-0.010
Token all
Token ID477
Feature activation-2.10e-02

Pos logit contributions
 welcome+0.012
 hugging+0.011
 helping+0.011
 liking+0.011
 polite+0.011
Neg logit contributions
 hawk-0.020
 prope-0.019
 valleys-0.018
 mast-0.018
growth-0.018
Token day
Token ID1110
Feature activation+8.02e-03

Pos logit contributions
 welcome+0.027
 helping+0.027
 hugging+0.027
 liking+0.026
 polite+0.026
Neg logit contributions
 hawk-0.042
 prope-0.039
 valleys-0.039
growth-0.037
 islands-0.036
Token long
Token ID890
Feature activation+4.92e-02

Pos logit contributions
 hawk+0.013
 prope+0.013
 mast+0.012
 valleys+0.012
 winds+0.012
Neg logit contributions
 welcome-0.013
 hugging-0.011
 polite-0.011
 liking-0.011
 obedient-0.011
Token.
Token ID13
Feature activation-9.93e-03

Pos logit contributions
 hawk+0.099
 prope+0.094
growth+0.090
 valleys+0.090
 islands+0.086
Neg logit contributions
 welcome-0.066
 helping-0.062
 polite-0.061
 hugging-0.061
 eager-0.061
Token One
Token ID1881
Feature activation-1.63e-02

Pos logit contributions
 welcome+0.016
 helping+0.016
 eager+0.015
 polite+0.015
 hugging+0.015
Neg logit contributions
 hawk-0.018
 prope-0.016
growth-0.016
 valleys-0.015
 trigger-0.015
Token day
Token ID1110
Feature activation+9.25e-03

Pos logit contributions
 hugging+0.024
 welcome+0.023
 helping+0.023
 liking+0.023
 polite+0.021
Neg logit contributions
 hawk-0.030
 prope-0.027
growth-0.026
 valleys-0.026
 obstacle-0.026
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
 hawk+0.021
 prope+0.019
growth+0.019
 valleys+0.018
 trigger+0.018
Neg logit contributions
 welcome-0.011
 helping-0.010
 polite-0.010
 hugging-0.010
 eager-0.010
Token he
Token ID339
Feature activation-4.87e-03

Pos logit contributions
 hawk+0.028
 prope+0.025
growth+0.025
 valleys+0.025
 trigger+0.024
Neg logit contributions
 welcome-0.018
 hugging-0.017
 liking-0.016
 helping-0.016
 polite-0.016
TokenYou
Token ID1639
Feature activation-1.22e-02

Pos logit contributions
 hawk+0.000
 prope+0.000
growth+0.000
 trigger+0.000
 mast+0.000
Neg logit contributions
 welcome-0.000
 helping-0.000
 polite-0.000
 sweet-0.000
 obedient-0.000
Token should
Token ID815
Feature activation+4.53e-03

Pos logit contributions
 welcome+0.015
 helping+0.015
 hugging+0.015
 liking+0.015
 polite+0.015
Neg logit contributions
 hawk-0.025
 prope-0.024
growth-0.023
 valleys-0.022
 trigger-0.022
Token never
Token ID1239
Feature activation+1.77e-02

Pos logit contributions
 hawk+0.011
 prope+0.010
growth+0.009
 altitude+0.009
 mast+0.009
Neg logit contributions
 hugging-0.004
 helping-0.004
 eager-0.004
 liking-0.004
 polite-0.004
Token go
Token ID467
Feature activation-1.50e-02

Pos logit contributions
 hawk+0.038
growth+0.034
 prope+0.034
 trigger+0.033
 mast+0.033
Neg logit contributions
 helping-0.020
 eager-0.020
 hugging-0.019
 sweet-0.019
 obedient-0.019
Token through
Token ID832
Feature activation-4.84e-03

Pos logit contributions
 sweet+0.018
 welcome+0.018
 polite+0.018
 hugging+0.018
 thoughtful+0.017
Neg logit contributions
 hawk-0.030
 valleys-0.027
 trigger-0.026
 islands-0.025
 mountains-0.025
Token holes
Token ID10421
Feature activation+9.39e-03

Pos logit contributions
 welcome+0.007
 helping+0.006
 polite+0.006
 eager+0.006
 hugging+0.006
Neg logit contributions
 hawk-0.010
 prope-0.009
growth-0.009
 valleys-0.009
 winds-0.008
Token in
Token ID287
Feature activation-2.93e-02

Pos logit contributions
 hawk+0.017
 prope+0.015
aw+0.015
 valleys+0.015
growth+0.015
Neg logit contributions
 hugging-0.013
 welcome-0.013
 polite-0.013
 helping-0.013
 eager-0.013
Token fences
Token ID35771
Feature activation+1.53e-02

Pos logit contributions
 welcome+0.041
 polite+0.041
 liking+0.041
 eager+0.041
 helping+0.040
Neg logit contributions
 hawk-0.057
growth-0.052
 prope-0.052
 valleys-0.051
 altitude-0.048
Token.
Token ID13
Feature activation-2.64e-03

Pos logit contributions
 hawk+0.029
 prope+0.026
 valleys+0.026
growth+0.025
aw+0.025
Neg logit contributions
 welcome-0.022
 polite-0.021
 sweet-0.021
 hugging-0.021
 obedient-0.021
Token They
Token ID1119
Feature activation-9.66e-04

Pos logit contributions
 welcome+0.003
 helping+0.003
 polite+0.003
 sweet+0.003
 hugging+0.003
Neg logit contributions
 hawk-0.006
 prope-0.005
growth-0.005
 valleys-0.005
 trigger-0.005
Token can
Token ID460
Feature activation-1.82e-03

Pos logit contributions
 hugging+0.002
 liking+0.002
 welcome+0.002
 helping+0.002
 eager+0.002
Neg logit contributions
 hawk-0.002
 prope-0.001
 happens-0.001
 winds-0.001
 valleys-0.001
Token\n
Token ID198
Feature activation+7.63e-03

Pos logit contributions
 helping+0.003
 welcome+0.003
 obedient+0.003
 hugging+0.003
 polite+0.003
Neg logit contributions
 hawk-0.004
 valleys-0.004
growth-0.004
 prope-0.004
aw-0.004
TokenL
Token ID43
Feature activation+4.39e-03

Pos logit contributions
 hawk+0.013
 prope+0.012
 valleys+0.011
growth+0.011
 trigger+0.011
Neg logit contributions
 welcome-0.013
 helping-0.012
 hugging-0.012
 eager-0.012
 talking-0.011
Tokenily
Token ID813
Feature activation-5.67e-04

Pos logit contributions
 hawk+0.006
growth+0.006
 prope+0.006
aw+0.006
 happens+0.006
Neg logit contributions
 helping-0.008
 welcome-0.008
 hugging-0.008
 eager-0.008
 polite-0.007
Token had
Token ID550
Feature activation+1.17e-02

Pos logit contributions
 welcome+0.001
 helping+0.001
 liking+0.001
 hugging+0.001
 sweet+0.001
Neg logit contributions
 hawk-0.001
 prope-0.001
 valleys-0.001
growth-0.001
aw-0.001
Token so
Token ID523
Feature activation+4.89e-03

Pos logit contributions
 hawk+0.027
 prope+0.026
 valleys+0.026
growth+0.025
aw+0.024
Neg logit contributions
 welcome-0.012
 hugging-0.012
 helping-0.011
 liking-0.011
 polite-0.010
Token much
Token ID881
Feature activation+6.02e-03

Pos logit contributions
 hawk+0.011
 prope+0.011
 trigger+0.011
 mast+0.010
growth+0.010
Neg logit contributions
 welcome-0.005
 sweet-0.005
 polite-0.005
 eager-0.005
 helping-0.005
Token fun
Token ID1257
Feature activation-1.48e-02

Pos logit contributions
 hawk+0.014
 prope+0.013
growth+0.013
 valleys+0.013
 happens+0.013
Neg logit contributions
 welcome-0.006
 helping-0.006
 hugging-0.005
 liking-0.005
 polite-0.005
Token playing
Token ID2712
Feature activation-2.62e-02

Pos logit contributions
 welcome+0.014
 helping+0.013
 polite+0.013
 hugging+0.012
 sweet+0.012
Neg logit contributions
 hawk-0.036
 prope-0.034
 valleys-0.033
growth-0.033
 islands-0.032
Token with
Token ID351
Feature activation-3.56e-03

Pos logit contributions
 hugging+0.027
 welcome+0.027
 eager+0.026
 sweet+0.025
 liking+0.025
Neg logit contributions
 hawk-0.057
 valleys-0.054
 prope-0.054
aw-0.053
growth-0.052
Token the
Token ID262
Feature activation+5.62e-03

Pos logit contributions
 welcome+0.005
 helping+0.004
 liking+0.004
 obedient+0.004
 eager+0.004
Neg logit contributions
 hawk-0.007
 prope-0.007
 valleys-0.006
growth-0.006
 islands-0.006
Token other
Token ID584
Feature activation-2.09e-04

Pos logit contributions
 hawk+0.011
 prope+0.010
 valleys+0.010
 islands+0.009
 obstacle+0.009
Neg logit contributions
 helping-0.008
 hugging-0.008
 liking-0.008
 welcome-0.008
 eager-0.008
Token played
Token ID2826
Feature activation-1.99e-02

Pos logit contributions
 welcome+0.005
 helping+0.005
 hugging+0.005
 polite+0.005
 liking+0.005
Neg logit contributions
 hawk-0.007
 prope-0.007
growth-0.007
 valleys-0.007
 mast-0.006
Token with
Token ID351
Feature activation-6.60e-04

Pos logit contributions
 welcome+0.018
 hugging+0.018
 helping+0.017
 liking+0.017
 eager+0.017
Neg logit contributions
 hawk-0.046
 prope-0.045
 valleys-0.044
growth-0.042
 islands-0.042
Token the
Token ID262
Feature activation-4.67e-05

Pos logit contributions
 sweet+0.001
 helping+0.001
 welcome+0.001
 hugging+0.001
 polite+0.001
Neg logit contributions
 hawk-0.002
growth-0.001
 prope-0.001
 trigger-0.001
 mast-0.001
Token white
Token ID2330
Feature activation-2.66e-03

Pos logit contributions
 welcome+0.000
 helping+0.000
 liking+0.000
 hugging+0.000
 obedient+0.000
Neg logit contributions
 hawk-0.000
 prope-0.000
 valleys-0.000
growth-0.000
 trigger-0.000
Token balloon
Token ID21190
Feature activation-4.21e-03

Pos logit contributions
 welcome+0.005
 helping+0.005
 polite+0.004
 hugging+0.004
 sweet+0.004
Neg logit contributions
 hawk-0.004
 prope-0.004
growth-0.004
 valleys-0.004
 trigger-0.004
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
 welcome+0.005
 helping+0.005
 polite+0.005
 hugging+0.005
 liking+0.005
Neg logit contributions
 hawk-0.009
 prope-0.009
 valleys-0.008
growth-0.008
 mast-0.008
Token had
Token ID550
Feature activation+1.60e-02

Pos logit contributions
 hawk+0.026
 prope+0.024
growth+0.024
 valleys+0.023
 trigger+0.023
Neg logit contributions
 welcome-0.016
 helping-0.016
 hugging-0.015
 polite-0.015
 sweet-0.015
Token so
Token ID523
Feature activation+1.29e-02

Pos logit contributions
 hawk+0.042
 prope+0.040
growth+0.039
 valleys+0.039
 trigger+0.038
Neg logit contributions
 welcome-0.012
 helping-0.011
 polite-0.010
 hugging-0.010
 sweet-0.010
Token much
Token ID881
Feature activation+1.10e-02

Pos logit contributions
 hawk+0.032
 prope+0.030
 trigger+0.029
 mast+0.029
growth+0.029
Neg logit contributions
 welcome-0.012
 sweet-0.012
 helping-0.012
 eager-0.012
 polite-0.012
Token fun
Token ID1257
Feature activation-1.02e-02

Pos logit contributions
 hawk+0.027
 prope+0.026
growth+0.025
 valleys+0.025
 trigger+0.025
Neg logit contributions
 welcome-0.010
 helping-0.009
 polite-0.009
 hugging-0.009
 eager-0.009
Token.
Token ID13
Feature activation-4.17e-04

Pos logit contributions
 welcome+0.011
 helping+0.011
 hugging+0.010
 polite+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.024
 prope-0.022
growth-0.022
 valleys-0.021
 trigger-0.021
Token dad
Token ID9955
Feature activation+5.32e-03

Pos logit contributions
 hawk+0.051
 prope+0.048
growth+0.046
 valleys+0.046
 trigger+0.045
Neg logit contributions
 helping-0.023
 welcome-0.023
 liking-0.022
 hugging-0.022
 polite-0.022
Token went
Token ID1816
Feature activation-1.84e-03

Pos logit contributions
 hawk+0.009
aw+0.008
 valleys+0.008
growth+0.008
 prope+0.008
Neg logit contributions
 liking-0.008
 helping-0.008
 polite-0.008
 welcome-0.008
 sweet-0.008
Token to
Token ID284
Feature activation+2.05e-02

Pos logit contributions
 sweet+0.002
 welcome+0.002
 polite+0.002
 liking+0.002
 obedient+0.002
Neg logit contributions
 hawk-0.003
 valleys-0.003
 islands-0.003
 prope-0.003
 cockpit-0.003
Token the
Token ID262
Feature activation+3.18e-02

Pos logit contributions
 hawk+0.036
growth+0.035
 prope+0.034
 valleys+0.033
 islands+0.033
Neg logit contributions
 helping-0.029
 hugging-0.028
 liking-0.028
 obedient-0.028
 welcome-0.028
Token pond
Token ID16723
Feature activation+1.50e-02

Pos logit contributions
 hawk+0.059
 prope+0.054
growth+0.052
 valleys+0.052
 trigger+0.050
Neg logit contributions
 welcome-0.048
 helping-0.047
 hugging-0.046
 polite-0.045
 eager-0.045
Token to
Token ID284
Feature activation+1.65e-02

Pos logit contributions
 islands+0.022
 hawk+0.022
 valleys+0.022
aw+0.022
 hull+0.022
Neg logit contributions
 welcome-0.024
 hugging-0.023
 liking-0.022
 sweet-0.022
 helping-0.022
Token go
Token ID467
Feature activation-6.02e-03

Pos logit contributions
 hawk+0.035
growth+0.034
 mast+0.033
 hull+0.033
 valleys+0.033
Neg logit contributions
 hugging-0.018
 helping-0.018
 liking-0.017
 obedient-0.017
 polite-0.016
Token fishing
Token ID12478
Feature activation+2.69e-02

Pos logit contributions
 hugging+0.007
 polite+0.007
 liking+0.007
 sweet+0.007
 helping+0.007
Neg logit contributions
 hawk-0.012
 valleys-0.011
 islands-0.011
growth-0.010
 prope-0.010
Token.
Token ID13
Feature activation+2.35e-02

Pos logit contributions
 hawk+0.057
 prope+0.053
 valleys+0.052
growth+0.052
 islands+0.051
Neg logit contributions
 welcome-0.032
 hugging-0.032
 helping-0.032
 liking-0.031
 sweet-0.030
Token Jane
Token ID12091
Feature activation+1.47e-02

Pos logit contributions
 hawk+0.042
 valleys+0.042
growth+0.042
 prope+0.039
 mast+0.038
Neg logit contributions
 helping-0.033
 hugging-0.032
 polite-0.032
 liking-0.032
 welcome-0.032
Token was
Token ID373
Feature activation+1.40e-02

Pos logit contributions
 hawk+0.031
 prope+0.030
growth+0.029
 valleys+0.028
 trigger+0.028
Neg logit contributions
 welcome-0.018
 helping-0.017
 polite-0.017
 hugging-0.017
 sweet-0.017
Token to
Token ID284
Feature activation-9.87e-03

Pos logit contributions
 hawk+0.005
 prope+0.004
 valleys+0.004
growth+0.004
 winds+0.004
Neg logit contributions
 welcome-0.003
 hugging-0.003
 helping-0.003
 liking-0.003
 polite-0.003
Token the
Token ID262
Feature activation-6.61e-03

Pos logit contributions
 hugging+0.013
 helping+0.013
 welcome+0.013
 eager+0.012
 obedient+0.012
Neg logit contributions
 hawk-0.020
growth-0.019
 prope-0.018
 valleys-0.018
 trigger-0.017
Token sound
Token ID2128
Feature activation-3.60e-03

Pos logit contributions
 helping+0.009
 hugging+0.009
 cleaning+0.009
 handing+0.009
pack+0.008
Neg logit contributions
 hawk-0.013
 trigger-0.012
 winds-0.011
 obstacle-0.011
 prope-0.011
Token,
Token ID11
Feature activation+1.96e-02

Pos logit contributions
 welcome+0.005
 helping+0.004
 hugging+0.004
 eager+0.004
 polite+0.004
Neg logit contributions
 hawk-0.007
 prope-0.007
 valleys-0.007
growth-0.007
aw-0.006
Token until
Token ID1566
Feature activation+1.43e-02

Pos logit contributions
 hawk+0.041
 prope+0.039
 valleys+0.038
 mast+0.036
 winds+0.035
Neg logit contributions
 welcome-0.025
 polite-0.022
 obedient-0.021
 helping-0.021
 liking-0.021
Token he
Token ID339
Feature activation+1.88e-02

Pos logit contributions
 prope+0.029
 hawk+0.029
 trigger+0.026
growth+0.026
 obstacle+0.026
Neg logit contributions
 helping-0.019
 ready-0.019
 first-0.018
 welcome-0.018
 not-0.018
Token arrived
Token ID5284
Feature activation-1.41e-03

Pos logit contributions
 hawk+0.035
 prope+0.034
growth+0.032
 valleys+0.032
 trigger+0.031
Neg logit contributions
 welcome-0.028
 helping-0.027
 polite-0.026
 sweet-0.026
 eager-0.026
Token at
Token ID379
Feature activation-9.85e-04

Pos logit contributions
 welcome+0.002
 helping+0.002
 hugging+0.002
 polite+0.002
 liking+0.002
Neg logit contributions
 hawk-0.003
aw-0.003
 valleys-0.003
 prope-0.002
 islands-0.002
Token a
Token ID257
Feature activation+2.40e-02

Pos logit contributions
 welcome+0.001
 helping+0.001
 hugging+0.001
 polite+0.001
 liking+0.001
Neg logit contributions
 hawk-0.002
 prope-0.002
growth-0.002
 valleys-0.002
 trigger-0.002
Token pond
Token ID16723
Feature activation+4.42e-03

Pos logit contributions
 hawk+0.052
 prope+0.049
growth+0.048
 valleys+0.047
 trigger+0.046
Neg logit contributions
 welcome-0.029
 helping-0.028
 polite-0.027
 sweet-0.027
 hugging-0.026
Token.
Token ID13
Feature activation-5.12e-03

Pos logit contributions
 hawk+0.009
 valleys+0.008
 prope+0.008
aw+0.008
growth+0.008
Neg logit contributions
 welcome-0.006
 helping-0.006
 hugging-0.006
 polite-0.006
 obedient-0.006
Token used
Token ID973
Feature activation+1.16e-02

Pos logit contributions
 hawk+0.010
 prope+0.009
growth+0.009
 valleys+0.009
 trigger+0.009
Neg logit contributions
 welcome-0.007
 helping-0.007
 polite-0.006
 eager-0.006
 obedient-0.006
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
 hawk+0.022
 prope+0.022
 valleys+0.021
growth+0.020
 mast+0.020
Neg logit contributions
 welcome-0.016
 hugging-0.015
 helping-0.015
 liking-0.015
 obedient-0.015
Token pipe
Token ID12656
Feature activation+9.89e-03

Pos logit contributions
 hawk+0.028
 prope+0.027
 valleys+0.025
growth+0.025
 mast+0.025
Neg logit contributions
 welcome-0.017
 helping-0.017
 hugging-0.017
 liking-0.016
 obedient-0.016
Token to
Token ID284
Feature activation-3.31e-04

Pos logit contributions
 hawk+0.023
 prope+0.022
 valleys+0.021
growth+0.021
 mast+0.021
Neg logit contributions
 welcome-0.010
 hugging-0.009
 helping-0.009
 polite-0.009
 liking-0.009
Token make
Token ID787
Feature activation+3.03e-04

Pos logit contributions
 hugging+0.000
 welcome+0.000
 helping+0.000
 eager+0.000
 liking+0.000
Neg logit contributions
 prope-0.001
 hawk-0.001
 mast-0.001
 valleys-0.001
 hull-0.001
Token a
Token ID257
Feature activation-1.02e-02

Pos logit contributions
 hawk+0.001
 prope+0.001
 valleys+0.001
growth+0.001
 trigger+0.001
Neg logit contributions
 welcome-0.000
 helping-0.000
 hugging-0.000
 liking-0.000
 eager-0.000
Token tunnel
Token ID13275
Feature activation-5.20e-03

Pos logit contributions
 welcome+0.016
 helping+0.016
 polite+0.015
 sweet+0.015
 hugging+0.015
Neg logit contributions
 hawk-0.018
 prope-0.017
growth-0.017
 valleys-0.016
 trigger-0.016
Token for
Token ID329
Feature activation-1.81e-03

Pos logit contributions
 welcome+0.006
 hugging+0.006
 helping+0.006
 liking+0.006
 eager+0.006
Neg logit contributions
 hawk-0.011
 prope-0.011
 valleys-0.010
 mast-0.010
growth-0.010
Token his
Token ID465
Feature activation+1.04e-02

Pos logit contributions
 welcome+0.003
 eager+0.002
 polite+0.002
 helping+0.002
 obedient+0.002
Neg logit contributions
 hawk-0.003
 prope-0.003
growth-0.003
 valleys-0.003
 islands-0.003
Token cars
Token ID5006
Feature activation-6.21e-03

Pos logit contributions
 hawk+0.020
 prope+0.019
 mast+0.018
 valleys+0.017
growth+0.017
Neg logit contributions
 welcome-0.015
 helping-0.014
 hugging-0.014
 liking-0.014
 polite-0.014
Token.
Token ID13
Feature activation-7.36e-03

Pos logit contributions
 hugging+0.008
 welcome+0.008
 liking+0.008
 sweet+0.008
 helping+0.008
Neg logit contributions
 hawk-0.011
 prope-0.011
 mast-0.010
 valleys-0.010
aw-0.010
Token cared
Token ID19951
Feature activation-1.67e-02

Pos logit contributions
 welcome+0.003
 helping+0.003
 hugging+0.002
 polite+0.002
 sweet+0.002
Neg logit contributions
 hawk-0.004
 prope-0.004
growth-0.004
 valleys-0.004
 trigger-0.003
Token about
Token ID546
Feature activation+1.21e-02

Pos logit contributions
 welcome+0.023
 hugging+0.022
 polite+0.022
 eager+0.022
 liking+0.021
Neg logit contributions
 hawk-0.030
 prope-0.029
 happens-0.027
growth-0.026
 trigger-0.026
Token doing
Token ID1804
Feature activation+1.37e-02

Pos logit contributions
 hawk+0.023
 prope+0.020
growth+0.020
 islands+0.020
 valleys+0.020
Neg logit contributions
 welcome-0.017
 obedient-0.017
 polite-0.016
 eager-0.016
 cheek-0.016
Token the
Token ID262
Feature activation-3.59e-03

Pos logit contributions
 hawk+0.025
 prope+0.024
 valleys+0.023
growth+0.023
 islands+0.022
Neg logit contributions
 welcome-0.020
 hugging-0.019
 helping-0.018
 eager-0.018
 liking-0.018
Token right
Token ID826
Feature activation+1.80e-02

Pos logit contributions
 welcome+0.005
 helping+0.005
 sweet+0.005
 polite+0.005
 eager+0.005
Neg logit contributions
 hawk-0.007
 prope-0.007
growth-0.006
 valleys-0.006
 trigger-0.006
Token thing
Token ID1517
Feature activation-2.04e-03

Pos logit contributions
 prope+0.034
 hawk+0.034
growth+0.032
 valleys+0.030
 mast+0.030
Neg logit contributions
 welcome-0.028
 sweet-0.026
 first-0.026
 looking-0.026
 polite-0.026
Token.
Token ID13
Feature activation-1.82e-02

Pos logit contributions
 welcome+0.003
 polite+0.003
 liking+0.003
 obedient+0.003
 hugging+0.003
Neg logit contributions
 hawk-0.004
aw-0.003
 targets-0.003
 happens-0.003
 prope-0.003
Token So
Token ID1406
Feature activation-1.85e-02

Pos logit contributions
 welcome+0.022
 helping+0.022
 polite+0.021
 liking+0.021
 obedient+0.021
Neg logit contributions
 hawk-0.038
 prope-0.036
growth-0.035
 valleys-0.035
 trigger-0.033
Token,
Token ID11
Feature activation-3.60e-03

Pos logit contributions
 welcome+0.023
 helping+0.021
 liking+0.021
 polite+0.021
 eager+0.021
Neg logit contributions
 hawk-0.040
 prope-0.037
growth-0.035
 valleys-0.035
 trigger-0.034
Token she
Token ID673
Feature activation-8.85e-03

Pos logit contributions
 welcome+0.004
 helping+0.004
 polite+0.004
 obedient+0.004
 liking+0.004
Neg logit contributions
 hawk-0.008
 prope-0.007
growth-0.007
 valleys-0.007
 trigger-0.007
Token went
Token ID1816
Feature activation-5.51e-03

Pos logit contributions
 welcome+0.013
 helping+0.012
 hugging+0.012
 polite+0.012
 sweet+0.012
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.015
 valleys-0.015
 mast-0.015
Token Anna
Token ID11735
Feature activation-1.47e-02

Pos logit contributions
 welcome+0.002
 helping+0.002
 sweet+0.002
 hugging+0.002
 polite+0.002
Neg logit contributions
 hawk-0.005
 prope-0.005
growth-0.004
 valleys-0.004
 mast-0.004
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
 welcome+0.021
 polite+0.018
 sweet+0.018
 thoughtful+0.018
 helpful+0.018
Neg logit contributions
 hawk-0.028
 prope-0.027
growth-0.025
 valleys-0.025
 dot-0.025
Token Ben
Token ID3932
Feature activation-4.05e-03

Pos logit contributions
 welcome+0.030
 helping+0.029
 liking+0.028
 hugging+0.028
 polite+0.028
Neg logit contributions
 hawk-0.036
 prope-0.033
growth-0.033
 valleys-0.032
aw-0.031
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
 welcome+0.005
 sweet+0.004
 polite+0.004
 obedient+0.004
 thoughtful+0.004
Neg logit contributions
 hawk-0.008
 prope-0.008
 valleys-0.008
growth-0.007
 trigger-0.007
Token He
Token ID679
Feature activation-3.73e-03

Pos logit contributions
 hawk+0.020
 prope+0.019
growth+0.018
 valleys+0.018
 trigger+0.017
Neg logit contributions
 helping-0.019
 welcome-0.018
 hugging-0.018
 liking-0.018
 polite-0.018
Token was
Token ID373
Feature activation-1.56e-03

Pos logit contributions
 welcome+0.006
 helping+0.005
 hugging+0.005
 liking+0.005
 cleaning+0.005
Neg logit contributions
 hawk-0.007
 prope-0.006
 valleys-0.006
growth-0.006
aw-0.006
Token angry
Token ID7954
Feature activation-1.36e-02

Pos logit contributions
ll+0.002
cakes+0.002
pack+0.002
days+0.002
 allow+0.002
Neg logit contributions
 hawk-0.003
 prope-0.002
aw-0.002
 bolt-0.002
 beast-0.002
Token.
Token ID13
Feature activation-1.76e-03

Pos logit contributions
 welcome+0.015
 helping+0.014
 polite+0.014
 hugging+0.014
 eager+0.014
Neg logit contributions
 hawk-0.031
 prope-0.029
growth-0.028
 valleys-0.028
 trigger-0.027
Token He
Token ID679
Feature activation-2.38e-03

Pos logit contributions
 helping+0.003
 welcome+0.003
 hugging+0.003
 liking+0.003
 polite+0.003
Neg logit contributions
 hawk-0.003
 prope-0.003
growth-0.003
 valleys-0.003
 trigger-0.002
Token ran
Token ID4966
Feature activation+2.13e-03

Pos logit contributions
 welcome+0.004
 helping+0.004
 hugging+0.003
 cleaning+0.003
 liking+0.003
Neg logit contributions
 hawk-0.004
 prope-0.004
 happens-0.004
aw-0.004
 trigger-0.004
Token towards
Token ID3371
Feature activation+4.78e-03

Pos logit contributions
 hawk+0.005
 prope+0.004
 valleys+0.004
growth+0.004
 trigger+0.004
Neg logit contributions
 welcome-0.002
 helping-0.002
 hugging-0.002
 sweet-0.002
 liking-0.002
Token in
Token ID287
Feature activation-1.70e-02

Pos logit contributions
 hawk+0.021
 prope+0.020
 valleys+0.019
 trigger+0.019
growth+0.018
Neg logit contributions
 welcome-0.017
 hugging-0.015
 polite-0.015
 liking-0.015
 obedient-0.015
Token the
Token ID262
Feature activation-3.04e-03

Pos logit contributions
 welcome+0.022
 helping+0.021
 hugging+0.021
 polite+0.021
 sweet+0.021
Neg logit contributions
 hawk-0.035
 prope-0.033
growth-0.033
 valleys-0.032
 trigger-0.030
Token garden
Token ID11376
Feature activation-1.73e-03

Pos logit contributions
 hugging+0.004
 helping+0.004
 welcome+0.004
 liking+0.004
 polite+0.004
Neg logit contributions
 hawk-0.006
 prope-0.006
 valleys-0.006
growth-0.006
 trigger-0.005
Token.
Token ID13
Feature activation-6.26e-03

Pos logit contributions
 welcome+0.003
 polite+0.003
 obedient+0.003
 hugging+0.003
 sweet+0.003
Neg logit contributions
 hawk-0.003
 valleys-0.003
aw-0.003
 prope-0.003
 islands-0.003
Token But
Token ID887
Feature activation+8.81e-03

Pos logit contributions
 welcome+0.010
 helping+0.010
 hugging+0.010
 polite+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.011
 prope-0.010
growth-0.010
 valleys-0.010
 trigger-0.010
Token Max
Token ID5436
Feature activation-3.55e-04

Pos logit contributions
 hawk+0.018
 prope+0.017
growth+0.017
 valleys+0.016
 trigger+0.016
Neg logit contributions
 helping-0.012
 welcome-0.012
 obedient-0.012
 not-0.012
 polite-0.012
Token was
Token ID373
Feature activation+1.01e-03

Pos logit contributions
 welcome+0.000
 helping+0.000
 hugging+0.000
 polite+0.000
 liking+0.000
Neg logit contributions
 hawk-0.001
 prope-0.001
growth-0.001
 valleys-0.001
aw-0.001
Token still
Token ID991
Feature activation-6.00e-04

Pos logit contributions
 hawk+0.003
 prope+0.003
growth+0.002
 valleys+0.002
 trigger+0.002
Neg logit contributions
 welcome-0.001
 helping-0.001
 hugging-0.001
 polite-0.001
 sweet-0.001
Token a
Token ID257
Feature activation+1.79e-03

Pos logit contributions
 welcome+0.001
 liking+0.001
 sweet+0.001
 allow+0.001
 hugging+0.001
Neg logit contributions
 hawk-0.001
 prope-0.001
 valleys-0.001
 altitude-0.001
 dot-0.001
Token bit
Token ID1643
Feature activation-2.06e-03

Pos logit contributions
 hawk+0.004
 prope+0.004
growth+0.004
 valleys+0.004
 trigger+0.004
Neg logit contributions
 welcome-0.002
 helping-0.002
 hugging-0.002
 polite-0.002
 eager-0.002
Token naughty
Token ID45128
Feature activation+6.82e-04

Pos logit contributions
 welcome+0.002
 helping+0.002
 polite+0.002
 obedient+0.002
 sweet+0.002
Neg logit contributions
 hawk-0.005
 prope-0.005
growth-0.005
 valleys-0.005
 mast-0.005
Token very
Token ID845
Feature activation-1.51e-02

Pos logit contributions
 welcome+0.002
 liking+0.002
 hugging+0.002
 myself+0.002
 helping+0.002
Neg logit contributions
 hawk-0.005
 prope-0.005
aw-0.005
 valleys-0.005
 winds-0.005
Token much
Token ID881
Feature activation+1.42e-02

Pos logit contributions
 welcome+0.022
 hugging+0.021
 helping+0.021
 liking+0.020
 polite+0.020
Neg logit contributions
 hawk-0.027
 prope-0.025
growth-0.025
 valleys-0.024
aw-0.024
Token.
Token ID13
Feature activation-6.65e-03

Pos logit contributions
 hawk+0.025
 prope+0.023
aw+0.022
 valleys+0.022
growth+0.022
Neg logit contributions
 welcome-0.020
 hugging-0.020
 helping-0.019
 liking-0.019
 polite-0.019
Token She
Token ID1375
Feature activation+1.05e-02

Pos logit contributions
 obedient+0.009
 helping+0.009
 polite+0.009
 liking+0.009
 hugging+0.009
Neg logit contributions
 hawk-0.012
growth-0.012
 valleys-0.011
 prope-0.011
 happens-0.010
Token would
Token ID561
Feature activation-3.59e-03

Pos logit contributions
 hawk+0.021
 prope+0.020
growth+0.019
 valleys+0.018
 happens+0.018
Neg logit contributions
 welcome-0.014
 helping-0.014
 liking-0.013
 hugging-0.013
 polite-0.013
Token hug
Token ID16225
Feature activation-1.18e-02

Pos logit contributions
ll+0.004
 welcome+0.004
 hugging+0.004
 helping+0.004
 liking+0.004
Neg logit contributions
 prope-0.007
 hawk-0.006
 mast-0.006
 cockpit-0.006
 hull-0.006
Token Daisy
Token ID40355
Feature activation-7.65e-03

Pos logit contributions
pack+0.015
 welcome+0.015
 obedient+0.014
 polite+0.014
 eager+0.014
Neg logit contributions
 hawk-0.024
growth-0.022
 prope-0.021
 winds-0.021
 targets-0.020
Token every
Token ID790
Feature activation-5.90e-03

Pos logit contributions
 welcome+0.011
 liking+0.010
 helping+0.009
 polite+0.009
 hugging+0.009
Neg logit contributions
 hawk-0.013
 prope-0.013
 valleys-0.012
 islands-0.012
 planets-0.012
Token day
Token ID1110
Feature activation-3.61e-03

Pos logit contributions
 hugging+0.010
 helping+0.010
ll+0.010
 not+0.010
 sweet+0.010
Neg logit contributions
 hawk-0.009
 prope-0.008
growth-0.008
 dot-0.008
 trigger-0.007
Token and
Token ID290
Feature activation+7.10e-03

Pos logit contributions
 welcome+0.006
 Deal+0.005
 hugging+0.005
pack+0.005
 polite+0.005
Neg logit contributions
aw-0.005
 prope-0.005
 hawk-0.005
 mountains-0.005
 valleys-0.005
Token night
Token ID1755
Feature activation+7.80e-03

Pos logit contributions
 prope+0.014
 hawk+0.014
 happens+0.013
growth+0.013
 mast+0.012
Neg logit contributions
 liking-0.008
 welcome-0.008
 eager-0.008
 obedient-0.008
 helping-0.008
Token look
Token ID804
Feature activation+1.06e-02

Pos logit contributions
 hawk+0.002
 prope+0.002
 altitude+0.002
growth+0.002
 hull+0.002
Neg logit contributions
 hugging-0.002
 helping-0.002
 liking-0.002
 eager-0.002
 glad-0.001
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
 hawk+0.021
 prope+0.020
 valleys+0.020
 happens+0.019
aw+0.019
Neg logit contributions
 hugging-0.013
 helping-0.013
 welcome-0.012
 liking-0.012
 taking-0.011
Token He
Token ID679
Feature activation+8.84e-04

Pos logit contributions
 hawk+0.030
 valleys+0.030
growth+0.029
 winds+0.029
aw+0.028
Neg logit contributions
 welcome-0.028
 hugging-0.028
 helping-0.027
 liking-0.027
 polite-0.027
Token saw
Token ID2497
Feature activation+8.21e-04

Pos logit contributions
 hawk+0.002
 prope+0.002
growth+0.002
 happens+0.002
 valleys+0.002
Neg logit contributions
 welcome-0.001
 helping-0.001
 liking-0.001
 hugging-0.001
 sweet-0.001
Token lots
Token ID6041
Feature activation-1.97e-02

Pos logit contributions
 hawk+0.002
 prope+0.002
growth+0.002
 valleys+0.002
 trigger+0.002
Neg logit contributions
 welcome-0.001
 helping-0.001
 hugging-0.001
 polite-0.001
 liking-0.001
Token of
Token ID286
Feature activation-4.14e-02

Pos logit contributions
 welcome+0.023
 helping+0.022
 hugging+0.022
 liking+0.022
 polite+0.021
Neg logit contributions
 hawk-0.042
 prope-0.039
 valleys-0.039
growth-0.038
aw-0.037
Token shiny
Token ID22441
Feature activation+7.85e-03

Pos logit contributions
 welcome+0.065
 helping+0.064
 hugging+0.064
 liking+0.063
 eager+0.060
Neg logit contributions
 hawk-0.074
 prope-0.068
 valleys-0.066
growth-0.063
 winds-0.062
Token red
Token ID2266
Feature activation+1.42e-03

Pos logit contributions
 hawk+0.014
growth+0.014
 prope+0.013
 trigger+0.013
 hull+0.013
Neg logit contributions
 sweet-0.012
 helping-0.011
 welcome-0.011
 polite-0.011
 looking-0.011
Token apples
Token ID22514
Feature activation+1.36e-02

Pos logit contributions
 hawk+0.003
 prope+0.002
growth+0.002
 valleys+0.002
 mast+0.002
Neg logit contributions
 sweet-0.002
 helping-0.002
 welcome-0.002
 hugging-0.002
 liking-0.002
Token hanging
Token ID10938
Feature activation-5.74e-03

Pos logit contributions
 hawk+0.028
 valleys+0.025
 prope+0.025
 islands+0.024
 trigger+0.024
Neg logit contributions
 welcome-0.018
 polite-0.017
 obedient-0.017
 liking-0.016
 thoughtful-0.016
Token from
Token ID422
Feature activation+2.99e-02

Pos logit contributions
 welcome+0.007
 liking+0.007
 helping+0.007
 polite+0.007
 obedient+0.007
Neg logit contributions
 hawk-0.012
 prope-0.011
 valleys-0.011
 trigger-0.011
growth-0.010
Token\n
Token ID198
Feature activation-2.70e-03

Pos logit contributions
 helping+0.016
 liking+0.016
 obedient+0.015
 polite+0.015
 welcome+0.015
Neg logit contributions
 hawk-0.022
growth-0.021
 valleys-0.021
 prope-0.020
aw-0.020
Token\n
Token ID198
Feature activation+6.81e-03

Pos logit contributions
 helping+0.004
 welcome+0.004
 hugging+0.004
 liking+0.004
 obedient+0.003
Neg logit contributions
 hawk-0.005
 valleys-0.005
 prope-0.005
growth-0.005
 islands-0.004
TokenSarah
Token ID29284
Feature activation+6.51e-03

Pos logit contributions
 hawk+0.013
 prope+0.013
 valleys+0.012
growth+0.012
 trigger+0.012
Neg logit contributions
 welcome-0.009
 helping-0.009
 hugging-0.009
 sweet-0.009
 eager-0.009
Token wanted
Token ID2227
Feature activation-1.03e-02

Pos logit contributions
 hawk+0.014
 prope+0.013
growth+0.013
 valleys+0.013
aw+0.012
Neg logit contributions
 welcome-0.008
 helping-0.008
 liking-0.007
 polite-0.007
 hugging-0.007
Token to
Token ID284
Feature activation-1.40e-02

Pos logit contributions
 welcome+0.011
 hugging+0.011
 eager+0.011
 obedient+0.010
 polite+0.010
Neg logit contributions
 hawk-0.022
growth-0.020
 happens-0.020
 prope-0.020
 targets-0.020
Token play
Token ID711
Feature activation-3.75e-02

Pos logit contributions
 hugging+0.017
 helping+0.016
 eager+0.016
 glad+0.015
 preparing+0.015
Neg logit contributions
 hawk-0.025
growth-0.025
 prope-0.024
 trigger-0.024
 mast-0.023
Token with
Token ID351
Feature activation-4.51e-03

Pos logit contributions
 hugging+0.033
 eager+0.032
 welcome+0.031
 obedient+0.030
 helping+0.030
Neg logit contributions
 hawk-0.083
 prope-0.079
 trigger-0.077
 targets-0.076
aw-0.076
Token the
Token ID262
Feature activation-1.59e-02

Pos logit contributions
 welcome+0.008
 helping+0.008
 obedient+0.008
 liking+0.008
 eager+0.008
Neg logit contributions
 hawk-0.007
 prope-0.006
growth-0.006
 valleys-0.006
 trigger-0.006
Token basketball
Token ID9669
Feature activation-7.80e-03

Pos logit contributions
 hugging+0.023
 helping+0.022
 preparing+0.021
 liking+0.021
 obedient+0.021
Neg logit contributions
 trigger-0.028
 hawk-0.027
 targets-0.026
 obstacle-0.025
 arrows-0.024
Token too
Token ID1165
Feature activation-6.42e-03

Pos logit contributions
 welcome+0.010
 hugging+0.010
 liking+0.010
 polite+0.010
 obedient+0.010
Neg logit contributions
 hawk-0.015
 prope-0.014
 targets-0.013
 valleys-0.013
aw-0.013
Token,
Token ID11
Feature activation+1.24e-02

Pos logit contributions
 hugging+0.009
 welcome+0.009
 helping+0.009
 liking+0.009
 obedient+0.009
Neg logit contributions
 hawk-0.011
 prope-0.010
 altitude-0.010
 happens-0.010
aw-0.010
Token spark
Token ID9009
Feature activation-5.83e-03

Pos logit contributions
 welcome+0.018
 helping+0.018
 polite+0.017
 sweet+0.017
 hugging+0.017
Neg logit contributions
 hawk-0.013
 prope-0.012
growth-0.011
 valleys-0.011
 trigger-0.011
Token before
Token ID878
Feature activation+1.93e-02

Pos logit contributions
 welcome+0.008
 helping+0.008
 sweet+0.008
 liking+0.008
 polite+0.008
Neg logit contributions
 hawk-0.012
 prope-0.011
 valleys-0.011
growth-0.011
 islands-0.010
Token and
Token ID290
Feature activation-3.77e-03

Pos logit contributions
 hawk+0.035
 valleys+0.032
growth+0.032
 prope+0.032
 winds+0.031
Neg logit contributions
 welcome-0.029
 polite-0.028
 obedient-0.027
 sweet-0.026
 liking-0.026
Token it
Token ID340
Feature activation-1.28e-02

Pos logit contributions
 welcome+0.006
 helping+0.006
 hugging+0.006
 polite+0.006
 liking+0.006
Neg logit contributions
 hawk-0.006
 prope-0.006
 valleys-0.006
growth-0.006
 happens-0.005
Token made
Token ID925
Feature activation+2.79e-03

Pos logit contributions
 welcome+0.019
 helping+0.018
 hugging+0.018
 liking+0.017
 polite+0.017
Neg logit contributions
 hawk-0.024
 prope-0.023
 valleys-0.022
growth-0.022
 happens-0.021
Token him
Token ID683
Feature activation-3.41e-02

Pos logit contributions
 hawk+0.006
 prope+0.006
growth+0.006
 valleys+0.006
 trigger+0.005
Neg logit contributions
 welcome-0.003
 helping-0.003
 hugging-0.003
 liking-0.003
 polite-0.003
Token scared
Token ID12008
Feature activation-1.50e-02

Pos logit contributions
 welcome+0.039
 helping+0.038
 hugging+0.037
 liking+0.036
 cleaning+0.034
Neg logit contributions
 hawk-0.070
 prope-0.069
 valleys-0.065
growth-0.064
 mast-0.063
Token.
Token ID13
Feature activation-1.69e-02

Pos logit contributions
 welcome+0.018
 helping+0.018
 hugging+0.017
 polite+0.017
 eager+0.017
Neg logit contributions
 hawk-0.032
 prope-0.030
growth-0.029
 valleys-0.029
 trigger-0.028
Token\n
Token ID198
Feature activation-7.15e-03

Pos logit contributions
 welcome+0.019
 helping+0.018
 eager+0.017
 polite+0.017
 hugging+0.017
Neg logit contributions
 hawk-0.039
 prope-0.037
growth-0.035
 valleys-0.035
 trigger-0.034
Token\n
Token ID198
Feature activation-8.58e-05

Pos logit contributions
 welcome+0.007
 helping+0.007
 polite+0.007
 hugging+0.006
 eager+0.006
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.016
 valleys-0.015
 trigger-0.015
TokenThe
Token ID464
Feature activation-3.15e-03

Pos logit contributions
 welcome+0.000
 helping+0.000
 polite+0.000
 obedient+0.000
 eager+0.000
Neg logit contributions
 hawk-0.000
 prope-0.000
growth-0.000
 valleys-0.000
 trigger-0.000
Token.
Token ID13
Feature activation+7.81e-03

Pos logit contributions
 helping+0.008
 playing+0.007
 welcome+0.007
 liking+0.007
 eager+0.007
Neg logit contributions
 hawk-0.020
 prope-0.019
growth-0.018
 trigger-0.018
 valleys-0.018
Token Tom
Token ID4186
Feature activation+1.26e-03

Pos logit contributions
 hawk+0.017
 prope+0.016
growth+0.016
 valleys+0.016
 trigger+0.015
Neg logit contributions
 welcome-0.009
 helping-0.009
 polite-0.008
 hugging-0.008
 sweet-0.008
Token kicked
Token ID12165
Feature activation-1.61e-02

Pos logit contributions
 hawk+0.003
 prope+0.002
aw+0.002
 valleys+0.002
 happens+0.002
Neg logit contributions
 welcome-0.002
 helping-0.002
 liking-0.001
 hugging-0.001
 polite-0.001
Token the
Token ID262
Feature activation-7.68e-03

Pos logit contributions
 welcome+0.019
 helping+0.019
 polite+0.018
 sweet+0.018
 hugging+0.018
Neg logit contributions
 hawk-0.036
 prope-0.034
growth-0.032
 valleys-0.032
 trigger-0.031
Token ball
Token ID2613
Feature activation-2.20e-02

Pos logit contributions
 helping+0.012
 welcome+0.012
 hugging+0.012
 liking+0.012
 eager+0.012
Neg logit contributions
 hawk-0.014
 prope-0.013
 trigger-0.012
 valleys-0.012
 mast-0.012
Token into
Token ID656
Feature activation-3.43e-02

Pos logit contributions
 welcome+0.027
 hugging+0.025
 helping+0.025
 liking+0.024
 polite+0.024
Neg logit contributions
 hawk-0.045
 prope-0.043
 valleys-0.041
 trigger-0.041
growth-0.040
Token the
Token ID262
Feature activation-1.02e-02

Pos logit contributions
 welcome+0.044
 helping+0.043
 hugging+0.042
 obedient+0.041
 polite+0.041
Neg logit contributions
 hawk-0.070
 prope-0.067
growth-0.065
 valleys-0.065
 trigger-0.062
Token goal
Token ID3061
Feature activation+1.51e-03

Pos logit contributions
 welcome+0.015
 helping+0.015
 hugging+0.014
 polite+0.014
 eager+0.014
Neg logit contributions
 hawk-0.020
 prope-0.018
growth-0.018
 valleys-0.018
 trigger-0.017
Token and
Token ID290
Feature activation-1.69e-03

Pos logit contributions
 hawk+0.003
 valleys+0.003
 prope+0.003
growth+0.003
 mast+0.003
Neg logit contributions
 hugging-0.002
 welcome-0.002
 liking-0.002
 helping-0.002
 polite-0.002
Token won
Token ID1839
Feature activation-4.33e-03

Pos logit contributions
 welcome+0.002
 helping+0.002
 polite+0.002
 hugging+0.002
 sweet+0.002
Neg logit contributions
 hawk-0.003
 prope-0.003
growth-0.003
 valleys-0.003
 trigger-0.003
Token the
Token ID262
Feature activation+1.05e-02

Pos logit contributions
 welcome+0.005
 helping+0.005
 polite+0.005
 sweet+0.005
 hugging+0.005
Neg logit contributions
 hawk-0.010
 prope-0.009
growth-0.009
 valleys-0.009
 trigger-0.009
Token't
Token ID470
Feature activation+5.74e-04

Pos logit contributions
 welcome+0.004
 polite+0.004
 helping+0.004
 liking+0.004
 sweet+0.004
Neg logit contributions
 hawk-0.012
growth-0.012
 prope-0.011
 valleys-0.011
 happens-0.011
Token fall
Token ID2121
Feature activation-8.96e-03

Pos logit contributions
 hawk+0.001
 prope+0.001
growth+0.001
 valleys+0.001
 mast+0.001
Neg logit contributions
 welcome-0.001
 helping-0.001
 hugging-0.001
 sweet-0.001
 polite-0.001
Token asleep
Token ID16039
Feature activation-1.20e-02

Pos logit contributions
 welcome+0.013
 polite+0.013
 obedient+0.013
 thoughtful+0.013
 eager+0.013
Neg logit contributions
 hawk-0.016
 prope-0.016
 valleys-0.015
growth-0.014
 winds-0.014
Token.
Token ID13
Feature activation-1.62e-02

Pos logit contributions
 welcome+0.017
 hugging+0.015
 sweet+0.015
 liking+0.015
 polite+0.015
Neg logit contributions
 hawk-0.023
 prope-0.022
 valleys-0.021
growth-0.021
 mast-0.020
Token He
Token ID679
Feature activation-6.45e-03

Pos logit contributions
 helping+0.025
 welcome+0.025
 hugging+0.024
 liking+0.024
 polite+0.024
Neg logit contributions
 hawk-0.029
growth-0.027
 prope-0.027
 valleys-0.026
 mast-0.025
Token started
Token ID2067
Feature activation-3.62e-02

Pos logit contributions
 welcome+0.010
 helping+0.009
 hugging+0.009
 liking+0.009
 sweet+0.009
Neg logit contributions
 hawk-0.012
 prope-0.012
growth-0.011
 valleys-0.011
 happens-0.011
Token to
Token ID284
Feature activation-1.70e-02

Pos logit contributions
 welcome+0.048
 polite+0.039
 sweet+0.038
 thoughtful+0.037
 helpful+0.036
Neg logit contributions
 prope-0.072
 hawk-0.071
 valleys-0.069
growth-0.067
 winds-0.065
Token search
Token ID2989
Feature activation-2.12e-02

Pos logit contributions
 welcome+0.021
 liking+0.020
 polite+0.020
 helping+0.019
 obedient+0.019
Neg logit contributions
 prope-0.034
 hawk-0.034
 valleys-0.031
growth-0.031
 winds-0.030
Token for
Token ID329
Feature activation-8.31e-03

Pos logit contributions
 welcome+0.022
 liking+0.022
 hugging+0.022
 helping+0.021
 polite+0.021
Neg logit contributions
 hawk-0.047
 prope-0.045
 valleys-0.044
growth-0.043
 winds-0.042
Token his
Token ID465
Feature activation+5.92e-04

Pos logit contributions
 welcome+0.011
 helping+0.011
 hugging+0.010
 eager+0.010
 liking+0.010
Neg logit contributions
 hawk-0.017
 prope-0.016
growth-0.015
 valleys-0.015
 islands-0.015
Token favorite
Token ID4004
Feature activation-1.91e-02

Pos logit contributions
 hawk+0.001
 prope+0.001
 winds+0.001
growth+0.001
 valleys+0.001
Neg logit contributions
 welcome-0.001
 liking-0.001
 hugging-0.001
 helping-0.001
 handing-0.001